I am trying to set up a data source, but while linking the data file and running schema validation I am getting different error messages.
Steps performed:
1. Generated a Parquet data stream in ADLS Gen2 from SStreams in ADLS Gen1 using an ADF copy activity.
2. The generated files follow the folder structure: <container_name>/<folder>/YYYY/MM/DD/<file_name>.parquet
3. Created a new data source in CI and started ingesting data.
4. While setting up the data source and validating the schema for files under the folder structure above, I get the error below.
Error:
class parquet::ParquetStatusException (message: 'IOError: Corrupt snappy compressed data.')
Request ID: d3bca710-c0b1-49e6-94f4-b74a2baf7477
Time: 7/13/2022, 4:12:22 PM
Why does validation succeed for one file but fail for the others?
Also, I have confirmed by running a few scripts that the schema matches exactly across all files, yet CI still reports this error during schema validation.
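Since the error complains about corrupt Snappy data rather than the schema itself, a quick sanity check is to verify each file's Parquet framing: a valid Parquet file both starts and ends with the 4-byte magic `PAR1`, and a file truncated or damaged during a copy activity typically fails this check. A minimal stdlib-only sketch (the folder path is a placeholder for the real container path):

```python
import pathlib

MAGIC = b"PAR1"  # every Parquet file begins and ends with this 4-byte marker

def looks_like_parquet(path: pathlib.Path) -> bool:
    """Return True if the file carries valid Parquet header/footer magic bytes."""
    data = path.read_bytes()
    return len(data) > 8 and data[:4] == MAGIC and data[-4:] == MAGIC

# Scan every .parquet file under the dated folder structure (placeholder path)
for f in pathlib.Path("container/folder").rglob("*.parquet"):
    status = "ok" if looks_like_parquet(f) else "CORRUPT/TRUNCATED"
    print(f, status)
```

A file that fails this check was broken before CI ever saw it, which would point at the copy step rather than at schema validation.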
Hi Kulbushan -
Can you write the output files in Parquet version 1 (ParquetProperties.WriterVersion.PARQUET_1_0)?
Parquet version 2 files are not yet supported.
thanks,
mukesh
Yes, I have tried that; the result remains the same and I get the same error messages.
Earlier I was generating the Parquet files from a Synapse pipeline using a copy activity.
Now I have tried generating the Parquet files through a PySpark script and then ingesting that data into CI, but I am still getting the same error messages.
Hi Kulbhushan,
Have you tried adding it again?
The data files are not matching as expected.
Note: There are only two folders, i.e. 2022 and 2021, under <folder>
What's the issue here?