I am trying to set up a data source, but while linking the data files and validating the schema I get different results depending on the folder.
Steps performed:
1. Generated Parquet files in ADLS Gen2 from SStreams in ADLS Gen1 using an ADF copy activity.
2. The generated files follow this folder structure: <container_name>/<folder>/YYYY/MM/DD/<file_name>.parquet
3. Created a new data source in CI to ingest the data.
4. While setting up the data source, I validated the schema for the files under each of the following folders:
- <container_name>/<folder>/2022/05/ --> Succeeded.
- <container_name>/<folder>/2022/04/ --> Failed with the error below.
- <container_name>/<folder>/2022/02/ --> The data files were found, but validation reported a schema mismatch.
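Because validation passes or fails per month folder, it can help to list exactly which files sit under each YYYY/MM prefix and then re-test smaller folders (for example a single DD day) to narrow down which file triggers the error. Below is a minimal sketch, assuming the container is reachable as a local or mounted path; `group_by_month` is a hypothetical helper name, not part of CI or ADF.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_by_month(root: Path) -> Dict[str, List[Path]]:
    """Group <root>/YYYY/MM/DD/*.parquet files by their YYYY/MM prefix."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    # The glob pattern mirrors the YYYY/MM/DD/<file_name>.parquet layout;
    # files at any other depth or with another extension are ignored.
    for path in sorted(root.glob("*/*/*/*.parquet")):
        year, month, _day = path.relative_to(root).parts[:3]
        groups[f"{year}/{month}"].append(path)
    return dict(groups)
```

For example, `group_by_month(Path("/mnt/<container_name>/<folder>"))` (hypothetical mount point) returns a mapping such as `{"2022/04": [...], "2022/05": [...]}`, which shows how many files each failing month contains and which day folders to try individually.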
Error:
class parquet::ParquetStatusException (message: 'IOError: Corrupt snappy compressed data.')
Request ID: d3bca710-c0b1-49e6-94f4-b74a2baf7477
Time: 7/13/2022, 4:12:22 PM
Why does validation succeed for one folder and fail for the others?
I have also confirmed, by running a few scripts, that the schema matches exactly across all files, yet CI still reports an error during schema validation.
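One possible explanation, not confirmed here: a Parquet file's schema is stored in its footer metadata, so a script that only reads the schema never decompresses the column pages. Such a script can report matching schemas even when a data page is damaged, which is what "Corrupt snappy compressed data" suggests. The stdlib-only sketch below is a cheap first pass that checks each file's framing (leading `PAR1` magic, then footer length and trailing `PAR1` magic at the end), which catches truncated or partially copied files. It cannot detect a corrupt snappy page; for that, each file has to be read in full with a Parquet library (for example `pyarrow.parquet.read_table`), which decompresses every page. `check_parquet_framing` is a hypothetical helper name.

```python
import os
import struct
from pathlib import Path
from typing import Optional

MAGIC = b"PAR1"


def check_parquet_framing(path: Path) -> Optional[str]:
    """Return None if the Parquet framing looks valid, else a short reason."""
    size = path.stat().st_size
    # Smallest framing: leading magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        return "file too small to be Parquet"
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + trailing magic
        tail = f.read(8)
    if head != MAGIC:
        return "missing leading PAR1 magic"
    if tail[4:] != MAGIC:
        return "missing trailing PAR1 magic (truncated file?)"
    # Footer length is a 4-byte little-endian unsigned integer.
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len == 0 or footer_len + 12 > size:
        return f"implausible footer length {footer_len}"
    return None
```

Running it over every file in a failing month and printing the non-`None` results separates truncated files from files that are well-framed but have damaged compressed pages.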
