Schema mismatch error - Data Ingestion

I am trying to set up a data source, but I get different messages during data file linking and schema validation.

Steps performed:

1. Generated a Parquet data stream in ADLS Gen2 from SStreams in ADLS Gen1 using an ADF copy activity.

2. The generated files are in this folder structure: <container_name>/<folder>/YYYY/MM/DD/<file_name>.parquet

3. Created a new data source in CI to ingest the data.

4. While setting up the data source, I tried to validate the schema for files under the following folders:

  1. <container_name>/<folder>/2022/05/ --> Succeeded.
  2. <container_name>/<folder>/2022/04/ --> Failed with the error shown below.
  3. <container_name>/<folder>/2022/02/ --> Data files were found, but a schema mismatch error was reported.

Error:

class parquet::ParquetStatusException (message: 'IOError: Corrupt snappy compressed data.')

Request ID: d3bca710-c0b1-49e6-94f4-b74a2baf7477

Time: 7/13/2022, 4:12:22 PM

Why does validation succeed for some files and fail for others?

I have also confirmed, by running a few scripts, that the schema matches exactly across all files. Even so, CI still reports an error during schema validation.
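
One way to narrow down which files are involved in the "Corrupt snappy compressed data" error is to check each file's basic Parquet framing before looking at the schema. The sketch below assumes the files have been copied to a local folder; the function names `parquet_framing_problem` and `scan_folder` are hypothetical. It only verifies the leading and trailing `PAR1` magic bytes and the footer length, so it can flag truncated or incomplete files but cannot prove that the compressed pages inside a file are valid.

```python
import os
from pathlib import Path

MAGIC = b"PAR1"  # 4-byte marker at the start and end of an unencrypted Parquet file


def parquet_framing_problem(path):
    """Return a description of the framing problem, or None if the file looks intact."""
    size = path.stat().st_size
    # Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
    if size < 12:
        return "file too small to be Parquet"
    with path.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + trailing magic
        tail = fh.read(8)
    if head != MAGIC:
        return "missing leading PAR1 magic"
    if tail[4:] != MAGIC:
        return "missing trailing PAR1 magic (possibly truncated)"
    footer_len = int.from_bytes(tail[:4], "little")
    if footer_len == 0 or footer_len > size - 12:
        return "footer length does not fit in file"
    return None


def scan_folder(root):
    """Map each suspicious .parquet file under root to its problem description."""
    problems = {}
    for path in sorted(Path(root).rglob("*.parquet")):
        problem = parquet_framing_problem(path)
        if problem is not None:
            problems[str(path)] = problem
    return problems
```

Running `scan_folder` on a local copy of the 2022/04 folder would list any files that fail this check. An empty result means the framing looks fine and the cause lies elsewhere.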

  • RE: Schema mismatch error - Data Ingestion

    Hi Kulbushan -

    Can you write the output files in Parquet version 1 (ParquetProperties.WriterVersion.PARQUET_1_0)?

    Parquet version 2 files are not yet supported.

    thanks,

    mukesh

  • RE: Schema mismatch error - Data Ingestion

    Yes, I have tried that, and the result is the same: I get the same error messages.

    Earlier I was generating the Parquet files from a Synapse pipeline using a copy activity.

    I have now tried generating the Parquet files with a PySpark script and then ingesting that data in CI, but I get the same error messages.

  • RE: Schema mismatch error - Data Ingestion

    Hi Kulbhushan,

    Have you tried to add it again?

  • Schema mismatch and data file mismatch - Data Ingestion

    The number of matched data files is not what I expect.

    Note: There are only two folders, 2022 and 2021, under <folder>.

    1. <container_name>/<folder>/2022/ --> 2496 data files matched.
    2. <container_name>/<folder>/2021/ --> 3019 data files matched.
    3. <container_name>/<folder>/ --> Only 4677 data files matched. I expect this to be the sum of the 2022 and 2021 counts (2496 + 3019 = 5515).

    What is causing this discrepancy?
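
To compare these numbers against what is actually in storage, the counts can be recomputed independently from a file listing. The sketch below is only an illustration: it assumes the listing is available as plain path strings (for example, exported from a storage explorer), and `count_by_subfolder` is a hypothetical helper name. It groups `.parquet` paths by the first folder below the root and works out the gap between the figures reported above.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_subfolder(paths, root):
    """Count .parquet paths under root, keyed by the first folder below root."""
    root_parts = PurePosixPath(root).parts
    counts = Counter()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if parts[:len(root_parts)] != root_parts:
            continue  # path is outside the root folder
        if not raw.endswith(".parquet"):
            continue  # skip marker files and other non-Parquet entries
        rest = parts[len(root_parts):]
        # Files sitting directly in the root are grouped under ".".
        key = rest[0] if len(rest) > 1 else "."
        counts[key] += 1
    return counts


# Figures reported in this thread.
reported = {"2022": 2496, "2021": 3019}
matched_at_root = 4677
expected_total = sum(reported.values())
shortfall = expected_total - matched_at_root
print(expected_total, shortfall)  # 5515 838
```

If the per-folder counts from the listing agree with 2496 and 3019, the 838 missing files are being dropped somewhere in the root-level match rather than in storage.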
