
Schema mismatch error - Data Ingestion


I am trying to set up a data source, but when linking the data files and validating the schema I get different messages for different folders.

Steps performed:

1. Generated Parquet files in ADLS Gen2 from SStreams in ADLS Gen1 using an ADF copy activity.

2. The generated files are in the folder structure: <container_name>/<folder>/YYYY/MM/DD/<file_name>.parquet

3. Created a new data source in CI to ingest the data.

4. While setting up the data source, I validated the schema for files under the following folders:

  1. <container_name>/<folder>/2022/05/ --> Succeeded.
  2. <container_name>/<folder>/2022/04/ --> Failed with the error shown below.
  3. <container_name>/<folder>/2022/02/ --> Data files were found, but validation reported a schema mismatch error.

Error:

class parquet::ParquetStatusException (message: 'IOError: Corrupt snappy compressed data.')

Request ID: d3bca710-c0b1-49e6-94f4-b74a2baf7477

Time: 7/13/2022, 4:12:22 PM

How can validation succeed for one folder and fail for another?

Also, I have confirmed by running a few scripts that the schema matches exactly across all files. We still get the error in CI during schema validation.
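One cheap check that can narrow down a "corrupt data" error is whether every file is at least a complete Parquet container. The sketch below uses only the Python standard library and assumes the files have been copied to a local directory (the directory and the function names `check_parquet_container` and `scan` are hypothetical). It verifies the `PAR1` magic bytes at both ends of each file and that the footer length fits. It does not decompress any pages, so a file can pass this check and still contain corrupt Snappy data.

```python
import os
import struct

MAGIC = b"PAR1"  # 4-byte marker at the start and end of an unencrypted Parquet file


def check_parquet_container(path):
    """Return None if the file framing looks valid, else a short reason string."""
    size = os.path.getsize(path)
    if size < 12:  # leading magic + 4-byte footer length + trailing magic
        return "too small to be a Parquet file"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        footer_len_bytes = f.read(4)
        tail = f.read(4)
    if head != MAGIC:
        return "missing leading PAR1 magic"
    if tail != MAGIC:
        return "missing trailing PAR1 magic (file may be truncated)"
    (footer_len,) = struct.unpack("<I", footer_len_bytes)
    if footer_len == 0 or footer_len > size - 12:
        return "footer length does not fit in the file"
    return None


def scan(root):
    """Walk root and collect (path, reason) for every suspicious .parquet file."""
    problems = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".parquet"):
                full = os.path.join(dirpath, name)
                reason = check_parquet_container(full)
                if reason:
                    problems.append((full, reason))
    return problems
```

Running `scan` on a local copy of a failing folder such as 2022/04 would list any files that were truncated or only partially written during the copy.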

  • Kulbhushan Katoch

    The number of matched data files is not what I expected.

    Note: There are only two folders, 2022 and 2021, under <folder>.

    1. <container_name>/<folder>/2022/ --> 2496 data files matched.
    2. <container_name>/<folder>/2021/ --> 3019 data files matched.
    3. <container_name>/<folder>/ --> Only 4677 data files matched. However, I expected the sum of the 2022 and 2021 counts (2496 + 3019 = 5515).

    What is the issue here?
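To cross-check the counts reported by CI, the files can be counted independently. This is a minimal sketch using only the standard library, assuming the container has been mirrored to a local directory. The helper names `count_parquet_by_top_folder` and `gap` are hypothetical, and the sketch says nothing about how CI itself matches files.

```python
import os
from collections import Counter


def count_parquet_by_top_folder(root):
    """Count .parquet files under root, grouped by first-level subfolder.

    Files sitting directly in root are grouped under the key ".".
    """
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = rel.split(os.sep)[0]
        n = sum(1 for name in files if name.endswith(".parquet"))
        if n:
            counts[top] += n
    return counts


def gap(per_folder_counts, reported_total):
    """Difference between the sum of per-folder counts and a reported total."""
    return sum(per_folder_counts) - reported_total


# Numbers from the post: 2496 + 3019 = 5515 expected, 4677 reported.
missing = gap([2496, 3019], 4677)  # 838 files unaccounted for
```

If the independent count for <folder> is 5515, the 838-file gap comes from how the files are being matched rather than from the storage layout.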


  • Community Member

    Hi Kulbhushan,

    Have you tried to add it again?

  • Kulbhushan Katoch

    Yes, I have tried that; the result is the same and I get the same error messages.

    Earlier I was generating the Parquet files from a Synapse pipeline using a copy activity.

    Now I have tried generating the Parquet files with a PySpark script and then ingesting the data in CI, but I get the same error messages.

  • Mukesh Pohuja

    Hi Kulbhushan,

    Can you write the output files as Parquet version 1 (ParquetProperties.WriterVersion.PARQUET_1_0)?

    Parquet version 2 files are not yet supported.

    Thanks,

    Mukesh
