Manifest.json for parquet file present in ADLS2


Hi folks,

I have a requirement to ingest data into CI from a Parquet file available in ADLS2. How can we generate the manifest file for the Parquet file?

Is there any tool or reference link available to generate it?

  • Bill Ryan

    All you really need is the default manifest.json file. Upload it to the container at the root of your lake, where you are going to manage the data. Make sure you have the access issues worked out first. Assuming your containers are all in place, start by choosing Azure Data Lake and then fill out the details. You need either a default manifest.json or a model.json file (if you use the manifest, it'll let you build the entities with the entity builder, which makes things a lot easier).

    ADL Ingestion

    Storage Account

    Manifest.json

    {
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{
            "corpusPath": "ci.definitions.cdm.json"
        }],
        "manifestName": "Your Sandbox Name",
        "entities": [{
            "type": "LocalEntity",
            "entityName": "Customer",
            "exhibitsTraits": [{
                    "traitReference": "is.formatted.dateTime",
                    "arguments": [{
                        "name": "format",
                        "value": "yyyy-MM-dd'T'HH:mm:ss"
                    }]
                },
                {
                    "traitReference": "is.formatted.date",
                    "arguments": [{
                        "name": "format",
                        "value": "yyyy-MM-dd"
                    }]
                },
                {
                    "traitReference": "is.CI.partition.incremental.upsert",
                    "arguments": [{
                            "name": "regularExpression",
                            "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Upserts/.*\\.csv"
                        },
                        {
                            "name": "parameters",
                            "value": ""
                        },
                        {
                            "name": "rootLocation",
                            "value": "Customer"
                        }
                    ]
                },
                {
                    "traitReference": "is.CI.partition.incremental.delete",
                    "arguments": [{
                            "name": "regularExpression",
                            "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Deletes/.*\\.csv"
                        },
                        {
                            "name": "parameters",
                            "value": ""
                        },
                        {
                            "name": "rootLocation",
                            "value": "Customer"
                        }
                    ]
                }
            ]
        }]
    }

    Just put that in a file. Call it cdp.manifest.cdm.json or something similar so you recognize it. When you go back in to ingest, it should see the file, and then you'll be able to add your entity references. I'm guessing you've seen this, https://docs.microsoft.com/en-us/minecraft/creator/reference/content/addonsreference/examples/addonmanifest but if not, it's just the reference. Now that you have the UI to build your entities, it's mostly downhill from here. This is a little tricky (or can be) the first time, so if you run into any problems let me know and I can walk you through it.
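
If the regex is the part you're unsure about, you can try it out locally before uploading. Below is a minimal Python sketch, not anything from the product: it assumes the pattern has to match the whole path below rootLocation (the thread doesn't say how CI applies it, and CI may use a different regex engine than Python's `re`). The sample path is made up, and the four capture groups presumably stand for year, month, day and hour.

```python
import re

# Pattern copied from the "regularExpression" argument of the upsert trait
# (JSON-unescaped: "\\d" in the manifest becomes "\d" here).
UPSERT_PATTERN = r"/IncrementalData/(\d{4})/(\d{2})/(\d{2})/(\d{2})/Upserts/.*\.csv"


def match_partition(pattern: str, path: str):
    """Return the captured groups if the whole path matches, else None.

    Full-match semantics are an assumption made for this sketch.
    """
    m = re.fullmatch(pattern, path)
    return m.groups() if m else None


# Hypothetical file path, relative to the rootLocation ("Customer").
sample = "/IncrementalData/2023/05/17/09/Upserts/part-0001.csv"

print(match_partition(UPSERT_PATTERN, sample))        # groups are captured
print(match_partition(UPSERT_PATTERN + " ", sample))  # a stray trailing space breaks the match
print(match_partition(UPSERT_PATTERN, sample.replace(".csv", ".parquet")))  # wrong extension
```

Note that the pattern as written only picks up .csv files, so for Parquet data the extension in the pattern would need to change accordingly.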

  • Purnima

    Bill Ryan, thanks for the quick reply. I am facing an issue generating the manifest JSON. Is there any tool to generate these files from Parquet files? I know the further steps; I am only stuck on the manifest JSON file. It has a few parameters, like the regex, root location, and partition URL, and I believe I am setting those values wrong.

  • Suggested answer
    Bill Ryan

    First, create a file named ci.definitions.1.0.cdm.json with the following content, then upload it to the correct root of the storage container you're using:

    {
      "jsonSchemaSemanticVersion": "1.0.0",
      "imports": [
        {
          "corpusPath": "cdm:/foundations.cdm.json"
        },
        {
          "corpusPath": "cdm:/primitives.cdm.json"
        },
        {
          "corpusPath": "cdm:/meanings.concepts.cdm.json"
        },
        {
          "corpusPath": "cdm:/meanings.measurement.cdm.json"
        }
      ],
      "definitions": [
        {
          "traitName": "is.CI.partition.incremental",
          "extendsTrait": "is",
          "hasParameters": [
            {
              "name": "regularExpression",
              "dataType": "string",
              "explanation": "The regular expression to use for the incremental partition.",
              "required": true
            },
            {
              "name": "rootLocation",
              "dataType": "string",
              "explanation": "The root location to use for discovering the partitions. If not specified, then we default to the rootLocation of the first data partition pattern",
              "required": false
            },
            {
              "name": "parameters",
              "dataType": "list",
              "explanation": "Parameters for the regex capture i.e capture groups.",
              "required": false
            }
          ]
        },
        {
          "traitName": "is.CI.partition.incremental.upsert",
          "extendsTrait": "is.CI.partition.incremental",
          "hasParameters": []
        },
        {
          "traitName": "is.CI.partition.incremental.delete",
          "extendsTrait": "is.CI.partition.incremental",
          "hasParameters": []
        },
        {
          "traitName": "is.formatted",
          "extendsTrait": "is",
          "explanation": "a root for traits that describe how data is formatted"
        },
        {
          "traitName": "means.reference.culture",
          "extendsTrait": "means.reference"
        },
        {
          "traitName": "means.reference.culture.tag",
          "extendsTrait": "means.reference.culture"
        },
        {
          "dataTypeName": "cultureTag",
          "extendsDataType": "languageTag",
          "explanation": "a BCP 47 language tag",
          "exhibitsTraits": [
            "means.reference.culture.tag"
          ]
        },
        {
          "traitName": "is.formatted.forCulture",
          "extendsTrait": "is.formatted",
          "explanation": "values are stored using the specified culture",
          "hasParameters": [
            {
              "name": "culture",
              "dataType": "cultureTag",
              "required": true,
              "explanation": "a IETF BCP 47 language tag"
            }
          ]
        },
        {
          "traitName": "means.measurement.currencyCode",
          "extendsTrait": "means.measurement",
          "explanation": "indicates this value represents an ISO 4217 currency code"
        },
        {
          "dataTypeName": "currencyCode",
          "extendsDataType": "stringFormat",
          "explanation": "value is a ISO 4217 currency code",
          "exhibitsTraits": [
            "means.measurement.currencyCode"
          ]
        },
        {
          "traitName": "is.inCurrency",
          "extendsTrait": "is",
          "explanation": "the data represents an amount of the specified currency",
          "hasParameters": [
            {
              "name": "code",
              "dataType": "currencyCode",
              "required": true,
              "explanation": "ISO 4217 currency code"
            }
          ]
        },
        {
          "traitName": "means.formatting.stringFormat",
          "extendsTrait": "means.formatting",
          "explanation": "indicates this value represents the format of a string"
        },
        {
          "dataTypeName": "stringFormat",
          "extendsDataType": "string",
          "explanation": "a string representing the format used to encode data in another string",
          "exhibitsTraits": [
            "means.formatting.stringFormat"
          ]
        },
        {
          "traitName": "is.formatted.text",
          "extendsTrait": "is.formatted",
          "explanation": "string data is formatted according to the format parameter",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "required": true,
              "explanation": "String indicating the format of the data"
            }
          ]
        },
        {
          "traitName": "is.formatted.dateTime",
          "extendsTrait": "is.formatted",
          "explanation": "dateTime data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "YYYY-MM-DDThh:mmZ"
            }
          ]
        },
        {
          "traitName": "is.formatted.date",
          "extendsTrait": "is.formatted",
          "explanation": "date data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "YYYY-MM-DD"
            }
          ]
        },
        {
          "traitName": "is.formatted.time",
          "extendsTrait": "is.formatted",
          "explanation": "time data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "hh:mm:ss"
            }
          ]
        },
        {
          "traitName": "is.inTimeZone",
          "extendsTrait": "is",
          "explanation": "the associated data is assumed to be in the specified time zone",
          "hasParameters": [
            {
              "name": "timeZoneName",
              "dataType": "timezone",
              "required": true,
              "explanation": "the name of a time zone"
            },
            {
              "name": "format",
              "dataType": "stringFormat",
              "required": true,
              "explanation": "the time zone naming scheme used for the timeZoneName parameter"
            }
          ]
        },
        {
          "traitName": "is.inTimeZone.MicrosoftFormat",
          "extendsTrait": {
            "traitReference": "is.inTimeZone",
            "arguments": [
              {
                "name": "format",
                "value": "MicrosoftFormat"
              }
            ]
          },
          "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Microsoft standard time zone name. see support.microsoft.com/.../973627"
        },
        {
          "traitName": "is.inTimeZone.tzDatabaseFormat",
          "extendsTrait": {
            "traitReference": "is.inTimeZone",
            "arguments": [
              {
                "name": "format",
                "value": "tzDatabaseFormat"
              }
            ]
          },
          "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Time Zone Database standard time zone name. see www.iana.org/time-zones"
        }
      ]
    }
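
Before uploading, it can help to check that every trait your manifest references is actually declared in this definitions file. Here is a rough Python sketch of such a check. The helper names are made up for illustration, and the two JSON snippets are trimmed-down stand-ins for the files in this thread. A name reported as missing is not necessarily an error, since it may be resolved through one of the imports.

```python
import json


def defined_traits(definitions_doc: dict) -> set:
    """Names of all traits declared in a definitions document."""
    return {
        d["traitName"]
        for d in definitions_doc.get("definitions", [])
        if "traitName" in d
    }


def referenced_traits(node) -> set:
    """Recursively collect every "traitReference" value in a JSON tree."""
    found = set()
    if isinstance(node, dict):
        ref = node.get("traitReference")
        if isinstance(ref, str):
            found.add(ref)
        for value in node.values():
            found |= referenced_traits(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_traits(item)
    return found


# Trimmed-down stand-ins for the definitions file and the manifest.
definitions = json.loads("""
{"definitions": [
  {"traitName": "is.CI.partition.incremental", "extendsTrait": "is"},
  {"traitName": "is.CI.partition.incremental.upsert",
   "extendsTrait": "is.CI.partition.incremental"},
  {"traitName": "is.formatted.dateTime", "extendsTrait": "is.formatted"}
]}
""")
manifest = json.loads("""
{"entities": [{"entityName": "Customer", "exhibitsTraits": [
  {"traitReference": "is.formatted.dateTime"},
  {"traitReference": "is.CI.partition.incremental.upsert"},
  {"traitReference": "is.partition.format.parquet"}
]}]}
""")

# Traits referenced by the manifest but not declared locally.
missing = sorted(referenced_traits(manifest) - defined_traits(definitions))
print(missing)
```

In practice you would load the two real files with `json.load` instead of the inline strings.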

    Then create the cdp.manifest.cdm.json file. It's the one I showed above, but here is an example with initial file names that reference actual Parquet files.

    {
      "manifestName": "YOUR CUSTOMER SANDBOX or PRODUCTION",
      "entities": [
        {
          "type": "LocalEntity",
          "entityName": "CustomerAlternate",
          "entityPath": "CustomerAlternate.cdm.json/CustomerAlternate",
          "dataPartitions": [
            {
              "location": "/Customer/CICustomer.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "Order",
          "entityPath": "Order.cdm.json/Order",
          "dataPartitions": [
            {
              "location": "/Activities/Order/CIOrders.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "Customer",
          "entityPath": "Customer.cdm.json/Customer",
          "dataPartitions": [
            {
              "location": "/Customer/CICustomer.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "OrderItem",
          "entityPath": "OrderItem.cdm.json/OrderItem",
          "dataPartitions": [
            {
              "location": "/Activities/OrderItem/CIOrderItem.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        }
      ],
      "jsonSchemaSemanticVersion": "1.0.0",
      "imports": [
        {
          "corpusPath": "ci.definitions.cdm.json"
        },
        {
          "corpusPath": "/CustomerInsightsDefinitions/ci.definitions.1.0.cdm.json",
          "moniker": "[CI Auto Import] Customer Insights definitions"
        }
      ]
    }

    Just make sure you have those folders there. There are a lot of nuances and gotchas, but if it saves, then you're good to go. You should be able to edit and create entities after this. Obviously, adjust the file references and cut it down to just one entity to begin with: the main Parquet file you mentioned. These are both from live instances with just the container names changed. As long as CI can access the lake, you should be good. Let me know if you have any problems.
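
Since the question was about generating the file, here is a small Python sketch that writes a manifest with the same keys as the example above from a mapping of entity name to Parquet path. This is not an official tool. The function names are made up, the manifest name and path are placeholders, and it only emits the first import (the second one in the example looks like it was added by CI itself, but that is a guess).

```python
import json
import posixpath


def parquet_entity(name: str, location: str) -> dict:
    """One manifest entity entry pointing at a single Parquet file."""
    return {
        "type": "LocalEntity",
        "entityName": name,
        "entityPath": f"{name}.cdm.json/{name}",
        "dataPartitions": [
            {
                "location": location,
                "exhibitsTraits": [
                    {"traitReference": "is.partition.format.parquet"}
                ],
            }
        ],
    }


def build_manifest(manifest_name: str, files: dict) -> dict:
    """Assemble a manifest with the same keys as the example in this thread."""
    return {
        "manifestName": manifest_name,
        "entities": [parquet_entity(n, loc) for n, loc in files.items()],
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{"corpusPath": "ci.definitions.cdm.json"}],
    }


def required_folders(manifest: dict) -> list:
    """Folders that must exist in the container for the listed partitions."""
    folders = {
        posixpath.dirname(p["location"])
        for e in manifest["entities"]
        for p in e.get("dataPartitions", [])
    }
    return sorted(folders)


# Start with a single entity; name and path are placeholders.
manifest = build_manifest("My Sandbox", {"Customer": "/Customer/CICustomer.parquet"})
print(json.dumps(manifest, indent=2))
print(required_folders(manifest))
```

Save the printed JSON as cdp.manifest.cdm.json and adjust the paths to match your container layout.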

  • Bill Ryan

    One other important thing: just upload those JSON files, changing only the paths. You don't need to worry about the regex pattern; just change it so it points to your container. After that, you can use the Editor tools: first get into the editor, then create a new entity, and finally add attributes to it that match the Parquet file. At this time, AFAIK, there's no tool that will just read the file in and build it for you, but I can't say for sure. I've always had to edit them manually. You used to have to edit every single field in JSON; now you can use the tool. See if you can get to the Editor screen, and it'll do everything.

       CDPBeginEdit.png

    CDPFull.png

    CDPEditEntity.png
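
If you would rather script the attribute list than click through the editor, the sketch below builds an entity document from a column-to-type mapping. Treat it as an assumption throughout: the entity file format is not shown in this thread, so the `definitions` / `hasAttributes` layout here follows the general Common Data Model entity convention, and the column names and data types are hypothetical. The editor described above remains the route that the thread actually confirms.

```python
import json


def entity_definition(entity_name: str, columns: dict) -> dict:
    """Build a CDM-style entity document from a column-name -> data-type mapping.

    The document layout is an assumption; compare it with a file produced
    by the editor before relying on it.
    """
    return {
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{"corpusPath": "cdm:/foundations.cdm.json"}],
        "definitions": [
            {
                "entityName": entity_name,
                "hasAttributes": [
                    {"name": col, "dataType": dtype}
                    for col, dtype in columns.items()
                ],
            }
        ],
    }


# Hypothetical columns; replace with the real column names and types
# of your Parquet file.
columns = {"CustomerId": "string", "SignupDate": "date", "LifetimeValue": "double"}
doc = entity_definition("Customer", columns)
print(json.dumps(doc, indent=2))
```

The output would be saved as Customer.cdm.json to line up with the "Customer.cdm.json/Customer" entityPath used in the manifest.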
