Manifest.json for parquet file present in ADLS2

Hi folks,

I need to ingest data into Customer Insights (CI) from a Parquet file stored in ADLS2. How can I generate the manifest file for that Parquet file?

Is there a tool or reference link available for generating it?

  • Bill Ryan
    RE: Manifest.json for parquet file present in ADLS2

    One other important thing: just upload those JSON files, changing only the paths. You don't need to worry about the regex pattern; just change it so it points to your container. After that, you can use the Editor tools: first get into the editor, then create a new Entity, and finally add attributes to it that match the Parquet file. At this time, AFAIK, there's no tool that will read the file and build the entity for you, but I can't say for sure. I've always had to edit them manually. You used to have to edit every single field in the JSON; now you can use the tool. See if you can get to the Editor screen, since it will do everything.

    [Screenshots: CDPBeginEdit.png, CDPFull.png, CDPEditEntity.png]
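
As a rough illustration of "add attributes that match the Parquet file", here is a minimal Python sketch that turns a list of column names and types into a CDM-style entity document. Everything in it is an assumption made for illustration: the helper name `build_entity_definition`, the `ARROW_TO_CDM` type mapping, and the sample columns are hypothetical, and the output shape (`entityName` plus `hasAttributes`) is a plausible starting point that may still need adjusting in the Editor. The column list could be read from the file with a Parquet library such as pyarrow, but it is hard-coded here so the sketch runs with the standard library only.

```python
import json

# Assumed mapping from Arrow/Parquet type names to CDM data type names.
# Extend it for the types that actually occur in your file.
ARROW_TO_CDM = {
    "string": "string",
    "int32": "integer",
    "int64": "bigInteger",
    "float": "float",
    "double": "double",
    "bool": "boolean",
    "date32[day]": "date",
    "timestamp[us]": "dateTime",
}


def build_entity_definition(entity_name, columns):
    """Return a CDM-style entity document (as a dict) for the given columns.

    columns is a list of (column_name, arrow_type_name) pairs.
    Unknown types fall back to "string" so they can be fixed by hand later.
    """
    attributes = [
        {"name": name, "dataType": ARROW_TO_CDM.get(arrow_type, "string")}
        for name, arrow_type in columns
    ]
    return {
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{"corpusPath": "cdm:/foundations.cdm.json"}],
        "definitions": [
            {"entityName": entity_name, "hasAttributes": attributes}
        ],
    }


# Hypothetical columns; in practice they would come from the Parquet schema.
columns = [
    ("CustomerId", "string"),
    ("BirthDate", "date32[day]"),
    ("LoyaltyPoints", "int64"),
]
entity = build_entity_definition("Customer", columns)
print(json.dumps(entity, indent=2))
```

The printed JSON could be saved as Customer.cdm.json next to the manifest and then reviewed attribute by attribute, which is still less typing than writing every field by hand.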

  • Suggested answer
    Bill Ryan
    RE: Manifest.json for parquet file present in ADLS2

    First, create a file named ci.definitions.1.0.cdm.json with the following content, then upload it to the root of the storage container you're using:

    {
      "jsonSchemaSemanticVersion": "1.0.0",
      "imports": [
        {
          "corpusPath": "cdm:/foundations.cdm.json"
        },
        {
          "corpusPath": "cdm:/primitives.cdm.json"
        },
        {
          "corpusPath": "cdm:/meanings.concepts.cdm.json"
        },
        {
          "corpusPath": "cdm:/meanings.measurement.cdm.json"
        }
      ],
      "definitions": [
        {
          "traitName": "is.CI.partition.incremental",
          "extendsTrait": "is",
          "hasParameters": [
            {
              "name": "regularExpression",
              "dataType": "string",
              "explanation": "The regular expression to use for the incremental partition.",
              "required": true
            },
            {
              "name": "rootLocation",
              "dataType": "string",
              "explanation": "The root location to use for discovering the partitions. If not specified, then we default to the rootLocation of the first data partition pattern",
              "required": false
            },
            {
              "name": "parameters",
              "dataType": "list",
              "explanation": "Parameters for the regex capture i.e capture groups.",
              "required": false
            }
          ]
        },
        {
          "traitName": "is.CI.partition.incremental.upsert",
          "extendsTrait": "is.CI.partition.incremental",
          "hasParameters": []
        },
        {
          "traitName": "is.CI.partition.incremental.delete",
          "extendsTrait": "is.CI.partition.incremental",
          "hasParameters": []
        },
        {
          "traitName": "is.formatted",
          "extendsTrait": "is",
          "explanation": "a root for traits that describe how data is formatted"
        },
        {
          "traitName": "means.reference.culture",
          "extendsTrait": "means.reference"
        },
        {
          "traitName": "means.reference.culture.tag",
          "extendsTrait": "means.reference.culture"
        },
        {
          "dataTypeName": "cultureTag",
          "extendsDataType": "languageTag",
          "explanation": "a BCP 47 language tag",
          "exhibitsTraits": [
            "means.reference.culture.tag"
          ]
        },
        {
          "traitName": "is.formatted.forCulture",
          "extendsTrait": "is.formatted",
          "explanation": "values are stored using the specified culture",
          "hasParameters": [
            {
              "name": "culture",
              "dataType": "cultureTag",
              "required": true,
              "explanation": "a IETF BCP 47 language tag"
            }
          ]
        },
        {
          "traitName": "means.measurement.currencyCode",
          "extendsTrait": "means.measurement",
          "explanation": "indicates this value represents an ISO 4217 currency code"
        },
        {
          "dataTypeName": "currencyCode",
          "extendsDataType": "stringFormat",
          "explanation": "value is a ISO 4217 currency code",
          "exhibitsTraits": [
            "means.measurement.currencyCode"
          ]
        },
        {
          "traitName": "is.inCurrency",
          "extendsTrait": "is",
          "explanation": "the data represents an amount of the specified currency",
          "hasParameters": [
            {
              "name": "code",
              "dataType": "currencyCode",
              "required": true,
              "explanation": "ISO 4217 currency code"
            }
          ]
        },
        {
          "traitName": "means.formatting.stringFormat",
          "extendsTrait": "means.formatting",
          "explanation": "indicates this value represents the format of a string"
        },
        {
          "dataTypeName": "stringFormat",
          "extendsDataType": "string",
          "explanation": "a string representing the format used to encode data in another string",
          "exhibitsTraits": [
            "means.formatting.stringFormat"
          ]
        },
        {
          "traitName": "is.formatted.text",
          "extendsTrait": "is.formatted",
          "explanation": "string data is formatted according to the format parameter",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "required": true,
              "explanation": "String indicating the format of the data"
            }
          ]
        },
        {
          "traitName": "is.formatted.dateTime",
          "extendsTrait": "is.formatted",
          "explanation": "dateTime data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "YYYY-MM-DDThh:mmZ"
            }
          ]
        },
        {
          "traitName": "is.formatted.date",
          "extendsTrait": "is.formatted",
          "explanation": "date data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "YYYY-MM-DD"
            }
          ]
        },
        {
          "traitName": "is.formatted.time",
          "extendsTrait": "is.formatted",
          "explanation": "time data formatted as a string in ISO 8601 format",
          "hasParameters": [
            {
              "name": "format",
              "dataType": "stringFormat",
              "defaultValue": "hh:mm:ss"
            }
          ]
        },
        {
          "traitName": "is.inTimeZone",
          "extendsTrait": "is",
          "explanation": "the associated data is assumed to be in the specified time zone",
          "hasParameters": [
            {
              "name": "timeZoneName",
              "dataType": "timezone",
              "required": true,
              "explanation": "the name of a time zone"
            },
            {
              "name": "format",
              "dataType": "stringFormat",
              "required": true,
              "explanation": "the time zone naming scheme used for the timeZoneName parameter"
            }
          ]
        },
        {
          "traitName": "is.inTimeZone.MicrosoftFormat",
          "extendsTrait": {
            "traitReference": "is.inTimeZone",
            "arguments": [
              {
                "name": "format",
                "value": "MicrosoftFormat"
              }
            ]
          },
          "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Microsoft standard time zone name. see support.microsoft.com/.../973627"
        },
        {
          "traitName": "is.inTimeZone.tzDatabaseFormat",
          "extendsTrait": {
            "traitReference": "is.inTimeZone",
            "arguments": [
              {
                "name": "format",
                "value": "tzDatabaseFormat"
              }
            ]
          },
          "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Time Zone Database standard time zone name. see www.iana.org/time-zones"
        }
      ]
    }

    Then create the cdp.manifest.cdm.json file. It's the one I showed in my earlier reply, but here is an example with initial entity entries that reference actual Parquet files.

    {
      "manifestName": "YOUR CUSTOMER SANDBOX or PRODUCTION",
      "entities": [
        {
          "type": "LocalEntity",
          "entityName": "CustomerAlternate",
          "entityPath": "CustomerAlternate.cdm.json/CustomerAlternate",
          "dataPartitions": [
            {
              "location": "/Customer/CICustomer.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "Order",
          "entityPath": "Order.cdm.json/Order",
          "dataPartitions": [
            {
              "location": "/Activities/Order/CIOrders.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "Customer",
          "entityPath": "Customer.cdm.json/Customer",
          "dataPartitions": [
            {
              "location": "/Customer/CICustomer.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        },
        {
          "type": "LocalEntity",
          "entityName": "OrderItem",
          "entityPath": "OrderItem.cdm.json/OrderItem",
          "dataPartitions": [
            {
              "location": "/Activities/OrderItem/CIOrderItem.parquet",
              "exhibitsTraits": [
                {
                  "traitReference": "is.partition.format.parquet"
                }
              ]
            }
          ]
        }
      ],
      "jsonSchemaSemanticVersion": "1.0.0",
      "imports": [
        {
          "corpusPath": "ci.definitions.cdm.json"
        },
        {
          "corpusPath": "/CustomerInsightsDefinitions/ci.definitions.1.0.cdm.json",
          "moniker": "[CI Auto Import] Customer Insights definitions"
        }
      ]
    }

    Just make sure those folders exist. There are a lot of nuances and gotchas, but if it saves, you're good to go. You should be able to edit and create entities after this. Obviously, adjust the file references and cut it down to just one entity to begin with: the main Parquet file you mentioned. Both files are from live instances with just the container names changed. As long as CI can access the lake, you should be good. Let me know if you have any problems.
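
To make the "adjust the file references and start with one entity" step concrete, the sketch below builds a manifest with the same shape as the example above from a small mapping of entity names to Parquet paths. It is only an illustration under assumptions: the function name `build_manifest`, the manifest name `MySandbox`, and the choice to emit only the `ci.definitions.cdm.json` import (leaving out the second, moniker-tagged import from the example) are hypothetical and may need adjusting for a real instance.

```python
import json


def build_manifest(manifest_name, entity_locations):
    """Build a manifest dict with one LocalEntity per (name, parquet path).

    entity_locations maps an entity name to the path of its Parquet file
    inside the container, for example {"Customer": "/Customer/CICustomer.parquet"}.
    """
    entities = []
    for entity_name, location in entity_locations.items():
        entities.append({
            "type": "LocalEntity",
            "entityName": entity_name,
            # Same naming convention as the example: <Entity>.cdm.json/<Entity>
            "entityPath": f"{entity_name}.cdm.json/{entity_name}",
            "dataPartitions": [{
                "location": location,
                "exhibitsTraits": [
                    {"traitReference": "is.partition.format.parquet"}
                ],
            }],
        })
    return {
        "manifestName": manifest_name,
        "entities": entities,
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{"corpusPath": "ci.definitions.cdm.json"}],
    }


# Start with a single entity, as suggested, and add more once it saves.
manifest = build_manifest(
    "MySandbox", {"Customer": "/Customer/CICustomer.parquet"}
)
print(json.dumps(manifest, indent=2))
```

Adding a second entry to the mapping produces a second `LocalEntity` block, so the file can grow one entity at a time after the first one is accepted.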

  • Purnima
    RE: Manifest.json for parquet file present in ADLS2

    Bill Ryan, thanks for the quick reply. I am having trouble generating the manifest JSON. Is there any tool to generate these files from Parquet files? I know the further steps; I am only stuck on the manifest JSON file. It has a few parameters, such as regex, root location, and partition URL, and I believe I am setting those values incorrectly.

  • Bill Ryan
    RE: Manifest.json for parquet file present in ADLS2

    All you really need is the default manifest.json file. Upload it to the container at the root of your lake, where you are going to manage the data. Make sure you have any access issues worked out. Assuming your containers are all in place, you start by choosing Azure Data Lake and then fill out the details. You just need either a default manifest.json or a model.json file (if you use a manifest, it lets you build the entities with the entity builder, which makes things a lot easier).

    [Screenshots: ADL Ingestion, Storage Account]

    Manifest.json

    {
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [{
            "corpusPath": "ci.definitions.cdm.json"
        }],
        "manifestName": "Your Sandbox Name",
        "entities": [{
            "type": "LocalEntity",
            "entityName": "Customer",
            "exhibitsTraits": [{
                    "traitReference": "is.formatted.dateTime",
                    "arguments": [{
                        "name": "format",
                        "value": "yyyy-MM-dd'T'HH:mm:ss"
                    }]
                },
                {
                    "traitReference": "is.formatted.date",
                    "arguments": [{
                        "name": "format",
                        "value": "yyyy-MM-dd"
                    }]
                },
                {
                    "traitReference": "is.CI.partition.incremental.upsert",
                    "arguments": [{
                            "name": "regularExpression",
                            "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Upserts/.*\\.csv"
                        },
                        {
                            "name": "parameters",
                            "value": ""
                        },
                        {
                            "name": "rootLocation",
                            "value": "Customer"
                        }
                    ]
                },
                {
                    "traitReference": "is.CI.partition.incremental.delete",
                    "arguments": [{
                            "name": "regularExpression",
                            "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Deletes/.*\\.csv"
                        },
                        {
                            "name": "parameters",
                            "value": ""
                        },
                        {
                            "name": "rootLocation",
                            "value": "Customer"
                        }
                    ]
                }
            ]
        }]
    }

    Just put that in a file. Call it cdp.manifest.cdm.json or something similar so you recognize it. When you go back in to ingest, it should see the file, and then you'll be able to add your entity references. I'm guessing you've seen this, https://docs.microsoft.com/en-us/minecraft/creator/reference/content/addonsreference/examples/addonmanifest  but if not, it's just the reference. Now that you have the UI to build your entities, it's mostly downhill from here. This is a little tricky (or can be) the first time, so if you run into any problems, let me know and I can walk you through it.
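
Since the regex and rootLocation values were the part causing confusion, here is a small Python sketch of what the upsert `regularExpression` from the manifest above would match. It decodes the JSON-escaped string and tests it against two sample paths. The sample paths are hypothetical, reading the four capture groups as year, month, day, and hour is an assumption, and so is the use of a full-string match: how CI actually applies the pattern (for example, relative to `rootLocation`) is not confirmed here.

```python
import json
import re

# The "value" exactly as written in the manifest, including JSON escaping.
json_value = r'"/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Upserts/.*\\.csv"'

# json.loads turns each doubled backslash into a single one, giving the real regex.
pattern = re.compile(json.loads(json_value))

# Hypothetical paths below the rootLocation folder ("Customer" in the example).
csv_path = "/IncrementalData/2024/01/15/09/Upserts/part-0001.csv"
parquet_path = "/IncrementalData/2024/01/15/09/Upserts/part-0001.parquet"

csv_match = pattern.fullmatch(csv_path)
parquet_match = pattern.fullmatch(parquet_path)

print(csv_match.groups())   # ('2024', '01', '15', '09')
print(parquet_match)        # None
```

Under these assumptions the pattern as written only picks up .csv files, so if incremental Parquet partitions are ever needed, the extension at the end of the pattern is the part that would have to change.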
