Hi folks,
I have a requirement to ingest data into CI from a Parquet file available in ADLS Gen2. How can we generate a manifest file for the Parquet file?
Is there any tool or reference link available to generate it?
One other important thing: just upload those JSON files, changing only the paths. You don't need to worry about the regex pattern; just change it so it points to your container. After that, you can use the Editor tools: first get into the editor, then create a new Entity, and finally add attributes to it that match the Parquet file. At this time, AFAIK, there's no tool that will just read the file in and build the entity for you, but I can't say for sure. I've always had to edit them manually. You used to have to edit every single field in JSON; now you can use the tool, so see if you can get to the Editor screen, and it'll do everything.
First, create a file named ci.definitions.1.0.cdm.json with the following content, then upload it to the correct root of the storage container you're using:
{ "jsonSchemaSemanticVersion": "1.0.0", "imports": [ { "corpusPath": "cdm:/foundations.cdm.json" }, { "corpusPath": "cdm:/primitives.cdm.json" }, { "corpusPath": "cdm:/meanings.concepts.cdm.json" }, { "corpusPath": "cdm:/meanings.measurement.cdm.json" } ], "definitions": [ { "traitName": "is.CI.partition.incremental", "extendsTrait": "is", "hasParameters": [ { "name": "regularExpression", "dataType": "string", "explanation": "The regular expression to use for the incremental partition.", "required": true }, { "name": "rootLocation", "dataType": "string", "explanation": "The root location to use for discovering the partitions. If not specified, then we default to the rootLocation of the first data partition pattern", "required": false }, { "name": "parameters", "dataType": "list", "explanation": "Parameters for the regex capture i.e capture groups.", "required": false } ] }, { "traitName": "is.CI.partition.incremental.upsert", "extendsTrait": "is.CI.partition.incremental", "hasParameters": [] }, { "traitName": "is.CI.partition.incremental.delete", "extendsTrait": "is.CI.partition.incremental", "hasParameters": [] }, { "traitName": "is.formatted", "extendsTrait": "is", "explanation": "a root for traits that describe how data is formatted" }, { "traitName": "means.reference.culture", "extendsTrait": "means.reference" }, { "traitName": "means.reference.culture.tag", "extendsTrait": "means.reference.culture" }, { "dataTypeName": "cultureTag", "extendsDataType": "languageTag", "explanation": "a BCP 47 language tag", "exhibitsTraits": [ "means.reference.culture.tag" ] }, { "traitName": "is.formatted.forCulture", "extendsTrait": "is.formatted", "explanation": "values are stored using the specified culture", "hasParameters": [ { "name": "culture", "dataType": "cultureTag", "required": true, "explanation": "a IETF BCP 47 language tag" } ] }, { "traitName": "means.measurement.currencyCode", "extendsTrait": "means.measurement", "explanation": "indicates this value represents 
an ISO 4217 currency code" }, { "dataTypeName": "currencyCode", "extendsDataType": "stringFormat", "explanation": "value is a ISO 4217 currency code", "exhibitsTraits": [ "means.measurement.currencyCode" ] }, { "traitName": "is.inCurrency", "extendsTrait": "is", "explanation": "the data represents an amount of the specified currency", "hasParameters": [ { "name": "code", "dataType": "currencyCode", "required": true, "explanation": "ISO 4217 currency code" } ] }, { "traitName": "means.formatting.stringFormat", "extendsTrait": "means.formatting", "explanation": "indicates this value represents the format of a string" }, { "dataTypeName": "stringFormat", "extendsDataType": "string", "explanation": "a string representing the format used to encode data in another string", "exhibitsTraits": [ "means.formatting.stringFormat" ] }, { "traitName": "is.formatted.text", "extendsTrait": "is.formatted", "explanation": "string data is formatted according to the format parameter", "hasParameters": [ { "name": "format", "dataType": "stringFormat", "required": true, "explanation": "String indicating the format of the data" } ] }, { "traitName": "is.formatted.dateTime", "extendsTrait": "is.formatted", "explanation": "dateTime data formatted as a string in ISO 8601 format", "hasParameters": [ { "name": "format", "dataType": "stringFormat", "defaultValue": "YYYY-MM-DDThh:mmZ" } ] }, { "traitName": "is.formatted.date", "extendsTrait": "is.formatted", "explanation": "date data formatted as a string in ISO 8601 format", "hasParameters": [ { "name": "format", "dataType": "stringFormat", "defaultValue": "YYYY-MM-DD" } ] }, { "traitName": "is.formatted.time", "extendsTrait": "is.formatted", "explanation": "time data formatted as a string in ISO 8601 format", "hasParameters": [ { "name": "format", "dataType": "stringFormat", "defaultValue": "hh:mm:ss" } ] }, { "traitName": "is.inTimeZone", "extendsTrait": "is", "explanation": "the associated data is assumed to be in the specified time zone", 
"hasParameters": [ { "name": "timeZoneName", "dataType": "timezone", "required": true, "explanation": "the name of a time zone" }, { "name": "format", "dataType": "stringFormat", "required": true, "explanation": "the time zone naming scheme used for the timeZoneName parameter" } ] }, { "traitName": "is.inTimeZone.MicrosoftFormat", "extendsTrait": { "traitReference": "is.inTimeZone", "arguments": [ { "name": "format", "value": "MicrosoftFormat" } ] }, "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Microsoft standard time zone name. see support.microsoft.com/.../973627" }, { "traitName": "is.inTimeZone.tzDatabaseFormat", "extendsTrait": { "traitReference": "is.inTimeZone", "arguments": [ { "name": "format", "value": "tzDatabaseFormat" } ] }, "explanation": "the associated data is assumed to be in the specified time zone. timeZoneName value is a Time Zone Database standard time zone name. see www.iana.org/time-zones" } ] }
Then create the cdp.manifest.cdm.json file. It's the one I showed above, but here is an example whose entries reference actual Parquet files.
{ "manifestName": "YOUR CUSTOMER SANDBOX or PRODUCTION", "entities": [ { "type": "LocalEntity", "entityName": "CustomerAlternate", "entityPath": "CustomerAlternate.cdm.json/CustomerAlternate", "dataPartitions": [ { "location": "/Customer/CICustomer.parquet", "exhibitsTraits": [ { "traitReference": "is.partition.format.parquet" } ] } ] }, { "type": "LocalEntity", "entityName": "Order", "entityPath": "Order.cdm.json/Order", "dataPartitions": [ { "location": "/Activities/Order/CIOrders.parquet", "exhibitsTraits": [ { "traitReference": "is.partition.format.parquet" } ] } ] }, { "type": "LocalEntity", "entityName": "Customer", "entityPath": "Customer.cdm.json/Customer", "dataPartitions": [ { "location": "/Customer/CICustomer.parquet", "exhibitsTraits": [ { "traitReference": "is.partition.format.parquet" } ] } ] }, { "type": "LocalEntity", "entityName": "OrderItem", "entityPath": "OrderItem.cdm.json/OrderItem", "dataPartitions": [ { "location": "/Activities/OrderItem/CIOrderItem.parquet", "exhibitsTraits": [ { "traitReference": "is.partition.format.parquet" } ] } ] } ], "jsonSchemaSemanticVersion": "1.0.0", "imports": [ { "corpusPath": "ci.definitions.cdm.json" }, { "corpusPath": "/CustomerInsightsDefinitions/ci.definitions.1.0.cdm.json", "moniker": "[CI Auto Import] Customer Insights definitions" } ] }
Just make sure you have those folders there. There are a lot of nuances and gotchas, but if it saves, then you're good to go. You should be able to edit and create entities after this. Obviously adjust the file references and cut it down to just one entity to begin with: the main Parquet file you mentioned. These are both from live instances with just the container names changed. As long as CI can access the lake, you should be good. Let me know if you have any problems.
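If hand-editing the JSON gets tedious, the manifest above can also be produced from a short script. This is only a rough sketch, not an official tool: the helper names (parquet_entity, build_manifest) are made up for illustration, the entity name and Parquet path are the sample values from the manifest above, and it only writes the manifest itself, not the per-entity Customer.cdm.json files.

```python
import json


def parquet_entity(name, location):
    """Return one manifest entry pointing at a single Parquet file.

    Mirrors the shape of the sample manifest above; `name` and
    `location` are whatever you use in your own container.
    """
    return {
        "type": "LocalEntity",
        "entityName": name,
        "entityPath": f"{name}.cdm.json/{name}",
        "dataPartitions": [
            {
                "location": location,
                "exhibitsTraits": [
                    {"traitReference": "is.partition.format.parquet"}
                ],
            }
        ],
    }


def build_manifest(manifest_name, entities):
    """Build the manifest dict from (entity name, Parquet path) pairs."""
    return {
        "manifestName": manifest_name,
        "entities": [parquet_entity(n, loc) for n, loc in entities],
        "jsonSchemaSemanticVersion": "1.0.0",
        # Same imports as the sample manifest above.
        "imports": [
            {"corpusPath": "ci.definitions.cdm.json"},
            {
                "corpusPath": "/CustomerInsightsDefinitions/ci.definitions.1.0.cdm.json",
                "moniker": "[CI Auto Import] Customer Insights definitions",
            },
        ],
    }


# Start with a single entity, as suggested above.
manifest = build_manifest(
    "Your Sandbox Name",
    [("Customer", "/Customer/CICustomer.parquet")],
)
manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

Save the printed text as cdp.manifest.cdm.json and upload it the same way as the hand-written version. Adding more entities is just a matter of adding more (name, path) pairs to the list.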
Bill Ryan Thanks for the quick reply. I am facing an issue generating the manifest JSON. Is there any tool to generate these files from Parquet files? I know the further steps; I am only stuck on the manifest JSON file. It has a few parameters like regex, root location, and partition URL, and I believe I am setting those values wrong.
All you really need is the default manifest.json file. Upload that to the container at the root of your lake, where you are going to manage the data. Make sure that you have the access issues worked out, but assuming your containers are all in place, you start by choosing Azure Data Lake and then fill out the details. You just need to have either a default manifest.json or model.json file (if you use Manifest, it'll let you actually build the entities with the entity builder, which makes things a lot easier).
Manifest.json
{ "jsonSchemaSemanticVersion": "1.0.0", "imports": [{ "corpusPath": "ci.definitions.cdm.json" }], "manifestName": "Your Sandbox Name", "entities": [{ "type": "LocalEntity", "entityName": "Customer", "exhibitsTraits": [{ "traitReference": "is.formatted.dateTime", "arguments": [{ "name": "format", "value": "yyyy-MM-dd'T'HH:mm:ss" }] }, { "traitReference": "is.formatted.date", "arguments": [{ "name": "format", "value": "yyyy-MM-dd" }] }, { "traitReference": "is.CI.partition.incremental.upsert", "arguments": [{ "name": "regularExpression", "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Upserts/.*\\.csv" }, { "name": "parameters", "value": "" }, { "name": "rootLocation", "value": "Customer" } ] }, { "traitReference": "is.CI.partition.incremental.delete", "arguments": [{ "name": "regularExpression", "value": "/IncrementalData/(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/Deletes/.*\\.csv" }, { "name": "parameters", "value": "" }, { "name": "rootLocation", "value": "Customer" } ] } ] }] }
Just put that in a file. Call it cdp.manifest.cdm.json or something similar so you recognize it. When you go back in to ingest, it should see that file, and then you'll be able to add your entity references. I'm guessing you've seen this, https://docs.microsoft.com/en-us/minecraft/creator/reference/content/addonsreference/examples/addonmanifest but if not, it's just the reference. Now that you have the UI to build your entities, it's mostly downhill from here. This is a little tricky (or can be) the first time, so if you run into any problems let me know and I can walk you through it.
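Since the regex values are where things usually go wrong, it can help to try the pattern against a few of your own paths before uploading. The sketch below uses the upsert pattern from the manifest (the JSON "\\d" becomes "\d" once parsed). It assumes the whole path under rootLocation has to match; I don't know exactly how CI applies the pattern internally, so treat this as a sanity check only. The sample paths and the partition_parts helper are made up for illustration.

```python
import re

# Upsert pattern from the manifest, as it looks after JSON unescaping.
UPSERT_PATTERN = r"/IncrementalData/(\d{4})/(\d{2})/(\d{2})/(\d{2})/Upserts/.*\.csv"


def partition_parts(pattern, path):
    """Return the captured groups if the whole path matches, else None.

    Full-path matching is an assumption about how the pattern is applied.
    """
    match = re.fullmatch(pattern, path)
    return match.groups() if match else None


# Hypothetical paths relative to rootLocation.
samples = [
    "/IncrementalData/2024/01/15/08/Upserts/part-0001.csv",
    "/IncrementalData/2024/01/15/08/Upserts/part-0001.parquet",
    "/IncrementalData/2024/1/15/08/Upserts/part-0001.csv",
]

for path in samples:
    print(path, "->", partition_parts(UPSERT_PATTERN, path))

# A stray trailing space inside the pattern is enough to stop a match.
print(partition_parts(UPSERT_PATTERN + " ", samples[0]))
```

Only the first sample matches. The second fails because the pattern ends in \.csv, and the third because the month needs two digits. If your incremental files are not CSV, the extension at the end of the pattern has to change to match.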