We planned to use the on-premise gateway connector to import historical data for orders/order lines into CI (more than 10 million rows, on-premise SQL Server) and to upload new orders/order lines later. I have read about the limit for the on-premise gateway connector of max 10 GB per table ( docs.microsoft.com/.../service-gateway-onprem-tshoot ). I decided to do the ingestion in several steps:
1. Import the historical data (in several steps: import 5 million records -> clear the SQL table, import the next 5 million records -> clear the SQL table, and so on)
2. Clear the SQL table
3. Import new data on a daily basis
4. Periodically clear the SQL table of already-imported records (see the sketch after this list)
5. Go to step 3
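A minimal sketch of that periodic clean-up step, assuming a hypothetical loaded_at column that records when a row was exported (the table and column names are placeholders); deleting in chunks keeps the transaction log and locking manageable:

    -- Delete already-imported rows in batches of 50,000 until none remain.
    DECLARE @batch INT = 50000;
    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch) FROM dbo.orderlines                -- placeholder table name
        WHERE loaded_at < DATEADD(DAY, -1, SYSUTCDATETIME());  -- rows already imported into CI
        IF @@ROWCOUNT < @batch BREAK;
    END;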
But when I delete rows in the SQL table (after a successful import into CI), the records are deleted in CI as well!
My questions are:
1. Is it possible to forbid deletion of imported data in CI? Can the On-Premise Data Gateway connector be configured in an "append-only" mode?
2. Is there another way of loading data into CI from an on-premise SQL Server?
Just to clarify, the Incremental Refresh functionality is on by default if you have a new instance or connect to your own Dataverse, as that enables the Power Platform Dataflow connectors, which support incremental refresh. You can force the use of those connectors instead of the Power BI connectors with the flag added to your URL as mentioned.
Hi Bill! MS advised us to use Incremental Refresh. It is in preview now. To enable it, add the parameter &PPDFIncrementalRefresh=true to the CI URL. I have not tested this mode yet, but it looks promising.
If you can script out the table, I'll gladly do a quick computation on it and give you a rough estimate. Actual table size includes keys and indices, but we can get a very good estimate. I have a 35 million row table that's fairly wide, along with an 85 million row table, and together they're well under 10 GB, so I suspect you're going to be fine, but we can know instead of guess. If you know the schema, I'm happy to even use a data generator and pump in 100 million rows just to be sure.
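As an illustration of that data-generator idea, here is a minimal sketch against a hypothetical, simplified order-line schema: generate a known number of rows, measure the table, and extrapolate linearly to the expected volume.

    -- Hypothetical, simplified order-line table; the real schema will differ.
    CREATE TABLE dbo.orderlines_test (
        orderline_id BIGINT        NOT NULL,
        order_id     BIGINT        NOT NULL,
        sku          VARCHAR(40)   NOT NULL,
        quantity     INT           NOT NULL,
        unit_price   DECIMAL(18,4) NOT NULL
    );

    -- Generate 1,000,000 random rows from a cross join of system views; scale up as needed.
    INSERT INTO dbo.orderlines_test (orderline_id, order_id, sku, quantity, unit_price)
    SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
        ABS(CHECKSUM(NEWID())) % 12000000,
        CONCAT('SKU-', ABS(CHECKSUM(NEWID())) % 100000),
        1 + ABS(CHECKSUM(NEWID())) % 10,
        CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(18,4))
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    -- Measure, then extrapolate: 35x this row count gives a rough figure for 35M order lines.
    EXEC sp_spaceused 'orderlines_test';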
Bill, thank you. I'll try to estimate the size of the real tables, though I haven't got the 'real' data in them yet - only an approximate model and the expected number of rows.
Ivan, I think you might be surprised at the table size. Before going a lot further, run this inside the db against your orders and order lines tables: exec sp_spaceused 'orders'. This will tell you the actual size.

As far as delimited files go, I'd personally suggest using .parquet or .avro, but yes, you can do it with one big file, or with a historical file plus incremental ones, and reference the whole directory if you mount it as a CDM folder (the second option when you create a data source). I think the guidance is to keep them around 2 GB, but that's just a guide. In my experience, the same file will shrink from around 10 GB in CSV to about 3 GB if you use Parquet instead. We've had 30 GB files in CSV that CI was able to process, so yes, it can handle it. And if you mount, you can just reference the whole directory and drop a new file each day to keep loading the incremental portions, which will greatly shorten the process.

That said, before you go too far down any road, check those table sizes, because I think they're going to be a lot smaller than you expect, and it's easy enough to check for sure - if they're, say, 2 GB or smaller at the current volume, that gives you 5x the volume before you really have to be concerned, and that's a lot of runway.
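For reference, a minimal version of that check for both tables (the table names are placeholders for the real ones):

    -- Reports rows, reserved, data, index_size and unused for each table; with row/page
    -- compression off, the "data" figure is a reasonable proxy for the uncompressed size
    -- that the 10 GB gateway limit refers to.
    EXEC sp_spaceused 'orders';
    EXEC sp_spaceused 'orderlines';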
We will have 12 million order records and 35 million order lines, and these amounts will grow on a daily basis. So I do not think that 35 million order lines will fit into 10 GB :(
In the case of Azure Blob Storage - did you mean exporting all rows from a single table into a single CSV file, for example, and putting it into an Azure Blob Storage container? Will it be OK to have a single CSV file (more than 10 GB in size) uploaded every day to Azure Blob Storage? Will CI be able to handle it successfully?
Hi! Thank you for the response! Yes, I'm aware of the possibility of importing data from Azure SQL, but the customer currently has all their infrastructure on-premise. Besides the fact that CI is expensive enough on its own, they would need to pay for Azure SQL as well (just for transferring data to CI) :) That's why I'm looking for a less expensive way of importing data into CI.
Pardon what might be a dumb question, but instead of using the on-prem SQL DB, why not use Azure SQL and then use a Dataverse dataflow to bring in the data you need to D365? Azure SQL is massively scalable, already lives in Azure (which is also where D365 lives, of course), and it doesn't seem that MS "taxes" dataflows against your D365 storage quota. And you don't have to worry about data gateway limits.
You also don't then have to deal with the comparatively "flat" data structure that ADL is optimized for (as it is obviously intended for analysis/reporting, etc.).
Hi there Ivan:
So a few things.
1 - To look at the size of each row, you can look at the data type of each column to get an idea of the size of the data you're moving. There's a sample procedure you can use for this, and it will give you an idea of the table size (see the sketch at the end of this post). Regarding compression: if the rows are compressed, they'll need to be uncompressed to be ingested, but unless you've enabled it, I don't think that's an issue.
2 - Understood, so performance will be an issue, but transfer time is generally a lot less than the time spent conflating your contacts and then associating activities with the unified profiles. I can't say it's always true, but if you have a lot of activities, chances are that activity unification in the refresh process will be the time killer. Usually ingestion is pretty fast (a few minutes).
3 - OK, so you can use CSV, Excel, or Parquet. On data ingestion, the middle option is to mount a CDM folder instead of using the Power Query interface, but that requires a little more work. So if you just did a full dump of your table to CSV/Parquet/JSON (pick a file format) and wrote it to either Azure Blob Storage or a Data Lake, you can connect to it just like a database. Other than setting up the connection initially, you can't really tell the difference. So in the case of your screenshot, pick a file format, say Parquet or CSV, then just specify the container it's in and the key; it'll authenticate, and after that everything is exactly the same.
You can use OneDrive or Box or something different, but Azure Blob Storage or ADL are a lot more robust.
I think your table size is well below 10 GB, but run those queries and we can figure it out.
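A minimal sketch of the per-row estimate from point 1, assuming the tables are named orders and orderlines (placeholders) and treating variable-width columns at their declared maximum, so it gives an upper bound on uncompressed row size:

    -- Upper-bound bytes per row from the declared column types (max_length is -1 for
    -- (n)varchar(max)/varbinary(max), which would need separate handling).
    SELECT
        t.name                            AS table_name,
        SUM(CAST(c.max_length AS BIGINT)) AS max_row_bytes
    FROM sys.tables  AS t
    JOIN sys.columns AS c ON c.object_id = t.object_id
    WHERE t.name IN ('orders', 'orderlines')          -- placeholder table names
    GROUP BY t.name;

Multiplying by the expected row counts gives a rough ceiling: for example, 35 million order lines at, say, 200 bytes per row would be about 7 GB of uncompressed data, and sp_spaceused on the populated tables gives the actual figure.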
Thank you Bill,
1. I do not know the size or the number of rows yet. For now I'm trying to stay ahead of it (so as not to hit the 10 GB limit). Also, I'm worried about the statement "The exact limitation is 10 GB of uncompressed data per table". What does "uncompressed" mean, and how can it be mapped to the size of a SQL Server table?
2. I suspect a degradation of import performance, since every day we will have more and more data in the SQL Server table.
3. Could you please clarify a bit about files in Azure Data Lake or Blob? Do you mean CSV files? I can hardly imagine how to import several GBs of data from CSV files :) Also, we will have record updates (order status, etc.), so I would have to update data in already uploaded files. These are the data import options I have. Did you mean Azure blobs?