We planned to use the on-premise gateway connector to import historical data for orders/order lines into CI (more than 10 million rows, on-premise SQL Server) and to upload new orders/order lines later. I have read about the limit for the on-premise gateway connector of max 10 GB per table ( docs.microsoft.com/.../service-gateway-onprem-tshoot ). I decided to do the ingestion in several steps:
1. Import the historical data (in several steps: import 5 million records -> clear the SQL table, import the next 5 million records -> clear the SQL table, and so on)
2. Clear the SQL table
3. Import new data on a daily basis
4. Periodically clear the SQL table of already-imported records (see the sketch after this list)
5. Go to step 3
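A minimal sketch of that periodic clean-up step, assuming a hypothetical loaded_at column that records when a row was exported (the table and column names are placeholders); deleting in chunks keeps the transaction log and locking manageable:

    -- Delete already-imported rows in batches of 50,000 until none remain.
    DECLARE @batch INT = 50000;
    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch) FROM dbo.orderlines                -- placeholder table name
        WHERE loaded_at < DATEADD(DAY, -1, SYSUTCDATETIME());  -- rows already imported into CI
        IF @@ROWCOUNT < @batch BREAK;
    END;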
But when I delete rows in the SQL table (after a successful import into CI), the records are deleted in CI as well!
My questions are:
1. Is it possible to forbid deletion of imported data in CI? Can the On-Premise Data Gateway connector be configured in an "append-only" mode?
2. Is there another way of loading data into CI from an on-premise SQL Server?
Just to clarify, the Incremental Refresh functionality is on by default if you have a new instance or connect to your own Dataverse, as that enables the Power Platform Dataflow connectors, which support incremental refresh. You can force the use of those connectors instead of the Power BI connectors with the flag added to your URL as mentioned.
Hi Bill! MS advised us to use Incremental Refresh. It is in preview now. To enable it, add the parameter &PPDFIncrementalRefresh=true to the CI URL. I have not tested this mode yet, but it looks promising.
If you can script out the table, I'll gladly do a quick computation on it and give you a rough estimate. Actual table size includes keys and indices, but we can get a very good estimate. I have a 35 million row table that's fairly wide, along with an 85 million row table, and together they're well under 10 GB, so I suspect you're going to be fine, but we can know instead of guess. If you know the schema, I'm happy to even use a data generator and pump in 100 million rows just to be sure.
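As an illustration of that data-generator idea, here is a minimal sketch against a hypothetical, simplified order-line schema: generate a known number of rows, measure the table, and extrapolate linearly to the expected volume.

    -- Hypothetical, simplified order-line table; the real schema will differ.
    CREATE TABLE dbo.orderlines_test (
        orderline_id BIGINT        NOT NULL,
        order_id     BIGINT        NOT NULL,
        sku          VARCHAR(40)   NOT NULL,
        quantity     INT           NOT NULL,
        unit_price   DECIMAL(18,4) NOT NULL
    );

    -- Generate 1,000,000 random rows from a cross join of system views; scale up as needed.
    INSERT INTO dbo.orderlines_test (orderline_id, order_id, sku, quantity, unit_price)
    SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
        ABS(CHECKSUM(NEWID())) % 12000000,
        CONCAT('SKU-', ABS(CHECKSUM(NEWID())) % 100000),
        1 + ABS(CHECKSUM(NEWID())) % 10,
        CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(18,4))
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    -- Measure, then extrapolate: 35x this row count gives a rough figure for 35M order lines.
    EXEC sp_spaceused 'orderlines_test';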
Bill, thank you. I'll try to estimate the size of the real tables, though I haven't got the 'real' data in them yet - only an approximate model and the expected number of rows.
Ivan, I think you might be surprised at the table size. Before going a lot further, run this inside the db against your orders and order lines tables: exec sp_spaceused 'orders'. This will tell you the actual size.

As far as delimited files go, I'd personally suggest using .parquet or .avro, but yes, you can do it with one big file, or with a historical file plus incremental ones, and reference the whole directory if you mount it as a CDM folder (the second option when you create a data source). I think the guidance is to keep them around 2 GB, but that's just a guide. In my experience, the same file will shrink from around 10 GB in CSV to about 3 GB if you use Parquet instead. We've had 30 GB files in CSV that CI was able to process, so yes, it can handle it. And if you mount, you can just reference the whole directory and drop a new file each day to keep loading the incremental portions, which will greatly shorten the process.

That said, before you go too far down any road, check those table sizes, because I think they're going to be a lot smaller than you expect, and it's easy enough to check for sure - if they're, say, 2 GB or smaller at the current volume, that gives you 5x the volume before you really have to be concerned, and that's a lot of runway.
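For reference, a minimal version of that check for both tables (the table names are placeholders for the real ones):

    -- Reports rows, reserved, data, index_size and unused for each table; with row/page
    -- compression off, the "data" figure is a reasonable proxy for the uncompressed size
    -- that the 10 GB gateway limit refers to.
    EXEC sp_spaceused 'orders';
    EXEC sp_spaceused 'orderlines';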
We will have 12 million order records and 35 million order lines, and these amounts will grow on a daily basis. So I do not think that 35 million order lines will fit into 10 GB :(
In the case of Azure Blob Storage - did you mean exporting all rows from a single table into a single CSV file, for example, and putting it into an Azure Blob Storage container? Will it be OK to have a single CSV file (more than 10 GB in size) uploaded every day to Azure Blob Storage? Will CI be able to handle it successfully?
Hi! Thank you for the response! Yes, I'm aware of the possibility of importing data from Azure SQL, but the customer currently has all their infrastructure on-premise. Besides the fact that CI is expensive enough on its own, they would need to pay for Azure SQL as well (just for transferring data to CI) :) That's why I'm looking for a less expensive way of importing data into CI.
Pardon what might be a dumb question, but instead of using the on-prem SQL DB, why not use Azure SQL and then use a Dataverse dataflow to bring in the data you need to D365? Azure SQL is massively scalable, already lives in Azure (which is also where D365 lives, of course), and it doesn't seem that MS "taxes" dataflows against your D365 storage quota. And you don't have to worry about data gateway limits.
You also don't then have to deal with the comparatively "flat" data structure that ADL is optimized for (as it is obviously intended for analysis/reporting, etc.).
Hi there Ivan:
So a few things.
1 - To look at the size of each row, you can look at the data type of each column to get an idea of the size of the data you're moving. There's a sample procedure you can use for this, and it will give you an idea of the table size (see the sketch at the end of this post). Regarding compression: if the rows are compressed, they'll need to be uncompressed to be ingested, but unless you've enabled it, I don't think that's an issue.
2 - Understood, so performance will be an issue, but transfer time is generally a lot less than the time spent conflating your contacts and then associating activities with the unified profiles. I can't say it's always true, but if you have a lot of activities, chances are that activity unification in the refresh process will be the time killer. Usually ingestion is pretty fast (a few minutes).
3 - OK, so you can use CSV, Excel, or Parquet. On data ingestion, the middle option is to mount a CDM folder instead of using the Power Query interface, but that requires a little more work. So if you just did a full dump of your table to CSV/Parquet/JSON (pick a file format) and wrote it to either Azure Blob Storage or a Data Lake, you can connect to it just like a database. Other than setting up the connection initially, you can't really tell the difference. So in the case of your screenshot, pick a file format, say Parquet or CSV, then just specify the container it's in and the key; it'll authenticate, and after that everything is exactly the same.
You can use OneDrive or Box or something different, but Azure Blob Storage or ADL are a lot more robust.
I think your table size is well below 10 GB, but run those queries and we can figure it out.
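A minimal sketch of the per-row estimate from point 1, assuming the tables are named orders and orderlines (placeholders) and treating variable-width columns at their declared maximum, so it gives an upper bound on uncompressed row size:

    -- Upper-bound bytes per row from the declared column types (max_length is -1 for
    -- (n)varchar(max)/varbinary(max), which would need separate handling).
    SELECT
        t.name                            AS table_name,
        SUM(CAST(c.max_length AS BIGINT)) AS max_row_bytes
    FROM sys.tables  AS t
    JOIN sys.columns AS c ON c.object_id = t.object_id
    WHERE t.name IN ('orders', 'orderlines')          -- placeholder table names
    GROUP BY t.name;

Multiplying by the expected row counts gives a rough ceiling: for example, 35 million order lines at, say, 200 bytes per row would be about 7 GB of uncompressed data, and sp_spaceused on the populated tables gives the actual figure.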
Thank you Bill,
1. I do not know the size or the number of rows yet. For now I'm trying to stay ahead of it (so as not to hit the 10 GB limit). Also, I'm worried about the statement "The exact limitation is 10 GB of uncompressed data per table". What does "uncompressed" mean, and how can it be mapped to the size of a SQL Server table?
2. I suspect a degradation of import performance, since every day we will have more and more data in the SQL Server table.
3. Could you please clarify a bit about files in Azure Data Lake or Blob? Do you mean CSV files? I can hardly imagine how to import several GBs of data from CSV files :) Also, we will have record updates (order status, etc.), so I would have to update data in already uploaded files. These are the data import options I have. Did you mean Azure blobs?