
Speed up batch data sync from Business Central to Dataverse

Hello colleagues!

Imagine a situation where we have a big table in Business Central that we need to sync to Dataverse. By "big table" I mean a table with more than 75,000 records.

Why 75K records? By default, each sync job has a timeout of 12 hours. You can increase it, of course, but that is not the way to go. By my measurements, 75K records is roughly the number you can sync from Business Central to Dataverse in 12 hours. The time varies from sync to sync (it can be as little as 10 hours), but if no issues occur, 12 hours seems to be the maximum time in which 75K records of any table will be synced to Dataverse. I checked this with about 25 tables, and the result is much the same everywhere; it does not change much even as the number of fields grows.
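A quick back-of-the-envelope calculation shows why this matters. The throughput below (about 75,000 records per 12 hours) is the empirical rate measured above, not a documented platform limit, and the helper name is mine:

```python
# Rough sync-duration estimate based on the measured throughput above:
# ~75,000 records in ~12 hours, i.e. about 6,250 records per hour.
# This rate is an empirical observation, not a documented limit.

RECORDS_PER_HOUR = 75_000 / 12  # ~6,250

def estimated_hours(record_count: int) -> float:
    """Estimate how long one sync job would need for record_count records."""
    return record_count / RECORDS_PER_HOUR

print(f"{estimated_hours(75_000):.0f} h")     # 12 h: the default timeout
print(f"{estimated_hours(5_000_000):.0f} h")  # 800 h: why 5M records need batching
```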

[Screenshot: sync job duration in the Integration Synch. Job Log]

And now the main question: how do you sync 100K records? 500K? 5 million records to Dataverse? In my practice I have done all of these syncs, and I want to share my experience with you.

Let's divide "the elephant" into parts. Open the Integration Table Mappings page and click on the Table Filter value:

[Screenshot: Integration Table Mappings page, Table Filter column]

For example, I have a sync of the G/L Entry table with millions of records. I can split it by the Entry No. field like this:

[Screenshot: Entry No. filter set on the table mapping]

You can also use the Created At or Document No. field to divide the table.
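The splitting itself is mechanical, so here is a minimal sketch that generates the range filters for each batch. The 50K batch size and the `start..end` filter syntax come from the article; the function name is mine:

```python
def batch_filters(total_records: int, batch_size: int = 50_000) -> list[str]:
    """Build Business Central-style range filters (e.g. '1..50000')
    covering entry numbers 1..total_records in batch_size chunks."""
    filters = []
    start = 1
    while start <= total_records:
        end = min(start + batch_size - 1, total_records)
        filters.append(f"{start}..{end}")
        start = end + 1
    return filters

print(batch_filters(150_000))
# ['1..50000', '50001..100000', '100001..150000']
```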

And now, if I press Synchronization -> Run Full Synchronization, the system will create a sync job which syncs only the first 50K records to Dataverse, and it will take about 7-8 hours to get them there. Too slow, isn't it? At this rate, syncing 5M records would take 700-800 hours, or about a month. Sad.

How can we speed this up? Let's apply a trick. Click Synchronization -> Integration Synch. Job Log and make sure your job has started running (it appears in the list and the Inserted, Modified, or Skipped numbers are increasing):

[Screenshot: Integration Synch. Job Log with a running job]

Now go back to the Integration Table Mappings list, click on the Table Filter, and change it from 1..50000 to 50001..100000. Then press Synchronization -> Run Full Synchronization again. Open the Integration Synch. Job Log once more and confirm that 2 jobs are now running, each syncing a different set of records.

And after that, repeat the trick a third time. Set the Entry No. filter to 100001..150000 and create a third sync job with Synchronization -> Run Full Synchronization. Now we are syncing 150K records at the same time, 3 times faster!

Unfortunately, 3 sync jobs is the limit. Any further jobs will go to the queue.

Important!

1. Moreover, you can have only 3 running jobs in total at any moment. If you have other scheduled tasks, they will stay in Ready status until one of your sync jobs finishes. For this reason, do not increase the job timeout beyond 12 hours, and avoid syncing 100K or more records in one job; otherwise your normal jobs will be delayed.

2. After you create the 3rd sync job, set the table filter to a single record, like 150000..150001. Otherwise, when any of your sync jobs finishes, it will automatically run again with the last table filter, and you can get a re-sync of existing records. It is not critical, but one of your jobs will do useless work for 3-4 hours instead of syncing the records you need.

Wait until all 3 jobs finish, then continue the sync setup for the next record set.
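Because of the 3-job ceiling, the full run is really a sequence of waves: start up to three batches, wait for all of them to finish, then start the next three. A sketch of that planning step (the helper name is mine; starting and monitoring the jobs stays manual, as described above):

```python
def waves(filters: list[str], max_concurrent: int = 3) -> list[list[str]]:
    """Group batch filters into waves of at most max_concurrent jobs.
    For each wave: start every filter as a sync job, wait until all of
    them finish, then move on to the next wave."""
    return [filters[i:i + max_concurrent]
            for i in range(0, len(filters), max_concurrent)]

plan = waves(["1..50000", "50001..100000", "100001..150000",
              "150001..200000", "200001..250000"])
print(plan)
# [['1..50000', '50001..100000', '100001..150000'],
#  ['150001..200000', '200001..250000']]
```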

3. A sync job can stop unexpectedly. It looks like this:

[Screenshot: a stalled sync job in the Integration Synch. Job Log]

None of the numbers are incrementing, and the Finish time and Duration are blank. The job usually ends up in the Error status with a message like "Connection to Dataverse failed". It can occur while you publish an app to the environment, or even without any obvious cause. In that case you have to set up the sync again, but if you write down which job is syncing which set of records, you can simply adjust the filter by counting the inserted records. Imagine that the first job covered entries 1..50000 and the second 50001..100000. I could then create new jobs with filters like 660..50000 and 53300..100000 to avoid re-syncing the records that were already inserted.
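The filter adjustment above can be sketched as a tiny helper. It assumes, as in the example, that Entry No. values are contiguous and start at the lower bound of the original filter, so the new lower bound is just the old one plus the number of records already inserted; the function name is mine:

```python
def resume_filter(original: str, inserted: int) -> str:
    """Given the original range filter (e.g. '1..50000') and how many
    records the failed job already inserted, return a filter that
    resumes where the job left off. Assumes contiguous Entry No. values
    starting at the lower bound of the original filter."""
    start, end = (int(part) for part in original.split(".."))
    return f"{start + inserted}..{end}"

print(resume_filter("1..50000", 659))        # 660..50000
print(resume_filter("50001..100000", 3299))  # 53300..100000
```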

4. Sometimes you will see numbers in the Failed column; please check the reasons. If a record was not replicated because it already exists, that is OK. But sometimes it can be a connection issue or something really strange. By clicking on the record you will see its record number, and you can check whether it exists in Dataverse. If not, create a filter for such records and sync them again.

5. A good idea is to create an Azure Synapse Link in Dataverse. It will show you how many records were copied to the Data Lake, and you can compare that with the total number of entries in Business Central. Just use this as a checksum.

So, batch sync is a process that requires a lot of attention. Be patient, and keep track of how much data you have already synced until the very end.

I hope this material will help you.
