Hello everyone,
I have a large contacts dataset that resides in our SQL DB, and I would like to run a daily synchronization from this data directly to Dataverse after business hours (for example, every night starting at 10:00 PM), so it is not only a one-time migration.
I already have a dataflow that syncs our data from an Oracle DB to a table in Dataverse, and the approximate number of records synced daily is around 50-60K between Upsert and Insert operations.
The total duration to sync that volume of records is around 45 minutes to 1 hour.
Please find below my concerns regarding the options for syncing a large dataset to Dataverse:
Let's say I have 300K contacts in our SQL DB. The first time the dataflow runs, it should create those records in Dataverse. The next day the dataflow will run again to update that volume of records, and I am assuming it will take more than 2 hours to process all 300K (or 500K+) records.
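For what it's worth, here is the rough arithmetic behind that estimate, extrapolated from the current run times above and assuming throughput stays roughly linear with record count (which it may not):

```python
# Back-of-the-envelope extrapolation from the observed dataflow run:
# roughly 50-60K records in 45-60 minutes.
observed_records = 55_000   # midpoint of 50-60K
observed_minutes = 52.5     # midpoint of 45-60 minutes

minutes_per_record = observed_minutes / observed_records

for target in (300_000, 500_000):
    hours = target * minutes_per_record / 60
    print(f"{target:,} records -> roughly {hours:.1f} hours if scaling stays linear")
```

So the run could realistically end up well beyond 2 hours, which is the core of my concern.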
- What are the best practices to optimize the performance of the dataflow run so the records are processed in a reasonable time?
- When should we consider using Azure integration with Dataverse versus a dataflow?
- Can Azure integration also sync records from any data source to Dataverse?
- What might happen if the dataflow has 300K records to sync to Dataverse with a combination of Upsert/Insert operations? I believe every record is one API call (see the sketch after this list), and if so, as per the documentation below, the maximum limit is 40K requests per 24 hours (based on the license of the user we plan to create the dataflow under):
https://docs.microsoft.com/en-us/power-platform/admin/api-request-limits-allocations#licensed-user-request-limits
- How will it impact the dataflow during the run if we exceed this limit for the user who owns the dataflow? Should we purchase additional API request capacity?
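To illustrate the one-API-call-per-record assumption above, here is a minimal sketch of a single upsert against the Dataverse Web API. The environment URL, the token, and the alternate key column new_externalid are placeholders for illustration only, not our actual setup:

```python
import requests

# Placeholders for illustration only.
DATAVERSE_URL = "https://yourorg.crm.dynamics.com"
ACCESS_TOKEN = "<bearer token acquired via Azure AD / MSAL>"

def upsert_contact(external_id: str, attributes: dict) -> None:
    """Upsert one contact by alternate key: creates the row if it does not
    exist, updates it if it does. Note: this is one HTTP request per record."""
    url = f"{DATAVERSE_URL}/api/data/v9.2/contacts(new_externalid='{external_id}')"
    headers = {
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
    }
    response = requests.patch(url, json=attributes, headers=headers)
    response.raise_for_status()

# Example: 300K source rows would mean roughly 300K calls like this one.
upsert_contact("SQL-000123", {"firstname": "Jane", "lastname": "Doe"})
```

At one request like this per record, 300K records would mean roughly 300K API requests, which is why the 40K/24h limit in the linked documentation worries me.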
Can someone please advise and provide detailed answers?
Any help is highly appreciated.
Best regards,
EBMRay