Hi,
This is Dipei, I'm working on a product which needs to transform over Millions rows ant it will take over 12 hours to get completed.
I'm using Dynamics CRM 2016 online, version 8.2.
My use case is quite clear and simple, here are the potential data flow:
1. Read data from Dynamics CRM (batch size <= 5000)
2. Transform the data (need to run some customized codes)
3. Write back (update) the data to Dynamics CRM (batch size <= 1000)
As you guys know, there are some limitations in Dynamics CRM (online) to run the customize codes:
- we can't run our customize codes in Plugin over 2 minutes, and
- we can't recurrent to run the transformation process neither frequently, because the another limitation: Context Depth = 8 (some customers don't want to change this value).
I googled a lot and here are the potential solutions I may can go:
1. Microsoft allow we running big data job (long time running) in Microsoft Azure side, but this is not acceptable because part of our customers are still not have Microsoft Azure enabled.
2. Out of box application solutions to do that:
2.1, A desk top client app
It is doable, but the process may need to be run over 12 hours, it is not an ideal solution.
2.2, A online transform utility/system
It is also doable, but the problem is, such an utility/system will need to access data from Dynamics 365, some customers are not comfortable to share the credentials to connect to Dynamics CRM.
3. Workflow + Plugin:
I tried several different combinations, but still have no luck because of the limitations from Dynamics CRM online.
For example, I created a recurrence workflow (hourly), with which we can run the customize codes (Plugin) to complete the data transformation jobs. It works, but the problem is, when I tried to have it run more frequently (E.G. run once each half hour), it finally failed because Dynamics CRM took it as an infinite loop. If I can only schedule the recurrent workflow hourly, E.g. I can only have the customize code running not over 2 minutes in an hour, it will take weeks to complete the transformation and that is unacceptable.
Now, I'm planning to schedule 60 parallel recurrent workflows (all are run once each hour). It could be complicated because I learned somewhere, it says, Dynamics CRM online will not allow over 2 parallel thread to bulk read the data from the single table. In that case, I have to do do some extra works to have each independent recurrent workflow instance running in different time time lines. For example:
- workflow 1: started at 8:00 AM, recurrent hourly
- workflow 2: started at 8:01 AM, recurrent hourly
...
- workflow 60: started at 8:59 AM, , recurrent hourly
It may works, but have to run lots of tests to confirm that. I'm expecting to meet some other limitations which is unknown to me now.
I struggled over a week to run different tests with no lucks. It seems like it is not an easy thing. I do need expert advises/suggestions what should I do to enable such a feature working smoothly. Any helps are appreciated!
Thanks,
Dipei