We're trying to build a data sync from Business Central (BC) into a data lake (in our case, Azure Blob Storage), exporting all attributes of specific entities. The process is meant to run periodically and be incremental, so it only pulls records that have been updated since the last sync; we do this by filtering on the SystemModifiedAt field. Our entity schemas are fairly static but do change occasionally (e.g. when custom fields are added), so it would be nice if the sync process could handle schema drift automatically.
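For reference, this is roughly the shape of the pull we have in mind (a minimal Python sketch; the endpoint, entity, and auth values are placeholders, and it assumes the exposed entity includes SystemModifiedAt and supports standard OData filtering and paging):

```python
# Rough sketch of one incremental pull: filter on SystemModifiedAt, follow
# OData paging, and land the result in Blob Storage as JSON lines so new
# fields flow through without a fixed schema. All <...> values are placeholders.
from datetime import datetime, timezone
import json

import requests
from azure.storage.blob import BlobServiceClient

BC_BASE = "https://api.businesscentral.dynamics.com/v2.0/<tenant>/<environment>/ODataV4"
ENTITY = "Customers"                # placeholder entity set
LAST_SYNC = "2024-01-01T00:00:00Z"  # would normally be read from a watermark store

def fetch_incremental(session, last_sync):
    """Pull only rows modified since the last sync, following @odata.nextLink paging."""
    url = f"{BC_BASE}/{ENTITY}"
    params = {"$filter": f"SystemModifiedAt gt {last_sync}"}
    rows = []
    while url:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload.get("value", []))
        url = payload.get("@odata.nextLink")  # server-driven paging
        params = None                         # the nextLink already carries the query
    return rows

def write_to_lake(rows, conn_str, container="bc-raw"):
    """Write the extract as one JSON-lines blob per run."""
    blob_name = f"{ENTITY}/{datetime.now(timezone.utc):%Y%m%dT%H%M%S}.jsonl"
    data = "\n".join(json.dumps(r) for r in rows)
    service = BlobServiceClient.from_connection_string(conn_str)
    service.get_blob_client(container, blob_name).upload_blob(data, overwrite=True)

session = requests.Session()
session.headers["Authorization"] = "Bearer <token>"  # placeholder auth
rows = fetch_incremental(session, LAST_SYNC)
write_to_lake(rows, "<storage-connection-string>")
```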
So far we've tried the following approaches:
- using the native built-in BC APIs - these don't return all attributes, in particular custom fields that have been added
- using Dataverse - this is still in beta for BC and, like the native APIs, doesn't support all entities or return all attributes
- using Azure Data Factory with the built-in Dynamics connector - we've run into some challenges with this method. As documented in the second bullet point in the blue box here (https://docs.microsoft.com/en-us/azure/data-factory/connector-dynamics-crm-office-365?tabs=data-factory#dynamics-as-a-source-type), because we use a query, the connector infers the schema by sampling rows and ends up dropping columns whose values are null in the sampled rows. This leads to an incomplete data sync. We could potentially do the copy without the query, but then we'd have to sync full data sets rather than incrementals. (See the mapping sketch after this list.)
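On that last point: if we're reading the linked doc correctly, the column dropping only applies when the copy activity has no explicit column mapping, so one mitigation we're considering is generating the mapping ourselves from the entity's field list instead of letting ADF infer it from sampled rows. A rough sketch (the helper and field names below are just illustrative placeholders):

```python
# Hypothetical helper: build an explicit TabularTranslator mapping for the ADF
# copy activity so every column is listed up front rather than inferred from
# sampled rows. Field names are placeholders for the entity's actual fields.
import json

def build_adf_mapping(field_names):
    """Return a copy-activity 'translator' that maps each field explicitly."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": name}, "sink": {"name": name}}
            for name in field_names
        ],
    }

fields = ["systemid", "systemmodifiedat", "no", "name"]  # placeholder field list
print(json.dumps(build_adf_mapping(fields), indent=2))
```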
What we're now thinking is that we need to build a custom query/API in BC and create our own custom data connector for ADF. However, before we go down that path, does anybody have any other suggestions or methods they have used to sync data from BC into a data lake?