My previous blog discussed the use of Staging Databases as part of integration design and some tips and pointers whilst building a staging database (if you decide to use one that is).
My background with data integration is using either out of the box/free tools or 3rd party data integration software (like Scribe or KingswaySoft) and the methods for designing and building data integration have evolved over time. This blog will detail the best practices when dealing with performance issues that I have encountered over the years.
You can apply some (if not all) of these principles for bespoke data integration tools, such as:
This will be my final* post when discussing best practices with data integration, most of which can be applied when using any tool to complete the integration.
*Star Wars did not end with a trilogy.
Integrating with Dynamics 365 can be a painful learning curve, some of the issues (that I would see as performance related) I have encountered are:
Additionally, big factor to consider with data integrations and loading data directly into Dynamics 365 is that this can only be achieved via two supported methods:
There is no supported method for loading data directly into the Dynamics 365 database, and if anyone asks you to do this, just say NO. So even if you are using a Staging DB no SQL – SQL transfers are NOT allowed.
With Dynamics 365 Online, Microsoft have protected customers by this method of tomfoolery by restricting access (or even visibility) to the Dynamics 365 Database!
All Dynamics 365 data integrations must be made through the API, which means they:
Considering the above items, we can see:
Whilst investigating why performance is being impacted, it also makes sense that we investigate ways of improving the performance of data integrations with the aim of improving:
I will walk though a common data load scenario which will highlight a few of the performance issues are listed above and are usually encountered when completing a data migration to Dynamics 365.
I will then review the migration and make suggestions/ list my best practices on how to overcome or help mitigate these issues.
For this example, I will walk through a very basic scenario of importing data without using any special tips or best practices; this also assumes that the the data integration tool is importing 1 record at a time (discussed later on). This is a direct source to target data import and all values are made up for the purposes of this example.
If you have the requirement to load a medium size data set (e.g. the last 5 years worth of data) for a customer, and say:
Dynamics 365 Online data migration from a data source.
We will use the example Sales load highlighted in my first blog, the load order would be as follows:
The loading of the Account data into Dynamics 365 will be relatively painless; it is a simple profile to just perform a Create operation into the CRM database. The Account data set is relatively small and this will probably complete within about 20-25 mins.
When measuring import speed, you will probably hear people talking about records/second (which a 3rd party tool will calculate for you). The average Account import speed here is about 10-15 records/second.
The next step is to load the Contact data into Dynamics 365, this profile is slightly more complicated in the fact that we have to also search for an existing Account to firstly set as the Parent Account ID and also update it with the Primary Contact value.
So instead of the one load operation to Dynamics 365, we have 3 operations with each operation requiring round trips to and from Dynamics 365 to the data integration tool:
Because of the additional work the profile has to do, the import speed may go down to an average of 3-4 records/second as the profile is having to do more work (roughly 3 times). This will take approximately 7 hours to import 100,000 Contact records.
This process is actually grouping and loading Opportunities and child Opportunity Product records together and loading these in the same data profile and contains 4 primary operations:
With the added complexity of grouping related data together (Opportunity and Opportunity Products) at the source, the profile may only achieve an average maximum of 1-2 records/second; with a maximum of 6000 rows the load time will be around 50 minutes.
Reviewing the Sale Process load example it can seen that increasing the complexity of a data profile can reduce the performance of the import speed.
For low-volume data loads, the time differences can be in minutes or hours which do not impact project timelines too much.
But, consider the impact to projects where the data sets contain millions of rows (i.e. high-volume); the potential increase in project timelines can be calculated in either days or weeks which your customers will not be happy with.
Identifying performance gains will be critical in these scenarios to reduce the time it takes to integrate the data.
So, how can we go about ways of improving data integration performance?
There will be different environments for every single data integration, and there will be different things you can try to help improve performance without touching the data integration itself. The changes could be either hardware (e.g. increase Server Memory) or software configuration (e.g. creating a database index) optimisations.
So lets look at the environmental changes we can complete to increase performance when integrating data with Dynamics 365 – first we can split out the environments as follows:
I will use the Staging Database approach with data integrations to provide the context of the environment, if using a file then there is not too much you can do except clear down erroneous rows or the 63000 extra blank rows that Excel sometimes like to add.
So with Staging Databases, I will make the assumption that this is a SQL database (but same tips will apply to additional database types like MySQL.
Firstly I recommend you look standard best practices for optimising a database; I will list out a few of the key ones that you can actually perform yourself.
The following tasks may require an Customer System Administrator to complete:
Some integration tools (like Scribe Insight) require their own Server with its own set of resources (and SQL database); I recommend using the following as a guide when choosing a location for your data integration tool:
For bespoke solutions e.g. console apps or web services, make sure they have:
If these are hosted in Azure and connecting to Dynamics 365 Online, how about moving to the same data center as where your Dynamics 365 instance is located? This will reduce network latency by reducing the number of connections it needs to make to Dynamics 365 via other servers.
For all tools and bespoke solutions console apps or web services – make sure they have a decent connection to Dynamics 365 (as stated above).
Loading data into Dynamics 365 can be affected by any number of items once the connection to D365 has been established. There are several tips I recommend completing whilst loading data into Dynamics 365:
We have looked at my recommended best practices from an environmental point of view, next we can look at tweaking the way your data profiles operate and suggesting improvements in logic.
The first item to think about is your profile load order, each data integration will require a profile order where the data should be ordered based on the required data to create that record type. The following diagram shows an example scenario for a typical data migration.
The decision could be made to either start with Leads or start with the Product Catalog (Products and Price Lists).
My best practices when planning a data integration and the profiles ordering logic are:
When querying source data, only select the exact columns you need to process as part of your data integration. Any additional Columns queried, but are not used become a waste of resource and can impact the performance of data profiles.
So in an example source Contact SQL Query:
SELECT [FirstName], [SurName], [Email], [PostCode], [Status]
WHERE [STATUS] = ‘Active’
SELECT [FirstName], [SurName], [Email], [PostCode], [Status]
WHERE [STATUS] = ‘Active’
Instead of the below query which returns all Columns.
SELECT * FROM CUSTOMER
WHERE [STATUS] = ‘Active’
SELECT * FROM CUSTOMER
Populating Lookup fields often requires some element of validation to ensure that the record GUID exists; e.g:
This means that for every lookup value that is set requires as part of the data integration, an extra query to and from Dynamics 365 is required before the Create/Update operation would occur. This will slow down the time it takes to complete the transaction for this row (e.g. the record/second would decrease).
On the Case Entity, you may have several lookup GUIDs to query before the Case can be created:
If we had these GUIDS before we started the import (as we should have imported the prerequisite data beforehand), then we could populate these fields without having to perform the additional query.
This would reduce the number of steps the profile has to complete to just a single Create Case step and improve the performance of data integration import speed (records/second).
For smaller data-sets, this may not be required as the data does not take much time to load; but for the larger data-sets we can see that improvements to import speed can have a positive impact on the total time it takes to load the entire data set.
To record the GUIDS of the pre-requisite data (especially configuration/Account/Contact data) I will always write back to the source data row in my staging database with the record GUID in the original profile. Or, I will build a separate profile which query Dynamics 365 (as the source) and targets the staging database. This will literally just write the GUID back to the Staging database source row.
So back to my Case import example, assuming I had captured the record GUIDS of each lookup, I could either:
Identify your largest data sets, some of these may be simple but some of them maybe complex. The aim is to improve the import speed (record/second) and the larger data-sets will gain the most from performance improvements.
The downside is, importing large volumes of data will eventually slow down table performance in Dynamics 365, which also will have a negative effect when trying to match to records as part of data profiles.
Reduce the number of steps in the profile by completing additional ETL transformations on the Staging Database (if available) – like pre-populating lookup fields; aim for straight inserts or straight update.
For simple – high volume profiles, many tools (even bespoke Console Apps/Web Services_ can take advantage of the Bulk methods (ExecuteMultiple) provided by Dynamics 365 where you can submit data in blocks as part of a single transaction. These are designed to increase the throughput of your data to Dynamics 365.
Nearly all data integrations will involve some sort of matching criteria, whether this is to return related record ID’s to populate lookup fields on import or when matching for either create new or update existing records.
Matching to records will often involve matching on key fields (in most cases a single field or key, in others the matching criteria may involve multiple field values.
Single key matching example – matching to Orders:
Multiple key matching example – matching to Contacts:
Whilst matching to these values may be required and cannot be changed. we can actually create indexes on these columns quite easily. In Dynamics 365, you can directly add these to the Quick Find View – Find Columns for that Entity; Dynamic 365 will create a database index for you automatically, this is extremely useful for Online environments where you can directly affect the database without touching it.
Additionally – it may be useful to also create indexes on the source columns used for matching over larger data-sets.
Where unique records are required to be matched, it may be worth considering creating an Alternate Key in Dynamics 365. Ben Hosking has written a very good article on alternate keys here. This will also create an index in the CRM database which improves lookup times.
Assigning records to User or Team can take a while even through the web interface, and this may be additionally impacted by cascading relationships to the records children (if applicable) where is child is also assigned to the User or Team. So this may have a negative impact on the data import speed
Where the requirement for records to be have their ownership during a data migration/integration, I considered it best practice on larger data sets or records where cascading relationship exists (e.g. Accounts and Opportunities) to either:
Tip: Ensure Users:
Completing Activities or setting record Statuses can have the same impact that assigning record ownership can have (e.g. cascade behavior). Also, depending on when the profile sets the status, further updates cannot (should not be in most scenarios) be made to those records through the data integration; especially with Activities – you will receive a hard Dynamics 365 error.
So my recommended approach is to consider having a separate record status profile as this will increase the performance of data integration initially creating or updating the record; then a profile to set the record Status executed post load (assuming no further record updates are required).
This completes the final blog*: Performance Best Practices blog for the data integration series, I hope you can start incorporating some of these principles into your data integration designs.
Thanks for reading,
*I lied, there is another post to come with more best practices!