The purpose of this blog is to investigate in detail a benchmark run conducted with the CRM V4 Performance toolkit (see http://blogs.msdn.com/crm/archive/2008/02/29/microsoft-dynamicstm-crm-4-0-performance-toolkit.aspx or www.codeplex.com/crmperftoolkit). My intent is to provide you with:
a. Results from a run conducted in our performance labs, including details of the environment itself; this will provide a reference point to calibrate new benchmark setups.
b. The statistics we gather during the benchmark runs
c. Insight into how these statistics can be used to find and eliminate bottlenecks.
Hardware
Most of our performance benchmarking is done on a five server system which includes one domain controller, one SQL server, one application server, one asynchronous server, and one client machine. The benchmark runs done for this discussion were done on the following hardware:
- Domain Controller – Dell 2850
- SQL server – HP Proliant DL585 with 4 AMD Opteron 850 dual core processors, 32G of RAM,
- using 5 internal drives for the OS (RAID - 0)
- HP Storage Works MSA30 using a smart array 6400 controller, 7 drives for data, 3 drives for logs, 2 drives for temp, 2 drives for backups (all stripes RAID – 0)
- Asynchronous server – HP Proliant DL585 with 4 AMD Opteron 850 dual core processors, 32G of RAM
- Application server – HP Proliant DL585 with 4 AMD Opteron 850 dual core processors, 8G of RAM
- Client machine – Desktop with Intel P4 3.2 MHz single core processor with HyperThreading, 2G of RAM
Configuration
All machines had Microsoft Windows Server 2003 Enterprise x64 Edition SP2, except for the Domain Controller and the Client machine which had Microsoft Windows Server 2003 Enterprise Edition SP2. All database logical drives had their Windows allocation unit size set to 64K and all machines were on a gigabit network switch with gigabit NIC’s.
The SQL server is running SQL Server 2005 x64 Enterprise edition. The Microsoft CRM database, its log file, and the temp database are all on dedicated logical and physical devices to get better throughput from SQL. We also set Max Degree of Parallelism to 1 as SQL Server does not produce efficient parallel execution plans for the CRM workload, which can cause deadlocks and other performance degradations. Lastly SQL Server SQL Server 2005 deadlock tracing (T1222 and T1204) is enabled to allow us to better understand deadlocks encountered during the runs.
The Application (App) server has a basic install of Microsoft Dynamics CRM V4, with the Microsoft CRM Asynchronous Processing Service disabled (since we are using a dedicated Asynchronous server), as well as having CRM tracing configured to just catch errors (registry setting TraceCategories = *:Off;Platform.*:Error).
The Asynchronous (Async) server has a basic install of Microsoft Dynamics CRM V4 (pointing it to the existing Database from the install on the Application server). Error tracing is enabled on this server just as it is on the App server.
The Client Machine had Microsoft Visual Studio 2005 Team System and the Microsoft Dynamics CRM 4.0 Performance Toolkit installed. This is the same toolkit we built and used during the CRM 4 development cycle. To download and find more information about the toolkit visit www.codeplex.com/crmperftoolkit. We configure the load test to gather performance counters for the machines in our environment along with the standard benchmark statistics (like response time, user load, etc.). For more information on adding perfmon counters to the load test see How to: Specify Counter Sets. You can also get perfmon statistics from the App, Async, and SQL servers using the scripts below to add a counter set for each machine (note you have to update the bracketed parameters for the script to run correctly, we usually set our sample to 90 seconds on an hour run).
App and Async Servers:
logman -y create counter web_perf_log -o [SystemDrive]\Performance\Perfmons\[webperfmonname] --v -si [SAMPLE] -c "\.NET CLR Exceptions(*)\*" "\.NET CLR Interop(*)\*" "\.NET CLR Jit(*)\*" "\.NET CLR Loading(*)\*" "\.NET CLR LocksAndThreads(*)\*" "\.NET CLR Memory(*)\*" "\.NET CLR Remoting(*)\*" "\.NET CLR Security(*)\*" "\Active Server Pages\*" "\ASP.NET Applications(*)\*" "\ASP.NET Apps v1.1.4322(*)\*" "\ASP.NET Apps v2.0.50727(*)\*" "\ASP.NET State Service\*" "\ASP.NET v1.1.4322\*" "\ASP.NET v2.0.50727\*" "\ASP.NET\*" "\Browser\*" "\Cache\*" "\CrmPrfCntrs\*" "\Distributed Transaction Coordinator\*" "\eTrust Antivirus Realtime Server(*)\*" "\ICMP\*" "\ICMPv6\*" "\Indexing Service Filter(*)\*" "\Indexing Service(*)\*" "\Internet Information Services Global\*" "\IPv4\*" "\IPv6\*" "\Job Object Details(*)\*" "\Job Object(*)\*" "\LogicalDisk(*)\*" "\Memory\*" "\NBT Connection(*)\*" "\Network Interface(*)\*" "\NNTP Commands(*)\*" "\NNTP Server(*)\*" "\Objects\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Print Queue(*)\*" "\Process(*)\*" "\Processor(*)\*" "\RAS Port(*)\*" "\RAS Total\*" "\Redirector\*" "\Server Work Queues(*)\*" "\Server\*" "\SMTP NTFS Store Driver(*)\*" "\SMTP Server(*)\*" "\System\*" "\TCPv4\*" "\TCPv6\*" "\Thread(*)\*" "\UDPv4\*" "\UDPv6\*" "\Web Service Cache\*" "\Web Service(*)\*" "\WMI Objects\*"
SQL Server:
logman -y create counter sql_perf_log -o [SystemDrive]\Performance\Perfmons\[sqlperfmonname] --v -si [SAMPLE] -c "\LogicalDisk(*)\*" "\Memory\*" "\MSAS 2005:Cache\*" "\MSAS 2005:Connection\*" "\MSAS 2005:Data Mining Model Processing\*" "\MSAS 2005:Locks\*" "\MSAS 2005:MDX\*" "\MSAS 2005:Memory\*" "\MSAS 2005:Proactive Caching\*" "\MSAS 2005:Proc Aggregations\*" "\MSAS 2005:Proc Indexes\*" "\MSAS 2005:Processing\*" "\MSAS 2005:Storage Engine Query\*" "\MSAS 2005:Threads\*" "\MSFTESQL:Catalogs(*)\*" "\MSFTESQL:Indexer PlugIn(*)\*" "\MSFTESQL:Service\*" "\MSRS 2005 Web Service(*)\*" "\MSRS 2005 Windows Service(*)\*" "\Objects\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Process(*)\*" "\Processor(*)\*" "\Server Work Queues(*)\*" "\Server\*" "\SQLAgent:Statistics\*" "\SQLServer:Access Methods\*" "\SQLServer:Broker Activation(*)\*" "\SQLServer:Broker Statistics\*" "\SQLServer:Broker/DBM Transport\*" "\SQLServer:Buffer Manager\*" "\SQLServer:Buffer Node(*)\*" "\SQLServer:Buffer Partition(*)\*" "\SQLServer:Catalog Metadata(*)\*" "\SQLServer:CLR\*" "\SQLServer:Cursor Manager by Type(*)\*" "\SQLServer:Cursor Manager Total\*" "\SQLServer:Databases(*)\*" "\SQLServer:Exec Statistics(*)\*" "\SQLServer:General Statistics\*" "\SQLServer:Latches\*" "\SQLServer:Locks(*)\*" "\SQLServer:Memory Manager\*" "\SQLServer:Plan Cache(*)\*" "\SQLServer:SQL Errors(*)\*" "\SQLServer:SQL Statistics\*" "\SQLServer:SSIS Pipeline\*" "\SQLServer:Transactions\*" "\SQLServer:User Settable(*)\*" "\SQLServer:Wait Statistics(*)\*" "\System\*" "\TCPv4\*" "\TCPv6\*" "\Thread(*)\*" "\Web Service Cache\*" "\Web Service(*)\*" "\WMI Objects\*"
Benchmark Runs
Being on the performance team we make numerous benchmark runs each week to gather data points for historical analysis as well as to identify areas for potential deep dive investigations. Below are results from one such run made using the LoadTestOutlookDM.loadtest workload (which is included in the performance toolkit). This run simulates a 500 user workload with complex workflows, and was intentionally designed to stress the system. Where relevant, I’ll mention results from the exact same load test, but without the complex workflows, to show how the toolkit can be used to carry out comparative analysis around some change in the workload. The LoadTestOutlookDM loadtest contains 16 different scenarios that include tests that cover Reports, Outlook, Data Management, Service Management, random email generation, as well as basic crud operations. For the remainder of this blog we will discuss the results in detail and review the process we use to determine the key performance characteristics of the system.
Step 1: Understanding the Visual Studio result windows
After the run completes you’ll be presented with a screen that should look similar to this:
Figure 1: Graphical view.
The key elements in this window are:
- The graph which displays the counter values (perfmon and benchmark stats) over the course of the run. Use the Counters pane to control which stats are displayed.
- The tabular display of the counter values which displays the Min, Max, and Average values of the stats (note that the slider allows you to zoom in on particular sections of the run)
- The summary pane which shows a few high level stats about the overall run
- Buttons for displaying graph or table view
Clicking on the Tables button will bring up this:
Figure 2: Table view.
The key element in this window is the table which can display one of seven categories. The categories we normally look at are:
- Tests – this displays the test and scenario name, along with various stats about the execution of the test. Tests map to webtests in the toolkit, and are the fundamental building blocks of any workload.
- Transactions – Each test above can contain zero or more explicitly measured “transactions” (not to be confused with database transactions). Most of the tests included with the toolkit consist of a sequence of user actions – for example, the CreateNewAccount test will include NavAccountHomepage and CreateAccount. Note that the same transaction name can appear in multiple tests, you can see exactly which test and scenario a transaction was recorded for in the table.
- Errors – this displays the type of error hit, the subtype of the error hit, the number of times that error was hit (with a link to more detail about those errors), and the last message of the last error hit.
- Requests – this displays the request name, the scenario it was in, the test(s) it happened in, the total number of times the request was made, the number of failures on that request (with a link to more detail about those errors), the response time of that request, and the content length of that request. Just like transactions, the same request can come from different scenarios, tests, and transactions – it can be important to pay attention to these details because different tests can access the same page in different manners (and have different performance characteristics).
Step 2: Digesting the results
To start we pull up some counter values from the relevant machines (either through perfmon or Visual Studio if you added the machines to your load test’s “Counter Set Mappings” area). Since we were running workflow on this run we’ll look at the Async, Web, and SQL servers. First thing I do is check the following counters: Total Requests, Total Tests, and Failed Tests from the “Overall” counter on the client, the “% Processor Time” (for all machines) under the “Computer” object, the “Batch Requests/sec” from the SQL Servers “SQLServer:SQL Statistics” object, then lastly I add nine counters (because I have the SQL drives split up for better throughput, so one for each drive (SQL data, SQL log, and the temp database) from the SQL Servers “PhysicalDisk” object, they are “Avg. Disk Queue Length”, “Disk Transfers/sec”, and “Disk Bytes/sec” counters (refer to Figure 1 as an example).
The first thing I noticed is the high number of failed tests; in this case just below 6% of tests failed (we consider anything above 1% as an investigation point). Next I look at the Requests/Sec and Avg. Response Time they aren’t that bad (9.2 and .5 respectively), but there was a slight increase in the Response Time .3 to .5 from my earlier run without Complex Workflows. Since each user action in CRM typically involves at least a few requests to the server, small differences in the average request time across all requests can translate to noticeable increases in user perceived performance. I check the % Processor Time on all three systems to make sure something didn’t go wrong on one of my App servers. They are all fairly busy, but nothing they shouldn’t be able handle. If they were up in the 65-70% average I’d be looking at offloading some of the work onto another server.
Nothing jumped out after reviewing the high level system stats, so I’m going to look at the SQL Server in detail. Microsoft CRM has a very complex schema, and experience has taught us that most of CRM’s performance characteristics stem from the performance of the underlying database. For starters I like to check the counters “Batch Request/sec” and a few physical disk counters to see if that is where the bottleneck resides. Sure enough the Batch Request/sec is close to twice as busy as the earlier run without workflow and the Avg. Disk Queue Length and Disk Bytes/sec are both significantly higher, but I also notice that the Last counter read for Ave Disk Queue Length and Disk Bytes/sec were significantly high that this server had a lot of things to process late in the cycle or even after the run was over which points to Async processes “workflow”.
The Benchmark Toolkit will automatically make a backup of the AsyncOperationBase table after the Benchmark, so you can query Asynchronous operations that ran during your benchmark run. The tool also creates a workflow stats files that is saved out in your test results out directory, but I like to keep track of the async operation type as well as get my hands dirty. The table will be saved off under the master database as WorkfowStats_[your load test name]_[the time load test was started]_[organization name]_MSCRM. Edit the database name as well as the startedon and completedon of the following query, and then use this query against the above database to extract the Asynchronous operations that ran during your benchmark.
select name,
OperationType,
min(datediff(second ,startedon, completedon)) as [min],
max(datediff(second ,startedon, completedon))as [max],
avg(datediff(second ,startedon, completedon)) as [avg],
count(*) as [count]
from dbo.WorkfowStats_LoadTestOutlookDM2_23_2008_1_10_07_AM_perfv4RTM_MSCRM
where completedon is not null
and
startedon > '2/23/2008 9:10:00 AM'
and
completedon < '2/23/2008 10:11:00 AM'
group by name, OperationType
It will return something like this.
| Name | OperationType | min | max | avg | count |
| Workflow expansion task | 1 | 0 | 96 | 38 | 13761 |
| importlead633393282086689699 | 3 | 36 | 36 | 36 | 1 |
| importcontact633393286571064699 | 3 | 137 | 137 | 137 | 1 |
| importcontact633393286571064699 | 4 | 54 | 54 | 54 | 1 |
| importlead633393282086689699 | 4 | 6 | 6 | 6 | 1 |
| importlead633393282086689699 | 5 | 170 | 170 | 170 | 1 |
| MyPropagateByExpression BulkOperation | 6 | 2 | 72 | 20 | 6 |
| cross entity contact and account rule | 7 | 72 | 72 | 72 | 1 |
| single entity complex rule | 7 | 128 | 128 | 128 | 1 |
| Inactivity Alert – incident | 10 | 5 | 195 | 71 | 32 |
| Create Record Alert – opportunity | 10 | 1 | 163 | 28 | 44 |
| Lead Routing Scenario | 10 | 1 | 196 | 126 | 2014 |
| Inactivity Alert – lead | 10 | 6 | 199 | 101 | 20 |
| Assignment Alert – appointment | 10 | 12 | 183 | 74 | 155 |
| Assignment Alert – fax | 10 | 12 | 183 | 76 | 155 |
| Create Record Alert – lead | 10 | 0 | 198 | 127 | 2015 |
| Assignment Alert – phonecall | 10 | 12 | 194 | 78 | 155 |
| Assignment Alert – contact | 10 | 18 | 185 | 76 | 152 |
| Assignment Alert – task | 10 | 15 | 196 | 77 | 3708 |
| Assignment Alert – invoice | 10 | 12 | 166 | 74 | 155 |
| Case Routing Scenario | 10 | 1 | 140 | 16 | 79 |
| Assignment Alert – lead | 10 | 0 | 171 | 41 | 159 |
| Inactivity Alert – opportunity | 10 | 5 | 236 | 107 | 28 |
| Assignment Alert – letter | 10 | 12 | 185 | 77 | 155 |
| Assignment Alert – opportunity | 10 | 1 | 198 | 78 | 2764 |
| Assignment Alert – incident | 10 | 0 | 171 | 65 | 587 |
| MyPropagateByExpression BulkOperation | 11 | 2 | 81 | 17 | 40 |
| Leads | 12 | 1 | 53 | 7 | 10 |
| Accounts | 12 | 1 | 28 | 6 | 10 |
| Contacts | 12 | 1 | 74 | 11 | 10 |
As you can see there were a lot of Asynchronous operations running during that hour run, 12,377 were from the workflows we defined. Let see how much data was added to the asyncoperationbase table from before and after this run. Most of these queries I’m putting down here I run before and after the benchmark runs to get a better idea what is happening to the system, this often allows me to catch bottlenecks and degradations very easily and once I find the root I can work my way back up to make system configuration changes or database changes or create bugs which will improve design of that particular feature that may be causing disruptions in flow.
This query will save off the table size information (i.e. page counts) into a database (500UserRTMStats) that you’ll have to create beforehand (make sure you run this against your organization database). This will allow you to compare any CRM database size before and after each run.
select
cast (object_name(p.object_id) as char(40)) as 'object_name', cast (i.name as char(40)) as 'index_name', p.in_row_used_page_count, p.lob_used_page_count, p.used_page_count
into [500UserRTMStats]..[OrgTableSizeAfterLoadTestOutlookDM]
from sys.dm_db_partition_stats p join sys.indexes i on
p.object_id = i.object_id and p.index_id = i.index_id
order by used_page_count desc
As mentioned before, here I’m interested in the AsyncOperationBase table to see how much in size that table grew.
From before the run:
SELECT *
FROM dbo.OrgTableSizeBeforeRuns
WHERE object_name LIKE 'AsyncOperationBase'
| object_name | index_name | in_row_ used_ page_count | lob_used _page_count | used_ page_count |
| AsyncOperationBase | cndx_PrimaryKey _AsyncOperation | 381 | 0 | 381 |
| AsyncOperationBase | ndx_SystemManaged | 51 | 0 | 51 |
| AsyncOperationBase | ndx_Cover_Async Operation | 41 | 0 | 41 |
| AsyncOperationBase | ndx_RequestId_Async Operation | 38 | 0 | 38 |
| AsyncOperationBase | ndx_RegardingObjectId _AsyncOperation | 36 | 0 | 36 |
| AsyncOperationBase | ndx_StartedOn_Async Operation | 33 | 0 | 33 |
As well as the size of the table after the run:
SELECT *
FROM dbo.OrgTableSizeAfterLoadTestOutlookDM
WHERE object_name LIKE 'AsyncOperationBase'
| object_name | index_name | in_row_ used_page _count | lob_ used_page _count | used_ page_count |
| AsyncOperationBase | cndx_PrimaryKey _AsyncOperation | 44560 | 4384 | 48944 |
| AsyncOperationBase | ndx_SystemManaged | 964 | 0 | 964 |
| AsyncOperationBase | ndx_Cover_Async Operation | 505 | 0 | 505 |
| AsyncOperationBase | ndx_StartedOn_Async Operation | 382 | 0 | 382 |
| AsyncOperationBase | ndx_RegardingObjectId _AsyncOperation | 380 | 0 | 380 |
| AsyncOperationBase | ndx_RequestId_Async Operation | 288 | 0 | 288 |
As you can see the table size had a pretty large increase in size. That makes me wonder how much the indexes associated with that table grew in relationship to the other indexes, so I’m going to query the index physical stats to see if any of these rise to the top. Again I would use this query right after the run, before would also provide you with some interesting perspective as well. This query also puts the results in a pre-created database of your liking it will give you the physical stats of all your indexes of your database (again make sure you run this against your organization database).
select object_name(p.object_id) as ‘table_name’, cast (i.name as char(40)) as 'index_name', p.*
into [500UserRTMStats]..[OrgIndexPhysicalAfterLoadTestOutlookDM]
from sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'LIMITED')p join sys.indexes i on
p.object_id = i.object_id and p.index_id = i.index_id
order by oname
Then I pick the top 20:
SELECT TOP (20) table_name, index_name, index_id, partition_number, index_depth, index_level, avg_fragmentation_in_percent, fragment_count, avg_fragment_size_in_pages, page_count
FROM OrgIndexPhysicalAfterLoadTestOutlookDM
ORDER BY page_count DESC
Column name due to lack of space:
a = table_name
b = index_name
c = index_id
d = partition_number
e = index_depth
f = index_level
g = avg_fragmentation_in_percent
h = fragment_count
i = avg_fragment_size_in_pages
j = page_count