Quick start: KQL helper
Add this as a base query in your Azure Data Explorer workbook. It unions traces and customEvents, then exposes common fields used by the visuals.

// ------ Global parameters ------
let _lookback = 7d; // time window for analysis
let _bin = 1h;      // chart bin (10m/30m/1h)
// ------ Helper: DataverseThrottle ------
let DataverseThrottle = () {
    union isfuzzy=true
    (
        traces
        | where timestamp > ago(_lookback)
        | where tostring(customDimensions.throttlingAction) =~ "Throttle"
            or tostring(customDimensions.ThrottleAction) =~ "Throttle"
    ),
    (
        customEvents
        | where timestamp > ago(_lookback)
        | where name =~ "QueryThrottled"
            or tostring(customDimensions.throttlingAction) =~ "Throttle"
    )
    | extend cd = todynamic(customDimensions)
    | extend
        command = tostring(cd.Command),
        cdsQueryHash = tostring(cd.CdsQueryHash),
        delayMs = tolong(cd.throttlingDelayMilliseconds),
        probability = todouble(cd.throttleProbabilityPercentage),
        reason = tostring(cd.throttleReason),
        expiryTime = todatetime(cd.throttleExpiryTime),
        environmentId = tostring(cd.environmentId),
        organization = tostring(cd.organizationUrl),
        userId = tostring(user_Id),
        opId = tostring(operation_Id),
        opName = tostring(operation_Name),
        sessionId = tostring(session_Id)
    | project timestamp, command, cdsQueryHash, delayMs, probability, reason,
        expiryTime, environmentId, organization, userId, opId, opName, sessionId, itemType
};

Explore the data: ready-to-paste queries
1) Throttles over time (with average delay)
Purpose: Show the volume and severity of throttling over time so you can spot patterns, regressions, and verify improvements after changes.
How it works: Counts throttle events per time bin and computes the average throttle delay within each bin. (The bin defaults to 1h; change as appropriate.)
Interpretation:
Rising Throttles means more frequent throttling; rising AvgDelayMs means more impactful throttling. Both should decline after query optimizations.
Use a time chart (line/area) and plot both series.

DataverseThrottle
| summarize Throttles = count(), AvgDelayMs = avg(delayMs) by bin(timestamp, 1h)
| order by timestamp asc

2) Top throttled queries (by hash) + sample text
Purpose: Identify which logical queries cause the most throttling so you can prioritize remediation.
How it works: Groups by a stable query hash and shows total throttles, average delay, and one representative command text.
Interpretation: Start with the top hashes; high AvgDelayMs indicates heavier impact per event.
Start remediation here.

DataverseThrottle
| summarize Throttles = count(), AvgDelayMs = round(avg(delayMs), 1),
    AnySampleCommand = take_any(command) by cdsQueryHash
| top 15 by Throttles desc

3) Reasons for throttling
Purpose: Explain why throttling occurred to guide the right fix (e.g., concurrency vs. cost).
How it works: Aggregates the reason field and ranks by count.
Interpretation: Dominant reasons point to the governing policy or resource under pressure.
Distribution of “why”.

DataverseThrottle
| summarize Throttles = count() by tostring(reason)
| top 10 by Throttles desc

4) Delay percentiles (impact)
Purpose: Quantify how much throttling slows requests for typical users and the long tail.
How it works: Computes min/max and percentiles of the applied delay across all throttle events in the window.
Interpretation: High P95/P99 means a subset of requests are significantly delayed; track these post-remediation.
Single-value tiles or a compact table.

DataverseThrottle
| summarize
    MinDelayMs = min(delayMs),
    P50DelayMs = percentile(delayMs, 50),
    P90DelayMs = percentile(delayMs, 90),
    P95DelayMs = percentile(delayMs, 95),
    P99DelayMs = percentile(delayMs, 99),
    MaxDelayMs = max(delayMs)

5) Hot windows (spike detection)
Purpose: Alert on statistically significant spikes so teams can respond quickly.
How it works: Builds a 7-bin simple moving average using prev() and flags bins where current > 2x baseline and above a minimum count.
Interpretation: Each resulting row is a spike timestamp. Drill into correlating queries, reasons, and callers for RCA.
Cluster-agnostic rolling baseline using prev().

let _multiplier = 2.0; // spike sensitivity (current > 2x baseline)
let _minCount = 5;     // ignore tiny blips
DataverseThrottle
| summarize Throttles = count() by timestamp = bin(timestamp, 1h)
| order by timestamp asc
| serialize
| extend p1 = prev(Throttles,1), p2 = prev(Throttles,2), p3 = prev(Throttles,3),
         p4 = prev(Throttles,4), p5 = prev(Throttles,5), p6 = prev(Throttles,6)
| where isnotnull(p6)
| extend MovingAvg = (todouble(Throttles) + todouble(p1) + todouble(p2) + todouble(p3)
                    + todouble(p4) + todouble(p5) + todouble(p6)) / 7.0
| where Throttles > _multiplier * MovingAvg and Throttles > _minCount
| project timestamp, Throttles, MovingAvg

6) Most-affected callers / operations
Purpose: Identify which users, operations, or sessions are most impacted for targeted fixes and communication.
How it works: Groups by the first non-null identifier among userId, operation name, and sessionId; computes counts and average delay.
Interpretation: High counts and large AvgDelayMs indicate prime candidates for optimization or scheduling changes.
DataverseThrottle
| summarize Throttles = count(), AvgDelayMs = avg(delayMs)
    by Caller = coalesce(userId, opName, sessionId)
| top 15 by Throttles desc

7) Recent detailed samples (triage)
Purpose: Provide a short, actionable list of the latest throttle events for on-call and RCA.
How it works: Projects key fields (reason, delay, expiry, command) and sorts by time descending.
Interpretation: Use the command snippet to trace the source; expiry helps estimate when pressure will ease.
DataverseThrottle
| project timestamp, reason, delayMs, probability, expiryTime,
opId, opName, userId,
Command = substring(command, 0, 800)
| order by timestamp desc
| take 100
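
When a sample looks suspicious, its opId (operation_Id) can be used to pull the full telemetry timeline for that operation. The sketch below assumes the standard Application Insights tables (traces, requests, dependencies, exceptions) are available in your workspace; the _opId value is a placeholder to replace with an operation_Id taken from the samples above.

```kql
// Sketch: full telemetry timeline for one operation (placeholder _opId).
let _opId = "<operation_Id from the samples>";
union isfuzzy=true traces, requests, dependencies, exceptions
| where operation_Id == _opId
| order by timestamp asc
| project timestamp, itemType, operation_Name
```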
8) Environment / org breakdown (optional, multi-tenant)
Purpose: Show where throttling is concentrated across environments or organization URLs for multi-tenant teams.
How it works: Groups by environment and organization, computing counts and average delay; returns the top 20.
Interpretation: Hotspots indicate localized workload or configuration issues; focus remediation there first.
DataverseThrottle
| summarize Throttles = count(), AvgDelayMs = avg(delayMs)
    by tostring(environmentId), tostring(organization)
| top 20 by Throttles desc

Build a dashboard / workbook
Follow the instructions and use the downloadable JSON file to install the dashboard:
Dynamics-365-FastTrack-Implementation-Assets/MonitoringAndTelemetry/DataverseThrottling at master · microsoft/Dynamics-365-FastTrack-Implementation-Assets

How to act on the findings
- Reduce query cost: fetch fewer columns, add filters, avoid broad joins, paginate.
- Reschedule heavy jobs: move bulk loads/exports/reporting to off-peak hours.
- Cache repeat reads: use app/service caching for hot paths.
- Work the list: start with Top Hashes, confirm Reasons, validate user impact via Percentiles, and watch Trend/Spikes trend down after fixes.
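
One way to confirm a fix has landed is to track a single hash over time and watch Throttles and P95 delay decline. This is a sketch; the _hash literal is a placeholder to fill in with a real value from the Top Hashes query.

```kql
// Sketch: daily trend for one query hash (placeholder _hash).
let _hash = "<cdsQueryHash from the Top Hashes query>";
DataverseThrottle
| where cdsQueryHash == _hash
| summarize Throttles = count(), P95DelayMs = percentile(delayMs, 95)
    by bin(timestamp, 1d)
| order by timestamp asc
```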
Alerting (optional)
- Sustained spikes: count() of throttles above a threshold per period.
- Single noisy query: a specific cdsQueryHash crossing N events in M minutes.
- Rising severity: AvgDelayMs or P95 delay above target.
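
The "single noisy query" alert can be sketched as a log-search alert query against the helper. The _window and _threshold values below are illustrative placeholders, not recommendations; tune them to your workload.

```kql
// Sketch: fire when any one hash exceeds N throttles in M minutes.
let _window = 15m;   // M minutes (placeholder)
let _threshold = 50; // N events (placeholder)
DataverseThrottle
| where timestamp > ago(_window)
| summarize Throttles = count() by cdsQueryHash
| where Throttles > _threshold
```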
FAQ
1. Does this replace API throttling?
No. Query throttling governs work we accept, while API throttling protects the service at the boundary (rate/concurrency). You need both layers for stability; this telemetry helps diagnose and tune queries once requests are inside.
2. Does the telemetry add overhead?
It’s lightweight, similar to other App Insights traces. The benefit—faster time-to-root-cause—far outweighs the cost.
3. Will field names change?
It’s a preview; minor name changes are possible. Use the helper function to normalize.
Closing
This preview closes one of the biggest black boxes in Dataverse performance. Turn it on, paste the queries, and give your teams the visibility to self-diagnose and fix throttling quickly.