
A/B Test Results are Inconclusive

I've run an A/B test for a Customer Insights journey; however, it's showing that the test was inconclusive despite Version B being the clear winner. The test ran for 24 hours and there were over 150 contacts in each test group, so I think it's a fair test. Any idea why the results were inconclusive? And what is the definition of statistically significant, since I would've thought a 15% difference is significant?
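As a rough illustration of why a 15% difference can still be inconclusive (a minimal sketch, assuming the 15% is a relative lift on a low base rate; the conversion counts below are hypothetical, not taken from the original test):

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# Hypothetical: 150 contacts per group, 10% vs 12% conversion
# (a 20% relative lift) -- still far from significant.
p = two_proportion_p_value(15, 150, 18, 150)
print(round(p, 2))  # ≈ 0.58, well above the usual 0.05 threshold
```

"Statistically significant" conventionally means this p-value falls below 0.05, i.e. a difference that large would be unlikely to appear by chance alone. With small per-group conversion counts, even a sizeable relative lift produces a large p-value.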
 
 
  • Suggested answer
    Daniyal Khaleel (Most Valuable Professional)
    Here are some likely reasons in your context:
    • Sample size may still be too low given variance. Even with 150 contacts per group, if your underlying conversion rate or behavior is highly variable, 150 might not be enough to reach statistical significance, especially if the baseline rate is low.
    • Noise / randomness / external factors. Over just 24 hours, external events (timing, when people open emails or visit) or random fluctuations may dominate. A/B tests are more reliable over longer durations (to smooth out daily or hourly variation). Many sources recommend running for a full cycle (often a week or more) to avoid transient noise. 
    • Your conversion metric may be uncommon or unstable. If only a small fraction convert (or some have delayed conversions), then many contacts may not yet have “converted,” inflating uncertainty and making significance harder to reach.
    • You may not have predetermined your MDE / power / sample-size calculations. Proper A/B testing best practices require you to decide before the test what Minimum Detectable Effect (MDE) you consider meaningful, estimate required sample size (power), then run until you hit that threshold. Stopping early or after a fixed short time (24h) without that planning can lead to inconclusive results even if a difference appears large. 
    • Statistical power / test assumptions violated. If your data distribution is skewed, or your metric does not meet assumptions underlying standard tests (normality, independence), then significance testing becomes less reliable. Recent research shows that for non-normal or highly variable metrics, very large sample sizes may be needed.
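To make the MDE and power point above concrete, here is a sketch of the standard sample-size formula for comparing two proportions (α = 0.05 two-sided, 80% power; the 10% baseline and 15% relative lift are assumptions for illustration, not figures from the original test):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect a shift from p1 to p2
    with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Assumed 10% baseline with a 15% relative lift (10% -> 11.5%)
print(sample_size_per_group(0.10, 0.115))  # ≈ 6,690 per group
```

Under those assumptions, roughly 6,700 contacts per group would be needed, so 150 per group over 24 hours would be severely underpowered, which is consistent with an inconclusive verdict.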

