
A/B Test Results are Inconclusive

I've run an A/B test for a Customer Insights journey; however, it's showing that the test was inconclusive despite Version B being the clear winner. The test ran for 24 hours and there were over 150 contacts in each test group, so I think it's a fair test. Any idea why the results were inconclusive? And what is the definition of statistically significant, since I would've thought a 15% difference is significant?
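As a rough illustration of why a 15% difference can still be inconclusive (a minimal sketch, assuming the 15% is a relative lift on a low base rate; the conversion counts below are hypothetical, not taken from the original test):

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# Hypothetical: 150 contacts per group, 10% vs 12% conversion
# (a 20% relative lift) -- still far from significant.
p = two_proportion_p_value(15, 150, 18, 150)
print(round(p, 2))  # ≈ 0.58, well above the usual 0.05 threshold
```

"Statistically significant" conventionally means this p-value falls below 0.05, i.e. a difference that large would be unlikely to appear by chance alone. With small per-group conversion counts, even a sizeable relative lift produces a large p-value.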
 
 
  • Suggested answer
    Daniyal Khaleel (Most Valuable Professional)
    Here are some likely reasons in your context:
    • Sample size may still be too low given variance. Even with 150 contacts per group, if your underlying conversion rate or behavior is highly variable, 150 might not be enough to reach statistical significance, especially if the baseline rate is low.
    • Noise / randomness / external factors. Over just 24 hours, external events (timing, when people open emails or visit) or random fluctuations may dominate. A/B tests are more reliable over longer durations (to smooth out daily or hourly variation). Many sources recommend running for a full cycle (often a week or more) to avoid transient noise. 
    • Your conversion metric may be uncommon or unstable. If only a small fraction convert (or some have delayed conversions), then many contacts may not yet have “converted,” inflating uncertainty and making significance harder to reach.
    • You may not have predetermined your MDE / power / sample-size calculations. Proper A/B testing best practices require you to decide before the test what Minimum Detectable Effect (MDE) you consider meaningful, estimate required sample size (power), then run until you hit that threshold. Stopping early or after a fixed short time (24h) without that planning can lead to inconclusive results even if a difference appears large. 
    • Statistical power / test assumptions violated. If your data distribution is skewed, or your metric does not meet assumptions underlying standard tests (normality, independence), then significance testing becomes less reliable. Recent research shows that for non-normal or highly variable metrics, very large sample sizes may be needed.
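To make the MDE and power point above concrete, here is a sketch of the standard sample-size formula for comparing two proportions (α = 0.05 two-sided, 80% power; the 10% baseline and 15% relative lift are assumptions for illustration, not figures from the original test):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect a shift from p1 to p2
    with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Assumed 10% baseline with a 15% relative lift (10% -> 11.5%)
print(sample_size_per_group(0.10, 0.115))  # ≈ 6,690 per group
```

Under those assumptions, roughly 6,700 contacts per group would be needed, so 150 per group over 24 hours would be severely underpowered, which is consistent with an inconclusive verdict.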

