Agent Evaluation Series Part 2: Sets, Metrics, & Evaluation Blueprint | FastTrack TechTalk | Dynamics 365

Alejandra Cabrales ... Microsoft Employee

Building AI agents in Dynamics 365 is no longer just about functionality—it’s about proving reliability, safety, and performance at scale.

In this TechTalk, Microsoft FastTrack architects walk through how to move from evaluation theory to real-world implementation. You’ll see how to design effective evaluation sets, choose the right metrics, and use an Evaluation Design Document (EDD) to ensure your agents behave as expected in production.

This session includes a real-world Dynamics 365 onboarding agent scenario, showcasing how AI-driven workflows introduce new risks—and how structured evaluation prevents costly failures.

Companion deck: EvalDesign.pdf

What you’ll learn:

Why traditional testing models don’t work for AI agents
How to build evaluation scenarios (including edge cases that matter most)
The role of synthetic vs real-world data in testing
How to select meaningful metrics for single-turn vs multi-turn agents
Why an Evaluation Design Document (EDD) is critical for governance, risk, and scale

If you’re working with Copilot, AI agents, or automation in Dynamics 365, this session will help you design evaluations that actually catch issues before users do.

View all the FastTrack Agent Evaluation Series recordings
