Commit f69584c

Merge pull request #53712 from MicrosoftDocs/NEW-module9-manage2
Release branch to Main (module 9)
2 parents ad554ec + 39c7480 commit f69584c

21 files changed

Lines changed: 949 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.introduction
title: "Introduction"
metadata:
  title: "Introduction"
  description: "Learn essential practices to validate and maintain the quality of AI-powered business solutions with structured testing and governance."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.recommend-process-metrics-test-agents
title: "Recommend process metrics for testing AI agents"
metadata:
  title: "Recommend Process Metrics for Testing AI Agents"
  description: "Learn how to recommend process metrics for testing AI agents to ensure reliability, usability, and compliance before production deployment."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/2-recommend-process-metrics-test-agents.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.create-validation-criteria-custom-ai-models
title: "Create validation criteria for custom AI models"
metadata:
  title: "Create Validation Criteria for Custom AI Models"
  description: "Learn how to define robust validation criteria for custom AI models to ensure accuracy, reliability, safety, and scalability in enterprise environments."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/3-create-validation-criteria-custom-ai-models.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.validate-effective-copilot-prompt-best-practices
title: "Validate effective Copilot prompt best practices"
metadata:
  title: "Validate Effective Copilot Prompt Best Practices"
  description: "Learn how to validate effective Copilot prompt best practices to ensure clarity, safety, and high-quality AI output for enterprise workflows."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/4-validate-effective-copilot-prompt-best-practices.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.design-test-scenarios-ai-solutions-multiple-dynamics-365-apps
title: "Design end-to-end test scenarios for AI solutions using multiple Dynamics 365 apps"
metadata:
  title: "Design End-to-End Test Scenarios for AI Solutions"
  description: "Learn how to design end-to-end test scenarios for AI solutions that integrate multiple Dynamics 365 apps, ensuring seamless workflows and accurate outputs."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/5-design-test-scenarios-ai-solutions-multiple-dynamics-365-apps.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.build-strategy-creating-test-cases-using-copilot
title: "Build a strategy for creating test cases using Copilot"
metadata:
  title: "Build a Strategy for Creating Test Cases Using Copilot"
  description: "Learn to build a scalable strategy for creating high-quality test cases using Copilot. Strengthen reliability, coverage, and consistency in AI solutions."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/6-build-strategy-creating-test-cases-using-copilot.md)]
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.knowledge-check
title: "Module assessment"
metadata:
  title: "Knowledge check"
  description: "Knowledge check"
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
  module_assessment: false
durationInMinutes: 3
content: "Choose the best response for each of the following questions."
quiz:
  questions:
  - content: "Which statement best explains why traditional software testing isn't sufficient for AI-powered business solutions?"
    choices:
    - content: "AI solutions require more UI automation than traditional apps."
      isCorrect: false
      explanation: "Incorrect. While UI automation may be part of testing, it doesn't address the unique challenges of AI systems."
    - content: "AI outputs are probabilistic and can vary based on context, data, and input phrasing."
      isCorrect: true
      explanation: "Correct. AI models produce probabilistic, context-dependent outputs. Two similar inputs can generate different results depending on data grounding, prompt variations, or system state. This variability requires new testing approaches that validate behavior patterns, safety, guardrails, grounding integrity, and consistency—areas traditional deterministic software testing doesn't fully cover."
    - content: "AI systems don't require validation of compliance or safety."
      isCorrect: false
      explanation: "Incorrect. AI systems often require rigorous validation for compliance and safety, especially in regulated industries."
    - content: "AI solutions only need to be tested once before deployment."
      isCorrect: false
      explanation: "Incorrect. AI solutions require continuous testing and monitoring to ensure consistent performance and alignment with business goals."
  - content: "Which metric is most important when validating whether an AI solution is producing outputs aligned with business outcomes?"
    choices:
    - content: "Length of the conversation"
      isCorrect: false
      explanation: "Incorrect. Conversation length doesn't measure the quality or alignment of AI outputs with business outcomes."
    - content: "Number of users interacting with the AI"
      isCorrect: false
      explanation: "Incorrect. While user engagement is important, it doesn't directly validate the accuracy or relevance of AI outputs."
    - content: "Accuracy and relevance of the AI's responses"
      isCorrect: true
      explanation: "Correct. To ensure an AI solution is delivering value, you must confirm that the outputs are accurate, relevant, and aligned with business intent. This directly validates whether the AI is producing correct insights or actions that support real business workflows."
    - content: "Frequency of UI updates"
      isCorrect: false
      explanation: "Incorrect. UI updates are unrelated to the validation of AI output quality or business alignment."
  - content: "Why must end-to-end test scenarios for AI solutions validate cross-app data flow across multiple Dynamics 365 applications?"
    choices:
    - content: "Because each Dynamics 365 app requires a separate AI model."
      isCorrect: false
      explanation: "Incorrect. AI solutions can share models across apps. The need for cross-app testing is driven by data dependencies, not separate models."
    - content: "Because AI output quality depends on consistent, trusted, and well-timed input data from across integrated systems."
      isCorrect: true
      explanation: "Correct. AI decisions depend on data flowing across multiple apps. Workflow orchestration can break when one app changes, data may be duplicated or transformed, and AI output quality relies on consistent, trusted, well-timed input data. Testing must validate the entire business process to ensure accurate outputs."
    - content: "Because testing individual apps isn't possible with AI solutions."
      isCorrect: false
      explanation: "Incorrect. Individual app testing is still possible, but it doesn't validate the end-to-end behavior of AI solutions that span multiple systems."
    - content: "Because Dynamics 365 apps can't share data without manual intervention."
      isCorrect: false
      explanation: "Incorrect. Dynamics 365 apps integrate through connectors, data sync, and automations. The challenge is validating that these integrations work correctly for AI-driven workflows."
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-testing-ai-powered-business-solutions.summary
title: "Summary"
metadata:
  title: "Summary"
  description: "Discover key practices for testing AI solutions, ensuring reliability, safety, and performance across dynamic business environments."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/8-summary.md)]
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
This module introduces solution architects to the essential practices required to validate and maintain the quality of AI-powered business solutions across the enterprise. Because AI systems generate probabilistic outputs and rely on dynamic data sources, traditional testing methods aren't sufficient. This module equips learners with the frameworks, metrics, and governance needed to ensure AI solutions behave reliably, safely, and in alignment with business goals.

Learners will explore how to design structured testing processes for agents, custom AI models, prompts, and end-to-end multi-application scenarios. Each unit provides practical guidance on defining objectives, creating measurable validation criteria, evaluating safety and compliance, and understanding how data flows across integrated business applications affect AI behavior.

The module emphasizes measurable performance indicators such as accuracy, latency, stability, guardrail adherence, and user experience quality. Learners also gain strategies for validating prompt design, assessing grounding integrity, and ensuring predictable AI reasoning across varied scenarios and user types.

Finally, the module introduces scalable testing strategies using Copilot to accelerate test case creation while maintaining consistency, coverage, and governance. By the end, solution architects will be able to design repeatable testing frameworks that ensure AI solutions remain trustworthy, resilient, and aligned to enterprise requirements throughout their lifecycle.
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
## Overview

This unit teaches solution architects how to design and implement a structured, repeatable process for testing AI agents before production deployment. Testing ensures that agents operate reliably, meet business requirements, and behave predictably across diverse scenarios. You'll define key performance metrics, establish standardized test plans, and recommend measurement strategies to validate agent quality, usability, and compliance.

## 1. Testing framework for AI agents

Establish a consistent testing framework for all AI agents.

### 1.1 Establish the testing objective

Before testing begins, define the purpose of the test:

- Validate the agent's ability to meet the intended business outcome.
- Ensure accuracy and consistency across scenarios.
- Verify that guardrails, data boundaries, and compliance policies operate correctly.
- Detect issues early and establish a baseline for future performance tuning.
### 1.2 Develop a structured test plan

A complete agent testing plan should include:

- **Test scope** - features, workflows, channels, and scenarios.
- **Test data** - representative prompts, business cases, and realistic contextual inputs.
- **Test roles** - who executes tests, who validates behavioral output, who documents findings.
- **Success criteria** - measurable thresholds for accuracy, speed, safety, and usability.
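The four plan elements above can be captured as a typed structure so every agent test run starts from the same template. The following is a minimal sketch; all field names and threshold values are illustrative assumptions, not part of any Microsoft product API:

```python
from dataclasses import dataclass, field

@dataclass
class SuccessCriteria:
    # Illustrative thresholds the agent must meet before approval.
    min_accuracy: float = 0.90        # fraction of responses matching user intent
    max_response_seconds: float = 5.0
    max_guardrail_violations: int = 0

@dataclass
class AgentTestPlan:
    # Hypothetical container for the four plan elements: scope, data,
    # roles, and success criteria.
    name: str
    scope: list[str] = field(default_factory=list)       # features, workflows, channels
    test_data: list[str] = field(default_factory=list)   # representative prompts
    roles: dict[str, str] = field(default_factory=dict)  # activity -> owner
    criteria: SuccessCriteria = field(default_factory=SuccessCriteria)

# Example plan for a hypothetical order-status agent.
plan = AgentTestPlan(
    name="Order-status agent v1",
    scope=["order lookup", "Teams channel"],
    test_data=["Where is order 1234?", "Cancel my last order"],
    roles={"executor": "QA", "validator": "Solution architect"},
)
```

A shared template like this makes success criteria explicit up front and comparable across agent implementations.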
## 2. Recommended testing process

Several types of testing apply to AI agents, and each can be run manually or through an automated testing process.

### 2.1 Scenario-based testing

- Use real business workflows that reflect how employees will interact with the agent.
- Include ambiguous, incomplete, and varied user inputs.
- Validate multi-turn reasoning, memory handling, and follow-up behavior.
- Ensure agent output matches expected outcomes for each scenario.
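A scenario suite like the one described above can be expressed as data plus a small harness. The sketch below uses a hypothetical `stub_agent` stand-in; in practice you would call your deployed agent's API instead:

```python
# Stub standing in for a real agent endpoint (assumption for illustration).
def stub_agent(prompt: str) -> str:
    if "order" in prompt.lower():
        return "Your order 1234 shipped on Tuesday."
    return "I can help with order status questions."

# Each scenario pairs a realistic input (including ambiguous or incomplete
# phrasing) with terms the response must contain to count as on-topic.
SCENARIOS = [
    ("Where's my order?", ["order"]),
    ("status pls", ["order"]),             # incomplete input
    ("WHERE IS ORDER 1234???", ["1234"]),  # varied phrasing
]

def run_scenarios(agent) -> list[tuple[str, bool]]:
    """Run every scenario and record whether the reply covered the
    expected terms."""
    results = []
    for prompt, expected_terms in SCENARIOS:
        reply = agent(prompt).lower()
        passed = all(term.lower() in reply for term in expected_terms)
        results.append((prompt, passed))
    return results
```

Keeping scenarios as data makes it easy to grow the suite as new workflows, channels, and edge cases are identified.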
### 2.2 Performance and reliability testing

Evaluate how the agent performs under different conditions:

- High request volume.
- Long interactions.
- Complex multi-step tasks.
- Concurrent sessions.
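A lightweight way to probe high volume and concurrency is to replay prompts in parallel and record latency percentiles. This sketch simulates the agent call with a fixed sleep; swap in a real request to your agent to measure actual behavior:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def call_agent(prompt: str) -> float:
    """Stub call returning elapsed seconds; replace the sleep with a
    real agent request (the 10 ms latency is a simulation assumption)."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated model latency
    return time.perf_counter() - start

def load_test(prompts, concurrency=8):
    """Send prompts concurrently and summarize latency distribution."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(call_agent, prompts))
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
        "max": latencies[-1],
    }
```

Tracking p95 and max rather than the average surfaces the tail latencies that long interactions and concurrent sessions tend to expose.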
### 2.3 Safety and compliance testing

Confirm the agent respects enterprise constraints:

- Sensitive data protection.
- Role-based access rules.
- Policy triggers (such as restricted actions or DLP rules).
- Rejection of disallowed instructions.

### 2.4 Usability testing

Assess agent clarity, helpfulness, and ease of use:

- Are answers concise, accurate, and understandable?
- Does the agent require excessive refinement?
- Do users understand how to prompt the agent effectively?
## 3. Metrics to validate agent performance

When measuring an AI agent's performance, consider the following metrics.

### 3.1 Core quantitative metrics

Use measurable indicators to determine whether the agent is performing optimally.

#### Accuracy and relevance

- Percentage of responses that correctly answer the user's intent.
- Alignment with the expected business process.

#### Response time

- How quickly the agent generates useful answers.
- Variability of response time across different tasks.

#### Success rate

- Percentage of tasks fully completed without human intervention.

#### Failure rate

- Incorrect, incomplete, or unusable answers.
- Frequency of unexpected errors or guardrail triggers.

#### Token efficiency (for generative agents)

- Amount of content generated relative to cost.
- Signs of overly verbose or inefficient prompting.
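Several of these quantitative metrics can be derived from a single list of per-test results. The aggregator below is illustrative; the result field names are assumptions, not a standard telemetry schema:

```python
def summarize(results):
    """Compute core quantitative metrics from per-test results.

    Each result is a dict with illustrative keys:
      'correct' (bool)             - response matched the user's intent
      'completed' (bool)           - task finished without human intervention
      'latency' (float, seconds)   - time to a useful answer
      'guardrail_triggered' (bool) - a guardrail fired during the test
    """
    n = len(results)
    return {
        "accuracy": sum(r["correct"] for r in results) / n,
        "success_rate": sum(r["completed"] for r in results) / n,
        "failure_rate": sum(not r["correct"] for r in results) / n,
        "avg_latency": sum(r["latency"] for r in results) / n,
        "guardrail_triggers": sum(r["guardrail_triggered"] for r in results),
    }
```

Computing all metrics from one result log keeps accuracy, success rate, and latency comparable across test runs and releases.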
### 3.2 Behavioral and quality metrics

#### User satisfaction

- Survey or rating-based signals.
- Number of escalations or repeated attempts.

#### Conversation quality

- Coherence.
- Step-by-step reasoning quality.
- Ability to interpret follow-up questions.

#### Knowledge coverage

- Depth and breadth of domain knowledge.
- Completeness of grounding sources.
- Gaps where the agent fails to retrieve necessary information.
### 3.3 Observability and operational metrics

#### Stability

- Sessions completed without interruption.
- Error spikes or instability patterns.

#### Load handling

- Agent behavior under heavy usage.
- Throughput capacity.

#### Guardrail compliance

- Count of prevented actions.
- Instances where the agent approached restricted content.
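Stability and guardrail metrics like these typically come from tallying runtime telemetry. The event names below are assumptions for illustration, not a real agent-runtime schema:

```python
from collections import Counter

# Hypothetical telemetry events emitted during a test run.
EVENTS = [
    {"type": "session_completed"},
    {"type": "guardrail_blocked", "rule": "dlp"},
    {"type": "error"},
    {"type": "session_completed"},
    {"type": "guardrail_blocked", "rule": "restricted_action"},
]

def operational_summary(events):
    """Tally event types so stability and guardrail-compliance counts
    are visible per run (event names are illustrative assumptions)."""
    counts = Counter(e["type"] for e in events)
    return {
        "completed_sessions": counts["session_completed"],
        "errors": counts["error"],
        "guardrail_blocks": counts["guardrail_blocked"],
    }
```

Comparing these counts across runs makes error spikes and unusual guardrail activity easy to spot before deployment.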
## 4. Agent testing lifecycle

:::image type="content" source="../media/agent-testing-lifecycle.png" alt-text="Diagram showing the agent testing lifecycle: Test Planning, Scenario Design, Execution, Measurement, Analysis, Tuning, Re-Test, Approval, and Deployment." border="false":::

## 5. Recommendations for solution architects

- Create a unified **testing blueprint** used across all agent implementations.
- Maintain a **centralized log** of test results for comparison across releases.
- Incorporate **automation** where possible, including repeatable scripts for standard interactions.
- Establish governance checkpoints before each deployment.
- Pair telemetry insights with qualitative feedback to drive continuous improvement.
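The centralized-log recommendation can be as simple as an append-only JSON Lines file plus a release-over-release comparison. This is a hedged sketch; the file format, metric names, and the 0.02 regression threshold are illustrative choices:

```python
import json

def log_run(path: str, release: str, metrics: dict) -> None:
    """Append one release's test metrics to a JSON Lines log
    (hypothetical schema) so results stay comparable across releases."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"release": release, **metrics}) + "\n")

def regressed(path: str, metric: str, threshold: float = 0.02) -> bool:
    """Return True if the newest run dropped `metric` by more than
    `threshold` relative to the previous run."""
    with open(path, encoding="utf-8") as f:
        runs = [json.loads(line) for line in f]
    if len(runs) < 2:
        return False
    return runs[-2][metric] - runs[-1][metric] > threshold
```

A simple gate like `regressed(log, "accuracy")` can serve as one of the governance checkpoints before each deployment.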
## References

[Conversational agents performance testing](/microsoft-copilot-studio/guidance/conversational-agents-performance-testing)
