learn-pr/wwl/manage-testing-ai-powered-business-solutions/5-design-test-scenarios-ai-solutions-multiple-dynamics-365-apps.yml
learn-pr/wwl/manage-testing-ai-powered-business-solutions/includes/2-recommend-process-metrics-test-agents.md
24 additions, 24 deletions
@@ -2,11 +2,11 @@
This unit teaches solution architects how to design and implement a structured and repeatable process for testing AI agents before production deployment. Testing ensures that agents operate reliably, meet business requirements, and behave predictably across diverse scenarios. You'll define key performance metrics, establish standardized test plans, and recommend measurement strategies to validate agent quality, usability, and compliance.
-## 1. Testing Framework for AI Agents
+## 1. Testing framework for AI agents
It's important to create a testing framework for all AI agents.
-### 1.1 Establish the Testing Objective
+### 1.1 Establish the testing objective
#### Before testing begins, define the purpose of the test:
@@ -18,7 +18,7 @@ It's important to create a testing framework for all AI Agents.
- Detect issues early and establish a baseline for future performance tuning.
-### 1.2 Develop a Structured Test Plan
+### 1.2 Develop a structured test plan
#### A complete agent testing plan should include:
@@ -30,11 +30,11 @@ It's important to create a testing framework for all AI Agents.
-**Success Criteria** - measurable thresholds for accuracy, speed, safety, and usability.
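A structured test plan with measurable success criteria can be sketched as a simple data structure. This is a minimal illustration, not part of the module; all field names and threshold values below are hypothetical assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class AgentTestPlan:
    """Illustrative agent test plan (all fields are hypothetical)."""
    objective: str
    scenarios: list = field(default_factory=list)
    # Success criteria: measurable thresholds for accuracy, speed, and safety.
    success_criteria: dict = field(default_factory=dict)

plan = AgentTestPlan(
    objective="Validate order-status agent before production",
    scenarios=["order lookup", "refund request", "escalation to human"],
    success_criteria={"accuracy": 0.90, "p95_latency_s": 3.0, "guardrail_violations": 0},
)
print(len(plan.scenarios))  # 3
```

Keeping the plan as data makes it easy to reuse one blueprint across multiple agent implementations.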
-## 2. Recommended Testing Process
+## 2. Recommended testing process
Several types of testing can be performed against AI agents, either manually or through an automated testing process.
-### 2.1 Scenario-Based Testing
+### 2.1 Scenario-based testing
- Use real business workflows that reflect how employees will interact with the agent.
@@ -44,7 +44,7 @@ There are several types of testing which can occur against AI agents. They can b
- Ensure agent output matches expected outcomes for each scenario.
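Scenario-based checks like the one above can be automated by pairing each prompt with its expected outcome. A minimal sketch, assuming a stand-in agent function; the scenarios, prompts, and `fake_agent` helper are all hypothetical:

```python
# Hypothetical scenarios reflecting real business workflows.
scenarios = [
    {"prompt": "Where is order 1001?", "expected": "shipped"},
    {"prompt": "Cancel order 1002", "expected": "cancelled"},
]

def fake_agent(prompt: str) -> str:
    # Stand-in for a real agent call; assumed to return a status keyword.
    return "shipped" if "Where" in prompt else "cancelled"

# Ensure agent output matches the expected outcome for each scenario.
results = [fake_agent(s["prompt"]) == s["expected"] for s in scenarios]
print(all(results))  # True
```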
-### 2.2 Performance and Reliability Testing
+### 2.2 Performance and reliability testing
#### Evaluate how the agent performs under different conditions:
@@ -56,7 +56,7 @@ There are several types of testing which can occur against AI agents. They can b
- Concurrent sessions.
-### 2.3 Safety and Compliance Testing
+### 2.3 Safety and compliance testing
#### Confirm the agent respects enterprise constraints:
@@ -68,7 +68,7 @@ There are several types of testing which can occur against AI agents. They can b
- Rejection of disallowed instructions.
-### 2.4 Usability Testing
+### 2.4 Usability testing
#### Assess agent clarity, helpfulness, and ease of use:
@@ -78,91 +78,91 @@ There are several types of testing which can occur against AI agents. They can b
- Do users understand how to prompt the agent effectively?
-## 3. Metrics to Validate Agent Performance
+## 3. Metrics to validate agent performance
When measuring an AI agent's performance, consider the following metrics.
-### 3.1 Core Quantitative Metrics
+### 3.1 Core quantitative metrics
Use measurable indicators to determine whether the agent is performing optimally.
-#### Accuracy and Relevance
+#### Accuracy and relevance
- Percentage of responses that correctly answer the user's intent.
- Alignment with the expected business process.
-#### Response Time
+#### Response time
- How quickly the agent generates useful answers.
- Variability of response time across different tasks.
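Response-time speed and variability can be summarized with standard statistics. A minimal sketch; the latency values are hypothetical sample data:

```python
import statistics

# Hypothetical response latencies (seconds) collected across test runs.
latencies = [0.8, 1.1, 0.9, 2.4, 1.0, 0.7, 3.2, 1.2]

mean_latency = statistics.mean(latencies)
# Variability across tasks: standard deviation plus a rough 95th percentile.
stdev_latency = statistics.stdev(latencies)
p95 = sorted(latencies)[int(0.95 * (len(latencies) - 1))]
print(round(mean_latency, 2), p95)  # 1.41 2.4
```

Tracking a high percentile alongside the mean surfaces slow outliers that an average alone would hide.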
-#### Success Rate
+#### Success rate
- Percentage of tasks fully completed without human intervention.
-#### Failure Rate
+#### Failure rate
- Incorrect, incomplete, or unusable answers.
- Frequency of unexpected errors or guardrail triggers.
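Success and failure rates are simple ratios over test outcomes. A minimal sketch using hypothetical outcome labels:

```python
# Hypothetical per-task outcomes from one test run.
outcomes = ["success", "success", "failure", "success", "guardrail", "success"]

total = len(outcomes)
# Success: tasks fully completed without human intervention.
success_rate = outcomes.count("success") / total
# Failure: incorrect, incomplete, or guardrail-blocked results.
failure_rate = 1 - success_rate
print(round(success_rate, 2), round(failure_rate, 2))  # 0.67 0.33
```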
-#### Token Efficiency (for generative agents)
+#### Token efficiency (for generative agents)
- Amount of content generated relative to cost.
- Signs of overly verbose or inefficient prompting.
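Token efficiency can be approximated as tokens consumed per useful answer, which flags verbose or inefficient prompting. A minimal sketch; the usage records are hypothetical:

```python
# Hypothetical usage records: tokens consumed per answered task.
runs = [
    {"tokens": 350, "useful": True},
    {"tokens": 1200, "useful": True},
    {"tokens": 900, "useful": False},  # verbose answer that missed the intent
]

total_tokens = sum(r["tokens"] for r in runs)
useful_answers = sum(r["useful"] for r in runs)
# Lower is better: content generated relative to cost.
tokens_per_useful_answer = total_tokens / useful_answers
print(tokens_per_useful_answer)  # 1225.0
```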
-### 3.2 Behavioral and Quality Metrics
+### 3.2 Behavioral and quality metrics
-#### User Satisfaction
+#### User satisfaction
- Survey or rating-based signals.
- Number of escalations or repeated attempts.
-#### Conversation Quality
+#### Conversation quality
- Coherence.
- Step-by-step reasoning quality.
- Ability to interpret follow-up questions.
-#### Knowledge Coverage
+#### Knowledge coverage
- Depth and breadth of domain knowledge.
- Completeness of grounding sources.
- Gaps where the agent fails to retrieve necessary information.
-### 3.3 Observability and Operational Metrics
+### 3.3 Observability and operational metrics
#### Stability
- Sessions completed without interruption.
- Error spikes or instability patterns.
-#### Load Handling
+#### Load handling
- Agent behavior under heavy usage.
- Throughput capacity.
-#### Guardrail Compliance
+#### Guardrail compliance
- Count of prevented actions.
- Instances where the agent approached restricted content.
-## 4. Agent Testing Lifecycle
+## 4. Agent testing lifecycle
:::image type="content" source="../media/agent-testing-lifecycle.png" alt-text="Diagram showing the agent testing lifecycle: Test Planning, Scenario Design, Execution, Measurement, Analysis, Tuning, Re-Test, Approval, and Deployment." border="false":::
-## 5. Recommendations for Solution Architects
+## 5. Recommendations for solution architects
- Create a unified **testing blueprint** used across all agent implementations.
learn-pr/wwl/manage-testing-ai-powered-business-solutions/includes/3-create-validation-criteria-custom-ai-models.md
15 additions, 15 deletions
@@ -14,11 +14,11 @@ Validation criteria help architects consistently confirm that a model is:
- Evaluated consistently before, during, and after deployment.
-## 1. Foundations of Model Validation
+## 1. Foundations of model validation
Model validation establishes whether a custom AI model performs as expected and maintains consistent quality in production.
-### Core Questions for Validation
+### Core questions for validation
-_Does the model generate correct, relevant, grounded outputs?_
@@ -28,7 +28,7 @@ Model validation establishes whether a custom AI model performs as expected and
-_Is model behavior aligned with established business intent and expected outcomes?_
-### Key Validation Dimensions
+### Key validation dimensions
-**Performance metrics**
@@ -40,11 +40,11 @@ Model validation establishes whether a custom AI model performs as expected and
-**User-centric metrics**
-## 2. Define Quantitative Validation Criteria
+## 2. Define quantitative validation criteria
Quantitative criteria ensure measurable and repeatable evaluation during tuning or deployment.
-### Primary Metrics
+### Primary metrics
The following key metrics must be included when evaluating custom AI models.
@@ -55,22 +55,22 @@ Below are key metrics that must be included in the evaluation of custom AI Model
-**Token Efficiency**<br>Amount of model usage cost relative to output quality.
-**Drift Indicators**<br>Changes in output quality due to evolving data or shifting patterns.
-**Relevance and Completeness**<br>Does the model respond with the right level of detail, in context, without hallucinations?
+**Relevance and Completeness**<br>Does the model respond with the right level of detail, in context, without incorrect information?
-**Consistency of Reasoning**<br>Does the model follow logical steps aligned with enterprise workflows?
-**Grounding Integrity**<br>Does the model use approved organizational knowledge?
-**User Experience Quality**<br>Clarity, coherence, readability, and instructional usefulness.
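Quantitative criteria like these are only repeatable if each metric is checked against an explicit threshold. A minimal gating sketch; the metric names, measured values, and thresholds below are all hypothetical assumptions:

```python
# Hypothetical measured metrics for a custom model, checked against thresholds.
measured = {"accuracy": 0.93, "p95_latency_s": 2.1, "grounded_rate": 0.97}
thresholds = {"accuracy": 0.90, "p95_latency_s": 3.0, "grounded_rate": 0.95}

def passes(measured: dict, thresholds: dict) -> bool:
    # Latency must stay below its threshold; other metrics must meet or exceed theirs.
    for name, limit in thresholds.items():
        value = measured[name]
        ok = value <= limit if name.endswith("latency_s") else value >= limit
        if not ok:
            return False
    return True

print(passes(measured, thresholds))  # True
```

Running the same gate before, during, and after deployment gives the consistent evaluation the unit calls for.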
-## 4. Establish Safety and Compliance Validation
+## 4. Establish safety and compliance validation
Before production, custom models must satisfy enterprise governance requirements. Organizations may impose additional requirements; use the following as a neutral baseline.
-### Key Safety Criteria
+### Key safety criteria
- Enforces role-based access to restricted content.
@@ -80,30 +80,30 @@ Before production, custom models must satisfy enterprise governance requirements
- Maintains auditability and traceability of actions.
-### Risk-Mitigation Requirements
+### Risk-mitigation requirements
- Human-in-the-loop review for sensitive workflows.
- Guardrail testing for disallowed instructions.
- Verified grounding exclusively in authorized knowledge sources.
-## 5. Operational Validation Criteria
+## 5. Operational validation criteria
Operational validation ensures the model can be trusted in real systems.
-### Areas to Validate
+### Areas to validate
-**Scalability** - Stable behavior under varying compute and workload patterns.<br>**Resilience** - Recovery from errors, timeouts, or dependency interruptions.<br>**Integration Reliability** - Works consistently with APIs, connectors, or orchestration components.<br>**Monitoring Support** - Telemetry produced is adequate for observability and triage.
-## 6. Example Validation Metrics for Custom Models
+## 6. Example validation metrics for custom models