Commit 9dcdfe5

Fixing PR Review blocking and non-blocking issues
1 parent e7a917c commit 9dcdfe5

12 files changed: 100 additions & 204 deletions

learn-pr/wwl/manage-testing-ai-powered-business-solutions/5-design-end-to-end-test-scenarios-ai-solutions-use-multiple-dynamics-365-apps.yml renamed to learn-pr/wwl/manage-testing-ai-powered-business-solutions/5-design-test-scenarios-ai-solutions-multiple-dynamics-365-apps.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,5 +1,5 @@
 ### YamlMime:ModuleUnit
-uid: learn.wwl.manage-testing-ai-powered-business-solutions.design-end-to-end-test-scenarios-ai-solutions-use-multiple-dynamics-365-apps
+uid: learn.wwl.manage-testing-ai-powered-business-solutions.design-test-scenarios-ai-solutions-multiple-dynamics-365-apps
 title: "Design end-to-end test scenarios for AI solutions using multiple Dynamics 365 apps"
 metadata:
   title: "Design End-to-End Test Scenarios for AI Solutions"
@@ -10,4 +10,4 @@ metadata:
   ms.topic: unit
 durationInMinutes: 5
 content: |
-  [!include[](includes/5-design-end-to-end-test-scenarios-ai-solutions-use-multiple-dynamics-365-apps.md)]
+  [!include[](includes/5-design-test-scenarios-ai-solutions-multiple-dynamics-365-apps.md)]
```

learn-pr/wwl/manage-testing-ai-powered-business-solutions/includes/2-recommend-process-metrics-test-agents.md

Lines changed: 24 additions & 24 deletions
```diff
@@ -2,11 +2,11 @@

 This unit teaches solution architects how to design and implement a structured and repeatable process for testing AI agents before production deployment. Testing ensures that agents operate reliably, meet business requirements, and behave predictably across diverse scenarios. You'll define key performance metrics, establish standardized test plans, and recommend measurement strategies to validate agent quality, usability, and compliance.

-## 1. Testing Framework for AI Agents
+## 1. Testing framework for AI agents

 It's important to create a testing framework for all AI Agents.

-### 1.1 Establish the Testing Objective
+### 1.1 Establish the testing objective

 #### Before testing begins, define the purpose of the test:

@@ -18,7 +18,7 @@ It's important to create a testing framework for all AI Agents.

 - Detect issues early and establish a baseline for future performance tuning.

-### 1.2 Develop a Structured Test Plan
+### 1.2 Develop a structured test plan

 #### A complete agent testing plan should include:

@@ -30,11 +30,11 @@ It's important to create a testing framework for all AI Agents.

 - **Success Criteria** - measurable thresholds for accuracy, speed, safety, and usability.

-## 2. Recommended Testing Process
+## 2. Recommended testing process

 There are several types of testing which can occur against AI agents. They can be manually tested or through an automated testing process.

-### 2.1 Scenario-Based Testing
+### 2.1 Scenario-based testing

 - Use real business workflows that reflect how employees will interact with the agent.

@@ -44,7 +44,7 @@ There are several types of testing which can occur against AI agents. They can b

 - Ensure agent output matches expected outcomes for each scenario.

-### 2.2 Performance and Reliability Testing
+### 2.2 Performance and reliability testing

 #### Evaluate how the agent performs under different conditions:

@@ -56,7 +56,7 @@ There are several types of testing which can occur against AI agents. They can b

 - Concurrent sessions.

-### 2.3 Safety and Compliance Testing
+### 2.3 Safety and compliance testing

 #### Confirm the agent respects enterprise constraints:

@@ -68,7 +68,7 @@ There are several types of testing which can occur against AI agents. They can b

 - Rejection of disallowed instructions.

-### 2.4 Usability Testing
+### 2.4 Usability testing

 #### Assess agent clarity, helpfulness, and ease of use:

@@ -78,91 +78,91 @@ There are several types of testing which can occur against AI agents. They can b

 - Do users understand how to prompt the agent effectively?

-## 3. Metrics to Validate Agent Performance
+## 3. Metrics to validate agent performance

 When measuring the AI Agent's performance, consider the below metrics.

-### 3.1 Core Quantitative Metrics
+### 3.1 Core quantitative metrics

 Use measurable indicators to determine whether the agent is performing optimally.

-#### Accuracy and Relevance
+#### Accuracy and relevance

 - Percentage of responses that correctly answer the user's intent.

 - Alignment with the expected business process.

-#### Response Time
+#### Response time

 - How quickly the agent generates useful answers.

 - Variability of response time across different tasks.

-#### Success Rate
+#### Success rate

 - Percentage of tasks fully completed without human intervention.

-#### Failure Rate
+#### Failure rate

 - Incorrect, incomplete, or unusable answers.

 - Frequency of unexpected errors or guardrail triggers.

-#### Token Efficiency (for generative agents)
+#### Token efficiency (for generative agents)

 - Amount of content generated relative to cost.

 - Signs of overly verbose or inefficient prompting.

-### 3.2 Behavioral and Quality Metrics
+### 3.2 Behavioral and quality metrics

-#### User Satisfaction
+#### User satisfaction

 - Survey or rating-based signals.

 - Number of escalations or repeated attempts.

-#### Conversation Quality
+#### Conversation quality

 - Coherence.

 - Step-by-step reasoning quality.

 - Ability to interpret follow-up questions.

-#### Knowledge Coverage
+#### Knowledge coverage

 - Depth and breadth of domain knowledge.

 - Completeness of grounding sources.

 - Gaps where the agent fails to retrieve necessary information.

-### 3.3 Observability and Operational Metrics
+### 3.3 Observability and operational metrics

 #### Stability

 - Sessions completed without interruption.

 - Error spikes or instability patterns.

-#### Load Handling
+#### Load handling

 - Agent behavior under heavy usage.

 - Throughput capacity.

-#### Guardrail Compliance
+#### Guardrail compliance

 - Count of prevented actions.

 - Instances where the agent approached restricted content.

-## 4. Agent Testing Lifecycle
+## 4. Agent testing lifecycle

 :::image type="content" source="../media/agent-testing-lifecycle.png" alt-text="Diagram showing the agent testing lifecycle: Test Planning, Scenario Design, Execution, Measurement, Analysis, Tuning, Re-Test, Approval, and Deployment." border="false":::

-## 5. Recommendations for Solution Architects
+## 5. Recommendations for solution architects

 - Create a unified **testing blueprint** used across all agent implementations.

```
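The quantitative metrics defined in section 3 of this file (success rate, failure rate, response time, token efficiency) lend themselves to a small aggregation helper. The sketch below is illustrative only and not part of the commit; the record fields (`completed`, `correct`, `seconds`, `tokens`) are hypothetical names for what a test harness might log per run.

```python
# Minimal sketch (assumed record shape, not from this module): summarize
# agent test runs into the section 3 core quantitative metrics.
from statistics import mean, pstdev

def agent_metrics(runs):
    """Aggregate per-run records into quantitative metrics.

    Each run is a dict with hypothetical fields:
      completed (bool) - task finished without human intervention
      correct (bool)   - output matched the expected outcome
      seconds (float)  - response time
      tokens (int)     - tokens consumed (generative agents)
    """
    n = len(runs)
    times = [r["seconds"] for r in runs]
    return {
        "success_rate": sum(r["completed"] for r in runs) / n,
        "failure_rate": sum(not r["correct"] for r in runs) / n,
        "avg_response_s": mean(times),
        "response_jitter_s": pstdev(times),  # variability across tasks
        "tokens_per_run": sum(r["tokens"] for r in runs) / n,
    }

runs = [
    {"completed": True,  "correct": True,  "seconds": 1.2, "tokens": 350},
    {"completed": True,  "correct": False, "seconds": 2.8, "tokens": 900},
    {"completed": False, "correct": False, "seconds": 4.1, "tokens": 1200},
    {"completed": True,  "correct": True,  "seconds": 1.0, "tokens": 300},
]
print(agent_metrics(runs))  # success_rate 0.75, failure_rate 0.5, ...
```

In practice these figures would come from the automated test runs described in section 2, with thresholds set in the test plan's success criteria.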

learn-pr/wwl/manage-testing-ai-powered-business-solutions/includes/3-create-validation-criteria-custom-ai-models.md

Lines changed: 15 additions & 15 deletions
```diff
@@ -14,11 +14,11 @@ Validation criteria help architects consistently confirm that a model is:

 - Evaluated consistently before, during, and after deployment.

-## 1. Foundations of Model Validation
+## 1. Foundations of model validation

 Model validation establishes whether a custom AI model performs as expected and maintains consistent quality in production.

-### Core Questions for Validation
+### Core questions for validation

 - _Does the model generate correct, relevant, grounded outputs?_

@@ -28,7 +28,7 @@ Model validation establishes whether a custom AI model performs as expected and

 - _Is model behavior aligned with established business intent and expected outcomes?_

-### Key Validation Dimensions
+### Key validation dimensions

 - **Performance metrics**

@@ -40,11 +40,11 @@ Model validation establishes whether a custom AI model performs as expected and

 - **User-centric metrics**

-## 2. Define Quantitative Validation Criteria
+## 2. Define quantitative validation criteria

 Quantitative criteria ensure measurable and repeatable evaluation during tuning or deployment.

-### Primary Metrics
+### Primary metrics

 Below are key metrics that must be included in the evaluation of custom AI Models.

@@ -55,22 +55,22 @@ Below are key metrics that must be included in the evaluation of custom AI Model
 - **Token Efficiency**<br>Amount of model usage cost relative to output quality.
 - **Drift Indicators**<br>Changes in output quality due to evolving data or shifting patterns.

-## 3. Define Qualitative Validation Criteria
+## 3. Define qualitative validation criteria

 Qualitative evaluation helps architects identify nuanced issues that numeric metrics can't capture.

-### Criteria Examples
+### Criteria examples

-- **Relevance and Completeness**<br>Does the model respond with the right level of detail, in context, without hallucinations?
+- **Relevance and Completeness**<br>Does the model respond with the right level of detail, in context, without incorrect information?
 - **Consistency of Reasoning**<br>Does the model follow logical steps aligned with enterprise workflows?
 - **Grounding Integrity**<br>Does the model use approved organizational knowledge?
 - **User Experience Quality**<br>Clarity, coherence, readability, and instructional usefulness.

-## 4. Establish Safety and Compliance Validation
+## 4. Establish safety and compliance validation

 Before production, custom models must satisfy enterprise governance requirements. Depending on the organization, there may be additional requirements. It's important to use the below as a neutral baseline.

-### Key Safety Criteria
+### Key safety criteria

 - Enforces role-based access to restricted content.

@@ -80,30 +80,30 @@ Before production, custom models must satisfy enterprise governance requirements

 - Maintains auditability and traceability of actions.

-### Risk-Mitigation Requirements
+### Risk-mitigation requirements

 - Human-in-the-loop review for sensitive workflows.

 - Guardrail testing for disallowed instructions.

 - Verified grounding exclusively in authorized knowledge sources.

-## 5. Operational Validation Criteria
+## 5. Operational validation criteria

 Operational validation ensures the model can be trusted in real systems.

-### Areas to Validate
+### Areas to validate

 - **Scalability** - Stable behavior under varying compute and workload patterns.<br>**Resilience** - Recovery from errors, timeouts, or dependency interruptions.<br>**Integration Reliability** - Works consistently with APIs, connectors, or orchestration components.<br>**Monitoring Support** - Telemetry produced is adequate for observability and triage.

-## 6. Example Validation Metrics for Custom Models
+## 6. Example validation metrics for custom models

 | Validation Area | Metric / Criteria | Success Threshold |
 |---|---|---|
 | Performance | Latency | < 2 seconds |
 | | Throughput | 95th percentile stable |
 | Quality | Accuracy | ≥ 90% correctness |
-| | Hallucination Rate | ≤ 3% |
+| | Incorrect Information Rate | ≤ 3% |
 | Safety | Guardrail Violations | 0 |
 | | Sensitive Output Detection | 100% blocked |
 | Cost Efficiency | Token Utilization | On par with baseline |
```
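The threshold table in section 6 of this file amounts to a release gate. A minimal sketch of such a gate is shown below; it is not part of the commit, and the metric keys and measured values are hypothetical placeholders for whatever a validation pipeline reports.

```python
# Illustrative sketch (hypothetical names): gate a custom model release on
# thresholds like those in the section 6 example table.
def validate(results):
    """Return the names of checks that fail for a dict of measured results."""
    checks = [
        ("latency_s",            lambda v: v < 2.0),    # Performance: < 2 seconds
        ("accuracy",             lambda v: v >= 0.90),  # Quality: >= 90% correctness
        ("incorrect_rate",       lambda v: v <= 0.03),  # Quality: <= 3%
        ("guardrail_violations", lambda v: v == 0),     # Safety: zero violations
        ("sensitive_blocked",    lambda v: v == 1.0),   # Safety: 100% blocked
    ]
    return [name for name, ok in checks if not ok(results[name])]

measured = {
    "latency_s": 1.4,
    "accuracy": 0.93,
    "incorrect_rate": 0.05,  # over the 3% ceiling, so this check fails
    "guardrail_violations": 0,
    "sensitive_blocked": 1.0,
}
print(validate(measured))  # -> ['incorrect_rate']
```

An empty result would mean every criterion in the table is met; any returned name identifies a validation area needing remediation before deployment.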
