
Commit 18f62c9

fix issues
1 parent 394edc3 commit 18f62c9

8 files changed

Lines changed: 20 additions & 20 deletions


Lines changed: 3 additions & 3 deletions
@@ -1,14 +1,14 @@
 ### YamlMime:ModuleUnit
 uid: learn.wwl.evaluate-optimize-agents.git-based-experimentation-workflow
 metadata:
-  title: Apply git-based workflows to optimization experiments
-  description: Learn how to organize agent optimization experiments using git branches, systematic testing scripts, and documented evaluation results for reproducible comparisons.
+  title: Apply Git-based workflows to optimization experiments
+  description: Learn how to organize agent optimization experiments using Git branches, systematic testing scripts, and documented evaluation results for reproducible comparisons.
   ms.date: 02/17/2026
   author: madiepev
   ms.author: madiepev
   ms.topic: unit
   ai-usage: ai-generated
-title: Apply git-based workflows to optimization experiments
+title: Apply Git-based workflows to optimization experiments
 durationInMinutes: 8
 content: |
   [!include[](includes/3-git-based-experimentation-workflow.md)]

learn-pr/wwl-data-ai/evaluate-optimize-agents/6-knowledge-check.yml

Lines changed: 3 additions & 3 deletions
@@ -3,7 +3,7 @@ uid: learn.wwl.evaluate-optimize-agents.knowledge-check
 title: Knowledge check
 metadata:
   title: Knowledge check
-  description: "Test your understanding of agent evaluation experiments, git-based workflows, and evaluation rubrics."
+  description: "Test your understanding of agent evaluation experiments, Git-based workflows, and evaluation rubrics."
   ms.date: 02/17/2026
   author: madiepev
   ms.author: madiepev
@@ -17,11 +17,11 @@ content: |
 quiz:
   title: "Check your knowledge"
   questions:
-  - content: "What is the primary reason for organizing agent optimization experiments into separate git branches?"
+  - content: "What is the primary reason for organizing agent optimization experiments into separate Git branches?"
     choices:
     - content: "To enable parallel development by multiple team members simultaneously"
       isCorrect: false
-      explanation: "Incorrect: While git supports parallel development, the primary reason for separate experiment branches is controlled comparison, not collaboration."
+      explanation: "Incorrect: While Git supports parallel development, the primary reason for separate experiment branches is controlled comparison, not collaboration."
     - content: "To isolate specific changes and attribute performance differences to individual modifications"
       isCorrect: true
       explanation: "Correct: Separate branches enable controlled comparison by testing one change at a time, making it clear which modification caused observed performance differences."

learn-pr/wwl-data-ai/evaluate-optimize-agents/7-summary.yml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 uid: learn.wwl.evaluate-optimize-agents.summary
 metadata:
   title: Summary
-  description: Summary of key takeaways for structured agent optimization including evaluation metrics, git-based workflows, consistent scoring rubrics, and evidence-based decision making.
+  description: Summary of key takeaways for structured agent optimization including evaluation metrics, Git-based workflows, consistent scoring rubrics, and evidence-based decision making.
   ms.date: 02/17/2026
   author: madiepev
   ms.author: madiepev

learn-pr/wwl-data-ai/evaluate-optimize-agents/includes/2-design-evaluation-experiments.md

Lines changed: 5 additions & 5 deletions
@@ -65,21 +65,21 @@ Representative test prompts cover the spectrum of real-world usage. For the Adve
 
 - **Digital nomads planning weekend hikes**: "I'm hiking in the Scottish Highlands in March, what waterproof gear do I need from Adventure Works?"
 - **Families preparing for their first outdoor adventure**: "We're taking our teenagers on easy trails near London next month, what basic equipment should we buy or rent?"
-- **Experienced hikers planning extended trips**: "I need a complete gear list for five-day backpacking trip in moderate terrain with variable weather"
+- **Experienced hikers planning extended trips**: "I need a complete gear list for five-day backpacking trip in moderate terrain with variable weather."
 
 Edge cases test how the agent handles challenging situations:
 
 - **Ambiguous requests**: "What should I pack for hiking?"
-- **Incomplete trip details**: "I need gear for Scotland"
+- **Incomplete trip details**: "I need gear for Scotland."
 - **Last-minute gear changes**: "Can I swap my camping equipment rental for different sizes?"
 
 Including five to 10 diverse test prompts provides sufficient coverage for manual testing and smoke tests while remaining practical for human evaluation. Each test prompt captures the user query, expected information needs, and ideal response characteristics.
 
 Success criteria establish what constitutes acceptable performance before you run experiments. Setting thresholds in advance prevents rationalizing disappointing results. Adventure Works defines success thresholds across all three optimization dimensions:
 
-- **Quality**: Average 4.2+ (five-point scale), minimum 3.5 per response to align with customer satisfaction targets and prevent trust erosion
-- **Cost**: 60% expense reduction to achieve operational efficiency goals while maintaining 85% resolution rate
-- **Performance**: Average response time <30 seconds, time-to-first-token <2 seconds (streaming) to ensure acceptable user experience for real-time interactions
+- **Quality**: Average 4.2+ (five-point scale), minimum 3.5 per response to align with customer satisfaction targets and prevent trust erosion.
+- **Cost**: 60% expense reduction to achieve operational efficiency goals while maintaining 85% resolution rate.
+- **Performance**: Average response time <30 seconds, time-to-first-token <2 seconds (streaming) to ensure acceptable user experience for real-time interactions.
 
 Business requirements influence these thresholds: customer-facing agents handling trip planning need higher quality standards and faster response times than internal tools.

learn-pr/wwl-data-ai/evaluate-optimize-agents/includes/3-git-based-experimentation-workflow.md

Lines changed: 2 additions & 2 deletions
@@ -91,7 +91,7 @@ For the Adventure Works experiments, you might document your comparison:
 | prompt-v2-concise | Maintains quality, more focused | Yes (4.4 avg) |
 | gpt4o-mini-model | Lower quality on complex prompts | No (4.1 avg, below 4.2 threshold) |
 
-If `prompt-v2-concise` meets your quality threshold and improves conciseness, use git to merge the winning experiment:
+If `prompt-v2-concise` meets your quality threshold and improves conciseness, use Git to merge the winning experiment:
 
 ```bash
 git checkout main
@@ -102,4 +102,4 @@ git push origin main --tags
 
 For experiments that don't meet criteria, document why before deciding whether to keep or delete the branch: "gpt4o-mini-model: Quality dropped below 4.2 threshold on complex trip planning prompts. Not recommended for production."
 
-With git workflows established for organizing experiments, you're ready to execute the actual evaluations by running agents against test prompts and systematically scoring the results.
+With Git workflows established for organizing experiments, you're ready to execute the actual evaluations by running agents against test prompts and systematically scoring the results.

learn-pr/wwl-data-ai/evaluate-optimize-agents/includes/5-exercise.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ In this exercise, you evaluate two prompt versions of the Trail Guide Agent and
 
 Throughout this exercise, you:
 
-- Design evaluation experiments using git branches to manage and compare prompt versions
+- Design evaluation experiments using Git branches to manage and compare prompt versions
 - Manually evaluate AI agent responses against structured quality criteria (Intent Resolution, Relevance, Groundedness)
 - Compare agent versions across test scenarios to identify quality differences and cost trade-offs

learn-pr/wwl-data-ai/evaluate-optimize-agents/includes/7-summary.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ You've learned how to optimize AI agents through structured evaluation that tran
 
 Effective optimization depends on clear metrics that measure quality, cost, and performance. Quality metrics like Intent Resolution, Relevance, and Groundedness reveal whether agents serve user needs effectively. Cost metrics quantify token usage and operational expenses, enabling you to calculate the financial impact of model changes. Performance metrics measure response times that directly affect user experience. Together, these metrics provide objective criteria for comparing agent variants.
 
-## Organize experiments with git-based workflows
+## Organize experiments with Git-based workflows
 
 Git-based workflows bring engineering discipline to agent optimization. You create one branch per experiment variant, isolating specific changes like prompt modifications or model switches. Each branch maintains test prompts, evaluation scripts, and documented results. This structured approach lets you test changes safely, compare experiments systematically, and merge successful optimizations to production with confidence.

learn-pr/wwl-data-ai/evaluate-optimize-agents/index.yml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 uid: learn.wwl.evaluate-optimize-agents
 metadata:
   title: Evaluate and optimize AI agents through structured experiments
-  description: Learn how to evaluate and optimize AI agents systematically through structured experiments that measure quality, cost, and performance. Design evaluation metrics, apply git-based workflows, create consistent scoring rubrics, and make evidence-based optimization decisions.
+  description: Learn how to evaluate and optimize AI agents systematically through structured experiments that measure quality, cost, and performance. Design evaluation metrics, apply Git-based workflows, create consistent scoring rubrics, and make evidence-based optimization decisions.
   ms.date: 02/17/2026
   author: madiepev
   ms.author: madiepev
@@ -11,17 +11,17 @@ metadata:
   ms.service: azure-ai-foundry
 title: Evaluate and optimize AI agents through structured experiments
 summary: |
-  Learn how to optimize AI agents through structured evaluation that transforms guesswork into evidence-based engineering decisions. You'll explore how to design evaluation experiments with clear metrics for quality, cost, and performance; organize experiments using git-based workflows; create evaluation rubrics for consistent scoring; and compare results to make informed optimization decisions.
+  Learn how to optimize AI agents through structured evaluation that transforms guesswork into evidence-based engineering decisions. You'll explore how to design evaluation experiments with clear metrics for quality, cost, and performance; organize experiments using Git-based workflows; create evaluation rubrics for consistent scoring; and compare results to make informed optimization decisions.
 abstract: |
   In this module, you:
   - Design evaluation experiments with clear metrics for quality, cost, and performance
-  - Apply git-based workflows to organize and compare agent variants systematically
+  - Apply Git-based workflows to organize and compare agent variants systematically
   - Create evaluation rubrics that ensure consistent scoring across human evaluators
   - Compare experiment results to make evidence-based optimization decisions
 prerequisites: |
   Before starting this module, you should have:
   - Basic understanding of AI agents and large language models
-  - Familiarity with git version control workflows
+  - Familiarity with Git version control workflows
   - Experience with Microsoft Azure AI Foundry or similar AI development platforms
 iconUrl: /training/achievements/generic-badge.svg
 levels:

0 commit comments
