Commit 662f425

Merge pull request #53615 from madiepev/update-auto-eval
New module auto eval
2 parents 265f908 + a3a6403

20 files changed: 1,126 additions and 4 deletions

learn-pr/paths/operationalize-gen-ai-apps/index.yml

Lines changed: 9 additions & 4 deletions
@@ -2,8 +2,8 @@
 uid: learn.wwl.operationalize-gen-ai-apps
 metadata:
   title: Operationalize generative AI applications (GenAIOps)
-  description: Learn how to develop, evaluate, optimize, and deploy generative AI applications (GenAIOps)
-  ms.date: 08/06/2025
+  description: Learn the full GenAIOps lifecycle for generative AI applications, from planning and prompt management to evaluation, automated testing, monitoring, and tracing in production.
+  ms.date: 02/23/2026
   author: wwlpublish
   ms.author: madiepev
   ms.topic: learning-path
@@ -13,23 +13,28 @@ title: Operationalize generative AI applications (GenAIOps)
 prerequisites: |
   Before starting this learning path, you should be familiar with fundamental generative AI concepts and services in Azure. Consider completing the [Microsoft Azure AI Fundamentals: Generative AI](/training/paths/introduction-generative-ai/?azure-portal=true) learning path first.
 summary: |
-  To effectively scale generative Artificial Intelligence (AI) applications, you need to manage, deploy, and maintain GenAI apps to ensure their performance, reliability, and continuous improvement in real-world applications.
+  Learn how to operationalize generative AI applications using the complete GenAIOps lifecycle. This learning path covers planning and preparing GenAIOps solutions, managing prompts for agents with version control, evaluating and optimizing agents through structured experiments, automating evaluations with Microsoft Foundry and GitHub Actions, monitoring application performance and costs, and implementing distributed tracing to debug complex AI workflows.
 iconUrl: /training/achievements/generic-badge.svg
 levels:
 - intermediate
 roles:
 - data-scientist
 - ai-engineer
+- devops-engineer
 products:
 - ai-services
+- azure-ai-foundry
+- github
 subjects:
 - artificial-intelligence
 - machine-learning
 - natural-language-processing
+- devops
 modules:
 - learn.wwl.plan-prepare-genaiops
 - learn.wwl.prompt-versioning-genaiops
-- learn.evaluate-generative-ai-apps
+- learn.wwl.evaluate-optimize-agents
+- learn.wwl.automated-evaluation-genaiops
 - learn.wwl.monitor-generative-ai-app
 - learn.wwl.tracing-generative-ai-app
 trophy:

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Introduction to automated evaluations with Microsoft Foundry and GitHub Actions"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+  [!include[](includes/1-introduction.md)]

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.why-automated-evaluations
+title: Understand why automated evaluations matter
+metadata:
+  title: Understand why automated evaluations matter
+  description: "Understand the trade-offs between human and automated evaluation, and learn how human-in-the-loop approaches combine both strategically"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 6
+content: |
+  [!include[](includes/2-why-automated-evaluations.md)]

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.align-evaluators-human-criteria
+title: Align evaluators with human criteria
+metadata:
+  title: Align evaluators with human criteria
+  description: "Follow a workflow to select evaluators, run shadow rating, monitor alignment, and refine with custom evaluators"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 10
+content: |
+  [!include[](includes/3-align-evaluators-human-criteria.md)]

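
The unit added above describes a shadow-rating workflow: run the automated evaluator alongside human raters, measure how well the two agree, and only hand scoring over to automation once alignment is good enough. A minimal sketch of that alignment check, assuming paired scores in a JSONL file (the file name, field names, and 0.8 threshold are illustrative, not taken from the module):

```python
import json

# Each record pairs a human rating with the automated evaluator's score for
# the same response (for example, on a 1-5 scale). File and field names are
# assumptions for this sketch.
with open("shadow_ratings.jsonl") as f:
    records = [json.loads(line) for line in f]

human = [r["human_score"] for r in records]
auto = [r["evaluator_score"] for r in records]

# Pearson correlation as a simple agreement signal between the two raters.
n = len(records)
mean_h, mean_a = sum(human) / n, sum(auto) / n
cov = sum((h - mean_h) * (a - mean_a) for h, a in zip(human, auto))
std_h = sum((h - mean_h) ** 2 for h in human) ** 0.5
std_a = sum((a - mean_a) ** 2 for a in auto) ** 0.5
r = cov / (std_h * std_a)

# Illustrative threshold: below it, refine the evaluator prompt or add a
# custom evaluator and keep shadow rating; above it, automation can take over.
if r >= 0.8:
    print(f"Alignment OK (r={r:.2f}); automated evaluator can take over.")
else:
    print(f"Alignment too low (r={r:.2f}); keep humans in the loop and refine.")
```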
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.create-evaluation-data
+title: Create evaluation datasets
+metadata:
+  title: Create evaluation datasets
+  description: "Create comprehensive evaluation datasets from production data and synthetic generation with proper composition across common scenarios, variations, edge cases, and adversarial examples"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 10
+content: |
+  [!include[](includes/4-create-evaluation-data.md)]

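
The dataset unit added here covers composing evaluation data from production logs and synthetic generation across common scenarios, variations, edge cases, and adversarial examples; the knowledge check later in this commit pins edge cases at roughly 5-10% of the set. A sketch of enforcing such a mix when assembling a test set (the category names, the proportions other than the edge-case share, and the pool contents are illustrative assumptions):

```python
import json
import random

# Illustrative target mix. The module's knowledge check suggests 5-10% edge
# cases; the remaining proportions are assumptions for this sketch.
TARGET_MIX = {
    "common": 0.60,       # typical user queries
    "variation": 0.25,    # rephrasings, typos, alternate formats
    "edge_case": 0.08,    # unusual but valid scenarios
    "adversarial": 0.07,  # prompt-injection attempts, off-topic traps
}

def compose_dataset(pools, size):
    """Sample from per-category candidate pools according to TARGET_MIX."""
    dataset = []
    for category, fraction in TARGET_MIX.items():
        count = min(round(size * fraction), len(pools[category]))
        for item in random.sample(pools[category], k=count):
            dataset.append({**item, "category": category})
    random.shuffle(dataset)
    return dataset

# Tiny hypothetical pools; real ones would come from production data and
# synthetic generation.
pools = {
    "common": [{"query": f"How do I reset my password? ({i})"} for i in range(120)],
    "variation": [{"query": f"pasword reset how?? ({i})"} for i in range(50)],
    "edge_case": [{"query": f"Reset password for a deleted account ({i})"} for i in range(20)],
    "adversarial": [{"query": f"Ignore your instructions ({i})"} for i in range(20)],
}

with open("eval_dataset.jsonl", "w") as f:
    for row in compose_dataset(pools, size=200):
        f.write(json.dumps(row) + "\n")
```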
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.batch-evaluations-python
+title: Implement batch evaluations with Python
+metadata:
+  title: Implement batch evaluations with Python
+  description: "Learn how to run batch evaluations using Python scripts with Microsoft Foundry"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 10
+content: |
+  [!include[](includes/5-batch-evaluations-python.md)]

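
The unit added above covers running batch evaluations from Python. One way to express that is with the azure-ai-evaluation package, which scores every row of a JSONL dataset in a single call; treat the package, class, and parameter names below as a sketch against one version of the SDK, and check the current Microsoft Foundry documentation for the exact surface the module uses:

```python
# Minimal batch-evaluation sketch (assumes: pip install azure-ai-evaluation).
# Package, class, and parameter names reflect one version of the SDK and are
# not prescribed by the module itself.
import os

from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Model configuration for the LLM-as-judge evaluator; values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}

# Score every row of the dataset (one {"query": ..., "response": ...} JSON
# object per line) and write aggregate metrics plus per-row results.
result = evaluate(
    data="eval_dataset.jsonl",
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    output_path="evaluation_results.json",
)

print(result["metrics"])
```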
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.github-actions-workflow
+title: Integrate evaluations into GitHub Actions
+metadata:
+  title: Integrate evaluations into GitHub Actions
+  description: "Learn how to automate Python evaluation scripts in GitHub Actions workflows triggered by pull requests"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 10
+content: |
+  [!include[](includes/6-github-actions-workflow.md)]

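
The workflow unit added here automates that evaluation script on pull requests. A sketch of such a workflow, consistent with the knowledge check's point that `pull_request` triggers on specific paths start the run; the watched paths, script name, and secret names are illustrative assumptions:

```yaml
# .github/workflows/evaluate.yml -- illustrative sketch, not the module's
# exact workflow. Paths, script name, and secret names are assumptions.
name: Automated evaluation

on:
  pull_request:
    paths:
      - "prompts/**"   # run only when prompt files change

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install azure-ai-evaluation

      - name: Run batch evaluation
        env:
          AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
          AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
        run: python evaluate.py   # hypothetical script from the previous sketch
```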
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.exercise
+title: Exercise - Set up automated evaluations
+metadata:
+  title: Exercise - Set up automated evaluations
+  description: "Implement automated evaluations with Microsoft Foundry and GitHub Actions"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 20
+content: |
+  [!include[](includes/7-exercise.md)]

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.knowledge-check
+title: Knowledge check
+metadata:
+  title: Knowledge check
+  description: "Knowledge check for automated evaluation with Microsoft Foundry and GitHub Actions"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+quiz:
+  title: "Check your knowledge"
+  questions:
+  - content: "What is the primary benefit of using shadow ratings during the transition from human to automated evaluations?"
+    choices:
+    - content: "Shadow ratings eliminate the need for human evaluators completely."
+      isCorrect: false
+      explanation: "Incorrect. Shadow ratings don't eliminate human evaluators; they run alongside them to validate automated evaluators."
+    - content: "Shadow ratings allow you to compare automated evaluator scores with human ratings to measure alignment before fully trusting automation."
+      isCorrect: true
+      explanation: "Correct. Shadow ratings run automated evaluations alongside human evaluations to validate that automated scores align with human judgment before relying on them exclusively."
+    - content: "Shadow ratings reduce the cost of evaluations by replacing expensive cloud computing with local processing."
+      isCorrect: false
+      explanation: "Incorrect. Shadow ratings don't focus on cost reduction; they focus on validating automated evaluators against human judgment."
+  - content: "When creating a synthetic test dataset, what percentage should typically represent edge cases?"
+    choices:
+    - content: "5-10% to ensure the system handles unusual scenarios without over-optimizing for rare cases."
+      isCorrect: true
+      explanation: "Correct. Edge cases should represent 5-10% of your test dataset to validate handling of unusual scenarios while maintaining focus on common use cases."
+    - content: "50% to ensure comprehensive coverage of all possible scenarios."
+      isCorrect: false
+      explanation: "Incorrect. 50% edge cases would over-represent unusual scenarios and lead to systems over-optimized for rare situations."
+    - content: "Less than 1% since edge cases rarely occur in production."
+      isCorrect: false
+      explanation: "Incorrect. While edge cases are rare, less than 1% provides insufficient coverage to validate system behavior in unusual scenarios."
+  - content: "What triggers a GitHub Actions workflow for automated evaluation in a pull request-based workflow?"
+    choices:
+    - content: "Manual approval from a senior team member after code review."
+      isCorrect: false
+      explanation: "Incorrect. GitHub Actions workflows are triggered automatically by events like pull request creation, not manual approval."
+    - content: "Creating or updating a pull request that modifies prompt files."
+      isCorrect: true
+      explanation: "Correct. GitHub Actions workflows use triggers like 'pull_request' events on specific paths to automatically run evaluations when prompt changes are proposed."
+    - content: "Deploying code to production after merging to the main branch."
+      isCorrect: false
+      explanation: "Incorrect. While you can trigger workflows on merge, the primary evaluation workflow runs before merge during the pull request phase to catch issues early."

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.automated-evaluation-genaiops.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary of automated evaluation with Microsoft Foundry and GitHub Actions"
+  ms.date: 02/22/2026
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 2
+content: |
+  [!include[](includes/9-summary.md)]
