
Commit 593929b

Merge pull request #54049 from MicrosoftDocs/NEW-module08-analyze-monitor-tune-ai-powered-business-solutions
New module08 analyze monitor tune ai powered business solutions
2 parents fd5a99b + d4a3932 commit 593929b

19 files changed

Lines changed: 1118 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.introduction
title: "Introduction"
metadata:
  title: "Introduction"
  description: "Learn the essentials of monitoring, analyzing, and tuning AI-powered agents to ensure reliability, effectiveness, and continuous improvement."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.recommend-process-tools-monitoring-agents
title: "Recommend process tools for monitoring agents"
metadata:
  title: "Recommend Process Tools for Monitoring Agents"
  description: "Learn how to recommend processes and tools for monitoring AI agents, ensuring observability, compliance, and continuous improvement."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/2-recommend-process-tools-monitoring-agents.md)]
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.analyze-backlog-user-feedback-ai-agent-usage
title: "Analyze backlog and user feedback for AI agent usage"
metadata:
  title: "Analyze Backlog and User Feedback for AI Agent Usage"
  description: "Learn how to analyze backlog data and user feedback to improve AI agent performance, prioritize enhancements, and address operational issues."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/3-analyze-backlog-user-feedback-ai-agent-usage.md)]
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.apply-ai-based-tools-analyze-identify-issues-perform-tuning
title: "Apply AI-based tools to analyze, identify issues, and perform tuning"
metadata:
  title: "Apply AI-Based Tools to Analyze, Identify Issues, and Perform Tuning"
  description: "Learn how to analyze AI agent behavior, diagnose issues, and implement tuning strategies to improve reliability, performance, and user satisfaction."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/4-apply-ai-based-tools-analyze-identify-issues-perform-tuning.md)]
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.monitor-agent-performance-metrics
title: "Monitor AI agent performance metrics"
metadata:
  title: "Monitor AI Agent Performance Metrics"
  description: "Learn how to monitor AI agent performance metrics, evaluate operational health, and optimize reliability using structured observability practices."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/5-monitor-agent-performance-metrics.md)]
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.interpret-telemetry-data-performance-model-tuning
title: "Interpret telemetry data to tune AI performance"
metadata:
  title: "Interpret Telemetry Data to Tune AI Performance"
  description: "Learn to analyze telemetry data from AI systems to diagnose issues, optimize performance, and guide continuous tuning for better outcomes."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 4
content: |
  [!include[](includes/6-interpret-telemetry-data-performance-model-tuning.md)]
Lines changed: 57 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.knowledge-check
title: "Module assessment"
metadata:
  title: "Knowledge check"
  description: "Knowledge check"
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
  module_assessment: false
durationInMinutes: 3
content: "Choose the best response for each of the following questions."
quiz:
  questions:
    - content: "Which of the following is a key component when establishing a monitoring operating model for AI agents?"
      choices:
        - content: "Ignoring agent guardrail triggers"
          isCorrect: false
          explanation: "Incorrect. Ignoring guardrail triggers can compromise agent reliability and compliance."
        - content: "Standardizing metric definitions and log review cadence"
          isCorrect: true
          explanation: "Correct. Establishing a monitoring operating model requires clear and consistent processes, including standardized metric definitions and a regular log review cadence. This ensures accountability, consistency, and the ability to detect and respond to issues effectively."
        - content: "Disabling all alerts to reduce noise"
          isCorrect: false
          explanation: "Incorrect. Disabling alerts can prevent the detection of critical issues and compromise agent reliability."
        - content: "Allowing unrestricted agent configuration"
          isCorrect: false
          explanation: "Incorrect. Allowing unrestricted configuration can lead to inconsistencies and potential compliance issues."
    - content: "When analyzing backlog items for AI and agent usage, what is the best first step?"
      choices:
        - content: "Immediately redesign agent prompts"
          isCorrect: false
          explanation: "Incorrect. Immediate redesign does not provide the necessary insight for targeted improvement."
        - content: "Categorize backlog items into meaningful domains"
          isCorrect: true
          explanation: "Correct. The first step in effective backlog analysis is to categorize items into meaningful domains such as accuracy, knowledge, performance, user experience, integration, and governance. This structured approach helps solution architects prioritize improvements, detect patterns, and address root causes systematically."
        - content: "Archive past feedback to avoid noise"
          isCorrect: false
          explanation: "Incorrect. Archiving feedback can result in the loss of valuable insights needed for improvement."
        - content: "Disable the agent until issues are fixed"
          isCorrect: false
          explanation: "Incorrect. Disabling the agent does not address the root causes or provide insights for improvement."
    - content: "Which metric best indicates whether users are achieving the intended outcome of an agent workflow?"
      choices:
        - content: "Token usage"
          isCorrect: false
          explanation: "Incorrect. Token usage is an operational metric and does not directly indicate user success in achieving intended outcomes."
        - content: "Task completion rate"
          isCorrect: true
          explanation: "Correct. Task completion rate directly measures whether users are able to successfully complete the workflows for which the agent was designed. It reflects both the agent's effectiveness and the user's satisfaction with the process."
        - content: "Connector quota"
          isCorrect: false
          explanation: "Incorrect. Connector quota is an operational metric and does not directly indicate user success in achieving intended outcomes."
        - content: "Storage utilization"
          isCorrect: false
          explanation: "Incorrect. Storage utilization is an operational metric and does not directly indicate user success in achieving intended outcomes."
Lines changed: 13 additions & 0 deletions
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-monitor-tune-ai-powered-business-solutions.module-summary
title: "Ensure reliable AI agent operations"
metadata:
  title: "Ensure Reliable AI Agent Operations"
  description: "Learn how to monitor, analyze, and tune AI agents to ensure reliability, optimize performance, and align with enterprise governance goals."
  ms.date: 02/13/2026
  author: msdavidram
  ms.author: taeldin
  ms.topic: unit
durationInMinutes: 4
content: |
  [!include[](includes/8-module-summary.md)]
Lines changed: 15 additions & 0 deletions
This module gives solution architects the foundational knowledge and practical techniques required to ensure the reliability, effectiveness, and continuous improvement of AI-driven agents in enterprise environments.

AI-powered agents are transforming business operations by automating workflows, enhancing decision-making, and enabling new forms of user interaction. However, their success depends on robust monitoring, structured analysis, and systematic tuning practices. Solution architects play a pivotal role in defining strategies that ensure agents operate predictably, deliver high-quality outcomes, and comply with organizational governance standards.

## What you'll learn

- **Establish Monitoring Frameworks:** Understand multi-layered monitoring requirements, including operational health, performance metrics, quality assurance, usage insights, and risk management.
- **Leverage Industry Tools and Processes:** Explore recommended monitoring tools such as Azure Monitor, Microsoft 365 Admin Analytics, Copilot analytics dashboards, the Power Platform admin center, and enterprise observability platforms. Learn how to design resilient monitoring models, configure guardrails, set alerts, and conduct regular quality evaluations.
- **Analyze Backlogs and User Feedback:** Develop repeatable frameworks for interpreting backlog data and user feedback. Learn to categorize issues, prioritize enhancements, and translate insights into actionable improvements.
- **Apply AI-Based Diagnostic and Tuning Methods:** Use telemetry, conversation transcripts, and performance scorecards to diagnose agent issues and implement targeted tuning strategies.
- **Monitor Performance and Metrics:** Define and track operational, qualitative, and user-centered metrics. Understand how to interpret telemetry data to identify anomalies, assess model drift, and optimize agent workflows.

This module equips you with the expertise to design and operationalize monitoring and tuning strategies that align with business objectives, drive continuous improvement, and ensure compliance with IT and governance requirements. By mastering these principles, you'll be prepared to deliver high-confidence AI solutions that scale reliably and adapt to evolving enterprise needs.

Let's begin with the essential practices of analyzing, monitoring, and tuning AI-powered business solutions.
Lines changed: 187 additions & 0 deletions
## Overview

This unit equips solution architects with the expertise to define, recommend, and operationalize a monitoring strategy for AI agents across the Microsoft ecosystem. The focus is on designing a resilient, governed, and observable monitoring model that enables organizations to measure agent effectiveness, detect operational risks, and ensure compliance with IT and business requirements.

You'll explore monitoring processes, recommended tools, observability patterns, dashboards, alerting approaches, and analytical insights that support continuous improvement of agent behavior.

## Understanding Monitoring Requirements for AI Agents

Monitoring AI agents requires a multilayered approach. Solution architects must consider the following dimensions:

**Operational Health**<br>Uptime, availability, error frequency, throttling conditions, and processing delays.

**Performance Metrics**<br>Response times, success rates of actions, tool invocation reliability, and workflow completion metrics.

**Quality and Output Accuracy**<br>Appropriateness of generated actions or responses, alignment with business rules, and deviation from expected behavior.

**Usage Insights**<br>Volume trends, active user adoption, agent feature utilization, and behavioral patterns over time.

**Risk, Compliance, and Security**<br>Guardrail violations, sensitive-data handling, suspicious activity spikes, and adherence to organizational policies.

## Recommended Processes for Monitoring AI Agents

Solution architects should recommend processes for monitoring AI agents across an organization. When an existing framework is in place, the architect should look for missing components or opportunities for improvement.

### Establish a Monitoring Operating Model

A strong operating model ensures consistency, ownership, and accountability.

#### Key components

* Defined roles (operations team, product owners, data engineers, architects)
* Process workflows for incident response
* Standardized metric definitions (creating a baseline with trends)
* Log review cadence (daily, weekly, or monthly)
* Change management and version tracking
* Documentation of expected agent behaviors and constraints
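The "standardized metric definitions (creating a baseline with trends)" component can be made concrete with a small sketch: keep a history of each metric and flag values that deviate from the historical baseline beyond a tolerance. The metric values and the 20% tolerance below are illustrative assumptions, not prescribed settings.

```python
from statistics import mean

# Sketch: flag metric values that deviate from a historical baseline.
# The 20% tolerance and the sample history are illustrative assumptions.

def deviates_from_baseline(history: list[float], latest: float,
                           tolerance: float = 0.20) -> bool:
    """True if `latest` deviates from the historical mean by more than `tolerance`."""
    baseline = mean(history)
    return abs(latest - baseline) > tolerance * baseline

daily_success_rate = [0.97, 0.96, 0.98, 0.97]  # prior observations
print(deviates_from_baseline(daily_success_rate, 0.95))  # small dip: within tolerance
print(deviates_from_baseline(daily_success_rate, 0.70))  # large drop: flagged
```

In practice the baseline and tolerance would come from the standardized metric definitions agreed on in the operating model, so every team flags deviations the same way.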
### Configure Guardrails and Threshold Alerts

* Set thresholds for latency, exception volume, and unusual activity.
* Create automated alerts for guardrail triggers or tool invocation failures.
* Monitor for unexpected spikes in prompt volume that indicate potential misuse.
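The threshold checks described above can be sketched as a simple evaluation step. This is an illustrative sketch only: the metric names and threshold values are hypothetical, and a production setup would typically use Azure Monitor alert rules rather than hand-rolled code.

```python
# Minimal sketch of threshold-based alert evaluation over agent telemetry.
# Metric names and threshold values are hypothetical examples.

THRESHOLDS = {
    "avg_latency_ms": 3000,      # alert if average latency exceeds 3 s
    "exceptions_per_hour": 25,   # alert if exception volume spikes
    "guardrail_triggers": 0,     # alert on any guardrail trigger
}

def evaluate_alerts(metrics: dict[str, float]) -> list[str]:
    """Return an alert message for every metric above its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name, 0)
        if value > limit:
            alerts.append(f"ALERT: {name}={value} exceeds threshold {limit}")
    return alerts

sample = {"avg_latency_ms": 4200, "exceptions_per_hour": 8, "guardrail_triggers": 2}
for alert in evaluate_alerts(sample):
    print(alert)
```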
### Conduct Regular Quality Evaluations

* Human-in-the-loop spot checks
* Scenario-based evaluations
* Review of low-confidence outputs
* Validation of alignment with business rules and compliance requirements

### Continuously Improve Based on Insights

* Analyze logs and telemetry to find failure patterns.
* Identify training needs for users.
* Recommend prompt engineering improvements.
* Propose workflow adjustments or retraining of custom models (if applicable).

## Recommended Tools for Monitoring AI Agents

Solution architects should recommend a toolset that covers **observability**, **analytics**, and **administrative insights**.

### Azure Monitor (Core Telemetry + Alerts)

#### Azure Monitor provides

* Application and agent telemetry
* Dashboards for real-time metrics
* Alert rules for anomalies
* Integration with Log Analytics workspaces

#### Use cases

* Monitor agent workflows built with Power Platform or custom services.
* Track errors, latency, throughput, and connector failures.
* Build KQL-based queries for deep diagnostics.
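As a rough illustration of the aggregations such a diagnostics query performs, the sketch below computes an error rate and a nearest-rank p95 latency in plain Python over hypothetical invocation records. In practice this would be a KQL query over Log Analytics data; the record shape here is an assumption for illustration.

```python
# Sketch: the aggregations a KQL diagnostics query might compute,
# reproduced in plain Python over hypothetical invocation records.

records = [
    {"agent": "Sales Helper", "latency_ms": 1800, "status": "success"},
    {"agent": "Sales Helper", "latency_ms": 2100, "status": "success"},
    {"agent": "Sales Helper", "latency_ms": 5400, "status": "error"},
    {"agent": "Sales Helper", "latency_ms": 1900, "status": "success"},
]

def error_rate(rows):
    """Fraction of invocations that ended in error."""
    return sum(r["status"] == "error" for r in rows) / len(rows)

def p95_latency(rows):
    """Nearest-rank 95th-percentile latency in milliseconds."""
    latencies = sorted(r["latency_ms"] for r in rows)
    index = max(0, round(0.95 * len(latencies)) - 1)
    return latencies[index]

print(f"error rate: {error_rate(records):.0%}")    # 25%
print(f"p95 latency: {p95_latency(records)} ms")   # 5400 ms
```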
### Microsoft 365 Admin Analytics (Usage and Adoption Trends)

#### Useful for

* Understanding agent usage volume
* Tracking adoption and engagement
* Identifying departments with low usage or operational barriers
* Measuring improvements week over week

### Copilot and Agent Analytics Dashboards

When available in an organization's tenant, Copilot analytics can provide:

* Agent invocation frequency
* Task completion trends
* Common user queries
* Productivity pattern insights
* Error or guardrail-trigger events

### Power Platform Admin Center (Environment-Level Monitoring)

#### Provides

* Environment health
* Connector usage and limits
* Flow telemetry (for agents using workflows)
* DLP rule impact visibility

### Foundry or Organizational Observability Platforms

Enterprises may adopt centralized observability platforms (for example, Foundry-like solutions, if present in the environment) to unify:

* Multisystem logs
* Event traces
* Cross-environment dashboards
* AI model execution insights

These platforms reduce fragmentation and provide a single-pane-of-glass view for complex agent ecosystems.

### Custom Dashboards for Enterprise AI Agents

#### Solution architects often design

* KPI dashboards in Power BI
* Heatmaps of usage
* Drift detection visualizations
* Compliance trend reports

#### Example Agent Health Summary

| Agent Name | Success Rate | Avg. Response Time | Errors Today | Usage Trend |
| --- | --- | --- | --- | --- |
| Sales Helper | 98% | 1.8 sec | 3 | ↑ Increasing |
| Ops Agent | 92% | 2.5 sec | 17 | → Steady |
| Finance Advisor | 86% | 3.2 sec | 28 | ↓ Decreasing |
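A summary like the one above can be derived from raw invocation logs. The sketch below assumes a hypothetical log record format and groups records by agent to compute success rate, average response time, and error count; it's an illustration of the derivation, not a real dashboard pipeline.

```python
from collections import defaultdict

# Sketch: derive an agent health summary from hypothetical invocation logs.
logs = [
    {"agent": "Sales Helper", "ok": True,  "latency_s": 1.7},
    {"agent": "Sales Helper", "ok": True,  "latency_s": 1.9},
    {"agent": "Ops Agent",    "ok": False, "latency_s": 2.8},
    {"agent": "Ops Agent",    "ok": True,  "latency_s": 2.2},
]

def health_summary(rows):
    """Group records by agent and compute the headline health metrics."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["agent"]].append(r)
    summary = {}
    for agent, items in grouped.items():
        summary[agent] = {
            "success_rate": sum(r["ok"] for r in items) / len(items),
            "avg_latency_s": sum(r["latency_s"] for r in items) / len(items),
            "errors": sum(not r["ok"] for r in items),
        }
    return summary

for agent, stats in health_summary(logs).items():
    print(agent, stats)
```

In a real deployment, the same grouping would run as a scheduled query, with the results feeding a Power BI dashboard or a shared report.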
#### Best Practices

* Always centralize logs.
* Standardize naming conventions.
* Define clear SLAs for agent responsiveness.
* Automate alerting for critical business workflows.
* Integrate monitoring outputs into monthly operational reviews.

## References

* [Describe Azure Monitor](/training/modules/describe-monitoring-tools-azure/4-describe-azure-monitor)
* [Perform admin tasks for Microsoft 365 Copilot](/training/modules/perform-admin-tasks-microsoft-365-copilot/)
* [Monitor agents with the Azure AI Foundry dashboard](/azure/ai-foundry/observability/how-to/how-to-monitor-agents-dashboard)
* [Copilot analytics in the Power Platform admin center](/power-platform/admin/analytics-copilot)
