
Commit 408e1f7

Merge pull request #53353 from wwlpublish/156462-2
Fixed Feedback bugs
2 parents ff948b9 + 19548fe

27 files changed

Lines changed: 469 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.introduction
title: "Introduction"
metadata:
  title: "Introduction"
  description: "Introduction"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.understand-azure-monitor-metrics-visualization
title: "Understand Azure Monitor metrics and visualization"
metadata:
  title: "Understand Azure Monitor metrics and visualization"
  description: "Understand Azure Monitor metrics and visualization"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 12
content: |
  [!include[](includes/2-understand-azure-monitor-metrics-visualization.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.configure-alerts-alert-processing-rules
title: "Configure alerts and alert processing rules"
metadata:
  title: "Configure alerts and alert processing rules"
  description: "Configure alerts and alert processing rules"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 13
content: |
  [!include[](includes/3-configure-alerts-alert-processing-rules.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.query-log-data-log-analytics-workspace
title: "Query log data in Log Analytics Workspace"
metadata:
  title: "Query log data in Log Analytics Workspace"
  description: "Query log data in Log Analytics Workspace"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 11
content: |
  [!include[](includes/4-query-log-data-log-analytics-workspace.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.exercise-configure-monitoring-azure-infrastructure
title: "Configure monitoring for Azure infrastructure"
metadata:
  title: "Configure monitoring for Azure infrastructure"
  description: "Configure monitoring for Azure infrastructure"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 60
content: |
  [!include[](includes/5-exercise-configure-monitoring-azure-infrastructure.md)]
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.knowledge-check
title: "Module assessment"
metadata:
  title: "Knowledge check"
  description: "Test your understanding of Azure Monitor implementation by answering these scenario-based questions. Consider how you would apply monitoring concepts to real-world infrastructure management challenges."
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
  module_assessment: false
durationInMinutes: 5
content: "Choose the best response for each of the following questions."
quiz:
  questions:
  - content: "Your operations team manages 50 virtual machines running AI training workloads. You need to track memory usage inside the VMs to detect when training processes consume excessive resources. Which metric collection method provides the visibility you require?"
    choices:
    - content: "Platform metrics collected automatically by Azure Monitor, which include memory available bytes for all virtual machines"
      isCorrect: false
      explanation: "Incorrect. Platform metrics don't include memory usage inside the virtual machine's operating system—they only track resource consumption at the Azure infrastructure level like CPU percentage and disk IOPS."
    - content: "Guest OS metrics collected by the Azure Diagnostics extension installed on each virtual machine"
      isCorrect: true
      explanation: "Correct. Guest OS metrics provide visibility into memory usage and process-level performance counters. The Azure Diagnostics extension must be installed on each VM to collect these metrics."
    - content: "Custom metrics published from your training application using the Application Insights SDK"
      isCorrect: false
      explanation: "Incorrect. Custom metrics would work but require modifying your training application code to publish metrics, adding unnecessary complexity when guest OS metrics provide the needed data automatically once the diagnostics extension is installed."
  - content: "You create an alert rule that fires when storage account transaction latency exceeds 500 milliseconds. During testing, you notice the alert fires briefly every hour during backup operations, generating notifications your team ignores. How should you reduce alert fatigue while maintaining visibility into genuine latency issues?"
    choices:
    - content: "Increase the latency threshold to 1000 milliseconds and reduce the evaluation frequency to 15 minutes"
      isCorrect: false
      explanation: "Incorrect. Increasing the threshold to 1000 milliseconds would hide real latency issues that occur between 500-1000ms, potentially missing performance degradation that affects user experience."
    - content: "Create an alert processing rule that suppresses notifications during the 10-minute backup window each hour"
      isCorrect: true
      explanation: "Correct. An alert processing rule with time-based suppression eliminates notifications for expected latency spikes during backups while preserving the alert rule for genuine performance problems outside the backup window."
    - content: "Disable the alert rule entirely and rely on user reports to detect storage performance problems"
      isCorrect: false
      explanation: "Incorrect. Disabling the alert entirely removes proactive monitoring, forcing your team into reactive mode where storage problems surface only after users complain, increasing mean time to detection and business impact."
  - content: "After receiving an alert about high CPU usage on a virtual machine, you need to identify which process consumed the most resources during the spike. Which Kusto Query Language (KQL) query pattern provides this information?"
    choices:
    - content: "Query the AzureDiagnostics table filtering for Level equals 'Error' to find failures that caused the CPU spike"
      isCorrect: false
      explanation: "Incorrect. The AzureDiagnostics table contains error logs but doesn't provide process-level performance data—errors might correlate with high CPU but won't tell you which process was responsible."
    - content: "Query the Perf table filtering for CounterName equals '% Processor Time' and summarize by process name"
      isCorrect: true
      explanation: "Correct. Querying the Perf table for processor time counters by process name provides the granular resource consumption data needed to identify which specific process caused the CPU spike."
    - content: "Query the SecurityEvent table to detect unauthorized access attempts that might have triggered resource-intensive operations"
      isCorrect: false
      explanation: "Incorrect. The SecurityEvent table tracks authentication and access events, useful for security investigations but irrelevant for diagnosing resource consumption patterns that cause CPU saturation."
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.manage-monitoring-ai-ready-infrastructure.summary
title: "Summary"
metadata:
  title: "Summary"
  description: "Summary"
  ms.date: 02/02/2026
  author: wwlpublish
  ms.author: bradj
  ms.topic: unit
durationInMinutes: 2
content: |
  [!include[](includes/7-summary.md)]
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
Your company runs machine learning workloads on Azure that analyze customer data around the clock. Last week, a virtual machine crashed during a training job, but your team discovered the failure only after users reported missing results. By then, two hours of compute time were wasted, and the training pipeline needed a manual restart. This scenario highlights a critical gap: without proactive monitoring, infrastructure failures disrupt business operations before anyone notices.

Azure Monitor closes this gap by collecting metrics, logs, and alerts from your infrastructure in real time. With Azure Monitor, you detect performance degradation before it causes downtime, receive notifications when resources exceed capacity thresholds, and query log data to diagnose the root cause of failures. For AI workloads that demand high availability, this visibility translates to measurable outcomes: reduced mean time to resolution (MTTR), improved service level agreement (SLA) compliance, and fewer manual interventions during production incidents.

In this module, you configure monitoring for Azure infrastructure supporting AI workloads. You set up metric collection to track CPU, memory, and disk performance. You create alert rules that notify your operations team when thresholds are breached. You implement alert processing rules to suppress notifications during planned maintenance windows. Finally, you query log data in Log Analytics Workspace to investigate infrastructure events and validate your monitoring configuration.

## Learning objectives

By the end of this module, you're able to:

- Explain how Azure Monitor and Log Analytics Workspace support infrastructure management
- Configure metrics collection and visualization for Azure resources
- Implement alert rules and processing rules to respond to infrastructure events
- Query log data to diagnose infrastructure issues

## Prerequisites

- Familiarity with basic Azure concepts and resource types such as virtual machines, storage accounts, and networking components
- Access to an Azure subscription with Contributor permissions to create and configure resources
- Understanding of fundamental networking and compute concepts including IP addressing, load balancing, and CPU utilization

## More resources

- [Azure Monitor overview](/azure/azure-monitor/overview) - Official documentation covering Azure Monitor architecture and capabilities
- [Log Analytics Workspace documentation](/azure/azure-monitor/logs/log-analytics-workspace-overview) - Detailed guide to Log Analytics Workspace setup and query capabilities
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
You need visibility into infrastructure performance before problems escalate into outages. Consider a scenario where your virtual machine's CPU usage climbs steadily over several days. Without metric tracking, you discover the capacity issue only after the VM becomes unresponsive and training jobs fail. Azure Monitor metrics solve this problem by collecting performance data automatically from every Azure resource you deploy.

## How Azure Monitor collects metrics

Azure Monitor captures three types of metrics that provide different layers of visibility into your infrastructure. Platform metrics are collected automatically the moment you create a resource—no configuration required. When you deploy a virtual machine, Azure Monitor immediately begins tracking CPU percentage, network throughput, and disk operations per second. These metrics flow into Azure Monitor's time-series database every 60 seconds, giving you near real-time visibility into resource behavior.
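
Outside the portal, you can also read platform metrics programmatically. The following is a minimal sketch using the `azure-monitor-query` Python package (installed separately, for example with `pip install azure-monitor-query azure-identity`); the resource ID is a hypothetical placeholder, so substitute your own subscription, resource group, and VM name.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Hypothetical resource ID -- replace each segment with your own values.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Compute/virtualMachines/<vm-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Pull the last 24 hours of the platform 'Percentage CPU' metric at
# one-hour granularity, requesting average and maximum aggregations.
response = client.query_resource(
    resource_id,
    metric_names=["Percentage CPU"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.AVERAGE, MetricAggregationType.MAXIMUM],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(f"{point.time_stamp}: avg={point.average}, max={point.maximum}")
```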

Platform metrics cover the fundamentals, but they don't reveal what's happening inside your virtual machine's operating system. For deeper visibility, you enable guest OS metrics by installing the Azure Diagnostics extension on your VM. This agent collects memory usage, process-level performance counters, and application-specific metrics that platform monitoring can't access. With this approach, you track not just whether your VM is running, but whether it has sufficient memory to handle current workloads and which processes consume the most resources.
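
Once those guest performance counters also flow into a Log Analytics workspace (for example, via the Azure Monitor Agent and a data collection rule), a short Kusto query can rank processes by CPU consumption. The sketch below runs such a query with the same Python SDK's `LogsQueryClient`; the workspace ID is a hypothetical placeholder, and the counter names assume Windows-style `Perf` records.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

workspace_id = "<log-analytics-workspace-id>"  # hypothetical placeholder

# KQL: average per-process '% Processor Time' over the query window,
# excluding the synthetic _Total and Idle instances.
query = """
Perf
| where ObjectName == 'Process' and CounterName == '% Processor Time'
| where InstanceName !in ('_Total', 'Idle')
| summarize AvgCpu = avg(CounterValue) by Computer, InstanceName
| top 5 by AvgCpu desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(workspace_id, query, timespan=timedelta(hours=1))

for table in response.tables:
    for row in table.rows:
        print(row)
```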

Custom metrics extend monitoring beyond infrastructure to capture business-specific indicators. Using the Application Insights SDK or Azure Monitor REST API, you send metrics that matter to your organization—such as the number of AI model predictions completed per minute, queue processing latency, or user session duration. This becomes especially important when your operations team needs to correlate infrastructure performance with business outcomes and demonstrate how resource optimization improves application responsiveness.
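
As one illustration of the SDK route, the sketch below publishes a counter-style custom metric through the `azure-monitor-opentelemetry` distro, which sends data to an Application Insights resource. The connection string is a hypothetical placeholder, and the metric and attribute names are invented for the example.

```python
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import metrics

# Hypothetical connection string -- copy the real one from your
# Application Insights resource in the Azure portal.
configure_azure_monitor(
    connection_string="InstrumentationKey=<key>;IngestionEndpoint=<endpoint>"
)

meter = metrics.get_meter("training-pipeline")

# A counter tracking completed predictions; the exporter aggregates
# and ships it to Azure Monitor on a fixed interval.
predictions_completed = meter.create_counter(
    "predictions_completed",
    unit="1",
    description="AI model predictions completed",
)

# Call this each time the model serves a prediction.
predictions_completed.add(1, attributes={"model": "demand-forecast"})
```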

:::image type="content" source="../media/custom-metrics-extend-monitoring-infrastructure.png" alt-text="Diagram showing how custom metrics extend monitoring beyond infrastructure to capture business-specific indicators.":::

## Visualizing metrics for operations teams

Collecting metrics delivers value only when your team can interpret trends and act on anomalies. Azure Monitor provides two primary visualization tools that serve different operational needs. Metrics Explorer offers ad-hoc analysis when you investigate a specific performance question or troubleshoot an active incident. You select a resource, choose one or more metrics, apply time range filters, and view trend charts that reveal patterns like CPU spikes during batch processing or gradual memory leaks over multiple days.

With Metrics Explorer, you answer immediate questions: Did CPU usage exceed 80% during last night's training run? How does network throughput compare between this week and last week? However, ad-hoc analysis doesn't provide continuous monitoring. Your operations team needs persistent visibility into critical metrics without repeatedly building the same charts. Azure dashboards solve this by pinning Metrics Explorer visualizations to a shared view that displays real-time data from multiple resources simultaneously.

:::image type="content" source="../media/temporary-analysis-continuous-monitor.png" alt-text="Diagram showing how Azure dashboards pin Metrics Explorer visualizations to a shared view.":::

A well-designed dashboard shows your team the health of compute, storage, and networking resources at a glance. You create separate panels for CPU utilization across all virtual machines, storage account transaction rates, and network gateway bandwidth consumption. This consolidated view enables your operations team to detect cross-resource patterns—such as high CPU correlating with increased storage I/O—and prioritize investigation efforts based on severity and business impact. For AI workloads that span multiple services, this holistic visibility reduces the time spent switching between resource pages and accelerates root cause analysis during incidents.

## Business impact of continuous metric monitoring

Proactive metric tracking transforms infrastructure management from reactive firefighting to preventive maintenance. When you visualize CPU trends over weeks instead of responding to individual spikes, you identify capacity planning opportunities before resources become bottlenecks. Your finance team benefits from this visibility through more accurate cost forecasting, because metric data reveals when to scale resources up or down based on actual usage patterns rather than guesswork.

For teams managing AI infrastructure, continuous monitoring delivers measurable operational improvements. Organizations that implement metric dashboards report 40-60% reductions in mean time to detection (MTTD) for performance issues, because anomalies become visible immediately rather than surfacing only after user complaints. This early detection prevents cascading failures—such as a memory leak in one VM causing downstream service timeouts—and reduces the business impact of infrastructure incidents by enabling faster, more targeted remediation efforts.

:::image type="content" source="../media/azure-monitor-collect-platform.png" alt-text="Diagram showing three metric sources, with Azure resources emitting platform metrics automatically.":::

*Azure Monitor collects platform, guest OS, and custom metrics, then delivers them to visualization and alerting tools*

## More resources

- [Azure Monitor Metrics overview](/azure/azure-monitor/essentials/data-platform-metrics) - Comprehensive guide to metric types, collection methods, and retention policies
- [Metrics Explorer documentation](/azure/azure-monitor/essentials/metrics-getting-started) - Step-by-step instructions for creating charts and analyzing metric data
- [Azure dashboards best practices](/azure/azure-portal/azure-portal-dashboards) - Design patterns for effective operational dashboards
