Skip to content

Commit 49a94f1

Browse files
authored
Create 1-introduction.md
1 parent b330dfd commit 49a94f1

1 file changed

Lines changed: 25 additions & 0 deletions

File tree

  • learn-pr/wwl-azure/manage-monitoring-ai-ready-infrastructure/includes
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
Your company runs machine learning workloads on Azure that analyze customer data around the clock. Last week, a virtual machine crashed during a training job, but your team discovered the failure only after users reported missing results. By then, two hours of compute time were wasted, and the training pipeline needed manual restart. This scenario highlights a critical gap: without proactive monitoring, infrastructure failures disrupt business operations before anyone notices.
2+
3+
Azure Monitor closes this gap by collecting metrics, logs, and alerts from your infrastructure in real time. With Azure Monitor, you detect performance degradation before it causes downtime, receive notifications when resources exceed capacity thresholds, and query log data to diagnose the root cause of failures. For AI workloads that demand high availability, this visibility translates to measurable outcomes: reduced mean time to resolution (MTTR), improved service level agreement (SLA) compliance, and fewer manual interventions during production incidents.
4+
5+
In this module, you configure monitoring for Azure infrastructure supporting AI workloads. You set up metric collection to track CPU, memory, and disk performance. You create alert rules that notify your operations team when thresholds are breached. You implement alert processing rules to suppress notifications during planned maintenance windows. Finally, you query log data in Log Analytics Workspace to investigate infrastructure events and validate your monitoring configuration.
6+
7+
## Learning objectives
8+
9+
By the end of this module, you're able to:
10+
11+
- Explain how Azure Monitor and Log Analytics Workspace support infrastructure management
12+
- Configure metrics collection and visualization for Azure resources
13+
- Implement alert rules and processing rules to respond to infrastructure events
14+
- Query log data to diagnose infrastructure issues
15+
16+
## Prerequisites
17+
18+
- Familiarity with basic Azure concepts and resource types such as virtual machines, storage accounts, and networking components
19+
- Access to an Azure subscription with Contributor permissions to create and configure resources
20+
- Understanding of fundamental networking and compute concepts including IP addressing, load balancing, and CPU utilization
21+
22+
## More resources
23+
24+
- [Azure Monitor overview](/azure/azure-monitor/overview) - Official documentation covering Azure Monitor architecture and capabilities
25+
- [Log Analytics Workspace documentation](/azure/azure-monitor/logs/log-analytics-workspace-overview) - Detailed guide to Log Analytics Workspace setup and query capabilities

0 commit comments

Comments
 (0)