You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
description: Learn how AI-enabled agents help solve problems and support resilient and self-healing systems on your behalf.
4
4
author: craigshoemaker
5
5
ms.topic: conceptual
6
-
ms.date: 08/20/2025
6
+
ms.date: 08/22/2025
7
7
ms.author: cshoe
8
8
ms.service: azure
9
9
---
10
10
11
11
# Azure SRE Agent overview (preview)
12
12
13
-
Azure SRE Agent is an AI-powered service that helps you monitor, manage, and maintain your Azure resources with minimal human intervention. By combining artificial intelligence with Azure and site reliability engineering best practices, the agent proactively identifies issues, provides actionable insights, and even performs remediation tasks to help you ensure maximum uptime for your critical cloud services.
13
+
Azure SRE Agent is an AI-powered reliability assistant that helps teams diagnose and resolve production issues, reduce operational toil, and lower mean time to resolution (MTTR).
14
14
15
-
As an automated tool, the agent continuously monitors your resources, analyzes performance data, and responds to incidents in real-time. Whether you're troubleshooting application issues, deploying new services, or maintaining existing infrastructure, Azure SRE Agent reduces operational toil and lets you focus on higher value work.
15
+
Ask questions in natural language, get explainable root cause analysis (RCA), and orchestrate incident workflows with human-in-the-loop approvals or autonomous execution within scoped guardrails. You can configure the agent to follow customized instructions and runbooks and enable consistent and scalable incident response aligned with your team’s operational practices.
16
16
17
-
:::image type="content" source="media/overview/azure-sre-agent-activities-screenshot.png" alt-text="Screenshot of Azure SRE Agent activities window.":::
17
+
## What you can do with SRE Agent
18
18
19
-
> [!NOTE]
20
-
> The SRE Agent is in preview and is only available to customers on the SRE Agent wait list. To sign up for the wait list, fill out the [SRE Agent application](https://go.microsoft.com/fwlink/?linkid=2319540).
| Ask plain-language questions about Azure resources, incidents, and health. | Diagnose, mitigate, and resolve incidents across Azure Monitor or integrated tools. The agent works autonomously or with approvals. | Agent sends daily summaries of environment health, flags spikes in CPU/memory usage, and identifies resources not following security best practices. |
22
+
|**Examples:**<br><br>• *What changed in production in last 24 hours?*<br><br>• *Which resources are unhealthy?*<br><br>• *What alerts are active now?*|**Examples:**<br><br>• Incidents from ServiceNow or PagerDuty<br><br>• 500 error alerts from Azure Monitor<br><br> • Run custom incident resolution workflows |**Examples:**<br><br>• Daily health summary for production<br><br>• CPU spike detection<br><br>• Security compliance violations |
21
23
22
-
## Key features
24
+
Watch the following video to see SRE Agent in action.
23
25
24
-
Azure SRE Agent is made up from the following key features:
26
+
<br>
25
27
26
-
***Specialized AI-enabled tooling**: SRE Agent offers fine-tuned tools that help you manage and maintain Azure services according to architectural and security best practices. The agent handles manual and mundane tasks while maintaining a continuous context of your Azure services and environments. This environmental awareness enables the agent to quickly identify root causes and troubleshoot issues for you.
***Conversational chat interface**: By responding to your prompts, you can use the agent to troubleshoot and resolve application issues, deploy Azure services, and delegate work to autonomous agents all via a chat window.
30
+
## Key capabilities
29
31
30
-
***Incident management**: You can configure the agent to be the first to respond to incidents in your environment. Paired with [Azure Monitor Alerts](/azure/azure-monitor/alerts/alerts-overview), [PagerDuty](https://www.pagerduty.com/), or [ServiceNow](https://www.servicenow.com/), you can define custom resolution workflows to quickly resolve issues. Depending on your configuration, the agent can fix problems upon your approval, or work autonomously.
32
+
| Feature | Description |
33
+
|---|---|
34
+
|**Incident Automation**| Diagnose, enrich, and orchestrate workflows across Azure Monitor and supported tools with human-in-the-loop approvals or autonomous execution using custom incident resolution plans. |
35
+
|**Customizable incident handling**| Tailor the agent’s behavior to follow your operational instructions, ensuring incidents are managed in alignment with your team’s SRE best practices. |
36
+
|**Explainable root cause analysis (RCA)**| Correlate metrics, logs, traces, and recent deployments to propose likely causes and safe mitigations. When attached to a source code repository, the agent can pinpoint code diffs in RCA reports. |
37
+
|**Dev work item creation**| Automatically create developer work items in GitHub or Azure DevOps, linking incidents to commits, PRs, and deployment history. Includes repro steps, logs, and suspects to accelerate resolution. |
38
+
|**Natural language insights**| Ask questions and issue commands in plain English. |
31
39
32
-
***Continuous, proactive monitoring**: The agent continuously evaluates activity in your environments to monitor active resources. The agent maintains a knowledge graph of your Azure infrastructure maintaining relationships between resources and dependencies.
40
+
## Integrations
33
41
34
-
***Source code integration**: When you connect your Azure resources to Azure DevOps or GitHub repositories through SRE Agent, the agent is able to help pinpoint root causes in your code base. Additionally, if you connect a resource to a GitHub repository, the agent can delegate fixes to GitHub Copilot.
42
+
Azure SRE Agent integrates with the following services:
Azure SRE Agent serves as a virtual Site Reliability Engineer, helping you maintain and optimize your Azure resources. Here's how you can use the agent on a daily basis:
46
+
-**Source Code:** GitHub, Azure DevOps
39
47
40
-
***Incident response**: When alerts trigger from your monitoring systems, interact with the agent through the chat interface to quickly diagnose issues. The agent can analyze the problem, suggest mitigations, and with your approval, implement fixes automatically.
48
+
## Get started
41
49
42
-
***Proactive maintenance**: Use the agent to identify optimization opportunities in your environment. Ask questions like "Which of my apps need configuration improvements?" or "Are there any resources not following security best practices?"
50
+
Use the following steps to start working with Azure SRE Agent.
43
51
44
-
***Daily briefings**: Start your day by reviewing the daily resource report automatically generated by the agent. This report provides a snapshot of your environment's health, highlighting any incidents that occurred overnight and their current status.
52
+
# [Explore](#tab/explore)
45
53
46
-
***Resource management**: Delegate routine tasks like scaling resources, reviewing configurations, or checking on service health by asking the agent through the chat interface. The agent monitors and maintains any Azure resources in resource groups managed by the agent.
54
+
1. Create [a new agent](usage.md)in your subscription with *[Reader](security-context.md)* permissions.
47
55
48
-
***Resource visualization**: When you need to understand relationships between resources or troubleshoot dependencies, ask the agent to visualize your infrastructure and explain connections between services. Further, you can review the resource visualization the agent maintains in the Azure portal.
56
+
1. Point the agent to the resource groups you want to manage.
49
57
50
-
:::image type="content" source="media/overview/resources.png" alt-text="Screenshot of an SRE Agent knowledge graph." lightbox="media/overview/resources.png":::
58
+
1. Try prompts like:
51
59
52
-
## Supported services
60
+
-*What’s the CPU and memory utilization of my app?*
53
61
54
-
Azure SRE Agent helps you monitor, maintain, and manage any Azure service.
62
+
-*Which resources are unhealthy?*
55
63
56
-
Additionally, the agent features fine-tuned contexts for managing the following services:
64
+
-*Where am I missing alert rules?*
57
65
58
-
:::row:::
59
-
:::column span="":::
60
-
- Azure API Management
61
-
- Azure App Service
62
-
- Azure Cache for Redis
63
-
- Azure Container Apps
64
-
- Azure Cosmos DB
65
-
- Azure Database for PostgreSQL
66
-
:::column-end:::
67
-
:::column span="":::
68
-
- Azure Functions
69
-
- Azure Kubernetes Service
70
-
- Azure SQL
71
-
- Azure Storage
72
-
- Azure Virtual Machines
73
-
:::column-end:::
74
-
:::row-end:::
66
+
1. Take action to proposed next steps.
75
67
76
-
To get the latest list of services with custom agent tooling, submit the following prompt to the agent:
68
+
# [Handle an incident](#tab/incident)
77
69
78
-
```text
79
-
Which Azure services do you have specialized tooling available for?
80
-
```
70
+
1. Enable integrations:
71
+
72
+
- Incident management tools: Link to ServiceNow, PagerDuty, or use Azure Monitor alerts.
73
+
74
+
- Ticketing systems: Azure Boards.
75
+
76
+
- Source code repositories: Connect to GitHub or Azure DevOps.
77
+
78
+
1. Send a test incident to validate enrichment, RCA, and automation flow.
79
+
80
+
1. Review incident context, RCA timeline, and proposed mitigations.
81
+
82
+
---
81
83
82
84
## Considerations
83
85
84
86
Keep in mind the following considerations as you use Azure SRE Agent:
85
87
86
-
* English is the only supported language in the chat interface
87
-
* During preview, you can deploy the agent to the Sweden Central region, but the agent can monitor and remediate issues for services in any Azure region.
88
-
* For more information on how data is managed in Azure SRE Agent, see the [Microsoft privacy policy](https://www.microsoft.com/privacy/privacystatement).
88
+
- English is the only supported language in the chat interface
89
+
- During preview, you can deploy the agent to the Sweden Central region, but the agent can monitor and remediate issues for services in any Azure region.
90
+
- For more information on how data is managed in Azure SRE Agent, see the [Microsoft privacy policy](https://www.microsoft.com/privacy/privacystatement).
91
+
- Availability varies by region/tenant configuration.
92
+
- Preview [billing](billing.md) begins **September 1, 2025**, using Azure Agent Units (AAU).
0 commit comments