Skip to content

Commit f6a39dc

Browse files
authored
Merge pull request #304539 from craigshoemaker/sre/overview2
[SRE Agent] Prototype Overview
2 parents 351d4e5 + 06c0a53 commit f6a39dc

1 file changed

Lines changed: 52 additions & 48 deletions

File tree

articles/sre-agent/overview.md

Lines changed: 52 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -3,89 +3,93 @@ title: Azure SRE Agent overview (preview)
33
description: Learn how AI-enabled agents help solve problems and support resilient and self-healing systems on your behalf.
44
author: craigshoemaker
55
ms.topic: conceptual
6-
ms.date: 08/20/2025
6+
ms.date: 08/22/2025
77
ms.author: cshoe
88
ms.service: azure
99
---
1010

1111
# Azure SRE Agent overview (preview)
1212

13-
Azure SRE Agent is an AI-powered service that helps you monitor, manage, and maintain your Azure resources with minimal human intervention. By combining artificial intelligence with Azure and site reliability engineering best practices, the agent proactively identifies issues, provides actionable insights, and even performs remediation tasks to help you ensure maximum uptime for your critical cloud services.
13+
Azure SRE Agent is an AI-powered reliability assistant that helps teams diagnose and resolve production issues, reduce operational toil, and lower mean time to resolution (MTTR).
1414

15-
As an automated tool, the agent continuously monitors your resources, analyzes performance data, and responds to incidents in real-time. Whether you're troubleshooting application issues, deploying new services, or maintaining existing infrastructure, Azure SRE Agent reduces operational toil and lets you focus on higher value work.
15+
Ask questions in natural language, get explainable root cause analysis (RCA), and orchestrate incident workflows with human-in-the-loop approvals or autonomous execution within scoped guardrails. You can configure the agent to follow customized instructions and runbooks and enable consistent and scalable incident response aligned with your team’s operational practices.
1616

17-
:::image type="content" source="media/overview/azure-sre-agent-activities-screenshot.png" alt-text="Screenshot of Azure SRE Agent activities window.":::
17+
## What you can do with SRE Agent
1818

19-
> [!NOTE]
20-
> The SRE Agent is in preview and is only available to customers on the SRE Agent wait list. To sign up for the wait list, fill out the [SRE Agent application](https://go.microsoft.com/fwlink/?linkid=2319540).
19+
| Ask & understand | Automate incidents | Stay proactive |
20+
|---|---|---|
21+
| Ask plain-language questions about Azure resources, incidents, and health. | Diagnose, mitigate, and resolve incidents across Azure Monitor or integrated tools. The agent works autonomously or with approvals. | Agent sends daily summaries of environment health, flags spikes in CPU/memory usage, and identifies resources not following security best practices. |
22+
| **Examples:**<br><br>• *What changed in production in last 24 hours?*<br><br>• *Which resources are unhealthy?*<br><br>• *What alerts are active now?* | **Examples:**<br><br>• Incidents from ServiceNow or PagerDuty<br><br>• 500 error alerts from Azure Monitor<br><br> • Run custom incident resolution workflows | **Examples:**<br><br>• Daily health summary for production<br><br>• CPU spike detection<br><br>• Security compliance violations |
2123

22-
## Key features
24+
Watch the following video to see SRE Agent in action.
2325

24-
Azure SRE Agent is made up from the following key features:
26+
<br>
2527

26-
* **Specialized AI-enabled tooling**: SRE Agent offers fine-tuned tools that help you manage and maintain Azure services according to architectural and security best practices. The agent handles manual and mundane tasks while maintaining a continuous context of your Azure services and environments. This environmental awareness enables the agent to quickly identify root causes and troubleshoot issues for you.
28+
> [!VIDEO https://www.youtube.com/embed/DRWppVNOTqQ?si=FJ9dNk5uY1kUET-R]
2729
28-
* **Conversational chat interface**: By responding to your prompts, you can use the agent to troubleshoot and resolve application issues, deploy Azure services, and delegate work to autonomous agents all via a chat window.
30+
## Key capabilities
2931

30-
* **Incident management**: You can configure the agent to be the first to respond to incidents in your environment. Paired with [Azure Monitor Alerts](/azure/azure-monitor/alerts/alerts-overview), [PagerDuty](https://www.pagerduty.com/), or [ServiceNow](https://www.servicenow.com/), you can define custom resolution workflows to quickly resolve issues. Depending on your configuration, the agent can fix problems upon your approval, or work autonomously.
32+
| Feature | Description |
33+
|---|---|
34+
| **Incident Automation** | Diagnose, enrich, and orchestrate workflows across Azure Monitor and supported tools with human-in-the-loop approvals or autonomous execution using custom incident resolution plans. |
35+
| **Customizable incident handling** | Tailor the agent’s behavior to follow your operational instructions, ensuring incidents are managed in alignment with your team’s SRE best practices. |
36+
| **Explainable root cause analysis (RCA)** | Correlate metrics, logs, traces, and recent deployments to propose likely causes and safe mitigations. When attached to a source code repository, the agent can pinpoint code diffs in RCA reports. |
37+
| **Dev work item creation** | Automatically create developer work items in GitHub or Azure DevOps, linking incidents to commits, PRs, and deployment history. Includes repro steps, logs, and suspects to accelerate resolution. |
38+
| **Natural language insights** | Ask questions and issue commands in plain English. |
3139

32-
* **Continuous, proactive monitoring**: The agent continuously evaluates activity in your environments to monitor active resources. The agent maintains a knowledge graph of your Azure infrastructure maintaining relationships between resources and dependencies.
40+
## Integrations
3341

34-
* **Source code integration**: When you connect your Azure resources to Azure DevOps or GitHub repositories through SRE Agent, the agent is able to help pinpoint root causes in your code base. Additionally, if you connect a resource to a GitHub repository, the agent can delegate fixes to GitHub Copilot.
42+
Azure SRE Agent integrates with the following services:
3543

36-
## How do I use Azure SRE Agent?
44+
- **Incidents & work:** [Azure Monitor alerts](/azure/azure-monitor/alerts/alerts-overview), [PagerDuty](https://www.pagerduty.com/), [ServiceNow](https://www.servicenow.com/)
3745

38-
Azure SRE Agent serves as a virtual Site Reliability Engineer, helping you maintain and optimize your Azure resources. Here's how you can use the agent on a daily basis:
46+
- **Source Code:** GitHub, Azure DevOps
3947

40-
* **Incident response**: When alerts trigger from your monitoring systems, interact with the agent through the chat interface to quickly diagnose issues. The agent can analyze the problem, suggest mitigations, and with your approval, implement fixes automatically.
48+
## Get started
4149

42-
* **Proactive maintenance**: Use the agent to identify optimization opportunities in your environment. Ask questions like "Which of my apps need configuration improvements?" or "Are there any resources not following security best practices?"
50+
Use the following steps to start working with Azure SRE Agent.
4351

44-
* **Daily briefings**: Start your day by reviewing the daily resource report automatically generated by the agent. This report provides a snapshot of your environment's health, highlighting any incidents that occurred overnight and their current status.
52+
# [Explore](#tab/explore)
4553

46-
* **Resource management**: Delegate routine tasks like scaling resources, reviewing configurations, or checking on service health by asking the agent through the chat interface. The agent monitors and maintains any Azure resources in resource groups managed by the agent.
54+
1. Create [a new agent](usage.md) in your subscription with *[Reader](security-context.md)* permissions.
4755

48-
* **Resource visualization**: When you need to understand relationships between resources or troubleshoot dependencies, ask the agent to visualize your infrastructure and explain connections between services. Further, you can review the resource visualization the agent maintains in the Azure portal.
56+
1. Point the agent to the resource groups you want to manage.
4957

50-
:::image type="content" source="media/overview/resources.png" alt-text="Screenshot of an SRE Agent knowledge graph." lightbox="media/overview/resources.png":::
58+
1. Try prompts like:
5159

52-
## Supported services
60+
- *What’s the CPU and memory utilization of my app?*
5361

54-
Azure SRE Agent helps you monitor, maintain, and manage any Azure service.
62+
- *Which resources are unhealthy?*
5563

56-
Additionally, the agent features fine-tuned contexts for managing the following services:
64+
- *Where am I missing alert rules?*
5765

58-
:::row:::
59-
:::column span="":::
60-
- Azure API Management
61-
- Azure App Service
62-
- Azure Cache for Redis
63-
- Azure Container Apps
64-
- Azure Cosmos DB
65-
- Azure Database for PostgreSQL
66-
:::column-end:::
67-
:::column span="":::
68-
- Azure Functions
69-
- Azure Kubernetes Service
70-
- Azure SQL
71-
- Azure Storage
72-
- Azure Virtual Machines
73-
:::column-end:::
74-
:::row-end:::
66+
1. Take action to proposed next steps.
7567

76-
To get the latest list of services with custom agent tooling, submit the following prompt to the agent:
68+
# [Handle an incident](#tab/incident)
7769

78-
```text
79-
Which Azure services do you have specialized tooling available for?
80-
```
70+
1. Enable integrations:
71+
72+
- Incident management tools: Link to ServiceNow, PagerDuty, or use Azure Monitor alerts.
73+
74+
- Ticketing systems: Azure Boards.
75+
76+
- Source code repositories: Connect to GitHub or Azure DevOps.
77+
78+
1. Send a test incident to validate enrichment, RCA, and automation flow.
79+
80+
1. Review incident context, RCA timeline, and proposed mitigations.
81+
82+
---
8183

8284
## Considerations
8385

8486
Keep in mind the following considerations as you use Azure SRE Agent:
8587

86-
* English is the only supported language in the chat interface
87-
* During preview, you can deploy the agent to the Sweden Central region, but the agent can monitor and remediate issues for services in any Azure region.
88-
* For more information on how data is managed in Azure SRE Agent, see the [Microsoft privacy policy](https://www.microsoft.com/privacy/privacystatement).
88+
- English is the only supported language in the chat interface
89+
- During preview, you can deploy the agent to the Sweden Central region, but the agent can monitor and remediate issues for services in any Azure region.
90+
- For more information on how data is managed in Azure SRE Agent, see the [Microsoft privacy policy](https://www.microsoft.com/privacy/privacystatement).
91+
- Availability varies by region/tenant configuration.
92+
- Preview [billing](billing.md) begins **September 1, 2025**, using Azure Agent Units (AAU).
8993

9094
## Preview access
9195

0 commit comments

Comments
 (0)