
Commit 56e91ff

Merge pull request #313264 from craigshoemaker/sre/sync-new-0317-01
[SRE Agent] Setup, identity, and getting started
2 parents 7d47e63 + f874db0 commit 56e91ff

35 files changed

Lines changed: 1238 additions & 140 deletions
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
title: Agent identity in Azure SRE Agent
description: Learn how Azure SRE Agent uses managed identities to authenticate to Azure resources and external connectors.
ms.topic: conceptual
ms.service: azure-sre-agent
ms.date: 03/16/2026
author: craigshoemaker
ms.author: cshoe
ms.ai-usage: ai-assisted
ms.custom: identity, managed identity, uami, connectors, oauth, security
#customer intent: As an SRE, I want to understand the identities my agent uses so that I can manage access and configure connectors correctly.
---

# Agent identity in Azure SRE Agent

When you create an agent, Azure automatically provisions identity resources. This article explains what gets created, why two identities exist, and how connectors use them.

For information about how your agent gets permissions on Azure resources (RBAC roles, permission levels, on-behalf-of flow), see [Agent permissions](permissions.md).

## What gets created

Two managed identities are created alongside your agent.

| Identity | What it is | What you do with it |
|---|---|---|
| **User-assigned managed identity (UAMI)** | A standalone identity resource in your resource group | Assign RBAC roles to it and select it when you set up connectors. This is the identity you manage. |
| **System-assigned managed identity** | An internal identity used by the agent's infrastructure | Nothing. This identity is managed automatically and used only for internal operations. |

The UAMI is the identity you work with. It appears in your resource group, you assign RBAC roles to it, and you select it when you set up connectors.
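
RBAC role assignments reference the UAMI's object (principal) ID, so granting a role from the Azure CLI takes two steps: look up that ID, then assign the role. The following is a minimal sketch, not a prescribed procedure. `assign_role_to_uami` is a hypothetical helper name, and you choose the role and scope for your scenario. It uses only the standard `az identity show` and `az role assignment create` commands.

```shell
# Sketch: grant an Azure RBAC role to the agent's UAMI at a given scope.
# Hypothetical helper; the role and scope arguments are your choice.
assign_role_to_uami() {
  local uami_resource_group="$1"
  local uami_name="$2"
  local role="$3"
  local scope="$4"
  local principal_id

  # Look up the UAMI's object (principal) ID, the value RBAC assignments use.
  principal_id="$(az identity show \
    --resource-group "$uami_resource_group" \
    --name "$uami_name" \
    --query principalId \
    --output tsv)" || return 1

  # Assign by object ID. Declaring the principal type avoids a Microsoft Graph
  # lookup of the principal (managed identities are service principals).
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "$role" \
    --scope "$scope"
}
```

For example, `assign_role_to_uami <RESOURCE_GROUP_NAME> <UAMI_NAME> Reader /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<TARGET_RESOURCE_GROUP>` would grant read access to one resource group. Passing `--assignee-principal-type` together with `--assignee-object-id` helps avoid errors caused by Microsoft Graph propagation delays right after an identity is created. For guidance on which roles to assign, see [Agent permissions](permissions.md).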

> [!TIP]
> When you see a managed identity dropdown in the portal (for connectors, repositories, or other integrations), select your agent's UAMI. It's the identity that matches your RBAC role assignments.

## Where your agent's UAMI is used

Your agent's UAMI is the primary identity for most operations.

| Operation | Identity | Notes |
|---|---|---|
| **Azure resource operations** (Azure Resource Manager, CLI, diagnostics) | UAMI | The RBAC roles you assign determine what the agent can access |
| **Communication connectors** (Outlook, Teams) | UAMI + your OAuth credentials | You sign in via OAuth; the UAMI brokers authentication to the connector resource |
| **Data connectors** (Azure Data Explorer) | UAMI | Grant the UAMI permissions on the target Kusto cluster |
| **Source code connectors** (GitHub, Azure DevOps) | UAMI for Azure DevOps; OAuth for GitHub | The Azure DevOps connector authenticates with the UAMI; the GitHub connector uses OAuth sign-in |
| **MCP connectors** | Varies | You provide the endpoint URL and credentials, and can optionally assign a managed identity for downstream Azure calls |
| **Internal infrastructure** | UAMI | Used automatically for the agent's internal operations |
| **Key Vault** | UAMI (preferred) or system-assigned | Falls back to the system-assigned identity if no UAMI is specified |

## How connectors use identity

Different connector types use identity differently. The key distinction is whether the connector needs to go through Azure Resource Manager (ARM) to reach the external service.

### Communication connectors (Outlook, Teams)

When you set up a communication connector, two things happen:

1. **You sign in** with your account via OAuth, which gives the connector your user credentials.
1. **You select a UAMI** from the identity dropdown, which the connector uses to authenticate to the connector resource.

The connector stores your OAuth token securely in a connector resource. This resource acts as a secure bridge: it holds your credentials so the agent doesn't need direct access to them. It also uses the UAMI to broker authentication when the agent sends an email or posts a Teams message on your behalf.

### Data connectors (Azure Data Explorer / Kusto)

For Kusto connectors, the agent uses the UAMI directly to authenticate to your Azure Data Explorer cluster. No OAuth sign-in is needed. Grant the UAMI the required permissions, such as the **Viewer** role, on the Kusto cluster.
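
The following function sketches how you might grant that access from the Azure CLI by assigning the **Viewer** role on one Azure Data Explorer database. It rests on several assumptions:

- The Azure CLI `kusto` extension is installed.
- A database-level assignment is what you want.
- Kusto identifies a managed identity of principal type `App` by its client (application) ID.

The function name and the assignment name `sre-agent-viewer` are hypothetical. Confirm the parameters with `az kusto database-principal-assignment create --help` before you rely on them.

```shell
# Sketch: grant the agent's UAMI the Viewer role on one Azure Data Explorer database.
# Assumptions: the "kusto" CLI extension is installed (az extension add --name kusto),
# and Kusto expects the identity's client (application) ID for principal type App.
grant_kusto_viewer() {
  local uami_resource_group="$1"
  local uami_name="$2"
  local kusto_resource_group="$3"
  local cluster_name="$4"
  local database_name="$5"
  local client_id tenant_id

  # Read the UAMI's client ID and tenant ID.
  client_id="$(az identity show --resource-group "$uami_resource_group" \
    --name "$uami_name" --query clientId --output tsv)" || return 1
  tenant_id="$(az identity show --resource-group "$uami_resource_group" \
    --name "$uami_name" --query tenantId --output tsv)" || return 1

  # Create a database-level principal assignment with the Viewer role.
  # "sre-agent-viewer" is a hypothetical assignment name.
  az kusto database-principal-assignment create \
    --resource-group "$kusto_resource_group" \
    --cluster-name "$cluster_name" \
    --database-name "$database_name" \
    --principal-assignment-name "sre-agent-viewer" \
    --principal-id "$client_id" \
    --principal-type App \
    --tenant-id "$tenant_id" \
    --role Viewer
}
```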

### Source code connectors (GitHub, Azure DevOps)

Source code connectors use different authentication methods depending on the platform.

- **Azure DevOps:** Uses the UAMI for managed identity authentication. Select the UAMI from the identity dropdown and grant it access to your Azure DevOps organization.
- **GitHub:** Uses OAuth authentication. Sign in by using your GitHub account. No managed identity is needed for the GitHub connection itself.

### Custom MCP connectors

MCP connectors use endpoint-based authentication. Provide the MCP server URL along with credentials, such as an API key, a bearer token, or OAuth credentials. You can optionally assign a managed identity for the MCP server to use when making downstream Azure API calls.

## Find your agent's UAMI

You can locate your agent's user-assigned managed identity from the agent portal, the Azure portal, or the Azure CLI.

**From the agent portal:**

1. Go to **Settings** > **Azure settings**.
1. The identity name appears in the **Managed Identity** field.
1. Select **Go to Identity** to open it in the Azure portal.

**From the Azure portal:**

1. Go to your agent's resource group.
1. Find the `id-*` managed identity resource.
1. Copy the **Object (principal) ID**. Use this value for RBAC role assignments.

**From the Azure CLI:**

```azurecli
# List user-assigned identities on the agent resource
az resource show \
  --resource-group <RESOURCE_GROUP_NAME> \
  --name <AGENT_NAME> \
  --resource-type Microsoft.App/containerApps \
  --query identity.userAssignedIdentities
```
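
You can also look up the identity by its `id-*` naming pattern. The following sketch lists the managed identities in a resource group whose names start with `id-`. For each one, it shows the principal ID you use for RBAC role assignments. The function name `list_agent_uamis` is hypothetical; `az identity list` and the JMESPath `starts_with` function are standard.

```shell
# Sketch: list managed identities whose names start with "id-" in a resource group,
# showing the principal (object) ID used for RBAC and the client ID.
list_agent_uamis() {
  local resource_group="$1"
  az identity list \
    --resource-group "$resource_group" \
    --query "[?starts_with(name, 'id-')].{name: name, principalId: principalId, clientId: clientId}" \
    --output table
}
```

For example, `list_agent_uamis <RESOURCE_GROUP_NAME>` prints a table with the name, principal ID, and client ID of each matching identity.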

## Next step

> [!div class="nextstepaction"]
> [Configure agent permissions](./permissions.md)

## Related content

- [Agent permissions](permissions.md): Learn how to configure RBAC roles and permission levels for your agent.
- [Connectors](connectors.md): Set up connector types and learn how they extend your agent's capabilities.
- [User roles and permissions](user-roles.md): Control who can view, interact with, and administer your agent.
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
---
title: "Tutorial: Automate Incident Response in Azure SRE Agent"
description: Connect Azure Monitor, create response plans, and let your agent investigate and resolve incidents autonomously from detection to fix.
ms.topic: tutorial
ms.date: 03/16/2026
author: craigshoemaker
ms.author: cshoe
ms.service: azure-sre-agent
ms.ai-usage: ai-assisted
#customer intent: As a site reliability engineer, I want to connect my incident platform and create response plans so that my agent automatically investigates and resolves incidents end-to-end.
---

# Tutorial: Automate incident response in Azure SRE Agent

**Estimated time**: 10 minutes

Connect your incident platform and let your agent handle alerts automatically, from detection through diagnosis to fix, without you typing a single message.

## What you accomplish

By the end of this step, your agent:

- Connects to Azure Monitor as your incident platform
- Receives incidents filtered by severity through a response plan
- Investigates matching alerts end-to-end, including code fixes and pull requests

## Prerequisites

| Requirement | Details |
|---|---|
| **Completed Steps 1–3** | [Create agent](create-agent.md), [Add knowledge](first-value.md), [Connect source code](connect-source-code.md). |
| **Azure resources connected** | At least one Azure subscription with resources the agent can monitor. |

## Connect Azure Monitor

Link Azure Monitor as your incident platform so the agent automatically receives alerts.

1. In the left sidebar, go to **Builder** > **Incident platform**.
1. Select the **Incident platform** dropdown and choose **Azure Monitor**.
1. The **Quickstart response plan** toggle is on by default. Turn it off, because you create your own response plan in the next section.
1. Select **Save**.

Wait for the connection to complete. The status changes to **"Azure Monitor connected. Your next step is to set up incident response plans."**

:::image type="content" source="media/automate-incidents/response-plan-saved.png" alt-text="Screenshot of Azure Monitor connected with a green checkmark status." lightbox="media/automate-incidents/response-plan-saved.png":::

**Checkpoint:** The incident platform page shows a green checkmark with **Azure Monitor connected**.

> [!TIP]
> You can also connect [PagerDuty](pagerduty-incidents.md) or [ServiceNow](servicenow-incidents.md) from the same dropdown.

## Create an incident response plan

An incident response plan tells the agent which incidents to pick up and how much autonomy it has. The following steps are for Azure Monitor. PagerDuty and ServiceNow response plans use different filter fields based on their own incident metadata, such as priority, category, and assignment group.

1. Go to **Builder** > **Incident response plans** in the left sidebar.

1. Select **New incident response plan**.

1. **Step 1: Set up incident filters:**

   - Enter a name, such as `all-incidents`.
   - Select severity levels. Choose **All severity** to catch everything during setup.
   - Optionally, add a title filter to narrow scope.

1. Select **Next**.

   :::image type="content" source="media/automate-incidents/response-plan-step-1.png" alt-text="Screenshot of the response plan creation form with name and severity fields." lightbox="media/automate-incidents/response-plan-step-1.png":::

1. **Step 2: Preview filter results:** Review matching past incidents from your incident platform (empty if no incidents exist yet). Select **Next**.

1. **Step 3: Save response plan:**
   - Choose how much control the agent has:
     - **Autonomous (Default)**: The agent investigates and acts independently, including code fixes and container restarts.
     - **Review**: The agent diagnoses but waits for your approval before acting.
   - Select **Save**.

   :::image type="content" source="media/automate-incidents/response-plan-step-3-save.png" alt-text="Screenshot of the response plan autonomy options showing Review and Autonomous modes." lightbox="media/automate-incidents/response-plan-step-3-save.png":::

**Checkpoint:** Your response plan appears in the list with status **On** and the autonomy level you selected.
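
The following hypothetical model shows how a response plan's severity and title filters could select an incident. It illustrates the filter logic under stated assumptions and isn't the agent's implementation:

- Azure Monitor severities are named `Sev0` through `Sev4`.
- `All` matches every severity.
- The optional title filter is a case-sensitive substring match.

The function name `plan_matches` is hypothetical.

```shell
# Hypothetical model of response plan filtering, for illustration only;
# it is not the agent's implementation.
# Assumptions: severities are named Sev0..Sev4, "All" matches every severity,
# and the optional title filter is a case-sensitive substring match.
# Returns 0 when the incident matches the plan, 1 otherwise.
plan_matches() {
  local selected_severities="$1"   # "All", or a space-separated list such as "Sev0 Sev1"
  local title_filter="$2"          # empty string means no title filter
  local incident_severity="$3"
  local incident_title="$4"

  # Severity check: skipped when every severity is selected.
  if [ "$selected_severities" != "All" ]; then
    case " $selected_severities " in
      *" $incident_severity "*) ;;
      *) return 1 ;;
    esac
  fi

  # Optional title check: the filter text must appear in the incident title.
  if [ -n "$title_filter" ]; then
    case "$incident_title" in
      *"$title_filter"*) ;;
      *) return 1 ;;
    esac
  fi
  return 0
}
```

For example, `plan_matches "Sev0 Sev1" "" "Sev3" "HTTP 5xx on my-app"` returns a nonzero status. In this model, a plan limited to Sev0 and Sev1 wouldn't pick up a Sev3 alert. Choosing **All severity** during setup avoids that kind of miss.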

## What happens when an alert fires

When Azure Monitor fires an alert that matches your response plan, the agent investigates automatically. What the agent does depends on the context you give it. Runbooks, code repositories, Azure resources, and prior investigations all shape how deep the investigation goes and which actions the agent takes.

### Example: HTTP 500 errors on a container app

In this example, the agent has a runbook for handling HTTP 500 errors, a connected code repository, and Azure resource access.
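
To produce an alert like the one in this example, you could create an Azure Monitor metric alert on the container app. The following is a minimal sketch, not a required setup step. It assumes the Container Apps `Requests` metric exposes a `statusCodeCategory` dimension with a `5xx` value; you can check with `az monitor metrics list-definitions --resource <CONTAINER_APP_RESOURCE_ID>`. The rule name, threshold, and time windows are illustrative. Severity `3` matches the Sev3 alert in this example, but whether the agent picks the alert up depends on your response plan's filters.

```shell
# Sketch: create a metric alert that fires on HTTP 5xx responses from a container app.
# Assumptions: the "Requests" metric has a "statusCodeCategory" dimension with a "5xx"
# value; the rule name, threshold, and windows are illustrative.
create_http5xx_alert() {
  local resource_group="$1"
  local container_app_id="$2"   # full resource ID of the container app
  az monitor metrics alert create \
    --name "http-5xx-alert" \
    --resource-group "$resource_group" \
    --scopes "$container_app_id" \
    --condition "total Requests > 10 where statusCodeCategory includes 5xx" \
    --window-size 5m \
    --evaluation-frequency 1m \
    --severity 3 \
    --description "HTTP 5xx responses from the container app"
}
```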

:::image type="content" source="media/automate-incidents/incident-completed.png" alt-text="Screenshot of the incidents page showing one completed Sev3 alert with green Completed status." lightbox="media/automate-incidents/incident-completed.png":::

**The agent builds a plan from your runbook.** Rather than following a generic troubleshooting sequence, the agent reads the HTTP 500 runbook you uploaded during onboarding and follows your team's procedures. It checks upstream dependencies first, then the connection pool, then recent deployments.

:::image type="content" source="media/automate-incidents/incident-full-page-top.png" alt-text="Screenshot of the agent showing investigation plan for HTTP 5xx alert with six numbered steps." lightbox="media/automate-incidents/incident-full-page-top.png":::

**The agent recalls prior knowledge.** If the agent investigated a similar issue before, it recognizes the pattern and skips discovery. It combines your runbook procedures with what it learned from previous investigations.

**The agent takes action.** In **Review** mode, the agent asks for your approval before each action. In **Autonomous** mode, it acts independently. In this example, the agent:

- Reads the source code and identifies the root cause
- Edits the code to fix the bug
- Restarts the container to mitigate the alert
- Commits the fix and pushes it to a new branch
- Creates a GitHub issue for tracking
- Verifies the service is healthy after the fix

**The agent delivers a remediation summary.** The agent produces a structured report with everything the team needs to follow up:

:::image type="content" source="media/automate-incidents/incident-full-page-code-fix.png" alt-text="Screenshot of the remediation summary table showing alert, mitigation, permanent fix, root cause, status, and tracking." lightbox="media/automate-incidents/incident-full-page-code-fix.png":::

| Item | What the agent reports |
|---|---|
| **Alert** | Which alert fired, severity, affected resource |
| **Immediate mitigation** | What was done to restore service right now |
| **Permanent fix** | Code changes made and branch pushed |
| **Root cause** | Specific code bug or configuration issue with file references |
| **Status** | Current health of the affected resource |
| **Tracking** | GitHub issue number |
| **Next steps** | Merge the pull request and redeploy |

> [!NOTE]
> Your results vary based on the context your agent has. An agent with more runbooks, connected repositories, and prior investigations produces deeper, more targeted responses.

## Next step

> [!div class="nextstepaction"]
> [Step 5: Automate actions](automate-actions.md)

## Related content

- [Incident response plans](incident-response-plans.md)
- [PagerDuty incidents](pagerduty-incidents.md)
- [ServiceNow incidents](servicenow-incidents.md)
- [Memory and knowledge](memory.md)
- [Monitor agent usage](monitor-agent-usage.md)
