Commit c6e40ff

Merge pull request #313420 from craigshoemaker/sre/sync-updates-0319-02
[SRE Agent] Update incident response and diagnostics documentation
2 parents 81c8f66 + 618e672 commit c6e40ff

12 files changed

Lines changed: 438 additions & 367 deletions

articles/sre-agent/diagnose-azure-observability.md

Lines changed: 23 additions & 51 deletions
@@ -3,7 +3,7 @@ title: Diagnose with Azure Observability in Azure SRE Agent
 description: Learn how your agent queries Application Insights, Log Analytics, Azure Monitor metrics, Activity Logs, Resource Graph, and resource-specific diagnostics automatically without connectors.
 ms.topic: conceptual
 ms.service: azure-sre-agent
-ms.date: 03/04/2026
+ms.date: 03/18/2026
 author: craigshoemaker
 ms.author: cshoe
 ms.ai-usage: ai-assisted
@@ -17,16 +17,16 @@ Your agent queries Application Insights, Log Analytics, Azure Monitor metrics, R
 > [!TIP]
 > Key benefits of Azure observability diagnostics:
 >
-> - Your agent queries App Insights, Log Analytics, Azure Monitor metrics, Resource Graph, Activity Logs, and resource-specific diagnostics all in one investigation.
-> - No connectors are needed — everything works through managed identity and Azure RBAC.
+> - Your agent queries App Insights, Log Analytics, Azure Monitor metrics, Resource Graph, Activity Logs, and resource-specific diagnostics, all in one investigation.
+> - No connectors are needed. Everything works through managed identity and Azure RBAC.
 > - Your agent decides which sources to query based on the symptom, correlates evidence across them, and explains what it found.
-> - Deep diagnostics go beyond metrics CPU profiling, memory analysis, connectivity checks, and deployment history.
+> - Deep diagnostics go beyond metrics including CPU profiling, memory analysis, connectivity checks, and deployment history.
 
 ## The problem: too many places to look
 
 Azure's observability stack is comprehensive. Application Insights captures traces and dependencies. Log Analytics stores custom logs and events. Azure Monitor tracks resource metrics. Resource Graph maps topology. Activity Logs record configuration changes. Each Azure service has its own diagnostics including Container Apps console logs, App Service deployment history, Function App health checks, and AKS pod status.
 
-That breadth is the problem. During an incident, you need data from several of these sources, but you have to remember which portal has which data, write KQL from scratch, manually copy operation IDs between tools, and correlate timestamps across tabs. The data exists everywhere. Knowing where to look and connecting what you find is what takes the most time.
+That breadth is the problem. During an incident, you need data from several of these sources, but you have to remember which portal has which data, write KQL from scratch, manually copy operation IDs between tools, and correlate timestamps across tabs. The data exists everywhere. Knowing where to look and connecting what you find takes the most time.
 
 ## How your agent investigates
 
@@ -36,31 +36,26 @@ The following diagram shows how your agent diagnoses Azure services by querying
 
 Your agent has built-in access to Azure's full diagnostic surface. Grant permissions once, and your agent queries the right sources automatically based on the symptom:
 
-1. **Discovers resources** Resource Graph finds topology, relationships, and connected resources across your subscriptions.
-1. **Queries logs** Application Insights for request traces, exceptions, and dependencies; Log Analytics for custom workspace data.
-1. **Analyzes metrics** Azure Monitor for CPU, memory, request rates, and availability with automatic time-series analysis.
-1. **Checks changes** Activity Logs surface recent configuration changes and deployments that might correlate with the issue.
-1. **Runs deep diagnostics** Built-in skills perform CPU profiling, memory analysis, latency assessment, connectivity checks, and resource-specific health analysis.
-1. **Executes Azure CLI commands** Reads resource state, checks configurations, and inspects properties that APIs don't expose directly.
-1. **Correlates everything** Evidence from all sources is connected automatically, with no copy-paste between portals.
+1. **Discovers resources**: Resource Graph finds topology, relationships, and connected resources across your subscriptions.
+1. **Queries logs**: Application Insights for request traces, exceptions, and dependencies; Log Analytics for custom workspace data.
+1. **Analyzes metrics**: Azure Monitor for CPU, memory, request rates, and availability with automatic time-series analysis.
+1. **Checks changes**: Activity Logs surface recent configuration changes and deployments that might correlate with the issue.
+1. **Runs deep diagnostics**: Built-in skills perform CPU profiling, memory analysis, latency assessment, connectivity checks, and resource-specific health analysis.
+1. **Executes Azure CLI commands**: Reads resource state, checks configurations, and inspects properties that APIs don't expose directly.
+1. **Correlates everything**: Evidence from all sources is connected automatically, with no copy-paste between portals.
 
 > [!NOTE]
 > Your agent selects the right tools for each resource type automatically. You don't configure which tools to use. Your agent decides based on the symptom and the resource involved.
 
-## What makes this different
+## What makes this approach different
 
 Azure's observability capabilities are excellent. The challenge is navigating them under pressure. Your agent eliminates the cognitive overhead of knowing where to look and how to connect what you find.
 
-**Single investigation instead of portal-hopping.** Your agent queries all sources in one investigation. You don't need to remember whether a specific metric lives in Azure Monitor, Application Insights, or a resource-specific blade.
+**Single investigation instead of portal-hopping.** Your agent queries all sources in one investigation. You don't need to remember whether a specific metric lives in Azure Monitor, Application Insights, or a resource-specific window.
 
 **Symptom-driven queries instead of writing KQL from scratch.** Your agent constructs queries based on the symptom. It knows which tables to query, which dimensions to split by, and how to interpret the results in context.
 
-**Automatic correlation instead of manual correlation.** Your agent follows the thread automatically by inspecting operation IDs, timestamps, resource relationships, deployment timelines across every source it queries.
-
-| Capability | What it contributes |
-|---|---|
-| [Memory and knowledge](memory.md) | Recalls what worked for similar issues; your docs explain application-specific telemetry |
-| [Run modes](run-modes.md) | Control whether your agent investigates only or also takes action |
+**Automatic correlation instead of manual correlation.** Your agent follows the thread automatically by inspecting operation IDs, timestamps, resource relationships, and deployment timelines across every source it queries.
 
 ## Before and after
 
@@ -95,7 +90,7 @@ Your agent discovers available metrics for any resource type, queries time-serie
 
 When your agent uses Azure Monitor as its incident platform, it also manages alerts directly by acknowledging and closing them during investigation.
 
-### Resource Graph and Activity Logs
+### Resource graph and activity logs
 
 Your agent uses Resource Graph and Activity Logs to discover resources and correlate changes with incidents.
 
@@ -109,10 +104,10 @@ Beyond metrics and logs, your agent has specialized capabilities that go deeper.
 
 | Category | What it does |
 |---|---|
-| **Deep diagnostics** | CPU profiling, memory analysis, latency assessment, threadpool starvation detection |
+| **Deep diagnostics** | CPU profiling, memory analysis, latency assessment, thread pool starvation detection |
 | **Connectivity checks** | TCP connectivity tests, DNS resolution, storage connectivity verification |
 | **Resource-specific diagnostics** | Container app revision management, App Service configuration checks, Function App deployment history, AKS kubectl commands, Redis diagnostics, PostgreSQL health, API Management analysis |
-| **Reliability assessment** | App Service health scoring: AlwaysOn, health checks, instance count, auto-heal configuration |
+| **Reliability assessment** | App Service health scoring: AlwaysOn, health checks, instance count, autoheal configuration |
 | **Azure CLI** | Read commands (`az ... show`, `az ... list`) for any Azure service, and write commands (`az ... update`, `az ... scale`) with approval |
 | **ARM operations** | Direct resource property inspection, app settings management, deployment slot operations |
 
@@ -155,32 +150,12 @@ The following example shows how your agent investigates an error in a container
 
 ## Get started
 
-Azure observability is built in which means that no connectors are required. Grant your agent these permissions:
-
-| Scope | Role | What it enables |
-|---|---|---|
-| Subscription | **Reader** | Resource discovery, Resource Graph, Activity Logs |
-| Subscription | **Monitoring Contributor** | Alert management: acknowledge and close Azure Monitor alerts |
-| Application Insights | **Monitoring Reader** | Traces, exceptions, dependencies via KQL |
-| Log Analytics workspace | **Log Analytics Reader** | KQL queries on workspace data |
-
-> [!NOTE]
-> If your agent uses Azure Monitor as its incident platform, the **Monitoring Contributor** role is required at the subscription level. Your agent receives this role automatically when created through the portal. This permission enables your agent to acknowledge and close alerts during investigation. Without it, your agent can still query metrics and resource health, but can't manage alert states.
-
-> [!TIP]
-> If your agent uses Azure Monitor as its incident platform and its managed identity is missing the **Monitoring Contributor** role, a warning banner appears in the chat interface. This role is required specifically for alert management which acknowledges and closes Azure Monitor alerts. Your agent can still read metrics, logs, and resource health without it.
->
-> The banner includes an **Assign Monitoring Contributor role** button that assigns the role directly. There's no need to navigate to the Azure portal. You can also dismiss the banner if you prefer to assign the role manually.
-
-## When to use external tools
-
-Azure observability covers most scenarios for applications running on Azure. You might need other tools when your data lives elsewhere.
+Azure observability works automatically when you grant your agent Reader access to your subscription during initial setup.
 
-| Scenario | Solution |
-|---|---|
-| Custom metrics in Azure Data Explorer | [Set up Kusto tools](kusto-tools.md) |
-| Logs in Datadog, Splunk, or other platforms | [Configure external observability](diagnose-observability.md) |
-| Specialized monitoring (Prometheus, Grafana) | [Configure external observability](diagnose-observability.md) |
+| Resource | What you'll learn |
+|----------|-------------------|
+| [Create and set up](create-agent.md) | Grant permissions during initial agent setup |
+| [Manage permissions](permissions.md) | Add or change resource access after setup |
 
 ## Next step
 
@@ -189,7 +164,4 @@ Azure observability covers most scenarios for applications running on Azure. You
 
 ## Related content
 
-- [Tools](tools.md)
 - [Root cause analysis](root-cause-analysis.md)
-- [External observability](diagnose-observability.md)
-- [Kusto tools](kusto-tools.md)
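
Reviewer note: the article in this diff describes automatic correlation across operation IDs, timestamps, and deployment timelines, and a small sketch may help show what that means in practice. The Python below is not the agent's implementation. The `Evidence` type, the source labels, and the sample records are all hypothetical. It only shows the general idea: group records that share an operation ID, then flag configuration changes that happened shortly before the first error.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Evidence:
    """One record pulled from a diagnostic source (hypothetical shape)."""
    source: str                    # hypothetical label, e.g. "app_insights"
    timestamp: datetime
    operation_id: Optional[str]    # None for records without a request context
    summary: str


def group_by_operation(records: List[Evidence]) -> Dict[str, List[Evidence]]:
    """Group records that share an operation ID, ordered by time."""
    groups: Dict[str, List[Evidence]] = defaultdict(list)
    for record in records:
        if record.operation_id is not None:
            groups[record.operation_id].append(record)
    return {op: sorted(items, key=lambda r: r.timestamp) for op, items in groups.items()}


def changes_before(records: List[Evidence], incident_start: datetime,
                   window: timedelta) -> List[Evidence]:
    """Return change records that fall within `window` before the incident."""
    return [
        r for r in records
        if r.source == "activity_log"
        and incident_start - window <= r.timestamp <= incident_start
    ]


# Hypothetical sample data standing in for query results.
t0 = datetime(2026, 3, 18, 10, 0)
records = [
    Evidence("activity_log", t0, None, "New revision deployed"),
    Evidence("app_insights", t0 + timedelta(minutes=4), "op-1", "HTTP 500 on /checkout"),
    Evidence("log_analytics", t0 + timedelta(minutes=4, seconds=2), "op-1",
             "Console log: connection refused"),
    Evidence("app_insights", t0 + timedelta(minutes=5), "op-2", "HTTP 500 on /cart"),
]

grouped = group_by_operation(records)
first_error = min(r.timestamp for r in records if r.source == "app_insights")
suspects = changes_before(records, first_error, timedelta(minutes=15))

for change in suspects:
    print(f"Possible trigger: {change.summary} at {change.timestamp.isoformat()}")
```

The 15-minute window is an arbitrary choice for illustration. A real investigation would likely tune it per resource type and also weigh resource relationships, which this sketch leaves out.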
