learn-pr/advocates/improve-reliability-monitoring/4-change-frame.yml: 2 additions, 2 deletions
@@ -4,7 +4,7 @@ title: "Changing the frame"
 metadata:
 title: "Changing the frame"
 description: "Changing the frame"
-ms.date: 08/18/2023
+ms.date: 04/15/2026
 author: dnblankedelman
 ms.author: dnb
 ms.topic: unit
@@ -42,4 +42,4 @@ quiz:
 explanation: 'It depends on the noticeable impact to the customers of the service. If they are experiencing a severe problem and the service does not work for them, we may need to fix them immediately.'
 - content: 'It depends'
 isCorrect: true
-explanation: 'Correct, the severity level depends on the noticeable impact to their customer.'
+explanation: 'Correct, the severity level depends on the noticeable impact to the customer.'
learn-pr/advocates/improve-reliability-monitoring/5-tools.yml: 8 additions, 8 deletions
@@ -4,7 +4,7 @@ title: "Azure monitoring tools"
 metadata:
 title: "Azure monitoring tools"
 description: "Azure monitoring tools"
-ms.date: 08/18/2023
+ms.date: 04/15/2026
 author: dnblankedelman
 ms.author: dnb
 ms.topic: unit
@@ -17,14 +17,14 @@ quiz:
 title: Check your knowledge
 questions:
 
-- content: 'Which of the following are the two major data types used in Azure Monitor?'
+- content: 'Which of the following best describes the common observability data types emphasized in Azure Monitor?'
 choices:
-- content: 'Charts and tables'
+- content: 'Charts, tables, and dashboards'
 isCorrect: false
-explanation: 'Azure monitor can display both charts and tables, but they are not data types.'
-- content: 'Metrics and logs'
+explanation: 'Azure Monitor can display charts and tables on dashboards, but those are visualizations rather than observability data types.'
+- content: 'Metrics, logs, and distributed traces'
 isCorrect: true
-explanation: 'Correct, Azure Monitor works with metrics and logs.'
-- content: 'Counters and gauges'
+explanation: 'Correct, Azure Monitor emphasizes metrics, logs, and distributed traces as core observability data types.'
+- content: 'Counters, gauges, and alerts'
 isCorrect: false
-explanation: 'Azure monitor can ingest metrics from counters and gauges, but they are not the two major data types.'
+explanation: 'Counters and gauges can produce metrics, and alerts act on data, but these are not the observability data types emphasized by Azure Monitor.'
@@ -1,10 +1,10 @@
-In order for us begin to work on monitoring for reliability, there's a predecessor step we have to take. First, we need to make sure we have a reasonable level of operational awareness.
+In order for us to begin to work on monitoring for reliability, there's a predecessor step we have to take. First, we need to make sure we have a reasonable level of operational awareness.
 
 The simplest way to say this is that in order to work towards the reliability of systems in production, we first have to have a decent understanding of those systems and how they're functioning in production.
 
 ## Collect information about the present configuration
 
-Though it may sound peculiar, in many environments the first question wee need to answer is "**What exactly is running in production?**" Our production environments these days and the paths to deploying to them are sufficiently complex that it's not uncommon to first have to do a bit of discovery first. Given a specific application, what are its component parts? What parts talk to other parts? What are the obvious (and not-so-obvious) dependencies for this application?
+Though it may sound peculiar, in many environments the first question we need to answer is "**What exactly is running in production?**" Our production environments these days and the paths to deploying to them are sufficiently complex that it's not uncommon to have to do a bit of discovery first. Given a specific application, what are its component parts? What parts talk to other parts? What are the obvious (and not-so-obvious) dependencies for this application?
 
 ## Collect information about normal and past performance
 
@@ -18,9 +18,9 @@ And finally, it's useful for us to gain some contextual knowledge around a syste
 
 You might think *oh, it's obvious who owns or cares about a particular app/service*, but in enterprise situations or other complex organizations, this can be much harder than it sounds.
 
-The sad truth is THAT we're not going to be able to make much headway on a system's reliability without a clear idea of who the stakeholders are (for reasons that will become clear later when we discuss SLIs and SLOs).
+The sad truth is that we're not going to be able to make much headway on a system's reliability without a clear idea of who the stakeholders are (for reasons that will become clear later when we discuss SLIs and SLOs).
 
-On the technical side of the context question, it's really helpful for us to pay attention to technical questions like *just how did this application get in production?* Was it deployed manually during an "epic" deployment, or was it deployed via an automated CI/CD pipeline with a great set of unit tests?
+On the technical side of the context question, it's really helpful for us to pay attention to technical questions like *just how did this application get into production?* Was it deployed manually during an "epic" deployment, or was it deployed via an automated CI/CD pipeline with a great set of unit tests?
 
 This information can have many ramifications, including how easy it will be to iterate if and when we have reliability improving updates to make. It's also possibly a useful indicator of work that we could be doing that will make a real difference.
 
@@ -30,9 +30,9 @@ Gaining operational awareness is often not easy, but we're going to look at a fe
 
 ### Application Insights
 
-The first tools we'll look at can help us with the "what's actually running?" question. As operations people, it's not unusual to be asked to work with an application that's already running in production. While ideally we'd be part of the entire lifecycle of the software, starting at the design phase, that's not always (or perhaps often) the case. When this happens, especially with more complex multitiered or microservice based applications, just being able to understand what all of the moving parts do can take effort.
+The first tools we'll look at can help us with the "what's actually running?" question. As operations people, it's not unusual to be asked to work with an application that's already running in production. While ideally we'd be part of the entire lifecycle of the software, starting at the design phase, that's not always (or perhaps often) the case. When this happens, especially with more complex multitiered or microservice-based applications, just being able to understand what all of the moving parts do can take effort.
 
-One tool that can reduce that effort—plus give us information about the application's behavior in production—is Application Insights. With minimal effort, developers can instrument their application so that it automatically sends telemetry information to collectors running in Azure. With this information, Application Insights can create a visual map of the components of the application and the communication between these components.
+One tool that can reduce that effort—plus give us information about the application's behavior in production—is Application Insights. Developers can instrument their application—ideally by using the Azure Monitor OpenTelemetry Distro, which is the recommended approach for new projects—and send telemetry to an Application Insights resource in Azure Monitor. When dependency telemetry is flowing and the application's cloud role names are configured correctly, Application Insights can create an Application map that shows the components of the application and the communication between those components.
 
 Here's an example:
 
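The updated Application Insights paragraph says the Application map depends on dependency telemetry flowing and on cloud role names being set. One rough way to check both is a query like the following, run from the Logs view of the Application Insights resource. This is a sketch, not a prescribed procedure: it assumes the classic Application Insights table and column names (`dependencies`, `timestamp`, `cloud_RoleName`, `target`, `type`), and the one-hour window is an arbitrary choice. In a Log Analytics workspace the corresponding table is `AppDependencies`, with columns such as `TimeGenerated`, `AppRoleName`, `Target`, and `DependencyType`.

```kusto
// Sketch: summarize recent outgoing dependency calls per calling role and target.
// Assumes the classic Application Insights schema (run from the resource's Logs view).
dependencies
| where timestamp > ago(1h)   // arbitrary one-hour window
| summarize calls = count() by cloud_RoleName, target, type
| order by calls desc
```

Each row roughly corresponds to a caller-to-dependency edge. If no rows come back, or `cloud_RoleName` is empty or doesn't match the component names you expect, that may indicate dependency telemetry or role naming isn't set up yet.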
@@ -44,44 +44,41 @@ In the preceding picture, you can see not only the components of the application
 
 Application Insights is a great way to gain some operational awareness for an application, but what if you want to get a view from even higher up and see all of the resources you have in play on Azure in a subscription? In the past, you'd download reports or write PowerShell to gather this information, but now there's a much easier way.
 
-Azure Resource Graph Explorer provides an interactive query environment right from the Azure portal for the data you need. It lets you run arbitrary queries that return real-time answers based on the resources currently in use. For example, if you to see all of the VMs you're currently running, you could run the following query:
+Azure Resource Graph Explorer provides an interactive query environment right from the Azure portal for the data you need. It lets you run queries against near-current inventory data for the resources in your subscriptions. For example, if you want to see all of the VMs you're currently running, you could run the following query:
 
 :::image type="content" source="../media/resource-graph-explorer.png" alt-text="Resource graph panel in Azure portal with the query of where type == microsoft.compute/virtualmachines":::
 
 and you'd get back a complete detailed list of the VMs being used in our subscription:
 
 :::image type="content" source="../media/resource-graph-explorer-results.png" alt-text="Resource graph panel in the Azure portal with results of query showing table of results.":::
 
-The query language used in this environment is Kusto Query Language (KQL). We'll be discussing it in more depth later in this module when we talk about Azure Monitor Log Analytics.
+The query language used in this environment is based on Kusto Query Language (KQL). Azure Resource Graph supports a useful subset of KQL rather than every KQL feature. We'll be discussing KQL in more depth later in this module when we talk about Azure Monitor Log Analytics.
 
-### Dashboards
-
-The most traditional operations tool for operational awareness is the venerable dashboard. Often when we think of people doing operations, we imagine them sitting in front of large monitors intensely peering into dashboards full of graphs, charts and counters. In this module, we're not going to explore how you construct, edit and use dashboards. That is largely done by pinning content from other places in the portal and then moving them around as you see fit.
+### Workbooks
 
-Instead, let's look at two dashboard features less commonly used that could be of real benefit to you. You can find these features at the top of every dashboard.
+Azure Monitor workbooks are the richest built-in visualization tool for operational awareness. They let you combine KQL queries, metrics, text, parameters, and links into interactive reports. They're especially useful when you want an end-to-end monitoring view across multiple resources or when you need to build things like reliability scorecards and SLI/SLO views with filters and drill-through.
 
-:::image type="content" source="../media/dashboard.png" alt-text="Screenshot of the Dashboard panel in the Azure portal with the Upload and Export buttons highlighted.":::
-
-The two highlighted arrows allow you to upload and export JSON representations of dashboards.
+### Dashboards
 
-First, let's start with the export functionality. If you select **Export**, then select **Download**, a JSON file that represents the current dashboard is downloaded to your computer. If you'd like, try this now by logging into the portal, choosing **Dashboard** from the product menu, and then selecting **Export** > **Download**.
+The most traditional operations tool for operational awareness is the venerable dashboard. Often when we think of people doing operations, we imagine them sitting in front of large monitors intensely peering into dashboards full of graphs, charts, and counters. In this module, we're not going to explore how you construct, edit, and use dashboards in detail. That's largely done by pinning content from other places in the portal and arranging tiles as you see fit.
 
-There are at least two things you can do with this file that you might find handy:
+Instead, let's look at a powerful idea: **dashboards as code**. Azure portal dashboards can be exported and imported as JSON files, and you can also deploy them by using ARM templates or Bicep. This means you can:
 
-- You could check this file into your source control system. This allows you to keep track of your different versions of dashboards, and also allow others to access them if they would like to use your dashboard. Some might call this "dashboards as code."
+- **Version-control dashboard definitions** by checking the exported JSON into source control. This lets you track layout and configuration changes over time and share reusable dashboards with colleagues.
 
-- You can use this file as the basis of a new dashboard. Here's a concrete example we'll revisit later in this learning path: let's say you need to show a colleague what a particular dashboard looked like for an hour during an outage that happened last week. You could publish your dashboard and ask them to go select the precise time and time period. But far easier and less error prone, you could download your dashboard set up exactly as you need and share that JSON file. If you want to highlight a second period from the same dashboard, let's say an hour in the future, it's easy to edit the JSON.
+- **Reuse and deploy dashboards across environments.** The exported JSON captures the dashboard's layout, pinned content, and configuration. It doesn't freeze the live telemetry at a point in time, which makes it a good fit for repeatable deployment rather than for capturing historical evidence.
 
-That's the export functionality. Now, let's focus on the uses for the upload functionality. Besides being able to load the version-controlled or edited files from the last section, you can use the upload functionality to make use of other people's careful work when constructing dashboards.
+- **Build dashboards from Azure Resource Graph queries.** You can construct queries that count your virtual machines, storage accounts, and databases, pin those results to a dashboard, and then export the JSON for version control or reuse in another subscription. This gives you a live inventory dashboard driven by current resource data. If you select one of the tiles on that dashboard, you can inspect and refine the underlying query that produced it.
 
-Let's look at final example for this section that nicely ties together two of the ideas from this unit. If you download this JSON file:
+If you need to show a colleague the view you used during an outage, it's usually better to share a link to the dashboard or workbook with the relevant filters and time range than to treat exported JSON as a snapshot of the data.
+:::image type="content" source="../media/dashboard.png" alt-text="Screenshot of an Azure portal dashboard showing export and sharing options.":::
 
-to your computer and then upload it to a dashboard, you should see something like this:
+### Grafana
 
-:::image type="content" source="../media/azure-inventory-dashboard.png" alt-text="Screenshot of dashboard displaying inventory of Azure resources, one resource per tile.":::
+If your team prefers Grafana-style operational dashboards, Azure offers two main paths. **Azure Monitor dashboards with Grafana** are a good fit when you're working only with Azure-native data sources. **Azure Managed Grafana** is the better option when you want broader Grafana capabilities, more advanced sharing, or non-Azure data sources.
 
-You now have a live dashboard that shows you a fairly comprehensible inventory of your resources in use in a subscription. This dashboard's data is coming from the same source as the Azure Resource Graph Explorer we looked at earlier. In fact, if you select one of tiles, you can see (and edit if desired) the exact query that'ss being run to yield the information show in that square. Excellent, no?
+> [!TIP]
+> Azure Monitor offers several visualization options. Use [Workbooks](/azure/azure-monitor/visualize/workbooks-overview) for rich interactive reports, [Azure dashboards](/azure/azure-portal/azure-portal-dashboards-create-programmatically) when you want reusable portal dashboards defined as code, and [Grafana options in Azure](/azure/azure-monitor/visualize/visualize-grafana-overview) when you want Grafana-style dashboards, including **Azure Monitor dashboards with Grafana** and Azure Managed Grafana.
 
 With that help for our operational awareness, let's begin to explore just what we'll want to monitor to assist us with improving our reliability.
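The Resource Graph query that lists virtual machines appears in this unit only as a screenshot (its alt text gives the filter `where type == microsoft.compute/virtualmachines`). A text version might look like the following sketch, written in the KQL subset that Azure Resource Graph supports. The projected columns and the sort order are illustrative choices made here, not taken from the screenshot.

```kusto
// Sketch: list virtual machines visible to Azure Resource Graph
// in the subscriptions currently in scope.
Resources
| where type =~ 'microsoft.compute/virtualmachines'   // case-insensitive match on resource type
| project name, resourceGroup, location, subscriptionId
| order by name asc
```

Dropping the `where` line and replacing the last two lines with `| summarize count() by type` returns a count per resource type, which is one way to produce the kind of inventory tiles the dashboards-as-code bullet describes.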