Commit b34f444

Merge pull request #54272 from MicrosoftDocs/main
Auto Publish – main to live - 2026-04-16 17:00 UTC
2 parents 8b07457 + b3b3e24 commit b34f444

18 files changed

Lines changed: 350 additions & 358 deletions

learn-pr/advocates/improve-reliability-incidents/1-introduction.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: "Introduction"
 metadata:
   title: "Introduction"
   description: "Introduction"
-  ms.date: 02/28/2024
+  ms.date: 04/16/2026
   author: dnblankedelman
   ms.author: dnb
   ms.topic: unit
Lines changed: 46 additions & 46 deletions
@@ -1,46 +1,46 @@
-### YamlMime:ModuleUnit
-uid: learn.improve-reliability-incidents.2-importance
-title: "Importance of incident response"
-metadata:
-  title: "Importance of incident response"
-  description: "Importance of incident response"
-  ms.date: 02/28/2024
-  author: dnblankedelman
-  ms.author: dnb
-  ms.topic: unit
-  ms.custom: team=cloud_advocates
-  ms.contributors: dnb-06092020
-durationInMinutes: 3
-content: |
-  [!include[](includes/2-importance.md)]
-quiz:
-  title: Check your knowledge
-  questions:
-
-  - content: 'Which of the following is a goal for effective incident response?'
-    choices:
-    - content: 'Be able to react with caution'
-      isCorrect: false
-      explanation: "While caution is important (so we don't break more things), it is not a goal for effective incident response."
-    - content: 'Be able to respond with urgency'
-      isCorrect: true
-      explanation: 'Correct, we want to respond, not just react, with urgency.'
-    - content: 'Be able to act with deliberation'
-      isCorrect: false
-      explanation: 'While caution is important (so we try the right things), it is not a goal for effective incident response.'
-
-  - content: 'How quickly can engineering teams that are classified as “elite or high performers” generally detect, respond, and remediate service disruptions?'
-    choices:
-    - content: 'in less than 1 hour'
-      isCorrect: true
-      explanation: 'Correct.'
-    - content: 'in less than 4 hours'
-      isCorrect: false
-      explanation: 'High performers can handle an incident in less than an hour according to the _State of DevOps_ report.'
-    - content: 'in less than 24 hours'
-      isCorrect: false
-      explanation: 'This is the definition of a "medium" performer according to the _State of DevOps_ report.'
-    - content: 'in less than 1 week or a month'
-      isCorrect: false
-      explanation: 'This is the definition of a "low" performer according to the _State of DevOps_ report.'
-
+### YamlMime:ModuleUnit
+uid: learn.improve-reliability-incidents.2-importance
+title: "Importance of incident response"
+metadata:
+  title: "Importance of incident response"
+  description: "Importance of incident response"
+  ms.date: 04/16/2026
+  author: dnblankedelman
+  ms.author: dnb
+  ms.topic: unit
+  ms.custom: team=cloud_advocates
+  ms.contributors: dnb-06092020
+durationInMinutes: 3
+content: |
+  [!include[](includes/2-importance.md)]
+quiz:
+  title: Check your knowledge
+  questions:
+
+  - content: "Which of the following is a goal for effective incident response?"
+    choices:
+    - content: "Be able to react with caution"
+      isCorrect: false
+      explanation: "While caution is important (so we don't break more things), it isn't a goal for effective incident response."
+    - content: "Be able to respond with urgency"
+      isCorrect: true
+      explanation: "Correct, we want to respond, not just react, with urgency."
+    - content: "Be able to act with deliberation"
+      isCorrect: false
+      explanation: "While caution is important (so we try the right things), it isn't a goal for effective incident response."
+
+  - content: "Which DORA metric most directly measures how quickly a team recovers from a failed deployment?"
+    choices:
+    - content: "Deployment frequency"
+      isCorrect: false
+      explanation: "Deployment frequency measures how often code is released, not how quickly a team recovers from a failure."
+    - content: "Lead time for changes"
+      isCorrect: false
+      explanation: "Lead time measures how long it takes for a change to move from commit to production."
+    - content: "Failed deployment recovery time"
+      isCorrect: true
+      explanation: "Correct. DORA uses failed deployment recovery time to measure how quickly a team recovers from a deployment that causes a production issue."
+    - content: "Change fail rate"
+      isCorrect: false
+      explanation: "Change fail rate measures how often deployments cause problems, not how long recovery takes."
+
Lines changed: 52 additions & 52 deletions
@@ -1,52 +1,52 @@
-### YamlMime:ModuleUnit
-uid: learn.improve-reliability-incidents.3-lifecycle
-title: "Characteristics and lifecycle of an incident"
-metadata:
-  title: "Characteristics and lifecycle of an incident"
-  description: "Characteristics and lifecycle of an incident"
-  ms.date: 02/28/2024
-  author: dnblankedelman
-  ms.author: dnb
-  ms.topic: unit
-  ms.custom: team=cloud_advocates
-  ms.contributors: dnb-06092020
-durationInMinutes: 4
-content: |
-  [!include[](includes/3-lifecycle.md)]
-quiz:
-  title: Check your knowledge
-  questions:
-
-  - content: 'Which of these can be considered the "pulse" of your system?'
-    choices:
-    - content: 'Monitoring'
-      isCorrect: false
-      explanation: 'Monitoring is a way we can check and report on a system.'
-    - content: 'A service ticketing system'
-      isCorrect: false
-      explanation: 'A ticket system might be a useful place to keep details about incidents, but does not offer a pulse.'
-    - content: 'Incidents'
-      isCorrect: true
-      explanation: 'Correct, incidents do provide a pulse for you.'
-
-  - content: 'Which of these is not a phase of an incident?'
-    choices:
-    - content: 'Detection'
-      isCorrect: false
-      explanation: 'Detection is the first phase of an incident.'
-    - content: 'Response'
-      isCorrect: false
-      explanation: 'Response is the second phase of an incident.'
-    - content: 'Communication'
-      isCorrect: true
-      explanation: 'Correct, communication is important, but not a phase of an incident.'
-    - content: 'Remediation'
-      isCorrect: false
-      explanation: 'Remediation is the third phase of an incident.'
-    - content: 'Analysis'
-      isCorrect: false
-      explanation: 'Analysis is the fourth phase of an incident.'
-    - content: 'Readiness'
-      isCorrect: false
-      explanation: 'Readiness is the fifth and final phase of an incident.'
-
+### YamlMime:ModuleUnit
+uid: learn.improve-reliability-incidents.3-lifecycle
+title: "Characteristics and lifecycle of an incident"
+metadata:
+  title: "Characteristics and lifecycle of an incident"
+  description: "Characteristics and lifecycle of an incident"
+  ms.date: 04/16/2026
+  author: dnblankedelman
+  ms.author: dnb
+  ms.topic: unit
+  ms.custom: team=cloud_advocates
+  ms.contributors: dnb-06092020
+durationInMinutes: 4
+content: |
+  [!include[](includes/3-lifecycle.md)]
+quiz:
+  title: Check your knowledge
+  questions:
+
+  - content: "Which of these can be considered the \"pulse\" of your system?"
+    choices:
+    - content: "Monitoring"
+      isCorrect: false
+      explanation: "Monitoring is a way we can check and report on a system."
+    - content: "A service ticketing system"
+      isCorrect: false
+      explanation: "A ticket system might be a useful place to keep details about incidents, but doesn't offer a pulse."
+    - content: "Incidents"
+      isCorrect: true
+      explanation: "Correct, incidents do provide a pulse for you."
+
+  - content: "Which of these isn't a phase of an incident?"
+    choices:
+    - content: "Detection"
+      isCorrect: false
+      explanation: "Detection is the first phase of an incident."
+    - content: "Response"
+      isCorrect: false
+      explanation: "Response is the second phase of an incident."
+    - content: "Communication"
+      isCorrect: true
+      explanation: "Correct, communication is important, but not a phase of an incident."
+    - content: "Remediation"
+      isCorrect: false
+      explanation: "Remediation is the third phase of an incident."
+    - content: "Analysis"
+      isCorrect: false
+      explanation: "Analysis is the fourth phase of an incident."
+    - content: "Readiness"
+      isCorrect: false
+      explanation: "Readiness is the fifth and final phase of an incident."
+
Lines changed: 59 additions & 59 deletions
@@ -1,59 +1,59 @@
-### YamlMime:ModuleUnit
-uid: learn.improve-reliability-incidents.4-foundations
-title: "Foundations of incident response"
-metadata:
-  title: "Foundations of incident response"
-  description: "Foundations of incident response"
-  ms.date: 02/28/2024
-  author: dnblankedelman
-  ms.author: dnb
-  ms.topic: unit
-  ms.custom: team=cloud_advocates
-  ms.contributors: dnb-06092020
-durationInMinutes: 6
-content: |
-  [!include[](includes/4-foundations.md)]
-quiz:
-  title: Check your knowledge
-  questions:
-
-  - content: 'Which of these things is a pillar of incident response?'
-    choices:
-    - content: 'Rosters'
-      isCorrect: false
-      explanation: 'Rosters are one of the pillars, but not the only one.'
-    - content: 'Roles'
-      isCorrect: false
-      explanation: 'Roles are one of the pillars, but not the only one.'
-    - content: 'Rotations'
-      isCorrect: false
-      explanation: 'Rotations are one of the pillars, but not the only one.'
-    - content: 'All of these'
-      isCorrect: true
-      explanation: 'Correct, all of these are pillars of effective incident response.'
-
-
-  - content: 'What does scribe role do as part of incident response?'
-    choices:
-    - content: 'Writes up the post-incident review'
-      isCorrect: false
-      explanation: 'Though they might do this later, this role has a different main purpose.'
-    - content: 'Creates a verbatim transcript for a call bridge'
-      isCorrect: false
-      explanation: 'The scribe captures all the data possible, not just what team members are doing but also what they’re saying and even what they’re feeling or experiencing.'
-    - content: 'Keeps track of the metrics in the monitoring system that are important'
-      isCorrect: false
-      explanation: 'The scribe has a different role during incident response.'
-    - content: 'Documents the conversation around incident in as much detail as possible'
-      isCorrect: true
-      explanation: 'Correct.'
-
-  - content: 'Do you need all of the roles mentioned in this unit to do successful incident response?'
-    choices:
-    - content: 'Yes'
-      isCorrect: false
-      explanation: 'Especially in smaller organizations or during more minor incidents, not all roles will be needed.'
-    - content: 'No'
-      isCorrect: true
-      explanation: 'Correct, especially in smaller organizations or during more minor incidents, not all roles will be needed.'
-
+### YamlMime:ModuleUnit
+uid: learn.improve-reliability-incidents.4-foundations
+title: "Foundations of incident response"
+metadata:
+  title: "Foundations of incident response"
+  description: "Foundations of incident response"
+  ms.date: 04/16/2026
+  author: dnblankedelman
+  ms.author: dnb
+  ms.topic: unit
+  ms.custom: team=cloud_advocates
+  ms.contributors: dnb-06092020
+durationInMinutes: 6
+content: |
+  [!include[](includes/4-foundations.md)]
+quiz:
+  title: Check your knowledge
+  questions:
+
+  - content: "Which of these things is a pillar of incident response?"
+    choices:
+    - content: "Rosters"
+      isCorrect: false
+      explanation: "Rosters are one of the pillars, but not the only one."
+    - content: "Roles"
+      isCorrect: false
+      explanation: "Roles are one of the pillars, but not the only one."
+    - content: "Rotations"
+      isCorrect: false
+      explanation: "Rotations are one of the pillars, but not the only one."
+    - content: "All of these"
+      isCorrect: true
+      explanation: "Correct, all of these are pillars of effective incident response."
+
+
+  - content: "What does the scribe role do as part of incident response?"
+    choices:
+    - content: "Writes up the post-incident review"
+      isCorrect: false
+      explanation: "Though they might do this later, this role has a different main purpose."
+    - content: "Creates a verbatim transcript for a call bridge"
+      isCorrect: false
+      explanation: "The scribe captures all the data possible, not just what team members are doing but also what they\u2019re saying and even what they\u2019re feeling or experiencing."
+    - content: "Keeps track of the metrics in the monitoring system that are important"
+      isCorrect: false
+      explanation: "The scribe has a different role during incident response."
+    - content: "Documents the conversation around the incident in as much detail as possible"
+      isCorrect: true
+      explanation: "Correct."
+
+  - content: "Do you need all of the roles mentioned in this unit to do successful incident response?"
+    choices:
+    - content: "Yes"
+      isCorrect: false
+      explanation: "Especially in smaller organizations or during more minor incidents, not all roles are needed."
+    - content: "No"
+      isCorrect: true
+      explanation: "Correct, especially in smaller organizations or during more minor incidents, not all roles are needed."
+
