Commit 0e63df9

First draft of module in my fork of release branch
1 parent 1f391e4 commit 0e63df9

12 files changed

Lines changed: 821 additions & 0 deletions

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.introduction"
title: "Introduction"
metadata:
  title: "Introduction"
  description: "Module introduction."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
durationInMinutes: 3
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.logs-tracing"
title: "Diagnose errors with logs and tracing"
metadata:
  title: "Diagnose Errors with Logs and Tracing"
  description: "Learn how to use logs and tracing to isolate IIS failures."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
durationInMinutes: 8
content: |
  [!include[](includes/2-logs-tracing.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.monitor-tune-performance"
title: "Monitor and tune IIS performance"
metadata:
  title: "Monitor and Tune IIS Performance"
  description: "Learn how to monitor and tune IIS performance using performance counters."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
durationInMinutes: 8
content: |
  [!include[](includes/3-monitor-tune-performance.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.troubleshoot-practice"
title: "Build an operational troubleshooting practice"
metadata:
  title: "Build an Operational Troubleshooting Practice"
  description: "Perform troubleshooting in a repeatable fashion."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
durationInMinutes: 8
content: |
  [!include[](includes/4-troubleshoot-practice.md)]
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.knowledge-check"
title: "Knowledge check"
metadata:
  title: "Knowledge Check"
  description: "Check your knowledge."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
  module_assessment: true
durationInMinutes: 4
content: |
  Choose the best response for each question.
quiz:
  title: "Knowledge check"
  questions:
  - content: "A user reports \"Service Unavailable\" errors, but you find no matching entries in the site's IIS request logs. Which log should you check next?"
    choices:
    - content: "The Windows Security Event Log"
      isCorrect: false
      explanation: "Incorrect. The Windows Security Event Log records security-related events such as logon attempts, not HTTP request rejections."
    - content: "The HTTP.sys error log (httperr.log)"
      isCorrect: true
      explanation: "Correct. HTTP.sys rejects requests before they reach the worker process when the app pool is offline or the queue is full. These rejections appear only in httperr.log, not in IIS request logs."
    - content: "The IIS configuration audit log"
      isCorrect: false
      explanation: "Incorrect. The IIS configuration audit log tracks changes to IIS configuration, not request-level errors."
    - content: "The Failed Request Tracing output"
      isCorrect: false
      explanation: "Incorrect. Failed Request Tracing captures requests that reach the IIS pipeline. Requests rejected by HTTP.sys before reaching IIS don't appear in Failed Request Tracing output."
  - content: "You need to capture detailed diagnostic data for requests that complete successfully (HTTP 200) but take longer than 30 seconds. Which Failed Request Tracing rule type should you configure?"
    choices:
    - content: "A rule triggered by status code 200"
      isCorrect: false
      explanation: "Incorrect. A status-code rule for 200 would trace all successful requests, which is far too many in production."
    - content: "A rule triggered by time-taken exceeding 30 seconds"
      isCorrect: true
      explanation: "Correct. A time-taken threshold captures any request exceeding the specified duration regardless of status code."
    - content: "A rule triggered by the GENERAL_REQUEST_END event"
      isCorrect: false
      explanation: "Incorrect. The GENERAL_REQUEST_END event fires for every completed request, not just slow ones."
    - content: "A rule triggered by verbosity level \"Verbose\""
      isCorrect: false
      explanation: "Incorrect. Verbosity level controls the detail of trace output, not the condition that triggers tracing."
  - content: "Performance Monitor shows Requests Queued (ASP.NET) steadily increasing while Requests/sec (Web Service) is flat. What does this most likely indicate?"
    choices:
    - content: "The server is receiving fewer requests than usual"
      isCorrect: false
      explanation: "Incorrect. If the server were receiving fewer requests, the queue wouldn't be growing."
    - content: "The HTTP.sys request queue limit is too low"
      isCorrect: false
      explanation: "Incorrect. A low queue limit would cause rejected requests rather than a steadily growing queue."
    - content: "The worker process can't process requests fast enough to keep up with incoming traffic"
      isCorrect: true
      explanation: "Correct. Flat Requests/sec with rising Requests Queued means the worker process has hit a throughput ceiling. New requests are arriving faster than completed requests are leaving the pipeline."
    - content: "IIS output caching is invalidating entries too frequently"
      isCorrect: false
      explanation: "Incorrect. Cache invalidation issues would show increased processing time, not a specific pattern of flat throughput with a growing queue."
  - content: "An application has a known slow memory leak. Which application pool recycling trigger is most appropriate to prevent memory exhaustion without causing unnecessary restarts?"
    choices:
    - content: "Regular time interval set to 4 hours"
      isCorrect: false
      explanation: "Incorrect. Time-based recycling may fire too frequently, wasting resources, or not frequently enough, failing to prevent memory exhaustion."
    - content: "Request count limit set to 100,000"
      isCorrect: false
      explanation: "Incorrect. Request-count-based recycling doesn't correlate with actual memory consumption and may trigger unnecessarily or too late."
    - content: "Private memory limit set to a threshold below the problem point"
      isCorrect: true
      explanation: "Correct. A private memory limit triggers recycling only when the leak has consumed significant memory, directly addressing the problem without unnecessary restarts."
    - content: "Disable all recycling and restart the server nightly"
      isCorrect: false
      explanation: "Incorrect. Disabling recycling and relying on nightly restarts risks memory exhaustion during the day and is disruptive."
  - content: "After deploying a new application version, the application pool keeps stopping. Event Viewer shows Event ID 5002 from source WAS. Which two actions should you take first?"
    choices:
    - content: "Check the Application Event Log for the exception causing the worker process crash and check httperr.log for the reason code associated with the 503 responses"
      isCorrect: true
      explanation: "Correct. Event ID 5002 means rapid-fail protection triggered due to repeated crashes. The priority is finding the crash cause in the Application Event Log and confirming the rejection mechanism in httperr.log."
    - content: "Increase the rapid-fail protection failure count to 20 and disable rapid-fail protection to keep the app pool running"
      isCorrect: false
      explanation: "Incorrect. Increasing the failure count or disabling rapid-fail protection just allows the crash loop to continue longer without addressing the root cause."
    - content: "Increase the rapid-fail protection failure count to 20 and check httperr.log for the reason code associated with the 503 responses"
      isCorrect: false
      explanation: "Incorrect. While checking httperr.log is appropriate, increasing the rapid-fail protection failure count doesn't help diagnose the root cause."
    - content: "Check the Application Event Log for the exception causing the worker process crash and disable rapid-fail protection to keep the app pool running"
      isCorrect: false
      explanation: "Incorrect. While checking the Application Event Log is appropriate, disabling rapid-fail protection masks the problem rather than fixing it."
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: "learn.devrel.troubleshoot-optimize-internet-information-services-performance.summary"
title: "Summary"
metadata:
  title: "Summary"
  description: "Module summary."
  ms.date: "04/12/2026"
  author: "Orin-Thomas"
  ms.author: "orthomas"
  ms.topic: "unit"
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
Imagine that your web application passed every test in staging. Within hours of deployment to production, users report intermittent "Service Unavailable" errors and page loads stretching past 10 seconds. The application code hasn't changed. The server shows no obvious hardware failure. Something between the client and the application is broken, and it's your job to find it.

This scenario is routine for IIS administrators. The challenge isn't that IIS lacks diagnostic data. Instead, the data is spread across multiple log streams, performance counters, and event sources. Knowing *which* source to check *first* is the skill that separates a five-minute fix from an hours-long investigation.

## Why troubleshooting and performance are a single discipline

In IIS, errors and performance problems share the same diagnostic pipeline. A 503 "Service Unavailable" response might be caused by a crashed worker process, or by a healthy worker process that can't drain its request queue fast enough. A slow page might be an application code problem, or a misconfigured recycling schedule that restarts the worker process under load. The tools and workflow are the same: collect logs, read traces, check counters, adjust configuration.

This module treats troubleshooting and performance tuning as a single workflow: **symptoms → data collection → root cause → resolution → verification**.

This module teaches you about:

- **Failed Request Tracing (FREB):** Captures the full request pipeline with per-module timing, letting you pinpoint exactly where a request stalls or fails.
- **HTTP.sys error logs:** Show requests that the kernel-mode HTTP driver rejected *before* they ever reached a worker process.
- **Performance Monitor counters:** Provide quantitative baselines and bottleneck identification for CPU, memory, disk, and request throughput.
- **Tuning:** Covers application pool recycling, output caching, compression, and concurrency settings that directly impact production performance.
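
To make the HTTP.sys error log item more concrete, here's a minimal Python sketch that tallies reason phrases from `httperr.log` lines, which is one way to tell a stopped application pool (`AppOffline`) apart from a full request queue (`QueueFull`). It assumes the log carries a `#Fields:` directive that names `sc-status` and `s-reason`. The `tally_reasons` helper and the sample lines are hypothetical illustrations, not part of this module or of any IIS tooling.

```python
from collections import Counter


def tally_reasons(lines):
    """Count (status, reason phrase) pairs found in httperr.log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive declares the column layout for the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that don't match the declared layout
        row = dict(zip(fields, values))
        counts[(row.get("sc-status", "-"), row.get("s-reason", "-"))] += 1
    return counts


# Hypothetical sample data; not taken from a real server.
SAMPLE = """\
#Software: Microsoft HTTP API 2.0
#Version: 1.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2026-04-12 10:01:07 203.0.113.5 50312 10.0.0.4 80 HTTP/1.1 GET /orders 503 1 AppOffline ShopPool
2026-04-12 10:01:09 203.0.113.9 50388 10.0.0.4 80 HTTP/1.1 GET /orders 503 1 QueueFull ShopPool
2026-04-12 10:01:10 203.0.113.9 50391 10.0.0.4 80 HTTP/1.1 GET /cart 503 1 QueueFull ShopPool
""".splitlines()

reason_counts = tally_reasons(SAMPLE)
for (status, reason), count in sorted(reason_counts.items()):
    print(status, reason, count)
```

Reading the column layout from the `#Fields:` directive, rather than hard-coding positions, keeps the sketch tolerant of logs whose field sets differ between Windows versions.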

## Diagnostic toolbox

The following table summarizes the tools you use throughout this module. Refer back to it as a quick-reference during investigations.

| Tool | What it tells you | When to use it |
|------|-------------------|----------------|
| IIS request logs (W3C) | Per-request status codes, time-taken, client IPs, URI stems | First stop for any reported error or slowness |
| HTTP.sys error log (`httperr.log`) | Requests rejected before reaching the IIS worker process | 503s, connection resets, queue-full conditions |
| Failed Request Tracing (FREB) | Full request pipeline trace with per-module timing | Intermittent errors or slow requests that don't reproduce on demand |
| Event Viewer (System + Application) | App pool crashes, worker process recycling events, configuration errors | Startup failures, unexpected recycling |
| Performance Monitor (PerfMon) | Real-time and logged counters for CPU, memory, requests, queues | Sustained load problems, capacity planning, baseline capture |
| Resource Monitor / Task Manager | Live process-level CPU, memory, disk, and network consumption | Rapid triage to determine whether IIS is even the bottleneck |
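
As an illustration of the "first stop" row for IIS request logs, the following Python sketch scans W3C log lines for server errors and slow requests. It assumes the site logs the `cs-uri-stem`, `sc-status`, and `time-taken` fields (with `time-taken` in milliseconds) and reads the layout from the `#Fields:` directive. The `find_problem_requests` helper, the 10-second threshold, and the sample lines are hypothetical and chosen only to mirror the scenario described earlier.

```python
def find_problem_requests(lines, slow_ms=10_000):
    """Yield (uri, status, time_taken_ms) for 5xx responses or slow requests."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive declares which fields the site is logging.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that don't match the declared layout
        row = dict(zip(fields, values))
        try:
            status = int(row["sc-status"])
            time_taken = int(row["time-taken"])
        except (KeyError, ValueError):
            continue  # required fields are missing or not numeric
        if status >= 500 or time_taken >= slow_ms:
            yield row.get("cs-uri-stem", "-"), status, time_taken


# Hypothetical sample data; a real log usually carries more fields.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2026-04-12 10:00:01 GET /home 200 120
2026-04-12 10:00:05 GET /orders 200 12450
2026-04-12 10:00:09 POST /checkout 500 310
""".splitlines()

problems = list(find_problem_requests(SAMPLE_LOG))
for uri, status, time_taken in problems:
    print(uri, status, time_taken)
```

A request that returns 200 but exceeds the threshold is reported alongside genuine errors, which reflects the point that errors and slowness are investigated through the same data.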
