
Commit d64f9fd

committed
new draft info
1 parent f134a2a commit d64f9fd

1 file changed

Lines changed: 168 additions & 37 deletions


support/sql/database-engine/agent/job-failed-error-258.md

@@ -12,7 +12,7 @@ This article provides troubleshooting guidance for an issue where SQL Agent jobs
1212

1313
## Symptoms
1414

15-
The SQL Agent service runs, but scheduled SQL Agent jobs don't execute. The SQL Server and Agent logs show network and authentication timeouts as well as failed sign-ins.
15+
The [SQL Agent service](/ssms/agent/sql-server-agent#sql-server-agent-components) runs, but scheduled SQL Agent jobs don't execute. The SQL Server and Agent logs show network and authentication timeouts as well as failed sign-ins.
1616

1717
The following example shows the error message that's added to the logs:
1818

@@ -27,36 +27,59 @@ Logon to server '<ServerName>' failed (ConnLogJobHistory)
2727

2828
This issue can be caused by any of the following underlying problems:
2929

30-
- Blocking on `msdb` system tables used by Agent, which prevents job metadata reads and writes.
30+
- Blocking on [msdb](/sql/relational-databases/databases/msdb-database) system tables used by Agent, which prevents job metadata reads and writes.
3131
- Example system tables: `dbo.sysjobs`, `dbo.sysjobschedules`, and `dbo.sysjobsteps`.
3232
- Hangs inside important SQL Server Agent threads or other process-level problems.
3333
- Worker thread exhaustion in SQL Server (no workers available), making the Agent unable to connect or process schedules.
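
To gauge whether worker thread exhaustion is a plausible cause, the configured worker limit can be compared with the workers currently in use. The following query is a minimal sketch rather than part of the original article. It assumes the caller has `VIEW SERVER STATE` permission, and the column aliases are illustrative only.

```tsql
-- Sketch: compare the configured worker limit with current worker usage.
-- Assumes VIEW SERVER STATE permission; aliases are illustrative.
SELECT
    si.max_workers_count,                            -- configured upper limit for worker threads
    SUM(s.current_workers_count) AS current_workers, -- workers currently associated with schedulers
    SUM(s.active_workers_count)  AS active_workers,  -- workers currently assigned a task
    SUM(s.work_queue_count)      AS queued_tasks     -- tasks waiting for a worker
FROM sys.dm_os_schedulers AS s
CROSS JOIN sys.dm_os_sys_info AS si
WHERE s.status = 'VISIBLE ONLINE'
GROUP BY si.max_workers_count;
```

If `current_workers` is close to `max_workers_count` and `queued_tasks` stays above zero, new connections, including those from the Agent, might be waiting for a worker.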
3434

3535
## Solution
3636

37-
1. Confirm that the SQL Server Agent service is running by using the following PowerShell command:
37+
1. Confirm that the SQL Server Agent service is running by using one of the following PowerShell commands:
3838

39-
```powershell
40-
Get-Service -Name "SQLSERVERAGENT"
41-
```
39+
1. For default SQL instances:
40+
41+
```powershell
42+
Get-Service -Name "SQLSERVERAGENT"
43+
```
44+
45+
1. For named SQL instances:
46+
47+
```powershell
48+
Get-Service -Name 'SQLAgent$<InstanceName>'
49+
```
50+
51+
1. If the SQL Server Agent service isn't running, start it by using one of the following commands:
52+
53+
1. For default SQL instances:
54+
55+
```powershell
56+
Start-Service -Name "SQLSERVERAGENT"
57+
```
58+
59+
1. For named SQL instances:
4260
43-
1. If the SQL Server Agent service isn't running, start it.
61+
```powershell
62+
Start-Service -Name 'SQLAgent$<InstanceName>'
63+
```
64+
65+
1. If jobs continue to fail after starting the SQL Server Agent service, continue to the next step. If jobs are completing successfully, the issue is resolved and no further action is needed.
4466
1. Check the jobs and schedules in `msdb` by opening [SQL Server Management Studio (SSMS)](/ssms/install/install) and running the following query:
4567
4668
```tsql
4769
USE msdb;
4870
GO
49-
5071
-- List enabled jobs
51-
SELECT name, enabled, description FROM msdb.dbo.sysjobs WHERE enabled = 1;
72+
SELECT name, enabled, description
73+
FROM msdb.dbo.sysjobs
74+
WHERE enabled = 1;
5275
GO
53-
54-
-- Show job schedules and next run
55-
SELECT s.name AS ScheduleName,
56-
j.name AS JobName,
57-
s.enabled AS ScheduleEnabled,
58-
s.active_start_date,
59-
s.active_end_time
76+
-- List schedules and next run information
77+
SELECT
78+
s.name AS ScheduleName,
79+
j.name AS JobName,
80+
s.enabled AS ScheduleEnabled,
81+
s.active_start_date,
82+
s.active_end_time
6083
FROM msdb.dbo.sysjobs j
6184
JOIN msdb.dbo.sysjobschedules js ON j.job_id = js.job_id
6285
JOIN msdb.dbo.sysschedules s ON js.schedule_id = s.schedule_id
@@ -66,46 +89,154 @@ This issue can be caused by any of the following underlying problems:
6689
6790
Review the query output to confirm that the expected jobs and schedules are enabled. For any enabled job that has failed, investigate the job history and job-step output to identify and fix the underlying issue.
6891
69-
1. Detect blocking on `msdb` Agent system tables by running the following query in SSMS:
92+
1. Detect blocking sessions on `msdb` Agent system tables by running the following query in SSMS:
7093
7194
```tsql
7295
USE msdb;
7396
GO
74-
75-
SELECT session_id, blocking_session_id, wait_type, wait_duration_ms, resource_description
97+
SELECT
98+
session_id,
99+
blocking_session_id,
100+
wait_type,
101+
wait_duration_ms,
102+
resource_description
76103
FROM sys.dm_os_waiting_tasks
77104
WHERE resource_description LIKE '%sysjobs%'
78105
OR resource_description LIKE '%sysjobschedules%'
79106
OR resource_description LIKE '%sysjobsteps%';
80-
GO
107+
GO
81108
```
82109
83-
1. If blocking sessions are found, investigate the blocking query using `sys.dm_exec_requests` and `sys.dm_exec_sql_text`. Then, resolve or kill the blocking session.
84-
1. Check the `system_health` Extended Events session for any worker, thread, or resource issues by running the following query in SSMS:
110+
1. To identify the query associated with a blocking session, run the following query in SSMS:
111+
112+
```tsql
113+
SELECT
114+
wt.session_id,
115+
wt.blocking_session_id,
116+
wt.wait_type,
117+
wt.wait_duration_ms,
118+
wt.resource_description,
119+
er.status,
120+
er.command,
121+
er.cpu_time,
122+
er.total_elapsed_time,
123+
txt.text AS sql_text
124+
FROM sys.dm_os_waiting_tasks wt
125+
LEFT JOIN sys.dm_exec_requests er
126+
ON wt.blocking_session_id = er.session_id
127+
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) AS txt
128+
WHERE wt.resource_description LIKE '%sysjobs%'
129+
OR wt.resource_description LIKE '%sysjobschedules%'
130+
OR wt.resource_description LIKE '%sysjobsteps%';
131+
```
132+
133+
1. Resolve or terminate any blocking sessions that you identified in the previous step. To terminate a session, run the following query in SSMS:
85134
86135
```tsql
87-
SELECT CAST(xet.target_data as xml) AS target_data
88-
FROM sys.dm_xe_session_targets xet
89-
JOIN sys.dm_xe_sessions xe
90-
ON xe.address = xet.event_session_address
91-
WHERE xe.name = 'system_health'
136+
KILL <Blocking_Session_ID>;
92137
```
93138
94-
Inspect the query results for `QUERY_PROCESSING`, `RESOURCE`, and `SYSTEM` components. Look for thread exhaustion, memory pressure, or CPU issues. If you identify any issues resolve them by following the guidance provided in [Troubleshooting SQL Agent Issues](/to-do). <!--Fill in this link one the doc is published -->
95-
139+
Once all blocking sessions are resolved or terminated, proceed to the next step.
96140
97-
1. If you can't resolve blocking, hangs, or worker exhaustion, restart the SQL Server Agent by running the following commands in PowerShell:
141+
1. Check for any worker, thread, or resource health issues by running the following query in SSMS:
98142
99-
```powershell
100-
Restart-Service -Name "SQLSERVERAGENT" -Force
101-
net stop "SQL Server Agent (MSSQLSERVER)"
102-
net start "SQL Server Agent (MSSQLSERVER)"
143+
```tsql
144+
/* ============================================================
145+
HEALTH CHECK (Worker, CPU, Memory)
146+
============================================================ */
147+
148+
SELECT
149+
Section,
150+
Metric,
151+
Value,
152+
ExtraInfo
153+
FROM (
154+
155+
/* ===============================
156+
WORKER THREAD STATUS
157+
=============================== */
158+
SELECT
159+
CAST('WORKER THREAD STATUS' AS VARCHAR(MAX)) AS Section,
160+
CAST(CONCAT('Scheduler ', scheduler_id) AS VARCHAR(MAX)) AS Metric,
161+
CAST(CONCAT('Workers: ', active_workers_count, '/', current_workers_count) AS VARCHAR(MAX)) AS Value,
162+
CAST(CONCAT('WorkQueue=', work_queue_count, ', Idle=', is_idle) AS VARCHAR(MAX)) AS ExtraInfo
163+
FROM sys.dm_os_schedulers
164+
WHERE scheduler_id < 255
165+
166+
UNION ALL
167+
168+
/* ===============================
169+
CPU PRESSURE
170+
=============================== */
171+
SELECT
172+
CAST('CPU PRESSURE' AS VARCHAR(MAX)) AS Section,
173+
CAST(CONCAT('Scheduler ', scheduler_id) AS VARCHAR(MAX)) AS Metric,
174+
CAST(CONCAT('RunnableTasks=', runnable_tasks_count) AS VARCHAR(MAX)) AS Value,
175+
CAST(CONCAT('PendingIO=', pending_disk_io_count) AS VARCHAR(MAX)) AS ExtraInfo
176+
FROM sys.dm_os_schedulers
177+
WHERE scheduler_id < 255
178+
179+
UNION ALL
180+
181+
/* ===============================
182+
MEMORY STATUS (System)
183+
=============================== */
184+
SELECT
185+
CAST('MEMORY STATUS' AS VARCHAR(MAX)) AS Section,
186+
CAST('SystemMemoryState' AS VARCHAR(MAX)) AS Metric,
187+
CAST(system_memory_state_desc AS VARCHAR(MAX)) AS Value,
188+
CAST(CONCAT('TotalMB=', total_physical_memory_kb/1024,
189+
', AvailableMB=', available_physical_memory_kb/1024) AS VARCHAR(MAX)) AS ExtraInfo
190+
FROM sys.dm_os_sys_memory
191+
192+
UNION ALL
193+
194+
/* ===============================
195+
PAGE LIFE EXPECTANCY
196+
=============================== */
197+
SELECT
198+
CAST('PAGE LIFE EXPECTANCY' AS VARCHAR(MAX)) AS Section,
199+
CAST('PLE' AS VARCHAR(MAX)) AS Metric,
200+
CAST(cntr_value AS VARCHAR(MAX)) AS Value,
201+
CAST(NULL AS VARCHAR(MAX)) AS ExtraInfo
202+
FROM sys.dm_os_performance_counters
203+
WHERE counter_name = 'Page life expectancy'
204+
AND object_name LIKE '%Buffer Manager%'
205+
206+
) AS x
207+
ORDER BY Section, Metric;
103208
```
104209
105-
> [!IMPORTANT]
106-
> Restarting the SQL Server Agent interrupts any currently running jobs.
210+
Review the output of the health check query for symptoms of any of the following issues:
211+
212+
1. Worker thread pressure:
213+
1. Worker exhaustion, for example `Workers: 512/512`.
214+
1. `WorkQueue` is greater than zero, indicating that tasks are waiting and the system is overloaded.
215+
1. CPU pressure:
216+
1. `RunnableTasks` is consistently greater than zero, indicating that tasks are waiting for CPU time.
217+
1. Memory pressure:
218+
1. `SystemMemoryState` is `Available physical memory is low`, indicating that the overall system is low on memory.
219+
1. A low value for `AvailableMB`, indicating high memory usage on the server.
220+
1. A `PLE` value less than 300, indicating high memory churn.
221+
1. If you identified any worker, CPU, or memory issues in the previous step, reduce your current workload to resolve them. If no worker, CPU, or memory issues were identified, proceed to the next step.
222+
1. Restart the SQL Server Agent by running one of the following PowerShell commands:
223+
224+
> [!IMPORTANT]
225+
> Restarting the SQL Server Agent interrupts any currently running jobs.
226+
227+
1. For default SQL instances:
228+
229+
```powershell
230+
Restart-Service -Name "SQLSERVERAGENT"
231+
```
232+
233+
1. For named SQL instances:
234+
235+
```powershell
236+
Restart-Service -Name 'SQLAgent$<InstanceName>'
237+
```
107238
108-
After the SQL Server Agent restarts, verify that jobs are now being executed by using the [Job Activity Monitor](/ssms/agent/monitor-job-activity#job-activity-monitor).
239+
1. After the SQL Server Agent restarts, verify that jobs are now being executed by using the [Job Activity Monitor](/ssms/agent/monitor-job-activity#job-activity-monitor).
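
As an alternative to the Job Activity Monitor, recent job outcomes can also be read from the job history in `msdb`. The following query is a minimal sketch under stated assumptions, not part of the original article. It assumes that job history is retained in `msdb.dbo.sysjobhistory`, the `TOP (20)` limit is arbitrary, and the column aliases are illustrative.

```tsql
USE msdb;
GO
-- Sketch: show the most recent job-level outcomes.
-- Rows with step_id = 0 record the outcome of the whole job rather than one step.
SELECT TOP (20)
    j.name AS JobName,
    h.run_date,   -- integer in YYYYMMDD form
    h.run_time,   -- integer in HHMMSS form
    CASE h.run_status
        WHEN 0 THEN 'Failed'
        WHEN 1 THEN 'Succeeded'
        WHEN 2 THEN 'Retry'
        WHEN 3 THEN 'Canceled'
        WHEN 4 THEN 'In progress'
    END AS RunStatus,
    h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE h.step_id = 0
ORDER BY h.instance_id DESC;
GO
```

New rows with run dates and times after the restart suggest that the Agent is again executing jobs and writing job history.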
109240
110241
## Related content
111242
