# Customer intent: As a Service Fabric customer, I want to analyze the reason why a repair job is stuck using Service Fabric Explorer.
---
@@ -75,36 +76,36 @@ Finally, in the Completed state, the task is finished and no further state chang
### Infrastructure Jobs view
-To view jobs that Service Fabric receives for approval, go to the **Infrastructure Jobs** tab in the cluster view. Each entry includes a **Job ID** which stays the same across and outside of Service Fabric. The **Acknowledgement Status** shows whether Service Fabric approves the job with one of the following states:
+To view jobs that Service Fabric receives for approval, select **Infrastructure Jobs** in the cluster view. Each entry includes a **Job ID** which stays the same across and outside of Service Fabric. The **Acknowledgement Status** shows whether Service Fabric approves the job with one of the following states:
- **WaitingForAcknowledgement** - The job is still waiting for approval.
- **Acknowledged** - Service Fabric approves the job.
Jobs only appear here when they're present in the received document. In addition to the **Job ID** and **Acknowledgement Status**, the **Impact Types** section displays the nature of the job’s impact. The **Current Repair Task** section shows which repair task is actively running for job approval on the Service Fabric side. By selecting **All Repair Tasks**, you can view the status of every repair task associated with the current job.
-:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/cluster-infrastructure-job-view.png" alt-text="Screenshot of the Infrastructure Jobs tab in Service Fabric Explorer showing job ID, acknowledgement status, and impact types." lightbox="media/troubleshoot-service-fabric-repair-jobs/cluster-infrastructure-job-view.png":::
+:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/cluster-infrastructure-job-view.png" alt-text="Screenshot of the Infrastructure Jobs view in Service Fabric Explorer showing job ID, acknowledgement status, and impact types." lightbox="media/troubleshoot-service-fabric-repair-jobs/cluster-infrastructure-job-view.png":::
-### Repair Jobs and Health Check view
+### Repair Jobs and Health Checks view
-To view individual and all repair tasks associated with a cluster, go to the **Repair Jobs** tab. This displays pending repair tasks, completed repair tasks, or cancelled repair tasks. You can also see the state for any pending task.
+To view individual and all repair tasks associated with a cluster, select **Repair Jobs**. This displays pending repair tasks, completed repair tasks, or cancelled repair tasks. You can also see the state for any pending task.
If a repair task state is Created, Claimed, or Preparing, it's not yet approved by Service Fabric. Once a repair task transitions to the Approved state, it's considered approved and is then forwarded to the Repair Executor for the corresponding job.
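The approval rule above can be sketched in a few lines of Python. This is a minimal illustration, not a Service Fabric API: the function name `is_approved` is hypothetical, and treating every state from Approved onward as approved assumes the task wasn't cancelled before it was approved.

```python
# Repair task states in the order a task normally moves through them.
# Assumption: a task that was cancelled before approval is out of scope here.
REPAIR_TASK_STATES = [
    "Created", "Claimed", "Preparing",
    "Approved", "Executing", "Restoring", "Completed",
]


def is_approved(state: str) -> bool:
    """Return True once the task has reached the Approved state or a later one."""
    if state not in REPAIR_TASK_STATES:
        raise ValueError(f"Unknown repair task state: {state!r}")
    return REPAIR_TASK_STATES.index(state) >= REPAIR_TASK_STATES.index("Approved")


for state in ("Created", "Claimed", "Preparing", "Approved"):
    print(state, "->", "approved" if is_approved(state) else "not yet approved")
```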
-:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/repair-task-view.png" alt-text="Screenshot of the Repair Jobs tab in Service Fabric Explorer showing repair task states." lightbox="media/troubleshoot-service-fabric-repair-jobs/repair-task-view.png":::
+:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/repair-task-view.png" alt-text="Screenshot of the Repair Jobs view in Service Fabric Explorer showing repair task states." lightbox="media/troubleshoot-service-fabric-repair-jobs/repair-task-view.png":::
-If a repair task gets stuck in the Preparing state, it's either stuck in a health check or a safety check. An unhealthy entity in the cluster (including customer applications as well as system applications) can cause the health check to fail. To determine if the task is stuck in a health check, first verify whether **Preparing** or **Restoring Health Check** is enabled based on the state where the task is stuck. In the **Repair Task** view, expanding the task shows the health check status, indicating if it's enabled.
+If a repair task gets stuck in the Preparing state, it's either stuck in a health check or a safety check. An unhealthy entity in the cluster (including customer applications as well as system applications) can cause the health check to fail. To determine if the task is stuck in a health check, first verify whether **Preparing Health Check** or **Restoring Health Check** is enabled based on the state where the task is stuck. In the **Repair Task** view, expanding the task shows the health check status, indicating if it's enabled.
:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/cluster-health-check.png" alt-text="Screenshot of an expanded repair task showing health check status and preparing health check details." lightbox="media/troubleshoot-service-fabric-repair-jobs/cluster-health-check.png":::
If enabled, **Repair Task History** shows that the health check started but didn't complete, confirming that the task is stuck in the Health Check phase.
-### Safety Check view
+### Safety Checks view
-A repair task can get stuck in the Safety Check phase only if it has an impact on any node. This can be verified by checking the **Impact** section in the **Repair Task** view. If a node impact is present, you can identify which Safety Check is causing the delay by inspecting each impacted node individually. Select the node from the **Node List**. In the **Safety Check** section, you’ll see the specific check where the task is stuck. The **Repair Task ID** is also displayed here, indicating which repair task is responsible for the node deactivation and safety check.
+A repair task can get stuck in the Safety Check phase only if it has an impact on any node. This can be verified by checking the **Impact** section in the **Repair Task** view. If a node impact is present, you can identify which Safety Check is causing the delay by inspecting each impacted node individually. Select the node from the **Node List**. In the **Safety Checks** section, you’ll see the specific check where the task is stuck. The **Repair Task ID** is also displayed here, indicating which repair task is responsible for the node deactivation and safety check.
For example, in the following screenshot, the repair task is stuck in the **EnsureSeedNodeQuorum** safety check.
-:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/safety-check-view.png" alt-text="Screenshot of the Safety Check view in Service Fabric Explorer showing the specific check where the task is stuck." lightbox="media/troubleshoot-service-fabric-repair-jobs/safety-check-view.png":::
+:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/safety-check-view.png" alt-text="Screenshot of the Safety Checks view in Service Fabric Explorer showing the specific check where the task is stuck." lightbox="media/troubleshoot-service-fabric-repair-jobs/safety-check-view.png":::
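The health check and safety check triage described above can be summarized as a small decision function. This is only a sketch: the `RepairTaskSnapshot` type and its field names are hypothetical stand-ins for values you read off the **Repair Task** view in Service Fabric Explorer, not part of any Service Fabric API.

```python
from dataclasses import dataclass


@dataclass
class RepairTaskSnapshot:
    """Values read manually from the Repair Task view (hypothetical field names)."""
    state: str
    health_check_enabled: bool
    health_check_started: bool
    health_check_completed: bool
    has_node_impact: bool


def likely_blocker(task: RepairTaskSnapshot) -> str:
    """Suggest where a task that is stuck in Preparing is most likely blocked."""
    if task.state != "Preparing":
        return "not in Preparing"
    # Health check started but never finished: the task is stuck in the health check.
    if (task.health_check_enabled and task.health_check_started
            and not task.health_check_completed):
        return "health check"
    # A safety check can block only a task that impacts at least one node.
    if task.has_node_impact:
        return "safety check"
    return "undetermined"
```

If the result is `safety check`, the specific check still has to be found by inspecting each impacted node, as the text describes.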
If there are no errors in **Infrastructure Service** related to a repair task and the task has entered the Executing state, it means the job’s acknowledgment status is Acknowledged for Impact Start. Similarly, if the repair task transitions to the Completed state, it indicates that the job’s acknowledgment status is Acknowledged for Impact End.
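As a compact restatement of the mapping above, the following sketch derives the acknowledgement status a job is expected to show from its repair task state. The function name and the boolean flag are hypothetical, and applying the no-errors condition to the Completed case as well is this sketch's reading of the paragraph.

```python
from typing import Optional


def expected_acknowledgement(state: str,
                             infrastructure_service_has_errors: bool) -> Optional[str]:
    """Return the acknowledgement status implied by the repair task state, if any."""
    # Assumption: with errors reported by the Infrastructure Service, nothing can be inferred.
    if infrastructure_service_has_errors:
        return None
    if state == "Executing":
        return "Acknowledged for Impact Start"
    if state == "Completed":
        return "Acknowledged for Impact End"
    # Earlier states (for example, Preparing) don't imply an acknowledgement yet.
    return None
```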
@@ -124,6 +125,6 @@ To check the health of the Infrastructure Service or Repair Manager Service, sel
### Job throttling status for Infrastructure Service
-To check if any job is being throttled for a specific Infrastructure Service, select the service > **Health Evaluation** > **All**. Look for health events related to job throttling. If a job is throttled, the job ID along with the reason for throttling is displayed.
+To check if any job is being throttled for a specific Infrastructure Service, select the service > **Health Evaluations** > **All**. Look for health events related to job throttling. If a job is throttled, the job ID along with the reason for throttling is displayed.
-:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/cluster-job-throttling-status.png" alt-text="Screenshot of the Job throttling view in Service Fabric Explorer." lightbox="media/troubleshoot-service-fabric-repair-jobs/cluster-job-throttling-status.png":::
+:::image type="content" source="media/troubleshoot-service-fabric-repair-jobs/cluster-job-throttling-status.png" alt-text="Screenshot of the Health Evaluations view in Service Fabric Explorer." lightbox="media/troubleshoot-service-fabric-repair-jobs/cluster-job-throttling-status.png":::