support/azure/service-fabric/cluster/troubleshoot-service-fabric-repair-jobs.md
+16 -12 (16 additions & 12 deletions)
@@ -13,11 +13,11 @@ ms.date: 01/20/2026
 # Troubleshooting guide for customers to investigate and analyse, using Service Fabric Explorer (SFX), why repair jobs are not being approved
 
 ## Repair Task overview in service fabric
-Any operation initiated from the Virtual Machine Scale Set (VMSS) that targets VMs is processed by Service Fabric (SF) as a repair task derived from the job it receives. The Infrastructure Service creates a repair task for each job and enriches it with details such as the update type, targeted update domain (UD), and document incarnation number. These jobs begin with UD0 and progress sequentially through UD1, UD2, and so on within the Service Fabric cluster. If an Update Domain walk is required, separate repair tasks are generated for each UD. For example, in a cluster with five UDs, five distinct repair tasks will be created. These tasks execute one after another, UD by UD, and their progress can be tracked in Service Fabric Explorer (SFX).
+Any operation initiated from the scale-set that targets VMs is processed by Service Fabric as a repair task derived from the job it receives. The Infrastructure Service creates a repair task for each job and adds details like the update type, targeted update domain (UD), and document incarnation number. These jobs begin with UD0 and progress sequentially through UD1, UD2, and so on within the Service Fabric cluster. If an Update Domain walk is required, separate repair tasks are generated for each UD. For example, in a cluster with five UDs, five distinct repair tasks will be created. These tasks execute one after another, UD by UD, and their progress can be tracked in Service Fabric Explorer (SFX).
 
-RepairManager – Repair Manager (RM) defines and implements a safe workflow for performing repairs by coordinating between the Repair Requestor, Repair Executor, and itself to ensure safe and consistent repair actions.
+Repair Manager - defines and implements a safe workflow for performing repairs by coordinating between the Repair Requestor, Repair Executor, and itself to ensure safe and consistent repair actions.
 
-Infrastructure Service – Infrastructure Service (IS) is responsible for managing and orchestrating infrastructure-level operations, such as updates and repairs, ensuring the health and stability of the Service Fabric cluster.
+Infrastructure Service – responsible for managing and orchestrating infrastructure-level operations, such as updates and repairs, ensuring the health and stability of the Service Fabric cluster.
 
 ### Repair Task vs. Repair Job
 
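As a rough illustration of the update domain walk described in the hunk above, the sketch below models one repair task per UD, created in walk order. The `RepairTask` class, the `plan_ud_walk` helper, the job ID, and the `"PlatformUpdate"` label are all hypothetical names chosen for this example; they are not Service Fabric types or API calls.

```python
from dataclasses import dataclass


@dataclass
class RepairTask:
    """Hypothetical stand-in for a repair task; not the Service Fabric type."""
    job_id: str
    update_type: str
    target_ud: int
    document_incarnation: int


def plan_ud_walk(job_id, update_type, ud_count, document_incarnation):
    """Return one repair task per update domain, in walk order (UD0 first)."""
    return [
        RepairTask(job_id, update_type, ud, document_incarnation)
        for ud in range(ud_count)
    ]


# Assumed example values: a five-UD cluster and a made-up job.
tasks = plan_ud_walk("job-1", "PlatformUpdate", ud_count=5, document_incarnation=7)
for task in tasks:
    # In the real cluster the tasks run one after another, UD by UD.
    print(f"UD{task.target_ud}: {task.update_type} (incarnation {task.document_incarnation})")
```

With five UDs the list holds five tasks targeting UD0 through UD4, matching the five-UD example in the text.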
@@ -47,11 +47,11 @@ In the Created state, the Repair Manager (RM) accepts and stores the repair requ
 
 * Claimed
 
-Once the task is Claimed, the Repair Executor (RE) has taken ownership but has not yet specified the repair's impact. The requestor still retains the ability to cancel the task at this stage. Repair executor has ownership in this state.
+Once the task is Claimed, the Repair Executor (RE) has taken ownership but hasn't specified the repair's impact. The requestor still retains the ability to cancel the task at this stage. Repair executor has ownership in this state.
 
 * Preparing
 
-In the Preparing state, the Repair Executor specifies the impact, and the Repair Manager prepares the environment, such as deactivating nodes. If the task is cancelled now, it skips execution and moves directly to restoring. Operator also have the option to force approval, bypassing certain safety checks. Repair Manager has ownership in this state.
+In the Preparing state, the Repair Executor specifies the impact, and the Repair Manager prepares the environment, such as deactivating nodes. If the task is cancelled now, it skips execution and moves directly to restoring. Operator also has the option to force approval, bypassing certain safety checks. Repair Manager has ownership in this state.
 
 * Approved
 
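The state progression in this hunk can be sketched as a small state machine. This is a simplified model, not the Repair Manager's implementation: the `State` enum and `next_state` helper are hypothetical, and the `Executing` and `Restoring` state names are assumed from the text's mentions of "execution" and "restoring", since this diff excerpt does not show their full descriptions.

```python
from enum import Enum


class State(Enum):
    CREATED = "Created"
    CLAIMED = "Claimed"
    PREPARING = "Preparing"
    APPROVED = "Approved"
    EXECUTING = "Executing"    # assumed name, see lead-in
    RESTORING = "Restoring"    # assumed name, see lead-in
    COMPLETED = "Completed"


# Normal forward path, in definition order.
FORWARD = list(State)


def next_state(current, cancelled=False):
    """Return the state that follows `current` in this simplified model.

    Only the Preparing-state cancellation described in the text is modeled;
    the `cancelled` flag is ignored in every other state.
    """
    if current is State.COMPLETED:
        raise ValueError("Completed is terminal; no further state changes")
    if cancelled and current is State.PREPARING:
        # Cancelling while Preparing skips execution and goes to Restoring.
        return State.RESTORING
    return FORWARD[FORWARD.index(current) + 1]


print(next_state(State.PREPARING).value)
print(next_state(State.PREPARING, cancelled=True).value)
```

The two calls contrast the normal path out of Preparing with the cancellation shortcut that bypasses approval and execution.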
@@ -73,8 +73,12 @@ Finally, in the Completed state, the task is finished, and no further state chan
 
 ### Infrastructure Jobs view
 
-To view jobs that have been submitted to Service Fabric for approval, navigate to the Infrastructure Jobs tab under cluster view. Each entry includes a Job ID, which remains consistent across Service Fabric as well as outside service fabric. The Acknowledgement Status indicates whether the job has been approved by Service Fabric: • WaitingForAcknowledgement means the job is still pending approval. • Acknowledged confirms that the job has been approved by Service Fabric. This view represents perspective of the job. Jobs will only appear here when they are present in the received document. In addition to the Job ID and Acknowledgement Status, the Impact Types section displays the nature of the job’s impact. The Current Repair Task section shows which repair task is actively running for the job approval on the Service Fabric side. By selecting All Repair Tasks, you can view the status of every repair task associated with the current job.
+To view jobs that have been submitted to Service Fabric for approval, navigate to the **Infrastructure Jobs** tab under cluster view. Each entry includes a **Job ID**, which remains consistent across Service Fabric and outside service fabric. The **Acknowledgement Status** indicates whether the job has been approved by Service Fabric:
 
+-**WaitingForAcknowledgement** means the job is still pending approval.
+-**Acknowledged** confirms that the job has been approved by Service Fabric.
+
+This view represents perspective of the job. Jobs will only appear here when they are present in the received document. In addition to the **Job ID** and **Acknowledgement Status**, the **Impact Types** section displays the nature of the job’s impact. The **Current Repair Task** section shows which repair task is actively running for the job approval on the Service Fabric side. By selecting **All Repair Tasks**, you can view the status of every repair task associated with the current job.
 
 <center>
 ![Infrastructure Job view][Image1]
@@ -141,12 +145,12 @@ To check if any job is being throttled for a specific Infrastructure Service, se