Commit 0c14527

Update troubleshoot-service-fabric-repair-jobs.md
1 parent e06f3aa commit 0c14527

1 file changed

Lines changed: 27 additions & 25 deletions

support/azure/service-fabric/cluster/troubleshoot-service-fabric-repair-jobs.md

@@ -1,53 +1,55 @@
 ---
-title: Troubleshooting guide for customers to investigate and analyse, using Service Fabric Explorer (SFX), why repair jobs are not being approved.
-description: Learn how to analyze stuck repair jobs using Service Fabric Explorer.
-ms.topic: concept-article
-ms.author: ashukumar
-author: ashukumar
+title: Troubleshoot repair jobs that aren't approved by using Service Fabric Explorer
+description: This article provides guidance on troubleshooting repair jobs that aren't being approved in a Service Fabric cluster using Service Fabric Explorer (SFX). It explains the concepts of repair tasks and repair jobs, their states, and how to analyze them through SFX.
+ms.topic: troubleshooting-general
+ms.author: jarrettr
+ms.reviewer: ashukumar, v-ryanberg
+ms.editor: v-gsitser
 ms.service: azure-service-fabric
 services: service-fabric
 ms.date: 01/20/2026
 # Customer intent: As a Service Fabric customer, I want to analyze the reason why a repair job is stuck using Service Fabric Explorer.
 ---

-# Troubleshooting guide for customers to investigate and analyse, using Service Fabric Explorer (SFX), why repair jobs are not being approved
+# Troubleshoot repair jobs that aren't approved by using Service Fabric Explorer

-## Repair Task overview in service fabric
-Any operation initiated from the scale-set that targets VMs is processed by Service Fabric as a repair task derived from the job it receives. The Infrastructure Service creates a repair task for each job and adds details like the update type, targeted update domain (UD), and document incarnation number. These jobs begin with UD0 and progress sequentially through UD1, UD2, and so on within the Service Fabric cluster. If an Update Domain walk is required, separate repair tasks are generated for each UD. For example, in a cluster with five UDs, five distinct repair tasks will be created. These tasks execute one after another, UD by UD, and their progress can be tracked in Service Fabric Explorer (SFX).
+## Summary

-Repair Manager - defines and implements a safe workflow for performing repairs by coordinating between the Repair Requestor, Repair Executor, and itself to ensure safe and consistent repair actions.
+This article provides guidance on troubleshooting repair jobs that aren't approved in an Azure Service Fabric cluster by using Service Fabric Explorer (SFX). It explains the concepts of repair tasks and repair jobs, their states, and how to analyze them by using SFX.

-Infrastructure Service – responsible for managing and orchestrating infrastructure-level operations, such as updates and repairs, ensuring the health and stability of the Service Fabric cluster.
+## Repair task overview

-### Repair Task vs. Repair Job
+Service Fabric processes any operation initiated from the scale set that targets virtual machines (VMs) as a repair task derived from the job it receives. The Infrastructure Service creates a repair task for each job and adds details like the update type, targeted update domain (UD), and document incarnation number. These jobs start with UD0 and progress sequentially through UD1, UD2, and so on within the Service Fabric cluster. If a domain update is required, the Infrastructure Service generates separate repair tasks for each UD. For example, in a cluster with five UDs, the Infrastructure Service creates five distinct repair tasks. These tasks run one after another, UD by UD, and you can track their progress in SFX.

-A repair job refers to an Azure‑initiated maintenance operation that provides essential details such as the job ID, repair type, targeted update domain, document incarnation number, node‑impact information, and additional metadata. Service Fabric then creates a repair task by combining details such as:
+- **Repair Manager** - Defines and implements a safe workflow for performing repairs by coordinating between the Repair Requestor, Repair Executor, and itself to ensure safe and consistent repair actions.

-* Repair type
-* Target update domain
-* Document incarnation number
+- **Infrastructure Service** – Responsible for managing and orchestrating infrastructure-level operations, like updates and repairs, ensuring the health and stability of the Service Fabric cluster.

-These elements are combined in the following format:
+### Repair job and repair task

-Azure/repair type/repair job/update domain/document incarnation number
+A repair job refers to an Azure‑initiated maintenance operation that provides essential details like the job ID, repair type, targeted update domain, document incarnation number, node‑impact information, and additional metadata. Service Fabric then creates a repair task by combining details including:

-Example:Azure/TenantUpdate/addfb79e-1e8c-42c8-a967-b0e2e0afd6b4/0/110
+- *Repair type*, which indicates the repair category, like **Tenant** or **Platform**, and whether the operation is a maintenance action or an update.
+- *Target UD*, which refers to the update domain that the repair job is targeting at that time.
+- *Document incarnation number*, which is a monotonically increasing version identifier for the update document received by Service Fabric from Azure.

-Repair Type: - It indicates the repair category, such as Tenant or Platform, and whether the operation is a maintenance action or an update.
-Target Update domain: - It refers to the update domain that the repair job is targeting at that time.
-Document incarnation number: - The Document Incarnation Number is a monotonically increasing version identifier for the update document received by Service Fabric from Azure.
+These elements are then combined in the following format:

-The resulting entity is the repair task, which is used within the Service Fabric context. In contrast, the repair job is recognized outside Service Fabric components.
+`Azure/repair type/repair job/update domain/document incarnation number`
+
+For example: `Azure/TenantUpdate/addfb79e-1e8c-42c8-a967-b0e2e0afd6b4/0/110`
+
+The resulting entity is the *repair task*, which is used within the Service Fabric context. In contrast, the repair job is recognized outside Service Fabric components.

 ## Repair task states and their ownership

-* Created
+### Created

-In the Created state, the Repair Manager (RM) accepts and stores the repair request. At this point, the task is waiting for a Repair Executor (RE) to claim it. The requestor can cancel the task during this stage without any restrictions. Repair manager has ownership in this state.
+In the Created state, the Repair Manager accepts and stores the repair request. The task then waits for a Repair Executor to claim it. The requestor can cancel the task during this stage without any restrictions. The Repair Manager has ownership in this state.

 * Claimed

-Once the task is Claimed, the Repair Executor (RE) has taken ownership but hasn't specified the repair's impact. The requestor still retains the ability to cancel the task at this stage. Repair executor has ownership in this state.
+Once the task is Claimed, the Repair Executor (RE) takes ownership but doesn't specify the repair's impact. The requestor still retains the ability to cancel the task at this stage. The repair executor has ownership in this state.

 * Preparing

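The repair task ID format that this commit documents, `Azure/repair type/repair job/update domain/document incarnation number`, can be illustrated with a short Python sketch. It splits an ID into its components and sorts the tasks of one repair job by UD, which mirrors the UD-by-UD walk described in the article. `RepairTaskId`, `parse_repair_task_id`, and `order_by_update_domain` are hypothetical helpers and aren't part of any Service Fabric SDK. The validation rules are assumptions inferred from the single documented example: exactly five parts, an `Azure` prefix, and integer UD and incarnation values. The sketch only manipulates strings and doesn't query a cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepairTaskId:
    """Hypothetical container for the parts of a repair task ID:
    Azure/<repair type>/<repair job>/<update domain>/<document incarnation number>
    """
    prefix: str          # expected to be "Azure" (assumption based on the example)
    repair_type: str     # for example, "TenantUpdate"
    job_id: str          # the repair job ID (a GUID in the documented example)
    update_domain: int   # targeted UD, for example 0 for UD0
    incarnation: int     # document incarnation number


def parse_repair_task_id(task_id: str) -> RepairTaskId:
    """Split a repair task ID into its five '/'-separated components.

    The checks below are assumptions inferred from one documented example,
    not a published specification.
    """
    parts = task_id.split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 '/'-separated parts, got {len(parts)}: {task_id!r}")
    prefix, repair_type, job_id, ud, incarnation = parts
    if prefix != "Azure":
        raise ValueError(f"unexpected prefix {prefix!r} in {task_id!r}")
    # isdecimal() rejects signs, spaces, and empty strings before int() runs.
    if not (ud.isdecimal() and incarnation.isdecimal()):
        raise ValueError(f"update domain and incarnation must be integers: {task_id!r}")
    return RepairTaskId(prefix, repair_type, job_id, int(ud), int(incarnation))


def order_by_update_domain(task_ids: List[str]) -> List[RepairTaskId]:
    """Parse the repair tasks of a single repair job and sort them UD0, UD1, UD2, ...

    Only the update domain is used as the sort key; the incarnation number is ignored.
    """
    parsed = [parse_repair_task_id(t) for t in task_ids]
    if len({t.job_id for t in parsed}) > 1:
        raise ValueError("all task IDs must belong to the same repair job")
    return sorted(parsed, key=lambda t: t.update_domain)


if __name__ == "__main__":
    task = parse_repair_task_id(
        "Azure/TenantUpdate/addfb79e-1e8c-42c8-a967-b0e2e0afd6b4/0/110"
    )
    print(task.repair_type, task.update_domain, task.incarnation)  # TenantUpdate 0 110
```

The sketch deliberately orders tasks by update domain alone. The article states that the document incarnation number is a monotonically increasing version of the update document, but it doesn't say how incarnation numbers relate across the per-UD tasks of one job.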