Commit 6dcf9b1

Update troubleshoot-vm-boot-error.md
Doing minor modifications on file paths. Also, adding new section for auto repair with ALAR and labeling the previous approach as manual
1 parent e6ba7be commit 6dcf9b1

1 file changed

Lines changed: 66 additions & 43 deletions

support/azure/virtual-machines/linux/troubleshoot-vm-boot-error.md

@@ -10,7 +10,7 @@ ms.workload: infrastructure-services
 ms.tgt_pltfrm: vm-linux
 ms.custom: sap:My VM is not booting, linux-related-content
 ms.topic: troubleshooting
-ms.date: 10/18/2024
+ms.date: 02/25/2025
 ms.author: divargas
 ms.reviewer: ekpathak, v-leedennis, v-weizhu
 ---
@@ -68,37 +68,32 @@ See the following sections for detailed errors, possible causes, and solutions.
 > [!NOTE]
 > In the commands mentioned in the following sections, replace `/dev/sdX` with the corresponding Operating System (OS) disk device.
 
-## <a id="unknown-filesystem"></a>Error: unknown filesystem
-
-The following screenshot shows the error message:
+### <a id="offline-troubleshooting"></a> Reinstall GRUB and regenerate GRUB configuration file using Auto Repair (ALAR)
 
-:::image type="content" source="./media/troubleshoot-vm-boot-error/grub-unknown-filesystem.png" alt-text="Screenshot of grub unknown file system error.":::
-
-This error might be associated with one of the following issues:
-
-* /boot file system corruption.
-
-To resolve this issue, follow the steps in [Fix /boot file system corruption](#fix-boot-file-system-corruption).
-
-* GRUB boot loader is pointing to an invalid disk or partition.
+Azure Linux Auto Repair (ALAR) scripts are part of the VM repair extension described in [Use Azure Linux Auto Repair (ALAR) to fix a Linux VM](./repair-linux-vm-using-alar.md). ALAR covers the automation of multiple repair scenarios, including GRUB rescue issues.
 
-To resolve this issue, [reinstall GRUB and regenerate GRUB configuration file](#reinstall-grub-regenerate-grub-configuration-file).
+The ALAR scripts use the repair extension `repair-button` to fix GRUB issues by specifying `--button-command grubfix` for Generation 1 VMs, or `--button-command efifix` for Generation 2 VMs. This parameter triggers the automated recovery. Implement the following step to automate the fix of common GRUB errors that could be fixed by reinstalling GRUB and regenerating the corresponding configuration file:
 
-* OS disk partition table issues caused by human error.
+* **Linux VMs without UEFI (BIOS based - Gen1):**
 
-To resolve such issues, follow the steps in [Error: No such partition](#no-such-partition) with recommendations to re-create the /boot partition if missing or created incorrectly.
+```azurecli-interactive
+az vm repair repair-button --button-command 'grubfix' --verbose $RGNAME --name $VMNAME
+```
 
-### <a id="fix-boot-file-system-corruption"></a>Fix /boot file system corruption
+* **Linux VMs with UEFI (Gen2):**
 
-1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM.
+```azurecli-interactive
+az vm repair repair-button --button-command 'efifix' --verbose $RGNAME --name $VMNAME
+```
 
-2. Refer to [Troubleshoot file system corruption errors in Azure Linux](linux-recovery-cannot-start-file-system-errors.md) to resolve the corruption issues in the corresponding /boot partition.
+> [!IMPORTANT]
+> Replace the resource group name `$RGNAME` and VM name `$VMNAME` accordingly.
 
-3. Go to step 3 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to swap the OS disk.
+The repair VM script, in conjunction with the ALAR script, temporarily creates a resource group, a repair VM, and a copy of the affected VM's OS disk. It reinstalls GRUB and regenerates the corresponding GRUB configuration file and then it swaps the OS disk of the broken VM with the copied fixed disk. Finally, the `repair-button` script will automatically delete the resource group containing the temporary repair VM.
 
-### <a id="reinstall-grub-regenerate-grub-configuration-file"></a>Reinstall GRUB and regenerate GRUB configuration file
+### <a id="reinstall-grub-regenerate-grub-configuration-file"></a>Reinstall GRUB and regenerate GRUB configuration file manually
 
-1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including / and /boot in the rescue/repair VM, and then enter the [chroot](chroot-environment-linux.md) environment.
+1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including `/` and `/boot` in the rescue/repair VM, and then enter the [chroot](chroot-environment-linux.md) environment.
 
 2. Reinstall GRUB and regenerate the corresponding GRUB configuration file by using one of the following commands:
 
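The added `repair-button` commands pass `$RGNAME` as a bare argument. `az vm repair repair-button` normally takes the resource group through `--resource-group` (`-g`), so the commands as written appear to be missing that flag. The following is a minimal sketch, not part of this commit, that passes both flags explicitly and picks `grubfix` or `efifix` from the OS disk's `hyperVGeneration` value. It assumes a managed OS disk and an installed `vm-repair` Azure CLI extension; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only (not part of the commit): choose the ALAR button command from the
# VM's OS disk generation, then run the repair. Assumes a managed OS disk and the
# vm-repair Azure CLI extension. Function names are hypothetical.
set -euo pipefail

# Map a Hyper-V generation ("V1" or "V2") to the matching ALAR button command.
button_command_for_generation() {
  case "$1" in
    V1) echo "grubfix" ;;  # Gen1: BIOS-based boot
    V2) echo "efifix" ;;   # Gen2: UEFI boot
    *) echo "unsupported generation: '$1'" >&2; return 1 ;;
  esac
}

# Look up the OS disk generation, then call repair-button with -g/-n passed explicitly.
run_alar_boot_fix() {
  local rg="$1" vm="$2" disk_id generation button
  disk_id=$(az vm show --resource-group "$rg" --name "$vm" \
    --query "storageProfile.osDisk.managedDisk.id" --output tsv)
  generation=$(az disk show --ids "$disk_id" --query "hyperVGeneration" --output tsv)
  button=$(button_command_for_generation "$generation")
  az vm repair repair-button --button-command "$button" --verbose \
    --resource-group "$rg" --name "$vm"
}
```

After sourcing the script, `run_alar_boot_fix "$RGNAME" "$VMNAME"` would run the repair. Reading the generation from the disk avoids having to choose between the Gen1 and Gen2 commands by hand.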
@@ -129,7 +124,7 @@ This error might be associated with one of the following issues:
 sed -i 's/hd2/hd0/g' /boot/grub2/grub.cfg
 ```
 
-* **Ubuntu 20.04/22.04/24.04**
+* **Ubuntu Gen1 and Gen2**
 
 ```bash
 grub-install /dev/sdX
@@ -138,6 +133,34 @@ This error might be associated with one of the following issues:
 
 3. Go to step 3 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to swap the OS disk.
 
+## <a id="unknown-filesystem"></a>Error: unknown filesystem
+
+The following screenshot shows the error message:
+
+:::image type="content" source="./media/troubleshoot-vm-boot-error/grub-unknown-filesystem.png" alt-text="Screenshot of grub unknown file system error.":::
+
+This error might be associated with one of the following issues:
+
+* `/boot` file system corruption.
+
+To resolve this issue, follow the steps in [Fix /boot file system corruption](#fix-boot-file-system-corruption).
+
+* GRUB boot loader is pointing to an invalid disk or partition.
+
+To resolve this issue, [reinstall GRUB and regenerate GRUB configuration file](#reinstall-grub-regenerate-grub-configuration-file).
+
+* OS disk partition table issues caused by human error.
+
+To resolve such issues, follow the steps in [Error: No such partition](#no-such-partition) with recommendations to re-create the `/boot` partition if missing or created incorrectly.
+
+### <a id="fix-boot-file-system-corruption"></a>Fix /boot file system corruption
+
+1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM.
+
+2. Refer to [Troubleshoot file system corruption errors in Azure Linux](linux-recovery-cannot-start-file-system-errors.md) to resolve the corruption issues in the corresponding `/boot` partition.
+
+3. Go to step 3 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to swap the OS disk.
+
 ## <a id="error15"></a>Error 15: File not found
 
 The following screenshot shows the error message:
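The added "Fix /boot file system corruption" steps defer the actual repair to a separate article. As a rough sketch under stated assumptions (the function names are hypothetical, and `xfs_repair` and `e2fsck` are the usual tools for XFS and ext file systems), the following prints, rather than runs, a repair command for the unmounted `/boot` partition on the rescue VM:

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) a repair command for an unmounted /boot
# partition, based on its file system type. Function names are hypothetical.
set -euo pipefail

# Map a file system type to a repair command for the given device.
repair_command_for_fstype() {
  local fstype="$1" device="$2"
  case "$fstype" in
    xfs) echo "xfs_repair $device" ;;
    ext2|ext3|ext4) echo "e2fsck -f $device" ;;
    *) echo "no repair command known for '$fstype'" >&2; return 1 ;;
  esac
}

# Detect the file system type with blkid, then print the suggested command.
suggest_boot_repair() {
  local device="$1" fstype
  fstype=$(blkid -s TYPE -o value "$device")
  repair_command_for_fstype "$fstype" "$device"
}
```

For example, `suggest_boot_repair /dev/sdX1`. Printing keeps the repair step under manual control; the linked article remains the reference for the full procedure.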
@@ -146,13 +169,13 @@ The following screenshot shows the error message:
 
 To resolve this issue, follow these steps:
 
-1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including / and */boot* in the rescue/repair VM, and then enter the [chroot](chroot-environment-linux.md) environment.
+1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including `/` and `/boot` in the rescue/repair VM, and then enter the [chroot](chroot-environment-linux.md) environment.
 
-2. Inspect the /boot file system contents and determine what's missing.
+2. Inspect the `/boot` file system contents and determine what's missing.
 
 3. If the GRUB configuration file is missing, [reinstall GRUB and regenerate GRUB configuration file](#reinstall-grub-regenerate-grub-configuration-file).
 
-4. Verify that the file permissions in the /boot file system are OK. You can compare the permissions by using another VM that's running the same Linux version.
+4. Verify that the file permissions in the `/boot` file system are OK. You can compare the permissions by using another VM that's running the same Linux version.
 
 5. If the entire /boot partition or other important contents are missing and can't be recovered, we recommend restoring the VM from a backup. For more information, see [How to restore Azure VM data in Azure portal](/azure/backup/backup-azure-arm-restore-vms).
 
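Step 4 suggests comparing `/boot` permissions with another VM that runs the same Linux version. One way to do that, sketched here with a hypothetical function name and assuming GNU `find` (for `-printf`), is to take a sorted mode/owner/path listing on both systems and `diff` them:

```bash
#!/usr/bin/env bash
# Sketch only: list mode, owner, group, and relative path for every entry under
# a /boot tree so two systems can be compared with diff. Requires GNU find.
set -euo pipefail

boot_permissions_snapshot() {
  local root="${1:-/boot}"
  # %m = octal mode, %u/%g = owner/group, %P = path relative to $root; sort by path
  find "$root" -printf '%m %u %g %P\n' | sort -k4
}
```

Run `boot_permissions_snapshot > healthy.txt` on the reference VM and `boot_permissions_snapshot > broken.txt` inside the chroot, then `diff healthy.txt broken.txt`. Kernel-version-specific file names usually differ between two VMs, so some differences won't be permission problems.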
@@ -166,9 +189,9 @@ The following screenshot shows the error message:
 
 1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create one. Mount all the required file systems, including / and /boot in the rescue/repair VM, and then enter the [chroot](chroot-environment-linux.md) environment.
 
-2. If you're unable to mount the /boot file system due to a corruption error, [fix /boot file system corruption](#fix-boot-file-system-corruption).
+2. If you're unable to mount the `/boot` file system due to a corruption error, [fix /boot file system corruption](#fix-boot-file-system-corruption).
 
-3. When you're located inside chroot, verify the contents in the */boot/grub2/i386-pc* directory. If the contents are missing, copy the contents from */usr/lib/grub/i386-pc*. To do this, use the following commands:
+3. When you're located inside chroot, verify the contents in the `/boot/grub2/i386-pc` directory. If the contents are missing, copy the contents from `/usr/lib/grub/i386-pc`. To do this, use the following commands:
 
 ```bash
 ls -l /boot/grub2/i386-pc
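Step 3 copies the GRUB modules only when `/boot/grub2/i386-pc` is missing its contents. A guarded sketch of that check-then-copy logic, meant to run inside the chroot; the function name is hypothetical, and the paths default to the ones named in step 3:

```bash
#!/usr/bin/env bash
# Sketch only: copy GRUB i386-pc modules into /boot only when the target
# directory is missing or empty. Run inside the chroot. Function name is hypothetical.
set -euo pipefail

restore_grub_modules() {
  local target="${1:-/boot/grub2/i386-pc}" source="${2:-/usr/lib/grub/i386-pc}"
  # Leave a populated directory untouched.
  if [ -d "$target" ] && [ -n "$(ls -A "$target")" ]; then
    echo "$target is not empty; nothing copied"
    return 0
  fi
  mkdir -p "$target"
  cp -a "$source"/. "$target"/   # copy directory contents, preserving attributes
  echo "copied $source into $target"
}
```

Because the function returns early when the target already has files, running it a second time doesn't overwrite existing modules.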
@@ -194,7 +217,7 @@ The following screenshot shows the error message:
 ```bash
 yum reinstall $(rpm -qa | grep -i kernel)
 ```
-4. Create the *grub.cfg* file:
+4. Create the `grub.cfg` file:
 
 ```bash
 grub2-mkconfig -o /boot/grub2/grub.cfg
@@ -211,12 +234,12 @@ The following screenshot shows the error message:
 
 This error occurs on a RHEL-based VM (Red Hat, Oracle Linux, CentOS) in one of the following scenarios:
 
-* The /boot partition is deleted by mistake.
-* The /boot partition is re-created by using the wrong start and end sectors.
+* The `/boot` partition is deleted by mistake.
+* The `/boot` partition is re-created by using the wrong start and end sectors.
 
 ### Solution: Re-create /boot partition
 
-If the /boot partition is missing, re-create it by following these steps:
+If the `/boot` partition is missing, re-create it by following these steps:
 
 1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM.
 
@@ -242,7 +265,7 @@ If the /boot partition is missing, re-create it by following these steps:
 
 #### <a id="re-create-boot-partition-in-dos-systems"></a>Re-create /boot partition in dos systems
 
-1. Re-create the /boot partition by using the following command:
+1. Re-create the /boot partition from a rescue/repair VM by using the following command:
 
 ```bash
 sudo fdisk /dev/sdX
@@ -300,7 +323,7 @@ If the /boot partition is missing, re-create it by following these steps:
 Calling ioctl() to re-read partition table.
 ```
 
-2. After you re-create the missing /boot partition, check whether the /boot file system is detected. You should be able to see an entry for `/dev/sdX1` (the missing /boot partition).
+2. After you re-create the missing `/boot` partition, check whether the `/boot` file system is detected. You should be able to see an entry for `/dev/sdX1` (the missing /boot partition).
 
 ```bash
 sudo blkid /dev/sdX1
@@ -311,11 +334,11 @@ If the /boot partition is missing, re-create it by following these steps:
 /dev/sdc1: UUID="<UUID>" TYPE="ext4"
 ```
 
-3. If the /boot file system isn't visible in `blkid` after you re-create the partition, this means that the /boot data no longer exists. You have to re-create the /boot file system (by using the same UUID and file system format that's in the */etc/fstab* /boot entry), and then [restore its contents from a backup](/azure/backup/backup-azure-arm-restore-vms).
+3. If the `/boot` file system isn't visible in `blkid` after you re-create the partition, this means that the /boot data no longer exists. You have to re-create the /boot file system (by using the same UUID and file system format that's in the `/etc/fstab` /boot entry), and then [restore its contents from a backup](/azure/backup/backup-azure-arm-restore-vms).
 
 #### <a id="re-create-boot-partition-in-gpt-systems"></a>Re-create /boot partition in GPT systems
 
-1. Re-create the /boot partition by using the following command:
+1. Re-create the /boot partition by using the following command, from a rescue/repair VM:
 
 ```bash
 sudo gdisk /dev/sdX
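Step 3 of both the dos and GPT procedures says to re-create `/boot` with the UUID (and, for dos, the file system format) recorded in the broken VM's `/etc/fstab`. A minimal sketch that reads that entry and prints a matching `mkfs` command for review. It assumes the entry uses the `UUID=` form and that the broken OS disk's root file system is mounted on the rescue VM; the mount path and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: read the /boot entry from the broken VM's fstab and print (not
# run) a mkfs command that reuses its UUID. Assumes a UUID=... entry.
# Function names are hypothetical.
set -euo pipefail

# Print "<uuid> <fstype>" for the first uncommented /boot entry.
boot_entry_from_fstab() {
  awk '$1 !~ /^#/ && $2 == "/boot" { sub(/^UUID=/, "", $1); print $1, $3; exit }' "$1"
}

# Print a mkfs command that re-creates /boot with the same UUID and type.
mkfs_command_for_boot() {
  local fstab="$1" partition="$2" uuid fstype
  read -r uuid fstype <<< "$(boot_entry_from_fstab "$fstab")"
  case "$fstype" in
    xfs) echo "mkfs.xfs -m uuid=$uuid $partition" ;;
    ext4) echo "mkfs.ext4 -U $uuid $partition" ;;
    *) echo "no supported /boot entry found in $fstab" >&2; return 1 ;;
  esac
}
```

For example, with the broken root file system mounted at a hypothetical `/rescue`, run `mkfs_command_for_boot /rescue/etc/fstab /dev/sdX1`. The command is printed rather than executed because `mkfs` erases the partition.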
@@ -379,22 +402,22 @@ If the /boot partition is missing, re-create it by following these steps:
 sudo blkid /dev/sdX1
 ```
 
-You should be able to see an entry for `/dev/sdX1` (the missing /boot partition).
+You should be able to see an entry for `/dev/sdX1` (the missing `/boot` partition).
 
 ```output
 sudo blkid /dev/sdc1
 /dev/sdc1: UUID="<UUID>" BLOCK_SIZE="4096" TYPE="xfs" PARTLABEL="Linux filesystem" PARTUUID="<PARTUUID>"
 ```
 
-3. If the /boot file system isn't visible after you re-create the partition, this means that the /boot data no longer exists. You have to re-create the /boot file system (by using the same UUID that's in the */etc/fstab* /boot entry), and then [restore its contents from a backup](/azure/backup/backup-azure-arm-restore-vms).
+3. If the /boot file system isn't visible after you re-create the partition, this means that the /boot data no longer exists. You have to re-create the /boot file system (by using the same UUID that's in the `/etc/fstab` `/boot` entry), and then [restore its contents from a backup](/azure/backup/backup-azure-arm-restore-vms).
 
 ## <a id="grub_efi_get_secure_boot"></a>Error: symbol 'grub_efi_get_secure_boot' not found
 
 The following screenshot shows the error message:
 
 :::image type="content" source="./media/troubleshoot-vm-boot-error/grub-efi-get-secure-boot-not-found.jpg" alt-text="Screenshot of grub error 'grub_efi_get_secure_boot' not found.":::
 
-Linux kernel version 4.12.14 (that's used in SLES 12 SP5) doesn't support the [Secure Boot](/windows-hardware/design/device-experiences/oem-secure-boot) option. Therefore, if secure boot is enabled during the deployment of the VM (that is, the **Security type** field is set to [Trusted launch virtual machines](/azure/virtual-machines/trusted-launch)), the virtual machine generates the secure boot error through the console when you try to start by using this SUSE kernel version on a Gen2 VM image.
+Linux kernel version `4.12.14`(that's used in SLES 12 SP5) doesn't support the [Secure Boot](/windows-hardware/design/device-experiences/oem-secure-boot) option. Therefore, if secure boot is enabled during the deployment of the VM (that is, the **Security type** field is set to [Trusted launch virtual machines](/azure/virtual-machines/trusted-launch)), the virtual machine generates the secure boot error through the console when you try to start by using this SUSE kernel version on a Gen2 VM image.
 
 ### Solution
 
@@ -428,9 +451,9 @@ This kind of error is triggered in one of the following scenarios:
 
 To resolve this error, follow these steps:
 
-1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including / and /boot, and then enter the [chroot](chroot-environment-linux.md) environment.
+1. Check whether a rescue/repair VM was created. If it wasn't created, follow step 1 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to create the VM. Mount all the required file systems, including `/` and `/boot`, and then enter the [chroot](chroot-environment-linux.md) environment.
 
-2. Make sure that the */etc/default/grub* configuration file is configured. The [endorsed Azure Linux images](/azure/virtual-machines/linux/endorsed-distros) already have the required configurations. For more information, see the following articles:
+2. Make sure that the `/etc/default/grub` configuration file is configured. The [endorsed Azure Linux images](/azure/virtual-machines/linux/endorsed-distros) already have the required configurations. For more information, see the following articles:
 
 * [GRUB access in RHEL](serial-console-grub-single-user-mode.md#grub-access-in-rhel)
 * [GRUB access in CentOS](serial-console-grub-single-user-mode.md#grub-access-in-centos)
@@ -441,9 +464,9 @@ To resolve this error, follow these steps:
 3. [Reinstall GRUB and regenerate GRUB configuration file](#reinstall-grub-regenerate-grub-configuration-file).
 
 > [!NOTE]
-> If the missing file is */boot/grub/menu.lst*, this error is for older OS versions (RHEL 6.x, Centos 6.x and Ubuntu 14.04). The commands will differ because GRUB version 1 is used in those systems instead. GRUB version 1 isn't covered in this article.
+> If the missing file is `/boot/grub/menu.lst`, this error is for older OS versions (RHEL 6.x, Centos 6.x and Ubuntu 14.04). The commands will differ because GRUB version 1 is used in those systems instead. GRUB version 1 isn't covered in this article.
 
-4. If the entire /boot partition is missing, follow the steps in [Error: no such partition](#no-such-partition).
+4. If the entire `/boot` partition is missing, follow the steps in [Error: no such partition](#no-such-partition).
 
 5. After the issue is resolved, go to step 3 in [Troubleshoot GRUB rescue issue offline](#offline-troubleshooting) to swap the OS disk.
 