---
title: Troubleshoot non-boot scenarios after enabling Azure Disk Encryption on the OS disk of Linux VMs
description: Resolve issues when a Linux VM is not booting after enabling Azure Disk Encryption
author: elicorme
ms.author: elcorral
ms.date: 04/01/2025
ms.reviewer: divargas
ms.service: azure-virtual-machines
ms.custom: linux-related-content
ms.topic: troubleshooting
ms.collection: linux
---

# How to fix issues related to VMs not booting after enabling Azure Disk Encryption

**Applies to:** :heavy_check_mark: Linux VMs

When Azure Disk Encryption (ADE) is deployed, it edits several files that control essential settings for the boot process and other system components. If ADE fails or is interrupted, the virtual machine (VM) is likely to get stuck in emergency mode or become unusable, especially when the OS disk is the one being encrypted.

This article lists the most common scenarios in which a VM doesn't boot after ADE is deployed, and describes how to approach each one to reach a feasible solution.

In all cases, [take a snapshot](https://learn.microsoft.com/azure/virtual-machines/linux/snapshot-copy-managed-disk), create a backup, or both, before you encrypt the disks.

Backups ensure that a recovery option is available if an unexpected failure occurs during encryption. For more information about how to back up and restore encrypted VMs, see the [Azure Backup](https://learn.microsoft.com/azure/backup/backup-azure-vms-encryption) article.

## Common issues related to non-boot scenarios on machines using Azure Disk Encryption

For many non-boot issues, you need to review the extension logs. These logs are shown either in the serial console or in the extension log file, which is normally located at `/var/log/azure/Microsoft.Azure.Security.AzureDiskEncryptionForLinux/extension.log`.
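
To find failures in a long extension log quickly, you can filter it for error lines. The following is a minimal sketch, not an official tool. The function name `ade_log_errors` is hypothetical, and the sketch assumes that failures show up as lines containing "error" or a Python `Traceback`.

```bash
# Sketch: list lines of the ADE extension log that look like failures.
# Assumption: failures appear as lines that contain "error" or a Python
# "Traceback" (case-insensitive); adjust the pattern for your case.

ade_log_errors() {
    local log="${1:-/var/log/azure/Microsoft.Azure.Security.AzureDiskEncryptionForLinux/extension.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -n: prefix line numbers, -i: ignore case, -E: extended regex for "|".
    grep -n -i -E 'error|traceback' "$log"
}
```

For example, `ade_log_errors | tail -n 20` shows the 20 most recent matches together with their line numbers in the full log.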

## <a id="initram-miss"> </a> ADE modules missing in the initramfs image

If the OS disk uses LVM and you see a message like the following one, the required ADE modules probably weren't added to the initial RAM disk (initramfs) image:

```bash
Warning: /dev/mapper/rootvg-rootlv does not exist
...
Entering emergency mode. Exit the shell to continue.
dracut:/#
```

To fix this issue, either [restore from backup](https://learn.microsoft.com/azure/backup/restore-azure-encrypted-virtual-machines) and attempt the encryption again, or repair the disk offline:

* Use either the Azure CLI extension [az vm repair](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#method1) or the [manual method](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#method2) to create a rescue VM. Then attach the OS disk of the failed Linux VM to that rescue VM and unlock it.
* On the failed disk, run the following commands for your distribution. Replace the extension version and kernel version placeholders with the values from your system.

For RHEL 8 and RHEL 9:

```bash
# Run as root on the failed disk (for example, from a chroot environment).
# Copy the ADE dracut module directory into the dracut modules path.
cp -r /var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-X.X.X.X/main/oscrypto/91adeOnline /usr/lib/dracut/modules.d/

# Rebuild the initramfs image for the kernel version that the VM boots.
dracut -f -v /boot/initramfs-<KERNEL VERSION>.img <KERNEL VERSION>
```
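
The placeholders in the previous commands must match the failed disk. The following minimal sketch fills them in by reading the failed disk's own directories. It uses three hypothetical helpers: `latest_ade_dir`, `installed_kernels`, and `print_dracut_fix`.

The sketch makes two assumptions:

* The extension directory follows the naming shown above.
* Each installed kernel has a matching `/boot/vmlinuz-<version>` file.

It reads `/boot` instead of calling `uname -r`, because inside a chroot `uname -r` reports the rescue VM's kernel. It only prints the commands, so that you can review them before you run them.

```bash
# Sketch: fill in the placeholders of the RHEL commands above.
# Assumptions: run as root inside the chroot of the failed OS disk; extension
# directories are named Microsoft.Azure.Security.AzureDiskEncryptionForLinux-<version>;
# each installed kernel has a /boot/vmlinuz-<version> file.

# Print the highest-versioned ADE extension directory under a base path.
latest_ade_dir() {
    local base="${1:-/var/lib/waagent}"
    ls -d "$base"/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-* 2>/dev/null |
        sort -V | tail -n 1
}

# List the kernel versions installed on the failed disk. Inside a chroot,
# `uname -r` reports the rescue VM's kernel, so read /boot instead.
installed_kernels() {
    local boot="${1:-/boot}" f v
    for f in "$boot"/vmlinuz-*; do
        [ -e "$f" ] || continue                    # the glob matched nothing
        v="${f##*/vmlinuz-}"
        case "$v" in *rescue*) continue ;; esac    # skip rescue images
        echo "$v"
    done | sort -V
}

# Print (don't run) the commands so they can be reviewed first.
print_dracut_fix() {
    local dir k
    dir=$(latest_ade_dir "$1")
    if [ -z "$dir" ]; then
        echo "no ADE extension directory found" >&2
        return 1
    fi
    echo "cp -r $dir/main/oscrypto/91adeOnline /usr/lib/dracut/modules.d/"
    installed_kernels "$2" | while read -r k; do
        echo "dracut -f -v /boot/initramfs-$k.img $k"
    done
}
```

Called without arguments inside the chroot, `print_dracut_fix` prints one `cp -r` line followed by one `dracut` line per installed kernel.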

For Ubuntu 20.04:

> [!NOTE]
> This procedure might also apply to non-boot scenarios that occur after an upgrade from Ubuntu 18.04 to Ubuntu 20.04. Review your scenario to confirm whether it applies.

Copy the following files from the extension directory to the initramfs-tools scripts and hooks directories:

```bash
# Run as root on the failed disk (for example, from a chroot environment).
cd /var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-X.x.x.xx/main/oscrypto/ubuntu_2004/encryptscripts
cp crypt-ade-boot /usr/share/initramfs-tools/scripts/init-premount/
cp crypt-ade-hook /usr/share/initramfs-tools/hooks/
```

After you copy `crypt-ade-boot`, replace the `ROOTPARTUUID` placeholder in it with the PARTUUID of the OS partition, as listed under `/dev/disk/by-partuuid/`. To find the PARTUUID of the partition that contains the OS, run a command like this one:

```bash
ls -l /dev/disk/by-partuuid/ | grep -w <partition containing the OS>
```

Output similar to the following is displayed. In this example, the PARTUUID is `ef61c3c3-50bb-40f0-8124-4cbe8cb2a380`:

```bash
lrwxrwxrwx 1 root root 10 May 18 17:33 ef61c3c3-50bb-40f0-8124-4cbe8cb2a380 -> ../../sda1
```

In the copied `/usr/share/initramfs-tools/scripts/init-premount/crypt-ade-boot` file, replace `ROOTPARTUUID` in the following line with the PARTUUID that you got in the previous step. Use the value from your own environment.

```bash
cryptsetup luksOpen /dev/disk/by-partuuid/ROOTPARTUUID osencrypt --header /boot/luks/osluksheader -d /mnt/azure_bek_disk/LinuxPassPhraseFileName
```
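
The substitution can also be scripted. The following minimal sketch has these assumptions and choices:

* It assumes GNU `sed` and uses a hypothetical function name, `set_root_partuuid`.
* `/dev/sda1` in the example comment is only a sample device.
* It uses `blkid -s PARTUUID -o value`, which prints only the PARTUUID of a partition, so the `ls -l` output doesn't have to be parsed.

```bash
# Sketch: substitute the OS partition's PARTUUID for the ROOTPARTUUID
# placeholder in the copied crypt-ade-boot script. Assumes GNU sed (-i).

set_root_partuuid() {
    local script="$1" partuuid="$2"
    # A PARTUUID contains only hexadecimal digits and dashes; reject anything
    # else so a typo never ends up in the boot script.
    case "$partuuid" in
        ''|*[!0-9a-fA-F-]*)
            echo "invalid PARTUUID: '$partuuid'" >&2
            return 1 ;;
    esac
    if ! grep -q 'ROOTPARTUUID' "$script"; then
        echo "ROOTPARTUUID placeholder not found in $script" >&2
        return 1
    fi
    sed -i "s/ROOTPARTUUID/$partuuid/g" "$script"
}

# Example (run as root in the chroot; /dev/sda1 is only a sample device):
# uuid=$(blkid -s PARTUUID -o value /dev/sda1)
# set_root_partuuid /usr/share/initramfs-tools/scripts/init-premount/crypt-ade-boot "$uuid"
```

The sketch doesn't write a backup copy next to the edited file, because the unmodified script is still in the extension directory. A stray copy left in `init-premount` might be treated by initramfs-tools as one more boot script.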

Regenerate the initramfs image:

```bash
update-initramfs -u -k all
```

Output similar to the following is expected:

```bash
update-initramfs: Generating /boot/initrd.img-5.15.0-1038-azure
cryptsetup: WARNING: target 'osencrypt' not found in /etc/crypttab
+ PREREQS=udev
+ mount -a
+ cryptsetup luksOpen /dev/disk/by-partuuid/ef61c3c3-50bb-40f0-8124-4cbe8cb2a380 osencrypt --header /boot/luks/osluksheader -d /mnt/azure_bek_disk/LinuxPassPhraseFileName
Device osencrypt already exists.
+ exit 0
```
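
Before you swap the disks, you can confirm that each regenerated image contains the ADE boot script. The following is a sketch under stated assumptions:

* `check_initrds` is a hypothetical helper.
* It uses `lsinitramfs`, from initramfs-tools, to list the contents of each image.
* It assumes that the script appears in that listing under `scripts/init-premount/`.
* The second argument exists only so that the listing command can be replaced, for example for testing.

```bash
# Sketch: confirm that each regenerated initrd image contains crypt-ade-boot.
# Assumptions: run inside the chroot of the failed disk; the listing of an
# image shows the script under scripts/init-premount/.

check_initrds() {
    local boot="${1:-/boot}" lister="${2:-lsinitramfs}" img status=0
    for img in "$boot"/initrd.img-*; do
        [ -e "$img" ] || continue
        # Ask the lister for the image contents and look for the ADE script.
        if "$lister" "$img" | grep -q 'init-premount/crypt-ade-boot$'; then
            echo "OK: $img"
        else
            echo "MISSING crypt-ade-boot: $img"
            status=1
        fi
    done
    return "$status"
}
```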

* After the repair, swap the failed OS disk with the repaired disk.
* Review the extension and console logs to make sure that the encryption process finished successfully.

## Interrupted encryption

The troubleshooting steps depend on the point at which the encryption process was interrupted. Keep in mind that in some scenarios, the only option is to [restore from backup](https://learn.microsoft.com/azure/backup/restore-azure-encrypted-virtual-machines).

* Review the console logs for error messages. Extension deployment problems are normally reported as Python errors.

* Make sure that all the [extension prerequisites](https://learn.microsoft.com/azure/virtual-machines/linux/disk-encryption-overview#additional-vm-requirements) are met.

* If necessary, attach the failed disk to a rescue VM and analyze it. For the OS disk, make sure that:
  * The required partitions are in place and the data is healthy.
  * The [operating system LUKS header file](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#identify-the-header-file) is named `osluksheader` and is stored separately on the `/boot` partition. If the disk was encrypted and this file is missing or corrupted, the VM can't be recovered unless you have a working backup.
  * The initramfs contains the required ADE modules. If the modules are missing, follow the steps in [ADE modules missing in the initramfs image](#initram-miss).
  * The BEK volume contains the [ADE key file](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#identify-the-ade-key-file).
    * If the key file is missing, create a test VM and encrypt it (volume type `DATA`) with the same encryption settings that were used to encrypt the faulty VM. After the test VM is encrypted, find the ADE key file in its BEK volume, and then follow these steps:
      1. Copy the ADE key file.
      2. Start the faulty VM.
      3. While the VM is in emergency mode, [identify the ADE key file](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/) and [identify the header file](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#identify-the-header-file). Then, depending on whether the disk layout is LVM or raw, [unlock the encrypted disk manually](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair#unlock-by-files).
      4. Let the VM boot.
      5. If the ADE key file is still missing and the BEK volume is mounted, manually create a file named `/mnt/azure_bek_disk/LinuxPassPhraseFileName` that contains the ADE key file contents.
      6. Reboot the VM.
      7. Redeploy the VM.
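
Some of the file checks in the list above can be run from the rescue VM. The following minimal sketch uses a hypothetical name, `ade_disk_checks`, and assumes the following:

* The failed disk's root file system is mounted at a path that you pass in, with its `/boot` partition mounted inside it.
* The BEK volume is mounted at `/mnt/azure_bek_disk`.
* The ADE initramfs files are in the locations that the initramfs steps in this article copy them to.

It only tests that the files exist and aren't empty. It doesn't validate their contents.

```bash
# Sketch: check the ADE files listed above on a failed OS disk that is
# mounted on a rescue VM. Assumptions: $1 is the mount point of the failed
# disk's root file system (its /boot partition mounted inside it) and $2 is
# the BEK volume mount point. File names are the ones used in this article.

ade_disk_checks() {
    local root="$1" bek="${2:-/mnt/azure_bek_disk}" rc=0
    if [ -z "$root" ] || [ ! -d "$root" ]; then
        echo "usage: ade_disk_checks <failed-disk-root> [bek-mount-point]" >&2
        return 2
    fi

    # -s: the file exists and isn't empty.
    if [ -s "$root/boot/luks/osluksheader" ]; then
        echo "OK: LUKS header file found"
    else
        echo "FAIL: $root/boot/luks/osluksheader is missing or empty"
        rc=1
    fi

    if [ -s "$bek/LinuxPassPhraseFileName" ]; then
        echo "OK: ADE key file found"
    else
        echo "FAIL: $bek/LinuxPassPhraseFileName is missing or empty"
        rc=1
    fi

    # The ADE initramfs pieces live in different places per distribution:
    # a dracut module on RHEL, initramfs-tools scripts on Ubuntu 20.04.
    if [ -e "$root/usr/lib/dracut/modules.d/91adeOnline" ] ||
       [ -e "$root/usr/share/initramfs-tools/scripts/init-premount/crypt-ade-boot" ]; then
        echo "OK: ADE initramfs module source found"
    else
        echo "WARN: no ADE initramfs module source found"
    fi
    return "$rc"
}
```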

## Not enough space in the boot partition (Ubuntu)

> [!NOTE]
> Ubuntu 24.04 images now come with a separate 1-GB `/boot` partition.

ADE requires a separate partition for `/boot`. For that reason, during the extension deployment, ADE creates `/boot` as a separate partition and restores the original files to it. At the end of the process, a new initial RAM disk file is created. If there isn't enough space, this step fails. This scenario is particularly complex because it has many variants. Also, [resizing the OS disk](https://learn.microsoft.com/azure/virtual-machines/linux/how-to-resize-encrypted-lvm#scenarios) currently isn't supported when the OS disk uses ADE. At the time of writing, only Ubuntu images might go through this boot partition split.

To avoid this issue, check the following items before you encrypt the OS disk:

* Remove old kernels that aren't in use.
* Make sure that only the necessary files are under `/boot`.
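
Before you encrypt, a quick look at `/boot` helps with both items above. The following sketch uses hypothetical function names. It assumes that kernels are installed as `/boot/vmlinuz-<version>` and that the running kernel (`uname -r`) must be kept. It only lists candidates. Remove old kernels through the package manager instead of deleting the files by hand, for example with `apt-get purge linux-image-<version>` on Ubuntu.

```bash
# Sketch: inspect /boot before encrypting an Ubuntu OS disk.
# Assumptions: kernels are installed as /boot/vmlinuz-<version>, and the
# running kernel (uname -r) must be kept.

# Print the free space, in MiB, of the file system that holds the directory.
boot_free_mb() {
    # -P keeps each file system on one line; -k reports 1024-byte blocks,
    # so column 4 (available) divided by 1024 gives MiB.
    df -Pk "${1:-/boot}" | awk 'NR == 2 { print int($4 / 1024) }'
}

# List installed kernel versions other than the one given (default: running).
old_kernels() {
    local boot="${1:-/boot}" current="${2:-$(uname -r)}" f v
    for f in "$boot"/vmlinuz-*; do
        [ -e "$f" ] || continue
        v="${f##*/vmlinuz-}"
        [ "$v" = "$current" ] || echo "$v"
    done
}
```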

## VFAT kernel module disabled

The VFAT kernel module is required to mount the BEK volume. If the module isn't enabled, the ADE key file isn't available, and therefore the disk can't be unlocked.

Before you continue with the encryption, [enable the VFAT module](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/vfat-disabled-boot-issues#ade-encrypted-vm-is-unable-to-access-root-volume).
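
To find out whether VFAT was disabled on purpose, for example by a hardening script, you can search the modprobe configuration. The following minimal sketch uses a hypothetical name, `vfat_disabled_by`. It assumes the following:

* The module is disabled by either a `blacklist vfat` line or an `install vfat /bin/true` (or `/bin/false`) line.
* That line is in a `.conf` file under `/etc/modprobe.d/`. Other modprobe configuration directories aren't searched.

On a running system, `modprobe -n -v vfat` also shows what would run when the module is loaded.

```bash
# Sketch: report modprobe.d entries that stop the vfat module from loading.
# Assumption: it is disabled by "blacklist vfat" or by an
# "install vfat /bin/true" (or /bin/false) override in a .conf file.

vfat_disabled_by() {
    local dir="${1:-/etc/modprobe.d}"
    # -H: print the file name, -n: print the line number. Anchoring the
    # keyword at the start of the line (after optional spaces) skips comments.
    grep -HnE '^[[:space:]]*(blacklist[[:space:]]+vfat|install[[:space:]]+vfat[[:space:]]+/(usr/)?bin/(true|false))([[:space:]]|$)' \
        "$dir"/*.conf 2>/dev/null
}
```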

## Problems related to missing packages

The ADE extension installs the required packages if they aren't installed by default. If this installation step fails for any reason, the encryption also fails.

To identify why packages weren't installed, review the extension logs in the console, and locate a message like this one:

`[Info] Installing pre-requisites`

Then, make sure that all the packages were installed successfully. For a full list of the required packages for each Linux distribution, see [Package management](https://learn.microsoft.com/azure/virtual-machines/linux/disk-encryption-isolated-network#package-management).

If there are errors related to package installation, identify which package failed and why. Make sure that the VM has access to the package repositories. If the VM has special network requirements, see [Azure Disk Encryption on an isolated network](https://learn.microsoft.com/azure/virtual-machines/linux/disk-encryption-isolated-network).
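
To review the package installation output without scrolling through the whole log, you can print the lines that follow that message. The following is a sketch that assumes the default log path named earlier in this article. The name `show_prereq_install` and its line-count argument are hypothetical.

```bash
# Sketch: show the package installation part of the extension log.
# Assumption: the log is at the default path named earlier in this article.

show_prereq_install() {
    local log="${1:-/var/log/azure/Microsoft.Azure.Security.AzureDiskEncryptionForLinux/extension.log}"
    local n="${2:-40}"   # number of lines to show after the message
    # Start printing at the first "Installing pre-requisites" line and stop
    # after n more lines; index() does a plain substring match, no regex.
    awk -v n="$n" 'index($0, "Installing pre-requisites") { on = 1 }
                   on && shown++ <= n { print }' "$log"
}
```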

## Missing parameters in the GRUB configuration

During the encryption process, the extension adds two parameters to the kernel options in the `/etc/default/grub` file. These parameters are related to the UUIDs of the boot and root partitions:

`rd.luks.ade.partuuid` and `rd.luks.ade.bootuuid`

These parameters must be present and set to the corresponding UUIDs. If they aren't, use [offline troubleshooting](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair) to add them manually. You can get the UUIDs by running the `blkid` command in a `chroot` environment.
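
On the failed disk, for example in the `chroot` environment, you can check both parameters before you edit anything. The following minimal sketch uses a hypothetical function name and assumes that the parameters appear as `name=value` in a `GRUB_CMDLINE_LINUX` line. It checks only that each parameter has a non-empty value, not that the value matches the `blkid` output.

```bash
# Sketch: check that /etc/default/grub passes both ADE parameters with a value.
# Assumption: the parameters appear as name=value inside a GRUB_CMDLINE_LINUX*
# line. Only presence is checked; compare the values with blkid yourself.

check_ade_grub_params() {
    local grub="${1:-/etc/default/grub}" p rc=0
    if [ ! -r "$grub" ]; then
        echo "cannot read $grub" >&2
        return 2
    fi
    for p in rd.luks.ade.partuuid rd.luks.ade.bootuuid; do
        # Split the kernel command-line lines into one token per line, then
        # look for a token that starts with "name=" and has something after it.
        if grep -E '^[[:space:]]*GRUB_CMDLINE_LINUX' "$grub" |
            tr " \"'" '\n\n\n' |
            awk -v key="$p=" 'index($0, key) == 1 && length($0) > length(key) { found = 1 }
                              END { exit !found }'; then
            echo "OK: $p"
        else
            echo "MISSING: $p"
            rc=1
        fi
    done
    return "$rc"
}
```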

## Missing or corrupted osluksheader file

LUKS stores its encryption metadata in a special section at the beginning of the encrypted partition, called the LUKS header. This header contains critical information such as the cipher and cipher mode, the hash function, and the key slots. The partition data itself is encrypted with a master key. That master key is stored in encrypted form in the key slots, so the data can't be decrypted without the header.

When ADE is used on the OS disk, the header is stored in a separate file named `osluksheader` on the `/boot` partition. If this file is corrupted or missing for any reason, the only way to recover it is from a backup. Use the [offline troubleshooting](https://learn.microsoft.com/troubleshoot/azure/virtual-machines/linux/unlock-encrypted-linux-disk-offline-repair) method to mount the `/boot` partition of the affected disk, and then restore the `osluksheader` file from the backup.
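
You can sanity-check a restored or suspect `osluksheader` file before you boot with it. Both LUKS1 and LUKS2 headers begin with the magic bytes `LUKS` followed by `0xBA 0xBE`, so a file that doesn't start with them isn't a usable header.

The following minimal sketch checks only those six bytes. The name `has_luks_magic` is hypothetical. Where `cryptsetup` is available, running `cryptsetup luksDump` on the file gives a complete view of the header.

```bash
# Sketch: check whether a file starts with the LUKS magic bytes
# ("LUKS" followed by 0xBA 0xBE), which LUKS1 and LUKS2 headers share.
# Returns 0 if the magic is present, 1 if not, 2 if the file can't be read.

has_luks_magic() {
    local f="${1:-/boot/luks/osluksheader}"
    [ -r "$f" ] || return 2
    # od prints the first 6 bytes as hex pairs; tr removes od's spacing.
    [ "$(head -c 6 "$f" | od -An -tx1 | tr -d ' \n')" = "4c554b53babe" ]
}
```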

## Resources

* [Azure Disk Encryption for Linux VMs](https://learn.microsoft.com/azure/virtual-machines/linux/disk-encryption-overview)
* [Azure Disk Encryption troubleshooting](https://docs.microsoft.com/azure/virtual-machines/linux/disk-encryption-troubleshooting)
* [Azure Disk Encryption frequently asked questions](https://docs.microsoft.com/azure/virtual-machines/linux/disk-encryption-faq)
