Virtual Machine Disk Recovery: VMware VMDK, Hyper-V VHDX and the Snapshot Chain
When virtual machine disk files (VMware VMDK, Hyper-V VHDX) become corrupted, recovery is layered: first the outer image, then the file system inside the VM. The most common causes are a broken snapshot chain, a missing descriptor, a damaged VMFS datastore and thin provisioning gaps. DSET recovers virtual disks in its Ankara Hacettepe Teknokent Beytepe lab. First diagnosis is free.
Quick answer: If your virtual machine will not start and its disk file (VMDK in VMware, VHDX in Hyper-V) is corrupted, do not assume the data inside is gone. Virtual disk recovery is a layered job. First the structure of the outer image file or of the datastore is repaired, and then the file system inside the VM is reassembled. The most common causes are a broken snapshot chain, a missing descriptor and a damaged VMFS datastore. As a first response, do not try to restart the VM. Emergency line: +90 536 662 38 09.
Why is virtual disk recovery two-layered?
On a physical disk, the file system sits directly on the hardware. In a virtual machine, two separate worlds nest inside each other. On the outside is a disk image file that the host sees: .vmdk in VMware, .vhdx in Hyper-V. Inside that file lies the file system of the virtual machine's own operating system, such as NTFS, ext4 or ReFS.
So virtual disk recovery always has two layers. First, the outer image file, or the datastore that holds it, must be brought back to a state where the file can be read consistently. Only then can the inner file system be opened and the user's actual data reached. If either layer is corrupt, the data cannot be reached even when the other layer is intact.
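To make the two layers concrete: a VMware -flat.vmdk holds the guest disk's sectors with no container header in front, so the guest's own partition table sits at offset 0 of the file. The minimal Python sketch below assumes such a flat image (VHDX and sparse formats put their own headers and block tables first, so it does not apply to them directly). It checks the inner layer by looking for an MBR boot signature, a GPT header or an NTFS boot sector. The function name `describe_guest_disk` is hypothetical.

```python
from __future__ import annotations

import struct

SECTOR = 512  # assumes 512-byte logical sectors in the guest disk


def describe_guest_disk(first_sectors: bytes) -> list[str]:
    """Describe the start of a raw (flat) guest disk image.

    Assumes `first_sectors` holds the guest's LBA 0 and LBA 1 with no container
    header in front (true for a VMware -flat.vmdk, not for VHDX or sparse files).
    """
    if len(first_sectors) < 2 * SECTOR:
        return ["too short: need at least 1024 bytes"]
    lba0 = first_sectors[:SECTOR]
    if lba0[510:512] != b"\x55\xaa":
        return ["no boot signature at LBA 0"]
    # GPT disks carry a protective MBR plus an "EFI PART" header at LBA 1.
    if first_sectors[SECTOR:SECTOR + 8] == b"EFI PART":
        return ["GPT header at LBA 1"]
    # A volume written straight onto the disk has its boot sector at LBA 0.
    if lba0[3:11] == b"NTFS    ":
        return ["NTFS boot sector at LBA 0 (no partition table)"]
    findings = []
    # Classic MBR: four 16-byte partition entries starting at offset 446.
    for i in range(4):
        entry = lba0[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]  # partition type byte; 0 means the slot is unused
        if ptype == 0:
            continue
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        findings.append(
            f"partition {i}: type 0x{ptype:02x}, start LBA {start_lba}, {num_sectors} sectors"
        )
    return findings or ["MBR signature present but no partition entries"]
```

For example, `describe_guest_disk(open("vm-flat.vmdk", "rb").read(1024))` (hypothetical file name) reports whether the inner layer still has a recognizable partition structure once the outer file is readable.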
The most common types of corruption
Broken snapshot chain
Taking a snapshot of a virtual machine freezes the current disk and sends all later writes to a separate file (a delta or differencing disk). In VMware these are delta .vmdk files; in Hyper-V they are .avhdx files. The files form a chain: the base disk, snapshot 1 on top of it, and snapshot 2 on top of that. If a single link is deleted, moved or corrupted, the chain can no longer be opened as a whole, and every state above the broken link becomes unreadable. This is one of the most common causes of data loss, because users sometimes assume snapshot files are "temporary" and delete them.
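On the VMware side, each snapshot descriptor names its parent with `parentFileNameHint` and records the parent's content ID in `parentCID`, while a base disk has `parentCID=ffffffff`. The Python sketch below walks a chain from the newest snapshot down to the base and reports the first broken link: a missing file, a loop, or a child whose `parentCID` no longer matches the parent's `CID`. It assumes the descriptor texts have already been read into a dict keyed by the names used in `parentFileNameHint`. The helper names are hypothetical. Hyper-V .avhdx files record their parent in binary VHDX metadata (a parent locator), not in a text descriptor, so this sketch covers only VMware.

```python
from __future__ import annotations

import re

NO_PARENT = "ffffffff"  # parentCID value that marks a base disk


def parse_descriptor(text: str) -> dict[str, str]:
    """Collect key=value lines (CID, parentCID, parentFileNameHint, ...)."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r'\s*([\w.]+)\s*=\s*"?([^"]*)"?\s*$', line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def walk_chain(top: str, descriptors: dict[str, str]) -> tuple[list[str], str | None]:
    """Follow parentFileNameHint links from the newest snapshot to the base disk.

    Returns the links found and a problem description, or None if the chain
    reaches a base disk intact.
    """
    chain: list[str] = []
    current = top
    while True:
        if current in chain:
            return chain, f"loop at {current}"
        text = descriptors.get(current)
        if text is None:
            return chain, f"missing link: {current}"
        chain.append(current)
        fields = parse_descriptor(text)
        parent_cid = fields.get("parentCID", NO_PARENT).lower()
        if parent_cid == NO_PARENT:
            return chain, None  # base disk reached
        parent = fields.get("parentFileNameHint")
        if not parent:
            return chain, f"{current}: parentCID set but no parentFileNameHint"
        if parent in descriptors:
            actual = parse_descriptor(descriptors[parent]).get("CID", "").lower()
            if actual != parent_cid:
                return chain, f"CID mismatch: {current} expects {parent_cid}, {parent} has {actual}"
        current = parent
```

Reporting the first broken link, rather than stopping with a generic error, shows which states below that point can still be opened.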
Missing or corrupt descriptor
A VMware VMDK usually has two parts: a small text descriptor file and a flat file holding the actual data. The descriptor records the disk size, the disk type and, for snapshots, the link to the parent disk. If this small file is corrupted or lost, the system no longer knows how to interpret terabytes of actual data. The descriptor can often be recreated, which makes the disk readable again.
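A descriptor for a monolithic flat disk is short: a header block, one extent line giving the size in 512-byte sectors and the flat file name, and a small disk database. The Python sketch below generates such text from the flat file's size. It assumes 512-byte sectors, `createType="vmfs"` (an ESXi flat disk), the common 255-head/63-sector geometry and the `lsilogic` adapter. Treat the output as a draft and compare it with a descriptor that `vmkfstools` creates for a disk of exactly the same size. If snapshots still point at this base disk, `cid` must equal the `parentCID` recorded in the first snapshot's descriptor. `build_flat_descriptor` is a hypothetical helper.

```python
from __future__ import annotations

SECTOR = 512  # assumed sector size


def build_flat_descriptor(flat_name: str, flat_size_bytes: int,
                          cid: str = "fffffffe", adapter: str = "lsilogic") -> str:
    """Return draft descriptor text for an existing base -flat.vmdk (no parent)."""
    if flat_size_bytes <= 0 or flat_size_bytes % SECTOR:
        raise ValueError("flat file size must be a positive multiple of 512")
    sectors = flat_size_bytes // SECTOR
    cylinders = sectors // (255 * 63)  # assumed 255 heads x 63 sectors geometry
    return "\n".join([
        "# Disk DescriptorFile",
        "version=1",
        'encoding="UTF-8"',
        f"CID={cid}",
        "parentCID=ffffffff",  # base disk: no parent
        'createType="vmfs"',
        "",
        "# Extent description",
        f'RW {sectors} VMFS "{flat_name}"',
        "",
        "# The Disk Data Base",
        "#DDB",
        "",
        f'ddb.adapterType = "{adapter}"',
        f'ddb.geometry.cylinders = "{cylinders}"',
        'ddb.geometry.heads = "255"',
        'ddb.geometry.sectors = "63"',
        "",
    ])
```

For a 20 GiB flat file, the extent line becomes `RW 41943040 VMFS "vm-flat.vmdk"`. A wrong sector count is the error that matters most, because it changes how much of the flat file the host will read.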
Damaged VMFS datastore
VMware ESXi keeps virtual disk files in a special file system called VMFS. If the LUN or RAID array the datastore sits on fails, the VMFS metadata is damaged and all VMs inside become inaccessible. In this case recovery starts at the datastore level: the VMFS structure is reassembled, then the VMDK files are extracted.
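When the VMFS metadata is too damaged to mount, one common fallback is to scan the raw LUN image for the text that starts every VMware descriptor, `# Disk DescriptorFile`. This is a generic carving technique, not VMFS reassembly itself. Each hit's extent line then names the flat file to look for. The minimal Python sketch below assumes the LUN has already been imaged to a file. The helper names are hypothetical, and hits are not filtered by alignment because small files need not start on a sector boundary.

```python
from __future__ import annotations

from typing import BinaryIO, Iterator

SIGNATURE = b"# Disk DescriptorFile"  # first line of a VMware text descriptor


def find_descriptors(image: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield absolute offsets where a descriptor signature starts in `image`."""
    overlap = len(SIGNATURE) - 1
    carry = b""  # tail of the previous chunk, so split signatures are still found
    base = 0     # absolute offset of data[0]
    while True:
        block = image.read(chunk_size)
        if not block:
            break
        data = carry + block
        hit = data.find(SIGNATURE)
        while hit >= 0:
            yield base + hit
            hit = data.find(SIGNATURE, hit + 1)
        # carry is shorter than SIGNATURE, so no hit can be reported twice
        carry = data[-overlap:]
        base += len(data) - len(carry)


def read_text_at(image: BinaryIO, offset: int, max_len: int = 2048) -> str:
    """Read a candidate descriptor, stopping at the first NUL byte."""
    image.seek(offset)
    raw = image.read(max_len).split(b"\x00", 1)[0]
    return raw.decode("utf-8", errors="replace")
```

`read_text_at` moves the file position, so collect the offsets first (for example with `list(find_descriptors(f))`) and read the candidates afterwards.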
Thin provision gaps
Thin provisioning allocates space to the virtual disk only as data is written, so the file on the datastore can be much smaller than the disk size the guest sees. When such a disk is corrupted, you must work out which blocks were actually written and which were never allocated. Whether a disk is thin or thick provisioned therefore determines how its block map is interpreted during recovery.
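The sketch below shows the interpretation step on a hypothetical, simplified block map. It has one entry per virtual block, holding that block's position inside the image file, and 0 means the block was never written. Position 0 is assumed to hold the image header, so it can never carry data. Real formats are richer. VMDK sparse extents use grain directories and grain tables, and VHDX uses a block allocation table with several block states. This illustrates the idea and does not parse either format. `split_block_map` is a hypothetical helper.

```python
from __future__ import annotations


def split_block_map(block_map: list[int]) -> tuple[dict[int, int], list[int]]:
    """Separate written virtual blocks from never-written ones.

    Hypothetical simplified format: block_map[i] is the block index inside the
    image file that holds virtual block i; 0 means "never written".
    """
    written: dict[int, int] = {}
    unwritten: list[int] = []
    owner: dict[int, int] = {}  # file block -> virtual block that claimed it
    for vblock, fblock in enumerate(block_map):
        if fblock == 0:
            # reads as zeros on a base disk, or falls through to the parent in a snapshot
            unwritten.append(vblock)
            continue
        if fblock in owner:
            # two virtual blocks cannot share one file block: the map is damaged
            raise ValueError(
                f"file block {fblock} claimed by virtual blocks {owner[fblock]} and {vblock}"
            )
        owner[fblock] = vblock
        written[vblock] = fblock
    return written, unwritten
```

To rebuild a flat image from this map, each written block is copied from its file position to `vblock * block_size` in the output, and unwritten blocks stay zero. The duplicate check matters because a corrupted map that points two virtual blocks at the same data would otherwise silently produce a wrong image.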
Corruption types and recovery approach
| Corruption | Platform | Root layer | Approach |
|---|---|---|---|
| Broken snapshot chain | VMware / Hyper-V | Delta / avhdx | Chain rebuilt, best consistent point chosen |
| Missing descriptor | VMware | VMDK metadata | Descriptor recreated |
| Damaged datastore | VMware ESXi | VMFS | Datastore reassembled, then VMDK extracted |
| Thin provision gap | VMware / Hyper-V | Block map | Written blocks separated from unallocated ones |
| RAID/disk failure | All platforms | Physical array | Image array first, then upper layers |
First response: what not to do?
The most common mistake in virtual disk cases is trying to restart the VM over and over. Forcing a VM to start with a broken snapshot chain makes the host write to delta files and complicates the situation further. Deleting, moving or renaming a snapshot file can also break the chain permanently.
Do: Keep the VM powered off, preserve all .vmdk, .vhdx, delta and .avhdx files exactly as they are, and do not create a new VM on the datastore. If a physical RAID array has crashed, do not touch it; see the RAID 5 crash recovery process article for first response steps.
Do not: delete snapshot files, blindly run "consolidate" or "merge" operations, or reformat the datastore.
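Before anyone touches the files, it helps to record what exists. The minimal Python sketch below assumes the underlying storage is healthy and that the folder is reachable from a separate analysis machine. It lists .vmdk, .vhdx and .avhdx files with their sizes and SHA-256 digests, and opens every file read-only. `inventory` is a hypothetical helper. Write its output somewhere other than the datastore. If a disk or RAID array is failing, skip this step, because every full read puts more stress on the hardware.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Extensions from the checklist above (VMware delta files also end in .vmdk).
# Other files such as .vmsd snapshot metadata may matter as well.
VM_DISK_SUFFIXES = {".vmdk", ".vhdx", ".avhdx"}


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks; the file is opened read-only and never written."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def inventory(folder: str) -> list[tuple[str, int, str]]:
    """List VM disk files under `folder` as (path, size in bytes, SHA-256)."""
    rows = []
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file() and p.suffix.lower() in VM_DISK_SUFFIXES:
            rows.append((str(p), p.stat().st_size, sha256_of(p)))
    return rows
```

Keeping this manifest makes it possible to prove later that no snapshot or descriptor file was changed or lost between first response and recovery.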
Relationship to server roles
Virtual machines often carry critical server roles. If the VM runs Exchange or Active Directory, a separate database-level recovery may be needed after the outer image is repaired. Even once the virtual disk opens, the EDB or NTDS.dit database inside may still be in a dirty shutdown state. We covered this in detail in our Exchange and Active Directory server recovery article.
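Exchange (EDB) and Active Directory (NTDS.dit) databases both use the ESE database engine. The header dump from `esentutl /mh` (or `eseutil /mh` on Exchange servers) includes a `State:` line such as `Dirty Shutdown` or `Clean Shutdown`. The small Python sketch below assumes that output format and reads the state from captured text. Run the tool only against a copy of the database extracted from the recovered virtual disk. The helper names are hypothetical.

```python
from __future__ import annotations

import re


def ese_state(header_dump: str) -> str | None:
    """Return the value of the 'State:' line in an ESE header dump, if any."""
    m = re.search(r"^[ \t]*State:[ \t]*(.+?)[ \t]*$", header_dump, re.MULTILINE)
    return m.group(1).strip() if m else None


def needs_recovery(header_dump: str) -> bool:
    """True when the header reports a dirty shutdown."""
    return ese_state(header_dump) == "Dirty Shutdown"
```

On a Windows analysis machine, `subprocess.run(["esentutl", "/mh", path], capture_output=True, text=True).stdout` can supply `header_dump`, where `path` points to the extracted copy. A dirty shutdown state only means the database was not closed cleanly. It does not by itself show that data is lost.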
The DSET virtual machine recovery approach
We have been recovering virtual disks in our Ankara Hacettepe Teknokent Beytepe lab since 2003. We first take a bit-for-bit copy of the datastore or image files and do all work on that copy. We rebuild the snapshot chain, repair the descriptor and reach the file system inside the VM. Our success rate is 99.4 percent. First diagnosis is free: no data, no fee.
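A bit-for-bit copy is only useful if it can be shown to match what was read. The minimal Python sketch below assumes a healthy, readable source, such as a device or file opened in binary mode. It copies the source in fixed-size chunks and computes a SHA-256 digest along the way. `image_with_hash` is a hypothetical helper for illustration, not DSET's tooling.

```python
from __future__ import annotations

import hashlib
from typing import BinaryIO


def image_with_hash(src: BinaryIO, dst: BinaryIO, chunk: int = 1 << 20) -> tuple[int, str]:
    """Copy `src` to `dst` unchanged and return (bytes copied, SHA-256 of the data).

    No bad-sector handling: a read error aborts the copy.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        block = src.read(chunk)
        if not block:
            break
        dst.write(block)
        digest.update(block)
        total += len(block)
    return total, digest.hexdigest()
```

Hashing the finished copy again and comparing the result with the returned digest confirms that the copy matches what was read. All later work then runs on the copy. Failing media needs an imager that retries, skips and logs unreadable areas instead of aborting, and GNU ddrescue is built for that case.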
Frequently Asked Questions (FAQ)
I accidentally deleted the snapshot files and the VM will not start. Can the data be recovered?
In most cases, yes. Even if the snapshot files are deleted, the base disk is usually intact, and remnants of the delta files can often be recovered. At worst, you return to the pre-snapshot state. Power off the VM immediately and do not touch the files.
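Finding such remnants usually means carving, which is scanning a raw image of the datastore or host volume for the header magic that starts a delta file. The minimal Python sketch below assumes a raw image file and headers that begin on `align`-byte boundaries. It knows only two signatures. `COWD` marks VMware vmfsSparse delta extents, and `vhdxfile` is the identifier at the start of every VHDX file, and therefore of every .avhdx file. Newer SEsparse deltas and other formats are not covered. A hit marks only a candidate start, not a complete file. `scan_headers` is a hypothetical helper.

```python
from __future__ import annotations

from typing import BinaryIO, Iterator

# Magic bytes found at offset 0 of the container files.
MAGICS = {
    b"COWD": "VMware vmfsSparse delta extent",
    b"vhdxfile": "VHDX/AVHDX file identifier",
}


def scan_headers(image: BinaryIO, align: int = 512,
                 blocks_per_read: int = 2048) -> Iterator[tuple[int, str]]:
    """Yield (offset, description) for each aligned offset that starts with a known magic."""
    if align < max(len(m) for m in MAGICS):
        raise ValueError("align must be at least as long as the longest magic")
    base = 0
    while True:
        buf = image.read(align * blocks_per_read)
        if not buf:
            break
        # Assumes full reads (as from a regular file), so every read is a whole
        # number of aligned blocks and an aligned magic never straddles two reads.
        for rel in range(0, len(buf), align):
            for magic, desc in MAGICS.items():
                if buf.startswith(magic, rel):
                    yield base + rel, desc
        base += len(buf)
```

Each hit is only a starting point. The rest of the file still has to be located and validated against its own header fields before it can rejoin the chain.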
My VMDK descriptor file is lost but the flat file is there. Is that enough?
Usually yes. The actual data sits in the flat file, the descriptor is just a small definition file that says how to interpret it. With the disk size and type correctly determined, the descriptor can be recreated.
The datastore (VMFS) is corrupt and all VMs are gone. Are they all lost?
No. Once the LUN or array under the datastore is imaged and the VMFS structure reassembled, the VMDK files inside can usually be extracted. The key is not to write anything new to the array.
Is recovery harder on a thin provision disk?
It is somewhat more complex, because you have to work out which blocks were actually written. Once the correct block map is extracted, thin disks can be recovered successfully too.
The virtual disk opens, but Windows inside the VM will not start. What is happening?
Even if the outer image is intact, the file system or database inside the VM may be corrupt. This is the second layer and needs a separate recovery. For example, if there is Exchange inside, the database may be in a dirty shutdown state.