The concept of instant recovery is relatively simple – the ability to run a virtual machine directly from a backup of that VM – but the possibilities offered by such a simple concept are virtually limitless, which explains why it’s considered one of the most important advances in backup and recovery for many years.
Before the advent of instant recovery, all restores were basically the same, starting with how backups were stored: in some type of container or image. Prior to commercial backup-and-recovery software, backups were stored in formats such as tar, cpio, or dump.
Most commercial backup products chose to store backups in other formats, typically proprietary ones, but the result was always the same: backups had to be restored before they were useful. A restore was the reverse of a backup; it opened the backup container, extracted the appropriate files and copied them to the appropriate location.
The road to instant recovery started with some backup companies choosing to store their backups in a way that made them directly accessible; they were no longer trapped inside a container, proprietary or not. This made it possible to mount the backup of a file system directly instead of having to restore it first. For example, some backup systems made it possible to directly access a backed-up VMDK as a VMDK, which meant that you could boot the VM using VMware.
What started as something to make the recovery of individual files faster quickly turned into something much more. For the first time, customers could easily see if the backup of their VM was any good simply by asking the backup system to mount the backup as an actual system. It broke the fundamental axiom that you never knew whether or not your backup was good until you restored it. This was definitely a game changer.
It’s important to understand the performance characteristics of a typical instant-recovery setup, because for several reasons such setups are rarely designed to perform as well as a typical production system.
The first reason is that the hypervisor is not really reading a VMDK image; it is reading a virtual image being presented to it by the backup product. Depending on which product you’re using and which version of the backup you chose, the backup system may have to do quite a bit of work to present this virtual image. This is why most backup systems recommend limiting the number of images booted this way at any one time if performance is important.
The second reason instant recovery is not typically high-performance is that the VMDK is on secondary storage. In a world where many primary systems have gone to all-flash arrays, many of today’s backup systems still use SATA disk drives, which are much slower.
The final enemy of high performance in an instant-recovery system is that many backups are stored in a deduplicated format. Presenting the deduplicated data as a full image takes quite a bit of processing power, which further reduces the performance of the system. Some deduplication systems can store the most recent copy in an un-deduplicated form, making them much faster for an instant-recovery setup.
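To see why presenting a deduplicated backup as a full image costs processing time, consider the following minimal Python sketch. It is a toy model, not any vendor's implementation: the `DedupStore` class, the four-byte chunk size, and the use of SHA-256 as the chunk key are all assumptions made for illustration. The point is that every read the hypervisor issues has to be translated into chunk lookups and reassembled on the fly.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}   # chunk hash -> chunk bytes
        self.recipes = {}  # backup name -> ordered list of chunk hashes

    def ingest(self, name, image):
        """Split an image into fixed-size chunks and store only new ones."""
        recipe = []
        for offset in range(0, len(image), CHUNK_SIZE):
            chunk = image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name, offset, length):
        """Rebuild the requested byte range from chunks, on demand."""
        recipe = self.recipes[name]
        out = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            index, within = divmod(pos, CHUNK_SIZE)
            if index >= len(recipe):
                break  # read runs past the end of the image
            chunk = self.chunks[recipe[index]]
            take = min(len(chunk) - within, end - pos)
            if take <= 0:
                break  # short final chunk is exhausted
            out += chunk[within:within + take]
            pos += take
        return bytes(out)


store = DedupStore()
image = b"AAAABBBBAAAACC"
store.ingest("vm1", image)
print(len(store.chunks))                   # number of unique chunks stored
print(store.read("vm1", 2, 7) == image[2:9])
```

A product that keeps the most recent copy un-deduplicated can skip this lookup-and-reassemble step for that copy, which is why it tends to serve instant-recovery reads faster.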
How does instant recovery work?
It was no small feat to get to a point where customers could directly mount their backups into production or test. The first big change was that backups had to be stored in a way that allowed them to be directly accessed; they couldn’t be stored inside a container like tar or a vendor’s proprietary image. Some type of driver also has to sit on top of the data and offer multiple views of it, so that you can access the backup of a VM from different points in time. Most importantly, this driver has to offer read-write access in order for a VM to actually run, which means that it really needs to present a virtual view of the backup, not a direct one. Otherwise running a VM from its backup would actually overwrite the backup.
Once all of the above has been accomplished, the backup system needs to make the virtual view of the appropriate VMDK available to the hypervisor. This is typically done via NFS, which the hypervisor will see as a datastore, allowing it to import and run the VMs.
Due to the performance characteristics mentioned above, the running VM is only temporary. If the VM is needed long-term, it needs to be restored to a typical location where VMs are stored. This can also be done by using something like Storage vMotion.
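The idea of a writable virtual view that never touches the backup can be sketched as a copy-on-write overlay. The Python below is a simplified illustration under stated assumptions, not how any particular backup product implements its driver: the `BackupImage` and `WritableView` classes and the block contents are hypothetical. Reads fall through to the read-only backup unless the running VM has already written that block, and writes land only in the overlay.

```python
class BackupImage:
    """Read-only point-in-time backup, modelled as a sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # a tuple cannot be modified in place

    def read_block(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class WritableView:
    """Copy-on-write view: writes go to an overlay, never to the backup."""

    def __init__(self, base):
        self.base = base
        self.overlay = {}  # block index -> data written by the running VM

    def read_block(self, index):
        # Prefer data the VM has written; otherwise read from the backup.
        if index in self.overlay:
            return self.overlay[index]
        return self.base.read_block(index)

    def write_block(self, index, data):
        if not 0 <= index < len(self.base):
            raise IndexError("block index outside the image")
        self.overlay[index] = data


# Two restore points of the same (hypothetical) VM.
restore_points = {
    "monday": BackupImage([b"boot", b"data-v1"]),
    "tuesday": BackupImage([b"boot", b"data-v2"]),
}

view = WritableView(restore_points["tuesday"])
view.write_block(1, b"scratch")

print(view.read_block(1))                       # what the running VM sees
print(restore_points["tuesday"].read_block(1))  # the backup is unchanged
```

Discarding the overlay returns you to the original backup, and a second view opened on the same restore point starts clean, which is what lets the same backup be booted more than once or from different points in time.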
What can you do with it?
Many see backup testing as the best possible use of the instant recovery feature, and it goes way beyond simply mounting a particular VM. Some backup products are able to create recovery groups with the appropriate boot order and boot several VMs together in order to test the recovery of all of them. Imagine the level of comfort such testing would give a typical backup administrator.
The most common use of instant recovery is the same as the initial use it was designed for – file-level recovery from an otherwise opaque image of a VM. Even if a particular backup product has the ability to do file-level recovery from within a VM backup, some customers prefer this method of recovery.
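As a rough illustration of what a recovery group with a boot order involves, the following Python sketch turns a set of dependencies into boot "waves": VMs in the same wave can be started together, and each wave starts only after the previous one is up. The VM names and the `boot_waves` helper are hypothetical, and real products may express recovery groups quite differently; this only shows the ordering logic, using the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def boot_waves(depends_on):
    """Group VMs into waves so each VM boots after everything it depends on.

    depends_on maps a VM name to the set of VMs that must be up first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)                 # pretend this wave booted cleanly
    return waves


# Hypothetical recovery group for a small application stack.
group = {
    "dns": set(),
    "db": {"dns"},
    "cache": {"dns"},
    "app": {"db", "cache"},
    "web": {"app"},
}

for number, wave in enumerate(boot_waves(group), start=1):
    print(number, wave)
```

In a real test run, the call to `done` would happen only after a health check confirms the wave's VMs actually came up, which is where the confidence in the backup comes from.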
Instant recovery of a VM can also be used to copy a production VM to another location for testing or other purposes. Again, while most backup products have the ability to restore a backup of a VM to a different data store or hypervisor, some customers prefer using other tools to accomplish that task. Being able to directly access the VMDKs of a given VM gives these customers the functionality they’re looking for.
Instant recovery can also be used in a limited way to recover an entire VM if that VM becomes damaged in some way. For example, if someone accidentally deleted or corrupted the VMDKs of a given VM, being able to quickly run that VM from a backup would allow them to get the VM running again relatively quickly while they rectify the mistake. However, instant recovery is not typically meant to take the place of an entire DR system, due to the performance characteristics described above.
Instant recovery has become so popular that many customers have put it on their “must have” checklists when sending out RFPs. Using it to automatically test your entire backup every night could greatly increase your confidence in how well your backup system works. And imagine how good you would look when you immediately boot up a VM that someone accidentally deleted. Instant recovery truly changes how a backup system is perceived.