
Troubleshoot VSS errors in whole VM backups

I’ve dealt with many whole VM backup products in my time with virtualization, including Veeam, VMware Data Protection, Avamar, vRanger Pro, Backup Exec, and more.  With that experience came lots of troubleshooting.  Originally, this post was going to cover one specific issue I ran into recently, but I thought it would be more useful to cover the whole category of problems with these products, so you could use this post to track down any one (or more) of the many potential root causes, not just a single one.  Many of these troubleshooting steps also help keep your environment healthy and avoid lots of issues, not just issues with backups.

This post focuses specifically on VSS quiescing problems; it is not a definitive guide to all VM backup problems.

Revision Level of Your Backup Product

Often, the issue has to do with the revision level of the backup product itself.  Generally, it’s good to be on the latest patch level, but not always.  Here are a few things to think about:

  • Is your backup product patched to current?  If not, perhaps look into doing so.
  • Is your backup product compatible with your environment?  Check that it supports the current build of your hypervisor, your hypervisor management software (such as SCVMM or vCenter), and the guests you’re backing up, and take appropriate action if it doesn’t.
  • Did you install an update to the backup product recently?  If so, perhaps there’s a bug in that update.

Revision Level of Guests That Are Backed Up

Backups that quiesce the file systems of guests depend upon OS components within those guests.  This is especially true of Windows guests, which rely on the Volume Shadow Copy Service (VSS).  VSS, just like any other software, can have bugs, and Microsoft releases patches to fix them.  Other OS components could also be the culprit.  Ensure your guests are patched to current.  Conversely, if you recently applied patches to your guests, perhaps one of those updates is the problem, so you may try removing them.

As a side note, I would recommend using more than one method to check your guest patch levels.  While it’s not very common, I’ve seen a number of cases where Windows Update said all patches were installed, but a second utility reported missing patches.  Use a second utility, such as Microsoft Baseline Security Analyzer (which is free) if the guest is Windows based, to ensure you’re not missing anything.
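
As a rough illustration of that cross-check, the sketch below compares the KB identifiers reported as installed by two tools and lists what each one shows that the other does not.  The function name and the sample KB numbers are hypothetical, and how you export the list from each tool is up to you.

```python
def compare_patch_reports(tool_a, tool_b):
    """Return the KB IDs that only one of the two tools reports as installed."""
    # Normalize case and whitespace so "kb5000003 " and "KB5000003" match.
    a = {kb.strip().upper() for kb in tool_a if kb.strip()}
    b = {kb.strip().upper() for kb in tool_b if kb.strip()}
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Hypothetical exports from two different patch-reporting tools.
windows_update = ["KB5000001", "KB5000002", "kb5000003"]
second_tool = ["KB5000001", "KB5000003", "KB5000004"]

report = compare_patch_reports(windows_update, second_tool)
print(report)
```

Anything that shows up in only one list is worth a closer look before you conclude the guest is fully patched.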

Also, don’t assume the guests are patched to current.  I recently ran into an issue where the customer hadn’t patched the server… ever.  Somehow it slipped through the cracks.

Hypervisor Revisions

Hypervisors can also cause issues with quiescing.  Some considerations here:

  • Does the build of the hypervisor support the guest having the issue?
  • Are the hypervisors patched to current?  If not, consider updating them.
  • Were the hypervisors recently patched?  If so, perhaps one of the installed patches has a problem, and removing it might resolve the issue.
  • Have the in-guest optimization components, such as VMware Tools, been updated within the guests?  If not, do so.  If they were updated recently, perhaps try downgrading them to see if that resolves the issue.  These components are important, as they are typically the means by which the hypervisor issues the command to quiesce the file system within the guest.

Other Guest Considerations

There are other issues that can cause problems with backups.

  • Other backup agents installed within the guest can also cause problems.  Remove any backup agents that are no longer needed.  I personally just ran into this issue with a customer who still had an old Backup Exec agent installed from before they moved to their current backup product.
  • Applications such as SQL Server and Exchange have their own VSS writers.  Sometimes those need to be updated, too.  Conversely, recent updates to them can cause problems with quiescing.  Look for updates to those, or remove recent updates.
  • Antivirus software has also been known to cause VSS issues.  Try updating or disabling the AV agents, configuring proper exclusions, or uninstalling and reinstalling them.
  • Ensure there is adequate free space within the guests.
  • There is a limit on the number of shadow copies, and when that limit is reached, quiescing can fail.  Try removing all shadow copies within the guest by running this command from an elevated command prompt:  vssadmin delete shadows /all
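
When working through the list above, it can help to see which VSS writers inside the guest are unhappy.  The sketch below is one way to do that, assuming the English-language output format of the command vssadmin list writers; the function name is my own, and the sample text is trimmed and illustrative, not captured from a real server.

```python
import re


def unstable_writers(vssadmin_output):
    """Return (name, state, last_error) for writers that are not stable and error-free."""
    problems = []
    name = state = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            # A new writer block starts here.
            name, state = match.group(1), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            if state is None or "Stable" not in state or last_error != "No error":
                problems.append((name, state, last_error))
            name = None  # This writer block is finished.
    return problems


# Illustrative sample only; IDs are elided and real output has more fields.
SAMPLE = """\
Writer name: 'System Writer'
   Writer Id: {...}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {...}
   State: [8] Failed
   Last error: Non-retryable error
"""

for writer in unstable_writers(SAMPLE):
    print(writer)
```

On a real guest, you could feed the function the text returned by subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout from an elevated session.  A writer that shows up here often points to the application, agent, or update worth investigating first.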

Hopefully, this provides you with some ideas to try to resolve the issue you’re experiencing.

Do you have any other tips for resolving VSS issues with whole VM backups?

NDMP isn’t always a slam dunk

Hey everyone,

I recently did some work with a customer who deployed a VNXe, initially to act as a file server for quite a bit of file data.  They were using Unitrends to back up their Hyper-V VMs, as well as for application-level backups and physical machines.  Before deploying the VNXe to serve up CIFS data, they ran a Windows file server, which Unitrends could simply back up by deploying an agent into the OS, and all was well.

The customer asked me how to go about backing up the VNXe, especially with Unitrends, because they didn’t want to switch backup products or deploy an entirely new backup solution just for the VNXe.  My initial response was that NDMP is the best way to go for backups if that is an option with your backup product.

However, I did some research and found that NDMP isn’t always the best thing to use.  In this case, check out Unitrends’ limitations when using NDMP.  Not being able to restore your backups to anything but a VNXe, and not being able to granularly recover files out of an NDMP backup, would be very serious deal breakers for a lot of companies.

So check what your backup vendor and storage array support over NDMP, and make sure it meets your needs before relying on it.  You may actually prefer to mount the CIFS shares within a server and back that server up to get the file data.