Wednesday, November 25, 2015

ESXi 6.0 - CBT bug and fix

In recent weeks another bug was found in the Changed Block Tracking (CBT) feature of ESXi 6.0 and ESXi 6.0 Update 1.

The bug affects virtual machine backups in VMware environments: any backup tool that relies on CBT (used for incremental backups) is affected.

VMware note:

"When running incremental virtual machine backups, backup applications typically rely on the vSphere API call QueryDiskChangedAreas() to determine the changed sectors.

This issue occurs due to a problem with CBT in the disklib area, which results in the change tracking information of I/Os that occur during snapshot consolidation to be lost. The main backup payload data is never lost and it is always written to the backend device. However, the corresponding change tracking information entries which occur during the consolidation task are missed. Subsequent QueryDiskChangedAreas() calls do not include these missed blocks and, therefore a backup based on this CBT data is inconsistent." 
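To make the failure mode in VMware's note more concrete, here is a toy Python model. It is not VMware's implementation, and the class and function names (including `query_changed_areas`, a stand-in for the CBT query) are hypothetical. The disk records which blocks were written since the last backup. A write flagged as happening during snapshot consolidation still lands on the disk but loses its change-tracking entry, so the incremental backup built from the tracked changes silently misses that block.

```python
# Hypothetical toy model of changed block tracking (CBT); not VMware's code.
class ToyDisk:
    def __init__(self, num_blocks):
        self.blocks = [0] * num_blocks
        self.changed = set()  # block indices written since the last backup

    def write(self, index, value, during_consolidation=False):
        self.blocks[index] = value  # the payload always reaches the disk
        if not during_consolidation:
            # Models the bug: the change entry is dropped during consolidation.
            self.changed.add(index)

    def query_changed_areas(self):
        """Return the tracked changes and reset tracking (toy stand-in)."""
        changed = sorted(self.changed)
        self.changed.clear()
        return changed


def full_backup(disk):
    return list(disk.blocks)


def incremental_backup(disk, previous):
    # Start from the previous backup and copy only the blocks CBT reports.
    backup = list(previous)
    for index in disk.query_changed_areas():
        backup[index] = disk.blocks[index]
    return backup


disk = ToyDisk(8)
base = full_backup(disk)
disk.query_changed_areas()  # tracking starts from the full backup
disk.write(2, 11)
disk.write(5, 22, during_consolidation=True)  # write while a snapshot is consolidated
incr = incremental_backup(disk, base)
print(incr == disk.blocks)       # False: the backup no longer matches the disk
print(incr[5], disk.blocks[5])   # block 5 was never copied
```

The data on the disk is correct; only the backup is inconsistent, which matches VMware's description that the payload is never lost but the change entries are.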
The workaround is to disable CBT on the VMs, or simply to deselect the use of CBT in the backup jobs.
  • How to disable CBT in Veeam Backup & Replication:
             Edit the job, go to Storage, click Advanced, open the vSphere tab and disable the CBT options.

The other option is to downgrade to ESXi 5.5 and virtual hardware version 10. Of course, this is not a solution for us.

In our environment this has a huge impact. CBT is used mainly for incremental backups; without it, the backup cannot tell which blocks have changed, so every backup runs as if it were a full backup.

With CBT, a backup copies only the changes made in the VMs since the last backup.
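A rough way to see why losing CBT hurts the backup window: the sketch below counts how many blocks an incremental job must read with and without CBT. The function name and all figures (a 100 GiB disk, 1 MiB blocks, 2% of blocks changed since the last backup) are assumptions for illustration, not how Veeam actually schedules its reads.

```python
BLOCK_SIZE_MIB = 1  # assumed block size, for illustration only


def blocks_to_read(total_blocks, changed_blocks, use_cbt):
    """Blocks an incremental job must read from the source disk (hypothetical).

    With CBT the job asks for the changed areas and reads only those;
    without CBT it has to read every block to find out what changed.
    """
    if use_cbt:
        return len(set(changed_blocks))
    return total_blocks


disk_gib = 100  # hypothetical 100 GiB virtual disk
total = disk_gib * 1024 // BLOCK_SIZE_MIB
changed = range(0, total, 50)  # hypothetical: every 50th block changed (2%)

with_cbt = blocks_to_read(total, changed, use_cbt=True)
without_cbt = blocks_to_read(total, changed, use_cbt=False)
print(with_cbt, without_cbt, without_cbt / with_cbt)
```

With these made-up numbers the job reads 50 times more blocks without CBT; real ratios depend on how much of each disk actually changes between backups.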

In our case we use Veeam Backup & Replication, and jobs that normally take 1 to 2 hours to finish are now taking more than 3 to 4 hours. Others, with more VMs, that normally take around 6 to 7 hours are now taking more than 18 hours. As a result, our daily backups cannot finish within a 24-hour cycle, which has a huge impact on our environment.

On top of that, the size of each backup increases. With around 20 TB of data already in our backup repository, this growth adds up: in the last few days I had to add another 2 TB to our backup repository because of this problem.

This is a huge problem for many environments. Not only do the backup cycles not run properly (with all the issues this can cause for any restores that may be needed), but the size of the backups also increases.

UPDATE 27-11-2015: In the last few days VMware has finally released a fix for this bug.
We will apply it this weekend and hope it fixes the problem.

More information about the patch is available from VMware HERE

I hope this article helps you understand this bug and its fix.