Virtual SAN – Disk Hotplug

Yesterday the VMware Compatibility Guide for Virtual SAN was significantly updated to reflect certifications recently completed by the team. I should point out that one of the great things about Virtual SAN is that it is a true Software Defined Storage solution, in that VMware doesn’t force specific hardware upon customers. That said, we know customers want and need reliable, high-quality storage, and that there are some low-quality components out there that could cause undesirable results with Virtual SAN. For this reason, the team has taken on the daunting task of maintaining a Compatibility Guide for Virtual SAN covering IO Controllers, SSDs, PCIe devices, and HDDs from many hardware vendors. They do a great job managing the enormous number of possible combinations and handling requests from the field.

Some of the recent things added that I know many customers have been waiting for are:

Stay tuned for more updates in July.

As you browse various components (IO Controllers, SSDs, PCIe devices, and HDDs) on the VMware Compatibility Guide for Virtual SAN, you might see notes such as:

  • Hot-plug
  • Hot-plug Drive
  • Hot-plug Hard Drive
  • Hotplug certification for this controller is not complete and is currently in progress.
  • Disk hotplug feature is not supported on the driver.
  • The driver does not support SSD hot unplug. Please refer to KB2092959 for more details.
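
If you keep your own list of the notes attached to the components you are evaluating, a small helper can flag the ones that relate to hotplug, whatever the spelling. This is only a rough sketch: the function name `mentions_hotplug` and the last sample note are made up for illustration, and nothing here is a VMware tool or API.

```python
import re

# Matches "hotplug", "hot-plug", "hot plug", and the "unplug" variants.
HOTPLUG_PATTERN = re.compile(r"hot[\s-]?(?:un)?plug", re.IGNORECASE)


def mentions_hotplug(note: str) -> bool:
    """Return True if a compatibility note refers to hotplug in any spelling."""
    return HOTPLUG_PATTERN.search(note) is not None


# Sample notes: the first three come from the list above,
# the last one is invented purely for illustration.
notes = [
    "Hot-plug Drive",
    "Disk hotplug feature is not supported on the driver.",
    "The driver does not support SSD hot unplug.",
    "Example note about something else.",
]

hotplug_notes = [n for n in notes if mentions_hotplug(n)]
for note in hotplug_notes:
    print(note)
```

The pattern is deliberately loose because the notes are not spelled consistently (“Hot-plug”, “hotplug”, “hot unplug”).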

Many customers have asked me what “hotplug” means, or have misread these notes as meaning that drives cannot be replaced without downtime. Here’s how I have responded when asked about this:

The VMware Compatibility Guide (VCG) for Virtual SAN lists the supported components (IO Controllers, SSDs, PCIe devices, and HDDs) for use in Virtual SAN. In the case of IO Controllers, it is the combination of controller firmware and ESXi device driver that is qualified for VSAN. In the case of SSDs, PCIe devices, and HDDs, it’s the firmware that is qualified. VMware and our technology partners put these through rigorous tests to make sure they are enterprise ready.

One of the many tests performed is “Disk hotplug”. In this test, VMware starts with a perfectly good vSphere cluster with VSAN enabled and I/O workloads running. Then they walk up to the hardware and yank an HDD or SSD out of one of the hosts. It should be noted that this scenario would likely never happen on purpose in production. It’s possible to do it accidentally, but if proper maintenance procedures are followed it should never happen, especially now that vSphere 6.0 / VSAN 6.0 can blink the LED of a failed disk so it can be properly identified. When this “failure” test is executed, some IO Controllers, SSDs, and HDDs do not behave properly. The issue can be due to the firmware/driver combination provided to VMware by the IO Controller manufacturer, or it could be a problem with the HDD or SSD firmware. The VMware team works with the manufacturers to identify the issues and plan for resolution. Rather than leave the component off the Compatibility Guide, VMware chooses to list it but caution against this unlikely event.

By the way, VSAN is smart enough to know the difference between a yanked drive and a drive that is degraded or failed, and it triggers different events accordingly. Since yanking a drive does not actually simulate a failed drive, it’s not really a valid failed-drive test. Thus, the hotplug warnings do not apply to the more realistic scenario where a drive fails or develops sector errors, for instance. In these scenarios, even the components that fail the hotplug test will behave as expected. That is, once VSAN detects that the disk is bad and marks it as a failed drive, the drive can be evacuated from VSAN using the vSphere Web Client, and technicians are then free to remove it from a live system and replace it with a new one. This hardware replacement process can be different for every vendor.
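
To make the distinction concrete, here is a minimal sketch of the logic described above. It is a toy model, not VSAN code: the `DiskEvent` names, the `hotplug_warning_applies` function, and the step descriptions are all hypothetical and exist only to illustrate the reasoning.

```python
from enum import Enum, auto


class DiskEvent(Enum):
    """Hypothetical event types; these are not real VSAN identifiers."""
    PULLED = auto()  # a healthy drive is yanked from a live host (the hotplug test)
    FAILED = auto()  # the drive degrades or fails, e.g. with sector errors


# Replacement workflow for a failed drive, in the order described above.
FAILED_DRIVE_STEPS = [
    "VSAN detects the bad disk and marks it as failed",
    "Evacuate the drive from VSAN using the vSphere Web Client",
    "Remove the drive from the live system",
    "Insert the replacement drive (procedure varies by vendor)",
]


def hotplug_warning_applies(event: DiskEvent) -> bool:
    """The hotplug caveat only concerns a drive pulled without warning."""
    return event is DiskEvent.PULLED


for event in DiskEvent:
    print(event.name, "-> hotplug warning applies:", hotplug_warning_applies(event))

for number, step in enumerate(FAILED_DRIVE_STEPS, start=1):
    print(number, step)
```

The point of the model is simply that the warning is tied to the surprise removal, while a genuine failure goes through the normal replacement workflow.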

I hope this helps clarify why we list the “hotplug” warnings on various components.