I think one of the hidden gems of VMware Virtual SAN (VSAN) is its software-defined self-healing ability. On the surface the concept is simple: the entire pool of disks in VSAN acts as hot spare capacity. In the event of a failure, the data that lived on the failed disks or hosts still exists on other disks in the cluster, and new replicas (mirrors) are rebuilt onto other disks in the cluster to get back to having redundant copies for protection. For VSAN, the protection level is defined through VMware’s Storage Policy Based Management (SPBM), which is built into vSphere and managed through vCenter. OK, let’s get into the details.
Let’s start with the smallest VSAN configuration possible that provides redundancy: a 3-host vSphere cluster with VSAN enabled and 1 SSD and 1 HDD per host. And let’s start with a single VM with the default # Failures To Tolerate (#FTT) equal to 1. A VM has at least 3 objects (namespace, swap, vmdk). Each object has 3 components (data 1, data 2, witness) to satisfy #FTT=1. Let’s just focus on the vmdk object and say that the VM runs on host 1, with copies of its vmdk data on hosts 1 and 2 and the witness on host 3.
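To make the arithmetic behind that layout concrete, here is a minimal Python sketch. It is only an illustration, not anything VSAN exposes: the function name `ftt_requirements` and the dictionary layout are made up for this post, and it assumes the usual mirroring rule that tolerating n failures takes n + 1 data copies spread across at least 2n + 1 hosts.

```python
def ftt_requirements(ftt: int) -> dict:
    """Return data-copy and minimum-host counts for a mirrored object.

    Assumption: tolerating `ftt` failures needs ftt + 1 data replicas
    and at least 2 * ftt + 1 hosts (replicas plus tie-breaking witnesses).
    """
    if ftt < 0:
        raise ValueError("ftt must be zero or greater")
    return {"replicas": ftt + 1, "min_hosts": 2 * ftt + 1}


# The vmdk object from the example: one component per host.
# Host names are hypothetical labels for hosts 1, 2 and 3.
vmdk_layout = {
    "host1": "data 1",
    "host2": "data 2",
    "host3": "witness",
}

needs = ftt_requirements(1)
data_copies = [c for c in vmdk_layout.values() if c.startswith("data")]

print(needs)
print(len(data_copies) == needs["replicas"])
print(len(vmdk_layout) >= needs["min_hosts"])
```

With #FTT=1 this works out to 2 data copies and 3 hosts, which is why the 3-host cluster is the smallest configuration that still provides redundancy.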

OK, let’s start causing some trouble. With the default # Failures To Tolerate equal to 1, VM data on VSAN should remain available if a single SSD, a single HDD, or an entire host fails.
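Here is a rough Python sketch of that claim for the vmdk object in the 3-host example. It is a toy model, not VSAN code: the `Component` class, the host names, and `is_available` are invented for illustration. It assumes the commonly described rule that an object stays accessible as long as at least one full data copy survives and more than half of its components are still reachable. Because each host in this example has only 1 SSD and 1 HDD, the sketch treats losing either disk the same as losing that host's component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    host: str
    is_data: bool  # True for a data replica, False for a witness


# Hypothetical placement matching the example: data on hosts 1 and 2,
# witness on host 3.
VMDK = [
    Component("data 1", "host1", True),
    Component("data 2", "host2", True),
    Component("witness", "host3", False),
]


def is_available(components, failed_hosts):
    """Assumed rule: one intact data replica plus a majority of components."""
    surviving = [c for c in components if c.host not in failed_hosts]
    has_replica = any(c.is_data for c in surviving)
    has_quorum = len(surviving) * 2 > len(components)
    return has_replica and has_quorum


# Any single failure leaves the object available.
for host in ("host1", "host2", "host3"):
    print(host, "fails ->", is_available(VMDK, {host}))  # True each time

# A second failure exceeds #FTT=1 and the object becomes unavailable.
print(is_available(VMDK, {"host1", "host2"}))  # False
```

The witness holds no VM data, but it breaks the tie: if host 1 fails, the data copy on host 2 plus the witness on host 3 still make up 2 of 3 components.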