A few colleagues of mine recently worked with a customer deploying VMware Virtual SAN (VSAN) on Cisco UCS hosts with the LSI MegaRAID 9271CV-8i I/O controller, and they documented the configuration choices below. Note that the general VSAN guidance has been to disable all controller cache, so these choices follow that theme. Also note that we are not LSI experts and would welcome feedback from others on their experience with other settings.
Change the default settings in the screenshot below to the following:
- Access = RW
- I/O = Direct
- Read = Disable
- Disk Cache = Disabled
- Disable BGI = No
- Default Write = Write Through
* All settings can be changed on the fly or using storcli for VMware:
* User guide
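For anyone scripting this, here is a rough sketch of how the settings above might map onto storcli `set` commands. It only builds and prints the command lines; it does not run them. The controller and virtual drive indices (`/c0/v0`) are assumptions, as is the mapping of "Read = Disable" to the no-read-ahead option (`nora`). The "Disable BGI" setting is left out on purpose, so check the storcli user guide for that one and verify every option against your controller's firmware before using any of this.

```python
# Sketch only: builds storcli command strings, does not execute them.
# CONTROLLER and VIRTUAL_DRIVE are hypothetical indices; adjust for your host.
CONTROLLER = 0
VIRTUAL_DRIVE = 0

# Settings from the list above, mapped to storcli virtual-drive options.
# The mapping of "Read = Disable" to "nora" (no read ahead) is an assumption.
SETTINGS = {
    "accesspolicy": "rw",   # Access = RW
    "iopolicy": "direct",   # I/O = Direct
    "rdcache": "nora",      # Read = Disable
    "pdcache": "off",       # Disk Cache = Disabled
    "wrcache": "wt",        # Default Write = Write Through
}


def build_commands(controller=CONTROLLER, vd=VIRTUAL_DRIVE, settings=SETTINGS):
    """Return one 'storcli /cX/vY set option=value' string per setting."""
    target = f"/c{controller}/v{vd}"
    return [f"storcli {target} set {opt}={val}" for opt, val in settings.items()]


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```

Printing the commands first makes it easy to review them (or paste them into an ESXi shell one at a time) rather than changing cache policy on a live controller blindly.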
Other versions of the MegaRAID controller might have a screen that looks something like the one below:
Thanks to my colleagues Justin Beck and Jason Burroughs for documenting and sharing their experience.
I think one of the hidden gem features of VMware Virtual SAN (VSAN) is its software-defined self-healing ability. On the surface this concept is simple. The entire pool of disks in VSAN is used as hot spares. In the event of a failure, data from the failed disks or hosts is found on other disks in the cluster, and replicas (mirrors) are rebuilt onto other disks in the cluster to get back to having redundant copies for protection. For VSAN, the protection level is defined through VMware’s Storage Policy Based Management (SPBM), which is built into vSphere and managed through vCenter. OK, let’s get into the details.
Let’s start with the smallest VSAN configuration possible that provides redundancy: a 3-host vSphere cluster with VSAN enabled and 1 SSD and 1 HDD per host. And let’s start with a single VM with the default # Failures To Tolerate (#FTT) equal to 1. A VM has at least 3 objects (namespace, swap, vmdk). Each object has 3 components (data 1, data 2, witness) to satisfy #FTT=1. Let’s just focus on the vmdk object and say that the VM sits on host 1 with copies of its vmdk data on hosts 1 and 2 and the witness on host 3.
OK, let’s start causing some trouble. With the default # Failures To Tolerate equal to 1, VM data on VSAN should be available if a single SSD, a single HDD, or an entire host fails.
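To make the 3-host example concrete, here is a small toy model of the vmdk object. It is a simplification, not how VSAN is implemented: it assumes an object stays available when at least one full data replica survives and more than half of its components are still reachable. The `Component` class, the host names, and the helper functions are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    host: str
    is_replica: bool  # True for a data copy, False for the witness


# Placement from the example: data on hosts 1 and 2, witness on host 3.
VMDK_COMPONENTS = [
    Component("data 1", "host1", True),
    Component("data 2", "host2", True),
    Component("witness", "host3", False),
]


def object_available(components, failed_hosts):
    """Simplified rule: a full replica survives AND >50% of components survive."""
    alive = [c for c in components if c.host not in failed_hosts]
    has_replica = any(c.is_replica for c in alive)
    has_quorum = len(alive) * 2 > len(components)
    return has_replica and has_quorum


def hosts_required(ftt):
    """With mirroring, tolerating ftt failures needs ftt+1 replicas and 2*ftt+1 hosts."""
    return 2 * ftt + 1


if __name__ == "__main__":
    for host in ("host1", "host2", "host3"):
        print(f"{host} fails -> vmdk available: "
              f"{object_available(VMDK_COMPONENTS, {host})}")
    print("host1 and host2 fail -> vmdk available:",
          object_available(VMDK_COMPONENTS, {"host1", "host2"}))
    print("hosts needed for #FTT=1:", hosts_required(1))
```

Under this model any single host failure leaves the vmdk object available, while losing two hosts does not, which is why 3 hosts is the minimum for #FTT=1.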