Virtual SAN 6 Rack Awareness – Software Defined Self Healing with Failure Domains

I continue to think one of the hidden gem features of VMware Virtual SAN (VSAN) is it’s software defined self healing ability. I wrote about it a few months back here in: Virtual SAN Software Defined Self Healing

Since Virtual SAN is such a different way to do storage, it allows for some interesting configuration combinations. With vSphere 6 (built into vSphere 6), VMware will be introducing a new add-on feature for Virtual SAN called “Rack Awareness” accomplished by creating multiple “Failure Domains” and placing hosts in the same rack into the same Failure Domain. This “Rack Awareness” feature exploits the # Failures To Tolerate policy of Virtual SAN.

The rest of this post will look a lot like the previous post I did on self healing but will translate it for the Rack Awareness feature.

Minimum Rack Awareness Configuration

Lets start with the smallest VSAN “Rack Awareness” configuration possible that provides redundancy: a 3 rack, 6 host (2 per rack) vSphere cluster with VSAN enabled and 1 SSD and 1 HDD per host. In VSAN, an SSD constitutes a disk group so the 1 HDD is placed into a Disk Group with the 1 SSD. The SSD performs the write and read caching for the HDD’s in its disk group. The HDD permanently stores the data.

Lets start with a single VM with the default # Failures To Tolerate (#FTT) equal to 1. A VM has at least 3 objects (namespace, swap, vmdk). Each object has 3 components (data 1, data 2, witness) to satisfy #FTT=1. Lets just focus on the vmdk object and say that the VM sits on host 1 with replicas/mirrors/copies (these terms can be used interchangeably) of its vmdk data on host 1 in rack 1 and host 2 in rack 2 and the witness on host 3 in rack 3. The rule in Virtual SAN is that each of these three components of an object (data 1, data 2, witness) must sit on different hosts. With Rack Awareness, they also must be in different hosts in different racks.

RackAware01

OK, lets start causing some trouble. With the default # Failures To Tolerate equal 1, VM data on VSAN should be available if a single SSD, a single HDD, a single host fails, or an entire rack fails.

Continue reading “Virtual SAN 6 Rack Awareness – Software Defined Self Healing with Failure Domains”

VMware Jobs!!! – Software Defined Storage (Virtual SAN, EVO:RAIL, etc.)

I’ve been at VMware for 1.5 years and have had a blast talking to customers, partners, and VMware employees about all things software defined storage. This primarily involves Virtual SAN & EVO:RAIL which take advantage of VASA, Storage Policy Based Management, and VVOLS. Because we are talking about storage it also includes discussing the benefits of vSphere Replication, Site Recovery Manager, and vSphere Data Protection. Basically, anything to do with storing, protecting, and managing Virtual Machine data.  Its exciting to be part of the whole software defined data center strategy.

We are growing our Software Defined Storage team and are looking for qualified rockstars. If you are one, and the topics above are familiar to you, and you are interested in joining the VMware Software Defined Storage Team, then check out the openings below.  Feel free to apply directly or reach out to me with any questions at: pkeilty at vmware dot com

You can find the openings on the VMware Public Job Page: http://vmware.jobs/

Plug in the Requisition Number below to find more details on the openings and full job descriptions:

Systems Engineers

  • Requisition Number 55635BR – Sr. Systems Engineer-Software Defined Storage-East in New York New York United States

We are also looking for SE’s in the Ohio Valley and South East USA. In addition, we are looking for a Technical Field SE in the East. These jobs Requisitions will be posted soon.

Sales

  • Requisition Number 58265BR – Storage Account Executive in Austin Texas United States
  • Requisition Number 58420BR – Storage Account Executive – Federal in Reston Virginia United States
  • Requisition Number 58501BR – Sales Leader, Software Defined Storage – Palo Alto or Austin in Austin Texas United States
  • Requisition 58504BR – Inside Sales Representative, Software Defined Storage in Austin Texas United States

Good luck!

VMware Software Defined Storage and Virtual SAN at PEX

Unfortunately I won’t be attending VMware PEX this year.  Its a great event to meet up with our great VMware partners and learn the latest VMware tech.  There will be tons of Software Defined goodness, specifically, here is a great link to all the storage stuff:

Discover Software-Defined Storage & VMware Virtual SAN at PEX 2015!

 

Best Practice for Preparing Hardware for a Virtual SAN Deployment

This may be stating the obvious but I think it’s worth repeating. Before building a Virtual SAN enabled cluster make sure:

  • The server hardware is updated to the latest and greatest system ROM / BIOS / firmware
  • The IO Controller is running the latest firmware
  • The SSD are running the latest firmware
  • The HDD are running the latest firmware

These firmware updates often resolve some important hardware issues.

Next, make sure you follow the Performance Best Practices for VMware vSphere® 5.5

  • Specifically, make sure Power Management BIOS Settings are disabled in the server BIOS (see page 17)

Once ESXi is installed on the host

  • Make sure the IO Controller is loading the correct version of the device driver.  You can look this up on the Virtual SAN HCL

I work with a lot of customers who are evaluating or implementing Virtual SAN and following these simple, obvious, but important best practices have led to better performance and a better overall experience with Virtual SAN.

“Virtualization and Cloud Are Here to Stay” PC Connection podcast series – VMware Software Defined Storage and Virtual SAN

This is another fun short project I was fortunate enough to be involved in with a great VMware partner, PC Connection.

VMware Software Defined Storage and Virtual SAN

This is part of their “Virtualization and Cloud Are Here to Stay” podcast series.  Thanks to PC Connection for letting me be a part of it.

 

VSAN Sessions at VMworld 2014

I’m really looking forward to attending VMworld 2014 in a few weeks.  Its a great time to catch up with friends and meet new ones.  For me the week kicks off on Thursday at the VMware SE Tech Summit.  Then vOdgeball on Sunday afternoon with a team of former EMC vSpecialists.  This is a great event to help out a great cause, the Wounded Warrior Project.  We’ll also be honoring our two vSpecialist friends, Jim Ruddy and Stephen Hunt, that were recently involved in a tragic accident.  Then the main event starts Sunday night.  It should be fun and exhausting.

Virtual SAN (VSAN) is sure to be one of the highlights of the show.

Continue reading “VSAN Sessions at VMworld 2014”

Configuring LSI MegaRAID 9271CV-81 I/O Controller for VSAN

A few colleagues of mine recently worked with a customer deploying VMware Virtual SAN (VSAN) with Cisco UCS hosts using the LSI MegaRAID 9271CV-81 I/O Controller and documented the configuration choices below. Note, in general the VSAN guidance has been to disable all controller cache so these choices follow this theme. Also note that we are not LSI experts and would welcome feedback from others on their experience with other settings.

Change the default settings in the screenshot below to the following:

  • Access = RW
  • I/O = Direct
  • Read = Disable
  • Disk Cache = Disabled
  • Disable BGI = No
  • Default Write = Write Through

LSI MegaRAID BIOS Config 01

* All settings can be changed on the fly or using storcli for VMware:

http://www.lsi.com/downloads/Public/RAID%2520Controllers/RAID%2520Controllers%2520Common%2520Files/1.09.13_StorCLI.zip 

* User guide

http://www.lsi.com/downloads/Public/RAID%20Controllers/RAID%20Controllers%20Common%20Files/51530-00_Rev_L.zip

Other versions of the MegaRAID controller might have a screen that looks something like the one below:

LSI MegaRAID BIOS Config 02

Thanks to my colleagues Justin Beck and Jason Burroughs for documenting and sharing their experience.

Virtual SAN Software Defined Self Healing

I think one of the hidden gem features of VMware Virtual SAN (VSAN) is it’s software defined self healing ability.  On the surface this concept is simple.  The entire pool of disks in VSAN are used as hot spares.  In the event of a failure, data from the failed disks or hosts are found on other disks in the cluster and replicas (mirrors) are rebuilt onto other disks in the cluster to get back to having redundant copies for protection.  For VSAN, the protection level is defined through VMware’s Storage Policy Based Management (SPBM) which is built into vSphere and managed through vCenter.  OK, lets get into the details.

Lets start with the smallest VSAN configuration possible that provides redundancy, a 3 host vSphere cluster with VSAN enabled and 1 SSD and 1 HDD per host.  And, lets start with a single VM with the default # Failures To Tolerate (#FTT) equal to 1.  A VM has at least 3 objects (namespace, swap, vmdk).  Each object has 3 components (data 1, data 2, witness) to satisfy #FTT=1.  Lets just focus on the vmdk object and say that the VM sits on host 1 with copies of its vmdk data on host 1 and 2 and the witness on host 3.

Minimum VSAN configuration with VM policy of #FTT=1

OK, lets start causing some trouble.  With the default # Failures To Tolerate equal 1, VM data on VSAN should be available if a single SSD, a single HDD, or an entire host fails.

Continue reading “Virtual SAN Software Defined Self Healing”

Configuring HP Smart Array P420i I/O Controller for VSAN

I’ve been working with many customers over the last several months and found that many are very familiar with HP hardware and just know how to set things up.  Others are looking for guidance from VMware on how to configure for VSAN.  There are things I’ve discovered that might not be obvious but can help in the VSAN setup.  Bear in mind, I am not an HP server hardware expert, so your comments are greatly appreciated.

Before I go too far, there is a bug in the HP async controller driver for the HP 420i that is included in the HP ESXi image.  The bug reduces the queue depth to 28, instead of 1020, causing poor performance in VSAN.

Here’s how to check your hosts IO Controller (storage adapter) queue depth:

  • Run the esxtop command on the ESXi shell / SSH session
  • Press d
  • Press f and select Queue Stats (d)
  • The value listed under AQLEN is the queue depth of the storage adapter

To resolve, follow these directions to implement the correct driver:

HP ProLiant Smart Array Controller Driver for VMware vSphere 5.5 (VIB file)

OK, a little background/overview on I/O Controller guidance for VSAN.  In general, VSAN recommends disabling Read and Write cache for any I/O Controller.  Since VSAN handles Read and Write caching at the software layer, there’s no need to do it at the hardware level.  Also, when destaging write cache, we want to ensure that the writes are committed to disk and not in I/O Controller cache.

In the case of the HP P420i, you cannot disable the I/O Controller cache so VSAN recommends setting it to 100% Read which essentially disables Write cache.  I recently discovered that you can also selectively pick and choose which disks to enable cache for.

Continue reading “Configuring HP Smart Array P420i I/O Controller for VSAN”

Virtual SAN Disaster Recovery – vSphere Replication (available now) or Virtual RecoverPoint (coming soon), choose your protection!

I’m often asked how to protect Virtual SAN (VSAN). Its simple, any product focused on protecting a virtual machine (VM) will work for protecting VM’s sitting on a VSAN enabled vSphere cluster. VMware offers VDP/VDPA for backup & recovery and there are many other VMware partners with backup & recovery solutions focused on protecting VM’s. Backup & Recovery is a great way to protect data but some customers like the benefit of more granular recovery points that comes from data replication either locally or to a disaster recovery site.

To protect VSAN data in a primary site to a remote disaster recovery site VMware offers vSphere Replication (VR) to replicate the VM data sitting on a VSAN Datastore over the DR site. Of course Site Recovery Manager (SRM) is supported to automate failover, failback and testing. The VR/SRM combined solution can also be used for planned data center migrations. Here are a few great write-ups on the topics:

VMware Virtual SAN Interoperability: vSphere Replication and vCenter Site Recovery Manager

Virtual SAN Interoperability – Planned migration with vSphere Replication and SRM

VSAN and vSphere Replication Interop

One of the main benefits of VR is that it will work to replicate VM data on any storage to another site with hosts connected to any other storage. So, VSAN can be the source, the target, or both.

VSAN & VR

 

vSphere Replication can be set to asynchronously replicate every day, hour, or up to every 15 minutes. Thus providing a Recovery Point Objective (RPO) of up to 15 minutes. For many customers, this is “good enough”. For some customer workloads, asynchronous replication is not “good enough”. They need synchronous replication protection and there are several solutions in the market. One that I’ve been a big fan of for a long time is EMC’s RecoverPoint which has a great reputation for protecting enterprise mission critical data and applications.  Essentially it splits every write transaction, journals it, and synchronously makes a copy of it either locally or to a remote DR site without impacting application performance. Of course there are more details but this is essentially what it does which results in being able to recover back to any point in time. Often it’s labeled as “Tivo or DVR for the data center”. One other benefit of RecoverPoint is it can replicate data from any storage to any storage, as long as there is a splitter for the storage. EMC VNX and VMAX storage arrays have splitters built in.

The big news that just came out last week that peeked my interest is that EMC is now offering a Beta of a completely software based RecoverPoint solution that embeds the splitter into vSphere. This brings the RecoverPoint benefits to any VMware customer running VM’s on any storage: block, file, or of course even VSAN. The EMC initiative is call Project Mercury and for more information check out:

Summer Gift Part 1 – Project Mercury Beta Open!

I’m excited that VSAN customers will have a choice for data protection, asynchronously with 15 minute RPO using vSphere Replication or continuous, synchronous, and asynchronous with EMC’s Virtual RecoverPoint.