What if the SSD and HDD Firmware Versions are Newer Than What is Listed on the VMware Compatibility Guide (VCG) for Virtual SAN?

No problem, this is OK.

If you want to know more detail, keep reading…

Last week I was working with a customer to implement a VSAN ReadyNode. Before enabling VSAN on a cluster, it’s a best practice to validate that the firmware of the host I/O Controller, SSDs (SAS, SATA, PCIe, NVMe, or UltraDIMM), and HDDs (SAS, NL-SAS, or SATA) is at the required version. Each hardware vendor has a different way of doing this.

In reviewing this particular customer’s hardware, we found that the SSD and HDD firmware versions were newer than what is listed on the VCG.

Note that for SSDs and HDDs, the hardware vendors provide the VMware Virtual SAN team with the firmware version they tested and qualified for VSAN. VMware then lists that firmware version for that disk model on the VMware Compatibility Guide (VCG) for Virtual SAN. If the hardware vendor later releases “new firmware”, the SSD or HDD does not need to be re-certified for VSAN. VMware supports disks running the “newer firmware” with Virtual SAN, but it leaves the VCG alone and continues listing the “old firmware”. However, if the hardware vendor wants VMware to remove the “old firmware” from the VCG listing and replace it with the “new firmware”, VMware will do so at the vendor’s request. This typically happens when the hardware vendor discovers an issue or bug in the “old firmware”.
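To make the rule concrete, here is a minimal Python sketch of the comparison: installed firmware that matches or is newer than the VCG listing is fine. It assumes purely numeric, dot-separated version strings. Many vendors use alphanumeric firmware revisions, so treat `parse_version` and `firmware_ok` as hypothetical helpers and not a VMware tool. Also check whether the vendor has had the old listing replaced because of a bug.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted firmware string such as '3.2.1' into integers (assumed format)."""
    return tuple(int(part) for part in version.strip().split("."))


def firmware_ok(installed: str, vcg_listed: str) -> bool:
    """True when the installed firmware is the same as or newer than the VCG listing."""
    a, b = parse_version(installed), parse_version(vcg_listed)
    # Pad the shorter version with zeros so "3.1" and "3.1.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(firmware_ok("3.2.1", "3.1.0"))  # True: newer than the listed version
print(firmware_ok("3.0.9", "3.1.0"))  # False: older, so update the drive
```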

I hope this helps clarify how VMware treats SSD and HDD Firmware Version listings on the VMware Compatibility Guide for Virtual SAN.

2 Node Virtual SAN (ROBO VSAN) Deployment

I’ve been asked a few times over the last few weeks for deployment guides or white papers for 2 Node Virtual SAN, which is typically used to support remote offices (ROBO VSAN). VMware currently doesn’t have specific documentation that I can find, but it’s essentially the same process as building a stretched cluster. That process is documented in the Administering VMware Virtual SAN guide and the Virtual SAN Stretched Clustering Guide. The difference is that with 2 node VSAN, each of the two fault domains has only 1 node, and both are located at the same site. We expect the witness VM to be installed on a host back in the primary data center. The latency of the WAN link back to the primary data center can be up to 500ms, and the bandwidth must be at least 1.5Mbps.
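Before deploying, it helps to sanity-check the WAN link against those two numbers. Below is a minimal Python sketch that assumes you have already measured latency and bandwidth to the witness site yourself. `WitnessLink` and `link_problems` are hypothetical names for illustration, not part of any VMware tool.

```python
from dataclasses import dataclass
from typing import List

# Requirements quoted above for the link from the 2 node site to the witness.
MAX_WITNESS_LATENCY_MS = 500.0
MIN_WITNESS_BANDWIDTH_MBPS = 1.5


@dataclass
class WitnessLink:
    latency_ms: float      # measured latency to the witness host
    bandwidth_mbps: float  # WAN bandwidth available for witness traffic


def link_problems(link: WitnessLink) -> List[str]:
    """Return the requirement violations (an empty list means the link qualifies)."""
    problems = []
    if link.latency_ms > MAX_WITNESS_LATENCY_MS:
        problems.append(
            f"latency {link.latency_ms} ms exceeds {MAX_WITNESS_LATENCY_MS} ms")
    if link.bandwidth_mbps < MIN_WITNESS_BANDWIDTH_MBPS:
        problems.append(
            f"bandwidth {link.bandwidth_mbps} Mbps is below {MIN_WITNESS_BANDWIDTH_MBPS} Mbps")
    return problems


print(link_problems(WitnessLink(latency_ms=120, bandwidth_mbps=10)))  # []
print(link_problems(WitnessLink(latency_ms=650, bandwidth_mbps=1.0)))  # two problems
```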

In the Administering VMware Virtual SAN guide, we detail how to build a stretched cluster starting on page 51.

http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/virtual-san-61-administration-guide.pdf

So, basically, follow those steps: create two fault domains with 1 host in each and locate them at the same site. Then deploy the witness VM back in the production data center. The networking requirements are less complicated, since you don’t have to worry about a stretched Layer 2 or a Layer 3 network between data sites. VSAN supports a Layer 3 network between the ROBO sites and the witness host site.


Here are a few other helpful blog posts on 2 Node VSAN (ROBO VSAN):

VMware Virtual SAN ROBO Edition

VMware Virtual SAN Cluster ROBO Edition: Configuration Demonstration

2 is the minimum number of hosts for VSAN if you ask me

Virtual SAN Licensing And Packaging For Virtual SAN 6.1

VMworld 2015 Keynote – Storage and Availability

I watched the streaming video of the VMworld 2015 Keynote this morning, and there were plenty of highlights. In my mind, and in the minds of many others, Yanbing Li stole the show. Not only did she provide new and impressive announcements, she was funny too. And she comes from the Storage and Availability Business Unit.

The Keynote began with somewhat cute application mascots escorting Carl Eschenbach on stage. Apps are what’s important, but they have to run somewhere: on infrastructure. Yanbing’s first message was that VMware is constantly working to Simplify infrastructure, which makes sense. Let’s make infrastructure as simple as possible so that the focus can be put on the important things, the applications.

Simplify Extend Reach

Yanbing discussed VMware’s infrastructure foundation: vSphere 6 for compute, NSX 6.2 for network, and for storage what she called “my personal favorite storage platform, Virtual SAN” 6.1. Of course there are lots of other storage choices out there, and VASA, VVols, and SPBM aim to simplify the management of those. But if you are looking to build the simplest possible physical infrastructure, Virtual SAN makes storage a simple extension of the capabilities of the vSphere cluster. Here’s more on “What is new for Virtual SAN 6.1?”.

Once the infrastructure is up and running, vRealize Suite simplifies its management and the deployment of VMs. All this infrastructure software has to run on hardware, and that hardware can be a challenge to set up. So, the big announcement is how EVO SDDC Manager will simplify the deployment of physical infrastructure. The same simplification benefits that EVO:RAIL brought will be extended to whole racks of hardware. Each rack will support 1,000 VMs or 2,000 VDI desktops and provide up to 2M IOPS of storage performance. Using EVO SDDC Manager, multiple racks can be combined to create a virtual rack that can be divided into workload domains. These workload domains can be secured with NSX for multi-tenancy protection. In addition, EVO SDDC Manager will provide Non-disruptive LifeCycle Automation to take care of infrastructure and software updates, just like EVO:RAIL does today. Pretty cool. I’m looking forward to seeing this in action.

Next, Yanbing discussed how to Extend your datacenter. Basically, what new things can be done to federate hybrid clouds (between your own datacenters or your datacenter and a public cloud). The first thing discussed was something called Content Library Automatic Synchronization and Yanbing demonstrated syncing VM templates between datacenters or vCloud Air. This is great, but to move live workloads, VMware introduced vCloud Air Hybrid Network Services to enable vMotion between hybrid clouds. Side note, I was part of a project that demonstrated an alpha version of something like this at EMC World a few years back, but, back then we didn’t have NSX. Last year, VMware announced long distance vMotion between customer data centers. But this year, in Yanbing’s words, “we have just witnessed history, cross-cloud vMotion”. Makes you think back to the first time you witnessed the original vMotion.

Finally, Yanbing discussed the benefits of Reach and vCloud Air’s capabilities. Disaster Recovery and Backup Services have been available for a while to protect critical workloads. But now VMware announces availability of vCloud Air Object Storage as a service, powered by EMC’s Elastic Cloud Storage. This is great news for customers, as object storage becomes more and more of a requirement. That requirement can now be satisfied by the expanding capabilities of VMware’s Storage and Availability Business Unit and vCloud Air.

There was more to the Keynote presentation but I chose to just focus this post on the Storage and Availability announcements and Yanbing’s presentation. As she walked off stage she yelled “Go VSAN!”… Awesome!

Yanbing Li

Virtual SAN – Disk Hotplug

Yesterday the VMware Compatibility Guide for Virtual SAN was significantly updated to reflect recent certifications completed by the team. One of the great things about Virtual SAN is that it is a true Software Defined Storage solution: VMware doesn’t force specific hardware upon customers. That being said, we know customers want and need quality, reliable storage solutions. There are also some low-quality components out there that could cause undesirable results with Virtual SAN. For this reason, the team has taken on the daunting task of maintaining a Compatibility Guide for Virtual SAN covering IO Controllers, SSDs, PCIe devices, and HDDs from many hardware vendors. They do a great job managing the nearly unlimited combinations and handling the requests from the field.

Some of the recent things added that I know many customers have been waiting for are:

Stay tuned for more updates in July.

As you browse various components (IO Controllers, SSDs, PCIe devices, and HDDs) on the VMware Compatibility Guide for Virtual SAN, you might see notes like these:

  • Hot-plug
  • Hot-plug Drive
  • Hot-plug Hard Drive
  • Hotplug certification for this controller is not complete and is currently in progress.
  • Disk hotplug feature is not supported on the driver.
  • The driver does not support SSD hot unplug. Please refer to KB2092959 at http://kb.vmware.com/kb/2092959 for more details.

Many customers have asked me what “hotplug” means, or have misread these notes to mean that drives cannot be replaced without downtime. Here’s how I have responded when asked about this:

The VMware Compatibility Guide (VCG) for Virtual SAN lists the supported components (IO Controllers, SSDs, and HDDs) for use in Virtual SAN. For IO Controllers, it is the combination of the controller firmware and the ESXi device driver that is qualified for VSAN. For SSDs, PCIe devices, and HDDs, it’s the firmware that is qualified. VMware and our technology partners put these through rigorous tests to make sure they are enterprise ready.

One of the many tests performed is “Disk hotplug”. In this test, VMware starts with a perfectly good vSphere cluster with VSAN enabled and I/O workloads running. Then they walk up to the hardware and yank an HDD or SSD out of one of the hosts. This scenario would likely never happen on purpose in production. It’s possible to do it accidentally, but if proper maintenance procedures are followed, it should never happen. That is especially true now that vSphere 6.0 / VSAN 6.0 can blink the LED of a failed disk to properly identify it. When this “failure” test is executed, some IO Controllers, SSDs, and HDDs do not behave properly. The issue can be due to the firmware/driver combination provided to VMware by the IO Controller manufacturer, or it could be a problem with the HDD or SSD firmware. The VMware team works with the manufacturers to identify the issues and plan for resolution. Rather than leave the component off the Compatibility Guide, VMware chooses to list it but caution against this unlikely event.

By the way, VSAN is smart enough to know the difference between a yanked drive and a drive that is degraded or failed, and it triggers different events accordingly. Yanking a drive does not actually simulate a failed drive, so it’s not really a valid failed-drive test. Thus, the hotplug warnings do not apply to more realistic scenarios, such as a drive that fails or develops sector errors. In these scenarios, even the components that fail the hotplug test will behave properly and as expected. Once VSAN detects that the disk is bad and marks it as failed, the disk can be removed from VSAN using the vSphere Web Client. Technicians are then free to pull it from a live system and replace it with a new one. This hardware replacement process can be different for every vendor.
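Here is a simplified Python model of why a yank and a real failure are handled differently. As I understand VSAN 6.x behavior, components on a drive that simply disappears are marked ABSENT. VSAN then waits out a repair delay (60 minutes by default) in case the drive comes back. Components on a drive that reports a failure are marked DEGRADED and rebuilt elsewhere right away. The enum and function names are hypothetical, and this is not VSAN’s actual logic.

```python
from enum import Enum

# Assumed VSAN 6.x default wait before rebuilding components from a drive
# that simply disappeared.
REPAIR_DELAY_MINUTES = 60


class DiskEvent(Enum):
    PULLED = "pulled"  # drive yanked out: it may be reinserted
    FAILED = "failed"  # drive reported errors or died: it is not coming back


def component_state(event: DiskEvent) -> str:
    """State of the components that lived on the affected drive (simplified)."""
    return "DEGRADED" if event is DiskEvent.FAILED else "ABSENT"


def minutes_until_rebuild(event: DiskEvent) -> int:
    """Degraded components are rebuilt right away; absent ones wait out the repair delay."""
    return 0 if component_state(event) == "DEGRADED" else REPAIR_DELAY_MINUTES


for event in DiskEvent:
    print(event.value, component_state(event), minutes_until_rebuild(event))
# pulled ABSENT 60
# failed DEGRADED 0
```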

I hope this helps clarify why we list the “hotplug” warnings on various components.

What Makes VSAN Different?

I had a question today asking how VMware Virtual SAN (VSAN) compares to XYZ company. There are over a dozen software-based storage solutions that run in virtual machines and leverage the local disks in ESXi hosts to present storage back to the hosts in the vSphere cluster. Those solutions require a vSphere cluster to be created, and then their virtual machine must be installed on every host to handle the storage services. Some are more efficient at this than others. But there is always some level of effort to “build-your-own” storage on top of the vSphere cluster, and those virtual machines can take up significant host resources to deliver the storage services they offer. So converged infrastructure itself is nothing new or unique. It’s how it’s done that is important.

Here’s what makes VMware Virtual SAN (VSAN) different:

  • VSAN is the ONLY software defined storage solution that is embedded into the ESXi hypervisor, making it the most efficient data path for performance. VMs send their data through the hypervisor right to disk; there’s no middle man. In addition, VSAN is the most efficient in its use of host resources to deliver the storage service. VSAN is designed to take up no more than 10% of the host CPU and memory resources, and testing with vSphere 6 shows significantly less impact than that. VSAN has this distinct advantage because it is not a VM on top of the hypervisor. That advantage is a positive tradeoff for the fact that VSAN is a VMware vSphere-only solution.
  • Being built in also makes it simple and easy to manage. There is no VSAN install, it is simply enabled as a feature of the hypervisor by clicking a check box. When enabled, VSAN will collect all the local disks on all the hosts and create the VSAN Datastore. Bear in mind, the server IO controller and disks must be in place and networking configurations must be completed to make sure VSAN will work when you click that check box.

VSAN Checkbox

  • VSAN is fully integrated into VMware Storage Policy Based Management (SPBM), VASA, and VVOLs. When that check box is clicked, the VSAN datastore is created and its VASA provider is registered with vCenter to expose its capabilities to SPBM. This allows different policies to be created, so the same pool of capacity can deliver different service levels to different VMs based on performance, availability, and protection. When VMs are attached to a policy service level, their VM objects get created on the VSAN datastore in the form of Virtual Volume (VVOL) objects. VSAN further breaks these VVOL objects up into components to deliver on the defined protection and performance service levels.

VSAN and SPBM

  • VSAN deals with data protection at the software layer, so it doesn’t suffer the performance and capacity penalty of hardware RAID. Different “tiers” of protection can be defined by policy and set for different VMs using the same pool of disks in the VSAN datastore. The Number of Failures to Tolerate setting determines how many data replicas are written to different hosts to deliver the desired protection level for VMs.
  • VSAN now supports a feature called “Rack Diversity”. I wrote about the benefits here. This brings Software Defined Self Healing with Failure Domains. Hosts in the same rack can be placed into the same fault domain. If an entire rack is lost, the data remains available because another replica of the data resides on a host in another rack.

VSAN Rack Diversity

  • VSAN is a hybrid storage solution, leveraging SSD as cache to accelerate both reads and writes and low-cost, high-capacity hard disks to persist the data. This results in near All-Flash array performance at a fraction of the cost. With vSphere 6 and Virtual SAN 6, an All-Flash VSAN configuration is also supported, delivering extreme performance.

VMware Virtual SAN™ 6.0 Performance

  • VSAN is one of the few software based storage solutions that can leverage in-host SSD/Flash for read AND write caching. Many solutions can leverage in-host SSD/Flash for read caching. Write-back caching is more difficult to implement, but VSAN does it while maintaining high availability of those writes across the cluster.
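The Number of Failures to Tolerate setting drives some simple arithmetic. With mirroring, tolerating `ftt` failures takes `ftt + 1` full copies of the data. Keeping a majority of components available after those failures takes at least `2 * ftt + 1` hosts (or fault domains). Below is a minimal Python sketch of that math. It assumes mirrored replicas and VSAN 6’s limit of 3 for this setting, and it ignores metadata overhead. The function names are illustrative only.

```python
def replicas(ftt: int) -> int:
    """Full copies of the data needed to survive `ftt` failures with mirroring."""
    if not 0 <= ftt <= 3:
        raise ValueError("Number of Failures to Tolerate must be between 0 and 3")
    return ftt + 1


def min_hosts(ftt: int) -> int:
    """Minimum hosts (or fault domains) so a majority survives `ftt` failures."""
    return 2 * ftt + 1


def raw_capacity_gb(vmdk_gb: float, ftt: int) -> float:
    """Raw VSAN datastore capacity consumed by a vmdk (overhead ignored)."""
    return vmdk_gb * replicas(ftt)


for ftt in range(4):
    print(ftt, replicas(ftt), min_hosts(ftt), raw_capacity_gb(100, ftt))
```

For example, a 100GB vmdk with the default FTT=1 consumes about 200GB of raw capacity and needs at least 3 hosts.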

All other converged software based storage solutions require running a virtual machine on top of ESXi. So every VM’s IO goes through its own IO path, through the hypervisor, then through that single storage VM’s IO path, then back through the hypervisor, and finally to the disks. In some cases, the disks themselves need to be set up with a hardware RAID configuration. The VM-based solution then implements software RAID on top of the underlying hardware RAID, paying a double performance and capacity penalty. Each of these storage VMs consumes additional host CPU and memory; some require 2-4 vCPUs and 16GB or more of RAM. Some are also limited in the number of nodes they can scale to and in how much total capacity they can support. Again, some solutions are more efficient and scalable than others, so do the homework and ask the right questions when comparing. Finally, most don’t support VMware’s Storage Policy Based Management, which is the VMware framework for managing all vSphere storage going forward.

VMware’s vision for Virtual SAN is that it be the best storage solution for Virtual Machines. With the release of vSphere 6 and Virtual SAN 6, VMware is closer to that vision. There are many software defined storage choices out there.  Hopefully this helps in that decision making process.

Virtual SAN 6 – What Does a Maxed Out 64 Host VSAN Cluster Look Like?

The big VMware vSphere 6 launch was yesterday and along with it comes Virtual SAN (VSAN) 6. Here are a couple of good summaries:

Rawlinson Rivera – VMware Virtual SAN 6.0

What’s New: VMware Virtual SAN 6.0

The big news is that a vSphere cluster will now scale to 64 hosts and thus VSAN will too. So what does that look like fully scaled up and out with the maximum hosts, maximum disk groups, and maximum disks per disk group? By the way, for details on how VSAN scales up and out check Is Virtual SAN (VSAN) Scale Up or Scale Out Storage…, Yes!.

Virtual SAN (VSAN) Enabled vSphere Cluster Scaled Up and Out to 64 hosts (nodes).

64 host VSAN
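Here is the arithmetic behind a maxed-out cluster, as a small Python sketch. It assumes the VSAN 6.0 maximums as I understand them: 64 hosts, 5 disk groups per host, and 1 cache device plus up to 7 capacity devices per disk group. Check the official configuration maximums for your release. `device_counts` is a hypothetical helper.

```python
# Assumed VSAN 6.0 maximums (hybrid or all-flash).
MAX_HOSTS = 64
MAX_DISK_GROUPS_PER_HOST = 5
CACHE_DEVICES_PER_DISK_GROUP = 1
MAX_CAPACITY_DEVICES_PER_DISK_GROUP = 7


def device_counts(hosts: int = MAX_HOSTS,
                  disk_groups_per_host: int = MAX_DISK_GROUPS_PER_HOST,
                  capacity_per_group: int = MAX_CAPACITY_DEVICES_PER_DISK_GROUP) -> dict:
    """Count disk groups and devices for a cluster scaled up and out."""
    disk_groups = hosts * disk_groups_per_host
    cache = disk_groups * CACHE_DEVICES_PER_DISK_GROUP
    capacity = disk_groups * capacity_per_group
    return {"disk_groups": disk_groups, "cache_devices": cache,
            "capacity_devices": capacity, "total_devices": cache + capacity}


print(device_counts())
# {'disk_groups': 320, 'cache_devices': 320, 'capacity_devices': 2240, 'total_devices': 2560}
```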

Oh yeah, overall VSAN performance is significantly improved. Plus, doubling the number of hosts doubles the aggregate performance. In addition, VSAN now supports an All-Flash configuration that increases performance even further.

Virtual SAN 6 Rack Awareness – Software Defined Self Healing with Failure Domains

I continue to think one of the hidden gem features of VMware Virtual SAN (VSAN) is its software defined self healing ability. I wrote about it a few months back in: Virtual SAN Software Defined Self Healing

Since Virtual SAN is such a different way to do storage, it allows for some interesting configuration combinations. With vSphere 6, VMware will be introducing a new built-in feature for Virtual SAN called “Rack Awareness”. It works by creating multiple “Failure Domains” and placing hosts in the same rack into the same Failure Domain. This “Rack Awareness” feature builds on the # Failures To Tolerate policy of Virtual SAN.

The rest of this post will look a lot like the previous post I did on self healing but will translate it for the Rack Awareness feature.

Minimum Rack Awareness Configuration

Let’s start with the smallest VSAN “Rack Awareness” configuration that provides redundancy: a 3 rack, 6 host (2 per rack) vSphere cluster with VSAN enabled and 1 SSD and 1 HDD per host. In VSAN, each disk group contains exactly one SSD, so the 1 HDD is placed into a disk group with the 1 SSD. The SSD performs the write and read caching for the HDDs in its disk group. The HDD permanently stores the data.

Let’s start with a single VM with the default # Failures To Tolerate (#FTT) equal to 1. A VM has at least 3 objects (namespace, swap, vmdk). Each object has 3 components (data 1, data 2, witness) to satisfy #FTT=1. Let’s just focus on the vmdk object. Say the VM sits on host 1, with replicas/mirrors/copies (these terms can be used interchangeably) of its vmdk data on host 1 in rack 1 and host 2 in rack 2, and the witness on host 3 in rack 3. The rule in Virtual SAN is that each of these three components of an object (data 1, data 2, witness) must sit on a different host. With Rack Awareness, they must also sit on hosts in different racks.
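The placement rule can be illustrated with a naive Python sketch. It puts each component on the first host of a distinct fault domain, which reproduces the example layout. Real VSAN placement also weighs capacity and other factors, and it may use a different number of witnesses. Treat `place_object` and the rack and host names as hypothetical.

```python
from typing import Dict, List, Tuple


def place_object(fault_domains: Dict[str, List[str]],
                 ftt: int = 1) -> Dict[str, Tuple[str, str]]:
    """Map each component to (host, fault domain), one component per fault domain.

    Simplified model: ftt + 1 data copies plus ftt witnesses.
    """
    components = [f"data {i}" for i in range(1, ftt + 2)]
    components += [f"witness {i}" for i in range(1, ftt + 1)]
    if len(fault_domains) < len(components):
        raise ValueError(
            f"need {len(components)} fault domains, have {len(fault_domains)}")
    placement = {}
    # Naive choice: the first host of each fault domain, in the order given.
    for component, (domain, hosts) in zip(components, fault_domains.items()):
        placement[component] = (hosts[0], domain)
    return placement


racks = {
    "rack 1": ["host 1", "host 4"],
    "rack 2": ["host 2", "host 5"],
    "rack 3": ["host 3", "host 6"],
}
print(place_object(racks))
# {'data 1': ('host 1', 'rack 1'), 'data 2': ('host 2', 'rack 2'), 'witness 1': ('host 3', 'rack 3')}
```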


OK, let’s start causing some trouble. With the default # Failures To Tolerate equal to 1, VM data on VSAN should remain available if a single SSD, a single HDD, a single host, or an entire rack fails.
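Why does the data stay available? Roughly, an object stays accessible while more than half of its components survive and at least one of them is a full data copy. Below is a simplified Python sketch of that rule, assuming one vote per component and the example layout. A single SSD or HDD failure on host 1 takes out data 1 (an SSD failure takes down its whole disk group), so for this object it behaves like a host 1 failure. `object_available` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

# Component layout from the example: (host, rack) for each component.
PLACEMENT: Dict[str, Tuple[str, str]] = {
    "data 1": ("host 1", "rack 1"),
    "data 2": ("host 2", "rack 2"),
    "witness": ("host 3", "rack 3"),
}


def object_available(placement: Dict[str, Tuple[str, str]],
                     failed_hosts: Iterable[str] = (),
                     failed_racks: Iterable[str] = ()) -> bool:
    """Simplified rule: more than half of the components (one vote each) must
    survive, and at least one survivor must be a full data copy."""
    failed_hosts, failed_racks = set(failed_hosts), set(failed_racks)
    alive = [name for name, (host, rack) in placement.items()
             if host not in failed_hosts and rack not in failed_racks]
    has_data = any(name.startswith("data") for name in alive)
    return has_data and 2 * len(alive) > len(placement)


print(object_available(PLACEMENT, failed_hosts=["host 1"]))            # True
print(object_available(PLACEMENT, failed_racks=["rack 2"]))            # True
print(object_available(PLACEMENT, failed_racks=["rack 1", "rack 2"]))  # False
```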


VMware Jobs!!! – Software Defined Storage (Virtual SAN, EVO:RAIL, etc.)

I’ve been at VMware for 1.5 years and have had a blast talking to customers, partners, and VMware employees about all things software defined storage. This primarily involves Virtual SAN and EVO:RAIL, which take advantage of VASA, Storage Policy Based Management, and VVOLs. Because we are talking about storage, it also includes discussing the benefits of vSphere Replication, Site Recovery Manager, and vSphere Data Protection. Basically, it covers anything to do with storing, protecting, and managing virtual machine data. It’s exciting to be part of the whole software defined data center strategy.

We are growing our Software Defined Storage team and are looking for qualified rockstars. If you are one, and the topics above are familiar to you, and you are interested in joining the VMware Software Defined Storage Team, then check out the openings below.  Feel free to apply directly or reach out to me with any questions at: pkeilty at vmware dot com

You can find the openings on the VMware Public Job Page: http://vmware.jobs/

Plug in the Requisition Number below to find more details on the openings and full job descriptions:

Systems Engineers

  • Requisition Number 55635BR – Sr. Systems Engineer-Software Defined Storage-East in New York New York United States

We are also looking for SEs in the Ohio Valley and South East USA. In addition, we are looking for a Technical Field SE in the East. These job requisitions will be posted soon.

Sales

  • Requisition Number 58265BR – Storage Account Executive in Austin Texas United States
  • Requisition Number 58420BR – Storage Account Executive – Federal in Reston Virginia United States
  • Requisition Number 58501BR – Sales Leader, Software Defined Storage – Palo Alto or Austin in Austin Texas United States
  • Requisition Number 58504BR – Inside Sales Representative, Software Defined Storage in Austin Texas United States

Good luck!

VMware Software Defined Storage and Virtual SAN at PEX

Unfortunately, I won’t be attending VMware PEX this year. It’s a great event to meet up with our great VMware partners and learn the latest VMware tech. There will be tons of Software Defined goodness; specifically, here is a great link to all the storage stuff:

Discover Software-Defined Storage & VMware Virtual SAN at PEX 2015!

 

Best Practice for Preparing Hardware for a Virtual SAN Deployment

This may be stating the obvious but I think it’s worth repeating. Before building a Virtual SAN enabled cluster make sure:

  • The server hardware is updated to the latest and greatest system ROM / BIOS / firmware
  • The IO Controller is running the latest firmware
  • The SSDs are running the latest firmware
  • The HDDs are running the latest firmware

These firmware updates often resolve some important hardware issues.

Next, make sure you follow the Performance Best Practices for VMware vSphere® 5.5

  • Specifically, make sure Power Management BIOS Settings are disabled in the server BIOS (see page 17)

Once ESXi is installed on the host:

  • Make sure the IO Controller is loading the correct version of the device driver. You can look this up on the Virtual SAN HCL.
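If you track many hosts, the checklist above can be captured as a per-host record. Here is a minimal Python sketch where the fields are filled in by hand or from your vendor’s tools. `HostPrep`, `prep_issues`, and the version strings are hypothetical. The driver check simply compares against a single HCL version, whereas the HCL may list several.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostPrep:
    """Hypothetical per-host record of the pre-deployment checks."""
    name: str
    system_firmware_current: bool     # system ROM / BIOS / firmware
    controller_firmware_current: bool
    ssd_firmware_current: bool
    hdd_firmware_current: bool
    bios_power_management_disabled: bool
    driver_version: str               # IO Controller driver ESXi actually loaded
    hcl_driver_version: str           # driver version listed on the Virtual SAN HCL


def prep_issues(host: HostPrep) -> List[str]:
    """Return the checklist items this host still fails."""
    checks = {
        "system ROM/BIOS/firmware is not current": host.system_firmware_current,
        "IO Controller firmware is not current": host.controller_firmware_current,
        "SSD firmware is not current": host.ssd_firmware_current,
        "HDD firmware is not current": host.hdd_firmware_current,
        "BIOS power management is still enabled": host.bios_power_management_disabled,
    }
    issues = [message for message, ok in checks.items() if not ok]
    if host.driver_version != host.hcl_driver_version:
        issues.append(f"IO Controller driver {host.driver_version} "
                      f"does not match HCL version {host.hcl_driver_version}")
    return issues


host = HostPrep("esx02", True, True, True, False, True, "1.2.0", "1.2.3")
print(prep_issues(host))
# ['HDD firmware is not current', 'IO Controller driver 1.2.0 does not match HCL version 1.2.3']
```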

I work with a lot of customers who are evaluating or implementing Virtual SAN and following these simple, obvious, but important best practices have led to better performance and a better overall experience with Virtual SAN.