VMware Virtual SAN at Storage Field Day 9 (SFD9) – Making Storage Great Again!

On Friday, March 18 I took the opportunity to watch the live webcast of Storage Field Day 9. If you can carve out some time, I highly recommend this.

Tech Field Day @TechFieldDay
VMware Storage Presents at Storage Field Day 9

The panel of industry experts asks all the tough questions, and the great VMware Storage team answers them all.

Storage Industry Experts
  • Alex Galbraith @AlexGalbraith
  • Chris M Evans @ChrisMEvans
  • Dave Henry @DaveMHenry
  • Enrico Signoretti @ESignoretti
  • Howard Marks @DeepStorageNet
  • Justin Warren @JPWarren
  • Mark May @CincyStorage
  • Matthew Leib @MBLeib
  • Richard Arnold @3ParDude
  • Scott D. Lowe @OtherScottLowe
  • Vipin V.K. @VipinVK111
  • W. Curtis Preston @WCPreston

VMware Virtual SAN Experts
  • Yanbing Le @ybhighheels
  • Christos Karamanolis @XtosK
  • Rawlinson Rivera @PunchingClouds
  • Vahid Fereydouny @vahidfk
  • Gaetan Castelein @gcastelein1
  • Anita Kibunguchy @kibuanita

The ~2 hour presentation was broken up into easily consumable chunks. Here’s a breakdown of the recorded session:

VMware Virtual SAN Overview

In this introduction, Yanbing Le, Senior Vice President and General Manager, Storage and Availability, discusses VMware’s company success, the state of the storage market, and the success of Virtual SAN, the HCI market leader with over 3,000 customers.

What Is VMware Virtual SAN?

Christos Karamanolis, CTO, Storage and Availability BU, jumps into how Virtual SAN works, answers questions on the use of high endurance and commodity SSD, and how Virtual SAN service levels can be managed through VMware’s common control plane – Storage Policy Based Management.

VMware Virtual SAN 6.2 Features and Enhancements

Christos continues the discussion of VSAN features as they’ve progressed from the 1st-generation Virtual SAN released on March 12, 2014 to the 2nd, 3rd, and now 4th generation that was just released on March 16, 2016. The discussion in this section focuses largely on data protection features like stretched clustering and vSphere Replication. They dove deep into how vSphere Replication can deliver application-consistent protection as well as a true 5-minute RPO, based on the built-in intelligent scheduler sending the data deltas within the 5-minute window, monitoring the SLAs, and alerting if they cannot be met due to network issues.

VMware Virtual SAN Space Efficiency

Deduplication, compression, and distributed RAID 5/6 erasure coding are all now available in all-flash Virtual SAN configurations. Christos provides the skinny on these data reduction and space efficiency features and how enabling them adds very little overhead on the vSphere hosts. Rawlinson chimes in on the automated way Virtual SAN can build the cluster of disks and disk groups that deliver the capacity for the shared VSAN datastore. These can certainly be built manually, but VMware’s design goal is to make the storage system as automated as possible. The conversation then moves to checksums and how Virtual SAN protects the integrity of data on disk.

VMware Virtual SAN Performance

OK, this part was incredible! Christos threw down the gauntlet, so to speak. He presented the data behind the testing that shows minimal impact on the hosts when enabling the space efficiency features. He also presents performance data for OLTP workloads, VDI, Oracle RAC, etc. All cards on the table here. I can’t begin to summarize; you’ll just need to watch.

VMware Virtual SAN Operational Model

Rawlinson Rivera takes over and does what he does best, throwing all caution to the wind and delivering live demonstrations. He showed the Virtual SAN Health Check and the new Virtual SAN Performance Monitoring and Capacity Management views built into the vSphere Web Client. Towards the end, Howard Marks asked about supporting future Intel NVMe capabilities and Christos’s response was that it’s safe to say VMware is working closely with Intel on ensuring the VMware storage stack can utilize the next generation devices. Virtual SAN already supports the Intel P3700 and P3600 NVMe devices.

This was such a great session I thought I’d promote it and make it easy to check it out. By the way, here’s Rawlinson wearing a special hat!

Make Storage Great Again


What if the SSD and HDD Firmware Versions are Newer Than What is Listed on the VMware Compatibility Guide (VCG) for Virtual SAN?

No problem, this is OK.

If you want to know more detail, keep reading…

Last week I was working with a customer to implement a VSAN ReadyNode. Before enabling VSAN on a cluster, it’s a best practice to validate that the firmware of the host I/O controller, SSDs (SAS, SATA, PCIe, NVMe, or UltraDIMM), and HDDs (SAS, NL-SAS, or SATA) is up to the required versions. Each hardware vendor has a different way of doing this.

In reviewing this particular customer’s hardware, we found that the SSD and HDD firmware versions were newer than what is listed on the VCG.

Note that for SSDs and HDDs, the hardware vendor provides the VMware Virtual SAN team with the firmware version it tested and qualified for VSAN. VMware then lists that firmware version for that model of disk on the VMware Compatibility Guide (VCG) for Virtual SAN. If the hardware vendor later releases new firmware, that does not require VSAN re-certification of the SSD or HDD. VMware supports disks with newer firmware for Virtual SAN, but leaves the VCG alone and continues listing the old firmware. However, if the hardware vendor wants VMware to remove the old firmware from the VCG listing and replace it with the new firmware, VMware will do that upon request. This typically happens when the hardware vendor discovers an issue/bug with the old firmware.

I hope this helps clarify how VMware treats SSD and HDD Firmware Version listings on the VMware Compatibility Guide for Virtual SAN.

Configuring HP Smart Array P420i I/O Controller for VSAN

I’ve been working with many customers over the last several months and found that many are very familiar with HP hardware and just know how to set things up.  Others are looking for guidance from VMware on how to configure for VSAN.  There are things I’ve discovered that might not be obvious but can help in the VSAN setup.  Bear in mind, I am not an HP server hardware expert, so your comments are greatly appreciated.

Before I go too far, there is a bug in the HP async controller driver for the P420i that is included in the HP ESXi image. The bug reduces the queue depth to 28, instead of 1020, causing poor performance in VSAN.

Here’s how to check your host’s I/O controller (storage adapter) queue depth:

  • Run the esxtop command in an ESXi shell or SSH session
  • Press d
  • Press f and select Queue Stats (d)
  • The value listed under AQLEN is the queue depth of the storage adapter

To resolve, follow these directions to implement the correct driver:

HP ProLiant Smart Array Controller Driver for VMware vSphere 5.5 (VIB file)
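Once you have the VIB file on the host, installing it is a one-liner. A minimal sketch follows; the file name and path are examples, so substitute the actual driver file you downloaded from HP:

```shell
# Install the updated Smart Array driver VIB (example path/file name),
# then reboot the host for the new driver to take effect.
esxcli software vib install -v /tmp/scsi-hpsa-driver.vib
reboot
```

After the reboot, re-run the esxtop check above and confirm AQLEN now reports 1020.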

OK, a little background/overview on I/O Controller guidance for VSAN.  In general, VSAN recommends disabling Read and Write cache for any I/O Controller.  Since VSAN handles Read and Write caching at the software layer, there’s no need to do it at the hardware level.  Also, when destaging write cache, we want to ensure that the writes are committed to disk and not in I/O Controller cache.

In the case of the HP P420i, you cannot disable the I/O Controller cache so VSAN recommends setting it to 100% Read which essentially disables Write cache.  I recently discovered that you can also selectively pick and choose which disks to enable cache for.
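For reference, here is a hedged sketch of those settings using HP’s hpssacli tool. The slot and logical drive numbers are examples for illustration; confirm your actual layout first with `hpssacli ctrl all show config`:

```shell
# Set the P420i controller cache to 100% read / 0% write,
# which effectively disables write cache (slot number is an example)
hpssacli ctrl slot=0 modify cacheratio=100/0

# Alternatively, disable the array accelerator (cache) selectively
# for an individual logical drive (logical drive number is an example)
hpssacli ctrl slot=0 ld 1 modify arrayaccelerator=disable
```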

Continue reading “Configuring HP Smart Array P420i I/O Controller for VSAN”

Virtual SAN (VSAN) Beta, now 17% larger!

In a previous post here I detailed the Scale Up and Scale Out capabilities of VSAN.  It looks like I’ll need to redo my diagrams, since Virtual SAN just increased the number of HDDs in a disk group from 6 to 7.  That’s a 17% increase in raw capacity.  The number of SSDs remains 1 per disk group, 5 per host, and 40 per 8-host cluster.  With the increase from 6 to 7 HDDs per disk group you can now have 35 HDDs per host, and in an 8-host cluster an increase from 240 to 280 HDDs.  That’s an extra 40 HDDs, which translates to a ton of extra raw capacity.
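The arithmetic behind the headline is easy to sanity-check from the shell:

```shell
# Going from 6 to 7 HDDs per disk group:
old=6; new=7
awk -v o="$old" -v n="$new" \
  'BEGIN { printf "%.0f%% more raw capacity\n", (n - o) / o * 100 }'   # ~17% (16.7 rounded)

# Cluster-wide HDD count at the new maximum: 8 hosts * 5 disk groups * 7 HDDs
echo $((8 * 5 * 7))   # 280
```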

Virtual SAN Enabled vSphere Cluster Fully Scaled Up and Out 2

To support this increase you’ll need to download the recently released VSAN Beta code found on the VMware Virtual SAN Community page.

Also check out this great post on Virtual SAN – Sizing Considerations.

Is Virtual SAN (VSAN) Scale Up or Scale Out Storage? Yes!

This may not be the most sophisticated definition, but Scale Up storage means you buy a box with a certain amount of storage capacity and performance, then sometime later add more storage (HDD or SSD) to it to increase both capacity and IOPS performance.  Scale Out means you buy a box, then sometime later add more boxes to increase both capacity and IOPS performance.  VMware Virtual SAN is both.  You have options to Scale Out as well as Scale Up.  Let’s investigate.

The minimum configuration for VSAN is 3 hosts (boxes) with 1 SSD and 1 HDD in each.  Let’s say you started with that and stored enough data that you need more capacity.  You have the option to add another host (box) with SSD and HDD.  This would be a Scale Out approach to solving the problem.

Virtual SAN Scale Out

But your analysis might show that you don’t need the extra host CPU and memory to support more Virtual Machines.  So rather than adding another host, you can simply add more HDDs to the existing VSAN disk groups on the existing hosts for increased capacity, and as a side benefit you get increased IOPS performance too.  This would be a Scale Up approach to solving the capacity problem.

Virtual SAN Scale Up - Capacity

Let’s say in your VSAN analysis (see here, here, and here for good info on VSAN analysis) you are seeing a lot of read cache misses.  To improve performance you could increase the number of disk stripes for the VMs.  However, this doesn’t necessarily fix the underlying problem.  Reducing the number of read cache misses might be better accomplished by adding more SSD caching capacity.  Like the previous example, you have the option to add another host with SSD and HDD.  But if you don’t need the extra host CPU and memory to support more Virtual Machines, you can add more SSD to the existing hosts.  For each host in the VSAN-enabled cluster you’ll need to create a second disk group (or up to five disk groups) and add at least 1 SSD and 1 HDD to each group.  This will increase IOPS performance (Scale Up for performance), and as a side benefit you get increased capacity.

Virtual SAN Scale Up & Out

A few VMware VSAN Beta Highlights & Best Practices to keep in mind:

  • Min 1 SSD & 1 HDD per host, Max 1 SSD & 7 HDD per disk group, Max 5 disk groups per host
  • Min 3 Hosts, Max 8 Hosts, Max 1 VSAN datastore per cluster (support for more hosts may increase in the future)
  • Max vsanDatastore = (8 hosts * 5 disk groups * 7 disks * size of disks) = 280 * size of disks
  • SSD capacity should be ~10% of your HDD capacity (e.g. 1 GB of SSD to every 10 GB of SAS/SATA).

Virtual SAN Enabled vSphere Cluster Fully Scaled Up and Out 2

VMware Virtual SAN Scales Out as your vSphere cluster Scales Out but don’t forget that you can Scale Up individual hosts as well for both increased capacity and IOPS performance.