vSAN,… correction it’s VSAN,… OK, OK, it’s vSAN now, VSAN and Virtual SAN are wrong.

I spent 4 years at EMC prior to moving to VMware over 3 years ago to join the Software Defined Storage team. At EMC it was always a challenge to get the acronyms and names correct. When Acadia (VCE) first came out with the “Vblock” everybody wanted to type it as “vBlock”. I’d always try to subtly correct it and hope people got the hint. Other times I’d straight out correct them and feel like a jerk. But, to me, using the proper name was and is important. The same problem happened with “VPLEX”; everyone wanted to type it “vPLEX”. Why did people want to do this? Well, it’s VMware’s fault because they named things like “vSphere” and “vCenter” and later “vCloud” and “vRealize”. So when I joined VMware it was odd to me that we called our upcoming product “VSAN” and not “vSAN”. I’ve spent 3 years telling customers and VMware people, one way or another, that publicly, and in product documentation, VMware actually could only call our product “Virtual SAN”. Many people, including me, got lazy and called it “VSAN”… but it was definitely not “vSAN”. Well, yesterday that all changed. Without going into detail, “vSAN” is the only name to use. Virtual SAN and VSAN are no more. Now I have to go fix all my spell checkers.

Citrix & VSAN

There are many VMware and Citrix customers happily running Citrix XenApp and XenDesktop on VMware vSphere clusters with Virtual SAN enabled.

Citrix XenApp is fully supported on VSAN.

Citrix XenDesktop PVS is fully supported on VSAN.

Citrix XenDesktop MCS is still not supported on VSAN by Citrix at the time of this writing on October 7, 2016. Citrix has a fix that is in 7.8 and 7.9 already and customers have reported that the fix works. However, Citrix says the fix has not been qualified by them and thus is not supported. The ETA for official support is unclear at this point and is the responsibility of Citrix. If you need this feature, please reach out to Citrix to let them know.

Our friends at Dell Technologies (EMC/VCE) have tested XenApp, XenDesktop PVS and MCS on VxRail and have produced a report here:

Citrix XenDesktop 7.9 and VMware vSphere 6.0 with VCE VxRail Appliance
http://www.emc.com/collateral/technical-documentation/h15433-euc-citrix-xendesktop-vxrail-sg.pdf

In it they state “Citrix official support of MCS on VMware Virtual SAN is expected in a future release of XenDesktop. EMC tested this configuration and found no observable issues.”

For the record, I’ve been a fan of Citrix since I first deployed Citrix WinView in my data center and remote sites back in 1994. Yes, I’m that old. I’m sure this will all get worked out.

Replays of Virtual SAN Sessions at VMworld 2016 That You Didn’t Want to Miss

What a great week last week at VMworld 2016. I had many good meetings with customers, participated in 3 breakout sessions, met up with some old friends and met some new ones. If you missed VMworld, well, then you missed a bunch of great sessions. There’s no way you could have seen them all, so, VMware has made them available here: http://www.vmworld.com/en/sessions/2016.html.

I participated in two sessions:

The first one was a customer panel discussion on Tuesday afternoon. I need to thank Glenn Brown from Stanley Black & Decker, Mike Caruso from Synergent, Tom Cronin from M&T Bank, and Andrew Schilling from Baystate Health who all did a fantastic job representing themselves, their companies, and their use of Virtual SAN. We had great interaction from the audience with lots of good questions. For a replay of the session check this out:

Four Unique Enterprise Customers Deployment of VMware Virtual SAN [STO7560]
Glen Brown, System Engineer, Stanley Black and Decker
Michael Caruso, AVP Corporate Information Systems, Synergent
Tom Cronin, Sr. Staff Specialist – Platforms Engineering Group, M&T Bank
Frank Gesino, Senior Technical Account Manager, VMware
Andrew Schilling, Team Leader – IT Infrastructure, Baystate Health Inc.
Tuesday, Aug 30, 5:00 p.m. – 6:00 p.m.

The other session I was involved in was on Wednesday and repeated on Thursday. I had the good fortune to present with two VSAN Product Managers who are responsible for making VSAN great. Vahid Fereydounkolahi is responsible for driving new features into the VSAN product and Rakesh Radhakrishnan is responsible for making sure all the vendor hardware components are properly qualified for VSAN and for looking out into the future of new technologies like NVMe and RDMA to adopt into VSAN. For a replay of the two sessions check these out:

Virtual SAN Technical Deep Dive and What’s New [STO8246R]
Peter Keilty, Office of the CTO, Americas Field – Storage and Availability, VMware, Inc.
Rakesh Radhakrishnan, Product Management & Strategy Leader, VMware
Wednesday, Aug 31, 2:00 p.m. – 3:00 p.m.
Vahid Fereydounkolahi kicked this one off discussing VSAN features, capabilities, and how it works. I took over in the middle to discuss Day 2 operations, and Rakesh Radhakrishnan finished it off discussing the Ready Node program as well as current and future flash and IO technology that VSAN incorporates or will incorporate.

Virtual SAN Technical Deep Dive and What’s New [STO8246R]
Thursday, Sep 01, 10:30 a.m. – 11:30 a.m.
Vahid wasn’t able to make this time so I kicked things off talking about VSAN features, capabilities, how it works, and Day 2 operations, and Rakesh Radhakrishnan finished it off discussing the Ready Node program as well as current and future flash and IO technology that VSAN incorporates or will incorporate.

In my previous blog post I highlighted the sessions you wouldn’t want to miss. So here, I’ll provide the links to those sessions. A few either haven’t been uploaded yet or won’t because of legal or future looking reasons:

Christos Karamanolis is literally the brains behind VSAN since its inception and our chief visionary for Storage. If you want the whole picture wrapped up in a 1 hour session, this is it.
An Industry Roadmap: From storage to data management [STO7903]
Christos Karamanolis, VMware Fellow – CTO of Storage and Availability, VMware
Wednesday, Aug 31, 4:00 p.m. – 5:00 p.m.

Virtual SAN Sessions You Won’t Want to Miss at VMworld 2016

Shameless self-promotion here. I’m very excited to be presenting 2 sessions at the upcoming VMworld 2016 in Las Vegas. So, of course I think you shouldn’t miss them. The first is a customer panel session that I’ll be hosting. I’ve worked with each of these customers, who have had VSAN running production workloads for well over a year. Everything wasn’t always perfect, but they continue to expand their usage of VSAN in their data centers. Two of the customers have now standardized on VSAN for any new workloads. These customers will provide an overview of their deployments, answer some of my questions, then take questions from the audience.

Four Unique Enterprise Customers Deployment of VMware Virtual SAN [STO7560]
Glen Brown, System Engineer, Stanley Black and Decker
Michael Caruso, AVP Corporate Information Systems, Synergent
Tom Cronin, Sr. Staff Specialist – Platforms Engineering Group, M&T Bank
Frank Gesino, Senior Technical Account Manager, VMware
Andrew Schilling, Team Leader – IT Infrastructure, Baystate Health Inc.
Tuesday, Aug 30, 5:00 p.m. – 6:00 p.m.

This VSAN Deep Dive session will cover features of the latest VSAN release, how they work, and some best practices for deploying VSAN. I’ll be presenting along with our lead VSAN Product Managers. This session will be held on two different days.

Virtual SAN Technical Deep Dive and What’s New [STO8246R]
Peter Keilty, Office of the CTO, Americas Field – Storage and Availability, VMware, Inc.
Rakesh Radhakrishnan, Product Management & Strategy Leader, VMware
Wednesday, Aug 31, 2:00 p.m. – 3:00 p.m.
Thursday, Sep 01, 10:30 a.m. – 11:30 a.m.

Other VSAN Sessions You Won’t Want to Miss

There are so many great VSAN sessions it’s hard to pick just a few. So, here are the ones I am most familiar with that I’m confident will be great. But that doesn’t mean that some of the others won’t be.

Christos Karamanolis is literally the brains behind VSAN since its inception and our chief visionary for Storage. If you want the whole picture wrapped up in a 1 hour session, this is it.

An Industry Roadmap: From storage to data management [STO7903]
Christos Karamanolis, VMware Fellow – CTO of Storage and Availability, VMware
Wednesday, Aug 31, 4:00 p.m. – 5:00 p.m.

2-Node Virtual SAN Software Defined Self Healing

I continue to think one of the hidden gem features of VMware Virtual SAN (VSAN) is its software defined self healing ability.  I recently received a request for a description of 2-Node self healing. I wrote about our self healing capabilities for 3-Node, 4-Node and more here. And I wrote about Virtual SAN 6 Rack Awareness Software Defined Self Healing with Failure Domains here. I suggest you check out both before reading the rest of this. I also suggest you check out these two posts on 2-Node VSAN for a description on how they work here and are licensed here.

For VSAN, protection levels can be defined through VMware’s Storage Policy Based Management (SPBM), which is built into vSphere and managed through vCenter. VM objects can be assigned to different policies, which dictate the protection level they receive on VSAN. With a 2-Node Virtual SAN there is only one option for protection, which is the default # Failures To Tolerate (#FTT) equal to 1 using RAID 1 mirroring. In other words, each VM will write to both hosts. If one host fails, the data exists on the other host and is accessible as long as the VSAN Witness VM is available.

Now that we support 2-Node VSAN, the smallest VSAN configuration possible is 2 physical nodes with 1 caching device (SSD, PCIe, or NVMe) and 1 capacity device (HDD, SSD, PCIe, or NVMe) each and one virtual node (VSAN Witness VM) to hold all the witness components. Let’s focus on a single VM with the default # Failures To Tolerate (#FTT) equal to 1. A VM has at least 3 objects (namespace, swap, vmdk). Each object has at least 3 components (data mirror 1, data mirror 2, witness) to satisfy #FTT=1. Let’s just focus on the vmdk object and say that the VM sits on host 1 with mirror components of its vmdk data on hosts 1 and 2 and the witness component on the virtual Witness VM (host 3).
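To make the component arithmetic concrete, here is a minimal Python sketch. It assumes the commonly cited RAID 1 sizing rule (n+1 data copies and 2n+1 fault domains to tolerate n failures) and a simplified witness count; the function name `raid1_layout` is hypothetical and is not part of any VMware tool.

```python
def raid1_layout(ftt: int) -> dict:
    """Minimum component counts for a mirrored object tolerating `ftt` failures.

    Assumption: simplified sizing rule, not output from VSAN itself.
    """
    if ftt < 1:
        raise ValueError("ftt must be at least 1 for a mirrored object")
    replicas = ftt + 1                    # full data copies (mirrors)
    fault_domains = 2 * ftt + 1           # hosts needed so a majority can survive
    witnesses = fault_domains - replicas  # simplified: fill the remaining domains
    return {
        "replicas": replicas,
        "witnesses": witnesses,
        "fault_domains": fault_domains,
    }


layout = raid1_layout(1)
print(layout)
```

For #FTT=1 this gives 2 mirrors, 1 witness, and 3 fault domains, which in a 2-Node cluster are the two physical hosts plus the Witness VM.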

01 - 2-Node VSAN min

OK, let’s start causing some trouble. With the default # Failures To Tolerate equal to 1, VM data on VSAN should be available if a single caching device, a single capacity device, or an entire host fails. If a single capacity device fails, let’s say the one on esxi-02, no problem: another copy of the vmdk is available on esxi-01 and the witness is available on the Witness VM, so all is good. There is no outage and no downtime. VSAN has tolerated 1 failure causing loss of one mirror, and VSAN is doing its job per the defined policy and providing access to the remaining mirror copy of data. Each object has more than 50% of its components available (one mirror and the witness are 2 out of 3, i.e. roughly 67% of the components), so data will continue to be available unless there is a 2nd failure of either the caching device, capacity device, or esxi-01 host. The situation is the same if the caching device on esxi-02 fails or the whole host esxi-02 fails. VM data on VSAN would still be available and accessible. If the VM happened to be running on esxi-02 then HA would fail it over to esxi-01 and data would be available. In this configuration, there is no automatic self healing because there’s nowhere to self heal to. Host esxi-02 would need to be repaired or replaced in order for self healing to kick in and get back to compliance with both mirrors and witness components available.
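The "more than 50% of components" rule can be sketched in a few lines of Python. This is a simplified model that treats every component as one equal vote; the function and component names are illustrative assumptions, not VSAN internals.

```python
def object_accessible(components: dict) -> bool:
    """Return True if strictly more than half of the components are available.

    `components` maps a component label to True (available) or False (failed).
    Assumption: every component carries exactly one vote.
    """
    available = sum(1 for up in components.values() if up)
    return available * 2 > len(components)


# The vmdk object from the example: two mirrors plus one witness.
vmdk = {
    "mirror@esxi-01": True,
    "mirror@esxi-02": True,
    "witness@witness-vm": True,
}

vmdk["mirror@esxi-02"] = False       # first failure: capacity device on esxi-02
after_first_failure = object_accessible(vmdk)

vmdk["witness@witness-vm"] = False   # second failure: Witness VM unavailable
after_second_failure = object_accessible(vmdk)

print(after_first_failure, after_second_failure)
```

After the first failure 2 of 3 components remain, so the object stays accessible. After a second failure only 1 of 3 remains and the check fails.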

02 - 2-Node VSAN min

Self healing upon repair

How can we get back to the point where we are able to tolerate another failure?  We must repair or replace the failed caching device, capacity device, or failed host.  Once repaired or replaced, data will resync, and the VSAN Datastore will be back to compliance where it could then tolerate one failure.  With this minimum VSAN configuration, self healing happens only when the failed component is repaired or replaced.

03 - 2-Node VSAN min

2-Node VSAN Self Healing Within Hosts and Across Cluster

To get self healing within hosts and across the hosts in the cluster you must configure your hosts with more disks. Let’s investigate what happens when there are 2 SSDs and 4 HDDs per host and 2 hosts in a cluster and the policy is set to # Failures To Tolerate equal to 1 using the RAID 1 (mirroring) protection method.

01~ - 2-Node VSAN.png

If one of the capacity devices on esxi-02 fails then VSAN could choose to self heal to:

  1. Other disks in the same disk group
  2. Other disks on other disk groups on the same host

The green disks in the diagram below are eligible targets for the new instant mirror copy of the vmdk:

02~ - 2-Node VSAN

This is not an all encompassing and thorough explanation of all the possible scenarios.  There are dependencies on how large the vmdk is, how much spare capacity is available on the disks, and other factors.  But, this should give you a good idea of how failures are tolerated and how self healing can kick in to get back to policy compliance.
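As a rough illustration of the capacity device case, the Python sketch below lists candidate rebuild targets. It assumes a 2-node layout with two disk groups per host and two capacity disks per disk group, and it ignores the vmdk size and free capacity mentioned above. The `Disk` class and `rebuild_targets` function are hypothetical names for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    host: str
    disk_group: str
    name: str


def rebuild_targets(disks, failed, surviving_mirror_host):
    """Capacity disks that could hold the rebuilt mirror after `failed` is lost.

    Assumption: the new mirror must not land on the host that already
    holds the surviving mirror, and the failed disk itself is excluded.
    """
    return [
        d for d in disks
        if d != failed and d.host != surviving_mirror_host
    ]


# 2 hosts x 2 disk groups x 2 capacity disks (4 HDDs per host).
disks = [
    Disk(host, f"dg{g}", f"{host}-dg{g}-hdd{n}")
    for host in ("esxi-01", "esxi-02")
    for g in (1, 2)
    for n in (1, 2)
]

failed = Disk("esxi-02", "dg1", "esxi-02-dg1-hdd1")
targets = rebuild_targets(disks, failed, surviving_mirror_host="esxi-01")
for d in targets:
    print(d.name)
```

The remaining disk in the same disk group and both disks in the other disk group on esxi-02 come back as candidates, matching the two options listed for a capacity device failure.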

Self Healing When SSD Fails

If there is a failure of the caching device on esxi-02 that supports the capacity devices that contain the mirror copy of the vmdk then VSAN could choose to self heal to:

  1. Other disks on other disk groups on the same host
  2. Other disks on other disk groups on other hosts.

The green disks in the diagram below are eligible targets for the new instant mirror of the vmdk:

03~ - 2-Node VSAN.png

Self Healing When a Host Fails

If there is a failure of a host (e.g. esxi-02) that holds a mirror of the vmdk then VSAN cannot self heal until the host is repaired or replaced.

04~ - 2-Node VSAN

Summary

VMware Virtual SAN leverages all the disks on all the hosts in the VSAN datastore to self heal. Note that I’ve only discussed the self healing behavior of one VM above. Other VMs on other hosts may have data on the same failed disk(s), but their mirrors may be on different disks in the cluster and VSAN might choose to self heal to other disks in the cluster. Thus the self healing workload is a many-to-many operation that is spread around all the disks in the VSAN datastore.

Self healing is enabled by default, behavior is dependent on the software defined protection policy (#FTT setting), and can occur to disks in the same disk group, to other disk groups on the same host, or to other disks on other hosts. The availability and self healing properties make VSAN a robust storage solution for all data center applications.

VSAN In 3 Minutes Series

These are so cool I had to recognize them. If you are like me and would rather see things in action than read about them in a manual, then the VSAN In 3 Minutes Series is for you.

VSAN in 3 Minutes Series

Check the videos out. A big shout out to my colleague Greg Mulholland who does a great job putting these together.

VMware Virtual SAN at Storage Field Day 9 (SFD9) – Making Storage Great Again!

On Friday, March 18 I took the opportunity to watch the live Webcast of Storage Field Day 9. If you can carve out some time, I highly recommend this.

Tech Field Day (@TechFieldDay): VMware Storage Presents at Storage Field Day 9

The panel of industry experts asks all the tough questions and the great VMware Storage team answers them all.

Storage Industry Experts
  • Alex Galbraith @AlexGalbraith
  • Chris M Evans @ChrisMEvans
  • Dave Henry @DaveMHenry
  • Enrico Signoretti @ESignoretti
  • Howard Marks @DeepStorageNet
  • Justin Warren @JPWarren
  • Mark May @CincyStorage
  • Matthew Leib @MBLeib
  • Richard Arnold @3ParDude
  • Scott D. Lowe @OtherScottLowe
  • Vipin V.K. @VipinVK111
  • W. Curtis Preston @WCPreston

VMware Virtual SAN Experts
  • Yanbing Li @ybhighheels
  • Christos Karamanolis @XtosK
  • Rawlinson Rivera @PunchingClouds
  • Vahid Fereydouny @vahidfk
  • Gaetan Castelein @gcastelein1
  • Anita Kibunguchy @kibuanita

The ~2 hour presentation was broken up into easily consumable chunks. Here’s a breakdown of the recorded session:

VMware Virtual SAN Overview

In this introduction, Yanbing Li, Senior Vice President and General Manager, Storage and Availability, discusses VMware’s company success, the state of the storage market, and the success of HCI market leading Virtual SAN in over 3000 customers.

What Is VMware Virtual SAN?

Christos Karamanolis, CTO, Storage and Availability BU, jumps into how Virtual SAN works, answers questions on the use of high endurance and commodity SSD, and how Virtual SAN service levels can be managed through VMware’s common control plane – Storage Policy Based Management.

VMware Virtual SAN 6.2 Features and Enhancements

Christos continues the discussion around VSAN features as they’ve progressed from the 1st generation Virtual SAN released on March 12, 2014 to the 2nd, 3rd, and now 4th generation Virtual SAN that was just released on March 16, 2016. The discussion in this section focuses a lot on data protection features like stretched clustering and vSphere Replication. They dove deep into how vSphere Replication can deliver application consistent protection as well as a true 5 minute RPO based on the built in intelligent scheduler sending the data deltas within the 5 minute window, monitoring the SLAs, and alerting if they cannot be met due to network issues.

VMware Virtual SAN Space Efficiency

Deduplication, Compression, and Distributed RAID 5 & 6 Erasure Coding are all now available to all flash Virtual SAN configurations. Christos provides the skinny on all these space efficiency features and how enabling them adds very little overhead on the vSphere hosts. Rawlinson chimes in on the automated way Virtual SAN can build the cluster of disks and disk groups that deliver the capacity for the shared VSAN datastore. These can certainly be built manually but VMware’s design goal is to make the storage system as automated as possible. The conversation moves to checksum and how Virtual SAN is protecting the integrity of data on disks.

VMware Virtual SAN Performance

OK, this part was incredible! Christos threw down the gauntlet, so to speak. He presented the data behind the testing that shows minimal impact on the hosts when enabling the space efficiency features. Also, he presents performance data for OLTP workloads, VDI, Oracle RAC, etc. All cards on the table here. I can’t begin to summarize, you’ll just need to watch.

VMware Virtual SAN Operational Model

Rawlinson Rivera takes over and does what he does best, throwing all caution to the wind and delivering live demonstrations. He showed the Virtual SAN Health Check and the new Virtual SAN Performance Monitoring and Capacity Management views built into the vSphere Web Client. Towards the end, Howard Marks asked about supporting future Intel NVMe capabilities and Christos’s response was that it’s safe to say VMware is working closely with Intel on ensuring the VMware storage stack can utilize the next generation devices. Virtual SAN already supports the Intel P3700 and P3600 NVMe devices.

This was such a great session I thought I’d promote it and make it easy to check it out. By the way, here’s Rawlinson wearing a special hat!

Make Storage Great Again

How to delete the RAID configuration from drives managed by the Cisco 12G SAS Modular Raid Controller

The content of this blog post was created by a couple of colleagues of mine, David Boone and Bradford Garvey. These guys do a phenomenal job making sure VMware Virtual SAN customers get a great experience with the product by helping them plan, configure, and test VSAN. Because of this, they end up uncovering interesting information like what’s to follow.

Sometimes Cisco UCS hardware intended to be used for VMware Virtual SAN has previously been configured for other uses. In these cases, sometimes a RAID configuration has already been configured on the drives. VSAN requires the individual drives be presented to ESXi either raw via the I/O controller set in Pass-Through Mode (See: How To Configure the Cisco 12G SAS Modular Raid Controller for Pass-Through Mode) or disks set in their own RAID 0 disk groups. Best practice is to set the I/O Controller in Pass-Through Mode (Enable JBOD).  However, if a RAID configuration previously existed, on the Cisco UCS platform there are a few extra steps to complete after enabling JBOD mode for the controller.

If drives were already configured as RAID virtual devices, delete the RAID configuration from the drives.  One way to do that is to Clear the entire VD configuration:

Clear the entire VD configuration

  • Log into the Cisco UCS Manager
  • Open a console to the host
  • Reboot the host
  • On boot up hit Ctrl+R to enter the Cisco 12G SAS Modular Raid Controller BIOS Configuration Utility
  • Hit Ctrl-N until the “VD Mgmt” page is selected
  • In the “VD Mgmt” screen, navigate to the controller, and press the F2 key.
  • Navigate to “Clear Configuration” and press Enter.  You should see this popup:

CiscoUCS - Remove RAID 1

  • Press “Yes” to delete all the virtual drives

Drives will then be in an “Unconfigured Good” state.  They might look something like this:

CiscoUCS - Remove RAID 2

If you see this, these 10 drives are in an “Unconfigured Good” state. They need to be converted to a JBOD state.

There are two options. You can convert a bunch of Unconfigured Good drives to JBOD drives (from the “VD Mgmt” screen) or you can convert a particular Unconfigured Good drive to a JBOD drive (from the “Drive Management” screen)

Option 1: Convert a bunch of Unconfigured Good drives to JBOD drives

Perform the following steps to convert a bunch of Unconfigured Good drives to JBOD drives:

  • In the “VD Mgmt” screen, navigate to the controller and press the F2 key.
  • Navigate to “Make JBOD”, and press Enter.
    The “Convert Unconfigured Good to JBOD” dialog appears, which shows all Unconfigured Good drives in the system.

CiscoUCS - Remove RAID 3

  • Select the Unconfigured Good drives which you want configured as JBODs for VSAN.
    To select or deselect all the Unconfigured Good drives in one go, select the topmost square brackets in the “Unconfig good drives” box.
  • Press “OK”.
    The selected Unconfigured Good drives are converted to JBOD drives.

Option 2: Convert a particular Unconfigured Good drive to a JBOD drive

Perform the following steps to convert a particular Unconfigured Good drive to a JBOD drive:

  • In the “Drive Management” screen, navigate to an Unconfigured Good drive, and press the F2 key.
  • Navigate to “Make JBOD”, and press Enter.
  • Press “OK” in the message confirmation box to continue.

After converting all the 10 drives above to JBOD, the screen looks like this:

CiscoUCS - Remove RAID 4

Result

After rebooting, the BIOS will report all 10 drives and ESXi will see all of them in a JBOD (Pass-Through) configuration, with all the benefits of JBOD like being able to retrieve S.M.A.R.T. info from the physical drives.

The information obtained to create this post was gathered from the Avago – 12Gb/s MegaRAID® SAS Software – User Guide

Thanks again to David Boone and Bradford Garvey for providing this information.

How To Configure the Cisco 12G SAS Modular Raid Controller for Pass-Through Mode

Yesterday I was at the New England VTUG event which is always a great event to meet up with familiar faces and be introduced to some new ones. I met up with a relatively new VMware Virtual SAN customer and we discussed lots of fun things about VSAN and their implementation experience. One frustrating thing they mentioned is that they couldn’t find anywhere that documented how to put the Cisco 12G SAS Modular Raid Controller in Pass-Through mode. They explained that after lots of searching on VMware and Cisco’s site, they contacted Cisco and were provided the information. They were kind enough to capture a screenshot of the setting and provide it to me.

The procedure is:

  • Log into the Cisco UCS Manager
  • Open a console to the host
  • Reboot the host
  • On boot up hit Ctrl+R to enter the Cisco 12G SAS Modular Raid Controller BIOS Configuration Utility
  • Hit Ctrl-N until the “Ctrl Mgmt” page is selected
  • In the bottom right hand corner, make sure the “Enable JBOD” field shows an X per the screen shot below.
  • Hit Ctrl-S to save
  • Reboot

Cisco 12G SAS Enable JBOD

That’s it. Easy.

If this is a brand new, unconfigured host, the unclaimed disks in the host will now get passed to ESXi and VSAN can use them for the VSAN datastore.

However, if this host IO Controller had previously been configured with RAID, you should check out: How to delete the RAID configuration from drives managed by the Cisco 12G SAS Modular Raid Controller

I hope that helps others save some time in getting VSAN up and running.

Special thanks to Stephanie Forde and Matthew Gabrick from the Boston Water and Sewer Commission for pointing this out and providing the screenshot.

Queue Depth and the FBWC Controller Cache module on the Cisco 12G SAS Modular Raid Controller for Virtual SAN

If you scan the bill of materials for the various Cisco UCS VSAN ReadyNodes you’ll see a line item for:

Controller Cache: Cisco 12Gbps SAS 1GB FBWC Cache module (Raid 0/1/5/6)

If you’ve followed Virtual SAN for a while you might wonder why the ReadyNodes would include controller cache when VMware recommends disabling controller cache when implementing Virtual SAN. Well, it turns out that the presence of the FBWC Cache module allows the queue depth of the Cisco 12G SAS Modular Raid Controller to go from the low 200s to the advertised 895. The minimum queue depth requirement for Virtual SAN is 256, so including the FBWC Cache module raises the queue depth above that minimum requirement and improves Virtual SAN performance.
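As a quick sanity check of the numbers above, here is a small Python sketch that compares a controller queue depth against the 256 minimum. The value 210 is only an illustrative stand-in for "low 200s", and the function name is hypothetical; read the real queue depth from your own host rather than from this example.

```python
VSAN_MIN_QUEUE_DEPTH = 256  # minimum controller queue depth cited above


def meets_queue_depth(queue_depth: int, minimum: int = VSAN_MIN_QUEUE_DEPTH) -> bool:
    """Return True if the controller queue depth satisfies the minimum."""
    return queue_depth >= minimum


# Illustrative values: 210 stands in for "low 200s" (assumption), 895 is the advertised depth.
scenarios = {
    "without FBWC cache module": 210,
    "with FBWC cache module": 895,
}

for label, depth in scenarios.items():
    status = "meets" if meets_queue_depth(depth) else "is below"
    print(f"{label}: queue depth {depth} {status} the minimum of {VSAN_MIN_QUEUE_DEPTH}")
```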