vSAN IP Networking Versus Fibre Channel

Today I was asked by a customer: “By taking all of our storage traffic off of our Fibre Channel network and putting it onto our IP network, won’t that cause major network congestion?”

The quick answer is no, provided you implement some form of traffic isolation. There are many ways to isolate vSAN traffic, and all of them are identical to how you would isolate NAS or iSCSI storage traffic.

vSAN traffic can be isolated using dedicated physical switches. But most customer implementations leverage existing core switches that support all other IP traffic. In this case, for vSAN, the strong recommendation is to create a dedicated VLAN for the vSAN traffic.

On the host side, you can dedicate physical NICs to vSAN traffic. Some customers do this, but others find that shared physical 10GbE NICs (2 for redundancy) on hosts provide enough bandwidth. When sharing NICs, it is recommended to implement vSphere Distributed Switches and then configure vSphere Network I/O Control (NIOC) to allocate bandwidth shares for the different vSphere traffic types. Typically vSAN should be allocated 50% of the shares.
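To see what shares mean in practice, here is a minimal Python sketch of how share-based allocation divides a saturated 10GbE uplink. Only "vSAN gets half of the shares" comes from the recommendation above. The other traffic types and share values are made up for illustration, and the model assumes every traffic type is sending at once (shares only matter under contention).

```python
def bandwidth_under_contention(shares, link_gbps=10.0):
    """Split a saturated uplink in proportion to each traffic type's shares."""
    total = sum(shares.values())
    return {name: link_gbps * value / total for name, value in shares.items()}


# Hypothetical share values: vSAN holds 100 of 200 total shares (50%).
shares = {"vsan": 100, "vm": 60, "vmotion": 30, "management": 10}

for name, gbps in bandwidth_under_contention(shares).items():
    print(f"{name:<11}{gbps:5.1f} Gbps")
```

When the link is not saturated, any traffic type can use more than its proportional slice, so this is the worst case for vSAN, not a cap.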

There is an updated networking design guide for more details on all of this here:

VMware® Virtual SAN™ Design and Sizing Guide 

vSAN Maintenance Mode Considerations

There are 3 options when putting a host in maintenance mode when that host is a member of a vSphere Cluster with vSAN enabled.  You follow the normal process to put a host in maintenance mode, but if vSAN is enabled, these options will pop up:

  1. Ensure accessibility
  2. Full data migration
  3. No data migration

There’s a 4th consideration that I’ll describe at the end.

I would expect most virtualization administrators to pick “Ensure accessibility” almost every time.

Ensure accessibility

Before we investigate, I want to reinforce that vSAN, by default, is designed to work and continue to provide VM’s access to data even if a host disappears.  The default vSAN policy is “Number of Failures To Tolerate” equal to 1 (#FTT=1), which means a HDD, SSD, or whole host (thus all the SSD and HDD on that host) can be unavailable, and data is available somewhere else on another host in the cluster.  If a host is in maintenance mode, then it is down, but vSAN by default has another copy of the data on another host.

VMware documents the options here:

Place a Member of Virtual SAN Cluster in Maintenance Mode

Ensure accessibility

This option will check to make sure that putting the particular host in maintenance mode will not take away the only data copy of any VM.  There are two scenarios I can think of that this would happen:

  • In Storage Policy Based Management, you created a Storage Policy based on vSAN with #FTT=0 and attached at least 1 VM to that policy and that VM has data on the host going into maintenance mode.
  • Somewhere in the cluster you have failed drives or hosts and vSAN self-healing rebuilds haven’t completed. You then put a host into maintenance mode and that host has the only good copy of data remaining.

As rare as these scenarios are, they are possible.  By choosing the “Ensure accessibility” option, vSAN will find the single copies of data on that host and regenerate them on other hosts. Now when the host goes into maintenance mode, all VM data is available.  This is not a full migration of all the data off that host; it’s just a migration of the necessary data to “ensure accessibility” by all the VMs in the cluster.  The host may take a little longer to enter maintenance mode while this migration completes, but you’ll know that VMs won’t be impacted.  During the maintenance of this host, some VMs will likely be running in a degraded state with one fewer copy than the policy specifies.  Personally, I think this choice makes the most sense most of the time. It is the default selection, and I expect vSphere administrators to choose this option almost every time.
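The check behind “Ensure accessibility” can be pictured with a toy model. This Python sketch is my own simplification, not vSAN's actual logic: it tracks only which hosts hold a healthy data copy of each object (host and object names are hypothetical) and ignores witness components.

```python
def sole_copies_on(host, healthy_copies):
    """Return the objects whose only healthy copy lives on `host`."""
    return sorted(
        obj for obj, hosts in healthy_copies.items()
        if hosts == {host}
    )


# Hypothetical placement: object name -> hosts holding a healthy copy.
healthy_copies = {
    "vm-a.vmdk": {"esx01", "esx02"},  # FTT=1, both mirrors healthy
    "vm-b.vmdk": {"esx01"},           # policy with FTT=0, single copy
    "vm-c.vmdk": {"esx01"},           # FTT=1, other mirror lost, rebuild pending
    "vm-d.vmdk": {"esx02", "esx03"},  # no data on esx01 at all
}

# These objects would have to be copied elsewhere before esx01 goes down.
print(sole_copies_on("esx01", healthy_copies))
```

In this model only vm-b and vm-c need to move, which is why this option is usually much faster than a full data migration.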

No data migration

This option puts the host in maintenance mode no matter what’s going on in the cluster.  I would expect virtualization administrators to almost never pick this option unless:

  • You know the cluster is completely healthy (no disk or host failures anywhere else)
  • The VM’s that would be impacted aren’t critical.
  • All the VM’s in the cluster are powered off.

For the reasons explained in the “Ensure accessibility” section above, it’s possible that the host going into maintenance mode has the only good copy of the data.  If this is not a problem, then choose this option for the fastest way to put a host into maintenance mode.  Otherwise, choose “Ensure accessibility”.

Full data migration

I would expect virtualization administrators to choose this option less frequently than “Ensure accessibility”, but they will choose it for a few reasons:

  • The host is being replaced by a new one.
  • The host will be down for a long time, longer than the normal maintenance window of applying a patch and rebooting.
  • You want to maintain the #FTT availability for all VM’s during the maintenance window

Keep in mind, if you choose this option you must have 4 or more hosts in your cluster, and you don’t mind waiting for the data migration to complete.  The time to complete the data migration is dependent on the amount of capacity consumed on the host going into maintenance mode.  Yes, this could take some time.  The laws of physics apply.  10GbE helps to move more data in the same amount of time. And it helps if the overall environment is not too busy.

When the migration is complete, the host is essentially evacuated out of the cluster and all its data is spread across the remaining hosts.  VMs will not be running in a degraded state during the maintenance window and will be able to tolerate the failures per their #FTT policy.
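To put rough numbers on “the laws of physics apply”, here is a back-of-the-envelope Python estimate of evacuation time. The `efficiency` factor is purely an assumption standing in for protocol overhead and competing traffic, and the 8 TB figure is hypothetical. Real times also depend on disk speed and how busy the cluster is.

```python
def evacuation_hours(consumed_tb, link_gbps=10.0, efficiency=0.5):
    """Rough lower bound: data to move divided by usable network throughput."""
    bits_to_move = consumed_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


# Hypothetical host with 8 TB of consumed capacity.
for gbps in (1.0, 10.0):
    hours = evacuation_hours(8, link_gbps=gbps)
    print(f"{gbps:>4.0f} GbE: about {hours:.1f} hours")
```

The tenfold difference between 1GbE and 10GbE is the point of the comparison; treat the absolute values as order-of-magnitude only.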

4th consideration

I mentioned there is a 4th consideration.  For the VMs that you want protected with at least two copies of data (the equivalent of #FTT=1) even during maintenance windows, you have two options.  One is to choose “Full data migration” as described above.  The other is to set #FTT=2 for those VMs so they have 3 copies on 3 different hosts.  If one of those hosts is in maintenance mode and you didn’t choose “Full data migration”, then you still have 2 copies on other hosts, thus the VMs could tolerate another failure of a disk or host.  To do this, create a storage policy based on vSAN with #FTT=2 and attach your most critical VMs to it.  For more information on running business critical applications on vSAN see:

Running Microsoft Business Critical Application on Virtual SAN 6.0
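The copy counts in this 4th consideration can be worked out with a few lines of Python. This sketch assumes mirrored (RAID-1) objects, where #FTT=n means n+1 data copies and the standard minimum of 2n+1 hosts once witness components are included. The function names are mine.

```python
def mirror_layout(ftt):
    """Mirroring: FTT+1 data copies, 2*FTT+1 hosts (copies plus witnesses)."""
    return {"copies": ftt + 1, "min_hosts": 2 * ftt + 1}


def copies_during_maintenance(ftt, hosts_down=1):
    """Worst case: every unavailable host held one copy of the object."""
    return max(mirror_layout(ftt)["copies"] - hosts_down, 0)


for ftt in (1, 2):
    layout = mirror_layout(ftt)
    left = copies_during_maintenance(ftt)
    print(f"FTT={ftt}: {layout['copies']} copies, "
          f"at least {layout['min_hosts']} hosts, "
          f"{left} copies left with one host in maintenance")
```

With #FTT=2 the cluster needs at least 5 hosts, so this option only applies to clusters of that size or larger.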

I hope this helps in your decision making while administering vSAN.  I recommend testing the scenarios prior to implementing a cluster in production so you get a feel for the various options.

Podcast Fun!

In my role I have to drive a lot around New England. To pass the time I listen to a number of podcasts. Some of my favorites include:

Job Related:

Fun stuff:

But by far my favorite and the most entertaining is:

Virtually Speaking

I guess it’s partly because it focuses on storage for VMware environments, but, it’s also because Pete Flecha and John Nicholson are the right amount of funny, geek, and attitude all rolled into one.

A few weeks ago I had the chance to sit with John Nicholson and Duncan Epping to record some sound bites regarding customer experiences with vSAN in the field. I get to meet and work with a lot of remarkable customers up and down the eastern USA, and over the last 3 years I’ve seen them accomplish great things with vSAN. You name an application or use case and it’s pretty likely it’s being done with vSAN. I was able to share a few stories, as was Josh Fidel (@jcefidel), who’s doing great things with vSAN at customers in the Michigan, Ohio, Indiana, and Kentucky areas. He’s no SLOB and don’t let him fool you, he’s as smart as he is interesting. Check out what I mean by listening to this episode:

Virtually Speaking Podcast Episode 36: vSAN Use Cases

https://blogs.vmware.com/virtualblocks/2017/02/21/vspeaking-podcast-episode-36-vsan-use-cases/

 

 

 

vSAN and Data-At-Rest Encryption – Rebooted (i.e. Part 2)

 

Encryption is here, now shipping with vSphere 6.5.

I first wrote about vSAN and Encryption here:

Virtual SAN and Data-At-Rest Encryption – https://livevirtually.net/2015/10/21/virtual-san-and-data-at-rest-encryption/

At the time, I knew what was coming but couldn’t say. Also, the vSAN team had plans that changed. So, let’s set the record straight.

vSAN

  • Does not support Self Encrypting Drives (SEDs) with encryption enabled.
  • Does not support controller based encryption.
  • Supports 3rd party software based encryption solutions like HyTrust DataControl and Dell EMC CloudLink.
  • Supports the VMware VM Encryption released with vSphere 6.5
  • Will support its own VMware vSAN Encryption in a future release.

At VMworld 2016 in Barcelona VMware announced vSphere 6.5 and with it, VM Encryption. In the past, VMware relied on 3rd party encryption solutions, but now, VMware has its own. For more details, check out:

What’s new in vSphere 6.5: Security – https://blogs.vmware.com/vsphere/2016/10/whats-new-in-vsphere-6-5-security.html

In this, Mike Foley briefly highlights a few advantages of VM Encryption. Stay tuned for more from him on this topic.

In addition to what Mike highlighted, VM Encryption is implemented using VAIO filters, can be enabled per VM object (vmdk), will encrypt VM data no matter what storage solution is implemented (e.g. object, file, block using vendors like VMware vSAN, Dell Technologies, NetApp, IBM, HDS, etc.), and satisfies both data-in-flight and data-at-rest encryption requirements. The solution does not require SEDs, so it works with all the commodity HDD, SSD, PCIe, and NVMe devices, and it integrates with several third party Key Management solutions. Since VM Encryption is set via policy, that policy could be extended to public clouds like Cloud Foundation on IBM SoftLayer, VMware Cloud on AWS, VMware vCloud Air, or to any vCloud Air Network partner. This is great because your VMs could live in the cloud but you will own and control the encryption keys. And you can use different keys for different VMs.

At VMworld 2016 in Las Vegas VMware announced the upcoming vSAN Beta. For more details see:

Virtual SAN Beta – Register Today! – https://blogs.vmware.com/virtualblocks/2016/09/07/virtual-san-beta-register-today/

This vSAN Beta includes vSAN Encryption targeted for a future release of vSphere. vSAN Encryption will satisfy data-at-rest encryption. You might ask why vSAN Encryption would be necessary if vSphere has VM Encryption. I will say that you should always look to use VM Encryption first. The one downside to VM Encryption is that since the VM’s data is encrypted as soon as it leaves the VM and hits the ESXi kernel, each block is unique, so no matter what storage system that data goes to (e.g. VMware vSAN, Dell Technologies, NetApp, IBM, HDS, etc.) that block can’t be deduped or compressed. The benefit of vSAN Encryption will be that the encryption is done at the vSAN level. Data will be sent to the vSAN cache and encrypted at that tier. When it is later destaged, it will be decrypted, deduped, compressed, and encrypted again when it’s written to the capacity tier. This satisfies the data-at-rest encryption requirements but not data-in-flight. It does allow you to take advantage of the vSAN dedupe and compression data services, and it’s one key for the entire vSAN datastore.

It should be noted that both solutions will require a 3rd party Key Management Server (KMS) and the same one can be used for both VM Encryption and vSAN Encryption. The KMS must support the Key Management Interoperability Protocol (KMIP) 1.1 standard. There are many that do and VMware has tested a lot of them. We’ll soon be publishing a list, but for now, check with your KMS vendor or your VMware SE for details.

VMware is all about customer choice. So, we offer a number of software based encryption options depending on your requirements.

It’s worth restating that VM Encryption should be the standard for software based encryption for VMs. After reviewing vSAN Encryption, some may choose it instead if they want to take advantage of deduplication and compression. Duncan Epping provides a little more detail here:

The difference between VM Encryption in vSphere 6.5 and vSAN encryption – http://www.yellow-bricks.com/2016/11/07/the-difference-between-vm-encryption-in-vsphere-6-5-and-vsan-encryption/

 

In summary:

  1. Use VM Encryption for Hybrid vSAN clusters
  2. Use VM Encryption on All-Flash if storage efficiency (dedupe/compression) is not critical
  3. Wait for vSAN native software data at rest encryption if you must have dedupe/compression on All-Flash
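The three rules in this summary reduce to one small decision function. This Python sketch only restates the summary above; the function name and return strings are mine.

```python
def recommend_encryption(all_flash, needs_dedupe_compression):
    """Apply the three summary rules for choosing an encryption option."""
    if all_flash and needs_dedupe_compression:
        # Rule 3: VM Encryption would defeat dedupe/compression.
        return "wait for vSAN native encryption"
    # Rules 1 and 2: hybrid clusters, or all-flash where space
    # efficiency is not critical.
    return "VM Encryption"


print(recommend_encryption(all_flash=False, needs_dedupe_compression=False))
print(recommend_encryption(all_flash=True, needs_dedupe_compression=False))
print(recommend_encryption(all_flash=True, needs_dedupe_compression=True))
```

Hybrid clusters always land on VM Encryption here because vSAN dedupe and compression are all-flash features, so there is nothing to lose.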

 

Correlating vSAN versions with vSphere (vCenter & ESXi) Versions

I often get asked if a certain version of vSAN can be deployed on a different version of vSphere. The answer is no. vSAN is built into the vSphere version. That means vCenter needs to be upgraded to the correct version of vCenter and all the hosts in the cluster need to be upgraded to the correct version of ESXi in order to get the features of that version of vSAN. Lastly, vSAN formats each disk drive with an on-disk format, so to get the full features of a specific release, you may need to update the on-disk format.

Here’s basically how everything breaks down:

  • If you have vSphere 5.5 (vCenter Server 5.5 & ESXi 5.5) then you have vSAN 5.5.
  • If you have vSphere 6.0 (vCenter Server 6.0 & ESXi 6.0) then you have vSAN 6.0.
  • If you have vSphere 6.0 U1 (vCenter Server 6.0 Update 1 & ESXi 6.0 Update 1) then you have vSAN 6.1.
  • If you have vSphere 6.0 U2 (vCenter Server 6.0 Update 2 & ESXi 6.0 Update 2) then you have vSAN 6.2.
  • If you have vSphere 6.5 (vCenter Server 6.5 & ESXi 6.5) then you have vSAN 6.5.
  • If you have vSphere 6.5.0d (vCenter Server 6.5.0d & ESXi 6.5.0d) then you have vSAN 6.6.
  • If you have vSphere 6.5 Update 1 (vCenter Server 6.5 Update 1 & ESXi 6.5 Update 1) then you have vSAN 6.6.1.
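The breakdown above is really a lookup table, and it is easy to carry around as one. This Python sketch encodes it and adds one simplifying assumption of my own: the cluster only counts as being on a vSAN version when vCenter and every host are on the same vSphere release. The release labels are just shorthand keys.

```python
# vSphere release -> vSAN version, taken from the list above.
VSAN_BY_VSPHERE = {
    "5.5": "5.5",
    "6.0": "6.0",
    "6.0 U1": "6.1",
    "6.0 U2": "6.2",
    "6.5": "6.5",
    "6.5.0d": "6.6",
    "6.5 U1": "6.6.1",
}


def cluster_vsan_version(vcenter, esxi_hosts):
    """Return the vSAN version, or None while releases are mixed or unknown."""
    releases = {vcenter, *esxi_hosts}
    if len(releases) != 1:
        return None  # mid-upgrade: finish vCenter and all hosts first
    return VSAN_BY_VSPHERE.get(releases.pop())


print(cluster_vsan_version("6.5 U1", ["6.5 U1", "6.5 U1", "6.5 U1"]))
print(cluster_vsan_version("6.5 U1", ["6.5 U1", "6.5.0d", "6.5 U1"]))
```

The second call returns None because one host is still on 6.5.0d. Even once the releases match, the on-disk format may still need upgrading to get every feature.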

Here’s a more detailed matrix:

Version | Release Date | Build Number | Installer Build Number | vSAN Version | vSAN On-Disk Format (Web Client)
ESXi 6.5 Update 1 | 2017-07-27 | 5969303 | N/A | 6.6.1 | 5
ESXi 6.5.0d | 2017-04-18 | 5310538 | N/A | 6.6 | 5
ESXi 6.5 Express Patch 1a | 2017-03-28 | 5224529 | N/A | 6.5 | 2.5, 3
ESXi 6.5 Patch 01 | 2017-03-09 | 5146846 | 5146843 | 6.5 | 2.5, 3
ESXi 6.5.0a | 2017-02-02 | 4887370 | N/A | 6.5 | 2.5, 3
ESXi 6.5 GA | 2016-11-15 | 4564106 | N/A | 6.5 | 2.5, 3
ESXi 6.0 Express Patch 7a | 2017-03-28 | 5224934 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Update 3 | 2017-02-24 | 5050593 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Patch 4 | 2016-11-22 | 4600944 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Express Patch 7 | 2016-10-17 | 4510822 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Patch 3 | 2016-08-04 | 4192238 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Express Patch 6 | 2016-05-12 | 3825889 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Update 2 | 2016-03-16 | 3620759 | N/A | 6.2 | 2.5, 3
ESXi 6.0 Express Patch 5 | 2016-02-23 | 3568940 | N/A | 6.1 | 2
ESXi 6.0 Update 1b | 2016-01-07 | 3380124 | N/A | 6.1 | 2
ESXi 6.0 Express Patch 4 | 2015-11-25 | 3247720 | N/A | 6.1 | 2
ESXi 6.0 U1a (Express Patch 3) | 2015-10-06 | 3073146 | N/A | 6.1 | 2
ESXi 6.0 U1 | 2015-09-10 | 3029758 | N/A | 6.1 | 2
ESXi 6.0.0b | 2015-07-07 | 2809209 | N/A | 6.0 | 2
ESXi 6.0 Express Patch 2 | 2015-05-14 | 2715440 | N/A | 6.0 | 2
ESXi 6.0 Express Patch 1 | 2015-04-09 | 2615704 | 2615979 | 6.0 | 2
ESXi 6.0 GA | 2015-03-12 | 2494585 | N/A | 6.0 | 2
ESXi 5.5 Patch 10 | 2016-12-20 | 4722766 | 4761836 | 5.5 | 1
ESXi 5.5 Patch 9 | 2016-09-15 | 4345813 | 4362114 | 5.5 | 1
ESXi 5.5 Patch 8 | 2016-08-04 | 4179633 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 10 | 2016-02-22 | 3568722 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 9 | 2016-01-04 | 3343343 | N/A | 5.5 | 1
ESXi 5.5 Update 3b | 2015-12-08 | 3248547 | N/A | 5.5 | 1
ESXi 5.5 Update 3a | 2015-10-06 | 3116895 | N/A | 5.5 | 1
ESXi 5.5 Update 3 | 2015-09-16 | 3029944 | N/A | 5.5 | 1
ESXi 5.5 Patch 5 re-release | 2015-05-08 | 2718055 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 7 | 2015-04-07 | 2638301 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 6 | 2015-02-05 | 2456374 | N/A | 5.5 | 1
ESXi 5.5 Patch 4 | 2015-01-27 | 2403361 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 5 | 2014-12-02 | 2302651 | N/A | 5.5 | 1
ESXi 5.5 Patch 3 | 2014-10-15 | 2143827 | N/A | 5.5 | 1
ESXi 5.5 Update 2 | 2014-09-09 | 2068190 | N/A | 5.5 | 1
ESXi 5.5 Patch 2 | 2014-07-01 | 1892794 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 4 | 2014-06-11 | 1881737 | N/A | 5.5 | 1
ESXi 5.5 Update 1a | 2014-04-19 | 1746018 | N/A | 5.5 | 1
ESXi 5.5 Express Patch 3 | 2014-04-19 | 1746974 | N/A | 5.5 | 1
ESXi 5.5 Update 1 | 2014-03-11 | 1623387 | N/A | 5.5 | 1
ESXi 5.5 Patch 1 | 2013-12-22 | 1474528 | N/A | 5.5 | 1
ESXi 5.5 GA | 2013-09-22 | 1331820 | N/A | 5.5 | 1
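If you only have a build number in hand, the matrix above can be used in reverse. This Python sketch carries a handful of rows from the matrix (not all of them) and looks up the vSAN version from a version string. I am assuming the string has the form printed by `vmware -v` on a host, e.g. "VMware ESXi 6.5.0 build-5310538".

```python
import re

# A subset of the matrix above: build number -> (release, vSAN version).
BUILDS = {
    5969303: ("ESXi 6.5 Update 1", "6.6.1"),
    5310538: ("ESXi 6.5.0d", "6.6"),
    4564106: ("ESXi 6.5 GA", "6.5"),
    3620759: ("ESXi 6.0 Update 2", "6.2"),
    3029758: ("ESXi 6.0 U1", "6.1"),
    2494585: ("ESXi 6.0 GA", "6.0"),
    1331820: ("ESXi 5.5 GA", "5.5"),
}


def vsan_from_version_string(text):
    """Extract 'build-NNNNNNN' and return (release, vSAN version) or None."""
    match = re.search(r"build-(\d+)", text)
    if not match:
        return None
    return BUILDS.get(int(match.group(1)))


print(vsan_from_version_string("VMware ESXi 6.5.0 build-5310538"))
```

A build that is missing from the dictionary returns None, which is a cue to check the full matrix or the KB articles below rather than guess.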

 

As a reference, see:

Build numbers and versions of VMware vSAN (2150753) – This is a new KB post that went up on July 31, 2017 which provides the same information as above.

Build numbers and versions of VMware vCenter Server (2143838)

Build numbers and versions of VMware ESXi/ESX (2143832)

Understanding vSAN on-disk format versions (2145267)

 

 

 

 

 

How to Change All of the vSAN VMkernel Port IP Addresses in a vSphere Cluster.

Several months ago I was asked how to change all the vSAN VMkernel port IP Addresses in a vSphere cluster and today I was asked again, so here it is.

Non-Disruptively

This assumes each host has 2 VMkernel ports (a and b) enabled for vSAN traffic, so vSAN keeps communicating over one while you change the other.

  • Disable vSAN traffic on each of the b networking interfaces on each host
  • Change the IP addresses on each of the b networking interfaces on each host
  • Move the physical network cable if moving to new switch ports
  • Re-enable vSAN traffic on each of the b networking interfaces on each host
  • Verify communication between all the b networking interfaces using a vmkping test
  • Repeat for all the a networking interfaces
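The non-disruptive steps above can be turned into a dry-run plan before you touch anything. This Python sketch only prints commands; it does not run them. The host names, vmk number, and IP addresses are hypothetical, and the esxcli syntax is what I would expect on ESXi 6.x, so treat it as an assumption and confirm it with the esxcli help on your build first. Moving cables is a manual step and is not represented.

```python
from itertools import permutations


def plan_for_interface(vmk, new_ips, netmask):
    """Build an ordered (host, command) list for one vSAN VMkernel port."""
    steps = []
    for host in new_ips:  # 1. disable vSAN traffic on the port
        steps.append((host, f"esxcli vsan network ip remove -i {vmk}"))
    for host, ip in new_ips.items():  # 2. change the IP address
        steps.append((host, "esxcli network ip interface ipv4 set "
                            f"-i {vmk} -I {ip} -N {netmask} -t static"))
    for host in new_ips:  # 3. re-enable vSAN traffic
        steps.append((host, f"esxcli vsan network ip add -i {vmk}"))
    for src, dst in permutations(new_ips, 2):  # 4. verify every pair
        steps.append((src, f"vmkping -I {vmk} {new_ips[dst]}"))
    return steps


# Hypothetical "b" interface (vmk2) and new addresses for a 3-host cluster.
new_ips = {"esx01": "10.10.20.11", "esx02": "10.10.20.12", "esx03": "10.10.20.13"}
plan = plan_for_interface("vmk2", new_ips, "255.255.255.0")

for host, command in plan:
    print(f"[{host}] {command}")
```

Run the same plan again for the a interfaces only after every vmkping in the b plan succeeds.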

Disruptively (downtime is OK and/or the hosts are being moved)

 

vSAN,… correction it’s VSAN,… OK, OK, it’s vSAN now, VSAN and Virtual SAN are wrong.

I spent 4 years at EMC prior to moving to VMware over 3 years ago to join the Software Defined Storage team. At EMC it was always a challenge to get the acronyms and names correct. When Acadia (VCE) first came out with the “Vblock” everybody wanted to type it as “vBlock”. I’d always try to subtly correct it and hope people got the hint. Other times I’d straight out correct them and feel like a jerk. But, to me, using the proper name was and is important. The same problem happened with “VPLEX”; everyone wanted to type it “vPLEX”. Why did people want to do this? Well, it’s VMware’s fault because they named things like “vSphere” and “vCenter” and later “vCloud” and “vRealize”. So when I joined VMware it was odd to me that we called our upcoming product “VSAN” and not “vSAN”. I’ve spent 3 years explaining to customers and VMware people, one way or another, that publicly and in product documentation VMware could actually only call our product “Virtual SAN”. Many people, including me, got lazy and called it “VSAN”… but it was definitely not “vSAN”. Well, yesterday that all changed. Without going into detail, “vSAN” is the only name to use. Virtual SAN and VSAN are no more. Now I have to go fix all my spell checkers.