Recently, with the announcement of the availability of VVols in vSphere.NEXT, I was asked to give a deep dive presentation to a customer with a focus on what VVols meant for protecting VMs. While at EMC as a vSpecialist I led a group focused on protecting VMs, so this is something I’ve been interested in for a while. I’m a big fan of RecoverPoint and am excited about virtual RecoverPoint’s ability to offer continuous data protection for VSAN as I indicated here. I’m also a huge fan of VPLEX and spent a lot of time during my days at EMC discussing what it could do. The more I dug into what VVols could do to help with various VM movement and data protection schemes, the more I realized there was much to be excited about but also much in need of clarification. So, after some research, phone calls, and email exchanges with people in the know, I gathered the information and felt it would be worth sharing.
What follows is kind of an “everything but the kitchen sink” post on various ways to move and protect VMs. There were several pieces of the puzzle to put together, so here are the past, present, and future options.
XvMotion (Enhanced vMotion) – vMotion without shared storage – Released in vSphere 5.1
In vSphere 5.1 VMware eliminated the shared storage requirement of vMotion.
- vMotion – vMotion can be used to non-disruptively move a VM from one host to another provided both hosts have access to the same shared storage (i.e. a datastore backed by a LUN or volume on a storage array or shared storage device). Prior to vSphere 5.1 this was the only option to non-disruptively move a VM between hosts.
- Storage vMotion – This allows a VM’s vmdks to be non-disruptively moved from one datastore to another provided the host has access to both.
- XvMotion – As of vSphere 5.1, XvMotion allows a VM on one host, regardless of the storage it is using, to be non-disruptively moved to another host, regardless of the storage that host is using. Shared storage is no longer a requirement. The data is moved through the vMotion network. This was a major step towards VM mobility freedom, especially when you think of moving workloads in and out of the cloud.
- For more information see: Requirements and Limitations for vMotion Without Shared Storage
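To make the difference between the three concrete, here is a minimal Python sketch that picks the migration type from what changes between source and destination. The `Placement` class and `migration_type` function are names I made up for illustration; this is not a VMware API, just the rule of thumb from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of where a VM runs and where its disks live."""
    host: str
    datastore: str


def migration_type(src: Placement, dst: Placement) -> str:
    """Name the migration needed to get a running VM from src to dst."""
    host_changes = src.host != dst.host
    storage_changes = src.datastore != dst.datastore
    if host_changes and storage_changes:
        return "XvMotion"          # host and storage both change, no shared storage needed
    if host_changes:
        return "vMotion"           # same datastore, so both hosts must see it
    if storage_changes:
        return "Storage vMotion"   # same host, so it must see both datastores
    return "none"


print(migration_type(Placement("esx01", "ds-a"), Placement("esx02", "ds-b")))
```

The last line prints `XvMotion`, since both the host and the datastore differ.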
Cross-vCenter vMotion – Announced at VMworld 2014, available in vSphere.NEXT (future release)
This new feature was announced during the VMworld 2014 US – General Session – Tuesday.
You can check out the session here:
- Cross-vCenter vMotion will use VMware Network File Copy (NFC) and the vMotion network for live VM transfer to move VMs and their data from a host managed by one vCenter to a host managed by another vCenter, essentially extending the capabilities of XvMotion to work between vCenters.
- The difficult problem that NFC solves is copying a file that has an open lock.
- There are no API requirements with storage arrays to take advantage of this feature.
- The latency tolerance for Cross-vCenter vMotion will be ~100ms. It will be an asynchronous replication operation until all the data is copied to the other site; then the VM will be stunned in order to migrate its memory state. The amount of memory in the VM and the network latency will affect the length of the stun and thus how well the application tolerates it. When a link with a round-trip latency of 5ms or higher is used, the likely use case is to move one VM or a few VMs at a time. This would not be an option for moving a whole data center full of VMs for disaster avoidance purposes.
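A quick back-of-the-envelope calculation shows why this is a "few VMs at a time" feature. The Python sketch below checks a link against the ~100ms figure and estimates the bulk copy time from data size and link speed. The function names and the example numbers are my own assumptions, and the estimate deliberately ignores protocol overhead, re-copying of changed blocks, and the final memory stun, so treat it as a lower bound at best.

```python
LATENCY_TOLERANCE_MS = 100.0  # approximate tolerance quoted above


def within_latency_tolerance(rtt_ms: float) -> bool:
    """True if the round-trip latency is inside the ~100ms tolerance."""
    return rtt_ms <= LATENCY_TOLERANCE_MS


def bulk_copy_hours(data_gb: float, link_gbps: float) -> float:
    """Idealized hours to copy data_gb gigabytes over a link_gbps (gigabit/s) link."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    seconds = data_gb * 8 / link_gbps  # gigabytes -> gigabits, then divide by rate
    return seconds / 3600


# Hypothetical example: one 500 GB VM versus 200 of them over a 1 Gbit/s link.
one_vm = bulk_copy_hours(500, 1)
many_vms = bulk_copy_hours(500 * 200, 1)
print(f"one VM: {one_vm:.1f} h, 200 VMs: {many_vms:.0f} h")
```

With these made-up numbers, one VM takes a little over an hour while 200 VMs copied one after another would take more than nine days, which is not much help when you are trying to get out of the way of a disaster.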
VVols – Announced at VMworld 2014, available in vSphere.NEXT
There have been many great articles written about VMware Virtual Volumes (VVols) so I’ll just provide a diagram and a quick glossary of terms.
- LUNs/Volumes – With VVols, LUNs/Volumes go away (in many implementations; some storage array vendors may keep them in theirs). Without VVols, the job of a LUN/Volume was to:
- Provide a place to store data
- Provide the access point between hosts and the place to store data
- Storage Containers – pools of disks from a storage array (e.g. RAID group, pool, etc. whatever the array calls it). This takes on the role of providing a place to store data. Note: VMware Virtual SAN is a Storage Container.
- Protocol Endpoints – These take on the role of providing the connectivity mapping between hosts and the place to store data. Arrays can have multiple Protocol Endpoints of different protocols (Fibre Channel, iSCSI, NFS) all pointing to the same storage container. vSphere will be able to scan for and discover these.
- VASA – vSphere APIs for Storage Awareness. Each storage device requires a VASA/VVol provider to advertise the capabilities of its Storage Containers to vCenter and vSphere and to create VVols.
- SPBM – Storage Policy Based Management. This is a framework for managing access to storage containers when VASA providers are registered. Various policies can be created to achieve various service levels based on the features and capabilities of Storage Containers. Multiple policies (service levels) can be set to take advantage of different sets of features of the storage containers.
- VM objects – Home, swap, vmdk, snaps/clones, vendor specific. A VM is made up of up to 5 VM objects. These VM objects can be assigned to different policies to inherit different service levels.
- VVols – When a VM object is attached to an SPBM policy, SPBM talks to the VASA/VVol provider to tell the storage container to create VVols that meet the defined policy. The 5 types of VM objects that can be created as VVols are:
- Config-VVol – Metadata (home)
- Swap-VVol – Swap files
- Data-VVol – vmdks
- Mem-VVol – Snapshots/Clones
- Other-VVol – Vendor solution specific
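The glossary can be summarized as a small data model. The Python sketch below maps each VM object to the VVol type listed above and pairs it with whatever policy the object was assigned. The policy names ("gold", "silver") and the `plan_vvols` helper are hypothetical; in a real environment SPBM and the array's VASA/VVol provider do this work, not user code.

```python
from enum import Enum
from typing import Dict, List, Tuple


class VVolType(Enum):
    CONFIG = "Config-VVol"  # metadata (home)
    SWAP = "Swap-VVol"      # swap files
    DATA = "Data-VVol"      # vmdks
    MEM = "Mem-VVol"        # snapshots/clones
    OTHER = "Other-VVol"    # vendor solution specific


# VM object -> VVol type, following the list above.
VM_OBJECT_TO_VVOL: Dict[str, VVolType] = {
    "home": VVolType.CONFIG,
    "swap": VVolType.SWAP,
    "vmdk": VVolType.DATA,
    "snapshot": VVolType.MEM,
    "vendor": VVolType.OTHER,
}


def plan_vvols(object_policies: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (vm_object, vvol_type, policy) for every VM object given a policy."""
    return [
        (vm_object, VM_OBJECT_TO_VVOL[vm_object].value, policy)
        for vm_object, policy in object_policies.items()
    ]


# Hypothetical VM: metadata on a cheaper service level, data disk on a better one.
for row in plan_vvols({"home": "silver", "vmdk": "gold"}):
    print(row)
```

The point of the per-object mapping is that each VM object can land on a different service level, which was not possible when everything in a VM shared one LUN.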
If you want to participate in the Virtual Volumes Beta check out this great blog post and link to sign up.
Long Distance vMotion (aka LVM or vMotion over Distance) – Announced at VMworld 2014, available in vSphere.NEXT
This new feature was announced during the VMworld 2014 US – General Session – Tuesday. Again, you can check out the session here:
Long Distance vMotion will work with or without storage array or host based replication.
- Without replication, it will work much the same as Cross-vCenter vMotion and will work without VVols.
- The latency tolerance for this will be ~100ms.
- Likely use case is to move one VM or a few VMs at a time. This would not be an option for moving a whole data center full of VMs for disaster avoidance purposes.
- With replication, initially, it will require an active/active stretched LUN and storage that supports VASA and VVols. For example: EMC VPLEX Metro and vSphere Metro Storage Cluster (vMSC). VPLEX would need to register a VASA/VVol provider with vCenter and keep the data synchronously replicated between sites. Other active/passive vMSC solutions would not be supported.
- With vSphere.NEXT, a VVol will have a 1-to-1 relationship with a VPLEX distributed virtual volume.
- LVM and VPLEX would work together to properly handle VMs moving from one site to another, as well as network partitions, to ensure VMs stay active. No manual intervention is necessary.
- Note: vMotion over Distance with EMC VPLEX Metro is currently supported in vSphere 5.5. The new benefit in vSphere.NEXT is LVM and VPLEX working together to properly handle VMs moving from one site to another.
- Starting with VPLEX GeoSynchrony 5.2 and ESXi 5.5, VPLEX round-trip time for a non-uniform host access configuration is supported up to 10 milliseconds. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021215
- This LVM with replication use case could potentially satisfy a data center disaster avoidance scenario to move many VMs from one data center to another.
- Other storage array or host based active/active replication technologies may also work with LVM so verify with your vendors.
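The two Long Distance vMotion paths can be written down as a simple decision function. This Python sketch only encodes the conditions described above as I understand them; the `SiteLink` class, its field names, and the returned strings are my own invention, and it leaves out details such as the VPLEX round-trip limit, so verify anything real with your vendors.

```python
from dataclasses import dataclass
from typing import Optional

LATENCY_TOLERANCE_MS = 100.0  # approximate tolerance quoted above


@dataclass(frozen=True)
class SiteLink:
    """Hypothetical description of the connection between two sites."""
    rtt_ms: float
    replication: Optional[str] = None  # None, "active/active" or "active/passive"
    vasa_vvol_provider: bool = False   # storage registers a VASA/VVol provider


def lvm_option(link: SiteLink) -> str:
    """Return which Long Distance vMotion path, if any, fits the link."""
    if link.replication is None:
        if link.rtt_ms <= LATENCY_TOLERANCE_MS:
            return "no-replication: move one or a few VMs at a time"
        return "unsupported: latency above ~100ms"
    if link.replication == "active/active" and link.vasa_vvol_provider:
        return "replication: candidate for moving many VMs"
    return "replicated path unsupported: needs active/active storage with VASA and VVols"


print(lvm_option(SiteLink(rtt_ms=5, replication="active/active", vasa_vvol_provider=True)))
```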
Long Distance vMotion with Active/Passive Replication
This is a highly sought-after feature that VMware customers have been requesting for a long time. VMware is working closely with various storage vendors on this topic, but there is currently no public commitment to deliver this capability. However, this was demonstrated a few years ago at VMworld 2011 here using a custom vSphere vmkernel.
BCO1916.1 – Site Recovery Manager and Stretched Storage: Tech Preview of a New Approach to Active-Active Data Centers
This capability was presented in a breakout session at VMworld 2014. You can check out the session here:
- The current challenges of a vSphere Metro Storage Cluster (vMSC stretched cluster) are:
- With a single vCenter managing both sites, if the site with the vCenter fails then vCenter can be manually restarted on the surviving site.
- DRS and HA are not site-aware, so DRS affinity rules have to be implemented and managed.
- Orchestration – HA will restart VMs based on restart order
- Testability – Planned migration, disaster avoidance, and unplanned failures can be tested manually.
- VMware is considering adding support for Site Recovery Manager (SRM) with vMSC which will allow vMSC clusters to be managed just like SRM sites but using stretched storage LUNs/volumes
- VM protection level managed via Storage-Profile based Protection Groups (SPPG) – this links Protection to Storage Policy Based Management (SPBM)
- Configurable recovery settings for each VM (via policy membership)
- Benefits of Active/Active Datacenters with SRM:
- Same orchestrated recovery plan for unplanned failures and Continuous Availability (Live Migration for Disaster Avoidance)
- Enables low RTO for unplanned failures
- Non-disruptive test for unplanned failures
- Unplanned Failover
- Planned Live Migration
- Rerunning Planned Migration
- Reprotect and Failback
- IBM SVC and EMC VPLEX Metro were highlighted in the VMworld session
VAIO Filters – Announced at VMworld 2014, available in vSphere.NEXT
- vSphere APIs for IO Filters (VAIO Filters) provide an easier path for 3rd party developers and VMware to add storage-related features, for instance server-side read and write caching (e.g. SanDisk) or host-based replication (e.g. Virtual RecoverPoint).
- VAIO Filters will be supported and enabled per VM object by SPBM
- EMC RecoverPoint
- Project Mercury – RecoverPoint wanted to release this kind of capability before VVols and VAIO Filters are available, so they made it work without them. However, as stated here, “when vVols GA – Recoverpoint VM will support the vVol IO filters.”
- When RecoverPoint supports LVM it will use VVols and VAIO Filters
- SanDisk VAIO Server-Side Caching
- Serge Shats, Engineering Fellow, discusses SanDisk’s server-side caching solution for VMware APIs for I/O filtering. Recorded at VMworld 2014 on August 26, 2014.
- SanDisk VAIO Server-Side Caching: http://youtu.be/HY1N2asFK4Y
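Conceptually, an IO filter sits in the data path of a VM object and sees each I/O before it reaches storage. The Python sketch below is only a toy model of that idea, with a caching filter and a replication filter enabled per VM object. None of these class or function names come from the actual VAIO framework, which I have not seen at the API level; they are purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Write:
    """Hypothetical write request against a VM object."""
    offset: int
    data: bytes


class WriteCacheFilter:
    """Toy server-side cache: remembers the latest data per offset."""
    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}

    def __call__(self, write: Write) -> Write:
        self.cache[write.offset] = write.data
        return write


class ReplicationFilter:
    """Toy host-based replication: journals every write it sees."""
    def __init__(self) -> None:
        self.journal: List[Write] = []

    def __call__(self, write: Write) -> Write:
        self.journal.append(write)
        return write


IOFilter = Callable[[Write], Write]


def apply_filters(write: Write, filters: List[IOFilter]) -> Write:
    """Pass a write through each filter enabled for the VM object, in order."""
    for io_filter in filters:
        write = io_filter(write)
    return write


cache = WriteCacheFilter()
replication = ReplicationFilter()
# Filters are chosen per VM object, the way a policy would: vmdk gets both, swap gets none.
filters_by_object: Dict[str, List[IOFilter]] = {"vmdk": [cache, replication], "swap": []}

apply_filters(Write(offset=0, data=b"hello"), filters_by_object["vmdk"])
apply_filters(Write(offset=4096, data=b"temp"), filters_by_object["swap"])
print(len(replication.journal), sorted(cache.cache))
```

Only the vmdk write is journaled and cached; the swap write passes through untouched, which is the kind of per-object granularity SPBM is meant to provide.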
Putting this together helped me better understand the announcements and determine how the various technologies can and should be utilized. Hopefully this is useful to others too. Thanks to those who helped me in the process!