Best Practices for VM Swap File Locations and Guest OS Page Files when using EMC Storage

I cannot take credit for this post. I posed the question regarding “Best Practices for locating the Virtual Machine swap file” and several vSpecialists responded with a number of great answers / opinions / recommended best practices for that and for Guest OS paging.  So thanks to Robert Way, Rajeev Dawar, Rich Barlow, Itzik Reich, Scott Drummonds, and Erik Zandboer for their contributions.

Here’s one of the best general reads on this topic.  “(Alternative) VM swap file locations Q&A” http://frankdenneman.nl/vmware/alternative-vm-swap-file-locations-qa/

The general consensus is that if you are swapping you are screwed.  Once swapping starts, very bad things are already happening, so a design centered around identifying and stopping swap as soon as it happens is best.  Sure, EMC would love it if you put your swap files on EMC Flash technology to reduce the performance impact of swapping.  But this might not be the best use of Flash resources if you are concerned about cost, and who isn’t?  If you’ve done a good job architecting your VMs then you shouldn’t be overcommitting memory, and thus it shouldn’t really matter where you put the swap files.  So in general you could keep things simple and just accept the default “Virtual machine directory – Store the swap files in the same directory as the virtual machines”.  With EMC’s FAST VP technology on the VNX and VMAX, if the swap files are not used then they will be migrated to the lowest tier.

There are several good reasons to choose the option: “Datastore Specified by host – Store the swap files in the datastore specified by the host to be used for swap files.  If not possible, store the swap files in the same directory as the virtual machine.”

  • Capacity – The VM swap file space can add up for a large number of VMs, so isolating them on a separate datastore can provide some efficiencies.
  • Replication – The VM swap file should be in a non-replicated datastore.  If swapping is occurring then the blocks are constantly changing and will be constantly replicated, potentially saturating the WAN link, hurting performance, and creating a nightmare for VMware Site Recovery Manager (SRM).
  • Snapshot – Similar to replication, the VM swap file should not be on a datastore with snapshots enabled.  This will save on storage capacity and provide better overall efficiency.
  • Backup – Similar to replication, the VM swap file should be on a datastore that is not backed up in order to skip scanning and backing up swap file data.   Here’s a good reference: “What Do You Do With Your Swap File? – VMworld Update, Thursday” http://thebackupwindow.emc.com/alex_almeida/what-do-you-do-with-your-swap-file-vmworld-update-thursday/
  • Troubleshooting – Isolation helps in finding guest OS issues, but resolving the underlying overcommit issues should be the focus.  Here’s a good reference: “Storing a virtual machine swap file in a location other than the default” http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004082

And a reason to keep the VM swap files on a shared datastore versus a local datastore:

  • vMotion Performance – If the destination host cannot see the swap file then it creates a new swap file and potentially copies swapped memory from the source.  This can increase the time it takes to perform a vMotion, but that might be a worthwhile tradeoff for leveraging the host’s local disk.
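If you do go the “Datastore Specified by host” route, the policy has to be set on the cluster and then each host has to be told which datastore to use.  Here’s a minimal pyVmomi sketch of what that could look like; the vCenter address, cluster name, and datastore name are placeholders, and it leans on the vSphere API’s vmSwapPlacement setting and the HostDatastoreSystem.UpdateLocalSwapDatastore call, so treat it as a starting point rather than a finished script.

```python
# Sketch only: set a cluster to "Datastore specified by host" and point each
# host at a designated swap datastore.  Names and credentials are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret')  # cert handling omitted
content = si.RetrieveContent()

def find_obj(vimtype, name):
    """Return the first managed object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

cluster = find_obj(vim.ClusterComputeResource, 'ProdCluster')
swap_ds = find_obj(vim.Datastore, 'swap-ds01')

# 1) Cluster policy: "Datastore specified by host" is vmSwapPlacement = 'hostLocal'.
cluster.ReconfigureComputeResource_Task(vim.cluster.ConfigSpecEx(vmSwapPlacement='hostLocal'), True)

# 2) Each host: designate the swap datastore it should use.
for host in cluster.host:
    host.configManager.datastoreSystem.UpdateLocalSwapDatastore(datastore=swap_ds)

Disconnect(si)
```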

If you have a good number of virtual machines then, based on what’s written above, there are more compelling reasons to put your VM swap files in a different location.  The next question is “Where do you put them?”  If you agree that if you’re swapping you’re screwed anyway, then it makes sense to just put them somewhere cheap and forget about it.  Here is some guidance related to EMC storage.

Option 1 – Put them somewhere cheap and monitor closely for swapping

Create a LUN or volume on the least expensive storage tier.  This could be a dedicated NL-SAS or SATA disk group, or with EMC FAST VP technology you could force the LUN to the lowest tier.  Connect all the hosts in the cluster to this datastore and use it as a consolidated location for all the VM swap files in the cluster.  Since performance will be horrible if swapping occurs, the IOs to this datastore should be closely monitored, and if the alarm triggers, spring into action and stop the swapping.
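As a starting point for the “spring into action” part, here’s a rough pyVmomi sketch that walks the inventory and flags any VM whose quick stats show swapped or ballooned memory.  The connection details are placeholders, and a proper vCenter alarm would be the production answer; this is just for ad-hoc checking.

```python
# Sketch only: flag VMs that are already swapped or ballooned, as a cheap
# complement to datastore IO alarms.  Connection details are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret')
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    qs = vm.summary.quickStats
    swapped = qs.swappedMemory or 0      # reported in MB
    ballooned = qs.balloonedMemory or 0  # reported in MB
    if swapped or ballooned:
        # Any non-zero value means the host is already reclaiming memory from this VM.
        print('%s: swapped=%d MB, ballooned=%d MB' % (vm.name, swapped, ballooned))

view.Destroy()
Disconnect(si)
```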

Option 2 – Let FAST VP do its thing

Create a LUN or volume in a FAST VP pool.  With EMC’s FAST VP technology on the VNX and VMAX, if the swap files are not used then they will be migrated to the lowest tier, which is the most cost-effective tier for swap files.  You could also force the LUN or volume to the lowest tier upon creation.  Connect all the hosts in the cluster to this datastore and use it as a consolidated location for all the VM swap files in the cluster.  Then, as in Option 1, monitor the IOs.
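To watch the IOs on the consolidated swap datastore itself (this applies equally to Option 1), something along these lines could work.  It assumes the real-time datastore.numberReadAveraged / datastore.numberWriteAveraged counters sampled per host at the 20-second interval, and it reports instances by datastore UUID rather than name, so verify the counter names and the UUID-to-name mapping in your environment.

```python
# Sketch only: pull real-time datastore read/write IOPS for one host via the
# vSphere PerformanceManager.  The counter names and 20-second interval are
# assumptions to verify; instances come back as datastore UUIDs, not names.
from pyVmomi import vim

def datastore_iops(si, host):
    pm = si.RetrieveContent().perfManager
    wanted = {'datastore.numberReadAveraged.average',
              'datastore.numberWriteAveraged.average'}
    names = {}
    for c in pm.perfCounter:
        full = '%s.%s.%s' % (c.groupInfo.key, c.nameInfo.key, c.rollupType)
        if full in wanted:
            names[c.key] = full
    spec = vim.PerformanceManager.QuerySpec(
        entity=host,
        metricId=[vim.PerformanceManager.MetricId(counterId=key, instance='*')
                  for key in names],
        intervalId=20,   # real-time sampling interval
        maxSample=1)
    for entity_metric in pm.QueryPerf(querySpec=[spec]):
        for series in entity_metric.value:
            # series.id.instance is the datastore UUID; map it to a friendly name
            # via host.config.fileSystemVolume.mountInfo if needed.
            print(names[series.id.counterId], series.id.instance, series.value)
```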

Option 3 – Plan for good performance if swapping occurs

There may be some applications where you do not want to risk poor performance if swapping occurs.  Or, there may be applications that heavily utilize the available swap space.  One such application is SAP.  This is where Flash technology can dramatically help.  There are a few choices here:

  1. Create a LUN or volume in a FAST VP pool on a VNX or VMAX that includes EFDs and force the LUN or volume to the highest tier upon creation.  Connect all the hosts in the cluster to this datastore and use it as a consolidated location for the swap files of the VMs whose swap performance you care about.
  2. Install a VFCache card in the host and dedicate a portion of it as local storage.  Then create a datastore using this local storage and point the VM swap files to it.  The swap file performance will be great, but vMotion performance will be impacted since other hosts cannot see this storage.
  3. Leverage local EFD’s on the server to create a datastore and point the VM swap files to it.  The swap file performance will be good but the vMotion performance will be impacted since other hosts cannot see this storage.
  4. Create a LUN or volume on EMC’s “Project X” (XtremIO) all-flash array and point the VM swap files to it.  Another factor to consider is that if you are starting a large number of VMs in a short time (e.g. VDI), the creation of the VM swap files can itself impact performance.  This should be considered when deciding on the location of VM swap files.  “Project X” shines in this use case but is not available today, so stay tuned in 2013.

It’s important to note that the above discussion was about ESX host swapping.  Don’t forget about the other kind of swapping: inside the guest OS.  For this we use the term “paging” to differentiate it from ESX swapping.  If you really want to fully manage paging, you should create a virtual disk for each guest OS’s page file on the same datastore as the VM swap file.  Then, if you are monitoring for IOs to that datastore, the storage alarm will detect either host swapping or guest paging.
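If you go that route, here’s a rough pyVmomi sketch of attaching a small dedicated VMDK on the swap datastore to hold the guest page file.  The size, the independent-persistent disk mode (to keep the disk out of snapshots), and the “[datastore]” file name trick (letting vSphere generate the path) are my assumptions to adapt, and the guest OS still has to be pointed at the new disk for its page file.

```python
# Sketch only: attach a small dedicated VMDK for the guest page file to a VM,
# placed on the (non-replicated) swap datastore.  Adapt size and modes to taste.
from pyVmomi import vim

def add_pagefile_disk(vm, swap_datastore, size_gb=8):
    # Reuse the VM's first SCSI controller and pick a free unit number
    # (unit 7 is reserved for the controller itself).
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))
    used = [d.unitNumber for d in vm.config.hardware.device
            if getattr(d, 'controllerKey', None) == controller.key]
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
        datastore=swap_datastore,
        fileName='[%s]' % swap_datastore.name,  # let vSphere generate the path
        diskMode='independent_persistent',      # keep the page file out of VM snapshots
        thinProvisioned=False)                  # page files churn, so keep it thick
    disk = vim.vm.device.VirtualDisk(
        backing=backing,
        controllerKey=controller.key,
        unitNumber=next(n for n in range(16) if n != 7 and n not in used),
        capacityInKB=size_gb * 1024 * 1024)
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)
    return vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[change]))
```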

Summary

To keep things simple you can leave the VM swap file and OS page file in the default locations, place the VM in a cluster with a shared datastore backed by a LUN or volume in a FAST VP pool, and monitor IOs closely.

If you want to gain maximum efficiency, then choose a location other than the default for the VM swap file and OS page file, place them on a datastore that is backed by the lowest tier of storage (e.g. an NL-SAS RAID group or a FAST VP pool with NL-SAS), and monitor IOs closely.

If you are concerned about performance when swapping occurs, then choose a location other than the default for the VM swap file and OS page file and place them on a datastore that is backed by Flash technology, either in the array or in the host.

As always, questions, comments, and feedback are welcome.

Best practice for VMware LUN size

I was asked this question today.  It’s one of my favorite questions to answer but I’ve never written it down.  Today I did, so here it is.  Let me know if you agree or if you have other thoughts.

For a long time VMware’s max LUN size was 2TB.  This restriction was not a big issue for many, but some wanted larger LUN sizes because of an application requirement.  In these cases it was typically one or only a few VMs accessing the large datastore/LUN.  vSphere 5 took the LUN size limit from 2TB to 64TB.  Quite a dramatic improvement, and hopefully big enough to satisfy those applications.

For general purpose VMs, prior to vSphere 4.1, the best practice was to keep LUN sizes smaller than 2TB (i.e. even though ESX supports 2TB LUNs, don’t make them that big).  500GB was often recommended.  1TB was OK too.  But it really depended on a few factors.  In general, the larger the LUN, the more VMs it can support.  The reason for keeping LUN sizes small in the past was to limit the number of VMs per datastore/LUN, because putting too many VMs on a datastore/LUN meant performance would suffer.

The first reason is that vSphere’s native multipathing only leverages one path at a time per datastore/LUN.  So if you have multiple datastores/LUNs then you can leverage multiple paths at the same time.  Or, you could go with EMC’s PowerPath/VE to better load balance the IO workload.  The second reason is that with block storage on vSphere 4.0 and earlier there was a SCSI reservation locking issue.  This meant that if a VM was powered on, powered off, suspended, cloned, etc., the entire datastore/LUN was locked until the operation completed, holding up the other VMs utilizing that same datastore/LUN.  This was resolved in vSphere 4.1 with VAAI hardware-assisted locking (Atomic Test & Set), assuming the underlying storage array supported the APIs.  But before VAAI, keeping the LUN sizes small helped administrators limit the number of VMs on a single datastore/LUN, thus reducing the effects of the locking and pathing issues.

OK, that was the history, now for the future.  The general direction for VMware is to go with larger and larger pools of compute, network, and storage.  It makes the whole cloud thing simpler.  Hence the increase in supported LUN size from 2TB to 64TB.  I wouldn’t recommend going out and creating 64TB LUNs all the time.  Because of VAAI, the locking issue goes away.  The pathing issue is still there with native multipathing, but if you go with EMC’s PowerPath/VE then that goes away too.  So then it comes down to how big the customer wants to make their failure domains.  The thinking is that the smaller the LUN, the fewer VMs placed on it, and thus the less impact if a datastore/LUN were to go away.  Of course we go to great lengths to prevent that with five-9’s arrays, redundant storage networks, etc.  So, the guidance I’ve been seeing lately is that a 2TB datastore/LUN is a good happy medium, not too big and not too small, for general purpose VMs.  If the customer has specific requirements to go bigger then that’s fine, it’s supported.

So, in the end, it depends!!!
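If it helps to put numbers behind that “it depends” and the failure-domain discussion above, here’s a small pyVmomi sketch that reports the VM count and capacity of each datastore so you can see how many eggs end up in each basket; the connection details are placeholders.

```python
# Sketch only: report VM count and capacity per datastore, to sanity-check
# datastore/LUN sizing and failure-domain exposure.  Credentials are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret')
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)

for ds in sorted(view.view, key=lambda d: len(d.vm), reverse=True):
    s = ds.summary
    print('%-24s %6.1f TB capacity  %6.1f TB free  %4d VMs' %
          (s.name, s.capacity / 1024.0**4, s.freeSpace / 1024.0**4, len(ds.vm)))

view.Destroy()
Disconnect(si)
```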

Oh, and the storage array behavior does have an impact on the decision.  In the case of an EMC VNX, assuming a FAST VP pool, the blocks will be distributed across various tiers of drives.  If more drives are added to the pool then the VNX will rebalance the blocks to take advantage of all the drives.  So whether it’s a 500GB LUN or a 50TB LUN, the VNX will balance the overall performance of the pool.  Lots of good info here about the latest Inyo release for VNX:

http://virtualgeek.typepad.com/virtual_geek/2012/05/vnx-inyo-is-going-to-blow-some-minds.html