Best practice for VMware LUN size

I was asked this question today. It’s one of my favorite questions to answer, but I’ve never written it down. Today I did, so here it is. Let me know if you agree or if you have other thoughts.

For a long time VMware’s max LUN size was 2TB. This restriction was not a big issue for many, but some wanted larger LUNs because of an application requirement. In these cases it was typically one or only a few VMs accessing the large datastore/LUN. vSphere 5 took the LUN size limit from 2TB to 64TB. Quite a dramatic improvement, and hopefully big enough to satisfy those applications.

For general purpose VMs, prior to vSphere 4.1, the best practice was to keep LUN sizes smaller than 2TB (i.e. even though ESX supports 2TB LUNs, don’t make them that big). 500GB was often recommended. 1TB was OK too. But it really depended on a few factors. In general, the larger the LUN, the more VMs it can support. The reason for keeping LUN sizes small in the past was to limit the number of VMs per datastore/LUN. The implication of putting too many VMs on a datastore/LUN is that performance would suffer.

The first reason is that vSphere’s native multipathing only leverages one path at a time per datastore/LUN. So if you have multiple datastores/LUNs, you can leverage multiple paths at the same time. Or you could go with EMC’s PowerPath/VE to better load balance the IO workload.

The second reason is that with block storage on vSphere 4.0 and earlier there was a LUN locking issue (SCSI reservations). This meant that if a VM was powered on, powered off, suspended, cloned, and so on, the entire datastore/LUN was locked while that operation updated the datastore’s metadata, freezing out the other VMs using that same datastore/LUN. This was resolved in vSphere 4.1 with VAAI Hardware Assisted Locking (ATS), assuming the underlying storage array supported the APIs. But before VAAI, keeping LUN sizes small helped administrators limit the number of VMs on a single datastore/LUN, thus reducing the effects of the locking and pathing issues.
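To make the pathing point concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not anything ESXi actually runs: the function name, the path names, and the assumption that each LUN is pinned to exactly one path (dealt out in turn) are all mine.

```python
from collections import defaultdict


def assign_paths(lun_names, path_names):
    """Toy model: pin each LUN to exactly one path, dealing paths out in turn.

    Returns a dict of {path: [LUNs using it]} containing only the paths
    that actually carry IO. This is a hypothetical helper for illustration.
    """
    usage = defaultdict(list)
    for i, lun in enumerate(lun_names):
        path = path_names[i % len(path_names)]
        usage[path].append(lun)
    return dict(usage)


# Hypothetical host with four paths to the array.
paths = ["path-A", "path-B", "path-C", "path-D"]

one_big_lun = assign_paths(["big_lun"], paths)
four_small_luns = assign_paths(["lun0", "lun1", "lun2", "lun3"], paths)

print(len(one_big_lun), "of", len(paths), "paths busy with one LUN")
print(len(four_small_luns), "of", len(paths), "paths busy with four LUNs")
```

In this model one big LUN keeps a single path busy while the other three sit idle, and four smaller LUNs spread across all four paths. That is the whole argument for multiple datastores under native multipathing.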

OK, that was the history, now for the future. The general direction for VMware is to go with larger and larger pools of compute, network, and storage. Makes the whole cloud thing simpler. Thus the increase in support from 2TB to 64TB LUNs. That said, I wouldn’t recommend going out and creating 64TB LUNs all the time. Because of VAAI the locking issue goes away. The pathing issue is still there with native multipathing, but if you go with EMC’s PowerPath/VE then that goes away too. So then it comes down to how big the customer wants to make their failure domains. The thinking is that the smaller the LUN, the fewer VMs are placed on it, and thus the less impact if a datastore/LUN were to go away. Of course we go to great lengths to prevent that with five 9’s arrays, redundant storage networks, etc. So, the guidance I’ve been seeing lately is that 2TB datastores/LUNs are a good happy medium of not too big and not too small for general purpose VMs. If the customer has specific requirements to go bigger then that’s fine, it’s supported.
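If it helps to put numbers on the failure domain, here is a rough back-of-the-envelope sketch in Python. The 50GB average VM footprint, the 20% free-space headroom, and the 200-VM environment are made-up assumptions for illustration, not VMware or EMC guidance, and the function names are my own.

```python
def vms_per_datastore(datastore_gb, avg_vm_gb, headroom_pct=20):
    """Estimate how many VMs fit on one datastore (the failure domain size).

    headroom_pct is the share of capacity kept free (assumed value,
    e.g. for snapshots and swap files).
    """
    usable_gb = datastore_gb * (100 - headroom_pct) // 100
    return usable_gb // avg_vm_gb


def datastores_needed(total_vms, datastore_gb, avg_vm_gb, headroom_pct=20):
    """Estimate how many datastores of a given size hold total_vms."""
    per_datastore = vms_per_datastore(datastore_gb, avg_vm_gb, headroom_pct)
    if per_datastore == 0:
        raise ValueError("datastore too small for even one VM")
    return -(-total_vms // per_datastore)  # ceiling division


# Hypothetical environment: 200 VMs averaging 50GB each.
for size_gb in (500, 2048, 65536):  # 500GB, 2TB, 64TB
    print(
        f"{size_gb}GB datastore: "
        f"{vms_per_datastore(size_gb, 50)} VMs lost if it goes away, "
        f"{datastores_needed(200, size_gb, 50)} datastores to manage"
    )
```

With these assumed numbers, a 500GB datastore holds 8 VMs (25 datastores for the environment), a 2TB datastore holds 32 (7 datastores), and a 64TB datastore would hold all 200 with room to spare. Smaller LUNs mean a smaller blast radius but more datastores to manage, which is why 2TB tends to land in the middle.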

So, in the end, it depends!!!

Oh, and the storage array behavior does have an impact on the decision. In the case of an EMC VNX, assuming a FAST VP pool, the blocks will be distributed across various tiers of drives. If more drives are added to the pool, the VNX will rebalance the blocks to take advantage of all the drives. So whether it’s a 500GB LUN or a 50TB LUN, the VNX will balance the overall performance of the pool. Lots of good info here about the latest Inyo release for VNX:

http://virtualgeek.typepad.com/virtual_geek/2012/05/vnx-inyo-is-going-to-blow-some-minds.html