VCE-101 Thin Provisioning Part 2 – Going Beyond - Vaughn Stewart Insights on Data & Storage Technologies

This is the second installment of a two-part post focusing on the thin provisioning capabilities native to vSphere. If you haven’t read Part I – The Basics, I’d suggest you do so before proceeding. We’ve covered a lot of content which you really should have a solid grasp of before proceeding.

In part I, I covered the basics of thin provisioning and how it operates with any storage platform. In part II we are going to expand upon a number of the points introduced in part I, with a focus on how the storage virtualization functionality in Data ONTAP enhances VMware’s thin provisioning and the overarching goal of reducing both the CapEx and OpEx associated with storage in the virtualization space.

Anatomy of a Virtual Disk

Let’s begin where we started in part I, by reviewing different disk formats. This time let’s consider them with the virtual disks operating at 70% storage capacity.

– The Thick Virtual Disk

Again, this is the traditional virtual disk format which we all know rather well. From this image you will notice we do not receive any storage savings in the datastore, yet we do receive 30% savings on the storage array.

If you’re asking, “how can a thick VMDK consume less storage on the array?”
You’ll need to go back and read Part I

– The Thin Virtual Disk

In this image we have the thin provisioned virtual disk and as you can see we are receiving a 30% storage savings in both the datastore and on the array.

– The Eager Zeroed Thick Virtual Disk

In this last image we have an eager zeroed thick virtual disk which provides us no savings at either the datastore or on the array.
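For those who prefer numbers to pictures, here is a rough Python sketch of the arithmetic behind the three images. It assumes a 100 GB virtual disk holding 70 GB of data and an array that only stores blocks that have actually been written (no dedupe yet). The class and function names are made up for illustration; they are not part of any VMware or NetApp tool.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    fmt: str               # "thick", "thin", or "eagerzeroedthick"
    provisioned_gb: float  # size presented to the guest
    used_gb: float         # data the guest has actually written


def datastore_consumed(disk):
    # Only the thin format defers allocation inside the datastore.
    return disk.used_gb if disk.fmt == "thin" else disk.provisioned_gb


def array_consumed(disk):
    # Assumption: the array stores only blocks that have been written.
    # Eager zeroed thick writes zeros to every block up front.
    return disk.provisioned_gb if disk.fmt == "eagerzeroedthick" else disk.used_gb


def savings_pct(consumed_gb, provisioned_gb):
    return round(100 * (1 - consumed_gb / provisioned_gb))


for fmt in ("thick", "thin", "eagerzeroedthick"):
    d = VirtualDisk(fmt, provisioned_gb=100, used_gb=70)
    print(fmt,
          "datastore savings:", savings_pct(datastore_consumed(d), d.provisioned_gb),
          "array savings:", savings_pct(array_consumed(d), d.provisioned_gb))
```

The point of separating the two functions is that the datastore and the array each keep their own view of consumption, which is why the savings can differ between the two layers.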

Taking Thin Provisioning Further…

I’d like to build off of the examples in the previous section and introduce data deduplication into our discussion. Doing this should not be perceived as a comparison of competing technologies; far from it! My intention is to clearly demonstrate that by leveraging VMware technologies along with Data ONTAP, customers can receive significantly greater reductions in storage and storage management.

In order to demonstrate the effects of running VMware on deduplicated storage we will need to look at the storage consumption and savings of multiple VMs contained in a shared datastore.

– Three Thin Provisioned Virtual Disks

There’s not much to comment on with this image. We have three virtual disks, and our savings of 30% in both the datastore and the array remain the same with these three as with a single VMDK. It’s pretty straightforward and simple (btw – I like simple).

– Three Thin Provisioned Virtual Disks plus Data Deduplication

Now we’ve got something to talk about! The first thing I need you to consider is that the observable savings vary between the datastore and storage array. While our datastore remained steady at 30%, our array is demonstrating 70% storage savings. This is possible by only storing unique blocks for all of the data within the datastore.

btw – achieving roughly 70% savings is pretty common with our deployments

Just like CPU or memory virtualization in ESX/ESXi, each VM has its own file system and virtual disk, but through Data ONTAP we are able to share the underlying data blocks for greater efficiency and scaling of the physical hardware (or disk).
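To make the idea of storing only unique blocks concrete, here is a minimal Python sketch. It is a generic hash-based illustration and not how Data ONTAP implements dedupe; the 4 KB block size, the helper names, and the three made-up "VMs" (an identical OS image plus a little unique data each) are all assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, chosen for the example


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedupe(disks):
    """Return (logical_blocks, unique_blocks) across all virtual disks."""
    fingerprints = set()
    logical = 0
    for data in disks:
        for block in split_blocks(data):
            logical += 1
            fingerprints.add(hashlib.sha256(block).digest())
    return logical, len(fingerprints)


# Three VMs: the same 7-block OS image plus 3 blocks of unique data each.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(7))
disks = [
    os_image + b"".join(bytes([100 + 10 * vm + j]) * BLOCK_SIZE for j in range(3))
    for vm in range(3)
]

logical, unique = dedupe(disks)
print(f"logical blocks: {logical}, stored blocks: {unique}, "
      f"savings: {100 * (1 - unique / logical):.0f}%")
```

The savings you see depend entirely on how much content the VMs have in common; the more alike the guests, the fewer unique blocks the array has to keep.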

I need to clarify a point: in this example we are storing our VMDKs on LUNs (FC, FCoE, iSCSI). As LUNs are formatted with VMFS the VMware admin natively has no view into the storage array. Any storage virtualization will require tools that bridge this ‘gap’. This statement is true for every array serving LUNs regardless of vendor.

If you deploy with LUNs, I would suggest you take a peek at the NetApp Virtual Storage Console. It is a vCenter 4 plug-in which, among its many functions, provides reporting on storage utilization through all layers, spanning datastore to physical disks. VMware admins now have direct insight into their storage efficiency.

– Three Thin Provisioned Virtual Disks plus Data Deduplication on NFS

I’m sure those who know me are saying, ‘I knew Vaughn was eventually going to bring up NAS.’

I do so to demonstrate a key point: in this image there is something very different. The datastore reports the same storage savings as the storage array… 63% savings!

Do you recall earlier, when I stated I like simplicity? Here’s one of the many reasons… NFS provides transparent storage virtualization between the array and the hypervisor. This architecture and combination of storage savings technologies delivers value to both the VI and storage admin teams. It’s that simple!

In fact, NetApp dedupe provides the same storage savings for any type of virtual disk: thick, thin, or eager zeroed thick. Find a need to inflate a thin virtual disk to an eager zeroed thick? No problem, it doesn’t have any impact on the storage utilization and reporting in the datastore.

Characteristics of Virtual Disks

– Application and Feature Support Considerations

In part I we learned that thin provisioned VMDKs don’t provide the ‘clustering support’ required for MSCS and FT. In addition, I cited that we see most integrators shy away from thin with business critical applications. These configurations are perfect scenarios where the introduction of data deduplication can provide storage savings.

This functionality is possible as the presence of data deduplication is unknown to the hypervisor, the VM, the application, etc. As an example: imagine the cost benefits of virtualizing Exchange 2010 and running it on deduplicated storage.

Hint – With Exchange 2010 Microsoft has removed the single instance storage model which has existed as far back as I can recall (which is Exchange 5.5)

– Thin is only thin on day one

In part I we covered the concept that deleted data is still data and how this deleted data will eventually expand a thin provisioned virtual disk thus wasting storage capacity. We also covered how to leverage a number of tools and processes in order to return the VMDK to its minimal and optimal size; however, one has to balance these processes with the impact they will impose on the rest of the data center, specifically resources like replication and replication bandwidth.

This is another area where running VMware on NetApp provides unique value unavailable with traditional storage arrays.

To begin, data deduplication addresses part of this issue by eliminating the redundancy of content on the array. By eliminating the redundancy we eliminate the traditional cost of deleted file system content being stored on the array. For example, open a saved word doc, edit, save, and close it. The file system of the VM will create an entirely new copy of the word doc and will delete the original. Dedupe will reduce the block redundancy between these two copies.

NetApp is already shipping ‘hole punching’ technology today with FC, FCoE, and iSCSI LUNs managed by SnapDrive. This technology is available for RDMs and guest connected storage in VMs. Now, this isn’t the end game; we’d still like to ensure that virtual disks can remain space efficient in an automated fashion.

Our current plans are to expand this functionality to run as a policy on datastores. This is an example of NetApp engineering working to make your life easier. Does this type of simplicity and optimization sound appealing?

This feature was discussed at VMworld 2009 in session TA4881 featuring Arthur Lent and Scott Davis. I’d encourage anyone with access to watch the session in its entirety; the ‘hole punching’ or ‘space reclamation’ demo occurs around 52:30 and runs for approximately 1 minute. Note: You must have access to the VMworld 2009 online content in order to view this session.

If you are constrained on time and would prefer to view only the demo (without audio), you can find it here (again, VMworld access required).

– Avoid Running Defragmentation Utilities Inside of VMs

Our position remains unchanged since Part I; you should still consider discontinuing the use of defrag utilities within VMs. However, if you can’t, we can help undo the havoc these utilities wreak. Defrag utilities rewrite data, and their use will expand a thin provisioned virtual disk. While we can’t stop the expansion of the thin VMDK, we can prevent the storage growth on the array by simply running a dedupe update following the defrag process. This post-defrag process will return storage capacity and replication bandwidth requirements back to an optimized state.
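Here is a toy Python model of why a defrag inflates a thin VMDK and why a dedupe pass afterwards recovers the space on the array. It is purely a sketch under simplifying assumptions: the VMDK is a dictionary of offset to content fingerprint, and the function names are invented for the example.

```python
def defrag(vmdk, moves):
    """Copy each block to a new offset, as a defrag utility would.

    The guest marks the source offset free, but the stale bytes are still
    there as far as the VMDK is concerned, so the old offset stays allocated.
    """
    for src, dst in moves:
        vmdk[dst] = vmdk[src]
    return vmdk


def allocated_blocks(vmdk):
    # What the thin VMDK has grown to in the datastore.
    return len(vmdk)


def unique_blocks(vmdk):
    # What a block-level dedupe pass would keep on the array.
    return len(set(vmdk.values()))


vmdk = {0: "A", 5: "B", 9: "C"}   # offset -> content fingerprint
print("before defrag:", allocated_blocks(vmdk), "blocks allocated")

defrag(vmdk, [(5, 1), (9, 2)])    # pack B and C next to A
print("after defrag: ", allocated_blocks(vmdk), "blocks allocated")
print("after dedupe: ", unique_blocks(vmdk), "blocks stored on the array")
```

The moved blocks and their stale originals are identical content, which is exactly the kind of redundancy dedupe removes.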

Understanding Thin – From an Availability Perspective

– The Goal of Thin Provisioning is Datastore Oversubscription

In part I, I cautioned against oversubscribing datastores based on the lack of automated storage capacity management. When combined with oversubscription, this rigidity establishes an environment where, should the datastore fill, all of the VMs become inaccessible. In order to overcome this type of failure we discussed implementing a script that monitors datastore capacity and migrates VMs in the event of scarce free space.
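As a reminder of what such a monitoring script has to decide, here is a minimal Python sketch of the planning logic only. It does not talk to vCenter or perform any migration; the classes, the 15% free space threshold, and the "move the largest VM to the emptiest datastore" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    used_gb: float


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    vms: list = field(default_factory=list)

    @property
    def free_gb(self):
        return self.capacity_gb - sum(vm.used_gb for vm in self.vms)

    @property
    def free_pct(self):
        return 100 * self.free_gb / self.capacity_gb


def plan_migrations(datastores, min_free_pct=15.0):
    """Return (vm, source, target) moves that bring datastores above the threshold."""
    moves = []
    for src in datastores:
        while src.free_pct < min_free_pct and src.vms:
            vm = max(src.vms, key=lambda v: v.used_gb)
            # Only consider targets that stay above the threshold after the move.
            targets = [
                d for d in datastores
                if d is not src
                and 100 * (d.free_gb - vm.used_gb) / d.capacity_gb >= min_free_pct
            ]
            if not targets:
                break  # nowhere safe to move; a human needs to step in
            dst = max(targets, key=lambda d: d.free_gb)
            src.vms.remove(vm)
            dst.vms.append(vm)
            moves.append((vm.name, src.name, dst.name))
    return moves


ds1 = Datastore("ds1", 100, [VM("vm-a", 50), VM("vm-b", 40)])
ds2 = Datastore("ds2", 200, [VM("vm-c", 20)])
print(plan_migrations([ds1, ds2]))
```

Notice how much per-VM bookkeeping even this simplified version needs, and the real thing would also have to carry out and verify each migration.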

What if you could place policies on the datastore and have them resize without human intervention or scripts? This functionality is available today with VMware on NetApp NFS. It really is an elegant solution and ideal when considering oversubscription. This technology provides customers the ability to monitor large global storage pools (NetApp aggregates) versus individual datastores. By definition, the oversubscription of a physical resource requires the monitoring of the physical resource’s capacity.
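To illustrate the difference in approach, here is a small Python sketch of a resize policy plus a pool-level report. The policy fields, thresholds, and function names are hypothetical and chosen for the example; they are not the actual Data ONTAP settings or commands.

```python
from dataclasses import dataclass


@dataclass
class AutosizePolicy:
    grow_threshold_pct: float  # grow once used space crosses this percentage
    increment_gb: float        # how much to add per growth step
    max_size_gb: float         # never grow the datastore beyond this


def next_size(size_gb, used_gb, policy):
    """Return the datastore size after applying the policy once."""
    if 100 * used_gb / size_gb < policy.grow_threshold_pct:
        return size_gb
    grow = min(policy.increment_gb, policy.max_size_gb - size_gb)
    return size_gb + max(grow, 0)


def pool_report(pool_capacity_gb, datastores):
    """Summarize a shared pool; datastores is a list of (size_gb, used_gb)."""
    provisioned = sum(size for size, _ in datastores)
    used = sum(used for _, used in datastores)
    return {
        "oversubscription": provisioned / pool_capacity_gb,
        "pool_used_pct": 100 * used / pool_capacity_gb,
    }


policy = AutosizePolicy(grow_threshold_pct=85, increment_gb=50, max_size_gb=500)
print(next_size(200, 180, policy))                  # datastore is 90% full
print(pool_report(1000, [(600, 300), (600, 200)]))  # two datastores, one pool
```

With the datastores looking after themselves, the number worth watching is the used percentage of the physical pool rather than the free space of each datastore.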

Why would anyone not allow the storage array to manage the size of datastores versus monitoring and migrating individual VMs? While our goal is efficiency we mustn’t sacrifice simplicity.

Sure, this is a very ‘storage centric’ view, but I’d position that from a storage capacity management standpoint it is probably easier to manage datastores than individual VMs.

As NetApp and VMware support the hot expansion of LUNs and VMFS we should be able to add this process to the datastore monitoring we highlighted in part I.

I’d love to publish a post highlighting another blog once someone whips up a script to handle LUNs – hint hint

A Few Additional Enhancements for Thin Provisioned VMDKs

– On-Demand Block Allocation

You may recall we shared that thin virtual disks allocate storage from the datastore on-demand and that this process triggers SCSI locks on the datastore, as these operations are metadata updates. SCSI locks are very short lived; however, they do impact the I/O of the datastore. I just wanted to share one point: block allocation events cannot invoke SCSI locks with NFS datastores, as SCSI locks don’t exist with NFS. This means you can deploy all of the thin provisioned virtual disks you’d like; you’ll never incur an issue related to metadata updates.
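If it helps to picture on-demand allocation, here is a toy Python model of a thin disk that allocates space in chunks the first time a block in that chunk is written. The class, the chunk size of 4 blocks, and the counter are all invented for illustration; this is not how VMFS is implemented.

```python
class ThinDisk:
    """Toy thin disk: space is allocated on first write, one chunk at a time."""

    def __init__(self, provisioned_blocks, grow_chunk=4):
        self.provisioned_blocks = provisioned_blocks
        self.grow_chunk = grow_chunk      # blocks allocated per growth event
        self.allocated_chunks = set()
        self.metadata_updates = 0         # one per allocation event

    def write(self, block):
        if not 0 <= block < self.provisioned_blocks:
            raise IndexError("write beyond provisioned size")
        chunk = block // self.grow_chunk
        if chunk not in self.allocated_chunks:
            # First touch of this chunk: the disk grows. On a VMFS datastore
            # this is the kind of metadata update that takes a brief lock.
            self.allocated_chunks.add(chunk)
            self.metadata_updates += 1

    @property
    def allocated_blocks(self):
        return len(self.allocated_chunks) * self.grow_chunk


disk = ThinDisk(provisioned_blocks=100)
for block in (0, 1, 2, 3, 4, 50, 51):
    disk.write(block)
print(disk.allocated_blocks, "blocks allocated,",
      disk.metadata_updates, "allocation events")
```

Rewriting a block that is already allocated costs nothing extra; only growth events count, and those are the events that involve a lock on a LUN but not on NFS.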

– Default VMDK Formats with NFS

When running VMware over NFS the storage array determines whether the virtual disks are thick or thin. I’m happy to share with you that with NetApp storage arrays all VMDKs will be thin provisioned by default. Note the disk format options are grayed out in the wizard.

I guess you could say that, as the storage vendor with the largest VMware on NFS install base, NetApp also has the largest install base of customers running with thin provisioned VMDKs.

In Closing

In this two part post I’ve attempted to communicate the following points:

• Storage and the associated operational costs aren’t in line with server and network virtualization efforts

• VMware has delivered a wealth of technologies to help address these costs, of which we have highlighted thin provisioning in these posts

• NetApp provides unmatched storage virtualization technologies which enhance the storage saving technologies from VMware resulting in unmatched CapEx and OpEx savings

Virtualization is truly changing everything – heck, look at how long we’ve been discussing virtual disk types! For those of you who have NetApp arrays, please implement all of the technologies we have covered in parts I & II of this post (just remember to follow all of the best practices in TR-3749 or TR-3428). For those of you who aren’t running VMware on NetApp, I’d ask you to leverage the storage savings technologies of VMware to their fullest. I trust between these posts and others like this one, by my good friend and adversary Chad Sakac, you should be well armed on the subject of thin provisioned VMDKs. When you’re ready to go further, please check out our gear or allow us to virtualize your arrays with Data ONTAP by introducing a vSeries to your existing storage arrays.

I fear this post will be too technical for some and too sales-y for others. Trying to disseminate information to a broad audience is always a difficult task. Let me know if you found this data useful, by sharing your feedback.

7 thoughts on “VCE-101 Thin Provisioning Part 2 – Going Beyond”

Brian says:

November 5, 2009 at 9:32 am

Good article… I found the tidbits throughout the article on benefits of NFS storage for vSphere interesting. Definitely some good benefits.
Unfortunately…we’re a Microsoft shop, so vSphere would be the ONLY thing we’d use NFS for. If it was a cheap license, we’d have it in a second…but when you’re only going to use it for a small list of additional features (even if they are great), it makes the cost justification a little tough.
On the other hand — we’re running FC for storage and like this over IP for several reasons. Hopefully VMware and NetApp can continue to integrate further and some of the same benefits can come to FC VMFS datastores.
Good info…thx

Ian Forbes says:

November 6, 2009 at 10:14 pm

Great articles. I learned a bunch. Thanks. I actually ran into a problem at a client recently related to VMFS and WAFL not being in sync with “actual” capacity metrics. Thin is thin only on day one. When vm’s are destroyed or storage vmotioned to another datastore, the source datastore will reflect the new “available” space but WAFL will still report the “old” available space.
I didn’t know the hole punching or space reclaimer feature has already been discussed for ESX vmfs datastores. In my post I mentioned how it would be nice to have space reclaimer for vmfs – run from the Virtual Storage Console 🙂
So, my question is when will space reclaimer be a reality? My second question is what is your suggestion in the meantime to have the “reclaimed” space in the vmfs datastore be reflected at the WAFL level?
Here is my post in Netapp Communities:
http://communities.netapp.com/message/18846

Ian Forbes says:

November 6, 2009 at 10:32 pm

Just to clarify, I’m hoping to be able to run space reclaimer against a vmfs datastore, not the GOS. I thought that’s what the demo from VMWorld 2009 was going to show – but it showed space reclaimer against a vmdk in a GOS (iSCSI LUN).

Paul Evans says:

December 3, 2009 at 12:37 am

Excellent posts, Vaughn. I was glad to find convergence of viewpoints around the fundamentals between your and Chad Sakac’s posts. I learnt quite a lot from them and have summarized my learning at
http://blog.sharevm.com/2009/12/03/thin-provisioning-when-to-use-benefits-and-challenges/

Vaughn Stewart says:

January 6, 2010 at 2:34 pm

@Paul
Thanks for the follow up – great post

invisible says:

June 4, 2010 at 2:40 pm

Paul, Vaughn
Do you know any method allowing reclaiming free space with thin provisioned VMs WITHOUT storage vmotion?
We have a structure in our NFS volumes and VMs are located in different subfolders rather than in root folder of the datastore. This is because we run scripts on regular intervals to take backups of VMs.
If we SVmotion VMs between datastores it will break our structure because SVmotion moves VM folders to root folder of the datastore.
Do you know any way to avoid it?

Vaughn Stewart says:

June 7, 2010 at 7:06 am

@invisible thanks for the question. As a global technical partner of VMware, NetApp designs its tools and applications to follow VMware designs, including the storing of a VM in the root of the VM Homedir. I would suggest that the model you have deployed will inevitably cause you issues with products from VMware and their partners. You may want to reconsider some aspects of what is deployed.
As for storage reclamation in VMDKs… There is nothing available today which completes this process without migrating the data (a la Storage VMotion). Work is going on with the SNIA via T10 storage reclamation. This work will eventually show itself in all systems, including ones from VMware and NetApp.
As more on this front becomes available I will post on it.