I’d like to ask you: do you think the following statements are generally considered to be true?
- Virtualization is software, and hardware is simply hardware
- RAID 10 outperforms all other RAID types that provide data protection (redundancy)
- Fibre Channel provides greater I/O performance than NFS for virtualized workloads
- Deduplicated data sets perform slower than traditional data sets
I believe most people consider these statements to be true, and why shouldn’t they? These views originate from years of validation in our physical infrastructures. Moreover, most of us are not storage experts intimately familiar with the nuances of array architectures and capabilities. For most, SAN & NAS storage is just a black box, and thus storage is simply hardware; or as my good friend Duncan Epping likes to say, ‘just spinning tin.’
Duncan’s comment about spinning tin led me to ponder: does hardware matter with server virtualization? On one hand, no, it does not, as the hardware layer has been abstracted to the point where customers can non-disruptively migrate to a net-new platform at any time.
Yet on the other hand, hardware truly does matter.
The level of intelligence and capability in the hardware provides greater performance, solution sets, functionality, and points of integration (both direct, like VAAI, and indirect, like dedupe).
So the question is, how does a customer verify the level of benefit behind the list of features a hardware vendor offers? If I ran a data center virtualization project, I would not acquire a single piece of hardware until the vendor had validated their technology using my dataset. Vendors looking to partner in our initiatives would have to demonstrate their ability to deliver the benefits of their technology prior to the release of a purchase order and the acceptance of any hardware.
Suffice it to say, virtualization empowers customers to adopt best-of-breed technologies without being bound to their legacy footprint, platform, or partner.
Alas, I digress…
As I stated, hardware does matter, and virtualization can be found in the operational software embedded within hardware. The amount of benefit this virtualization provides can be hard to discern, so we have attempted to make the impact of the storage virtualization provided by Data ONTAP easier to measure with the publication of our Technical Report TR-3856.
Details, Details, Details…
In this report we ran 80 VMs on a NetApp FAS array and on a Traditional Storage Array. We focused on designing tests that we believe allow one to consider the storage capabilities of an array with vSphere when multiple technologies are used in conjunction. These tests include:
- Measure the amount of time required to provision shared data stores to vSphere hosts
- Measure the amount of time required to provision 80 VMs
- Review the data protection options available to virtualized environments
- Evaluate the storage efficiency technologies and measure the actual storage savings provided
- Measure the performance using realistic workloads when combining storage efficiency and data protection technologies
One way to consider the value of a technology or feature is to measure its applicability in a number of use cases. At NetApp we like to simplify architectural design and operational challenges by eliminating functional restrictions. We refer to this mindset of unified capabilities as the AND principle.
Test Bed Details:
- Servers: 4 x IBM x3650 servers configured with 8 Intel® Xeon™ E5430 CPUs at 2.66GHz and 36GB memory, running vSphere 4 update 1
- Guests: 80 VMs running Windows® Server 2003 with SP2; each VM configured with 1GB memory and 1 virtual CPU
- Guest I/O: Each VM ran IOMeter configured with a 100% random mix of 75% reads and 25% writes using a 4KB request size, with two I/Os outstanding (see the sketch after this list for an approximation of this profile). This is the same workload we used in the joint NetApp & VMware performance validation tests in TR-3808 & TR-3697.
- NetApp FAS: 2x4GB of cache (dual controllers) with 40 15k rpm SAS drives
- Traditional Mid-Tier Fibre Channel Array: 2x8GB of cache (dual controllers) with 152 15k rpm FC drives
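For readers who would like to generate a similar access pattern without IOMeter, here is a minimal Python sketch of the same profile: 100% random, 75% reads / 25% writes, 4KB requests, two outstanding I/Os. To be clear, this is not the harness we used; the file path, file size, and run time are illustrative assumptions, and since it issues buffered I/O through the page cache, treat it as a pattern generator rather than a benchmark.

```python
# Approximation of the IOMeter profile used in TR-3856:
# 100% random, 75% reads / 25% writes, 4KB requests, 2 outstanding I/Os.
import os
import random
import threading
import time

FILE_PATH = "/tmp/iometer_like.dat"  # placeholder target file (assumption)
FILE_SIZE = 256 * 1024 * 1024        # 256MB working set (assumption)
BLOCK = 4096                         # 4KB request size, per the test profile
READ_PCT = 0.75                      # 75% reads / 25% writes
OUTSTANDING = 2                      # two workers approximate 2 outstanding I/Os
RUNTIME = 60                         # seconds (assumption)

def worker(stop, stats):
    # Buffered I/O; IOMeter issues unbuffered I/O, so absolute numbers differ.
    fd = os.open(FILE_PATH, os.O_RDWR)
    payload = os.urandom(BLOCK)
    blocks = FILE_SIZE // BLOCK
    try:
        while not stop.is_set():
            # Seek to a random 4KB-aligned offset, then read or write.
            os.lseek(fd, random.randrange(blocks) * BLOCK, os.SEEK_SET)
            if random.random() < READ_PCT:
                os.read(fd, BLOCK)
                stats["reads"] += 1
            else:
                os.write(fd, payload)
                stats["writes"] += 1
    finally:
        os.close(fd)

def main():
    # Preallocate the file so every random offset is valid.
    with open(FILE_PATH, "wb") as f:
        f.truncate(FILE_SIZE)
    stop = threading.Event()
    stats = [{"reads": 0, "writes": 0} for _ in range(OUTSTANDING)]
    threads = [threading.Thread(target=worker, args=(stop, s)) for s in stats]
    for t in threads:
        t.start()
    time.sleep(RUNTIME)
    stop.set()
    for t in threads:
        t.join()
    reads = sum(s["reads"] for s in stats)
    writes = sum(s["writes"] for s in stats)
    print(f"{(reads + writes) / RUNTIME:.0f} IOPS ({reads} reads, {writes} writes)")

if __name__ == "__main__":
    main()
```

Point each VM (or test host) at its own target file, and the aggregate behavior approximates the many-small-random-I/O workload described above.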
We ran all NetApp tests with a single configuration: 10GbE NFS with RAID-DP and data deduplication enabled. While a NetApp array is the only true Unified Array (providing concurrent access to FC, FCoE, iSCSI, and NFS for vSphere datastore access), we elected to proceed in this manner as a means to reinforce the value of AND. We wanted to demonstrate that a single storage array configuration with all of the bells and whistles enabled could provide greater benefits than ANY configuration of the Traditional Storage Array.
Think of this model as a ‘Set-it-and-Forget-it’ storage architecture design.
I would also highlight that the I/O testing completed on the NetApp array utilized 38 of 40 disk drives, whereas the Traditional Storage Array used 76 of 152 drives. For our testing we wanted to measure the results of many hardware configurations of the Traditional Storage Array, and in order to reduce the time required to build each test bed, we used half of the total number of disks to run a test so we could stage the next one.
Bottom line, all tests on NetApp used half the number of disks (38) as the Traditional Array (76). With that said, our tests provided the following results:
- 95% reduction in data store provisioning time
- 97% reduction in VM provisioning time
- 98% gain in usable capacity (see the capacity sketch after this list)
- Performance on NetApp was 25% better than the Traditional Array configured with RAID 5
- Performance on NetApp was 30% better than the Traditional Array configured with RAID 6
- Performance on NetApp was 10% better than the Traditional Array configured with RAID 10
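If you are wondering how a gain in usable capacity of this magnitude is even plausible, here is a back-of-the-envelope sketch of how RAID overhead and deduplication combine. The disk size, RAID group layout, and dedupe ratio below are illustrative assumptions of mine, not the measured values from TR-3856.

```python
# Back-of-the-envelope usable capacity: RAID overhead plus dedupe savings.
# Disk size, group layout, and dedupe ratio are illustrative assumptions,
# not the measured values from TR-3856.
DISK_GB = 300                 # assumed per-disk capacity
DEDUPE_SAVINGS = 0.50         # assumed 50% dedupe savings on VM data

# NetApp: 38 disks in RAID-DP, two parity drives per (hypothetical)
# 16-disk RAID group, with deduplication enabled.
netapp_disks, groups = 38, 3              # ceil(38 / 16) = 3 groups
netapp_raw = (netapp_disks - 2 * groups) * DISK_GB
netapp_effective = netapp_raw / (1 - DEDUPE_SAVINGS)

# Traditional array: 76 disks in RAID 10 lose half to mirroring,
# with no deduplication available.
trad_effective = (76 / 2) * DISK_GB

gain = (netapp_effective - trad_effective) / trad_effective
print(f"NetApp: {netapp_effective:,.0f} GB effective "
      f"vs Traditional: {trad_effective:,.0f} GB -> {gain:.0%} gain")
```

Under these particular assumptions the sketch lands at roughly a 68% gain; the point is the mechanism (less parity overhead than mirroring, plus dedupe multiplying the usable space), not the exact number, and a higher dedupe ratio widens the gap further.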
I’ve shared posts on our cache expansion module, Flash Cache (aka PAM II), and its ability to increase storage performance. Just for giggles we added a Flash Cache module to the FAS3170 array, and this upgrade produced a 52% performance gain over the Traditional Storage Array with RAID 10 (which was its highest-performing RAID type).
One of the functions I appreciate about the Flash Cache is its ability to raise the performance of any disk drive technology without having to move data around. We call this benefit ‘tier-less storage.’
BTW – a Flash Cache module costs less than a shelf of disk drives, so even with this upgrade the NetApp configuration still cost significantly less than the Traditional Storage Array.
Wrapping Up This Post
I started this post by asking whether you thought the following statements are generally considered to be true:
- Virtualization is software, and hardware is simply hardware
- RAID 10 outperforms all other RAID types that provide data protection (redundancy)
- Fibre Channel provides greater I/O performance than NFS for virtualized workloads
- Deduplicated data sets perform slower than traditional data sets
In TR-3856 we feel we have successfully demonstrated that the software embedded in hardware can provide a tremendous amount of virtualization benefit, resulting in running faster and greener than what is available from a rather well-known Traditional Fibre Channel Storage Array.
I’d like to close by assuring you that VMware engineering reviewed the content and results of TR-3856 and approved the publication of this document. Many of you may not be aware that the publication of any performance tests run with VMware requires their authorization.
Unfortunately, we cannot disclose the vendor or model of the traditional array used in this testing, for legal reasons. Simply put, all array vendors (including NetApp) require their consent prior to the publication of any performance test results that include their gear. We did approach the Traditional Array Vendor, and they declined to participate.
I expect some reading this post and the technical report will not agree with the results, and that’s fair; I fully endorse skepticism. To those who need more validation, I would make the following suggestion: contact a NetApp partner and request a demo system. With our V-Series we can benchmark your existing storage array with and without it being virtualized, and I assure you, you’ll see similar benefits.
This testing took a lot of effort, and hopefully you will find the output beneficial.
As Always, Virtualization Changes Everything!
How well are vFilers supported in a VMware/NetApp model?
We know the VSC is cool and shows the allocated NetApp resources within vCenter, but is that true for vFilers?
Faster? Great.
Greener? I really could not care less.
My big question is how does the introduction of “load balanced teams” in vSphere 4.1 impact NFS architecture and performance?
In TR-3749 release 2 we have to set up a bunch of IP address aliases and multiple volumes and map each one in VMware. Does the new load balancing based on NIC load in a virtual port group get rid of the need for all that setup on the filer while still maintaining load balance and the same level of performance?
If so then NFS on NetApp just became even more of a slam dunk obvious choice for VMware environments.
@Timo – Yes, vFilers are supported
@Kevin – With ESX/ESXi the NFS client relies on the hypervisor routing table, thus we recommend 1GbE environments use multiple subnets to increase the total bandwidth available to the hypervisor. This recommendation does not apply to 10GbE environments today, and the limitation will be addressed in a future release of ESXi.
There is no RAID-DP support on vFilers when virtualising existing storage. Given most of the performance benefit looks to come from RAID-DP, isn’t your comment re organising a test run using legacy hardware and a vFiler flawed?
Neil, I don’t understand your question. All volumes in a NetApp system that are built on RAID-DP are protected by RAID-DP. This includes volumes owned by vfilers.
There is no way to disassociate a volume owned by a vfiler from the underlying RAID mechanism in Data ONTAP.
Never mind. I can’t read this morning, apparently. You’re meaning a vfiler on a NetApp V-Series system in front of existing FC SAN storage.
@timo: When using vfilers in a VSphere environment and with the VSC 2.0 plugin, you see resources allocated on a per-vfiler basis.
I think Neil meant V-Series (not vFiler), i.e. when you put a V-Series in front of your existing ‘traditional’ array, there’s no RAID-DP; it’s the ‘traditional’ array’s RAID (5 or 10, say) at the bottom, with the V-Series simply doing RAID-0 across the top of the back-end LUNs within each aggregate. Benefits derived from dedupe still apply in this scenario, though.
I meant the V-Series filer (the disk-less version). You can’t just put that in front of your existing storage and expect similar performance gains as indicated. You will get SOME gains, sure (from PAM II), but most of the performance benefit from NetApp comes from RAID-DP.
Hi Neil,
Actually, RAID-DP by itself is not the performance booster; it’s mostly the way we write (WAFL).
Several customers get much better performance with V-Series than without…
D