New vSphere 4.1 Report: Measuring SAN & NAS Performance


NetApp and VMware performance engineering have completed a new storage performance technical report, TR-3916. This new report provides the relative I/O performance available from SAN and NAS storage protocols with vSphere 4.1 and a NetApp FAS array.

The testing in the new TR-3916 is a leap forward from our previous reports: it includes results from both shared and non-shared datastores, runs with VAAI enabled, and measures the gains provided by the Paravirtual SCSI adapter.

NetApp and VMware engineering believe these tests provide the most accurate measurement of the performance capabilities of the storage protocols in vSphere because the test bed remains consistent throughout the testing. In fact, switching between SAN protocols with a NetApp array is as simple as mapping a different igroup (aka LUN mask). No reconfiguration or migration was required to complete the FC, FCoE, and iSCSI tests; just a simple rescan of the storage adapters from the hosts.
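To give a feel for how little work the protocol switch involves, here is a sketch of the two steps using Data ONTAP 7-mode and ESX 4.x commands. The volume, LUN, and igroup names are illustrative, not the ones used in the report:

```shell
# On the NetApp controller (Data ONTAP 7-mode syntax; names are illustrative):
lun unmap /vol/tr_vol/datastore1 fc_igroup        # detach the LUN from the FC initiator group
lun map   /vol/tr_vol/datastore1 iscsi_igroup 0   # present the same LUN to the iSCSI initiators

# On each ESX host, pick up the change with a rescan -- no reboot or Storage vMotion needed:
esxcfg-rescan vmhba33
```

The LUN, its data, and the array configuration are untouched; only the initiator group mapping changes.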

For the NAS (NFS) tests, our engineers simply created a FlexVol and connected it to the hosts in the cluster. The remainder of the environment was unchanged: we leveraged the same arrays, disks, converged network adapters, etc. The only change was deleting the LUN used in the SAN tests.
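The NFS setup is comparably short. A sketch with Data ONTAP 7-mode and ESX 4.x commands follows; the aggregate, volume, size, and addresses are illustrative stand-ins:

```shell
# On the controller: create the flexible volume and export it (illustrative names/sizes):
vol create nfs_datastore aggr1 500g
exportfs -p rw=192.168.1.0/24,root=192.168.1.0/24 /vol/nfs_datastore

# On each ESX host: mount the export as an NFS datastore:
esxcfg-nas -a -o 192.168.1.10 -s /vol/nfs_datastore nfs_datastore
```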

The Test Bed

The test bed comprised an 8-node vSphere 4.1 cluster. Each host was a Fujitsu Primergy RX200 with two quad-core Intel Xeon E5507 (Nehalem) CPUs, 48 GB of memory, QLogic CNAs and HBAs, and Intel NICs. The I/O load was generated by 128 VMs, each running IOMeter.

The storage array was a NetApp FAS6210 running Data ONTAP 8.0.1RC2, configured with 190 15K SAS drives and connected to a pair of Cisco Nexus 5020 unified fabric switches via NetApp’s Unified Connect CNAs.

Some Highlights from TR-3916

I want to refrain from sharing all of the great content in TR-3916 here in this post. However, I will share the results from section 5: a high-performance, non-shared datastore where the VM was running a 60/40 read/write workload with an 8KB block size. This is an OLTP-type workload commonly found with ERP systems running on Oracle Database or SQL Server.
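For readers who want to reason about this workload profile, here is a minimal sketch of what a 60/40 read/write, 8 KB op stream looks like. The function name and structure are my own illustration; IOMeter's actual access-specification internals are more involved:

```python
import random

def oltp_ops(n, read_pct=0.60, block_size=8192, seed=42):
    """Generate an illustrative 60/40 read/write, 8 KB OLTP-style op stream.

    Mirrors only the section 5 workload parameters (mix and block size);
    it is not a reproduction of the IOMeter access specification.
    """
    rng = random.Random(seed)
    return [("read" if rng.random() < read_pct else "write", block_size)
            for _ in range(n)]

ops = oltp_ops(10_000)
reads = sum(1 for op, _ in ops if op == "read")
print(f"read fraction: {reads / len(ops):.2f}")  # roughly 0.60
```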

I think the most common perception around delivering optimal performance for this type of workload would be one biased towards FC or FCoE SAN connectivity. You can’t blame anyone for having such a bias, as it was the storage industry’s message for years.

I advocate the premise that ‘virtualization changes everything’, and it looks like the old adage that an FC SAN is the means to high performance no longer applies.


IOPs – higher is better

Latency – lower is better

Guest CPU Utilization – lower is better

As the charts above show, storage IOPs, latency, and guest CPU consumption are within 10% of each other regardless of the storage protocol tested. 8Gb FC, 10GbE NFS, and 10GbE FCoE fared a bit better than the other configurations and provided nearly identical results.

Wrapping Up This Post

The goal of our performance comparison reports is to help you, the customer, make a sound decision around your storage architecture. Whether you prefer SAN or NAS, have 10GbE or 1GbE networks, etc., you can be confident that a vSphere on NetApp and Cisco solution will be able to meet your current and future requirements.

Your data center will change over time, whether in protocols, network types, or storage connection options. You may have insight into what some of these changes will be; others will catch you by surprise (we technology vendors like to surprise you from time to time). With an end-to-end unified architecture from Cisco, NetApp, and VMware, you’re future-proof.

I’d like to thank the performance engineering teams at NetApp and VMware for their efforts in providing this informative data. Please take some time to review TR-3916 and consider sharing your thoughts in the comments section.


  1. I was looking through the report, and it seems that the maximum bandwidth usage was set at 128 VMs having 256 total outstanding I/O’s of 4KB each, delivering a total bandwidth usage of 1MB/s through the protocol under test. Is this correct?

  2. @Erik – There are two models for deploying storage with vSphere, which I will refer to as shared and isolated datastores. In this TR we review both: shared in sections 3 & 4, and isolated in section 5.
    Shared datastores are large pools typically comprised of multiple VMs, where each VM commonly has low to moderate I/O requirements. Shared datastores commonly contain 5-15 VMs with SAN protocols (FC, FCoE, iSCSI) and 60-200 VMs with NAS (NFS). While each VM’s I/O load is not large, the aggregate I/O load is rather large.
    Isolated datastores are smaller pools comprised of a single VM with high I/O requirements, such as an OLTP database.
    The IOMeter settings in your question, including outstanding I/O and block size, apply to the shared datastore tests of section 3.
    IOMeter sends I/O requests asynchronously, resulting in an aggregate I/O load on the datastore that can be measured in hundreds of MB/s and tens of thousands of IOPs. I wish I could share the actual results, but VMware engineering specifically requests that we only publish relative numbers.
    Trust me here; the workload on the shared datastore is massive.
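    The key point is that outstanding I/O measures data in flight, not throughput. A quick back-of-the-envelope sketch makes the distinction concrete (the 1 ms service time below is purely illustrative, not a figure from the report):

```python
# Outstanding I/O describes data queued at any instant, not data delivered
# per second. With 256 outstanding 4 KB requests, only 1 MiB is in flight:
outstanding = 256
block = 4 * 1024
in_flight = outstanding * block          # 1 MiB queued -- not 1 MB/s delivered

# Throughput depends on how fast those queue slots turn over. Assuming an
# illustrative 1 ms service time per I/O (hypothetical, not from TR-3916),
# Little's law gives the sustained rate:
latency_s = 0.001
iops = outstanding / latency_s           # 256,000 IOPS
throughput_mb_s = iops * block / 1e6     # ~1,049 MB/s -- hundreds of MB/s
print(in_flight, iops, throughput_mb_s)
```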
    BTW – There’s additional conversation on this topic here:

  3. Hi Vaughn. Would it be feasible for the performance engineers to post the IOMeter configuration files (.ICF) they used to benchmark this environment?
    They can be recreated from the report, but it would be great if they were available online…
