I am pleased to share with you that this morning our new Technical Report, TR-3749: vSphere on NetApp Storage Best Practices, has been released. I’d like to thank Mike Slisinger, Larry Touchette, and Peter Learmonth for all of their assistance in the production of this TR.
I’d like to take a few moments to share some of the core enhancements covered in the report.
iSCSI Enhancements
iSCSI has made major gains in vSphere, and in TR-3749 we detail how to aggregate the bandwidth of multiple network links. This solution allows customers with GbE-based storage networks to meet the throughput needs of their most demanding applications without having to upgrade the storage network to FC, FCoE, or 10 GbE.
This solution consists of two technologies: support for multiple TCP sessions with the iSCSI initiator, and the Round Robin Path Selection Policy (PSP) within the Native Multipathing Plugin (NMP). Both of these technologies are components of the VMware Pluggable Storage Architecture (PSA).
Enabling multiple TCP sessions requires each ESX/ESXi server to have multiple VMkernel ports defined for storage networking. With this design you will also have to bind the iSCSI service to each of these VMkernel ports. This last step can only be completed via the command line interface on each server.
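As a rough sketch of those per-host CLI steps (the IP addresses, port group names, VMkernel ports, and vmhba number below are placeholders for illustration and will differ in your environment):

    # Create additional VMkernel ports for iSCSI
    # (assumes the iSCSI-1 / iSCSI-2 port groups already exist on the vSwitch)
    esxcfg-vmknic -a -i 192.168.1.101 -n 255.255.255.0 iSCSI-1
    esxcfg-vmknic -a -i 192.168.1.102 -n 255.255.255.0 iSCSI-2

    # Bind each VMkernel port to the software iSCSI initiator
    esxcli swiscsi nic add -n vmk1 -d vmhba33
    esxcli swiscsi nic add -n vmk2 -d vmhba33

    # Verify the bindings
    esxcli swiscsi nic list -d vmhba33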
The second half of this solution requires enabling the Round Robin PSP on each ESX/ESXi server. If you don’t complete this last step, your iSCSI traffic will only traverse a single Ethernet link. This change can be completed via the NetApp Virtual Service Console or via the CLI on each ESX/ESXi server.
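If you go the CLI route, the commands look roughly like this (a sketch only; the device identifier is a placeholder, and you would repeat the change for each NetApp LUN or script it):

    # List devices and their current path selection policies
    esxcli nmp device list

    # Set the Round Robin PSP on a given device (substitute your NAA identifier)
    esxcli nmp device setpolicy --device <naa_device_id> --psp VMW_PSP_RR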
If you would like to read more about the process to enable the RR PSP see my post on the Plug-n-Play SAN.
Fibre Channel Enhancements
Fibre Channel and Fibre Channel over Ethernet have also made major strides with vSphere, and in TR-3749 we detail how to enable ALUA and, along with the RR PSP, how you can enable a high-performing Plug-n-Play SAN architecture. For this post I will skip the details of this design as I have covered it in depth in the P-n-P SAN post.
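At a high level, and only as an illustrative sketch (the igroup name below is a placeholder), the two pieces are enabling ALUA on the NetApp initiator group and making Round Robin the default PSP for ALUA-claimed devices on each host:

    # On the NetApp controller: enable ALUA on the initiator group
    igroup set <esx_igroup> alua yes

    # On each ESX/ESXi host: make Round Robin the default PSP for the ALUA SATP
    esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR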
In addition, configuring FCoE is exactly the same as FC within ESX/ESXi 4. I state this point to reassure customers that FCoE is simply FC running over a different medium. No more, no less.
NFS Enhancements
The Network File System within ESX/ESXi, and how it is best deployed, has remained static from VI3 to vSphere. There are some additional enhancements available today, such as deploying the Cisco 1000V virtual switch and enabling LACP for a simpler architecture and greater network resiliency. The 1000V can be enabled whether or not your network is running Catalyst switches from Cisco.
Unfortunately, with version 1.0 of TR-3749 we were unable to go into detail around the 1000V and the Catalyst line, as our work was completed prior to the opening of the NetApp Cisco Ethernet Unification Center of Excellence in RTP, North Carolina. Look for an update shortly which significantly builds out the Catalyst content.
NFS customers should expect additional enhancements with NFS in the future. I’ll make sure to share them with you as they are made available.
Additional vSphere Enhancements
There are a number of additional technologies and deployment and operational details covered in TR-3749. These include deploying Distributed Virtual Switches, increasing datastore capacity by growing LUNs and VMFS, and thin provisioned virtual disks.
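To give a flavor of two of those items, here is a sketch only (the volume, LUN, datastore, and VM names are placeholders, and the lun resize syntax assumes Data ONTAP 7-mode):

    # Grow an existing LUN on the controller by 100 GB
    lun resize /vol/vsphere_vol/datastore1_lun +100g

    # Create a thin provisioned virtual disk from the ESX console
    vmkfstools -c 20G -d thin /vmfs/volumes/datastore1/myvm/myvm_data.vmdk

After growing the LUN you would rescan the host storage adapters and then grow the VMFS datastore from the vSphere Client.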
I hope you find the content in the new TR very useful as you begin developing your vSphere migration plan. Please send us your feedback as you review the TR, and include ‘TR-3749 Feedback’ in your subject line (sorry, I didn’t know how to insert the subject line into the hyperlink as I created this post).
Thanks!
Just a note that at the top of page 10 there are three duplicated rows in the chart… working my way slowly through the doc finally.
Vaughn, care to explain what the recommended flow control settings of “filer: send” and “switches: receive” are trying to address? I’ve searched high and low for some background information on the rationale, to no avail…
Does it make any difference at all whether the storage emits or receives PAUSE frames? Isn’t communication paused briefly in both cases?
Flow control is discussed in more detail within TR-3802:
Ethernet Storage Best Practices
http://www.netapp.com/us/library/technical-reports/tr-3802.html
CONGESTION MANAGEMENT WITH FLOW CONTROL (page 22)
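For what it’s worth, here is a sketch of what that recommendation looks like when applied on the controller side (7-mode ifconfig syntax; the interface name is an example, and the switch-side configuration varies by vendor):

    # Storage controller sends PAUSE frames when its buffers fill,
    # but does not honor PAUSE frames from the network;
    # the switch ports facing the controller are set to receive (honor) them
    ifconfig e0a flowcontrol send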
Chapter 10 in TR-3749 Version 1.0 has many technical inconsistencies.
#1 – On page 59 (stacked switches), the NICs are teamed with IP hash to the VMkernel. This contradicts what vSphere 4.0 recommends, which is a 1:1 relationship between the VMkernel port and the physical NIC. So there should be two VMkernels and one physical NIC per VMkernel. Multipathing needs to be done at the storage level and not at the networking level with IP hash. In fact, I see little value in stacked switches for this configuration, since cost must be included.
#2 – With non-stacked switches it shows two VMkernels but does not explain the 1:1 relationship.
#3 – The service console is not required in 4.0 (it was required in 3.5).
These are my observations. Maybe someone else also reported these findings.