As one might imagine
Several of us within NetApp receive a fair amount of feedback on the content of our VMware on NetApp Best Practices Technical Report, TR-3428. Many of the inquiries concern how frequently the document is updated and how to be notified when updates are posted.
I’d like to use this post to say that we hear you, to follow up on these questions, and to ask for some feedback as we move forward with the document and begin work on our ESX 4.0 on NetApp Best Practices document.
An update has been posted
Version 4.4 went live on November 24th around midnight EDT. As this update closely followed version 4.3, which was posted in November, I’d like to share with you some of the key updates and additions to the content that have been introduced in the past 30 days.
VMware snapshots with NFS datastores
This topic has been discussed enough in the blogosphere (here, here, and here as examples) that I don’t feel I need to spend much time on the details, but at a high level this issue has been resolved in ESX 3.5 updates 1 through 3. This issue is also commonly referred to as ‘NFS locking’, as previous workarounds involved disabling locking for NFS datastores.
In short, this issue resulted in a lengthy suspension of I/O within a guest operating system (GOS) during the deletion of a VMware snapshot when the VM resided in an NFS datastore. This issue was found to affect any NFS server and had an impact on many solutions including, but not limited to, Storage VMotion, Consolidated Backup, and NetApp SnapManager for Virtual Infrastructure.
Which version of ESX you are running will dictate the course of action required to enable this fix. If your environment is running ESX 3.5 update 1 or update 2 you will need to download and install VMware patch ESX350-200808401-BG. If you have ESX 3.5 update 3, the code for the fix is included in the release; however, you still need to enable the fix.
The process for patching or enabling the fix is outlined in TR-3428. Regardless of which update you are running, the process requires the completion of multiple steps in order to be successfully implemented.
As stated earlier, this fix only applies to ESX 3.5 updates 1 through 3. If you are running VMware on NFS datastores and aren’t on ESX 3.5 update 1 or higher, I’d suggest that you plan an upgrade. Look on the bright side: you should be able to complete this process without disrupting your VMs, while at the same time gaining the wealth of enhancements provided with 3.5.
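To summarize the choices above, here is a minimal Python sketch of the decision logic. The function and enum names are hypothetical and exist only for illustration. The sketch only tells you which path applies; the actual patching and enabling steps are the ones documented in TR-3428.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    # Hypothetical labels for the paths described in this post.
    UPGRADE = "plan an upgrade to ESX 3.5 update 1 or higher"
    PATCH_THEN_ENABLE = "install patch ESX350-200808401-BG, then enable the fix"
    ENABLE_ONLY = "fix is included in the release; enable it"
    NOT_COVERED = "not covered by this post; consult TR-3428"


def nfs_snapshot_fix_action(version: Tuple[int, int], update: int) -> Action:
    """Map an ESX release to the course of action described in this post.

    version: (major, minor), for example (3, 5)
    update:  update level, where 0 means the base release
    """
    if version < (3, 5):
        return Action.UPGRADE
    if version == (3, 5):
        if update == 0:
            return Action.UPGRADE
        if update in (1, 2):
            return Action.PATCH_THEN_ENABLE
        if update == 3:
            return Action.ENABLE_ONLY
    # Anything newer than ESX 3.5 update 3 is outside what this post discusses.
    return Action.NOT_COVERED


if __name__ == "__main__":
    for release in [((3, 0), 2), ((3, 5), 0), ((3, 5), 2), ((3, 5), 3)]:
        print(release, "->", nfs_snapshot_fix_action(*release).value)
```

Note that both the update 1/2 path and the update 3 path end with an explicit enable step; installing the code alone is not enough.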
Updated TCP/IP heap settings
The joint engineering teams at VMware and NetApp discovered that the published recommendations for TCP/IP heap were incomplete. This information has been fleshed out in VMware KB Article 2239 & is repeated in TR-3428.
In summary, the TCP/IP heap setting allocates memory when an ESX server boots for IP related functions such as IP based storage (both iSCSI & NFS), VMotion and Storage VMotion, remote console connection to VMs, and possibly other purposes. This heap also has a second setting which defines the maximum amount of memory which can be consumed by the ESX server should more heap be required to maintain operations.
If you are running IP-based storage, I’d suggest that you review the updated content and implement the recommendations during your next maintenance window. Failure to do so can result in a situation where the ESX server exhausts the heap and services are disrupted.
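To make the relationship between the two heap settings concrete, here is a small conceptual model in Python. It is only a sketch of the behavior described above (memory reserved at boot, growth allowed up to a maximum, failure once the maximum is reached), not how the ESX kernel actually manages memory. The class name and the sizes used are hypothetical and are not recommended values; use the values published in TR-3428.

```python
class TcpipHeapModel:
    """Toy model of a heap with a boot-time size and a maximum size."""

    def __init__(self, initial_mb: int, max_mb: int) -> None:
        if initial_mb <= 0 or max_mb < initial_mb:
            raise ValueError("need 0 < initial_mb <= max_mb")
        self.allocated_mb = initial_mb  # memory reserved at boot
        self.max_mb = max_mb            # ceiling the heap may grow to
        self.used_mb = 0

    def request(self, mb: int) -> bool:
        """Try to consume `mb` of heap; grow up to the maximum if needed."""
        needed = self.used_mb + mb
        if needed > self.max_mb:
            # Heap exhausted: in the real world this is where IP services
            # such as storage traffic or VMotion would be disrupted.
            return False
        if needed > self.allocated_mb:
            self.allocated_mb = needed  # grow beyond the boot-time size
        self.used_mb = needed
        return True


if __name__ == "__main__":
    heap = TcpipHeapModel(initial_mb=16, max_mb=64)  # illustrative sizes only
    print(heap.request(10), heap.allocated_mb)  # fits in the boot-time heap
    print(heap.request(30), heap.allocated_mb)  # forces the heap to grow
    print(heap.request(30), heap.allocated_mb)  # would exceed the maximum
```

The point of the model is that a maximum set too low for the workload leaves no room to grow, which is why both settings need to be reviewed together.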
Thinking ahead and soliciting feedback
TR-3428 began as a grassroots effort a few years ago, and it is beginning to show its age. Recently I was surprised to learn that it is the most popular Technical Report in the NetApp Tech Library. This news lets us know that we have been on the mark in providing information for a wildly exploding market.
It is with this need in mind that I’d like to share a few of my thoughts on what we are considering in the forthcoming ESX 4.0 on NetApp Best Practices Technical Report.
Thoughts for the next version include…
1. Organizing the document based on administrative roles. For example, dedicated sections for VMware admins, storage admins, storage network admins, & GOS image admins.
2. Providing advanced switch configuration settings for options such as flow control, spanning tree, bridge protocol data units, channel modes, etc. It is clear that Ethernet is the storage network for many extremely large installations and the growth in this area isn’t slowing.
3. Providing a more centralized collection of data, whether through protocol-driven navigation, more links to external references, more KB-related content (which historically has been separate), etc.
This is a document for the people by the people
I’d like to thank all of our customers, partners, and VMware engineers who have consistently provided feedback through the lifecycle of TR-3428. Your feedback directly influences the future development of our technical documents, and I’d like to ask for your continued feedback and suggestions so we can ensure that we are delivering the content you need to be successful.