VMware on NetApp Best Practices Update

As one might imagine, several of us within NetApp receive a fair amount of feedback on the content of our VMware on NetApp Best Practices Technical Report, TR-3428.  Two of the most frequent inquiries concern how often the document is updated and how to be notified when updates are posted.
I’d like to use this post to say that we hear you, to follow up on these questions, and to ask for some feedback as we move forward with the document and begin work on our ESX 4.0 on NetApp Best Practices document.

An update has been posted

Version 4.4 went live on November 24th around midnight EDT.  As this update closely followed version 4.3, which was also posted in November, I’d like to share with you some of the key updates and additions to the content that have been introduced in the past 30 days.

VMware snapshots with NFS datastores

This topic has been discussed enough in the blogosphere (here, here, and here as examples) that I don’t feel I need to spend much time on the details.  At a high level, this issue has been resolved in ESX 3.5 updates 1 through 3.  It is also commonly referred to as ‘NFS locking’, because previous workarounds involved disabling locking for NFS datastores.

In short, this issue caused a lengthy suspension of I/O within a guest operating system (GOS) during the deletion of a VMware snapshot when the VM resided in an NFS datastore.  It was found to affect any NFS server and had an impact on many solutions including, but not limited to, Storage VMotion, Consolidated Backup, and NetApp SnapManager for Virtual Infrastructure.

Which version of ESX you are running will dictate the course of action required to enable this fix.  If your environment is running ESX 3.5 update 1 or update 2 you will need to download and install VMware patch ESX350-200808401-BG. If you have ESX 3.5 update 3, the code for the fix is included in the release; however, you still need to enable the fix. 

The process for patching or enabling the fix is outlined in TR-3428.  Regardless of which update you are running, the process requires the completion of multiple steps in order to be successfully implemented. 

As stated earlier, this fix only applies to ESX 3.5 updates 1 through 3.  If you are running VMware on NFS datastores and aren’t on ESX 3.5 update 1 or higher, I’d suggest that you plan an upgrade.  Look on the bright side: you should be able to complete this process without disrupting your VMs, while at the same time gaining the wealth of enhancements provided with 3.5.
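
For those who like to see the decision laid out, here is a minimal Python sketch of the logic described above.  The function name and the returned wording are my own illustration, not anything shipped by VMware or NetApp, and the actual steps remain those documented in TR-3428.

```python
# Patch named in this post for ESX 3.5 update 1 and update 2.
PATCH_ID = "ESX350-200808401-BG"


def nfs_snapshot_fix_action(update_level: int) -> str:
    """Map an ESX 3.5 update level to the course of action described in this post.

    update_level is hypothetical shorthand: 0 means ESX 3.5 with no update,
    1 means update 1, and so on.
    """
    if update_level < 0:
        raise ValueError("update level cannot be negative")
    if update_level == 0:
        # Below update 1: the fix is not available, so plan an upgrade.
        return "plan an upgrade to ESX 3.5 update 1 or higher"
    if update_level in (1, 2):
        # Updates 1 and 2 need the patch before following the TR-3428 steps.
        return f"install patch {PATCH_ID}, then follow the steps in TR-3428"
    if update_level == 3:
        # Update 3 ships the code, but the fix still has to be enabled.
        return "enable the included fix by following the steps in TR-3428"
    # This post only covers updates 1 through 3.
    return "not covered here; this fix applies to ESX 3.5 updates 1 through 3"


if __name__ == "__main__":
    for level in range(0, 4):
        print(f"ESX 3.5 update {level}: {nfs_snapshot_fix_action(level)}")
```

Whichever branch applies, remember that the process involves multiple steps, so treat this only as a way to work out where to start.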

Updated TCP/IP heap settings

The joint engineering teams at VMware and NetApp discovered that the published recommendations for TCP/IP heap were incomplete.  This information has been fleshed out in VMware KB Article 2239 and is repeated in TR-3428.

In summary, the TCP/IP heap setting allocates memory when an ESX server boots for IP-related functions such as IP-based storage (both iSCSI and NFS), VMotion and Storage VMotion, remote console connections to VMs, and possibly other purposes.  This heap also has a second setting that defines the maximum amount of memory the ESX server can consume should more heap be required to maintain operations.

If you are running an IP storage infrastructure, I’d suggest that you review the updated content and implement the recommendations during your next maintenance window.  Failure to do so can result in a situation where the ESX server exhausts the heap and services are disrupted.
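
To make the relationship between the two settings concrete, here is a small Python model of a heap with a boot-time size and a maximum size.  It is purely illustrative: the class name, the method, and the numbers are my own assumptions and do not represent ESX internals or the values recommended in TR-3428.

```python
class HeapExhausted(Exception):
    """Raised when a request cannot be satisfied even at the maximum heap size."""


class TcpipHeapModel:
    """Toy model of a heap with an initial (boot-time) size and a maximum size."""

    def __init__(self, initial_mb: int, max_mb: int) -> None:
        if initial_mb <= 0 or max_mb < initial_mb:
            raise ValueError("need 0 < initial_mb <= max_mb")
        self.size_mb = initial_mb  # memory set aside at boot
        self.max_mb = max_mb       # ceiling the heap may grow to
        self.used_mb = 0

    def allocate(self, mb: int) -> None:
        """Consume heap, growing past the boot-time size only when required."""
        needed = self.used_mb + mb
        if needed > self.max_mb:
            # This is the situation the post warns about: the heap is exhausted.
            raise HeapExhausted(f"need {needed} MB but the maximum is {self.max_mb} MB")
        self.size_mb = max(self.size_mb, needed)
        self.used_mb = needed


if __name__ == "__main__":
    heap = TcpipHeapModel(initial_mb=16, max_mb=64)  # illustrative values only
    heap.allocate(10)   # fits within the boot-time allocation
    heap.allocate(30)   # forces the heap to grow toward its maximum
    print(heap.size_mb, heap.used_mb)
```

In this model a further request that pushes usage past the maximum raises `HeapExhausted`, which is the toy equivalent of the service disruption described above.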

Thinking ahead and soliciting feedback

TR-3428 began as a grassroots effort a few years ago, and it is beginning to show its age.  Recently I was surprised to learn that it is the most popular Technical Report in the NetApp Tech Library.  This news tells us that we have been on the mark in providing information to a rapidly growing market.

It is with this need in mind that I’d like to share a few of my thoughts on what we are considering in the forthcoming ESX 4.0 on NetApp Best Practices Technical Report. 

Thoughts for the next version include…

1. Organizing the document based on administrative roles.  For example, dedicated sections for VMware admins, storage admins, storage network admins, and GOS image admins.

2. Providing advanced switch configuration settings for options such as flow control, spanning tree, bridge protocol data units, channel modes, etc.  It is clear that Ethernet is the storage network for many extremely large installations and the growth in this area isn’t slowing.

3. Providing a more centralized collection of data, whether through protocol-driven navigation, more links to external references, or more KB-related content (which historically has been separate).

This is a document for the people by the people

I’d like to thank all of our customers, partners, and VMware engineers who have consistently provided feedback throughout the lifecycle of TR-3428.  Your feedback directly influences the future development of our technical documents, and I’d like to ask for your continued feedback and suggestions so we can ensure that we are delivering the content you need to be successful.

9 Comments

  1. Feedback – How?
    I have some feedback on TR-3428, but there’s nowhere to turn with the feedback. A link or an e-mail address to post the comments to would be great!
    Btw, I think I have found a problem with one of these practices, and would like to share it with the writers.
    -Ulf

  2. Paul, I’d like to make some comments regarding dedupe if that’s OK. While Dedupe is available from many vendors, NetApp is the only one offering Dedupe for production environments. Dedupe impacts many of the requirements in one’s infrastructure from production storage, to backups (when backing up to disk), & DR bandwidth replication.
    DeDupe is available on all arrays from NetApp and works with all protocols. Now if you think Dedupe is cool check out NTAP intelligent caching (or dedupe aware array cache).
    For more info see:
    http://www.netapp.com/us/solutions/infrastructure/virtualization/
    and a video here:

  3. Additional info in NFS configuration is appreciated. Configs for SMVI and disk persistence are critical. Lots of good info in this doc.
    Have seen improved performance when a dedicated swap datastore has less than 24 guests per AND hosts page to their own dedicated datastore, one per host machine. vswap_4guest and swap_4host, for example.. They don’t seem to share well when combined.
    As the software offerings (smvi, ops mgr suite, etc) grow keeping on the standard becomes mandatory for those of us who consume the technology. This is certainly on my notification list for white papers.
    Off topic but has VM released any beta esx4 to major vendors? After recent issues I am surprised not to have seen more on a beta product out there.
    thanks again,
    ben g.

  4. TR-3428 was updated again in March 2009. Could you do a follow-up post similar to this article on the most recent updates? I think it would be very helpful for customers to understand the most recent changes to the document and perhaps why a re-read might be good.
    Thank you.
