It has been one year to the day since I last authored a post on the topic of partition alignment. I think it's fair to state this issue has gained awareness over the last year thanks to posts from a number of industry experts such as Duncan Epping, Aaron Delp, and Chad Sakac, just to name a few. However, misalignment remains an open issue, which is unfortunate for customers and partners.
In this post I hope to shed some new light on this old topic, with a goal of revisiting it one year from now to proclaim misalignment is dead!
Let’s Get on the Same Page
From where I sit it appears that this topic is somewhat confusing. Some of the confusion seems to be a direct result of sales teams attempting to use misalignment as an incentive for the customer to purchase an alternative storage platform. I’ve heard the following message far too many times…
Storage vendor sales pitch: “Mr. Customer, your current storage array suffers an awful performance penalty with misalignment. Our storage is unaffected by misalignment, so if you’d purchase our storage all of your problems will be resolved…”
These statements are simply rubbish. If you have a storage sales rep who has made statements such as these, maybe they’re taking you and your business for granted. Don’t take my word for it; here are quotes, with direct links, from the technology partners which power your data center.
Dell – “The physical translation is important when selecting LUN element size. The smaller the element size, the more efficient the distribution of data read or written. However, if the size is too small for a single I/O operation, the operation requires access to two stripes, which requires reading and/or writing from two disks instead of one. This is known as disk crossing. Best practices recommend selecting a size that is a multiple of 16 sectors (8 KB) and is the smallest size that will rarely result in forced access to another disk.”
EMC Symmetrix with vSphere – “Prior experience with misaligned Windows partitions and file systems has shown as much as 20 to 30 percent degradation in performance.” .. “Aligning the data partitions on 64KB boundary results in positive improvements in overall I/O response time experienced by all hosts connected to the shared storage array.”
EMC Symmetrix with Windows – “Misalignment with these storage boundaries could potentially lead to performance problems.”
EMC Symmetrix with Oracle – “Because of the first partition misalignment on x86 systems relative to the storage array tracks, all data on that partition will continue to be misaligned and that misalignment has been shown to cause performance degradation.”
EMC Clariion – “File-system misalignment affects performance in two ways: 1. Misalignment causes disk crossings: an I/O broken across two drives (where normally one would service it). 2. Misalignment makes it hard to stripe-align large uncached writes. The first case is more commonly encountered. Even if disk operations are buffered by cache, the effect can be detrimental, as it will slow flushing from cache. Random reads, which by nature require disk access, are also affected, both directly (waiting for two drives in order to return data) and indirectly (making the disks busier than they need to be).”
IBM SVC – “The recommended settings for the best performance with SVC when you use Microsoft Windows operating systems and applications with a significant amount of I/O can be found at the following Web site”
IBM DS8000 – “An aligned partition setup makes sure that a single I/O request results in a minimum number of physical disk I/Os, eliminating the additional disk operations, which, in fact, results in an overall performance improvement.”
HP EVA – “As a quick background advisory, applications that utilize EVA VRAID disks might experience a write performance penalty with the default Windows 2003 primary disk partition alignment.” … “Qualitative evidence has shown that sector realignment with DiskPar has the greatest impact on large block sequential writes to VRAID5 LUNs rather than random I/O data streams. A significant impact can occur when performing a disk-to-disk backup using a VRAID5 LUN for the destination volume, for which there are large block sequential writes.”
Microsoft Exchange Server – “Setting the starting offset correctly will align Exchange I/O with storage track boundaries and improve disk performance” .. “Therefore, make sure that the starting offset is a multiple of 8 KB. Failure to do so may cause a single I/O operation spanning two tracks, causing performance degradation.”
Microsoft SQL Server 2008 – “Disk partition alignment is a powerful tool for improving SQL Server performance. Configuring optimal disk performance is often viewed as much art as science. A best practice that is essential yet often overlooked is disk partition alignment. Windows Server 2008 attempts to align new partitions out-of-the-box, yet disk partition alignment remains a relevant technology for partitions created on prior versions of Windows.”
Microsoft Windows – “Disk performance may be slower than expected when you use multiple disks in Microsoft Windows Server 2003, in Microsoft Windows XP, and in Microsoft Windows 2000. For example, performance may slow when you use a hardware-based redundant array of independent disks (RAID) or a software-based RAID.” … “To resolve this issue, use the Diskpart.exe tool to create the disk partition and to specify a starting offset of 2,048 sectors (1 megabyte). A starting offset of 2,048 sectors covers most stripe unit size scenarios.”
NetApp – “For optimal performance, the starting offset of a file system should align with the start of a block in the next lower layer of storage. For example, an NTFS file system that resides on a LUN should have an offset that is divisible by the block size of the storage array presenting the LUN. Misalignment of block boundaries at any one of these storage layers can result in performance degradation.”
Oracle Database Server – “On some Oracle ports, an Oracle block boundary may not align with the stripe. If your stripe depth is the same size as the Oracle block, then a single I/O issued by Oracle might result in two physical I/O operations. This is not optimal in an OLTP environment. To ensure a higher probability of one logical I/O resulting in no more than one physical I/O, the minimum stripe depth should be at least twice the Oracle block size.”
SUN Solaris – “An advanced storage system, such as Oracle’s Sun Storage 7000 Unified Storage System, is a traditional SCSI-accessed logical unit (LUN) to client systems. Although these devices may be accessed with legacy 512B I/O transfers, internally these devices may be managed with variable block sizes that are larger than the standard 512B sector size available on commercial hard disk drives. In practice, these advanced storage devices process data most effectively when the operating system I/O request is aligned with the block size of the LUN presented by the storage device. When I/O is not aligned to the block size of the LUN, response time may increase and throughput may decrease compared to the aligned case.”
VMware vSphere – “The alignment of your file system partitions can impact performance. VMware makes the following recommendations for VMFS partitions: Like other disk-based file systems, VMFS suffers a penalty when the partition is unaligned. Using the vSphere Client to create VMFS partitions avoids this problem since it automatically aligns the partitions along the 64KB boundary.” … “Make sure the system partitions within the guest are aligned.”
As you can see, the IT industry is aligned on alignment!
The impact of a single misaligned VM may be nearly undetectable; however, today’s NAS & SAN arrays store exponentially more data than before the onset of server virtualization. At scale the effect of misalignment becomes compounded and debilitating, impacting all of the VMs on the storage array.
Could you imagine if server virtualization resulted in a 30% penalty in CPU performance? Would you be eager to virtualize the majority of your data center? Would you throw 30% more CPUs at the problem to offset the overhead? This may sound far-fetched, but this is exactly what some are doing: deploying additional storage hardware to offset the performance impact of misalignment. I think we know this step is a stop-gap measure at best.
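To make the "disk crossing" idea from the quotes above concrete, here is a minimal sketch that counts how many array blocks a single guest I/O spans. It assumes a 4,096-byte array block; the function name and parameters are my own for illustration and do not come from any vendor tool.

```python
def blocks_touched(partition_offset, io_offset, io_size, block_size=4096):
    """Count the array blocks that one guest I/O spans.

    partition_offset: byte offset of the partition on the LUN or virtual disk
    io_offset: byte offset of the I/O within the partition
    io_size: length of the I/O in bytes (must be greater than zero)
    block_size: assumed array block size in bytes
    """
    start = partition_offset + io_offset
    end = start + io_size - 1
    return end // block_size - start // block_size + 1


# A 4 KB guest I/O at the start of the partition:
misaligned = blocks_touched(32256, 0, 4096)  # 63-sector starting offset
aligned = blocks_touched(32768, 0, 4096)     # 64-sector starting offset
print(misaligned, aligned)  # prints "2 1"
```

With the 63-sector offset every 4 KB I/O straddles two array blocks instead of one, which is the extra work that compounds at scale.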
It’s Time to Get Busy
Misalignment isn’t going to solve itself, so let’s discuss how one can start to tackle this issue and return our storage platforms to optimal performance levels. Your CIO will thank you for this as it will result in a reduction in storage expenditures as you increase the performance of your existing arrays.
Step 1 – Stop Deploying Misaligned VMs from Templates
If you are deploying Windows VMs you need to review your templates based on operating system version. The primary offenders are Windows NT, 2000 and 2003. These platforms have a default starting partition offset of 32,256 bytes (63 sectors of 512 bytes). For alignment the starting offset needs to be evenly divisible by 4,096 bytes, so the minimum value for these systems is 32,768. This is the smallest value resulting in properly aligned partitions for NetApp and most arrays, with the exception of EMC’s Symmetrix DMX & VMAX (which require a starting offset of 65,536).
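The arithmetic is easy to check for yourself. The snippet below is only an illustrative sketch: the helper name is made up, and the boundaries (4,096 and 65,536 bytes) are the ones quoted in this post.

```python
SECTOR_BYTES = 512


def is_aligned(offset_bytes, boundary_bytes=4096):
    """True when the starting offset is evenly divisible by the boundary."""
    return offset_bytes % boundary_bytes == 0


offsets = {
    "63 sectors (Windows NT/2000/2003 default)": 63 * SECTOR_BYTES,
    "64 sectors (minimum aligned value)": 64 * SECTOR_BYTES,
    "128 sectors (Symmetrix DMX & VMAX)": 128 * SECTOR_BYTES,
}

for label, offset in offsets.items():
    print(
        f"{label}: {offset:,} bytes, "
        f"4 KB aligned: {is_aligned(offset)}, "
        f"64 KB aligned: {is_aligned(offset, 65536)}"
    )
```

The 63-sector default fails both checks, 32,768 passes only the 4 KB check, and 65,536 passes both.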
When it comes to current versions of Windows, such as Windows 7, 2008 and Vista, these systems are properly aligned by default as they all have a 1MB starting partition offset, which works universally with all arrays. Kudos to Microsoft for stepping up to the plate to assist their entire customer base with this change!
While you may feel comfortable with the recent versions of Windows, your templates can still be misaligned if you upgraded an older version of Windows to create the template.
I’d suggest you verify all of your templates and correct any that are not properly aligned. If you’re a NetApp customer you can complete an audit with MBRScan and corrective actions with MBRAlign. If you’re not a customer and/or prefer to not use the MBRTools you have a plethora of additional tools including (but not limited to):
I should add that misalignment also occurs with almost every release of Linux and has only recently been addressed in default settings. At the time I wrote this post I couldn’t verify which recent releases and distros have moved to a 1MB partition offset. If someone sends me this info, I will add it to this post or an addendum.
Step 2 – Stop Deploying Misaligned VMs with your P2V Process
Unless you are using a physical-to-virtual migration tool that explicitly states it aligns partitions, you are likely creating misaligned VMs. I hate to single out VMware here, but the (free) VMware Converter creates misaligned VMs.
If your P2V process requires refinement you have two choices, either…
a) Upgrade your P2V tool to one from the list above
b) Continue using the misbehaving tool, but run MBRAlign on the newly migrated VM prior to powering it on.
Frankly option a) seems much more elegant, but that’s just my opinion.
Step 3 – Identify the Misaligned VMs in Production
If you have completed the above actions, you should feel confident you have started down the path of getting healthy, which is good. However, it only gets more difficult from here, as we need to turn our attention to the VMs which are already running, and this phase is going to require a service disruption for each misaligned VM.
Before we jump to step 4, we need to begin by identifying the running VMs that are misaligned. Again, NetApp customers can use MBRScan or our new tool Balance (formerly Akorri BalancePoint). As in the earlier steps, if you’re not a NetApp customer and/or prefer other tools you have many to choose from (see the list above).
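If you are curious what such a scan boils down to, here is a rough sketch that reads the MBR partition table from a raw disk image (for example a copy of a *-flat.vmdk file) and reports starting offsets that are not on a 4 KB boundary. This is not how MBRScan or any other tool is implemented; it assumes 512-byte sectors and a classic MBR, it skips GPT protective entries, and it does not follow logical partitions inside an extended partition. The function names are my own.

```python
import struct

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def partition_offsets(mbr):
    """Return the starting byte offset of each used primary partition."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    offsets = []
    for i in range(4):
        # The four 16-byte partition entries start at byte 446.
        entry = mbr[446 + 16 * i : 446 + 16 * (i + 1)]
        part_type = entry[4]
        if part_type in (0x00, 0xEE):
            continue  # unused slot or GPT protective entry
        # Starting LBA is a little-endian 32-bit value at byte 8 of the entry.
        start_lba = struct.unpack_from("<I", entry, 8)[0]
        offsets.append(start_lba * SECTOR_BYTES)
    return offsets


def misaligned_partitions(image_path, boundary=4096):
    """Return the partition offsets in a raw image that miss the boundary."""
    with open(image_path, "rb") as f:
        mbr = f.read(512)
    return [off for off in partition_offsets(mbr) if off % boundary != 0]
```

Calling misaligned_partitions on the path to a copy of a raw disk image returns an empty list when every primary partition starts on a 4 KB boundary. Please test anything like this against copies, not production disks.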
Step 4 – Correct Misaligned VMs
This is the final phase, and as long as you are no longer proliferating misaligned VMs, soon this process will be a distant memory. There are no shortcuts to this last step, well not today, so prepare yourself to embark on a substantial project which requires taking each VM offline while the misalignment is corrected.
The most difficult part of this process tends to be obtaining permission from application owners to take their systems offline. Frankly, you may find some application owners will be unwilling to do so, while others are more than happy to do so in hopes of increased performance. If you are replicating these VMs for disaster recovery purposes, you should also be prepared to consider the bandwidth requirements to re-replicate these VMs. WAN bandwidth can sometimes act as a capacity limiter on alignment projects.
MBRAlign and the other tools listed above all complete the alignment process by rewriting the virtual disk (the *-flat.vmdk file) with an offset more friendly for storage arrays. Some tools send data between the hypervisor and the storage array, while others may require a third host to act as a proxy. Be sure you understand the data flow before embarking on this last step.
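To get a feel for how small the correction actually is, the sketch below computes the nearest aligned starting offset and the distance the partition data has to move. It only illustrates the arithmetic; it says nothing about how MBRAlign or any other tool rewrites the file, and the function name is hypothetical.

```python
def next_aligned_offset(current_offset, boundary=4096):
    """Smallest multiple of boundary that is >= current_offset."""
    return -(-current_offset // boundary) * boundary  # ceiling division


current = 32256  # 63-sector starting offset
for boundary in (4096, 65536):
    target = next_aligned_offset(current, boundary)
    print(f"boundary {boundary:,}: move start to {target:,} (shift {target - current:,} bytes)")
```

For a 4 KB boundary the shift is a single 512-byte sector, yet every block in the partition has to be rewritten to achieve it, which is why the VM needs to be offline.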
Now an alternative to the traditional method of rewriting the file is to migrate the application to a new VM. With the maturation of Windows 2008 I am seeing more customers go down this path. While it is not the norm, it is a viable option that may bring other benefits.
Recent NetApp Enhancements Around Alignment
NetApp engineering is committed to helping customers correct the issue of alignment, and I’d like to share with you a few recent updates…
The use of the offload capabilities is a bit different between releases: Data ONTAP 7.3.x creates thin VMDKs, while 8.0.1 creates thick VMDKs. This difference will not impact data deduplication results; in fact, by aligning you should see improved dedupe savings.
My apologies to VMFS customers. As I often state, NFS is a networked file system, and as such it allows for direct access to storage virtualization layers by hypervisors, orchestration tools, etc. As a result, NFS commonly receives points of integration before we can deliver them for VMFS.
I love what we can do with WAFL! I guess I should have shared this info in the reasons to upgrade to 8.0.1 post. ☺
Looking Forward to the Future…
In future releases NetApp will deliver… oh how I wish I could publicly share what we’ve got cooking. Damn those NDAs!
I realize this opening may have been a cruel move on my part, but while I can’t share specifically what we are doing I want to assure you the NetApp and VMware engineering teams are stepping up to provide more advanced methods of addressing misalignment and we are doing so on a number of fronts. As each capability comes to market I will make sure you can read about it here first.
If you are a NetApp customer or partner with an NDA, you can get the inside scoop by contacting your NetApp representative and asking for a roadmap presentation on this topic.
Wrapping Up This Post
Wow – this post was a bit longer than I had planned… I apologize for that. In review I believe we’ve covered the following points around misalignment:
I hope you find this information helpful and that it aids your plans around alignment. I look forward to sharing more as we make progress on our roadmap. Cheers!