Current Status, An Update, and a Look to the Future of Alignment

It has been one year to the day since I last authored a post on the topic of partition alignment. I think it’s fair to say this issue has gained awareness over the last year thanks to posts from a number of industry experts such as Duncan Epping, Aaron Delp, and Chad Sakac, just to name a few. However, alignment remains an open issue, which is unfortunate for customers and partners.

In this post I hope to shed some new light on this old topic, with the goal of revisiting it one year from now to proclaim that misalignment is dead!

 

Let’s Get on the Same Page

From where I sit, there still appears to be some confusion around this topic. Some of that confusion is a direct result of sales teams attempting to leverage misalignment as an incentive for the customer to purchase an alternative storage platform. I’ve heard the following message far too many times…

Storage vendor sales pitch: “Mr. Customer, your current storage array suffers an awful performance penalty with misalignment. Our storage is unaffected by misalignment, so if you purchase our storage all of your problems will be resolved…”

These statements are simply rubbish. If you have a storage sales rep who has made statements such as these, maybe they’re taking you and your business for granted. Don’t take my word for it; here are quotes, with direct links, from the technology partners that power your data center.

Dell – “The physical translation is important when selecting LUN element size. The smaller the element size, the more efficient the distribution of data read or written. However, if the size is too small for a single I/O operation, the operation requires access to two stripes, which requires reading and/or writing from two disks instead of one. This is known as disk crossing. Best practices recommend selecting a size that is a multiple of 16 sectors (8 KB) and is the smallest size that will rarely result in forced access to another disk.”

EMC Symmetrix with vSphere – “Prior experience with misaligned Windows partitions and file systems has shown as much as 20 to 30 percent degradation in performance.” .. “Aligning the data partitions on 64KB boundary results in positive improvements in overall I/O response time experienced by all hosts connected to the shared storage array.”

EMC Symmetrix with Windows – “Misalignment with these storage boundaries could potentially lead to performance problems.”

EMC Symmetrix with Oracle – “Because of the first partition misalignment on x86 systems relative to the storage array tracks, all data on that partition will continue to be misaligned and that misalignment has been shown to cause performance degradation.”

EMC Symmetrix with Oracle – “Aligning partitions to a 64KB offset is a general requirement on Linux x86_64 with ASM and Symmetrix storage arrays.”

EMC Clariion – “File-system misalignment affects performance in two ways: 1. Misalignment causes disk crossings: an I/O broken across two drives (where normally one would service it). 2. Misalignment makes it hard to stripe-align large uncached writes. The first case is more commonly encountered. Even if disk operations are buffered by cache, the effect can be detrimental, as it will slow flushing from cache. Random reads, which by nature require disk access, are also affected, both directly (waiting for two drives in order to return data) and indirectly (making the disks busier than they need to be).”

EMC Celerra – “Performance improvement as high as 40% was observed on partitioning the drive using DISKPART and aligning the disk.”

IBM SVC – “The recommended settings for the best performance with SVC when you use Microsoft Windows operating systems and applications with a significant amount of I/O can be found at the following Web site”

IBM DS8000 – “An aligned partition setup makes sure that a single I/O request results in a minimum number of physical disk I/Os, eliminating the additional disk operations, which, in fact, results in an overall performance improvement.”

HP EVA – “As a quick background advisory, applications that utilize EVA VRAID disks might experience a write performance penalty with the default Windows 2003 primary disk partition alignment.” … “Qualitative evidence has shown that sector realignment with DiskPar has the greatest impact on large block sequential writes to VRAID5 LUNs rather than random I/O data streams. A significant impact can occur when performing a disk-to-disk backup using a VRAID5 LUN for the destination volume, for which there are large block sequential writes.”

Microsoft Exchange Server – “Setting the starting offset correctly will align Exchange I/O with storage track boundaries and improve disk performance” .. “Therefore, make sure that the starting offset is a multiple of 8 KB. Failure to do so may cause a single I/O operation spanning two tracks, causing performance degradation.”

Microsoft SQL Server 2008 – “Disk partition alignment is a powerful tool for improving SQL Server performance. Configuring optimal disk performance is often viewed as much art as science. A best practice that is essential yet often overlooked is disk partition alignment. Windows Server 2008 attempts to align new partitions out-of-the-box, yet disk partition alignment remains a relevant technology for partitions created on prior versions of Windows.”

Microsoft Windows – “Disk performance may be slower than expected when you use multiple disks in Microsoft Windows Server 2003, in Microsoft Windows XP, and in Microsoft Windows 2000. For example, performance may slow when you use a hardware-based redundant array of independent disks (RAID) or a software-based RAID.” … “To resolve this issue, use the Diskpart.exe tool to create the disk partition and to specify a starting offset of 2,048 sectors (1 megabyte). A starting offset of 2,048 sectors covers most stripe unit size scenarios.”

NetApp – “For optimal performance, the starting offset of a file system should align with the start of a block in the next lower layer of storage. For example, an NTFS file system that resides on a LUN should have an offset that is divisible by the block size of the storage array presenting the LUN. Misalignment of block boundaries at any one of these storage layers can result in performance degradation.”

Oracle Database Server – “On some Oracle ports, an Oracle block boundary may not align with the stripe. If your stripe depth is the same size as the Oracle block, then a single I/O issued by Oracle might result in two physical I/O operations. This is not optimal in an OLTP environment. To ensure a higher probability of one logical I/O resulting in no more than one physical I/O, the minimum stripe depth should be at least twice the Oracle block size.”

SUN Solaris – “An advanced storage system, such as Oracle’s Sun Storage 7000 Unified Storage System, is a traditional SCSI-accessed logical unit (LUN) to client systems. Although these devices may be accessed with legacy 512B I/O transfers, internally these devices may be managed with variable block sizes that are larger than the standard 512B sector size available on commercial hard disk drives. In practice, these advanced storage devices process data most effectively when the operating system I/O request is aligned with the block size of the LUN presented by the storage device. When I/O is not aligned to the block size of the LUN, response time may increase and throughput may decrease compared to the aligned case. “

 

 


As you can see, the IT industry is aligned on alignment!

The impact of a single misaligned VM may be nearly undetectable; however, today’s NAS & SAN arrays store exponentially more data than before the onset of server virtualization. At scale the effect of misalignment becomes compounded and debilitating, impacting all of the VMs on the storage array.

Could you imagine if server virtualization resulted in a 30% penalty in CPU performance? Would you be eager to virtualize the majority of your data center? Would you throw 30% more CPUs at the problem to offset the overhead? This may sound far-fetched, but it is exactly what some are doing: deploying additional storage hardware to offset the performance impact of misalignment. I think we all know this step is a stop-gap measure at best.

It’s Time to Get Busy

Misalignment isn’t going to solve itself, so let’s discuss how you can start to tackle this issue and return your storage platforms to optimal performance levels. Your CIO will thank you, as this will reduce storage expenditures while increasing the performance of your existing arrays.

Step 1 – Stop Deploying Misaligned VMs from Templates

If you are deploying Windows VMs, you need to review your templates based on operating system version. The primary offenders are Windows NT, 2000, and 2003. These platforms have a default starting partition offset of 32,256 bytes. For alignment, the starting offset needs to be evenly divisible by 4,096 bytes; thus the minimum aligned value for these systems is 32,768 bytes. This is the smallest value that results in properly aligned partitions for NetApp and most other arrays, with the exception of EMC’s Symmetrix DMX & VMAX (which require a starting offset of 65,536 bytes).

Current versions of Windows such as Windows 7, Windows Server 2008, and Vista are properly aligned by default, as they all use a 1MB starting partition offset, which works universally with all arrays. Kudos to Microsoft for stepping up to the plate to assist their entire customer base with this change!
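If you want to sanity-check the arithmetic yourself, here is a minimal Python sketch (my own illustration, not a vendor tool) that tests the offsets discussed above against 4KB and 64KB boundaries:

    def is_aligned(starting_offset_bytes: int, block_size_bytes: int) -> bool:
        """Return True if the partition start falls on a block boundary."""
        return starting_offset_bytes % block_size_bytes == 0

    if __name__ == "__main__":
        offsets = {
            "Windows NT/2000/2003 default": 32_256,     # 63 sectors x 512 bytes
            "Minimum aligned offset":       32_768,     # 64 sectors x 512 bytes
            "Windows 7/2008/Vista default": 1_048_576,  # 2,048 sectors (1MB)
        }
        for label, offset in offsets.items():
            print(f"{label}: {offset:>9} bytes -> "
                  f"4KB aligned: {is_aligned(offset, 4_096)}, "
                  f"64KB aligned: {is_aligned(offset, 65_536)}")

Running it shows the old 32,256-byte default failing both checks, the 32,768-byte offset passing the 4KB check but not the 64KB (Symmetrix) check, and the 1MB offset passing everything.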

While you may feel comfortable with the recent versions of Windows, your templates can still be misaligned if you created the template by upgrading an older version of Windows.

I’d suggest you verify all of your templates and correct any that are not properly aligned. If you’re a NetApp customer, you can complete an audit with MBRScan and take corrective action with MBRAlign. If you’re not a customer and/or prefer not to use the MBRTools, you have a plethora of additional tools to choose from, including (but not limited to):

I should add that misalignment also occurs with almost every release of Linux, and it has only recently been addressed in default settings. At the time I wrote this post I couldn’t verify which recent releases and distros have moved to a 1MB partition offset. If someone sends me this info, I will add it to this post or an addendum.
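In the meantime, if you want a quick way to audit a Linux guest, here is a rough Python sketch that reads the starting sector of each partition from sysfs (it assumes the typical /sys/block layout, where the start value is expressed in 512-byte sectors) and flags anything that does not land on a 4KB boundary:

    import glob, os

    SECTOR = 512  # sysfs reports partition start in 512-byte sectors

    for start_file in glob.glob("/sys/block/*/*/start"):
        part = os.path.basename(os.path.dirname(start_file))
        with open(start_file) as f:
            start_sector = int(f.read().strip())
        offset_bytes = start_sector * SECTOR
        aligned_4k = offset_bytes % 4_096 == 0
        print(f"{part}: starts at sector {start_sector} "
              f"({offset_bytes} bytes) -> 4KB aligned: {aligned_4k}")

On an older distro with the legacy fdisk defaults you will typically see partitions starting at sector 63 (32,256 bytes), which fails the check; newer installers that start at sector 2,048 pass it.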

Step 2 – Stop Deploying Misaligned VMs with your P2V Process

Unless you are using a physical-to-virtual migration tool that explicitly states it aligns partitions, you are likely creating misaligned VMs. I hate to single out VMware here, but the free VMware Converter creates misaligned VMs.

If your P2V process requires refinement, you have two choices, either…

a) Upgrade your P2V tools to one from the list above

or

b) Continue using the misbehaving tool, but run MBRAlign on the newly migrated VM prior to powering it on.

Frankly option a) seems much more elegant, but that’s just my opinion.

Step 3 – Identify the Misaligned VMs in Production

If you have completed the above actions, you should feel confident that you have started down the path of getting healthy, which is good. However, it only gets more difficult from here, as we need to turn our attention to the VMs that are already running, and this phase is going to require a service disruption for each misaligned VM.

Before we jump to step 4, we need to begin by identifying the running VMs that are misaligned. Again, NetApp customers can use MBRScan or our new tool Balance (formerly Akorri BalancePoint). As in step 2, if you’re not a NetApp customer and/or prefer other tools, you have many to choose from (see the list above).
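For a quick spot check inside a Windows guest, here is a rough Python sketch (again, an illustration rather than a supported tool) that calls the built-in wmic utility to read each partition’s StartingOffset and flags anything not on a 4KB boundary. It assumes wmic is available in the guest and parses its CSV output:

    import subprocess

    output = subprocess.check_output(
        ["wmic", "partition", "get", "Name,StartingOffset", "/format:csv"],
        text=True,
    )

    for line in output.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # CSV rows look like: Node,Name,StartingOffset (the partition Name
        # itself contains a comma, e.g. "Disk #0, Partition #0")
        if len(parts) < 3 or not parts[-1].isdigit():
            continue  # skip blank lines and the header row
        name = ",".join(parts[1:-1])
        offset = int(parts[-1])
        status = "aligned" if offset % 4_096 == 0 else "MISALIGNED"
        print(f"{name}: starting offset {offset} bytes -> {status}")

This only tells you about the guest’s view of its own partitions; tools like MBRScan and Balance look at the problem from outside the guest and scale to hundreds of VMs.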


 

Step 4 – Correct Misaligned VMs

This is the final phase, and as long as you are no longer proliferating misaligned VMs, this process will soon be a distant memory. There are no shortcuts to this last step, at least not today, so prepare yourself to embark on a substantial project that requires each VM to be offline while its misalignment is corrected.

The most difficult part of this process tends to be obtaining permission from application owners to take their systems offline. Frankly, you may find some application owners will be unwilling to do so, while others are more than happy to in hopes of increased performance. If you are replicating these VMs for disaster recovery purposes, you should also be prepared to consider the bandwidth required to re-replicate them; WAN bandwidth can sometimes act as a capacity limiter on alignment projects.
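To put rough numbers on that last point, here is a back-of-the-envelope Python sketch; the 2TB of realigned VMDKs, 100Mbps link, and 70% utilization figures are made-up examples, not guidance:

    def rereplication_hours(total_gb: float, link_mbps: float,
                            utilization: float = 0.7) -> float:
        """Hours to push total_gb across a WAN link at the given utilization."""
        bits_to_send = total_gb * 8 * 1024**3
        usable_bps = link_mbps * 1_000_000 * utilization
        return bits_to_send / usable_bps / 3600

    if __name__ == "__main__":
        # Example: 2TB of realigned VMDKs over a 100Mbps link at 70% utilization
        print(f"{rereplication_hours(2048, 100):.1f} hours")  # roughly 70 hours

That kind of multi-day window is exactly why WAN bandwidth belongs in the project plan.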

MBRAlign and the other tools listed above complete the alignment process by rewriting the virtual disk (the *-flat.vmdk file) with an offset that is friendlier to storage arrays. Some tools send data between the hypervisor and the storage array, while others may require a third host to act as a proxy. Be sure you understand the data flow before embarking on this last step.
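Conceptually, the audit half of what these tools do is simple: read the MBR at the front of the flat VMDK and check where each partition starts. Here is an illustrative Python sketch of that check only (it assumes an MBR-partitioned disk, and the datastore path in the final comment is a placeholder); actually relocating the data is the hard part and is best left to the tools named above:

    import struct

    SECTOR = 512
    PART_TABLE_OFFSET = 446   # first of four 16-byte MBR partition entries
    ENTRY_SIZE = 16

    def report_partition_offsets(flat_vmdk_path: str) -> None:
        with open(flat_vmdk_path, "rb") as f:
            mbr = f.read(512)
        for i in range(4):
            entry = mbr[PART_TABLE_OFFSET + i * ENTRY_SIZE:
                        PART_TABLE_OFFSET + (i + 1) * ENTRY_SIZE]
            if entry[4] == 0:          # partition type 0x00 means an empty slot
                continue
            start_lba = struct.unpack("<I", entry[8:12])[0]  # starting LBA, little-endian
            offset = start_lba * SECTOR
            status = "aligned" if offset % 4_096 == 0 else "MISALIGNED"
            print(f"partition {i + 1}: starts at LBA {start_lba} "
                  f"({offset} bytes) -> {status}")

    # report_partition_offsets("/vmfs/volumes/datastore1/vm1/vm1-flat.vmdk")  # placeholder path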

An alternative to the traditional method of rewriting the file is to migrate the application to a new VM. With the maturation of Windows 2008, I am seeing more customers go down this path. While it is not the norm, it is a viable option that may bring other benefits.

Recent NetApp Enhancements Around Alignment

NetApp engineering is committed to helping customers correct misalignment, and I’d like to share a few recent updates…

  • MBRAlign has been updated and now supports I/O offload (hardware acceleration) for NFS datastores. You can download this update in the latest release of the EHU on NOW. Test results with an 8GB VMDK containing 4.1GB of data show a performance improvement of 66% (time reduced from 5:43 to 1:58)!

    The use of the offload capabilities differs slightly by release: with Data Ontap 7.3.x thin VMDKs are created, while with 8.0.1 thick VMDKs are created. This difference will not impact data deduplication results; in fact, by aligning you should see improved dedupe savings.

    My apologies to VMFS customers. As I often state, NFS is a networked file system, and as such it allows direct access to storage virtualization layers by hypervisors, orchestration tools, etc. As a result, NFS commonly receives points of integration before we can deliver them with VMFS.

  • Data Ontap 8.0.1 includes an update (burt 167599) that allows the array to reduce the performance impact of I/O generated by misaligned VMDKs. While this update does not eliminate the need for alignment, it is a start.

    I love what we can do with WAFL! I guess I should have shared this info in the reasons to upgrade to 8.0.1 post. ☺

  • NetApp Professional Services has launched VMware Alignment Services (VMAS), a turnkey offering for customers that will execute your alignment project following best practices to ensure the smallest possible disruption to your environment. If you need to correct your alignment issue right away, VMAS may be your best bet.

Looking Forward to the Future…

In future releases NetApp will deliver… oh how I wish I could publicly share what we’ve got cooking. Damn those NDAs!

I realize this opening may have been a cruel move on my part, but while I can’t share specifically what we are doing, I want to assure you that the NetApp and VMware engineering teams are stepping up to provide more advanced methods of addressing misalignment, and we are doing so on a number of fronts. As each capability comes to market, I will make sure you can read about it here first.

If you are a NetApp customer or partner with an NDA, you can get the inside scoop by contacting your NetApp representative and asking for a roadmap presentation on this topic.

Wrapping Up This Post

Wow – this post was a bit longer than I had planned; I apologize for that. In review, I believe we’ve covered the following points around misalignment:

  • Misalignment is an industry-wide problem
  • It impacts performance
  • Adding hardware only masks the issue; it does not solve it
  • We discussed the current methods to correct misalignment
  • We shared recent NetApp enhancements that help with the situation
  • We reinforced that we are working on additional technologies to remediate this issue

I hope you find this information helpful and that it aids in your plans around alignment. I look forward to sharing more as we make progress on our roadmap. Cheers!


Vaughn Stewart
http://twitter.com/vStewed
Vaughn is a VP of Systems Engineering at VAST Data. He helps organizations capitalize on what’s possible from VAST’s Universal Storage in a multitude of environments including A.I. & deep learning, data analytics, animation & VFX, media & broadcast, health & life sciences, data protection, etc. He spent 23 years in various leadership roles at Pure Storage and NetApp, and has been awarded a U.S. patent. Vaughn strives to simplify the technically complex and advocates thinking outside the box. You can find his perspective online at vaughnstewart.com and in print; he’s coauthored multiple books including “Virtualization Changes Everything: Storage Strategies for VMware vSphere & Cloud Computing“.


29 Comments

  1. Great post Vaughn! One thing I will note on the Linux side of the house: if you are using LVM (Logical Volume Manager) or Oracle’s ASM, as long as you use the entire device (i.e., /dev/sda) and do not put down a partition table (i.e., /dev/sda1), then you will not be misaligned. It’s only when a partition table is laid down by the OS that you can be misaligned.

  2. Definitely a great “add-on” to your original post. I am wondering what kind of impact the re-alignment of VMDKs could have on a large architecture.
    As you correctly state, “Be sure you understand the data flow before embarking on this last step.”

  3. This is very good information… Can someone please advise whether PlateSpin actually aligns the disks after a P2V? Thank you!

  4. Hey Vaughn – First off, thanks for the link! A quick point of emphasis on your step 4. When you say you can migrate the application to a 2008 instance, you mean a fresh install of 2008 (which is aligned by default). You can’t upgrade from 2003 to 2008 and gain alignment; if you were unaligned in 2003, you will still be unaligned after upgrading the VM to 2008, because the upgrade doesn’t rewrite the partition tables. Makes sense when you think about it, but I have seen it be a point of confusion in the past.
    Thanks!
    -Aaron

  5. Will any of the new tools address thin provisioning after alignment? Correct me if I’m wrong, but if you have a thin-provisioned VM that needs alignment, you not only have to take the VM offline to perform mbralign, but afterwards you have to storage-migrate it to a different datastore and then back? Without the storage migration step to re-thin-provision it, all of my thin VMDKs will become “fat”. Frankly, that’s a PITA and is what has held us back from correcting all of ours. 🙂
    I’m liking your idea of just migrating them to new 2008 servers. Lots of other benefits go with it, like being able to resize the system partition with diskpart on the fly.

  6. I’m pretty sure you can actually get the most recent version of Converter from VMware to produce aligned VMs — you need to precreate the vmdk with an aligned partition and then point VMware Converter at an existing vmdk rather than creating one from scratch. A bit tedious but feasible (and not too bad if you create the vmdk thin and copy it as needed).

  7. NetApp Syncsort Integrated Backup or NSB — the joint data protection solution from Syncsort and NetApp — also provides built-in P2V migration capabilities. Couple of cool things about it.
    1. It starts with backups of your physical servers, stored as NetApp snapshots.
    2. When the P2V migration takes place, it first uses the snapshot to boot the VM (creates a FlexClone). This means that your “conversion” time is about 5-10 minutes. That is, the new VM is up and running in that time.
    3. The migration of the data to the VMDK takes place behind the scenes after the VM is already running off the FlexClone.
    4. When the data is all moved to the VMDK, it invokes Storage vMotion to switch from the FlexClone to the VMDK. Zero additional downtime.
    5. As part of the data migration from P2V, storage alignment takes place. So your new VMDK is correctly aligned.
    6. When the server is migrated, data protection is already in place and backups just continue to run.
    7. If needed, NSB can also migrate systems back to physical servers. V2V is also available.
    All this is included as part of the licensing cost. There are no additional fees for the migration tools. And it’s capacity-based, so you can migrate all the systems you want.
    Note: primary storage does NOT need to be NetApp. Works with any DAS or SAN storage. Supports Windows and Linux systems.
    It’s pretty neat stuff!
    Peter Eicher
    Syncsort

  8. ZFS misalignment due to variable block size is the pain point we’re seeing now on our Netapp storage. The VMware side of things has been sorted for some time.

  9. @All – thanks for the great dialog & feedback
    @David – thanks for the additional info
    @Alfwebcom – I’ll see if I can get to this request, but it may be beyond my scope.
    @Parikshith – yes platespin aligns vmdks
    @Nick – great point on gparted
    @Aaron – right!
    @Brian – MBRAlign will preserve the thin attribute with the exception of using I/O offload with Data Ontap 8.0.1. Now, I’d suggest thin or thick provisioning is a non-issue if you are using data deduplication… you are using dedupe aren’t you? 🙂
    @Andrew – good points, but it may be a bit difficult for mass adoption. I have found that if it isn’t easy, most won’t adopt.
    @Peter – thanks for the info on Syncsort
    @Tm – NetApp arrays don’t use ZFS, so may I ask you to clarify your statement?

  10. @David: While not using a partition table works for ASM and maybe LVM, this is not supported by SnapManager for Oracle on RedHat.
    And, how does LVM make sure that its logical partitions (or filesystems) are aligned?

  11. If you use Hyper-V I’d add one additional step: Use fixed VHDs. Dynamic VHDs insert container metadata inline with the filesystem data (as the VHD grows) resulting in misalignment regardless of your partition layout. So if you’re fixing partition misalignment on a VM with dynamic VHDs make sure you also migrate (not convert) to fixed VHDs as well.

  12. I noticed there is not much conversation around aligning VMs that exist in ESXi, just using the host utilities for ESX or third party tools that barely work on ESXi. I take it this is some of the NDA stuff? 🙁

  13. @Malhoit: I’m afraid that’s incorrect. It would be great if ESX had insight into how the guest OS writes data to disk and how the underlying storage handles disk blocks, but it doesn’t, and that’s why aligning is such a hassle. The only hope is that newer OS releases are aware of virtualization unlike the previous generation, so things are bound to get better once we start upgrading our old Windows 2003 and older installations.

  14. As an update, a co-worker pointed out that ASM does require a partition table, but with LVM you can use a raw device. Thanks @wayne and @allen for pointing out my error.

  15. @Fletcher – I don’t have intimate knowledge of the data you are seeing with nfsstat, but I’d suggest you have an application in your VM that is creating a number of writes that are smaller than 4KB.
    If what I suggest is accurate there’s no need to be concerned, as it is normal behavior for the application.
    I know you’re well aware of misalignment, but please allow me to restate for those who may not be… What we want to avoid is having misaligned I/O for hundreds or thousands of VMs on an array. The inefficiency in I/O transfers due to a large mass of misaligned VMs will stress the array and lead to the eventual need to upgrade hardware.
    Let me know whether small writes are or are not the case. I’d be happy to engage others to continue the conversation if needed.
    Vaughn

  16. Hi Vaughn,
    Any news on the ESXi side of things? I did an evaluation of NetApp earlier this year and remember the lack of alignment tools (or host utilities) seemed slightly limiting. Also any news on EHU and SnapDrive compatibility with vSphere 5?
    Cheers,
    Tom

    • Tom,

      NetApp engineering has some updates which are drawing close to completion. As soon as I can share publicly I will.

      I would suggest that you contact your NetApp partner or sales team and request an NDA presentation on our VMware roadmap, including alignment. I think you’ll be pleased.

      Cheers,
      Vaughn

  17. Just to be sure: By “offset”, you mean the “start offset of a partition”, am I right? In other words, all partitions should begin at multiples of 1 MB?
