Recently the news wire has been abuzz with the revelation that upgrading EMC XtremIO system software to the coming XIOS version 3.0 will be disruptive. To be clear, this isn’t the typical run-of-the-mill, inconvenient, disruptive upgrade you’ve come to loathe. It’s destructive to your data. An upgrade to XIOS 3.0 requires all data on the XtremIO to be evacuated prior to the start of the upgrade and restored upon completion. Such an upgrade may leave customers without access to data sets for hours or days. And who’s responsible for the storage capacity required to house the data while the XtremIO is wiped and reconfigured?
Luckily, most XtremIO customers have become accustomed to the XtremIO hardware and software ‘upgrade’ procedures…
Want to add an Xbrick to the ‘scale-out’ cluster?
Wipe the XtremIO.
Want to increase flash capacity from a 10TB to a 20TB Xbrick?
Wipe the XtremIO.
Want to add encryption with XIOS 2.4?
Wipe the XtremIO.
Want to add compression with XIOS 3.0?
Wipe the XtremIO.
Etc…
I realize I may sound like a vendor taking advantage of the news cycle to bash my competition. That’s a fair charge; however, this ‘news’ isn’t actually news. The disruptive, data-destructive upgrades required for XtremIO have already been reported by a few bloggers. I tip my hat to Chris M. Evans, Martin Glassborow and Justin Warren, the only storage experts I am aware of who have been on top of this issue since the product was released late last year.
Downtime windows have gone the way of the Dodo
Today’s IT departments support geographically distributed, mobile workforces who expect business systems and applications to be available and operating at full capacity 24 hours a day, 365 days a year. There is no window for downtime, let alone for a dip in performance.
Data is persistent; it will outlive the software and hardware it resides on several times over. This fact should factor heavily in your storage selection process. I know it drives the engineers at Pure Storage to deliver storage systems that are always on. The formula is simple: combine stateless controllers and modular flash expansion shelves with the Purity Operating Environment, and you can add, remove, upgrade, replace and refresh every element in a FlashArray without ever losing performance or reconfiguring the surrounding environment. We even have a business model, Forever Flash, that marries that flexibility in technology with a continuous acquisition model to match.
Architecture matters, so let’s dig in a bit deeper
In a recent post, my good friend and EMC SVP Chad Sakac asked the public to consider disruptive outages a norm within the storage industry. While downtime is common in storage architectures originating in the 1990s, it is simply a non-starter for next-generation storage like all-flash arrays. Data center technologies either advance or they disappear.
The only disruptive upgrades required on a FlashArray have been when we have moved customers from beta to GA code. This occurred in 2012, when we GA’d Purity for the first time and again in 2014 with our replication beta program. We have never asked customers to take an outage in order to upgrade a GA release of Purity.
We have even enabled customers to non-disruptively upgrade between hardware platforms, including from previous to future generations (from FA-300 to FA-400) without downtime or performance loss, and guess what… we’re committed to doing it again in 2015, and the year after, and the year after…
Architecture matters. At Pure Storage we view hardware and software as mutually reliant and supportive elements in a platform that delivers always-on availability. Four key components make Always-On possible:
1. Stateless Controllers: Each FlashArray controller provides the CPU, memory and IO ports used to access and store data. Both the transient and persistent data storage layers (NVRAM and SSD, respectively) reside in the flash storage enclosures (or shelves/modules, depending on your terminology of choice). This allows different generations of controller hardware to be mixed, which in turn enables on-demand performance upgrades (a toy model of this separation follows this list). By contrast, a performance upgrade on XtremIO requires the addition of a new Xbrick and the wiping of data on the existing array.
2. Modular Flash Enclosures: There’s nothing simpler than increasing storage capacity by adding an additional shelf of flash to a running FlashArray. Cable up the new shelf and… well, that’s it; the storage is ready to be provisioned. There’s nothing else to do. Scale capacity on demand. Like the controller upgrade, a capacity expansion on XtremIO requires an additional Xbrick and a subsequent wipe of the existing data.
3. Adaptive Metadata Fabric: Our adaptive metadata structure is scalable, versioned, and hierarchical, which gives Pure a flexible foundation for future enhancements to the platform. When we need to make a major update to our metadata, old metadata structures can be left intact and referenced through the new metadata, and the array’s natural background optimization processes migrate metadata to the new format over time (see the versioned-metadata sketch after this list). As an example, we added completely new metadata structures for snapshots as part of the Purity 2.5 release, all without downtime or migration. By contrast, XtremIO has a fixed metadata structure with DRAM-bound metadata constraints. The metadata structure must be recreated when a new feature like encryption or compression is added to the system, hence the requirement to wipe and restore existing data.
4. Variable-Sized Data Segments and Adaptive RAID: Purity stores data in adaptive, variable-sized segments. This flexibility is fundamental to the system in terms of performance optimization, data reduction and data protection. There’s no fixed segment size, no fixed alignment to drives or drive sizes, and no constraint on RAID geometry. All segments are written with at least dual parity, and highly reduced segments are written with triple or even greater parity (see the parity sketch after this list). So, if we need to update the structure of a segment in a new release to add richness, no problem. This is unlike the fixed-size, content-addressing scheme found in XtremIO. For XIOS to add compression, the backend block size will increase from 4KB to 8KB, and as you should know by now… that requires you to wipe the system of data.
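To make item 1 concrete, here is a toy Python model of the separation between compute and state. The class names and fields are my own assumptions for illustration, not Pure’s implementation; the point is simply that when all persistent state lives in the shelves, swapping controllers never touches the data.

```python
# Toy model: compute (controllers) separated from state (shelves).
# All names here are hypothetical; this is not Purity code.

class FlashShelf:
    """Holds all persistent state: an NVRAM staging area and the SSD store."""
    def __init__(self):
        self.nvram = {}  # transient write staging
        self.ssd = {}    # persistent data

class Controller:
    """Stateless compute: CPU, memory and IO ports only."""
    def __init__(self, model):
        self.model = model
        self.shelves = []

    def attach(self, shelf):
        self.shelves.append(shelf)

# Write data through an older-generation controller...
shelf = FlashShelf()
old_ctrl = Controller("FA-300")
old_ctrl.attach(shelf)
shelf.ssd["vol1"] = b"customer data"

# ...then 'upgrade' by attaching a newer controller to the same shelf.
# No data is copied, migrated or wiped in the process.
new_ctrl = Controller("FA-400")
new_ctrl.attach(shelf)
assert shelf.ssd["vol1"] == b"customer data"
```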
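The versioned-metadata approach in item 3 can be sketched in a few lines as well. This is a minimal illustration under my own assumptions (the record layout, version numbers and upgrade steps are all hypothetical): old-format records stay valid in place and are migrated to the newest format lazily on read, or by a background sweep, so no offline conversion is ever required.

```python
# Minimal sketch of versioned, lazily migrated metadata.
# Field names and upgrade steps are hypothetical, not Purity internals.

CURRENT_VERSION = 3

def upgrade_v1_to_v2(record):
    record["checksum"] = 0        # pretend v2 added a checksum field
    record["version"] = 2
    return record

def upgrade_v2_to_v3(record):
    record["snap_parent"] = None  # pretend v3 added snapshot lineage
    record["version"] = 3
    return record

UPGRADES = {1: upgrade_v1_to_v2, 2: upgrade_v2_to_v3}

def read_record(store, key):
    """Return a record, upgrading older formats in place as they are touched."""
    record = store[key]
    while record["version"] < CURRENT_VERSION:
        record = UPGRADES[record["version"]](record)
        store[key] = record       # persist the upgraded form
    return record

def background_sweep(store):
    """Housekeeping pass: migrate any remaining old-format records."""
    for key in list(store):
        read_record(store, key)

# Old-format records remain readable; the system stays online throughout.
store = {"vol0": {"version": 1, "extent_map": []}}
assert read_record(store, "vol0")["version"] == CURRENT_VERSION
```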
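And for item 4, a hedged sketch of variable-sized segments with per-segment parity. The 4:1 reduction threshold and the exact parity counts are assumptions for illustration; the only facts carried over from above are that segments vary in size, all get at least dual parity, and highly reduced segments get triple or greater.

```python
# Sketch: variable-sized segments with parity chosen per segment.
# The 4:1 threshold and share counts are illustrative assumptions.

import os
import zlib

def build_segment(payload: bytes) -> dict:
    compressed = zlib.compress(payload)
    reduction = len(payload) / max(len(compressed), 1)

    # Every segment gets at least dual parity; segments that reduce well
    # are cheap to protect more heavily, so they get an extra parity share.
    parity_shares = 3 if reduction >= 4.0 else 2

    return {
        "size": len(compressed),         # no fixed segment size or alignment
        "parity_shares": parity_shares,  # RAID geometry varies per segment
        "data": compressed,
    }

highly_reducible = b"A" * 65536    # compresses extremely well
incompressible = os.urandom(4096)  # barely compresses
print(build_segment(highly_reducible)["parity_shares"])  # 3
print(build_segment(incompressible)["parity_shares"])    # 2
```

Because nothing in this layout is fixed, a release that changes the segment format can simply start writing the new form alongside the old, which is the flexibility the next paragraph describes.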
Flexibility at these two layers wasn’t just designed in; it is actively used in the FlashArray. We’ve shipped modifications to our metadata structures and data layout segments in EVERY major release since inception, all without downtime. Such flexibility is key to rapidly advancing a modern storage array.
“In 20 years at Veritas, we never required disruptive upgrades for the file system or volume manager products – and we made major changes to the metadata structure of each product somewhere on the order of four or five times during that phase. We implemented versioning and backward compatible formats to ensure customers were always online. At Pure we’re taking this one step further and applying this design to the FlashArray architecture; this includes the hardware and the software.”
— John ‘Coz’ Colgrove, Founder and CTO, Pure Storage
I’m confident some of you are shouting at this post, “What about technologies like VMware Storage vMotion, SQL Server Availability Groups, and Oracle on ASM? They all provide means to non-disruptively migrate data sets!” You are correct; the infrastructure can often be called upon to solve a storage challenge. Note, however, that every one of these solutions requires 100% spare or unused storage capacity to exist and be available before the migration process can begin. Who has this kind of extra storage sitting around, let alone the staffing resources to manage the data shuffle? And what would the performance impact be when you move off of flash?
Wiping the data from an array also results in the loss of all snapshot backups and data set clones. What are you to do if your AFA supports virtual desktops (VDI) or database application developers – just turn off those users?
In Closing…
The decision to change the underlying architecture of the XtremIO wasn’t a trivial one. It’s a massive undertaking, and one that was made some time ago. Such a change within the first 10 months of a product’s GA release strongly suggests it was rushed to market.
“When you have to migrate your data every time your vendor adds a feature, there’s a term for that… beta program.”
– John Hayes, Founder and Chief Architect, Pure Storage
I give Chad a lot of credit. He’s trying to be as transparent as possible while in the unenviable position of defending a data-destructive upgrade process. But a blog post only goes so far: why hasn’t EMC struck non-disruptive upgrades from its sales and marketing slides?
Customers expect more from a next-generation storage platform, and ‘Always-On’ is a critical requirement of modern data centers. Maybe XtremIO will get there in time, but we at Pure Storage are delivering on a vision of next-generation storage today. If you haven’t already, you should check out the Gartner Solid State Array Magic Quadrant.
Firm but fair; everyone has warts. It depends on how you treat them, and whether you let the customer know at sale time. If the customer finds out after the fact, you will never sell to them again. There is also a certain amount of due diligence that customers need to do when deciding on a solution.
I agree with you. Based on the EMC slide referenced in this post and the content on the XtremIO website, I’m not sure this information is coming through in sales cycles or marketing collateral.