Flash is a revolutionary force in the storage market, giving next-generation storage platforms a level of performance that was previously unobtainable and that is far better suited to the data access requirements of today and the foreseeable future. Flash is broadly offered in one of two formats. There are all-flash arrays that tend to focus solely on high performance for the high end of the market. There’s also a vast number of hybrid flash-and-disk architectures, where the consensus view treats flash as too expensive for use as a persistent storage tier and instead implements it as a cache tier in one form or another (array, host-based cache or converged platforms).
Since our inception, Pure Storage has been committed to making flash affordable for broad adoption across a wide range of customers, applications and markets. To deliver on this vision we had to develop 5 forms of data reduction technology that we collectively call FlashReduce. These technologies work in unison, autonomously, and without encumbering performance. This is a fundamentally unique approach within the storage industry and one worthy of deeper discussion.
Traditional Data Reduction in Storage Arrays
Data reduction technologies have been available in various forms on storage platforms for a number of years. The bulk of array-based implementations have been for backup operations as the associated overhead is unacceptable for the vast majority of production applications.
In arrays that do provide data reduction for production workloads one will often find a single technology, either compression or deduplication, which tends to reduce only a subset of the data and typically yields modest capacity savings of 2:1 or less. These technologies are also burdensome to implement: they usually run as a scheduled process that is difficult to manage without negatively impacting other data management processes, including snapshots, backup and replication.
Reinventing Data Reduction for Broad Adoption
Pure Storage designed FlashReduce to be broadly applicable, able to reduce capacity across a large number of data types, and to operate automatically without impacting performance or datacenter operations. To truly appreciate the elegance of the data reduction engine one should understand the technologies implemented, how block size impacts reduction, and the broad applicability that results from the combined use of these technologies.
An Adaptive Data Reduction Engine
The Purity Operating Environment of the FlashArray implements 5 forms of data reduction technology. All are autonomous and work in unison to provide data reduction across a broad set of applications and use cases without scheduling or administrative intervention. As data is written to the FlashArray it is checksummed for integrity and acknowledged (ACK’d). This is where the Purity O.E. FlashReduce process begins…
1. Pattern Removal with 8-bit Granularity Pattern removal identifies repetitive binary patterns, including zeros, as data enters NVRAM. This process optimizes NVRAM capacity and reduces the volume of data to be processed by the dedupe scanner and compression engine.
2. Adaptive Data Deduplication with 512-byte Granularity Dedupe ensures only unique blocks of data are stored on flash. Data in NVRAM is scanned via a lightweight hash-based process that ensures data integrity and performance. Hash results from new data are compared to the results of existing data stored in a hash table. Potential matches are validated via a binary comparison of the NVRAM and SSD data prior to being confirmed and released (i.e. hashes are not trusted; dedupes are only declared after verifying a data match 100%). The adaptability of the dedupe scanner ensures peak performance while maximizing data reduction. With 512-byte granularity, Purity O.E. can produce data deduplication results from datasets that cannot be reduced by traditional fixed-block dedupe implementations. Supporting variable dedupe chunks ranging in size from 4KB to 32KB reduces metadata, which in turn aids system scalability. The scanner even adapts to changes in I/O load and will reduce the time allocated to the dedupe process in order to ensure system performance under extreme write loads.
3. Adaptive Data Compression Compression encodes data in a format that requires less capacity than the original. The compression engine scans data in NVRAM to determine the level of reduction that can be obtained from a lightweight Lempel–Ziv–Oberhumer (LZO) lossless algorithm. The engine compresses data identified as producing moderate to high capacity savings and skips data identified as producing little savings or deemed uncompressible. This adaptive nature makes the most of CPU cycles by ensuring they are spent on data sets that produce the greatest returns. (A simplified sketch of this inline flow, covering pattern removal, verified dedupe and adaptive compression, follows this list.)
4. Instant, Zero-Cost Storage Snapshots, Clones, and XCopy Purity provides instant references to data for use as storage snapshots and cloned data sets. Each of these use cases leverages the data reduction engine and only consumes capacity as globally unique data is stored. Many of these abilities are available within infrastructure, operating system and application stacks via API integration with our technology partners.
- Storage-based snapshots provide the initial means of data protection for many enterprises. They enable Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) that aren’t possible with tape.
- Storage-based clones allow many business applications, like SAP or Oracle Database, to scale on demand. They provide automation to the many teams that support these architectures, from development to QA.
- Infrastructure partners like VMware, with their vStorage APIs, can clone and migrate VMs almost instantly via VAAI XCopy, without moving data or adding wear to the flash.
5. Deep Reduction All SSDs relocate data as part of the Garbage Collection (GC) process. In Purity, GC is a function of FlashCare, which optimizes flash and, because it operates at the system level, can provide benefits beyond what is possible with firmware-level GC. When FlashCare executes GC, Purity validates RAID parity and checksum integrity while also further reducing data capacity. This ‘deeper reduction’ includes a dedupe sweeper that ensures only unique data exists on SSD and applies a second, more aggressive compression algorithm to further reduce stored capacity. The compression implemented is a patent-pending form of Huffman encoding that is too taxing to be applied inline and that can extend the savings provided by the inline process. This allows us to provide savings for data that was originally stored uncompressed because it was expected to yield low returns with LZO or was identified as uncompressible.
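Purity’s actual implementation is of course internal to the product, but a minimal Python sketch can illustrate the shape of the inline flow described in items 1 through 3: check for a repeating pattern, look up a fingerprint and verify the match byte-for-byte before deduplicating, and compress only when a quick trial shows worthwhile savings. The data structures, thresholds, and the use of SHA-256 and zlib below are illustrative assumptions, not Purity internals.

```python
import hashlib
import zlib

# Illustrative in-memory structures; the real metadata (variable chunk sizes,
# 512-byte alignment, persistent tables) is far more sophisticated.
fingerprint_table = {}   # fingerprint -> previously stored block contents
stored_segments = []     # stand-in for data that has reached flash

def is_repeating_pattern(block: bytes) -> bool:
    """Pattern removal: detect blocks made of one repeating byte (e.g. zeros)."""
    return len(block) > 0 and block == block[:1] * len(block)

def reduce_inline(block: bytes) -> dict:
    """Sketch of the inline reduction decision for one incoming block."""
    # 1. Pattern removal: store only the pattern byte and a length.
    if is_repeating_pattern(block):
        return {"type": "pattern", "byte": block[0], "length": len(block)}

    # 2. Deduplication: hash lookup, then a byte-for-byte comparison so the
    #    hash alone is never trusted.
    fp = hashlib.sha256(block).digest()
    existing = fingerprint_table.get(fp)
    if existing is not None and existing == block:
        return {"type": "dedupe", "fingerprint": fp}

    # 3. Adaptive compression: keep the compressed form only if it saves a
    #    meaningful amount of space (threshold chosen arbitrarily here).
    compressed = zlib.compress(block, 1)   # zlib stands in for LZO
    if len(compressed) < 0.9 * len(block):
        payload = {"type": "compressed", "data": compressed}
    else:
        payload = {"type": "raw", "data": block}

    fingerprint_table[fp] = block
    stored_segments.append(payload)
    return payload

# The second identical 4 KB block dedupes against the first.
blk = bytes(range(256)) * 16
print(reduce_inline(blk)["type"])   # 'compressed'
print(reduce_inline(blk)["type"])   # 'dedupe'
```

A post-process pass like deep reduction would later sweep the stored segments with a heavier algorithm; that step is omitted here for brevity.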
You may be wondering, “What about Thin Provisioning?” Thin Provisioning is a dynamic form of storage provisioning and not a data reduction technology. It allows capacity to be allocated on demand and thus avoids storing zeros. All LUNs on a FlashArray are Thin Provisioned, and to the best of my knowledge this is true of all flash-optimized AFAs (as opposed to traditional disk arrays fitted with SSDs).
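Thin provisioning is easy to picture as a sparse mapping: a LUN advertises its full logical size, but physical space is consumed only when blocks are actually written. The sketch below is purely illustrative and assumes nothing about Purity’s metadata layout.

```python
class ThinLUN:
    """Minimal illustration of a thin-provisioned volume: logical capacity is
    advertised up front, physical blocks are allocated only on first write."""

    def __init__(self, logical_blocks: int, block_size: int = 512):
        self.logical_blocks = logical_blocks
        self.block_size = block_size
        self.allocated = {}               # logical block address -> data

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("write beyond advertised capacity")
        self.allocated[lba] = data        # physical space is consumed only here

    def read(self, lba: int) -> bytes:
        # Unwritten regions read back as zeros without consuming capacity.
        return self.allocated.get(lba, b"\x00" * self.block_size)

    def physical_utilization(self) -> float:
        return len(self.allocated) / self.logical_blocks

lun = ThinLUN(logical_blocks=1_000_000)   # ~512 MB advertised at 512 B blocks
lun.write(42, b"hello".ljust(512, b"\x00"))
print(lun.physical_utilization())         # 1e-06: only one block allocated
```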
Data Reduction Expands Total Addressable Market
There is no single form of data reduction that can address every workload. Each of the 5 forms of data reduction provides a unique benefit to an application or dataset, and sometimes multiple technologies act as a multiplier on data reduction capabilities. For example, deduplicated data can often benefit from compression, and data that has been compressed at the application layer can often be compressed further at the storage layer. The latter is possible because compression algorithms at the application layer tend to prioritize performance over storage savings.
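As a purely illustrative example of that multiplier effect (the numbers are hypothetical): if a 100 GB data set dedupes 2:1 down to 50 GB, and the remaining unique data then compresses 3:1 down to roughly 17 GB, the combined reduction is about 6:1, more than either technology achieves on its own. With this in mind, consider how three use cases leverage different forms of data reduction.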
Databases and many applications perform single instancing within the application. As such compression tends to provide the bulk of the storage savings. Customers running Microsoft SQL Server and Oracle Database often receive data reduction ratios ranging between 2:1 and 4:1. With that said, these numbers are conservative and many receive higher returns. For example we have seen some MongoDB deployments with greater than 6:1 savings.
Virtual Desktops and SAP landscapes are two examples of applications that are designed around the use of identical cloned data sets. In deployments of this type, deduplication eliminates the redundancy inherent in the architecture and compression tends to double these savings. Storage savings correlates to the number of clones; as such customer deployments tend to exceed ratios of 10:1.
Virtual infrastructures, like VMware vSphere, Microsoft Hyper-V, OpenStack and KVM, are the sweet spot for the combination of data reduction technologies. They combine redundant data (like guest operating systems and application binaries) with application data (installed in the VM). In these use cases dedupe and compression contribute equally, and customers often see data reduction ratios between 5:1 and 9:1.
Reducing the data capacity stored on SSD, both as data enters the system and as it is relocated over the lifetime of the medium, significantly reduces Write Amplification (WA) and results in greater flash life and reliability.
Is Data Deduplication always Inline and Does it Turn Off?
Implementing data reduction technologies without negatively impacting performance is a non-trivial endeavor and may lead one to ask, “Is Pure’s data deduplication truly inline or does it turn off?” This is a fair question.
To start, I think many are unintentionally misusing the term dedupe in place of data reduction. Dedupe is one form of data reduction technology along with compression, pattern removal, etc. When asking if dedupe turns off or is inline, I think many are actually asking about our data reduction process and whether it is inline or turns off. In fairness, Pure Storage is guilty of this oversimplification. Have you seen our ‘dedupe ticker’?
FlashReduce never turns off and is more than inline deduplication. FlashReduce combines 5 data reduction processes: 3 inline (pattern removal, dedupe and compression), 1 provisioning (clones/XCopy), and 1 post-process (deep reduction). It is always on and cannot be scheduled or disabled.
For the technically curious, let’s take this discussion a bit deeper…
Ensuring performance is a priority in the architectural design of every application, and the same assurance is possible from a flash storage architecture that adapts resource assignment based on workload. This is truly unique: historically, and even today, many have been conditioned to purchase additional hardware explicitly to address the unexpected.
Pure Storage prioritizes the delivery of 100% of the system performance under all conditions, including planned and unplanned outages. We’ve even designed service-level assurance into the data reduction engine. Should the FlashArray become CPU constrained, Purity will reduce the amount of time allocated for data in NVRAM to be deduplicated and will reassign the freed CPU resources to prioritize the delivery of host I/O. The dedupe process is never disabled, but in these conditions the thoroughness of a hash lookup is reduced, which may allow redundant blocks to hit flash. After the high write load subsides, Purity will respond and reassign resources back to the data reduction engine. FlashCare will identify and dedupe any redundant data that may have made it to SSD.
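To make the idea of deprioritizing, rather than disabling, dedupe concrete, here is a minimal sketch. It assumes a simple CPU-load signal and a time budget per lookup; none of the names, thresholds or structures are Purity internals.

```python
import hashlib
import time

FULL_BUDGET_MS = 2.0    # illustrative time allowed per lookup when the array is idle
MIN_BUDGET_MS = 0.1     # the lookup is shortened, never skipped, under heavy load

def lookup_budget_ms(cpu_utilization: float) -> float:
    """Shrink the time allowed for a dedupe lookup as CPU pressure rises."""
    headroom = max(0.0, 1.0 - cpu_utilization)      # 0.0 when the CPU is saturated
    return max(MIN_BUDGET_MS, FULL_BUDGET_MS * headroom)

def try_dedupe(block: bytes, candidate_tables, cpu_utilization: float):
    """Search as many fingerprint tables as the current budget allows.

    If the budget runs out, the block is written as-is and the post-process
    sweep (deep reduction) reclaims the duplicate later.
    """
    deadline = time.monotonic() + lookup_budget_ms(cpu_utilization) / 1000.0
    fp = hashlib.sha256(block).digest()
    for table in candidate_tables:          # e.g. hot, warm and on-flash tables
        if time.monotonic() > deadline:
            return None                     # budget exhausted: defer to deep reduction
        existing = table.get(fp)
        if existing is not None and existing == block:
            return fp                       # verified duplicate found inline
    return None
```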
So you may ask, “How often does this occur?” According to CloudAssist, only a very small fraction of the data stored across our entire install base was written while the dedupe engine was deprioritized. The number isn’t zero, but it’s likely too small to measure.
Unlike data compression, where all data is compressed, data deduplication tends to operate in bursts that are easy to identify. Two common examples of bursts of redundant data entering the FlashArray are the import or migration of multiple VMs onto the FlashArray and the mass deployment of a patch to many VMs. In both of these cases the dedupe engine still eliminates the redundancy, as what you could consider the ‘most recently deduped’ hash table and lookup is never disabled.
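One way to picture that ‘most recently deduped’ lookup is as a small, bounded cache of the hottest fingerprints that is cheap enough to consult on every write, even while the full lookup is deprioritized. The LRU structure below is purely illustrative, not Purity’s actual design.

```python
from collections import OrderedDict
import hashlib

class RecentDedupeCache:
    """Bounded cache of recently seen fingerprints. Because it is consulted on
    every write, bursts of identical blocks (VM imports, mass patching) still
    dedupe even when the full table lookup is being deprioritized."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self.entries = OrderedDict()           # fingerprint -> block contents

    def check_and_record(self, block: bytes) -> bool:
        fp = hashlib.sha256(block).digest()
        existing = self.entries.get(fp)
        if existing is not None and existing == block:
            self.entries.move_to_end(fp)       # keep hot entries resident
            return True                        # verified duplicate
        self.entries[fp] = block
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the coldest entry
        return False

cache = RecentDedupeCache()
patch_block = b"\x42" * 4096                   # the same patch data, many VMs
print(cache.check_and_record(patch_block))     # False: first copy is stored
print(cache.check_and_record(patch_block))     # True: subsequent copies dedupe
```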
Prioritizing application performance over secondary benefits can be a foreign concept for those who only think in terms of hardware. I assure you the adaptive capability within the FlashArray provides a level of assurance that application admins have never experienced from a storage platform.
Let’s see the forest for the trees. If another vendor is challenging the completeness of Pure’s adaptive approach to data reduction, ask yourself realistically which is the more capable architecture: 1) the array that misses 50% or more of the data reduction potential by lacking compression or by implementing a fixed and/or large-block deduplication engine, or 2) the array that offers more forms of data reduction, in a more granular fashion, and is designed to ensure 100% performance to the applications it hosts? Obviously this is a rhetorical question, one that has competitors struggling to counter with any substance beyond the claim that, “Pure’s dedupe turns off!”
I’ll provide more detail on some of the technical constructs in this section in the upcoming posts “The Benefits of NVRAM” and “Ensuring 100% Performance.”
Enabling the Flash-Powered Datacenter
Pure Storage has changed the economics of storage. When you can have all-flash at the price of disk, why would you consider any disk-based storage platform? Our customers are averaging 6:1 data reduction, and these savings are measured without counting any over-provisioning enabled by thin provisioning. I hope I have helped explain how Pure Storage can provide at least 2X the storage savings, and in turn a 2X reduction in cost, over alternative all-flash arrays.