Introducing Data Compression in ONTAP 8.0.1


It’s no secret: NetApp wants to sell you less storage than any other storage vendor, and if forecasts from industry experts like IDC are any indication, it is time we all began enabling storage savings technologies throughout our datacenters. When NetApp was founded we unknowingly set out on this mission with the release of our first feature, the snapshot backup. It gave customers a way to take full backup copies of their data, yet each backup consumed only the data that was unique to the point in time at which the backup was created. The storage savings came from sharing the blocks that were common between the production and backup data sets. Over the years we have continued to release new and enhanced storage savings capabilities for use with production, backup, DR, archival, and test & development data sets.

Suffice it to say, storage efficiency is a driving theme in many of our technologies. With the release of Data ONTAP 8.0.1 we are extending our capabilities to include inline data compression, and I’d like to spend a few minutes introducing you to our new offering. While it would be fair to question the term “industry leading,” as NetApp is a latecomer in adding support for compression, it would be rather myopic to discount our history of leading the industry with the vast number of storage savings technologies engineered for use with production workloads.

Once you consider the contents of this list, it’s easy to understand why we lead the industry in reducing storage footprints in data centers throughout the world:

Snapshot backups – logical full backups that consume only the 4KB blocks that are unique to each backup

SnapMirror – Dedupe-aware data replication that can be sent in compressed form synchronously, asynchronously, or semi-synchronously over Fibre Channel or IP.

RAID-DP – Provides data protection greater than RAID-10 with overhead equal to or less than RAID-5.

Thin provisioning – provisioning a logical unit of storage (FlexVol, LUN, or file) without pre-allocating the required storage.

Space reclamation – The ability to return to the storage array blocks that are marked as available for overwrite in the NTFS file systems inside LUNs and VMDKs but still consume space on the array. BTW – thin provisioning isn’t very compelling without space reclamation; it’s what keeps thin, thin.

FlexClone – zero-cost provisioning of FlexVols, files, and LUNs (including sub-LUN clones)

Data Deduplication – sharing physical 4KB blocks between dissimilar storage objects

Transparent Storage Cache Sharing – Dedupe-aware caching that allows a single cached 4KB block to be accessed by multiple external references (such as multiple VMs, DBs, etc.)

Supported Platforms & Details of Data Compression

Let’s start with the basics; data compression is available in 8.0.1 with support for the following arrays:

  • FAS & vSeries 2040*
  • FAS & vSeries 3070* (and the FAS3050 via PVR)
  • FAS & vSeries 3100 platforms
  • FAS & vSeries 3200 platforms
  • FAS & vSeries 6000 platforms
  • FAS & vSeries 6100 platforms
  • FAS & vSeries 6200 platforms
  • IBM N5000 platforms
  • IBM N7000 platforms

* Compression is not supported on other models in the 2000 & 3000 series platforms as they do not meet the memory requirements for Data ONTAP 8.0.1.

Historically, flexible volumes (FlexVols) with deduplication enabled have had restrictions on their maximum size, which correlated with the amount of cache in the array. With Data ONTAP 8.0.1 this restriction has been replaced with a universal 16TB limit for FlexVols with either, or both, dedupe and compression enabled.

In 8.0.1 data compression is an inline, or real-time, process, whereas dedupe runs as a post-process. Enabling compression on a FlexVol will begin compressing new data as it is written while not affecting the existing data. There is an option to instruct the compression scanner to compress the existing data, including the data stored in snapshots. The process of compressing existing data where snapshots exist will likely require one to temporarily increase the size of the FlexVol in order to hold the newly re-written data. This means the FlexVol will hold compressed data alongside a snapshot history in which the data is uncompressed. Once the snapshots that existed before compression was enabled have cycled off, the FlexVol can be shrunk.

The process of compressing the existing data is a background process and as such is not a barn burner. My recommendation is to schedule it to run during periods of low I/O, say over a weekend. The compression scanner includes checkpoints and thus supports being stopped and restarted.

Data compression requires 64-bit aggregates, which were introduced in Data ONTAP 8.0. As all aggregates created on Data ONTAP 7.x are 32-bit, this requirement may limit the use of compression to new data sets or existing data sets that have been migrated to a 64-bit aggregate. Trust me, we are working to eliminate the need for the latter. I can’t say any more at this time, but stay tuned for updates on this subject.

Industry Leading Implementation

NetApp’s implementation of data compression is truly unique in the storage industry for a number of reasons, including support for:

  • Both SAN and NAS
  • Production, backup, DR, archive, and test & development data sets
  • Use in conjunction with data deduplication
  • Data sets provisioned via FlexClone (be it FlexVols, files, & LUNs including sub-LUN clones)
  • Thin provisioned FlexVols & LUNs
  • Space reclamation via SnapDrive for Windows
  • Compressed snapshot backups

Today nearly every storage vendor promotes some form of storage savings technology, with most focusing on thin provisioning, compression, and deduplication. The reality is that the current versions of these capabilities are riddled with caveats, ranging from poor performance to incompatibility with other functions to use only in backup solutions. In short, the use of storage savings technologies is so restrictive that they tend to be relegated to slideware-only implementations.

“Most of today’s deduplication happens on 2nd tier storage, but it doesn’t have to. Tomorrow’s opportunity is for deduplication on primary storage (assuming no impact to performance)”

From “The Digital Universe Decade – Are You Ready?”, published by IDC

NetApp storage savings technologies are implemented without the caveats, and this is what sets us apart from the rest of the industry. But then again, what else would you expect from a unified storage architecture?

Aren’t dedupe and compression two means to accomplish the same goal?

Ugh, if I only had a nickel for every time I’ve heard this phrase spoken… I’d have a truckload of nickels!

Some could consider this statement a small white lie, as it is somewhat truthful; however, it is also nefarious in nature, as it limits discourse, information sharing, and the opportunity to increase a customer’s knowledge base. I instinctively have a negative reaction when one attempts to wrap another in the comfort of such shenanigans.

Dedupe & Compression both reduce storage footprints; however, how this goal is achieved and the cost of implementing each vary almost as widely as the use cases which are appropriate for each.

From a technical perspective, data deduplication does not modify the data in the blocks that comprise a file or LUN. Dedupe, like Snapshots and FlexClone, allows the sharing of unique 4KB blocks between multiple dissimilar data objects such as files or LUNs. Dedupe ensures high performance by running as a scheduled process, and as unique blocks are shared between multiple data objects, like VMDKs, its use results in optimized storage I/O. The array becomes ‘intelligent’ and only unique I/O requests actually go all the way to disk. We call this form of cached block sharing Transparent Storage Cache Sharing, or TSCS. This capability is unique to NetApp in the storage industry but exists in a similar fashion in the form of Transparent Page Sharing in ESX/ESXi.
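To make the block-sharing concept a bit more concrete, here is a minimal Python sketch of how a system might detect identical 4KB blocks by fingerprint and store them only once while multiple files reference them. This is purely my illustration of the general idea, not WAFL code; the names and the use of SHA-256 are assumptions for the example, and a real implementation would also verify candidate matches byte for byte before sharing a block.

    # Conceptual sketch: block-level dedupe via content fingerprints.
    # Illustration of the general idea only; not WAFL code.
    import hashlib

    BLOCK_SIZE = 4096  # 4KB blocks

    block_store = {}   # fingerprint -> physical block, stored once
    file_maps = {}     # file name -> ordered list of fingerprints

    def write_file(name, data):
        """Store a file as references to unique 4KB blocks."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            block_store.setdefault(fp, block)  # physical write only if new
            refs.append(fp)
        file_maps[name] = refs

    # Two mostly identical "VMDKs" end up sharing nearly all physical blocks.
    write_file("vm1.vmdk", b"A" * BLOCK_SIZE * 100 + b"unique-to-vm1")
    write_file("vm2.vmdk", b"A" * BLOCK_SIZE * 100 + b"unique-to-vm2")

    logical = sum(len(refs) for refs in file_maps.values())
    print(f"logical blocks: {logical}, physical blocks: {len(block_store)}")

Running the sketch shows two mostly identical ‘VMDKs’ consuming only a handful of physical blocks, which is exactly the kind of sharing TSCS then extends into cache.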

By contrast, data compression modifies data at the bit level in order to store it in a more space-efficient manner, one that uses fewer bits. NetApp data compression addresses data in chunks of 8 consecutive 4KB blocks. These 32KB units are referred to as compression groups. Compression groups are comprised of sub-LUN or sub-file data blocks. Storing data in compression groups provides greater storage performance than compressing or decompressing entire files or LUNs, because a read or overwrite only has to decompress the 32KB group it touches rather than the whole object.
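To illustrate why compression groups matter, here is a minimal Python sketch, again purely my own illustration and not ONTAP code, that compresses data in independent 32KB groups and leaves groups that don’t compress well uncompressed. The 25% savings threshold and the use of zlib are assumptions for the example.

    # Conceptual sketch: compressing data in 32KB compression groups
    # (8 x 4KB blocks) rather than as whole files or LUNs. Illustrative only.
    import zlib

    BLOCK_SIZE = 4096
    GROUP_SIZE = BLOCK_SIZE * 8  # one 32KB compression group

    def compress_groups(data, min_savings=0.25):
        """Compress each 32KB group independently; keep incompressible
        groups raw so reads don't pay a pointless inflate cost."""
        groups = []
        for i in range(0, len(data), GROUP_SIZE):
            group = data[i:i + GROUP_SIZE]
            packed = zlib.compress(group)
            if len(packed) <= len(group) * (1 - min_savings):
                groups.append(("compressed", packed))
            else:
                groups.append(("raw", group))
        return groups

    def read_group(groups, index):
        """A read touching one group only inflates that 32KB group."""
        kind, payload = groups[index]
        return zlib.decompress(payload) if kind == "compressed" else payload

    data = (b"highly repetitive text " * 2000) + bytes(range(256)) * 128
    groups = compress_groups(data)
    print(read_group(groups, 0)[:23])  # b'highly repetitive text '

The point of the per-group approach is visible in read_group(): servicing a small read only requires inflating one 32KB group, not the entire file or LUN.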

In short, both technologies reduce storage footprints, but one (dedupe) increases performance while the other (compression) can provide storage savings for data sets that are devoid of redundancy and don’t dedupe well. I’ve covered these topics in depth in past posts.

Where to Apply Dedupe and Compression

In the past I’ve encouraged customers to dedupe every data set; however, I’m not sure compression in its current form is optimal for use with every application data set. As I shared earlier, compression in 8.0.1 is an inline process and it will place some additional load on the storage array. I’d like to suggest an approach to implementing compression in a highly successful manner: begin by targeting data sets that should provide significant storage savings beyond dedupe without negatively impacting the end user’s experience accessing the data. Data sets with these characteristics tend to be home directories and backup and archival data sets.

My recommendation is not meant to limit the use of compression; I just want to suggest areas that are known to consistently produce beneficial results in terms of both savings and performance.

As you can see from the content in the chart below, storage savings provided by compression and dedupe vary based on the type of data. Over the next several months we will publish more data on enabling compression with other types of data sets and applications.

While this chart includes some interesting data points, I’d suggest that some of them are conservative. As an example, the savings from dedupe with Microsoft Exchange 2010 appear modest at 15%. This value was obtained from deduplicating a single mail database. By contrast, when one enables dedupe across multiple databases stored in the same FlexVol, the savings increase significantly! We are working with our technology partners to deliver more solutions which leverage our storage efficiency technologies in this manner. Be patient, it’s coming.

Obtaining and Enabling Compression

Customers interested in obtaining a data compression license need to contact their NetApp Star Partner or sales team, who can access the online license request form located in the NetApp Field Portal. Also included in the Field Portal is a collection of technical resources that provide additional details on these technologies, which we refer to as the ‘Deduplication and Compression Binder’.

While you can enable compression on any FlexVol, I’d encourage you to run the Space Savings Estimation Tool (SSET 3.0) to analyze the data set in order to understand the effectiveness of deduplication and/or compression. SSET performs a nonintrusive crawl of a FlexVol from a Linux or Windows host. Please note that SSET has a limit of 2TB and will stop processing data at 2TB if used on a FlexVol of greater capacity.

SSET is available in the Field Portal for NetApp field & partner engineers.
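If you’d like a rough, back-of-the-envelope feel for how compressible a data set is before engaging your NetApp team, a simple crawl of your own can approximate the kind of analysis SSET performs. The sketch below is my own illustration, not SSET; the 32KB sampling size and the use of zlib are assumptions, and the results will only loosely approximate what the array achieves.

    # Rough do-it-yourself compressibility estimate for a directory tree.
    # This is NOT SSET; it simply illustrates the kind of crawl such a tool
    # performs. Chunk size and the use of zlib are assumptions.
    import os
    import sys
    import zlib

    CHUNK = 32 * 1024  # sample in 32KB chunks, mirroring compression groups

    def estimate(path):
        raw_total = packed_total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                try:
                    with open(os.path.join(root, name), "rb") as f:
                        while True:
                            chunk = f.read(CHUNK)
                            if not chunk:
                                break
                            raw_total += len(chunk)
                            packed_total += len(zlib.compress(chunk))
                except OSError:
                    continue  # skip unreadable files
        return raw_total, packed_total

    if __name__ == "__main__":
        raw, packed = estimate(sys.argv[1])  # e.g. python estimate.py /data
        if raw:
            print(f"scanned {raw / 2**30:.2f} GiB, estimated compression "
                  f"savings {(1 - packed / raw) * 100:.0f}%")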

For those interested, below are the commands required to enable compression from the FAS terminal:

Enable compression on a FlexVol (requires dedupe to be enabled on the FlexVol):

vol options <vol_name> compression on

Initiate compression of the existing data on a FlexVol:

vol compress start <vol_name>

Initiate compression of existing data, including blocks shared by dedupe and FlexClone:

vol compress start -a <vol_name>

Initiate compression of existing data, including data stored in snapshots:

vol compress start -s <vol_name>

 

Wrapping up this post

NetApp has a long history of validating storage savings technologies with a number of solutions, ranging from VMs to databases. Our engineering teams are working with our technology partners to validate the use of compression for more applications and data sets than our current conservative guidance of home directories, backups, and archives. You may recall that a number of technologies, including SATA drives and data deduplication, were initially released for use with backups and archives, and today these technologies are considered mainstream for production use in NetApp accounts worldwide. I expect compression to follow suit.

If you’d like to learn more about data compression you can check out the following resources:

  • Tech OnTap article featuring Sandra Moulton
  • Jay Kidd interview on compression
  • NetApp Play-by-Play with Sanjay Jagad, Sr. Product Manager for Storage Efficiency

 

Vaughn Stewart
http://twitter.com/vStewed
Vaughn is a VP of Systems Engineering at VAST Data. He helps organizations capitalize on what’s possible from VAST’s Universal Storage in a multitude of environments including A.I. & deep learning, data analytics, animation & VFX, media & broadcast, health & life sciences, data protection, etc. He spent 23 years in various leadership roles at Pure Storage and NetApp, and has been awarded a U.S. patent. Vaughn strives to simplify the technically complex and advocates thinking outside the box. You can find his perspective online at vaughnstewart.com and in print; he’s coauthored multiple books including “Virtualization Changes Everything: Storage Strategies for VMware vSphere & Cloud Computing“.



3 Comments

  1. Disclaimer: I work for Acadia/VCE but I also follow the NetApp technology.
    Vaughn, couple of technical questions for you on the implementation. I’m sure a TR will be released at some point to cover this but the following pops into my head.
    1. Is compression at the volume or aggr level? I didn’t see that listed. I assume volume level like dedupe
    2. Will compression only happen if compressable (is that a word?) blocks are in the cache at the same time and are therefore compressed in cache and then written to disk at the next consistency point?
    3. If the above is true, then it isn’t like the old Intel 286/386 days of disk compression (and dedupe when you think about it) where you would do a pass of the disk and then a compression run after that.
    4. Since most of the heavy lifting is in cache, is this only in cache on the controller or is compression PAM aware as well?
    Thanks!!
    Aaron

  2. Aaron,
    Hey buddy, I hope things are going well. Great questions, which suggest I may need to clarify the content in the post.
    Compression is enabled on a per-FlexVol basis, but it requires the FlexVol to reside on a 64-bit aggregate. Compression occurs as the data is ingested (after the NVRAM ACK to the host) and before it is written to disk.
    Take care.
    V

  3. It’s very interesting reading your opening: “It’s no secret; NetApp wants to sell you less storage than any other storage vendor….”
    It’s also not a secret that NetApp wants to give you for free anything that will cause you to spend much more money in the future….
    Like dedupe, NetApp compression increases your controller utilization and decreases performance, forcing you after a short time to upgrade your storage controller. This of course brings NetApp much more revenue than the disks you buy….
    BTW – I wonder why no one is talking about the “other” solution – the old Storwize company IBM just acquired. An appliance-based solution solves all the problems customers face when implementing NTAP compression.
