Last week I published the post ‘Data Compression, Deduplication, & Single Instance Storage’ to raise awareness of these different types of storage savings technologies and to provide some guidance as to where each is best deployed.
I’d like to thank everyone who chimed in to share their thoughts and knowledge in the comments section of that post. I believe we can chalk up the discussion as a win for the readers.
It appears the last post led a few readers to revisit data deduplication on their NetApp arrays, as I received a number of emails and direct messages in which the sender was (pleasantly) surprised to discover that Data ONTAP 7.3.3 adds data compression to the list of integrated storage savings technologies. (The storage features available on a system can be viewed via the license command when connected to the FAS console.)
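As a quick sanity check, running license with no arguments from a 7-mode console lists each licensable feature and its status (the a_sis entry is the deduplication license; new codes are added with license add). A minimal sketch of such a session follows; the hostname, feature names, and codes shown are illustrative placeholders, and the exact output varies by Data ONTAP version:

```
fas> license
             a_sis XXXXXXX
              cifs XXXXXXX
               nfs XXXXXXX
        snapmirror not licensed
```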
Similar to ‘dedupe’, ‘compression’ will be supported with any NetApp controller, any data set, and any storage protocol, including SMB (CIFS), NFS, FTP, HTTP, FC, FCoE, and iSCSI. Beyond this flexibility, compression can be combined with data deduplication for both SAN and NAS data sets.
NetApp is leading storage innovation; no other storage vendor provides this combination of integrated storage savings technologies across protocols and data sets.
Availability of Data Compression
One of the questions many have asked is, ‘Why hasn’t NetApp announced data compression in ONTAP?’ This is a great question, which I felt was easier to answer with the community than to reply to individually.
The official release that will provide support for data compression is currently targeted to be Data ONTAP 8.0.1. I have spoken with both the Product Manager for compression and NetApp’s own Dr. Dedupe (aka Larry Freeman), and both have confirmed that there is no active pre-release program (known to many NetApp customers as PVR) for those interested in data compression with 7.3.3.
One item I can share with you: when DOT 8.0.1 releases, data compression will be a no-cost (or free) software license, just as data deduplication is today.
For Those Planning To Enable Data Compression
For those of you considering enabling data compression with VMware, Hyper-V, KVM, etc., I’d suggest that you hold off on your plans until after you have validated the technology on unstructured and archival data sets.
Some of you may be asking, ‘Why such a conservative recommendation?’
To be frank, we didn’t target the use of data compression with virtual machines, as our customers are already receiving tremendous savings from data deduplication along with performance gains via Flash Cache (formerly PAM) and its inherent TSCS (Transparent Storage Cache Sharing) capabilities. Customers commonly realize savings of 50% to 70% (and sometimes greater) in virtual server environments and around 95% in virtual desktop deployments.
Due to the level of success we have had with VMs, we directed our engineering efforts toward data sets where we wanted to deliver greater savings. It is for this reason that I suggest enabling data compression in areas like home directories, engineering data sets, data archives, external BLOB storage with SharePoint Server, etc.
NAS data with Dedupe and Compression
Closing Thoughts
My guidance to you would be to ‘dedupe’ every data set today. Doing so reduces primary and secondary storage requirements as well as the bandwidth needed for disk-to-disk replication. There’s literally no reason not to enable this capability. With the release of 8.0.1, data compression will become generally available and will provide additional storage savings with your unstructured data sets.
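For anyone enabling dedupe for the first time, the 7-mode console workflow is short. A sketch, assuming an existing volume named /vol/vol1 (the volume name is a placeholder):

```
fas> sis on /vol/vol1        # enable deduplication on the volume
fas> sis start -s /vol/vol1  # scan and deduplicate existing data
fas> sis status /vol/vol1    # monitor progress
fas> df -s /vol/vol1         # report the space saved
```

Note that sis start without the -s flag only processes data written after dedupe was enabled; the scan flag is what you want on a volume that is already populated.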
When we ship 8.0.1 and you begin to adopt data compression, please share your results! We love the feedback.
For those of you still uncertain about how much of the information around data deduplication, TSCS, and data compression is fact and how much is fiction, I’d suggest you register to attend VMworld 2010, because we are going to have many of these technologies on display, including performance validations and technical demonstrations.
Nice of you to give us a teaser of upcoming tech in ONTAP. As a customer, I feel that things have been moving a little too slowly lately for my taste.
Now, regarding the mentioned compression, can you reveal what kind of data is the target group? We use ONTAP deduplication on almost all our data except Exchange (due to MS not supporting it), but I see fairly low numbers on our end-user file data (10-25%). Such data usually compresses well and is really static, so the savings should be really high.
Yet it depends on how you use the compression. The smaller the data set (i.e., only within the 4K block), the smaller the gain I would expect.
Talking about 4K blocks: deduplication keeps the “natural” 4K blocks in WAFL unchanged. How do you solve that with compression involved? Do you merge compressed blocks on the hard disk? You would suddenly have sub-4K blocks involved. Or have I misunderstood something?
@Dejan – Thanks for the comments and for being a customer.
As for deduplicating a data set like Exchange, array-based storage savings technology is invisible to the application. It is my understanding that Microsoft has not published any statement declaring a lack of support for array-based storage savings technologies.
Home directories tend to vary in the amount of file- and sub-file-level redundancy, which in turn impacts the amount of storage savings from dedupe. We commonly hear customers stating 30% savings, +/- 5%. Data compression is the ideal complement for this data set, as the contents of home dirs tend to compress rather well.
As for block size, all forms of data compression store the data in a non-native state. With WAFL, fewer blocks will be read from disk to serve a file, and the data is then expanded in the array cache. Should the file be edited and saved, additional work is required by the array to recompress the content.
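To make the mechanics of Dejan’s question concrete, here is a toy Python sketch of the general compression-group idea (an illustration only, not NetApp’s actual WAFL implementation): several logical 4K blocks are compressed as a unit, the payload is stored padded out to whole physical blocks, and the read path expands it back in memory.

```python
import zlib

BLOCK = 4096  # logical 4K block size in bytes

def compress_group(blocks):
    """Compress a group of logical 4K blocks into whole physical blocks.

    Returns (stored, payload_len): the compressed payload padded to a
    multiple of BLOCK, plus its true length so the read path knows where
    the payload ends. The on-disk layout still deals in full 4K blocks
    even though the compressed payload itself is sub-block sized.
    """
    payload = zlib.compress(b"".join(blocks))
    pad = (-len(payload)) % BLOCK
    return payload + b"\x00" * pad, len(payload)

def read_group(stored, payload_len):
    """Read path: expand stored physical blocks back into logical 4K blocks."""
    raw = zlib.decompress(stored[:payload_len])
    return [raw[i:i + BLOCK] for i in range(0, len(raw), BLOCK)]

# Eight compressible logical blocks (32 KB) collapse to far fewer
# physical blocks, and the read path restores them losslessly.
logical = [bytes([i]) * BLOCK for i in range(8)]
stored, payload_len = compress_group(logical)
print(len(logical), "logical blocks ->", len(stored) // BLOCK, "physical block(s)")
assert read_group(stored, payload_len) == logical
```

The round trip mirrors the read path described above: serving the file reads fewer physical blocks and expands them in cache, while an edit-and-save has to recompress the affected group.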
For additional information on the differences between dedupe and compression please see this post:
http://blogs.netapp.com/virtualstorageguy/2010/06/data-compression-deduplication-single-instance-storage.html
Excited to see compression arriving for ONTAP – we have lots of data that compresses well but doesn’t dedupe (think millions of ASCII data files).
Are there any improvements coming in upping maxfiles and increasing performance with thousands of files per directory?
Vaughn:
It’s nice to see NetApp adding compression functionality to ONTAP. We completely agree with your analysis of how compression combined with deduplication gives you the ultimate in data optimization (in fact, Tom Cook recently wrote about it in his blog too). NetApp’s approach of having it completely embedded is the correct (and safest) way to provide this functionality within primary storage, since it is a read-path operation. Compression combined with scalable, high-performing dedupe is the right track.
Mike Ivanov – Permabit
I asked for a license to try compression on our 7.3.3 cluster but was told it’s not actually available until 8.0.1?
Can someone please clarify which version of ONTAP will actually run compression routines?
thanks
@Andy – your data is why we have been doing the engineering work!
@Mike – Glad to see other storage vendors adopting storage efficiency technologies for production use cases.
@Fletcher – Compression officially releases with DOT 8.0.1. At this time we are not providing early access (or PVR) to this functionality in 7.3.x.