I’d like to continue the discussion on storage savings technologies. In my recent post on Dedupe and Compression I attempted to clarify the differences in the data reduction capabilities between each technology and demonstrate how the two deliver even greater savings when used in conjunction. Let’s expand the conversation by digging into Thin Provisioning, likely the most widely available storage savings technology.
Provisioning storage in the traditional sense follows the “If you build it, they will come” approach, aka the Field of Dreams model. Storage capacity is preallocated in anticipation of someday storing data. As a result, this model requires storage and application administrators to forecast data growth, and with the benefit of time we’ve learned that those forecasts are inaccurate more often than not.
At a high level, Thin Provisioning (T.P.) is a virtual provisioning mechanism that allows addressable storage capacity to be provisioned without consuming or reserving physical capacity. Physical capacity is allocated on demand, as data is written. This dynamic capability eliminates the physical capacity lost as free space within traditional, thick provisioned LUNs, volumes and virtual disks.
If you’d like a deep dive into the details of thin and thick provisioned virtual disks see these posts: Thin Provisioning Part 1: The Basics & Part 2: Going Beyond.
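To make the on-demand allocation idea concrete, here is a minimal sketch in Python. It is a toy model, not how any particular array implements thin provisioning: the `ThinVolume` class, its block map, the 4096-byte block size and the choice to return zeros for unwritten blocks are all assumptions made for illustration.

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical, for illustration only)."""

    def __init__(self, provisioned_blocks, block_size=4096):
        self.provisioned_blocks = provisioned_blocks  # addressable capacity
        self.block_size = block_size                  # assumed block size
        self._blocks = {}  # logical block address -> data, filled on write

    def _check(self, lba):
        if not 0 <= lba < self.provisioned_blocks:
            raise IndexError("block address outside provisioned range")

    def write(self, lba, data):
        # Physical space is consumed only when a block is first written.
        self._check(lba)
        self._blocks[lba] = data

    def read(self, lba):
        # Assumption: blocks that were never written read back as zeros.
        self._check(lba)
        return self._blocks.get(lba, bytes(self.block_size))

    @property
    def allocated_blocks(self):
        # Number of blocks actually backed by physical capacity.
        return len(self._blocks)


vol = ThinVolume(provisioned_blocks=1_000_000)
vol.write(42, b"hello")
print(vol.provisioned_blocks, vol.allocated_blocks)
```

The volume advertises a million addressable blocks, yet only the one block that was written consumes space in the map. A thick provisioned equivalent would reserve all of them up front.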
Clarifying the Role of Thin Provisioning
Data Deduplication, Compression and Thin Provisioning take different paths to delivering storage efficiencies. Here’s the simple way I view these technologies:
- By removing unused reserve space, Thin Provisioning maximizes physical storage capacity
- By reducing data, Dedupe and Compression increase physical storage capacity
Leveraging multiple storage savings technologies can provide unprecedented gains in usable storage capacity, resulting in significant reduction in storage costs. Pure Storage publishes the Dedupe Ticker, a publicly available real-time display of the actual data reduction and thin provisioning savings realized by our customer base. This data spans all applications and highlights the 2X multiplier I’ve previously referred to. As you can see Dedupe & Compression combine to drive an average 6:1 data reduction.
When you add Thin Provisioning to the data reduction technologies, storage savings increase to nearly 13:1. Now I must admit, measuring the savings from Thin Provisioning is somewhat nuanced. While T.P. makes it easier to fill a drive or array to capacity it doesn’t actually increase the amount of addressable storage on the drive or array. As a result the ‘benefits’ of thin provisioning can be overstated. Allow me to demonstrate how easy it is to manipulate the ‘thin provisioning effect on savings’.
Let’s compare 8 TB of data stored on a 10 TB and a 100 TB Thin Provisioned LUN. You’ll notice the former shows 20% savings, whereas the latter shows 92%.
The logical data set remained at 8 TB and the deduped capacity at 4 TB, yet by simply adjusting the capacity of the T.P. LUN we see an increase in T.P. Savings. There was no impact on the volume of data that is or can be stored.
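The arithmetic behind that comparison can be sketched in a few lines of Python. The formula is my assumption, inferred from the 20% and 92% figures above: savings is the share of provisioned capacity that holds no written data. The function names are hypothetical.

```python
def thin_provisioning_savings(provisioned_tb, written_tb):
    """Share of provisioned capacity not backed by written data (assumed formula)."""
    if provisioned_tb <= 0:
        raise ValueError("provisioned capacity must be positive")
    return (provisioned_tb - written_tb) / provisioned_tb


def data_reduction_ratio(written_tb, stored_tb):
    """Logical data written divided by physical capacity consumed."""
    return written_tb / stored_tb


WRITTEN_TB = 8  # logical data set from the example
STORED_TB = 4   # deduped capacity from the example

for lun_tb in (10, 100):
    savings = thin_provisioning_savings(lun_tb, WRITTEN_TB)
    reduction = data_reduction_ratio(WRITTEN_TB, STORED_TB)
    print(f"{lun_tb} TB LUN: T.P. savings {savings:.0%}, data reduction {reduction:.0f}:1")
```

Only the provisioned size changes between the two runs. The data reduction ratio stays at 2:1 while the T.P. savings figure moves from 20% to 92%, which is exactly why the number is so easy to inflate.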
In Closing
You should absolutely use every storage saving technology available on your array. The combination of Dedupe, Compression and Thin Provisioning, along with additional technologies including SCSI unmap, array clones, and the removal of patterns and zeros, is critical to scaling storage in a world of fixed physical datacenter capacity. This is the new norm – embrace it!
With that said… be cautious around any emphasis on Thin Provisioning. While I’m not suggesting that there’s no value to T.P., you need to watch out for the ‘thin provisioning effect on savings’ as it can set false expectations. Don’t take my word for it, see the example below from Big Storage.
Unlike dedupe and compression technologies, I think the promise of thin provisioning is not about increasing the amount of addressable storage on the drive or array; rather, it is to reduce administration tasks. For example, an administrator can allocate disk space with growth and future demand in mind and not worry about increasing individual allocations, only about monitoring and adding more (or higher capacity) disks at the storage layer.
Moreover all of these technologies/techniques can be used together.
I am new to this area and I would appreciate your response.
Regards!
In reference to my earlier comment, I think the post should say what thin provisioning is good for in addition to what it shouldn’t be mistaken for. Admins won’t allocate disk space on servers based on how much data is actually stored, but based on the storage needs in the near/mid-term future. Thin provisioning allows them to allocate storage for the highest needs anticipated without actually buying any hardware up front, resulting in cost savings.
If a lot of servers are thin provisioned like that, an admin can monitor storage growth at the storage layer (macro level) and add more disks when actual usage reaches 70-80% or more, without worrying about allocations at the server level for the most part.