Splunk SmartStore on Pure Storage: Simple, Efficient & Accelerated


SmartStore is the next-generation architecture from Splunk®, designed to help turn data into action for ITOps, SIEM, IoT, and business analytics. SmartStore delivers operational simplicity, agility, and infrastructure cost reduction. When paired with Pure Storage FlashBlade™, SmartStore gains accelerated search capabilities and increased storage efficiency.

FlashBlade is also future-proof for Splunk: customers can accelerate their ‘classic’ Indexer infrastructure today while laying the foundation for a non-disruptive upgrade to SmartStore.

Sound too good to be true? Let’s break down each benefit and the underlying technologies.

SmartStore: A Disaggregated Architecture for Splunk Indexers

Splunk SmartStore is a cloud-native architecture composed of stateless Indexer servers and an S3 object store. This is a radical departure from the ‘classic’ Splunk Indexer architecture, which dedicates storage volumes (DAS, SAN or NAS) to compute resources. The classic model, like HDFS and HCI, is costly to operate at scale because managing servers requires migrating large volumes of data.

SmartStore replaces rigid storage allocation and performance tiers with a cache and an S3 object store. As data is ingested, it is stored in the Indexer cache (aka hot buckets). Once indexed, a persistent copy of the data is stored in the object store (aka warm buckets), and an ephemeral copy resides in the cache. The object store is responsible for providing data services like data protection and multi-site data synchronization.
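To make the data flow above concrete, here is a toy Python model of it. This is a sketch for illustration only, not Splunk's implementation: the class name, the bucket-count cache limit, and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class ToySmartStore:
    """Toy model of the hot -> warm flow; NOT Splunk's actual implementation."""

    def __init__(self, cache_slots):
        self.cache_slots = cache_slots   # hypothetical cache size, counted in buckets
        self.hot = {}                    # hot buckets: exist only on the Indexer
        self.cache = OrderedDict()       # ephemeral warm copies, least recently used first
        self.object_store = {}           # persistent warm copies (stand-in for S3)
        self.remote_fetches = 0          # number of cache misses served remotely

    def ingest(self, bucket_id, event):
        # New data always lands in a hot bucket.
        self.hot.setdefault(bucket_id, []).append(event)

    def roll_to_warm(self, bucket_id):
        # Persistent copy goes to the object store, ephemeral copy stays cached.
        data = self.hot.pop(bucket_id)
        self.object_store[bucket_id] = data
        self._cache_put(bucket_id, data)

    def search(self, bucket_id):
        if bucket_id in self.hot:
            return self.hot[bucket_id]
        if bucket_id in self.cache:
            self.cache.move_to_end(bucket_id)   # mark as recently used
            return self.cache[bucket_id]
        # Cache miss: pull the bucket back from the object store.
        self.remote_fetches += 1
        data = self.object_store[bucket_id]
        self._cache_put(bucket_id, data)
        return data

    def _cache_put(self, bucket_id, data):
        self.cache[bucket_id] = data
        self.cache.move_to_end(bucket_id)
        while len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)      # evict the least recently used bucket


store = ToySmartStore(cache_slots=2)
for i in range(3):
    store.ingest(f"b{i}", f"event-{i}")
    store.roll_to_warm(f"b{i}")

# "b0" has been evicted from the cache but is still safe in the object store.
result = store.search("b0")
```

The point of the sketch is that evicting a bucket from the cache never loses data, because the object store always holds the persistent copy. The only cost of a miss is the remote fetch.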

Cold buckets & TSIDX reduction no longer exist; however, frozen buckets (for archival) are still supported.

It is my firm opinion that Indexer cache storage should be SSDs (be it NAND, NVMe or Intel Optane). This is based on a) the transient nature of the data, b) the small capacity requirements, and c) the aggregated read / write bandwidth (of 10s to 100s of SSDs).

SmartStore is 10X Simpler than Classic Splunk Deployments

SmartStore is truly revolutionary in terms of Indexer server management – this advancement is the direct result of the disaggregated architecture. Data evacuation, rehydration, validation, reconstruction and rebalancing have been removed from Indexer management and maintenance.

Splunk Indexers and storage capacity can now be added on demand. This agility enables new capabilities, like bursting Indexer nodes (bare metal, VM, or container) to support demand spikes. Need more storage capacity? No problem: slide a new blade into a FlashBlade chassis… that’s it. Indexer clusters can have software updated or patched in hours versus days or weeks.

Yes – days to weeks is the norm for classic deployments.

SmartStore increases Splunk availability. Indexer HA failures recover in mere minutes without the requirement to validate and reconstruct data (which is required with DAS, SAN and NAS). In SmartStore the S3 object store, not the Indexer node, is responsible for data protection. 

SmartStore & FlashBlade: Radically Efficient Infrastructure

Together, SmartStore and FlashBlade can radically reduce hardware infrastructure requirements. SmartStore frees Indexer clusters to be sized solely on the rate of data ingest, without regard to data capacity.

SmartStore implements single-instance storage, eliminating data replicas for warm buckets and resulting in significant storage capacity reductions. FlashBlade further reduces warm bucket capacity requirements with advanced data compression. These savings are in addition to Splunk’s native data compression.

Data replicas exist only in the cache, for hot buckets. Warm buckets are now stored as a single copy in the object store.
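As a back-of-the-envelope illustration of the replica arithmetic, the Python sketch below compares a classic cluster that keeps every bucket on several Indexers with SmartStore's single warm copy. All inputs (daily on-disk volume, retention, replication factor, days spent hot) are hypothetical values chosen for the example. It ignores cached copies of warm data and any protection overhead inside the object store itself, and the `array_reduction` parameter is an assumed stand-in for additional compression on the array.

```python
def classic_capacity_tb(daily_on_disk_tb, retention_days, replication_factor):
    """Capacity when every bucket is kept on `replication_factor` Indexers."""
    return daily_on_disk_tb * retention_days * replication_factor


def smartstore_capacity_tb(daily_on_disk_tb, retention_days, hot_days,
                           replication_factor, array_reduction=1.0):
    """Replicated hot data in the cache plus one warm copy in the object store."""
    hot = daily_on_disk_tb * hot_days * replication_factor
    warm = daily_on_disk_tb * (retention_days - hot_days) / array_reduction
    return hot + warm


# Hypothetical inputs: 1 TB/day on disk, 90 days retention, 2 replicas, 1 day hot.
classic = classic_capacity_tb(1.0, 90, 2)
smart = smartstore_capacity_tb(1.0, 90, hot_days=1, replication_factor=2)
saving = 1 - smart / classic
```

With these made-up inputs the sketch gives 180 TB for the classic layout versus 91 TB for SmartStore, a reduction of roughly half. Real deployments will differ, so treat this as a way to reason about where the savings come from rather than as a sizing tool.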

No two Splunk deployments are the same, so your mileage may vary, but it’s common for compute requirements to drop by up to 40% and storage capacity by up to 60%.

FlashBlade Accelerates SmartStore

For all of the gains provided by SmartStore, there is a gap: search performance can suffer. Traditionally, S3 object stores have been designed for low-performance, archival data. This means a SmartStore backed by a slow object store may be unable to complete searches of older data in a timely manner or, worse, may time out.
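A rough way to see why object store bandwidth matters: when a search touches data that is no longer in the cache, that data first has to be pulled from the object store. The Python sketch below estimates only that transfer time. The data volume, throughput figures and time limit are hypothetical and are not measurements of any product.

```python
def remote_fetch_seconds(uncached_gb, store_gb_per_s):
    """Time to pull uncached bucket data from the object store (transfer only)."""
    return uncached_gb / store_gb_per_s


def completes_in_time(uncached_gb, store_gb_per_s, time_limit_s):
    """True if the transfer alone fits inside the given time limit."""
    return remote_fetch_seconds(uncached_gb, store_gb_per_s) <= time_limit_s


# Hypothetical search over 10 TB of uncached data with a one-hour limit.
slow = remote_fetch_seconds(10_000, 1.0)    # store delivering 1 GB/s
fast = remote_fetch_seconds(10_000, 10.0)   # store delivering 10 GB/s
slow_ok = completes_in_time(10_000, 1.0, 3600)
fast_ok = completes_in_time(10_000, 10.0, 3600)
```

In this made-up case the slower store needs close to three hours just to move the data, while the faster one finishes the transfer in under 17 minutes, which is the difference between missing and meeting the one-hour limit.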

In developing FlashBlade, one bet that Pure Storage made was that data analytics would move to S3, would leverage high-speed networks and thus would require high bandwidth, low latency object stores.

FlashBlade accelerates SmartStore – enabling customers to be prepared for when the unexpected occurs and large volumes of data must be searched. Cyber breach, e-legal discovery, regulatory and compliance requirements (like GDPR) are just a few examples of large-scale searches that customers have shared are very difficult to complete in a timely manner with classic Splunk.

In comparative testing, FlashBlade was more than 10X faster than an alternative S3 object store. The best part is that FlashBlade is comparable in cost to the SAN or NAS storage deployed with classic Splunk.

Pure Storage: Providing A Modern Data Experience for Splunk

I believe SmartStore on FlashBlade is the ideal architecture for Splunk environments based on the new levels of operational simplicity and agility, reduction in infrastructure requirements and the performance of FlashBlade. This value prop extends with Evergreen Storage and Pure-as-a-Service offerings.

There are also some new publications, including the latest Splunk SmartStore on FlashBlade Reference Architecture and Splunk SmartStore on Pure Storage Reference Architecture from the Kinney Group. The former is a deployment blueprint, whereas the latter is a technical validation from a Splunk practice expert.

I tried to cover a lot in this post, and I plan to dive a bit deeper in subsequent posts, so look for those shortly.


