TCO Reduction Through Storage - SplunkConf · 2017-10-08 · Agenda...
Copyright © 2016 Splunk Inc.
Mustafa Ahamed Director, Product Management
Ashish Mathew Software Engineer
TCO Reduction Through Storage
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• Introduction To Data Storage In Splunk
• TSIDX Reduction – Overview
• TSIDX Reduction – Set Up
• Performance Comparisons
• Tips & Tricks
Introduction To Data Storage In Splunk
Splunk Architecture
1 Search Head gets the peer list from Cluster Master
2 Search Head sends the search queries to peers
3 Redundant copies of raw data are available
Bucket Lifecycle
Events → Hot bucket ($ Home Path)
[Hot Bucket is Full] → Warm bucket ($ Home Path)
[Too Many Warms] → Cold bucket ($ Cold Path) [Cheaper Storage]
[Out of Space or Bucket is Old] → $ Frozen Path or Deleted
[Explicit User Action] → $ Thawed Path
Storage Requirements
Raw data on disk = ~15% of indexed data
Index files on disk = ~35% of indexed data
Index data = 100 GB, RF = 3, SF = 2
• Raw data = 15 * 3 = 45 GB
• Index files = 35 * 2 = 70 GB
Total size across cluster = 115 GB
Per peer storage = ~38 GB (115 GB / 3 peers)
http://blogs.splunk.com/2013/01/31/disk-space-estimator-for-index-replication/
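The arithmetic on this slide can be reproduced with a short calculation. This is a minimal sketch, assuming the slide's rule-of-thumb ratios (15% raw, 35% index files) and a three-peer cluster, which the per-peer figure implies; the function name `cluster_storage_gb` and its parameters are hypothetical and not part of any Splunk tool.

```python
def cluster_storage_gb(indexed_gb, replication_factor, search_factor,
                       raw_ratio=0.15, index_ratio=0.35, peers=3):
    """Estimate clustered disk usage from the rule-of-thumb ratios on the slide."""
    # Every replicated copy of a bucket keeps the compressed raw data.
    raw_gb = indexed_gb * raw_ratio * replication_factor
    # Only searchable copies keep the index (TSIDX) files.
    index_gb = indexed_gb * index_ratio * search_factor
    total_gb = raw_gb + index_gb
    # Assumes data is spread evenly across peers.
    return raw_gb, index_gb, total_gb, total_gb / peers


raw, index, total, per_peer = cluster_storage_gb(
    100, replication_factor=3, search_factor=2)
print(f"raw={raw:.0f} GB, index={index:.0f} GB, "
      f"total={total:.0f} GB, per peer={per_peer:.0f} GB")
```

Because index files are the larger share of the total here (70 GB of 115 GB), they are the natural target for the reduction feature described in the following slides.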
TSIDX Reduction Overview
TSIDX Retention Policy
Ability to remove TSIDX file contents for historical data to save disk space
Before: RAW Data + TSIDX Files
After: RAW Data + TSIDX Files (minified)
Deep Dive
Reduce What?
Lexicon and Postings list
Raw data:
  ‒ Event 1: Happy kitty
  ‒ Event 2: Sad kitty
Lexicon:
  ‒ Happy: Term-id 1
  ‒ Kitty: Term-id 2
  ‒ Sad: Term-id 3
Postings List:
  ‒ Term-1: [Event-1]
  ‒ Term-2: [Event-1, Event-2]
  ‒ Term-3: [Event-2]
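The lexicon and postings list for the two example events can be built in a few lines. This is a toy sketch, not Splunk's on-disk TSIDX format: the `tokenize` and `build_index` helpers are hypothetical, and term-ids are assigned in alphabetical order only so that the result lines up with the slide.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters; real segmentation is more involved.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(events):
    """Build a toy lexicon (term -> term-id) and postings list (term-id -> event ids)."""
    terms = sorted({t for text in events.values() for t in tokenize(text)})
    lexicon = {term: term_id for term_id, term in enumerate(terms, start=1)}
    postings = defaultdict(list)
    for event_id in sorted(events):
        # Each event id is recorded once per distinct term it contains.
        for term in sorted(set(tokenize(events[event_id]))):
            postings[lexicon[term]].append(event_id)
    return lexicon, dict(postings)


events = {1: "Happy kitty", 2: "Sad kitty"}
lexicon, postings = build_index(events)
print(lexicon)   # {'happy': 1, 'kitty': 2, 'sad': 3}
print(postings)  # {1: [1], 2: [1, 2], 3: [2]}
```

Reduction discards most of this structure, which is why a search on a reduced bucket has to fall back to scanning raw data.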
So How Do We Search?
• Brute Force!
  – Read EVERYTHING from disk, filter raw in memory
• Some optimizations by retaining the following
  – Bloom filters: Eliminate buckets that do not contain the terms
  – Reduced TSIDX: Eliminate events that fall outside the time range
  – *.data Files: Eliminate events that don’t match host/source/sourcetype
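The search path over reduced buckets can be illustrated with a simplified model. This is a sketch under stated assumptions, not Splunk's implementation: buckets are plain dictionaries, the `BloomFilter` class is a toy written for this example, the time-range check stands in for the reduced TSIDX, and the host/source/sourcetype filtering from the *.data files is left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, term):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def make_bucket(events):
    # events is a list of (timestamp, raw_text) pairs.
    bloom = BloomFilter()
    for _, raw in events:
        for term in raw.lower().split():
            bloom.add(term)
    return {"bloom": bloom, "events": events}


def search_reduced(buckets, term, earliest, latest):
    """Brute-force search that skips whatever the retained structures rule out."""
    hits = []
    for bucket in buckets:
        # Bloom filter: skip buckets that definitely do not contain the term.
        if not bucket["bloom"].might_contain(term):
            continue
        for timestamp, raw in bucket["events"]:
            # Time-range check first, then filter the raw text in memory.
            if earliest <= timestamp <= latest and term in raw.lower().split():
                hits.append(raw)
    return hits


buckets = [make_bucket([(10, "Happy kitty"), (20, "Sad kitty")]),
           make_bucket([(30, "Grumpy puppy")])]
print(search_reduced(buckets, "sad", earliest=0, latest=100))  # ['Sad kitty']
```

Even when the Bloom filter gives a false positive for a bucket, the in-memory filter on the raw text keeps the result correct; the cost is only the wasted read.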
Won’t Searches Be Slow?
• It Depends!
  – Dense searches not affected at all
  – Sparse searches affected significantly
• Assumption: Old data is less searched
• Before configuring, determine a cutoff point
Numbers
• Disk Savings: 60-70% on average
  – Better for numerical data
  – Better for larger lexicons
• Search Times:
  – Dense: Not affected
  – Sparse/Rare
    ‣ Goes from seconds to minutes
    ‣ Scales with data volume
Configuration
Per-index settings in indexes.conf
REST/CLI/UI: No restart required
• enableTsidxReduction: true|false
  – Enable the feature. Off by default
• timePeriodInSecBeforeTsidxReduction
  – Age at which a bucket becomes eligible for reduction
• tsidxReductionCheckPeriodInSec
  – Frequency of scans for eligible buckets
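As an illustration of how the three settings fit together, the helper below renders an indexes.conf stanza for one index. It is a sketch only: the setting names come from the slide, while the function name, the index name `web_logs`, the 90-day cutoff, and the 600-second scan interval are hypothetical values chosen for the example.

```python
SECONDS_PER_DAY = 86_400


def tsidx_reduction_stanza(index_name, reduce_after_days, check_period_sec):
    """Render an indexes.conf stanza that enables TSIDX reduction for one index."""
    lines = [
        f"[{index_name}]",
        "enableTsidxReduction = true",
        # Buckets older than this many seconds become eligible for reduction.
        f"timePeriodInSecBeforeTsidxReduction = {reduce_after_days * SECONDS_PER_DAY}",
        # How often to scan for eligible buckets.
        f"tsidxReductionCheckPeriodInSec = {check_period_sec}",
    ]
    return "\n".join(lines)


stanza = tsidx_reduction_stanza("web_logs", reduce_after_days=90, check_period_sec=600)
print(stanza)
```

Expressing the cutoff in days and converting it keeps the chosen cutoff point readable, since the setting itself is in seconds.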
UI
Reduction Process
• Eligibility
  – Bucket is not HOT
  – No more splunk-optimize runs scheduled on the bucket
  – Bucket is the right age
• Create reduced files in a tmp directory in the bucket
• Copy over reduced files, delete the full files
• Ongoing searches uninterrupted
• NOTE: Marginal disk usage increase when first enabled
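The eligibility rules can be expressed as a single predicate. This is a minimal sketch, assuming a hypothetical `Bucket` record and assuming that bucket age is measured from its newest event, which the slide does not specify; the `full` and `mini` state names follow the `tsidxState` values shown under Debug Options.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    state: str                 # "hot", "warm", or "cold"
    latest_event_time: int     # epoch seconds of the newest event (assumed age basis)
    optimize_pending: bool     # True while splunk-optimize still has runs scheduled
    tsidx_state: str = "full"  # "full" or "mini"


def is_eligible(bucket, now, min_age_sec):
    """Apply the three eligibility checks from the slide to one bucket."""
    if bucket.tsidx_state == "mini":
        return False  # already reduced, nothing to do
    if bucket.state == "hot":
        return False  # hot buckets are still being written
    if bucket.optimize_pending:
        return False  # wait until splunk-optimize is finished with the bucket
    return now - bucket.latest_event_time >= min_age_sec


old_warm = Bucket(state="warm", latest_event_time=0, optimize_pending=False)
young_warm = Bucket(state="warm", latest_event_time=900, optimize_pending=False)
print(is_eligible(old_warm, now=1000, min_age_sec=500))    # True
print(is_eligible(young_warm, now=1000, min_age_sec=500))  # False
```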
DANGER!
• Once a bucket is reduced, going back is very expensive
• Two ways:
  – Disable reduction, then wait for the reduced buckets to be phased out
  – Stop Splunk and rebuild the bucket
Clustering
• indexes.conf is consistent across slaves
• Reduction does not happen in lock step across all slaves
• Eventually all copies of the bucket will have the same state across peers
• Bucket is SEARCHABLE if it has either full or mini-TSIDX files
Debug Options
• Undocumented CLI to manually minify a specific bucket
  – Stop splunk
    ‣ splunk fsck minify-tsidx --one-bucket --bucket-path=<path>
• New field in dbinspect:
  – tsidxState: full | mini
• Log channels
  – Minification scheduler
    ‣ category.OnlineFsck
  – New filtering layers in Search
    ‣ category.ISearchOperator
    ‣ category.FastSearchFilter
    ‣ category.LispyPostFilter
Performance Testing Results
Performance vs. Hadoop Data Roll
Comparison To Hadoop Data Roll
Hadoop Data Roll
• Moves raw data from Splunk to Hadoop infrastructure
• Useful if you already have Hadoop in your environment
• Performance-wise, TSIDX reduction is faster due to Bloom filters
Best Practice Recommendations
Key Details
Per-index configuration – Can be enabled globally or on a per-index basis
Cluster-aware
Bloom filter – Always use Bloom filters
Performance
THANK YOU