Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016


Transcript of Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Page 1: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Chris Lohfink

Cassandra Metrics

Page 2: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

© DataStax, All Rights Reserved. 2

About me

• Software developer at DataStax
• OpsCenter, Metrics & Cassandra interactions

Page 3: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

What this talk is
• What the things the metrics report actually mean (ba dum tss)
• How metrics evolved in C*

Page 4: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Collecting
Not how, but what and why

Page 5: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Cassandra Metrics

• For the most part, metrics do not break backwards compatibility
– Until they do (from deprecation or bugs)
• Deprecated metrics are hard to identify without looking at the source code, so their disappearance may have surprising impacts even if they were deprecated for years
– e.g. the Cassandra 2.2 removal of the “Recent Latency” metrics

Page 6: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

C* Metrics Pre-1.1

• Classes implemented MBeans and metrics were added in place
– ColumnFamilyStore -> ColumnFamilyStoreMBean
• Semi-ad hoc, tightly coupled to code but had a “theme” or common abstractions

Page 7: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Latency Tracker

• LatencyTracker stores:
– recent histogram
– total histogram
– number of ops
– total latency
• Uses the latency and number of ops accumulated since the last call to compute the “recent” average latency
• Every time it is queried, it resets the recent latency and histogram

Page 8: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Describing Latencies
[Chart: latencies on a 0 to 1000 ms axis]
• Listing the raw values:
13ms, 14ms, 2ms, 13ms, 90ms, 734ms, 8ms, 23ms, 30ms
• Doesn’t scale well
• Not easy to parse; with larger amounts of data it can be difficult to find the high values


Page 13: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Describing Latencies
[Chart: latencies on a 0 to 1000 ms axis]
• Average:
– 103ms
• Missing outliers
– Max: 734ms
– Min: 2ms


Page 15: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Recent Average Latencies
[Chart: latencies on a 0 to 1000 ms axis]
• Reported latency is computed from:
– Sum of latencies since last called
– Number of requests since last called
• Average:
– 103ms
• Outliers lost
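The reset-on-read behaviour described above can be sketched as follows. This is a minimal illustration, not Cassandra's actual LatencyTracker; the class name `RecentLatencyTracker` and its fields are hypothetical.

```python
class RecentLatencyTracker:
    """Hypothetical sketch of a reset-on-read latency tracker (not Cassandra's class)."""

    def __init__(self):
        self.total_latency = 0   # sum of all latencies ever recorded
        self.op_count = 0        # number of operations ever recorded
        self._last_latency = 0   # totals as of the previous read
        self._last_count = 0

    def add(self, latency_ms):
        self.total_latency += latency_ms
        self.op_count += 1

    def recent_average(self):
        """Average latency since the previous call; reading resets the window."""
        ops = self.op_count - self._last_count
        latency = self.total_latency - self._last_latency
        self._last_count = self.op_count
        self._last_latency = self.total_latency
        return latency / ops if ops else 0.0


tracker = RecentLatencyTracker()
for ms in [13, 14, 2, 13, 90, 734, 8, 23, 30]:
    tracker.add(ms)

first_read = tracker.recent_average()   # 927 / 9 = 103.0; the 734ms outlier is hidden
second_read = tracker.recent_average()  # 0.0, nothing was recorded since the first read
```

Because reading mutates the state, two monitoring tools polling the same value would each see only part of the picture.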


Page 25: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histograms
• Describes frequency of data

Values: 1, 2, 1, 1, 3, 4, 3, 1
Sorted: 1, 1, 1, 1, 2, 3, 3, 4

Value  Count
1      4
2      1
3      2
4      1

Page 26: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histograms
• “Bin” the range of values
– Divide the entire range of values into a series of intervals
– Count how many values fall into each interval


Page 30: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histograms
• “Bin” the range of values: divide the entire range of values into a series of intervals, then count how many values fall into each interval

Values: 13, 14, 2, 20, 13, 90, 734, 8, 53, 23, 30
Sorted: 2, 8, 13, 13, 14, 20, 23, 30, 53, 90, 734

Bin       Count
1-10      2
11-100    8
101-1000  1
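The binning step can be sketched in a few lines. This is an illustration only; the function name `bin_counts` is hypothetical, and the three bins are the ones from the slide.

```python
from bisect import bisect_left

# Inclusive upper bound of each bin: 1-10, 11-100, 101-1000.
BIN_UPPER_BOUNDS = [10, 100, 1000]


def bin_counts(values, upper_bounds):
    """Count how many values fall into each bin.

    Values above the last upper bound are not handled in this sketch
    and would raise an IndexError.
    """
    counts = [0] * len(upper_bounds)
    for v in values:
        # bisect_left finds the first bin whose upper bound is >= v.
        counts[bisect_left(upper_bounds, v)] += 1
    return counts


latencies = [13, 14, 2, 20, 13, 90, 734, 8, 53, 23, 30]
counts = bin_counts(latencies, BIN_UPPER_BOUNDS)
print(counts)  # [2, 8, 1]
```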


Page 35: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histograms

Approximations

Max: 1000 (actual 734)

Min: 10 (actual 2)

Average: sum / count, (10*2 + 100*8 + 1000*1) / (2+8+1) ≈ 165 (actual ≈ 91 for these 11 values)

Percentiles: with 11 requests, 10 of them (about 90 percent) fall in the 11-100 bucket or lower.

90th Percentile: 100

Bin       Count
1-10      2
11-100    8
101-1000  1
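The approximations above can be reproduced from the bin counts alone. A minimal sketch, assuming each bin is represented by its upper bound (as the slide does); the function names are hypothetical.

```python
import math

# (inclusive upper bound, count) for the bins 1-10, 11-100, 101-1000
bins = [(10, 2), (100, 8), (1000, 1)]


def approx_max(bins):
    return max(upper for upper, count in bins if count > 0)


def approx_min(bins):
    return min(upper for upper, count in bins if count > 0)


def approx_mean(bins):
    total = sum(count for _, count in bins)
    return sum(upper * count for upper, count in bins) / total


def approx_percentile(bins, p):
    """Upper bound of the first bin where the running count reaches p of the total."""
    total = sum(count for _, count in bins)
    needed = math.ceil(p * total)
    running = 0
    for upper, count in bins:
        running += count
        if running >= needed:
            return upper
    return bins[-1][0]


print(approx_max(bins), approx_min(bins))   # 1000 10
print(round(approx_mean(bins)))             # 165
print(approx_percentile(bins, 0.90))        # 100
```

Every estimate is biased upward because each value is rounded up to the top of its bin; finer bins reduce the error at the cost of more counters.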

Page 36: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

EstimatedHistogram

Bin boundaries start at 1 and each is roughly 1.2 times the previous one (rounded, and always at least 1 larger):

1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29, … 12108970, 14530764, 17436917, 20924300, 25109160
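The series can be regenerated with a short loop. This is a sketch of the growth rule as described, not a copy of Cassandra's code; the function name is hypothetical, the default of 90 boundaries is an assumption, and the rounding is done in integer arithmetic to avoid floating-point surprises.

```python
def estimated_histogram_offsets(size=90):
    """Bin boundaries: start at 1, multiply by 1.2 and round, always advancing by >= 1."""
    offsets = [1]
    while len(offsets) < size:
        last = offsets[-1]
        nxt = (last * 12 + 5) // 10   # last * 1.2 rounded to nearest, in integer math
        if nxt == last:               # small values would otherwise get stuck
            nxt += 1
        offsets.append(nxt)
    return offsets


offsets = estimated_histogram_offsets()
print(offsets[:15])   # [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29]
print(offsets[-1])    # 25109160
```

Exponential boundaries give fine resolution for small values and coarse resolution for large ones, so 90 counters cover everything from 1 to roughly 25 million.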

Page 37: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

LatencyTracker
Has two histograms
• Recent
– Count of times a latency occurred since the last time it was read, for each bin
• Total
– Count of times a latency occurred since Cassandra started, for each bin


Page 39: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Total Histogram Deltas

If you keep track of the histogram from the last time you read it, you can compute the delta to determine how many occurred in that interval

Bin    1-10  11-100  101-1000
Last   2     8       1
Now    4     8       2
Delta  2     0       1
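Computing the delta is a per-bin subtraction between two readings of the cumulative histogram. A minimal sketch; `histogram_delta` is a hypothetical helper name.

```python
def histogram_delta(last, now):
    """Per-bin difference between two readings of a cumulative histogram."""
    if len(last) != len(now):
        raise ValueError("histograms must use the same bins")
    return [n - l for l, n in zip(last, now)]


last = [2, 8, 1]   # counts for bins 1-10, 11-100, 101-1000 at the previous read
now = [4, 8, 2]    # counts at the current read
delta = histogram_delta(last, now)
print(delta)  # [2, 0, 1]
```

Unlike a reset-on-read histogram, this leaves the source untouched, so any number of readers can each keep their own previous snapshot.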

Page 40: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Cassandra 1.1

• Yammer/Codahale/Dropwizard Metrics introduced
– Awesome!
– Not so awesome…

Page 41: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Reservoirs

• Maintain a sample of the data that is representative of the entire set
– Can perform operations on the limited, fixed-memory set as if on the entire dataset
• Vitter’s Algorithm R
– Offers a 99.9% confidence level & 5% margin of error
– Simple
– Randomly include a value in the reservoir, less and less likely as more values are seen
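Algorithm R itself is short enough to sketch directly. This is a generic textbook version, not the Metrics library's reservoir implementation; the function name and the example stream are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: keep a uniform random sample of k items from a stream."""
    reservoir = []
    for i, value in enumerate(stream):
        if i < k:
            # The first k values always fill the reservoir.
            reservoir.append(value)
        else:
            # Keep the new value with probability k / (i + 1),
            # which shrinks as more values are seen.
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = value
    return reservoir


rng = random.Random(42)
# 9,999 fast requests and a single 10 second outlier at the end.
latencies = [10] * 9_999 + [10_000]
sample = reservoir_sample(latencies, k=100, rng=rng)
```

In this example the single outlier has only a 100 / 10,000 = 1% chance of landing in the sample, which is why a max or high percentile read from a reservoir can miss it.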

Page 42: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Reservoirs

• Maintain a sample of the data that is representative of the entire set
– Can perform operations on the limited, fixed-memory set as if on the entire dataset
• Vitter’s Algorithm R
– Offers a 99.9% confidence level & 5% margin of error*

* When the stream has a normal distribution

Page 43: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Metrics Reservoirs
• Random sampling, what can it miss?
– Min
– Max
– Everything in 99th percentile?
– The more rare, the less likely to be included


Page 45: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Metrics Reservoirs
• “Good enough” for basic ad hoc viewing but too non-deterministic for many
• Commonly resolved using replacement reservoirs (e.g. HdrHistogram)
– org.apache.cassandra.metrics.EstimatedHistogramReservoir

Page 46: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Cassandra 2.2
• CASSANDRA-5657: upgrade metrics library (and extend it)
– Replaced reservoir with EH (EstimatedHistogram)
• Also exposed raw bin counts in values operation
– Deleted deprecated metrics
• Non-EH latencies from LatencyTracker

Page 47: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Cassandra 2.2
• No recency in histograms
• Currently requires delta’ing the total bin counts, which is beyond some simple tooling
• CASSANDRA-11752 (fixed 2.2.8, 3.0.9, 3.8)

Page 48: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Storage

Page 49: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Storing the data
• We have data, now to store it. Approaches tend to follow:
– Store all data points
• Provide aggregations either pre-computed as entered, MR, or on query
– Round Robin Database
• Only store pre-computed aggregations
• Choice depends heavily on requirements


Page 51: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
• Store the state required to generate the aggregations, and only store the aggregations
– Sum & Count for Average
– Current min, max
– “One pass” or “online” algorithms
• Constant footprint

Interval (s)  60  300  3600
Sum           0   0    0
Count         0   0    0
Min           0   0    0
Max           0   0    0

Page 52: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00

Interval (s)  60  300  3600
Sum           10  10   10
Count         1   1    1
Min           10  10   10
Max           10  10   10

Page 53: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30

Interval (s)  60  300  3600
Sum           22  22   22
Count         2   2    2
Min           10  10   10
Max           12  12   12

Page 54: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59

Interval (s)  60  300  3600
Sum           36  36   36
Count         3   3    3
Min           10  10   10
Max           14  14   14

Page 55: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10

Interval (s)  60  300  3600
Sum           36  36   36
Count         3   3    3
Min           10  10   10
Max           14  14   14

Page 56: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10

Interval (s)  60  300  3600
Sum           36  36   36
Count         3   3    3
Min           10  10   10
Max           14  14   14

Average 12
Min 10
Max 14

Page 57: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10

Interval (s)  60  300  3600
Sum           0   36   36
Count         0   3    3
Min           0   10   10
Max           0   14   14

Page 58: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10

Interval (s)  60  300  3600
Sum           13  49   49
Count         1   4    4
Min           13  10   10
Max           13  14   14
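The walkthrough above can be replayed with a small one-pass rollup. This is a sketch of the idea under assumed semantics (windows aligned to multiples of the interval, timestamps in seconds); the `Rollup` class is hypothetical, and it uses `None` rather than 0 for an empty min/max.

```python
class Rollup:
    """Hypothetical constant-footprint rollup for one interval (in seconds)."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.window_start = 0
        self.reset()

    def reset(self):
        self.sum = 0
        self.count = 0
        self.min = None
        self.max = None

    def add(self, t, value):
        """Add a value observed at second t; return the finished aggregate, if any."""
        finished = None
        if t >= self.window_start + self.interval_s:
            if self.count:
                finished = {"avg": self.sum / self.count,
                            "min": self.min, "max": self.max}
            # Start a new window aligned to a multiple of the interval.
            self.window_start = t - (t % self.interval_s)
            self.reset()
        self.sum += value
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        return finished


rollups = {s: Rollup(s) for s in (60, 300, 3600)}
events = [(0, 10), (30, 12), (59, 14), (70, 13)]   # (second, latency in ms)
emitted = []
for t, value in events:
    for rollup in rollups.values():
        done = rollup.add(t, value)
        if done:
            emitted.append((rollup.interval_s, done))

print(emitted)  # [(60, {'avg': 12.0, 'min': 10, 'max': 14})]
```

Only four numbers per interval are kept no matter how many values arrive, which is the constant footprint the slides refer to.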

Page 59: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Max is a lie
• The issue with the deprecated LatencyTracker metrics is that the 1 minute interval does not have a min/max, so we cannot compute the true min/max
• The rollups’ min/max will be the minimum and maximum of the averages

Page 60: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histograms to the rescue (again)
• The histograms of the data do not have this issue, but storage is more complex. Some options include:
– Store each bin of the histogram as a metric
– Store the percentiles/min/max each as its own metric
– Store the raw long[90] (possibly compressed)

Page 61: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histogram Storage Size
• Some things to note:
– “Normal” clusters have over 100 tables.
– Each table has at least two histograms we want to record
• Read latency
• Write latency
• Tombstones scanned
• Cells scanned
• Partition cell size
• Partition cell count

Page 62: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Histogram Storage

Because we store the extra histograms, we have a minimum of 600 histograms per minute, with upper bounds seen to be over 24,000.

• Storing 1 metric per bin means [54000] metrics (expensive to store, expensive to read)
• Storing raw histograms is [600] metrics
• Storing min, max, 50th, 90th, 99th is [3000] metrics
– Additional problems with this
• Can’t compute 10th, 95th, 99.99th etc.
• Aggregations

Page 63: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Aggregating Histograms

Averaging the percentiles

[ INSERT DISAPPOINTED GIL TENE PHOTO ]

Page 64: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Aggregating Histograms
• Consider averaging the maximum
If one node has a 10 second GC pause but the maximum latency on your other 9 nodes is 60ms, the averaged maximum is about 1 second. Reporting a “Max 1 second” latency would be misleading.

• Poor at representing hotspots’ effects on your application
One node in a 10 node Raspberry Pi cluster gets 1000 write reqs/sec while the others get 10 reqs/sec. The one node under heavy stress has a 90th percentile of 10 seconds. The other nodes are basically sub-ms, with writes taking 1ms at the 90th percentile. Averaging would report a 1 second 90th percentile, even though roughly 10% of our application’s writes are taking >10 seconds.

Page 65: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Aggregating Histograms

Merging histograms from different nodes more accurately can be straightforward:

Bin      1-10  11-100  101-1000
Node1    2     8       1
Node2    2     1       5
Cluster  4     9       6
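Merging is a per-bin sum, provided every node uses the same bin boundaries. A minimal sketch; `merge_histograms` is a hypothetical helper name.

```python
def merge_histograms(*histograms):
    """Sum per-bin counts from several nodes that share the same bins."""
    if len({len(h) for h in histograms}) > 1:
        raise ValueError("histograms must use the same bins")
    return [sum(counts) for counts in zip(*histograms)]


node1 = [2, 8, 1]   # counts for bins 1-10, 11-100, 101-1000
node2 = [2, 1, 5]
cluster = merge_histograms(node1, node2)
print(cluster)  # [4, 9, 6]
```

Percentiles computed from the merged counts weight each node by how many requests it actually served, which averaging per-node percentiles cannot do.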


Page 67: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Raw Histogram storage
• Storing raw histograms of 160 (default) longs is a minimum of 1.2 KB per rollup, which is a hard sell
– 760 KB per minute (600 histograms)
– 7.7 GB for the 7 day TTL we want to keep our 1 min rollups at
– ~77 GB with 10 nodes
– ~2.3 TB on 10 node clusters with 3k tables
– Expired data isn’t immediately purged, so disk space can be much worse
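The figures above can be checked with back-of-envelope arithmetic. This sketch assumes 8 bytes per long and ignores any per-row storage overhead; the 6 histograms per table used for the 3k-table case is an assumption based on the six histograms listed earlier.

```python
BYTES_PER_LONG = 8
BINS = 160                       # default number of longs, per the slide
MINUTES_IN_TTL = 7 * 24 * 60     # 7 day TTL of 1 minute rollups


def raw_storage_bytes(histograms, nodes=1):
    """Bytes needed to keep one raw histogram per minute for the whole TTL."""
    return BINS * BYTES_PER_LONG * histograms * MINUTES_IN_TTL * nodes


per_rollup = BINS * BYTES_PER_LONG             # 1,280 bytes
per_minute = per_rollup * 600                  # 768,000 bytes
one_node = raw_storage_bytes(600)              # 7,741,440,000 bytes, about 7.7 GB
ten_nodes = raw_storage_bytes(600, nodes=10)   # about 77 GB
# Assumption: 3k tables x 6 histograms per table = 18,000 histograms per node.
extreme = raw_storage_bytes(3_000 * 6, nodes=10)   # about 2.3 TB
```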

Page 68: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Raw Histogram storage
• Goal: We wanted this to be comparable to other min/max/avg metric storage (12 bytes each)
– 700 MB on expected 10 node cluster
– 2 GB on extreme 10 node cluster
• Enter compression

Page 69: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms
• Overhead of typical compression makes it a non-starter.
– Headers (e.g. 10 bytes for gzip) alone nearly exceed the length used by existing rollup storage (~12 bytes per metric)
• Instead we opt to leverage known context to reduce the size of the data, along with some universal encoding.

Page 70: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms
• Instead of storing every bin, only store the value of each bin with a value > 0, since most bins will have no data (e.g. it is very unlikely for a read latency to be between 1-10 microseconds, which is the first 10 bins)
• Write the count of offset/count pairs
• Use varint for the bin count
– To reduce the value of the varint as much as possible, we sort the offset/count pairs by the count and represent the counts as a delta sequence


Page 73: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms0 0 0 0 1 0 0 0 100 0 0 9999999 0 0 1 127 128 129 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

{4:1, 8:100, 11:9999999, 14:1, 15:127, 16:128 17:129}

73

7

Page 74: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms

Pairs sorted by count:
{4:1, 14:1, 8:100, 15:127, 16:128, 17:129, 11:9999999}

Encoded so far: 7


Page 80: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms

Pairs sorted by count:
{4:1, 14:1, 8:100, 15:127, 16:128, 17:129, 11:9999999}

Encoded as the pair count, then each offset followed by the delta from the previous count:
7 4 1 14 0 8 99 15 27 16 1 17 1 11 9999870
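The encoding walked through above can be sketched end to end. This is an illustration of the described steps, not the production encoder: the function names are hypothetical, and the varint is assumed to be an unsigned LEB128-style encoding (7 bits per byte), which the slides do not specify.

```python
def encode_varint(n):
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compress(bins):
    """Return (values, encoded bytes) for the non-zero bins of a histogram."""
    pairs = [(i, c) for i, c in enumerate(bins) if c > 0]
    pairs.sort(key=lambda p: p[1])   # by count; stable, so ties keep offset order
    values = [len(pairs)]            # number of offset/count pairs comes first
    previous = 0
    for offset, count in pairs:
        values += [offset, count - previous]   # count stored as a delta
        previous = count
    return values, b"".join(encode_varint(v) for v in values)


def decompress_values(values, size):
    """Rebuild the bin counts from the value sequence produced by compress()."""
    bins = [0] * size
    count = 0
    for k in range(values[0]):
        offset, delta = values[1 + 2 * k], values[2 + 2 * k]
        count += delta
        bins[offset] = count
    return bins


bins = [0] * 90
for offset, count in {4: 1, 8: 100, 11: 9999999, 14: 1,
                      15: 127, 16: 128, 17: 129}.items():
    bins[offset] = count

values, encoded = compress(bins)
print(values)        # [7, 4, 1, 14, 0, 8, 99, 15, 27, 16, 1, 17, 1, 11, 9999870]
print(len(encoded))  # 18
```

Under these assumptions the example histogram shrinks from 90 raw longs (720 bytes) to 18 bytes, because only the final large delta needs more than one byte.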

Page 81: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Compressing Histograms
Real Life** results of compression:

        Size in bytes
Median  1
75th    3
95th    15
99th    45
Max**   124

Page 82: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Note on HdrHistogram
• Comes up every couple months
• Very awesome histogram, popular replacement for Metrics reservoir.
– More powerful and general purpose than EH
– Only slightly slower for all it offers

An issue comes up a bit with storage:

• Logged HdrHistograms are ~31kb each (30,000x more than our average use)
• Compressed version: 1kb each
• Perfect for many, many people when tracking one or two metrics. Gets painful when tracking hundreds or thousands

Page 83: Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016

Questions?