Cassandra Summit 2014: Monitor Everything!

description

Presenter: Chris Lohfink, Engineer at Pythian This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.

Transcript of Cassandra Summit 2014: Monitor Everything!

Page 1: Cassandra Summit 2014: Monitor Everything!

About Me

#CassandraSummit 2014

●  Sr. Engineer at Pythian
   o  Lead of Cassandra Practice
●  Remote in Minnesota
●  Interests
   o  Java, Clojure, Python dev
   o  Data science
   o  Information Security
   o  Hobbyist electronics

Page 2: Cassandra Summit 2014: Monitor Everything!

About Pythian

Pythian is a global data outsourcing and consulting company that specializes in optimizing and managing mission-critical data systems.

Pythian blends the world’s leading data experts with advanced, secure service delivery processes to create the industry’s best standard of care for its clients.

Since its inception, Pythian has managed some of the world’s largest, most business-critical data infrastructures.

Pythian currently manages more than 10,000 systems.

Pythian currently employs more than 350 people in 25 countries worldwide.

Pythian was founded in 1997.

Page 3: Cassandra Summit 2014: Monitor Everything!

About Cassandra

●  No Single Point of Failure

●  Fault Tolerant

●  Awesome properties for an operations team who does not want to get up at 3am

Page 4: Cassandra Summit 2014: Monitor Everything!

About Cassandra

●  Nothing should be set up and forgotten about
●  Easy to do with Cassandra though
   o  Fault tolerance on a properly configured setup handles a single node being down or having temporary performance issues
   o  No back pressure on writes until there is a lot of trouble

Page 5: Cassandra Summit 2014: Monitor Everything!

Utilize the fault tolerance buffer

●  Need to observe and react to current issues
●  Predict future issues
●  Divide this into two approaches
   o  Proactive
   o  Reactive

Page 6: Cassandra Summit 2014: Monitor Everything!

Proactive

●  Daily and weekly checkups to prevent and predict problems
   o  Capacity
   o  Performance bottlenecks
   o  Data modeling issues

Page 7: Cassandra Summit 2014: Monitor Everything!

Reactive

●  Something about best laid plans…
   o  Hardware failures
   o  Bugs
   o  Malicious or Non-Malicious users
●  Alarms, Pager Duty

Page 8: Cassandra Summit 2014: Monitor Everything!

Common element

●  Data is needed
   o  form alerts
   o  find anomalies
   o  trending
   o  debugging

Page 9: Cassandra Summit 2014: Monitor Everything!

Metrics

●  Window to the application
   o  Bridge the gap - Coda Hale

Page 10: Cassandra Summit 2014: Monitor Everything!

Gathering Metrics

SOURCES (Cassandra Environment)
●  OpsCenter
●  Logs
●  JMX
●  CPU, Disk, Network
●  Nodetool
●  JVM, GC

Page 11: Cassandra Summit 2014: Monitor Everything!

Metrics but of course… Without context, the data is just pretty graphs

Page 12: Cassandra Summit 2014: Monitor Everything!

JMX

●  Java Management Extensions
●  Complex… very engineered
●  Resources represented as objects with attributes and operations
●  Used for monitoring or as input

Page 13: Cassandra Summit 2014: Monitor Everything!

JMX

●  The annoying gateway to metrics
   ○  Poor tooling - requires java
   ○  Slow, Memory Leaks
   ○  Historically and currently frustrating for ops (pre 2.0.8)

[Diagram: JMX connection between Client (You) and Cassandra. The client inits a connection to port 7199; Cassandra replies with a hostname:port for the RMI connection; the client gets the new hostname:port, drops the old connection and attempts to connect; Connected! Port labels in the diagram: 1024-65535, 7199, 7199.]

Page 14: Cassandra Summit 2014: Monitor Everything!

JMX

●  Visual
   o  jconsole
   o  visualvm
●  Command line
   o  jmxterm
   o  jmxsh
●  MX4J
●  Jolokia
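Of the tools listed, Jolokia is the one that exposes JMX over HTTP with JSON, so it can be reached without a Java client. The sketch below only builds the JSON body of a Jolokia "read" request and does not send it. The agent URL is a hypothetical placeholder, and the MBean used is the standard JVM memory bean, not anything Cassandra-specific.

```python
import json

# Hypothetical agent URL: host, port and path depend on how the agent is deployed.
JOLOKIA_URL = "http://localhost:8778/jolokia/"


def jolokia_read_request(mbean, attribute=None):
    """Build the JSON body of a Jolokia "read" request for one MBean."""
    body = {"type": "read", "mbean": mbean}
    if attribute is not None:
        body["attribute"] = attribute
    return json.dumps(body, sort_keys=True)


payload = jolokia_read_request("java.lang:type=Memory", "HeapMemoryUsage")
print(payload)
```

The payload would be POSTed to the agent URL with any HTTP client; that step is left out so the example runs without a live node.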

Page 15: Cassandra Summit 2014: Monitor Everything!

JMX

[domain]:[key]=[value],[key2]=[value2]...

Page 16: Cassandra Summit 2014: Monitor Everything!

JMX

[domain]:[key]=[value],[key2]=[value2]...

com.pythian:site=blog,type=views,target=post1
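The name format above splits cleanly into a domain and a set of key/value properties. A minimal sketch of that split, using the example name from the slide; the function name is made up for illustration, and quoted values and wildcard patterns are deliberately not handled.

```python
def parse_object_name(name):
    """Split 'domain:key=value,key2=value2' into (domain, {key: value})."""
    domain, _, props = name.partition(":")
    keys = {}
    for pair in props.split(","):
        key, _, value = pair.partition("=")
        keys[key.strip()] = value.strip()
    return domain, keys


domain, keys = parse_object_name("com.pythian:site=blog,type=views,target=post1")
print(domain)
print(keys)
```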

Page 19: Cassandra Summit 2014: Monitor Everything!

JMX Domains

org.apache.cassandra.
●  db
●  internal
●  net
●  request

Page 20: Cassandra Summit 2014: Monitor Everything!

JMX Beans

org.apache.cassandra.
●  metrics
●  db
●  internal
●  net
●  request

Page 21: Cassandra Summit 2014: Monitor Everything!

JMX

org.apache.cassandra.metrics:type=

●  Cache
●  Client
●  ClientRequest
●  ClientRequestMetrics
●  ColumnFamily
●  CommitLog
●  Compaction
●  DroppedMessage
●  FileCache
●  Keyspace
●  Storage
●  ThreadPools

Page 22: Cassandra Summit 2014: Monitor Everything!

JMX

org.apache.cassandra.metrics:

type=*, scope=*, name=*
type=ThreadPools, path=*, scope=*, name=*
type=ColumnFamily, keyspace=*, scope=*, name=*
type=Keyspace, keyspace=*, name=*
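The key patterns above can be turned into concrete bean names by filling in the wildcards. A small sketch, assuming the key order shown on the slide (type first, name last); the helper name is hypothetical, and the scope and metric names are examples taken from elsewhere in this deck, not an exhaustive list.

```python
METRICS_DOMAIN = "org.apache.cassandra.metrics"


def metric_name(metric_type, name, **keys):
    """Build a bean name string; extra keys keep the order they are passed in."""
    parts = ["type=" + metric_type]
    parts += ["%s=%s" % (k, v) for k, v in keys.items()]
    parts.append("name=" + name)
    return METRICS_DOMAIN + ":" + ",".join(parts)


print(metric_name("ThreadPools", "PendingTasks", path="request", scope="ReadStage"))
print(metric_name("Keyspace", "*", keyspace="Keyspace1"))
```

Passing "*" for any value yields a pattern that JMX tools can use to query several beans at once.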

Page 23: Cassandra Summit 2014: Monitor Everything!

Metrics

●  Toolkit called metrics for metrics
   o  By Coda Hale @ Yammer
●  Easy to use
●  Popular

Page 24: Cassandra Summit 2014: Monitor Everything!

Types of Metrics

●  Gauge
   o  instantaneous value
●  Counter
   o  number that can be incremented & decremented
●  Meter
   o  rate of events over time (1/5/15 min moving avg)
●  Histogram
   o  representation of statistical distribution
      §  50, 75, 95, 98, 99, 99.9 percentile
      §  average, median, min, max, standard deviation
●  Timer
   o  rate of events (meter)
   o  histogram of duration
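To make the five types concrete, here is a toy sketch of each. It is not how the Metrics library implements them: the histogram keeps every sample and uses a nearest-rank percentile (the real library samples a bounded reservoir), and the meter reports only a mean rate, leaving out the 1/5/15 minute moving averages. All class and variable names are illustrative.

```python
import math


class Counter:
    """A number that can be incremented and decremented."""
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Distribution of recorded values (keeps every sample, unlike a real reservoir)."""
    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def percentile(self, p):
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))  # nearest-rank method
        return ordered[rank - 1]


class Meter:
    """Rate of events; only the mean rate is computed here."""
    def __init__(self):
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def mean_rate(self, elapsed_seconds):
        return self.count / elapsed_seconds


class Timer:
    """A meter for how often something happens plus a histogram of how long it took."""
    def __init__(self):
        self.meter = Meter()
        self.histogram = Histogram()

    def update(self, duration):
        self.meter.mark()
        self.histogram.update(duration)


pending = []                    # some queue owned by the application
gauge = lambda: len(pending)    # a gauge is just an instantaneous reading

timer = Timer()
for micros in [120, 400, 683, 2500]:   # made-up durations in microseconds
    timer.update(micros)
print(timer.histogram.percentile(75))
print(timer.meter.mean_rate(elapsed_seconds=2))
```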

Page 25: Cassandra Summit 2014: Monitor Everything!

JMX

75th percentile is 683 MICROSECONDS (75% took 683us or less)

One minute rate is 13,915 calls per SECOND

Page 26: Cassandra Summit 2014: Monitor Everything!

JMX

●  Overwhelming at first
●  Hard to tell what they mean without the source
●  Moves around a lot
●  Fortunately there is nodetool

Page 27: Cassandra Summit 2014: Monitor Everything!

Nodetool

●  JMX command line wrapper
●  Many options
●  Operations and diagnostic procedures
●  For reactive analysis
   o  ad hoc, spot checks

Page 28: Cassandra Summit 2014: Monitor Everything!

Nodetool tpstats

nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                      0        0     113702        0                 0
RequestResponseStage           0        0          0        0                 0
MutationStage                  0        0     164503        0                 0
...
InternalResponseStage          0        0          0        0                 0
HintedHandoff                  0        0          0        0                 0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
...
REQUEST_RESPONSE          0
COUNTER_MUTATION          0
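For spot checks it can help to turn this output into data and flag anything pending, blocked or dropped. The sketch below assumes the two-section layout shown on the slide (six whitespace-separated columns per pool, two per message type); other Cassandra versions may print different columns. The sample text is adapted from the slide, with a non-zero Pending value and a dropped message invented for illustration.

```python
def parse_tpstats(text):
    """Parse `nodetool tpstats` text into ({pool: stats}, {message_type: dropped})."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
        elif line.startswith("Message type"):
            section = "dropped"
        elif section == "pools" and len(fields) == 6:
            name, active, pending, completed, blocked, all_blocked = fields
            pools[name] = {
                "active": int(active),
                "pending": int(pending),
                "completed": int(completed),
                "blocked": int(blocked),
                "all_time_blocked": int(all_blocked),
            }
        elif section == "dropped" and len(fields) == 2:
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


# Adapted sample: "Pending 3" and "REQUEST_RESPONSE 1" are invented values.
SAMPLE = """\
Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                      0        0     113702        0                 0
MutationStage                  0        3     164503        0                 0
...
Message type        Dropped
RANGE_SLICE               0
REQUEST_RESPONSE          1
"""

pools, dropped = parse_tpstats(SAMPLE)
backed_up = sorted(name for name, p in pools.items() if p["pending"] or p["blocked"])
lossy = sorted(kind for kind, count in dropped.items() if count)
print(backed_up, lossy)
```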

Page 29: Cassandra Summit 2014: Monitor Everything!

Staged Event Driven Architecture

●  Decomposes complex event system
●  Set of stages (thread pools)
●  Queue between each
●  Shares many of the pros and cons of SOA

Page 30: Cassandra Summit 2014: Monitor Everything!

Staged Event Driven Architecture

[Diagram: a client request moving as tasks between thread-pool stages (ReadStage x32 threads, RequestResponse threads, ReadRepairStage threads) and the Messaging Service, across Node 1 and Node 2]

Page 31: Cassandra Summit 2014: Monitor Everything!

Staged Event Driven Architecture

●  It's easy to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage).
●  No write back pressure
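The effect of a stage with no back pressure can be shown with a toy simulation: a producer that never blocks feeds a queue, and a stage drains a fixed number of tasks per tick. This is an illustrative model only, not Cassandra's executor code; the function name and the numbers are made up.

```python
from collections import deque


def simulate_stage(arrivals_per_tick, capacity_per_tick, ticks):
    """Return the pending-queue depth after each tick for a single stage."""
    queue = deque()
    depths = []
    for tick in range(ticks):
        for i in range(arrivals_per_tick):
            queue.append((tick, i))      # producer never blocks: no back pressure
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()              # the stage's threads drain what they can
        depths.append(len(queue))
    return depths


print(simulate_stage(5, 3, 4))  # [2, 4, 6, 8]
print(simulate_stage(3, 5, 4))  # [0, 0, 0, 0]
```

When arrivals exceed capacity the pending count grows every tick, which is the pattern to watch for in the Pending column of tpstats.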

Page 39: Cassandra Summit 2014: Monitor Everything!

Nodetool tpstats

nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                      0        0     113702        0                 0
RequestResponseStage           0        0          0        0                 0
MutationStage                  0        0     164503        0                 0
...
InternalResponseStage          0        0          0        0                 0
HintedHandoff                  0        0          0        0                 0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
...
REQUEST_RESPONSE          1
COUNTER_MUTATION          0

[Diagram: RequestResponse threads]

Page 40: Cassandra Summit 2014: Monitor Everything!

Nodetool tpstats

nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

More at: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools

Page 41: Cassandra Summit 2014: Monitor Everything!

Nodetool cfhistograms

nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

SSTables per Read
1 sstables: 98554
2 sstables: 4534

Write Latency (microseconds)
No Data

Read Latency (microseconds)
10 us: 2
12 us: 17
14 us: 96
17 us: 208
20 us: 677
24 us: 3081
29 us: 4552
35 us: 3559

Page 42: Cassandra Summit 2014: Monitor Everything!

Read Write Path mile high overview

[Diagram: Writes and Reads against the Memtable and SSTable]

Page 48: Cassandra Summit 2014: Monitor Everything!

Nodetool cfhistograms 1.1

nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
1             3579              0             0         0             0
2                0              0             0         0             0
. . .
35               0              0             0         0             0
42               0              0            27         0             0
50               0              0           187         0             0
60               0             10           460         0             0
72               0            200           689         0             0
86               0            663           552         0             0
103              0            796           367         0             0
124              0            297           736         0             0
149              0            265           243         0             0
179              0            460           263         0             0
. . .
25109160         0              0             0         0             0
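The 1.1 output is a bucketed histogram, so percentiles have to be derived by walking the cumulative counts. A sketch using the Write Latency column from the slide; it treats the elided rows as zero, which is an assumption, and it reports the bucket offset in which the requested rank falls, not an exact latency.

```python
import math

# (offset, count) pairs: Write Latency column from the slide, non-zero rows only.
# Rows hidden behind ". . ." are assumed to be zero.
WRITE_LATENCY = [(60, 10), (72, 200), (86, 663), (103, 796),
                 (124, 297), (149, 265), (179, 460)]


def bucket_percentile(buckets, p):
    """Return the offset of the bucket containing the p-th percentile, or None if empty."""
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    rank = math.ceil(p / 100.0 * total)
    seen = 0
    for offset, count in sorted(buckets):
        seen += count
        if seen >= rank:
            return offset
    return None


print(bucket_percentile(WRITE_LATENCY, 50))  # 103
print(bucket_percentile(WRITE_LATENCY, 95))  # 179
```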

Page 49: Cassandra Summit 2014: Monitor Everything!

Nodetool cfhistograms

https://gist.github.com/clohfink/6068003

Page 50: Cassandra Summit 2014: Monitor Everything!

Nodetool cfhistograms 2.1

nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Keyspace/Table histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00          10.00        524.00             310           5
75%             1.00          11.75        888.00             310           5
95%             1.00          15.00       4843.75             310           5
98%             1.00          17.00       9658.90             310           5
99%             1.00          19.00      12306.47             310           5
Min             0.00           0.00         68.00              30           0
Max             2.00     1219386.00      45383.00             310           5

Page 51: Cassandra Summit 2014: Monitor Everything!

Nodetool cfstats

nodetool cfstats {-i} {keyspace}.{table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Keyspace: Keyspace1
    Read Count: 11207
    Read Latency: 0.047931114482020164 ms.
    Write Count: 17598
    Write Latency: 0.053502954881236506 ms.
    Pending Tasks: 0
        Table: Standard1
        SSTable count: 3
        Space used (live), bytes: 9088955
        Space used (total), bytes: 9088955
        Space used by snapshots (total), bytes: 0
        SSTable Compression Ratio: 0.3672150946
        Memtable cell count: 0
        Memtable data size, bytes: 0
        Memtable switch count: 3
        Local read count: 11207
        Local read latency: 0.048 ms
        Local write count: 17598
        Local write latency: 0.054 ms
        Pending tasks: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used, bytes: 11688
        Compacted partition minimum bytes: 1110
        Compacted partition maximum bytes: 126934
        Compacted partition mean bytes: 2730
        Average live cells per slice: 0.0
        Average tombstones per slice: 0.0
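Each line of this output is a "label: value" pair, so a quick script can pull out the handful of values worth watching. A sketch over a few lines copied from the slide; the thresholds are hypothetical and would need tuning per cluster, and the flat dictionary only suits output for a single table.

```python
def parse_stats(text):
    """Parse 'label: value' lines into a dict of strings (single-table output only)."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


SAMPLE = """\
SSTable count: 3
Local read latency: 0.048 ms
Compacted partition maximum bytes: 126934
Average tombstones per slice: 0.0
"""

stats = parse_stats(SAMPLE)
read_latency_ms = float(stats["Local read latency"].split()[0])
max_partition_bytes = int(stats["Compacted partition maximum bytes"])

# Hypothetical thresholds, for illustration only.
MAX_PARTITION_BYTES = 100 * 1024 * 1024
MAX_READ_LATENCY_MS = 5.0

alerts = []
if max_partition_bytes > MAX_PARTITION_BYTES:
    alerts.append("wide partition")
if read_latency_ms > MAX_READ_LATENCY_MS:
    alerts.append("slow local reads")
print(alerts)
```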

Page 55: Cassandra Summit 2014: Monitor Everything!

Nodetool cfstats

nodetool cfstats {-i} {keyspace}.{table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Table: Standard1
SSTable count: 3
SSTables in each level: [14/4, 1, 0, …, 0]

Page 64: Cassandra Summit 2014: Monitor Everything!

Nodetool proxyhistograms

nodetool proxyhistograms
org.apache.cassandra.metrics:type=ClientRequest,scope={Read|Write|RangeSlice},name=Latency

$ nodetool proxyhistograms
proxy histograms
Read Latency (microseconds)
61214 us: 1
Write Latency (microseconds)
103 us: 22
124 us: 142
149 us: 297
179 us: 1190
215 us: 1823
258 us: 2091
...

Page 65: Cassandra Summit 2014: Monitor Everything!

Nodetool compactionstats

nodetool compactionstats
org.apache.cassandra.metrics:type=Compaction

pending tasks: 1
compaction type  keyspace   table      completed  total     unit   progress
Compaction       Keyspace1  Standard1  6076415    29605054  bytes  20.06%
Active compaction remaining time : 0h00m03s

Page 69: Cassandra Summit 2014: Monitor Everything!

Nodetool

Much more!! http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

Page 70: Cassandra Summit 2014: Monitor Everything!

OpsCenter

#CassandraSummit 2014

●  Provides visibility to key metrics
●  Alarming
●  Basic orchestration and config management
●  Constantly improving
●  Free*
●  Almost zero barrier to getting set up
●  Very few reasons not to run it

Page 71: Cassandra Summit 2014: Monitor Everything!

OpsCenter

#CassandraSummit 2014

●  Homogeneous tooling with rest of stack
   o  Integrate metrics in with what app is using
   o  orchestration and config management
●  (paid version) “Good enough”
   o  a mature environment should have more

Page 72: Cassandra Summit 2014: Monitor Everything!

Reporting Interface

#CassandraSummit 2014

Default: JMX, Console, Csv, Slf4j

Addons: Ganglia, Graphite

Community: Cassandra, StatsD, NewRelic, Splunk, Cloudwatch, Kafka, Riemann, TempDB, Munin, Riak, InfluxDB, Sematext, MongoDB, OpenTSDB, Librato, … MORE

Page 73: Cassandra Summit 2014: Monitor Everything!

Reporting Interface

#CassandraSummit 2014

●  Configurable with yaml
   o  console, csv, ganglia, graphite
●  Create reporter with premain agent
   o  compiling new jar with manifest
   o  add to classpath
   o  add javaagent in cassandra-env.sh

Page 74: Cassandra Summit 2014: Monitor Everything!

Garbage Collection

#CassandraSummit 2014

●  Death, Taxes, and a stop-the-world GC
●  Common issue to all JVM-based applications

Page 75: Cassandra Summit 2014: Monitor Everything!

Garbage Collection

#CassandraSummit 2014

Enable GC logging
●  Virtually no overhead
●  Can be very helpful in diagnosing performance issues

Page 76: Cassandra Summit 2014: Monitor Everything!

Garbage Collection

#CassandraSummit 2014

# GC logging options (cassandra-env.sh)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
# Pick one -Xloggc form. Without rotation, a timestamped file per start:
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
# With rotation, use a fixed name (no date in the file name):
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"

Page 78: Cassandra Summit 2014: Monitor Everything!

Garbage Collection

#CassandraSummit 2014

Could be its own talk
Honorable mentions:
●  https://github.com/chewiebug/GCViewer
●  http://jworks.idv.tw/GcWeb/
●  Python, R, Octave
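As a small taste of the scripting route, the sketch below pulls pause durations out of a GC log in Python. It assumes the "Total time for which application threads were stopped" lines that -XX:+PrintGCApplicationStoppedTime writes; the sample lines and the `LONG_PAUSE_SECONDS` threshold are invented for illustration.

```python
import re

LONG_PAUSE_SECONDS = 0.2  # hypothetical threshold for a "long" pause

STOPPED = re.compile(
    r"Total time for which application threads were stopped: (\d+\.\d+) seconds"
)

def stopped_times(lines):
    """Return every application-stopped duration (seconds) found in the log."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses

# Illustrative lines only; a real gc.log contains many other entries.
SAMPLE = [
    "2014-09-10T12:00:01.100+0000: 10.100: Total time for which application threads were stopped: 0.0150000 seconds",
    "2014-09-10T12:00:02.300+0000: 11.300: [GC some other collector detail]",
    "2014-09-10T12:00:05.900+0000: 14.900: Total time for which application threads were stopped: 0.2500000 seconds",
    "2014-09-10T12:00:09.000+0000: 18.000: Total time for which application threads were stopped: 1.5000000 seconds",
]

pauses = stopped_times(SAMPLE)
long_pauses = [p for p in pauses if p > LONG_PAUSE_SECONDS]
print("pauses:", len(pauses), "max:", max(pauses), "long:", long_pauses)
```

With a real log, pass an open file object instead of `SAMPLE`, since the function only needs an iterable of lines.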

Page 79: Cassandra Summit 2014: Monitor Everything!

Logging

#CassandraSummit 2014

/var/log/cassandra/system.log
   o  provides a rolling log
   o  log4j

/var/log/cassandra/output.log
   o  captured standard error and standard out
   o  truncated on restart

System Logs
   o  syslog, dmesg, etc
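Counting log lines by level is a cheap first signal to pull from system.log. The Python sketch below assumes each entry starts with a log4j level followed by a bracketed thread name; the sample lines are invented placeholders, not real Cassandra messages, and the pattern should be checked against the layout configured on your nodes.

```python
import re
from collections import Counter

# Assumed line start: optional padding, level, then "[thread-name]".
LEVEL = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+\[[^\]]+\]")

def count_levels(lines):
    """Count log entries per level; continuation lines (stack traces) are skipped."""
    counts = Counter()
    for line in lines:
        match = LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Placeholder lines for illustration only.
SAMPLE = [
    " INFO [ExampleThread:1] 2014-09-10 12:00:00,123 Example.java (line 10) example informational message",
    " WARN [ExampleThread:2] 2014-09-10 12:00:05,456 Example.java (line 20) example warning message",
    "ERROR [ExampleThread:3] 2014-09-10 12:00:09,789 Example.java (line 30) example error message",
    "java.lang.RuntimeException: example stack trace line",
    "\tat com.example.Example.run(Example.java:30)",
    " WARN [ExampleThread:2] 2014-09-10 12:00:11,000 Example.java (line 20) example warning message",
]

counts = count_levels(SAMPLE)
print(dict(counts))
```

Graphing the WARN and ERROR counts over time tends to be more useful than alerting on every individual line.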

Page 80: Cassandra Summit 2014: Monitor Everything!

OS Metrics

#CassandraSummit 2014

Shout-out: http://www.brendangregg.com/linuxperf.html

Page 81: Cassandra Summit 2014: Monitor Everything!

JVM

#CassandraSummit 2014

●  Heap
   o  GC logs
   o  JMX
●  Threads
   o  jvmtop
   o  Jstack (+htop)
   o  kill -3
   o  JMX
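The jstack plus htop pairing works because htop shows OS thread IDs in decimal while a jstack dump records the same ID in hex as `nid=0x...`. Below is a minimal Python sketch of that lookup, assuming the usual HotSpot thread header line; the sample dump lines and the thread ID are invented for illustration.

```python
import re

# Thread header lines look like: "name" ... nid=0x1a2b ...
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')

def threads_by_os_id(jstack_text):
    """Map decimal OS thread ID -> Java thread name from a jstack dump."""
    mapping = {}
    for line in jstack_text.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            mapping[int(match.group("nid"), 16)] = match.group("name")
    return mapping

# Invented dump fragment for illustration.
SAMPLE = """\
"ReadStage:3" daemon prio=10 tid=0x00007f0a1c0f1000 nid=0x1a2b runnable [0x00007f0a0c1fe000]
   java.lang.Thread.State: RUNNABLE
"CompactionExecutor:4" daemon prio=10 tid=0x00007f0a1c0f3000 nid=0x1a2c waiting on condition [0x00007f0a0c0fd000]
   java.lang.Thread.State: WAITING (parking)
"""

threads = threads_by_os_id(SAMPLE)
hot_thread_id = 6699  # decimal thread ID as it would appear in htop (made up)
print(threads.get(hot_thread_id, "unknown thread"))
```

Once the busy thread has a name, its stack in the same dump shows what it was doing at that moment.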

Page 82: Cassandra Summit 2014: Monitor Everything!

And Everything

#CassandraSummit 2014

Page 83: Cassandra Summit 2014: Monitor Everything!

Questions

#CassandraSummit 2014

?