Page 1: Cassandra Day Atlanta 2016  - Monitoring Cassandra

CASSANDRA DAY ATLANTA 2016

MONITORING CASSANDRA

Aaron Morton @aaronmorton

CEO

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer and DataStax MVPs.

Based in New Zealand, Australia, France & USA.

Metrics

Monitoring & Alerting

Insights

Codahale / Yammer / Dropwizard

Metrics

<dependency groupId="io.dropwizard.metrics" artifactId="metrics-core" version="3.1.0" />

Metrics

Separate collection from reporting.

Metrics Collection

Metrics are always collected.

Metrics

Metrics have a dotted-notation name, a timestamp, and a value, e.g. com.thelastpickle.presenters.count=2
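
As a minimal sketch (plain Python, not Cassandra or Dropwizard code), one sample can be modelled as a name/timestamp/value triple and rendered in Graphite's plaintext line format, "<path> <value> <timestamp>". The MetricPoint class and the timestamp used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    """One sample: a dotted-notation name, a timestamp, and a value (hypothetical class)."""

    name: str
    timestamp: int  # seconds since the Unix epoch
    value: float

    def path(self) -> list:
        # The dotted name is a hierarchy, most general component first.
        return self.name.split(".")

    def to_graphite_line(self) -> str:
        # Graphite plaintext protocol: "<path> <value> <timestamp>" plus a newline.
        return f"{self.name} {self.value:g} {self.timestamp}\n"


point = MetricPoint("com.thelastpickle.presenters.count", 1460000000, 2)
print(point.path())  # ['com', 'thelastpickle', 'presenters', 'count']
print(point.to_graphite_line(), end="")
```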

Metric Types

Gauge.

A simple value.
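
A gauge simply reports whatever the current value is at the moment it is read. The sketch below mirrors the idea behind Dropwizard's Gauge interface (a single getValue method); the Python Gauge class and the list standing in for a thread-pool queue are hypothetical.

```python
from typing import Callable


class Gauge:
    """Reports a single value, computed on demand when a reporter reads it."""

    def __init__(self, read: Callable[[], float]):
        self._read = read

    def get_value(self) -> float:
        # Nothing is stored between reads: the value is computed when requested.
        return self._read()


pending = [3, 1, 4]  # hypothetical queue standing in for a thread pool
pending_tasks = Gauge(lambda: len(pending))
print(pending_tasks.get_value())  # 3
pending.append(1)
print(pending_tasks.get_value())  # 4
```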

Metric Types

Ratio Gauge.

A ratio between two values.
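
A sketch of a ratio gauge, using a key cache hit rate (hits / requests) as the example ratio. As we understand Dropwizard's RatioGauge, it returns NaN when the denominator is zero, NaN or infinite; this sketch follows that convention. The class and the counter values are hypothetical.

```python
import math


class RatioGauge:
    """A gauge whose value is numerator / denominator, e.g. a cache hit rate."""

    def __init__(self, numerator, denominator):
        self._numerator = numerator      # callable returning the top value
        self._denominator = denominator  # callable returning the bottom value

    def get_value(self) -> float:
        n, d = self._numerator(), self._denominator()
        # Report NaN rather than dividing by zero (or by NaN / infinity).
        if d == 0 or math.isnan(d) or math.isinf(d):
            return float("nan")
        return n / d


hits, requests = 90, 120  # hypothetical key cache counters
hit_rate = RatioGauge(lambda: hits, lambda: requests)
print(hit_rate.get_value())  # 0.75
```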

Metric Types

Histograms.

The distribution of values in a stream of data.

Histograms

Quantiles (e.g. 75th, 95th) are calculated using reservoir sampling. (Check the docs.)
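
Reservoir sampling keeps a fixed-size random sample of an unbounded stream, and quantiles are then read from the sorted sample. The sketch below uses the uniform variant (Vitter's Algorithm R), which is not Cassandra's default reservoir, and its interpolation rule is one common choice rather than a guaranteed match for any particular library. Class name and sample data are hypothetical.

```python
import random


class UniformReservoir:
    """Keeps a fixed-size uniform random sample of a stream (Algorithm R)."""

    def __init__(self, size: int, rng: random.Random):
        self.size = size
        self.rng = rng
        self.count = 0
        self.values = []

    def update(self, value: float) -> None:
        self.count += 1
        if len(self.values) < self.size:
            self.values.append(value)
        else:
            # Keep the new value with probability size / count, evicting a random slot.
            j = self.rng.randrange(self.count)
            if j < self.size:
                self.values[j] = value

    def quantile(self, q: float) -> float:
        # Linear interpolation between the two nearest ranks of the sorted sample.
        data = sorted(self.values)
        pos = q * (len(data) + 1)
        index = int(pos)
        if index < 1:
            return data[0]
        if index >= len(data):
            return data[-1]
        lower, upper = data[index - 1], data[index]
        return lower + (pos - index) * (upper - lower)


reservoir = UniformReservoir(size=100, rng=random.Random(42))
for latency_us in range(1, 10_001):  # hypothetical latencies in microseconds
    reservoir.update(latency_us)
print(len(reservoir.values), round(reservoir.quantile(0.95)))
```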

Histograms

The default is an Exponentially Decaying Reservoir: (roughly) the last five minutes of data, with exponential weighting towards newer data.

(Check the docs.)
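
The idea behind an exponentially decaying reservoir can be sketched with forward-decay priority sampling: each value gets a weight that grows exponentially with its arrival time, its priority is that weight divided by a random number, and the reservoir keeps the highest-priority entries. The defaults (1028 entries, alpha 0.015 applied to timestamps in seconds) are, as we understand it, the values Dropwizard documents for this reservoir. Everything else is a simplifying assumption; in particular, a real implementation periodically rescales weights to avoid floating-point overflow, which this sketch omits.

```python
import heapq
import math
import random


class ExpDecayReservoir:
    """Forward-decay priority sampling: newer values are more likely to be kept."""

    def __init__(self, size=1028, alpha=0.015, rng=None):
        self.size = size
        self.alpha = alpha
        self.rng = rng or random.Random()
        self.start = None  # landmark time for the decay
        self.heap = []     # min-heap of (priority, value, weight)

    def update(self, value: float, t: float) -> None:
        if self.start is None:
            self.start = t
        weight = math.exp(self.alpha * (t - self.start))  # grows with arrival time
        priority = weight / (1.0 - self.rng.random())     # divisor is in (0, 1]
        entry = (priority, value, weight)
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, entry)
        elif priority > self.heap[0][0]:
            heapq.heapreplace(self.heap, entry)  # evict the lowest-priority entry

    def weighted_mean(self) -> float:
        total = sum(w for _, _, w in self.heap)
        return sum(v * w for _, v, w in self.heap) / total


r = ExpDecayReservoir(size=100, rng=random.Random(7))
for _ in range(1000):
    r.update(10.0, t=0)     # hypothetical older latencies
for _ in range(1000):
    r.update(100.0, t=600)  # hypothetical latencies ten minutes later
print(round(r.weighted_mean()))  # dominated by the newer values
```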

Metric Types

Meter

Measures the per-second rate at which a set of events occurs.

Meter

Three exponentially weighted moving average rates: 1, 5, and 15 minutes.
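
A meter's moving averages can be sketched as below, modelled on the approach Dropwizard uses as far as we know: events are counted between 5-second ticks, and each tick folds the instantaneous rate into an exponentially weighted moving average whose smoothing factor depends on the 1, 5 or 15 minute window. The first tick seeds the average. Class names are hypothetical; rates are per second.

```python
import math

TICK_SECONDS = 5  # assumed tick interval


class EWMA:
    def __init__(self, minutes: int):
        # Smoothing factor for a window of `minutes`, sampled every TICK_SECONDS.
        self.alpha = 1 - math.exp(-TICK_SECONDS / (60.0 * minutes))
        self.rate = None  # events per second
        self.uncounted = 0

    def mark(self, n: int = 1) -> None:
        self.uncounted += n

    def tick(self) -> None:
        instant_rate = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate  # first tick seeds the average
        else:
            self.rate += self.alpha * (instant_rate - self.rate)


class Meter:
    """Per-second event rate smoothed over 1, 5 and 15 minute windows."""

    def __init__(self):
        self.count = 0
        self.rates = {m: EWMA(m) for m in (1, 5, 15)}

    def mark(self, n: int = 1) -> None:
        self.count += n
        for ewma in self.rates.values():
            ewma.mark(n)

    def tick(self) -> None:
        for ewma in self.rates.values():
            ewma.tick()


meter = Meter()
for _ in range(12):  # one minute of ticks at a steady 50 events per second
    meter.mark(50 * TICK_SECONDS)
    meter.tick()
print({m: round(e.rate, 1) for m, e in meter.rates.items()})  # {1: 50.0, 5: 50.0, 15: 50.0}
```

After a sudden burst, the 1-minute rate reacts fastest and the 15-minute rate slowest, which is why the three are useful side by side.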

Metric Types

Timer.

A histogram of event durations plus a meter of the event rate.
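
A timer combines the two previous ideas: every timed event adds its duration to a histogram and counts towards a rate. This minimal sketch keeps raw durations in a list and a plain count instead of a real reservoir and meter, purely to stay short; the Timer class is hypothetical.

```python
from contextlib import contextmanager
from time import perf_counter


class Timer:
    """A histogram of durations plus a count of events (from which a rate follows)."""

    def __init__(self):
        self.durations_us = []  # stand-in for a reservoir-backed histogram
        self.count = 0          # stand-in for a meter's event count

    @contextmanager
    def time(self):
        start = perf_counter()
        try:
            yield
        finally:
            # Record the duration in microseconds, the unit Cassandra uses for latency.
            self.durations_us.append((perf_counter() - start) * 1_000_000)
            self.count += 1


read_latency = Timer()
for _ in range(3):
    with read_latency.time():
        sum(range(10_000))  # stand-in for serving a read request
print(read_latency.count, len(read_latency.durations_us))  # 3 3
```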

Reporting

Reporters run in the Cassandra process, pushing metrics to external services.
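
Separating collection from reporting can be sketched as two independent pieces: a registry that is always updated, and a reporter that takes a snapshot and pushes it out. Here the sink is any callable (a list's append method in the usage) instead of a network connection, and a real reporter would call report_once on a fixed schedule. All names are hypothetical.

```python
import threading
from typing import Callable


class Registry:
    """Collection side: metrics are always updated, whether or not anyone reports."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name: str, n: int = 1) -> None:
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + n

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._counters)  # a copy, so reporting never blocks updates for long


def report_once(registry: Registry, send: Callable[[str], None], prefix: str, now: int) -> None:
    """Reporting side: read a snapshot and push each metric to an external sink."""
    for name, value in sorted(registry.snapshot().items()):
        send(f"{prefix}{name} {value} {now}")


registry = Registry()
registry.inc("org.apache.cassandra.metrics.Storage.Exceptions")
lines = []
report_once(registry, lines.append, prefix="cassandra.prod.ip_1_2_3_4.", now=1460000000)
print(lines)
```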

Reporters

ConsoleReporter, GraphiteReporter, InfluxDBReporter, RiemannReporter, and others.

Reporters In Cassandra

Configuration file:

metrics-reporter-config-sample.yaml

Reporters In Cassandra

graphite:
  -
    period: 10
    timeunit: 'SECONDS'
    prefix: 'cassandra.prod.ip_1_2_3_4.'
    hosts:
      - host: '1.2.3.4'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.+"
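
The predicate section filters which metrics the reporter sends. Assuming "white" means keep only names matching a pattern and "black" means drop them (our reading of metrics-reporter-config, not verified against its source), the filter behaves roughly like this sketch. The function names and the non-Cassandra metric name are hypothetical.

```python
import re
from typing import Iterable, List


def matches_predicate(name: str, color: str, patterns: Iterable[str]) -> bool:
    """White list keeps matching names; black list drops them."""
    hit = any(re.match(p, name) for p in patterns)
    return hit if color == "white" else not hit


def select(names: Iterable[str], color: str, patterns: List[str]) -> List[str]:
    return [n for n in names if matches_predicate(n, color, patterns)]


names = [
    "org.apache.cassandra.metrics.ClientRequest.Read.Latency",
    "jvm.memory.heap.used",  # hypothetical non-Cassandra metric
]
print(select(names, "white", [r"^org.apache.cassandra.metrics.+"]))
```

The dots in the pattern are unescaped, so each one matches any character; for this prefix that is harmless, but escaping them ("\.") is stricter.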

metrics-reporter-config

Configures Metrics reporters.

github.com/addthis/metrics-reporter-config

metrics-reporter-config

Supports:

Ganglia, Graphite, Riemann

JMX

Cassandra creates a JMX MBean for each metric.

Reporters

Reporters may change the names of measures, e.g. 95thPercentile becomes p95.
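
A name translation might look like the table below. The left-hand names are the JMX-style names used in this deck; the p95 and m1_rate style on the right is, to our knowledge, what Dropwizard's GraphiteReporter emits. Treat the mapping as illustrative and check what your own reporter produces.

```python
# Illustrative mapping from JMX-style attribute names to Graphite-style suffixes.
JMX_TO_GRAPHITE = {
    "75thPercentile": "p75",
    "95thPercentile": "p95",
    "99thPercentile": "p99",
    "OneMinuteRate": "m1_rate",
    "Count": "count",
}


def reported_name(base: str, jmx_attribute: str) -> str:
    """Append the reporter's suffix; unknown attributes pass through unchanged."""
    return f"{base}.{JMX_TO_GRAPHITE.get(jmx_attribute, jmx_attribute)}"


print(reported_name("ClientRequest.Read.Latency", "95thPercentile"))  # ClientRequest.Read.Latency.p95
```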

Metrics

Monitoring & Alerting

Insights

Monitoring and Alerting

Use what you like and what works for you.

Monitoring Platforms

OpsCenter, Grafana & Graphite, Datadog, Riemann

Metrics

Monitoring & Alerting

Insights

Names?

All under org.apache.cassandra.metrics

Scale?

Latency? microseconds
Rates? per second
Data? bytes
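
Because latency is reported in microseconds, rates per second and sizes in bytes, dashboards often convert them. A small sketch; the function names are hypothetical. Note that OneMinuteRate is still a per-second rate: the one minute is the averaging window, not the unit.

```python
def us_to_ms(latency_us: float) -> float:
    """Latency metrics are in microseconds; many dashboards show milliseconds."""
    return latency_us / 1_000


def per_second_to_per_minute(rate: float) -> float:
    """Rates are per second, even for OneMinuteRate."""
    return rate * 60


def bytes_to_gib(n_bytes: int) -> float:
    """Data sizes are in bytes; one GiB is 1024 ** 3 bytes."""
    return n_bytes / 1024 ** 3


print(us_to_ms(2_500))                  # 2.5
print(per_second_to_per_minute(120.0))  # 7200.0
print(bytes_to_gib(3 * 1024 ** 3))      # 3.0
```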

Percentiles? 75thPercentile, 95thPercentile, 99thPercentile

Rates? OneMinuteRate

Request Throughput - All Requests

ClientRequest.$REQUEST.Latency.1MinuteRate

where $REQUEST is one of: CASRead, CASWrite, RangeSlice, Read, ViewWrite, Write
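
The template expands to one metric per request type. Since ClientRequest metrics are measured on the coordinator, summing the per-type rates gives the total client request rate coordinated by this node. The helper names and sample readings are hypothetical.

```python
REQUEST_TYPES = ["CASRead", "CASWrite", "RangeSlice", "Read", "ViewWrite", "Write"]


def throughput_names() -> list:
    """Expand the ClientRequest.$REQUEST.Latency.1MinuteRate template."""
    return [f"ClientRequest.{req}.Latency.1MinuteRate" for req in REQUEST_TYPES]


def total_requests_per_second(samples: dict) -> float:
    """Sum the per-type rates; request types missing from `samples` count as zero."""
    return sum(samples.get(name, 0.0) for name in throughput_names())


samples = {  # hypothetical readings from one node
    "ClientRequest.Read.Latency.1MinuteRate": 800.0,
    "ClientRequest.Write.Latency.1MinuteRate": 400.0,
}
print(total_requests_per_second(samples))  # 1200.0
```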

A Note On Requests

We will focus on: Read, Write

But there are others: CAS*, RangeSlice, ViewWrite

Request Throughput - Per Table

Table.$KEYSPACE.$TABLE.ReadLatency.1MinuteRate
Table.$KEYSPACE.$TABLE.WriteLatency.1MinuteRate

Request Latency - All Requests

ClientRequest.Write.Latency.95percentile
ClientRequest.Read.Latency.95percentile

Request Latency - Per Table

Table.$KEYSPACE.$TABLE.CoordinatorReadLatency.95percentile

Local Latency - Per Table

Table.$KEYSPACE.$TABLE.WriteLatency.95percentile
Table.$KEYSPACE.$TABLE.ReadLatency.95percentile

Local Read Path

Table.$KEYSPACE.$TABLE.KeyCacheHitRate.value
Table.$KEYSPACE.$TABLE.BloomFilterFalseRatio.value
Table.$KEYSPACE.$TABLE.LiveScannedHistogram.95percentile
Table.$KEYSPACE.$TABLE.TombstoneScannedHistogram.95percentile
Table.$KEYSPACE.$TABLE.SSTablesPerReadHistogram.95percentile

Memory Usage

Table.$KEYSPACE.$TABLE.BloomFilterOffHeapMemoryUsed.value
Table.$KEYSPACE.$TABLE.IndexSummaryOffHeapMemoryUsed.value
Table.$KEYSPACE.$TABLE.MemtableOnHeapSize.value
Table.$KEYSPACE.$TABLE.MemtableOffHeapSize.value

Clients

Client.connectedNativeClients.value
CQL.PreparedStatementsRatio.value
CQL.PreparedStatementsEvicted.value

Client Errors

ClientRequest.$REQUEST.Unavailables.1MinuteRate
ClientRequest.$REQUEST.Timeouts.1MinuteRate
ClientRequest.$REQUEST.Failures.1MinuteRate
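
A simple alert rule might add the three error rates for each request type and flag any type above a limit. The zero limit, helper names and sample readings are hypothetical; pick thresholds that suit your cluster.

```python
ERROR_KINDS = ("Unavailables", "Timeouts", "Failures")


def client_errors_per_second(samples: dict, request: str) -> float:
    """Combined error rate for one request type, e.g. 'Read' or 'Write'."""
    return sum(
        samples.get(f"ClientRequest.{request}.{kind}.1MinuteRate", 0.0)
        for kind in ERROR_KINDS
    )


def requests_with_errors(samples: dict, requests=("Read", "Write"), limit=0.0) -> list:
    """Request types whose combined error rate exceeds the alert limit."""
    return [req for req in requests if client_errors_per_second(samples, req) > limit]


samples = {  # hypothetical readings from one node
    "ClientRequest.Read.Timeouts.1MinuteRate": 0.2,
    "ClientRequest.Write.Unavailables.1MinuteRate": 0.0,
}
print(requests_with_errors(samples))  # ['Read']
```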

Inconsistency

Storage.TotalHints.count
HintedHandOffManager.Hints_created-$IP_ADDRESS.count
Connection.TotalTimeouts.1MinuteRate
Connection.$IP_ADDRESS.Timeouts.1MinuteRate

Inconsistency

We will also want to monitor dropped messages (covered later).

Eventual Consistency

ReadRepair.Attempted.1MinuteRate
ReadRepair.RepairedBackground.1MinuteRate
ReadRepair.RepairedBlocking.1MinuteRate

Server Errors

Storage.Exceptions.count
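
Metrics reported as .count are cumulative since the process started, so alerting usually looks at the increase between two polls rather than the raw value. Because the counters live in memory, a restart makes them start again from zero; the sketch treats a drop in the count as such a restart. The helper name is hypothetical.

```python
from typing import Optional


def counter_increase(previous: Optional[int], current: int) -> int:
    """Increase of a cumulative count between two polls.

    No previous poll: nothing to compare yet, so report 0.
    Count went down: assume a restart, so the current value is the increase.
    """
    if previous is None:
        return 0
    if current < previous:
        return current
    return current - previous


polls = [None, 4, 4, 7, 2]  # hypothetical Storage.Exceptions.count readings
print([counter_increase(a, b) for a, b in zip(polls, polls[1:])])  # [0, 0, 3, 2]
```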

Disk Usage

Storage.Load.count
Table.$KEYSPACE.$TABLE.TotalDiskSpaceUsed.count

Compactions

Compaction.PendingTasks.value
Compaction.TotalCompactionsCompleted.1MinuteRate
Table.$KEYSPACE.$TABLE.PendingCompactions.value

Thread Pool Performance

ThreadPools.request.MutationStage.PendingTasks.value
ThreadPools.request.ReadStage.PendingTasks.value
ThreadPools.request.CounterMutationStage.PendingTasks.value
ThreadPools.request.RequestResponseStage.PendingTasks.value
ThreadPools.request.ViewMutationStage.PendingTasks.value
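
Pending task counts are naturally spiky, so one common approach is to alert only when a gauge stays above a limit for several consecutive polls. The limit, poll count, class name and readings below are hypothetical.

```python
from collections import deque


class SustainedThreshold:
    """Fires only when a gauge stays above `limit` for `polls` consecutive samples."""

    def __init__(self, limit: float, polls: int):
        self.limit = limit
        self.window = deque(maxlen=polls)  # True/False per recent sample

    def observe(self, value: float) -> bool:
        self.window.append(value > self.limit)
        return len(self.window) == self.window.maxlen and all(self.window)


mutation_pending = SustainedThreshold(limit=100, polls=3)
readings = [500, 20, 150, 180, 240]  # hypothetical MutationStage.PendingTasks values
print([mutation_pending.observe(v) for v in readings])  # [False, False, False, False, True]
```

The single spike to 500 does not fire; only the three consecutive readings above 100 do.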

Thread Pool Performance

DroppedMessage.MUTATION.Dropped.1MinuteRate
DroppedMessage.READ.Dropped.1MinuteRate

Thread Pool Performance

DroppedMessage.$VERB.InternalDroppedLatency.95thPercentile
DroppedMessage.$VERB.CrossNodeDroppedLatency.95thPercentile

Commit Log Performance

CommitLog.PendingTasks.Value
CommitLog.WaitingOnSegmentAllocation.95thPercentile
CommitLog.WaitingOnCommit.Value

Thanks.

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

www.thelastpickle.com