Cassandra Day Atlanta 2016 - Monitoring Cassandra

  • date post

    15-Apr-2017
  • Category

    Technology

Transcript of Cassandra Day Atlanta 2016 - Monitoring Cassandra

  • CASSANDRA DAY ATLANTA 2016

    MONITORING CASSANDRA

    Aaron Morton, @aaronmorton

    CEO

    Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

    http://creativecommons.org/licenses/by-nc/3.0/nz/

  • About The Last Pickle.

    Work with clients to deliver and improve Apache Cassandra based solutions.

    Apache Cassandra Committer and DataStax MVPs.

    Based in New Zealand, Australia, France & USA.

  • Metrics

    Monitoring & Alerting

    Insights

  • Codahale / Yammer / Dropwizard

  • Metrics

  • Metrics

    Separate Collection from Reporting.

  • Metrics Collection

    Metrics are always collected.

  • Metrics

    Metrics have a dotted notation name, timestamp, and value, e.g.

    com.thelastpickle.presenters.count=2
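
A name, a value, and a timestamp are also the three fields of one line in Graphite's plaintext protocol, which is one common destination for these metrics. The sketch below only formats such a line. The function name `graphite_line` and the timestamp used are made up for illustration and are not from the talk.

```python
import time


def graphite_line(name, value, timestamp=None):
    """Format one sample as 'name value timestamp' followed by a newline."""
    if any(ch.isspace() for ch in name):
        raise ValueError("metric names must not contain whitespace")
    if timestamp is None:
        # Default to the current time, in whole seconds since the epoch.
        timestamp = time.time()
    return f"{name} {value} {int(timestamp)}\n"


# The slide's example metric, with an arbitrary illustrative timestamp.
line = graphite_line("com.thelastpickle.presenters.count", 2, timestamp=1460000000)
print(line, end="")  # com.thelastpickle.presenters.count 2 1460000000
```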

  • Metric Types

    Gauge.

    A simple value.

  • Metric Types

    Ratio Gauge.

    A ratio between two values.
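
As a rough illustration of a ratio gauge, the sketch below computes a ratio on demand from two callables and returns NaN when the denominator makes the ratio undefined. The class and the sample numbers (cache hits and requests) are hypothetical. This is not the Metrics library's own implementation.

```python
import math


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    if denominator == 0 or math.isnan(denominator) or math.isinf(denominator):
        return math.nan
    return numerator / denominator


class RatioGauge:
    """Gauge whose value is computed from two other readings when asked."""

    def __init__(self, read_numerator, read_denominator):
        self._read_numerator = read_numerator
        self._read_denominator = read_denominator

    def value(self):
        return ratio(self._read_numerator(), self._read_denominator())


# Hypothetical readings: 90 cache hits out of 120 requests.
hit_rate = RatioGauge(lambda: 90, lambda: 120)
print(hit_rate.value())  # 0.75
```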

  • Metric Types

    Histograms.

    The distribution of values in a stream of data.

  • Histograms

    Quantiles (e.g. 75th, 95th) calculated using reservoir sampling. (Check docs.)

  • Histograms

    The default is Exponentially Decaying Reservoirs: (roughly) the last five minutes of data, with exponential weighting towards newer data.

    (Check docs.)
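
To make "quantiles from a reservoir" concrete, here is a minimal sketch of a uniform reservoir (Algorithm R) with a nearest-rank quantile. It is deliberately simpler than the exponentially decaying reservoir described above: every value seen has the same chance of being kept, with no bias towards recent data. The class name, the default size, and the seed parameter are assumptions for illustration.

```python
import math
import random


class UniformReservoir:
    """Keep a fixed-size uniform random sample of a stream (Algorithm R)."""

    def __init__(self, size=1028, seed=None):
        self.size = size    # assumed default, chosen for illustration
        self.count = 0      # total number of values seen so far
        self.values = []
        self._rng = random.Random(seed)

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            # Still filling the reservoir: keep everything.
            self.values.append(value)
        else:
            # Replace a random slot with probability size / count.
            slot = self._rng.randrange(self.count)
            if slot < self.size:
                self.values[slot] = value

    def quantile(self, q):
        """Nearest-rank quantile of the sampled values, for 0 < q <= 1."""
        if not self.values:
            return 0.0
        ordered = sorted(self.values)
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]


reservoir = UniformReservoir(size=100, seed=1)
for latency in range(1, 51):   # 50 values, fewer than the reservoir size
    reservoir.update(latency)
print(reservoir.quantile(0.95))  # 48
```

Once the stream is longer than the reservoir, the reported quantiles are estimates drawn from the sample rather than exact values.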

  • Metric Types

    Meter

    Measures the per second rate at which a set of events occur.

  • Meter

    Three different exponentially-weighted moving average rates: 1, 5, and 15 minutes
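
The idea behind a one-minute rate can be sketched as an exponentially-weighted moving average that is updated on a fixed tick. The 5 second tick interval and the class name below are assumptions of this sketch. Check the Metrics documentation for how the library actually does it.

```python
import math


class Ewma:
    """Exponentially-weighted moving average of an event rate (events/second)."""

    def __init__(self, window_seconds, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        # Smoothing factor derived from the tick interval and the window length.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.uncounted = 0
        self.rate = None

    def mark(self, n=1):
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self):
        """Fold the events seen since the last tick into the average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


one_minute = Ewma(window_seconds=60)
for _ in range(24):          # two minutes of steady traffic, 10 events per tick
    one_minute.mark(10)
    one_minute.tick()
print(one_minute.rate)       # 2.0 events per second
```

If traffic then stops, each idle tick multiplies the rate by exp(-5/60), so the one-minute rate decays gradually instead of dropping straight to zero.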

  • Metric Types

    Timer.

    Histogram of duration and rate of events.

  • Reporting

    Reporters run in the Cassandra process, pushing

    metrics to external services.

  • Reporters

    ConsoleReporter, GraphiteReporter, InfluxDBReporter, RiemannReporter,

  • Reporters In Cassandra

    Configuration file:

    metrics-reporter-config-sample.yaml

  • Reporters In Cassandra

    graphite:
      - period: 10
        timeunit: 'SECONDS'
        prefix: 'cassandra.prod.ip_1_2_3_4.'
        hosts:
          - host: '1.2.3.4'
            port: 2003
        predicate:
          color: "white"
          useQualifiedName: true
          patterns:
            - "^org.apache.cassandra.metrics.+"
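
The predicate in the configuration above decides which metric names a reporter sends. A rough sketch of that filtering follows, assuming that the colour "white" means "report only names that match a pattern", that any other colour inverts the test, and that a pattern must match the whole name. The function name and the second sample metric name are hypothetical.

```python
import re


def make_predicate(color, patterns):
    """Build a name filter from a colour and a list of regular expressions."""
    compiled = [re.compile(pattern) for pattern in patterns]

    def allowed(name):
        matched = any(rx.fullmatch(name) for rx in compiled)
        # Assumption: "white" keeps matches, any other colour drops them.
        return matched if color == "white" else not matched

    return allowed


allowed = make_predicate("white", ["^org.apache.cassandra.metrics.+"])
candidates = [
    "org.apache.cassandra.metrics.ClientRequest.Read.Latency",
    "jvm.memory.heap.used",  # hypothetical non-Cassandra metric name
]
reported = [name for name in candidates if allowed(name)]
print(reported)  # ['org.apache.cassandra.metrics.ClientRequest.Read.Latency']
```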

  • metrics-reporter-config

    Configures Metrics reporters.

    http://github.com/addthis/metrics-reporter-config

  • metrics-reporter-config

    Supports:

    Ganglia, Graphite, Riemann

  • JMX

    Cassandra creates JMX MBeans for each Metric.
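
When browsing these MBeans, the object names usually take the form `org.apache.cassandra.metrics:type=...,scope=...,name=...`. The helper below builds such strings. The key layout is an assumption based on how Cassandra's MBeans commonly appear in JMX tools, and the keyspace and table names are placeholders.

```python
def mbean_name(metric_type, name, scope=None, keyspace=None):
    """Build a JMX object name string for a Cassandra metric (assumed layout)."""
    parts = [f"type={metric_type}"]
    if keyspace:
        parts.append(f"keyspace={keyspace}")
    if scope:
        parts.append(f"scope={scope}")
    parts.append(f"name={name}")
    return "org.apache.cassandra.metrics:" + ",".join(parts)


# Coordinator read latency across all tables.
print(mbean_name("ClientRequest", "Latency", scope="Read"))
# Local read latency for one table; 'my_keyspace' and 'my_table' are placeholders.
print(mbean_name("Table", "ReadLatency", scope="my_table", keyspace="my_keyspace"))
```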

  • JMX

  • Reporters

    Reporters may change the name of measures, e.g. 95thPercentile == p95
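
When dashboards mix sources, a small lookup table helps translate between the two spellings. The mapping below pairs JMX attribute names with the short suffixes a Graphite reporter typically emits. It is a partial, illustrative list, so verify it against the names your own reporter produces.

```python
# JMX attribute name -> suffix as typically written by a Graphite reporter.
JMX_TO_GRAPHITE = {
    "Count": "count",
    "Mean": "mean",
    "75thPercentile": "p75",
    "95thPercentile": "p95",
    "99thPercentile": "p99",
    "OneMinuteRate": "m1_rate",
    "FiveMinuteRate": "m5_rate",
    "FifteenMinuteRate": "m15_rate",
}


def to_graphite_suffix(jmx_attribute):
    """Translate a JMX attribute name, leaving unknown names unchanged."""
    return JMX_TO_GRAPHITE.get(jmx_attribute, jmx_attribute)


print(to_graphite_suffix("95thPercentile"))  # p95
```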

  • Metrics

    Monitoring & Alerting

    Insights

  • Monitoring and Alerting

    Use what you like and what works for you.

  • Monitoring Platforms

    OpsCenter, Grafana & Graphite, Datadog, Riemann

  • Metrics

    Monitoring & Alerting

    Insights

  • Names?

    All under

    org.apache.cassandra.metrics

  • Scale?

    Latency? microseconds

    Rates? per second

    Data? bytes
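
Since latencies arrive in microseconds and sizes in bytes, dashboards and alert thresholds often convert them first. A small sketch of two such helpers follows. The function names and sample values are illustrative only.

```python
def micros_to_millis(microseconds):
    """Convert a latency in microseconds to milliseconds."""
    return microseconds / 1000.0


def human_bytes(n):
    """Render a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


print(micros_to_millis(2500))  # 2.5
print(human_bytes(1536))       # 1.5 KiB
```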

  • Percentiles?

    75thPercentile, 95thPercentile, 99thPercentile

  • Rates?

    OneMinuteRate

  • Request Throughput - All Requests

    ClientRequest.$REQUEST.Latency.1MinuteRate

    $REQUEST is one of: CASRead, CASWrite, RangeSlice, Read, ViewWrite, Write
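
The `$REQUEST` placeholder can be expanded mechanically to get the full list of names to graph. The sketch below uses Python's `string.Template` and prepends the `org.apache.cassandra.metrics` prefix mentioned earlier. The helper name `expand` is made up for illustration.

```python
from string import Template

PREFIX = "org.apache.cassandra.metrics."
REQUEST_TYPES = ["CASRead", "CASWrite", "RangeSlice", "Read", "ViewWrite", "Write"]


def expand(template, **variables):
    """Fill the $PLACEHOLDER variables in a metric name and add the prefix."""
    return PREFIX + Template(template).substitute(**variables)


names = [
    expand("ClientRequest.$REQUEST.Latency.1MinuteRate", REQUEST=request)
    for request in REQUEST_TYPES
]
for name in names:
    print(name)
```

The same helper works for the per-table names later in the deck, for example `expand("Table.$KEYSPACE.$TABLE.ReadLatency.1MinuteRate", KEYSPACE=..., TABLE=...)` with your own keyspace and table names.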

  • A Note On Requests

    We will focus on Read, Write

    But there are others: CAS*, RangeSlice, ViewWrite

  • Request Throughput - Per Table

    Table.$KEYSPACE.$TABLE.

    ReadLatency.1MinuteRate

    WriteLatency.1MinuteRate

  • Request Latency - All Requests

    ClientRequest.

    Write.Latency.95percentile

    Read.Latency.95percentile

  • Request Latency - Per Table

    Table.$KEYSPACE.$TABLE.

    CoordinatorReadLatency.95percentile

  • Local Latency - Per Table

    Table.$KEYSPACE.$TABLE.

    WriteLatency.95percentile

    ReadLatency.95percentile

  • Local Read Path

    Table.$KEYSPACE.$TABLE.

    KeyCacheHitRate.value

    BloomFilterFalseRatio.value

    LiveScannedHistogram.95percentile

    TombstoneScannedHistogram.95percentile

    SSTablesPerReadHistogram.95percentile

  • Memory Usage

    Table.$KEYSPACE.$TABLE.

    BloomFilterOffHeapMemoryUsed.value

    IndexSummaryOffHeapMemoryUsed.value

    MemtableOnHeapSize.value

    MemtableOffHeapSize.value

  • Clients

    Client.connectedNativeClients.value

    CQL.PreparedStatementsRatio.value

    CQL.PreparedStatementsEvicted.value

  • Client Errors

    ClientRequest.

    $REQUEST.Unavailables.1MinuteRate

    $REQUEST.Timeouts.1MinuteRate

    $REQUEST.Failures.1MinuteRate

  • Inconsistency

    Storage.TotalHints.count

    HintedHandOffManager.Hints_created-$IP_ADDRESS.count

    Connection.TotalTimeouts.1MinuteRate

    Connection.$IP_ADDRESS.Timeouts.1MinuteRate

  • Inconsistency

    You will also want to monitor dropped messages (covered later).

  • Eventual Consistency

    ReadRepair.Attempted.1MinuteRate

    ReadRepair.RepairedBackground.1MinuteRate

    ReadRepair.RepairedBlocking.1MinuteRate

  • Server Errors

    Storage.Exceptions.count

  • Disk Usage

    Storage.Load.count

    Table.$KEYSPACE.$TABLE.TotalDiskSpaceUsed.count

  • Compactions

    Compaction.PendingTasks.value

    Compaction.TotalCompactionsCompleted.1MinuteRate

    Table.$KEYSPACE.$TABLE.PendingCompactions.value

  • Thread Pool Performance

    ThreadPools.request.

    MutationStage.PendingTasks.value

    ReadStage.PendingTasks.value

    CounterMutationStage.PendingTasks.value

    RequestResponseStage.PendingTasks.value

    ViewMutationStage.PendingTasks.value

  • Thread Pool Performance

    DroppedMessage.

    MUTATION.Dropped.1MinuteRate

    READ.Dropped.1MinuteRate

  • Thread Pool Performance

    DroppedMessage.

    $VERB.InternalDroppedLatency.95thPercentile

    $VERB.CrossNodeDroppedLatency.95thPercentile

  • Commit Log Performance

    CommitLog.

    PendingTasks.Value

    WaitingOnSegmentAllocation.95thPercentile

    WaitingOnCommit.Value

  • Thanks.

  • Aaron Morton, @aaronmorton

    Co-Founder & Principal Consultant

    http://www.thelastpickle.com