Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016

Page 1

Monitoring C* Health @ Scale

Jason Cacciatore

Page 2

Netflix Scale

Hundreds of clusters

Thousands of nodes

Page 3

How do we assess health?

• Node level
  – dmesg errors
  – gossip status
  – thresholds on system metrics (disk usage, heap, etc.)

• Cluster level
  – Rely on C*’s view of its own health (nodetool ring)
  – AWS Cache as a secondary source of truth
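
The node-level checks listed above could be combined roughly as follows. This is a minimal sketch in plain Java, not code from the talk: the class name, the parameters, and the 85% / 90% thresholds are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NodeHealthSketch {
    // Assumed thresholds, picked only for this example.
    static final double MAX_DISK_PCT = 85.0;
    static final double MAX_HEAP_PCT = 90.0;

    // Returns the problems found for one node; an empty list means no check failed.
    static List<String> problems(boolean gossipUp, int dmesgErrors,
                                 double diskUsedPct, double heapUsedPct) {
        List<String> found = new ArrayList<>();
        if (dmesgErrors > 0) found.add("dmesg errors: " + dmesgErrors);
        if (!gossipUp) found.add("gossip reports node down");
        if (diskUsedPct > MAX_DISK_PCT) found.add("disk usage above threshold");
        if (heapUsedPct > MAX_HEAP_PCT) found.add("heap usage above threshold");
        return found;
    }
}
```

A caller would treat a non-empty result as a reason to flag the node, for example `NodeHealthSketch.problems(false, 2, 95.0, 50.0)`.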

Page 4

Common Approach

[Diagram: CRON System and four Job Runner boxes]

Page 5

Common Scenario

Node disappears

[Timeline: T0, T1, T2, T3, T4]

Page 6

Problems inherent in polling

● Point-in-time snapshot, no state
● Establishing a connection to a cluster when it’s under heavy load is problematic
● Not resilient to network hiccups, especially for large clusters
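
To make the "point-in-time snapshot" problem concrete, here is a small sketch that counts how often a sampler actually observes an outage. The 60-second polling period and the outage window are assumptions for illustration; the 20-second period matches the reporting interval mentioned later in the talk.

```java
class PollingSketch {
    // Samples at t = 0, periodSec, 2 * periodSec, ... (while t < totalSec) and counts
    // the samples that fall inside the outage window [downFrom, downTo).
    static int downObservations(int totalSec, int periodSec, int downFrom, int downTo) {
        int seen = 0;
        for (int t = 0; t < totalSec; t += periodSec) {
            if (t >= downFrom && t < downTo) {
                seen++;
            }
        }
        return seen;
    }
}
```

For a node that is down from second 70 to second 110 of a 300-second span, `downObservations(300, 60, 70, 110)` returns 0 while `downObservations(300, 20, 70, 110)` returns 2: the coarse poller never sees the outage.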

Page 7

A different approach

What if we had a continuous stream of fine-grained snapshots?

Page 8

Mantis Streaming System

Stream processing system built on Apache Mesos

– Provides a flexible programming model
– Models computation as a distributed DAG
– Designed for high throughput, low latency
– Open-source release date coming soon

Page 9

Streaming micro-services

[Diagram: four Source / Stage / Sink chains; one is labeled Mantis Job]

Source - input, handles backpressure
Stage - business logic
Sink - output, handles backpressure

Page 10

Mantis Programming Model

• ReactiveX observable sequences
• Source, Stage, and Sink together form an observable chain (which only emits data when subscribed to)
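
The "only emits data when subscribed to" behavior can be illustrated without any library. The sketch below is a tiny stand-in written for this example; `ColdSource` and `ColdChainSketch` are hypothetical names, and this is neither the ReactiveX nor the Mantis API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// A minimal cold source: building the chain does nothing; work starts on subscribe().
interface ColdSource<T> {
    void subscribe(Consumer<T> sink);

    // "Stage": wraps this source with a transformation without running anything yet.
    default <R> ColdSource<R> map(Function<T, R> stage) {
        return sink -> subscribe(item -> sink.accept(stage.apply(item)));
    }

    // "Source": replays the given items to each subscriber and logs when it starts.
    static <T> ColdSource<T> from(List<T> items, List<String> log) {
        return sink -> {
            log.add("source started");
            items.forEach(sink);
        };
    }
}

class ColdChainSketch {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        ColdSource<Integer> chain = ColdSource.from(List.of(1, 2, 3), log).map(x -> x * 10);
        log.add("chain built");                          // the source has not started yet
        chain.subscribe(v -> log.add("sink got " + v));  // "Sink": subscribing starts the flow
        return log;
    }
}
```

In the log returned by `ColdChainSketch.run()`, "chain built" appears before "source started", which is the point of the slide: the chain is only a description until something subscribes.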

Page 11

Health Check using Mantis

[Diagram: a Source Job and a LocalRingAgg in each of three regions (eu-west-1, us-east-1, us-west-2), and a single GlobalRingAgg]

Page 12

How much data?

● Each node sends data every 20 seconds
● Payload size depends on cluster size
● ~6 MB/s total across east, west, and eu sent to Local Ring Aggregators
● ~600 Kbps processed by Global Aggregator
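
For a rough sense of how much the Local Ring Aggregators shrink the stream, assume "MB/s" means megabytes per second, "Kbps" means kilobits per second, and prefixes are decimal (the slide does not spell this out):

```latex
6\ \text{MB/s} \times 8\ \tfrac{\text{bit}}{\text{byte}} = 48\ \text{Mbit/s} = 48{,}000\ \text{kbit/s},
\qquad
\frac{48{,}000\ \text{kbit/s}}{600\ \text{kbit/s}} = 80
```

Under those assumptions, the Global Aggregator handles roughly 1/80 of the raw volume.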

Page 13

Local Ring Aggregator

• Stateless
• Single instance per region
• Groups data by C* cluster and scores it
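
The "groups data by C* cluster and scores it" step might look like this in plain Java. It is a sketch only: `NodeReport` and the scoring rule (fraction of reporting nodes that look healthy) are assumptions, not the talk's actual message format or scoring logic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LocalAggSketch {
    // Hypothetical per-node message; the fields are illustrative.
    static final class NodeReport {
        final String cluster;
        final String ip;
        final boolean healthy;

        NodeReport(String cluster, String ip, boolean healthy) {
            this.cluster = cluster;
            this.ip = ip;
            this.healthy = healthy;
        }
    }

    // Stateless: the result depends only on the window of reports passed in.
    // Groups the reports by cluster and scores each cluster as the fraction
    // of its reports that are healthy (an assumed rule).
    static Map<String, Double> score(List<NodeReport> window) {
        return window.stream().collect(Collectors.groupingBy(
                r -> r.cluster,
                TreeMap::new,
                Collectors.averagingDouble(r -> r.healthy ? 1.0 : 0.0)));
    }
}
```

Because nothing is kept between calls, the function can be applied to each window independently, which is one way to read "stateless" on the slide.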

Page 14

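Local Ring Aggregator (cont)

    @Override
    public Observable<String> call(Context context, Observable<MantisServerSentEvent> o) {
        ...
        return
            ...
            // drop messages that fail validation
            .filter(this::isValid)
            .map(NodeRingMessage::filterByOwnership)
            // emit a batch when the time window elapses or 5000 messages have arrived
            .buffer(config.getWindowInMillis(), TimeUnit.MILLISECONDS, 5000)
            // build an aggregated view of the batch and score it
            .map((nodeRingMessageList) -> new AggregatedView(nodeRingMessageList, config.getAWSClient()).score())
            // score() yields an observable of scores; flatten it into individual items
            .flatMap((score) -> score)
            // serialize each score to JSON
            .map(gson::toJson);
    }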

Page 15

Anatomy of a Score

● Evidence - aggregate of all data points gathered from all nodes
● AWS view - each instance in the cluster
● Cluster metadata (token to IP mapping, name, etc.)
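
One way to picture the three parts of a score is as a small data class. Everything here is hypothetical: the field names, the representation of evidence as per-IP down-report counts, and the two helper methods are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical shape of a score; not the talk's actual class.
class ScoreSketch {
    final String clusterName;                   // cluster metadata: name
    final Map<String, String> tokenToIp;        // cluster metadata: token -> IP
    final Map<String, Integer> downReportsByIp; // evidence: how many nodes reported each IP as down
    final Set<String> awsInstanceIps;           // AWS view: IPs of instances listed for the cluster

    ScoreSketch(String clusterName, Map<String, String> tokenToIp,
                Map<String, Integer> downReportsByIp, Set<String> awsInstanceIps) {
        this.clusterName = clusterName;
        this.tokenToIp = tokenToIp;
        this.downReportsByIp = downReportsByIp;
        this.awsInstanceIps = awsInstanceIps;
    }

    // IPs that own a token in the ring but are absent from the AWS view.
    Set<String> ringIpsMissingFromAws() {
        Set<String> missing = new TreeSet<>(tokenToIp.values());
        missing.removeAll(awsInstanceIps);
        return missing;
    }

    // IPs that at least `quorum` nodes reported as down.
    Set<String> ipsReportedDown(int quorum) {
        Set<String> down = new TreeSet<>();
        downReportsByIp.forEach((ip, n) -> {
            if (n >= quorum) down.add(ip);
        });
        return down;
    }
}
```

Keeping the ring's own evidence next to the AWS view lets a consumer cross-check the two sources before deciding that a node is really gone.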

Page 16

Global Ring Aggregator

[Diagram: incoming Scores are grouped within a window]

Page 17

Global Ring Aggregator (cont)

[Diagram: Score, reduce, transform, Cluster Health Evaluator, and SINK, with outputs to a Real Time Dashboard and a Remediation System]
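
A reduce step over one window of grouped scores could be sketched as below. The `RegionalScore` type and the "keep the worst score per cluster" rule are assumptions for illustration; the slides do not say how the reduction is defined.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GlobalAggSketch {
    // Hypothetical score emitted by one region's local aggregator.
    static final class RegionalScore {
        final String cluster;
        final String region;
        final double value;

        RegionalScore(String cluster, String region, double value) {
            this.cluster = cluster;
            this.region = region;
            this.value = value;
        }
    }

    // Reduces one window of scores to a single value per cluster by keeping
    // the lowest (worst) score seen for that cluster. The rule is assumed.
    static Map<String, Double> reduce(List<RegionalScore> window) {
        Map<String, Double> worst = new TreeMap<>();
        for (RegionalScore s : window) {
            worst.merge(s.cluster, s.value, Math::min);
        }
        return worst;
    }
}
```

The reduced map is what a downstream evaluator would consume before anything is written to the sink.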

Page 18

Common Scenario revisited

Cluster Health Evaluator, timeline T0 to T4 (input -> FSM state, in slide order):

Score -> FSM[ Start ]
Score -> FSM[ Node Gone ]
Score -> FSM[ Wait for Signal ]
<Not tracked> -> FSM[ Wait for Signal ]
<Not tracked> -> FSM[ Done ]
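
The state progression on this slide can be written down as a small state machine. The state names follow the slide; the event names and the transition rules are assumptions made for this sketch, since the slide does not say what triggers each step.

```java
class NodeGoneFsmSketch {
    enum State { START, NODE_GONE, WAIT_FOR_SIGNAL, DONE }

    // Hypothetical events; the talk does not name them.
    enum Event { SCORE_NODE_MISSING, SCORE_OTHER, SIGNAL_RECEIVED }

    // Returns the next state for the given state and event.
    static State next(State state, Event event) {
        switch (state) {
            case START:
                // Assumed: a score showing a missing node opens the incident.
                return event == Event.SCORE_NODE_MISSING ? State.NODE_GONE : State.START;
            case NODE_GONE:
                // Assumed: once the missing node is recorded, wait for an external signal.
                return State.WAIT_FOR_SIGNAL;
            case WAIT_FOR_SIGNAL:
                // Assumed: only the signal ends the wait; further scores are ignored.
                return event == Event.SIGNAL_RECEIVED ? State.DONE : State.WAIT_FOR_SIGNAL;
            default:
                return State.DONE;
        }
    }
}
```

Unlike a cron-driven poll, the evaluator carries state from one score to the next, so it can tell "still waiting on a known incident" apart from "new problem".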

Page 19

That’s great, but...

Now the health of the fleet is encapsulated in a single data stream, so how do we make sense of that?

Page 20

Real Time Dash (Macro View)

Macro View of the fleet

Page 21

Real Time Dash (Cluster View)

Page 22

Real Time Dash (Perspective)

Page 23

Benefits

● Faster detection of issues
● Greater accuracy
● Massive reduction in false positives
● Operational - hosted by Mantis infrastructure
● Separation of concerns (decouples detection from remediation)

Page 24

Monitoring the Monitor

● Mantis SLA
● Bytes read + written
● Incoming message count
● Sink processed + dropped counts
● CPU, memory

Page 25

Questions?