SignalFx: Making Cassandra Perform as a Time Series Database

Paul Ingram

Transcript of SignalFx: Making Cassandra Perform as a Time Series Database

Page 1: SignalFx: Making Cassandra Perform as a Time Series Database

Making Cassandra perform as a time series database

Paul Ingram

Page 2: SignalFx: Making Cassandra Perform as a Time Series Database

Introduction

• real time streaming analytics for monitoring and alerting

• ingest many billions of points of timeseries data per day

• ingest at 1 second resolution

• all of this data ends up in cassandra

#CassandraSummit

Page 3: SignalFx: Making Cassandra Perform as a Time Series Database

What we’re talking about

• a metric is an abstract quantity such as CPU load or heap size

• a source is some entity which measures and reports metrics

• a datapoint is a value for a metric from a source at some time

• a timeseries is a sequence of those datapoints over time
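The four definitions above can be sketched as a small data model. This is only an illustration: the names `Datapoint`, `TimeseriesKey` and `group_into_timeseries` are hypothetical and do not come from the talk, and identifying a timeseries by its (metric, source) pair is an assumption drawn from the definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Datapoint:
    metric: str        # abstract quantity, e.g. "cpu.load" (hypothetical name)
    source: str        # entity that measured and reported the value
    timestamp_ms: int  # when the value was measured
    value: float


# Assumption: a timeseries is identified by its (metric, source) pair.
TimeseriesKey = Tuple[str, str]


def group_into_timeseries(
    points: List[Datapoint],
) -> Dict[TimeseriesKey, List[Datapoint]]:
    """Group datapoints by (metric, source) and order each group by time."""
    series: Dict[TimeseriesKey, List[Datapoint]] = defaultdict(list)
    for point in points:
        series[(point.metric, point.source)].append(point)
    for group in series.values():
        group.sort(key=lambda p: p.timestamp_ms)
    return dict(series)
```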

Page 4: SignalFx: Making Cassandra Perform as a Time Series Database

overall performance (version 0→1→2→3)

Page 5: SignalFx: Making Cassandra Perform as a Time Series Database

original ingest path (version 0)

[diagram: sources, ingest server, loader queue, TSDB server, TSDB clients, TSDB C*]

Page 6: SignalFx: Making Cassandra Perform as a Time Series Database

TSDB schema (versions 0,1,2,3)

CREATE TABLE table_0 (
    segment text,
    time timestamp,
    value blob,
    PRIMARY KEY (segment, time)
) WITH COMPACT STORAGE;

Page 7: SignalFx: Making Cassandra Perform as a Time Series Database

cassandra operation (version 0)

Page 8: SignalFx: Making Cassandra Perform as a Time Series Database

initial performance (version 0)

Page 9: SignalFx: Making Cassandra Perform as a Time Series Database

buffered writes rationale (version 1)

• writing every datapoint individually is very expensive

• buffer data in memory

• write many points in a batch statement

• buffers are dropped when they have been written to cassandra
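The buffering idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SignalFx implementation: `WriteBuffer`, its `write_batch` callback and the count-based flush threshold `max_points` are all hypothetical, and the callback stands in for whatever issues the batch statement to Cassandra.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp_ms, value)


class WriteBuffer:
    """Buffers points per segment in memory and writes them out in batches."""

    def __init__(
        self,
        write_batch: Callable[[str, List[Point]], None],
        max_points: int = 100,  # hypothetical flush threshold
    ) -> None:
        self._write_batch = write_batch
        self._max_points = max_points
        self._buffers: Dict[str, List[Point]] = {}

    def add(self, segment: str, timestamp_ms: int, value: float) -> None:
        """Buffer one point; flush the segment once the threshold is reached."""
        buf = self._buffers.setdefault(segment, [])
        buf.append((timestamp_ms, value))
        if len(buf) >= self._max_points:
            self.flush(segment)

    def flush(self, segment: str) -> None:
        """Write the buffered points as one batch, then drop the buffer."""
        buf = self._buffers.get(segment)
        if not buf:
            return
        # If the write raises, the buffer is kept so the points are not lost.
        self._write_batch(segment, buf)
        del self._buffers[segment]

    def pending(self, segment: str) -> int:
        """Number of points buffered for a segment and not yet written."""
        return len(self._buffers.get(segment, []))
```

With `max_points=3`, adding seven points to one segment results in two batch writes and one point still pending, rather than seven individual writes.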

Page 10: SignalFx: Making Cassandra Perform as a Time Series Database

buffered write ingest path (versions 1,2)

[diagram: sources, ingest server, memory tier, migrator, TSDB server, TSDB clients, TSDB C*]

Page 11: SignalFx: Making Cassandra Perform as a Time Series Database

buffered writes operation (version 1)

Page 12: SignalFx: Making Cassandra Perform as a Time Series Database

buffered writes performance (versions 0→1)

Page 13: SignalFx: Making Cassandra Perform as a Time Series Database

packed writes rationale (version 2)

• writing data point-by-point means a column for each datapoint

• pack a buffer of datapoints into a block and write the block

• this will reduce the number of columns and write operations

• will have more impact on storage than on performance

• schema and overall flow remain the same
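One way to picture packing is to serialize a buffer of datapoints into a single byte string that fits the existing `value blob` column. The talk does not describe the block format, so the layout below (a 4-byte count followed by fixed-width timestamp/value pairs) and the names `pack_block` and `unpack_block` are assumptions made purely for illustration.

```python
import struct
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp_ms, value)

# Hypothetical layout: big-endian uint32 count, then (int64, float64) pairs.
_COUNT = struct.Struct(">I")
_POINT = struct.Struct(">qd")


def pack_block(points: List[Point]) -> bytes:
    """Serialize a buffer of points into one blob."""
    parts = [_COUNT.pack(len(points))]
    parts.extend(_POINT.pack(ts, value) for ts, value in points)
    return b"".join(parts)


def unpack_block(block: bytes) -> List[Point]:
    """Recover the points from a blob produced by pack_block."""
    (count,) = _COUNT.unpack_from(block, 0)
    return [
        _POINT.unpack_from(block, _COUNT.size + i * _POINT.size)
        for i in range(count)
    ]
```

Under this sketch, a buffer of N points becomes one column holding a blob of 4 + 16·N bytes instead of N separate columns. Which timestamp keys the block (for example, that of its first point) is left open here because the source does not say.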

Page 14: SignalFx: Making Cassandra Perform as a Time Series Database

packed writes operation (version 2)

Page 15: SignalFx: Making Cassandra Perform as a Time Series Database

packed writes performance (versions 1→2)

Page 16: SignalFx: Making Cassandra Perform as a Time Series Database

redo-log rationale (version 3)

• if the ingest server dies, we lose the buffered data

• fix this with more cassandra

• write a persistent log of data as it’s written to the memory-tier

• when an ingest server restarts it will reload its memory-tier from this log
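The log-then-replay behaviour described above can be sketched with an in-memory stand-in for the log store. Everything named here (`InMemoryLog`, `MemoryTier`, `recover`) is hypothetical; in the talk the log lives in Cassandra, and rows are shaped as (stamp, sequence, value) to mirror the log schema shown later in the deck. What `stamp` identifies is not stated in the source, so it is treated as an opaque key.

```python
from typing import List, Tuple


class InMemoryLog:
    """Stand-in for a persistent log; a real one would be a Cassandra table."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, int, bytes]] = []  # (stamp, sequence, value)

    def append(self, stamp: str, sequence: int, value: bytes) -> None:
        self._rows.append((stamp, sequence, value))

    def read(self, stamp: str) -> List[Tuple[int, bytes]]:
        """Return (sequence, value) pairs for one stamp, in sequence order."""
        return sorted((seq, val) for s, seq, val in self._rows if s == stamp)


class MemoryTier:
    """Holds buffered data in memory and logs every write so it can be rebuilt."""

    def __init__(self, log: InMemoryLog, stamp: str) -> None:
        self._log = log
        self._stamp = stamp
        self._sequence = 0
        self.entries: List[bytes] = []

    def write(self, value: bytes) -> None:
        # Log first, so anything held in memory can be recovered after a crash.
        self._log.append(self._stamp, self._sequence, value)
        self._sequence += 1
        self.entries.append(value)

    @classmethod
    def recover(cls, log: InMemoryLog, stamp: str) -> "MemoryTier":
        """Rebuild the memory tier after a restart by replaying the log."""
        tier = cls(log, stamp)
        for sequence, value in log.read(stamp):
            tier.entries.append(value)
            tier._sequence = sequence + 1
        return tier
```

A restarted server would call `MemoryTier.recover(log, stamp)` and continue appending from the next sequence number. Trimming log entries once their data has been migrated is omitted from this sketch.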

Page 17: SignalFx: Making Cassandra Perform as a Time Series Database

redo-log diagram (version 3)

[diagram: sources, ingest server, memory tier, migrator, TSDB server, TSDB clients, TSDB C*, log C*]

Page 18: SignalFx: Making Cassandra Perform as a Time Series Database

log schema (version 3)

CREATE TABLE table_0 (
    stamp text,
    sequence bigint,
    value blob,
    PRIMARY KEY (stamp, sequence)
) WITH COMPACT STORAGE;

Page 19: SignalFx: Making Cassandra Perform as a Time Series Database

packed writes with log operation (version 3)

Page 20: SignalFx: Making Cassandra Perform as a Time Series Database

log performance (version 2→3)

Page 21: SignalFx: Making Cassandra Perform as a Time Series Database

what we found

• matching the workload to the database is very important

• load is much more dependent on rate of writes than on volume of data written

• for our very write-heavy workload we saw a 4x performance improvement by doing fewer, larger writes

• it turns out to be cheaper to write data twice efficiently than once naively
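The last point can be made concrete with a rough count of write operations. The figures below (one million points per second, 100 points per write) are invented for illustration and are not measurements from the talk; the sketch only shows why writing everything twice in large writes can still mean far fewer operations than writing once point-by-point.

```python
def writes_per_second(points_per_second: int, points_per_write: int) -> float:
    """Write operations per second needed to keep up with the ingest rate."""
    return points_per_second / points_per_write


POINTS_PER_SECOND = 1_000_000  # hypothetical ingest rate
POINTS_PER_WRITE = 100         # hypothetical block size

# Version 0 style: one write per datapoint.
naive = writes_per_second(POINTS_PER_SECOND, 1)

# Packed style: one write per block of datapoints.
packed = writes_per_second(POINTS_PER_SECOND, POINTS_PER_WRITE)

# Packed plus log: every point is written twice (assuming the log is
# written in blocks of the same size), so the operation count doubles.
packed_with_log = 2 * packed
```

Under these made-up numbers, the point-by-point path needs 1,000,000 writes per second while the packed path with a log needs 20,000, even though it writes twice the volume of data.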

Page 22: SignalFx: Making Cassandra Perform as a Time Series Database

overall performance (version 0→1→2→3)

Page 23: SignalFx: Making Cassandra Perform as a Time Series Database

Thanks

Paul Ingram

Careers: signalfx.com/careers.html