Cassandra meetup 20150331

Transcript of Cassandra meetup 20150331

Page 1: Cassandra meetup 20150331

@WrathOfChris github.com/WrathOfChris . blog.wrathofchris.com

Time Series Metrics with Cassandra

Page 2: Cassandra meetup 20150331

About Me

• Chris Maxwell

• @WrathOfChris

• Sr Systems Engineer @ Ubiquiti Networks

• Cloud Guy

• DevOps

Page 3: Cassandra meetup 20150331

Mission

• Metrics service for internal services

• Deliver 30 days of system and app metrics (scaled back from 90, then 60)

• Gain experience with Cassandra

Page 4: Cassandra meetup 20150331

History

Ancient Designs

Aging Tools

Pitfalls

https://flic.kr/p/6pqVnP

Page 5: Cassandra meetup 20150331

Graphite (v1)

• Single instance

• carbon-relay + (2-4) carbon-cache processes (one per CPU)

Page 6: Cassandra meetup 20150331

Graphite (v1)

Problems:

• Single point of SUCCESS!

• Can grow to 16-32 cores, but then I/O saturates

• Carbon write-amplifies 10x (flushes every 10s)

Page 7: Cassandra meetup 20150331

Graphite (v2)

• Frontend: carbon-relay

• Backend: carbon-relay + 4x carbon-cache

• m3.2xlarge ephemeral SSD

• Manual consistent-hash by IP

• Replication 3
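
A minimal sketch of hash-ring placement with 3 replicas, to make the "manual consistent-hash by IP" idea concrete. This is not carbon-relay's actual algorithm: the `HashRing` class, the MD5 hash, the virtual-node count, and the example IPs and metric name are all assumptions for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring keyed by node IP."""

    def __init__(self, nodes, replicas=3, vnodes=100):
        self.replicas = replicas
        self.nodes = set(nodes)
        # Each node gets `vnodes` positions so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def get_nodes(self, metric: str):
        """Return the distinct nodes that should store `metric`."""
        want = min(self.replicas, len(self.nodes))
        start = bisect.bisect(self._positions, _hash(metric))
        owners = []
        # Walk clockwise from the metric's position, collecting distinct nodes.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == want:
                    break
        return owners


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
owners = ring.get_nodes("servers.web01.cpu.user")
```

Placement depends only on the node list, so every relay computes the same owners without coordination. The flip side is that changing the node list changes ownership, and the affected data has to be copied to its new owners by hand.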

Page 8: Cassandra meetup 20150331

Graphite (v2)

Problems:

• Kind of like a Dynamo, but not

• Replacing node requires full partition key shuffle

• Adding 5 nodes took 6 days on 1Gbps to re-replicate ring

• Less than 50% disk free means pain during reshuffle

Page 9: Cassandra meetup 20150331

Limitations

• Cloud Native

• Avoid Manual Intervention

• Ephemeral SSD > EBS

https://flic.kr/p/2hZy6P

Page 10: Cassandra meetup 20150331

Design

What we set out to build

https://flic.kr/p/2spiXb

Page 11: Cassandra meetup 20150331

Graphite (v3)

…it got complicated…

Page 12: Cassandra meetup 20150331

Graphite (v3)

Ingest:

• carbon-c-relay: https://github.com/grobian/carbon-c-relay

• cyanite: https://github.com/pyr/cyanite

• cassandra

Page 13: Cassandra meetup 20150331

Graphite (v3)

Retrieval:

• graphite-api: https://github.com/brutasse/graphite-api

• grafana: https://github.com/grafana/grafana

• cyanite: https://github.com/pyr/cyanite

• elasticsearch (metric path cache)

Page 14: Cassandra meetup 20150331

Page 15: Cassandra meetup 20150331

Journey

Lessons learned along the way

https://flic.kr/p/hjY15L

Page 16: Cassandra meetup 20150331

Size Tiered Compaction

• Sorted String Table (SSTable) is an immutable data file

• New data written to small SSTables

• Periodically merged into larger SSTables

Page 17: Cassandra meetup 20150331

Size Tiered Compaction

• Merge 4 similarly sized SSTables into 1 new SSTable

• Data migrates into larger SSTables that are less-regularly compacted

• Disk space required: sum of the 4 largest SSTables
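
A rough sketch of the size-tiered idea, not Cassandra's implementation. It groups SSTable sizes into "similar size" buckets, merges 4 at a time, and computes the headroom rule above. The function names are hypothetical. The 0.5x/1.5x bucket bounds and the threshold of 4 are modelled on the strategy's default options, and sizes are in arbitrary units.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size.

    A size joins a bucket when it falls within [low, high] x the bucket average.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compact_once(sizes, min_threshold=4):
    """Merge the first bucket holding at least `min_threshold` SSTables into one."""
    for bucket in bucket_by_size(sizes):
        if len(bucket) >= min_threshold:
            merged = bucket[:min_threshold]
            remaining = list(sizes)
            for s in merged:
                remaining.remove(s)
            # Simplification: the merged table is as large as its inputs combined
            # (no overwrites or expired data are dropped in this model).
            remaining.append(sum(merged))
            return sorted(remaining)
    return sorted(sizes)


def headroom_needed(sizes):
    """Free space needed to merge the 4 largest SSTables, per the rule above."""
    return sum(sorted(sizes)[-4:])


step1 = compact_once([10, 10, 10, 10, 40, 40, 40])  # four 10s become one 40
step2 = compact_once(step1)                          # four 40s become one 160
```

Each merge moves data into a larger, less frequently compacted tier, which is why the free-space requirement is driven by the biggest tables rather than by the ingest rate.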

Page 18: Cassandra meetup 20150331

Size Tiered Compaction

• Updating a partition frequently may cause it to be spread between SSTables

• Metrics workload writes to all partitions, every period

Page 19: Cassandra meetup 20150331

Size Tiered Compaction

• Metrics workload writes to all partitions, every period

• Range queries that spanned 50+ SSTables !!!

Page 20: Cassandra meetup 20150331

Size Tiered Compaction

• Getting to the older data…

• Ingest 25% more data

• Major Compaction:

• Requires 50% free space

• Compacts all SSTables into 1 large SSTable

Page 21: Cassandra meetup 20150331

Aside: DELETE

• DELETE is the INSERT of a TOMBSTONE to the end of a partition

• INSERTs with TTL become tombstones in the future

• Tombstones live for at least gc_grace_seconds

• Data is only deleted during compaction

https://flic.kr/p/35RACf

Page 22: Cassandra meetup 20150331

gc_grace_seconds

Grace is getting something you don’t deserve

(time to nodetool repair a node that is down)

Page 23: Cassandra meetup 20150331

gc_grace_seconds

deleted data reappears!

Page 24: Cassandra meetup 20150331

Time To Live

• INSERT with TTL becomes tombstone after expiry

• 10s for 6 hours

• 60s for 3 days

• 300s for 30 days
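
Reading each bullet as "resolution, kept for retention", the tiers above can be written down as data. The sketch below counts points per series in each tier and picks the finest tier that still holds data of a given age. The names `ROLLUPS`, `points_per_series` and `pick_rollup` are hypothetical, and this is not how cyanite itself is configured or queried.

```python
# (resolution in seconds, retention in seconds), finest first
ROLLUPS = [
    (10, 6 * 3600),     # 10s for 6 hours
    (60, 3 * 86400),    # 60s for 3 days
    (300, 30 * 86400),  # 300s for 30 days
]


def points_per_series(resolution, retention):
    """Number of data points one metric holds in a tier at any time."""
    return retention // resolution


def pick_rollup(age_seconds):
    """Finest tier whose retention still covers data this old, else None."""
    for resolution, retention in ROLLUPS:
        if age_seconds <= retention:
            return resolution, retention
    return None


counts = [points_per_series(res, ret) for res, ret in ROLLUPS]
```

Under these assumptions each coarser tier holds twice as many points per series as the tier before it, so the 30-day tier accounts for most of the live rows.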

https://flic.kr/p/6Fxv7M

Page 25: Cassandra meetup 20150331

TTL

• gc_grace_seconds is 10 days (by default)

• 10s for 6 hours: 10.25 days on disk

• 60s for 3 days: 13 days on disk

• 300s for 30 days: 40 days on disk
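
The arithmetic behind these numbers is TTL plus gc_grace_seconds. A small sketch, with hypothetical function names, and treating the result as a minimum, since space is only reclaimed when a compaction actually touches the data:

```python
GC_GRACE_SECONDS = 10 * 86400  # default of 10 days, as on the slide
DAY = 86400


def min_days_on_disk(ttl_seconds, gc_grace=GC_GRACE_SECONDS):
    """Earliest point at which expired data can be dropped, in days."""
    return (ttl_seconds + gc_grace) / DAY


def overhead_factor(ttl_seconds, gc_grace=GC_GRACE_SECONDS):
    """How many times longer data stays on disk than it stays readable."""
    return (ttl_seconds + gc_grace) / ttl_seconds


tiers = {"10s": 6 * 3600, "60s": 3 * DAY, "300s": 30 * DAY}
days = {name: min_days_on_disk(ttl) for name, ttl in tiers.items()}
```

The shortest tier suffers most: 6 hours of readable data occupies disk for at least 41 times that long.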

https://flic.kr/p/gBLHYf

Page 26: Cassandra meetup 20150331

https://flic.kr/p/4LNiXg

https://flic.kr/p/35RACf

1.4TB Disks

Page 27: Cassandra meetup 20150331

Levelled Compaction

based on Google’s LevelDB implementation

Page 28: Cassandra meetup 20150331

Levelled Compaction

• Data is ingested at Level 0

• Immediately compacted and merged with L1

• Partitions are merged up to Ln

• 90% of partition data guaranteed to be in same level

Page 29: Cassandra meetup 20150331

Levelled Compaction

• Metrics workload writes to all partitions, every period

• Immediately rolled up to L1

• Immediately rolled up to L2

• Immediately rolled up to L3

• Immediately rolled up to L4

• Immediately rolled up to L5

Page 30: Cassandra meetup 20150331

Levelled Compaction

• Metrics workload writes to all partitions, every period

• 1 batch of writes -> 5 writes
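
A back-of-the-envelope sketch of why disk writes rise while ingest stays flat. It uses the slide's simplification that a write to every partition is rewritten once per populated level. The 160 MB SSTable size and 10x growth per level are assumed defaults, and both function names are hypothetical.

```python
def levels_in_use(total_mb, sstable_mb=160, fanout=10):
    """Count how many levels (L1..Ln) are needed to hold `total_mb` of data."""
    level, capacity, covered = 0, sstable_mb, 0
    while covered < total_mb:
        level += 1
        capacity *= fanout   # L1 = 10 SSTables, each level 10x the last
        covered += capacity
    return level


def disk_writes_per_sec(ingest_per_sec, total_mb):
    """Slide's simplification: each ingested write is rewritten once per level."""
    return ingest_per_sec * levels_in_use(total_mb)


small = disk_writes_per_sec(1000, total_mb=1_000)      # data still fits in L1
large = disk_writes_per_sec(1000, total_mb=1_000_000)  # about 1 TB on the node
```

As the data set grows, a new level appears and the same ingest rate costs another full rewrite.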

Page 31: Cassandra meetup 20150331

Increasing Write rate

Constant Ingest rate

Page 32: Cassandra meetup 20150331

Increasing Write rate

Constant Ingest rate

https://flic.kr/p/4LNiXg

Page 33: Cassandra meetup 20150331

compaction_throughput_mb_per_sec: 128

…then 0 (unlimited)

Page 34: Cassandra meetup 20150331

Speeding Compactions

… Don’t Do This …

multithreaded: true

cassandra_in_memory_compaction_limit_in_mb: 256M

Page 35: Cassandra meetup 20150331

Date Tiered Compaction

Page 36: Cassandra meetup 20150331

Date Tiered Compaction

• Written by Björn Hegerfors at Spotify

• Experimental!

• Released in 2.0.11 / 2.1.1

• Group data by time

• Compact by time

• Drop expired data by time
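
The last three bullets can be illustrated with a deliberately simplified model. It uses fixed-width time windows, whereas the real strategy grows its windows with age, and it represents each SSTable as a hypothetical `(name, min_timestamp, max_timestamp)` tuple. The function names and sample values are assumptions.

```python
from collections import defaultdict


def group_by_window(sstables, window_seconds):
    """Group SSTables by the time window their newest data falls in."""
    windows = defaultdict(list)
    for name, _min_ts, max_ts in sstables:
        windows[max_ts // window_seconds].append(name)
    return dict(windows)


def fully_expired(sstables, now, ttl_seconds, gc_grace_seconds):
    """SSTables whose newest data is past TTL + gc_grace: delete whole files."""
    cutoff = now - ttl_seconds - gc_grace_seconds
    return [name for name, _min_ts, max_ts in sstables if max_ts < cutoff]


sstables = [
    ("a", 0, 3599),
    ("b", 3000, 3500),
    ("c", 3600, 7199),
    ("d", 7200, 9000),
]
windows = group_by_window(sstables, window_seconds=3600)
expired = fully_expired(sstables, now=10000, ttl_seconds=3000, gc_grace_seconds=2000)
```

Only SSTables in the same window are merged with each other, so old data is not rewritten every time new points arrive, and a file whose contents have all expired can be dropped without being compacted at all.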

Page 37: Cassandra meetup 20150331

Compact SSTables by date window

Page 38: Cassandra meetup 20150331

– but the docs say 8GB maximum heap!

MAX_HEAP_SIZE=16G

HEAP_NEWSIZE=2048M

Page 39: Cassandra meetup 20150331

– Rick Branson, Instagram

http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014

-XX:+CMSScavengeBeforeRemark

-XX:CMSMaxAbortablePrecleanTime=60000

-XX:CMSWaitDuration=30000

Page 40: Cassandra meetup 20150331

All systems normal

Inadvertently tested 30,000 writes/sec during launch

Page 41: Cassandra meetup 20150331

Cloud Native

http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/

Page 42: Cassandra meetup 20150331

Cloud Native

Ec2MultiRegionSnitch

Page 43: Cassandra meetup 20150331

Cloud Native

Ephemeral RAID0

-Djava.io.tmpdir=/mnt/cassandra/tmp

Page 44: Cassandra meetup 20150331

Disable AutoScaling Terminate Process:

aws autoscaling suspend-processes --auto-scaling-group-name <asg-name> --scaling-processes Terminate

Page 45: Cassandra meetup 20150331

Cloud Native

This design works up to 50 instances per region

Page 46: Cassandra meetup 20150331

Security Groups

IAM instance-profile role

Security Group + (per region) Security Group

Page 47: Cassandra meetup 20150331

Management (OpsCenter)

IAM instance-profile role

Security Group + (per region) Security Group

Page 48: Cassandra meetup 20150331

Internode Encryption

server_encryption_options:
    internode_encryption: all

• keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore

• keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt

• keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore

Page 49: Cassandra meetup 20150331

Seeds

Cheated…

Page 50: Cassandra meetup 20150331

Seeds

• selects first 3 nodes from each region using Autoscale Group order

• ignores (self) as a seed for bootstrapping first 3 nodes in each region
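
One way the two rules above could be expressed, as a sketch only. The talk does not show how the Autoscale Group order is obtained, so the input dictionary, the function name `select_seeds` and the example addresses are all assumptions.

```python
def select_seeds(nodes_by_region, self_ip, per_region=3):
    """Pick the first `per_region` nodes of each region as seeds, skipping self.

    `nodes_by_region` maps a region name to node IPs in Autoscale Group order
    (a hypothetical input; fetching it from AWS is out of scope here).
    """
    seeds = []
    for region in sorted(nodes_by_region):
        candidates = nodes_by_region[region][:per_region]
        # A node never lists itself, so the first nodes in a region can still
        # bootstrap by contacting the other seeds.
        seeds.extend(ip for ip in candidates if ip != self_ip)
    return seeds


nodes = {
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2"],
}
seeds_for_second_node = select_seeds(nodes, self_ip="10.0.0.2")
seeds_for_fourth_node = select_seeds(nodes, self_ip="10.0.0.4")
```

Every node derives its seed list from the same ordering, so no seed addresses need to be hard-coded in configuration.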

Page 51: Cassandra meetup 20150331

General

• >= 4 Cores per node always

• >= 8 Cores as soon as feasible

• EC2 sweet spots:

• m3.2xlarge (8c/160GB) for small workloads

• i2.2xlarge (8c/1.6TB) for production

• Avoid c3.2xlarge - CPU:Mem ratio is too high

Page 52: Cassandra meetup 20150331

Breaking News!

Dense-storage Instances for EC2

Page 53: Cassandra meetup 20150331

Questions?

Page 54: Cassandra meetup 20150331

d2 instances

Joining a node - system/network

Page 55: Cassandra meetup 20150331

d2 instances

Joining a node - disk performance

Page 56: Cassandra meetup 20150331

General

Metrics

Page 57: Cassandra meetup 20150331

General

Cassandra Metrics

Page 58: Cassandra meetup 20150331

Metrics

CPU - DateTiered

Page 59: Cassandra meetup 20150331

Metrics

JVM - DateTiered

Page 60: Cassandra meetup 20150331

Metrics

Compaction/CommitLog - DateTiered