Cassandra meetup 20150331
Uploaded by chris-maxwell
Transcript of Cassandra meetup 20150331
@WrathOfChris github.com/WrathOfChris . blog.wrathofchris.com
Time Series Metrics with Cassandra
About Me
• Chris Maxwell
• @WrathOfChris
• Sr Systems Engineer @ Ubiquiti Networks
• Cloud Guy
• DevOps
Mission
• Metrics service for internal services
• Deliver 90 60 30 days of system and app metrics
• Gain experience with Cassandra
History
Ancient Designs
Aging Tools
Pitfalls
https://flic.kr/p/6pqVnP
Graphite (v1)
• Single instance
• carbon-relay + (2-4) carbon-cache processes (= number of CPUs)
Graphite (v1)
Problems:
• Single point of SUCCESS!
• Can grow to 16-32 cores, but I/O saturation
• Carbon write-amplifies 10x (flushes every 10s)
Graphite (v2)
• Frontend: carbon-relay
• Backend: carbon-relay + 4x carbon-cache
• m3.2xlarge ephemeral SSD
• Manual consistent-hash by IP
• Replication 3
Graphite (v2)
Problems:
• Kind of like a Dynamo, but not
• Replacing node requires full partition key shuffle
• Adding 5 nodes took 6 days on 1Gbps to re-replicate ring
• Less than 50% disk free means pain during reshuffle
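The reshuffle cost of hashing by IP can be illustrated with a small hash ring. This is a minimal sketch, not carbon-relay's actual algorithm: the MD5 ring, the `vnodes` count, the function names, and the example IPs are all assumptions made for illustration, and replication is ignored.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a stable position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(node_ips, vnodes=100):
    """Return sorted (position, ip) points, `vnodes` points per node (hypothetical layout)."""
    return sorted((_position(f"{ip}#{i}"), ip) for ip in node_ips for i in range(vnodes))


def owner(ring, metric):
    """Pick the first ring point at or after the metric's position, wrapping around."""
    idx = bisect.bisect_left(ring, (_position(metric), ""))
    return ring[idx % len(ring)][1]


def moved_fraction(old_ips, new_ips, metrics):
    """Fraction of metrics whose owning node differs between the two rings."""
    old_ring, new_ring = build_ring(old_ips), build_ring(new_ips)
    moved = sum(owner(old_ring, m) != owner(new_ring, m) for m in metrics)
    return moved / len(metrics)


metrics = [f"servers.host{i}.cpu.user" for i in range(1000)]
before = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
after = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.9"]  # one node replaced, new IP
print(f"{moved_fraction(before, after, metrics):.0%} of metrics change owner")
```

Because ring positions derive from the IP, a replacement node with a new IP lands at different points on the ring, so metrics move even on nodes that were never touched.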
Limitations
• Cloud Native
• Avoid Manual Intervention
• Ephemeral SSD > EBS
https://flic.kr/p/2hZy6P
Design
What we set out to build
https://flic.kr/p/2spiXb
Graphite (v3)
…it got complicated…
Graphite (v3)
Ingest:
• carbon-c-relay: https://github.com/grobian/carbon-c-relay
• cyanite: https://github.com/pyr/cyanite
• cassandra
Graphite (v3)
Retrieval:
• graphite-api: https://github.com/brutasse/graphite-api
• grafana: https://github.com/grafana/grafana
• cyanite: https://github.com/pyr/cyanite
• elasticsearch (metric path cache)
Journey
Lessons learned along the way
https://flic.kr/p/hjY15L
Size Tiered Compaction
• Sorted String Table (SSTable) is an immutable data file
• New data written to small SSTables
• Periodically merged into larger SSTables
Size Tiered Compaction
• Merge 4 similarly sized SSTables into 1 new SSTable
• Data migrates into larger SSTables that are less-regularly compacted
• Disk space required: sum of the 4 largest SSTables
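The bucketing and the disk headroom rule above can be sketched in a few lines. This is an illustration only, not Cassandra's implementation; the function names and the example sizes are hypothetical, and the `bucket_low`, `bucket_high`, and `min_threshold` defaults are assumed to match the usual size-tiered settings.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size; return buckets ready to merge."""
    buckets = []  # each bucket is a list of SSTable sizes in MB
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # An SSTable joins a bucket when it is close to the bucket's average size
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_threshold]


def compaction_headroom_mb(sizes_mb, merge_count=4):
    """Worst-case temporary space: the sum of the `merge_count` largest SSTables."""
    return sum(sorted(sizes_mb, reverse=True)[:merge_count])


sizes = [10, 11, 9, 10, 400, 410, 390, 405, 1600]  # made-up sizes in MB
print(size_tiered_buckets(sizes))
print(compaction_headroom_mb(sizes))
```

With these made-up sizes, two buckets are ready to compact, and the node should keep about 2.8 GB free to survive the largest possible merge.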
Size Tiered Compaction
• Updating a partition frequently may cause it to be spread between SSTables
• Metrics workload writes to all partitions, every period
Size Tiered Compaction
• Metrics workload writes to all partitions, every period
• Range queries that spanned 50+ SSTables !!!
Size Tiered Compaction
• Getting to the older data…
• Ingest 25% more data
• Major Compaction:
• Requires 50% free space
• Compacts all SSTables into 1 large SSTable
Aside: DELETE
• DELETE is the INSERT of a TOMBSTONE to the end of a partition
• INSERTs with TTL become tombstones in the future
• Tombstones live for at least gc_grace_seconds
• Data is only deleted during compaction
https://flic.kr/p/35RACf
gc_grace_seconds
Grace is getting something you don’t deserve (time to nodetool repair a node that is down)
gc_grace_seconds
deleted data reappears!
Time To Live
• INSERT with TTL becomes tombstone after expiry
• 10s for 6 hours
• 60s for 3 days
• 300s for 30 days
https://flic.kr/p/6Fxv7M
TTL
• gc_grace_seconds is 10 days (by default)
• 10s for 6 hours -> 10.25 days
• 60s for 3 days -> 13 days
• 300s for 30 days -> 40 days
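The numbers on this slide are each TTL plus the default gc_grace_seconds. A small sketch of that arithmetic, where the dictionary and function names are made up for illustration:

```python
from datetime import timedelta

GC_GRACE = timedelta(days=10)  # default gc_grace_seconds (864000 s)

# Retention tiers from the slide: sample resolution -> TTL applied on INSERT
RETENTION = {
    "10s": timedelta(hours=6),
    "60s": timedelta(days=3),
    "300s": timedelta(days=30),
}


def earliest_purge_days(ttl, gc_grace=GC_GRACE):
    """Days until compaction may drop the data: TTL expiry plus the tombstone grace period."""
    return (ttl + gc_grace) / timedelta(days=1)


for resolution, ttl in RETENTION.items():
    print(f"{resolution}: on disk for at least {earliest_purge_days(ttl)} days")
```

This is a lower bound. The data only disappears once a compaction actually rewrites the SSTable that holds it.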
https://flic.kr/p/gBLHYf
https://flic.kr/p/4LNiXg
https://flic.kr/p/35RACf
1.4TB Disks
Levelled Compaction
based on Google’s LevelDB implementation
Levelled Compaction
• Data is ingested at Level 0
• Immediately compacted and merged with L1
• Partitions are merged up to Ln
• 90% of partition data guaranteed to be in same level
Levelled Compaction
• Metrics workload writes to all partitions, every period
• Immediately rolled up to L1
• Immediately rolled up to L2
• Immediately rolled up to L3
• Immediately rolled up to L4
• Immediately rolled up to L5
Levelled Compaction
• Metrics workload writes to all partitions, every period
• 1 batch of writes -> 5 writes
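A rough way to see why disk writes grow while ingest stays flat is to count how many levels a data set occupies. This is a back-of-the-envelope sketch, not Cassandra's code: it assumes 160 MB SSTables and each level holding ten times the previous one, and the function names are hypothetical.

```python
def levels_needed(data_mb, sstable_mb=160, fanout=10):
    """Number of levels (L1..Ln) required to hold `data_mb`, each level 10x the last."""
    level, total = 0, 0
    while total < data_mb:
        level += 1
        total += sstable_mb * fanout ** level  # capacity of this level in MB
    return level


def disk_writes_mb(batch_mb, levels):
    """If every batch touches every partition, it is rewritten once per level."""
    return batch_mb * levels


for data_mb in (1_000, 20_000, 1_400_000):
    n = levels_needed(data_mb)
    print(f"{data_mb} MB -> {n} levels, a 100 MB batch costs {disk_writes_mb(100, n)} MB of writes")
```

Under these assumptions the cost per batch rises each time the data set spills into a new level, which matches the "1 batch, 5 writes" pattern once five levels are populated.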
Increasing Write rate
Constant Ingest rate
Increasing Write rate
Constant Ingest rate
https://flic.kr/p/4LNiXg
compaction_throughput_mb_per_sec: 128
…then 0 (unlimited)
Speeding Compactions
… Don’t Do This …
multithreaded: true
cassandra_in_memory_compaction_limit_in_mb: 256M
Date Tiered Compaction
Date Tiered Compaction
• Written by Björn Hegerfors at Spotify
• Experimental!
• Released in 2.0.11 / 2.1.1
• Group data by time
• Compact by time
• Drop expired data by time
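The three "by time" ideas can be sketched with fixed-width windows. This is a deliberate simplification and not the actual date-tiered algorithm, which uses windows that grow with age; the tuple layout, function names, and example timestamps are assumptions for illustration.

```python
from collections import defaultdict


def group_by_window(sstables, window_seconds):
    """Bucket (name, max_timestamp) pairs by the time window the newest write falls in."""
    windows = defaultdict(list)
    for name, max_timestamp in sstables:
        windows[max_timestamp // window_seconds].append(name)
    return dict(windows)


def droppable(sstables, now, ttl_seconds, gc_grace_seconds):
    """SSTables whose newest data is past TTL + gc_grace can be deleted whole."""
    cutoff = now - ttl_seconds - gc_grace_seconds
    return [name for name, max_timestamp in sstables if max_timestamp < cutoff]


# Hypothetical SSTables: (file name, newest write timestamp in seconds)
tables = [("a", 100), ("b", 3500), ("c", 3700), ("d", 7300)]
print(group_by_window(tables, window_seconds=3600))
print(droppable(tables, now=10_000, ttl_seconds=3600, gc_grace_seconds=1000))
```

SSTables in the same window are compacted together, and a window that has fully expired is removed without reading or rewriting it.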
Compact SSTables by date window
– but the docs say 8GB maximum heap!
MAX_HEAP_SIZE=16G
HEAP_NEWSIZE=2048M
– Rick Branson, Instagram
http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014
-XX:+CMSScavengeBeforeRemark
-XX:CMSMaxAbortablePrecleanTime=60000
-XX:CMSWaitDuration=30000
All systems normal
Inadvertently tested 30,000 writes/sec during launch
Cloud Native
http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/
Cloud Native
Ec2MultiRegionSnitch
Cloud Native
Ephemeral RAID0
-Djava.io.tmpdir=/mnt/cassandra/tmp
Disable AutoScaling Terminate Process:
aws autoscaling suspend-processes --auto-scaling-group-name <group-name> --scaling-processes Terminate
Cloud Native
This design works to 50 instances per region
Security Groups
IAM instance-profile role
Security Group + (per region) Security Group
Management (OpsCenter)
IAM instance-profile role
Security Group + (per region) Security Group
Internode Encryption
server_encryption_options:
    internode_encryption: all
• keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 \
    -keystore test-cass.keystore
• keytool -export -alias test-cass -keystore test-cass.keystore \
    -rfc -file test-cass.crt
• keytool -import -alias test-cass -file test-cass.crt \
    -keystore test-cass.truststore
Seeds
Cheated…
Seeds
• selects first 3 nodes from each region using Autoscale Group order
• ignores (self) as a seed for bootstrapping first 3 nodes in each region
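The two rules above can be sketched as a small function. This is a guess at the shape of the logic, not the author's code: the function name, the region-to-IP mapping, and the example addresses are hypothetical, and each list is assumed to already be in Autoscale Group order.

```python
def pick_seeds(nodes_by_region, self_ip, per_region=3):
    """Take the first `per_region` nodes of every region, skipping this node itself."""
    seeds = []
    for region in sorted(nodes_by_region):
        candidates = nodes_by_region[region][:per_region]
        # A node must not list itself as a seed while it is bootstrapping
        seeds.extend(ip for ip in candidates if ip != self_ip)
    return seeds


# Hypothetical cluster layout, in Autoscale Group order per region
nodes = {
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2"],
}
print(pick_seeds(nodes, self_ip="10.0.0.2"))
print(pick_seeds(nodes, self_ip="10.0.0.4"))
```

A node outside the first three in its region gets the full seed list, while one of the first three gets the same list minus itself.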
General
• >= 4 Cores per node always
• >= 8 Cores as soon as feasible
• EC2 sweet spots:
• m3.2xlarge (8c/160GB) for small workloads
• i2.2xlarge (8c/1.6TB) for production
• Avoid c3.2xlarge - CPU:Mem ratio is too high
Breaking News!
Dense-storage Instances for EC2
Questions?
d2 instances
Joining a node - system/network
d2 instances
Joining a node - disk performance
General
Metrics
General
Cassandra Metrics
Metrics
CPU - DateTiered
Metrics
JVM - DateTiered
Metrics
Compaction/CommitLog - DateTiered