Cassandra at Lithium

Transcript of Cassandra at Lithium

  • 1. Cassandra at Lithium. Paul Cichonski, Senior Software Engineer, @paulcichonski

  • 2. Lithium? Helping companies build social communities for their customers. Founded in 2001. ~300 customers. ~84 million users. ~5 million unique logins in past 20 days.

  • 3. Use Case: Notification Service. 1. Stores subscriptions. 2. Processes community events. 3. Generates notifications when events match against subscriptions. 4. Builds user activity feed out of notifications.

  • 4. Notification Service System View

  • 5. The Cluster (v1.2.6). 4 nodes, each node: CentOS 6.4, 8 cores, 2TB for commit-log, 3x 512GB SSD for data. Average writes/s: 100-150, peak: 2000. Average reads/s: 100, peak: 1500. Use Astyanax on client-side.

  • 6. Data Model

  • 7. Data Model: Subscription Fulfillment. Diagram callouts: identifies target of subscription; identifies entity that is subscribed.

  • 8. standard_subscription_index row stored as:
      row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
      user:2:creationtimestamp = 1390939665
      user:53:creationtimestamp = 1390939670
      user:88:creationtimestamp = 1390939660
    maps to (cqlsh): [figure not captured]

  • 9. Data Model: Subscription Display (time series)

  • 10. subscriptions_for_entity_by_time row stored as:
      row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
      columns: 1390939670:label:testlabel, 1390939665:board:53, 1390939660:message:13
    maps to (cqlsh): [figure not captured]

  • 11. Data Model: Subscription Display (content browsing)

  • 12. subscriptions_for_entity_by_type row stored as:
      row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2
      message:13:creationtimestamp = 1390939660
      board:53:creationtimestamp = 1390939665
      label:testlabel:creationtimestamp = 1390939670
    maps to (cqlsh): [figure not captured]

  • 13. Data Model: Activity Feed (fan-out writes). Diagram callout: JSON blob representing activity.

  • 14. activity_for_entity row stored as:
      row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
      31aac580-8550-11e3-ad74-000c29351b9d:moderationAction:event_summary = {moderation_json}
      f4efd590-82ca-11e3-ad74-000c29351b9d:badge:event_summary = {badge_json}
      1571b680-7254-11e3-8d70-000c29351b9d:kudos:event_summary = {kudos_json}
    maps to (cqlsh): [figure not captured]

  • 15. Migration Strategy (mysql → cassandra)

  • 16. Data Migration: Trust, but Verify. Fully repeatable due to idempotent writes. 1) Bulk migrate all subscription data (HTTP). 2) Consistency check all subscription data (HTTP); also runs after migration to verify shadow-writes. Diagram labels: lia, NS.

  • 17. Verify: Consistency Checking

  • 18. Subscription Write Strategy. Reads for subscription fulfillment happen in NS. Reads for UI fulfilled by legacy mysql (temporary). Diagram labels: user, subscription_write, NS system boundary, subscription_write (shadow_write), lia, activemq, Notification Service, mysql, Cassandra.

  • 19. Path to Production: QA Issue #1 (many writes to same row kill cluster)

  • 20. Problem: CQL INSERTs. Single thread SLOW, even with BATCH (multiple-second latency for writing chunks of 1000 subscriptions). Largest customer (~20 million subscriptions) would have taken weeks to migrate.

  • 21. Just Use More Threads? Not Quite

  • 22. Cluster Essentially Died

  • 23. Mutations Could Not Keep Up

  • 24. Solution: Work Closer to Storage Layer. Work here:
      row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
      user:2:creationtimestamp = 1390939665
      user:53:creationtimestamp = 1390939670
      user:88:creationtimestamp = 1390939660
    Not here: [figure not captured]

  • 25. Solution: Thrift batch_mutate. More details: [link not captured]. Allowed us to write 200,000 subscriptions to 3 CFs in ~45 seconds with almost no impact on cluster. NOTE: supposedly fixed in 2.0: CASSANDRA-4693

  • 26. Path to Production: QA Issue #2 (read timeouts)

  • 27. Tombstone Buildup and Timeouts. CF holding notification settings rewritten every 30 minutes. Eventually tombstone build-up caused reads to time out.

  • 28. Solution

  • 29. Production Issue #1 (dead cluster)

  • 30. Hard Drive Failure on All Nodes. 4 days after release, we started seeing this in /var/log/cassandra/system.log: [log excerpt not captured]. After following a bunch of dead ends, we also found this in /var/messages.log: [log excerpt not captured]. This cascaded to all nodes and within an hour, cluster was dead.

  • 31. TRIM Support to the Rescue*

  • 32. Production Issue #2 (repair causing tornadoes of destruction)

  • 33. Activity Feed Data Explosion. Activity data written with a TTL of 30 days. Users in 99th percentile were receiving multiple thousands of writes per day. Compacted row maximum size: ~85 MB (after 30 days). Here, be Dragons: CASSANDRA-5799: Column can expire while lazy compacting it...

  • 34. Problem Did Not Surface for 30 Days. Repairs started taking up to a week. Created 1000s of SSTables. High latency: [figure not captured]

  • 35. Solution: Trim Feeds Manually

  • 36. activity_for_entity cfstats

  • 37. How we monitor in Prod. Nodetool, OpsCenter and JMX to monitor cluster. Yammer Metrics at every layer of Notification Service, use Graphite to visualize. Use Netflix Hystrix in Notification Service to guard against cluster failure.

  • 38. Lessons Learned. Have a migration strategy that allows both systems to stay live until you have proven Cassandra in prod. Longevity tests are key, especially if you will have tombstones. Understand how gc_grace_seconds and compaction affect tombstone cleanup. Test with production data loads if you can.

  • 39. Questions? @paulcichonski
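
The lesson about gc_grace_seconds and the tombstone slide (a CF rewritten every 30 minutes) can be illustrated with a small back-of-the-envelope sketch. It is not taken from the talk. It assumes Cassandra's default gc_grace_seconds of 864000 (10 days), assumes every rewrite leaves one generation of tombstones behind, and ignores the other conditions compaction needs before it actually drops a tombstone. The names `Tombstone`, `is_purgeable`, and `unpurgeable_count` are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra default: 10 days
REWRITE_INTERVAL_SECONDS = 30 * 60  # "rewritten every 30 minutes" (from the slides)


@dataclass(frozen=True)
class Tombstone:
    deleted_at: int  # epoch seconds at which the delete was written


def is_purgeable(t: Tombstone, now: int,
                 gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Simplified rule: compaction may only drop a tombstone once
    gc_grace_seconds have passed since the delete was written."""
    return now - t.deleted_at > gc_grace_seconds


def unpurgeable_count(now: int, rewrite_interval: int,
                      gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> int:
    """Count tombstone generations that reads may still have to scan,
    assuming (hypothetically) one generation per rewrite since time 0."""
    tombstones = [Tombstone(ts) for ts in range(0, now, rewrite_interval)]
    return sum(not is_purgeable(t, now, gc_grace_seconds) for t in tombstones)


thirty_days = 30 * 86_400
with_default = unpurgeable_count(thirty_days, REWRITE_INTERVAL_SECONDS)
with_one_hour = unpurgeable_count(thirty_days, REWRITE_INTERVAL_SECONDS,
                                  gc_grace_seconds=3_600)
print(with_default, with_one_hour)
```

Under these assumptions, the default setting keeps several hundred rewrite generations un-purgeable at any moment, while a much shorter gc_grace_seconds keeps only a handful. Shortening it has its own cost: repairs must complete within that window, or deleted data can reappear.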