Cassandra at Lithium

Page 1: Cassandra at Lithium

Cassandra at Lithium

Paul Cichonski, Senior Software Engineer

@paulcichonski

Page 2: Cassandra at Lithium

Lithium?

• Helping companies build social communities for their customers

• Founded in 2001

• ~300 customers

• ~84 million users

• ~5 million unique logins in past 20 days

Page 3: Cassandra at Lithium

Use Case: Notification Service

1. Stores subscriptions

2. Processes community events

3. Generates notifications when events match against subscriptions

4. Builds user activity feed out of notifications
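
The four responsibilities above can be pictured with a toy in-memory model. This is only an illustrative sketch: the class and method names (`NotificationService`, `subscribe`, `process_event`) are hypothetical and not taken from the slides, and plain dictionaries stand in for Cassandra.

```python
from collections import defaultdict


class NotificationService:
    """Toy in-memory model of the four responsibilities (hypothetical names)."""

    def __init__(self):
        # 1. subscriptions: target (e.g. "message:13") -> subscribers (e.g. "user:2")
        self.subscriptions = defaultdict(set)
        # 4. activity feed: subscriber -> list of notifications, oldest first
        self.feeds = defaultdict(list)

    def subscribe(self, subscriber, target):
        self.subscriptions[target].add(subscriber)

    def process_event(self, event):
        """2 + 3. Match an event against subscriptions and notify each subscriber."""
        notified = []
        for subscriber in sorted(self.subscriptions.get(event["target"], ())):
            notification = {"type": event["type"], "target": event["target"]}
            self.feeds[subscriber].append(notification)
            notified.append(subscriber)
        return notified


ns = NotificationService()
ns.subscribe("user:2", "message:13")
ns.subscribe("user:53", "message:13")
notified = ns.process_event({"type": "kudos", "target": "message:13"})
```

An event whose target has no subscribers produces no notifications and leaves every feed untouched.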

Page 4: Cassandra at Lithium

Notification Service System View

Page 5: Cassandra at Lithium

The Cluster (v1.2.6)

• 4 nodes, each node:

  – CentOS 6.4

  – 8 cores, 2TB for commit-log, 3x 512GB SSD for data

• Average writes/s: 100-150, peak: 2000

• Average reads/s: 100, peak: 1500

• Use Astyanax on client-side

Page 6: Cassandra at Lithium

Data Model

Page 7: Cassandra at Lithium

Data Model: Subscriptions Fulfillment

identifies target of subscription

identifies entity that is subscribed

Page 8: Cassandra at Lithium

standard_subscription_index row

stored as:

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13

column name | value
user:2:creationtimestamp | 1390939660
user:53:creationtimestamp | 1390939665
user:88:creationtimestamp | 1390939670

maps to (cqlsh): [image not captured in transcript]

Page 9: Cassandra at Lithium

Data Model: Subscription Display (time series)

Page 10: Cassandra at Lithium

subscriptions_for_entity_by_time row

stored as:

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0

column names:
1390939660:message:13
1390939665:board:53
1390939670:label:testlabel

maps to (cqlsh): [image not captured in transcript]

Page 11: Cassandra at Lithium

Data Model: Subscription Display (content browsing)

Page 12: Cassandra at Lithium

subscriptions_for_entity_by_type row

stored as:

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2

column name | value
message:13:creationtimestamp | 1390939660
board:53:creationtimestamp | 1390939665
label:testlabel:creationtimestamp | 1390939670

maps to (cqlsh): [image not captured in transcript]
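
The three subscription rows shown so far appear to hold the same subscription denormalized three ways. The sketch below builds the row key, column name and value for each column family from one subscription, following the string layouts visible in the example rows. Several names are assumptions: `tenant_id` is a hypothetical label for the leading UUID (the slides do not say what it identifies), `bucket` is a hypothetical label for the trailing `0` in the time-series row key, and the time-series column value is left as `None` because the slide does not show one.

```python
def cells_for_subscription(tenant_id, subscriber, target, created_at, bucket=0):
    """Return {column family: (row key, column name, value)} for one subscription.

    subscriber and target are (type, id) pairs, e.g. ("user", "2").
    """
    sub_type, sub_id = subscriber
    tgt_type, tgt_id = target
    return {
        # Fulfillment: "who is subscribed to this target?"
        "standard_subscription_index": (
            f"{tenant_id}:{tgt_type}:{tgt_id}",
            f"{sub_type}:{sub_id}:creationtimestamp",
            created_at,
        ),
        # Display as a time series: columns sort by creation timestamp.
        "subscriptions_for_entity_by_time": (
            f"{tenant_id}:{sub_type}:{sub_id}:{bucket}",
            f"{created_at}:{tgt_type}:{tgt_id}",
            None,  # value is not shown on the slide
        ),
        # Display by content type: columns sort by target type.
        "subscriptions_for_entity_by_type": (
            f"{tenant_id}:{sub_type}:{sub_id}",
            f"{tgt_type}:{tgt_id}:creationtimestamp",
            created_at,
        ),
    }


cells = cells_for_subscription(
    "66edfdb7-6ff7-458c-94a8-421627c1b6f5", ("user", "2"), ("message", "13"), 1390939660
)
```

Each read pattern (fulfillment, time-ordered display, browsing by content type) then needs only a single row lookup, at the cost of three writes per subscription.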

Page 13: Cassandra at Lithium

Data Model: Activity Feed (fan-out writes)

JSON blob representing activity

Page 14: Cassandra at Lithium

activity_for_entity row

stored as:

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0

column name | value
1571b680-7254-11e3-8d70-000c29351b9d:kudos:event_summary | {kudos_json}
31aac580-8550-11e3-ad74-000c29351b9d:moderationAction:event_summary | {moderation_json}
f4efd590-82ca-11e3-ad74-000c29351b9d:badge:event_summary | {badge_json}

maps to (cqlsh): [image not captured in transcript]

Page 15: Cassandra at Lithium

Migration Strategy (mysql to cassandra)

Page 16: Cassandra at Lithium

Data Migration: Trust, but Verify

lia NS

1) Bulk Migrate all subscription data (HTTP)

2) Consistency check all subscription data (HTTP)

Also runs after migration to verify shadow-writes

Fully repeatable due to idempotent writes
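
A minimal sketch of why idempotent writes make the migrate-then-verify loop repeatable. Dictionaries stand in for the two stores and for the HTTP calls on the slide; `migrate` and `consistency_check` are hypothetical names, not the actual tooling.

```python
def migrate(source, target):
    """Copy every subscription. Each write is a keyed upsert, so re-running is harmless."""
    for key, value in source.items():
        target[key] = value


def consistency_check(source, target):
    """Report keys that are missing, different, or unexpected in the target store."""
    return {
        "missing": sorted(k for k in source if k not in target),
        "mismatched": sorted(k for k in source if k in target and target[k] != source[k]),
        "extra": sorted(k for k in target if k not in source),
    }


# (subscriber, target) -> creation timestamp
mysql = {("user:2", "message:13"): 1390939660, ("user:53", "message:13"): 1390939665}
cassandra = {}

migrate(mysql, cassandra)
migrate(mysql, cassandra)  # a second run leaves the target unchanged
report = consistency_check(mysql, cassandra)
```

The same check can be run again later: a subscription written to the source but never shadow-written to the target shows up under `missing`.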

Page 17: Cassandra at Lithium

Verify: Consistency Checking

Page 18: Cassandra at Lithium

Subscription Write Strategy

[Diagram components: user, lia, mysql, activemq, Notification Service, Cassandra; the NS system boundary is marked. Arrows are labeled subscription_write and subscription_write (shadow_write).]

Reads for UI fulfilled by legacy mysql (temporary)

Reads for subscription fulfillment happen in ns.
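
One plausible reading of the write strategy, sketched with in-memory stand-ins: the legacy write goes to mysql as before, and a copy is queued as a shadow write that the Notification Service later applies to Cassandra. The exact wiring is an assumption, since the diagram itself is not preserved; the function names are hypothetical, and a `deque` stands in for the message broker.

```python
from collections import deque

mysql = {}              # stand-in for the legacy store (still serves UI reads)
cassandra = {}          # stand-in for the new store (serves fulfillment reads)
shadow_queue = deque()  # stand-in for the message broker


def subscription_write(subscriber, target, created_at):
    """Legacy write path plus a queued shadow write."""
    mysql[(subscriber, target)] = created_at
    shadow_queue.append((subscriber, target, created_at))


def drain_shadow_writes():
    """Consumer side: apply queued shadow writes to the new store."""
    applied = 0
    while shadow_queue:
        subscriber, target, created_at = shadow_queue.popleft()
        cassandra[(subscriber, target)] = created_at  # keyed upsert, safe to replay
        applied += 1
    return applied


subscription_write("user:2", "message:13", 1390939660)
applied = drain_shadow_writes()
```

Because both stores keep receiving every write, either one can serve reads while the new system is being proven.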

Page 19: Cassandra at Lithium

Path to Production: QA Issue #1 (many writes to same row kill cluster)

Page 20: Cassandra at Lithium

Problem: CQL INSERTS

Single thread: SLOW, even with BATCH (multi-second latency when writing chunks of 1000 subscriptions)

Largest customer (~20 million subscriptions) would have taken weeks to migrate

Page 21: Cassandra at Lithium

Just Use More Threads? Not Quite

Page 22: Cassandra at Lithium

Cluster Essentially Died

Page 23: Cassandra at Lithium

Mutations Could Not Keep Up

Page 24: Cassandra at Lithium

Solution: Work Closer to Storage Layer

Work here:

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13

column name | value
user:2:creationtimestamp | 1390939660
user:53:creationtimestamp | 1390939665
user:88:creationtimestamp | 1390939670

Not here: [image not captured in transcript]

Page 25: Cassandra at Lithium

Solution: Thrift batch_mutate

More details: http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html

Allowed us to write 200,000 subscriptions to 3 CFs in ~45 seconds with almost no impact on cluster.

NOTE: supposedly fixed in 2.0: CASSANDRA-4693
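
For scale, the figures quoted on the slides can be turned into a rough estimate for the largest customer (~20 million subscriptions). This is back-of-the-envelope arithmetic only; it assumes the measured rate holds linearly at larger volumes, which the slides do not claim.

```python
subscriptions_written = 200_000   # from the slide
elapsed_seconds = 45              # from the slide (approximate)
largest_customer = 20_000_000     # from the slide (approximate)

# Observed throughput: roughly 4,444 subscriptions per second
rate = subscriptions_written / elapsed_seconds

# Estimated time for the largest customer at that rate: about 75 minutes
estimated_minutes = largest_customer / rate / 60
```

Even allowing generous margin for error, that is hours rather than the weeks projected for single-threaded CQL INSERTs.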

Page 26: Cassandra at Lithium

Path to Production: QA Issue #2 (read timeouts)

Page 27: Cassandra at Lithium

Tombstone Buildup and Timeouts

CF holding notification settings re-written every 30 minutes

Eventually tombstone build-up caused reads to time out
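
A toy model of how tombstones can pile up under a 30-minute rewrite cycle. The assumptions are stated loudly: each rewrite is assumed to delete the previous value (one tombstone per setting per cycle), tombstones are assumed to be purged as soon as gc_grace_seconds has elapsed (real compaction is usually less prompt), and the table is assumed to use Cassandra's default gc_grace_seconds of 864000 (10 days). None of these details are confirmed by the slides.

```python
GC_GRACE_SECONDS = 864_000          # Cassandra's default gc_grace_seconds (10 days)
REWRITE_INTERVAL_SECONDS = 30 * 60  # settings re-written every 30 minutes (from the slide)


def live_tombstones(elapsed_seconds, tombstones_per_rewrite=1):
    """Tombstones per setting that are not yet eligible for purge.

    Only tombstones younger than gc_grace_seconds must be kept, so the count
    grows with time and then plateaus once the window is full.
    """
    window = min(elapsed_seconds, GC_GRACE_SECONDS)
    return (window // REWRITE_INTERVAL_SECONDS) * tombstones_per_rewrite


after_one_day = live_tombstones(24 * 60 * 60)        # 48
after_thirty_days = live_tombstones(30 * 24 * 60 * 60)  # 480
```

Under these assumptions a read has to skip hundreds of tombstones per live value, which is consistent with the problem only appearing after the system has run for a while.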

Page 28: Cassandra at Lithium

Solution

Page 29: Cassandra at Lithium

Production Issue #1 (dead cluster)

Page 30: Cassandra at Lithium

Hard Drive Failure on All Nodes

4 days after release, we started seeing this in /var/log/cassandra/system.log

After following a bunch of dead ends, we also found this in /var/messages.log

This cascaded to all nodes and within an hour, cluster was dead

Page 31: Cassandra at Lithium

TRIM Support to the Rescue

* http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

Page 32: Cassandra at Lithium

Production Issue #2 (repair causing tornadoes of destruction)

Page 33: Cassandra at Lithium

Activity Feed Data Explosion

• Activity data written with a TTL of 30 days.

• Users in 99th percentile were receiving multiple thousands of writes per day.

• compacted row maximum size: ~85 MB (after 30 days)

Here, be Dragons:

  – CASSANDRA-5799: Column can expire while lazy compacting it...

Page 34: Cassandra at Lithium

Problem Did Not Surface for 30 Days

• Repairs started taking up to a week

• Created 1000’s of SSTables

• High latency:

Page 35: Cassandra at Lithium

Solution: Trim Feeds Manually
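
The slide does not preserve how the trimming was implemented, so the following is only a sketch of one way to cap a feed: keep the newest entries and delete the rest. It relies on the feed's column names starting with a time-based UUID, as in the activity_for_entity example row; `trim_feed` and `max_entries` are hypothetical names.

```python
import uuid


def timeuuid_timestamp(column_name):
    """Extract the 60-bit timestamp from the TimeUUID that prefixes a column name."""
    return uuid.UUID(column_name.split(":", 1)[0]).time


def trim_feed(column_names, max_entries):
    """Return (columns to keep, columns to delete), newest first."""
    newest_first = sorted(column_names, key=timeuuid_timestamp, reverse=True)
    return newest_first[:max_entries], newest_first[max_entries:]


feed = [
    "1571b680-7254-11e3-8d70-000c29351b9d:kudos:event_summary",
    "31aac580-8550-11e3-ad74-000c29351b9d:moderationAction:event_summary",
    "f4efd590-82ca-11e3-ad74-000c29351b9d:badge:event_summary",
]
kept, to_delete = trim_feed(feed, max_entries=2)
```

With the three example columns and a cap of two, the kudos entry is the oldest and is the one selected for deletion. Note that explicit deletes create tombstones of their own, so a real job would need to balance trim frequency against tombstone buildup.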

Page 36: Cassandra at Lithium

activity_for_entity cfstats

Page 37: Cassandra at Lithium

How we monitor in Prod

• Nodetool, OpsCenter and JMX to monitor cluster

• Yammer Metrics at every layer of Notification Service, with Graphite to visualize

• Use Netflix Hystrix in Notification Service to guard against cluster failure
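
Hystrix is a Java library; the sketch below is not its API. It is a minimal Python illustration of the circuit-breaker pattern that Hystrix provides: after repeated failures, calls fail fast to a fallback instead of piling onto a struggling cluster. The class name, parameters and thresholds are all hypothetical.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: closed -> open after N failures -> trial call after a timeout."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                return fallback()   # open: fail fast without touching the cluster
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # success closes the circuit again
        return result
```

A caller would wrap each cluster read, for example `breaker.call(read_feed, lambda: [])`, where `read_feed` is a hypothetical function that queries Cassandra and an empty feed is the degraded response.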

Page 38: Cassandra at Lithium

Lessons Learned

• Have a migration strategy that allows both systems to stay live until you have proven Cassandra in prod

• Longevity tests are key, especially if you will have tombstones

• Understand how gc_grace_seconds and compaction affect tombstone cleanup

• Test with production data loads if you can

Page 39: Cassandra at Lithium

Questions?

@paulcichonski