Cassandra @Formspring
Tuesday, August 30, 2011
Yet another Knight, Dragon and Princess Story.
Formspring helps people find out more about each other by sharing interesting & personal responses
25M users
3.5B responses
Follow[ing|ers]
Our situation before Cassandra
MySQL + Memcache
• Flat MySQL table
• Indexes (Followers vs Following)
• Memcache
Memcache
• Stored as JSON list
• Can get out of sync
MySQL
• Hundreds of millions of rows
• m2.4xl EC2 instance
It all started with a Feature.
QOTD
• Feb: 35K Followers
• Mid-Feb: 100K
• Growing about 30K a week.
• 200K: closer to Memcache 1MB limit.
• (with gzip compression . . .)
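A rough back-of-the-envelope check of the 1MB concern, as a sketch only. The 8-digit account IDs and the use of `json` and `zlib` are assumptions for illustration; the slides do not show the real ID format or compression settings.

```python
import json
import random
import zlib

MEMCACHE_ITEM_LIMIT = 1024 * 1024  # default Memcache item size limit (1MB)


def follower_blob_sizes(n_followers, seed=0):
    """Return (raw_bytes, compressed_bytes) for a JSON list of follower IDs."""
    rng = random.Random(seed)
    # Hypothetical 8-digit account IDs; real IDs may be shorter or longer.
    ids = [rng.randrange(10_000_000, 100_000_000) for _ in range(n_followers)]
    raw = json.dumps(ids).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


for n in (35_000, 100_000, 200_000):
    raw, packed = follower_blob_sizes(n)
    print(f"{n:>7} followers: raw={raw:>9} B, compressed={packed:>9} B, "
          f"raw over limit: {raw > MEMCACHE_ITEM_LIMIT}")
```

Under these assumptions each ID costs about 10 bytes in the JSON list, so the uncompressed blob passes 1MB well before 200K followers, and compression only postpones the problem.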
Then we added more features . . .
Temporary solution
Redis
• Key : JSON blob of IDs
• ~2-3s / insert
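A minimal sketch of why a "key : JSON blob of IDs" layout makes inserts expensive: adding one follower means decoding, scanning, and rewriting the whole list. `FakeKV` is an in-memory stand-in for a GET/SET key-value store, and the `followers:<id>` key layout is hypothetical; the slides do not show the actual code.

```python
import json


class FakeKV:
    """In-memory stand-in for a key-value store such as Redis (GET/SET only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def add_follower(store, user_id, follower_id):
    """Append one follower by rewriting the whole JSON blob."""
    key = f"followers:{user_id}"              # hypothetical key layout
    blob = store.get(key)
    ids = json.loads(blob) if blob else []    # decode every existing ID
    if follower_id not in ids:                # linear scan over the list
        ids.append(follower_id)
    store.set(key, json.dumps(ids))           # re-encode and rewrite everything


store = FakeKV()
for fid in (1234, 1238, 1234):
    add_follower(store, 42, fid)
print(store.get("followers:42"))
```

The work per insert grows with the number of followers already stored, which is consistent with inserts getting slow for very large lists.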
[Diagram: Ask flow on MySQL + Memcache]
Components: Frontend Ask, OutboxQW, InboxQWs, Inbox Object, MySQL / Memcache
• OutboxQW query: SELECT account_id FROM follow WHERE following = 1234
• MySQL / Memcache: OH SHIT
• Gazillion messages sent to the InboxQWs, which write Inbox Objects
• InboxQWs: *HANGING*, then *OOM*
• Multiple OutboxQWs; MySQL / Memcache: OOM
We need a solution
Fast
Step 1: Research
Our first shot with Cassandra
All Responses
Index of responses to questions on the site.
Ask Followers
How to scale?
Build a Social Graph
• Python
• Cassandra (Obviously..)
• PyCassa
• Thrift
4 types of actions
• Following
• Followers
• Blocks
• Blockers
Thrift
Cassandra
Python + Pycassa
• Column Families
• Get list of Follow[ers/ing]
• More Queries
• Follow user
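The code shown on these slides did not survive in this transcript. As an illustration only, here is a minimal sketch of how follow relationships can be laid out as column families. `FakeColumnFamily` is an in-memory stand-in whose `insert`, `get`, and `get_count` methods loosely mirror Pycassa's `ColumnFamily` methods of the same names; the row and column layout is an assumption, not Formspring's actual schema.

```python
from collections import defaultdict


class FakeColumnFamily:
    """In-memory stand-in for a column family: row key -> sorted columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, key, columns):
        self._rows[key].update(columns)   # overwriting a column is harmless

    def get(self, key, column_start="", column_count=100):
        names = sorted(n for n in self._rows[key] if n >= column_start)
        return {n: self._rows[key][n] for n in names[:column_count]}

    def get_count(self, key):
        return len(self._rows[key])


following = FakeColumnFamily()   # row: user -> users they follow
followers = FakeColumnFamily()   # row: user -> users following them


def follow(user_id, target_id, timestamp):
    """Record both directions of the edge so each list is a single-row read."""
    following.insert(user_id, {target_id: timestamp})
    followers.insert(target_id, {user_id: timestamp})


def get_followers(user_id, start="", count=100):
    return list(followers.get(user_id, column_start=start, column_count=count))


follow("1234", "42", 1314662400)
follow("1238", "42", 1314662401)
follow("1234", "42", 1314662400)   # repeating a follow changes nothing
print(get_followers("42"))
print(followers.get_count("42"))
```

Blocks and Blockers would follow the same two-directional pattern with their own pair of column families.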
Switch to Production
Simultaneous writes to MySQL and Cassandra
Cassandra MySQL
Read x
Write x x
Idempotence is your friend.
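A minimal sketch of the dual-write phase, assuming set-like storage so that repeating a write has no extra effect. Plain dicts stand in for MySQL and Cassandra, and which store serves reads during this phase is treated as a parameter here; all names are hypothetical.

```python
class DualWriteGraph:
    """Write every follow to both stores; serve reads from one of them."""

    def __init__(self, read_store, other_store):
        self.read_store = read_store
        self.other_store = other_store

    def follow(self, user_id, target_id):
        # Adding to a set is idempotent: replaying the same write after a
        # partial failure leaves both stores in the same state.
        for store in (self.read_store, self.other_store):
            store.setdefault(user_id, set()).add(target_id)

    def following(self, user_id):
        return sorted(self.read_store.get(user_id, set()))


mysql, cassandra = {}, {}          # plain dicts standing in for the two stores
graph = DualWriteGraph(read_store=mysql, other_store=cassandra)
graph.follow("1234", "42")
graph.follow("1234", "42")         # a retried write is harmless
print(graph.following("1234"), mysql == cassandra)
```

Because a replayed write cannot corrupt anything, the application can retry freely instead of tracking which of the two stores already received a given write.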
Migration from MySQL to Cassandra
Load batches of user info into Cassandra.
It’s OK if it fails.
Remember: idempotence is your friend.
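A minimal sketch of a batch backfill that tolerates failures, assuming each write is idempotent. The batch size, retry count, and the `flaky_write` helper that simulates a failure halfway through a batch are all hypothetical.

```python
def migrate(rows, write_batch, batch_size=2, max_attempts=3):
    """Copy (user_id, target_id) rows in batches; retry a batch if it fails."""
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        for attempt in range(1, max_attempts + 1):
            try:
                write_batch(batch)
                break
            except IOError:
                if attempt == max_attempts:
                    raise


target = {}            # dict standing in for the destination store
calls = {"n": 0}


def flaky_write(batch):
    """Write a batch, failing halfway through on the very first call."""
    calls["n"] += 1
    for n, (user_id, target_id) in enumerate(batch):
        if calls["n"] == 1 and n == 1:
            raise IOError("simulated failure midway through a batch")
        target.setdefault(user_id, set()).add(target_id)


rows = [("1234", "42"), ("1238", "42"), ("1999", "7")]
migrate(rows, flaky_write)
print(target)
```

The first batch is half written when it fails, then written again in full on retry. Since re-adding an existing edge changes nothing, the end state is the same as if the failure had never happened.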
Adjust your capacity
From 4 instances to 12 instances
New Ask Followers Infrastructure
[Diagram: Ask flow on the Cassandra cluster]
Components: Frontend Ask, OutboxQW, IterativeKW, InboxQW, Cassandra Cluster
• get_followers(user_id, 0, N) returns [‘1234’, ‘1238’, ..., ‘1999’]
• N messages go to the InboxQW, which writes N Inbox Objects
• Next call: get_followers(user_id, N, N+N)
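A minimal sketch of the paged iteration in the diagram above: followers are fetched N at a time, and each page is turned into inbox messages before the next page is requested. The in-memory `FOLLOWERS` table, the `fan_out` name, and the treatment of the two numeric arguments as list offsets are assumptions for illustration.

```python
FOLLOWERS = {"42": ["1234", "1238", "1500", "1999", "2001"]}  # stand-in data


def get_followers(user_id, start, stop):
    """Stand-in for get_followers(user_id, start, stop) from the diagram."""
    return FOLLOWERS.get(user_id, [])[start:stop]


def fan_out(user_id, question, page_size, enqueue):
    """Walk the follower list one page at a time instead of loading it whole."""
    start = 0
    pages = 0
    while True:
        page = get_followers(user_id, start, start + page_size)
        if not page:
            break
        for follower_id in page:
            enqueue((follower_id, question))   # one inbox message per follower
        start += page_size
        pages += 1
    return pages


inbox = []
pages = fan_out("42", "Question of the day?", page_size=2, enqueue=inbox.append)
print(pages, len(inbox))
```

Memory use per step is bounded by the page size rather than by the total follower count, which is what the earlier single-query approach could not guarantee.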
Count Following
14K/min hits at peak time
Count Followers
40K/min hits at peak time
Reduce our #instances
• Before: 12x InboxQWs, 12x OutboxQWs
• Then: 12x InboxQWs, 4x OutboxQWs
• Finally: 12x Inbox + Iterator QW, 4x OutboxQWs
Counts for free with 0.8.1
Counting . . .
Counting is hard
Let’s go Shopping.
Counting
• Fast
• Not-accurate
• Faking “Read-repair”
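The slides do not show how the count was implemented, so the following is only one possible reading of "fast, not accurate, faking read-repair": keep a cheap cached count, and on a small fraction of reads replace it with an exact recount. `ApproxCounter`, `repair_chance`, and the `exact_count` callback are hypothetical names.

```python
import random


class ApproxCounter:
    """Cached counts that are occasionally corrected from an exact recount."""

    def __init__(self, exact_count, repair_chance=0.01, rng=None):
        self.exact_count = exact_count      # slow, authoritative count
        self.repair_chance = repair_chance  # fraction of reads that recount
        self._rng = rng or random.Random()
        self._cached = {}

    def incr(self, key, delta=1):
        self._cached[key] = self._cached.get(key, 0) + delta

    def read(self, key):
        missing = key not in self._cached
        if missing or self._rng.random() < self.repair_chance:
            self._cached[key] = self.exact_count(key)   # the "repair"
        return self._cached[key]


followers = {"42": {"1234", "1238"}}
counter = ApproxCounter(lambda k: len(followers.get(k, ())), repair_chance=0.0)
first = counter.read("42")      # first read falls back to the exact count
counter.incr("42")              # a stray increment makes the cache drift
drifted = counter.read("42")
counter.repair_chance = 1.0     # force a repair on the next read
repaired = counter.read("42")
print(first, drifted, repaired)
```

Most reads stay cheap, and any drift is bounded in time because some later read will overwrite the cached value with the exact one.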
Ghetto Count
Cassandra allowed us:
• To create new features.
• To scale.
• To save money.
• To sleep. (Almost there)
• Lots of fun :)
Come work with us!