Riak at Posterous
Julio Capote
San Francisco Riak Meetup, 1/18/2012
A/S/L?
• Julio Capote
• Backend Developer at Posterous
• @capotej
Posterous
• Allows anyone to create multiple private or public spaces (blogs)
• Around since 2008
• Millions of posts and users
• Tons of long tail traffic
Some of the first posts are still being accessed today due to search engines
How we store posts
• Original post body goes into MySQL
• Multiple variants are generated (nojs, mobile, etc.)
• Expensive to generate (sanitizers, expanders)
Enter Variant Cache
• A generic read/write-through cache library
• Started with Memcache
• Moved to Redis
At the time, Redis's disk store looked promising, so we moved from Memcache to Redis
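A minimal sketch of what a generic read/write-through cache for post variants could look like. The class name, method names, and key format are hypothetical (the slides do not show the library's API), and a plain Hash stands in for the Memcache or Redis client.

```ruby
# Hypothetical sketch of a read/write-through variant cache.
# Names and key format are illustrative, not the actual Posterous library.
class VariantCache
  # store: any object responding to [] and []= (a Hash here; a thin wrapper
  # around a Memcache or Redis client in practice).
  def initialize(store = {})
    @store = store
  end

  def key_for(post_id, variant)
    "post:#{post_id}:#{variant}"
  end

  # Read-through: return the cached variant, or run the block to generate it,
  # store the result, and return it.
  def fetch(post_id, variant)
    key = key_for(post_id, variant)
    cached = @store[key]
    return cached unless cached.nil?

    value = yield
    @store[key] = value
    value
  end

  # Write-through: store a freshly generated variant when the post changes.
  def write(post_id, variant, value)
    @store[key_for(post_id, variant)] = value
  end
end

cache = VariantCache.new
renders = 0
render = lambda do
  renders += 1
  "<p>hello</p>" # stand-in for the expensive sanitize/expand step
end

first  = cache.fetch(42, :mobile, &render)
second = cache.fetch(42, :mobile, &render)
```

The second `fetch` is served from the store, so the expensive generation step runs only once per post and variant.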
Redis is awesome, but
• Requires both the key and value to go into memory
• Terrible disk store performance
• Even with 3 machines with 64 GB of RAM, couldn’t fit the entire working set
• Forced to set a TTL
Redis wasn’t really designed to ever hit the disk
The Dream
What we wanted
• Key/Value store
• Disk backed
• Built in distribution
• Use fewer boxes to serve more users
• Consistent performance over raw performance
Percona MySQL / HandlerSocket
MySQL / HandlerSocket: The Good
• Great performance
• Can handle a huge number of rows
• Mature / Safe (at least the MySQL part)
MySQL / HandlerSocket: The Bad
• Sharding definitely not built in
• HandlerSocket is pretty much abandoned
No support going forward
MongoDB: The Good
• Crazy fast
• Built in sharding support
• ...did I mention it was fast?
MongoDB: The Bad
• 30% standard deviation on fetch times (!)
• Would falsely acknowledge a write
This is probably tunable, but still
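To make "consistent performance over raw performance" concrete, here is a small sketch that computes the relative standard deviation (standard deviation divided by mean) of a set of fetch times. The latency samples are made up for illustration and are not measurements from the talk.

```ruby
# Relative standard deviation (population stddev / mean) of fetch times.
def relative_stddev(samples)
  mean = samples.sum.to_f / samples.size
  variance = samples.sum { |s| (s - mean)**2 } / samples.size
  Math.sqrt(variance) / mean
end

# Made-up latencies in milliseconds, for illustration only.
steady = [10, 11, 9, 10, 10, 11, 9, 10] # slower on average, tightly grouped
spiky  = [4, 6, 8, 4, 6, 8]             # faster on average, widely spread

steady_rsd = relative_stddev(steady)
spiky_rsd  = relative_stddev(spiky)
```

The `spiky` store wins on mean latency but has a much larger relative spread, which is the trade-off the slide is objecting to.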
Riak + Bitcask: The Good
• Distributed by default
• Consistent and predictable performance
• Highly concurrent, no perf degradation
• Ops guy loves it!
Riak + Bitcask: The Bad
• Not crazy fast
• Stuck it behind memcache
• Still way faster than generating
• No multi-get support
Write and read through memcache
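The arrangement described above (memcache in front, Riak behind, generation as the last resort) can be sketched roughly as follows. The class and method names are hypothetical, and both tiers are plain Hashes so the sketch runs without any servers.

```ruby
# Hypothetical two-tier lookup: memcache in front, Riak behind, generate last.
# Both tiers are plain Hashes here; real clients would replace them.
class TieredCache
  def initialize(front:, back:)
    @front = front # stands in for memcache
    @back  = back  # stands in for Riak
  end

  def fetch(key)
    hit = @front[key]
    return hit unless hit.nil?

    value = @back[key]
    if value.nil?
      value = yield(key) # the expensive variant generation
      @back[key] = value # write through to the disk-backed tier
    end
    @front[key] = value  # populate the fast tier on the way out
    value
  end

  # With no multi-get on the backing store, a batch is one fetch per key.
  def fetch_many(keys, &generator)
    keys.each_with_object({}) { |key, out| out[key] = fetch(key, &generator) }
  end
end

memcache = {}
riak = { "post:1:mobile" => "cached in riak" }
cache = TieredCache.new(front: memcache, back: riak)

generated = []
result = cache.fetch_many(["post:1:mobile", "post:2:mobile"]) do |key|
  generated << key
  "rendered #{key}"
end
```

Only the key missing from both tiers triggers generation; after the batch, both keys sit in the front tier.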
Riak in production
• Started using our 3-node cluster for the global production cache
• Accidentally turned off a node
• Keys rebalanced, site didn’t skip a beat
• No one even noticed till hours later
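A heavily simplified sketch of why losing one node need not lose any keys. It assumes a ring of 64 partitions and 3 replicas per key (values I am assuming for illustration; the slides do not state the cluster's settings) and ignores everything else Riak does, such as handing data off to fallback nodes. The `Ring` class is hypothetical.

```ruby
require "digest/sha1"

# Simplified ring: partitions are assigned to nodes round-robin, and a key's
# replicas live on the n_val partitions starting at the key's hash position.
# This is an illustration only, not how Riak is implemented.
class Ring
  def initialize(nodes, partitions: 64, n_val: 3)
    @owners = Array.new(partitions) { |i| nodes[i % nodes.size] }
    @n_val = n_val
  end

  # Nodes holding a replica of the key (may contain repeats near the wrap).
  def preference_list(key)
    start = Digest::SHA1.hexdigest(key).to_i(16) % @owners.size
    (0...@n_val).map { |i| @owners[(start + i) % @owners.size] }
  end

  # True if at least one replica of the key is on a node that is still up.
  def available?(key, down: [])
    (preference_list(key) - down).any?
  end
end

ring = Ring.new(%w[riak1 riak2 riak3])
keys = (1..1000).map { |i| "post:#{i}:mobile" }
survivors = keys.count { |k| ring.available?(k, down: ["riak2"]) }
```

Under these assumptions every key still has a live replica with one of three nodes off, which matches the experience on the slide.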
Stats
• 3 nodes
• 2600+ requests/second
• 300+ GB
• ~200 million keys
• 10 GB memcache/host
#Protips
• All nodes can serve all requests, so...
• Use a VIP, or...
• Pass all cluster nodes to client driver (thanks @aphyr!)
• Use curb instead of net/http
• Use keep-alive
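A rough sketch of the "pass all cluster nodes to the client" tip: rotate requests across every node and skip any node that fails. `NodePool` and the `transport` callable are hypothetical stand-ins; in production the transport would be an HTTP client holding a kept-alive connection per node, and the node names here are made up.

```ruby
# Hypothetical wrapper that spreads requests over all nodes and fails over.
# `transport` is any callable taking (node, key); it is injected so the
# sketch runs without a network.
class NodePool
  class AllNodesDown < StandardError; end

  def initialize(nodes, transport)
    @nodes = nodes
    @transport = transport
    @cursor = 0
  end

  def get(key)
    @nodes.size.times do
      node = @nodes[@cursor % @nodes.size]
      @cursor += 1
      begin
        return @transport.call(node, key)
      rescue IOError, SystemCallError
        next # this node failed; try the following one
      end
    end
    raise AllNodesDown, "no node answered for #{key}"
  end
end

served_by = []
transport = lambda do |node, key|
  raise Errno::ECONNREFUSED if node == "riak2" # simulate a node that is off
  served_by << node
  "value of #{key} from #{node}"
end

pool = NodePool.new(%w[riak1 riak2 riak3], transport)
3.times { |i| pool.get("post:#{i}:mobile") }
```

Because any node can serve any request, the caller never needs to know which node was skipped.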
Any Questions?
Thanks for listening!
Special thanks to @twoism, @vincentchu, @kangchen, @argv0, @pharkmillups, @seancribbs, @aphyr, @jrecursive