Riak at Posterous

Riak at Posterous, by Julio Capote. San Francisco Riak Meetup, 1/18/2012

description

Posterous recently deployed Riak to serve as their content cache. In this talk, Julio Capote will cover why the engineering team chose Riak for the use case. He'll also share some details on the old post cache and its problems, what solutions they evaluated, and how they settled on Riak.

Transcript of Riak at Posterous

Page 1: Riak at Posterous

Riak at Posterous

Julio Capote

San Francisco Riak Meetup, 1/18/2012

Page 2: Riak at Posterous

A/S/L?

• Julio Capote

• Backend Developer at Posterous

• @capotej

Page 3: Riak at Posterous

• Allows anyone to create multiple private or public spaces (blogs)

• Around since 2008

• Millions of posts and users

• Tons of long tail traffic

Some of the first posts are still being accessed today due to search engines

Page 4: Riak at Posterous

How we store posts

• Original post body goes into MySQL

• Multiple variants are generated (nojs, mobile, etc)

• Expensive to generate (sanitizers, expanders)

Page 5: Riak at Posterous

Enter Variant Cache

• A generic read/write-through cache library

• Started with Memcache

• Moved to Redis

At the time, Redis's disk store looked promising, so we moved from Memcache to Redis
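The read/write-through idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the actual Posterous library: the class name `VariantCache`, the key format, and the `generate` callback are all hypothetical, and a plain dict stands in for Memcache, Redis, or Riak.

```python
from typing import Callable, Dict, Optional


class VariantCache:
    """Hypothetical read/write-through cache in front of an expensive generator."""

    def __init__(self, generate: Callable[[int, str], str],
                 store: Optional[Dict[str, str]] = None) -> None:
        self.generate = generate  # the expensive step (sanitizers, expanders)
        self.store = {} if store is None else store  # stand-in for the real backend
        self.misses = 0

    @staticmethod
    def key(post_id: int, variant: str) -> str:
        # Assumed key layout; the slides do not say how keys were built.
        return f"post:{post_id}:{variant}"

    def read(self, post_id: int, variant: str) -> str:
        """Read-through: on a miss, generate the variant and store it."""
        k = self.key(post_id, variant)
        if k not in self.store:
            self.misses += 1
            self.store[k] = self.generate(post_id, variant)
        return self.store[k]

    def write(self, post_id: int, variant: str) -> str:
        """Write-through: regenerate and overwrite when the post changes."""
        body = self.generate(post_id, variant)
        self.store[self.key(post_id, variant)] = body
        return body
```

Callers only ever talk to `read` and `write`, which is what makes it possible to swap the backend without touching the rest of the application.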

Page 6: Riak at Posterous

Redis is awesome, but

• Requires both the key and value to go into memory

• Terrible disk store performance

• Even with 3 machines with 64 GB of RAM each, we couldn’t fit the entire working set

• Forced to set a TTL

Redis wasn’t really designed to ever hit the disk

Page 7: Riak at Posterous
Page 8: Riak at Posterous

The Dream

Page 9: Riak at Posterous
Page 10: Riak at Posterous

What we wanted

• Key/Value store

• Disk backed

• Built in distribution

• Use fewer boxes to serve more users

• Consistent performance over raw performance

Page 11: Riak at Posterous

Percona MySQL / HandlerSocket

Page 12: Riak at Posterous

MySQL / HandlerSocket

The Good

• Great performance

• Can handle a huge number of rows

• Mature / Safe (at least the MySQL part)

Page 13: Riak at Posterous

MySQL / HandlerSocket

The Bad

• Sharding definitely not built in

• HandlerSocket is pretty much abandoned

No support going forward

Page 14: Riak at Posterous
Page 15: Riak at Posterous

MongoDB

The Good

• Crazy fast

• Built in sharding support

• ...did I mention it was fast?

Page 16: Riak at Posterous
Page 17: Riak at Posterous

MongoDB

The Bad

• 30% standard deviation on fetch times (!)

• Would falsely acknowledge a write

This is probably tunable, but still
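One way to read the "30% standard deviation" figure is as the standard deviation of fetch times relative to their mean (the coefficient of variation). That reading is an assumption, since the slide does not say how the number was measured. A small sketch of the calculation, with made-up sample latencies:

```python
import statistics
from typing import Sequence


def relative_spread(samples_ms: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (coefficient of variation)."""
    mean = statistics.mean(samples_ms)
    return statistics.stdev(samples_ms) / mean


# Illustrative latencies in milliseconds, not measurements from the talk.
steady = [10.0, 10.0, 10.0, 10.0]
jittery = [7.0, 10.0, 13.0]

print(relative_spread(steady))   # 0.0
print(round(relative_spread(jittery), 3))  # 0.3
```

Both sample sets have the same 10 ms mean, which is why the deck's goal of "consistent performance over raw performance" needs a spread metric and not just an average.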

Page 18: Riak at Posterous
Page 19: Riak at Posterous

Riak + Bitcask

The Good

• Distributed by default

• Consistent and predictable performance

• Highly concurrent, no perf degradation

• Ops guy loves it!

Page 20: Riak at Posterous

Riak + Bitcask

The Bad

• Not crazy fast

• Stuck it behind memcache

• Still way faster than generating

• No multi get support

Write and read through memcache
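The layering described here (memcache in front of Riak, with generation as the last resort) might look something like the sketch below. Everything in it is hypothetical: `TieredCache` is not the Posterous code, plain dicts stand in for memcache and Riak, and the `multi_get` helper is just one possible workaround for the missing multi get, issuing single gets in parallel threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class TieredCache:
    """Hypothetical two-tier cache: fast tier, then slow tier, then generate."""

    def __init__(self, fast: Dict[str, str], slow: Dict[str, str],
                 generate: Callable[[str], str]) -> None:
        self.fast = fast          # stand-in for memcache
        self.slow = slow          # stand-in for Riak
        self.generate = generate  # expensive variant generation

    def get(self, key: str) -> str:
        value = self.fast.get(key)
        if value is not None:
            return value
        value = self.slow.get(key)
        if value is None:
            # Miss in both tiers: generate and write through to the slow tier.
            value = self.generate(key)
            self.slow[key] = value
        # Populate the fast tier on the way back out.
        self.fast[key] = value
        return value

    def multi_get(self, keys: Iterable[str], workers: int = 8) -> Dict[str, str]:
        """Emulate multi get by running single gets concurrently."""
        keys = list(keys)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(keys, pool.map(self.get, keys)))
```

With a real network backend the parallel gets overlap their round trips, so the total wait is closer to the slowest single get than to the sum of all of them.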

Page 21: Riak at Posterous

Riak in production

• Started using our 3 node cluster for the global production cache

• Accidentally turned off a node

• Keys rebalanced, site didn’t skip a beat

• No one even noticed till hours later
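Why losing a node can go unnoticed is easier to see with a toy model of replication. The sketch below is a deliberate simplification and not Riak's actual ring or vnode logic: it hashes a key to a starting node and places copies on the next `n` nodes. The node names and the choice of `n = 3` are assumptions, since the slides do not state the replication setting.

```python
import hashlib
from typing import List, Set


def replicas(key: str, nodes: List[str], n: int = 3) -> List[str]:
    """Pick up to n distinct nodes for a key, walking the list from a hashed start."""
    start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(nodes)
    count = min(n, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def readable(key: str, nodes: List[str], down: Set[str], n: int = 3) -> bool:
    """A key can still be read if at least one of its replicas is on a live node."""
    return any(node not in down for node in replicas(key, nodes, n))


nodes = ["riak1", "riak2", "riak3"]  # hypothetical host names
keys = [f"post:{i}:mobile" for i in range(1000)]

# With one node switched off, every key still has a live replica.
print(all(readable(k, nodes, down={"riak2"}) for k in keys))  # True
```

In this model a read only fails when every node holding a copy of the key is down at once.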

Page 22: Riak at Posterous
Page 23: Riak at Posterous

Stats

• 3 nodes

• 2600+ requests/second

• 300+ GB

• ~200 million keys

• 10 GB memcache/host
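A quick back-of-the-envelope reading of these numbers, as a sketch. It treats the "+" and "~" figures as exact, uses decimal gigabytes, and assumes the 300 GB and 2600 requests/second are cluster-wide totals spread evenly across nodes. The slide does not say whether the 300 GB counts replicated copies, so the per-key figure is only a rough upper bound on the stored value size.

```python
def avg_bytes_per_key(total_bytes: int, keys: int) -> float:
    """Average on-disk footprint per key."""
    return total_bytes / keys


def per_node(total: float, nodes: int) -> float:
    """Even split of a cluster-wide total across nodes."""
    return total / nodes


TOTAL_BYTES = 300 * 10**9   # 300 GB, decimal
KEYS = 200 * 10**6          # ~200 million keys
REQUESTS_PER_SECOND = 2600
NODES = 3

print(avg_bytes_per_key(TOTAL_BYTES, KEYS))          # 1500.0
print(round(per_node(REQUESTS_PER_SECOND, NODES)))   # 867
print(per_node(TOTAL_BYTES, NODES) / 10**9)          # 100.0
```

So each key accounts for roughly 1.5 KB on disk, and each node handles on the order of 870 requests per second and 100 GB of data under these assumptions.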

Page 24: Riak at Posterous

#Protips

• All nodes can serve all requests, so...

• Use a vip, or...

• Pass all cluster nodes to client driver (thanks @aphyr!)

• Use curb instead of net/http

• Use Keep Alive
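The "pass all cluster nodes to the client" tip amounts to rotating requests across nodes and skipping any that fail. The sketch below illustrates that idea only. `NodePool`, the node names, and the `fetch` callback are hypothetical and are not the API of any real Riak client.

```python
import itertools
from typing import Callable, List


class NodePool:
    """Hypothetical round-robin pool that fails over to the next node on error."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def request(self, fetch: Callable[[str], str]) -> str:
        """Try each node at most once, starting where the last call left off."""
        last_error = None
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                return fetch(node)
            except ConnectionError as exc:
                last_error = exc  # remember the failure and try the next node
        raise ConnectionError("all nodes failed") from last_error
```

A VIP or load balancer moves the same logic out of the application. Keeping it in the client avoids the extra hop, at the cost of every client needing the node list.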

Page 25: Riak at Posterous

Any Questions?

Page 26: Riak at Posterous

Thanks for listening!

Special thanks to @twoism, @vincentchu, @kangchen, @argv0, @pharkmillups, @seancribbs, @aphyr, @jrecursive