Key-Value-Stores -- The Key to Scaling?

Tim Lossen Key-Value-Stores: The Key to Scaling?

Who?

• @tlossen

• backend developer

- Ruby, Rails, Sinatra ...

• passionate about technology

Problem

Challenge

• backend for a Facebook game

• expected load:

- 1 million daily active users

- 20 million total users

- 100 KB data per user

Challenge

• expected peak traffic:

- 10,000 concurrent users

- 200,000 requests / minute

• write-heavy workload
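
The expected load can be turned into a quick back-of-the-envelope estimate. This sketch only restates the slide figures; the decimal units (1 GB = 1,000,000 KB) and the helper names are assumptions made for illustration.

```python
# Back-of-the-envelope sizing from the expected-load figures on the slides.
# Units are decimal (1 GB = 1,000,000 KB); helper names are illustrative only.

TOTAL_USERS = 20_000_000            # registered users
DATA_PER_USER_KB = 100              # stored data per user
PEAK_CONCURRENT_USERS = 10_000
PEAK_REQUESTS_PER_MINUTE = 200_000


def total_data_gb(users: int, kb_per_user: int) -> float:
    """Total stored data in GB if every user carries kb_per_user KB."""
    return users * kb_per_user / 1_000_000


def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


print(f"total data: {total_data_gb(TOTAL_USERS, DATA_PER_USER_KB):,.0f} GB")
print(f"data of concurrent users: {total_data_gb(PEAK_CONCURRENT_USERS, DATA_PER_USER_KB):,.0f} GB")
print(f"peak request rate: {per_second(PEAK_REQUESTS_PER_MINUTE):,.0f} requests / second")
print(f"per concurrent user: {PEAK_REQUESTS_PER_MINUTE / PEAK_CONCURRENT_USERS:.0f} requests / minute")
```

The numbers suggest a large data set (about 2 TB) of which only about 1 GB belongs to the users online at peak, hit by roughly 3,300 mostly-write requests per second.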

Wanted

• scalable database

• with high throughput

- especially for writes

Options

• relational database

- with sharding

• NoSQL database

- key-value-store

- document DB

- graph DB

Shortlist

• Cassandra

• Redis

• Membase

Cassandra

Facts

• written in Java

- 55,000 lines of code

• Thrift API

- clients for Java, Ruby, Python ...

History

• originally developed by Facebook

- in production for “Inbox Search”

• later open-sourced

- top-level Apache project

Features

• high availability

- no single point of failure

• incremental scalability

• eventual consistency

Hash Ring

Architecture

• Dynamo-like hash ring

- partitioning + replication

- all nodes are equal

• Bigtable data model

- column families

- supercolumns
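
The Dynamo-like hash ring (partitioning plus replication, all nodes equal) can be illustrated with a small consistent-hashing sketch: every node sits at a position on a ring of hash values, a key belongs to the first node at or after its own position, and replicas go to the following nodes clockwise. This is a simplified illustration, not Cassandra's code; the MD5-based positions, the lack of virtual nodes and the node names are assumptions.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Place a string on the ring: here, the 128-bit integer value of its MD5 digest."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy Dynamo-style ring: partitioning by position, replication to successors."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs; every node plays the same role.
        self._ring = sorted((ring_position(node), node) for node in nodes)

    def add_node(self, node):
        """Insert a node; only keys on the arc just before it change owner."""
        bisect.insort(self._ring, (ring_position(node), node))

    def nodes_for(self, key):
        """Owner of `key` first, followed by replicas on the next nodes clockwise."""
        if not self._ring:
            return []
        positions = [position for position, _ in self._ring]
        # First node at or after the key's position, wrapping around past the end.
        start = bisect.bisect_left(positions, ring_position(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.nodes_for("user:12345"))  # owner plus two replicas
```

Adding a node with `add_node` only changes the owner of keys on the arc between the new node and its predecessor; every other key stays where it was, which is what makes incremental scaling possible.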

“Cassandra aims to run on an infrastructure of hundreds of nodes.”

Redis

Facts

• written in C

- 13,000 lines of code

• socket API

- redis-cli

- client libs for all major languages

Features

• high read & write throughput

- 50,000 to 100,000 ops / second

• interesting data structures

- lists, hashes, (sorted) sets

- atomic operations

• strong consistency

Architecture

• in-memory database

- append-only log on disk

- virtual memory

• single instance

- master-slave replication

- clustering is on the roadmap
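
The "in-memory database with an append-only log on disk" idea can be sketched in a few lines: each write is applied to an in-memory dictionary and appended to a log file, and after a restart the state is rebuilt by replaying the log. This is a toy model of the idea, not Redis' actual AOF (which logs commands in the Redis protocol); the class name, the JSON-lines format and the fsync-on-every-write policy are assumptions.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy in-memory key-value store made durable by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append mode: existing log entries are never rewritten.
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state by re-applying every logged operation."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "del":
            self.data.pop(entry["key"], None)

    def _write(self, entry):
        # Log first, then apply; flush and fsync on every write (slowest, safest policy).
        self._log.write(json.dumps(entry) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(entry)

    def set(self, key, value):
        self._write({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._write({"op": "del", "key": key})

    def get(self, key):
        return self.data.get(key)  # reads are served from memory only

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "store.aof")
store = AppendOnlyStore(path)
store.set("user:1", {"level": 3})
store.set("user:2", {"level": 7})
store.delete("user:2")
store.close()

restarted = AppendOnlyStore(path)  # simulates a restart
print(restarted.get("user:1"))     # {'level': 3}
print(restarted.get("user:2"))     # None
restarted.close()
```

Because such a log only grows, a real system has to compact it from time to time; Redis does this with a background AOF rewrite.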

“Memory is the new disk, disk is the new tape.”

— Jim Gray

Membase

Facts

• written in C and Erlang

• API-compatible with Memcached

- same protocol

• client libs for all major languages

History

• developed by NorthScale & Zynga

- used in production (FarmVille)

• released in June 2010

- Apache 2.0 License

Features

• “Memcached with persistence”

- extremely fast

- throughput scales linearly

• automatic data placement

- memory, SSD, disk

• configurable replica count

Architecture

• cluster

- all nodes are alike

- one elected as “coordinator”

• each node is master for part of the key space

- handles all reads & writes
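
"Each node is master for part of the key space" can be sketched as a two-step lookup: hash the key into a fixed number of partitions, then look the partition up in a map that the coordinator maintains. In Membase these partitions are known as vBuckets; the CRC32 hash, the count of 1,024 partitions, the round-robin assignment and the server names below are assumptions for illustration, not details from the talk.

```python
import zlib

NUM_PARTITIONS = 1024  # fixed for the lifetime of the cluster (assumed value)


def partition_for(key: str) -> int:
    """Step 1: map a key to a partition; this never changes when servers change."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(servers):
    """Step 2 (coordinator's job): assign every partition to exactly one master."""
    return [servers[p % len(servers)] for p in range(NUM_PARTITIONS)]


def server_for(key: str, partition_map) -> str:
    """The single server that handles all reads and writes for `key`."""
    return partition_map[partition_for(key)]


servers = ["server-1", "server-2", "server-3"]  # hypothetical cluster
partition_map = build_partition_map(servers)
print(server_for("user:12345", partition_map))
```

Adding a server then means reassigning some partitions in the map and moving their data; the key-to-partition hash stays fixed, so clients only need the updated map.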

Mapping Scheme

“simple, fast, elastic”

Solution

Which one would you pick?

Decision

• Cassandra?

- too big, too complicated

• Membase?

- not yet available at the time

• Redis!

Motivation

• keep operations simple

• use as few machines as possible

- ideally, only one

Design

• two machines (+ load balancer)

- Redis master handles all reads / writes

- Redis slave as hot standby

- both machines used as app servers

• dedicated hardware

Data model

• one Redis hash per user

- key: Facebook ID

• store data as serialized JSON

- booleans, strings, numbers, timestamps ...
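
A sketch of this data model: each user becomes one Redis hash keyed by their Facebook ID, and every field value is a JSON string, so booleans, numbers and timestamps survive the round trip through Redis' string values. The field names, the sample values and the ISO-8601 timestamp encoding are assumptions; the sketch only builds the field/value mapping that would be written into the hash, so it runs without a Redis server.

```python
import json
from datetime import datetime, timezone


def to_hash_fields(user: dict) -> dict:
    """Serialize every attribute to a JSON string: one Redis hash field per attribute."""
    def encode(value):
        # JSON has no timestamp type, so timestamps become ISO-8601 strings.
        if isinstance(value, datetime):
            return json.dumps(value.isoformat())
        return json.dumps(value)

    return {field: encode(value) for field, value in user.items()}


def from_hash_fields(fields: dict) -> dict:
    """Inverse of to_hash_fields (timestamps come back as ISO-8601 strings)."""
    return {field: json.loads(raw) for field, raw in fields.items()}


redis_key = str(100001234567890)  # hypothetical Facebook ID, used directly as the key
user = {  # hypothetical game state
    "name": "Alice",
    "level": 12,
    "coins": 350,
    "tutorial_done": True,
    "last_login": datetime(2010, 10, 1, 12, 0, tzinfo=timezone.utc),
}

fields = to_hash_fields(user)
print(fields)
# {'name': '"Alice"', 'level': '12', 'coins': '350', 'tutorial_done': 'true',
#  'last_login': '"2010-10-01T12:00:00+00:00"'}
```

Since a JSON-encoded integer is also a plain integer string, a single field such as `coins` could still be changed on its own with HINCRBY, without reading and rewriting the whole user.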

Advantages

• turns Redis into a “document DB”

- efficient to swap user data in / out

- atomic ops on parts

• easy to dump / restore user data

Capacity

• 4 GB memory for 20 million integer keys

- keys always stay in memory!

• 2 GB memory for 10,000 user hashes

- others can be swapped out

• 3.6 million ops / minute

- sufficient for 200,000 requests / minute
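
The capacity figures can be checked with simple arithmetic. The per-key and per-hash sizes below are only derived from the slide's totals, and treating 1 GB as 10^9 bytes is an assumption; the interesting number is the ratio of available operations to expected requests.

```python
# Capacity check derived from the figures on the slides (decimal units: 1 GB = 10**9 bytes).
GB = 10**9

KEYS = 20_000_000              # one key per registered user, always in memory
KEY_MEMORY = 4 * GB
HOT_HASHES = 10_000            # user hashes kept in memory
HASH_MEMORY = 2 * GB
OPS_PER_MINUTE = 3_600_000     # Redis capacity stated on the slides
REQUESTS_PER_MINUTE = 200_000  # expected peak traffic

bytes_per_key = KEY_MEMORY / KEYS                       # about 200 bytes per key
bytes_per_hash = HASH_MEMORY / HOT_HASHES               # about 200 KB per hot user hash
ops_per_second = OPS_PER_MINUTE / 60                    # 60,000 ops / second
ops_per_request = OPS_PER_MINUTE / REQUESTS_PER_MINUTE  # Redis ops available per request

print(f"{bytes_per_key:.0f} bytes per key")
print(f"{bytes_per_hash:,.0f} bytes per hot user hash")
print(f"{ops_per_second:,.0f} ops / second")
print(f"budget of {ops_per_request:.0f} Redis ops per request")
```

60,000 ops per second lies within the 50,000 to 100,000 ops / second quoted for Redis, which leaves a budget of 18 Redis operations per request; about 200 KB per in-memory hash is roughly twice the 100 KB of raw data per user.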

Status

• game was launched in August

- currently still in beta

• expect to reach 1 million daily active users in Q1 2011

• will try to stick to 2 or 3 machines

- possibly bigger / faster ones

Conclusions

• use the right tool for the job

• keep it simple

- avoid sharding, if possible

• don’t scale out too early

- but have a viable “plan b”

• use dedicated hardware

Q & A