flex cookbook


Transcript of flex cookbook

Page 1: flex cookbook

memcache@facebook

Marc Kwiatkowski, memcache tech lead, QCon 2010

Monday, April 12, 2010

Page 2: flex cookbook

How big is facebook?


Page 3: flex cookbook

400 million active users

400M Active

Page 4: flex cookbook

400 million active users

[Chart: monthly active users growing from 0M to over 400M between 2008 and 2010]

400M Active

Page 5: flex cookbook

Objects

▪ More than 60 million status updates posted each day

▪ 694/s

▪ More than 3 billion photos uploaded to the site each month

▪ 23/s

▪ More than 5 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each week

▪ 8K/s

▪ Average user has 130 friends on the site

▪ 50 Billion friend graph edges

▪ Average user clicks the Like button on 9 pieces of content each month


Page 6: flex cookbook

Infrastructure

▪ Thousands of servers in several data centers in two regions

▪ Web servers

▪ DB servers

▪ Memcache Servers

▪ Other services


Page 7: flex cookbook

The scale of memcache @ facebook

▪ Memcache Ops/s

▪ over 400M gets/sec

▪ over 28M sets/sec

▪ over 2T cached items

▪ over 200 TB

▪ Network IO

▪ peak rx 530Mpkts/s 60GB/s

▪ peak tx 500Mpkts/s 120GB/s


Page 8: flex cookbook

A typical memcache server’s P.O.V.

▪ Network I/O

▪ rx 90Kpkts/s 9.7MB/s

▪ tx 94Kpkts/s 19MB/s

▪ Memcache OPS

▪ 80K gets/s

▪ 2K sets/s

▪ 200M items

All rates are 1-day moving averages

Page 9: flex cookbook

Evolution of facebook’s architecture


Page 10: flex cookbook

•When Mark Zuckerberg and his roommates started Facebook in a Harvard dorm in 2004, they put everyone on one server

•Then as Facebook grew, they could scale like a traditional site by just adding servers

•Even as the site grew beyond Harvard to Stanford, Columbia, and thousands of other campuses, each was a separate network that could be served on an isolated set of servers

•But as people connected more between schools, the model changed, and the big change came when Facebook opened to everyone in Sept. 2006

•[For globe]: That led to people being connected everywhere around the world, not just on a single college campus.

•[For globe]: This visualization shows accepted friend requests animating from requesting friend to accepting friend

Pages 11-14: flex cookbook

[Globe visualization animation frames; the speaker notes repeat those of Page 10]

Page 15: flex cookbook

Scaling Facebook: Interconnected data

Bob

•On Facebook, the data required to serve your home page or any other page is incredibly interconnected

•Your data can’t sit on one server or cluster of servers because almost every piece of content on Facebook requires information about your network of friends

•And the average user has 130 friends

•As we scale, we have to be able to quickly pull data across all of our servers, wherever it’s stored.

Page 16: flex cookbook

Scaling Facebook: Interconnected data

Bob Brian


Page 17: flex cookbook

Scaling Facebook: Interconnected data

Bob Brian Felicia


Page 18: flex cookbook

Memcache Rules of the Game

▪ GET object from memcache

▪ on miss, query database and SET object to memcache

▪ Update database row and DELETE object in memcache

▪ No derived objects in memcache

▪ Every memcache object maps to persisted data in the database (the pattern is sketched below)
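These rules describe a classic look-aside cache. A minimal sketch of the read and write paths, assuming a hypothetical mc client with get/set/delete and a db helper with query/update methods (the names are illustrative, not Facebook's actual API):

    # Look-aside caching per the rules above (illustrative names, not Facebook's API).
    def get_object(mc, db, key):
        obj = mc.get(key)                 # GET object from memcache
        if obj is None:                   # on miss, query the database...
            obj = db.query_object(key)
            mc.set(key, obj)              # ...and SET the object into memcache
        return obj

    def update_object(mc, db, key, value):
        db.update_object(key, value)      # update the database row...
        mc.delete(key)                    # ...and DELETE the object in memcache
        # No derived objects: everything cached maps back to a persisted row.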


Page 19: flex cookbook

Scaling memcache


Page 20: flex cookbook

Phatty Phatty Multiget


Page 21: flex cookbook

Phatty Phatty Multiget


Page 22: flex cookbook

Phatty Phatty Multiget (notes)

▪ PHP runtime is single threaded and synchronous

▪ To get good performance for data-parallel operations like retrieving info for all friends, it’s necessary to dispatch memcache get requests in parallel

▪ Initially we just used polling I/O in PHP.

▪ Later we switched to true asynchronous I/O in a PHP C extension

▪ In both cases the result was reduced latency through parallelism (see the sketch below).
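A sketch of the data-parallel dispatch these notes describe, assuming one client object per memcache server exposing a batched get_multi(keys) call (a common memcached-client convention; the names and hashing here are illustrative):

    # Batch keys by destination server, then issue the batches in parallel.
    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    def parallel_multiget(clients, keys):
        if not keys:
            return {}
        by_server = defaultdict(list)
        for key in keys:
            by_server[hash(key) % len(clients)].append(key)

        results = {}
        # One batched request per server, dispatched concurrently, so total
        # latency is roughly one round trip instead of one per key.
        with ThreadPoolExecutor(max_workers=len(by_server)) as pool:
            futures = [pool.submit(clients[s].get_multi, batch)
                       for s, batch in by_server.items()]
            for f in futures:
                results.update(f.result())
        return results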


Page 23: flex cookbook

Pools and Threads

PHP Client


Page 24: flex cookbook

PHP Client

sp:12345

sp:12346 sp:12347 cs:12345 cs:12346 cs:12347

Different objects have different sizes and access patterns. We began creating memcache pools to segregate different kinds of objects for better cache efficiency and memory utilization.

Page 25: flex cookbook

PHP Client

sp:12345 sp:12346 sp:12347 cs:12345 cs:12346 cs:12347


Page 26: flex cookbook

PHP Client


Page 27: flex cookbook

Pools and Threads (notes)

▪ Privacy objects are small but have poor hit rates

▪ User-profiles are large but have good hit rates

▪ We achieve better overall caching by segregating different classes of objects into different pools of memcache servers

▪ Memcache was originally a classic single-threaded unix daemon

▪ This meant we needed to run 4 instances with 1/4 the RAM on each memcache server

▪ 4X the number of connections to each box

▪ 4X the meta-data overhead

▪ We needed a multi-threaded service
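The pooling bullets amount to routing each class of object to its own set of memcache servers. A minimal sketch of that routing, with made-up pool names and key prefixes (not Facebook's actual layout):

    # Route keys to per-class pools; sizes and hit rates differ, so keeping
    # the classes apart improves memory utilization and overall hit rate.
    POOLS = {
        "up": "user-profile-pool",   # large values, good hit rate
        "priv": "privacy-pool",      # small values, poor hit rate
    }
    DEFAULT_POOL = "general-pool"

    def pool_for_key(key):
        # Assumes keys carry a class prefix, e.g. "up:12345" or "priv:12345".
        prefix = key.split(":", 1)[0]
        return POOLS.get(prefix, DEFAULT_POOL)

    # pool_for_key("up:12345") -> "user-profile-pool"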


Page 28: flex cookbook

Connections and Congestion

▪ [animation]

Page 29: flex cookbook

Connections and Congestion (notes)

▪ As we added web-servers, the connections to each memcache box grew.

▪ Each webserver ran 50-100 PHP processes

▪ Each memcache box has 100K+ TCP connections

▪ UDP could reduce the number of connections

▪ As we added users and features, the number of keys per-multiget increased

▪ Popular people and groups

▪ Platform and FBML

▪ We began to see incast congestion on our ToR switches.


Page 30: flex cookbook

Serialization and Compression

▪ We noticed our short profiles weren’t so short

▪ 1K PHP serialized object

▪ fb-serialization

▪ based on thrift wire format

▪ 3X faster

▪ 30% smaller

▪ gzcompress serialized strings
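A sketch of the set-path these bullets describe: serialize compactly, then compress values above some size before storing. Here json and zlib stand in for fb-serialization and gzcompress, and the threshold is an assumption:

    import json
    import zlib

    COMPRESS_THRESHOLD = 256  # bytes; illustrative cutoff

    def encode(value):
        raw = json.dumps(value).encode("utf-8")   # stand-in for a compact wire format
        if len(raw) > COMPRESS_THRESHOLD:
            return b"Z" + zlib.compress(raw)      # tag compressed payloads
        return b"R" + raw

    def decode(blob):
        if blob[:1] == b"Z":
            return json.loads(zlib.decompress(blob[1:]))
        return json.loads(blob[1:].decode("utf-8"))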


Page 31: flex cookbook

Multiple Datacenters

SC Memcache

SC Web

SC MySQL


Page 32: flex cookbook

Multiple Datacenters

SF Web

SF Memcache

SC Memcache

SC Web

SC MySQL


Page 33: flex cookbook

Multiple Datacenters

SF Web

SF Memcache

SC Memcache

SC Web

SC MySQL

Memcache Proxy  Memcache Proxy

Page 34: flex cookbook

Multiple Datacenters (notes)

▪ In the early days we had two data-centers

▪ The one we were about to turn off

▪ The one we were about to turn on

▪ Eventually we outgrew a single data-center

▪ Still only one master database tier

▪ Rules of the game require that after an update we need to broadcast deletes to all tiers

▪ The mcproxy era begins


Page 35: flex cookbook

Multiple Regions

SC Memcache

SC Web

SC MySQL

East Coast

VA MySQL

VA Web

VA Memcache

Memcache Proxy

West Coast


Page 36: flex cookbook

Multiple Regions

SF Web

SF Memcache

SC Memcache

SC Web

SC MySQL

Memcache Proxy  Memcache Proxy

East Coast

VA MySQL

VA Web

VA Memcache

Memcache Proxy

West Coast


Page 37: flex cookbook

Multiple Regions

SF Web

SF Memcache

SC Memcache

SC Web

SC MySQL

Memcache Proxy  Memcache Proxy

MySQL replication

East Coast

VA MySQL

VA Web

VA Memcache

Memcache Proxy

West Coast


Page 38: flex cookbook

Multiple Regions (notes)

▪ Latency to east coast and European users was/is terrible.

▪ So we deployed a slave DB tier in Ashburn VA

▪ Slave DB syncs with the master via the MySQL binlog

▪ This introduces a race condition

▪ mcproxy to the rescue again

▪ Add a memcache delete pragma to MySQL update and insert ops

▪ Added a thread to the slave mysqld to dispatch deletes on the east coast via mcproxy
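A rough sketch of that invalidation path: the write is tagged with the memcache keys it invalidates, the tag rides the binlog to the remote region, and a thread on the slave side turns it back into deletes via mcproxy. The pragma format and parsing below are assumptions for illustration, not Facebook's actual implementation:

    import re

    def annotate_sql(sql, memcache_keys):
        # Master side: tag the UPDATE/INSERT with the keys it invalidates.
        return sql + " /* mcdelete:" + ",".join(memcache_keys) + " */"

    PRAGMA = re.compile(r"/\* mcdelete:([^*]+) \*/")

    def dispatch_deletes(replicated_sql, mcproxy_delete):
        # Slave side: called for each replayed statement; issues the deletes
        # in the local region through mcproxy.
        m = PRAGMA.search(replicated_sql)
        if m:
            for key in m.group(1).split(","):
                mcproxy_delete(key.strip())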


Page 39: flex cookbook

Replicated Keys

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache

Page 40: flex cookbook

key

Replicated Keys

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache

Page 41: flex cookbook

key

Replicated Keys

key key key

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache  Memcache

Page 42: flex cookbook

key

Replicated Keys

key key key

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache

Page 43: flex cookbook

key

Replicated Keys

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache

Page 44: flex cookbook

key

Replicated Keys

key#0 key#1 key#2

PHP Client  PHP Client  PHP Client

Memcache  Memcache  Memcache

Page 45: flex cookbook

Replicated Keys (notes)

▪ Viral groups and applications cause hot keys

▪ More gets than a single memcache server can process

▪ (Remember the rules of the game!)

▪ That means more queries than a single DB server can process

▪ That means that group or application is effectively down

▪ Creating key aliases allows us to add server capacity.

▪ Hot keys are published to all web-servers

▪ Each web-server picks an alias for gets

▪ get key:xxx => get key:xxx#N

▪ Each web-server deletes all aliases


Page 46: flex cookbook

Memcache Rules of the Game

▪ New Rule

▪ If a key is hot, pick an alias and fetch that for reads

▪ Delete all aliases on updates
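A minimal sketch of the new rule, assuming a fixed, published alias count and the key#N alias format from the notes (the helper names are illustrative):

    import random

    NUM_ALIASES = 3   # published per hot key; the real count would be tuned

    def get_hot(mc, key):
        # Each web server reads one alias; on a miss it would fill that
        # alias from the database, as usual.
        alias = "%s#%d" % (key, random.randrange(NUM_ALIASES))
        return mc.get(alias)

    def delete_hot(mc, key):
        for n in range(NUM_ALIASES):      # an update deletes every alias
            mc.delete("%s#%d" % (key, n))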


Page 47: flex cookbook

Mirrored Pools

General pool with wide fanout: Shard 1, Shard 2, Shard 3, ... Shard n

Specialized Replica 1: Shard 1, Shard 2

Specialized Replica 2: Shard 1, Shard 2

Page 48: flex cookbook

Mirrored Pools (notes)

▪ As our memcache tier grows, the ratio of keys/packet decreases

▪ 100 keys/1 server = 1 packet

▪ 100 keys/100 servers = 100 packets (fan-out arithmetic is sketched after these notes)

▪ More network traffic

▪ More memcache server kernel interrupts per request

▪ Confirmed Info - critical account meta-data

▪ Have you confirmed your account?

▪ Are you a minor?

▪ Pulled from large user-profile objects

▪ Since we just need a few bytes of data for many users, they get their own small mirrored pool
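The keys/packet point can be made concrete with a little expected-value arithmetic. The slide's 100-packet figure is the worst case; this sketch computes the expected number of distinct servers (and hence request packets) a 100-key multiget touches:

    def expected_servers(keys, servers):
        # Expected number of distinct servers hit when each key hashes
        # uniformly to one of `servers` machines.
        return servers * (1 - (1 - 1.0 / servers) ** keys)

    for n in (1, 10, 100):
        print(n, round(expected_servers(100, n), 1))
    # 1   -> 1.0  (one packet for the whole multiget)
    # 10  -> 10.0
    # 100 -> 63.4 (up to 100 in the worst case the slide quotes)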


Page 49: flex cookbook

Hot Misses

▪ [animation]

Page 50: flex cookbook

Hot Misses (notes)

▪ Remember the rules of the game

▪ update and delete

▪ miss, query, and set

▪ When the object is very, very popular, that query rate can kill a database server

▪ We need flow control!


Page 51: flex cookbook

Memcache Rules of the Game

▪ For hot keys, on miss grab a mutex before issuing the db query

▪ memcache-add a per-object mutex

▪ key:xxx => key:xxx#mutex

▪ If add succeeds do the query

▪ If add fails (because mutex already exists) back-off and try again

▪ After the set, delete the mutex (sketched below)
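A sketch of those rules, using memcache add (which only succeeds if the key is absent) as the mutex; the client and db names are illustrative:

    import time

    def get_with_mutex(mc, db, key, backoff=0.05):
        while True:
            obj = mc.get(key)
            if obj is not None:
                return obj
            if mc.add(key + "#mutex", 1):     # only one client wins the add
                obj = db.query_object(key)    # the winner runs the query
                mc.set(key, obj)
                mc.delete(key + "#mutex")     # after the set, delete the mutex
                return obj
            time.sleep(backoff)               # losers back off and retry the get

    # In practice the mutex would also carry a TTL so a crashed winner
    # cannot wedge the key forever.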


Page 52: flex cookbook

Hot Deletes

▪ [hot groups graphics]

Page 53: flex cookbook

Hot Deletes (notes)

▪ We’re not out of the woods yet

▪ Cache mutex doesn’t work for frequently updated objects

▪ like membership lists and walls for viral groups and applications.

▪ Each process that acquires a mutex finds that the object has been deleted again

▪ ...and again

▪ ...and again


Page 54: flex cookbook

Rules of the Game: Caching Intent

▪ Each memcache server is in the perfect position to detect and mitigate contention

▪ Record misses

▪ Record deletes

▪ Serve stale data

▪ Serve lease-ids

▪ Don’t allow updates without a valid lease id
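A server-side sketch of the lease idea: a miss hands out a lease id, only a valid lease id may set, and a delete invalidates outstanding leases. The data structures and names are illustrative, not memcached's actual implementation:

    import itertools

    class LeasingCache:
        def __init__(self):
            self.data = {}
            self.leases = {}
            self._ids = itertools.count(1)

        def get(self, key):
            if key in self.data:
                return self.data[key], None
            if key not in self.leases:
                lease = next(self._ids)
                self.leases[key] = lease      # first miss: caller should refill
                return None, lease
            return None, None                 # later misses: back off / serve stale

        def set(self, key, value, lease):
            if self.leases.get(key) != lease:
                return False                  # no valid lease id: update rejected
            self.data[key] = value
            del self.leases[key]
            return True

        def delete(self, key):
            self.data.pop(key, None)
            self.leases.pop(key, None)        # a delete invalidates the lease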


Page 55: flex cookbook

Next Steps


Page 56: flex cookbook

Shaping Memcache Traffic

▪ mcproxy as router

▪ admission control

▪ tunneling inter-datacenter traffic


Page 57: flex cookbook

Cache Hierarchies

▪ Warming up Cold Clusters

▪ Proxies for Cacheless Clusters


Page 58: flex cookbook

Big Low Latency Clusters

▪ Bigger Clusters are Better

▪ Low Latency is Better

▪ L2.5

▪ UDP

▪ Proxy Facebook Architecture


Page 59: flex cookbook

Worse IS better

▪ Richard Gabriel’s famous essay contrasted

▪ ITS and Unix

▪ LISP and C

▪ MIT and New Jersey


http://www.jwz.org/doc/worse-is-better.html

Page 60: flex cookbook

Why Memcache Works

▪ Uniform, low latency with partial results is a better user experience

▪ memcache provides a few robust primitives

▪ key-to-server mapping

▪ parallel I/O

▪ flow-control

▪ traffic shaping

▪ that allow ad hoc solutions to a wide range of scaling issues


We started with simple, obvious improvements. As we grew we deployed less obvious improvements... but they’ve remained pretty simple.

Page 61: flex cookbook

(c) 2010 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
