Building a Billion User Load Balancer, May 2015, SREcon15 Europe, Shuff

Transcript of Building a Billion User Load Balancer, May 2015, SREcon15 Europe, Shuff

  • Building a Billion User Load Balancer

    Patrick Shuff

    Production Engineer, Traffic Team

  • We'll be talking about

    2015 Facebook | Dublin | Credits: network icon by Daniel Gamage https://thenounproject.com/term/network/49138/, tools icon by Kiryl Sytsko https://thenounproject.com/term/tools/122204/, chess icon by Dream Icons, https://thenounproject.com/term/chess/127867/, Paraglider by Matt Brooks https://thenounproject.com/term/paraglider/114410/, Question by Henrik Lund Mikkelsen https://thenounproject.com/term/question/8325/ https://creativecommons.org/licenses/by/3.0/ https://thenounproject.com/term/web-search/18390/

    Serving Dynamic Facebook Requests

    L4/L7 Load Balancing

    Edge PoP and Reducing Latency

    Global DNS Load Balancing

    Q&A

  • Traffic @ fb

  • Facebook scale as of March 2015

    936 million daily active people on average

    798 million mobile daily active people on average

    1.25 billion mobile monthly active people

    1.44 billion monthly active people

    Approximately 82.4% of our daily active users are outside the US and Canada

  • What is facebook? (from traffic's perspective)

    Photos

    Videos

    Newsfeed

    Likes

    Dynamic Requests

    Javascript/CSS

    Status Updates

    Static Requests

    Messaging

    Terabits of egress (outgoing bits per second)

  • What are we talking about?

    HHVM

    2015 Facebook | Menlo Park | Credit:

  • What are we not talking about?

    HHVM

    MySQL

    Cache

    Msgs

    Feed

    Ads

  • Weekly egress cycle

    [Graph: egress and ingress over 7 days]

  • Diurnal egress cycle

    [Graph: egress over 24 hours, with 11 AM and 3 PM marked on the time axis. Time zone == Pacific (-0800 GMT)]

  • Sum of timezones

    [Graph: traffic over 24 hours for United Kingdom, Canada, and Indonesia, with 9 AM and 7 PM marked on the time axis. Time zone == Pacific (-0800 GMT)]

  • TCP/IP Review

  • OSI Model

    Layer             Purpose                   Examples
    7: Application    High-Level API            HTTP, SPDY, MQTT
    6: Presentation   Data Translation          ASCII, JPEG
    5: Session        Communication Session     RPC
    4: Transport      Transmission              TCP, UDP
    3: Network        Address, Routing, Flow    IPv6, IPv4
    2: Data Link      Reliable Physical Comm.   IEEE 802.2
    1: Physical       Raw bit transmission      DSL, USB

  • IP Header (OSI Layer 3)

    Version | DSCP | ECN | Flow Label

    Payload Length | Next Header | Hop limit

    Source Address

    Destination Address

    Data
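
The fields listed on the IP header slide match the 40-byte fixed IPv6 header. As a rough illustration, the sketch below decodes those fields with Python's `struct` and `ipaddress` modules. The names `IPv6Header` and `parse_ipv6_header`, the flow label value, and the `2001:db8::1` source address are invented for the example; the destination address is the AAAA answer shown later in the deck.

```python
import ipaddress
import struct
from typing import NamedTuple


class IPv6Header(NamedTuple):
    version: int
    dscp: int
    ecn: int
    flow_label: int
    payload_length: int
    next_header: int
    hop_limit: int
    source: ipaddress.IPv6Address
    destination: ipaddress.IPv6Address


def parse_ipv6_header(packet: bytes) -> IPv6Header:
    """Decode the 40-byte fixed IPv6 header at the start of `packet`."""
    if len(packet) < 40:
        raise ValueError("an IPv6 fixed header is 40 bytes")
    # First 8 bytes: 32-bit word (version, traffic class, flow label),
    # 16-bit payload length, 8-bit next header, 8-bit hop limit.
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8]
    )
    traffic_class = (first_word >> 20) & 0xFF  # DSCP (6 bits) + ECN (2 bits)
    return IPv6Header(
        version=first_word >> 28,
        dscp=traffic_class >> 2,
        ecn=traffic_class & 0x3,
        flow_label=first_word & 0xFFFFF,
        payload_length=payload_length,
        next_header=next_header,
        hop_limit=hop_limit,
        source=ipaddress.IPv6Address(packet[8:24]),
        destination=ipaddress.IPv6Address(packet[24:40]),
    )


# Hypothetical sample: a bare header with no payload, next header 6 (TCP).
sample = (
    struct.pack("!IHBB", (6 << 28) | 0x12345, 0, 6, 64)
    + ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2a03:2880:2130:cf05:face:b00c::1").packed
)
header = parse_ipv6_header(sample)
print(header.version, header.next_header, header.destination)
```

An L3 device only needs these fields to forward the packet; it never has to look at the TCP or HTTP bytes that follow.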

  • TCP Header (OSI Layer 4)

    Source Port | Destination Port

    Sequence Number

    Acknowledgement Number

    ...

    Application Payload
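
To make the TCP header slide concrete, here is a minimal sketch that reads the ports, sequence number, and acknowledgement number from the front of a segment, then uses the data-offset field to find where the application payload starts. `TCPHeader`, `parse_tcp_header`, and all sample values (ports, sequence number, payload bytes) are hypothetical.

```python
import struct
from typing import NamedTuple


class TCPHeader(NamedTuple):
    source_port: int
    destination_port: int
    sequence_number: int
    acknowledgement_number: int
    header_length: int  # in bytes, derived from the 4-bit data-offset field


def parse_tcp_header(segment: bytes) -> TCPHeader:
    """Decode the leading fields of a TCP header."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, offset_and_flags = struct.unpack("!HHIIH", segment[:14])
    # The top 4 bits count 32-bit words in the header.
    return TCPHeader(src, dst, seq, ack, (offset_and_flags >> 12) * 4)


# Hypothetical sample: 20-byte header (data offset 5, SYN flag) plus payload.
sample = (
    struct.pack("!HHIIHHHH", 54321, 443, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
    + b"payload"
)
tcp = parse_tcp_header(sample)
payload = sample[tcp.header_length:]
print(tcp.source_port, tcp.destination_port, payload)
```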

  • HTTP Request (OSI Layer 7)

    GET / HTTP/1.1
    host: www.facebook.com

  • Putting it all together

    L3: Version | DSCP | ECN | Flow Label
        Payload Length | Next Header | Hop limit
        Source Address
        Destination Address

    L4: Source Port | Destination Port
        Sequence Number
        Acknowledgement Number
        ...

    L7: GET / HTTP/1.1
        host: www.facebook.com

  • Putting it all together

    [Diagram: an HTTP Request carried inside a TCP Segment, carried inside an IP Packet]
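
The nesting of HTTP request, TCP segment, and IP packet can be sketched by building the three layers by hand and then peeling them off again. This is an illustration only: the checksum is left at zero, the source address and ports are invented, and `unwrap` is a hypothetical helper that handles nothing beyond a plain IPv6 header followed directly by TCP.

```python
import ipaddress
import struct

# L7: the HTTP request from the slides.
http_request = b"GET / HTTP/1.1\r\nHost: www.facebook.com\r\n\r\n"

# L4: minimal 20-byte TCP header (data offset 5, PSH+ACK) in front of the request.
tcp_header = struct.pack(
    "!HHIIHHHH", 54321, 80, 1, 1, (5 << 12) | 0x018, 65535, 0, 0
)
tcp_segment = tcp_header + http_request

# L3: 40-byte IPv6 header; Next Header 6 announces that TCP follows.
ip_header = (
    struct.pack("!IHBB", 6 << 28, len(tcp_segment), 6, 64)
    + ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2a03:2880:2130:cf05:face:b00c::1").packed
)
ip_packet = ip_header + tcp_segment


def unwrap(packet: bytes) -> bytes:
    """Strip the L3 and L4 headers from an IPv6/TCP packet and return the L7 bytes."""
    payload_length, next_header = struct.unpack("!HB", packet[4:7])
    if next_header != 6:
        raise ValueError("only TCP (next header 6) is handled in this sketch")
    segment = packet[40:40 + payload_length]
    data_offset = (segment[12] >> 4) * 4  # TCP header length in bytes
    return segment[data_offset:]


recovered = unwrap(ip_packet)
print(recovered.decode("ascii").splitlines()[0])
```

The distinction matters for the rest of the talk: an L4 load balancer can make its decision after reading only the first two headers, while an L7 load balancer has to read all the way into the HTTP request.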

  • Serving Dynamic Facebook Requests

  • FB Request -- one web server

    [Diagram: the client sends an AAAA query for www.facebook.com to DNS and receives AAAA 2a03:2880:2130:cf05:face:b00c::1. It then sends GET / ... to a single HHVM web server.]

    HHVM: low rps (rps = requests per second)

    how do we get more rps?!

  • Add a load balancer!

    [Diagram: the client sends an AAAA query for www.facebook.com to DNS, receives AAAA 2a03:2880:2130:cf05:face:b00c::1, and requests http://www.facebook.com. GET / ... goes to an L7LB (proxygen), which forwards GET / ... to several HHVM (PHP) web servers.]

    L7LB: lots more rps. HHVM: low rps. (rps = requests per second)

    how do we get more rps?!
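
As a rough sketch of what the L7 tier adds, the example below puts one load balancer in front of several web servers and hands each request to the next server in turn. Round-robin selection is an assumption made for brevity; the slides do not say how proxygen chooses a backend. `L7LoadBalancer`, `make_web_server`, and the `hhvm-N` names are hypothetical.

```python
import itertools
from typing import Callable, List

# A backend takes a request line and returns a response body.
Backend = Callable[[str], str]


class L7LoadBalancer:
    """Accepts requests and passes each one to the next backend in rotation."""

    def __init__(self, backends: List[Backend]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._next_backend = itertools.cycle(backends)

    def handle(self, request_line: str) -> str:
        backend = next(self._next_backend)
        return backend(request_line)


def make_web_server(name: str) -> Backend:
    # Stand-in for an HHVM web server; `name` is a made-up label.
    return lambda request_line: f"{name} served {request_line}"


lb = L7LoadBalancer([make_web_server(f"hhvm-{i}") for i in range(3)])
responses = [lb.handle("GET / HTTP/1.1") for _ in range(6)]
for line in responses:
    print(line)
```

Total capacity now grows with the number of web servers behind the balancer, until the balancer itself becomes the limit.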

  • Add another load balancer!

    [Diagram: the client sends an AAAA query for www.facebook.com to DNS, receives AAAA 2a03:2880:2130:cf05:face:b00c::1, and requests http://www.facebook.com. GET / ... reaches an L4LB (ipvs), which forwards it to one of several L7LBs (proxygen); each L7LB forwards GET / ... to HHVM (PHP) web servers.]

    L4LB: network bound. L7LB: lots more rps. HHVM: low rps.

    how do we get more rps?!
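
An L4 load balancer decides where a connection goes using only addresses and ports, never the HTTP payload. One common way to do that, sketched below, is to hash the flow's addresses and ports so that every packet of a connection lands on the same L7LB. Hash-based selection is an assumption of this sketch (ipvs supports several schedulers, and the slides do not name one). `pick_l7lb`, the `l7lb-*` names, and the client address are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (source address, source port, destination address, destination port)
Flow = Tuple[str, int, str, int]


def pick_l7lb(flow: Flow, l7lbs: List[str]) -> str:
    """Map a TCP flow to one L7LB by hashing its addresses and ports."""
    key = "|".join(str(part) for part in flow).encode("ascii")
    digest = hashlib.sha256(key).digest()
    return l7lbs[int.from_bytes(digest[:8], "big") % len(l7lbs)]


L7LBS = ["l7lb-a", "l7lb-b", "l7lb-c", "l7lb-d"]
flow = ("2001:db8::1", 54321, "2a03:2880:2130:cf05:face:b00c::1", 443)

chosen = pick_l7lb(flow, L7LBS)
# Every later packet of the same connection hashes to the same L7LB.
print(chosen, pick_l7lb(flow, L7LBS) == chosen)
```

Because the work per packet is a hash and a forward, this tier tends to run out of network bandwidth before it runs out of CPU, which fits the "network bound" label on the slide.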

  • Add another load balancer!

    [Diagram: the client sends an AAAA query for www.facebook.com to DNS, receives AAAA 2a03:2880:2130:cf05:face:b00c::1, and requests http://www.facebook.com. ECMP spreads GET / ... across several L4LBs (labeled shiv and ipvs); each L4LB forwards to L7LBs (proxygen), and each L7LB forwards GET / ... to HHVM (PHP) web servers.]

    L4LB: network bound. L7LB: lots more rps. HHVM: low rps.
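
With ECMP in front, a flow passes through two selection steps: the router picks one of several equal-cost L4LBs, and that L4LB picks an L7LB. The sketch below models both steps as independent hashes of the flow. This is a simplified illustration, not a description of how shiv or any particular router computes its hash; `flow_hash`, `route`, the salts, and all host names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Flow = Tuple[str, int, str, int]  # (src addr, src port, dst addr, dst port)


def flow_hash(flow: Flow, salt: str) -> int:
    """Hash a flow; the salt keeps the two selection stages uncorrelated."""
    key = (salt + "|" + "|".join(str(part) for part in flow)).encode("ascii")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def route(flow: Flow, l4lbs: List[str], l7lbs: List[str]) -> Tuple[str, str]:
    """Return the (L4LB, L7LB) pair a flow traverses in this two-stage sketch."""
    l4lb = l4lbs[flow_hash(flow, "ecmp") % len(l4lbs)]  # router: ECMP next hop
    l7lb = l7lbs[flow_hash(flow, "l4lb") % len(l7lbs)]  # L4LB: backend choice
    return l4lb, l7lb


L4LBS = ["l4lb-1", "l4lb-2", "l4lb-3"]
L7LBS = ["l7lb-a", "l7lb-b", "l7lb-c", "l7lb-d"]
flow = ("2001:db8::1", 54321, "2a03:2880:2130:cf05:face:b00c::1", 443)

path = route(flow, L4LBS, L7LBS)
print(path)
```

Both choices depend only on the flow, so packets of one connection keep following the same path while different connections spread across every machine in each tier.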

  • Front end Web Cluster

    ~10

    ~100

    Thousands

  • Front end Web Cluster (cont.)

    x 10 or more

  • More RPS? Add another cluster!

    [Diagram: the client sends an AAAA query for www.facebook.com to DNS and receives AAAA 2a03:2880:2130:cf05:face:b00c::1. Two identical clusters are shown; in each, GET / ... passes through L4LBs (labeled shiv and ipvs) to L7LBs (proxygen) and on to HHVM (PHP) web servers.]

    how do we get more rps?!

  • Add another datacenter!

    www ?

    G