DockerCon EU 2015: The Glue is the Hard Part: Making a Production-Ready PaaS

Evan Krall, Site Reliability Engineer @ Yelp

Transcript of DockerCon EU 2015: The Glue is the Hard Part: Making a Production-Ready PaaS

Agenda

● Intro: Context: Yelp before PaaSTA; What's in a PaaS?
● Production-Ready: What makes a PaaS production-ready?
● PaaSTA: What parts does PaaSTA have? How did we glue them together?
● Wrap-up: Lessons learned; Next steps

Intro

Yelp’s Mission: Connecting people with great local businesses.

Yelp Stats: As of Q3 2015
[Slide figures: 89M, 32, 71%, 90M; labels not captured in the transcript]

Context: Yelp before PaaSTA

Service Oriented Architecture: Scale our engineering team by splitting our codebase into many smaller parts.

Dependency Hell: As services gain adoption, shared libraries become difficult to upgrade. Not all services are Python anymore.

Too Many Services: We can no longer fit all services on each service host. How do we split them up?

“I wonder how many organizations that say they're "doing DevOps" are actually building a bespoke PaaS. And how many of those realize it.”
— @markimbriaco

Basic PaaS Components

Scheduling: Decide which hosts run a service.

Delivery: Put the code on the host and run it.

Discovery: Tell clients where your service is running.

Production-Ready

What makes a PaaS trustworthy enough to run our website?

Production-ready systems minimize the impact of failures:

impact = frequency × severity × duration

A production-ready PaaS should minimize the impact of both application failures and PaaS failures.

Reduce failure frequency: Use stable components (software, hardware). You will always have failures.

Reduce failure severity

No SPOFs: Keep working when a box dies.

Graceful Degradation: Avoid full outages when components break.

Painless Upgrades: Upgrades should be easy, without downtime.

Reduce failure duration

Self-healing: Recover from common failures automatically.

Alerting: Tell humans when things are still broken.

Visibility: Make it easy for humans to diagnose issues.

PaaSTA: Yelp's open-source, Docker-based PaaS

● Delivery: Docker
● Scheduling: Mesos + Marathon
● Discovery: SmartStack
● Alerting: Sensu

Delivery in PaaSTA: Docker

● Self-contained artifacts
● Provides software flexibility
● Reproducible builds
● Resource limits make scheduling easier

Scheduling in PaaSTA: Mesos and Marathon

● Mesos is an "SDK for distributed systems"; batteries not included
● Requires a framework:
○ Marathon (like an ASG for Mesos)
○ Chronos (periodic tasks)
● Supports Docker as a task executor

Marathon

● Runs N copies of a Docker image
● Works with Mesos to find space on the cluster
● Replaces dead instances

[Slide sequence: Mesos architecture diagram, from http://mesos.apache.org/documentation/latest/mesos-architecture/, annotated with Marathon as the framework and Docker as the executor]

How do we build & distribute Docker images?

Building Docker images

● Jenkins builds and tests images
● Blesses images by creating git tags
○ 1:1 git commit <-> Docker image
● Pushes to the registry

Shipping Docker images

● Distribution via a private registry
● S3 bucket shared among all environments

[Slide: diagram. The image (code + metadata) is built once, then the same artifact flows to stage and prod]


How do we configure Marathon?


Aside: Declarative Control

● Describe the end goal, not the path
● Helps achieve fault tolerance

"Deploy 12abcd34 to prod"
vs.
"Commit 12abcd34 should be running in prod"

Gas pedal vs. cruise control
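One way to read the cruise-control analogy (a generic sketch, not PaaSTA code; get_running_sha and deploy are hypothetical callables): a declarative system runs a loop that compares the desired state with the observed state and acts to close the gap, which is also what makes it self-healing.

import time

def control_loop(desired_sha, get_running_sha, deploy):
    # Cruise control: keep observing, keep correcting. The operator states
    # the goal ("commit 12abcd34 should be running in prod"); the system
    # owns the path there, so it also recovers when instances die.
    while True:
        if get_running_sha() != desired_sha:
            deploy(desired_sha)
        time.sleep(30)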

Configuring Marathon

● Need a wall around Marathon: it has root on your entire cluster
● A cron job combines the per-service config with the currently-blessed Docker image and hands the result to Marathon
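To make that concrete (an illustrative sketch in Marathon's app-definition format; the registry host and tag scheme here are hypothetical), the cron job ultimately posts something like:

{
  "id": "/service_1.main",
  "instances": 12,
  "cpus": 1,
  "mem": 500,
  "container": {
    "type": "DOCKER",
    "docker": {"image": "docker-registry.example.com/services-service_1:paasta-12abcd34"}
  },
  "healthChecks": [{"protocol": "HTTP", "path": "/status"}]
}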

marathon-$cluster.yaml

● # of tasks
● CPU, memory
● How to healthcheck your service
● Bounce strategy
● Command / args
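A rough sketch of such a file (the instance name "main" and the exact key names are illustrative, from memory of PaaSTA's docs, and may not match this version):

main:
  instances: 12
  cpus: 1
  mem: 500
  healthcheck_uri: /status
  bounce_method: crossover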

Demo: Deploys


How do services talk to each other?

Discovery in PaaSTA: SmartStack

● Registration agent on each box writes to ZooKeeper
● Discovery agent on each box reads from ZK and configures HAProxy


Registration


Registering with SmartStack

● configure_nerve.py queries the local mesos-slave API
● Keeping it local means registration works even if the Mesos master or Marathon is down
● We can register non-PaaSTA services as well
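A minimal sketch of that local query (assuming the Mesos agent's /state.json endpoint on its default port 5051; the exact field layout varies by Mesos version):

import json
from urllib.request import urlopen

def local_running_tasks():
    # Ask the local agent, not the master: registration keeps working
    # even when the Mesos master or Marathon is down.
    with urlopen("http://localhost:5051/state.json", timeout=5) as resp:
        state = json.load(resp)
    tasks = []
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("state") == "TASK_RUNNING":
                    ports = task.get("resources", {}).get("ports")
                    tasks.append((task["name"], ports))
    return tasks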

Architecture: Registration

[Slide: diagram of a service host. configure_nerve.py reads task metadata from the local mesos-slave and generates config for nerve; nerve healthchecks service_1, service_2, and service_3 through hacheck and registers them in ZooKeeper (ZK)]

ZooKeeper Data

Nerve registers each service instance in ZooKeeper:

/nerve/region:myregion
├── service_1
│   └── server_1_0000013614
├── service_2
│   └── server_1_0000000959
├── service_3
│   ├── server_1_0000002468
│   └── server_2_0000002467
[...]

Each znode holds the instance's connection info:

{
  "host": "10.0.0.123",
  "port": 31337,
  "name": "server_1",
  "weight": 10
}

hacheck

Normally hacheck acts as a transparent proxy for healthchecks:

$ curl -s yocalhost:6666/http/service_1/1234/status
{
  "uptime": 5693819.315988064,
  "pid": 2595160,
  "host": "server_1",
  "version": "b6309e09d71da8f1e28213d251f7c"
}

hacheck can also force healthchecks to fail before we shut down a service:

$ hadown service_1
$ curl -s yocalhost:6666/http/service_1/1234/status
Service service_1 in down state since 1443217910: krall

Discovery

Architecture: Discovery

[Slide: diagram. configure_synapse.py generates config for synapse; synapse reads instance metadata from ZK (written by nerve) and configures haproxy; client traffic flows through haproxy to the service instances]

HAProxy

● By default, binds to 0.0.0.0
● Binds only to yocalhost on public-facing servers
● Gives us goodies for all clients:
○ Redispatch on connection failure
○ Easy request logging
○ Rock-solid load balancing
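For flavor, the per-service HAProxy config that gets generated looks roughly like this (an illustrative sketch, not Synapse's literal output; the frontend port is hypothetical, and the healthcheck goes through hacheck on port 6666):

frontend service_1
  mode http
  bind 169.254.255.254:20973
  default_backend service_1

backend service_1
  mode http
  option redispatch
  option httpchk GET /http/service_1/31337/status
  server server_1_0000013614 10.0.0.123:31337 check port 6666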


yocalhost


● One HAProxy per host
● What address do we bind HAProxy to?
● 127.0.0.1 is per-container
● Add a loopback address to the host: 169.254.255.254
● This also works on servers without Docker
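Adding that address is a one-liner on each host (a sketch; the slide's diagram shows it as the alias lo:0):

$ ip addr add 169.254.255.254/32 dev lo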

[Slide: network diagram. Each Docker container has its own lo (127.0.0.1) and an eth0 (169.254.14.x) on the host's docker0 bridge (169.254.1.1); the host has a real eth0 (10.1.2.3) and runs haproxy bound to the yocalhost loopback alias lo:0 (169.254.255.254)]

smartstack.yaml

● Port that HAProxy binds to
● Mode (TCP/HTTP)
● Timeouts
● Healthcheck URI
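A rough sketch of such a file (key names are illustrative, from memory of SmartStack/PaaSTA docs; the port matches the hypothetical HAProxy example above):

main:
  proxy_port: 20973
  mode: http
  timeout_server_ms: 1000
  healthcheck_uri: /status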

Demo: Discovery


Monitoring


Monitoring a PaaS is different

● Things can change frequently
○ Which boxes run which services?
○ What services even exist?
● Traditional "host X runs service Y" checks don't work anymore

Monitor the invariants

● N copies of a service are running
● Marathon is running on X, Y, Z
● All nodes are running mesos-slave, synapse, nerve, docker
● Cron jobs have succeeded recently

Sensu monitoring

● Decentralized checking
● Client executes checks, puts results on a message queue
● Sensu servers handle results from the queue and route them to email, PagerDuty, JIRA, etc.
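Decentralized checking means each box schedules its own checks; a sketch of a Sensu 0.x standalone check definition (the plugin shown is the standard Nagios check_procs, and the handler name is illustrative):

{
  "checks": {
    "mesos_slave_running": {
      "command": "check_procs -C mesos-slave -c 1:",
      "standalone": true,
      "interval": 60,
      "handlers": ["pagerduty"]
    }
  }
}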

We can send our own events:

try:
    something_that_might_fail()
except Exception:
    send_failure_event()
else:
    send_success_event()
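A runnable sketch of "send an event", relying on the Sensu client's local input socket (port 3030 by default); deploy() and the check name are hypothetical:

import json
import socket

def send_sensu_event(name, status, output):
    # Submit a check result to the local sensu-client socket.
    # status follows Nagios conventions: 0 = OK, 2 = CRITICAL.
    event = {"name": name, "status": status, "output": output}
    with socket.create_connection(("localhost", 3030), timeout=5) as sock:
        sock.sendall(json.dumps(event).encode("utf-8"))

try:
    deploy()  # hypothetical: the work that might fail
except Exception as e:
    send_sensu_event("deploy_service_1", 2, str(e))
else:
    send_sensu_event("deploy_service_1", 0, "deploy succeeded")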

Lessons Learned: What has PaaSTA taught us?

Interfaces are important


App-Infra boundary: Permissive enough for developers to do their job, strict enough to prevent infrastructure from ballooning.

Between infra components: The right abstractions can save you a lot of work if you need to swap components.

"Evolution versus Revolution": Iterative improvements find local optima. Sometimes you need to take bigger risks to get bigger rewards.

What's next for PaaSTA?

● It's open source now!
● More polish, docs, examples
● Support more technologies
○ Chronos in progress
○ Docker Swarm?
○ Kubernetes?

Thank you!

Evan Krall
@[email protected]