Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes...

42
Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production [email protected] [email protected] KubeCon - Berlin, 29-30 March 2017

Transcript of Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes...

Page 1: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production

[email protected] [email protected]

KubeCon - Berlin, 29-30 March 2017

Page 2: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

An inspiring travel company ..

Page 3: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

A tech company to the core

Tech department: 300+ people

Applications: ~100

Database: 4 TB of data

Servers: 1400 VMs, 300 physical machines

Locations: Chiasso, Milan, Madrid, London, Bengaluru

Page 4: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Business: "technology is slow"

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

Page 5: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Technology: "the monolith is the problem"

https://www.flickr.com/photos/southtopia/5702790189

Page 6: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

"... let’s break into microservices!"

Page 7: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

A lot of issues

● LONG provisioning time

● LACK OF alignment across environments

● LACK OF alignment across applications

● LACK OF awareness about ops

Page 8: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business● throwing away our whole datacenter

Page 9: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Our infrastructure and our architecture

Page 10: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Virtualization platform

TONS

OF

VIRTUAL MACHINES

Page 11: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Virtualization platform

TONS

OF

VIRTUAL MACHINES

Page 12: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

● CoreOS, the all-in-one choice

○ Cloudconfig configuration

○ Automatable in a shot

○ Really simple patch management

Engage

Page 13: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Our Kubernetes on CoreOS architecture is born

● The stack○ ETCD○ FLANNELD○ DOCKER

● KUBERNETES (Google!)

K8S

DOCKER

FLAN

NE

LD

ET

CD

CoreOS

Po

dP

od

Po

d

Server

Page 14: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

NODE 2

NODE 1

NODE 2

NODE 1

How to talk with pods

NGINX

NGINX

Pod

Pod

Pod

Pod

Pod

np

np

np

Pod

Pod

Pod

Proxy

np

np

Pod

Pod

Podnp

Proxy

Proxy

Proxy

F5 F5

tcp http

NodePort Ingress

Page 15: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

In the name of service

- host: awesomeservice.prd.mykubecluster.intra http: paths: - path: / backend: serviceName: awesomeservice servicePort: 8081

awesomeservice-ingress.yaml

Page 16: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

In the name of service

*.[prd|qa|dev].mykubecluster.intra. IN CNAME kubef5ingress

Page 17: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

The return of NodePort

np

np

Pod

Pod

Pod

np Proxy

NODE n

F5 TLS TLSTLS

tcp

TLS

Page 18: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

The registry brought another question...

?

Page 19: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Seriously?

Page 20: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Rear window on kubernetes

Server

graphite

OScollectd

image

Nagios first Grafana 4 now

icons from http://www.flaticon.com

Kube API

Page 21: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

We were happy!

Page 22: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Not happy anymore

Page 23: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Seriously?

Page 24: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

The change… It’s a kind of magic

KEEP CALM

andTRUST KUBERNETES

Page 25: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

What we learned

Lots of things!

Page 26: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

The final architecture (so far…)

K8S

DOCKER

FLAN

NE

LD

ET

CD

Ubuntu

Po

dP

od

Po

d

F5

OU

TSID

EKU

BERN

ETES INSIDE KUBERNETES:

3 different environments7 MASTERS

2 REGISTRYs+ 70 PHYSICAL NODES

+ 47 ETCDs+ 7 DNS

+ 140 Namespaces+ 1300 PODs

ingress

Page 27: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Our infrastructure and our architecture

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Page 28: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Our core axioms

● same architecture across environments● a common framework to align software● centralized monitoring/logging, with alerts● zero downtime deployment ● automation everywhere

Page 29: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

APP3-PRODUCTION

Kubernetes: our architecture

APP2-PRODUCTIONAPP1-PRODUCTION

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-PREVIEW

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-DEVELOPMENT

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-QA

nonproductionproduction

Page 30: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Kubernetes: our architecture and choices

APP1-PRODUCTION

deployment

replica-set

app1-production.prd.mykubecluster.intra

secret configmap

POD-3POD-2POD-1

production

Page 31: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

"To ingress or not to ingress? .."

NODE 1

NODE 2

NODE 3

● easier DNS management● customizable proxy server

● 3rd party tool● requires external sync● all requests go through it● reload risks

F5

NGINX

NGINX

Page 32: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

APP1-PRODUCTION

Kubernetes: our architecture and choices

POD

production

applicationfluentdcollectd

carbon

Page 33: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

APP1-PRODUCTION

POD

Monitoring and alerting: grafana/graphitecluster

graphite

applicationcollectd

Grafana 4

icons from http://www.flaticon.com

carbon

Page 34: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Zero downtime (1): graceful shutdown

lifecycle: preStop: exec: command: ["/stop_helper.sh"]

deployment.yaml

#!/bin/bash

wget http://localhost:8002/stop

stop_helper.sh

Page 35: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Zero downtime (2): graceful startup

private CompletableFuture run(Stream<CompletableFuture> startupJobs) {

return allOf(startupJobs.toArray(CompletableFuture[]::new)) .thenAccept(this::raiseReadinessUp) .exceptionally(this::shutdown);

}

JobsExecutor.java

Page 36: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Automate everything: pipeline DSL

microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1",2).fromGitRepo("git.lastminute.com/team/application")

lmn_deployCanaryStrategy(microservice,"qa") lmn_deployCanaryStrategy(microservice,"preview")lmn_deployCanaryStrategy(microservice,"production")

pipeline

Page 37: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Automate everything: pipeline

pulljar

builddocker

(gate)

QAcanary

(gate)

QAstable

(gate)

PREVcanary

(gate)

PREVstable

(gate)

PRODcanary

(gate)

PRODstable

● git push○ continuous integration○ continuous delivery

Page 38: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

https://www.flickr.com/photos/ghost_of_kuji/2763674926

.. failure ..

Page 39: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

nginx ingress controller problem

10.0.0.1

10.0.0.2

10.0.0.3

10.0.0.4

NGINX

NGINX

NGINX

10.0.0.5

10.0.0.6

F5

Page 40: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

There’s light .. at the end

Page 41: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

● 20K req/sec in the new cluster● 10 minutes to create a new environment ● whole pipeline runs in 16 minutes

○ 4 minutes to release 100 instances of a new version● 2M metrics/minute flows

Give me the numbers .. again!

Page 42: Tales from Lastminute.com Machine Room: Our Journey Towards a Full On-Premise Kubernetes Architecture in Production

Yes, we’re hiring!

THANKSwww.lastminutegroup.com