Driving DevOps at GS Shop using Mesos

Transcript
Page 1: Driving DevOps at GS Shop using Mesos
Page 2: Driving DevOps at GS Shop using Mesos

vivekjuneja

IT Innovation Center

Founding Member, Container Platform Team (2015 - Current)

DevOps, Cloud Infrastructure, Microservices, Containers

Page 3: Driving DevOps at GS Shop using Mesos

The DevOps Enabler

Page 4: Driving DevOps at GS Shop using Mesos

Happy families are all alike; every unhappy family is unhappy in its own way.

幸福的家庭大抵相同,而不幸的家庭却各有其不幸。

Page 5: Driving DevOps at GS Shop using Mesos

Productive teams are all alike; every unproductive team is unhappy in its own way.

Page 6: Driving DevOps at GS Shop using Mesos

Productivity = Happy Teams

Page 7: Driving DevOps at GS Shop using Mesos

AGENDA

Page 8: Driving DevOps at GS Shop using Mesos

NOT LONG AGO: 1995 - 2015

BEGINNING OF THE CHANGE: 2015 - 2016

ADOPTING THE CHANGE: 2016 - 2017

THE ROAD AHEAD: 2017 - 2019

Page 9: Driving DevOps at GS Shop using Mesos

NOT LONG AGO

BEGINNING OF THE CHANGE

ADOPTING THE CHANGE

THE ROAD AHEAD

Page 10: Driving DevOps at GS Shop using Mesos

SourceRepo NEXUS

BUILDER & DEPLOYER

DEV

TEST

STAGE

PROD

Build & Deploy

Maintenance

Development

Page 11: Driving DevOps at GS Shop using Mesos

Build & Deploy

7 days

10 changes

10 days

3 changes

Deploy Frequency

Lead Time for Change

Per developer per week

Changes per Deploy

Page 12: Driving DevOps at GS Shop using Mesos

Introducing…

Page 13: Driving DevOps at GS Shop using Mesos

OPERATIONS* DEVELOPER*

Page 14: Driving DevOps at GS Shop using Mesos

OPERATIONS* DEVELOPER*

Stable system

Minimal Changes

Control on changes

New Features

Fast Changes

Quick rollout to Prod

Page 15: Driving DevOps at GS Shop using Mesos

Service Management

Monolithic App

Simple, well-understood Management Primitives

Minimal Moving Parts

Page 16: Driving DevOps at GS Shop using Mesos

Multi-Apps / Microservices

New and Complicated Management Primitives

Too many Moving Parts

Service Management

Page 17: Driving DevOps at GS Shop using Mesos

YawnDrivenDeployment

(yawning)

Page 18: Driving DevOps at GS Shop using Mesos

YawnDrivenDeployment

(yawning)

Deploy Code at 3 AM to Production

Page 19: Driving DevOps at GS Shop using Mesos

NOT LONG AGO

BEGINNING OF THE CHANGE

THE ROAD AHEAD

ADOPTING THE CHANGE

Page 20: Driving DevOps at GS Shop using Mesos

Know thy Issues

Page 21: Driving DevOps at GS Shop using Mesos

We manage our machines like pets.

Pets get old and die, and it is sad.

Page 22: Driving DevOps at GS Shop using Mesos

Each team invents their own tools and processes

Page 23: Driving DevOps at GS Shop using Mesos

It takes a long time for developers to get feedback

Page 24: Driving DevOps at GS Shop using Mesos

Big Bang releases that are treated like religious events

Page 25: Driving DevOps at GS Shop using Mesos

Not-my-problem syndrome

Lack of Empathy between roles

Page 26: Driving DevOps at GS Shop using Mesos

Inspiration

Page 27: Driving DevOps at GS Shop using Mesos

Inverse Conway Maneuver

Page 28: Driving DevOps at GS Shop using Mesos

Inverse Conway Maneuver

Who is

Melvin Conway

Page 29: Driving DevOps at GS Shop using Mesos

Conway’s Law

organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations

Page 30: Driving DevOps at GS Shop using Mesos

organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations

Conway’s Law

Page 31: Driving DevOps at GS Shop using Mesos

Inverse Conway Maneuver

Design systems that impose constructive constraints on the teams to change the way they communicate and manage

Page 32: Driving DevOps at GS Shop using Mesos

Design systems that impose constructive constraints on the teams to change the way they communicate and manage

Inverse Conway Maneuver

Page 33: Driving DevOps at GS Shop using Mesos

O-ring theory of economics

Tasks of Production must be executed proficiently together in order for any of them to be of any value

Michael Kremer

Page 34: Driving DevOps at GS Shop using Mesos

Adrian Colyer

Page 35: Driving DevOps at GS Shop using Mesos

O-ring theory of economics

devops

Page 36: Driving DevOps at GS Shop using Mesos

O-ring theory of devops

[Diagram: ring of components A, B, C, D, E]

Page 37: Driving DevOps at GS Shop using Mesos

O-ring theory of devops

[Diagram: ring of components A, B, C, D, E]

Page 38: Driving DevOps at GS Shop using Mesos

Tenets

Page 39: Driving DevOps at GS Shop using Mesos

Disposable Apps

Developer Productivity

Shared Multi-tenant Infrastructure and Tooling

Automate Service management primitives

Measure and Log everything

Tenets

Page 40: Driving DevOps at GS Shop using Mesos

Inverse Conway Maneuver

O-ring theory of devops

Apply !

Master Plan

Page 41: Driving DevOps at GS Shop using Mesos

Service Delivery Platform

The building blocks for building reliable software at scale

Page 42: Driving DevOps at GS Shop using Mesos

End to End Workflow

CODE

Build Process

Docker Image

Deploy Preparation

Deployment Manifest

ZDD

Load Balancer Reload

DNS Update

DEPLOY PHASE (for all environments)

Page 43: Driving DevOps at GS Shop using Mesos

APM, Logging, Dashboard, Notification

End to End Workflow

Page 44: Driving DevOps at GS Shop using Mesos

One Deployment Manifest to rule them all

Page 45: Driving DevOps at GS Shop using Mesos

DEV

TEST

STAGE

PROD

Page 46: Driving DevOps at GS Shop using Mesos

Deployment Manifest Template

Page 47: Driving DevOps at GS Shop using Mesos

DEV TEST STAGE PROD
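
The idea of one deployment manifest template shared by DEV, TEST, STAGE and PROD can be illustrated with a short sketch. This is a guess at the approach, not the team's actual template: the field names follow the general shape of a Marathon app definition, while the registry host, the per-environment overrides, and the `render_manifest` helper are all hypothetical.

```python
import copy

# Hypothetical shared template. Field names mirror a Marathon app definition;
# every value here is illustrative only.
MANIFEST_TEMPLATE = {
    "id": "/{env}/{service}",
    "instances": 1,
    "cpus": 0.5,
    "mem": 512,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "{registry}/{service}:{version}"},
    },
    "labels": {"environment": "{env}"},
}

# Assumed per-environment overrides; only the differences live here.
ENV_OVERRIDES = {
    "dev": {"instances": 1},
    "test": {"instances": 1},
    "stage": {"instances": 2},
    "prod": {"instances": 6, "cpus": 1.0, "mem": 1024},
}


def _fill(node, params):
    """Recursively substitute {placeholders} in every string of the template."""
    if isinstance(node, str):
        return node.format(**params)
    if isinstance(node, dict):
        return {key: _fill(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [_fill(value, params) for value in node]
    return node


def render_manifest(env, service, version, registry="registry.example.internal"):
    """Render the single template for one environment."""
    if env not in ENV_OVERRIDES:
        raise ValueError(f"unknown environment: {env}")
    params = {"env": env, "service": service, "version": version, "registry": registry}
    manifest = _fill(copy.deepcopy(MANIFEST_TEMPLATE), params)
    manifest.update(ENV_OVERRIDES[env])
    return manifest
```

Calling `render_manifest("prod", "cart", "1.4.2")` and `render_manifest("dev", "cart", "1.4.2")` yields two manifests that differ only in the overridden fields, which is the point of keeping a single template.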

Page 48: Driving DevOps at GS Shop using Mesos

Blue Green Deployment

NEW OLD OLD OLD OLD OLD OLD

HAProxy

ZDD Control

Marathon

Mesos

Control

disabled
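
A blue-green switch of the kind drawn on this slide can be sketched as a small in-memory simulation. It is a simplified, all-at-once variant under stated assumptions: the `Backend` class, the `blue_green_switch` function and the `health_check` callback are hypothetical names, and nothing here calls HAProxy, Marathon or Mesos.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    color: str            # "blue" for the old version, "green" for the new one
    healthy: bool = True
    enabled: bool = True  # whether the load balancer sends traffic to it


def blue_green_switch(backends, new_instances, health_check):
    """Bring up new instances and switch traffic only if all of them are healthy.

    Returns (pool, switched). When any new instance fails its health check,
    the old pool is returned untouched and keeps serving traffic.
    """
    candidates = [Backend(name, "green", enabled=False) for name in new_instances]
    for backend in candidates:
        backend.healthy = health_check(backend.name)

    if not all(backend.healthy for backend in candidates):
        return backends, False

    for backend in backends:
        backend.enabled = False   # old instances are disabled, not destroyed
    for backend in candidates:
        backend.enabled = True
    return backends + candidates, True
```

Keeping the old instances disabled rather than deleting them is what makes a quick switch back possible if the new version misbehaves after the cutover.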

Page 49: Driving DevOps at GS Shop using Mesos

Blue Green Deployment

Zero Downtime Deployment

Ideally

Page 50: Driving DevOps at GS Shop using Mesos

Blue Green Deployment

Deploy when awake !

Page 51: Driving DevOps at GS Shop using Mesos

Notifications

Page 52: Driving DevOps at GS Shop using Mesos

Custom Dashboard

Metadata based Rollback

Comprehensive Health Check uses Service Discovery

Supports Multiple Data Centers / Platform Regions

Common API for Developers for integration with CI Server

Page 53: Driving DevOps at GS Shop using Mesos
Page 54: Driving DevOps at GS Shop using Mesos

Devops Metrics

Marathon /events

failed_health_check_event

deployment_success

deployment_failed

deployment_step_failure

Timeseries DB

Metrics Collector
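
A metrics collector for the event types named on this slide might look like the sketch below. It assumes each event arrives as one JSON object per line carrying an `eventType` field, which is how Marathon's event stream is generally described. The `collect_counts` and `to_points` helpers and the `marathon.` metric prefix are hypothetical, and the write to the time-series database is left out.

```python
import json
from collections import Counter

# Event types listed on the slide.
TRACKED_EVENTS = {
    "failed_health_check_event",
    "deployment_success",
    "deployment_failed",
    "deployment_step_failure",
}


def collect_counts(event_lines):
    """Count tracked event types from an iterable of JSON-encoded event lines."""
    counts = Counter()
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input rather than stop collecting
        if not isinstance(event, dict):
            continue
        event_type = event.get("eventType")
        if event_type in TRACKED_EVENTS:
            counts[event_type] += 1
    return counts


def to_points(counts, timestamp):
    """Turn counts into generic time-series points (assumed point format)."""
    return [
        {"metric": f"marathon.{name}", "value": value, "timestamp": timestamp}
        for name, value in sorted(counts.items())
    ]
```

Counting successes and failures separately allows deploy frequency and failure rate to be derived later from the same series.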

Page 55: Driving DevOps at GS Shop using Mesos

Devops Metrics

Page 56: Driving DevOps at GS Shop using Mesos

APM

Monitoring and Alerts

Hardware Monitoring (VM, Physical Machine)

Service Monitoring (container, non-container)

Container Platform Stack Monitoring

Service Latency

Service Tracing

Audit Trail

Page 57: Driving DevOps at GS Shop using Mesos

APM

Hardware Monitoring (VM, Physical Machine)

Service Monitoring (container, non-container)

Container Platform Stack Monitoring

Service Latency

Service Tracing

Audit Trail

Monitoring and Alerts

Page 58: Driving DevOps at GS Shop using Mesos

Monitoring and Alerts

Page 59: Driving DevOps at GS Shop using Mesos

Monitoring and Alerts

Page 60: Driving DevOps at GS Shop using Mesos

Fault Identification

Notification, APM, Log, Dashboard

TRACEID, MESOS_TASK_ID, MARATHON_APP

Marathon

Kill / Kill and Scale
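
The path from a log line to a kill action can be sketched as follows. The `KEY=value` log layout is an assumption made for illustration, since the slide shows only the field names TRACEID, MESOS_TASK_ID and MARATHON_APP. The request shape follows Marathon's documented task-kill endpoint (`DELETE /v2/apps/{appId}/tasks/{taskId}` with an optional `scale` parameter), and the function names are hypothetical. No request is actually sent.

```python
import re

# Assumed log layout: fields appear as KEY=value somewhere in the line.
FIELD_PATTERN = re.compile(r"\b(TRACEID|MESOS_TASK_ID|MARATHON_APP)=(\S+)")


def extract_fault_context(log_line):
    """Pull the correlation fields out of a single log line."""
    return dict(FIELD_PATTERN.findall(log_line))


def kill_request(context, scale=False):
    """Describe the Marathon request that kills the faulty task.

    With scale=True the app is also scaled down by one ("Kill and Scale").
    Returns (method, path); sending it is left to the caller.
    """
    app_id = context["MARATHON_APP"].strip("/")
    task_id = context["MESOS_TASK_ID"]
    path = f"/v2/apps/{app_id}/tasks/{task_id}"
    if scale:
        path += "?scale=true"
    return ("DELETE", path)
```

Carrying the same three identifiers through notifications, APM, logs and dashboards is what lets an operator move from an alert to the exact task without searching by hand.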

Page 61: Driving DevOps at GS Shop using Mesos

Fault Identification

Page 62: Driving DevOps at GS Shop using Mesos

Fair Share Usage

Gravity Platform

DEV Environment #1

TEST Environment #1

DEV Environment #2

TEST Environment #1

DEV Environment #2

Disposable Transient Dev/Test Environments

Hours

Page 63: Driving DevOps at GS Shop using Mesos

Platform Provisioning

Worker Node: Mesos Agent, Docker, Monit, Log Forwarder, cAdvisor

Master Node: Mesos Master, Marathon, Monit, Log Forwarder

Prometheus, cAdvisor, HAProxy, Marathon-LB

Standardization

Page 64: Driving DevOps at GS Shop using Mesos

Platform Provisioning

FLEET

PKG REPOSITORY

WORKER WORKER WORKER MASTER MASTER

Page 65: Driving DevOps at GS Shop using Mesos

Platform Provisioning

FLEET

WORKER WORKER WORKER MASTER …...

WORKER WORKER WORKER MASTER …...

WORKER WORKER WORKER MASTER …...

Page 66: Driving DevOps at GS Shop using Mesos

Platform Provisioning

FLEET

WORKER WORKER WORKER MASTER …...

WORKER WORKER WORKER MASTER …...

APPLICATIONS SYSTEM MGMT

WORKER WORKER WORKER MASTER …...

Page 67: Driving DevOps at GS Shop using Mesos

& many more

Page 68: Driving DevOps at GS Shop using Mesos

NOT LONG AGO

BEGINNING OF THE CHANGE

THE ROAD AHEAD

ADOPTING THE CHANGE

Page 69: Driving DevOps at GS Shop using Mesos

Change is hard !

Page 70: Driving DevOps at GS Shop using Mesos

Evaluation

Experience in Production

Confidence in Adoption

Tipping Point

Maturity with Devops

Timescale

Page 71: Driving DevOps at GS Shop using Mesos

Our Adoption Playbook

Confidence in Technology

Compare and Contrast

Create new Roles

Page 72: Driving DevOps at GS Shop using Mesos

Present Present Future Future

Unified Deployment

Centralized Logging

Consolidated Monitoring

Common Notifications

Evaluation Experience in Production

Confidence in Adoption

Page 73: Driving DevOps at GS Shop using Mesos

Compare and Contrast

L4 (load balancer) in front of three Dedicated VMs, each running an App Server

MarathonLB (two instances) in front of Mesos Agents running App Containers

Dedicated VMs: 100% ... 50% ... 0%

App Containers: 0% ... 50% ... 100%

Launch, Stability, Confidence
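
The Compare and Contrast slide pairs the percentages 100%, 50%, 0% and 0%, 50%, 100% with the labels Launch, Stability and Confidence. One reading, and it is only an assumption, is that they describe the share of traffic served by dedicated VMs versus app containers at each stage. Under that reading, a deterministic split could be sketched like this; the stage table and the `route` function are hypothetical.

```python
import zlib

# Assumed share of traffic (in percent) sent to app containers per stage.
CONTAINER_SHARE = {"launch": 0, "stability": 50, "confidence": 100}


def route(request_id, stage):
    """Pick "vm" or "container" for a request, stable for a given request id."""
    if stage not in CONTAINER_SHARE:
        raise ValueError(f"unknown stage: {stage}")
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return "container" if bucket < CONTAINER_SHARE[stage] else "vm"
```

Because the bucket depends only on the request id, a request that reaches containers at one stage keeps reaching them at every later stage, so the comparison between the two stacks stays consistent as the share grows.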

Page 74: Driving DevOps at GS Shop using Mesos

New Roles

Page 75: Driving DevOps at GS Shop using Mesos

OPERATIONS DEVELOPER

Create Self Service

Automate Primitives

Shared Goal with Dev

Use Self Service

Ops friendly code

Shared Goal with Ops

Page 76: Driving DevOps at GS Shop using Mesos

NEW SHARED GOALS

Reduce Time from Code checkin to Production Release

Ensure Releases can be performed during normal business hours

Reduce unplanned work and increase productivity

Page 77: Driving DevOps at GS Shop using Mesos

Reality Check 1

Allocate VM to a Service

Let Software decide

Less upfront capacity allocation meetings, and more work done !

Page 78: Driving DevOps at GS Shop using Mesos

Reality Check 2

Availability and Tolerance = manual mgmt.

Let Software decide

Less manual intervention, and more time to spend on improving quality

Page 79: Driving DevOps at GS Shop using Mesos

Reality Check 3

Time to Production influenced by a lot of manual, monotonous work

Minimal Manual work and increased Self-service

Ops work made more accessible to Devs via Self Service

Page 80: Driving DevOps at GS Shop using Mesos

Reality Check 4

Limited reusability and lack of standards across teams

Standardize through Containers and Deployment primitives

Reusability across teams, and more time to focus on innovation

Page 81: Driving DevOps at GS Shop using Mesos

Reality Check

1. Less upfront capacity allocation meetings, and more work done !

2. Less manual intervention, and more time to spend on improving quality

3. Ops work made more accessible to Devs. Ops spend more time improving quality

4. Reusability across teams, and more time to focus on innovation

Page 82: Driving DevOps at GS Shop using Mesos

But DevOps is also about Architecture

Page 83: Driving DevOps at GS Shop using Mesos

Architecture Constraints

Ops friendly development

Metrics friendly development

Build systems which are failure aware

Everything is distributed

Page 84: Driving DevOps at GS Shop using Mesos

NOT LONG AGO

BEGINNING OF THE CHANGE

THE ROAD AHEAD

ADOPTING THE CHANGE

Page 85: Driving DevOps at GS Shop using Mesos

Multi-Tenant Cluster

Mix workloads for increased efficiency

Share Common platform primitives

Performance and Isolation guarantees

Avoid Noisy neighbours

Mesos Agent Selection

Isolated Load balancer and Discovery

Isolated Container Registry

Resource Reservation

Page 86: Driving DevOps at GS Shop using Mesos

Global Workload Allocation

Not all workloads are same !

C: Cost

P: Performance

I: Isolation

Page 87: Driving DevOps at GS Shop using Mesos

Mesos Cluster

MULTI-TENANT: Physical Machines (Mesos Agents)

SINGLE-TENANT: Physical Machines (Mesos Agents)

MULTI-TENANT: VMs (Mesos Agents)

SINGLE-TENANT: VMs (Mesos Agents)

CP IP CP CI

Global Workload Allocation
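
How a workload's priorities (C for Cost, P for Performance, I for Isolation) might select one of the four agent pools can be sketched as below. The slide does not spell out the mapping rule, so the rule used here is an assumption: Isolation selects single-tenant agents, Performance selects physical machines, and Cost is the default that keeps a workload on shared VMs. The pool names and `choose_pool` are hypothetical.

```python
VALID_PRIORITIES = {"C", "P", "I"}  # Cost, Performance, Isolation


def choose_pool(priorities):
    """Map a set of priority codes to one of four agent pools.

    Assumed rule (not taken from the slides):
      - "I" present -> single-tenant agents, otherwise multi-tenant
      - "P" present -> physical machines, otherwise VMs
      - "C" adds no constraint; it is the cost-efficient default
    """
    priorities = set(priorities)
    unknown = priorities - VALID_PRIORITIES
    if unknown:
        raise ValueError(f"unknown priority codes: {sorted(unknown)}")
    tenancy = "single-tenant" if "I" in priorities else "multi-tenant"
    machine = "physical" if "P" in priorities else "vm"
    return f"{tenancy}-{machine}"
```

For example, `choose_pool({"I", "P"})` lands on single-tenant physical machines, while `choose_pool({"C"})` stays on multi-tenant VMs.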

Page 88: Driving DevOps at GS Shop using Mesos

Mesos Cluster

Custom Framework

Fenzo

Mesos Master

Physical Machine (Mesos Agent)

Physical Machine (Mesos Agent)

VM (Mesos Agent)

VM (Mesos Agent)

Global Workload Allocation

Page 89: Driving DevOps at GS Shop using Mesos

Custom Framework

● Recommendation System for Resource allocation

● Integrate with Billing system to provide cost-efficient allocation scheme

Global Workload Allocation

Page 90: Driving DevOps at GS Shop using Mesos

Custom Framework

● Recommendation System for Resource allocation

● Integrate with Billing system to provide cost-efficient allocation scheme

Global Workload Allocation

● Support more advanced bin-packing with Fenzo #2430 (mesosphere/marathon)
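
To make the bin-packing goal concrete, here is a generic first-fit-decreasing heuristic over CPU requests. It is a textbook illustration of bin packing, not Fenzo's algorithm or API, and the `pack_tasks` function with its single-resource model is a simplifying assumption.

```python
def pack_tasks(tasks, agent_cpus):
    """Pack tasks onto as few identical agents as this heuristic finds.

    tasks: mapping of task name -> CPU request
    agent_cpus: CPU capacity of every agent
    Returns a list of task-name lists, one per agent used.
    """
    agents = []  # each entry: {"free": remaining CPUs, "tasks": [names]}
    # Largest requests first; ties broken by name for a stable result.
    for name, cpus in sorted(tasks.items(), key=lambda item: (-item[1], item[0])):
        if cpus > agent_cpus:
            raise ValueError(f"task {name} does not fit on any agent")
        for agent in agents:
            if agent["free"] >= cpus:
                break  # first agent with enough room wins
        else:
            agent = {"free": agent_cpus, "tasks": []}
            agents.append(agent)
        agent["free"] -= cpus
        agent["tasks"].append(name)
    return [agent["tasks"] for agent in agents]
```

Packing tightly leaves whole agents idle, which is what makes the billing integration mentioned above meaningful: idle agents can be released or reassigned.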

Page 91: Driving DevOps at GS Shop using Mesos

Application aware scheduling

Tenant Allocation Framework

Mesos Cluster

Mesos Agent Mesos Agent Mesos Agent Mesos Agent

Tenant Tenant Tenant

Mesos Master

Page 92: Driving DevOps at GS Shop using Mesos

Application aware scheduling

Cost Reduction

Page 93: Driving DevOps at GS Shop using Mesos

Canary Release and Architecture A/B

Mesos Agent Mesos Agent Mesos Agent Mesos Agent

Mesos Master

Marathon

Auto Scaling based on automated workflows

Automated Workflows for Testing new versions

Self Healing and automated rollback to previous versions
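
The automated rollback decision for a canary release can be sketched as a comparison of error rates. The thresholds, the `canary_decision` name and the use of error rate as the only signal are assumptions; a real workflow would likely weigh latency and health checks as well.

```python
def canary_decision(baseline_errors, baseline_requests,
                    canary_errors, canary_requests,
                    tolerance=0.01, min_requests=100):
    """Return "wait", "promote" or "rollback" for a canary version.

    tolerance: how much worse (absolute error rate) the canary may be
    min_requests: minimum canary traffic before any decision is made
    """
    if canary_requests < min_requests:
        return "wait"  # not enough data to judge the new version
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    if canary_rate > baseline_rate + tolerance:
        return "rollback"
    return "promote"
```

A "rollback" result would trigger a redeploy of the previous version, and "promote" would continue the rollout to the remaining agents.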

Page 94: Driving DevOps at GS Shop using Mesos

THE EARLY YEARS

BEGINNING OF THE CHANGE

THE ROAD AHEAD

ADOPTING THE CHANGE

Page 95: Driving DevOps at GS Shop using Mesos

DAY ONE

Page 96: Driving DevOps at GS Shop using Mesos

Change is possible.

Page 97: Driving DevOps at GS Shop using Mesos

Change is possible.

Microsoft joins Linux Foundation - November 16, 2016

Page 98: Driving DevOps at GS Shop using Mesos

We love the Mesos community !

Thanks

고맙습니다

谢谢

Questions ?

Page 99: Driving DevOps at GS Shop using Mesos

We love the Mesos community !

Thanks

Want to help us build this further ?

We are hiring !

고맙습니다

谢谢