AWS re:Invent 2016: DevOps on AWS: Advanced Continuous Delivery Techniques (DEV403)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Mark Mansour, Senior Manager, Continuous Delivery

November 30, 2016

DEV403

DevOps on AWS: Advanced Continuous Delivery Techniques

What to expect from the session

Make your pipeline safer by

1. Identifying production issues quickly

2. Deploying changes safely

3. Automatically deciding when to release changes

Techniques

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Starting Point:

The release process is automated

Prerequisites

• Versioned source

• Automated build

• Automated deployments

• Deploy to > 1 instance

• Unit tests

• Integration tests

• Continuous Delivery

• Operations dashboard

Source → Build → Deploy to Integration Stack → Integration Tests → Deploy to Production

Best practices with your tools

• Focus on best practices

• Keep using your current tools where possible

• Deployment tools

• Continuous Integration and Continuous Delivery Tools

• Extend your current tools when needed

• This talk uses AWS tools

Tools used in this talk

Monitoring

Amazon CloudWatch

Software Development

Amazon SNS

AWS Lambda

Deployment

AWS CodeDeploy

AWS CodePipeline

Model the release process in CodePipeline

[Diagram: the MyApp pipeline modeled in CodePipeline. Source (CodeCommit) → Build → Integration (DeployToInteg with CodeDeploy, then IntegTest with End2EndTester) → Production (DeployToProd with CodeDeploy). It mirrors the release process: Source → Build → Deploy to Integration Stack → Integration Tests → Deploy to Production. Labels: Pipeline, Stage, Action, Pipeline Run, Change 1.]

Source change

• starts a run; and

• creates an artifact to be used by other actions.

Release and deploy process: Starting point

[Diagram: the MyApp pipeline. Source (CodeCommit) → Build → Integration (DeployToInteg with CodeDeploy, then IntegTest with End2EndTester) → Production (DeployToProd with CodeDeploy).]

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Techniques

Be aware when a service is unavailable

Problem:

A service can stop working at any time for reasons inside or outside of its control.

Consequence:

Your service may be unavailable without your team knowing about it.

1 of 5 – Continuous production testing

Use synthetic traffic to simulate real users

• Test all business critical functionality (UI and APIs)

• Tests must run quickly

• Measure client latencies

• Check for reachability

How synthetic traffic flows

[Diagram labels: Synthetic Traffic, CloudWatch Alarm.]

Synthetic traffic flow – why two metric streams?

[Diagram labels: CloudWatch Events (1m), CloudWatch Events (1m), Synthetic Traffic, CloudWatch Alarm.]

Building a synthetic traffic test

• Keep it simple

• Build logic in Lambda (invoke with CloudWatch Events)

• Capture data in CloudWatch metrics

Lambda’s synthetic traffic blueprint

Scheduling the synthetic traffic test

Building a synthetic traffic test - Code
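
The code on this slide did not survive extraction. As a minimal sketch of the idea, assuming a Python Lambda function (the talk's own functions were Node.js 4.3), the check below fetches one URL, measures client-side latency, and reports reachability and latency as CloudWatch metrics. The namespace, metric names, URL, and function names are hypothetical; `put_metric_data` is the boto3 CloudWatch call used to publish the values.

```python
import time
import urllib.request

NAMESPACE = "MyApp/SyntheticTraffic"  # hypothetical namespace
TARGET_URL = "https://example.com/health"  # hypothetical endpoint


def http_get(url, timeout=5):
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_check(url, fetch=http_get, clock=time.monotonic):
    """Probe url once and return (reachable, latency_ms)."""
    start = clock()
    try:
        reachable = 200 <= fetch(url) < 300
    except Exception:
        # Timeouts, DNS failures, and HTTP errors all count as unreachable.
        reachable = False
    latency_ms = (clock() - start) * 1000.0
    return reachable, latency_ms


def build_metric_data(reachable, latency_ms):
    """Build the MetricData list passed to CloudWatch put_metric_data."""
    return [
        {"MetricName": "Reachable", "Value": 1.0 if reachable else 0.0, "Unit": "Count"},
        {"MetricName": "LatencyMs", "Value": latency_ms, "Unit": "Milliseconds"},
    ]


def handler(event, context):
    """Lambda entry point, meant to be invoked by a scheduled CloudWatch Events rule."""
    import boto3  # imported here so the helpers above run without AWS access

    reachable, latency_ms = run_check(TARGET_URL)
    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE, MetricData=build_metric_data(reachable, latency_ms)
    )
    return {"reachable": reachable, "latency_ms": latency_ms}
```

Keeping the probe (`run_check`) separate from the publishing step keeps the test simple and lets the probe be exercised locally with a fake `fetch`.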

Building a synthetic traffic test – Alarming
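
The alarming slide was a screenshot. As a hedged sketch, an alarm on a synthetic traffic metric could be created with boto3's `put_metric_alarm`. The alarm name, namespace, metric name, and threshold below are hypothetical. Treating missing data as breaching is an assumption made here so that a test that stops reporting also raises the alarm; it is not taken from the talk.

```python
def build_alarm_request(topic_arn, failures_before_alarm=3):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "MyApp-synthetic-traffic-unreachable",  # hypothetical
        "Namespace": "MyApp/SyntheticTraffic",  # hypothetical
        "MetricName": "Reachable",  # hypothetical: 1 = reachable, 0 = not
        "Statistic": "Minimum",
        "Period": 60,  # the synthetic test runs every minute
        "EvaluationPeriods": failures_before_alarm,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # assumption: no data means the test is broken
        "AlarmActions": [topic_arn],  # SNS topic notified when the alarm fires
    }


def create_alarm(topic_arn):
    """Create or update the alarm; requires AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**build_alarm_request(topic_arn))
```

Requiring several consecutive failing periods trades detection speed for fewer false alarms; the right value depends on the service.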

Release and deploy process: Synthetic traffic

[Diagram: the Production stage (DeployToProd with CodeDeploy), now with Synthetic Traffic running against production.]

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Techniques

Rolling deployments – success

[Diagram: an ELB in front of a 10-host production fleet; every host moves from V1 to V2.]

2 of 5 – Manage deployment health

Rolling deployments – fail

[Diagram: the same ELB and 10-host production fleet during a rolling deployment from V1 to V2 that fails.]

Check for deployment failures in production

Problem:

There are no automated tests to verify a service is working after a new deployment.

Consequence:

Each production deployment needs to be checked manually.

Add safety to rolling deployments

1. Validate each host’s health

2. Ensure a minimum percentage of the fleet is healthy

3. Rollback if the deployment failed

Configure CodeDeploy

Step 1: Deployment Validation – AppSpec.yml
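
The AppSpec file on this slide was not captured. CodeDeploy lets an AppSpec `hooks` section run a script at the `ValidateService` lifecycle event, and a non-zero exit code fails the deployment on that host. The following is a minimal sketch of such a validation script, assuming the service exposes a local health endpoint; the URL, retry counts, and function names are hypothetical.

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # hypothetical local health endpoint


def is_healthy(url, timeout=2):
    """Return True when the local service answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        # Connection refused, timeouts, and HTTP errors all mean "not healthy yet".
        return False


def validate(check=is_healthy, attempts=5, delay_seconds=3, sleep=time.sleep):
    """Retry the health check; return exit code 0 on success and 1 on failure."""
    for attempt in range(attempts):
        if check(HEALTH_URL):
            return 0
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the service time to finish starting
    return 1
```

A hook script would end with `sys.exit(validate())` so that the exit code reaches CodeDeploy.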

Step 1: Working tests raise more issues

[Diagram: an ELB in front of a 10-host production fleet in which only some hosts have moved from V1 to V2.]

Step 2: Use minimum healthy hosts

Minimum healthy hosts (MHH) 70%, 10 hosts:

• 4 failures – 60% healthy: failed deployment

• 1 failure – 90% healthy

[Diagram: an ELB in front of the production fleet during a rolling deployment from V1 to V2.]

Step 2: Use minimum healthy hosts - CodeDeploy
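
The arithmetic behind the 10-host example can be written out as a short sketch. This is a simplified model of the minimum-healthy-hosts rule, not CodeDeploy's implementation, and the function names are hypothetical.

```python
def healthy_percent(total_hosts, failed_hosts):
    """Percentage of the fleet that is still healthy."""
    return 100.0 * (total_hosts - failed_hosts) / total_hosts


def deployment_succeeds(total_hosts, failed_hosts, minimum_healthy_percent):
    """True when the healthy share of the fleet stays at or above the minimum."""
    return healthy_percent(total_hosts, failed_hosts) >= minimum_healthy_percent


def max_tolerated_failures(total_hosts, minimum_healthy_percent):
    """Largest number of failed hosts that still satisfies the minimum (integer percent)."""
    return (total_hosts * (100 - minimum_healthy_percent)) // 100
```

With 10 hosts and a 70% minimum, `healthy_percent(10, 4)` is 60.0, so the deployment fails, while `healthy_percent(10, 1)` is 90.0, which stays above the minimum. Under this model the fleet tolerates at most 3 failed hosts.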

Step 3: Rollback when a deployment fails

• CodeDeploy: configured in deployment group
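
As a hedged sketch of that deployment group setting, automatic rollback can be switched on through boto3's CodeDeploy `update_deployment_group` call and its `autoRollbackConfiguration` parameter. The application and deployment group names are placeholders supplied by the caller, and the helper names are hypothetical.

```python
def build_rollback_update(application_name, deployment_group_name):
    """Build keyword arguments that enable rollback when a deployment fails."""
    return {
        "applicationName": application_name,
        "currentDeploymentGroupName": deployment_group_name,
        "autoRollbackConfiguration": {
            "enabled": True,
            # Roll back to the last successful revision when a deployment fails.
            "events": ["DEPLOYMENT_FAILURE"],
        },
    }


def enable_auto_rollback(application_name, deployment_group_name):
    """Apply the setting to an existing deployment group; requires AWS credentials."""
    import boto3

    request = build_rollback_update(application_name, deployment_group_name)
    boto3.client("codedeploy").update_deployment_group(**request)
```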

Release and deploy: Deployment health

[Diagram: the Production stage (DeployToProd with CodeDeploy), with Synthetic Traffic running against production.]

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Techniques

3 of 5 - Segment production

Bad changes must not affect all customers

Pipeline Problem:

When a critical issue reaches production, all hosts are affected.

Consequence:

Bad changes impact all customers.

Lower deployment risk by segmenting

1. Break production into multiple segments

2. Deploy to a segment

3. Test a segment after a deployment

4. Repeat 2 & 3 until done
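
The four steps above amount to a loop that stops at the first segment whose test fails. A minimal sketch, where `deploy` and `test` are hypothetical callables standing in for a CodeDeploy deployment and a post-deployment test:

```python
def deploy_in_segments(segments, deploy, test):
    """Deploy to each segment in order and test it before moving on.

    Returns (completed_segments, failed_segment). failed_segment is None
    when every segment passed its post-deployment test.
    """
    completed = []
    for segment in segments:
        deploy(segment)
        if not test(segment):
            # Stop here so the bad change never reaches the remaining segments.
            return completed, segment
        completed.append(segment)
    return completed, None
```

Ordering the segments from smallest to largest, for example a canary host before a whole Availability Zone, limits how many customers a bad change can reach.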

Segment Production

Step 1: Break production into multiple segments

Typical segment types:

• Region

• Availability Zone

• Sub-Zonal

• Single Host (Canary)

Step 1: Typical deployment segmentation

[Diagram: the production fleet in US-EAST-1, spread across Availability Zones US-EAST-1A and US-EAST-1B, moving from V1 to V2. Labels: Canary Deployment, Availability Zone based Deployment, Region based Deployment, Post-deployment test.]

Step 1: Use deployment groups as segments

Create deployment groups per segment using:

• Tags

• Auto Scaling groups
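
A hedged sketch of creating one deployment group per segment with boto3's CodeDeploy `create_deployment_group` call, selecting hosts either by an EC2 tag or by an Auto Scaling group. The tag key `Segment`, the `Prod-` naming scheme, and the helper names are assumptions for illustration.

```python
def build_deployment_group(application_name, segment, service_role_arn,
                           auto_scaling_group=None):
    """Build create_deployment_group arguments for one production segment."""
    request = {
        "applicationName": application_name,
        "deploymentGroupName": "Prod-" + segment,  # assumed naming scheme
        "serviceRoleArn": service_role_arn,
    }
    if auto_scaling_group is not None:
        # Segment defined by membership in an Auto Scaling group.
        request["autoScalingGroups"] = [auto_scaling_group]
    else:
        # Segment defined by an EC2 tag (hypothetical key "Segment").
        request["ec2TagFilters"] = [
            {"Key": "Segment", "Value": segment, "Type": "KEY_AND_VALUE"}
        ]
    return request


def create_segments(application_name, segments, service_role_arn):
    """Create one tag-based deployment group per segment; requires AWS credentials."""
    import boto3

    client = boto3.client("codedeploy")
    for segment in segments:
        client.create_deployment_group(
            **build_deployment_group(application_name, segment, service_role_arn)
        )
```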

Step 2: Deploy to each segment

1. Deploy to smallest segment

2. Post-deployment tests

3. Deploy to one Availability Zone

4. Post-deployment tests

5. Deploy to remaining Availability Zones

[Diagram: an Integration stage (DeployToInteg with CodeDeploy, then IntegTest with End2EndTester) followed by a Production stage: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]

Step 3: Test each segment

A deployment is valid if:

• The test has gathered enough data to gain confidence

• CloudWatch metrics

• No service alarms have fired

• CloudWatch alarms

• The test has not timed out

• Code
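
The three conditions can be combined into a single decision. A minimal sketch, assuming the caller has already counted the metric datapoints seen since the deployment and collected the names of any alarms currently in the ALARM state; the thresholds and function name are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_deployment(datapoints, alarms_in_alarm, started_at, now,
                        min_datapoints=15, timeout=timedelta(minutes=30)):
    """Return "approve", "reject", or "wait" for one deployed segment."""
    if alarms_in_alarm:
        return "reject"  # a service alarm fired after the deployment
    if datapoints >= min_datapoints:
        return "approve"  # enough data gathered and no alarms
    if now - started_at >= timeout:
        return "reject"  # never gathered enough data to gain confidence
    return "wait"  # check again on the next scheduled run
```

Returning "wait" leaves the decision open, so a scheduled function can re-evaluate the same deployment on each run until it approves or rejects.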

Add segment tests to your pipeline

Extend CodePipeline with:

• Test Actions

• Lambda Invoke Actions

• Custom Actions

• Approval Actions

Use CodePipeline approvals to trigger tests

[Diagram: Source stage (MyAppSource, CodeCommit) → Deploy stage (DeployToSegment, CodeDeploy; ValidateSegment, Approval). The approval action sends an approval message to an SNS topic, and the result comes back through putApprovalResult. Timeouts shown: 1 hour and 7 days.]

Use SNS to start an automated approval check

[Screenshot labels: DeployToSegment, CodeDeploy.]

Creating a post-deployment test

[Diagram: Source stage (MyAppSource, CodeCommit) → Build stage (MyAppBuild, Build) → Deploy stage (CanaryDeploy, CodeDeploy; ValidateCanary, Approval), carrying Change 1. Also shown: an SNS topic, the Lambda function registerDeployTest(), the Lambda function evaluateDeploy(), DynamoDB, CloudWatch Events (1m), and Prod-us-east-1a (CodeDeploy) with alarm, time, and usage checks.]

Post-deployment test – registerDeployTest

[Diagram: the same post-deployment test pipeline, repeated for the registerDeployTest step.]

registerDeployTest function (Node.js 4.3)
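
The function body shown on this slide was not captured. The following is a minimal Python sketch of what a registration step could look like, not the talk's Node.js code. It assumes the function is subscribed to the approval action's SNS topic and that the notification carries an `approval` object with `pipelineName`, `stageName`, `actionName`, `token`, and optional `customData` fields; verify these names against the current CodePipeline documentation. The `put_item` callable is a hypothetical stand-in for a DynamoDB write.

```python
import json


def parse_approval_notification(event):
    """Extract the pending approval from an SNS-delivered Lambda event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    approval = message["approval"]
    return {
        "pipelineName": approval["pipelineName"],
        "stageName": approval["stageName"],
        "actionName": approval["actionName"],
        "token": approval["token"],  # needed later to answer the approval
        "customData": approval.get("customData", ""),
    }


def register_deploy_test(event, put_item, now):
    """Record a pending post-deployment test so a scheduled function can evaluate it."""
    item = parse_approval_notification(event)
    item["startedAt"] = now  # epoch seconds, used later for the timeout check
    put_item(item)
    return item
```

With boto3, `put_item` could be `lambda item: table.put_item(Item=item)` for a DynamoDB `Table` resource.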

Post-deployment test – evaluateDeployTest

[Diagram: the post-deployment test pipeline again (Source, Build, and Deploy stages with CanaryDeploy and ValidateCanary; SNS topic, registerDeployTest(), evaluateDeploy(), DynamoDB, CloudWatch Events (1m), Prod-us-east-1a).]

approveValidation function (Node.js 4.3)
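
The function body on this slide was not captured either. As a hedged Python sketch, not the talk's Node.js code, the approval can be answered with boto3's CodePipeline `put_approval_result` call. The `pending` dictionary is a hypothetical record holding the pipeline, stage, action, and token of the waiting approval.

```python
def build_approval_result(pending, approved, summary):
    """Build put_approval_result arguments for a pending approval action."""
    return {
        "pipelineName": pending["pipelineName"],
        "stageName": pending["stageName"],
        "actionName": pending["actionName"],
        "token": pending["token"],
        "result": {
            "summary": summary,
            "status": "Approved" if approved else "Rejected",
        },
    }


def approve_validation(pending, approved, summary, client=None):
    """Send the decision to CodePipeline; creates a boto3 client when none is given."""
    if client is None:
        import boto3

        client = boto3.client("codepipeline")
    request = build_approval_result(pending, approved, summary)
    client.put_approval_result(**request)
    return request
```

Rejecting the approval fails the action, which stops the change from moving on to the next segment.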

Canary Deployments – they’re different

All production hosts:

• Participate in serving production traffic

• Are configured as production instances

• Participate in the production metrics stream

Canary hosts:

• Have their own metrics stream

• Canary validations use the canary metrics stream

Summary: Segment production

• Segment production to reduce impact of a bad change

• Minimum segmentation:

• Region

• Canary deployment per region

• Larger service segmentation

• Zonal

• Sub-zonal

• Test each segment before moving on

Release and deploy: Segment production

[Diagram: the Production stage, with Synthetic Traffic running against production. The single DeployToProd (CodeDeploy) action is replaced by CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Techniques

4 of 5 – Halt promotions

Don’t change the system under test

[Diagram: Source stage (MyAppSource, CodeCommit) → Build stage (MyAppBuild, Build) → DeployToProd stage (MyApp, CodeDeploy), which deploys to an EC2 instance. Changes 1, 2, and 3 are moving through the pipeline.]

Don’t compound problems during an outage

Pipeline Problem:

The pipeline is unaware of the health of the infrastructure that it is deploying to.

Consequence:

Production changes, usually deployments, can make it difficult for an operator to resolve a production event.

Build promotion blockers

Automatically stop deploying to production during an event

[Diagram: Source stage (MyAppSource, CodeCommit) → Build stage (MyAppBuild, Build) → DeployToProd stage (MyApp, CodeDeploy), which deploys to an EC2 instance, with Change 1 and Change 2 in the pipeline. Other components: CloudWatch Events (1m), Synthetic Traffic, CloudWatch, Alarm, SNS, and disableTransition(). Edge labels: triggers, checks, emits, disables.]

disableTransition function (Lambda Node.js 4.3)
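
The function body on this slide was not captured. As a hedged Python sketch, not the talk's Node.js code, a Lambda function subscribed to the alarm's SNS topic could block promotions with boto3's CodePipeline `disable_stage_transition` call. The pipeline name is an assumption, the stage name is taken from the diagram, and the helper names are hypothetical.

```python
import json

PIPELINE_NAME = "MyApp"  # assumed pipeline name
STAGE_NAME = "DeployToProd"  # stage whose inbound transition is blocked


def build_disable_request(alarm_message, pipeline_name, stage_name):
    """Return disable_stage_transition arguments when the alarm is firing, else None."""
    if alarm_message.get("NewStateValue") != "ALARM":
        return None
    reason = "Blocked by CloudWatch alarm " + alarm_message.get("AlarmName", "unknown")
    return {
        "pipelineName": pipeline_name,
        "stageName": stage_name,
        "transitionType": "Inbound",  # stop new changes from entering the stage
        "reason": reason[:300],  # the API limits the length of the reason text
    }


def handler(event, context):
    """Lambda entry point for SNS notifications sent by a CloudWatch alarm."""
    import boto3

    message = json.loads(event["Records"][0]["Sns"]["Message"])
    request = build_disable_request(message, PIPELINE_NAME, STAGE_NAME)
    if request is not None:
        boto3.client("codepipeline").disable_stage_transition(**request)
    return request
```

This sketch only disables the transition. Re-enabling it is left as a deliberate operator action once the production event is resolved.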

Enable production deployments - CodePipeline

Summary: Halt promotions

• Halt promotions to production when your production environment has “issues”

• Automate by disabling stage transitions

Release and deploy: Halt promotions

[Diagram: the Production stage, with Synthetic Traffic running against production: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]

1. Continuous production testing

2. Manage deployment health

3. Segment production

4. Halt promotions

5. Gates

Techniques

Do not deploy at sensitive times

Problem:

A bad change during sensitive times has a disproportionate effect on the business.

Consequence:

Issues during sensitive days risk reputation and financial loss.

5 of 5 - Gates

Adding safety with deployment black-days

Deploy to production during normal conditions

• Halt deployments during sensitive times

Building a black-day calendar with CodePipeline:

• Use Approvals to pause production deployments

• Lambda to automatically approve when the time is right
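
A minimal sketch of the time-window check such a Lambda function might run. The calendar entries, the half-open window convention, and the function names are assumptions for illustration, not the talk's implementation.

```python
from datetime import datetime

# Hypothetical calendar: half-open windows [start, end), all times in UTC.
BLACK_DAYS = [
    (datetime(2016, 11, 25), datetime(2016, 11, 29)),
]


def is_black_day(now, windows=BLACK_DAYS):
    """True when now falls inside any blocked window."""
    return any(start <= now < end for start, end in windows)


def gate_decision(now, windows=BLACK_DAYS):
    """Return "wait" to leave the approval pending, or "approve" to release it."""
    return "wait" if is_black_day(now, windows) else "approve"
```

A scheduled function can call `gate_decision` on each run and approve the pending action as soon as the window has passed, so deployments resume without manual work.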

Build black-day gates

Black-day test

[Diagram: Source stage (MyAppSource, CodeCommit) → Build stage (MyAppBuild, Build) → Deploy stage (BlackDayCheck, Approval; ProductionDeploy, CodeDeploy), carrying Change 1. Also shown: an SNS topic, the Lambda function registerDeployment, the Lambda function processTimeWindows, DynamoDB, and CloudWatch Events (1m).]

This looks familiar…

[Diagram: the black-day test pipeline again (Source, Build, and Deploy stages with BlackDayCheck and ProductionDeploy; SNS topic, registerDeployment, processTimeWindows, DynamoDB, CloudWatch Events (1m)).]

This looks familiar – post-deployment test

[Diagram: the post-deployment test pipeline from the segment production section (Source, Build, and Deploy stages with CanaryDeploy and ValidateCanary; SNS topic, registerDeployTest(), evaluateDeploy(), DynamoDB, CloudWatch Events (1m), Prod-us-east-1a with alarm, time, and usage checks).]

What’s the difference?

[Diagram: the black-day test pipeline (Source, Build, and Deploy stages with BlackDayCheck and ProductionDeploy; SNS topic, registerDeployment, processTimeWindows, DynamoDB, CloudWatch Events (1m)), shown for comparison with the post-deployment test.]

Summary: Gates

• Black-days provide centralized control

• Add common action to all pipelines

• Black-days are a type of gate

• Implement with Approval actions in CodePipeline

Release and deploy: Gates

[Diagram: the Production stage, with Synthetic Traffic running against production. A CheckBlackDays (Approval) action is added to the existing actions: CanaryDeploy (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-1 (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]

What we’ve learned

Goal: Make your pipeline safer…

1. Identify production issues quickly

• Continuous Production Testing

2. Safely deploy changes

• Manage deployment health

• Segment production

3. Automatically decide when to release changes

• Halt promotions

• Black-days and Gates

Release and deploy process: Ending point

[Diagram: the Production stage at the end of the talk, with Synthetic Traffic running against production. The single DeployToProd (CodeDeploy) action has become CheckBlackDays (Approval), CanaryDeploy (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-1 (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]

Thank you!

Remember to complete your evaluations!

Code is available online

• github.com/awslabs/aws-codepipeline-time-windows

• github.com/awslabs/aws-codepipeline-synthetic-tests

• github.com/awslabs/aws-codepipeline-block-production

Related Sessions

• DEV303 – Deploying and Managing .NET Pipelines and Microsoft Workloads

• DEV310 – DevOps on AWS: Choosing the Right Software Deployment Technique

• DEV313 – Infrastructure Continuous Deployment Using AWS CloudFormation

• SVR307 – Application Lifecycle Management in a Serverless World

Author:

Slides written and prepared by Mark Mansour, Senior Manager, Continuous Delivery, AWS.

This presentation, “DevOps on AWS: Advanced Continuous Delivery Techniques”, was originally given at re:Invent 2016 on Nov 30.