Scaling to 150,000 Builds a Month... and Beyond

PETER LESCHEV TEAM LEAD ATLASSIAN @PETERLESCHEV Build Engineering @ Atlassian: Scaling to 150k builds per month & beyond Summit 2015


Page 1: Scaling to 150,000 Builds a Month... and Beyond

PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV

Build Engineering @ Atlassian: Scaling to 150k builds per month & beyond

Summit 2015

Page 2: Scaling to 150,000 Builds a Month... and Beyond

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Current section: Introduction

Page 3: Scaling to 150,000 Builds a Month... and Beyond

Build platform & services used internally within Atlassian to build, test & deliver software

Page 4: Scaling to 150,000 Builds a Month... and Beyond

Developers expect a reliable infrastructure & fast CI feedback

Page 5: Scaling to 150,000 Builds a Month... and Beyond

Build Engineering today @ Atlassian

• 12 Bamboo Servers

• 1200 build agents on EC2: include SCM clients, JDKs, JVM build tools, databases, headless browser testing, Python builds, NodeJS, installers & more

• Maintain 20 AMIs of various build configurations

• maven.atlassian.com / 9 Nexus instances / 9 TB

• 7 Nexus proxies for internal traffic

• Monitoring: opsview, graphite, statsd, newrelic, datadog

Page 6: Scaling to 150,000 Builds a Month... and Beyond

4 years ago: 21k builds per month

Page 7: Scaling to 150,000 Builds a Month... and Beyond

Last month: 225k builds per month

Page 8: Scaling to 150,000 Builds a Month... and Beyond

Build Engineering @ Atlassian

Page 9: Scaling to 150,000 Builds a Month... and Beyond
Page 10: Scaling to 150,000 Builds a Month... and Beyond

JIRA alone has 49k automated tests

Page 11: Scaling to 150,000 Builds a Month... and Beyond
Page 12: Scaling to 150,000 Builds a Month... and Beyond

3 stories of gaining maturity to handle Atlassian growth

Page 13: Scaling to 150,000 Builds a Month... and Beyond

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Current section: Team

Page 14: Scaling to 150,000 Builds a Month... and Beyond

History of team roles

Page 15: Scaling to 150,000 Builds a Month... and Beyond

Individual Engineers

Information Silos

Fault investigation, requests for advice, unplanned work

Little project work

Very interrupt driven

Duplication of effort

Limited to customer driven changes

Page 16: Scaling to 150,000 Builds a Month... and Beyond

Disturbed role

2 week rotation

More project work: Non-disturbed can focus on larger tasks

Knowledge transfer: Reduction in duplication of effort, promotes collaboration within the team

Context switching: Switching between project / disturbed roles is difficult

Page 17: Scaling to 150,000 Builds a Month... and Beyond

Team expands

Infra Engineers

Developers

Build Engineers

Page 18: Scaling to 150,000 Builds a Month... and Beyond

Disturbed for Dev & Infra

Too interrupt driven: Couldn’t handle smaller customer raised requests & interrupt driven work

Disturbed pairing: To encourage knowledge transfer between infra & dev

Staggered changeovers: Minimising disruption due to context switching

Page 19: Scaling to 150,000 Builds a Month... and Beyond

Supporting Developers

team channel

Page 20: Scaling to 150,000 Builds a Month... and Beyond

Supporting Developers

Questions for Confluence

Page 21: Scaling to 150,000 Builds a Month... and Beyond

Supporting Developers

Questions for Confluence

Page 22: Scaling to 150,000 Builds a Month... and Beyond

1. Measure the pain

2. Continuous Improvement

Page 23: Scaling to 150,000 Builds a Month... and Beyond

Technical Debt

Page 24: Scaling to 150,000 Builds a Month... and Beyond

Technical Debt

Page 25: Scaling to 150,000 Builds a Month... and Beyond

Contact Rate = (Customer JIRA issues + Confluence Questions + HipChat queries) ÷ Number of Developers
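The contact rate formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counts below are hypothetical and do not come from the talk.

```python
def contact_rate(jira_issues: int, confluence_questions: int,
                 hipchat_queries: int, developers: int) -> float:
    """Support contacts per developer for one reporting period."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    contacts = jira_issues + confluence_questions + hipchat_queries
    return contacts / developers


# Hypothetical monthly numbers, for illustration only.
rate = contact_rate(jira_issues=120, confluence_questions=45,
                    hipchat_queries=235, developers=800)
print(f"{rate:.2f}")  # prints 0.50
```

Tracking this value per period gives a trend that is normalised for headcount, so growth in the number of developers does not by itself look like growing pain.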

Page 26: Scaling to 150,000 Builds a Month... and Beyond

Contact Rate

Page 27: Scaling to 150,000 Builds a Month... and Beyond

The Shield

http://www.clker.com/cliparts/e/d/c/4/11970889822084687040sinoptik_Medieval_shield.svg.hi.png

Page 28: Scaling to 150,000 Builds a Month... and Beyond

Rebranding: Disturbed → Maintenance

Removing the negative attitude towards the old role within the team

Page 29: Scaling to 150,000 Builds a Month... and Beyond

Project work

Maintenance

The Shield

Page 30: Scaling to 150,000 Builds a Month... and Beyond

“How do we avoid this in the future?”

PETER LESCHEV

Page 31: Scaling to 150,000 Builds a Month... and Beyond

Fix it now, fix it for the future

Page 32: Scaling to 150,000 Builds a Month... and Beyond

Self service

Page 33: Scaling to 150,000 Builds a Month... and Beyond

Chat bots

Self Service

Page 34: Scaling to 150,000 Builds a Month... and Beyond

Self Service

Maven Self Help Tool

Page 35: Scaling to 150,000 Builds a Month... and Beyond

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Current section: Infrastructure

Page 36: Scaling to 150,000 Builds a Month... and Beyond

Infrastructure as Code

= Puppet + SCM ?

Page 37: Scaling to 150,000 Builds a Month... and Beyond

4 years ago…

Started using Puppet

Manually maintained snowflakes

Page 38: Scaling to 150,000 Builds a Month... and Beyond

Production rollout

puppetmaster

build agents

Page 39: Scaling to 150,000 Builds a Month... and Beyond

Production rollout failure

puppetmaster

build agents

Page 40: Scaling to 150,000 Builds a Month... and Beyond

Low confidence of change

Page 41: Scaling to 150,000 Builds a Month... and Beyond

Using Staging for Development

• Coding on Puppet Master

• Culture of manually modifying production - Configuration Drift

• Impact on Builds

Diagram labels: puppetmaster, staging puppet environment, build agents

Page 42: Scaling to 150,000 Builds a Month... and Beyond

Vagrant

www.vagrantup.com

Mitchell Hashimoto

@mitchellh

Packer

packer.io

Page 43: Scaling to 150,000 Builds a Month... and Beyond

Rolling out to staging

Rolling out to production

Broken build agents

Developing locally

Page 44: Scaling to 150,000 Builds a Month... and Beyond

Behaviour Driven Development

Cucumber

https://github.com/cucumber/aruba

Page 45: Scaling to 150,000 Builds a Month... and Beyond

“But it works on my machine”

EVERY DEVELOPER

Page 46: Scaling to 150,000 Builds a Month... and Beyond

Continuous Integration

‘From scratch’ provisioning

Confidence that you can rebuild in disaster

Page 47: Scaling to 150,000 Builds a Month... and Beyond

“The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it.

The Cattle: you give them numbers. When they get ill, you shoot them.”

TIM BELL, CERN

Page 48: Scaling to 150,000 Builds a Month... and Beyond

Broken builds

master

Page 49: Scaling to 150,000 Builds a Month... and Beyond

Branch builds

BUILDENG-5670

BUILDENG-5669

master

Page 50: Scaling to 150,000 Builds a Month... and Beyond

Infrequent Releases

Page 51: Scaling to 150,000 Builds a Month... and Beyond

Manual Puppet Rollouts

git clone

librarian-puppet install

symlink update on puppet master
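The last manual step, the symlink update, is the one that actually switches which release is live. A minimal Python sketch of such a switch follows, assuming a POSIX filesystem and a layout where an environment name is a symlink pointing at a release directory. The function name and paths are hypothetical and are not taken from the talk.

```python
import os
import tempfile


def switch_release(link_path: str, release_dir: str) -> None:
    """Point link_path at release_dir by renaming a temporary symlink.

    os.replace() maps to rename(2) on POSIX, so readers of link_path
    see either the old target or the new one, never a missing link.
    """
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up a leftover from an earlier failed run
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, link_path)


# Demo in a throwaway directory (hypothetical release names).
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "release-1")
    new = os.path.join(root, "release-2")
    os.mkdir(old)
    os.mkdir(new)
    live = os.path.join(root, "production")
    switch_release(live, old)
    switch_release(live, new)
    print(os.path.basename(os.readlink(live)))  # prints release-2
```

Keeping each release in its own directory also makes a rollback the same operation: point the link back at the previous directory.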

Page 52: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Deployments

[Diagram: a Bamboo build plan produces artifacts; a deployment project releases them to environments (staging, production)]

• build: git clone, librarian-puppet

• test: ‘delta’ & ‘from scratch’ vagrant provisions

• deploy: to specific environments; scp to puppet master & symlink update

• build & test AMIs: generated using Packer

• deploy AMIs: AMIs on Bamboo Servers updated

Page 53: Scaling to 150,000 Builds a Month... and Beyond

Puppet Build, Test & Deploy Pipeline

Page 54: Scaling to 150,000 Builds a Month... and Beyond

Puppet Build, Test & Deploy Pipeline

Page 55: Scaling to 150,000 Builds a Month... and Beyond

‘open prs’ Bot

Page 56: Scaling to 150,000 Builds a Month... and Beyond

Less human effort through automation = Increased frequency & reliability of releases

Page 57: Scaling to 150,000 Builds a Month... and Beyond

Snowflakes

Pets

Cattle

Stateless Machines

Page 58: Scaling to 150,000 Builds a Month... and Beyond

Infrastructure consistency is key

Page 59: Scaling to 150,000 Builds a Month... and Beyond

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Current section: Bamboo Servers

Page 60: Scaling to 150,000 Builds a Month... and Beyond

At scale is hard

Page 61: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Servers

12

Page 62: Scaling to 150,000 Builds a Month... and Beyond

Build Plans

3500

Page 63: Scaling to 150,000 Builds a Month... and Beyond

Plan Branches

14k

Page 64: Scaling to 150,000 Builds a Month... and Beyond

Bamboo is great, but hard to manage at scale

Page 65: Scaling to 150,000 Builds a Month... and Beyond

Build Configuration as code

Page 66: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Plugin: Plan Templates

Page 67: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Plugin: Plan Templates

Checked into SCM: changes can be code reviewed

Reusable snippets

Bulk changes: Update 100s of job requirements with a single commit

Export existing plans: Export plans for backup, or move to another Bamboo instance easily

Page 68: Scaling to 150,000 Builds a Month... and Beyond

Pushing Bamboo to its limits

Page 69: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Plugin: Agent Smith Wallboard

Trend data sent to Graphite

https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.agent-smith-wallboard

Page 70: Scaling to 150,000 Builds a Month... and Beyond

Add metrics, then alert on them

Page 71: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Plugin: Bamboo Monitoring Plugin

Metrics to graphite

Bamboo Health: ActiveMQ, Database connections, Tomcat, JVM Memory usage. Background thread workers. Number of plans / plan branches, plans / plan branches for deletion.

https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.bamboo-monitoring-plugin

Page 72: Scaling to 150,000 Builds a Month... and Beyond
Page 73: Scaling to 150,000 Builds a Month... and Beyond

When a Bamboo Server starts misbehaving…

Page 74: Scaling to 150,000 Builds a Month... and Beyond

Infrastructure differences? Is it Bamboo Configuration?

Is it a Bamboo Plugin? Is it Bamboo the product?

How is it being used?

Page 75: Scaling to 150,000 Builds a Month... and Beyond

Infrastructure consistency of Bamboo Servers is key

Page 76: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Puppet provider + REST API for Administration

Diagram labels: Bamboo Puppet Provider, REST calls

https://forge.puppetlabs.com/atlassian/bamboo_rest

Page 77: Scaling to 150,000 Builds a Month... and Beyond

Bamboo Puppet provider

https://forge.puppetlabs.com/atlassian/bamboo_rest

Hipchat Notification

Managed via Puppet

Page 78: Scaling to 150,000 Builds a Month... and Beyond

1-click upgrades of Bamboo Plugins

‘Continuous Plugin Deployment’ Task

[Diagram: Bamboo deployment project; labels: build, deploy, All Bamboo Servers]

https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugins.deploy.continuous-plugin-deployment

Page 79: Scaling to 150,000 Builds a Month... and Beyond

1-click upgrades of Bamboo Servers

Using scp / ssh & puppet

[Diagram: Bamboo deployment project; labels: Build Bamboo, Upgrade Bamboo; environments: jira-bamboo, servicedesk-bamboo]

Page 80: Scaling to 150,000 Builds a Month... and Beyond

Infrastructure differences? Is it Bamboo Configuration?

Is it a Bamboo Plugin? Is it Bamboo the product?

How is it being used?

Page 81: Scaling to 150,000 Builds a Month... and Beyond

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Current section: Conclusion

Page 82: Scaling to 150,000 Builds a Month... and Beyond

Constant improvement

Page 83: Scaling to 150,000 Builds a Month... and Beyond

We’ve matured to handle the growth of Atlassian

Page 84: Scaling to 150,000 Builds a Month... and Beyond

Thank you!

PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV