Scaling to 150,000 Builds a Month... and Beyond

Posted on 08-Jan-2017


PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV

Build Engineering @ Atlassian: Scaling to 150k builds per month & beyond

Summit 2015

Agenda: Introduction • Team • Infrastructure • Bamboo Servers • Conclusion

Introduction

Build platform & services used internally within Atlassian to build, test & deliver software

Developers expect a reliable infrastructure & fast CI feedback

Build Engineering today @ Atlassian

• 12 Bamboo Servers
• maven.atlassian.com / 9 Nexus instances / 9 TB
• 7 Nexus proxies for internal traffic
• Monitoring
  • opsview, graphite, statsd, newrelic, datadog
• 1200 build agents on EC2
  • Including SCM clients, JDKs, JVM build tools, databases, headless browser testing, Python builds, NodeJS, installers & more
• Maintain 20 AMIs of various build configurations

Builds per month
• 4 years ago: 21k
• Last month: 225k
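The jump from 21k to 225k builds a month is easier to compare with headcount growth when expressed as a rate. A small Python sketch, assuming the two figures are exactly four years apart (the slides don't give exact dates):

```python
def growth_factor(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


def compound_annual_growth(before: float, after: float, years: float) -> float:
    """Constant yearly rate r such that before * (1 + r) ** years == after."""
    return growth_factor(before, after) ** (1 / years) - 1


if __name__ == "__main__":
    factor = growth_factor(21_000, 225_000)
    rate = compound_annual_growth(21_000, 225_000, years=4)
    print(f"{factor:.1f}x overall, about {rate:.0%} per year")
```

Run as a script, it prints roughly 10.7x overall and about 81% per year, compounded.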

Build Engineering @ Atlassian

JIRA alone has 49k automated tests

3 stories of gaining the maturity to handle Atlassian's growth


Team


History of team roles

Individual Engineers
• Information silos
• Fault investigation, requests for advice, unplanned work
• Little project work
• Very interrupt driven
• Duplication of effort
• Limited to customer-driven changes

Disturbed role
• 2 week rotation
• Knowledge transfer
• More project work: non-disturbed engineers can focus on larger tasks
• Reduction in duplication of effort; promotes collaboration within the team
• Context switching: moving between project and disturbed roles is difficult

Team expands
• Build Engineers, Infra Engineers & Developers

Disturbed for Dev & Infra
• Too interrupt driven
• Couldn’t handle smaller customer-raised requests & interrupt-driven work

Disturbed pairing
• To encourage knowledge transfer between infra & dev
• Staggered changeovers, minimising disruption due to context switching
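The slides describe a 2 week rotation, disturbed pairing between infra and dev, and staggered changeovers, without spelling out the exact scheme. One way to get all three is sketched below in Python, assuming equally sized infra and dev groups (the names are hypothetical): each engineer starts in a given week and stays for two weeks, so exactly one person changes over each week and the newcomer always overlaps with someone who already has a week of context.

```python
from itertools import chain
from typing import List, Sequence, Tuple


def interleave(infra: Sequence[str], dev: Sequence[str]) -> List[str]:
    """Alternate infra and dev engineers so neighbours in the rotation differ."""
    if len(infra) != len(dev):
        raise ValueError("this sketch assumes equally sized infra and dev groups")
    return list(chain.from_iterable(zip(infra, dev)))


def disturbed_pair(rotation: Sequence[str], week: int) -> Tuple[str, str]:
    """Return (engineer in their second week, engineer starting this week).

    Engineer i starts in week i and stays for two weeks, so exactly one
    person changes over each week and consecutive pairs share one person.
    With an interleaved rotation of even length, every pair is one infra
    engineer plus one dev.
    """
    n = len(rotation)
    if n < 2:
        raise ValueError("need at least two engineers")
    return rotation[(week - 1) % n], rotation[week % n]


if __name__ == "__main__":
    rotation = interleave(["ana", "ben"], ["cat", "dan"])  # hypothetical names
    for week in range(4):
        print(week, disturbed_pair(rotation, week))
```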

Supporting Developers
• Team channel
• Questions for Confluence

1. Measure the pain
2. Continuous Improvement

Technical Debt

Contact Rate

Contact Rate = (Confluence Questions + HipChat queries + Customer JIRA issues) ÷ Number of Developers
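The contact rate normalises support load by the size of the developer population, so headcount growth alone doesn't register as growing pain. A minimal Python sketch; the counting period (a month) and all the figures are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class SupportContacts:
    """Support requests counted over one period (e.g. a month)."""
    confluence_questions: int
    hipchat_queries: int
    customer_jira_issues: int

    @property
    def total(self) -> int:
        return self.confluence_questions + self.hipchat_queries + self.customer_jira_issues


def contact_rate(contacts: SupportContacts, developers: int) -> float:
    """Support contacts per developer for the period."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return contacts.total / developers


if __name__ == "__main__":
    # Hypothetical figures, not Atlassian's real numbers.
    month = SupportContacts(confluence_questions=40, hipchat_queries=120, customer_jira_issues=60)
    print(f"{contact_rate(month, developers=800):.3f} contacts per developer")
```

If headcount and support requests both double, the rate stays flat. A rising rate means each developer needs more help than before, which is the pain worth measuring.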

The Shield
• Rebranding the Maintenance / Disturbed role as The Shield
• Removing the negative attitude towards the old role within the team

[Diagram: The Shield, between Project work and Maintenance]

“How do we avoid this in the future?” - Peter Leschev

Fix it now, fix it for the future

Self Service
• Chat bots
• Maven Self Help Tool


Infrastructure


Infrastructure as Code = Puppet + SCM?

4 years ago…
• Started using Puppet
• Manually maintained snowflakes

Production rollout (puppetmaster → build agents)

Production rollout failure (puppetmaster → build agents)

Low confidence in change
• Coding on the Puppet Master
• Culture of manually modifying production: configuration drift
• Impact on builds

Using Staging for Development (staging puppet environment on the puppetmaster → build agents)

Vagrant: www.vagrantup.com (Mitchell Hashimoto, @mitchellh)

Packer: packer.io

Rolling out to staging

Rolling out to production

Broken build agents

Developing locally

Behaviour Driven Development

Cucumber

https://github.com/cucumber/aruba

“But it works on my machine” - Every developer

Continuous Integration: ‘from scratch’ provisioning
• Confidence that you can rebuild in a disaster

“The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it. The Cattle: you give them numbers. When they get ill, you shoot them.” - Tim Bell, CERN

Broken builds on master

Branch builds: branches such as BUILDENG-5670 and BUILDENG-5669 are built separately from master

Infrequent Releases

Manual Puppet Rollouts
• git clone
• librarian-puppet install
• symlink update on puppet master
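The final manual step, the symlink update, is what makes a rollout take effect, and it can be made atomic so the Puppet master never reads a half-switched tree. A POSIX-only Python sketch, assuming a hypothetical layout in which each release is cloned (and librarian-puppet installed) into releases/<version> and a `current` link is the path the Puppet master reads:

```python
import os
import tempfile
from pathlib import Path
from typing import List


def switch_release(link: Path, target: Path) -> None:
    """Atomically repoint the `link` symlink at `target` (POSIX only).

    A new symlink is created under a temporary name, then renamed over the
    old one; rename(2) replaces the link in a single step, so readers see
    either the old release or the new one, never a missing path.
    """
    tmp = link.with_name(f".{link.name}.tmp-{os.getpid()}")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    os.replace(tmp, link)


def demo() -> List[str]:
    """Switch a hypothetical `current` link between two release directories."""
    with tempfile.TemporaryDirectory() as tmp_root:
        root = Path(tmp_root)
        link = root / "current"
        seen = []
        for version in ("r1", "r2"):
            release = root / "releases" / version
            release.mkdir(parents=True)
            switch_release(link, release)
            seen.append(Path(os.readlink(link)).name)
        return seen


if __name__ == "__main__":
    print(demo())  # ['r1', 'r2']
```

Keeping old release directories around also makes rollback a single switch back to the previous version.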

Bamboo Deployments

[Diagram: How environments work. A deployment project has one environment per target (e.g. Development, Production); each environment has its own task list (Task 1, Task 2), its available agents and a destination server, and a release such as 1.3 is deployed by running that environment's tasks.]

[Diagram: How artifacts work. A commit referencing a JIRA issue triggers the build plan, which builds and tests the code from the repository and produces build results (artifacts) at versions n, n+1, n+2. A release, with release notes, is created from a build result and promoted through the release environments: 1.3 goes to Development while Production still runs 1.0, then 1.3 is promoted to Production.]

For Puppet, the deployment project's environments are staging and production.
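The deployment model above reduces to a few rules: a release is a fixed version built from artifacts, each environment records which release it runs, and a release moves forward one environment at a time. A toy Python model using the staging and production environments from the slides; the rule that a release must reach the previous environment before the next one is an assumption for illustration, not necessarily something Bamboo enforces:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DeploymentProject:
    """Toy model of releases promoted through ordered environments."""
    environments: List[str]                                      # e.g. ["staging", "production"]
    current: Dict[str, str] = field(default_factory=dict)        # env -> release now deployed
    history: Dict[str, Set[str]] = field(default_factory=dict)   # env -> releases ever deployed

    def can_deploy(self, release: str, env: str) -> bool:
        """A release may enter an environment only after the one before it."""
        if env not in self.environments:
            raise ValueError(f"unknown environment: {env}")
        position = self.environments.index(env)
        if position == 0:
            return True
        previous = self.environments[position - 1]
        return release in self.history.get(previous, set())

    def deploy(self, release: str, env: str) -> None:
        if not self.can_deploy(release, env):
            raise RuntimeError(f"release {release} has not been deployed to the environment before {env}")
        self.current[env] = release
        self.history.setdefault(env, set()).add(release)


if __name__ == "__main__":
    puppet = DeploymentProject(["staging", "production"])
    puppet.deploy("1.0", "staging")
    puppet.deploy("1.0", "production")
    puppet.deploy("1.3", "staging")
    print(puppet.current)                          # {'staging': '1.3', 'production': '1.0'}
    print(puppet.can_deploy("1.4", "production"))  # False
```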


build
• git clone
• librarian-puppet
• deploy to specific environments: scp to puppet master & symlink update

test deploy
• ‘delta’ & ‘from scratch’ Vagrant provisions
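Running both a ‘delta’ and a ‘from scratch’ provision catches configuration drift, meaning a change that only works on a machine already carrying state from an earlier release. A toy model in Python (the resources and versions are hypothetical). As with Puppet, deleting a resource from a manifest leaves it installed on existing machines unless it is explicitly ensured absent, so a delta provision and a fresh one can disagree:

```python
from typing import Dict, List, Optional

State = Dict[str, str]               # resource -> value actually present on an agent
Manifest = Dict[str, Optional[str]]  # resource -> desired value; None means "ensure absent"


def apply(state: State, manifest: Manifest) -> State:
    """Converge `state` towards `manifest`; resources not mentioned are left alone."""
    result = dict(state)
    for resource, desired in manifest.items():
        if desired is None:
            result.pop(resource, None)
        else:
            result[resource] = desired
    return result


def drifted_resources(old: Manifest, new: Manifest) -> List[str]:
    """Resources that differ between a 'delta' and a 'from scratch' provision."""
    delta = apply(apply({}, old), new)  # upgrade an agent built from the old manifest
    scratch = apply({}, new)            # build a fresh agent from the new manifest
    return sorted(k for k in set(delta) | set(scratch) if delta.get(k) != scratch.get(k))


if __name__ == "__main__":
    old = {"jdk": "8", "phantomjs": "1.9"}
    print(drifted_resources(old, {"jdk": "8"}))                     # ['phantomjs']
    print(drifted_resources(old, {"jdk": "8", "phantomjs": None}))  # []
```

In the first case upgraded agents keep phantomjs while fresh agents lack it; only testing both paths exposes the difference.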


build & test AMIs
• Generated using Packer

deploy AMIs
• AMIs on Bamboo Servers updated

Puppet Build, Test & Deploy Pipeline

‘open prs’ Bot

Less human effort through automation = increased frequency & reliability of releases

Snowflakes → Pets → Cattle

Stateless Machines

Infrastructure consistency is key


Bamboo Servers


Operating at scale is hard

• Bamboo Servers: 12
• Build Plans: 3500
• Plan Branches: 14k

Bamboo is great, but hard to manage at scale

Build Configuration as code

Bamboo Plugin: Plan Templates
• Checked into SCM, so changes can be code reviewed
• Reusable snippets
• Export plans for backup, or move to another Bamboo instance easily

Bulk changes
• Export existing plans
• Update 100s of job requirements with a single commit
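With plans exported as code, a bulk change becomes a scripted edit followed by one reviewed commit. The Python sketch below uses a hypothetical, simplified plan structure (plain dicts with jobs and requirements) rather than the actual Plan Templates format, and made-up plan keys and requirement names:

```python
from typing import Any, Dict, List

Plan = Dict[str, Any]  # hypothetical shape: {"key": ..., "jobs": [{"requirements": {...}}]}


def bulk_update_requirement(plans: List[Plan], key: str, old: str, new: str) -> int:
    """Rewrite requirement `key` from `old` to `new` in every job; return jobs changed."""
    changed = 0
    for plan in plans:
        for job in plan.get("jobs", []):
            requirements = job.get("requirements", {})
            if requirements.get(key) == old:
                requirements[key] = new
                changed += 1
    return changed


if __name__ == "__main__":
    plans = [
        {"key": "JIRA-MASTER", "jobs": [{"requirements": {"agent.ami": "build-ami-v1"}},
                                        {"requirements": {"agent.ami": "build-ami-v2"}}]},
        {"key": "CONF-MASTER", "jobs": [{"requirements": {"agent.ami": "build-ami-v1", "jdk": "1.8"}}]},
    ]
    print(bulk_update_requirement(plans, "agent.ami", "build-ami-v1", "build-ami-v3"))  # 2
```

The edited files are then committed once, so the whole change can be reviewed and reverted as a unit.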

Pushing Bamboo to its limits

Bamboo Plugin: Agent Smith Wallboard
• Trend data sent to Graphite
• https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.agent-smith-wallboard
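Sending trend data to Graphite usually means Carbon's plaintext protocol: one "<metric path> <value> <unix timestamp>" line per datapoint over TCP, port 2003 by default. This Python sketch shows that wire format only; it is not the Agent Smith plugin's implementation, and the metric name and host are hypothetical:

```python
import socket
import time
from typing import Iterable, Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One line of Graphite's plaintext protocol: '<path> <value> <timestamp>' plus newline."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send pre-formatted lines to a Carbon plaintext listener (2003 is Carbon's default)."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = format_metric("bamboo.build.queue.length", 42, timestamp=1420070400)
    print(repr(line))  # 'bamboo.build.queue.length 42 1420070400\n'
    # send_metrics([line], host="graphite.example.com")  # hypothetical host
```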

Add metrics, then alert on them

Bamboo Plugin: Bamboo Monitoring Plugin
• Metrics to Graphite
• Bamboo health: ActiveMQ, database connections, Tomcat, JVM memory usage
• Background thread workers; number of plans / plan branches, and of plans / plan branches awaiting deletion
• https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.bamboo-monitoring-plugin
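Once the metrics are in Graphite, alerting can be as simple as checking the latest datapoints of a series. Graphite's render API with format=json returns each series as {"target": ..., "datapoints": [[value, timestamp], ...]}, with null for missing values. A minimal Python sketch over one series' datapoints; the threshold and the three-consecutive-points rule are assumptions chosen to avoid alerting on a single spike:

```python
from typing import Optional, Sequence, Tuple

Datapoint = Tuple[Optional[float], int]  # (value, timestamp); value is None for gaps


def should_alert(datapoints: Sequence[Datapoint], threshold: float, consecutive: int = 3) -> bool:
    """True if the last `consecutive` non-null values are all above `threshold`."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    values = [value for value, _timestamp in datapoints if value is not None]
    recent = values[-consecutive:]
    return len(recent) == consecutive and all(value > threshold for value in recent)


if __name__ == "__main__":
    # Hypothetical build-queue lengths, one datapoint per minute.
    queue = [(12, 1420070400), (55, 1420070460), (None, 1420070520), (61, 1420070580), (70, 1420070640)]
    print(should_alert(queue, threshold=50))  # True
```

Lists of [value, timestamp] pairs, as they come out of the JSON, work just as well as tuples.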

When a Bamboo Server starts misbehaving…

Infrastructure differences? Is it Bamboo Configuration?

Is it a Bamboo Plugin? Is it Bamboo the product?

How is it being used?

Infrastructure consistency of Bamboo Servers is key

Bamboo Puppet provider + REST API for administration

Bamboo Puppet provider → REST calls
• https://forge.puppetlabs.com/atlassian/bamboo_rest

Example: HipChat notifications managed via Puppet
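What a Puppet provider adds over ad-hoc REST calls is idempotence: read the current state, compare it with the desired state, and write only the difference. A Python sketch of that pattern with the HTTP calls abstracted as callables; the setting names are hypothetical and this is not the atlassian/bamboo_rest module's actual code or resource types:

```python
from typing import Any, Callable, Dict

Settings = Dict[str, Any]


def ensure_settings(
    desired: Settings,
    fetch: Callable[[], Settings],
    update: Callable[[Settings], None],
) -> Settings:
    """Converge remote settings to `desired`, writing only what differs.

    Returns the changes that were applied (empty when already in sync).
    """
    current = fetch()
    changes = {key: value for key, value in desired.items() if current.get(key) != value}
    if changes:
        update(changes)
    return changes


if __name__ == "__main__":
    server = {"hipchat.room": "build-eng", "hipchat.notify": False}  # stand-in for Bamboo
    desired = {"hipchat.room": "build-eng", "hipchat.notify": True}
    print(ensure_settings(desired, lambda: dict(server), server.update))  # {'hipchat.notify': True}
    print(ensure_settings(desired, lambda: dict(server), server.update))  # {}
```

In a real provider, `fetch` would wrap a GET and `update` a PUT or POST against Bamboo's administration REST API. Because a second run changes nothing, Puppet can safely apply it on every agent run, which keeps all Bamboo Servers consistent.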

1-click upgrades of Bamboo Plugins
• ‘Continuous Plugin Deployment’ Task

[Diagram: a build plan builds the plugin; a deployment project deploys each release to all Bamboo Servers]

https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugins.deploy.continuous-plugin-deployment

1-click upgrades of Bamboo Servers
• Using scp / ssh & puppet

[Diagram: a ‘Build Bamboo’ plan and an ‘Upgrade Bamboo’ deployment project whose environments are the individual Bamboo Servers, e.g. jira-bamboo and servicedesk-bamboo]

Infrastructure differences? Is it Bamboo Configuration?

Is it a Bamboo Plugin? Is it Bamboo the product?

How is it being used?


Conclusion


Constant improvement

We’ve matured to handle the growth of Atlassian

Thank you!
