From Vagrant to production
Mark Eijsermans
HootSuite - Software Engineer
@markeijsermans http://code.hootsuite.com
HootSuite
• Social media dashboard
• 8M users
• 40 people committing code
• 4 ops
• AWS (2/3) & private cloud (1/3)
• 100M requests/day (hootsuite.com)
• 70M requests/day (ow.ly)
• Monolithic web app (PHP)
• Transitioning towards service oriented architecture (Scala)
In the beginning
Diagram: dev 1 and dev 2 (each over smb) → dev server (LAMP); svn; production (LAMP)
• Release anytime
• Small team
• Team is intimately aware of prod
• Low overhead to release
DevOps?
…a while later
Diagram: many devs working on a shared dev server over smb, with svn; ops looking after web, gearman, 0mq worker and cron servers
• 1 release every 2-4 weeks
• Medium-sized team
• Branching, pre-release code freezes
• Only some devs knowledgeable about prod
• Ops mostly handles deploys
• Complicated process
Devs vs. Ops?
…and a few years more (now)
Diagram: devs and ops using Vagrant, with automated build → test → QA; a monolithic web app alongside Service A, Service B, Service C and Service D
• 6-10 releases per day
• Larger team
• No branching
• Anyone deploys
• Automated process
• Commit to production in 15 min
• Much higher % of devs understand prod
Perfection is the enemy of good
• You’re never going to get a perfect system
• How far along are we?
• How many things are actually:
  • Automated
  • Reproducible
  • In the state we want them to be
• When can we start feeling good about it?
The move to artifacts
• PHP: svn, svn mirror, phar, tar + zsync
• Scala: debian packages
Diagram: Dev → Build → Deploy
• Build: Jenkins takes the project code, runs unit tests and creates an artifact, stored in the Artifact Repo (build #123, build #124)
• Deploy (dash4deploy): staging, smoke test, production, production service A/B/C
Broke the build? “I’m on it”
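The build-once, promote-the-same-artifact flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, assuming a pass/fail smoke test on staging gates production; `ArtifactRepo`, `promote` and the artifact names are hypothetical stand-ins, not Jenkins or dash4deploy APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ArtifactRepo:
    """Hypothetical artifact store: build number -> artifact file name."""
    builds: Dict[int, str] = field(default_factory=dict)

    def publish(self, build_number: int, artifact: str) -> None:
        # Artifacts are immutable: a published build number is never overwritten.
        if build_number in self.builds:
            raise ValueError(f"build #{build_number} already published")
        self.builds[build_number] = artifact

    def get(self, build_number: int) -> str:
        return self.builds[build_number]


def promote(repo: ArtifactRepo, build_number: int,
            smoke_test: Callable[[str, str], bool], log: List[str]) -> bool:
    """Deploy one immutable artifact to staging, smoke test it, then production.

    Returns True only if the build reached production.
    """
    artifact = repo.get(build_number)
    log.append(f"deploy {artifact} -> staging")
    if not smoke_test(artifact, "staging"):
        log.append(f"smoke test failed for build #{build_number}; production untouched")
        return False
    log.append(f"deploy {artifact} -> production")
    return True


if __name__ == "__main__":
    repo = ArtifactRepo()
    repo.publish(123, "dashboard-123.tar")
    repo.publish(124, "dashboard-124.tar")
    steps: List[str] = []
    promote(repo, 124, lambda artifact, env: True, steps)
    print("\n".join(steps))
```

The design point is that the bits that passed the smoke test on staging are exactly the bits that reach production; nothing is rebuilt between environments.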
Vagrant
• Working on a shared web server over Samba was simply painful
  • Extremely volatile environment
• Vagrant simply kicks ass
  • Dev/prod parity
  • Encourages experimentation
    • Go ahead and try out that new X
Diagram: the same Dev → Build → Deploy pipeline, with project code developed in Vagrant and Ansible building the vagrant.box, which sits in the Artifact Repo next to the builds (build #123, build #124)
Ansible (and why we love it)
…yes, use any configuration management (CM) tool, but here is why we chose Ansible:
• New team members get up to speed in a few days
  • Gentle learning curve: YAML
• Push over ssh: easy to understand
• The more declarative it is, the more it documents
• It's all about disseminating information
• Agentless model suits immutable, ephemeral instances
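“The more declarative it is, the more it documents” rests on the desired-state model: you write down what should be true, and an idempotent apply step only touches what differs. A toy Python model of that idea follows; it is not Ansible's module API, and `apply_state` plus the package versions are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state model: package name -> installed version.
State = Dict[str, str]
Changes = Dict[str, Tuple[Optional[str], str]]


def apply_state(actual: State, desired: State) -> Tuple[State, Changes]:
    """Converge `actual` towards `desired` without mutating either input.

    Returns the new state and a record of what changed as (old, new). Only keys
    named in `desired` are managed; anything else is left alone. Running it
    again with the same `desired` reports no changes (idempotence).
    """
    new_state = dict(actual)
    changes: Changes = {}
    for name, want in desired.items():
        have = actual.get(name)
        if have != want:
            changes[name] = (have, want)
            new_state[name] = want
    return new_state, changes


if __name__ == "__main__":
    desired = {"nginx": "1.4", "memcached": "1.4.13"}
    state, changed = apply_state({"nginx": "1.2"}, desired)
    print(changed)  # first run: both packages change
    _, changed_again = apply_state(state, desired)
    print(changed_again)  # second run: {} (nothing to do)
```

Because `desired` is the whole description, reading it tells a new team member what a box should look like without tracing any imperative steps.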
Digital archeology
There are gems in that dirt
Auditing our infrastructure
• Logs
  • You mean logs aren’t saved to /dev/null?
  • App was rotating its own logs instead of using logrotate
• memcached and mongodb servers were not all the same version
  • What the hell is server X for???
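Finding the memcached/mongodb version mismatch is the kind of audit a short script can repeat on demand. A minimal Python sketch, assuming you can already collect (host, service, version) rows from your inventory or CM facts; the function name and the sample hosts and versions are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_version_drift(
    inventory: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Return {service: {version: [hosts]}} for each service seen at more than one version."""
    by_service: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for host, service, version in inventory:
        by_service[service][version].append(host)
    # Keep only services whose hosts disagree on the version.
    return {
        service: {version: sorted(hosts) for version, hosts in versions.items()}
        for service, versions in by_service.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    # Made-up inventory rows: (host, service, version).
    inventory = [
        ("cache1", "memcached", "1.4.13"),
        ("cache2", "memcached", "1.4.5"),
        ("mongo1", "mongodb", "2.2.3"),
        ("mongo2", "mongodb", "2.2.3"),
    ]
    for service, versions in find_version_drift(inventory).items():
        print(service, versions)
```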
Cache for gold
• With configuration management, gold images become a caching layer for the build process
• Private cloud: idle instances are cheap
• AMIs: need to be spun automatically to save $$$ & time
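One way to read “gold images become a caching layer”: an image is the memoized output of running configuration management on a base image, so it can be keyed by a hash of those inputs and rebuilt only when they change. A minimal Python sketch under that assumption; `image_cache_key`, `get_or_build_image` and the `build` callable are hypothetical, not an AWS or Ansible API.

```python
import hashlib
from typing import Callable, Dict, Mapping


def image_cache_key(base_image: str, cm_files: Mapping[str, bytes]) -> str:
    """Derive a cache key from the base image ID and every CM file's path and contents."""
    h = hashlib.sha256()
    h.update(base_image.encode("utf-8") + b"\0")
    # Sort by path so the key does not depend on dict ordering; hashing each
    # file's contents to a fixed-size digest keeps entries unambiguous.
    for path in sorted(cm_files):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(cm_files[path]).digest())
    return h.hexdigest()


def get_or_build_image(cache: Dict[str, str], key: str,
                       build: Callable[[], str]) -> str:
    """Return the cached image ID for `key`; on a miss, build it and cache the result."""
    if key not in cache:
        cache[key] = build()
    return cache[key]


if __name__ == "__main__":
    playbooks = {"site.yml": b"- hosts: web\n"}
    key = image_cache_key("base-image", playbooks)
    images: Dict[str, str] = {}
    print(get_or_build_image(images, key, lambda: "gold-image-1"))  # miss: builds
    print(get_or_build_image(images, key, lambda: "gold-image-2"))  # hit: still gold-image-1
```

Under this model a stale gold image is just a cache miss: change a playbook and the key changes, so the next build bakes a fresh image instead of reusing an outdated one.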
Simplify for sanity
• 200+ server types are hard to manage
• Amortize the stack
• Bespoke vs. generic appliances
Devs on call
• Learning happens in production
• Production is still running at 3am
“People who are really serious about software should make their own hardware”
- Alan Kay
Devs on call
• Are your monitoring and tooling truly effective?
• Make sure they can ssh into the damn box! (if they have to)
• Throwing some poor dev onto a broken system they know nothing about doesn't work
• Choose-your-own-adventure cookbooks
• Create a registry of specialists
Is this working?
Working out loud
• Yammer (& HipChat for on call)
  • Tried IRC, was too siloed
• 5 whys post-mortems
• Accountability
  • Build fails: “I'm on it”
Learning happens in production
Thank you! @markeijsermans http://code.hootsuite.com