DevOps Naughties Style - How We DevOps at MP3.com in the Early 2000's

DevOps Naughties Style (how we did DevOps at MP3.com in the early 2000’s) [email protected] 20140203

Page 2: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

MP3.com – Dec 1998

Page 4: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Probably not the only ones but there wasn’t a good way to know that

Few sources of information for internet startups – little sharing of information on reliable, scalable internet architecture

After Vivendi bought us we were more than just MP3.com – we hosted all of the above

MP3.com: We Did DevOps Before It Was Called DevOps

Page 8: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

There were all sorts of other internet sites starting up but we were one of the first with small requests resulting in large (comparatively) responses

A quickly growing community sharing .mp3 files – 4 million mp3 files, 25 million users

At our peak we quite happily kept 1.2Gbps busy with audio files – progressive downloads

Had to come up with solutions to scaling problems at a time when few, if any, had done so before

Our Problem(s)

Page 9: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

OK, it wasn’t called DevOps, but:

◦ We used tools to make a workflow and worked together as an Engineering team rather than against each other, as was more typical at the time (and still is in some (many?) places)

◦ This included my Ops team being an integral part of the Engineering group, not a separate organization

Our Solution - DevOps

Page 10: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Our workflow included mainly open source tools – we’ll look at some

Strict procedures and protocols

I’ll start from the Ops side

Overview

Page 12: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Change is bad – we tried to use the same machine types as long as possible – although with the speed of growth that was sometimes only months

Kept to as few basic types as possible – our last Linux whitebox version was our MkV server (based on a Supermicro chassis)

Machine Builds

Page 15: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Single unified base OS install (Red Hat Linux, pre-RHEL, versions 5 through 8) – Kickstart install

Solaris boxes where we needed a journalled filesystem (we funded ReiserFS later on) – JumpStart installed

We built 20 or 30 boxes ahead and could build 10 boxes at a time in around 10 minutes (total) – all because we knew exactly what was going on the machines and that they were the same basic install.

Machine Builds II

Page 20: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Clustering – something we had to learn for ourselves. Other people did it too but we didn’t know that

Initially RadWare load balancers, then F5

Lots of small unrelated clusters

Every cluster did as little as possible and had as few dependencies as possible – mainly just DB

Naming Convention for hosts and clusters

Redundancy and Clustering

Page 21: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

XXyyyyyynn

◦ XX = two-letter code for the datacenter

◦ yyyyyy = a short descriptive name for the cluster

◦ nn = a two-digit number within the cluster

Examples:

◦ sdwww03

◦ sjdb01

Naming Convention

Page 22: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Mon: an open source monitoring tool (you can still find it at ftp://mirror.csclub.uwaterloo.ca/slackware/slackware-4.0/kernel.org/software/admin/mon/html/news.html) ~= Nagios/Sensu

RRDtool/SNMP data gathering for all servers and storage devices. Graphs were our own – I can’t find screenshots for this ~= Cacti/Graphite

Monitoring Tools

Page 23: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

We had over 130TB of storage. Sounds small today but 4TB was a whole rack

Storage

Page 24: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

One of our storage towers was delivered just a little carelessly - $120K of disk :

Storage Ooops

Page 28: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

All machines reported to MachDB which collected data about the machine and reported back (~= ohai)

We manually entered a switch location on install

We knew exactly which rack each of the 2000 hosts on our network was in and could create web based maps of every machine

MachDB is still around: http://www.machdb.org/

Host Management

Page 33: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Centralized User Management – built internally

Based on LDAP – used by whole company not just Engineering

/etc/passwd and /etc/group built individually for each server and distributed via SSH

Combined with local versions so that if the remote file was bad we could at least get in as root

Root user passwords only within Ops group

User Management
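
The "combine with local versions" step might look roughly like the Python sketch below. It is an illustration only, not our actual tool: `parse_passwd` and `merge_passwd` are hypothetical names, and the rule that local entries always win is an assumption about how the root fallback could be guaranteed.

```python
from typing import Dict


def parse_passwd(text: str) -> Dict[str, str]:
    """Map user name -> full passwd line, skipping malformed lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # A passwd entry has exactly seven colon-separated fields.
        if len(fields) != 7 or not fields[0]:
            continue
        entries.setdefault(fields[0], line)
    return entries


def merge_passwd(local_text: str, remote_text: str) -> str:
    """Combine the local file with the centrally generated one.

    Assumption: local entries (root and friends) are never overridden,
    so a bad remote file cannot lock everyone out.
    """
    merged = parse_passwd(local_text)
    for user, line in parse_passwd(remote_text).items():
        merged.setdefault(user, line)
    return "\n".join(merged.values()) + "\n"
```

A remote file that is empty or full of garbage simply contributes nothing, and the local root entry is returned unchanged.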

Page 34: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Unified problem reporting across the company using an in-house, web-based ticketing system: Tix

Tix had many of the features of today’s tools – user assignment, escalation, etc.

Ticketing

Page 35: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Divided into the standard dev, qa/staging, production

Dev and staging systems built using the same kickstart as production (can we see chef/puppet here?)

CM using CVS

◦ A whole group dealt with merge issues

Development

Page 39: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

We used CVS – no svn or git

Problem was that CVS versions each file separately, so there was no overall state

Started using manifest files – every application had a list of version vs files and dependencies. Manifest files were version controlled themselves.

This evolved into a tool that built artifact bundles (tar) from manifests for use by cfengine

Version Control
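
To make the manifest idea concrete, here is a small Python sketch. The manifest format (one `path revision` pair per line, `#` for comments) is an assumption, since the slides do not show the real one, and the `checkout` mapping stands in for exporting file revisions from CVS. `parse_manifest` and `build_bundle` are illustrative names.

```python
import io
import tarfile
from typing import Dict, List, Tuple


def parse_manifest(text: str) -> List[Tuple[str, str]]:
    """Return (path, revision) pairs from a manifest; '#' starts a comment."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        path, revision = line.split()
        entries.append((path, revision))
    return entries


def build_bundle(manifest_text: str,
                 checkout: Dict[Tuple[str, str], bytes]) -> bytes:
    """Pack exactly the file revisions named in the manifest into a tar archive.

    `checkout` is a stand-in for a version-control export: it maps
    (path, revision) to that revision's file contents.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, revision in parse_manifest(manifest_text):
            data = checkout[(path, revision)]  # KeyError if the revision is missing
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Because the manifest pins one revision per file, the same manifest version always yields a bundle with the same contents, which is the "overall state" CVS could not give us on its own.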

Page 43: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Every cluster had a ‘00’ machine using our naming convention, e.g. sdwww00

On deployment code would be put on this machine first and tested by QA

We knew what code to deploy from the manifest file – just use the version number from that

Testing was a mixture of manual and scripted regression tests & new feature tests

QA/Staging
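
The "00 machine first" rule can be sketched as a gated rollout. This Python sketch is an assumption about the control flow only: `rollout`, `deploy` and `qa_passes` are hypothetical names, and in practice the QA gate was people and test scripts rather than a callback.

```python
from typing import Callable, Iterable, List


def rollout(hosts: Iterable[str],
            deploy: Callable[[str], None],
            qa_passes: Callable[[str], bool]) -> List[str]:
    """Deploy to the cluster's '00' host first; continue only if QA passes.

    Returns the hosts that received the new code, in deployment order.
    """
    ordered = sorted(hosts)
    staging = [h for h in ordered if h.endswith("00")]
    if len(staging) != 1:
        raise ValueError("expected exactly one '00' host in the cluster")
    first = staging[0]
    deploy(first)
    done = [first]
    if not qa_passes(first):
        return done  # stop here: the rest of the cluster keeps the old code
    for host in ordered:
        if host != first:
            deploy(host)
            done.append(host)
    return done
```

If QA rejects the build on the ‘00’ machine, no other host in the cluster is touched.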

Page 47: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

We ended up using cfengine to deploy code

Using manifests, artifact files from ‘00’ machines were distributed to relevant servers

Links changed to point at the latest code ( = capistrano/chef deploy)

We also used cfengine for host package updates and OS level config files (/etc/resolv.conf etc.)

cfengine
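
The link switch is the same trick capistrano uses with its `current` symlink. A minimal Python sketch of the idea on a POSIX system follows; it is not cfengine syntax, and the `activate` function and the `.tmp` suffix are illustrative assumptions.

```python
import os


def activate(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in one atomic step (POSIX)."""
    tmp_link = current_link + ".tmp"
    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # rename(2) replaces the old link itself, so readers see either the
    # old release or the new one, never a missing link.
    os.replace(tmp_link, current_link)
```

Rolling back is the same call with the previous release directory, which is why unpacking each bundle into its own directory and switching a link is attractive.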

Page 48: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Rough Infrastructure Diagram

Page 52: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Single server NFS mounted disk storage

◦ Soon had problems so added NetApps for storage

◦ Heat issues in the datacenter we were in

◦ Added squid caches & moved to a bigger, better datacenter

3am datacenter visits ended…

Rain in the server room ended

Random disk pulls on live servers also ended

Early Infrastructure

Page 57: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

We needed geographic redundancy

Business deal with WorldCom ended up with a second datacenter in… San Jose – not perfect but better

AT&T wouldn’t give us good bandwidth pricing so the new Datacenter had a pre-populated Squid cache (3TB of cache)

Cut AT&T bandwidth in ½ overnight when we turned it on

Better AT&T bandwidth pricing

Datacenter Duplication

Page 58: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

Respect across disciplines

Great work environment

Fast pace

Cohesive management for Dev and Ops

Company Atmosphere

Page 59: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

2003

Page 60: DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's

MP3.com the domain was sold to CNet and still lives on as a completely different site

Much of the infrastructure design and machines lived on as the new Napster (ex-Pressplay, now part of Rhapsody)

Original MP3.com employees are *everywhere*

Sold and Resurrected