SRE @ Dropbox - LCA2016 - Tammy Butow
SITE RELIABILITY ENGINEERING AT DROPBOX: @tammybutow, SRE MANAGER
TODAY
LET’S INVENT A BETTER FUTURE.
400M
SITE RELIABILITY ENGINEERS
SWEAT THE DETAILS
AUTOMATION
PYGERDUTY: REDUCE PAGES BY 10x
MONITORING & ALERTING: IT’S OK TO SNOOZE
SCRIPTING & AUTO-REMEDIATION: SELF-HEALING SYSTEMS
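
The "self-healing systems" idea on the slide above can be illustrated with a minimal sketch: run a health check, try an automated fix a few times, and page a human only if the fix does not work. This is an assumption-laden illustration, not Dropbox's tooling; `auto_remediate`, `check`, `remediate`, and `page` are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def auto_remediate(
    check: Callable[[], bool],       # hypothetical health check: True means healthy
    remediate: Callable[[], None],   # hypothetical fix, e.g. restart a service
    page: Callable[[str], None],     # hypothetical hook that pages the on-call engineer
    max_attempts: int = 3,
    wait_seconds: float = 0.0,
) -> bool:
    """Try to heal a failing check before paging a human.

    Returns True if the check passes (possibly after remediation),
    False if every attempt failed and a page was sent.
    """
    if check():
        return True  # already healthy, nothing to do
    for _ in range(max_attempts):
        remediate()
        time.sleep(wait_seconds)  # give the fix time to take effect
        if check():
            return True  # healed without waking anyone up
    page(f"check still failing after {max_attempts} remediation attempts")
    return False


if __name__ == "__main__":
    # Simulated service that becomes healthy after its second restart.
    state = {"healthy": False, "restarts": 0}

    def check() -> bool:
        return state["healthy"]

    def restart() -> None:
        state["restarts"] += 1
        state["healthy"] = state["restarts"] >= 2

    pages: list = []
    healed = auto_remediate(check, restart, pages.append)
    print(healed, state["restarts"], pages)
```

Bounding the number of attempts is the key design choice here: a remediation loop that never gives up can hide a real outage, so the sketch falls back to paging once the attempts are exhausted.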
TOOLS
HERMES: QUESTS & LABORS
DBMANAGER: AUTOMATING DATABASE OPERATIONS
SLACKBOTS: WE MAKE CHAT BOTS IN GO & PYTHON
AIM HIGHER
ROADMAP: DECIDE ON WHERE TO GO, BUT STAY FLEXIBLE
DISTRIBUTED TEAMS: FOLLOW THE SUN
KTLO: RELIABILITY & AVAILABILITY
CAPTAIN’S LOG: POST MORTEM FOR EVERY PAGE
DISASTER RECOVERY TESTING: FREQUENT & CROSS-TEAM
FAULT INJECTION: WHAT ARE THE UNKNOWN UNKNOWNS?
WE NOTICE: CELEBRATE ACHIEVEMENTS
AIM HIGHER. SWEAT THE DETAILS. REDUCE PAGES. INVENT TOOLS. AUTOMATE. SCRIPT. AUTO-REMEDIATE. TRIGGER FAULTS. MONITOR. ALERT. SNOOZE.
CREATE TIME TO BUILD. PLAN. ROADMAP. TRACK. CELEBRATE.
HACK: ALWAYS BE CODING
Q&A