Life On-Call, Availa-liberty, and the Pursuit of Happiness
![Page 1: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/1.jpg)
Life On-Call, Availa-liberty, & the Pursuit of Happiness Runbooks
Dave Cliffe
@CliffeHangers
![Page 2: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/2.jpg)
![Page 3: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/3.jpg)
Incident #1: Oct 27, 2011
Incident #2: May 1, 2013
Incident #3: Nov 2, 2015
![Page 4: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/4.jpg)
![Page 5: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/5.jpg)
[Diagram: Your Fastest Path to Incident Resolution. Labels: events from monitoring, deployment, and ticketing tools (app, system, log) covering microservices, apps & services, containers, cloud, network, database, and servers; correlate, cluster, and manage events into alerts (Alert 1, Alert 2, Alert 3); on-call scheduling and automatic escalations via web and mobile app; collaboration/resolution by Developer, NOC, Helpdesk, and IT Ops; people, data, process; system and user efficiency.]
![Page 6: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/6.jpg)
![Page 7: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/7.jpg)
Availability
![Page 8: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/8.jpg)
Every software-powered company experiences downtime
http://www.evolven.com/blog/downtime-outages-and-failures-understanding-their-true-costs.html
Cost of outages:
$7,400,000 annual cost at 175 hours of downtime (Gartner)
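To put the annual figure in on-call terms, it can be broken down per hour and per minute of downtime. The sketch below only divides the two numbers quoted on the slide; the function and constant names are illustrative, and it assumes the cost is spread evenly across all downtime hours, which real outages rarely are.

```python
# Figures quoted on the slide (attributed there to Gartner).
ANNUAL_OUTAGE_COST_USD = 7_400_000
ANNUAL_DOWNTIME_HOURS = 175


def downtime_cost_rates(annual_cost_usd, annual_downtime_hours):
    """Return (cost per hour, cost per minute) of downtime.

    Assumes cost accrues evenly over every hour of downtime.
    """
    per_hour = annual_cost_usd / annual_downtime_hours
    per_minute = per_hour / 60
    return per_hour, per_minute


per_hour, per_minute = downtime_cost_rates(
    ANNUAL_OUTAGE_COST_USD, ANNUAL_DOWNTIME_HOURS
)
print(f"Per hour of downtime:   ${per_hour:,.0f}")
print(f"Per minute of downtime: ${per_minute:,.0f}")
```

With the slide's numbers this works out to roughly $42,000 per hour, or about $700 per minute.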
![Page 9: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/9.jpg)
“The most important ability is availability.”
All CEOs everywhere
![Page 10: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/10.jpg)
Why is Availability a terrible metric?
![Page 11: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/11.jpg)
The Tyranny of the SLA
credit: J. Paul Reed (@jpaulreed)
![Page 12: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/12.jpg)
“System Availability” means the percentage of total time during which the Hosted Service network is available to Client and Client is able to access the Hosted Service system interface.
______ warrants the following minimum levels of Hosted Service System Availability during each calendar month: 99.95%
The following definitions will apply to the calculation of “availability”: “Hosted Service System Availability” means the percentage of total time during each calendar month during which the Hosted Service is available to Client, excluding Scheduled Downtime and Emergency Maintenance
An actual SaaS SLA
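A 99.95% monthly commitment is easier to reason about as a downtime budget in minutes. The sketch below is one reading of the clause above, not the vendor's actual calculation: it assumes excluded time (Scheduled Downtime and Emergency Maintenance) is subtracted from the measured period before the percentage is applied. The function and parameter names are illustrative.

```python
def downtime_budget_minutes(availability_pct, days_in_month, excluded_minutes=0.0):
    """Minutes of unplanned downtime allowed in one calendar month.

    Assumption: excluded minutes (scheduled or emergency maintenance)
    are removed from the measured period before applying the target.
    """
    total_minutes = days_in_month * 24 * 60
    measured_minutes = total_minutes - excluded_minutes
    return measured_minutes * (100 - availability_pct) / 100


def achieved_availability_pct(days_in_month, downtime_minutes, excluded_minutes=0.0):
    """Availability percentage under the same assumption as above."""
    measured_minutes = days_in_month * 24 * 60 - excluded_minutes
    return 100 * (measured_minutes - downtime_minutes) / measured_minutes


for days in (28, 30, 31):
    budget = downtime_budget_minutes(99.95, days)
    print(f"{days}-day month: {budget:.2f} minutes of downtime allowed")
```

Under these assumptions a 30-day month allows about 21.6 minutes of downtime, so a single 25-minute outage would miss the target even if the rest of the month were flawless.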
![Page 13: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/13.jpg)
Are you Available?
![Page 14: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/14.jpg)
Happiness
![Page 15: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/15.jpg)
![Page 16: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/16.jpg)
Measuring (Un)Happiness
![Page 17: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/17.jpg)
Responsiveness
![Page 18: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/18.jpg)
Pain
![Page 19: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/19.jpg)
Health Checks
https://labs.spotify.com/2014/09/16/squad-health-check-model/
![Page 20: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/20.jpg)
Happiness++
![Page 21: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/21.jpg)
http://www.activestate.com/blog/2014/01/devops-hero-culture
Beware the ‘Hero Culture’
![Page 22: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/22.jpg)
Eliminate Single Points of Dependence
![Page 23: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/23.jpg)
Reduce Alert Fatigue
https://www.pinterest.com/pin/497929302524908289
![Page 24: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/24.jpg)
![Page 25: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/25.jpg)
On a regular basis, for every alert, ask …
1) Is it actionable?
2) Is it urgent?
3) Could we consolidate?
4) Did the right person get it?
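The four questions above can be recorded per alert during a periodic review. The sketch below is a hypothetical structure for that, not something from the talk or from any alerting product: `AlertReview`, its fields, and `review_findings` are made-up names, and the answers are assumed to be filled in by a person.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertReview:
    """Answers to the four review questions for one alert (hypothetical)."""
    name: str
    actionable: bool
    urgent: bool
    overlaps_with: Optional[str]  # another alert it could be consolidated with
    reached_right_person: bool


def review_findings(review: AlertReview) -> List[str]:
    """List which of the four questions flagged a problem, in question order."""
    findings = []
    if not review.actionable:
        findings.append("not actionable")
    if not review.urgent:
        findings.append("not urgent")
    if review.overlaps_with is not None:
        findings.append(f"could consolidate with {review.overlaps_with}")
    if not review.reached_right_person:
        findings.append("went to the wrong person")
    return findings


# Example: a made-up disk-usage alert that overlaps with a stricter one.
example = AlertReview(
    name="disk-80pct",
    actionable=False,
    urgent=False,
    overlaps_with="disk-90pct",
    reached_right_person=True,
)
for finding in review_findings(example):
    print(f"{example.name}: {finding}")
```

An alert with an empty findings list passes the review; anything else is a candidate for tuning, rerouting, or removal.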
![Page 26: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/26.jpg)
“The most important on-call responsibility is to understand customer impact.”
Anonymous Customer (who I didn’t verify I could quote)
![Page 27: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/27.jpg)
Sharing Operational Responsibility
![Page 28: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/28.jpg)
SHARED OPERATIONAL RESPONSIBILITY
“Giving developers operational responsibilities has greatly enhanced the QUALITY of the services, both from a customer and a technology point of view. The TRADITIONAL model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. … You build it, you run it.”
-Werner Vogels, CTO Amazon
![Page 29: Life On-Call, Availa-liberty, and the Pursuit of Happiness](https://reader036.fdocuments.net/reader036/viewer/2022062503/58a2feaf1a28abea508b476b/html5/thumbnails/29.jpg)
SHARED OPERATIONAL RESPONSIBILITY
“For developers to take responsibility for the systems they create, they need support from operations to understand how to build ‘reliable software that can be continuous deployed to an unreliable platform that scales horizontally’.”
-Jez Humble, quoting Jesse Robbins (Chef)