SPOF - Single "Person" of Failure
![Page 1: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/1.jpg)
Single Point of Failure… Expert
Sasha Rosenbaum, @DivineOps
![Page 2: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/2.jpg)
Who am I?
Sasha Rosenbaum
Azure & DevOps consultant
at 10th Magnitude for 4 years
Co-organizer of
- DevOps Days Chicago Conference
- Chicago Azure meetup
@DivineOps
![Page 3: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/3.jpg)
What is a Single Point of Failure?
![Page 4: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/4.jpg)
A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working.
![Page 5: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/5.jpg)
High Availability
Achieving redundancy by removing single points of failure
Having reliable cross-over capabilities to switch between components
Detection of failures as they occur, so that cross-over can be initiated
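
The detection and cross-over points above can be sketched in a few lines. This is a minimal illustration, not a production failover mechanism: the component names and the `is_healthy` callback are hypothetical, and a real system would probe health over the network and handle flapping.

```python
from typing import Callable, Optional, Sequence


def pick_healthy(components: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first component that passes its health check, or None."""
    for name in components:
        if is_healthy(name):  # failure detection
            return name       # cross-over target
    return None               # nothing healthy: the whole system is down


# Hypothetical health state, for illustration only.
status = {"primary": False, "standby": True}
active = pick_healthy(["primary", "standby"], lambda name: status[name])
print(active)  # standby
```

With only one entry in the list, a single failed check returns `None`, which is the single point of failure in miniature.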
![Page 6: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/6.jpg)
This is complicated
![Page 7: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/7.jpg)
Architecting for HA
![Page 8: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/8.jpg)
How is the entire system down?
![Page 9: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/9.jpg)
We forgot a dependency!
![Page 10: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/10.jpg)
Oh…
![Page 11: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/11.jpg)
Just imagine buying a server that:
Has an uptime of roughly 16 hours a day
With interruptions
Is the single one of its kind
Cannot be replicated!
![Page 12: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/12.jpg)
Humans are NOT highly available
![Page 13: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/13.jpg)
How did we get here?
Lack of budget
Lack of people
Human nature
![Page 14: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/14.jpg)
How to recognize that you have a problem?
![Page 15: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/15.jpg)
1
![Page 16: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/16.jpg)
Keys to the Kingdom
![Page 17: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/17.jpg)
TO MY PRODUCTION SERVER
![Page 18: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/18.jpg)
Even when the systems are automated there are still humans who manage them
![Page 19: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/19.jpg)
Why is there a single admin?
The situation evolved organically from having a small team
Someone took over deliberately
![Page 20: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/20.jpg)
Role Based Access
Grant access based on a role/group
Admin group size > 1
Service accounts
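
One way to check the "admin group size > 1" rule is a small audit over role memberships. The sketch below assumes memberships are already available as a plain mapping; the role and user names are hypothetical, and pulling the real data from a directory or cloud provider is left out.

```python
from typing import Dict, List, Set


def single_person_roles(role_members: Dict[str, Set[str]],
                        minimum: int = 2) -> List[str]:
    """Return the roles that have fewer than `minimum` members, sorted by name."""
    return sorted(role for role, members in role_members.items()
                  if len(members) < minimum)


# Hypothetical role assignments, for illustration only.
roles = {
    "prod-admin": {"alice"},
    "deploy": {"alice", "bob"},
}
print(single_person_roles(roles))  # ['prod-admin']
```

Any role this reports is a candidate Single Person of Failure.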
![Page 21: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/21.jpg)
Make sure that the person on call has the necessary access to fix the problem
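
This can be verified before a shift starts rather than during an incident. The following is a rough sketch under the same assumption of a plain role-to-members mapping; `missing_access`, the role names, and the user names are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def missing_access(on_call: str,
                   required_roles: Iterable[str],
                   role_members: Dict[str, Set[str]]) -> List[str]:
    """Return the required roles that the on-call person does not hold."""
    return sorted(role for role in required_roles
                  if on_call not in role_members.get(role, set()))


# Hypothetical data, for illustration only.
roles = {"prod-admin": {"alice"}, "db-admin": {"alice", "bob"}}
print(missing_access("bob", ["prod-admin", "db-admin"], roles))  # ['prod-admin']
```

An empty list means the person on call can reach everything they may need to fix.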
![Page 22: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/22.jpg)
TRUST YOUR PEOPLE!!!
![Page 23: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/23.jpg)
2
![Page 24: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/24.jpg)
Beware of the Expert!
![Page 25: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/25.jpg)
“This will take 15 minutes to fix
And 8 hours to explain”
![Page 26: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/26.jpg)
We cannot afford the loss of productivity!
![Page 27: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/27.jpg)
Can you afford losing this knowledge?
![Page 28: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/28.jpg)
Delegate to Juniors
![Page 29: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/29.jpg)
Juniors are wonderful people
They ask tough questions
![Page 30: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/30.jpg)
Your new hires haven’t yet caught the
“This is how it’s always been” virus
![Page 31: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/31.jpg)
You are emotionally invested in your code
It is hard not to get protective of it
![Page 32: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/32.jpg)
Documentation
Documents
Readme
Comments
Tests
Automation
Features
![Page 33: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/33.jpg)
3
![Page 34: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/34.jpg)
“I cannot afford to take vacation!”
![Page 35: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/35.jpg)
Job security?
![Page 36: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/36.jpg)
Productivity?
![Page 37: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/37.jpg)
Hours / Productivity
![Page 38: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/38.jpg)
Research shows that working longer hours
DOES NOT increase productivity
![Page 39: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/39.jpg)
You need rest to be at your best!
![Page 40: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/40.jpg)
Cell phones are the single worst thing that happened to people AND businesses in the last century
![Page 41: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/41.jpg)
If people were actually unreachable, we would find a more reliable way to solve problems
![Page 42: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/42.jpg)
Mandatory Vacation
![Page 43: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/43.jpg)
Game Days
![Page 44: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/44.jpg)
Say NO to having a
Single PERSON of Failure ;-)
![Page 45: SPOF - Single "Person" of Failure](https://reader030.fdocuments.net/reader030/viewer/2022021506/586fd8651a28ab18428b55a7/html5/thumbnails/45.jpg)
Great job, DoD Silicon Valley!