DevopsHQ #2017

DevopsHQ #2017 SLAs & SREs

SLA - Service Level Agreement

An SLA is a contract between a service provider (internal or external) and the end user that defines the level of service expected from the provider. SLAs are output-based: their purpose is specifically to define what the customer will receive.
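To make "level of service" concrete, here is a minimal sketch that turns an availability target into a downtime budget. The targets (99%, 99.9%, 99.99%) and the 30-day period are illustrative assumptions, not figures from this talk, and `allowed_downtime_minutes` is a hypothetical helper name.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period for a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Illustrative targets only; a real SLA states its own numbers and period.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is why every extra "nine" is expensive to promise.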

SRE - Site Reliability Engineering

Google’s mastermind behind SRE, Ben Treynor, still hasn’t published a single-sentence definition, but describes site reliability as “what happens when a software engineer is tasked with what used to be called operations.”

THINK! THINK! THINK!

What to take note of?

- People are generally not evil, just busy

- Consider time

- Consider stress

- Consider benefits

- Consider rewards

- Help people be great at their jobs

Improvement

- Post-Mortems

- Communication

- Collaboration

- Ownership

- Standardization

- Policy

- ISMS (Optional)

>>> LOOP

MONITOR! MONITOR! MONITOR!

UP/RESPONSE
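An up/response check could be sketched like this, using only the Python standard library. The function name `check_endpoint`, the 5-second timeout, and the rule "any answer below 500 counts as up" are assumptions for illustration, not something the talk prescribes.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout: float = 5.0) -> dict:
    """Report whether the URL answered, its HTTP status, and the response time."""
    start = time.monotonic()
    status = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no answer at all.
        pass
    elapsed = time.monotonic() - start
    # Assumption: the service is "up" if it answered with anything below 500.
    up = status is not None and status < 500
    return {"up": up, "status": status, "seconds": elapsed}
```

Run it on a schedule against each public endpoint and record both the `up` flag and `seconds`, since a slow response can break an SLA just as a failed one does.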

OPS METRICS

- CPU Utilization

- Memory Utilization

- Disk Utilization

- Process Monitoring

- Webserver Processes

- DB server Processes

- Custom Script Processes

- UPTIME

- Server Load

- NTP (Time)
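A few of the metrics above can be read with the Python standard library alone. This is a rough sketch assuming a Unix-like host (`os.getloadavg` is not available on Windows); the thresholds and function names are hypothetical and should be tuned to your own SLA. Memory, process, and NTP checks are left out because they usually need an agent or a third-party library.

```python
import os
import shutil

# Hypothetical alert thresholds; tune them to your own environment.
DISK_LIMIT_PCT = 90.0
LOAD_LIMIT_PER_CPU = 1.0


def disk_utilization_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_cpu() -> float:
    """One-minute load average divided by the number of CPUs (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def collect_alerts() -> list[str]:
    """Return a human-readable alert for every threshold that is exceeded."""
    alerts = []
    disk = disk_utilization_pct()
    if disk > DISK_LIMIT_PCT:
        alerts.append(f"disk utilization at {disk:.1f}%")
    load = load_per_cpu()
    if load > LOAD_LIMIT_PER_CPU:
        alerts.append(f"server load at {load:.2f} per CPU")
    return alerts


if __name__ == "__main__":
    print(collect_alerts() or "all checks within limits")
```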

HOW DO WE KNOW IF EVERYTHING WORKS?

500/502/504

200/206/404/403

302/304
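One way to answer the question is to bucket the status codes from your access logs and watch the share of server errors. The sketch below follows the grouping the codes appear in here (5xx as errors, 2xx together with 4xx as answered requests, 3xx as redirects or cache hits); counting 4xx as "answered" is an assumption, and the names `bucket` and `error_rate` are hypothetical.

```python
from collections import Counter


def bucket(status: int) -> str:
    """Group an HTTP status code into a coarse monitoring bucket."""
    if 500 <= status <= 599:
        return "server_error"
    if 300 <= status <= 399:
        return "redirect"
    if 200 <= status <= 299 or 400 <= status <= 499:
        # Assumption: 4xx means the service answered correctly to a bad request.
        return "answered"
    return "other"


def error_rate(statuses) -> float:
    """Fraction of responses that were server errors (0.0 for no traffic)."""
    counts = Counter(bucket(s) for s in statuses)
    total = sum(counts.values())
    return counts["server_error"] / total if total else 0.0


# Made-up sample of status codes, for illustration only.
sample = [200, 200, 404, 302, 500]
print(f"server error rate: {error_rate(sample):.0%}")
```

Alerting on this ratio rather than on single errors keeps noise down: one 502 is an event, a rising share of 5xx is an incident.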

To the rescue: CDN

When you work as an SRE, everything in your metrics and stats counts.

An SLA is hard to maintain; it takes a badass SRE (not just one, but a team) to get things rolling.

- Always be proactive

- Always improve

- Always be ready

@NeilUpbeta01 | [email protected]