Large-scale Infrastructure Automation at Verizon

LARGE-SCALE INFRASTRUCTURE AUTOMATION Timothy Perrett Hashiconf 2016

Transcript of Large-scale Infrastructure Automation at Verizon

HELLO.

STATE OF THE INDUSTRY

PROLIFERATION OF RESOURCE MANAGEMENT

LINUX CONTAINERS ARE FINALLY POPULAR

NOSQL EVERYWHERE

(with RDBMS still commonplace)

OBSERVATIONS.

“Infrastructure engineering is 60% social, and only 40% technical. Changing people is far more important than changing technology.”

“Enable sociological change. Technological changes are an implementation detail.”

“Operational complexity is often proportional to the lack of developer responsibility.”

[Diagram: FLEXIBILITY versus CONSTRAINT, shown for Business Staff and for Engineering Staff]

“Constraints liberate. Liberties constrain.”

- Runar Bjarnason

BRIEF HISTORY

We have to go back!

4 YEARS AGO.

• Pretty typical configuration management.
• Centralized Chef servers.
• Lots of unmaintainable Ruby.
• Ruby that generates Ruby, which is evaluated at runtime (yikes!).
• Developer contract is non-existent. Operations need to understand every application in detail.
• Code complete to finally deployed took around two weeks.

3 YEARS AGO.

• Implemented immutable machine images with HashiCorp Packer.
• Developer / Ops contract becomes an RPM/DEB file along with two YAML manifests.
  • One manifest for provisioning.
  • Another for runtime deployment setup.
• Drive the entire release workflow from source repositories.
• Orchestrated with many linked Jenkins jobs and schedules.
• Code complete to finally deployed took around 40 minutes.

TODAY.

• Developer / operations contract is just a Linux container.
• Repository contains a YAML manifest.
• Realization that placement and orchestration are entirely separate.
• Intelligent and fully automated cleanup.
• Application dependency management.
• Automated traffic bleeding.
• Integrated alerting with Prometheus, general notifications with Slack or email.
• Code complete to deployed takes around 5 minutes.

NELSON.

“Desperate affairs require desperate remedies.”

- Vice Admiral Horatio Nelson, 1758-1805

GOALS.

• System elements should be awesome at just one thing.
• Reduce system complexity by increasing responsibility of engineering teams.
• Break it, you bought it.
• All application specifications are checked into source control.
• Focus on orchestration, not placement.
• Force automation in every aspect of work.
  • Manual access to systems is a crutch that enables automation avoidance.

UNITS.

job

    - name: hello world
      type: job
      description: >
        mindlessly prints hello world to the console for five minutes
      schedule: hourly
      retries: 2
      expiration_policy: >
        retain-latest-two-major
      dependencies:
        - ref: [email protected]

[Slide callouts: unit type; job stuff]

service

    - name: howdy
      type: service
      description: >
        always responds with hello world
      ports:
        - default->8080/http
      expiration_policy: >
        retain-latest-two-major
      dependencies:
        - ref: [email protected]

[Slide callouts: unit type; service stuff]
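The `ports` entry in the service unit packs three facts into one string. As a minimal sketch, assuming the form `name->port/protocol` seen in the manifest above, it could be split apart as follows. The names `PortSpec` and `parse_port` are hypothetical and this is not Nelson's own parser.

```python
from typing import NamedTuple


class PortSpec(NamedTuple):
    """One entry from a unit's `ports` list (hypothetical type)."""
    name: str
    port: int
    protocol: str


def parse_port(spec: str) -> PortSpec:
    """Split a 'name->port/protocol' string into its three parts."""
    name, arrow, rest = spec.partition("->")
    if not arrow or not name.strip():
        raise ValueError(f"expected 'name->port/protocol', got {spec!r}")
    number, slash, protocol = rest.partition("/")
    if not slash or not protocol.strip():
        raise ValueError(f"missing '/protocol' in {spec!r}")
    # Assumption: ports are plain decimal TCP/UDP port numbers.
    if not number.isdigit() or not 0 < int(number) < 65536:
        raise ValueError(f"invalid port number in {spec!r}")
    return PortSpec(name.strip(), int(number), protocol.strip())


print(parse_port("default->8080/http"))
```

Rejecting malformed strings at parse time keeps the developer / operations contract checkable before anything is deployed.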

edge proxy

    - name: foobar-proxy
      type: proxy
      description: >
        proxy inbound from outside
      routes:
        - name: expose the ssl port
          expose: inbound->443/https
          destination: [email protected]>default
      expiration_policy: >
        retain-until-deprecated

[Slide callout: routes]

WORKFLOWS.

[Diagram labels: failure domain, container replication, alerting, routing, discovery, scheduling, credentials]

LIFECYCLE.

[Diagram labels: User activated, Graph Pruning, Upgraded!, Automatically Terminated]
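Graph pruning is only named on the slide, so the following is a sketch under a stated assumption: a deployed unit version becomes a cleanup candidate once nothing that is still wanted depends on it, directly or transitively. The function name, the `roots` input and the example graph are all hypothetical.

```python
def prune_candidates(depends_on, roots):
    """Return deployed units that no root reaches through dependency edges.

    depends_on maps each deployed unit to the units it calls.
    roots are the units assumed to still be wanted (hypothetical input).
    """
    reachable = set()
    stack = list(roots)
    while stack:
        unit = stack.pop()
        if unit in reachable:
            continue
        reachable.add(unit)
        stack.extend(depends_on.get(unit, ()))
    return sorted(set(depends_on) - reachable)


# Hypothetical deployment graph: the proxy was upgraded to call howdy@2.0,
# so howdy@1.0 no longer has any caller.
deployed = {
    "proxy@2.0": ["howdy@2.0"],
    "howdy@2.0": ["db@1.0"],
    "howdy@1.0": ["db@1.0"],
    "db@1.0": [],
}
print(prune_candidates(deployed, roots=["proxy@2.0"]))
```

In this example only `howdy@1.0` is reported, since `db@1.0` is still reached through `howdy@2.0`.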

• Various cleanup strategies
  • Graph pruning
  • Explicit deprecation cycles
  • User-selected policies for versions
    • Retain last two major
    • Retain last two minor
    • Retain latest
    • Retain always
• Eliminates the “Do we still need this?” conversations between ops and development.

    - name: hello world
      type: job
      description: >
        mindlessly prints hello world to the console for five minutes
      schedule: hourly
      retries: 2
      expiration_policy: >
        retain-latest-two-major
      dependencies:
        - ref: [email protected]
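The "retain last two major" and "retain last two minor" policies can be illustrated with a short sketch. It assumes one possible reading, namely that the newest release in each of the two most recent major (or major.minor) lines is kept and everything else expires. The slides do not spell out the exact rule, and the function names are hypothetical.

```python
def retain_latest_two(versions, level):
    """Pick versions to keep: the newest release in each of the two most
    recent lines, where a line is a major (level=1) or major.minor (level=2).

    versions is an iterable of (major, minor, patch) tuples.
    """
    newest = {}
    for version in versions:
        line = version[:level]
        if line not in newest or version > newest[line]:
            newest[line] = version
    keep_lines = sorted(newest)[-2:]
    return [newest[line] for line in keep_lines]


def expired(versions, level):
    """Everything the policy does not keep, oldest first."""
    keep = set(retain_latest_two(versions, level))
    return sorted(v for v in versions if v not in keep)


# Hypothetical set of deployed versions of one unit.
deployed = [(1, 0, 3), (1, 4, 0), (2, 0, 0), (2, 1, 5), (3, 0, 1)]
print(retain_latest_two(deployed, level=1))
print(expired(deployed, level=1))
```

Because the policy is declared in the manifest, the decision about what to terminate can be computed rather than negotiated.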

TL;DR.

• Automate everything. Your future sanity depends on it.
• Define concrete protocols at system integration points; favor machine-verifiable protocols where possible.
• Your path to success involves people. Listen, learn and be open to criticism.
• Consul & Vault provide building-block functionality that just works.
• Never settle for mediocre tools.
• Know when buying is better than building, but don’t be afraid to build if it adds value.
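One way to read "machine-verifiable protocols" is that a unit manifest gets checked before it is accepted. The sketch below validates a unit that has already been loaded into a dictionary. The rule set is an assumption for illustration: the type names and the two policy identifiers are taken from the manifests shown earlier, while the per-type required fields and the name `validate_unit` are hypothetical.

```python
KNOWN_TYPES = {"job", "service", "proxy"}
# Only the policy identifiers that appear in the example manifests.
KNOWN_POLICIES = {"retain-latest-two-major", "retain-until-deprecated"}
# Assumed rule: each unit type must carry its type-specific section.
REQUIRED_BY_TYPE = {"job": "schedule", "service": "ports", "proxy": "routes"}


def validate_unit(unit):
    """Return a list of problems found in a unit; empty means it passed."""
    problems = []
    for field in ("name", "type", "expiration_policy"):
        if not str(unit.get(field) or "").strip():
            problems.append(f"missing field: {field}")
    kind = unit.get("type")
    if kind and kind not in KNOWN_TYPES:
        problems.append(f"unknown type: {kind}")
    # Folded YAML scalars keep a trailing newline, so strip before comparing.
    policy = str(unit.get("expiration_policy") or "").strip()
    if policy and policy not in KNOWN_POLICIES:
        problems.append(f"unknown expiration_policy: {policy}")
    extra = REQUIRED_BY_TYPE.get(kind)
    if extra and not unit.get(extra):
        problems.append(f"{kind} units need '{extra}'")
    return problems


job = {
    "name": "hello world",
    "type": "job",
    "schedule": "hourly",
    "retries": 2,
    "expiration_policy": "retain-latest-two-major\n",
}
print(validate_unit(job))
print(validate_unit({"name": "x", "type": "daemon"}))
```

Returning every problem at once, rather than failing on the first, gives a developer a complete list to fix in one pass.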

EOF

WE’RE HIRING!

timperrett

github.com/verizon