Architecting for the Cloud: Hoping for the Best, Prepared for the Worst

AWS Loft: Behind the scenes with Cotap Architecting for the Cloud: Hoping for the best, prepared for the worst.

Transcript of Architecting for the Cloud: Hoping for the Best, Prepared for the Worst



Infrastructure as Code


● Current state

● Past decisions

● Tracking the evolution

● CloudFormation

● Design -> JSON

● Version Control!
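
The "Design -> JSON" and "Version Control!" bullets could look roughly like the sketch below. It is an illustration only, not the tooling used in the talk: the logical resource name `AssetsBucket` and the helpers `build_template` and `render_template` are hypothetical. The template is built as a Python dictionary and serialized with sorted keys, so the JSON file committed to version control produces small, stable diffs.

```python
import json


def build_template() -> dict:
    """Return a minimal CloudFormation template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack (hypothetical example)",
        "Resources": {
            # "AssetsBucket" is a made-up logical name for this sketch.
            "AssetsBucket": {"Type": "AWS::S3::Bucket"},
        },
    }


def render_template(template: dict) -> str:
    """Serialize deterministically so diffs in version control stay small."""
    return json.dumps(template, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    print(render_template(build_template()), end="")
```

Rendering the same dictionary twice yields byte-identical output, which is what makes the committed file reviewable line by line.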


Rule #1

All changes have to be under Version Control.

Design for automation

● AutoScalingGroups

● Hardware: CloudFormation

● Software: Configuration management

● Cattle not Cats

Page 22: Architecting for the Cloud: Hoping for the Best, Prepared for the Worst

Rule #2

No instances should be launched manually.

Monitoring & Alerting

● Cost of

o Interruptions

o Waking somebody up

● Channels

● Self-healing infrastructure

● External monitoring

● Page only when critical


Situation                       Channel                 Page
Disk full 60%                   Chat, Email             ✗
Disk full 90%                   Chat, Email, PagerDuty  ✓
Chef not running for > 30m      Chat, Email             ✗
Redis not running for > 3 x 5s  Chat, Email, PagerDuty  ✓
ElasticSearch N-1               Chat, Email             ✗
ElasticSearch N-2               Chat, Email, PagerDuty  ✓
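
The table above reads as a routing policy: every alert goes to chat and email, and only the critical ones page. A minimal sketch of two of its rows follows. The thresholds come from the table; the function names, the `Alert` type, and the behaviour outside the listed cases (below 60% disk usage, or more than two missing nodes) are assumptions. The channel names are labels here, not real integrations.

```python
from dataclasses import dataclass
from typing import Tuple

NOTIFY = ("Chat", "Email")
PAGE = ("Chat", "Email", "PagerDuty")


@dataclass(frozen=True)
class Alert:
    channels: Tuple[str, ...]
    page: bool


def route_disk_usage(percent_full: float) -> Alert:
    """60% full notifies; 90% full also pages (thresholds from the table)."""
    if percent_full >= 90:
        return Alert(PAGE, True)
    if percent_full >= 60:
        return Alert(NOTIFY, False)
    # Assumption: below 60% nothing is sent.
    return Alert((), False)


def route_elasticsearch(nodes_expected: int, nodes_up: int) -> Alert:
    """One missing node (N-1) notifies; two missing nodes (N-2) also page."""
    missing = nodes_expected - nodes_up
    # Assumption: losing more than two nodes pages as well.
    if missing >= 2:
        return Alert(PAGE, True)
    if missing == 1:
        return Alert(NOTIFY, False)
    return Alert((), False)
```

Keeping the policy in one place makes it easy to review which situations are allowed to wake somebody up.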

Platform to fail

● Easy creation of temporary “Stacks”

● Branches can get their own hardware

● Clients can talk to a branch

● QA happens on Sandbox

● Exact copy of Production

● Scale up/down based on needs

● Different Region (us-east-1)
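
Giving each branch its own temporary stack needs a predictable stack name. The sketch below is one possible convention, not the one used in the talk: the `sandbox` prefix and the function `stack_name_for_branch` are hypothetical. It relies on the CloudFormation rule that stack names contain only letters, digits, and hyphens, start with a letter, and are at most 128 characters long.

```python
import re

MAX_STACK_NAME_LENGTH = 128  # CloudFormation stack name limit


def stack_name_for_branch(branch: str, prefix: str = "sandbox") -> str:
    """Derive a CloudFormation-safe stack name from a git branch name."""
    # Stack names allow only letters, digits, and hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a stack name from {branch!r}")
    # The alphabetic prefix guarantees the name starts with a letter.
    return f"{prefix}-{slug}"[:MAX_STACK_NAME_LENGTH].rstrip("-")
```

For example, `stack_name_for_branch("feature/new_login")` returns `"sandbox-feature-new-login"`.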

Rule #3

All changes have to go through Sandbox.

Rule #4

Production is just a more powerful Sandbox
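
One way to read Rule #4 is that both environments come from the same template and differ only in parameter values. The sketch below assumes that reading. The parameter names, instance types, and sizes are hypothetical; the `ParameterKey`/`ParameterValue` pairs match the shape the CloudFormation stack APIs accept.

```python
from typing import Dict, List

# Hypothetical parameter values; only the sizes differ between environments.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "sandbox": {"InstanceType": "t2.small", "MinSize": "1", "MaxSize": "2"},
    "production": {"InstanceType": "m3.large", "MinSize": "3", "MaxSize": "12"},
}


def stack_parameters(environment: str) -> List[Dict[str, str]]:
    """Return parameters in the list-of-pairs shape CloudFormation APIs accept."""
    values = ENVIRONMENTS[environment]
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(values.items())
    ]


def same_shape(a: str, b: str) -> bool:
    """Both environments must expose exactly the same parameter names."""
    return ENVIRONMENTS[a].keys() == ENVIRONMENTS[b].keys()
```

A check like `same_shape("sandbox", "production")` in a test suite catches the moment one environment gains a setting the other lacks.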

Disaster Recovery

● Multi-AZs

● Traffic routing

● Multi-Regions (S3 too)

● AutoScalingGroups Min:1 Max:1

● Off-site backups (VPN + Disks)

● RPO + RTO
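
The "AutoScalingGroups Min:1 Max:1" bullet describes a group that always holds exactly one instance, so a failed instance is replaced automatically. A minimal sketch of such a resource as a CloudFormation fragment built in Python follows. The helper name, the launch configuration name, and the zone names in the test data are hypothetical.

```python
from typing import Dict, List


def single_instance_group(launch_config: str, zones: List[str]) -> Dict:
    """Build an AutoScalingGroup resource pinned to exactly one instance.

    With MinSize and MaxSize both 1, the group replaces an instance that
    fails its health check and can launch the replacement in another zone.
    """
    if len(zones) < 2:
        raise ValueError("list at least two Availability Zones")
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "AvailabilityZones": zones,
            # Ref points at a launch configuration defined elsewhere in the template.
            "LaunchConfigurationName": {"Ref": launch_config},
            "MinSize": "1",
            "MaxSize": "1",
        },
    }
```

Requiring at least two zones in the helper is a design choice of this sketch: a single-zone group would still self-heal, but not across a zone outage.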

Security

● MFA

● Public key distribution

● Root key rotation

● Private/Public Subnets

● ACLs/Security Groups

● Update AMIs

● Trusted Advisor!

Scaling

● Preemptive

● Automatic

● Vertically

● Horizontally

● Bottlenecks

Cost Control

● Tags

o Role

o Environment

● Cost explorer

● Threshold alerting

● Share monthly

● Export to CSV

● Right-Scale (ASG)
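
Tagging by role and environment pays off once the costs are exported to CSV. The sketch below sums cost per tag value and flags totals above a threshold. The column names and sample rows are hypothetical, since a real billing export uses different headers, and the helper names are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def cost_by_tag(rows: Iterable[Dict[str, str]], tag: str) -> Dict[str, float]:
    """Sum the "cost" column per value of the given tag column."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        # Rows with an empty tag are grouped so untagged spend stays visible.
        totals[row.get(tag) or "untagged"] += float(row["cost"])
    return dict(totals)


def over_threshold(totals: Dict[str, float], limit: float) -> List[str]:
    """Return the tag values whose total exceeds the limit, sorted by name."""
    return sorted(name for name, total in totals.items() if total > limit)


# Hypothetical export; a real billing CSV uses different column names.
SAMPLE = """role,environment,cost
web,production,120.50
web,sandbox,15.25
db,production,300.00
,sandbox,4.75
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(over_threshold(cost_by_tag(rows, "environment"), 100.0))
```

Grouping untagged rows under their own label is deliberate: spend that nobody has claimed is usually the first thing worth reviewing in the monthly report.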

4 rules of 5 nines.

● All changes have to be under Version Control

● No instance should be launched manually

● All changes are deployed to Sandbox first

● Production is just a more powerful Sandbox

Questions?

t: @martincozzi

e: [email protected]

engineering.cotap.com