Gov 2.0: Scaling, Automation, & Management in the Cloud

Page 1: Gov 2.0: Scaling, Automation, & Management in the Cloud

Copyright © 2010 Opscode, Inc - All Rights Reserved

Speaker:

[email protected]‣ @jesserobbins‣ www.opscode.com

Jesse Robbins CEO

Scaling in the Cloud

Page 2: Gov 2.0: Scaling, Automation, & Management in the Cloud

Copyright © 2010 Opscode, Inc. – Confidential – Do Not Redistribute

Opscode makes a new kind of Infrastructure Automation, offered as a hosted Service.

Page 11: Gov 2.0: Scaling, Automation, & Management in the Cloud

For Developers...

• Do it yourself.

• The infrastructure is the application (and vice versa).

• You are not a Systems Administrator.

• You need tools.

Page 16: Gov 2.0: Scaling, Automation, & Management in the Cloud

Sysadmins...

• Say “Yes”.

• You never liked rack and stack that much anyway.

• You have never been more critical.

• Lean into it.

http://covers.oreilly.com/images/9780596007836/lrg.jpg

Lean into it appears courtesy of Cliff Moon, of Dynomite fame: http://twitter.com/moonpolysoft

Page 21: Gov 2.0: Scaling, Automation, & Management in the Cloud

Executives...

• Not a magic unicorn

• Benefits come from efficiency, not raw Capex

• Has real cultural implications at every level

• You are the biggest asset to success

Page 23: Gov 2.0: Scaling, Automation, & Management in the Cloud

[Figure: two charts, “Traditional” Operations and Operations - The “Secret Sauce”, each plotting # of Hours and Servers against Week #. Legend: Upkeep, Config, OS Install, Hardware; Existing, New. Source: http://radar.oreilly.com/archives/2007/10/operations-advantage.html]

This is the secret of Cloud Computing.

Every other virtue stems from here.

Page 26: Gov 2.0: Scaling, Automation, & Management in the Cloud

You are 10% Unique

And it’s probably the things you did wrong

Page 32: Gov 2.0: Scaling, Automation, & Management in the Cloud

Infrastructure is Hard

1999: Inventory, packaged file transfers and desktops

2005: Unattended bare metal servers “very very” hard. 7k nodes took 5 days with 90% success

2007: Unattended bare metal in under 10 minutes. Fully configured in under 3 mins

2008: Unattended server in 2 minutes. 5000 servers in a week

2010: 10k nodes in under 5 minutes

Page 39: Gov 2.0: Scaling, Automation, & Management in the Cloud

Infrastructure is changing

‣ Easier to get (good!)...but harder to manage (bad!)

‣ Demand is dynamic

‣ Developers are crucial to Operations

‣ Web / Cloud services are proliferating...and Enterprise is following along.

‣ Manual configuration no longer a crutch

‣ Few tools to solve a ubiquitous problem

Page 40: Gov 2.0: Scaling, Automation, & Management in the Cloud

Managing Infrastructure Is Hard. Has Always Been.

Proprietary Solutions (timeline: 1980, 1989, 1999, 2001)

Previous Attempts Typically...

• Solve very little of the problem...

• Reach just a handful of large, enterprise customers

• Require custom implementations with large professional services bills

• Deployed exclusively on-premise

• Acquired by companies with large consulting organizations (IBM, HP, CA)

Page 41: Gov 2.0: Scaling, Automation, & Management in the Cloud

Google, Amazon, Microsoft built their own tools

Page 42: Gov 2.0: Scaling, Automation, & Management in the Cloud

but it’s “secret sauce”

Page 43: Gov 2.0: Scaling, Automation, & Management in the Cloud

everyone else is here

... inexperienced & poorly equipped for the world they must now operate in.

Page 44: Gov 2.0: Scaling, Automation, & Management in the Cloud

“Cloud”

Page 48: Gov 2.0: Scaling, Automation, & Management in the Cloud

Cloud = Web = Internet = Useless

Alistair’s mom’s definition

Page 55: Gov 2.0: Scaling, Automation, & Management in the Cloud

Managed hosting

Virtualization

Private Public

SaaS

PaaS PaaS

IaaS IaaS

If you want to talk clouds, pick one first.

Slide courtesy Alistair Croll - [email protected]

Page 56: Gov 2.0: Scaling, Automation, & Management in the Cloud

Infrastructure as a Service (IaaS)

Amazon EC2, Rackspace Cloud, Terremark, GoGrid, Joyent (and nearly every private cloud built on XenServer or VMware).

Slide courtesy Alistair Croll - [email protected]

Page 57: Gov 2.0: Scaling, Automation, & Management in the Cloud

Dedicated hardware

On-premise private clouds

Virtual private clouds

Third-party public clouds

Slide courtesy Alistair Croll - [email protected]

Page 58: Gov 2.0: Scaling, Automation, & Management in the Cloud

Slide courtesy Alistair Croll - [email protected]

Page 64: Gov 2.0: Scaling, Automation, & Management in the Cloud

Always on premise

• Private

• Compliance-enforced

• Need to track and audit

• Legislative

• Data near local computation

Can be done anywhere

• Testing

• Training

• Prototyping

• Batch processing

• Seasonal load

Always in cloud

• Partner access

• Proximity to cloud services (storage, CDN, etc.)

• Massively grid/parallel (genomic, modelling)

Load/pricing engine

Policy engine

Virtual machine (infrastructure cloud)

Slide courtesy Alistair Croll - [email protected]

Page 65: Gov 2.0: Scaling, Automation, & Management in the Cloud

Compute task (service cloud)

Slide courtesy Alistair Croll - [email protected]
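
The three buckets on these slides (always on premise, can be done anywhere, always in cloud) read like inputs to a placement rule. The sketch below is one hypothetical way to express such a rule. The `Workload` fields and the `place` function are names invented for illustration, and the precedence (on-premise constraints win over cloud preferences) is an assumption, not something the slides state.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are illustrative only."""
    name: str
    compliance_enforced: bool = False
    needs_audit_trail: bool = False
    data_is_local: bool = False
    needs_partner_access: bool = False
    massively_parallel: bool = False


def place(w: Workload) -> str:
    """Return 'on-premise', 'cloud', or 'anywhere' for a workload."""
    # Assumption: hard constraints (compliance, audit, data locality) take priority.
    if w.compliance_enforced or w.needs_audit_trail or w.data_is_local:
        return "on-premise"
    # Workloads that benefit from cloud reach or scale go to the cloud.
    if w.needs_partner_access or w.massively_parallel:
        return "cloud"
    # Testing, training, prototyping, batch, and seasonal load can run anywhere.
    return "anywhere"


if __name__ == "__main__":
    for w in (
        Workload("payroll", compliance_enforced=True),
        Workload("genomics", massively_parallel=True),
        Workload("load-test"),
    ):
        print(w.name, "->", place(w))
```

A real policy engine would also weigh load and pricing, which this sketch leaves out.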

Page 66: Gov 2.0: Scaling, Automation, & Management in the Cloud

Automation

Page 67: Gov 2.0: Scaling, Automation, & Management in the Cloud

Bootstrapping

Page 68: Gov 2.0: Scaling, Automation, & Management in the Cloud

Bootstrapping Approaches

Corp Approvals (Time: 6-8 weeks)

Good: Known Costs, No Variation. Anything you want, as long as IT pre-approved it.

Bad: High Waste (Hoarding). Red Tape. Expensive ($/Time). Long lead time.

Agile Corp Approvals (Time: 2-4 weeks)

Good: Known Costs. Total Hardware Control. Trivial Approvals.

Bad: Lower Waste. Less Red Tape. Still slow. Expensive ($/Time). Shorter lead time.

Cloud (Time: 5-10 minutes)

Good: Variable Costs. Highly Adaptable. Minimal lead time. Trivial approvals. No humans needed.

Bad: Variable Costs. No control over hardware. Must re-train.

Page 69: Gov 2.0: Scaling, Automation, & Management in the Cloud

curl -O http://brainspl.at/velocity.sh && sh velocity.sh

Configuration

Page 70: Gov 2.0: Scaling, Automation, & Management in the Cloud

Configuration Approaches

Manual

Good: You can do anything. Results in an intimate knowledge of the details.

Bad: Slow. Error Prone (Bus Error!). Non-repeatable. Difficult knowledge transfer.

Ad-Hoc

Good: More repeatable. Knowledge is dispersed. Built your way, with your model.

Bad: Rarely idempotent. Hard to collaborate. Brittle. No API.

Infrastructure as Code

Good: Repeatable. Idempotent. Agile. Sharable. Self documenting.

Bad: Have to learn how to use it. Hard things remain hard. Not magic. (Yet!)
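
"Idempotent" is the property that separates Infrastructure as Code from ad-hoc scripts: running the same step twice leaves the system in the same state as running it once. A minimal sketch of the idea, assuming a hypothetical helper `ensure_line` and an example config file (neither comes from the talk, and this is not how Chef itself is implemented):

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Ensure `line` appears in the file at `path`; return True only if the file changed."""
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if line in content.splitlines():
        return False  # desired state already holds, so do nothing
    with open(path, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # avoid gluing the new line onto an unterminated last line
        f.write(line + "\n")
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.conf")
    print(ensure_line(path, "max_connections = 100"))  # first run changes the file
    print(ensure_line(path, "max_connections = 100"))  # second run is a no-op
```

An ad-hoc script that blindly appends the line would add a duplicate on every run. Describing the desired state and checking it first is what makes the step safe to repeat.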

Page 71: Gov 2.0: Scaling, Automation, & Management in the Cloud

Command and Control

Page 72: Gov 2.0: Scaling, Automation, & Management in the Cloud

Command and Control

Meatcloud*

Good: Super flexible. Can do almost anything. Always easy to find someone to blame. Free will.

Bad: Error Prone. Slow. Expensive to Scale. Not repeatable. Free will.

Ad-Hoc

Good: More repeatable. Easier to scale. Less error prone (hopefully!)

Bad: One-off by necessity. Tooling sprawl. Hard to share solutions. Much higher learning curve.

Framework

Good: One system to learn. Scales well. Paint by numbers. Repeatable. Two-Way.

Bad: Not everything maps cleanly. Trades depth of knowledge for ease of use.

*Meatcloud appears in this presentation courtesy of Andrew Shafer - http://is.gd/Ega

Page 75: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

[Diagram: Webservers (two groups) and Database Servers, with three nodes marked X (DOOM). Labels: Configuration, Bootstrapping, Command & Control, Monitoring, System Updates, Signals, Moar!, Provisions.]

Page 76: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

Monitoring signals the Nanite /node/down service

Page 77: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

Nanite boots new EC2 Instances, with Chef Role + Attribute

Nanite removes nodes in Chef

Page 78: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

Provisions Instances, EBS, Elastic IPs

Page 79: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

Chef configures nodes according to assigned

Page 80: Gov 2.0: Scaling, Automation, & Management in the Cloud

Lightning Strikes!

Chef updates the monitoring system
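
The "Lightning Strikes!" sequence can be sketched as a small self-healing loop: a node-down signal arrives, the dead node is removed from the node list, a replacement is booted with the same role, and monitoring is updated. The code below is an in-memory simulation only. `Cluster`, `provision`, and `handle_node_down` are hypothetical names, and no real EC2, Chef, or Nanite API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory stand-in for the cloud API, the Chef node list, and monitoring."""
    nodes: dict = field(default_factory=dict)    # node name -> role
    monitored: set = field(default_factory=set)  # node names being watched
    counter: int = 0

    def provision(self, role: str) -> str:
        """Pretend to boot an instance, assign it `role`, and start monitoring it."""
        self.counter += 1
        name = f"{role}-{self.counter}"
        self.nodes[name] = role
        self.monitored.add(name)
        return name


def handle_node_down(cluster: Cluster, name: str) -> str:
    """React to a node-down signal by replacing the node with one of the same role."""
    role = cluster.nodes.pop(name)      # remove the dead node from the node list
    cluster.monitored.discard(name)     # stop watching the dead node
    return cluster.provision(role)      # boot and register a replacement


if __name__ == "__main__":
    cluster = Cluster()
    web = cluster.provision("webserver")
    cluster.provision("database")
    replacement = handle_node_down(cluster, web)
    print(replacement, sorted(cluster.nodes))
```

The point of the sketch is that every step is a function call rather than a ticket, which is what lets the recovery run unattended.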

Page 81: Gov 2.0: Scaling, Automation, & Management in the Cloud

A word about Scaling...

Page 82: Gov 2.0: Scaling, Automation, & Management in the Cloud

Typical Peak Load

Graphs in this portion of the presentation taken from Theo Schlossnagle: http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes

1. Bring on capacity as traffic ramps up

2. Take down capacity as it ramps down

3. 10-15 minutes on either side, fully unattended

Page 83: Gov 2.0: Scaling, Automation, & Management in the Cloud

Atypical Load

1. Hope you know it is coming.

2. Increase capacity in advance.

3. Take down capacity as it ramps down.

Graphs in this portion of the presentation taken from Theo Schlossnagle: http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes

No way around Capacity Planning.

However, you are still better off!

Page 84: Gov 2.0: Scaling, Automation, & Management in the Cloud

Capacity Planning is king.

http://www.flickr.com/photos/allspaw/2095439645/sizes/l/

Page 88: Gov 2.0: Scaling, Automation, & Management in the Cloud

Have a queue?

Does it scale linearly with more resources?

Congratulations - you can auto-scale!
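
The two questions above translate into simple arithmetic: if work sits in a queue and throughput grows linearly with workers, the worker count follows directly from the queue depth. A rough sketch, where `desired_workers` and all of its parameters are assumptions made for illustration:

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker_per_min: float,
                    drain_target_min: float, min_workers: int = 1,
                    max_workers: int = 50) -> int:
    """Workers needed to drain the queue within the target time.

    Assumes throughput scales linearly with the number of workers.
    """
    capacity_per_worker = jobs_per_worker_per_min * drain_target_min
    needed = math.ceil(queue_depth / capacity_per_worker)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # 1200 queued jobs, 10 jobs per worker per minute, drain within 10 minutes.
    print(desired_workers(1200, 10, 10))
```

If the workload does not scale linearly (for example, every worker contends for the same database), this formula overestimates the benefit of adding workers, which is why the slide asks the question first.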

Page 90: Gov 2.0: Scaling, Automation, & Management in the Cloud

CAP Theorem

• Consistency

• Availability

• Partition Tolerance

Pick Two

Page 91: Gov 2.0: Scaling, Automation, & Management in the Cloud

Most SQL Databases

• Choose Consistency over all

• Availability comes distant second

Page 92: Gov 2.0: Scaling, Automation, & Management in the Cloud

Web Applications need...

• Availability

• Partition Tolerance

Page 93: Gov 2.0: Scaling, Automation, & Management in the Cloud

“Global temporal consistency is a fiction”

Christopher Brown

Page 94: Gov 2.0: Scaling, Automation, & Management in the Cloud

Choosing Consistency for your Web App...

Means failure is global

Page 95: Gov 2.0: Scaling, Automation, & Management in the Cloud

When you choose Partition Tolerance and

Availability...

You fail or succeed for a subset of users
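
One way to picture "fail or succeed for a subset of users" is a data store split into independent shards: when one shard is unreachable, only the users mapped to it are affected. The sketch below illustrates that blast radius. It is not a CAP proof, and `shard_for` and `affected_users` are hypothetical helpers using a simple CRC32 hash as the (assumed) placement scheme.

```python
import zlib


def shard_for(user: str, shard_count: int) -> int:
    """Map a user to a shard with a stable hash (CRC32 modulo shard count)."""
    return zlib.crc32(user.encode("utf-8")) % shard_count


def affected_users(users, shard_count, down_shards):
    """Return the users whose shard is currently unreachable."""
    return [u for u in users if shard_for(u, shard_count) in down_shards]


if __name__ == "__main__":
    users = [f"user{i}" for i in range(20)]
    # A single store behaves like one shard: when it is down, everyone fails.
    print(len(affected_users(users, 1, {0})), "of", len(users), "fail with one store")
    # With four independent shards, losing shard 0 only hits users mapped to it.
    print(len(affected_users(users, 4, {0})), "of", len(users), "fail with one of four shards down")
```

The single-store case corresponds to the previous slide, where failure is global.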

Page 96: Gov 2.0: Scaling, Automation, & Management in the Cloud

Apologies

• Apologize after the fact for failures

• Better than nothing at all

Page 97: Gov 2.0: Scaling, Automation, & Management in the Cloud

NoSQL

• Many different tools

• They tweak CAP differently

• CouchDB

• Cassandra

• Redis

• MongoDB
