Gov 2.0: Scaling, Automation, & Management in the Cloud

Post on 15-Jan-2015

Copyright © 2010 Opscode, Inc - All Rights Reserved

Speaker: Jesse Robbins, CEO

‣ jesse@opscode.com

‣ @jesserobbins

‣ www.opscode.com

Scaling in the Cloud

Copyright © 2010 Opscode, Inc. – Confidential – Do Not Redistribute

Opscode makes a new kind of Infrastructure Automation, offered as a hosted Service.

For Developers...

• Do it yourself.

• The infrastructure is the application (and vice versa).

• You are not a Systems Administrator.

• You need tools.

Sysadmins...

• Say “Yes”.

• You never liked rack and stack that much anyway.

• You have never been more critical.

• Lean into it.

http://covers.oreilly.com/images/9780596007836/lrg.jpg

Lean into it appears courtesy of Cliff Moon, of Dynomite fame: http://twitter.com/moonpolysoft

Executives...

• Not a magic unicorn

• Benefits come from efficiency, not raw Capex

• Has real cultural implications at every level

• You are the biggest asset to success

[Figure: two charts, “Traditional” Operations and Operations - The “Secret Sauce”. Each plots # of Hours (legend: Upkeep, Config, OS Install, Hardware) and Servers (legend: Existing, New) against Week #.]

(http://radar.oreilly.com/archives/2007/10/operations-advantage.html)

This is the secret of Cloud Computing.

Every other virtue stems from here.

You are 10% Unique

And it’s probably the things you did wrong

Infrastructure is Hard

‣ 1999: Inventory, packaged file transfers and desktops

‣ 2005: Unattended bare metal servers “very very” hard. 7k nodes took 5 days w/ 90% success

‣ 2007: Unattended bare metal in under 10 minutes. Fully configured in under 3 mins

‣ 2008: Unattended server in 2 minutes. 5000 servers in a week

‣ 2010: 10k nodes in under 5 minutes

Infrastructure is changing

‣ Easier to get (good!)...but harder to manage (bad!)

‣ Demand is dynamic

‣ Developers are crucial to Operations

‣ Web / Cloud services are proliferating...and Enterprise is following along.

‣ Manual configuration no longer a crutch

‣ Few tools to solve a ubiquitous problem

Managing Infrastructure Is Hard (Has Always Been)

1980

1989

1999

2001

Proprietary Solutions

Previous Attempts Typically...

• Solve very little of the problem...

• Reach just a handful of large, enterprise customers

• Require custom implementations with large professional services bills

• Deployed exclusively on-premise

• Acquired by companies with large consulting organizations (IBM, HP, CA)

Google, Amazon, Microsoft built their own tools

but it’s “secret sauce”

everyone else is here

... inexperienced & poorly equipped for the world they must now operate in.

“Cloud”

Cloud = Web = Internet = Useless

Alistair’s mom’s definition

Slide courtesy Alistair Croll - alistair@rednod.com

Managed hosting

Virtualization

Private Public

SaaS

PaaS PaaS

IaaS IaaS

If you want to talk clouds, pick one first.

Slide courtesy Alistair Croll - alistair@rednod.com

Infrastructure as a Service (IaaS)

Amazon EC2, Rackspace Cloud, Terremark, GoGrid, Joyent (and nearly every private cloud built on XenServer or VMware).

Slide courtesy Alistair Croll - alistair@rednod.com

Dedicated hardware

On-premise private clouds

Virtual private clouds

Third-party public clouds

Slide courtesy Alistair Croll - alistair@rednod.com

Always on premise

• Private

• Compliance-enforced

• Need to track and audit

• Legislative

• Data near local computation

Can be done anywhere

• Testing

• Training

• Prototyping

• Batch processing

• Seasonal load

Always in cloud

• Partner access

• Proximity to cloud services (storage, CDN, etc.)

• Massively grid/parallel (genomic, modelling)

Load/pricing engine

Policy engine

Virtual machine (infrastructure cloud)

Compute task (service cloud)

Slide courtesy Alistair Croll - alistair@rednod.com

Automation

Bootstrapping

Bootstrapping Approaches

Corp Approvals

Good: Known Costs, No Variation. Anything you want, as long as IT pre-approved it.

Bad: High Waste (Hoarding). Red Tape. Expensive ($/Time). Long lead time.

Time: 6-8w

Agile Corp Approvals

Good: Known Costs. Total Hardware Control. Trivial Approvals.

Bad: Lower Waste. Less Red Tape. Still slow. Expensive ($/Time). Shorter lead time.

Time: 2-4w

Cloud

Good: Variable Costs. Highly Adaptable. Minimal lead time. Trivial approvals. No humans needed.

Bad: Variable Costs. No control over hardware. Must re-train.

Time: 5-10m

curl -O http://brainspl.at/velocity.sh && sh velocity.sh

Configuration

Configuration Approaches

Manual

Good: You can do anything. Results in an intimate knowledge of the details.

Bad: Slow. Error Prone (Bus Error!). Non-repeatable. Difficult knowledge transfer.

Ad-Hoc

Good: More repeatable. Knowledge is dispersed. Built your way, with your model.

Bad: Rarely idempotent. Hard to collaborate. Brittle. No API.

Infrastructure as Code

Good: Repeatable. Idempotent. Agile. Sharable. Self documenting.

Bad: Have to learn how to use it. Hard things remain hard. Not magic. (Yet!)
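To make "idempotent" concrete, here is a minimal Python sketch of one idempotent configuration step. It is an illustration only, not how Chef or any other tool implements it: the helper name `ensure_line` and the example setting are assumptions made up for this sketch. The step describes a desired state (a line is present in a file) and changes nothing when that state already holds.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Hypothetical helper for illustration. Returns True if the file was
    changed and False if it already contained the line.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # desired state already holds, so do nothing
    # Avoid gluing the new line onto an unterminated last line.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(separator + line + "\n")
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.conf")
    print(ensure_line(target, "vm.swappiness = 10"))  # first run changes the file
    print(ensure_line(target, "vm.swappiness = 10"))  # second run is a no-op
```

Running the step any number of times leaves the file in the same state, which is what makes it safe to apply repeatedly across many nodes.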

Command and Control

Command and Control Approaches

Meatcloud*

Good: Super flexible. Can do almost anything. Always easy to find someone to blame. Free will.

Bad: Error Prone. Slow. Expensive to Scale. Not repeatable. Free will.

Ad-Hoc

Good: More repeatable. Easier to scale. Less error prone (hopefully!)

Bad: One-off by necessity. Tooling sprawl. Hard to share solutions. Much higher learning curve.

Framework

Good: One system to learn. Scales well. Paint by numbers. Repeatable. Two-Way.

Bad: Not everything maps cleanly. Trades depth of knowledge for ease of use.

*Meatcloud appears in this presentation courtesy of Andrew Shafer - http://is.gd/Ega

Lightning Strikes!

[Diagram: two groups of Webservers and one group of Database Servers; three nodes are struck out (X X X). Labels: Configuration, Bootstrapping, Command & Control, Monitoring, System Updates, Signals, Provisions, Moar!, DOOM]

‣ Monitoring signals the Nanite /node/down service

‣ Nanite boots new EC2 instances, with Chef Role + Attribute

‣ Nanite removes nodes in Chef

‣ Provisions: instances, EBS, Elastic IPs

‣ Chef configures nodes according to assigned

‣ Chef updates the monitoring system
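The recovery flow on these slides can be sketched as a small in-memory simulation in Python. Everything here is hypothetical: `Cluster`, `add_node` and `handle_node_down` are names invented for the sketch and do not correspond to any Chef, Nanite or EC2 API. The point is only the shape of the loop: a node-down signal removes the node from the registry, a replacement with the same role is booted, and monitoring is updated.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the node registry plus the monitoring system."""

    nodes: dict = field(default_factory=dict)    # node name -> role
    monitored: set = field(default_factory=set)  # node names being watched
    counter: int = 0

    def add_node(self, role: str) -> str:
        """Register a new node under a role and start monitoring it."""
        self.counter += 1
        name = f"{role}-{self.counter}"
        self.nodes[name] = role    # stands in for booting and configuring by role
        self.monitored.add(name)   # stands in for updating the monitoring system
        return name

    def handle_node_down(self, name: str) -> str:
        """React to a node-down signal: deregister, then replace like for like."""
        role = self.nodes.pop(name)   # remove the dead node from the registry
        self.monitored.discard(name)
        return self.add_node(role)


if __name__ == "__main__":
    cluster = Cluster()
    web = cluster.add_node("webserver")
    cluster.add_node("database")
    replacement = cluster.handle_node_down(web)
    print(replacement, sorted(cluster.nodes))
```

Because the replacement is derived from the failed node's role rather than from a hand-written runbook, no human needs to be in the loop for this kind of failure.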

A word about Scaling...

Typical Peak Load

Graphs in this portion of the presentation taken from Theo Schlossnagle: http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes

1. Bring on capacity as traffic ramps up

2. Take down capacity as it ramps down

3. 10-15 minutes on either side, fully unattended

Atypical Load

1. Hope you know it is coming.

2. Increase capacity in advance.

3. Take down capacity as it ramps down.

No way around Capacity Planning

However, you are still better off!

Capacity Planning is king.

http://www.flickr.com/photos/allspaw/2095439645/sizes/l/

Have a queue?

Does it scale linearly with more resources?

Congratulations - you can auto-scale!
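A minimal Python sketch of what queue-driven auto-scaling could look like, assuming throughput really does grow linearly with the number of workers. The function name, its parameters and the default bounds are all assumptions chosen for illustration; the slides do not specify a formula.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker_per_min: float,
                    target_drain_minutes: float,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Workers needed to drain the current backlog within the target time.

    Hypothetical sizing rule: relies on linear scaling, so the workers
    needed are backlog / (per-worker rate * time allowed), rounded up.
    """
    if jobs_per_worker_per_min <= 0 or target_drain_minutes <= 0:
        raise ValueError("rate and drain target must be positive")
    needed = math.ceil(queue_depth / (jobs_per_worker_per_min * target_drain_minutes))
    # Clamp so an empty queue keeps a floor and a spike cannot run away.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    print(desired_workers(queue_depth=1200, jobs_per_worker_per_min=20,
                          target_drain_minutes=10))
```

The upper bound is where capacity planning comes back in: auto-scaling only helps up to the limit someone has budgeted for.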

CAP Theorem

• Consistency

• Availability

• Partition Tolerance

Pick Two

Most SQL Databases

• Choose Consistency over all

• Availability comes distant second

Web Applications need...

• Availability

• Partition Tolerance

“Global temporal consistency is a fiction”

Christopher Brown

Choosing Consistency for your Web App...

Means failure is global

When you choose Partition Tolerance and Availability...

You fail or succeed for a subset of users
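One way to picture "a subset of users" is to partition users across independent shards, so that losing one shard only affects the users mapped to it. The Python sketch below is an assumed illustration, not something taken from the slides: the hashing scheme (CRC32 modulo shard count) and both function names are hypothetical.

```python
import zlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user to a shard with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


def affected_users(user_ids, shard_count: int, failed_shards: set) -> list:
    """Users whose requests would fail while the given shards are unavailable."""
    return [u for u in user_ids if shard_for(u, shard_count) in failed_shards]


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    down = affected_users(users, shard_count=4, failed_shards={2})
    print(f"{len(down)} of {len(users)} users affected")
```

With a single globally consistent store, the same outage would instead fail every request.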

Apologies

• Apologize after the fact for failures

• Better than nothing at all

NoSQL

• Many different tools

• They tweak CAP differently

• CouchDB

• Cassandra

• Redis

• MongoDB
