Capacity Management and the Cloud

Danny Quilton, COO, Capacitas Capacity Management and the Cloud

description

Danny Quilton from Capacitas presented a paper, ‘Capacity Management and the Cloud’. The presentation made the case for capacity management of cloud-based services, highlighting the critical role of capacity management in controlling cloud cost. The presentation referenced a number of client engagement case studies to debunk some of the myths surrounding cloud:

• Capacity can be turned up instantaneously

• Capacity planning discipline is no longer required

• Cloud capacity is cheap

• Bottlenecks can be alleviated by expanding cloud capacity

• Capacity management can be delegated to the cloud provider

• Performance is guaranteed by the cloud provider

Transcript of Capacity Management and the Cloud

Page 1: Capacity Management and the Cloud

Danny Quilton, COO, Capacitas

Page 2: Capacity Management and the Cloud

itSMF UK Conference 2012

Capacity Management and the Cloud

With the advent of cloud computing, is capacity management still required?

© Capacitas 2002-2012 2

Page 3: Capacity Management and the Cloud

Capacity planning no longer required?

When the Associated Press (AP) wanted the flexibility for application hosting and cloud data storage in the cloud, they turned to the Windows Azure platform from Microsoft.

"Capacity planning is the thing that stands out as the biggest advantage of the Microsoft cloud model. The Windows Azure platform takes that out of the equation for us, unlike the other cloud providers." - Jonathan Malek, Chief Architect and Director of Research, Associated Press

See how Windows Azure helped the AP develop a new global API through easy scalability that removed the need for costly and time-consuming capacity planning.

Microsoft.com

Page 4: Capacity Management and the Cloud

Agenda

• Capacity management defined

• Flawed assumptions

• Case studies

• Summary

Page 5: Capacity Management and the Cloud

What is Capacity and Performance Management?

Supply: ICT capacity

Demand: Business demand

Page 6: Capacity Management and the Cloud

What is Capacity and Performance Management?

Cost:

• Third-party provider operational costs

• Capital cost of ICT

• Operational cost of ICT

Level of Service:

• Service availability

• Business throughput

• Service response time

Page 7: Capacity Management and the Cloud

A Risk-based Approach

Capacity Management is required:

• Business-critical service

• Significant business growth

• Extraordinary business peaks

• High level of service demanded by the business

• Long lead times associated with capacity upgrades

• Highly competitive market

• Requirement to manage ICT costs

• High likelihood of a merger or acquisition

Page 8: Capacity Management and the Cloud

Flawed Assumptions

• Capacity can be turned up instantaneously

• Capacity planning discipline is no longer required

• Cloud capacity is cheap

• Bottlenecks can be alleviated by expanding cloud capacity

• Capacity management can be delegated to the cloud provider

• Performance is guaranteed by the cloud provider

Page 9: Capacity Management and the Cloud

Capacity can be turned up instantly?

Public Cloud: cloud instance may be brought up rapidly

Private Cloud: our experience is that this is of the order of weeks

Page 10: Capacity Management and the Cloud

Capacity planning discipline is no longer required?

How many servers? How much will it cost?

[Figure: two panels, Cloud and Physical, each plotting Demand, Waste and Capacity against Time]
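The contrast between the two panels can be sketched numerically. This is a minimal illustration only: the demand series, the step sizes and the function names are all hypothetical and do not come from the presentation. It assumes physical capacity is bought in large steps and cannot be handed back, while cloud capacity is added and released in small increments.

```python
import math

def provisioned(demand, step, can_release):
    """Capacity per period when it can only be acquired in multiples of `step`."""
    capacity = []
    current = 0
    for d in demand:
        needed = math.ceil(d / step) * step
        # Physical kit stays on the floor once bought; cloud capacity can shrink.
        current = needed if can_release else max(current, needed)
        capacity.append(current)
    return capacity

def waste(demand, capacity):
    """Total unused capacity summed over all periods."""
    return sum(c - d for d, c in zip(demand, capacity))

# Hypothetical demand per month, in arbitrary units of work.
demand = [40, 55, 70, 65, 90, 120]

physical = provisioned(demand, step=100, can_release=False)
cloud = provisioned(demand, step=10, can_release=True)

print("physical waste:", waste(demand, physical))
print("cloud waste:", waste(demand, cloud))
```

Even in the cloud case the spend follows the demand curve, so a demand forecast is still needed to answer "how much will it cost?".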

Page 11: Capacity Management and the Cloud

The key questions

• Understand the application

• Model future user demand

• Model utilisation

• Understand acceptable utilisation thresholds

• Plan how many servers to buy and when

How many servers?

[Figure: Physical panel plotting Demand, Waste and Capacity against Time]
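The steps listed on this slide can be tied together in a small sizing calculation. The sketch below is an assumption-laden illustration, not the presenter's method: the forecast figures, the per-server capacity, the 70% utilisation threshold and the function name are all hypothetical.

```python
import math

def servers_required(peak_demand, capacity_per_server, max_utilisation):
    """Servers needed so that peak demand keeps each server at or below max_utilisation."""
    usable = capacity_per_server * max_utilisation
    return math.ceil(peak_demand / usable)

# Hypothetical forecast of peak transactions per second, by quarter.
forecast = {"Q1": 180, "Q2": 240, "Q3": 310, "Q4": 520}

# Hypothetical: one server sustains 50 transactions per second at 100% utilisation.
plan = {quarter: servers_required(peak, capacity_per_server=50, max_utilisation=0.7)
        for quarter, peak in forecast.items()}
print(plan)
```

The per-server capacity comes from understanding and testing the application, the forecast from demand modelling, and the threshold from what the business accepts as safe headroom.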

Page 12: Capacity Management and the Cloud

The key questions

• Understand the application

• Model future user demand

• Model utilisation

• Understand acceptable utilisation thresholds

• Plan how much to spend and when

How much will it cost?

[Figure: Cloud panel plotting Demand, Waste and Capacity against Time]

Page 13: Capacity Management and the Cloud

Capacity planning discipline is no longer required?

• Consider an ecommerce service

• Variable service demand:

• Seasonality

• Promotions

• What will future demand look like?

[Figure: Historical Service Demand for an e-commerce Service. Y axis: Daily Purchases. X axis: monthly dates from 26/03/2004 to 26/06/2009. Series: Actual Daily Purchases; Trend 180day; Linear (Trend 180day), y = 12.088x - 436266, R² = 0.9436]
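The linear trend and R² quoted for the chart are the kind of output a least-squares fit produces. The sketch below shows how such a fit is computed; the data points and the function name are hypothetical, and it makes no attempt to reproduce the slide's own series or coefficients.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical smoothed daily purchases on five consecutive days.
days = [0, 1, 2, 3, 4]
purchases = [100, 112, 125, 135, 150]

slope, intercept, r2 = fit_line(days, purchases)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.4f}")
```

A straight line captures only the underlying growth; seasonality and promotions, which the slide lists as sources of variable demand, have to be modelled on top of it.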

Page 14: Capacity Management and the Cloud

Capacity planning discipline is no longer required?

• Demand planning must still be undertaken

[Figure: Forecast Service Demand for an e-commerce Service. X axis: quarterly dates from 01/01/2006 to 01/01/2012. Series: Actual Daily Purchases; Forecast Daily Purchases]

Page 15: Capacity Management and the Cloud

Capacity planning discipline is no longer required?

• Still require

• Performance testing

• Performance tuning

• Otherwise we have proliferation of capacity in the cloud

• Here we have a 3-fold increase in opex

Non-tuned application: server instances 1 to 6

Tuned application: server instances 1 and 2

Page 16: Capacity Management and the Cloud

Case Study 1: Cloud Capacity is Cheap?

• E-commerce service on owned, physical infrastructure

• Proof of concept to assess capacity required on a private cloud

Infrastructure | Processing Capacity per Instance | Number of Instances

Owned infrastructure | 8-core | 32

Private Cloud Proposal A | 4-core | ?

Private Cloud Proposal B | 8-core | ?

Page 17: Capacity Management and the Cloud

Case Study 1: Cloud Capacity is Cheap?

• Testing carried out to establish the relative capacity of the current and proposed architectures

• Tests against key transactions in the e-commerce application

• Profound implications for the business case

Infrastructure | CPU processing time per key transaction (relative to owned infrastructure; per core) | Number of Instances Required

Owned infrastructure | 1.0 | 32

Private Cloud Proposal A | 1.4 | 90

Private Cloud Proposal B | 2.0 | 64
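The instance counts in the table are consistent with a simple scaling calculation, sketched below. The formula is a reconstruction that happens to match the slide's figures, not something the presentation states, and it assumes throughput scales linearly with core count; the function name is hypothetical.

```python
import math

def instances_required(base_instances, base_cores, relative_cpu_time, target_cores):
    """Scale the owned-infrastructure core count by the relative CPU time per
    transaction, then divide by the cores offered per cloud instance."""
    core_equivalents = base_instances * base_cores * relative_cpu_time
    return math.ceil(core_equivalents / target_cores)

# Owned infrastructure: 32 instances of 8 cores each.
proposal_a = instances_required(32, 8, relative_cpu_time=1.4, target_cores=4)
proposal_b = instances_required(32, 8, relative_cpu_time=2.0, target_cores=8)
print(proposal_a, proposal_b)  # prints "90 64"
```

In other words, Proposal A needs nearly three times as many instances as the owned estate and Proposal B twice as many, which is what changes the business case.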

Page 18: Capacity Management and the Cloud

Case Study 2: Capacity Planning in the Cloud

• A travel e-commerce service

• A mobile site to provide better travel information during periods of disruption

• This site was to be hosted as a cloud-based service

• The demand on the mobile site would start off low and grow to unknown levels

Page 19: Capacity Management and the Cloud

Performance Testing

To ensure that

• User response times will be within performance SLAs

• System will provide value for money within budgets

1. Find/Fix Code Defects

2. Determine and Optimise Response Times

3. Determine and Optimise Costs

Page 20: Capacity Management and the Cloud

Step 1: Finding Code Defects

• Before response times and costs can be measured, defects introducing non-linearity must be found and fixed

• Memory leaks; logical bottlenecks; locking

Page 21: Capacity Management and the Cloud

Step 2: Determining Response Time

• Test response times as experienced by the user

• Are times within SLAs?

[Figure: average response time before fix, after fix 1 and after fix 2, shown against the average SLA]

Page 22: Capacity Management and the Cloud

Step 3: Cost Optimisation

• We have resolved code defects

• The service now meets response time SLA

• We can go live, right?

• No!

Service performance should be tuned to achieve cost optimality

Page 23: Capacity Management and the Cloud

Step 3: Cost Optimisation

Stage | Capacity of the service (user visits per second per instance) | Number of instances required (to support 40 visits per second)

Pre-optimisation | 2 | 20

Post-optimisation | 50 | 1
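The instance counts follow directly from dividing the target load by the per-instance capacity and rounding up. A minimal sketch, with a hypothetical function name:

```python
import math

def instances_for(target_visits_per_second, visits_per_second_per_instance):
    """Smallest whole number of instances that can carry the target load."""
    return math.ceil(target_visits_per_second / visits_per_second_per_instance)

pre = instances_for(40, 2)    # capacity before tuning
post = instances_for(40, 50)  # capacity after tuning
print(pre, post)              # prints "20 1"
print("instance cost ratio:", pre / post)
```

Because cloud instances are billed per instance, the ratio between the two counts is also, to a first approximation, the ratio between the two running costs.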

Page 24: Capacity Management and the Cloud

Step 3: Cost Optimisation

[Figure: two panels, Pre-optimisation and Post-optimisation]

Page 25: Capacity Management and the Cloud

Bottlenecks can be alleviated by expanding cloud capacity?

• Our experience is that most bottlenecks relate to logical rather than physical capacity constraints

Physical Capacity:

• CPU

• Memory

• Disk space

• Disk I/O

• Network bandwidth

Logical Capacity:

• Allocated size of a database table

• Capacity of a third-party’s web service

• The number of threads

• The number of database locks

• Free connections in a connection pool

Page 26: Capacity Management and the Cloud

Bottlenecks can be alleviated by expanding cloud capacity?

• Here there is a logical capacity constraint with regard to a database table’s allocated space

• Increasing web or database instance capacity will not address the root cause

[Diagram: three web server instances and one database instance; the constraint is the space allocated to a database table]
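Why adding instances does not help can be shown with a toy model in which throughput is capped by the tightest constraint. The sketch is illustrative only: the figures, the function name and the choice of a shared logical limit expressed in requests per second are assumptions, not data from the case.

```python
def effective_throughput(web_instances, per_instance_rps, logical_limit_rps):
    """Throughput is bounded by the tightest constraint, physical or logical."""
    physical_rps = web_instances * per_instance_rps
    return min(physical_rps, logical_limit_rps)

# Hypothetical: each web instance handles 100 requests per second, but a shared
# logical resource (for example a connection pool) sustains only 250 per second.
for instances in (2, 3, 6, 12):
    print(instances, effective_throughput(instances, 100, 250))
```

Beyond three instances the extra capacity is paid for but never used; only raising the logical limit itself moves the ceiling.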

Page 27: Capacity Management and the Cloud

Capacity management can be delegated to the cloud provider?

• Planning must be undertaken to forecast peak demand and size accordingly

• Requires business knowledge

• Requires specific skills

• Potential conflict of interest!

[Figure: Forecast Service Demand for an e-commerce Service. X axis: quarterly dates from 01/01/2006 to 01/01/2012. Series: Actual Daily Purchases; Forecast Daily Purchases]

Page 28: Capacity Management and the Cloud

Case Study 3: Capacity management can be delegated to the cloud provider?

• A retailer

• E-commerce service

• Migrated the service from own infrastructure to a cloud service

• The number of processor cores in the cloud was the same as the number of processor cores on the previous infrastructure

• Service performance degraded post migration

Page 29: Capacity Management and the Cloud

Case Study 3: Capacity management can be delegated to the cloud provider?

• No issues reported by service provider

• CPU loading measured at the guest

• Other than on 1 server, CPU loading was within acceptable bounds!

Page 30: Capacity Management and the Cloud

Case Study 3: Capacity management can be delegated to the cloud provider?

• However, there was evidence of high CPU queue lengths and service performance degradation

• Insufficient processor capacity configured on the host machines
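The pattern in this case study, acceptable CPU utilisation at the guest alongside long CPU queues, can be turned into a simple check. The sketch below is hypothetical throughout: the sample values, the field names and both thresholds are assumptions chosen for illustration and are not taken from the engagement.

```python
from dataclasses import dataclass

@dataclass
class GuestSample:
    name: str
    cpu_utilisation: float   # fraction of guest CPU busy, 0.0 to 1.0
    run_queue_length: float  # average number of threads waiting for a CPU
    vcpus: int

def suspect_host_contention(sample, max_utilisation=0.8, max_queue_per_vcpu=2.0):
    """True when the guest looks healthy on utilisation but threads are queueing."""
    queue_per_vcpu = sample.run_queue_length / sample.vcpus
    return (sample.cpu_utilisation <= max_utilisation
            and queue_per_vcpu > max_queue_per_vcpu)

samples = [
    GuestSample("web-1", 0.55, 12.0, 4),  # modest utilisation, long queue
    GuestSample("web-2", 0.50, 3.0, 4),   # healthy
    GuestSample("web-3", 0.95, 14.0, 4),  # plainly saturated at the guest
]
flagged = [s.name for s in samples if suspect_host_contention(s)]
print(flagged)
```

A guest flagged this way is worth raising with the provider, since utilisation measured inside the guest cannot show how much processor capacity the host is actually granting it.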

Page 31: Capacity Management and the Cloud

Case Study 4: Capacity management can be delegated to the cloud provider?

• e-commerce travel

• Cloud database on Microsoft Azure

• Service went live with a 5 GB database

• Low growth expected


• Post go-live the database was growing at 2.1 GB per day

• Forecast growth of 767 GB over the first year

• SQL Azure database instances limited to 150 GB

• So forecast capacity requirement of 6 databases, each 130 GB in size
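The arithmetic behind these figures can be checked in a few lines. The sketch assumes growth stays constant at the observed daily rate and that the initial 5 GB is counted in the total; neither assumption is stated on the slide.

```python
import math

daily_growth_gb = 2.1
initial_size_gb = 5
max_db_size_gb = 150  # per-database limit quoted on the slide

# Growth over the first year at a constant daily rate.
growth_first_year_gb = daily_growth_gb * 365
total_gb = initial_size_gb + growth_first_year_gb

# Number of databases needed to stay under the per-database limit.
databases = math.ceil(total_gb / max_db_size_gb)
size_per_database_gb = total_gb / databases

print(f"growth in first year: {growth_first_year_gb:.1f} GB")
print(f"databases required: {databases}")
print(f"size per database: {size_per_database_gb:.1f} GB")
```

The result is about 766.5 GB of growth, which the slide rounds to 767 GB, spread over 6 databases of a little under 130 GB each.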

Page 32: Capacity Management and the Cloud

Case Study 4: Cost Implications

[Figure: two panels, Current capacity and Forecast capacity]

Page 33: Capacity Management and the Cloud

Case Study 4: Capacity management can be delegated to the cloud provider?

• Investigation demonstrated that the growth of 2.1 GB per day was due to:

1. Poor archiving

2. ‘Scrapers’ searching for invalid route combinations, resulting in large numbers of database inserts

Page 34: Capacity Management and the Cloud

Case Study 5: Remediation of Performance Issues

• Application hosted in the Microsoft Azure cloud (web app and database)

• Performance testing proved a response time degradation

• Web tier OK

• It was not possible to launch any performance tools/diagnostics against the database service

• Extremely difficult to establish root cause!

Page 35: Capacity Management and the Cloud

Performance is Guaranteed by the Cloud Provider?

• Extract from the Amazon EC2 SLA

• End-to-end performance is not guaranteed

Page 36: Capacity Management and the Cloud

Performance is Guaranteed?

• Counter-argument is to ‘design for failure’

• Automatically detect capacity constraints

• Automatically detect unhealthy instances

• Then automatically bring up new instances

• Risk that instance proliferation can adversely impact system-wide performance

• However, end-to-end performance is still not guaranteed
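The 'design for failure' loop described above can be sketched as a scaling decision with an explicit ceiling. This is a hypothetical illustration: the function, the 60% target and the cap of 10 instances are assumptions, and no real cloud provider API is used.

```python
import math

def desired_instances(healthy, utilisation_pct, target_pct=60, max_instances=10):
    """Instances needed to bring average utilisation back to target_pct.

    The result is capped at max_instances so that automatic scaling cannot
    proliferate instances without limit.
    """
    if healthy == 0:
        return 1  # every instance is unhealthy: bring up a replacement
    needed = math.ceil(healthy * utilisation_pct / target_pct)
    return max(1, min(needed, max_instances))

print(desired_instances(healthy=4, utilisation_pct=90))  # scale out
print(desired_instances(healthy=8, utilisation_pct=95))  # hits the cap
print(desired_instances(healthy=3, utilisation_pct=30))  # scale in
```

The cap is a design choice rather than a fix: once it is reached the service is constrained again, so the ceiling itself has to come from capacity planning.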

Page 37: Capacity Management and the Cloud

Comparing Performance of Cloud Service Providers

Page 38: Capacity Management and the Cloud

Summary

• Clear benefits of cloud computing

• However, capacity management is still required

• Capacity management is key to managing the cost of cloud services

Page 39: Capacity Management and the Cloud

Questions?

www.capacitas.co.uk

[email protected]
