Starting for the Cloud, OW2 Conference Nov10

OW2 Annual Conference 2010, November 24-25, La Cantine, Paris. www.ow2.org. Starting for the cloud -- two issues in cluster: resource allocation and overload management. Ziyou Wang, Yan Li, Chao You, Minghui Zhou, Peking University

Transcript of Starting for the Cloud, OW2 Conference Nov10

Page 1: Starting for the Cloud, OW2 Conference Nov10

OW2 Annual Conference 2010, November 24-25, La Cantine, Paris. www.ow2.org.

Starting for the cloud -- two issues in cluster: resource allocation and overload management

Ziyou Wang, Yan Li, Chao You, Minghui Zhou Peking University


Page 2: Starting for the Cloud, OW2 Conference Nov10

Agenda

• Cloud Computing: Challenges
• Resource Allocation
  • Shared cluster
  • Resource allocation planning
• Overload Management
  • Examples
  • Automatic degradation mechanism
  • Considerations

Page 3: Starting for the Cloud, OW2 Conference Nov10

Cloud Computing: Challenges

• The emergence of cloud computing makes it cost-efficient for application providers to lease computing resources from a third-party provider.
  • Benefits: increased resource utilization, improved business agility, decreased power consumption…
• But how to effectively allocate the cloud's various resources to different applications is still an open problem.
• When applications hosted in the cloud face overload, meaning the demand on at least one of the cloud's resources exceeds the capacity of that resource, what can we do to handle the situation?
• … …
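The overload definition above (demand on at least one resource exceeds that resource's capacity) can be written down directly. This is a minimal sketch; the resource names and figures are hypothetical and do not come from the slides.

```python
def overloaded_resources(demand, capacity):
    """Return the resources whose demand exceeds their capacity, sorted by name."""
    return sorted(r for r, d in demand.items() if d > capacity[r])

# Hypothetical capacity of the cloud and demand of the hosted applications.
capacity = {"cpu_cores": 16, "memory_gb": 64, "network_mbps": 1000}
demand = {"cpu_cores": 22, "memory_gb": 48, "network_mbps": 400}

bottlenecks = overloaded_resources(demand, capacity)
is_overloaded = bool(bottlenecks)  # one exceeded resource is enough
print(is_overloaded, bottlenecks)
```

A single exceeded resource marks the whole system as overloaded, even when the other resources still have headroom.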


Page 4: Starting for the Cloud, OW2 Conference Nov10

Shared Cluster

• Consider one kind of cloud implementation: the workloads of different web applications are not correlated, so a large-scale cluster, called a shared cluster or data center, is maintained to host a large number of applications simultaneously.
  • Each application runs on a subset of nodes
  • Each node may run multiple applications

[Figure labels: Users, Enterprises, Third parties]

Page 5: Starting for the Cloud, OW2 Conference Nov10

Resource Allocation: a scenario

• As the cluster's resources are no longer occupied by a single application, the cluster must allocate resources on demand.
• For example: when the workload of apps A and C increases, new instances are placed in the data center and the workload is re-allocated.

[Figure: shared cluster. Application users send requests to a dispatcher; a high-throughput, low-latency network connects the nodes, each running middleware: Node 1 (app A, app C), Node 16 (app B, app A), Node 99 (app B, app A), Node 150 (app D, app C) and other nodes. A repository holds the apps.]

Page 6: Starting for the Cloud, OW2 Conference Nov10

Self-adaptive Resource Allocation Model

[Figure: self-adaptive resource allocation. Labels: Requests, Resource allocation planning, Resource allocation execution.]

Page 7: Starting for the Cloud, OW2 Conference Nov10

Our Resource Allocation Work

[Figure: node architecture. Labels: Middleware on a Virtual Machine Monitor; VMs each running a customized JOnAS with one app (app a … app x); Resource allocation planning, Communicator, Local valuator; Resource allocation execution, Resource partitioner, App deployer; Dispatcher (requests); Repository; Management Console (commands, messages); cooperation with the resource allocation planning of other nodes' middleware.]

For resource allocation planning, we propose a decentralized approach:
• Nodes decide their own resource allocation
• Market-based coordination is adopted to help them make resource decisions
So far, the approach has been evaluated with a series of simulated experiments, and it is being implemented in the cluster with JOnAS.

Page 8: Starting for the Cloud, OW2 Conference Nov10

Resource Allocation Planning

• To support application prioritization, applications can be assigned different utility values. Accordingly, the goal of resource management is to maximize the total utility value of the requests satisfied.
• Inspired by human markets, we model the shared cluster as a market, where shares of application requests are treated as goods and nodes act as dealers that exchange goods.
• Based on its local valuation of the goods, each node autonomously and continuously trades with others to find a combination of application shares that fits the node's resource constraints and maximizes its income.
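One way to picture the local valuation step is as a small knapsack problem: the node picks the application shares that earn the most utility within its resource budget. The sketch below is an illustration only, not the authors' algorithm. It assumes a single resource (CPU), divisible request shares, and hypothetical names (`Offer`, `choose_shares`) and numbers.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    app: str
    rate: float     # requests/s of this app available to the node
    utility: float  # utility earned per satisfied request (assumed)
    cost: float     # CPU units needed per request/s (assumed single resource)

def choose_shares(offers, cpu_capacity):
    """Greedy fractional knapsack: serve the highest utility per CPU unit first."""
    plan, income, free = {}, 0.0, cpu_capacity
    for o in sorted(offers, key=lambda o: o.utility / o.cost, reverse=True):
        if free <= 0:
            break
        served = min(o.rate, free / o.cost)  # as much as the remaining CPU allows
        plan[o.app] = served
        income += served * o.utility
        free -= served * o.cost
    return plan, income

offers = [
    Offer("A", rate=100.0, utility=3.0, cost=1.0),
    Offer("B", rate=200.0, utility=1.0, cost=0.5),
    Offer("C", rate=50.0, utility=5.0, cost=5.0),
]
plan, income = choose_shares(offers, cpu_capacity=150.0)
print(plan, income)
```

With these numbers the node serves all of app A and half of app B, and leaves app C to other nodes because its utility per CPU unit is the lowest. Shares the node cannot serve profitably are the natural candidates to sell.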


Page 9: Starting for the Cloud, OW2 Conference Nov10

Resource Allocation Planning

• When a node wants to sell, more than one node may want to buy. An auction mechanism is adopted so that the seller transfers the goods to the appropriate buyers.
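The seller's side of such an auction can be sketched as follows: collect bids, sort them, hand out the offered share until it runs out, and update the dispatcher's share table. The numbers echo the slide's app C example (Node 1 sells 50%; bids for 35% and 20%). The ranking key (`price`), the assignment of bids to nodes, and the names `Bid`, `run_auction` and `apply_awards` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    node: str
    wanted: int   # share of the app's workload the bidder wants, in percent
    price: float  # bidder's local valuation per percent (assumed ranking key)

def run_auction(offered, bids):
    """Sort bids by price, highest first, and grant shares until none are left."""
    awards, remaining = {}, offered
    for bid in sorted(bids, key=lambda b: b.price, reverse=True):
        if remaining == 0:
            break
        granted = min(bid.wanted, remaining)
        awards[bid.node] = granted
        remaining -= granted
    return awards, remaining

def apply_awards(table, seller, awards):
    """Return the dispatcher's share table after the seller transfers the awards."""
    updated = dict(table)
    for node, share in awards.items():
        updated[node] = updated.get(node, 0) + share
        updated[seller] -= share
    return updated

table = {"N1": 70, "N50": 0, "N65": 10}            # app C shares before the auction
bids = [Bid("N50", 35, 1.0), Bid("N65", 20, 1.5)]  # prices are made up
awards, unsold = run_auction(50, bids)
new_table = apply_awards(table, "N1", awards)
print(awards, new_table)
```

Under these assumed prices the higher bid (N65, 20%) is filled first and N50 receives the remaining 30%, which matches the split shown on the slide (30% to N50, 20% to N65).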

[Figure: auction example for app C.
 1. multicast: Node 1 (app A, app C) offers (appC, 50%, 100 req/s) to the other nodes, among them Node 50 (app A, app B), Node 65 (app B, app C) and Node 100 (app B, app D). Every node runs the middleware.
 2.1 valuation: each node values the offer locally.
 2.2 bids: "want C, 35%" and "want C, 20%".
 3. sort: the seller sorts the bids and decides "sell C 30%" and "sell C 20%".
 4. notify / inform: update (app C, 30% to N50, 20% to N65).
 Dispatcher table before: N100: …; N65: … (app C, 10%); N50: …; N1: … (app C, 70%).
 Dispatcher table after: N100: …; N65: … (app C, 30%); N50: … (app C, 30%); N1: … (app C, 20%).]

Page 10: Starting for the Cloud, OW2 Conference Nov10

Our Resource Allocation Work

[Figure: the same node architecture diagram as on Page 7.]

For resource allocation execution:
• Integrate a VMM into the middleware
• Automatically load the app and partition resources at runtime via the VMM
• Customize JOnAS for the app, and store the customized image in the repository
• Dispatch the workload proportionally
Currently we use OpenVZ, a lightweight OS-level VMM, as a case study, and are working to integrate OpenVZ into the middleware.
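Proportional workload dispatching can be illustrated with smooth weighted round-robin, a standard technique chosen here for the example; the slides do not say which algorithm the dispatcher uses. The class name `ProportionalDispatcher` and the share values are hypothetical.

```python
from collections import Counter

class ProportionalDispatcher:
    """Smooth weighted round-robin over the nodes hosting one application."""

    def __init__(self, shares):
        self.shares = dict(shares)              # node -> weight, e.g. percent share
        self.total = sum(self.shares.values())
        self.credit = {node: 0 for node in self.shares}

    def next_node(self):
        # Every node earns credit equal to its share; the richest node serves
        # the request and pays back the total weight.
        for node, share in self.shares.items():
            self.credit[node] += share
        chosen = max(self.credit, key=self.credit.get)
        self.credit[chosen] -= self.total
        return chosen

dispatcher = ProportionalDispatcher({"N1": 20, "N50": 30, "N65": 30})
counts = Counter(dispatcher.next_node() for _ in range(80))
print(counts)
```

Over 80 requests each node receives exactly its share (20, 30 and 30), and requests for a node are spread out instead of arriving in bursts.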

Page 11: Starting for the Cloud, OW2 Conference Nov10

Agenda

• Cloud Computing: Challenges
• Resource Allocation
  • Shared cluster
  • Resource allocation planning
• Overload Management
  • Examples
  • Automatic degradation mechanism
  • Considerations

Page 12: Starting for the Cloud, OW2 Conference Nov10

Examples


• On September 11th, 2001, the workload on a popular news web site increased by an order of magnitude in 30 min, with the workload doubling every 7 min in that period.

• April 21st, 2010 was China's national day of mourning for the Yushu earthquake victims. Theatre and sporting performances were cancelled, karaoke bars were shut, and the culture ministry ordered the suspension of all online music, games, comics, films and TV shows.

• Too many people choose to visit an online shopping site at the same time.
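The figures in the first example are consistent with each other: doubling every 7 minutes for 30 minutes gives a growth factor of 2^(30/7), which is indeed about an order of magnitude. A quick check of that arithmetic:

```python
doubling_time_min = 7
window_min = 30

# Number of doublings in the window, then the resulting growth factor.
doublings = window_min / doubling_time_min
growth = 2 ** doublings
print(f"{doublings:.2f} doublings, {growth:.1f}x growth")
```

The result is a bit more than four doublings, or roughly a twenty-fold increase in half an hour.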

Page 13: Starting for the Cloud, OW2 Conference Nov10

When does overload happen?

• Overload prevention is a critical goal, so that a system can remain operational in the presence of overload even when the incoming request rate is several times greater than the system's capacity.
• It is well known that the workload seen by Internet applications varies over multiple time-scales and often in an unpredictable fashion.
• Unexpected things are always happening:
  • Being featured on national television or in a major newspaper
  • Under-provisioning for sales-boosting holidays


Page 14: Starting for the Cloud, OW2 Conference Nov10

The Taobao Architecture

• Apache + Application Server + MySQL
• 200+ applications, thousands of components
• 12k servers
• 2k~3k Java servers

[Figure labels: Search, Product Browsing, Product Recommendation, Shop Cart]

Page 15: Starting for the Cloud, OW2 Conference Nov10

The Reality – Manual Service Degradation

• In response to overload:
  • CNN replaced its front page with a simple HTML page that could be transmitted in a single Ethernet packet.
  • Taobao turned off a subsystem.
• All these techniques are implemented manually, though a better approach would be to degrade service gracefully and automatically in response to load.
  • Which point causes the overload?
  • Which resource is the bottleneck?
  • Which service should be degraded or turned off?
  • Are all users affected or not?


Page 16: Starting for the Cloud, OW2 Conference Nov10

Automatic Degradation Mechanism

• Overload Priority defines the priorities of the different services and the degradation actions that can be taken.
• Overload Detection is responsible for signaling that the application has entered an unstable state.
• Overload Localization is triggered to locate the resource bottleneck.
• Overload Controller takes appropriate actions to degrade non-essential services, releasing resources to support key services.
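The four components can be tied together in a small control loop. The sketch below is only an illustration of the idea, not the authors' implementation. It assumes detection by a response-time threshold, localization by per-resource utilization, and degradation by switching services off in ascending priority order; the service names are borrowed from the Taobao slide, while the priorities and usage figures are made up.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    priority: int  # Overload Priority: higher means more important (assumed scale)
    usage: dict    # resource -> amount used while the service is enabled
    enabled: bool = True

def detect_overload(response_ms, limit_ms):
    """Overload Detection: flag an unstable state from a performance metric."""
    return response_ms > limit_ms

def locate_bottleneck(services, capacity):
    """Overload Localization: return the most over-committed resource, or None."""
    load = {r: sum(s.usage.get(r, 0) for s in services if s.enabled) for r in capacity}
    worst = max(capacity, key=lambda r: load[r] / capacity[r])
    return worst if load[worst] > capacity[worst] else None

def degrade(services, capacity):
    """Overload Controller: turn off low-priority services until no bottleneck remains."""
    turned_off = []
    for svc in sorted(services, key=lambda s: s.priority):
        bottleneck = locate_bottleneck(services, capacity)
        if bottleneck is None:
            break
        if svc.usage.get(bottleneck, 0) > 0:  # only services that use the bottleneck
            svc.enabled = False
            turned_off.append(svc.name)
    return turned_off

capacity = {"cpu": 100, "db_connections": 200}
services = [
    Service("Shop Cart", 4, {"cpu": 30, "db_connections": 80}),
    Service("Search", 3, {"cpu": 40, "db_connections": 40}),
    Service("Product Browsing", 2, {"cpu": 30, "db_connections": 60}),
    Service("Product Recommendation", 1, {"cpu": 35, "db_connections": 30}),
]

turned_off = []
if detect_overload(response_ms=2400, limit_ms=800):
    turned_off = degrade(services, capacity)
print(turned_off)
```

In this run CPU is the bottleneck, and switching off the lowest-priority service is enough to bring the load back within capacity, so the key services keep running untouched.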

[Figure: automatic degradation mechanism. Labels: Resource Monitoring Mechanism, Overload Detection, Overload Localization, Overload Controller, Overload Priority, Performance Metrics, Degradation Actions, Applications (with their Services).]

Page 17: Starting for the Cloud, OW2 Conference Nov10

Automatic Application Degradation

• Cluster-level degradation
  • Coarse-grained
    • Sub-system level degradation
  • Resource management
    • Service differentiation
• Node-level degradation
  • Fine-grained
    • Component level degradation
    • Middleware level degradation

Page 18: Starting for the Cloud, OW2 Conference Nov10

Considerations

• Hard to make transparent to the user (what can be degraded? sometimes how?)
• Used alone, it can help delay overload, but it needs to be combined with other techniques to be fully effective:
  • Dynamic resource allocation
  • Admission control
  • Service differentiation
  • … …


Page 19: Starting for the Cloud, OW2 Conference Nov10
