NO POWER STRUGGLES: COORDINATED MULTI-LEVEL POWER MANAGEMENT FOR THE DATA CENTER

Ramya (UCSB), Parthasarathy et al. (HP Labs)

Overview

Power delivery, consumption, and cooling problems in a data center are currently tackled by several systems, each addressing a “separate” aspect of these problems, either locally or globally, in hardware or software.

When these systems are deployed simultaneously, the policies of one tend to interfere with those of the others.

Overview…

The lack of coordination amongst such systems leads to undesirable consequences.

This paper proposes a “Global Power Management Solution” that coordinates these individual solutions.

Classifying the existing power management solutions..

Approach used: localized/distributed resource management, VMs

Power control : voltage scaling, power states, turning off machines

Implementation scope: server/cluster/data center level

Optimization requirements and constraints: is performance loss acceptable? Are power budget violations allowed?

In a nutshell..

“Tracking” problem – optimize power consumption while delivering performance.

“Capping” problem – Optimize power provisioning and cooling so as not to violate the power budget.

“Optimization” problem – maximize power saving while minimizing performance loss. (ACPIs, VMs, etc)

Representative Power Management Solutions

Efficiency Controller (EC, tracking): optimizes per-server average power consumption. Adjusts ACPI P-states based on past resource usage to manage “estimated” future demand.

Server Manager (SM, capping): reduces the P-state of a server on violation of its power budget.

Representative solutions..

Enclosure Manager (EM): thermal power capping at blade level

Group Manager (GM): at rack or data center level

These two monitor power usage on sets of machines and re-provision power to maintain the group power budget (determined manually or mandated by higher-level power managers).

Representative solutions..

Virtual Machine Controller (VMC): reduces average power usage across a set of machines through workload consolidation, turning off idle machines, etc.

Power Struggles..

What happens if these solutions are deployed simultaneously ?

Power Struggles - examples

The EC and the SM both operate on the same knob/actuator (P-state) but for different metrics. If uncoordinated, the EC can potentially overwrite the SM's setting, leading to power budget violations and eventual thermal failover: a correctness issue.

Examples..

If the VMC and the group cappers are uncoordinated, the VMC can consolidate more capacity onto a collection of servers than the group power budget allows.

In addition to excessive performance violations (inefficiency), the VMC can potentially react to the lower utilization (caused by power capping) and pack even more workloads onto the server, leading to a vicious cycle and system instability.

Design Challenges of a Coordination System

Interaction between different controllers (EC, SM, EM, etc.) must maintain “correctness, stability and efficiency”.

Global awareness of the “presence” of other controllers while having minimal/zero knowledge of their properties.

Adaptability and scalability: new controllers with same/different properties, new applications, etc.

Design Challenges - Sensitivity Issues

Overlapping functionalities and policies of controllers: can they be mitigated?

Is the Coordinated Management System agnostic to the deployed systems and applications (workloads)?

The Design

The Design..

Use of feedback control loops.

Measure the required “metric”, compare with the “reference” value and manipulate the actuator based on the error so that the output follows the reference.
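The measure/compare/actuate cycle can be sketched as a simple integral controller. This is only an illustration, not the controller from the paper: the names `control_step` and `run_loop`, the gain of 0.2, and the toy plant (output is twice the actuator setting) are all assumptions made for the example.

```python
def control_step(actuator, measured, reference, gain):
    """One iteration: move the actuator in proportion to the error."""
    error = reference - measured
    return actuator + gain * error


def run_loop(reference, plant, actuator=0.0, gain=0.2, steps=50):
    """Repeatedly measure the plant output and correct the actuator."""
    for _ in range(steps):
        measured = plant(actuator)
        actuator = control_step(actuator, measured, reference, gain)
    return actuator, plant(actuator)


# Hypothetical plant: the measured metric is twice the actuator setting.
actuator, output = run_loop(reference=10.0, plant=lambda a: 2.0 * a)
```

With this toy plant the output settles at the reference (10) with the actuator at 5. A gain that is too large for the plant would make the same loop oscillate or diverge, which is why stability appears among the design challenges.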

Details..

Efficiency Controller (EC):

Reference utilization r_ref

Actual utilization r_i

If r_i < r_ref, adjust actuator A (the P-state), i.e. move from, say, P0 to P4, resulting in higher utilization and lower power usage.

Details..

Server Manager (SM):

Power capping by measuring per-server power consumption.

If current consumption exceeds the “power budget”, the SM increases r_ref, thereby allowing the EC to reduce the P-state of the machine.

In effect, the EC and the SM use r_ref as a communication channel.
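A minimal sketch of this coupling, under stated assumptions: the P-state speeds, the linear power model, the step size and the function names (`ec_select`, `sm_update`, `simulate`) are hypothetical and not taken from the paper. The point it illustrates is that only the EC touches the P-state, while the SM influences it indirectly by raising r_ref.

```python
P_STATE_FREQ = [1.0, 0.85, 0.7, 0.55, 0.4]  # hypothetical relative speeds, P0..P4


def power_watts(p_index):
    """Hypothetical linear power model: 100 W idle plus up to 100 W dynamic."""
    return 100.0 + 100.0 * P_STATE_FREQ[p_index]


def ec_select(demand, r_ref):
    """EC: pick the slowest P-state that keeps utilization at or below r_ref."""
    for p_index in range(len(P_STATE_FREQ) - 1, -1, -1):
        if demand / P_STATE_FREQ[p_index] <= r_ref:
            return p_index
    return 0  # nothing fits: fall back to full speed


def sm_update(r_ref, power, budget, step=0.05, r_max=0.95):
    """SM: on a budget violation, raise r_ref instead of setting the P-state."""
    if power > budget:
        return min(r_ref + step, r_max)
    return r_ref


def simulate(demand, budget, r_ref=0.75, intervals=10):
    """Alternate SM and EC decisions for a constant demand."""
    p_index = ec_select(demand, r_ref)
    for _ in range(intervals):
        r_ref = sm_update(r_ref, power_watts(p_index), budget)
        p_index = ec_select(demand, r_ref)
    return p_index, r_ref, power_watts(p_index)


p_index, r_ref, power = simulate(demand=0.5, budget=160.0)
```

In this toy run the EC alone would stay at P2 (about 170 W); the SM keeps raising r_ref until the EC moves to P3, which brings the server under the 160 W budget without the two controllers writing to the same actuator.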

Design..

EM & GM:

Same principle as the SM. Compare current power usage against the reference power budget and assign new budgets to the lower-level controllers (EM -> SM, GM -> EM) based on some policy (FIFO, random, etc.).

Each lower-level controller picks the “minimum of the upper-level recommendation and its own local power budget”.
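The minimum rule is small enough to state directly. The function name and the wattage figures below are made up for illustration.

```python
def effective_budget(local_budget, parent_recommendation=None):
    """Budget a lower-level controller enforces: the stricter of the two."""
    if parent_recommendation is None:
        return local_budget
    return min(local_budget, parent_recommendation)


# Hypothetical chain: the GM recommends 900 W to an EM whose own budget is 1000 W,
# and the EM in turn recommends 220 W to an SM whose own budget is 250 W.
em_budget = effective_budget(local_budget=1000.0, parent_recommendation=900.0)
sm_budget = effective_budget(local_budget=250.0, parent_recommendation=220.0)
```

Taking the minimum means a higher level can only tighten a budget, never loosen the local one.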

Design..

VMCs:

Use actual utilization instead of “apparent” utilization (100% at P0 is not the same as 100% at P3).

Supplied with data about the approximate power budgets at various levels.

Also supplied with data about current power budget violations at various levels (through CIM).

These three changes enable the VMCs to consolidate the right workloads while making sure that the consolidated servers neither violate the power budgets nor fall into the vicious cycle mentioned earlier.
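To illustrate the first of these changes: one plausible way to convert apparent utilization into actual utilization is to rescale by the current clock frequency. This assumes capacity scales linearly with frequency, which the slides do not state; the frequencies are hypothetical.

```python
def actual_utilization(apparent_utilization, p_state_freq, max_freq):
    """Rescale utilization measured at a throttled P-state to full-speed capacity."""
    return apparent_utilization * p_state_freq / max_freq


# A server that looks 100% busy at a hypothetical 1.8 GHz P3 state
# is using only 60% of its 3.0 GHz P0 capacity.
busy_at_p3 = actual_utilization(1.0, p_state_freq=1.8, max_freq=3.0)
busy_at_p0 = actual_utilization(1.0, p_state_freq=3.0, max_freq=3.0)
```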

Summary of changes to the controllers

Modeling the Controllers

Power – Performance Model – run actual workloads on hardware at different utilization levels and measure the power and performance.

Through curve-fitting of the simulation data, obtain linear models that represent the controller behavior.
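As a sketch of the curve-fitting step, an ordinary least-squares line can be fitted to (utilization, power) samples. The sample values below are invented for the example and are not measurements from the paper; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up samples for one P-state: utilization (0..1) against power (W).
utilization = [0.0, 0.25, 0.5, 0.75, 1.0]
power = [100.0, 125.0, 150.0, 175.0, 200.0]
slope, intercept = fit_line(utilization, power)
```

One such fit per P-state would give a family of linear power models; the same approach applies to performance against utilization.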

Modeling..

EC: the actuator change is scaled up or down by λ (changes are proportional to the error in utilization).

r_ref is increased by the SM when the local power budget cap_loc is violated, resulting in the EC lowering the power states of the machines.

Modeling..

SM: manipulates the r_ref of the EC if its power consumption violates cap_loc, subject to a cap determined by the β_loc factor.

EM & GM: operate on a fair-share policy; the power allocated to a component is proportional to the power it consumed in the last interval.
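The proportional policy can be sketched as follows. The function name, the component names and the wattages are hypothetical, and the even split when nothing was consumed is an assumption added so the example is well defined.

```python
def proportional_share(group_budget, last_interval_power):
    """Split a group budget in proportion to each component's recent consumption."""
    total = sum(last_interval_power.values())
    if total == 0:
        # Assumed fallback: split evenly when nothing was consumed.
        share = group_budget / len(last_interval_power)
        return {name: share for name in last_interval_power}
    return {
        name: group_budget * used / total
        for name, used in last_interval_power.items()
    }


# Hypothetical enclosure: 480 W to split across three blades.
budgets = proportional_share(480.0, {"blade1": 300.0, "blade2": 200.0, "blade3": 100.0})
```

Here the blades receive 240 W, 160 W and 80 W, so the busiest blade is throttled least in absolute terms.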

Modeling..

VMCs: constrained optimization problem to map n VMs to m servers (decision variable matrix X).

Include total power consumption and migration overhead (α_M) in the calculation.

Consider server capacity constraints.

Modeling VMCs..

Consider local, enclosure and group level power budget constraints

The level of consolidation is tuned by adjusting the power budget buffers based on the violations at different levels.

Modeling VMCs..

Equations 1 to 6 depict a 0-1 integer optimization problem.

The authors use a greedy bin-packing algorithm that yields an approximately optimal solution for the placement of VMs.
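The slides do not say which greedy heuristic is used, so the sketch below substitutes first-fit decreasing, a standard bin-packing heuristic. It considers only a per-server capacity and a per-server power budget; migration overhead and the enclosure and group budgets are left out. All names and numbers are hypothetical.

```python
def greedy_place(vm_demands, capacity, power_budget, idle_w, watts_per_pct):
    """First-fit decreasing: largest VMs first, each on the first server that fits."""
    # Highest load (in CPU percent) allowed by both capacity and power budget.
    max_load = min(capacity, (power_budget - idle_w) / watts_per_pct)
    loads = []      # committed load on each powered-on server
    placement = {}  # VM name -> server index
    ordered = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for vm, demand in ordered:
        if demand > max_load:
            raise ValueError(f"{vm} does not fit on any server")
        for idx, load in enumerate(loads):
            if load + demand <= max_load:
                loads[idx] += demand
                placement[vm] = idx
                break
        else:
            # No powered-on server has room: power on another one.
            loads.append(demand)
            placement[vm] = len(loads) - 1
    return placement, loads


# Demands in CPU percent; a 180 W budget with 100 W idle and 1 W per percent
# limits each server to 80% load.
demands = {"vm1": 50, "vm2": 40, "vm3": 30, "vm4": 20, "vm5": 10}
placement, loads = greedy_place(demands, capacity=100, power_budget=180.0,
                                idle_w=100.0, watts_per_pct=1.0)
```

The five VMs end up on two servers loaded to 80% and 70%, so the power budget rather than raw capacity is what limits consolidation in this example.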

Evaluation

How?

Real-time deployment in a data center or a full-system simulation?

○ Impractical; limits the set of use-case scenarios that can be studied due to the actual system being tested

Use of trace-driven simulation: use real-world traces of enterprise deployments that enable detailed workload modeling and evaluation of tradeoffs at policy and system levels.

Metrics used

Aggregate Power Saving, performance loss and power budget violation at SM, EM and GM levels.

No peak power saving is measured.

No workload queuing, i.e. if the workload exceeds capacity, there is performance loss due to power capping. No demand carry-over.
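One way these metrics could be computed from per-interval traces is sketched below. The exact definitions used in the paper are not given in the slides, so both formulas are assumptions: performance loss as the fraction of demanded work that is dropped (unmet demand is not carried over), and budget violation as the fraction of intervals above budget. The trace values are invented.

```python
def performance_loss(demand, delivered_capacity):
    """Fraction of demanded work dropped; unmet demand is not carried over."""
    unmet = sum(max(0.0, d - c) for d, c in zip(demand, delivered_capacity))
    return unmet / sum(demand)


def budget_violation_rate(power, budget):
    """Fraction of intervals in which measured power exceeded the budget."""
    return sum(1 for p in power if p > budget) / len(power)


# Made-up four-interval trace for a server capped at 70% capacity.
loss = performance_loss([40.0, 60.0, 80.0, 100.0], [70.0, 70.0, 70.0, 70.0])
violations = budget_violation_rate([150.0, 170.0, 190.0, 160.0], budget=180.0)
```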

Experimentation

180 workload traces (databases, web servers, remote desktops, e-commerce, etc.).

Create different types of mixes (real & synthetic) from this set to exercise different utilization scenarios.

SUT – A low power Blade server A and an entry level 2U server B.

Experiment with different power budgets and also study the sensitivity of this architecture by varying the time constants.

Power – Performance models for Blade A and Server B

Results

Baseline: No power management

Results..

Base results:

Coordinated: 64% reduction in power consumption, 3% performance degradation and 5% power budget violation.

Uncoordinated: 12% performance loss and 7% budget violation.

Sensitivity towards different systems:

Blade A: 5 P-states over a higher power range

Server B: 6 P-states over a low power range

Blade A’s absolute power saving > Server B’s.

○ Implies “Range of power control is more important than its granularity”

Results..

Variation for different workloads

At low utilization, VMC is major contributor to savings (assuming idle machines are “turned off”).

As utilization increases, benefits of VMC decrease while the combination of EC & VMC is better (i.e. a Coordinated Solution is better than a single one).

If idle machines are not switched off, savings drop “significantly”!