Power is Leading Design Constraint


Page 1: Power is Leading Design Constraint

Power is Leading Design Constraint
• Direct Impacts of Power Management
  – IDC: Servers are 2% of US energy consumption and growing exponentially
    • HPC cluster market growing 44%/year
    • By 2013, HPC clusters will be the largest fraction of the server market
  – Dramatic power reduction for HPC will have enormous impact on power and carbon footprint
• Indirect Impacts of Power Management
  – Makes construction of exascale machines feasible
  – Direct power towards useful work
    • 99% of energy use is not targeted at useful work
    • Thermals dictate design limits
    • Enables higher bandwidth and higher computational rate if powered up part-time
    • More performance for application
  – Broader impact across IT sector energy reduction

Page 2: Power is Leading Design Constraint

Computing Energy Consumption

Page 3: Power is Leading Design Constraint
Page 4: Power is Leading Design Constraint

State of the Art
• Power down underutilized components
  – DVFS (SW/HW) to power down components you are underutilizing
  – Memory can also be put in low power modes when underutilized
  – MAID disks can be powered down incrementally to reduce power
• Explicitly manage data movement
  – SSDs for lower I/O power while maintaining performance
  – Offload work to accelerators when more effective
  – Management of data movement through memory hierarchy (logistics)

Current approaches are narrowly focused and not scalable
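
As a rough illustration of the DVFS bullet above, the sketch below picks the slowest operating point that still covers the observed utilization and estimates the dynamic power saved with the usual CMOS approximation P ≈ C·V²·f. The frequency/voltage table, the capacitance constant, and the function names are hypothetical values chosen for illustration; none of them come from the slides or from a real governor.

```python
# Hypothetical (frequency in GHz, voltage in V) operating points, slowest first.
P_STATES = [(1.0, 0.80), (1.5, 0.90), (2.0, 1.00), (2.5, 1.10)]
CAPACITANCE = 10.0  # illustrative effective switched capacitance (arbitrary units)


def dynamic_power(freq_ghz, volts):
    """Common CMOS dynamic power approximation: P ~ C * V^2 * f."""
    return CAPACITANCE * volts ** 2 * freq_ghz


def pick_p_state(utilization, headroom=0.1):
    """Return the slowest operating point that covers the current demand.

    utilization is the busy fraction (0..1) measured at the top frequency;
    headroom adds a safety margin so short bursts are not starved.
    """
    f_max = P_STATES[-1][0]
    needed = min(f_max, utilization * f_max * (1.0 + headroom))
    for freq, volts in P_STATES:
        if freq >= needed:
            return freq, volts
    return P_STATES[-1]


for util in (0.2, 0.5, 0.95):
    freq, volts = pick_p_state(util)
    saved = 1.0 - dynamic_power(freq, volts) / dynamic_power(*P_STATES[-1])
    print(f"util={util:.2f}: run at {freq} GHz / {volts} V, dynamic power saved {saved:.0%}")
```

Note that this decision uses only one component's own utilization, which is exactly the "narrowly focused" limitation the slide points out.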

Page 5: Power is Leading Design Constraint

Problems
• No Scalable System-Level approaches
  – Power management services derived from commodity market make only local decisions
  – Locally optimal decisions are not globally optimal
  – Non-scalable data aggregation or filtering for control systems decisions
• Lack of standards for power monitoring, control, policy description
  – Required for both vertical and horizontal integration
• Control loop for system-scale optimization is fundamentally broken
  – Lack of predictive models for response to control decisions
  – No common expression of policy or objective
  – No comprehensive monitoring or data aggregation
• No tool support for integration of power management into application codes (apps people have enough to worry about)
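
The claim that locally optimal decisions are not globally optimal can be made concrete with a toy bulk-synchronous step. In the sketch below every node must wait at a barrier for the slowest one. A purely local policy runs each node at top frequency, while a policy with a system-wide view slows lightly loaded nodes so they arrive just in time. The cubic power model, the neglect of idle power, and all names and numbers are assumptions made for illustration only.

```python
F_MAX = 2.0  # GHz, hypothetical top frequency


def power(freq):
    """Illustrative cubic power model (voltage assumed to scale with frequency)."""
    return freq ** 3


def step_cost(work, freqs):
    """Return (step_time, energy) for one bulk-synchronous step.

    Each node's compute time is work / frequency; the step ends when the
    slowest node reaches the barrier. Idle power while waiting is ignored.
    """
    times = [w / f for w, f in zip(work, freqs)]
    energy = sum(power(f) * t for f, t in zip(freqs, times))
    return max(times), energy


def local_policy(work):
    """Each node independently runs flat out to finish its own work soonest."""
    return [F_MAX for _ in work]


def global_policy(work):
    """Slow each node so that it reaches the barrier together with the critical node."""
    critical_time = max(work) / F_MAX
    return [w / critical_time for w in work]


work = [4.0, 2.0, 1.0]  # unbalanced work per node, arbitrary units
t_local, e_local = step_cost(work, local_policy(work))
t_global, e_global = step_cost(work, global_policy(work))
print(f"local : time={t_local}, energy={e_local}")
print(f"global: time={t_global}, energy={e_global}")
```

Under these assumptions both policies finish the step at the same time, but the globally coordinated one uses less energy, because the local policy spends power on speed that is then wasted waiting at the barrier.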

Page 6: Power is Leading Design Constraint

Research Agenda
• Power Performance monitoring & aggregation that scales to 1B+ core system
• Control system that spans system software stack that can disseminate control decisions across 1B+ cores
• Scalable control algorithms to bridge gap between global and local models
  – analytical power models of system response
  – empirical models based on advanced learning theory
• Optimally tune system based on control loop
  – Comprehensive instrumentation that connects to the control system
  – Need Declarative objective function specification for control system
  – Both online and offline tuning options based on advanced search pruning heuristics
• Effective power-aware and scalable resource control
  – Managing heterogeneous computing resources at OS level
  – Manage data movement and locality in memory hierarchy
  – Adaptable software to handle diversity of hardware features/designs
• Power instrumentation & control standardization
  – For coordination of international effort
  – For horizontal integration (e.g. so library components can interoperate effectively)
  – For vertical integration (e.g. so that local DVFS coordinates with global system scheduling)
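
One common way to make monitoring and aggregation scale is a reduction tree, where each aggregator merges a bounded number of summaries, so the depth grows only logarithmically with the core count. The sketch below is a single-process stand-in for that idea. The `Summary` record, the fan-out value, and the function names are hypothetical, and a real system would distribute the levels across nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Fixed-size digest of a group of power samples."""
    count: int
    total_watts: float
    peak_watts: float


def leaf(watts):
    """Wrap one per-core sample as a summary."""
    return Summary(1, watts, watts)


def merge(parts):
    """Combine child summaries; the result is the same size as any one input."""
    return Summary(
        count=sum(p.count for p in parts),
        total_watts=sum(p.total_watts for p in parts),
        peak_watts=max(p.peak_watts for p in parts),
    )


def aggregate(samples, fan_out=32):
    """Reduce samples level by level; return (system summary, number of levels)."""
    if not samples:
        raise ValueError("need at least one sample")
    level = [leaf(w) for w in samples]
    levels = 0
    while len(level) > 1:
        # Each aggregator at this level merges at most fan_out children.
        level = [merge(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
        levels += 1
    return level[0], levels


summary, levels = aggregate([1.0] * 1000 + [5.0], fan_out=10)
print(summary, "levels:", levels)
```

Because every merge produces a fixed-size summary, no single aggregator ever handles more than `fan_out` messages, which is the property a flat, centralized collector lacks.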

Page 7: Power is Leading Design Constraint

Cross-Cutting Research Agenda
• Resource Management: OS and system management services
  – Policy description (standardized) to do fine-grained management on chip
  – Standardized monitoring interfaces for energy & resource utilization (PAPI for energy)
  – Standardized models of HW power impact and algorithm performance to make logistical decisions (when/where to move computation + response to adaptations)
• Algorithms: base order of complexity on energy cost of operations rather than #flops
  – Communication-avoiding algorithms (how much to trade off FLOPS for communication before it doesn't work)
  – Enable libraries to be annotated for parameterized model of energy to articulate a policy to manage those trade-offs (different architectures)
  – Standardized approach to lightweight models to predict response to resource adjustment
• Libraries: how do you build standardized energy efficiency models / management interfaces into SW libraries (software engineering)
  – How do you make sure SCALAPACK libraries use policy & strategy description & controls that are compatible with FFTW
• Compilers: automagically instrument code for programmability
  – Automatically expose “knobs for control” and “sensors” for monitoring
  – How to automatically generate models to predict response to resource adaptation
• Applications: effective declarative annotations to convey application characteristics and requirements
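
The Algorithms bullet above asks how far flops can be traded for communication before the trade stops paying off. A minimal parameterized energy model makes the question concrete: charge each flop and each byte moved a per-operation energy and compare variants. The per-operation energies and the operation counts below are hypothetical placeholders, not measurements; on a real machine they would come from whatever standardized energy model the slide calls for.

```python
E_FLOP_PJ = 20   # hypothetical energy per floating-point operation (picojoules)
E_BYTE_PJ = 500  # hypothetical energy per byte moved between nodes (picojoules)


def energy_pj(flops, bytes_moved):
    """Energy of a kernel under a two-parameter linear cost model."""
    return flops * E_FLOP_PJ + bytes_moved * E_BYTE_PJ


def break_even_extra_flops(bytes_saved):
    """Extra flops that can be spent before the communication saving is used up."""
    return bytes_saved * E_BYTE_PJ / E_FLOP_PJ


# Baseline kernel vs. a communication-avoiding variant that recomputes some values.
baseline = energy_pj(flops=1_000_000, bytes_moved=100_000)
comm_avoiding = energy_pj(flops=1_500_000, bytes_moved=40_000)

print("baseline       :", baseline, "pJ")
print("comm-avoiding  :", comm_avoiding, "pJ")
print("break-even at  :", break_even_extra_flops(100_000 - 40_000), "extra flops")
```

With these placeholder numbers the variant does 50% more arithmetic yet uses less energy, and it would keep winning until the extra flops exceed the break-even count. Ranking algorithms by flop count alone would have picked the wrong one.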

Page 8: Power is Leading Design Constraint

What Happens If We Do Nothing?
• HPC system power will be infeasibly large
  – 100+ Megawatts by DARPA Projections

or

• Design trade-offs to keep power under control will
  – Narrow application scope
  – Reduce delivered performance

Page 9: Power is Leading Design Constraint

Metrics / Benefits
• Performance: Reduce power without having corresponding impact on performance
• Programmability: The applications people cannot be expected to manage power explicitly
  – Transparency requires support from compiler, libraries, and system
• Composability: SCALAPACK must be able to work with FFTW
• Minimize number of incompatible ad-hoc approaches
• Organize international effort
• Scalability: Must be able to use common infrastructure for OS, system level resource manager, and applications for unified strategy to meet objectives
• Useful to embedded, departmental AND Exascale systems

Page 10: Power is Leading Design Constraint

Priority Research Direction for Power/Energy (PE) Efficiency Cross-Cut

Key challenges

• Power Performance monitoring & aggregation that scales to 1B+ core system
• Control system that can disseminate control decisions across 1B cores
• Scalable control algorithms to bridge gap between global and local models
• Optimally tune system based on control loop
• Power-aware and scalable resource control
• Power instrumentation & control standardization

Power Efficiency: is leading design constraint, but optimization strategy is complex objective

Scalability: chip, node, system level objectives

Optimal control: requires accurate predictive models

Integration: cannot make policy decision without integrated & cohesive control, prediction, and monitoring approaches

Energy Efficiency: Apply power exactly where needed (reduces total power)

Performance: With power constraint, apply power where it matters most for performance

Programmability: achieve these objectives without huge additional effort from apps.

Makes delivery of exascale system feasible

Active Power management reduces design trade-offs that limit delivered application performance

Broader impact across entire HPC/server industry

Local optimizations can see impact in 2-4 years and comprehensive system level benefit in 5-10 years

Summary of research direction

Potential impact on software component Potential impact on usability, capability, and breadth of community

Page 11: Power is Leading Design Constraint

4.4.2 Power/Energy Efficiency Adaptation

Roadmap figure: power reduction over baseline, timeline 2010 to 2019
• Baseline: energy monitoring interface standards
• Factor of 1.5x: OS-level/node-level energy efficiency adaptation
• Factor of 2x: compatible energy-aware libraries and standardized interfaces
• Factor of 5x power reduction: automated code instrumentation (compilers and code generators)
• Factor of 10x: automated system-level adaptation for energy efficiency

Page 12: Power is Leading Design Constraint

Extra

Page 13: Power is Leading Design Constraint

Research Problems
• Optimal Control: (sensors and actuators)
  – Need to define policy objectives
    • More complex than just “reduce power”
    • Describe trade-off space and express it to control system
  – Model to accurately predict effect of actuators on performance and power
    • Need to be able to predict energy impact of any change
    • Need standard method for expressing predictive model
  – Must have accurate, scalable and standardized interfaces to monitor response to model-driven adaptation (predictor/corrector method)
• Dynamic Response
  – Explicit software control is not fast enough (need to define as policy)
  – Must have standardized approach for expressing policy
  – Need scalable approach to data reduction to enable fast policy decisions
  – Need scalable approaches for strategy optimization: optimizing energy efficiency is itself a daunting optimization problem
• Scaling: Commodity market will give us chip-level adaptation
  – Handle fine-grained (chip level), node level, and system level policy
  – Requires standardization of interfaces to express policy, model and collect sensor data to enable unified response strategy to achieve objective
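
The predictor/corrector idea mentioned under Optimal Control can be sketched as a loop that uses a model to choose an actuator setting, measures the response, and then corrects the model from the measurement. The example below assumes, purely for illustration, a linear watts-per-GHz model and a simulated sensor. `make_plant`, `control_loop`, and every number are hypothetical.

```python
def make_plant(true_gain):
    """Stand-in for real hardware: watts drawn at a given frequency setting."""
    return lambda freq: true_gain * freq


def control_loop(measure, target_watts, gain_guess, steps=5, blend=0.5):
    """Predictor/corrector loop for holding a power target.

    gain is the model's estimate of watts per GHz. Each step predicts the
    setting that should hit the target, applies it, reads the sensor, and
    moves the estimate part of the way toward what was observed.
    """
    gain = gain_guess
    history = []
    for _ in range(steps):
        freq = target_watts / gain              # predictor: setting the model expects to hit the target
        measured = measure(freq)                # sensor reading after actuation
        observed_gain = measured / freq
        gain += blend * (observed_gain - gain)  # corrector: pull the model toward the observation
        history.append((freq, measured))
    return gain, history


gain, history = control_loop(make_plant(true_gain=50.0), target_watts=100.0, gain_guess=40.0)
for freq, watts in history:
    print(f"set {freq:.3f} GHz -> measured {watts:.1f} W")
print(f"corrected model gain: {gain:.2f} W/GHz")
```

The first step overshoots the target because the initial model is wrong; later steps close in as the model is corrected. A real system would need the standardized sensor and actuator interfaces the slide asks for in place of the simulated plant.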

Page 14: Power is Leading Design Constraint

What are the Problems
• Scalability
  – Depth and Breadth (horizontal & vertical integration)
  – Diversity in scale and response time is nontrivial
• Optimality
  – Devices can only make local decisions
  – Optimal local decisions are not optimal for the global system
  – Data assimilation to make global decisions requires software
• Responsiveness
  – Software cannot make decisions fast enough
  – Data assimilation for control decisions is a huge problem
  – Optimal point of control is not easy to find