Power Reduction Techniques for Microprocessor Systems by Timothy Goldberg
Paper by: Vasanth Venkatachalam and Michael Franz, published 2005
Power Consumption and its Importance
• Saving Power: save money, save electricity, save the planet
• Heat Dissipation: heat density and cooling
• Battery Life: use less energy, extend battery running time
Outline
• Definition of Power and Energy
• Power Reduction Techniques
  – From the circuit level through hardware to compiler- and application-level techniques
• Commercial Systems
• Emerging Technologies
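Power and Energy
• Need to reduce both
  – Power = Work / Time: affects heat
  – Energy = Power * Time: affects battery
• Dynamic Power Consumption: circuit activity
  – Switched capacitance (depends on supply voltage V, clock frequency f, capacitance C, and activity factor a)
    • Clock gating reduces switching activity
  – Short-circuit current: flows while complementary transistors briefly conduct at the same time during switching (10-15% of total power)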
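The dependence of switched-capacitance power on V, f, C, and a is commonly written as P = a * C * V^2 * f. The sketch below is only an illustration of that relation and of Energy = Power * Time; the function names and the numeric values are hypothetical and are not taken from the paper.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, freq_hz):
    """Switched-capacitance power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def energy_j(power_w, seconds):
    """Energy = Power * Time."""
    return power_w * seconds


# Hypothetical operating points: halve both the voltage and the frequency.
full = dynamic_power_w(0.5, 1e-9, 1.2, 2e9)
half = dynamic_power_w(0.5, 1e-9, 0.6, 1e9)

# The same work takes twice as long at half the frequency.
full_energy = energy_j(full, 1.0)
half_energy = energy_j(half, 2.0)

print(f"power ratio:  {full / half:.1f}")
print(f"energy ratio: {full_energy / half_energy:.1f}")
```

Under these assumptions, power falls by roughly a factor of eight while energy for the same work falls by roughly a factor of four, which is why voltage reduction matters more than frequency reduction alone.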
Power and Energy
• Leakage Power Consumption: static/idle power
  – Depends on supply voltage and leakage current
  – Sub-threshold leakage depends on supply voltage, threshold voltage, and temperature
    • Reduce voltage, use fewer transistors, increase threshold voltage
Power Reduction
• From low-level circuit changes
• Low-Power Interconnect
• Memories and Memory Hierarchies
• Hardware/Architecture
• Dynamic Voltage Scaling
• Resource Hibernation
• Compiler
• Application
• Cross-layer
Circuit and Logic Level Techniques
• Transistor Sizing: reduce transistor width
  – Less dynamic power consumption, but increased delay
• Transistor Reordering: minimize switching activity
  – Place frequently switching transistors closer to the circuit's outputs
• Logic Gate Restructuring: reduce switching
  – Gates should receive their inputs at the same time to avoid spurious transitions
Circuit and Logic Level Techniques
• Technology Mapping: software tools
  – Find the best configuration, based on constraints
  – Build the circuit out of logic gates so as to minimize total power consumption
  – NP-hard DAG problem
• Low-Power Flip-Flops
  – Self-gating flip-flop: reduces switching activity
  – Dual-edge-triggered flip-flop: reduces power dissipated by the clock signal
Circuit and Logic Level Techniques
• Low-Power Control: processor as an FSM
  – Activate only the circuitry needed for the currently executing sub-FSM
• Delay-Based Dynamic Supply Voltage
  – A look-up table of voltages and clock speeds must assume the worst case
  – Instead, adjust the voltage based on the measured delay and monitor for errors
    • Requires more hardware (shadow latches)
Low-Power Interconnect
• Bus Encoding: inversion to reduce switching
• Crosstalk: caused by activity in neighboring wires (mitigated with a shield wire)
• Low-Swing Buses: +300 mV and -300 mV instead of +5 V and -5 V
  – Immune to crosstalk, but increased hardware at encoder and decoder
• Bus Segmentation: allows most of the bus to remain powered down when not communicating
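The inversion idea behind bus encoding can be sketched as follows. This is a minimal illustration of bus-invert coding under stated assumptions (an 8-bit bus that starts at zero, plus one extra invert line); the function names are hypothetical and the slides do not say which encoding variant the paper has in mind.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Send each word inverted when that toggles fewer data lines."""
    mask = (1 << width) - 1
    on_bus = 0  # assumption: the data lines start at all zeros
    encoded = []
    for word in words:
        word &= mask
        if hamming(on_bus, word) > width // 2:
            sent, invert = ~word & mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        on_bus = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Undo the inversion using the invert line."""
    mask = (1 << width) - 1
    return [(~sent & mask) if invert else sent for sent, invert in encoded]


def count_transitions(values, start=0):
    """Total number of line toggles when values are sent in order."""
    total, previous = 0, start
    for value in values:
        total += hamming(previous, value)
        previous = value
    return total


words = [0x00, 0xFF, 0x00, 0xF0]
encoded = bus_invert_encode(words)
raw_toggles = count_transitions(words)
data_toggles = count_transitions([sent for sent, _ in encoded])
invert_toggles = count_transitions([invert for _, invert in encoded])
print(raw_toggles, data_toggles + invert_toggles)
```

The extra invert line and the encoder/decoder logic are the hardware cost paid for the reduced switching.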
Low-Power Interconnect
• Adiabatic Buses: reuse existing charge
  – Reduce total capacitance
  – Delay in transferring charge
• Network-on-Chip
  – Functional units sharing buses: shared buses limit the speed and volume of transfers
  – Generic interconnection networks replace buses
    • Concurrent connections
Low-Power Memories and Memory Hierarchies
• Reduce power regardless of memory type (ROM/RAM)
• Split memories into smaller subsystems: activate only the circuits needed for each access
• Specialized caches to reduce accesses
  – Placed before the first cache level; store the application's working set
  – Block Buffering: store the most recently accessed cache set
  – Scratch-Pad Memories: contents determined by the compiler
• Trace Cache: stores instructions in executed order
  – Dynamic direction prediction-based trace cache
  – Selective Trace Cache: compiler helps
Low-Power Processor Architecture Adaptations
• Adaptive Caches: lines, blocks, or sets selectively activated based on a miss threshold
  – Removing the supply voltage entirely loses data and adds delay
• Cache Decay: turns off unused cache lines after an interval
• Hot Spot Detection: count taken branches, activate cache lines within the hotspot
• Dead Block: powers down cache lines containing basic blocks that have reached their final use (compiler-directed)
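Cache decay can be pictured with a per-line idle counter. The sketch below is a simplified software model under stated assumptions (one counter per line, a single fixed decay interval measured in ticks, data lost when a line is powered off); the class and method names are hypothetical and do not describe any specific hardware design.

```python
class DecayCache:
    """Toy model: a line is powered off after decay_interval idle ticks."""

    def __init__(self, num_lines, decay_interval):
        self.decay_interval = decay_interval
        self.idle_ticks = [0] * num_lines
        self.powered = [False] * num_lines

    def access(self, line):
        """Touch a line; returns True if it was still powered (data kept)."""
        was_powered = self.powered[line]
        self.powered[line] = True
        self.idle_ticks[line] = 0
        return was_powered

    def tick(self):
        """Advance time by one interval and power off decayed lines."""
        for line, is_on in enumerate(self.powered):
            if not is_on:
                continue
            self.idle_ticks[line] += 1
            if self.idle_ticks[line] >= self.decay_interval:
                self.powered[line] = False  # contents are lost here


cache = DecayCache(num_lines=4, decay_interval=3)
first_touch = cache.access(0)   # line was off
cache.tick()
cache.tick()
second_touch = cache.access(0)  # reused before the interval expired
for _ in range(3):
    cache.tick()
still_on = cache.powered[0]     # idle for the full interval
print(first_touch, second_touch, still_on)
```

A shorter interval saves more leakage but turns more reuses into misses, which is the data-loss and delay trade-off noted above.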
Architecture Adaptations
• Adaptive Instruction Queues: partitions powered down when not needed
  – Heuristics: measure IPC, with thresholds
• Algorithms for Reconfiguring Multiple Structures
  – Adjust pipeline width and register update unit for hotspots
  – Test configurations within a hotspot
  – Offline profiling
  – Occupancy-based
• Selective Way Caches: measure cache hits in each way
Dynamic Voltage Scaling
• Modulate clock frequency and supply voltage
• Dynamic, depending on workload
• Difficulties:
  – Unpredictable workloads (tasks and I/O requests, predicting run time)
  – Indeterminism: how to decide how fast?
  – Running an application at the slowest speed may not be best
  – Non-linear effect of frequency
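One way to see why the slowest speed may not be best is a toy energy model in which dynamic power scales with V^2 * f while a fixed static power is drawn for as long as the task runs. Everything in this sketch is an assumption made for illustration: the linear voltage-to-frequency relation, the constants, and the function name are hypothetical and not taken from the paper.

```python
def task_energy_j(freq_ghz, cycles=1e9, c_eff=1e-9,
                  volts_per_ghz=0.6, static_w=0.3):
    """Energy to run a fixed number of cycles at one frequency."""
    freq_hz = freq_ghz * 1e9
    voltage = volts_per_ghz * freq_ghz   # assumed: V scales linearly with f
    dynamic_w = c_eff * voltage ** 2 * freq_hz
    seconds = cycles / freq_hz           # slower clock means longer run time
    return (dynamic_w + static_w) * seconds


candidate_ghz = [0.4, 0.8, 1.2, 1.6, 2.0]
best_ghz = min(candidate_ghz, key=task_energy_j)

for f in candidate_ghz:
    print(f"{f:.1f} GHz: {task_energy_j(f):.3f} J")
print("lowest energy at", best_ghz, "GHz")
```

With these made-up constants the minimum falls at an intermediate frequency, because below it the static energy accumulated over the longer run time outweighs the dynamic savings.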
Dynamic Voltage Scaling
• Interval-Based Approaches: measure how busy the processor was and estimate future load
  – Workloads are not regular
  – Idling with a threshold; thrashing
  – Aged averages: weighted intervals
• Intertask Approaches: assign speeds to different tasks
  – Monitor hardware events
  – Task frequencies generated offline; cannot be known perfectly beforehand
  – Unaware of program structure, such as memory accesses
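An interval-based policy with aged averages might look like the sketch below. It is a simplified illustration, not the algorithm of any particular paper: the geometric decay factor, the utilization thresholds, the discrete speed levels, and the function names are all hypothetical.

```python
SPEED_LEVELS = [0.25, 0.5, 0.75, 1.0]  # assumed fractions of full speed


def aged_average(utilizations, decay=0.5):
    """Weighted mean of past intervals; recent intervals weigh more."""
    recent_first = list(reversed(utilizations))
    weights = [decay ** age for age in range(len(recent_first))]
    weighted = sum(w * u for w, u in zip(weights, recent_first))
    return weighted / sum(weights)


def next_level(level, predicted_util, low=0.5, high=0.7):
    """Step the speed level up or down based on predicted utilization."""
    if predicted_util > high:
        return min(level + 1, len(SPEED_LEVELS) - 1)
    if predicted_util < low:
        return max(level - 1, 0)
    return level


history = [0.2, 0.3, 0.9, 0.95]  # busy fraction of the last four intervals
prediction = aged_average(history)
level = next_level(1, prediction)
print(f"predicted utilization {prediction:.2f}, speed {SPEED_LEVELS[level]}")
```

The gap between the low and high thresholds is there to limit thrashing between adjacent speeds when utilization hovers near a single cut-off.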
Dynamic Voltage Scaling
• Intratask Approaches: adjust processor speed and voltage within tasks
  – Split a task into fixed-length time slots
  – Slow down away from the critical path, with help from the compiler
• Memory-Bound Code: memory accesses limit how fast the program can execute
  – Heuristics through experimentation
  – Cache miss counter
  – Stall cycle counter, PC marked as hot
  – Measure rate of instructions; compute-intensive
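The counter-based heuristics for memory-bound code can be sketched as a simple classifier over per-interval hardware counters. The thresholds, the two frequencies, and the function name below are hypothetical placeholders; in practice such values would be found through experimentation, as the slide notes.

```python
def pick_frequency_mhz(instructions, cache_misses, stall_cycles, total_cycles,
                       miss_threshold=0.02, stall_threshold=0.5,
                       low_mhz=600, high_mhz=1600):
    """Run slowly in memory-bound intervals, fast in compute-bound ones."""
    if instructions == 0 or total_cycles == 0:
        return low_mhz  # nothing executed, so no reason to run fast
    misses_per_instruction = cache_misses / instructions
    stall_fraction = stall_cycles / total_cycles
    memory_bound = (misses_per_instruction > miss_threshold
                    or stall_fraction > stall_threshold)
    return low_mhz if memory_bound else high_mhz


# A compute-heavy interval and a miss-heavy interval (made-up counts).
compute_phase = pick_frequency_mhz(1_000_000, 2_000, 100_000, 1_200_000)
memory_phase = pick_frequency_mhz(1_000_000, 60_000, 900_000, 1_500_000)
print(compute_phase, memory_phase)
```

The reasoning is that while the processor waits on memory, a faster clock mostly adds stall cycles, so lowering the frequency costs little performance.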
Dynamic Voltage Scaling
• Multiple Clock Domain Architectures
  – Globally Asynchronous, Locally Synchronous (GALS) chip: split into multiple domains with independent clock rates
  – Allows sections of the CPU to scale down when not needed
  – Must be divided so that communication between domains doesn't waste more energy than is saved
  – Can scale voltage based on instruction issue queues
Resource Hibernation
• Disk Drives: stop rotating the platter during idle periods
  – Spin down after an acceptable idle threshold
  – Delay non-urgent requests in a queue
  – Dynamic RPM drives for servers
• Network Interfaces: can the interface be turned off?
  – Track idleness of devices, enter listening or sleep mode
  – Allow the network card to remain idle for a while before shutting down
• Displays: dim the display when there is no input
  – FaceOff: recognize a face in front of the display
  – Zoned Backlighting: adjust brightness of display regions
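For the disk-drive case, one plausible way to reason about an acceptable idle threshold is the break-even time: the idle period beyond which spinning down saves energy despite the cost of the spin-down and spin-up transition. The sketch below assumes a simple two-state model; the power and transition figures and the function names are hypothetical, not measurements of any real drive.

```python
def break_even_seconds(idle_w, standby_w, transition_j, transition_s):
    """Idle duration at which spinning down costs the same as staying up.

    Staying up for t seconds costs idle_w * t.
    Spinning down costs transition_j plus standby_w for the remaining
    (t - transition_s) seconds. Setting the two equal and solving for t:
    """
    return (transition_j - standby_w * transition_s) / (idle_w - standby_w)


def should_spin_down(predicted_idle_s, idle_w, standby_w,
                     transition_j, transition_s):
    """Spin down only when the predicted idle period exceeds break-even."""
    threshold = break_even_seconds(idle_w, standby_w,
                                   transition_j, transition_s)
    return predicted_idle_s > threshold


# Made-up drive: 2 W idle, 0.25 W standby, 15 J and 4 s per down/up cycle.
threshold_s = break_even_seconds(2.0, 0.25, 15.0, 4.0)
print(f"break-even idle time: {threshold_s:.1f} s")
print(should_spin_down(30.0, 2.0, 0.25, 15.0, 4.0))
print(should_spin_down(5.0, 2.0, 0.25, 15.0, 4.0))
```

Queuing non-urgent requests lengthens idle periods, which makes it more likely that they exceed this threshold.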
Compiler-Level Power Management
• Code that reduces execution time
  – No fixed relationship between performance and power
• Reduce memory accesses
• Remote Compilation and Remote Execution
  – Server compiles and the mobile device downloads
  – Cost of download must be less than the cost of compiling
• Statically Optimized Compilers
  – Program's runtime behavior may differ from what was expected
  – Program will run on an unpredictable system
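The remote-compilation condition, that downloading must cost less than compiling locally, reduces to a single energy comparison. The sketch below is a hypothetical illustration: the per-byte radio cost, the request overhead parameter, and the function name are assumptions, not figures from the paper.

```python
def should_compile_remotely(local_compile_j, binary_bytes,
                            radio_j_per_byte, request_j=0.0):
    """True when fetching the compiled code costs less energy than compiling."""
    download_j = request_j + binary_bytes * radio_j_per_byte
    return download_j < local_compile_j


# Made-up numbers: a 200 kB binary at 10 microjoules per received byte.
print(should_compile_remotely(5.0, 200_000, 1e-5))  # costly local compile
print(should_compile_remotely(1.0, 200_000, 1e-5))  # cheap local compile
```

The same comparison applies to remote execution, with the cost of sending inputs and receiving results in place of the download cost.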
Compiler-Level Power Management
• Dynamic Compilation: program recompiled as the runtime environment changes
  – Resource levels such as battery capacity and energy budgets
  – Trade-off: cost of recompilation
Application-Level Power Management
• Enable applications to adapt to the runtime environment
  – Trade off fidelity or quality of data delivered to users
  – Lower QoS when resources are low
• Interfaces that allow applications to provide hints
  – Allow the application to communicate with the OS, and the OS with the hardware
  – Expected execution of tasks, deadlines
  – Better DVS; power down the disk for longer periods of time
Cross-Layer Adaptations
• Forge: integrated power management framework
  – Streams videos at the most efficient QoS level
  – Frequency and voltage scaling, network card interface
• Grace: adaptation framework
  – Global and local adaptations
• Compiler and Operating System interaction
  – Compiler provides a worst-case deadline
  – OS adjusts processor speed to meet the deadline
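The compiler and OS interaction can be sketched as the OS choosing the slowest available frequency that still finishes the compiler's worst-case cycle estimate before the deadline. This is a minimal sketch under stated assumptions: the function name, the discrete frequency list, and the fallback to the fastest setting are hypothetical choices, not the scheme described in the paper.

```python
def slowest_feasible_mhz(worst_case_cycles, seconds_to_deadline,
                         available_mhz):
    """Lowest frequency that completes the worst case before the deadline."""
    if seconds_to_deadline <= 0:
        return max(available_mhz)  # already late: run as fast as possible
    required_mhz = worst_case_cycles / seconds_to_deadline / 1e6
    feasible = [f for f in sorted(available_mhz) if f >= required_mhz]
    # If no setting is fast enough, fall back to the fastest one.
    return feasible[0] if feasible else max(available_mhz)


settings = [200, 400, 600, 800, 1000]
print(slowest_feasible_mhz(3e8, 0.5, settings))  # needs 600 MHz
print(slowest_feasible_mhz(1e8, 1.0, settings))  # needs only 100 MHz
print(slowest_feasible_mhz(9e8, 0.5, settings))  # infeasible: use fastest
```

Re-evaluating this choice as the task progresses lets the OS slow down further whenever the actual execution runs ahead of the worst case.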
Conclusion of Techniques
• Multifaceted effort from various disciplines
• From transistors to applications, and across all layers
• Research is still ongoing: new algorithms and heuristics
• Impossible to tell which new technologies will prove most successful
Commercial Systems
• Pentium 4: high-performance goal
  – Internal temperature cap
  – Intel SpeedStep: 2 frequency and voltage settings
• Pentium M: mobile performance and low power
  – Reduce switching activity in circuits, idle units, and buses
  – Low-leakage transistors in cache
  – Enhanced SpeedStep with 6 frequency/voltage settings
• Intel PXA27x: wireless handheld devices
  – Uses memory boundedness to manage power modes
Emerging Radical Technologies
• Fuel Cells to replace batteries
  – Chemical reaction, but can supply energy indefinitely as long as fuel is supplied
  – Fuel enters the anode, splits into protons and electrons, and generates charge
  – Fuel is abundantly available, such as hydrogen
• Micro-Electro-Mechanical Systems (MEMS)
  – Convert mechanical to electrical energy
  – Millimeter-scale turbine engines that ignite air with fuel
  – Concerns: hot exhaust gases and flammability