Packetizing scalable streams in heterogeneous peer to-peer networks
HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for...
Transcript of HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for...
![Page 1: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/1.jpg)
HCAPP: Scalable Power Control for Heterogeneous 2.5DIntegrated Systems
Kramer Straube†, Jason Lowe-Power*, Christopher Nitta*, Matthew Farrens*, Venkatesh Akella†
†Department of Electrical & Computer EngineeringUniversity of California, Davis
*Department of Computer ScienceUniversity of California, Davis
![Page 2: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/2.jpg)
Summary• 2.5D systems are limited by the available package
pins– Many of these pins are used for supplying power
• Increasing utilization of these pins enables higher performance
• HCAPP– ensures a maximum power (eg. package pin limitation)– decoupled control through the power supply network
• 21% - 43% geomean speedup (on-die and off-die time constraint)
![Page 3: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/3.jpg)
Background
• Next gen computation speedup requires heterogeneous machines– Currently CPU+GPU systems exist (Summit and Sierra)
• Accelerators provide speedups that are not reliant on Moore’s law
• 2.5D integration (shared interposer + specialized dies) allows increased scalability
![Page 4: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/4.jpg)
Motivation
• New problem: multiple dies share single set of package pins
• Increasing need for power and IO (via package pins)
Image from A Case for Packageless Processors
![Page 5: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/5.jpg)
Background
• Power behavior is very bursty – short high-activity periods followed by longer low-activity
periods– P = CV2 f
• Keeping the power below the limit = Power Capping
![Page 6: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/6.jpg)
Large wasted provisioned power
Background
LU Decomposition power consumption at 700 MHz
![Page 7: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/7.jpg)
Background
• Required power behavior detailed by the power limit specification– Acceptable power level (50 watts)– Acceptable time window (20 µs)
• Time windows dictated by which component will fail first – ~20 µs for package pins or ~1 ms for an external voltage
regulator (VR)
![Page 8: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/8.jpg)
Current Approaches
• Centralized controllers: – RAPL/TurboBoost [Intel]
• Software-based control:– Isci et al [MICRO39], Joao et al [SIGARCH ’13],
Lefurgy et al [Cluster Computing v11 ‘08], SW response time is >50 µs (too slow!)
• Heterogeneous systems:– Harmonia [SIGARCH ’15], Komoda et al [ICCD ‘13],
DynaCo [SC ‘13],, Pupil [ASPLOS ‘16], Co-Cap [SAC ’16]– Focused on saving energy or Software-based (too slow!)
![Page 9: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/9.jpg)
Background
• SoCs have many components (processor cores, GPU compute units, etc) in a single package– Nvidia Volta V100 has 5120 CUDA cores (in 80 streaming
multiprocessors)• Large scale means difficult communication for
centralized controllers– Lots of global wires or use bus
![Page 10: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/10.jpg)
Background
• SoCs have many different components– CPUs, GPUs, accelerators, FPGAs, etc.– Hard to create a single central algorithm for all
combinations
![Page 11: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/11.jpg)
Problem Definition
• How can we take advantage of bursty power consumption to improve performance on average?– Bring average and peak power closer together– Steer the power to where it is needed
• How can we ensure that the approach will scale as 2.5D systems get larger and larger, and support heterogeneity?– Cannot have separate communication for each unit– Enable swappable support for different architectures
HCAPP: Heterogeneous Constant Average Power Processing
![Page 12: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/12.jpg)
Design Requirements
Requirement Reason
Scalable to many components
Needs to work for larger and larger designs in the future
Support multiple architectures
Needs to enable multiple different configurations of dies in the 2.5D system
Maintains power cap Power limit must be upheld for system viability
Uses extra average power
Use as much power as possible since it is already provisioned
Fast reaction time Power cap must be maintained over short time step (~20 µs)
![Page 13: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/13.jpg)
HCAPP Design
![Page 14: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/14.jpg)
Design
Global controller: maintain power cap through voltage ctrl
![Page 15: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/15.jpg)
Design
Domain controller: Scale voltage for die, SW interface
![Page 16: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/16.jpg)
Design
Local controller: Use local metric to improve efficiency
![Page 17: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/17.jpg)
Design
Step 1: Activity change in a component
![Page 18: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/18.jpg)
Design
Step 2: Power draw propagates back to global VR
![Page 19: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/19.jpg)
Design
Step 3: Global VR senses new current draw
![Page 20: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/20.jpg)
Design
Step 4: Global controller calculates next voltage (PID)
![Page 21: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/21.jpg)
Design
Step 5: Global VR assigns new global voltage
![Page 22: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/22.jpg)
Design
Step 6: Global voltage propagates to domain VR
![Page 23: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/23.jpg)
Design
Step 7: Domain VR senses new global voltage and current
![Page 24: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/24.jpg)
Design
Step 8: Domain ctrl calculates new domain voltage
![Page 25: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/25.jpg)
Design
Step 9: Domain VR applies new domain voltage
![Page 26: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/26.jpg)
Design
Step 10: New domain voltage propagates to component
![Page 27: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/27.jpg)
Design
Step 11: New local voltage determined from domain voltage and local controller
![Page 28: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/28.jpg)
Design
Step 12: Component uses new local voltage and frequency
![Page 29: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/29.jpg)
Design
• PID Tuning– Done manually with general methodology– First, increase proportional (KP)– Then, increase integral until steady state error is
acceptable (KI)– Derivative component not used in this controller
![Page 30: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/30.jpg)
Component-Specific Design
• Local controllers designed to take advantage of local architecture metrics (such as IPC or warp occupancy)
• Scale voltage locally based on metrics to push power to components that need the power
• Used high IPC (CPU) and dynamic warp controllers (GPU) from CAPP and GPU-CAPP work
![Page 31: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/31.jpg)
Global Controller Speed
Component Response time (ns)
Voltage Regulator (36-226)x2 = 72-452
Sensing Circuitry 50-60
Controller 10-30
Power Supply Network (3-15)x5 = 15-75
Total 147-617
HCAPP Cycle Time 1000
![Page 32: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/32.jpg)
Design Summary
Requirement HCAPP Related Feature Status
Scalable to many components
Decentralized control through power network PASS
Support multiple architectures
Architecture-specific domain controller and local controller logic
PASS
Maintains power cap PID power control tuned to ensure cap PASS
Uses extra average power
PID power control increases voltage when power is below cap
PASS
Fast reaction time Speed of CAPP control is 1 µs PASS
![Page 33: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/33.jpg)
Experimental Setup (System)
• System was defined as:– 1 CPU– 1 GPU– 1 SHA Accelerator
• Focused on execution time of one benchmark run on each starting at the same time– Combinations chosen based on benchmark characteristics
![Page 34: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/34.jpg)
Experimental Setup (Models)
• CPU modeled using Sniper simulator with McPAT power model
• GPU modeled using GPGPUSim with GPUWattch
• Accelerator modeled as SHA Accelerator [Suresh et al, ESSCIRC’18]
![Page 35: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/35.jpg)
Experimental Setup (Benchmarks)
• CPU: PARSEC benchmark subset• GPU: Rodinia benchmark subset• SHA Accelerator
– Analytical model with fixed amount of input work
• Benchmarks selected to create combinations of interesting power behaviors
![Page 36: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/36.jpg)
Experimental Setup
• Baseline: system with a single fixed global voltage and no local controllers
• Comparison systems:– HCAPP with 1µs control period– HCAPP with 100µs control period (RAPL-like equivalent)– HCAPP with 10ms control period (SW equivalent)
• Constraints: 100 W (20 µs window) and 100 W (1 ms window)
![Page 37: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/37.jpg)
HCAPP Maximum Power
RAPL-like and SW-like greatly exceed maximum power20 µs time window
![Page 38: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/38.jpg)
HCAPP Performance
Average speedup of +21%20 µs time window
![Page 39: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/39.jpg)
HCAPP PPE
Provisioned Power Efficiency = Average Power / Power LimitAverage PPE improved from 69.1% to 79.3%20 µs time window
![Page 40: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/40.jpg)
HCAPP Maximum Power
RAPL-like and SW-like still exceed limit, RAPL-like approaches viability1 ms time window
![Page 41: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/41.jpg)
HCAPP Performance
Average speedup of +43% (compared to 36% for RAPL)1 ms time window
![Page 42: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/42.jpg)
HCAPP PPE
Average PPE improved from 69.1% to 93.9% (RAPL: 79.7%)1 ms time window
![Page 43: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/43.jpg)
HCAPP SW Interface
Simple SW prioritization results in average speedups of +8.3% (CPU), +5.4% (GPU), and +12.0% (SHA)
![Page 44: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/44.jpg)
Final Thoughts
HCAPP is a power management architecture that can:• Manage heterogeneous systems• Scale with increasingly large systems in a single
package• Maximize performance under a power limit
Application Pin power limit VR power limit
Speedup +21% +43%
PPE +10% +35%
![Page 45: HCAPP: Scalable Power Control for Heterogeneous 2.5D ... · HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems Kramer Straube†, Jason Lowe-Power*, Christopher](https://reader036.fdocuments.net/reader036/viewer/2022071001/5fbdb96720175d6cb954b52a/html5/thumbnails/45.jpg)
Thank you for watching