Accurate Power and Energy Measurement on Kepler-based Tesla GPUs

Page 1

Accurate Power and Energy Measurement on Kepler-based Tesla GPUs

Martin Burtscher
Department of Computer Science

Page 2

Introduction

GPU-based accelerators
  Quickly spreading in PCs and even handheld devices
  Widely used in high-performance computing

Power and energy efficiency
  Heat dissipation is a problem
  Electric bill and battery life are of growing concern
  Exascale requires a 50x boost in performance per watt

Important research area
  Need to develop techniques to reduce power and energy
  Have to be able to measure the power and energy of programs

Page 3

GPU Power Sensors

Hardware
  High-end compute GPUs include power sensors
  For example, K20/K40 Tesla cards have a built-in sensor
  These cards are the target of this talk

Software
  Can query the sensor with the NVIDIA Management Library (NVML)
  http://developer.nvidia.com/nvidia-management-library-nvml
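As a rough illustration of querying the sensor through NVML, the sketch below reads one power sample. It assumes the third-party pynvml Python bindings and an NVML-capable GPU, neither of which is mentioned in the talk. The helper names milliwatts_to_watts and read_gpu_power_watts are made up for this example.

```python
def milliwatts_to_watts(mw):
    """NVML reports board power draw in milliwatts; convert to watts."""
    return mw / 1000.0


def read_gpu_power_watts(device_index=0):
    """Read one power sample from the GPU's built-in sensor via NVML.

    Assumption: the pynvml package is installed and the GPU at
    device_index exposes a power sensor. Not taken from the talk.
    """
    import pynvml  # imported lazily so the helper above works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return milliwatts_to_watts(pynvml.nvmlDeviceGetPowerUsage(handle))
    finally:
        pynvml.nvmlShutdown()
```

On a machine with a supported GPU, calling read_gpu_power_watts() returns the current reading in watts. A real measurement loop would initialize NVML once and reuse the device handle instead of reinitializing for every sample.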

Page 4

Problems

Power sensor data behaves strangely
  Running the same kernel twice yields different energy
    First launch: 114 J, second launch: 147 J (29% more energy)
  Running a kernel 2x as long more than doubles energy
    1x input: 732 J, 2x input: 1579 J (8% above doubling)

Power sensor sampling rate varies greatly
  Sampling interval ranges from 0.266 ms to 130 ms (3760 Hz down to 7.7 Hz)
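The percentages and rates quoted above follow directly from the raw numbers. The short check below reproduces them; the helper names are made up for this sketch.

```python
def percent_over(value, reference):
    """Return how far value exceeds reference, in percent."""
    return (value / reference - 1.0) * 100.0


def interval_ms_to_hz(interval_ms):
    """Convert a sampling interval in milliseconds to a rate in hertz."""
    return 1000.0 / interval_ms


# Same kernel launched twice: 114 J, then 147 J.
second_launch_excess = percent_over(147.0, 114.0)

# Doubling the input should double the energy: 2 * 732 J = 1464 J expected.
doubling_excess = percent_over(1579.0, 2 * 732.0)

# Shortest and longest observed sampling intervals.
fastest_hz = interval_ms_to_hz(0.266)
slowest_hz = interval_ms_to_hz(130.0)

print(f"second launch: {second_launch_excess:.0f}% more energy")
print(f"2x input: {doubling_excess:.0f}% above doubling")
print(f"sampling rate: {slowest_hz:.1f} Hz to {fastest_hz:.0f} Hz")
```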

Page 5

Methodology

Hardware
  Two K20c, two K20m, two K20X, and two K40m GPUs

Measurement
  Query power and time in a loop on an “idle” CPU core

Test code
  Compute-intensive regular n-body kernel
  Constant computation rate of over 2 TFlops on a K20c
  No data dependences; vary n to adjust kernel runtime
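The measurement step, querying power and time in a loop, might look roughly like the sketch below. The sensor query is passed in as a function so the loop can be exercised without a GPU. The names sample_power and fake_sensor are hypothetical, and the constant 25 W reading is made up.

```python
import time


def sample_power(read_power, duration_s, clock=time.perf_counter):
    """Poll read_power() as fast as possible for duration_s seconds.

    Returns a list of (timestamp_s, power_w) pairs. Each reading gets its
    own high-resolution time stamp, so uneven gaps between readings remain
    visible in the recorded data.
    """
    samples = []
    start = clock()
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        samples.append((now - start, read_power()))
    return samples


# Hypothetical stand-in for the real sensor query: a constant 25 W reading.
def fake_sensor():
    return 25.0


trace = sample_power(fake_sensor, duration_s=0.01)
print(len(trace), "samples; first:", trace[0], "last:", trace[-1])
```

Recording a time stamp with every reading, instead of assuming a fixed polling period, is what makes the irregular sampling intervals visible later.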

Page 6

Expected Power Profile

[Figure: expected power profile. Annotations: kernel starts executing; kernel stops executing; GPU idle power; measurement loop runtime.]

Page 7

Measured Power Profile

[Figure: measured power profile showing macroscopic phenomena. Annotations: power ramps up slowly; power ramps down slowly; switch to step shape; idle power reached. Time-span labels: 5 s, 3 s, 4 s.]

Page 8

Energy = Area Under Power Curve

[Figure: measured power curve with the area under it. Annotations: Integrate to where? Unclear how big energy is. Missing energy? Delayed energy?]

Page 9

Ramp-up Behavior of 2 Short Runs

[Figure: measured power profiles of two short runs. Annotations: short run same as longer run; 2nd run starts higher but also follows curve; ramp down doesn’t follow.]

Page 10

Ramp-down Behavior of Several Runs

[Figure: measured power in W (0 to 160) versus shifted runtime in s (16.2 to 23.2) for several runs, with markers t2, t3, and t4. Annotations: shape depends on power at t2; power increases after kernel done; shape always the same; steps down every second; driver lowers power level.]

Page 11

Sampling Interval Lengths

[Figure: sampling interval in ms (0 to 80) and measured power in W (0 to 160) versus runtime in s (10.7 to 23.7), with markers t1, t2, t3, and t4. Annotations: short intervals; wide range of intervals; very long interval; driver activity can prevent sampling.]

Page 12

Sampling Interval Lengths (zoomed-in)

[Figure: sampling interval in ms (0 to 12) and measured power in W (0 to 120) versus runtime in s (12.030 to 12.060). Annotations: identical values; many short intervals; very long interval; sampled power only ever changes after long interval.]
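The observation that many consecutive polls return identical values suggests collapsing such runs before further analysis. The sketch below shows one heuristic way to do that. It is not the talk's method, the function names are hypothetical, and the trace is synthetic.

```python
def fresh_readings(samples):
    """Keep only the samples at which the reported power value changed.

    samples is a list of (timestamp_s, power_w) pairs from a fast polling
    loop. Runs of identical consecutive values are treated as re-reads of
    one sensor update and collapsed to their first sample. This is a
    heuristic: two genuine updates with the same value are merged too.
    """
    kept = []
    for t, p in samples:
        if not kept or p != kept[-1][1]:
            kept.append((t, p))
    return kept


def update_gaps_ms(samples):
    """Time between consecutive fresh readings, in milliseconds."""
    fresh = fresh_readings(samples)
    return [(b[0] - a[0]) * 1000.0 for a, b in zip(fresh, fresh[1:])]


# Synthetic polling trace (made-up numbers): the value changes every 15 ms.
polled = [(0.000, 50.0), (0.001, 50.0), (0.002, 50.0),
          (0.015, 52.0), (0.016, 52.0),
          (0.030, 55.0), (0.031, 55.0)]
print(fresh_readings(polled))
print(update_gaps_ms(polled))
```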

Page 13

Correcting the Measurements

Page 14

Sampling Frequency

Eliminate redundant samples
  Only sample once every 15 ms (66.7 Hz)
  Cannot accurately measure kernels under ~150 ms

Account for the variation in interval length
  Use high-resolution time stamps

Example: energy from t1 to t4
  Dotted (fixed intervals): 1205 J
  Solid (variable intervals): 1066 J
  13% discrepancy

[Figure: measured power in W (0 to 160) versus runtime in s (10.7 to 23.7), with markers t1 and t4, comparing the dotted fixed-interval curve with the solid variable-interval curve.]
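To see why ignoring the variation in interval length distorts the energy, the sketch below integrates one made-up trace twice: once with each sample's own time stamp and once pretending all intervals have the same length. How the talk computed its fixed-interval figure is not stated, so the averaging used here is an assumption, as are the function names.

```python
def energy_variable_intervals(samples):
    """Trapezoidal integration using each sample's own time stamp."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_fixed_intervals(samples):
    """Same trapezoids, but pretending every interval has the mean length."""
    n = len(samples) - 1
    dt = (samples[-1][0] - samples[0][0]) / n
    powers = [p for _, p in samples]
    return sum(0.5 * (a + b) * dt for a, b in zip(powers, powers[1:]))


# Made-up trace: dense samples while power is high, one long gap while low.
trace = [(0.0, 100.0), (0.1, 100.0), (0.2, 100.0), (0.3, 20.0), (1.0, 20.0)]
print(energy_variable_intervals(trace))
print(energy_fixed_intervals(trace))
```

In this trace the fixed-interval estimate is too high because the densely sampled high-power phase is given more time than it actually took.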

Page 15

True Power

Sensor hardware
  Seems to asymptotically approach true power
  Reminiscent of capacitor charging

True instant power
  P_true is a function of the slope of the power profile, dP_sensor/dt, and the power measured by the sensor, P_sensor
  P_true = P_sensor + C × dP_sensor/dt

“Capacitance” of sensor
  C ≈ 0.84 s on all tested K20 GPUs
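A minimal sketch of applying the formula to a sampled trace follows. It approximates the slope at each sample with a central difference over its two neighbors, which is one possible reading of "slope of the power profile" and an assumption of this example. The function name and the synthetic power levels are also made up.

```python
import math

C_SECONDS = 0.84  # sensor "capacitance" reported in the talk for K20 GPUs


def correct_power(samples, c=C_SECONDS):
    """Apply P_true = P_sensor + c * dP_sensor/dt to (time_s, power_w) pairs.

    The slope at each interior sample is a central difference over its two
    neighbors; that choice is an assumption of this sketch. The first and
    last samples have only one neighbor and are dropped.
    """
    corrected = []
    for (t0, p0), (t1, p1), (t2, p2) in zip(samples, samples[1:], samples[2:]):
        slope = (p2 - p0) / (t2 - t0)
        corrected.append((t1, p1 + c * slope))
    return corrected


# Synthetic sensor trace: true power jumps from 25 W to 150 W at t = 0 and
# the sensor approaches it like a charging capacitor (made-up power levels).
idle, active, dt = 25.0, 150.0, 0.015
sensor = [(i * dt, active + (idle - active) * math.exp(-i * dt / C_SECONDS))
          for i in range(200)]

for t, p in correct_power(sensor)[::50]:
    print(f"t = {t:.3f} s  corrected = {p:.2f} W")
```

For this idealized trace the corrected values stay close to the 150 W step even while the raw sensor reading is still ramping up.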

Page 16

Back-calculated from Expected Profile

[Figure: ‘capacitor’ function back-calculated from the expected profile, compared with the measured values.]

‘Capacitor’ function matches measured values perfectly
Minimized absolute errors to determine C
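One way to determine C by minimizing absolute errors is sketched below. It models the sensor's response to a rectangular power step as an exponential approach and picks, by brute-force search, the candidate C with the smallest summed absolute error. The talk does not say how the minimization was carried out, so the grid search, the function names, and the synthetic data are all assumptions.

```python
import math


def modeled_sensor(times, p_start, p_true, c):
    """Capacitor-style response to a power step from p_start to p_true at t = 0."""
    return [p_true + (p_start - p_true) * math.exp(-t / c) for t in times]


def fit_capacitance(times, measured, p_start, p_true, candidates):
    """Return the candidate c with the smallest sum of absolute errors."""
    def total_abs_error(c):
        model = modeled_sensor(times, p_start, p_true, c)
        return sum(abs(m - s) for m, s in zip(model, measured))
    return min(candidates, key=total_abs_error)


# Synthetic "measurement" generated with c = 0.84 s (made-up power levels).
times = [i * 0.015 for i in range(300)]
measured = modeled_sensor(times, 25.0, 150.0, 0.84)

# Brute-force search over 0.50 s .. 1.50 s in 0.01 s steps.
candidates = [0.50 + 0.01 * k for k in range(101)]
best_c = fit_capacitance(times, measured, 25.0, 150.0, candidates)
print(f"best c = {best_c:.2f} s")
```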

Page 17

Corrected Power Profile

[Figure: power in W (0 to 160) versus time in s (13 to 21), with markers t1, t2, and t3. Annotations: wobbles due to sampling errors; corrected profile matches expected rectangular profile; ‘active idle’ power level.]

Page 18

Correction of 2 Short Runs

[Figure: power in W (0 to 160) versus time in s (111 to 119), with markers t1a, t2a, t1b, t2b, and t3b. Annotation: corrected power profile matches expected profile.]

Page 19

Second K20c GPU

[Figure: power in W (0 to 160) versus time in s (16.5 to 23.5), with markers t1, t2, and t3. Annotation: identical to original K20c.]

Page 20

K20m GPU

[Figure: power in W (0 to 180) versus time in s (62.7 to 69.7), with markers t1, t2, and t3. Annotation: similar profile but higher power level.]

Page 21

K20X GPU

[Figure: power in W (0 to 200) versus time in s (128 to 137), with markers t1, t2, and t4. Annotations: profile is good, no correction needed; huge 600 ms gap.]

Page 22

K40m GPU

K40m again requires correction

Page 23

Application to Full CUDA Program

Implementation of Barnes Hut n-body algorithm
Taken from LonestarGPU benchmark suite
Contains multiple regular and irregular kernels
Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
Main kernel is ‘regularized’ (warp-based)

[Image credit: NASA/JPL-Caltech/SSC]

Page 24

Barnes Hut Power Profile (1 Step)

[Figure: original Barnes Hut power profile for one time step. Annotations: slow then fast drop-off; “wave” in profile; original profile is hard to interpret.]

Page 25

Barnes Hut Power Profile (Kernels)

[Figure: original Barnes Hut power profile with the individual kernels indicated. Annotations: slow then fast drop-off; “wave” in profile; original profile is hard to interpret.]

Page 26

Corrected Barnes Hut Power Profile

[Figure: power in W (0 to 160) versus time in s (61.7 to 68.7), with markers a, b, c, d, e, and f. Annotations: decrease due to load imbalance; two similar irregular kernels; one more irregular kernel; very short regular kernel; regularized main kernel; corrected profile reveals important info.]

Page 27

K20Power Tool

Output
  Corrected profile and corresponding ‘active’ energy

Features
  Computes instant power using ‘capacitor’ formula
  Employs high-resolution time steps
  Samples at true frequency of 66.7 Hz

Dissemination
  Open source, research license
  http://cs.txstate.edu/~burtscher/research/K20power/

Page 28

Marcher System

Tool will be part of Marcher system at Texas State
  NSF-funded green computing infrastructure

Marcher is a power-measurable cluster system
  832 general-purpose cores
  12,000 GPU and MIC cores
  1.2 TB of DDR3 with power throttling and scaling
  50 TB of hybrid storage with hard drives and SSDs
  Component-level power measurement tools (e.g., CPU, DRAM, Disk, GPU, Xeon Phi)

Page 29

Summary

Correctly measuring K20/K40 power and energy
  Sample at 66.7 Hz and include time stamps
  Compute true power with presented formula
    Use neighboring power samples to approximate slope
  Compute true energy by integrating true power
    Over intervals where power is above ‘active idle’

K20Power tool
  Software tool that implements this methodology
  Paper at http://cs.txstate.edu/~burtscher/papers/gpgpu14.pdf
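The measurement recipe summarized on this slide can be sketched end to end as follows. This is an illustrative reconstruction under stated assumptions, not the K20Power implementation. The function names, the 30 W ‘active idle’ threshold, and the synthetic trace are made up, and the slope is a central difference over neighboring samples.

```python
import math


def true_power(samples, c=0.84):
    """P_true = P_sensor + c * dP_sensor/dt, slope from neighboring samples."""
    out = []
    for (t0, p0), (t1, p1), (t2, p2) in zip(samples, samples[1:], samples[2:]):
        out.append((t1, p1 + c * (p2 - p0) / (t2 - t0)))
    return out


def active_energy(samples, active_idle_w):
    """Integrate power over intervals whose endpoints both exceed active idle."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p0 > active_idle_w and p1 > active_idle_w:
            joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules


def synthetic_sensor(t, c=0.84, idle=25.0, active=150.0, start=1.0, stop=3.0):
    """Made-up sensor trace for a kernel running from start to stop seconds."""
    if t < start:
        return idle
    if t < stop:
        return active + (idle - active) * math.exp(-(t - start) / c)
    peak = active + (idle - active) * math.exp(-(stop - start) / c)
    return idle + (peak - idle) * math.exp(-(t - stop) / c)


# Sample every 15 ms (66.7 Hz) with explicit time stamps.
trace = [(i * 0.015, synthetic_sensor(i * 0.015)) for i in range(400)]
energy = active_energy(true_power(trace), active_idle_w=30.0)
print(f"active energy: {energy:.1f} J")
```

The synthetic kernel draws 150 W for 2 s, so the corrected active energy should land close to 300 J. Integrating the uncorrected sensor trace over the same window would give a noticeably smaller value.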

Page 30

Acknowledgments

Collaborators
  Ivan Zecena and Ziliang Zong

U.S. National Science Foundation
  DUE-1141022, CNS-1217231, and CNS-1305359

NVIDIA Corporation
  Grants and equipment donations

Texas State University
  Research Enhancement Program
