Power and Performance Characterization of Computational Kernels on the GPU
synergy.cs.vt.edu
Power and Performance Characterization of Computational Kernels on the GPU
Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng
Graphics Processing Units (GPUs) are Powerful
* Data and image source, http://people.sc.fsu.edu/~jburkardt/latex/ajou_2009_parallel/ajou_2009_parallel.html
GPUs are Increasingly Popular in HPC
• Three out of the top five supercomputers are GPU-based
GPUs are Power Hungry
[Bar chart: thermal design power (Watts, axis 0 to 350) for Xeon, GTX280, and Fermi]
It is imperative to investigate Green GPU computing
Green Computing with DVFS on CPUs
• Mechanism
  $\text{Power} \propto \text{Voltage}^2 \times \text{Frequency}$
• Minimizing performance impact
  • Lower voltage and frequency when the CPU is not in the critical path
• What about GPUs?
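The proportionality on this slide can be made concrete with a small sketch. It assumes a purely dynamic-power model (static and leakage power are ignored), and the scaling factors are hypothetical values chosen for illustration, not numbers from the talk.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to a baseline, assuming P is proportional to V^2 * f."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical DVFS step: 10% lower voltage and 20% lower frequency.
scaled = relative_dynamic_power(0.9, 0.8)
print(f"relative power: {scaled:.3f}")
```

Under this model, lowering voltage pays off quadratically, which is why DVFS lowers voltage together with frequency rather than frequency alone.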
What is this Paper About?
• Characterize performance and power for various kernels on GPUs
  • Kernels with different compute and memory intensiveness
  • Various core and memory frequencies
• Contributions
  • Reveal unique frequency scaling behaviors on GPUs
  • Provide useful hints for green GPU computing
Outline
• Introduction
• GPU Overview
• Characterization Methodology
• Experimental Results
• Conclusion & Future Work
NVIDIA GTX280 Architecture
• On-chip memory
  • Small size
  • Fast access
• Off-chip memory
  • Large size
  • High access latency
Device (Global) Memory
OpenCL
• Write once, run on any GPU
• Allows programmers to fully exploit the power of GPUs
• Compute kernel: function executed on a GPU
OpenCL Device Abstraction
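As a rough illustration of "compute kernel: function executed on a GPU", the following pure-Python sketch emulates the execution model on the CPU: a kernel function is invoked once per work-item index. This is not OpenCL code and calls no OpenCL API; `run_ndrange` and `transpose_kernel` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple


def run_ndrange(kernel: Callable[..., None], global_size: Tuple[int, int], *args) -> None:
    """Invoke `kernel` once per (row, col) work-item, sequentially.

    A real device would run these invocations in parallel.
    """
    rows, cols = global_size
    for gid_row in range(rows):
        for gid_col in range(cols):
            kernel(gid_row, gid_col, *args)


def transpose_kernel(gid_row: int, gid_col: int,
                     src: List[List[float]], dst: List[List[float]]) -> None:
    """Each work-item copies one element to its transposed position."""
    dst[gid_col][gid_row] = src[gid_row][gid_col]


src = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
dst = [[0.0] * 2 for _ in range(3)]  # 3 x 2 output buffer
run_ndrange(transpose_kernel, (2, 3), src, dst)
print(dst)
```

The kernel body only sees its own index, which is what lets the same function be scheduled across many cores.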
GPU Frequency Scaling
• Two-dimensional
  • Compute core frequency and memory frequency
• Semi-automatic
  • Dynamic configuration not supported
  • User can only control peak frequencies
  • Automatically switches to idle mode when there is no computation
  • Details not available to the public
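Because scaling is two-dimensional, a characterization has to cover pairs of core and memory frequencies. The sketch below enumerates such a grid and picks the most efficient setting. The frequency ranges mirror those on the axes and legends of the result plots in this talk (core 400 to 700 MHz, memory 600 to 1200 MHz); the step sizes are assumed. `measure` is a hypothetical stand-in for running a kernel and reading a power meter, and its toy model is not measured data.

```python
from itertools import product

CORE_MHZ = range(400, 701, 50)      # 400, 450, ..., 700
MEMORY_MHZ = range(600, 1201, 100)  # 600, 700, ..., 1200


def measure(core_mhz: int, memory_mhz: int) -> tuple:
    """Hypothetical stand-in returning (performance, average_power_watts).

    Toy model of a memory-bound kernel: performance follows memory
    frequency only, while power grows with both frequencies.
    """
    performance = 0.2 * memory_mhz
    power = 150.0 + 0.05 * core_mhz + 0.04 * memory_mhz
    return performance, power


def best_setting() -> tuple:
    """Return the (core, memory) pair with the highest performance per watt."""
    def efficiency(setting):
        performance, power = measure(*setting)
        return performance / power
    return max(product(CORE_MHZ, MEMORY_MHZ), key=efficiency)


print(best_setting())
```

With a real `measure`, the same loop would produce one data point per frequency pair, which is the shape of data the result slides plot.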
Kernel Selection
• High performance of GPUs
  • Massive parallelism (e.g., 240 cores)
  • High memory bandwidth (e.g., 140 GB/s)
• Three kernels of computational diversity, spanning compute intensive to memory intensive
  • Matrix Multiplication
  • Matrix Transpose
  • Fast Fourier Transform (FFT)
Kernel Characteristics
• Memory to compute ratio
  $R_{mem} = \dfrac{\#\,\text{Global\_Memory\_Transactions}}{\#\,\text{Computation\_Instructions}}$
• Instruction throughput
  $R_{ins} = \dfrac{\#\,\text{Computation\_Instructions}}{\text{GPU\_Time}}$
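A minimal sketch of how the two metrics could be computed from profiler counters follows. The counter values and the time unit (seconds) are hypothetical assumptions; the talk does not say which profiler or units were used.

```python
def memory_to_compute_ratio(global_memory_transactions: int,
                            computation_instructions: int) -> float:
    """R_mem = #global memory transactions / #computation instructions."""
    return global_memory_transactions / computation_instructions


def instruction_throughput(computation_instructions: int,
                           gpu_time: float) -> float:
    """R_ins = #computation instructions / GPU time."""
    return computation_instructions / gpu_time


# Hypothetical profiler counters, not taken from the talk.
transactions = 50_000
instructions = 1_000_000
gpu_time = 0.004  # time unit assumed to be seconds

r_mem = memory_to_compute_ratio(transactions, instructions)
r_ins = instruction_throughput(instructions, gpu_time)
print(f"R_mem = {r_mem:.1%}, R_ins = {r_ins:.0f}")
```

A high R_mem suggests a memory-intensive kernel, and a high R_ins suggests a compute-intensive one.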
Kernel Profile
        Matrix Multiplication   Matrix Transpose   FFT
Rmem    5.6%                    53.7%              8.3%
Rins    203215711               12095895           145165788
Measurement
• Performance
  • Matrix multiplication, FFT: GFLOPS
  • Matrix transpose: MB/s
• Energy
  • Whole system when executing the kernel on the GPU
• Power
  • Reported using the average power
• Energy Efficiency
  • Performance / power
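The relationship between these quantities can be sketched as follows. The sketch assumes evenly spaced power-meter samples over the kernel's run; the sample values and the performance figure are hypothetical, not measurements from the talk.

```python
def average_power(samples_watts):
    """Mean of evenly spaced power-meter samples (Watts)."""
    return sum(samples_watts) / len(samples_watts)


def energy_joules(samples_watts, interval_s):
    """Energy as average power times elapsed time, one sample per interval."""
    return average_power(samples_watts) * len(samples_watts) * interval_s


def energy_efficiency(performance, avg_power_watts):
    """Performance per watt, e.g. GFLOPS/Watt or MB/s per Watt."""
    return performance / avg_power_watts


samples = [250.0, 260.0, 270.0, 260.0]  # hypothetical whole-system readings
avg = average_power(samples)
print(avg, energy_joules(samples, 1.0), energy_efficiency(130.0, avg))
```

Because the meter sees the whole system, the efficiency figure includes CPU, memory, and other components, not only the GPU.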
Experimental Setup
• System
  • Intel Core 2 Quad Q6600
  • NVIDIA GTX280 1GB memory
• Power Meter
  • Watts Up? Pro ES
Matrix Multiplication - Performance
• Mostly affected by core frequency, barely affected by memory frequency
[Plot: performance (GFLOPS) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
Matrix Multiplication - Power
• Mostly affected by core frequency, slightly affected by memory frequency
[Plot: power (Watts) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
Matrix Multiplication - Efficiency
• Best efficiency achieved at highest core frequency and relatively high memory frequency
[Plot: power efficiency (MFLOPS/Watt) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
Matrix Transpose - Performance
• Performance dominated by memory frequency
[Plot: performance (MB/s) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
Matrix Transpose - Power
• Higher core frequency increases power consumption (not performance)
[Plot: power (Watts) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
Matrix Transpose - Efficiency
• Best efficiency achieved at highest memory frequency and lowest core frequency
[Plot: power efficiency (KBPS/Watt) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
FFT - Performance
• Affected by both core and memory frequencies
[Plot: performance (GFLOPS) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
FFT - Power
• Affected by both core and memory frequencies
[Plot: power (Watts) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
FFT - Efficiency
• Best efficiency at highest core and memory frequencies
[Plot: power efficiency (MFLOPS/Watt, axis 185 to 305) versus GPU core frequency (400 to 700 MHz), one series per memory frequency (600 to 1200)]
FFT – Two Dimensional Effect
[Bar chart: power (Watts) and efficiency (MFLOPS/Watt), axis 225 to 270, for the frequency settings <550, 1200>, <600, 1000>, and <700, 800>; annotated difference: 7%]
Power and Efficiency Range
[Bar chart: range of power and efficiency (axis 0% to 45%) for Matrix Multiplication, Matrix Transpose, and FFT]
Conclusion & Future Work
• To take away
  • Green computing on GPUs is important
  • GPU frequency scaling is considerably different from CPU frequency scaling
• Next
  • Finer-grained level of characterization (e.g., different types of operations)
  • Experiments on Fermi and AMD GPUs
Acknowledgment
• NSF Center for High Performance Reconfigurable Computing (CHREC) for their support through NSF I/UCRC Grant IIP-0804155
• National Science Foundation for their support partially through CNS-0915861 and CNS-0916719
Questions?