Power and Performance Characterization of Computational Kernels on the GPU
synergy.cs.vt.edu
Power and Performance Characterization of Computational Kernels on the GPU
Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng
Graphics Processing Units (GPUs) are Powerful
* Data and image source, http://people.sc.fsu.edu/~jburkardt/latex/ajou_2009_parallel/ajou_2009_parallel.html
GPUs are Increasingly Popular in HPC
• Three out of the top five supercomputers are GPU-based
GPUs are Power Hungry
[Figure: bar chart of thermal design power (watts, axis 0 to 350) for Xeon, GTX280, and Fermi]
It is imperative to investigate Green GPU computing
Green Computing with DVFS on CPUs
• Mechanism: $\text{Power} \propto \text{Voltage}^2 \times \text{Frequency}$
• Minimizing performance impact
  • Lower voltage and frequency when the CPU is not in the critical path
• What about GPUs?
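As a rough illustration of the DVFS relation on this slide (power proportional to voltage squared times frequency), the sketch below computes dynamic power relative to a baseline setting. The function name and the 0.8 scale factors are hypothetical and not taken from the talk; static (leakage) power is ignored.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Return dynamic power relative to a baseline, assuming P ∝ V^2 * f.

    Both arguments are ratios to the baseline setting (1.0 means unchanged).
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical setting: voltage and frequency both lowered to 80% of baseline.
    ratio = relative_dynamic_power(0.8, 0.8)
    print(f"relative dynamic power: {ratio:.3f}")
```

Because voltage enters quadratically, lowering voltage together with frequency saves more power than lowering frequency alone, which is why DVFS scales both.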
What is this Paper about?
• Characterize performance and power for various kernels on GPUs
  • Kernels with different compute and memory intensiveness
  • Various core and memory frequencies
• Contributions
  • Reveal unique frequency scaling behaviors on GPUs
  • Provide useful hints for green GPU computing
Outline
• Introduction
• GPU Overview
• Characterization Methodology
• Experimental Results
• Conclusion & Future Work
NVIDIA GTX280 Architecture
On-chip memory
• Small sizes
• Fast access
Off-chip memory
• Large size
• High access latency
Device (Global) Memory
OpenCL
• Write once, run on any GPU
• Allows programmers to fully exploit the power of GPUs
• Compute kernel: function executed on a GPU
OpenCL Device Abstraction
GPU Frequency Scaling
• Two dimensional
  • Compute core frequency and memory frequency
• Semi-automatic
  • Dynamic configuration not supported
  • User can only control peak frequencies
  • Automatically switches to idle mode when there is no computation
• Details not available to the public
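Since only peak frequencies can be set and the space has two dimensions, a characterization has to visit each core and memory frequency pair in turn. The sketch below enumerates such a grid. The function name and the sweep ranges are assumptions chosen for illustration, not settings confirmed by the talk.

```python
from itertools import product
from typing import Iterable, List, Tuple


def frequency_grid(core_mhz: Iterable[int],
                   memory_mhz: Iterable[int]) -> List[Tuple[int, int]]:
    """Return every <core, memory> peak-frequency pair to characterize."""
    return list(product(core_mhz, memory_mhz))


# Assumed sweep ranges in MHz, for illustration only.
CORE_MHZ = range(400, 701, 50)
MEMORY_MHZ = range(600, 1201, 100)

if __name__ == "__main__":
    grid = frequency_grid(CORE_MHZ, MEMORY_MHZ)
    print(f"{len(grid)} configurations, first {grid[0]}, last {grid[-1]}")
```

Each pair would then be applied as the peak setting before running a kernel and recording its performance and power.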
Kernel Selection
• High performance of GPUs
  • Massive parallelism (e.g., 240 cores)
  • High memory bandwidth (e.g., 140 GB/s)
• Three kernels of computational diversity, ranging from compute intensive to memory intensive
  • Matrix Multiplication
  • Matrix Transpose
  • Fast Fourier Transform (FFT)
Kernel Characteristics
• Memory to compute ratio
  $$R_{mem} = \frac{\#\,\text{Global\_Memory\_Transactions}}{\#\,\text{Computation\_Instructions}}$$
• Instruction throughput
  $$R_{ins} = \frac{\#\,\text{Computation\_Instructions}}{\text{GPU\_Time}}$$
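The two metrics on this slide are simple ratios of profiler counters. The sketch below shows one way to compute them. The `KernelCounters` container, its field names, and the sample numbers are hypothetical; the talk does not say which profiler or time unit was used.

```python
from dataclasses import dataclass


@dataclass
class KernelCounters:
    """Hypothetical profiler counters for one kernel run."""
    global_memory_transactions: int
    computation_instructions: int
    gpu_time: float  # in whatever time unit the profiler reports


def memory_to_compute_ratio(c: KernelCounters) -> float:
    """R_mem: global memory transactions per computation instruction."""
    return c.global_memory_transactions / c.computation_instructions


def instruction_throughput(c: KernelCounters) -> float:
    """R_ins: computation instructions per unit of GPU time."""
    return c.computation_instructions / c.gpu_time


if __name__ == "__main__":
    # Made-up counters, only to show the arithmetic.
    sample = KernelCounters(global_memory_transactions=50,
                            computation_instructions=1000,
                            gpu_time=2.0)
    print(f"R_mem = {memory_to_compute_ratio(sample):.1%}")
    print(f"R_ins = {instruction_throughput(sample):.0f}")
```

A higher R_mem marks a more memory-intensive kernel, while a higher R_ins marks a kernel that keeps the compute cores busier.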
Kernel Profile

| | Matrix Multiplication | Matrix Transpose | FFT |
|---|---|---|---|
| Rmem | 5.6% | 53.7% | 8.3% |
| Rins | 203215711 | 12095895 | 145165788 |
Measurement
• Performance
  • Matrix multiplication, FFT: GFLOPS
  • Matrix transpose: MB/s
• Energy
  • Whole system when executing the kernel on the GPU
• Power
  • Reported using the average power
• Energy Efficiency
  • Performance / power
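The measurement definitions above can be sketched as follows, assuming the power meter yields evenly spaced whole-system readings in watts. The function names, the sampling interval, and the sample readings are hypothetical and are not measurements from the talk.

```python
from typing import Sequence


def average_power(samples_watts: Sequence[float]) -> float:
    """Mean of the power readings taken while the kernel runs."""
    if not samples_watts:
        raise ValueError("need at least one power sample")
    return sum(samples_watts) / len(samples_watts)


def energy_joules(samples_watts: Sequence[float], sample_interval_s: float) -> float:
    """Energy as average power times the measured duration."""
    duration_s = len(samples_watts) * sample_interval_s
    return average_power(samples_watts) * duration_s


def energy_efficiency(performance: float, avg_power_watts: float) -> float:
    """Performance per watt, in the performance unit passed in (e.g. GFLOPS)."""
    return performance / avg_power_watts


if __name__ == "__main__":
    # Made-up readings at an assumed one-second interval.
    samples = [250.0, 260.0, 270.0]
    avg = average_power(samples)
    print(f"average power: {avg:.1f} W")
    print(f"energy: {energy_joules(samples, 1.0):.1f} J")
    print(f"efficiency: {energy_efficiency(130.0, avg):.2f} GFLOPS/W")
```

Because the meter sees the whole system, the resulting efficiency includes CPU, memory, and other host power, not just the GPU.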
Experimental Setup
• System
  • Intel Core 2 Quad Q6600
  • NVIDIA GTX280, 1 GB memory
• Power Meter
  • Watts Up? Pro ES
Matrix Multiplication - Performance
• Mostly affected by core frequency, almost unaffected by memory frequency
[Figure: performance (GFLOPS, 85 to 155) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
Matrix Multiplication - Power
• Mostly affected by core frequency, slightly affected by memory frequency
[Figure: power (watts, 245 to 315) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
Matrix Multiplication - Efficiency
• Best efficiency achieved at highest core frequency and relatively high memory frequency
[Figure: power efficiency (MFLOPS/Watt, 340 to 500) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
Matrix Transpose - Performance
• Performance dominated by memory frequency
[Figure: performance (MB/s, 150 to 270) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
Matrix Transpose - Power
• Higher core frequency increases power consumption (not performance)
[Figure: power (watts, 195 to 240) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
Matrix Transpose - Efficiency
• Best efficiency achieved at highest memory frequency and lowest core frequency
[Figure: power efficiency (KBPS/Watt, 650 to 1250) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
FFT - Performance
• Affected by both core and memory frequencies
[Figure: performance (GFLOPS, 40 to 85) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
FFT - Power
• Affected by both core and memory frequencies
[Figure: power (watts, 225 to 285) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
FFT - Efficiency
• Best efficiency at highest core and memory frequencies
[Figure: power efficiency (MFLOPS/Watt, 185 to 305) vs. GPU core frequency (MHz, 400 to 700); one series per legend value 600, 700, 800, 900, 1000, 1100, 1200]
FFT – Two Dimensional Effect
[Figure: power (watts) and efficiency (MFLOPS/Watt), axis 225 to 270, for three <core, memory> frequency settings: <550, 1200>, <600, 1000>, <700, 800>; annotated difference: 7%]
Power and Efficiency Range
[Figure: power range and efficiency range (axis 0% to 45%) for Matrix Multiplication, Matrix Transpose, and FFT]
Conclusion & Future Work
• To take away
  • Green computing on GPUs is important
  • GPU frequency scaling is considerably different from that of CPUs
• Next
  • Finer-grained level of characterization (e.g., different types of operations)
  • Experiments on Fermi and AMD GPUs
Acknowledgment
• NSF Center for High Performance Reconfigurable Computing (CHREC) for their support through NSF I/UCRC Grant IIP-0804155;
• National Science Foundation for their support partially through CNS-0915861 and CNS-0916719.
Questions?