Building GPU Compilers with libNVVM | GTC 2013
Building GPU Compilers with libNVVM
Yuan Lin
Vision
Build a platform for GPU computing around foundations of CUDA.
— Bring other languages to GPUs
— Enable CUDA for other platforms
Make that platform available for ISVs, researchers, and hobbyists
— Create a flourishing eco-system
[Diagram: CUDA C, C++ and New Language Support feed the Compiler For CUDA, which targets NVIDIA GPUs, x86 CPUs, and New Processor Support]
Structure of NVCC
NVCC: CUDA compiler
CICC: LLVM-based high-level optimizer and PTX generator
PTX: Virtual Instruction Set
[Diagram: a.cu → CUDA C++ Front End → C code → CICC → PTX Assembly → PTXAS → GPU Machine Code; Host C++ Code and GPU Machine Code → Host Compiler → CUDA Executable]
Structure of CICC
[Diagram: C code → C Front End → LLVM IR → Optimizer → Optimized IR → PTX CodeGen → PTX Assembly]
Common Compiler
[Diagram: Front Ends → LLVM IR → Optimizer → Optimized IR → PTX CodeGen → PTX Assembly, with a Built-in Functions Library]
[Diagram labels: NVVM IR Spec, libNVVM library, Open Source, libDevice.bc library]
Two-pronged Approach
Open-source LLVM NVPTX backend
— Community supported
NVIDIA Compiler SDK
— Binary library, header files, documents
— Supported product
Enabling Open Source GPU Compilers
Contributed NVPTX code generator sources back to LLVM in summer 2012
Part of LLVM 3.2 release
Actively maintained in LLVM trunk
— by NVIDIA and other LLVM developers
Standard LLVM License
Best for
— Prototyping
— Developers who work only with LLVM trunk
NVIDIA Compiler SDK
Preview was released at GTC 2012.
1st official release will be included in CUDA 5.5 toolkit.
LLVM NVPTX / libNVVM Users
NumbaPro: array-oriented compiler for NumPy/Python
Halide image processing language : MIT (halide-lang.org)
Jet fluid dynamics DSL : Double Negative
Alea.CUDA, F# on the GPU : QuantAlea
Delite parallel EDSL framework : Stanford PPL
KernelGen: open source compiler project at HPC forge
NVIDIA Compiler SDK
NVVM IR specification
libNVVM library and header file
libDevice
Code samples
Developer’s guide and API document
libNVVM Library
An optimizing compiler library that generates PTX from NVVM IR
— Supports LLVM 3.0, 3.1 and 3.2 IR format
— Supports Fermi, Kepler and later architectures
Available on 32-bit and 64-bit Windows, Linux and Mac
Actually used by CUDA 5.5 compiler
— Dynamically linked by cicc
libNVVM Library
Analysis and optimizations
— Address space access optimization
— Thread variance/convergence analyses
— Re-materialization
— Load/store coalescing
— Sign extension elimination
— New phi elimination
— Enhanced alias analysis
— …
Support DWARF generation
libNVVM C APIs
Create a program
— Add NVVM IR modules to the program
— Support NVVM IR level linking
Verify the input IR
Compile IR to PTX
Get result
— Get back PTX string
— Get back message log
libDevice
Common device math functions
— Distributed in LLVM bitcode format
— Supports Fermi and Kepler
Supports both ftz and non-ftz mode
Can be linked with NVVM IR program using libNVVM API
— Treated as a normal NVVM IR module in libNVVM
Can be used with the open source LLVM NVPTX backend
We build our CUDA math functions on top of it in CUDA 5.5.
Will include more common device functions in the future.
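The ftz / non-ftz distinction on this slide can be illustrated on the host. The following is a minimal sketch in plain C, not libDevice code: the names flush_to_zero and ftz_add are hypothetical, and the sketch only models the general idea that in ftz (flush-to-zero) mode a subnormal single-precision value is treated as a zero of the same sign.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: models flush-to-zero for one single-precision value.
 * Subnormal values become zero with the sign preserved; normal numbers,
 * zeros, infinities and NaNs pass through unchanged. */
float flush_to_zero(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Hypothetical helper: an addition as it might behave in ftz mode,
 * flushing both inputs and the result. */
float ftz_add(float a, float b)
{
    return flush_to_zero(flush_to_zero(a) + flush_to_zero(b));
}

/* Small self-check documenting the intended behaviour of the sketch. */
void flush_to_zero_selfcheck(void)
{
    assert(flush_to_zero(1e-40f) == 0.0f);     /* subnormal is flushed */
    assert(flush_to_zero(FLT_MIN) == FLT_MIN); /* smallest normal is kept */
    assert(ftz_add(1.0f, 1e-40f) == 1.0f);
}
```

In non-ftz mode the subnormal inputs would be kept as they are, which is why a math library may need to ship both variants.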
NVVM IR
Designed to represent GPU kernel functions and device functions
— Represents the code executed by each CUDA thread
NVVM IR Specification 1.0
— Based on LLVM IR 3.2
NVVM IR and LLVM IR
NVVM IR
— Based on LLVM IR
— With a set of rules and intrinsics
No new types. No new operators. No new reserved words.
An NVVM IR program can work with any standard LLVM IR tool
llvm-as llvm-link llvm-extract llvm-dis llvm-ar …
An NVVM IR program can be built with the standard LLVM distribution.
svn co http://llvm.org/svn/llvm-project/llvm/branches/release_32 llvm
NVVM IR: Address Spaces
CUDA C++
— Address space is a storage qualifier.
— A pointer is a generic pointer, which can point to any address space.
__device__ int g;
__shared__ int s;
__constant__ int c;
__device__ void foo(int a) {
    int l;
    int *p;
    switch (a) {
    case 1: p = &g; …
    case 2: p = &s; …
    case 3: p = &c; …
    case 4: p = &l; …
    }
    …
}
OpenCL C
— Address space is part of the type system.
— A pointer type must be qualified with an address space.
constant int c;
void foo(global int *pg) {
    int l;
    int *p;
    p = &l;
    constant int *pc = &c;
    …
}
NVVM IR
— Support both use cases in the same program.
NVVM IR: Address Spaces
Define address space numbers
Allow generic pointers and specific pointers
Provide intrinsics to perform conversions between generic pointers and specific pointers
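A front end that targets NVVM IR typically needs a small table of these address space numbers. The sketch below is an illustration only: the numbers (generic 0, global 1, shared 3, constant 4, local 5) and the llvm.nvvm.ptr.*.to.gen / llvm.nvvm.ptr.gen.to.* intrinsic base names are as I recall them from the NVVM IR specification and should be checked against it; the C enum and function names are hypothetical and not part of any NVIDIA header, and the overload type suffix of the intrinsics is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Address space numbers, assumed from the NVVM IR specification.
 * The C identifiers are illustrative only. */
enum nvvm_addrspace {
    NVVM_AS_GENERIC  = 0,
    NVVM_AS_GLOBAL   = 1,
    NVVM_AS_SHARED   = 3,
    NVVM_AS_CONSTANT = 4,
    NVVM_AS_LOCAL    = 5
};

/* Returns the lowercase name used inside intrinsic names, or NULL when
 * the value is not one of the specific (non-generic) spaces above. */
const char *nvvm_specific_space_name(enum nvvm_addrspace as)
{
    switch (as) {
    case NVVM_AS_GLOBAL:   return "global";
    case NVVM_AS_SHARED:   return "shared";
    case NVVM_AS_CONSTANT: return "constant";
    case NVVM_AS_LOCAL:    return "local";
    default:               return NULL;
    }
}

/* Writes the base name of a pointer conversion intrinsic into out.
 * to_generic != 0 selects specific -> generic, otherwise generic -> specific.
 * Returns 0 on success, -1 if the space is not specific or out is too small. */
int nvvm_conversion_intrinsic(char *out, size_t cap,
                              enum nvvm_addrspace as, int to_generic)
{
    assert(out != NULL);
    const char *name = nvvm_specific_space_name(as);
    if (name == NULL)
        return -1;
    int n = to_generic
        ? snprintf(out, cap, "llvm.nvvm.ptr.%s.to.gen", name)
        : snprintf(out, cap, "llvm.nvvm.ptr.gen.to.%s", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

A CUDA-style front end would mostly emit generic pointers and call such conversions rarely, while an OpenCL-style front end would carry the specific address space in every pointer type.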
NVVM IR: Address Spaces
Allow module scope variables that are in the global address space to have generic address values.
— Make generating NVVM IR code much easier.
; @a is a module scope variable residing in the
; global address space (1).
; The address value of @a is a global address value.
@a = addrspace(1) global float 0.000000e+00
; @b is a module scope variable residing in the
; global address space (1).
; The address value of @b is a generic address value.
@b = global float 0.000000e+00
NVVM IR: GPU Program Properties
Properties:
— Maximum expected CTA size from any launch
— Minimum number of CTAs on an SM
— Kernel function vs. device function
— Texture/surface variables
— more
Use named metadata
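As an illustration of how a front end might attach such properties, the sketch below formats one entry for the !nvvm.annotations named metadata as text, using the LLVM 3.2-era metadata syntax that appears on the SAXPY slides. The function name emit_nvvm_annotation is hypothetical. The property string "kernel" comes from the slides; "maxntidx" and "minctasm" are the names I recall from the NVVM IR specification for the CTA size and CTAs-per-SM properties and should be verified there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one annotation node of the form
 *   !<id> = metadata !{<fn type>* @<fn>, metadata !"<property>", i32 <value>}
 * Returns 0 on success, -1 if the output buffer is too small. */
int emit_nvvm_annotation(char *out, size_t cap, unsigned node_id,
                         const char *fn_ptr_type, const char *fn_name,
                         const char *property, int value)
{
    assert(out != NULL && fn_ptr_type != NULL);
    assert(fn_name != NULL && property != NULL);
    int n = snprintf(out, cap,
                     "!%u = metadata !{%s @%s, metadata !\"%s\", i32 %d}",
                     node_id, fn_ptr_type, fn_name, property, value);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Calling it with the property "kernel" and the value 1 produces a line of the same shape as the kernel annotation in the SAXPY example; each emitted node also has to be listed in the !nvvm.annotations = !{...} line.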
NVVM IR: Intrinsics
Atomic operations
Barriers
Address space conversions
Special registers read
Texture/surface access
more
Programming Guide
Developer’s Guide and API Document
NVVM IR Specification
libNVVM: Developer’s Guide and API Spec
libDevice: Developer’s Guide and API Spec
Will be available
— online @ docs.nvidia.com
— as PDF files
Samples
“simple”
— JIT compile an NVVM IR program using libNVVM
— launch it using CUDA driver API
libNVVM CUDA Driver
Samples
“ptxgen”
— A simple offline NVVM IR to PTX compiler
— Links in libDevice
libNVVM libDevice
Samples
“cuda-c-linking”
— Create an NVVM IR program using LLVM IR builder API
— JIT compile the NVVM IR program using libNVVM
— Link it with a PTX generated from CUDA C using PTX JIT linking API
— Launch the final code using CUDA driver API
libNVVM LLVM PTX JIT linking
Samples
“simple”
“ptxgen”
“cuda-c-linking”
Part of CUDA 5.5 toolkit
Samples on github
Samples that
— are relatively big,
— depend on other open-source packages, or
— need to be updated more frequently
Examples
— Other language bindings for libNVVM: Python, Haskell
— Kaleidoscope
— Small utilities
Open source BSD style license
Example
SAXPY
let (saxpy (lambda (a : f32) (x : vf32) (y : vf32) : vf32
             (map (lambda (xi : f32) (yi : f32) : f32
                    (+ (* a xi) yi))
                  x y)))
in (saxpy A X Y)
• Execute this on the GPU.
• Use one GPU thread for each vector element.
saxpy(float a, float *x, float *y) {
    y[thread_id] = a * x[thread_id] + y[thread_id];
}
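The one-thread-per-element mapping can be checked on the CPU before any GPU code is generated. The sketch below is a host-side emulation in plain C, not CUDA: the names saxpy_thread and saxpy_emulated are hypothetical, the nested loops stand in for the blocks and threads of a launch, and the bounds check against n is an assumption added here because the last block may extend past the end of the vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one emulated GPU thread: it handles exactly one element,
 * selected by i = blockIdx.x * blockDim.x + threadIdx.x. */
static void saxpy_thread(unsigned block_idx, unsigned block_dim,
                         unsigned thread_idx, size_t n,
                         float a, const float *x, float *y)
{
    size_t i = (size_t)block_idx * block_dim + thread_idx;
    if (i < n)                      /* assumed guard for the last block */
        y[i] = a * x[i] + y[i];
}

/* Emulates a 1-D launch with enough blocks of block_dim threads to
 * cover n elements. */
void saxpy_emulated(unsigned block_dim, size_t n,
                    float a, const float *x, float *y)
{
    assert(block_dim > 0);
    unsigned grid_dim = (unsigned)((n + block_dim - 1) / block_dim);
    for (unsigned b = 0; b < grid_dim; ++b)
        for (unsigned t = 0; t < block_dim; ++t)
            saxpy_thread(b, block_dim, t, n, a, x, y);
}
```

On a real GPU the two loops disappear: the hardware supplies the block and thread indices, which the NVVM IR version reads through the special register intrinsics.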
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
@n = internal global i32 0, align 4
@a = internal global float 0.000000e+00, align 4
define void @saxpy(float* %x, float* %y) {
}
!nvvm.annotations = !{!0}
!0 = metadata !{void (float*, float*)* @saxpy, metadata !"kernel", i32 1}
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
define void @saxpy(float* %x, float* %y) {
entry:
  ; load @n and @a
  %n = load i32* @n, align 4
  %a = load float* @a, align 4
  ; int i = blockIdx.x * blockDim.x + threadIdx.x;
  %0 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
  %1 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
  %mul = mul i32 %1, %0
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %add = add i32 %mul, %2
  ; load x[i]
  %3 = sext i32 %add to i64
  %x_ptr = getelementptr float* %x, i64 %3
  %x_value = load float* %x_ptr, align 4
  ; load y[i]
  %y_ptr = getelementptr float* %y, i64 %3
  %y_value = load float* %y_ptr, align 4
  ; a * x + y
  %mul11 = fmul float %a, %x_value
  %add16 = fadd float %mul11, %y_value
  ; store y[i]
  store float %add16, float* %y_ptr, align 4
  ret void
}
Using libNVVM APIs
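#include <stdlib.h>
#include "nvvm.h"

char *compiler() {
    nvvmProgram prog;
    nvvmCreateProgram(&prog);

    /* Add the SAXPY NVVM IR module to the program. */
    char *buffer;
    size_t size;
    getSAXPYir(&buffer, &size);
    nvvmProgramAddModule(prog, buffer, size);

    /* Add the libDevice bitcode for the target architecture. */
    getLibDevice(&buffer, &size, "libdevice.compute_" #ARCH "." #MAJOR #MINOR ".bc");
    nvvmProgramAddModule(prog, buffer, size);

    /* Compile the linked program to PTX with no extra options. */
    nvvmCompileProgram(prog, 0, NULL);

    /* Copy the generated PTX string out of the program object. */
    size_t ptxSize;
    nvvmGetCompiledResultSize(prog, &ptxSize);
    char *ptx = (char *) malloc(ptxSize);
    nvvmGetCompiledResult(prog, ptx);

    nvvmDestroyProgram(&prog);
    return ptx;
}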
How to get them?
Distributed with CUDA 5.5
— NVVM IR spec
— libNVVM library and header file
— libDevice
— Code samples
— Developer’s guide and API document
LLVM.org: NVPTX backend
Github: More samples
devtalk.nvidia.com: Post your questions/suggestions