Jacket: Faster MATLAB® Genomics Codes by Chris...
Transcript of Jacket: Faster MATLAB® Genomics Codes by Chris...
![Page 1: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/1.jpg)
Jacket: Faster MATLAB® Genomics Codes
by Chris McClanahan, GPU Engineer
![Page 2: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/2.jpg)
Outline
• Introduction to Jacket for MATLAB®
• GFOR
• Comparison with GPU-PCT alternative
• Case Studies: Genomics Examples
![Page 3: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/3.jpg)
Matrix Types
gdouble double precision
gsingle single precision
glogical boolean
gint# integers
guint# unsigned integers
![Page 4: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/4.jpg)
Matrix Types: ND Support
vectors
matrices
volumes … ND
![Page 5: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/5.jpg)
Matrix Types: Easy Manipulation
A(1,:)
A(end,1)
A(1,1)
A(end,:)
A(:,:,2)
![Page 6: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/6.jpg)
n = 20e6; % 20 million random samples
X = grand(1,n,’gdouble’);
Y = grand(1,n,’gdouble’);
distance_to_origin = sqrt( X.*X + Y.*Y );
is_inside = (distance_to_origin <= 1);
pi = 4 * sum(is_inside) / n;
Easy GPU Acceleration of M code
![Page 7: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/7.jpg)
n = 20e6; % 20 million random samples
X = grand(1,n,’gdouble’);
Y = grand(1,n,’gdouble’);
distance_to_origin = sqrt( X.*X + Y.*Y );
is_inside = (distance_to_origin <= 1);
pi = 4 * sum(is_inside) / n;
Easy GPU Acceleration of M code
![Page 8: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/8.jpg)
No GPU-specific stuff involved (no kernels, no threads, no blocks, just regular M code)
“Very little recoding was needed to promote our Lattice Boltzmann Model code to run on the GPU.” –Dr. Kevin Tubbs, HPTi
Easy GPU Acceleration of M code
![Page 9: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/9.jpg)
y = gzeros( 5, 5, n );
for i = 1:n,
gselect(i); % choose GPU for this iteration
x = grand(5,5); % add work to GPU’s queue
y(:,:,i) = fft(x); % more work in queue
end
% all GPUs are now computing simultaneously, until done
Easy Multi GPU Scaling
![Page 10: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/10.jpg)
Technology Stack • A full system making
optimizations for you
• Including
– “Core” runtime
– “JIT” smart copy/exec
– “Calls” functionality
runtime memory mgt binary handling GPU-multiplex thread mgt
core
JIT Engine(s)
plus.mex
minus.mex
bsxfun.mex
tan.mex
times.mex
power.mex
Calls (library routines + JIT)
fft.mex
fft2.mex
bessel.mex
conv2.mex
convn.mex
find.mex
sum.mex
subsasgn.mex
mldivide.mex
lu.mex
![Page 11: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/11.jpg)
Automated Optimizations
300 cycles one-way
GPU Memory
GPU Cores
A = sin( x + y ).^2
CPU
![Page 12: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/12.jpg)
Automated Optimizations
300 cycles one-way
GPU Memory
GPU Cores
A = sin( x + y ).^2
CPU Optimized via
async transfer and smart copy
Optimized via runtime
![Page 13: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/13.jpg)
GFOR
GPU FOR-loops
![Page 14: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/14.jpg)
GFOR – Parallel FOR-loop for GPUs
• Like a normal FOR-loop, but faster
for i = 1:3
C(:,:,i) = A(:,:,i) * B;
Regular FOR-loop (3 serial kernel launches)
gfor i = 1:3
C(:,:,i) = A(:,:,i) * B;
Parallel GPU FOR-loop (only 1 kernel launch)
![Page 15: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/15.jpg)
Example: Matrix Multiply
*
B A(:,:,i)
iteration i = 1
C(:,:,i)
=
for i = 1:3
C(:,:,i) = A(:,:,i) * B;
Regular FOR-loop (3 serial kernel launches)
![Page 16: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/16.jpg)
Example: Matrix Multiply
*
B A(:,:,i)
iteration i = 1
C(:,:,i)
= *
B A(:,:,i)
iteration i = 2
C(:,:,i)
=
for i = 1:3
C(:,:,i) = A(:,:,i) * B;
Regular FOR-loop (3 serial kernel launches)
![Page 17: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/17.jpg)
Example: Matrix Multiply
*
B A(:,:,i)
iteration i = 1
C(:,:,i)
= *
B A(:,:,i)
iteration i = 2
C(:,:,i)
= *
B A(:,:,i)
iteration i = 3
C(:,:,i)
=
for i = 1:3
C(:,:,i) = A(:,:,i) * B;
Regular FOR-loop (3 serial kernel launches)
![Page 18: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/18.jpg)
simultaneous iterations i = 1:3
B A(:,:,1:3) C(:,:,1:3)
* = * =
* =
Example: Matrix Multiply
gfor i = 1:3
C(:,:,i) = A(:,:,i) * B;
Parallel GPU FOR-loop (only 1 kernel launch)
![Page 19: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/19.jpg)
simultaneous iterations i = 1:3
*
B A(:,:,1) C(:,:,1)
=
Example: Matrix Multiply
gfor i = 1:3
C(:,:,i) = A(:,:,i) * B;
Parallel GPU FOR-loop (only 1 kernel launch)
![Page 20: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/20.jpg)
Example: Summing over Columns
• Think of gfor as “syntactic sugar” to write vectorized code in an iterative style.
for i = 1:3
A(i) = sum(B(:,i));
gfor i = 1:3
A(i) = sum(B(:,i));
Three passes to sum all columns of B
One pass to sum all columns of B
Both equivalent to “sum(B)”, but latter is faster (more
explicitly written)
![Page 21: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/21.jpg)
Jacket versus PCT parallel computing toolbox
MATLAB and PCT are products and trademarks of MathWorks.
![Page 22: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/22.jpg)
Compare with R2012a PCT
MATLAB and PCT are products and trademarks of MathWorks.
• Jacket is faster
• Jacket does not use Java
• Jacket is very mature (~5 years)
• Jacket includes more functionality
![Page 23: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/23.jpg)
Speedups with Jacket
MATLAB and PCT are products and trademarks of MathWorks.
![Page 24: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/24.jpg)
Jacket has 10X more functions…
reductions • sum, min, max,
any, all, nnz, prod • vectors, columns,
rows, etc
convolutions • 2D, 3D, ND
dense linear algebra • LU, QR, Cholesky,
SVD, Eigenvalues, Inversion, det, Matrix Power, Solvers
FFTs • 2D, 3D, ND
image processing • filter, rotate, erode,
dilate, bwmorph, resize, rgb2gray
• hist, histeq
interp and rescale • vectors, matrices • rescaling
sorting • along any dimension • find
and many more…
gfor (loops) gcompile (fine-grain) gselect (multi-GPU)
help • gprofview
![Page 25: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/25.jpg)
Easy To Maintain
• Write your code once and let Jacket carry you through the coming hardware evolution.
– Each new Jacket release improves the speed of your code, without any code modification.
– Each new Jacket release leverages latest GPU hardware (e.g. Fermi, Kepler), without any code modification.
![Page 26: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/26.jpg)
Case Studies
![Page 27: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/27.jpg)
http://www.accelereyes.com/case_studies
17X
Neuro-imaging
Georgia Tech
20X
Video Processing
12X
Medical Devices
Spencer Tech
5X
Weather Modeling
NCAR
35X
Power Engineering
IIT India
17X
Track Bad Guys
BAE
Systems
70X
Drug Delivery
Georgia Tech
35X
Bioinformatics
Leibniz
20X
Bio-Research
CDC
45X
Radar Imaging
System Planning
![Page 28: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/28.jpg)
Case Study: CDC Genomics
• Hepatitis C Virus (HCV)
• Goal to explore random genetic mutations
• 10,000 random alignments – simulating the distribution of correlation values
under the null hypothesis that substitutions of amino acids at two sites are statistically independent (how aa’s mutate HCV)
![Page 29: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/29.jpg)
Case Study: CDC Genomics
• 10,000 random alignments intractable on CPU
• Addition of GPUs brings ~18X speedup
![Page 30: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/30.jpg)
Case Study: Leibniz Institute
• High Throughput Multi-Dimensional Scaling (HiT-MDS)
• High dimensional data reconstructed for visualization
• Goal to understand data
• Speedup: 35X
![Page 31: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/31.jpg)
Case Study: Spencer Technologies
• Real-time signal processing
• 64 ultrasound sensors
• Precise brain blood flow
• Speedup: 12X
![Page 32: Jacket: Faster MATLAB® Genomics Codes by Chris …on-demand.gputechconf.com/gtc/2012/presentations/S0287-Jacket-for... · •Introduction to Jacket for MATLAB® •GFOR •Comparison](https://reader031.fdocuments.net/reader031/viewer/2022030500/5aabce697f8b9a9c2e8c5e6e/html5/thumbnails/32.jpg)
Discussion
Faster MATLAB® through GPU computing