Experiences Accelerating MATLAB Systems Biology Applications Myocyte

16
Experiences Accelerating MATLAB Systems Biology Applications Myocyte Lukasz Szafaryn, Kevin Skadron, and Jeffrey J. Saucerman University of Virginia

description

Experiences Accelerating MATLAB Systems Biology Applications Myocyte. Lukasz Szafaryn, Kevin Skadron, and Jeffrey J. Saucerman University of Virginia. Outline. MATLAB Optimizations to MATLAB GPU Acceleration with CUDA Applications (Heart Wall Tracking and Myocyte Simulation) Problem - PowerPoint PPT Presentation

Transcript of Experiences Accelerating MATLAB Systems Biology Applications Myocyte

Page 1: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

Experiences Accelerating MATLAB Systems Biology

Applications

Myocyte

Lukasz Szafaryn, Kevin Skadron, and Jeffrey J. Saucerman

University of Virginia

Page 2: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

2

Outline

• MATLAB• Optimizations to MATLAB• GPU Acceleration with CUDA• Applications (Heart Wall Tracking and Myocyte Simulation)

– Problem– Algorithm– Optimization and performance– Lessons

• Conclusions • Future Research

Page 3: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

MATLAB

• Convenient but inefficient programming language of choice for scientists- Interpreted language- Most of the existing code and libraries are single-threaded

• MATLAB Parallel Toolbox - understanding of parallel programming

• Jacket and GPUmat - large parallelism to justify overhead

3

Page 4: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

MATLAB contd.

• Interpreted language optimized by JIT compiler – 2x slower than C

• MATLAB Embedded Compiler has limited support – 1.2-1.4x slower than C

• MEX Interface to link C code- translating to C - many functions written from scratch- no support for convenient OpenMP standard, need to use

thread libraries

4

Page 5: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

5

Acceleration

1. Translation: - convert MATLAB to C

2. Parallelization:– C for multi-core CPU– CUDA for GPU

Experimental Setup

– CPU: 3.2 GHz quad-core Intel Core 2 Extreme– GPU: NVIDIA GeForce GTX 280 (PCIe 2.0)– MS Windows, MS C Compiler

Page 6: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

6

Allocate GPU memory

Transfer inputs

Launch kernel

Return to CPU

Transfer results

Free GPU memory

C Program

CUDA Kernel

CPU GPU

Acceleration with GPU (CUDA)

Page 7: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

7

Myocyte SimulationApplication

• Models single cardiac myocyte and its electrical activity - determined to be a key aspect in the development of heart failure

• Modeled by 91 Ordinary Differential Equations (ODEs) and 250 supporting equations

ODE solver

Initial Values

Model evaluation

91 equations

ODE Values / Next time step

35%

65%

Page 8: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

8

Myocyte SimulationAlgorithm

• Sequential nature of ODE solving does not allow processing of time steps in parallel

• Speed-up from parallelizing model evaluation is limited by Amdahl's law

• Mainly fine-grained TLP, no DLP, limited coarse-grained TLP by grouping equations

task-level parallelism (TLP)

time

1 2 3 4 15

Page 9: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

9

Myocyte SimulationPerformance of a Single Simulation

• Time reported for 10,000-point simulation (10s of simulated time)

1.57x

2.23x

1.52x1.28x

2.40x

4.29x

2.19x

Page 10: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

10

Myocyte SimulationPerformance of Multiple Simulations

• Time reported for 1000-point simulation (1.0s of simulated time) using a custom solver

OpenMP constant

~3.6x speedup

CUDA saturates at

~10.88x speedup

Page 11: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

Porting Tradeoffs

Convenience Performance

Developing modular code

• replacing each MATLAB function with equivalent parallelized GPU function

• common routines can be possibly reused in other applications

Hiding specific aspects of GPU programming inside each module

• each module sets up its own GPU execution parameters

• each module performs its own I/O with GPU transparently

Restructuring and combining code

•writing code based on algorithm tasks rather than MATLAB statements

•overlap parallel (often unrelated) tasks by executing them in the same kernel call

Exposing specific aspects of GPU programming

•doing more work inside each GPU kernel call to fully exploit parallelism and avoid overhead

•performing I/O with GPU manually to eliminate redundant data transfers

Page 12: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

• Typical MATLAB code written by a scientist has room for optimization – 2.0x

• Conversion of the model to C was straightforward• More speedup possible by accelerating entire solver, not just

the model evaluation• GPU can still provide best acceleration if its overhead is

eliminated (heterogeneous chip), but…• Significant speedup is anticipated by offloading application to

FPGA (well suited to fine-grained irregular parallelism)

Myocyte SimulationLessons

12

Page 13: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

13

Conclusions

• Limited availability of C libraries necessitates time consuming coding

• Many systems biology applications (even those with limited parallelism) benefit from GPU

• GPU overheads are significant (should be eliminated in new CPU-GPU architectures)

• Real-time processing feasible in near future• Ultimately, acceleration of applications should be

automated!

Page 14: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

14

Future Research

• Automatic acceleration with the use of compiler - via use of architecture-specific libraries- via compiling for target architecture

• Merging of workloads- based on resource needs- based on dependency

• Acceleration with alternative architectures- well suited for fine-grained parallelism

- esp. FPGA

Page 15: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

15

Acknowledgements

• Funding provided by:– NSF grant IIS-0612049– SRC grant 1607.001

• Equipment donated by NVIDIA

Page 16: Experiences Accelerating MATLAB Systems Biology Applications Myocyte

16

Software

Source code will be soon available at:http://lava.cs.virginia.edu/wiki/rodinia