Acceleration of software package "R" using GPUs

Sachinthaka Abeywardana

CSIRO.

Introduction to Graphics Processing Units (GPUs)

Introduction to GPU contd.

Introduction to R and BLAS

• R
  • Statistical package
  • Graphics
• BLAS (Basic Linear Algebra Subprograms)
  • Vector-vector addition, multiplication, etc.
  • Vector-matrix addition, multiplication, etc.
  • Matrix-matrix addition, multiplication, etc.
• LAPACK (Linear Algebra Package)
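The three kinds of operation listed for BLAS correspond to what BLAS calls Level 1, Level 2 and Level 3 routines. As a rough illustration only, the sketch below writes one operation of each kind as a plain Python loop. The function names echo the BLAS routines daxpy, dgemv and dgemm, but the signatures are simplified assumptions and not the real BLAS interface.

```python
# Illustrative only: naive pure-Python versions of one operation per BLAS level.
# Signatures are simplified for readability and are not the real BLAS interface.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x):
    """Level 2 (matrix-vector): return alpha * A x, with A as a list of rows."""
    return [alpha * sum(a * xj for a, xj in zip(row, x)) for row in A]

def gemm(alpha, A, B):
    """Level 3 (matrix-matrix): return alpha * A B, with A and B as lists of rows."""
    cols_of_B = list(zip(*B))
    return [[alpha * sum(a * b for a, b in zip(row, col)) for col in cols_of_B]
            for row in A]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
level1 = axpy(2.0, [1.0, 2.0], [10.0, 20.0])
level2 = gemv(1.0, A, [1.0, 1.0])
level3 = gemm(1.0, A, B)
```

Level 3 routines do O(n^3) arithmetic on O(n^2) data, which is generally why matrix-matrix products are the usual first target for GPU offload.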

What has been done in this project

• Aim: Replace Rblas.dll with a faster BLAS library

[Diagram labels: R, LAPACK, BLAS, New BLAS, Replace]

How the new Rblas.dll was created

[Diagram labels: CUBLAS library, ‘C program’ wrapper, FORTRAN, Initialise, Rblas.dll]
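R calls BLAS through the Fortran interface, so a replacement library has to accept the same style of arguments: matrices stored column by column in flat arrays, each with a leading dimension. The sketch below is a Python stand-in (the wrapper named on the slide is a C program, and its source is not part of this transcript) for what a dgemm-style entry point computes, C := alpha * A B + beta * C, leaving out the transpose options. The function name and the reduced argument list are assumptions made for illustration. A real wrapper would copy the arrays to the GPU and forward the call to CUBLAS instead of looping.

```python
# Hypothetical stand-in for a Fortran-convention dgemm entry point (no transposes).
# Matrices are flat lists in column-major order: element (i, j) of a matrix with
# leading dimension ld is stored at index i + j * ld.

def dgemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Update c in place: C := alpha * A B + beta * C, where A is m x k and B is k x n."""
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c

# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored column by column.
a = [1.0, 3.0, 2.0, 4.0]
b = [5.0, 7.0, 6.0, 8.0]
# An expression such as 3.2 * A %*% B + 4.1 * A has this shape when C starts as a copy of A.
c = list(a)
dgemm_colmajor(2, 2, 2, 3.2, a, 2, b, 2, 4.1, c, 2)
```

Whether R issues such an expression as a single BLAS call is not something the slides state; the example only shows the data layout and the arithmetic a wrapper has to preserve.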

Results for 1000 x 1000 Matrices

Average time in seconds:

Expression                Description                        CPU      GPU single   GPU double
3.2 * A %*% B + 4.1 * A   3.2 A x B + 4.1 A                  1.9335   0.2375       0.123
A %*% B                   Matrix A x matrix B                1.8855   0.176        0.092
t(A) %*% B                Transpose of matrix A x matrix B   1.9135   0.207        0.089
solve(A)                  Invert matrix A                    2.227    4.69         5.288

Improvements

Expression                Single precision (%)   Double precision (%)
3.2 * A %*% B + 4.1 * A   814.1052632            1571.95122
A %*% B                   1071.306818            2049.456522
t(A) %*% B                924.3961353            2150
solve(A)                  -210.597216            -237.4494836
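The slides do not state how these percentages were derived. The figures are consistent with dividing the CPU time by the GPU time and multiplying by 100 when the GPU is faster, and with the negated inverse ratio when it is slower. A short sketch of that inferred formula, using timings from the results slide (the function name is made up for illustration):

```python
# Inferred from the slide's numbers, not stated by the author.

def improvement_percent(cpu_seconds, gpu_seconds):
    """Speed ratio as a percentage; negative when the GPU run is slower."""
    if gpu_seconds <= cpu_seconds:
        return 100.0 * cpu_seconds / gpu_seconds
    return -100.0 * gpu_seconds / cpu_seconds

multiply_single = improvement_percent(1.8855, 0.176)   # A %*% B, single precision
multiply_double = improvement_percent(1.8855, 0.092)   # A %*% B, double precision
invert_single = improvement_percent(2.227, 4.69)       # solve(A), single precision
invert_double = improvement_percent(2.227, 5.288)      # solve(A), double precision
```

Read this way, a value near 1071% means the GPU run took roughly one tenth of the CPU time, and a value near -210% means it took roughly 2.1 times as long.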

Who to Blame

A. Simply random?

B. Me???

C. Stupid Computer?

D. Memory allocation.

Nvidia GPU Architecture

Nvidia GPU Architecture contd.

Nvidia GPU Architecture contd.

CPU vs GPU calculations for matrix inversion

[Plot: time (s) against size of square matrix (one side), 0 to 4500, with one series each for CPU and GPU. Labelled values: 139.5 and 45.42.]

Matrix Multiplication Timing

[Plot: time (s) against matrix size (one side), 0 to 5000, with one series each for CPU, GPU single precision and GPU double precision.]

Comparison with ATLAS Rblas

• Improvement on multiplication: A %*% B 319%
• Improvement on inverting a matrix: solve(A) 281%

(Source: http://www.stat.columbia.edu/~cook/movabletype/archives/2008/06/a-trick-to-spee.html)

Limitations of ATLAS:

• Latest version is for Pentium 4 only

Limitations of this Project

• Specific card
• Cost
  • GeForce GTX 280: $582 (Source: http://www.msy.com.au/Parts/PARTS.pdf)
• Precision?
  • RMS error of 6.350072e-06 for inverting a 1024 x 1024 matrix on single-precision cards
  • IEEE 754 deviations
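The slides do not say how the RMS figure was computed. One common definition is the root mean square of the element-wise differences between a single-precision result and a double-precision reference. The sketch below applies that definition to a small made-up matrix, using rounding to 32-bit floats as a stand-in for a single-precision GPU result. It is an assumption about the method and does not reproduce the 6.350072e-06 figure.

```python
import math
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rms_difference(result, reference):
    """Root mean square of element-wise differences between two matrices."""
    diffs = [r - q for row_r, row_q in zip(result, reference)
             for r, q in zip(row_r, row_q)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up reference values; the single-precision copy stands in for a GPU result.
reference = [[0.1, 0.2], [0.3, 0.7]]
single = [[to_single(v) for v in row] for row in reference]
rms = rms_difference(single, reference)
```

For a real check, the reference would be the inverse computed in double precision on the CPU and the result would be the inverse returned by the single-precision GPU path.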

Where can I get this from

• https://wiki.csiro.au/confluence/display/terabyte/GPU+Accelerated+R

Where to from now?

• Implementation of more BLAS functions
• Getting rid of overhead
• Adjusting LAPACK
• Double precision to single precision and single to double conversion
• Parallel extensions (CPU)

Thank You

• Luke Domanski
• Dadong Wang
• Pascal Valotton
• Glenn Stone
• Robert Dunne
• CMIS/CSIRO staff
