Transcript of "Tips and Tricks for Performance Computing with R" (43 slides)
Bryan W. Lewis, [email protected], http://goo.gl/gcPezs

Page 1

Tips and Tricks for Performance Computing with R

Bryan W. Lewis, [email protected]

http://goo.gl/gcPezs

Page 2

First tip:

Read Patrick Burns' The R Inferno: http://www.burns-stat.com/documents/books/the-r-inferno

— If you are using R and you think you're in hell, this is a map for you.

Page 3

Numeric libraries

The very heart of R

Page 4

R BLAS/LAPACK PERFORMANCE

Mac OS X is pretty good

GNU/Linux ranges from OK to poor, but is the easiest to improve and the most flexible

Windows is generally poor but easy to improve

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-use-a-fast-BLAS_003f

Page 5

Commercial BLAS/LAPACK libraries exhibit the best all-around performance.

Intel MKL -- Superb performance, but not free and can be tricky to use.

AMD ACML -- Freely available on Linux and Windows, pretty easy to use.

Page 6

We're going to install free commercial BLAS/LAPACK libraries on Windows and Linux.

Brace yourself.

It's not hard, but not pretty either.

Page 7

Installing ACML for R on Windows

1. Download and install ACML (PGI Windows version) http://developer.amd.com/amd-license-agreement/?f=acml4.4.0-win64.exe

2. Install vcredist_x64.exe: http://www.microsoft.com/en-us/download/details.aspx?id=30679 or www.microsoft.com/download/en/details.aspx?14632

3. copy e:\AMD\acml4.4.0\win64\lib\* "c:\Program Files\R\R-3.0.1\bin\x64\"

copy "c:\Program Files\R\R-3.0.1\bin\x64\Rblas.dll" "c:\Program Files\R\R-3.0.1\bin\x64\Rblas.save"

copy "c:\Program Files\R\R-3.0.1\bin\x64\libacml_dll.dll" "c:\Program Files\R\R-3.0.1\bin\x64\Rblas.dll"

4. Set the OMP_NUM_THREADS environment variable.

Page 8

Why the old library version in the previous slide?

I couldn't get newer versions to work using multiple threads on Windows (although they work fine in single-threaded mode).

If you have a lot of cores, use the old library version shown. Otherwise, you can use the latest library version in single-threaded mode and still typically achieve large performance gains.

Page 9

Installing ACML for R on Linux

Download and install ACML (GFortran Linux version) http://developer.amd.com/amd-license-agreement/?f=acml-5-3-1-gfortran-64bit.tgz

cp /opt/acml5.3.1/gfortran64_mp/lib/* /usr/local/R/lib/

cd /usr/local/R/lib

cp libRblas.so libRblas.so.backup

cp libacml_mp.so libRblas.so

Set the OMP_NUM_THREADS environment variable.

Page 10

set.seed(1)
A = matrix(rnorm(2000^2), 2000)
S = svd(A)
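To see the effect of a BLAS swap, time this snippet before and after replacing the library (a suggestion of mine; the slide shows only the code):

# Run once with the stock reference BLAS and once with the ACML-backed
# Rblas in place, then compare the elapsed times.
set.seed(1)
A = matrix(rnorm(2000^2), 2000)
print(system.time(S <- svd(A)))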

Page 11

Caveat Emptor

There is an issue with the propagation of NA and NaN values in the AMD library version for Linux shown in the previous slides.

A discussion of the issue can be found here:

http://devgurus.amd.com/thread/153983
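A quick sanity check for NA propagation (my addition, not from the slides):

# NA should propagate through matrix multiplication: with a correctly
# behaving BLAS, the first row of the product below is all NA.
A = matrix(c(NA, 1, 1, 1), 2)
B = matrix(1, 2, 2)
A %*% B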

Page 12

Vectorization

Page 13

Example: Find overlapping time intervals within each unique ID grouping in a data frame

ID  start       end
1   2001-11-26  2002-06-03
1   2002-08-30  2002-10-15
1   2002-10-07  2003-01-27
1   2003-08-27  2003-11-18
1   2004-02-11  2004-06-23
1   2004-07-23  2005-02-10
2   2003-02-24  2003-02-28
2   2003-07-11  2003-09-09
2   2004-06-26  2004-10-16
3   2002-09-15  2002-12-18

It's easy enough to spot by eye that row 2 overlaps row 3. But what about doing this automatically and efficiently on vast amounts of data?
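For reference, here is the example data as an R data frame (reconstructed from the table above):

X = data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3),
  start = as.Date(c("2001-11-26", "2002-08-30", "2002-10-07", "2003-08-27",
                    "2004-02-11", "2004-07-23", "2003-02-24", "2003-07-11",
                    "2004-06-26", "2002-09-15")),
  end   = as.Date(c("2002-06-03", "2002-10-15", "2003-01-27", "2003-11-18",
                    "2004-06-23", "2005-02-10", "2003-02-28", "2003-09-09",
                    "2004-10-16", "2002-12-18")))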

Page 14

f = function(X) {
  overlap = c()
  for (j in 1:(nrow(X) - 1)) {
    if (X[j, "ID"] == X[j + 1, "ID"] && X[j, "end"] > X[j + 1, "start"]) {
      overlap = c(overlap, j)
    }
  }
  overlap
}

This will work, but slowly

There are at least two performance-related problems with this approach. Do you see them? (Hint: growing 'overlap' with c() copies the vector on every append, and indexing a data frame row by row inside a loop is slow.)

Page 15

Let's try again... maybe compile it?

Unfortunately, this doesn't help much (at least on my cheap laptop).

library("compiler")cf = cmpfun(f)

Page 16

The vectorized way...

v = function(X) {
  block = c(diff(X$ID) == 0, TRUE)
  up = c(X$start, Inf)[2:(nrow(X) + 1)]
  which((up < X$end) & block)
}

Page 17

1. Start by adding the 'block' variable:

ID  start       end         block
1   2001-11-26  2002-06-03  TRUE
1   2002-08-30  2002-10-15  TRUE
1   2002-10-07  2003-01-27  TRUE
1   2003-08-27  2003-11-18  TRUE
1   2004-02-11  2004-06-23  TRUE
1   2004-07-23  2005-02-10  FALSE
2   2003-02-24  2003-02-28  TRUE
2   2003-07-11  2003-09-09  TRUE
2   2004-06-26  2004-10-16  FALSE
3   2002-09-15  2002-12-18  TRUE

Page 18

2. Shift the start column up (the 'up' column):

ID  start       up          end         block
1   2001-11-26  2002-08-30  2002-06-03  TRUE
1   2002-08-30  2002-10-07  2002-10-15  TRUE
1   2002-10-07  2003-08-27  2003-01-27  TRUE
1   2003-08-27  2004-02-11  2003-11-18  TRUE
1   2004-02-11  2004-07-23  2004-06-23  TRUE
1   2004-07-23  2003-02-24  2005-02-10  FALSE
2   2003-02-24  2003-07-11  2003-02-28  TRUE
2   2003-07-11  2004-06-26  2003-09-09  TRUE
2   2004-06-26  2002-09-15  2004-10-16  FALSE
3   2002-09-15  <NA>        2002-12-18  TRUE

Page 19

3. Compare the shifted 'up' column with the end column, subject to the block condition:

ID  start       up          end         block
1   2001-11-26  2002-08-30  2002-06-03  TRUE
1   2002-08-30  2002-10-07  2002-10-15  TRUE
1   2002-10-07  2003-08-27  2003-01-27  TRUE
1   2003-08-27  2004-02-11  2003-11-18  TRUE
1   2004-02-11  2004-07-23  2004-06-23  TRUE
1   2004-07-23  2003-02-24  2005-02-10  FALSE
2   2003-02-24  2003-07-11  2003-02-28  TRUE
2   2003-07-11  2004-06-26  2003-09-09  TRUE
2   2004-06-26  2002-09-15  2004-10-16  FALSE
3   2002-09-15  <NA>        2002-12-18  TRUE

Page 20

Vectorization performance

Tested on my slow laptop with 10,000 rows of data like the example:

For loop 6.8 seconds

Compiled for loop 6.4 seconds

Vectorized 0.009 seconds (!!)
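A sketch of how one might reproduce the comparison (my reconstruction; the slide doesn't show the benchmark code, and the data-generation scheme below is an assumption):

# Build 10,000 rows shaped like the example: IDs sorted, and intervals
# sorted by start date within each ID.
set.seed(1)
n = 10000
start = as.Date("2001-01-01") + cumsum(sample(0:30, n, replace = TRUE))
X = data.frame(ID = sort(sample(1:2000, n, replace = TRUE)),
               start = start,
               end = start + sample(1:60, n, replace = TRUE))
identical(f(X), v(X))   # both find the same overlaps
system.time(f(X))       # for loop
system.time(v(X))       # vectorized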

Page 21

Parallel computing with foreach

— Don't mourn slow-running programs... Organize them!

Page 22

Foreach defines an abstract interface to parallel computing.

Computations are performed by 'back ends' that register with foreach.

The same code works sequentially or in parallel.
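For example, registering a multicore backend (my example; any of the backends listed later in these slides would do):

# Register the doMC backend on a Unix-alike; foreach code after this
# point runs across 4 worker processes instead of sequentially.
library("doMC")
registerDoMC(cores = 4)
getDoParWorkers()   # reports 4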

Page 23

foreach(iterator, ...) %dopar% { R expression }

Page 24

library("foreach")

foreach(j=1:4) %dopar% {j}

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

Page 25

foreach(j=1:4, .combine=c) %dopar% {j}

[1] 1 2 3 4

foreach(j=1:4, .combine=`+`) %dopar% {j}

[1] 10

Here { j } is the mapped expression, and .combine specifies the reduction function.

Page 26

Foreach respects lexical scope—it just works...

z = 2
f = function(x) { sqrt(x + z) }

foreach(j=1:4, .combine=c) %dopar% { f(j) }

[1] 1.732051 2.000000 2.236068 2.449490

Foreach figures out that the mapped expression needs the definition of f and the value of z, and automatically exports them to wherever the work is being computed.

Page 27

Nesting parallel loops

Use %:% to nest foreach loops. The example creates one set of 15 tasks instead of 3, which might be better load balanced across the available resources:

foreach(x=0:2) %:%
  foreach(y=1:5, .combine=c) %dopar% { x + y }

[[1]]
[1] 1 2 3 4 5

[[2]]
[1] 2 3 4 5 6

[[3]]
[1] 3 4 5 6 7

Page 28

List comprehension-like syntax

Use `when` to add a filter predicate

foreach(x=0:2) %:%
  foreach(y=1:5, .combine=c) %:%
    when(x < y) %dopar% { x + y }

[[1]]
[1] 1 2 3 4 5

[[2]]
[1] 3 4 5 6

[[3]]
[1] 5 6 7

Page 29

Some available parallel backends...

doMPI
doSNOW
doMC (Unix-like OS only)
doNWS
doSMP (Windows only, maybe unmaintained now)
doDeathstar (ZeroMQ based--nifty!)
doRedis (elastic, fault-tolerant, cross-platform)

See http://goo.gl/G9VAA for a different presentation about elastic computing in R with doRedis.
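As a taste, a doRedis session might look like this (a sketch of mine; it assumes a Redis server is running locally, and the queue name "jobs" is arbitrary):

# Register a doRedis backend on a work queue and start two local workers.
library("doRedis")
registerDoRedis("jobs")
startLocalWorkers(n = 2, queue = "jobs")
foreach(j = 1:4, .combine = c) %dopar% { j }
removeQueue("jobs")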

Page 30

There exist many other superb parallel computing techniques for R.

http://cran.r-project.org/web/views/HighPerformanceComputing.html

Page 31

Know your algorithms

Page 32

Say you want a few principal components of a matrix...

set.seed(55)
A = matrix(rnorm(1000^2), 1000)
P = princomp(A)

princomp works, but computes way more than we want!

Page 33

Say you want a few principal components of a matrix...

set.seed(55)
A = matrix(rnorm(1000^2), 1000)
P = princomp(A)

library("irlba")C = scale(A, center=TRUE, scale=FALSE)P = irlba(C, nu=2, nv=2)

princomp works, but computes way more than we want!

irlba efficiently computes just what we want.
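A quick check (my addition, not on the slide) that both approaches agree:

# The leading right singular vector of the centered matrix matches
# princomp's first loading up to sign.
prc = princomp(A)
max(abs(abs(P$v[, 1]) - abs(prc$loadings[, 1])))   # approximately 0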

Page 34

Example performance, 1000x1000 matrix, two principal components (computed on my cheap laptop)

And the performance gain grows with bigger problems...

Page 35

Higher performance GLMs (well, glm.fit really)

Page 36

GLM boils down to a nonlinear optimization problem.

It's usually solved with iteratively re-weighted least squares (IRLS), an iteration something like:

beta = (X^T W X)^{-1} X^T W yhat

Where X is a model matrix, W a diagonal weight matrix updated at each iteration, beta are the model coefficients, and yhat is a quantity derived from the response vector.
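Concretely, for logistic regression one IRLS pass looks something like this (a minimal single-machine sketch of my own, not the slide's code):

# X: model matrix, y: 0/1 response. Each pass solves the weighted
# least-squares system from the iteration above.
irls_logistic = function(X, y, iters = 25) {
  beta = rep(0, ncol(X))
  for (i in 1:iters) {
    eta = drop(X %*% beta)
    mu  = 1 / (1 + exp(-eta))    # fitted probabilities
    W   = mu * (1 - mu)          # IRLS weights (diagonal of W)
    z   = eta + (y - mu) / W     # working response
    beta = solve(crossprod(X, W * X), crossprod(X, W * z))
  }
  drop(beta)
}
# Compare with: coef(glm.fit(X, y, family = binomial()))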

Page 37

IRLS can be split into a map/reduce-like parallel problem.

Let the rows of X be partitioned into blocks X_1, ..., X_p, with conforming blocks W_i and z_i of the weights and working response. Then

X^T W X = X_1^T W_1 X_1 + X_2^T W_2 X_2 + ... + X_p^T W_p X_p
X^T W z = X_1^T W_1 z_1 + X_2^T W_2 z_2 + ... + X_p^T W_p z_p

These are independent partial products that can be computed in parallel. Here, z is a value that depends on yhat from the previous slide...

Page 38

The code ends up looking something like...

# Initialize partitions IX of the model matrix X on the cluster somehow...

combiner = function(x, y) {
  list(XTWX = x$XTWX + y$XTWX,
       XTWz = x$XTWz + y$XTWz)
}
...

PX = foreach(icount(np), .combine=combiner, .inorder=FALSE) %dopar% {
  list(XTWX = crossprod(X, W[IX] * X),
       XTWz = t(crossprod(W[IX] * z[IX], X)))
}

beta = solve(PX$XTWX, PX$XTWz, tol = 2 * .Machine$double.eps)
...

Page 39

Quick example

Logistic regression

10 million observations x 200 variables

bigglm works, but takes quite a while

I just ran a quick test (not really well optimized) on a 4-computer, 32 CPU core Linux cluster. It took about 5 minutes.

Page 40

Notes...

The speedglm package almost got this right, but didn't think about parallelism.

There are some numerical stability issues to think about with this approach.

Look for code examples on http://illposed.net soon...

Page 41

Performance gems to know...

Rcpp almost turns C++ into R (!!), making C++ much nicer to use, and making it really easy to mix R and C++ ideas and code.

Bigmemory lets R work with matrices larger than available RAM. Flexmem employs a trick to let any R object exceed available RAM size.

SciDB lets R easily work in parallel on distributed arrays. SciDB can handle really big data problems (tens of terabytes or more).

Programming with Big Data in R (pbdR) defines ScaLAPACK-based distributed dense arrays and parallel methods for R. It runs on giant supercomputers at ORNL.

rmr is the most elegant R/Hadoop integration I know of.

Page 42

A few algorithm gems...

Jordan's Bag of Little Bootstraps may be one of the more important algorithms for big data mining to appear in a while. It shows that many problems can be computed in an embarrassingly parallel way (that is, partitioned into fully independent sub-problems whose answers can be combined to form a final result).
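A toy BLB run with foreach (my illustration; the parameter choices s, b, and R below are arbitrary assumptions, not from the slides):

# Estimate the standard error of the mean. Each size-b subset is
# processed independently, so the s tasks are embarrassingly parallel.
library("foreach")
blb_se = function(x, s = 10, b = floor(length(x)^0.6), R = 50) {
  n = length(x)
  ses = foreach(i = 1:s, .combine = c) %dopar% {
    xi = sample(x, b)                        # one little subset
    est = replicate(R, {
      w = drop(rmultinom(1, n, rep(1, b)))   # resample n points from b
      sum(w * xi) / n                        # weighted mean estimate
    })
    sd(est)                                  # this subset's bootstrap SE
  }
  mean(ses)                                  # combine across subsets
}
blb_se(rnorm(100000))   # close to 1/sqrt(100000)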

GLMNet is an extension of the elastic net variable selection/regularization method of Friedman, Hastie, and Tibshirani that uncovers and exploits a remarkable computational trick to gain substantial efficiency. It's super cool.

Benzi, Boito, Estrada, and others have come up with some amazingly efficient and very elegant techniques for estimating functions of huge graphs. See, for example, http://www.mathcs.emory.edu/~benzi/Web_papers/adjacency_paper.pdf.

Page 43

Tips and Tricks for Performance Computing with R

Bryan W. Lewis, [email protected]

http://goo.gl/gcPezs