Matrix Completion with ALS - Stanford...

17
Reza Zadeh Matrix Completion with ALS @Reza_Zadeh | http://reza-zadeh.com

Transcript of Matrix Completion with ALS - Stanford...

Page 1: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Reza Zadeh

Matrix Completion with ALS

@Reza_Zadeh | http://reza-zadeh.com

Page 2: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Collaborative Filtering Goal: predict users’ movie ratings based on past ratings of other movies

R  =  

1  ?  ?  4  5  ?  3  ?  ?  3  5  ?  ?  3  5  ?  5  ?  ?  ?  1  4  ?  ?  ?  ?  2  ?  

Movies  

Users  

Page 3: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Don’t mistake this with SVD.

Both are matrix factorizations, however SVD cannot handle missing entries.

Page 4: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Optimization problem

Page 5: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Alternating Least Squares 1.  Start with random A1, B1 2.  Solve for A2 to minimize ||R – A2B1

T|| 3.  Solve for B2 to minimize ||R – A2B2

T|| 4.  Repeat until convergence

R A = BT

Page 6: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Attempt 1: Broadcast All

Page 7: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Attempt 2: Data Parallel

Page 8: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Attempt 3: Fully Parallel

Page 9: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

ALS on Spark Cache 2 copies of R in memory, one partitioned by rows and one by columns Keep A & B partitioned in corresponding way Operate on blocks to lower communication

R A = BT

Matei Zaharia,"Joey Gonzales,"Virginia Smith

Page 10: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

ALS Results

4208

481 297 0

1000

2000

3000

4000

5000

Tota

l Tim

e (s)

Mahout / Hadoop Spark (Scala) GraphLab (C++)

Page 11: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

State of the Spark ecosystem

Page 12: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Most active open source community in big data

200+ developers, 50+ companies contributing

Spark Community

Giraph Storm

0

50

100

150

Contributors in past year

Page 13: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Project Activity M

apRe

duce

YA

RN HD

FS

Stor

m

Spar

k

0

200

400

600

800

1000

1200

1400

1600

Map

Redu

ce

YARN

HDFS

St

orm

Spar

k

0

50000

100000

150000

200000

250000

300000

350000

Commits Lines of Code Changed

Activity in past 6 months

Page 14: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Continuing Growth

source: ohloh.net

Contributors per month to Spark

Page 15: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Conclusions

Page 16: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Spark and Research Spark has all its roots in research, so we hope to keep incorporating new ideas!

Page 17: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD ...

Conclusion Data flow engines are becoming an important platform for numerical algorithms While early models like MapReduce were inefficient, new ones like Spark close this gap More info: spark.apache.org