Matrix Factorization via SGD


BACKGROUND


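Recovering latent factors in a matrix

$V \approx W H$, where $V$ is $n \times m$ (n users, m movies), $W$ is $n \times r$ and $H$ is $r \times m$.

V[i,j] = user i's rating of movie j. With r = 2 (as drawn on the slide), user i is the row $(x_i, y_i)$ of W, movie j is the column $(a_j, b_j)$ of H, and $v_{ij} \approx x_i a_j + y_i b_j$.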
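Here is a minimal sketch of the picture above. The numbers are invented and do not come from the slides; they use n = 3 users, m = 4 movies and r = 2. Row i of W stores (x_i, y_i), row 0 of H stores the a_j's and row 1 stores the b_j's. Each predicted rating is then an inner product. Plain lists are used so that only the standard library is assumed.

```python
# Hypothetical rank-2 factors (r = 2), invented for illustration.
# Row i of W is (x_i, y_i) for user i.
W = [[1.0, 0.0],
     [0.5, 2.0],
     [0.0, 1.0]]                 # n = 3 users
# Row 0 of H is (a_1, ..., a_m); row 1 is (b_1, ..., b_m).
H = [[4.0, 1.0, 3.0, 0.0],
     [0.5, 2.0, 1.0, 3.0]]      # m = 4 movies


def predict(W, H, i, j):
    """Approximate V[i][j] as the inner product of row i of W and column j of H."""
    return sum(W[i][k] * H[k][j] for k in range(len(H)))


def reconstruct(W, H):
    """Return the full n x m approximation W H of V."""
    return [[predict(W, H, i, j) for j in range(len(H[0]))] for i in range(len(W))]


V_hat = reconstruct(W, H)
```

For example, `predict(W, H, 1, 2)` computes x_2 a_3 + y_2 b_3 = 0.5 · 3 + 2 · 1 = 3.5.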


MF VIA SGD


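Matrix factorization as SGD

Each SGD step updates the parameters $\theta$ (the entries of W and H) as $\theta_{n+1} = \theta_n - \epsilon_n N \nabla L_{ij}(\theta_n)$:

• $\epsilon_n$ is the step size.
• $\nabla L_{ij}$ is the local gradient, i.e. the gradient of the loss at one randomly chosen observed entry $(i, j)$.
• The local gradient is scaled up by N, the number of observed entries, to approximate the gradient of the full loss.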
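Below is a sketch of this update. It assumes the squared loss $(v_{ij} - W_{i\cdot} H_{\cdot j})^2$ with no regularization; the slides leave the choice of loss open. `sgd_step` and `sgd_epoch` are hypothetical helper names. N is the number of observed entries. A single step only touches row i of W and column j of H.

```python
import random


def sgd_step(W, H, i, j, v_ij, step, N):
    """One SGD step on the (assumed) squared loss (v_ij - W[i] . H[:, j])**2.

    The local gradient at entry (i, j) is scaled up by N, the number of
    observed entries, to approximate the gradient of the full loss.
    Only row i of W and column j of H change. Returns the residual.
    """
    r = len(H)
    err = v_ij - sum(W[i][k] * H[k][j] for k in range(r))
    for k in range(r):
        w_ik, h_kj = W[i][k], H[k][j]  # use the old values for both updates
        W[i][k] = w_ik - step * N * (-2.0 * err * h_kj)
        H[k][j] = h_kj - step * N * (-2.0 * err * w_ik)
    return err


def sgd_epoch(W, H, observed, step, rng=None):
    """Visit every observed (i, j, v) triple once, in random order."""
    if rng is None:
        rng = random.Random()
    N = len(observed)
    order = list(observed)
    rng.shuffle(order)
    for i, j, v in order:
        sgd_step(W, H, i, j, v, step, N)
```

If W and H both start at zero, every local gradient is zero and SGD never moves. This is one reason to initialize the factors randomly.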


Key claim:


What loss functions are possible?


ALS = alternating least squares
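For comparison, here is a rank-1 ALS sketch. The rank-1 restriction, the optional L2 weight `lam` and the function name are assumptions made to keep it short. With H fixed, each user's factor has a closed-form least-squares solution over that user's observed ratings. The roles of W and H then swap.

```python
def als_rank1(observed, n, m, iters=50, lam=0.0, init=1.0):
    """Rank-1 alternating least squares on observed (i, j, v) triples.

    Returns (w, h) with v_ij approximated by w[i] * h[j].
    lam is an optional (assumed) L2 regularization weight.
    """
    w = [init] * n
    h = [init] * m
    for _ in range(iters):
        # Fix h and solve each w_i: w_i = sum_j h_j v_ij / (sum_j h_j^2 + lam).
        num = [0.0] * n
        den = [lam] * n
        for i, j, v in observed:
            num[i] += h[j] * v
            den[i] += h[j] * h[j]
        w = [num[i] / den[i] if den[i] > 0 else 0.0 for i in range(n)]
        # Fix w and solve each h_j in the same way.
        num = [0.0] * m
        den = [lam] * m
        for i, j, v in observed:
            num[j] += w[i] * v
            den[j] += w[i] * w[i]
        h = [num[j] / den[j] if den[j] > 0 else 0.0 for j in range(m)]
    return w, h
```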


DISTRIBUTED MF VIA SGD


talk pilfered from …..

KDD 2011


NAACL 2010


Parallel Perceptrons

• Simplest idea:
  – Split the data into S "shards"
  – Train a perceptron on each shard independently; the weight vectors are w(1), w(2), …
  – Produce some weighted average of the w(i)'s as the final result
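The simplest idea above can be sketched as follows. Several choices here are assumptions: the round-robin sharding, the ±1 label convention, and weighting each w(i) by its shard's share of the data. The slide only says "some weighted average". All function names are hypothetical.

```python
def train_perceptron(examples, dim, epochs=1):
    """Plain perceptron from w = 0: on a mistake (y * w.x <= 0), add y * x to w."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def split_shards(data, S):
    """Deal the examples round-robin into S shards (an assumed split)."""
    return [data[s::S] for s in range(S)]


def weighted_average(vectors, mix):
    """Weighted average of the vectors; mix holds one weight per vector, summing to 1."""
    dim = len(vectors[0])
    return [sum(m * v[d] for m, v in zip(mix, vectors)) for d in range(dim)]


def parallel_perceptron(data, dim, S, epochs=10):
    """Train one perceptron per shard independently, then average once."""
    shards = split_shards(data, S)
    ws = [train_perceptron(shard, dim, epochs) for shard in shards]  # w(1), w(2), ...
    mix = [len(shard) / len(data) for shard in shards]  # assumed: weight by shard size
    return weighted_average(ws, mix)
```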


Parallelizing perceptrons

[Figure: the instances/labels are split into example subsets (Instances/labels – 1, 2, 3). vk's are computed on the subsets (vk-1, vk-2, vk-3) and combined by some sort of weighted averaging into vk.]


Parallel Perceptrons – take 2

Idea: do the simplest possible thing iteratively.

• Split the data into shards
• Let w = 0
• For n = 1, …
  – Train a perceptron on each shard with one pass, starting with w
  – Average the weight vectors (somehow) and let w be that average

Extra communication cost:
• redistributing the weight vectors
• done less frequently than if fully synchronized, more frequently than if fully parallelized

Averaging and redistributing w is an All-Reduce.
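Here is a sketch of the iterative version, under assumptions of its own: ±1 labels, round-robin shards, and mixing weights proportional to shard size. The function names are hypothetical. The shards are trained in a loop here; in a real system each shard would run on its own worker, and the average-then-redistribute step would be the All-Reduce.

```python
def perceptron_pass(examples, w):
    """One perceptron pass over a shard, starting from a copy of w."""
    w = list(w)
    for x, y in examples:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # mistake (or tie)
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def iterative_parameter_mixing(data, dim, S, rounds):
    """Repeat: one pass per shard from the shared w, then average into a new w."""
    shards = [data[s::S] for s in range(S)]          # assumed round-robin split
    mix = [len(sh) / len(data) for sh in shards]     # assumed mixing weights
    w = [0.0] * dim                                  # Let w = 0
    for _ in range(rounds):                          # For n = 1, ...
        local = [perceptron_pass(sh, w) for sh in shards]  # one worker per shard in practice
        # Average the local vectors and hand the result back to every shard.
        w = [sum(m * lw[d] for m, lw in zip(mix, local)) for d in range(dim)]
    return w
```

Communication happens once per round instead of once per example. This is the middle ground the slide describes between full synchronization and fully independent training.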


Parallelizing perceptrons – take 2

[Figure: the instances/labels are split into example subsets (Instances/labels – 1, 2, 3). Local vk's (w-1, w-2, w-3) are computed on each subset, starting from w (previous). They are then combined by some sort of weighted averaging into the new w.]


Similar to McDonald et al. with perceptron learning


Slow convergence…


More detail…

• Initialize W, H randomly – not at zero
• Choose a random ordering (random sort) of the points in a stratum in each "sub-epoch"
• Pick the strata sequence by permuting the rows and columns of M, and using M'[k,i] as the column index of row i in sub-epoch k
• Use "bold driver" to set the step size:
  – increase the step size when the loss decreases (in an epoch)
  – decrease the step size when the loss increases
• Implemented in Hadoop and R/Snowfall
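Below is a sketch of the scheduling pieces in this list. It rests on two assumptions. First, V is split into d × d contiguous blocks. Second, M is the cyclic Latin square M[k][i] = (i + k) mod d, because the slide's actual M is not recoverable. The function names and the bold-driver constants are hypothetical.

Permuting the rows and columns of a Latin square gives another Latin square. As a result, each sub-epoch's d blocks share no row or column block, and over d sub-epochs every block is visited exactly once.

```python
import random


def strata_sequence(d, rng=None):
    """Return d strata; stratum k lists the (row_block, col_block) pairs of sub-epoch k.

    Assumption: M is the cyclic Latin square M[k][i] = (i + k) % d.
    M' permutes its rows and columns, and row block i is paired with
    column block M'[k][i] in sub-epoch k.
    """
    if rng is None:
        rng = random.Random()
    M = [[(i + k) % d for i in range(d)] for k in range(d)]
    row_perm = rng.sample(range(d), d)
    col_perm = rng.sample(range(d), d)
    M_prime = [[M[row_perm[k]][col_perm[i]] for i in range(d)] for k in range(d)]
    return [[(i, M_prime[k][i]) for i in range(d)] for k in range(d)]


def subepoch_order(points, stratum, d, n, m, rng=None):
    """Return the observed (i, j, v) points inside the stratum's blocks, shuffled."""
    if rng is None:
        rng = random.Random()
    blocks = set(stratum)
    chosen = [(i, j, v) for i, j, v in points
              if (i * d // n, j * d // m) in blocks]  # contiguous d x d blocking
    rng.shuffle(chosen)
    return chosen


def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    """Grow the step if the epoch's loss went down, shrink it if it went up.

    grow and shrink are hypothetical constants, not values from the slides.
    """
    if loss < prev_loss:
        return step * grow
    if loss > prev_loss:
        return step * shrink
    return step
```

One epoch would then run the d sub-epochs in turn. The d blocks of each stratum go to different workers; because they share no rows of W or columns of H, their SGD updates do not conflict. After the epoch, `bold_driver` is called with the epoch's loss.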


Wall clock time: 8 nodes, 64 cores, R/snow, in-memory implementation


Number of Epochs


Varying rank: 100 epochs for all


Wall clock time: Hadoop, one map-reduce job per epoch


Hadoop scalability: Hadoop process setup time starts to dominate


Hadoop scalability
