Factorbird: a Parameter Server Approach to Distributed...

26
Factorbird: a Parameter Server Approach to Distributed Matrix Factorization Sebastian Schelter, Venu Satuluri, Reza Zadeh Distributed Machine Learning and Matrix Computations workshop in conjunction with NIPS 2014

Transcript of Factorbird: a Parameter Server Approach to Distributed...

Page 1: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Factorbird: a Parameter Server Approach to Distributed Matrix

Factorization

Sebastian  Schelter,  Venu  Satuluri,  Reza  Zadeh Distributed  Machine  Learning  and  Matrix  

Computations  workshop  in  conjunction  with  NIPS  2014

Page 2: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Latent Factor Models •  Given  M – sparse – n  x  m

•  Returns  U  and  V – rank  k

•  Applications – Dimensionality  reduction – Recommendation – Inference

Page 3: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Seem familiar?

•  So  why  not  just  use  SVD?

SVD!

Page 4: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Problems with SVD

•  (Feb  24,  2015  edition)

Page 5: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Revamped loss function

•  g  –  global  bias  term •  bUi  –  user-­‐speciUic  bias  term  for  user  i •  bVj  –  item-­‐speciUic  bias  term  for  item  j •  prediction  function� p(i,  j)  =  g  +  bUi  +  bVj  +  uTivj

•  a(i,  j)  –  analogous  to  SVD’s  mij  (ground  truth)

•  New  loss  function:

Page 6: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Algorithm

Page 7: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Problems

1.  Resulting  U  and  V,  for  graphs  with  millions  of  vertices,  still  equate  to  hundreds  of  gigabytes  of  Uloating  point  values.

2.  SGD  is  inherently  sequential;  either  locking  or  multiple  passes  are  required  to  synchronize.

Page 8: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Problem 1: size of parameters

•  Solution:  Parameter  Server  architecture

Page 9: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Problem 2: simultaneous writes

•  Solution: …so  what?

Page 10: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Lock-free concurrent updates?

•  Assumptions

1.   f  is  Lipshitz  continuously  differentiable 2.   f  is  strongly  convex 3.  Ω  (size  of  hypergraph)  is  small 4.  Δ  (fraction  of  edges  that  intersect  any  variable)  is  small

5.  ρ  (sparsity  of  hypergraph)  is  small

Page 11: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Factorbird Architecture

Page 12: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Parameter server architecture

•  Open  source! – http://parameterserver.org/

Page 13: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Factorbird Machinery

•  memcached  –  Distributed  memory  object  caching  system

•  Uinagle  –  Twitter’s  RPC  system •  HDFS  –  persistent  Uilestore  for  data •  Scalding  –  Scala  front-­‐end  for  Hadoop  MapReduce  jobs

•  Mesos  –  resource  manager  for  learner  machines

Page 14: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Factorbird stubs

Page 15: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Model assessment

•  Matrix  factorization  using  RMSE – Root-­‐mean  squared  error

•  SGD  performance  often  a  function  of  hyperparameters – λ:  regularization – η:  learning  rate – k:  number  of  latent  factors

Page 16: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

[Hyper]Parameter grid search

•  aka  “parameter  scans:”  Uinding  the  optimal  combination  of  hyperparameters – Parallelize!

m⇥ (c ⇤ k)(c ⇤ k)⇥ n

Page 17: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Experiments

•  “RealGraph” – Not  a  dataset;  a  framework  for  creating  graph  of  user-­‐user  interactions  on  Twitter

Kamath, Krishna, et al. "RealGraph: User Interaction Prediction at Twitter." User Engagement Optimization Workshop@ KDD. 2014.

Page 18: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Experiments

•  Data:  binarized  adjacency  matrix  of  subset  of  Twitter  follower  graph – a(i,  j)  =  1  if  user  i  interacted  with  user  j,  0  otherwise

•  All  prediction  errors  weighted  equally  (w(i,  j)  =  1)

•  100  million  interactions •  440,000  [popular]  users

Page 19: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Experiments

•  80%  training,  10%  validation,  10%  testing

Page 20: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Experiments

•  k  =  2 •  Homophily

Page 21: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Experiments

•  Scalability  of  Factorbird – large  RealGraph  subset – 229M  x  195M  (44.6  quadrillion) – 38.5  billion  non-­‐zero  entries

•  Single  SGD  pass  through  training  set:  ~2.5  hours

•  ~  40  billion  parameters

Page 22: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Important to note

•  As  with  most  (if  not  all)  distributed  platforms:

Page 23: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Future work

•  Support  streaming  (user  follows) •  Simultaneous  factorization •  Fault  tolerance •  Reduce  network  trafUic •  s/memcached/custom  application/g •  Load  balancing

Page 24: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Strengths

•  Excellent  extension  of  prior  work – Hogwild,  RealGraph

•  Current  and  [mostly]  open  technology – Hadoop,  Scalding,  Mesos,  memcached

•  Clear  problem,  clear  solution,  clear  validation

Page 25: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Weaknesses •  Lack  of  detail,  lack  of  detail,  lack  of  detail – How  does  number  of  machines  affect  runtime? – What  were  performance  metrics  of  the  large  RealGraph  subset? – What  were  some  of  the  properties  of  the  dataset  (when  was  it  collected,  how  were  edges  determined,  what  does  “popular”  mean,  etc)? – How  did  other  factorization  methods  perform  by  comparison?

Page 26: Factorbird: a Parameter Server Approach to Distributed ...stanford.edu/~rezab/slides/factorbird_slides_shannon.pdf · Factorbird: a Parameter Server Approach to Distributed Matrix

Questions?