SCALING SGD TO BIG DATA & HUGE MODELS
Alex Beutel
Based on work done with Abhimanu Kumar, Vagelis Papalexakis, Partha Talukdar, Qirong Ho, Christos Faloutsos, and Eric Xing
2
Big Learning Challenges
• Collaborative Filtering: Predict movie preferences
• Topic Modeling: What are the topics of webpages, tweets, or status updates?
• Dictionary Learning: Remove noise or missing pixels from images
• Tensor Decomposition: Find communities in temporal graphs
• 300 million photos uploaded to Facebook per day
• 1 billion users on Facebook
• 400 million tweets per day
3
Big Data & Huge Model Challenge
• 2 billion tweets covering 300,000 words
• Break into 1000 topics
• More than 2 trillion parameters to learn
• Over 7 terabytes of model
Topic Modeling: What are the topics of webpages, tweets, or status updates?
400 million tweets per day
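The parameter and model-size figures above can be sanity-checked with a little arithmetic. The sketch below assumes one parameter per (tweet, topic) pair plus one per (word, topic) pair, each stored as a 4-byte float; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumptions (not from the slide): a tweets-by-topics factor plus a
# words-by-topics factor, with every parameter stored as a 4-byte float.
tweets = 2_000_000_000
words = 300_000
topics = 1_000
bytes_per_param = 4

params = tweets * topics + words * topics
model_bytes = params * bytes_per_param
model_tib = model_bytes / 2**40

print(f"{params:,} parameters")
print(f"{model_tib:.1f} TiB of model")
```

Under these assumptions the tweets-by-topics factor dominates: the word factor adds only 300 million parameters to the 2 trillion.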
4
Outline
1. Background
2. Optimization
   • Partitioning
   • Constraints & Projections
3. System Design
   1. General algorithm
   2. How to use Hadoop
   3. Distributed normalization
   4. “Always-On SGD”: Dealing with stragglers
4. Experiments
5. Future questions
5
BACKGROUND
6
Stochastic Gradient Descent (SGD)
8
SGD for Matrix Factorization
[Figure: X ≈ U V, where X is Users × Movies and the latent dimension is Genres]
9
SGD for Matrix Factorization
[Figure: X ≈ U V, with two entries of X marked "Independent!"]
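A minimal sketch of why such entries are independent, assuming a squared-error loss and a plain SGD step (the function name `sgd_step`, the learning rate, and the toy shapes are illustrative, not from the slides): an update for entry X[i, j] only touches row i of U and column j of V, so two entries that share neither a row nor a column modify disjoint parameters.

```python
import numpy as np

def sgd_step(U, V, i, j, x_ij, lr=0.01):
    """One SGD step on squared error for a single observed entry X[i, j]."""
    err = x_ij - U[i] @ V[:, j]
    u_old = U[i].copy()              # keep the old row so both updates use the same values
    U[i] += lr * err * V[:, j]       # touches only row i of U
    V[:, j] += lr * err * u_old      # touches only column j of V

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))          # users x genres
V = rng.normal(size=(2, 5))          # genres x movies
U0, V0 = U.copy(), V.copy()

sgd_step(U, V, i=0, j=1, x_ij=5.0)
sgd_step(U, V, i=2, j=3, x_ij=1.0)

# Which rows of U and columns of V were modified by the two updates?
changed_rows = np.flatnonzero(np.any(U != U0, axis=1))
changed_cols = np.flatnonzero(np.any(V != V0, axis=0))
print(changed_rows, changed_cols)
```

Because the two updates write to different rows and columns, applying them in either order (or at the same time) gives the same result.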
10
The Rise of SGD
• Hogwild! (Niu et al., 2011)
  • Noticed independence
  • If matrix is sparse, there will be little contention
  • Ignore locks
• DSGD (Gemulla et al., 2011)
  • Noticed independence
  • Broke matrix into blocks
11
DSGD for Matrix Factorization (Gemulla, 2011)
Independent Blocks
12
DSGD for Matrix Factorization (Gemulla, 2011)
Partition your data & model into d × d blocks
Results in d=3 strata
Process strata sequentially, process blocks in each stratum in parallel
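One way to build such strata is a cyclic shift: in stratum s, row block b is paired with column block (b + s) mod d. This is a common construction and an assumption here; the slides do not say which assignment is used, and `matrix_strata` is a hypothetical helper name.

```python
def matrix_strata(d):
    """Stratum s pairs row block b with column block (b + s) % d."""
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

strata = matrix_strata(3)
for s, blocks in enumerate(strata):
    # Blocks within one stratum share no row block and no column block,
    # so they can be processed in parallel.
    print(f"stratum {s}: {blocks}")
```

With d = 3 this yields 3 strata of 3 blocks each, and together they cover all 9 blocks of the matrix exactly once.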
14
TENSOR DECOMPOSITION
15
What is a tensor?
• Tensors are used for structured data with more than 2 dimensions
• Think of it as a 3D matrix
For example, with modes Subject, Verb, and Object:
Derek Jeter plays baseball
16
Tensor Decomposition
[Figure: tensor X (Subject × Verb × Object) ≈ factor matrices U, V, W; example entry: Derek Jeter plays baseball]
18
Tensor Decomposition
[Figure: X ≈ factor matrices U, V, W; one pair of entries is marked "Independent", another "Not Independent"]
19
Tensor Decomposition
20
For d = 3 blocks per stratum, we require d² = 9 strata
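The d² count can be checked with a small sketch. It assumes a cyclic-shift assignment in which stratum (s, t) holds blocks (b, (b + s) mod d, (b + t) mod d); this particular assignment and the name `tensor_strata` are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def tensor_strata(d):
    """Stratum (s, t) holds blocks (b, (b + s) % d, (b + t) % d) for b in 0..d-1."""
    return {
        (s, t): [(b, (b + s) % d, (b + t) % d) for b in range(d)]
        for s, t in product(range(d), repeat=2)
    }

strata = tensor_strata(3)
print(len(strata), "strata")
print(strata[(0, 1)])
```

Each stratum contains d blocks that share no index range in any of the three modes, and the d² strata together cover all d³ blocks of the tensor.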
21
Coupled Matrix + Tensor Decomposition
[Figure: tensor X with modes Subject, Verb, Object, coupled with a matrix Y that adds a Document dimension]
22
Coupled Matrix + Tensor Decomposition
[Figure: X and Y are approximated jointly using factor matrices U, V, W and an additional factor matrix A]
23
Coupled Matrix + Tensor Decomposition
24
CONSTRAINTS & PROJECTIONS
25
Example: Topic Modeling
[Figure: Documents × Words matrix factored through Topics]
26
Constraints
• Sometimes we want to restrict response:
  • Non-negative
  • Sparsity
  • Simplex (so vectors become probabilities)
  • Keep inside unit ball
27
How to enforce? Projections
• Example: Non-negative
28
More projections
• Sparsity (soft thresholding)
• Simplex
• Unit ball
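The formulas on these slides did not survive extraction, so the following is a sketch of the textbook forms of the four projections rather than the authors' exact definitions. The simplex version uses the standard sort-based Euclidean projection, and the threshold `lam` is an illustrative parameter.

```python
import numpy as np

def project_nonnegative(x):
    """Clip negative entries to zero."""
    return np.maximum(x, 0.0)

def soft_threshold(x, lam):
    """Shrink every entry toward zero by lam; small entries become exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_simplex(x):
    """Euclidean projection onto {p : p >= 0, sum(p) = 1} (sort-based method)."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(x) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(x - theta, 0.0)

def project_unit_ball(x):
    """Rescale x onto the unit L2 ball if it lies outside it."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

x = np.array([0.2, -0.4, 1.5])
print(project_nonnegative(x))
print(soft_threshold(x, 0.3))
print(project_simplex(x))
print(project_unit_ball(x))
```

In projected SGD, one of these functions would be applied to the affected parameters after each gradient step so that they stay inside the constraint set.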
29
Sparse Non-Negative Tensor Factorization
• Sparse encoding
• Non-negativity: more interpretable results
30
Dictionary Learning
• Learn a dictionary of concepts and a sparse reconstruction
• Useful for fixing noise and missing pixels of images
Sparse encoding
Within unit ball
31
Mixed Membership Network Decomp.
• Used for modeling communities in graphs (e.g. a social network)
Simplex
Non-negative
32
Proof Sketch of Convergence
• Regenerative process: each point is used once per epoch
• Projections are not too big and don’t “wander off” (Lipschitz continuous)
• Step sizes are bounded:
[Details]
Normal Gradient Descent Update
Noise from SGD Projection
SGD Constraint error
33
SYSTEM DESIGN
34
High level algorithm
for Epoch e = 1 … T do
    for Subepoch s = 1 … d² do
        Take the set of blocks in stratum s
        for block b = 1 … d in parallel do
            Run SGD on all points in block b of stratum s
        end
    end
end
[Figure: Stratum 1, Stratum 2, Stratum 3, …]
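The loop structure can be tried out on a toy problem. The sketch below is a single-process simulation under several assumptions that are not from the slides: a small dense cubic tensor, a rank-2 factorization with a squared-error loss, a cyclic-shift stratum assignment, and a fixed learning rate. The innermost block loop runs sequentially here; in the distributed setting those blocks would run in parallel.

```python
import numpy as np
from itertools import product

def run_stratified_sgd(X, rank=2, d=2, epochs=20, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]                        # assumes a cubic tensor with n divisible by d
    U, V, W = (rng.uniform(0.0, 0.5, size=(n, rank)) for _ in range(3))
    block_of = np.arange(n) * d // n      # index -> block id along each mode

    def loss():
        return float(np.sum((X - np.einsum("ir,jr,kr->ijk", U, V, W)) ** 2))

    history = [loss()]
    for _ in range(epochs):
        for s, t in product(range(d), repeat=2):     # d**2 subepochs (strata)
            for b in range(d):                       # independent blocks of this stratum
                I = np.flatnonzero(block_of == b)
                J = np.flatnonzero(block_of == (b + s) % d)
                K = np.flatnonzero(block_of == (b + t) % d)
                for i, j, k in product(I, J, K):     # SGD on all points in the block
                    err = X[i, j, k] - np.sum(U[i] * V[j] * W[k])
                    gu, gv, gw = V[j] * W[k], U[i] * W[k], U[i] * V[j]
                    U[i] += lr * err * gu
                    V[j] += lr * err * gv
                    W[k] += lr * err * gw
        history.append(loss())
    return history

# Toy data: a random rank-2 tensor of shape 6 x 6 x 6.
rng = np.random.default_rng(1)
A, B, C = (rng.uniform(size=(6, 2)) for _ in range(3))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

history = run_stratified_sgd(X)
print(history[0], history[-1])
```

The squared reconstruction error recorded in `history` gives a quick check that the stratified schedule still visits every point once per epoch and makes progress.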
35
Bad Hadoop Algorithm: Subepoch 1
Mappers → Reducers
• Run SGD on block; update: U2 V1 W3
• Run SGD on block; update: U3 V2 W1
• Run SGD on block; update: U1 V3 W2
36
Bad Hadoop Algorithm: Subepoch 2
Mappers → Reducers
• Run SGD on block; update: U2 V1 W2
• Run SGD on block; update: U3 V2 W3
• Run SGD on block; update: U1 V3 W1
37
Hadoop Challenges
• MapReduce is typically very bad for iterative algorithms
  • T × d² iterations
• Sizable overhead per Hadoop job
• Little flexibility
38
High Level Algorithm
[Figure: factor blocks U1 to U3, V1 to V3, W1 to W3; blocks processed in parallel in this subepoch: (U1, V1, W1), (U2, V2, W2), (U3, V3, W3)]
39
High Level Algorithm
[Figure: factor blocks U1 to U3, V1 to V3, W1 to W3; blocks processed in parallel in this subepoch: (U1, V1, W3), (U2, V2, W1), (U3, V3, W2)]
40
High Level Algorithm
[Figure: factor blocks U1 to U3, V1 to V3, W1 to W3; blocks processed in parallel in this subepoch: (U1, V1, W2), (U2, V2, W3), (U3, V3, W1)]
42
Hadoop Algorithm
Mappers: Process points: map each point to its block, with necessary info to order
Partition & Sort
Reducers
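A mapper of this kind could emit a composite key so that Hadoop's partition and sort steps deliver each reducer its points in subepoch order. The sketch below is a hypothetical key function, not the authors' implementation: it assumes a cubic tensor of side n split into d blocks per mode, routes a point to the reducer that owns its first-mode block, and derives the subepoch from a cyclic-shift stratum assignment.

```python
def mapper_key(point, n, d):
    """Return (reducer, subepoch) for a tensor point (i, j, k)."""
    bi, bj, bk = (idx * d // n for idx in point)   # block id along each mode
    s, t = (bj - bi) % d, (bk - bi) % d            # shifts that identify the stratum
    return bi, s * d + t

points = [(0, 0, 0), (5, 1, 3), (2, 4, 0), (3, 3, 5)]

# Sorting by key mimics partitioning by reducer and ordering by subepoch.
keyed = sorted((mapper_key(p, n=6, d=3), p) for p in points)
for key, p in keyed:
    print(key, p)
```

Partitioning on the first key component and sorting on the second would let one Hadoop job cover a whole epoch instead of launching one job per subepoch.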
44
Hadoop Algorithm
Mappers: Process points: map each point to its block, with necessary info to order
Partition & Sort
Reducers:
• Run SGD on block; update: U1 V1 W1
• Run SGD on block; update: U2 V2 W2
• Run SGD on block; update: U3 V3 W3
…
46
Hadoop Algorithm
Mappers: Process points: map each point to its block, with necessary info to order
Partition & Sort
Reducers:
• Run SGD on block; update: U1 V1
• Run SGD on block; update: U2 V2
• Run SGD on block; update: U3 V3
…
HDFS: W2, W1, W3
47
System Summary
• Limit storage and transfer of data and model
• Stock Hadoop can be used with HDFS for communication
• Hadoop makes the implementation highly portable
• Alternatively, could also implement on top of MPI or even a parameter server
48
Distributed Normalization
[Figure: Documents × Words factored through Topics, with the factors split into blocks π1 β1, π2 β2, π3 β3]
49
Distributed Normalization
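[Figure: blocks π1 β1, π2 β2, π3 β3; each block b produces a partial sum σ(b), and σ(1), σ(2), σ(3) are sent to every machine]
• σ(b) is a k-dimensional vector, summing the terms of βb
• Transfer σ(b) to all machines
• Each machine calculates σ by summing the σ(b)
• Normalize: each machine divides its βb by σ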
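The exchange of partial sums can be sketched in a few lines. The formulas on the slide were lost, so this assumes that each machine holds a words × topics block of β, that σ is the sum of the per-block vectors σ(b), and that normalizing means dividing each topic column by its global sum; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                                                        # number of topics
beta_blocks = [rng.uniform(size=(5, k)) for _ in range(3)]   # one block per machine

# Each machine b computes its local k-dimensional partial sum sigma(b).
sigma_b = [beta.sum(axis=0) for beta in beta_blocks]

# After the sigma(b) are exchanged, every machine can form the same global sigma.
sigma = np.sum(sigma_b, axis=0)

# Each machine normalizes its own block using the global sigma.
normalized = [beta / sigma for beta in beta_blocks]

print(np.vstack(normalized).sum(axis=0))
```

Only the k-dimensional vectors cross the network, so the cost of the normalization step is independent of the vocabulary size.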
50
Barriers & Stragglers
Mappers: Process points: map each point to its block, with necessary info to order
Partition & Sort
Reducers:
• Run SGD on block; update: U1 V1
• Run SGD on block; update: U2 V2
• Run SGD on block; update: U3 V3
…
HDFS: W2, W1, W3
Wasting time waiting!
51
Solution: “Always-On SGD”
For each reducer:
• Run SGD on all points in current block Z
• Check if other reducers are ready to sync
• If not ready to sync: rather than wait, shuffle points in Z, decrease step size, and run SGD on points in Z again; then check again
• Once ready: sync parameters and get new block Z
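The per-reducer loop can be sketched as follows. This is a single-process illustration under stated assumptions: `others_ready` stands in for whatever readiness check the real system uses, `sgd_pass` is a caller-supplied function, and the halving of the step size is an arbitrary choice, since the slide only says the step size is decreased.

```python
import random

def always_on_reducer(points, sgd_pass, others_ready, step=0.1, decay=0.5):
    """Run one mandatory pass over block Z, then extra passes until peers are ready."""
    sgd_pass(points, step)            # first SGD pass over the current block Z
    extra_passes = 0
    while not others_ready():         # keep working instead of idling at the barrier
        random.shuffle(points)        # shuffle points in Z
        step *= decay                 # decrease step size
        sgd_pass(points, step)        # run SGD on points in Z again
        extra_passes += 1
    return extra_passes               # caller then syncs parameters and gets a new block

# Toy usage: fit y = w * x with a single parameter w (true value 2.0).
w = [0.0]

def sgd_pass(points, step):
    for x, y in points:
        w[0] += step * (y - w[0] * x) * x

polls = iter([False, False, True])    # peers become ready on the third check
points = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]
extra = always_on_reducer(points, sgd_pass, lambda: next(polls))
print(extra, w[0])
```

In this toy run the reducer performs two extra passes while its peers finish, and each extra pass moves the parameter closer to its target.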
52
“Always-On SGD”
Process points: map each point to its block, with necessary info to order
Partition & Sort
Reducers:
• Run SGD on block; update: U1 V1
• Run SGD on block; update: U2 V2
• Run SGD on block; update: U3 V3
…
HDFS: W2, W1, W3
Run SGD on old points again!
54
Proof Sketch
• Martingale Difference Sequence: At the beginning of each epoch, the expected number of times each point will be processed is equal
• Can use properties of SGD and MDS to show variance decreases with more points used
• Extra updates are valuable
[Details]
55
“Always-On SGD”
[Figure: timeline for Reducers 1 to 4. Legend: First SGD pass of block Z; Extra SGD Updates; Read Parameters from HDFS; Write Parameters to HDFS]
56
EXPERIMENTS
57
FlexiFaCT (Tensor Decomposition)
Convergence
58
FlexiFaCT (Tensor Decomposition)
Scalability in Data Size
59
FlexiFaCT (Tensor Decomposition)
Scalability in Tensor Dimension
Handles up to 2 billion parameters!
60
FlexiFaCT (Tensor Decomposition)
Scalability in Rank of Decomposition
Handles up to 4 billion parameters!
61
Fugue (Using “Always-On SGD”)
Dictionary Learning: Convergence
62
Fugue (Using “Always-On SGD”)
Community Detection: Convergence
63
Fugue (Using “Always-On SGD”)
Topic Modeling: Convergence
64
Fugue (Using “Always-On SGD”)
Topic Modeling: Scalability in Data Size
GraphLab cannot spill to disk
65
Fugue (Using “Always-On SGD”)
Topic Modeling: Scalability in Rank
66
Fugue (Using “Always-On SGD”)
Topic Modeling: Scalability over Machines
67
Fugue (Using “Always-On SGD”)
Topic Modeling: Number of Machines
68
Fugue (Using “Always-On SGD”)
69
LOOKING FORWARD
70
Future Questions
• Do “extra updates” work on other techniques, e.g. Gibbs sampling? Other iterative algorithms?
• What other problems can be partitioned well? (Model & Data)
• Can we better choose certain data for extra updates?
• How can we store large models on disk for I/O efficient updates?
71
Key Points
• Flexible method for tensors & ML models
• Partition both data and model together for efficiency and scalability
• When waiting for slower machines, run extra updates on old data again
• Algorithmic & systems challenges in scaling ML can be addressed through statistical innovation
72
Questions?
Alex Beutel
[email protected]
alexbeutel.com
Source code available at http://beu.tl/flexifact