Sebastian Schelter – Distributed Machine Learning with the Samsara DSL
Distributed Machine Learning with the Samsara DSL
Sebastian Schelter, Flink Forward 2015
About me
• about to finish my PhD on "Scaling Data Mining in Massively Parallel Dataflow Systems"
• currently:
– Machine Learning Scientist / Post-Doctoral Researcher at Amazon's Berlin-based ML group
– senior researcher at the Database Group of TU Berlin
• member of the Apache Software Foundation (Mahout/Giraph/Flink)
Samsara
• Samsara is an easy-to-use domain specific language (DSL) for distributed large-scale machine learning on systems like Apache Spark and Apache Flink
• part of the Apache Mahout project
• uses Scala as programming/scripting environment
• system-agnostic, R-like DSL:

val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)

• algebraic expression optimizer for distributed linear algebra
  – provides a translation layer to distributed engines
(in mathematical notation: G = BBᵀ − C − Cᵀ + (ξᵀξ) · s_q s_qᵀ)
Data Types
• Scalar real values
• In-memory vectors
  – dense
  – 2 types of sparse
• In-memory matrices
  – sparse and dense
  – a number of specialized matrices
• Distributed Row Matrices (DRM)
  – huge matrix, partitioned by rows
  – lives in the main memory of the cluster
  – provides a small set of parallelized operations
  – lazily evaluated operation execution

val x = 2.367                                   // scalar
val v = dvec(1, 0, 5)                           // dense in-memory vector
val w = svec((0 -> 1) :: (2 -> 5) :: Nil)       // sparse in-memory vector
val A = dense((1, 0, 5), (2, 1, 4), (4, 3, 1))  // dense in-memory matrix
val drmA = drmFromHDFS(...)                     // distributed row matrix
![Page 5: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/5.jpg)
Features (1)
• matrix, vector, scalar operators: in-memory, distributed

drmA %*% drmB; A %*% x; A.t %*% drmB; A * B

• slicing operators

A(5 until 20, 3 until 40); A(5, ::); A(5, 5); x(a to b)

• assignments (in-memory only)

A(5, ::) := x; A *= B; A -=: B; 1 /:= x

• vector-specific

x dot y; x cross y

• summaries

A.nrow; x.length; A.colSums; B.rowMeans; x.sum; A.norm
![Page 6: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/6.jpg)
Features (2)
• solving linear systems

val x = solve(A, b)

• in-memory decompositions

val (inMemQ, inMemR) = qr(inMemM)
val ch = chol(inMemM)
val (inMemV, d) = eigen(inMemM)
val (inMemU, inMemV, s) = svd(inMemM)

• distributed decompositions

val (drmQ, inMemR) = thinQR(drmA)
val (drmU, drmV, s) = dssvd(drmA, k = 50, q = 1)

• caching of DRMs

val drmA_cached = drmA.checkpoint()
drmA_cached.uncache()
![Page 7: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/7.jpg)
Example
![Page 8: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/8.jpg)
Cereals
Name protein fat carbo sugars rating
Apple Cinnamon Cheerios 2 2 10.5 10 29.509541
Cap'n'Crunch 1 2 12 12 18.042851
Cocoa Puffs 1 1 12 13 22.736446
Froot Loops 2 1 11 13 32.207582
Honey Graham Ohs 1 2 12 11 21.871292
Wheaties Honey Gold 2 1 16 8 36.187559
Cheerios 6 2 17 1 50.764999
Clusters 3 2 13 7 40.400208
Great Grains Pecan 3 3 13 4 45.811716
http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html
![Page 9: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/9.jpg)
Linear Regression
• Assumption: target variable y is generated by a linear combination of the feature matrix X with parameter vector β, plus noise ε:  y = Xβ + ε
• Goal: find an estimate of the parameter vector β that explains the data well
• Cereals example:
  X = weights of ingredients
  y = customer rating
![Page 10: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/10.jpg)
Data Ingestion
• Usually: load dataset as DRM from a distributed filesystem:

val drmData = drmFromHdfs(...)

• 'Mimic' a large dataset for our example:

val drmData = drmParallelize(dense(
  (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
  (1, 2, 12, 12, 18.042851),    // Cap'n'Crunch
  (1, 1, 12, 13, 22.736446),    // Cocoa Puffs
  (2, 1, 11, 13, 32.207582),    // Froot Loops
  (1, 2, 12, 11, 21.871292),    // Honey Graham Ohs
  (2, 1, 16, 8, 36.187559),     // Wheaties Honey Gold
  (6, 2, 17, 1, 50.764999),     // Cheerios
  (3, 2, 13, 7, 40.400208),     // Clusters
  (3, 3, 13, 4, 45.811716)),    // Great Grains Pecan
  numPartitions = 2)
![Page 11: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/11.jpg)
Data Preparation
• Cereals example: target variable y is the customer rating, the weights of the ingredients are the features X
• extract X as a DRM by slicing, fetch y as an in-core vector
val drmX = drmData(::, 0 until 4)
val y = drmData.collect(::, 4)
![Page 12: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/12.jpg)
Estimating β
• Ordinary Least Squares: minimizes the sum of residual squares between the true target variable and the prediction of the target variable
• Closed-form expression for the estimate of β:  β̂ = (XᵀX)⁻¹ Xᵀy
• Computing XᵀX and Xᵀy is as simple as typing the formulas:

val drmXtX = drmX.t %*% drmX
val drmXty = drmX.t %*% y
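The distributed expressions above only run inside a Mahout/Samsara environment, but the arithmetic they denote is easy to check. A plain-Scala sketch (stand-in loops, not Samsara code) of XᵀX and Xᵀy on the first three cereals rows:

```scala
// Plain-Scala stand-in for drmX.t %*% drmX and drmX.t %*% y (illustration only)
val X: Array[Array[Double]] = Array(
  Array(2, 2, 10.5, 10),  // Apple Cinnamon Cheerios
  Array(1, 2, 12.0, 12),  // Cap'n'Crunch
  Array(1, 1, 12.0, 13)   // Cocoa Puffs
)
val y: Array[Double] = Array(29.509541, 18.042851, 22.736446)

val cols = X(0).length

// XtX(i)(j) = sum over all rows r of X(r)(i) * X(r)(j)
val XtX: Array[Array[Double]] =
  Array.tabulate(cols, cols)((i, j) => X.map(row => row(i) * row(j)).sum)

// Xty(i) = sum over all rows r of X(r)(i) * y(r)
val Xty: Array[Double] =
  Array.tabulate(cols)(i => X.indices.map(r => X(r)(i) * y(r)).sum)
```

Note that XᵀX comes out symmetric, which the distributed operators exploit.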
![Page 13: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/13.jpg)
Estimating β
• Solve the following linear system to get the least-squares estimate of β:  XᵀX β̂ = Xᵀy
• Fetch XᵀX and Xᵀy onto the driver and use an in-core solver
  – assumes XᵀX fits into memory
  – uses an analogue of R's solve() function

val XtX = drmXtX.collect
val Xty = drmXty.collect(::, 0)
val betaHat = solve(XtX, Xty)
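solve() wraps a dense in-core solver. Conceptually it behaves like the following plain-Scala Gaussian-elimination sketch (an illustration only, not Mahout's actual implementation):

```scala
// Naive Gaussian elimination with partial pivoting: solves M x = b.
// Conceptual stand-in for Samsara's solve(XtX, Xty), not the real code.
def gaussSolve(mIn: Array[Array[Double]], bIn: Array[Double]): Array[Double] = {
  val n = bIn.length
  val m = mIn.map(_.clone)  // work on copies, leave the inputs untouched
  val b = bIn.clone
  for (col <- 0 until n) {
    // pivot: swap in the row with the largest absolute value in this column
    val p = (col until n).maxBy(r => math.abs(m(r)(col)))
    val tmpRow = m(col); m(col) = m(p); m(p) = tmpRow
    val tmpB = b(col); b(col) = b(p); b(p) = tmpB
    // eliminate entries below the pivot
    for (r <- col + 1 until n) {
      val f = m(r)(col) / m(col)(col)
      for (c <- col until n) m(r)(c) -= f * m(col)(c)
      b(r) -= f * b(col)
    }
  }
  // back substitution
  val x = new Array[Double](n)
  for (r <- n - 1 to 0 by -1)
    x(r) = (b(r) - (r + 1 until n).map(c => m(r)(c) * x(c)).sum) / m(r)(r)
  x
}

// 2x + y = 5, x + 3y = 10  has the solution x = 1, y = 3
val beta = gaussSolve(Array(Array(2.0, 1.0), Array(1.0, 3.0)), Array(5.0, 10.0))
```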
![Page 14: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/14.jpg)
Estimating β• Solve the following linear system to get least-squares estimate of ß
• Fetch XTX and XTy onto the driver and use an in-memory solver – assumes XTX fits into memory– uses analogon to R’s solve() function
val XtX = drmXtX.collect val Xty = drmXty.collect(::, 0)
val betaHat = solve(XtX, Xty)
yXXX TT
→ We have implemented distributed linear regression!
![Page 15: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/15.jpg)
Goodness of fit
• Prediction of the target variable is a simple matrix-vector multiplication:  ŷ = X β̂
• Check the L2 norm of the difference between the true target variable and our prediction:

val yHat = (drmX %*% betaHat).collect(::, 0)
(y - yHat).norm(2)
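The same check in plain Scala (stand-in helpers, not Samsara code): prediction as a matrix-vector product, then the L2 norm of the residual:

```scala
// Predict yHat = X * beta for a row-major matrix (plain-Scala illustration)
def predict(x: Array[Array[Double]], beta: Array[Double]): Array[Double] =
  x.map(row => row.zip(beta).map { case (v, b) => v * b }.sum)

// L2 (Euclidean) norm of a vector
def l2(v: Array[Double]): Double = math.sqrt(v.map(d => d * d).sum)

val yTrue = Array(3.0, 4.0, 5.0)
val yHat  = Array(3.0, 4.0, 5.0)  // a perfect fit has residual norm 0
val residualNorm = l2(yTrue.zip(yHat).map { case (t, p) => t - p })
```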
![Page 16: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/16.jpg)
Adding a bias term
• Bias term left out so far
  – a constant factor added to the model, "shifts the line vertically"
• Common trick is to add a column of ones to the feature matrix
  – the bias term will then be learned automatically
Adding a bias term
• How do we add a new column to a DRM?
→ mapBlock() allows for custom modifications of the matrix

val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) {
  case (keys, block) =>
    // create a new block with an additional column
    val blockWithBiasCol = block.like(block.nrow, block.ncol + 1)
    // copy data from current block into the new block
    blockWithBiasCol(::, 0 until block.ncol) := block
    // last column consists of ones
    blockWithBiasCol(::, block.ncol) := 1
    keys -> blockWithBiasCol
}
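The same trick outside of Samsara, sketched on a plain row-major Array[Array[Double]] (hypothetical helper, not Mahout code):

```scala
// Append a column of ones to a row-major matrix; plain-Scala analogue of the
// mapBlock trick above (illustration only, not Mahout code)
def addBiasColumn(x: Array[Array[Double]]): Array[Array[Double]] =
  x.map(row => row :+ 1.0)

val withBias = addBiasColumn(Array(Array(2.0, 2.0), Array(1.0, 2.0)))
```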
![Page 18: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/18.jpg)
Under the covers
![Page 19: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/19.jpg)
Underlying systems
• prototype on Apache Spark
• prototype on H2O
• coming up: support for Apache Flink
![Page 20: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/20.jpg)
Runtime & Optimization
• Execution is deferred, the user composes logical operators
• Computational actions implicitly trigger optimization (= selection of a physical plan) and execution
• Optimization factors: size of operands, orientation of operands, partitioning, sharing of computational paths
• e.g., matrix multiplication:
  – 5 physical operators for drmA %*% drmB
  – 2 operators for drmA %*% inMemA
  – 1 operator for drmA %*% x
  – 1 operator for x %*% drmA

val C = A.t %*% A              // only composes a logical plan
I.writeDrm(path)               // computational action: triggers optimization & execution
val inMemV = (U %*% M).collect // computational action: triggers optimization & execution
![Page 21: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/21.jpg)
Optimization Example
• Computation of AᵀA in the example:

val C = A.t %*% A

• Naïve execution:
  – 1st pass: transpose A (requires repartitioning of A)
  – 2nd pass: multiply the result with A (expensive, potentially requires repartitioning again)
• Logical optimization: the optimizer rewrites the plan (Transpose feeding into MatrixMult) to use a specialized logical operator for Transpose-Times-Self matrix multiplication
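The rewrite can be sketched as pattern matching over a tiny logical-plan tree. The node names below are made up for illustration; Mahout's actual optimizer is considerably more elaborate:

```scala
// Toy logical plan and the Transpose-Times-Self rewrite (hypothetical node
// names, only to illustrate the idea)
sealed trait Plan
case class Drm(name: String) extends Plan
case class Transpose(p: Plan) extends Plan
case class MatMul(left: Plan, right: Plan) extends Plan
case class TransposeTimesSelf(p: Plan) extends Plan

def optimize(p: Plan): Plan = p match {
  // A.t %*% A  →  specialized single-pass operator
  case MatMul(Transpose(a), b) if a == b => TransposeTimesSelf(a)
  case MatMul(l, r)                      => MatMul(optimize(l), optimize(r))
  case Transpose(inner)                  => Transpose(optimize(inner))
  case other                             => other
}

val plan = MatMul(Transpose(Drm("A")), Drm("A"))  // val C = A.t %*% A
val optimized = optimize(plan)
```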
![Page 22: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/22.jpg)
Optimization Example• Computation of ATA in example
• Naïve execution
1st pass: transpose A (requires repartitioning of A)
2nd pass: multiply result with A(expensive, potentially requires repartitioning again)
• Logical optimization:
rewrite plan to use specialized logical operator for Transpose-Times-Self matrix multiplication
val C = A.t %*% A
Transpose
A
![Page 23: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/23.jpg)
Optimization Example• Computation of ATA in example
• Naïve execution
1st pass: transpose A (requires repartitioning of A)
2nd pass: multiply result with A(expensive, potentially requires repartitioning again)
• Logical optimization:
rewrite plan to use specialized logical operator for Transpose-Times-Self matrix multiplication
val C = A.t %*% A
Transpose
MatrixMult
A A
C
![Page 24: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/24.jpg)
Optimization Example• Computation of ATA in example
• Naïve execution
1st pass: transpose A (requires repartitioning of A)
2nd pass: multiply result with A(expensive, potentially requires repartitioning again)
• Logical optimization
Optimizer rewrites plan to use specialized logical operator for Transpose-Times-Self matrix multiplication
val C = A.t %*% A
Transpose
MatrixMult
A A
C
Transpose-Times-Self
A
C
![Page 25: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/25.jpg)
Transpose-Times-Self
• Samsara computes AᵀA via a row-outer-product formulation
  – executes in a single pass over the row-partitioned A

  AᵀA = Σᵢ aᵢ aᵢᵀ   (sum of the outer products of the m rows aᵢ of A)
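The formulation is easy to verify in plain Scala on the 3×4 binary matrix used in the AtA slides below: summing the outer products of the rows yields the same result as computing AᵀA directly.

```scala
// A^T A as a sum of row outer products, checked against the direct product
// (plain Scala; the matrix is the 3x4 example from the AtA slides)
val A = Array(
  Array(1.0, 1.0, 0.0, 0.0),
  Array(0.0, 1.0, 0.0, 1.0),
  Array(0.0, 1.0, 1.0, 1.0)
)
val n = A(0).length

// direct: (A^T A)(i)(j) = sum over rows r of A(r)(i) * A(r)(j)
val direct = Array.tabulate(n, n)((i, j) => A.map(r => r(i) * r(j)).sum)

// single pass over the rows, accumulating the outer product a_i a_i^T
val acc = Array.fill(n, n)(0.0)
for (row <- A; i <- 0 until n; j <- 0 until n)
  acc(i)(j) += row(i) * row(j)
```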
![Page 26: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/26.jpg)
Transpose-Times-Self• Samsara computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
A
![Page 27: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/27.jpg)
Transpose-Times-Self• Samsara computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
x
A AT
![Page 28: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/28.jpg)
Transpose-Times-Self• Samsara computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
x = x
A AT a1• a1•T
![Page 29: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/29.jpg)
Transpose-Times-Self• Samsara computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
x = x + x
A AT a1• a1•T a2• a2•
T
![Page 30: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/30.jpg)
Transpose-Times-Self• Mahout computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
x = x + +x x
A AT a1• a1•T a2• a2•
T a3• a3•T
![Page 31: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/31.jpg)
Transpose-Times-Self• Samsara computes ATA via row-outer-product formulation
– executes in a single pass over row-partitioned A
m
i
Tii
T aaAA0
x = x + + +x x x
A AT a1• a1•T a2• a2•
T a3• a3•T a4• a4•
T
![Page 32: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/32.jpg)
Physical operators for the distributed computation of AᵀA
![Page 33: Sebastian Schelter – Distributed Machine Learing with the Samsara DSL](https://reader031.fdocuments.net/reader031/viewer/2022030305/587138591a28abf0568b6367/html5/thumbnails/33.jpg)
Physical operators for Transpose-Times-Self
• Two physical operators (concrete implementations) available for the Transpose-Times-Self operation
  – standard operator AtA
  – operator AtA_slim, a specialized implementation for tall & skinny matrices
• Optimizer must choose
  – currently: depends on a user-defined threshold for the number of columns
  – ideally: a cost-based decision, dependent on estimates of intermediate result sizes
Physical operator AtA
• worked example: A is a 3×4 matrix with rows (1, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 1), row-partitioned into A1 and A2 on workers 1 and 2
• each worker computes, row by row, the outer-product contributions of its partition for the 1st and the 2nd partition of the result
• the partial results are shuffled to workers 3 and 4, which sum them up (∑) to produce the partitions of the result:

AᵀA =
1 1 0 0
1 3 1 2
0 1 1 1
0 2 1 2
Physical operator AtA_slim
• same example matrix A, row-partitioned into A1 and A2 on workers 1 and 2
• each worker locally computes the product AᵢᵀAᵢ of its own partition (the result is symmetric, so only one triangle needs to be materialized)
• the driver fetches the partial products and sums them up:

C = AᵀA = A1ᵀA1 + A2ᵀA2
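A plain-Scala miniature of this scheme (illustrative partitioning, not Mahout code): each "worker" computes AᵢᵀAᵢ on its row partition, and the "driver" sums the partial results.

```scala
// AtA_slim in miniature: per-partition self-products, summed on the driver
// (plain Scala, same 3x4 example matrix as above; the split across two
// "workers" is chosen only for illustration)
val partitions: Seq[Array[Array[Double]]] = Seq(
  Array(Array(1.0, 1.0, 0.0, 0.0)),                             // partition A1
  Array(Array(0.0, 1.0, 0.0, 1.0), Array(0.0, 1.0, 1.0, 1.0))   // partition A2
)
val k = 4  // number of columns of A

// worker-side: Ai^T Ai for one row partition Ai
def selfProduct(part: Array[Array[Double]]): Array[Array[Double]] =
  Array.tabulate(k, k)((i, j) => part.map(r => r(i) * r(j)).sum)

// driver-side summation: C = A1^T A1 + A2^T A2
val C = partitions.map(selfProduct).reduce((m1, m2) =>
  Array.tabulate(k, k)((i, j) => m1(i)(j) + m2(i)(j)))
```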
Pointers
• Contribution to Apache Mahout in progress: https://issues.apache.org/jira/browse/MAHOUT-1570
• Apache Mahout has extensive documentation on Samsara:
  – http://mahout.apache.org/users/environment/in-core-reference.html
  – https://mahout.apache.org/users/environment/out-of-core-reference.html
Thank you. Questions?