Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang


Transcript of Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Page 1: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Scaling Apache Spark MLlib to billions of parameters

Yanbo Liang (Apache Spark committer @ Hortonworks)

joint with Weichen Xu

Page 2: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 3: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 4: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Big data + big models = better accuracies

Page 5: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Spark: a unified platform

“Hidden Technical Debt in Machine Learning Systems”, Google

Page 6: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Spark: a unified platform

“Hidden Technical Debt in Machine Learning Systems”, Google

Page 7: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Optimization

• L-BFGS
• SGD
• WLS

Page 8: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

MLlib logistic regression

[Diagram: the iterative training loop between the Driver/Controller and the Executors/Workers]

Driver: initialize coefficients and broadcast them to the executors.
Executors: compute the loss and gradient for each instance and sum them up locally (on every executor).
Driver: reduce from the executors to get lossSum and gradientSum; handle regularization and use L-BFGS/OWL-QN to find the next step.
Loop until convergence; the result is the final model coefficients.
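As a rough sketch of one iteration of this loop (not MLlib's actual implementation; Instance and lossAndGradient are illustrative names, and numerical safeguards are omitted):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import breeze.linalg.{DenseVector => BDV}

// Simplified instance: label in {0, 1} plus a dense feature vector.
case class Instance(label: Double, features: BDV[Double])

// One iteration of the driver-side loop above (binary logistic loss, no regularization).
def lossAndGradient(sc: SparkContext, data: RDD[Instance], coef: BDV[Double]): (Double, BDV[Double]) = {
  val bcCoef = sc.broadcast(coef)                       // broadcast coefficients to executors
  val zero = (0.0, BDV.zeros[Double](coef.length))
  data.aggregate(zero)(                                 // local sums, then one reduce to the driver
    seqOp = { case ((loss, grad), Instance(label, features)) =>
      val margin = bcCoef.value.dot(features)
      grad += features * (1.0 / (1.0 + math.exp(-margin)) - label)   // local gradient sum
      (loss + math.log1p(math.exp(margin)) - label * margin, grad)   // local loss sum
    },
    combOp = { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 += g2) }    // lossSum, gradientSum
  )
}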

Page 9: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Bottleneck 1

[Diagram: the same Driver/Executors training loop as the previous slide. The bottleneck targeted by Solution 1 is the reduce step, where lossSum and gradientSum from every executor are aggregated back to the driver in a single stage.]

Page 10: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Solution 1

[Diagram: multi-level (tree) aggregation. Each executor still computes the loss and gradient for its instances and sums them up locally; intermediate reduces combine subsets of executors into partial lossSum and gradientSum values, and a final reduce combines those partials into the final lossSum and gradientSum.]
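In Spark this multi-level reduce is what treeAggregate provides: the depth parameter inserts intermediate combine stages so that only a handful of partial sums reach the driver. A sketch reusing the Instance case class from the earlier sketch (illustrative, not the exact MLlib code):

import org.apache.spark.rdd.RDD
import breeze.linalg.{DenseVector => BDV}

// Same per-instance logic as before, but aggregated in a tree: executors first
// merge into a few partial (lossSum, gradientSum) pairs, and only those partials
// travel to the driver for the final reduce.
def lossAndGradientTree(data: RDD[Instance], coef: BDV[Double]): (Double, BDV[Double]) = {
  val bcCoef = data.sparkContext.broadcast(coef)
  val zero = (0.0, BDV.zeros[Double](coef.length))
  data.treeAggregate(zero)(
    seqOp = { case ((loss, grad), Instance(label, features)) =>
      val margin = bcCoef.value.dot(features)
      grad += features * (1.0 / (1.0 + math.exp(-margin)) - label)
      (loss + math.log1p(math.exp(margin)) - label * margin, grad)
    },
    combOp = { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 += g2) },
    depth = 2   // number of aggregation levels; raise it when there are very many partitions
  )
}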

Page 11: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Bottleneck 2

[Diagram: the same Driver/Executors training loop. The remaining bottleneck is the driver, which has to hold the full coefficient vector, broadcast it every iteration, and run the L-BFGS/OWL-QN update over it; the vector-free approach in the next section addresses this.]

Page 12: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 13: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Distributed Vector

Driver: [2.5, 6.8, 3.1, 9.0, 0.7, 1.9, 2.6, 8.7, 6.2, 1.1, 5.0, 0.0]

Workers: worker0, worker1, worker2, worker3

Page 14: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Distributed Vector

Driver: [2.5, 6.8, 3.1, 9.0, 0.7, 1.9, 2.6, 8.7, 6.2, 1.1, 5.0, 0.0]

worker0: [2.5, 6.8, 3.1]
worker1: [9.0, 0.7, 1.9]
worker2: [2.6, 8.7, 6.2]
worker3: [1.1, 5.0, 0.0]
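A toy version of this split, just to make the picture concrete (the local[4] master, block size and variable names are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Slice a 12-element driver-side array into 4 blocks of 3 values and put one
// block into each partition of an RDD, mirroring the picture above.
val spark = SparkSession.builder().appName("dv-demo").master("local[4]").getOrCreate()
val sc = spark.sparkContext

val local = Array(2.5, 6.8, 3.1, 9.0, 0.7, 1.9, 2.6, 8.7, 6.2, 1.1, 5.0, 0.0)
val sizePerPart = 3
val blocks: Seq[Vector] = local.grouped(sizePerPart).map(a => Vectors.dense(a)).toSeq
val distributed = sc.parallelize(blocks, numSlices = blocks.length)   // one block per partition

distributed.glom().collect().foreach(p => println(p.mkString(" ")))   // print each partition's block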

Page 15: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Distributed Vector

• Definition:

  class DistributedVector(
      val values: RDD[Vector],
      val sizePerPart: Int,
      val numPartitions: Int,
      val size: Long)

• Linear algebra operations (a sketch follows below):
  – a * x + y
  – a * x + b * y + c * z + …
  – x.dot(y)
  – x.norm
  – …
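A minimal sketch of how a couple of these operations can be implemented when both operands share the same partitioning; the class body and method names here are illustrative, not the spark-vlbfgs implementation:

import org.apache.spark.rdd.RDD
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Illustrative DistributedVector with two of the operations listed above.
// Both operands must have identical partitioning so their blocks can be zipped.
class DistributedVector(val values: RDD[Vector],
                        val sizePerPart: Int,
                        val numPartitions: Int,
                        val size: Long) extends Serializable {

  // a * this + y, computed block by block without collecting anything to the driver.
  def axpy(a: Double, y: DistributedVector): DistributedVector = {
    val res = values.zip(y.values).map { case (xb, yb) =>
      Vectors.dense(xb.toArray.zip(yb.toArray).map { case (xi, yi) => a * xi + yi })
    }
    new DistributedVector(res, sizePerPart, numPartitions, size)
  }

  // Dot product: per-block partial sums, then a scalar reduce to the driver.
  def dot(y: DistributedVector): Double = {
    values.zip(y.values).map { case (xb, yb) =>
      xb.toArray.zip(yb.toArray).map { case (xi, yi) => xi * yi }.sum
    }.sum()
  }
}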

Page 16: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

L-BFGS

Coefficients: X(k)

Choose descent direction (two-loop recursion)

X(k+1) = alpha * dir(k) + X(k)

Calculate G(k+1) and F(k+1)

Page 17: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

L-BFGS

Coefficients: X(k)

Choose descent direction (two-loop recursion)

X(k+1) = alpha * dir(k) + X(k)

Calculate G(k+1) and F(k+1)

[Diagram: the workers compute the loss and gradient; the driver stores the coefficients and the 2m history vectors and runs the L-BFGS update.]

Driver cost per iteration: Space: (2m+1)*d, Time: 6md
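For reference, the classical two-loop recursion that runs on the driver looks roughly like this (a textbook-style sketch, not Breeze's implementation); it is these length-d vector operations over the 2m history vectors that give the (2m+1)*d space and roughly 6md time per iteration:

import breeze.linalg.{DenseVector => BDV}

// Classical L-BFGS two-loop recursion on the driver. s(i) and y(i) are the
// coefficient and gradient differences of past iterations, newest entries last;
// assumes at least one (s, y) pair.
def twoLoopDirection(g: BDV[Double], s: Seq[BDV[Double]], y: Seq[BDV[Double]]): BDV[Double] = {
  val m = s.length
  val rho = Array.tabulate(m)(i => 1.0 / (y(i) dot s(i)))
  val alpha = new Array[Double](m)
  val q = g.copy
  for (i <- m - 1 to 0 by -1) {          // first loop: newest to oldest
    alpha(i) = rho(i) * (s(i) dot q)
    q -= y(i) * alpha(i)
  }
  val gamma = (s(m - 1) dot y(m - 1)) / (y(m - 1) dot y(m - 1))   // initial Hessian scaling
  val r = q * gamma
  for (i <- 0 until m) {                 // second loop: oldest to newest
    val beta = rho(i) * (y(i) dot r)
    r += s(i) * (alpha(i) - beta)
  }
  r * (-1.0)                             // descent direction
}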

Page 18: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Vector-free L-BFGS

Coefficients: X(k)

Choose descent direction (two-loop recursion)

X(k+1) = alpha * dir(k) + X(k)

Calculate G(k+1) and F(k+1)

Page 19: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Vector-free L-BFGS

Coefficients: X(k)

Choose descent direction (two-loop recursion)

X(k+1) = alpha * dir(k) + X(k)

Calculate G(k+1) and F(k+1)

[Diagram: with vector-free L-BFGS the coefficient, gradient, and history vectors are distributed across the workers; the driver only keeps the (2m+1) x (2m+1) matrix of their dot products.]

L-BFGS driver cost: Space: (2m+1)*d, Time: 6md
Vector-free L-BFGS driver cost: Space: (2m+1)^2, Time: 8m^2
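The driver-side part of the vector-free variant (following the NIPS paper listed in the references) never touches a length-d vector: it runs the same two loops purely on the (2m+1) x (2m+1) matrix of dot products among the basis {s(0..m-1), y(0..m-1), g} and produces 2m+1 coefficients; the workers then form the direction as a linear combination of the distributed basis vectors (the a * x + b * y + c * z + … operation from the DistributedVector slide). A sketch, with B(i)(j) = basis(i) dot basis(j) and the basis ordered as s blocks, then y blocks, then the gradient:

// Returns the 2m+1 coefficients delta such that
//   direction = sum_j delta(j) * basis(j),
// where basis = (s(0), ..., s(m-1), y(0), ..., y(m-1), g), newest history last,
// and B(i)(j) = basis(i) dot basis(j). Only O(m^2) scalar work on the driver;
// assumes m >= 1.
def vectorFreeTwoLoop(B: Array[Array[Double]], m: Int): Array[Double] = {
  val n = 2 * m + 1
  val delta = new Array[Double](n)
  delta(n - 1) = -1.0                          // start from -g
  val alpha = new Array[Double](m)

  for (i <- m - 1 to 0 by -1) {                // first loop, newest to oldest
    val sDotDir = (0 until n).map(j => delta(j) * B(i)(j)).sum
    alpha(i) = sDotDir / B(i)(m + i)           // rho(i) = 1 / (s(i) dot y(i)) = 1 / B(i)(m+i)
    delta(m + i) -= alpha(i)                   // dir -= alpha(i) * y(i)
  }

  val gamma = B(m - 1)(2 * m - 1) / B(2 * m - 1)(2 * m - 1)   // initial Hessian scaling
  for (j <- 0 until n) delta(j) *= gamma

  for (i <- 0 until m) {                       // second loop, oldest to newest
    val yDotDir = (0 until n).map(j => delta(j) * B(m + i)(j)).sum
    val beta = yDotDir / B(i)(m + i)
    delta(i) += alpha(i) - beta                // dir += (alpha(i) - beta) * s(i)
  }
  delta
}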

Page 20: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 21: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: the training data as a numInstances x numFeatures matrix, split by rows into partition1, partition2, partition3.]

Page 22: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: the same numInstances x numFeatures matrix, now split along both dimensions into a grid of blocks.]

Page 23: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

coefficients: DistributedVector, stored as blocks coeff0, coeff1, coeff2

Page 24: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: each coefficient block (coeff0, coeff1, coeff2) is replicated to every data block in its column.]

Page 25: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: every data block produces a partial multiplier block, multiplier00 through multiplier22.]

Page 26: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: the partial multiplier blocks together with the label blocks label0, label1, label2 (one per row partition).]

Page 27: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: the partial multipliers are combined per row partition into multiplier0, multiplier1, multiplier2.]

Page 28: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: multiplier0, multiplier1, multiplier2 are sent back to every data block in the corresponding row partition.]

Page 29: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: every data block produces a partial gradient block, grad00 through grad22.]

Page 30: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

[Diagram: the partial gradient blocks are summed per column into grad0, grad1, grad2, which form the gradient as a DistributedVector.]
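A conceptual sketch of this block-wise pass with plain RDD joins; the real VLogisticRegression is considerably more careful about partitioning, the intercept, and the loss, and Block, gradientBlocks and the key layout here are purely illustrative:

import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import org.apache.spark.rdd.RDD

// Data block at (rowBlock i, colBlock j): a dense sub-matrix of the instance matrix.
case class Block(i: Int, j: Int, m: BDM[Double])

// coeffs: one coefficient block per column block j.
// labels: one label block per row (instance) block i.
def gradientBlocks(blocks: RDD[Block],
                   coeffs: RDD[(Int, BDV[Double])],
                   labels: RDD[(Int, BDV[Double])]): RDD[(Int, BDV[Double])] = {
  // Pass 1: margins. Each block multiplies its sub-matrix by its coefficient
  // block; partial margins are summed over column blocks for every row block.
  val margins: RDD[(Int, BDV[Double])] = blocks.map(b => (b.j, b))
    .join(coeffs)
    .map { case (_, (b, c)) => (b.i, b.m * c) }
    .reduceByKey(_ + _)

  // Turn margins and labels into multipliers: sigma(margin) - label, per row block.
  val multipliers: RDD[(Int, BDV[Double])] = margins.join(labels).map {
    case (i, (margin, label)) =>
      (i, margin.map(v => 1.0 / (1.0 + math.exp(-v))) - label)
  }

  // Pass 2: gradient. Each block multiplies its transposed sub-matrix by the
  // multiplier of its row block; partial gradients are summed per column block.
  blocks.map(b => (b.i, b))
    .join(multipliers)
    .map { case (_, (b, mult)) => (b.j, b.m.t * mult) }
    .reduceByKey(_ + _)          // one gradient block per coefficient block
}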

Page 31: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Regularization and OWL-QN

• Regularization:
  – L2
  – L1
  – elasticNet

• Implement vector-free OWL-QN for objectives with L1/elasticNet regularization (a sketch of the smooth L2 part follows below).
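A minimal sketch of that smooth part, using the usual MLlib convention regParam * (elasticNetParam * ||w||_1 + (1 - elasticNetParam)/2 * ||w||_2^2): only the differentiable L2 term is folded into the reduced loss and gradient, while the L1 term is what the OWL-QN variant handles.

import breeze.linalg.{DenseVector => BDV}

// Add the L2 part of elastic-net regularization to an already-reduced loss and
// gradient (illustrative; per-block and intercept handling omitted).
def addL2(loss: Double, grad: BDV[Double], w: BDV[Double],
          regParam: Double, elasticNetParam: Double): (Double, BDV[Double]) = {
  val l2 = regParam * (1.0 - elasticNetParam)
  (loss + 0.5 * l2 * (w dot w), grad + w * l2)
}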

Page 32: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Checkpoint

Coefficients: X(k)

Choose descent direction (two-loop recursion)

X(k+1) = alpha * dir(k) + X(k)

Calculate G(k+1) and F(k+1)

[Diagram: the same vector-free L-BFGS loop; the distributed vectors are periodically checkpointed so that their RDD lineage does not keep growing across iterations.]
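Each iteration derives new distributed vectors from the previous ones, so the RDD lineage grows with the iteration count; checkpointing every few iterations keeps it bounded. A minimal sketch (the interval and helper name are illustrative):

import org.apache.spark.rdd.RDD
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.storage.StorageLevel

// Cut the lineage behind a distributed vector every `checkpointInterval` iterations.
// Requires sc.setCheckpointDir(...) to have been called once before training.
def maybeCheckpoint(values: RDD[Vector], iter: Int, checkpointInterval: Int = 15): Unit = {
  if (checkpointInterval > 0 && iter % checkpointInterval == 0) {
    values.persist(StorageLevel.MEMORY_AND_DISK)
    values.checkpoint()   // mark the RDD for reliable checkpointing
    values.count()        // force materialization so the checkpoint is written now
  }
}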

Page 33: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 34: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Performance (WIP)

[Chart: time for each epoch, L-BFGS vs. VL-BFGS, as the number of parameters grows: 1M, 10M, 50M, 100M, 250M, 500M, 750M, 1B.]

Page 35: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 36: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

APIs

val dataset: Dataset[_] = spark.read.format("libsvm").load("data/a9a")

val trainer = new VLogisticRegression()
  .setColsPerBlock(100)
  .setRowsPerBlock(10)
  .setColPartitions(3)
  .setRowPartitions(3)
  .setRegParam(0.5)

val model = trainer.fit(dataset)
println(s"Vector-free logistic regression coefficients: ${model.coefficients}")

Page 37: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Adaptive logistic regression

Number of features                        Optimization method
less than 4096                            WLS/IRLS
more than 4096 but less than 10 million   L-BFGS
more than 10 million                      VL-BFGS
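A sketch of the selection logic implied by the table (the thresholds come from the table; the solver names other than "vl-bfgs", which appears on the next API slide, are illustrative):

// Pick an optimization method from the number of features, following the table above.
def chooseSolver(numFeatures: Long): String = {
  if (numFeatures < 4096) "wls"                 // normal-equation style WLS/IRLS
  else if (numFeatures < 10000000L) "l-bfgs"    // driver-side L-BFGS with broadcast coefficients
  else "vl-bfgs"                                // vector-free L-BFGS with distributed vectors
}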

Page 38: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Whole picture of MLlib

Application layer: Ads CTR, anti-fraud, data science, NLP, …
Machine learning algorithm layer: LinearRegression, LogisticRegression, MLP, …
Optimization layer: WLS/IRLS, L-BFGS, VL-BFGS, SGD, …
Spark core

Page 39: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

APIs

val dataset: Dataset[_] = spark.read.format("libsvm").load("data/a9a")

val trainer = new LogisticRegression()
  .setColsPerBlock(100)
  .setRowsPerBlock(10)
  .setColPartitions(3)
  .setRowPartitions(3)
  .setRegParam(0.5)
  .setSolver("vl-bfgs")

val model = trainer.fit(dataset)
println(s"Vector-free logistic regression coefficients: ${model.coefficients}")

Page 40: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Outline

• Background
• Vector-free L-BFGS on Spark
• Logistic regression on vector-free L-BFGS
• Performance
• Integrate with existing MLlib
• Future work

Page 41: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Future work

• Reduce the number of stages for each iteration.
• Performance improvements.
• Implement LinearRegression, SoftmaxRegression and MultilayerPerceptronClassifier based on vector-free L-BFGS/OWL-QN.
• A real-world use case: Ads CTR prediction with billions of parameters; we will share the experiences and lessons we learned:
  – https://github.com/yanboliang/spark-vlbfgs

Page 42: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Key takeaways

• Fully distributed computation and a fully distributed model; training runs successfully without OOM.
• The VL-BFGS API is consistent with Breeze L-BFGS.
• VL-BFGS outputs exactly the same solution as Breeze L-BFGS.
• Does not require special components such as parameter servers.
• A pure library that can be deployed and used easily.
• Benefits from existing Spark cluster operation and development experience.
• Lets you stay on the same platform as the data size increases.

Page 43: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Reference

• https://papers.nips.cc/paper/5333-large-scale-l-bfgs-using-mapreduce

• https://github.com/mengxr/spark-vl-bfgs
• https://spark-summit.org/2015/events/large-scale-lasso-and-elastic-net-regularized-generalized-linear-models/

Page 44: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang

Thank You.

Yanbo Liang