
Towards an Optimizer for MLbase

Ameet Talwalkar
UC Berkeley

Collaborators: Evan Sparks, Michael Franklin, Michael I. Jordan, Tim Kraska

Problem: Scalable implementations difficult for ML Developers...

[Architecture diagram: a User submits a Declarative ML Task and ML Developers contribute ML Contracts + Code; the Master Server's Parser, Optimizer, and Executor/Monitoring components (using Meta-Data, Statistics, an LLP, and a PLP) drive the ML Library on DMX Runtimes across the Master and Slave nodes, returning a result (e.g., fn-model & summary).]


[Slide: excerpt from the MATLAB product page]

MATLAB®: The Language of Technical Computing

MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.

You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. More than a million engineers and scientists in industry and academia use MATLAB, the language of technical computing.

Analyzing and visualizing data using the MATLAB desktop. The MATLAB environment also lets you write programs and develop algorithms and applications.



CHALLENGE: Can we simplify distributed ML development?

Problem: ML is difficult for End Users...

Too many ways to preprocess...
Too many algorithms...
Too many knobs...
Difficult to debug...
Doesn't scale...

CHALLENGE: Can we automate ML pipeline construction?

MLbase

MLbase aims to simplify development and deployment of ML pipelines.

[Stack diagram: MLOpt, MLI, MLlib, Apache Spark]

Spark: Cluster computing system designed for iterative computation
MLlib: Spark's core ML library
MLI: API to simplify ML development
MLOpt: Declarative layer that aims to automate ML pipeline construction via search over feature extractors and models

MLOpt and MLI are experimental testbeds.

Outline: Vision, MLlib and MLI, MLOpt

MLlib
+ Scalable and fast
+ Simple development environment
+ Part of Spark's robust ecosystem

[Spark stack diagram: Apache Spark with Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph)]

Active Development

Initial Release
• Developed by MLbase team in AMPLab (11 contributors)
• Scala, Java
• Shipped with Spark v0.8 (Sep 2013)

11 months later...
• 55+ contributors from various organizations
• Scala, Java, Python
• Improved documentation / code examples, API stability
• Latest release part of Spark v1.0 (May 2014)

Algorithms in v0.8
• classification: logistic regression, linear support vector machines (SVM)
• regression: linear regression
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD)

Algorithms in v1.0
• classification: logistic regression, linear support vector machines (SVM), naive Bayes, decision trees
• regression: linear regression, regression trees
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
• dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
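As a usage sketch for the algorithms listed above, here is what training one of the v1.0 classifiers looks like from Scala. This is a minimal sketch assuming a Spark 1.0-era MLlib deployment; the input path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

object LogisticRegressionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mllib-logreg"))

    // Load a LIBSVM-format dataset as an RDD[LabeledPoint]; the path is a placeholder.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/data.libsvm")

    // Train logistic regression with stochastic gradient descent (SGD).
    val model = LogisticRegressionWithSGD.train(data, 100)

    // Report the fraction of training points the model misclassifies.
    val errors = data.filter(p => model.predict(p.features) != p.label).count()
    println(s"Training error: ${errors.toDouble / data.count()}")

    sc.stop()
  }
}
```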

MLlib, MLI and Roadmap

• MLI: Shield ML developers from low-level details
  • Provide familiar mathematical operators in a distributed setting (tables, matrices, optimization primitives)
  • Standard APIs defining ML algorithms and feature extractors
• Many of these ideas are (or soon will be) in MLlib
• Next release of Spark and MLlib being tested now
  • statistical toolbox, Python decision tree API, online logistic regression, ...
• Longer term
  • Scalable implementations of standard ML algorithms and underlying optimization primitives
  • Support for ML pipeline development (related to MLOpt)
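To make the "standard APIs defining ML algorithms and feature extractors" idea concrete, here is a hypothetical pair of Scala interfaces; the trait names are illustrative and are not MLI's actual API.

```scala
import org.apache.spark.rdd.RDD

// Illustrative interfaces only: a feature extractor maps raw records to feature
// vectors, an algorithm fits a model on (label, features) pairs, and a model scores
// a single feature vector. MLI's real abstractions (tables, matrices, optimization
// primitives) are richer than this sketch.
trait FeatureExtractor[R] {
  def extract(raw: RDD[R]): RDD[Array[Double]]
}

trait Model {
  def predict(features: Array[Double]): Double
}

trait Algorithm[M <: Model] {
  def fit(data: RDD[(Double, Array[Double])]): M
}
```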

Feedback and Contributions Encouraged!

Outline: Vision, MLlib and MLI, MLOpt

Grand Vision

✦ User declaratively specifies a task
✦ Search through MLlib/MLI to find the best model/pipeline

[Analogy diagram: SQL query → Result, as 'MQL' query → Model]
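A rough Scala sketch of what "declaratively specifies a task" means in practice: the user asks for a classifier and gets back a model function plus a summary, while the system does the search. The names `doClassify` and `Summary` are hypothetical, not an actual MLbase API.

```scala
// Hypothetical declarative entry point: the caller supplies labeled data and the
// system searches over featurizers, algorithms, and hyperparameters on its behalf.
object DeclarativeSketch {
  case class Summary(modelFamily: String, validationError: Double)

  def doClassify(labels: Seq[Double], features: Seq[Array[Double]]): (Array[Double] => Double, Summary) = {
    // Placeholder search: a real optimizer would enumerate and evaluate candidates here.
    val model: Array[Double] => Double = _ => labels.groupBy(identity).maxBy(_._2.size)._1
    (model, Summary(modelFamily = "majority-class baseline", validationError = Double.NaN))
  }
}
```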

A Standard ML Pipeline

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines

Training A Model

✦ For each point in dataset
  ✦ compute gradient
  ✦ update model
  ✦ repeat until converged
✦ Requires multiple passes
✦ Common access pattern
  ✦ Naive Bayes, Trees, etc.
✦ Minutes to train an SVM on 200GB of data on a 16-node cluster
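The access pattern described above (repeated passes, per-point gradient, in-place model update) can be sketched in a few lines of plain Scala; this is a local illustration with logistic loss, not MLlib's distributed implementation.

```scala
// Local sketch of the training loop: for each pass, for each point, compute the
// gradient of the logistic loss and update the model in place.
object SgdSketch {
  case class Point(label: Double, features: Array[Double])   // label in {-1.0, +1.0}

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  def train(data: Seq[Point], passes: Int, stepSize: Double): Array[Double] = {
    val w = Array.fill(data.head.features.length)(0.0)
    for (_ <- 1 to passes; p <- data) {                       // "repeat until converged" (fixed passes here)
      val scale = -p.label / (1.0 + math.exp(p.label * dot(w, p.features)))  // logistic-loss gradient factor
      for (i <- w.indices) w(i) -= stepSize * scale * p.features(i)
    }
    w
  }
}
```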

The Tricky Part

✦ Algorithms
  ✦ Logistic Regression, SVM, Tree-based, etc.
✦ Algorithm hyper-parameters
  ✦ Learning Rate, Regularization, etc.
✦ Featurization
  ✦ Text: n-grams, TF-IDF
  ✦ Images: Gabor filters, random convolutions
  ✦ Random projection? Scaling?

A Standard ML Pipeline

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
✦ Start with one aspect of the pipeline: model selection

Automated Model Selection

One Approach

✦ Try it all!
  ✦ Search over all hyperparameters, algorithms, features, etc.

[Illustration: a grid over Learning Rate and Regularization settings, each trained, with the best answer highlighted]

✦ Drawbacks
  ✦ Expensive to compute models
  ✦ Hyperparameter space is large
✦ Some version of this still often done in practice!
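A minimal sketch of the "try it all" approach over the two hyperparameters in the illustration (learning rate and regularization). The error surface is synthetic; a real run would train and validate a model at each grid point.

```scala
// Exhaustive grid search: evaluate every (step size, regularization) pair and keep
// the setting with the lowest validation error.
object GridSearchSketch {
  // Synthetic stand-in for training + validation; a real implementation would fit a
  // model (e.g., with MLlib) and score it on held-out data.
  def trainAndValidate(stepSize: Double, regParam: Double): Double =
    math.pow(math.log10(stepSize) + 1, 2) + math.pow(regParam - 0.1, 2)

  def main(args: Array[String]): Unit = {
    val stepSizes = Seq(0.001, 0.01, 0.1, 1.0)
    val regParams = Seq(0.0, 0.01, 0.1, 1.0)

    val results = for (s <- stepSizes; r <- regParams) yield ((s, r), trainAndValidate(s, r))
    val ((bestStep, bestReg), bestErr) = results.minBy(_._2)
    println(s"best: stepSize=$bestStep, regParam=$bestReg, validationError=$bestErr")
  }
}
```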

A Better Approach

✦ Better resource utilization
  ✦ through batching
✦ Algorithmic speedups
  ✦ via early stopping
✦ Improved search
  ✦ e.g., via randomization

[Illustration: Learning Rate / Regularization search space with the best answer highlighted]

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

Better Resource Utilization

✦ Modern memory is slower than processors
  ✦ Can read: 0.6 billion doubles/sec/core (4.8 GB/s)
  ✦ Can compute: 15 billion flops/sec/core
✦ We can do 25 flops per double read (15B flops / 0.6B doubles = 25)

What Does This Mean For Modeling?

✦ Typical model update requires 2-4 flops/double
  ✦ recall: 25 flops per double read
✦ Can do 7-10 model updates per double we read
  ✦ Assuming that models fit in cache
✦ Train multiple models simultaneously

[Illustration: one scan over a small data table feeding updates to Model 1 and Model 2 at the same time]
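A local sketch of the batching idea above: each data point read from memory updates several models (one per hyperparameter setting), so the expensive read is amortized across many cheap updates. Plain Scala for illustration.

```scala
// Train several models in one scan over the data: one read of each point drives an
// update of every model, which stays cheap as long as the models fit in cache.
object BatchedSgdSketch {
  case class Point(label: Double, features: Array[Double])   // label in {-1.0, +1.0}

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  def trainBatched(data: Seq[Point], stepSizes: Seq[Double], passes: Int): Seq[Array[Double]] = {
    val dim = data.head.features.length
    val models = stepSizes.map(_ => Array.fill(dim)(0.0))
    for (_ <- 1 to passes; p <- data) {
      for ((w, step) <- models.zip(stepSizes)) {              // every model updated per point read
        val scale = -p.label / (1.0 + math.exp(p.label * dot(w, p.features)))
        for (i <- w.indices) w(i) -= step * scale * p.features(i)
      }
    }
    models
  }
}
```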

What Do We See In Spark?

✦ 2x and 5x increase in models trained/sec with batching
  ✦ Overhead from virtualization, network, etc.
✦ These numbers are with vector-matrix multiplies
✦ Can do better when rewriting in terms of matrix-matrix multiplies

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

Algorithmic Speedups

✦ Each point in hyper-parameter space represents a trained model
✦ Sometimes we see early on that a model is no good
✦ So we stop early: give up on models that are not progressing

[Illustration: Learning Rate / Regularization search space with the best answer highlighted and poorly progressing models abandoned early]
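A sketch of the early-stopping idea: check a configuration's validation error every few passes and abandon it if it is clearly behind the best result seen so far. The pass/threshold logic and the two hook functions are illustrative placeholders.

```scala
// Give up on hyperparameter settings that are not progressing, rather than training
// every candidate to completion. `runOnePass` and `validationError` are placeholders
// for a real training pass and a real held-out evaluation.
object EarlyStoppingSketch {
  type Model = Array[Double]

  def runOnePass(model: Model): Model = model                 // placeholder: one SGD pass
  def validationError(model: Model): Double = 0.5             // placeholder: score on held-out data

  def trainWithEarlyStopping(init: Model, maxPasses: Int, bestErrorSoFar: Double): Option[Model] = {
    var model = init
    for (pass <- 1 to maxPasses) {
      model = runOnePass(model)
      // After a warm-up period, abandon configurations far behind the current best.
      if (pass >= 3 && validationError(model) > 1.5 * bestErrorSoFar) return None
    }
    Some(model)
  }
}
```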

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

What Search Method?

✦ Various derivative-free optimization techniques
  ✦ Simple ones (Grid, Random)
  ✦ Classic derivative-free (Nelder-Mead, Powell's method)
  ✦ Bayesian (SMAC, TPE, Spearmint)
✦ What should we do?
  ✦ Tried on 5 datasets, optimized over 4 hyperparameters!

[Figure: Comparison of Search Methods Across Learning Problems. Validation error (0.0 to 0.5) vs. maximum calls (16, 81, 256, 625) for GRID, NELDER_MEAD, POWELL, RANDOM, SMAC, SPEARMINT, and TPE on the australian, breast, diabetes, fourclass, and splice datasets.]
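For comparison with the grid-search sketch earlier, here is random search, one of the "simple" methods in the list above: sample hyperparameter settings at random (learning rate on a log scale) instead of walking a fixed grid. The error surface is the same synthetic stand-in.

```scala
import scala.util.Random

// Random search: draw hyperparameter settings at random and keep the best one found
// within a fixed budget of model trainings.
object RandomSearchSketch {
  // Synthetic stand-in for training + validation, as in the grid-search sketch.
  def trainAndValidate(stepSize: Double, regParam: Double): Double =
    math.pow(math.log10(stepSize) + 1, 2) + math.pow(regParam - 0.1, 2)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val trials = Seq.fill(16) {
      val stepSize = math.pow(10, -3 + 3 * rng.nextDouble())  // log-uniform in [1e-3, 1]
      val regParam = rng.nextDouble()                          // uniform in [0, 1]
      ((stepSize, regParam), trainAndValidate(stepSize, regParam))
    }
    val ((s, r), err) = trials.minBy(_._2)
    println(f"best: stepSize=$s%.4f, regParam=$r%.4f, validationError=$err%.4f")
  }
}
```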

Putting It All Together

✦ First version of MLbase optimizer
✦ 30GB dense images (240K x 16K)
✦ 2 model families, 5 hyperparams
✦ Baseline: grid search
✦ Our method: combination of
  ✦ Batching
  ✦ Early stopping
  ✦ Random or TPE

[Figure: Model Convergence Over Time. Best validation error seen so far vs. time elapsed (minutes, 0 to 800) for three search methods: Grid (unoptimized), Random (optimized), and TPE (optimized).]

20x speedup compared to grid search: 15 minutes vs 5 hours!

Does It Scale?

✦ 1.5TB dataset (1.2M x 160K)
✦ 128 nodes, thousands of passes over data
✦ Tried 32 models in 15 hours
✦ Good answer after 11 hours

[Figure: Convergence of Model Accuracy on 1.5TB Dataset. Best validation error seen so far vs. time elapsed (hours).]

Future Work

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

Automated Model Selection
Automated ML Pipeline Construction

A Real Pipeline for Image Classification
Inspired by Coates & Ng, 2012

[Pipeline diagram with stages: Data, Image Parser, Normalizer, Patch Extractor, Patch Whitener, KMeans Clusterer, Convolver, Symmetric Rectifier, Pooler, Global Pooling, Zipper, Feature Extractor, Label Extractor, Linear Solver, Linear Mapper, Model, plus Test Data, an Error Computer, and Test Error; some stages carry options such as sqrt/mean and ident/abs/mean. Stages are marked as having no hyperparameters, a few hyperparameters, or lots of hyperparameters.]

Slide courtesy of Evan Sparks
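A rough sketch of how stages like those in the diagram might be expressed as composable nodes. The `Node` trait and the toy stages are hypothetical illustrations, not the actual interfaces used in this pipeline.

```scala
// Illustrative composable pipeline stages: each node transforms its input, and
// `andThen` chains nodes left to right, mirroring the diagram's flow.
trait Node[A, B] { self =>
  def apply(in: A): B
  def andThen[C](next: Node[B, C]): Node[A, C] = new Node[A, C] {
    def apply(in: A): C = next(self(in))
  }
}

object PipelineSketch {
  // Toy stages over a flattened image represented as Array[Double].
  val normalizer: Node[Array[Double], Array[Double]] = new Node[Array[Double], Array[Double]] {
    def apply(img: Array[Double]): Array[Double] = {
      val mean = img.sum / img.length
      img.map(_ - mean)                                       // subtract the mean pixel value
    }
  }
  val rectifier: Node[Array[Double], Array[Double]] = new Node[Array[Double], Array[Double]] {
    def apply(img: Array[Double]): Array[Double] = img.map(math.abs)   // toy symmetric rectifier
  }

  // Feature extraction as a composition of stages.
  val featurizer: Node[Array[Double], Array[Double]] = normalizer.andThen(rectifier)
}
```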

Other Future Work

✦ Ensembling
✦ Leverage sampling
✦ Better parallelism for smaller datasets
✦ Multiple hypothesis testing issues

MLOpt: Declarative layer that aims to automate ML pipeline construction
MLlib: Spark's core ML library
MLI: API to simplify ML development
Spark: Cluster computing system designed for iterative computation

THANKS! QUESTIONS?


www.mlbase.org
