
Towards an Optimizer for MLbase

Ameet Talwalkar
UC Berkeley

Collaborators: Evan Sparks, Michael Franklin, Michael I. Jordan, Tim Kraska

Problem: Scalable implementations difficult for ML Developers...

[Architecture diagram: a User submits a Declarative ML Task and ML Developers contribute ML Contracts + Code; the Master Server's Parser, Optimizer, and Executor/Monitoring components (using Meta-Data, Statistics, an LLP, and a PLP) drive the ML Library on DMX Runtimes across the Master and Slave nodes, returning a result (e.g., fn-model & summary).]


[Slide: excerpt from the MATLAB product page]

MATLAB®: The Language of Technical Computing

MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.

You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. More than a million engineers and scientists in industry and academia use MATLAB, the language of technical computing.

Analyzing and visualizing data using the MATLAB desktop. The MATLAB environment also lets you write programs and develop algorithms and applications.



CHALLENGE: Can we simplify distributed ML development?

Problem: ML is difficult for End Users...

Too many ways to preprocess...
Too many algorithms...
Too many knobs...
Difficult to debug...
Doesn't scale...

CHALLENGE: Can we automate ML pipeline construction?

MLbase

MLbase aims to simplify development and deployment of ML pipelines.

[Stack diagram: MLOpt, MLI, MLlib, Apache Spark]

Spark: Cluster computing system designed for iterative computation
MLlib: Spark's core ML library
MLI: API to simplify ML development
MLOpt: Declarative layer that aims to automate ML pipeline construction via search over feature extractors and models

MLOpt and MLI are experimental testbeds.

Outline: Vision, MLlib and MLI, MLOpt

MLlib
+ Scalable and fast
+ Simple development environment
+ Part of Spark's robust ecosystem

[Spark stack diagram: Apache Spark with Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph)]

Active Development

Initial Release
• Developed by MLbase team in AMPLab (11 contributors)
• Scala, Java
• Shipped with Spark v0.8 (Sep 2013)

11 months later...
• 55+ contributors from various organizations
• Scala, Java, Python
• Improved documentation / code examples, API stability
• Latest release part of Spark v1.0 (May 2014)

Algorithms in v0.8
• classification: logistic regression, linear support vector machines (SVM)
• regression: linear regression
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD)

Algorithms in v1.0
• classification: logistic regression, linear support vector machines (SVM), naive Bayes, decision trees
• regression: linear regression, regression trees
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
• dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
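As a usage sketch for the algorithms listed above, here is what training one of the v1.0 classifiers looks like from Scala. This is a minimal sketch assuming a Spark 1.0-era MLlib deployment; the input path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

object LogisticRegressionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mllib-logreg"))

    // Load a LIBSVM-format dataset as an RDD[LabeledPoint]; the path is a placeholder.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/data.libsvm")

    // Train logistic regression with stochastic gradient descent (SGD).
    val model = LogisticRegressionWithSGD.train(data, 100)

    // Report the fraction of training points the model misclassifies.
    val errors = data.filter(p => model.predict(p.features) != p.label).count()
    println(s"Training error: ${errors.toDouble / data.count()}")

    sc.stop()
  }
}
```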

MLlib, MLI and Roadmap

• MLI: Shield ML developers from low-level details
  • Provide familiar mathematical operators in a distributed setting (tables, matrices, optimization primitives)
  • Standard APIs defining ML algorithms and feature extractors
• Many of these ideas are (or soon will be) in MLlib
• Next release of Spark and MLlib being tested now
  • statistical toolbox, Python decision tree API, online logistic regression, ...
• Longer term
  • Scalable implementations of standard ML algorithms and underlying optimization primitives
  • Support for ML pipeline development (related to MLOpt)
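To make the "standard APIs defining ML algorithms and feature extractors" idea concrete, here is a hypothetical pair of Scala interfaces; the trait names are illustrative and are not MLI's actual API.

```scala
import org.apache.spark.rdd.RDD

// Illustrative interfaces only: a feature extractor maps raw records to feature
// vectors, an algorithm fits a model on (label, features) pairs, and a model scores
// a single feature vector. MLI's real abstractions (tables, matrices, optimization
// primitives) are richer than this sketch.
trait FeatureExtractor[R] {
  def extract(raw: RDD[R]): RDD[Array[Double]]
}

trait Model {
  def predict(features: Array[Double]): Double
}

trait Algorithm[M <: Model] {
  def fit(data: RDD[(Double, Array[Double])]): M
}
```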

Feedback and Contributions Encouraged!

Outline: Vision, MLlib and MLI, MLOpt

Grand Vision

✦ User declaratively specifies a task
✦ Search through MLlib/MLI to find the best model/pipeline

[Analogy diagram: SQL query → Result, as 'MQL' query → Model]
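A rough Scala sketch of what "declaratively specifies a task" means in practice: the user asks for a classifier and gets back a model function plus a summary, while the system does the search. The names `doClassify` and `Summary` are hypothetical, not an actual MLbase API.

```scala
// Hypothetical declarative entry point: the caller supplies labeled data and the
// system searches over featurizers, algorithms, and hyperparameters on its behalf.
object DeclarativeSketch {
  case class Summary(modelFamily: String, validationError: Double)

  def doClassify(labels: Seq[Double], features: Seq[Array[Double]]): (Array[Double] => Double, Summary) = {
    // Placeholder search: a real optimizer would enumerate and evaluate candidates here.
    val model: Array[Double] => Double = _ => labels.groupBy(identity).maxBy(_._2.size)._1
    (model, Summary(modelFamily = "majority-class baseline", validationError = Double.NaN))
  }
}
```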

A Standard ML Pipeline

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines

Training A Model

✦ For each point in dataset
  ✦ compute gradient
  ✦ update model
  ✦ repeat until converged
✦ Requires multiple passes
✦ Common access pattern
  ✦ Naive Bayes, Trees, etc.
✦ Minutes to train an SVM on 200GB of data on a 16-node cluster
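The access pattern described above (repeated passes, per-point gradient, in-place model update) can be sketched in a few lines of plain Scala; this is a local illustration with logistic loss, not MLlib's distributed implementation.

```scala
// Local sketch of the training loop: for each pass, for each point, compute the
// gradient of the logistic loss and update the model in place.
object SgdSketch {
  case class Point(label: Double, features: Array[Double])   // label in {-1.0, +1.0}

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  def train(data: Seq[Point], passes: Int, stepSize: Double): Array[Double] = {
    val w = Array.fill(data.head.features.length)(0.0)
    for (_ <- 1 to passes; p <- data) {                       // "repeat until converged" (fixed passes here)
      val scale = -p.label / (1.0 + math.exp(p.label * dot(w, p.features)))  // logistic-loss gradient factor
      for (i <- w.indices) w(i) -= stepSize * scale * p.features(i)
    }
    w
  }
}
```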

The Tricky Part

✦ Algorithms
  ✦ Logistic Regression, SVM, Tree-based, etc.
✦ Algorithm hyper-parameters
  ✦ Learning Rate, Regularization, etc.
✦ Featurization
  ✦ Text: n-grams, TF-IDF
  ✦ Images: Gabor filters, random convolutions
  ✦ Random projection? Scaling?

A Standard ML Pipeline

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
✦ Start with one aspect of the pipeline: model selection

Automated Model Selection

One Approach

✦ Try it all!
  ✦ Search over all hyperparameters, algorithms, features, etc.

[Illustration: a grid over Learning Rate and Regularization settings, each trained, with the best answer highlighted]

✦ Drawbacks
  ✦ Expensive to compute models
  ✦ Hyperparameter space is large
✦ Some version of this still often done in practice!
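A minimal sketch of the "try it all" approach over the two hyperparameters in the illustration (learning rate and regularization). The error surface is synthetic; a real run would train and validate a model at each grid point.

```scala
// Exhaustive grid search: evaluate every (step size, regularization) pair and keep
// the setting with the lowest validation error.
object GridSearchSketch {
  // Synthetic stand-in for training + validation; a real implementation would fit a
  // model (e.g., with MLlib) and score it on held-out data.
  def trainAndValidate(stepSize: Double, regParam: Double): Double =
    math.pow(math.log10(stepSize) + 1, 2) + math.pow(regParam - 0.1, 2)

  def main(args: Array[String]): Unit = {
    val stepSizes = Seq(0.001, 0.01, 0.1, 1.0)
    val regParams = Seq(0.0, 0.01, 0.1, 1.0)

    val results = for (s <- stepSizes; r <- regParams) yield ((s, r), trainAndValidate(s, r))
    val ((bestStep, bestReg), bestErr) = results.minBy(_._2)
    println(s"best: stepSize=$bestStep, regParam=$bestReg, validationError=$bestErr")
  }
}
```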

A Better Approach

✦ Better resource utilization
  ✦ through batching
✦ Algorithmic speedups
  ✦ via early stopping
✦ Improved search
  ✦ e.g., via randomization

[Illustration: Learning Rate / Regularization search space with the best answer highlighted]

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

Better Resource Utilization

✦ Modern memory is slower than processors
  ✦ Can read: 0.6 billion doubles/sec/core (4.8 GB/s)
  ✦ Can compute: 15 billion flops/sec/core
✦ We can do 25 flops per double read (15B flops / 0.6B doubles = 25)

What Does This Mean For Modeling?

✦ Typical model update requires 2-4 flops/double
  ✦ recall: 25 flops per double read
✦ Can do 7-10 model updates per double we read
  ✦ Assuming that models fit in cache
✦ Train multiple models simultaneously

[Illustration: one scan over a small data table feeding updates to Model 1 and Model 2 at the same time]
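A local sketch of the batching idea above: each data point read from memory updates several models (one per hyperparameter setting), so the expensive read is amortized across many cheap updates. Plain Scala for illustration.

```scala
// Train several models in one scan over the data: one read of each point drives an
// update of every model, which stays cheap as long as the models fit in cache.
object BatchedSgdSketch {
  case class Point(label: Double, features: Array[Double])   // label in {-1.0, +1.0}

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  def trainBatched(data: Seq[Point], stepSizes: Seq[Double], passes: Int): Seq[Array[Double]] = {
    val dim = data.head.features.length
    val models = stepSizes.map(_ => Array.fill(dim)(0.0))
    for (_ <- 1 to passes; p <- data) {
      for ((w, step) <- models.zip(stepSizes)) {              // every model updated per point read
        val scale = -p.label / (1.0 + math.exp(p.label * dot(w, p.features)))
        for (i <- w.indices) w(i) -= step * scale * p.features(i)
      }
    }
    models
  }
}
```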

What Do We See In Spark?

✦ 2x and 5x increase in models trained/sec with batching
  ✦ Overhead from virtualization, network, etc.
✦ These numbers are with vector-matrix multiplies
✦ Can do better when rewriting in terms of matrix-matrix multiplies

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

Algorithmic Speedups

✦ Each point in hyper-parameter space represents a trained model
✦ Sometimes we see early on that a model is no good
✦ So we stop early: give up on models that are not progressing

[Illustration: Learning Rate / Regularization search space with the best answer highlighted and poorly progressing models abandoned early]
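A sketch of the early-stopping idea: check a configuration's validation error every few passes and abandon it if it is clearly behind the best result seen so far. The pass/threshold logic and the two hook functions are illustrative placeholders.

```scala
// Give up on hyperparameter settings that are not progressing, rather than training
// every candidate to completion. `runOnePass` and `validationError` are placeholders
// for a real training pass and a real held-out evaluation.
object EarlyStoppingSketch {
  type Model = Array[Double]

  def runOnePass(model: Model): Model = model                 // placeholder: one SGD pass
  def validationError(model: Model): Double = 0.5             // placeholder: score on held-out data

  def trainWithEarlyStopping(init: Model, maxPasses: Int, bestErrorSoFar: Double): Option[Model] = {
    var model = init
    for (pass <- 1 to maxPasses) {
      model = runOnePass(model)
      // After a warm-up period, abandon configurations far behind the current best.
      if (pass >= 3 && validationError(model) > 1.5 * bestErrorSoFar) return None
    }
    Some(model)
  }
}
```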

A Tale Of 3 Optimizations

Better Resource Utilization
Algorithmic Speedups
Improved Search

What Search Method?

✦ Various derivative-free optimization techniques
  ✦ Simple ones (Grid, Random)
  ✦ Classic derivative-free (Nelder-Mead, Powell's method)
  ✦ Bayesian (SMAC, TPE, Spearmint)
✦ What should we do?
  ✦ Tried on 5 datasets, optimized over 4 hyperparameters!

[Figure: Comparison of Search Methods Across Learning Problems. Validation error (0.0 to 0.5) vs. maximum calls (16, 81, 256, 625) for GRID, NELDER_MEAD, POWELL, RANDOM, SMAC, SPEARMINT, and TPE on the australian, breast, diabetes, fourclass, and splice datasets.]
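For comparison with the grid-search sketch earlier, here is random search, one of the "simple" methods in the list above: sample hyperparameter settings at random (learning rate on a log scale) instead of walking a fixed grid. The error surface is the same synthetic stand-in.

```scala
import scala.util.Random

// Random search: draw hyperparameter settings at random and keep the best one found
// within a fixed budget of model trainings.
object RandomSearchSketch {
  // Synthetic stand-in for training + validation, as in the grid-search sketch.
  def trainAndValidate(stepSize: Double, regParam: Double): Double =
    math.pow(math.log10(stepSize) + 1, 2) + math.pow(regParam - 0.1, 2)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val trials = Seq.fill(16) {
      val stepSize = math.pow(10, -3 + 3 * rng.nextDouble())  // log-uniform in [1e-3, 1]
      val regParam = rng.nextDouble()                          // uniform in [0, 1]
      ((stepSize, regParam), trainAndValidate(stepSize, regParam))
    }
    val ((s, r), err) = trials.minBy(_._2)
    println(f"best: stepSize=$s%.4f, regParam=$r%.4f, validationError=$err%.4f")
  }
}
```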

Putting It All Together

✦ First version of MLbase optimizer
✦ 30GB dense images (240K x 16K)
✦ 2 model families, 5 hyperparams
✦ Baseline: grid search
✦ Our method: combination of
  ✦ Batching
  ✦ Early stopping
  ✦ Random or TPE

[Figure: Model Convergence Over Time. Best validation error seen so far vs. time elapsed (minutes, 0 to 800) for three search methods: Grid (unoptimized), Random (optimized), and TPE (optimized).]

20x speedup compared to grid search: 15 minutes vs 5 hours!

Does It Scale?

✦ 1.5TB dataset (1.2M x 160K)
✦ 128 nodes, thousands of passes over data
✦ Tried 32 models in 15 hours
✦ Good answer after 11 hours

[Figure: Convergence of Model Accuracy on 1.5TB Dataset. Best validation error seen so far vs. time elapsed (hours).]

Future Work

[Pipeline diagram: Data → Feature Extraction → Model Training → Final Model]

Automated Model Selection
Automated ML Pipeline Construction

A Real Pipeline for Image Classification
Inspired by Coates & Ng, 2012

[Pipeline diagram with stages: Data, Image Parser, Normalizer, Patch Extractor, Patch Whitener, KMeans Clusterer, Convolver, Symmetric Rectifier, Pooler, Global Pooling, Zipper, Feature Extractor, Label Extractor, Linear Solver, Linear Mapper, Model, plus Test Data, an Error Computer, and Test Error; some stages carry options such as sqrt/mean and ident/abs/mean. Stages are marked as having no hyperparameters, a few hyperparameters, or lots of hyperparameters.]

Slide courtesy of Evan Sparks
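A rough sketch of how stages like those in the diagram might be expressed as composable nodes. The `Node` trait and the toy stages are hypothetical illustrations, not the actual interfaces used in this pipeline.

```scala
// Illustrative composable pipeline stages: each node transforms its input, and
// `andThen` chains nodes left to right, mirroring the diagram's flow.
trait Node[A, B] { self =>
  def apply(in: A): B
  def andThen[C](next: Node[B, C]): Node[A, C] = new Node[A, C] {
    def apply(in: A): C = next(self(in))
  }
}

object PipelineSketch {
  // Toy stages over a flattened image represented as Array[Double].
  val normalizer: Node[Array[Double], Array[Double]] = new Node[Array[Double], Array[Double]] {
    def apply(img: Array[Double]): Array[Double] = {
      val mean = img.sum / img.length
      img.map(_ - mean)                                       // subtract the mean pixel value
    }
  }
  val rectifier: Node[Array[Double], Array[Double]] = new Node[Array[Double], Array[Double]] {
    def apply(img: Array[Double]): Array[Double] = img.map(math.abs)   // toy symmetric rectifier
  }

  // Feature extraction as a composition of stages.
  val featurizer: Node[Array[Double], Array[Double]] = normalizer.andThen(rectifier)
}
```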

Other Future Work

✦ Ensembling
✦ Leverage sampling
✦ Better parallelism for smaller datasets
✦ Multiple hypothesis testing issues

MLOpt: Declarative layer that aims to automate ML pipeline construction
MLlib: Spark's core ML library
MLI: API to simplify ML development
Spark: Cluster computing system designed for iterative computation

THANKS! QUESTIONS?


www.mlbase.org
