Transcript of Rich Caruana's talk - IJCLab Events Directory (Indico)
75% of Machine Learning is preparing to do machine learning
and 15% is what you do afterwards…
most ML research is about the middle 10%
Goals For This Talk
•Foster research on the complete ML pipeline
•Describe a few open problems in AutoML
•Suggest future challenge/competition problems
•Ultimate goal is to make the practice of ML more reliable so you don't need a Ph.D. in ML + 10 years experience to do ML well
•How/Where do we start?
•Start by looking at the difference between ML in the lab and ML in the field
UC-Irvine/CIFAR vs. Real-World
UC-Irvine/CIFAR:
•Download data
•No collection, cleaning, …
•Know how well others did
•Metric(s) pre-defined
•Change algs, params, and coding until doing well on metric
•Sometimes add data
Real-World:
•Problem undefined
•Don't know how well you can do
•Add new features and data feeds
•Clean, clean, clean
•Most effort goes into the data!
•Coding of data is critical
•Choose practical algorithms
•Debug, debug, debug
•Wash, rinse, repeat
•month after month after month!
Machine Learning (Research) Pipeline
•Modify Learning Algorithm
•Up-Sample Minority Class
•Feature Selection
•Re-Code Features
•Tweak Hyper-Parameters
•Deal With Missing Values
•Check Calibration
•Discover Leakage
Machine Learning (Engineering) Pipeline
•By real engineers, teams of engineers, …
•On real data, to real metrics, …
•On schedule, on budget, …
•Must be maintainable, repeatable, documentable, …
Goals For This Talk
•Foster research on the complete ML pipeline
•Describe a few open problems in AutoML
•Suggest future challenge/competition problems
•So let's just jump in…
Importance of Hyper-Parameter Optimization
•Hyper-Parameter Optimization is the most mature subarea in AutoML
•Manual heuristic search: surprisingly sub-optimal
•Grid search: effective with small number of parameters
•Random search: better than grid with larger number of parameters
•Bayesian Optimization: better than random with very large # of parameters
•…
•With modern algorithms (boosting, deep neural nets, …) parameter optimization is much more critical than you might think…
•… because modern high-flying algorithms are all low-bias, high-variance
•How many people here use automatic hyper-parameter optimization?
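The contrast between grid and random search can be illustrated on a toy problem. A minimal sketch (the loss surface and the parameter names `lr` and `reg` are invented for illustration, not from the talk) comparing the two strategies at the same evaluation budget:

```python
import random

def loss(lr, reg):
    """Toy loss surface (hypothetical): minimum at lr=0.1, reg=0.01."""
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

def grid_search(n_per_axis):
    # n_per_axis evenly spaced values per axis -> n_per_axis**2 evaluations,
    # but only n_per_axis distinct values are ever tried on each axis.
    pts = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return min(loss(lr, reg) for lr in pts for reg in pts)

def random_search(n_trials, seed=0):
    # Same budget, points drawn uniformly: every trial probes a fresh
    # value on every axis, which matters when only some parameters matter.
    rng = random.Random(seed)
    return min(loss(rng.random(), rng.random()) for _ in range(n_trials))

best_grid = grid_search(5)     # 25 evaluations on a 5x5 grid
best_rand = random_search(25)  # 25 random evaluations
```

With only 5 grid points per axis the grid can never get closer to the optimum than its spacing allows, which is the usual argument for random search in higher dimensions.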
Importance of Hyper-Parameter Optimization
•Around 2000-2005, some thought supervised learning was done
•Quiz: they thought the best algorithm was:
•Neural Nets?
•Boosting?
•Random Forests?
•SVMs?
Importance of Hyper-Parameter Optimization
✓SVMs (circa 2000-2005)
✓Bing Ranker: FastRank vs. NeuralNet Ranker (circa 2010)
✓Best DNNs on CIFAR-10 and -100 use massive parameter optimization
✓Optimize usual hyper-parameters such as learning rate, initialization, drop-out
✓Optimize hyper-parameters per layer(s)
✓Optimize augmentations
✓Optimize network architecture
✓Our results: +1-4% on DNNs by doing careful Bayesian Optimization
✓TIMIT benefits from careful hyper-parameter optimization
✓Why didn't deep nets get discovered in the mid 90's?
✓Didn't explore the space and hyper-parameters thoroughly enough?
ML Algorithm is an Important Hyper-Parameter

Columns: Threshold Metrics (Acc = Accuracy, F-Sc = F-Score, Lift), Rank/Ordering Metrics (ROC = ROC Area, AvgP = Average Precision, BEP = Break-Even Point), Probability Metrics (SqEr = Squared Error, CrEn = Cross-Entropy, Cal = Calibration), and the row Mean.

Model      Acc    F-Sc   Lift   ROC    AvgP   BEP    SqEr   CrEn   Cal    Mean
BEST       0.928  0.918  0.975  0.987  0.958  0.958  0.919  0.944  0.989  0.953
BST-DT     0.860  0.854  0.956  0.977  0.958  0.952  0.929  0.932  0.808  0.914
RND-FOR    0.866  0.871  0.958  0.977  0.957  0.948  0.892  0.898  0.702  0.897
ANN        0.817  0.875  0.947  0.963  0.926  0.929  0.872  0.878  0.826  0.892
SVM        0.823  0.851  0.928  0.961  0.931  0.929  0.882  0.880  0.769  0.884
BAG-DT     0.836  0.849  0.953  0.972  0.950  0.928  0.875  0.901  0.637  0.878
KNN        0.759  0.820  0.914  0.937  0.893  0.898  0.786  0.805  0.706  0.835
BST-STMP   0.698  0.760  0.898  0.926  0.871  0.854  0.740  0.783  0.678  0.801
DT         0.611  0.771  0.856  0.871  0.789  0.808  0.586  0.625  0.688  0.734
LOG-REG    0.602  0.623  0.829  0.849  0.732  0.714  0.614  0.620  0.678  0.696
NAÏVE-B    0.536  0.615  0.786  0.833  0.733  0.730  0.539  0.565  0.161  0.611
Importance of Hyper-Parameter Optimization
•Hyper-parameter optimization is an example of what AutoML can achieve
•20 years ago selecting hyper-parameters was part of the craft of ML
•Neural nets: number of hidden units, learning rate, momentum, …
•Knowing how to select hyper-parameters is part of what made you an expert
•Now, multiple papers and algorithms for hyper-parameter optimization
•Thriving research community with multiple workshops
•Makes a significant difference in accuracy of trained models
•Open source code
•Need to view other steps in the ML pipeline as new research opportunities
AutoML Tools to Better Understand Data
•Auto variable type determination
•0, 1, 2, 3, 4, 5: nominal, ordinal, integer, continuous?
•Are there dates in fields?
•Is a field a unique identifier or sequence number?
•Auto coding
•Different coding needed for NNs, SVMs, KNN vs. decision tree-based methods
•Auto missing value detector
•0, 1, 2
•-1, 0, +1
•Can't just try everything --- missing variables often cause leakage!
•Auto anomaly detection
•spurious strings, missing entries in table, …
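A crude version of the variable-type determination above can be sketched as follows. The heuristics and thresholds here are assumptions for illustration, not from the talk; a real tool would also probe for dates and domain-specific codes:

```python
def infer_type(values):
    """Heuristic column-type guesser (a sketch; thresholds are assumptions).

    Returns one of: 'identifier', 'nominal', 'integer', 'continuous'.
    """
    n = len(values)
    distinct = set(values)
    # All-distinct integer values suggest a unique ID or sequence number.
    if n > 1 and len(distinct) == n and all(isinstance(v, int) for v in values):
        return "identifier"
    # Few distinct values relative to n suggest a nominal/ordinal code.
    if len(distinct) <= max(2, n // 10):
        return "nominal"
    if all(isinstance(v, int) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) for v in values):
        return "continuous"
    return "nominal"  # fall back: treat strings etc. as categories
```

Note the ambiguity the slide points at: a column of 0s and 1s lands in "nominal" here, but only the downstream coding step can decide whether it is a flag, an ordinal level, or a count.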
First Real Data Set I Worked With (1995)
•Pneumonia data from 1992-1995
•14,199 patients
•< 200 features
•mix of Booleans, categorical, and continuous variables
•missing values
•MAR --- Missing At Random
•missing correlated with target class (caused leakage!)
•Quickly wrote a simple unix utility to help better understand the data
DataDiff
•Automatically recognize changes in data
•changes in DB design, broken sensors, new semantics, new feeds, …
•In real world, DBs and data sources are living, breathing, evolving entities
•Humans make mistakes, forget what they did the 1st time, retire, …
•C-Section 1993-1995 vs. 1996-1998 data (missing values recoded, …)
•30-day Hospital Re-Admission 2011-2013 vs. new 2014 sample
•dDiff is not as trivial as it might seem:
•Density estimation is hard in high dimensions (but this is a special case)
•Don't care about simple drift if the learned model can handle it
•E.g., from 50-50 male-female to 40-60 male-female
•Care most about changes that affect model accuracy or utility
•Warning flags, default to more robust model, auto-retrain/adapt, …
Model Protection Wrappers
•Model trained to predict 30-day re-admission was deployed at a children's hospital
•Real-world:
•world is always changing
•test data often looks very different from train data
•Model should know the data it was trained on and raise red flags when it detects that run-time (test) data looks meaningfully different
•Can make this part of standard practice
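A minimal sketch of such a wrapper, assuming a simple per-feature z-score check against statistics remembered from training (the 4-sigma threshold is an assumption; a real system would use richer density estimates than per-feature means):

```python
import math

class ProtectedModel:
    """Wrapper sketch: remember per-feature training statistics and raise
    a red flag when a runtime input looks unlike the training data."""

    def __init__(self, model, train_rows, sigmas=4.0):
        self.model = model
        self.sigmas = sigmas
        cols = list(zip(*train_rows))
        self.means = [sum(c) / len(c) for c in cols]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                     for c, m in zip(cols, self.means)]

    def predict(self, row):
        # Return the prediction plus indices of out-of-range features.
        flags = [i for i, (v, m, s) in enumerate(zip(row, self.means, self.stds))
                 if abs(v - m) / s > self.sigmas]
        return self.model(row), flags
```

The caller decides what a non-empty flag list triggers: a warning, a fallback to a more robust model, or a retraining request, as the DataDiff slide suggests.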
Feedback --- the Future Curse of ML!
•If you train a model on patient data
•And the model is used to change the practice of medicine (intervention)
•Next time you collect data it is affected by the model…
•… so how do you collect unbiased data the 2nd time around?
•This is a deep, fundamental problem that in some domains is not easy to solve (ethically, or efficiently) --- a problem with non-causal learning
•Could approach this as a dDiff problem that looks not just at input features but at labels and the relationship between inputs and outputs
Post-Processing: Calibrated Probabilities
•Probabilities make complex systems easier to engineer
•Uniform language that is easy to explain/understand
•Consistent from rev to rev (eliminates threshold effects)
•Where do probabilities come from?
•Careful choice of learning algorithm?
•Most learning algorithms do NOT generate good probabilities
•Even the best can often be improved
•Post-Calibration?
Platt Scaling by Fitting a Sigmoid
•Linear scaling of SVM [-∞, +∞] predictions to [0, 1] is bad
•Platt's Method [Platt 1999]:
•scale predictions by fitting a sigmoid on a validation set using 3-fold CV and Bayes-motivated smoothing to avoid overfitting
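A sketch of the fit on held-out scores, using the smoothed targets from Platt's paper. Note this is an illustration, not Platt's implementation: his paper uses a Newton-style optimizer, while this sketch uses plain gradient descent:

```python
import math

def platt_fit(scores, labels, lr=0.01, iters=2000):
    """Fit p = 1 / (1 + exp(A*s + B)) by gradient descent on cross-entropy,
    with Platt's smoothed targets (the Bayes-motivated smoothing)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)  # smoothed target for positives
    t_neg = 1.0 / (n_neg + 2.0)            # smoothed target for negatives
    targets = [t_pos if y else t_neg for y in labels]
    A, B = 0.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for s, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * s + B))
            gA += (t - p) * s   # d(cross-entropy)/dA for this example
            gB += (t - p)       # d(cross-entropy)/dB
        A -= lr * gA / len(scores)
        B -= lr * gB / len(scores)
    return lambda s: 1.0 / (1.0 + math.exp(A * s + B))
```

The smoothed targets keep the sigmoid from collapsing to 0/1 on small validation sets, which is exactly the overfitting the slide warns about.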
Auto-Calibrate
•Not as easy to make bulletproof as you might think
•Depends on sample size
•Depends on data skew
•Depends on ROC
•Probably depends on source model that generated scores in the 1st place
•Try multiple methods and reliably pick best…
•Automatically select sample to be used for post-training calibration?
•Use cross-validation for calibration samples when small data?
•Easy to use tool for automatic calibration would see widespread use
•Current tools require expertise and careful use
•Data mining challenge problem on calibration
•Foster new research on new calibration methods
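The "try multiple methods and reliably pick best" step could be sketched as comparing candidate calibration functions by held-out Brier score (a single held-out split here; a bulletproof tool would cross-validate, as the slide suggests for small data):

```python
def brier(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def pick_calibrator(candidates, held_out_scores, held_out_labels):
    """Return the name of the candidate with the best held-out Brier score.

    candidates: dict mapping name -> already-fit function (score -> probability).
    """
    best_name, best_loss = None, float("inf")
    for name, cal in candidates.items():
        loss = brier([cal(s) for s in held_out_scores], held_out_labels)
        if loss < best_loss:
            best_name, best_loss = name, loss
    return best_name
```

The Brier score is a proper scoring rule, so it rewards both discrimination and calibration; selecting on it is one reasonable, but not the only, notion of "best."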
AutoML Open Problems
•robust attribute typing and coding --- the spec is never right
•dDiff --- because the world never stops changing
•runtime wrappers --- a model has to know its limitations
•feedback cycle detection --- and we never stop changing the world
•auto calibration --- probabilities are good, but not easy to automate
•auto leakage detection --- because data is never good enough
•skewed data expert --- because rare classes are very common
•auto cross-validate --- because cross-validation isn't really as simple as you think
•auto metric selection --- which metrics are sensitive to changes
•auto compression --- make the model as small and fast as possible
•auto transfer --- sometimes transfer helps, sometimes transfer hurts
Leakage and other "accidents"
•50% of data mining competitions have leakage!!!
•exploiting leakage can win data mining competitions
•KDD 2011 best paper award: "Leakage in Data Mining: Formulation, Detection, and Avoidance", Shachar Kaufman, Saharon Rosset, Claudia Perlich, Ori Stitelman
•Pneumonia leakage 1: missing values
•Pneumonia leakage 2: 4k features (AUC = 0.99)
•important to have expectations and know when they are violated
•Automatic leakage detection:
•sequential analysis, missing value analysis, feature analysis, dDiff train to real test, …
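One simple automatic screen in the feature-analysis spirit above: check whether any single feature alone predicts the label almost perfectly, as in the pneumonia example with AUC = 0.99. A minimal sketch (the 0.98 threshold is an assumption):

```python
def auc(scores, labels):
    """ROC area via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leakage_screen(features, labels, threshold=0.98):
    """Flag features that alone nearly perfectly rank the label ---
    a common symptom of leakage. Checks both orientations, since a
    leak can be inversely coded.

    features: dict mapping feature name -> list of numeric values.
    """
    flagged = []
    for name, col in features.items():
        a = auc(col, labels)
        if max(a, 1.0 - a) > threshold:
            flagged.append(name)
    return flagged
```

A flagged feature is not proof of leakage, only a violated expectation: the point of the slide is to know when expectations are violated and let a human investigate.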
Want to do New Research that Gets Cited?
•Pick a part of the ML pipeline that's still largely manual
•Define what it would mean to make it (more) automatic
•Develop and publish methods
•Fully-automatic "robot" that solves the problem
•Assistant that helps a human recognize and solve the problem
•Tools that alert when a problem (probably) exists
•Make data sets publicly available
•Make code available for use as a baseline (and possibly open source)
•Organize a challenge competition on that part of the pipeline
•Good way to pick a thesis topic!
Summary
•AutoML is a growth research area
•Community has neglected 85% of the challenges of doing real ML
•Many independent sub-problems all worthy of attention
•Every time you stub your toe on a real problem => opportunity for new research
•Hyper-Parameter Optimization often critical --- start using it!
•Suggest we all do research and write a paper on dDiff this year
•dDiff workshop in 1-2 years?
•dDiff challenge/competition in 1-2 years?
•Make dDiff tools open source and available in R and Linux
•Tools will immediately see widespread use