Machine Learning in 25 minutes or less And why the HotOS folks should care... Terran Lane Dept. of Computer Science University of New Mexico terran@cs.unm.edu 2 0 1 5
  • date post

    20-Dec-2015

Page 1:

Machine Learning in 25 minutes or less
And why the HotOS folks should care...

Terran Lane
Dept. of Computer Science
University of New Mexico
terran@cs.unm.edu

20

15

Page 2:

Machine learning is the study of algorithms or systems that improve their performance in response to experience.


Page 6:

The core ML problem

[Diagram: The World]

Page 7:

The core ML problem

[Diagram: The World]

- Network
- CPU
- Program memory footprint
- User activity
- Multi-process performance

Page 8:

The core ML problem

[Diagram: The World → Sensors]

Page 9:

The core ML problem

[Diagram: The World → Sensors]

- Latency; bandwidth
- Branches taken; cache misses
- Memory allocs; object age
- Keystroke rates; recent commands
- Process throughput; cache activity; synch delays

Page 10:

The core ML problem

[Diagram: The World → Sensors → X]

Page 11:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → prediction]

Page 12:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X)]

- Compression/redundancy rates
- Branch prediction
- Object lifetime
- Legitimate/hostile
- Normal/abnormal

Page 13:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ]

Page 14:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ) → assessment]

Page 15:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ,y) → assessment, with y as a second input to the performance measure]

Page 16:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ,y) → assessment, with y as a second input to the performance measure]

- accuracy (0/1 loss)
- squared error
- time-to-response
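The first two performance measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `zero_one_loss` and `squared_error` are hypothetical, and both functions average over the data set, which is one common convention among several.

```python
from typing import Sequence


def zero_one_loss(y_hat: Sequence, y: Sequence) -> float:
    """Fraction of predictions that differ from the true outputs (1 - accuracy)."""
    if len(y_hat) != len(y) or not y:
        raise ValueError("y_hat and y must be non-empty and the same length")
    return sum(p != t for p, t in zip(y_hat, y)) / len(y)


def squared_error(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared difference between predictions and true outputs."""
    if len(y_hat) != len(y) or not y:
        raise ValueError("y_hat and y must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


# One wrong label out of four; one numeric prediction off by 2.
label_loss = zero_one_loss(["ok", "bad", "ok", "bad"], ["ok", "bad", "bad", "bad"])
numeric_loss = squared_error([1.0, 2.0], [1.0, 4.0])
```

Time-to-response is left out because it depends on measuring the running system rather than on comparing ŷ with y.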

Page 17:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure → assessment; added label: control]

Page 18:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure → assessment; added label: response]

Page 19:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ,X’) → assessment]

Page 20:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ,X’) → assessment]

- Correctness
- Stability
- Robustness
- Total system performance (throughput, latency, etc.)

Page 21:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → Performance measure → assessment]

Page 22:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → Performance measure → assessment]

- ???
- Do you like the model?
- Does it make sense?
- Does it make you feel warm and fuzzy?

Page 23:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure → assessment]

The ML job: find this...

Page 24:

The core ML problem

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure → assessment]

The ML job: find this...

... so that this is as good as possible.

Page 25:

Types of learning

•Supervised
•Reinforcement learning
•Unsupervised
•Special cases:
  •Semi-supervised
  •Anomaly detection
  •Behavioral cloning
  •etc...

Page 26:

Supervised Learning

•Characteristics:
  •Measure features/sensor values ⇒ X
  •Want to predict system “output”, y
  •Have some source of example (X,y) pairs
    •System, human-labeling, etc.
  •Have a well-defined performance criterion

Page 27:

Example supervised learners

•Discriminative: only produces classifier
  •Decision tree: fast; comprehensible models
  •Support vector machine: high-dim data; accurate
  •Nearest-neighbor / k-nn: low-dim data; slow
  •Neural net: special case of SVM
•Generative: produces complete probability model
  •Naive Bayes: very simple; surprisingly accurate
  •Bayesian network: powerful; descriptive; accurate
  •Markov random field: closely related to BNs
•Meta-learners/ensemble methods: sets of models
  •Boosting
  •Bagging
  •Winnow
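To make one entry in the list above concrete, here is a minimal k-nearest-neighbor classifier in plain Python. It is a sketch under stated assumptions, not the talk's implementation: the function name `knn_predict`, the Euclidean distance, the majority vote, and the toy (latency, error-rate) data are all hypothetical choices. Each prediction scans the full training set, which is one reason the method is described as slow.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs; x: feature tuple.
    """
    if not train or k < 1:
        raise ValueError("need a non-empty training set and k >= 1")
    # Sort every training point by Euclidean distance to x (a full scan per query).
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature sensor readings labeled normal/abnormal.
train = [
    ((0.0, 0.0), "normal"), ((0.0, 1.0), "normal"), ((1.0, 0.0), "normal"),
    ((5.0, 5.0), "abnormal"), ((5.0, 6.0), "abnormal"), ((6.0, 5.0), "abnormal"),
]
near_origin = knn_predict(train, (0.5, 0.5))
far_corner = knn_predict(train, (5.5, 5.5))
```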

Page 28:

Key assumption #1

The train/test data reflect the same data distribution that will be experienced when the learned model is embedded in the performance system.

•System not changing over time
•Model doesn’t affect behavior of system
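A small sketch of what goes wrong when this assumption fails. Everything here is hypothetical: the latency values, the midpoint-of-class-means threshold learner, and the 15 ms shift are invented for illustration. The model is learned once and then evaluated on data from the original distribution and on data from a system that has since changed.

```python
def learn_threshold(train):
    """Learn a cutoff halfway between the class means.

    train: list of (latency_ms, label) pairs with label 0 = normal, 1 = abnormal.
    """
    normal = [x for x, y in train if y == 0]
    abnormal = [x for x, y in train if y == 1]
    return (sum(normal) / len(normal) + sum(abnormal) / len(abnormal)) / 2


def accuracy(threshold, data):
    """Fraction of points whose predicted label (x > threshold) matches y."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)


train = [(10, 0), (12, 0), (11, 0), (13, 0), (30, 1), (32, 1), (31, 1), (29, 1)]
threshold = learn_threshold(train)

# Test data drawn from the same conditions as the training data.
same_distribution = [(11, 0), (12, 0), (30, 1), (31, 1)]
# The system changed after training: every request is now 15 ms slower.
shifted = [(x + 15, y) for x, y in same_distribution]

acc_same = accuracy(threshold, same_distribution)
acc_shifted = accuracy(threshold, shifted)
```

With these numbers the learned threshold separates the original data perfectly but labels every normal request in the shifted data as abnormal.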

Page 29:

Key assumption #2

All data points are statistically independent.

•No linkage between “adjacent”/“successive” points
•No other process that is affecting data generation
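One rough way to look for linkage between successive points is the lag-1 autocorrelation of a sensor trace. The sketch below is an assumption-laden illustration rather than a test of independence: the function name is hypothetical, and a value near zero does not prove the points are independent, while a value far from zero is a warning sign.

```python
def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0  # constant series: define the correlation as zero
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


# A steadily growing trace (e.g. a memory footprint) is strongly linked to its past.
trending = lag1_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8])
# A strictly alternating trace is linked too, with the opposite sign.
alternating = lag1_autocorrelation([1, -1, 1, -1, 1, -1, 1, -1])
```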

Page 30:

Reinforcement learning

•Characteristics:
  •Measure features of system ⇒ X
  •Want to control the system -- model outputs are “knobs”
  •Can interact with system/simulation
  •Have performance measure that recognizes “good” system behavior
  •Don’t need to know “correct” control actions

Page 31:

Key criterion

•Are the sensor readings enough to completely characterize state of the system?
  •Knowing X tells you everything relevant
•Yes:
  •“Fully observable”
  •Learning optimal performance fairly tractable (*)
•No (multiple system states produce same X):
  •“Partially observable”
  •Learning barely satisfactory performance incredibly difficult (PSPACE-complete. Or worse.)
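The "multiple system states produce same X" condition can be checked directly when the state space is small enough to enumerate. The sketch below assumes a hypothetical system whose true state is a (queue length, disk busy) pair; the function name `aliased_observations` and both sensor functions are invented for illustration.

```python
from collections import defaultdict


def aliased_observations(states, observe):
    """Group states by sensor reading; return readings shared by more than one state."""
    groups = defaultdict(list)
    for state in states:
        groups[observe(state)].append(state)
    return {obs: members for obs, members in groups.items() if len(members) > 1}


# Hypothetical true system states: (queue_length, disk_busy).
states = [(q, busy) for q in (0, 1) for busy in (False, True)]

# A sensor that reports only queue length cannot tell disk_busy states apart.
partial = aliased_observations(states, lambda s: s[0])
# A sensor that reports the whole state has no aliasing.
full = aliased_observations(states, lambda s: s)
```

An empty result corresponds to the fully observable case; any non-empty result means the system is partially observable through that sensor.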

Page 32:

RL: The good news

•It does everything that traditional control doesn’t!
  •Stochasticity ok
  •Don’t need a model
  •Don’t need linearity
  •Discrete time ok
    •No messy ODEs or z transforms!
  •Delay ok

Page 33:

RL: The bad news

•Low dimensions
•Discrete variables/features
•Need to know state space
•Convergence can be slow
  •Glacial
•Optimal control can be intractable

Page 34:

Example RL

•Fully observable systems
  •Q-learning
  •SARSA
  •Dyna
  •E3
•Partially observable
  •Reinforce
  •Utile distinction memories
  •Policy gradient methods
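As an illustration of the first fully observable method listed, here is a minimal tabular Q-learning sketch. The environment is a hypothetical five-state chain invented for this example, and the learning rate, discount factor, episode count, and purely random behavior policy are assumptions rather than recommendations. The learner is never told which action is correct; it only sees a reward of 1 on reaching the goal state.

```python
import random

N_STATES = 5            # hypothetical chain of states 0..4; state 4 is the goal
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy environment: move one step along the chain."""
    nxt = state + 1 if action == RIGHT else max(0, state - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice((LEFT, RIGHT))      # explore at random
            nxt, reward = step(state, action)
            best_next = 0.0 if nxt == GOAL else max(q[nxt])
            # Move the estimate toward reward plus discounted best next value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy policy read off the learned table for the non-goal states.
policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]
```

Because Q-learning is off-policy, the table converges toward the optimal values even though the agent behaves randomly while learning; the greedy policy extracted at the end moves right in every state.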

Page 35:

Key difference #1

Unlike supervised learning...

Distinct data points can be temporally correlated.

•Key parameter: how much history is necessary to characterize the system?
  •Markov order
  •1 time unit? 2? All of them?
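One simple way to act on the history question is to feed the learner a sliding window of the last few observations instead of a single reading. The helper below is a hypothetical sketch; the name `with_history` and the parameter `order` (the assumed Markov order) are not from the talk.

```python
def with_history(observations, order):
    """Turn a sequence of readings into overlapping windows of length `order`.

    Each window holds the current reading plus the (order - 1) readings before it.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    return [
        tuple(observations[i - order + 1 : i + 1])
        for i in range(order - 1, len(observations))
    ]


readings = [1, 2, 3, 4]
order_1 = with_history(readings, 1)   # current reading only
order_2 = with_history(readings, 2)   # current reading plus one step of history
```

Larger windows capture more history but enlarge the state space, which ties back to the dimensionality concerns raised for RL.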

Page 36:

Key difference #2

Unlike supervised learning...

Model is expected to influence behavior of system

•It’s a good thing...

Page 37:

References (partial)

•General:
  •Mitchell, Machine Learning, McGraw-Hill, 1997.
  •Duda, Hart, & Stork, Pattern Classification, Wiley, 2001.
  •Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
•Software (general; mostly supervised):
  •Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/

Page 38:

References (partial)

•Decision trees:
  •Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
  •Breiman, Friedman, Olshen, & Stone, Classification and Regression Trees (CART), Wadsworth, 1984.
•Support vector machines:
  •Burges, “A Tutorial on Support Vector Machines for Pattern Recognition”, Data Mining and Knowledge Discovery, 2(2), 1998.
  •Software: SVMlight http://svmlight.joachims.org/

Page 39:

References (partial)

•Reinforcement learning
  •Sutton & Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
  •Kaelbling, Littman, & Moore, “Reinforcement Learning: A Survey”, Journal of Artificial Intelligence Research, 4, 1996.
  •Kaelbling, Littman, & Cassandra, “Planning and Acting in Partially Observable Stochastic Domains”, Artificial Intelligence, 101, 1998.

Page 40:

Thank you!

Questions?

Page 41:

ML keywords

•Learning
•Adaptive
•Self-tuning
•State estimation
•Parameter estimation
•Data mining
•Computational statistics
•Predictive modeling
•Pattern recognition
•etc...

Page 42:

The Learning Loop

[Diagram: The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ,y) → assessment, with y as a second input to the performance measure. Additional elements: Generate “training” data; Learning module; f(X); Performance measure]

Page 43:

The training process

•Gather large set of “training data”
  •$D_{\mathrm{train}} = [\,(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\,]$
•Also large set of “testing” (eval; holdout) data
  •$D_{\mathrm{eval}} = [\,(X_1, y_1), \ldots, (X_m, y_m)\,]$
•Apply learner to train to get model
  •$f(\cdot) = \mathrm{learn}(D_{\mathrm{train}}, L)$
•Evaluate results on test set
  •$[\,\hat{y}_{\mathrm{test}}\,] = f(X_{\mathrm{test}})$
  •$\mathrm{assessment} = L(\hat{y}_{\mathrm{test}}, y_{\mathrm{test}})$
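The four steps above can be sketched end to end with a deliberately tiny learner. Everything specific here is an assumption made for illustration: the one-dimensional data, the threshold-style model, and the body of `learn`, which simply tries each midpoint between sorted training inputs and keeps the one with the lowest training loss. Only the overall shape (learn on the training set with loss L, then assess on held-out data) follows the slide.

```python
def zero_one_loss(y_hat, y):
    """L: fraction of predictions that differ from the true outputs."""
    return sum(p != t for p, t in zip(y_hat, y)) / len(y)


def learn(d_train, loss):
    """Return a model f chosen to minimize `loss` on the training data.

    Hypothetical learner: f(x) = 1 if x > t else 0, with t picked from the
    midpoints between consecutive sorted training inputs.
    """
    xs = sorted(x for x, _ in d_train)
    ys = [y for _, y in d_train]
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_t = min(
        candidates,
        key=lambda t: loss([int(x > t) for x, _ in d_train], ys),
    )
    return lambda x: int(x > best_t)


# Gather training data and separate held-out evaluation data.
d_train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
d_eval = [(2.5, 0), (4.0, 0), (6.0, 1), (8.5, 1)]

f = learn(d_train, zero_one_loss)                 # f() = learn(D_train, L)
y_hat_test = [f(x) for x, _ in d_eval]            # [ŷ_test] = f(X_test)
y_test = [y for _, y in d_eval]
assessment = zero_one_loss(y_hat_test, y_test)    # L(ŷ_test, y_test)
```

Keeping the evaluation pairs out of `learn` is the point of the holdout set: the assessment then estimates performance on data the model has not seen.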