Page 1

DIGITAL TWIN AI and Machine Learning: The Bias-Variance Tradeoff and Model Selection

Prof. Andrew D. Bagdanov
andrew.bagdanov AT unifi.it

Dipartimento di Ingegneria dell’Informazione
Università degli Studi di Firenze

13/19 November 2020

Page 2

Outline

Introduction

The Bias-Variance Decomposition

Cross-validation

Hyperparameter Selection

Reflections

Page 3

Introduction

Page 4

Overview

- Today will bring to a close the first part of this course.
- We will first look at a very important concept: the Bias-Variance Decomposition.
- This gives us a conceptual model of estimator performance in terms of generalization error.
- With this in hand, we will then look at some concrete tools we can use to manage the trade-off between model bias and variance:
  - Cross-validation: the basis for understanding bias and variance.
  - Learning curves: to understand how model variance depends on training split size.
  - Validation curves: to understand how hyperparameters affect model performance.
  - Model selection: to systematically explore hyperparameter settings through grid search.

Page 5

The Bias-Variance Decomposition

Page 6

Bias-variance in a Nutshell

The Math

- The bias-variance decomposition is a way of analyzing a model's expected generalization error as the sum of three terms: the bias, the variance, and the irreducible error resulting from noise in the problem itself.

- Say observations of a true function f(x) are generated as y = f(x) + ε, where ε is noise with zero mean and variance σ², and we fit an estimate f̂(x) from training data.

- It can be shown that the expected squared error is equal to:

  E[(y − f̂(x))²] = (Bias[f̂(x)])² + Var[f̂(x)] + σ², where

  Bias[f̂(x)] = E[f̂(x)] − f(x)
  Var[f̂(x)] = E[f̂(x)²] − E[f̂(x)]²
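To make this concrete, here is a minimal simulation sketch (not from the slides; the sine target, noise level, and polynomial degree are all assumptions): refit an estimator on many noisy training sets drawn from the same distribution and estimate the bias and variance of its prediction at one test point.

  # Sketch: numerically estimating bias^2 and variance of an estimator
  # by refitting it on many noisy training sets drawn from y = f(x) + eps.
  import numpy as np

  rng = np.random.default_rng(0)
  f = np.sin                       # assumed "true" function f(x)
  sigma = 0.3                      # noise standard deviation
  x_test = 1.5                     # a single test point x
  n_trials, n_samples, degree = 500, 30, 3

  preds = []
  for _ in range(n_trials):
      x = rng.uniform(0, np.pi, n_samples)
      y = f(x) + rng.normal(0, sigma, n_samples)   # noisy targets y = f(x) + eps
      coeffs = np.polyfit(x, y, degree)            # fit a degree-3 polynomial
      preds.append(np.polyval(coeffs, x_test))     # record f-hat(x_test)

  preds = np.array(preds)
  bias2 = (preds.mean() - f(x_test)) ** 2          # (E[f-hat(x)] - f(x))^2
  variance = preds.var()                           # E[f-hat^2] - E[f-hat]^2
  print(f'bias^2 = {bias2:.4f}, variance = {variance:.4f}, sigma^2 = {sigma**2:.4f}')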

Page 7

Bias-variance in a Nutshell

The Main Point

- As we increase model complexity:
  - Bias decreases: a better fit to the data.
  - Variance increases: the fitted model varies more with the data.
- Imagine the hierarchy of polynomial models:
  - f(x) = c
  - f(x) = ax + c
  - f(x) = ax² + bx + c
  - ...
- As we go up in this hierarchy, model complexity increases and bias decreases.
- But the model parameters estimated from data will fluctuate wildly with changing data, even if drawn from the same distribution; the sketch below makes this concrete.
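A small companion sketch (the setup is assumed, not from the slides): refit polynomials of increasing degree on fresh draws from the same distribution and watch the spread of the fitted leading coefficient grow with degree.

  # Sketch: coefficient instability grows with polynomial degree.
  import numpy as np

  rng = np.random.default_rng(1)
  for degree in (1, 3, 9):
      lead_coeffs = []
      for _ in range(200):
          x = rng.uniform(-1, 1, 20)
          y = np.sin(2 * x) + rng.normal(0, 0.2, 20)       # same distribution each draw
          lead_coeffs.append(np.polyfit(x, y, degree)[0])  # leading coefficient
      print(f'degree {degree}: leading-coefficient std = {np.std(lead_coeffs):.2f}')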

Page 8

Bias-variance in a Nutshell

Visualizing Bias and Variance

Page 9

Cross-validation

Page 10

Robust Evaluation of Model Performance

Cross-validation Overview

- Learning the parameters of any model and testing it on the same data is a methodological mistake.
- A model that just memorizes the labels of the training samples would have a perfect score.
- But, of course, it would fail miserably to classify any samples not yet seen.
- This is an extreme example of what is called overfitting.
- To avoid it, it is common practice when performing supervised machine learning to hold out part of the available data as a test set (as we have done since the beginning); a minimal sketch of this hold-out split follows.
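A minimal hold-out sketch using sklearn's train_test_split (the synthetic dataset is an assumption, standing in for real data):

  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split

  # Synthetic data standing in for a real dataset.
  X, y = make_classification(n_samples=1000, random_state=0)

  # Hold out 25% of the samples as a test set never touched during training.
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)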

Page 11

Robust Evaluation of Model Performance

A Useful Flowchart

- Here is a flowchart of the cross-validation workflow for training:

Page 12

Robust Evaluation of Model Performance

Validation Set

- When evaluating different hyperparameter settings, there is still a risk of overfitting on the test set.
- If we tweak parameters until the estimator is optimal, knowledge about the test set can "leak" into the model, and evaluation metrics no longer reflect generalization performance.
- To solve this problem, yet another part of the dataset can be held out as a validation set: we train on the training set, then evaluate on the validation set, and when the model seems to work well the final evaluation is done on the test set.
- However, by partitioning the available data into three sets, we drastically reduce the number of samples available for learning.
- Moreover, the results can depend on a particular random choice of train and validation sets.

Page 13

Robust Evaluation of Model Performance

Enter, Cross-validation

- A solution to this problem is a procedure called cross-validation.
- A test set should still be held out for final evaluation, but the validation set is no longer needed.
- The basic approach is called k-fold cross-validation: the training set is split into k equally-sized, smaller sets, and the following procedure is followed for each of the k folds:
  1. A model is trained using k − 1 of the folds as training data;
  2. The resulting model is validated on the remaining part of the data by computing a performance measure such as accuracy on it.
- Important: the average performance over the k folds gives us a lower bound on the generalization of the model to unseen data. (A hand-rolled version of this loop is sketched below.)
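For intuition, the k-fold loop can be written out by hand with sklearn.model_selection.KFold; a sketch under assumed data and model choices:

  from sklearn.datasets import make_classification
  from sklearn.model_selection import KFold
  from sklearn.svm import LinearSVC

  X, y = make_classification(n_samples=600, random_state=0)
  scores = []
  for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
      # 1. Train on k-1 of the folds...
      model = LinearSVC(max_iter=5000).fit(X[train_idx], y[train_idx])
      # 2. ...and validate on the remaining fold.
      scores.append(model.score(X[val_idx], y[val_idx]))
  print(f'mean accuracy over {len(scores)} folds: {sum(scores) / len(scores):.3f}')

In practice the cross_val_score helper shown on the next slides does exactly this bookkeeping for us.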

Page 14

Robust Evaluation of Model Performance

Cross-validation (continued)

- Here is a diagram explaining the k-fold process:

Page 15

Robust Evaluation of Model Performance

Cross-validation (continued)

- In sklearn we can easily do k-fold cross-validation using the sklearn.model_selection.cross_val_score function:

  from sklearn.model_selection import cross_val_score
  from sklearn.svm import LinearSVC

  model = LinearSVC(C=100, verbose=3)
  scores = cross_val_score(model, X_tr, y_tr, cv=3,
                           verbose=3, n_jobs=4)

- Some parameters to pay attention to:
  - cv: number of folds to use.
  - verbose: logging level, useful to have feedback for long runs.
  - n_jobs: number of parallel jobs to use.
  - scoring: function to use for scoring (defaults to model.score()).
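The returned scores is an array with one score per fold; a typical usage sketch is to report the mean and spread:

  print(f'accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')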

Page 16

Robust Evaluation of Model Performance

Cross-validation: Analysis

- Cross-validation is a powerful tool for understanding how models (might) generalize.
- As we will see next, it is the basis for hyperparameter evaluation and selection.
- Problem: cross-validation is expensive, as multiple models must be fit to multiple splits of the data.

Page 17

Hyperparameter Selection

Page 18

Finding the Best Model

Hyperparameter Selection

- Up to now we have used cross-validation only to obtain a more reliable estimate of the performance of our estimator.
- By training multiple times on random train/validation splits we make the most of the available data.
- But this still leaves open the question of how to effectively select the hyperparameters of our model.
- Up to now we have used models that have relatively few hyperparameters.
- When we look at deep models based on neural networks, however, there will be significantly more.
- Fortunately, cross-validation also gives us a tool for robustly estimating performance over a grid of hyperparameters.

Page 19

Finding the Best Model

Hyperparameter Selection: Validation Curves

- An excellent way to get an overview of the sensitivity of a model to one hyperparameter is to plot a validation curve.

  from sklearn.model_selection import validation_curve

  (train_scores, val_scores) = validation_curve(
      LinearSVC(), X_tr, y_tr,
      param_name="C", param_range=[0.1, 1.0, 10, 100, 1000],
      cv=3)

  val_scores

  array([[0.85090745, 0.85135743, 0.82478248],
         [0.85180741, 0.84535773, 0.85238524],
         [0.84910754, 0.85300735, 0.83408341],
         [0.83620819, 0.84640768, 0.85358536],
         [0.85525724, 0.86170691, 0.84983498]])

Page 20

Finding the Best Model

Hyperparameter Selection: Validation Curves

- Useful: validation_curve returns the cross-validated scores for all folds for all parameters.
- This allows us to make useful plots, as in the sketch below:

See: https://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html
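A plotting sketch in the spirit of that example (the matplotlib styling choices are ours; train_scores and val_scores come from the validation_curve call above):

  import matplotlib.pyplot as plt
  import numpy as np

  C_range = [0.1, 1.0, 10, 100, 1000]
  # Average over the cv folds (axis 1) to get one curve per score type.
  plt.semilogx(C_range, np.mean(train_scores, axis=1), 'o-', label='training score')
  plt.semilogx(C_range, np.mean(val_scores, axis=1), 'o-', label='validation score')
  plt.xlabel('C')
  plt.ylabel('accuracy')
  plt.legend()
  plt.show()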

Page 21

Finding the Best Model

Hyperparameter Selection: Learning Curves

- An important factor in the variance of any model is the size of the training split.
- According to Geoffrey Hinton: "More labeled data is the best possible model regularizer..."
- Using sklearn.model_selection.learning_curve() we can evaluate model performance as a function of training split size:

  from sklearn.model_selection import learning_curve

  train_sizes, train_scores, test_scores = learning_curve(
      LinearSVC(), X_tr, y_tr, cv=3)
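As with validation_curve, each score array has one row per training size and one column per fold; a summary sketch:

  # train_sizes holds the absolute training-set sizes tried by learning_curve.
  for n, tr, te in zip(train_sizes,
                       train_scores.mean(axis=1),
                       test_scores.mean(axis=1)):
      print(f'{n:5d} training samples: train accuracy {tr:.3f}, validation accuracy {te:.3f}')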

Page 22

Finding the Best Model

Hyperparameter Selection: Learning Curves

- Again, this returns all scores for all folds for all training set sizes.
- From these we can produce nice plots like:

See: https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html

Page 23

Finding the Best Model

Hyperparameter Selection: Grid Search

- Grid search is an unsophisticated, brute-force technique that works very well in practice.
- There are two main variations: Uniform and Random Grid Search; a sketch of the random variant follows.
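For the random variant, sklearn provides RandomizedSearchCV, which samples a fixed budget of settings from distributions rather than exhausting a grid; a minimal sketch (the log-uniform range for C is an assumption):

  from scipy.stats import loguniform
  from sklearn.model_selection import RandomizedSearchCV
  from sklearn.svm import LinearSVC

  # Sample 10 values of C from a log-uniform distribution over [1e-3, 1e3].
  search = RandomizedSearchCV(LinearSVC(max_iter=2000),
                              {'C': loguniform(1e-3, 1e3)},
                              n_iter=10, cv=3, random_state=0)
  # search.fit(X_tr, y_tr)  # X_tr, y_tr as elsewhere in these slides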

Page 24

Finding the Best Model

Hyperparameter Selection: Grid Search (continued)

- The first thing to do is understand which hyperparameters are of interest.
- This almost always requires a detailed perusal of the documentation.
- Consider a linear SVM with hinge loss: the model essentially has only one hyperparameter, the C used to weight model complexity versus empirical loss:

  f(x) = wᵀx + b

  L(D; w, b) = ||w||² + C ∑_{(x,y)∈D} max(0, 1 − y f(x)),  minimized over w and b

  class(x) = −1 if f(x) ≤ 0; +1 if f(x) > 0

Page 25

Finding the Best Model

Hyperparameter Selection: Grid Search (continued)

- The key class in sklearn is sklearn.model_selection.GridSearchCV.
- What we must provide to GridSearchCV is a grid of parameters to search:

  from sklearn.metrics import accuracy_score
  from sklearn.model_selection import GridSearchCV

  model = LinearSVC(max_iter=2000)
  param_grid = {'C': [0.001, 0.1, 1.0, 10, 20, 50, 100, 1000]}
  search = GridSearchCV(model, param_grid, cv=3, verbose=3, n_jobs=4)
  search.fit(X_tr, y_tr)

  test_score = accuracy_score(y_te, search.best_estimator_.predict(X_te))
  print(f'Best parameters: {search.best_params_}')
  print(f'Best cross-val score: {search.best_score_}')
  print(f'Score on test set: {test_score}')

Fitting 3 folds for each of 8 candidates, totalling 24 fits...

Page 26

Finding the Best Model

Hyperparameter Selection: Grid Search (continued)

- What if we have more hyperparameters?
- For example, in LinearSVC we can also choose the type of penalty (L1 or L2).
- Well, we can just add them to the grid:

  ...
  param_grid = {'C': [0.001, 0.1, 1.0, 10, 20, 50, 100, 1000],
                'penalty': ['l1', 'l2']}
  ...

Fitting 3 folds for each of 16 candidates, totalling 48 fits
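One caveat: in LinearSVC not every (penalty, loss) combination is supported (penalty='l1' requires loss='squared_hinge' and dual=False), so a grid like this may need matching settings. After fitting, the full table of results can be inspected through cv_results_; a sketch using pandas:

  import pandas as pd

  # One row per parameter combination, with its mean cross-validated score.
  results = pd.DataFrame(search.cv_results_)
  print(results[['param_C', 'param_penalty', 'mean_test_score']]
        .sort_values('mean_test_score', ascending=False)
        .head())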

Page 27

Reflections

Page 28

Model Selection

- Model selection is a fundamental fact of life when working with machine learning algorithms.
- Most of the models we have seen so far are low-variance models: they perform fairly stably over a range of hyperparameter settings.
- This is why these models are the tried-and-true techniques for supervised learning: they often just work.
- In the next part of the course we will start looking at neural network models.
- They can achieve significantly better performance...
- ...at the cost, however, of significantly complicating the model selection process.

Page 29

The Bias-Variance Decomposition

- Nutshell: the more complex the model f̂(x) is, the more data points it will capture, and the lower the bias will be; however, complexity will make the model "move" more to capture the data points, and hence its variance will be larger.
- Caveat: the bias-variance decomposition is useful as a conceptual model; in practice the bias and variance of a model are difficult to estimate.

Page 30

Model Selection Lab

- The laboratory notebook for today:

http://bit.ly/DTwin-ML5
