Bias Variance Decomposition - Stanford University
Bias Variance Decomposition
Outline (classical to modern):
- Classical theory
- Regularization
- Parameter selection: k-fold CV (data efficient), successive halving (compute efficient)
- Modern theory (bonus)
True function: $h_0(x) = \theta_2 x^2 + \theta_1 x + \theta_0$, a quadratic. We don't observe $h_0$, only samples from it.
[Figure: noisy samples drawn from the true function $h_0$]
What if we fit a line to the samples? Informally, we underfit the data: Bias.
What if we fit a high-degree polynomial (degree 5)? We overfit the data: Variance.
Hope: if we had picked quadratics (the same family as $h_0$), we would get low bias and low variance. Even then we should not expect zero error: there is inherent noise.
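The underfit/overfit behavior can be simulated directly. A minimal sketch (numpy only; the quadratic's coefficients, sample size, and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true quadratic h0 (coefficients chosen for illustration).
def h0(x):
    return 2 * x**2 - x + 1

# Noisy samples y = h0(x) + eps.
x = rng.uniform(-1, 1, size=20)
y = h0(x) + rng.normal(0, 0.3, size=20)

# Fit polynomials of increasing degree and compare against the true function.
x_grid = np.linspace(-1, 1, 200)
errors = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    errors[degree] = np.mean((np.polyval(coeffs, x_grid) - h0(x_grid)) ** 2)
    print(f"degree {degree}: mean squared error vs h0 = {errors[degree]:.4f}")
```

The line (degree 1) cannot represent the quadratic at all, so its error stays large no matter how much data we have; that gap is the bias at work.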
[Famous chart: error vs. model complexity (degree). Training loss decreases monotonically; test loss is U-shaped, with the optimal model complexity at its minimum.]
More Formally: the Bias-Variance Tradeoff

- True hypothesis $h_0 : \mathbb{R}^d \to \mathbb{R}$.
- Observed output $y = h_0(x) + \varepsilon$, where the noise satisfies $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, $\mathbb{E}[\varepsilon] = 0$.
- Features $x \in \mathbb{R}^d$.
Procedure:
1. Draw $n$ labeled points $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, n$, with $y^{(i)} = h_0(x^{(i)}) + \varepsilon^{(i)}$; call this set $S$.
2. Train a model on $S$; call it $h_S : \mathbb{R}^d \to \mathbb{R}$.
3. Pick a test point $x \in \mathbb{R}^d$ with label $y = h_0(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
4. Measure $(h_S(x) - y)^2$, the risk.
We examine $\mathbb{E}[(h_S(x) - y)^2]$, where $(x, y)$ is independent of $S$, so $\mathbb{E}_{X,Y,S}[\cdot] = \mathbb{E}_S[\mathbb{E}_{X,Y}[\cdot]]$.
Goal: decompose the error.
$\mathbb{E}_\varepsilon[(h_S(x) - y)^2] = \mathbb{E}_\varepsilon[(h_S(x) - h_0(x) - \varepsilon)^2]$

$= \mathbb{E}_\varepsilon[\varepsilon^2] - 2\,\mathbb{E}_\varepsilon[\varepsilon]\,(h_S(x) - h_0(x)) + (h_S(x) - h_0(x))^2$ (since $\varepsilon$ is independent of $S$ and $\mathbb{E}[\varepsilon] = 0$, the cross term vanishes)

$= \sigma^2 + (h_S(x) - h_0(x))^2$, where $\sigma^2$ is the unavoidable error.
Define $h_{\mathrm{avg}}(x) = \mathbb{E}_S[h_S(x)]$: randomly select $S$, train to obtain $h_S$, and evaluate $h_S(x)$; $h_{\mathrm{avg}}$ is the average prediction over draws of $S$.
$\mathbb{E}_S[(h_S(x) - h_0(x))^2] = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x) + h_{\mathrm{avg}}(x) - h_0(x))^2]$

$= \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2] + (h_{\mathrm{avg}}(x) - h_0(x))^2 + 2\,(h_{\mathrm{avg}}(x) - h_0(x))\,\mathbb{E}_S[h_S(x) - h_{\mathrm{avg}}(x)]$

Since $\mathbb{E}_S[h_S(x)] = h_{\mathrm{avg}}(x)$, we have $\mathbb{E}_S[h_S(x) - h_{\mathrm{avg}}(x)] = 0$, so

$\mathbb{E}_S[(h_S(x) - h_0(x))^2] = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2] + (h_{\mathrm{avg}}(x) - h_0(x))^2$

The first term is the variance of the training procedure (error introduced by sensitivity to the draw of $S$); the second is the squared bias, which does not depend on $S$ and is the error introduced by the model family.
Summary:

$\mathbb{E}_S\,\mathbb{E}_\varepsilon[(h_S(x) - y)^2] = \sigma^2 + (h_{\mathrm{avg}}(x) - h_0(x))^2 + \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2]$

i.e. unavoidable error ($\sigma^2$) + Bias$^2$ + Variance.
[Figure: total error vs. model complexity, decomposed into bias and variance, with the optimal model at the minimum of total error.]
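The decomposition can be estimated by Monte Carlo: draw many independent training sets $S$, fit $h_S$ on each, and average the predictions to approximate $h_{\mathrm{avg}}$. A sketch under assumed settings (invented quadratic $h_0$, noise level, sample size, and a degree-5 model family):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3                          # assumed noise level

def h0(x):                           # illustrative true quadratic
    return 2 * x**2 - x + 1

x_test = np.linspace(-1, 1, 50)      # fixed test points
degree = 5                           # richer family than the truth needs
preds = []
for _ in range(500):                 # draw many independent training sets S
    xs = rng.uniform(-1, 1, 20)
    ys = h0(xs) + rng.normal(0, sigma, 20)
    preds.append(np.polyval(np.polyfit(xs, ys, degree), x_test))
preds = np.array(preds)

h_avg = preds.mean(axis=0)                      # h_avg(x) = E_S[h_S(x)]
bias2 = np.mean((h_avg - h0(x_test)) ** 2)      # Bias^2 term
variance = np.mean(preds.var(axis=0))           # E_S[(h_S(x) - h_avg(x))^2]
print(f"Bias^2 ~ {bias2:.4f}  Variance ~ {variance:.4f}  sigma^2 = {sigma**2:.4f}")
```

Since degree-5 polynomials contain the true quadratic, the estimated bias is near zero while the variance term dominates, matching the overfitting picture above.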
Regularization: reduce variance to obtain a model more robust to training-set variation.
- Explicit: change the model, e.g. via penalty terms.
- Implicit: regularization by the algorithm.

Classical setting: regularization parameter $\lambda \in \mathbb{R}_+$.
$\arg\min_{\theta \in \mathbb{R}^d} \sum_{i=1}^n (\theta^\top x^{(i)} - y^{(i)})^2 + \frac{\lambda}{2}\|\theta\|_2^2$; the penalty picks a less complex model.

$\lambda = 0$: ordinary least squares. $\lambda \to \infty$: $\theta \to 0$. Something in between is probably good; this tradeoff makes $\lambda$ a hyperparameter.
Solution: fix $\lambda \geq 0$ and minimize

$L(\theta) = \frac{1}{2}\|X\theta - y\|_2^2 + \frac{\lambda}{2}\|\theta\|_2^2$

$\nabla_\theta L = X^\top(X\theta - y) + \lambda\theta = 0$

With $\lambda = 0$ this is $X^\top X \theta = X^\top y$, the normal equation.
Underdetermined case (common in modern ML): if $\mathrm{rank}(X^\top X) < d$, there is no unique solution. For any $v$ with $Xv = 0$ (i.e. $v$ in the null space of $X$), if $\theta_0$ satisfies $X^\top X \theta_0 = X^\top y$, then $\theta_0 + v$ is a solution too.
$X^\top X$ is PSD, with eigenvalues $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d \geq 0$. For $\lambda > 0$, every eigenvalue of $X^\top X + \lambda I$ is at least $\lambda > 0$, so the matrix is invertible and

$\theta = (X^\top X + \lambda I)^{-1} X^\top y$

is the unique regularized solution: ridge regression. It reduces variance.
Recall $\mathrm{Var}_S = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2]$. Increasing $\lambda$ flattens the spectrum of $X^\top X + \lambda I$, which decreases the variance.
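A minimal numerical sketch of the underdetermined setting and the ridge solve (the random data, dimensions, and the value of $\lambda$ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 50                        # underdetermined: n < d, so rank(X^T X) < d
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

print("rank(X^T X) =", np.linalg.matrix_rank(X.T @ X), "< d =", d)

lam = 0.1                            # assumed regularization strength lambda
# X^T X + lam*I is positive definite for lam > 0, so this solve always succeeds
# and gives the unique ridge solution theta = (X^T X + lam I)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("training residual norm:", np.linalg.norm(X @ theta - y))
```

Solving the linear system directly is preferable to forming the inverse explicitly; increasing $\lambda$ shrinks $\|\theta\|_2$, consistent with the tradeoff above.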
Bonus: algorithms implicitly regularize as well.

Thought experiment: run gradient descent. Decompose the iterate as $\theta_{\mathrm{GD}} = P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}} + P_{\mathrm{span}(X^\top)}\,\theta_{\mathrm{GD}}$.

Claim: $P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}}^{(t)} = P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}}^{(0)}$ for all $t$. The gradient $\nabla L(\theta) = X^\top(X\theta - y)$ always lies in $\mathrm{span}(X^\top)$, so gradient steps never change the null-space component. Initialized at $\theta^{(0)} = 0$, gradient descent converges to the minimum-norm solution, just by using gradient descent.
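The thought experiment can be checked numerically: run gradient descent from $\theta^{(0)} = 0$ on an underdetermined least-squares problem and compare against the minimum-norm solution given by the pseudoinverse (the dimensions, step size, and iteration count here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 20                         # underdetermined least squares
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Gradient descent on (1/2)||X theta - y||^2 from theta = 0.
# Every gradient X^T (X theta - y) lies in the row space of X, so the
# null-space component of theta stays at its initial value (zero here).
theta = np.zeros(d)
lr = 0.01
for _ in range(20000):
    theta -= lr * (X.T @ (X @ theta - y))

theta_min_norm = np.linalg.pinv(X) @ y    # minimum-norm interpolating solution
print("distance to min-norm solution:", np.linalg.norm(theta - theta_min_norm))
```

The iterate interpolates the training data ($X\theta = y$) and coincides with the pseudoinverse solution, even though nothing in the loss penalizes $\|\theta\|$.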
Extra: Belkin, Hsu, Ma, and Mandal (2018), Double Descent.
[Figure: double-descent curve, test loss vs. model capacity, with a peak at the interpolation threshold.]