Bias Variance Decomposition - Stanford University
Bias Variance Decomposition
Outline (classical to modern):
- Classical theory
- Regularization
- Parameter selection: k-fold CV (data efficient), successive halving (compute efficient)
- Modern theory (bonus)
True function: $h_0(x) = \theta_2 x^2 + \theta_1 x + \theta_0$, a quadratic. We don't observe $h_0$, only samples from it.
[Figure: noisy samples drawn from the true function $h_0$]
What if we fit a line to the samples? Informally, we underfit the data: Bias.
What if we fit a high-degree polynomial (degree 5)? We overfit the data: Variance.
Hope: if we had picked quadratics (the same family as $h_0$), we would get low bias and low variance. Even then we should not expect zero error: there is inherent noise.
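The underfit/overfit behavior can be simulated directly. A minimal sketch (numpy only; the quadratic's coefficients, sample size, and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true quadratic h0 (coefficients chosen for illustration).
def h0(x):
    return 2 * x**2 - x + 1

# Noisy samples y = h0(x) + eps.
x = rng.uniform(-1, 1, size=20)
y = h0(x) + rng.normal(0, 0.3, size=20)

# Fit polynomials of increasing degree and compare against the true function.
x_grid = np.linspace(-1, 1, 200)
errors = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    errors[degree] = np.mean((np.polyval(coeffs, x_grid) - h0(x_grid)) ** 2)
    print(f"degree {degree}: mean squared error vs h0 = {errors[degree]:.4f}")
```

The line (degree 1) cannot represent the quadratic at all, so its error stays large no matter how much data we have; that gap is the bias at work.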
[Famous chart: error vs. model complexity (degree). Training loss decreases monotonically; test loss is U-shaped, with the optimal model complexity at its minimum.]
More Formally: the Bias-Variance Tradeoff

- True hypothesis $h_0 : \mathbb{R}^d \to \mathbb{R}$.
- Observed output $y = h_0(x) + \varepsilon$, where the noise satisfies $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, $\mathbb{E}[\varepsilon] = 0$.
- Features $x \in \mathbb{R}^d$.
Procedure:
1. Draw $n$ labeled points $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, n$, with $y^{(i)} = h_0(x^{(i)}) + \varepsilon^{(i)}$; call this set $S$.
2. Train a model on $S$; call it $h_S : \mathbb{R}^d \to \mathbb{R}$.
3. Pick a test point $x \in \mathbb{R}^d$ with label $y = h_0(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
4. Measure $(h_S(x) - y)^2$, the risk.
We examine $\mathbb{E}[(h_S(x) - y)^2]$, where $(x, y)$ is independent of $S$, so $\mathbb{E}_{X,Y,S}[\cdot] = \mathbb{E}_S[\mathbb{E}_{X,Y}[\cdot]]$.
Goal: decompose the error.
$\mathbb{E}_\varepsilon[(h_S(x) - y)^2] = \mathbb{E}_\varepsilon[(h_S(x) - h_0(x) - \varepsilon)^2]$

$= \mathbb{E}_\varepsilon[\varepsilon^2] - 2\,\mathbb{E}_\varepsilon[\varepsilon]\,(h_S(x) - h_0(x)) + (h_S(x) - h_0(x))^2$ (since $\varepsilon$ is independent of $S$ and $\mathbb{E}[\varepsilon] = 0$, the cross term vanishes)

$= \sigma^2 + (h_S(x) - h_0(x))^2$, where $\sigma^2$ is the unavoidable error.
Define $h_{\mathrm{avg}}(x) = \mathbb{E}_S[h_S(x)]$: randomly select $S$, train to obtain $h_S$, and evaluate $h_S(x)$; $h_{\mathrm{avg}}$ is the average prediction over draws of $S$.
$\mathbb{E}_S[(h_S(x) - h_0(x))^2] = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x) + h_{\mathrm{avg}}(x) - h_0(x))^2]$

$= \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2] + (h_{\mathrm{avg}}(x) - h_0(x))^2 + 2\,(h_{\mathrm{avg}}(x) - h_0(x))\,\mathbb{E}_S[h_S(x) - h_{\mathrm{avg}}(x)]$

Since $\mathbb{E}_S[h_S(x)] = h_{\mathrm{avg}}(x)$, we have $\mathbb{E}_S[h_S(x) - h_{\mathrm{avg}}(x)] = 0$, so

$\mathbb{E}_S[(h_S(x) - h_0(x))^2] = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2] + (h_{\mathrm{avg}}(x) - h_0(x))^2$

The first term is the variance of the training procedure (error introduced by sensitivity to the draw of $S$); the second is the squared bias, which does not depend on $S$ and is the error introduced by the model family.
Summary:

$\mathbb{E}_S\,\mathbb{E}_\varepsilon[(h_S(x) - y)^2] = \sigma^2 + (h_{\mathrm{avg}}(x) - h_0(x))^2 + \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2]$

i.e. unavoidable error ($\sigma^2$) + Bias$^2$ + Variance.
[Figure: total error vs. model complexity, decomposed into bias and variance, with the optimal model at the minimum of total error.]
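The decomposition can be estimated by Monte Carlo: draw many independent training sets $S$, fit $h_S$ on each, and average the predictions to approximate $h_{\mathrm{avg}}$. A sketch under assumed settings (invented quadratic $h_0$, noise level, sample size, and a degree-5 model family):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3                          # assumed noise level

def h0(x):                           # illustrative true quadratic
    return 2 * x**2 - x + 1

x_test = np.linspace(-1, 1, 50)      # fixed test points
degree = 5                           # richer family than the truth needs
preds = []
for _ in range(500):                 # draw many independent training sets S
    xs = rng.uniform(-1, 1, 20)
    ys = h0(xs) + rng.normal(0, sigma, 20)
    preds.append(np.polyval(np.polyfit(xs, ys, degree), x_test))
preds = np.array(preds)

h_avg = preds.mean(axis=0)                      # h_avg(x) = E_S[h_S(x)]
bias2 = np.mean((h_avg - h0(x_test)) ** 2)      # Bias^2 term
variance = np.mean(preds.var(axis=0))           # E_S[(h_S(x) - h_avg(x))^2]
print(f"Bias^2 ~ {bias2:.4f}  Variance ~ {variance:.4f}  sigma^2 = {sigma**2:.4f}")
```

Since degree-5 polynomials contain the true quadratic, the estimated bias is near zero while the variance term dominates, matching the overfitting picture above.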
Regularization: reduce variance to obtain a model more robust to training-set variation.
- Explicit: change the model, e.g. via penalty terms.
- Implicit: regularization by the algorithm.

Classical setting: regularization parameter $\lambda \in \mathbb{R}_+$.
$\arg\min_{\theta \in \mathbb{R}^d} \sum_{i=1}^n (\theta^\top x^{(i)} - y^{(i)})^2 + \frac{\lambda}{2}\|\theta\|_2^2$; the penalty picks a less complex model.

$\lambda = 0$: ordinary least squares. $\lambda \to \infty$: $\theta \to 0$. Something in between is probably good; this tradeoff makes $\lambda$ a hyperparameter.
Solution: fix $\lambda \geq 0$ and minimize

$L(\theta) = \frac{1}{2}\|X\theta - y\|_2^2 + \frac{\lambda}{2}\|\theta\|_2^2$

$\nabla_\theta L = X^\top(X\theta - y) + \lambda\theta = 0$

With $\lambda = 0$ this is $X^\top X \theta = X^\top y$, the normal equation.
Underdetermined case (common in modern ML): if $\mathrm{rank}(X^\top X) < d$, there is no unique solution. For any $v$ with $Xv = 0$ (i.e. $v$ in the null space of $X$), if $\theta_0$ satisfies $X^\top X \theta_0 = X^\top y$, then $\theta_0 + v$ is a solution too.
$X^\top X$ is PSD, with eigenvalues $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d \geq 0$. For $\lambda > 0$, every eigenvalue of $X^\top X + \lambda I$ is at least $\lambda > 0$, so the matrix is invertible and

$\theta = (X^\top X + \lambda I)^{-1} X^\top y$

is the unique regularized solution: ridge regression. It reduces variance.
Recall $\mathrm{Var}_S = \mathbb{E}_S[(h_S(x) - h_{\mathrm{avg}}(x))^2]$. Increasing $\lambda$ flattens the spectrum of $X^\top X + \lambda I$, which decreases the variance.
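A minimal numerical sketch of the underdetermined setting and the ridge solve (the random data, dimensions, and the value of $\lambda$ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 50                        # underdetermined: n < d, so rank(X^T X) < d
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

print("rank(X^T X) =", np.linalg.matrix_rank(X.T @ X), "< d =", d)

lam = 0.1                            # assumed regularization strength lambda
# X^T X + lam*I is positive definite for lam > 0, so this solve always succeeds
# and gives the unique ridge solution theta = (X^T X + lam I)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("training residual norm:", np.linalg.norm(X @ theta - y))
```

Solving the linear system directly is preferable to forming the inverse explicitly; increasing $\lambda$ shrinks $\|\theta\|_2$, consistent with the tradeoff above.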
Bonus: algorithms implicitly regularize as well.

Thought experiment: run gradient descent. Decompose the iterate as $\theta_{\mathrm{GD}} = P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}} + P_{\mathrm{span}(X^\top)}\,\theta_{\mathrm{GD}}$.

Claim: $P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}}^{(t)} = P_{\mathrm{null}(X)}\,\theta_{\mathrm{GD}}^{(0)}$ for all $t$. The gradient $\nabla L(\theta) = X^\top(X\theta - y)$ always lies in $\mathrm{span}(X^\top)$, so gradient steps never change the null-space component. Initialized at $\theta^{(0)} = 0$, gradient descent converges to the minimum-norm solution, just by using gradient descent.
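The thought experiment can be checked numerically: run gradient descent from $\theta^{(0)} = 0$ on an underdetermined least-squares problem and compare against the minimum-norm solution given by the pseudoinverse (the dimensions, step size, and iteration count here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 20                         # underdetermined least squares
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Gradient descent on (1/2)||X theta - y||^2 from theta = 0.
# Every gradient X^T (X theta - y) lies in the row space of X, so the
# null-space component of theta stays at its initial value (zero here).
theta = np.zeros(d)
lr = 0.01
for _ in range(20000):
    theta -= lr * (X.T @ (X @ theta - y))

theta_min_norm = np.linalg.pinv(X) @ y    # minimum-norm interpolating solution
print("distance to min-norm solution:", np.linalg.norm(theta - theta_min_norm))
```

The iterate interpolates the training data ($X\theta = y$) and coincides with the pseudoinverse solution, even though nothing in the loss penalizes $\|\theta\|$.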
Extra: Belkin, Hsu, Ma, and Mandal (2018), Double Descent.
[Figure: double-descent curve, test loss vs. model capacity, with a peak at the interpolation threshold.]