Regression Variance-Bias Trade-off
Regression

• We need a regression function h(x)
• We need a loss function L(h(x),y)
• We have a true distribution p(x,y)
• Assume a quadratic loss, $L(h(x),y) = (h(x)-y)^2$; then the expected loss decomposes as:

  $\mathbb{E}[L] = \underbrace{\int (h(x)-\bar{y}(x))^2\, p(x)\, dx}_{\text{estimation error}} \;+\; \underbrace{\int (y-\bar{y}(x))^2\, p(x,y)\, dx\, dy}_{\text{noise error}}$

  where $\bar{y}(x) = \mathbb{E}[y\,|\,x]$ is the conditional mean of y.
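The decomposition into estimation error plus noise error can be checked numerically. A minimal sketch, assuming an illustrative conditional mean $\bar{y}(x) = \sin(2\pi x)$, a made-up fixed regressor h, and Gaussian label noise (all hypothetical choices, not from the slides):

```python
import numpy as np

# Hypothetical setup: x ~ Uniform(0,1), y = sin(2*pi*x) + Gaussian noise,
# so the conditional mean is ybar(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)
ybar = np.sin(2 * np.pi * x)          # E[y|x]
y = ybar + rng.normal(0.0, 0.3, n)    # noisy labels

def h(x):
    # Some fixed (imperfect) regression function, chosen for illustration.
    return 2.0 * x - 1.0

expected_loss = np.mean((h(x) - y) ** 2)       # E[(h(x) - y)^2]
estimation_error = np.mean((h(x) - ybar) ** 2) # E[(h(x) - ybar(x))^2]
noise_error = np.mean((y - ybar) ** 2)         # E[(y - ybar(x))^2]

# The cross term has zero mean, so the two pieces add up to the total loss.
print(expected_loss, estimation_error + noise_error)
```

The cross term $2(h(x)-\bar{y}(x))(\bar{y}(x)-y)$ vanishes in expectation because $\mathbb{E}[y-\bar{y}(x)\,|\,x]=0$, which is why the two printed numbers agree up to Monte Carlo error.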
Regression: Learning

• Assume h(x) is a parametric curve, e.g. $h(x) = a f(x) + b$.
• Minimize the loss over the parameters (e.g. a, b), where the expectation over p(x,y) is replaced with a sum over data-cases (called a "Monte Carlo sum"):

  $\mathbb{E}[L] \approx \frac{1}{N}\sum_{n=1}^{N} \big(h(x_n) - y_n\big)^2$

• That is, we solve:

  $(\hat{a},\hat{b}) = \arg\min_{a,b} \sum_{n=1}^{N} \big(a f(x_n) + b - y_n\big)^2$
• The same result follows from positing a Gaussian model q(y|x) for p(y|x) with mean h(x) and maximizing the probability of the data over the parameters. (This approach is taken in 274: probabilistic learning.)
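Because the Monte Carlo sum is quadratic in (a, b), minimizing it is a linear least-squares problem. A minimal sketch, where the choice of f, the true parameter values, and the noise level are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed basis function for the curve h(x) = a*f(x) + b.
    return np.sin(2 * np.pi * x)

# Sample a dataset from a hypothetical ground truth a=3, b=-1.
x = rng.uniform(0.0, 1.0, 500)
y = 3.0 * f(x) - 1.0 + rng.normal(0.0, 0.1, 500)

# The sum over data-cases (1/N) * sum_n (a*f(x_n) + b - y_n)^2 is quadratic
# in (a, b), so its minimizer solves a linear least-squares problem.
A = np.column_stack([f(x), np.ones_like(x)])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a_hat, b_hat)   # close to the true (3, -1)
```

With 500 data-cases and small noise, the recovered parameters land very close to the values used to generate the data.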
Back to overfitting

• More parameters lead to more flexible functions, which may lead to over-fitting.
• Formalize this by imagining very many datasets D, all of size N. Call h(x,D) the regression function estimated from a dataset D of size N, i.e. $h(x,D) = a(D) f(x) + b(D)$; then:

  $\big(h(x,D) - \bar{y}(x)\big)^2 = \big(h(x,D) - \mathbb{E}_D[h(x,D)] \;+\; \mathbb{E}_D[h(x,D)] - \bar{y}(x)\big)^2$

• Next, average over $p(D) = p(x_1)\,p(x_2)\cdots p(x_N)$. Only the first term depends on D, and the cross term averages to 0:

  $\mathbb{E}_D\big[(h(x,D) - \bar{y}(x))^2\big] = \underbrace{\mathbb{E}_D\big[(h(x,D) - \mathbb{E}_D[h(x,D)])^2\big]}_{\text{variance}} + \underbrace{\big(\mathbb{E}_D[h(x,D)] - \bar{y}(x)\big)^2}_{\text{bias}^2}$
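The averaging over p(D) can be simulated directly: draw many datasets D of size N, fit h(x,D) to each, and split the squared estimation error at a test point into variance plus bias². A sketch, where the true conditional mean, the basis function f, the noise level, and the test point are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Assumed (misspecified) basis function for h(x, D) = a(D)*f(x) + b(D).
    return np.sin(2 * np.pi * x)

def ybar(x):
    # Assumed true conditional mean E[y|x]; deliberately not of the form a*f+b,
    # so the fitted family carries some bias.
    return np.cos(2 * np.pi * x)

N, n_datasets = 30, 2000
x0 = 0.3                          # evaluate the decomposition at one point
h_at_x0 = np.empty(n_datasets)
for d in range(n_datasets):
    # One dataset D of size N, and its fitted parameters a(D), b(D).
    x = rng.uniform(0.0, 1.0, N)
    y = ybar(x) + rng.normal(0.0, 0.2, N)
    A = np.column_stack([f(x), np.ones_like(x)])
    a, b = np.linalg.lstsq(A, y, rcond=None)[0]
    h_at_x0[d] = a * f(x0) + b

mse = np.mean((h_at_x0 - ybar(x0)) ** 2)   # E_D[(h(x0,D) - ybar(x0))^2]
variance = np.var(h_at_x0)                 # E_D[(h - E_D[h])^2]
bias_sq = (np.mean(h_at_x0) - ybar(x0)) ** 2
print(mse, variance + bias_sq)             # identical, as the algebra predicts
```

For the sample averages the identity is exact, mirroring the fact that the cross term cancels in the derivation above.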
Bias/Variance Tradeoff
[Figure: three panels A, B, C]

A: The label y fluctuates (label variance).
B: The estimate of h fluctuates across different datasets (estimation variance).
C: The average estimate of h does not fit well to the true curve (squared estimation bias).
Bias/Variance Illustration
[Figure: illustrations of bias and variance]
Relation to Over-fitting
• Increasing regularization (less flexible models)
• Decreasing regularization (more flexible models)
• Training error is measuring bias, but ignoring variance.
• Testing error / cross-validation error is measuring both bias and variance.
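This asymmetry is easy to see in simulation: as model flexibility grows, training error can only fall, while held-out test error also pays for variance. A sketch using polynomial degree as a stand-in for (inverse) regularization; the data-generating curve and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    # Hypothetical ground truth: y = sin(2*pi*x) + Gaussian noise.
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

x_tr, y_tr = sample(30)      # small training set, as in the over-fitting story
x_te, y_te = sample(1000)    # large held-out test set

train_err, test_err = [], []
for degree in range(1, 12):
    # Higher degree = more parameters = a more flexible (less regularized) fit.
    coef = np.polyfit(x_tr, y_tr, degree)
    train_err.append(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2))
    test_err.append(np.mean((np.polyval(coef, x_te) - y_te) ** 2))

print(train_err)   # monotonically non-increasing in the degree
print(test_err)    # eventually dominated by variance at high degree
```

Training error shrinks monotonically with flexibility because each larger model nests the smaller one; only the test curve reflects the full variance + bias² trade-off.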