Post on 25-Apr-2018
Introduction Preliminaries Linear Models Bayes Regress Model Comparison Summary References
Linear Models for Regression
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology,
Atlanta, GA 30332-0280
hic@cc.gatech.edu
Henrik I Christensen (RIM@GT) Linear Regression 1 / 39
Outline
1 Introduction
2 Preliminaries
3 Linear Basis Function Models
4 Bayesian Linear Regression
5 Bayesian Model Comparison
6 Summary
Introduction
The objective of regression is to enable prediction of a value t based on modelling over a dataset X.
Consider a set of D observations over a space
How can we generate estimates for the future?
Battery time?
Time to completion?
Position of doors?
Introduction (2)
Example from Chapter 1
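[Figure: example data plotted as t against x, for x from 0 to 1 and t from −1 to 1]
\[ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m = \sum_{i=0}^{m} w_i x^i \]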
Introduction (3)
In general the functions could be beyond simple polynomials
The “components” are termed basis functions, i.e.
\[ y(x, \mathbf{w}) = \sum_{i=0}^{m} w_i \phi_i(x) = \mathbf{w}^T \boldsymbol{\phi}(x) \]
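As a minimal sketch of the model y(x, w) = w^T φ(x), the snippet below evaluates it with polynomial basis functions. The function names and the weight values are hypothetical, chosen only for illustration.

```python
import numpy as np

def polynomial_features(x, m):
    """Return the basis vector [x^0, x^1, ..., x^m] for a scalar x."""
    return np.array([x ** i for i in range(m + 1)])

def predict(x, w):
    """Evaluate y(x, w) = w^T phi(x) for polynomial basis functions."""
    return float(w @ polynomial_features(x, len(w) - 1))

w = np.array([1.0, -2.0, 0.5])   # hypothetical weights w0, w1, w2
print(predict(2.0, w))           # 1 - 2*2 + 0.5*4
```

The model is linear in w even though it is nonlinear in x, which is what makes the closed-form solutions later in the lecture possible.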
Loss Function
For optimization we need a penalty / loss function
L(t, y(x))
Expected loss is then
\[ E[L] = \iint L(t, y(x))\, p(x, t)\, dx\, dt \]
For the squared loss function we have
\[ E[L] = \iint \{y(x) - t\}^2\, p(x, t)\, dx\, dt \]
Goal: choose y(x) to minimize expected loss (E[L])
Loss Function
Derivation of the extrema
\[ \frac{\delta E[L]}{\delta y(x)} = 2 \int \{y(x) - t\}\, p(x, t)\, dt = 0 \]
Implies that
\[ y(x) = \frac{\int t\, p(x, t)\, dt}{p(x)} = \int t\, p(t|x)\, dt = E[t|x] \]
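The result y(x) = E[t|x] can be checked numerically. The sketch below fixes a single input x0, draws samples from an assumed Gaussian p(t|x0) (the mean 2.0 and standard deviation 0.5 are made up), and searches a grid of candidate predictions for the one with the smallest average squared loss.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conditional distribution p(t | x0): Gaussian with mean 2.0
t = rng.normal(loc=2.0, scale=0.5, size=100_000)

candidates = np.linspace(0.0, 4.0, 401)               # candidate values for y(x0)
losses = [np.mean((y - t) ** 2) for y in candidates]  # empirical squared loss
best = candidates[int(np.argmin(losses))]

print(best, t.mean())   # the minimiser sits at the sample mean, up to grid spacing
```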
Loss Function - Interpretation
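[Figure: the regression function y(x) plotted as t against x, with the conditional distribution p(t|x0) drawn at x0 and the prediction y(x0) marked]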
Alternative
Consider a small rewrite
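\[ \{y(x) - t\}^2 = \{y(x) - E[t|x] + E[t|x] - t\}^2 \]
The cross term vanishes when integrating over t, so the expected loss is then
\[ E[L] = \int \{y(x) - E[t|x]\}^2\, p(x)\, dx + \iint \{E[t|x] - t\}^2\, p(x, t)\, dx\, dt \]
The second term is the noise in the targets and does not depend on y(x).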
Polynomial Basis Functions
Basic Definition:
\[ \phi_i(x) = x^i \]
Global functions
Small change in x affects all of them
[Figure: polynomial basis functions for x from −1 to 1, values from −1 to 1]
Gaussian Basis Functions
Basic Definition:
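\[ \phi_i(x) = \exp\left( -\frac{(x - \mu_i)^2}{2 s^2} \right) \]
A route to Gaussian mixtures; local impact
Not required to have a probabilistic interpretation.
µ controls position and s controls scale
[Figure: Gaussian basis functions for x from −1 to 1, values from 0 to 1]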
Sigmoid Basis Functions
Basic Definition:
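\[ \phi_i(x) = \sigma\left( \frac{x - \mu_i}{s} \right) \]
where
\[ \sigma(a) = \frac{1}{1 + e^{-a}} \]
µ controls location and s controls slope
[Figure: sigmoid basis functions for x from −1 to 1, values from 0 to 1]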
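The three basis function families (polynomial, Gaussian, sigmoid) can be written down directly. This is a minimal sketch; the function names and the parameter values in the usage lines are illustrative assumptions.

```python
import numpy as np

def polynomial_basis(x, i):
    """phi_i(x) = x^i: global, every x influences every basis function."""
    return x ** i

def gaussian_basis(x, mu, s):
    """phi_i(x) = exp(-(x - mu)^2 / (2 s^2)): local bump centred at mu."""
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi_i(x) = sigma((x - mu) / s) with the logistic sigmoid sigma."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1.0, 1.0, 5)
print(polynomial_basis(x, 3))
print(gaussian_basis(x, mu=0.0, s=0.2))
print(sigmoid_basis(x, mu=0.0, s=0.1))
```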
Maximum Likelihood & Least Squares
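Assume observations from a deterministic function contaminated by Gaussian noise
\[ t = y(x, \mathbf{w}) + \epsilon, \qquad p(\epsilon|\beta) = N(\epsilon|0, \beta^{-1}) \]
The problem at hand is then
\[ p(t|x, \mathbf{w}, \beta) = N(t|y(x, \mathbf{w}), \beta^{-1}) \]
From a series of observations we have the likelihood
\[ p(\mathbf{t}|X, \mathbf{w}, \beta) = \prod_{i=1}^{N} N(t_i|\mathbf{w}^T \phi(x_i), \beta^{-1}) \]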
Maximum Likelihood & Least Squares (2)
This results in
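\[ \ln p(\mathbf{t}|\mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w}) \]
where
\[ E_D(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \{t_i - \mathbf{w}^T \phi(x_i)\}^2 \]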
is the sum of squared errors
Maximum Likelihood & Least Squares (3)
Computing the extrema yields:
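\[ \mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t} \]
where
\[ \Phi = \begin{pmatrix}
\phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\
\phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N)
\end{pmatrix} \]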
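A minimal sketch of the maximum likelihood fit, assuming a polynomial basis and noise-free made-up targets: it builds the N x M design matrix, solves the normal equations, and evaluates the sum-of-squares error and the Gaussian log likelihood. All function names are hypothetical.

```python
import numpy as np

def design_matrix(x, M):
    """N x M matrix with entries phi_j(x_n) = x_n**j (polynomial basis)."""
    return np.vander(x, M, increasing=True)

def fit_ml(Phi, t):
    """Solve the normal equations (Phi^T Phi) w = Phi^T t."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

def sum_of_squares_error(Phi, t, w):
    """E_D(w) = 1/2 * sum_n (t_n - w^T phi(x_n))^2."""
    r = t - Phi @ w
    return 0.5 * float(r @ r)

def log_likelihood(Phi, t, w, beta):
    """ln p(t | w, beta) for Gaussian noise with precision beta."""
    N = len(t)
    return (0.5 * N * np.log(beta) - 0.5 * N * np.log(2.0 * np.pi)
            - beta * sum_of_squares_error(Phi, t, w))

x = np.linspace(0.0, 1.0, 10)
t = 1.0 + 2.0 * x - 3.0 * x ** 2      # hypothetical noise-free targets
Phi = design_matrix(x, 3)
w_ml = fit_ml(Phi, t)
print(w_ml)
print(log_likelihood(Phi, t, w_ml, beta=25.0))
```

For ill-conditioned design matrices, `np.linalg.lstsq(Phi, t, rcond=None)` is usually a safer choice than forming Φ^T Φ explicitly.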
Line Estimation
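Least square minimization:
Line equation: y = ax + b
Error in fit: \( \sum_i (y_i - a x_i - b)^2 \)
Solution:
\[ \begin{pmatrix} \overline{xy} \\ \bar{y} \end{pmatrix} = \begin{pmatrix} \overline{x^2} & \bar{x} \\ \bar{x} & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \]
where a bar denotes the mean over the data points.
Minimizes vertical errors. Non-robust!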
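Setting the derivatives of the vertical error with respect to a and b to zero gives a 2x2 linear system in the means of x, y, x² and xy. The sketch below solves that system; the function name and the sample points are illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a*x + b, minimising vertical errors."""
    A = np.array([[np.mean(x * x), np.mean(x)],
                  [np.mean(x),     1.0]])
    rhs = np.array([np.mean(x * y), np.mean(y)])
    a, b = np.linalg.solve(A, rhs)
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0            # hypothetical points lying exactly on a line
a, b = fit_line(x, y)
print(a, b)
```

Because the error is measured along y only, the system becomes singular when all x values coincide (a vertical line).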
LSQ on Lasers
Line model: \( r_i \cos(\phi_i - \theta) = \rho \)
Error model: \( d_i = r_i \cos(\phi_i - \theta) - \rho \)
Optimize: \( \arg\min_{(\rho, \theta)} \sum_i (r_i \cos(\phi_i - \theta) - \rho)^2 \)
Error model derived in Deriche et al. (1992)
Well suited for “clean-up” of Hough lines
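One simple way to minimise this error, shown here only as a sketch and not as the method of Deriche et al., is to scan θ over a grid: for a fixed θ the best ρ is the mean of r_i cos(φ_i − θ). The function name, the grid resolution, and the synthetic range readings are assumptions.

```python
import numpy as np

def fit_polar_line(r, phi, n_theta=1800):
    """Grid search over theta in [0, pi); rho is optimal in closed form."""
    best_cost, best_rho, best_theta = np.inf, 0.0, 0.0
    for theta in np.linspace(0.0, np.pi, n_theta, endpoint=False):
        proj = r * np.cos(phi - theta)
        rho = proj.mean()                  # minimises the cost for this theta
        cost = np.sum((proj - rho) ** 2)
        if cost < best_cost:
            best_cost, best_rho, best_theta = cost, rho, theta
    return best_rho, best_theta

# Synthetic range readings of the wall y = 1 (rho = 1, theta = pi/2)
px = np.linspace(-2.0, 2.0, 9)
py = np.ones_like(px)
r, phi = np.hypot(px, py), np.arctan2(py, px)
rho, theta = fit_polar_line(r, phi)
print(rho, theta)
```

Restricting θ to [0, π) and letting ρ take either sign avoids the ambiguity between (ρ, θ) and (−ρ, θ + π), which describe the same line.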
Total Least Squares
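Line equation: ax + by + c = 0
Error in fit: \( \sum_i (a x_i + b y_i + c)^2 \) where \( a^2 + b^2 = 1 \).
Solution:
\[ \begin{pmatrix} \overline{x^2} - \bar{x}\,\bar{x} & \overline{xy} - \bar{x}\,\bar{y} \\ \overline{xy} - \bar{x}\,\bar{y} & \overline{y^2} - \bar{y}\,\bar{y} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \mu \begin{pmatrix} a \\ b \end{pmatrix} \]
where µ is a scale factor.
\[ c = -a \bar{x} - b \bar{y} \]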
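The total least squares solution is an eigenvalue problem on the 2x2 matrix of centred second moments: µ is an eigenvalue, and the eigenvector belonging to the smallest eigenvalue gives the line normal (a, b) with the least perpendicular error. A minimal sketch, with a hypothetical function name and test points:

```python
import numpy as np

def fit_line_tls(x, y):
    """Fit a*x + b*y + c = 0 with a^2 + b^2 = 1 by total least squares."""
    pts = np.column_stack([x, y])
    centroid = pts.mean(axis=0)
    cov = np.cov(pts.T, bias=True)           # centred second moments (1/N)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    a, b = eigvecs[:, 0]                     # normal = smallest eigenvalue
    c = -a * centroid[0] - b * centroid[1]
    return a, b, c

# A vertical line x = 3, which y = a*x + b cannot represent
x = np.full(5, 3.0)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
a, b, c = fit_line_tls(x, y)
print(a, b, c)
```

The sign of (a, b, c) is arbitrary, since both signs describe the same line.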
Line Representations
The line representation is crucial
Often a redundant model is adopted
Line parameters vs end-points
Important for fusion of segments.
End-points are less stable
Sequential Adaptation
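In some cases one-at-a-time estimation is more suitable
Also known as sequential (stochastic) gradient descent
\[ \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n = \mathbf{w}^{(\tau)} + \eta \left( t_n - \mathbf{w}^{(\tau)T} \phi(x_n) \right) \phi(x_n) \]
with \( E_n = \frac{1}{2} \{t_n - \mathbf{w}^T \phi(x_n)\}^2 \).
Known as least-mean-squares (LMS). An issue is how to choose η.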
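A minimal LMS sketch: each data point moves w along the negative gradient of its own squared error, which adds η times the prediction error times φ(x_n). The learning rate, the number of passes, and the data are illustrative assumptions; too large an η makes the iteration diverge.

```python
import numpy as np

def lms_fit(Phi, t, eta=0.2, epochs=200):
    """Sequential least-mean-squares updates, one data point at a time."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi_n, t_n in zip(Phi, t):
            error = t_n - w @ phi_n        # prediction error for this point
            w = w + eta * error * phi_n    # step along -grad E_n
    return w

x = np.linspace(0.0, 1.0, 20)
Phi = np.column_stack([np.ones_like(x), x])   # basis functions: 1 and x
t = 0.5 + 2.0 * x                             # hypothetical noise-free targets
w = lms_fit(Phi, t)
print(w)
```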
Regularized Least Squares
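As seen in lecture 2, it is sometimes useful to control the size of the parameters.
Consider the error function:
\[ E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}) \]
which generates
\[ \frac{1}{2} \sum_{i=1}^{N} \{t_i - \mathbf{w}^T \phi(x_i)\}^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w} \]
which is minimized by
\[ \mathbf{w} = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t} \]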
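The regularized solution differs from ordinary least squares only by the λI term. A minimal sketch, assuming a cubic polynomial basis and made-up noisy sine data:

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Minimise E_D(w) + (lam/2) w^T w: w = (lam*I + Phi^T Phi)^-1 Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, size=x.size)  # hypothetical data
Phi = np.vander(x, 4, increasing=True)     # cubic polynomial basis

w_ls = fit_ridge(Phi, t, lam=0.0)          # plain least squares
w_reg = fit_ridge(Phi, t, lam=1.0)         # weights shrunk towards zero
print(np.linalg.norm(w_ls), np.linalg.norm(w_reg))
```

Larger λ shrinks the weight vector, trading a worse fit to the training data for smaller parameters.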
Bayesian Linear Regression
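Define a conjugate prior over w
\[ p(\mathbf{w}) = N(\mathbf{w}|\mathbf{m}_0, S_0) \]
Given the likelihood function and standard Bayesian analysis, we can derive the posterior
\[ p(\mathbf{w}|\mathbf{t}) = N(\mathbf{w}|\mathbf{m}_N, S_N) \]
where
\[ \mathbf{m}_N = S_N \left( S_0^{-1} \mathbf{m}_0 + \beta \Phi^T \mathbf{t} \right) \]
\[ S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi \]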
Bayesian Linear Regression (2)
A common choice is
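\[ p(\mathbf{w}) = N(\mathbf{w}|\mathbf{0}, \alpha^{-1} I) \]
So that
\[ \mathbf{m}_N = \beta S_N \Phi^T \mathbf{t} \]
\[ S_N^{-1} = \alpha I + \beta \Phi^T \Phi \]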
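A minimal sketch of the posterior for the zero-mean isotropic prior. The values of α and β, the straight-line basis, and the data are illustrative assumptions.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Return m_N and S_N for the prior N(w | 0, alpha^-1 I)."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=20)
t = -0.3 + 0.5 * x + rng.normal(0.0, 0.2, size=20)   # hypothetical noisy line
Phi = np.column_stack([np.ones_like(x), x])

m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
print(m_N)
print(S_N)
```

With no data the function returns the prior itself (zero mean, covariance I/α), and the posterior mean coincides with the regularized least squares solution for λ = α/β.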
Example - No Data
Example - 1 Data Point
Example - 2 Data Points
Example - 20 Data Points
Bayesian Model Comparison
How does one select an appropriate model?
Assume for a minute we want to compare a set of models \( M_i \), \( i \in \{1, \dots, L\} \), for a dataset D
We could compute
\[ p(M_i|D) \propto p(D|M_i)\, p(M_i) \]
Bayes Factor: Ratio of evidence for two models
\[ \frac{p(D|M_i)}{p(D|M_j)} \]
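Given log evidences for each model, the posterior model probabilities and the Bayes factor follow directly. The sketch below works in log space to avoid underflow; the log-evidence values and the equal priors are made up for illustration.

```python
import math

def model_posteriors(log_evidences, priors):
    """Normalise p(D|M_i) p(M_i) into posterior probabilities p(M_i|D)."""
    log_joint = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_joint)                       # guards against underflow
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

def bayes_factor(log_ev_i, log_ev_j):
    """Ratio of evidences p(D|M_i) / p(D|M_j)."""
    return math.exp(log_ev_i - log_ev_j)

log_ev = [-10.0, -12.0]                          # hypothetical ln p(D|M_i)
post = model_posteriors(log_ev, priors=[0.5, 0.5])
print(post, bayes_factor(log_ev[0], log_ev[1]))
```

With equal priors the ratio of the posterior probabilities equals the Bayes factor.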
The mixture distribution approach
We could use all the models:
\[ p(t|x, D) = \sum_{i=1}^{L} p(t|x, M_i, D)\, p(M_i|D) \]
Or simply go with the most probable (best) model.
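The averaged predictive distribution is a mixture of the per-model predictions, weighted by the posterior model probabilities. In this sketch the two per-model predictive densities are assumed to be Gaussians with made-up parameters, and the weights are hypothetical.

```python
import math

def gaussian_pdf(t, mean, var):
    """Density of N(t | mean, var)."""
    return math.exp(-0.5 * (t - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def averaged_predictive(t, model_predictives, model_probs):
    """p(t|x,D) = sum_i p(t|x,M_i,D) p(M_i|D) at a fixed input x."""
    return sum(p * pred(t) for pred, p in zip(model_predictives, model_probs))

# Hypothetical per-model predictive densities at some fixed x
predictives = [lambda t: gaussian_pdf(t, 0.0, 1.0),
               lambda t: gaussian_pdf(t, 1.0, 0.25)]
probs = [0.7, 0.3]                               # hypothetical p(M_i|D)
print(averaged_predictive(0.5, predictives, probs))
```

Selecting only the most probable model corresponds to replacing `probs` with a one-hot vector.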
Model Evidence
We can compute model evidence
\[ p(D|M_i) = \int p(D|\mathbf{w}, M_i)\, p(\mathbf{w}|M_i)\, d\mathbf{w} \]
Allows computation of model fit based on parameter range
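For a model with a single parameter the evidence integral can be approximated on a grid. The sketch below assumes a toy model that is not from the slides: each observation is N(t_n | w, 1/β) with prior N(w | 0, 1/α). The grid limits and data values are also assumptions.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log density of N(x | mean, var), elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def evidence_by_quadrature(t, alpha, beta, grid):
    """Approximate p(D) = int p(D|w) p(w) dw by a Riemann sum over grid."""
    log_lik = np.array([log_gauss(t, w, 1.0 / beta).sum() for w in grid])
    log_prior = log_gauss(grid, 0.0, 1.0 / alpha)
    integrand = np.exp(log_lik + log_prior)
    dw = grid[1] - grid[0]
    return float(np.sum(integrand) * dw)

grid = np.linspace(-10.0, 10.0, 20001)
t = np.array([0.4, 0.6, 0.5])                    # hypothetical observations
print(evidence_by_quadrature(t, alpha=1.0, beta=4.0, grid=grid))
```

Grid integration only scales to a handful of parameters; for the linear-Gaussian models of this lecture the integral is available in closed form.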
Evaluation of Parameters
Evaluation of posterior over parameters
\[ p(\mathbf{w}|D, M_i) = \frac{p(D|\mathbf{w}, M_i)\, p(\mathbf{w}|M_i)}{p(D|M_i)} \]
We need to understand how good a model is.
Model Comparison
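Consider evaluation of a model with parameters w
\[ p(D) = \int p(D|w)\, p(w)\, dw \approx p(D|w_{MAP}) \frac{\sigma_{posterior}}{\sigma_{prior}} \]
Then
\[ \ln p(D) \approx \ln p(D|w_{MAP}) + \ln\left( \frac{\sigma_{posterior}}{\sigma_{prior}} \right) \]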
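The second term acts as a complexity penalty: the posterior is narrower than the prior, so the logarithm is negative. A small arithmetic sketch with made-up numbers:

```python
import math

def approx_log_evidence(log_lik_map, sigma_posterior, sigma_prior):
    """ln p(D) ~ ln p(D|w_MAP) + ln(sigma_posterior / sigma_prior)."""
    return log_lik_map + math.log(sigma_posterior / sigma_prior)

# Hypothetical values: best-fit log likelihood -5, posterior 10x narrower than prior
print(approx_log_evidence(-5.0, sigma_posterior=0.1, sigma_prior=1.0))
```

A model that fits slightly better but needs a much more finely tuned parameter can therefore end up with lower evidence.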
Model Comparison as Kullback-Leibler
From earlier we have comparison of distributions
\[ KL = \int p(D|M_1) \ln \frac{p(D|M_1)}{p(D|M_2)}\, dD \]
Enables comparison of two different models
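For discrete distributions the integral becomes a sum. A minimal sketch, assuming two made-up distributions over the same two outcomes and that q is non-zero wherever p is:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k ln(p_k / q_k); terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

p = [0.5, 0.5]     # hypothetical p(D|M1) over two possible datasets
q = [0.9, 0.1]     # hypothetical p(D|M2)
print(kl_divergence(p, q), kl_divergence(q, p))
```

The divergence is zero only when the two distributions agree, and it is not symmetric in its arguments.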
Summary
Brief intro to linear methods for estimation of models
Prediction of values and models
Needed for adaptive selection of models (black-box/grey-box)
Evaluation of sensor models, ...
Consideration of batch and recursive estimation methods
Significant discussion of methods for evaluation of models and parameters.
Thus far purely a discussion of linear models
Deriche, R., Vaillant, R., & Faugeras, O. 1992. From Noisy Edges Points to 3D Reconstruction of a Scene: A Robust Approach and Its Uncertainty Analysis. Vol. 2. World Scientific. Series in Machine Perception and Artificial Intelligence. Pages 71–79.