Slide Set 3
Regression Models and the Classical Linear Regression Model (CLRM)

Pietro Coretto
[email protected]

Econometrics
Master in Economics and Finance (MEF)
Università degli Studi di Napoli “Federico II”

Version: Tuesday 21st January, 2020 (h11:31)
P. Coretto • MEF Regression Models and the Classical Linear Regression Model (CLRM) 1 / 26
Regression analysis
Regression analysis is a set of statistical techniques for modeling and analyzing the relationship between a dependent variable Y and one or more independent variables X. Typically X = (X1, X2, . . . , XK)′ is a vector of variables.
Depending on the context and the field of application, we have different names:

Y: dependent variable, response variable, outcome variable, output variable, target variable, etc.
X: independent variable, regressor, covariate, explanatory variable, predictor, feature, etc.
In regression analysis we assume a certain mechanism linking the X to the Y. We want to use observed data to understand the link.
Regression function and types of regression models
The link is formalized in terms of a regression function, which models the relationship between Y and X

Y ≈ f(X)

Building a regression model requires us to specify:
how f(·) transforms X
in which sense f(·) approximates Y
Depending on how f(·) transforms X we have:
nonparametric models
parametric models

Nonparametric models
The f(·) itself is treated as the unknown. Therefore the object of interest belongs to an infinite-dimensional space. Usually we restrict the quest to some well-defined class; for instance, we may restrict the analysis to

{ f : f is real valued, smooth, and ∫ |f(x)| dx < +∞ }

Nonparametric models allow for a lot of flexibility, but this comes at a price.
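As an illustration, here is a minimal numpy sketch of one classical nonparametric estimator, Nadaraya-Watson kernel regression (synthetic data and a hypothetical bandwidth h; this particular estimator is not part of these slides): f is estimated pointwise by a kernel-weighted local average of the observed y, with no parametric form assumed.

```python
import numpy as np

# Synthetic data: Y = sin(X) + noise, with f unknown to the estimator
rng = np.random.default_rng(6)
x = rng.uniform(-2.0, 2.0, 400)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

def nw_estimate(x0, h=0.3):
    """Gaussian-kernel weighted local average of y around x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # kernel weights
    return np.sum(w * y) / np.sum(w)

print(nw_estimate(1.0))   # close to sin(1)
```

The price of this flexibility shows up in the bandwidth h: it must be tuned to the data, and the estimate degrades quickly as the number of regressors grows.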
Parametric models
f is assumed to have a specific form controlled by a scalar parameter β or a vector of parameters β = (β1, β2, . . . , βp)′. Therefore the object of interest is β.

Examples:
linear parametric regression function: f(X;β) = β1X1 + β2X2
nonlinear parametric regression function: f(X;β) = sin(β1X1) + exp(β2X2)

Some nonlinear regression functions can be linearized. Example:

f(X;β) = exp(X1 + βX2) −→ log(f(X;β)) = X1 + βX2

Sometimes a regression function is not linear in the original X, but it is linear in a transformation of X:

f(X;β) = β1X1² + β2X2
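A minimal numpy sketch of this last point (synthetic data, hypothetical coefficients): f(X;β) = β1X1² + β2X2 is nonlinear in X1, but linear in the transformed regressor Z1 = X1², so a least-squares fit on (X1², X2) recovers β.

```python
import numpy as np

# Synthetic data from f(X; beta) = beta1 * X1^2 + beta2 * X2 + noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0.0, 2.0, n)
x2 = rng.uniform(0.0, 2.0, n)
beta = np.array([1.5, -0.7])                  # hypothetical coefficients
y = beta[0] * x1**2 + beta[1] * x2 + rng.normal(0.0, 0.1, n)

# Least squares on the transformed regressors (X1^2, X2)
Z = np.column_stack([x1**2, x2])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)                               # close to beta
```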
Depending on what kind of approximation f(·) provides for Y:
(conditional) mean regression: f(X) = E[Y |X]
(conditional) quantile regression:
f(X) = median[Y |X]
f(X) = quantile_α[Y |X]
etc.

Conditional mean regression functions are central in regression analysis for several reasons:
approximating Y by an average is intuitive
most theoretical models are expressed in terms of expectations
the “Optimal Predictor Theorem”
The quality of the approximation Y ≈ f(X) can be measured by the quadratic risk or MSE

E[(Y − f(X))²]

Theorem (Optimal Predictor)
Under suitable regularity conditions

argmin_f E[(Y − f(X))²] = E[Y |X]

In other words:
f(X) = E[Y |X] gives the best approximation to Y in terms of MSE
if we want to guess Y based on the information generated by X, f(X) = E[Y |X] is the best guess in terms of MSE
Proof. We need to show that for any function f(X)

E[(Y − E[Y |X])²] ≤ E[(Y − f(X))²]

Write

E[(Y − f(X))²] = E[(A + B)²],  where A = Y − E[Y |X] and B = E[Y |X] − f(X)

Expanding (A + B)² and using the linearity of expectations

E[A²] + E[B²] + 2 E[AB] = E[(Y − f(X))²]  (3.1)

Since E[B²] = E[(E[Y |X] − f(X))²] ≥ 0, (3.1) implies

E[A²] + 2 E[AB] ≤ E[(Y − f(X))²]  (3.2)
E[AB] = E[ E[AB |X] ]  (law of iterated expectations)

and, conditionally on X, the factor E[Y |X] − f(X) is known, so

E[AB |X] = E[ (Y − E[Y |X])(E[Y |X] − f(X)) | X ]
         = (E[Y |X] − f(X)) E[ (Y − E[Y |X]) | X ]  (pull out what’s known)
         = (E[Y |X] − f(X)) { E[Y |X] − E[ E[Y |X] |X ] }
         = (E[Y |X] − f(X)) × 0 = 0

Hence E[AB] = 0 and (3.2) becomes

E[A²] ≤ E[(Y − f(X))²]

with E[A²] = E[(Y − E[Y |X])²], which proves the result. □
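The theorem can be checked numerically with a small numpy sketch (synthetic data; the specific functional form is a hypothetical example): with Y = X² + noise, the conditional mean is E[Y |X] = X², and no other function of X should achieve a smaller quadratic risk.

```python
import numpy as np

# Synthetic data: Y = X^2 + noise, so E[Y | X] = X^2
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100_000)
y = x**2 + rng.normal(0.0, 0.5, x.size)

mse_cond_mean = np.mean((y - x**2) ** 2)      # f(X) = E[Y | X]
mse_linear    = np.mean((y - x) ** 2)         # some other f(X) = X
mse_constant  = np.mean((y - y.mean()) ** 2)  # f(X) = E[Y]

# The conditional mean attains the smallest MSE,
# roughly the noise variance 0.5^2 = 0.25
print(mse_cond_mean, mse_linear, mse_constant)
```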
In this course we focus on conditional mean regression models where the regression function has a linear parametric form:

Y ≈ E[Y |X] = f(X;β) = β1X1 + β2X2 + . . . + βKXK

This class of regression models is so popular because it can reproduce the correlation between Y and the Xs. Going back to the example of Slide Set #1:
[Figure: scatter plot of the sample data, Y (roughly 50–250) against X (roughly 5–20)]
[Figure: the same scatter plot, highlighting the conditional distribution of Y at x = 5, with conditional mean y|x = 135.1]
[Figure: the same scatter plot, highlighting the conditional distribution of Y at x = 10, with conditional mean y|x = 171.8]
[Figure: the same scatter plot, highlighting the conditional distribution of Y at x = 15, with conditional mean y|x = 208.6]
The model postulates that Y ≈ E[Y |X], but we cannot observe E[Y |X].
For each sample unit i = 1, 2, . . . , n we observe the sample (Yi, Xi1, Xi2, . . . , XiK). Therefore, we need an additional term which summarizes the difference between Yi and its conditional mean E[Yi |X].

A way to reproduce the previous sampling mechanism is to add an error term εi, a random variable that “summarizes” the deviations of Yi from its conditional mean E[Yi |X]. Therefore

Yi = f(Xi;β) + εi
   = E[Yi |X] + εi
   = β1Xi1 + β2Xi2 + . . . + βKXiK + εi

The short name for this class of models is linear regression models: a linear parametric regression function plus an additive error term.
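A minimal numpy sketch of this sampling mechanism (synthetic data, hypothetical coefficients): each observed y_i equals the conditional mean x_i′β plus a mean-zero error ε_i that moves y_i off the regression plane.

```python
import numpy as np

# Synthetic data generated from a linear regression model
rng = np.random.default_rng(2)
n, K = 1000, 2
X = rng.normal(size=(n, K))
beta = np.array([2.0, -1.0])          # hypothetical coefficients
eps = rng.normal(0.0, 1.0, n)         # error term, E[eps_i] = 0
y = X @ beta + eps

# By construction y_i - E[y_i | x_i] = eps_i, so the average
# deviation from the conditional mean is close to zero
deviations = y - X @ beta
print(deviations.mean())
```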
Partial or marginal effects
The partial/marginal effect measures the effect on the regression function of a change in a regressor Xj, holding all the other regressors constant (waee = “with all else equal”).

Let us focus on conditional mean models. Assuming differentiability, the partial/marginal effect of a change ΔXj is given by

ΔE[Y |X] = (∂E[Y |X] / ∂Xj) ΔXj,  holding fixed X1, . . . , Xj−1, Xj+1, . . . , XK

Computing marginal/partial effects makes sense only when the model has a causal interpretation (see later).
For the linear regression model

∂E[Y |X] / ∂Xj = βj

Therefore, the unknown parameter βj coincides with the partial effect of a unit change in Xj, waee.

For a discrete regressor Xj, partial effects are computed as the variations in E[Y |X] obtained by changing the level of Xj, waee.

Suppose Xk ∈ {a, b, c}. The partial effect when Xk changes from level a to b (waee) is given by

E[Y |Xk = b, X] − E[Y |Xk = a, X]

Another measure of the regressors’ effect on Y is the partial/marginal elasticity (see homeworks).
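A numpy sketch of the continuous case (synthetic data, hypothetical coefficients): in a linear model the fitted coefficient on Xj is exactly the shift in the fitted conditional mean produced by a unit change in Xj, waee.

```python
import numpy as np

# Synthetic data: y = 1.0 + 0.5*x1 + 2.0*x2 + noise (hypothetical betas)
rng = np.random.default_rng(3)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(0.0, 1.0, n)

# Least-squares fit with an intercept column
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Increase x1 by one unit, holding x2 fixed: the fitted conditional
# mean shifts by exactly b[1] for every unit
effect = (X + np.array([0.0, 1.0, 0.0])) @ b - X @ b
print(b[1])
```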
Notations
Indexes and constants:
n: number of sample units
K: number of covariates/regressors/features measured on each of the n sample units
i = 1, 2, . . . , n: indexes sample units
k = 1, 2, . . . , K: indexes regressors

y ∈ Rn: column vector of the dependent/response variable

y = (y1, y2, . . . , yn)′
xi ∈ RK: column vector containing the K regressors measured on the ith sample unit

xi = (xi1, xi2, . . . , xiK)′
The so-called design matrix X is the (n × K) matrix whose rows contain sample units and whose columns contain regressors:

X =
[ x11 x12 . . . x1K ]
[ x21 x22 . . . x2K ]
[ ...           ... ]
[ xn1 xn2 . . . xnK ]

that is, the ith row of X is xi′, so X = (x1, x2, . . . , xn)′.
ε ∈ Rn: column vector containing the error term for each unit

ε = (ε1, ε2, . . . , εn)′
Classical linear regression model (CLRM)
A1: linearity
For all i = 1, 2, . . . , n the observed data are generated by the following linear model

yi = β1xi1 + β2xi2 + . . . + βKxiK + εi  (3.3)
   = xi′β + εi  (3.4)

where β ∈ RK is a vector of coefficients. In matrix form

y = Xβ + ε  (3.5)

Remark 1: the linearity of the model is with respect to the parameters, not the regressors. If x2 = log(consumption), the model is still linear with respect to log(consumption).
Remark 2: often a constant/intercept term is introduced in the model

yi = β1 + β2xi2 + . . . + βKxiK + εi  for i = 1, 2, . . . , n

In this case we conventionally read (3.3) as if xi1 = 1 for all i in the sample.

If the model includes a constant/intercept term, then the first column of X is the unit column vector, that is X·1 = 1n = (1, 1, . . . , 1)′:

X =
[ 1 x12 . . . x1K ]
[ 1 x22 . . . x2K ]
[ ...         ... ]
[ 1 xn2 . . . xnK ]
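Building such a design matrix is a one-liner in numpy (hypothetical regressor values, a minimal sketch):

```python
import numpy as np

# Two regressors observed on n = 3 units (hypothetical values)
x2 = np.array([1.2, 0.7, 3.1])
x3 = np.array([0.0, 1.0, 0.5])
n = x2.size

# Design matrix with an intercept: leading column of ones, X[:, 0] = 1_n
X = np.column_stack([np.ones(n), x2, x3])
print(X)
```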
A2: strict exogeneity
E[εi |X] = 0 for all i = 1, 2, . . . , n, or equivalently E[ε |X] = 0

Implications:
1. E[εi] = 0 for all i = 1, 2, . . . , n.
2. All regressors are orthogonal to the error term for all units:
E[xjkεi] = 0 for all i, j = 1, 2, . . . , n and k = 1, 2, . . . , K
3. Orthogonality implies the zero-correlation conditions:
Cov[xjk, εi] = 0 for all i, j = 1, 2, . . . , n and k = 1, 2, . . . , K

If i = time (time-series data), strict exogeneity implies that the error term is orthogonal to past, current, and future regressors. For most time-series data this condition is not satisfied, so the finite-sample theory based on strict exogeneity is rarely applicable in time-series contexts.
Proof of the implications:
1. It follows from the law of iterated expectations:

E[εi] = E[ E[εi |X] ] = E[0] = 0

2. By the law of iterated expectations write

E[xjkεi] = E[ E[xjkεi | xjk] ]

and by the linearity of the conditional expectation (“pull out what’s known”) write

E[ E[xjkεi | xjk] ] = E[ xjk E[εi | xjk] ]

But A2 implies E[εi | xjk] = 0, which proves the desired result.
3. This follows from the previous results:

Cov[xjk, εi] = E[xjkεi] − E[xjk] E[εi] = E[xjkεi] = 0
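These implications can be checked on simulated data with a short numpy sketch (synthetic data; one regressor for simplicity): when ε is drawn with mean zero and independently of the regressor, the sample analogues of E[εi], E[xjkεi], and Cov[xjk, εi] are all close to zero.

```python
import numpy as np

# Synthetic data: a regressor and an error drawn independently of it
rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(2.0, 1.0, n)     # a regressor
eps = rng.normal(0.0, 1.0, n)   # error term, independent of x

print(eps.mean())               # sample analogue of E[eps]
print((x * eps).mean())         # sample analogue of E[x eps]
print(np.cov(x, eps)[0, 1])     # sample covariance
```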
A3: absence of multicollinearity
The (n × K) design matrix X has rank K with probability 1.

This assumption says that X has full column rank, which means that the columns of X are linearly independent.

A3 also requires that n ≥ K.

Essentially, A3 is a technical assumption that guarantees that most computations can be performed... more on this later.
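A numpy sketch of the rank condition (synthetic data): A3 holds for a generic design matrix, and fails as soon as one column is a linear combination of the others.

```python
import numpy as np

# A generic (n x K) design matrix has full column rank K
rng = np.random.default_rng(5)
n, K = 50, 3
X = rng.normal(size=(n, K))
print(np.linalg.matrix_rank(X))       # K = 3: A3 holds

# Make column 3 a linear combination of columns 1 and 2: A3 fails
X_bad = X.copy()
X_bad[:, 2] = 2 * X_bad[:, 0] - X_bad[:, 1]
print(np.linalg.matrix_rank(X_bad))   # 2 < K: multicollinearity
```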
A4: spherical error variance
For all i, j = 1, 2, . . . , n with i ≠ j:

1. E[εi² |X] = σ² > 0 (homoskedasticity)
2. E[εiεj |X] = 0 (units are uncorrelated)

A vector random variable is said to have a spherical distribution if its covariance matrix is a scalar multiple of the identity matrix. The sphericity here is shown as follows:

Var[εi |X] = E[εi² |X] − (E[εi |X])² = E[εi² |X] = σ²

and

Cov[εi, εj |X] = E[εiεj |X] − E[εi |X] E[εj |X] = E[εiεj |X] = 0

Now it is easy to show (do it as an exercise) that

E[εε′ |X] = Var[ε |X] = σ²In
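A minimal numpy sketch of the resulting covariance matrix (hypothetical variance value): under A4 the conditional covariance matrix of the error vector is σ² times the identity, i.e. a common variance on the diagonal and zero covariances between distinct units.

```python
import numpy as np

# Var[eps | X] = sigma^2 * I_n under homoskedasticity and
# zero cross-unit correlation
sigma2 = 2.5          # hypothetical common error variance
n = 4
V = sigma2 * np.eye(n)

# Diagonal: the common variance sigma^2; off-diagonal: zeros
print(V)
```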