Linear regression models
Simple Linear Regression
History
• Developed by Sir Francis Galton (1822-1911) in his article “Regression towards mediocrity in hereditary stature”
Purposes:
• To describe the linear relationship between two continuous variables, the response variable (y-axis) and a single predictor variable (x-axis)
• To determine how much of the variation in Y can be explained by the linear relationship with X and how much of this relationship remains unexplained
• To predict new values of Y from new values of X
The linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• Xi and Yi are paired observations (i = 1 to n)
• β0 = population intercept (the value of Yi when Xi = 0)
• β1 = population slope (measures the change in Yi per unit change in Xi)
• εi = the random or unexplained error associated with the ith observation. The εi are assumed to be independent and distributed as N(0, σ2).
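As a minimal sketch, data from this model can be simulated directly; all parameter values below (β0 = 2, β1 = 0.5, σ = 1) are illustrative, not taken from the slides:

```python
import numpy as np

# Illustrative sketch of the model Y_i = beta0 + beta1*X_i + eps_i,
# with eps_i drawn independently from N(0, sigma^2).
# All parameter values here are made up for demonstration.
rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 100
X = np.linspace(0.0, 10.0, n)
eps = rng.normal(0.0, sigma, size=n)  # independent normal errors
Y = beta0 + beta1 * X + eps
```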
[Figure: the linear relationship between Y and X — a straight line crossing the Y-axis at β0 and rising by β1 for each 1.0-unit increase in X.]
Linear models approximate non-linear functions over a limited domain

[Figure: a curved function with a fitted straight line — interpolation within the sampled range of X; extrapolation in the regions beyond it, where the linear approximation breaks down.]
• For a given value of X, the sampled Y values are independent with normally distributed errors:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad E(\varepsilon_i) = 0$$

$$E(Y_i) = \beta_0 + \beta_1 X_i$$

[Figure: at each of X1 and X2, the Y values are normally distributed around the regression line, centered at E(Y1) and E(Y2).]
Fitting data to a linear model:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

[Figure: an observed point Yi and its fitted value Ŷi at Xi; the vertical distance Yi − Ŷi = εi is the residual.]
The residual:

$$d_i = Y_i - \hat{Y}_i$$

The residual sum of squares:

$$RSS = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Estimating Regression Parameters

• The “best fit” estimates for the regression population parameters (β0 and β1) are the values that minimize the residual sum of squares (SSresidual) between each observed value and the predicted value of the model:

$$\text{Choose } \hat{\beta}_0, \hat{\beta}_1 \text{ to minimize } \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n}\left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\right)^2$$
Sum of squares:

$$SS_Y = \sum_{i=1}^{n}(Y_i - \bar{Y})(Y_i - \bar{Y}) = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$

Sum of cross products:

$$SS_{XY} = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
Least-squares parameter estimates

$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_X} = \frac{s_{XY}}{s_X^2}$$

where

$$SS_X = \sum_{i=1}^{n}(X_i - \bar{X})^2$$
Sample variance of X:

$$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})$$

Sample covariance:

$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$

Thus:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})} = \frac{SS_{XY}}{SS_X} = \frac{s_{XY}}{s_X^2}$$
Solving for the intercept:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

Thus, our estimated regression equation is:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
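As a worked sketch of these formulas, using a small hypothetical dataset (the five points below are invented for illustration):

```python
import numpy as np

# Hypothetical data to illustrate the least-squares formulas:
# beta1_hat = SS_XY / SS_X and beta0_hat = Ybar - beta1_hat * Xbar.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

SS_X = np.sum((X - X.mean()) ** 2)               # sum of squares of X
SS_XY = np.sum((X - X.mean()) * (Y - Y.mean()))  # sum of cross products
beta1_hat = SS_XY / SS_X                         # slope estimate
beta0_hat = Y.mean() - beta1_hat * X.mean()      # intercept estimate
Y_hat = beta0_hat + beta1_hat * X                # fitted values
```

For these points the estimates work out to β̂1 = 0.6 and β̂0 = 2.2.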
Hypothesis Tests with Regression
• Null hypothesis is that there is no linear relationship between X and Y:
H0: β1 = 0 Yi = β0 + εi
HA: β1 ≠ 0 Yi = β0 + β1 Xi + εi
• We can use an F-ratio (i.e., the ratio of variances) to test these hypotheses
Variance of the error of regression:

$$\hat{\sigma}^2 = \frac{SS_{residual}}{n-2} = \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}$$

NOTE: this is also referred to as residual variance, mean squared error (MSE) or residual mean square (MSresidual)
Mean square of regression:

$$MS_{regression} = \frac{SS_{regression}}{1} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{1}$$

The F-ratio is: (MSRegression)/(MSResidual)

This ratio follows the F-distribution with (1, n-2) degrees of freedom
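A minimal sketch of the F-ratio computation, reusing the same hypothetical five-point dataset:

```python
import numpy as np

# Sketch of the F-ratio for the regression; data are hypothetical.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(X)

beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
Y_hat = beta0_hat + beta1_hat * X

MS_reg = np.sum((Y_hat - Y.mean()) ** 2) / 1  # SS_reg / 1
MS_res = np.sum((Y - Y_hat) ** 2) / (n - 2)   # RSS / (n - 2)
F = MS_reg / MS_res                           # compare against F(1, n-2)
```

For these points F = 4.5, to be compared against the F(1, 3) distribution.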
Variance components and Coefficient of determination

$$SS_Y = SS_{reg} + RSS$$

$$SS_{reg} = SS_Y - RSS$$
Coefficient of determination

$$r^2 = \frac{SS_{reg}}{SS_Y} = \frac{SS_{reg}}{SS_{reg} + RSS}$$
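The variance-component identity and r² can be sketched numerically with the same hypothetical data:

```python
import numpy as np

# Sketch: r^2 from the variance components; data are hypothetical.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
Y_hat = Y.mean() + beta1_hat * (X - X.mean())  # fitted values

SS_Y = np.sum((Y - Y.mean()) ** 2)
SS_reg = np.sum((Y_hat - Y.mean()) ** 2)
RSS = np.sum((Y - Y_hat) ** 2)

r2 = SS_reg / SS_Y  # equivalently SS_reg / (SS_reg + RSS)
```

Here SS_reg + RSS reproduces SS_Y, and r² = 0.6: the linear relationship with X explains 60% of the variation in Y.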
ANOVA table for regression

| Source | Degrees of freedom | Sum of squares | Mean square | Expected mean square | F ratio |
|---|---|---|---|---|---|
| Regression | 1 | $SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$ | $SS_{reg}/1$ | $\sigma^2 + \beta_1^2 \sum_{i=1}^{N}(X_i - \bar{X})^2$ | $\dfrac{SS_{reg}/1}{RSS/(n-2)}$ |
| Residual | n − 2 | $RSS = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$ | $RSS/(n-2)$ | $\sigma^2$ | |
| Total | n − 1 | $SS_Y = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ | $SS_Y/(n-1)$ | $\sigma_Y^2$ | |
Product-moment correlation coefficient

$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}} = \frac{s_{XY}}{s_X s_Y}$$
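A short sketch computing r from the sums of squares (same hypothetical data); note that r² recovers the coefficient of determination:

```python
import math

# Sketch: product-moment correlation from the sums of squares.
# The data are hypothetical, as in the earlier examples.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
xbar = sum(X) / len(X)
ybar = sum(Y) / len(Y)

SS_X = sum((x - xbar) ** 2 for x in X)
SS_Y = sum((y - ybar) ** 2 for y in Y)
SS_XY = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

r = SS_XY / math.sqrt(SS_X * SS_Y)  # r**2 equals SS_reg / SS_Y
```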
Parametric Confidence Intervals

• If we assume our parameter of interest has a particular sampling distribution and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.
• Example: if we assume Y is a normal random variable with unknown mean μ and variance σ2, then $(\bar{Y} - \mu)/\sigma_{\bar{Y}}$ is distributed as a standard normal variable. But, since we don’t know σ, we must divide by the standard error instead: $(\bar{Y} - \mu)/s_{\bar{Y}}$, giving us a t-distribution with (n−1) degrees of freedom.
• The 100(1−α)% confidence interval for μ is then given by:

$$\bar{Y} - t_{(1-\alpha/2;\, n-1)}\, s_{\bar{Y}} \;\le\; \mu \;\le\; \bar{Y} + t_{(1-\alpha/2;\, n-1)}\, s_{\bar{Y}}$$

• IMPORTANT: this does not mean “There is a 100(1−α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1−α)% of the confidence intervals would contain the true population mean μ.
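The interval formula can be sketched on hypothetical data; the critical value t₀.₉₇₅,₄ = 2.776 is the standard t-table entry for α = 0.05 and df = n − 1 = 4:

```python
import math

# Sketch of the t-based confidence interval for a mean.
# Data are hypothetical; t_{0.975, 4} = 2.776 is from a t-table.
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(Y)
ybar = sum(Y) / n
s = math.sqrt(sum((y - ybar) ** 2 for y in Y) / (n - 1))  # sample sd
se = s / math.sqrt(n)                                     # standard error of the mean
t_crit = 2.776                                            # t_{1-alpha/2; n-1}
ci = (ybar - t_crit * se, ybar + t_crit * se)
```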
Publication form of ANOVA table for regression

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 11.479 | 1 | 11.479 | 21.044 | 0.00035 |
| Residual | 8.182 | 15 | 0.545 | | |
| Total | 19.661 | 16 | | | |
Variance of estimated intercept

$$\hat{\sigma}^2_{\hat{\beta}_0} = \hat{\sigma}^2\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_X}\right]$$

$$\hat{\beta}_0 - t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{\beta}_0} \;\le\; \beta_0 \;\le\; \hat{\beta}_0 + t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{\beta}_0}$$
Variance of the slope estimator

$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2}{SS_X}$$

$$\hat{\beta}_1 - t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{\beta}_1} \;\le\; \beta_1 \;\le\; \hat{\beta}_1 + t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{\beta}_1}$$
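As a sketch, the slope's standard error and confidence interval on the same hypothetical data, with t₀.₉₇₅,₃ = 3.182 taken from a t-table (df = n − 2 = 3):

```python
import math

# Sketch: standard error and CI for the slope; data are hypothetical.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

SS_X = sum((x - xbar) ** 2 for x in X)
beta1_hat = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / SS_X
beta0_hat = ybar - beta1_hat * xbar

RSS = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(X, Y))
sigma2_hat = RSS / (n - 2)               # residual variance
se_beta1 = math.sqrt(sigma2_hat / SS_X)  # standard error of the slope
t_crit = 3.182                           # t_{0.975; n-2} from a t-table
ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
```

For these points the interval contains zero, consistent with failing to reject H0: β1 = 0 at α = 0.05.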
Variance of the fitted value

$$\hat{\sigma}^2_{\hat{Y}|X} = \hat{\sigma}^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X}\right]$$

$$\hat{Y} - t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{Y}|X} \;\le\; E(Y|X) \;\le\; \hat{Y} + t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\hat{Y}|X}$$
Variance of the predicted value (Ỹ):

$$\hat{\sigma}^2_{\tilde{Y}|\tilde{X}} = \hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(\tilde{X} - \bar{X})^2}{SS_X}\right]$$

$$\tilde{Y} - t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\tilde{Y}|\tilde{X}} \;\le\; Y \;\le\; \tilde{Y} + t_{\alpha/2,\,n-2}\,\hat{\sigma}_{\tilde{Y}|\tilde{X}}$$
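A sketch of the prediction interval for a new observation, again on the hypothetical data with t₀.₉₇₅,₃ = 3.182 from a t-table; note the extra "1 +" term relative to the fitted-value variance:

```python
import math

# Sketch: prediction interval for a new observation at X = x_new.
# Data are hypothetical; t_{0.975, 3} = 3.182 is from a t-table.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

SS_X = sum((x - xbar) ** 2 for x in X)
beta1_hat = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / SS_X
beta0_hat = ybar - beta1_hat * xbar
RSS = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(X, Y))
sigma2_hat = RSS / (n - 2)

x_new = 3.0
y_pred = beta0_hat + beta1_hat * x_new
# The "1 +" term accounts for the variance of the new observation itself:
var_pred = sigma2_hat * (1.0 + 1.0 / n + (x_new - xbar) ** 2 / SS_X)
t_crit = 3.182  # t_{0.975; n-2}
pi = (y_pred - t_crit * math.sqrt(var_pred),
      y_pred + t_crit * math.sqrt(var_pred))
```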
Regression

[Figure: scatterplot of Ln(number of species) against Ln(Island Area) with a fitted regression line; Ln(Island Area) ranges from −2 to 10 and Ln(number of species) from 1 to 8.]
Assumptions of regression
• The linear model correctly describes the functional relationship between X and Y
• The X variable is measured without error
• For a given value of X, the sampled Y values are independent with normally distributed errors
• Variances are constant along the regression line
Residual plot for species-area relationship

[Figure: unstandardized residuals (−1.5 to 1.5) plotted against unstandardized predicted values (2.5 to 6.0), showing no obvious pattern.]