Overview for Today
Review Simple Linear Regression, Ch 12
Go over problem 12.56
Multiple Linear Regression, Ch 13 (1-5)
  Multiple explanatory variables
  Coefficient of multiple determination
  Adjusted R²
  Residual analysis
  F test
  t test and confidence interval for slope
  Partial F tests for individual contributions
  Coefficients of partial determination
Homework assignment
Regression Modeling
Analysis of variance to "fit" a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables
  Minimize residual error with respect to the linear coefficients
  Interpolate over the relevant range - do not extrapolate
  Typically linear, but may be curvilinear or more complex (with respect to the independent variables)
Related to Correlation Analysis - measuring the strength of association between variables
  Regression is about variance in the response variable
  Correlation is about co-variance - symmetric
Types of Regression Models
Based on scatter plots of Y vs X (dependent vs independent)
Linear models
  Positive, negative, or no slope
  Zero or non-zero intercept
Curvilinear models
  Positive, negative, or no "slope"
  Positive, negative, or varied curvature
  May be U-shaped, with extrema
  May be asymptotically or piece-wise linear
  May be polynomial, exponential, inverse, ...
Least-Squares Linear Regression
Simple linear model (for the population):
    Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
  X_i = value of the independent variable
  Y_i = observed value of the dependent variable
  \beta_0 = Y-intercept (Y at X = 0)
  \beta_1 = slope (\Delta Y / \Delta X)
  \varepsilon_i = random error for observation i
Fitted (sample) model:
    \hat{Y}_i = b_0 + b_1 X_i   (predicted value)
  b_0 and b_1 are called the regression coefficients
  e_i = Y_i - \hat{Y}_i   (residual)
Minimize \sum_{i=1}^{n} e_i^2 for the sample with respect to b_0 and b_1
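A minimal sketch of this least-squares fit in Python with NumPy; the data values and the variable names (x, y, b0, b1) are made up for illustration and are not from the text.

```python
import numpy as np

# Hypothetical sample data (illustration only, not from the text)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

x_bar, y_bar = x.mean(), y.mean()

# Least-squares estimates minimize sum(e_i^2) over b0 and b1
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x            # predicted values, Y-hat_i
e = y - y_hat                  # residuals, e_i = Y_i - Y-hat_i
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {np.sum(e**2):.4f}")
```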
Partitioning of Variation
Total variation:
    SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,   where   \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i   (mean response)
Regression variation:
    SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
Random variation:
    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
SST = SSR + SSE
Coefficient of Determination
    r^2 = SSR / SST
Standard Error of the Estimate
    S_{YX} = \sqrt{\frac{SSE}{n - 2}}
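A minimal sketch, reusing the hypothetical data from the earlier block, that computes the three sums of squares, verifies SST = SSR + SSE, and reports r² and the standard error of the estimate.

```python
import numpy as np

# Same hypothetical data and fit as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variation
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) variation
SSE = np.sum((y - y_hat) ** 2)          # random (residual) variation
assert np.isclose(SST, SSR + SSE)       # SST = SSR + SSE

r2 = SSR / SST                          # coefficient of determination
s_yx = np.sqrt(SSE / (n - 2))           # standard error of the estimate
print(f"r^2 = {r2:.4f}, S_YX = {s_yx:.4f}")
```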
Assumptions of Regression (and Correlation)
Normality of error about the regression line
Homoscedasticity (equal variance) along X
Independence of errors with respect to X
No autocorrelation in time
Analysis of residuals to test the assumptions
  Histogram, box-and-whisker plots
  Normal probability plot
  Ordered plots (by X, by time, ...)
See figures on pp 584-5
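A rough sketch of these residual checks in Python on the same hypothetical data; the Shapiro-Wilk test and the output file name are my additions for illustration, not something the text calls for.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                    # residuals

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
stats.probplot(e, plot=axes[0])          # normal probability plot of residuals
axes[1].scatter(x, e)                    # residuals vs X: look for patterns,
axes[1].axhline(0, linestyle="--")       # outliers, non-constant spread
plt.savefig("residual_checks.png")       # hypothetical output file

w_stat, p_val = stats.shapiro(e)         # small p suggests non-normal errors
print(f"Shapiro-Wilk p-value = {p_val:.3f}")
```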
t Test for Slope
    t = \frac{b_1 - \beta_1}{S_{b_1}}        H_0: \beta_1 = 0
    S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}},   S_{YX} = \sqrt{\frac{SSE}{n - 2}}
    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,   SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2
Critical t value based on the chosen level of significance, \alpha, and n - 2 degrees of freedom
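A minimal sketch of this slope t test using SciPy's t distribution; the data are the same hypothetical values as above, and alpha = 0.05 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

SSE = np.sum(e ** 2)
SSX = np.sum((x - x.mean()) ** 2)
s_yx = np.sqrt(SSE / (n - 2))            # standard error of the estimate
s_b1 = s_yx / np.sqrt(SSX)               # standard error of the slope

t_stat = b1 / s_b1                       # H0: beta_1 = 0
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)   # two-tailed critical value
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, critical = {t_crit:.3f}, p = {p_val:.4f}")
```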
F Test for Simple Regression
F = MSR / MSE
Reject H_0 if F > F_U(\alpha, 1, n-2)   [or p < \alpha]
Note: t^2(\alpha, n-2) = F_U(\alpha, 1, n-2)
One-Way ANOVA Summary

Source       df      Sum of Squares (SS)   Mean Square (MS, variance)   F         p-value
Regression   1       SSR                   MSR = SSR / 1                MSR/MSE
Error        n-2     SSE                   MSE = SSE / (n-2)
Total        n-1     SST
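A sketch that fills in this ANOVA table for the same hypothetical data and runs the overall F test; the significance level is again an arbitrary 0.05.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
MSR = SSR / 1                            # regression df = 1
MSE = SSE / (n - 2)                      # error df = n - 2

F = MSR / MSE
alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)
p_val = stats.f.sf(F, dfn=1, dfd=n - 2)
print(f"F = {F:.3f}, F_crit = {F_crit:.3f}, p = {p_val:.4f}")
# For simple regression, F equals the square of the slope's t statistic
```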
Confidence and Prediction Intervals
Confidence interval estimate for the slope:
    b_1 \pm t_{n-2} S_{b_1}, i.e.  b_1 - t_{n-2} S_{b_1} \le \beta_1 \le b_1 + t_{n-2} S_{b_1}
Confidence interval estimate for the mean response:
    \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{h_i}, i.e.  \hat{Y}_i - t_{n-2} S_{YX} \sqrt{h_i} \le \mu_{Y|X_i} \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{h_i}
Prediction interval estimate for an individual response:
    \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + h_i}, i.e.  \hat{Y}_i - t_{n-2} S_{YX} \sqrt{1 + h_i} \le Y_i \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{1 + h_i}
where
    h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
See Fig 12.16, p 592
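A minimal sketch computing these three intervals for the hypothetical data; the evaluation point X = 3.5 and the 95% level are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)    # 95% two-sided intervals

# Confidence interval for the slope: b1 +/- t * S_b1
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))
print("slope CI:", b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# Intervals at a particular X value (hypothetical X = 3.5)
x0 = 3.5
h = 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
y0_hat = b0 + b1 * x0
ci = t_crit * s_yx * np.sqrt(h)          # CI half-width for the mean response
pi = t_crit * s_yx * np.sqrt(1 + h)      # PI half-width for an individual Y
print("mean response CI:", y0_hat - ci, y0_hat + ci)
print("individual response PI:", y0_hat - pi, y0_hat + pi)
```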
Pitfalls
Not testing the assumptions of least-squares regression by analyzing the residuals, looking for
  Patterns
  Outliers
  Non-uniform distribution about the mean
  See Figs 12.18-19, pp 597-8
Not being aware of alternatives to least-squares regression when the assumptions are violated
Not knowing the subject matter being modeled
Computing by Hand
Slope:
    b_1 = \frac{SSXY}{SSX}
    SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right)
    SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2
Y-intercept:
    b_0 = \bar{Y} - b_1 \bar{X}
    \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,   \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i
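A minimal check, on the hypothetical data, that these computational "shortcut" forms give the same slope and intercept as the deviation-based definitions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

# Computational ("shortcut") forms that avoid computing deviations first
SSXY = np.sum(x * y) - np.sum(x) * np.sum(y) / n
SSX = np.sum(x ** 2) - np.sum(x) ** 2 / n

b1 = SSXY / SSX
b0 = y.mean() - b1 * x.mean()
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")

# They agree with the definitional (deviation) forms
assert np.isclose(SSXY, np.sum((x - x.mean()) * (y - y.mean())))
assert np.isclose(SSX, np.sum((x - x.mean()) ** 2))
```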
Computing by Hand
Measures of variation:
    SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} Y_i \right)^2
    SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \left( \sum_{i=1}^{n} Y_i \right)^2
    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i
Coefficient of Correlation
For a regression:
    r = \sqrt{r^2} = \sqrt{\frac{SSR}{SST}}   (taking the sign of b_1)
For a correlation:
    r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}
    SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2,   SSY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
    SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})   (covariance)
Also called Pearson's product-moment correlation coefficient
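A minimal sketch computing r from these sums of squares on the hypothetical data, compared against SciPy's pearsonr as a sanity check.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

SSX = np.sum((x - x.mean()) ** 2)
SSY = np.sum((y - y.mean()) ** 2)
SSXY = np.sum((x - x.mean()) * (y - y.mean()))   # (n-1) times the sample covariance

r = SSXY / np.sqrt(SSX * SSY)            # Pearson product-moment correlation
r_scipy, _ = stats.pearsonr(x, y)        # library value for comparison
print(f"r = {r:.5f}  (scipy: {r_scipy:.5f})")
```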
t Test for Correlation
    t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}        H_0: \rho = 0
Critical t value based on the chosen level of significance, \alpha, and n - 2 degrees of freedom
Or, equivalently:
    F = \frac{(n - 2)\, r^2}{1 - r^2}   compared to F_U(\alpha, 1, n-2) = t^2(\alpha, n-2)
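A minimal sketch of the correlation t test (and the equivalent F form) on the hypothetical data; the 0.05 level is an arbitrary choice.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)
r, _ = stats.pearsonr(x, y)

# H0: rho = 0
t_stat = r / np.sqrt((1 - r ** 2) / (n - 2))
F_stat = (n - 2) * r ** 2 / (1 - r ** 2)         # equivalent F form, F = t^2
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"t = {t_stat:.3f} (critical {t_crit:.3f}), F = {F_stat:.3f}")
```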
Multiple Regression
Linear model with multiple independent variables (for the population):
    Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i
  X_{ji} = value of independent variable j for observation i
  Y_i = observed value of the dependent variable
  \beta_0 = Y-intercept (Y when every X_j = 0)
  \beta_j = slope with respect to X_j (\Delta Y / \Delta X_j)
  \varepsilon_i = random error for observation i
Fitted (sample) model:
    \hat{Y}_i = b_0 + b_1 X_{1i} + \dots + b_k X_{ki}   (predicted value)
  The b_j are called the regression coefficients
  e_i = Y_i - \hat{Y}_i   (residual)
Minimize \sum_{i=1}^{n} e_i^2 for the sample with respect to all b_j
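A minimal sketch of a multiple regression fit using statsmodels' ordinary least squares; the two-variable data set is simulated here purely for illustration (the course uses PHStat, so this is just an alternative way to see the same quantities).

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: two independent variables X1, X2 (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=30)

X_design = sm.add_constant(X)            # adds the intercept column (b0)
model = sm.OLS(y, X_design).fit()        # minimizes sum(e_i^2) over all b_j

print(model.params)                      # b0, b1, b2
print(model.rsquared, model.rsquared_adj)
print(model.summary())                   # coefficients, t tests, overall F test
```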
Partitioning of Variation
Total variation:
    SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,   where   \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i   (mean response)
Regression variation:
    SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
Random variation:
    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
SST = SSR + SSE
Coefficient of Multiple Determination
    R^2_{Y.12 \ldots k} = SSR / SST
Standard Error of the Estimate
    S_{Y.12 \ldots k} = \sqrt{\frac{SSE}{n - k - 1}}
Adjusted R²
To account for sample size (n) and the number of independent variables (k), for comparison purposes:
    R^2_{adj} = 1 - \left( 1 - R^2_{Y.12 \ldots k} \right) \frac{n - 1}{n - k - 1}
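A minimal sketch computing adjusted R² by this formula on the simulated two-variable data and checking it against the value statsmodels reports.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=30)
fit = sm.OLS(y, sm.add_constant(X)).fit()

n, k = X.shape                           # k = number of independent variables
r2 = fit.rsquared
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

assert np.isclose(r2_adj, fit.rsquared_adj)   # matches the library's value
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```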
Residual Analysis
Plot residuals vs
  \hat{Y}_i (predicted values)
  X_1, X_2, ..., X_k
  Time (to check for autocorrelation)
Check for
  Patterns
  Outliers
  Non-uniform distribution about the mean
See Figs 12.18-19, pp 597-8
F Test for Multiple Regression
F = MSR / MSE
Reject H_0 if F > F_U(\alpha, k, n-k-1)   [or p < \alpha]
k = number of independent variables

One-Way ANOVA Summary

Source       df        Sum of Squares (SS)   Mean Square (MS, variance)   F         p-value
Regression   k         SSR                   MSR = SSR / k                MSR/MSE
Error        n-k-1     SSE                   MSE = SSE / (n-k-1)
Total        n-1       SST
t Test for Slope
    t = \frac{b_j - \beta_j}{S_{b_j}}        H_0: \beta_j = 0
    S_{b_j} = \frac{S_{Y.12 \ldots k}}{\sqrt{SSX_j}},   S_{Y.12 \ldots k} = \sqrt{\frac{SSE}{n - k - 1}}
    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,   SSX_j = \sum_{i=1}^{n} (X_{ji} - \bar{X}_j)^2
Critical t value based on the chosen level of significance, \alpha, and n - k - 1 degrees of freedom
See output from PHStat
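For reference, the same slope t statistics, p-values, and confidence intervals can be pulled directly from the statsmodels fit of the simulated data; this is just a sketch of where those numbers live, not the course's PHStat output.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=30)
fit = sm.OLS(y, sm.add_constant(X)).fit()

# t statistic, p-value, and confidence interval for each b_j (H0: beta_j = 0)
print(fit.tvalues)                       # for b0, b1, b2
print(fit.pvalues)
print(fit.conf_int(alpha=0.05))          # rows are [lower, upper] for each b_j
```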
Confidence and Prediction Intervals
Confidence interval estimate for the slope:
    b_j \pm t_{n-k-1} S_{b_j}, i.e.  b_j - t_{n-k-1} S_{b_j} \le \beta_j \le b_j + t_{n-k-1} S_{b_j}
Confidence interval estimate for the mean and prediction interval estimate for an individual response: beyond the scope of this text
Partial F Tests
Significance test for the contribution of an individual independent variable
  A measure of incremental improvement, with all other variables already taken into account
    F_j = SSR(X_j \mid \{X_{i \ne j}\}) / MSE
    SSR(X_j \mid \{X_{i \ne j}\}) = SSR - SSR(\{X_{i \ne j}\})
Reject H_0 if F_j > F_U(\alpha, 1, n-k-1)   [or p < \alpha]
Note: t^2(\alpha, n-k-1) = F_U(\alpha, 1, n-k-1)
Coefficients of Partial Determination
    R^2_{Yj.12 \ldots k(j)} = \frac{SSR(X_j \mid \{X_{i \ne j}\})}{SST - SSR + SSR(X_j \mid \{X_{i \ne j}\})} = \frac{SSR - SSR(\{X_{i \ne j}\})}{SST - SSR(\{X_{i \ne j}\})}
See PHStat output in Fig 13.10, p 637
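A minimal sketch of a partial F test and the corresponding coefficient of partial determination for one variable (X1) of the simulated data, obtained by fitting the full and the reduced model; note that statsmodels' attribute names differ from the slides' (its .ssr is the residual sum of squares, i.e. SSE, while .ess is the regression sum of squares).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=30)
n, k = X.shape

full = sm.OLS(y, sm.add_constant(X)).fit()              # all k variables
reduced = sm.OLS(y, sm.add_constant(X[:, [1]])).fit()   # drop X1

SST = full.centered_tss
SSR_full = full.ess                      # regression sum of squares, full model
SSR_reduced = reduced.ess                # SSR({X_i, i != j})
MSE_full = full.ssr / (n - k - 1)        # full.ssr is SSE in the slides' notation

SSR_X1_given_rest = SSR_full - SSR_reduced   # incremental contribution of X1
F_partial = SSR_X1_given_rest / MSE_full
p_val = stats.f.sf(F_partial, dfn=1, dfd=n - k - 1)

r2_partial = SSR_X1_given_rest / (SST - SSR_full + SSR_X1_given_rest)
print(f"partial F for X1 = {F_partial:.2f} (p = {p_val:.4g}), "
      f"partial r^2 = {r2_partial:.4f}")
```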
Homework
Review "Multiple Regression", 13.1-5
Work through Appendix 13.1
Work and hand in Problem 13.62
Read "Multiple Regression", 13.6-11
  Quadratic model
  Dummy-variable model
  Using transformations
  Collinearity (VIF)
  Model building
  Cp statistic and stepwise regression
Preview problems 13.63-13.67