Topic 3: Multiple Regression Analysis (transcript: professional.education.uiowa.edu/skliethe/teaching...)
3a-1
Design and Analysis of Biomedical Studies (171:162) Summer Semester 2012
Topic 3: Multiple Linear Regression I (Part i) Models and Parameter Estimation
Stephanie Kliethermes
June 13, 2012
3a-2
Multiple Linear Regression

Multiple regression analysis is an extension of simple linear (straight-line) regression. With multiple linear regression we use more than one explanatory variable (or higher-order terms, e.g. X²) to explain or predict a single response variable. Dealing with several independent variables simultaneously in a regression analysis is considerably more difficult than dealing with a single independent variable:

1. It is more difficult to choose the best model.
2. It is more difficult to visualize what the fitted model looks like.
3. It can be more difficult to interpret what the model means in clinical terms.
4. Computations are virtually impossible by hand.
3a-3
Example 3.1 (Problem 8.2, pages 127-8)

A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment can be predicted with reasonable accuracy from knowledge of pre-treatment symptom ratings of thinking disturbance (X1) and hostile suspiciousness (X2).
3a-4
First try to fit two simple linear regression models:

Model: MODEL1
Dependent Variable: pathology

                        Analysis of Variance
                              Sum of        Mean
Source            DF         Squares      Square   F Value   Pr > F
Model              1      1535.85697  1535.85697      6.39   0.0146
Error             51           12255   240.30025
Corrected Total   52           13791

Root MSE          15.50162    R-Square    0.1114
Dependent Mean    22.69811    Adj R-Sq    0.0939
Coeff Var         68.29476

                        Parameter Estimates
                             Parameter    Standard
Variable   Label      DF      Estimate       Error   t Value   Pr > |t|
Intercept  Intercept   1     -25.04084    19.00283     -1.32     0.1935
thinking   thinking    1      15.95111     6.30947      2.53     0.0146
3a-5
Model: MODEL2
Dependent Variable: pathology

                        Analysis of Variance
                              Sum of        Mean
Source            DF         Squares      Square   F Value   Pr > F
Model              1       157.84733   157.84733      0.59   0.4458
Error             51           13633   267.32005
Corrected Total   52           13791

Root MSE          16.34993    R-Square    0.0114
Dependent Mean    22.69811    Adj R-Sq   -0.0079
Coeff Var         72.03209

3a-6

                        Parameter Estimates
                             Parameter    Standard
Variable   Label      DF      Estimate       Error   t Value   Pr > |t|
Intercept  Intercept   1      37.61326    19.53945      1.92     0.0598
hostile    hostile     1      -2.25150     2.93001     -0.77     0.4458
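As a quick sanity check on the listings above, the R-Square and F Value entries can be recomputed from the printed sums of squares. A minimal sketch in Python (not the course's SAS), using MODEL1's values; since SSE is rounded to 12255 in the listing, agreement is to the displayed decimals:

```python
# Recomputing MODEL1's summary statistics from its printed sums of squares.
ssr = 1535.85697           # Model sum of squares
sse = 12255.0              # Error sum of squares (rounded in the listing)
ssy = ssr + sse            # Corrected Total (13791 in the listing)
r_square = ssr / ssy       # R-Square
mse = sse / 51             # Mean Square for Error (DF = 51)
f_value = (ssr / 1) / mse  # F Value = MSR / MSE
print(round(r_square, 4), round(f_value, 2))  # → 0.1114 6.39
```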
Multiple Regression Models

The general form of a regression model for k independent variables is given by

    Y = β0 + β1X1 + β2X2 + ... + βkXk + E

where
  Y is the dependent (response) variable,
  X1, X2, ..., Xk are the independent (explanatory) variables, and
  β0, β1, β2, ..., βk are the regression coefficients that need to be estimated.

Data Structure
3a-7

Subject    Y     X1     X2    ...    Xk
   1       y1    x11    x21   ...    xk1
   2       y2    x12    x22   ...    xk2
   ...
   n       yn    x1n    x2n   ...    xkn
3a-8
Instead of looking at individual relationships, we can look at all of the
pair-wise scatterplots:
3a-9
With two independent variables, our representation of the data can best be viewed in three dimensions. (As the number of independent variables increases, our best-fitting “curve” becomes a hypersurface in (k+1)-dimensional space.)
3a-10
Thus, in three dimensions the response surface is not a line but a plane. As in simple linear regression, we want to find the best-fitting plane through the data using the method of least squares. The plane describes the mean value of Y at each combination of the independent variables (X1 = thinking, X2 = hostile).
3a-11
Assumptions of Multiple Regression
1. Existence: For each specific combination of values of the (basic) independent variables X1, X2, …, Xk, Y is a (univariate) random variable with a certain probability distribution having finite mean and variance.
2. Independence: The Y observations are statistically independent of one another.

3. Linearity: The mean value of Y for each specific combination of X1, X2, ..., Xk is a linear function of X1, X2, ..., Xk; that is,

    μ_{Y|X1,X2,...,Xk} = β0 + β1X1 + β2X2 + ... + βkXk

or

    Y = β0 + β1X1 + β2X2 + ... + βkXk + E

where E is the error component reflecting the difference between an individual's observed response Y and the true average response μ_{Y|X1,X2,...,Xk}.
3a-12
4. Homoscedasticity: The variance of Y is the same for any fixed combination of X1, X2, …, Xk; that is
    Var(Y | X1, X2, ..., Xk) = σ²_{Y|X1,X2,...,Xk} ≡ σ²
5. Normality: For any fixed combination of X1, X2, …, Xk, the variable Y
is normally distributed.
    Y ~ N( μ_{Y|X1,X2,...,Xk} , σ² )
or equivalently we can assume that E ~ N( 0, σ 2 ).
• Normality is required for inference on the regression model, but not to fit the model using least squares.
• Tests are fairly robust; only extreme departures from normality will produce incorrect results.
• If lack of normality does occur, a transformation of the Y values is considered.
3a-13
As in simple linear regression:
• Y is an observable random variable, while X1, X2, ..., Xk are generally considered fixed (nonrandom) known quantities.
• The constants β0, β1, ..., βk are unknown regression (population) parameters.
• E is an unobservable random variable.
• If one estimates β0, β1, ..., βk with β̂0, β̂1, ..., β̂k, then an acceptable estimate of Ei for the ith subject is

    Ê_i = Y_i − Ŷ_i = Y_i − ( β̂0 + β̂1X_i1 + ... + β̂kX_ik )

• The estimated error Ê_i is usually called a residual.
  o Ŷ = β̂0 + β̂1X1 + ... + β̂kXk is called the prediction equation. With numbers (estimates) plugged in, we call it the least squares equation, fitted model, or estimated regression equation.
  o Y = β0 + β1X1 + ... + βkXk + E is called the regression model.
3a-14
Determining the Best Estimate of the Multiple Regression Equation

The least squares approach is again used to estimate the multiple regression equation. Just as before, we minimize the sum of squared distances between the observed responses and the predicted responses.
Let the fitted regression model be
    Ŷ = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk

We then obtain the estimates that minimize

    SSE = Σ( Y_i − Ŷ_i )² = Σ( Y_i − β̂0 − β̂1X_i1 − ... − β̂kX_ik )²
Calculation of these estimates is beyond the scope of this class but estimates are easily calculated in standard statistical software.
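Although the hand calculation is beyond scope, the underlying recipe is short: form the normal equations (XᵀX)b = XᵀY and solve the resulting linear system. A minimal illustrative sketch in Python (not the course's SAS); the data below are made up so that Y fits the model exactly and the estimates recover the true coefficients:

```python
# Sketch: least squares estimation via the normal equations (X'X)b = X'Y,
# solved by Gaussian elimination, for a tiny hypothetical dataset (k = 2).

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(X, Y):
    """Estimate (b0, b1, ..., bk) minimizing SSE = sum of (Yi - Yhat_i)^2."""
    rows = [[1.0] + list(x) for x in X]   # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: Y generated exactly as 2 + 3*X1 - 1*X2 (no noise),
# so the least squares estimates should recover those coefficients.
X = [(1, 2), (2, 1), (3, 5), (4, 2), (5, 7)]
Y = [2 + 3 * x1 - x2 for x1, x2 in X]
b = least_squares(X, Y)
print([round(v, 6) for v in b])  # → [2.0, 3.0, -1.0]
```

Real software inverts or factors XᵀX with more numerically careful methods, but the estimates it reports are the solution of this same system.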
3a-15
Properties of the least squares estimators:
1. The estimators β̂0, β̂1, ..., β̂k are each a linear function of the Y-values. Since the Y values are assumed to be normally distributed (and independent), β̂0, β̂1, ..., β̂k will be normally distributed, too.

2. The least-squares equation Ŷ = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk is the combination of variables X1, X2, ..., Xk that has the maximum possible correlation with Y:

    r_{Y,Ŷ} = Σ(Yi − Ȳ)(Ŷi − Ŷ̄) / √[ Σ(Yi − Ȳ)² · Σ(Ŷi − Ŷ̄)² ]

where Ȳ and Ŷ̄ are the means of the observed and fitted values.
The quantity 𝑟𝑌,𝑌� is called the multiple correlation coefficient.
3. Multiple regression is related to the multivariate normal distribution
just as straight-line regression is related to the bivariate normal.
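To make property 2 concrete, the multiple correlation coefficient can be computed directly from its definition. A small sketch with hypothetical observed and fitted values (the numbers below are made up for illustration, not from the example dataset):

```python
# Computing the multiple correlation coefficient r_{Y,Yhat} by its
# definition, using hypothetical observed (Y) and fitted (Yhat) values.
import math

Y    = [3.0, 5.0, 4.0, 8.0, 10.0]   # hypothetical observed responses
Yhat = [3.5, 4.5, 4.0, 8.5,  9.5]   # hypothetical fitted values

ybar = sum(Y) / len(Y)
fbar = sum(Yhat) / len(Yhat)
num  = sum((y - ybar) * (f - fbar) for y, f in zip(Y, Yhat))
den  = math.sqrt(sum((y - ybar) ** 2 for y in Y)
                 * sum((f - fbar) ** 2 for f in Yhat))
r = num / den
print(round(r, 3))  # → 0.986
```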
3a-16
Interpretation of Regression Parameters

In multiple linear regression, the interpretation of βi (i = 1, 2, ..., k) is the same as it was in simple linear regression, with one additional (and important) restriction.
βi (i = 1, 2, ..., k) represents the expected change in Y corresponding to a one-unit increase in Xi, given that the other independent variables are held fixed (and the combination of values must be within the scope of the model).
If we allowed the other predictors to change, we would be unable to
determine how much of the change in the mean of Y was associated with the
differing values of the predictor of interest vs. how much was due to some
other predictor(s) differing as well.
3a-17
The ANOVA Table for Multiple Regression

As in SLR, the ANOVA table can provide an overall summary of a multiple regression analysis. We begin by partitioning the sums of squares.

SSY = Σ(Yi − Ȳ)² is called the total sum of squares
  - represents the total variability in Y without accounting for any X variables in the regression equation.

SSE = Σ(Yi − Ŷi)² is the residual sum of squares
  - represents the amount of Y variation left unexplained after the X variables have been used in the regression equation.

SSR = SSY − SSE = Σ(Ŷi − Ȳ)² is the regression sum of squares
  - represents the reduction in variation due to the X variables in the regression equation.
3a-18
SSY = SSR + SSE

    Σ_{i=1}^{n} (Yi − Ȳ)²  =  Σ_{i=1}^{n} (Ŷi − Ȳ)²  +  Σ_{i=1}^{n} (Yi − Ŷi)²
         ↑                         ↑                          ↑
     SS(Total)               SS(Regression)             SS(Residuals)
                             SS(Explained)              SS(Unexplained)
In SAS:

Source     DF        Sum of Squares    Mean Square           F
Model      k         SSR               MSR = SSR/k           F = MSR/MSE
Error      n-k-1     SSE               MSE = SSE/(n-k-1)
C Total    n-1       SSY

Note: Different software packages may present the form of the ANOVA table with slight variations.
3a-19
    R² = SSR/SSY = (SSY − SSE)/SSY = 1 − SSE/SSY

• The R² value is a quantitative measure of how well the fitted model containing variables X1, X2, ..., Xk predicts the dependent variable Y.
• 0 ≤ R² ≤ 1.

Back to our example:
PROC REG DATA=Ex0802;
  MODEL y=x1 x2;
  TITLE 'Multivariate Regression Analysis';
RUN;
3a-20
Multivariate Regression Analysis

The REG Procedure
Model: MODEL1
Dependent Variable: pathology

                        Analysis of Variance
                              Sum of        Mean
Source            DF         Squares      Square   F Value   Pr > F
Model              2      2753.87136  1376.93568      6.24   0.0038
Error             50           11037   220.74597
Corrected Total   52           13791

Root MSE          14.85752    R-Square    0.1997
Dependent Mean    22.69811    Adj R-Sq    0.1677
Coeff Var         65.45708

                        Parameter Estimates
                        Parameter    Standard
Variable    DF           Estimate       Error   t Value   Pr > |t|
Intercept    1           -0.63535    20.96833     -0.03     0.9759
thinking     1           23.45144     6.83851      3.43     0.0012
hostile      1           -7.07261     3.01092     -2.35     0.0228
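The R-Square and Adj R-Sq entries in this output follow directly from the R² formula above. A minimal check in Python (not the course's SAS; SSE is rounded in the listing, so agreement is to the displayed 4 decimals):

```python
# Checking the full model's R-Square and Adj R-Sq against the printed
# sums of squares (n = 53 patients, k = 2 predictors).
n, k = 53, 2
ssr = 2753.87136
sse = 11037.0
ssy = ssr + sse                  # Corrected Total (≈ 13791)
r_square = 1 - sse / ssy         # R² = 1 - SSE/SSY
adj_r_sq = 1 - (sse / (n - k - 1)) / (ssy / (n - 1))
print(round(r_square, 4), round(adj_r_sq, 4))  # → 0.1997 0.1677
```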
3a-21
The least squares equation is

    Ŷ = −0.635 + 23.451 X1 − 7.073 X2

that is, β̂0 = −0.635, β̂1 = 23.451, and β̂2 = −7.073.
• Given a fixed rating on hostile suspiciousness (X2), every one-unit increase in thinking disturbance (X1) yields a 23.451-unit increase in level of pathology, on average.
• Given a fixed rating of thinking disturbance (X1), every one-unit increase in hostile suspiciousness (X2) yields a 7.073-unit decrease in level of pathology, on average.
Using this equation, determine the predicted level of pathology for a patient
with pretreatment scores of 2.80 on thinking disturbance and 7.0 on hostile
suspiciousness. How does this predicted value compare with the value
actually obtained for patient 5 (Y5=25)?
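As a check on the exercise's arithmetic, the prediction can be worked out by plugging the scores into the rounded least squares equation (treat the result as approximate, since the coefficients are rounded):

```python
# Predicted pathology for pretreatment scores X1 = 2.80 (thinking)
# and X2 = 7.0 (hostile), using the rounded least squares coefficients.
b0, b1, b2 = -0.635, 23.451, -7.073
x1, x2 = 2.80, 7.0
y_hat = b0 + b1 * x1 + b2 * x2
residual = 25 - y_hat           # observed Y5 = 25 minus predicted value
print(round(y_hat, 2), round(residual, 2))  # → 15.52 9.48
```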
3a-22
Comparison of parameter estimates in the three models considered:

Ŷ = β̂0 + β̂1X1
                        Parameter    Standard
Variable    DF           Estimate       Error   t Value   Pr > |t|
Intercept    1          -25.04084    19.00283     -1.32     0.1935
thinking     1           15.95111     6.30947      2.53     0.0146

Ŷ = β̂0 + β̂1X2
                        Parameter    Standard
Variable    DF           Estimate       Error   t Value   Pr > |t|
Intercept    1           37.61326    19.53945      1.92     0.0598
hostile      1           -2.25150     2.93001     -0.77     0.4458

Ŷ = β̂0 + β̂1X1 + β̂2X2
                        Parameter    Standard
Variable    DF           Estimate       Error   t Value   Pr > |t|
Intercept    1           -0.63535    20.96833     -0.03     0.9759
thinking     1           23.45144     6.83851      3.43     0.0012
hostile      1           -7.07261     3.01092     -2.35     0.0228