Chapter 4 Transformations and Weighting to Correct Model Inadequacies
UECM2263 Applied Statistical Model
Recall that regression model fitting has several implicit assumptions, including the following:
1. The model errors have mean zero and constant variance and are uncorrelated.
2. The model errors have a normal distribution. This assumption is made in order to conduct hypothesis tests and construct confidence intervals; under this assumption, the errors are also independent.
3. The form of the model, including the specification of the regressors, is correct.
Chapter 3 presented several techniques for checking the adequacy of the linear regression model. If the
linear regression model is not appropriate for a data set, there are two basic choices:
1. Abandon the regression model and develop a more appropriate model.
2. Employ some transformation on the data so that the regression model is appropriate for the transformed data.
We consider the use of transformations in this chapter.
4.1 Variance Stabilizing Transformation
The assumption of constant variance is a basic requirement of regression analysis. A common reason for the violation of this assumption is that the response variable Y follows a probability distribution in which the variance is functionally related to the mean.
For example, if Y follows a Poisson distribution with mean λ, the variance of Y equals its mean λ. Since the mean of Y is related to the regressor variable X, the variance of Y will be proportional to X.
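A quick simulation sketch of this mean-variance relationship (the rate 2 + 3x and the sample sizes are arbitrary illustrative choices):

# For Poisson responses, the variance tracks the mean, which grows with x
set.seed(1)
x <- rep(1:5, each = 200)
y <- rpois(n = length(x), lambda = 2 + 3 * x)   # E(Y) = Var(Y) = 2 + 3x
rbind(mean = tapply(y, x, mean), var = tapply(y, x, var))  # the two rows roughly agree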
Example 4.1:
Consider the simple linear regression model yi = β0 + β1xi + εi, where Var(εi) = σ²xi². Suppose we use the transformation Y' = Y/x. Is this a variance-stabilizing transformation?

Var(Y') = Var(Y/x) = (1/x²)Var(Y) = (1/x²)(σ²x²) = σ²

Yes, the variance of Y' becomes constant.
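This can be checked by simulation (a minimal sketch; β0 = 2, β1 = 3, and σ = 1 are arbitrary illustrative choices):

set.seed(10)
x <- rep(1:4, each = 500)
eps <- rnorm(n = length(x), mean = 0, sd = x)   # Var(eps_i) = x_i^2 (sigma^2 = 1)
y <- 2 + 3 * x + eps
tapply(y, x, var)       # variance of Y grows with x (roughly 1, 4, 9, 16)
tapply(y / x, x, var)   # variance of Y' = Y/x is roughly constant (about 1)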
Unequal error variances and non-normality of the error terms frequently appear together. To remedy these departures from the linear regression model, we need a transformation on Y, since the shapes and spreads of the distributions of Y need to be changed.
A transformation on Y may at the same time also help to linearize a curvilinear regression relation.
Figure 4.1 below contains some prototype regression relations where the skewness and the error variance increase with the mean response E(Y).

Figure 4.1: Prototype Regression Patterns

Transformations on Y: Y' = √Y, Y' = log10(Y), Y' = 1/Y

Note: A simultaneous transformation on X may also be helpful or necessary.
Useful Variance-Stabilizing Transformations:

Relationship of σ² to E(Y)     Transformation
σ² = constant                  Y' = Y (no transformation)
σ² ∝ E(Y)                      Y' = √Y (square root; Poisson data)
σ² ∝ E(Y)[1 − E(Y)]            Y' = sin⁻¹(√Y) (arcsine; binomial proportions 0 ≤ Yi ≤ 1)
σ² ∝ [E(Y)]²                   Y' = ln(Y) (natural log)
σ² ∝ [E(Y)]³                   Y' = Y^(−1/2) (reciprocal square root)
σ² ∝ [E(Y)]⁴                   Y' = Y^(−1) (reciprocal)
Example 4.2: Data on age (X) and plasma level of a polyamine (Y) for a portion of the 25 healthy children in a study are presented below in R code:
Age <- c(0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
Plasma <- c(13.44,12.84,11.91,20.09,15.60,10.11,11.38,10.28,8.96,
8.59,9.83,9,8.65,7.85,8.88,7.94,6.01,5.14,6.9,6.77,4.86,5.1,5.67,5.75,6.23)
#Use lm() function to fit the model
Blood.Reg <- lm(Plasma~Age)
#create the scatter plot
plot(x = Age, y =Plasma , xlab="Age", ylab = "Plasma", main = "Plasma Level vs. Age Before
Transformation", col = "Red", pch = 19, cex=1.5)
[Scatter plot: Plasma Level vs. Age Before Transformation]
The scatter plot indicates a curvilinear regression relationship, as well as greater variability for younger children than for older ones.
Based on the prototype regression patterns, we shall first try the logarithmic transformation, Y' = log10(Y).
#create the scatter plot after transformation
LY <- log10(Plasma)
plot(x = Age, y = LY, xlab = "Age", ylab = "log10(Plasma)", main = "Plasma Level vs. Age After
Transformation", col = "Red", pch = 19, cex = 1.2)
Note that the transformation not only has led to a reasonably linear regression relation, but the variability at the different levels of X has also become reasonably constant.
To further examine the reasonableness of the transformation Y' = log10(Y), we fitted the simple linear regression model to the transformed Y data and obtained:

Ŷ' = 1.1351 − 0.1023X
#To fit the model Y' = log10(Y) vs X
BloodT.Reg <- lm(log10(Plasma) ~ Age)
summary(BloodT.Reg)
# Create plot of Residuals vs. Age after transformation
plot(x = Age, y =BloodT.Reg$residuals, xlab ="Age", ylab = "Residuals", main = "Residuals vs. Age
after Transformation (y’ = log10(Y))", col = "blue", pch = 19, cex=1.5, panel.first = grid(col = "gray",
lty = "dotted"))
abline(h = 0, col = "red")
#Normal Probability plot After transformation
qqt.plot <- qqnorm(BloodT.Reg$residuals, main = "Normal Probability Plot After Transformation",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
A plot of residuals against X and a normal probability plot after the transformation are shown below. Both show evidence of the appropriateness of the linear regression model for the transformed Y data.
[Plot: Residuals vs. Age after Transformation (y' = log10(Y))]
[Normal Probability Plot After Transformation]
4.1.1 Transformations on Y: The Box-Cox Method

It is often difficult to determine from diagnostic plots, such as the ones in the plasma levels example, which transformation of Y is most appropriate for correcting skewness of the distributions of the error terms, unequal error variances, and nonlinearity of the regression function. The Box-Cox procedure automatically identifies a transformation from the family of power transformations on Y.
Consider the transformed regression model

Yi^(λ) = β0 + β1xi + εi

where

Y^(λ) = (Y^λ − 1)/λ   if λ ≠ 0
Y^(λ) = loge(Y)       if λ = 0

This definition was given by Box and Cox (1964). Due to the structure of a linear regression model, one can equivalently express this as

Y^(λ) = Y^λ       if λ ≠ 0
Y^(λ) = loge(Y)   if λ = 0
With this model there is an extra parameter, λ, that needs to be estimated. λ, β0, β1, and σ² can be estimated via maximum likelihood estimation. The estimated λ can then be used to suggest the type of transformation. For example,

λ = 2:    Y' = Y²
λ = 0.5:  Y' = √Y
λ = 0:    Y' = ln(Y) (by definition)
λ = −0.5: Y' = 1/√Y
λ = −1.0: Y' = 1/Y

Notice that if λ is estimated to be 1, no transformation is needed. The estimate for λ is commonly searched for in the range −2 to 2.

The MLE of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, SSE(λ), is minimum. It is usually determined by plotting SSE(λ) versus λ. Usually 10 to 20 values of λ are sufficient for estimation of the optimum value.
From Example 4.2, the Box-Cox results show:

λ     SSE(λ)        λ      SSE(λ)
1.0   78.0          −0.1   33.1
0.9   70.4          −0.3   31.2
0.7   57.8          −0.4   30.7
0.5   48.4          −0.5   30.6
0.3   41.4          −0.6   30.7
0.1   36.4          −0.7   31.1
0     34.5          −0.9   32.7
                    −1.0   33.9

Note that λ̂ = −0.5, with SSE(λ̂) = 30.6. Besides Y' = log10(Y), another choice is therefore Y' = 1/√Y.
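This grid search can be done by hand. Below is a minimal sketch of our own construction: it uses the standard scaled transformation y^(λ) = (y^λ − 1)/(λ·ẏ^(λ−1)), with ẏ the geometric mean of the y's, so that the residual sums of squares are comparable across λ; the grid of λ values and the function name are our choices.

# Grid search for lambda: minimize SS_E(lambda) using the scaled Box-Cox transform
boxcox.sse <- function(lambda, y, x) {
  gm <- exp(mean(log(y)))   # geometric mean of y
  yl <- if (abs(lambda) < 1e-8) gm * log(y) else (y^lambda - 1) / (lambda * gm^(lambda - 1))
  sum(resid(lm(yl ~ x))^2)  # residual sum of squares at this lambda
}
lambda.grid <- seq(from = -1, to = 1, by = 0.1)
sse <- sapply(lambda.grid, boxcox.sse, y = Plasma, x = Age)
cbind(lambda.grid, sse)     # the minimum should occur near lambda = -0.5

Plotting sse against lambda.grid reproduces the usual SSE(λ) curve.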
Another approach uses R code:
Example 4.4:
The trees data set ships with R; see help(trees) for specific information on the data set. The MASS package, which contains a set of functions and data sets, provides the boxcox() function used below.
Let Y = volume and X = height for the trees in the sample.
R-Codes
library(MASS)
trees
mod.fit<-lm(formula = Volume ~ Height, data=trees)
summary(mod.fit)
#Plot of Y vs. X with sample model
plot(x = trees$Height, y = trees$Volume, xlab = "Height",
ylab = "Volume", main = "Volume vs. Height",
panel.first = grid(col = "gray", lty = "dotted"))
abline(mod.fit)
#e.i vs. Yhat.i
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residual",
main = expression(paste("Residuals vs. ", hat(Y))),
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
#Determine lambda.hat using boxcox() in the MASS package
save.bc<-boxcox(object = mod.fit, lambda = seq(from = -2,to = 2, by = 0.01))
title(main = "Box-Cox transformation plot")
lambda.hat<-save.bc$x[save.bc$y == max(save.bc$y)]
lambda.hat
[Scatter plot: Volume vs. Height, with the fitted line]
Notice that the variability in the yi's increases as xi increases.
[Plot: Residuals vs. Ŷ]
The funnel shape occurs here. Based upon this and the scatter plot, it would be of interest to
consider a transformation of Y .
Also, notice the use of hat(Y) and the expression() function in the plot() function. Use demo(plotmath)
for more information about how to get mathematical symbols in plots.
Note: The function expression() returns a vector of type "expression" containing its arguments (unevaluated).
lambda.hat
[1] -0.19

[Box-Cox transformation plot: log-likelihood vs. λ, with the maximum near λ̂ = −0.19 and a 95% confidence interval marked]
The boxcox() function estimates λ using maximum likelihood estimation.
Here, it shows that the log-likelihood function is maximized when λ = −0.19. It also gives a likelihood-based 95% confidence interval of about −0.8 to 0.4 for λ. Notice that λ = 0 is in the interval (so one may want to consider the natural log transformation), and that λ = 1 is not in the interval (a transformation is needed).
Using λ̂ = −0.19 results in the following model, Y' = Y^(−0.19):
mod.fit2 <- lm(formula = Volume^lambda.hat ~ Height, data = trees)
plot(x = mod.fit2$fitted.values, y = mod.fit2$residuals,
     xlab = expression(hat(Y)^{-0.19}), ylab = "Residual",
     main = expression(paste("Residuals vs. ", hat(Y)^{-0.19})),
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
============================
Call:
lm(formula = Volume^lambda.hat ~ Height, data = trees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.959543 0.090273 10.629 1.62e-11 ***
Height -0.005526 0.001184 -4.668 6.38e-05 ***
[Plot: Residuals vs. Ŷ^(−0.19)]
It looks like λ̂ = −0.19 leads to an approximately constant variance. The sample model can then be expressed as

Ŷ^(−0.19) = 0.9595 − 0.005526×Height
Chapter 4 Transformations and Weighting to Correct Model Inadequacies
UECM2263 Applied Statistical Model Chapter 4 - 9
How would you find Ŷ?

Ŷ = (0.9595 − 0.005526×Height)^(1/(−0.19))
Since λ = 0 is in the interval, it may be of interest to try the natural log transformation, since this is easier to interpret (and more common).
R-Codes
mod.fit3 <- lm(formula = log(Volume) ~ Height, data = trees)
summary(mod.fit3)
plot(x = mod.fit3$fitted.values, y = mod.fit3$residuals,
     xlab = "log(Y)", ylab = "Residual", main = "Residuals vs. log(Y)",
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
Call:
lm(formula = log(Volume) ~ Height, data = trees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.79652 0.89053 -0.894 0.378
Height 0.05354 0.01168 4.585 8.03e-05 ***
[Plot: Residuals vs. log(Ŷ)]
The natural log transformation works as well. This sample model can be expressed as

log(Ŷ) = −0.7965 + 0.05354X
How would you find Ŷ?

Ŷ = e^(−0.7965 + 0.05354X)
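In R, predictions on the original scale follow by exponentiating (a sketch; heights 70 and 80 are arbitrary illustrative values):

# Predict on the log scale, then back-transform to the original volume scale
new.heights <- data.frame(Height = c(70, 80))
exp(predict(mod.fit3, newdata = new.heights))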
4.2 Transformations to Linearize the Model

When the distributions of the error terms are reasonably close to normal and have constant variance, transformations on X should be attempted. The reason why transformations on Y may not be desirable here is that a transformation on Y, such as Y' = √Y, may change the shape of the distribution of the error terms away from the normal distribution and may also lead to substantially differing error term variances.
Figure 4.2: Prototype Regression Patterns and Transformations of X

X' = log10(X) or X' = √X
X' = X² or X' = exp(X)
X' = 1/X or X' = exp(−X)
Example 4.3:
Data from an experiment on the effect of the number of days of training received (X) on performance (Y) in a battery of simulated sales situations are presented below:
Train <- c(.5,.5,1,1,1.5,1.5,2,2,2.5,2.5)
Score <- c(42.5,50.6,68.5,80.7,89,99.6,105.3,111.8,112.3,125.7)
perf.Reg <- lm(Score~Train)
# Create scatter plot of Training vs. Score before transformation
plot(x = Train, y = Score, xlab = "Training", ylab = "Performance",
     main = "Training vs. Performance before Transformation", col = "blue", pch = 19, cex = 1.5)
abline(perf.Reg)
# Create plot of Residuals vs. Predicted values before transformation
plot(x = perf.Reg$fitted.values, y = perf.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values Before Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
#Normal Probability plot Before transformation
qq.plot <- qqnorm(perf.Reg$residuals, main = "Normal Probability Plot Before Transformation",
                  xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                  col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qq.plot$y ~ qq.plot$x))
[Scatter plot: Training vs. Performance before Transformation]
[Plot: Residuals vs. Predicted Values Before Transformation]
[Normal Probability Plot Before Transformation]
The scatter plot indicates that the relation appears to be fairly curvilinear. Since the variability at the different X levels appears to be fairly constant, we shall consider a transformation on X.
Based on the prototype plots, we shall initially consider the square root transformation X' = √X.
# Create scatter plot of Training vs. Score after transformation
XP <- sqrt(Train)
plot(x = XP, y = Score, xlab = "Sqrt(Training)", ylab = "Performance",
     main = "Sqrt(Training) vs. Performance after Transformation", col = "blue", pch = 19, cex = 1.5)
#To fit the model y vs sqrt(x)
perfT.Reg <- lm(Score ~ I(sqrt(Train)))
summary(perfT.Reg)
plot(x = perfT.Reg$fitted.values, y = perfT.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values After Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
qqt.plot <- qqnorm(perfT.Reg$residuals, main = "Normal Probability Plot After Transformation (x' = sqrt(x))",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
[Scatter plot: Sqrt(Training) vs. Performance after Transformation]
[Plot: Residuals vs. Predicted Values After Transformation]
[Normal Probability Plot After Transformation (x' = sqrt(x))]
Note that the scatter plot of Y versus √X shows a reasonably linear relation. The variability of the scatter plot at the different X levels is about the same as before.
The plot of residuals against √X shows no evidence of unequal error variances. The normal probability plot after transformation also shows no indication of substantial departures from normality. Thus the simple linear regression model

Y = β0 + β1√X + ε

appears to be appropriate here. Fitting the model to the transformed data, we obtain:

Ŷ = −10.33 + 83.45√X
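The fitted model can then be used for estimation on the original X scale; for example (a sketch, with 2.25 days of training as an arbitrary illustrative value):

# Estimated mean performance after 2.25 days of training, with a 95% CI
predict(perfT.Reg, newdata = data.frame(Train = 2.25), interval = "confidence")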
4.2.1 Transformations on the Predictor Variable (X): The Box-Tidwell Method

Suppose that the relationship between Y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied. We want to select an appropriate transformation on the regressor variables so that the relationship between Y and the transformed regressor is as simple as possible.
Box and Tidwell describe an analytical procedure for determining the form of the transformation on X. Assume that the response variable Y is related to a power of the regressor, say ξ, as

E(Y) = f(ξ, β0, β1) = β0 + β1ξ

where

ξ = X^α   if α ≠ 0
ξ = ln(X) if α = 0

and β0, β1, and α are unknown parameters.
The procedure is:

Let α0 = 1 be the initial guess of α, so that ξ0 = X^(α0) = X; that is, no transformation at all is applied in the first iteration.

Expanding about the initial guess in a Taylor series and ignoring terms of higher than first order gives

E(Y) = f(ξ0, β0, β1) + (α − α0){df(ξ, β0, β1)/dα}|α=α0
     = β0 + β1X + (α − 1){df(ξ, β0, β1)/dα}|α=1
Note: If the term in braces were known, it could be treated as an additional regressor variable, and it would be possible to estimate the parameters β0, β1, and α by least squares. Now

{df(ξ, β0, β1)/dα}|α=α0 = {df(ξ, β0, β1)/dξ · dξ/dα}|α=α0 = β1{d(X^α)/dα}|α=1 = β1 X ln(X)

Thus,

E(Y) = β0 + β1X + (α − 1)β1 X ln(X) = β0* + β1*X + β2*W

where β2* = (α − 1)β1 and W = X ln(X).
Note that β1 can be estimated by fitting the model Ŷ = β̂0 + β̂1X, and β2* can be estimated by fitting the model Ŷ = β̂0* + β̂1*X + β̂2*W. Take

α1 = β̂2*/β̂1 + 1

as the revised estimate of α. This procedure may now be repeated using the new regressor X' = X^(α1) in the calculations.
Box and Tidwell (1962) noted that this procedure usually converges quite rapidly, and often the first-stage result α1 is a satisfactory estimate of α. However, round-off error is potentially a problem. Convergence problems may be encountered in cases where the error standard deviation is large or when the range of the regressor is very small compared to its mean.

Note: β̂1 and β̂1* generally differ.
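The update can be packaged as a small R function (a sketch of our own; the function name is made up, it follows the update rule used in these notes, and it assumes x > 0):

# One Box-Tidwell step: return the revised exponent, starting from alpha0
bt.step <- function(y, x, alpha0 = 1) {
  xt <- x^alpha0                                   # current transformed regressor
  b1 <- coef(lm(y ~ xt))[2]                        # beta1-hat from the simple fit
  b2 <- coef(lm(y ~ xt + I(xt * log(xt))))[3]      # beta2*-hat from the augmented fit
  unname(b2 / b1 + alpha0)                         # revised estimate of alpha
}
# With the windmill data of Example 4.5 below:
# alpha1 <- bt.step(Y, X)          # about -0.92
# alpha2 <- bt.step(Y, X, alpha1)  # about -1.01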
Example 4.5:
A research engineer is investigating the use of a windmill to generate electricity. He has collected data
on the DC output (Y ) from his windmill and the corresponding wind velocity ( X ).
R-Codes:
Y <- c(.123, .5, .653, .558, 1.057, 1.137, 1.144, 1.194, 1.562, 1.582, 1.501, 1.737, 1.822, 1.866, 1.93,
1.8, 2.088, 2.179, 2.166, 2.112, 2.303, 2.294, 2.386, 2.236,2.31)
X <- c(2.45, 2.7, 2.9, 3.05, 3.4, 3.6, 3.95, 4.1, 4.6, 5, 5.45,5.8, 6, 6.2, 6.35, 7,7.4, 7.85, 8.15, 8.8, 9.1,
9.55, 9.7, 10, 10.2)
plot(X, Y, xlab = "Wind Velocity, X", ylab = "DC Output, Y", main = "DC Output vs. Wind Velocity",
col = "Blue", pch = 19, cex=1.5)
#First iteration
Fit0 <- lm(Y~X)
FitT0 <- lm(Y~X+I(X*log(X)))
Fit0
FitT0
[Scatter plot: DC Output vs. Wind Velocity]
The scatter plot suggests that the relationship between DC output and wind speed is not a straight line and that some transformation on X may be appropriate.
#First iteration
Call:
lm(formula = Y ~ X)
Coefficients:
(Intercept) X
0.1309 0.2411
Call:
lm(formula = Y ~ X + I(X * log(X)))
Coefficients:
(Intercept) X I(X * log(X))
-2.4168 1.5344 -0.4626
We begin with the initial guess α0 = 1 and fit the two models:

Ŷ = β̂0 + β̂1X = 0.1309 + 0.2411X

and

Ŷ = β̂0* + β̂1*X + β̂2*W = −2.4168 + 1.5344X − 0.4626W

and we calculate

α1 = β̂2*/β̂1 + 1 = −0.4626/0.2411 + 1 = −0.9187

as the improved estimate of α. Note that this estimate of α is very close to −1, so the reciprocal transformation on X, X' = 1/X, is appropriate.
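As a check (a sketch of our own, not part of the original notes), the suggested reciprocal model can be fitted directly:

# Fit DC output against 1/X, the transformation suggested by the procedure
recip.fit <- lm(Y ~ I(1/X))
summary(recip.fit)
plot(x = 1/X, y = Y, xlab = "1/Wind Velocity", ylab = "DC Output",
     main = "DC Output vs. 1/Wind Velocity", col = "Blue", pch = 19, cex = 1.5)
abline(recip.fit)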
R-codes:
#Install the car package from CRAN if needed: install.packages("car")
library(car)
boxTidwell(Y ~ X)  #called box.tidwell() in older versions of car

Output:
Initial Power -0.91830
Score Statistic -9.13243
p-value 0.00000
MLE of Power -0.83334
iterations = 3

(The initial power is the first-step estimate α1, and W = X ln(X) as defined in the derivation above.)
#Second iteration
Alpha1 <- FitT0$coefficients[3] / Fit0$coefficients[2] + 1
lm(Y ~ I(X^Alpha1))
lm(Y ~ I(X^Alpha1) + I((X^Alpha1) * log(X^Alpha1)))
#Second iteration
Call:
lm(formula = Y ~ I(X^ Alpha1))
Coefficients:
(Intercept) I(X^ Alpha1)
3.101 -6.683
Call:
lm(formula = Y ~ I(X^ Alpha1) + I((X^ Alpha1) * log(X^ Alpha1)))
Coefficients:
(Intercept) I(X^Alpha1) I((X^ Alpha1) * log(X^ Alpha1))
3.2409 -6.4445 0.5994
To perform a second iteration, define a new regressor variable X' = X^(−0.9183) and fit the models

Ŷ = β̂0 + β̂1X' = 3.101 − 6.683X'

and

Ŷ = β̂0* + β̂1*X' + β̂2*W' = 3.2409 − 6.4445X' + 0.5994W'

where W' = X' ln(X'). The second-step estimate of α is thus

α2 = β̂2*/β̂1 + α1 = 0.5994/(−6.683) + (−0.9183) = −1.01

which again supports the use of the reciprocal transformation on X.
4.3 Generalized and Weighted Least Squares

4.3.1 Generalized Least Squares
A difficulty with transformations on Y is that they may create an inappropriate regression relationship. When an appropriate regression relationship has been found but the variances of the error terms are unequal, an alternative to transformation is weighted least squares.

Consider the model

Y = Xβ + ε,  E(ε) = 0,  Var(ε) = σ²V

The ordinary least-squares estimator β̂ = (X'X)⁻¹X'y is no longer appropriate.

Note: σ²V is the covariance matrix of the errors, and we define V = KK, where K is a nonsingular symmetric matrix. The matrix K is often called the square root of V.

Define the new variables

Z = K⁻¹y,  B = K⁻¹X,  g = K⁻¹ε

The regression model can then be transformed as

K⁻¹y = K⁻¹Xβ + K⁻¹ε,  or  Z = Bβ + g

where the errors in this transformed model have zero expectation, i.e.

E(g) = K⁻¹E(ε) = 0

and the covariance matrix of g is

Var(g) = E{[g − E(g)][g − E(g)]'} = E(gg') = E(K⁻¹εε'K⁻¹) = K⁻¹E(εε')K⁻¹ = σ²K⁻¹VK⁻¹ = σ²K⁻¹KKK⁻¹ = σ²I

Thus, the elements of g have mean zero and constant variance and are uncorrelated. Since the errors g in this new model satisfy the usual assumptions, we may apply ordinary least squares. The least-squares function is

S(β) = g'g = ε'V⁻¹ε = (y − Xβ)'V⁻¹(y − Xβ)

The normal equations are

(X'V⁻¹X)β̃ = X'V⁻¹y

and their solution is

β̃ = (X'V⁻¹X)⁻¹X'V⁻¹y

β̃ is called the generalized least-squares estimator of β.
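A minimal matrix sketch of this estimator in R (simulated data of our own; V is taken to be diagonal purely for illustration):

# Generalized least squares by hand: beta = (X'V^-1 X)^-1 X'V^-1 y
set.seed(2)
n <- 50
x <- runif(n, min = 1, max = 10)
X <- cbind(1, x)                                   # design matrix with an intercept
V <- diag(x^2)                                     # assumed error covariance structure
y <- as.vector(X %*% c(2, 3) + rnorm(n, sd = x))   # errors with Var(eps_i) = x_i^2
Vinv <- solve(V)
beta.gls <- solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv %*% y
beta.gls                                           # close to the true values (2, 3)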
Notes:
1. E(β̃) = β.
2. Var(β̃) = σ²(B'B)⁻¹ = σ²(X'V⁻¹X)⁻¹.
3. When V = I, the error terms ε are uncorrelated with equal variances, and the ordinary least-squares estimator β̂ = (X'X)⁻¹X'y is appropriate.
4. When V is a diagonal matrix with unequal diagonal elements, the error terms ε are uncorrelated but have unequal variances, and the generalized least-squares estimator β̃ = (X'V⁻¹X)⁻¹X'V⁻¹y is used.
4.3.2 Weighted Least Squares

When the errors ε are uncorrelated but have unequal variances, so that

V = diag(1/w1, 1/w2, …, 1/wn),

let W = V⁻¹. (Since V is a diagonal matrix, W is also diagonal, with diagonal elements, or weights, w1, w2, …, wn.) The weighted least-squares estimator

β̂ = (X'WX)⁻¹X'Wy

is then used.
Notes:
1) "wi" is used to stand for the weight of observation i.
2) These estimators are unbiased and have minimum variance among all unbiased estimators.
3) Since the weight wi is inversely related to the variance σi², it reflects the amount of information contained in the observation yi. Thus, an observation yi that has a large variance receives less weight than another observation that has a smaller variance. The more precise yi is (i.e., the smaller σi² is), the more information yi provides about E(yi), and therefore the more weight it should receive in fitting the regression function.
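As a quick sketch (continuing the simulated x, X, and y from the GLS example above), lm() with a weights argument reproduces the matrix formula:

# lm(..., weights = w) computes (X'WX)^-1 X'Wy with W = diag(w)
w <- 1 / x^2                                    # weights inversely proportional to the variances
Wmat <- diag(w)
beta.wls <- solve(t(X) %*% Wmat %*% X) %*% t(X) %*% Wmat %*% y
cbind(beta.wls, coef(lm(y ~ x, weights = w)))   # the two columns agree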
Problem: wi is usually unknown.

Solutions:

1) Examine a plot of ei vs. ŷi (using regular least-squares estimates). When the constant variance assumption is violated, the plot may look like:

[Sketch: residuals ei vs. ŷi, with the spread increasing as ŷi increases]

Divide the plot into 3 to 5 groups. Estimate the variance of the ei's for each group by Sj²:

[Sketch: the same residual plot divided into vertical groups]

Set wj = 1/Sj², where j denotes the group number.

2) Suppose the variance of the residuals varies with one of the predictor variables. For example, suppose the following plot is obtained:

[Sketch: residuals ei vs. Xk, with the spread increasing as Xk increases]

Fit a simple regression model (an estimated variance or standard deviation function) using the ei² (or |ei|) as the response variable and Xik as the predictor variable. The predicted values from the estimated variance or standard deviation function for each observation are then used to find the weights, wi = 1/V̂i, where V̂i denotes the fitted values (see the sketch after this list).

3) Estimate the regression coefficients using these weights.
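A minimal, self-contained sketch of solution 2, under simulated data of our own (when a standard deviation function is fitted, the weight is one over the squared fitted standard deviation):

# Sketch of solution 2: estimate a standard deviation function, then reweight
set.seed(3)
x <- runif(100, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 0.5 * x)  # spread grows with x
fit.ols <- lm(y ~ x)                          # step 1: ordinary least squares
sd.fun <- lm(abs(resid(fit.ols)) ~ x)         # step 2: |e_i| vs. x as the sd function
w <- 1 / fitted(sd.fun)^2                     # step 3: weights = 1 / (fitted sd)^2
fit.wls <- lm(y ~ x, weights = w)             # refit with the estimated weights
summary(fit.wls)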
Notes:
1. Inferences are usually done assuming W is known, even though it really is not. By using estimated quantities in W, there is a source of variability that is not being accounted for.
2. R² does not have the same meaning as for unweighted least squares.
Example 4.6: Fit a regression model using weighted least squares.
We simulate some data to illustrate non-constant variance.
#Simulate data with nonconstant variance
X<-seq(from = 1, to = 40, by = 0.25)
#random generation for the normal distribution
set.seed(5)
epsilon<-rnorm(n = length(X), mean = 0, sd = 1)
epsilon2<-X*epsilon
#Var(epsilon2) = X^2 * 1 = X^2 (non-constant variance); recall Var(epsilon) = 1
Y<- 2 + 3*X + epsilon2
set1<-data.frame(Y, X)
#Y vs. X with sample model
plot(x = X, y = Y, xlab = "X", ylab = "Y", main = "Y vs. X", panel.first = grid(col = "gray", lty = "dotted"))
mod.fit<-lm(formula = Y ~ X, data = set1)
abline(mod.fit, col="red")
summary(mod.fit)
Call:
lm(formula = Y ~ X, data = set1)
Residuals:
Min 1Q Median 3Q Max
-67.436 -9.892 -1.117 10.978 78.869
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.051 3.818 0.537 0.592
X 3.018 0.163 18.514 <2e-16 ***
[Scatter plot: Y vs. X, with the fitted least-squares line]
From examining the plot, one can see that the variance is a function of X (as X increases, the variability increases).
#Residuals vs. Yhat
plot(x = mod.fit$fitted.values, y = mod.fit$residuals, xlab = expression(hat(Y)), ylab ="Residuals",
main = "Residuals vs. estimated mean response", panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
[Plot: Residuals vs. estimated mean response]
The megaphone shape above indicates non-constant variance.
#Try calculating a P.I. for X = 40 (will use later)
pred<-predict(object = mod.fit, newdata = data.frame(X = 40), interval = "prediction", level = 0.95)
fit lwr upr
[1,] 122.773 76.48418 169.0618
Three different weighted least squares methods are investigated.

1) Based on the predicted values, the data are broken up into 5 groups. The estimated variance for each group is obtained. The weight used is wj = 1/Sj², where Sj² is the sample variance of the residuals for the mj observations in group j = 1, …, 5.
# Method 1
#Find quantiles for Y
quant5<-quantile(x = mod.fit$fitted.values, probs =c(0.2, 0.4, 0.6, 0.8), type = 1)
round(quant5,2)
#Put Y into groups based upon quantiles
groups<-ifelse(mod.fit$fitted.values < quant5[1], 1,
ifelse(mod.fit$fitted.values < quant5[2], 2,
ifelse(mod.fit$fitted.values < quant5[3], 3,
ifelse(mod.fit$fitted.values < quant5[4], 4,
5))))
round(quant5, 2)
  20%   40%   60%   80%
28.46 51.85 75.99 99.38

groups
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  1   1   1   1   1   1   1   1   1   1   1   2   2   2   2   2   2   2   2   2
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  2   2   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
  3   3   3   3   3   3   3   3   3   3   3   3   3   3   4   4   4   4   4   4
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
  4   4   4   4   4   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157
  5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5

Note: with 157 observations, each group contains roughly 157 × 20% ≈ 31 observations; the 20%, 40%, 60%, and 80% quantiles of the fitted values (28.46, 51.85, 75.99, 99.38) are the group boundaries.

#Quick way to find the variance of residuals for each group
# function “tapply” = apply a function to each cell of a ragged array, that is to each (non-empty) group
of values given by a unique combination of the levels of certain factors.
var.eps<-tapply(X = mod.fit$residuals, groups, var)
var.eps
#Visualization of creating the groups
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residuals",
main = "Residuals vs. estimated mean response",
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
abline(v = quant5, col = "red", lwd = 3)
[Plot: Residuals vs. estimated mean response, with vertical red lines at the group boundaries]
#Put the group variances into a vector corresponding to each observation
group.var<-ifelse(groups == 1, var.eps[1],
ifelse(groups == 2, var.eps[2],
ifelse(groups == 3, var.eps[3],
ifelse(groups == 4, var.eps[4],
var.eps[5]))))
var.eps
        1          2          3          4          5
 25.91165  148.35059  331.15305 1036.06249 1172.47827

Note: group 1 (the 31 observations with the smallest fitted values) has estimated residual variance 25.91, while group 5 (the 32 observations with the largest fitted values) has estimated variance 1172.48. The group.var vector then assigns each observation its own group's estimated variance: its first 31 entries equal 25.91165, the next 31 equal 148.35059, and so on.
mod.fit1<-lm(formula = Y ~ X, data = set1, weight = 1/group.var)
summary(mod.fit1)
#Try calculating a P.I. for X = 40
pred1<-predict(object = mod.fit1, newdata=data.frame(X = 40), interval = "prediction" , level = 0.95)
pred1
fit      lwr      upr
[1,] 123.9026 116.1134 131.6919

Compared with the unweighted prediction interval obtained earlier (76.48, 169.06), the interval width decreases substantially.
2) Based on the predicted values, the data are broken up into 3 groups. The estimated variance for each group is obtained. The weight used is wj = 1/Sj², where Sj² is the sample variance of the residuals for the mj observations in group j = 1, 2, 3.
# Method 2
#Find quantiles for Y^'s
quant3<-quantile(x = mod.fit$fitted.values, probs = c(1/3, 2/3), type = 1)
quant3
#Put Y into groups based upon quantiles
groups<-ifelse(mod.fit$fitted.values < quant3[1], 1,
ifelse(mod.fit$fitted.values < quant3[2], 2, 3))
#Quick way to find the variance of residuals for each group
var.eps<-tapply(X = mod.fit$residuals, groups, var)
var.eps
#Visualization of creating the groups
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residuals",
main = "Residuals vs. estimated mean response",
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
abline(v = quant3, col = "red", lwd = 3)
[Plot: Residuals vs. estimated mean response, with vertical red lines at the group boundaries]
#Put the group variances into a vector corresponding to each observation
group.var<-ifelse(groups == 1, var.eps[1],
ifelse(groups == 2, var.eps[2], var.eps[3]))
mod.fit2<-lm(formula = Y ~ X, data = set1, weight = 1/group.var)
summary(mod.fit2)
#Try calculating a P.I. for X = 40
pred2<-predict(object = mod.fit2, newdata =data.frame(X = 40), interval = "prediction", level = 0.95)
pred2
fit lwr upr
[1,] 123.08 115.03 131.13
3) Suppose Z ~ N(0, σ²). It can be shown that cZ ~ N(0, c²σ²). In the data simulation process we used εi ~ N(0, σ²xi²) as the error term, where σ² = 1. Thus, the most appropriate weight to use is wi = 1/xi². Of course, in a real-life data analysis setting this information would not be known; however, it serves here as the "best" method to compare with methods #1 and #2.
# Method 3
mod.fit3<-lm(formula = Y ~ X, data = set1, weight = 1/X^2)
summary(mod.fit3)
#Try calculating a P.I. for X = 40
pred3<-predict(object = mod.fit3, newdata = data.frame(X = 40), interval = "prediction", level = 0.95)
pred3
fit lwr upr
[1,] 123.4184 116.0678 130.7691
Here is an overall summary of the estimated βj's:

           name  (Intercept)    X
1 Least Squares         2.05 3.02
2         WLS 1         2.22 3.04
3         WLS 2         2.67 3.01
4         WLS 3         1.84 3.04
Since the constant variance assumption is violated, inferences using least-squares estimation may be incorrect.

Below are the prediction intervals for X = 40:

           name    fit    lwr    upr
1 Least Squares 122.77  76.48 169.06
2         WLS 1 123.90 116.11 131.69
3         WLS 2 123.08 115.03 131.13
4         WLS 3 123.42 116.07 130.77

Notice how different the regular least-squares interval (and hence the variance used in its calculation) is from the WLS intervals: the least-squares interval was obtained by ordinary least squares and may be incorrect, while the three WLS intervals are almost identical to one another.