Chapter 4 Transformations and Weighting to Correct Model Inadequacies
UECM2263 Applied Statistical Model
Recall that regression model fitting has several implicit assumptions, including the following:
1. The model errors have mean zero and constant variance and are uncorrelated.
2. The model errors have a normal distribution. This assumption is made in order to conduct hypothesis tests and construct confidence intervals; under this assumption, the errors are also independent.
3. The form of the model, including the specification of the regressors, is correct.
Chapter 3 presented several techniques for checking the adequacy of the linear regression model. If the
linear regression model is not appropriate for a data set, there are two basic choices:
1. Abandon the regression model and develop a more appropriate model.
2. Employ some transformation on the data so that the regression model is appropriate for the transformed data.
We consider the use of transformations in this chapter.
4.1 Variance Stabilizing Transformation
The assumption of constant variance is a basic requirement of regression analysis. A common reason for the violation of this assumption is that the response variable Y follows a probability distribution in which the variance is functionally related to the mean.
For example, if Y follows a Poisson distribution with mean λ, the variance of Y equals its mean λ. Since the mean of Y is related to the regressor variable X, the variance of Y will be proportional to X.
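A quick simulation sketch of this mean-variance relationship (the rate 2 + 3x and the sample sizes are arbitrary illustrative choices):

# For Poisson responses, the variance tracks the mean, which grows with x
set.seed(1)
x <- rep(1:5, each = 200)
y <- rpois(n = length(x), lambda = 2 + 3 * x)   # E(Y) = Var(Y) = 2 + 3x
rbind(mean = tapply(y, x, mean), var = tapply(y, x, var))  # the two rows roughly agree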
Example 4.1:
Consider the simple linear regression model yi = β0 + β1xi + εi, where Var(εi) = σ²xi². Suppose we use the transformation Y' = Y/x. Is this a variance-stabilizing transformation?

Var(Y') = Var(Y/x) = (1/x²)Var(Y) = (1/x²)(σ²x²) = σ²

Yes, the variance of Y' becomes constant.
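This can be checked by simulation (a minimal sketch; β0 = 2, β1 = 3, and σ = 1 are arbitrary illustrative choices):

set.seed(10)
x <- rep(1:4, each = 500)
eps <- rnorm(n = length(x), mean = 0, sd = x)   # Var(eps_i) = x_i^2 (sigma^2 = 1)
y <- 2 + 3 * x + eps
tapply(y, x, var)       # variance of Y grows with x (roughly 1, 4, 9, 16)
tapply(y / x, x, var)   # variance of Y' = Y/x is roughly constant (about 1)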
Unequal error variances and non-normality of the error terms frequently appear together. To remedy these departures from the linear regression model, we need a transformation on Y, since the shapes and spreads of the distributions of Y need to be changed.
A transformation on Y may at the same time also help to linearize a curvilinear regression relation.
Figure 4.1 below contains some prototype regression relations where the skewness and the error variance increase with the mean response E(Y).

Figure 4.1: Prototype Regression Patterns

Transformations on Y: Y' = √Y, Y' = log10(Y), Y' = 1/Y

Note: A simultaneous transformation on X may also be helpful or necessary.
Useful Variance-Stabilizing Transformations:

Relationship of σ² to E(Y)     Transformation
σ² = constant                  Y' = Y (no transformation)
σ² ∝ E(Y)                      Y' = √Y (square root; Poisson data)
σ² ∝ E(Y)[1 − E(Y)]            Y' = sin⁻¹(√Y) (arcsine; binomial proportions 0 ≤ Yi ≤ 1)
σ² ∝ [E(Y)]²                   Y' = ln(Y) (natural log)
σ² ∝ [E(Y)]³                   Y' = Y^(−1/2) (reciprocal square root)
σ² ∝ [E(Y)]⁴                   Y' = Y^(−1) (reciprocal)
Example 4.2: Data on age (X) and plasma level of a polyamine (Y) for a portion of the 25 healthy children in a study are presented below in R code:
Age <- c(0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
Plasma <- c(13.44,12.84,11.91,20.09,15.60,10.11,11.38,10.28,8.96,
8.59,9.83,9,8.65,7.85,8.88,7.94,6.01,5.14,6.9,6.77,4.86,5.1,5.67,5.75,6.23)
#Use lm() function to fit the model
Blood.Reg <- lm(Plasma~Age)
#create the scatter plot
plot(x = Age, y =Plasma , xlab="Age", ylab = "Plasma", main = "Plasma Level vs. Age Before
Transformation", col = "Red", pch = 19, cex=1.5)
[Scatter plot: Plasma Level vs. Age Before Transformation]
The scatter plot indicates a curvilinear regression relationship, as well as greater variability for younger children than for older ones.
Based on the prototype regression patterns, we shall first try the logarithmic transformation, Y' = log10(Y).
#create the scatter plot after transformation
LY <- log10(Plasma)
plot(x = Age, y = LY, xlab = "Age", ylab = "log10(Plasma)", main = "Plasma Level vs. Age After
Transformation", col = "Red", pch = 19, cex = 1.2)
Note that the transformation not only has led to a reasonably linear regression relation, but the variability at the different levels of X has also become reasonably constant.
To further examine the reasonableness of the transformation Y' = log10(Y), we fitted the simple linear regression model to the transformed Y data and obtained:

Ŷ' = 1.1351 − 0.1023X
#To fit the model Y' = log10(Y) vs X
BloodT.Reg <- lm(log10(Plasma) ~ Age)
summary(BloodT.Reg)
# Create plot of Residuals vs. Age after transformation
plot(x = Age, y =BloodT.Reg$residuals, xlab ="Age", ylab = "Residuals", main = "Residuals vs. Age
after Transformation (y’ = log10(Y))", col = "blue", pch = 19, cex=1.5, panel.first = grid(col = "gray",
lty = "dotted"))
abline(h = 0, col = "red")
#Normal Probability plot After transformation
qqt.plot <- qqnorm(BloodT.Reg$residuals, main = "Normal Probability Plot After Transformation",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
A plot of residuals against X and a normal probability plot after the transformation are shown below. Both show evidence of the appropriateness of the linear regression model for the transformed Y data.
[Plot: Residuals vs. Age after Transformation (y' = log10(Y))]
[Normal Probability Plot After Transformation]
4.1.1 Transformations on Y: The Box-Cox Method

It is often difficult to determine from diagnostic plots, such as the ones in the plasma levels example, which transformation of Y is most appropriate for correcting skewness of the distributions of the error terms, unequal error variances, and nonlinearity of the regression function. The Box-Cox procedure automatically identifies a transformation from the family of power transformations on Y.
Consider the transformed regression model

Yi^(λ) = β0 + β1xi + εi

where

Y^(λ) = (Y^λ − 1)/λ   if λ ≠ 0
Y^(λ) = loge(Y)       if λ = 0

This definition was given by Box and Cox (1964). Due to the structure of a linear regression model, one can equivalently express this as

Y^(λ) = Y^λ       if λ ≠ 0
Y^(λ) = loge(Y)   if λ = 0
With this model there is an extra parameter, λ, that needs to be estimated. λ, β0, β1, and σ² can be estimated via maximum likelihood estimation. The estimated λ can then be used to suggest the type of transformation. For example,

λ = 2:    Y' = Y²
λ = 0.5:  Y' = √Y
λ = 0:    Y' = ln(Y) (by definition)
λ = −0.5: Y' = 1/√Y
λ = −1.0: Y' = 1/Y

Notice that if λ is estimated to be 1, no transformation is needed. The estimate for λ is commonly searched for in the range −2 to 2.

The MLE of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, SSE(λ), is minimum. It is usually determined by plotting SSE(λ) versus λ. Usually 10 to 20 values of λ are sufficient for estimation of the optimum value.
From Example 4.2, the Box-Cox results show:

λ     SSE(λ)        λ      SSE(λ)
1.0   78.0          −0.1   33.1
0.9   70.4          −0.3   31.2
0.7   57.8          −0.4   30.7
0.5   48.4          −0.5   30.6
0.3   41.4          −0.6   30.7
0.1   36.4          −0.7   31.1
0     34.5          −0.9   32.7
                    −1.0   33.9

Note that λ̂ = −0.5, with SSE(λ̂) = 30.6. Besides Y' = log10(Y), another choice is therefore Y' = 1/√Y.
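This grid search can be done by hand. Below is a minimal sketch of our own construction: it uses the standard scaled transformation y^(λ) = (y^λ − 1)/(λ·ẏ^(λ−1)), with ẏ the geometric mean of the y's, so that the residual sums of squares are comparable across λ; the grid of λ values and the function name are our choices.

# Grid search for lambda: minimize SS_E(lambda) using the scaled Box-Cox transform
boxcox.sse <- function(lambda, y, x) {
  gm <- exp(mean(log(y)))   # geometric mean of y
  yl <- if (abs(lambda) < 1e-8) gm * log(y) else (y^lambda - 1) / (lambda * gm^(lambda - 1))
  sum(resid(lm(yl ~ x))^2)  # residual sum of squares at this lambda
}
lambda.grid <- seq(from = -1, to = 1, by = 0.1)
sse <- sapply(lambda.grid, boxcox.sse, y = Plasma, x = Age)
cbind(lambda.grid, sse)     # the minimum should occur near lambda = -0.5

Plotting sse against lambda.grid reproduces the usual SSE(λ) curve.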
Another approach uses R code:
Example 4.4:
The trees data set ships with R; see help(trees) for specific information on the data set. The MASS package, which contains a set of functions and data sets, provides the boxcox() function used below.
Let Y = volume and X = height for the trees in the sample.
R-Codes
library(MASS)
trees
mod.fit<-lm(formula = Volume ~ Height, data=trees)
summary(mod.fit)
#Plot of Y vs. X with sample model
plot(x = trees$Height, y = trees$Volume, xlab = "Height",
ylab = "Volume", main = "Volume vs. Height",
panel.first = grid(col = "gray", lty = "dotted"))
abline(mod.fit)
#e.i vs. Yhat.i
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residual",
main = expression(paste("Residuals vs. ", hat(Y))),
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
#Determine lambda.hat using boxcox() in the MASS package
save.bc<-boxcox(object = mod.fit, lambda = seq(from = -2,to = 2, by = 0.01))
title(main = "Box-Cox transformation plot")
lambda.hat<-save.bc$x[save.bc$y == max(save.bc$y)]
lambda.hat
[Scatter plot: Volume vs. Height, with the fitted line]
Notice that the variability in the yi's increases as xi increases.
[Plot: Residuals vs. Ŷ]
The funnel shape occurs here. Based upon this and the scatter plot, it would be of interest to
consider a transformation of Y .
Also, notice the use of hat(Y) and the expression() function in the plot() function. Use demo(plotmath)
for more information about how to get mathematical symbols in plots.
Note: The function expression() returns a vector of type "expression" containing its arguments (unevaluated).
lambda.hat
[1] -0.19

[Box-Cox transformation plot: log-likelihood vs. λ, with the maximum near λ̂ = −0.19 and a 95% confidence interval marked]
The boxcox() function estimates λ using maximum likelihood estimation.
Here, it shows that the log-likelihood function is maximized when λ = −0.19. It also gives a likelihood-based 95% confidence interval of about −0.8 to 0.4 for λ. Notice that λ = 0 is in the interval (so one may want to consider the natural log transformation), and that λ = 1 is not in the interval (a transformation is needed).
Using λ̂ = −0.19 results in the following model, Y' = Y^(−0.19):
mod.fit2 <- lm(formula = Volume^lambda.hat ~ Height, data = trees)
plot(x = mod.fit2$fitted.values, y = mod.fit2$residuals,
     xlab = expression(hat(Y)^{-0.19}), ylab = "Residual",
     main = expression(paste("Residuals vs. ", hat(Y)^{-0.19})),
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
============================
Call:
lm(formula = Volume^lambda.hat ~ Height, data = trees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.959543 0.090273 10.629 1.62e-11 ***
Height -0.005526 0.001184 -4.668 6.38e-05 ***
[Plot: Residuals vs. Ŷ^(−0.19)]
It looks like λ̂ = −0.19 leads to an approximately constant variance. The sample model can then be expressed as

Ŷ^(−0.19) = 0.9595 − 0.005526×Height
Chapter 4 Transformations and Weighting to Correct Model Inadequacies
UECM2263 Applied Statistical Model Chapter 4 - 9
How would you find Ŷ?

Ŷ = (0.9595 − 0.005526×Height)^(1/(−0.19))
Since λ = 0 is in the interval, it may be of interest to try the natural log transformation, since this is easier to interpret (and more common).
R-Codes
mod.fit3 <- lm(formula = log(Volume) ~ Height, data = trees)
summary(mod.fit3)
plot(x = mod.fit3$fitted.values, y = mod.fit3$residuals,
     xlab = "log(Y)", ylab = "Residual", main = "Residuals vs. log(Y)",
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
Call:
lm(formula = log(Volume) ~ Height, data = trees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.79652 0.89053 -0.894 0.378
Height 0.05354 0.01168 4.585 8.03e-05 ***
[Plot: Residuals vs. log(Ŷ)]
The natural log transformation works as well. This sample model can be expressed as

log(Ŷ) = −0.7965 + 0.05354X
How would you find Ŷ?

Ŷ = e^(−0.7965 + 0.05354X)
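In R, predictions on the original scale follow by exponentiating (a sketch; heights 70 and 80 are arbitrary illustrative values):

# Predict on the log scale, then back-transform to the original volume scale
new.heights <- data.frame(Height = c(70, 80))
exp(predict(mod.fit3, newdata = new.heights))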
4.2 Transformations to Linearize the Model

When the distributions of the error terms are reasonably close to normal and have constant variance, transformations on X should be attempted. The reason why transformations on Y may not be desirable here is that a transformation on Y, such as Y' = √Y, may change the shape of the distribution of the error terms away from the normal distribution and may also lead to substantially differing error term variances.
Figure 4.2: Prototype Regression Patterns and Transformations of X

X' = log10(X) or X' = √X
X' = X² or X' = exp(X)
X' = 1/X or X' = exp(−X)
Example 4.3:
Data from an experiment on the effect of the number of days of training received (X) on performance (Y) in a battery of simulated sales situations are presented below:
Train <- c(.5,.5,1,1,1.5,1.5,2,2,2.5,2.5)
Score <- c(42.5,50.6,68.5,80.7,89,99.6,105.3,111.8,112.3,125.7)
perf.Reg <- lm(Score~Train)
# Create scatter plot of Training vs. Score before transformation
plot(x = Train, y = Score, xlab = "Training", ylab = "Performance",
     main = "Training vs. Performance before Transformation", col = "blue", pch = 19, cex = 1.5)
abline(perf.Reg)
# Create plot of Residuals vs. Predicted values before transformation
plot(x = perf.Reg$fitted.values, y = perf.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values Before Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
#Normal Probability plot Before transformation
qq.plot <- qqnorm(perf.Reg$residuals, main = "Normal Probability Plot Before Transformation",
                  xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                  col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qq.plot$y ~ qq.plot$x))
[Scatter plot: Training vs. Performance before Transformation]
[Plot: Residuals vs. Predicted Values Before Transformation]
[Normal Probability Plot Before Transformation]
The scatter plot indicates that the relation appears to be fairly curvilinear. Since the variability at the different X levels appears to be fairly constant, we shall consider a transformation on X.
Based on the prototype plots, we shall initially consider the square root transformation X' = √X.
# Create scatter plot of Training vs. Score after transformation
XP <- sqrt(Train)
plot(x = XP, y = Score, xlab = "Sqrt(Training)", ylab = "Performance",
     main = "Sqrt(Training) vs. Performance after Transformation", col = "blue", pch = 19, cex = 1.5)
#To fit the model y vs sqrt(x)
perfT.Reg <- lm(Score ~ I(sqrt(Train)))
summary(perfT.Reg)
plot(x = perfT.Reg$fitted.values, y = perfT.Reg$residuals, xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values After Transformation", col = "blue", pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "red")
qqt.plot <- qqnorm(perfT.Reg$residuals, main = "Normal Probability Plot After Transformation (x' = sqrt(x))",
                   xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE,
                   col = "blue", pch = 19, cex = 1.5, panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qqt.plot$y ~ qqt.plot$x))
[Scatter plot: Sqrt(Training) vs. Performance after Transformation]
[Plot: Residuals vs. Predicted Values After Transformation]
[Normal Probability Plot After Transformation (x' = sqrt(x))]
Note that the scatter plot of Y versus √X shows a reasonably linear relation. The variability of the scatter plot at the different X levels is about the same as before.
The plot of residuals against √X shows no evidence of unequal error variances. The normal probability plot after transformation also shows no indication of substantial departures from normality. Thus the simple linear regression model

Y = β0 + β1√X + ε

appears to be appropriate here. Fitting the model to the transformed data, we obtain:

Ŷ = −10.33 + 83.45√X
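The fitted model can then be used for estimation on the original X scale; for example (a sketch, with 2.25 days of training as an arbitrary illustrative value):

# Estimated mean performance after 2.25 days of training, with a 95% CI
predict(perfT.Reg, newdata = data.frame(Train = 2.25), interval = "confidence")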
4.2.1 Transformations on the Predictor Variable (X): The Box-Tidwell Method

Suppose that the relationship between Y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied. We want to select an appropriate transformation on the regressor variables so that the relationship between Y and the transformed regressor is as simple as possible.
Box and Tidwell describe an analytical procedure for determining the form of the transformation on X. Assume that the response variable Y is related to a power of the regressor, say ξ, as

E(Y) = f(ξ, β0, β1) = β0 + β1ξ

where

ξ = X^α   if α ≠ 0
ξ = ln(X) if α = 0

and β0, β1, and α are unknown parameters.
The procedure is:

Let α0 = 1 be the initial guess of α, so that ξ0 = X^(α0) = X; that is, no transformation at all is applied in the first iteration.

Expanding about the initial guess in a Taylor series and ignoring terms of higher than first order gives

E(Y) = f(ξ0, β0, β1) + (α − α0){df(ξ, β0, β1)/dα}|α=α0
     = β0 + β1X + (α − 1){df(ξ, β0, β1)/dα}|α=1
Note: If the term in braces were known, it could be treated as an additional regressor variable, and it would be possible to estimate the parameters β0, β1, and α by least squares. Now

{df(ξ, β0, β1)/dα}|α=α0 = {df(ξ, β0, β1)/dξ · dξ/dα}|α=α0 = β1{d(X^α)/dα}|α=1 = β1 X ln(X)

Thus,

E(Y) = β0 + β1X + (α − 1)β1 X ln(X) = β0* + β1*X + β2*W

where β2* = (α − 1)β1 and W = X ln(X).
Note that β1 can be estimated by fitting the model Ŷ = β̂0 + β̂1X, and β2* can be estimated by fitting the model Ŷ = β̂0* + β̂1*X + β̂2*W. Take

α1 = β̂2*/β̂1 + 1

as the revised estimate of α. This procedure may now be repeated using the new regressor X' = X^(α1) in the calculations.
Box and Tidwell (1962) noted that this procedure usually converges quite rapidly, and often the first-stage result α1 is a satisfactory estimate of α. However, round-off error is potentially a problem. Convergence problems may be encountered in cases where the error standard deviation is large or when the range of the regressor is very small compared to its mean.

Note: β̂1 and β̂1* generally differ.
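The update can be packaged as a small R function (a sketch of our own; the function name is made up, it follows the update rule used in these notes, and it assumes x > 0):

# One Box-Tidwell step: return the revised exponent, starting from alpha0
bt.step <- function(y, x, alpha0 = 1) {
  xt <- x^alpha0                                   # current transformed regressor
  b1 <- coef(lm(y ~ xt))[2]                        # beta1-hat from the simple fit
  b2 <- coef(lm(y ~ xt + I(xt * log(xt))))[3]      # beta2*-hat from the augmented fit
  unname(b2 / b1 + alpha0)                         # revised estimate of alpha
}
# With the windmill data of Example 4.5 below:
# alpha1 <- bt.step(Y, X)          # about -0.92
# alpha2 <- bt.step(Y, X, alpha1)  # about -1.01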
Example 4.5:
A research engineer is investigating the use of a windmill to generate electricity. He has collected data
on the DC output (Y ) from his windmill and the corresponding wind velocity ( X ).
R-Codes:
Y <- c(.123, .5, .653, .558, 1.057, 1.137, 1.144, 1.194, 1.562, 1.582, 1.501, 1.737, 1.822, 1.866, 1.93,
1.8, 2.088, 2.179, 2.166, 2.112, 2.303, 2.294, 2.386, 2.236,2.31)
X <- c(2.45, 2.7, 2.9, 3.05, 3.4, 3.6, 3.95, 4.1, 4.6, 5, 5.45,5.8, 6, 6.2, 6.35, 7,7.4, 7.85, 8.15, 8.8, 9.1,
9.55, 9.7, 10, 10.2)
plot(X, Y, xlab = "Wind Velocity, X", ylab = "DC Output, Y", main = "DC Output vs. Wind Velocity",
col = "Blue", pch = 19, cex=1.5)
#First iteration
Fit0 <- lm(Y~X)
FitT0 <- lm(Y~X+I(X*log(X)))
Fit0
FitT0
[Scatter plot: DC Output vs. Wind Velocity]
The scatter plot suggests that the relationship between DC output and wind speed is not a straight line and that some transformation on X may be appropriate.
#First iteration
Call:
lm(formula = Y ~ X)
Coefficients:
(Intercept) X
0.1309 0.2411
Call:
lm(formula = Y ~ X + I(X * log(X)))
Coefficients:
(Intercept) X I(X * log(X))
-2.4168 1.5344 -0.4626
We begin with the initial guess α0 = 1 and fit the two models:

Ŷ = β̂0 + β̂1X = 0.1309 + 0.2411X

and

Ŷ = β̂0* + β̂1*X + β̂2*W = −2.4168 + 1.5344X − 0.4626W

and we calculate

α1 = β̂2*/β̂1 + 1 = −0.4626/0.2411 + 1 = −0.9187

as the improved estimate of α. Note that this estimate of α is very close to −1, so the reciprocal transformation on X, X' = 1/X, is appropriate.
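As a check (a sketch of our own, not part of the original notes), the suggested reciprocal model can be fitted directly:

# Fit DC output against 1/X, the transformation suggested by the procedure
recip.fit <- lm(Y ~ I(1/X))
summary(recip.fit)
plot(x = 1/X, y = Y, xlab = "1/Wind Velocity", ylab = "DC Output",
     main = "DC Output vs. 1/Wind Velocity", col = "Blue", pch = 19, cex = 1.5)
abline(recip.fit)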
R-codes:
#Install the car package from CRAN if needed: install.packages("car")
library(car)
boxTidwell(Y ~ X)  #called box.tidwell() in older versions of car

Output:
Initial Power -0.91830
Score Statistic -9.13243
p-value 0.00000
MLE of Power -0.83334
iterations = 3

(The initial power is the first-step estimate α1, and W = X ln(X) as defined in the derivation above.)
#Second iteration
Alpha1 <- FitT0$coefficients[3] / Fit0$coefficients[2] + 1
lm(Y ~ I(X^Alpha1))
lm(Y ~ I(X^Alpha1) + I((X^Alpha1) * log(X^Alpha1)))
#Second iteration
Call:
lm(formula = Y ~ I(X^ Alpha1))
Coefficients:
(Intercept) I(X^ Alpha1)
3.101 -6.683
Call:
lm(formula = Y ~ I(X^ Alpha1) + I((X^ Alpha1) * log(X^ Alpha1)))
Coefficients:
(Intercept) I(X^Alpha1) I((X^ Alpha1) * log(X^ Alpha1))
3.2409 -6.4445 0.5994
To perform a second iteration, define a new regressor variable X' = X^(−0.9183) and fit the models

Ŷ = β̂0 + β̂1X' = 3.101 − 6.683X'

and

Ŷ = β̂0* + β̂1*X' + β̂2*W' = 3.2409 − 6.4445X' + 0.5994W'

where W' = X' ln(X'). The second-step estimate of α is thus

α2 = β̂2*/β̂1 + α1 = 0.5994/(−6.683) + (−0.9183) = −1.01

which again supports the use of the reciprocal transformation on X.
4.3 Generalized and Weighted Least Squares

4.3.1 Generalized Least Squares
A difficulty with transformations on Y is that they may create an inappropriate regression relationship. When an appropriate regression relationship has been found but the variances of the error terms are unequal, an alternative to transformation is weighted least squares.

Consider the model

Y = Xβ + ε,  E(ε) = 0,  Var(ε) = σ²V

The ordinary least-squares estimator β̂ = (X'X)⁻¹X'y is no longer appropriate.

Note: σ²V is the covariance matrix of the errors, and we define V = KK, where K is a nonsingular symmetric matrix. The matrix K is often called the square root of V.

Define the new variables

Z = K⁻¹y,  B = K⁻¹X,  g = K⁻¹ε

The regression model can then be transformed as

K⁻¹y = K⁻¹Xβ + K⁻¹ε,  or  Z = Bβ + g

where the errors in this transformed model have zero expectation, i.e.

E(g) = K⁻¹E(ε) = 0

and the covariance matrix of g is

Var(g) = E{[g − E(g)][g − E(g)]'} = E(gg') = E(K⁻¹εε'K⁻¹) = K⁻¹E(εε')K⁻¹ = σ²K⁻¹VK⁻¹ = σ²K⁻¹KKK⁻¹ = σ²I

Thus, the elements of g have mean zero and constant variance and are uncorrelated. Since the errors g in this new model satisfy the usual assumptions, we may apply ordinary least squares. The least-squares function is

S(β) = g'g = ε'V⁻¹ε = (y − Xβ)'V⁻¹(y − Xβ)

The normal equations are

(X'V⁻¹X)β̃ = X'V⁻¹y

and their solution is

β̃ = (X'V⁻¹X)⁻¹X'V⁻¹y

β̃ is called the generalized least-squares estimator of β.
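A minimal matrix sketch of this estimator in R (simulated data of our own; V is taken to be diagonal purely for illustration):

# Generalized least squares by hand: beta = (X'V^-1 X)^-1 X'V^-1 y
set.seed(2)
n <- 50
x <- runif(n, min = 1, max = 10)
X <- cbind(1, x)                                   # design matrix with an intercept
V <- diag(x^2)                                     # assumed error covariance structure
y <- as.vector(X %*% c(2, 3) + rnorm(n, sd = x))   # errors with Var(eps_i) = x_i^2
Vinv <- solve(V)
beta.gls <- solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv %*% y
beta.gls                                           # close to the true values (2, 3)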
Notes:
1. E(β̃) = β.
2. Var(β̃) = σ²(B'B)⁻¹ = σ²(X'V⁻¹X)⁻¹.
3. When V = I, the error terms ε are uncorrelated with equal variances, and the ordinary least-squares estimator β̂ = (X'X)⁻¹X'y is appropriate.
4. When V is a diagonal matrix with unequal diagonal elements, the error terms ε are uncorrelated but have unequal variances, and the generalized least-squares estimator β̃ = (X'V⁻¹X)⁻¹X'V⁻¹y is used.
4.3.2 Weighted Least Squares

When the errors ε are uncorrelated but have unequal variances, so that

V = diag(1/w1, 1/w2, …, 1/wn),

let W = V⁻¹. (Since V is a diagonal matrix, W is also diagonal, with diagonal elements, or weights, w1, w2, …, wn.) The weighted least-squares estimator

β̂ = (X'WX)⁻¹X'Wy

is then used.
Notes:
1) "wi" is used to stand for the weight of observation i.
2) These estimators are unbiased and have minimum variance among all unbiased estimators.
3) Since the weight wi is inversely related to the variance σi², it reflects the amount of information contained in the observation yi. Thus, an observation yi that has a large variance receives less weight than another observation that has a smaller variance. The more precise yi is (i.e., the smaller σi² is), the more information yi provides about E(yi), and therefore the more weight it should receive in fitting the regression function.
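As a quick sketch (continuing the simulated x, X, and y from the GLS example above), lm() with a weights argument reproduces the matrix formula:

# lm(..., weights = w) computes (X'WX)^-1 X'Wy with W = diag(w)
w <- 1 / x^2                                    # weights inversely proportional to the variances
Wmat <- diag(w)
beta.wls <- solve(t(X) %*% Wmat %*% X) %*% t(X) %*% Wmat %*% y
cbind(beta.wls, coef(lm(y ~ x, weights = w)))   # the two columns agree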
Problem: wi is usually unknown.

Solutions:

1) Examine a plot of ei vs. ŷi (using regular least-squares estimates). When the constant variance assumption is violated, the plot may look like:

[Sketch: residuals ei vs. ŷi, with the spread increasing as ŷi increases]

Divide the plot into 3 to 5 groups. Estimate the variance of the ei's for each group by Sj²:

[Sketch: the same residual plot divided into vertical groups]

Set wj = 1/Sj², where j denotes the group number.

2) Suppose the variance of the residuals varies with one of the predictor variables. For example, suppose the following plot is obtained:

[Sketch: residuals ei vs. Xk, with the spread increasing as Xk increases]

Fit a simple regression model (an estimated variance or standard deviation function) using the ei² (or |ei|) as the response variable and Xik as the predictor variable. The predicted values from the estimated variance or standard deviation function for each observation are then used to find the weights, wi = 1/V̂i, where V̂i denotes the fitted values (see the sketch after this list).

3) Estimate the regression coefficients using these weights.
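A minimal, self-contained sketch of solution 2, under simulated data of our own (when a standard deviation function is fitted, the weight is one over the squared fitted standard deviation):

# Sketch of solution 2: estimate a standard deviation function, then reweight
set.seed(3)
x <- runif(100, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 0.5 * x)  # spread grows with x
fit.ols <- lm(y ~ x)                          # step 1: ordinary least squares
sd.fun <- lm(abs(resid(fit.ols)) ~ x)         # step 2: |e_i| vs. x as the sd function
w <- 1 / fitted(sd.fun)^2                     # step 3: weights = 1 / (fitted sd)^2
fit.wls <- lm(y ~ x, weights = w)             # refit with the estimated weights
summary(fit.wls)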
Notes:
1. Inferences are usually done assuming W is known, even though it really is not. By using estimated quantities in W, there is a source of variability that is not being accounted for.
2. R² does not have the same meaning as for unweighted least squares.
Example 4.6: Fit a regression model using weighted least squares.
We simulate some data to illustrate non-constant variance.
#Simulate data with nonconstant variance
X<-seq(from = 1, to = 40, by = 0.25)
#random generation for the normal distribution
set.seed(5)
epsilon<-rnorm(n = length(X), mean = 0, sd = 1)
epsilon2<-X*epsilon
#Var(epsilon2) = X^2 * 1 = X^2 (non-constant variance); recall Var(epsilon) = 1
Y<- 2 + 3*X + epsilon2
set1<-data.frame(Y, X)
#Y vs. X with sample model
plot(x = X, y = Y, xlab = "X", ylab = "Y", main = "Y vs. X", panel.first = grid(col = "gray", lty = "dotted"))
mod.fit<-lm(formula = Y ~ X, data = set1)
abline(mod.fit, col="red")
summary(mod.fit)
Call:
lm(formula = Y ~ X, data = set1)
Residuals:
Min 1Q Median 3Q Max
-67.436 -9.892 -1.117 10.978 78.869
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.051 3.818 0.537 0.592
X 3.018 0.163 18.514 <2e-16 ***
[Scatter plot: Y vs. X, with the fitted least-squares line]
From examining the plot, one can see that the variance is a function of X (as X increases, the variability increases).
#Residuals vs. Yhat
plot(x = mod.fit$fitted.values, y = mod.fit$residuals, xlab = expression(hat(Y)), ylab ="Residuals",
main = "Residuals vs. estimated mean response", panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
[Plot: Residuals vs. estimated mean response]
The megaphone shape above indicates non-constant variance.
#Try calculating a P.I. for X = 40 (will use later)
pred<-predict(object = mod.fit, newdata = data.frame(X = 40), interval = "prediction", level = 0.95)
fit lwr upr
[1,] 122.773 76.48418 169.0618
Three different weighted least squares methods are investigated.

1) Based on the predicted values, the data are broken up into 5 groups. The estimated variance for each group is obtained. The weight used is wj = 1/Sj², where Sj² is the sample variance of the residuals for the mj observations in group j = 1, …, 5.
# Method 1
#Find quantiles for Y
quant5<-quantile(x = mod.fit$fitted.values, probs =c(0.2, 0.4, 0.6, 0.8), type = 1)
round(quant5,2)
#Put Y into groups based upon quantiles
groups<-ifelse(mod.fit$fitted.values < quant5[1], 1,
ifelse(mod.fit$fitted.values < quant5[2], 2,
ifelse(mod.fit$fitted.values < quant5[3], 3,
ifelse(mod.fit$fitted.values < quant5[4], 4,
5))))
round(quant5, 2)
  20%   40%   60%   80%
28.46 51.85 75.99 99.38

groups
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  1   1   1   1   1   1   1   1   1   1   1   2   2   2   2   2   2   2   2   2
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  2   2   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
  3   3   3   3   3   3   3   3   3   3   3   3   3   3   4   4   4   4   4   4
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
  4   4   4   4   4   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157
  5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5

Note: with 157 observations, each group contains roughly 157 × 20% ≈ 31 observations; the 20%, 40%, 60%, and 80% quantiles of the fitted values (28.46, 51.85, 75.99, 99.38) are the group boundaries.

#Quick way to find the variance of residuals for each group
# function “tapply” = apply a function to each cell of a ragged array, that is to each (non-empty) group
of values given by a unique combination of the levels of certain factors.
var.eps<-tapply(X = mod.fit$residuals, groups, var)
var.eps
#Visualization of creating the groups
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residuals",
main = "Residuals vs. estimated mean response",
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
abline(v = quant5, col = "red", lwd = 3)
[Plot: Residuals vs. estimated mean response, with vertical red lines at the group boundaries]
#Put the group variances into a vector corresponding to each observation
group.var<-ifelse(groups == 1, var.eps[1],
ifelse(groups == 2, var.eps[2],
ifelse(groups == 3, var.eps[3],
ifelse(groups == 4, var.eps[4],
var.eps[5]))))
var.eps
        1          2          3          4          5
 25.91165  148.35059  331.15305 1036.06249 1172.47827

Note: group 1 (the 31 observations with the smallest fitted values) has estimated residual variance 25.91, while group 5 (the 32 observations with the largest fitted values) has estimated variance 1172.48. The group.var vector then assigns each observation its own group's estimated variance: its first 31 entries equal 25.91165, the next 31 equal 148.35059, and so on.
mod.fit1<-lm(formula = Y ~ X, data = set1, weight = 1/group.var)
summary(mod.fit1)
#Try calculating a P.I. for X = 40
pred1<-predict(object = mod.fit1, newdata=data.frame(X = 40), interval = "prediction" , level = 0.95)
pred1
fit      lwr      upr
[1,] 123.9026 116.1134 131.6919

Compared with the unweighted prediction interval obtained earlier (76.48, 169.06), the interval width decreases substantially.
2) Based on the predicted values, the data are broken up into 3 groups. The estimated variance for each group is obtained. The weight used is wj = 1/Sj², where Sj² is the sample variance of the residuals for the mj observations in group j = 1, 2, 3.
# Method 2
#Find quantiles for Y^'s
quant3<-quantile(x = mod.fit$fitted.values, probs = c(1/3, 2/3), type = 1)
quant3
#Put Y into groups based upon quantiles
groups<-ifelse(mod.fit$fitted.values < quant3[1], 1,
ifelse(mod.fit$fitted.values < quant3[2], 2, 3))
#Quick way to find the variance of residuals for each group
var.eps<-tapply(X = mod.fit$residuals, groups, var)
var.eps
#Visualization of creating the groups
plot(x = mod.fit$fitted.values, y = mod.fit$residuals,
xlab = expression(hat(Y)), ylab = "Residuals",
main = "Residuals vs. estimated mean response",
panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "darkgreen")
abline(v = quant3, col = "red", lwd = 3)
[Plot: Residuals vs. estimated mean response, with vertical red lines at the group boundaries]
#Put the group variances into a vector corresponding to each observation
group.var<-ifelse(groups == 1, var.eps[1],
ifelse(groups == 2, var.eps[2], var.eps[3]))
mod.fit2<-lm(formula = Y ~ X, data = set1, weight = 1/group.var)
summary(mod.fit2)
#Try calculating a P.I. for X = 40
pred2<-predict(object = mod.fit2, newdata =data.frame(X = 40), interval = "prediction", level = 0.95)
pred2
fit lwr upr
[1,] 123.08 115.03 131.13
3) Suppose Z ~ N(0, σ²). It can be shown that cZ ~ N(0, c²σ²). In the data simulation process we used εi ~ N(0, σ²xi²) as the error term, where σ² = 1. Thus, the most appropriate weight to use is wi = 1/xi². Of course, in a real-life data analysis setting this information would not be known; however, it serves here as the "best" method to compare with methods #1 and #2.
# Method 3
mod.fit3<-lm(formula = Y ~ X, data = set1, weight = 1/X^2)
summary(mod.fit3)
#Try calculating a P.I. for X = 40
pred3<-predict(object = mod.fit3, newdata = data.frame(X = 40), interval = "prediction", level = 0.95)
pred3
fit lwr upr
[1,] 123.4184 116.0678 130.7691
Here is an overall summary of the estimated βj's:

           name  (Intercept)    X
1 Least Squares         2.05 3.02
2         WLS 1         2.22 3.04
3         WLS 2         2.67 3.01
4         WLS 3         1.84 3.04
Since the constant variance assumption is violated, inferences using least-squares estimation may be incorrect.

Below are the prediction intervals for X = 40:

           name    fit    lwr    upr
1 Least Squares 122.77  76.48 169.06
2         WLS 1 123.90 116.11 131.69
3         WLS 2 123.08 115.03 131.13
4         WLS 3 123.42 116.07 130.77

Notice how different the regular least-squares interval (and hence the variance used in its calculation) is from the WLS intervals: the least-squares interval was obtained by ordinary least squares and may be incorrect, while the three WLS intervals are almost identical to one another.