This article was downloaded by: [University of Alberta]
On: 7 January 2009
Access details: [subscription number 713587337]
Publisher: Informa Healthcare. Informa Ltd, Registered in England and Wales, Registered Number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Encyclopedia of Biopharmaceutical Statistics
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

Adjustment for Covariates
Thomas T. Permutt, U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.

Online Publication Date: 23 April 2003

To cite this Section: Permutt, Thomas T. (2003) 'Adjustment for Covariates', Encyclopedia of Biopharmaceutical Statistics, 1:1, 18-21.

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
Adjustment for Covariates
Thomas Permutt
U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.
INTRODUCTION
The techniques of analysis of covariance are employed in
three mathematically similar but conceptually very dif-
ferent kinds of problem. Examples of all three kinds arise
in connection with the development of pharmaceuti-
cal products.
In the first case, a regression model is expected to fit
the data well enough to serve as the basis for prediction.
In testing the stability of a drug product, for example, the
potency may be modeled as a linear function of time, and
the possibility of different lines for different batches of
the product needs to be allowed for. The purpose of the
statistical analysis is to ensure, with a stated degree of
confidence, that the potency at a given time will be within
given limits.
The second and perhaps widest application of analy-
sis of covariance is in observational studies, such as
arise in the postmarketing phase of drug development.
It may be desired, for example, to study the association of some outcome with exposure to a drug. It is necessary to adjust for covariates that may be systematic-
ally associated both with the outcome and with the
exposure and so induce a spurious relationship between
the outcome and the exposure. In such studies the un-
explained variation is typically high, so the model is
not expected to fit the individual observations well. It
must, however, include all the important potential con-
founders and must have at least approximately the right
functional form, if a causal relationship, or the absence
of one, between the outcome and the exposure is to
be inferred. The third kind of application of analysis of covar-
iance, although the first historically,[1] is to randomized,
controlled experiments such as clinical trials of the effi-
cacy of new drugs. In such experiments, adjustment for
covariates is optional in a sense, because the validity
of unadjusted comparisons is ensured by randomiza-
tion. Adjustments properly planned and executed, how-
ever, can reduce the probabilities of inferential errors
and so help to control the size, cost, and time of clini-
cal trials.
The modeling problem is straightforward, well covered
in textbooks, and, strictly speaking, not a matter of
adjustment. The observational problem, in contrast, is
essentially intractable from the standpoint of formal
statistical inference; but heuristic methods have had wide
application and discussion. We focus here on the adjust-
ment for covariates in the experimental setting. This
problem has had relatively little attention in the litera-
ture, partly because early writings[1] are largely complete,
correct, and still sufficient. Unfortunately, the more recent
literature on modeling and on observational studies has
been misapplied to the experimental problem. Either a
well-fitting model is thought to be required, as in the
first problem, or the analysis is supposed to be heuristic,
as in the second. In fact, a rigorous theory of analysis of
covariance in controlled experiments can be developed, even in the absence of a good model for the covar-
iate effects.
ADJUSTING FOR BASELINE VALUES
Consider the case of a randomized trial of two treatments,
with a continuous measure of outcome (Y) that is also
measured at baseline (X). If the populations are normal or
the samples are large, the treatments might be compared
by a two-sample t-test on the difference in mean outcome
Y. Alternatively, the change from baseline, Y − X, might be analyzed in the same way. The difference between
groups in Y and the difference between groups in Y − X
have the same expectation, because the expected
difference between groups in X is zero. We therefore
have two unbiased estimators of the same parameter.
They have different variances, according to how well the
baseline predicts the outcome. If the variances (within
treatment groups) of baseline and outcome are the same
and the correlation is r, then the standard errors are in the
ratio (2 − 2r)^(1/2). The adjusted estimator is better if
r > 0.5.

(The opinions expressed are those of the author and not necessarily of the U.S. Food and Drug Administration.)
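As a quick illustrative check (not part of the original article), the ratio of standard errors can be verified by simulation; the correlation r = 0.7, unit variances, and the large sample size are assumptions made only for this sketch:

```python
import numpy as np

# Sketch: with equal within-group variances and correlation r between
# baseline X and outcome Y, the variance of the change score Y - X is
# 2*(1 - r) times that of Y, so the SEs are in the ratio sqrt(2 - 2r).
rng = np.random.default_rng(0)
r = 0.7                      # assumed baseline-outcome correlation
n = 200_000                  # large sample so empirical variances settle
cov = [[1.0, r], [r, 1.0]]   # unit variances for both X and Y
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

ratio = np.std(y - x) / np.std(y)   # empirical SE ratio
print(ratio)                        # close to sqrt(2 - 2*0.7) ~ 0.77
```

With r = 0.7 the ratio falls below 1, matching the text's rule that the change-from-baseline estimator wins whenever r > 0.5.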
Encyclopedia of Biopharmaceutical Statistics, DOI: 10.1081/E-EBS-120007378
Copyright © 2003 by Marcel Dekker, Inc. All rights reserved.
Of course, there is no need to choose. The average of
the two estimators has standard error proportional to
(1.25 − r)^(1/2), which is less than either of the two when-
ever 0.25 < r < 0.75. This average can be written as the
difference between treatment groups in Y − 0.5X. So
Y − 0.5X is a less variable measure of outcome than either
the mean raw score Y or the mean difference from
baseline Y − X, whenever the correlation is between 0.25
and 0.75. This can, but need not, be viewed as fitting
parallel straight lines with slope 0.5 to the two groups and
measuring the vertical distance between them.
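The comparison of the three statistics can be made explicit in a short sketch (the value r = 0.6 is an assumption for illustration only): with unit variances and correlation r, the per-subject variance of Y − bX is 1 + b² − 2br, which gives 1 at b = 0, 1.25 − r at b = 0.5, and 2 − 2r at b = 1.

```python
import numpy as np

def se_factor(b, r):
    """SE of the group difference in Y - b*X, up to a common factor
    of sigma * sqrt(2/n); derived from Var(Y - b*X) = 1 + b^2 - 2*b*r."""
    return np.sqrt(1.0 + b**2 - 2.0 * b * r)

r = 0.6  # assumed correlation, inside the (0.25, 0.75) window
for b in (0.0, 0.5, 1.0):
    print(b, se_factor(b, r))
# For 0.25 < r < 0.75 the b = 0.5 compromise beats both extremes;
# the minimum over b is attained at b = r (the least-squares slope).
```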
Naturally, there is no need to choose 0.5 either. The
difference in group means of any statistic of the form
Y − bX can be used to estimate the treatment effect. The
smallest variance, and so the most sensitive test, is ac-
hieved when b happens to coincide with the least-squares
common slope, but the variance does not increase steeply
as b moves away from this optimal value. Thus, even a
very rough a priori guess for b is likely to perform better
than either of the special cases b = 0 (no adjustment) and
b = 1 (subtract the baseline).
Finally, there is no need to guess. The least-squares
slope, calculated from the data, can be used for b,
without any consequences beyond the loss of a degree
of freedom for error. Asymptotic theory for the result-
ing adjusted estimator of the treatment effect was given
by Robinson,[2] and an exact, small-sample theory by
Tukey.[3]
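In practice the adjusted estimator is the treatment coefficient in a joint least-squares fit of outcome on treatment and baseline. A minimal sketch, with the true effect, slope, and error scale all assumed for illustration:

```python
import numpy as np

# Sketch of analysis of covariance as least squares: regress Y on a
# treatment indicator and the baseline X jointly; the treatment
# coefficient is the adjusted effect, with b estimated from the data.
rng = np.random.default_rng(1)
n = 5000
t = rng.integers(0, 2, size=n)    # randomized 0/1 treatment assignment
x = rng.normal(size=n)            # baseline covariate
effect = 1.0                      # assumed true treatment effect
y = effect * t + 0.6 * x + rng.normal(scale=0.8, size=n)

# Design matrix: intercept, treatment, baseline covariate
X = np.column_stack([np.ones(n), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # adjusted treatment-effect estimate, near 1.0
```

The fitted slope beta[2] plays the role of b; only one degree of freedom for error is spent estimating it, as the text notes.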
In general, then, the best way to adjust for a baseline
value is neither to ignore it nor to subtract it, but to
subtract a fraction of it. The fraction will be estimated from the data, simultaneously with the treatment effect,
by analysis of covariance. There is no need to check the
assumption that the outcome is linearly related to the
baseline value, because this assumption plays no role in
the analysis. If it did, not only the analysis of co-
variance would be tainted: After all, the unadjusted an-
alysis also assumes a linear relationship, with slope
0, and the change-from-baseline analysis assumes a
slope of 1.
OTHER COVARIATES
Any single, prespecified covariate can be adjusted for in
much the same way as a baseline measurement of the
outcome variable. That is, the mean of a linear function
Y − bX may be compared across treatment groups, the
coefficient b being estimated, simultaneously with the
treatment effect, by least squares. Again, the much-tested
assumption of a linear relationship between Y and X is
superfluous. Two other critical assumptions are some-
times neglected, however.
First, the covariate must be unaffected by treatment.
While it is possible to give an interpretation of an-
alysis of covariance adjusting for intermediate causes,
this interpretation is not often useful in clinical trials.
Any covariate measured before randomization is ac-
ceptable. With care, some covariates measured later may be assumed to be unaffected by treatment: the
weather, for example, in a study of seasonal allergies.
It may be noted that, while analysis of covariance is
not usually appropriate for variables in the causal path-
way, some of the advantages of analysis of covariance
are shared by instrumental-variables techniques[4] that
are appropriate.
Second, the covariate is assumed to be prespecified.
Model-searching procedures are unavoidable in obser-
vational studies, for there are typically many potential
confounding variables whose effects must be considered
and eliminated if necessary. Alarmingly little is known
about the statistical properties of such procedures, how-
ever, and what is known is not generally encouraging. It
is usual, although unjustifiable, to ignore the searching
process in reporting the results, presenting simply the
chosen model, its estimates, and its optimistic estimates
of variability.
Randomized trials are radically different from obser-
vational studies in this respect. There is no confounding,
because a covariate cannot be systematically associated
with treatment if it is not affected by treatment and if
treatment is assigned at random. The purpose of analysis
of covariance in randomized studies is to reduce the ran-
dom variability of the estimated treatment effects by eliminating some of what would otherwise be unexplained
variance in the observations. This difference has implica-
tions for the choice of covariates, which will be discussed
in the next section.
CHOICE OF COVARIATES
Whereas a confounder in an observational study is a
variable correlated both with the outcome and with the
treatment, a useful covariate in a randomized trial is a variable correlated just with the outcome. The greater the
absolute correlation, the more the reduction in residual
variance and so also in the standard error of the estimated
treatment effect. This benefit is realized whether the
treatment groups happen to be balanced with respect to
the covariate or not. It is neither necessary nor useful,
therefore, to choose covariates retrospectively, on the ba-
sis of imbalance.[5]
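The claim that the benefit does not depend on balance can be illustrated by a repeated-sampling sketch (the correlation r = 0.7, sample size, and replicate count are assumptions for the illustration): across randomizations, some of which are visibly imbalanced on X, the adjusted estimator's sampling variability shrinks by roughly the factor sqrt(1 − r²).

```python
import numpy as np

# Sketch: compare the spread of unadjusted and covariance-adjusted
# treatment-effect estimates over many re-randomizations.
rng = np.random.default_rng(4)
n, reps, r = 100, 2000, 0.7
unadj, adj = [], []
for _ in range(reps):
    t = rng.integers(0, 2, size=n)                  # fresh randomization
    x = rng.normal(size=n)                          # covariate, unaffected by t
    y = 1.0 * t + r * x + rng.normal(scale=np.sqrt(1 - r**2), size=n)
    unadj.append(y[t == 1].mean() - y[t == 0].mean())
    X = np.column_stack([np.ones(n), t, x])
    adj.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

ratio = np.std(adj) / np.std(unadj)
print(ratio)   # near sqrt(1 - r**2) ~ 0.71
```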
It is accordingly safe to prespecify, in the protocol for a
randomized trial, a covariate, or a few covariates, un-
affected by treatment but likely to be correlated with the
outcome. Analysis of covariance, adjusting for these
covariates, may then be carried out and relied on, without
any justification after the fact. The probability of Type I
error will be controlled by significance testing, and the
probability of Type II error will be less than if covariates were not used.
The improvement, however, depends on the correla-
tions (and partial correlations) between the covariates and
the outcome, and these may not be perfectly known ahead
of time. It might therefore seem advantageous to deter-
mine the correlations for some candidate covariates with
the data in view, and select a subset that explains a high
proportion of the variance of the outcome. With care, it is
possible to specify an unambiguous algorithm for
selecting a model and to control the probability of Type
I error.[3] It is not known, however, whether such pro-
cedures have any advantage with respect to Type II error
over simply prespecifying the model. In practice, in
critical efficacy trials the relevant covariates will often be
apparent in advance; and when they are not, it may not be
any easier or better to specify a set of candidates and
an algorithm for choosing among them than to specify a
single model.
The properties of models with large numbers of co-
variates are not well understood. Various rules of thumb
relating the number of variables to the sample sizes have
been given, but none has any compelling theoretical
justification. Furthermore, searches in large sets of
potential models probably share some of the defects of
models with many covariates, even if the chosen model has only a few covariates.
NONLINEAR MODELS
The word linear in the context of the analysis of
covariance may be understood in two senses. In many
applications, the model is linear in the covariates.
However, a model with polynomial or other nonlinear
covariate effects is still linear in the coefficients, and
the least-squares estimators are consequently linear functions of the outcome measurements, so the theory
of the general linear model applies. In contrast, logis-
tic, proportional-hazards, and Poisson regression models
all involve covariates in a more fundamentally nonli-
near way.
Nonlinear covariate effects can be added to an anal-
ysis of covariance without difficulty. The most common
examples are the 1/0 variables used to represent cate-
gorical covariates, but polynomial, logarithmic, expo-
nential, and other functions may sometimes be useful.
It is important to bear in mind, however, that in
randomized trials the purpose of the covariate model is
to reduce unexplained variance. Thus, nonlinear terms
should be introduced when they are expected to explain
substantial variance in the outcome, and not simply
because it is feared that the assumption of a linear relationship between the outcome and the covariate may
be violated.
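The 1/0 coding of a categorical covariate fits directly into the least-squares framework. A hedged sketch, in which the covariate (a three-level "center" factor), its effects, and the error scale are all invented for illustration:

```python
import numpy as np

# Sketch: a categorical covariate enters the analysis of covariance as
# 0/1 indicator columns -- still linear in the coefficients, so
# ordinary least squares applies unchanged.
rng = np.random.default_rng(2)
n = 3000
t = rng.integers(0, 2, size=n)             # randomized treatment
center = rng.integers(0, 3, size=n)        # hypothetical 3-level factor
center_effects = np.array([0.0, 0.5, -0.3])
y = 1.0 * t + center_effects[center] + rng.normal(scale=1.0, size=n)

# Indicator columns for centers 1 and 2, with center 0 as reference
dummies = (center[:, None] == np.array([1, 2])).astype(float)
X = np.column_stack([np.ones(n), t, dummies])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # treatment effect adjusted for center, near 1.0
```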
Conversely, trials with outcomes that are successes or
failures, survival times, or small numbers of events are
analyzed by methods that are nonlinear in the second
sense. Recent theoretical developments (the generalized
linear model) and computer programs have tended to
emphasize the analogies between these methods and the
linear model. Some of the same principles undoubtedly
apply when such methods are used to analyze random-
ized trials. For example, if a model selection procedure is
used, it is vital to understand the statistical properties of
the procedure as a whole, rather than simply to report
the nominal standard errors and p-values of the model
that happens to be chosen. On the other hand, the simi-
larity in form may conceal important differences in
mathematical structure between linear and nonlinear
models, and the linear results must not be casually as-
sumed to have nonlinear analogs. It is not clear, for
example, that the robustness of linear models against
misspecification in randomized trials carries over to all
the nonlinear cases.
INTERACTION
If the difference in mean outcome between treatments
changes as a covariate changes, there is said to be a
treatment-by-covariate interaction. In a drug trial, such a
finding would have important implications. In the extreme
case, the treatment effect might change direction as the
covariate changed. That is, a drug that was beneficial in
one subset of patients, identified by the covariate, would
be harmful in a different subset. Clearly such a drug
would be effective. Equally clearly, for such a drug to
be useful, the populations in which it was beneficial and harmful would need to be characterized. In less extreme
cases, where the magnitude but not the direction of the
treatment effect changes, considerations of risk and ben-
efit might also make it very desirable to estimate the
effect in different subgroups.
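A treatment-by-covariate interaction is modeled by adding the product of the treatment indicator and the covariate as an extra column; a nonzero coefficient on that column means the treatment effect varies with the covariate. A minimal sketch with an assumed interaction of 0.8:

```python
import numpy as np

# Sketch: the product column t*x carries the interaction; its
# coefficient estimates how the treatment effect changes per unit
# of the covariate.
rng = np.random.default_rng(3)
n = 4000
t = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = 0.5 * t + 0.4 * x + 0.8 * t * x + rng.normal(scale=0.7, size=n)

X = np.column_stack([np.ones(n), t, x, t * x])   # interaction column last
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])   # interaction coefficient, near 0.8
```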
The question of interaction often arises in connection
with analysis of covariance, but it really has little to do
with adjustment for covariates. Everything in the
preceding paragraph is equally true whether the
covariate in question is adjusted for, ignored, or even
unmeasured. Furthermore, if the treatment main effect
is to be estimated, it is still better to estimate it by
analysis of covariance, even without an interaction
term, than by the unadjusted difference in means. As
with the assumption of linearity, the analysis of
covariance is not invalidated by violation of the assumption of parallelism, for this assumption plays
no role in the analysis. Also, as with linearity, if this
assumption were crucial, its failure would taint as well
the unadjusted analysis, which also assumes parallel
regressions of the outcome on the covariate, but forces
them to have slope 0.
The possibility of interaction should be taken into
account whenever it appears at all probable that different
groups may respond differently. The reason for this is
practical and concerns the interpretation and application
of the results of a successful trial. However, the presence
of interaction or, what is more common, the inability to
rule interaction in or out with confidence, should not be
seen as invalidating analysis of covariance nor, especially,
as a reason to prefer unadjusted analysis.
REFERENCES
1. Fisher, R.A. Statistical Methods for Research Workers, 14th Ed.; Oliver and Boyd: Edinburgh, 1970; 272-286.
2. Robinson, J. J. R. Stat. Soc., Ser. B 1973, 35, 368-376.
3. Tukey, J.W. Control. Clin. Trials 1993, 14, 266-285.
4. Angrist, J.D.; Imbens, G.W.; Rubin, D.B. J. Am. Stat. Assoc. 1996, 91, 444-455.
5. Permutt, T. Stat. Med. 1990, 9, 1455-1462.