This article was downloaded by: [University of Alberta]
On: 7 January 2009
Access details: [subscription number 713587337]
Publisher: Informa Healthcare. Informa Ltd, Registered in England and Wales, Registered Number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Encyclopedia of Biopharmaceutical Statistics
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

Adjustment for Covariates
Thomas T. Permutt, U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.

Online Publication Date: 23 April 2003

To cite this Section: Permutt, Thomas T. (2003) 'Adjustment for Covariates', Encyclopedia of Biopharmaceutical Statistics, 1:1, 18-21.

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
Adjustment for Covariates
Thomas Permutt
U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.
INTRODUCTION
The techniques of analysis of covariance are employed in
three mathematically similar but conceptually very dif-
ferent kinds of problem. Examples of all three kinds arise
in connection with the development of pharmaceuti-
cal products.
In the first case, a regression model is expected to fit
the data well enough to serve as the basis for prediction.
In testing the stability of a drug product, for example, the
potency may be modeled as a linear function of time, and
the possibility of different lines for different batches of
the product needs to be allowed for. The purpose of the
statistical analysis is to ensure, with a stated degree of
confidence, that the potency at a given time will be within
given limits.
The second and perhaps widest application of analy-
sis of covariance is in observational studies, such as
arise in the postmarketing phase of drug development.
It may be desired, for example, to study the association of some outcome with exposure to a drug. It is necessary to adjust for covariates that may be systematic-
ally associated both with the outcome and with the
exposure and so induce a spurious relationship between
the outcome and the exposure. In such studies the un-
explained variation is typically high, so the model is
not expected to fit the individual observations well. It
must, however, include all the important potential con-
founders and must have at least approximately the right
functional form, if a causal relationship, or the absence
of one, between the outcome and the exposure is to
be inferred. The third kind of application of analysis of covar-
iance, although the first historically,[1] is to randomized,
controlled experiments such as clinical trials of the effi-
cacy of new drugs. In such experiments, adjustment for
covariates is optional in a sense, because the validity
of unadjusted comparisons is ensured by randomiza-
tion. Adjustments properly planned and executed, how-
ever, can reduce the probabilities of inferential errors
and so help to control the size, cost, and time of clini-
cal trials.
The modeling problem is straightforward, well covered
in textbooks, and, strictly speaking, not a matter of
adjustment. The observational problem, in contrast, is
essentially intractable from the standpoint of formal
statistical inference; but heuristic methods have had wide
application and discussion. We focus here on the adjust-
ment for covariates in the experimental setting. This
problem has had relatively little attention in the litera-
ture, partly because early writings[1] are largely complete,
correct, and still sufficient. Unfortunately, the more recent
literature on modeling and on observational studies has
been misapplied to the experimental problem. Either a
well-fitting model is thought to be required, as in the
first problem, or the analysis is supposed to be heuristic,
as in the second. In fact, a rigorous theory of analysis of
covariance in controlled experiments can be developed, even in the absence of a good model for the covar-
iate effects.
ADJUSTING FOR BASELINE VALUES
Consider the case of a randomized trial of two treatments,
with a continuous measure of outcome (Y) that is also
measured at baseline (X). If the populations are normal or
the samples are large, the treatments might be compared
by a two-sample t-test on the difference in mean outcome
Y. Alternatively, the change from baseline, Y − X, might be analyzed in the same way. The difference between
groups in Y and the difference between groups in Y − X
have the same expectation, because the expected
difference between groups in X is zero. We therefore
have two unbiased estimators of the same parameter.
They have different variances, according to how well the
baseline predicts the outcome. If the variances (within
treatment groups) of baseline and outcome are the same
and the correlation is r, then the standard errors are in the
ratio (2 − 2r)^(1/2). The adjusted estimator is better if
r > 0.5.

(The opinions expressed are those of the author and not necessarily of the U.S. Food and Drug Administration.)
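As a quick illustrative check (not part of the original article), the ratio of standard errors can be verified by simulation; the correlation r = 0.7, unit variances, and the large sample size are assumptions made only for this sketch:

```python
import numpy as np

# Sketch: with equal within-group variances and correlation r between
# baseline X and outcome Y, the variance of the change score Y - X is
# 2*(1 - r) times that of Y, so the SEs are in the ratio sqrt(2 - 2r).
rng = np.random.default_rng(0)
r = 0.7                      # assumed baseline-outcome correlation
n = 200_000                  # large sample so empirical variances settle
cov = [[1.0, r], [r, 1.0]]   # unit variances for both X and Y
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

ratio = np.std(y - x) / np.std(y)   # empirical SE ratio
print(ratio)                        # close to sqrt(2 - 2*0.7) ~ 0.77
```

With r = 0.7 the ratio falls below 1, matching the text's rule that the change-from-baseline estimator wins whenever r > 0.5.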
Encyclopedia of Biopharmaceutical Statistics, DOI: 10.1081/E-EBS-120007378
Copyright © 2003 by Marcel Dekker, Inc. All rights reserved.
Of course, there is no need to choose. The average of
the two estimators has standard error proportional to
(1.25 − r)^(1/2), which is less than either of the two when-
ever 0.25 < r < 0.75. This average can be written as the
difference between treatment groups in Y − 0.5X. So
Y − 0.5X is a less variable measure of outcome than either
the mean raw score Y or the mean difference from
baseline Y − X, whenever the correlation is between 0.25
and 0.75. This can, but need not, be viewed as fitting
parallel straight lines with slope 0.5 to the two groups and
measuring the vertical distance between them.
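The comparison of the three statistics can be made explicit in a short sketch (the value r = 0.6 is an assumption for illustration only): with unit variances and correlation r, the per-subject variance of Y − bX is 1 + b² − 2br, which gives 1 at b = 0, 1.25 − r at b = 0.5, and 2 − 2r at b = 1.

```python
import numpy as np

def se_factor(b, r):
    """SE of the group difference in Y - b*X, up to a common factor
    of sigma * sqrt(2/n); derived from Var(Y - b*X) = 1 + b^2 - 2*b*r."""
    return np.sqrt(1.0 + b**2 - 2.0 * b * r)

r = 0.6  # assumed correlation, inside the (0.25, 0.75) window
for b in (0.0, 0.5, 1.0):
    print(b, se_factor(b, r))
# For 0.25 < r < 0.75 the b = 0.5 compromise beats both extremes;
# the minimum over b is attained at b = r (the least-squares slope).
```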
Naturally, there is no need to choose 0.5 either. The
difference in group means of any statistic of the form
Y − bX can be used to estimate the treatment effect. The
smallest variance, and so the most sensitive test, is ac-
hieved when b happens to coincide with the least-squares
common slope, but the variance does not increase steeply
as b moves away from this optimal value. Thus, even a
very rough a priori guess for b is likely to perform better
than either of the special cases b = 0 (no adjustment) and
b = 1 (subtract the baseline).
Finally, there is no need to guess. The least-squares
slope, calculated from the data, can be used for b,
without any consequences beyond the loss of a degree
of freedom for error. Asymptotic theory for the result-
ing adjusted estimator of the treatment effect was given
by Robinson,[2] and an exact, small-sample theory by
Tukey.[3]
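In practice the adjusted estimator is the treatment coefficient in a joint least-squares fit of outcome on treatment and baseline. A minimal sketch, with the true effect, slope, and error scale all assumed for illustration:

```python
import numpy as np

# Sketch of analysis of covariance as least squares: regress Y on a
# treatment indicator and the baseline X jointly; the treatment
# coefficient is the adjusted effect, with b estimated from the data.
rng = np.random.default_rng(1)
n = 5000
t = rng.integers(0, 2, size=n)    # randomized 0/1 treatment assignment
x = rng.normal(size=n)            # baseline covariate
effect = 1.0                      # assumed true treatment effect
y = effect * t + 0.6 * x + rng.normal(scale=0.8, size=n)

# Design matrix: intercept, treatment, baseline covariate
X = np.column_stack([np.ones(n), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # adjusted treatment-effect estimate, near 1.0
```

The fitted slope beta[2] plays the role of b; only one degree of freedom for error is spent estimating it, as the text notes.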
In general, then, the best way to adjust for a baseline
value is neither to ignore it nor to subtract it, but to
subtract a fraction of it. The fraction will be estimated from the data, simultaneously with the treatment effect,
by analysis of covariance. There is no need to check the
assumption that the outcome is linearly related to the
baseline value, because this assumption plays no role in
the analysis. If it did, not only the analysis of co-
variance would be tainted: After all, the unadjusted an-
alysis also assumes a linear relationship, with slope
0, and the change-from-baseline analysis assumes a
slope of 1.
OTHER COVARIATES
Any single, prespecified covariate can be adjusted for in
much the same way as a baseline measurement of the
outcome variable. That is, the mean of a linear function
Y − bX may be compared across treatment groups, the
coefficient b being estimated, simultaneously with the
treatment effect, by least squares. Again, the much-tested
assumption of a linear relationship between Y and X is
superfluous. Two other critical assumptions are some-
times neglected, however.
First, the covariate must be unaffected by treatment.
While it is possible to give an interpretation of an-
alysis of covariance adjusting for intermediate causes,
this interpretation is not often useful in clinical trials.
Any covariate measured before randomization is ac-
ceptable. With care, some covariates measured later may be assumed to be unaffected by treatment: the
weather, for example, in a study of seasonal allergies.
It may be noted that, while analysis of covariance is
not usually appropriate for variables in the causal path-
way, some of the advantages of analysis of covariance
are shared by instrumental-variables techniques[4] that
are appropriate.
Second, the covariate is assumed to be prespecified.
Model-searching procedures are unavoidable in obser-
vational studies, for there are typically many potential
confounding variables whose effects must be considered
and eliminated if necessary. Alarmingly little is known
about the statistical properties of such procedures, how-
ever, and what is known is not generally encouraging. It
is usual, although unjustifiable, to ignore the searching
process in reporting the results, presenting simply the
chosen model, its estimates, and its optimistic estimates
of variability.
Randomized trials are radically different from obser-
vational studies in this respect. There is no confounding,
because a covariate cannot be systematically associated
with treatment if it is not affected by treatment and if
treatment is assigned at random. The purpose of analysis
of covariance in randomized studies is to reduce the ran-
dom variability of the estimated treatment effects by eliminating some of what would otherwise be unexplained
variance in the observations. This difference has implica-
tions for the choice of covariates, which will be discussed
in the next section.
CHOICE OF COVARIATES
Whereas a confounder in an observational study is a
variable correlated both with the outcome and with the
treatment, a useful covariate in a randomized trial is a variable correlated just with the outcome. The greater the
absolute correlation, the more the reduction in residual
variance and so also in the standard error of the estimated
treatment effect. This benefit is realized whether the
treatment groups happen to be balanced with respect to
the covariate or not. It is neither necessary nor useful,
therefore, to choose covariates retrospectively, on the ba-
sis of imbalance.[5]
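The claim that the benefit does not depend on balance can be illustrated by a repeated-sampling sketch (the correlation r = 0.7, sample size, and replicate count are assumptions for the illustration): across randomizations, some of which are visibly imbalanced on X, the adjusted estimator's sampling variability shrinks by roughly the factor sqrt(1 − r²).

```python
import numpy as np

# Sketch: compare the spread of unadjusted and covariance-adjusted
# treatment-effect estimates over many re-randomizations.
rng = np.random.default_rng(4)
n, reps, r = 100, 2000, 0.7
unadj, adj = [], []
for _ in range(reps):
    t = rng.integers(0, 2, size=n)                  # fresh randomization
    x = rng.normal(size=n)                          # covariate, unaffected by t
    y = 1.0 * t + r * x + rng.normal(scale=np.sqrt(1 - r**2), size=n)
    unadj.append(y[t == 1].mean() - y[t == 0].mean())
    X = np.column_stack([np.ones(n), t, x])
    adj.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

ratio = np.std(adj) / np.std(unadj)
print(ratio)   # near sqrt(1 - r**2) ~ 0.71
```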
It is accordingly safe to prespecify, in the protocol for a
randomized trial, a covariate, or a few covariates, un-
affected by treatment but likely to be correlated with the
outcome. Analysis of covariance, adjusting for these
covariates, may then be carried out and relied on, without
any justification after the fact. The probability of Type I
error will be controlled by significance testing, and the
probability of Type II error will be less than if covariates were not used.
The improvement, however, depends on the correla-
tions (and partial correlations) between the covariates and
the outcome, and these may not be perfectly known ahead
of time. It might therefore seem advantageous to deter-
mine the correlations for some candidate covariates with
the data in view, and select a subset that explains a high
proportion of the variance of the outcome. With care, it is
possible to specify an unambiguous algorithm for
selecting a model and to control the probability of Type
I error.[3] It is not known, however, whether such pro-
cedures have any advantage with respect to Type II error
over simply prespecifying the model. In practice, in
critical efficacy trials the relevant covariates will often be
apparent in advance; and when they are not, it may not be
any easier or better to specify a set of candidates and
an algorithm for choosing among them than to specify a
single model.
The properties of models with large numbers of co-
variates are not well understood. Various rules of thumb
relating the number of variables to the sample sizes have
been given, but none has any compelling theoretical
justification. Furthermore, searches in large sets of
potential models probably share some of the defects of
models with many covariates, even if the chosen model has only a few covariates.
NONLINEAR MODELS
The word linear in the context of the analysis of
covariance may be understood in two senses. In many
applications, the model is linear in the covariates.
However, a model with polynomial or other nonlinear
covariate effects is still linear in the coefficients, and
the least-squares estimators are consequently linear functions of the outcome measurements, so the theory
of the general linear model applies. In contrast, logis-
tic, proportional-hazards, and Poisson regression models
all involve covariates in a more fundamentally nonli-
near way.
Nonlinear covariate effects can be added to an anal-
ysis of covariance without difficulty. The most common
examples are the 1/0 variables used to represent cate-
gorical covariates, but polynomial, logarithmic, expo-
nential, and other functions may sometimes be useful.
It is important to bear in mind, however, that in
randomized trials the purpose of the covariate model is
to reduce unexplained variance. Thus, nonlinear terms
should be introduced when they are expected to explain
substantial variance in the outcome, and not simply
because it is feared that the assumption of a linear relationship between the outcome and the covariate may
be violated.
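The 1/0 coding of a categorical covariate fits directly into the least-squares framework. A hedged sketch, in which the covariate (a three-level "center" factor), its effects, and the error scale are all invented for illustration:

```python
import numpy as np

# Sketch: a categorical covariate enters the analysis of covariance as
# 0/1 indicator columns -- still linear in the coefficients, so
# ordinary least squares applies unchanged.
rng = np.random.default_rng(2)
n = 3000
t = rng.integers(0, 2, size=n)             # randomized treatment
center = rng.integers(0, 3, size=n)        # hypothetical 3-level factor
center_effects = np.array([0.0, 0.5, -0.3])
y = 1.0 * t + center_effects[center] + rng.normal(scale=1.0, size=n)

# Indicator columns for centers 1 and 2, with center 0 as reference
dummies = (center[:, None] == np.array([1, 2])).astype(float)
X = np.column_stack([np.ones(n), t, dummies])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # treatment effect adjusted for center, near 1.0
```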
Conversely, trials with outcomes that are successes or
failures, survival times, or small numbers of events are
analyzed by methods that are nonlinear in the second
sense. Recent theoretical developments (the generalized
linear model) and computer programs have tended to
emphasize the analogies between these methods and the
linear model. Some of the same principles undoubtedly
apply when such methods are used to analyze random-
ized trials. For example, if a model selection procedure is
used, it is vital to understand the statistical properties of
the procedure as a whole, rather than simply to report
the nominal standard errors and p-values of the model
that happens to be chosen. On the other hand, the simi-
larity in form may conceal important differences in
mathematical structure between linear and nonlinear
models, and the linear results must not be casually as-
sumed to have nonlinear analogs. It is not clear, for
example, that the robustness of linear models against
misspecification in randomized trials carries over to all
the nonlinear cases.
INTERACTION
If the difference in mean outcome between treatments
changes as a covariate changes, there is said to be a
treatment-by-covariate interaction. In a drug trial, such a
finding would have important implications. In the extreme
case, the treatment effect might change direction as the
covariate changed. That is, a drug that was beneficial in
one subset of patients, identified by the covariate, would
be harmful in a different subset. Clearly such a drug
would be effective. Equally clearly, for such a drug to
be useful, the populations in which it was beneficial and harmful would need to be characterized. In less extreme
cases, where the magnitude but not the direction of the
treatment effect changes, considerations of risk and ben-
efit might also make it very desirable to estimate the
effect in different subgroups.
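A treatment-by-covariate interaction is modeled by adding the product of the treatment indicator and the covariate as an extra column; a nonzero coefficient on that column means the treatment effect varies with the covariate. A minimal sketch with an assumed interaction of 0.8:

```python
import numpy as np

# Sketch: the product column t*x carries the interaction; its
# coefficient estimates how the treatment effect changes per unit
# of the covariate.
rng = np.random.default_rng(3)
n = 4000
t = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = 0.5 * t + 0.4 * x + 0.8 * t * x + rng.normal(scale=0.7, size=n)

X = np.column_stack([np.ones(n), t, x, t * x])   # interaction column last
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])   # interaction coefficient, near 0.8
```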
The question of interaction often arises in connection
with analysis of covariance, but it really has little to do
with adjustment for covariates. Everything in the
preceding paragraph is equally true whether the
covariate in question is adjusted for, ignored, or even
unmeasured. Furthermore, if the treatment main effect
is to be estimated, it is still better to estimate it by
analysis of covariance, even without an interaction
term, than by the unadjusted difference in means. As
with the assumption of linearity, the analysis of
covariance is not invalidated by violation of the assumption of parallelism, for this assumption plays
no role in the analysis. Also, as with linearity, if this
assumption were crucial, its failure would taint as well
the unadjusted analysis, which also assumes parallel
regressions of the outcome on the covariate, but forces
them to have slope 0.
The possibility of interaction should be taken into
account whenever it appears at all probable that different
groups may respond differently. The reason for this is
practical and concerns the interpretation and application
of the results of a successful trial. However, the presence
of interaction or, what is more common, the inability to
rule interaction in or out with confidence, should not be
seen as invalidating analysis of covariance nor, especially,
as a reason to prefer unadjusted analysis.
REFERENCES
1. Fisher, R.A. Statistical Methods for Research Workers, 14th Ed.; Oliver and Boyd: Edinburgh, 1970; 272-286.
2. Robinson, J. J. R. Stat. Soc., Ser. B 1973, 35, 368-376.
3. Tukey, J.W. Control. Clin. Trials 1993, 14, 266-285.
4. Angrist, J.D.; Imbens, G.W.; Rubin, D.B. J. Am. Stat. Assoc. 1996, 91, 444-455.
5. Permutt, T. Stat. Med. 1990, 9, 1455-1462.