Adjustment for Covariates


Encyclopedia of Biopharmaceutical Statistics. Publisher: Informa Healthcare. Publication details: http://www.informaworld.com/smpp/title~content=t713172960

Thomas T. Permutt, U.S. Food and Drug Administration, Rockville, Maryland, U.S.A. Online publication date: 23 April 2003.

To cite this section: Permutt, Thomas T. (2003). 'Adjustment for Covariates', Encyclopedia of Biopharmaceutical Statistics, 1:1, 18–21.


    Adjustment for Covariates

    Thomas Permutt

    U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.

    INTRODUCTION

The techniques of analysis of covariance are employed in three mathematically similar but conceptually very different kinds of problem. Examples of all three kinds arise in connection with the development of pharmaceutical products.

In the first case, a regression model is expected to fit the data well enough to serve as the basis for prediction. In testing the stability of a drug product, for example, the potency may be modeled as a linear function of time, and the possibility of different lines for different batches of the product needs to be allowed for. The purpose of the statistical analysis is to ensure, with a stated degree of confidence, that the potency at a given time will be within given limits.

The second and perhaps widest application of analysis of covariance is in observational studies, such as arise in the postmarketing phase of drug development. It may be desired, for example, to study the association of some outcome with exposure to a drug. It is necessary to adjust for covariates that may be systematically associated both with the outcome and with the exposure and so induce a spurious relationship between the outcome and the exposure. In such studies the unexplained variation is typically high, so the model is not expected to fit the individual observations well. It must, however, include all the important potential confounders and must have at least approximately the right functional form, if a causal relationship, or the absence of one, between the outcome and the exposure is to be inferred.

The third kind of application of analysis of covariance, although the first historically,[1] is to randomized, controlled experiments such as clinical trials of the efficacy of new drugs. In such experiments, adjustment for covariates is optional in a sense, because the validity of unadjusted comparisons is ensured by randomization. Adjustments properly planned and executed, however, can reduce the probabilities of inferential errors and so help to control the size, cost, and time of clinical trials.

The modeling problem is straightforward, well covered in textbooks, and, strictly speaking, not a matter of adjustment. The observational problem, in contrast, is essentially intractable from the standpoint of formal statistical inference; but heuristic methods have had wide application and discussion. We focus here on the adjustment for covariates in the experimental setting. This problem has had relatively little attention in the literature, partly because early writings[1] are largely complete, correct, and still sufficient. Unfortunately, the more recent literature on modeling and on observational studies has been misapplied to the experimental problem. Either a well-fitting model is thought to be required, as in the first problem, or the analysis is supposed to be heuristic, as in the second. In fact, a rigorous theory of analysis of covariance in controlled experiments can be developed, even in the absence of a good model for the covariate effects.

    ADJUSTING FOR BASELINE VALUES

Consider the case of a randomized trial of two treatments, with a continuous measure of outcome (Y) that is also measured at baseline (X). If the populations are normal or the samples are large, the treatments might be compared by a two-sample t-test on the difference in mean outcome Y. Alternatively, the change from baseline, Y − X, might be analyzed in the same way. The difference between groups in Y and the difference between groups in Y − X have the same expectation, because the expected difference between groups in X is zero. We therefore have two unbiased estimators of the same parameter. They have different variances, according to how well the baseline predicts the outcome. If the variances (within treatment groups) of baseline and outcome are the same and the correlation is r, then the standard errors are in the ratio (2 − 2r)^(1/2). The adjusted estimator is better if r > 0.5.

The opinions expressed are those of the author and not necessarily of the U.S. Food and Drug Administration.

Encyclopedia of Biopharmaceutical Statistics. DOI: 10.1081/E-EBS-120007378. Copyright © 2003 by Marcel Dekker, Inc. All rights reserved.
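The standard-error ratio above can be checked numerically. The following is a minimal simulation sketch, not part of the original article; the sample size, number of replications, and the value r = 0.6 are illustrative assumptions. With equal unit variances within groups, the empirical standard errors of the change-from-baseline and raw-score comparisons should come out close to the ratio (2 − 2r)^(1/2).

```python
# Simulation sketch (illustrative values): check that the standard errors of
# the change-from-baseline and raw-score group comparisons are in the ratio
# (2 - 2r)^(1/2) when baseline and outcome have equal variance and correlation r.
import math
import random

random.seed(1)
r, n, reps = 0.6, 200, 1000

def arm_means():
    """One arm: mean outcome and mean change from baseline over n pairs."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [r * x + math.sqrt(1 - r * r) * random.gauss(0, 1) for x in xs]
    return sum(ys) / n, (sum(ys) - sum(xs)) / n

raw_diffs, chg_diffs = [], []
for _ in range(reps):
    y1, c1 = arm_means()
    y2, c2 = arm_means()
    raw_diffs.append(y1 - y2)   # unadjusted difference in mean Y
    chg_diffs.append(c1 - c2)   # difference in mean change from baseline

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

ratio = sd(chg_diffs) / sd(raw_diffs)
print(round(ratio, 3), round(math.sqrt(2 - 2 * r), 3))
```

With r = 0.6 > 0.5, the empirical ratio falls below 1, i.e., the change-from-baseline comparison is the less variable of the two, as the text states.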


Of course, there is no need to choose. The average of the two estimators has standard error proportional to (1.25 − r)^(1/2), which is less than either of the two whenever 0.25 < r < 0.75. This average can be written as the difference between treatment groups in Y − 0.5X. So Y − 0.5X is a less variable measure of outcome than either the mean raw score Y or the mean difference from baseline Y − X, whenever the correlation is between 0.25 and 0.75. This can, but need not, be viewed as fitting parallel straight lines with slope 0.5 to the two groups and measuring the vertical distance between them.

Naturally, there is no need to choose 0.5 either. The difference in group means of any statistic of the form Y − bX can be used to estimate the treatment effect. The smallest variance, and so the most sensitive test, is achieved when b happens to coincide with the least-squares common slope, but the variance does not increase steeply as b moves away from this optimal value. Thus, even a very rough a priori guess for b is likely to perform better than either of the special cases b = 0 (no adjustment) and b = 1 (subtract the baseline).
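The flatness of the variance around the optimal slope can be seen directly. In this quick numerical check (not from the article, with unit variances and r = 0.6 assumed for illustration), the per-subject variance of Y − bX is 1 + b² − 2br, minimized at b = r:

```python
# Per-subject variance of Y - b*X with unit variances and correlation r:
# Var(Y - bX) = 1 + b^2 - 2*b*r, minimized at b = r.  The curve is shallow
# near the minimum, so a rough guess for b already captures most of the gain.
r = 0.6
var = lambda b: 1 + b * b - 2 * b * r
for b in (0.0, 0.4, 0.6, 0.8, 1.0):
    print(b, round(var(b), 2))
# b = 0.4 and b = 0.8 give variance 0.68, barely above the minimum 0.64 at b = 0.6,
# while both special cases b = 0 (variance 1.0) and b = 1 (variance 0.8) do worse.
```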

Finally, there is no need to guess. The least-squares slope, calculated from the data, can be used for b, without any consequences beyond the loss of a degree of freedom for error. Asymptotic theory for the resulting adjusted estimator of the treatment effect was given by Robinson,[2] and an exact, small-sample theory by Tukey.[3]
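The estimator described above has a simple closed form in the one-covariate case. The sketch below, on simulated data with assumed parameter values (the data-generating model and all names are illustrative, not from the article), computes the pooled within-group least-squares slope and the covariate-adjusted treatment effect.

```python
# One-covariate ANCOVA sketch (assumed data): estimate the common slope b by
# least squares, pooled within treatment groups, and adjust the difference in
# mean outcome by b times the difference in mean baseline:
#   b        = (Sxy1 + Sxy2) / (Sxx1 + Sxx2)
#   adjusted = (mean Y1 - mean Y2) - b * (mean X1 - mean X2)
import random

random.seed(2)
n, true_effect = 100, 1.0

def simulate_arm(shift):
    """Baseline X ~ N(10, 2); outcome tracks baseline with slope 0.7 plus noise."""
    xs = [random.gauss(10, 2) for _ in range(n)]
    ys = [shift + 0.7 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

x1, y1 = simulate_arm(true_effect)   # treated arm
x2, y2 = simulate_arm(0.0)           # control arm

mean = lambda v: sum(v) / len(v)

def sums(xs, ys):
    """Within-group sums of squares and cross-products about the group means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy

sxx1, sxy1 = sums(x1, y1)
sxx2, sxy2 = sums(x2, y2)
b = (sxy1 + sxy2) / (sxx1 + sxx2)    # pooled within-group (common) slope
adjusted = (mean(y1) - mean(y2)) - b * (mean(x1) - mean(x2))
print(round(b, 2), round(adjusted, 2))
```

Under these settings, b should land near the generating slope 0.7 and the adjusted estimate near the true effect of 1, typically with a smaller standard error than the unadjusted difference in means.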

In general, then, the best way to adjust for a baseline value is neither to ignore it nor to subtract it, but to subtract a fraction of it. The fraction will be estimated from the data, simultaneously with the treatment effect, by analysis of covariance. There is no need to check the assumption that the outcome is linearly related to the baseline value, because this assumption plays no role in the analysis. If it did, not only the analysis of covariance would be tainted: after all, the unadjusted analysis also assumes a linear relationship, with slope 0, and the change-from-baseline analysis assumes a slope of 1.

    OTHER COVARIATES

Any single, prespecified covariate can be adjusted for in much the same way as a baseline measurement of the outcome variable. That is, the mean of a linear function Y − bX may be compared across treatment groups, the coefficient b being estimated, simultaneously with the treatment effect, by least squares. Again, the much-tested assumption of a linear relationship between Y and X is superfluous. Two other critical assumptions are sometimes neglected, however.

First, the covariate must be unaffected by treatment. While it is possible to give an interpretation of analysis of covariance adjusting for intermediate causes, this interpretation is not often useful in clinical trials. Any covariate measured before randomization is acceptable. With care, some covariates measured later may be assumed to be unaffected by treatment: the weather, for example, in a study of seasonal allergies. It may be noted that, while analysis of covariance is not usually appropriate for variables in the causal pathway, some of the advantages of analysis of covariance are shared by instrumental-variables techniques[4] that are appropriate.

Second, the covariate is assumed to be prespecified. Model-searching procedures are unavoidable in observational studies, for there are typically many potential confounding variables whose effects must be considered and eliminated if necessary. Alarmingly little is known about the statistical properties of such procedures, however, and what is known is not generally encouraging. It is usual, although unjustifiable, to ignore the searching process in reporting the results, presenting simply the chosen model, its estimates, and its optimistic estimates of variability.

Randomized trials are radically different from observational studies in this respect. There is no confounding, because a covariate cannot be systematically associated with treatment if it is not affected by treatment and if treatment is assigned at random. The purpose of analysis of covariance in randomized studies is to reduce the random variability of the estimated treatment effects by eliminating some of what would otherwise be unexplained variance in the observations. This difference has implications for the choice of covariates, which will be discussed in the next section.

    CHOICE OF COVARIATES

Whereas a confounder in an observational study is a variable correlated both with the outcome and with the treatment, a useful covariate in a randomized trial is a variable correlated just with the outcome. The greater the absolute correlation, the greater the reduction in residual variance and so also in the standard error of the estimated treatment effect. This benefit is realized whether the treatment groups happen to be balanced with respect to the covariate or not. It is neither necessary nor useful, therefore, to choose covariates retrospectively, on the basis of imbalance.[5]

It is accordingly safe to prespecify, in the protocol for a randomized trial, a covariate, or a few covariates, unaffected by treatment but likely to be correlated with the outcome. Analysis of covariance, adjusting for these covariates, may then be carried out and relied on, without any justification after the fact. The probability of Type I error will be controlled by significance testing, and the probability of Type II error will be less than if covariates were not used.

The improvement, however, depends on the correlations (and partial correlations) between the covariates and the outcome, and these may not be perfectly known ahead of time. It might therefore seem advantageous to determine the correlations for some candidate covariates with the data in view, and select a subset that explains a high proportion of the variance of the outcome. With care, it is possible to specify an unambiguous algorithm for selecting a model and to control the probability of Type I error.[3] It is not known, however, whether such procedures have any advantage with respect to Type II error over simply prespecifying the model. In practice, in critical efficacy trials the relevant covariates will often be apparent in advance; and when they are not, it may not be any easier or better to specify a set of candidates and an algorithm for choosing among them than to specify a single model.

The properties of models with large numbers of covariates are not well understood. Various rules of thumb relating the number of variables to the sample sizes have been given, but none has any compelling theoretical justification. Furthermore, searches in large sets of potential models probably share some of the defects of models with many covariates, even if the chosen model has only a few covariates.

    NONLINEAR MODELS

The word linear in the context of the analysis of covariance may be understood in two senses. In many applications, the model is linear in the covariates. However, a model with polynomial or other nonlinear covariate effects is still linear in the coefficients, and the least-squares estimators are consequently linear functions of the outcome measurements, so the theory of the general linear model applies. In contrast, logistic, proportional-hazards, and Poisson regression models all involve covariates in a more fundamentally nonlinear way.

Nonlinear covariate effects can be added to an analysis of covariance without difficulty. The most common examples are the 1/0 variables used to represent categorical covariates, but polynomial, logarithmic, exponential, and other functions may sometimes be useful. It is important to bear in mind, however, that in randomized trials the purpose of the covariate model is to reduce unexplained variance. Thus, nonlinear terms should be introduced when they are expected to explain substantial variance in the outcome, and not simply because it is feared that the assumption of a linear relationship between the outcome and the covariate may be violated.
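As a small illustration of the 1/0 coding mentioned above, the sketch below (on simulated data, with all names and parameter values assumed for illustration) adjusts for a two-level categorical covariate, here a study-site indicator, using the same pooled-slope formula as for a continuous covariate.

```python
# Categorical-covariate sketch (assumed data): a two-level covariate such as
# study site enters the analysis of covariance as a 1/0 indicator, and the
# usual pooled within-group least-squares adjustment applies unchanged.
import random

random.seed(3)
n = 200

def simulate_arm(shift):
    """Site indicator X in {0, 1}; site 1 outcomes run 2 units higher."""
    xs = [random.choice([0, 1]) for _ in range(n)]
    ys = [shift + 2.0 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

x1, y1 = simulate_arm(1.0)   # treated arm, true effect 1.0
x2, y2 = simulate_arm(0.0)   # control arm

mean = lambda v: sum(v) / len(v)

def sums(xs, ys):
    """Within-group sums of squares and cross-products about the group means."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) ** 2 for x in xs),
            sum((x - mx) * (y - my) for x, y in zip(xs, ys)))

sxx1, sxy1 = sums(x1, y1)
sxx2, sxy2 = sums(x2, y2)
b = (sxy1 + sxy2) / (sxx1 + sxx2)    # estimated site effect on the outcome
adjusted = (mean(y1) - mean(y2)) - b * (mean(x1) - mean(x2))
print(round(b, 2), round(adjusted, 2))
```

Here b estimates the site effect (near 2 under these settings), and the adjusted comparison removes the site-explained variance from the treatment contrast.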

Conversely, trials with outcomes that are successes or failures, survival times, or small numbers of events are analyzed by methods that are nonlinear in the second sense. Recent theoretical developments (the generalized linear model) and computer programs have tended to emphasize the analogies between these methods and the linear model. Some of the same principles undoubtedly apply when such methods are used to analyze randomized trials. For example, if a model selection procedure is used, it is vital to understand the statistical properties of the procedure as a whole, rather than simply to report the nominal standard errors and p-values of the model that happens to be chosen. On the other hand, the similarity in form may conceal important differences in mathematical structure between linear and nonlinear models, and the linear results must not be casually assumed to have nonlinear analogs. It is not clear, for example, that the robustness of linear models against misspecification in randomized trials carries over to all the nonlinear cases.

    INTERACTION

If the difference in mean outcome between treatments changes as a covariate changes, there is said to be a treatment-by-covariate interaction. In a drug trial, such a finding would have important implications. In the extreme case, the treatment effect might change direction as the covariate changed. That is, a drug that was beneficial in one subset of patients, identified by the covariate, would be harmful in a different subset. Clearly such a drug would be effective. Equally clearly, for such a drug to be useful, the populations in which it was beneficial and harmful would need to be characterized. In less extreme cases, where the magnitude but not the direction of the treatment effect changes, considerations of risk and benefit might also make it very desirable to estimate the effect in different subgroups.

The question of interaction often arises in connection with analysis of covariance, but it really has little to do with adjustment for covariates. Everything in the preceding paragraph is equally true whether the covariate in question is adjusted for, ignored, or even unmeasured. Furthermore, if the treatment main effect is to be estimated, it is still better to estimate it by analysis of covariance, even without an interaction term, than by the unadjusted difference in means. As with the assumption of linearity, the analysis of covariance is not invalidated by violation of the assumption of parallelism, for this assumption plays no role in the analysis. Also, as with linearity, if this assumption were crucial, its failure would taint as well the unadjusted analysis, which also assumes parallel regressions of the outcome on the covariate, but forces them to have slope 0.
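A treatment-by-covariate interaction can be sketched numerically. In this simulated example (all parameter values assumed for illustration, not from the article), the two arms have different regression slopes on the covariate, so the estimated treatment effect changes sign over the covariate's range, the extreme case described above:

```python
# Interaction sketch (assumed data): fit a separate least-squares line per arm;
# the interaction is the difference in slopes, and the treatment effect at a
# covariate value x is the vertical distance between the two fitted lines.
import random

random.seed(4)
n = 300

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.uniform(0, 10) for _ in range(n)]
y1 = [5 + 0.2 * x + random.gauss(0, 1) for x in x1]   # treated arm
y2 = [3 + 0.7 * x + random.gauss(0, 1) for x in x2]   # control arm

(a1, b1), (a2, b2) = fit_line(x1, y1), fit_line(x2, y2)
interaction = b1 - b2                      # true difference in slopes is -0.5
effect_at = lambda x: (a1 + b1 * x) - (a2 + b2 * x)
print(round(interaction, 2), round(effect_at(0.0), 2), round(effect_at(10.0), 2))
```

Under these settings the estimated effect is positive at the low end of the covariate's range and negative at the high end, so reporting a single overall effect would conceal the reversal.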

The possibility of interaction should be taken into account whenever it appears at all probable that different groups may respond differently. The reason for this is practical and concerns the interpretation and application of the results of a successful trial. However, the presence of interaction or, what is more common, the inability to rule interaction in or out with confidence, should not be seen as invalidating analysis of covariance nor, especially, as a reason to prefer unadjusted analysis.

    REFERENCES

1. Fisher, R.A. Statistical Methods for Research Workers, 14th Ed.; Oliver and Boyd: Edinburgh, 1970; 272–286.

2. Robinson, J. J. R. Stat. Soc., Ser. B 1973, 35, 368–376.

3. Tukey, J.W. Control. Clin. Trials 1993, 14, 266–285.

4. Angrist, J.D.; Imbens, G.W.; Rubin, D.B. J. Am. Stat. Assoc. 1996, 91, 444–455.

5. Permutt, T. Stat. Med. 1990, 9, 1455–1462.
