Analysis of Repeated Measures Data with Missing Values: An Overview of Methods


  • 8/13/2019 Analysis of Repeated Measures Data With Missing Values- An Overview of Methods


    Encyclopedia of Biopharmaceutical Statistics. Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

    Analysis of Repeated Measures Data with Missing Values: An Overview of Methods. K. C. Carriere (a); Taesung Park (b); Yuanyuan Liang (a). (a) University of Alberta, Edmonton, Alberta, Canada; (b) Seoul National University, Seoul, South Korea

    Online Publication Date: 18 July 2005

    To cite this Section: Carriere, K. C., Park, Taesung and Liang, Yuanyuan (2005) 'Analysis of Repeated Measures Data with Missing Values: An Overview of Methods', Encyclopedia of Biopharmaceutical Statistics, 1:1, 1-7


    Analysis of Repeated Measures Data with Missing Values: An Overview of Methods

    K. C. Carriere, University of Alberta, Edmonton, Alberta, Canada

    Taesung Park, Seoul National University, Seoul, South Korea

    Yuanyuan Liang, University of Alberta, Edmonton, Alberta, Canada

    INTRODUCTION

    The missing data problem, which persists in much of empirical scientific investigation, is particularly common in repeated measures data. One reason for this is that the same subjects are used repeatedly over time. One of the main goals in dealing with missing data is to remove biases and reduce the estimation variances in an attempt to improve the overall estimation efficiency, thereby increasing study power. With current advancements in computer technology, many computational analysis methods have been developed, but most have a major drawback: they rely heavily on large sample theory.[1-3]

    In their overview of missing data methods for approximately normally distributed repeated measures data in small samples, Carriere et al.[4] discuss noniterative procedures for data with compound symmetry and unstructured covariance matrices, as well as the use of proxy information. In general, the approach to incomplete data involves identifying appropriate missing data mechanisms.

    In this work, which supplements the previous work done by Carriere et al.,[4] we discuss the merits and drawbacks of available small sample approaches for approximately normally distributed repeated measures data with missing values. This discussion is limited to cases where the missing data occur on the outcome variable. In particular, this entry expands the discussion to include multiple imputation approaches. We use a numerical example to demonstrate the practical implications.

    MISSING DATA MECHANISMS AND IMPLICATIONS

    Most missing data methods are based on an assumption that missingness indicators contain true values that are meaningful for analysis.[5] Therefore, we must make an effort to reveal the true values. Obviously, procedures based only on the complete subset data can create serious biases and result in inefficient analyses.[1,5]

    We consider repeated measures data $y_{ij}$ for subject $j = 1, \ldots, N$ in period $i = 1, \ldots, p$ in the presence of treatment effects, along with a covariate or design vector $x_{ij}$, with $m_{ij} = 0$ if $y_{ij}$ is missing and $m_{ij} = 1$ if $y_{ij}$ is observed. The goal is to efficiently estimate the contrast of treatment effects in the presence of missing values.

    Little and Rubin[5] define three types of missing data mechanisms that occur in different situations: missing completely at random (MCAR), missing at random (MAR), and nonignorable missing (NIM). In the MCAR situation, cases with complete data are indistinguishable from cases with incomplete data, so that $E(y_{ij} \mid m_{ij} = 1) = E(y_{ij} \mid m_{ij} = 0)$. Also, we have $p(m_{ij} = 1 \mid x_{ij}, y_{ij}) = p(m_{ij} = 1)$. This implies that the investigator can get consistent results by examining only the complete subset data. There is no danger of biased estimation, whether or not incomplete pairs receive appropriate attention. However, data can be missing because of uncontrolled events in the course of data collection, and missing data are often associated with study variables, both the outcome and the covariates. In reality, MCAR is too strong an assumption.

    In the MAR situation, cases with incomplete data differ from cases with complete data. The probability of observing a missing value depends on the observed values, but not on the unobserved values (both the covariates and outcome). Here, we have $E(y_{ij} \mid x_{ij}, m_{ij} = 1) = E(y_{ij} \mid x_{ij}, m_{ij} = 0)$, and within some subclasses of the data, they are still a random sample of cases.[5] Therefore, missing values are traceable or predictable from other variables already observed in the database. Investigators have approached this situation in a variety of ways, including imputation,[6-11] weighting,[5,12,13] resampling methods,[7,14] data augmentation,[15] and the Gibbs sampler.[16] The success of these techniques depends on correctly specifying the missingness mechanism and/or the imputation model. Little and Rubin[5] show that, in the case of MAR data, the likelihood inference produces consistent and efficient results even if the missing data mechanism is ignored. In the context of a generalized estimating equations (GEE) approach, Liang and Zeger[17] have found that the normal-model GEE is consistent when the missing data depend on any number of previous observations (i.e., MAR), if the mean function is correctly specified. It appears that both the bias and the standard error of estimates can be reduced as long as all available cases are used for both MCAR and MAR data, especially when using likelihood-based methods.[5] Little[10] and Park and Lee[18,19] advocated simple pattern mixture models to test the dependence of responses on the missing data mechanisms, assuming that the missing data are ignorable. If the test is rejected, the missing data are determined not to be of type MCAR, but MAR.

    Encyclopedia of Biopharmaceutical Statistics. DOI: 10.1081/E-EBS-120023806. Copyright © 2005 by Taylor & Francis. All rights reserved.

    In the NIM situation, the missing data mechanism is not completely random (non-MCAR), nor is it predictable from other variables in the database (non-MAR). Here, the reason for the missing data is explainable, but unmeasurable, because the very variable causing the data to be missing is unobservable.[5] For valid analysis results, it is important to handle nonignorable missing data appropriately. Imputation, or a sampling-importance resampling approach, has been used to achieve the desired efficiency and unbiased analysis.[12] Sheng and Carriere[14] conclude that, in item response data analysis, the bootstrap under imputation approach can accommodate all missing data mechanisms, including the NIM situation, efficiently and without bias. However, this procedure is based on large sample theory.

    In summary, consistent and efficient analysis depends on the investigator's ability to choose an appropriate missing data mechanism. It is especially important to devise a model for the nonignorable missing values that accommodates the particular situation. Since any model for nonignorable data is specific to a given study and cannot be discussed in general terms, we have limited this discussion to MAR and MCAR situations.
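The practical difference between the MCAR and MAR mechanisms can be illustrated with a small simulation. The Python sketch below is not part of the original analysis. The two-period bivariate normal model, the correlation of 0.7, the 70% observation rate under MCAR, and the logistic dependence of the observation probability on the first-period value under MAR are all assumptions chosen only for illustration, and `complete_case_means` is a hypothetical helper name. It compares the second-period mean computed from the complete subset with the mean of the full data.

```python
import math
import random
import statistics


def complete_case_means(n=20000, rho=0.7, seed=1):
    """Return (full mean, complete-subset mean under MCAR, same under MAR).

    All distributional choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    y2_full, y2_mcar, y2_mar = [], [], []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)  # period 1, always observed
        y2 = rho * y1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y2_full.append(y2)
        # MCAR: the chance of observing y2 ignores the data entirely.
        if rng.random() < 0.7:
            y2_mcar.append(y2)
        # MAR: the chance of observing y2 depends only on the observed y1.
        if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * y1)):
            y2_mar.append(y2)
    return (statistics.fmean(y2_full),
            statistics.fmean(y2_mcar),
            statistics.fmean(y2_mar))


full, mcar, mar = complete_case_means()
print(f"full={full:.3f}  MCAR subset={mcar:.3f}  MAR subset={mar:.3f}")
```

Under these assumptions the complete subset mean stays close to the full-data mean for MCAR but shifts upward for MAR, because subjects with larger first-period values are more likely to be observed. A method that conditions on the first-period value would be needed to remove that shift.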

    AVAILABLE DATA ANALYSIS

    Many investigators have developed methods to utilize all available data rather than discarding incomplete pairs. Little and Rubin[5] noted that available case analysis methods generally lack practical appeal due to the imbalance problem of varying sample bases. When the sample bases vary from one variable to another within a study, obtaining estimates of the parameters, or even asymptotic standard errors, can become very complicated. However, this is not an issue when using the maximum likelihood (ML) method under an MCAR or MAR missing data mechanism with large samples, as there are valid standard errors and tests available based on large sample theory.[3] For small sample cases, though, further work is still needed. Several studies, although limited in scope, have indicated that the available case analysis method is superior to complete subset analysis.[1,2,20]

    Approximate inference procedures have also been proposed for small samples to utilize all available data, if not all available cases. Carriere et al.[4] present an overview of missing data strategies, with a focus on available data analysis methods for approximately normal repeated measures data. Their concern is with small sample data, and they discuss approximate distributions of estimators for making inferences for parameters of interest.

    In particular, Carriere[1,2] has developed small sample testing procedures based on the maximum likelihood method for two particular situations of the within-subject covariance matrix: when it has a compound symmetry pattern and when it is unstructured. Approximate solutions for small sample inference procedures were found upon obtaining the explicit forms of the standard errors of the parameter estimators of interest. Carriere[1,2] also suggested an approximate degrees of freedom approach based on the Satterthwaite[21] approximation method and on the assumption that the variability of higher-order terms is negligible. Although rather complex, these methods have been demonstrated to work well in computer simulation studies.
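As background for the approximate degrees of freedom idea, the generic Satterthwaite approximation for a linear combination of independent variance estimates can be written in a few lines. This is a sketch of the textbook formula only, not the specific derivation in Carriere,[1,2] whose details are not reproduced in this entry. The function name `satterthwaite_df` and its arguments are hypothetical.

```python
def satterthwaite_df(variances, dfs, weights=None):
    """Approximate degrees of freedom for sum_i w_i * s_i^2.

    variances: independent variance estimates s_i^2
    dfs:       degrees of freedom of each estimate
    weights:   coefficients w_i (defaults to 1 for every component)
    """
    if weights is None:
        weights = [1.0] * len(variances)
    terms = [w * v for w, v in zip(weights, variances)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / d for t, d in zip(terms, dfs))
    return numerator / denominator


# Two equal components with 5 df each behave like a single 10 df estimate;
# unequal components give a value between the smallest df and the total df.
print(satterthwaite_df([2.0, 2.0], [5, 5]))
print(satterthwaite_df([1.0, 4.0], [10, 10]))
```

The approximation matches the first two moments of the combined estimate to those of a scaled chi-square variable, which is why it degrades when higher-order variability is not negligible.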

    Other small sample available data analysis methods can also be applied using common software (for example, PROC MIXED in SAS[22]), with approximation techniques similar to those suggested by Carriere,[1,2] but based on different assumptions. Comparing her procedure with the SAS procedure, Carriere[1] noted that the inference based on the available data method depends on the design structure. PROC MIXED in SAS approximates the distribution of the estimators uniformly to make inferences for all model parameters. It appears that caution is advised if the design is not orthogonal, as the analysis results can be more liberal or more conservative than expected.[1]

    Although the Carriere procedure[1,2] can generally apply to any repeated measures data with missing values occurring monotonically and at random, the estimation methods are applicable to only two very specific cases. The procedure can be extended to other forms of covariance structures in an attempt to obtain approximate or even asymptotic distributions of the estimators. But because of the complexity involved, it may not be practical or even possible to use the procedure without making many unrealistic assumptions. For this reason, we echo Little and Rubin,[5] who concluded that the current state of available data/case analysis is not generally satisfactory for small sample repeated measures data. Further development is needed.

    IMPUTATIONS

    Imputations (both single and multiple) require creating a predictive distribution based on the observed data, either implicitly or explicitly.[9] Examples include mean, regression, stochastic regression, hot or cold deck imputations, and substitutions. Composite methods combining some of these imputations and other techniques can also be used. Typically, imputation builds on the conditional mean of the missing outcome given observed data. Practically, imputation approaches are attractive because of their easy implementation and computational convenience: the data are completely filled in, even if the values are artificial, and the pseudocomplete data can be analyzed using standard software.
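To make the distinction between three of the imputation styles named above concrete, the following Python sketch fills in missing second-period values from a fully observed first period using mean, regression, and stochastic regression imputation. The two-period layout, the use of `None` for missing values, the toy data, and the helper names `fit_line` and `impute_single` are assumptions made for illustration only.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least squares intercept and slope for y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_single(y1, y2, method="mean", rng=None):
    """Fill None entries of y2; y1 is assumed fully observed."""
    pairs = [(a, b) for a, b in zip(y1, y2) if b is not None]
    x_obs = [a for a, _ in pairs]
    y_obs = [b for _, b in pairs]
    intercept, slope = fit_line(x_obs, y_obs)
    sse = sum((b - intercept - slope * a) ** 2 for a, b in pairs)
    resid_sd = math.sqrt(sse / (len(pairs) - 2))
    if rng is None:
        rng = random.Random(0)
    filled = []
    for a, b in zip(y1, y2):
        if b is None:
            if method == "mean":
                b = statistics.fmean(y_obs)      # unconditional mean
            elif method == "regression":
                b = intercept + slope * a        # conditional mean
            elif method == "stochastic":
                # conditional mean plus a random residual
                b = intercept + slope * a + rng.gauss(0.0, resid_sd)
            else:
                raise ValueError(f"unknown method: {method}")
        filled.append(b)
    return filled


y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.1, 3.9, 6.2, 8.1, None, None]
for method in ("mean", "regression", "stochastic"):
    print(method, impute_single(y1, y2, method))
```

Mean imputation ignores the first-period information, regression imputation places every imputed value exactly on the fitted line, and the stochastic version restores some of the variability that the other two remove.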

    Single Imputations

    Single imputation approaches impute a single data point for each missing value. Single imputations were used in empirical research even before the theory had been formally developed. Using imputation approaches for each draw of the data, a complete data set $y = (y_{obs}, y^{I}_{mis})$ is obtained by combining the observed data set $y_{obs}$ and the imputed data set for the missing values $y^{I}_{mis}$. The most popular of these single imputations is probably the mean substitution based on a specific covariate pattern. See Rubin[9] for details.

    From a slightly different perspective, Huang et al.[23] discussed a single imputation strategy where the imputation uses proxy information from caregivers or family members of a respondent who can provide approximate information on behalf of the respondent. This proxy information can then be used for analysis instead of dealing with missing responses. The principal idea of this approach is implicit, in that it assumes an underlying model where the respondents and their proxies share a common mean and variance. Possible differences in the mean and variance can be accommodated. Huang et al.[23] also provided an approximate degrees of freedom solution to the hypothesis testing problem for missing repeated measures data with proxy.

    Huang et al.[23] reported that a single imputation with proxy information can play a significant role in missing data analyses. They also discussed design implications in order to provide general guidelines for utilizing available proxy information to improve data collection strategy for missing data situations. However, when the variability between the actual and proxy data sets is different, it is better to use other available case analyses so as to maintain the Type I error rate and to increase the power of testing parameters of interest.

    Multiple Imputations

    In the multiple imputation strategy, which was first proposed by Rubin,[8] the imputed values are supposed to represent repeated random draws under a given model for each missing value. Overall inferences can then be drawn by combining results from the completed data sets.

    Multiple imputation involves drawing multiple values for each missing data point, and there are various methods available for drawing the values to impute.[9] For each draw of the data under a chosen imputation method, a complete data set $y^{(m)} = (y_{obs}, y^{(m)}_{mis})$ is created by combining $y_{obs}$ and $y^{(m)}_{mis}$, $m = 1, \ldots, M$, as defined earlier. Substantial empirical work[24-26] has shown that multiple imputation with $M = 3$ or 5 works well with typical fractions (


    investigator draws noise from the empirical distribution, and is therefore less sensitive to violations of the normal assumption.

    The RD method is implemented separately for each group/sequence. It involves estimating the least squares estimators to fit a regression of the observed $(p+1)$th period data on the observed first $p$ periods' data to obtain $\hat{\beta}$ for each regression model. Then, the error variance is set as $\sigma_*^2 = SSE/W$, where SSE is the residual sum of squares from the least squares fit and $W$ is a draw from a chi-square distribution with the degrees of freedom of SSE. Then, a $\beta_*$ is drawn from $N(\hat{\beta}, \sigma_*^2 (x^T x)^{-1})$. Finally, to impute for an incomplete case $i$, the method imputes a value $Y_i = x_i^T \beta_* + r_0 = Y_0 + (x_i^T - x_0^T)\beta_*$, where those with a subscript 0 are data values observed for a complete case drawn at random from the subjects in the given sequence. The $r_0$ is the residual of that case when $\beta = \beta_*$. Through simulations, Richardson and Flack[3] show that their strategy works well. However, their simulation uses relatively large sample data.

    Huang and Carriere[27] suggest that a less parametric approach than the one proposed by Richardson and Flack would be more robust. They impute the missing values using the conditional distribution of the missing data, given the observed data in previous periods. The $p_1$-vector $y_{p_1 j}^T = (y_{1j}, \ldots, y_{p_1 j})$ for the complete subset data up to the first $p_1$ periods is assumed to be distributed as multivariate normal with mean $\boldsymbol{\mu}_{p_1}^T = (\mu_1, \ldots, \mu_{p_1})$ and covariance matrix $\Sigma_{11}$, the $p_1 \times p_1$ subcovariance matrix for the first $p_1$ periods. Then, the conditional distribution of $y_{p_1+1,j}$ given $y_{p_1 j}$ is normal with mean $\mu_{p_1+1} + \boldsymbol{\sigma}_{21}\Sigma_{11}^{-1}(y_{p_1 j} - \boldsymbol{\mu}_{p_1})$ and variance $\sigma_{22} - \boldsymbol{\sigma}_{21}\Sigma_{11}^{-1}\boldsymbol{\sigma}_{12}$, where $\Sigma_{11}$ is the submatrix of $\Sigma$ for the first $p_1$ rows and $p_1$ columns, $\boldsymbol{\sigma}_{21}$ is that for the $(p_1+1)$th row and the first $p_1$ columns, with $\boldsymbol{\sigma}_{12} = \boldsymbol{\sigma}_{21}^T$, and $\sigma_{22}$ is the $(p_1+1)$th diagonal element of $\Sigma$. For subsequent periods, replace $p_1$ with $p_1+1$, and repeat the process. See also Carriere.[2] Since the covariance matrix is usually unknown, the investigator uses the respective sample estimates, obtained based on complete subsets of the data and denoted by $s_{ij}$, $\mathbf{s}_{ij}$, and $S_{ij}$. Then, Huang and Carriere[27] extended the imputation strategy that Rubin[9] suggested for a univariate normal model to a multivariate normal model.

    This strategy involves estimating the conditional mean $\hat{\mu}_{p_1+1} = \bar{y}_{p_1+1} + \mathbf{s}_{21} S_{11}^{-1}(y_{p_1 j} - \bar{y}_{p_1 \cdot})$ and the variance $\hat{\sigma}^2 = s_{22} - \mathbf{s}_{21} S_{11}^{-1}\mathbf{s}_{12}$. Then, the conditional estimators are updated by drawing a chi-square random variable $g$ with $N_l - s$ degrees of freedom and a random variable $z$ from a standard normal distribution, as $\sigma_*^2 = \hat{\sigma}^2 (N_l - s)/g$ and $\mu_{*,p_1+1} = \hat{\mu}_{p_1+1} + \sigma_* z/\sqrt{N_l}$, where $N_l$ is the total number of observations at the missing data stage $l$ and $s$ is the number of sequences. Then, a random variable $z$ is drawn from a standard normal distribution to impute for the missing values in the period $p_1+1$, as $y_{p_1+1,j} = \mu_{*,p_1+1} + \sigma_* z$. This is repeated for all missing components in the period $p_1+1$. Treating the imputed values as if they were actual values, the steps are repeated for the next periods, with $p_1$ replaced by $p_1+1$. This whole process is repeated $M$ times to create $M$ multiply imputed data sets.
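The imputation steps described above can be sketched for the simplest setting. The Python code below is an illustration under stated assumptions, not the authors' implementation: it treats two periods only (so $p_1 = 1$ and all matrix quantities reduce to scalars), a fully observed first period, a single sequence by default, and $N_l$ taken to be the number of complete pairs. It also assumes the chi-square draw rescales the variance, with the standard deviation obtained as its square root. The function names `impute_second_period` and `multiply_impute` are hypothetical.

```python
import math
import random
import statistics


def impute_second_period(y1, y2, n_seq=1, rng=None):
    """Return one completed copy of y2; None marks a missing value."""
    if rng is None:
        rng = random.Random()
    pairs = [(a, b) for a, b in zip(y1, y2) if b is not None]
    n_l = len(pairs)  # assumed meaning of N_l: number of complete pairs
    a_bar = statistics.fmean(a for a, _ in pairs)
    b_bar = statistics.fmean(b for _, b in pairs)
    s11 = sum((a - a_bar) ** 2 for a, _ in pairs) / (n_l - 1)
    s22 = sum((b - b_bar) ** 2 for _, b in pairs) / (n_l - 1)
    s21 = sum((a - a_bar) * (b - b_bar) for a, b in pairs) / (n_l - 1)
    var_hat = s22 - s21 * s21 / s11  # conditional variance estimate
    # Chi-square draw with N_l - s degrees of freedom (gamma, scale 2).
    g = rng.gammavariate((n_l - n_seq) / 2.0, 2.0)
    sigma_star = math.sqrt(var_hat * (n_l - n_seq) / g)
    z_mean = rng.gauss(0.0, 1.0)  # one draw shared by the mean update
    completed = []
    for a, b in zip(y1, y2):
        if b is None:
            mu_hat = b_bar + (s21 / s11) * (a - a_bar)
            mu_star = mu_hat + sigma_star * z_mean / math.sqrt(n_l)
            b = mu_star + sigma_star * rng.gauss(0.0, 1.0)
        completed.append(b)
    return completed


def multiply_impute(y1, y2, m=5, seed=0):
    """Create m completed copies of y2 from one seeded generator."""
    rng = random.Random(seed)
    return [impute_second_period(y1, y2, rng=rng) for _ in range(m)]


# Toy data, invented for illustration only.
y1 = [2.0, 1.5, 3.0, 2.5, 1.0, 3.5, 2.0, 4.0]
y2 = [1.8, 1.2, 2.9, 2.0, 0.9, 3.1, None, None]
for completed in multiply_impute(y1, y2, m=3, seed=1):
    print([round(v, 3) for v in completed])
```

With more than two periods, the scalar quantities become the vector and matrix estimates defined in the text, and the loop moves forward one period at a time while treating earlier imputed values as observed.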

    The multiply imputed data sets are analyzed to obtain an overall result. First, the usual data analysis is performed for each of the $M$ imputed data sets. The $M$ analysis results are then combined to give a repeated-imputation inference as follows. Let $\hat{\beta}_m$ and $W_m$ be the estimators and their associated variance-covariance matrix for $\beta$ from the complete data set $m$, $m = 1, \ldots, M$. The overall estimator of $\beta$ from the $M$ data sets is obtained from $\bar{\beta}_M = \sum_{m=1}^{M} \hat{\beta}_m / M$ and its variance as $V(\bar{\beta}_M) = T_M = \bar{W}_M + (1 + M^{-1}) B_M$. The $M$ within-imputation variances $W_m$ are averaged to obtain $\bar{W}_M = \sum_{m=1}^{M} W_m / M$, and $B_M = \sum_{m=1}^{M} (\hat{\beta}_m - \bar{\beta}_M)(\hat{\beta}_m - \bar{\beta}_M)^T / (M - 1)$ is the between-imputation variance.

    To test a hypothesis of a linear contrast $\psi = l^T \beta$, Rubin[9] considers an approximate distribution for $\psi$ given by

    $$(\psi - \bar{\psi}_M)(l^T T_M l)^{-1/2} \sim t_v \qquad (2)$$

    where $\bar{\psi}_M = l^T \bar{\beta}_M$, and the degrees of freedom $v$ is

    $$\tilde{v} = v_0 \left\{ f(v_0)^{-1} (1 - r_M)^{-1} + v_0 / v \right\}^{-1} \qquad (3)$$

    where $f(v_0) = (v_0 + 1)/(v_0 + 3)$, $r_M = (1 + M^{-1})\,\mathrm{tr}(B_M T_M^{-1})/q$, and $v_0$ is the degrees of freedom based on the complete subset data. The $r_M$ estimates the fraction of information on $\beta$ that is missing due to nonresponses. This fraction can be no larger than that of all missing data.[9]
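The combining rules described above can be sketched for a single scalar parameter, so that $q = 1$ and the contrast vector is trivially 1. This is an illustrative sketch, not the authors' code. In particular, the text does not define the quantity $v$ that enters the adjusted degrees of freedom; the sketch assumes the large-sample value $v = (M - 1)/r_M^2$, which is how the small-sample adjustment of Barnard and Rubin[6] is usually written. The function name `pool_scalar` and the example numbers are hypothetical.

```python
import math
import statistics


def pool_scalar(estimates, variances, v0):
    """Pool M scalar estimates and their variances.

    Returns (pooled estimate, pooled standard error, adjusted df).
    v0 is the complete-data degrees of freedom.
    """
    m = len(estimates)
    est_bar = statistics.fmean(estimates)      # overall estimate
    w_bar = statistics.fmean(variances)        # within-imputation variance
    b = sum((e - est_bar) ** 2 for e in estimates) / (m - 1)  # between
    t = w_bar + (1.0 + 1.0 / m) * b            # total variance T_M
    r = (1.0 + 1.0 / m) * b / t                # r_M with q = 1
    f = (v0 + 1.0) / (v0 + 3.0)
    v = (m - 1) / r ** 2                       # ASSUMED definition of v
    v_tilde = v0 / (1.0 / (f * (1.0 - r)) + v0 / v)
    return est_bar, math.sqrt(t), v_tilde


# Invented numbers: three imputed analyses, each with variance 1.
estimate, se, df = pool_scalar([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], v0=15)
print(estimate, se, df)
```

The sketch requires some between-imputation variability, since $r_M = 0$ would make the assumed $v$ infinite. The adjusted degrees of freedom never exceed $v_0$, which is the sense in which multiple imputation pays a penalty relative to a complete-data analysis.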

    IMPUTATION OR AVAILABLE CASE ANALYSIS?

    The available data analysis method is not always satisfactory for the reasons stated in the section Available Data Analysis above. In such cases, the imputation approach might be one alternative method to use. There are others, but most of them build on simulation methods (for example, data augmentation[15] and the Gibbs sampler[16]). This entry compares available data analysis methods and imputations, among other possible approaches.

    Many investigators favor multiple imputations over single imputations because they can generate random variations in imputed artificial data. However, many studies have also found that this method does not perform as effectively as expected. For example, in the context of generalized linear models, Xie and Paik[13] considered four multiple imputation strategies and sample average imputations. They conclude that as long as one uses all available data, all approaches are consistent and efficient. For continuous responses, Huang and Carriere[27] found that there is no real advantage in creating multiple complete data sets, as the analysis method that uses only available data performs as well as or better than the multiple imputation method.

    Specifically, Huang and Carriere[27] note that the corresponding asymptotic distributions appear to fit reasonably well for the two approaches they considered (ML and MI approaches) as long as all available data are used. In that study, the multiple imputation method performed as well as the ML method[1,2] for all situations considered. However, the ML method was found generally superior to the multiple imputation method, especially when the correlation is large and the covariance structure is unspecified.

    In light of the technical limitations noted earlier regarding the available data method, Huang and Carriere[27] suggest adopting the multiple imputation method when types of covariance structures other than compound symmetry or unstructured covariance matrices provide a good fit to the data in small samples. Although their implementation of the multiple imputation method for small samples was not entirely satisfactory in terms of keeping the Type I error and testing power, they suggested that the violation may be far less serious than that by the ML approach with large sample theory assumptions. Also, it could be considerably more complicated to calculate the asymptotic standard error of the ML estimators than to use multiple imputations. See also Richardson and Flack.[3]

    As noted in the previous section, the real advantage of using multiple imputations is not in obtaining efficient estimators, but in obtaining unbiased estimators for parameters by attempting to reveal the true values for missing data. A practical advantage of multiple imputation (and of any other imputation technique), however, is its capacity to use standard statistical analysis software. As long as all available data are used, both ML and MI approaches seem to be valid for univariate data as well as for repeated measures data analyses. Further work needs to be done on deriving the asymptotic small sample procedures for available data analysis. Also, there is a need to investigate the sample sizes required for the asymptotic ML procedures that can be validly used with standard software.

    This study has not dealt with the NIM case. We suggest that the alternative imputation strategy using proxy information, as in Huang et al.,[23] could be effective. This approach would likely work if proxy providers carefully assess reasonable responses from the respondents with a full understanding of the reasons for the missing data in the given situation.

    ANALYSIS OF BRONCHIAL ASTHMA DATA

    To contrast the methods discussed in this paper, we analyzed the bronchial asthma data in Patel,[28] which utilizes the traditional two-period, two-treatment, two-sequence design with eight and nine patients in each sequence, respectively, and $N = 17$. We conducted an exploratory data analysis to determine the covariance structure, and we observed the compound symmetry covariance structure to provide an adequate fit to the data.

    We analyzed the original complete data using PROC MIXED in SAS, and compared the (exact) test for the treatment and carryover effects to the t distribution with the degrees of freedom computed from $N - 2 = 15$ from the complete actual data. Grizzle[29] suggested a two-stage analysis for assessing the significance of the residual effects, using a P-value of 0.15 to determine insignificance of a residual effect. Based on the two-stage analysis, we did not remove the residual effect, and thus concluded that, in the presence of a nonnegligible residual effect, the test of a direct treatment effect was significant at 0.05 (first row of Table 1).
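The link between an estimate, its standard error, the degrees of freedom, and the reported P-value can be checked with a short calculation. The Python sketch below uses only the standard library and integrates the t density numerically with Simpson's rule; the helper name `t_two_sided_p` and the choice of 2000 integration steps are assumptions made for this illustration. The inputs 0.384, 0.169, and 15 are the treatment effect estimate, standard error, and degrees of freedom reported for the full original data in Table 1.

```python
import math


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value for a t statistic; df may be non-integer.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even).
    """
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1.0 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3.0  # P(0 < T < |t|)
    return 2.0 * (0.5 - central)


# Treatment effect for the full original data (first row of Table 1).
p_value = t_two_sided_p(0.384 / 0.169, df=15)
print(round(p_value, 3))
```

Because the function accepts non-integer degrees of freedom, the same check could be applied to the adjusted degrees of freedom reported for the multiple imputation analysis.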

    Next, we induced missing values by deleting the measurements in the second period from the four subjects in the BA treatment sequence to produce MAR data. We then compared the analysis based on the complete subset data, omitting both measurements from the four subjects with missing observations in the second period, to the t distribution, with the exact degrees of freedom computed from the complete subset data of $17 - 4 = 13$ patients (second row of Table 1). The results are similar to the original data analysis, with a slightly lower estimate for the treatment effect and a higher estimate for the residual effect, but higher standard errors for both.

    Applying the available case analysis method of Carriere,[1] the tests are compared against $t_{11}$, the approximate degrees of freedom of $N_2 - 2$, where $N_2$ is the number of subjects with complete observations. We obtained results similar to the original data analysis (third row of Table 1). We see improved power over the complete subset analysis. A slightly different approximation procedure by SAS PROC MIXED produced results similar to the complete subset analysis (fourth row of Table 1). In particular, the residual effect is now insignificant according to Grizzle.[29] Some comparisons of these two available case methods can be found in Carriere.[1]

    Applying the single imputation approach of Huang et al.,[23] we considered: 1) choosing values close to their first period values, taking into account that they tend to be smaller in period 2 than in period 1 (proxy I = (2.5, 1.5, 3.0, 1.0)); and 2) slightly overestimating the missing data (proxy II = (3.0, 2.0, 3.5, 1.25)). We first examined the bias and possible variance heterogeneity of the proxy data from the actual responses. The variances of the proxy data sets are a little larger than those of the actual data, but the test of a common variance is not rejected. Hence, the tests are compared against $t_{13}$, with approximate degrees of freedom of $N - 4$, assuming a common variance between proxy and the real data. Rows 5 and 6 in Table 1 show the results of the analysis described in Huang et al.,[23] and rows 7 and 8 show those using SAS PROC MIXED. When using PROC MIXED, we adjusted the degrees of freedom as suggested by Huang et al.[23] These proxy approaches reflected the mechanism used in choosing the proxy values, with slightly less sensitive results in proxy II than in proxy I, and an indication that the residual effects are nonexistent. Huang et al.[23] note that tests of treatment effects are affected by the presence of proxy data, resulting in a less sensitive outcome than for nonproxy approaches in this case. However, if we remove the insignificant residual effects from the model, we reject the null hypothesis of equal treatment effects in all cases, and conclude that the treatment effects are significantly different, as in the analysis of full original data.

    Finally, we applied the non-regression-based multiple imputation approach of Huang and Carriere.[27] The results are reported in row 9 of Table 1. The advantage of this approach is the convenience of using any standard software of choice for the analysis, but the penalty is quite high, as reflected in the adjusted degrees of freedom. The qualitative results are similar to those from the original data analysis and the available case analysis (see rows 1 and 3).

    CONCLUSIONS

    The purpose of this entry is twofold: the first is to supplement Carriere et al.[4] by including the imputation procedure for small sample repeated measures data, and the second is to compare the implications of various incomplete data methods. Available data analyses are said to be generally unsatisfactory, because the calculation of the asymptotic standard errors of estimators is quite complex even for MCAR. However, when available, they are found to be more powerful than multiple imputations. Imputation-based procedures fill in the missing values so that the data can be analyzed by standard methods. When desired, the strategy of the multiple imputation method discussed by Huang and Carriere[27] may be used for small sample repeated measures data; its overall performance was not substantially worse than that of the alternative. The discussion was limited to the MCAR and MAR cases; future work will deal with the NIM case.

    ACKNOWLEDGMENTS

    This work was funded by grants from the Natural Sciences and Engineering Research Council of Canada, the Alberta Heritage Foundation for Medical Research, and the Korea Federation of Science and Technology Societies (Brain Pool) to K.C. Carriere and from the Korean Research Foundation (KRF-2004-015-C0086) to T.S. Park.

    Table 1 Analysis results of bronchial asthma data

    Method                  $\hat{\tau}$   $se(\hat{\tau})$   df      P-value   $\hat{\gamma}$   $se(\hat{\gamma})$   df      P-value
    Full original data      0.384          0.169              15      0.038     0.512            0.315                15      0.125
    Complete subset data    0.404          0.178              11      0.044     0.503            0.323                11      0.148
    Incomplete method (a)   0.384          0.152              11      0.028     0.471            0.285                11      0.127
    Incomplete method (b)   0.384          0.163              10      0.040     0.470            0.309                10      0.159
    Proxy (c)               0.384          0.177              13      0.049     0.468            0.340                13      0.193
    Proxy (c)               0.384          0.180              13      0.053     0.468            0.348                13      0.206
    Proxy (d)               0.384          0.166              13      0.038     0.468            0.319                13      0.166
    Proxy (d)               0.384          0.169              13      0.041     0.468            0.326                13      0.174
    MI (e)                  0.384          0.164              9.429   0.043     0.554            0.333                6.554   0.143

    (a) Method by Carriere.[1]
    (b) Method by PROC MIXED of SAS.
    (c) Method by Huang et al.[23]
    (d) Method by PROC MIXED of SAS with df adjustment of Huang et al.[23]
    (e) Multiple imputation method by Huang and Carriere.[27]


    REFERENCES

    1. Carriere, K.C. Incomplete repeated measures data analysis in the presence of treatment effects. J. Am. Statist. Assoc. 1994, 89, 680-686.

    2. Carriere, K.C. Methods for repeated measures data analysis with missing values. J. Statist. Plann. Inference 1999, 77, 221-236.

    3. Richardson, B.A.; Flack, V.F. The analysis of incomplete data in the three-period two-treatment cross-over design for clinical trials. Stat. Med. 1996, 15(2), 127-143.

    4. Carriere, K.C.; Huang, R.; Sheng, X.; Liang, Y. Missing values in repeated measures designs. In Encyclopedia of Biopharmaceutical Statistics, 2nd Ed.; Marcel Dekker Inc., in press.

    5. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley: New York, 1987.

    6. Barnard, J.O.; Rubin, D.B. Small-sample degrees of freedom with multiple imputation. Biometrika 1999, 86(4), 948-955.

    7. Efron, B. Missing data, imputation, and the bootstrap. J. Am. Statist. Assoc. 1994, 89, 463-478.

    8. Gelfand, A.E.; Smith, A.F.M. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 1990, 85, 398-409.

    9. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; John Wiley: New York, 1987.

    10. Little, R. Pattern-mixture models for multivariate incomplete data. J. Am. Statist. Assoc. 1993, 88, 125-134.

    11. Rubin, D.B.; Schenker, N. Interval estimation from multiple imputed data: a case study using agriculture industry codes. J. Official Statist. 1987, 3, 375-387.

    12. Paik, M.C. The generalized estimating equation approach when data are not missing completely at random. J. Am. Statist. Assoc. 1997, 92, 1320-1329.

    13. Xie, F.; Paik, M.C. Generalized estimating equation model for binary outcomes with missing covariates. Biometrics 1997, 53, 1458-1466.

    14. Sheng, X.; Carriere, K.C. Strategies for analyzing missing item response data with an application. Biom. J. in press.

    15. Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation (C/R: p. 541-550). J. Am. Statist. Assoc. 1987, 82, 528-540.

    16. Gelman, A.; Rubin, D.B. Inference from iterative simulation using multiple sequences (Disc: p. 483-501, 503-511). Statist. Sci. 1992, 7, 457-472.

    17. Liang, K.-Y.; Zeger, S. Longitudinal data analysis using generalized linear models. Biometrika 1986, 73, 13-33.

    18. Park, T.; Lee, S.Y. A test of missing completely at random for longitudinal data with missing observations. Statist. Med. 1997, 16, 1859-1871.

    19. Park, T.; Lee, S.Y. Simple pattern-mixture models for longitudinal data with missing observations. Statist. Med. 1999, 18, 2933-2941.

    20. Kim, J.O.; Curry, J. The treatment of missing data in multivariate analysis. Sociol. Methods Res. 1977, 6, 215-240.

    21. Satterthwaite, F.E. An approximate distribution of estimates of variance components. Biometrics Bull. 1946, 2, 110-114.

    22. SAS Institute. SAS Technical Report P-229 (6.07); SAS Institute, Inc.: Cary, NC, 2002.

    23. Huang, R.; Liang, Y.Y.; Carriere, K.C. The role of proxy information in missing data analysis. Statist. Methods Med. Res. in press.

    24. Li, K.H.; Raghunathan, T.E.; Rubin, D.B. Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. J. Am. Statist. Assoc. 1991, 86, 1065-1073.

    25. Li, K.H.; Meng, X.L.; Raghunathan, T.E.; Rubin, D.B. Significance levels from repeated p-values with multiply-imputed data. Statist. Sinica 1991, 1, 65-92.

    26. Meng, X.L.; Rubin, D.B. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika 1992, 79, 103-111.

    27. Huang, R.; Carriere, K.C. Comparison of methods for incomplete repeated measures data analysis in small samples. J. Statist. Plann. Inference, in press.

    28. Patel, H.I. Analysis of incomplete data from a clinical trial with repeated measurements. Biometrika 1991, 78, 609-619.

    29. Grizzle, J.E. The two-period change-over design and its use in clinical trials. Biometrics 1965, 21, 467-480.