Lecture Notes in Statistics Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger
126
Springer New York Berlin Heidelberg Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo
Geert Verbeke Geert Molenberghs
Linear Mixed Models in Practice A SAS-Oriented Approach
Springer
Geert Verbeke Biostatistical Centre for Clinical Trials Katholieke Universiteit Leuven U.Z. Sint-Rafael Kapucijnenvoer 35 B-3000 Leuven Belgium
Geert Molenberghs Biostatistics Limburgs Universitair Centrum Universitaire Campus, Building D B-3590 Diepenbeek Belgium
SAS® is a registered trademark of SAS Institute, Inc. in the USA and other countries.
Verbeke, Geert. Linear mixed models in practice: a SAS-oriented approach / Geert Verbeke, Geert Molenberghs.
p. cm. -- (Lecture notes in statistics; 88)
Includes bibliographical references and index.
ISBN-13: 978-0-387-98222-9 e-ISBN-13: 978-1-4612-2294-1 DOI: 10.1007/978-1-4612-2294-1
1. Linear models (Statistics)--Data processing. 2. SAS (Computer file) I. Molenberghs, Geert. II. Title. III. Series: Lecture notes in statistics (Springer-Verlag); v. 88.
QA279.V46 1997 519.5'35--dc21 97-15705
Printed on acid-free paper.
© 1997 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Camera-ready copy provided by the authors.
987654321
ISBN-13: 978-0-387-98222-9 Springer-Verlag New York Berlin Heidelberg SPIN 10557083
To Godewina and Lien
To Conny and An
Preface
The dissemination of the MIXED procedure in SAS has provided a whole class of statistical models for routine use. We believe that both the ideas behind the techniques and their implementation in SAS are not at all straightforward, and users from various applied backgrounds, including the pharmaceutical industry, have experienced difficulties in using the procedure effectively. Courses and consultancy on PROC MIXED have been in great demand in recent years, illustrating the clear need for resource material to aid the user. This book is intended as a contribution to bridging this gap.
We hope the book will be of value to a wide audience, including applied statisticians and many biomedical researchers, particularly in the pharmaceutical industry, medical and public health research organizations, contract research organizations, and academic departments. This implies that our book is explanatory rather than research oriented and that it emphasizes practice rather than mathematical rigor. In this respect, clear guidance and advice on practical issues are the main focus of the text. Nevertheless, this does not imply that more advanced topics have been avoided. Sections containing material of a deeper level have been signposted by means of an asterisk.
The text grew out of two short courses for which the initiative was taken in 1995 and 1996 by the Biopharmaceutical Section of the Belgian Statistical Society. They took place at the Limburgs Universitair Centrum (Diepenbeek). Each of these two-day courses devoted one session to an example-based introduction of the linear mixed model (Luc Duchateau and Paul Janssen) where various types of applications were explored, two sessions to longitudinal data analysis with the linear mixed model (Geert Verbeke), and one session to missing data issues (Geert Molenberghs, Luc Bijnens, and David Shaw). This structure is directly reflected in the structure of this book. While this text is strictly speaking an edited volume, editors and authors have devoted a lot of attention to streamlining the treatment. In particular, we aimed for a single notational convention and for a clear delineation between the treatment of topics. For a couple of reasons there is a little residual overlap. First, the earlier chapters introduce concepts with emphasis on examples and simple derivations, while a broader treatment is given in later chapters. Second, we have chosen not to introduce new concepts until their first place of natural occurrence. Third, different data structures require different choices at analysis time.
The various chapters have benefited from exposure to students in various situations, such as the European Course of Advanced Statistics (Milton Keynes, September 1995) where Chapter 3 was used in a session on missing data. Most of the material has been used in the Analysis of Variance and Longitudinal Data Analysis courses at the Master of Science in Biostatistics Programme of the Limburgs Universitair Centrum and in the Topics in Biostatistics course of the Universiteit Antwerpen.
The first chapter provides a general introduction. Chapter 2 is an example-based treatment of statistical inference for mixed models. In Chapter 3, we discuss the analysis of longitudinal data using linear mixed models, from a practical point of view. Chapter 4 is devoted to four longitudinal case studies. The concluding chapter considers the ubiquitous problem of missing and incomplete longitudinal data.
It will be clear that we have placed a strong emphasis on the SAS procedure MIXED. This should not discourage non-SAS users from consulting this book. We have put considerable effort into treating data analysis issues in a generic fashion, instead of making them fully software dependent. Therefore, a research question is first translated into a statistical model by means of algebraic notation. In a number of cases, such a model is then implemented using SAS code. Similarly, SAS output is in the majority of cases avoided, even though most parts of a typical SAS PROC MIXED output are discussed at least once. In general, we have opted to present relevant results of model fit (such as parameter estimates, standard errors, maximized log-likelihoods, etc.) in figures and tables that do not necessarily reflect the structure of the output. In Section 5.11, we give a brief treatment of the S-Plus set of functions termed OSWALD (Version 2.6).
Throughout this book we used SAS Versions 6.11 and 6.12. For the MIXED procedure, the differences between both versions are minor. They include different tabular output and a few new covariance structures. Selected macros for model checks, diagnostic tools, and for multiple imputation are available from Springer-Verlag's URL: www.springer-ny.com.
Luc Duchateau (ILRI, Nairobi)
Paul Janssen (LUC, Diepenbeek)
Geert Molenberghs (LUC, Diepenbeek)
Geert Verbeke (KUL, Leuven)
Contents
Preface
Acknowledgements
1 Introduction GEERT MOLENBERGHS, GEERT VERBEKE
2 An Example-Based Tour in Linear Mixed Models LUC DUCHATEAU, PAUL JANSSEN
2.1 Fixed Effects and Random Effects in Mixed Models
2.2 General Linear Mixed Models
2.3 Variance Components Estimation and Best Linear Unbiased Prediction
2.3.1 Variance Components Estimation
2.3.2 Best Linear Unbiased Prediction (BLUP)
2.3.3 Examples and the SAS Procedure MIXED
2.4 Fixed Effects: Estimation and Hypotheses Testing
2.4.1 General Considerations
2.4.2 Examples and the SAS Procedure MIXED
2.5 Case Studies
2.5.1 Cell Proliferation
2.5.2 A Cross-Over Experiment
2.5.3 A Multicenter Trial
3 Linear Mixed Models for Longitudinal Data GEERT VERBEKE
3.1 Introduction
3.2 The Study of Natural History of Prostate Disease
3.3 A Two-Stage Analysis
3.4 The General Linear Mixed-Effects Model
3.4.1 The Model
3.4.2 Maximum Likelihood Estimation
3.4.3 Restricted Maximum Likelihood Estimation
3.4.4 Comparison between ML and REML Estimation
3.4.5 Model-Fitting Procedures
3.5 Example
3.5.1 The SAS Program
3.5.2 The SAS Output
3.5.3 Estimation Problems due to Small Variance Components
3.6 The RANDOM and REPEATED Statements
3.7 Testing and Estimating Contrasts of Fixed Effects
3.7.1 The CONTRAST Statement
3.7.2 Model Reduction
3.7.3 The ESTIMATE Statement
3.8 PROC MIXED versus PROC GLM
3.9 Tests for the Need of Random Effects
3.9.1 The Likelihood Ratio Test
3.9.2 Applied to the Prostate Data
3.10 Comparing Non-Nested Covariance Structures
3.11 Estimating the Random Effects
3.12 General Guidelines for Model Construction
3.12.1 Selection of a Preliminary Mean Structure
3.12.2 Selection of Random-Effects
3.12.3 Selection of Residual Covariance Structure
3.12.4 Model Reduction
3.13 Model Checks and Diagnostic Tools *
3.13.1 Normality Assumption for the Random Effects *
3.13.2 The Detection of Influential Subjects *
3.13.3 Checking the Covariance Structure *
4 Case Studies GEERT VERBEKE, GEERT MOLENBERGHS
4.1 Example 1: Variceal Pressures
4.2 Example 2: Growth Curves
4.3 Example 3: Blood Pressures
4.4 Example 4: Growth Data
4.4.1 Model 1
4.4.2 Model 2
4.4.3 Model 3
4.4.4 Graphical Exploration
4.4.5 Model 4
4.4.6 Model 5
4.4.7 Model 6
4.4.8 Model 7
4.4.9 Model 8
5 Linear Mixed Models and Missing Data GEERT MOLENBERGHS, LUC BIJNENS, DAVID SHAW
5.1 Introduction
5.2 Missing Data
5.2.1 Missing Data Patterns
5.2.2 Missing Data Mechanisms
5.2.3 Ignorability
5.3 Approaches to Incomplete Data
5.4 Complete Case Analysis
5.4.1 Growth Data
5.5 Simple Forms of Imputation
5.5.1 Last Observation Carried Forward
5.5.2 Imputing Unconditional Means
5.5.3 Buck's Method: Conditional Mean Imputation
5.5.4 Discussion of Imputation Techniques
5.6 Available Case Methods
5.6.1 Growth Data
5.7 Likelihood-Based Ignorable Analysis and PROC MIXED
5.7.1 Growth Data
5.7.2 Summary
5.8 How Ignorable Is Missing At Random? *
5.8.1 Information and Sampling Distributions *
5.8.2 Illustration *
5.8.3 Example *
5.8.4 Implications for PROC MIXED
5.9 The Expectation-Maximization Algorithm *
5.10 Multiple Imputation *
5.10.1 General Theory *
5.10.2 Illustration: Growth Data *
5.11 Exploring the Missing Data Process
5.11.1 Growth Data
5.11.2 Informative Non-Response
5.11.3 OSWALD for Informative Non-Response
A Inference for Fixed Effects
A.1 Estimation
A.2 Hypothesis Testing
A.3 Determination of Degrees of Freedom
A.4 Satterthwaite's Procedure
B Variance Components and Standard Errors
C Details on Table 2.10: Expected Mean Squares
D Example 2.8: Cell Proliferation
References
Index
Acknowledgements
We are indebted to the board members of the Biopharmaceutical Section of the Belgian Statistical Society, Annick Leroy (Bristol-Myers Squibb, Waterloo), William Malbecq (Merck Sharp and Dohme, Brussels), and Linda Ritter (ID2, Brussels), on the one hand, and to Paul Janssen (Limburgs Universitair Centrum, Diepenbeek) on the other hand, who jointly took the initiative to organize a couple of workshops on "Linear Mixed Models in SAS". We gratefully acknowledge the help we had from the other members of the initial working group that was formed in preparation of these courses: Luc Wouters and Tony Vangeneugden (Janssen Pharmaceutica, Beerse), Luc Duchateau (ILRI, Nairobi), Luc Bijnens (EORTC, Brussels), and David Shaw (Shaw Statistics Ltd, Bucks). We thank Peter Diggle and Dave Smith (Lancaster University) for valuable support and stimulating discussions, not least on OSWALD. We appreciate input from Russell Wolfinger (SAS Institute, Cary) on SAS-related matters. Several sections in the book are based on joint research with Emmanuel Lesaffre (Katholieke Universiteit Leuven), Larry Brant (Gerontology Research Center and The Johns Hopkins University, Baltimore), Mike Kenward (University of Kent, Canterbury), and Stuart Lipsitz (Dana-Farber Cancer Institute and Harvard School of Public Health, Boston). Interesting sets of data were provided by Luc Wouters, Tony Vangeneugden, and Larry Brant. We are very grateful to Viviane Mebis (Limburgs Universitair Centrum) and Bart Spiessens (Katholieke Universiteit Leuven) for their invaluable secretarial and technical support. We apologize to our wives and daughters for the time we did not spend with them during the preparation of this book, and we are very grateful for their understanding. The preparation of this book has been a period of close and stimulating collaboration, of which we will keep good memories.
Geert and Geert
Kessel-Lo, April 1997