Lecture Notes in Statistics 126 - Springer

Lecture Notes in Statistics Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger 126


Springer New York Berlin Heidelberg Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo


Geert Verbeke Geert Molenberghs

Linear Mixed Models in Practice A SAS-Oriented Approach

Springer


Geert Verbeke Biostatistical Centre for Clinical Trials Katholieke Universiteit Leuven U.Z. Sint-Rafael Kapucijnenvoer 35 B-3000 Leuven Belgium

Geert Molenberghs Biostatistics Limburgs Universitair Centrum Universitaire Campus, Building D B-3590 Diepenbeek Belgium

SAS® is a registered trademark of SAS Institute, Inc. in the USA and other countries.

Verbeke, Geert. Linear mixed models in practice: a SAS-oriented approach / Geert Verbeke, Geert Molenberghs. p. cm.-(Lecture notes in statistics; 88)

Includes bibliographical references and index. ISBN-13: 978-0-387-98222-9 e-ISBN-13: 978-1-4612-2294-1 DOI: 10.1007/978-1-4612-2294-1 1. Linear models (Statistics)-Data processing. 2. SAS (Computer file) I. Molenberghs, Geert. II. Title. III. Series: Lecture notes in statistics (Springer-Verlag); v. 88.

QA279.V46 1997 519.5'35--dc21 97-15705

Printed on acid-free paper.

© 1997 Springer-Verlag New York, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Camera-ready copy provided by the authors.

987654321

ISBN-13: 978-0-387-98222-9 Springer-Verlag New York Berlin Heidelberg SPIN 10557083


To Godewina and Lien

To Conny and An


Preface

The dissemination of the MIXED procedure in SAS has made a whole class of statistical models available for routine use. We believe, however, that neither the ideas behind these techniques nor their implementation in SAS is straightforward, and users from various applied backgrounds, including the pharmaceutical industry, have experienced difficulties in using the procedure effectively. Courses and consultancy on PROC MIXED have been in great demand in recent years, illustrating a clear need for resource material to aid the user. This book is intended as a contribution to bridging this gap.

We hope the book will be of value to a wide audience, including applied statisticians and biomedical researchers, particularly in the pharmaceutical industry, medical and public health research organizations, contract research organizations, and academic departments. Accordingly, the book is explanatory rather than research oriented, and it emphasizes practice rather than mathematical rigor. Clear guidance and advice on practical issues are the main focus of the text. This does not mean that more advanced topics have been avoided: sections containing material at a deeper level are signposted by an asterisk.

The text grew out of two short courses held in 1995 and 1996 on the initiative of the Biopharmaceutical Section of the Belgian Statistical Society. They took place at the Limburgs Universitair Centrum (Diepenbeek). Each of these two-day courses devoted one session to an example-based introduction to the linear mixed model (Luc Duchateau and Paul Janssen), in which various types of applications were explored, two sessions to longitudinal data analysis with the linear mixed model (Geert Verbeke), and one session to missing data issues (Geert Molenberghs, Luc Bijnens, and David Shaw). This structure is directly reflected in the structure of this book. Although the text is, strictly speaking, an edited volume, editors and authors have devoted much attention to streamlining the treatment. In particular, we aimed for a single notational convention and for a clear delineation between topics. Some residual overlap remains, for three reasons. First, the earlier chapters introduce concepts with an emphasis on examples and simple derivations, while a broader treatment is given in later chapters. Second, we have chosen not to introduce new concepts before the point where they first arise naturally. Third, different data structures require different choices at the analysis stage.


The various chapters have benefited from exposure to students in a variety of settings, such as the European Course of Advanced Statistics (Milton Keynes, September 1995), where Chapter 3 was used in a session on missing data. Most of the material has been used in the Analysis of Variance and Longitudinal Data Analysis courses of the Master of Science in Biostatistics Programme at the Limburgs Universitair Centrum and in the Topics in Biostatistics course at the Universiteit Antwerpen.

The first chapter provides a general introduction. Chapter 2 is an example-based treatment of statistical inference for mixed models. In Chapter 3, we discuss the analysis of longitudinal data using linear mixed models from a practical point of view. Chapter 4 is devoted to four longitudinal case studies. The concluding chapter considers the ubiquitous problem of missing and incomplete longitudinal data.

Clearly, we have placed a strong emphasis on the SAS procedure MIXED. This should not discourage non-SAS users from consulting this book. We have put considerable effort into treating data analysis issues in a generic fashion rather than in a fully software-dependent way. Therefore, a research question is first translated into a statistical model by means of algebraic notation. In a number of cases, such a model is then implemented using SAS code. Similarly, we avoid reproducing SAS output in most cases, although most parts of a typical PROC MIXED output are discussed at least once. In general, we have opted to present relevant results of model fit (such as parameter estimates, standard errors, and maximized log-likelihoods) in figures and tables that do not necessarily reflect the structure of the output. In Section 5.11, we give a brief treatment of the S-Plus set of functions termed OSWALD (Version 2.6).

Throughout this book, we used SAS Versions 6.11 and 6.12. For the MIXED procedure, the differences between the two versions are minor: they include different tabular output and a few new covariance structures. Selected macros for model checks, diagnostic tools, and multiple imputation are available from Springer-Verlag's URL: www.springer-ny.com.

Luc Duchateau (ILRI, Nairobi)

Paul Janssen (LUC, Diepenbeek)

Geert Molenberghs (LUC, Diepenbeek)

Geert Verbeke (KUL, Leuven)


Contents

Preface vii

Acknowledgements xv

1 Introduction 1
GEERT MOLENBERGHS, GEERT VERBEKE

2 An Example-Based Tour in Linear Mixed Models 11
LUC DUCHATEAU, PAUL JANSSEN

2.1 Fixed Effects and Random Effects in Mixed Models 11

2.2 General Linear Mixed Models 22

2.3 Variance Components Estimation and Best Linear Unbiased Prediction 30

2.3.1 Variance Components Estimation 30

2.3.2 Best Linear Unbiased Prediction (BLUP) 34

2.3.3 Examples and the SAS Procedure MIXED 36

2.4 Fixed Effects: Estimation and Hypotheses Testing 39

2.4.1 General Considerations 39

2.4.2 Examples and the SAS Procedure MIXED 48

2.5 Case Studies 50

2.5.1 Cell Proliferation 50

2.5.2 A Cross-Over Experiment 54

2.5.3 A Multicenter Trial 56

3 Linear Mixed Models for Longitudinal Data 63
GEERT VERBEKE

3.1 Introduction 63

63


3.2 The Study of Natural History of Prostate Disease 65

3.3 A Two-Stage Analysis 67

3.4 The General Linear Mixed-Effects Model 70

3.4.1 The Model 70

3.4.2 Maximum Likelihood Estimation 72

3.4.3 Restricted Maximum Likelihood Estimation 73

3.4.4 Comparison between ML and REML Estimation 74

3.4.5 Model-Fitting Procedures 75

3.5 Example 76

3.5.1 The SAS Program 76

3.5.2 The SAS Output 81

3.5.3 Estimation Problems due to Small Variance Components 92

3.6 The RANDOM and REPEATED Statements 94

3.7 Testing and Estimating Contrasts of Fixed Effects 96

3.7.1 The CONTRAST Statement 96

3.7.2 Model Reduction 100

3.7.3 The ESTIMATE Statement 104

3.8 PROC MIXED versus PROC GLM 106

3.9 Tests for the Need of Random Effects 108

3.9.1 The Likelihood Ratio Test 109

3.9.2 Applied to the Prostate Data 112

3.10 Comparing Non-Nested Covariance Structures 113

3.11 Estimating the Random Effects 115

3.12 General Guidelines for Model Construction 120

3.12.1 Selection of a Preliminary Mean Structure 121

3.12.2 Selection of Random-Effects 122


3.12.3 Selection of Residual Covariance Structure 125

3.12.4 Model Reduction 130

3.13 Model Checks and Diagnostic Tools * 132

3.13.1 Normality Assumption for the Random Effects * 133

3.13.2 The Detection of Influential Subjects * 144

3.13.3 Checking the Covariance Structure * . 150

4 Case Studies 155
GEERT VERBEKE, GEERT MOLENBERGHS

4.1 Example 1: Variceal Pressures 155

4.2 Example 2: Growth Curves 160

4.3 Example 3: Blood Pressures 167

4.4 Example 4: Growth Data 172

4.4.1 Model 1 174

4.4.2 Model 2 179

4.4.3 Model 3 181

4.4.4 Graphical Exploration 182

4.4.5 Model 4 183

4.4.6 Model 5 185

4.4.7 Model 6 186

4.4.8 Model 7 187

4.4.9 Model 8 189

5 Linear Mixed Models and Missing Data 191
GEERT MOLENBERGHS, LUC BIJNENS, DAVID SHAW

5.1 Introduction 191

5.2 Missing Data 193

5.2.1 Missing Data Patterns 194

5.2.2 Missing Data Mechanisms 195


5.2.3 Ignorability 196

5.3 Approaches to Incomplete Data 197

5.4 Complete Case Analysis 198

5.4.1 Growth Data 199

5.5 Simple Forms of Imputation 203

5.5.1 Last Observation Carried Forward 204

5.5.2 Imputing Unconditional Means 209

5.5.3 Buck's Method: Conditional Mean Imputation 214

5.5.4 Discussion of Imputation Techniques 218

5.6 Available Case Methods 219

5.6.1 Growth Data 220

5.7 Likelihood-Based Ignorable Analysis and PROC MIXED 222

5.7.1 Growth Data 223

5.7.2 Summary 233

5.8 How Ignorable Is Missing At Random? * 233

5.8.1 Information and Sampling Distributions * 235

5.8.2 Illustration * 237

5.8.3 Example * 242

5.8.4 Implications for PROC MIXED 243

5.9 The Expectation-Maximization Algorithm * 244

5.10 Multiple Imputation * 247

5.10.1 General Theory * 248

5.10.2 Illustration: Growth Data * 250

5.11 Exploring the Missing Data Process 254

5.11.1 Growth Data 255

5.11.2 Informative Non-Response 256


5.11.3 OSWALD for Informative Non-Response 258

A Inference for Fixed Effects 275

A.1 Estimation 275

A.2 Hypothesis Testing 277

A.3 Determination of Degrees of Freedom 277

A.4 Satterthwaite's Procedure 279

B Variance Components and Standard Errors 281

C Details on Table 2.10: Expected Mean Squares 283

D Example 2.8: Cell Proliferation 285

References 287

Index 300


Acknowledgements

We are indebted to the board members of the Biopharmaceutical Section of the Belgian Statistical Society, Annick Leroy (Bristol-Myers Squibb, Waterloo), William Malbecq (Merck Sharp and Dohme, Brussels), and Linda Ritter (ID2, Brussels), and to Paul Janssen (Limburgs Universitair Centrum, Diepenbeek), who jointly took the initiative to organize a couple of workshops on "Linear Mixed Models in SAS". We gratefully acknowledge the help we received from the other members of the initial working group that was formed in preparation for these courses: Luc Wouters and Tony Vangeneugden (Janssen Pharmaceutica, Beerse), Luc Duchateau (ILRI, Nairobi), Luc Bijnens (EORTC, Brussels), and David Shaw (Shaw Statistics Ltd, Bucks). We thank Peter Diggle and Dave Smith (Lancaster University) for valuable support and stimulating discussions, not least on OSWALD. We appreciate input from Russell Wolfinger (SAS Institute, Cary) on SAS-related matters. Several sections in the book are based on joint research with Emmanuel Lesaffre (Katholieke Universiteit Leuven), Larry Brant (Gerontology Research Center and The Johns Hopkins University, Baltimore), Mike Kenward (University of Kent, Canterbury), and Stuart Lipsitz (Dana-Farber Cancer Institute and Harvard School of Public Health, Boston). Interesting data sets were provided by Luc Wouters, Tony Vangeneugden, and Larry Brant. We are very grateful to Viviane Mebis (Limburgs Universitair Centrum) and Bart Spiessens (Katholieke Universiteit Leuven) for their invaluable secretarial and technical support. We apologize to our wives and daughters for the time we did not spend with them during the preparation of this book, and we are very grateful for their understanding. The preparation of this book has been a period of close and stimulating collaboration, of which we will keep fond memories.

Geert and Geert

Kessel-Lo, April 1997