Applied Missing Data MISSING DATA: A Gentle Introduction Patrick E. McKnight, Katherine M. McKnight,...
Embed Size (px)
Transcript of Applied Missing Data MISSING DATA: A Gentle Introduction Patrick E. McKnight, Katherine M. McKnight,...
Applied Missing Data Analysis
Methodology in the Social Sciences David A. Kenny, Founding Editor Todd D. Little, Series Editor
This series provides applied researchers and students with analysis and research design books that emphasize the use of methods to answer research questions. Rather than emphasizing statistical theory, each volume in the series illustrates when a technique should (and should not) be used and how the output from available software programs should (and should not) be interpreted. Common pitfalls as well as areas of further development are clearly articulated.
SPECTRAL ANALYSIS OF TIME-SERIES DATA Rebecca M. Warner
A PRIMER ON REGRESSION ARTIFACTS Donald T. Campbell and David A. Kenny
REGRESSION ANALYSIS FOR CATEGORICAL MODERATORS Herman Aguinis
HOW TO CONDUCT BEHAVIORAL RESEARCH OVER THE INTERNET: A Beginner’s Guide to HTML and CGI/Perl R. Chris Fraley
CONFIRMATORY FACTOR ANALYSIS FOR APPLIED RESEARCH Timothy A. Brown
DYADIC DATA ANALYSIS David A. Kenny, Deborah A. Kashy, and William L. Cook
MISSING DATA: A Gentle Introduction Patrick E. McKnight, Katherine M. McKnight, Souraya Sidani, and Aurelio José Figueredo
MULTILEVEL ANALYSIS FOR APPLIED RESEARCH: It’s Just Regression! Robert Bickel
THE THEORY AND PRACTICE OF ITEM RESPONSE THEORY R. J. de Ayala
THEORY CONSTRUCTION AND MODEL-BUILDING SKILLS: A Practical Guide for Social Scientists James Jaccard and Jacob Jacoby
DIAGNOSTIC MEASUREMENT: Theory, Methods, and Applications André A. Rupp, Jonathan Templin, and Robert A. Henson
APPLIED MISSING DATA ANALYSIS Craig K. Enders
APPLIED MISSING DATA ANALYSIS
Craig K. Enders
Series Editor’s Note by Todd D. Little
THE GUILFORD PRESS New York London
© 2010 The Guilford Press A Division of Guilford Publications, Inc. 72 Spring Street, New York, NY 10012 www.guilford.com
All rights reserved
No part of this book may be reproduced, translated, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfi lming, recording, or otherwise, without written permission from the publisher.
Printed in the United States of America
This book is printed on acid-free paper.
Last digit is print number: 9 8 7 6 5 4 3 2 1
Library of Congress Cataloging-in-Publication Data is available from the publisher.
Series Editor’s Note
Missing data are a real bane to researchers across all social science disciplines. For most of our scientifi c history, we have approached missing data much like a doctor from the ancient world might use bloodletting to cure disease or amputation to stem infection (e.g, removing the infected parts of one’s data by using list-wise or pair-wise deletion). My metaphor should make you feel a bit squeamish, just as you should feel if you deal with missing data using the antediluvian and ill-advised approaches of old. Fortunately, Craig Enders is a gifted quan- titative specialist who can clearly explain missing data procedures to diverse readers from beginners to seasoned veterans. He brings us into the age of modern missing data treatments by demystifying the arcane discussions of missing data mechanisms and their labels (e.g., MNAR) and the esoteric acronyms of the various techniques used to address them (e.g., FIML, MCMC, and the like).
Enders’s approachable treatise provides a comprehensive treatment of the causes of miss- ing data and how best to address them. He clarifi es the principles by which various mecha- nisms of missing data can be recovered, and he provides expert guidance on which method to implement and how to execute it, and what to report about the modern approach you have chosen. Enders’s deft balancing of practical guidance with expert insight is refreshing and enlightening. It is rare to fi nd a book on quantitative methods that you can read for its stated purpose (educating the reader about modern missing data procedures) and fi nd that it treats you to a level of insight on a topic that whole books dedicated to the topic cannot match. For example, Enders’s discussions of maximum likelihood and Bayesian estimation procedures are the clearest, most understandable, and instructive discussions I have read— your inner geek will be delighted, really.
Enders successfully translates the state-of-the art technical missing data literature into an accessible reference that you can readily rely on and use. Among the treasures of Enders’s work are the pointed simulations that he has developed to show you exactly what the techni- cal literature obtusely presents. Because he provides such careful guidance of the foundations and the step-by-step processes involved, you will quickly master the concepts and issues of this critical literature. Another treasure is his use of a common running example that he
vi Series Editor’s Note
builds upon as more complex issues are presented. And if these features are not enough, you can also visit the accompanying website (www.appliedmissingdata.com), where you will fi nd up-to-date program fi les for the examples presented, as well as additional examples of the different software programs available for handling missing data.
What you will learn from this book is that missing data imputation is not cheating. In fact, you will learn why the egregious scientifi c error would be the business-as-usual ap- proaches that still permeate our journals. You will learn that modern missing data procedures are so effective that intentionally missing data designs often can provide more valid and gen- eralizable results than traditional data collection protocols. In addition, you will learn to re- think how you collect data to maximize your ability to recover any missing data mechanisms and that many quandaries of design and analysis become resolvable when recast as a missing data problem. Bottom line—after you read this book you will have learned how to go forth and impute with impunity!
TODD D. LITTLE University of Kansas Lawrence, Kansas
Methodologists have been studying missing data problems for the better part of a century, and the number of published articles on this topic has increased dramatically in recent years. A great deal of recent methodological research has focused on two “state-of-the-art” missing data methods: maximum likelihood and multiple imputation. Accordingly, this book is de- voted largely to these techniques. Quoted in the American Psychological Association’s Monitor on Psychology, Stephen G. West, former editor of Psychological Methods, stated that “routine implementation of these new methods of addressing missing data will be one of the major changes in research over the next decade” (Azar, 2002). Although researchers are using maxi- mum likelihood and multiple imputation with greater frequency, reviews of published articles in substantive journals suggest that a gap still exists between the procedures that the meth- odological literature recommends and those that are actually used in the applied research studies (Bodner, 2006; Peugh & Enders, 2004; Wood, White, & Thompson, 2004).
It is understandable that researchers routinely employ missing data handling techniques that are objectionable to methodologists. Software packages make old standby techniques (e.g., eliminating incomplete cases) very convenient to implement. The fact that software pro- grams routinely implement default procedures that are prone to substantial bias, however, is troubling because such routines implicitly send the wrong message to researchers interested in using statistics without having to keep up with the latest advances in the missing data literature. The technical nature of the missing data literature is also a signifi cant barrier to the widespread adoption of maximum likelihood and multiple imputation. While many of the fl awed missing data handling techniques (e.g., excluding cases, replacing missing values with the mean) are very easy to understand, the newer approaches can seem like voodoo. For ex- ample, researchers often appear perplexed by the possibility of conducting an analysis with- out discarding cases and without fi lling in the missing values—and rightfully so. The seminal books on missing data analyses (Little & Rubin, 2002; Schafer, 1997) are rich sources of technical information, but these books can be a daunting read for substantive researchers and methodologists alike. In large part, the purpose of this book is to “translate” the techni- cal missing data literature into an accessible reference text.
Because missing data are a pervasive problem in virtually any discipline that employs quantitative research methods, my goal was to write a book that would be relevant and ac- cessible to researchers from a wide range of disciplines, including psychology, education, sociology, business, and medicine. For me, it is important for the book to serve as an acces- sible reference for substantive researchers who use quantitative methods in their work but do not consider themselves quantitative specialists. At the same time, many quantitative meth- odologists are unfamiliar with the nuances of modern missing data handling techniques. Therefore, it was also important to provide a level of detail that could serve as a springboard for accessing technically oriented missing data books such as Little and Rubin (2002) and Schafer (1997). Most of the information in this book