Flexible Imputation of Missing Data
Chapman and Hall/CRC – 2012 – 342 pages
Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science—multiple imputation—fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise.
Flexible Imputation of Missing Data is supported by many examples using real data drawn from the author's extensive experience in collaborative research, and presents a practical guide to handling missing data within the framework of multiple imputation. Detailed guidance on implementation in R using the author’s package MICE is included throughout the book.
Assuming familiarity with basic statistical concepts and multivariate methods, Flexible Imputation of Missing Data is intended for two audiences:
This graduate-tested book avoids mathematical and technical details as much as possible: each formula is accompanied by a verbal statement that explains it in layperson's terms. Readers less concerned with the theoretical underpinnings will be able to pick up the general idea, and technical material is available for those who desire a deeper understanding. The analyses can be replicated in R using a dedicated package developed by the author.
"This book would be well suited as a textbook, especially at the graduate level, possibly for biostatisticians, epidemiologists, or applied scientists and users of statistical methodology. …a very enjoyable read, and—at least in my opinion—it is a book that belongs on everyone’s shelf as it does open one’s eyes to a problem that has surrounded us (and that many of us have ignored!) for a very long time."
—Wolfgang S. Jank, Journal of the American Statistical Association, June 2013
"From the first lines of Chapter 1 throughout the entire monograph, the author presents numerous R language codes, so the book also serves as a good introduction to R. Each chapter is complete with various examples and exercises. The book is very useful to graduate students and researchers for solving practical problems with real data."
—Technometrics, February 2013
"It’s excellent and I highly recommend it. … van Buuren’s book is great even if you don’t end up using the algorithm described in the book … he supplies lots of intuition, examples, and graphs."
—Andrew Gelman, Columbia University
"… a beautiful book that is so full of guidance for statisticians … exceptionally up to date and has more useful wisdom about dealing with common missing data problems than any other source I've seen."
—Frank Harrell, Vanderbilt University
"I’m delighted to see this new book on multiple imputation by Stef van Buuren …This book represents a 'no nonsense' straightforward approach to the application of multiple imputation. I particularly like Stef’s use of graphical displays … It’s great to have Stef’s book on multiple imputation, and I look forward to seeing more editions as this rapidly developing methodology continues to become even more effective at handling missing data problems in practice."
—From the Foreword by Donald B. Rubin
The problem of missing data
Concepts of MCAR, MAR and MNAR
Simple solutions that do not (always) work
Multiple imputation in a nutshell
Goal of the book
What the book does not cover
Structure of the book
Incomplete data concepts
Why and when multiple imputation works
Statistical intervals and tests
When to use multiple imputation
How many imputations?
Univariate missing data
How to generate multiple imputations
Imputation under the normal linear model
Imputation under non-normal distributions
Predictive mean matching
Other data types
Classification and regression trees
Multivariate missing data
Missing data pattern
Issues in multivariate imputation
Monotone data imputation
Fully Conditional Specification
FCS and JM
Imputation in practice
Overview of modeling choices
Ignorable or non-ignorable?
Model form and predictors
Analysis of imputed data
What to do with the imputed data?
Statistical tests for multiple imputation
Stepwise model selection
Too many columns
Correct prevalence estimates from self-reported data
Correcting for selective drop-out
Correcting for non-response
Long and wide format
SE Fireworks Disaster Study
Time raster imputation
Some dangers, some do's and some don'ts