Jenny H. Qin and Mike Singleton Kentucky CODES Kentucky Injury Prevention & Research Center

July 14th, 2003, www.kiprc.uky.edu, 29th TRF 2003, Denver. Jenny H. Qin and Mike Singleton, Kentucky CODES, Kentucky Injury Prevention & Research Center, University of Kentucky. Performing Sensitivity Analyses of Imputed Missing Values


Transcript of Jenny H. Qin and Mike Singleton Kentucky CODES Kentucky Injury Prevention & Research Center

Page 1: Performing Sensitivity Analyses of Imputed Missing Values

Jenny H. Qin and Mike Singleton

Kentucky CODES, Kentucky Injury Prevention & Research Center

University of Kentucky

Page 2: Publications

• Multiple Imputation in Public Health Research

• Handling Missing Data in Nursing Research with Multiple Imputation

• Application of Multiple Imputation in Medical Studies: from AIDS to NHANES

• NHTSA: Transitioning to Multiple Imputation!

• A new Method to Impute Missing BAC values in FARS Multiple Imputation

Page 3: Questions???

• May I use MI to deal with missing data problems for my data sets?

• How can I believe that MI will give me better analysis results?

• What should I do to get good results from MI?

Page 4: Answers: Sensitivity Analyses on Imputed Values

A sensitivity analysis tests whether our study results are sensitive to the assumptions (missing data mechanism), data conditions (missing data rate), and choices (imputation model or number of imputations) made in obtaining the results.

Page 5: MI Process

[Flow chart: Data Set of Interest (1: Missing Data Mechanism; 2: Missing Data Rate) -> Proc MI (3: Imputation Model; 4: Proc MI Options) -> imputed data sets Set 1, Set 2, Set 3, ..., Set n -> Analysis Model -> Results 1, Results 2, Results 3, ..., Results n -> Proc MIANALYZE -> Results]

Page 6: CODES Application

Research Question: What was the relationship between driving under the influence of drugs and/or alcohol and being killed or hospitalized in a crash, for motorcycle riders in Kentucky in 2001?

Outcome (Dependent Variable): Killed or Hospitalized (K/H)

Risk Factor Candidates (Independent Variables): age, gender, suspected DUI, posted speed limit, helmet use, fixed object, head-on collision, collision time, rural vs. urban

Page 7: Analysis Model

Logistic Regression Model: logit[P(K/H)] = β0 + β1*DUI + β2*Speed + β3*Fixed + β4*Head-On

Total records in our study data set: 1,226

Records with missing values: 14 (1.1%)

Page 8: Results for the Gold Standard

Parameter  OR (95% CI)        Estimate  SE      P
DUI        2.51 (1.58, 3.98)  0.9189    0.2364  0.0001
Speed      1.58 (1.18, 2.10)  0.4546    0.1456  0.0018
Fixed      1.70 (1.24, 2.33)  0.5311    0.1599  0.0009
Head-on    1.70 (1.04, 2.77)  0.5316    0.2486  0.0380

This Gold Standard result is used as the reference for comparison with all other results.

Conclusion: the odds of being killed or hospitalized for motorcyclists with DUI are 2.5 times the odds for motorcyclists without DUI, when other factors are controlled.
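The OR column can be recovered from the Estimate and SE columns, since the odds ratio is the exponential of the logistic regression coefficient. The following is a minimal sketch, assuming a normal-approximation 95% interval (z = 1.96); the function name `odds_ratio_ci` is illustrative and not part of the original analysis, which was run in SAS. Rows other than DUI may differ from the slide in the last digit because the published estimates are rounded.

```python
import math

def odds_ratio_ci(estimate, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds estimate and its standard error."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

# Estimates and standard errors copied from the Gold Standard table.
gold_standard = {
    "DUI": (0.9189, 0.2364),
    "Speed": (0.4546, 0.1456),
    "Fixed": (0.5311, 0.1599),
    "Head-on": (0.5316, 0.2486),
}

for name, (est, se) in gold_standard.items():
    or_, lo, hi = odds_ratio_ci(est, se)
    print(f"{name:8s} OR = {or_:.2f} ({lo:.2f}, {hi:.2f})")
```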

Page 9: Imputation Model

Analysis Model: logit[P(K/H)] = β0 + β1*DUI + β2*Speed + β3*Fixed + β4*Head-On

Imputation Model: K/H DUI Speed Fixed Head-On

Note: The imputation model does not have to be identical to the analysis model, but it should include at least all of the analysis covariates. You can also add any variables that are correlated with the variables that have missing values.

Page 10: SA: Missing Data Mechanism (1)

[Flow chart: Study Data Set (1: Missing Data Mechanism, varied over MCAR, MAR, NMAR; 2: Missing Data Rate) -> Proc MI (3: Imputation Model; 4: Proc MI options) -> Data Analysis (Analysis Model) -> Proc MIANALYZE -> Results]

Page 11: SA: Missing Data Mechanism (1)

• Missing Completely At Random (MCAR)

– DFN: the missing data values are a simple random sample of all data values.

– We simulated this condition by using SAS Proc SurveySelect to pick a random sample from the study data set, then setting DUI = missing for the selected cases.

• Missing At Random (MAR)

– DFN: the probability of missing values on one variable is unrelated to the values of this variable, after controlling for other variables in the analysis.

– We simulated this condition by setting DUI = missing for riders aged 46 or older.

• Not Missing At Random (NMAR)

– DFN: the probability of missing values on one variable is related to the values of this variable, even after controlling for other variables in the analysis.

– We simulated this condition by setting DUI = missing for uninjured riders who were not suspected of DUI (DUI='NO').
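The three simulation rules above can be sketched in a few lines. This is only an illustration in Python on made-up records, not the authors' SAS code: the field names (`age`, `dui`, `injured`), the toy data, and the function names are all assumptions. The slides do not say how the MAR and NMAR data sets were brought to a fixed missing rate, so only the MCAR function takes a rate here.

```python
import random

def make_mcar(records, rate, seed=0):
    """MCAR: blank DUI for a simple random sample of the records."""
    rng = random.Random(seed)
    n_missing = round(rate * len(records))
    chosen = set(rng.sample(range(len(records)), n_missing))
    return [dict(r, dui=None) if i in chosen else dict(r)
            for i, r in enumerate(records)]

def make_mar(records, min_age=46):
    """MAR: missingness depends only on an observed variable (age)."""
    return [dict(r, dui=None) if r["age"] >= min_age else dict(r)
            for r in records]

def make_nmar(records):
    """NMAR: missingness depends on the DUI value itself."""
    return [dict(r, dui=None) if r["dui"] == "NO" and not r["injured"] else dict(r)
            for r in records]

# Hypothetical toy data; the real study data set is not reproduced here.
rng = random.Random(42)
riders = [{"age": rng.randint(16, 80),
           "dui": rng.choice(["YES", "NO", "NO", "NO"]),
           "injured": rng.random() < 0.3}
          for _ in range(200)]

mcar = make_mcar(riders, rate=0.25)
mar = make_mar(riders)
nmar = make_nmar(riders)
print(sum(r["dui"] is None for r in mcar))  # 50 of the 200 toy records
```

Each function returns copies, so the same complete data set can be reused to build all three test data sets.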

Page 12: Created 3 data sets from the study data set with different missing data mechanisms, but with the same percentage of missing values for DUI (25%).

           MCAR, 25% missing on DUI   MAR, 25% missing on DUI    NMAR, 25% missing on DUI
Parameter  E        SE      P         E        SE      P         E        SE      P
Intercept  -1.7336  0.1096  0.0001    -1.7259  0.1092  0.0001    -1.7204  0.1092  0.0001
DUI         0.8544  0.2664  0.0016     0.8286  0.2623  0.0018     0.5791  0.2223  0.0092
Speed       0.5018  0.1449  0.0005     0.4843  0.1448  0.0008     0.4812  0.1443  0.0009
Fixed       0.4927  0.1610  0.0022     0.5079  0.1597  0.0015     0.5400  0.1578  0.0006
Head-on     0.5133  0.2485  0.0388     0.5133  0.2486  0.0389     0.5103  0.2475  0.0393

(E = parameter estimate, SE = standard error, P = p-value)

Page 13: Sensitivity analysis on missing data mechanism

Missing Data Mechanism (1): Different

Missing Data Rate (2), 25%: Same

Imputation Model (3): Same

Proc MI Options (4): Same

What is the result?

[Chart: Estimates for Parameters with Different Missing Data Mechanisms. Y axis: Estimate. Series: GoldStd, MCAR, MAR, NMAR]

Page 14: Conclusions of SA on Missing Data Mechanism

• Even with the simplest imputation model, MI produced results consistent with the Gold Standard when the missing data mechanism was MCAR or MAR, but not when it was NMAR.

• Under NMAR, we would estimate the odds ratio of death or hospitalization for riders suspected of DUI to be 1.78 (1.15, 2.76), while the Gold Standard estimates it to be 2.51 (1.58, 3.98).

[Chart: Point Estimate and 95% CI for DUI with Different Missing Data Mechanisms. Y axis: Odds Ratio. Categories: GoldStd, MCAR, MAR, NMAR. Series: 95%CI_upper, Point Estimate, 95%CI_lower]

Page 15: SA: Missing Data Rate (2)

[Flow chart: Study Data Set (1: Missing Data Mechanism; 2: Missing Data Rate, varied over 6%, 25%, 50%) -> Proc MI (3: Imputation Model; 4: Proc MI options) -> Data Analysis (Analysis Model) -> Proc MIANALYZE -> Results]

Page 16: SA: Missing Data Rate (2)

• Data sets with MCAR (percentage of values missing for DUI set to 6%, 25%, and 50% respectively)

• Data sets with MAR (percentage of values missing for DUI set to 6%, 25%, and 50% respectively)

Page 17: Created 3 data sets with MCAR from the study data set, with 6%, 25%, and 50% of DUI values missing respectively.

           MCAR, 6% missing on DUI    MCAR, 25% missing on DUI   MCAR, 50% missing on DUI
Parameter  E        SE      P         E        SE      P         E        SE      P
Intercept  -1.7361  0.1094  0.0001    -1.7336  0.1096  0.0001    -1.7377  0.1119  0.0001
DUI         0.9447  0.2429  0.0001     0.8544  0.2664  0.0016     0.8457  0.2973  0.0065
Speed       0.4812  0.1446  0.0009     0.5018  0.1449  0.0005     0.4831  0.1460  0.0009
Fixed       0.5213  0.1584  0.0010     0.4927  0.1610  0.0022     0.5200  0.1617  0.0013
Head-on     0.5245  0.2489  0.0351     0.5133  0.2485  0.0388     0.4936  0.2508  0.0490

(E = parameter estimate, SE = standard error, P = p-value)

Page 18: Created 3 data sets with MAR from the study data set, with 6%, 25%, and 50% of DUI values missing respectively.

           MAR, 6% missing on DUI     MAR, 25% missing on DUI    MAR, 50% missing on DUI
Parameter  E        SE      P         E        SE      P         E        SE      P
Intercept  -1.7382  0.1095  0.0001    -1.7259  0.1092  0.0001    -1.7502  0.1109  0.0001
DUI         0.9191  0.2334  0.0001     0.8286  0.2623  0.0018     1.2722  0.3298  0.0002
Speed       0.4836  0.1449  0.0008     0.4843  0.1448  0.0008     0.5063  0.1473  0.0006
Fixed       0.5076  0.1590  0.0014     0.5079  0.1597  0.0015     0.5234  0.1597  0.0010
Head-on     0.5174  0.2486  0.0374     0.5133  0.2486  0.0389     0.5371  0.2487  0.0308

(E = parameter estimate, SE = standard error, P = p-value)

Page 19: Sensitivity analysis on Missing Data Rate

Missing Data Mechanism (1), MCAR or MAR: Same

Missing Data Rate (2): Different

Imputation Model (3): Same

Proc MI Options (4): Same

What is the result?

[Chart: Estimates for Parameters with Different Missing Rates. Y axis: Estimate. Categories: DUI, Speed, Fixed, Head-on. Series: GoldStd, MAR6%, MAR25%, MAR50%, MCAR6%, MCAR25%, MCAR50%]

Page 20: Conclusions of SA on Missing Data Rate

• For both missing data mechanisms, the 50% missing case produced the DUI odds ratio estimate farthest from the Gold Standard estimate, as well as the widest 95% CI. However, for MCAR the difference from the Gold Standard odds ratio was -7%, whereas for MAR it was 42%. In addition, the 95% CI for 50% MCAR was 19% wider than the Gold Standard 95% CI, whereas for 50% MAR it was 106% wider.

• This shows that the simplest imputation model is not sufficient to handle very high missing data rates.

[Chart: Point Estimate and 95% CI for DUI with Different Missing Data Rates. Y axis: Odds Ratio. Series: 95%CI_upper, Point Estimate, 95%CI_lower]
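The percentages quoted in these conclusions can be checked against the DUI rows of the earlier tables. The sketch below assumes the comparison is made on the odds ratio scale with normal-approximation 95% intervals (z = 1.96); the function names are illustrative. Under that assumption it gives roughly -7% and +42% for the point estimates and roughly 19% and 105% for the interval widths. The small gap from the quoted 106% is plausibly due to the t-based intervals that MI pooling produces, which is a guess rather than something the slides state.

```python
import math

def or_with_ci(estimate, se, z=1.96):
    """Odds ratio and normal-approximation 95% CI from a log-odds estimate."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

def compare_to_gold(gold, other):
    """Percent difference in OR and in 95% CI width, relative to the gold standard."""
    g_or, g_lo, g_hi = or_with_ci(*gold)
    o_or, o_lo, o_hi = or_with_ci(*other)
    or_diff = 100 * (o_or - g_or) / g_or
    width_diff = 100 * ((o_hi - o_lo) - (g_hi - g_lo)) / (g_hi - g_lo)
    return or_diff, width_diff

# (estimate, SE) for DUI, copied from the tables on the earlier slides.
gold = (0.9189, 0.2364)
mcar50 = (0.8457, 0.2973)
mar50 = (1.2722, 0.3298)

for label, row in [("50% MCAR", mcar50), ("50% MAR", mar50)]:
    or_diff, width_diff = compare_to_gold(gold, row)
    print(f"{label}: OR {or_diff:+.0f}%, CI width {width_diff:+.0f}%")
```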

Page 21: SA: Imputation Model (3)

[Flow chart: Study Data Set (1: Missing Data Mechanism; 2: Missing Data Rate) -> Proc MI (3: Imputation Model, varied over Model1, Model2, Model3, Model4; 4: Proc MI options) -> Data Analysis (Analysis Model) -> Proc MIANALYZE -> Results]

Page 22: SA: Imputation Model (3)

• Data set with MAR and 50% of DUI values missing

• Tests on the following 4 imputation models

– Model1: K/H DUI Speed Fixed Head-on (Model1 = analysis model; it is the simplest imputation model)

– Model2: Model1 + age_group + colltime (categorical)

– Model3: Model1 + age_group + hour (continuous)

– Model4: Model1 + age_group + hour_normal (continuous)

We add age and collision time to help predict DUI in Model2, Model3, and Model4.

Page 23: Used 4 different imputation models to do MI on the same data set with MAR, 50% missing on DUI.

           Model 2, 50% missing       Model 3, 50% missing       Model 4, 50% missing
Parameter  E        SE      P         E        SE      P         E        SE      P
Intercept  -1.8110  0.1222  0.0001    -1.8081  0.1235  0.0001    -1.8034  0.1238  0.0001
DUI         1.0127  0.2948  0.0016     0.9814  0.2966  0.0024     0.9563  0.2813  0.0015
Speed       0.5079  0.1466  0.0005     0.5021  0.1463  0.0006     0.5081  0.1469  0.0005
Fixed       0.5370  0.1604  0.0008     0.5404  0.1601  0.0007     0.5371  0.1598  0.0008
Head-on     0.5554  0.2537  0.0286     0.5477  0.2552  0.0320     0.5561  0.2521  0.0274

(E = parameter estimate, SE = standard error, P = p-value)

Page 24: Sensitivity analysis on Imputation Model

Missing Data Mechanism (1), MAR: Same

Missing Data Rate (2), 50%: Same

Imputation Models (3): Different

Proc MI Options (4): Same

What is the result?

[Chart: Estimates for Parameters with Different Imputation Models. Y axis: Estimates. Series: GoldStd, NoMI, Model1, Model2, Model3, Model4]

Page 25: Conclusions of SA on Imputation Models

• Models 2, 3, and 4 are all improvements over Model 1, and produced DUI parameter estimates and 95% CI widths close to those of the Gold Standard.

• So even with 50% missing values (MAR), we are able to get a good result by using a richer imputation model.

• The higher the percentage of missing values (MAR) in your data set, the more important it is to include additional predictors in the imputation model.

[Chart: Point Estimate and 95% CI for DUI with Different Imputation Models. Y axis: Odds Ratio. Categories: NoMI, Model1, Model2, Model3, Model4, GoldStd. Series: 95%CI_upper, Point Estimate, 95%CI_lower]

Page 26: Comparison of No MI and Model 4 to the Gold Standard

[Chart: Estimates for Parameters (Data set with 50% MAR on DUI). Y axis: Estimates. Categories: DUI, Speed, Fixed, Head-on. Series: GoldStd, NoMI, Model4]

Page 27: Comparison of No MI and Model 4 to the Gold Standard

[Charts, four panels: Point Estimate and 95% CI for DUI; for Speed; for Fixed; for Head-on. Y axis: Odds Ratio. Each panel compares No MI, G.S. (Gold Standard), and MI]

Page 28: SA: Proc MI: Number of Imputations (4)

[Flow chart: Study Data Set (1: Missing Data Mechanism; 2: Missing Data Rate) -> Proc MI (3: Imputation Model; 4: number of imputations, varied over N=0, N=2, N=5, N=10, N=20) -> Data Analysis (Analysis Model) -> Proc MIANALYZE -> Results]

Page 29: SA: Proc MI: Number of Imputations (4)

• Data set with MAR and 50% of DUI values missing; Model4 used to do MI

• Tests on different numbers of imputations

– N=0

– N=2

– N=5

– N=10

– N=20

Page 30: Used the same imputation model (Model4), but different numbers of imputations, to do MI on the same data set with MAR, 50% missing on DUI.

           N=5, 50% missing on DUI    N=10, 50% missing on DUI   N=20, 50% missing on DUI
Parameter  E        SE      P         E        SE      P         E        SE      P
Intercept  -1.7975  0.1177  0.0001    -1.8034  0.1238  0.0001    -1.7898  0.1204  0.0001
DUI         0.8658  0.2537  0.0023     0.9563  0.2813  0.0015     0.9942  0.3176  0.0026
Speed       0.4971  0.1457  0.0006     0.5081  0.1469  0.0005     0.5016  0.1465  0.0006
Fixed       0.5448  0.1610  0.0007     0.5371  0.1598  0.0008     0.5286  0.1599  0.0010
Head-on     0.5652  0.2522  0.0251     0.5561  0.2521  0.0274     0.5506  0.2509  0.0282

(E = parameter estimate, SE = standard error, P = p-value)

Page 31: Sensitivity analysis on Number of Imputations

Missing Data Mechanism (1), MAR: Same

Missing Data Rate (2), 50%: Same

Imputation Model (3): Same

Number of Imputations (4): Different

What is the result?

[Chart: Estimates for Parameters with Different Number of Imputations. Y axis: Estimates. Series: GoldStd, NoMI, MI N2, MI N5, MI N10, MI N20]

Page 32: Conclusions of SA on Number of Imputations

• In our example, n = 5 to 10 imputations is enough to get good results for a data set with 50% MAR on DUI.

• With No MI (complete cases only), we would conclude that the odds of being killed or hospitalized for motorcyclists with DUI were 4.2 (2.1, 8.4) times the odds for motorcyclists without DUI. But from the Gold Standard, the OR is 2.5 (1.6, 4.0).

[Chart: Point Estimate and 95% CI for DUI with Different Imputation Numbers. Y axis: Odds Ratio. Categories: n=0, n=2, n=5, GoldStd, n=10, n=20. Series: 95%CI_upper, Point Estimate, 95%CI_lower]
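One common way to reason about how many imputations are enough is Rubin's relative efficiency, 1 / (1 + γ/m), where m is the number of imputations and γ is the fraction of missing information. The sketch below is not from the slides: the value γ = 0.5 is an assumption chosen for illustration, and the fraction of missing information is generally not equal to the raw missing data rate.

```python
def relative_efficiency(m, fraction_missing_info):
    """Efficiency of m imputations relative to infinitely many (Rubin's formula)."""
    return 1.0 / (1.0 + fraction_missing_info / m)

# gamma = 0.5 is an assumed fraction of missing information, not a study result.
gamma = 0.5
for m in (2, 5, 10, 20):
    print(f"m = {m:2d}: relative efficiency = {relative_efficiency(m, gamma):.3f}")
```

Under this assumption the gain from going beyond 5 to 10 imputations is small, which is in line with the conclusion above.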

Page 33: Summary: Answers?

• May I use MI to deal with missing data problems for my data sets?

It seems a good idea to try MI. The outcome depends on the missing data mechanisms of the variables with missing values in your data sets (however, even our MI results for NMAR were better than No MI).

• How can I believe that MI will give me better analysis results?

We found that using MI on our example gave much better analysis results than No MI (complete cases only).

• How can I get better analysis results by using MI?

– Understand the relationships between the variables in your data sets

– Know the missing data mechanisms of the variables

– Determine the percentage of missing information

– Build a reasonable imputation model

– Use Proc MI options wisely

Page 34: Missing Data Problems Everywhere

Q1. I like Denver.

Q2. I like TRF.

Q3. I liked the talk.

Q4. I will use the MI.

Poll Results

Like Denver                    Like TRF               Liked the Talk                   Use MI
Y                              Y                      Y                                Y
Missing (left session early)   Y                      Missing (too nice to say “NO”)   N
Y                              N                      Y                                Y
Y                              N                      N                                Missing (not sure yet)
N                              Missing (daydreaming)  Y                                Y
Missing (fell asleep)          Y                      Missing                          N
N                              N                      N                                Missing
N                              Missing                Y                                Y

Page 35: Acknowledgment

Special thanks to Dr. Mike McGlincy, who gave us helpful suggestions during our study of sensitivity analyses on imputed values and insightful comments on the analysis results.

Page 36: Thank You

Page 37: Questions?

Page 38: Can We Improve Analysis Results for NMAR by Using a More Complex Imputation Model?

[Chart: Estimates for Parameters on 25% NMAR with Different Models. Y axis: Estimates. Categories: DUI, Speed, Fixed, Head-on. Series: GoldStd, NoMI, Model1, Model4, Model5]

No MI = Complete cases only

Model1 = K/H + DUI + Speed + Fixed + Head-on

Model4 = Model1 + age + hour

Model5 = Model1 + age + hour + gender + safety

Page 39: Multiple Imputation inference involves three distinct phases:

1. The missing data are filled in m times to generate m complete data sets (using imputation model)

2. The m complete data sets are analyzed by using standard procedures (using analysis model)

3. The results from the m complete data sets are combined for the inference
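Phase 3, the combining step, is usually done with Rubin's rules, which is the kind of pooling Proc MIANALYZE reports. The following is a minimal sketch of those rules for a single parameter; the function name `pool` and the five per-imputation values are hypothetical and are not results from this study.

```python
import math
from statistics import mean, variance

def pool(estimates, std_errors):
    """Combine per-imputation estimates and SEs for one parameter (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between         # total variance
    return q_bar, math.sqrt(total)

# Hypothetical DUI estimates and SEs from m = 5 imputed data sets.
estimates = [0.91, 0.98, 0.87, 1.02, 0.95]
std_errors = [0.25, 0.26, 0.25, 0.27, 0.26]
pooled_estimate, pooled_se = pool(estimates, std_errors)
print(f"pooled estimate = {pooled_estimate:.4f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error is larger than the average per-imputation standard error because it also carries the between-imputation variance, which reflects the uncertainty due to the missing values.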

Page 40: Statistical Assumptions for Multiple Imputation

1. The MI procedure assumes that the data are from a continuous multivariate distribution. It also assumes that the data are from a multivariate normal distribution when the MCMC method is used.

According to Schafer’s MI FAQ page, MI tends to be quite forgiving of departures from the normality assumption. For example, when working with binary or ordered categorical variables, it is often acceptable to impute under a normality assumption and then round off the continuous imputed values to the nearest category. Variables whose distributions are heavily skewed may be transformed to approximate normality and then transformed back to their original scale after imputation.

2. Proc MI and Proc MIANALYZE assume that the missing data are Missing At Random (MAR).

MCAR is unlikely for real-world crash data sets. NMAR may be shifted toward MAR by using a richer imputation model to help predict missing values, because crash data sets include many related variables that can help predict each other.
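The rounding step mentioned under assumption 1 can be sketched as follows. This is an illustration only: the function name `round_to_category`, the 0/1 coding of DUI, and the example imputed values are assumptions, not output from Proc MI.

```python
def round_to_category(value, categories):
    """Map a continuous imputed value to the nearest allowed category code."""
    return min(categories, key=lambda c: abs(c - value))

# Hypothetical continuous imputations for a binary variable coded 0/1.
imputed = [-0.2, 0.1, 0.49, 0.51, 0.8, 1.3]
rounded = [round_to_category(v, [0, 1]) for v in imputed]
print(rounded)  # [0, 0, 0, 1, 1, 1]
```

Values outside the valid range, such as -0.2 or 1.3, are pulled back to the nearest valid code, so the completed variable contains only legitimate categories.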
