2013 Logistics Regression - Sas Institute...Case Study: Utilizing logistic regression to deal with...
-
Logistic Regression: A Solution for Imperfect Binary Data
Xue Yao, Lisa Lix
Department of Community Health Sciences
Winnipeg SAS Users Group
May 17, 2013
-
Outline
• Brief Review of Logistic Regression
• Logistic Regression in SAS
• Case Study: Utilizing logistic regression to deal with
imperfect binary data (i.e., missing and misclassified values)
– SAS Global Forum Paper 283-2013: A Flexible Method to Apply
Multiple Imputation Using SAS/IML® Studio
-
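Logistic Regression
• To model the PROBABILITY of the event of interest based on
the values of independent variables
• $\operatorname{logit}(Y = 1 \mid \boldsymbol{X}) = \log\left(\frac{P}{1-P}\right) = \boldsymbol{\beta}\boldsymbol{X}$
where $P = \operatorname{Prob}(Y = 1 \mid \boldsymbol{X}) = E(Y \mid \boldsymbol{X})$
• $P = \frac{\exp(\boldsymbol{\beta}\boldsymbol{X})}{1+\exp(\boldsymbol{\beta}\boldsymbol{X})}$
• $\text{odds} = \frac{P}{1-P} = \exp(\boldsymbol{\beta}\boldsymbol{X})$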
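The formula for P follows from the logit definition by solving for P. The derivation below fills in that step with the same symbols used on this slide.

```latex
\begin{align*}
\log\frac{P}{1-P} &= \boldsymbol{\beta}\boldsymbol{X} \\
\frac{P}{1-P} &= \exp(\boldsymbol{\beta}\boldsymbol{X}) \\
P &= (1-P)\,\exp(\boldsymbol{\beta}\boldsymbol{X}) \\
P\,\bigl(1+\exp(\boldsymbol{\beta}\boldsymbol{X})\bigr) &= \exp(\boldsymbol{\beta}\boldsymbol{X}) \\
P &= \frac{\exp(\boldsymbol{\beta}\boldsymbol{X})}{1+\exp(\boldsymbol{\beta}\boldsymbol{X})}
\end{align*}
```

For example, when the linear predictor equals 0, the odds are exp(0) = 1 and P = 0.5.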
-
Logistic Regression
• M1 is a continuous independent variable
• M2 is a binary independent variable
-
Logistic Regression in SAS
• The CATMOD, GENMOD, PROBIT and LOGISTIC procedures can all perform
logistic regression in SAS
• LOGISTIC Procedure Syntax:
PROC LOGISTIC <options> ;
CLASS discrete-variables </ options> ;
MODEL response = <effects> </ options> ;
OUTPUT <OUT=SAS-data-set> <options> ;
RUN ;
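A concrete call may make the syntax easier to read. The sketch below is not from the slides: it assumes the SASHELP.HEART sample dataset that ships with SAS, and the output dataset name work.heart_pred and the variable p_dead are made up for illustration.

```sas
/* Illustrative only: model the probability of Status = "Dead"      */
/* from age at study entry (continuous) and sex (categorical).      */
proc logistic data=sashelp.heart;
   class Sex (ref="Female") / param=ref;        /* reference-cell coding */
   model Status(event="Dead") = AgeAtStart Sex; /* event level to model  */
   output out=work.heart_pred predicted=p_dead; /* save predicted P      */
run;
```

The EVENT= option states explicitly which response level is modeled, so the result does not depend on the default ordering of the response values.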
-
Case Study: Utilizing logistic regression to
deal with imperfect binary data
• Problem: Misclassification of disease status (0 or 1) in
administrative health data biases descriptive and inferential
analyses that are based on disease status
• Solution: Use logistic regression as the predictive model for
multiple imputation
• Data Sources: A validation dataset (e.g., medical charts) contains
accurate measures and can be linked to the administrative
health data
-
Illustration of Data
-
Using Logistic Predictive Model For Multiple
Imputation
Figure 1. A Schematic Diagram of the Multiple Imputation Method
(diagram not reproduced; labeled element: logistic predictive model)
-
Using Logistic Predictive Model For Multiple
Imputation
Step 1: Prepare the Data
Step 2: Build the Logistic Predictive Model
Step 3: Generate Multiple Parameters
Step 4: Create Multiple Datasets
Step 5: Analyze the Multiple Complete Datasets
-
Building Logistic Predictive Model
$\operatorname{logit}(Y = 1 \mid \boldsymbol{M}) = \beta_0 + \beta_1 M_1 + \beta_2 M_2$
• To estimate the parameters of the model using the LOGISTIC
procedure in SAS/IML Studio
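A minimal sketch of this step, under stated assumptions: the slides do not show the actual code or data, so the dataset work.validation is simulated here with made-up coefficients, and M2 is kept as a numeric 0/1 variable (no CLASS statement) so that the saved estimates have the plain column names Intercept, M1 and M2. The OUTEST= and COVOUT options save the estimates and their covariance matrix for later use.

```sas
/* Simulated stand-in for the validation data (hypothetical values) */
data work.validation;
   call streaminit(2013);
   do id = 1 to 300;
      M1 = rand("Normal");                      /* continuous predictor */
      M2 = rand("Bernoulli", 0.4);              /* binary predictor     */
      Y  = rand("Bernoulli", 1 / (1 + exp(-(-1 + 0.8*M1 + 1.2*M2))));
      output;
   end;
run;

/* Fit the predictive model; keep estimates and covariance matrix */
proc logistic data=work.validation outest=work.est covout;
   model Y(event="1") = M1 M2;
run;

/* Rows with _TYPE_="PARMS" hold the estimates, _TYPE_="COV" the covariance */
proc print data=work.est;
run;
```

In SAS/IML Studio the same PROC step would typically be wrapped in a SUBMIT / ENDSUBMIT block so that it can be called from within the IML program.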
-
Using Logistic Predictive Model For Multiple
Imputation
• To generate multiple sets of coefficients for the logistic predictive
model, one set per imputation, based on the estimated
coefficients and covariance matrix from the LOGISTIC procedure
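One common way to do this is to draw each coefficient vector from a multivariate normal distribution centered at the estimates. The slides do not show the sampling scheme, so the sketch below is an assumption; the numeric values of beta and cov are hypothetical placeholders for the output of the LOGISTIC procedure.

```sas
proc iml;
   /* Hypothetical estimates (Intercept, M1, M2) and covariance matrix; */
   /* in practice these would be read from the PROC LOGISTIC output.    */
   beta = {-1.0 0.8 1.2};                       /* 1 x 3 row vector */
   cov  = { 0.020 0.001 -0.015,
            0.001 0.010  0.000,
           -0.015 0.000  0.040};                /* symmetric, positive definite */

   call randseed(2013);                         /* reproducible draws */
   nimpute = 5;                                 /* number of imputations */

   /* Each row of betas is one draw from N(beta, cov) */
   betas = RandNormal(nimpute, beta, cov);
   print betas[colname={"Intercept" "M1" "M2"}];
quit;
```

Drawing new coefficients for every imputation carries the uncertainty of the fitted model into the imputed values, rather than treating the estimates as exact.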
-
Using Logistic Predictive Model For Multiple
Imputation
• To predict/impute the disease status (1 or 0) multiple times
using the generated coefficients and the logistic predictive model
• To save a dataset that contains the imputed disease status
and the imputation number for further
analysis
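The imputation step can be sketched as follows. This is an illustration, not the authors' code: the predictor values and coefficient draws are hypothetical, and the output names work.imputed and Y_imp are made up. For each set of coefficients, the predicted probability is computed and the disease status is drawn as a Bernoulli outcome with that probability.

```sas
proc iml;
   call randseed(2013);

   /* Hypothetical predictor values for six people: columns are M1, M2 */
   M = {0.5 1, -1.2 0, 0.3 0, 2.1 1, -0.4 1, 0.0 0};
   n = nrow(M);
   X = j(n, 1, 1) || M;                         /* prepend intercept column */

   /* Hypothetical coefficient draws: one row (b0, b1, b2) per imputation */
   betas = {-1.05 0.78 1.25,
            -0.92 0.85 1.10,
            -1.10 0.81 1.31};
   m_imp = nrow(betas);

   result = j(m_imp * n, 5, .);                 /* stacked output, filled below */
   do k = 1 to m_imp;
      eta = X * T(betas[k, ]);                  /* linear predictor, n x 1 */
      p = 1 / (1 + exp(-eta));                  /* predicted P(Y = 1)      */
      u = j(n, 1, .);
      call randgen(u, "Uniform");
      y_imp = (u < p);                          /* 1 with probability p    */
      rows = ((k - 1) * n + 1):(k * n);
      result[rows, ] = j(n, 1, k) || T(1:n) || M || y_imp;
   end;

   /* One stacked dataset, indexed by _Imputation_ for BY-group analysis */
   create work.imputed from result[colname={"_Imputation_" "ID" "M1" "M2" "Y_imp"}];
   append from result;
   close work.imputed;
quit;
```

Naming the index variable _Imputation_ matches the convention that PROC MI uses, so later BY-group processing and PROC MIANALYZE can consume the dataset directly.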
-
Using Logistic Predictive Model For Multiple
Imputation
• To estimate the disease prevalence
– PROC UNIVARIATE to estimate the prevalence in each imputed dataset
– PROC MIANALYZE to combine the estimates from all datasets
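A minimal sketch of these two calls, assuming a stacked dataset with one block of rows per imputation. The dataset work.imputed below is a simulated stand-in with made-up values, and the names Y_imp, prev and prev_se are hypothetical.

```sas
/* Simulated stand-in for the stacked imputed datasets (hypothetical) */
data work.imputed;
   call streaminit(2013);
   do _Imputation_ = 1 to 5;
      do id = 1 to 200;
         Y_imp = rand("Bernoulli", 0.25);       /* imputed disease status */
         output;
      end;
   end;
run;

/* Prevalence = mean of the 0/1 status, with its standard error, per dataset */
proc univariate data=work.imputed noprint;
   by _Imputation_;
   var Y_imp;
   output out=work.prev mean=prev stdmean=prev_se;
run;

/* Combine the per-dataset estimates into one estimate and standard error */
proc mianalyze data=work.prev;
   modeleffects prev;
   stderr prev_se;
run;
```

Because the status is coded 0/1, its mean in each imputed dataset is the estimated prevalence for that dataset.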
-
Using Logistic Predictive Model For Multiple
Imputation
• Outputs of PROC UNIVARIATE and MIANALYZE
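For reading the combined output, it may help to recall how multiple imputation estimates are pooled (Rubin's rules, which PROC MIANALYZE applies). The symbols are not from the slides: Q̂_i denotes the prevalence estimate from the i-th imputed dataset, U_i its squared standard error, and m the number of imputations.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, &
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m}U_i, \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2, &
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
\end{align*}
```

The pooled estimate is the average of the per-dataset estimates, and its total variance T adds the between-imputation variance B to the average within-imputation variance.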
-
Results
• Multiple imputation based on the logistic model improves the
accuracy of the disease prevalence estimate
[Bar chart not reproduced: prevalence on a 0 to 0.45 axis for True Prev, Obs Prev and Imp Prev]
-
Conclusions
• Imputation of missing data is better than discarding incomplete
observations. – Frank E. Harrell
• Misclassification of binary data can be treated as a missing data
problem
• For a monotone missing data pattern, the binary data can be
imputed using PROC MI with a logistic model
• For an arbitrary missing data pattern, the proposed approach can
be used to impute more than one binary variable
simultaneously
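For the monotone case, a PROC MI call could look like the sketch below. It is an illustration under assumptions: the dataset work.linked is simulated with made-up values, with the accurate disease status Y available only for a validation subset and missing elsewhere, while M1 and M2 are fully observed.

```sas
/* Simulated linked data (hypothetical): Y is known for 150 of 500 people */
data work.linked;
   call streaminit(2013);
   do id = 1 to 500;
      M1 = rand("Normal");
      M2 = rand("Bernoulli", 0.4);
      Y  = rand("Bernoulli", 1 / (1 + exp(-(-1 + 0.8*M1 + 1.2*M2))));
      if id > 150 then Y = .;                   /* status unknown outside validation subset */
      output;
   end;
run;

/* Impute the binary Y from M1 and M2 with a logistic model, 5 times */
proc mi data=work.linked nimpute=5 seed=2013 out=work.mi_out;
   class Y;
   monotone logistic(Y = M1 M2);
   var M1 M2 Y;
run;
```

The pattern is monotone here because only the last variable in the VAR list has missing values. The output dataset work.mi_out stacks the completed datasets and indexes them by _Imputation_.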
-
Thank you!
Your comments and questions are valued and encouraged. Please
contact
[email protected]