2013 Logistics Regression - Sas Institute...Case Study: Utilizing logistic regression to deal with...

Logistic Regression A solution for imperfect binary data Xue Yao, Lisa Lix Department of Community Health Sciences Winnipeg SAS Users Group May 17, 2013

  • Logistic Regression A solution for imperfect binary data

    Xue Yao, Lisa Lix

    Department of Community Health Sciences

    Winnipeg SAS Users Group

    May 17, 2013

  • Outline

    • Brief Review of Logistic Regression

    • Logistic Regression in SAS

    • Case Study: Utilizing logistic regression to deal with imperfect binary data (i.e., missing data and misclassification)

      – SAS Global Forum Paper 283-2013: A Flexible Method to Apply Multiple Imputation Using SAS/IML® Studio

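  • Logistic Regression

    • To model the PROBABILITY of the event of interest based on the values of independent variables

    • $\operatorname{logit}(Y = 1 \mid \boldsymbol{X}) = \log\dfrac{P}{1 - P} = \boldsymbol{\beta X}$, where $P = \Pr(Y = 1 \mid \boldsymbol{X}) = E(Y \mid \boldsymbol{X})$

    • $P = \dfrac{\exp(\boldsymbol{\beta X})}{1 + \exp(\boldsymbol{\beta X})}$

    • $\text{odds} = \dfrac{P}{1 - P} = \exp(\boldsymbol{\beta X})$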
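The three relationships above (logit, probability, odds) can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the coefficient vector `beta` and covariate vector `x` are hypothetical values chosen only for illustration.

```python
import math

def inv_logit(eta):
    """P = exp(eta) / (1 + exp(eta)), arranged to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def logit(p):
    """log(P / (1 - P)), the log-odds of probability p."""
    return math.log(p / (1.0 - p))

def odds(p):
    """P / (1 - P)."""
    return p / (1.0 - p)

def linear_predictor(beta, x):
    """The product beta X for one observation (x includes a leading 1 for the intercept)."""
    return sum(b * xi for b, xi in zip(beta, x))

# Hypothetical coefficients and covariates (intercept, M1, M2); not from the slides.
beta = [-1.0, 0.5, 2.0]
x = [1.0, 2.0, 1.0]

eta = linear_predictor(beta, x)
p = inv_logit(eta)
print("linear predictor:", eta)
print("probability:", p)
print("odds:", odds(p), "exp(beta X):", math.exp(eta))
```

Applying `logit` to `p` recovers the linear predictor, and `odds(p)` matches `exp(beta X)`, which is the identity stated on the slide.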

  • Logistic Regression

    M1 is a continuous independent variable

    M2 is a binary independent variable

  • Logistic Regression in SAS

    • The CATMOD, GENMOD, PROBIT, and LOGISTIC procedures perform logistic regression in SAS

    • LOGISTIC Procedure Syntax:

    PROC LOGISTIC ;

    CLASS discrete variable ;

    MODEL variable= ;

    OUTPUT ;

  • Case Study: Utilizing logistic regression to deal with imperfect binary data

    • Problem: Misclassification of disease status (0 or 1) in administrative health data biases both descriptive and inferential analyses based on disease status

    • Solution: Use logistic regression as the predictive model in a multiple imputation method

    • Data Sources: A validation dataset (e.g., medical charts) includes accurate measures and can be linked to the administrative health data

  • Illustration of Data

  • Using Logistic Predictive Model For Multiple Imputation

    Figure 1. A Schematic Diagram of the Multiple Imputation Method

    Logistic predictive model

  • Using Logistic Predictive Model For Multiple Imputation

    Step 1: Prepare the Data

    Step 2: Build the Logistic Predictive Model

    Step 3: Generate Multiple Parameters

    Step 4: Create Multiple Datasets

    Step 5: Analyze the Multiple Complete Datasets

  • Building Logistic Predictive Model

    $\operatorname{logit}(Y = 1 \mid \boldsymbol{M}) = \beta_0 + \beta_1 M_1 + \beta_2 M_2$

    • To estimate the parameters of the model using the LOGISTIC procedure in SAS/IML Studio
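To make the estimation step concrete, here is a minimal Python sketch of maximum likelihood fitting for the two-covariate model above. It is not the authors' SAS code: it uses Newton-Raphson (which coincides with Fisher scoring for the logit link), the data are simulated, and `true_beta`, the sample size, and all function names are assumptions made for illustration.

```python
import math
import random

def inv_logit(eta):
    """P = exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def score_and_information(beta, X, y):
    """Gradient of the log-likelihood and the information matrix at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = inv_logit(sum(b * v for b, v in zip(beta, xi)))
        for r in range(k):
            score[r] += (yi - p) * xi[r]
            for c in range(k):
                info[r][c] += p * (1.0 - p) * xi[r] * xi[c]
    return score, info

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Return (coefficient estimates, estimated covariance matrix)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score, info = score_and_information(beta, X, y)
        cov = invert(info)
        step = [sum(cov[r][c] * score[c] for c in range(k)) for r in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(beta, X, y)
    return beta, invert(info)

# Simulated "validation" data; true_beta is a hypothetical choice, not from the slides.
rng = random.Random(2013)
true_beta = [-1.0, 0.8, 1.5]
X, y = [], []
for _ in range(2000):
    m1 = rng.gauss(0.0, 1.0)                    # M1: continuous covariate
    m2 = 1.0 if rng.random() < 0.4 else 0.0     # M2: binary covariate
    p = inv_logit(true_beta[0] + true_beta[1] * m1 + true_beta[2] * m2)
    X.append([1.0, m1, m2])
    y.append(1 if rng.random() < p else 0)

beta_hat, cov_hat = fit_logistic(X, y)
print("estimated coefficients:", beta_hat)
print("standard errors:", [math.sqrt(cov_hat[i][i]) for i in range(3)])
```

The function returns both the coefficient estimates and their covariance matrix, since the later imputation steps need both.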

  • Using Logistic Predictive Model For Multiple Imputation

    • To generate multiple sets of coefficients for the logistic predictive model, based on the estimated coefficients and covariance matrix from the LOGISTIC procedure
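The slide does not spell out how the coefficient sets are generated. One common reading, assumed here, is to draw each set from a multivariate normal distribution centred on the estimated coefficients with the estimated covariance matrix. The Python sketch below illustrates that assumption; `beta_hat` and `cov` are hypothetical numbers, not output from the authors' analysis.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_coefficients(beta_hat, cov, m, rng):
    """Draw m coefficient vectors from N(beta_hat, cov) using beta_hat + L z."""
    L = cholesky(cov)
    k = len(beta_hat)
    draws = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([beta_hat[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws

# Hypothetical estimates (intercept, M1, M2) and covariance matrix.
beta_hat = [-1.0, 0.8, 1.5]
cov = [[0.0049, 0.0005, -0.0040],
       [0.0005, 0.0036, 0.0003],
       [-0.0040, 0.0003, 0.0121]]

rng = random.Random(2013)
coef_draws = draw_coefficients(beta_hat, cov, m=5, rng=rng)
for i, draw in enumerate(coef_draws, start=1):
    print("imputation", i, [round(b, 4) for b in draw])
```

Each drawn vector is then used for one imputed dataset, so the spread of the draws carries the uncertainty of the fitted model into the imputations.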

  • Using Logistic Predictive Model For Multiple Imputation

    • To predict/impute the disease status (1 or 0) multiple times using the generated coefficients and the logistic predictive model

    • To save the dataset that contains the imputed disease status variable and the number of imputations for further analysis
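The imputation step can be sketched as follows in Python. This is an illustration under stated assumptions rather than the authors' SAS/IML code: the disease status is drawn as a Bernoulli variable from the predicted probability, the `records` and `coef_draws` values are hypothetical, and the `imputation` column is an assumed index that identifies which imputed dataset each row belongs to.

```python
import math
import random

def inv_logit(eta):
    """P = exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def impute_status(records, coef_draws, rng):
    """Impute a 0/1 status once per coefficient draw and stack the results.

    records    : list of (m1, m2) covariate pairs
    coef_draws : list of (b0, b1, b2) coefficient vectors, one per imputation
    """
    rows = []
    for imp, (b0, b1, b2) in enumerate(coef_draws, start=1):
        for m1, m2 in records:
            p = inv_logit(b0 + b1 * m1 + b2 * m2)
            status = 1 if rng.random() < p else 0   # Bernoulli draw with probability p
            rows.append({"imputation": imp, "m1": m1, "m2": m2, "status": status})
    return rows

# Hypothetical covariates and coefficient draws, for illustration only.
records = [(0.3, 1), (-1.2, 0), (0.9, 0), (2.1, 1), (-0.4, 0)]
coef_draws = [(-1.05, 0.78, 1.46), (-0.97, 0.83, 1.55), (-1.02, 0.81, 1.49)]

rng = random.Random(2013)
imputed = impute_status(records, coef_draws, rng)
for row in imputed[:5]:
    print(row)
```

Stacking all imputations in one table with an index column mirrors the layout that by-group analysis procedures expect.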

  • Using Logistic Predictive Model For Multiple Imputation

    • To estimate the disease prevalence

      – PROC UNIVARIATE to estimate the prevalence in each dataset

      – PROC MIANALYZE to combine the estimates from each dataset
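Combining per-dataset estimates is conventionally done with Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The Python sketch below illustrates the arithmetic; the function names and the input numbers are hypothetical and are not results from the presentation.

```python
import math

def prevalence_estimate(statuses):
    """Prevalence (mean of a 0/1 variable) and its squared standard error.

    Uses the sample variance with an n - 1 denominator, so the squared
    standard error of the mean is p(1 - p) / (n - 1).
    """
    n = len(statuses)
    p = sum(statuses) / n
    return p, p * (1.0 - p) / (n - 1)

def rubin_combine(estimates, variances):
    """Pool m estimates: returns (pooled estimate, within, between, total variance)."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, within, between, total

# Hypothetical per-imputation prevalence estimates and squared standard errors.
estimates = [0.20, 0.22, 0.24]
variances = [0.0004, 0.0004, 0.0004]

q_bar, within, between, total = rubin_combine(estimates, variances)
print("pooled prevalence:", q_bar)
print("pooled standard error:", math.sqrt(total))
```

The between-imputation term is what distinguishes this from simply averaging: it widens the standard error to reflect uncertainty about the imputed values.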

  • Using Logistic Predictive Model For Multiple Imputation

    • Outputs of PROC UNIVARIATE and MIANALYZE

  • Results

    • Multiple imputation based on the logistic model improves the accuracy of the disease prevalence estimate

    [Figure: prevalence (vertical axis, 0 to 0.45) for three series: True Prev, Obs Prev, Imp Prev]

  • Conclusions

    • Imputation of missing data is better than discarding incomplete observations. – Frank E. Harrell

    • Misclassification of binary data can be treated as a missing data problem

    • For a monotone missing data pattern, binary data can be imputed using PROC MI with a logistic model

    • For an arbitrary missing data pattern, the proposed approach can be used to impute more than one binary variable simultaneously

  • Thank you!

    Your comments and questions are valued and encouraged. Please contact [email protected]
