Predicting Risk of Re-hospitalization for Congestive Heart Failure Patients (in collaboration with )
Jayshree Agarwal, Senjuti Basu Roy,
Ankur Teredesai, Si-Chi Chin, David Hazel, Kiyana, Mehrdad, (UWT)
Paul Amoroso, Yoshi Williams, Dr. Lester Reed, Sheila, Eric Johnson (MHS)
Motivation
Congestive Heart Failure (CHF)
Many hospitalizations
readmissions
19.6% of patients readmitted within 30 days [Jencks et al. 2009]
31.1% of patients readmitted within 60 days [Jencks et al. 2009]
LOW readmission rate = HIGH quality of care by hospital
No reimbursement for readmission within 30 days
COST: 2004 unplanned readmissions = $17.4 billion [Jencks et al. 2009]
MHS - UWT Web and Data Science collaboration objectives
Predict the risk of readmission for CHF patients
Reduce the readmission rate and cost
Improve patient satisfaction and quality of care
Appropriate pre-discharge and post-discharge planning
Proper resource utilization
Problem
Develop models that can predict the risk of readmission for CHF patients within
– 30 days after discharge
– 60 days after discharge
The readmission may be for any cause, not only CHF
Overall Approach
How to solve the problem?
– Apply predictive data mining techniques such as classification
What do these predictive mining techniques require?
– Data in a homogeneous format
• Information extraction, integration, and data preparation
• Prepare a labeled dataset to train the model; also used later for testing
Our Challenges
Building domain knowledge
– Which variables to consider?
– How to merge and unify them in a homogeneous format (information extraction and integration)?
– How to understand the relative importance of the variables in the prediction task?
How to prepare data?
– Class label generation
– Noisy real-world data (missing values, inconsistencies, etc.)
– Serious skew in the dataset
Solution
Building Predictive Classification Models
Data Understanding
Data Preprocessing
Modeling
Evaluation
Data Understanding
Collect initial data
Acquire domain knowledge
Describe and explore dataset
Create data visualization
Data Preprocessing
Define class label
Attribute selection
Data Integration
Removal of incomplete data
Finding Eligible CHF admissions
Eligible CHF admissions and Generating Class Labels
All CHF admissions → eligible CHF admissions (in-hospital deaths removed)
Is there any readmission within x days of discharge? (x = 30, x = 60)
YES: the class label is assigned as 1
NO: the class label is assigned as 0
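The labeling flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`admission_id`, `patient_id`, `admit`, `discharge`, `died_in_hospital`) and the function name `label_admissions` are hypothetical, the toy records are made up, and a readmission on the discharge day itself is counted as falling inside the window.

```python
from datetime import date

def label_admissions(chf_admissions, all_admissions, window_days):
    """Return eligible CHF admissions, each with a 0/1 readmission label."""
    # In-hospital deaths cannot be readmitted, so they are not eligible.
    eligible = [a for a in chf_admissions if not a["died_in_hospital"]]
    labeled = []
    for a in eligible:
        # Label 1 if the same patient has any other admission (any cause)
        # starting within window_days after this discharge.
        readmitted = any(
            b["patient_id"] == a["patient_id"]
            and b["admission_id"] != a["admission_id"]
            and 0 <= (b["admit"] - a["discharge"]).days <= window_days
            for b in all_admissions
        )
        labeled.append({**a, "label": 1 if readmitted else 0})
    return labeled

# Made-up toy records, for illustration only.
toy = [
    {"admission_id": 1, "patient_id": "p1", "admit": date(2011, 1, 1),
     "discharge": date(2011, 1, 5), "died_in_hospital": False},
    {"admission_id": 2, "patient_id": "p1", "admit": date(2011, 1, 25),
     "discharge": date(2011, 1, 30), "died_in_hospital": False},
    {"admission_id": 3, "patient_id": "p2", "admit": date(2011, 2, 1),
     "discharge": date(2011, 2, 4), "died_in_hospital": False},
    {"admission_id": 4, "patient_id": "p3", "admit": date(2011, 3, 1),
     "discharge": date(2011, 3, 3), "died_in_hospital": True},
]
labels_30 = [r["label"] for r in label_admissions(toy, toy, 30)]
labels_10 = [r["label"] for r in label_admissions(toy, toy, 10)]
```

Running the same function with `window_days=60` would give the second label set used for the 60-day models.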
Attribute selection
“Baseline”: Yale Model [Krumholz et al.]
– Socio-demographic variables (2)
– Comorbidities (35)
“New”: additional predictor variables identified by us (14)
“All”, “Correlated”
Chi-square correlation test
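As a rough illustration of a chi-square test between one binary attribute and the class label, the sketch below computes the statistic for a 2x2 contingency table. It is not the procedure used in this work (the slides do not say how the test was applied or which threshold was used); the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and the p-value formula is valid only for one degree of freedom.

```python
import math

def chi_square_2x2(feature, label):
    """Chi-square statistic and p-value (1 degree of freedom) for two 0/1 lists."""
    # Observed counts, indexed as counts[feature_value][label_value].
    counts = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        counts[f][y] += 1
    n = len(feature)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty rows or columns
                chi2 += (counts[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy vectors, for illustration only.
chi2_assoc, p_assoc = chi_square_2x2([1, 1, 1, 1, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 0, 1])
chi2_indep, p_indep = chi_square_2x2([1, 0, 1, 0, 1, 0, 1, 0],
                                     [1, 1, 0, 0, 1, 1, 0, 0])
```

An attribute whose p-value falls below a chosen significance level could then be kept as associated with the class label; the level itself is a modeling choice.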
Data Extraction
Inputs: labeled data, patient details, primary and secondary diagnosis, lab measurements, administrative data
Table joins → data → incomplete data removed → data used for training the models
Data Distribution
[Figure: bar charts of readmissions for the 30-day and 60-day time frames, Readmit vs. No Readmit; vertical axis 0 to 12,000]
Modeling
Selecting modeling technique for binary classification
• Logistic regression
• Naïve Bayes classifier
• Support Vector Machine
Balancing imbalanced data by under-sampling and over-sampling
Building prediction models
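A minimal sketch of random under-sampling and over-sampling for a two-class dataset follows. The slides do not say which sampling scheme was used, so this shows only the simplest random variants; the function names are hypothetical, and both classes are assumed to be present.

```python
import random

def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)

def under_sample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def over_sample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = sorted(majority + minority + extra)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy data: 2 readmissions among 10 admissions.
toy_rows = list(range(10))
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
under_rows, under_labels = under_sample(toy_rows, toy_labels)
over_rows, over_labels = over_sample(toy_rows, toy_labels)
```

Balancing is normally applied to the training portion only, so that the test portion keeps the original class distribution.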
Logistic Regression Model
[Figure: plot of P (probability of Y) against Z]
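For reference, the standard logistic regression form relating Z to P can be written as below. The symbols are assumptions introduced here for illustration: x_1, ..., x_n are the predictor variables and β_0, ..., β_n the fitted coefficients.

```latex
Z = \beta_0 + \sum_{k=1}^{n} \beta_k x_k,
\qquad
P(Y = 1 \mid x_1, \dots, x_n) = \frac{1}{1 + e^{-Z}}
```

P approaches 0 for large negative Z, equals 0.5 at Z = 0, and approaches 1 for large positive Z.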
Naïve Bayesian Classification
A statistical classifier that performs probabilistic prediction based on Bayes' theorem
Assumes that the attributes are conditionally independent given the class
Given a data tuple X and m classes C_1, ..., C_m, predicts that X belongs to class C_i only if P(C_i | X) is the highest among P(C_k | X) for all the m classes
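The standard naïve Bayes decision rule can be written out as follows. The notation is an assumption made for this sketch: X = (x_1, ..., x_n) is the data tuple and C_1, ..., C_m are the classes.

```latex
P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)},
\qquad
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)

\hat{C} = \arg\max_{1 \le i \le m} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
```

P(X) is the same for every class, so it can be dropped when comparing classes.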
Support Vector Machine
A method of classification for both linear and nonlinear data
Searches for the optimal hyperplane separating the two classes
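In the linearly separable case, the separating hyperplane and the resulting decision rule take the standard form below. The symbols w (weight vector), b (bias), and y_i in {-1, +1} (class labels) are notation assumed here, not taken from the slides.

```latex
w \cdot x + b = 0,
\qquad
\hat{y} = \operatorname{sign}(w \cdot x + b)

\min_{w,\, b} \; \tfrac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \;\; \text{for all } i
```

Minimizing the norm of w maximizes the margin 2 / ||w|| between the two classes.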
Performance Evaluation Metrics
Precision – percentage of tuples labeled as positive that are actually positive = TP/(TP+FP)
Recall – percentage of positive tuples that are labeled positive = TP/(TP+FN)
Accuracy – percentage of tuples correctly classified = (TP+TN)/(P+N)
ROC curves and area under the curve (AUC) – show the trade-off between the true positive rate and the false positive rate
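The metrics above can be computed directly from true and predicted labels, as in this sketch. The function names are hypothetical; AUC is computed here through its rank interpretation (the fraction of positive/negative pairs in which the positive tuple gets the higher score, ties counting one half), which assumes both classes are present.

```python
def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)  # len(y_true) equals P + N
    return precision, recall, accuracy

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, for illustration only.
prec, rec, acc = precision_recall_accuracy([1, 1, 1, 0, 0, 0, 0, 0],
                                           [1, 1, 0, 1, 0, 0, 0, 0])
toy_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

With a skewed dataset, accuracy alone can look high for a model that rarely predicts readmission, which is why recall and AUC are reported alongside it.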
Evaluation
• Predictive models are assessed using 10-fold cross-validation
• Performance is compared using the evaluation metrics described previously
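A minimal sketch of how 10-fold cross-validation partitions the data is shown below. The generator name `k_fold_indices` is hypothetical, and the folds are plain random folds; with a skewed class distribution, stratified folds that preserve the class ratio may be preferable, and the slides do not say which variant was used.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(k_fold_indices(25, k=10))
```

Each row is used for testing exactly once, and a metric such as AUC is averaged over the k test folds.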
RESULTS
[Figure: Logistic regression for 30 days; panels: Area Under the Curve (AUC), Recall]
[Figure: Logistic regression for 60 days; panels: Area Under the Curve (AUC), Recall]
[Figure: Naïve Bayes classifier for 30 days; Area Under the Curve (AUC), axis 0.56 to 0.64, by attribute set (Baseline, New, All, Correlated)]
[Figure: Support Vector Machine for 30 days; Area Under the Curve (AUC), axis 0.58 to 0.64, by attribute set (Baseline, New, All, Correlated)]
Conclusion and Discussion
It is a difficult problem to solve
Feature selection gives the best results
With data balancing, the recall of the model improves
Future Work
Investigate other classification techniques such as ensemble methods and neural networks
Explore additional features and study their relevance
Employ other feature selection techniques
Devise a method to impute missing values
Deploy the predictive models
Acknowledgement
MultiCare Health System (MHS) and Dr. Lester Reed for giving us this opportunity
Data architects and domain experts in MHS for their inputs
Professors Dr. Ankur Teredesai and Dr. Senjuti Basu Roy for their guidance
Other team members in UWT for their support
References
S. F. Jencks, M. V. Williams, and E. A. Coleman, “Rehospitalizations among Patients in the Medicare Fee-for-Service Program,” New England Journal of Medicine, vol. 360, no. 14, pp. 1418–1428, 2009.
J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
H. M. Krumholz, S. L. T. Normand, P. S. Keenan, Z. Q. Lin, E. E. Drye, K. R. Bhat, Y. F. Wang, J. S. Ross, J. D. Schuur, and B. D. Stauffer, Hospital 30-day heart failure readmission measure methodology. Report prepared for the Centers for Medicare & Medicaid Services.
Questions