Predicting Who Will Die Within Six Months
Predicting Who Will Die
Presentation by: Matthew Dunning
Objective of work
• The objective of this work is to estimate the probability of death from a patient's diagnoses, medical history, diagnosis category, and recurring diagnoses.
• I used Bayes' Theorem to solve for the likelihood ratio, probability, sensitivity, and specificity.
• This work supports evidence-based decisions: it identifies living patients (by patient id) who need more attention or follow-up, and for patients who have died, the data helps in understanding the effects of their diseases.
Data & Its Source
• The original data source contains 17443442 rows of data.
• Sorted by count of icd9 in descending order, the distribution is heavily skewed: the most frequent codes sit on the left and the counts fall away in a long tail. Without the sort, the distribution has no discernible shape.
[Figure: Count of ICD9, Top 50. X-axis: ICD9 Code; y-axis: ICD9 Count, with ticks from 100000 to 1000000.]
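The count-and-sort step behind the Top 50 chart can be sketched in a few lines. This is a minimal illustration, not the original SQL; the function name `top_codes` and the sample rows are hypothetical, and only the (id, icd9) row layout comes from the slides.

```python
from collections import Counter


def top_codes(rows, n=50):
    """Return the n most frequent icd9 codes as (code, count) pairs,
    sorted by count in descending order."""
    counts = Counter(code for _patient_id, code in rows)
    return counts.most_common(n)


# Hypothetical (id, icd9) rows, for illustration only.
sample_rows = [
    (1, "I401.9"),
    (2, "I401.9"),
    (2, "I305.1"),
    (3, "I401.9"),
    (3, "I496."),
]

top_two = top_codes(sample_rows, n=2)
print(top_two)
```

Plotting the returned pairs in order gives the descending bar chart described above.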
Preparation of the Data
1. Original Data: 17443442 rows of id, icd9
2. Remove Cases Where Patients Died Before Visit: 17443442 to 17439892 rows of id, icd9
3. Remove cases in a year that exceed 365: 829801 to 828616 id's
4. Final Data to use for analysis: 15715093 rows of id, icd9
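The first cleaning step, removing cases where the patient died before the visit, might look like the sketch below. It assumes each row carries the ageatdx and ageatdeath columns that appear later in the slides, with a missing death age (NULL) represented as None; the function names and sample values are hypothetical, and the original work was done in SQL.

```python
def died_before_visit(row):
    """True when a recorded age at death is earlier than the age at diagnosis."""
    age_at_death = row["ageatdeath"]
    return age_at_death is not None and age_at_death < row["ageatdx"]


def remove_died_before_visit(rows):
    """Keep only rows whose diagnosis is not dated after the patient's death."""
    return [row for row in rows if not died_before_visit(row)]


# Hypothetical rows, for illustration only.
sample_rows = [
    {"id": 1, "icd9": "I401.9", "ageatdx": 63.5, "ageatdeath": None},  # alive
    {"id": 2, "icd9": "I427.5", "ageatdx": 70.0, "ageatdeath": 70.2},  # valid
    {"id": 3, "icd9": "I486.", "ageatdx": 81.0, "ageatdeath": 80.5},   # invalid
]

cleaned = remove_died_before_visit(sample_rows)
print([row["id"] for row in cleaned])
```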
Randomization
• 80% of data was used for training
• 20% of data was used for validation
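A reproducible 80/20 split can be sketched as follows. The slides do not say whether the split was made by row or by patient; this sketch assumes a split by patient id, so that one patient's diagnoses never land in both sets. The function name `split_ids` and the seed are illustrative choices.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=0):
    """Shuffle the unique patient ids and split them into
    (training ids, validation ids)."""
    unique_ids = sorted(set(ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    cut = int(len(unique_ids) * train_fraction)
    return set(unique_ids[:cut]), set(unique_ids[cut:])


train_ids, validation_ids = split_ids(range(100))
print(len(train_ids), len(validation_ids))
```

Rows are then assigned to the training or validation set according to which set their id falls in.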
Calculation of Likelihood Ratios
$$\mathrm{LR} = \frac{\text{Dead with Dx} / \text{Dead}}{\text{Alive with Dx} / \text{Alive}}$$
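The ratio above translates directly into code. The sketch below is illustrative only: the function name and the example counts are hypothetical, not taken from the training set.

```python
def likelihood_ratio(dead_with_dx, dead_total, alive_with_dx, alive_total):
    """LR = (dead with Dx / dead) / (alive with Dx / alive)."""
    if alive_with_dx == 0:
        raise ValueError("LR is undefined when no living patient has the diagnosis")
    rate_among_dead = dead_with_dx / dead_total
    rate_among_alive = alive_with_dx / alive_total
    return rate_among_dead / rate_among_alive


# Hypothetical counts: 30 of 1000 dead patients and 10 of 4000 living
# patients carry the diagnosis, so the LR is close to 12.
example_lr = likelihood_ratio(30, 1000, 10, 4000)
print(round(example_lr, 2))
```

An LR above 1 means the diagnosis is more common among patients who died; an LR below 1 means it is more common among those who survived.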
Top Ten Deadliest Diseases – Each Dx from Training Set
1. Brain Death (icd9 I348.82, LR 11.78)
2. Malignant Ascites (icd9 I789.51, LR 5.53)
3. Malignant Neoplasm of biliary tract (icd9 I156.9, LR 5.04)
4. Encounter for palliative care (icd9 IV66.7, LR 4.95)
5. Cardiac Arrest (icd9 I427.5, LR 4.85)
6. Coma (icd9 I780.01, LR 4.75)
7. Malignant Pleural Effusion (icd9 I511.81, LR 4.62)
8. Secondary Malignant Neoplasm of Adrenal Gland (icd9 I198.7, LR 4.45)
9. Disseminated Malignant Neoplasm without specification of site (icd9 I199.0, LR 4.42)
10. Secondary Malignant Neoplasm of brain and spinal cord (icd9 I198.3, LR 4.35)
Ten Least Deadly Diseases – Each Dx from Training Set
1. Dysmenorrhea (icd9 I625.3, LR 0.0009)
2. Chondromalacia of patella (icd9 I717.7, LR 0.001)
3. Hypertrophy of tonsils alone (icd9 I474.11, LR 0.002)
4. Personal History of injury presenting hazards to health (icd9 IV15.5, LR 0.002)
5. Schizophrenic disorders, residual type, chronic with acute exacerbation (icd9 I295.64, LR 0.002)
6. Unspecified symptom associated with female genital organs (icd9 I625.9, LR 0.002)
7. Pelvic peritoneal adhesions, female (postoperative)(post infection) (icd9 I614.6, LR 0.003)
8. Cervicitis and endocervicitis (icd9 I616.0, LR 0.004)
9. Migraine without aura, without mention of intractable migraine without mention of status migrainosus (icd9 I346.10, LR 0.004)
10. Amphetamine or related acting sympathomimetic abuse, episodic (icd9 I305.72, LR 0.004)
How medical history can be used to predict prognosis
• Medical history can be used to predict prognosis because it yields the probability of developing another medical condition, of a condition recurring a given number of times, and of death within a certain timespan.
id icd9 ageatdx ageatdeath odds prob
463305 I529.4 63.5 NULL 9.646010674 0.906068101
463305 I300.00 64.333333 NULL 0.479591875 0.324137949
463305 I528.9 64.333333 NULL 0.441291616 0.306177883
463305 I528.9 60.416666 NULL 0.435537554 0.303396838
463305 I714.0 62.666666 NULL 0.330190365 0.248227903
463305 I300.00 60.083333 NULL 0.29491929 0.227751098
463305 I530.81 66.416666 NULL 0.260475898 0.206648853
463305 I401.9 62.916666 NULL 0.257884965 0.205014745
463305 I300.00 64.083333 NULL 0.255498056 0.203503346
463305 I401.9 60.166666 NULL 0.23220788 0.188448624
463305 I530.81 63 NULL 0.229879244 0.186912045
463305 I311. 64.083333 NULL 0.219384003 0.179913795
463305 IV15.81 62.916666 NULL 0.217714068 0.178789154
463305 I300.01 60.083333 NULL 0.131771237 0.116429215
463305 I278.00 60.166666 NULL 0.109584144 0.098761454
463305 I278.00 66.416666 NULL 0.107392413 0.09697774
463305 I296.7 63 NULL 0.099679257 0.090643936
463305 I296.7 63.583333 NULL 0.098795268 0.089912353
463305 I306.9 63 NULL 0.070926549 0.066229144
463305 I306.9 63.583333 NULL 0.06561912 0.0615784
463305 I296.50 63.5 NULL 0.062146048 0.05850989
463305 IV62.4 63 NULL 0.048503436 0.046259683
463305 IV62.0 62.916666 NULL 0.029597396 0.028746573
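In the table above, each prob value equals odds / (1 + odds), which is the standard conversion from odds to probability. The sketch below checks that relationship against the first two rows. The `posterior_odds` helper shows the odds form of Bayes' Theorem (posterior odds = prior odds × LR); that the odds column was produced exactly this way is an assumption, since the slides do not spell out the step.

```python
def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' Theorem (assumed here, not stated in the slides)."""
    return prior_odds * likelihood_ratio


# (odds, prob) pairs copied from the first two table rows for id 463305.
table_rows = [
    (9.646010674, 0.906068101),
    (0.479591875, 0.324137949),
]

for odds, reported_prob in table_rows:
    computed = odds_to_probability(odds)
    print(f"odds={odds} reported={reported_prob} computed={computed:.9f}")
```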
[Figure: Sensitivity vs Specificity. X-axis: Specificity; y-axis: Sensitivity; both axes run from 0 to 1.2.]
Accuracy of the prediction
• I used the formula: Accuracy = (TN+TP)/(TN+TP+FN+FP)
Probability Sensitivity Specificity Accuracy
0 1 0 50%
0.2 0.738 0.493 62%
0.4 0.3033 0.86 58%
0.8 0.01206 0.997 50%
0.9 0.00001084 0.9999 50%
0.95 4.0669E-06 1 50%
1 0 1 50%
• Sensitivity: among patients with the disease, the probability of a positive test
• Specificity: among patients without the disease, the probability of a negative test
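Accuracy can also be written in terms of sensitivity, specificity, and the share of positives (prevalence): accuracy = sensitivity × prevalence + specificity × (1 − prevalence), which is algebraically the same as (TN+TP)/(TN+TP+FN+FP). The accuracy values in the table above match this expression with a prevalence of 0.5, that is, the plain average of sensitivity and specificity. This is an observation from the numbers; the slides do not state that the classes were balanced. The function name is illustrative.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# (probability cutoff, sensitivity, specificity) copied from the table above.
table_rows = [
    (0.2, 0.738, 0.493),
    (0.4, 0.3033, 0.86),
    (0.8, 0.01206, 0.997),
]

for cutoff, sens, spec in table_rows:
    acc = accuracy_from_rates(sens, spec, prevalence=0.5)
    print(f"cutoff={cutoff} accuracy={acc:.4f}")
```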
Contingency Table
        Dead     Alive    Total
Yes     23       14       47
No      141089   680473   821562
Total   141112   680487   821609
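Sensitivity, specificity, and accuracy follow from the four inner cells of a contingency table. The sketch below uses the cell counts shown above and assumes that the "Yes" row means "predicted to die" and the "No" row means "predicted to survive"; the slides do not label the rows further, and the function name is hypothetical. The total is recomputed from the four cells rather than read from the table.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of dead patients flagged "Yes"
    specificity = tn / (tn + fp)  # share of living patients flagged "No"
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    return sensitivity, specificity, accuracy


# Inner cells of the contingency table, under the row-label assumption above.
sens, spec, acc = confusion_metrics(tp=23, fp=14, fn=141089, tn=680473)
print(f"sensitivity={sens:.6f} specificity={spec:.6f} accuracy={acc:.4f}")
```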
Usefulness of the Project
• Hospitals can devote more time to these patients, which may delay death or increase patient satisfaction.
• Researchers can use this project to better understand trends and stages of different icd9 codes, i.e., why such high or low odds/probability is associated with the corresponding id.
• For me, it gave some insight into how to apply Bayes' Theorem (statistical processes) to a big dataset and how to use different functions within SQL to complete the desired tasks.