Knowledge Discovery in Databases: Improving Quality in Homecare
Bonnie L. Westra, PhD, RN, Assistant Professor, University of Minnesota, School of Nursing
An educational update to the HIMSS Management Engineering – Performance Improvement Task Force
June 17, 2008
Acknowledgments
Co-Investigators: Kay Savik, MS; John H. Holmes, PhD; Cristina Oancea, MS, PhD Student (RA); Lynn Choromanski, MS, RN, PhD Student (RA); Mary Dierich, MS, RN, PhD Student
Industrial Partners: CareFacts Information Systems; CHAMP Software; Deb Solomon, RN, MS, Home Caring & Hospice (consultant)
Funding: University of Minnesota Digital Technology Initiative Grant, UMN Grant-in-Aid, NIH Health Trajectory P20 Grant
Objectives
• Describe current homecare research using EHR data
• Demonstrate a series of steps in comparing traditional statistical analytic methods with knowledge discovery methods (data mining)
• Examine lessons learned with the use of EHR data for quality improvement
• Explore the use of KDD for future research
Problem
• Increasing homecare/community-based care
  – Annual expenditure in 2005 of $47.5 billion
• In 2000, CMS implemented PPS for Medicare patients
• Concern about the effect of decreased services/visits on outcomes
• First study
  – 28% hospitalization rate nationally, which has remained constant
  – Limited research on ways to reduce hospitalization
Research Aims
The purpose of the first study was to develop predictive models for risk factors associated with increased likelihood of hospitalization of homecare patients and to discover whether interventions documented as part of routine care using the Omaha System influence hospitalization.
• Use knowledge discovery in databases combined with traditional statistics.
• Reported here are the first models using traditional statistics.
Design/ Sample
Secondary analysis of EHR data
• OASIS and Omaha System interventions from two different EHR systems and 15 homecare agencies.
Data included
• All patients in 2004 receiving homecare services with a minimum of two OASIS records for the start and end of an episode of care and who also had Omaha System interventions.
KDD Process
Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in knowledge discovery and data mining. Menlo Park, CA: AAAI Press/The MIT Press; 1996.
Expertise Required
• Clinical expert
  – What data are collected, when, why, and how
  – Interpretation of the data
  – Meaningful decisions throughout the process
• Information system knowledge: specifying requirements
  – What data are available
  – Similarity across agencies and vendors
  – Database issues: how the data are stored
• Data analysis
  – Statistical knowledge
  – Data mining knowledge
• Clinical validation throughout the process
OASIS Data
OMAHA SYSTEM PROBLEMS
ENVIRONMENTAL
1 - Income
2 - Sanitation
3 - Residence
4 - Neighborhood/workplace safety
5 - Other
PSYCHOSOCIAL
6 - Communication with community resources
7 - Social contact
8 - Role change
9 - Interpersonal relationship
10 - Spiritual distress
11 - Grief
12 - Emotional stability
13 - Human sexuality
14 - Caretaking/parenting
15 - Neglected child/adult
16 - Abused child/adult
17 - Growth and development
18 - Other
PHYSIOLOGICAL
19 - Hearing
20 - Vision
21 - Speech and language
22 - Dentition
23 - Cognition
24 - Pain
25 - Consciousness
26 - Integument
27 - Neuro-musculo-skeletal function
28 - Respiration
29 - Circulation
30 - Digestion-hydration
31 - Bowel function
32 - Genito-urinary function
33 - Antepartum/postpartum
34 - Other
HEALTH RELATED BEHAVIORS
35 - Nutrition
36 - Sleep and rest patterns
37 - Physical activity
38 - Personal hygiene
39 - Substance use
40 - Family planning
41 - Health care supervision
42 - Prescribed medication regimen
43 - Technical procedure
44 - Other
Omaha System
Analyses
• Traditional statistical analyses
  – Frequencies, descriptive statistics, histograms
  – Chi-square/bivariate association
  – Latent class analysis
  – Logistic regression analysis
• Future: data mining techniques
  – Visualization
  – Feature selection
  – Decision trees
  – Clustering
Preprocessing
• 18,067 OASIS records for 3,199 patients
  – Missing data
  – Duplicate records
  – Invalid values
• 989,772 Omaha System interventions
  – Missing data
• Matched patients with OASIS and Omaha System data
• 65,000 medication records
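The audit categories listed on this slide (missing data, duplicate records, invalid values) can be counted before any record is changed or dropped. The following is a minimal sketch, not the study's actual procedure: the field names, the sample rows, and the assumed valid range are all hypothetical.

```python
from collections import Counter

# Hypothetical records: field names and the valid range below are
# illustrative assumptions, not the study's actual OASIS layout.
records = [
    {"patient_id": 1, "assessment_date": "2004-01-05", "pain_score": 2},
    {"patient_id": 1, "assessment_date": "2004-01-05", "pain_score": 2},     # exact duplicate
    {"patient_id": 2, "assessment_date": "2004-02-10", "pain_score": None},  # missing value
    {"patient_id": 3, "assessment_date": "2004-03-01", "pain_score": 9},     # outside assumed range
]

VALID_PAIN_SCORES = range(0, 4)  # assumed valid codes 0..3


def audit(rows):
    """Count duplicate, missing, and invalid values without modifying the data."""
    # Two rows are duplicates when every field matches.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())
    missing = sum(1 for r in rows if r["pain_score"] is None)
    invalid = sum(
        1 for r in rows
        if r["pain_score"] is not None and r["pain_score"] not in VALID_PAIN_SCORES
    )
    return {"duplicates": duplicates, "missing": missing, "invalid": invalid}


report = audit(records)
print(report)
```

Counting first and cleaning second keeps a record of how many rows each rule affects, which matters when each decision could influence the validity of the results.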
Data Preparation
• Preparation: cleaning data
  – Missing values
  – Duplicate records
  – Out-of-range values
• Grouping data into episodes of care
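Grouping assessments into episodes of care can be sketched as pairing each start-of-care record with the next closing record for the same patient. This is only an illustration under stated assumptions: the tuple layout and the type labels ("start", "discharge", "transfer", "death") are hypothetical simplifications, and real OASIS reason-for-assessment codes are more detailed.

```python
from collections import defaultdict

# Hypothetical, simplified assessments: (patient_id, ISO date, assessment type).
assessments = [
    (1, "2004-01-05", "start"),
    (1, "2004-02-20", "discharge"),
    (1, "2004-06-01", "start"),
    (1, "2004-06-15", "transfer"),
    (2, "2004-03-10", "start"),  # no closing record, so no complete episode
]

END_TYPES = {"discharge", "transfer", "death"}  # assumed closing record types


def build_episodes(rows):
    """Pair each start record with the next end record for the same patient."""
    by_patient = defaultdict(list)
    for patient_id, date, kind in rows:
        by_patient[patient_id].append((date, kind))

    episodes = []
    for patient_id, events in by_patient.items():
        start_date = None
        for date, kind in sorted(events):  # ISO dates sort chronologically
            if kind == "start":
                start_date = date
            elif kind in END_TYPES and start_date is not None:
                episodes.append((patient_id, start_date, date, kind))
                start_date = None
    return episodes


episodes = build_episodes(assessments)
for episode in episodes:
    print(episode)
```

Under this rule a patient can contribute more than one episode, and a patient with only a start record contributes none.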
Unit of Analysis
Episodes
• 2,806 patients, 4,242 episodes
Discharge, 48.8%
Transfer, 38.6%
Continue, 10.9%
Death, 1.7%
Transformation
Summative scales
• Prognosis, Pain, Pressure Ulcers, Stasis Ulcers, Surgical Wounds, Respiratory Status, ADLs, IADLs
Clinical Classification Software
• Primary diagnoses classified and then reduced into 51 smaller groups within 11 major categories
Charlson Index of Comorbidity
• Additional medical diagnoses
Interventions
• Theoretically grouped into 23 categories
Created dummy variables
• For non-normally distributed data
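Two of the transformations named above, summative scales and dummy variables, can be illustrated in a few lines. This is a sketch under assumptions: the item names, scores, and the cut point of 3 are hypothetical and are not the values used in the study.

```python
# Hypothetical item scores for three patients; item names and the cut
# point are illustrative assumptions, not the study's definitions.
adl_items = {
    "p1": {"grooming": 0, "bathing": 1, "toileting": 0},
    "p2": {"grooming": 2, "bathing": 3, "toileting": 1},
    "p3": {"grooming": 1, "bathing": 1, "toileting": 1},
}

CUT_POINT = 3  # assumed threshold separating lower from higher dependency


def summative_score(items):
    """Add the item scores into a single scale value."""
    return sum(items.values())


def dummy(value, cut_point):
    """Recode a skewed score as 1 (at or above the cut point) or 0 (below it)."""
    return 1 if value >= cut_point else 0


scores = {pid: summative_score(items) for pid, items in adl_items.items()}
flags = {pid: dummy(score, CUT_POINT) for pid, score in scores.items()}
print(scores)
print(flags)
```

The choice of cut point is itself an analytic decision, so it is worth recording alongside the recoded variable.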
Primary Diagnoses
• Primary diagnoses (ICD-9 codes): ~13,000
• Clinical Classification Software groups: 260
• Groups: 51
• Categories: 11
Clinical Classification Software
Cardiac and Other Circulation Diseases
Grouping | CCS Categories | Descriptors
24 | 97, 98, 99, 111, 112, 113, 117, 120, 121 | Hypertension & other circulatory diseases
25 | 100, 101, 102 | Myocardial infarction
26 | 103, 104, 96, 213, 245 | Other heart disease
27 | 105, 106 | Conduction
28 | 108 | Congestive Heart Failure; NONHP
29 | 109, 110 | Acute cerebrovascular disease
30 | 114, 116, 118, 119 | Peripheral atherosclerosis
31 | 115 | Aneurysm
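The grouping table above is, in effect, a lookup from CCS category to study grouping. A minimal sketch of that lookup follows; the grouping numbers, CCS categories, and descriptors are copied from the table, while the dictionary and function names are hypothetical.

```python
# Grouping number -> (descriptor, CCS categories), copied from the table above.
GROUPINGS = {
    24: ("Hypertension & other circulatory diseases",
         [97, 98, 99, 111, 112, 113, 117, 120, 121]),
    25: ("Myocardial infarction", [100, 101, 102]),
    26: ("Other heart disease", [103, 104, 96, 213, 245]),
    27: ("Conduction", [105, 106]),
    28: ("Congestive Heart Failure; NONHP", [108]),
    29: ("Acute cerebrovascular disease", [109, 110]),
    30: ("Peripheral atherosclerosis", [114, 116, 118, 119]),
    31: ("Aneurysm", [115]),
}

# Invert the table so each CCS category points at its grouping.
CCS_TO_GROUPING = {
    ccs: grouping
    for grouping, (_descriptor, categories) in GROUPINGS.items()
    for ccs in categories
}


def grouping_for(ccs_category):
    """Return the grouping number for a CCS category, or None if it is not in this table."""
    return CCS_TO_GROUPING.get(ccs_category)


print(grouping_for(108))  # Congestive heart failure falls in grouping 28
print(grouping_for(999))  # Not part of the cardiac/circulatory table
```

Returning None for unknown categories makes unmapped diagnoses visible instead of silently assigning them to a group.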
Applying a Clusterer: Identifying similarities and dissimilarities
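As a rough illustration of what a clusterer does, the sketch below runs a bare-bones k-means on a handful of two-dimensional points. It assumes hypothetical data (the two values per patient might stand for a functional score and a comorbidity score) and fixed starting centroids; it is not the clustering tool or configuration used in the study.

```python
def kmeans(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move each centroid to its cluster mean."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical patients: (functional score, comorbidity score).
patients = [(1, 0), (2, 0), (1, 1), (8, 3), (9, 4), (10, 3)]
centroids, clusters = kmeans(patients, centroids=[(0, 0), (10, 5)])
print(centroids)
print(clusters)
```

Patients that land in the same cluster are similar on the chosen variables; whether the clusters are clinically meaningful is a separate question for clinical review.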
Data Analysis
• Latent class analysis
  – ADL Scale (M0640-M0710)
  – Who Provides Assistance (M0350)
  – Management of medications (M0780)
  – Diagnosis group (M0230 CCS Groups)
• Logistic regression
  – Create models for predictors of hospitalization: OASIS
  – Added interventions: Omaha System interventions
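Logistic regression results are usually reported as odds ratios (OR). For a single yes/no predictor, the OR from a logistic model equals the cross-product ratio of the 2x2 table, which the sketch below computes along with a Wald 95% confidence interval. The counts are hypothetical, and the study's models were multivariable, so its adjusted ORs cannot be reproduced this way.

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a: exposed and hospitalized      b: exposed, not hospitalized
    c: unexposed and hospitalized    d: unexposed, not hospitalized
    """
    return (a * d) / (b * c)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
a, b, c, d = 40, 60, 25, 75
estimate = odds_ratio(a, b, c, d)
low, high = wald_ci(a, b, c, d)
print(f"OR = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An OR above 1 with an interval that excludes 1 points to higher odds of hospitalization in the exposed group; an OR below 1 points to a protective association.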
Demographics
• 2,806 patients
  – Mean age 74.4 (SD = 14.1)
  – 64.6% female
  – 97.9% White
• 4,242 episodes
  – Length of stay ranged from 1 to 6,354 days (median = 38 days)
  – 48.8% discharged
  – 38.6% transferred to inpatient setting
  – 1,620 (38.4%) hospitalized
  – 10.9% continued with care
  – 1.7% died
Demographics
Primary diagnoses (most frequent)
• 18.8% cardiac and circulatory diseases
• 18.1% orthopedic/trauma surgery and follow-up
• 9.1% endocrine and nutrition
• 7.3% respiratory problems
• 2.3% infectious diseases
Charlson Index of Comorbidity
• 0 to 10 with a mean of 0.58 (SD = 1.32)
Interventions (384,081)
• 62.5% monitoring
• 44.9% teaching
• 30.2% treatments
• 16.0% case management
Class I: Functionally Impaired
Risk Factors | Risk of Hospitalization
Assistance with IADLs | 1.5-2.3 ↓
Expected Prognosis | 1.9-2.2 ↑
Charlson Index | 2.6-3.3 ↑
Medicare as homecare payor | 2.0-2.3 ↑
Significant Interventions
Variable | Frequency | OR
Monitoring Injury Prevention | Moderate | 1.7 ↑
Class III: Cardiac/Circulatory
Risk Factors | Risk of Hospitalization
IADL Status | 1.5-2.3 ↑
Expected Prognosis | 1.6-1.8 ↑
Pain | 1.9-2.2 ↑
Charlson Index | 2.1-2.6 ↑
Bowel Incontinence | 2.0 ↑
Patient equipment | 3.9 ↑
Significant Interventions
Variable | Frequency | OR
Teaching Disease Treatment | Moderate | 0.50 ↓
Providing Medication Treatment | Low | 1.9 ↑
Teaching Disease Treatment | High | 3.0 ↑
Interpreting Results
Who Interprets
• Nurses on research team
• Homecare clinical manager
  – Broader homecare audience
What were they asked?
• Latent classes: are they meaningful?
• Within-class predictors
  – What does it mean to have bowel incontinence as a predictor of hospitalization?
• Across classes: most consistent predictors of hospitalization are
  – Charlson Index of Comorbidity
  – Prognosis
  – Medicare
  – Patient management of equipment
  – IADLs
Discussion
• Homecare patients are heterogeneous in needs: latent class analysis was useful
  – ADLs, management of oral medications, caregiver assistance, and primary diagnoses
• Differences between classes
• Similarities across classes
  – Most consistent predictors of hospitalization are Charlson Index of Comorbidity, prognosis, Medicare, patient management of equipment, and IADLs
  – The addition of interventions to the predictive models for hospitalization modified some predictors: injury prevention
• Some interventions were risk factors, others were protective
Is There a Better Way?
• Use KDD methods
• How are they similar or different?
• What can we learn compared with traditional statistical analyses?
• What are the strengths and weaknesses?
Definition
• Knowledge discovery in databases (KDD)
  – Rigorous analytic approach
  – Combines traditional statistical concepts with semi-automated analyses
• Uses tools from statistics and machine learning
  – Inductive, data-driven approach to analyze large, complex datasets
  – Identifies patterns in data that could be missed using only traditional analytic methods
Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. San Francisco: Morgan Kaufmann; 2005.
 | Traditional Statistics | KDD
Feature Selection | Chi-Square, bivariate | Chi-Square, InfoGain, CFS evaluation, BestFirst, Greedy Stepwise, Genetic
Clustering | Latent Class | K-Means, EM
Predictive Modeling | Logistic Regression | Decision Trees, Bayesian Network
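Of the KDD feature-selection measures listed, information gain is the easiest to show by hand: it is the drop in entropy of the outcome once the cases are split by a candidate feature. The sketch below is a plain-Python illustration on hypothetical toy data; it is not the implementation of any particular data mining tool, and the variable names are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = yes, 0 = no for eight episodes.
hospitalized = [1, 1, 1, 0, 0, 0, 0, 1]
medicare = [1, 1, 1, 0, 0, 0, 1, 0]
female = [1, 0, 1, 0, 1, 0, 1, 0]

gain_medicare = information_gain(medicare, hospitalized)
gain_female = information_gain(female, hospitalized)
print(f"medicare: {gain_medicare:.3f}  female: {gain_female:.3f}")
```

In this toy data the first feature carries some information about the outcome and the second carries none, so a ranking by information gain would keep the first and drop the second.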
Strengths & Weaknesses
• Traditional statistics
  – Well known and accepted
  – Used to discover and test hypotheses
  – Limited by statistical assumptions
• KDD
  – Newer and treated with suspicion
  – Used for discovery
  – Much more flexible in working with data
  – Requires more interaction in making decisions about data
  – Health care data are temporal and non-rectangular
Lessons Learned
• Health care data are messy: audit, Audit, AUDIT!!
  – 80% is data preparation (minimally)
• Know your data: dwell in the data early and often
• Many decisions are made to manage the data; each could influence the validity of the results
  – Incorrectly coded data
  – Missing data
  – Data reduction strategies
  – Feature selection: cut points
  – Dummy variables: cut points
Lessons Learned
• Walk before you run
  – Phasing in steps with each subsequent study
• Comparisons between traditional and data mining techniques
  – Both use similar math
  – Difference in assumptions and how data are managed
  – Data mining: discovery
  – Traditional statistics: discovery & verification
• Art and a science
Research in Process
• Predict outcomes using protective/risk factors (OASIS), interventions (Omaha System), and medication data
  – Hospitalization and emergent care use (DTI)
  – Pressure ulcers and incontinence (P20)
  – Oral medication management/ambulation (GIA)
• Clustering of interventions
Bonnie Westra, PhD, RN
Assistant Professor & Co-Director ICNP Center
University of Minnesota, School of Nursing
Robert Wood Johnson, Nurse Executive Fellow
5-140 Weaver-Densford Hall
308 Harvard St. SE
Minneapolis, MN 55455
W: 612-625-4470
F: 612-626-3255
Thank you!
For more information, please contact HIMSS Staff Liaison
JoAnn W. Klinedinst, CPHIMS, PMP, FHIMSS at [email protected]