Data Mining Presentation
Transcript of Data Mining Presentation
Data Mining Applications In Healthcare
TEPR 2004, May 21, 2004
V. “Juggy” Jagannathan, VP of Research
Introduction
Goals of today’s presentation:
• Provide an overview of the technologies that are relevant to the development and deployment of data mining solutions in healthcare
• Allow participants to evaluate where the technology is useful
What is Data mining?
Divining knowledge from data
Topic Outline
Data mining
• Uses
• Algorithms
• Technology
• Applications in healthcare
Data Mining Uses
• Descriptive: understand and characterize
> Clustering
> Summarization
> Association Rules
> Sequence Discovery
• Predictive: extrapolate and forecast
> Classification
> Regression
> Time-Series
Data Mining Algorithms
• Classification
> Statistical
> K-nearest neighbors
> Decision trees
▲ ID3
▲ C4.5
> Neural Networks (Self Organizing Maps)
• Clustering
> Hierarchical
> Partitioned
> Genetic
• Association
> Apriori Algorithm
> If…Then rules
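To make the association entry concrete, here is a minimal sketch of an Apriori-style level-wise search followed by an If…Then rule confidence. The transactions (medications given together during a stay) and the 0.5 support threshold are made up for illustration and are not from the presentation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result

def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule 'if antecedent then consequent'."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]

# Hypothetical data: medications recorded together for four patients.
stays = [
    {"aspirin", "beta_blocker"},
    {"aspirin", "beta_blocker", "ace_inhibitor"},
    {"aspirin", "ace_inhibitor"},
    {"beta_blocker"},
]
itemsets = frequent_itemsets(stays, min_support=0.5)
rule_conf = confidence(itemsets, {"ace_inhibitor"}, {"aspirin"})
```

In this toy data every stay with an ACE inhibitor also has aspirin, so the rule "if ACE inhibitor then aspirin" has confidence 1.0, while the pair beta-blocker plus ACE inhibitor falls below the support threshold and is dropped.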
Technology
• Database Technologies
• On-Line Analytical Processing (OLAP)
• Visualization Technologies
• Data scrubbing technologies
• Natural Language Processing (NLP)
Technology solutions
Data Mining Infrastructure Technologies
Database Technologies
• Data warehouse vs. Data mart
• Relational technologies
> Oracle
> Microsoft
• XML-databases
> Raining Data
On-Line Analytical Processing
• Analyze multi-dimensional data
• N-dimensional data cubes
• Operations
> Roll-up
> Drill-down
> Slice and dice
> Pivot
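The cube operations listed above can be sketched on a tiny fact table using only the standard library. The dimensions (year, quarter, department), the visit counts, and the function names are hypothetical. A real OLAP engine would do this over pre-aggregated cubes.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, department, visits)
DIMS = ("year", "quarter", "department")
FACTS = [
    (2003, "Q1", "Cardiology", 120),
    (2003, "Q2", "Cardiology", 135),
    (2003, "Q1", "Oncology", 80),
    (2004, "Q1", "Cardiology", 150),
    (2004, "Q1", "Oncology", 95),
]

def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]

def pivot(facts, row_dim, col_dim):
    """Pivot: cross-tabulate two dimensions, summing the measure."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)

by_year = roll_up(FACTS, ("year",))                    # roll-up to year
by_year_quarter = roll_up(FACTS, ("year", "quarter"))  # drill-down one level
oncology_only = slice_cube(FACTS, "department", "Oncology")
dept_by_year = pivot(FACTS, "department", "year")
```

Drill-down is simply a roll-up that keeps more dimensions, which is why the sketch has no separate function for it.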
Visualization
• 2D/3D Charts
• Topographic displays
• Cluster displays
• Histograms
• Scatter plots
• Advanced visualization (genomic data patterns)
• http://www.ncbi.nlm.nih.gov/Tools/
Data Scrubbing
• Data cleansing
• Filling in missing data
• In healthcare, there is a strong need for de-identification to protect privacy
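As one simple illustration of filling in missing data, the sketch below replaces missing readings with the mean of the observed ones. Mean imputation is only one of several possible strategies and is an assumption here, not a method named in the presentation. The function name and sample values are hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from, so return the values unchanged.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical systolic blood pressure readings with one gap.
readings = fill_missing([120, None, 130, 140])
```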
De-Identification of Medical Records *
• Names;
• all elements of a street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code for areas that contain over 20,000 people;
• all elements of dates (except year) for dates directly related to the individual (e.g., birth date, admission/discharge dates, date of death); and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
• telephone numbers;
• fax numbers;
• e-mail addresses;
• social security numbers;
• medical record numbers;
• health plan beneficiary numbers;
• account numbers;
• certificate/license numbers;
• license plate numbers, vehicle identifiers and serial numbers;
• device identifiers and serial numbers;
• URL addresses;
• Internet Protocol (IP) address numbers;
• biometric identifiers, including finger and voice prints;
• full face photographic images and comparable images;
• any other unique identifying number except as created by IHS to re-identify information.
* Source: Policy and Procedures for De-Identification of Protected Health Information and Subsequent Re-Identification, 45 CFR 164.514(a)-(c), posted by IHS (Indian Health Service)
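A few of the rules above (dropping direct identifiers, truncating zip codes, reducing dates to the year, and aggregating ages over 89) can be sketched as follows. The record layout, field names, and the `populous_zip3` lookup are hypothetical. A real implementation would have to cover every identifier on the list. Replacing the prefix with "000" for less populous areas follows the convention in 45 CFR 164.514(b).

```python
from datetime import date

DIRECT_IDENTIFIERS = ("name", "phone", "fax", "email", "ssn", "mrn")

def deidentify(record, populous_zip3=frozenset()):
    """Apply a subset of the listed rules to one hypothetical patient record."""
    out = dict(record)
    # Direct identifiers are removed outright.
    for field in DIRECT_IDENTIFIERS:
        out.pop(field, None)
    # Keep only the first three zip digits, and only for areas the caller
    # has marked as containing more than 20,000 people.
    zip3 = str(out.pop("zip", ""))[:3]
    out["zip3"] = zip3 if zip3 in populous_zip3 else "000"
    birth = out.pop("birth_date", None)
    age = out.pop("age", None)
    if age is not None and age > 89:
        # Ages over 89 collapse into one category; the birth year is
        # withheld because it would reveal the age.
        out["age"] = "90+"
    else:
        out["age"] = age
        if birth is not None:
            out["birth_year"] = birth.year  # dates keep the year only
    return out

elderly = deidentify(
    {"name": "Jane Doe", "zip": "26505", "birth_date": date(1910, 3, 2),
     "age": 94, "diagnosis": "AMI"},
    populous_zip3={"265"},
)
younger = deidentify(
    {"name": "John Roe", "zip": "59001", "birth_date": date(1959, 7, 4),
     "age": 45, "diagnosis": "Pneumonia"},
)
```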
Natural Language Processing
• NLP Uses
> translation, summarization, information extraction, document retrieval or categorization
• NLP Approaches
> Clustering, Classification, Linguistic analysis, knowledge-based analysis
• NLP Companies in health care
> A-Life
> Language and Computing
Applications in Healthcare
• Safety and quality
• Clinical Research
• Financial
• Public Health
“To Err Is Human” IOM Report
• Characterization
> JCAHO Core Measures
> CMS Quality measures starter set
> Improves patient care – reactive response
• Prediction
> Identifying cases that can result in bad clinical outcomes and raising appropriate alarms
> Impacts patient care – proactive response
Quality Measures – Initial Set*
Starter Set of 10 Hospital Quality Measures
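Measure                                                   Condition
Aspirin at arrival                                        Acute Myocardial Infarction (AMI)/Heart attack
Aspirin at discharge                                      Acute Myocardial Infarction (AMI)/Heart attack
Beta-Blocker at arrival                                   Acute Myocardial Infarction (AMI)/Heart attack
Beta-Blocker at discharge                                 Acute Myocardial Infarction (AMI)/Heart attack
ACE inhibitor for left ventricular systolic dysfunction   Acute Myocardial Infarction (AMI)/Heart attack
Left ventricular function assessment                      Heart Failure
ACE inhibitor for left ventricular systolic dysfunction   Heart Failure
Initial antibiotic timing                                 Pneumonia
Pneumococcal vaccination                                  Pneumonia
Oxygenation assessment                                    Pneumonia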
*Source: http://www.cms.hhs.gov/quality/hospital/overview.pdf
Safety and Quality
• University of Mississippi Medical Center
> Data Warehouse Technologies to understand Medication Errors – Funded by AHRQ
> Anonymous report data collection
> Data mining technologies
> Use of Neural networks and associative rule inference
Clinical Research & Clinical Trials
• Pharmacy and medical claims data
• Drug efficacy and clinical trials – for example how effective is a particular drug regimen
• Protein structure analysis
• Genomic data mining
• Diagnostic Imaging data research
The bottom line on cost
• General Utilization review – does the care provided meet accepted clinical and cost guidelines
• Drug Utilization review
• Outlier analysis – exceptions to treatment – analyzing treatments that cost more or less than normal
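One common way to flag treatments that cost more or less than normal is the interquartile-range (IQR) rule, sketched below. The choice of the IQR rule, the 1.5 multiplier, and the cost figures are assumptions for illustration. The presentation does not say which outlier method is used.

```python
from statistics import quantiles

def cost_outliers(costs, k=1.5):
    """Flag costs lying more than k * IQR outside the middle 50% of cases."""
    q1, _, q3 = quantiles(costs, n=4)  # quartile cut points
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    return {
        "below": [c for c in costs if c < low_fence],
        "above": [c for c in costs if c > high_fence],
    }

# Hypothetical per-case treatment costs for one diagnosis.
flagged = cost_outliers([900, 950, 1000, 1020, 1050, 1100, 1150, 5000, 100])
```

Cases in either list are candidates for review, not conclusions: an unusually cheap or expensive case may reflect a coding error as easily as a genuine exception to treatment.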
Data mining in public health
• Syndromic surveillance
• Bio-terrorism detection
• Communicable disease reporting (Centers for Disease Control and Prevention (CDC))
• DAWN (Drug Abuse Warning Network)
• Food and Drug Administration (FDA) – reporting of adverse drug events
Example effort: AEGIS
Conclusion
Data mining
• Uses
> Descriptive
> Predictive
• Algorithms
> Classification
> Clustering
> Association rules
• Technology
> Database
> OLAP
> Visualization
> Scrubbing
> NLP
• Applications in healthcare
> Safety and Quality
> Clinical Research
> Financial
> Public Health
Technology solutions
Questions?