Posted on 20-Jun-2015
Social and Decision Analytics Laboratory
SALLIE KELLER, DIRECTOR, SOCIAL AND DECISION ANALYTICS LABORATORY
VIRGINIA BIOINFORMATICS INSTITUTE AT VIRGINIA TECH
Health & Social Development Analytics and Big Data –
A Joint AIR and Virginia Tech Workshop
Starting the Journey
“In attempting to arrive at the truth, I have applied everywhere for information, but scarcely an instance have I been able to obtain hospital records fit for any purpose of comparison. If they could be obtained, they would enable us to decide many other questions besides the ones alluded to. They would show subscribers how their money was spent, what amount of good was really being done with it, or whether their money was not doing mischief rather than good.”
Florence Nightingale (1864)
Outline
• Pressures & Opportunities of Today
• Big data
  – Why important?
  – What about privacy?
• Health & Social Development analytics
  – What makes it big data?
  – How does big data change current approaches?
• Selected examples
• Methodology challenges
Health and Social Development Pressures
• Health as a percent of GDP
  – 5% in 1960 to 18% in 2012
• Changing demographics
  – Increasing minority populations
  – Rapidly aging populations
  – Rural vs. urban living
  – Increasing inequality
• Focus on the patient
  – Health outcomes
Source: Congressional Budget Office.
Health Care Analytics Opportunities
• Drivers behind health care costs
  – Technology, infectious and chronic diseases
• Workforce demand
  – Caregivers, biomedical researchers, IT specialists
• Prevention and personalization
  – Changing demographics and lifestyles
Social Development Analytics Opportunities
• Understanding and anticipating
  – Changes in population growth, aging, and diversity
  – Adapting to increasing urbanization
  – Building individual and community resiliency
• Tailoring programs and policies by defined subpopulations
Big Data: It doesn't matter what it's called, only what you do with it
• Big data
  – Structured & unstructured
  – Collections
    • Designed
    • Observational/convenience
• Statistics / analytics
  – Replication, reproducibility, representativeness
  – Description, association, causation
    • prediction ≠ correlation
• Cost drivers
  – Analytics and informatics, NOT data collection
Big Data Is Now Changing the Social Sciences
• Social science research
  – Traditionally informed by surveys and statistically designed experiments
  – Clean, well-controlled, limited in scale (~10³)
• Bringing “Big data” to bear for social policy
  – Data-informed computational social science models
  – Quantitative social science methods & practice at scale
Methodological Issues
• New methods and tools are needed to ensure
  – Data access
  – Data quality
  – Representativeness
  – Replication
  – Reproducibility
  – Characterization of noisy data
• Managing biases
  – Selection bias
  – Measurement bias
National Research Council 2013
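Managing selection bias in a convenience sample often starts with reweighting it toward known population margins. The sketch below is a minimal illustration of simple post-stratification on a single stratifying variable; the age groups, population shares, and outcome values are hypothetical, and real benchmarks would come from census or survey totals.

```python
from collections import Counter

def poststratify_weights(strata, population_shares):
    """One weight per record so weighted stratum shares match the
    population shares: weight = population share / sample share."""
    n = len(strata)
    sample_counts = Counter(strata)
    return [population_shares[s] / (sample_counts[s] / n) for s in strata]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical convenience sample that over-represents younger people.
strata = ["18-44"] * 6 + ["45-64"] * 3 + ["65+"] * 1
outcome = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1]  # e.g., 1 = reports a chronic condition
# Assumed population benchmarks (hypothetical, not census figures).
population_shares = {"18-44": 0.45, "45-64": 0.35, "65+": 0.20}

weights = poststratify_weights(strata, population_shares)
print(round(sum(outcome) / len(outcome), 3))      # 0.4  (unweighted)
print(round(weighted_mean(outcome, weights), 3))  # 0.508 (post-stratified)
```

Post-stratification only corrects for the variables used to form the strata; selection on unmeasured factors remains.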
Changing Privacy Landscape
[Figure: the privacy landscape in 1993 vs. 2013]
Personal Data - New Asset Class
• European Council 1995/1996:
  – “… any information relating to an identified or identifiable natural person; an identifiable person is one who can be identified (data subject), directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.”
• World Economic Forum 2011:
  – “… digital data created by and about people.”
World Economic Forum 2013
Yesterday
• Definition of personal data is predetermined and binary
• Individual provides legal consent but is not truly engaged
• Policy framework focuses on minimizing risk to the individual
Today
• Definition of personal data is contextual and dependent on social norms
• Individual is engaged and understands how data is used and how value is created
• Policy needs to focus on balancing protection with innovation and economic growth
Further Privacy Thoughts
• Will people voluntarily give up their data if they can see a personal or societal benefit?
• Are norms/expectations changing with generations?
• What are technical fixes for multi-level privacy/classification?
• What is the optimal level of privacy for studies of interest?
Can we table privacy for the duration of the workshop?
• Deserves serious, devoted conversation
• We should be leaders in this conversation
• Will need to specifically address as projects develop
Changing Landscape of Health Data
• Electronic Health Records
• Interoperability challenges
• Public choices
  – 23andMe
  – Google Health
  – HealthVault
P. Bruegel, Tower of Babel (1563)
Personal Health Data
• Today
  – Medical history
  – Lab results
  – Imaging results (X-ray, MRI)
  – Medication records
  – Allergies
  – Vaccination records
  – Demographic data
  – Billing information
• Tomorrow
  – Genome sequence
  – Epigenome
  – Transcriptome
  – Proteome
  – Metabolome
  – Immunome
  – Microbiome
  – Survey data
  – Health monitor data
Omics
“Omics” datasets (genome sequence, epigenome, transcriptome, proteome, metabolome, immunome, microbiome) are large, require sophisticated interpretation, and will have to be reinterpreted over time as knowledge and the standard of care change.
Self-Reported Data
Self-reported data, such as survey data and health monitor data, will vary widely in quality and utility for research, but will be an important source of phenotype information.
Tomorrow is Today
• Infrastructure is being created to enable large longitudinal studies that combine:
  – Comprehensive electronic health records
  – Behavioral and environmental factors (survey information)
  – Genetic information (partial or complete genome sequence)
• Examples
  – NIH: Electronic Medical Records and Genomics Network
  – Wellcome Trust: UK Biobank
  – Vanderbilt University: BioVU
  – Kaiser Permanente: Research Program on Genes, Environment, and Health
  – Veterans Administration: Million Veteran Program
Tomorrow is Today: Vanderbilt University BioVU
• Began collecting DNA in 2007; now has 167,250 samples
• Opt-out program; relatively few patients opt out
• Samples are matched with de-identified EHRs
• Use is restricted to Vanderbilt researchers
Additional Characteristics that Make the Data Big
• Multi-sourced
• Observational
• Noisy
• Multi-purposed
Multi-Sourced Data
Health and social development occur within context:
• Individual and family history and experiences
• Environment
• Access to care, programs, and facilities
• Local, state, and national health and welfare systems
• Political and economic factors
Information and communication technology opens the opportunity to capture metadata and the provenance of the information.
Challenge: integration and interpretation of data captured under such varied circumstances
Observational Data
• Can come from every stakeholder, source, or technology that interacts with the patient, caregiver, or facility
• Little discrimination on what is captured
  – Internet medical surveys, online disease tracking, prevention activities, attitudes on blogs, etc.
• On-demand data from multiple systems
  – Social networks, education records, work history, medical records, extramural activities, etc.
Presents an opportunity to study health and development processes as they naturally occur.
Challenge: manage biases, data quality, and data linkage
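Linking observational records across sources is often done on hashed identifiers so that raw names need not be shared. A minimal sketch, assuming deterministic exact matching on a keyed hash (HMAC-SHA256) of a normalized name and date of birth; the field names, shared key, and records are hypothetical, and real linkage usually needs probabilistic matching to tolerate typos and missing fields.

```python
import hashlib
import hmac

def normalize(name, dob):
    """Lower-case and collapse whitespace so trivial formatting
    differences do not block a match."""
    return " ".join(name.lower().split()) + "|" + dob.strip()

def link_token(name, dob, key):
    """Keyed hash of normalized identifiers; sources sharing the same
    secret key produce the same token for the same person."""
    msg = normalize(name, dob).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def link(records_a, records_b, key):
    """Deterministic exact-match join of two sources on the link token."""
    index_b = {link_token(r["name"], r["dob"], key): r for r in records_b}
    pairs = []
    for r in records_a:
        match = index_b.get(link_token(r["name"], r["dob"], key))
        if match is not None:
            pairs.append((r["id"], match["id"]))
    return pairs

KEY = b"hypothetical-shared-secret"  # assumption: held by a trusted linkage party
clinic = [{"id": "c1", "name": "Jane  Doe", "dob": "1950-04-02"},
          {"id": "c2", "name": "John Roe", "dob": "1948-11-30"}]
school = [{"id": "s9", "name": "jane doe", "dob": "1950-04-02"}]
print(link(clinic, school, KEY))  # [('c1', 's9')]
```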
Noisy Data
“Meanwhile, if the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal.”
Nate Silver, 2013
Challenge: uncertainty quantification
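One common way to quantify uncertainty in a statistic computed from noisy data is the bootstrap: resample the data with replacement, recompute the statistic, and read an interval off the resulting distribution. A minimal percentile-bootstrap sketch; the daily counts are hypothetical, and the fixed seed keeps the interval reproducible.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed -> same interval on every run
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical noisy daily counts from a single data source.
counts = [12, 15, 9, 30, 14, 11, 13, 45, 10, 16]
low, high = bootstrap_ci(counts)
print(statistics.mean(counts), (low, high))
```

The width of the interval, rather than the point estimate alone, is what signals how much of an apparent change could be noise.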
Multi-Purposed Data
• Individual health and well-being versus the population
• Data reuse for multiple purposes
  – Macro-level: regional, state, national, and international
  – Meso-level: institution-wide
  – Micro-level: individuals, cohorts, and groups
An opportunity to more fully use data
Challenge: what is optimal for an individual may not be optimal for the population, and vice versa
Source: Buckingham Shum, S. (2012)
Case Studies from VT Colleagues and Collaborators
• Bureau of Economic Analysis Health Accounts
• Out-of-Hospital Cardiac Arrest
• EMBERS
• Mild Cognitive Impairment
• Synthetic Information
Household Consumption Expenditures for Medical Care: An Alternate Presentation
Ana Aizcorbe, Eli B. Liebman, David M. Cutler, and Allison B. Rosen
• Health care predicted to reach 20% of GDP by 2020
• Health care expenditures increased ~29% (2002-2006)
• Developing a satellite account on medical care spending
• Data include public and private sources
Survey of Current Business June 2012:34-47
http://www.bea.gov/scb/pdf/2012/06%20June/0612_healthcare.pdf
Growth in spending varies by disease
Growth in Medical Care Spending, 2002-2006 (percent)
Disease category               Percent
Endocrine                         70.2
Blood                             68.9
Complications of pregnancy        68.9
Residual codes and unclassified   42.5
Musculoskeletal system            38.6
Injury and poisoning              34.2
Genitourinary system              30.5
Digestive system                  28.2
Circulatory system                25.6
Nervous system                    25.3
Neoplasms                         24.0
Mental illness                    16.7
Respiratory system                14.8
Skin                               5.8
Symptoms and ill-defined           2.4
Congenital anomalies              -8.3
Infectious and parasitic          -8.7
Certain perinatal conditions     -28.1
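Each entry in the table is a percent change between 2002 and 2006 spending for a disease category, so negative values mean spending fell. The arithmetic is sketched below; the spending levels passed in are hypothetical placeholders, not BEA figures.

```python
def percent_growth(old, new):
    """Percent change from old to new; negative values mean spending fell."""
    return 100.0 * (new - old) / old

# Hypothetical spending levels for two categories (same units in both years).
print(percent_growth(100, 129))  # 29.0
print(percent_growth(50, 46.5))  # -7.0
```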
Copyright © American Heart Association
A Case-Crossover Analysis of Out-of-Hospital Cardiac Arrest and Air Pollution Clinical Perspective
Katherine B. Ensor, Loren H. Raun, and David Persse
• Houston 2004-2011
• Integration of hourly ambient air pollution data with EMS locations
Circulation Volume 127(11):1192-1199
March 19, 2013
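In a case-crossover design each case serves as its own control: exposure just before the cardiac arrest is compared with exposure on nearby "referent" days for the same person. The slide does not state which referent scheme the study used; the sketch below assumes the common time-stratified scheme (same weekday within the same calendar month and year) and a hypothetical event date. For an hourly exposure analysis, the same hour of day would be used on each referent day.

```python
import calendar
from datetime import date

def time_stratified_referents(event_day):
    """Referent days for a time-stratified case-crossover design: every
    day in the event's month and year that falls on the same weekday,
    excluding the event day itself."""
    _, days_in_month = calendar.monthrange(event_day.year, event_day.month)
    referents = []
    for d in range(1, days_in_month + 1):
        candidate = date(event_day.year, event_day.month, d)
        if candidate.weekday() == event_day.weekday() and d != event_day.day:
            referents.append(candidate)
    return referents

# Hypothetical OHCA event date; exposure on these referent days is
# compared with exposure on the event day for the same person.
print(time_stratified_referents(date(2013, 3, 19)))
```

Matching on weekday and month controls, by design, for day-of-week effects and slow seasonal trends without modeling them.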
Locations of OHCA events between 2004 and 2011 in Houston, Texas
Forest plot of the relative risk of OHCA per interquartile range increase in the average of 1- to 3-hour lagged ozone and 1- to 2-day lagged PM2.5, by age, ethnicity, sex, and season.
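The forest plot reports relative risk per interquartile-range (IQR) increase in a lagged average exposure. The sketch below shows the two computations involved, under stated assumptions: hourly readings held in a plain list, a hypothetical log-scale coefficient `beta` per unit of exposure (as a conditional logistic regression in a case-crossover analysis would produce), and Python's `statistics.quantiles` for the quartiles. All values are illustrative, not from the study.

```python
import math
import statistics

def lagged_average(series, t, lags):
    """Mean of the series at times t - k for each lag k (e.g. hours 1-3)."""
    return statistics.mean(series[t - k] for k in lags)

def interquartile_range(values):
    """Q3 - Q1 using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def rr_per_iqr(beta, iqr):
    """Relative risk for an IQR increase, given a log-scale coefficient
    per unit of exposure: RR = exp(beta * IQR)."""
    return math.exp(beta * iqr)

# Hypothetical hourly ozone readings (ppb) and coefficient; not study values.
ozone = [20, 25, 31, 40, 38, 35, 42, 50, 47, 30]
exposure = lagged_average(ozone, t=9, lags=range(1, 4))  # mean of hours t-1, t-2, t-3
iqr = interquartile_range(ozone)
beta = 0.002  # hypothetical log-RR per ppb
print(round(exposure, 2), iqr, round(rr_per_iqr(beta, iqr), 4))
```

Scaling by the IQR rather than by one unit puts pollutants with very different ranges (ppb of ozone, µg/m³ of PM2.5) on a comparable footing in one plot.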
Open Source Indicators for Forecasting ILI Case Counts and Rare Disease Outbreaks
Naren Ramakrishnan (PI), with a large multi-institutional team
• EMBERS: Early Model-based Event Recognition using Surrogates
• Fully automated processing of data and delivery of warnings
Source: https://www.cs.vt.edu/node/6565
EMBERS Prediction Pipeline
Data sources: Google Flu Trends, Google Search Trends, HealthMap, Weather, Twitter, OpenTable, Parking Lot Imagery
EMBERS Dashboard: Fusing Data and Models
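The slides describe EMBERS as fusing many surrogate data sources but do not describe its fusion method. As a hedged illustration of one standard way to combine forecasts, the sketch below uses inverse-variance weighting, which assumes each source gives an unbiased estimate with a known, independent error variance; the source names and numbers are hypothetical.

```python
def fuse_forecasts(forecasts):
    """Combine (estimate, variance) pairs from independent, unbiased
    forecasters using inverse-variance weights.
    Returns (fused estimate, fused variance)."""
    weights = [1.0 / var for _, var in forecasts]
    total = sum(weights)
    estimate = sum(w * est for w, (est, _) in zip(weights, forecasts)) / total
    return estimate, 1.0 / total

# Hypothetical weekly ILI case-count forecasts from three surrogate sources.
sources = {
    "search_trends": (120.0, 400.0),
    "twitter": (150.0, 900.0),
    "weather_model": (130.0, 1600.0),
}
est, var = fuse_forecasts(list(sources.values()))
print(round(est, 1), round(var, 1))
```

The fused variance is smaller than any single source's variance, which is the basic payoff of combining surrogates; correlated or biased sources would need a richer model.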
Family Triad Perceptions of Mild Cognitive Impairment (MCI)
Karen A. Roberto, Rosemary Blieszner and Tina Savla
• Age-related decline in memory and executive functioning
• 10-20% of individuals aged 65+ have MCI
• Data sources
  – Memory clinics, churches, senior housing
  – Family-level data: elder with MCI (age 60+), primary care partner, secondary care partner
Journal of Gerontology: Social Sciences 2011(6): 756-768
Brain Functioning
[Figure: brain regions labeled by function: reasoning, planning, speech, movement, emotions, problem-solving (*Executive Function*); vision; perception of touch, pressure, temperature, pain; perception and recognition of auditory stimuli, memory]
Benefits of Multiple Informants
[Figure: families grouped as Complete Acknowledgement, Partial Acknowledgement, No Acknowledgement, Passive Acknowledgement]
Synthetic Information – Disease (Pandemic) Evolution
Stephen Eubank, Bryan Lewis, and many others
Source: Roberto, Blieszner, McCann, & McPherson 2011
http://supercomputing.vbi.vt.edu/
Overview
Structured and unstructured data sources are transformed into synthetic information, which creates and enables the synthetic platform.
Interactive visualization - Virginia
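Synthetic populations are commonly built by fitting a sample's joint distribution of attributes to known marginal totals (for example, census tables). The slides do not say which algorithm the VBI platform uses; the sketch assumes iterative proportional fitting (IPF) on a hypothetical two-way table (age group by household size) with made-up targets.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting (IPF) of a two-way table.

    Alternately rescales rows and columns of a copy of `seed` until the
    row sums match `row_targets`; column sums match after every column
    step. Assumes strictly positive seed cells and targets whose totals agree.
    """
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # Column sums are now exact; stop once the rows still match too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table

# Hypothetical sample counts: rows = age group, columns = household size.
seed = [[5, 3], [2, 6]]
fitted = ipf(seed, row_targets=[60, 40], col_targets=[50, 50])
print([[round(x, 2) for x in row] for row in fitted])
```

IPF preserves the sample's interaction structure (its odds ratios) while matching the margins; synthetic individuals can then be drawn in proportion to the fitted cell counts.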
Goals for the Workshop
• Imagine a different world; the case studies are examples
• Look for synergistic capabilities to build partnerships
• Assess opportunities to integrate multiple sources of data and approaches to comprehensively understand health and social development issues
• Propose prototype projects to work on together to set the stage for future projects