2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative working group meeting
Precision Medicine Workshop: Big Data Aspects
Atul Butte, MD, PhD
Director, Institute for Computational Health Sciences
University of California, San Francisco
@atulbutte
@ImmPortDB
Disclosures
• Scientific founder and advisory board membership: Genstruct, NuMedii, Personalis, Carmenta
• Honoraria for talks: Lilly, Pfizer, Siemens, Bristol Myers Squibb, AstraZeneca, Roche, Genentech, Warburg Pincus
• Past or present consultancy: Lilly, Johnson and Johnson, Roche, NuMedii, Genstruct, Tercica, Ecoeos, Ansh Labs, Prevendia, Samsung, Assay Depot, Regeneron, Verinata, Pathway Diagnostics, Geisinger Health, Covance, Wilson Sonsini Goodrich & Rosati, 10X Genomics, Medgenics, GNS Healthcare, Gerson Lehman Group, Coatue Management
• Corporate relationships: Northrop Grumman, Aptalis, Thomson Reuters, Intel, SAP, SV Angel
• Speakers’ bureau: none
• Companies started by students: Carmenta, Serendipity, NuMedii, Stimulomics, NunaHealth, Praedicat, MyTime, Flipora
bit.ly/1b4sa7b
1. Major potential for disparities
• Will you capture any of the 2.2 million incarcerated? Nearly half of them Black?
• The 43 million people over age 65? Only 16% of those over 65 with income under $50k have a smartphone.
• The 14 million disabled?
• The 4 million just born last year?
• The 2.6 million who died last year?
2. Start with a million, or end with a million?
• Keeping it sticky and useful?
3. Active participants
• If data returned to participants, will they alter their behavior and exposures?
• Can we tell they are doing this?
4. Not enough power
• So you must think early about downstream validation studies.
• Leave-one-sub-cohort-out cross-validation?
• Or are you testing whether every individual gets something out of the approach?
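The slide only raises leave-one-sub-cohort-out cross-validation as a question. As a minimal sketch, assuming each participant record carries a sub-cohort label (the `site` and `y` fields and the `fit`/`score` callables below are hypothetical, for illustration only), the idea is to hold out one sub-cohort at a time, train on all the others, and evaluate on the held-out group, so that each score estimates how well a model transfers to a sub-cohort it never saw.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def leave_one_subcohort_out(
    records: Sequence[dict],
    subcohort_key: str,
    fit: Callable[[List[dict]], object],
    score: Callable[[object, List[dict]], float],
) -> Dict[Hashable, float]:
    """Hold out each sub-cohort in turn, fit on the rest, score on the held-out one.

    `subcohort_key`, `fit`, and `score` are hypothetical, caller-supplied pieces;
    this is a sketch of the technique, not a specific library's API.
    """
    subcohorts = sorted({r[subcohort_key] for r in records})
    if len(subcohorts) < 2:
        raise ValueError("need at least two sub-cohorts to hold one out")
    results: Dict[Hashable, float] = {}
    for held_out in subcohorts:
        # Training set: every record NOT in the held-out sub-cohort.
        train = [r for r in records if r[subcohort_key] != held_out]
        # Test set: only the held-out sub-cohort.
        test = [r for r in records if r[subcohort_key] == held_out]
        model = fit(train)
        results[held_out] = score(model, test)
    return results


if __name__ == "__main__":
    # Toy, made-up records: a "model" that predicts the training mean of y,
    # scored by mean absolute error on the held-out sub-cohort.
    toy = [{"site": "A", "y": 1.0}, {"site": "A", "y": 3.0},
           {"site": "B", "y": 2.0}, {"site": "C", "y": 6.0}]
    mean_fit = lambda train: sum(r["y"] for r in train) / len(train)
    mae = lambda m, test: sum(abs(r["y"] - m) for r in test) / len(test)
    print(leave_one_subcohort_out(toy, "site", mean_fit, mae))
```

Holding out whole sub-cohorts, rather than random individuals, is the design choice that matters here: a model that only works because it memorized site-specific quirks will score poorly on the sub-cohort it never trained on.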
5. If done right, reproducibility won’t matter
6. Exploit the network effect
• Connect the cohort and its data to others, to gain synergy.
• Need methods to connect data sets while keeping confidentiality.
• Not just academic cohorts, but also pharma trials?
• Maybe recruit at the end of a trial, and gather starting data from pharma and contract research organizations.
• Maybe start from the 35 million discharged from a hospital last year?
• Maybe work with Quest and LabCorp to get existing lab data on patients.
7. Success of the effort depends on 3rd party usage
• It needs to be easy to access and understand the data without you.
• It should be easy to build useful tools and mash up data.
• No one should have to hire an insider or expert to understand the data.
• Of course, the cloud and all modern commercial tools and services should be allowed.
• Put real money into dissemination; do not assume it will happen correctly.
• Beyond data sharing agreements
• Difference between Genome and ENCODE
8. Perfection is the enemy of the good
• Perfection delays data release.
• You won’t always make the right choices.
• Keep simple things simple (e.g., an API), but complex things possible (e.g., downloading).
• Let others in to access the data and build tools and alternative representations.
9. Data gets stale
• 1,500 papers in Nucleic Acids Research on open databases!
• Even reference data sets get stale.
• It will soon be a struggle to get eyes on this data set.
• Data have a shelf life, set by technologies and by measurements. Freshen the data.
• The Framingham Heart Study has great data on dbGaP. Why aren’t you using it now?
"In August, I unveiled the Cancer Genome Anatomy Project -- the comprehensive clearinghouse of information about tens of thousands of cancer genes, which will enable scientists and researchers around the world to work together through a website available on the Internet, to bring us closer to a cure."
-- Al Gore, 1998
10. Leave some interesting questions open for others
• Don’t shoot for a whole issue of Science or Nature that tries to answer everything about a million people.
• Leave some of the Nature papers for others.
• The real value of this data set will be in the questions others can see being asked and answered.
• Great success stories already exist with Geisinger, the Million Veteran Program, and many more.
• Create something here that cannot be done by the academic, medical, and private worlds.