2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative working group meeting

Post on 15-Jul-2015



Precision Medicine Workshop: Big Data Aspects

Atul Butte, MD, PhD

Director, Institute for Computational Health Sciences

University of California, San Francisco

atul.butte@ucsf.edu

@atulbutte

@ImmPortDB

Disclosures

• Scientific founder and advisory board membership: Genstruct, NuMedii, Personalis, Carmenta

• Honoraria for talks: Lilly, Pfizer, Siemens, Bristol Myers Squibb, AstraZeneca, Roche, Genentech, Warburg Pincus

• Past or present consultancy: Lilly, Johnson and Johnson, Roche, NuMedii, Genstruct, Tercica, Ecoeos, Ansh Labs, Prevendia, Samsung, Assay Depot, Regeneron, Verinata, Pathway Diagnostics, Geisinger Health, Covance, Wilson Sonsini Goodrich & Rosati, 10X Genomics, Medgenics, GNS Healthcare, Gerson Lehman Group, Coatue Management

• Corporate Relationships: Northrop Grumman, Aptalis, Thomson Reuters, Intel, SAP, SV Angel

• Speakers’ bureau: None

• Companies started by students: Carmenta, Serendipity, NuMedii, Stimulomics, NunaHealth, Praedicat, MyTime, Flipora

bit.ly/1b4sa7b


1. Major potential for disparities

• Will you capture any from the 2.2 million incarcerated? Nearly half black?

• The 43 million over age 65? Only 16% over age 65 with income under $50k have a smartphone.

• The 14 million disabled?

• The 4 million just born last year?

• The 2.6 million that died last year?

2. Start with a million, or end with a million?

• Keeping it sticky and useful?

3. Active participants

• If data returned to participants, will they alter their behavior and exposures?

• Can we tell they are doing this?

4. Not enough power

• So must think early about downstream validation studies.

• Leave one sub-cohort out cross-validation?

• Or are you testing whether every individual gets something out of the approach?
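The "leave one sub-cohort out" idea above can be sketched in a few lines. This is a minimal illustration, not the working group's method: the sub-cohort labels (recruitment sites "A", "B", "C"), the toy measurements, and the mean-value "model" are all made-up assumptions chosen only to show the splitting logic.

```python
from collections import defaultdict
from statistics import mean


def leave_one_subcohort_out(cohort_labels):
    """Yield (held_out_label, train_indices, test_indices), one split per sub-cohort."""
    by_cohort = defaultdict(list)
    for index, label in enumerate(cohort_labels):
        by_cohort[label].append(index)
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        # Train on every participant who is NOT in the held-out sub-cohort.
        train_idx = sorted(
            i for label, idxs in by_cohort.items() if label != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def evaluate_mean_predictor(values, cohort_labels):
    """Mean absolute error per held-out sub-cohort for a trivial mean predictor."""
    errors = {}
    for held_out, train_idx, test_idx in leave_one_subcohort_out(cohort_labels):
        prediction = mean(values[i] for i in train_idx)  # stand-in for a real model
        errors[held_out] = mean(abs(values[i] - prediction) for i in test_idx)
    return errors


# Hypothetical toy data: one measurement per participant, grouped by site.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sites = ["A", "A", "B", "B", "C", "C"]
errors = evaluate_mean_predictor(values, sites)
print(errors)
```

A large error for one held-out sub-cohort suggests a finding does not transfer to that group, which is the kind of signal a downstream validation study would need to follow up.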

5. If done right, reproducibility won’t matter

6. Exploit the network effect

• Connect cohort and data to others, to gain synergy

• Need methods to connect data sets, keep confidentiality

• Not just academic cohorts, also pharma trials?

• Maybe recruit at the end of a trial, and gather starting data from pharma and contract research organizations.

• Maybe start from the 35 million discharged from a hospital last year?

• Maybe work with Quest and LabCorp to get existing lab data on patients
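One common way to connect data sets while limiting exposure of identifiers is to match on a keyed hash instead of the raw identifier. The sketch below is only an assumption about how that could look: the field names (`id`, `enrolled`, `hba1c`), the shared key, and the sample rows are hypothetical, and a keyed hash alone is not a complete confidentiality guarantee.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 token (HMAC)."""
    normalized = identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(cohort_rows, lab_rows, secret_key):
    """Join two record lists on the token, dropping the raw identifier."""
    lab_by_token = {pseudonymize(row["id"], secret_key): row for row in lab_rows}
    linked = []
    for row in cohort_rows:
        token = pseudonymize(row["id"], secret_key)
        if token in lab_by_token:
            merged = {"token": token}
            merged.update({k: v for k, v in row.items() if k != "id"})
            merged.update({k: v for k, v in lab_by_token[token].items() if k != "id"})
            linked.append(merged)
    return linked


# Hypothetical example rows; a real key would be managed securely, not hard-coded.
key = b"example-shared-secret"
cohort = [{"id": "Patient-001", "enrolled": 2015}, {"id": "patient-002", "enrolled": 2015}]
labs = [{"id": " patient-001 ", "hba1c": 6.1}]
linked = link_records(cohort, labs, key)
print(linked)
```

In practice each data holder would compute tokens on its own side and share only the tokens and the agreed fields, so that raw identifiers never leave the source institution.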

7. Success of the effort depends on 3rd party usage

• Needs to be easy to access and understand data without you.

• Easy to build useful tools and mashup data.

• Shouldn’t have to hire an insider or expert to understand the data.

• Of course, the cloud and all modern commercial tools and services should be allowed.

• Put real money into dissemination, do not assume this will happen correctly.

• Beyond data sharing agreements

• Difference between Genome and ENCODE

8. Perfection is the enemy of the good

• Perfection delays data release.

• You won’t always make the right choices.

• Keep simple things simple (e.g. API), but complex things possible (e.g. downloading).

• Let others in, access, and build tools, alternative representations.

9. Data gets stale

• 1500 papers at Nucleic Acids Research on open databases!

• Even reference data sets get stale.

• Will soon be a struggle to get eyes on this data set.

• Shelf life from technologies, from measurements. Freshen data.

• Framingham Heart Study has great data on dbGaP. Why aren’t you using it now?

In August, I unveiled the Cancer Genome Anatomy Project -- the comprehensive clearinghouse of information about tens of thousands of cancer genes, which will enable scientists and researchers around the world to work together through a website available on the Internet, to bring us closer to a cure.

-- Al Gore, 1998

10. Leave some interesting questions open for others

• Don’t shoot for a whole issue of Science or Nature that tries to answer everything about a million people.

• Leave some of the Nature papers for others.

• The real value of this data set will be in the questions others can see being asked and answered

• Great success stories already with Geisinger, Million Veterans, and many more.

• Create something here that cannot be done by the academic, medical, and private world.