Open Science: Where Theory Meets Practice

Open Science: When Theory Meets Practice

Philip E. Bourne PhD, FACMI

Stephenson Chair of Data Science

Director, Data Science Institute

Professor of Biomedical Engineering

https://www.slideshare.net/pebourne

Albert Dorfman Lecture, 6/8/17

My Bias in Addressing this Question

• Research in computational biology and big data

• Open science zealot

• Associate Vice Chancellor (AVC) for Innovation, UCSD

• Maintained biological data resources for 15 years (PDB, IEDB)

• Chief Data Officer of the NIH for 3 years (federal view)

• Director of the Data Science Institute (DSI) for 1 month (state view)

Open Science: One Definition

• Making as much of the basic and clinical research life cycle as open as possible, without compromising the wishes of any stakeholder, with a view to accelerating the research process and improving its quality.

• Having as many people as possible contribute to research outcomes.

Use Case: Accelerating the Research Process …

Diffuse Intrinsic Pontine Gliomas (DIPG)

• Occur in 1 in 100,000 individuals

• Peak incidence at 6-8 years of age

• Median survival 9-12 months

• Surgery is not an option

• Chemotherapy is ineffective and radiotherapy provides only transient benefit

From Adam Resnick

Timeline of genomic studies in DIPG

• Landmark studies identify histone mutations as recurrent driver mutations in DIPG (~2012)

• Almost 3 years later, working with largely the same (partially expanded) datasets, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation

From Adam Resnick

What do we need to do differently to reveal ACVR1?

• ACVR1 is a targetable kinase

• ACVR1 inhibition suppressed tumor progression in vitro

• ~300 DIPG patients a year

• ~60 are predicted to have ACVR1 mutations

• If large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified then

• 60 patients/year × 3 years = 180 children's lives (children who likely succumbed to the disease during that time) could have been impacted if only the data were FAIR

From Adam Resnick
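The back-of-the-envelope estimate on this slide can be reproduced in a few lines. The sketch below is a minimal Python illustration that uses only the slide's approximate figures (~300 patients a year, ~60 with ACVR1 mutations, a ~3-year delay); the constant and function names are hypothetical, and the result is a rough estimate, not an epidemiological model.

```python
# Approximate figures taken from the slide; the names are hypothetical.
PATIENTS_PER_YEAR = 300        # ~300 DIPG patients a year
ACVR1_PATIENTS_PER_YEAR = 60   # ~60 predicted to carry ACVR1 mutations
DELAY_YEARS = 3                # ~3 years between the histone and ACVR1 findings

def acvr1_share(acvr1_per_year: int, total_per_year: int) -> float:
    """Fraction of DIPG patients predicted to carry an ACVR1 mutation."""
    return acvr1_per_year / total_per_year

def children_during_delay(acvr1_per_year: int, delay_years: int) -> int:
    """ACVR1-mutant DIPG cases expected over the discovery delay."""
    return acvr1_per_year * delay_years

print(f"ACVR1 share: {acvr1_share(ACVR1_PATIENTS_PER_YEAR, PATIENTS_PER_YEAR):.0%}")
print(f"Children during delay: {children_during_delay(ACVR1_PATIENTS_PER_YEAR, DELAY_YEARS)}")
```

Running it prints an ACVR1 share of 20% and 180 children over the delay, matching the slide's figure.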

Use Case: Having as many people as possible contribute to research outcomes …

The Story of Meridith

A broader example of what comes out of open science…

Driving sharing and innovation: Open Science Prize

NIH, Wellcome Trust, HHMI

https://www.openscienceprize.org

Accepted, PLOS Biology

• An international scientific challenge competition to encourage and support the prototyping and development of services, tools, or platforms that enable utilization of open content

• 96 submissions received

• Solvers from 45 countries, spanning 5 continents

• Timeline

• May 2016: Phase 1 winners announced at Health DataPalooza

• Dec 1, 2016: Presentations and public voting

• Feb 2017: Overall winner announced

Consider some of the history of open science from the NIH perspective …

Some slides courtesy of Francis Collins

Laying the Foundation: Human Genome Project, Bermuda, 1996

“The HGP changed the norms around data sharing in biomedical research.”

Data Sharing: An Essential Component

1000 Genomes Project

A Culture of Sharing

• 1999: Research Tools Policy

• 2003: NIH Data Sharing Policy

• 2004: Model Organism Policy

• 2007: Genome-wide Association (GWAS) Policy

• 2008: NIH Public Access Policy (Publications)

• 2012: Big Data to Knowledge (BD2K) Initiative

• 2013: White House Initiative (“Holdren Memo”)

• 2014: Genomic Data Sharing (GDS) Policy

• 2014: Modernization of NIH Clinical Trials

Guiding Principle of NIH GWAS Policy

The greatest public benefit will be realized if data from GWAS are made available, under terms and conditions consistent with the informed consent provided by individual participants, in a timely manner to the largest possible number of investigators.

NIH expects the data to be shared in the NIH database of Genotypes and Phenotypes (dbGaP)

[Figure: dbGaP data access requests per year, 2007 to September 2015; series: Total and Approved; labeled values 32,962 (total) and 21,973 (approved)]

NIH Public Access Policy for Publications

• Ensures public access to published results of all research funded by NIH since 2008

• Recipients of NIH funds are required to submit final peer-reviewed journal manuscripts to PubMed Central (PMC) upon acceptance for publication

• Papers must be accessible to the public on PMC no later than 12 months after publication
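The 12-month limit is simple date arithmetic. As a minimal sketch, assuming the limit is counted in calendar months from the journal publication date, and that a day missing from the target month (such as 29 February) rolls back to that month's last day, the latest public-access date could be computed as below. The function name and the clamping convention are assumptions for illustration, not part of the policy text.

```python
from datetime import date

def pmc_latest_public_date(published: date, embargo_months: int = 12) -> date:
    """Latest date a paper should be public on PMC under a calendar-month limit."""
    # Advance by whole months, carrying overflow into the year.
    months = published.month - 1 + embargo_months
    year, month = published.year + months // 12, months % 12 + 1
    day = published.day
    while True:
        try:
            return date(year, month, day)
        except ValueError:
            day -= 1  # day does not exist in the target month; step back toward month end

print(pmc_latest_public_date(date(2017, 6, 8)))   # 2018-06-08
print(pmc_latest_public_date(date(2016, 2, 29)))  # 2017-02-28
```

The loop only handles the rare end-of-month case; for most dates the first `date(...)` call succeeds.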

Harnessing Data to Improve Health: BD2K (Big Data to Knowledge)

NIH’s 6-year initiative to use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability

Programs and activities:

• Advance discovery for biomedical research

• Facilitate use and re-use of biomedical data

• Develop analytical methods and software

• Enhance biomedical data science training

Will we do this research in a different way?

Will it become more like Airbnb?

6/6/17 UNC 24

Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818

I am not crazy, hear me out

• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)

• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder

• It seems to be working:

• 60 million users searching 2 million listings in 192 countries

• Average of 500,000 stays per night.

• Valuation of US $25bn

Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818

Is not biomedical research the same?

Platforms – The situation today

Supplier / Consumer pairs:

• Paper Author / Paper Reader

• Data Provider / Data Consumer

• Employer / Employee

• Reagent Provider / Reagent Consumer

• Software Provider / Software Consumer

• Grant Writer / Grant Reviewer

• Educator / Student

Platforms: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio

Commons Compliance

• Treat the products of research (data, methods, papers, etc.) as digital objects

• These digital objects exist in a shared virtual space

• Digital object compliance through FAIR principles:

• Findable

• Accessible (and usable)

• Interoperable

• Reusable

https://commonfund.nih.gov/bd2k/commons
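To make the FAIR criteria more concrete, here is a minimal Python sketch of a research product described as a digital object. The `DigitalObject` class, its field names, and the `fair_gaps` checks are hypothetical illustrations, not part of the NIH Commons or any official FAIR assessment; each field is simply mapped to the FAIR facet it most obviously supports.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    """Hypothetical description of a research product (data set, method, paper)."""
    identifier: str                                    # persistent ID (Findable)
    title: str
    keywords: list[str] = field(default_factory=list)  # searchable metadata (Findable)
    access_url: str = ""                               # retrieval location (Accessible)
    file_format: str = ""                              # shared, open format (Interoperable)
    license: str = ""                                  # explicit reuse terms (Reusable)

def fair_gaps(obj: DigitalObject) -> list[str]:
    """List FAIR facets the record does not obviously satisfy (illustrative heuristics only)."""
    gaps = []
    if not obj.identifier or not obj.keywords:
        gaps.append("Findable")
    if not obj.access_url.startswith(("https://", "http://")):
        gaps.append("Accessible")
    if not obj.file_format:
        gaps.append("Interoperable")
    if not obj.license:
        gaps.append("Reusable")
    return gaps

# Example record with placeholder values; no license has been recorded.
record = DigitalObject(
    identifier="hypothetical-id-001",
    title="Example DIPG sequencing data set",
    keywords=["DIPG", "ACVR1"],
    access_url="https://example.org/data/001",
    file_format="VCF",
)
print(fair_gaps(record))  # ['Reusable']
```

With no license recorded, the example is flagged only as not clearly Reusable; a real assessment would rely on richer, community-agreed criteria.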

We Need Data and Knowledge About That Data to Interoperate

0. Full text of PLoS papers is stored in a database

1. A link brings up figures from the paper

2. Clicking a paper figure retrieves data from the PDB, which are then analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

The Knowledge and Data Cycle

1. User clicks on content

2. Metadata and web services to the data provide an interactive view that can be annotated

3. Selecting features provides a data/knowledge mashup

4. Analysis leads to new content I can share

PLoS Comp. Biol. 2005 1(3) e34
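The PLoS/PDB example amounts to a join between literature and database content. The Python sketch below shows that idea with in-memory stand-ins: a figure carries the structure identifiers it depicts, matching records are pulled from a structure table, and paragraphs mentioning those identifiers are linked back. All class and function names, the matching rule (plain substring search), and the sample identifiers and values are hypothetical placeholders; the system described above worked against stored PLoS full text and the PDB itself.

```python
from dataclasses import dataclass

@dataclass
class Figure:
    paper_id: str
    figure_id: str
    pdb_ids: list[str]  # structures depicted in the figure (hypothetical link table)

@dataclass
class CompositeView:
    figure: Figure
    structures: dict[str, dict]  # PDB ID -> metadata retrieved from the database
    text_blocks: list[str]       # paragraphs of the paper that mention those structures

def build_composite_view(figure: Figure,
                         structure_db: dict[str, dict],
                         paper_text: list[str]) -> CompositeView:
    """Combine a paper figure with database records and links back into the text."""
    # Step 2: retrieve database records for the structures shown in the figure
    # (identifiers missing from the table are skipped).
    structures = {pid: structure_db[pid] for pid in figure.pdb_ids if pid in structure_db}
    # Step 4: link back to literature blocks that mention any retrieved structure.
    text_blocks = [p for p in paper_text if any(pid in p for pid in structures)]
    return CompositeView(figure, structures, text_blocks)

# Placeholder data standing in for the paper database and the PDB.
fig = Figure("paper-001", "fig2", ["1ABC", "2XYZ"])
structure_db = {"1ABC": {"method": "X-ray", "resolution_angstrom": 2.1}}
paper_text = ["Figure 2 shows 1ABC bound to a ligand.", "Methods were standard."]

view = build_composite_view(fig, structure_db, paper_text)
print(list(view.structures))  # ['1ABC']
print(view.text_blocks)       # ['Figure 2 shows 1ABC bound to a ligand.']
```

In a real deployment the dictionary lookup would be replaced by calls to the database's web services, and the substring match by proper entity recognition.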

Incentives

• Airbnb: monetize unutilized space; ease of use; a new vacation experience

• Commons: the need to improve rigor and reproducibility; productivity; sustainability; education and training; the opportunity to undertake elastic compute on large, complex data

https://commonfund.nih.gov/bd2k/commons

NIH Genomic Data Sharing (GDS) Policy

• Purpose: sets forth expectations and responsibilities that ensure broad, responsible, and timely sharing of genomic research data

• Scope: all NIH-funded research generating large-scale human or non-human genomic data, and the use of those data for subsequent research

• Data to be submitted to NIH-designated data repositories (e.g., dbGaP, GEO, GenBank, WormBase, FlyBase, Rat Genome Database)

• Applies to all funding mechanisms (grants, contracts, intramural support) with no minimum cost threshold

• Released August 2014; effective January 25, 2015

gds.nih.gov

Data Sharing Goes Global: GA4GH (Global Alliance for Genomics and Health)

• Accelerating the potential of genomic medicine to advance human health, by:

• Establishing a common framework of approaches to enable effective, responsible sharing of genomic and clinical data

• Catalyzing data sharing projects that drive and demonstrate value of data sharing

• Alliance*: >350 leading institutions (healthcare, research, advocacy, life science, IT) representing 35 countries

• Working groups (Clinical, Data, Security, Regulatory & Ethics) assess, prioritize needs

• Form task teams to produce tools, solutions, demonstration projects

*Statistics as of October 5, 2015

Modernizing NIH Clinical Trials Activities: The Need

• Figure: NIH-funded trials published within 100 months of completion

• Less than 50% published within 30 months of completion

BMJ 2012;344:d7292

Modernizing NIH Clinical Trials Activities: Call to Action

Progress Has Been Made

Improvements to the Common Rule

Acknowledgements

The BD2K Team at NIH

My New Colleagues at UVA

The 150 folks who have passed through my laboratory

https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0