Open Science: Where Theory Meets Practice
Open Science: When Theory Meets Practice
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
https://www.slideshare.net/pebourne
Albert Dorfman Lecture 6/8/17
My Bias in Addressing this Question
• Research in computational biology and big data
• Open science zealot
• AVC for Innovation UCSD
• Maintained biological data resources for 15 years (PDB, IEDB)
• Chief Data Officer of the NIH for 3 years (federal view)
• DSI Director 1 month (state view)
Open Science: One Definition
• Making as much of the basic and clinical research life cycle as open as possible, without compromising the wishes of stakeholders, with a view to accelerating the research process and improving its quality.
• Having as many people as possible contribute to research outcomes.
Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1:100,000 individuals
• Peak incidence at 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy ineffective and radiotherapy only transiently effective
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012
• Almost 3 years later, in largely the same datasets, but partially expanded, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation
From Adam Resnick
What do we need to do differently to reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified
• 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR
From Adam Resnick
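The estimate on this slide can be reproduced as a short calculation. This is a minimal sketch that uses only the approximate figures quoted above; the variable names are illustrative and not from the original slides.

```python
# Approximate figures quoted on the slide (all are estimates).
patients_per_year = 300   # ~300 DIPG patients a year
acvr1_per_year = 60       # ~60 predicted to carry ACVR1 mutations
delay_years = 3           # gap between the histone and ACVR1 findings

# Share of DIPG patients predicted to carry an ACVR1 mutation.
acvr1_fraction = acvr1_per_year / patients_per_year

# Patients with ACVR1 mutations diagnosed during the delay.
patients_affected = acvr1_per_year * delay_years

print(f"ACVR1 fraction: {acvr1_fraction:.0%}")
print(f"Patients over the delay: {patients_affected}")
```

The result is a back-of-the-envelope upper bound on who could have been impacted, not a measured outcome.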
Use case: Having as many people as possible contribute to research outcomes….
The Story of Meridith
Driving sharing and innovation: Open Science Prize
NIH, Wellcome Trust, HHMI
https://www.openscienceprize.org
Accepted PLOS Biology
• An international scientific challenge competition to encourage and support the prototyping and development of services, tools, or platforms that enable utilization of open content
• 96 submissions received
• Solvers from 45 countries, spanning 5 continents
• Timeline
• May 2016: Phase 1 winners announced at Health DataPalooza
• Dec 1, 2016: Presentations and public voting
• Feb 2017: Overall winner announced
Consider some of the history of open science from the NIH perspective …
Some slides courtesy of Francis Collins
A Culture of Sharing
Timeline figure (years marked: 1999, 2003, 2004, 2007, 2008, 2012, 2014):
• Research Tools Policy
• NIH Data Sharing Policy
• Model Organism Policy
• Genome-wide Association (GWAS) Policy
• NIH Public Access Policy (Publications)
• Big Data to Knowledge (BD2K) Initiative
• Genomic Data Sharing (GDS) Policy
• Modernization of NIH Clinical Trials
• White House Initiative (2013 “Holdren Memo”)
Guiding Principle of NIH GWAS Policy
The greatest public benefit will be realized if data from GWAS are made available, under terms and conditions consistent with the informed consent provided by individual participants, in a timely manner to the largest possible number of investigators.
NIH expectation that data would be shared in the NIH database of Genotype and Phenotype (dbGaP)
Data Access Requests Per Year, 2007–September 2015
[Chart: total and approved data access requests per year, 2007 to 2015; data labels shown: 32,962 and 21,973]
NIH Public Access Policy for Publications
• Ensures public access to published results of all research funded by NIH since 2008
• Recipients of NIH funds required to submit final peer-reviewed journal manuscripts to PubMed Central (PMC) upon acceptance for publication
• Papers must be accessible to the public on PMC no later than 12 months after publication
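The 12-month limit above can be sketched as a date calculation. This is only an illustration: the function names `add_months` and `pmc_deadline` are hypothetical, and treating the limit as exactly 12 calendar months from the publication date (clamped to the end of the month) is an assumption, not the policy's wording.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by whole calendar months, clamping the day."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    # Clamp to the last day of the target month (e.g. Feb 29 -> Feb 28).
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def pmc_deadline(published: date, embargo_months: int = 12) -> date:
    """Latest date by which the paper should be public on PMC (assumed rule)."""
    return add_months(published, embargo_months)


print(pmc_deadline(date(2017, 6, 8)))
```

A paper published on a leap day shows why the day is clamped: `pmc_deadline(date(2016, 2, 29))` falls on the last day of February 2017.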
Harnessing Data to Improve Health: BD2K (Big Data to Knowledge)
NIH’s 6-year initiative to use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability
Programs and activities:
• Advance discovery for biomedical research
• Facilitate use and re-use of biomedical data
• Develop analytical methods and software
• Enhance biomedical data science training
Will we do this research in a different way?
Will it become more like Airbnb?
Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
• 60 million users searching 2 million listings in 192 countries
• Average of 500,000 stays per night
• Valuation of US $25bn
Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818
Platforms – The situation today
Supplier / Consumer:
• Paper Author / Paper Reader
• Data Provider / Data Consumer
• Employer / Employee
• Reagent Provider / Reagent Consumer
• Software Provider / Software Consumer
• Grant Writer / Grant Reviewer
• Educator / Student
Platforms: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Commons Compliance
• Treat products of research – data, methods, papers etc. as digital objects
• These digital objects exist in a shared virtual space
• Digital object compliance through FAIR principles:
• Findable
• Accessible (and usable)
• Interoperable
• Reusable
https://commonfund.nih.gov/bd2k/commons
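One way to picture a research product as a digital object with FAIR-style checks is the sketch below. It is a loose illustration under stated assumptions: the `DigitalObject` class, its field names, and the `fair_gaps` helper are hypothetical and are not taken from the Commons, and the FAIR principles cover far more than one field per letter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """A research product (data, method, paper) as a digital object.

    All field names are hypothetical, chosen only for illustration.
    """
    title: str
    identifier: str = ""   # persistent identifier (Findable)
    access_url: str = ""   # where the object can be retrieved (Accessible)
    media_type: str = ""   # standard format, e.g. "text/csv" (Interoperable)
    license: str = ""      # reuse terms, e.g. "CC-BY-4.0" (Reusable)
    keywords: List[str] = field(default_factory=list)


def fair_gaps(obj: DigitalObject) -> List[str]:
    """Return the FAIR facets for which this object lacks basic metadata."""
    checks = {
        "Findable": bool(obj.identifier),
        "Accessible": bool(obj.access_url),
        "Interoperable": bool(obj.media_type),
        "Reusable": bool(obj.license),
    }
    return [name for name, ok in checks.items() if not ok]


# Placeholder values only; none refer to a real record.
example = DigitalObject(
    title="Example data set",
    identifier="example-id-0001",
    access_url="https://example.org/data/0001",
    media_type="text/csv",
)
print(fair_gaps(example))
```

Here the example object has no license recorded, so the only gap reported is the Reusable facet.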
We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
From the user's side:
1. User clicks on content
2. Metadata and webservices to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol. 2005 1(3) e34
Incentives
• Airbnb
• Monetize unutilized space
• Ease of use
• New vacation experience
• Commons
• Need to improve rigor and reproducibility
• Productivity
• Sustainability
• Education and training
• Opportunity to undertake elastic compute on large complex data
https://commonfund.nih.gov/bd2k/commons
NIH Genomic Data Sharing (GDS) Policy
• Purpose
• Sets forth expectations and responsibilities that ensure broad, responsible sharing of genomic research data in a timely manner
• Scope
• All NIH-funded research generating large-scale human or non-human genomic data, and their use for subsequent research
• Data to be submitted to NIH-designated data repositories (e.g., dbGaP, GEO, GenBank, WormBase, FlyBase, Rat Genome Database)
• Applies to all funding mechanisms (grants, contracts, intramural support) with no minimum threshold for cost
• Released August 2014; effective January 25, 2015
gds.nih.gov
Data Sharing Goes Global: GA4GH
Global Alliance for Genomics and Health
• Accelerating the potential of genomic medicine to advance human health, by:
• Establishing a common framework of approaches to enable effective, responsible sharing of genomic and clinical data
• Catalyzing data sharing projects that drive and demonstrate value of data sharing
• Alliance*: >350 leading institutions (healthcare, research, advocacy, life science, IT) representing 35 countries
• Working groups (Clinical, Data, Security, Regulatory & Ethics) assess, prioritize needs
• Form task teams to produce tools, solutions, demonstration projects
*Statistics as of October 5, 2015
Modernizing NIH Clinical Trials Activities: The Need
• NIH-funded trials published within 100 months of completion
• Less than 50% published within 30 months of completion
BMJ 2012;344:d7292