Open science and data sharing in practice
Justin Guinney, PhD
Director, Computational Oncology, Sage Bionetworks
Co-Director, DREAM Challenges
Open science and data sharing in practice
promote open systems, incentives, and norms to redefine how complex biological data is gathered, shared, and used
our research is built on three pillars
Open Science
Team Science
Participant-centered Science
we pilot approaches to create open systems, incentives, and norms
[Diagram: Pilot Systems and Approaches; Open Science, Team Science, Participant-centered Science]
we build infrastructure to provide robust, reusable solutions
[Diagram: Infrastructure; Pilot Systems and Approaches; Open Science, Team Science, Participant-centered Science]
we support research communities that operate under these principles
[Diagram: Infrastructure; Pilot Systems and Approaches; Open Science, Team Science, Participant-centered Science]
cancer communities: Cancer Systems Biology, Project GENIE, NTAP, CTF, BD2K, Neo-epitopes, DREAM Challenges, Colorectal cancer
[Diagram: Infrastructure; Pilot Systems and Approaches; Open Science, Team Science, Participant-centered Science; e-consent]
Data sharing: with whom?
Sharing with the research community.
Sharing with collaborators.
Sharing with oneself.
Barriers to sharing
Culture / reluctance to share / weak sharing policies
Disorganization and lack of mechanisms to facilitate sharing
Synapse: data management system
http://synapse.org
Synapse
Dashboarding for metadata
Access controls for sharing
Governance facilities and auditing
Docker store for methods and pipelines
Embedding of visualizations and tools
CTF & Sage
Building networks among CTF researchers
powered by
Building a network for the NF community
Data sharing vignettes
1. AACR Project GENIE
2. DREAM Challenges
GENIE: Motivation
GENIE Consortium
First Data Release
Released January 5, 2017
~19,000 samples
Includes genomic data plus Tier 1 clinical data: cancer type, primary vs. metastatic sample, gender, race, age at sequencing, etc.
Data is now available at:
Sage Synapse Platform: http://synapse.org/genie
cBioPortal for Cancer Genomics: http://www.cbioportal.org/genie/
Users are required to agree to terms of access at each site.
Multiple Gene Panels
GENIE Landscape
Three largest sample sets:
Non-Small Cell Lung Cancer
Breast Cancer
Colorectal Cancer
Landscape of Clinical Actionability
Long tail of Level 2B mutations, where the mutation is linked to a standard therapy in a different cancer type.
GENIE’s future
2nd release scheduled for end of 2017
Expect to double current database size: over 40k samples
More extensive clinical annotation, including patient outcomes, staging, and treatments
In the process of moving GENIE data to GDC
A crowdsourcing effort that poses quantitative challenges in biomedicine.
Our mission is
to contribute to the solution of important biomedical problems
to foster collaboration between research groups
to democratize data
to accelerate research
to objectively assess and benchmark algorithms
Over the last 10 years, we have run Challenges on:
Breast cancer prognosis
Prostate cancer prognosis
Somatic variant detection
Drug sensitivity prediction
Drug combination prediction
Drug toxicity prediction
ALS
Alzheimer’s
Many others…
Models of sharing
‘Data to model(ers)’
Goal: Predict overall survival in patients with metastatic castration-resistant prostate cancer
[Figure: training data and validation data; labels shown include Enthuse 33 (N=470) and Enthuse M1 (N=380)]
Guinney, et al., Lancet Oncology, 2017
How can we improve model reproducibility?
How can we improve utilization of restricted data?
Cheap and scalable data storage and computing
Virtualization and container technologies: platform-agnostic application and model portability
Hybrid: ‘Data to model(ers)’ and ‘Models to data’
Goal: Improve identification of “high-risk” patients with newly diagnosed multiple myeloma
[Figure labels: Public, Private]
‘Models to data’
Goal: Improve accuracy of digital mammography screening by classifying images as low or high risk for breast cancer
1 in 10 women are falsely diagnosed with breast cancer.
641k images
146k exams
87k women
But…
Images may not be accessed directly by participants.
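The ‘Models to data’ arrangement described here, where participants cannot see the images, can be sketched roughly as below. This is a minimal illustration only, not the Challenge's actual infrastructure: the `DataEnclave` class, the `threshold_model` function, the toy records, and the use of accuracy as the returned score are all hypothetical. The idea is that the data holder runs each submitted model on the restricted data and hands back only an aggregate score.

```python
from typing import Callable, Sequence, Tuple

Image = Sequence[float]         # hypothetical stand-in for a mammogram
Model = Callable[[Image], int]  # submitted model: image -> 0 (low risk) or 1 (high risk)


class DataEnclave:
    """Holds restricted data and runs submitted models against it.

    Callers never receive the images or labels, only an aggregate score.
    (Python enforces this by convention only; a real system would isolate
    the data, for example by running models inside containers.)
    """

    def __init__(self, records: Sequence[Tuple[Image, int]]):
        if not records:
            raise ValueError("at least one record is required")
        self._records = list(records)  # private: never returned to callers

    def evaluate(self, model: Model) -> float:
        """Run the model on every private record and return its accuracy."""
        correct = sum(1 for image, label in self._records if model(image) == label)
        return correct / len(self._records)


# Toy private data, invented for illustration only.
enclave = DataEnclave([
    ([0.9, 0.8], 1),
    ([0.1, 0.2], 0),
    ([0.7, 0.9], 1),
    ([0.3, 0.1], 0),
])


def threshold_model(image: Image) -> int:
    """Hypothetical submission: flag an image when its mean value exceeds 0.5."""
    return 1 if sum(image) / len(image) > 0.5 else 0


score = enclave.evaluate(threshold_model)  # the participant sees only this number
```

The participant supplies code and gets a score back; the records stay with the data holder throughout.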
[Figure labels: 641k, 10k, 600k]
Key statistics: DM Challenge
~ 1k participants
~ 10k model submissions
~ 1k TB (1 petabyte) data usage
~ 874k CPU-hours
Challenge summary
• Currently in 3rd round of leaderboard phase
• Validation phase begins in April
• Currently, top models are performing as well as a radiologist (sensitivity + specificity)
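For reference, sensitivity and specificity, the two measures named in the comparison with a radiologist, can be computed as in the sketch below. The labels and predictions are invented for illustration and are not Challenge data; the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = cancer, 0 = no cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy falsely flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predictions, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of true cancers a model flags; specificity is the share of cancer-free cases it correctly clears, so a falsely flagged case lowers specificity.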
Prostate Cancer, Drug Combination, Toxicogenomics
Multiple Myeloma, RNA fusion detection
Digital Mammography
Challenges
Data sharing: what can you do?
• Play an active role in setting data sharing policies.
• Set clear guidelines and expectations on what is meant by data sharing.
• Put in place mechanisms for oversight and enforcement of data sharing practices.
Thank you