Big Data in Biomedicine – An NIH Perspective
-
Upload
philip-bourne -
Category
Education
-
view
914 -
download
1
Transcript of Big Data in Biomedicine – An NIH Perspective
![Page 1: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/1.jpg)
Big Data in Biomedicine – An NIH Perspective
Philip E. Bourne Ph.D., FACMIAssociate Director for Data Science
National Institutes of [email protected]
IEEE BIBM Nov 10 2015, Washington DC
http://www.slideshare.net/pebourne
![Page 2: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/2.jpg)
Perspective
Structural bioinformatics researcher
Former custodian of the PDB
Obsessive about open science e.g., PLOS
NIH-wide responsibility for developments in data science – responding to the disruption
![Page 3: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/3.jpg)
Bioinformatics 2015 31(1):146-50
![Page 4: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/4.jpg)
Big Data in Biomedicine…
This speaks to something more fundamental that more data …
It speaks to new methodologies, new skills, new emphasis, new cultures,
new modes of discovery …
![Page 5: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/5.jpg)
Consider this change from my own career experience ….
![Page 6: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/6.jpg)
The History of Computational Biomedicine According to Bourne
1980s 1990s 2000s 2010s 2020
Discipline:
Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver
The Raw Material:
Non-existent Limited /Poor More/Ontologies Big Data/Siloed Open/Integrated
The People:
No name Technicians Industry recognition data scientists Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
![Page 7: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/7.jpg)
Premise:
We are entering a period of disruption in biomedical research and we should all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
![Page 8: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/8.jpg)
We are at a Point of Deception …
Evidence:– Google car– 3D printers– Waze– Robotics– Sensors
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
![Page 9: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/9.jpg)
Disruption: Example - Photography
DigitizationDeception
Disruption
Demonetization
Dematerialization
Democratization
Time
Vol
ume,
Vel
ocity
, Var
iety
Digital camera invented byKodak but shelved
Megapixels & quality improve slowly; Kodak slow to react
Film market collapses;Kodak goes bankrupt
Phones replacecameras
Instagram,Flickr become thevalue proposition
Digital media becomes bona fide form of communication
![Page 10: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/10.jpg)
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHR’s
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
![Page 11: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/11.jpg)
Disruptive Features: Sustainability
Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
![Page 12: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/12.jpg)
Disruptive Features:Reproducibility
Changing Value of Scholarship (?)
![Page 13: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/13.jpg)
“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”
President Barack ObamaJanuary 30, 2015
Disruptive Features – New Science
![Page 14: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/14.jpg)
An Example of That Promise:Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al 2014 Nat Comm 5:4022
![Page 15: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/15.jpg)
What Are Some General Implications of Such a Future?
Open collaborative science becomes of increasing importance
The value of data and associated analytics becomes of increasing value to scholarship
Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
Cooperation between funders will be needed to sustain the emergent digital enterprise
Current training content and modalities will not match supply to demand
Balancing accessibility vs security becomes more important yet more complex
![Page 16: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/16.jpg)
How Should We Respond?
Funders: Encourage change and facilitate an orderly transition
Academic Leaders: Respond and facilitate a cultural shift
Developers: Develop working environments that are more adaptive and capable of answers questions in a more efficient and hopefully accurate way
Users: Use the above environments Publishers: Move beyond papers
![Page 17: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/17.jpg)
Take an Example That is Central to What We Do
Molecular Graphics
Is It Optimal for Today’s Science?
http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg
![Page 18: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/18.jpg)
Good News/Bad News
Good News:– It is harder to think of a
more powerful way to comprehend complex data
– It has excited generations to the promise of science
– It has adapted to changing technologies
Bad News:– It is not an
adaptive/extensible environment
– It is not a collaborative environment
– It is not an integrative environment
– It is the curse of the ribbon
BMC Bioinformatics 2005, 6:21
![Page 19: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/19.jpg)
1. A link brings up figures from the paper
0. Full text of PLoS papers stored in a database
2. Clicking the paper figure retrievesdata from the PDB which is
analyzed
3. A composite view ofjournal and database
content results
Is a database really different than a biological journal?
PloS Comp Biol 2005 1(3) e34
4. The composite view haslinks to pertinent blocks
of literature text and back to the PDB
1.
2.
3.
4.
The Knowledge and Data Cycle
![Page 20: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/20.jpg)
Take Another Example:The Raw Material of Structural
Bioinformatics
Is this the optimal starting point anymore?
![Page 21: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/21.jpg)
Do data resources including the PDB best serve the needs of the user at
this point?
![Page 22: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/22.jpg)
Good News/Bad News for the PDB in this Changing Landscape
Bad News:
– Interface complex and uni-data oriented
– Data accessible; methods accessible (sort of); but not together
– Significant redundancy in services offered
Good News:
– Annotation!
– Demand is increasing
– Integrated with other data types
– Restful services
![Page 23: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/23.jpg)
General Problem Statement:
How to insure a high quality annotated data source that provides
the optimal environment for accessibility, integration and analysis
by a broad community of diverse users?
![Page 24: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/24.jpg)
The Commons is a shared virtual space which is FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of shared digital research objects
The CommonsConcept
![Page 25: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/25.jpg)
The Developer or User Defines the Environment from the Appropriate
Building Blocks
![Page 26: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/26.jpg)
Public Beacons
Host Content
AMPLab 1000 Genomes Project
Broad Institute ExAC
Curoverse PGP, GA4GH Example Data
EBI 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB Known VARiants
NCBI NHLBI Exome Sequence Project
OICR 55 cancer datasets
SolveBio 56 public datasets
UCSC ClinVar, LOVD, UniProt
University of Leicester Cafe CardioKit, Cafe Variome Central
WTSI IBD, Native American, Egyptian, UK10K
Over ?? public datasets beaconized across 21 institutions
10s thousands of individuals
![Page 27: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/27.jpg)
![Page 28: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/28.jpg)
The CommonsComponents
Computing environment – cloud or HPC (High Performance Computing)
– supports access, utilization, sharing and storage of digital objects.
Methods for Interoperability– enables connectivity, shareability and interoperability
between digital objects.
Digital object compliance model – describes the properties of digital objects that
enables them to be discoverable and shareable.
![Page 29: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/29.jpg)
The CommonsComponents
![Page 30: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/30.jpg)
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
DDICC
Software
Standards
Infrastructure - The Commons
Labs
Labs
Labs
Labs
![Page 31: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/31.jpg)
The ability to store and share and compute on digital research objects Especially useful for large data sets that
are not easily computed locally
Scalable and Elastic
Pay per use - Cost effective
An environment that fosters collaboration
The CommonsComputing Environment: Cloud
![Page 32: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/32.jpg)
Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data Commons
![Page 33: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/33.jpg)
The PDB in the Commons
Components:– Annotated collection of data files– API’s to access these data files– Example methods using these APIs
Potential outcomes– Nothing happens?– A new breed of developer starts to use PDB data in new
ways ?– The casual user has a broader set of services that
previously?– Quality declines/increases?
![Page 34: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/34.jpg)
I not only use all the brains I have, but all I can borrow.
– Woodrow Wilson
![Page 35: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/35.jpg)
ADDS Team
BD2K Representatives
![Page 36: Big Data in Biomedicine – An NIH Perspective](https://reader036.fdocuments.net/reader036/viewer/2022062522/58850e831a28abd05e8b4e4b/html5/thumbnails/36.jpg)
NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health
[email protected]://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/