If Data Are The New Oil, How Do We Prevent Global Warming?
Philip E. Bourne, PhD, FACMI
The National Institutes of Health
http://www.slideshare.net/[email protected]
University of Cincinnati Data Day 2017, March 23, 2017
Who am I representing and what is my bias?
• I am presenting my views, not necessarily those of NIH
• Total data parasite
• Unnatural interest in scholarly communication
• Co-founder and founding EIC of PLOS Computational Biology; OA advocate
• Prior co-Director of the Protein Data Bank
• Amateur student researcher in scholarly communication
I appreciate that this is a day to focus on data, but...
I don’t think you can consider data in isolation from the analytics associated with that data and indeed the knowledge derived from both.
The Knowledge versus Data Landscape
• Knowledge
  • Largely a for-profit business with limited input into that business from the producers of scholarship
  • Some open access (OA), with costs shifted from consumer to producer
  • Full accessibility for non-OA content is constrained/controlled
  • Funders able to influence the landscape, e.g., PubMed Central
  • Sustainable!
  • An analog system functioning in a digital world, i.e., not born digital
• Data
  • Largely left to governments to support
  • Mostly OA
  • Funders control the landscape
  • Not sustainable
  • Mostly born digital
Some Shared Issues…
• Reproducibility
• Comprehension / communication
• Quality
Reproducibility Examples From My Own Work
It took several months to replicate this work
… And just last week…
Phew…
http://www.sdsc.edu/pb/kinases/
Tools Fix This Problem, Right?
• Extracted all PMC papers with associated Jupyter notebooks available (approx. 100)
• Took a random sample of 25
• Only 1 ran out of the box
• Several ran with minor modification
• Others lacked libraries, sufficient details to run, etc.
It takes more than tools... It takes incentives...
Daniel Mietchen 2017 Personal Communication
One Hypothetical End Point
0. Full text of paper stored in a database: one view
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which are then analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
• Paper is one attributable view of the knowledge
• User clicks on a static image
• Metadata and data enable direct further analysis: an executable paper
• Private and public annotations revealed
• Selecting a feature forms a query for yet further knowledge
• That knowledge rendered as a knowledge graph rather than a paper
PLoS Comp. Biol. 2005 1(3) e34
So how do we get there? Well, first...
On November 6, 2012, Donald Trump tweeted: "The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive." (Source: Washington Post)
We Need Relationships Built on Trust
Trust Becomes Even More Important as We Move to Platforms
Sangeet Paul Choudary https://www.slideshare.net/sanguit
The Research Pipeline
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
Tools and Resources Will Continue To Be Developed
Authoring Tools
Lab Notebooks
Data Capture
Software
Analysis Tools
Visualization
Scholarly Communication
2014 SPARC Annual Meeting 14
And Become More Interconnected
Until We Become a Platform
Commercial & Public Tools
Git-like Resources By Discipline
Data Journals
Discipline-Based Metadata Standards
Community Portals
Institutional Repositories
New Reward Systems
Commercial Repositories
Training
Consider an example of an existing platform...
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
  • 60 million users searching 2 million listings in 192 countries
  • Average of 500,000 stays per night
  • Valuation of US $25bn
Bonazzi & Bourne 2017, PLOS Biology, In Press
Platforms are Certainly Not Without Issues
Nevertheless, it would seem we need to move in this direction if we are to solve the many issues swirling around scholarly communication...
Is not biomedical research the same?
Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research
Nevertheless, there is much to be learnt
Scholarly Workflow Platform (Supplier / Consumer)
Paper Author / Paper Reader
Data Provider / Data Consumer
Employer / Employee
Reagent Provider / Reagent Consumer
Software Provider / Software Consumer
Grant Writer / Grant Reviewer
Educator / Student
Examples of existing platforms: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Platforms - The Situation Today
In summary, no single platform for the exchange of services in biomedical research is currently widely adopted. Either there is a platform per service or no platform at all. Why have we not done better, and what are the impediments today?
Impediments to a biomedical platform
• Current work practices of all stakeholders
• Entrenched business models
• Size of the undertaking, i.e., the resources needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
The NIH, through the Big Data to Knowledge (BD2K) initiative and other efforts, is experimenting with a platform, keeping in mind the need to overcome these impediments
Enter The Commons
https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
Commons – Initial focus is on integrating two layers of the scholarly workflow
Commons Topology
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing
Software: Services & Tools; scientific analysis tools/workflows
Data: “Reference” Data Sets; user-defined data
Digital Object Compliance
App store/User Interface
Layers: PaaS, SaaS, IaaS
https://datascience.nih.gov/commons
“I really admire Airbnb as a pioneer of the sharing economy and for building community. They've found an elegant way to help hosts make more money and for guests to have authentic experiences. It brings those people together in a unique way.”
Logan Green
“The Commons is one effort at creating a sharing economy and building community. We hope for a more cost-effective and productive research environment while bringing people together in a unique way.”
Phil Bourne
Acknowledgements
• Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NLM/NCBI: Mike Huerta, George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Debbie Sinmao, Andrea Norris
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
Bonazzi & Bourne 2017, PLOS Biology, In Press