If Data Are The New Oil, How Do We Prevent Global Warming?

If Data Are The New Oil, How Do We Prevent Global Warming? Philip E. Bourne, PhD, FACMI, The National Institutes of Health. http://www.slideshare.net/pebourne [email protected] University of Cincinnati Data Day 2017, March 23, 2017

Transcript of If Data Are The New Oil, How Do We Prevent Global Warming?

Page 1: If Data Are The New Oil, How Do We Prevent Global Warming?

If Data Are The New Oil, How Do We Prevent Global Warming?

Philip E. Bourne, PhD, FACMI

The National Institutes of Health

http://www.slideshare.net/pebourne [email protected]

University of Cincinnati Data Day 2017

March 23, 2017

Page 2: If Data Are The New Oil, How Do We Prevent Global Warming?

Who am I representing and what is my bias?

• I am presenting my views, not necessarily those of NIH

• Total data parasite

• Unnatural interest in scholarly communication

• Co-founded and founding EIC PLOS Computational Biology – OA advocate

• Prior co-Director Protein Data Bank

• Amateur student researcher in scholarly communication

Page 3: If Data Are The New Oil, How Do We Prevent Global Warming?

I appreciate this is a day to focus on data, but ...

I don’t think you can consider data in isolation from the analytics associated with that data and indeed the knowledge derived from both.

Page 4: If Data Are The New Oil, How Do We Prevent Global Warming?

The Knowledge versus Data Landscape

• Knowledge

  • Largely a for-profit business with limited input into that business from the producers of scholarship

  • Some open access (OA), costs shifted from consumer to producer

  • Full accessibility for non-OA is constrained/controlled

  • Funders able to influence the landscape, e.g. PubMed Central

  • Sustainable!

  • An analog system functioning in a digital world – aka not born digital

• Data

  • Largely left to governments to support

  • Mostly OA

  • Funders control the landscape

  • Not sustainable

  • Mostly born digital

Page 5: If Data Are The New Oil, How Do We Prevent Global Warming?

Some Shared Issues …

• Reproducibility

• Comprehension / communication

• Quality

Page 6: If Data Are The New Oil, How Do We Prevent Global Warming?

Reproducibility Examples From My Own Work

It took several months to replicate this work

… And just last week…

Phew…

http://www.sdsc.edu/pb/kinases/

Page 7: If Data Are The New Oil, How Do We Prevent Global Warming?

Tools Fix This Problem, Right?

• Extracted all PMC papers with associated Jupyter notebooks available

• Approx. 100

• Took a random sample of 25

• Only 1 ran out of the box

• Several ran with minor modification

• Others lacked libraries, sufficient details to run, etc.

It takes more than tools ... It takes incentives …

Daniel Mietchen, 2017, personal communication
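The audit described on this slide (draw a random sample from the notebook corpus, try each notebook, tally what happened) could be sketched roughly as below. This is only an illustration and not the procedure actually used: the function names, the outcome labels, and the stand-in `fake_try_run` are all hypothetical, and a real check would execute each notebook in a clean environment.

```python
import random
from collections import Counter
from typing import Callable, Iterable


def audit_sample(notebook_ids: Iterable[str],
                 try_run: Callable[[str], str],
                 sample_size: int = 25,
                 seed: int = 0) -> Counter:
    """Draw a random sample of notebooks and tally the outcome of each attempt.

    `try_run` is supplied by the caller and returns an outcome label
    (the labels themselves are an assumption of this sketch).
    """
    ids = list(notebook_ids)
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return Counter(try_run(nb) for nb in sample)


def fake_try_run(notebook_id: str) -> str:
    """Hypothetical stand-in for executing a notebook; not a real runner."""
    return "ran_unmodified" if notebook_id.endswith("0") else "failed"


if __name__ == "__main__":
    # Placeholder identifiers, not real PMC notebooks.
    corpus = [f"notebook-{i:03d}" for i in range(100)]
    tally = audit_sample(corpus, fake_try_run)
    print(sum(tally.values()), dict(tally))
```

Fixing the seed makes the sample itself reproducible, which seems in keeping with the point of the slide: the tooling is the easy part, and recording enough detail for someone else to rerun the check is what takes discipline.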

Page 8: If Data Are The New Oil, How Do We Prevent Global Warming?

One Hypothetical End Point

0. Full text of paper stored in a database – one view

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

• Paper is one attributable view of the knowledge

• User clicks on a static image

• Metadata and data provide direct further analysis - an executable paper

• Private and public annotations revealed

• Selecting a feature forms a query for yet further knowledge

• That knowledge rendered as a knowledge graph rather than a paper

PLoS Comp. Biol. 2005 1(3) e34

Page 9: If Data Are The New Oil, How Do We Prevent Global Warming?

So how do we get there? Well first ...

Page 10: If Data Are The New Oil, How Do We Prevent Global Warming?

On November 6, 2012, Donald Trump tweeted: "The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive."

Source: Washington Post

We Need Relationships Built on Trust

Page 11: If Data Are The New Oil, How Do We Prevent Global Warming?

Trust Becomes Even More Important as We Move to Platforms

Sangeet Paul Choudary https://www.slideshare.net/sanguit

Page 12: If Data Are The New Oil, How Do We Prevent Global Warming?

The Research Pipeline

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Page 13: If Data Are The New Oil, How Do We Prevent Global Warming?

Tools and Resources Will Continue To Be Developed

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Authoring Tools

Lab Notebooks

Data Capture

Software

Analysis Tools

Visualization

Scholarly Communication

Page 14: If Data Are The New Oil, How Do We Prevent Global Warming?

And Become More Interconnected

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Authoring Tools

Lab Notebooks

Data Capture

Software

Analysis Tools

Visualization

Scholarly Communication

Page 15: If Data Are The New Oil, How Do We Prevent Global Warming?

Until We Become a Platform

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Authoring Tools

Lab Notebooks

Data Capture

Software

Analysis Tools

Visualization

Scholarly Communication

Commercial & Public Tools

Git-like Resources By Discipline

Data Journals

Discipline-Based Metadata Standards

Community Portals

Institutional Repositories

New Reward Systems

Commercial Repositories

Training

Page 16: If Data Are The New Oil, How Do We Prevent Global Warming?

Consider an example of an existing platform….

Page 17: If Data Are The New Oil, How Do We Prevent Global Warming?

• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)

• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder

• It seems to be working:

  • 60 million users searching 2 million listings in 192 countries

  • Average of 500,000 stays per night

  • Valuation of US $25bn

Bonazzi & Bourne 2017, PLOS Biology, In Press

Page 18: If Data Are The New Oil, How Do We Prevent Global Warming?

Platforms are Certainly Not Without Issues

Page 19: If Data Are The New Oil, How Do We Prevent Global Warming?

Nevertheless, it would seem we need to move in this direction if we are to solve the many issues swirling around scholarly communication …

Page 20: If Data Are The New Oil, How Do We Prevent Global Warming?

Is not biomedical research the same?

Page 21: If Data Are The New Oil, How Do We Prevent Global Warming?

Why a comparison to Airbnb is not fair

• Airbnb was born digital

• The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research

Nevertheless, there is much to be learnt

Page 22: If Data Are The New Oil, How Do We Prevent Global Warming?

Platforms - The Situation Today

Scholarly Workflow (Supplier / Consumer):

• Paper Author / Paper Reader

• Data Provider / Data Consumer

• Employer / Employee

• Reagent Provider / Reagent Consumer

• Software Provider / Software Consumer

• Grant Writer / Grant Reviewer

• Educator / Student

Platform: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio

Page 23: If Data Are The New Oil, How Do We Prevent Global Warming?

In summary, there is not currently a widely adopted single platform for the exchange of services in biomedical research. Either there is a platform per service or no platform at all. Why have we not done better and what are the impediments today?

Page 24: If Data Are The New Oil, How Do We Prevent Global Warming?

Impediments to a biomedical platform

• Current work practices by all stakeholders

• Entrenched business models

• Size of the undertaking aka resources needed

• Trust

• Incentives to use the platform

http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133

Page 25: If Data Are The New Oil, How Do We Prevent Global Warming?

The NIH, through the Big Data to Knowledge (BD2K) initiative, and others are experimenting with a platform, keeping in mind the need to overcome these impediments

Enter The Commons

https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg

Page 26: If Data Are The New Oil, How Do We Prevent Global Warming?

Commons – Initial focus is on integrating two layers of the scholarly workflow

Scholarly Workflow (Supplier / Consumer):

• Paper Author / Paper Reader

• Data Provider / Data Consumer

• Employer / Employee

• Reagent Provider / Reagent Consumer

• Software Provider / Software Consumer

• Grant Writer / Grant Reviewer

• Educator / Student

Platform: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio

Page 27: If Data Are The New Oil, How Do We Prevent Global Warming?

Commons Topology

• Compute Platform: Cloud or HPC

• Services: APIs, Containers, Indexing

• Software: Services & Tools (scientific analysis tools/workflows)

• Data: “Reference” Data Sets, User defined data

• Digital Object Compliance

• App store/User Interface

PaaS / SaaS / IaaS

https://datascience.nih.gov/commons

Page 28: If Data Are The New Oil, How Do We Prevent Global Warming?

“I really admire Airbnb as a pioneer of the sharing economy and for building community. They've found an elegant way to help hosts make more money and for guests to have authentic experiences. It brings those people together in a unique way.”

Logan Green

Page 29: If Data Are The New Oil, How Do We Prevent Global Warming?

“The Commons is one effort at creating a sharing economy and for building community. We hope for a more cost effective and productive research environment while bringing people together in a unique way.”

Phil Bourne

Page 30: If Data Are The New Oil, How Do We Prevent Global Warming?

Acknowledgements

• Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)

• NLM/NCBI: Mike Huerta, George Komatsoulis

• NHGRI: Valentina di Francesco

• NIGMS: Susan Gregurick

• CIT: Debbie Sinmao, Andrea Norris

• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr

• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen

• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)

• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)

• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke

Bonazzi & Bourne 2017, PLOS Biology, In Press