Data Management, Open Access and Data Sharing Today and Beyond

NATIONAL INSTITUTES OF HEALTH: Data Management, Open Access and Data Sharing Today and Beyond. Presentation to the ASEE, March 7, 2017. Patricia Flatley Brennan, RN, PhD, Director, National Library of Medicine


NATIONAL INSTITUTES OF HEALTH: Data Management,

Open Access and Data Sharing

Today and Beyond

Presentation to the ASEE

March 7, 2017

Patricia Flatley Brennan, RN, PhD

Director

National Library of Medicine


DataScience@NIH

• Opportunities

• Challenges

• Directions

Discovery!


Reusable

Findable

Accessible

Interoperable

Data


FINDABLE

• Persistent identifier

• Metadata

• Curation

• Indexing

• Catalogs

• Source on the fly

ACCESSIBLE

• Security

• Authentication

• Authorization

• Metadata persists

• Public-private solutions

INTEROPERABLE

• Formal, accessible, shared and broadly applicable language

• Vocabularies follow FAIR principles

REUSABLE

• Meet domain-relevant community standards

• Provenance

• Sustainable

• Shift to Discovery
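
As a rough illustration of the four FAIR groups, the sketch below models a single dataset record in Python. The class name, the field names, and the example identifier are hypothetical and do not come from the talk; the sketch only shows how a persistent identifier, descriptive metadata, an access rule, a shared vocabulary, and provenance might sit together in one record.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical dataset record; all field names are illustrative assumptions."""
    persistent_id: str                                # Findable: persistent identifier
    title: str                                        # Findable: descriptive metadata
    keywords: list = field(default_factory=list)      # Findable: indexing terms
    requires_authorization: bool = False              # Accessible: access control flag
    vocabulary: str = ""                              # Interoperable: shared vocabulary name
    provenance: list = field(default_factory=list)    # Reusable: processing history


def missing_fair_fields(record):
    """Return the names of FAIR-related fields that are still empty."""
    checks = {
        "persistent_id": record.persistent_id,
        "title": record.title,
        "keywords": record.keywords,
        "vocabulary": record.vocabulary,
        "provenance": record.provenance,
    }
    return [name for name, value in checks.items() if not value]


record = DatasetRecord(
    persistent_id="doi:10.0000/example",  # made-up identifier, not a real DOI
    title="Example assay results",
    keywords=["assay"],
)
print(missing_fair_fields(record))
```

A check like this could run before a record is deposited, flagging gaps such as missing provenance while they are still cheap to fix.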


Computation

• Analytics

– Biostatistics

– Statistics

– Distributed analytics

– Machine learning

– Optimization

• Visualization

– Visual Analytics

– Depicting results

• Management

– Business process

– Preserving provenance of analytical strategies

– Maintaining version control


Infrastructure

• Commons

• Identity management and authentication

• Planning and forecasting tools

• Business analytics

Courtesy Warren Kibbe


National Library of Medicine (NLM)

• Mission: To acquire, organize, disseminate, and preserve the biomedical knowledge of the world for the benefit of the public’s health

• Open science is key to NLM’s mission of supporting scientific discovery

• NLM products highlighted:

– PubMed Central

– ClinicalTrials.gov

– PubChem


Our Future Builds on Our Past

• Data: 2016 - ∞

• Network: 1984-2015

• Digitize: ~1970-1983

• Index: 1838-~1960


Create mechanisms to:

– Determine high value data sets

– Forecast their costs

– Set preservation strategies

– Anticipate utilization

Assess Value

Mine high-value datasets


Create mechanisms to:

– Determine high value data sets

– Forecast their costs

– Set preservation strategies

– Anticipate utilization

Preserve with a purpose


Promote Standards


Tools for Discovery & Analysis


Promote Training


Engage with Others


PubMed

• 26.5 M records

• Only bibliographic records

MEDLINE

• ~5600 selected journal titles

• Only bibliographic records

• 93% of PubMed

National Library of Medicine (NLM) Collection

• ~17,000 serial titles

• All Serials – journals, annuals, statistics, etc.

• Discoverable in NLM Catalog & LocatorPlus

• NLM provides ILL and ensures archiving

PMC = PubMed Central Archive

• ~2,000 full participation journals

• 4 million full text articles

• ~1M federally funded public access articles

• Bibliographic records display in PubMed

2008: NIH Public Access Policy & PMC

International collaboration: Europe PMC & PMC Canada


Enhancing Access

• Research results more readily accessible to the public, providers, educators, and the scientific community

• Permanent preservation of research findings

• Common format used to store data from diverse sources

• Collections available for bulk download and text mining


PubMed Central®


ClinicalTrials.gov

• A registry and results database of publicly and privately supported clinical studies of human participants conducted around the world

• Sponsor or researcher registers study when it begins; updates information throughout; reports summary results upon completion

• Over 233,000 studies in all 50 States and 195 countries; nearly 24,000 studies with posted summary results


ClinicalTrials.gov: API & Data Use

• API allows third parties to submit queries for specified trials and display records: e.g., BreastCancerTrials.org, Foundation Fighting Blindness, Colon Cancer Alliance
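
To make the idea of a third-party query concrete, here is a minimal Python sketch that only builds a search URL and does not send it. The endpoint and the parameter names (`query.cond`, `pageSize`) are assumptions based on the public ClinicalTrials.gov v2 API, which postdates this 2017 talk; the slide does not describe the API's actual interface, and the helper name `build_trial_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public v2 API, which is newer than the talk itself.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_trial_query(condition, page_size=10):
    """Build (but do not send) a search URL for studies on a given condition."""
    params = {"query.cond": condition, "pageSize": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_trial_query("breast cancer", page_size=5)
print(url)
```

A site such as those named on the slide would fetch a URL like this and render the returned records for its own audience.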


PubChem

• Comprehensive source of chemical structures of small organic molecules and their biological activities

• Organized as three linked databases: PubChem Substance, PubChem Compound, PubChem BioAssay

• Records link to other databases, including scientific literature in PubMed
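
As one possible way to look up a record in the PubChem Compound database, the Python sketch below builds a request URL in the style of PubChem's PUG REST interface. The talk does not mention PUG REST, so its use here is an assumption; the helper name `compound_property_url` is hypothetical, and the URL is only constructed, not fetched.

```python
from urllib.parse import quote

# Base path of PubChem's PUG REST service (not mentioned in the talk).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties):
    """Build a URL requesting selected properties of a compound, by name, as JSON."""
    prop_list = ",".join(properties)
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{prop_list}/JSON"


url = compound_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```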


• Provides guidance on citing the original generators of the data sets

– Corresponding scientists get credit

– Readers can locate data source

• Includes approximately 93 million compounds, 225 million substances


NLM & the Future of Data Science

• Evolving data storage, communications, and computer security technologies

• Methods for generation, formalization, management, and sharing of knowledge resources

• Training for data scientists, data-informed investigators, data librarians

• Partnership with other NIH components and agencies promoting best practices for data storage, access, discovery and analysis.

• Strategic planning underway


DataScience@NIH pivots to NLM

• Ensure integration of lessons learned from BD2K and Cloud Pilot

• Create mechanisms to determine high value data sets, locate them, forecast their cost and utilization

• Implement efficient, secure preservation strategies that facilitate access and reuse

• Re-engage and stimulate intramural and extramural efforts in standards

• Develop new methods for data management and data-driven discovery

• Grow a talented workforce

• Foster open science

• Engage with government, national, and global collaboratives


Reaching me

http://nlmdirector.nlm.nih.gov

[email protected]

@NLMdirector

emey87 / Icon Archive / CC BY-NC-ND-4.0