High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
![Page 1: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/1.jpg)
High quality data publications:
drives and needs
Susanna-Assunta Sansone, PhD
@biosharing @isatools @scientificdata
B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 12 Nov, 2014
Data Consultant, Honorary Academic Editor
Associate Director, Principal Investigator
![Page 2: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/2.jpg)
Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
![Page 3: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/3.jpg)
• Over 50% of completed studies in biomedicine do not appear in the published literature
• Often because results do not conform to author's hypotheses
“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”
Plagued by selective reporting of data and methods
![Page 4: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/4.jpg)
• Big science efforts
  o data is often better organized, reported and shared
• Small independent efforts, yielding a rich variety of specialty data sets
  o Most of these data (such as null findings) are unpublished
  o These dark data hold a potential wealth of knowledge
Incentivizing individual contributors to share data
![Page 5: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/5.jpg)
From made reproducible to born reproducible
“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”
![Page 6: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/6.jpg)
http://bd2k.nih.gov/workshops.html#ADDS
Worldwide movement for FAIR data
![Page 7: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/7.jpg)
Because of the importance of formal publications in the academic incentive structure
Publishers occupy a leverage point
![Page 8: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/8.jpg)
Serve as the implementation and/or enforcement arm at the point of publication
Role of publishers as “agents of change”
![Page 9: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/9.jpg)
Credit to: Iain Hrynaszkiewicz
2013
![Page 10: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/10.jpg)
Wang et al, Nature, 2013 doi:10.1038/nature12730
Data/reproducibility at NPG
• Figure source data
  o putting data behind figures/graphs
![Page 11: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/11.jpg)
Data/reproducibility at NPG
• Figure source data
  o putting data behind figures/graphs
• Data citation
  o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
• Code reproducibility
  o peer review, availability and reuse
• NPG’s Linked Data release – CC0
• A new data journal
![Page 12: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/12.jpg)
Role of data papers and data journals
• Incentive, credit for sharing
• Peer review focus
• Value of data vs. analysis
• Discoverability and reusability
![Page 13: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/13.jpg)
Market research (2011)
• What do researchers want from a data publication?
  o 96% - increased visibility and discovery
  o 95% - increased usability of their research data
  o 93% - credit mechanism for deposit of data
  o 80% - peer review of content/datasets
Respondent characteristics
• 387 respondents (329 active researchers)
  o Physics (24%)
  o Earth and environmental science (21%)
  o Biology (20%)
  o Chemistry (19%)
  o Others (16%)
![Page 14: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/14.jpg)
Helping you publish, discover and reuse research data
Credit for sharing your data
Focused on reuse and reproducibility
Peer reviewed, curated
Promoting community data and code repositories
Open Access
• Currently covering life, natural and environmental sciences
• Big and small data
  o the power of small data is in their aggregation and integration with other datasets
• New and previously published individual datasets, curated collections and citizen science
  o a fuller, more in-depth look at the data processing steps, additional data files, codes etc
  o tutorial-like information for scientists interested in reusing or integrating the data with their own
![Page 15: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/15.jpg)
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
How can the data be used or reused?
Introducing a new content type: Data Descriptor
Designed to make data more discoverable,
interpretable and reusable
![Page 16: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/16.jpg)
Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
How can the data be used or reused?
Relation with traditional article - content
![Page 17: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/17.jpg)
AFTER: expand on your research articles, adding further information for reuse of the data
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
OR BEFORE
Relation with traditional article - time
Publish Data
![Page 18: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/18.jpg)
Code in GitHub
Data in OpenfMRI
Share your data, get credited and cited
![Page 19: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/19.jpg)
Experimental metadata or structured component (in-house curated, machine-readable formats)
Article or narrative component (PDF and HTML)
Data Descriptor: narrative and structure
![Page 20: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/20.jpg)
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
Data Descriptor: narrative
![Page 21: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/21.jpg)
In traditional publications this information is not provided in a sufficiently detailed manner
However this information is essential for understanding, reusing, and reproducing datasets
Data Descriptor: narrative
![Page 22: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/22.jpg)
In-house editorial curator:
• helps users submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification:
analysis, method, script
Data file or record in a database
Data Descriptor: structure (CC0)
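A minimal sketch of what a machine-readable "structured component" could look like, assuming a simple tab-delimited table that links each sample to a method, an analysis script and a data record. The class name `AssayRow`, its field names and the example accessions are hypothetical illustrations; they are not the column headers defined by ISA-Tab or by the journal's technical specification.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AssayRow:
    """One row linking a sample to method, script and data record.

    All field names are hypothetical, chosen only for illustration.
    """
    sample_name: str
    method: str
    analysis_script: str
    data_record: str  # a data file name or a record identifier in a database


def to_tab_delimited(rows):
    """Serialise the rows as tab-delimited text with one header line."""
    buffer = io.StringIO()
    fieldnames = [f.name for f in fields(AssayRow)]
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical example records; the accessions are placeholders.
rows = [
    AssayRow("subject-01", "fMRI acquisition", "preprocess.py", "example-accession-001"),
    AssayRow("subject-02", "fMRI acquisition", "preprocess.py", "example-accession-002"),
]
table = to_tab_delimited(rows)
print(table)
```

Keeping the links between sample, method, script and data record in a table like this, rather than only in the narrative, is what lets software (and not just readers) find which file came from which step.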
![Page 23: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/23.jpg)
Research papers
Data records
Data Descriptors
We currently recognize over 60 public data repositories
Adding value to research articles and data records
![Page 24: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/24.jpg)
Citation of and link to data files and databases
![Page 25: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/25.jpg)
Evaluation is not based on the perceived impact or novelty of the findings or size of the data

• Experimental rigour and technical data quality
  o Methodologically sound
  o Technical validation experiments and statistical analyses
  o Depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
  o Sufficient details to allow others to reproduce the results, or to reuse the data or integrate it with other data
  o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
  o Data files match the descriptions in the Data Descriptor
  o Deposited in the most appropriate available databases
Peer review process focused on quality and reuse
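One way the "data files match the descriptions" criterion could be checked mechanically is with checksums. The slides do not say how the journal performs this check, so the use of SHA-256 digests, the function names and the file names below are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_data_records(declared, root):
    """Compare declared {file name: digest} pairs against files under root.

    Returns {file name: "ok" | "missing" | "mismatch"}.
    """
    report = {}
    for name, expected in declared.items():
        path = Path(root) / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) != expected:
            report[name] = "mismatch"
        else:
            report[name] = "ok"
    return report


# Hypothetical deposit: one file is present, one declared file is absent.
content = b"sample,value\ns1,0.5\n"
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "measurements.csv").write_bytes(content)
    declared = {
        "measurements.csv": hashlib.sha256(content).hexdigest(),
        "images.zip": "0" * 64,  # placeholder digest for a file that was never deposited
    }
    report = check_data_records(declared, tmp)
print(report)
```

A check like this only covers file integrity; whether the methods and technical validation are sound still needs a human reviewer.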
![Page 26: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/26.jpg)
~ 156
~ 70
~ 334
Source: BioPortal
Databases implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Progressively refine guidance to authors and reviewers
![Page 27: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/27.jpg)
Mapping the landscape of standards and databases
![Page 28: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/28.jpg)
PI: Lucila Ohno-Machado, UCSD
biocaddie.org
![Page 29: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/29.jpg)
PI: Mark Musen, Stanford
metadatacenter.org
![Page 30: High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014](https://reader033.fdocuments.net/reader033/viewer/2022052908/55943de31a28abe95b8b4672/html5/thumbnails/30.jpg)
Acknowledgements
Visit nature.com/scientificdata
Email [email protected]
Tweet @ScientificData
Honorary Academic Editor: Susanna-Assunta Sansone, PhD
Managing Editor: Andrew L Hufton, PhD
Editorial Curator: Varsha Khodiyar
Publisher: Iain Hrynaszkiewicz
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
and our Advisory Boards and Collaborators
Philippe Rocca-Serra, PhD
Alejandra Gonzalez-Beltran, PhD
Eamonn Maguire
Milo Thurston, PhD
Funds: