Publishing Source Data - European Research Council Bernd Pulverer.pdf · July 28, 2014 RE:...
Publishing Source Data
Finding and Accessing the Data Behind Figures
Bernd Pulverer
Head | Scientific Publications
Chief Editor | The EMBO Journal
Three principles of research
• Sharing
research progresses as a collaborative enterprise - often in small steps
• Self-governance
peer-review, committees, boards
• Ethics
[Cycle diagram: discover > validate > revise > share]
Data published in papers represents a small fraction of all useful data generated in labs: peer review; stable; citable; usually unstructured
Data deposited in databases represents a bigger fraction of the data: usually curation; (stable); citable; structured
Data deposited in repositories may capture a large fraction of the data: some curation; often unstructured >> validation, citability, stability
>20k journals 1.5 million papers/year
>5% annual growth
Data deposition in databases/repositories
• papers
• databases: structures; sequences; functional genomics; proteomics; genotype/phenotype
• computational models and “orphan” data: papers; authors’ website; institutional repositories
Open Science! …but
not all data is useful
flawed data
unreproducible data
raw data
unstructured data
>> validation, curation
Data published in papers is validated by peer review ….but
• negative, confirmatory and refuting data is largely ignored
• not all published data is high quality
• not all published data is reproducible
Amgen
Bayer Healthcare
Unreproducible data (?)
‘biologists fail to design experiments properly, and so submit underpowered studies that have an insufficient sample size and trumpet chance observations as biological effects…. Researchers …must agree on standards that will protect against avoidable errors.’
NATURE Error prone Nature 487, 406 (2012) doi:10.1038/487406a
‘Scientists and journals must work together to ensure that eye-catching artefacts are not trumpeted as genomic insights’ ‘hunting for biological surprises without due caution can easily yield a rich crop of biases and experimental artefacts, and lead to high-impact papers built on nothing more than systematic experimental 'noise'.’
NATURE Methods: Face up to false positives Daniel MacArthur Nature 487, 427–428 (2012) doi:10.1038/487427a
Nature Reviews Neuroscience: Power failure: why small sample size undermines the reliability of neuroscience. Button, Ioannidis, Mokrysz, Nosek, Flint, Robinson & Munafò. Nature Reviews Neuroscience 14, 365-376 (2013) doi:10.1038/nrn3475
‘the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful.’
Many Journals have explicit materials and data sharing guidelines …which are often not policed effectively
It is not always easy to share all data: human subjects; pharma, biotech
Data (Code) sharing may expose flaws; aids competitors
The scientific paper will remain a key mode of exchanging filtered scientific information • browsing • ‘academic currency’
Research Papers rarely contain data that can be mined, extracted, reanalyzed and reused [not just a question of licensing] The data in papers are not discoverable [not just a question of OA]
Scientific Journals are good at validation and in mandating standards & sharing [but need to get better]
Journals are not good at access and publishing of structured data • human vs. machine readable • interlinking papers, databases and data repositories
Quality assurance Reproducibility Accessibility Discoverability
EXPANDED VIEW
• Files that are not rendered in HTML are linked as downloads • Links to datasets • Data citations
Data Transparency
Published data should be accessible, reproducible and re-usable for research by others
‘The two vital components of the scientific endeavor – the idea and the evidence – are too frequently separated’
Science as an open enterprise, Royal Society, 2012
A paper
Graphs Gels Schemata Micrographs
What is a figure?
A scientific result converted into a collection of pixels
Add Source Data to Papers: Gels, Blots, micrographs & Graphs a lab book
Figure Source Data Raw Data
Raw vs. Source
• Archive
• Transparency
• Replicates
• Reanalysis
• Reuse
• Discoverability
• Discourage manipulation
o voluntary o ~40% papers
Source Data
Data or Schema?
Post Review Manuscript EMBO Molecular Medicine 2012
‘I’m a great believer in seeing all the data – this is a very important lever that we have for transparency’
Michael Farthing, founder COPE
Smart figures
• Access to source data
• Descriptive metadata
• Coherent experimental units: figure panels
• Enabling data-oriented searches
• Present data in the context of related data
>> Data oriented search
Paper 1 Paper 2
Navigating Research Findings via Figures
Data viewer
Figure = Data
Text = Narrative
(A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or were treated (+) with 2 μM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.
Entity tagging: machine readable metadata
Curation tool
(Level 0: metadata associated to individual panels.)
Level 1: list of chemical and biological components: small molecules, genes, proteins, sub-cellular structures, cell type, tissue, species.
Level 2: representation of the causality of the experimental design: “Measurement of Y as a function of A, B, C, using assay P in biological system S.”
Level 3: normalization to machine-readable standard identifiers.
[Diagram labels: measured component; perturbed component; experimental system; assayed property]
Structured metadata: ‘perturbation-observation-assay’
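The curation levels and the ‘perturbation-observation-assay’ structure can be pictured as a small data model. The following is a minimal sketch only, not the actual curation tool: the class names (`Entity`, `PanelAnnotation`), the `normalize` helper and the `EXAMPLE:` identifiers are hypothetical placeholders, and the values are read off the example legend for panel (A) above.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    """Level 1: one tagged component from a figure legend (hypothetical model)."""
    name: str                          # text as written in the legend
    kind: str                          # e.g. "gene", "protein", "small molecule"
    identifier: Optional[str] = None   # Level 3: standard identifier, if resolved


@dataclass
class PanelAnnotation:
    """Level 2: 'Measurement of Y as a function of A, B, C, using assay P in system S.'"""
    panel: str
    perturbed: List[Entity]
    measured: List[Entity]
    assay: str
    system: Entity

    def summary(self) -> str:
        measured = ", ".join(e.name for e in self.measured)
        perturbed = ", ".join(e.name for e in self.perturbed)
        return (f"Measurement of {measured} as a function of {perturbed}, "
                f"using {self.assay} in {self.system.name}.")


def normalize(entity: Entity, lookup: Dict[str, str]) -> Entity:
    """Level 3: attach an identifier when the name is in the lookup table."""
    return replace(entity, identifier=lookup.get(entity.name, entity.identifier))


# Values taken from the example legend; only two of the measured mRNAs are shown.
panel_a = PanelAnnotation(
    panel="A",
    perturbed=[Entity("Myc-ER", "protein"), Entity("4-HT", "small molecule")],
    measured=[Entity("cks1", "gene"), Entity("skp2", "gene")],
    assay="SYBR-green real-time PCR",
    system=Entity("MEFs", "cell type"),
)

# Placeholder identifiers, NOT real database accessions.
PLACEHOLDER_IDS = {"cks1": "EXAMPLE:0001", "skp2": "EXAMPLE:0002"}
panel_a.measured = [normalize(e, PLACEHOLDER_IDS) for e in panel_a.measured]

print(panel_a.summary())
```

Keeping the legend text (`name`) separate from the normalized `identifier` means a panel can be annotated at Level 1 or 2 first and normalized later, without losing what the authors actually wrote.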
Resulting hypothesis: test drug Z in disease D.
Paper 3: tissue T; disease D; gene x
Paper 2: protein X; P; kinase Y
Paper 1: kinase Y activity; drug Z
Data integration
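One way to picture this integration step: once each paper's panels carry structured metadata, findings from separate papers can be chained into a hypothesis. The sketch below is illustrative only. The links are an assumed reading of the diagram (drug Z acts on kinase Y; kinase Y acts on protein X; X is tied to disease D), gene x and protein X are treated as a single node "X", and `find_chain` is a hypothetical helper, not part of any existing tool.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str, str]  # (source entity, target entity, paper reporting it)

# Assumed reading of the slide diagram; gene x / protein X merged into "X".
LINKS: List[Link] = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "X", "Paper 2"),
    ("X", "disease D", "Paper 3"),
]


def find_chain(links: List[Link], start: str, goal: str) -> Optional[List[Link]]:
    """Breadth-first search for a chain of reported links from start to goal."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for src, dst, paper in links:
        graph.setdefault(src, []).append((dst, paper))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, paper in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, paper)]))
    return None  # no chain of published links connects the two entities


chain = find_chain(LINKS, "drug Z", "disease D")
if chain is not None:
    for src, dst, paper in chain:
        print(f"{paper}: {src} -> {dst}")
    print("Resulting hypothesis: test drug Z in disease D.")
```

Each step in the returned chain keeps the paper it came from, so the hypothesis stays traceable to its source figures.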
Survey (n=487)
PubMed is the first choice for 72% of users (Google is the second choice).
Major issues: “Too many irrelevant results (lack of specificity)” and “Difficulty to formulate complex queries”
Microattribution & Data/Protocol links • Credit • Data Citation • Accountability • Reproducibility
Fig 1C • Source Data • Methods • Protocol • Data Citation • Authorship
Tim Elston, Univ. of North Carolina
Prepublication Ethics
Journal Author Checklists
EMM submission 26/8/2013
Fraud or Beautification?
Ban the eraser tool! EMM submission 26/8/2013
July 28, 2014
RE: EMM-2014-03890-V3
Dear Dr. Carret,
We retract our paper from further consideration at EMBO Molecular Medicine. …..
We thank the Journal’s Editorial Board for their rigorous evaluation of our study. Publishing a paper with an erroneous blot would call into question all of the good work that went into this study. This situation would have been a disaster for us ….. Thus, we are sincerely grateful that you enabled us to detect this mistake prior to publication.
Cordially,
Prepublication Image Analysis
‘A handful of television trucks with satellite dishes lined the street in front of the building in downtown Tokyo. Some 200 journalists packed the meeting room and were flanked by two dozen video cameras and crew at the front.’ Nature 2014
Retracted: Nature 505, 641–647 (2014); Nature 505, 676–680 (2014)
[Image panels shown: Fig 1i; ED Fig 5; Fig 2g; Fig 1b]
Prepublication Image Analysis
Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize. Séralini G.E., et al. Food and Chemical Toxicology, 2012 (retracted)
Extraordinary claims require extraordinary proof and validation
“no definitive conclusions can be reached” A. Wallace Hayes, editor Food and Chemical Toxicology
Open Data: how to ensure reliability?
Gate of Hell - William Blake
Jacob’s Dream - William Blake
Opening the Gates…