Data models for preserving and publishing digital research material beyond the PDF

Posted on 10-May-2015

Description

Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as means to capture methods and results when research is carried out digitally. This work was supported by the EU Workflow Forever (Wf4Ever) project (http://wf4ever-project.org).

Transcript of Data models for preserving and publishing digital research material beyond the PDF

Data models for digital preservation and publishing beyond the PDF

Jun Zhao, Mark Thompson, Kristina Hettne, Stian Soiland, Susana Garcia, Marco Roos

Acknowledging Harish Dharuri, Susanna Sansone, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Albert Mons, Arie Baak, Erik Schultes, Carole Goble, Barend Mons

The Workflow Forever project (EU FP7 no. 270192), Digital Libraries and Digital Preservation (ICT-2009.4.1)

Recording your computational steps…

Bioinformaticians have no lab books! And no training in digital note-keeping.

http://graemefielder.wordpress.com/2010/09/17/lab-books-evolution-required/

State-of-the-art study capture?

How then? Workflows encapsulate in silico analysis

http://ap27-cgla.blogspot.nl/ http://openi.nlm.nih.gov/detailedresult.php?img=2743669_1471-2105-10-252-2&req=4


Components to understand an experiment: is a workflow enough?

Research question: Genome Wide Association Studies (GWAS). In 1000+ people: which gene mutations are associated with metabolic syndrome, and why?

Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.

Download data: external DB, existing knowledge

Workflow: Which biological pathways explain the associations?

Interpret results (interaction pathways in the cell)


Preserve every component of the experiment, not only the workflow: research question, hypothesis, downloaded data, workflow, and interpreted results.

[Figure: a Research Object aggregating types of resources (data, method/experimental protocol, findings), each captured with a data model: ISA-TAB/ISA2OWL, Wfdesc, Nanopublication.]

Capture more than workflows

Research Object Model: preservation for understanding

Preserve at least the:
– Hypothesis
– A workflow-like sketch
– One or more workflows
– Input data
– Workflow runs
– Results
– Conclusion
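The minimum contents listed above can be read as a simple checklist. The Python sketch below, assuming a toy in-memory Research Object, reports which of those components an RO does not yet aggregate. The class, the component labels and the example URIs are hypothetical and are not the Wf4Ever data model itself.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the minimum components listed on the slide;
# they are not terms from the Wf4Ever vocabularies.
REQUIRED = (
    "hypothesis",
    "workflow_sketch",
    "workflow",        # one or more workflows
    "input_data",
    "workflow_run",
    "result",
    "conclusion",
)


@dataclass
class ResearchObject:
    """Toy Research Object: an identifier plus aggregated (kind, uri) pairs."""
    uri: str
    resources: list[tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, uri: str) -> None:
        self.resources.append((kind, uri))

    def missing(self) -> list[str]:
        """Required kinds with no aggregated resource yet, in slide order."""
        present = {kind for kind, _ in self.resources}
        return [kind for kind in REQUIRED if kind not in present]


ro = ResearchObject("http://example.org/ro/gwas-metsyn/")
ro.add("hypothesis", "hypothesis.txt")
ro.add("workflow", "pathways.t2flow")
ro.add("input_data", "gwas_hits.csv")
print(ro.missing())
# → ['workflow_sketch', 'workflow_run', 'result', 'conclusion']
```

Keeping the order of the slide in the report makes it easy to see which parts of the research story are still absent.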

My Research Book


Fame and Glory

It was me, me, me!

What I found

How I found it

HDAC1 interacts with Parvb

Discovered by: me

Nanopublication

– Assertion
– Provenance of assertion
– Metadata of nanopublication

Prototyping the models

• Create: myExperiment
• Better: Checklist service
• Evolution: Digital Library software
• Curation: Quality Monitoring Service
• Credit original assertions: Landmark Tool
• Applications by private partners

myExperiment: create Research Objects

Prototyping the Research Object Data Model

Checklist service: make better Research Objects

Prototyping the Research Object Data Model


RELEASE! http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API

Digital Library software: evolution of a Research Object

Prototyping the Research Object Data Model

Research Object ‘under construction’

Snapshots to record intermediate states

Full copy ‘Ready for Release’
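The three states on this slide (a Research Object under construction, snapshots of intermediate states, and a full copy ready for release) can be sketched as follows. This is a minimal Python illustration under our own assumptions: LiveRO, FrozenRO and the URI scheme for snapshots and releases are hypothetical names that echo the idea of the Wf4Ever evolution model without reproducing it.

```python
from dataclasses import dataclass, field

Resources = tuple[tuple[str, str], ...]


@dataclass(frozen=True)
class FrozenRO:
    """An immutable copy of a Research Object's contents."""
    uri: str
    resources: Resources   # sorted (name, uri) pairs; a tuple so it cannot change
    released: bool         # False for a snapshot, True for a release copy


@dataclass
class LiveRO:
    """A Research Object 'under construction' whose contents may still change."""
    uri: str
    resources: dict[str, str] = field(default_factory=dict)
    snapshots: list[FrozenRO] = field(default_factory=list)

    def _freeze(self) -> Resources:
        return tuple(sorted(self.resources.items()))

    def snapshot(self) -> FrozenRO:
        """Record the current intermediate state; work on the live RO continues."""
        snap = FrozenRO(f"{self.uri}snapshot/{len(self.snapshots) + 1}/",
                        self._freeze(), released=False)
        self.snapshots.append(snap)
        return snap

    def release(self) -> FrozenRO:
        """Produce a full copy marked 'ready for release'."""
        return FrozenRO(f"{self.uri}release/", self._freeze(), released=True)


live = LiveRO("http://example.org/ro/gwas-metsyn/")
live.resources["workflow"] = "pathways_v1.t2flow"
first = live.snapshot()
live.resources["workflow"] = "pathways_v2.t2flow"
final = live.release()
print(dict(first.resources)["workflow"])  # pathways_v1.t2flow
print(dict(final.resources)["workflow"])  # pathways_v2.t2flow
```

Snapshots copy the contents instead of referencing the live dictionary, so later edits to the live RO cannot alter a recorded state.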

Quality Monitoring Service: long-term curation

Prototyping the Research Object Data Model

Landmark Claim Tool: mark and credit the first discovery

Prototyping the Nanopublication Model

Landmark Claim Tool

Core data

Attribution

Qualification

Applications from private partners: robust tools for business stakeholders

Prototyping the Nanopublication Model

Nanopublication applications: Euretos company

Copyright Euretos b.v. 2013


Releases planned for 2014

Some gory detail: data models ‘under the hood’


Research Object Model at a glance

[Figure: a Research Object aggregates resources and annotations (ore:aggregates) and is described by a manifest (ore:isDescribedBy); each annotation points to the resources it describes (oa:hasTarget) and to an annotation graph holding its content (oa:hasBody).]

For more information and extensions (Evolution model, MINIM) see http://wf4ever-project.org/
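As a rough illustration of this structure, the Python sketch below emits N-Triples for a minimal manifest: the RO aggregates its resources and annotations, is described by a manifest, and each annotation has a target and a body. The namespace URIs are given as we understand them and should be checked against the published vocabularies; the manifest path and all example URIs are hypothetical.

```python
# Namespace URIs as we understand them; check against the published ontologies.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
OA = "http://www.w3.org/ns/oa#"
RO = "http://purl.org/wf4ever/ro#"


def manifest_triples(ro_uri, resources, annotations):
    """Return (subject, predicate, object) triples for a minimal RO manifest.

    resources:   URIs of aggregated resources (data, workflows, ...)
    annotations: (annotation_uri, target_uri, body_uri) tuples
    """
    manifest = ro_uri + ".ro/manifest.rdf"  # hypothetical manifest location
    triples = [
        (ro_uri, RDF_TYPE, RO + "ResearchObject"),
        (ro_uri, ORE + "isDescribedBy", manifest),
    ]
    for res in resources:
        triples.append((ro_uri, ORE + "aggregates", res))
    for ann, target, body in annotations:
        triples.append((ro_uri, ORE + "aggregates", ann))  # annotations are aggregated too
        triples.append((ann, OA + "hasTarget", target))     # what the annotation is about
        triples.append((ann, OA + "hasBody", body))         # annotation graph with the content
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


base = "http://example.org/ro/gwas-metsyn/"
triples = manifest_triples(
    base,
    resources=[base + "pathways.t2flow", base + "gwas_hits.csv"],
    annotations=[(base + ".ro/ann1", base + "pathways.t2flow", base + ".ro/ann1-body.ttl")],
)
print(to_ntriples(triples))
```

Keeping annotation content in a separate body graph means the aggregated resources themselves never need to be modified to describe them.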

Extensions


Wf4Ever architecture

[Figure: components include a Semantic REST API; an RDF triple store (RO structure, annotations); an RO index; uploaded files; and tools such as the Portal, Checklist service, command line and workflow runner, ...]

Nanopublication Data Model

[Figure: an example nanopublication with its parts annotated.]

– Nanopublication URL: a unique, persistent and resolvable identifier.
– Integrity key: guarantees immutability after publication.
– Assertion: an individual association between concepts, such as a statement or declaration, a measurement, or a hypothetical inference; quantitative or qualitative.
– Provenance: how this assertion came to be (methods, evidence, context, etc.).
– PublicationInfo: detailed attribution for authors, institutions, lab technicians and curators; license info; publication date.

Assertion in the example: association a sio:statisticalAssociation; sio:has-measurementValue Association_1_p_value, which is a sio:probability-value with sio:has-value "6.56e-5"^^xsd:float; sio:refers-to http://bio2rdf.org/omim:210600 … http://bio2rdf.org/geneid:55835.

Provenance and publication info in the example: assertion opm:wasDerivedFrom http://rdf.biosemantics.org/…profiles_matching_1980_2010; opm:wasGeneratedBy; thisnanopub dcterms:created "2012-03-28T11:32"^^xsd:dateTime; pav:authoredBy researcherid.com/rB-6035-2012; dcterms:DOI http://dx.doi.org/….
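The three-part structure and the integrity key can be illustrated with a small Python sketch. It assumes a toy nanopublication made of plain string triples and computes the key as a SHA-256 hash over a sorted, graph-tagged serialisation. This is a simplified stand-in, not the hashing scheme (e.g. trusty URIs) used in nanopublication infrastructure, and the ex: identifiers are hypothetical placeholders for the elided URIs in the example.

```python
import hashlib
from dataclasses import dataclass, replace

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Nanopublication:
    """Toy nanopublication: a URI plus three graphs of (s, p, o) triples."""
    uri: str                        # not hashed in this sketch
    assertion: tuple[Triple, ...]
    provenance: tuple[Triple, ...]
    pubinfo: tuple[Triple, ...]

    def _canonical(self) -> str:
        """Order-independent text form: one line per triple, tagged with its graph."""
        lines = []
        for graph, triples in (("assertion", self.assertion),
                               ("provenance", self.provenance),
                               ("pubinfo", self.pubinfo)):
            lines.extend(f"{graph}\t{s}\t{p}\t{o}" for s, p, o in sorted(triples))
        return "\n".join(lines)

    def integrity_key(self) -> str:
        """SHA-256 over the canonical form; any change to the content changes the key."""
        return hashlib.sha256(self._canonical().encode("utf-8")).hexdigest()


np1 = Nanopublication(
    uri="http://example.org/np/1",
    assertion=(
        ("ex:association", "rdf:type", "sio:statisticalAssociation"),
        ("ex:association", "sio:has-measurementValue", "ex:Association_1_p_value"),
        ("ex:Association_1_p_value", "sio:has-value", '"6.56e-5"^^xsd:float'),
    ),
    provenance=(("ex:assertion", "opm:wasDerivedFrom", "ex:profiles_matching_1980_2010"),),
    pubinfo=(("ex:thisnanopub", "dcterms:created", '"2012-03-28T11:32"^^xsd:dateTime'),),
)
key = np1.integrity_key()

reordered = replace(np1, assertion=tuple(reversed(np1.assertion)))
tampered = replace(np1, pubinfo=(("ex:thisnanopub", "dcterms:created",
                                  '"2013-01-01T00:00"^^xsd:dateTime'),))
print(reordered.integrity_key() == key)  # True
print(tampered.integrity_key() == key)   # False
```

Because triples are sorted before hashing, reordering them leaves the key unchanged, while editing any triple changes it; that is the property needed to detect modification after publication.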

[Figure: a Research Object aggregating (ro:aggregates) a Galaxy workflow, results, slides and a hypothesis, next to a nanopublication at http://www.store.net/mynanopub.rdf. Assertion: "SOAPdenovo2 increases correct assembly length by 3-80 times over SOAPdenovo1"; Provenance; Publication-Info with pav:authoredBy, dc:rights, dc:created.]

Research object can link to a nanopub as an experimental result (ro:aggregates).

Nanopublication gains detailed workflow provenance by linking to the RO (ro:aggregates, rdf:describedBy).

Extend your provenance! E.g. link the claim to the original data elements from which it was derived (?rdf:describedBy).
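The two directions of linking on these slides can be sketched with a toy triple store in Python: the RO aggregates the nanopublication as a result, and the nanopublication's provenance points back to the RO, so a reader starting from the assertion can reach the workflow, results and hypothesis. The ex: identifiers are hypothetical, and prov:wasDerivedFrom is our assumed stand-in for the back-link the slides label rdf:describedBy.

```python
# Toy triple store: a set of (subject, predicate, object) with prefixed names.
# ex: identifiers are hypothetical; prov:wasDerivedFrom is an assumed stand-in
# for the back-link drawn on the slides as rdf:describedBy.
store = {
    ("ex:ro", "ore:aggregates", "ex:galaxy_workflow"),
    ("ex:ro", "ore:aggregates", "ex:results"),
    ("ex:ro", "ore:aggregates", "ex:slides"),
    ("ex:ro", "ore:aggregates", "ex:hypothesis"),
    ("ex:ro", "ore:aggregates", "ex:nanopub"),        # RO -> nanopub as an experimental result
    ("ex:assertion", "prov:wasDerivedFrom", "ex:ro"),  # nanopub provenance -> RO
}


def objects(store, subject, predicate):
    """All objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


def provenance_context(store, assertion):
    """Map each RO the assertion was derived from to the resources it aggregates."""
    return {ro: objects(store, ro, "ore:aggregates")
            for ro in objects(store, assertion, "prov:wasDerivedFrom")}


print(provenance_context(store, "ex:assertion"))
# → {'ex:ro': ['ex:galaxy_workflow', 'ex:hypothesis', 'ex:nanopub', 'ex:results', 'ex:slides']}
```

The nanopublication stays small (one assertion), while the detailed workflow provenance lives in the RO and is reached by following a single link.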

Community effort

• Research Objects: http://researchobjects.org/, http://wf4ever-project.org/

• Nanopublication: http://Nanopub.org/

• ISA-tools: http://www.isa-tools.org/

• Research Objects Community Group at W3C: http://w3.org/community/rosc

Conclusions (1/2)

• Applications of RO and Nanopublication data models to capture the bioinformatics research process ‘beyond the PDF’

• Data models: ISA, Research Objects, Nanopublications

Conclusions (2/2)

• Reference implementations / first to adopt: myExperiment, DLibra, Checklist service, Curation/monitoring, Landmark tool

• Private partners developing stable nanopublication applications

• Prevent perfectionism by the developers: get involved now!

THANK YOU FOR YOUR ATTENTION

http://researchobject.org/ http://nanopub.org/ http://isa-tools.org/ Research Object Community group at W3C: http://w3.org/community/rosc