ISI work

Date: 13/10/2011
Work at ISI, relation with Wf4Ever, future steps
Daniel Garijo Verdejo, Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid

description

My work at ISI during the summer, its relation with Wf4Ever, and the next steps in my research.

Page 1: ISI work

Date: 13/10/2011

Work at ISI, relation with Wf4Ever, future steps

Daniel Garijo Verdejo, Yolanda Gil

Ontology Engineering Group. Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial

Facultad de Informática, Universidad Politécnica de Madrid

Page 2: ISI work

The TB Drugome

Page 3: ISI work

Project goals

Typical Published Article:
• Text: Narrative of method, software packages used
• Data: Key datasets and figures/plots

Reproducible Article (Weaver, GenePattern GRRD, etc.):
• Text: Narrative of method, software packages used
• Data: Key datasets and figures/plots
• Workflow: Workflow/scripts describing dataflow, codes, and parameters

NOT published, loosely recorded:
• Software: scripted codes + manual steps + notes/emails

Page 4: ISI work

Problem with existing approaches

Only the executable workflow is published:

1. Must have the same codes to re-execute the workflow, but:
   – Codes become unavailable
     • E.g.: eHiTS was proprietary and was replaced by AutoDock Vina
   – Different labs prefer different codes
     • E.g.: R vs Matlab
     • E.g.: visualization in Cytoscape vs yEd
2. Must have the same workflow framework to re-execute the workflow
   – Must have R for Weaver
3. Must import files to the local file system and workflow framework
   – Must import a bundle of workflow/data/code files to reproduce

Reproducible Article (Weaver, GenePattern GRRD, etc.):
• Workflow: Workflow/scripts describing dataflow, codes, and parameters
• Text: Narrative of method, software packages used
• Data: Key datasets and figures/plots

Page 5: ISI work

Key Features of our approach

• Publish an abstract workflow in addition to the executable workflow
  – Description of the workflow that is independent of the codes executed
  – Maps to the codes executed (the “executable workflow”)
• Publish both abstract and executable workflow using the OPM standard
  – OPM (Open Provenance Model) is independent of the workflow framework and is widely implemented
  – Other groups can import to their own workflow framework
• Publish data and workflows as Linked Data on the Web
  – All workflows and related files are web-accessible
  – Simple mechanism to share across local file systems
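
The distinction between an abstract workflow and its executable counterpart can be illustrated with a small sketch. This is not how WINGS represents workflows; the class names `AbstractStep` and `ExecutableBinding` are hypothetical and exist only for illustration. The step names and the two docking codes come from the slides; the code assigned to the other steps follows the reproducibility map.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AbstractStep:
    """A workflow step described independently of the code that runs it."""
    name: str


@dataclass
class ExecutableBinding:
    """Hypothetical mapping from abstract steps to the codes executed."""
    codes: dict = field(default_factory=dict)

    def bind(self, step: AbstractStep, code: str) -> None:
        self.codes[step] = code

    def code_for(self, step: AbstractStep) -> str:
        return self.codes[step]


# The abstract workflow: the same description for every execution.
binding_sites = AbstractStep("Comparison of ligand binding sites")
structures = AbstractStep("Comparison of dissimilar protein structures")
docking = AbstractStep("Docking")
abstract_workflow = [binding_sites, structures, docking]

# Executable workflow as originally run (docking with the proprietary code).
original = ExecutableBinding()
original.bind(binding_sites, "SMAP")
original.bind(structures, "FATCAT")
original.bind(docking, "eHiTS")

# A later re-execution swaps one code; the abstract workflow is unchanged.
rerun = ExecutableBinding()
rerun.bind(binding_sites, "SMAP")
rerun.bind(structures, "FATCAT")
rerun.bind(docking, "AutoDock Vina")

for step in abstract_workflow:
    print(f"{step.name}: {original.code_for(step)} -> {rerun.code_for(step)}")
```

Both bindings refer to the same three abstract steps, which is what lets a reader compare or re-run the method even after one of the codes becomes unavailable.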

Page 6: ISI work

High level architecture

[Architecture diagram. Pipeline stages: Wings workflow generation → OPM conversion → Publication → Share → Reuse]

• Three WINGS installations (on local laptop, on shared host, on web server), each with Core and Portal, a Workflow Template, a Workflow Instance, and an OPM export
• Linked Data Publication
• Users: interactive browsing (Pubby frontend) and programmatic access (external apps)
• Other workflow environments

Page 7: ISI work

High level architecture (2)

[Architecture diagram. Recoverable components:]

• Wings, accessed through a web browser, with OPM export
• Linked Data publication of the Abstract Workflow (OPM) and the Executable Workflow (OPM)
• RDF Upload Interface, RDF Triple store, SPARQL Endpoint
• Permanent web-accessible file store: web-accessible workflow data, components, etc.
• Other workflow frameworks, via OPM import
• Hosts: ISI web servers (http://wings.isi.edu/…), Amazon EC2 cloud (http://ec2-184-72-160-64.…)

Diagram note: needed if the workflow was developed on a local host instead of a public server.
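
Programmatic access through a SPARQL endpoint could look roughly like the sketch below. It only builds the request URL and does not contact any server. The endpoint address and the process URI are placeholders (the real host names are elided in the slides), and the OPMV namespace is written from memory and should be checked against the OPMV specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint: the slides do not give the full address.
ENDPOINT = "http://example.org/sparql"

# OPMV namespace as I recall it; verify against the OPMV specification.
OPMV = "http://purl.org/net/opmv/ns#"


def artifacts_generated_by_query(process_uri: str) -> str:
    """Return a SPARQL query listing artifacts generated by one process."""
    return (
        f"PREFIX opmv: <{OPMV}>\n"
        "SELECT ?artifact WHERE {\n"
        f"  ?artifact opmv:wasGeneratedBy <{process_uri}> .\n"
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical process URI, used only to show the shape of the request.
query = artifacts_generated_by_query("http://example.org/process/docking")
url = build_request_url(ENDPOINT, query)
print(url)
```

An external application would send this URL with any HTTP client and read the result set; because the query uses only OPM-level terms, it does not depend on the workflow framework that produced the data.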

Page 8: ISI work

Executable and abstract workflow

Page 9: ISI work

OPMV extended model

[Diagram: an OPM graph extended with a workflow template profile. Recoverable labels:]

• Workflow Template side: Workflow template, Abstract template Node, Abstract component, Input artifact 1, Input artifact 2, Output artifact 1
• Execution Results side: Execution account, Execution Node, Specific component, Execution Input 1, Execution Input 2, Execution result, user
• OPM Graph classes: Account, Process, Artifact, Agent
• Relations: account, used, wasGeneratedBy, wasControlledBy, subClassOf, hasArtifact, hasProcess, hasWorkflowTemplate, hasArtifactTemplate, hasProcessTemplate, hasAbstractComponent, hasSpecificComponent

Red: OPM model

Black: OPM profile (extension)
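
One plausible reading of the diagram can be written down as a handful of triples. The relation names are taken from the slide, but which node each relation connects, and in which direction, is an assumption made for this sketch, since the transcript does not preserve the arrows. The helper names `objects` and `abstract_view` are hypothetical.

```python
# (subject, predicate, object) triples; directions are assumed, not sourced.
TRIPLES = [
    # Execution side (plain OPM relations)
    ("ExecutionResult", "wasGeneratedBy", "ExecutionNode"),
    ("ExecutionNode", "used", "ExecutionInput1"),
    ("ExecutionNode", "used", "ExecutionInput2"),
    ("ExecutionNode", "wasControlledBy", "user"),
    # Links from the execution to the template (profile extension)
    ("ExecutionAccount", "hasWorkflowTemplate", "WorkflowTemplate"),
    ("ExecutionNode", "hasProcessTemplate", "AbstractTemplateNode"),
    ("ExecutionInput1", "hasArtifactTemplate", "InputArtifact1"),
    ("ExecutionInput2", "hasArtifactTemplate", "InputArtifact2"),
    ("ExecutionResult", "hasArtifactTemplate", "OutputArtifact1"),
]


def objects(subject: str, predicate: str) -> list:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def abstract_view(execution_process: str) -> dict:
    """Lift one executed process and its inputs to the template level."""
    inputs = [
        template
        for artifact in objects(execution_process, "used")
        for template in objects(artifact, "hasArtifactTemplate")
    ]
    return {
        "process": objects(execution_process, "hasProcessTemplate"),
        "inputs": inputs,
    }


print(abstract_view("ExecutionNode"))
```

Under these assumptions, following the profile relations from an execution trace recovers the template step it instantiated, which is the connection between the two sides of the diagram.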

Page 10: ISI work

Reproducibility

• 3 perspectives:
  – Reproducibility by an expert
  – Basic reproducibility by non-experts
  – Reproducibility by students from text only
• Or, not reproducible at all

Page 11: ISI work

Reproducibility Maps

Comparison of ligand binding sites using SMAP

Comparison of dissimilar protein structures using FATCAT

Docking using eHits/AutodockVina

Page 12: ISI work

Reproducibility maps: accessing the scripts and intermediate data

Page 13: ISI work

How can we use this in Wf4Ever?

• The abstract workflow notion can be reused and imported to the workflows used in ROs.
  – Complement to the workflow, to understand it better.
  – Allows tackling incomplete provenance.
• Additional workflow repository for recommendation.
  – OPM (Open Provenance Model) is independent of the workflow framework and is widely implemented (Taverna has an OPM export too).
  – Other groups can import to their own workflow framework.
• Workflow integration with WINGS.
  – Semantic annotation of workflows.
  – Distributed workflow execution engine.

Page 14: ISI work

Next steps

• Keep working on workflow abstraction.
  – Research on compatibility with problem solving methods (PSMs).
• Create an OPMV/W3C PROV-O profile for common workflow representation.
  – Interoperability between workflow systems (Taverna).
• Work on workflows in different domains.
  – Biology, Astronomy.
  – Workflow reuse between different domains?

Page 15: ISI work

References

• The TB Drugome paper: http://funsite.sdsc.edu/drugome/TB/

• OPMO + OPMV mapped version: http://openprovenance.org/model/opmo

• WINGS workflow system: http://seagull.isi.edu/marbles/

• TB Drugome Wiki (evolution of the work): http://seagull.isi.edu/wings-drugome/index.php/Main_Page

• Thanks to Yolanda Gil for letting me borrow some of the slides, based on USCD slides, for this presentation.

Page 16: ISI work

Date: 03/10/2011

Daniel Garijo Verdejo

Ontology Engineering Group. Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial

Facultad de Informática, Universidad Politécnica de Madrid

Work at ISI, relation with Wf4Ever, future steps