ISI work

Post on 18-Dec-2014


My work at ISI during the summer, its relation with wf4Ever and next steps in my research


Date: 13/10/2011

Work at ISI, relation with Wf4Ever, future steps

Daniel Garijo Verdejo, Yolanda Gil

Ontology Engineering Group, Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial

Facultad de Informática, Universidad Politécnica de Madrid


The TB Drugome


Project goals

Text: Narrative of method, software packages used

Software: Scripted codes + manual steps + notes/emails

Workflow: Workflow/scripts describing dataflow, codes, and parameters

Data: Key datasets and figures/plots

Typical Published Article

Text: Narrative of method, software packages used

Data: Key datasets and figures/plots

Reproducible Article: Weaver, GenePattern GRRD, etc.

NOT published, loosely recorded:


Problem with existing approaches

Only executable workflow is published:

1. Must have the same codes to re-execute the workflow, but:
– Codes become unavailable
• E.g.: eHits was proprietary and replaced by AutodockVina
– Different labs prefer different codes
• E.g.: R vs Matlab
• E.g.: viz in Cytoscape vs yEd

2. Must have the same workflow framework to re-execute the workflow
– Must have R for Weaver

3. Must import files to local file system and workflow framework
– Must import bundle of workflow/data/code files to reproduce

Workflow: Workflow/scripts describing dataflow, codes, and parameters

Text: Narrative of method, software packages used

Data: Key datasets and figures/plots

Reproducible Article: Weaver, GenePattern GRRD, etc.


Key Features of our approach

• Publish an abstract workflow in addition to the executable workflow
– Description of workflow that is independent of the codes executed
– Maps to the codes executed (the “executable workflow”)

• Publish both abstract and executable workflow using the OPM standard
– OPM (Open Provenance Model) is independent of workflow framework and is widely implemented
– Other groups can import to their own workflow framework

• Publish data and workflows as Linked Data on the Web
– All workflows and related files are web-accessible
– Simple mechanism to share across local file systems
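
As a rough illustration of the first feature, the sketch below models an abstract workflow whose steps are independent of any code, and an executable workflow that binds each step to a concrete code. The code names (SMAP, FATCAT, eHits, AutodockVina) come from the slides; the class names, the step identifiers, and the `bind` helper are hypothetical and are not part of WINGS or OPM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    """A workflow step described by what it does, not by the code that runs it."""
    name: str
    description: str


@dataclass(frozen=True)
class ExecutableStep:
    """An abstract step bound to the concrete code that was executed."""
    abstract: AbstractStep
    code: str


def bind(abstract_workflow, codes):
    """Map every abstract step to a code; fail loudly if a step has no binding."""
    missing = [s.name for s in abstract_workflow if s.name not in codes]
    if missing:
        raise KeyError(f"no code bound for steps: {missing}")
    return [ExecutableStep(step, codes[step.name]) for step in abstract_workflow]


# Abstract workflow: step identifiers are illustrative, not taken from WINGS.
ABSTRACT = [
    AbstractStep("binding_site_comparison", "Comparison of ligand binding sites"),
    AbstractStep("structure_comparison", "Comparison of dissimilar protein structures"),
    AbstractStep("docking", "Docking"),
]

# The same abstract workflow can be executed with different docking codes.
original = bind(ABSTRACT, {"binding_site_comparison": "SMAP",
                           "structure_comparison": "FATCAT",
                           "docking": "eHits"})
rerun = bind(ABSTRACT, {"binding_site_comparison": "SMAP",
                        "structure_comparison": "FATCAT",
                        "docking": "AutodockVina"})

for first, second in zip(original, rerun):
    print(f"{first.abstract.name}: {first.code} -> {second.code}")
```

Both executions share the same abstract steps, so a reader who lacks one code can still see which step it implemented and substitute another.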


High level architecture

[Diagram] Stages: Wings workflow generation, OPM conversion, Publication, Share, Reuse

WINGS deployments, each shown with Core, Portal, Workflow Template, Workflow Instance, and OPM export:
– WINGS on local laptop
– WINGS on shared host
– WINGS on web server

Linked Data Publication, shown with:
– Interactive Browsing (Pubby frontend)
– Programmatic access (external apps)
– Users
– Other workflow environments


High level architecture (2)

[Diagram] Linked Data publication:
– RDF Triple store
– Permanent web-accessible file store
– RDF Upload Interface
– SPARQL Endpoint

Published content:
– Abstract Workflow (OPM)
– Executable Workflow (OPM)
– Web accessible Workflow Data, Components, etc.

Note on the diagram: Needed if workflow was developed in local host instead of a public server

Other elements: OPM export, OPM import, Wings, Other workflow frameworks, Web browser

Hosts: ISI web servers (http://wings.isi.edu/…), Amazon EC2 cloud (http://ec2-184-72-160-64.…)
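
To make the programmatic access path more concrete, here is a minimal sketch of how an external application might prepare a request for the SPARQL endpoint. The endpoint URL is a placeholder (the slides only show truncated host names), and the assumption that the published data uses the OPMV terms `opmv:Artifact` and `opmv:wasGeneratedBy` is ours. The sketch only builds the request; it does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"

# Namespace of the Open Provenance Model Vocabulary (OPMV).
OPMV = "http://purl.org/net/opmv/ns#"


def build_query(limit=10):
    """Return a SPARQL query listing artifacts and the processes that generated them."""
    return (
        f"PREFIX opmv: <{OPMV}>\n"
        "SELECT ?artifact ?process WHERE {\n"
        "  ?artifact a opmv:Artifact ;\n"
        "            opmv:wasGeneratedBy ?process .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as an HTTP GET request using the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(limit=5)
url = build_request_url(ENDPOINT, query)
print(query)
print(url)
```

Because the data is exposed through a standard query interface, the same request works from any workflow environment that can issue an HTTP GET, without importing files into a local file system.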


Executable and abstract workflow


OPMV extended model

[Diagram] Elements, grouped by the labels on the slide:

Workflow Template (account): Workflow template, Abstract template Node, Abstract component, Input artifact1, Input artifact2, Output artifact1

Execution Results (account): Execution account, Execution Node, Specific component, Execution Input1, Execution Input2, Execution result, user

OPM Graph: Account, Process, Artifact, Agent

Properties: used, wasGeneratedBy, wasControlledBy, subClassOf, hasArtifact, hasProcess, hasWorkflowTemplate, hasProcessTemplate, hasArtifactTemplate, hasAbstractComponent, hasSpecificComponent

Red: OPM model

Black: OPM profile (extension)
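
The following sketch shows one way the extended model could link an execution back to its template, using plain (subject, predicate, object) tuples. The property names are the ones on the slide; the resource names and the direction of each link (from the execution element to its template element) are assumptions made for illustration.

```python
# Triples are (subject, predicate, object) strings. Property names come from the
# slide; resource names and link directions are assumptions of this sketch.
TRIPLES = [
    # Execution side (standard OPM relations)
    ("ExecutionNode", "used", "ExecutionInput1"),
    ("ExecutionNode", "used", "ExecutionInput2"),
    ("ExecutionResult", "wasGeneratedBy", "ExecutionNode"),
    ("ExecutionNode", "wasControlledBy", "user"),
    # Links from the execution to the template (profile extension)
    ("ExecutionAccount", "hasWorkflowTemplate", "WorkflowTemplate"),
    ("ExecutionNode", "hasProcessTemplate", "AbstractTemplateNode"),
    ("ExecutionInput1", "hasArtifactTemplate", "InputArtifact1"),
    ("ExecutionInput2", "hasArtifactTemplate", "InputArtifact2"),
    ("ExecutionResult", "hasArtifactTemplate", "OutputArtifact1"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def template_view(process, triples=TRIPLES):
    """Describe an executed process in terms of the template elements it instantiates."""
    def to_template(artifact):
        found = objects(artifact, "hasArtifactTemplate", triples)
        return found[0] if found else None  # None marks a missing template link

    generated = [s for s, p, o in triples if p == "wasGeneratedBy" and o == process]
    return {
        "template_node": objects(process, "hasProcessTemplate", triples),
        "template_inputs": [to_template(a) for a in objects(process, "used", triples)],
        "template_outputs": [to_template(a) for a in generated],
    }


print(template_view("ExecutionNode"))
```

Keeping the template links separate from the standard OPM relations means a tool that only understands OPM can still read the execution trace and ignore the extension.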


Reproducibility

• 3 perspectives:
– Reproducibility by an expert
– Basic reproducibility by non-experts
– Reproducibility by students from text only

• Or, not reproducible at all


Reproducibility Maps

Comparison of ligand binding sites using SMAP

Comparison of dissimilar protein structures using FATCAT

Docking using eHits/AutodockVina


Reproducibility maps: accessing the scripts and intermediate data


How can we use this in Wf4Ever?

• The abstract workflow notion can be reused and imported to the workflows used in Research Objects (ROs).
– Complement to the workflow, to understand it better.
– Allows tackling incomplete provenance.

• Additional workflow repository for recommendation
– OPM (Open Provenance Model) is independent of workflow framework and is widely implemented (Taverna has an OPM export too)
– Other groups can import to their own workflow framework

• Workflow integration with WINGS.
– Semantic annotation of workflows.
– Distributed workflow execution engine


Next steps

• Keep working on workflow abstraction.
– Research on compatibility with problem solving methods (PSMs).

• Create an OPMV/W3C PROV-O profile for common workflow representation.
– Interoperability between workflow systems (Taverna).

• Work on workflows in different domains.
– Biology, Astronomy.
– Workflow reuse between different domains?
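
One ingredient of such a profile would be a correspondence between OPMV and PROV-O terms. The sketch below rewrites OPMV terms in a list of triples to their closest PROV-O counterparts. The mapping table is an approximate correspondence assumed for illustration, not a normative one, and the `example.org` resources are hypothetical.

```python
OPMV = "http://purl.org/net/opmv/ns#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Approximate term correspondence (an assumption of this sketch, not normative).
OPMV_TO_PROV = {
    OPMV + "Artifact": PROV + "Entity",
    OPMV + "Process": PROV + "Activity",
    OPMV + "Agent": PROV + "Agent",
    OPMV + "used": PROV + "used",
    OPMV + "wasGeneratedBy": PROV + "wasGeneratedBy",
    OPMV + "wasDerivedFrom": PROV + "wasDerivedFrom",
    OPMV + "wasControlledBy": PROV + "wasAssociatedWith",
    OPMV + "wasTriggeredBy": PROV + "wasInformedBy",
}


def translate(triples):
    """Rewrite OPMV terms to PROV-O terms; leave every other term untouched."""
    return [tuple(OPMV_TO_PROV.get(term, term) for term in triple) for triple in triples]


# Hypothetical provenance of one docking run.
example = [
    ("http://example.org/run1/result", RDF_TYPE, OPMV + "Artifact"),
    ("http://example.org/run1/result", OPMV + "wasGeneratedBy", "http://example.org/run1/docking"),
    ("http://example.org/run1/docking", OPMV + "wasControlledBy", "http://example.org/user"),
]

for triple in translate(example):
    print(triple)
```

A term-level table like this only covers the shared core; workflow-specific properties such as the template links would still need to be defined in the profile itself.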


References

• The TB Drugome paper: http://funsite.sdsc.edu/drugome/TB/

• OPMO + OPMV mapped version: http://openprovenance.org/model/opmo

• WINGS workflow system: http://seagull.isi.edu/marbles/

• TB Drugome Wiki (evolution of the work): http://seagull.isi.edu/wings-drugome/index.php/Main_Page

• Thanks to Yolanda Gil for letting me borrow some of her slides, which are based on UCSD slides, for this presentation.

Date: 03/10/2011

Work at ISI, relation with Wf4Ever, future steps

Daniel Garijo Verdejo

Ontology Engineering Group, Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial

Facultad de Informática, Universidad Politécnica de Madrid