D-prov use-case



A use case designed in the context of the DataONE provenance working group, illustrating how the provenance traces generated by different workflow engines can be queried via the D-PROV model.


Page 1: D-prov use-case

Use Case for D-PROV: Querying Provenance Traces Produced by Workflows Enacted by Different Systems

Khalid Belhajjame,

Fernando Seabra Chirigati,

Victor Cuevas

Page 2: D-prov use-case

Context and Objective

• D-PROV is a model that captures workflow definitions, their provenance, and the provenance of the results obtained by their execution. It is expressive enough to capture the definitions of workflows, and the provenance traces, that are specified in multiple workflow systems, in particular Kepler, Taverna and VisTrails

• D-PROV provides users with integrated access to workflow definitions and associated provenance traces

• It uses (extends) the W3C PROV model to capture the provenance traces produced by the execution of such workflows

• The objective of this use case is to show that D-PROV users are able to query (and combine) provenance traces that are produced by (equivalent) workflows that are specified and enacted using different systems, namely Taverna and VisTrails

• Note that while in the use case we focus on two equivalent workflows, generally speaking, D-PROV is expected to allow users to query and combine provenance traces of workflows that are not necessarily equivalent.

Page 3: D-prov use-case

Approach

• The approach adopted in the use case is a four-step process that is illustrated in the figure below

[Figure: the four steps of the approach and their status]

1. Enact the workflows within their native system (done)

2. Export the provenance traces in the native format of the workflow systems (done)

3. Map the workflows and associated provenance traces to D-PROV (ongoing)

4. Query the provenance traces produced by the workflow systems using D-PROV

Page 4: D-prov use-case

Workflows

We used two (equivalent) workflows specified within Taverna and VisTrails. Both workflows implement a simple in silico experiment for pathway analysis: given gene IDs, the workflows fetch the corresponding pathways. To do so, they make use of two KEGG web services.

[Figures: Taverna workflow; VisTrails workflow]

Page 5: D-prov use-case

Provenance Traces

• The two workflows were enacted within their respective systems using different (yet overlapping) sets of gene IDs as inputs

• The provenance traces were then captured and exported in different formats

• For the Taverna workflow, we used the PROV-O and Janus formats

• For the VisTrails workflow, we used VisTrails' own XML-based provenance format and OPM

• The workflows and their provenance are accessible through myExperiment [1]

• Workflows and their provenance traces are now being mapped to D-PROV

[1] http://www.myexperiment.org/packs/317.html
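The mapping step can be pictured with a minimal Python sketch. It is not the D-PROV mapping itself: the `Derivation` record, the two input shapes, the workflow name `pathway_analysis`, and all gene and pathway identifiers are hypothetical placeholders, and the real Taverna and VisTrails export schemas are considerably richer. The sketch only shows the idea of translating two differently shaped traces into one system-neutral form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One 'pathway was derived from gene ID' fact in a system-neutral form.

    Hypothetical stand-in for a D-PROV record, not the actual model.
    """
    system: str    # workflow system that produced the trace
    workflow: str  # system-neutral workflow name (assumed)
    gene_id: str
    pathway: str


def from_taverna_like(rows):
    # Assumed export shape: a list of dicts with 'input' and 'output' keys.
    return {
        Derivation("Taverna", "pathway_analysis", r["input"], r["output"])
        for r in rows
    }


def from_vistrails_like(rows):
    # Assumed export shape: a list of (gene_id, [pathways]) tuples.
    return {
        Derivation("VisTrails", "pathway_analysis", gene, pathway)
        for gene, pathways in rows
        for pathway in pathways
    }


# Placeholder traces with overlapping inputs (gene:B appears in both).
taverna_rows = [
    {"input": "gene:A", "output": "path:1"},
    {"input": "gene:B", "output": "path:2"},
]
vistrails_rows = [
    ("gene:B", ["path:2"]),
    ("gene:C", ["path:3", "path:4"]),
]

# Once both traces share one shape, they can be combined and queried together.
common = from_taverna_like(taverna_rows) | from_vistrails_like(vistrails_rows)
```

Keeping the originating system on each record is a deliberate choice here: it lets later queries either ignore the system (to combine traces) or filter on it (to compare them).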

Page 6: D-prov use-case

Queries

Once the mapping is done, we would like to issue queries such as the ones specified below against D-PROV:

• Q1: Give the pathways that were produced by the pathway analysis workflow (as defined within D-PROV), specifying the gene IDs that were used as inputs to that workflow

• The result of this query should be the union of pathways returned by the Taverna and VisTrails workflows, together with the gene IDs used as input to both workflows.

• Q2: Give the pathways that were produced by the Taverna workflow, and that are associated with gene IDs that were not used as input to the VisTrails workflow

• This is a diff query
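The intended semantics of Q1 and Q2 can be sketched in plain Python over an in-memory list of records. This is only an illustration under stated assumptions: the `(system, gene_id, pathway)` record shape and every identifier below are made-up placeholders, and an actual deployment would express these queries against D-PROV data rather than Python tuples.

```python
# Each record: (system, gene_id, pathway). All values are placeholders.
TRACES = [
    ("Taverna", "gene:A", "path:1"),
    ("Taverna", "gene:B", "path:2"),
    ("VisTrails", "gene:B", "path:2"),
    ("VisTrails", "gene:C", "path:3"),
]


def q1(traces):
    """Q1: union of pathways across both systems, each with its input gene IDs."""
    result = {}
    for _system, gene, pathway in traces:
        result.setdefault(pathway, set()).add(gene)
    return result


def q2(traces):
    """Q2: Taverna pathways whose gene IDs were never a VisTrails input."""
    vistrails_genes = {gene for system, gene, _ in traces if system == "VisTrails"}
    return {
        pathway
        for system, gene, pathway in traces
        if system == "Taverna" and gene not in vistrails_genes
    }


print(q1(TRACES))
print(q2(TRACES))
```

With these placeholder records, Q1 covers the pathways from both systems, while Q2 keeps only `path:1`, because `gene:B` was also used as a VisTrails input.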