D-prov use-case
Use Case for D-PROV: Querying Provenance Traces Produced by Workflows Enacted by Different Systems
Khalid Belhajjame,
Fernando Seabra Chirigati,
Victor Cuevas
Context and Objective
• D-PROV is a model that captures workflow definitions, their provenance, and the provenance of the results obtained by executing them. It is expressive enough to capture workflow definitions and provenance traces specified in multiple workflow systems, in particular Kepler, Taverna and VisTrails
• D-PROV provides users with an integrated access to workflow definitions and associated provenance traces
• It extends the W3C PROV model to capture the provenance traces produced by executing such workflows
• The objective of this use case is to show that D-PROV users can query (and combine) provenance traces produced by (equivalent) workflows specified and enacted in different systems, namely Taverna and VisTrails
• Although this use case focuses on two equivalent workflows, D-PROV is more generally expected to let users query and combine provenance traces of workflows that are not necessarily equivalent.
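For readers unfamiliar with PROV, the sketch below models a provenance trace with three W3C PROV core notions: entities (data such as gene IDs or pathways), activities (executions of workflow steps), and the `used` and `wasGeneratedBy` relations. It is a minimal, hypothetical Python representation for illustration only, not the D-PROV schema; all class names, field names and sample values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A data item such as a gene ID or a pathway (cf. prov:Entity)."""
    id: str
    value: str


@dataclass(frozen=True)
class Activity:
    """One execution of a workflow step (cf. prov:Activity)."""
    id: str
    system: str  # hypothetical field: the workflow system that ran the step


@dataclass
class Trace:
    """A provenance trace with two PROV relations stored as id pairs."""
    entities: dict[str, Entity] = field(default_factory=dict)
    activities: dict[str, Activity] = field(default_factory=dict)
    used: set[tuple[str, str]] = field(default_factory=set)              # (activity id, entity id)
    was_generated_by: set[tuple[str, str]] = field(default_factory=set)  # (entity id, activity id)

    def inputs_of(self, entity_id: str) -> set[str]:
        """Return the ids of entities used by the activities that generated `entity_id`."""
        generators = {a for (e, a) in self.was_generated_by if e == entity_id}
        return {e for (a, e) in self.used if a in generators}


# Usage: one activity consumed a gene ID and produced a pathway (placeholder values).
trace = Trace()
trace.entities["g1"] = Entity("g1", "gene-A")
trace.entities["p1"] = Entity("p1", "pathway-X")
trace.activities["a1"] = Activity("a1", "taverna")
trace.used.add(("a1", "g1"))
trace.was_generated_by.add(("p1", "a1"))
print(trace.inputs_of("p1"))  # {'g1'}
```

Storing relations as id pairs keeps the sketch close to PROV's graph view, where a lineage question like "which inputs led to this pathway?" is a walk over `wasGeneratedBy` and then `used`.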
Approach
• The use case follows the four-step process illustrated in the figure below
1. Enact the workflows within their native systems (done)
2. Export the provenance traces in the native formats of the workflow systems (done)
3. Map the workflows and associated provenance traces to D-PROV (ongoing)
4. Query the provenance traces produced by the workflow systems using D-PROV
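Step 3 essentially translates each system's export into one shared representation. In the sketch below, two made-up, deliberately simplified record layouts stand in for the native exports (the real Taverna and VisTrails formats are considerably richer), and both are normalized into hypothetical `Derivation` records of (gene ID, pathway, system). It only illustrates the idea of a common target model; it is not the actual D-PROV mapping.

```python
from typing import Iterable, NamedTuple


class Derivation(NamedTuple):
    """A normalized fact: `pathway` was derived from `gene_id` by a run in `system`."""
    gene_id: str
    pathway: str
    system: str


def from_taverna_like(records: Iterable[dict]) -> list[Derivation]:
    # Hypothetical export layout: {"in": gene_id, "out": [pathway, ...]}
    return [Derivation(r["in"], p, "taverna") for r in records for p in r["out"]]


def from_vistrails_like(rows: Iterable[tuple[str, str]]) -> list[Derivation]:
    # Hypothetical export layout: flat (gene_id, pathway) pairs
    return [Derivation(g, p, "vistrails") for g, p in rows]


def integrate(*sources: list[Derivation]) -> list[Derivation]:
    """Concatenate normalized traces so that they can be queried together (step 4)."""
    merged: list[Derivation] = []
    for source in sources:
        merged.extend(source)
    return merged


# Usage with placeholder values
combined = integrate(
    from_taverna_like([{"in": "gene-A", "out": ["pathway-X", "pathway-Y"]}]),
    from_vistrails_like([("gene-B", "pathway-Y")]),
)
print(len(combined))  # 3
```

Keeping the originating system as a field, rather than discarding it, is what later makes cross-system queries such as unions and differences possible.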
Workflows
We used two (equivalent) workflows, specified in Taverna and VisTrails respectively. Both implement a simple in-silico experiment for pathway analysis: given gene IDs, the workflows fetch the corresponding pathways. To do so, they make use of two KEGG web services.
[Figure: Taverna workflow] [Figure: VisTrails workflow]
Provenance Traces
• The two workflows were enacted within their respective systems using different (yet overlapping) sets of gene IDs as inputs
• The provenance traces were then captured and exported in different formats
• From the Taverna workflow, we used the PROV-O and JANUS formats
• From the VisTrails workflow, we used VisTrails' own XML-based provenance format and OPM (the Open Provenance Model)
• The workflows and their provenance are accessible through myExperiment [1]
• Workflows and their provenance traces are now being mapped to D-PROV
[1] http://www.myexperiment.org/packs/317.html
Queries
Once the mapping is done, we plan to issue queries such as the ones below against D-PROV:
• Q1: Give the pathways that were produced by the pathway analysis workflow (as defined within D-PROV), specifying the gene IDs that were used as inputs to that workflow
• The result of this query should be the union of the pathways returned by the Taverna and VisTrails workflows, together with the gene IDs used as inputs to the two workflows.
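Assuming both traces have already been mapped to a common record layout, Q1 reduces to a union over the two systems' results. The `Derivation` record and the placeholder values below are hypothetical and do not come from the actual traces; the sketch groups each pathway with the gene IDs it was derived from, across both enactments.

```python
from collections import defaultdict
from typing import NamedTuple


class Derivation(NamedTuple):
    """Hypothetical normalized record: `pathway` was derived from `gene_id` in `system`."""
    gene_id: str
    pathway: str
    system: str  # "taverna" or "vistrails"


def q1(trace: list[Derivation]) -> dict[str, set[str]]:
    """Union of the pathways produced by both enactments, each with its input gene IDs."""
    result: dict[str, set[str]] = defaultdict(set)
    for d in trace:
        if d.system in {"taverna", "vistrails"}:
            result[d.pathway].add(d.gene_id)
    return dict(result)


# Usage with placeholder values
trace = [
    Derivation("gene-A", "pathway-X", "taverna"),
    Derivation("gene-B", "pathway-X", "vistrails"),
    Derivation("gene-C", "pathway-Y", "vistrails"),
]
print(q1(trace))
```

A pathway returned by both systems appears once, with the gene IDs from both runs merged into a single set.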
• Q2: Give the pathways that were produced by the Taverna workflow and that are associated with gene IDs that were not used as inputs to the VisTrails workflow
• This is a difference (diff) query
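Under the same assumption of a shared, hypothetical (gene ID, pathway, system) layout, Q2 becomes a set difference: collect the gene IDs given to the VisTrails workflow, then keep the Taverna-produced pathways derived from any other gene ID. This sketch reads "associated with" as "derived from at least one such gene ID", which is one possible interpretation; the record type and values are placeholders.

```python
from typing import NamedTuple


class Derivation(NamedTuple):
    """Hypothetical normalized record: `pathway` was derived from `gene_id` in `system`."""
    gene_id: str
    pathway: str
    system: str


def q2(trace: list[Derivation]) -> set[str]:
    """Pathways from the Taverna run whose input gene ID was not a VisTrails input."""
    vistrails_inputs = {d.gene_id for d in trace if d.system == "vistrails"}
    return {
        d.pathway
        for d in trace
        if d.system == "taverna" and d.gene_id not in vistrails_inputs
    }


# Usage with placeholder values
trace = [
    Derivation("gene-A", "pathway-X", "taverna"),
    Derivation("gene-B", "pathway-Y", "taverna"),
    Derivation("gene-B", "pathway-Y", "vistrails"),
]
print(q2(trace))  # {'pathway-X'}
```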