P-Plan
Yolanda Gil, USC Information Sciences Institute
Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data
Daniel Garijo
OEG-DIA, Facultad de Informática
Universidad Politécnica de Madrid
Yolanda Gil
Information Sciences Institute and Department of Computer Science, University of Southern California
http://www.isi.edu/~gil
W3C PROV http://www.w3.org/2011/prov/
A Workflow Execution in PROV
Benefits:
• Makes the work inspectable
Shortcomings:
• Hard to reproduce
• Not efficient to reuse
Replication of Crohn’s Disease Association Study from [Duerr et al, Science 06]
Replication of Early-Onset Parkinson’s Disease Study from [Bayrakli et al, Human Mutation 07]
Reusability
Lower cost
• “Scientists and engineers spend more than 60% of their time just preparing the data for model input or data-model comparison” (NASA A40)
Better quality
• “We write QC without thinking about the best way to do the WC. Such approaches perpetuate mediocrity. If someone did it right once, it would benefit many people.” (EC WF CQ)
More efficient
• “I often see that I’m repeating the work that 100 other people have been doing to obtain and process the data.” (EC WF CQ)
Access to Data Analytics Expertise [Science 2011]
The TB-Drugome [Kinnings et al., PLoS CompBio 2010]
“We report a computational approach to construct a drug-target network… applied to the genome of tuberculosis…”
“The TB-drugome reveals that approximately one-third of the drugs examined have the potential to… treat tuberculosis…”
“The methodology can be applied to other pathogens of interest …”
Executable and Abstract Workflow
• Executable workflow: what I actually run
• Abstract workflow: the method that I followed
The Ontology for Biomedical Investigations http://obi-ontology.org/
Semantic Web Applications in Neuromedicine (SWAN) Ontology http://www.w3.org/TR/hcls-swan/
Research Objects http://www.wf4ever-project.org/research-object-model
Semantic Workflows in Wings [Gil et al 10][Gil et al 09][Kim & Gil et al 08][Kim et al 06]
Workflows are augmented with semantic constraints
• Each workflow constituent has a variable associated with it
– Workflow components, arguments, datasets
• Constraints are used to restrict workflow variables
• Can define abstract classes of components
– Concrete components model executable codes
Workflow reasoners propagate and use semantic constraints
Uses semantic web standards: OWL/RDF, SPARQL, rules
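The idea of abstract component classes restricted by constraints can be sketched in a few lines. This is a toy illustration, not how Wings is implemented: the class hierarchy is only loosely suggested by component names that appear elsewhere in these slides, and the `ACCEPTS` constraint (the kind of data a component can take) is entirely hypothetical.

```python
# Toy sketch of specializing an abstract workflow component.
# ASSUMPTION: the hierarchy and the data-kind constraints below are
# illustrative only; they are not taken from the Wings catalogs.

# child -> parent in a hypothetical component ontology
SUBCLASS_OF = {
    "DecTreeModeler": "Modeler",
    "LinearRegression": "Modeler",
    "Weka-C4.5": "DecTreeModeler",
    "MatLab_LR": "LinearRegression",
    "R_LR": "LinearRegression",
}

# Concrete components stand for executable codes.
CONCRETE = {"Weka-C4.5", "MatLab_LR", "R_LR"}

# Hypothetical semantic constraint: kinds of data each code accepts.
ACCEPTS = {
    "Weka-C4.5": {"discrete"},
    "MatLab_LR": {"continuous"},
    "R_LR": {"continuous"},
}


def is_a(component, ancestor):
    """Return True if component equals ancestor or descends from it."""
    while component is not None:
        if component == ancestor:
            return True
        component = SUBCLASS_OF.get(component)
    return False


def specialize(abstract_component, data_kind):
    """List concrete components under abstract_component that accept data_kind."""
    return sorted(
        c for c in CONCRETE
        if is_a(c, abstract_component) and data_kind in ACCEPTS[c]
    )


print(specialize("Modeler", "continuous"))
print(specialize("Modeler", "discrete"))
```

A real reasoner would propagate such constraints through the whole workflow graph rather than check one step in isolation, and would read them from OWL/RDF rather than Python dictionaries.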
Ontologies for Data and Workflow Components
[Figure: ontology diagrams. Data labels: Documents, Plain text, MarkupInDoc, htmlDoc, latexDoc, Language, En, Fr, Model, DecTree, SVM, FeatureVector, Size, WSJ-2010. Component labels: CorrelationScoring, ChiSq, InfoGain, MutInfo, Modeler, LinearRegression, DecTreeModeler, C4.5, J48, MatLab_LR, R_LR, Weka-C4.5.]
Semantic Workflows: Abstractions Based on Ontologies [Gil et al 2011]
[Figure: workflow steps labeled Term Weighting and Correlation Scoring, alongside TF-IDF and Chi Squared, each linked to CODE.]
Publishing Workflows on the Web with OPMW http://www.opmw.org
Extension of the Open Provenance Model
[Figure: an OPM graph (Agent, Process, Artifact, Account) shown with its OPMW extension. Panels: Workflow Template, Execution Results. Template-side labels: Workflow template, Abstract template Node, Abstract component, Input artifact1, Input artifact2, Output artifact1. Execution-side labels: Execution account, Execution Node, Specific component, Execution Input1, Execution Input2, Execution result, user. Edge labels: account, hasArtifact, hasWorkflowTemplate, hasArtifactTemplate, hasProcessTemplate, hasProcess, hasAbstractComponent, hasSpecificComponent, subClassOf, wasGeneratedBy, used, wasControlledBy.]
Red: OPM model
Black: OPMW profile (extension)
Published as Linked Data: Executed Workflow + Abstract Workflow + Data + Steps + Codes…
P-PLAN: Extending PROV to represent plans
Plan representations can be very complex
• Iteration, conditionals, decomposition, etc.
P-PLAN is a core representation with only:
• Sequences of steps
• Parallel steps
P-PLAN, like PROV, is a DAG
• Simplest representation of plans
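A plan restricted to sequences and parallel steps can be treated as a plain dependency graph. The sketch below is one hypothetical way to hold such a plan in memory and read off which steps may run in parallel; the step names are invented for illustration and the dictionary encoding is an assumption, not the P-PLAN vocabulary itself.

```python
from graphlib import CycleError, TopologicalSorter

# ASSUMPTION: a hypothetical plan, encoded as step -> steps it must follow.
# Steps with no mutual dependency are free to run in parallel.
plan = {
    "termWeighting": set(),
    "correlationScoring": set(),
    "modeler": {"termWeighting", "correlationScoring"},
    "classifier": {"modeler"},
}


def is_dag(preceded_by):
    """Return True if the step dependencies contain no cycle."""
    try:
        TopologicalSorter(preceded_by).prepare()
        return True
    except CycleError:
        return False


def execution_stages(preceded_by):
    """Group steps into stages; steps within a stage are parallel."""
    sorter = TopologicalSorter(preceded_by)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(execution_stages(plan))
# [['correlationScoring', 'termWeighting'], ['modeler'], ['classifier']]
```

Iteration or conditionals would introduce cycles or branching that this encoding cannot express, which is consistent with P-PLAN leaving them out of the core representation.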
Queries about Workflows Published as Linked Data
Find all abstract workflows (?plan) whose executions used a given entity (?entity):
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
SELECT DISTINCT ?plan WHERE {
  ?entity a p-plan:Entity, prov:Entity ;
          p-plan:correspondsTo ?templVariable .
  ?templVariable a p-plan:Variable ;
                 p-plan:isVariableOfPlan ?plan .
}
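To make the graph pattern in the query concrete, here is a minimal in-memory sketch that walks the same two hops (entity to template variable, variable to plan) over a handful of triples. All `ex:` identifiers are made up for illustration, and the hand-written matching stands in for a real SPARQL engine.

```python
# ASSUMPTION: hypothetical triples; every ex: identifier is invented.
RDF_TYPE = "rdf:type"

triples = {
    ("ex:dataset1", RDF_TYPE, "p-plan:Entity"),
    ("ex:dataset1", RDF_TYPE, "prov:Entity"),
    ("ex:dataset1", "p-plan:correspondsTo", "ex:inputVar"),
    ("ex:inputVar", RDF_TYPE, "p-plan:Variable"),
    ("ex:inputVar", "p-plan:isVariableOfPlan", "ex:planA"),
    # dataset2 is not typed as p-plan:Entity, so the pattern must not match it
    ("ex:dataset2", RDF_TYPE, "prov:Entity"),
    ("ex:dataset2", "p-plan:correspondsTo", "ex:otherVar"),
    ("ex:otherVar", RDF_TYPE, "p-plan:Variable"),
    ("ex:otherVar", "p-plan:isVariableOfPlan", "ex:planB"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def plans_using(entity):
    """Plans whose template variables the given entity corresponds to."""
    # The entity must carry both types required by the query pattern.
    if not {"p-plan:Entity", "prov:Entity"} <= objects(entity, RDF_TYPE):
        return set()
    plans = set()
    for variable in objects(entity, "p-plan:correspondsTo"):
        if "p-plan:Variable" in objects(variable, RDF_TYPE):
            plans |= objects(variable, "p-plan:isVariableOfPlan")
    return plans


print(plans_using("ex:dataset1"))
```

Returning a set mirrors the DISTINCT in the query: a plan reached through several variables is reported once.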
Conclusions
Linked data as a vehicle to publish science processes
• Workflows, experiments, …
Important to publish method, not just provenance
• Reproducibility, efficiency, access to expertise
W3C PROV useful to publish execution
P-PLAN is an extension of PROV for publishing methods
• Plan, step, variable
P-PLAN is applicable beyond science