A Semantic-Based Approach to Attain Reproducibility of
Computational Environments in Scientific Workflows: A Case Study
Idafen Santana-Perez (1), Rafael Ferreira da Silva (2), Mats Rynge (2), Ewa Deelman (2), María S. Pérez-Hernández (1), Oscar Corcho (1)
(1) Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
(2) Univ. of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA
REPPAR'14: 1st International Workshop on Reproducibility in Parallel Computing. Porto, August 2014
Index
• Introduction
• Reproducibility tools
  • WICUS
  • Pegasus & PRECIP
• Reproducibility process
  • Annotation
  • Infrastructure Specification Algorithm
• Use case
• Conclusion & Future Work
2 A Semantic-Based Approach to Attain Reproducibility...
Introduction
• Experiments in empirical science
  • Primary component of the scientific method
  • Main method for validating a hypothesis
  • Repeatable procedure
• Scientific publications
  • Announce a result
  • Convince readers that the result is correct
• Computational science
  • In silico science
  • Computational scientific workflow: “a precise, executable description of a scientific procedure” [De Roure, 2011]
• “Reproducibility in principle underpins the scientific method” [Goble, 2012]
• “It’s about capturing, preserving, reusing and curating” [Goble, 2012]
Introduction
• Reproducibility in Scientific Experiments
[Figure: the elements of a scientific experiment (INPUT DATA, SCIENTIFIC PROCEDURE, EQUIPMENT), shown for IN VIVO/VITRO and IN SILICO experiments]
[Figure: the FORMER EQUIPMENT is ANNOTATEd to obtain SEMANTIC ANNOTATIONS, which are then used to REPRODUCE an EQUIVALENT EXECUTION ENVIRONMENT in the CLOUD]
Semantics
• Vocabularies for documenting the main resources involved in the execution of a workflow (WF)
  • Software
  • Hardware
  • Computational resources
  • Workflow
• Increasing the understanding of the underlying components
• Making this knowledge explicit
  • Standard technology: RDF & OWL
  • Easy to extend and integrate
Semantics
• WICUS ontology network
  • Workflow Infrastructure Conservation Using Semantics
  • http://purl.org/net/wicus
  • 5 ontologies
    • WICUS Software Stack ontology
    • WICUS Hardware Specs ontology
    • WICUS Scientific Virtual Appliance ontology
    • WICUS Workflow Execution Requirements ontology
    • WICUS Ontology: links the previous ontologies
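As a rough illustration of how statements from separate vocabularies can be linked, the sketch below stores annotations as subject-predicate-object triples in plain Python. Every prefix, property, and resource name in it (`reqs:hasRequirement`, `stack:dependsOn`, `ex:StackA`, and so on) is a hypothetical placeholder, not a term from the WICUS ontologies; a real setup would use an RDF library and the published vocabularies.

```python
# Illustrative only: all terms below are made-up placeholders,
# not actual WICUS classes or properties.
triples = [
    ("ex:Workflow1", "reqs:hasRequirement", "ex:Req1"),
    ("ex:Req1", "reqs:requiresStack", "ex:StackA"),
    ("ex:StackA", "stack:dependsOn", "ex:StackB"),
    ("ex:Appliance1", "sva:hasInstalled", "ex:StackB"),
]


def objects(subject, predicate):
    """Return the objects of all triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def required_stacks(workflow):
    """Follow workflow -> requirement -> stack links across vocabularies."""
    stacks = []
    for req in objects(workflow, "reqs:hasRequirement"):
        stacks.extend(objects(req, "reqs:requiresStack"))
    return stacks
```

Because all statements share one graph, a query can move from workflow requirements to software stacks to appliances without caring which vocabulary each link comes from.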
WICUS ontology network
• WICUS Software Stack ontology: http://purl.org/net/wicus-stack
• WICUS Hardware Specs ontology: http://purl.org/net/wicus-hwspecs
• WICUS Scientific Virtual Appliance ontology: http://purl.org/net/wicus-sva
• WICUS Workflow Execution Requirements ontology: http://purl.org/net/wicus-reqs
• WICUS ontology network: http://purl.org/net/wicus
PRECIP
• Pegasus Repeatable Experiments for the Clouds In Python
• Experiment management control API
• Works with commercial and academic Clouds
• Tag-based system
• No need for pre-installed software on the VM image
• Create VM, transfer files, run commands remotely
• Linux
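The tag-based idea can be sketched with a toy, in-memory model: instances are labeled when they are created, and later operations address them by tag rather than by host name. `ToyExperiment`, its methods, the image name, and the file name are all hypothetical and exist only for this example; this is not the PRECIP API and no VMs are created.

```python
# Toy in-memory model of tag-based instance addressing.
# ToyExperiment and its methods are hypothetical; this is NOT the PRECIP API.
class ToyExperiment:
    def __init__(self):
        self.instances = []  # one dict per pretend VM

    def provision(self, image, tags, count=1):
        """Register `count` pretend VMs built from `image` and labeled with `tags`."""
        for _ in range(count):
            self.instances.append({
                "id": len(self.instances),
                "image": image,
                "tags": set(tags),
                "files": [],
                "log": [],
            })

    def _select(self, tags):
        """Return the instances that carry every requested tag."""
        wanted = set(tags)
        return [vm for vm in self.instances if wanted <= vm["tags"]]

    def put(self, tags, path):
        """Pretend to transfer a file to all instances matching `tags`."""
        for vm in self._select(tags):
            vm["files"].append(path)

    def run(self, tags, command):
        """Pretend to run a command remotely; return the ids it ran on."""
        selected = self._select(tags)
        for vm in selected:
            vm["log"].append(command)
        return [vm["id"] for vm in selected]


exp = ToyExperiment()
exp.provision("bare-linux-image", tags=["master"])
exp.provision("bare-linux-image", tags=["worker"], count=2)
exp.put(["worker"], "montage-binaries.tar.gz")
ran_on = exp.run(["worker"], "tar xzf montage-binaries.tar.gz")
```

Here the file transfer and the command reach only the two instances tagged `worker`; the `master` instance is untouched.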
Pegasus
• Pegasus WMS
  • Million-task workflows
  • Records
    • Data about execution
    • Intermediate results
  • Replica Catalog
  • DAX (Directed Acyclic Graph in XML)
  • Transformation Catalog
  • HTCondor for executing individual tasks
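To make the "graph in XML" idea concrete, the sketch below reads jobs and parent/child dependencies from a small XML fragment using only the Python standard library. The fragment is a simplified, DAX-like document written for this example: its element and attribute names are an assumption modeled on the DAX format and should be checked against the Pegasus schema, and real DAX files carry considerably more detail. The job names are used purely as sample values.

```python
import xml.etree.ElementTree as ET

# Simplified, DAX-like fragment written for this example (not a complete DAX file).
DAX_LIKE = """
<adag name="example">
  <job id="ID1" name="mProjectPP"/>
  <job id="ID2" name="mDiffFit"/>
  <job id="ID3" name="mAdd"/>
  <child ref="ID2"><parent ref="ID1"/></child>
  <child ref="ID3"><parent ref="ID2"/></child>
</adag>
"""


def read_dag(xml_text):
    """Return ({job id: transformation name}, [(parent id, child id), ...])."""
    root = ET.fromstring(xml_text)
    jobs = {job.get("id"): job.get("name") for job in root.findall("job")}
    edges = [
        (parent.get("ref"), child.get("ref"))
        for child in root.findall("child")
        for parent in child.findall("parent")
    ]
    return jobs, edges


jobs, edges = read_dag(DAX_LIKE)
# The distinct transformation names are what an annotation step would need to describe.
transformations = sorted(set(jobs.values()))
```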
Reproducibility process
[Figure: the reproducibility process, with steps numbered 1 to 9. Labels: Pegasus Transformation Catalog (TC), DAX XML, DAX annotator, TC annotator, WMS annotations, WF annotations, SW components catalog, WF & configuration annotations, Infrastructure Specification Algorithm, SVA catalog, PRECIP script]
Reproducibility process
• Infrastructure Specification Algorithm
  • Goal: obtain a specification defining which VMs need to be created, which software components must be deployed, and how they must be configured.
[Flowchart: steps of the Infrastructure Specification Algorithm]
• GET WF REQUIREMENTS
• GET <REQ,STACKS>
• GET <REQ,D-GRAPH>
• GET AVAILABLE SVA
• GET <SVA,STACKS>
• CALCULATE REQ-SVA COMPATIBILITY
• GET MAX COMPATIBLE REQ-SVA
• CLEAN REQ D-GRAPH
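The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all requirement, stack, and appliance names are invented, compatibility is assumed to be the number of needed stacks an appliance already has installed (the slides do not define the metric), and "cleaning" the dependency graph is assumed to mean dropping the stacks the chosen appliance already provides.

```python
# Hypothetical data: every name below is invented for the example.
requirement_stacks = {"req_montage": {"montage", "pegasus_worker"}}  # <REQ, STACKS>
dependency_graph = {  # <REQ, D-GRAPH>: stack -> stacks it depends on
    "montage": {"libc"},
    "pegasus_worker": {"java", "condor"},
    "condor": {"libc"},
    "java": set(),
    "libc": set(),
}
appliance_stacks = {  # <SVA, STACKS>: stacks already installed on each appliance
    "sva_plain": {"libc"},
    "sva_java": {"libc", "java"},
}


def closure(stacks, graph):
    """Requested stacks plus all of their transitive dependencies."""
    needed, todo = set(), list(stacks)
    while todo:
        stack = todo.pop()
        if stack not in needed:
            needed.add(stack)
            todo.extend(graph.get(stack, ()))
    return needed


def specify(req):
    """Pick the most compatible appliance and list what still must be deployed."""
    needed = closure(requirement_stacks[req], dependency_graph)
    # Assumed compatibility metric: count of needed stacks already installed.
    scores = {sva: len(needed & installed) for sva, installed in appliance_stacks.items()}
    best = max(sorted(scores), key=scores.get)  # sorted() makes ties deterministic
    to_deploy = needed - appliance_stacks[best]  # "clean" the dependency graph
    return best, sorted(to_deploy)


best, to_deploy = specify("req_montage")
```

With this sample data the appliance `sva_java` is chosen because it already provides two of the five needed stacks, leaving three to deploy. The sketch does not order the deployment steps; a fuller version would install each stack after its dependencies.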
Infrastructure Specification Algorithm
[Figure sequence: step-by-step run of the algorithm on an example graph whose nodes are labeled S1 to S15]
Use case
• Montage Workflow
  • Astronomy workflow
  • Construct large image mosaics of the sky
  • Montage Software distribution
    • 59 binaries
• Target IaaS Cloud Providers
  • Amazon EC2
  • FutureGrid
RO available at http://pegasus.isi.edu/publications/reppar
Use case
• Goal
[Figure: the MONTAGE WORKFLOW ENVIRONMENT is ANNOTATEd to obtain SEMANTIC ANNOTATIONS, which are used to REPRODUCE an EQUIVALENT EXECUTION ENVIRONMENT on AWS and FG]
Use case
• Annotations
  • Workflow
  • Software
  • Computational resources
Use case
• Results
  • 2 PRECIP scripts (AWS and FG): each creates a VM, deploys and configures the software, and executes the WF.
  • Successfully executed
  • Same results as the original WF
Conclusions
• Semantic modelling approach to conserve computational resources
• PRECIP for reproducing the execution environment on the Cloud
• WICUS annotations + PRECIP scripting capabilities
• Apply those ideas to Montage on AWS EC2 and FG
• Assume that the binaries are available
Future Work
• Apply to other workflows
• Improve the annotation process
• Extend WICUS ontology network (new release coming soon)
  • Software variants
  • Low-level libraries
  • Incompatibilities
  • User policies
Questions