From Provenance Standards and Tools to Queries and Actionable Provenance
Standard Provenance Reporting and Scientific Software Management in Virtual Laboratories
Catherine Wise, Nicholas J Car, Ryan Fraser and Geoff Squire
Data61 and LAND & WATER
Standard Provenance Reporting and Scientific Software Management in Virtual Labs
Outline
• What are VLs?
• What is VHIRL?
• What is provenance?
• How does VHIRL manage provenance (or not)?
• How do we represent VHIRL’s actions using standardised provenance?
• What work, other than representation, is needed for provenance?
• What benefits do we get from this work?
What are VLs?
From https://nectar.org.au/virtual-laboratories-1, they provide data repositories and computational tools, streamlining research workflows.
What is VHIRL?
• Virtual Hazards Impact & Risk Laboratory (VHIRL) is a scientific workflow portal
• Gives researchers access to cloud computing for natural hazards research
• Uses data from a variety of sources
• Uses cloud computing resources
• Currently has tools for earthquakes, tsunamis & tropical cyclones in the Asia-Pacific region
Components of the Virtual Lab: Virtual Hazard Impact & Risk Laboratory (VHIRL)
Figure: component groups Data Services, Processing Services, Compute Services, Enablers, Virtual Laboratories/Apps and Data Analytics. Labelled components: Magnetics, Gravity, DEM, Landsat, Bathymetry, eScript, ANUGA, TCRM, NCI Petascale, NCI Cloud, NeCTAR Cloud, Amazon Cloud, Desktop, Service Orchestration, Provenance, Metadata, Auth., Coastal Inundation, Tsunami Inundation Scenario, Cyclone Wind Path Calculation, Cyclone Wind Model, Surface Wave Propagation (earthquake).
Connectivity via Provenance | Melanie Ayre | eResearch Australasia 2015, Brisbane
What is provenance?
From http://en.wikipedia.org/wiki/Provenance#Computer_Science:
“Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”
How do we represent VLs using standardised provenance?
VHIRL’s own data management
• Natively tracks ‘everything’ used for scenario (re)runs
• Is not a data store, software repository or records management system
• Externalises as much information management as possible
• Code managed by the SSSC
Scientific Solutions Software Centre (SSSC)
• SSSC is a web-based system to manage code & dependencies
• Contains Problems & Solutions that define a workflow
• Each Solution consists of a Toolbox
• Toolboxes are code wrapped in a Python script, plus a description of the required inputs
Figure: Class diagram for the SSSC
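The Problem/Solution/Toolbox structure described above can be sketched as plain data classes. This is a minimal illustration only: the class and field names (`Problem`, `Solution`, `Toolbox`, `wrapper_script`, `required_inputs`, `missing_inputs`) are hypothetical and are not taken from the SSSC class diagram, and we assume a Problem may have several candidate Solutions.

```python
from dataclasses import dataclass, field


@dataclass
class Toolbox:
    """Scientific code wrapped in a Python script, plus a description of its inputs."""
    name: str
    wrapper_script: str                                   # hypothetical: the wrapping Python script
    required_inputs: dict = field(default_factory=dict)   # input name -> description

    def missing_inputs(self, supplied: dict) -> list:
        """Names of required inputs not yet supplied, in declaration order."""
        return [name for name in self.required_inputs if name not in supplied]


@dataclass
class Solution:
    """One way of solving a Problem; each Solution consists of a Toolbox."""
    name: str
    toolbox: Toolbox


@dataclass
class Problem:
    """A research question; assumed here to have one or more candidate Solutions."""
    name: str
    solutions: list = field(default_factory=list)


# Usage with made-up values (ANUGA is one of VHIRL's processing services).
anuga = Toolbox("ANUGA", "run_anuga.py",
                {"dem": "Elevation grid (netCDF)", "boundary": "Boundary conditions"})
problem = Problem("Tsunami inundation scenario",
                  [Solution("Tsunami inundation with ANUGA", anuga)])
print(problem.solutions[0].toolbox.missing_inputs({"dem": "grid.nc"}))  # ['boundary']
```

Keeping the input descriptions on the Toolbox means a portal can check a user's run configuration before launching anything.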
Scientific Solutions Software Centre (SSSC)
• Beautiful, RESTful API; this example: http://vhirl-dev.csiro.au/scm/toolbox/2
• A Solution is modelled as a prov:Plan
• No RDF metadata, yet!
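Since the SSSC does not yet publish RDF, here is a hedged sketch of what "Solution = prov:Plan" could look like once it does. It emits Turtle by hand with the standard library; the function name `solution_as_plan` and all `example.org` URIs are hypothetical. The PROV-O terms (`prov:Plan`, `prov:qualifiedAssociation`, `prov:hadPlan`) are the standard W3C ones for saying that an agent carried out an activity by following a plan.

```python
import json

PREFIXES = ("@prefix prov: <http://www.w3.org/ns/prov#> .\n"
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n")


def solution_as_plan(solution_uri: str, label: str, run_uri: str, agent_uri: str) -> str:
    """Turtle typing an SSSC Solution as a prov:Plan and linking one run to it.

    The run is tied to its agent *and* the plan via PROV-O's qualified
    association (prov:qualifiedAssociation / prov:hadPlan).
    """
    # JSON string escaping is compatible with Turtle's double-quoted literals.
    literal = json.dumps(label, ensure_ascii=False)
    return PREFIXES + (
        f"<{solution_uri}> a prov:Plan ;\n"
        f"    rdfs:label {literal} .\n\n"
        f"<{run_uri}> a prov:Activity ;\n"
        f"    prov:wasAssociatedWith <{agent_uri}> ;\n"
        "    prov:qualifiedAssociation [\n"
        "        a prov:Association ;\n"
        f"        prov:agent <{agent_uri}> ;\n"
        f"        prov:hadPlan <{solution_uri}>\n"
        "    ] .\n"
    )


ttl = solution_as_plan("http://example.org/sssc/solution/1",
                       "Tsunami inundation with ANUGA",
                       "http://example.org/vl/run/456",
                       "http://example.org/vl/vhirl")
print(ttl)
```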
Mapping VHIRL to PROV 1
Figure: Input Data → Process → Output Data
Mapping VHIRL to PROV 2
Figure (“Ontology Design Pattern”): Code, Config and Input Data → Process → Output Data
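Mapping VHIRL to PROV 3
Figure: Code, Config and Input Data are used by the Process, which generates Output Data; agents labelled “Who/which system” and “Who” are attached to the Process and the Output Data. Relations shown: used, wasGeneratedBy, wasAssociatedWith, wasAttributedTo. PROV classes: Entity, Activity, Agent.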
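The pattern above can be expressed as a small generator of PROV-O triples. This sketch assumes the "who/which system" agent is linked to the Process with `prov:wasAssociatedWith` and the "who" agent to the Output Data with `prov:wasAttributedTo`, which is how those PROV relations are defined; the `Run` class and all `example.org` URIs are hypothetical.

```python
from dataclasses import dataclass, field

PROV_PREFIX = "@prefix prov: <http://www.w3.org/ns/prov#> .\n\n"


@dataclass
class Run:
    """One VHIRL process execution, expressed with the PROV pattern above."""
    process: str                                  # the Process (prov:Activity)
    code: str                                     # the Code (prov:Entity)
    config: str                                   # the Config (prov:Entity)
    inputs: list = field(default_factory=list)    # Input Data (prov:Entity)
    outputs: list = field(default_factory=list)   # Output Data (prov:Entity)
    system: str = ""                              # "who/which system" ran the Process (prov:Agent)
    person: str = ""                              # "who" the outputs are attributed to (prov:Agent)

    def triples(self) -> list:
        t = [(self.process, "a", "prov:Activity")]
        for e in [self.code, self.config, *self.inputs]:
            t += [(e, "a", "prov:Entity"), (self.process, "prov:used", e)]
        if self.system:
            t += [(self.system, "a", "prov:Agent"),
                  (self.process, "prov:wasAssociatedWith", self.system)]
        if self.person:
            t.append((self.person, "a", "prov:Agent"))
        for o in self.outputs:
            t += [(o, "a", "prov:Entity"), (o, "prov:wasGeneratedBy", self.process)]
            if self.person:
                t.append((o, "prov:wasAttributedTo", self.person))
        return t

    def to_turtle(self) -> str:
        def term(x: str) -> str:
            # 'a' and prov: CURIEs are written as-is; everything else is an absolute URI.
            return x if x == "a" or x.startswith("prov:") else f"<{x}>"
        return PROV_PREFIX + "".join(
            f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in self.triples())


run = Run(process="http://example.org/run/456",
          code="http://example.org/code/anuga",
          config="http://example.org/config/456",
          inputs=["http://eg-vl.org.au/dataset/123"],
          outputs=["http://example.org/output/789"],
          system="http://example.org/vl/vhirl",
          person="http://example.org/person/jane")
print(run.to_turtle())
```

Treating Code and Config as ordinary used Entities (rather than special properties) is what lets the same four relations cover every run.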
Mapping VHIRL to PROMS
Figure: Report N is linked to Reporting System X via reportingSystem and to the run's Activities via hadStartingActivity / hadEndingActivity. Legend: PROV classes (Entity, Activity, Agent) and PROMS classes (R.S. = Reporting System, Report).
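A PROMS Report wraps a run's PROV with a small header. The sketch below writes that header as Turtle using the property names shown on the slide (`reportingSystem`, `hadStartingActivity`, `hadEndingActivity`). The `proms:` namespace URI, the class names `proms:Report` and `proms:ReportingSystem`, the function name `proms_report` and the `example.org` URIs are assumptions for illustration; check them against the PROMS ontology before relying on them.

```python
import json

PREFIXES = ("@prefix prov: <http://www.w3.org/ns/prov#> .\n"
            "@prefix proms: <http://promsns.org/def/proms#> .\n"  # assumed PROMS-O namespace
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n")


def proms_report(report_uri: str, label: str, reporting_system: str,
                 starting_activity: str, ending_activity: str) -> str:
    """Turtle for a Report header that ties a run's PROV to its Reporting System."""
    return PREFIXES + (
        f"<{reporting_system}> a proms:ReportingSystem .\n\n"
        f"<{report_uri}> a proms:Report ;\n"
        f"    rdfs:label {json.dumps(label, ensure_ascii=False)} ;\n"
        f"    proms:reportingSystem <{reporting_system}> ;\n"
        f"    proms:hadStartingActivity <{starting_activity}> ;\n"
        f"    proms:hadEndingActivity <{ending_activity}> .\n\n"
        f"<{starting_activity}> a prov:Activity .\n"
        f"<{ending_activity}> a prov:Activity .\n"
    )


# A single-run report: the run is both the starting and the ending Activity.
report = proms_report("http://example.org/vl/report/N", "Report for Run 456",
                      "http://example.org/vl/vhirl",
                      "http://example.org/run/456", "http://example.org/run/456")
print(report)
```

Pointing only at the first and last Activities keeps the header small; the rest of the run is reachable by following the PROV links in the report body.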
Mapping VHIRL to PROMS
VHIRL provenance into PROMS Server
Figure: many Reports (Report N, Report M and others) from Reporting System X and Reporting System Y are reported to, and stored in, an Organisational Provenance Store. Legend: PROV classes (Entity, Activity, Agent) and PROMS classes (Reporting System, Report).
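A toy model of the organisational store's role: it accepts Reports from several Reporting Systems and can list them per system. This is an in-memory sketch under stated assumptions (reports kept as opaque text, keyed by report URI); the `ProvenanceStore` class and its methods are hypothetical, and a real PROMS server would load reports into a triplestore so they can be queried as one graph.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy organisational provenance store receiving Reports from many Reporting Systems."""

    def __init__(self):
        self._reports = {}                    # report URI -> serialised report
        self._by_system = defaultdict(list)   # reporting system URI -> [report URIs]

    def receive(self, report_uri: str, reporting_system: str, body: str) -> None:
        # Reports are ID'd and persisted, so a second report with the same URI is rejected.
        if report_uri in self._reports:
            raise ValueError(f"report already stored: {report_uri}")
        self._reports[report_uri] = body
        self._by_system[reporting_system].append(report_uri)

    def reports_from(self, reporting_system: str) -> list:
        return list(self._by_system.get(reporting_system, []))

    def get(self, report_uri: str) -> str:
        return self._reports[report_uri]


store = ProvenanceStore()
store.receive("http://example.org/report/N", "http://example.org/rs/X", "@prefix prov: <http://www.w3.org/ns/prov#> .")
store.receive("http://example.org/report/M", "http://example.org/rs/Y", "@prefix prov: <http://www.w3.org/ns/prov#> .")
store.receive("http://example.org/report/N2", "http://example.org/rs/X", "@prefix prov: <http://www.w3.org/ns/prov#> .")
print(store.reports_from("http://example.org/rs/X"))
```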
Modelling VHIRL’s data types
Figure: a VL Run with its inputs (managed data, web service data, user-supplied data, managed code, user-supplied code) and its output data; the user and The VL are linked by actedOnBehalfOf, and Report N is linked to The VL via reportingSystem.
PROMS Reporting Toolkits
VHIRL’s native PROV output
RDF file
What work, other than representation, is needed for provenance?
Provenance effort (step) pyramid
Figure: three steps: Data Management, Establishing Reporting and Continued Reporting.
Data Management
• managed data, web service data, user-supplied data, managed code, user-supplied code and output data: all Entities need to be ID’d (via URI) and persisted
• VL Run: each VL run is reported as an Activity within a Report
• The VL: each VL instance has/needs an ID and is modelled as a Reporting System
• user: each VL user is known by their login (account) details and is modelled as a Reporter
• Report N: each VL Report is ID’d and persisted in the VL Provenance Store
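"All Entities need to be ID'd (via URI) and persisted" can be illustrated with a tiny registry that mints an opaque URI per Entity and records its metadata. This is a sketch only: the `EntityRegistry` class, the `<base>/<kind>/<uuid>` URI pattern and the in-memory `records` dict (standing in for durable storage) are all assumptions.

```python
import uuid


class EntityRegistry:
    """Mint a URI for each Entity and keep a record of it (in memory here)."""

    def __init__(self, base_uri: str):
        self.base_uri = base_uri.rstrip("/") + "/"
        self.records = {}   # URI -> metadata; stand-in for durable storage

    def register(self, kind: str, **metadata) -> str:
        """Mint an opaque URI such as <base>/<kind>/<uuid> and persist its metadata."""
        uri = f"{self.base_uri}{kind}/{uuid.uuid4()}"
        self.records[uri] = {"kind": kind, **metadata}
        return uri

    def resolve(self, uri: str) -> dict:
        return self.records[uri]


registry = EntityRegistry("http://example.org/vl")
grid_uri = registry.register("dataset", title="Grid X", description="netCDF grid of property X")
print(grid_uri, registry.resolve(grid_uri)["title"])
```

Random UUIDs keep identifiers opaque and stable once minted; a content hash would instead let identical user-supplied files share one URI, at the cost of tying the ID to the bytes.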
Data Management
Figure: the same data types (managed data, web service data, user-supplied data, managed code, user-supplied code, output data), annotated with how each is, or will be, managed:
• VL ID’d and persisted
• cited using PROMS-O format
• soon to be VL ID’d and persisted, with minimal metadata recorded too
• SSSC ID’d and persisted
• perhaps SSSC ID’d and persisted, perhaps VL managed
• soon to be VL ID’d and persisted, if required, perhaps with time limits
Virtual Labs Service Citation Example
Format: [{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}
[1] “Subset of elevation” http://pid.csiro.au/service/anuga-thredds “bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride” “2014-12-15T13:15:11” http://pid.csiro.au/dataset/abcd1234
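The citation format above is mechanical enough to generate. A minimal sketch, assuming the quoting seen in the example (title, query and time in quotes; URIs bare); `cite_service` is a hypothetical helper and the query is shortened for readability.

```python
from datetime import datetime


def cite_service(ref: int, title: str, endpoint: str, query: str,
                 queried_at: datetime, cached_copy: str) -> str:
    """[{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}"""
    when = queried_at.isoformat(timespec="seconds")
    return f"[{ref}] “{title}” {endpoint} “{query}” “{when}” {cached_copy}"


citation = cite_service(1, "Subset of elevation",
                        "http://pid.csiro.au/service/anuga-thredds",
                        "bussleton.nc?var=elevation",          # query shortened for the example
                        datetime(2014, 12, 15, 13, 15, 11),
                        "http://pid.csiro.au/dataset/abcd1234")
print(citation)
```

Recording the time queried and a cached copy ID matters because a live service may return different data tomorrow; the cached copy is what the run actually used.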
Establishing Reporting
Figure: VL Report → Provenance Reporting Toolkit (C#, Java, Python) → Organisational Provenance Store, which supports querying & redelivery.
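Whatever the language, a reporting toolkit ultimately has to deliver each Report to the store over HTTP. The sketch below uses only Python's standard library; the plain POST with a `text/turtle` body is an assumption about the store's interface, not the documented PROMS API, and `build_report_request` / `send_report` are hypothetical names. The endpoint is the example one used on the slides.

```python
import urllib.request


def build_report_request(store_endpoint: str, turtle: str) -> urllib.request.Request:
    """Prepare an HTTP POST that delivers a Turtle-serialised Report to a provenance store."""
    return urllib.request.Request(
        store_endpoint,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def send_report(store_endpoint: str, turtle: str, timeout: float = 10.0) -> int:
    """Send the report and return the HTTP status code (performs a network call)."""
    req = build_report_request(store_endpoint, turtle)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Build (but do not send) a request for the example store endpoint.
req = build_report_request("http://provstore.vl.org.au/report/",
                           "<http://example.org/report/N> a <http://example.org/Report> .")
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the serialisation testable without a live store.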
Establishing Reporting - Reporting Toolkits
Figure labels: managed data (“Grid X”), web service data (“Service Y”), VL Run (“Run 456”), Agent N, Report N (“Report for Run 456”).
# Entity, ServiceEntity, Activity, Report and ReportSender are classes from the Python PROMS reporting toolkit
e1 = Entity(title='Grid X',
            description='netCDF grid of property X',
            uri='http://eg-vl.org.au/dataset/123',
            downloadURL='http://eg-vl.org.au/dataset/123?_view=dl',
            wasAttributedTo='http://data.ga.gov.au/id/person/john.doe')
e2 = ServiceEntity(title='Subset of elevation',
                   description='5km solar radiation interpolated raster service',
                   serviceBaseUri='http://siss2.anu.edu.au/anuga/busselton.nc',
                   query='var=elevation&spatial=bb&north=-33.06495205&south=-33.551573283&west=114.84967874&east=115.70661233&temporal=all&time_start=&time_end=&horizStride',
                   queriedAtTime='2014-12-15T13:15:11',
                   cachedCopy='http://bom.gov.au/dataset/678')
a0 = Activity(title='Run 456',
              description='Upper bound run, full Grid X use',
              usedEntities=[e1, e2])
# wasAssociatedWith, startedAtTime, endedAtTime and generatedEntities are added automatically by the VL
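The Activity fields the slide marks as added automatically by the VL (wasAssociatedWith, startedAtTime, endedAtTime, generatedEntities) are the ones a portal can capture around the run itself. A hedged sketch of one way to do that: `recorded_run` is a hypothetical context manager working on a plain dict, not the toolkit's `Activity` class, and the `example.org` URIs are placeholders.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def recorded_run(activity: dict, agent_uri: str):
    """Fill in the fields the VL adds automatically around a run (hypothetical helper)."""
    activity["wasAssociatedWith"] = agent_uri
    activity["startedAtTime"] = datetime.now(timezone.utc)
    activity.setdefault("generatedEntities", [])
    try:
        yield activity["generatedEntities"]   # the run appends output entity URIs here
    finally:
        # Recorded even if the run fails, so failed runs are reported too.
        activity["endedAtTime"] = datetime.now(timezone.utc)


a0 = {"title": "Run 456",
      "description": "Upper bound run, full Grid X use",
      "usedEntities": ["http://eg-vl.org.au/dataset/123"]}
with recorded_run(a0, "http://example.org/vl/vhirl") as outputs:
    outputs.append("http://example.org/vl/output/789")   # stand-in for the real computation
print(a0["startedAtTime"] <= a0["endedAtTime"], a0["generatedEntities"])
```

Capturing these values in the portal rather than asking the user for them is what makes the reports trustworthy: times and outputs come from the system that ran the job.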
r0 = Report(title='Report for Run 456',
            description='Upper bound run, full Grid X use')
# startingActivity and endingActivity are added automatically by the VL
rs0 = ReportSender('http://provstore.vl.org.au/report/')
rs0.send(r0)
What do we get from this work?
Graph power!
Figure: Report N, Reporting System X, ...
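Because every Report links Entities, Activities and Agents by URI, reports from separate runs join into one graph. As a hedged illustration of what that enables, the sketch below walks `prov:wasGeneratedBy` and `prov:used` edges backwards from an output to find everything it depends on; triples are plain tuples, short names stand in for URIs, and `lineage` is a hypothetical helper. In a triplestore holding the reports, the same question could be asked with a SPARQL 1.1 property path such as `<output> (prov:wasGeneratedBy|prov:used)+ ?upstream`.

```python
from collections import deque


def lineage(triples, entity):
    """All entities and activities upstream of `entity` (breadth-first search)."""
    upstream = {"prov:wasGeneratedBy", "prov:used"}
    edges = {}
    for s, p, o in triples:
        if p in upstream:
            edges.setdefault(s, []).append(o)
    seen, queue = set(), deque([entity])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Two runs chained through an intermediate dataset (short names stand in for URIs).
triples = [
    ("out1", "prov:wasGeneratedBy", "run2"),
    ("run2", "prov:used", "mid"),
    ("mid", "prov:wasGeneratedBy", "run1"),
    ("run1", "prov:used", "gridX"),
    ("run1", "prov:wasAssociatedWith", "vhirl"),
]
print(sorted(lineage(triples, "out1")))  # ['gridX', 'mid', 'run1', 'run2']
```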
URI power!
Figure: URIs in Report N and Reporting System X point to a corporate staff DB, a temp repo, a public web service, a DAP-style repo and a PROMS instance.
Distributed graphs!
Figure: GA PROMS instance, VL PROMS instance and Uni Prov Store; distributed querying via endpoint cache.
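One reading of "distributed querying via endpoint cache" is: send the same query to each provenance endpoint, cache each endpoint's answer for a while, and merge the results. The sketch below assumes that reading. `EndpointCache` is hypothetical, the `fetch` callable stands in for a real SPARQL client, and the fake endpoints `"ga"` and `"vl"` stand in for the GA and VL PROMS instances.

```python
import time
from typing import Callable, Dict, List, Tuple


class EndpointCache:
    """Query several provenance endpoints, caching each endpoint's answer for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str, str], list], ttl: float = 300.0):
        self.fetch = fetch   # fetch(endpoint, query) -> rows, e.g. a SPARQL client call
        self.ttl = ttl
        self._cache: Dict[Tuple[str, str], Tuple[float, list]] = {}

    def query(self, endpoint: str, q: str) -> list:
        key = (endpoint, q)
        hit = self._cache.get(key)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]                      # fresh enough: no network round trip
        rows = self.fetch(endpoint, q)
        self._cache[key] = (time.monotonic(), rows)
        return rows

    def query_all(self, endpoints: List[str], q: str) -> list:
        """Union of every endpoint's rows, de-duplicated, in first-seen order."""
        merged = []
        for ep in endpoints:
            for row in self.query(ep, q):
                if row not in merged:
                    merged.append(row)
        return merged


# Usage with a fake fetcher standing in for remote PROMS instances.
calls = []
fake_data = {"ga": [("report-1",), ("report-2",)], "vl": [("report-2",), ("report-3",)]}


def fake_fetch(endpoint, q):
    calls.append(endpoint)
    return fake_data[endpoint]


cache = EndpointCache(fake_fetch, ttl=60)
first = cache.query_all(["ga", "vl"], "SELECT ?r WHERE { ?r a proms:Report }")
second = cache.query_all(["ga", "vl"], "SELECT ?r WHERE { ?r a proms:Report }")
print(first, len(calls))  # [('report-1',), ('report-2',), ('report-3',)] 2
```

The cache trades freshness for speed: within `ttl` seconds a repeated cross-institution query costs no network calls, which matters when endpoints such as a university store are slow or intermittently available.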