“provenance”
DATA TRACK
Chair: Krystyna Marek
Rapporteur: Wolfram Horstmann
6th e-Infrastructure Concertation, Lyon
24 Nov 2008
Motivation
• Last two meetings were on standards
• It was proposed to have a more focussed discussion
  – Focus on practice and interoperability rather than standards
• Select an arbitrary but important topic
Notions of Provenance
• Where do data objects* originate from?
  – Scientific work: examples
    • Instrumentation techniques
      – Manufacturers of hard- and software
    • Methodologies
      – Processes, e.g. gene sequencing
  – Technical/Local: examples
    • (web)-identifiers
    • Database, repository name
* Primary data, documents, metadata …
Why Provenance?
• Quoting / Citing / Referencing as a global scientific principle
  – “Reproducible research”
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in Grid-Environments [1]
Provenance & Interoperability
• Re-Use / Sharing: “Addressing/Accessing”
  – Common view, common use
  – Unidirectional: No change of data objects!
• Federation: “Discovering in Context”
  – Remote representation of distributed data objects (DOs)
• Aggregation: “Contextualizing”
  – Add unchanged object in a context
• Processing/Annotation: “Changing”
  – Uni- vs. Bidirectional: Change of DOs and remote representation vs. back-storage (e.g. CVS)
IVOA
• Astronomy area: Repositories use OAI-PMH to provide general
• Provenance as a kind of metadata
  – “Observation data model”
  – History of data (process “lineage”)
    • Processing
    • Configuration: telescope, camera
    • Ambient conditions: temperature etc.
  – Versioning is included (also algorithms etc.)
MetaFor
• Data from numerical models
• Descriptive information from model
• Models are often transformed
• Database / Registry for models in distributed repositories
D4Science
• Framework for
• More than simple import framework
• Graphs representing provenance information
  – Thematic: fishing site / statistic /
DRIVER
• Focus on document repositories
  – Some 100 …
• Simple Provenance
  – OAI-PMH
• Further (2nd order) Provenance
  – OAI-PMH (“about”): repository identifiers
  – Enhanced Publications >> OAI-ORE
• Semantic Model (named graphs) representing packages of documents and data objects
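To make the OAI-PMH “about” provenance concrete, the following is a minimal sketch of reading the provenance container that a harvester may attach to a record. The element and attribute names follow the OAI-PMH provenance guideline schema; the sample XML, the repository URL, the identifier, and the helper name `origin_chain` are hypothetical and only serve as illustration. It is not taken from the DRIVER implementation.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH provenance schema (OAI-PMH implementation guidelines).
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Hypothetical <about> container of a harvested record; all values are invented.
SAMPLE_ABOUT = """
<about>
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-20T10:00:00Z" altered="false">
      <baseURL>http://repository.example.org/oai</baseURL>
      <identifier>oai:repository.example.org:1234</identifier>
      <datestamp>2008-10-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def origin_chain(about_xml):
    """Return all origin descriptions in document order (outermost first)."""
    root = ET.fromstring(about_xml.strip())
    chain = []
    for desc in root.iter(f"{{{PROV_NS}}}originDescription"):
        chain.append({
            "base_url": desc.findtext(f"{{{PROV_NS}}}baseURL"),
            "identifier": desc.findtext(f"{{{PROV_NS}}}identifier"),
            "harvest_date": desc.get("harvestDate"),
            # The attribute is a string in the XML; convert it to a boolean.
            "altered": desc.get("altered") == "true",
        })
    return chain


if __name__ == "__main__":
    for origin in origin_chain(SAMPLE_ABOUT):
        print(origin["base_url"], origin["identifier"], origin["altered"])
```

Nested `originDescription` elements, as produced when a record passes through several aggregators, would show up as additional entries in the returned list, which is one way to read the “2nd order” provenance mentioned above.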
Solutions
• Provenance
  – Registries for curator, publisher etc.
  – Resolving over registry
• Diversity of approaches
  – CIDOC-CRM, OPM, EuroStats
  – Languages: RDF / OAI-ORE
Differentiations
• Expertise from Data-Centers as opposed to Data-Providers
  – Infrastructures should provide functions to add provenance information (but do not)
  – e.g. EGEE provides an additional module for recording provenance data
Hot topics
• Propagating provenance: versioning
• Disambiguation / Deduplication
  – different identical objects
• Who provides the data?
  – Each processing step should provide at least some metadata
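The hot topics above can be illustrated with a small sketch: each processing step appends minimal metadata to a lineage list, and identical objects held in different places are recognised by a content fingerprint. Everything here is an assumption made for illustration: the class `DataObject`, the functions `apply_step` and `deduplicate`, and the choice of SHA-256 as fingerprint do not come from any of the projects discussed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataObject:
    content: bytes
    lineage: list = field(default_factory=list)  # one entry per processing step

    @property
    def fingerprint(self):
        # Content hash: identical objects stored in different places share it.
        return hashlib.sha256(self.content).hexdigest()


def apply_step(obj, name, func, version):
    """Run one processing step and record minimal metadata about it."""
    # Copy the lineage so the input object stays unchanged.
    result = DataObject(func(obj.content), list(obj.lineage))
    result.lineage.append(
        {"step": name, "version": version, "input": obj.fingerprint}
    )
    return result


def deduplicate(objects):
    """Keep the first object seen for each fingerprint."""
    unique = {}
    for obj in objects:
        unique.setdefault(obj.fingerprint, obj)
    return list(unique.values())


if __name__ == "__main__":
    raw = DataObject(b"abc")
    mirror = DataObject(b"abc")  # same content held by another repository
    processed = apply_step(raw, "uppercase", bytes.upper, "1.0")
    print(processed.lineage)
    print(len(deduplicate([raw, mirror, processed])))
```

Recording the version of each step alongside the input fingerprint is what lets provenance propagate across versions: a later reader can tell which algorithm version produced which derived object.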
Recommendations for Infrastructure
• Standards for Provenance: Non-existing?
  – Each processing step should provide at least some metadata
  – Look deeper into specific implementations in subject communities
• Technical point to point organisation
  – Bilateral
• Programming a meeting
  – 24/25th ESA: earth science meeting?