Climate Science for a Sustainable Energy Future Provenance

December 26, 2012 1 Climate Science for a Sustainable Energy Future (CSSEF) Provenance ERIC STEPHAN Pacific Northwest National Laboratory Richland, WA


Invited talk at the Earth System Grid Federation workshop. My web page: http://www.linkedin.com/in/ericstephan My citations: http://scholar.google.com/citations?hl=en&user=f4bH2esAAAAJ

Transcript of Climate Science for a Sustainable Energy Future Provenance

Page 1: Climate Science for a Sustainable Energy Future Provenance


Climate Science for a Sustainable Energy Future (CSSEF) Provenance ERIC STEPHAN Pacific Northwest National Laboratory Richland, WA

Page 2: Climate Science for a Sustainable Energy Future Provenance

Provenance Definitions

!   Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. https://dvcs.w3.org/hg/prov/raw-file/tip/presentations/wg-overview/overview/index.html

!   Metadata used to describe the origin of the data and any of its modifications.

!   A log of historical events describing the origin of data and any subsequent changes.
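As a rough illustration of the definitions above, a provenance record can be sketched as a list of (subject, predicate, object) statements. The predicates below are PROV-O property names. Every `ex:` resource, the `PROVENANCE` list, and the helper functions `objects` and `lineage` are hypothetical and were made up for this sketch. They are not taken from the talk.

```python
# Minimal sketch: provenance as (subject, predicate, object) statements.
# Predicates are PROV-O property names; all "ex:" resources are hypothetical.
PROVENANCE = [
    ("ex:derived_dataset", "prov:wasDerivedFrom", "ex:source_dataset"),
    ("ex:derived_dataset", "prov:wasGeneratedBy", "ex:processing_run"),
    ("ex:processing_run", "prov:used", "ex:source_dataset"),
    ("ex:processing_run", "prov:wasAssociatedWith", "ex:developer"),
    ("ex:source_dataset", "prov:wasAttributedTo", "ex:institution"),
]


def objects(subject, predicate, statements=PROVENANCE):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def lineage(entity, statements=PROVENANCE):
    """Follow prov:wasDerivedFrom links back toward the original sources."""
    chain, frontier, seen = [], [entity], {entity}
    while frontier:
        current = frontier.pop(0)
        for parent in objects(current, "prov:wasDerivedFrom", statements):
            if parent not in seen:  # guard against cycles in the record
                seen.add(parent)
                chain.append(parent)
                frontier.append(parent)
    return chain


if __name__ == "__main__":
    print(lineage("ex:derived_dataset"))
    print(objects("ex:derived_dataset", "prov:wasGeneratedBy"))
```

Here `lineage` answers the "origin" part of the definitions, while the other statements record who and what was involved in each modification.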


Page 3: Climate Science for a Sustainable Energy Future Provenance


Popular Provenance Vocabularies

See also: W3C Incubator Group, http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki

Open Provenance Model

The Provenance Ontology (PROV-O)

Proof Markup Language Ontology

Dublin Core Provenance Task Force

Page 4: Climate Science for a Sustainable Energy Future Provenance


The Systems Science Challenge

!   Studying complex systems typically has the following characteristics:
    !   Interdisciplinary studies involve multiple stakeholders
    !   Leverage multiple tools, algorithms, data products, and sensors
    !   Reliant on highly iterative and repetitive techniques
    !   Steps are difficult to document and are often committed only to memory or notes.

!   Sharing complex systems data between collaborators has the following inherent problems:
    !   To establish data confidence, scientists accessing data (consumers) need to know data origin and modification history (data provenance).
    !   Scientists producing the data need a consistent means to convey data provenance to targeted scientific communities
        !   The data provenance needs to be diverse enough to support any data.
        !   It must also be based on community standards to support cross-referenced searches

Page 5: Climate Science for a Sustainable Energy Future Provenance


Example: Motivating User Questions About the CSSEFARMBE Diagnostics Dataset

CAM Modeler

!   How do CAM output variables map to the CSSEFARMBE variables?
!   What additional ancillary information is available about this dataset?

Atmosphere Scientist

!   How did both CSSEFARMBE and ARMBE originate?

Page 6: Climate Science for a Sustainable Energy Future Provenance


The Knowledge Gap: CSSEF Users Needing Additional Answers from Data Producers

[Diagram: CSSEFARMBE Developers are linked to Test NCL Code, ARMBE Header, CSSEF ARMBE Header, Tech Report, CF Terms, and CAM Web Page by edges labeled "wrote", "read", and "compared". The CAM Modeler and Atmosphere Scientist questions from the preceding slide appear alongside.]

Page 7: Climate Science for a Sustainable Energy Future Provenance


Goals of CSSEF Provenance Environment (ProvEn) Services

!   Identify future user communities that will need provenance while the data is being generated by the scientists producing it

!   Knowledge products (e.g., reports, archivable provenance records)

!   Create consumer-oriented provenance products by:
    !   Capturing historical information from any native source necessary to describe the origin of the dataset.
    !   Retaining, for user reference, a copy of the native source familiar to the domain community.

Page 8: Climate Science for a Sustainable Energy Future Provenance


Goals of CSSEF Provenance Environment (ProvEn) Services

Foundational Ontology                     Cross-Reference Capability
W3C Provenance Ontology (PROV-O)          Core ontology describing data origin
Dublin Core Terms                         Data citations and software
Friend of a Friend (FOAF)                 Description of scientist and collaborators
(Future) Proof Markup Language 3.0        Description of justification and trust
(Future) Dublin Core to PROV-O Mapping    Support integration of DC provenance and PROV-O

!   Store this information in a cross-referenced knowledge model by mapping domain ontology to foundational ontology
    !   Domain ontologies are diverse and subject to constant changes defined by the concepts extracted from native sources.
    !   Foundational ontologies are stable and seldom change.

!   Use composite knowledge model to provide finished products to different kinds of consumers
    !   Stability implies that many methodologies, tools, and services are available to leverage.
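The idea of mapping a changeable domain ontology onto stable foundational ontologies can be sketched as follows. This is only an illustration under stated assumptions: the `atm:` property names, the `ex:` resources, the `ALIGNMENT` table, and the `query` helper are all invented here and do not come from ProvEn. Only the `prov:` and `dcterms:` terms are real vocabulary terms.

```python
# Hypothetical domain-to-foundational alignment, in the spirit of an
# rdfs:subPropertyOf mapping. The "atm:" property names are invented.
ALIGNMENT = {
    "atm:mappedFromVariable": "prov:wasDerivedFrom",
    "atm:computedByScript": "prov:wasGeneratedBy",
    "atm:describedInReport": "dcterms:references",
}

# Domain-level statements, as they might be extracted from native sources.
DOMAIN_STATEMENTS = [
    ("ex:dataset_variable", "atm:mappedFromVariable", "ex:model_variable"),
    ("ex:dataset", "atm:computedByScript", "ex:script_run"),
    ("ex:dataset", "atm:describedInReport", "ex:tech_report"),
]


def query(foundational_predicate, statements=DOMAIN_STATEMENTS, alignment=ALIGNMENT):
    """Return statements whose predicate is, or aligns to, the given term."""
    return [
        (s, p, o)
        for s, p, o in statements
        if p == foundational_predicate or alignment.get(p) == foundational_predicate
    ]


if __name__ == "__main__":
    # A consumer who only knows PROV-O can still find domain-specific facts.
    for statement in query("prov:wasDerivedFrom"):
        print(statement)
```

In this arrangement a consumer searches with the stable foundational term, and a change in the domain vocabulary only requires updating the alignment table, not the consumer's queries.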

Page 9: Climate Science for a Sustainable Energy Future Provenance


Identifying a New Product with Native Sources, Domain Concepts and Terms for dataset

[Diagram: the native sources (Test NCL Code, ARMBE Header, CSSEF ARMBE Header, Tech Report, CF Terms, CAM Web Page) are annotated with two kinds of extracted content: Observational Data Origin Concepts, and Identified Variable Mapping Concepts and Terms.]

Page 10: Climate Science for a Sustainable Energy Future Provenance


Creating and Maintaining Domain Ontologies (Knowledge Engineer)

[Diagram: Atmosphere Diagnostics dataset origin/mapping terms and concepts feed an Atmosphere Domain Ontology (Build Ontology). The domain ontology and the Foundational Ontologies are combined (Align Ontologies) into an Aligned Knowledge Model for Atmosphere. Edges labeled "Register" and "Add" connect to ProvEn Services.]

Page 11: Climate Science for a Sustainable Energy Future Provenance


Creating new Product By Populating ProvEn Services with CSSEFARMBE Dataset Native Sources

[Diagram: CSSEFARMBE Developers contribute native sources (Test NCL Code, ARMBE Header, CSSEF ARMBE Header, Tech Report, CF Terms, CAM Web Page). After Native Source Concept Extraction, ProvEn Services holds the Foundational Ontologies, the Aligned Knowledge Model for Atmosphere, Native Provenance Mapped to Atmosphere Domain Ontology, and a Copy of Corresponding Native Sources with Native Source References. The result is CSSEFARMBE knowledge relevant to the CAM Modeler and Atmosphere Scientist.]

Page 12: Climate Science for a Sustainable Energy Future Provenance


Producing ProvEn Services Product: CSSEFARMBE Dataset Origin Report

[Diagram: the ProvEn Services Store (Foundational Ontologies, Aligned Knowledge Model for Atmosphere, Native Provenance Mapped to Atmosphere Domain Ontology) supports Standard Vocabulary Cross-Reference Searching and Reasoning to answer consumer questions. CAM Modeler: "What additional ancillary information is available about this dataset?" Atmosphere Scientist: "How did both CSSEFARMBE and ARMBE originate?"]

Page 13: Climate Science for a Sustainable Energy Future Provenance

ProvEn Services Architecture

[Diagram: ProvEn (Jersey) REST Services run on a Glassfish Server on top of a Sesame Store. Two interfaces are shown: the AliBaba Object to RDF API (Store Native Provenance) and a Searching and Inferencing API (Query and Cross-Reference Provenance). The services are packaged as a Portable Jarfile with a "Deploy" edge to an ESGF Node, a Local Compute Cluster, and UVCDAT.]

Page 14: Climate Science for a Sustainable Energy Future Provenance

Questions?

!   Contact: [email protected]
