Harmonization of vocabularies for water data

Jonathan Yu | Research engineer

HIC 2014, 17 August 2014

LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP

Outline

• Context and problem space – need formal mechanisms for publishing vocabularies

• Use of semantic web tech to publish and harmonise vocabularies

• Challenges still exist

• conceptualisation as both classes and individuals – pragmatic but problematic

• URI patterns

• Versioning and keeping track

• Suggested paths forward?

Issues

• Formalization

• RDF SKOS OWL

• Collections

• Re-use/clone/leave alone

• URI Patterns

• Distribution

• UIs/APIs

• Versioning

• Mappings

• Search and discovery

Formalization: classic glossary – term+definition

CABI - http://www.cabi.org/ashc/uploads/file/ASHC/8_Glossary__acronyms__index_revised.pdf

AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use

cas_rnnumber | ANGDTS Code | ANGDTS Description | Units_used | WDTF Parameter | chemical name | ADWG name | IUPAC name | Group | Ion

EC | EC | ease at which conduction current can be caused to flow through material in microSiemens/centimetre | us/cm ms/cm mg/L | ElectricalConductivityAt25C_uScm | Electrical Conductivity | Conductivity
PH | pH | negative logarithm of hydrogen ion concentration in ph units | pH units | WaterpH_pH | pH | pH | pH, alkalinity, acidity
16887-00-6 | 16887-00-6 | concentration of chloride as Cl in milligrams/litre | mg/L mg/kg | Chloride | Chloride | Chloride | Anion
TDS | TDS | the portion of total solids that passes through filter and deemed to have been dissolved in sample in milligrams/litre | mg/L | Total Dissolved Solids | Total Dissolved Solids | Salinity
TOTALALKALINITY | ALKT | concentration in milligrams/litre CaCO3 of titratable bases using a methyl-orange endpoint of about pH 4.3 | mg/L | Total Alkalinity (as CaCO3) | pH, alkalinity, acidity
HARDNESS_CACO3 | HARD | the ability of water to precipitate soap and is sum of calcium and magnesium concentrations as milligrams/litre CaCO3 | mg/L | Hardness (as CaCO3) | Hardness (as calcium carbonate) | Hardness (as calcium carbonate)
SAR | SAR | ratio of sodium to magnesium and calcium and used to assess risk of excess sodium in irrigation water | Ratio | Sodium Adsorption Ratio | Salinity
3812-32-6 | ALKC | alkalinity ascribed to carbonate in milligrams/litre CO3 | mg/L %MOL | Carbonate Alkalinity (as CaCO3) | Carbonate | pH, alkalinity, acidity
NITRATE | 14797-55-8 | concentration of nitrate as N in milligrams/litre | mg/L mg/kg | Nitrate | Nitrate and Nitrite | Nitrate and Nitrite | Anion
7439-89-6 | 7439-89-6 | concentration of iron as Fe in milligrams/litre | mg/L mg/kg ug/L | Iron | Iron | Metal | Cation

Formalization: table – structure + mappings

Healthy Headwater - NGIS Terms
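A table like this can be read as a lookup from one scheme's code to another scheme's term. The following is a minimal sketch in Python, not the project's actual tooling. It assumes that the first cell of a row is the local code and that the WDTF parameter cell has been identified correctly for the two rows used. The function name `to_wdtf` is hypothetical.

```python
# Two rows transcribed from the table above; the column assignment is an assumption.
LOCAL_TO_WDTF = {
    "EC": "ElectricalConductivityAt25C_uScm",
    "PH": "WaterpH_pH",
}


def to_wdtf(local_code):
    """Return the WDTF parameter name for a local code, or None if unmapped."""
    return LOCAL_TO_WDTF.get(local_code.strip().upper())


print(to_wdtf("ec"))
print(to_wdtf("TDS"))
```

An unmapped code returns None rather than a guess, which keeps gaps in the mapping visible.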

Formalization: RDF – SKOS for basic vocabularies

Linked Vocabularies | Simon Cox

chem:sodium
    a skos:Concept ;
    rdfs:label "sodium"^^xsd:string ;
    skos:broader chem:alkali ;
    skos:exactMatch <http://dbpedia.org/resource/Sodium> ;
    skos:inScheme skos:chemicals ;
    skos:prefLabel "nátrium"@hu , "sodio"@it , "sodium"@fr , "sodium"@en .
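The language-tagged labels in the sodium example let a client choose a display label per user language. This is a rough sketch in Python, with the labels copied from the example. The function name `pref_label` and the fallback to English are assumptions made for illustration and are not part of SKOS.

```python
# Labels copied from the chem:sodium example, keyed by language tag.
PREF_LABELS = {"hu": "nátrium", "it": "sodio", "fr": "sodium", "en": "sodium"}


def pref_label(labels, lang, fallback="en"):
    """Return the label for `lang`, or the fallback-language label if missing."""
    if lang in labels:
        return labels[lang]
    # The fallback language is an assumption of this sketch, not a SKOS rule.
    return labels.get(fallback)


print(pref_label(PREF_LABELS, "hu"))
print(pref_label(PREF_LABELS, "de"))
```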

Formalization: RDFS/OWL add rich predicates

• Water Quality Vocabulary

Formalization: alignment with existing vocabularies (Water Quality extension to QUDT)

[Figure labels: QUDT, OP]

Formalization: link detailed model to SKOS access using SKOS API

Other approaches: OWL Class per concept

• deep subsumption hierarchy: SWEET, OBO

• intersecting constraints: CGI Lithology

Formalization challenge

• Sometimes formalized as OWL - usually as SKOS (example? SWEET / GEMET?)

• Class vs individuals (Example from QUDT?)

• Hybrid approaches exist – vocabulary as individuals of classes from an ontology but aligned with SKOS (Example from OP?)

• https://www.seegrid.csiro.au/wiki/Siss/VocabularyFormalizationInSKOS

Collections

skos:Collection – skos:member skos:Concept | skos:Collection

• A new collection can claim existing concepts as members

• Nested collections

skos:Concept – skos:inScheme skos:ConceptScheme

• Concepts assert their own membership

• No nesting

owl:Ontology

• No membership predicate

– rdfs:member? dct:hasPart?

void:Dataset, ldp:Container, reg:Register
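The difference between the two SKOS membership styles can be shown with a handful of triples. This is a minimal sketch in Python using plain tuples instead of an RDF library. Every `ex:` name is hypothetical. The rule that a member with members of its own counts as a nested collection is a simplification made for this sketch.

```python
# Triples as (subject, predicate, object); all ex: names are hypothetical.
TRIPLES = [
    ("ex:sodium", "skos:inScheme", "ex:chemicals"),      # concept asserts its own scheme
    ("ex:waterQuality", "skos:member", "ex:sodium"),     # collection claims a concept
    ("ex:waterQuality", "skos:member", "ex:nutrients"),  # nested collection
    ("ex:nutrients", "skos:member", "ex:nitrate"),
]


def members(collection, triples, seen=None):
    """Return the concepts reachable from a collection, following nesting."""
    seen = set() if seen is None else seen
    found = set()
    for s, p, o in triples:
        if s == collection and p == "skos:member" and o not in seen:
            seen.add(o)
            nested = members(o, triples, seen)
            # Simplification: anything with members of its own is a collection.
            found |= nested if nested else {o}
    return found


def schemes_of(concept, triples):
    """Return the schemes a concept declares itself to be in."""
    return {o for s, p, o in triples if s == concept and p == "skos:inScheme"}


print(sorted(members("ex:waterQuality", TRIPLES)))
print(schemes_of("ex:sodium", TRIPLES))
```

Here the collection's publisher states the membership triples, so concepts published elsewhere can be grouped without editing them. The `skos:inScheme` triple has to come from whoever describes the concept.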

Re-use: new collections from old – clone, or leave alone

• eReefs WQ vocabulary includes a subset of 330+ chemicals from 36000+ in ChEBI

• New resources in local namespace

• SKOS *Match predicate gives provenance, link to more detail

Clone or leave alone?

• Question of caching content vs federating queries/discovery of content

• Consider ChEBI – big

• Cache or just link to its definitions?

• Tradeoff between performance and convenience vs updating and synchronizing

• LDR allows registration of external resources

• New register = subset or combination of terms already published elsewhere?

URI Patterns – opaque?

What does the URL path imply?

http://vocab.nerc.ac.uk/collection/G04/current/008/

G04 ISO RoleCode, 008 Principal Investigator

http://resource.geosciml.org/classifier/ics/ischart/Pliocene

= Pliocene, URI supplied by GeoSciML, definition sourced from International Commission for Stratigraphy (ics), in the collection known as ‘International Stratigraphic Chart’ (ischart)

Semantics? Management? Set-membership?
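To see what a client could read from the path, the two URIs above can be split into segments. This is an illustrative sketch in Python using only the standard library. The helper name `path_segments` is hypothetical.

```python
from urllib.parse import urlparse


def path_segments(uri):
    """Split a URI into its host and its non-empty path segments."""
    parts = urlparse(uri)
    return parts.netloc, [seg for seg in parts.path.split("/") if seg]


for uri in (
    "http://vocab.nerc.ac.uk/collection/G04/current/008/",
    "http://resource.geosciml.org/classifier/ics/ischart/Pliocene",
):
    print(path_segments(uri))
```

The segments are easy to extract, but whether `current`, `ics` or `ischart` carry semantics, management information or set membership is a publisher convention. A client cannot safely infer it from the string alone.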

Versioning

• Individual items, set-as-a-whole

Versioning - 2

Are these the same thing? How can we tell? How can a machine tell?

http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE

http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene

http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene

http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene

Compare with http://resource.geosciml.org/classifier/ics/ischart/Pliocene

– URI for the concept

http://def.seegrid.csiro.au/sissvoc/isc2014/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Pliocene

– URI for a description of the concept (i.e. record), according to the 2014 version of the service

Care with version number in URI!
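A machine with no explicit mapping between the SWEET URIs can only guess from their structure. This is a minimal sketch of such a guess in Python. It assumes the first path segment is the version number and that the fragment is the local name. The helpers `describe` and `maybe_same` are hypothetical, and a match is a hint, not evidence of identity.

```python
import re
from urllib.parse import urlparse

SWEET_URIS = [
    "http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE",
    "http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene",
]


def describe(uri):
    """Pull host, leading version segment and fragment out of a URI."""
    parts = urlparse(uri)
    match = re.match(r"/(\d+(?:\.\d+)*)/", parts.path)
    return {
        "host": parts.netloc,
        "version": match.group(1) if match else None,
        "local_name": parts.fragment,
    }


def maybe_same(uri_a, uri_b):
    """Heuristic only: same host and same local name, ignoring case."""
    a, b = describe(uri_a), describe(uri_b)
    if not a["local_name"] or not b["local_name"]:
        return False
    return a["host"] == b["host"] and a["local_name"].lower() == b["local_name"].lower()


print([describe(u)["version"] for u in SWEET_URIS])
print(all(maybe_same(SWEET_URIS[0], u) for u in SWEET_URIS[1:]))
```

The heuristic has to ignore both the version and the file name, which changes from `time.owl` to `stateTimeGeologic.owl`. A version-free concept URI avoids this guessing.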

Versioning - 3

• Version info in item?

<http://vocab.nerc.ac.uk/collection/G04/current/008/> a skos:Concept ;
    skos:prefLabel "principalInvestigator" ;
    owl:versionInfo "1" ;
    dc:date "2012-07-04 10:56:53.0" .

• Version info in registration record?

Versioning

• How do we manage versions of definitions?

• Do we version a definition of an abstract concept?

• Does the definition of the concept change or does our understanding change?

• Version the set or individual items?

Distribution

• Vocabulary packaged in a file or page

http://resource.geosciml.org/vocabulary/timescale/isc2014.ttl

http://resource.geosciml.org/vocabulary/timescale/isc2014.html

• Dereference the URI for a resource in the vocabulary

http://resource.geosciml.org/classifier/ics/ischart/ (all)

http://resource.geosciml.org/classifier/ics/ischart/Cambrian

• SPARQL endpoint

http://resource.geosciml.org/sparql/isc2014

• Vocabulary service

http://def.seegrid.csiro.au/sissvoc/isc2014/collection
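For the SPARQL endpoint option, a client sends the query text as a `query` parameter in the request URL. This is a hedged sketch in Python that only builds the URL and does not send it. It assumes the endpoint listed above accepts GET requests and that labels are stored as `skos:prefLabel`. Neither is confirmed here, and `label_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ENDPOINT = "http://resource.geosciml.org/sparql/isc2014"
CONCEPT = "http://resource.geosciml.org/classifier/ics/ischart/Cambrian"


def label_query_url(endpoint, concept_uri):
    """Build a GET URL asking a SPARQL endpoint for a concept's preferred labels."""
    query = (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        f"SELECT ?label WHERE {{ <{concept_uri}> skos:prefLabel ?label }}"
    )
    # urlencode percent-encodes the query text so it is safe in a URL.
    return endpoint + "?" + urlencode({"query": query})


print(label_query_url(ENDPOINT, CONCEPT))
```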

Semantic web tech to publish vocabularies

• SISSVoc

Mappings

• Embed in vocabulary vs. store separately?

Mapping challenge

• Linking between ontologies – which to use? All or some?

• SKOS relations - exactMatch, closeMatch, narrowMatch, broadMatch

• OWL predicates - sameAs for individuals, equivalentClass for classes and equivalentProperty for properties

• Dublin core

• Prov-O

• VoID

• VOAF

• Linking between classes and individuals in OWL – logics-based reasoning support
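The choice of equivalence predicate listed above depends on what kind of resource sits at each end of the mapping. This is a small sketch of that decision in Python. The kind labels and the function name `mapping_predicate` are hypothetical. Returning None for mixed kinds is a design choice of this sketch, reflecting that linking a class to an individual has no direct equivalence predicate.

```python
# OWL equivalence predicates by resource kind, as listed on the slide.
EQUIVALENCE_PREDICATE = {
    "individual": "owl:sameAs",
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}


def mapping_predicate(kind_a, kind_b):
    """Suggest an equivalence predicate for two resources of the given kinds."""
    if kind_a != kind_b:
        # Mixed kinds (e.g. class vs individual): no predicate suggested here.
        return None
    if kind_a == "concept":
        return "skos:exactMatch"
    return EQUIVALENCE_PREDICATE.get(kind_a)


print(mapping_predicate("class", "class"))
print(mapping_predicate("concept", "concept"))
print(mapping_predicate("class", "individual"))
```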

Search and discovery

Cox, Simons, Yu | Observable property ontology

Standards…

• The standard ISO 8601 concerns dates, a common type of information used for data and documentation.

• March 5, 2014

• 2014-03-05

• 3/5/14

• 05/03/2014

• 5 Mar 2014

• Multiple representations but essentially one meaning
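The five date strings can be normalised to one ISO 8601 form once the format of each is known. This is a minimal sketch in Python using the standard library. The format string paired with each sample is an assumption: for example, `3/5/14` is read as month/day and `05/03/2014` as day/month. Month names are parsed with the default English locale.

```python
from datetime import datetime

# Each sample is paired with the format it is assumed to follow.
SAMPLES = [
    ("March 5, 2014", "%B %d, %Y"),
    ("2014-03-05", "%Y-%m-%d"),
    ("3/5/14", "%m/%d/%y"),
    ("05/03/2014", "%d/%m/%Y"),
    ("5 Mar 2014", "%d %b %Y"),
]


def to_iso(text, fmt):
    """Parse `text` with the given format and return an ISO 8601 date string."""
    return datetime.strptime(text, fmt).date().isoformat()


ISO_DATES = [to_iso(text, fmt) for text, fmt in SAMPLES]
print(ISO_DATES)
```

The format has to be supplied from outside the string: `3/5/14` alone does not say whether it means 5 March or 3 May, which is the same kind of ambiguity an agreed vocabulary removes.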

Source: http://dataabinitio.com/?p=449

Challenges still exist

• Variation of formalisation and publication

• conceptualisation as both classes and individuals – pragmatic but problematic

• URI patterns

• Versioning and keeping track

Suggested paths forward?

Jonathan Yu

Research Software Engineer

Bruce Simons

SDI Modeller

Thank you

Terms of use: Image sources from Wikipedia under CC2.0 licence

http://en.wikipedia.org/wiki/File:Amazing_Great_Barrier_Reef_1.jpg

Simon Cox

Research Scientist

http://ereefs.org.au/