Speeding up ontology creation of scientific terms.
Luis Bermudez, John Graybeal, Monterey Bay Aquarium Research Institute
http://marinemetadata.org
December 7, 2005
2
Marine Metadata Interoperability Initiative
Why are ontologies important?
At AGU we have 31 abstracts and 2 entire sessions related to ontologies.
3
Problem: Semantic Interoperability
SSDS: get me Data for Parameter temperature_1 (deg C)
AOSN: get me Data for Variable ocean_temperature (C)
4
Need for controlled vocabulary
A set of restricted words, used by an information community when describing resources or discovering data. The controlled vocabulary prevents misspellings and avoids the use of arbitrary, duplicative, or confusing words that cause inconsistencies when cataloging data.
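As a minimal sketch of this idea, the snippet below checks free-text terms against a small controlled vocabulary. The vocabulary contents and the function name `check_terms` are illustrative assumptions, not part of any MMI tool.

```python
# Hypothetical controlled vocabulary; real ones (GCMD, CF, BODC) are far larger.
CONTROLLED_VOCABULARY = {"ocean_temperature", "sea_surface_temperature", "salinity"}


def check_terms(terms, vocabulary=CONTROLLED_VOCABULARY):
    """Split terms into those found in the vocabulary and those rejected."""
    accepted, rejected = [], []
    for term in terms:
        # Normalize only whitespace and case; spelling must match exactly.
        normalized = term.strip().lower()
        (accepted if normalized in vocabulary else rejected).append(term)
    return accepted, rejected


accepted, rejected = check_terms(["Ocean_Temperature", "ocean_temprature", "salinity"])
print(accepted)  # terms that match the vocabulary
print(rejected)  # misspelled or arbitrary terms
```

Rejecting the misspelled term at cataloging time is what keeps the inconsistency out of the catalog.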
5
Controlled Vocabularies: Discovery of Data
GCMD (HTML): http://gcmd.gsfc.nasa.gov/Resources/valids
BODC Discovery (Comma Separated Value): http://wwwtest.bodc.ac.uk/data/codes_and_formats/parameter_codes/bodc_para_dict.html
AGU Index Terms (HTML): http://www.agu.org/pubs/gaplist.html
MEL (HTML): https://mel.dmso.mil/docs/metadata_guide/section_6.htm
NOAA CoRIS Thesauri (PDF): http://www.coris.noaa.gov/backmatter/keywords/discovery_thesaurus.pdf
6
Controlled Vocabularies: Usage (tag the data collected)
BODC (Comma Separated Value): http://wwwtest.bodc.ac.uk/data/codes_and_formats/parameter_codes/bodc_para_dict.html
U.S. JGOFS Dictionary of parameters (HTML): http://usjgofs.whoi.edu/datasys/param_master.html
IOC GF3 parameter codes (HTML): http://ioc.unesco.org/oceanteacher/resourcekit/M3/Formats/Integrated/GF3/GF3.htm
SEACOOS (Comma Separated Value): http://twiki.sura.org/twiki/pub/Main/DataStandards/seacoos_draft_data_dictionary_v2.0.csv
CF (XML): http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.xml
7
Problem: Semantic Interoperability
semantics
Standard vocabularies
8
Harmonization
DTD
Comma Separated Values
HTML
Tab Separated Values
Relational Database
XML/XSD
RDF
Web Ontology Language (OWL)
9
Web Ontology Language: OWL
2004 World Wide Web Consortium (W3C) recommendation to formally express ontologies.
Based on the Resource Description Framework (RDF).
Can be serialized in XML.
Supporting tools: Jena, Protégé, SWOOP, Sesame, Pangloss, Kowari, VINE, Voc2OWL
10
Fast introduction to OWL
RDF Triples
RDF Resources
Classes - individuals - properties
RDF Graph
11
RDF: Triples, triples, triples
id                 description        units
ocean_temperature  Ocean Temperature  C
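A minimal sketch of how one such table row can be read as triples: the row id becomes the subject, each remaining column name a predicate, and each cell value an object. The `http://example.org/aosn#` namespace and the function name `row_to_triples` are assumptions made for illustration only.

```python
# Hypothetical namespace used only for this illustration.
NAMESPACE = "http://example.org/aosn#"


def row_to_triples(row, id_column="id", namespace=NAMESPACE):
    """Turn one table row into (subject, predicate, object) triples."""
    subject = namespace + row[id_column]
    return [
        (subject, namespace + column, value)
        for column, value in row.items()
        if column != id_column
    ]


row = {"id": "ocean_temperature", "description": "Ocean Temperature", "units": "C"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

A row with n columns yields n - 1 triples, all sharing the same subject.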
12
RDF: Resource
Resources
A resource is anything on the Web that has a unique identifier. Examples:
URI: urn:aosn.mbari.org.recordVariable.id:1900
URL: http://mmi.org/2005/08/gcmd-keyw#Chlorophyll
URL: ftp://mmi.org/data-example
Literal
13
Parameters
id             description                        units
Temperature_1  water temperature from unit 00471  deg C
Temperature_2  water temperature from unit 00822  deg C
Looks like a class
Looks like individuals of (members of) the class Parameter
Classes Individuals Properties
Property (Attributes)
14
How are ontologies created?
Conceptual direction strategy:
Top-down
Bottom-up
Automation approach:
Manual
Automatic
15
Top-down approach
16
Bottom-up approach
Example:
1. Properties of real-world objects are identified.
2. Similarities are identified.
3. Concepts are created
4. and are expressed as a class.
5. Classes are related.
Body of Water (class)
Subclasses: River, Lake
Properties: has water; is inland body; has a relatively defined channel
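The five steps above can be sketched with plain Python sets. The slide does not say which property belongs to which object, so the assignment below is an assumption made only to keep the example concrete.

```python
# Step 1: observed properties of real-world objects, using the slide's labels.
# Which property belongs to which object is an assumption for illustration.
observations = {
    "River": {"has water", "is inland body", "has a relatively defined channel"},
    "Lake": {"has water", "is inland body"},
}

# Step 2: similarities are the properties shared by every object.
shared = set.intersection(*observations.values())

# Steps 3-5: the shared properties define a more general concept (a class),
# and each object becomes a subclass keeping only its distinguishing properties.
ontology = {"Body of Water": {"properties": shared, "subclasses": {}}}
for name, properties in observations.items():
    ontology["Body of Water"]["subclasses"][name] = properties - shared

print(sorted(shared))
print(ontology["Body of Water"]["subclasses"])
```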
17
Bottom-up approach
Example:
1. Real-world objects: parameters in observatory systems.
2. They all have similar properties (id, description and units).
3. Make them a resource: instance of a class Parameter (rdf:type)

ssds:Parameter
id             description                                     units
Temperature                                                    deg C
temperature    Temperature inside the OASIS can, in degrees C
Temperature    temperature measured inside the MMC controller
Temperature                                                    Celsius
Temperature                                                    degrees C
Temperature    water temperature                               deg C
Temperature_1  water temperature from unit 00471               deg C
Temperature_2  water temperature from unit 03533               deg C

aosn:Variable
id                        description               units
ocean_temperature         Ocean Temperature         C
ocean_temperature_2       Ocean Temperature 2       C
ocean_temperature_all     Ocean Temperature All     C
ocean_temperature_qcflag  Ocean Temperature Qcflag  0=good,1=missing,2=marginal,3=bad
ocean_temperature_raw     Ocean Temperature Raw     counts
sea_surface_temperature   Sea Surface Temperature   C
18
Bottom-up approach (cont.)
ssds:Parameter
aosn:Variable
mmi:Parameter
sweet:Property
19
Manual (Ontology editor)
List of more than 50 editors: http://www.xml.com/2002/11/06/Ontology_Editor_Survey.html
Protégé
20
Automatic
Vocabulary table + Properties file -> Software Program (transformation) -> Ontology in OWL

id                        description               units
ocean_temperature         Ocean Temperature         C
ocean_temperature_2       Ocean Temperature 2       C
ocean_temperature_all     Ocean Temperature All     C
ocean_temperature_qcflag  Ocean Temperature Qcflag  0=good,1=missing,2=marginal,3=bad
ocean_temperature_raw     Ocean Temperature Raw     counts
sea_surface_temperature   Sea Surface Temperature   C
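A rough sketch of such a transformation, using only the Python standard library: a comma-separated table plus a few conversion properties go in, and RDF/XML with one OWL class and one individual per row comes out. This is not the Voc2OWL implementation. The property keys (`namespace`, `class_name`, `id_column`), the `http://example.org/aosn#` namespace and the function name `table_to_owl` are all assumptions, and declarations of the `description` and `units` properties themselves are left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical conversion properties; the real properties file is not shown in the slides.
properties = {
    "namespace": "http://example.org/aosn#",
    "class_name": "Variable",
    "id_column": "id",
}

ASCII_TABLE = """id,description,units
ocean_temperature,Ocean Temperature,C
ocean_temperature_raw,Ocean Temperature Raw,counts
sea_surface_temperature,Sea Surface Temperature,C
"""


def table_to_owl(text, props):
    """Build an RDF/XML tree: one OWL class, one individual per table row."""
    ns = props["namespace"]
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    ET.register_namespace("voc", ns)
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": ns + props["class_name"]})
    for row in csv.DictReader(io.StringIO(text)):
        # Each row becomes an individual of the class (a typed node element).
        individual = ET.SubElement(
            root,
            f"{{{ns}}}{props['class_name']}",
            {f"{{{RDF}}}about": ns + row[props["id_column"]]},
        )
        for column, value in row.items():
            if column != props["id_column"]:
                # Remaining columns become property elements with literal values.
                ET.SubElement(individual, f"{{{ns}}}{column}").text = value
    return root


owl_xml = ET.tostring(table_to_owl(ASCII_TABLE, properties), encoding="unicode")
print(owl_xml)
```

Because every identifier is copied straight from the source table, the generated individuals keep a direct link back to the original vocabulary.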
21
Automatic
Advantages
• Fast
• Preserves a connection with the source (backward compatibility)
• Avoids typing and copy/paste errors
Disadvantage
• Only works with simple vocabularies (flat vocabularies, and some taxonomies)
22
VOC2OWL
• Tool created by MMI
• Creates ontologies automatically, bottom-up, from two basic structures of simple vocabularies:
  • Flat vocabularies (e.g. phone directory)
  • Hierarchical vocabularies (e.g. taxonomies)
• Java - Eclipse standalone application
23
24
Metadata
25
Conversion Properties I/O
Format of the ASCII file to transform: tab or csv
Location of the ASCII file
Location where the ontology in OWL will be saved
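To illustrate the format setting, here is a small sketch that reads vocabulary text as either tab- or comma-separated values with Python's `csv` module. The format names `"tab"` and `"csv"` follow the slide; the function name `read_vocabulary` and the sample text are assumptions.

```python
import csv
import io

# Map the two formats named on the slide to their delimiters.
DELIMITERS = {"csv": ",", "tab": "\t"}


def read_vocabulary(text, file_format):
    """Parse ASCII vocabulary text whose format is 'tab' or 'csv' into row dicts."""
    if file_format not in DELIMITERS:
        raise ValueError(f"unsupported format: {file_format!r}")
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[file_format])
    return list(reader)


tab_text = "id\tdescription\tunits\nTemperature_1\twater temperature from unit 00471\tdeg C\n"
rows = read_vocabulary(tab_text, "tab")
print(rows[0]["id"], "|", rows[0]["units"])
```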
26
Ontology Conversion Properties
Namespace of the resources
Column from which the local names of the resources (individuals) will be created.
At least one class is always created.
More than one class can be created.
27
Result
Parameters
id             description                        units
Temperature_1  water temperature from unit 00471  deg C
Temperature_2  water temperature from unit 00822  deg C
28
Ontology Conversion Properties
If treated as a hierarchy, there is no such primary class. All the lines in the ASCII file represent a hierarchy
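A minimal sketch of the hierarchy case, assuming each line of the ASCII file is one path through the hierarchy with levels separated by ">". The slides do not specify the layout Voc2OWL expects, so the input format, the sample terms and the function name `hierarchy_to_subclass_pairs` are all hypothetical.

```python
# Hypothetical input: each line is one path through the hierarchy.
LINES = [
    "Oceans > Ocean Temperature > Sea Surface Temperature",
    "Oceans > Ocean Temperature > Water Temperature",
    "Oceans > Salinity",
]


def hierarchy_to_subclass_pairs(lines, separator=">"):
    """Return (child, parent) pairs; every level of every line becomes a class."""
    pairs = []
    for line in lines:
        levels = [level.strip() for level in line.split(separator)]
        for parent, child in zip(levels, levels[1:]):
            if (child, parent) not in pairs:  # skip repeats from shared prefixes
                pairs.append((child, parent))
    return pairs


pairs = hierarchy_to_subclass_pairs(LINES)
for child, parent in pairs:
    print(f"{child} rdfs:subClassOf {parent}")
```

No single primary class is needed here: the class structure comes entirely from the paths themselves.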
29
Example Hierarchy (GCMD)
30
Has been tested!
About 50 vocabularies were converted to OWL for the MMI workshop "Advancing Domain Vocabularies" (Aug. 2005)
31
Why do we need all these ontologies?
The workshop was about relating terms from one controlled vocabulary to another.
Microsoft Excel was too hard to use for this purpose :-)
32
Mapping results
Topic           Direct mappings  Inferred mappings  Total mappings
Plant Pigments  405              1,022              1,427
PaCOOS          131              375                506
Waves           93               181                274
Currents        90               153                243
CTD             81               432                513
Habitats        23               37                 60
Total           823              2,200              3,023
47 participants and 12 hours of mapping time
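The slides do not say how the inferred mappings were derived. One plausible way, sketched below under the assumption that a direct mapping means "same as" (symmetric and transitive), is to group terms into equivalence sets and count the pairs that were never stated directly. The terms and mappings are made up for illustration.

```python
from itertools import combinations

# Made-up direct "same as" mappings between terms of several vocabularies.
direct = {
    ("ssds:Temperature_1", "mmi:water_temperature"),
    ("aosn:ocean_temperature", "mmi:water_temperature"),
    ("cf:sea_water_temperature", "aosn:ocean_temperature"),
}


def infer_mappings(direct_pairs):
    """Return pairs implied by symmetry and transitivity but not stated directly."""
    groups = []  # each group is a set of mutually equivalent terms
    for a, b in direct_pairs:
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    stated = {frozenset(pair) for pair in direct_pairs}
    implied = set()
    for group in groups:
        for pair in combinations(sorted(group), 2):
            if frozenset(pair) not in stated:
                implied.add(pair)
    return implied


inferred = infer_mappings(direct)
print(len(direct), "direct +", len(inferred), "inferred =", len(direct) + len(inferred), "total")
```

Under this assumption a handful of direct mappings into a shared vocabulary yields a larger number of inferred ones, which matches the pattern in the table: inferred mappings outnumber direct ones for every topic.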
33
VINE: Vocabulary Integration Environment
34
More…
• Advance the Marine Knowledge: 250,000 RDF triples (Ontologies + mappings)
• They are available as:
  • SOAP web services at: http://marinemetadata.org/webservices
  • Ontology files at: http://marinemetadata.org/ns
35
Conclusions
• Solving semantic interoperability issues is fun.
• We need to relate data producers' vocabularies with standard vocabularies.
• OWL keeps growing in popularity, and more and more tools will be available.
• VOC2OWL can help you!
36
Our Guides
Steering Committee
Roy Lowry, BODC
Robert Arko, LDEO
Julie Bosch, NOAA
Ben Domenico, Unidata
Karen Stocks, SDSC
Steve Hankin, NOAA - Ocean.US/DMAC
Mark Musen, Stanford Univ
Michael Parke, Univ of Hawaii
Lola Olsen, NASA Goddard
Bob Weller, WHOI
Dawn Wright, Oregon State University
Executive Committee
John Graybeal, MBARI (PI)
Philip Bogden, SURA/SCOOP
Stephen Miller, SIO
Francisco Chavez, MBARI
Stephanie Watson, Texas A&M
37
MMI: Your Handy Reference Guide
MMI: http://marinemetadata.org
Voc2OWL: http://marinemetadata.org/voc2owl
Vine: http://marinemetadata.org/vine
Help Line: [email protected]
Ontologies: http://marinemetadata.org/ns
Term Search: http://mmi.mbari.org:9600/mmi2/search.jsp
Tethys: http://marinemetadata.org/tethys