Semantic Mediation, Ontologies and Scientific Workflows and all the rest (+/– Web Services)
Bertram Ludäscher
Knowledge-Based Information Systems Lab
San Diego Supercomputer Center
University of California San Diego
http://seek.ecoinformatics.org http://www.geongrid.org
SDSC/LTER Workshop Feb’2004 2
Outline
• Motivation (SEEK, GEON, ..)
• Ontologies 101
• Semantic Mediation, Data Registration, …
• Application Examples (Stargazing with Kepler…)
Kepler Team, Projects, Sponsors
• Ilkay Altintas, SDM
• Chad Berkley, SEEK
• Shawn Bowers, SEEK
• Jeffrey Grethe, BIRN
• Christopher H. Brooks, Ptolemy II
• Zhengang Cheng, SDM
• Efrat Jaeger, GEON
• Matt Jones, SEEK
• Edward A. Lee, Ptolemy II
• Kai Lin, GEON
• Ashraf Memon, GEON
• Bertram Ludäscher, BIRN, GEON, SDM, SEEK
• Steve Mock, NMI
• Steve Neuendorffer, Ptolemy II
• Mladen Vouk, SDM
• Yang Zhao, Ptolemy II
• …
SEEK
Science Environment for Ecological Knowledge
• EcoGrid
  – Uniform interfaces to manage environmental data
• Kepler
  – Modeling scientific workflows
• Semantic Mediation System
  – “Smart” data discovery and integration
• Knowledge Representation (SEEK-KR)
• Classification and Nomenclature (SEEK-TAXON)
• Biodiversity and Ecological Analysis and Modeling (SEEK-BEAM)
Building the EcoGrid
[Figure: map of EcoGrid nodes. Node types: Metacat, SRB, DiGIR, VegBank, Xanthoria, and legacy systems. Labeled sites: AND, LUQ, HBR, NTL, VCR.]
• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)
Heterogeneous Data integration
• Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known
    • e.g., that ArealDensity = Count/Area
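A measurement relationship such as ArealDensity = Count/Area can be written down explicitly so that a system can derive one quantity from the others. The sketch below is illustrative only: the `Quantity` class, the unit strings, and the sample numbers are assumptions, not part of SEEK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value tagged with a unit string (hypothetical helper)."""
    value: float
    unit: str


def areal_density(count: Quantity, area: Quantity) -> Quantity:
    """Derive ArealDensity = Count / Area, carrying the units along."""
    if count.unit != "individuals":
        raise ValueError("count must be given in individuals")
    if area.value <= 0:
        raise ValueError("area must be positive")
    return Quantity(count.value / area.value, f"{count.unit}/{area.unit}")


# Assumed sample: 57 individuals counted on a 40 m^2 plot.
density = areal_density(Quantity(57, "individuals"), Quantity(40.0, "m^2"))
print(density)
```

Keeping the unit next to the value is what lets a mediator reject a mismatched input instead of silently dividing incompatible numbers.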
Semantic Mediation
• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
• Use reasoning engines to generate transformation steps
  – Beware analytical constraints
• Use reasoning engine to discover relevant components
[Diagram: Data, Ontology, Workflow Components]
Ecological ontologies
• What was measured (e.g., biomass)
• Type of measurement (e.g., Energy)
• Context of measurement (e.g., Psychotria limonensis)
• How it was measured (e.g., dry weight)
• SEEK intends to enable community-created ecological ontologies using OWL
  – Represents a controlled vocabulary for ecological metadata
• More about this in Bertram’s talk
Ontologies 101 (based on a tutorial by Shawn Bowers and CSE291)
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
What are ontologies?
It depends on who you ask
We focus on the data-management view
Generally speaking, an ontology specifies a theory (a model) by defining and relating generic concepts representing features of the real or abstract world (a domain of interest)
[Bunge]
Concepts, Symbols, and Things
• Humans use symbols (e.g., words) to communicate
• Words are mapped to things indirectly through concepts that denote (refer to) things
Concept
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc
[Carole Goble, Nigel Shadbolt]
“Jaguar”
Concepts, Symbols, and Things
Symbols and concepts are not precise
– The same symbol can stand for multiple things
– The same thing can have multiple symbols
– Concepts are usually not well-defined
Concept
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc
[Carole Goble, Nigel Shadbolt]
“Jaguar”
Concepts, Symbols, and Things
An ontology attempts to define and relate specific concepts for certain sets of things via agreed upon symbols
Concept
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc
“Jaguar”
What are ontologies?
Ontologies are typically created to:
Commit to a definition (a model) of a domain
Explicitly state assumptions concerning the definition
Have a wide scope (be general)
Support exchange and integration of heterogeneous data sources and applications (more on this later…)
What are ontologies?
Ontologies may be expressed
Informally using natural language (e.g., in philosophy and sometimes biology)
Formally using a mathematical language, e.g., first-order logic
We focus on formal ontologies
To be precise about what the theory proposes
What are ontologies?
Formal ontologies can vary in detail
Controlled Vocabulary (list of terms)
Simple Thesaurus (synonyms)
Thesaurus (broader/narrower terms)
Classification (class, instance, is-a, maybe part-of)
Classification (value, cardinality constraints)
Classification (axioms such as disjoint, union, etc.)
Classification (general logic constraints)
Expressiveness increases from the top of this list to the bottom.
Class, Instance, and Is-a
Jaguar is-a Animal
“Every Jaguar is an Animal”
∀x . Jaguar(x) → Animal(x)
– Animal: set of things (instances) denoted by the class Animal
– Jaguar: set of things (instances) denoted by the class Jaguar
Properties and Cardinality Constraints
[Diagram: Jaguar is-a Carnivore is-a Animal; Carnivore eats Animal]
A cardinality constraint might state that carnivores must eat at least one Animal
Question: Must Jaguars eat at least one Animal?
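One way to reason about this question: under the usual subclass semantics, a constraint stated on Carnivore also holds for every Jaguar, because every Jaguar is a Carnivore. A minimal sketch of that inheritance, with the dictionaries and function names invented for illustration:

```python
# Hypothetical toy ontology: is-a edges and one cardinality constraint.
IS_A = {"Jaguar": "Carnivore", "Carnivore": "Animal"}
MIN_EATS = {"Carnivore": 1}  # carnivores must eat at least one Animal


def ancestors(cls):
    """Yield cls and every class reachable from it through is-a edges."""
    while cls is not None:
        yield cls
        cls = IS_A.get(cls)


def min_eats(cls):
    """Strongest 'eats at least n' constraint that cls inherits."""
    return max((MIN_EATS.get(c, 0) for c in ancestors(cls)), default=0)


print(min_eats("Jaguar"))  # constraint inherited from Carnivore
print(min_eats("Animal"))  # no constraint stated on Animal itself
```

The constraint flows down the hierarchy only: nothing stated on Carnivore says anything about Animal in general.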
Value Restrictions
[Diagram: Jaguar is-a Carnivore is-a Animal; Carnivore eats Animal]
A value restriction for Jaguar might restrict the eats property to the specific animals eaten by Jaguars
Value Restrictions
Animal
Carnivore
Jaguar
eats
Marsh Deer
Herbivore
eats
Jaguars restrict the eats relationship to Marsh Deer, …
Value Restrictions
Animal
Carnivore
Jaguar
eats
Marsh Deer
Herbivore
eats
Does anyone see a problem with this choice of representation?
Value Restrictions
Animal
Carnivore
Jaguar
eats
Herbivore
eats
JaguarFood
Marsh Deer
Peccary
These different representations propose the same basic underlying theory
What are ontologies?
An (informal) ontology of wine:
Wines are potable liquids made by wineries within regions and with specific vintages
Wines are characterized by the type of grape they are made with, their color (white, rosé, red), their sugar (dry, off-dry, or sweet), their body (light, medium, full), and their flavor (delicate, moderate, strong)
Sauvignon Blanc, Merlot, Pinot Noir, and Riesling are types of wines
[OWL Guide]
Exercise
With a partner, take 5 minutes and try to define a “formal” ontology for the wine example
– Select two or three classes
– Identify some relationships between them
– List any constraints (cardinality or value restrictions) that exist between them
What are ontologies?
(Philosophy) An ontological theory can answer “ontological” questions
– Is Merlot a potable liquid?
– Are there wines made of things other than grapes?
– How are Pinot Gris and Pinot Noir related?
– Are there white wines that are dry, full, and strong made in Napa Valley?
We will look at other uses later
[Bunge]
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of using ontologies
• Constructing ontologies
• Breakout Exercises
Ontologies and Data Management
Where do ontologies fit within data management architectures?
There is no specific answer to this question…
However, an ontology is similar to a schema or conceptual model if one exists, but is
– Developed independently of a particular application
– Probably given in a different language
– Inherently more general
– Usually not a very good schema (weak structure)
Ontologies and Data Management (watch out for Semantic Data Registration later)
[Diagram: layers labeled Data, Metadata, and Design Artifact; several Schemas and Conceptual Models use concepts from an Ontology (explicitly or implicitly)]
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Benefits of ontologies
Ontologies are often developed within a community and are interdisciplinary
Explicitly capture “knowledge” about a domain
– Standard terms (symbols) for metadata values and schema design
– Enables advanced searching techniques (via reasoning)
– Enables exchange and integration
Benefits of ontologies
Ontologies for metadata keywords
{cabernet sauvignon, sonoma county, …}
{medium, red, dry, …}
{sonoma county, wine}
Benefits of ontologies
Ontologies for metadata keywords
{cabernet sauvignon, sonoma region, …}
{medium, red, dry, …}
{sonoma region, wine}
Find information about dry California red wines
We use the ontology to “expand” and/or “focus” the query, e.g., using the facts that cabernet sauvignon is red and dry, and that the sonoma region is in California
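The effect of this kind of expansion can be sketched with plain sets. Everything below is a toy under stated assumptions: the `IMPLIES` facts, the dataset names A, B, C, and the function names are invented, and the sketch applies the ontology to the metadata keywords rather than rewriting the query itself.

```python
# Toy ontology facts (assumed for illustration, following the slide's example).
IMPLIES = {
    "cabernet sauvignon": {"red", "dry", "wine"},
    "sonoma region": {"california"},
}

# Keyword metadata for three hypothetical datasets.
DATASETS = {
    "A": {"cabernet sauvignon", "sonoma region"},
    "B": {"medium", "red", "dry"},
    "C": {"sonoma region", "wine"},
}


def expand(keywords):
    """Add every term the ontology lets us infer from the given keywords."""
    expanded = set(keywords)
    frontier = list(keywords)
    while frontier:
        term = frontier.pop()
        for implied in IMPLIES.get(term, ()):
            if implied not in expanded:
                expanded.add(implied)
                frontier.append(implied)
    return expanded


def search(query):
    """Return names of datasets whose expanded keywords cover the query."""
    return sorted(name for name, kws in DATASETS.items() if query <= expand(kws))


print(search({"dry", "california", "red", "wine"}))
```

A literal keyword match finds nothing here, since no dataset lists all four query terms; only the inferred terms make dataset A a hit.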
Benefits of ontologies
[Diagram: three datasets (region characteristics, wines by regions, wine sales) → Integrate → Analysis]
What regional characteristics produce the best-selling wines?
Integration can be extremely complex due to structural (schema and values) and semantic (ontological) differences
Ontologies can help!
Benefits of ontologies
[Diagram: three datasets (region characteristics, wines by regions, wine sales) → Integrate → Analysis]
What regional characteristics produce the best-selling wines?
Registering datasets with ontologies
Map structure (schema) to concepts
Map data to classes/instances
(various ways to do this…)
Provides a uniform view of disparate sources
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Constructing ontologies
Various Web-based standards are emerging for defining ontologies
– XML Schema
  • Mainly for defining “vocabularies” and less-formal ontologies (term-based is-a, some constraints)
  • Mainly a structural/schema representation
– Topic Maps
  • For advanced thesauri, subject indexes
– RDF/RDFS/OWL
  • Formal ontologies based on description logics (fragments of first-order logic) and semantic networks (more informal)
Resource Description Framework (RDF)
Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)
Data organized into triples (subject, property, value)
Example: SonomaRegion (subject, a resource), locatedIn (property, a resource), CaliforniaRegion (value, a resource)
locatedIn(SonomaRegion, CaliforniaRegion)
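The triple model is small enough to mimic with tuples. The sketch below is not an RDF library: URIs are replaced by short names, and the second and third triples as well as the `match` helper are assumptions added for illustration.

```python
# Triples as (subject, property, value); short names stand in for URIs.
triples = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
    ("CaliforniaRegion", "locatedIn", "USRegion"),  # assumed extra fact
    ("SonomaRegion", "label", "Sonoma"),            # value as a character string
}


def match(s=None, p=None, v=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (v is None or t[2] == v)
    )


print(match(p="locatedIn"))           # every locatedIn statement
print(match(s="SonomaRegion"))        # everything said about SonomaRegion
```

Because every statement has the same three-part shape, data from different sources can be merged by taking the union of their triple sets.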
RDF Schema
Adds a set of pre-defined properties to define classes and properties
Allows instances to be connected to classes
Sub-class and sub-property (is-a) relationships
Example:
– SonomaRegion locatedIn CaliforniaRegion
– SonomaRegion rdf:type Region; CaliforniaRegion rdf:type Region
– Region is a class
– locatedIn is a property
– locatedIn connects Regions
OWL
Adds additional pre-defined properties to further constrain an ontology (see http://www.w3.org/TR/owl-guide/)
Note: RDF(S) and OWL use XML
Some graphical tools exist (e.g., Protégé)
<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
A Vintage is a class that is a subclass of an unnamed class whose instances always have exactly one hasVintageYear property.
Note the uglified XML syntax… The good news: meant for parsers, not humans!
Description Logic
A language and syntax for describing “concept” logics
– Concept names C (denote sets of instances)
– Class definitions D (denote sets of instances)
– Subclass definition C ⊑ D
– Equivalence definition C ≡ D
– Definition constructors
  • Intersection D ⊓ D
  • Union D ⊔ D
  • Property existence ∃hasProp.D
  • Property restriction ∀hasProp.D
  • Cardinality =1 hasProp.D, >1 hasProp.D, <2 hasProp.D
Description Logic
Wine ⊑ PotableLiquid ⊓ ∃hasColor.{Red, Rose, White}
The class Wine is a sub-class of PotableLiquids that have at least one (exists one) hasColor property whose values are either Red, Rose, or White
WhiteWine ≡ Wine ⊓ ∃hasColor.{White}
WhiteWines are exactly Wines whose color is White
WhiteBurgundy ⊑ WhiteWine ⊓ Burgundy
The set of WhiteBurgundy wines is a subset of the set of WhiteWines intersected with Burgundy wines
SauvignonBlanc ⊑ WhiteWine ⊓ =1 madeFromGrape.SauvignonBlancGrape
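Since concepts denote sets of instances, these definitions can be checked against a small interpretation using ordinary set operations. The sketch below is a hand-rolled illustration, not a description-logic reasoner: the individuals `w1`, `w2`, `juice` and all facts are invented, and it reads the hasColor restrictions as existential (∃), which is an assumption.

```python
# A tiny interpretation: individuals, class extensions, and one property.
# All individuals and facts here are invented for illustration.
DOMAIN = {"w1", "w2", "juice"}
POTABLE_LIQUID = {"w1", "w2", "juice"}
WINE = {"w1", "w2"}
HAS_COLOR = {("w1", "White"), ("w2", "Red")}


def exists(prop, fillers):
    """Extension of ∃prop.fillers: individuals with at least one value in fillers."""
    return {x for (x, y) in prop if y in fillers}


def forall(prop, fillers):
    """Extension of ∀prop.fillers: individuals all of whose values are in fillers."""
    return {x for x in DOMAIN if all(y in fillers for (s, y) in prop if s == x)}


# Wine ⊑ PotableLiquid ⊓ ∃hasColor.{Red, Rose, White}
wine_axiom_holds = WINE <= (POTABLE_LIQUID & exists(HAS_COLOR, {"Red", "Rose", "White"}))

# WhiteWine ≡ Wine ⊓ ∃hasColor.{White}
WHITE_WINE = WINE & exists(HAS_COLOR, {"White"})

print(wine_axiom_holds, WHITE_WINE)
```

Note that `forall` also accepts individuals with no hasColor value at all (here `juice`), which is why a universal restriction alone does not guarantee that a color exists.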
Constructing Ontologies
In general, creating an ontology is hard
– Requires general agreement and understanding of a domain
– Requires a clear, concise, and unambiguous definition
– May invoke controversy
– Is a hard data-modeling problem (complex constraints, broad domain)
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Breakout Exercises
Divide into the same groups as yesterday
Develop an ontology for the domain you worked on:
• Define relevant concepts
• Define relationships among concepts
• If you have time, work on simple constraints (cardinality, value restrictions)
Capture (on paper, or in PPT if you feel ambitious) your ontology in whatever way makes sense to you (e.g., as circle-line drawings or as a list of terms and properties). What assumptions did you make in creating your ontology?
If you have time, develop a scenario for your ontology in terms of your workflow, for example one that shows how your ontology could help integration or querying.
Some References
Mario Bunge. Treatise on Basic Philosophy, Vol. 3, Ontology I: The Furniture of the World. D. Reidel Publishing Company, 1977.
Nicola Guarino. Formal ontology and information systems. In Proc. of Formal Ontology in Information Systems, IOS Press, pp. 3-15, 1998.
Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.
Jeffrey Parsons and Yair Wand. Emancipating instances from the tyranny of classes in information modeling. In ACM Transactions on Database Systems, 25(2):228-268, 2000.
Michael Smith, Chris Welty, and Deborah McGuinness. OWL Web Ontology Language Guide. W3C Proposed Recommendation. (http://www.w3.org/TR/owl-guide/). Includes Wine Ontology.
Protégé. Stanford Medical Informatics. http://protege.stanford.edu/index.html. Freely available. Lots of plug-ins.
Data Registration
What is Data Registration?
• A mechanism by which data sources are published in a repository or registry for the purpose of
  – data discovery, querying, retrieval (“get”, “copy”), update, transformation, migration, application binding, query planning, concept-based rewriting, …
Things to Register
• Data files (individual files)
  – e.g. shapefile as a blob (+ file type)
• Collections (of files or subcollections)
• Ontologies
• Services (web + grid services)
• Databases (has schema and can be queried)
  – e.g. shapefile as a DB with schema registered
  – schemas (relational, XML, …)
  – local integrity constraints
  – access information (connection mechanism, protocols, query capabilities, handles to actual data)
  – registration constraints to (identifiable/registered) ontologies (aka “registration mappings”)
Things to register (w/ metadata!) aka Registration Objects
• Data files (individual files)
  – Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g. satellite data)
• Databases (has schema and can be queried)
  – Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications
Connecting Datasets to Ontologies
Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Ontology (snippet)

DataCollectionEvent ⊑ contains.Measurement
Measurement ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit
…
SpeciesAbundance ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location ⊑ position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…} ⊑ SBLTERSite

How can we “register” the dataset to concepts in the Ontology?
Purpose of Semantic Registration
Expose “hidden” information:
– What do attributes represent?
– What do specific values represent?
– What conceptual “objects” are in the dataset?
Capture connections between the dataset and ontology to:
– Find existing datasets (or parts of datasets) via ontological concepts (discovery)
– Enable fine-grain integration of datasets (mediation)
– Generate metadata for new data products (in a pipeline)
Semantic Registration Framework
Step 1: Data provider selects relevant ontological concepts (for the dataset)
Step 2: The semantic registration system creates a structural representation based on the chosen concepts (the data provider refines it if needed)
Step 3: The data provider maps the dataset information to the generated structural representation
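The three steps can be sketched for the species-count table shown earlier. This is a minimal illustration under stated assumptions, not the SEEK registration system: the `REGISTRATION` and `SITE_INSTANCES` mappings, including which concept each column attaches to and the NAPL-to-naples link, are guesses made for the example.

```python
import csv
import io

# Sample rows taken from the dataset shown earlier.
RAW = """Date,Site,Transect,SP_Code,Count
2000-09-08,CARP,1,CRGI,0
2000-09-22,NAPL,7,LOCH,1
2000-09-28,BULL,1,CYOS,57
"""

# Steps 1 and 2 (assumed outcome): chosen concepts and the slots they expose.
# Step 3: the provider maps each column to a (concept, property) slot.
REGISTRATION = {
    "Date": ("AbundanceCollectionEvent", "hasTime"),
    "Site": ("AbundanceCollectionEvent", "hasLocation"),
    "SP_Code": ("SpeciesCount", "hasSpecies"),
    "Count": ("SpeciesCount", "hasValue"),
}

# Value-level mapping from site codes to ontology instances (assumed).
SITE_INSTANCES = {"NAPL": "naples"}


def register(row):
    """Turn one table row into (concept, property, value) statements."""
    statements = []
    for column, (concept, prop) in REGISTRATION.items():
        value = row[column]
        if column == "Site":
            value = SITE_INSTANCES.get(value, value)
        statements.append((concept, prop, value))
    return statements


rows = list(csv.DictReader(io.StringIO(RAW)))
for statement in register(rows[1]):
    print(statement)
```

The Transect column is deliberately left unmapped here: registration does not have to cover every attribute to be useful for discovery.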
Step 1: Selecting Relevant Concepts

Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Concepts from an Ontology
• DataCollectionEvent
  • AbundanceCollectionEvent
• Measurement
  • Abundance
    • SpeciesAbundance
• MeasurableItem
  • SpeciesCount
• Location
  • LTERSite
    • SBLTERSite
      • naples
• Species
  • …
• MeasurementContext
  • …
Step 2: Generate Object Model
[Diagram: object model built from the selected concepts, starting at AbundanceCollectionEvent, with edges contains → SpeciesAbundance, measureOf → SpeciesCount, hasSpecies → Species, hasUnit → RatioUnit, hasValue → RatioValue, hasTime → DateTime, hasLoc → SBLTERSite]
A System for Semantic Integration of Geologic Maps via Ontologies
Kai Lin, Bertram Ludäscher
Geologic Map Integration
• Given:
  – Geologic maps from different state geological surveys (shapefiles w/ different data schemas)
  – Different ontologies:
    • Geologic age ontology
    • Rock classification ontologies:
      – Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
      – Single hierarchy from British Geological Survey (BGS)
• Problem
  – Support uniform queries using different ontologies
  – Support registration w/ ontology A, querying w/ ontology B
Geologic Map Integration
[Figure: geologic map (Nevada) annotated with “domain knowledge”, “Knowledge representation”, and “Ontologies!?”]
GEON Metamorphism Equation:
Geoscientists + Computer Scientists → Igneous Geoinformaticists (+/- Energy, +/- a few hundred million years)
A Multi-Hierarchical Rock Classification Ontology (GSC)
Composition
Genesis
Fabric
Texture
Implementation in OWL: Not only “for the machine” …
System Overview
[Diagram: three columns labeled Data sets, Ontologies, Applications. Data sets are linked by “semantic registration” to ontologies A, B, and C; the Ontology-enabled Map Integrator uses {A, B}; further applications use B and C.]
Ontology Repository
• Accept user-defined ontologies in OWL
• Any ontology saved in the system can be imported into a user-defined ontology (inter-ontology references)
• Provide a tool to browse the ontologies in the repository
composition.owl:
…
<owl:Ontology>
  <owl:imports rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl"/>
</owl:Ontology>
…
<owl:Class rdf:ID="Ultramafite">
  <rdfs:subClassOf rdf:resource="#Ultramafic"/>
  <rdfs:subClassOf rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl#Igneous"/>
</owl:Class>
…
Ontology Mapping: Motivation
• Align ontologies
• Integrate data sets which are registered to different ontologies
• Query data sets through different ontologies
• Ontology parameterization
[Diagram: Data set 1 is registered to Ontology 1 and Data set 2 to Ontology 2; ontology mappings connect the two ontologies, and queries are posed through them]
Ontology Mapping: Definition
An ontology mapping from Oa to Ob consists of:
• a class mapping f: a partial mapping from the class set of Oa to the class set of Ob preserving the subclass relation
• a property mapping g: a partial mapping from the property set of Oa to the property set of Ob such that if p is a property between A1 and A2 in Oa, then g(p) is a property between f(A1) and f(A2) in Ob
[Diagram: p links A1 and A2 in Oa; g(p) links f(A1) and f(A2) in Ob]
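The two conditions in this definition can be checked mechanically. The sketch below is one possible reading, with an assumed representation (an ontology as a dict with direct `is_a` pairs and `props` giving each property's two endpoint classes) and invented rock-class names; it also treats a property mapping as invalid when f is undefined on an endpoint, which is a choice of this sketch rather than something the slide states.

```python
def transitive_closure(pairs):
    """Transitive closure of a set of (sub, super) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_ontology_mapping(oa, ob, f, g):
    """Check that f preserves subclassing and g respects property endpoints."""
    sub_a = transitive_closure(oa["is_a"])
    sub_b = transitive_closure(ob["is_a"])
    # f must preserve the subclass relation wherever it is defined.
    for (a1, a2) in sub_a:
        if a1 in f and a2 in f:
            if f[a1] != f[a2] and (f[a1], f[a2]) not in sub_b:
                return False
    # If p links A1 and A2 in Oa, then g(p) must link f(A1) and f(A2) in Ob.
    for p, (a1, a2) in oa["props"].items():
        if p in g:
            if a1 not in f or a2 not in f:
                return False
            if ob["props"].get(g[p]) != (f[a1], f[a2]):
                return False
    return True


# Hypothetical ontologies Oa and Ob.
OA = {"is_a": {("Granite", "Igneous")}, "props": {"hasAge": ("Igneous", "Age")}}
OB = {"is_a": {("Granitoid", "IgneousRock")}, "props": {"age": ("IgneousRock", "GeologicAge")}}
F = {"Granite": "Granitoid", "Igneous": "IgneousRock", "Age": "GeologicAge"}
G = {"hasAge": "age"}
print(is_ontology_mapping(OA, OB, F, G))
```

Swapping the images of Granite and Igneous in `F` makes the check fail, because the subclass relation would be reversed in Ob.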
Ontology Mapping: Combining Ontologies
The result O of combining ontologies Oa and Ob is a pushout of ontology mappings f and g defined on a shared ontology Oc:
Oc → Oa
↓      ↓
Ob → O
Example:
[Diagram: classes A, A1, A2, B1, B2 and properties p, q distributed over Oa, Ob, Oc, and the combined ontology O]
Ontology Switching
Given an ontology mapping f from Oa to Ob, Oa can be used to query any data sets which are registered to Ob.
[Diagram: Data set 1, Data set 2, Ontology Oa, Ontology Ob; data sets are registered to ontologies, an ontology mapping links Oa and Ob, and queries go through the mapping]
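Ontology switching amounts to translating the query's class through f before looking up the registered data. A minimal sketch, assuming invented class names and record ids, and ignoring subclass reasoning inside Ob for brevity:

```python
# Class mapping f from Oa to Ob (hypothetical rock classes).
F = {"Granite": "Granitoid", "Basalt": "MaficVolcanic"}

# Data registered to Ob: record id -> Ob class (assumed sample data).
REGISTERED_TO_OB = {
    "poly1": "Granitoid",
    "poly2": "MaficVolcanic",
    "poly3": "Granitoid",
}


def query_via_oa(oa_class):
    """Answer a query posed with an Oa class against data registered to Ob."""
    ob_class = F.get(oa_class)
    if ob_class is None:  # f is partial: no translation available
        return []
    return sorted(r for r, c in REGISTERED_TO_OB.items() if c == ob_class)


print(query_via_oa("Granite"))
```

Because f is partial, some Oa terms have no counterpart in Ob; this sketch returns no results for them rather than guessing.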
Geology Workbench : Initial State
click on Ontologies click on Datasets click on Applications
An Ontology-based Mediator
Geology Workbench: Uploading Ontologies
– Click on Ontology Submission
– Choose an OWL file to upload
– Click to check its details
– Name space: can be used to import this ontology into others
Geology Workbench: Data (to Ontology!) Registration
Step 1: Choose Classes
– Click on Submission
– Data set name
– Select a shapefile
– Choose an ontology class
Geology Workbench: Data Registration
Step 2: Choose Columns for Selected Classes
AREA
PERIMETER
AZ_1000
AZ_1000_ID
GEO
PERIOD
ABBREV
DESCR
D_SYMBOL
P_SYMBOL
It contains information about geologic age
Geology Workbench: Data Registration
Step 3: Resolve Mismatches
Two terms are not matched to any ontology terms
Manually mapping algonkian into the ontology
Geology Workbench: Ontology-enabled Map Integrator
Click on the name
Choose interesting classes
All areas with the age Paleozoic
Geology Workbench: Change Ontology
Submit a mapping
Ontology mapping between British Rock Classification and Canadian Rock Classification
Switch from Canadian Rock Classification to British Rock Classification
Run it
New query interface
Back to Scientific Workflows, Kepler (and yes, web services…)
Web Services & Scientific Workflows in Kepler
• Web services = individual components (“actors”)
• “Minute-Made” Application Integration:
  – Plugging-in and harvesting web service components is easy and fast
• Rich SWF modeling semantics (“directors” and more):
  – Different and precise dataflow models of computation
  – Clear and composable component interaction semantics
  ⇒ Web service composition and application integration tool
• Coming soon:
  – Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)
  – SWFs with structural and semantic data types (better design support)
  – Grid-enabled web services (for big data, big computations, …)
  – Different deployment models (SWF → WS, web site, applet, …)
Genomics Example: Promoter Identification Workflow
Source: Matt Coleman (LLNL)
Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: GARP analysis pipeline. Steps: EcoGridQuery, LayerIntegration, SampleData, DataCalculation, MapGeneration, Validation, GenerateMetadata, ArchiveToEcogrid, drawing on registered EcoGrid databases. Data products: (a) species presence & absence points, (b) environmental layers, (c) integrated layers, (d) training and test samples, (e) GARP rule set, (f) prediction maps, (g) model quality parameter, (h) selected prediction maps. Products (a), (b), (c), and (f) exist for both the native range and the invasion area.]
Source: NSF SEEK (Deana Pennington et al., UNM)
[Figure] Source: NIH BIRN (Jeffrey Grethe, UCSD)
KEPLER Core Capabilities (1/2)
• Capturing scientific workflows
  – Accessing available workflows through the Grid
• Designing scientific workflows
  – Composition of actors (tasks) to perform a scientific WF
• Actor prototyping
• Accessing heterogeneous data
  – Data access wizard to search and retrieve Grid-based resources
  – Relational DB access and query
  – Ability to link to EML data sources
KEPLER Core Capabilities (2/2)
• Data transformation actors to link heterogeneous data
• Executing scientific workflows
  – Distributed and/or local computation
  – Various models for computational semantics and scheduling
  – SDF (synchronous dataflow) and PN (process networks): most common for scientific workflows
• External computing environments:
  – C++, Python, C (… Perl planned …)
• Deploying scientific tasks and workflows as web services (… planned …)
The KEPLER GUI (Vergil)
Drag and drop utilities, director and actor libraries.
Distributed SWFs in KEPLER
• Web and Grid Service plug-ins
  – WSDL, GWSDL
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
• WS Harvester
  – Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• WS-deployment interface (…ongoing work…)
• XSLT and XQuery transformers to link non-fitting services together
A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
Configure - select service operation
Set Parameters and Commit
Web Service Harvester
• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.
Composing 3rd-Party WSs
Output of previous web service
User interaction & Transformations
Input of next web service
GEON Kepler Examples
• Geon Classifier (Efrat)
  A workflow for classifying igneous rocks.
• Geologic Map Information Integration
  A workflow for map rendering using web services (created by Ilkay and Ashraf).
• Database Access (Efrat)
  Generic actors for connecting and querying a database.
Problem Description
• Classification of igneous rocks
• Data sets
  – Virginia rock database (provides mineral composition).
  – Igneous rock diagrams and a transition table for traversing between diagrams.
• Method
  – Iterations of finer descriptive levels using a point-in-polygon algorithm.
Mineral Classification of Igneous Rocks
• Inputs:
  – A row id from the Virginia rock database (contains mineral composition).
  – A dataset of diagrams for classification.
• Outputs:
  – The rock name.
  – A browser display of each classification level (a new feature added in Kepler).
• Execution:
  – Divided into levels. Each provides a finer level of granularity.
  – At each level, a point is classified within a diagram using a PointInPolygon algorithm.
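The classification step at each level can be sketched with a standard ray-casting point-in-polygon test. This is a generic illustration, not the code of Kepler's PointInPolygon actor, which the slides do not show; the two rectangular regions are placeholders and not real rock-diagram fields.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order. Points lying exactly
    on an edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(x, y, regions):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical diagram with two rectangular regions.
REGIONS = {
    "region_A": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "region_B": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
print(classify(2.0, 3.0, REGIONS))
```

In a multi-level workflow, the region name found at one level would select the diagram used at the next, finer level via the transition table.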
Classifying with Kepler
Extract mineral composition for row Id.
Igneous Rock Diagrams information.
Rock Name.
Classifying with Kepler
Diagrams information and transitions between them.
Extracted from the mineral composition and this level’s diagram coordinates.
SVG to polygons.
Classifier: Locates the point’s region.
Finer granularity
Displays the point in the diagram for this level.
Geologic Map Integration
• Ontology-enabled Map Integration (OMI)
  – Integration of heterogeneous geological datasets
• Data sets
  – State geology map datasets (Rocky Mountain area)
  – State boundaries and coast lines.
• Rock Type Ontologies
Providing DB Access through Kepler
• Database connection actor:
  – Opening a database connection and passing it to all actors accessing this database.
• Database query actor:
  – A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
  – A new IOPort type and a token to distinguish a database connection from any general type.
Database Connection Actor
OpenDBConnection actor:
• Input: database connection information.
• Output: A DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.
Database Query Actor
Database Query actor:
Input: A query string (SQL) and a database connection reference.
Parameters: output type – XML, Record or String. output each row separately or all at once.
Process: Execute query. Produce results according to parameters.
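The division of labor between the two actors can be mimicked outside Kepler with two plain functions. This is a sketch of the pattern only, using Python's standard `sqlite3` module and an in-memory table invented for the example; it is not Kepler's actor API, and the XML output type is left out.

```python
import sqlite3


def open_db_connection(database=":memory:"):
    """Plays the role of the connection actor: returns a shared connection."""
    return sqlite3.connect(database)


def database_query(connection, sql, output_type="Record", each_row=True):
    """Plays the role of the query actor.

    output_type: "Record" yields dicts, "String" yields text.
    each_row: emit rows one by one, or everything as a single result.
    """
    cursor = connection.execute(sql)
    columns = [d[0] for d in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if output_type == "String":
        records = [", ".join(f"{k}={v}" for k, v in r.items()) for r in records]
    elif output_type != "Record":
        raise ValueError("this sketch supports only 'Record' and 'String'")
    return records if each_row else [records]


# Hypothetical sample data.
conn = open_db_connection()
conn.execute("CREATE TABLE rocks (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO rocks VALUES (?, ?)", [(1, "granite"), (2, "basalt")])
print(database_query(conn, "SELECT id, name FROM rocks ORDER BY id"))
```

Passing the connection object in, rather than opening it inside the query function, mirrors the idea of one connection reference shared by every actor that touches the database.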
Querying Example
KEPLER and YOU
• Kepler …
  – is a community-based, cross-project, open source collaboration
  – uses web services as basic building blocks
  – has a joint CVS repository, mailing lists, web site, …
  – is gaining momentum thanks to contributors and contributions
    • BSD-style license allows commercial spin-offs
  – a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…