Bertram Ludäscher ludaesch@sdsc Data and Knowledge Systems San Diego Supercomputer Center
GEON AHM, April 16-18, SDSC
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
Towards Semantic Mediation for GEON:
Facilitating Scientific Data Integration using Knowledge Representation
Bertram Ludäscher, ludaesch@sdsc
Data and Knowledge Systems, San Diego Supercomputer Center
U.C. San Diego
Acknowledgements
“Smart” Geologic Map Prototype:
Data and Knowledge Systems, San Diego Supercomputer Center
Geo-Knowledge-Engineer: Boyan Brodaric
[email protected] Resources Canada
... and many GEONites: Dogan, Krishna, ..., State Geologic Surveys, Chaitan, Ilya, Michalis, Ashraf, ... (upcoming demo)
GEON Metamorphism Equation:
Geoscientists + Computer Scientists (+/- Energy) → Igneous Geoinformaticists
GEON and “Semantic” Data Integration
Rocky Mountains
Midatlantic Region
What is Knowledge Representation? Relating Theory to the World via Formal Models
Source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some are useful!”
What is (an) “Ontology” ??? (... what CS graduate students need to know ...)
1. Ontology as a philosophical discipline
2. Ontology as an informal conceptual system
3. Ontology as a formal semantic account
4. Ontology as a specification of a “conceptualization”
5. Ontology as a representation of a conceptual system via a logical theory
5.1 characterized by specific formal properties
5.2 characterized only by its specific purposes
6. Ontology as the vocabulary used by a logical theory
7. Ontology as a (meta-level) specification of a logical theory
[Guarino’95] http://ontology.ip.rm.cnr.it/Papers/KBKS95.pdf
What is an Ontology? (CSE-291 cont’d ;-)
• Given a logical language L ...
– ... a conceptualization is a set of models of L which describes the admittable (intended) interpretations of its non-logical symbols (the vocabulary)
– ... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
[Figure labels: conceptualization C(L); ontology; set of all models M(L); logic theories]
[Guarino96] http://www-ksl.stanford.edu/KR96/Guarino-What/P003.html
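The definitions above can be made concrete on a toy scale. The following is a minimal sketch, assuming a hypothetical two-element domain and two unary predicates, `Rock` and `Basalt` (none of these names come from the slides). It enumerates the set of all models M(L), picks out an intended conceptualization C(L), and shows that a one-axiom ontology admits every intended model plus one unintended one, i.e. it is an incomplete axiomatization.

```python
from itertools import chain, combinations, product

DOMAIN = ("a", "b")  # hypothetical universe of discourse


def subsets(items):
    """Return all subsets of items as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


# M(L): every interpretation of the two non-logical symbols Rock and Basalt.
ALL_MODELS = [{"Rock": rock, "Basalt": basalt}
              for rock, basalt in product(subsets(DOMAIN), repeat=2)]


def intended(model):
    """C(L), assumed here: every basalt is a rock and at least one rock exists."""
    return model["Basalt"] <= model["Rock"] and len(model["Rock"]) > 0


def axiom_basalt_is_a_rock(model):
    """The single axiom of the toy ontology: Basalt is-a Rock."""
    return model["Basalt"] <= model["Rock"]


ONTOLOGY = [axiom_basalt_is_a_rock]

conceptualization = [m for m in ALL_MODELS if intended(m)]
ontology_models = [m for m in ALL_MODELS if all(ax(m) for ax in ONTOLOGY)]
unintended = [m for m in ontology_models if m not in conceptualization]

print(len(ALL_MODELS), len(ontology_models), len(conceptualization))  # 16 9 8
```

The one unintended model is the empty world (no rocks at all); adding an axiom that rules it out would make this toy ontology a complete axiomatization of the assumed conceptualization.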
Problem: Scientific Data Integration ... from Questions to Queries ...
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry?
How does it relate to host rock structures?
? Information Integration
Geologic Map (Virginia)
GeoChemical
GeoPhysical (gravity contours)
GeoChronologic (Concordia)
Foliation Map (structure DB)
“Complex Multiple-Worlds”
Mediation
domain knowledge
Database mediation
Data modeling
Knowledge Representation: ontologies, concept spaces
raw data
Got Glue? Which one? What for?
• XML (common syntax)
– flexible (semistructured) data model
– used at all levels: data / metadata exchange, message exchange (SOAP), schemas & data types (XML Schema), Semantic Web & web ontologies (RDF(S), OWL), ...
• Grid infrastructure (system interoperation)
– distributed computing and data management
– web services
• Controlled Vocabularies (“joins”)
– data level: joins across different data sets
– but meta-data and ontologies (concept names, relationship names, ...) are also data!
• Integrated View Definitions (mediated views/virtual databases)
– declarative specification of “integration logic”: XQuery, Datalog, ...
• Thesauri (translator for retrieving related information)
– synonyms, broader/narrower term, e.g., UMLS (meta-thesaurus, “ontology”)
• Taxonomies (classification)
– shared vocabulary, concept hierarchy (is-a)
• Ontologies (classification + additional semantics):
– formal specification of a conceptualization, shared meaning
– facilitates “smart querying”, semantic mediation
Information Integration Challenges
• System aspects: “Grid” Middleware
– distributed data & computing
– Web Services, WSDL/SOAP, OGSA, …
– sources = functions, files, data sets, …
• Syntax & Structure: (XML-Based) Data Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Knowledge Representation: ontologies, description logics (RDF(S), OWL ...)
– sources = knowledge bases (DB+CMs+ICs)
Syntax
Structure
Semantics
System aspects
reconciling S4 heterogeneities
“gluing” together multiple data sources
bridging information and knowledge gaps computationally
Standard (XML-Based) Mediator Architecture
MEDIATOR
(XML) Queries & Results
S1
Wrapper
(XML) View
S2
Wrapper
(XML) View
Sk
Wrapper
(XML) View
Integrated Global (XML) View G
Integrated View Definition
G(..) ← S1(..) … Sk(..)
USER/Client Query Q ( G (S1,..., Sk) )
wrappers implemented as web services
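The architecture above can be sketched in a few lines. This is a minimal illustration, not the GEON mediator: the source rows, the local column names chosen for each source, and the choice of a plain union as the integrated view definition are all assumptions made for the example. Each wrapper exports its source in a shared vocabulary, G combines the wrapped views, and a client query Q is evaluated against G only.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]

# Hypothetical sources S1 and S2 with different local schemas.
S1_ROWS: List[Record] = [
    {"FMATN": "Formation A", "PERIOD": "Tertiary"},
    {"FMATN": "Formation B", "PERIOD": "Quaternary"},
]
S2_ROWS: List[Record] = [
    {"NAME": "Formation C", "TIME_UNIT": "Tertiary"},
]


def make_wrapper(rows: List[Record],
                 mapping: Dict[str, str]) -> Callable[[], Iterator[Record]]:
    """Build a wrapper that exports rows under global attribute names."""
    def view() -> Iterator[Record]:
        for row in rows:
            yield {global_name: row[local_name]
                   for global_name, local_name in mapping.items()}
    return view


wrap_s1 = make_wrapper(S1_ROWS, {"formation": "FMATN", "age": "PERIOD"})
wrap_s2 = make_wrapper(S2_ROWS, {"formation": "NAME", "age": "TIME_UNIT"})


def G() -> Iterator[Record]:
    """Integrated view G(..) <- S1(..) ... Sk(..), assumed here to be a union."""
    for wrapper in (wrap_s1, wrap_s2):
        yield from wrapper()


def Q(age: str) -> List[str]:
    """Client query against G: formations of a given age, sorted by name."""
    return sorted(r["formation"] for r in G() if r["age"] == age)


print(Q("Tertiary"))  # ['Formation A', 'Formation C']
```

The client never sees `FMATN` or `TIME_UNIT`; only the wrappers know the local schemas, which is what lets new sources be added without changing queries.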
XML-Based vs. Semantic Mediation
Semantic mediation (Conceptual Models):
– (XML) Objects
– Classes, Relations, is-a, has-a, ... (C1, C2, C3, R)
– Semantics, Constraints in Logic (IF ... THEN ...)
– Integrated-CM := CM-QL(Src1-CM, ...)
– “Glue Maps”: ontologies, concept spaces
– CM ~ {Descr. Logic, ER, UML, RDF(S), …}, CM-QL ~ {F-Logic, …}
XML-based mediation (XML Models):
– XML Elements
– Structural Constraints (DTDs): Parent, Child, Sibling, ...
– A = (B*|C),D  B = ...
– Integrated-DTD := XQuery(Src1-DTD, ...)
– No Semantics / Domain Constraints
Raw Data:
0.0155381,1.54906,2,140,29,Tertiary,Trc,CHINLE FORMATION,59,57
GEON Framework for Interoperability in the Geosciences
• Systems level: GEON Grid ...
– enable sharing of data and tools via grid services
– based on Open Grid Services Architecture (OGSA)
– acquisition of cluster endpoints and initial deployment at some sites underway, including SDSC, UTEP, VT, ...
• Syntactic and schema level: Data integration via (meta)data standards (often XML-based)
– database mediators create integrated virtual databases
=> dynamic creation and automatic update of data-warehouses
• Semantic level: data integration via “semantic” mediation
– Situating 4-D data in context: spatio-temporal, thematic, process contexts can be represented as “concept spaces”
– specifically: use of ontologies, and logic-based knowledge representation
– development guided/driven by specific scientific data integration problems
Towards Shared Conceptualizations: High-level Domain Ontology & Standard Data Model
Source: NADAM Team (Boyan Brodaric et al.)
Adoption of a standard (meta)data model => wrap data sets into unified virtual views
Towards Shared Conceptualizations: Data Contextualization via Concept Spaces
Towards Knowledge Sharing: Rock-type “Ontology”
Composition
Genesis
Fabric
Texture
Biomedical Informatics Research Network http://nbirn.net
Getting Formal: Source Contextualization & Ontology Refinement in Logic
Show formations where AGE = ‘Paleozoic’ (without age ontology)
Show formations where AGE = ‘Paleozoic’ (with age ontology)
domain knowledge
Knowledge representation
AGE ONTOLOGY
Nevada
Querying with Multiple Classifications/Ontologies: Age, Composition, Texture, Fabric, Genesis
What to do with the “KR Glue”?
• Conceptual-level information, concept spaces, ontologies, and other KR techniques for ...
– ... smart data discovery
– ... browsing and querying by themes, disciplines, ...
– ... defining virtual/mediated databases at conceptual level
– ... support “plugging together” of “data and information experiments” into Scientific Workflows (a.k.a. Analytical Pipelines in the SEEK ITR)
– ... smarter user interfaces: is “find felsic sedimentary rocks” a meaningful (satisfiable) query?
– ...
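One way a user interface could answer the satisfiability question is to check the query's concepts against the ontology's constraints before touching any data. The sketch below assumes two hypothetical axioms, written only for illustration and not taken from any GEON ontology: the composition terms `Felsic` and `Mafic` imply `Igneous`, and `Igneous` and `Sedimentary` are declared disjoint. Whether a real rock ontology contains such axioms is exactly what the slide leaves open.

```python
from itertools import combinations

# Hypothetical axioms, for illustration only.
IMPLIES = {"Felsic": {"Igneous"}, "Mafic": {"Igneous"}}
DISJOINT = {frozenset({"Igneous", "Sedimentary"})}


def closure(concepts):
    """Return the given concepts plus everything they (transitively) imply."""
    seen = set()
    todo = list(concepts)
    while todo:
        concept = todo.pop()
        if concept not in seen:
            seen.add(concept)
            todo.extend(IMPLIES.get(concept, ()))
    return seen


def satisfiable(concepts):
    """A conjunctive query is unsatisfiable if it implies two disjoint concepts."""
    implied = closure(concepts)
    return not any(frozenset(pair) in DISJOINT
                   for pair in combinations(implied, 2))


print(satisfiable(["Felsic", "Sedimentary"]))  # False under the assumed axioms
print(satisfiable(["Felsic", "Igneous"]))      # True
```

Under these assumed axioms the interface could warn the user up front instead of returning an empty result.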
Some enabling operations on “ontology data”
Composition
Concept expansion:
• what else to look for when asking for ‘Mafic’
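Concept expansion amounts to collecting everything below a concept in the is-a hierarchy. A minimal sketch follows, using a toy child-to-parent table; the table is illustrative only and is not the Composition hierarchy shown on the slide.

```python
# Toy is-a hierarchy (child -> parent); illustrative, not the GEON ontology.
PARENT = {
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Mafic": "Composition",
    "Felsic": "Composition",
}


def expand(concept):
    """Return the concept together with all of its descendants."""
    result = {concept}
    changed = True
    while changed:  # repeat until no new descendant is found
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


print(sorted(expand("Mafic")))  # ['Basalt', 'Gabbro', 'Mafic']
```

A query for ‘Mafic’ can then match any row whose label falls in the expanded set, not only rows labelled ‘Mafic’ literally.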
Some enabling operations on “ontology data”
Composition
Generalization:
• finding data that is “like” X and Y
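Generalization can be read as finding the most specific concept that subsumes both X and Y, then searching under it. The sketch below assumes a tree-shaped toy hierarchy (each concept has at most one parent); the concept names are illustrative and not taken from the slide's Composition hierarchy.

```python
# Toy is-a hierarchy (child -> parent); illustrative, not the GEON ontology.
PARENT = {
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Mafic": "Composition",
    "Felsic": "Composition",
}


def ancestors(concept):
    """Return the path from a concept up to the root, starting with itself."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def generalize(x, y):
    """Return the most specific common ancestor of x and y, or None."""
    above_y = set(ancestors(y))
    for candidate in ancestors(x):  # walks upward, so the first hit is lowest
        if candidate in above_y:
            return candidate
    return None


print(generalize("Basalt", "Gabbro"))   # Mafic
print(generalize("Basalt", "Granite"))  # Composition
```

Data “like” Basalt and Gabbro is then whatever is classified under the returned concept.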
Towards Knowledge Sharing: Rock-type Ontology
Composition
Genesis
Fabric
Texture
DEMO
... do NOT click this ...
http://kbis.sdsc.edu/GEON/ahm03-demo.html
Architecture of Integrated Geologic Map Prototype System
HTTP Server (Java Server Page)
MapServer (Minnesota)
Mediator (Java application)
Database (Arizona)
Database (Montana)
Map Definition
local layer
remote layer
local layer
Global Ontology Definitions
Rock classification
Geologic age
request response
Data Source Wrapping and Integration
Arizona
Colorado
Utah
Nevada
Wyoming
New Mexico
Montana East
Idaho
Montana West
[Figure: per-source attribute lists. Seven sources: Formation, Age. Two sources: Formation, Age, Composition, Fabric, Texture.]
[Source column labels, in slide order: ABBREV, PERIOD, PERIOD, NAME, PERIOD, TYPE, TIME_UNIT, FMATN, PERIOD, NAME, PERIOD, NAME, FORMATION, PERIOD, FORMATION, FORMATION, LITHOLOGY, LITHOLOGY, AGE, AGE]
andesitic sandstone
Livingston formation
Tertiary-Cretaceous
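Wrapping a source involves more than renaming columns: values such as “Tertiary-Cretaceous” also need normalizing before they can be matched against an age ontology. The sketch below is an assumption-laden illustration. The source names `source_a` and `source_b`, the assignment of columns to sources, and the naive rule of splitting compound ages on a hyphen are all hypothetical; only the column labels and sample values come from the slide.

```python
# Hypothetical per-source mappings from local column names to global attributes.
COLUMN_MAP = {
    "source_a": {"FMATN": "formation", "PERIOD": "age"},
    "source_b": {"NAME": "formation", "TIME_UNIT": "age",
                 "LITHOLOGY": "lithology"},
}


def wrap(source, row):
    """Rename mapped columns, drop unmapped ones, and normalize the age value."""
    mapping = COLUMN_MAP[source]
    out = {mapping[col]: value for col, value in row.items() if col in mapping}
    # Assumed convention: a compound age such as "Tertiary-Cretaceous"
    # lists several periods separated by hyphens.
    out["age"] = [part.strip()
                  for part in out.get("age", "").split("-") if part.strip()]
    return out


row = {
    "NAME": "Livingston formation",
    "LITHOLOGY": "andesitic sandstone",
    "TIME_UNIT": "Tertiary-Cretaceous",
    "INTERNAL_ID": "17",  # hypothetical unmapped column, dropped by the wrapper
}
print(wrap("source_b", row))
```

After wrapping, every source exposes `formation` and a list-valued `age`, so an age-based query can be posed once against all of them.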
Ontology-Enabled Query Processing
User: “Show formations from Cenozoic!”
Query Rewriting
Quaternary Tertiary
Cenozoic
Age Ontology
Arizona Montana West
Tertiary Tkgm
Quaternary Q
… …
Qg Quaternary … … …
Twp Tertiary … … …
Twl Tertiary … … …
PERIOD FORMATION LITHOLOGY
Tkgm
Qg
Twp
Twl
…
PERIOD
Color Definition
Map Rendering
select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”
ABBREV
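The rewriting step on this slide can be sketched as follows. The age ontology fragment (Cenozoic covering Tertiary and Quaternary) and the shape of the output query follow the slide; the per-source choice of age column in `SOURCE_AGE_COLUMN` is a hypothetical mapping added for the example, as are the function names.

```python
# Age ontology fragment from the slide: Cenozoic covers Tertiary and Quaternary.
AGE_ONTOLOGY = {"Cenozoic": ["Tertiary", "Quaternary"]}

# Hypothetical: which local column holds the age in each source.
SOURCE_AGE_COLUMN = {"Arizona": "PERIOD", "Montana West": "AGE"}


def leaf_terms(term):
    """Expand an age term into the most specific terms below it."""
    children = AGE_ONTOLOGY.get(term)
    if not children:
        return [term]
    leaves = []
    for child in children:
        leaves.extend(leaf_terms(child))
    return leaves


def rewrite(term, source):
    """Rewrite 'formations from <term>' into a query over the source's own column."""
    column = SOURCE_AGE_COLUMN[source]
    condition = " or ".join(f'{column}="{t}"' for t in leaf_terms(term))
    return f"select FORMATION where {condition}"


print(rewrite("Cenozoic", "Montana West"))
# select FORMATION where AGE="Tertiary" or AGE="Quaternary"
```

A source that only stores period names can thus answer a question about an era it has never heard of, because the ontology supplies the missing link.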
Integration Challenges
• MANY!
• non-available or non-interoperable data
• “Dirty data”, no controlled vocabularies
• Many different controlled vocabularies! (“clean data”)
• What is entailed by a vocabulary?
Formal Ontologies
Extensible Ontologies
What’s next?
• YOU!
• GEON-SCI:
– Science questions waiting to be turned into queries!
• GEON-KR Working Group activities
– guided (if not driven) by geoscientists
– marry KR technologies to standards (W3C, Semantic Web: RDF, OWL, ...)
– collect GEON-able KR resources (data models, controlled vocabularies, ontologies, ...)
• GEON-DEV:
– Generalize and merge current KR/semantic mediation architecture with standard Grid architecture
– building systems