Model-Based Mediation: Framework and Challenges

Bertram Ludäscher
ludaesch@sdsc.edu

Data and Knowledge Systems
San Diego Supercomputer Center
U.C. San Diego


Outline

• Information Integration from a DB Perspective

• Part I: XML-Based Mediation
  – wrapper/mediator approach
  – based on querying semistructured data & XML

• Part II: Model-Based Mediation
  – basic ideas & architecture, lifting data to knowledge sources
  – “glue maps” (domain maps, process maps)
  – formal framework: Description Logic, Frame-Logic
  – ongoing/future research: mix of DB & KR techniques

• Summary


An Online Shopper’s Information Integration Problem

El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”

[Figure: “One-World” mediation. Information integration over addall.com, amazon.com, A1books.com, half.com, barnes&noble.com, the WWW, and a public library.]


A Home Buyer’s Information Integration Problem

What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?

[Figure: “Multiple-Worlds” mediation. Information integration over Realtor, Demographics, School Rankings, and Crime Stats sources.]


A Geoscientist’s Information Integration Problem

What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?

[Figure: “Complex Multiple-Worlds” mediation. Information integration over Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), and Foliation Map (structure DB) sources.]


A Neuroscientist’s Information Integration Problem

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?

[Figure: “Complex Multiple-Worlds” mediation. Information integration over protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), and morphometry (SYNAPSE) sources.]


Information Integration from a DB Perspective

• Information Integration Challenge
  – Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
  – Find: the answers to Q_1, ..., Q_n

• The Database Perspective: source = “database”
  – S_i has a schema (relational, XML, OO, ...)
  – S_i can be queried
  – define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages
  – questions become queries Q_i against V(S_1, ..., S_k)

• Why a Database Perspective?
  – scalability, efficiency, reusability (declarative queries), ...
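A minimal sketch of the database perspective, written in Python purely for illustration and loosely based on the El Cheapo question: two hypothetical sources with different schemas are wrapped into a common schema, V is their union, and the user question becomes a query Q against V. Source contents, field names, and the 7-day cutoff are assumptions; a real mediator would state V declaratively (e.g., as SQL or XML views) rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One row of the (hypothetical) integrated view V."""
    source: str
    title: str
    total_price: float  # item price plus shipping
    days: int           # delivery time in days


# Hypothetical source S_1: price and shipping are separate fields.
S1 = [
    {"title": "Tractatus Logico-Philosophicus", "price": 9.50, "ship": 3.99, "days": 5},
    {"title": "Philosophical Investigations", "price": 14.00, "ship": 3.99, "days": 3},
]
# Hypothetical source S_2: one all-in price, delivery time in weeks.
S2 = [
    {"name": "Tractatus Logico-Philosophicus", "all_in": 12.00, "weeks": 1},
    {"name": "Tractatus Logico-Philosophicus", "all_in": 8.00, "weeks": 3},
]


def wrap_s1(rows):
    # Wrapper: map S_1's schema onto the common Offer schema.
    return [Offer("S_1", r["title"], r["price"] + r["ship"], r["days"]) for r in rows]


def wrap_s2(rows):
    # Wrapper: map S_2's schema, converting weeks to days.
    return [Offer("S_2", r["name"], r["all_in"], 7 * r["weeks"]) for r in rows]


def view_v():
    # Virtual integrated view V(S_1, S_2): the union of the wrapped sources.
    return wrap_s1(S1) + wrap_s2(S2)


def query_q(title, max_days):
    # The user question as a query against V: cheapest offer within max_days.
    hits = [o for o in view_v() if o.title == title and o.days <= max_days]
    return min(hits, key=lambda o: o.total_price, default=None)


best = query_q("Tractatus Logico-Philosophicus", max_days=7)
print(best.source, best.total_price)
```

Because Q only ever sees V, adding a third source means writing one more wrapper and extending view_v; the question itself does not change.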


Technical Issues and Challenges

• Integration Method and Architecture
  – federated DBs, wrapper-mediator approach, GAV/LAV, warehouse/on-demand, ...

• Suitable KRDB Formalisms and Frameworks
  – XML, DTDs/XML Schema, XPath, XQuery, ...
  – RDF(S), Ontologies, Description Logics, DAML+OIL, ...
  – querying, deduction, subsumption, classification, ...

• Algorithms and Implementation
  – query composition, rewriting, reasoning, source capabilities, ...

• Information Integration Scenario and Scope
  – simple/complex, single/multiple worlds, ...


Information Integration Landscape

[Figure: integration scenarios plotted by conceptual distance (one-world to multiple-worlds) against conceptual complexity/depth (low to high). Labeled areas: DB mediation techniques; Ontologies, KR formalisms; Model-Based Mediation. Examples placed in the landscape: addall book-buyer, home-buyer, 24x7 consumer, BLAST, Entrez, EcoCyc, RiboWeb, Tambis, MIA, GO, UMLS, WordNet, Cyc; application areas: Bioinformatics, Geoinformatics.]


PART I: XML-Based Mediation


Abstract (XML-Based) Mediator Architecture

[Figure: the USER/Client exchanges XML queries and results with the MEDIATOR. Each source S_1, S_2, ..., S_k sits behind a wrapper that exports an XML view; the mediator's Integrated View Definition IVD(S_1, ..., S_k) defines the integrated XML view V, and a user query Q is answered as Q o V(S_1, ..., S_k).]


XMAS: XML Matching And Structuring language

Integrated View Definition: “Find publications from amazon.com and DBLP, join on author, group by authors and title”

    CONSTRUCT <books>
                <book>
                  $a1 $t
                  <pubs> $p {$p} </pubs>
                </book> {$a1, $t}
              </books>
    WHERE     <books.book>
                $a1: <author/>
                $t: <title/>
              </> IN WRAP("amazon.com")
      AND     <authors.author>
                $a2: <author/>
                <pubs> $p: <pub/> </>
              </> IN WRAP("www...DBLP…")
      AND     value($a1) = value($a2)


[Figure: the XMAS query and its XMAS algebra plan]
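As a rough analogue of the view above (not the XMAS algebra itself), the same join-and-group semantics can be spelled out with Python's xml.etree.ElementTree. The two documents stand in for hypothetical outputs of WRAP("amazon.com") and the DBLP wrapper, shaped to match the WHERE patterns; grouping on the text values of $a1 and $t is an assumption about how the group-by keys compare.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical result of WRAP("amazon.com"): books/book with author and title.
AMAZON = ET.fromstring(
    "<books>"
    "<book><author>Smith</author><title>Book X</title></book>"
    "<book><author>Jones</author><title>Book Y</title></book>"
    "</books>")

# Hypothetical result of the DBLP wrapper: authors/author with author and pubs/pub.
DBLP = ET.fromstring(
    "<authors>"
    "<author><author>Smith</author><pubs><pub>Paper 1</pub><pub>Paper 2</pub></pubs></author>"
    "<author><author>Lee</author><pubs><pub>Paper 3</pub></pubs></author>"
    "</authors>")


def integrated_view(amazon, dblp):
    groups = defaultdict(list)  # (author, title) -> collected <pub> elements
    for book in amazon.findall("book"):               # <books.book>
        a1, t = book.find("author"), book.find("title")
        for entry in dblp.findall("author"):          # <authors.author>
            a2 = entry.find("author")
            if a1.text == a2.text:                    # value($a1) = value($a2)
                groups[(a1.text, t.text)].extend(entry.findall("pubs/pub"))
    out = ET.Element("books")
    for (author, title), pubs in groups.items():      # one <book> per {$a1, $t}
        book = ET.SubElement(out, "book")
        ET.SubElement(book, "author").text = author
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "pubs").extend(pubs)      # <pubs> $p {$p} </pubs>
    return out


print(ET.tostring(integrated_view(AMAZON, DBLP), encoding="unicode"))
```

Books without a matching DBLP author (here "Jones") drop out, as in any inner join.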


Some Technical Challenges ...

• Uniform Data Model: Semistructured Databases
  – flexible mix of data and schema
  – labeled directed graph/tree, ordered/unordered, ranked/unranked
  – XML = labeled ordered trees

• Query Languages
  – DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl, ..., Florid/F-logic [InfSystems98]
  – CSE/SDSC: XMAS [SSD99, WebDB99, EDBT00]
  – W3C: XPath, XSLT, XQuery (Working Draft, June 2001)

• DB Theory: Expressiveness/Complexity Trade-Off
  – querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP), ..., all
  – reasoning: query satisfiability, containment, equivalence


... Some More Technical Challenges ...

• DB Practice: Query Composition
  – compute Q o V(S_1, ..., S_k) without computing all of V: “push Q through V into the S_i”
  – in Datalog: view unfolding (resolution, unification) + simplification ~ top-down evaluation ~ magic sets
  – in XML: some solutions (Papakonstantinou, ...)

• Navigation-Driven Evaluation of Integrated View V
  – V materialized => warehousing approach
  – V virtual => mediator approach
  – V virtual & driven by user navigation => VXD approach [EDBT00] (w/ Papakonstantinou, Velikhov)
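To make “push Q through V into the S_i” concrete for the simplest case: when V is a union of source views and Q is a selection, Q o V can be rewritten as the union of the selection evaluated at each source, so V is never built in full. The sketch assumes hypothetical in-memory sources that count the rows they ship to the mediator; joins, view unfolding in Datalog, and magic sets are beyond this toy.

```python
class Source:
    """Hypothetical source that can evaluate simple selections locally."""

    def __init__(self, rows):
        self.rows = rows
        self.shipped = 0  # number of rows sent to the mediator

    def scan(self, pred=None):
        result = [r for r in self.rows if pred is None or pred(r)]
        self.shipped += len(result)
        return result


def view(sources):
    # V(S_1, ..., S_k): the union of all source rows, computed in full.
    return [r for s in sources for r in s.scan()]


def q_over_v(sources, pred):
    # Naive plan: materialize all of V, then apply the selection Q.
    return [r for r in view(sources) if pred(r)]


def q_composed(sources, pred):
    # Composed plan Q o V: the selection is pushed into every source.
    return [r for s in sources for r in s.scan(pred)]


def make_sources():
    return [Source([{"id": i, "region": i % 10} for i in range(1000)]),
            Source([{"id": i, "region": i % 5} for i in range(500)])]


def in_region_3(row):
    return row["region"] == 3


naive, pushed = make_sources(), make_sources()
same = (sorted(r["id"] for r in q_over_v(naive, in_region_3))
        == sorted(r["id"] for r in q_composed(pushed, in_region_3)))
print(same, sum(s.shipped for s in naive), sum(s.shipped for s in pushed))
```

Both plans return the same answer, but the composed plan ships 200 rows instead of 1500.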


XML (XMAS) Query Processing

[Figure: XMAS query processing. At compile time, the XMAS query Q and the XMAS view definition V are translated (Translator) into algebraic plans, composed (Q o V) into a composed plan, and rewritten by the Rewriter/Optimizer into an optimized plan. At run time, plan execution uses lazy VXD evaluation.]
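The lazy run-time step can be approximated with Python generators: the integrated view is computed only as far as the client navigates into it. This is a sketch of demand-driven evaluation under the assumption that sources can be read incrementally; it is not the VXD algorithm of [EDBT00].

```python
from itertools import islice

fetched = []  # log of source tuples actually pulled


def source(name, n):
    # Hypothetical source that produces tuples on demand and logs each fetch.
    for i in range(n):
        fetched.append((name, i))
        yield {"src": name, "id": i}


def lazy_view(*sources):
    # Virtual integrated view: a lazy union; nothing runs until navigation.
    for src in sources:
        yield from src


v = lazy_view(source("S_1", 1000), source("S_2", 1000))
print(len(fetched))               # defining V fetched nothing
first_three = list(islice(v, 3))  # client navigates to the first three children
print(len(fetched))
```

A further next(v) would fetch exactly one more tuple; a client that never scrolls past the first screen never pays for the rest of V.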


A Concrete (Future) XML-Based Mediator System

[Figure: the USER/Client sends XML queries to a MEDIATOR Engine (an XQuery processor evaluating the Integrated View Definition IVD) and receives the integrated view as XML. Sources S1, S2, S3 are accessed through XML-Wrappers; labels on the wrapper and mediator connections include XQuery, XPath, XSLT, XSQL, SQL, XScan, and http-get.]

First Results & Demos: XMAS language and algebra, VXD evaluation, BBQ UI [WebDB99] [SSD99] [SIGMOD99] [EDBT00] (w/ Papakonstantinou, Vianu, ...)


Open Issue: Querying XML Streams or: From Pull to Push

• Given:
  – stream S of XML events (open, close, data)
  – XML query Q over S
  – constraints: 1-pass “on-the-fly” processing, bounded memory

• Find:
  – decide whether, and if so how, Q can be evaluated given the constraints

• Initial Approach:
  – transducer model XSM (XML Stream Machine) to approximate “streamable” queries
  – tree transducers, tree-walking automata!? (w/ Papakonstantinou, Mukhopadhyay, Vianu)


Example: XML Stream Query

XML query (r) = for each customer $C, list all orders $O

Query-aware DTD design is even more important for stream queries!


Example: XML Stream Machine (XSM)

• transducer model
  – input/output: stream of XML events
  – memory: finite state control, buffers
  – transitions: on EVENT do ACTION
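A toy one-pass transducer in the spirit of XSM, for the query “for each customer $C, list all orders $O”. The event encoding and the document shape are assumptions: each customer element is taken to contain its name before its order elements. Under that assumption the machine keeps only a label stack (bounded by the depth of a non-recursive DTD) and one buffered name, and it emits each output pair as soon as the order arrives.

```python
def xsm(events):
    """One-pass transducer: emit (customer name, order) pairs as they stream by.

    Memory: a stack of open labels (depth bounded by a non-recursive DTD)
    plus one buffered customer name. Assumes <name> precedes <order> children.
    """
    path, name = [], None
    for kind, value in events:
        if kind == "open":
            path.append(value)
        elif kind == "close":
            path.pop()
            if value == "customer":
                name = None                    # drop the buffer at </customer>
        elif kind == "data":
            if path[-2:] == ["customer", "name"]:
                name = value                   # buffer the current customer's name
            elif path[-2:] == ["customer", "order"]:
                yield (name, value)            # output at once; orders are not stored


stream = [
    ("open", "customers"),
    ("open", "customer"), ("open", "name"), ("data", "Ann"), ("close", "name"),
    ("open", "order"), ("data", "o1"), ("close", "order"),
    ("open", "order"), ("data", "o2"), ("close", "order"),
    ("close", "customer"),
    ("open", "customer"), ("open", "name"), ("data", "Bob"), ("close", "name"),
    ("open", "order"), ("data", "o3"), ("close", "order"),
    ("close", "customer"),
    ("close", "customers"),
]
print(list(xsm(stream)))
```

If the DTD placed the orders before the name, every order of the current customer would have to be buffered until the name arrives, which is one concrete reason query-aware DTD design matters for streams.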


PART II: Model-Based Mediation


What’s the Problem with pure XML?

• XML is Syntax
  – canonical syntax for labeled ordered trees
  – a metalanguage, but all semantics lies outside of XML

• DTDs => tag names + element nesting

• XML Schema => DTDs + some data modeling

• Need anything else? => write comments!


Having Schemas & Semantics wasn’t that bad after all...

• Query: “What’s the average price of books in MyDB?”

• A Query Plan in PMX-Query:
    N = count(MyDB//book); S = sum(MyDB//book/price);
    Avg = S/N.

• Consider Structural Constraints:
  – can a book have multiple prices?
  – the schema will tell you!
  – quick fix!?
      N = count(MyDB//book/price)
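The pitfall can be reproduced with the limited XPath subset of Python's xml.etree.ElementTree. The MyDB document is hypothetical, and its second book carries two price elements, which is only possible if the schema allows it.

```python
import xml.etree.ElementTree as ET

MYDB = ET.fromstring(
    "<MyDB>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>20</price><price>30</price></book>"
    "</MyDB>")

prices = [float(p.text) for p in MYDB.findall(".//book/price")]

# Original plan: N = count(MyDB//book); S = sum(MyDB//book/price); Avg = S/N
n_books = len(MYDB.findall(".//book"))
avg_original = sum(prices) / n_books

# "Quick fix": N = count(MyDB//book/price)
avg_quick_fix = sum(prices) / len(prices)

print(avg_original, avg_quick_fix)
```

The two plans disagree (30.0 vs. 20.0), and neither is right by default: whether a book with two prices counts once or twice is exactly what the schema and its intended semantics must settle.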


Having Schemas & Semantics wasn’t that bad after all...

• “What’s the average price of books in MyDB?”

• Consider Structural/Semantic Constraints:
  – XML Schema schema:
      <xsd:complexType name="aw2"> <!-- award-winning works -->
        <xsd:complexContent><xsd:extension base="book"><xsd:sequence>
          <xsd:element name="award" type="xsd:string" maxOccurs="unbounded"/>
        </xsd:sequence></xsd:extension></xsd:complexContent>
      </xsd:complexType>
  – the XML query processor has to be aware of the subclass relationship encoded in the XML Schema schema!
  – if “aw2” has subelement types that are subtypes of the book subelement types, the XML instance may leave no clue as to what we’re dealing with!

• Modified Query Plan:
    N = count(MyDB//(book | aw2)/price); S = sum(MyDB//(book | aw2)/price);
    Avg = S/N.
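ElementTree's XPath subset has no union operator, so the modified plan's (book | aw2) is emulated below by concatenating two path results. The aw2 instance is hypothetical; nothing in it says it is a kind of book, so only a schema-aware plan picks it up.

```python
import xml.etree.ElementTree as ET

MYDB = ET.fromstring(
    "<MyDB>"
    "<book><title>A</title><price>10</price></book>"
    "<aw2><title>C</title><price>40</price><award>Prize X</award></aw2>"
    "</MyDB>")


def prices(root, tags):
    # Emulates MyDB//(t1 | t2 | ...)/price by concatenating one path per tag.
    return [float(p.text) for tag in tags for p in root.findall(f".//{tag}/price")]


book_only = prices(MYDB, ["book"])             # plan unaware of the subclass
with_subclass = prices(MYDB, ["book", "aw2"])  # schema-aware plan

print(sum(book_only) / len(book_only), sum(with_subclass) / len(with_subclass))
```

Concatenation loses document order, which does not matter for count and sum.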


What’s the Problem with XML & Complex Multiple-Worlds?

• XML is Syntax
  – DTDs talk about element nesting
  – XML Schema schemas give you data types
  – need anything else? => write comments!

• Domain Semantics is complex:
  – implicit assumptions, hidden semantics
  – sources seem unrelated to the non-expert

• Need Structure and Semantics beyond XML trees!
  – employ richer OO models
  – make domain semantics and “glue knowledge” explicit
  – use ontologies to fix terminology and conceptualization
  – avoid ambiguities by using formal semantics


XML-Based vs. Model-Based Mediation

[Figure: side-by-side comparison over the same raw data. XML-based mediation: XML models of XML elements, with structural constraints only (DTDs such as A = (B*|C),D; B = ...; parent, child, sibling, ...) and no domain constraints; the integrated schema is Integrated-DTD := XML-QL(Src1-DTD, ...). Model-based mediation: conceptual models of (XML) objects with classes, relations, is-a, has-a, ..., logical domain constraints (IF ... THEN rules), and glue maps (DMs, PMs); the integrated schema is Integrated-CM := CM-QL(Src1-CM, ...).]

CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}; CM-QL ~ {F-Logic, DAML+OIL, …}


What’s the Glue? What’s in a Link?

• Syntactic Joins
  – equality: (X,Y) := X.SSN = Y.SSN
  – equality: (X,Y) := X.UMLS-ID = Y.UID

• “Speciality” Joins
  – similarity: (X,Y,Score) := BLAST(X,Y,Score)

• Semantic/Rule-Based Joins
  – homology, lub: (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8
  – rule-based: (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.
    e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease

• YAC (Yet Another Challenge):
  – compile semantic joins into efficient syntactic ones
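The three kinds of links can be sketched as Python functions over hypothetical records. The similarity score is a crude stand-in (difflib.SequenceMatcher.ratio on strings), not BLAST; the sequences, the isa table, the 0.8 threshold, and the produces/increased_in facts are illustrative only, and the semantic join tests for the same concept C rather than computing a least upper bound.

```python
from difflib import SequenceMatcher


def equality_join(xs, ys, kx, ky):
    # Syntactic join: (X,Y) := X.kx = Y.ky
    return [(x, y) for x in xs for y in ys if x[kx] == y[ky]]


def similarity(a, b):
    # "Speciality" join score; SequenceMatcher is only a stand-in for BLAST.
    return SequenceMatcher(None, a, b).ratio()


def homology_join(seqs, isa, threshold=0.8):
    # Semantic join: (X,Y,C) := X isa C, Y isa C, similarity(X,Y) > threshold
    return [(x, y, isa[x])
            for x in sorted(seqs) for y in sorted(seqs)
            if x < y and isa[x] == isa[y] and similarity(seqs[x], seqs[y]) > threshold]


def rule_join(produces, increased_in):
    # Rule-based join: (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.
    return [(x, y, ["produces", b, "increased_in"])
            for (x, b) in produces for (b2, y) in increased_in if b == b2]


people_a = [{"SSN": "123", "name": "Ann"}]
people_b = [{"SSN": "123", "dept": "Neuro"}, {"SSN": "999", "dept": "Geo"}]
seqs = {"p1": "MGKQNSKLRPEV", "p2": "MGKQNSKLRPDV", "p3": "TTTTAAAACCCC"}
isa = {"p1": "ca_binding_protein", "p2": "ca_binding_protein", "p3": "ca_binding_protein"}

print(equality_join(people_a, people_b, "SSN", "SSN"))
print(homology_join(seqs, isa))
print(rule_join([("enzyme_X", "peptide_B")], [("peptide_B", "disease_Y")]))
```

The rule-based join keeps the connecting path [produces, B, increased_in] in its output, so each link carries its own justification.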


Model-Based Mediation Methodology ...

• Lift Sources to export CMs:
    CM(S) = OM(S) + KB(S) + CON(S)

• Object Model OM(S):
  – complex objects (frames), class hierarchy, OO constraints

• Knowledge Base KB(S):
  – explicit representation of (“hidden”) source semantics
  – logic rules over OM(S)

• Contextualization CON(S):
  – situate OM(S) data using “glue maps” (GMs):
    domain maps DMs (ontology) = terminological knowledge: concepts + roles
    process maps PMs = “procedural knowledge”: states + transitions


... Model-Based Mediation Methodology

• Integrated View Definition (IVD)
  – declarative (logic) rules with object-oriented features
  – defined over CM(S), domain maps, process maps
  – needs “mediation engineers” = domain + KRDB experts

• Knowledge-Based Querying and Browsing (runtime):
  – mediator composes the user query Q with the IVD
  – ... rewrites (Q o IVD), sends subqueries to sources
  – ... post-processes returned results (e.g., situate in context)


Model-Based Mediator Architecture

[Figure: sources S1, S2, S3, each behind an (XML-)wrapper and a CM-wrapper that exports the source's conceptual model CM(S) = OM(S) + KB(S) + CON(S) (CM S1, CM S2, CM S3, each labeled GCM). The semantic context CON(S) ties each source to the “glue” maps GMs: domain maps DMs and process maps PMs. The Mediator Engine (an XSB engine with FL rule processing, LP rule processing, and graph processing) evaluates the Integrated View Definition IVD and exposes the CM (integrated view) to the USER/Client; CM queries and results are exchanged in XML.]

First results & Demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] (w/ Gupta, Martone)


Domain Maps (Ontologies) as Glue Knowledge Sources

• Domain Map = Ontology
  – representation of terminological knowledge

• Use in Model-Based Mediation
  – (derived) concepts as “drop points”, “anchor points”, “context” for source classes
  – compile-time use: view definition, subsumption, classification, ...
  – runtime use: querying/deduction, path queries, ...

• Formalisms:
  – Semantic nets, Thesauri, Frame-logic, Description logics, ...


Ontologies

• So what is an Ontology?
  – definition of things that are relevant to your application
  – representation of terminological knowledge (“TBox”)
  – explicit specification of a conceptualization
  – concept hierarchy (“is-a”)
  – further semantic relationships between concepts
  – abstractions of relational schemas, (E)ER, UML classes, XML Schemas

• Examples:
  – NCMIR ANATOM
  – GO (Gene Ontology)
  – UMLS (Unified Medical Language System)
  – CYC


Formalism for Ontologies: Description Logic

• DL definition of “Happy Father” (Example from Ian Horrocks, U Manchester, UK)
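The formula itself did not survive in this transcript. Horrocks' example is commonly given in the following form, which agrees with the F-logic rules for happyFather later in the deck (the role hasChild corresponds to the F-logic attribute child); the notation on the original slide may differ:

```latex
\mathit{HappyFather} \equiv \mathit{Man}
  \sqcap \exists \mathit{hasChild}.\mathit{Blue}
  \sqcap \exists \mathit{hasChild}.\mathit{Green}
  \sqcap \forall \mathit{hasChild}.(\mathit{Happy} \sqcup \mathit{Rich})
```

Read: a happy father is a man with at least one blue child, at least one green child, and only children who are happy or rich.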


Description Logics

• Terminological Knowledge (TBox)
  – Concept Definition (naming of concepts):
  – Axiom (constraining of concepts):
  => a mediator’s “glue knowledge source”

• Assertional Knowledge (ABox)
  – the marked neuron in image 27
  => the concrete instances/individuals of the concepts/classes that your sources export


Description Logic Statements as F-logic Rules

• In F-logic:
    X : happyFather :-
        X : man, (X..child) : blue, (X..child) : green,
        not ( (X..child) : poorunhappyChild ).
    C : poorunhappyChild :-
        not C : rich, not C : happy.

• Alternatively: DLs as fragments of First-Order Logic
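Evaluating the two F-logic rules over a small, hypothetical database illustrates the querying side. Each path (X..child) ranges over X's children, and not is read under the closed-world assumption, so a happy father needs some blue child, some green child, and no child that is neither rich nor happy. The individuals and class memberships below are invented for the sketch.

```python
# Hypothetical ABox: class memberships and the child attribute.
classes = {
    "tom": {"man"}, "bob": {"man"}, "sue": {"woman"},
    "t1": {"blue", "happy"}, "t2": {"green", "rich"},
    "b1": {"blue", "happy"}, "b2": {"green"},
    "s1": {"blue"}, "s2": {"green"},
}
child = {"tom": ["t1", "t2"], "bob": ["b1", "b2"], "sue": ["s1", "s2"]}


def is_a(x, c):
    return c in classes.get(x, set())


def poor_unhappy_child(c):
    # C : poorunhappyChild :- not C : rich, not C : happy.
    return not is_a(c, "rich") and not is_a(c, "happy")


def happy_father(x):
    # X : happyFather :- X : man, (X..child) : blue, (X..child) : green,
    #                    not ((X..child) : poorunhappyChild).
    kids = child.get(x, [])
    return (is_a(x, "man")
            and any(is_a(k, "blue") for k in kids)
            and any(is_a(k, "green") for k in kids)
            and not any(poor_unhappy_child(k) for k in kids))


print([x for x in classes if happy_father(x)])
```

Bob fails because his green child b2 is neither rich nor happy; Sue fails the man test.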


Querying vs. Reasoning

• Querying:
  – given a DB instance I (= logic interpretation), evaluate a query expression φ (e.g., SQL, FO formula, Prolog program, ...)
  – boolean query: check if I |= φ (i.e., if I is a model of φ)
  – (ternary) query: { (X, Y, Z) | I |= φ(X, Y, Z) }
    => check happyFathers in a given database

• Reasoning:
  – check if I |= φ implies I |= ψ for all databases I, i.e., if φ => ψ
  – undecidable for FO, F-logic, etc.
  – Description Logics are decidable fragments: concept subsumption, concept hierarchy, classification; semantic tableaux, resolution, specialized algorithms


What’s in an Answer? (What’s in a Link? revisited)

• Semantic/Rule-Based Joins
    (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.   (rule-based)
    e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease

• What is the Erdős number of person P?
  – 3

• Really? Why?
  – authority based: <VIP> said so
  – faith based: don’t know but believe firmly
  – query statement Q = ... derived it from DB I
  – query Q = ... derived it from DB I and KB T using derivation D
  => logic-based systems often “come with explanations” (“computations as proofs”)
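The “computations as proofs” point can be illustrated with an Erdős-number query that returns, along with the number, the chain of coauthorships that derives it. The coauthor facts below are hypothetical; breadth-first search is used because the first time it reaches P it has found a shortest chain.

```python
from collections import deque

# Hypothetical coauthorship facts (treated as undirected).
coauthored = [("Erdos", "A"), ("A", "B"), ("B", "P"), ("Erdos", "C"), ("C", "D")]


def erdos_number(person, facts, root="Erdos"):
    """Return (number, explanation), the explanation being a shortest coauthor chain."""
    graph = {}
    for x, y in facts:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)
    parent = {root: None}
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        if cur == person:
            chain = []
            while cur is not None:       # walk back along the derivation
                chain.append(cur)
                cur = parent[cur]
            return len(chain) - 1, list(reversed(chain))
        for nxt in sorted(graph.get(cur, ())):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None, []


number, proof = erdos_number("P", coauthored)
print(number, " -> ".join(proof))
```

The answer “3” now comes with its derivation, Erdos -> A -> B -> P, which a user can check fact by fact.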


Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR

• Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)

Domain Expert Knowledge: “Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).”

[Figure: the resulting Domain Map (DM), and the same DM in Description Logic]


Source Contextualization & DM Refinement

In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map...

sources can register new concepts at the mediator ...


Example: ANATOM Domain Map


Browsing Registered Data with Domain Maps


Compilation: Domain Maps => F-Logic Rules

• Domain Maps ~ Ontologies
• DMs have a formal semantics via a translation to F-Logic (~ Datalog + OO features)
  => Declarative + “Executable” Specification
     – query evaluation with deductive rules
     – reasoning over decidable fragments: checking concept subsumption, equivalence
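As a rough illustration of “declarative + executable”, a few domain-map edges paraphrased from the SYNAPSE/NCMIR expert knowledge can be stored as facts and closed under a Datalog-style transitivity rule evaluated to a fixpoint. The edge names, the choice of rules, and the query are assumptions for this sketch; they are not the mediator's actual F-logic compilation.

```python
# Hypothetical facts paraphrasing the domain expert knowledge for SYNAPSE/NCMIR.
isa = {("purkinje_cell", "neuron"), ("pyramidal_cell", "neuron"),
       ("spine", "ion_regulating_component")}
has_part = {("purkinje_cell", "dendrite"), ("pyramidal_cell", "dendrite"),
            ("dendrite", "branch"), ("branch", "spine"),
            ("spine", "ion_binding_protein")}


def closure(rel):
    """Naive fixpoint of: c(X,Z) :- rel(X,Z).  c(X,Z) :- rel(X,Y), c(Y,Z)."""
    result = set(rel)
    while True:
        new = {(x, z) for (x, y) in rel for (y2, z) in result if y == y2} - result
        if not new:
            return result
        result |= new


isa_star = closure(isa)
part_star = closure(has_part)

# Query: which neurons have some (transitive) part that is an ion-regulating component?
cells = sorted({x for (x, p) in part_star
                if (p, "ion_regulating_component") in isa_star
                and (x, "neuron") in isa_star})
print(cells)
```

Neither cell type is linked to an ion-regulating component directly; the answer follows only through the derived dendrite, branch, and spine path.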


Frame-Logic Example Schema and Instances


• Schema Level (“Ontology”)

• Instance Level (DB Instance)

• F-Logic Queries


Query Processing “Demo”

[Figure: query results in context; contextualization CON(Result) wrt. ANATOM]

Integrated View Definition

    DERIVE protein_distribution(Protein, Organism, Brain_region,
                                Feature_name, Anatom, Value)
    IF  I:protein_label_image[proteins ->> {Protein};
                              organism -> Organism;
                              anatomical_structures ->>
                                  {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
        NAE:neuro_anatomic_entity[name -> Anatom;                              % from ANATOM
                                  located_in ->> {Brain_region}],
        AS..segments..features[name -> Feature_name; value -> Value].


• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)


Example: Inside Query Evaluation

“How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”

• push selection
    @SENSELAB: X1 := select targets of “output from parallel fiber”;
• determine source context
    @MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map;
• compute region of interest (here: downward closure)
    @MEDIATOR: X3 := subregion-closure(X2);
• push selection
    @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
• compute protein distribution
    @MEDIATOR: X5 := compute aggregate(X4);
• display in context
    @MEDIATOR/GUI: display X5 in context (ANATOM)
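The plan can be mimicked end to end in a few lines. Everything below is hypothetical: a toy part-of hierarchy stands in for ANATOM, two small tables stand in for SENSELAB and NCMIR, and the aggregate is a simple per-region sum. Only the shape of the plan (push selection, situate, downward closure, push selection, aggregate) follows the steps above.

```python
from collections import defaultdict

# Toy stand-in for the ANATOM domain map: region -> direct subregions.
subregions = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_layer"],
    "molecular_layer": ["purkinje_dendrite"],
}
# Toy SENSELAB facts: (origin of output, target region).
senselab = [("parallel_fiber", "molecular_layer"), ("climbing_fiber", "purkinje_layer")]
# Toy NCMIR facts: (protein, region, labeled amount).
ncmir = [("ryanodine_receptor", "molecular_layer", 4.0),
         ("ryanodine_receptor", "purkinje_dendrite", 6.5),
         ("ryanodine_receptor", "purkinje_layer", 2.0),
         ("calbindin", "purkinje_dendrite", 3.0)]


def subregion_closure(regions):
    """Downward closure: the given regions plus everything below them."""
    seen, stack = set(), list(regions)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(subregions.get(r, []))
    return seen


known = set(subregions) | {c for cs in subregions.values() for c in cs}

x1 = {t for (src, t) in senselab if src == "parallel_fiber"}          # @SENSELAB: push selection
x2 = x1 & known                                                        # @MEDIATOR: situate in the DM
x3 = subregion_closure(x2)                                             # @MEDIATOR: region of interest
x4 = [r for r in ncmir if r[0] == "ryanodine_receptor" and r[1] in x3]  # @NCMIR: push selection
x5 = defaultdict(float)                                                # @MEDIATOR: aggregate
for _, region, amount in x4:
    x5[region] += amount
print(dict(sorted(x5.items())))
```

Note that purkinje_dendrite enters the answer only through the downward closure; a plain syntactic join on the SENSELAB target would have missed it.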


Some Open Database & Knowledge Representation Issues

• Mix of Query Processing and Reasoning
  – FaCT description logic reasoner for DMs?
  – or reconciliation of DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00]

• Modeling “Process Knowledge” => Process Maps
  – formal semantics? (dynamic/temporal/Kripke models?)
  – executable semantics? (Statelog?)

• Graph Queries over DMs and PMs
  – expressible in F-logic [InfSystems98]
  – scalability? (UMLS Domain Map has millions of entries)

• ...


Process Maps with Abstractions and Elaborations: => From Terminological to Procedural Glue

• nodes ~ states
• edges ~ processes, transitions
• blue/red edges: processes in Src1/Src2
• general form of edges:

how about these?


Summary: Mediation Scenarios & Techniques

Federated Databases | XML-Based Mediation | Model-Based Mediation
One-World | One-/Multiple-Worlds | Complex Multiple-Worlds
Common Schema | Mediated Schema | Common Glue Maps
SQL, rules | XML query languages | DOOD query languages
Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings
Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps
DB expert | DB expert | KRDB + domain expert


Models and Formal Approaches: Relating Theory to the World

©2000 by John F. Sowa, http://www.jfsowa.com/krbook/, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA.

All models are wrong, but some are useful!


Questions?

Queries?


Some References

• XML-Based and Model-Based Mediation:
  – MBM: Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, 2001.
  – VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT), Konstanz, Germany, LNCS 1777, Springer, 2000.
  – DOOD: Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.

• STATELOG (Logic Programming with States):
  – On Active Deductive Databases: The Statelog Approach, G. Lausen, B. Ludäscher, W. May. In Transactions and Change in Logic Databases, H. Decker, B. Freitag, M. Kifer, A. Voronkov (eds.), LNCS 1472, Springer, 1998.

• Argumentation Frameworks as Games:
  – Games and Total Datalog¬ Queries, J. Flum, M. Kubierschky, B. Ludäscher, Theoretical Computer Science, 239(2), pp. 257-276, Elsevier, 2000.
  – Referential Actions as Logical Rules, B. Ludäscher, W. May, G. Lausen, Proc. 16th ACM Symposium on Principles of Database Systems (PODS'97), Tucson, Arizona, ACM Press, 1997.