Knowledge Management for Digital Libraries, Mediated Views, & Archives Bertram Ludäscher...


Page 1: Knowledge Management for Digital Libraries, Mediated Views, & Archives Bertram Ludäscher LUDAESCH@SDSC.EDU * Data and Knowledge Systems (DAKS) * San Diego.

Knowledge Management for Digital Libraries, Mediated Views, & Archives

Bertram Ludäscher (LUDAESCH@SDSC.EDU)

Data and Knowledge Systems (DAKS)*

San Diego Supercomputer Center

U.C. San Diego

* formerly: DICE


Data and Knowledge Systems (Re-)Organization

• Data and Knowledge Systems Labs (formerly “DICE”):
  – Data Grids (SRB et al.)
  – Advanced Query Processing (KRDB/MBM)
  – Knowledge-Based Integration (KRDB/MBM)
  – Knowledge and Information Discovery (Data Mining)
  – Spatial Information Systems (GIS)

• Project Areas:
  – Data and Knowledge Grids (GriPhyN, NVO, BIRN, I2T, GeoGrid, SciDAC/SDM, ...)
  – Digital Libraries (DLI2, NSDL, ...)
  – Persistent Archives (NARA, NHPRC)
  – ...

• R&D:
  – SRB/(E)MCAT, mySRB, ...
  – XML/Model-based mediator: from proof-of-concept to reusable prototypes
  – KBA methodology, preliminary archival prototypes


The Message: Knowledge Management for Digital Libraries, Mediated Views, & Archives

• Making data sources (DBs, collections, archives) “smarter” by adding semantics, context, “knowledge”:
  – extend scope of information integration, mediation, and digital library federation
  – “intelligent”/“informed” browsing and querying of information (e.g., via Topic Maps, “concept spaces”, “Semantic Web” tech.)
  – richer, more self-contained knowledge-based archives (KBA)

• Which Knowledge Representation Formalisms?
  – formal ontologies (domain maps), expressed in Description Logics (aka concept-definition/terminological languages)
  – XML, RDF(S), DAML+OIL, Onto..., KIF, KQML, LOOM, ....

• Goal: Create “Executable Knowledge”... => Right mix between DB and KR technologies!


Outline

• Information Integration from a Database Perspective
  – examples, mediator approach, some technical challenges

• Part I: XML-Based Mediation
  – based on querying semistructured data & XML

• Part II: Model-Based Mediation
  – basic ideas & architecture, lifting data to knowledge sources
  – “glue maps” (domain maps, process maps) and ontologies
  – ongoing/future research: mix of DB & KR techniques

• Part III: Knowledge-Based Archives
  – how to add more semantics to archives

• Discussion


An Online Shopper’s Information Integration Problem

El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”

Figure labels: Information Integration? · addall.com · “One-World” Mediation · sources: amazon.com, A1books.com, half.com, barnes&noble.com, WWW, public library


A Home Buyer’s Information Integration Problem

What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?

Figure labels: Information Integration? · “Multiple-Worlds” Mediation · sources: Realtor, Demographics, School Rankings, Crime Stats


Information Integration from a DB Perspective

• Information Integration Challenge
  – Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
  – Find: the answers to Q_1, ..., Q_n

• The Database Perspective:
  – source = “database”
  – S_i has a schema (relational, XML, OO, ...)
  – S_i can be queried
  – define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages
  – questions become queries Q_i against V(S_1, ..., S_k)

• Why a Database Perspective?
  – scalability, efficiency, reusability (declarative queries), ...
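The database perspective above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two toy sources, their field names, and the helpers `integrated_view` and `cheapest_total` are hypothetical and do not come from the slides.

```python
# Two toy "sources" S_1 and S_2 with different schemas (hypothetical data).
S1 = [{"title": "Tractatus", "price": 12.0, "shipping": 4.0}]
S2 = [{"name": "Tractatus", "cost": 14.5, "delivery": 0.0}]


def integrated_view(s1, s2):
    """Virtual view V(S_1, S_2): maps both source schemas to one common schema."""
    for row in s1:
        yield {"title": row["title"],
               "total": row["price"] + row["shipping"],
               "source": "S1"}
    for row in s2:
        yield {"title": row["name"],
               "total": row["cost"] + row["delivery"],
               "source": "S2"}


def cheapest_total(view, title):
    """Query Q against V: the cheapest offer for a title, or None if there is none."""
    offers = [r for r in view if r["title"] == title]
    return min(offers, key=lambda r: r["total"], default=None)


best = cheapest_total(integrated_view(S1, S2), "Tractatus")
```

The user question is phrased only against the common schema of V; the mapping from each source schema lives in one place, the view definition.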


PART I: XML-Based Mediation


Abstract XML-Based Mediator Architecture

Figure labels: sources S_1, S_2, ..., S_k, each with a Wrapper exporting an XML View · MEDIATOR with Integrated View Definition IVD(S_1,...,S_k) and Integrated XML View V · USER/Client exchanging XML Queries & Results · Query Q o V(S_1,...,S_k)


A Concrete (Future) XML-Based Mediator System

Figure labels: sources S1, S2, S3, each with an XML-Wrapper (XQuery, XPath, XScan, SQL, XSQL, http-get, XSLT) · MEDIATOR Engine with XQuery Processor and Integrated View Definition IVD · XML (Integrated View), accessed via XQuery, XPATH, XSLT, XSQL · USER/Client exchanging XML Queries & Results

First Results & Demos: XMAS language and algebra, VXD evaluation, BBQ UI [WebDB99] [SSD99] [SIGMOD99] [EDBT00] (w/ Papakonstantinou, Vianu, ...)


Some Technical Challenges ...

• XML Query Languages
  – DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl, ..., Florid/F-logic [InfSystems98]
  – CSE/SDSC: XMAS [SSD99,WebDB99,EDBT00]
  – W3C: XPath, XSLT, XQuery (Working Draft, June 2001)

• DB Theory: Expressiveness/Complexity Trade-Off
  – querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP), ... , all
  – reasoning: query satisfiability, containment, equivalence


... Some More Technical Challenges ...

• DB Practice: Query Composition
  – compute Q o V(S_1,...,S_k) w/o computing all of V
  – “push Q through V into S_i”
  – in Datalog: view unfolding (resolution, unification) + simplification ~ top-down evaluation ~ magic sets
  – in XML: some solutions (Papakonstantinou, ...)

• Navigation-Driven Evaluation of Integrated View V:
  – V materialized => warehousing approach
  – V virtual => mediator approach
  – V virtual & driven by user-navigation => VXD approach [EDBT00] (w/ Papakonstantinou, Velikhov)
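The difference between a materialized view and a virtual, navigation-driven one can be illustrated with Python generators. This is a rough sketch, not the VXD implementation: `source`, `virtual_view`, and `materialize` are hypothetical names, and the `calls` list exists only to show how many source rows were actually fetched.

```python
from itertools import islice

calls = []  # records which source rows were actually fetched


def source(name, n):
    """Toy wrapper: yields rows one at a time and logs each fetch."""
    for i in range(n):
        calls.append((name, i))
        yield {"src": name, "id": i}


def virtual_view(*sources):
    """Virtual view V: chains the sources lazily, nothing is precomputed."""
    for s in sources:
        yield from s


def materialize(view):
    """Warehousing approach: compute all of V up front."""
    return list(view)


# Navigation-driven: the client only navigates to the first two results,
# so only two rows are ever pulled from the sources.
first_two = list(islice(virtual_view(source("S1", 1000), source("S2", 1000)), 2))
fetched_lazily = len(calls)
```

With the lazy view only as many source rows are fetched as the client asks for; `materialize` would pull every row of both sources before returning anything.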


XMAS: XML Matching And Structuring language

Integrated View Definition: “Find books from amazon.com and DBLP, join on author, group by authors and title”

CONSTRUCT <books>
            <book>
              $a1
              $t
              <pubs>
                $p { $p }
              </pubs>
            </book> { $a1, $t }
          </books>
WHERE <books.book>
        $a1 : <author />
        $t : <title />
      </> IN "amazon.com"
  AND <authors.author>
        $a2 : <author />
        <pubs> $p : <pub/> </>
      </> IN "www...DBLP… "
  AND value( $a1 ) = value( $a2 )

XMAS

XMAS Algebra
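For readers unfamiliar with XMAS, the join-and-group structure of the view above can be approximated in Python. This is a loose analogy only: the lists of dicts stand in for the two XML sources, the author and publication values are invented, and `books_view` is a hypothetical helper, not part of XMAS.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two XML sources.
amazon_books = [
    {"author": "Author A", "title": "Book One"},
    {"author": "Author B", "title": "Book Two"},
]
dblp_authors = [
    {"author": "Author B", "pubs": ["pub1", "pub2"]},
    {"author": "Author C", "pubs": ["pub3"]},
]


def books_view(books, authors):
    """Join on author value, then group publications by (author, title)."""
    groups = defaultdict(list)
    for b in books:
        for a in authors:
            if b["author"] == a["author"]:      # value($a1) = value($a2)
                groups[(b["author"], b["title"])].extend(a["pubs"])
    return [
        {"author": author, "title": title, "pubs": pubs}
        for (author, title), pubs in groups.items()
    ]


view = books_view(amazon_books, dblp_authors)
```

Only authors present in both sources produce a book element, which mirrors the AND-connected patterns in the WHERE clause.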


XML (XMAS) Query Processing

Figure labels (processing pipeline):
• Compile-time: XMAS Query Q, XMAS View Definition V, Translator, algebraic plans, Composition (Q o V), composed plan, Rewriter/Optimizer, optimized plan
• Run-time: Plan Execution, lazy VXD evaluation


PART II: Model-Based Mediation


A Geoscientist’s Information Integration Problem

What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?

Figure labels: Information Integration? · “Complex Multiple-Worlds” Mediation · sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)


A Neuroscientist’s Information Integration Problem

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity?

How about other rodents?

Figure labels: Information Integration? · “Complex Multiple-Worlds” Mediation · sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)


What’s the Problem with XML & Complex Multiple-Worlds?

• XML is Syntax
  – canonical syntax for labeled ordered trees
  – a metalanguage, but all semantics lies outside of XML
    • DTDs => tags (= controlled vocabulary) + nesting
    • XML Schema => DTDs + data modeling
    • need anything else? => write comments!
  – but: agreed-upon XML standards still a good thing!

• Domain Semantics is complex:
  – implicit assumptions, hidden semantics
  – sources seem unrelated to the non-expert

• Need Structure and Semantics beyond XML trees!
  – employ richer OO models
  – make domain semantics and “glue knowledge” explicit
  – use ontologies to fix terminology and conceptualization
  – avoid ambiguities by using formal semantics


Information Integration Landscape

Figure labels:
• axes: conceptual distance (one-world to multiple-worlds); conceptual complexity/depth (low to high)
• regions: DB mediation techniques; Ontologies, KR formalisms; Model-Based Mediation
• examples plotted: addall, book-buyer, BLAST, EcoCyc, Cyc, WordNet, GO, home-buyer, 24x7 consumer, UMLS, MIA, Entrez, RiboWeb, Tambis; Bioinformatics, Geoinformatics


XML-Based vs. Model-Based Mediation

Figure labels (two-column comparison):
• XML-Based: Raw Data; XML Elements; XML Models; Structural Constraints (DTDs), Parent, Child, Sibling, ...; A = (B*|C),D  B = ...; Integrated-DTD := XML-QL(Src1-DTD,...); No Domain Constraints
• Model-Based: Raw Data; (XML) Objects; Conceptual Models; Classes, Relations, is-a, has-a, ... (C1, C2, C3, R); Logical Domain Constraints (IF ... THEN ...); Glue Maps (DMs, PMs); Integrated-CM := CM-QL(Src1-CM,...)

CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}


What’s the Glue? What’s in a Link?

• Syntactic Joins
  – (X,Y) := X.SSN = Y.SSN   (equality)
  – (X,Y) := X.UMLS-ID = Y.UID

• “Speciality” Joins
  – (X,Y,Score) := BLAST(X,Y,Score)   (similarity)

• Semantic/Rule-Based Joins
  – (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8   (homology, lub)
  – (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.   (rule-based)
    e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease

• A Technical Challenge:
  – “compile” semantic joins into efficient rule evaluation + syntactic joins
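The three kinds of links can be contrasted in a small Python sketch. Everything here is illustrative: the function names, the toy records, and the placeholder `enzyme_E` are assumptions, and the `score` argument merely stands in for an external similarity service such as BLAST.

```python
def syntactic_join(xs, ys, key_x, key_y):
    """Equality join: link X and Y when two key fields are equal."""
    return [(x, y) for x in xs for y in ys if x[key_x] == y[key_y]]


def similarity_join(xs, ys, score, threshold):
    """'Speciality' join: link X and Y when an external score exceeds a threshold."""
    return [(x, y, score(x, y)) for x in xs for y in ys if score(x, y) > threshold]


def rule_based_join(produces, increased_in):
    """Semantic join: X is linked to Y if X produces some B and B is increased in Y."""
    return [
        (x, y, ["produces", b, "increased_in"])
        for (x, b) in produces
        for (b2, y) in increased_in
        if b == b2
    ]


people = [{"SSN": 1, "name": "a"}, {"SSN": 2, "name": "b"}]
records = [{"SSN": 2, "file": "f"}]
equal_links = syntactic_join(people, records, "SSN", "SSN")

# "enzyme_E" is a placeholder for the X in the slide's example.
links = rule_based_join(
    [("enzyme_E", "beta amyloid")],
    [("beta amyloid", "Alzheimer's disease")],
)
```

The rule-based join returns the chain of roles that justifies each link, which is the extra information a purely syntactic join cannot provide.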


Model-Based Mediation Methodology ...

• Lift Sources to export Conceptual Models (CMs): CM(S) = OM(S) + KB(S) + CON(S)

• Object Model OM(S):
  – complex objects (frames), class hierarchy, OO constraints

• Knowledge Base KB(S):
  – explicit representation of (“hidden”) source semantics
  – logic rules over OM(S)

• Contextualization CON(S):
  – situate OM(S) data using “glue maps” (GMs):
    • domain maps DMs (ontology) = terminological knowledge: concepts + roles
    • process maps PMs = “procedural knowledge”: states, events, transitions
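One way to picture CM(S) = OM(S) + KB(S) + CON(S) is as a record with three parts. The sketch below is an assumption-laden illustration, not the mediator's actual data model: the class `ConceptualModel`, its field names, the rule `spine_rule`, and the concept name `Dendritic_Spine` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConceptualModel:
    # OM(S): object id -> attribute dictionary (including the object's class)
    object_model: Dict[str, dict]
    # KB(S): rules over OM(S); each rule maps an object to derived attributes
    knowledge_base: List[Callable[[dict], dict]] = field(default_factory=list)
    # CON(S): object id -> concept of a domain map the object is registered with
    context: Dict[str, str] = field(default_factory=dict)

    def derived(self, oid):
        """Return a copy of the object with all KB(S) rules applied."""
        obj = dict(self.object_model[oid])
        for rule in self.knowledge_base:
            obj.update(rule(obj))
        return obj


def spine_rule(obj):
    """Hypothetical rule making 'hidden' source semantics explicit."""
    return {"ion_regulating": True} if obj.get("class") == "spine" else {}


cm = ConceptualModel(
    object_model={"o1": {"class": "spine", "length": 1.2}},
    knowledge_base=[spine_rule],
    context={"o1": "Dendritic_Spine"},
)
```

Keeping the rules separate from the stored objects means the raw source data stays unchanged while derived knowledge is computed on demand.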


... Model-Based Mediation Methodology

• Integrated View Definition (IVD)– declarative (logic) rules with object-oriented features

– defined over CM(S), domain maps, process maps

– needs “mediation engineers” = domain + KRDB experts

• Knowledge-Based Querying and Browsing (runtime):– mediator composes the user query Q with the IVD

... rewrites (Q o IVD), sends subqueries to sources

... post-processes returned results (e.g., situate in context)


Model-Based Mediator Architecture

Figure labels:
• sources S1, S2, S3, each with an (XML-Wrapper) and a CM-Wrapper exporting CM S1, CM S2, CM S3 (GCM); CM(S) = OM(S)+KB(S)+CON(S)
• Mediator Engine: FL rule proc., LP rule proc., Graph proc., XSB Engine; Integrated View Definition IVD; CM (Integrated View)
• “Glue” Maps GMs: Domain Maps DMs, Process Maps PMs; semantic context CON(S)
• USER/Client: CM Queries & Results (exchanged in XML)

First results & Demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] (w/ Gupta, Martone)


Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR

Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)

Domain Expert Knowledge:

Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).

Figure labels: Domain Map (DM) · DM in Description Logic
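The expert statements above can be written down as a labeled graph of (concept, role, concept) edges. The sketch below is a simplified, hypothetical encoding: the identifier spellings and the role names are assumptions, qualifiers such as "release" and "propagation" are dropped, and `reachable` is just one example of a graph query over such a map.

```python
# Edges (concept, role, concept) transcribed from the expert statements;
# spellings and role names are illustrative only.
DOMAIN_MAP = [
    ("Purkinje_cell", "has", "Dendrite"),
    ("Pyramidal_cell", "has", "Dendrite"),
    ("Dendrite", "has", "Branch"),
    ("Branch", "contains", "Spine"),
    ("Spine", "isa", "Ion_regulating_component"),
    ("Spine", "has", "Ion_binding_protein"),
    ("Ion_binding_protein", "controls", "Ion_activity"),
    ("Ion_regulating_component", "affects", "Ion_activity"),
    ("Neurotransmission", "involves", "Ion_activity"),
]


def reachable(dm, start, roles):
    """All concepts reachable from `start` by following edges with the given roles."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for concept, role, target in dm:
            if concept == node and role in roles and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


parts = reachable(DOMAIN_MAP, "Purkinje_cell", {"has", "contains"})
```

Following only the part-of style roles from `Purkinje_cell` yields its dendrites, branches, spines, and ion-binding proteins, which is the kind of closure a mediator can use to find data registered at related concepts.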


Source Contextualization & DM Refinement

In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map...

sources can register new concepts at the mediator ...


Example: ANATOM Domain Map


Browsing Registered Data with Domain Maps


Compilation : Domain Maps => F-Logic Rules

• Domain Maps ~ Ontologies

• DMs have a formal semantics via a translation to Description Logics (fragments of first-order logic):
  • C ==R=> D   ⇔   ∀x (C(x) → ∃y (D(y) ∧ R(x,y)))   (*)
  • Quiz: Neuron ==has=> Compartment ?

• Translation to deductive rules:
  • e.g. F-logic = Datalog + OO features
  => Declarative + “Executable” Specification

• query evaluation with deductive rules
  => (*) as an integrity check, or derived knowledge, ...

• reasoning over decidable fragments:
  • checking concept satisfiability, subsumption, equivalence
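Read as an integrity check, a domain map edge C ==R=> D requires every instance of C to have at least one R-related instance of D. A minimal sketch of that check over explicit instance sets follows; the function name `violations` and the toy neuron and compartment identifiers are assumptions for illustration.

```python
def violations(c_instances, d_instances, r_pairs):
    """Return all x in C that have no y in D with R(x, y).

    An empty result means the instance data satisfies the edge C ==R=> D.
    """
    return sorted(
        x for x in c_instances
        if not any((x, y) in r_pairs for y in d_instances)
    )


# Toy data for the edge Neuron ==has=> Compartment.
neurons = {"n1", "n2"}
compartments = {"c1"}
has = {("n1", "c1")}

missing = violations(neurons, compartments, has)
```

Here `n2` is reported because no compartment is recorded for it. Under the alternative reading as derived knowledge, the same edge would instead assert that such a compartment exists even though the source does not list it.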


Query Processing “Demo”

Query results in context

Contextualization CON(Result) wrt. ANATOM.

Integrated View Definition

DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
IF I:protein_label_image[ proteins ->> {Protein}; organism -> Organism;
       anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}] ,   % from PROLAB
   NAE:neuro_anatomic_entity[name->Anatom;                                    % from ANATOM
       located_in->>{Brain_region}],
   AS..segments..features[name->Feature_name; value->Value].

• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)


Example: Inside Query Evaluation

push selection

@SENSELAB: X1 := select targets of “output from parallel fiber” ;

determine source context

@MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map;

compute region of interest (here: downward closure)

@MEDIATOR: X3 := subregion-closure(X2);

push selection

@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);

compute protein distribution

@MEDIATOR: X5 := compute aggregate(X4);

display in context

@MEDIATOR/GUI: display X5 in context (ANATOM)

“How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”
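The evaluation steps X1 to X5 can be traced in a toy Python version. All data below is invented, and the dictionaries standing in for SENSELAB, NCMIR, and the ANATOM domain map are assumptions; the sketch only mirrors the order of the steps, not the actual mediator.

```python
# Toy stand-ins for the sources and the ANATOM domain map (all data hypothetical).
SENSELAB = {"output from parallel fiber": ["Purkinje_cell"]}
ANATOM_LOCATED = {"Purkinje_cell": "region_A"}            # concept -> region
ANATOM_SUBREGIONS = {"region_A": ["region_A1", "region_A2"]}
NCMIR = [  # (region, protein, amount)
    ("region_A1", "Ryanodine Receptor", 3.0),
    ("region_A2", "Ryanodine Receptor", 5.0),
    ("region_B", "Ryanodine Receptor", 9.0),
]


def subregion_closure(regions):
    """Downward closure: the given regions plus all of their subregions."""
    closure, stack = set(), list(regions)
    while stack:
        region = stack.pop()
        if region not in closure:
            closure.add(region)
            stack.extend(ANATOM_SUBREGIONS.get(region, []))
    return closure


def evaluate(process, protein):
    x1 = SENSELAB[process]                      # push selection to SENSELAB
    x2 = {ANATOM_LOCATED[t] for t in x1}        # situate targets in the domain map
    x3 = subregion_closure(x2)                  # compute region of interest
    x4 = [r for r in NCMIR if r[0] in x3 and r[1] == protein]  # push selection to NCMIR
    x5 = {}                                     # aggregate amounts per region
    for region, _, amount in x4:
        x5[region] = x5.get(region, 0.0) + amount
    return x5


result = evaluate("output from parallel fiber", "Ryanodine Receptor")
```

Only the regions below `region_A` are sent to the second source, so the row for `region_B` never takes part in the aggregate.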


Some Open Database & Knowledge Representation Issues

• Mix of Query Processing and Reasoning
  – e.g., FaCT, LOOM description logic reasoner for DMs?
  – reconciliation of DMs via argumentation-frameworks (“games”) using well-founded and stable models of logic programs??? [ICDT97,PODS97,TCS00]

• Modeling “Process Knowledge” => Process Maps
  – formal semantics? (dynamic/temporal/Kripke models?)
  – executable semantics? (Statelog?)

• Graph Queries over DMs and PMs
  – expressible in F-logic [InfSystems98]
  – scalability? (UMLS Domain Map has millions of entries)

• ...


Process Maps with Abstractions and Elaborations: => From Terminological to Procedural Glue

• nodes ~ states
• edges ~ processes, transitions
• blue/red edges: processes in Src1/Src2
• general form of edges:

how about these?


Models and Formal Approaches: Relating Theory to the World

©2000 by John F. Sowa, http://www.jfsowa.com/krbook/, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA.

All models are wrong, but some are useful!


Summary: Mediation Scenarios & Techniques

Federated Databases    | XML-Based Mediation   | Model-Based Mediation
One-World              | One-/Multiple-Worlds  | Complex Multiple-Worlds
Common Schema          | Mediated Schema       | Common Glue Maps
SQL, rules             | XML query languages   | DOOD query languages
Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings
Syntactic Joins        | Syntactic Joins       | “Semantic” Joins via Glue Maps
DB expert              | DB expert             | KRDB + domain expert


PART III: Knowledge-Based Archives


From XML-Based to Knowledge-Based Archives

• Collection-based archival with XML: save data "as is" plus...
  – ... separate content from presentation
  – ... tag your data (and take a lift in the info hierarchy)
  – ... use a self-describing, semistructured data format (XML)

• Knowledge-based archival: now add ...
  – ... conceptual level information
  – ... integrity constraints
  – ... explanations/derivation rules:
    • archiving only results y=f(x) vs. archiving the rules/function "f" (e.g. f = “the Florida procedure”...)

=> employ knowledge representation languages


Knowledge-Based Archival: Senate Example

Data provider says: “Please archive all records of legislative activities of the 106th senate!”

Integrity constraints, e.g.:
(1) {senators_with_file} = UNION (sponsor, cosponsors, submitted_by)
(2) {senators} = {sponsors} = {co-sponsors}

Violation:
– the rhs is a SUPERSET of the lhs!

Exceptions:
– (Chafee, John), (Gramm, Phil), (Miller, Zell)

(Possible) Explanations:
– senators who joined (Miller), passed away (Chafee), were forgotten (Gramm)!?

Checking ICs:
IF sponsor(X), not senator(X) THEN ADD(exception_log, missing_senator_info(X))

IF condition THEN action

Action = LOG, WARN, ABORT, ...
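The IF condition THEN action pattern can be sketched in Python as follows. This is an illustration only: `check_sponsors`, the `action` parameter values, and the placeholder senator names are assumptions, and only the LOG and ABORT actions are modeled.

```python
def check_sponsors(senators, sponsors, action="LOG"):
    """Apply: IF sponsor(X), not senator(X) THEN <action>.

    With action "LOG", violations are collected in an exception log;
    with action "ABORT", the first violation stops the ingestion.
    """
    exception_log = []
    for x in sorted(sponsors):
        if x not in senators:
            if action == "ABORT":
                raise ValueError(f"missing senator info: {x}")
            exception_log.append(("missing_senator_info", x))
    return exception_log


# Placeholder names, not taken from the Senate collection.
senators = {"Senator A", "Senator B"}
sponsors = {"Senator A", "Senator C"}

log = check_sponsors(senators, sponsors)
```

Logging instead of aborting lets the archive keep the exceptional records together with an explicit note of which constraint they violate.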


Maximizing “Self-Containedness” ...

• Self-validating archives: add ...
  – ... "executable knowledge" (= rules)
  – "helping (bugging?) the data provider"
  => add the functionality and meaning of DTD (+Schema+IC+...) validation to the AIP
  => package the validator!

• Self-instantiating archives: add ...
  – ... "executable ingestion process"
  – “helping the archival engineer (aka archivist)”
  – …here is: looking over your shoulder…
  => add the functionality of database transformations to the AIP
  => package the transformers!

• BUT packaging validators and transformers increases infrastructure dependence!


Maximize “Self-Containedness” ... While Minimizing Infrastructure Dependence

Basic Idea: use a language of executable specifications for self-validation and self-instantiation!
=> Use “Bootstrapping” for Self-Validating & Self-Instantiating Archives

Example: DTD Validator in Logic (F-Logic, Datalog,…)
% specify <!ELEMENT X (Y,Z)>
false IF P:X, not (P1.X):Y.
false IF P:X, not (P2.X):Z.
false IF P:X, not P[_-> _].
false IF P:X[N->_], not N=1, not N=2.
...
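For comparison, the same constraint `<!ELEMENT X (Y,Z)>` can be checked procedurally with Python's standard `xml.etree.ElementTree`. This is a sketch of the idea of reporting denials as violations, not a translation of the logic rules; the function name `violations` and the error messages are assumptions.

```python
import xml.etree.ElementTree as ET


def violations(root):
    """Check every <X> element against the content model <!ELEMENT X (Y,Z)>."""
    errors = []
    for p in root.iter("X"):
        children = [child.tag for child in p]
        if len(children) < 1 or children[0] != "Y":
            errors.append("first child of X is not Y")
        if len(children) < 2 or children[1] != "Z":
            errors.append("second child of X is not Z")
        if len(children) > 2:
            errors.append("X has more than two children")
    return errors


good = ET.fromstring("<X><Y/><Z/></X>")
bad = ET.fromstring("<X><Y/><Y/><Z/></X>")
```

Each check corresponds to one way the content model can be violated. The declarative version has the advantage that the rules themselves can be archived alongside the data, which is the point of the bootstrapping idea above.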


In Search of Semantics: What’s in a Rock?

• Name:
  – Basalt

• Description:
  – a hard, black volcanic rock with less than about 52 weight percent silica (SiO2).

• Colour:
  – When fresh it is black or greyish black; often weathers to a reddish or greenish crust.

• Texture:
  – Usually dense with no minerals identifiable in hand specimen; a freshly broken surface is dull in appearance. May be porphyritic.

• Structure:
  – Often vesicular and/or amygdaloidal. Xenoliths are relatively common and usually consist of olivine and pyroxene; they have a green colour...

• Mineralogy:
  – Phenocrysts are usually olivine (green, glassy), pyroxene (black, shiny) or plagioclase (white-grey, tabular). If olivine is present the rock is called olivine basalt. Microscopic examination shows the groundmass to consist of plagioclase (usually labradorite), pyroxene, olivine and magnetite, with a wide range of accessory minerals ...

• Field relations:
  – Lava flows and narrow dykes and sills. The edges of dykes or sills are often finer grained than the centers or even glassy, due to rapid cooling on intrusion ...


In Search of Semantics: What’s in a Rock?

• material = basalt
• shape = ...
• weight = ...
• date.formed = ...
• date.found = 1799 a.d.
• place.found = Rashid
• place.found.common-name = Rosetta
• date.created = 196 BC
• language1 = Hieroglyphic
• language2 = Demotic
• language3 = Greek

And private individuals shall also be allowed to keep the festival and set up the aforementioned shrine and have it in their homes, performing the aforementioned celebrations yearly, in order that it may be known to all that the men of Egypt magnify and honour the GOD EPIPHANES EUCHARISTOS the king, according to the law.

This decree shall be inscribed on a stela of Hard stone in sacred [i.e. hieroglyphic] and native [i.e. demotic] and Greek characters and set up in each of the first, second, and third [rank] temples beside the image of the ever living king.

So where did you find the semantics?


Summary: Towards Bootstrapping Knowledge-Based Archives

Figure: Baron von Münchhausen, pulling himself out of the swamp

• enable addition of semantic annotations ("knowledge") via logic rules to AIPs

• add executable specifications of semantics
  => AIP += KP (knowledge package, i.e., logic rules)
  => self-validating archive

• add executable specifications of the ingestion network
  => AIP += IN (ingestion network, ...more logic rules)
  => self-instantiating archive

=> bootstrapping knowledge-based archive with DTD/Schema/IC validation and ingestion transformations all expressed in a declarative logic program

• Outlook from the 2do list: build a prototype BARON = Bootstrapping Archive of Rules, Ontologies, and Ingestion Networks


Questions?

Queries?


References

• XML-Based and Model-Based Mediation:
  – MBM: Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, 2001.
  – VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT), Konstanz, Germany, LNCS 1777, Springer, 2000.
  – DOOD: Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.

• STATELOG (Logic Programming with States)
  – On Active Deductive Databases: The Statelog Approach, G. Lausen, B. Ludäscher, and W. May. In Transactions and Change in Logic Databases, Hendrik Decker, Burkhard Freitag, Michael Kifer, and Andrei Voronkov, editors. LNCS 1472, Springer, 1998.


• Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001, SDSC TR-2001-1, January 18, 2001.

• Knowledge-Based Persistent Archives, Reagan Moore, SDSC TR-2001-7, January 18, 2001.

• The Senate Legislative Activities Collection (SLA): a Case Study Infrastructure Research to Support Preservation Strategies, Richard Marciano, Bertram Ludäscher, Reagan Moore, SDSC TR-2001-5, January 18, 2001.

• Reference Model for an Open Archival Information System (OAIS), Draft Recommendation, Consultative Committee for Space Data Systems, CCSDS 650.0-R-1, May 1999.

• Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents, Alan R. Heminger, Steven B. Robertson.
