1 Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong...
1
Global-as-View and Local-as-View for Information Integration
CS652 Spring 2004
Presenter: Yihong Ding
2
Common Integration Architecture
• Information Integration Systems
• Global-as-View (GAV) vs. Local-as-View (LAV)
• Query Reformulation
• Specification of Source Description
• Adding new sources
3
Query Reformulation
• Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema
Given a query Q in terms of the mediator schema relations, and descriptions of the information sources,
find a query $Q'$ that uses only the source relations, such that
– $Q' \subseteq Q$, and
– $Q'$ provides all possible answers to Q given the sources
5
Query Rewriting Using Views
• Query containment: $q' \subseteq q \iff \forall D:\ q'(D) \subseteq q(D)$
• Query equivalence: $q' = q \iff q' \subseteq q \wedge q \subseteq q'$
Given query q and view definitions V = {v1, …, vn}:
• $q'$ is an equivalent rewriting of q using V if
  – $q'$ refers only to views in V, and
  – $q' = q$
• $q'$ is a maximally-contained rewriting of q using V if
  – $q'$ refers only to views in V,
  – $q' \subseteq q$, and
  – there is no rewriting $q_1$ such that $q' \subseteq q_1$ and $q_1 \not\subseteq q'$
7
Complexity of Query Containment
• Conjunctive queries (CQs) (NP-complete)
  – Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
  – Q2: p(X,Z) :- a(X,Y) & a(V,Z)
• CQs with negation ($\Pi_2^p$-complete)
  – Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)
• CQs with arithmetic comparisons ($\Pi_2^p$-complete)
  – Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y
• Datalog programs (undecidable in general)
  – p(A,C) :- a(A,B) & b(B,C)
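For plain conjunctive queries, containment can be decided with the classic containment-mapping (homomorphism) test: $Q_1 \subseteq Q_2$ holds exactly when some mapping sends the head of $Q_2$ onto the head of $Q_1$ and every body atom of $Q_2$ onto a body atom of $Q_1$. The brute-force Python sketch below checks Q1 and Q2 from the first bullet. The tuple encoding (uppercase strings are variables) and the function names are illustrative assumptions, and negation, comparisons, and recursion are not handled.

```python
from itertools import product

# A conjunctive query is (head, body): head is a tuple of terms and body is a
# list of atoms (predicate, args). Assumed convention: uppercase terms are variables.


def is_var(term: str) -> bool:
    return term[:1].isupper()


def extend(mapping: dict, src: tuple, dst: tuple) -> bool:
    """Extend `mapping` so each term in src maps to the matching term in dst."""
    for s, d in zip(src, dst):
        if is_var(s):
            if mapping.setdefault(s, d) != d:  # variable already mapped elsewhere
                return False
        elif s != d:  # constants must map to themselves
            return False
    return True


def contained_in(q1, q2) -> bool:
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    # For each atom of q2, the atoms of q1 it could map to (same predicate and arity).
    candidates = [
        [a for a in body1 if a[0] == pred and len(a[1]) == len(args)]
        for pred, args in body2
    ]
    # Brute force over all choices: exponential, in line with NP-completeness.
    for choice in product(*candidates):
        mapping: dict = {}
        if extend(mapping, head2, head1) and all(
            extend(mapping, args2, args1)
            for (_, args2), (_, args1) in zip(body2, choice)
        ):
            return True
    return False


Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])
print(contained_in(Q1, Q2))  # True
print(contained_in(Q2, Q1))  # False
```

Q1 is contained in Q2 because Q2's second atom a(V,Z) can map onto Q1's a(Y,Z); the reverse fails because Q1's chain through Y has no counterpart in Q2.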
8
Specification of Source Description
• Views: resources that the integrator uses to help answer queries
• GAV: each mediator relation is defined as a view over the source relations
• LAV: each source relation is defined as a view over the mediator relations
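Because a GAV mediator relation is itself a query over the sources, a mediator query can be reformulated by unfolding: each mediator atom is replaced by the body of its definition. The Python sketch below assumes one rule per mediator relation with distinct head variables and lowercase constants; the mediator relation Book and the source relation s1_catalog are invented purely for illustration. An LAV description runs the other way (the source is the view), so it cannot be unfolded like this and needs a rewriting algorithm such as the bucket algorithm.

```python
# GAV definitions (hypothetical): mediator relation -> (head vars, body atoms over sources).
# Assumptions: one rule per mediator relation, distinct head variables,
# uppercase terms are variables, lowercase terms are constants.
GAV_RULES = {
    "Book": (("T", "A"), [("s1_catalog", ("T", "A", "Y"))]),
}


def is_var(term: str) -> bool:
    return term[:1].isupper()


def unfold(query, rules):
    """Replace every mediator atom in the query body by its GAV definition."""
    head, body = query
    new_body = []
    for n, (pred, args) in enumerate(body):
        if pred not in rules:  # already a source atom: keep it
            new_body.append((pred, args))
            continue
        rule_head, rule_body = rules[pred]
        # Head variables take the query's arguments; other rule variables
        # get a suffix so that separate unfoldings do not share them.
        subst = dict(zip(rule_head, args))
        for rpred, rargs in rule_body:
            new_args = tuple(
                subst.get(t, f"{t}_{n}") if is_var(t) else t for t in rargs
            )
            new_body.append((rpred, new_args))
    return head, new_body


# q(T) :- Book(T, aho)
query = (("T",), [("Book", ("T", "aho"))])
print(unfold(query, GAV_RULES))
# → (('T',), [('s1_catalog', ('T', 'aho', 'Y_0'))])
```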
9
Information Integration Systems
• TSIMMIS
  – Stanford and IBM
  – Global-as-View (GAV)
  – Mediator relations defined as views of source relations
• Information Manifold (IM)
  – AT&T
  – Local-as-View (LAV)
  – Description logic
  – Source relations defined as views of mediator relations (a collection of global predicates)
10
TSIMMIS – GAV Solution
• The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS)
• Offers:
  – A flexible data model
  – A common query language
  – Other supporting tools
11
TSIMMIS – Components
• OEM (Object-Exchange Model)
• LOREL (Lightweight Object REpository Language)
• MSL (Mediator Specification Language)
• Wrappers
12
TSIMMIS – OEM
• Object Exchange Model
• The data model for TSIMMIS
• “Self-describing” (labels carry all of the information that there is about an object)
• Flexible
• First-order logic
13
TSIMMIS – OEM
OID: label type value
• OID: object identifier
• label: human understandable
• type: “set” or “string”
• value: a set or a string
15
TSIMMIS – OEM
First-order predicate logic
[Figure: an OEM object with label author, type string, and value “Aho”]
author( T, “Aho” )
This would return the object IDs of all objects with label “author” and value “Aho”.
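The OID/label/type/value structure and the predicate above can be sketched in Python as a recursive data class plus a search. The class, the helper names, the OIDs, and the sample authors are hypothetical; the sketch only illustrates the idea that label(T, value) holds for every string object T with that label and value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class OEMObject:
    oid: str                            # object identifier
    label: str                          # human-understandable label
    type: str                           # "set" or "string"
    value: Union[str, list[OEMObject]]  # a string, or the set's sub-objects


def walk(obj: OEMObject) -> Iterator[OEMObject]:
    """Yield obj and, recursively, every sub-object of a set."""
    yield obj
    if obj.type == "set":
        for child in obj.value:
            yield from walk(child)


def label_value(root: OEMObject, label: str, value: str) -> list[str]:
    """OIDs T satisfying label(T, value), e.g. author(T, "Aho")."""
    return [o.oid for o in walk(root)
            if o.label == label and o.type == "string" and o.value == value]


# Hypothetical data; the OIDs and the second author are invented.
db = OEMObject("&1", "library", "set", [
    OEMObject("&2", "book", "set", [OEMObject("&3", "author", "string", "Aho")]),
    OEMObject("&4", "book", "set", [OEMObject("&5", "author", "string", "Ullman")]),
])
print(label_value(db, "author", "Aho"))  # ['&3']
```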
16
TSIMMIS – LOREL
• Lightweight Object REpository Language
• An OQL-based query language for OEM
• The end-user language for TSIMMIS
18
TSIMMIS – LOREL
• Partial Match Semantics
select R.A
from R, S, T
where R.A = S.A or R.A = T.A
• This would fail to return anything in SQL if either S or T were empty.
• Because of partial-match semantics, this query does not fail in LOREL
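SQL returns nothing here because FROM R, S, T ranges over the cross product of the three relations, which is empty as soon as S or T is empty. The Python sketch below contrasts that with one simplified reading of partial-match semantics, in which each disjunct is checked only against the relation it mentions. Relations are modeled as sets of A values; this is an illustration under that assumption, not LOREL's actual evaluation algorithm.

```python
from itertools import product


def sql_semantics(R: set, S: set, T: set) -> set:
    # FROM R, S, T iterates over the cross product; if S or T is empty,
    # there are no rows at all, so the WHERE clause never succeeds.
    return {r for r, s, t in product(R, S, T) if r == s or r == t}


def partial_match(R: set, S: set, T: set) -> set:
    # Simplified partial-match reading: each disjunct only needs the
    # relation it mentions, so an empty S does not block matches in T.
    return {r for r in R if r in S or r in T}


R, S, T = {1, 2, 3}, set(), {2, 3}
print(sql_semantics(R, S, T))  # set()
print(partial_match(R, S, T))  # {2, 3}
```

When S and T are both non-empty, the two functions agree.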
19
TSIMMIS – MSL
• Mediator Specification Language
• Allows declarative specification of mediators
• Object-oriented, logical query language
• Targeted to OEM
20
TSIMMIS – MSL
[Figure: a query flows from the Mediator to a Wrapper to the Source]
<booktitle X> :- <library { <book { <title X> <author “Aho”> } > } > @s1
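[Figure: OEM data at source s1: a library object (type set) containing a book object (type set) whose sub-objects are author (string, “Aho”) and title (string, “Compilers…”)]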
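Read as a rule, this MSL specification produces a booktitle object for every title X of a book that has author “Aho” inside a library at source s1. The Python sketch below hand-codes that single pattern match over OEM data encoded as nested (label, value) pairs. The encoding, the function name, and the sample titles are hypothetical, and it is not a general MSL evaluator.

```python
# OEM data as nested (label, value) pairs; a list value plays the role of "set".
# Hypothetical contents of source s1.
S1 = [
    ("library", [
        ("book", [("title", "Title A"), ("author", "Aho")]),
        ("book", [("title", "Title B"), ("author", "Smith")]),
    ]),
]


def booktitle_rule(source):
    """<booktitle X> :- <library {<book {<title X> <author "Aho">}>}> @s1"""
    results = []
    for lib_label, lib_value in source:
        if lib_label != "library" or not isinstance(lib_value, list):
            continue
        for book_label, book_value in lib_value:
            if book_label != "book" or not isinstance(book_value, list):
                continue
            # The book must have an <author "Aho"> sub-object...
            if ("author", "Aho") in book_value:
                # ...and each of its <title X> sub-objects binds X.
                results += [("booktitle", x) for lbl, x in book_value if lbl == "title"]
    return results


print(booktitle_rule(S1))  # [('booktitle', 'Title A')]
```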
21
TSIMMIS – Wrappers
• Wrappers are similar to database drivers
• Wrappers are written with MSL
22
TSIMMIS – Wrappers
• Wrappers have the form:
MSL template // action //
• Example:
<books X> :- <library { X:<book {<title X> <author $AU>}> }>@s1
// sprintf(lookup-query, “find author %s”, $AU) //
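The template-plus-action idea can be sketched in Python: the wrapper checks whether an incoming query fixes the author, binds that value to $AU, and fills in the native lookup string the same way the sprintf action does. The textual query format matched by the regular expression and the function name are hypothetical, and matching on raw strings is a simplification made for brevity.

```python
import re

# Template: <books X> :- <library { X:<book {<title X> <author $AU>}> }>@s1
# Action:   // sprintf(lookup-query, "find author %s", $AU) //
AUTHOR_PATTERN = re.compile(r'<author\s+"([^"]+)">')


def translate(query: str):
    """Return the native lookup query, or None if the template does not match."""
    match = AUTHOR_PATTERN.search(query)
    if match is None:
        return None               # no author constant, so $AU cannot be bound
    au = match.group(1)           # binding for $AU
    return "find author %s" % au  # same effect as the sprintf action


incoming = '<books X> :- <library { X:<book {<title X> <author "Aho">}> }>@s1'
print(translate(incoming))  # find author Aho
```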
23
TSIMMIS – Summary
• End users need to specify their sources w.r.t. a mediator model – OEM in TSIMMIS
• Query specification is standard – LOREL
• Query rewriting is straightforward – MSL and wrappers
• Adding a new source is not easy – it must be specified in the mediator model
24
Information Manifold
• Challenges for information integration
  – Interrelated data spread over multiple information sources
  – A large number of sources
  – A limited amount of data in many of the sources
  – Interaction details that vary greatly from source to source
26
World View
Virtual relations:
Product(Model)
Automobile(Model, Year, Category)
Motorcycle(Model, Year)
Car(Model, Year, Category)
NewCar(Model, Year, Category)
UsedCar(Model, Year, Category)
CarForSale(Model, Year, Category, Price, SellerContact)
Classes:
[Figure: class hierarchy relating Product, Automobile, Motorcycle, Car, NewCar, UsedCar, and CarForSale]
27
Source Descriptions
For each source:
• Content record
• Capability record
[Figure: Web sources for the automobile application]
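One way to hold a source description in code is sketched below in Python. The content record is kept as a view definition over the world-view relations, and the capability record as the (inputs, outputs, constraints) triple used for V1 in the containment example later in this deck. The field names, the can_answer helper, and V1's content definition are assumptions, not the Information Manifold's actual record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    name: str
    content: str             # content record: view over world-view relations
    inputs: frozenset        # capability: attributes that must be bound to query the source
    outputs: frozenset       # capability: attributes the source can return
    constraints: tuple = ()  # conditions every tuple in the source satisfies

    def can_answer(self, bound: set, wanted: set) -> bool:
        """All required inputs are bound and all wanted attributes are output."""
        return self.inputs <= bound and wanted <= self.outputs


V1 = SourceDescription(
    name="V1",
    content="V1(c) :- CarForSale(c), UsedCar(c)",  # hypothetical content record
    inputs=frozenset({"Category"}),
    outputs=frozenset({"Price", "Model", "Year"}),
    constraints=("Year >= 1992", "Category = sportscar"),
)
print(V1.can_answer({"Category"}, {"Model", "Price"}))  # True
print(V1.can_answer(set(), {"Model"}))                  # False: Category unbound
```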
30
Query Reformulation
• Contained rather than equivalent rewritings
  – Sources are incomplete
  – A useful subset of the answers is acceptable
• Uses a plan generator to:
  – Prune irrelevant sources
  – Split the query into subgoals
  – Generate conjunctive query plans
  – Find an executable ordering of subgoals
31
The Bucket Algorithm
Given: user query q, source descriptions {Vi}
1. Find relevant sources (fill the buckets): for each relation g in query q
• Find each Vj that contains relation g
• Check that the constraints in Vj are compatible with q
2. Combine the source relations {Vj}, one from each bucket, into a conjunctive query $q'$ and check for containment ($q' \subseteq q$)
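The two steps can be sketched in Python under a deliberately small constraint model: each hypothetical source lists the world-view relations it covers plus a year interval and an optional category, and a source is compatible unless those contradict the query. Step 1 fills one bucket per subgoal; step 2 is shown only up to forming the candidate combinations, because the containment test needs the expanded views. All source definitions and constraint values below are invented for illustration.

```python
import math
from itertools import product

# Hypothetical constraint model: a year interval plus an optional category.
QUERY_CONS = {"year": (1992, math.inf), "category": "sportscar"}

# Hypothetical sources: the world-view relations each covers, and its constraints.
VIEWS = {
    "V1": ({"CarForSale", "Category", "Year", "Model", "Price"},
           {"year": (1992, math.inf), "category": "sportscar"}),
    "V2": ({"CarForSale", "Category", "Year", "Model", "Price"},
           {"year": (1950, 1980), "category": None}),
    "V3": ({"CarForSale", "Category", "Year", "Model", "Price"},
           {"year": (1985, 2000), "category": None}),
    "V5": ({"ProductReview"}, {"year": (0, math.inf), "category": None}),
}


def compatible(view_cons, query_cons):
    """A view is usable unless its constraints contradict the query's."""
    lo = max(view_cons["year"][0], query_cons["year"][0])
    hi = min(view_cons["year"][1], query_cons["year"][1])
    if lo > hi:  # year ranges do not overlap
        return False
    vc, qc = view_cons["category"], query_cons["category"]
    return vc is None or qc is None or vc == qc


def fill_buckets(subgoals, query_cons, views):
    """Step 1: one bucket per query subgoal, holding the relevant views."""
    return {
        g: [name for name, (relations, cons) in views.items()
            if g in relations and compatible(cons, query_cons)]
        for g in subgoals
    }


def candidate_rewritings(buckets):
    """Step 2 (first half): pick one view per bucket to form a candidate q'.
    Each candidate must still be expanded and checked for containment in q (not shown)."""
    return [dict(zip(buckets, combo)) for combo in product(*buckets.values())]


SUBGOALS = ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"]
buckets = fill_buckets(SUBGOALS, QUERY_CONS, VIEWS)
print(buckets["Year"])                     # ['V1', 'V3']
print(len(candidate_rewritings(buckets)))  # 32
```

V2 is pruned because its years end before 1992; with two usable sources for each of five subgoals and one for ProductReview, step 2 has 2^5 = 32 candidates to test, which is why the containment check matters.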
32
The Bucket Algorithm: Example
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y >= 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
33
1. Filling the Buckets
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y >= 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Buckets (query constraints: t = sportscar, y >= 1992):
• CarForSale(c): V1(c1), V2(c2), V3(c3)
• Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3)
• Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3)
• Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
• Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
• ProductReview(m,y,r): V5(m5,y5,r5)
34
2. Checking Containment
User query:
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y >= 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result query:
q’(m,p,r) :- V1(c)({Category(c): sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) >= 1992, Category(c) = sportscar}), V5(m,y,r)({m: Model(c), y: Year(c)}, {r}, {}).
Containment check: is the expanded query below contained in the user query?
Expanded query:
q’(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t = sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y >= 1992
35
Finding an Executable Ordering
Subgoals and chosen sources (constraints: y >= 1992, t = sportscar):
CarForSale(c) → V1(c); Category(c,t) → V1(c,t); Year(c,y) → V1(c,y); Model(c,m) → V1(c,m); Price(c,p) → V1(c,p); ProductReview(m,y,r) → V5(m,y,r)
Bindings available (BindAvail) as subgoals are added to the plan:
• {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
• {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
• {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y >= 1992}
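The ordering step can be illustrated with a greedy Python sketch: a subgoal is scheduled only when every variable it needs as input is already bound, and its outputs then join the available bindings, much as BindAvail grows above. The greedy strategy and the (name, needs, provides) encoding are assumptions for illustration, not necessarily the plan generator's exact procedure.

```python
def executable_order(steps, bound=()):
    """Greedily order plan steps so each runs only once its inputs are bound.

    steps: list of (name, needs, provides), where needs/provides are sets of
    variables. Returns the step names in an executable order, or None.
    """
    bound = set(bound)
    remaining = list(steps)
    order = []
    while remaining:
        for step in remaining:
            name, needs, provides = step
            if needs <= bound:       # all required variables are available
                order.append(name)
                bound |= provides    # the step's outputs become available
                remaining.remove(step)
                break
        else:
            return None              # no remaining step can be executed
    return order


# V1's input (Category) is the query constant sportscar, so it needs no variables.
steps = [
    ("V5(m,y,r)", {"m", "y"}, {"r"}),        # reviews need model and year bound
    ("y >= 1992", {"y"}, set()),             # a comparison runs once y is bound
    ("V1(c)", set(), {"c", "m", "y", "p"}),
]
print(executable_order(steps))  # ['V1(c)', 'V5(m,y,r)', 'y >= 1992']
```

If no source can supply m and y, the sketch returns None, signaling that the plan has no executable ordering.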
36
Advantages and Disadvantages
• GAV: TSIMMIS
  – Advantage
    • Query reformulation: rule unfolding
  – Disadvantages
    • Mediator description
    • Adding, removing, and modifying source descriptions
  – Better for static, centralized systems
• LAV: Information Manifold
  – Advantage: adding new sources
    • Mediator (global predicates, source descriptions)
    • Query processing
  – Disadvantage
    • Query reformulation (bucket algorithm)
  – Better for dynamic, distributed systems