Enabling Ontology Evolution in Data Integration
Haridimos Kondylakis, Dimitris Plexousakis, Yannis Tzitzikas
Computer Science Department, University of Crete
Information Systems Laboratory, FORTH-ICS
Problem Statement
[Figure: a Data Integration System on top of several DBs; labels: Query, Sub-queries, Mappings]
Outline
1. Past Approaches
2. Our Idea
3. Modelling Ontology Evolution
4. Rewritings among ontology versions
5. Problems & Solutions
6. Rewritings to the sources
7. Implementation/Evaluation
8. Conclusions
1. Past Approaches (1/2)
Mapping Adaptation (Velegrakis, 2004)
Idea: After each small evolution step, the mapping can be incrementally adapted by applying local modifications.
• System-dependent
• The list of changes may not be given and should be discovered (how?)
• Multiple lists of changes may lead to the same effect
• Cannot handle complex change operations such as split & merge
• The algorithm should be reapplied after each primitive change
• Inefficient when we have a long list of changes
[Figure: S and the ontology versions O, O1, O2 with mappings M1, M2, M3; change steps: move elem, add elem, delete constraint]
Lack of a precise criterion under which the adapted mappings constitute indeed the “right result”
1. Past Approaches (2/2)
Mapping Composition (Bernstein, 2008)
Idea: Is it possible to generate M’ that is equivalent to the original mappings?
• No known implementation on ontology evolution
• First-order mappings: not closed under composition
• Second-order mappings: too difficult to handle
  • Not supported by DBMSs (and not likely to be in the future either)
  • Not understood by domain experts
[Figure: S, O and O’ connected by the mappings M, E and M’ = M ∘ E]
• Schema mapping tools can be used to construct E.
• The composition should be produced for all mappings.
• Several sets of mappings between each T and T’
“Everything should be made as simple as possible, but not simpler” - Albert Einstein
[Figure: a Data Integration System with an RDF/S ontology as global schema, queried in SPARQL, and mappings to the source DBs]
• System independent
• More intuitive
• Only one mapping set
• Modular
• Mappings created only once
• Verifiable mappings
“Everything that exists, it is only change”-Heraclitus 535 BCE
Definition (Change Operation). A change u from one ontology version O1 to another version O2 is defined as a tuple (δa, δd) where:
• δa corresponds to the triples that are added to O1 in order to get O2
• δd corresponds to the triples that are deleted from O1 in order to get O2
• δa(u) ∪ δd(u) ≠ ∅ and δa(u) ∩ δd(u) = ∅
• for two distinct changes u1, u2: δa(u1) ∩ δa(u2) = ∅ and δd(u1) ∩ δd(u2) = ∅
Definition (Application semantics of a high-level change). The application of u upon O, denoted by u(O), is defined as
u(O) = (O ∪ δa(u)) \ δd(u).
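The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of triples and the names `Change`, `is_well_formed` and `apply_change` are hypothetical and do not come from the slides.

```python
from typing import FrozenSet, NamedTuple, Tuple

# Assumed encoding: a triple written predicate(a, b) is stored as (predicate, a, b).
Triple = Tuple[str, str, str]


class Change(NamedTuple):
    name: str                    # label of the high-level change, e.g. "Rename"
    added: FrozenSet[Triple]     # delta_a(u)
    deleted: FrozenSet[Triple]   # delta_d(u)


def is_well_formed(u: Change) -> bool:
    # delta_a(u) union delta_d(u) is non-empty, and the two sets are disjoint.
    return bool(u.added | u.deleted) and not (u.added & u.deleted)


def apply_change(ontology: FrozenSet[Triple], u: Change) -> FrozenSet[Triple]:
    # u(O) = (O union delta_a(u)) minus delta_d(u)
    return (ontology | u.added) - u.deleted


rename = Change(
    "Rename",
    frozenset({("domain", "Person", "fullname")}),
    frozenset({("domain", "Person", "name")}),
)
o1 = frozenset({("domain", "Person", "name")})
o2 = apply_change(o1, rename)
```

Using immutable sets keeps every ontology version intact, so earlier versions remain available after a change is applied.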
3.1 Example
u1= (Delete, ø, {has_gender(Person, Gender)} )
u2= (Move, {has_cont_point(Person, Cont_Point)}, {has_cont_point(Actor, Cont_Point)})
u3 = (Merge, {domain(Cont_Point, address)},{domain(Cont_Point, street), domain(Cont_Point, city)})
u4 = (Rename, {domain(Person, fullname)}, {domain(Person, name)})
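As a rough check of the example, the four changes can be folded over a starting version. The sketch assumes that O1 contains exactly the triples that u1 to u4 delete (the real ontology in the figure has more, such as ssn), and the helper names are hypothetical.

```python
from functools import reduce


def t(predicate, a, b):
    # Assumed encoding of predicate(a, b) as a plain tuple.
    return (predicate, a, b)


# Each change is (name, delta_a, delta_d), in the order used on the slide.
changes = [
    ("Delete", frozenset(), frozenset({t("has_gender", "Person", "Gender")})),
    ("Move",
     frozenset({t("has_cont_point", "Person", "Cont_Point")}),
     frozenset({t("has_cont_point", "Actor", "Cont_Point")})),
    ("Merge",
     frozenset({t("domain", "Cont_Point", "address")}),
     frozenset({t("domain", "Cont_Point", "street"),
                t("domain", "Cont_Point", "city")})),
    ("Rename",
     frozenset({t("domain", "Person", "fullname")}),
     frozenset({t("domain", "Person", "name")})),
]


def apply_change(ontology, change):
    _name, added, deleted = change
    return (ontology | added) - deleted


# Assumption: O1 holds only the triples touched by the changes above.
o1 = frozenset({
    t("has_gender", "Person", "Gender"),
    t("has_cont_point", "Actor", "Cont_Point"),
    t("domain", "Cont_Point", "street"),
    t("domain", "Cont_Point", "city"),
    t("domain", "Person", "name"),
})

# us(O1) = O2: apply u1, u2, u3, u4 in sequence.
o2 = reduce(apply_change, changes, o1)
```

Under these assumptions O2 ends up with three triples: has_cont_point(Person, Cont_Point), domain(Cont_Point, address) and domain(Person, fullname).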
[Figure: the example ontology before and after the changes, with classes Person, Actor, Gender, Cont. Point and Literal, and properties name, fullname, ssn, has_gender, has_cont_point, street, city, address]
• Intuitive
• Concise
• Can describe complex evolution
4. Data Integration Redefined
[Figure: RDF/S ontology, mappings, sources]
Definition (Data Integration): A data integration system I is a quadruple (O, E, S, M) where
• O is a version of the ontology,
• E is the evolution log of the ontology (between the ontology versions under consideration),
• S is the set of the local sources,
• M is the mapping between S and one version Oi.
4.1 Affecting change operations
Definition (Affecting change operation). A change operation u affects the query Q (with graph pattern G), i.e. u ◊ Q, if δd(u) ≠ ∅ and ∃ triple pattern t ∈ G that can be unified with a triple of δd(u).
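A minimal sketch of this test, assuming triple patterns use the same tuple form as the triples and mark variables with a leading "?". The names `unify` and `affects` are hypothetical, and the unification shown is plain positional matching, not a full SPARQL semantics.

```python
def unify(pattern, triple):
    """Return variable bindings if the pattern matches the triple, else None."""
    bindings = {}
    for p, value in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must bind to the same value everywhere in the pattern.
            if bindings.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return bindings


def affects(deleted, graph_pattern):
    """u affects Q if delta_d(u) is non-empty and some pattern of G unifies with it."""
    return bool(deleted) and any(
        unify(pattern, triple) is not None
        for pattern in graph_pattern
        for triple in deleted
    )


graph_pattern = [("domain", "?c", "name"), ("domain", "Cont_Point", "street")]
rename_deleted = frozenset({("domain", "Person", "name")})
pure_addition_deleted = frozenset()
```

With this encoding, a change that only adds triples never affects a query, which matches the condition δd(u) ≠ ∅.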
Definition (Valid Rewriting): Let q be a query expressed in O1 and us a sequence of change operations such that us(O1) = O2. q' is a valid rewriting of q over O2 using us if ∀ ui ∈ us such that ui ◊ q it holds that |δa(ui)| > 0 and ∀ t ∈ δd(ui), t ◊ q, and it is constructed as follows:
q' := (q – δd(ui)) ∪ δa(ui).
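The construction can be sketched as a single pass over the change sequence. The sketch makes two simplifying assumptions that are not in the slides: the query is a set of ground triple patterns, so unification reduces to equality, and the condition on δd(ui) is read as "every deleted triple occurs in the query". The function name is hypothetical.

```python
def valid_rewriting(query, changes):
    """Return the rewritten query, or None when no valid rewriting exists.

    `changes` is a sequence of (name, delta_a, delta_d) tuples.
    """
    q = frozenset(query)
    for _name, added, deleted in changes:
        if not (deleted and q & deleted):
            continue                      # the change does not affect q
        if not added or not deleted <= q:
            return None                   # nothing to substitute, or partial match
        q = (q - deleted) | added         # q' := (q - delta_d) union delta_a
    return q


changes = [
    ("Delete", frozenset(), frozenset({("has_gender", "Person", "Gender")})),
    ("Merge",
     frozenset({("domain", "Cont_Point", "address")}),
     frozenset({("domain", "Cont_Point", "street"),
                ("domain", "Cont_Point", "city")})),
    ("Rename",
     frozenset({("domain", "Person", "fullname")}),
     frozenset({("domain", "Person", "name")})),
]

q = {("domain", "Person", "name"),
     ("domain", "Cont_Point", "street"),
     ("domain", "Cont_Point", "city")}
rewritten = valid_rewriting(q, changes)
no_rewriting = valid_rewriting({("has_gender", "Person", "Gender")}, changes)
```

Each change is inspected once against the query, so the loop does work proportional to the number of changes times the size of the sets involved.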
4.2 Query answering semantics
Definition (Equivalent query rewriting) (Lenzerini, 2002): Let O1, O2 be two ontology versions, E a set of dependencies on O1 ∪ O2, and q2 an O2-query.
An equivalent rewriting of q2 in the presence of E is an O1-query q1 such that q1 gives the same answers as q2 on any O1 instance that satisfies E.
Theorem: Valid rewritings are equivalent query rewritings and can be computed in O(N*T) time (N = number of change operations in us, T = number of triples in G).
4.3 Results
Proposition (Uniqueness): Valid rewritings are unique
Proposition (Inverse Query Rewriting): If q2 is a query over O2 and E is the evolution log from O1 to O2, we can produce an equivalent rewriting of q2 over O1 by computing the valid rewriting of q2 using the sequence of the inverted changes of E.
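One way to read "the sequence of the inverted changes" is to reverse the log and swap δa with δd in every change. The sketch below assumes exactly that reading; `invert_change` and `invert_log` are hypothetical names.

```python
def invert_change(change):
    # Swapping delta_a and delta_d turns a Rename into the opposite Rename,
    # a Merge into a Split, and so on.
    name, added, deleted = change
    return (f"Inverse({name})", deleted, added)


def invert_log(changes):
    # The last change applied must be undone first.
    return [invert_change(u) for u in reversed(changes)]


log = [
    ("Merge",
     frozenset({("domain", "Cont_Point", "address")}),
     frozenset({("domain", "Cont_Point", "street"),
                ("domain", "Cont_Point", "city")})),
    ("Rename",
     frozenset({("domain", "Person", "fullname")}),
     frozenset({("domain", "Person", "name")})),
]
inverted = invert_log(log)
```

Note that inverting a pure deletion yields a change with an empty δd, which by definition affects no query.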
4.3. Example
[Figure: two ontology versions (classes Person, Actor, Gender, Cont. Point, Literal; properties name, fullname, ssn, has_gender, has_cont_point, street, city, address) and two graph patterns, labelled “Initial Query” and “Rewritten Query”, over the variables ?NAME, ?SSN, ?STREET, ?CITY and ?Address]
5. Problems & Solutions
[Figure: an ontology version with classes Actor, Person, Cont. Point and Literal (properties fullname, ssn, address, has_cont_point) and a query graph pattern over ?NAME, ?SSN and ?Address]
Problem Identification: One class is deleted but there exists a parent class, maintaining all properties
Problem resolution: Use that class to find more general answers
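The resolution can be illustrated with a small sketch that swaps a deleted class for its parent in every triple pattern of the query. The `subclass_of` dictionary, the function name and the sample query are assumptions made for illustration; the slides do not show how the prototype does this.

```python
def generalize(query, deleted_class, subclass_of):
    """Replace a deleted class with its parent class, or return None if it has none."""
    parent = subclass_of.get(deleted_class)
    if parent is None:
        return None
    return frozenset(
        tuple(parent if term == deleted_class else term for term in pattern)
        for pattern in query
    )


# Hypothetical situation: Actor was deleted, Person is its parent class.
subclass_of = {"Actor": "Person"}
query = {("ssn", "Actor", "?SSN"), ("has_cont_point", "Actor", "?CP")}
general_query = generalize(query, "Actor", subclass_of)
```

The generalized query may return answers that are not actors, which is why such results are described as more general rather than equivalent.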
6.1. System Architecture
DlvHex Prototype (Polleres, 2007)
6.2 Source Rewriter
Traditionally the problem was to find the maximally contained rewriting for one user query
• Algorithms: MiniCon (Pottinger, 2001), Bucket, Inverse rules
Now we have several queries, one for each ontology version.
• Information might need to be combined among ontology versions
6.3 Source Rewriter
Reuse the best algorithm for finding maximally contained rewritings
But adapt it for multiple queries
Properties of the algorithm
• Sound & complete
• Complexity O(q·(n·m·M)^n), where
  • q is the number of valid rewritings,
  • n is the number of subgoals in the biggest query,
  • m is the maximal number of subgoals in a view,
  • M is the number of the mappings
Algorithm 3.3: EDI-Minicon(Q, M)
Input: Q, a set of datalog queries; M, the mappings
Output: the set of maximally-contained rewritings MQ
  Initialize MCD = {}, MQ = {}
  For each qj in Q
      MCDj := FormMCDs(qj, M)
      Add MCDj to MCD
  For each qj in Q
      mqj := CombineMCDs(MCD, qj)
      Add mqj to MQ
  Return MQ
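The control flow of Algorithm 3.3 can be sketched as follows. FormMCDs and CombineMCDs are MiniCon's own steps and are not reproduced here; they are passed in as the hypothetical callables `form_mcds` and `combine_mcds`. Pooling all MCDs into one list is an assumption about what "Add MCDj to MCD" means.

```python
from typing import Callable, List, Sequence, TypeVar

Query = TypeVar("Query")
Mapping = TypeVar("Mapping")
MCD = TypeVar("MCD")
Rewriting = TypeVar("Rewriting")


def edi_minicon(
    queries: Sequence[Query],
    mappings: Sequence[Mapping],
    form_mcds: Callable[[Query, Sequence[Mapping]], List[MCD]],
    combine_mcds: Callable[[List[MCD], Query], Rewriting],
) -> List[Rewriting]:
    # Phase 1: form the MCDs of every query (one query per ontology version)
    # and pool them, so information can be combined across versions.
    pooled: List[MCD] = []
    for q in queries:
        pooled.extend(form_mcds(q, mappings))

    # Phase 2: combine the pooled MCDs into a rewriting for each query.
    rewritings: List[Rewriting] = []
    for q in queries:
        rewritings.append(combine_mcds(pooled, q))
    return rewritings
```

Both loops run once per input query, so the skeleton itself adds only a factor linear in the number of queries on top of the cost of the two MiniCon steps.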
7. Preliminary Evaluation
• CIDOC-CRM: 80 classes, 250 properties, 726 changes (01.02.02-01.06.05)
• Queries: 50 real user queries from 3D-COFORM
• Adding & restructuring information does not affect valid rewritings
• Deleting information, however, does
• In general, assuming queries over v.4.2 of CIDOC, we would be able to rewrite 89% of them
7.4 Problems: Fiction or Reality?
In general, assuming queries over v.4.2 of CIDOC, we would be able to rewrite 89% of them to v.3.2.1
[Figure: timeline of ontology versions over the classes A, B, C, D, with the change steps “Del D, Add C” and “Add D, Del C”]
It makes no sense searching for C in previous versions
Actually, we can provide access to 99% of the source information through valid rewritings
7.2 Avg Running Time: 0.06 msec
8.1 Advantages of our approach
• We don’t rewrite all the mappings but the query
  • Exploits the locality of the query
  • Mappings are produced one time and can be validated by domain experts
  • Greatly reduces human effort & time spent
  • Our approach works independently of the family of mappings to the sources (GAV, LAV, GLAV, nested, etc.)
  • The mappings to the sources are not affected at all, in order to maintain their initial semantics
  • Modularity & scalability: new mappings or ontology changes can be defined independently
• We use high-level changes to model ontology evolution
  • High-level changes can model complex ontology evolution
  • Reduces the size of the evolution log
  • Can be provided efficiently for two ontology versions
8.2 Advantages of our approach
• Valid rewritings
  • We define the answer semantics in such a setting
  • Precise criteria exist for deciding when it is possible to compute valid rewritings
  • With small complexity
• Even when no valid rewritings exist
  • More general answers can still be returned
  • We can guide the user in mapping redefinition
• Computing source rewritings
  • The increased computational complexity is linear in the number of input queries and remains scalable
8.3 Conclusions
Ontology evolution is a reality and data integration systems should be aware of it
We have shown how to answer queries over multiple ontology versions
To the best of our knowledge no system today is capable of query answering over multiple ontology versions
Future Work
• More extensive evaluation using Gene Ontology
• Semantic infrastructure for plugIT
• Integrate our system into Protégé, MASTRO system
• Extend our approach to OWL variants
• Consider RDF sources and their evolution as well
References
1. Philip A. Bernstein, Todd J. Green, Sergey Melnik, Alan Nash: Implementing mapping composition. VLDB J. 17(2):333-353 (2008)
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides: On Detecting High-Level Changes in RDF/S KBs. International Semantic Web Conference 2009:473-488
3. Maurizio Lenzerini: Data Integration: A Theoretical Perspective. PODS 2002:233-246
4. Rachel Pottinger, Alon Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. 10(2-3):182-198 (2001)
5. Axel Polleres: From SPARQL to rules (and back). WWW 2007:787-796
6. Yannis Tzitzikas, Dimitris Kotzinos: (Semantic web) evolution through change logs: Problems and solutions. Artificial Intelligence and Applications 2007:654-659
7. Yannis Velegrakis, Renée J. Miller, Lucian Popa, John Mylopoulos: ToMAS: A System for Adapting Mappings while Schemas Evolve. ICDE 2004:862
Questions?