Enabling Ontology Evolution in Data Integration

Haridimos Kondylakis, Dimitris Plexousakis, Yannis Tzitzikas

Computer Science Department, University of Crete

Information Systems Laboratory, FORTH-ICS

Problem Statement

[Figure: Data Integration System. A query is split into sub-queries that reach the source DBs through mappings.]

Outline

1. Past Approaches

2. Our Idea

3. Modelling Ontology Evolution

4. Rewritings among ontology versions

5. Problems & Solutions

6. Rewritings to the sources

7. Implementation/Evaluation

8. Conclusions

1. Past Approaches (1/2)

Mapping Adaptation (Velegrakis, 2004)

Idea: After each small evolution step, the mapping can be incrementally adapted by applying local modifications.

• System-dependent
• The list of changes may not be given and must then be discovered (how?)
• Multiple lists of changes may lead to the same effect
• Cannot handle complex change operations such as split & merge
• The algorithm must be reapplied after each primitive change, which is inefficient for a long list of changes
• Lacks a precise criterion under which the adapted mappings indeed constitute the “right result”

[Figure: source S and ontology versions O, O1, O2; the mappings M1, M2, M3 are adapted after each change (move elem, add elem, delete constraint).]

1. Past Approaches (2/2)

Mapping Composition (Bernstein, 2008)

Idea: Is it possible to generate a mapping M’ that is equivalent to the original mappings?

• No known implementation for ontology evolution
• First-order mappings are not closed under composition
• Second-order mappings are too difficult to handle: not supported by DBMSs (and not likely to be in the future either), and not understood by domain experts
• Schema mapping tools can be used to construct E
• The composition must be produced for all mappings: several sets of mappings between each T and T’

[Figure: S is mapped to O by M, O evolves to O’ through E, and M’ = M ∘ E.]

“Everything should be made as simple as possible, but not simpler” - Albert Einstein

[Figure: Data Integration System with an RDF/S ontology as global schema, queried with SPARQL; mappings connect the ontology to the source DBs.]

• System independent
• More intuitive
• Only one mapping set
• Modular
• Mappings created only once
• Verifiable mappings

“Everything that exists is only change” - Heraclitus, 535 BCE

Definition (Change Operation). A change u from one ontology version O1 to another version O2 is defined as a tuple (δa, δd) where:

• δa corresponds to the triples that are added to O1 in order to get O2
• δd corresponds to the triples that are deleted from O1 in order to get O2

δa(u) ∪ δd(u) ≠ ∅, δa(u) ∩ δd(u) = ∅,

δa(u1) ∩ δa(u2) = ∅, δd(u1) ∩ δd(u2) = ∅

Definition (Application semantics of a high-level change). The application of u upon O, denoted by u(O), is defined as

u(O) = (O ∪ δa(u)) \ δd(u).
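The application semantics can be sketched with plain set operations: first add the triples of δa, then remove the triples of δd. The snippet below is a minimal illustration, not the authors' implementation; the (subject, predicate, object) tuple encoding and the names `Change` and `apply_change` are assumptions introduced here.

```python
from dataclasses import dataclass

# Illustrative encoding (an assumption): a triple is (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Change:
    """A change operation u = (δa, δd) between two ontology versions."""
    added: frozenset[Triple]    # δa(u)
    deleted: frozenset[Triple]  # δd(u)

    def __post_init__(self) -> None:
        # A change must touch at least one triple ...
        if not (self.added | self.deleted):
            raise ValueError("a change must add or delete at least one triple")
        # ... and may not add and delete the same triple.
        if self.added & self.deleted:
            raise ValueError("a triple cannot be both added and deleted")


def apply_change(ontology: frozenset[Triple], u: Change) -> frozenset[Triple]:
    """Return u(O): add the triples of δa(u), then remove those of δd(u)."""
    return (ontology | u.added) - u.deleted


o1 = frozenset({("Person", "name", "Literal"), ("Person", "ssn", "Literal")})
rename = Change(
    added=frozenset({("Person", "fullname", "Literal")}),
    deleted=frozenset({("Person", "name", "Literal")}),
)
o2 = apply_change(o1, rename)
```

Frozen sets keep each ontology version immutable, so applying a change always yields a new version and leaves the old one available for comparison.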

3.1 Example

u1 = (Delete, ∅, {has_gender(Person, Gender)})

u2 = (Move, {has_cont_point(Person, Cont_Point)}, {has_cont_point(Actor, Cont_Point)})

u3 = (Merge, {domain(Cont_Point, address)}, {domain(Cont_Point, street), domain(Cont_Point, city)})

u4 = (Rename, {domain(Person, fullname)}, {domain(Person, name)})
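As a rough illustration, the four changes can be written down as an evolution log and applied in sequence. The encoding of each triple as a (predicate, arg1, arg2) tuple, the contents of the starting ontology `o1`, and the helper names are assumptions made for this sketch only.

```python
# Each entry is (name, δa, δd), mirroring u1..u4.
# Triples are encoded as (predicate, arg1, arg2); this encoding is an assumption.
log = [
    ("Delete", set(), {("has_gender", "Person", "Gender")}),
    ("Move", {("has_cont_point", "Person", "Cont_Point")},
             {("has_cont_point", "Actor", "Cont_Point")}),
    ("Merge", {("domain", "Cont_Point", "address")},
              {("domain", "Cont_Point", "street"), ("domain", "Cont_Point", "city")}),
    ("Rename", {("domain", "Person", "fullname")}, {("domain", "Person", "name")}),
]


def apply_log(ontology, changes):
    """Apply every change in order: add δa, then remove δd."""
    current = set(ontology)
    for _name, added, deleted in changes:
        current = (current | added) - deleted
    return current


def log_is_well_formed(changes):
    """Check that each change touches a triple, never adds and deletes the same
    triple, and that no triple is added twice or deleted twice across the log."""
    seen_added, seen_deleted = set(), set()
    for _name, added, deleted in changes:
        if not (added | deleted) or (added & deleted):
            return False
        if (added & seen_added) or (deleted & seen_deleted):
            return False
        seen_added |= added
        seen_deleted |= deleted
    return True


# Hypothetical starting ontology: the deleted triples plus one untouched triple.
o1 = {
    ("has_gender", "Person", "Gender"),
    ("has_cont_point", "Actor", "Cont_Point"),
    ("domain", "Cont_Point", "street"),
    ("domain", "Cont_Point", "city"),
    ("domain", "Person", "name"),
    ("domain", "Person", "ssn"),
}
o2 = apply_log(o1, log)
```

Four high-level entries are enough to describe the whole evolution here, which is the conciseness argument made on this slide.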

[Figure: example ontology graph with nodes Person, Actor, Gender, Cont. Point and Literal, and properties name, fullname, ssn, has_gender, has_cont_point, street, city and address.]

• Intuitive
• Concise
• Can describe complex evolution

4. Data Integration Redefined

[Figure: RDF/S ontology, mappings and sources.]

Definition (Data Integration): A data integration system I is a quadruple (O, E, S, M) where

• O is a version of the ontology,
• E is the evolution log of the ontology (between the ontology versions under consideration),
• S is the set of the local sources,
• M is the mapping between S and one version Oi.
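One possible way to hold the quadruple in code is a small record type. The field types below (triples as tuples, sources as identifiers, mappings as view definitions keyed by source) are assumptions chosen for illustration; the definition itself does not prescribe them.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]                       # illustrative triple encoding
ChangeOp = tuple[frozenset[Triple], frozenset[Triple]]  # (δa, δd)


@dataclass
class DataIntegrationSystem:
    """The quadruple I = (O, E, S, M)."""
    ontology: frozenset[Triple]   # O: one version of the ontology
    evolution_log: list[ChangeOp]  # E: changes between the versions considered
    sources: list[str]            # S: identifiers of the local sources
    mappings: dict[str, str]      # M: source -> view over one version Oi


system = DataIntegrationSystem(
    ontology=frozenset({("Person", "fullname", "Literal")}),
    evolution_log=[(
        frozenset({("Person", "fullname", "Literal")}),
        frozenset({("Person", "name", "Literal")}),
    )],
    sources=["db1"],
    mappings={"db1": "persons(x, n) :- Person(x), name(x, n)"},
)
```

Note that the mapping in this toy instance still refers to `name`, the older vocabulary: the mappings stay tied to one version Oi while queries may be posed over another.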

4.1 Affecting change operations

Definition (Affecting change operation). A change operation u affects the query Q (with graph pattern G), denoted u ◊ Q, if δd(u) ≠ ∅ and ∃ triple pattern t ∈ G that can be unified with a triple of δd(u).

Definition (Valid Rewriting): Let q be a query expressed in O1 and us a sequence of change operations such that us(O1) = O2. q' is a valid rewriting of q over O2 using us if ∀ ui ∈ us such that ui ◊ q it holds that |δa(ui)| > 0, ∀ t ∈ δd(ui), t ◊ q, and q' is constructed as follows:

q' := (q - δd(ui)) ∪ δa(ui).
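The two definitions can be sketched as follows. This is a simplified reading, not the authors' algorithm: a query is treated as a set of triple patterns, a term starting with `?` is a variable that unifies with anything, and only the condition |δa(ui)| > 0 is checked before rewriting. Variable bindings are not carried over into the added triples. All function names are hypothetical.

```python
Triple = tuple[str, str, str]


def unifies(pattern: Triple, triple: Triple) -> bool:
    """A pattern term starting with '?' is a variable and matches any term."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def affects(deleted: set[Triple], query: set[Triple]) -> bool:
    """u affects Q: some pattern of Q unifies with a triple of δd(u).
    An empty δd(u) never affects a query."""
    return any(unifies(p, t) for p in query for t in deleted)


def valid_rewriting(query: set[Triple], changes):
    """Rewrite `query` through `changes`, a sequence of (δa, δd) pairs.
    Returns None when an affecting change adds nothing (no valid rewriting)."""
    q = set(query)
    for added, deleted in changes:
        if not affects(deleted, q):
            continue                      # unaffected: the query is kept as is
        if not added:                     # |δa(ui)| > 0 is required
            return None
        # Drop the patterns that unify with δd(ui), then add δa(ui).
        q = {p for p in q if not any(unifies(p, t) for t in deleted)} | set(added)
    return q


rename = ({("domain", "Person", "fullname")}, {("domain", "Person", "name")})
delete = (set(), {("has_gender", "Person", "Gender")})

q1 = {("domain", "Person", "name"), ("domain", "Person", "ssn")}
rewritten = valid_rewriting(q1, [delete, rename])
blocked = valid_rewriting({("has_gender", "Person", "?g")}, [delete, rename])
```

In this example the delete does not touch `q1`, so only the rename is applied; the second query asks for deleted information and has no valid rewriting.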

4.2 Query answering semantics

Definition (Equivalent query rewriting) (Lenzerini, 2002): Let O1, O2 be two ontology versions, E a set of dependencies over O1 and O2, and q2 an O2-query. An equivalent rewriting of q2 in the presence of E is an O1-query q1 such that q1 gives the same answers as q2 on any O1 instance that satisfies E.

Theorem: Valid rewritings are equivalent query rewritings and can be computed in O(N*T) time (N = number of change operations in us, T = number of triple patterns in G).

4.3 Results

Proposition (Uniqueness): Valid rewritings are unique.

Proposition (Inverse Query Rewriting): If q2 is a query over O2 and E is the evolution log from O1 to O2, we can produce an equivalent rewriting of q2 over O1 by computing the valid rewriting of q2 on the sequence of inverted changes of E.
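A minimal sketch of inverting a log, assuming that "inverted changes" means reversing the order of the log and swapping δa with δd in every change (the slide does not spell this out). The check at the end only confirms that the inverted log takes O2 back to O1 in this toy case; names and data are hypothetical.

```python
def invert_log(changes):
    """Reverse the log and swap δa with δd in every (δa, δd) pair."""
    return [(deleted, added) for added, deleted in reversed(changes)]


def apply_log(ontology, changes):
    """Apply every change in order: add δa, then remove δd."""
    current = set(ontology)
    for added, deleted in changes:
        current = (current | added) - deleted
    return current


log = [
    ({("domain", "Person", "fullname")}, {("domain", "Person", "name")}),
    ({("domain", "Cont_Point", "address")},
     {("domain", "Cont_Point", "street"), ("domain", "Cont_Point", "city")}),
]
o1 = {
    ("domain", "Person", "name"),
    ("domain", "Cont_Point", "street"),
    ("domain", "Cont_Point", "city"),
}
o2 = apply_log(o1, log)
back = apply_log(o2, invert_log(log))
```

The round trip works here because every deleted triple was present in `o1` and no added triple was; a query over O2 would then be rewritten with the inverted log in the same way as a query over O1 is rewritten with the original one.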

4.3. Example

[Figure: the initial query (patterns name, ssn, street and city with variables ?NAME, ?SSN, ?STREET, ?CITY) shown against the ontology versions, and the rewritten query (patterns fullname, ssn and address with variables ?NAME, ?SSN, ?Address).]

5. Problems & Solutions

[Figure: ontology fragment with Person, Actor, Cont. Point and Literal (properties fullname, ssn, has_cont_point, address) and the corresponding query with variables ?NAME, ?SSN, ?Address.]

Problem identification: a class is deleted, but a parent class exists that maintains all of its properties.

Problem resolution: use the parent class to find more general answers.
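The resolution step could look roughly like this: when a class used in the query no longer exists, substitute its parent class and accept the broader answer set. The subclass relation Actor ⊑ Person, the `rdf:type` pattern and the function name are assumptions made for illustration.

```python
def generalize(query, deleted_class, subclass_of):
    """Replace a deleted class in every triple pattern with its parent class.
    Returns None when no parent class is known."""
    parent = subclass_of.get(deleted_class)
    if parent is None:
        return None
    return {
        tuple(parent if term == deleted_class else term for term in pattern)
        for pattern in query
    }


query = {
    ("?x", "rdf:type", "Actor"),
    ("?x", "fullname", "?NAME"),
    ("?x", "ssn", "?SSN"),
}
general = generalize(query, "Actor", {"Actor": "Person"})
no_parent = generalize(query, "Actor", {})
```

The generalized query returns every Person with the requested properties, which is a superset of the answers the original query asked for.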

6.1. System Architecture

DlvHex Prototype (Polleres, 2007)

6.2 Source Rewriter

Traditionally, the problem was to find the maximally contained rewriting for one user query.
• Algorithms: MiniCon (Pottinger, 2001), Bucket, Inverse rules

Now we have several queries, one for each ontology version.
• Information might need to be combined among ontology versions

6.3 Source Rewriter

Reuse the best algorithm for finding maximally contained rewritings, but adapt it for multiple queries.

Properties of the algorithm
• Sound & complete
• Complexity O(q(n m M)^n), where q is the number of valid rewritings, n the number of subgoals in the biggest query, m the maximal number of subgoals in a view, and M the number of mappings

Algorithm 3.3: EDI-MiniCon(Q, M)
Input: Q, a set of datalog queries; M, the mappings
Output: the set of maximally contained rewritings MQ
Initialize MCD = {}, MQ = {}
For each qj in Q
    MCDj := FormMCDs(qj, M)
    Add MCDj to MCD
For each qj in Q
    mqj := CombineMCDs(MCD, qj)
    Add mqj to MQ
Return MQ
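The control flow of the algorithm can be sketched as two loops over the input queries. In this sketch `form_mcds` and `combine_mcds` are passed in as parameters, and the toy versions below are placeholders that do NOT implement MiniCon's actual MCD construction or combination; they only make the driver runnable.

```python
def edi_minicon(queries, mappings, form_mcds, combine_mcds):
    """Driver loop only: pool the MCDs of all queries, then combine per query."""
    mcds = []                                   # MCD = {}
    rewritings = []                             # MQ = {}
    for q in queries:                           # first loop: FormMCDs per query
        mcds.extend(form_mcds(q, mappings))
    for q in queries:                           # second loop: CombineMCDs on the pool
        rewritings.append(combine_mcds(mcds, q))
    return rewritings


# Toy stand-ins (hypothetical): a query is a set of predicate names, a mapping
# is (view_name, predicates), and an "MCD" is (view_name, covered_predicates).
def toy_form_mcds(query, mappings):
    return [(name, frozenset(preds & query)) for name, preds in mappings if preds & query]


def toy_combine_mcds(mcds, query):
    # Keep the views whose covered predicates all belong to this query.
    return sorted({name for name, covered in mcds if covered <= query})


views = [("v1", {"name", "ssn"}), ("v2", {"fullname"}), ("v3", {"address"})]
q_old = frozenset({"name", "ssn"})       # query over the older ontology version
q_new = frozenset({"fullname", "ssn"})   # its rewriting over the newer version
result = edi_minicon([q_old, q_new], views, toy_form_mcds, toy_combine_mcds)
```

Because the second loop sees the MCDs of every query, a rewriting for one ontology version can draw on descriptions formed for another version, which is the point of pooling them before combining.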

7. Preliminary Evaluation

• Ontology: CIDOC-CRM, 80 classes, 250 properties, 726 changes (01.02.02-01.06.05)
• Queries: 50 real user queries from 3D-COFORM

Adding and restructuring information does not affect valid rewritings; deleting information, however, does.

In general, assuming queries over v.4.2 from CIDOC, we would be able to rewrite 89% of them.

7.4 Problems: Fiction or Reality?

In general, assuming queries over v.4.2 from CIDOC, we would be able to rewrite 89% of them to v.3.2.1.

[Figure: timeline of ontology versions over A, B, C and D, linked by the changes "Del D, Add C" and "Add D, Del C".]

It makes no sense to search for C in previous versions.

Actually, we can provide access to 99% of the source information through valid rewritings.

7.2 Avg Running Time: 0,06 msec

8.1 Advantages of our approach

We don’t rewrite all the mappings but the query
• Exploits the locality of the query
• Mappings are produced once and can be validated by domain experts
• Greatly reduces human effort & time spent
• Our approach works independently of the family of mappings to the sources (GAV, LAV, GLAV, nested, etc.)
• The mappings to the sources are not affected at all, so they maintain their initial semantics
• Modularity & scalability: new mappings or ontology changes can be defined independently

We use high-level changes to model ontology evolution
• High-level changes can model complex ontology evolution
• Reduces the size of the evolution log
• Can be provided efficiently for two ontology versions

8.2 Advantages of our approach

Valid rewritings
• We define the answer semantics in such a setting
• Precise criteria exist for deciding when it is possible to compute valid rewritings, with small complexity

Even when no valid rewritings exist
• More general answers can still be returned
• We can guide the user in mapping redefinition

Computing source rewritings
• The increased computational complexity is linear in the number of input queries and remains scalable

8.3 Conclusions

Ontology evolution is a reality, and data integration systems should be aware of it.

We have shown how to answer queries over multiple ontology versions.

To the best of our knowledge, no system today is capable of query answering over multiple ontology versions.

Future work
• More extensive evaluation using Gene Ontology
• Semantic Infrastructure for plugIT
• Integrate our system to Protégé, MASTRO system
• Extend our approach to OWL variants
• Consider RDF sources and their evolution as well

References

1. Philip A. Bernstein, Todd J. Green, Sergey Melnik, Alan Nash: Implementing mapping composition. VLDB J. (VLDB) 17(2):333-353 (2008)

2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides: On Detecting High-Level Changes in RDF/S KBs. International Semantic Web Conference 2009:473-488

3. Maurizio Lenzerini: Data Integration: A Theoretical Perspective. PODS 2002:233-246

4. Rachel Pottinger, Alon Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. (VLDB) 10(2-3):182-198 (2001)

5. Axel Polleres: From SPARQL to rules (and back). WWW 2007:787-796

6. Yannis Tzitzikas, Dimitris Kotzinos: (Semantic web) evolution through change logs: Problems and solutions. Artificial Intelligence and Applications 2007:654-659

7. Yannis Velegrakis, Renée J. Miller, Lucian Popa, John Mylopoulos: ToMAS: A System for Adapting Mappings while Schemas Evolve. ICDE 2004:862

Questions?