16/09/2009 Giorgos Flouris 1
W3C Invited Talk
High-Level Change Detection in the Semantic Web
Institute of Computer Science, Foundation for Research and Technology – Hellas
Heraklion, Greece
Giorgos Flouris
Joint work with: Vicky Papavassiliou, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides
World Wide Web
WWW (and HTML) focus on human readability
Page presentation (fonts, colors, images, …)
Human understanding
Presentation versus semantical content
Content is not formally described (for a machine to understand)
WWW contains documents, not data
Problems with Current Web
Search and access become difficult
Software ignorant of the semantical content of a web page
Keyword search
High recall, low precision
Terminological issues
Synonyms (heart disease = cardiac disease)
Hyponyms/hypernyms (parliament members are politicians)
Queries on the semantical content cannot be made
Fetch articles that support B. Obama’s foreign policy
Fetch the home pages of all members of the Greek Parliament
Semantic Web
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee et al., 2001)
"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries" (http://www.w3.org/2001/sw/)
"[Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners" (http://www.w3.org/2001/sw/)
Semantic Web in Practice
Web of data, rather than documents
HTML for presentation
Semantical languages for semantical content
Readable and understandable by humans and machines
Semantic Web languages, protocols, etc.
Web page annotation (metadata descriptions etc.)
Publication of data on the Internet
Efficient communication and manipulation of data over the Internet
Different applications
Efficient searching
Sharing of data (e-science, e-government, remote learning, …)
Ontologies
Backbone of the Semantic Web
Ontologies allow the description of data
Annotation and metadata regarding web pages
Terminological relations (synonyms, hyponyms, …)
Communication and description of data, ideas, beliefs
An ontology is an explicit specification of a shared conceptualization of a domain (Gruber, 1993)
Precise, logical account of the intended meaning of terms, data structures etc.
Common (shared) interpretation of terms
Formal vocabulary for information exchange (for humans and machines)
Ontologies in Practice
Basic structures:
Classes (or concepts): collections of objects (e.g., Actor, Politician)
Properties (or roles): binary relationships between objects (e.g., started_on, member_of)
Instances (or individuals): objects (e.g., Giorgos, B. Obama)
Relations between them
Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …
The allowed relations and their semantics depend on the language
Different representation languages for ontologies
RDF, RDFS, DAML+OIL, OWL, OWL-DL, OWL-Lite, OWL2, DLs, …
Usually triple-based
Visualization, Triples, Serialization
Visualization
[Figure: class diagram with the classes Period, Event, Onset, Birth, Actor, Existing and Stuff, the properties started_on and participants, and the instance G_Birth linked to Giorgos by participants; edge types: instantiation, subsumption]
Triple Representation
Define classes
[Period type Class]
Define properties
[participants type Property]
[participants domain Onset]
[participants range Actor]
Instantiate/define individuals
[G_Birth type Birth]
[Giorgos type Actor]
[G_Birth participants Giorgos]
Define hierarchies
[Event subClass Period]
Serialization (RDF/XML)
<rdfs:Class rdf:ID="Period"/>
<rdf:Property rdf:ID="participants">
  <rdfs:domain rdf:resource="#Onset"/>
  <rdfs:range rdf:resource="#Actor"/>
</rdf:Property>
<Birth rdf:ID="G_Birth">
  <participants>
    <Actor rdf:ID="Giorgos"/>
  </participants>
</Birth>
<rdfs:Class rdf:ID="Event">
  <rdfs:subClassOf rdf:resource="#Period"/>
</rdfs:Class>
Ontology Dynamics
Ontologies change constantly
World changes (dynamic models)
View on the world changes (new knowledge, measurements, etc.)
Perspective and usage changes
Example: GO ontology changes daily
Gene Ontology: information about gene products (biology)
Must find a way to cope with changes
Ontology evolution (modify an ontology in response to a change)
Ontology versioning (keep track of versions and their relations)
…
We deal with a peripheral problem (change detection)
What is Change?
[Figure: Real World and Ontology, with an Ontology Evolution Algorithm and the example changes Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]
What is Change Detection?
[Figure: Real World and Ontology, with a Change Detection Algorithm and the example changes Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]
Keeping Track of Changes
Purpose of this work: change detection
A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way
It is important to store the changes between versions
Visualization of differences
Efficient storage and/or communication
Evolution history
Record changes as they happen (manual or automatic)
Error-prone, difficult (often impossible)
[Figure: versions V1, V2, V3, V4, V5, with the changes C1, C2, C3, C4 between consecutive versions]
Sample Evolution
[Figure: Version 1 (V1) evolves into Version 2 (V2). V1 contains the classes Period, Event, Onset, Birth, Actor, Existing and Stuff; V2 contains Persistent, Event, Onset, Birth, Actor and Stuff. Both versions show the properties started_on and participants and the link G_Birth participants Giorgos. Edge types: instantiation, subsumption.]
Analyzing the Evolution (Using Triples)
Triples in V1 (partial list)
[Event type Class]
[Period type Class]
[Event subclass Period]
[participants type Property]
[participants domain Onset]
[participants range Actor]
[Giorgos type Actor]
[Existing type Class]
[Stuff subclass Existing]
[started_on domain Existing]
[Onset subclass Event]
[Birth subclass Onset]
…
Triples in V2 (partial list)
[Event type Class]
[participants type Property]
[participants domain Event]
[participants range Actor]
[Giorgos type Actor]
[Persistent type Class]
[Stuff subclass Persistent]
[started_on domain Persistent]
[Onset subclass Event]
[Birth subclass Event]
…
Low-Level Delta
Triples in V2 but not in V1
(added triples)
[participants domain Event]
[Persistent type Class]
[Stuff subclass Persistent]
[started_on domain Persistent]
[Birth subclass Event]
Triples in V1 but not in V2
(deleted triples)
[Period type Class]
[Event subclass Period]
[participants domain Onset]
[Existing type Class]
[Stuff subclass Existing]
[started_on domain Existing]
[Birth subclass Onset]
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2)]
Low-Level Delta
Add([participants domain Event])
Add([Persistent type Class])
…
Del([Period type Class])
…
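The low-level delta on this slide amounts to two set differences over the triples of the two versions. A minimal Python sketch, assuming a version is held as a set of (subject, predicate, object) tuples; this encoding and the name low_level_delta are illustrative and not taken from the talk:

```python
# Assumed encoding (not from the talk): a version is a set of
# (subject, predicate, object) tuples. Only a few of the slide's triples are listed.
V1 = {
    ("Period", "type", "Class"),
    ("Event", "subclass", "Period"),
    ("participants", "domain", "Onset"),
    ("Birth", "subclass", "Onset"),
    ("Onset", "subclass", "Event"),
}
V2 = {
    ("participants", "domain", "Event"),
    ("Birth", "subclass", "Event"),
    ("Onset", "subclass", "Event"),
}


def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2 and triples only in v1."""
    return v2 - v1, v1 - v2


added, deleted = low_level_delta(V1, V2)
for triple in sorted(added):
    print("Add", triple)
for triple in sorted(deleted):
    print("Del", triple)
```

Triples present in both versions, such as [Onset subclass Event], appear in neither part of the delta.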
Analyzing the Evolution (Visually)
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2)]
High-Level Delta
Generalize_Domain(participants, Onset, Event)
Pull_Up_Class(Birth, Onset, Event)
Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Rename_Class(Existing, Persistent)
Comparing the Deltas
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2)]
Low-level delta | High-level delta
Del([participants domain Onset]), Add([participants domain Event]) | Generalize_Domain(participants, Onset, Event)
Del([Birth subclass Onset]), Add([Birth subclass Event]) | Pull_Up_Class(Birth, Onset, Event)
Del([Period type Class]), Del([Event subclass Period]) | Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Associations (Partitioning)
Low-Level Changes | Associated High-Level Changes
Del([participants domain Onset]), Add([participants domain Event]) | Generalize_Domain(participants, Onset, Event)
Del([Birth subclass Onset]), Add([Birth subclass Event]) | Pull_Up_Class(Birth, Onset, Event)
Del([Period type Class]), Del([Event subclass Period]) | Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Del([Existing type Class]), Del([Stuff subclass Existing]), Del([started_on domain Existing]), Add([Persistent type Class]), Add([Stuff subclass Persistent]), Add([started_on domain Persistent]) | Rename_Class(Existing, Persistent)
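The associations on this slide form a partition: each low-level change belongs to exactly one high-level change. A small Python sketch of that check, assuming low-level changes are encoded as (operation, subject, predicate, object) tuples and the associations as a dictionary; the helper is_partition and both encodings are hypothetical:

```python
def is_partition(low_level, associations):
    """True if the associated sets are pairwise disjoint and together equal low_level."""
    seen = set()
    for changes in associations.values():
        if seen & changes:  # some low-level change is claimed twice
            return False
        seen |= changes
    return seen == low_level


associations = {
    "Generalize_Domain(participants, Onset, Event)": {
        ("Del", "participants", "domain", "Onset"),
        ("Add", "participants", "domain", "Event"),
    },
    "Pull_Up_Class(Birth, Onset, Event)": {
        ("Del", "Birth", "subclass", "Onset"),
        ("Add", "Birth", "subclass", "Event"),
    },
    "Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)": {
        ("Del", "Period", "type", "Class"),
        ("Del", "Event", "subclass", "Period"),
    },
    "Rename_Class(Existing, Persistent)": {
        ("Del", "Existing", "type", "Class"),
        ("Del", "Stuff", "subclass", "Existing"),
        ("Del", "started_on", "domain", "Existing"),
        ("Add", "Persistent", "type", "Class"),
        ("Add", "Stuff", "subclass", "Persistent"),
        ("Add", "started_on", "domain", "Persistent"),
    },
}

# Shortcut for the example: in practice low_level comes from the triple diff.
low_level = set().union(*associations.values())
unexplained = ("Add", "Giorgos", "type", "Politician")  # made-up extra change

print(is_partition(low_level, associations))
print(is_partition(low_level | {unexplained}, associations))
```

The second call fails the check because the extra change is not associated with any high-level change.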
Low-Level Versus High-Level Deltas
Purpose:
A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way
Low-level deltas
Easier to get
High-level deltas
More concise (e.g., Rename_Class)
More intuitive (e.g., Pull_Up_Class)
Carry additional information (e.g., Generalize_Domain)
Objective: detection of high-level deltas
Language of Changes and Algorithm
Deltas based on some language of changes
A set of formal definitions that describe the changes that can be understood and detected
Can be high-level or low-level
Must be coupled with a corresponding detection algorithm
Low-level languages easy to define (Add(t), Del(t))
High-level languages more complicated
Several proposals; no standard
Challenges for high-level languages
Must be deterministic (exactly one high-level delta)
Must be fine-grained enough to capture subtle changes
Must be coarse-grained enough to be concise
Proposed Language L
The formal definition of a change consists of:
Changes required in the low-level delta (added/deleted triples)
Conditions that should hold in V1 and/or V2
Example: Generalize_Domain(P, X, Y)
Required low-level changes: Del([P domain X]), Add([P domain Y])
Conditions:
P is an existing property in both V1 and V2
X and Y are existing classes in both V1 and V2
X is a subclass of Y in both V1 and V2
Generalize_Domain(participants, Onset, Event): detectable
Similarly for the other changes in L (about 120 in total)
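The definition of Generalize_Domain above can be checked directly against two versions. A minimal Python sketch, assuming versions are sets of (subject, predicate, object) tuples and that "subclass of" may hold transitively; the function names are illustrative and this is not the detection algorithm of the talk:

```python
def is_subclass(version, x, y):
    """True if y is reachable from x by following 'subclass' triples."""
    stack, seen = [x], {x}
    while stack:
        current = stack.pop()
        for s, p, o in version:
            if s == current and p == "subclass" and o not in seen:
                if o == y:
                    return True
                seen.add(o)
                stack.append(o)
    return False


def detect_generalize_domain(v1, v2):
    """Return every Generalize_Domain(P, X, Y) whose triples and conditions are met."""
    added, deleted = v2 - v1, v1 - v2

    def in_both(triple):
        return triple in v1 and triple in v2

    found = []
    for p, pred, x in deleted:           # needs Del([P domain X])
        if pred != "domain":
            continue
        for p2, pred2, y in added:       # needs Add([P domain Y])
            if pred2 != "domain" or p2 != p:
                continue
            if (in_both((p, "type", "Property"))
                    and in_both((x, "type", "Class"))
                    and in_both((y, "type", "Class"))
                    and is_subclass(v1, x, y)
                    and is_subclass(v2, x, y)):
                found.append(("Generalize_Domain", p, x, y))
    return found


common = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
V1 = common | {("participants", "domain", "Onset")}
V2 = common | {("participants", "domain", "Event")}
print(detect_generalize_domain(V1, V2))
```

Swapping the two versions yields no match, because Event is not a subclass of Onset, so the subclass condition fails.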
Results on L: Granularity
Granularity problem: solved by defining levels of changes
Basic Changes: fine-grained, roughly correspond to low-level
Composite Changes: coarse-grained, group several basic changes together
Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split etc.
Problems with determinism
One evolution could correspond to different sets of basic/composite changes
Priorities in detection
Heuristic > Composite > Basic
Results on L: Types of Changes
Changes
Low-Level: Add, Del
High-Level: Basic, Composite, Heuristic
Basic: Delete_Subclass, Delete_Domain
Composite: Pull_Up_Class, Change_Domain
Heuristic: Rename_Class, Split_Class
Results on L: Determinism
Each low-level change is associated with exactly one detectable high-level change
Full partitioning of low-level changes into high-level ones
Each pair of versions (V1, V2) is associated with:
Exactly one low-level delta
Exactly one high-level delta
Determinism is necessary
More than one would lead to ambiguities
Fewer than one would make some inputs (V1, V2) irresolvable
Results on L: Application
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2). Detect C from the two versions; Apply C leads from V1 to V2, and Apply C⁻¹ (the inverse of C) leads from V2 back to V1.]
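Applying a delta and its inverse can be illustrated at the low level, where a delta C is a pair (added, deleted) of triple sets. A minimal Python sketch under that assumption; apply_delta and invert are hypothetical names, and the talk's result concerns high-level deltas, which this sketch does not model:

```python
def apply_delta(version, delta):
    """Apply C = (added, deleted) to a version given as a set of triples."""
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    """The inverse delta swaps the roles of added and deleted triples."""
    added, deleted = delta
    return deleted, added


V1 = {
    ("Event", "type", "Class"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
}
V2 = {
    ("Event", "type", "Class"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
}

C = (V2 - V1, V1 - V2)                         # detect C
forward = apply_delta(V1, C)                   # apply C
backward = apply_delta(V2, invert(C))          # apply the inverse of C
print(forward == V2, backward == V1)
```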
Results on L: Deltas Keep Version History
Can reproduce all versions as long as you keep (any) one version and the deltas
Deltas are more concise than the versions themselves
Storage and communication efficiency
[Figure: versions V1, V2, V3, V4, V5, with the deltas C1, C2, C3, C4 between consecutive versions]
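The claim that one stored version plus the deltas reproduces every version can be sketched as a walk along the chain, forward with a delta and backward with its inverse. The Python below is an illustration only: reconstruct is a hypothetical helper, deltas are low-level (added, deleted) pairs, and short strings stand in for triples:

```python
def reconstruct(kept_index, kept_version, deltas, target_index):
    """Rebuild version target_index from one stored version and the deltas.

    deltas[i] = (added, deleted) leads from version i to version i + 1.
    """
    version = set(kept_version)
    i = kept_index
    while i < target_index:          # walk forward: apply the delta
        added, deleted = deltas[i]
        version = (version - deleted) | added
        i += 1
    while i > target_index:          # walk backward: apply the inverse delta
        added, deleted = deltas[i - 1]
        version = (version - added) | deleted
        i -= 1
    return version


# Strings stand in for triples to keep the example short.
versions = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
deltas = [(new - old, old - new) for old, new in zip(versions, versions[1:])]

# Keep only the middle version and rebuild all three.
rebuilt = [reconstruct(1, versions[1], deltas, k) for k in range(3)]
print(rebuilt == versions)
```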
Detection Algorithm for L (1/2)
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2)]
Triples in V1 (partial list)
[Period type Class]
[Event subclass Period]
[participants type Property]
[participants domain Onset]
[participants range Actor]
[Existing type Class]
[Stuff subclass Existing]
[started_on domain Existing]
[Onset subclass Event]
…
Triples in V2 (partial list)
[Event type Class]
[participants type Property]
[participants domain Event]
[participants range Actor]
[Giorgos type Actor]
[Persistent type Class]
[Stuff subclass Persistent]
[started_on domain Persistent]
[Onset subclass Event]
[Birth subclass Event]
…
Calculate Low-Level Delta
Triples in Delta (step 1: low-level)
Del([participants domain Onset])
Del([Birth subclass Onset])
Del([Event subclass Period])
Del([Existing type Class])
Del([Stuff subclass Existing])
Del([started_on domain Existing])
Del([Period type Class])
Add([Birth subclass Event])
Add([participants domain Event])
Add([Persistent type Class])
Add([Stuff subclass Persistent])
Add([started_on domain Persistent])
Run Matcher (External)
List of Mappings
<V1:Existing> is matched with <V2:Persistent>
Compute Heuristic Changes
Heuristic Changes
Rename_Class(Existing, Persistent)
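Once the matcher reports that Existing corresponds to Persistent, the rename accounts for every deleted triple that reappears among the added triples with only that name changed. The slides do not show how the algorithm validates a rename, so the Python below is a sketch under that assumption; consume_rename is a hypothetical helper:

```python
def consume_rename(added, deleted, old, new):
    """Drop Add/Del pairs that differ only by old -> new; return what remains."""
    def renamed(triple):
        return tuple(new if term == old else term for term in triple)

    explained_del = {t for t in deleted if old in t and renamed(t) in added}
    explained_add = {renamed(t) for t in explained_del}
    return added - explained_add, deleted - explained_del


deleted = {
    ("participants", "domain", "Onset"),
    ("Birth", "subclass", "Onset"),
    ("Event", "subclass", "Period"),
    ("Period", "type", "Class"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
}
added = {
    ("Birth", "subclass", "Event"),
    ("participants", "domain", "Event"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
}

rest_added, rest_deleted = consume_rename(added, deleted, "Existing", "Persistent")
print(len(rest_added), len(rest_deleted))
```

The six changes that mention Existing or Persistent are absorbed by Rename_Class(Existing, Persistent); the remaining two additions and four deletions are left for the basic and composite changes.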
Detection Algorithm for L (2/2)
[Figure: class diagrams of Version 1 (V1) and Version 2 (V2)]
Triples in Delta (step 2: heuristic)
Del([participants domain Onset])
Del([Birth subclass Onset])
Del([Event subclass Period])
Del([Period type Class])
Add([Birth subclass Event])
Add([participants domain Event])
Rename_Class(Existing, Persistent)
Find Associated Change
Del([participants domain Onset])
Generalize_Domain(participants, Onset, Event): DETECTABLE
Triples in Delta (step 3: basic and composite)
Del([Birth subclass Onset])
Del([Event subclass Period])
Del([Period type Class])
Add([Birth subclass Event])
Rename_Class(Existing, Persistent)
Generalize_Domain(participants, Onset, Event)
Triples in Delta (step 4: result)
Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Pull_Up_Class(Birth, Onset, Event)
Rename_Class(Existing, Persistent)
Generalize_Domain(participants, Onset, Event)
Find Associated Change
Del([participants domain Onset])
Required in Low-Level Delta | Potentially Associated High-Level Change
Add([participants domain X]) | Generalize_Domain(participants, Onset, X)
Add([participants domain X]) | Specialize_Domain(participants, Onset, X)
--- | Delete_Domain(participants, Onset)
Del([participants type Property]), Del([participants range X]) | Delete_Property(participants, Onset, X)
… | …
Operations
Pull_Up_Class(*,*,*) [not in the table]
Delete_Property(participants,*,*) [necessary triples not found]
Specialize_Domain(participants, Onset, Event) [conditions not true]
Generalize_Domain(participants, Onset, Birth) [wrong parameter (triples not found)]
Generalize_Domain(participants, Onset, Event) [DETECTABLE (ASSOCIATED)]
Delete_Domain(participants, Onset) [composite changes have priority]
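The lookup on this slide can be read as: try the candidate changes for a low-level change in priority order and return the first one whose required triples are in the delta and whose conditions hold. A Python sketch of that reading; the candidate records, the level labels and find_associated are assumptions made for illustration, with Generalize_Domain treated as higher priority than Delete_Domain as the slide's last note suggests:

```python
PRIORITY = {"composite": 0, "basic": 1}  # lower value is tried first


def find_associated(delta, candidates):
    """Return the first candidate, in priority order, whose required
    low-level changes are all in delta and whose condition holds."""
    for cand in sorted(candidates, key=lambda c: PRIORITY[c["level"]]):
        if cand["required"] <= delta and cand["condition"]():
            return cand["name"]
    return None


subclass_of = {("Onset", "Event")}  # Onset is a subclass of Event
delta = {
    ("Del", "participants", "domain", "Onset"),
    ("Add", "participants", "domain", "Event"),
}
# Candidates potentially associated with Del([participants domain Onset]).
candidates = [
    {"name": "Delete_Domain(participants, Onset)", "level": "basic",
     "required": set(), "condition": lambda: True},
    {"name": "Specialize_Domain(participants, Onset, Event)", "level": "composite",
     "required": {("Add", "participants", "domain", "Event")},
     "condition": lambda: ("Event", "Onset") in subclass_of},
    {"name": "Generalize_Domain(participants, Onset, Event)", "level": "composite",
     "required": {("Add", "participants", "domain", "Event")},
     "condition": lambda: ("Onset", "Event") in subclass_of},
]

chosen = find_associated(delta, candidates)
fallback = find_associated({("Del", "participants", "domain", "Onset")}, candidates)
print(chosen)
print(fallback)
```

With the full delta the sketch selects Generalize_Domain; when the matching Add is absent, only Delete_Domain remains detectable.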
Implementation
Algorithm implemented for experiments and evaluation
Uses the APIs of SWKM
Platform for efficient and scalable management of dynamic RDF/S ontologies and data
Query, update, low-level delta, high-level delta, versioning, …
Performance
Complexity: O(max{N_1, N_2, N^2})
Linear average-case
Highly dependent on the detected changes (type, number)
Evaluation: Usefulness and Intuitiveness
L is well-defined (changes used in practice)
GO: add/delete class, comments changing
CIDOC: add/delete/rename properties
Results confirmed by literature/editor notes
Evaluation: Conciseness
Basic ≈ Low-Level
Basic+Composite+Heuristic << Low-Level
Manual Change Recording (CIDOC)
Editor notes
Delete class: 3
Add property: 54
Delete property: 16
Rename property: 24
Redirect properties (domain): 14
Redirect properties (range): 14
Detection result
Delete class: 6
Add property: 58
Delete property: 18
Rename property: 30
Generalize_Domain: 13
Specialize_Domain: 1
Generalize_Range: 14
Specialize_Range: 1
Change_Range: 1
Conclusion
High-level change detection
A posteriori detection (input: V1, V2)
No further information needed (e.g., logs, change recording etc)
Formal semantics
Formal results (reversibility, determinism, …)
Non-heuristic based (except for heuristic changes)
No need for precision and recall evaluation
Efficient, sound and complete detection algorithm
Nice informal properties
Conciseness, intuitiveness
Future work: more operations, evaluation on other datasets, evaluation with real users
References
1. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. On Detecting High-Level Changes in RDF/S KBs. In Proceedings of the 8th International Semantic Web Conference (ISWC-09), to appear, 2009
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. Formalizing High-Level Change Detection for RDF/S KBs. Technical Report TR-398, FORTH-ICS, 2009