LDK R Logics for Data and Knowledge Representation Application of (Ground) ClassL.


Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching


Ontology

Ontologies are explicit specifications of conceptualizations.

They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts.

[Figure: example ontology graph over the concepts Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head and Body, with edges labeled Is-a, Part-of and Eats.]

Concept

The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals.

This set is called the concept extension or the concept interpretation.

Concepts are often lexically defined, i.e. they have natural language names which are used to describe the concept extensions.


Relation

The notion of relation is understood as a set of ordered pairs, with the two items of each pair taken from the source concept and the target concept respectively.

The backbone structure of the ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’ whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘sibling-of’, ‘ant’, etc.


Ontology as a graph

Mathematically, following the definition of a graph, an ontology is an ordered pair

O=<V, E>

in which V is the set of vertices describing the concepts and E is the set of edges describing relations.
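As a minimal sketch of the pair O = <V, E>, the ontology can be stored as a set of vertices plus a set of labeled edges. The class name `Ontology`, its methods, and the sample edges (in particular the "eats" edge) are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = <V, E>: V holds concept names, E holds labeled edges."""
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target)

    def add_edge(self, source, relation, target):
        # Adding an edge also registers both endpoints as vertices.
        self.vertices.update((source, target))
        self.edges.add((source, relation, target))

    def targets(self, source, relation):
        """Concepts reached from `source` by one edge labeled `relation`."""
        return {t for (s, r, t) in self.edges if s == source and r == relation}


onto = Ontology()
onto.add_edge("Bird", "is-a", "Animal")
onto.add_edge("Mammal", "is-a", "Animal")
onto.add_edge("Chicken", "is-a", "Bird")
onto.add_edge("Tiger", "eats", "Goat")  # illustrative auxiliary edge

print(sorted(onto.vertices))
print(onto.targets("Chicken", "is-a"))
```

Keeping the relation name inside each edge lets the backbone relations (is-a, part-of, instance-of) and the auxiliary ones share one edge set.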


Tree-like Ontologies

Take the ontology in the previous slide, remove those auxiliary relations…

… we get a tree-like ontology consisting of the backbone structure with ‘is-a’, ‘part-of’ and even ‘instance-of’ relations.

They are informal Lightweight Ontologies.

[Figure: the example ontology graph from the previous slide (Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head, Body, with Is-a, Part-of and Eats edges), from which the auxiliary relations are removed.]

Descriptive vs. Classification Ontologies

Some ontologies are used to describe a piece of the world, such as the Gene ontology, Industry ontology, etc. The purpose is to make a clear description of the world. This is usually the first idea that comes to mind when people talk about ontologies.

Some other ontologies are used to classify things, such as books, documents, web pages, etc. The aim is to provide a domain specific category to organize individuals accordingly. Such ontologies usually take the form of classifications with or without explicit meaningful links.

We will see the difference further, in the transformation into formal Lightweight Ontologies.


Why ‘Lightweight’ Ontologies?

Two observations:

1. The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., categories to classify resources.

2. Ontologies with arbitrary relations do exist, but no intuitive reasoning techniques support such ontologies in general.

… so we need ‘lightweight’ ontologies.


Lightweight Ontologies

A (formal) lightweight ontology is a triple

O = <N, E, C>, where

N is a finite set of nodes,

E is a set of edges on N, such that <N, E> is a rooted tree, and

C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.


From Tree-like Ontologies to LOs

[Figure: the tree-like ontology (Is-a and Part-of edges over Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head, Body) and the corresponding lightweight ontology, in which the Is-a edges are replaced by ⊑ while Head and Body keep their Part-of edges.]

In Classification Semantics…

[Figure: the same transformation under classification semantics; both the Is-a and the Part-of edges are replaced by ⊑.]

From Tree-like Ontologies to LOs (cont.)

For a descriptive tree-like ontology, the backbone taxonomy of ‘is-a’ intuitively coincides with the ‘subsumption’ relation in LOs. But ‘part-of’ relations have to be modeled as a new kind of binary relation in order to preserve the semantics.

For a classification ontology, the semantics behind the labels of the nodes is the extensional interpretation, i.e. the documents (books, websites, etc.) that should be classified under the nodes. Therefore, the ‘part-of’ relation also follows the intuition of ‘subsumption’ and can be transformed directly into ‘⊑’ in the target LOs.


Populated (Lightweight) Ontologies

In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into categories or classes.

A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.

Thus a populated (lightweight) ontology contains a new type of link: instance-of.


Example of a Populated Ontology

[Figure: the lightweight ontology (Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head, Body, linked by ⊑) populated with documents such as ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’ and ‘www.protectTiger.org’, attached to nodes by Instance-of links.]

Lightweight Ontologies in ClassL: TBox

Subsumption terminologies:

‘… C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.’

1. Bird ⊑ Animal

2. Mammal ⊑ Animal

3. Chicken ⊑ Bird

4. Cat ⊑ Predator

5. …

Observation: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa.

Populated LOs in ClassL: TBox + ABox

Subsumption terminologies: ‘… cj ⊑ ci.’

‘Instance-of’ links: concept assertions!

1. …

2. …

3. …

4. …

5. Chicken(ChickenSoup)

6. Cat(TomAndJerry)

7. …
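A minimal sketch of how a populated LO could be queried, assuming the TBox is stored as a child-to-parent map (possible because <N, E> is a tree) and the ABox as a set of concept assertions. Only the axioms and assertions listed on the slides are included; the function names are hypothetical.

```python
# TBox: each concept maps to its direct subsumer (child ⊑ parent).
TBOX = {"Bird": "Animal", "Mammal": "Animal", "Chicken": "Bird", "Cat": "Predator"}

# ABox: concept assertions such as Chicken(ChickenSoup).
ABOX = {("Chicken", "ChickenSoup"), ("Cat", "TomAndJerry")}


def subsumers(concept, tbox):
    """Return the concept followed by every concept that subsumes it."""
    chain = [concept]
    while concept in tbox:
        concept = tbox[concept]
        chain.append(concept)
    return chain


def instances_of(concept, tbox, abox):
    """Individuals asserted under `concept` or under any concept it subsumes."""
    return {ind for (c, ind) in abox if concept in subsumers(c, tbox)}


print(instances_of("Animal", TBOX, ABOX))
print(instances_of("Predator", TBOX, ABOX))
```

Because the elided axioms ("…") are not filled in, `Predator` has no parent here, so `TomAndJerry` is not returned for `Mammal` or `Animal` in this sketch.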


Classifications…

Classification hierarchies are easy to use... for humans.

Classification hierarchies are pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address books, etc.).

Classification hierarchies are widely used in industry (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).

Classification hierarchies have been studied for a very long time (e.g., the Dewey Decimal Classification system (DDC), the Library of Congress Classification system (LCC), etc.).


Classification Example: Yahoo! Directory


Classification Example: Email Folders


Classification Example: E-Commerce Category


Classifications… more

Classification hierarchies are lightweight (no roles, trees or simple DAGs, …).

Classification hierarchies are a kind of concept hierarchy.

Labels are natural language sentences; useful but hard to deal with in an automated way.

Links are of the kind “child-of” (e.g. “economy child-of Europe”), where in an ontology you would have {instance-of} links, roles, or {is-a} links.

No clear semantics for either the labels at nodes or the links.

How to use such informal information?

Recall: Lightweight Ontologies

A (formal) lightweight ontology is a triple

O = <N, E, C>, where

N is a finite set of nodes,

E is a set of edges on N, such that <N, E> is a rooted tree, and

C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.

(Annotations on the slide: “A classification already has.” and “To be fixed.”)

What do LOs Bring?

We know that a lightweight ontology is a formal conceptualization of a domain in terms of concepts and {is-a, instance-of} relationships.

Lightweight ontologies (LOs) add a formal semantics and {instance-of} relationships to classification hierarchies.

In short: LOs make classifications formal!


LOs and Ground Class Logic

Ground ClassL provides a formal language (syntax + semantics) to model lightweight ontologies, where:

concepts are modeled by propositions and formulas;

the ‘is-a’ relationship is modeled by subsumption (⊑);

and the ‘is-instance-of’ relationship is modeled by individual assertion (i.e., wffs like P(a)).


Label Semantics

Natural language words are often ambiguous.

E.g. Java (an island, a beverage, an OO programming language)

When used with other words in a label, improper senses can be pruned.

E.g., “Java Language”: only the 3rd sense of Java is preserved.

[Figure: fragment of a classification shown by level (0 to 4): Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8).]

From NL Labels to Labels in Class Logic

There are several approaches to rewriting a natural language label into a ClassL proposition. Following (Giunchiglia et al., 2007), we may distinguish four steps:

1. Tokenization (get distinct words): Italian Pictures → ‘Italian’, ‘Pictures’

2. Word stemming (get to a basic form): Pictures → picture

3. Rewrite each word into its proposition: picture → picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2

4. Prune inconsistent senses: picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2 → pictureN1
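The four steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: `SENSES` is a hypothetical stand-in for a dictionary such as WordNet, the stemmer merely strips a plural "s", and pruning is delegated to a caller-supplied predicate. Senses of one word are joined by ⊔ and words by ⊓, as in the “Java Beans” example.

```python
# Hypothetical sense inventory standing in for a dictionary such as WordNet.
SENSES = {
    "italian": ["italian-adj-1", "italian-noun-1"],
    "picture": ["picture-noun-1", "picture-noun-2", "picture-verb-2"],
}


def tokenize(label):
    """Step 1: split the label into distinct words."""
    return label.split()


def stem(word):
    """Step 2 (toy version): lower-case and strip a plural 's' if that yields a known word."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and w[:-1] in SENSES else w


def senses(word):
    """Step 3: all senses of a word; unknown words act as their own atom."""
    return list(SENSES.get(word, [word]))


def prune(sense_list, keep):
    """Step 4: drop senses rejected by `keep`, but never drop all of them."""
    kept = [s for s in sense_list if keep(s)]
    return kept or sense_list


def label_to_proposition(label, keep=lambda s: True):
    parts = []
    for token in tokenize(label):
        kept = prune(senses(stem(token)), keep)
        parts.append("(" + " ⊔ ".join(kept) + ")" if len(kept) > 1 else kept[0])
    return " ⊓ ".join(parts)


print(label_to_proposition("Italian Pictures", keep=lambda s: "-verb-" not in s))
```

A real pruning step would compare senses across the words of the label; the predicate here only marks where that logic would plug in.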


Class Logic Label Examples

E.g. 1: “Java” becomes the proposition

Java#1 ⊔ Java#2 ⊔ Java#3

where Java#i is a propositional variable representing the ith-sense of the word “Java” according to a dictionary (e.g., WordNet).

E.g.2: “Java Beans” becomes:

(Java#1 ⊔ Java#2 ⊔ Java#3)⊓(Bean#1 ⊔ Bean#2)


Advantages of Propositions

NL labels are ambiguous, propositions are NOT!

Extensional semantics of propositions naturally maps nodes to real world objects.

Labels as propositions allow us to deal with the standard problems in classification (e.g., document classification, query-answering, and matching) by means of ClassL’s reasoning, mainly the SAT problem.


Formalizing the Meaning of Links (1)

Child nodes in a classification are always considered in the context of their parent nodes.

Child nodes therefore specialize the meaning of the parent nodes.

This is the contextuality property of classifications.


Formalizing the Meaning of Links (2)

General intersection relationship (a): can be used to represent facets. The meaning of node 2 is C = A ⊓ B.

Subsumption relationship (b): child nodes are specific cases of the parent nodes. The meaning of node 2 is B.

[Figure: node 1 (label A) with child node 2 (label B); in case (a) the sets A and B overlap in C, in case (b) B lies inside A.]

General Intersection Example

[Figure: general intersection example with labels l1 = “Subjects”, l3 = “Computers and Internet” and l5 = “Programming”. “Computers and Internet” covers hardware, software, networking, …; “Programming” covers scheduling, planning and computer programming.]

Concept at a Node

Parental contextuality is formalized in ClassL by the notion of “concept at a node.”

A concept Cr at the root node r is the class proposition (label) used to denote the node.

A concept Ci at a node ni is the conjunction of a proposition Pi (the label of ni) and the concept Cj at the node nj that is the parent of ni (if it has a parent).

In ClassL: Pi ⊓ Cj.


Concept at a Node

A concept at a node ni can be computed as the conjunction of all the labels from the root of the classification hierarchy to ni.

Concepts at nodes capture the classification semantics by using the meaning of labels (propositions defined by using WordNet and a linguistic analysis) and the nodes' position.


Concept at a Node: Example

[Figure: classification tree with root Europe (1), its children Pictures (2) and Wine and Cheese (3), and Italy (4) and Austria (5) under Pictures.]

In ClassL: C4 = Ceurope ⊓ Cpictures ⊓ Citaly
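The computation of C4 can be sketched by walking from the node up to the root and collecting labels. The parent map below is an assumption reconstructed from the example (Italy and Austria under Pictures, Pictures and Wine and Cheese under Europe), and each label is treated as a single atom for brevity.

```python
# Assumed tree for the example: node -> parent (None marks the root).
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
LABEL = {1: "Europe", 2: "Pictures", 3: "WineAndCheese", 4: "Italy", 5: "Austria"}


def concept_at_node(node):
    """Labels on the path from the root down to `node`, read as a conjunction."""
    labels = []
    while node is not None:
        labels.append(LABEL[node])
        node = PARENT[node]
    return list(reversed(labels))


def show(node):
    return " ⊓ ".join("C" + label for label in concept_at_node(node))


print(show(4))
```

In a full implementation each label would first be expanded into its own ClassL proposition before the conjunction is formed.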


What have we done?

We calculated the concepts at labels and the concepts at nodes.

In which format?

ClassL

Java#1 ⊔ Java#2 ⊔ Java#3

Ceurope ⊓ Cpictures ⊓ Citaly

We have built the ClassL formulas for each node!


Distinctions Among Ontology, LO and CLS

[Figure: four related structures over nodes A, B, C, D, E. Ontology: edges Is-a, Instance-of, Part-of, Locate-in and Likes. Tree-like ontology (backbone taxonomy): only the Is-a, Instance-of and Part-of edges. Classification (the most common format): Child-of edges. Formal lightweight ontology, obtained by formalization under classification semantics: concepts at nodes A, A⊓B, A⊓C, A⊓B⊓D and A⊓B⊓E. The starting points are descriptive ontologies and classification ontologies.]


Rational LOs

LOs may not be perfect…

Reconstruct an LO based on the “most specific subsumer” relation.

Nodes get the parents which most specifically describe them, while still being more general.

The new structure is called a Rational LO (RLO).

NOTE: classification semantics do not change.

[Figure: an LO over EU, Schengen States, Italy, France, Germany and Pictures, shown before and after its reconstruction as an RLO.]

Optimization of Classifications

Problem: to find ‘the most specific subsumer’ of a given node.

Suppose we have, for all nodes in the LO, the concepts at labels in ClassL, i.e. wffs after NLP.

Then we can refer to the ‘subsumption’ reasoning service, which finds the minimal element with respect to the ordering ‘⊑’.

E.g.: Italy ⊑ EU, SchengenState ⊑ EU, Italy ⊑ SchengenState…
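A minimal sketch of the search for the most specific subsumer, assuming the subsumption reasoning service is replaced by a fixed table of already derived (and transitively closed) subsumptions. The table holds only the three facts from the example; the function names are hypothetical.

```python
# Stand-in for the reasoner: known facts "specific ⊑ general".
SUBSUMED_BY = {
    ("Italy", "EU"),
    ("SchengenState", "EU"),
    ("Italy", "SchengenState"),
}


def subsumes(general, specific):
    return (specific, general) in SUBSUMED_BY


def most_specific_subsumers(node, nodes):
    """Subsumers of `node` that do not subsume any other subsumer of `node`."""
    candidates = [n for n in nodes if n != node and subsumes(n, node)]
    return [c for c in candidates
            if not any(o != c and subsumes(c, o) for o in candidates)]


NODES = ["EU", "SchengenState", "Italy"]
for n in NODES:
    print(n, "->", most_specific_subsumers(n, NODES))
```

Under these facts Italy is re-parented under SchengenState rather than directly under EU, which is the kind of restructuring an RLO performs.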


Document Classification

Each document d in a classification is assigned a proposition Cd in ClassL.

Cd is called the document concept.

Cd is built from d in two steps:

1. keywords are retrieved from d by using standard text mining techniques;

2. keywords are converted into propositions by using the methodology discussed above.


“Get specific” Rule

For any given document d and its concept Cd, we classify d in each node ni such that:

1. ⊨ Cd ⊑ Ci (i.e. the concept at node ni is more general than Cd);

2. and there is no node nj (j ≠ i) whose concept Cj is more specific than Ci and more general than Cd, i.e. such that ⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj.
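The rule can be sketched for the simple case where every concept is a conjunction of atoms, so that ⊨ C ⊑ D holds exactly when the atoms of D are a subset of the atoms of C. That restriction, the abstract node concepts (A, A⊓B, …) and the function names are assumptions of this sketch; a full implementation would call a ClassL reasoner instead. "More specific" is read as strictly more specific.

```python
# Concepts at nodes as sets of atoms, each set read as a conjunction.
NODE_CONCEPTS = {
    1: frozenset({"A"}),
    2: frozenset({"A", "B"}),
    3: frozenset({"A", "C"}),
    4: frozenset({"A", "B", "D"}),
    5: frozenset({"A", "B", "E"}),
}


def subsumed_by(specific, general):
    """⊨ specific ⊑ general, valid for conjunctions of atoms only."""
    return general <= specific


def get_specific(doc_concept, node_concepts):
    # Condition 1: nodes whose concept is more general than the document concept.
    candidates = {n for n, c in node_concepts.items() if subsumed_by(doc_concept, c)}
    result = set()
    for i in candidates:
        ci = node_concepts[i]
        # Condition 2: no other candidate is strictly more specific than node i.
        blocked = any(
            j != i and node_concepts[j] != ci and subsumed_by(node_concepts[j], ci)
            for j in candidates
        )
        if not blocked:
            result.add(i)
    return result


print(get_specific(frozenset({"A", "B", "D", "X"}), NODE_CONCEPTS))
print(get_specific(frozenset({"A", "B", "C"}), NODE_CONCEPTS))
```

The second call shows that a document can land in more than one node when no single node concept covers all of its atoms.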

Both conditions are checked by the subsumption reasoning of ClassL.

Example

Suppose we need to classify “Professional Java, JDK-5th Edition” by W. Clay Richardson et al.

The document concept of this document d is: Cd = Java#3 ⊓ Programming#2.

Node 7 is the only node which conforms to the “get specific” rule.

[Figure: classification by level. Level 0: Subjects (1). Level 1: Business and Investing (2), Computers and Internet (3). Level 2: Small Business and Entrepreneurship (4), Programming (5). Level 3: New Business Enterprises (6), Java Language (7). Level 4: Java Beans (8).]

Example (cont’)

Suppose we need to classify “Visual Basic.Net Programming for Business” by Philip A. Koneman.

The document concept of this document d is: Cd = VisualBasicNet#1 ⊓ Programming#2 ⊓ Business#1

Nodes 2 and 5 conform to the “get specific” rule.

[Figure: the same eight-node classification as in the previous example.]

What have we done so far?

Classified documents.

How? With the “get specific” algorithm! But how to implement the algorithm?

ClassL!

We are reasoning with the ‘Concept Realization’ service of ClassL!

(With an empty ABox.)

⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj


Intuitive Query-answering

Query-answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps:

1. The ClassL proposition Cq is built from q by converting q’s keywords as described above.

2. The set of answers (retrieval set) to q is defined as a set of subsumption checking problems in Ground ClassL:

Aq = {d ∈ document | T ⊨ Cd ⊑ Cq}.
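A minimal sketch of computing Aq by brute force, assuming that ground ClassL subsumption is checked propositionally: T ⊨ C ⊑ D holds when no truth assignment satisfying the TBox makes C true and D false. Formulas are nested tuples such as `("and", x, y)`; the documents and the TBox axiom below are made up for illustration.

```python
from itertools import product


def atoms(f):
    """Atom names occurring in a formula (a string or an (op, args...) tuple)."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(x) for x in f[1:]))


def holds(f, v):
    """Truth value of formula f under assignment v (dict: atom -> bool)."""
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == "not":
        return not holds(args[0], v)
    if op == "and":
        return all(holds(a, v) for a in args)
    if op == "or":
        return any(holds(a, v) for a in args)
    raise ValueError(op)


def entails_subsumption(tbox, c, d):
    """T ⊨ c ⊑ d, where tbox is a list of (sub, sup) subsumption axioms."""
    names = sorted(atoms(c) | atoms(d)
                   | set().union(*(atoms(a) | atoms(b) for a, b in tbox)))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(not holds(a, v) or holds(b, v) for a, b in tbox):
            if holds(c, v) and not holds(d, v):
                return False  # counter-model found
    return True


def answers(tbox, docs, cq):
    """Aq: documents whose concept is subsumed by the query concept."""
    return {d for d, cd in docs.items() if entails_subsumption(tbox, cd, cq)}


TBOX = [("Cobol#1", "ProgrammingLanguage#1")]  # illustrative axiom
DOCS = {
    "d1": ("and", "Java#3", "Cobol#1"),
    "d2": ("and", "Java#3", "Programming#2"),
    "d3": "Cobol#1",
}
print(answers(TBOX, DOCS, "Java#3"))
print(answers(TBOX, DOCS, "ProgrammingLanguage#1"))
```

Enumeration is exponential in the number of atoms, so it only suits small examples; it is used here to keep the sketch self-contained instead of calling a SAT solver.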


Query-Answering: A Problem

Searching all the documents may be expensive (millions of documents classified).

We define a set of nodes which contain only answers to a query q as follows:

Nsq = {ni ∈ node | T ⊨ Ci ⊑ Cq}

NOTE: Each document d in a node ni ∈ Nsq is an answer to the query q, because T ⊨ Cd ⊑ Ci by definition of classification.

Thus all the documents d in the nodes of Nsq belong to Aq.


Query-Answering: Classification Set

We extend Nsq (named the sound classification answer) by adding a set of nodes (named the query classification set) defined as:

Clq = {ni ∈ node | d ∈ ni and ⊨ Cd ≡ Cq}

i.e., the nodes which constitute the classification set of a document d whose concept Cd is equivalent to Cq.


Query-Answering: Sound Answer Set

The set of answers (retrieval set) to q is finally defined as the following set:

Asq =df {d ∈ ni | ni ∈ Nsq} ∪ {d ∈ ni | ni ∈ Clq and ⊨ Cd ⊑ Cq}.

Under this definition, the answers to a query are documents from nodes whose concepts are more specific than the query concept.
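The three sets Nsq, Clq and Asq can be sketched as follows, again assuming that all concepts are conjunctions of atoms (so subsumption reduces to a subset test) and using made-up node and document concepts. Document `d4` is placed so as to show an answer that the definition misses.

```python
# Concepts at nodes and classified documents, as sets of atoms (conjunctions).
NODE_CONCEPTS = {1: frozenset({"A"}), 2: frozenset({"A", "B"})}
DOCS_AT_NODE = {
    1: {"d1": frozenset({"A", "X"}),
        "d2": frozenset({"A", "X", "Y"}),
        "d3": frozenset({"A"})},
    2: {"d4": frozenset({"A", "B", "X"})},
}


def subsumed_by(specific, general):
    """⊨ specific ⊑ general, valid for conjunctions of atoms only."""
    return general <= specific


def sound_answers(cq):
    # Nsq: nodes whose concept is subsumed by the query concept.
    ns = {n for n, c in NODE_CONCEPTS.items() if subsumed_by(c, cq)}
    # Clq: nodes holding a document whose concept is equivalent to the query.
    cl = {n for n, docs in DOCS_AT_NODE.items()
          if any(cd == cq for cd in docs.values())}
    from_ns = {d for n in ns for d in DOCS_AT_NODE.get(n, {})}
    from_cl = {d for n in cl for d, cd in DOCS_AT_NODE[n].items()
               if subsumed_by(cd, cq)}
    return ns, cl, from_ns | from_cl


print(sound_answers(frozenset({"A", "X"})))
print(sound_answers(frozenset({"A", "B"})))
```

For the query A ⊓ X the result contains `d1` and `d2` but not `d4`, although Cd4 ⊑ Cq also holds: node 2 is neither in Nsq nor in Clq.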


Example

Suppose that a user makes a query q, which is converted into Cq = Java#3 ⊓ Cobol#1, where Cobol#1 is “common business-oriented language.”

It can be shown that Nsq = {7,8}.

Exercise: show it.

[Figure: the classification path Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8), with nodes 7 and 8 marked as Nsq.]

Example (cont’)

It can be shown that Nsq = {7,8}.

“Java for COBOL programmers, 2nd ed.” is classified in node 2, so it is not an answer by using only Nsq = {7,8}.

We then consider Clq to compute more answers; among others are the documents in node 5.

[Figure: the same classification path, with a node marked as Clq.]

Sound Answer Set: Remark

The set Asq is sound (i.e., contains only answers to q), but not complete (i.e., does not contain all the answers to q).

See the next example.

[Figure: node 1, EuropeanUnion, with child node 2, Pictures, which holds a document d.]

Sound Answer Set: Example

Suppose that a user makes a query q like “video or pictures of Italy,” which is converted into Cq = Italy#1 ⊓ (Video#2 ⊔ Pictures#1).

Cq is equivalent to Cq1 ⊔ Cq2, where:

Cq1 = Video#2 ⊓ Italy#1,

Cq2 = Pictures#1 ⊓ Italy#1.


Sound Answer Set: Example (cont’)

But ⊨ C2 ⊑ Cq1 does not hold,

hence a document d in node 2 about Rome, with Cd = Pictures#1 ⊓ Rome#1, is not retrieved, since:

Nsq = {ni | ⊨ Ci ⊑ Cq} = ∅ and

Clq = {1}, so d ∉ Asq.

(Asq is not complete.)


Some Comments

The edge structure of an LO is considered neither for document classification nor for query answering.

The edge information becomes redundant, as it is implicitly encoded in the “concept at a node” notion.

There is more than one way to build an LO from a set of concepts at nodes.


What have we done in Query Answering?

Find the set of documents.

How? Find the concepts that are subsumed by the query.

But how to implement it?

ClassL!

We are reasoning with the ‘Concept subsumption’ service of ClassL!

⊨Cd ⊑ Cq


Date: a matching?


Why Matching?

Most popular knowledge can be represented as graphs. The heterogeneity between knowledge graphs demands that relations between them, such as semantic equivalence, be made explicit.

Some popular situations that can be modeled as a matching problem are:

Concept matching in semantic networks.

Schema matching in distributed databases.

Ontology matching (ontology “alignment”) in the Semantic Web.


A Matching Problem (Example)


Relational DB Schemas

Let us consider the following relational database (RDB) model, say “BANK”:

(Giunchiglia & Shvaiko, 2007)

Relational DB Schemas: Representation 1

We can represent the RDB model “BANK” as a graph (a tree) with root “BANK”:

The RDB model is first partitioned into relations, then attributes and data instances.

(Giunchiglia & Shvaiko, 2007)

Relational DB Schemas: Representation 2

We can represent the RDB model “BANK” as a graph (a tree) with root “BANK”:

The model is partitioned into relations, then into tuples, attributes and data instances.

(Giunchiglia & Shvaiko, 2007)

Relational DB Schemas: NOTEs

Which of the two representations is preferable depends on the concrete task.

It is always possible to transform one representation into the other.

In contrast to the example of RDB “BANK”, DB schemas are seldom trees. More often, DB schemas are translated into Directed Acyclic Graphs (DAG’s).


OODB Schemas

Let us consider the RDB “BANK” in terms of an object-oriented DB (OODB) schema:

BRANCH (Street, City, Zip)

PERSON (F_Name, L_Name)

STAFF : PERSON (Position, Salary, Manager)

The resulting graph is:

(Giunchiglia & Shvaiko, 2007)

OODB Schemas: NOTEs

OODB schemas capture more semantics than relational DB schemas.

In particular, an OODB schema:

explicitly expresses subsumption relations between elements;

admits special types of arcs for part/whole relationships in terms of aggregation and composition.


Semi-structured Data

Neither RDBs nor OODBs capture all the features of semi-structured or unstructured data (Buneman, 1997):

semi-structured data do not possess a regular structure (schemaless);

the “structure” of semi-structured data could be partial or even implicit.

Typical examples are: HTML and XML.


XML Schemas

XML schemas can be represented as DAGs.

The graph from the RDB “BANK” could also be obtained from an XML schema.

(Giunchiglia & Shvaiko, 2007)

XML Schemas: NOTEs

Often XML schemas represent hierarchical data models. In this case the only relationships between the elements are {is-a}.

Attributes in XML are used to represent extra information about data. There are no strict rules telling us when data should be represented as elements, or as attributes.


Concept Hierarchies

Def. A concept hierarchy is a semi-formal conceptualization of an application domain in terms of concepts and relationships.

(Giunchiglia & Shvaiko, 2007)

Concept Hierarchies: NOTEs

Examples are, e.g., classification hierarchies and directories (catalogs).

Classification hierarchies / Web directories are sometimes referred to as lightweight ontologies (Uschold & Gruninger, 2004). However:

They are not ontologies, as they lack a formal semantics (semi-formal vs. formal).

They don’t formalize class instances.


The Matching Problems

A Matching Problem (syntactic or semantic) is a problem on graphs summarized as:

Given two finite graphs, is there a matching between the (nodes of the) two graphs?

In other words: given two graph-like structures (e.g., concept hierarchies or ontologies), produce a mapping between the nodes of the graphs that semantically correspond to each other.


Matching Procedures

A matching problem can be decomposed into two steps:

1. Extract the graphs from the conceptual models under consideration;

2. Match the resulting graphs.

Below we show some examples of step 1. (We follow [Giunchiglia & Shvaiko, 2007].)


Syntactic vs. Semantic Matching

Syntactic matching: relations are computed between labels at nodes; R = {x ∈ [0,1]}. Note: all previous systems are syntactic…

Semantic matching: relations are computed between concepts at nodes; R = {=, ⊑, ⊒, ⊥, ⊓}. Note: needed for proper labeling of context mappings.

Semantic Matching

A mapping element is a 4-tuple < IDij, n1i, n2j, R >, where

IDij is a unique identifier of the given mapping element;

n1i is the i-th node of the first graph;

n2j is the j-th node of the second graph;

R specifies a semantic relation between the concepts at the given nodes.

Semantic Matching: Given two graphs G1 and G2, for any node n1i ∈ G1, find the strongest semantic relation R holding with node n2j ∈ G2.

Computed R’s, listed in decreasing binding strength order: equivalence { = }; more general/specific { ⊒, ⊑ }; mismatch { ⊥ }; overlapping { ⊓ }.
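A minimal sketch of picking the strongest relation between two concepts at nodes, assuming the checks are done propositionally by enumerating truth assignments (a stand-in for a SAT solver). The TBox below is an assumption of this sketch: it states that Images and Pictures are synonyms and that Italy and Austria are disjoint. Formulas are nested tuples such as `("and", x, y)`.

```python
from itertools import product


def atoms(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(x) for x in f[1:]))


def holds(f, v):
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == "not":
        return not holds(args[0], v)
    if op == "and":
        return all(holds(a, v) for a in args)
    if op == "or":
        return any(holds(a, v) for a in args)
    raise ValueError(op)


def valid(tbox, f):
    """True if f holds in every assignment that satisfies all TBox axioms."""
    names = sorted(atoms(f) | set().union(*(atoms(a) | atoms(b) for a, b in tbox)))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(not holds(a, v) or holds(b, v) for a, b in tbox) and not holds(f, v):
            return False
    return True


def subsumed(tbox, c, d):
    return valid(tbox, ("or", ("not", c), d))


def strongest_relation(tbox, c1, c2):
    """Relation between c1 and c2, tried in decreasing binding strength."""
    less, more = subsumed(tbox, c1, c2), subsumed(tbox, c2, c1)
    if less and more:
        return "="
    if more:
        return "⊒"
    if less:
        return "⊑"
    if valid(tbox, ("not", ("and", c1, c2))):
        return "⊥"
    return "⊓"


TBOX = [("Images", "Pictures"), ("Pictures", "Images"),
        ("Italy", ("not", "Austria"))]  # assumed background knowledge
images = ("and", "Europe", "Images")
pictures = ("and", "Europe", "Pictures")
italy = ("and", "Europe", "Pictures", "Italy")
austria = ("and", "Europe", "Pictures", "Austria")

print(strongest_relation(TBOX, images, pictures))
print(strongest_relation(TBOX, images, "Europe"))
print(strongest_relation(TBOX, images, italy))
print(strongest_relation(TBOX, italy, austria))
```

Here ⊓ is returned whenever none of the stronger relations holds; a real matcher may instead report that no relation could be established.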

Familiar?

Example

[Figure: two classification trees to be matched. First tree: Europe (1), Images (2), Italy (3), Austria (4). Second tree: Europe (1), Pictures (2), Wine and Cheese (3), Italy (4), Austria (5). Mapping elements shown: < ID22, 2, 2, = >, < ID21, 2, 1, ⊑ > and < ID24, 2, 4, ⊒ >.]

S-Match Algorithm: Four Macro Steps

Given two labeled trees T1 and T2, do:

1. For all labels in T1 and T2 compute concepts at labels

2. For all nodes in T1 and T2 compute concepts at nodes

3. For all pairs of labels in T1 and T2 compute relations between concepts at labels

4. For all pairs of nodes in T1 and T2 compute relations between concepts at nodes

Steps 1 and 2 constitute the preprocessing phase, and are executed once, and again each time the schema/ontology is changed (OFF-LINE part). This is the SAME as the procedure of formalizing a classification into a lightweight ontology as discussed in the last lecture.

Steps 3 and 4 constitute the matching phase, and are executed every time the two schemas/ontologies are to be matched (ON-LINE part).

81

Page 82

Step 1: Compute concepts at labels

The idea:

Translate natural language expressions into an internal formal language

Compute concepts based on possible senses of words in a label and their interrelations

Preprocessing:

Tokenization. Labels are parsed into tokens (according to punctuation, spaces, etc.). E.g., Wine and Cheese → <Wine, and, Cheese>;

Lemmatization. Tokens are morphologically analyzed in order to find all their possible basic forms. E.g., Images → Image;

Building atomic concepts. An oracle (WordNet) is used to extract senses of lemmatized tokens. E.g., Image has 8 senses, 7 as a noun and 1 as a verb;

Building complex concepts. Prepositions, conjunctions, etc. are translated into logical connectives and used to build complex concepts out of the atomic concepts.

E.g., C_{Wine and Cheese} = <Wine, ⋃(WN_Wine)> ⊔ <Cheese, ⋃(WN_Cheese)>,

where ⋃ is a union of the senses that WordNet attaches to lemmatized tokens
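The preprocessing steps above can be sketched as follows. This is a minimal illustration under strong assumptions: TOY_LEMMAS and TOY_SENSES are hypothetical lookup tables with made-up sense identifiers standing in for a real lemmatizer and for WordNet, and only the connective "and" (mapped to ⊔, as in the Wine and Cheese example) is handled.

```python
import re

# Hypothetical stand-ins for a real lemmatizer and for WordNet: tiny lookup
# tables with made-up sense identifiers, for illustration only.
TOY_LEMMAS = {"images": "image", "pictures": "picture"}
TOY_SENSES = {
    "wine": {"wine#n1", "wine#n2"},
    "cheese": {"cheese#n1"},
}
# Connective words and the logical connective they map to ("and" -> ⊔ follows
# the Wine and Cheese example on the slide).
CONNECTIVES = {"and": "⊔"}


def tokenize(label):
    """Split a label into tokens on spaces and punctuation."""
    return re.findall(r"[A-Za-z]+", label)


def lemmatize(token):
    """Return the base form of a token (toy lookup, lower-cased fallback)."""
    t = token.lower()
    return TOY_LEMMAS.get(t, t)


def concept_at_label(label):
    """Build a flat formula: atomic concepts joined by logical connectives."""
    formula = []
    for token in tokenize(label):
        lemma = lemmatize(token)
        if lemma in CONNECTIVES:
            formula.append(CONNECTIVES[lemma])
        else:
            # Atomic concept: the lemma paired with the union of its senses.
            formula.append((lemma, frozenset(TOY_SENSES.get(lemma, ()))))
    return formula
```

Calling concept_at_label("Wine and Cheese") yields the atomic concept for wine, the connective ⊔, and the atomic concept for cheese, mirroring the formula on the slide.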


Page 83

Step 2: Compute concepts at nodes

The idea: extend concepts at labels by capturing the knowledge residing in the structure of a graph in order to define a context in which the given concept at a label occurs

Computation (basic case): The concept at a node n is computed as the intersection of the concepts at labels located above the given node, including the node itself

[Figure: example tree with nodes 1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria]

C4 = C_Europe ⊓ C_Pictures ⊓ C_Italy
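The basic case can be sketched by walking from a node up to the root and collecting the labels on the way. The TREE dictionary below is an assumed encoding of the five-node example tree (the parents of nodes 3 and 5 are read off the figure layout), and concept_at_node and show are hypothetical helper names.

```python
# Assumed encoding of the example tree: node id -> (label, parent id).
TREE = {
    1: ("Europe", None),
    2: ("Pictures", 1),
    3: ("Wine and Cheese", 1),
    4: ("Italy", 2),
    5: ("Austria", 2),
}


def concept_at_node(tree, node):
    """Return the labels on the path from the root down to the node.

    The concept at the node is the intersection (⊓) of the concepts
    at these labels.
    """
    path = []
    while node is not None:
        label, parent = tree[node]
        path.append(label)
        node = parent
    return list(reversed(path))


def show(tree, node):
    """Render the concept at a node as a ⊓-formula over label concepts."""
    return " ⊓ ".join(f"C_{label}" for label in concept_at_node(tree, node))
```

With this encoding, show(TREE, 4) reproduces the formula for C4 given above.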


Page 84

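Step 3: compute relations between concepts at labels

The idea: Exploit a priori knowledge, e.g., lexical and domain knowledge, with the help of element-level semantic matchers

Results of step 3:

[Figure: the two example trees T1 and T2, one rooted at Europe (with Pictures, Wine and Cheese, Italy, Austria) and one rooted at Images (with Europe, Italy, Austria)]

[Table: relations between concepts at labels, with C_Europe, C_Pictures, C_Italy, C_Austria, C_Wine and C_Cheese on one axis and C_Images, C_Europe, C_Italy and C_Austria on the other. The relations shown are:]

C_Images = C_Pictures

C_Europe = C_Europe

C_Italy = C_Italy

C_Austria = C_Austria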
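A very reduced element-level matcher can be sketched as follows. It is an illustration only: TOY_SYNSETS holds made-up synset identifiers instead of WordNet senses, and the matcher returns just "=" (shared sense) or "idk", whereas a real matcher would also derive ⊑, ⊒ and ⊥ from lexical relations such as hypernymy.

```python
# Made-up synset identifiers standing in for WordNet senses (assumption).
TOY_SYNSETS = {
    "images": {"syn:picture"},
    "pictures": {"syn:picture"},
    "europe": {"syn:europe"},
    "italy": {"syn:italy"},
    "austria": {"syn:austria"},
    "wine": {"syn:wine"},
    "cheese": {"syn:cheese"},
}


def label_relation(a, b):
    """Return "=" when two atomic concepts share a sense, else "idk"."""
    if TOY_SYNSETS[a] & TOY_SYNSETS[b]:
        return "="
    return "idk"


def clab_matrix(labels1, labels2):
    """Relations between every pair of atomic concepts at labels."""
    return {(a, b): label_relation(a, b) for a in labels1 for b in labels2}


# Atomic concepts at labels of the two example trees.
tree_a = ["images", "europe", "italy", "austria"]
tree_b = ["europe", "pictures", "wine", "cheese", "italy", "austria"]
matrix = clab_matrix(tree_a, tree_b)
equivalences = sorted(pair for pair, rel in matrix.items() if rel == "=")
```

On the example labels this finds the four equivalences listed above, including Images = Pictures.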


Page 85

Don’t hurry to Step 4!

What have we done in Step 3?

Find semantic relations.

What for?

To build the semantic relation bases for further matching.

What is a ‘relation base’ in the logic sense?

TBox! We are building the TBox!

Page 86

Step 4: compute relations between concepts at nodes

The idea: Reduce the matching problem to a validity problem

Context: the relations between concepts at labels

Wffrel (C1i, C2j): relation to be proved (with C1i in Tree1 and C2j in Tree2)

Translate into propositional logic:

C1i = C2j is translated into C1i ↔ C2j

C1i ⊑ C2j (subsumption) is translated into C1i → C2j

C1i ⊥ C2j is translated into ¬ (C1i ∧ C2j)

Prove that Context → Wffrel (C1i, C2j) is valid, with rel ∈ {=, ⊑, ⊒, ⊥, ⊓}

A propositional formula is valid iff its negation is unsatisfiable (SAT deciders are sound and complete…)
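The link between validity and unsatisfiability can be sketched with a brute-force truth-table check. This is not a real SAT decider, just an exhaustive search over assignments that works for a handful of variables; the function names and the two sample formulas over A and B are hypothetical.

```python
from itertools import product


def satisfiable(formula, variables):
    """Brute-force SAT: try every truth assignment to the variables."""
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            return True
    return False


def valid(formula, variables):
    """A formula is valid iff its negation is unsatisfiable."""
    return not satisfiable(lambda v: not formula(v), variables)


def implies(a, b):
    """Material implication a → b."""
    return (not a) or b


# Context: A ↔ B. Goal for "A ⊑ B": A → B. Formula checked: Context → Goal.
subsumption = lambda v: implies(v["A"] == v["B"], implies(v["A"], v["B"]))
# Goal for "A ⊥ B": ¬(A ∧ B). This does not follow from A ↔ B.
disjointness = lambda v: implies(v["A"] == v["B"], not (v["A"] and v["B"]))
```

Here valid(subsumption, ["A", "B"]) holds, while valid(disjointness, ["A", "B"]) fails on the assignment where both A and B are true.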

Page 87

Pseudo code of Step 4

i, j, N1, N2: int;
context, goal: wff;
n1, n2: node;
T1, T2: tree of (node);
relation = {=, ⊑, ⊒, ⊥};
ClabMatrix(N1, N2), CnodMatrix(N1, N2), relation: relation

function mkCnodMatrix(T1, T2, ClabMatrix) {
  for (i = 0; i < N1; i++) do
    for (j = 0; j < N2; j++) do
      CnodMatrix(i, j) := NodeMatch(T1(i), T2(j), ClabMatrix)}

function NodeMatch(n1, n2, ClabMatrix) {
  context := mkcontext(n1, n2, ClabMatrix, context);
  foreach (relation in < =, ⊑, ⊒, ⊥ >) do {
    goal := w2r(mkwff(relation, GetCnod(n1), GetCnod(n2)));
    if VALID(mkwff(→, context, goal))
      return relation;}
  return IDK;}
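A runnable Python counterpart of the pseudo code might look as follows. It is a sketch under several assumptions: concepts at nodes are plain conjunctions of atoms, the context is a list of (atom, relation, atom) triples taken from the label matrix, validity is checked by enumerating truth assignments rather than by a SAT decider, and the atom names (Images1, Europe2, ...) are ad hoc.

```python
from itertools import product

RELATIONS = ["=", "⊑", "⊒", "⊥"]  # tried in this order, as in the pseudo code


def conj(atoms, v):
    """Truth value of a concept at a node: the conjunction of its atoms."""
    return all(v[a] for a in atoms)


def wff(rel, a, b):
    """Propositional translation of a relation between truth values a and b."""
    if rel == "=":
        return a == b
    if rel == "⊑":
        return (not a) or b
    if rel == "⊒":
        return (not b) or a
    return not (a and b)  # "⊥"


def node_match(c1, c2, clab):
    """Return the first relation rel for which Context → rel(c1, c2) is valid."""
    atoms = sorted(set(c1) | set(c2) | {x for a, _, b in clab for x in (a, b)})
    assignments = [dict(zip(atoms, vals))
                   for vals in product([False, True], repeat=len(atoms))]
    for rel in RELATIONS:
        # Valid iff no assignment satisfies the context and falsifies the goal.
        if all(wff(rel, conj(c1, v), conj(c2, v))
               for v in assignments
               if all(wff(r, v[a], v[b]) for a, r, b in clab)):
            return rel
    return "IDK"


def mk_cnod_matrix(t1, t2, clab):
    """Compute the relation for every pair of nodes of the two trees."""
    return {(i, j): node_match(c1, c2, clab)
            for i, c1 in t1.items() for j, c2 in t2.items()}


# Concepts at nodes for part of the running example (first two nodes of each tree).
T1 = {1: ["Images1"], 2: ["Images1", "Europe1"]}
T2 = {1: ["Europe2"], 2: ["Europe2", "Pictures2"]}
CLAB = [("Images1", "=", "Pictures2"), ("Europe1", "=", "Europe2")]
RESULT = mk_cnod_matrix(T1, T2, CLAB)
```

On this fragment RESULT[(2, 2)] is "=" and RESULT[(2, 1)] is "⊑", in line with the mapping elements < ID22, 2, 2, = > and < ID21, 2, 1, ⊑ > of the example.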

Validity Reasoning Problem:

Context⊨goal?


Page 88

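Example

Suppose we want to check whether C12 = C22.

[Table: matrix of relations between concepts at nodes, with C11 to C14 on one axis and C21 to C25 on the other]

Context → Goal:

(C1_Images ↔ C2_Pictures) ∧ (C1_Europe ↔ C2_Europe) → (C12 ↔ C22)

The first two conjuncts form the Context and (C12 ↔ C22) is the Goal, with C12 = C1_Images ⊓ C1_Europe and C22 = C2_Europe ⊓ C2_Pictures.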

Page 89

Examples

[Figure: four panels showing the two example trees T1 and T2, each illustrating a mapping between nodes; the first panel is labeled =.]

Page 90

Short Summary

What is done in Step 4?

Find implicit semantic relations.

How?

Validity reasoning based on the TBox built in Step 3!

What is working on the backend of semantic matching?

ClassL reasoning!


Page 91

A Real S-Match System Structure


Page 92

Testing Methodology

Measuring match quality

Expert mappings are inherently subjective

Two degrees of freedom

Directionality

Use of Oracles

Indicators

Precision, [0,1] (correctness criterion, how many false positives?)

Recall, [0,1] (completeness criterion, how many false negatives?)

Overall, [-1,1]

F-measure, [0,1]

Time, sec.
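The quality indicators can be computed from a set of found mappings and a set of expert mappings. The slide only names the indicators and their ranges, so the formulas below are an assumption: they are the definitions commonly used in schema-matching evaluation, with Overall taken as Recall * (2 - 1/Precision). The function name match_quality and the sample mappings are hypothetical.

```python
def match_quality(found, expert):
    """Compare a set of found mappings against expert (reference) mappings."""
    found, expert = set(found), set(expert)
    true_pos = len(found & expert)
    false_pos = len(found - expert)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(expert) if expert else 0.0
    # Equal to Recall * (2 - 1/Precision) whenever Precision > 0;
    # negative when false positives outnumber true positives.
    overall = (true_pos - false_pos) / len(expert) if expert else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "overall": overall, "f_measure": f_measure}


# Hypothetical data: mappings as (node in tree 1, node in tree 2, relation).
found = {(2, 2, "="), (2, 1, "⊑"), (3, 4, "=")}
expert = {(2, 2, "="), (2, 1, "⊑"), (2, 4, "⊒"), (3, 4, "⊑")}
scores = match_quality(found, expert)
```

With two of three found mappings correct and four expert mappings in total, precision is 2/3, recall is 0.5, Overall is 0.25 and the F-measure is 4/7.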

Matching systems: S-Match vs. Cupid, COMA and SF as implemented in Rondo

Page 93

Preliminary Experimental Results

[Figure: bar chart "Average Results" comparing Rondo, Cupid, COMA and S-match on Precision, Recall, Overall and F-measure (axis 0.0 to 1.0) and on Time (axis 0.0 to 12.0 sec)]

Three experiments, test cases from different domains

Some characteristics of test cases: #nodes 4-39, depth 2-3

PC: Pentium IV 1.7 GHz; 256 MB RAM; Windows XP

Page 94

References & Credits

References:

F. Giunchiglia, P. Shvaiko. “Semantic matching.” Knowledge Engineering Review, 18(3):265-280, 2003.

F. Giunchiglia, M. Marchese, I. Zaihrayeu. “Encoding Classifications into Lightweight Ontologies.” J. of Data Semantics VIII, Springer-Verlag LNCS 4380, pp. 57-81, 2007.

F. Giunchiglia, I. Zaihrayeu. “Lightweight Ontologies.” Encyclopedia of Database Systems, Springer-Verlag, 2008.

Available as a DIT Technical Report here.
