Evaluating semantic similarity using GML in Geographic Information Systems Fernando Ferri 1 , Anna Formica 2 , Patrizia Grifoni 1 , and Maurizio Rafanelli 2 1 IRPPS-CNR, via Nizza 128, 00198 Roma, Italy [email protected], [email protected] 2 IASI-CNR, viale Manzoni 30, 00185 Roma, Italy [email protected], [email protected].

Evaluating semantic similarity using GML

in Geographic Information Systems

Fernando Ferri 1, Anna Formica 2, Patrizia Grifoni 1, and Maurizio Rafanelli 2

1 IRPPS-CNR, via Nizza 128, 00198 Roma, Italy

[email protected], [email protected]

2 IASI-CNR, viale Manzoni 30, 00185 Roma, Italy

[email protected], [email protected]

Summary

Motivation

Related works

Coding a Part-of Hierarchy using GML

Similarity evaluation

Conclusion

Motivation (1)

In Geographic Information Systems (GISs) semantic similarity plays an important role, as it supports the identification of objects that are conceptually close, but not identical.

GML (Geography Markup Language) is emerging as the dominant standard for exchanging geographic data across the Internet.

A semantic similarity model facilitates the comparison of entities and allows information retrieval and integration to handle semantically similar concepts. The goal of a similarity model is to obtain more flexible and better matches between user-expected and system-retrieved information.

Motivation (2)

Given the relevance of the Is-in relationship in the geographic context, we focus on GML elements organized according to Part-of (meronymic) hierarchies.

The semantics essentially concerns parts which are similar to and inseparable from the whole.

Related works (1)

Similarity of hierarchically related concepts has been widely investigated in the literature [Resnik] [Rodriguez, Egenhofer].

Among the various proposals, we followed the probabilistic approach of Lin, which is based on the notion of information content and overcomes the drawbacks of the traditional edge-counting approach.

Related works (2)

Resnik proposes algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguities.

Lin starts from Resnik's work and also takes into account the information content of the concepts being compared.

Coding a Part-of Hierarchy with GML (1)

The real world in the geographic domain can be represented as a set of features, and AbstractFeatureType codifies a geographic feature in GML.

Its geometry type is an important property: it is given in the reference coordinate system and describes the extent, position or relative location of the represented concept.

Coding a Part-of Hierarchy with GML (2)

The geometric types defined in GML provide the framework for modelling all the geographical concepts.

By means of this framework it is possible to model, for example, the concepts composing a network of communication ways, such as roads, rivers, canals and other communication infrastructures.

Coding a Part-of Hierarchy with GML (3)

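[Figure: type hierarchy rooted at AbstractFeatureType. Below it are the GML geometric types MultiLineStringType, MultiPolygonType, ..., from which the types ComWayType, RoadType, RiverType, CanalType, NavSegmentType and NNavSegmentType are introduced]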

This figure shows an example of a type hierarchy that introduces concepts concerning communication infrastructures starting from the GML geometric types.

Coding a Part-of Hierarchy with GML (4)

As mentioned in the motivation, due to the relevance of the Is-in relationship in the geographic context, the paper focuses on GML elements organized according to Part-of (meronymic) hierarchies.

For instance, in our example a Part-of relationship exists between communication ways (ComWay) and roads, rivers and canals.

Coding a Part-of Hierarchy with GML (5)

Usually, in the literature, Part-of hierarchies are modelled in XML using “sequences of elements”, and a similar approach could be followed in GML.

[Figure: element hierarchy rooted at ComWay, containing the elements Kind, Country, Road, River, Canal, NavRiver, NNavRiver, NavCanal and NNavCanal]

However, this approach does not make it possible to distinguish the elements of the Part-of hierarchy from other elements defined outside it, such as Kind and Country.

Coding a Part-of Hierarchy with GML (6)

In order to highlight meronymic relationships within the GML element hierarchy, a Part-of hierarchy could be modelled by introducing some special geographic types such as PartOfWayType, PartOfRivType and PartOfCanType.

[Figure: element hierarchy rooted at ComWay, with Kind, Country and PartOfWay; PartOfWay groups Road, River and Canal; PartOfRiv groups NavRiver and NNavRiver; PartOfCan groups NavCanal and NNavCanal]

Each special type models a Part-of relationship between a geographic concept and its component concepts.

Coding a Part-of Hierarchy with GML (7)

<element name="ComWay" type="ComWayType"/>
<element name="Road" type="RoadType"/>
<element name="River" type="RiverType"/>
<element name="Canal" type="CanalType"/>
<element name="NavRiver" type="NavSegmentType"/>
<element name="NNavRiver" type="NNavSegmentType"/>
<element name="NavCanal" type="NavSegmentType"/>
<element name="NNavCanal" type="NNavSegmentType"/>

<complexType name="ComWayType">
  <sequence>
    <element name="kind" type="string"/>
    <element name="country" type="string"/>
    <element name="PartOfWay" type="PartOfWayType"/>
  </sequence>
  <attribute name="label" type="string"/>
  <attribute name="length" type="integer"/>
</complexType>

<complexType name="PartOfWayType">
  <sequence>
    <element name="Road" type="RoadType"/>
    <element name="River" type="RiverType"/>
    <element name="Canal" type="CanalType"/>
  </sequence>
</complexType>

<complexType name="RoadType">
  <attribute name="label" type="string"/>
  <attribute name="length" type="integer"/>
  <attribute name="maxspeed" type="integer"/>
</complexType>

…………………………..

This GML code shows how to highlight a meronymic relationship within the GML element hierarchy by introducing a special geographic type such as PartOfWayType.
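As an illustration of why the special PartOf types are convenient, the following Python sketch reads a cut-down copy of the schema and recovers the Part-of children of each type. It is only a sketch under assumptions: the `<schema>` wrapper, the absence of XML namespaces and the function name `part_of_children` are hypothetical and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Cut-down, namespace-free copy of the schema fragment (an assumption:
# a real GML application schema would declare XML Schema namespaces).
SCHEMA = """
<schema>
  <complexType name="ComWayType">
    <sequence>
      <element name="kind" type="string"/>
      <element name="country" type="string"/>
      <element name="PartOfWay" type="PartOfWayType"/>
    </sequence>
  </complexType>
  <complexType name="PartOfWayType">
    <sequence>
      <element name="Road" type="RoadType"/>
      <element name="River" type="RiverType"/>
      <element name="Canal" type="CanalType"/>
    </sequence>
  </complexType>
</schema>
"""


def part_of_children(schema_xml: str, prefix: str = "PartOf") -> dict[str, list[str]]:
    """Map each complexType to the elements grouped by its PartOf* member."""
    root = ET.fromstring(schema_xml)
    types = {ct.get("name"): ct for ct in root.iter("complexType")}
    result: dict[str, list[str]] = {}
    for name, ctype in types.items():
        for element in ctype.iter("element"):
            # Only elements whose type is one of the special PartOf types
            # contribute to the Part-of hierarchy; kind and country do not.
            if element.get("type", "").startswith(prefix):
                wrapper = types.get(element.get("type"))
                if wrapper is not None:
                    result[name] = [e.get("name") for e in wrapper.iter("element")]
    return result


print(part_of_children(SCHEMA))
```

Because kind and country are not typed with a PartOf type, they are left out of the result, which is the distinction the special types are meant to provide.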

Evaluating similarity (1)

For evaluating concept similarity, this paper combines and revisits:

the information content approach [Lin98];

a proposal inspired by the maximum weighted matching problem in bipartite graphs [FM02].

Evaluating similarity (2)

The starting assumption is that associating probabilities with the Part-of taxonomy allows us to introduce the notion of a weighted element hierarchy. In particular, in our example the probabilities have been estimated in line with WordNet 2.0.

For instance, the concepts Road and River are defined below, with the related frequencies (the numbers in parentheses).

(95) Road – an open way (generally public) for travel and transportation

(55) River – a large natural stream of water (larger than a creek)

Evaluating similarity (3)

The probability of a concept

The probability of a concept c is defined as:

p(c) = freq(c)/N

where freq(c) is the frequency of the concept c in the taxonomy, and N is the total number of concepts.
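The definition can be sketched in a few lines of Python. The frequencies 95 and 55 are the ones quoted for Road and River; the value of N is not stated in the slides, so N = 50000 below is purely an assumption, chosen because it reproduces the weights 0.0019 and 0.0011 shown for Road and River in the weighted hierarchy.

```python
# Frequencies quoted in the slides for Road and River.
FREQ = {"Road": 95, "River": 55}

# ASSUMPTION: N is not given in the slides. 50000 is used only because
# it yields the weights 0.0019 (Road) and 0.0011 (River).
N = 50_000


def probability(concept: str, freq: dict[str, int] = FREQ, n: int = N) -> float:
    """p(c) = freq(c) / N."""
    return freq[concept] / n


p_road = probability("Road")
p_river = probability("River")
print(p_road, p_river)
```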

In the example probabilities have been assigned according to WordNet.

Evaluating similarity (4)

Example: Weighted Concept Hierarchy

ComWay 0.006

Kind, Country, PartofWay (no weight shown)

Road 0.0019, River 0.0011, Canal 0.0006

PartofRiv, PartofCan (no weight shown)

NavRiver 0.0002, NNavRiver 0.0002, NavCanal 0.0002, NNavCanal 0.0002

Evaluating similarity (5)

Following the standard approach of information theory [Ross76], the information content of a concept c can be quantified as:

– log p(c)

that is, as the probability increases, the informativeness decreases.
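A short Python sketch makes the inverse relationship concrete, using the weights of ComWay (0.006) and River (0.0011) from the weighted hierarchy. The natural logarithm is an assumption, since the slides do not fix the base; the function name is hypothetical.

```python
import math


def information_content(p: float) -> float:
    """Information content -log p(c). The log base is an assumption."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)


ic_comway = information_content(0.006)   # more probable, less informative
ic_river = information_content(0.0011)   # less probable, more informative
print(ic_comway, ic_river)
```

The more general concept ComWay has the higher probability and therefore the lower information content.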

Evaluating similarity (6)

The information content similarity (ics) of two concepts such as River and Canal is defined as:

ics(River, Canal) = 2 log p(ComWay) / (log p(River) + log p(Canal)) = 0.72

where ComWay is the concept representing the maximum information content shared by River and Canal. According to Lin's approach, the more information two concepts share, the more similar they are.
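The value can be checked with a small Python sketch that plugs in the weights of the example hierarchy. The function signature, which takes the shared concept as an explicit argument instead of searching the hierarchy for it, is a simplifying assumption.

```python
import math

# Weights taken from the example weighted concept hierarchy.
P = {"ComWay": 0.006, "River": 0.0011, "Canal": 0.0006}


def ics(c1: str, c2: str, shared: str, p: dict[str, float] = P) -> float:
    """Lin-style similarity: 2 log p(shared) / (log p(c1) + log p(c2)).

    `shared` is the concept carrying the maximum information content
    common to c1 and c2; here it is supplied by the caller.
    """
    return 2 * math.log(p[shared]) / (math.log(p[c1]) + math.log(p[c2]))


value = ics("River", "Canal", "ComWay")
print(round(value, 2))
```

Since the formula is a ratio of logarithms, the choice of base does not affect the result.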

Evaluating similarity (7)

Structural similarity (asim)

Inspired by the maximum weighted matching problem in bipartite graphs, we have to identify the set of pairs of typed attributes that maximizes the sum of the products of the information content similarities of the attributes and of their types.

Evaluating similarity (8)

Example

RiverType: label:string, length:integer, flow:integer, deepness:integer

CanalType: label:string, profundity:integer, capacity:integer, length:integer

Evaluating similarity (9)

In the previous example the set of pairs of attributes that maximizes the sum of the related information content similarity is the following:

{(label,label), (length,length), (flow,capacity), (deepness,profundity)}

Evaluating similarity (10)

In fact, by assuming that deepness and profundity are synonyms, we have:

ics(label, label) = ics(length, length) = ics(deepness, profundity) = 1

and ics(flow, capacity) = 0.

Evaluating similarity (11)

The similarity of the sets of attributes of complexTypes (asim) is therefore defined by the above maximum sum divided by the greatest of the cardinalities of the sets of attributes of the types compared.

In the case of RiverType and CanalType we have:

asim(RiverType,CanalType) = ¾ = 0.75
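The computation can be reproduced with a brute-force Python sketch over all possible pairings, which is adequate for sets of this size. Several parts are assumptions made only for illustration: the synonym table, the 0/1 similarity used for both attribute names and types (the slides only state the values 1 and 0 for this example), and the exhaustive search in place of a proper maximum weighted matching algorithm.

```python
from itertools import permutations

Attr = tuple[str, str]  # (attribute name, type)

RIVER: list[Attr] = [("label", "string"), ("length", "integer"),
                     ("flow", "integer"), ("deepness", "integer")]
CANAL: list[Attr] = [("label", "string"), ("profundity", "integer"),
                     ("capacity", "integer"), ("length", "integer")]

# ASSUMPTION: a hand-written synonym table stands in for a lexical lookup.
SYNONYMS = {frozenset({"deepness", "profundity"})}


def name_ics(a: str, b: str) -> float:
    """1 for identical or synonymous names, 0 otherwise (simplification)."""
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def type_ics(t1: str, t2: str) -> float:
    """1 for identical types, 0 otherwise (simplification)."""
    return 1.0 if t1 == t2 else 0.0


def asim(attrs1: list[Attr], attrs2: list[Attr]) -> float:
    """Best pairing score divided by the larger attribute-set cardinality."""
    small, large = sorted((attrs1, attrs2), key=len)
    if not large:
        return 0.0  # arbitrary choice for two empty attribute sets
    best = 0.0
    # Try every injective assignment of the smaller set into the larger one.
    for perm in permutations(large, len(small)):
        total = sum(name_ics(a[0], b[0]) * type_ics(a[1], b[1])
                    for a, b in zip(small, perm))
        best = max(best, total)
    return best / len(large)


print(asim(RIVER, CANAL))
```

The best pairing scores 1 + 1 + 0 + 1 = 3 and the larger set has 4 attributes, giving 0.75.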

Evaluating similarity (12)

Concept Similarity (Gsim)

The Similarity (Gsim) of the concepts River and Canal is defined as:

Gsim(River , Canal) =(ics(River , Canal)*w + asim(River, Canal)*(1-w)) * t(RiverType,CanalType)

where:

ics(River, Canal) is the information content similarity;

asim(River, Canal) is the structural similarity;

w is a weight, s.t. 0 <= w <= 1;

t is a Boolean function that, given two complexTypes, returns 0 if their least upper bound in the type hierarchy is AbstractFeatureType, and 1 otherwise.

Evaluating similarity (13)

In particular, if we assume w=0.5

Gsim(River , Canal) =(ics(River , Canal)*w + asim(River, Canal)*(1-w)) * t(RiverType,CanalType)

Gsim(River, Canal) = 0.5 * (0.72 + 0.75) * 1 = 0.735 ≈ 0.74
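The combination step can be sketched in Python as follows. The function t is reduced to a Boolean flag supplied by the caller, which is an assumption: computing the least upper bound would require the full type hierarchy, which is not reproduced here. The parameter names are hypothetical.

```python
def gsim(ics_value: float, asim_value: float, w: float = 0.5,
         lub_is_abstract_feature: bool = False) -> float:
    """Gsim = (ics * w + asim * (1 - w)) * t.

    t is 0 when the least upper bound of the two types is
    AbstractFeatureType and 1 otherwise; the caller passes that fact in.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    t = 0 if lub_is_abstract_feature else 1
    return (ics_value * w + asim_value * (1 - w)) * t


result = gsim(0.72, 0.75, w=0.5)
print(result)
```

Setting w closer to 1 gives more weight to the information content similarity, while w closer to 0 favours the structural similarity of the attributes.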

Conclusion

Thank you