A similarity measure based on semantic and linguistic information


Page 1: A similarity measure based on semantic and linguistic information

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie


A Similarity Measure Based on Semantic and Linguistic Information

Nitish Aggarwal

DERI, NUI Galway

[email protected]

Wednesday, 15th June 2011

DERI, Reading Group

Page 2: A similarity measure based on semantic and linguistic information


Based On:

“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”

Authors: Giuseppe Pirró and Jérôme Euzenat

Published: International Semantic Web Conference, 2010

“SyMSS: A syntax-based measure for short-text semantic similarity”

Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias

Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011

Page 3: A similarity measure based on semantic and linguistic information


Overview

Introduction

Classical Approaches

Ontology-based Similarity

Set of relations

Information Content

SyMSS (Syntax-based)

Deep Parsing

Influence of adjectives and adverbs

Conclusion


Page 4: A similarity measure based on semantic and linguistic information


Introduction & Motivation

Short-text Similarity: existing measures lack semantic and linguistic information

Applications: Semantic Annotation, Semantic Search, Information Retrieval and Extraction

Page 5: A similarity measure based on semantic and linguistic information


Classical Approaches

String Similarity: Levenshtein distance, Dice coefficient

Corpus-based: ESA, Google distance, Vector-Space Model

Ontology-based: Path distance, Information content

Syntax Similarity: Word order, Part of speech
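As a rough illustration of the string-similarity family listed above, the sketch below computes the Levenshtein distance and a Dice coefficient. The function names and the use of character bigrams for the Dice coefficient are assumptions made for illustration; the slides do not specify a tokenisation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of character bigrams (an assumed tokenisation)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * |X & Y| / (|X| + |Y|) over bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))
```

Both functions look only at surface characters, which is why the later slides turn to ontology-based and syntax-based measures.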


Page 6: A similarity measure based on semantic and linguistic information


First Paper:

“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”

Authors: Giuseppe Pirró and Jérôme Euzenat

Published: International Semantic Web Conference, 2010


Page 7: A similarity measure based on semantic and linguistic information


Ontology-based - Overview

Features: the whole set of semantic relations defined in an ontology

Resnik’s Information Content: IC(c) = -log p(c)

Intrinsic Information Content: avoids the need to analyse large corpora

Extended Information Content: maps the feature-based model to the information-theoretic domain
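To make Resnik's definition concrete, here is a minimal sketch that estimates p(c) as a relative frequency and returns IC(c) = -log p(c). The function name, the count-based estimate of p(c), and the natural logarithm are assumptions; the slide gives only the formula.

```python
import math


def information_content(concept_count: int, total_count: int) -> float:
    """Resnik-style IC(c) = -log p(c), with p(c) = concept_count / total_count."""
    if total_count <= 0 or not 0 < concept_count <= total_count:
        raise ValueError("need 0 < concept_count <= total_count")
    return -math.log(concept_count / total_count)
```

A concept that covers every occurrence in the corpus has p(c) = 1 and therefore carries no information, while rare concepts get a high IC.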


Page 8: A similarity measure based on semantic and linguistic information


Ontology-based - Why whole set?

Example concepts: Eyes, Ears

Relation: Part of

Page 9: A similarity measure based on semantic and linguistic information


Ontology-based - model

Tversky’s feature-based similarity model: similarity increases with the features two concepts have in common and decreases with the features that belong to only one of them

Ratio-based formulation of Tversky’s model
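A minimal sketch of the ratio form of Tversky's model, in which the common features are divided by the common features plus the weighted features unique to each concept. Using set cardinality as the salience function and defaulting both weights to 1 are assumptions for illustration.

```python
def tversky_ratio(features_a: set, features_b: set,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """common / (common + alpha * only_a + beta * only_b), using set sizes."""
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)   # features of a that b lacks
    only_b = len(features_b - features_a)   # features of b that a lacks
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0
```

With hypothetical feature sets such as {wheels, seats, engine} and {wheels, seats, pedals}, two of the four distinct features are shared, so the score is 0.5.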


Page 10: A similarity measure based on semantic and linguistic information


Ontology-based - Mapping


Mapping between feature-based and information-theoretic similarity models

MSCA: Most Specific Common Abstraction
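One way to read this mapping, offered as an assumption rather than as the slide's exact table: the common features of two concepts correspond to IC(msca), and the features unique to each concept correspond to IC(c) - IC(msca). Plugging these into a ratio model with both weights set to 1 gives the sketch below. The function name and the unit weights are hypothetical.

```python
def ratio_similarity_from_ic(ic_c1: float, ic_c2: float, ic_msca: float) -> float:
    """Ratio-model similarity with features replaced by information content.

    Assumed mapping:
      common features      -> IC(msca)
      features only in c1  -> IC(c1) - IC(msca)
      features only in c2  -> IC(c2) - IC(msca)
    """
    common = ic_msca
    only_c1 = ic_c1 - ic_msca
    only_c2 = ic_c2 - ic_msca
    denom = common + only_c1 + only_c2   # equals IC(c1) + IC(c2) - IC(msca)
    return common / denom if denom > 0 else 0.0
```

Under this reading, two identical concepts score 1, and the score falls as the concepts become more specific relative to their common abstraction.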

Page 11: A similarity measure based on semantic and linguistic information


Ontology-based - Example


T1: Car

T2: Bicycle

Example of Concept Feature


Page 13: A similarity measure based on semantic and linguistic information


Ontology-based - Framework

Intrinsic information content (iIC)

where sub(c) is the number of sub-concepts of a given concept c.

Extended information content (eIC)

where EIC(c) is a relatedness coefficient computed over all kinds of relations
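The formulas themselves are not legible in this transcript. The sketch below assumes the commonly used intrinsic form iIC(c) = 1 - log(sub(c) + 1) / log(max), where max is the total number of concepts in the ontology, and assumes that eIC combines iIC(c) and EIC(c) as a weighted sum. Both forms, the function names, and the weights zeta and eta are assumptions for illustration.

```python
import math


def intrinsic_ic(sub_count: int, total_concepts: int) -> float:
    """Assumed intrinsic IC: 1 - log(sub(c) + 1) / log(total_concepts)."""
    if total_concepts < 2 or not 0 <= sub_count < total_concepts:
        raise ValueError("need total_concepts >= 2 and 0 <= sub_count < total_concepts")
    return 1.0 - math.log(sub_count + 1) / math.log(total_concepts)


def extended_ic(iic: float, eic_coefficient: float,
                zeta: float = 1.0, eta: float = 1.0) -> float:
    """Assumed weighted combination of iIC(c) and the coefficient EIC(c)."""
    return zeta * iic + eta * eic_coefficient
```

Under the assumed intrinsic form, a leaf concept (no sub-concepts) gets the maximum value of 1, and the root of the ontology gets 0, without consulting any corpus.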


Page 14: A similarity measure based on semantic and linguistic information

Ontology-based – Evaluation of Similarity

DataSet: 65 human-evaluated pairs

Correlation values:
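Correlation with human judgement is the evaluation criterion on this slide. A minimal sketch of that computation, assuming Pearson's coefficient (the slide does not name the coefficient); the function name is hypothetical and no reported results are reproduced here.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation between measure scores xs and human ratings ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)
```

Each word pair contributes one computed score and one averaged human rating; a value close to 1 means the measure ranks and spaces the pairs much as the human judges did.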

Page 15: A similarity measure based on semantic and linguistic information


Ontology-based – Evaluation of Relatedness

DataSet: WordSim-353

Correlation value:

Page 16: A similarity measure based on semantic and linguistic information

Ontology-based - Summary

Intrinsic similarity measure: ontology-based similarity that outperforms corpus-based measures

Limitations: no support for short text; model-based

– E.g. only concepts in the ontology are considered (e.g. “car accident”)

Page 17: A similarity measure based on semantic and linguistic information


Second paper (SyMSS)


“SyMSS: A syntax-based measure for short-text semantic similarity”

Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias

Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011

Page 18: A similarity measure based on semantic and linguistic information


SyMSS - Overview

SyMSS = “syntax-based measure for short-text semantic similarity”

Syntactic Information: not only word order, but also deep parsing and parts of speech

Semantic Information: WordNet similarity, using different ontology-based similarity measures

Page 19: A similarity measure based on semantic and linguistic information


SyMSS - Semantic Information

Path-based measures: shortest path, Hirst and St-Onge (HSO)

Information Content: Resnik measure, Jiang and Conrath measure, Lin measure

Gloss-based measures: gloss overlap and gloss vector
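The three information-content measures listed above differ only in how they combine the IC of the two concepts with the IC of their most specific common ancestor. The sketch below uses the standard definitions; the function and parameter names are hypothetical, and the IC values are assumed to be computed elsewhere.

```python
def resnik(ic_common: float) -> float:
    """Resnik: similarity is the IC of the most specific common ancestor."""
    return ic_common


def lin(ic_c1: float, ic_c2: float, ic_common: float) -> float:
    """Lin: 2 * IC(common) / (IC(c1) + IC(c2)), a value in [0, 1]."""
    total = ic_c1 + ic_c2
    return 2 * ic_common / total if total > 0 else 0.0


def jiang_conrath_distance(ic_c1: float, ic_c2: float, ic_common: float) -> float:
    """Jiang and Conrath: a distance, IC(c1) + IC(c2) - 2 * IC(common)."""
    return ic_c1 + ic_c2 - 2 * ic_common
```

Note that Jiang and Conrath define a distance, so it has to be inverted or otherwise rescaled before it can be used alongside the other two as a similarity.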


Page 20: A similarity measure based on semantic and linguistic information


SyMSS - Syntactic Information

Parse tree: phrases and the heads of phrases

Head similarity: computed between heads of phrases that have the same syntactic function

Penalization factor: applied for non-shared phrases

Page 21: A similarity measure based on semantic and linguistic information


SyMSS - Model

My brother has a dog with four legs

My brother has four legs

Sim(Has,Has) = 1

Sim(brother,brother) = 1

Sim(dog,leg) = 0.1414

PF = 0.03
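The slide lists the ingredients but not how they are combined. The sketch below assumes the sentence score is the mean of the head similarities over phrases with a shared syntactic function, minus the penalization factor PF once per non-shared phrase. That aggregation, the function name, and the count of one non-shared phrase ("with four legs") are assumptions for illustration.

```python
def symss_score(head_similarities, non_shared_phrases: int, pf: float) -> float:
    """Assumed aggregation: mean head similarity minus PF per non-shared phrase."""
    if not head_similarities:
        return 0.0
    mean_sim = sum(head_similarities) / len(head_similarities)
    return mean_sim - non_shared_phrases * pf


# Values from the slide: Sim(has, has), Sim(brother, brother), Sim(dog, leg)
score = symss_score([1.0, 1.0, 0.1414], non_shared_phrases=1, pf=0.03)
```

Under these assumptions the two sentences score about 0.68: the shared subject and verb pull the value up, while the mismatched object heads (dog vs. leg) and the unmatched phrase pull it down.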

Page 22: A similarity measure based on semantic and linguistic information


SyMSS - Evaluation

DataSet: 30 pairs out of 65 human-evaluated pairs

Correlation values:

Page 23: A similarity measure based on semantic and linguistic information


SyMSS - Effect of adverb and adjective

Sentence 1: “I have a big dog”

Sentence 2: “I have a little dog”

8.68% gain in SyMSS with HSO

Page 24: A similarity measure based on semantic and linguistic information


SyMSS - Summary

Syntax-based similarity considers: nouns and verbs; the influence of adjectives and adverbs

Limitations

Depends on the parsed structure

– E.g. text that is not grammatically correct

Depends on the underlying word similarity

Page 25: A similarity measure based on semantic and linguistic information


Conclusion

No established method for short text: parsing of phrases is difficult

Concept similarity depends on the model, which may be weak

– E.g. for xebr: Extraordinary Income and xebr: Other Operating Income, Pathlength = 0.2 and Expert = 0.8

Need a syntactic similarity for concept tags (word or phrase)