Similarity Measures for Query Expansion in TopX
Caroline Gherbaoui
Universität des Saarlandes Naturwissenschaftlich-Technische Fak. I
Fachrichtung 6.2 - Informatik
Max-Planck-Institut für Informatik
AG 5 - Datenbanken und Informationssysteme
Prof. Dr. Gerhard Weikum
Overview
background knowledge
similarity measures for the query expansion
evaluation of the computed similarity values
changes in TopX
conclusion
Background
top-k query processing: provides the k most relevant results
query expansion: extends the terms of the source query
word sense disambiguation: extracts the correct meaning of a term
ontology: a set of terms with their meanings and semantic relations
Word Sense Disambiguation
query: „java, coffee“
possible meanings of „java“: „island“, „coffee“, „programming language“, …
Query Expansion
„COFFEE“ → „drink, espresso“
TopX
top-k retrieval engine
text and XML data
word sense disambiguation
query expansion
ontology
TopX – WordNet Ontology
lexicon for the English language
hierarchical relations
one relation, one direction
~160,000 words
~120,000 synsets
~210,000 relations
TopX – YAGO Ontology
Wikipedia and WordNet
hierarchical and non-hierarchical relations
one relation, two directions
~2,100,000 words
~2,200,000 concepts
~6,000,000 relations
Similarity Measures
Dice similarity: the measure already used in TopX
NAGA similarity: the measure applied for YAGO
Best WordNet similarity: the measure with the best result among the WordNet measures
Dice Similarity Measure
measures the intersection of two regions
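\[ \mathrm{DICE}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|} \]

\[ \mathrm{DICE}(A,B) = \frac{2\,\mathrm{FREQ}(A,B)}{\mathrm{FREQ}(A) + \mathrm{FREQ}(B)} \]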
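The Dice coefficient relates the overlap of two regions to their combined size: 2·|A ∩ B| / (|A| + |B|), or in frequency form 2·FREQ(A,B) / (FREQ(A) + FREQ(B)). The following is a minimal sketch of both forms, not the TopX implementation. The function names and the choice to return 0 when both terms never occur are assumptions made for illustration.

```python
def dice_from_counts(freq_a: int, freq_b: int, freq_ab: int) -> float:
    """Dice coefficient from occurrence counts.

    freq_a, freq_b: how often A and B occur on their own
    freq_ab:        how often A and B occur together
    """
    total = freq_a + freq_b
    if total == 0:
        # Assumption: undefined case (neither term occurs) is mapped to 0.
        return 0.0
    return 2.0 * freq_ab / total


def dice_from_sets(a: set, b: set) -> float:
    """Dice coefficient of two sets, e.g. the documents containing A and B."""
    return dice_from_counts(len(a), len(b), len(a & b))


if __name__ == "__main__":
    # Hypothetical counts: A occurs 10 times, B 30 times, both together 5 times.
    print(dice_from_counts(10, 30, 5))
    print(dice_from_sets({1, 2, 3}, {2, 3, 4}))
```

The value is symmetric in A and B and lies between 0 (no co-occurrence) and 1 (A and B always occur together).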
NAGA Similarity Measure
combination of the confidence of a relation and the informativeness of a relation
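\[ \mathrm{NAGA}(A,B) = \beta \cdot \mathrm{conf}(A,B) + (1 - \beta) \cdot \mathrm{inf}(A,B) \]

\[ \mathrm{conf}(A,B) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{acc}(A,B,w_i) \cdot \mathrm{trust}(w_i) \]

\[ \mathrm{inf}(A,B) = \frac{\mathrm{FREQ}(A,B)}{\mathrm{FREQ}(A)} \]

\[ \mathrm{inf}(B,A) = \frac{\mathrm{FREQ}(A,B)}{\mathrm{FREQ}(B)} \]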
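The NAGA measure combines the confidence of a relation with its informativeness. A minimal sketch of that combination follows, assuming a linear mix with a weight called `beta` here (the name and the default of 0.5 are assumptions), confidence as the mean of accuracy times trust over n witnesses, and informativeness as FREQ(A,B) divided by the frequency of the source term. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def confidence(witnesses: Sequence[Tuple[float, float]]) -> float:
    """Mean of acc * trust over all witnesses of the relation.

    Each witness is an (accuracy, trust) pair; an empty list yields 0
    (an assumption for the undefined case).
    """
    if not witnesses:
        return 0.0
    return sum(acc * trust for acc, trust in witnesses) / len(witnesses)


def informativeness(freq_ab: int, freq_source: int) -> float:
    """FREQ(A,B) / FREQ(source term).

    Pass FREQ(A) for inf(A,B) and FREQ(B) for inf(B,A).
    """
    if freq_source == 0:
        return 0.0
    return freq_ab / freq_source


def naga(conf: float, inf: float, beta: float = 0.5) -> float:
    """Weighted combination of confidence and informativeness."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * conf + (1.0 - beta) * inf


if __name__ == "__main__":
    # Hypothetical inputs: two witnesses, A occurs 20 times, A and B together 5 times.
    c = confidence([(0.9, 1.0), (0.7, 0.5)])
    i = informativeness(5, 20)
    print(naga(c, i))
```

Because inf(A,B) and inf(B,A) divide by different frequencies, the two directions of a relation generally receive different scores.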
Best WordNet Similarity Measure
product of the transfer function of the path length and the transfer function of the concept depth
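\[ \mathrm{WordNet}(A,B) = f_1(l) \cdot f_2(h) \]

\[ f_1(l) = e^{-\alpha l} \]

\[ f_2(h) = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \]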
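The best WordNet measure multiplies a transfer function of the path length l by a transfer function of the concept depth h. The sketch below assumes the common form f1(l) = exp(-alpha·l) and f2(h) = tanh(beta·h), the latter being identical to (e^{beta·h} - e^{-beta·h}) / (e^{beta·h} + e^{-beta·h}). The parameter names `alpha` and `beta` and their default values are illustrative assumptions, not values taken from the slides.

```python
import math


def path_transfer(length: float, alpha: float = 0.2) -> float:
    """f1(l): decays from 1 towards 0 as the path between the concepts grows."""
    return math.exp(-alpha * length)


def depth_transfer(depth: float, beta: float = 0.6) -> float:
    """f2(h): grows from 0 towards 1 with the depth of the common ancestor.

    math.tanh(x) equals (e^x - e^-x) / (e^x + e^-x).
    """
    return math.tanh(beta * depth)


def wordnet_similarity(length: float, depth: float,
                       alpha: float = 0.2, beta: float = 0.6) -> float:
    """Product of the two transfer functions."""
    return path_transfer(length, alpha) * depth_transfer(depth, beta)


if __name__ == "__main__":
    # Hypothetical concept pair: path length 2, common ancestor at depth 3.
    print(wordnet_similarity(2, 3))
```

Shorter paths and deeper common ancestors both push the value towards 1, so closely related, specific concepts score highest.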
Evaluation
[Figure: "All Relation Types". Chart of the amount in % (axis from 0% to 90%) of similarity values per value range (0, < 0.01, < 0.025, < 0.1, < 0.25, < 0.5, < 1). Series: Dice (WordNet ontology), Dice, NAGA backward, NAGA forward, Best WordNet.]
Evaluation
Dice measure: also applicable to the YAGO ontology
NAGA measure: applicable if the forward direction is omitted
Best WordNet measure: not applicable due to the density of YAGO
Changes for TopX
tuning of some procedures
  Dijkstra algorithm
  word sense disambiguation
  query expansion
extension of the configuration file
Conclusion
larger knowledge base
more flexibility
increased complexity
further measure for the similarity computation: NAGA similarity
Questions?