Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

  • 8/18/2019 Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

    Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

    - Yuncheng Jiang, Xiaopei Zhang, Yong Tang, Ruihua Nie

    Presented by: Kushagra Sharma (286/CO/12), Lakshay Bansal (287/CO/12)

    Abstract

    In the past, several approaches to assess similarity by evaluating the knowledge modeled in an (or multiple) ontology (or ontologies) have been proposed.

    However, there are some limitations in the existing measures, such as relying on predefined ontologies and fitting non-dynamic domains.

    In this paper, some novel feature based similarity assessment methods have been proposed that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced in the previous methods.

    Introduction

    Definition: Semantic similarity is understood as the degree of taxonomic proximity between concepts (or terms, words).

    In other words, semantic similarity states how taxonomically near two concepts (or terms, words) are, because they share some aspects of their meaning.

    Technically, similarity measures assess a numerical score that quantifies this proximity as a function of the semantic evidence observed in one or several knowledge sources.

    Ontology based methods to estimate similarity

    Edge Counting Measures

    Information Content Measures

    Feature Based Measures

    Hybrid Measures

    Edge counting measures

    - Consist of taking into account the length of the path linking the concepts (or terms) and the position of the concepts (or terms) in a given dictionary (or taxonomy, ontology).

    The main advantage of edge counting measures is their simplicity.

    They only rely on the graph model of an input ontology, whose evaluation requires a low computational cost.

    Because of their simplicity, these approaches offer limited accuracy: ontologies model a large amount of taxonomical knowledge that is not considered during the evaluation of the minimum path.

    From another perspective, the main assumption of edge counting measures is that an edge represents the same semantic distance anywhere in the structure of the graph (or path), which is not true, as some sections of the graph may be finely classified and others only coarsely defined.
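The edge counting idea can be sketched in a few lines of Python. This is a minimal illustration, not a method from the paper: the toy taxonomy `IS_A` is invented, and turning the shortest path length into a score with 1 / (1 + length) is an assumed convention chosen only to keep the example short.

```python
from collections import deque

# Hypothetical toy taxonomy: child -> parent ("is-a") links, invented for illustration.
IS_A = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "bird": "animal",
    "animal": "entity",
}

def build_graph(is_a):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def path_length(graph, a, b):
    """Number of edges on the shortest path between a and b (BFS), or None."""
    if a not in graph or b not in graph:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def edge_similarity(graph, a, b):
    """Assumed distance-to-similarity conversion: 1 / (1 + path length)."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

GRAPH = build_graph(IS_A)
```

Every edge counts as one unit here, wherever it sits in the taxonomy, which is exactly the uniform-distance assumption criticised above.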

    DISADVANTAGES of ONTOLOGY BASED METHODS

    Clearly, the construction process of domain ontologies is time-consuming and error-prone, and maintaining these ontologies also requires a lot of effort from experts. Thus, the methods of ontology based similarity measures are limited in scope and scalability.

    With the emergence of social networks or instant messaging systems, a lot of (sets of) concepts or terms (proper nouns, brands, acronyms, new words, conversational words, technical terms and so on) are not included in WordNet and domain ontologies (in fact, Web users can publish whatever they want to share with the rest of the world by using Wikis, Blogs and online communities at present). Therefore, similarity measures that are based on these kinds of knowledge resources cannot be used in these tasks.

    These limitations are the motivation behind the new techniques presented in this paper, which infer semantic similarity from a new kind of source of information, i.e., a wide coverage online encyclopedia, namely Wikipedia.

    Feature based similarity

    Feature based approaches to similarity measures assess similarity between concepts as a function of their properties.

    Common features tend to increase similarity and non-common ones tend to diminish it.

    Admitting a function Ψ(c) that yields the set of features relevant to c, Tversky proposed the following similarity function:

    sim(a, b) = α·F(Ψ(a) ∩ Ψ(b)) − β·F(Ψ(a) \ Ψ(b)) − γ·F(Ψ(b) \ Ψ(a))

    where F is some function that reflects the salience of a set of features, Ψ(a) ∩ Ψ(b) is the intersection between those two sets of features, Ψ(a) \ Ψ(b) is the set obtained when eliminating the elements of Ψ(b) from the set of features of concept a, Ψ(a), and α, β and γ are parameters that provide for differences in focus on the different components.
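A small Python sketch of Tversky's contrast model may help make the roles of the three terms concrete. Everything numeric here is an assumption for illustration: the salience function is taken to be plain set cardinality, and the default weights `alpha`, `beta`, `gamma` are arbitrary.

```python
def tversky_contrast(features_a, features_b, alpha=1.0, beta=0.5, gamma=0.5, salience=len):
    """Contrast model: reward common features, penalise distinctive ones.

    salience defaults to set cardinality (an assumption); the weights
    alpha, beta, gamma are illustrative defaults, not values from the source.
    """
    a, b = set(features_a), set(features_b)
    return (alpha * salience(a & b)      # features shared by both concepts
            - beta * salience(a - b)     # features only concept a has
            - gamma * salience(b - a))   # features only concept b has
```

For example, with features {"wings", "feathers", "flies"} and {"wings", "flies", "fur"}, two features are shared and one is distinctive on each side, so the default weights give 2 - 0.5 - 0.5 = 1.0.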

    Rodriguez and Egenhofer (R&E) present an approach to computing semantic similarity. The similarity is computed as the weighted sum of similarities between synsets, features (e.g., meronyms, attributes, etc.) and neighbor concepts (those linked via semantic pointers) of evaluated terms:

    Simre(a,b) = w · Ssynsets(a,b) + u · Sfeatures(a,b) + v · Sneighborhoods(a,b)

    where the functions Ssynsets, Sfeatures, and Sneighborhoods are the similarity between synonym sets, features, and semantic neighborhoods of evaluated terms, and w, u, and v (w, u, v ...

    S represents the overlapping between the different features, computed as follows:
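The overlap formula itself did not survive in this transcript. The sketch below therefore uses the ratio form commonly attributed to Rodriguez and Egenhofer, |A ∩ B| / (|A ∩ B| + γ·|A \ B| + (1 − γ)·|B \ A|), as an assumption, and passes γ in as a plain parameter instead of deriving it from the terms' depths. The dictionary keys and the equal default weights are also hypothetical.

```python
def overlap(set_a, set_b, gamma=0.5):
    """Assumed R&E-style overlap between two feature sets.

    gamma weights the features unique to set_a against those unique to set_b;
    here it is a free parameter (an assumption for this sketch).
    """
    a, b = set(set_a), set(set_b)
    common = len(a & b)
    denom = common + gamma * len(a - b) + (1.0 - gamma) * len(b - a)
    return common / denom if denom else 0.0

def sim_re(a, b, w=1/3, u=1/3, v=1/3, gamma=0.5):
    """Weighted sum over synsets, features and neighborhoods.

    a and b are dicts with the hypothetical keys "synsets", "features",
    "neighborhoods"; the equal weights are illustrative only.
    """
    return (w * overlap(a["synsets"], b["synsets"], gamma)
            + u * overlap(a["features"], b["features"], gamma)
            + v * overlap(a["neighborhoods"], b["neighborhoods"], gamma))
```

With γ = 0.5 the overlap is symmetric; other values make the measure asymmetric, so sim_re(a, b) and sim_re(b, a) can differ.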

    A feature based function called X-similarity relies on the matching between synsets and term description sets. The term description sets contain words extracted by parsing term definitions. Two terms are similar if their synsets or description sets, or the synsets of the terms in their neighborhood (e.g., more specific and more general terms), are lexically similar. The similarity function is expressed as follows:

    The similarity for the semantic neighbors Sneighborhoods is calculated as follows:

    where i denotes relationship type
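The two X-similarity formulas are missing from this transcript. As a hedged sketch, the code below follows the formulation usually given for X-similarity: set overlap is a Jaccard ratio, the neighborhood score is the maximum overlap over relationship types i, and the final score is 1 when the synsets overlap at all, otherwise the larger of the neighborhood and description scores. That formulation, the dictionary keys and the sample data in the usage note are assumptions, not content recovered from the slide.

```python
def jaccard(set_a, set_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighborhood_similarity(neigh_a, neigh_b):
    """Maximum Jaccard overlap over relationship types i present in both terms.

    neigh_a and neigh_b map a relationship type (e.g. "hypernym") to a set of terms.
    """
    shared_types = set(neigh_a) & set(neigh_b)
    return max((jaccard(neigh_a[i], neigh_b[i]) for i in shared_types), default=0.0)

def x_similarity(a, b):
    """Assumed X-similarity: 1 if synsets overlap, else max of the other two scores."""
    if jaccard(a["synsets"], b["synsets"]) > 0:
        return 1.0
    return max(neighborhood_similarity(a["neighborhoods"], b["neighborhoods"]),
               jaccard(a["descriptions"], b["descriptions"]))
```

Two terms with disjoint synsets, description sets sharing two of four words, and a hypernym neighborhood sharing one of two terms would score 0.5 under this sketch.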

    Wikipedia's hyperlinks are also useful as an additional source of synonyms not captured by redirects. Hyperlinks also complement disambiguation pages by encoding polysemy. In particular, articles mentioning other encyclopedic entries point to them through internal hyperlinks. This models article cross-reference.

    Category structure: Since May 2

    Feature-based similarity using Wikipedia

    Formal representation of Wikipedia concepts

    Let A be a Wikipedia article and Con be the title of A. The formal representation of the Wikipedia concept Con is defined as follows:

    Con = ⟨Synonyms, Glosses, Anchors, Categories⟩

    where Synonyms = {Con, Con1, ..., Conm} is the set of synonyms of Con, Glosses is the first paragraph of text of A, Anchors = {Anc1, ..., Ancn} is the set of anchor texts (i.e., labels of internal hyperlinks) in A, and Categories = {Cat1, ..., Catk} is the set of categories of A.
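One way to hold this four-part representation in code is a small immutable record. The class name `WikiConcept` and its field names are hypothetical, chosen for this sketch, and the sample values in the usage note are invented, not taken from a real Wikipedia article.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class WikiConcept:
    """Sketch of the tuple (Synonyms, Glosses, Anchors, Categories) for one article title."""
    title: str                                            # Con, the article title
    synonyms: frozenset = field(default_factory=frozenset)
    glosses: str = ""                                     # first paragraph of the article
    anchors: frozenset = field(default_factory=frozenset)
    categories: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # The definition lists Con itself among its synonyms, so add the title.
        object.__setattr__(self, "synonyms", frozenset(self.synonyms) | {self.title})
        # Normalise any iterable passed in to an immutable set.
        object.__setattr__(self, "anchors", frozenset(self.anchors))
        object.__setattr__(self, "categories", frozenset(self.categories))
```

A concept could then be built as `WikiConcept(title="Car", synonyms={"Automobile"}, glosses="A car is a wheeled motor vehicle.", anchors={"motor vehicle", "wheel"}, categories={"Cars"})`, after which `"Car" in concept.synonyms` holds by construction.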

    A framework for feature-based similarity

    Let Con1 = ⟨Synonyms1, Glosses1, Anchors1, Categories1⟩ and Con2 = ⟨Synonyms2, Glosses2, Anchors2, Categories2⟩ be two Wikipedia concepts. The similarity of Con1 and Con2, denoted as SimCon(Con1, Con2), is the function

    SimCon: WikiCon × WikiCon →

    MATHEMATICAL MODELLING OF FEATURE BASED ASSESSMENT

    We can obtain different feature based approaches to similarity assessment resulting from instantiations of the framework.

    Without loss of generality, we assume that there are two sets of terms (or words, concepts), Set1 and Set2. Obviously, these two sets may be Synonyms, Setglosses, Anchors, or Categories.

    According to the X-similarity approach or the R&E approach (Rodriguez & Egenhofer), we have the following similarity computation methods for Set1 and Set2:

    where

    Given two Wikipedia concepts

    Con1 = ⟨Synonyms1, Setglosses1, Anchors1, Categories1⟩ and

    Con2 = ⟨Synonyms2, Setglosses2, Anchors2, Categories2⟩

    According to the notions of SXsim and SR&E, we have the following approaches to similarity measures for Wikipedia concepts (suppose that the function Sconcepts is the average or max):
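The concrete measures did not survive in this transcript, so the following is only a sketch of the general shape: compute a set similarity for each of the four components and combine the four scores with an average or a max. The Jaccard overlap used as the default set similarity, the dictionary keys and the function name `sim_con` are assumptions; this is not a reproduction of the paper's individual measures.

```python
def jaccard(set_a, set_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

COMPONENTS = ("synonyms", "setglosses", "anchors", "categories")

def sim_con(con1, con2, set_similarity=jaccard, combine="average"):
    """Combine per-component set similarities of two concepts.

    con1 and con2 are dicts mapping each name in COMPONENTS to a set of terms
    (hypothetical layout). combine is "average" or "max".
    """
    scores = [set_similarity(con1[name], con2[name]) for name in COMPONENTS]
    if combine == "average":
        return sum(scores) / len(scores)
    if combine == "max":
        return max(scores)
    raise ValueError("combine must be 'average' or 'max'")
```

Passing a different `set_similarity` (for instance an R&E-style overlap) swaps the per-set measure without touching the combination step, which is the sense in which the framework can be instantiated in several ways.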

    Comparison of various Approaches to Human based Judgements

    Results on correlation with human judgements of similarity measures.

    (From left to right: measure approach, correlation for M&C benchmark, correlation for R&G benchmark, and correlation for 353-TC benchmark)

    Benchmark and Experimental results (based on students' and professors' judgements)

    Results on correlation with our benchmark of similarity measures.

    Analysis of Experimental Results

    The approaches SimFirCon, SimSecCon, SimThiCon, and SimFouCon perform relatively well, with the lowest correlation being

    Conclusion

    The final goal of computerized similarity measures is to accurately mimic human judgements about semantic similarity.

    In this paper, some limitations of the existing feature based measures are identified, such as relying on a (or multiple) predefined domain ontology (or ontologies) and fitting static domains (i.e., non-dynamic domains).

    To implement feature based semantic similarity measurement using Wikipedia, a formal representation of Wikipedia concepts is presented. Then, a framework for feature based similarity based on the formal representation of Wikipedia concepts is given.

    The evaluation, based on several widely used benchmarks and a benchmark developed in this paper, sustains the intuitions with respect to human judgements.

    Overall, several methods presented here have good human correlation and constitute some effective ways of determining similarity between Wikipedia concepts. In addition, considering the limitations (e.g., small size) of the existing standard benchmarks for concept similarity assessment, we will pursue the design of a new benchmark specially focused on Wikipedia concepts.