01/06/15Sergey Chernov 1 Extracting Semantic Relationships between Wikipedia Categories By Sergey...

Post on 18-Dec-2015

April 18, 2023Sergey Chernov

Extracting Semantic Relationships between Wikipedia Categories

By Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, Xuan Zhou, Michal Kopycki, Przemyslaw Rys

Preliminaries

WIKIPEDIA: largest knowledge sharing system

Many pages assigned to CATEGORIES

All links are NAVIGATIONAL

Can we extract SEMANTIC links?

MOTIVATION

Wikipedia Categories Example

MOTIVATION

Possible benefits

Semi-structured queries

“find Countries which had Democratic Non-Violent Revolutions”

rephrased as

“find a page from category Countries which is connected to some page in category Non-Violent Revolutions”

Hints for authors

“you are editing a page from category Countries; do you want to add a link to a page in category Capitals?”

Raw data for manual semantic markup
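The rephrased semi-structured query above can be read as a lookup over the page link graph. The following is a minimal sketch, not the authors' implementation: the function name `pages_connected_to`, the placeholder page names, and the choice to follow only outgoing links are all assumptions made for illustration.

```python
# Hypothetical category membership; page names are placeholders, not real data.
category_pages = {
    "Countries": {"Country A", "Country B", "Country C"},
    "Non-Violent Revolutions": {"Revolution X", "Revolution Y"},
}

# Hypothetical outgoing links of each page (assumption: outlinks only).
links = {
    "Country A": {"Revolution X"},
    "Country B": {"Some Other Page"},
    "Country C": {"Revolution X", "Revolution Y"},
}


def pages_connected_to(source_category, target_category, category_pages, links):
    """Return pages of source_category that link to any page of target_category."""
    targets = category_pages[target_category]
    return sorted(
        page
        for page in category_pages[source_category]
        if links.get(page, set()) & targets  # non-empty intersection means connected
    )


result = pages_connected_to(
    "Countries", "Non-Violent Revolutions", category_pages, links
)
print(result)
```

With this placeholder data, only the pages that have at least one link into the target category are returned.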

MOTIVATION

Heuristics

Experiments

Countries: Denmark, Austria, Germany, France

Capitals: Berlin, Stockholm, Vienna, Paris

Number of Links

NL = 3

Connectivity Ratio

CR = 3/4 = 0.75
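The two heuristics in the Countries/Capitals example can be sketched in a few lines. This is an illustration under stated assumptions: the slide's diagram edges are not preserved in the transcript, so the links below (Germany to Berlin, Austria to Vienna, France to Paris, none from Denmark) are assumed because they reproduce NL = 3, and Connectivity Ratio is assumed to be the number of links divided by the number of pages in the source category, which matches 3/4 on the slide. The function names are hypothetical.

```python
countries = ["Denmark", "Austria", "Germany", "France"]
capitals = {"Berlin", "Stockholm", "Vienna", "Paris"}

# Assumed links from country pages to capital pages (not given in the transcript).
links = {
    "Germany": {"Berlin"},
    "Austria": {"Vienna"},
    "France": {"Paris"},
    "Denmark": set(),
}


def number_of_links(source_pages, target_pages, links):
    """Count links going from pages of one category to pages of another."""
    return sum(len(links.get(page, set()) & target_pages) for page in source_pages)


def connectivity_ratio(source_pages, target_pages, links):
    """Assumed definition: number of links normalized by source category size."""
    return number_of_links(source_pages, target_pages, links) / len(source_pages)


nl = number_of_links(countries, capitals, links)
cr = connectivity_ratio(countries, capitals, links)
print(nl, cr)
```

Normalizing by category size keeps a large category from dominating the ranking only because it contains many pages.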

Dataset

INEX 2006 collection

Sample category rankings

Experiments

Manual assessment methodology

Semantic Connection Strength (SCS) Measure: 2 = strong semantic relationship, 1 = average semantic relationship, 0 = weak or no semantic relationship.

Instruction for Assessors

“category A is strongly related to category B (value 2) if you believe that every page in A should conceptually have at least one semantic link to B;”

“A and B are averagely related (value 1), if you believe 50% of pages in A should have semantic links to B;”

“otherwise, A and B are weakly related (value 0).”
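Scores on the 0 to 2 scale above can be averaged over a set of assessed category pairs. The sketch below is only an illustration: the function name `average_scs` and the example scores are made up and are not results from the experiments.

```python
# Labels of the SCS scale as defined above.
SCS_LABELS = {2: "strong", 1: "average", 0: "weak or none"}


def average_scs(scores):
    """Average a list of SCS values, rejecting anything outside the 0-2 scale."""
    if not scores:
        raise ValueError("at least one score is required")
    for score in scores:
        if score not in SCS_LABELS:
            raise ValueError(f"invalid SCS value: {score!r}")
    return sum(scores) / len(scores)


# Made-up assessor scores for four category pairs (illustration only).
example_scores = [2, 2, 1, 0]
avg = average_scs(example_scores)
print(avg)
```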

Experiments with Number of Links

Average semantic connection strength for 100 sample categories, extracted using Number of Links.

Experiments

Experiments with Connectivity Ratio

Average semantic connection strength for 100 sample categories, extracted using Connectivity Ratio.

Experiments

General Results and Conclusions

Results are skewed toward the Countries category

Connectivity Ratio is a better measure than Number of Links

Inlinks perform better than outlinks

Summary

Future Steps

More manual exploration, look for additional heuristics

Consider more categories

SCS composed of

Is this a “part of” relation? W1

Is this an “is a” relation? W2

Is this a “synonym” relation? W3

Is this an “antonym” relation? W4

Is it related in a different way? Which one? W5

Summary

Thank You!