Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Institute for Web Science & Technologies – WeST Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations Christian Hachenberg and Thomas Gottron Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012 Sunday, 11 November 2012

Transcript of Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Page 1: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Institute for Web Science & Technologies – WeST

Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Christian Hachenberg and Thomas Gottron

Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012

Sunday, 11 November 2012

Page 2: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Thomas Gottron, WoLE Workshop 2012, Finding Good URLs

Mapping Documents to Entities

dbpedia.org:Rob_Roy_(film)

Page 3: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Mapping Entities to Documents

dbpedia.org:Rob_Roy_(film)

Align entities in KB with public documents

• Publish knowledge base
• Propagate changes
• Human readable representation

Page 4: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Task Definition

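Example knowledge base graph:

• dbpedia:Harrison_Ford (label: Harrison Ford, type: actor)
• dbpedia:Star_Wars_Episode_IV:_A_New_Hope (label: Star Wars IV: A New Hope, type: movie)
• dbpedia:George_Lucas (label: George Lucas, type: director)
• property: starring (connects Star Wars IV: A New Hope and Harrison Ford)
• property: directs (connects George Lucas and Star Wars IV: A New Hope)

3 types of information:

• Labels
• Link structure
• Types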

???

Page 5: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Label Search (using Web Search Engine)

Figure: the example graph from the Task Definition slide, with three candidate web documents (marked SW4) found for the label "Star Wars IV: A New Hope".

Implementation:

• Bing
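The label search step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `search` callable is a hypothetical stand-in for a web search engine client (the slide names Bing, but no real search API is called here), and the `.example` URLs are made up.

```python
def label_search(labels, search, top_k=3):
    """Query a search function with each entity label; keep the top_k URLs.

    labels: dict mapping entity URI -> human-readable label
    search: hypothetical callable, query string -> ranked list of URLs
    """
    results = {}
    for entity, label in labels.items():
        results[entity] = list(search(label))[:top_k]
    return results


# Stand-in for a real search engine (assumption, for illustration only).
FAKE_INDEX = {
    "Star Wars IV: A New Hope": [
        "http://movies.example/sw4",
        "http://fan.example/sw4",
    ],
}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


labels = {
    "dbpedia:Star_Wars_Episode_IV:_A_New_Hope": "Star Wars IV: A New Hope",
    "dbpedia:George_Lucas": "George Lucas",
}
candidates = label_search(labels, fake_search)
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine.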

Page 6: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Exploiting Link Structure

Figure: the example graph from the Task Definition slide, with candidate web documents marked GL, SW4 and HF for the three entities.

Implementation:

• In-degree
• PageRank
• HITS

+ Variations: Topic, Focussed
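The three link analysis scores listed above can be sketched on a toy graph. This is a minimal sketch under assumptions: the graph is given as a dict from a node to the list of nodes it links to, the node names `a`, `b`, `c` are hypothetical, and the Topic and Focussed variations are not covered. How the link graph over candidate documents was built is not stated on the slide.

```python
from math import sqrt


def all_nodes(graph):
    """Collect every node that appears as a source or a target."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    return nodes


def in_degree(graph):
    """Number of incoming links per node."""
    scores = {node: 0 for node in all_nodes(graph)}
    for targets in graph.values():
        for target in targets:
            scores[target] += 1
    return scores


def pagerank(graph, damping=0.85, iterations=50):
    """PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = all_nodes(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            targets = graph.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def hits(graph, iterations=50):
    """HITS: returns (authority, hub) scores, L2-normalised each round."""
    nodes = all_nodes(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes linking in.
        auth = {v: 0.0 for v in nodes}
        for source in nodes:
            for target in graph.get(source, []):
                auth[target] += hub[source]
        norm = sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub: sum of authority scores of the nodes linked to.
        hub = {v: sum(auth[t] for t in graph.get(v, [])) for v in nodes}
        norm = sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return auth, hub


# Toy graph (hypothetical): a and b both link to c, c links back to a.
toy_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
degrees = in_degree(toy_graph)
ranks = pagerank(toy_graph)
authorities, hubs = hits(toy_graph)
```

On this toy graph all three scores put `c` first, since it is the only node with two incoming links.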

Page 7: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Type Filtering

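Entities of the same type:

• dbpedia:Star_Wars_Episode_IV:_A_New_Hope (label: Star Wars IV: A New Hope, type: movie)
• dbpedia:Gran_Torino_(film) (label: Gran Torino, type: movie)
• dbpedia:Rob_Roy_(film) (label: Rob Roy, type: movie)

Figure: candidate web documents marked SW4, RR and GT for the three movie entities.

Implementation:

• Borda Count for domain ranking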
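One way to read "Borda Count for domain ranking" is sketched below. The details are assumptions, not taken from the slides: each entity of a given type contributes a ranked URL list, every web domain earns Borda points from its best position in each list, and the aggregated domain ranking is then used to re-rank the candidates of a single entity. The `.example` hostnames and the helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Host part of a URL, lower-cased."""
    return urlparse(url).netloc.lower()


def borda_domain_ranking(rankings):
    """Aggregate per-entity URL rankings into one ranking of domains.

    rankings: list of ranked URL lists, one per entity of the same type.
    In a list with n distinct domains, the top domain earns n points,
    the next n - 1, and so on down to 1.
    """
    points = defaultdict(int)
    for ranked_urls in rankings:
        domains = []
        for url in ranked_urls:
            domain = domain_of(url)
            if domain not in domains:  # keep only the best position
                domains.append(domain)
        n = len(domains)
        for position, domain in enumerate(domains):
            points[domain] += n - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(points, key=lambda d: (-points[d], d))


def rerank_by_domain(ranked_urls, domain_ranking):
    """Stable re-sort of candidate URLs by the rank of their domain."""
    position = {d: i for i, d in enumerate(domain_ranking)}
    return sorted(ranked_urls, key=lambda u: position.get(domain_of(u), len(position)))


# Hypothetical candidate lists for three movie entities.
sw4 = ["http://fan.example/sw4", "http://movies.example/sw4", "http://wiki.example/sw4"]
rr = ["http://movies.example/rr", "http://wiki.example/rr"]
gt = ["http://movies.example/gt", "http://news.example/gt"]

movie_domains = borda_domain_ranking([sw4, rr, gt])
sw4_filtered = rerank_by_domain(sw4, movie_domains)
```

In this toy data `movies.example` ranks well for all three movies, so it collects the most points and its page moves to the top of the Star Wars list.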

Page 8: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Experimental Setup

• 100 entities
• 4 domains (cities, companies, persons, movies)
• Stratified by little, medium and large representation on the web
• Complete network of linked entities
• Application of label search and link structure approaches
• Type-filtering as post-process
• User evaluation (Cranfield setup, pooling)
• Graded relevance judgements
• High juror agreement (Krippendorff's Alpha > 0.67)

Page 9: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Evaluation Metrics

At which rank can I expect the first relevant result?

Average P@1 (Precision@1): How often can I expect the first result to be relevant?
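Both questions can be answered from per-entity relevance judgements, as in the sketch below. Two assumptions are made here: the graded judgements have already been reduced to relevant / not relevant, and the rank of the first relevant result is summarised by the mean reciprocal rank, a common choice that the surviving slide text does not name. The sample judgements are invented.

```python
def precision_at_1(relevance):
    """1.0 if the top-ranked result is relevant, else 0.0."""
    return 1.0 if relevance and relevance[0] else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, relevant in enumerate(relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Invented judgements: one ranked list of booleans per entity.
runs = [
    [True, False],          # first result relevant
    [False, True],          # first relevant result at rank 2
    [False, False, True],   # first relevant result at rank 3
]

average_p_at_1 = mean(precision_at_1(r) for r in runs)
mean_reciprocal_rank = mean(reciprocal_rank(r) for r in runs)
```

Here one of the three entities has a relevant top result, so the average P@1 is 1/3, and the mean reciprocal rank is (1 + 1/2 + 1/3) / 3.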

Page 10: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Evaluation: Results

Statistically significant, p = 0.05

Page 11: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Evaluation: Results (Domain, Stratum)

Page 12: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Evaluation: Results (Filtering)

Page 13: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Conclusions and Next Steps

Novel task: Mapping entities to public web URLs

– Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts)

– Best methods: Label Search and Focussed HITS

• Semantic Typing boosts all results

Next steps: Investigate domain-dependent performance of methods

Page 14: Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Thank you!

Contact: WeST – Institute for Web Science and Technologies

Universität Koblenz-Landau

[email protected]