Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations
Institute for Web Science & Technologies – WeST
Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations
Christian Hachenberg and Thomas Gottron
Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012
Sunday, 11 November 2012
Thomas Gottron · WoLE Workshop 2012 · Finding Good URLs
Mapping Documents to Entities
dbpedia.org:Rob_Roy_(film)
Mapping Entities to Documents
dbpedia.org:Rob_Roy_(film)
Align entities in KB with public documents
• Publish knowledge base
• Propagate changes
• Human readable representation
Task Definition
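[Diagram: knowledge-base graph of three linked entities with their labels and types]
• dbpedia:Harrison_Ford, label "Harrison Ford", type: actor
• dbpedia:Star_Wars_Episode_IV:_A_New_Hope, label "Star Wars IV: A New Hope", type: movie
• dbpedia:George_Lucas, label "George Lucas", type: director
• Links between entities: property: starring, property: directs
3 types of information:
• Labels
• Link structure
• Types
[Web documents for the entities: ???]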
Label Search (using Web Search Engine)
[Diagram: the knowledge-base graph from the Task Definition slide (dbpedia:Harrison_Ford, dbpedia:Star_Wars_Episode_IV:_A_New_Hope, dbpedia:George_Lucas). The label "Star Wars IV: A New Hope" is used as a search query, yielding three candidate documents marked SW4.]
Implementation:
• Bing
Exploiting Link Structure
[Diagram: the knowledge-base graph from the Task Definition slide, with candidate documents marked GL, SW4 (three times) and HF for George Lucas, Star Wars IV and Harrison Ford.]
Implementation:
• In-degree
• PageRank
• HITS
+ Variations: Topic, Focussed
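The two simplest link-based scores listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the candidate pages and the hyperlinks among them have already been collected, and the page names, the `links` structure and the damping factor of 0.85 are hypothetical choices made for the example.

```python
from collections import defaultdict


def in_degree(links):
    """Count incoming links per page; `links` maps a page to the pages it links to."""
    counts = defaultdict(int)
    for source, targets in links.items():
        counts[source] += 0  # pages without incoming links still get an entry
        for target in targets:
            counts[target] += 1
    return dict(counts)


def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a small directed graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical candidate pages for HF, GL and SW4 and the links among them.
links = {
    "hf.example/bio": ["sw4.example/film"],
    "gl.example/bio": ["sw4.example/film"],
    "sw4.example/film": ["gl.example/bio"],
    "sw4.example/fanpage": ["sw4.example/film"],
}

degrees = in_degree(links)
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(degrees)
print(best)
```

In this toy graph the film page collects links from all other candidates, so both scores prefer it over the fan page as the document for the movie entity.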
Type Filtering
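[Diagram: three entities of type movie with their labels and candidate documents]
• dbpedia:Star_Wars_Episode_IV:_A_New_Hope, label "Star Wars IV: A New Hope", type: movie
• dbpedia:Gran_Torino_(film), label "Gran Torino", type: movie
• dbpedia:Rob_Roy_(film), label "Rob Roy", type: movie
• Candidate documents: SW4 (three times), RR, GT
Implementation:
• Borda Count for domain ranking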
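A Borda count over web domains could look roughly like the sketch below. The slide names only the technique; how the per-entity result lists are built and how the domain ranking is applied afterwards are assumptions here, and all URLs and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def borda_domain_ranking(result_lists):
    """Aggregate per-entity URL rankings into one ranking of domains.

    Each list votes for the domains it contains: the top domain of a list
    with n distinct domains gets n points, the next n - 1, and so on.
    """
    scores = defaultdict(int)
    for urls in result_lists:
        domains = []
        for url in urls:
            domain = urlparse(url).netloc
            if domain not in domains:
                domains.append(domain)
        n = len(domains)
        for position, domain in enumerate(domains):
            scores[domain] += n - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(urls, domain_ranking):
    """Reorder one entity's candidates by the aggregated domain ranking."""
    position = {domain: i for i, domain in enumerate(domain_ranking)}
    return sorted(urls, key=lambda u: position.get(urlparse(u).netloc, len(position)))


# Hypothetical ranked candidates for three entities of type movie.
sw4 = ["https://movies.example/sw4", "https://fans.example/starwars", "https://shop.example/sw4-dvd"]
rob_roy = ["https://news.example/rob-roy", "https://movies.example/rob-roy"]
gran_torino = ["https://movies.example/gran-torino", "https://shop.example/gt"]

domain_ranking = borda_domain_ranking([sw4, rob_roy, gran_torino])
rob_roy_reranked = rerank(rob_roy, domain_ranking)
print(domain_ranking)
print(rob_roy_reranked)
```

Because the hypothetical movie site ranks well for all three movies, it wins the vote, and the Rob Roy page on that site moves ahead of the news article that originally ranked first.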
Experimental Setup
• 100 entities
• 4 domains (cities, companies, persons, movies)
• Stratified by little, medium and large representation on the web
• Complete network of linked entities
• Application of label search and link structure approaches
• Type filtering as post-process
• User evaluation (Cranfield setup, pooling)
• Graded relevance judgements
• High juror agreement (Krippendorff's Alpha >0.67)
Evaluation Metrics
• At which rank can I expect the first relevant result?
• Average P@1 (Precision@1): How often can I expect the first result to be relevant?
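Both questions can be answered with a few lines of code. The sketch below assumes that the first question is measured by mean reciprocal rank (MRR), which the surviving slide text does not name, and that the graded judgements have been reduced to binary relevance; the `runs` data is made up for illustration.

```python
def reciprocal_rank(relevance):
    """Return 1/rank of the first relevant result, or 0.0 if there is none."""
    for rank, relevant in enumerate(relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over all entities."""
    return sum(reciprocal_rank(run) for run in runs) / len(runs)


def average_precision_at_1(runs):
    """Fraction of entities whose top-ranked result is relevant."""
    return sum(1.0 for run in runs if run and run[0]) / len(runs)


# One list per entity: binary relevance of the ranked results (hypothetical data).
runs = [
    [True, False],
    [False, True, False],
    [False, False, False],
    [True],
]

mrr = mean_reciprocal_rank(runs)
p_at_1 = average_precision_at_1(runs)
print(mrr, p_at_1)
```

With these four entities the reciprocal ranks are 1, 0.5, 0 and 1, giving an MRR of 0.625, while two of the four top results are relevant, giving an average P@1 of 0.5.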
Evaluation: Results
Statistically significant, p=0.05
Evaluation: Results (Domain, Stratum)
Evaluation: Results (Filtering)
Conclusions and Next Steps
Novel task: Mapping entities to public web URLs
– Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts)
– Best methods: Label Search and Focussed HITS
• Semantic Typing boosts all results
Next steps: Investigate domain-dependent performance of methods
Thank you!
Contact: WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau