Fact Extraction from Wikipedia
Cutting Long Stories Short
Fact Extraction from Wikipedia
Marco Fossati
Poznan, 25th June 2015
What?
A Google Summer of Code Project for DBpedia
What?
Teaching Machines to Read
Natural Language
Why?
Text Contains a Huge Amount of Knowledge
Why?
DBpedia Focuses on Semi-structured Data
Discovery of New Relations
Automatic Knowledge Base Population
How?
Machine Learning +
Lexical Semantics
How?
Poland victory World Cup 2014
“Poland won the World Cup in 2014”
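The example above pairs a sentence with the compact fact it should yield. A minimal sketch of that mapping follows, assuming a toy lexical-unit table and naive positional rules. The frame name `Victory` and the role names `Winner`, `Competition` and `Time` are hypothetical labels chosen for illustration, not taken from the project.

```python
from dataclasses import dataclass, field

# Hypothetical toy frame database: lexical unit -> frame name.
LEXICAL_UNITS = {"win": "Victory", "wins": "Victory", "won": "Victory"}


@dataclass
class Fact:
    frame: str
    elements: dict = field(default_factory=dict)


def extract_fact(tokens):
    """Map a tokenized sentence to a frame using naive positional rules."""
    for i, token in enumerate(tokens):
        frame = LEXICAL_UNITS.get(token.lower())
        if frame is None:
            continue
        fact = Fact(frame)
        # Assumption: everything before the trigger word is the winner.
        fact.elements["Winner"] = " ".join(tokens[:i])
        rest = tokens[i + 1:]
        # Assumption: a trailing "in <digits>" marks the time.
        if len(rest) >= 2 and rest[-2].lower() == "in" and rest[-1].isdigit():
            fact.elements["Time"] = rest[-1]
            rest = rest[:-2]
        # Drop a leading determiner from what is left.
        if rest and rest[0].lower() in {"the", "a", "an"}:
            rest = rest[1:]
        if rest:
            fact.elements["Competition"] = " ".join(rest)
        return fact
    return None  # no known lexical unit in the sentence


fact = extract_fact("Poland won the World Cup in 2014".split())
print(fact.frame, fact.elements)
```

Real text needs far more than positional rules, which is why the approach combines lexical resources with machine learning.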
Approach
1. Lexical Units
1.1. Extraction via POS Tagging
1.2. Statistical Ranking
2. Frame Database (FrameNet, Kicktionary)
The Data-driven Way
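Steps 1.1 and 1.2 can be sketched as follows. The slides do not say which tagger or ranking statistic is used, so this sketch assumes an already POS-tagged corpus with Penn-Treebank-style tags and ranks verbs by raw frequency. The sample sentences are English placeholders invented for readability.

```python
from collections import Counter

# Assumption: sentences arrive as (token, tag) pairs, and verb tags start with "V".
tagged_corpus = [
    [("Poland", "NNP"), ("won", "VBD"), ("the", "DT"), ("match", "NN")],
    [("Italy", "NNP"), ("lost", "VBD"), ("the", "DT"), ("final", "NN")],
    [("Spain", "NNP"), ("won", "VBD"), ("the", "DT"), ("cup", "NN")],
]


def candidate_lexical_units(corpus):
    """Step 1.1: collect verb tokens as candidate lexical units."""
    counts = Counter()
    for sentence in corpus:
        for token, tag in sentence:
            if tag.startswith("V"):
                counts[token.lower()] += 1
    return counts


def rank(counts, top_n=10):
    """Step 1.2: rank by raw frequency, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ranked = rank(candidate_lexical_units(tagged_corpus))
print(ranked)
```

A measure such as TF-IDF against a general-domain corpus could replace raw frequency to favor domain-specific verbs.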
Approach
3. Frame + Frame Elements Classification
Unsupervised, Rule-based
Supervised
4. Crowdsourced Training Set Construction
5. RDF Serialization
The Data-driven Way
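Step 5 turns each classified frame into RDF. A minimal sketch that writes N-Triples by hand is shown here. The `http://example.org/` namespace, the URI layout, and the frame and role names are all assumptions for illustration, not the project's actual vocabulary.

```python
BASE = "http://example.org/fact-extraction/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(fact_id, frame, elements):
    """Serialize one fact: a type triple plus one triple per frame element."""
    subject = f"<{BASE}fact/{fact_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}frame/{frame}> ."]
    for role, value in sorted(elements.items()):
        lines.append(f'{subject} <{BASE}role/{role}> "{escape_literal(value)}" .')
    return "\n".join(lines)


triples = to_ntriples(
    "1",
    "Victory",
    {"Winner": "Poland", "Competition": "World Cup", "Time": "2014"},
)
print(triples)
```

Plain string literals keep the sketch short. Linking values such as the winner to existing DBpedia resources would be the natural next step.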
Crowdsourcing the Annotation
Label words with Frame Elements
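Crowd workers can disagree, so their labels need to be merged before training. The slides do not describe how this is done. The sketch below assumes simple majority voting with an agreement threshold, and the role names are hypothetical.

```python
from collections import Counter


def aggregate(judgments, min_agreement=0.5):
    """Keep, per text span, the label chosen by more than `min_agreement` of workers."""
    result = {}
    for span, labels in judgments.items():
        if not labels:
            continue  # nobody labeled this span
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            result[span] = label
    return result


# Hypothetical judgments from three workers on one sentence.
judgments = {
    "Poland": ["Winner", "Winner", "Winner"],
    "World Cup": ["Competition", "Competition", "Prize"],
    "2014": ["Time", "Score", "Place"],
}

gold = aggregate(judgments)
print(gold)
```

Spans without a clear majority, such as "2014" here, are dropped instead of guessed, trading training-set size for label quality.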
Use Case
Soccer Domain
Widely Represented (223,000 articles)
Lots of Semi-structured Data
Italian Wikipedia
Wanna contribute?
https://github.com/dbpedia/fact-extractor