Extending DBpedia (LOD) using WikiTables

Page 1: Extending DBpedia (LOD) using WikiTables

Extending DBpedia (LOD) using WikiTables

Emir Muñoz

Unit for Reasoning and Querying

[email protected]

Page 2: Extending DBpedia (LOD) using WikiTables

Linked Open Data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

October 12, 2012 -- E. Muñoz

Page 4: Extending DBpedia (LOD) using WikiTables

Linked Open Data

• DBpedia, an export of Wikipedia’s structured data

DBpedia provides an RDF version of all of Wikipedia's structured data (infoboxes)

But not yet a version of the regular Wikipedia tables (wikitables)

Page 5: Extending DBpedia (LOD) using WikiTables

Tables as a source of LOD

http://en.wikipedia.org/wiki/Dublin

Caption as another row

Column headers represent types of information

The values represent instances of those types

http://en.wikipedia.org/wiki/Galway

Infoboxes (attribute-value pairs)

Tables are inherently concise as well as information-rich

Page 6: Extending DBpedia (LOD) using WikiTables

Reasoning over Wikipedia Tables

http://en.wikipedia.org/wiki/Dublin

Recovering Table Semantics …

Dublin is twinned with the following places:

Page 7: Extending DBpedia (LOD) using WikiTables

Reasoning over Wikipedia Tables

http://en.wikipedia.org/wiki/Dublin

Entity annotation for cells, mappings to DBpedia resources

Column headers:
dbpedia.org/property/city
dbpedia.org/property/nation
dbpedia.org/property/since

City cells:
dbpedia.org/resource/San_Jose,_California
dbpedia.org/resource/Liverpool
dbpedia.org/resource/Matsue,_Shimane
dbpedia.org/resource/Barcelona
dbpedia.org/resource/Beijing

Nation cells:
dbpedia.org/resource/United_States
dbpedia.org/resource/United_Kingdom
dbpedia.org/resource/Japan
dbpedia.org/resource/Spain
dbpedia.org/resource/People's_Republic_of_China

Since cells:
(xsd:integer)
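
The mapping from a linked cell to a DBpedia resource can be sketched as follows. This is only an illustration, not the method used in the talk: it assumes that a DBpedia resource URI can be derived from the title part of an English Wikipedia URL, and the function name `wikipedia_to_dbpedia` is hypothetical. It ignores redirects and disambiguation pages.

```python
from urllib.parse import unquote, urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> str:
    """Map a Wikipedia article URL to a candidate DBpedia resource URI.

    Assumption: the resource name equals the article title taken from the URL.
    """
    path = urlsplit(url).path  # e.g. "/wiki/People%27s_Republic_of_China"
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    title = unquote(path[len(prefix):])  # decode percent-escapes such as %27
    return DBPEDIA_RESOURCE + title.replace(" ", "_")


dublin = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Dublin")
china = wikipedia_to_dbpedia(
    "http://en.wikipedia.org/wiki/People%27s_Republic_of_China"
)
```

A real pipeline would still need to check that the candidate URI exists in DBpedia.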

Page 8: Extending DBpedia (LOD) using WikiTables

Reasoning over Wikipedia Tables

http://en.wikipedia.org/wiki/Dublin

Extracting relations

Relations found between the annotated city and nation cells:

dbpedia.org/ontology/country
dbpedia.org/property/subdivisionName
is dbpedia.org/ontology/country of

Page 9: Extending DBpedia (LOD) using WikiTables

Reasoning over Wikipedia Tables

• <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .

• <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .

• <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .

• <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .

• <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .

• <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .

• <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .

• <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .

• <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .

• <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
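
The triples above pair each city with its nation under every candidate relation. A minimal sketch of how such lines could be generated is shown here; the helper `row_triples` and the hard-coded row list are assumptions made for illustration, not the talk's implementation.

```python
RESOURCE = "http://dbpedia.org/resource/"

# Candidate relations between the city and nation columns (from the slides).
PREDICATES = [
    "http://dbpedia.org/property/subdivisionName",
    "http://dbpedia.org/ontology/country",
]

# (city, nation) resource names, one pair per table row (from the slides).
ROWS = [
    ("San_Jose,_California", "United_States"),
    ("Liverpool", "United_Kingdom"),
    ("Matsue,_Shimane", "Japan"),
    ("Barcelona", "Spain"),
    ("Beijing", "People's_Republic_of_China"),
]


def row_triples(subject: str, obj: str, predicates: list[str]) -> list[str]:
    """Build one triple line per predicate for a (subject, object) pair."""
    return [
        f"<{RESOURCE}{subject}> <{p}> <{RESOURCE}{obj}> ."
        for p in predicates
    ]


triples = [t for s, o in ROWS for t in row_triples(s, o, PREDICATES)]

for line in triples:
    print(line)
```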

Page 11: Extending DBpedia (LOD) using WikiTables

Reasoning over Wikipedia Tables

• Let’s analyze these cases …

• Liverpool

• Matsue

• Beijing

Page 12: Extending DBpedia (LOD) using WikiTables

Not that simple…

• Web tables usually don’t have explicit semantics by themselves.

• Main issues:

– Complex tables with spans

– Captions inside the table as another row

– Not well-formed tables (i.e., not a matrix)

– We need filters (e.g., at least 2 columns and 2 rows)

• We extract relations at row level, and also between the page's main entity and the resources in the table
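
The filters mentioned above could look like the following sketch. It assumes a table has already been parsed into a list of rows, each a list of cell strings, with spans expanded; the function name `passes_filter` and the default thresholds are illustrative, taken from the "min 2 columns, 2 rows" example.

```python
def passes_filter(table: list[list[str]], min_rows: int = 2, min_cols: int = 2) -> bool:
    """Return True if the table is a matrix of at least min_rows x min_cols."""
    if len(table) < min_rows:
        return False
    width = len(table[0])
    if width < min_cols:
        return False
    # Well-formed means every row has the same number of cells.
    return all(len(row) == width for row in table)


good = [["City", "Nation"], ["Liverpool", "United Kingdom"]]
ragged = [["City", "Nation"], ["Liverpool"]]
single_row = [["City", "Nation"]]
single_col = [["City"], ["Liverpool"]]
```

Captions stored as an extra row would show up here as ragged rows unless they are detected and removed beforehand.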

Page 13: Extending DBpedia (LOD) using WikiTables

Parsing: Extracting Tables

First step: parsing Wiki format

http://en.wikipedia.org/wiki/People%27s_Republic_of_China

Caption as another row

Table split

Rowspans with pictures

Page 14: Extending DBpedia (LOD) using WikiTables

Parsing: Extracting Tables

• Problems with parsing the cell’s content

http://en.wikipedia.org/wiki/Danny_Kaye

Page 16: Extending DBpedia (LOD) using WikiTables

Parsing: Extracting Tables

Same page link

Many different formats

Anchor text vs. content text

http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s
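
To illustrate the anchor-text and same-page-link cases, here is a small sketch that extracts internal links from the wiki markup of a cell. It assumes the standard `[[Target|anchor]]` link syntax and handles nothing else (no templates, file links, or nested markup); the function name `parse_links` and the sample cells are hypothetical.

```python
import re

# [[Target]] or [[Target|anchor text]]; targets starting with "#" stay on the same page.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def parse_links(cell: str) -> list[dict]:
    """Return target, anchor text, and a same-page flag for each link in a cell."""
    links = []
    for match in WIKILINK.finditer(cell):
        target = match.group(1).strip()
        anchor = match.group(2)
        # Without a pipe, the anchor text is the target itself.
        anchor = anchor.strip() if anchor is not None else target
        links.append(
            {"target": target, "anchor": anchor, "same_page": target.startswith("#")}
        )
    return links


piped = parse_links("[[Batman: The Animated Series|Batman]] (1992)")
plain = parse_links("[[The Simpsons]]")
local = parse_links("[[#1995|see below]]")
```

The target, not the anchor text, is what should be mapped to a DBpedia resource, while same-page links carry no new entity.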

Page 17: Extending DBpedia (LOD) using WikiTables

Extracting Relations

A table containing tables

http://en.wikipedia.org/wiki/AFC_Ajax

Page 18: Extending DBpedia (LOD) using WikiTables

Extracting Relations

• Also relations between the main entity and the entities in the table

dbpedia.org/resource/AFC_Ajax

http://en.wikipedia.org/wiki/AFC_Ajax

16 players

freq relationship
14 dbpedia.org/ontology/team
14 dbpedia.org/property/clubs
11 dbpedia.org/property/currentclub
3 dbpedia.org/property/youthclubs

In his DBpedia page there is no mention of AFC Ajax

Page 19: Extending DBpedia (LOD) using WikiTables

dbpedia.org/resource/Christian_Eriksen

Disambiguation page: dbpedia.org/resource/Ajax

http://en.wikipedia.org/wiki/AFC_Ajax

Page 20: Extending DBpedia (LOD) using WikiTables

Our Dataset

• enwiki dump from 2012-09-03 02:17:37

• 8.6 GB of Wikipedia pages that comprise

– 10,531,986 documents (HTML pages)

– Only 413,256 HTML pages contain tables

– 2,989,098 tables

– 905,929 tables after the filter

• 27.7% of all tables

– 0.46 tables per page (or 2.15 discarding pages without tables)

Page 21: Extending DBpedia (LOD) using WikiTables

Methodology

Page 22: Extending DBpedia (LOD) using WikiTables

Ranking of Relationships

• The current ranking function is naïve:

$score = \frac{f_{rel}}{n_{rows}}$

where $f_{rel}$ is the frequency of the relation and $n_{rows}$ is the number of rows in the table.

http://en.wikipedia.org/wiki/AFC_Ajax

16 players

freq relationship score
14 dbpedia.org/ontology/team 0.875
14 dbpedia.org/property/clubs 0.875
11 dbpedia.org/property/currentclub 0.6875
3 dbpedia.org/property/youthclubs 0.1875
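
The naive score divides the frequency of a relation by the number of rows, which reproduces the AFC Ajax figures (14/16, 11/16, 3/16). A minimal sketch, with the hypothetical function name `rank_relations`:

```python
def rank_relations(freqs: dict[str, int], n_rows: int) -> list[tuple[str, float]]:
    """Score each relation as frequency / number of rows, highest first."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    scored = [(rel, freq / n_rows) for rel, freq in freqs.items()]
    # sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ajax_freqs = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}
ranking = rank_relations(ajax_freqs, n_rows=16)

for relation, score in ranking:
    print(f"{score:.4f} {relation}")
```

Nothing in this function bounds the result: whenever a relation is counted more often than there are rows, the score exceeds 1.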

Page 23: Extending DBpedia (LOD) using WikiTables

Ranking of Relationships

• In these cases the function does not work well, and score ∉ [0, 1]

http://en.wikipedia.org/wiki/Danny_Kaye

Page 24: Extending DBpedia (LOD) using WikiTables

Ongoing Work and Challenges

• Improve the ranking function for relations.

• Store the 5.5M DBpedia (transitive) redirects locally to reduce lookup time.

• Statistical analysis of Wikipedia tables

– Number of columns, rows

– Headers, Captions

– External and internal links

• The next big challenge is the evaluation.
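
Resolving transitive redirects from a local store could be sketched as below. The in-memory dictionary, the function name `resolve_redirect`, and the sample redirect entries are all hypothetical; a real store of 5.5M entries would be loaded from the DBpedia redirects data rather than typed in.

```python
def resolve_redirect(resource: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow redirect links until a resource with no further redirect is reached."""
    seen = {resource}
    current = resource
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current  # no further redirect: this is the final resource
        if target in seen:
            return current  # cycle detected: stop instead of looping forever
        seen.add(target)
        current = target
    return current


# Hypothetical redirect entries, for illustration only.
redirects = {
    "AFC_Ajax_Amsterdam": "Ajax_Amsterdam",
    "Ajax_Amsterdam": "AFC_Ajax",
}

resolved = resolve_redirect("AFC_Ajax_Amsterdam", redirects)
unchanged = resolve_redirect("Dublin", redirects)
```

Precomputing the final target for every key once would turn each later lookup into a single dictionary access.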

Page 25: Extending DBpedia (LOD) using WikiTables

What’s next?

• Some ideas in mind:

– Use the extracted relations to classify WikiTables

– Define a similarity function for WikiTables

English Italian

Page 26: Extending DBpedia (LOD) using WikiTables

What’s next?

http://en.wikipedia.org/wiki/Electronegativity

What does this number mean?

There is no reference to those numbers here!

Page 27: Extending DBpedia (LOD) using WikiTables

What’s next?

http://en.wikipedia.org/wiki/Electronegativity

http://en.wikipedia.org/wiki/Chlorine

Chlorous acid is a chlorite

http://dbpedia.org/page/Chlorous_acid

Page 28: Extending DBpedia (LOD) using WikiTables

Open problems

• Handle multiple entities in the same cell

• Improve the ranking function

• Handle redirects before querying DBpedia

• How to evaluate the outcome

Thanks! Q & A

Thanks! Emir Muñoz

Unit for Reasoning and Querying

[email protected]