Introduction to Wikidata
Uploaded by andrew-gray
Wikidata summary
● Central data repository for Wikimedia projects
● Human- and machine-readable
● Human- and machine-editable
● Fully multilingual
● Supports semantic relationships
www.wikidata.org
Overall plan
● Phase I
– Centralise cross-language relationships
● Phase II
– Centralise core structured data
● Phase III
– Dynamic generation of list content
Phase I
● Centralising all “interwiki” cross-language links
– Historically, a major maintenance headache!
● Single conceptual entity => many articles
– ...some unexpected oddities arise; not all 1:1
● Almost all entities now listed
● Inclusion standards currently restricted
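The maintenance saving from centralising interwiki links can be sketched with simple arithmetic. This is an illustration under a simplifying assumption (every language version previously linked to every other one), and the item ID and titles below are made-up placeholders rather than real Wikidata content.

```python
def links_stored_per_article_model(n_languages: int) -> int:
    """Old model: each of n articles lists the other n - 1 language versions."""
    return n_languages * (n_languages - 1)


def links_stored_central_model(n_languages: int) -> int:
    """Central model: one item holds a single link per language version."""
    return n_languages


# Hypothetical central item: one conceptual entity, many articles.
item = {
    "id": "Q_EXAMPLE",  # placeholder, not a real item ID
    "sitelinks": {
        "enwiki": "Example",
        "dewiki": "Beispiel",
        "frwiki": "Exemple",
    },
}


def interwiki_links_for(item: dict, site: str) -> dict:
    """Derive the cross-language links an article on `site` would display."""
    return {s: title for s, title in item["sitelinks"].items() if s != site}


if __name__ == "__main__":
    for n in (3, 50, 250):
        print(n, links_stored_per_article_model(n), links_stored_central_model(n))
    print(interwiki_links_for(item, "enwiki"))
```

Adding a new language version then means one edit to the central item instead of an edit to every existing article.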
Phase I – oddities
Phase II
● Building structured data on these entities
● “Phase 2.1” - harvesting data from Wikipedia
– and supplemented from other sources
● “Phase 2.2” - displaying data on Wikipedia
– autogenerated information templates
Phase III
● Automatic creation of lists and charts
● Expected for late 2013...
Wikidata entities
● Single entity corresponding to one or more Wikipedia articles
– Name (in various languages) + WP links
– Contains various Phase II properties
– Properties can include sources/qualifiers
● No support (yet!) for entities not existing in WP
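The entity structure described above can be sketched as a small data model. The class and field names here are hypothetical choices for illustration, not the actual Wikidata schema, and all IDs and values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property value on an item, with optional qualifiers and sources."""
    property_id: str
    value: str
    qualifiers: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


@dataclass
class Item:
    """A single entity corresponding to one or more Wikipedia articles."""
    item_id: str
    labels: dict = field(default_factory=dict)      # language code -> name
    sitelinks: dict = field(default_factory=dict)   # wiki -> article title
    statements: list = field(default_factory=list)  # Phase II properties

    def label(self, language: str, fallback: str = "en"):
        """Return the name in `language`, falling back to another language."""
        return self.labels.get(language, self.labels.get(fallback))

    def values_of(self, property_id: str) -> list:
        """Return every value recorded for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Placeholder item; none of these IDs or names are real Wikidata content.
example = Item(
    item_id="Q_EXAMPLE",
    labels={"en": "Example city", "de": "Beispielstadt"},
    sitelinks={"enwiki": "Example city", "dewiki": "Beispielstadt"},
    statements=[
        Statement("P_COUNTRY", "Q_EXAMPLE_COUNTRY", sources=["hypothetical reference"]),
    ],
)

if __name__ == "__main__":
    print(example.label("fr"))            # no French label, falls back to English
    print(example.values_of("P_COUNTRY"))
```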
Phase II – planned model
Phase II – initial properties
● Limited properties – gradual roll-out
● Single “main type”, but no restrictions on use
– “the capital of Julius Caesar”
● Relational properties implemented
– but no automatic reciprocity yet
● String datatypes created for identifiers
● 130 properties currently in use
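Because relational properties have no automatic reciprocity, an inverse statement exists only if someone adds it. A minimal sketch of how missing inverses could be detected, assuming statements are simple (subject, property, value) triples; the property names and the `INVERSE_OF` table are hypothetical and not part of Wikidata.

```python
# Hypothetical table of inverse property pairs (an assumption for illustration).
INVERSE_OF = {
    "capital": "capital of",
    "capital of": "capital",
}


def missing_reciprocals(statements, inverse_of=INVERSE_OF):
    """List inverse triples that are implied by the data but not yet stored."""
    existing = set(statements)
    missing = []
    for subject, prop, value in statements:
        inverse = inverse_of.get(prop)
        if inverse is None:
            continue  # not a relational property with a known inverse
        candidate = (value, inverse, subject)
        if candidate not in existing and candidate not in missing:
            missing.append(candidate)
    return missing


# Placeholder data, not real Wikidata statements.
statements = [
    ("Country A", "capital", "City A"),        # inverse is absent
    ("City B", "capital of", "Country B"),     # inverse is present below
    ("Country B", "capital", "City B"),
    ("Country A", "identifier", "ABC-123"),    # string value, no inverse
]

if __name__ == "__main__":
    for triple in missing_reciprocals(statements):
        print("missing:", triple)
```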
Phase II – future properties
● Properties created by community discussion
● Several awaiting datatypes:
– time
– geocoordinate
– number (and dimension)
● Qualifiers yet to be added
Data reuse
● Permanent numeric identifier for all items
● API available (JSON)
– but still being developed!
● Regular XML dumps – dumps.wikimedia.org
– all item/property data licensed as CC-0
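As a rough sketch of reuse via the JSON API, the snippet below builds a request URL for the MediaWiki `wbgetentities` action and reads labels out of a response. Since the API is still being developed, the parameters and response shape may differ from what is shown; `SAMPLE_RESPONSE` is a hand-written, trimmed illustration rather than captured output, and no network request is made.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_ids, languages=("en",)):
    """Build a wbgetentities request URL for one or more item IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "props": "labels|sitelinks",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def numeric_id(item_id: str) -> int:
    """Extract the permanent numeric part of an item ID such as 'Q42'."""
    if not item_id.startswith("Q"):
        raise ValueError(f"not an item ID: {item_id!r}")
    return int(item_id[1:])


# Hand-written sample shaped like an API response (illustrative, trimmed).
SAMPLE_RESPONSE = """
{"entities": {"Q42": {"id": "Q42",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
  "sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}}}}
"""


def labels_from_response(text: str, language: str = "en") -> dict:
    """Map each item ID in the response to its label in `language`, if any."""
    entities = json.loads(text).get("entities", {})
    result = {}
    for item_id, entity in entities.items():
        label = entity.get("labels", {}).get(language)
        if label is not None:
            result[item_id] = label["value"]
    return result


if __name__ == "__main__":
    print(build_entity_url(["Q42"]))
    print(numeric_id("Q42"))
    print(labels_from_response(SAMPLE_RESPONSE))
```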
Identifiers & authorities
● GND, ISNI, LCCN, ULAN, VIAF, BNF, SUDOC, CALIS, CiNii, NDL, ICCU, NLA, MusicBrainz, IMDB
● ISBN, ISSN, OCLC, DOI, NOR
● OpenStreetMap IDs
● Corporate, administrative, monument, chemical, gene identifiers, language codes
● ...and pigeon breed registries
Tools
● Examples of toolsets:
– GeneaWiki (visualise relations)
– Reasonator (display interface)
– Query API (experimental, alternative)
– Tree of Life (static dump)