Introduction to Wikidata

Introduction to Wikidata British Library, 26/4/13 Andrew Gray [email protected] | @generalising

description

"Introduction to Wikidata" presentation given 26th April 2013, at the British Library

Transcript of Introduction to Wikidata

Page 1: Introduction to Wikidata

Introduction to Wikidata

British Library, 26/4/13

Andrew Gray

[email protected] | @generalising

Page 2: Introduction to Wikidata

Wikidata summary

● Central data repository for Wikimedia projects
● Human- and machine-readable
● Human- and machine-editable
● Fully multilingual
● Supports semantic relationships

www.wikidata.org

Page 3: Introduction to Wikidata

Overall plan

● Phase I
– Centralise cross-language relationships
● Phase II
– Centralise core structured data
● Phase III
– Dynamic generation of list content

Page 4: Introduction to Wikidata

Phase I

● Centralising all “interwiki” cross-language links
– Historically, a major maintenance headache!
● Single conceptual entity => many articles
– ...some unexpected oddities arise; not all 1:1
● Almost all entities now listed
● Inclusion standards currently restricted
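
A rough count shows why per-article interwiki links were a maintenance headache. This is a sketch, not taken from the slides: it assumes a topic with articles in n languages, where each article stores its own link to every other version, against one central item holding a single link per language. The function names are hypothetical.

```python
def links_per_article_model(n_languages: int) -> int:
    """Each article stores a link to every other language version."""
    return n_languages * (n_languages - 1)


def links_central_model(n_languages: int) -> int:
    """One central item stores a single link per language version."""
    return n_languages


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, links_per_article_model(n), links_central_model(n))
```

The per-article count grows quadratically with the number of languages, while the central count grows linearly.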

Page 5: Introduction to Wikidata

Phase I

Page 6: Introduction to Wikidata

Phase I – oddities


Page 7: Introduction to Wikidata

Phase II

● Building structured data on these entities

● “Phase 2.1” - harvesting data from Wikipedia
– and supplemented from other sources
● “Phase 2.2” - displaying data on Wikipedia
– autogenerated information templates

Page 8: Introduction to Wikidata

Phase II

Page 9: Introduction to Wikidata

Phase III

● Automatic creation of lists and charts

● Expected for late 2013...

Page 10: Introduction to Wikidata

Wikidata entities

● Single entity corresponding to one or more Wikipedia articles
– Name (in various languages) + WP links
– Contains various Phase II properties
– Properties can include sources/qualifiers

● No support (yet!) for entities not existing in WP
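
The entity described above can be pictured as a small data structure. The sketch below is only an illustration of the slide's outline (names per language, Wikipedia links, properties with sources and qualifiers); the class names, field names, the "country" property and the identifier "Q-example" are all hypothetical and do not reflect Wikidata's actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                                        # property name (hypothetical)
    value: str                                       # the claimed value
    qualifiers: dict = field(default_factory=dict)   # extra context for the claim
    sources: list = field(default_factory=list)      # where the claim comes from


@dataclass
class Item:
    item_id: str                                     # permanent identifier (placeholder)
    labels: dict = field(default_factory=dict)       # language code -> name
    sitelinks: dict = field(default_factory=dict)    # wiki code -> article title
    statements: list = field(default_factory=list)   # Phase II properties

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the name in `lang`, else the fallback language, else the ID."""
        return self.labels.get(lang) or self.labels.get(fallback, self.item_id)


london = Item(
    item_id="Q-example",
    labels={"en": "London", "fr": "Londres"},
    sitelinks={"enwiki": "London", "frwiki": "Londres"},
    statements=[Statement("country", "United Kingdom", sources=["example source"])],
)
```

Here `london.label("fr")` gives the French name, and a language with no label falls back to English.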

Page 11: Introduction to Wikidata

Phase II – planned model

Page 12: Introduction to Wikidata

Phase II – initial properties

● Limited properties – gradual roll-out
● Single “main type”, but no restrictions on use
– “the capital of Julius Caesar”
● Relational properties implemented
– but no automatic reciprocity yet
● String datatypes created for identifiers
● 130 properties currently in use
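
"No automatic reciprocity" means that stating a relation in one direction does not create the matching statement in the other. A minimal sketch of how a reuser might spot the gaps, assuming statements are plain (subject, property, value) triples and using a hypothetical table of inverse property names (these are illustrative labels, not real Wikidata property IDs):

```python
# Hypothetical inverse pairs, for illustration only.
INVERSES = {"parent": "child", "child": "parent"}


def missing_reciprocals(statements):
    """Return inverse statements that are implied but not present."""
    present = set(statements)
    missing = []
    for subject, prop, value in statements:
        inverse = INVERSES.get(prop)
        if inverse and (value, inverse, subject) not in present:
            missing.append((value, inverse, subject))
    return missing


statements = [
    ("Person A", "parent", "Person B"),   # A's parent is B
    ("Person C", "parent", "Person B"),   # C's parent is B
    ("Person B", "child", "Person A"),    # B's child is A (reciprocal of the first)
]

print(missing_reciprocals(statements))
```

Only the second statement lacks its counterpart, so the function reports ("Person B", "child", "Person C").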

Page 13: Introduction to Wikidata

Phase II – future properties

● Properties created by community discussion
● Several awaiting datatypes:
– time
– geocoordinate
– number (and dimension)

● Qualifiers yet to be added

Page 14: Introduction to Wikidata

Data reuse

● Permanent numeric identifier for all items
● API available (JSON)
– but still being developed!
● Regular XML dumps – dumps.wikimedia.org
– all item/property data licensed as CC-0
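
As a sketch of reuse through the JSON API: the code below builds a request URL for the `wbgetentities` action and reads a label out of a response. It makes no network call; the sample response is hand-written and trimmed for illustration, and the helper names `entity_url` and `label_from_response` are hypothetical. Since the slides note the API was still being developed, treat the parameters as an assumption to check against the current API documentation.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_url(item_id: str, languages=("en",)) -> str:
    """Build a wbgetentities request URL for one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hand-written sample shaped like an API response (trimmed, illustrative only).
SAMPLE_RESPONSE = """
{"entities": {"Q42": {"id": "Q42",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}}}}}
"""


def label_from_response(raw: str, item_id: str, lang: str = "en"):
    """Return the label for `item_id` in `lang`, or None if it is absent."""
    entity = json.loads(raw)["entities"][item_id]
    label = entity.get("labels", {}).get(lang)
    return label["value"] if label else None


print(entity_url("Q42"))
print(label_from_response(SAMPLE_RESPONSE, "Q42"))
```

The item ID (here Q42) is the permanent identifier mentioned above, so a URL built this way stays stable even if the item's labels change.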

Page 15: Introduction to Wikidata

Identifiers & authorities

● GND, ISNI, LCCN, ULAN, VIAF, BNF, SUDOC, CALIS, CiNii, NDL, ICCU, NLA, MusicBrainz, IMDB

● ISBN, ISSN, OCLC, DOI, NOR
● OpenStreetMap IDs
● Corporate, administrative, monument, chemical, gene identifiers, language codes
● ...and pigeon breed registries

Page 16: Introduction to Wikidata

Tools

● Examples of toolsets:
– GeneaWiki (visualise relations)
– Reasonator (display interface)
– Query API (experimental, alternative)
– Tree of Life (static dump)