Metadata in Wikipedia

Metadata in Wikipedia: data in, data out
Daniel Kinzler, Wikimedia Deutschland e.V.
September 26, 2008

description

Presentation by Daniel Kinzler about Metadata and Wikipedia at the DC-2008 Wikimedia Workshop on User Generated Metadata

Transcript of Metadata in Wikipedia

  • 1. Metadata in Wikipedia: data in, data out
       Daniel Kinzler, Wikimedia Deutschland e.V., September 26, 2008
       Outline:
       • Wikipedia
       • Traditional Metadata: Document and Revision, Media Metadata, Accessing Metadata
       • Link Structure: Hyperlinks, Categories, Inter-Language Links, WikiWord
       • Structured Data: Records, Infoboxes, DBPedia, Semantic MediaWiki, WikiData
       • Conclusion: We Have, We Need
       • Thank You

2. Wikipedia
     • Wikipedia is the free encyclopedia anyone can edit
     • Founded in 2001
     • Has become the standard online reference
     • Number 8 website (Alexa), 50K requests per second
     • Exists in 250 languages, has 10 million articles
     • Run by Wikimedia, runs on MediaWiki
     • Free content, free software

3. Document Metadata
     Traditional (document) metadata is available throughout Wikipedia
     • Document information: Title, URL
     • Revision information: Author, Timestamp

4. Document Metadata
     Metadata for media files is maintained on-page, as content:
     • Source, License, Contributors, ...

5. Images Metadata
     Metadata for image formats:
     • Resolution
     • EXIF: Author, Copyright, Timestamp, Exposure, Aperture, Flash, Camera model, ...
     Metadata for audio and video formats is not yet supported.
6. Online Export Interface
     MediaWiki's page export facility provides limited metadata
     • Special:Export
     • Pages and revisions
     • XML wrapper around wikitext
     • Some basic metadata

7. Online Export Interface XML
     http://en.wikipedia.org/wiki/Special:Export/Berlin
     • title: Berlin
     • page id: 3354
     • revision id: 240627831
     • timestamp: 2008-09-24T06:44:58Z
     • contributor: Ling.Nut (id 1929579)
     • comment: clean up, typos fixed
     • text: {{pp-semi-protected|small=yes}} {{otheruses1|the capital of Germany}} {{Infobox German Bundesland

8. MediaWiki Web API
     MediaWiki's web API for bots/scripts
     • api.php
     • supports complex queries
     • lots of properties
     • several output formats (JSON, YAML, WDDX, ...)
     • but no RDF

9. MediaWiki Web API XML
     http://en.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=info|revisions&rvlimit=5&format=xml
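The api.php request shown on slide 9 can be assembled programmatically. The sketch below only builds the URL and does not send a request; the helper name `build_query_url` is made up for illustration, while the endpoint and parameter values are the ones from the slide.

```python
from urllib.parse import urlencode

# Endpoint as shown on slide 9.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_query_url(title, rvlimit=5, fmt="xml"):
    """Build an api.php URL asking for page info plus recent revisions.

    Hypothetical helper: the parameter names mirror the query string
    on the slide (action, titles, prop, rvlimit, format).
    """
    params = {
        "action": "query",
        "titles": title,
        "prop": "info|revisions",  # two property modules, pipe-separated
        "rvlimit": rvlimit,        # number of revisions to return
        "format": fmt,
    }
    # urlencode percent-escapes the pipe character as %7C.
    return API_ENDPOINT + "?" + urlencode(params)


url = build_query_url("Berlin")
print(url)
```

Changing `fmt` to `"json"` would request one of the other output formats listed on slide 8.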
10. MediaWiki RDF Extension
     The RDF Extension provides access to metadata
     • Per-page RDF output
     • Document info mainly in DC and CC vocab
     • Also links, categories, images, etc.
     • Output in XML, Turtle or NTriples
     • Supports custom RDF embedded in wiki pages
     • Compare http://www.communitywiki.org/en/DublinCoreForWiki
     • Not on Wikipedia, used by WikiTravel

11. MediaWiki RDF Extension XML
     http://wikitravel.org/en/Special:Rdf/Berlin
     • 2008-09-23T18:04:01Z
     • Creative Commons Attribution-ShareAlike 1.0
     • Berlin

12. Structural Information
     • Wiki pages contain several types of links
     • The structure of hyperlinks encodes relations
     • Links connect on textual and conceptual level
     • Links maintained by users, relations are implicit
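To make the idea of per-page RDF output in Dublin Core vocabulary (slides 10 and 11) concrete, here is a minimal sketch that serializes the three values surviving from slide 11 as N-Triples. It is not the RDF extension's code: the choice of `dc:title`, `dc:date` and `dc:rights`, the subject URL `http://wikitravel.org/en/Berlin`, and the function names are all assumptions made for illustration.

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"


def literal(value):
    """Quote a string as an N-Triples literal (escapes backslash and quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def page_to_ntriples(page_url, title, date, rights):
    """Return one N-Triples line per document-level metadata field.

    Hypothetical mapping: which DC properties the real extension emits
    is not stated on the slides.
    """
    fields = [("title", title), ("date", date), ("rights", rights)]
    return [
        f"<{page_url}> <{DC}{name}> {literal(value)} ."
        for name, value in fields
    ]


triples = page_to_ntriples(
    "http://wikitravel.org/en/Berlin",  # assumed page URL
    "Berlin",
    "2008-09-23T18:04:01Z",
    "Creative Commons Attribution-ShareAlike 1.0",
)
print("\n".join(triples))
```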
13. Page Links
     Hyperlinks cross-reference pages
     • Navigational, but also conceptual
     • Mutually linked pages suggest related concepts
     • Link label and link target correspond to word and meaning
     • Beware identity crisis when choosing URIs
     [[Berlin Wall|The Wall]]

14. Category Links
     Pages are assigned to one or more categories.
     • Categories form a poly-hierarchy (by convention)
     • Categories of pages correspond to subsumption of concepts
     • Structure often unclear or broken
     • No intersection, no transitive inclusion
     [[Category:Capitals in Europe]]
     [[Category:States of Germany]]

15. Inter-Language Links
     Inter-language links refer to the same page in a different language (on another wiki)
     • Granularity and coverage differ greatly
     • Mutually linked pages probably describe the same concept
     • Maintained manually and by bot
     • Would a centralized system be better?
     [[de:Berliner Mauer]]
     [[fr:Mur de Berlin]]
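The three link types from slides 13 to 15 can be told apart with a small amount of pattern matching. The following is a rough sketch, not MediaWiki's parser: it assumes that any short lowercase prefix is a language code, whereas a real wiki decides this from its configured interwiki table, and `classify_links` is a name chosen here for illustration.

```python
import re

# [[target]] or [[target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# Assumption: a 2-3 letter lowercase prefix (optionally with a suffix) is a language code.
LANG_PREFIX_RE = re.compile(r"^[a-z]{2,3}(-[a-z]+)?$")


def classify_links(wikitext):
    """Split wiki links into page links, category links and language links."""
    result = {"page": [], "category": [], "language": []}
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        # Without an explicit label, the target doubles as the label.
        label = match.group(2) if match.group(2) is not None else target
        prefix, sep, rest = target.partition(":")
        if sep and prefix == "Category":
            result["category"].append(rest)
        elif sep and LANG_PREFIX_RE.match(prefix):
            result["language"].append((prefix, rest))
        else:
            result["page"].append((target, label))
    return result


sample = (
    "[[Berlin Wall|The Wall]] "
    "[[Category:Capitals in Europe]] [[Category:States of Germany]] "
    "[[de:Berliner Mauer]] [[fr:Mur de Berlin]]"
)
links = classify_links(sample)
print(links)
```

Keeping the label next to the target for page links preserves the word-and-meaning pairing that slide 13 points out.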
16. WikiWord
     WikiWord builds a thesaurus by mining the link structure
     • Every page describes a concept
     • Link labels are terms referring to those concepts
     • Links and categories define relations
     • Multilingual thesaurus by merging languages
     • Export to SKOS
     • No web interface yet
     http://brightbyte.de/page/WikiWord

17. Data Records
     Wikipedia uses templates to present structured data records
     • Maintained directly by users
     • Template parameters can be extracted
     • MediaWiki stores them as plain text
     • External mining tools needed
     {{Infobox German Bundesland
     |Name = Berlin
     |image_photo = Cityscapeberlin2006.JPG
     |area = 891.82
     |population = 3416300
     |elevation = 34 - 115
     |GDP = 81.7
     ...

18. Infoboxes
     Infoboxes present a terse overview of properties
     • Used for cities, animals, bands, books, chemicals, ...
     • Qualifiers are problematic: date of measurement, error margin, unit, source, etc.
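Slide 17 notes that template parameters are stored as plain text and have to be extracted by external tools. A minimal sketch of such an extraction follows. It assumes a flat template with no nested links or templates (a value such as `[[Princeton (New Jersey)|Princeton]]` would be split incorrectly), the function name `parse_template` is hypothetical, and the sample is closed with `}}` only so that it parses; the slide truncates the infobox after the GDP line.

```python
def parse_template(wikitext):
    """Return (template name, parameter dict) for one flat {{...}} call.

    Simplification: splits on every pipe, so nested [[a|b]] links or
    nested templates inside values are not supported.
    """
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    parts = body.split("|")
    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional parameters without a key
            params[key.strip()] = value.strip()
    return name, params


sample = """{{Infobox German Bundesland
|Name = Berlin
|image_photo = Cityscapeberlin2006.JPG
|area = 891.82
|population = 3416300
|elevation = 34 - 115
|GDP = 81.7
}}"""

name, params = parse_template(sample)
print(name)
for key, value in params.items():
    print(f"{key}: {value}")
```

Every value comes back as an untyped string: `area` carries no unit and `elevation` is a range, which is the qualifier problem slide 18 describes.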
19. Personendaten
     Personendaten are biographic records on the German Wikipedia
     • Works like a hidden infobox
     • Contains date/place of birth/death, aliases, etc.
     • Maintained by a WikiProject
     • Automated extraction (every now and then)
     {{Personendaten
     |NAME=Einstein, Albert
     |ALTERNATIVNAMEN=
     |KURZBESCHREIBUNG=Physiker
     |GEBURTSDATUM=14. März 1879
     |GEBURTSORT=[[Ulm]]
     |STERBEDATUM=18. April 1955
     |STERBEORT=[[Princeton (New Jersey)|Princeton]], [[USA]]
     }}

20. DBPedia
     DBPedia is a project that mines RDF triples from Infoboxes
     • Allows SPARQL queries
     • Multiple languages
     • 100 million triples
     • Web interface
     http://dbpedia.org

21. DBPedia XML
     http://dbpedia.org/data/Berlin

22. Semantic MediaWiki
     Semantic MediaWiki is a MediaWiki extension:
     • Builds an RDF structure
     • Allows SPARQL queries
     • Users enter semantic relations in wiki syntax
     • More complex syntax
     • Not supported by Wikipedia
     semantic-mediawiki.org
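Slides 20 and 22 both mention SPARQL queries over data mined from wiki pages. As a rough illustration of what such a query might look like, the sketch below builds a single-property SPARQL query and a request URL without sending it. Several names are assumptions not taken from the slides: the endpoint `http://dbpedia.org/sparql`, the resource URI `http://dbpedia.org/resource/Berlin`, the property URI `http://dbpedia.org/property/population`, and both helper functions.

```python
from urllib.parse import urlencode

# Assumption: the slides only give http://dbpedia.org, not a query endpoint.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"


def build_query(resource, prop):
    """Return a SPARQL query selecting all values of one property of one resource."""
    return f"SELECT ?value WHERE {{ <{resource}> <{prop}> ?value }}"


def build_sparql_url(query):
    """Encode the query as the 'query' parameter of a GET request."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


query = build_query(
    "http://dbpedia.org/resource/Berlin",       # assumed resource URI
    "http://dbpedia.org/property/population",   # hypothetical property URI
)
url = build_sparql_url(query)
print(query)
print(url)
```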
23. Semantic MediaWiki XML
     http://semantic-mediawiki.org/wiki/Special:ExportRDF/Berlin
     • Berlin
     • 52° 31′ 0″ N, 13° 24′ 0″ E
     • 3391407

24. WikiData
     WikiData is a MediaWiki extension:
     • Stores structured data separate from wikitext
     • Reusable across wikis
     • Form-based structured data entry
     • No export interface
     • Not used by Wikipedia, active on OmegaWiki
     omegawiki.org

25. We Have
     We have...
     • Document Metadata
     • Structural Data
     • Structured data records
     • Lots of people maintaining this

26. We Need
     We need ways to...
     • maintain the data easily.
     • store structured data sensibly.
     • query the data efficiently.
     • access the data conveniently.
     We need people to make it happen.
27. Thank You
     The End
     http://brightbyte.de/repos/papers/2008/