Cultural Linked Data: Some preliminary results of the Linked Heritage project
EVA Moscow Conference
November 2011
Gordon McKenna
International Development Manager
Collections Trust, UK
Regine Stein
Head of Information Technology, Deutsches Dokumentationszentrum für Kunstgeschichte – Bildarchiv Foto Marburg, Germany
Context – The Linked Heritage Project
http://www.linkedheritage.org
Project Overview
Basic information:
• Length – 30 months;
• Partners – 38+;
• Budget – €3.85m (80% from EC ICT-PSP Programme);
• Background – Successor to ATHENA (Minerva & MICHAEL)
Objectives:
• To contribute large quantities of new content to Europeana, from both the public and private sectors;
• To demonstrate enhancement of quality of content, in terms of metadata richness, re-use potential and uniqueness;
• To enable improved search, retrieval and use of Europeana content.
Work packages:
• WP 1 Project management and Coordination (114 person months)
• WP 2 Linking Cultural Heritage Information (53 pm)
• WP 3 Terminology (73 pm)
• WP 4 Public Private Partnership (57 pm)
• WP 5 Technical Integration (38 pm)
• WP 6 Coordination of Content (238 pm)
• WP 7 Dissemination & Training (116 pm)
WP 2 – Selected Overview
Objectives: • To explore the state of the art in linked data;
• To identify appropriate models, processes and technologies for the deployment of linked data;
• To consider how linked data practices can be applied to cultural heritage;
• To explore the state of the art in persistent identifiers.
Tasks and Deliverables:
• T2.1 – Exploring cultural heritage information best practice
o D2.1 – Best practice report on cultural heritage linked data and metadata standards
• T2.2 – Resource identification [PIDs]
o D2.2 – State of the art report on persistent identifier standards and management tools
Project Methodology
1. Carry out research – What exists, survey
2. Make an analysis – Look for patterns and trends.
3. Give simple advice – practical and implementable
4. Reuse or create tools – Easy to use, audience relevant, adaptable, open licence (e.g. multilingual versions possible)
5. Identify further needs – Leading to further work
Partner Survey
Survey Method and Structure
• Aimed at partners in Linked Heritage
• Data collection – online, using SurveyMonkey (supported by an RTF document)
• Sections:
1. Participant information
2. Metadata standards and use
3. Linked data use and Europeana agreement
Participant Type
• Museum – 4
• Library – 5
• Archive – 4
• Sound archive – 1
• Aggregator – 10
• Other – 23
• Total – 47
Familiar with the Linked Data Concept?
• Yes: 29 (74.4%)
• No: 10 (25.6%)
Used Linked Data?
• Yes: 6 (15.4%)
• No: 33 (84.6%)
• Details:
o 4 – DBpedia;
o 3 – GeoNames;
o 1 – Freebase;
o 1 – IPTC;
o 1 – SKOS;
o 1 – [in-house]
Published Linked Data?
• Yes: 4 (10.3%)
• No: 35 (89.7%)
• Details:
o http://data.kunstkamera.ru/sparql
o http://data.kunstkamera.ru
o http://nektar.oszk.hu/wiki/Semantic_web
o Thesaurus in SKOS
Know of Linked Data Projects?
• Yes: 15 (38.5%)
• No: 24 (61.5%)
• Activity in:
o France
o Germany
o Israel
o Italy
o Russia
o Spain
o Sweden
o United Kingdom
Europeana Agreement Questions
• Europeana's new licence requires providers to agree that the metadata they provide to Europeana will be published as Linked Open Data. This means that any 3rd party use, including commercial use, is permitted. Does your organisation agree to this?
• Please explain your answer.
Europeana Licence Agreement?
• Yes: 30.6% – Why?
o [No explanation];
o Publishing on Web means Open Data;
o Participated in the ATHENA project;
o Metadata provided to Europeana specifically selected for Open Linked Data
• No: 16.7% – Why?
o Against 3rd party commercial use;
o National policy does not allow commercial use;
o Do not contribute to Europeana;
o [No explanation]
• Not sure: 52.8% – Why?
o Under discussion;
o Metadata not ours (our providers’ decision);
o Under discussion (possible legal obstacles);
o Decision not ours (made at a higher level);
o Will provide minimal data;
o Against commercial reuse
Conclusions
• A market for basic information and guidance;
• Significant concerns in cultural organisations about publishing completely open data.
Research into the Linked Open Data Cloud
Linked Data Principles
Tim Berners-Lee 2007 – http://www.w3.org/DesignIssues/LinkedData.html
1. Use URIs as names for things;
2. Use HTTP URIs so that people can look up those names;
3. When someone looks up a URI, provide useful RDF information;
4. Include RDF statements that link to other URIs so that they can discover related things.
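The first three principles can be illustrated with a minimal sketch. It assumes a hypothetical artwork URI under example.org and a helper name, `build_lookup_request`, that is ours and not part of the project. The code only builds the HTTP request a client would send to look up a URI and ask for RDF; it does not contact any server.

```python
from urllib.request import Request

# Standard media types for two RDF serialisations (Turtle and RDF/XML).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF about `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError(f"not an HTTP URI: {uri}")
    # Principle 3: the client signals that it wants RDF via the Accept header.
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URI used as the name of an artwork (principle 1).
request = build_lookup_request("http://example.org/artwork/42")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Whether a server answers such a request with RDF depends on how the publisher has set up content negotiation; the sketch shows only the client side.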
Linked Data – simple rules
• The URI identifies an entity – this can be an artwork, a person, a place, a concept etc.
• If two people create data using the same URI then they are describing the same entity.
• That makes it easy to merge data from different sources, not only within a single database or portal, but “web-wide”.
• This actually means making the web – which currently is a global, universal information space for documents – into a global database.
http://linkeddata.org
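The merging rule above can be sketched in a few lines. This is an illustration under stated assumptions: statements are modelled as plain (subject, predicate, object) tuples rather than with an RDF library, the two sources and all example.org URIs are hypothetical, and `describe` is a helper name of ours. The predicates are Dublin Core terms.

```python
# Each statement is a (subject, predicate, object) triple.
# All example.org URIs below are hypothetical and used only for illustration.
PAINTING = "http://example.org/artwork/42"

museum_data = {
    (PAINTING, "http://purl.org/dc/terms/title", "View of a Harbour"),
    (PAINTING, "http://purl.org/dc/terms/creator", "http://example.org/person/7"),
}

library_data = {
    (PAINTING, "http://purl.org/dc/terms/subject", "http://example.org/concept/harbours"),
    (PAINTING, "http://purl.org/dc/terms/title", "View of a Harbour"),
}

# Both sources use the same URI for the painting, so merging is a set union;
# the statement that appears in both sources is kept only once.
merged = museum_data | library_data


def describe(uri, triples):
    """Return the (predicate, object) pairs stated about `uri`, sorted."""
    return sorted((p, o) for s, p, o in triples if s == uri)


for predicate, obj in describe(PAINTING, merged):
    print(predicate, "->", obj)
```

If the two sources had used different URIs for the same painting, the union would keep two separate descriptions, which is why shared identifiers matter.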
Linked Open Data Cloud, May 2007
12 data packages
Linked Open Data Cloud, March 2009
89 data packages
Linked Open Data Cloud, September 2011
295 data packages
The Data Hub
http://thedatahub.org
• Part of CKAN (Comprehensive Knowledge Archive Network)
• Registry of open [and not open] knowledge
• Packages: > 2,300 packages in total, ~ 300 of them in the LOD cloud
• Projects (and a few closed ones).
Is the LOD Cloud Open?
‘Open’ = commercial use permitted
311 packages:
• Yes 42.6%
• No 57.4%
c. 38 billion triples:
• Yes 30.9%
• No 69.1%
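The two measures of openness, by package and by triples, can be sketched as follows. The package list is a small hypothetical sample, not the Data Hub figures, and the set of open licences is an assumption drawn from the licence names that appear in these slides. A package with no stated licence is counted as not open.

```python
# Licences treated as permitting commercial reuse (an assumption for this sketch).
OPEN_LICENCES = {"CC BY", "CC BY-SA", "PDDL", "CC0", "ODbL"}

# Hypothetical sample: (package name, licence or None, number of triples).
packages = [
    ("package-a", "CC0", 2_000_000),
    ("package-b", "CC BY-NC", 500_000),
    ("package-c", None, 7_000_000),
    ("package-d", "CC BY", 500_000),
]


def is_open(licence):
    """A package counts as open only if its stated licence allows commercial use."""
    return licence in OPEN_LICENCES


def open_shares(pkgs):
    """Return (share of packages, share of triples) that are open, in percent."""
    open_pkgs = [p for p in pkgs if is_open(p[1])]
    by_package = 100 * len(open_pkgs) / len(pkgs)
    by_triples = 100 * sum(p[2] for p in open_pkgs) / sum(p[2] for p in pkgs)
    return by_package, by_triples


by_package, by_triples = open_shares(packages)
print(f"open by package: {by_package:.1f}%, open by triples: {by_triples:.1f}%")
```

In this sample half of the packages are open but only a quarter of the triples are, because the largest package states no licence. The same effect explains why the two percentages on the slide differ.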
Open Licences Used
Licence: Packages / Triples
• CC BY: 28.8% / 45.8%
• CC BY-SA: 18.2% / 10.2%
• PDDL: 10.6% / 0.2%
• CC0: 9.1% / 2.9%
• UK Crown Copyright with data.gov.uk rights: 7.6% / 27.4%
• Other (Public Domain): 6.8% / 7.0%
• Other (Open): 5.3% / 5.0%
• Other (Attribution): 3.0% / 0.4%
• UK Open Government Licence (OGL): 3.0% / 0.1%
• GNU FDL: 3.0% / <0.1%
• ODbL: 2.3% / 0.9%
• GNU GPL: 0.8% / <0.1%
• New and Simplified BSD licences: 0.8% / 0.1%
Not Open Licences Used (or Not)
Licence: Packages / Triples
• [not given]: 69.1% / 89.4%
• None: 14.6% / 0.3%
• CC BY-NC: 7.3% / 5.8%
• Other (Not Open): 6.7% / <0.1%
• CC BY: 1.1% / 0.6%
• Other (Non-Commercial): 0.6% / 3.9%
• CC BY-SA: 0.6% / <0.1%
Number of triples per package
• > 1 b: 2.9%
• > 500 m: 1.9%
• > 100 m: 6.1%
• > 50 m: 5.79%
• > 10 m: 14.8%
• > 5 m: 6.1%
• > 1 m: 15.8%
• > 0.5 m: 7.4%
• > 0.1 m: 14.5%
• < 0.1 m: 24.4%
Top Packages Linked To By Packages
Package: Packages / Links (million)
1. DBpedia: 158 / 31.53
2. GeoNames Semantic Web: 38 / 9.35
3. [none]: 34 / 0
4. DBLP Computer Science Bibliography (RKBExplorer): 27 / 1.34
5. Association for Computing Machinery (ACM) (RKBExplorer): 26 / 1.49
6. ePrints3 Institutional Archive Collection (RKBExplorer): 26 / 0.28
7. Freebase: 25 / 10.45
8. CiteSeer (Research Index) (RKBExplorer): 24 / 0.80
9. School of Electronics and Computer Science, University of Southampton (RKBExplorer): 24 / 0.04
10. ReSIST Project Wiki (RKBExplorer): 24 / <0.01 [408]
Cultural Packages in the Cloud
Package – Triples (million)
• VIAF: The Virtual International Authority File – 200.0
• Europeana Linked Open Data – 185.0
• British National Bibliography (BNB) – 80.2
• Hungarian National Library (NSZL) catalog – 19.3
• Amsterdam Museum as Linked Open Data in the Europeana Data Model – 5.0
• Library of Congress Subject Headings – 4.2
• Swedish Open Cultural Heritage Other (Open) – 3.4
• Calames – 2.0
• RAMEAU subject headings (STITCH) – 1.6
• data.bnf.fr - Bibliothèque nationale de France – 1.4
• National Diet Library of Japan subject headings – 1.3
• Gemeenschappelijke Thesaurus Audiovisuele Archieven – 1.0
• Gemeinsame Normdatei (GND) – 0.6
• Archives Hub Linked Data – 0.4
• Thesaurus for Graphic Materials (t4gm.info) – 0.1
• Italian Museums (LinkedOpenData.it) – <0.1
• Thesaurus W for Local Archives – <0.1
• MARC Codes List Open Data – <0.1
Cultural Heritage – Licences Used
Open licences (number):
• CC0 – 2
• Other (Public Domain) – 1
• Other (Open) – 1
• ODbL – 1
Not open licences (number):
• [not given] – 9
• CC BY-SA – 3
• Other (non-commercial) – 1
Europeana in the LOD
• 3.5 million object descriptions = 185 million triples
• Currently contains < 620,000 links to other packages
Europeana examples: Amsterdam Museum
Europeana examples
Hack4Europe Award “Most Innovative Application”: Time Mash. Based on your current geographical location, it searches Europeana for historical views of the same place and for interesting objects in the vicinity.
Conclusions
Open Data – Licensing?
• Must have one: make a decision before publishing
• What kind of licence can you give (CC usable?)?
• What kind of 3rd party use do you want to allow?
Linkable Data – Publishing?
• Use Persistent Identifiers;
• Select ‘standard’ data formats;
• Carefully choose what you are publishing
Linking Data – Which package(s) do you link to?
• Trusted source?
• Presence of PIDs and maintained resource?
Linked Culture Cloud – shared resource?
• Sub-set of the LOD Cloud / CKAN;
• Information relevant for cultural institutions
• Feed into general LOD Cloud
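The licensing, publishing and linking points can be brought together in one small sketch. It assumes a hypothetical persistent identifier under example.org, uses CC0 purely as an example licence, and writes the description as N-Triples, one of the standard RDF formats. The helper `ntriple` is ours and handles only URIs and plain literals without line breaks.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Serialise one statement as an N-Triples line (URIs or plain literals only)."""
    if literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{subject}> <{predicate}> "{escaped}" .'
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical persistent identifier for one museum object.
OBJECT = "http://example.org/id/object/0001"

lines = [
    # Publishing: a deliberately small, carefully chosen description.
    ntriple(OBJECT, "http://purl.org/dc/terms/title",
            'Map of "Old" Amsterdam', literal=True),
    # Licensing: state the licence explicitly (CC0 is only an example choice).
    ntriple(OBJECT, "http://purl.org/dc/terms/license",
            "http://creativecommons.org/publicdomain/zero/1.0/"),
    # Linking: point to a widely used package, here DBpedia.
    ntriple(OBJECT, "http://purl.org/dc/terms/spatial",
            "http://dbpedia.org/resource/Amsterdam"),
]

print("\n".join(lines))
```

A production publishing pipeline would normally use an RDF library for serialisation; the hand-written helper is kept here only so that the example has no dependencies.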