Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Nuno Freire, The European Library

Michael Mertens, Research Libraries UK

Chain Reactions: The Experience of The European Library and Research Libraries UK in Providing Linked Open Data


Presentation on the experience and learning of The European Library and Research Libraries UK (RLUK) in creating a set of Linked Open Data based on some 19 million bibliographic records


Page 1: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Nuno Freire, The European Library

Michael Mertens, Research Libraries UK

Chain Reactions: The Experience of The European Library and Research Libraries UK in Providing Linked Open Data

Page 2: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Outline

• Library Linked Open Data: Some Motivations
• The Linked Open Data publication process
• Linking aggregated library data: conclusions so far
• Current status of Linked Open Data publication at The European Library
• RLUK’s perspective on Linked Open Data
• Next steps

Page 3: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Library LOD: Some Motivations

Why use Linked Open Data (LOD) to disseminate libraries and their resources?

LOD provides a set of procedures and technical standards to allow the reuse of data across communities.

LOD allows for:

● Opening access to the data … in order to allow others to obtain, process and re-use the data.
● Linking the data to other datasets … in order to allow others to find the data more easily, better understand its meaning, match it with other data … to enable new use cases.

Page 4: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Library LOD: Some Motivations

Linking data makes it more precise and informative.

Data links allow computers to better understand the data, enabling more use cases.

Page 5: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Library LOD: Some Motivations

Linked data is not new to libraries, and its value is clearly recognized

● Libraries have perceived the value of linked data for decades:
o Authority files, union catalogues, ...
● Libraries are already contributing LOD datasets that are being re-used across all communities

Today the LOD framework offers the same benefits:
● but beyond libraries … at a global level … across all communities.

The precision of Linked Data is particularly important for re-use in research and Virtual Research Environments

Page 6: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Overview of the LOD publication process

● Choosing a licence for granting copyright permissions to the data

● Setting up the required infrastructure for the LOD technical requirements

o Requirements for publishing the data

o Requirements for linking the data

Page 7: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Overview of the LOD publication process

● LOD technical requirements for publishing the data
o Which data model to use
o How to structure the URIs
o Supporting the protocols (as specified in the W3C LOD platform)
o Providing bulk download of the data
o Supporting querying protocols (optional)
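The publishing requirements above can be illustrated with a minimal Python sketch. It assumes a hypothetical base URI (`http://data.example.org`) and a hypothetical path pattern that separates the URI of a resource from the URLs of the documents describing it; neither is taken from The European Library's actual setup. The `negotiate` function shows a simplified form of HTTP content negotiation, where the `Accept` header decides which serialization is returned.

```python
# Illustrative sketch only: the base URI, path pattern, and supported
# formats below are assumptions, not The European Library's real scheme.
BASE = "http://data.example.org"

# Media types this hypothetical server can serialize, with file extensions.
SUPPORTED = {
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "text/html": "html",
}


def resource_uri(record_id: str) -> str:
    """URI identifying the bibliographic resource itself."""
    return f"{BASE}/resource/{record_id}"


def document_url(record_id: str, media_type: str) -> str:
    """URL of one concrete document describing the resource."""
    return f"{BASE}/data/{record_id}.{SUPPORTED[media_type]}"


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Pick the supported media type with the highest q-value.

    Simplified: wildcards such as */* are ignored, and `default` is
    returned when nothing in the header is supported.
    """
    best, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best, best_q = media_type, q
    return best
```

A client asking for Turtle would then be redirected from `resource_uri(record_id)` to `document_url(record_id, negotiate(accept_header))`. Bulk download and a query endpoint sit alongside this and are not shown.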

Page 8: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Overview of the LOD publication process

● LOD technical requirements for linking the data

Links promote data re-use and enable new use cases

The potential targets for linking are endless:

● Which data to link?

● Where to link to?

● How precisely to link?
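The question of how precisely to link can be made concrete with a small Python sketch. It assumes person records are plain dictionaries with a `name` and an optional `birth` field, and it assumes a policy in which a name-plus-birth-date match justifies `owl:sameAs` while a name-only match gets the weaker `skos:closeMatch`. Both the record shape and the policy are illustrative choices, not the method used by The European Library or RLUK.

```python
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name parts
    so that 'Surname, Forename' and 'Forename Surname' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(sorted(kept.lower().split()))


def choose_link(local: dict, target: dict) -> Optional[str]:
    """Return a link predicate reflecting how precise the match is, or None.

    The predicate policy here is an assumption made for illustration.
    """
    if normalize_name(local["name"]) != normalize_name(target["name"]):
        return None
    both_dated = local.get("birth") and target.get("birth")
    if both_dated and local["birth"] == target["birth"]:
        return "owl:sameAs"       # name and birth date agree: strong claim
    if both_dated:
        return None               # same name, different birth dates: no link
    return "skos:closeMatch"      # name only: weaker, hedged link
```

Refusing to link when the birth dates disagree trades recall for precision, which matters when the links are meant for re-use in research.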

Page 9: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Overview of the LOD publication process

● Publishing LOD is demanding:

● Requires considerable human and computational resources

● Requires a large range of expertise:

o in information science

o in semantic technology

● This process can draw on the infrastructure provided by aggregation networks

Page 10: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Library LOD: Leveraging aggregation networks

Aggregation networks may provide:

● An existing information and communication technology infrastructure
● Technical expertise concentrated in the aggregating organizations
● Centralized data, enabling more links to be established
o Linking bibliographic data within aggregated data is easier than across distributed datasets
o Each library benefits from the linking done for other libraries
o Each external dataset linked to benefits all libraries’ data

Page 11: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Linked Open Datasets at The European Library

Two datasets are available

● The RLUK Dataset
o The dataset was the focus of the RLUK Hack Day
o It is a subset of the RLUK database comprising nearly 20 million bibliographic records from 34 libraries.
● The European Library Open Dataset
o Data from 15 countries
o Over 60 million bibliographic resources
o Size is likely to double during 2014/2015
o Formalization of data distribution agreements with partner libraries is underway

http://www.theeuropeanlibrary.org/tel4/access

Page 12: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

RLUK’s View #1

● Why we partnered with The European Library
● #RLUKHack day - what developers tried to do
● So, LOD for libraries has arrived!
● Libraries produce it - why aren’t they using it? (or are they?)
● Why consume LOD?
● Open data - theory is nice; agility, skills and benefits even nicer?
● How to create a more viable environment?
○ Establish a library developer/coder/programmer base
○ Better awareness of tools
○ Partnerships between developers and librarians
○ Encourage greater documentation of existing APIs
● Need for a vision of more library and vendor engagement.

Page 13: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

RLUK’s View #2

● A few examples of those currently aiming to build services using LOD
○ AMSL
○ National Library of Sweden
○ Oslo Public Library
○ Linked Jazz
● You can have your (OPAC) cake and eat it (the Web), too.
● The European Library and RLUK view on taking LOD forward: Collection Level Descriptions as next published LOD.
● A European Hack Day…

Page 14: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Strategic Partnership

● Why we partnered with The European Library
○ Based on a formal investigation of potential LOD service providers, The European Library was the most cost-effective and sustainable choice
○ Part of a broader network, including CERL and Europeana, who had also worked on LOD
● The strategic thinking behind our creation of Linked Open Data
○ Sticking to RLUK’s open principles - “Open Access” is not just about journals!
○ Ensuring we could comply with changes to the European PSI (Public Sector Information) Directive, which includes museums, libraries and archives for the first time
○ Provide more impetus to libraries in the UK consuming and using LOD as well as producing it
○ Push in new directions regarding overall library skills base.

Page 15: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

(RL)UK context

● Observations on UK LOD context
○ “Infrastructure” = datasets, projects, systems, capacity and skills based on LOD
■ Headlining datasets from The British Library, University of Cambridge, Archives Hub, British Museum
■ Projects by Archives Hub, King’s College London and others
■ Internal systems at British Library and University of Cambridge
■ Developer capacity at BL_Labs and a relative handful of libraries
■ ….
● The Hack Day itself - who came, what they tried to do, successes and feedback:
○ A mix of library-based developers, digital humanists, postgraduates researching LOD, self-styled “hackers”, systems librarians, repository programmers, and academics interested in data

Page 16: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

What “hackers” aimed at with RLUK LOD

● Link a WWI diary to digitised newspapers and other datasets.
● Develop a Chrome web browser extension that matches named entities on The European Library portal with individuals with the same date of birth.
● Create an interface where users can explore LOD at their chosen level of complexity.
● Inform a dissertation on LOD.
● Populate a learning exercise with information on German and Russian experiences of WWI.
● Explore the dataset to find out more about data structures.

Page 17: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

So, LOD for libraries has arrived! Where should we go next?

● We seem to have come a long way - search on Slideshare for “Linked Open Data” gives some 3,500 presentations, for “LOD” there are over 13,000. No longer esoteric?

● Libraries in Europe have been publishing LOD in earnest for some 6 years - just 3 national libraries alone (BL, BNF, DNB) have produced tens of millions of items of LOD

● Everything from bibliographic, authority information to vocabularies

● By contrast only a handful of UK HE libraries are consuming LOD to create their own services

Page 18: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Libraries produce it - why aren’t they using it? (or are they?)

● Creating and publishing LOD does take considerable resource

● Impression that consuming LOD would be just as resource-hungry?

● Motivations to date have been (largely) intangible for institutions, and directed towards external factors (new audiences beyond libraries, linking to and becoming part of Semantic Web)

● Rise of the Machine - by definition a focus on connecting to other machine-readable datasets

● If for humans, then always for “other people”: “The best thing to do with your data will be thought of by someone else”

● But who knows your library’s needs better than you?

Page 19: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

So, why consume LOD?

(From http://semanticpublishing.wordpress.com/2013/03/01/lld6-catalogues-and-linked-data/)

Better enrichment and contextualisation (de-siloisation):

“For example, by recording the dates and geographical coordinates relating to ancient documents held by the Bodleian Libraries, and to the sites described in its archaeological publications, and by mapping these coordinates onto Google Maps or some other useful mapping system, together with similar data held at Cambridge University, it would be possible for a scholar from Sweden to see at a glance that Cambridge holds early descriptions of archaeological sites at Nimrud and Nineveh in ancient Mesopotamia, together with a large number of Mesopotamian written documents dating back 3000 years, while the Sackler Library in Oxford is particularly rich in papyri and information about Egyptian archaeological sites, while also having good holdings on cuneiform languages and Assyrian reliefs.”

Page 20: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

So, why consume LOD? #2

With vast, exponentially growing amounts of data & information on the Web, it’s better than search:

“In the search for new drugs to treat Alzheimer Disease, a researcher may wish to know the answer to the following biological question:

“What proteins are involved in cellular signal transduction, and are related to pyramidal neurons?”

A Google search on that question gave ~223,000 hits in 2007, none of which provided a specific answer. However, a search over the linked healthcare data, made in collaboration with the W3C Health Care and Life Science Interest Group, gave 32 responses, each one of which was the name of a specific protein involved both in signal transduction and related to pyramidal neurons.”

(This week, in 2014, the same Google search will result in some 3.2 million hits.)

(From http://semanticpublishing.wordpress.com/2013/03/01/linked-data-101/)
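Why a query over linked data can return a short list of specific answers, where keyword search returns thousands of documents, can be sketched in a few lines of Python. The triples below are invented placeholders (`proteinA`, `proteinB`, `proteinC` are not real results), and the predicate names are assumptions made for illustration. The point is only that the question becomes the intersection of two graph patterns.

```python
# Toy triples invented for illustration; the protein names are placeholders
# and the predicate names are assumptions, not a real vocabulary.
TRIPLES = {
    ("proteinA", "involvedIn", "signal transduction"),
    ("proteinA", "relatedTo", "pyramidal neuron"),
    ("proteinB", "involvedIn", "signal transduction"),
    ("proteinC", "relatedTo", "pyramidal neuron"),
}


def subjects(predicate, obj, triples=TRIPLES):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def answer(triples=TRIPLES):
    """Subjects matching BOTH patterns: the structured form of the question."""
    return (subjects("involvedIn", "signal transduction", triples)
            & subjects("relatedTo", "pyramidal neuron", triples))
```

A real deployment would express the same two patterns as a SPARQL query against a published endpoint rather than filtering an in-memory set.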

Page 21: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

LOD DIY. Just a Nice Theory?

● Improve on or augment existing offerings (see below)
● Broaden library skills base (use & make sense of data)
● Open data - reclaim agility for research support development
● Greater ability to match speed of electronic content business models
● Overcome “format hell” and information siloing (link and relate information from galleries, libraries, archives and museums)

● Linking information from different campus systems

● Greater support for digital humanities

Page 22: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Example of improving a service for users with a bit of LOD magic (Thanks to Owen Stephens)

Page 23: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Tim Sherratt’s alternative interface

Page 24: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Shopping list of elements in consuming and using LOD more effectively?

● Need for vision: library and vendor engagement (“our system only uses linked data-like concepts”)

● Need for developer base

● Awareness of tools

● Partnerships between developers and librarians

● Librarians as developers

● Encourage greater documentation of existing APIs

Page 25: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

A Few Examples of those currently building services using LOD

● AMSL - A linked data basis for Electronic Resource Management based at SLUB Dresden and Leipzig University Library.

● LIBRIS Consortium and Swedish National Library

● Oslo Public Library OPAC

● Linked Jazz, a research project based at the Pratt Institute School of Information and Library Science

Page 26: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

AMSL 1

● AMSL - A linked data basis for Electronic Resource Management based at SLUB Dresden and Leipzig University Library.
● Funded by the European Union (EFRE) and the Free State of Saxony.
● Uses tools (OntoWiki) developed at the Institute of Applied Informatics (InfAI), University of Leipzig under the LOD2 project (7th EU Framework Programme).
● Business models regarding e-media acquisition are changing rapidly. “Our acquisition department longed for a very flexible way to manage e-resources in future”.

Page 27: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

AMSL 2

Decision flow for selecting Linked Open Data

● Looked for a very flexible solution that could be adjusted by ourselves.
● Required a flexible data model
● Emphasis on the management of business data and best approach for data integration, linking and enrichment
● Tested against commercial offerings (which were too fixed)

Page 28: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

AMSL 3

Aim - to make using Linked Open Data as easy as WYSIWYG web editing:

“Our Data Management Platform [enables] librarians with limited or no IT background to model and configure transformation procedures, data schemas and queries.”

Page 29: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

AMSL 4

Page 30: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

LIBRIS

● Swedish Union Catalogue, Directory of Libraries, library infrastructure and OPAC; approximately 6.5 million records and over 22 million holdings, covering some 180 academic, research and public libraries
● The Swedish National Library (Kungliga Biblioteket, KB) released its catalogue as Linked Open Data in 2008
● Since 2012, KB has worked on a new infrastructure and system for the national catalogue, basing its core on Linked Data
● Has the capacity to mesh with other linked data on the web with minimal engineering effort
● Anticipating BibFrame, replacement for MARC
● Displacing MARC legacy entirely with an LOD-based structure

Page 31: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

OSLO Public Library

• Announced last month that its catalogue data will be stored as linked data (within open source Koha ILS)
• Finds linked data a rich metadata model, more suitable for future needs than the library-specific MARC format.
• LOD enables it to use the same metadata format for its physical collection and its digital content
• Provides a good foundation for search, presentation and integration with other content, internally and externally

Page 32: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

OSLO Public Library #2

“RDF opens a new world of possibilities as to how we can connect our metadata with data from other relevant resources. We will pursue data harvesting from other sources, which mean we can add value to our core content, and harvesting of basic bibliographic data to facilitate the cataloguing process.”

Page 33: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Linked Jazz 1

“Research into the possibilities for linked open data applications within digital scholarship remains wide open. While a number of research projects are currently exploring methods and tools to clean and open up data for use in linked open data environments, the field of digital scholarship lacks a critical mass of these efforts.”

(Linked Jazz: Building with Linked Open Data, Leanora Lange, the Center for Jewish History & Cristina Pattuelli, School of Information and Library Science at the Pratt Institute, 30 June, 2014.)

Page 34: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Linked Jazz 2

Page 35: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Next Steps for RLUK & The European Library

● Collection Level Descriptions
o Use cases have already been identified in digital humanities
o Publication as LOD

● European Hack Day
o A concertation of several LOD efforts from the library domain

Page 36: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Final Thought

“We hope someone shares our dream of a full-fledged RDF library system. Please get in touch if you do! :)”

(OSLO Public Library, 19 June, 2014)

Page 37: Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.

Contact and acknowledgements

Nuno Freire: [email protected]

Mike Mertens: [email protected] @RLUK_Mike

Thanks to David Shotton (University of Oxford) and Silvio Peroni (University of Bologna) for material for slides 19 & 20, from The Semantic Publishing Blog.

Thanks to Tim Sherratt (via Owen Stephens) for slides 22 & 23.

Headline image, ‘Castle of Light’ by Bernt Rostad https://www.flickr.com/photos/brostad/

Linked Open Data Lifecycle by Michael Haschke https://www.flickr.com/photos/haschek/

“Stop Hugging Your Data” by Amy https://www.flickr.com/photos/_-amy-_/ (via Paul Miller)