Hiberlink: Prototypes of pro-active approaches to support the archiving of web references for...

Prototypes of pro-active approaches to support the archiving of web references for scholarly communications

Richard Wincewicz (1), Peter Burnhill (1) & Herbert Van de Sompel (2)

(1) EDINA, University of Edinburgh; (2) Los Alamos National Laboratory


The Project Team

2013 – 2015, funded by the Andrew W. Mellon Foundation

• Los Alamos National Laboratory:

Research Library: Herbert Van de Sompel, Harihar Shankar, [Martin Klein, Rob Sanderson]

• University of Edinburgh:

Language Technology Group: Claire Grover, Beatrice Alex, Colin Matheson, Richard Tobin, [Ke “Adam” Zhou]

EDINA*: Peter Burnhill, Muriel Mewissen (Project Manager), Tim Stickland, Richard Wincewicz, [Neil Mayo]

* Centre for Service Delivery & Digital Expertise

Overview

1. Introduction

2. Evidence

3. Remedy

1. Introduction

Reference Rot

Links to Web at Large resources are subject to Reference Rot. This is a combination of two factors:

• Link Rot: the link stops working, e.g. HTTP 404 “Not Found”

• Content Drift: the linked content changes over time, possibly to the extent that it is no longer representative of the content that was initially referenced
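The two factors can be told apart mechanically. The following is a minimal sketch, not the method used in the project: it assumes a digest of the content was recorded at the time of referencing, and it treats any byte-level difference as drift, which is a crude proxy (pages with dynamic content will differ trivially). The names `digest` and `classify_reference` are hypothetical.

```python
import hashlib
from typing import Optional


def digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def classify_reference(status_code: int,
                       referenced_digest: str,
                       current_body: Optional[bytes]) -> str:
    """Classify the state of a referenced URI.

    referenced_digest is assumed to have been stored when the
    reference was made; current_body is what the URI returns now.
    """
    # Link rot: the URI no longer returns a successful response.
    if current_body is None or not 200 <= status_code < 300:
        return "link rot"
    # Content drift: the URI still works but the content has changed.
    if digest(current_body) != referenced_digest:
        return "content drift"
    return "intact"
```

A real study would need a similarity measure rather than an exact match to decide whether the content is still representative of what was referenced.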

2. Evidence

Articles that Link to Articles & to Web At Large Resources (PMC)

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Articles that Link to Articles & to Web At Large Resources (Elsevier)

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Articles with URI References (PMC)

Articles: 479,194

• with URI references: 399,005

• with URI references to articles: 240,857

• with URI references to Web at Large: 156,160

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Link Rot (PMC)

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Link Rot (Elsevier)

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Links from arXiv, Elsevier, PMC to TLD Targets

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Grey is Link Rot – Referenced Content Not Accessible

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Grey is Not Archived - Referenced Content Lost

Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253

Content Drift – http://dl00.org

[Screenshots from 2000, 2004, 2005 and 2008]

(a) Dynamic content: values on webpage change over time

(b) Static content, but very different (often unrelated) web pages

3. Remedy

Create Snapshots of Referenced Resources

Various web archives support on-demand creation of snapshots of URIs (manually or via an API):

• archive.today

• Internet Archive

• perma.cc

• webcitation.org

When creating snapshots, maintain:

• Original URI

• Snapshot URI

• Date/Time of snapshot
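The three items to maintain fit naturally into one small record. The sketch below is illustrative only: `SnapshotRecord` and `record_from_wayback_uri` are hypothetical names, and it assumes the common Internet Archive snapshot URI layout of a 14-digit UTC timestamp followed by the original URI. Other archives use different layouts and would need their own handling.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotRecord:
    """The three facts to keep for every snapshot."""
    original_uri: str
    snapshot_uri: str
    snapshot_datetime: datetime


# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URI>
_WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def record_from_wayback_uri(snapshot_uri: str) -> SnapshotRecord:
    """Derive the original URI and snapshot time from a snapshot URI."""
    match = _WAYBACK.match(snapshot_uri)
    if match is None:
        raise ValueError(f"not a recognised snapshot URI: {snapshot_uri}")
    timestamp, original_uri = match.groups()
    when = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return SnapshotRecord(original_uri, snapshot_uri, when)
```

Storing all three values, rather than the snapshot URI alone, keeps the reference usable even if the archive that holds the snapshot later disappears.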

Create Snapshots of Referenced Resources

Snapshots can be created at various stages. The closer to the moment of referencing, the better the image captured.

Stage              Actor                            Snapshot Quality

Preparation        Author/reference tool            best

Submission/Issue   Editor/manuscript system         good

Publication        Aggregator/publisher platform    ok

Post-publication   Librarian/IR, journal archive    better than nothing

Authoring - Zotero Plugin Demonstrator

Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero for pro-active archiving and temporal references

https://www.youtube.com/v/ZYmi_Ydr65M

Publication - OJS


Publication - HiberActive Service Demonstrator

Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles

Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive

Reference Resources Robustly

When referencing resources, include:

• Original URI: allows the user to revisit the URI as it is at the time of reading, if the URI is still operational

• Snapshot URI: allows the user to visit the snapshot, if one was created, and if the web archive in which it was created is still operational

• Date/Time: together with the original URI, allows the user to visit any snapshot created around the Date/Time in any web archive around the world (using Memento infrastructure)

(2015) Robust Links - Motivation. http://robustlinks.mementoweb.org/about/
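The role of the Date/Time can be sketched in terms of the Memento protocol (RFC 7089), in which a client asks a TimeGate for a version of the original URI close to a given moment by sending an `Accept-Datetime` header. The sketch below only builds the request and does not send it. The TimeGate prefix is assumed to be the Memento Time Travel aggregator, and `timegate_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple

# Assumption: the Memento Time Travel aggregator TimeGate is used.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def accept_datetime(when: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 1123 date in GMT."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(original_uri: str, when: datetime) -> Tuple[str, Dict[str, str]]:
    """Return the URI and headers for a datetime-negotiation request."""
    headers = {"Accept-Datetime": accept_datetime(when)}
    return TIMEGATE_PREFIX + original_uri, headers


uri, headers = timegate_request(
    "http://www.stanford.edu",
    datetime(2014, 8, 15, tzinfo=timezone.utc),
)
print(uri)
print(headers["Accept-Datetime"])
```

Because the request is keyed on the original URI and a date rather than on one archive's snapshot URI, it does not depend on any single archive staying operational.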

Reference Resources Actionably

When referencing resources, use Link Decorations to convey Original URI, Snapshot URI, Date/Time

<a href="http://www.stanford.edu" data-versionurl="http://archive.is/FAy6o" data-versiondate="2014-08-15">

<a href="http://www.stanford.edu" data-versiondate="2014-08-15">

<a href="http://archive.is/FAy6o" data-originalurl="http://www.stanford.edu" data-versiondate="2014-08-15">

Herbert Van de Sompel et al. (2015) Robust Links - Link Decorations. http://robustlinks.mementoweb.org/spec/
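To make the three decoration patterns concrete, here is a hedged sketch of how a consumer might read them back out of a page. It is not robustlinks.js, and `RobustLinkParser` and `extract_robust_links` are hypothetical names. It assumes that `href` holds the snapshot URI whenever `data-originalurl` is present, and the original URI otherwise.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class RobustLinkParser(HTMLParser):
    """Collect original URI, snapshot URI and date from decorated links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        version_date = attributes.get("data-versiondate")
        if href is None or version_date is None:
            return  # not a decorated link
        if "data-originalurl" in attributes:
            # href points at the snapshot; the original is in the decoration.
            original, snapshot = attributes["data-originalurl"], href
        else:
            # href points at the original; the snapshot is optional.
            original, snapshot = href, attributes.get("data-versionurl")
        self.links.append(
            {"original": original, "snapshot": snapshot, "date": version_date}
        )


def extract_robust_links(html: str) -> List[Dict[str, Optional[str]]]:
    parser = RobustLinkParser()
    parser.feed(html)
    return parser.links
```

Whichever pattern is used, the result is the same triple, so downstream code can try the snapshot first and fall back to a date-based lookup of the original URI.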

Robust Links Using Link Decorations, JavaScript, Memento API

Demo - http://robustlinks.mementoweb.org/demo/uri_references_js.html

robustlinks.js - https://github.com/mementoweb/robustlinks

Activate Robust Links

Currently there are no Link Decorations, but there is an article publication date:

• Express the article publication date in an actionable manner (‘datePublished’ or ‘dateModified’ Schema.org properties) in HTML pages that contain URI references

• Tailor robustlinks.js to exclude links to articles

• Inject robustlinks.js in HTML pages that contain URI references
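The logic behind these steps can be sketched outside the browser. The example below is not robustlinks.js: it is a Python illustration that reads a `datePublished` value from a `<meta itemprop="datePublished" content="...">` element, skips links to articles, and builds a date-based archive URI for every other link. The set of article hosts and the Time Travel URI pattern are assumptions, and `PageScanner` and `memento_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlsplit

# Assumption: links to these hosts point at articles and are excluded.
ARTICLE_HOSTS = {"doi.org", "dx.doi.org"}
# Assumption: the Memento Time Travel service resolves <prefix><date>/<URI>.
MEMENTO_PREFIX = "http://timetravel.mementoweb.org/memento/"


class PageScanner(HTMLParser):
    """Collect the publication date and all link targets of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.date_published: Optional[str] = None
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("itemprop") == "datePublished":
            self.date_published = attributes.get("content")
        elif tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def memento_links(html: str) -> Dict[str, str]:
    """Map each web-at-large link to a date-based archive URI."""
    scanner = PageScanner()
    scanner.feed(html)
    if scanner.date_published is None:
        return {}
    stamp = scanner.date_published[:10].replace("-", "")  # YYYY-MM-DD to YYYYMMDD
    result = {}
    for href in scanner.hrefs:
        parts = urlsplit(href)
        if parts.scheme in ("http", "https") and parts.hostname not in ARTICLE_HOSTS:
            result[href] = MEMENTO_PREFIX + stamp + "/" + href
    return result
```

A production filter for article links would need a more reliable signal than a host list, for example the publisher's own reference metadata.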

Users Follow Robust Links into Web Archives

The combination of the referenced URI and the article publication date:

• Leads users to a snapshot in a web archive, created as close as possible to the article publication date

• Addresses link rot

• Addresses content drift

Create Archive Copies

When ingesting new content into the platform:

• Parse for URI references

• Create snapshots in web archives of select URIs

• For these URIs, use Link Decorations in HTML to convey the original URI, the snapshot URI and the snapshot Date/Time
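The final ingest step, adding Link Decorations for URIs that have been snapshotted, might look like the sketch below. It is deliberately simplified: it assumes anchors are written as `<a href="...">` with `href` as the first attribute, and that a mapping from original URI to snapshot URI and date has already been produced by the snapshot step. `decorate_links` is a hypothetical name, and a real platform would use a proper HTML parser.

```python
import re
from html import escape
from typing import Dict, Tuple

# Simplifying assumption: href is the first attribute and is double-quoted.
_ANCHOR = re.compile(r'<a\s+href="([^"]+)"')


def decorate_links(html: str, snapshots: Dict[str, Tuple[str, str]]) -> str:
    """Add data-versionurl and data-versiondate to snapshotted links.

    snapshots maps an original URI to (snapshot URI, snapshot date).
    Links without a snapshot are left untouched.
    """
    def add_decorations(match):
        original_uri = match.group(1)
        if original_uri not in snapshots:
            return match.group(0)
        snapshot_uri, snapshot_date = snapshots[original_uri]
        return (
            f'{match.group(0)}'
            f' data-versionurl="{escape(snapshot_uri, quote=True)}"'
            f' data-versiondate="{escape(snapshot_date, quote=True)}"'
        )

    return _ANCHOR.sub(add_decorations, html)


print(decorate_links(
    '<a href="http://www.stanford.edu">Stanford</a>',
    {"http://www.stanford.edu": ("http://archive.is/FAy6o", "2014-08-15")},
))
```

The `href` is left pointing at the original URI, so readers still reach the live resource by default and the decorations only add alternatives.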

Users Follow Robust Links into Web Archives

The Link Decorations:

• Lead users to the created snapshot, if the web archive is operational

• Lead users to a snapshot in any web archive, created as close as possible to the snapshot Date/Time

• Address link rot

• Address content drift


http://hiberlink.org #hiberlink