Peter Burnhill, Director of EDINA, University of Edinburgh


Transcript of Peter Burnhill, Director of EDINA, University of Edinburgh

Page 1: Peter Burnhill, Director of EDINA, University of Edinburgh

Threats to the integrity of evidence in the record of scholarship

Peter Burnhill

EDINA University of Edinburgh

Page 2: Peter Burnhill, Director of EDINA, University of Edinburgh

Focus on two unintended consequences of the Web/Internet

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

① Digital back copy is not in the custody of libraries

Libraries boast of ‘e-collections’, but they only have ‘e-connections’

Caroline Brazier, Chief Librarian, British Library

Page 3: Peter Burnhill, Director of EDINA, University of Edinburgh

Scholarly Articles increasingly link to the Web-at-large, not just back to other Articles

[Figure: panels for arXiv and PMC; dark solid lines represent URIs to the Web-at-large, from 1997/2011]

② Links to web resources suffer ‘Reference Rot’, a combination of two factors:

Link Rot: link stops working, e.g. HTTP 404 “Not Found”

+ Content Drift: what is at the end of the URI has changed, or gone!

Page 4: Peter Burnhill, Director of EDINA, University of Edinburgh

To counter these threats to our scholarship, focus on two initiatives

1. Ensuring access, over the long term, to online journals

– Keepers Registry: a Jisc-funded service at EDINA

2. Remedy for reference rot: so what is cited is not lost

– Hiberlink: an Andrew Mellon-funded project at EDINA

What’s this got to do with the REF …?

Page 5: Peter Burnhill, Director of EDINA, University of Edinburgh

Evidence base for REF 2014: four individual pieces of research output for judgment in 36 Units of Assessment (UoAs): a grade point average for each institution

What of the evidence behind this: has to be reckoned as important?

The value of Research Power for each HEI is shown as a dot plot for REF2008 & REF2014

[Figure: dot plot of Research Power by HEI; labelled institutions for REF2014 include Cambridge; UCL, Oxford; Edinburgh; KCL, Nottingham; Imperial, Bristol, Leeds; Manchester]

Research Power: ‘a measure of quality multiplied by volume’ [the REF grade point average x FTE]
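The definition above is a simple product, which can be sketched as follows. The function name and the two institutions are hypothetical, and the figures are made up for illustration; they are not REF results.

```python
def research_power(gpa: float, fte: float) -> float:
    """Research Power as defined on the slide: REF grade point average x FTE."""
    return gpa * fte


# Made-up figures for illustration only (not REF results).
small_high_quality = research_power(gpa=3.0, fte=200.0)
large_lower_quality = research_power(gpa=2.5, fte=400.0)

# Volume can outweigh a higher grade point average.
print(small_high_quality, large_lower_quality)
```

Because the measure multiplies quality by volume, a larger submission with a lower grade point average can still score higher on Research Power.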

Page 6: Peter Burnhill, Director of EDINA, University of Edinburgh

“The Scholarly Record has a fuzzy edge”

Much of what is in the Scholarly Record (& submitted as evidence to the REF) is digital … and online somewhere

• ‘e-journals’
• conference proceedings
• ‘book-length work’
• ‘Gov Docs’
• ‘e-magazines’
• ‘e-newsmedia’
• ‘data as findings’
• Websites, Databases, Repositories
• New ‘research objects’

Page 7: Peter Burnhill, Director of EDINA, University of Edinburgh

Good News: we have some digital shelving

• Web-scale not-for-profit archiving agencies …

• National institutions (usually national libraries) …

• Consortia of university libraries & specialist centres …

[Logos shown include: National Science Library, Chinese Academy of Sciences; Private LOCKSS Networks]

Page 8: Peter Burnhill, Director of EDINA, University of Edinburgh

… and you can now discover who is looking after what.

Page 9: Peter Burnhill, Director of EDINA, University of Edinburgh

We can derive two Key Performance Indicators (KPIs)

‘Ingest Ratio’ = titles ‘ingested & archived’ by 1+ Keeper / ‘online serials’ in ISSN Register

‘KeepSafe Ratio’ = titles ingested by 3+ Keepers / ‘online serials’ in ISSN Register
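A minimal sketch of the two KPIs, assuming we already have, for each online serial in the ISSN Register, the number of Keepers that have ingested it. The function name, the input shape and the sample ISSNs are assumptions for illustration; this is not the Keepers Registry's own code or data.

```python
from typing import Mapping


def keeper_ratios(keeper_counts: Mapping[str, int]) -> tuple[float, float]:
    """Return (ingest_ratio, keepsafe_ratio) as percentages.

    keeper_counts maps the ISSN of every online serial (the denominator)
    to the number of Keepers known to have ingested that title.
    """
    total = len(keeper_counts)
    if total == 0:
        raise ValueError("no online serials supplied")
    # Ingest Ratio numerator: titles held by at least one Keeper.
    ingested = sum(1 for n in keeper_counts.values() if n >= 1)
    # KeepSafe Ratio numerator: titles held by three or more Keepers.
    keepsafe = sum(1 for n in keeper_counts.values() if n >= 3)
    return 100.0 * ingested / total, 100.0 * keepsafe / total


# Made-up ISSNs and counts, for illustration only.
sample = {"0000-0001": 0, "0000-0002": 1, "0000-0003": 3, "0000-0004": 5}
print(keeper_ratios(sample))
```

In the sample, three of four titles are held by at least one Keeper and two by three or more.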

Page 10: Peter Burnhill, Director of EDINA, University of Edinburgh

Big Variation in Archival Status of Online Continuing Resources (assigned ISSN) by Country, July 2015

ISSN Count  Country        Ingest Ratio %  KeepSafe Ratio %  Archival status unknown
31757       USA            34.1            6.9               20911
14569       UK             44.6            20.2              8066
12118       France         4.8             1.4               11538
8655        Canada         7.7             0.2               7988
7189        Brazil         0.8             0.2               7130
7121        Germany        26.6            10.0              5228
6556        Spain          4.0             0.5               6296
5411        Netherlands    68.0            48.6              1729
5248        India          6.7             1.9               4899
5078        Australia      4.6             1.4               4846
4955        International  2.2             0.4               4847
3908        Finland        0.6             0.1               3884
3576        Italy          5.8             1.1               3368
3456        Denmark        2.1             0.4               3383
2700        New Zealand    4.0             0.1               2591
2693        Poland         8.8             0.9               2457
2251        Romania        3.6             0.0               2169
2187        Japan          6.3             3.8               2050
2153        Czech Rep      2.7             0.1               2094
2070        Russia Fed     6.8             5.5               1929
1991        Norway         2.0             0.2               1950
1769        Argentina      1.1             0.8               1749
1688        Switzerland    15.8            3.7               1421
1627        Hungary        4.6             0.6               1553
1224        Slovenia       1.06            0.00              1211
1149        Croatia        1.74            0.00              1129
1092        Egypt          59.89           3.85              438
1071        Korea S        6.44            3.45              1002
1053        Iran           1.52            0.47              1037
1015        Sweden         10.25           0.89              911
            Others (< 1,000 each)
165,949     Total

[Chart: Ingest Ratio for Top 10 of ISSN assigned]

Page 11: Peter Burnhill, Director of EDINA, University of Edinburgh

What of the articles in journals and other serials that were submitted to REF2014?

• Articles were > 80% of all submissions

Page 12: Peter Burnhill, Director of EDINA, University of Edinburgh

For some Panels, books and other types of output may be important, but journal articles are also important!

Page 13: Peter Burnhill, Director of EDINA, University of Edinburgh

Is the content of those journals being kept safe for future scholarship?

Estimate the Ingest Ratio and KeepSafe Ratio for each Unit of Assessment:

1. identify the e-journals, using ISSN in the metadata for the articles submitted & the ISSN Register

2. check archival status in the Keepers Registry

3. compute the percentage of journals with at least some volumes kept on the ‘digital shelves’ by an organisation with archival intent

(an idea from Steven Carlyle-Davies, EDINA)
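The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run at EDINA: the function name, the (UoA, ISSN) input shape, the set standing in for the ISSN Register and the Keeper counts standing in for the Keepers Registry are all hypothetical, as is the sample data.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def ratios_by_uoa(
    submissions: Iterable[tuple[str, str]],
    online_issns: set[str],
    keeper_counts: Mapping[str, int],
) -> dict[str, tuple[float, float]]:
    """Return {UoA: (ingest_ratio %, keepsafe_ratio %)} over distinct e-journals."""
    # Step 1: identify the e-journals per UoA via the ISSN of each article.
    journals: dict[str, set[str]] = defaultdict(set)
    for uoa, issn in submissions:
        if issn in online_issns:
            journals[uoa].add(issn)

    result = {}
    for uoa, issns in journals.items():
        # Step 2: look up archival status (number of Keepers) for each journal.
        counts = [keeper_counts.get(issn, 0) for issn in issns]
        # Step 3: percentage held by 1+ Keeper and by 3+ Keepers.
        ingest = 100.0 * sum(1 for n in counts if n >= 1) / len(counts)
        keepsafe = 100.0 * sum(1 for n in counts if n >= 3) / len(counts)
        result[uoa] = (ingest, keepsafe)
    return result


# Made-up identifiers, for illustration only.
articles = [("Law", "A"), ("Law", "B"), ("Law", "A"), ("Classics", "C"), ("Classics", "P")]
register = {"A", "B", "C"}           # "P" stands for a print-only title
keepers = {"A": 3, "B": 0, "C": 1}
print(ratios_by_uoa(articles, register, keepers))
```

Counting distinct journals rather than articles is a design choice of this sketch; weighting by the number of submitted articles would answer a slightly different question.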

Page 14: Peter Burnhill, Director of EDINA, University of Edinburgh

Many of the Journals used in the REF are not known to be archived => they are at risk of loss

Varies by Panel

[Chart: Archiving of Journals in REF, by Unit of Assessment; labelled Units include Law and Classics]

Page 15: Peter Burnhill, Director of EDINA, University of Edinburgh

The big publishers are paying to be archived, by CLOCKSS & Portico

[Chart labels: Elsevier; Hindawi; T&F, OUP, etc; Wiley etc; Springer; Karger]

Page 16: Peter Burnhill, Director of EDINA, University of Edinburgh

BIG publishers have acted but incompletely

Very many ‘at risk’ e-journals from many (small & not so small) publishers

This includes the ‘applied literature’ that has societal impact

Page 17: Peter Burnhill, Director of EDINA, University of Edinburgh

② That other unintended consequence of the Web

Reference Rot = Link Rot + Content Drift

“when links to web resources no longer point to what was intended”

Funding: Andrew W. Mellon Foundation

What of the citations that act as evidence for those articles submitted to the REF?

Page 18: Peter Burnhill, Director of EDINA, University of Edinburgh

Link Rot

‘Link Rot’ is known to be scary

Page 19: Peter Burnhill, Director of EDINA, University of Edinburgh

Content Drift may be even scarier! When what is at the end of the cited URL has changed, or gone!!

[Screenshots: http://dl00.org as it appeared in 2000, 2004, 2005 and 2008]

(a) Dynamic content, as values on a webpage change over time

(b) Static content but very different (often unrelated) web pages

Page 20: Peter Burnhill, Director of EDINA, University of Edinburgh

Hiberlink analysed 1 million URI links to the Web-at-large, not links to publisher & access platforms (DOI etc)

Methodology: answer to 2 questions

1. Do those links (URIs) still work on the ‘Live Web’?

2. Is there a ‘Memento’ of that reference in the ‘Archived Web’?

If a Memento cannot be found in a Web Archive within N days of the date of publication, but the URI is still active, then there is a risk of loss (& rot)

If a Memento cannot be found in a Web Archive within N days of the date of publication, and the URI is not active on the Live Web, then it is lost / rotten
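The two rules above amount to a small decision procedure, sketched here. The function name, the labels and the inputs are assumptions for illustration; the sketch takes the answers to the two questions as given and does not query the live Web or any Web Archive.

```python
from datetime import date
from typing import Optional


def classify_reference(
    published: date,
    live_now: bool,
    nearest_memento: Optional[date],
    window_days: int,
) -> str:
    """Classify one cited URI as 'archived', 'at risk' or 'lost'.

    live_now        : answer to question 1 (does the URI still work?)
    nearest_memento : date of the closest archived snapshot, or None
    window_days     : the N of the slide (days either side of publication)
    """
    archived = (
        nearest_memento is not None
        and abs((nearest_memento - published).days) <= window_days
    )
    if archived:
        return "archived"
    # No Memento close enough to the publication date.
    return "at risk" if live_now else "lost"


pub = date(2014, 1, 1)  # hypothetical publication date
print(classify_reference(pub, True, None, 14))
print(classify_reference(pub, False, None, 14))
print(classify_reference(pub, False, date(2014, 1, 10), 14))
```

Treating "within N days" as a window on either side of the publication date is an assumption of this sketch.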

Page 21: Peter Burnhill, Director of EDINA, University of Edinburgh

Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253

Hiberlink Results: within 14 days of publication date …

                               PMC     Elsevier
‘Not Archived’                 74.5%   75.2%
Of those ‘Not Archived’:
  still ‘Live’ on the Web      80%     67.3%
  ‘No longer Live’ on the Web  20%     32.7%

1/5th & 1/3rd of articles have Reference Rot within a fortnight of publication

Most referenced URIs at risk of loss

Team at Harvard Law School establishing similar evidence:

“We documented a serious problem of reference rot:

• more than 70% of the URLs within the above mentioned [law] journals, and

• 50% of the URLs within U.S. Supreme Court opinions suffer reference rot

— meaning, again, that they do not produce the information originally cited.”

Jonathan Zittrain, Kendra Albert and Lawrence Lessig (2014).

Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations.

Legal Information Management 14. doi:10.1017/S1472669614000255.

Page 22: Peter Burnhill, Director of EDINA, University of Edinburgh

=> Content of All Citations Rot over Time!!

… leading to rotten references for the reader

Page 23: Peter Burnhill, Director of EDINA, University of Edinburgh

Rot in References means a Defective Article!

undermines the integrity of the scholarly record

Page 24: Peter Burnhill, Director of EDINA, University of Edinburgh

Hiberlink Remedy

As with fish, ‘Quick Freeze & Store’

• Snapshot & Save: Proactive / Transactional archiving

• Turn a simple URI into a hiberlink URI

Snapshot URI + Original URI + DateTime [Robust Link syntax] http://robustlinks.mementoweb.org/spec/
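As a rough illustration of combining the three parts, the sketch below builds an HTML anchor carrying the original URI, the snapshot URI and the datetime. The attribute names `data-originalurl`, `data-versionurl` and `data-versiondate` follow my reading of the Robust Links specification linked above and should be checked against it; the function name, the example URIs and the archive host are hypothetical.

```python
from html import escape


def robust_link(original_uri: str, snapshot_uri: str, version_date: str, text: str) -> str:
    """Return an HTML anchor that keeps the original URI, a snapshot URI
    and the date of linking together (attribute names assumed from the
    Robust Links spec)."""
    return (
        '<a href="{orig}" data-originalurl="{orig}" '
        'data-versionurl="{snap}" data-versiondate="{when}">{label}</a>'
    ).format(
        orig=escape(original_uri),   # escape() also escapes quotes by default
        snap=escape(snapshot_uri),
        when=escape(version_date),
        label=escape(text),
    )


# Hypothetical URIs, for illustration only.
link = robust_link(
    "http://www.example.org/report",
    "https://archive.example.org/20150121/http://www.example.org/report",
    "2015-01-21",
    "the cited report",
)
print(link)
```

If the live page later rots or drifts, a reader or tool can fall back to the snapshot URI, or use the date to look for a Memento in another archive.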

No time to say more, so go to http://hiberlink.org

Page 25: Peter Burnhill, Director of EDINA, University of Edinburgh

Looking to the next REF: Open Access & Impact

Open Access has also to mean Assured Access:

– “infrastructure … for the curation, integration, discovery, presentation and preservation of digital collections”

1. Only accept articles that are on digital shelves under policy control of research libraries?

– What is paid for as open access should be kept safe

• Check theKeepers.org but also a role for repositories

2. Only accept articles with citations to Web resources that have been archived & accessible to the reader?

• Check Hiberlink.org

- delivering capability & stewardship, nationally & internationally

- part of University of Edinburgh’s commitment to the Sector