A collaborative approach to "filling the digital preservation gap" for Research Data Management

A collaborative approach to “filling the digital preservation gap” for Research Data Management. Jenny Mitcham, University of York; Chris Awre, University of Hull; Sarah Romkey, Artefactual Systems. 9 November 2015.


Page 1: A collaborative approach to "filling the digital preservation gap" for Research Data Management

Jenny Mitcham, University of York

Chris Awre, University of Hull

Sarah Romkey, Artefactual Systems

9 November 2015

Page 2

Research at Hull and York

• Hull:
– 6 Faculties, 25+ academic departments
– c. 22,000 students
– 62% of research classed as 3* or 4* in REF 2014
– In top 50 UK institutions by ‘research power’

• York:
– 30+ academic departments
– c. 16,000 students
– Ranked in the top ten of UK universities for research council income (THE)
– Secured £46 million in research council income in 2014/15

Page 3

Filling the digital preservation gap: Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

Page 4

What is Archivematica?

● Free and open-source digital preservation system (AGPLv3) designed to maintain standards-based, long-term access to digital objects

● Allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model

● Implements format normalization upon ingest and preserves originals to support emulation and migration strategies

Page 5

What is Archivematica?

● Archivematica is a processing pipeline consisting of a bundle of open-source tools and Python scripts which deliver a series of preservation micro-services

● Archivematica is designed to output high-quality, standards-compliant Archival Information Packages (AIPs)
– BagIt, METS, PREMIS
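The BagIt packaging named on this slide can be illustrated with a simple fixity check. This is a minimal sketch, assuming a BagIt-style layout (a `data/` payload directory plus a `manifest-sha256.txt` file of checksum and path pairs). The function names `file_checksum`, `verify_bag` and `make_demo_bag`, and the sample file, are hypothetical and are not part of Archivematica.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir: Path, algorithm: str = "sha256") -> dict:
    """Compare each payload file against the bag's manifest.

    Returns a mapping of relative path -> True/False (checksum matches).
    """
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, relative_path = line.split(maxsplit=1)
        target = bag_dir / relative_path
        results[relative_path] = (
            target.is_file() and file_checksum(target, algorithm) == expected
        )
    return results


def make_demo_bag(root: Path) -> Path:
    """Build a tiny, hypothetical bag so the check can be run end to end."""
    bag_dir = root / "demo-aip"
    payload = bag_dir / "data" / "objects"
    payload.mkdir(parents=True)
    sample = payload / "readings.csv"
    sample.write_text("sample,value\na,1\n", encoding="utf-8")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        f"{file_checksum(sample)}  data/objects/readings.csv\n",
        encoding="utf-8",
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(verify_bag(make_demo_bag(Path(tmp))))
```

Running the same check at intervals is one way to detect silent corruption in stored packages. A real AIP also carries METS and PREMIS metadata, which this sketch does not attempt to read.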

Page 6

How is Archivematica used?

• For long-term preservation of born-digital and digitized material
• To unpack and preserve contents of disk images
• To create access copies for AtoM, CONTENTdm, other access systems
• As a “dark archive” for DSpace content
• Research data preservation
– => Jisc Research Data Spring project

Page 7

Archivematica development partners

and more!

Page 8

This is a collaboration

University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist

University of York:
• Julie Allinson – Manager, Digital York
• Jen Mitcham – Digital Archivist

Artefactual Systems

Jisc - part of Research Data Spring

Page 9

Project structure

• Phase 1 – explore: testing, research, thinking; produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proofs of concept at York and Hull (6 months)

Page 10

Why do we need digital preservation?

Page 11

Why do we need digital preservation for research data?

• We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously

• Funder requirements around retention:
– NERC - data should be retained for a minimum of 10 years but for projects of major importance this may need to be 20 years or longer
– STFC - expect data to be retained for a minimum of 10 years and data that cannot be re-measured should be retained indefinitely
– Wellcome Trust – expect data to be kept for a minimum of 10 years but suggest longer periods for certain types of data
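The retention minimums above translate into a simple date calculation. This is a minimal sketch, assuming the retention period is counted from the project end date (funders may count from a different event) and using only the 10-year minimums quoted on this slide. The names `MINIMUM_RETENTION_YEARS` and `earliest_review_date` are hypothetical.

```python
from datetime import date

# Minimum retention periods as quoted on the slide; longer periods may apply.
MINIMUM_RETENTION_YEARS = {"NERC": 10, "STFC": 10, "Wellcome Trust": 10}


def earliest_review_date(project_end: date, funder: str) -> date:
    """Earliest date on which retained data could be reviewed for disposal.

    Assumption: the clock starts at the project end date.
    """
    years = MINIMUM_RETENTION_YEARS[funder]
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year,
        # so fall back to 1 March.
        return date(project_end.year + years, 3, 1)


if __name__ == "__main__":
    print(earliest_review_date(date(2015, 11, 9), "NERC"))
```

The result is a floor rather than a disposal date: data that cannot be re-measured, or projects of major importance, would need a longer or indefinite period.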

Page 12

Why do we need digital preservation for research data?

University of York RDM questionnaire 2013

• Which data management issues have you come across in your research over the last five years?
– “Inability to read files in old software formats on old media or because of expired software licences”
– 24% of 181 researchers who answered this question admitted this had been a problem for them

Page 13

What does research data look like?

York RDM questionnaire 2013: Please select the main types of electronic research data you generate

Page 14

Top research data applications at York

Page 15

What does research data look like?

York RDM questionnaire 2013: If your project is not yet complete, can you make an estimate of the ‘final’ size of your digital data?

Page 16

Why would we recommend Archivematica for RDM?

• It is flexible and can be configured in different ways for different institutional needs and workflows

• It allows many of the tasks around digital preservation to be carried out in an automated fashion

• It can be used alongside other existing systems as part of a wider workflow for research data

• It is a good digital preservation solution for those with limited resources

• It is an evolving solution that is continually driven and enhanced by and for the digital preservation community

• It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time

Page 17

What are the downsides?

• It isn’t a magic bullet
• There is no guarantee your data will be readable in the future
• It can only be as good as current digital preservation practice
• It can be fiddly to install correctly
• The GUI isn’t that intuitive
• You need staff who understand it

Page 18

Read all about it!

http://digital-archiving.blogspot.co.uk/

Page 19

How could you use Archivematica?

• Host it in-house and link it to an existing repository/access system (for example DSpace, CONTENTdm, Fedora/Hydra ... or a CRIS)

• Host it in-house and use as a standalone system (you would need to have a storage system in place and establish a way of facilitating access to the data)

• Sign up for a hosted instance of Archivematica with archivesDIRECT (combines Archivematica with DuraCloud storage)

• Sign up for a hosted instance of Archivematica with Arkivum (combines Archivematica with Arkivum storage)

Page 20

RDM Workflows at York

• We get a copy of data from a researcher
• We transfer it to Archivematica
• Archivematica packages it up for storage and creates the Archival Information Package (AIP)
• Archivematica sends the AIP to archival storage
• Metadata is published in data catalogue
• If someone requests the data Archivematica will create a Dissemination Information Package (DIP)
• DIP will be uploaded to Digital Library for access
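The shape of this workflow, with preservation always performed and the access copy built only on request, can be sketched in a few lines. This is an illustrative model only, assuming a simple in-memory record; it does not call Archivematica, and the names `Deposit`, `preserve` and `handle_request` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """Hypothetical record of one researcher's dataset moving through the workflow."""
    title: str
    files: list
    events: list = field(default_factory=list)


def preserve(deposit: Deposit) -> Deposit:
    """Steps up to archival storage and catalogue publication (always performed)."""
    deposit.events.extend([
        "data received from researcher",
        "transferred to Archivematica",
        "AIP created",
        "AIP sent to archival storage",
        "metadata published in data catalogue",
    ])
    return deposit


def handle_request(deposit: Deposit) -> list:
    """Access steps, run only when somebody asks for the data."""
    if "AIP sent to archival storage" not in deposit.events:
        raise ValueError("no AIP in storage; nothing to build a DIP from")
    deposit.events.extend(["DIP created", "DIP uploaded to Digital Library"])
    return list(deposit.files)  # stand-in for the access copies in the DIP


if __name__ == "__main__":
    deposit = preserve(Deposit("Soil survey", ["readings.csv"]))
    print(handle_request(deposit))
    print(deposit.events)
```

Deferring the DIP until a request arrives means access copies are not generated or stored for datasets that nobody asks for.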

Page 21

How do York plan to use Archivematica?

Page 22

Page 23

How can we improve Archivematica?

1. Enable better workflows for RDM (producing a DIP on request)
2. Allow the DIP (access copy of data) to be usable by different repository systems
3. Help reduce bottlenecks for big data
4. Workflows for unidentified files
5. Enable easier querying of data within Archivematica by third party applications
6. Better documentation

All are in progress in Phase 2 of the project
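To show what "unidentified files" means in item 4, here is a minimal sketch of signature-based identification that sets aside anything it cannot recognise. It assumes a three-entry table of well-known file signatures, chosen only for illustration; production tools rely on much larger signature registries. The names `SIGNATURES`, `identify` and `triage`, and the sample file names, are hypothetical.

```python
from typing import Optional

# Leading bytes ("magic numbers") for a few common formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP container",
}


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def triage(files: dict) -> dict:
    """Split {filename: leading bytes} into identified and unidentified groups."""
    report = {"identified": {}, "unidentified": []}
    for name, header in files.items():
        fmt = identify(header)
        if fmt is None:
            report["unidentified"].append(name)
        else:
            report["identified"][name] = fmt
    return report


if __name__ == "__main__":
    print(triage({
        "paper.pdf": b"%PDF-1.4 ...",
        "run42.dat": b"\x00\x01custom",
    }))
```

Research data often includes instrument or application-specific formats, so the unidentified list is where a workflow needs a human decision or a new signature.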

Page 24

Where to find out more

http://www.york.ac.uk/borthwick/

Page 25

Where to find out more

Page 26

Do talk to us if you are interested in finding out more about this project

Useful links:
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170