"Filling the Digital Preservation Gap" with Archivematica

“Filling the Digital Preservation Gap” with Archivematica Jenny Mitcham, Digital Archivist, University of York Simon Wilson, University Archivist, University of Hull 25 November 2015


Page 1: "Filling the Digital Preservation Gap" with Archivematica

“Filling the Digital Preservation Gap” with Archivematica

Jenny Mitcham, Digital Archivist, University of York
Simon Wilson, University Archivist, University of Hull

25 November 2015

Page 2: "Filling the Digital Preservation Gap" with Archivematica

Filling the digital preservation gap: Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

Page 3: "Filling the Digital Preservation Gap" with Archivematica

This is a collaboration
University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist
University of York:
• Julie Allinson – Manager, Digital York
• Jen Mitcham – Digital Archivist
Artefactual Systems
Jisc

Page 4: "Filling the Digital Preservation Gap" with Archivematica

Project structure
• Phase 1 – explore: testing, research, thinking; produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proofs of concept at York and Hull (6 months)

Page 5: "Filling the Digital Preservation Gap" with Archivematica

Why do we need digital preservation?

Page 6: "Filling the Digital Preservation Gap" with Archivematica

Why do we need digital preservation for research data?

• We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously

• Funder requirements around retention:

– NERC - data should be retained for a minimum of 10 years but for projects of major importance this may need to be 20 years or longer

– STFC - expect data to be retained for a minimum of 10 years and data that cannot be re-measured should be retained indefinitely

– Wellcome Trust – expect data to be kept for a minimum of 10 years but suggest longer periods for certain types of data
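The minimum periods above lend themselves to a small lookup. The following is a minimal Python sketch, not any funder's official rule: the function name is hypothetical, and it assumes the retention clock starts from a date you supply, which the slide does not specify. It only encodes the 10-year minimums; the longer periods for major or unrepeatable data still need a human decision.

```python
from datetime import date

# Minimum retention in years, as summarised on the slide above.
MIN_RETENTION_YEARS = {"NERC": 10, "STFC": 10, "Wellcome Trust": 10}


def minimum_retention_until(funder: str, start: date) -> date:
    """Return the earliest date the funder minimum would be met.

    `start` is an assumption: the slide does not say whether the clock
    runs from deposit, publication or project end.
    """
    target_year = start.year + MIN_RETENTION_YEARS[funder]
    try:
        return start.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return start.replace(year=target_year, day=28)
```

For example, `minimum_retention_until("NERC", date(2015, 11, 25))` gives 25 November 2025.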

Page 7: "Filling the Digital Preservation Gap" with Archivematica

University of York RDM questionnaire 2013

• Which data management issues have you come across in your research over the last five years?
– “Inability to read files in old software formats on old media or because of expired software licences”
– 24% of 181 researchers who answered this question admitted this had been a problem for them

Why do we need digital preservation for research data?

Page 8: "Filling the Digital Preservation Gap" with Archivematica

What does research data look like?

York RDM questionnaire 2013: Please select the main types of electronic research data you generate

Page 9: "Filling the Digital Preservation Gap" with Archivematica

Top research data applications at York

Page 10: "Filling the Digital Preservation Gap" with Archivematica

What does research data look like?

York RDM questionnaire 2013: If your project is not yet complete, can you make an estimate of the ‘final’ size of your digital data?

Page 11: "Filling the Digital Preservation Gap" with Archivematica

Value of research data

“There has probably been an awful lot of good data lost due to poor practice in archiving ...”

“Storing vast datasets which are not part of the final publication adds a lot of cost for very little benefit.”

“Unprocessed data is generally large and difficult to analyse, unless the analysis tools are provided in the archive.”

“I hope strongly that in the future I might contribute to a widely available repository for musical instruction/examples ....both for other players/composers and for musicological researchers.”

Researchers

Page 12: "Filling the Digital Preservation Gap" with Archivematica

Why Archivematica?

“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today.”

Page 13: "Filling the Digital Preservation Gap" with Archivematica

Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems

From https://www.archivematica.org/en/

Page 14: "Filling the Digital Preservation Gap" with Archivematica

What does Archivematica do?
The short answer:
“It packages data up in a standards compliant way and prepares it to be stored for the long term”

Page 15: "Filling the Digital Preservation Gap" with Archivematica

What does Archivematica do?
The longer answer:
• Assigns unique identifiers
• Creates a checksum for each object
• Creates a text file with a directory tree of the transfer
• Option to quarantine data for a specified period
• Runs virus checks
• Cleans up file and directory names (removing characters that may cause problems)
• Runs identification tools so you can find out what file formats you have
• Extracts data from zip files (or not if you would rather not)
• Extracts metadata embedded in the files (if you want)
• Normalises files (if a migration path exists)
• ...
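To make a few of the steps above concrete (checksums, the directory tree listing and name clean-up), here is a minimal Python sketch using only the standard library. It is not Archivematica's own code: the function names, the choice of SHA-256 and the set of "safe" characters are all assumptions made for illustration.

```python
import hashlib
import re
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Checksum one file, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(root: Path) -> dict:
    """Map each file's path (relative to the transfer root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def directory_tree(root: Path) -> str:
    """Plain-text listing of the transfer, indented two spaces per level."""
    lines = []
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts) - 1
        lines.append("  " * depth + path.name)
    return "\n".join(lines)


def clean_name(name: str) -> str:
    """Replace characters outside a conservative safe set (assumed here)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)
```

Re-running `checksum_manifest` at a later date and comparing the result with the stored one is the basic fixity check that the checksums make possible.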

Page 16: "Filling the Digital Preservation Gap" with Archivematica

What does Archivematica do?The really really long answer (if you have time):

• Read the manual: https://www.archivematica.org/en/docs/archivematica-1.4/

Page 17: "Filling the Digital Preservation Gap" with Archivematica

Why would we recommend Archivematica for RDM?

• It is flexible and can be configured in different ways for different institutional needs and workflows

• It allows many of the tasks around digital preservation to be carried out in an automated fashion

• It can be used alongside other existing systems as part of a wider workflow for research data

• It is a good digital preservation solution for those with limited resources

• It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time

Page 18: "Filling the Digital Preservation Gap" with Archivematica

…and don’t forget the community
• It is an evolving solution that is continually driven and enhanced by and for the digital preservation community
– Moving target…but moving in the right direction
– Some really interesting developments underway
– Engaged communities
• International community
• UK user group (includes National Library of Wales, Tate Britain, Museum of London, Arkivum, several HEIs and some European institutions)

Page 19: "Filling the Digital Preservation Gap" with Archivematica

What are the downsides?
• It isn’t a magic bullet
• There is no guarantee your data will be readable in the future
• It can only be as good as current digital preservation practice
• It can be fiddly to install correctly
• The GUI isn’t that intuitive
• You need staff who understand it

Page 20: "Filling the Digital Preservation Gap" with Archivematica

RDM Workflows at York
• We get a copy of data from a researcher
• We transfer it to Archivematica
• Archivematica packages it up for storage and creates the Archival Information Package (AIP)
• Archivematica sends the AIP to archival storage
• Metadata is published in data catalogue
• If someone requests the data Archivematica will create a Dissemination Information Package (DIP)
• DIP will be uploaded to Digital Library for access
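The workflow above has two distinct halves: ingest always happens, while the DIP is only produced when someone asks for the data. A toy Python model of that split is sketched below. It calls no real Archivematica API; the class, function names and log messages are hypothetical and exist only to show the ordering of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record tracking one dataset through the workflow."""
    title: str
    log: list = field(default_factory=list)
    aip_stored: bool = False
    dip_published: bool = False


def ingest(dataset: Dataset) -> Dataset:
    """Steps that run for every deposit, in the order listed above."""
    dataset.log.append("received from researcher")
    dataset.log.append("transferred to Archivematica")
    dataset.log.append("AIP created")
    dataset.log.append("AIP sent to archival storage")
    dataset.aip_stored = True
    dataset.log.append("metadata published in data catalogue")
    return dataset


def handle_request(dataset: Dataset) -> Dataset:
    """Steps that run only when someone requests the data."""
    if not dataset.aip_stored:
        raise ValueError("no AIP in storage; ingest the dataset first")
    dataset.log.append("DIP created on request")
    dataset.log.append("DIP uploaded to Digital Library")
    dataset.dip_published = True
    return dataset
```

Keeping DIP creation out of `ingest` mirrors the point of the workflow: access copies are made on demand rather than for every dataset.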

Page 21: "Filling the Digital Preservation Gap" with Archivematica

How do York plan to use Archivematica?

Page 22: "Filling the Digital Preservation Gap" with Archivematica

Page 23: "Filling the Digital Preservation Gap" with Archivematica

How can we improve Archivematica?
1. Enable better workflows for RDM (producing a DIP on request)
2. Allow the DIP (access copy of data) to be usable by different repository systems
3. Help reduce bottlenecks for big data
4. Provide workflows for unidentified files
5. Enable easier querying of data within Archivematica by third party applications
6. Provide better documentation

All are in progress in Phase 2 of the project

Page 24: "Filling the Digital Preservation Gap" with Archivematica

Archivematica development partners

and more!

Page 25: "Filling the Digital Preservation Gap" with Archivematica

Read all about it!

http://digital-archiving.blogspot.co.uk/

Page 26: "Filling the Digital Preservation Gap" with Archivematica

Do talk to us if you are interested in finding out more about this project

Useful links:
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170