Digital Curation @ McGill Jenn Riley Associate Dean, Digital Initiatives McGill University Library.

Digital Curation @ McGill

Jenn Riley, Associate Dean, Digital Initiatives, McGill University Library

What are we responsible for curating?

Primarily digitized versions of analogue rare/unique/archival/valuable materials

Close collaborations with Rare Books & Special Collections, and McGill University Archives

Currently working on collection prioritization and scaling up production

Digitized content

Institutional mandate

Determined by records retention schedule

Involves transfer of selected records from originating departments after the end of their immediate life

Archival appraisal practices determine what to keep

Born digital university records of long term value

Fonds of personal papers and organizational records

The same types of documents we’ve already collected in paper form

Examples of paper archival fonds from the past: Montréal Natural History Society James McGill MS 435 George Mercer Dawson (President of Royal Society of Canada) Harvey Cushing Fonds (William Osler biographer)

Born digital archival materials

E.g., digital art and digital humanities projects

Often more like software than a set of standalone files

High risk of loss compared to analogue ancestors

Born digital creative content

Typically we only have remote access and are not directly responsible for curation

In some cases we must deliver ourselves rather than rely on the vendor

And in these cases, we take on curation responsibility

Some licensed/purchased digital content

McGill students are required to deposit master's and doctoral theses and to sign a non-exclusive license to disseminate them

Policy allows students to request a 1-year embargo

Students retain copyright

McGill does not contract with ProQuest for ETD delivery and preservation

McGill participates in Theses Canada

Some courses show interest in pushing student work to eScholarship@McGill

ETDs and other student work

Supporting “green OA”

In fulfillment of funder mandates

Or voluntarily

Still not heavily used, at McGill or elsewhere

No serious discussion yet at McGill about a campus mandate

Expecting Canadian Tri-Council OA mandate beginning May 1, 2015

Pre-prints/post-prints

BIG new focus

Studies show significant loss of data sets over time

Odds of data supporting a paper being extant fall by 17% per year (Vines et al. 2014; doi:10.1016/j.cub.2013.11.014)

Some studies show a citation advantage for papers with open data

30% for papers published in 2004 and 2005 (Piwowar and Vision, 2013; doi:10.7717/peerj.175)

Expecting Canadian Tri-Council data management planning requirements in 2015/2016

Research data
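To make the 17%-per-year figure concrete, here is a minimal sketch of the arithmetic. It treats the decline as a constant multiplicative drop in the odds (not the probability) each year. The starting probability is a made-up illustration value, not a number from Vines et al., and the function name is hypothetical.

```python
def probability_extant(initial_probability: float,
                       years: float,
                       annual_odds_decline: float = 0.17) -> float:
    """Probability that a data set still exists after `years` years.

    Assumes the odds shrink by `annual_odds_decline` every year, the
    figure quoted above. `initial_probability` is an assumed starting
    point chosen by the caller, not a value taken from the study.
    """
    if not 0.0 < initial_probability < 1.0:
        raise ValueError("initial_probability must be strictly between 0 and 1")
    # Convert probability to odds, apply the yearly decline, convert back.
    initial_odds = initial_probability / (1.0 - initial_probability)
    odds = initial_odds * (1.0 - annual_odds_decline) ** years
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting point: 90% of data sets available at publication.
    for years in (0, 2, 5, 10, 20):
        share = probability_extant(0.90, years)
        print(f"after {years:2d} years: {share:.1%} still extant")
```

Because the decline applies to odds, the probability drops slowly at first when availability is high and faster once it nears 50%.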

How do we curate it?

As difficult as any other step

Luckily, it’s not an all or nothing proposition

Some areas we’re pretty good at (ETDs, digitized collections)

Others we try but with limited success (pre-prints/post-prints)

Others are brand new to us (research data, born digital archival materials, born digital creative content)

This is difficult!

Determine what’s worth keeping

Create/map metadata

Responsibility to handle personally identifiable information carefully

Processing and organizing

Digitization master files to NCS for storage

Backups of files/servers (digital collections, eScholarship@McGill, born digital university records)

Multiple copies including one off site

eScholarship is a “repository” but not a “preservation repository”

Reliance on external vendors (licensed content)

E.g., through LOCKSS

We run a LOCKSS node at McGill

And the stuff we’re not handling so well (born digital special collections/archival materials)

Several different approaches in place now

What about access?

Need better repositories

That handle common use cases

Hierarchical file structures

Paged objects

Display common file types in-browser

That are connected to preservation systems and manage content in them

How do we make this better?

Harder than the paper world!

What is “the long term”?

How long will universities exist in their current form?

How long will computers continue to function the way they do now?

How will metadata structures evolve over this period of time?

What does a “pay once” model for digital preservation look like?

What criteria do we use to determine the useful lifespan of a digital file?

It’s about policy as much as technology

How do our organizations set things up to ensure someone takes an active management role over time?

Yeah, this is hard

Standardize input file formats to the degree possible

Actively check file integrity

Refresh hardware frequently

Know what will need to be emulated, and what you can safely migrate

Partner!

Strategies
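As an illustration of the "actively check file integrity" strategy, here is a minimal fixity-checking sketch using only the Python standard library. It is not a description of McGill's actual tooling. The function names, the manifest layout, and the choice of SHA-256 are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under `root` (hypothetical layout:
    relative POSIX path -> digest)."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` against a stored manifest.

    Returns the relative paths that have gone missing and those whose
    current checksum no longer matches the recorded one.
    """
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for relative_name, expected in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file():
            report["missing"].append(relative_name)
        elif sha256_of(candidate) != expected:
            report["changed"].append(relative_name)
    return report
```

In practice the manifest would be created at ingest, stored separately from the files, and verified on a schedule against each copy, so that a damaged copy can be repaired from an intact one.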

Chronopolis @ UC San Diego

CLOCKSS

Portico

Héritage (Canadiana from CRKN)

Scholars’ Portal

HathiTrust

APTrust

DPN

Who’s doing this well?

And they all need funding to run

And our organizations pay the membership fees from our institutional budgets

How do we get them all to work together?

Committee on Coherence at Scale for Higher Education

That’s a lot of groups!

Biggest decision points

How many repositories and how they connect

Open source vs locally hosted vended vs cloud

Metadata issues

Keeping up with technological advancements

Technical

How many copies

What preservation actions are necessary

Who to partner with

In Canada or beyond?

Business planning

Policy

[email protected]

These presentation slides: http://www.jennriley.com/presentations/mcgillsis/15winter/curation.ppt

Thank you!