a centre of expertise in data curation and preservation

DigCCur2007 Symposium, Chapel Hill, N.C., April 18-20, 2007

Co-operation for digital preservation and curation: collaboration for collection development in institutional repository networks

Michael Day, Maureen Pennock and Julie Allinson

UKOLN, University of Bath, Bath BA2 7AY

[email protected]

http://www.ukoln.ac.uk/

Presentation outline

– Emerging work from the Digital Curation Centre
– Contexts
    • Collaborative infrastructures for digital preservation
    • Networks of institutional repositories
– Collaboration on preservation infrastructures
– Collaboration on collection development policies
    • Potential areas for collaboration
– Conclusions
    • What do digital curators do?
    • What do they need to know?

Contexts (1)

• Collaborative infrastructures needed for digital preservation and curation, e.g.:

• Preservation is "an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders" (Lavoie & Dempsey, 2004)

• Examples:
    – Shared services (e.g. file format registries, bit-level preservation)
    – Networks of "trust" (audit and certification, etc.)
    – Collaboration at policy level, e.g. on collection development and unified access

Contexts (2)

• Institutional repositories:
    – Used by higher education and research organisations to provide (open) access to peer-reviewed publications and other research materials
    – Increasingly supported by deposit "mandates" from universities or research funding bodies
    – Setting up a repository implies an institutional commitment to long-term stewardship

Contexts (3)

• Collaborative infrastructures for institutional repositories:
    – Distributed services linked (for access) by metadata harvesting
        • OAI-PMH
        • Data Providers vs. Service Providers (aggregators)
    – Potential for the development of shared services to support repositories

• Alma Swan & Chris Awre, Linking UK Repositories (JISC, 2006): http://www.jisc.ac.uk/
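
The harvesting model on this slide can be illustrated with a minimal Python sketch, in which a service provider builds an OAI-PMH ListRecords request and reads the record identifiers from a data provider's response. The base URL and the sample response are invented for illustration; only the verb, the metadataPrefix and from arguments, and the OAI-PMH 2.0 namespace come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag form.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Return the header identifiers found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iter(OAI_NS + "identifier")]


# Hypothetical, heavily trimmed response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:101</identifier></header></record>
    <record><header><identifier>oai:repository.example.org:102</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai", from_date="2007-01-01"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which this sketch leaves out.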

Contexts (4)

• Potential shared services (from Swan & Awre):
    – Advisory services (e.g. on IPR, preservation)
    – Content creation, digitisation
    – Repository building or hosting
    – Metadata enhancement
    – Resource discovery
    – Name authorities
    – Citation analysis and research assessment
    – Preservation

Digital preservation (1)

• Shared services for preservation:
    – Not all institutions with repositories will be expected to manage long-term preservation challenges:
        • Lack of local expertise and resources
        • Existing availability of third party services in related areas, e.g. data archives, national libraries
        • Preservation is a logical area for collaboration

Digital preservation (2)

• Examples:
    – DARE (Digital Academic Repositories) initiative - The Netherlands
        • National Library (KB) has responsibility for all content deposited in participating repositories
    – Repository Bridge project - UK
        • Demonstration of harvesting e-theses (using OAI-PMH and METS) by the National Library of Wales

Digital preservation (3)

• Examples (continued):
    – SHERPA DP project - UK
        • Developed disaggregated framework for outsourcing preservation, based on the OAIS model
        • Explored the packaging and transfer of content (using METS)
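
As a rough illustration of packaging content for transfer, the Python sketch below wraps a set of files in a minimal METS document with a file section and a structural map. It is not the SHERPA DP profile: the function name, the USE and TYPE values and the sample file are assumptions, and the output is not claimed to be schema-valid for any particular METS profile.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(files):
    """Wrap (file_name, mime_type, content_bytes) tuples in a minimal METS document."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="original")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", TYPE="item")
    for index, (name, mime_type, content) in enumerate(files, start=1):
        file_id = f"FILE{index}"
        # Record a checksum so the receiving service can verify the transfer.
        file_el = ET.SubElement(
            file_grp,
            f"{{{METS_NS}}}file",
            ID=file_id,
            MIMETYPE=mime_type,
            CHECKSUM=hashlib.sha256(content).hexdigest(),
            CHECKSUMTYPE="SHA-256",
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        # The structural map points back at each file by its ID.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return ET.tostring(mets, encoding="unicode")


# Hypothetical e-thesis file used only to exercise the function.
print(build_mets([("thesis.pdf", "application/pdf", b"%PDF-1.4 example")]))
```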

Digital preservation (4)

• Examples (continued):
    – Preserv project - UK
        • Led by University of Southampton
        • Simple model of modular services, e.g. for:
            – Bit-level preservation
            – Object characterisation and validation (e.g. using registries like PRONOM-DROID)
            – Preservation planning (risk assessments, technology watch, etc.)
            – Preservation strategies (e.g. migration)
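
Bit-level preservation is commonly supported by periodic fixity checks. The Python sketch below shows one possible form, assuming that a checksum was recorded for each file at deposit; the manifest structure and function names are hypothetical and are not taken from the Preserv project.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest):
    """Return the paths whose current checksum differs from the recorded one.

    `manifest` maps file paths to the checksum recorded at deposit.
    """
    return [
        path for path, recorded in manifest.items()
        if sha256_of(path) != recorded
    ]
```

A shared service could run such a check on a schedule and report any failures to the depositing repository, which then restores the file from a replica.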

Digital preservation (5)

Preserv service provider model (Hitchcock et al., 2007)

Collection development (1)

• Collection development:
    – Set of activities, including: selection, acquisition, deselection, disposal, preservation
    – A traditional focus of library collaboration, e.g. on the development of shared collections
    – Need for institutional repositories to consider their own collection development requirements within wider (national or international) contexts

Collection development (2)

• Managed collaboration on collection development
    – Potentially reduces unnecessary duplication of effort
    – But may also support redundancy:
        • Replication of content
        • Application of different preservation strategies
    – Need to investigate role of repositories with regard to more formally published research materials
        • Perhaps e-journals should be the main focus of preservation activities in this domain?

Collection development (3)

• Institutional repositories need to define collection development policies with regard to:
    – Institutional requirements
    – Interoperability requirements (e.g. OAI-PMH)
    – Preservation requirements

Collection development (4)

• Collection development issues:
    – Content types
        • Peer-reviewed research outputs, scientific datasets, administrative records, ...
        • There will be different preservation priorities
    – Object types (file formats)
        • Policies will have direct influence on risks (and costs) of long-term preservation, e.g.:
            – Accepting any format
            – Only accepting a limited number of format types (e.g. PDF/A, XML); need for conversion and validation tools, or considerable post-processing

Collection development (5)

• Potential areas for collaboration (continued):
    – Ingest workflows
        • Checking conformance with submission rules
        • Automated tools for format characterisation and validation, maybe conversion (normalisation)
        • Metadata enhancement, e.g. consistent forms of name
    – Ongoing review (and weeding) of collections
        • Withdrawal of content (contentious issue)
        • Superseded or duplicate material
    – Defining preservation service levels
        • Different policies needed for different types of material
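
An ingest check against a restricted-format policy might look like the Python sketch below. It is a deliberately simplified stand-in for registry-based tools such as PRONOM-DROID: it recognises only two leading byte signatures, cannot tell PDF/A from ordinary PDF, and misses XML files that lack a declaration. The accepted-format list and function names are hypothetical.

```python
# Hypothetical submission policy: only these formats are accepted as-is.
ACCEPTED_FORMATS = {"application/pdf", "application/xml"}

# Leading byte signatures mapped to MIME types (a tiny, illustrative subset).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"<?xml", "application/xml"),
]


def characterise(content):
    """Guess a MIME type from the leading bytes; None if unrecognised."""
    for magic, mime_type in SIGNATURES:
        if content.startswith(magic):
            return mime_type
    return None


def ingest_decision(content):
    """Return 'accept' for policy-conformant formats, otherwise 'refer'."""
    if characterise(content) in ACCEPTED_FORMATS:
        return "accept"
    return "refer"  # route to conversion (normalisation) or manual review


print(ingest_decision(b"%PDF-1.4 ..."))
print(ingest_decision(b"\xd0\xcf\x11\xe0 legacy office file"))
```

Referring unrecognised submissions rather than rejecting them reflects the trade-off on the slides: a stricter policy lowers long-term risk but moves conversion and validation effort to the point of ingest.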

Conclusions (1)

• What should curators do?
    – Collaborate with other stakeholders on:
        • Strategic level collaboration (e.g. through organisations like the UK Digital Preservation Coalition)
        • Policy development (e.g. through emerging national frameworks)
        • Research and development
        • Standards development (e.g., OAIS, ISO Records Management Metadata)
        • The development of shared services (e.g. GDFR)

Conclusions (2)

• What do curators need to know?
    – Where core services are dependent on other organisations (or services):
        • Need to understand the risks
        • Need to deal with these sensibly (e.g., through contracts, service-level agreements, or by moving the most vital functions in-house)
    – Many remaining open questions:
        • Be aware that there are still many unknown unknowns
        • But it is still important to do something (and to collaborate)