http://www.ukoln.ac.uk/ Digital preservation Michael Day UKOLN, University of Bath, UK m.day@ukoln.ac.uk University of Bristol, MSc in Library and Information Management, Unit 6A: Advanced Information Systems Bristol, 13th October 2004

http://www.ukoln.ac.uk/

Digital preservation

Michael Day

UKOLN, University of Bath, UK

m.day@ukoln.ac.uk

University of Bristol, MSc in Library and Information Management, Unit 6A: Advanced Information Systems

Bristol, 13th October 2004

Unit 6A: Advanced Information Systems, 13 October 2004

Session overview

• The digital preservation problem

• Preservation strategies

• Preservation metadata

– The OAIS model

• Non-technical issues

– collection management, legal issues, costs, …

• Case study: the World Wide Web

• Selected projects and initiatives

The digital preservation problem

Definitions (1)

• Preservation:
  – a management function
    • “Its objective is to ensure that information survives in usable form for as long as it is wanted” - John Feather (1991)
  – not primarily about:
    • conservation or restoration
    • backups or storage
    • concepts of “permanence”

Definitions (2)

• Digital preservation:
  – digital information is different
  – technical problems with ensuring continued access
  – but also a managerial problem

• “... the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable” - Margaret Hedstrom (1998)

Definitions (3)

• Potential confusion with:
  – “archiving”
    • a term used in some computing contexts for the creation of secure backup copies
  – “archives”
    • a well-understood term in archives and recordkeeping professions
    • but also used to refer to almost any collection of data
      – e.g., e-print archives, image archives, etc.

Definitions (4)

• Potential confusion (continued):
  – “digitisation”
    • especially where the motive for digitisation is the preservation of original items

Definitions (5)

• Digital Curation
  – New(ish) term, from science data world (e.g. bioinformatics)
  – UK Digital Curation Centre
  – Means preservation plus …
  – "The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse" - Philip Lord, et al. (2004)

Digital information (1)

• An increasing flood of data ...
• The Web
  – Billions of pages
  – Internet Archive - more than 1 Petabyte (and growing at about 20 Terabytes per month)
  – The "deep Web"
• Scientific data
  – Wellcome Trust Sanger Institute - manages several hundred Terabytes of data per year, growing exponentially (just one data centre)
  – Particle physics, Earth Observation and astronomy - e-Science projects expected to generate Petabytes of data per year (e.g., CERN's Large Hadron Collider = ca. 15 Petabytes or more)

Digital information (2)

• Sizes (broadly):

Kilobyte: 1,000 bytes

Megabyte: 1,000,000 bytes

Gigabyte: 1 billion bytes

Terabyte: 1,000 Gigabytes

Petabyte: 1,000 Terabytes

Exabyte: 1,000 Petabytes

Zettabyte: 1,000 Exabytes

Yottabyte: 1,000 Zettabytes
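The decimal units above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the growth calculation assumes a constant rate, taking the Internet Archive figure quoted earlier (about 20 Terabytes per month) at face value.

```python
# Decimal (power-of-1000) units, in the order listed on the slide.
UNITS = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_decimal_size(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


def months_to_add(extra_tb: float, growth_tb_per_month: float) -> float:
    """Months needed to add `extra_tb` Terabytes at a constant monthly growth rate (an assumption)."""
    return extra_tb / growth_tb_per_month


print(format_decimal_size(10**15))   # one Petabyte
print(months_to_add(1000, 20))       # months to add a further Petabyte at 20 TB per month
```

Under the constant-growth assumption, an archive growing by 20 Terabytes per month adds a further Petabyte in 50 months; exponential growth, as described for scientific data, would shorten that considerably.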

Digital preservation (1)

• Media issues:
  • currently magnetic or optical tape and disks
    – e.g., CD-ROM, DVD (optical), DAT, DLT (magnetic)
  • unknown lifetimes
    – but relatively short compared to paper or good quality microform
    – probably years rather than decades

• Format differences

Digital preservation (2)

• Media issues (continued):
  • technical solutions
    – longer lasting media:
      » e.g. Norsam's High Density Rosetta system - analogue storage on nickel plates
      » COM (output to good-quality microform)
      » Keeping paper copies!
    – periodic copying of data bits on to new media (refreshing) - data management solution, e.g. for hierarchical storage systems
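Refreshing, as described above, is a bit-level copy, so its success can be checked mechanically. The sketch below is one possible illustration in Python, assuming the old and new media are both mounted as ordinary file systems; the function names and the choice of SHA-256 are assumptions made for the example, not something the slides prescribe.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file to new media and confirm that the copied bits are identical."""
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"checksum mismatch after copying {source} to {target}")
    return after
```

Refreshing only guarantees that the bits survive. It does nothing about the hardware and software needed to interpret them.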

Digital preservation (3)

• Dependence on particular hardware and software:
  • the heart of the digital preservation problem
  • relatively short obsolescence cycle for:
    – hardware
      » e.g., BBC Domesday Project (1986) used a special type of videodisc player developed by Philips
    – software
      » e.g., word-processing files

http://www.atsf.co.uk/dottext/domesday.html

Digital preservation strategies

Preservation strategies

– Main proposed types:
  • technology preservation
  • emulation
  • migration
  • encapsulation
  • others ...

Technology preservation

• The preservation of an information object together with all of the hardware and software needed to interpret it
  – preserves the look and feel and behaviour of whole system
  – but will lead to museums of “ageing and incompatible computer hardware” - Mary Feeney (1999)
  – storage space, maintenance, costs ...
  – may have a short-term role in the rescue of digital objects (digital archaeology)

Emulation (1)

• Preserving the original application software and running it on emulators that mimic the behaviour of obsolete hardware and operating systems
  – preserves ‘look-and-feel’
  – may be useful where the digital object is complex (e.g. multimedia) or cannot easily be migrated
  – development of ‘virtual machines’ that would have to be migrated to work on different platforms (Jeff Rothenberg)

Emulation (2)

– strategy has been tested in:
  » Camileon project (JISC/NSF)
  » NEDLIB experiments (European national libraries)
  » National Library of the Netherlands
– requires the maintenance of a huge (and growing) amount of information about platforms and operating systems
– preserves the defects embedded in original software
– hard to know whether user experience has been accurately preserved

Migration (1)

• Managed transformations:
  – The periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996)
  – abandons attempts to keep old technology (or substitutes) working
  – a linear migration strategy is used by software vendors for some data types (e.g. Microsoft Excel files)

Migration (2)

– Migration can often be combined with some form of standardisation (e.g., on ingest)
  » ASCII
  » bit-mapped page images
  » well-defined XML formats
– Migration on Request
  » Camileon project proposal
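The idea behind migration on request, as generally described, is that the original byte stream is kept unchanged and a conversion tool is applied only when someone asks for the object. The Python sketch below illustrates that idea under stated assumptions: the format labels, the converter table and the `StoredObject` class are all hypothetical, and nothing here reproduces the Camileon project's actual tools.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]


def latin1_to_utf8(data: bytes) -> bytes:
    """Example converter: re-encode ISO 8859-1 text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Hypothetical table of conversion tools, keyed by (stored format, requested format).
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("text/latin-1", "text/utf-8"): latin1_to_utf8,
}


class StoredObject:
    """Holds the original bytes untouched; conversion happens only at access time."""

    def __init__(self, original: bytes, stored_format: str) -> None:
        self.original = original
        self.stored_format = stored_format

    def render(self, requested_format: str) -> bytes:
        if requested_format == self.stored_format:
            return self.original
        converter = CONVERTERS.get((self.stored_format, requested_format))
        if converter is None:
            raise LookupError(
                f"no converter from {self.stored_format} to {requested_format}"
            )
        return converter(self.original)
```

In this arrangement it is the converters, rather than the stored objects, that need maintaining as technology changes, and the original is never overwritten by a lossy conversion.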

Encapsulation

• Encapsulating the digital object with information on how it should be interpreted
  – self-describing objects
  – the principle underlying the OAIS reference model
  – can also support emulation or migration on demand strategies
  – examples:
    » Universal Preservation Format (UPF)
    » “Buckets” (NASA Langley Research Center)

Other strategies

• Digital archaeology
  – data recovery
  – time consuming process (expensive)
• “Persistent archives”
  – San Diego Supercomputer Center
  – research funded by NSF, DARPA, NARA
  – comprehensive strategy based on an information management architecture
  – infrastructure independent representations of digital objects (tagged in XML)
  – tested on an e-mail collection (Reagan Moore, et al., 2000)

Mixed strategies

– Preservation strategies are not in competition
  • different strategies can work together
  • but have implications for:
    – the technical infrastructure required (and metadata)
    – collection management priorities
      » e.g., encouraging the consistent use of standards (migration), the collection of software and documentation (emulation)
    – rights management
      » e.g., holding the rights to re-engineer software
    – costs

Preservation metadata

Preservation metadata (1)

• All digital preservation strategies depend - to some extent - on the creation, capture and maintenance of metadata
  – "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
• Defined as:
  – The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, 2001)

Preservation metadata (2)

• Metadata fulfil various roles, e.g.:
  – "… to find, manage, control, understand or preserve … information over time" (Cunningham, 2000)
  – Descriptive information; technical information about formats and structure; information about provenance and context; administrative information, e.g. for rights management
  – Current schemas are either very complex or provide only a basic framework (sometimes both!)
  – Perception that different strategies and objects will need different metadata

Preservation metadata - standards

– Developed from many different perspectives:
  • Digital libraries:
    – METS, NISO Z39.87 (to support digitisation initiatives)
    – OCLC/RLG Framework, Cedars, NEDLIB, NLA, NLNZ
    – OAIS influence has been greatest in this area
  • Records management and archival description:
    – Pittsburgh BAC, RKMS, NAA, VERS, PRO, EAD, etc.
– Also standards not specifically developed for preservation, but with some overlap:
  • Multimedia:
    – MPEG-7, SMPTE, etc.
  • Rights management:
    – <indecs>, MPEG-21, etc.

The OAIS model

– Reference Model for an Open Archival Information System (OAIS)
– ISO 14721:2003
– Established a common framework of terms and concepts
– Influential on the design of some schemas
  » e.g., OCLC/RLG Metadata Framework
– Identified basic functions:
  » Ingest, Data Management, Archival Storage, Administration, Access, Preservation Planning

OAIS functional model

[Figure: OAIS Functional Entities (Figure 4-1). Functional entities shown: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External entities: Producer, Consumer, Management. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]

OAIS information objects

• Information Object (basic concept)
  – Data Object (bit-stream)
  – Representation Information (permits “the full interpretation of Data Object into meaningful information”)
• Information Object Classes
  – Content Information
  – Preservation Description Information (PDI)
  – Packaging Information
  – Descriptive Information

OAIS information packages

• Information package:
  – Container that encapsulates Content Information and PDI
  – Packages for submission (SIP), archival storage (AIP) and dissemination (DIP)
    » AIP = “... a concise way of referring to a set of information that has, in principle, all of the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object”
  – PDI = other information (metadata) “which will allow the understanding of the Content Information over an indefinite period of time”
    » Reference, Provenance, Context, Fixity
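The package concepts above map naturally onto a small data structure. What follows is a rough Python sketch rather than anything defined by the OAIS standard: the class and field names, the use of SHA-256 for Fixity, and the `ingest` function are illustrative assumptions, and a real repository would record far more.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PDI:
    """Preservation Description Information, reduced to its four named parts."""
    reference: str                 # identifier for the content
    provenance: Tuple[str, ...]    # history of custody and processing events
    context: str                   # relationship of the content to other material
    fixity: str                    # SHA-256 digest of the content bytes (an assumption)


@dataclass(frozen=True)
class InformationPackage:
    kind: str                      # "SIP", "AIP" or "DIP"
    content: bytes                 # the Data Object (bit-stream)
    representation_info: str       # how to interpret the bit-stream
    pdi: PDI


def make_sip(content: bytes, representation_info: str,
             reference: str, context: str) -> InformationPackage:
    """Build a submission package, computing fixity over the content."""
    pdi = PDI(reference, ("submitted by producer",), context,
              hashlib.sha256(content).hexdigest())
    return InformationPackage("SIP", content, representation_info, pdi)


def ingest(sip: InformationPackage, event: str) -> InformationPackage:
    """Check fixity, then return an AIP whose provenance records the ingest event."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    if hashlib.sha256(sip.content).hexdigest() != sip.pdi.fixity:
        raise ValueError("fixity check failed")
    new_pdi = replace(sip.pdi, provenance=sip.pdi.provenance + (event,))
    return replace(sip, kind="AIP", pdi=new_pdi)
```

Keeping the packages immutable means the SIP remains available as evidence of what the producer actually submitted, while the AIP accumulates provenance.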

The OAIS model (4)

[Figure: Preservation Description Information, comprising Reference Information, Provenance Information, Context Information and Fixity Information. From the OAIS Information Package Taxonomy (Figure 4-14).]

Metadata schema categorisation

• Earliest schemas were largely conceptual in nature:
  – e.g. Pittsburgh BAC model, Cedars outline specification, OCLC/RLG WG I
• Gradually moving towards a more practical focus:
  – e.g., VERS, NLNZ, METS, PREMIS WG
  – Convergence on XML (DTDs and Schemas)
• But there is an urgent need for all this practical experience to be shared
  – e.g., published schemas, advice on implementation, etc.

Sustainability issues (1)

• Balance risks with costs:
  – There is a perception that metadata creation and maintenance will be expensive
  – But costs associated with data recovery are not trivial
  – Need to balance the risks of data loss with the cost of creating metadata
    » Cost/benefit analysis
    » Robust selection criteria
    » Co-operation between repositories
    » Re-use of existing metadata

Sustainability issues (2)

• Avoid imposing unnecessary costs:
  – Avoid large schemas (?)
  – Need to identify the right metadata - 'core metadata' (?)

Metadata creation issues

• Created by humans or captured automatically?
  – Some metadata already exists, e.g.:
    » Embedded within objects
    » In separate databases
    » Generated by particular processes
  – Need for this metadata to be captured at creation, ingest, migration, and at other appropriate points in object life-cycle
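Some technical metadata can be captured automatically at ingest with nothing more than the file itself. The Python sketch below shows one way this might look; the function name and the chosen fields are assumptions for illustration, and the media type is only a guess based on the file extension.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def capture_technical_metadata(path: Path) -> dict:
    """Collect metadata that can be derived from a file without human input."""
    stat = path.stat()
    media_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "size_bytes": stat.st_size,
        # Last-modified time as reported by the file system, in UTC.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        # Guessed from the extension only; may be None or wrong.
        "media_type_guess": media_type,
        # Fixity value for later integrity checks.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
```

Descriptive and contextual metadata, by contrast, usually still needs a human or an existing catalogue record.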

Interoperability issues

• Benefits of interoperability
  – Support for ingest process
  – To support the management of multiple formats and metadata schema within a digital preservation system
    » Current metadata specifications not entirely clear on how this should be done
  – To support the exchange of information packages outside the repository, e.g. by converting to standard 'exchange formats'
    » Networks of 'trusted repositories'

Format and metadata registries

• Format registries
  – There is "… a pressing need to establish reliable, sustained repositories of file format specifications, documentation, and related software" (Lawrence, et al., 2000)
  – DSpace 'bitstream format registry'
  – Digital Library Federation, et al. recently proposed a Global Digital Format Registry
• Metadata registries
  – More research into these is required
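One job a format registry can support is identifying what a bit-stream actually is, independent of its file name. The Python sketch below illustrates the general idea with a tiny table of well-known leading byte signatures. It is a toy example: it does not reflect the design of the DSpace registry or of the proposed Global Digital Format Registry, and the function name is hypothetical.

```python
# A few well-known file signatures ("magic numbers") found at the start of a file.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\xff\xd8\xff": "JPEG",
}


def identify_format(data: bytes) -> str:
    """Return the name of the first signature that matches the start of the data."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"


print(identify_format(b"%PDF-1.4 ..."))
print(identify_format(b"plain text with no signature"))
```

A production registry would also need version information, links to specifications and the software able to render each format, which is the gap the quotation above points to.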

Non-technical issues

Collection management

– Selection, storage, access, "de-selection"
– Issues:
  • Preservation issues need to be considered early in an object's life-cycle (the traditional 'transfer to repository' model will not work)
  • An important role for creators (and funding bodies)
    – Guidance, documentation
  • Sharing of responsibilities
    – A need for collaboration
  • Digital storage is cheap, so should we keep everything?

Legal issues (1)

• Institutions need to obtain the legal rights to preserve digital objects and make them accessible:
  – e.g., copying, the re-engineering of software
  – identify and negotiate with rights holders?
    » but difficult to identify all rights holders ...
  – safeguard rights
  – part of legal deposit?
  – monitoring legislation and case law

Legal issues (2)

• Rights holders want increasing control over content
  – e.g., the extension of copyright periods, licensing of access
  – Digital Millennium Copyright Act (US)
  – European Union Copyright Directive
• Consideration of “dark archives” - repositories without access ...

Costs

– Still very little known about costs:
  • no widely used economic models
  • no clear idea of who pays
  • Moore’s Law (technology)
    – digital storage densities increase while costs decrease
    – not necessarily applicable to Petabytes of data from e-science projects
  • identification of cost elements is best approach

Capturing and preserving the World Wide Web

Web archiving (1)

• Four main approaches (to date):
  – Crawler based (for surface Web)
    • Internet Archive
    • Swedish Royal Library (Kulturarw3)
    • Iceland, Finland, Austria, etc.
  – Selective approach
    • National Library of Australia (PANDORA)
    • UK Web Archiving Consortium
  – Direct deposit by creators
  – Combined approaches
    • Bibliothèque nationale de France
    • International Internet Preservation Consortium

Web archiving (2)

– an important response to the transitory nature of the Web
– existing projects more concerned with collection strategies than access or preservation
– major focus on events, e.g. national elections
  • Internet Archive Special Collections
  • NARA (US National Archives and Records Administration) snapshots of US federal agencies and departments in 2001
  • The National Archives (PRO) - capture of No. 10, Downing Street site (2001); current work with Internet Archive (UK Central Government Web Archive)

Web archiving (3)

– limited consideration of access issues,
  • except for:
    – Internet Archive (Wayback Machine)
    – PANDORA Archive (NLA)
    – Nordic Web Archive project
– but this is changing …
– Wayback Machine: http://www.archive.org/

Some projects and initiatives

Digital Curation Centre (1)

– Two main drivers:
  • e-Science, the "data deluge," need for continued access and reuse of data
  • Digital preservation
– Jointly funded by the Joint Information Systems Committee (JISC) and the e-Science Core Programme
  • Outreach, services and development
  • Research programme
– Funding from March 2004, initially for three years

Digital Curation Centre (2)

• Consortium partners:
  • University of Edinburgh (lead partner)
    – EDINA, NeSC, Informatics
  • University of Glasgow
    – HATII, Information Services
  • Council for the Central Laboratory of the Research Councils (CCLRC)
  • University of Bath
    – UKOLN

Digital Curation Centre (3)

• Aims:
  – ‘Continuing quality improvement in data curation & digital preservation’
• Main focus:
  – data as evidential base for science and scholarship
  – role of data curation & preservation as keys to reproducibility and reuse
• Wider focus:
  – the worlds of e-learning & scholarly communication

Digital Curation Centre (4)

• Objectives:

– Research programme

– Collaborative Associates Network

– Links with existing communities of practice

– Engagement with active curators

– Services

– Evaluation of tools, standards, etc.

– Repository of tools, etc.

– Advice, curation manual, etc.

– 'Virtuous circle'


Digital Curation Centre (5)

• Progress to date:

– Requirements analysis

– Web site: http://www.dcc.ac.uk/

– Launch: 5 November 2004


Digital Preservation Coalition

• Formed in 2001

• Aims to foster joint action in the UK and internationally

– Dissemination (handbook, bulletin, …)

– Getting digital preservation on the agenda of key stakeholders

– Members include BL, the e-Science core programme, JISC, OCLC, the National Archives, Resource, the BBC, etc.


NDIIPP

– National Digital Information Infrastructure and Preservation Program

• Funded by the US Congress

• A national planning effort led by the Library of Congress, in co-operation with representatives of other federal, research, library, and business organisations

• $100 million

• Master plan approved by Congress, December 2002

• NDIIPP Programme

– 8 projects ($14.9 m), announced September 2004


Summing up


Summing up:

• Digital preservation is a managerial as well as a technical problem

• Technical agenda is being developed

– much work is being undertaken to develop sustainable preservation strategies and metadata schemas

• Co-operation is essential

– some progress, e.g. the DCC, DPC, NDIIPP

• Many problems remain

– costs, legal issues, etc.


Further information


Readings (1)

– Neil Beagrie and Maggie Jones, Preservation Management of Digital Materials: a Handbook (2001) http://www.dpconline.org/

– Margaret Hedstrom, It's about time: research challenges in digital archiving and long-term preservation (2003) http://www.digitalpreservation.gov/

– Margaret Hedstrom and Seamus Ross, Invest to save: report and recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation (2003) http://delos-noe.iei.pi.cnr.it/


Readings (2)

– Council on Library and Information Resources, The state of digital preservation: an international perspective (2002) http://www.clir.org/

– Philip Lord and Alison Macdonald, Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision (2003) http://www.jisc.ac.uk/

– ERPANET workshop reports and related materials: http://www.erpanet.org/


More information

• Preserving Access to Digital Information (PADI) gateway:

– http://www.nla.gov.au/padi/

• DPC/PADI “What’s New” bulletin:

– http://www.dpconline.org/graphics/whatsnew/


Acknowledgements

UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.