Digital Longevity: Research Directions and Opportunities · 1/26/2006

© Seamus Ross, Digital Longevity, 26 Jan 06, OII-Oxford Prof Seamus Ross Academic Visitor at Oxford Internet Institute, Professor of Humanities Informatics and Digital Curation and Director, HATII (University of Glasgow) Digital Longevity: Research Directions and Opportunities Oxford Internet Institute (OII) 26 January 2006 Oxford


© Seamus Ross, Digital Longevity, 26 Jan 06, OII-Oxford

Prof Seamus Ross

Academic Visitor at Oxford Internet Institute, Professor of Humanities Informatics and Digital Curation and Director, HATII (University of Glasgow)

Digital Longevity: Research Directions and Opportunities

Oxford Internet Institute (OII)

26 January 2006

Oxford


Brace the doors…

• The widespread use of digital technologies is creating a tidal wave of digital materials to be ingested into memory institutions; new technologies, processes, and cultural attitudes are needed to make this happen

© HATII UofGlasgow, 2005

Paper Records arriving at the Hocken Library in Dunedin (NZ), September 2003


Objective of digital longevity

• Digital preservation aims to ensure that future users will be able to discover, retrieve, render, manipulate, interpret and use digital information in the face of constantly changing technology

• It involves conservation, renewal, restoration, selection, destruction, enhancing, updating, and annotating

• It is a risk management activity at all stages of the longevity pathway

• In the digital age we are all digital curators whether in our work, in our community or in our personal life

© HATII UofGlasgow, 2005


Digital Curation Aims

Sustainable, effective, and viable digital archives/libraries, accountable government and companies, and effective science depend on mechanisms to support integrity, authenticity, reliability, security, maintenance and access to digital materials across time and systems

Margaret Hedstrom (University of Michigan) & Birte Christensen-Dalsgaard, Staatsbibliotek Denmark at DELOS Preservation Research Workgroup Meeting in Washington DC, November 2002.

© HATII UofGlasgow, 2002


Authenticity, Integrity, & Reliability

• That digital objects are what they purport or claim to be

• Each rendition carries the same force as the initial instantiation (sometimes referred to as the original)

• Completeness

• Validation of integrity and authenticity

• InterPARES concluded that Authenticity and Integrity were the responsibility of the repository and Reliability the responsibility of the creator

Participants at the ERPANET File Formats Seminar in Vienna, May 2004. http://www.erpanet.org

© ERPANET, 2004


Ensuring Authenticity?

• We need to know

– The history of digital objects (i.e. chain of custody and process history)

– That we can verify that they have not changed or been modified

– That we can moderate and validate the ingest process

– That they are stored in immutable data stores

– That transformations maintain authenticity and reliability
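One common way to verify that an object has not changed is a fixity check: record a cryptographic digest at ingest and recompute it later. The sketch below is illustrative only; the slides do not prescribe a method, and the function names, the choice of SHA-256, and the sample file are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


# Demonstration with a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    obj = Path(tmp) / "record.txt"
    obj.write_bytes(b"minutes of meeting, 26 January 2006")
    digest_at_ingest = sha256_of(obj)            # stored with the object's history
    unchanged = verify_fixity(obj, digest_at_ingest)
    obj.write_bytes(b"minutes of meeting, 27 January 2006")  # simulated modification
    changed = verify_fixity(obj, digest_at_ingest)
```

A digest only shows that the bits are unchanged; chain of custody and process history still have to be recorded separately.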


Motivations

Reuse

Risk of Loss

Costs of recreation (if possible)

Competitive Advantage

Evidential value

Foundation for Scholarly Endeavour

Memory

Accountability

Regulatory Compliance

Mission of organisation


What is a Digital Object?

• Combination of

– Source

– Process

• Boundaries between source and process sometimes blurred


Challenges (Examples)

• Inaccessibility of digital object

– Degradation of storage medium

– Technological obsolescence

• Syntactical interpretation or representation failures

• Semantic opaqueness

– Lack of contextual information (e.g. suitable metadata)

– Loss of Process & dynamic nature

• Legal impediments

• The organisation and its staff

– Lack of organisational will – visible benefits

– Decentralised and node-based organisation

© HATII UofGlasgow, 2005

Historic Media on Display at the Launch of the UK Digital Curation Centre (DCC), November 04, http://www.dcc.ac.uk


Preservation Approaches

• Print to Paper or Microfilm/fiche

• Refreshing

• Normalisation

• Migration (including format conversion)

• Emulation

• Simulation

• Technology Preservation (e.g. keeping hardware running)

• Reconstruction

• Restoration

• Replication (copying and closely related to migration)

• Digital Archaeology

Stuart Weibel of OCLC at the grave of George Boole following the ERPANET seminar on Persistent Identifiers, Cork, June 2004

© ERPANET, 2005



ERPANET Case Studies

• Case studies in public and private sectors

– 500 organisations contacted; 12% participation rate achieved

– Aim to achieve spread across organisational size and location

– Diversity of organisational type, activity, regulatory framework, and culture

• Perception across records and resources

• Accumulate and make accessible information about approaches to digital longevity

• Identify issues for further research

Mr Allemann, Trivadis AG Switzerland co-sponsor of the ERPANET Bern Workshop on Long term preservation of databases, April 2003.

© ERPANET, 2005


Eight Very General Findings

• Variation in awareness of risk

• Value of information depends upon recognition of business/organisational dependency, potential of re-use, or risk associated with information (not necessarily cost)

• Responsibility rarely taken at corporate level

• Few organisations have adequate strategies

• Activity is fragmentary: practices tend to be incomplete, ad hoc, and unitary

• Waiting for external solutions

• Preservation and storage poorly understood

• Lack of policies and procedures


General Needs

• Off-the-shelf policy statements

• Business cases & strategies

• Digestible guidance on technologies and their preservation implications

• Improved models (reference, costs, standards, functional requirements)

• Simple Guidelines on digital survival

• Guidance on creating data repositories

• Externally provided solutions and automation mechanisms

• IPR support and guidance

Maria Guercio and John McDonald at the Preservation Policy and Procedures co-sponsored with the Archives de France – Le Centre des Archives Contemporaines, Fontainebleau, January 2003.

© ERPANET, 2005


Reasons for Research Gap

• There has been a lack of appreciation of the research challenges posed by digital preservation,

• a lack of a sense of urgency,

• a lack of proven business cases which might have encouraged the development of this as a research or technology sector,

• the fact that in the past the research agenda has been driven by information professionals working in memory institutions or corporate records management teams,

• the limited funding for this kind of research, and,

• of course, the speed of technological development.


Out of Scope, but not out of mind

• Research needed in areas of

– policy and procedures,

– organisational structure and communication,

– education,

– business case development, or

– legal arena.


Background

• Working Group on Digital Preservation and Archiving

– Joint EU and US Workgroup

– Sponsored by DELOS (Europe’s Digital Library Network of Excellence) & National Science Foundation (NSF) in the US

– Create a Research Agenda for Digital Preservation that will enable research and create new market opportunities for the information society.

– Published as Invest to Save

• Digital Curation Centre of the UK

– Defined its research agenda with the broader concept of curation in mind


Scalability

• Preservation research to date has examined either large sets of homogeneous data or small collections of heterogeneous material.

• It has been done by large institutions

• This raises a series of issues concerning the scalability of current models and methods, the ingest rate, and the rate at which digital materials can be normalised or migrated.

– Is it possible to develop metrics to assess the scalability of preservation strategies and methods?
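As one hypothetical example of such a metric, the sketch below relates ingest rate to migration rate and estimates how long an existing backlog would take to clear. The function name, the linear model, and the sample figures are assumptions made for illustration, not something taken from the talk.

```python
from typing import Optional


def days_to_clear_backlog(
    backlog_objects: int,
    ingest_per_day: float,
    migrated_per_day: float,
) -> Optional[float]:
    """Days until the migration backlog is empty, or None if it never clears.

    Assumes constant rates; a real repository would measure these over time.
    """
    if backlog_objects <= 0:
        return 0.0
    net_rate = migrated_per_day - ingest_per_day
    if net_rate <= 0:
        # New material arrives at least as fast as it can be migrated.
        return None
    return backlog_objects / net_rate


# Invented figures: 10,000 objects waiting to be migrated.
keeping_up = days_to_clear_backlog(10_000, ingest_per_day=100, migrated_per_day=300)
falling_behind = days_to_clear_backlog(10_000, ingest_per_day=300, migrated_per_day=100)
```

With these figures the first case clears in 50 days, while the second never clears, which is the kind of scalability limit the slide asks about.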

© ERPANET, 2004

Ken Thibodeau of NARA speaking at the ERPANET Berne Seminar, October 2004


Modelling Preservation Processes

• Improve preservation by building preservation functionality into systems used to create and manage digital objects.

• This requires:

– improving our knowledge about what preservation functionality really is and ensuring that this functionality can be effectively communicated to system developers.

• Dream of Archivists and Records Managers—to front-end preservation


Process Planning

• Different formats require different kinds of strategic approaches to ensure that they can be accessed in the future.

• Problems with formats are exacerbated by the fact that archival collections, which need to be managed as a whole, generally contain entities in multiple formats; these formats have different rates of obsolescence.

• E.g. we need predictive measures to enable developers to assess the preservation impact of attributes of formats in advance of their completed development or use.


Characterisation Mechanisms

• Digital entities need to be characterized independently of underlying software and hardware infrastructure to enable reconstruction in newer environments

• Machine interpretable expression of the significant properties of digital assets

• Mechanisms for identifying and representing these ‘significant properties’

• Registries to store expressions and to serve as a source of generic expressions (e.g. .xls, .sxw)
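A machine-interpretable expression of significant properties could be as simple as a structured record serialised to a neutral format. The sketch below is one assumption-laden illustration: the property names and the choice of JSON are hypothetical, and the talk does not define a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignificantProperties:
    """Hypothetical set of properties judged significant for a document."""
    format_name: str
    page_count: int
    character_encoding: str
    embedded_fonts: tuple


def to_expression(props: SignificantProperties) -> str:
    """Serialise the properties independently of the creating software."""
    return json.dumps(asdict(props), sort_keys=True)


def from_expression(text: str) -> SignificantProperties:
    """Rebuild the record from its serialised expression."""
    data = json.loads(text)
    data["embedded_fonts"] = tuple(data["embedded_fonts"])  # JSON has no tuples
    return SignificantProperties(**data)


props = SignificantProperties("spreadsheet", 3, "UTF-8", ("Arial",))
expression = to_expression(props)
restored = from_expression(expression)
```

Because the expression is plain text, it could be stored in a registry and read back without the original application.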


Documentation of Functionality and Behaviour

• Need formal ways to express the functionality and behaviour of digital entities.

• These are needed to establish benchmarks and measure consistency of performance across migrations or emulations.

• Approaches to functionality and behaviour abstraction and representation are also needed to enable us to reconstruct applications and systems. (e.g. Culture?)
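To make the idea of a benchmark concrete, the sketch below compares a recorded set of properties before and after a migration and reports what changed. It is a minimal illustration under stated assumptions: the property names are invented, and expressing real functionality and behaviour formally is exactly the open problem the slide describes.

```python
def compare_properties(before: dict, after: dict) -> dict:
    """Return {property: (old_value, new_value)} for every property that differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Hypothetical benchmark recorded before migration, and the observed result after.
before_migration = {"page_count": 12, "has_macros": True, "title": "Report"}
after_migration = {"page_count": 12, "has_macros": False, "title": "Report"}

differences = compare_properties(before_migration, after_migration)
```

An empty result would indicate that, for the properties recorded, the migration preserved behaviour; here the loss of macros is flagged.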


Automation (or semi-automation)

• Huge quantities of materials to ingest and manage

– selection, validation, description, assigning unique persistent identifiers, data management, migration, and selection and appraisal

• Automation of workflows allows integration of independent services

• Standardized logging/record creation

• Reduce human intervention

– Cheaper and faster

– Less error prone

– Enables higher level of security and reliability

• Enables intensive test and verification mechanisms
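The points on persistent identifiers and standardised logging can be sketched as a small ingest step that assigns an identifier and writes one uniform log record per action. Everything here is hypothetical: the record fields, the `urn:uuid:` identifier scheme, and the single validation rule are assumptions chosen to keep the example short.

```python
import uuid
from datetime import datetime, timezone


def log_event(log: list, object_id: str, step: str, outcome: str) -> None:
    """Append one log record; every step uses the same fields."""
    log.append({
        "object_id": object_id,
        "step": step,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


def ingest(payload: bytes, log: list) -> str:
    """Assign an identifier, validate the payload, and log both steps."""
    object_id = f"urn:uuid:{uuid.uuid4()}"
    log_event(log, object_id, "assign_identifier", "ok")
    outcome = "ok" if payload else "rejected: empty payload"
    log_event(log, object_id, "validate", outcome)
    return object_id


event_log: list = []
object_id = ingest(b"some bytes", event_log)
```

Because each record has the same shape, independent services could append to the same log and the whole process history could be checked mechanically.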

Hans Hofman (Dutch National Archives) and Charles Dollar at ICA2004 Wien.

© ERPANET, 2005


Automation of Processes

• Human effort does not scale to the size and complexity of material

• Workflow characterisation

• Preservation process selection (e.g. utility analysis)

• Acquisition of content

• Organisation of content

• Description

• Management

• Collection personalisation

• Scalability

© HATII UofGlasgow, 2005


Automated Metadata Creation

• Preservation metadata is an essential part of the information infrastructure necessary to support all the processes in digital preservation.

• Automatic or semi-automated creation and authoring of the technical, descriptive, structural, and contextual metadata is a crucial issue.

• Need for creation of metadata supporting the discovery, use and understandability of digital objects.
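Technical metadata is the easiest kind to generate automatically, since much of it can be read directly from the file. The sketch below collects a few such fields using only the Python standard library; the field names are assumptions, and descriptive or contextual metadata would need far more than this.

```python
import hashlib
import mimetypes
import tempfile
from pathlib import Path


def technical_metadata(path: Path) -> dict:
    """Collect basic technical metadata without human input."""
    data = path.read_bytes()
    mime_type, _encoding = mimetypes.guess_type(path.name)  # guess from the extension
    return {
        "filename": path.name,
        "size_bytes": len(data),
        "mime_type_guess": mime_type,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Demonstration with a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_bytes(b"hello world")
    meta = technical_metadata(sample)
```

Note that the MIME type here is only a guess based on the file name, so it records what the file claims to be rather than what it is.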


Long-term Metadata Viability

• The meaning of metadata itself changes over time, what we might describe as ‘metadata drift’.

• For purposes of interpretation and authenticity, users will need access to the metadata schema used at the time the digital entity was created.

• Research is needed into metadata schema and ontology evolution mapping to ensure that, over time, metadata and underlying ontologies do not lose their meaning.

• Tools to track provenance, version control
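One simple way to picture schema evolution mapping is to tag each record with its schema version, keep explicit mappings between versions, and record every mapping applied as provenance. The sketch below assumes two invented schema versions and field names; it illustrates the idea and is not a method proposed in the talk.

```python
# Hypothetical mappings between schema versions: old field name -> new field name.
SCHEMA_MAPPINGS = {
    ("v1", "v2"): {"author": "creator", "date": "date_created"},
}


def migrate_record(record: dict, target: str) -> dict:
    """Return a new record under the target schema, keeping a provenance trail."""
    source = record["schema_version"]
    mapping = SCHEMA_MAPPINGS[(source, target)]  # KeyError if no mapping is known
    fields = {mapping.get(name, name): value for name, value in record["fields"].items()}
    return {
        "schema_version": target,
        "fields": fields,
        "provenance": record.get("provenance", []) + [f"{source}->{target}"],
    }


old_record = {
    "schema_version": "v1",
    "fields": {"author": "S. Ross", "date": "2006-01-26", "title": "Digital Longevity"},
    "provenance": [],
}
new_record = migrate_record(old_record, "v2")
```

The original record is left untouched, so the metadata as created, under the schema in force at the time, remains available alongside the migrated version.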


Elaboration of existing Repository Models

• Current models for repositories provide a useful starting point

• Further research is needed to develop technical specifications and standards to build persistent DLs.

– Definition of a service layer that would allow distributed repositories to share content, tools (e.g. repositories) and services (e.g. security, user profiling, management, privacy)

– Models and specifications for discovery, access, security and retrieval across diverse repositories and collections

© HATII UofGlasgow, 2005

Library at Hadrian’s Villa at Tivoli


Repository Design

• Change will be a feature of repositories

– Storage technologies

– Services, close down of some and initiation of others

– Workflows

– Verification mechanisms

– Migration, refreshing, emulation, and …

Digital Repository Infrastructure, Swiss Federal Archives, Berne, October 2004

© ERPANET, 2005


Audit and Certification

• Consider such traditional archives as those held at the Tate or The National Archives (Kew)

• How do we know that our data are secure?

• Trust

– How is it established?

– How is it maintained?

– How is it secured?

– What happens when it is lost?

– How can it be verified?

• BUT will audit and certification increase our trust in services provided by digital repositories?

Speakers at the Lisbon Seminar on The Selection, Appraisal and Retention of Digital Scientific Data: Weber Amaral, William Anderson, Terry Eastwood, John Faundeen, Pedro Fernades, Luigi Fusco, Francoise Genova, Myron Gutmann, Gail Hodge, Jürgen Knobloch, Meredith Lane, Seamus Ross, Kevin Schürer, Alex Szalay, Peter Weiss

© ERPANET, 2005


Repositories

• Software Repositories as an example:

– Emulation and salvage and rescue techniques may depend on software that may no longer be available

– A small number of software repositories to collect, maintain, and provide access to obsolete software.

• Examples: it might hold a characterisation of the capabilities of prior systems, which can be implemented using modern technology, or it may hold routines that can migrate obsolete encoding formats to contemporary encoding formats.


Representation Information Registries

• Information that maps a digital object to structure and semantics

• Guides the managing of their transition from one state to another

• For formats it might provide keys to understanding the nature of digital objects

– to identify the format of unknown files,

– to verify whether a file is the format that it purports to be,

– to assess the viability and implications of transforming from one file format to another,

– to provide an information resource to support the investigations of file format risk, and

– to store information about how to render an object from a particular format.
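The first two uses, identifying an unknown file and checking a claimed format, can be illustrated with a very small signature table. The sketch below is an assumption-laden toy: a real registry would hold far richer representation information, and only the leading-byte signatures of four well-known formats are included here.

```python
from typing import Optional

# Leading-byte signatures ("magic numbers") for a few well-known formats.
SIGNATURES = {
    "PDF": [b"%PDF-"],
    "PNG": [b"\x89PNG\r\n\x1a\n"],
    "GIF": [b"GIF87a", b"GIF89a"],
    "ZIP": [b"PK\x03\x04"],
}


def identify_format(data: bytes) -> Optional[str]:
    """Return the format whose signature matches the start of the data, if any."""
    for name, prefixes in SIGNATURES.items():
        if any(data.startswith(prefix) for prefix in prefixes):
            return name
    return None


def verify_claimed_format(claimed: str, data: bytes) -> bool:
    """True if the data's signature agrees with the format it purports to be."""
    return identify_format(data) == claimed.upper()


sample = b"%PDF-1.4\n% rest of the file..."
identified = identify_format(sample)
claim_holds = verify_claimed_format("pdf", sample)
false_claim = verify_claimed_format("png", sample)
```

A signature match is only weak evidence of format; full validation would require parsing the file against the format's specification.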


Managing Complex and Dynamic Digital Entities

• There has been little research to address how interrelationships between the components of compound documents/digital entities might be maintained.

– How can complex and dynamic entities be authenticated and their integrity verified?

– How can dynamic entities be accessioned and managed in an archive?

– To what aspects of a dynamic document should metadata be attached and what metadata would be required?

– How do we ensure dynamic qualities across time?

– At what level is loss of dynamic qualities acceptable? What measurement metrics?


Self-describing & self-monitoring entities

• Digital entities that know what they are

• Digital entities that know something about their semantics

• Digital entities that can observe the state of other objects (e.g. observe decline in numbers of similar classes of objects)

• Digital entities that know where they are

• Digital entities that know where their metadata are

• Digital entities with an ET mentality – ‘phone home’


Salvage and Rescue

• Little work has been done on

– developing techniques to enable raw data streams to be analyzed and the original meaningful (e.g. logical) units they represent reconstructed (e.g. crypto-analysis methods)

– generic devices for reading media

Hard disk undergoing one stage of forensic analysis as part of data recovery planning at the Tunstall and Tunstall (Nepean, Ontario, Canada) in 2003.

© ERPANET, 2005


Archival Media

• BUT to bring new classes of technology to bear on the recovery, reconstruction and interpretation of the meaning represented by bitstreams, they need to be recorded on media that have durability and stability measured in 1000s of years.

Hard disk undergoing forensic analysis as part of data recovery planning at the Tunstall and Tunstall (Nepean, Ontario, Canada) in 2003.

© ERPANET, 2005


Experimental Testbeds

• Facility to run tests (e.g. ‘is this the appropriate preservation pathway for this digital object or class of objects, or system’)

• To investigate the potential metrics for measuring the effectiveness of different preservation strategies in the context of complex digital objects

• Integrate, automate, and evaluate frameworks for digital entity preservation by integrating and combining the testbed framework and evaluation metrics

• Integration of software tools to support the digital preservation testbed framework.


Why Focus on these Challenges

• Address problems of long term persistency

• Support delivery of outcomes that meet clearly identified areas of user need

• Address problems broadly applicable to digital libraries from cultural and scientific to eHealth

• All these are areas in which research could be successfully done!!!


For Resources & Guidance Visit

• Digital Curation Centre

– http://www.dcc.ac.uk

• ERPANET

– http://www.erpanet.org

• DELOS – Preservation Cluster

– http://www.dpc.delos.info

• & (from 2 April 2006) DigitalPreservationEurope (dpe)

– http://www.digitalpreservationeurope.org