Preservation Metadata: between theory and practice

Preservation Metadata Workshop (2), The Hague, the Netherlands, 19 June 2014
Titia van der Werf
Adapted from: Rebecca Guenther, “Metadata for preservation of digital objects: background, functions, and standards”, Preservation Metadata Workshop (1), Hilversum, The Netherlands, 4 March 2014


OUTLINE

1.  General introduction to preservation metadata
2.  The PREMIS Data Dictionary
3.  A use case: the Preservation Health Check


Introduction to preservation metadata

Metadata

Function
•  Discovery
•  Access
•  Management
•  Control intellectual property rights
•  Identification
•  Certify authenticity
•  Mark content structure
•  Indicate status
•  Describe processes
•  Etc.

Type
•  Descriptive
•  Administrative
•  Technical
•  Rights/Access
•  Structural
•  Meta-metadata
•  Etc.

Digital preservation

Digital preservation is part and parcel of the “management and preservation” tasks and responsibilities of a heritage institution. Digital information poses its own set of challenges to preservation:
•  The overwhelming volume of digital information created daily and the uncontrolled duplication of information
•  The complexity of digital information (content, structure, context, presentation, behaviour) and the evolving boundaries of the scholarly record and the cultural record
•  The dependency on software/hardware (incl. incompatible, obscure or proprietary systems)
•  The rapid technological change and the danger of obsolescence
•  The ease of (accidental or malicious) content alteration
•  Doubts about the reliability and integrity of electronic records and the need to vouch for their authenticity

Digital preservation (continued)

Two of these challenges are highlighted:
•  The dependency on software/hardware (incl. incompatible, obscure or proprietary systems)
•  The rapid technological change and the danger of obsolescence

Preservation metadata in 2000

“We can then say that the main problem metadata for long term preservation will help to solve is the problem of technological obsolescence.” (p.4)

http://www.kb.nl/sites/default/files/docs/NEDLIBmetadata.pdf

Preservation metadata in 2002

“Preservation metadata (…) is the information necessary to maintain the viability, renderability, and understandability of digital resources over the long-term.” (p.1)

http://www.oclc.org/content/dam/research/activities/pmwg/pm_framework.pdf?urlm=161391

Preservation metadata in 2005

“Preservation metadata (…) metadata supporting the functions of maintaining viability, renderability, understandability, authenticity, and identity in a preservation context.” (p. ix)

http://www.loc.gov/standards/premis/

The SPOT Model for risk assessment

Six essential properties of successful digital preservation, each with its associated threats:
•  Availability
•  Identity
•  Persistence
•  Renderability
•  Understandability
•  Authenticity

http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html

Metadata and preservation metadata

METADATA: “Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”

PRESERVATION METADATA: “Metadata that supports and documents the digital preservation process”

Supporting and documenting the digital preservation process

•  Provenance:
   –  The chain of custody/ownership of the digital object; info about the depositor; etc.
•  Authenticity:
   –  The documentation of changes affecting the authenticity of the digital object during the preservation process
•  Preservation Activity:
   –  The documentation of actions taken to preserve the digital object
•  Technical Environment:
   –  The documentation of the dependencies on and changes in the technical environment needed to render and use the digital object
•  Rights:
   –  The documentation of the rights and permissions for carrying out preservation activities on the digital object (duplication, migration, transformations)


OAIS Information Model

Information Package Concepts and Relationships (Figure 2-3)

Preservation Description Information

Preservation Description Information (Figure 4-16), June 2012 version:
•  Reference Information: identifiers of the Content
•  Provenance Information: history of the custody
•  Context Information: relation of the Content to other objects
•  Fixity Information: a data integrity checksum of the Content
•  Access Rights Information: permissions for preservation operations

How to record and manage change

OAIS rule: if the PDI changes, the AIP version changes.

Implementation choices: e.g. fixity information in source AIP + keep log of data integrity checks and their outcomes separate from the AIP.
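
The implementation choice above can be illustrated with a minimal Python sketch. It assumes SHA-256 as the fixity algorithm and a JSON-lines file as the separate log; both are illustrative choices, and the function names and log fields are hypothetical rather than taken from any particular repository system.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, stored_digest: str, log_path: Path) -> bool:
    """Compare a file with the digest stored in its AIP and log the outcome.

    The log is a hypothetical JSON-lines file kept outside the AIP, so a
    routine check does not touch the PDI and forces no new AIP version.
    """
    outcome = sha256_of(path) == stored_digest
    entry = {
        "object": path.name,
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": "pass" if outcome else "fail",
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return outcome
```

Keeping the digest in the AIP and the check history in the log means that only a real change to the fixity information itself would trigger a new AIP version.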

OAIS compliance relevant to preservation metadata

OAIS Mandatory Responsibilities:
1.  Negotiating and accepting information
2.  Obtaining sufficient control of the information to ensure long-term preservation
3.  Determining the "designated community"
4.  Ensuring that information is independently understandable
5.  Following documented policies and procedures
6.  Making the preserved information available

Digital repository certification

–  RLG-NARA Task Force on Digital Repository Certification
–  Various other certification initiatives (CRL, DCC, nestor, DRAMBORA)
–  Trusted Repositories Audit & Certification (TRAC): Criteria and Checklist (March 2007)
   •  Organisational infrastructure
      –  e.g., governance, organisational structures, mandates, policy frameworks, funding systems, contracts and licenses
   •  Digital Object Management (OAIS functions)
      –  e.g., ingest, metadata, preservation strategies
   •  Technologies, Technical Infrastructure, & Security

Functions of a trusted digital repository relevant to preservation metadata

•  Maintains persistent, unique identifiers for all archived objects
•  Identifies properties it will preserve
•  Verifies each submitted object during ingest
•  Creates archival package from submission package to include technical and rights metadata
•  Has mechanisms to authenticate content and its source
•  Ensures that content information isn’t corrupted and maintains integrity by using fixity information
•  Manages number and location of copies of all digital objects
•  Employs documented preservation strategies

Functions of a trusted digital repository relevant to preservation metadata (continued)

•  Maintains precise descriptions of actions necessary to ensure that objects are preserved
•  Has mechanisms for monitoring and notification when formats are becoming obsolete
•  Uses tools and resources such as format registries to establish semantic and technical context
•  Has processes for storage media and/or hardware changes
•  Tracks and manages intellectual property rights and restrictions
•  Ensures that agreements applicable to access conditions are adhered to
•  Maintains descriptive metadata for access and retrieval and associates it with object


PREMIS

Standards that address preservation metadata: technical

•  PREMIS
•  Images
   –  NISO Z39.87 and MIX
   –  Adobe and XMP (Extensible Metadata Platform)
   –  Exif (Exchangeable Image File Format)
   –  IPTC (International Press Telecommunications Council)/XMP
•  Text: textMD
•  Sound
   –  AES57-2011: Audio Object XML Schema
   –  AES60-2011: Core Audio Metadata
   –  AudioMD (Library of Congress)

Standards that address preservation metadata: technical (continued)

•  Video
   –  VideoMD
   –  SMPTE RP210
   –  Technical metadata in EBUCore, PBCore
   –  U.S. Federal Agencies Digitization Guidelines
   –  MPEG-7 and MPEG-21 for video

Standards that address preservation metadata: Structural

§  METS
§  PREMIS
§  MPEG 21 Digital Item Declaration
§  OAI/ORE
§  Specific format types
   –  MXF
   –  AVI

Standards that address preservation metadata: Rights

•  PREMIS
•  METS Rights
•  CDL Copyright schema
•  Creative Commons
•  PLUS for images
•  MPEG-21 REL for moving images
•  ONIX for licensing terms
•  Full rights expression languages
   –  XrML/MPEG-21
   –  ODRL

PREMIS Data Dictionary

•  May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
•  March 2008: PREMIS Data Dictionary for Preservation Metadata, version 2.0
•  Jan. 2011: version 2.1
•  April 2012: version 2.2
•  Announced in September 2013: version 3.0
•  Data Dictionary:
   –  Comprehensive view of information needed to support digital preservation
•  Guidelines/recommendations to support creation, use, management
   –  Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation


Guiding principles: “implementable, core preservation metadata”

•  Preservation metadata: maintain viability, renderability, understandability, authenticity, identity in a preservation context

•  Core: What most preservation repositories need to know to preserve digital materials over the long-term

•  Implementable: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows and metadata generation

•  Technical neutrality: no assumptions about technologies, systems and architectures, where metadata is stored

Scope

•  What PREMIS DD is:
   –  Common data model for organizing/thinking about preservation metadata
   –  Guidance for local implementations
   –  Standard for exchanging information packages between repositories
   –  Compatible with the OAIS reference and information model
•  What PREMIS DD is not:
   –  Out-of-the-box solution: need to instantiate as metadata elements in repository system
   –  All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
   –  Lifecycle management of objects outside repository
   –  Rights management: limited to permissions regarding actions taken within repository


PREMIS Data Model

Intellectual Entities

Objects

Rights Statements

Agents

Events

Intellectual Entities

•  Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
•  Has one or more digital representations
•  May include other Intellectual Entities (e.g. a website that includes a web page)
•  Not fully described in PREMIS DD, but can be linked to in metadata describing digital representation (this will change in 3.0)

Examples:
•  The Chamber by John Grisham (an ebook)
•  “Maggie at the beach” (a photograph)
•  The Metropolitan New York Library Council Website (a website)

Objects

Objects are what the repository actually preserves:
•  FILE: named and ordered sequence of bytes that is known by an operating system
•  REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
•  BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
•  FILESTREAMS (files within files) are considered files since they can be rendered alone

Examples:
§  A PDF file
§  A book composed of several XML files and many images
§  A TIFF file containing a header and 2 images

Object Example: book in two versions

Intellectual Entity: Da Vinci Code by Dan Brown
•  Representation 1: Page image version
   –  File 1: page1.tiff
   –  File 2: page2.tiff
   –  File N: pageN.tiff
   –  File N+1: METS.xml
•  Representation 2: ebook version
   –  File 1: book.lit
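
The book example can be sketched as a small Python data structure. This is only an illustration of the Intellectual Entity / Representation / File hierarchy: the class and field names are hypothetical, only two of the N page images are listed, and placing METS.xml in the page image representation is an assumption about the original diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """A named, ordered sequence of bytes known to an operating system."""
    name: str


@dataclass
class Representation:
    """The set of files that together render an Intellectual Entity."""
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """A single intellectual unit with one or more digital representations."""
    title: str
    representations: List[Representation] = field(default_factory=list)


# Only two page images are listed for brevity; METS.xml carries the
# structural metadata (its placement here is an assumption).
page_images = Representation(
    "Page image version",
    [File("page1.tiff"), File("page2.tiff"), File("METS.xml")],
)
ebook = Representation("ebook version", [File("book.lit")])
book = IntellectualEntity("Da Vinci Code by Dan Brown", [page_images, ebook])

for representation in book.representations:
    names = ", ".join(f.name for f in representation.files)
    print(f"{representation.label}: {names}")
```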

Semantic units pertaining to Objects

•  Object identifier
•  Preservation level
•  Significant characteristics
•  Object characteristics
   –  fixity
   –  format
   –  size
   –  creating application
   –  inhibitors
   –  object characteristics extension
•  Original name
•  Storage
•  Environment (will change in 3.0)
   –  software
   –  hardware
•  Digital signatures
•  Relationships
•  Linking event identifier
•  Linking rights statement identifier

Events

•  An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
•  Helps document digital provenance. Can track history of Object through the chain of Events that occur during the Object's lifecycle
•  Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession)
•  Determining which Events should be recorded, and at what level of granularity, is up to the repository

Examples:
§  Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file
§  Ingest Event: transform an OAIS SIP into an AIP (one Event or multiple Events?)

Semantic units pertaining to Events: provenance and preservation activity

§  Event identifier
§  Event type (e.g. capture, creation, validation, migration, fixity check, ingestion)
§  Event dateTime
§  Event detail
§  Event outcome
§  Event outcome detail
§  Linking agent identifier
§  Linking object identifier
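
As an illustration of how these semantic units might be serialised, the following Python sketch builds an Event element for the JHOVE validation example. The element names and namespace follow the PREMIS 2.x XML schema as far as I recall it; the output has not been validated against the schema, and the identifier values, the "local" identifier type and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

# PREMIS 2.x XML namespace; identifier values used below are invented.
PREMIS_NS = "info:lc/xmlschema/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_id, event_type, date_time, outcome, agent_id, object_id):
    """Build an event element from the semantic units listed above."""
    event = ET.Element(q("event"))
    identifier = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(identifier, q("eventIdentifierType")).text = "local"
    ET.SubElement(identifier, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = date_time
    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    # Link the event to the agent that performed it and the object it affected.
    links = (("linkingAgentIdentifier", agent_id),
             ("linkingObjectIdentifier", object_id))
    for unit, value in links:
        link = ET.SubElement(event, q(unit))
        ET.SubElement(link, q(unit + "Type")).text = "local"
        ET.SubElement(link, q(unit + "Value")).text = value
    return event


event = build_event("evt-001", "validation", "2014-06-19T10:00:00Z",
                    "valid", "JHOVE version 1.0", "chapter1.pdf")
print(ET.tostring(event, encoding="unicode"))
```

Event detail and event outcome detail are optional and left out here to keep the sketch short.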

Agents

•  Person, organization, or software program/system associated with an Event or a Right (permission statement)
•  Agents are associated only indirectly to Objects through Events or Rights
•  Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification

Examples:
§  Rebecca Guenther (a person)
§  New York Public Library (an organization)
§  JHOVE version 1.0 (a software program)

Semantic units pertaining to Agents

•  Agent Identifier
•  Agent Name
•  Agent Type
•  Agent Note
•  Agent Extension
•  Linking Event Identifier
•  Linking Rights Identifier

Rights Statements

•  An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.
•  Not a full rights expression language; focuses exclusively on permissions that take the form:
   –  Agent X grants Permission Y to the repository in regard to Object Z.

Example:
§  Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.
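
The "Agent X grants Permission Y in regard to Object Z" pattern can be sketched in Python as follows. This is a simplified illustration, not the PREMIS rights model itself: the class, the `max_copies` field and the "replicate" label are hypothetical stand-ins for the act and restriction recorded in a rights statement.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RightsStatement:
    """Agent X grants Permission Y to the repository in regard to Object Z."""
    granting_agent: str
    act: str
    object_id: str
    max_copies: Optional[int] = None  # hypothetical restriction field


def is_permitted(statements: Iterable[RightsStatement], act: str,
                 object_id: str, copies_so_far: int = 0) -> bool:
    """Return True if some statement allows the act on the object."""
    for statement in statements:
        if statement.act != act or statement.object_id != object_id:
            continue
        if statement.max_copies is None or copies_so_far < statement.max_copies:
            return True
    return False


grant = RightsStatement("Priscilla Caplan", "replicate",
                        "metadata_fundamentals.pdf", max_copies=3)
print(is_permitted([grant], "replicate", "metadata_fundamentals.pdf", copies_so_far=2))
print(is_permitted([grant], "replicate", "metadata_fundamentals.pdf", copies_so_far=3))
```

With two copies already made a third is still allowed; once three exist the check fails, and an act that was never granted (for example a migration) is refused outright.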

Semantic units pertaining to Rights

•  Rights Statement
   –  Rights Statement Identifier
   –  Rights Basis
   –  Copyright Information
   –  License Information
   –  Statute Information
   –  Other Rights Information
   –  Rights Granted (act, restriction, termOfGrant, rightsGranted)
   –  Linking Object Identifier
   –  Linking Agent Identifier
•  rightsExtension

Relationships

•  PREMIS Data Dictionary supports expression of relationships between:
   –  Different Objects
      •  Structural: relationships between parts of a whole
      •  Derivation: relationships resulting from replication or transformation of an Object
      •  New relationships in 3.0: replacement, dependency, generalization, reference
   –  Different Entities
•  Relationships are established through reference to Identifiers of other Objects or Entities
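
A minimal Python sketch of relationships expressed purely through identifiers is given below. The identifiers and the subtype labels ("has part", "has source") are illustrative and not drawn from a controlled vocabulary; the helper names are hypothetical.

```python
from collections import defaultdict

# Each relationship: (subject identifier, type, subtype, related identifier).
relationships = [
    ("rep-1", "structural", "has part", "file-page1"),
    ("rep-1", "structural", "has part", "file-page2"),
    ("file-page1-jp2", "derivation", "has source", "file-page1"),
]


def index_by_subject(rels):
    """Group relationships by the identifier of the object that holds them."""
    index = defaultdict(list)
    for subject, rel_type, subtype, target in rels:
        index[subject].append((rel_type, subtype, target))
    return index


def related(index, subject, rel_type):
    """Return identifiers related to `subject` by the given relationship type."""
    return [target for kind, _subtype, target in index.get(subject, [])
            if kind == rel_type]


index = index_by_subject(relationships)
print(related(index, "rep-1", "structural"))
print(related(index, "file-page1-jp2", "derivation"))
```

Because only identifiers are stored, the related objects can live in separate metadata records and are looked up when needed.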

PREMIS Maintenance Activity

•  Web site:
   –  Permanent Web presence, hosted by Library of Congress
   –  Central destination for PREMIS-related info, announcements, resources
   –  Home of the PREMIS Implementers’ Group (PIG) discussion list
•  PREMIS Editorial Committee:
   –  Set directions/priorities for PREMIS development
   –  Coordinate future revisions of Data Dictionary and XML schema
   –  Promote implementation
   –  International in scope, cross domain

http://www.loc.gov/standards/premis/

Implementation resources

•  Tools:
   –  XML schema
   –  PREMIS-in-METS toolbox <http://pim.fcla.edu>
   –  Controlled vocabularies at http://id.loc.gov
   –  RDF/OWL ontology for use as Linked Data
•  Guidelines:
   –  PREMIS conformance statement
   –  PREMIS & METS guidelines
•  Community Working groups on special topics
•  Implementation Fairs
•  Others:
   –  Understanding PREMIS (available in multiple languages)
   –  PIG Forum
   –  Implementation Registry
   –  Tools Registry

Some implementers …

•  DAITSS (Florida)
•  Ex Libris Rosetta
•  OCLC’s Digital Archive™
•  Archivematica
•  HathiTrust
•  TIPR (Towards Interoperable Preservation Repositories)
   –  FCLA, NYU and Cornell
•  Digital libraries in Spain
   –  Mandated for use in cultural heritage preservation repositories

See: http://www.loc.gov/premis/premis-registry.html

PREMIS Conformance

•  Conformance statement issued in 2010
•  PREMIS Conformance Working Group active now
•  Levels of conformance:
   –  Level 1: A repository uses an internal metadata schema whose elements can be mapped to PREMIS. The mapped metadata can satisfy the principles of use at both the semantic unit and Data Dictionary levels. The repository is able to produce documentation demonstrating such mapping for representative samples of its holdings.
   –  Level 2: A repository implements the PREMIS Data Dictionary as its internal metadata schema in a way that satisfies the principles of use at both the semantic unit and Data Dictionary levels and in a form that does not require further mapping or conversion.
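
For Level 1, the documented mapping from an internal schema to PREMIS could look like the following Python sketch. The internal field names are hypothetical; the targets are PREMIS semantic unit names. Reporting unmapped fields is an illustrative design choice, not a requirement stated in the conformance statement.

```python
from typing import Dict, List, Tuple

# Hypothetical internal field names mapped to PREMIS semantic units.
INTERNAL_TO_PREMIS: Dict[str, str] = {
    "checksum": "messageDigest",
    "checksum_algo": "messageDigestAlgorithm",
    "mime_format": "formatName",
    "byte_count": "size",
    "filename_at_deposit": "originalName",
}


def to_premis(record: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Map an internal record to PREMIS semantic units.

    Returns the mapped values and the sorted names of internal fields that
    have no PREMIS counterpart in the mapping table.
    """
    mapped: Dict[str, object] = {}
    unmapped: List[str] = []
    for key, value in record.items():
        unit = INTERNAL_TO_PREMIS.get(key)
        if unit is None:
            unmapped.append(key)
        else:
            mapped[unit] = value
    return mapped, sorted(unmapped)


record = {"checksum": "ab12", "checksum_algo": "SHA-256",
          "byte_count": 2048, "shelf_mark": "X-17"}
mapped, unmapped = to_premis(record)
print(mapped)
print(unmapped)
```

Running such a mapping over representative samples of the holdings is one way to produce the documentation that Level 1 asks for.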

URLs, etc.

•  PREMIS Maintenance Activity: http://www.loc.gov/standards/premis/
•  PREMIS Data Dictionary for Preservation Metadata: http://www.loc.gov/standards/premis/v2/premis-2-1.pdf
•  PREMIS Implementation Registry: http://www.loc.gov/standards/premis/registry
•  PREMIS Implementers Group list: http://listserv.loc.gov/listarch/pig.html


A use case: the preservation health check

What is the Preservation Health Check Pilot?

-  Open Planets Foundation (OPF): a community hub for digital preservation whose main goal is to jointly manage and improve tools and research outcomes for practical use.
-  OCLC Research: a community resource for shared R&D that addresses challenges facing libraries and archives in a rapidly changing information technology environment.
-  Bibliothèque nationale de France: the BnF runs a fully operational trusted digital repository (SPAR). They volunteered to become a PHC-pilot site.

The Preservation Health Check proposition

As part of their preservation management task, repository managers need to be able to monitor the preservation status of the content of their repository.

We are looking at regular “routine check-ups” that can support this monitoring task.
–  Monitoring should be made easy (automatically generated reports or dashboard)
–  Monitoring should be based on objective data, generated by the repository (e.g. preservation metadata)
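
As a rough idea of what an automatically generated report based on preservation metadata might look like, the Python sketch below summarises Event records. It is not the logic developed in the pilot: the record layout, the object identifiers and the 90-day fixity threshold are assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical event records, as a repository might export them.
events = [
    {"object": "obj-1", "eventType": "fixity check",
     "eventDate": date(2014, 5, 1), "eventOutcome": "pass"},
    {"object": "obj-2", "eventType": "fixity check",
     "eventDate": date(2014, 1, 10), "eventOutcome": "fail"},
    {"object": "obj-1", "eventType": "validation",
     "eventDate": date(2014, 5, 1), "eventOutcome": "pass"},
]


def health_report(events, all_objects, today, max_age_days=90):
    """Count outcomes per event type and list objects overdue for a fixity check."""
    outcomes = Counter((e["eventType"], e["eventOutcome"]) for e in events)
    last_check = {}
    for e in events:
        if e["eventType"] != "fixity check":
            continue
        previous = last_check.get(e["object"])
        if previous is None or e["eventDate"] > previous:
            last_check[e["object"]] = e["eventDate"]
    overdue = sorted(
        obj for obj in all_objects
        if obj not in last_check
        or (today - last_check[obj]).days > max_age_days
    )
    return {"outcomes": dict(outcomes), "overdue_fixity": overdue}


report = health_report(events, ["obj-1", "obj-2", "obj-3"], date(2014, 6, 19))
print(report["outcomes"])
print(report["overdue_fixity"])
```

The report uses only data the repository already records, which is the point of the proposition: no separate inspection of the content is needed for a routine check-up.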


The analogy

The research question

If a Preservation Health Check is a monitoring activity to be performed on a repository with digital content:
1. What are empirical indicators (i.e. measures) for PHCs?
2. Are preservation metadata recorded by repositories useful as health indicators for PHCs?

Monitoring is about tracking change ... intentional and unintentional change.

Goal: To develop an implementable logic (or protocol) to support PHCs, and to test this logic against the store of preservation metadata maintained by an operational preservation repository.

The pilot site

The BnF runs a fully operational trusted digital repository (SPAR). They volunteered to become a PHC-pilot site.

The empirical data consists of:
1.  A sample (200 GB) of the PREMIS data (AIP-METS files), covering the following collections:
    –  Gallica = digitised periodicals, monographs, still images and manuscripts (TIFF + OCR-files)
    –  Legal deposit Web harvests (WARC files)
    –  3rd party collection (Centre Pompidou)

The pilot site (continued)

The empirical data consists of (continued):
2.  All the Reference Information packages in SPAR that contain reference information/code/specifications of (external) tools used during INGEST (e.g. JHOVE) and of formats ingested;
3.  Per collection: SLAs defining policy agreements with SIP suppliers concerning the preservation regime to be applied at the INGEST and ARCHIVAL STORAGE stages.

Mapping PREMIS on to SPOT

[Figure: the Semantic Units of the PREMIS Data Model entities (Intellectual Entities, Objects, Agents, Rights, Events) set against the Threats associated with the six SPOT Model properties (Availability, Identity, Persistence, Renderability, Understandability, Authenticity).]

Page 55: Preservation Metadata: between theory and practicePreservation Metadata Workshop (2) The Hague, the Netherlands 19 June 2014 Titia van der Werf adapted from: Rebecca Guenther, “Metadata

Preservation metadata in 2005

“Preservation metadata (…) metadata supporting the functions of maintaining viability, renderability, understandability, authenticity, and identity in a preservation context.” (p. ix)

http://www.loc.gov/standards/premis/


Findings: coverage

SPOT property: # of PREMIS semantic units*
•  Availability: 16
•  Identity: 19
•  Persistence: 10
•  Renderability: 15
•  Understandability: 14
•  Authenticity: 16

*Container level only; Agents, Events, Rights considered one semantic unit


Findings: coverage

•  What does coverage in terms of “number of PREMIS semantic units” mean?
•  More meaningful: Do the PREMIS semantic units address the threats associated with a SPOT property?

Example of a gap between SPOT and PREMIS:
SPOT property: Understandability
We found no PREMIS semantic units that provide information that aids in the understanding or interpretation of the content of the archived digital object.


Findings: preservation policies

A repository usually implements a large number of explicit and implicit policy decisions; however, PREMIS currently makes few provisions for recording these in preservation metadata (the semantic unit preservationLevel being a notable exception).


Findings: explicit encoding

PREMIS conformance does not require explicit encoding of metadata if the information applies to all objects in the repository.

This impedes the provision of automated PHC services (by a third-party provider) because efficient provision of this service would likely require the information in semantic units to be explicitly recorded, and implemented in a standard way.
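
To make the explicit-encoding point concrete, here is a minimal sketch of how an automated check might flag semantic units that are not explicitly recorded for an object. It is an illustration only, not the PHC implementation: the selection in `EXPECTED_UNITS`, the function name `missing_units` and the sample record are assumptions, and the sample is a simplified fragment in the PREMIS 2.x namespace rather than a schema-valid record.

```python
import xml.etree.ElementTree as ET
from typing import List

# PREMIS 2.x XML namespace.
PREMIS_NS = {"premis": "info:lc/xmlns/premis-v2"}

# Hypothetical selection of semantic units a health check might want to see
# recorded explicitly per object (an assumption, not taken from the slides).
EXPECTED_UNITS = {
    "fixity": "premis:objectCharacteristics/premis:fixity",
    "formatName": ("premis:objectCharacteristics/premis:format/"
                   "premis:formatDesignation/premis:formatName"),
    "storageMedium": "premis:storage/premis:storageMedium",
    "preservationLevel": "premis:preservationLevel/premis:preservationLevelValue",
}


def missing_units(object_xml: str) -> List[str]:
    """Return the names of expected semantic units absent from one PREMIS object."""
    obj = ET.fromstring(object_xml)
    return [name for name, path in EXPECTED_UNITS.items()
            if obj.find(path, PREMIS_NS) is None]


# Simplified, made-up object record: fixity and format are explicit, while
# storage medium and preservation level are left implicit by the repository.
SAMPLE = """<premis:object xmlns:premis="info:lc/xmlns/premis-v2">
  <premis:objectCharacteristics>
    <premis:fixity>
      <premis:messageDigestAlgorithm>SHA-256</premis:messageDigestAlgorithm>
      <premis:messageDigest>ab12</premis:messageDigest>
    </premis:fixity>
    <premis:format>
      <premis:formatDesignation>
        <premis:formatName>image/tiff</premis:formatName>
      </premis:formatDesignation>
    </premis:format>
  </premis:objectCharacteristics>
</premis:object>"""

print(missing_units(SAMPLE))
```

A unit reported as missing here is not necessarily unknown to the repository; it may apply to all objects and therefore be left implicit, which is exactly why a third-party service cannot rely on the metadata alone.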


Logic for assessing Persistence

[Diagram labels. SPOT Model: six essential properties of successful digital preservation (Availability, Persistence, Identity, Renderability, Understandability, Authenticity); Threats.]


Logic for assessing Persistence

•  If storage medium information is not available in PREMIS metadata, the PHC will need to take other information sources into account – such as audit reports generated by storage management systems.

•  We note that there are no pre-defined events for Corruption and Readability in PREMIS, which means that the repositories need to define their own events. PREMIS does provide a list of recommended event labels for the semantic unit eventType, but it is just a “suggested starter list”.

•  The repository should have policies in place that prescribe frequencies of fixity checks, of medium refreshment, backup policy, etc. The PREMIS semantic unit preservationLevel does not address such policies. The PHC flow thus needs to get the policy information from other sources.
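
The three observations above can be sketched as a small decision routine. This is a hedged illustration under stated assumptions, not the PHC logic itself: the `Event` and `Policy` classes, the function `assess_persistence`, the outcome value "pass" and the 90-day interval are all hypothetical. Only the event label "fixity check" comes from the PREMIS suggested starter list for eventType, and the policy value is passed in separately because it has to come from a source outside the PREMIS metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Event:
    event_type: str   # PREMIS eventType, e.g. "fixity check"
    event_date: date  # date part of PREMIS eventDateTime
    outcome: str      # PREMIS eventOutcome; "pass" is an assumed local value


@dataclass
class Policy:
    # Taken from an external source such as an SLA, not from PREMIS metadata.
    max_days_between_fixity_checks: int


def assess_persistence(storage_medium: Optional[str], events: List[Event],
                       policy: Policy, today: date) -> List[str]:
    """Return a list of findings; an empty list means no issue was detected."""
    findings = []
    if not storage_medium:
        # Fall back on other sources, as the first bullet suggests.
        findings.append("storageMedium not recorded: consult storage audit reports")
    checks = sorted((e for e in events if e.event_type == "fixity check"),
                    key=lambda e: e.event_date)
    if not checks:
        findings.append("no fixity check event recorded")
    else:
        last = checks[-1]
        if last.outcome != "pass":
            findings.append("latest fixity check did not pass")
        limit = timedelta(days=policy.max_days_between_fixity_checks)
        if today - last.event_date > limit:
            findings.append("fixity check overdue under policy")
    return findings


# Made-up example: one successful check in January, assessed in June
# against an assumed 90-day policy, with no storage medium recorded.
example = assess_persistence(
    storage_medium=None,
    events=[Event("fixity check", date(2014, 1, 10), "pass")],
    policy=Policy(max_days_between_fixity_checks=90),
    today=date(2014, 6, 19),
)
print(example)
```

Events for corruption or readability are deliberately left out of the sketch, since PREMIS does not pre-define them and each repository would have to supply its own labels.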


A use case: the preservation health check (to be continued)


Thank You!

©2014 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This work uses content from [presentation title] © OCLC, used under a Creative Commons Attribution license: http://creativecommons.org/licenses/by/3.0/”
