Infrastructure Training Session

An Infrastructure for Preservation. Claudio Prandoni and Marlis Valentini, MetaWare SpA & CASPAR

description

This presentation was delivered during the joint DPE/Planets/CASPAR/nestor training event, ‘The Preservation challenge: basic concepts and practical applications’ (Barcelona, March 2009). It explains how CASPAR aims to solve, from a technical point of view, the problem of keeping digital data accessible and intelligible in the long term. The CASPAR approach is presented as an implementation of the OAIS functional model, introducing the CASPAR Key Components, i.e. the main building blocks of the CASPAR architecture, and giving an overview of their functionality, their usage and their role in the digital preservation workflow. The objective is to clarify how the digital preservation workflow is realised within the CASPAR architecture.

Transcript of Infrastructure Training Session

Page 1: Infrastructure Training Session

An Infrastructure for Preservation

Claudio Prandoni

Marlis Valentini

MetaWare SpA & CASPAR

Page 2: Infrastructure Training Session

Programme

• Digital preservation threats and requisites
• Summary of OAIS model
• From OAIS to CASPAR
• CASPAR key components
• Ex. 1: Preservation step by step
• Demo: A simple web application
• Ex. 2: CASPAR answers to preservation threats
• A preservable architecture
• Interviews: Two case studies

Page 3: Infrastructure Training Session

Introduction

• How can digital data still be used and understood in the future when systems, software, and everyday knowledge continue to change? This is the CASPAR challenge.

Page 4: Infrastructure Training Session

Preservation Issue 1

• Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved

– How to guarantee digital information may be accessed and understood in the future?

– How to guarantee retrieval of Archival Information?

– How to guarantee intelligibility of digital information within heterogeneous Designated Communities?

Page 5: Infrastructure Training Session

Preservation Issue 2

• Non-maintainability of essential hardware, software or support environment may make the information inaccessible

– How to guarantee preservation actors are informed about change events?

– How to guarantee appropriate actions are undertaken to preserve Archival Information against change events?

Page 6: Infrastructure Training Session

Preservation Issue 3

• The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity

– How to guarantee an adequate integrity and identity for any Archival Information?

Page 7: Infrastructure Training Session

Preservation Issue 4

• Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future

– How to guarantee adequately secure access, with the proper rights, to any resource and functionality within an Archive?

Page 8: Infrastructure Training Session

Preservation Issue 5

• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future

– How to guarantee proper information package management within an Archive?

– How to guarantee long-term preservation and maintenance of any information package?

Page 9: Infrastructure Training Session

The CASPAR Project

• The CASPAR project is mainly based on the OAIS standard, ISO 14721:2003

• In this perspective, its Architecture is defined for

– Managing key concepts of the OAIS reference model

– Supporting main functionality identified in the OAIS functional model

• Moreover, the CASPAR project aims to define and implement interfaces and functionally independent components

Page 10: Infrastructure Training Session

OAIS Information Model

[Diagram: OAIS Information Model]

• Information Package

• Content Information (primary focus of archival preservation): a Data Object interpreted using Representation Information, which is in turn interpreted using the Designated Community Knowledge Base

• Preservation Description Information (needed for long-term preservation)

• Descriptive Information (needed for discovery)

Page 11: Infrastructure Training Session

OAIS Functional Model

[Diagram: OAIS functional model, with the Producer, Manager and Consumer roles]

Page 12: Infrastructure Training Session

CASPAR Implementation

• Monitoring OAIS Environment
• Detect Changes/Impacts in DCKB
• Mapping out Preservation Strategy
• Provide Recommendations

STORAGE
• AIP Storage
• AIP Maintenance
• AIP Retrieval

DATA MANAGEMENT
• Populate Descriptive Info
• Maintain Descriptive Info
• Access Descriptive Info

INGEST
• Receive SIP
• Q-check on SIP
• Generate AIP
• Extract DescInfo
• Coordinate updates

ACCESS
• Query Processing
• Retrieval
• Delivery
• Perform Transformation

• Security
• Access Control

Page 13: Infrastructure Training Session

CASPAR Implementation

[Diagram: STORAGE, DATA MANAGEMENT, INGEST, ACCESS]

Page 14: Infrastructure Training Session

CASPAR key components

Creation, maintenance and reuse of OAIS Representation Information

Allow search of an object using either a related measurable parameter or a linkage to remote values

Construction and unpackaging of OAIS Information Packages

Centralised and persistent storage and retrieval of OAIS Representation Information, including PDI

OAIS-based Preservation Aware Storage, providing built-in support for bit and logical preservation

Page 15: Infrastructure Training Session

CASPAR key components

Information discovery services

Definition and enforcement of access control policies

Registration of provenance information on digital works and retrieval of right holding information

Maintenance and verification of authenticity in terms of identity and integrity of the digital objects

Reception of notifications from Publishers for a specific “topic” and sending of alerts to Subscribers

Definition of Designated Communities, identification of missing Representation Information
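The last item, identifying missing Representation Information for a Designated Community, can be illustrated with a small sketch. It assumes, hypothetically, that each piece of Representation Information lists the other pieces needed to interpret it, and that a community profile is simply the set of things its members already understand. The names `DEPENDS_ON` and `knowledge_gap`, and the sample data, are illustrative and not taken from CASPAR.

```python
# Hypothetical dependency graph: each item maps to the Representation
# Information needed to interpret it. Names are illustrative only.
DEPENDS_ON = {
    "survey.fits": {"FITS standard"},
    "FITS standard": {"ASCII", "FITS dictionary"},
    "FITS dictionary": {"English"},
    "ASCII": set(),
    "English": set(),
}


def knowledge_gap(item, known, depends_on=DEPENDS_ON):
    """Return the Representation Information a community still lacks.

    Walks the dependency graph from `item`, stopping at anything the
    community already knows, and collects everything else. Assumes that
    knowing a module implies knowing whatever it depends on.
    """
    gap = set()
    stack = list(depends_on.get(item, ()))
    while stack:
        module = stack.pop()
        if module in known or module in gap:
            continue  # already understood, or already recorded as missing
        gap.add(module)
        stack.extend(depends_on.get(module, ()))
    return gap


astronomers = {"FITS standard", "English", "ASCII"}
general_public = {"English"}

print(sorted(knowledge_gap("survey.fits", astronomers)))
print(sorted(knowledge_gap("survey.fits", general_public)))
```

The same object yields an empty gap for one community and a non-empty gap for another, which is why the profile of the Designated Community has to be managed alongside the data.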

Page 16: Infrastructure Training Session

The CASPAR Workflow

Page 17: Infrastructure Training Session

Preservation step by step

1) The digital content object has to be “prepared” and “packed” in a proper way to be “ingested” in the digital archive system that will manage and maintain it for a long time.

2) The digital content object has to be “retrieved” within the digital archive, through its descriptive information, and “checked” for any restricting access right policy.

3) The digital content object within the digital archive needs to be maintained so that it can still be accessed, used and understood whatever changes occur during its long-term lifecycle.

Page 18: Infrastructure Training Session


Ingestion steps

Page 19: Infrastructure Training Session

Ingestion Phase

Information Packaging Components

1. Ingest Content Information
2. Create Information Package
   • Representation Info
   • Descriptive Info
   • Preservation Description Info
3. Check Information Package
4. Store Information Package for long term

[Diagram: OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access]
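The four ingestion steps on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CASPAR packaging API: the `InformationPackage` class, the function names, the use of a SHA-256 digest as fixity information, and the plain dictionary standing in for Archival Storage are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Illustrative stand-in for an OAIS Information Package."""
    content: bytes
    representation_info: dict
    descriptive_info: dict
    preservation_description_info: dict = field(default_factory=dict)


def create_package(content, representation_info, descriptive_info):
    # Step 2: wrap the content with the three kinds of information
    # named on the slide. Recording a SHA-256 digest as fixity is an
    # assumption of this sketch.
    pdi = {"fixity_sha256": hashlib.sha256(content).hexdigest()}
    return InformationPackage(content, representation_info, descriptive_info, pdi)


def check_package(package):
    # Step 3: return a list of problems; an empty list means the check passed.
    problems = []
    if not package.representation_info:
        problems.append("missing Representation Info")
    if not package.descriptive_info:
        problems.append("missing Descriptive Info")
    digest = hashlib.sha256(package.content).hexdigest()
    if package.preservation_description_info.get("fixity_sha256") != digest:
        problems.append("fixity mismatch")
    return problems


def ingest(content, representation_info, descriptive_info, store):
    # Steps 1 to 4 in order; `store` is a plain dict standing in for
    # Archival Storage.
    package = create_package(content, representation_info, descriptive_info)
    problems = check_package(package)
    if problems:
        raise ValueError("; ".join(problems))
    package_id = package.preservation_description_info["fixity_sha256"][:12]
    store[package_id] = package
    return package_id


archive = {}
pid = ingest(b"raw sensor data", {"format": "CSV, UTF-8"}, {"title": "Sensor run 1"}, archive)
print(pid in archive)
```

Rejecting a package before storage, rather than after, keeps incomplete packages out of the archive; a package without Representation Information would be stored bits that nobody can interpret later.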

Page 20: Infrastructure Training Session


Access steps

Page 21: Infrastructure Training Session

Access Phase

Information Access Components

1. Search Content Information
2. Obtain Information Packages with their Contents and Descriptions
3. Check Content Access Permissions

[Diagram: OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access]
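The access steps (search through descriptive information, obtain the package, check permissions) can be sketched as follows. The dictionary layout, the role names and the functions `search` and `access` are assumptions made for illustration; they do not reproduce the CASPAR Finding Aids or access-control interfaces.

```python
def search(archive, term):
    """Step 1: find package ids whose descriptive info mentions `term`."""
    term = term.lower()
    return [
        pid for pid, pkg in archive.items()
        if any(term in str(value).lower() for value in pkg["descriptive_info"].values())
    ]


def access(archive, pid, user_roles):
    """Steps 2 and 3: fetch a package, then enforce its access policy."""
    pkg = archive[pid]
    allowed = pkg["allowed_roles"]
    if not allowed & set(user_roles):
        raise PermissionError(f"access to {pid} denied")
    return pkg["content"], pkg["descriptive_info"]


# Hypothetical archive contents, for illustration only.
archive = {
    "pkg-1": {
        "content": b"score.pdf bytes",
        "descriptive_info": {"title": "Electroacoustic score", "year": 1987},
        "allowed_roles": {"researcher", "curator"},
    },
    "pkg-2": {
        "content": b"telemetry",
        "descriptive_info": {"title": "Ozone telemetry", "year": 2003},
        "allowed_roles": {"curator"},
    },
}

hits = search(archive, "score")
print(hits)
content, description = access(archive, hits[0], ["researcher"])
print(description["title"])
```

Discovery and permission checking are kept separate here: a user may learn that a package exists through its descriptive information and still be refused its content.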

Page 22: Infrastructure Training Session


Preservation steps

Page 23: Infrastructure Training Session

Preservation Phase

Communication Components

1. Notify and alert on Change Events impacting long-term preservation
2. Trigger Preservation Process

[Diagram: OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access]
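The notify-then-trigger pattern on this slide is a publish/subscribe arrangement, which can be sketched as below. The `NotificationBroker` class, the topic name and the event payload are hypothetical; the sketch only illustrates the pattern and is not the CASPAR messaging interface.

```python
from collections import defaultdict


class NotificationBroker:
    """Minimal topic-based publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic and
        # return how many were alerted.
        for callback in self._subscribers[topic]:
            callback(event)
        return len(self._subscribers[topic])


triggered = []


def start_preservation_process(event):
    # Step 2: a real system would plan a migration or update
    # Representation Information; here we only record the trigger.
    triggered.append(f"review packages using {event['format']}")


broker = NotificationBroker()
broker.subscribe("format-obsolescence", start_preservation_process)

# Step 1: a publisher announces a change event.
alerted = broker.publish("format-obsolescence", {"format": "WordStar 4"})
print(alerted, triggered)
```

Because publishers and subscribers only share a topic name, an archive can start listening for a new kind of change event without the publisher being modified.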

Page 24: Infrastructure Training Session

CASPAR innovations

• CASPAR aims at preserving not only the bits of digital objects but also the information and knowledge that is encoded in digital objects

• CASPAR aims at preserving digital rights on contents and at identifying mechanisms to ensure maintenance and verification of the authenticity of digital objects along the whole preservation process

Page 25: Infrastructure Training Session

Phaistos disk (1700 BC)

We still cannot understand it

(the meaning has not been preserved)

We can only understand it’s a “sequence of symbols”…

Page 26: Infrastructure Training Session

Rosetta Stone (196 BC)

…just a “sequence of symbols”… but…

Ancient Hieroglyphic Egyptian

Demotic Egyptian

Greek

Page 27: Infrastructure Training Session

Additional components

Designated Community & Knowledge Management

1. Deal with Designated Community Profile and its own Knowledge Base

2. Identify and Provide Knowledge Gap for understanding a Content Information

Provenance Management

1. Deal with Digital Rights

2. Guarantee Authenticity

Page 28: Infrastructure Training Session

Web Application

Page 29: Infrastructure Training Session

CASPAR answers

• So…

Is the CASPAR solution able to provide an answer to the digital preservation issues identified at the beginning?

Page 30: Infrastructure Training Session

Preservation Issue 1

• Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved

– You need the ability to create and maintain adequate Representation Information

Page 31: Infrastructure Training Session

Preservation Issue 1

• To guarantee that digital information may be accessed and understood in the future, you need adequate OAIS Representation Information

• To guarantee retrieval of Archival Information, you need OAIS Finding Aids

• To guarantee intelligibility of digital information within heterogeneous Designated Communities, you need to manage DC Profiles and their Knowledge Base

Page 32: Infrastructure Training Session

Preservation Issue 2

• Non-maintainability of essential hardware, software or support environment may make the information inaccessible

– You need the ability to share information about the availability of hardware and software and their replacements/substitutes

Page 33: Infrastructure Training Session

Preservation Issue 2

• To guarantee preservation actors are informed about change events, you need adequate management of message exchange

• To guarantee appropriate actions are undertaken to preserve Archival Information against change events, you need to identify the information to be added/modified

Page 34: Infrastructure Training Session

Preservation Issue 3

• The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity

– You need the ability to bring together evidence from diverse sources about the Authenticity of a digital object

Page 35: Infrastructure Training Session

Preservation Issue 3

• To guarantee an adequate integrity and identity for any Archival Information, you need an Authenticity Tool
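The integrity half of this requirement can be illustrated with a fixity check. This is a minimal sketch that assumes a SHA-256 digest is recorded when the object enters the archive; the function names are hypothetical and the slides do not say how the Authenticity Tool works internally. A digest only shows that the bits are unchanged; establishing identity and provenance needs further evidence, which this sketch does not cover.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the object's bits."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """Check that the bits still match the digest recorded at ingest."""
    return hmac.compare_digest(fingerprint(data), recorded_digest)


original = b"archived object"
recorded = fingerprint(original)  # stored alongside the provenance record

print(verify_integrity(original, recorded))
print(verify_integrity(b"archived 0bject", recorded))
```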

Page 36: Infrastructure Training Session

Preservation Issue 4

• Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future

– You need the ability to deal with Digital Rights correctly in a changing and evolving environment

Page 37: Infrastructure Training Session

Preservation Issue 4

• To guarantee adequately secure access, with the proper rights, to any resource and functionality within an OAIS Archive, you need Security and DRM Management

Page 38: Infrastructure Training Session

Preservation Issue 5

• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future

– You need brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation

Page 39: Infrastructure Training Session

Preservation Issue 5

– To guarantee proper information package management within an OAIS Archive, you need to create an adequate OAIS Information Package

– To guarantee long-term preservation and maintenance of any information package, you need an implementation of OAIS Archival Storage

Page 40: Infrastructure Training Session

Conclusion

[Diagram: The CASPAR Foundation, shown as layers]

Platform
• Operating System: Linux, Unix, Windows, Mac
• Java Platform
• DBMS: H2, Postgres

Framework
• Development Framework: JAX-WS, GWT, Ant
• Application Server: Tomcat, Glassfish, WASCE

Key Components
• GapManager
• Orchestration
• DataAccess&Security
• RepInfoToolbox
• Registry
• Packaging
• DataStores
• Virtualisation
• Authenticity
• SemanticWeb
• DigitalRights
• FindingAids

CASPAR Service Factory

Development Management: Hudson and JTrac

Page 41: Infrastructure Training Session

Preservable Equation

Self-Contained + Well Described + Adaptable + Replaceable = Preservable

Self-Contained: a pure service-oriented design guarantees that a component can provide its functionality without requiring the cooperation of other components.
• No Dependencies
• Loosely coupled
• Distributed

Well Described: component analysis, design and development are strongly based on complete, shared and open documentation at every level.
• Sharing know-how
• Open Specification
• Open Source
• Open Documentation

Adaptable: design choices and implementation allow each component to be adapted and configured so that it always provides at least a minimal set of functionality, independently of the deployment framework and conditions.
• Flexibility
• Scalability

Replaceable: design choices and implementation allow any component in the framework to be replaced with a compliant one.
• Interoperability
• Maintainability

Page 42: Infrastructure Training Session

The Developer Community

http://developers.casparpreserves.eu:8080

• Shared and cooperative development community based on

– CASPAR Best Practices

• Development Management based on a detailed

– D1302 Overall Master Plan

– Refinement Specifications

• Development Control based on a Continuous Integration Engine

– Hudson + JTrac

• Specification, Software and Documentation available for developers & practitioners

Page 43: Infrastructure Training Session

CASPAR Preservation Nodes

Page 44: Infrastructure Training Session

Use cases

• Artistic Testbed – IRCAM

• Scientific Testbed – ESA