Preservation Research Roadmap

Page 1: Preservation Research Roadmap

San Diego Supercomputer Center University of California, San Diego

Preservation Research Roadmap

Reagan W. Moore

San Diego Supercomputer Center

[email protected]

http://www.sdsc.edu/srb

Page 2: Preservation Research Roadmap

Preservation Environments

[Figure labels: External World; Preservation Environment; Records]

A preservation environment protects records from changes in the external world.

Page 3: Preservation Research Roadmap

Preservation Research Roadmap

• Interpreting digital data
    How to build generic format descriptions across both scientific data and office products such that only the description is migrated to new syntax - persistent objects

• Preservation environment management
    How to build generic preservation management software that is more broadly used

• Interoperability
    How to show preservation environments can exchange records while preserving integrity and authenticity
    How to exchange records between systems with different management policies

Page 4: Preservation Research Roadmap

Research Agenda

• Generic infrastructure

• Infrastructure used for preservation should also support:
    Digital libraries
    Data grids
    Real-time sensor systems
    Workflow provenance systems
    Cyberinfrastructure

• Minimizes risk that infrastructure will become obsolete
    Includes development efforts from other projects

Page 5: Preservation Research Roadmap

Scientific Data Format Virtualization

• Characterize the properties of a digital entity independently of the creation application (scientific data)
    Describe the structures present within the bit stream - DFDL
    Describe the relationships present between the structures

• Logical relationships
    Semantic labels

• Temporal relationships
    Mapping of time stamps to a coordinate system

• Structural relationships
    Mapping of bytes to words to arrays

• Spatial relationships
    Mapping of arrays to coordinate systems
    Mapping of coordinate systems to geometry

• Functional relationships
    Mapping of semantic labels to physical quantities, and the allowed compositions of the physical quantities
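
The relationships listed on this slide can be illustrated with a small sketch. It is not DFDL and not code from the presentation: the `DESCRIPTION` dictionary, its field names, and the sample values are all hypothetical, chosen only to show how a description kept apart from the creating application could drive the mapping from bytes to words to an array, and from array indices to coordinates.

```python
import struct

# Hypothetical description of a bit stream. This is an illustrative
# dictionary, not DFDL syntax; every field name here is an assumption.
DESCRIPTION = {
    "byte_order": "<",        # structural: little-endian bytes -> words
    "word_type": "f",         # structural: each word is a 4-byte float
    "shape": (2, 3),          # structural: words -> 2 x 3 array (rows, cols)
    "label": "temperature",   # logical: semantic label
    "unit": "kelvin",         # functional: label -> physical quantity
    "origin": (10.0, 20.0),   # spatial: coordinate of element [0][0]
    "spacing": (0.5, 0.25),   # spatial: step per row and per column
}


def parse(bits, desc):
    """Map bytes to words to a nested list, using only the description."""
    rows, cols = desc["shape"]
    fmt = f"{desc['byte_order']}{rows * cols}{desc['word_type']}"
    words = struct.unpack(fmt, bits)
    return [list(words[r * cols:(r + 1) * cols]) for r in range(rows)]


def coordinate(desc, row, col):
    """Map an array index to a point in the described coordinate system."""
    y0, x0 = desc["origin"]
    dy, dx = desc["spacing"]
    return (y0 + row * dy, x0 + col * dx)


# Sample bit stream standing in for a file written by some application.
bits = struct.pack("<6f", 271.0, 272.0, 273.0, 274.0, 275.0, 276.0)
array = parse(bits, DESCRIPTION)
```

Here `parse` and `coordinate` read nothing except the description, so the same bit stream could be reinterpreted by changing the description alone.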

Page 6: Preservation Research Roadmap

Persistent Objects

• Keep the original bits unchanged

• Separate knowledge required for parsing from manipulation behaviors

• Migrate the knowledge representation onto new syntax over time

• For office products - Multivalent
    Structure and relationships captured within a media adaptor
    Behaviors (manipulations of the structures) based on the defined relationships
    Can add new behaviors on the original structures
    Or can restrict presentation to the original behaviors.
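
As a rough sketch of the persistent-object idea, the example below keeps a byte string fixed and migrates only its description from one syntax (JSON) to another (XML). The description fields, the function names, and the choice of JSON and XML are assumptions made for illustration; the slides do not prescribe any particular syntax, and this is not how Multivalent is implemented.

```python
import json
import struct
import xml.etree.ElementTree as ET

# The original bits: written once, never rewritten.
original_bits = struct.pack(">3h", 7, -2, 300)

# Knowledge needed for parsing, first expressed in JSON (hypothetical fields).
desc_json = json.dumps({"byte_order": ">", "count": 3, "word_type": "h"})


def migrate_json_to_xml(text):
    """Re-express the same description in a newer syntax."""
    root = ET.Element("description")
    for name, value in json.loads(text).items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def load_json(text):
    return json.loads(text)


def load_xml(text):
    fields = {child.tag: child.text for child in ET.fromstring(text)}
    fields["count"] = int(fields["count"])
    return fields


def parse(bits, fields):
    """Parsing knowledge only: turn the bits into structured values."""
    fmt = f"{fields['byte_order']}{fields['count']}{fields['word_type']}"
    return struct.unpack(fmt, bits)


def total(values):
    """A manipulation behavior, defined separately from parsing."""
    return sum(values)


desc_xml = migrate_json_to_xml(desc_json)
values_before = parse(original_bits, load_json(desc_json))
values_after = parse(original_bits, load_xml(desc_xml))
```

Because `total` works on the parsed structure rather than on the bytes, a new behavior could be added later without touching either the bits or the description.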

Page 7: Preservation Research Roadmap

Designated Community

• Each designated community defines:

• Standard semantics
    Astronomy community - Uniform Content Descriptors

• Standard encoding format
    Astronomy community - FITS file

• Standard services
    Manipulate standard format using standard semantics
    Astronomy community - SIAP, Simple Image Access Protocol

• Can we build better representations for description of the community standards?
    Can format virtualization simplify tasks for the designated community?

Page 8: Preservation Research Roadmap

Page 9: Preservation Research Roadmap

Preservation Environment

[Diagram labels, by layer]
Preservation Environment: Preservation Properties; Preservation Control; Preservation Operations
Management Functions: Assessment Criteria; Management Policies; Capabilities
Preservation Environment: Persistent State; Rules; Services
Physical Infrastructure: Database; Rule Engine; Storage System
Also labeled: SRB; iRODS

Page 10: Preservation Research Roadmap

iRODS - integrated Rule-Oriented Data System

[Architecture diagram labels: Resources; Client Interface; Admin Interface; MetadataModifierModule; ConfigModifierModule; RuleModifierModule; ConsistencyCheckModule (three instances); Confs; RuleBase; Metadata; Persistent Repository; Rule Engine; Current State; Rule Invoker; MicroService Modules (Resource-based Services); MicroService Modules (Metadata-based Services); ServiceManager]

Page 11: Preservation Research Roadmap

iRODS - infrastructure independence

• Six logical name spaces required to manage preservation properties
    Records
    Persons
    Storage resources
    Rules
    Micro-services
    Persistent state information
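
One way to picture how logical name spaces give infrastructure independence is a catalog that resolves logical names to physical ones. The sketch below is an assumption-laden illustration, not the iRODS catalog or its API: the `Catalog` class, its methods, and all names and paths are hypothetical.

```python
from dataclasses import dataclass, field

# The six name spaces listed on the slide, as hypothetical dictionary keys.
NAME_SPACES = (
    "records",
    "persons",
    "storage_resources",
    "rules",
    "micro_services",
    "persistent_state",
)


@dataclass
class Catalog:
    """Hypothetical catalog mapping logical names to physical entries."""
    names: dict = field(
        default_factory=lambda: {ns: {} for ns in NAME_SPACES}
    )

    def register(self, name_space, logical_name, physical):
        if name_space not in self.names:
            raise KeyError(f"unknown name space: {name_space}")
        self.names[name_space][logical_name] = physical

    def resolve(self, name_space, logical_name):
        return self.names[name_space][logical_name]

    def locate(self, record_name):
        """Resolve a record through its logical storage resource."""
        resource_name, path = self.resolve("records", record_name)
        base = self.resolve("storage_resources", resource_name)
        return f"{base}/{path}"


catalog = Catalog()
catalog.register("storage_resources", "archive-1", "file:///mnt/disk_a")
catalog.register("records", "/collection/report-001",
                 ("archive-1", "2007/report-001.dat"))
location_before = catalog.locate("/collection/report-001")

# Replace the physical storage; the logical names stay the same.
catalog.register("storage_resources", "archive-1", "file:///mnt/disk_b")
location_after = catalog.locate("/collection/report-001")
```

In this sketch only the storage-resource entry changes when hardware is replaced, so every record that refers to `archive-1` keeps its logical name.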

Page 12: Preservation Research Roadmap

Summary of Mapping ERA Capabilities to Management Rules

• Multiple systems need to be integrated:
    PAWN submission pipeline - 34 operations
    Cheshire indexing system - 13 operations
    Kepler workflow - 53 operations
    iRODS data management - 597 operations
    Operations facility - the remaining capabilities

• The 597 operations are executed by 174 generic rules

• The analysis identified five types of metadata attributes:
    Collection metadata - 11 attributes
    File metadata - 123 attributes
    User metadata - 38 attributes
    Resource metadata - 9 attributes
    Rule metadata - 32 attributes

Page 13: Preservation Research Roadmap

Two Types of Rules

• Manage micro-services
    Replicate, validate integrity, synchronize, manage disposition, …
    Compare outcomes with expectations

• Manage structured information
    Parse information from submission agreements, disposition agreements
    Format information for dissemination information packages, archival information packages, error reporting

• Expect transformation to higher levels of granularity
    Structured management policies
    Structured micro-services - workflows
    Structured assertions
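
The first type of rule, one that invokes micro-services and compares the outcome with an expectation, might look roughly like the sketch below. It is written in plain Python rather than the iRODS rule language, and the function names, the in-memory `state` dictionary, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


# Hypothetical micro-services operating on an in-memory storage model.
def replicate(state, name, source, target):
    """Copy a record to another resource and report the copy's checksum."""
    state[target][name] = state[source][name]
    return checksum(state[target][name])


def validate_integrity(state, name, resource, expected):
    """Recompute the checksum of a stored record and compare it."""
    return checksum(state[resource][name]) == expected


# Hypothetical rule: chain the micro-services and check the outcome.
def replication_rule(state, name, source, target):
    expected = checksum(state[source][name])
    outcome = replicate(state, name, source, target)
    verified = (outcome == expected
                and validate_integrity(state, name, target, expected))
    # The returned dictionary stands in for persistent state information.
    return {"record": name, "action": "replicate", "verified": verified}


state = {"disk": {"report-001": b"original bits"}, "tape": {}}
result = replication_rule(state, "report-001", "disk", "tape")
```

The rule returns a small record of what was done and whether it was verified, which is the kind of information a later assessment could query.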

Page 14: Preservation Research Roadmap

More Information

[email protected]

SRB: http://www.sdsc.edu/srb

iRODS: http://www.sdsc.edu/srb/future/index.php/Main_Page