San Diego Supercomputer Center University of California, San Diego
Preservation Research Roadmap
Reagan W. Moore
San Diego Supercomputer Center
http://www.sdsc.edu/srb
Preservation Environments
[Figure: records held inside a preservation environment, which sits inside the external world]
A preservation environment protects records from changes in the external world.
Preservation Research Roadmap
• Interpreting digital data
  - How to build generic format descriptions that span both scientific data and office products, so that only the description has to be migrated to new syntax (persistent objects)
• Preservation environment management
  - How to build generic preservation management software that is more broadly used
• Interoperability
  - How to show that preservation environments can exchange records while preserving integrity and authenticity
  - How to exchange records between systems with different management policies
Research Agenda
• Generic infrastructure
• Infrastructure used for preservation should also support:
  - Digital libraries
  - Data grids
  - Real-time sensor systems
  - Workflow provenance systems
  - Cyberinfrastructure
• Minimizes the risk that the infrastructure will become obsolete
• Includes development efforts from other projects
Scientific Data Format Virtualization
• Characterize the properties of a digital entity independently of the application that created it (scientific data)
  - Describe the structures present within the bit stream (DFDL, the Data Format Description Language)
  - Describe the relationships present between the structures:
• Logical relationships: semantic labels
• Temporal relationships: mapping of time stamps to a coordinate system
• Structural relationships: mapping of bytes to words to arrays
• Spatial relationships: mapping of arrays to coordinate systems, and of coordinate systems to geometry
• Functional relationships: mapping of semantic labels to physical quantities, and the allowed compositions of those physical quantities
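A minimal Python sketch of the idea, assuming a made-up description format (this is not DFDL syntax): the bits are interpreted only through an external description that records logical, structural, spatial, and temporal relationships. Functional relationships are not modeled. All names (`DESCRIPTION`, `parse`, `coordinates`, `timestamp`) and all values are hypothetical.

```python
import struct
from datetime import datetime, timedelta

# Hypothetical format description, stored separately from the bit stream.
# It is NOT DFDL syntax; field names and values are illustrative only.
DESCRIPTION = {
    "logical": {"label": "sea_surface_temperature"},                  # semantic label
    "structural": {"byte_order": "<", "word": "f", "shape": (2, 3)},  # little-endian float32, 2x3 array
    "spatial": {"origin": (10.0, 20.0), "step": (0.5, 0.25)},          # (lat, lon) of cell [0][0], step per index
    "temporal": {"epoch": "2000-01-01T00:00:00+00:00", "offset_seconds": 86400},
}


def parse(bits: bytes, desc: dict) -> list:
    """Structural relationship: map bytes to words, then words to a 2-D array."""
    s = desc["structural"]
    rows, cols = s["shape"]
    words = struct.unpack(f'{s["byte_order"]}{rows * cols}{s["word"]}', bits)
    return [list(words[r * cols:(r + 1) * cols]) for r in range(rows)]


def coordinates(i: int, j: int, desc: dict) -> tuple:
    """Spatial relationship: map array indices (i, j) to (lat, lon)."""
    (lat0, lon0), (dlat, dlon) = desc["spatial"]["origin"], desc["spatial"]["step"]
    return (lat0 + i * dlat, lon0 + j * dlon)


def timestamp(desc: dict) -> datetime:
    """Temporal relationship: map a stored offset onto an absolute UTC time."""
    t = desc["temporal"]
    return datetime.fromisoformat(t["epoch"]) + timedelta(seconds=t["offset_seconds"])
```

Packing six floats with `struct.pack("<6f", ...)` and passing them to `parse` yields a 2x3 grid, and `coordinates(1, 2, DESCRIPTION)` returns `(10.5, 20.5)`. Migrating to a new description syntax would touch only `DESCRIPTION`, never the bits.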
Persistent Objects
• Keep the original bits unchanged
• Separate the knowledge required for parsing from the manipulation behaviors
• Migrate the knowledge representation onto new syntax over time
• For office products: Multivalent
  - Structure and relationships are captured within a media adaptor
  - Behaviors (manipulations of the structures) are based on the defined relationships
  - New behaviors can be added on top of the original structures
  - Or presentation can be restricted to the original behaviors
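The persistent-object pattern above can be sketched in a few lines of Python, assuming a toy record layout `TITLE|BODY` and a dictionary as the parsing knowledge. This is not Multivalent's media-adaptor API; `description_v1`, `migrate`, `parse`, `BEHAVIORS`, and `present` are hypothetical names. The sketch shows the original bits staying fixed (checked with a SHA-256 digest) while only the description moves to a new syntax, and behaviors acting on the parsed structure.

```python
import hashlib
import json

# Original bits (hypothetical record layout "TITLE|BODY"); never modified.
ORIGINAL = b"Quarterly summary|Text of the report body."
FIXITY = hashlib.sha256(ORIGINAL).hexdigest()

# Knowledge needed for parsing, kept apart from the bits (hypothetical schema).
description_v1 = {"encoding": "utf-8", "separator": "|", "fields": ["title", "body"]}


def migrate(desc: dict) -> str:
    """Re-express the parsing knowledge in a new syntax (JSON text here)."""
    return json.dumps(desc, sort_keys=True)


def parse(bits: bytes, desc: dict) -> dict:
    """Apply the parsing knowledge to the unchanged bits to recover the structure."""
    text = bits.decode(desc["encoding"])
    return dict(zip(desc["fields"], text.split(desc["separator"])))


# Behaviors manipulate the parsed structure, never the bits.
BEHAVIORS = {"show_title": lambda s: s["title"]}
ORIGINAL_BEHAVIORS = frozenset(BEHAVIORS)


def add_behavior(name: str, fn) -> None:
    BEHAVIORS[name] = fn


def present(structure: dict, name: str, original_only: bool = False):
    """Run a behavior; optionally restrict presentation to the original behaviors."""
    if original_only and name not in ORIGINAL_BEHAVIORS:
        raise ValueError(f"{name!r} is not an original behavior")
    return BEHAVIORS[name](structure)


description_v2 = json.loads(migrate(description_v1))  # new syntax, same knowledge
structure = parse(ORIGINAL, description_v2)
add_behavior("word_count", lambda s: len(s["body"].split()))
```

Here `present(structure, "word_count")` uses a behavior added after the fact, while `original_only=True` limits the record to the behaviors it shipped with.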
Designated Community
• Each designated community defines:
  - Standard semantics (astronomy community: Uniform Content Descriptors)
  - Standard encoding format (astronomy community: FITS files)
  - Standard services that manipulate the standard format using the standard semantics (astronomy community: SIAP, the Simple Image Access Protocol)
• Can we build better representations for describing community standards?
• Can format virtualization simplify tasks for the designated community?
Preservation Environment
[Figure: layered view of a preservation environment, top to bottom]
• Preservation environment: preservation properties, preservation control, preservation operations
• Management functions: assessment criteria, management policies, capabilities
• Preservation environment: persistent state, rules, services
• Physical infrastructure: database, rule engine, storage system
• Implemented by: SRB, iRODS
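A minimal Python sketch of how one column of these layers might line up, assuming a single preservation property ("each record is held on at least two resources"). Persistent state lives in an in-memory SQLite database, the assessment criterion is a query over that state, the management policy is a rule, and the capability is a service that adds replicas. The table layout, resource names, threshold, and function names are hypothetical, and the storage layer is only simulated by inserting rows.

```python
import sqlite3

# Persistent state (database layer): one row per replica of a record.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE replicas (record TEXT, resource TEXT)")
db.executemany(
    "INSERT INTO replicas VALUES (?, ?)",
    [("rec-1", "disk-a"), ("rec-1", "tape-b"), ("rec-2", "disk-a")],
)

MIN_REPLICAS = 2  # hypothetical assessment threshold


def failing_records(conn, minimum=MIN_REPLICAS):
    """Assessment criterion: records stored on fewer than `minimum` distinct resources."""
    rows = conn.execute(
        "SELECT record FROM replicas GROUP BY record "
        "HAVING COUNT(DISTINCT resource) < ? ORDER BY record",
        (minimum,),
    )
    return [row[0] for row in rows]


def replicate(conn, record, resource):
    """Capability exposed as a service (here it only records the new replica)."""
    conn.execute("INSERT INTO replicas VALUES (?, ?)", (record, resource))


def enforce_policy(conn, resources=("disk-a", "tape-b")):
    """Management policy as a rule: replicate onto unused resources until the
    criterion is met or the candidate resources run out."""
    for record in failing_records(conn):
        held = {r for (r,) in conn.execute(
            "SELECT resource FROM replicas WHERE record = ?", (record,))}
        for resource in resources:
            if resource not in held and len(held) < MIN_REPLICAS:
                replicate(conn, record, resource)
                held.add(resource)
```

Before `enforce_policy(db)` runs, `failing_records(db)` reports `rec-2`; afterwards the assessment returns an empty list.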
iRODS - integrated Rule-Oriented Data System
[Figure: iRODS architecture. Components: client and admin interfaces; rule invoker, rule engine, and rule base; Confs (configurations); metadata persistent repository holding the current state; service manager; micro-service modules for resource-based and metadata-based services; resources; administrative modules MetadataModifierModule, ConfigModifierModule, RuleModifierModule, and ConsistencyCheckModule]
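A minimal Python sketch of the control flow the architecture suggests, assuming a simplified model: the rule invoker receives an event, the rule engine selects matching rules from the rule base, each rule runs a chain of micro-services, and the micro-services update the current state kept in the metadata repository. A consistency check then audits that state. This is not iRODS's actual rule language or micro-service interface; `Rule`, `RuleEngine`, the `ms_*` functions, the `"put"` event, and the 10-year retention value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, dict]  # metadata persistent repository: logical path -> attributes
MicroService = Callable[[State, str], None]


@dataclass
class Rule:
    event: str
    condition: Callable[[State, str], bool]
    chain: List[MicroService]  # micro-services executed in order


@dataclass
class RuleEngine:
    rule_base: List[Rule]
    state: State = field(default_factory=dict)

    def invoke(self, event: str, path: str) -> int:
        """Rule invoker: run each matching rule whose condition holds; return how many fired."""
        fired = 0
        for rule in self.rule_base:
            if rule.event == event and rule.condition(self.state, path):
                for micro_service in rule.chain:
                    micro_service(self.state, path)
                fired += 1
        return fired


# Hypothetical micro-services operating on the current state.
def ms_register(state: State, path: str) -> None:
    state.setdefault(path, {"replicas": 1})


def ms_replicate(state: State, path: str) -> None:
    state[path]["replicas"] += 1


def ms_set_retention(state: State, path: str) -> None:
    state[path]["retention_years"] = 10  # hypothetical retention period


def consistency_check(state: State) -> List[str]:
    """Report archived objects whose current state violates the replication policy."""
    return sorted(p for p, a in state.items()
                  if p.startswith("/archive/") and a.get("replicas", 0) < 2)


engine = RuleEngine(rule_base=[
    Rule("put", lambda s, p: True, [ms_register]),
    Rule("put", lambda s, p: p.startswith("/archive/"), [ms_replicate, ms_set_retention]),
])
```

Because policies live in the rule base rather than in the engine, changing what happens on ingest means editing rules, not code paths; this mirrors the role of the RuleModifierModule in the figure.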
iRODS - infrastructure independence
• Six logical name spaces are required to manage preservation properties:
  - Records
  - Persons
  - Storage resources
  - Rules
  - Micro-services
  - Persistent state information
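A minimal Python sketch of why logical name spaces give infrastructure independence, assuming a single in-memory catalog: records are addressed only by logical names, and the mapping to physical storage sits in a separate name space, so replacing a storage system changes one mapping and no record names. The `Catalog` class, its fields, and the sample paths and URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Six logical name spaces; each maps a logical name to infrastructure-specific detail."""
    records: dict = field(default_factory=dict)         # logical path -> (logical resource, physical path)
    persons: dict = field(default_factory=dict)         # logical user name -> local account
    resources: dict = field(default_factory=dict)       # logical resource -> physical storage address
    rules: dict = field(default_factory=dict)           # rule name -> rule definition
    micro_services: dict = field(default_factory=dict)  # micro-service name -> implementation
    state: dict = field(default_factory=dict)           # logical path -> persistent state attributes

    def locate(self, logical_path: str) -> tuple:
        """Resolve a record's logical name to its current physical location."""
        resource, physical_path = self.records[logical_path]
        return self.resources[resource], physical_path

    def migrate_resource(self, logical_resource: str, new_address: str) -> None:
        """Point a logical resource at new storage; assumes the bytes were already copied."""
        self.resources[logical_resource] = new_address


catalog = Catalog()
catalog.resources["archive-disk"] = "nfs://old-server/vol1"
catalog.records["/era/rec-1"] = ("archive-disk", "2007/rec-1.pdf")
```

After `catalog.migrate_resource("archive-disk", "s3://new-archive/vol1")`, the record is still addressed as `/era/rec-1`; only the result of `locate` changes.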
Summary of Mapping ERA Capabilities to Management Rules
• Multiple systems need to be integrated:
  - PAWN submission pipeline: 34 operations
  - Cheshire indexing system: 13 operations
  - Kepler workflow: 53 operations
  - iRODS data management: 597 operations
  - Operations facility: the remaining capabilities
• The 597 operations are executed by 174 generic rules
• The analysis identified five types of metadata attributes:
  - Collection metadata: 11 attributes
  - File metadata: 123 attributes
  - User metadata: 38 attributes
  - Resource metadata: 9 attributes
  - Rule metadata: 32 attributes
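The figures on this slide can be tallied directly. The short Python check below uses only the numbers stated above; the operations facility is excluded because no count is given for it.

```python
# Figures copied from the slide above.
operations = {"PAWN": 34, "Cheshire": 13, "Kepler": 53, "iRODS": 597}
generic_rules = 174
attributes = {"collection": 11, "file": 123, "user": 38, "resource": 9, "rule": 32}

named_operations = sum(operations.values())          # operations assigned to the four named systems
ops_per_rule = operations["iRODS"] / generic_rules   # average iRODS operations per generic rule
total_attributes = sum(attributes.values())          # metadata attributes across all five types

print(named_operations, round(ops_per_rule, 2), total_attributes)  # prints: 697 3.43 213
```

On average, each generic rule covers about 3.4 of the iRODS operations, and the five metadata types together comprise 213 attributes.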
Two Types of Rules
• Manage micro-services
  - Replicate, validate integrity, synchronize, manage disposition, …
  - Compare outcomes with expectations
• Manage structured information
  - Parse information from submission agreements and disposition agreements
  - Format information for dissemination information packages, archival information packages, and error reporting
• Expect a transformation to higher levels of granularity
  - Structured management policies
  - Structured micro-services (workflows)
  - Structured assertions
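A minimal Python sketch that combines the two rule types in one hypothetical ingest rule, assuming the submission agreement is plain text with `Key: value` lines that include an expected SHA-256 checksum. The structured-information part parses the agreement and formats an error report; the micro-service part runs an integrity check and compares the outcome with the expectation. All function names and the agreement format are hypothetical.

```python
import hashlib


# Type 1: a rule that manages a micro-service and compares the outcome with an expectation.
def validate_integrity(data: bytes, expected_sha256: str) -> dict:
    actual = hashlib.sha256(data).hexdigest()
    return {"service": "validate_integrity", "ok": actual == expected_sha256, "actual": actual}


# Type 2: rules that manage structured information.
def parse_agreement(text: str) -> dict:
    """Parse 'Key: value' lines from a (hypothetical) plain-text submission agreement."""
    terms = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            terms[key.strip().lower()] = value.strip()
    return terms


def format_error_report(record: str, outcome: dict) -> str:
    """Format a failed outcome as a one-line error report."""
    return f"ERROR record={record} service={outcome['service']} actual={outcome['actual']}"


def ingest_rule(record: str, data: bytes, agreement_text: str) -> str:
    terms = parse_agreement(agreement_text)                 # type 2: parse structured input
    outcome = validate_integrity(data, terms["checksum"])  # type 1: run micro-service, check outcome
    return "OK" if outcome["ok"] else format_error_report(record, outcome)
```

Keeping the two concerns in separate functions is what would let them be composed into larger structured policies and workflows, as the last bullet anticipates.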
More Information
SRB: http://www.sdsc.edu/srb
iRODS: http://www.sdsc.edu/srb/future/index.php/Main_Page