May 23 2007 Archiving 2007 1
PAWN: A Policy-Driven Software Environment for Implementing Producer-Archive Interactions in Support of Long Term Digital Preservation
Mike Smorul, Mike McGann, Joseph JaJa
Institute for Advanced Computer Science Studies, University of Maryland, College Park
Sponsored by National Archives and Records Administration, Library of Congress and NSF
Problems Facing Ingestion
• Ensure integrity of data ingestion
• Each producer-archive interaction is unique
• Final destination for items in an archive is unique
• Differing roles between producer and archive
• Hostile producers
What is PAWN?
• Software that provides an ingestion framework
• Distributed and secure ingestion of digital objects into an archive.
• Handles the process
  – From package assembly
  – To archival storage
• Simple, customizable interface for end-users
• Flexible interface for archive publication
Package Workflow
1. Create Producer-Archive Agreement
2. Client package template.
3. Create package based on template
4. Once approved, packages can be archived
5. Rejected packages can be held until rectified or deleted for resubmission.
Figure: Package Lifecycle. Stages shown: Create Template, Create Package, Audit Package, with an Activity Log. Components shown: Package Builder, Review, Archive Gateway, Archive.
Producer Agreement (example record categories):
· Administrative: Strategic and Performance Plans; Appointment and Promotion; Policies and Committees; Alumni Affairs
· Financial: Contracts and Grants; Payroll; Donations
· Publication: Reports; Technical Reports; Presentations; Posters; Outreach
Template (example):
· Template Name: Research Results
· Notes: Published results and conference presentations
· Contents: Presentations; Technical Reports
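The workflow steps above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not PAWN's actual implementation: the state names (including an explicit SUBMITTED state for packages awaiting review) and the classes `PackageState` and `PackageLifecycle` are hypothetical and chosen only for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lifecycle states; the names are illustrative, not PAWN's own.
enum PackageState { CREATED, SUBMITTED, APPROVED, REJECTED, ARCHIVED, DELETED }

final class PackageLifecycle {
    // Allowed transitions, following the workflow steps listed on the slide.
    private static final Map<PackageState, Set<PackageState>> ALLOWED =
            new EnumMap<>(PackageState.class);
    static {
        ALLOWED.put(PackageState.CREATED, EnumSet.of(PackageState.SUBMITTED));
        ALLOWED.put(PackageState.SUBMITTED,
                EnumSet.of(PackageState.APPROVED, PackageState.REJECTED));
        // Only approved packages can be archived.
        ALLOWED.put(PackageState.APPROVED, EnumSet.of(PackageState.ARCHIVED));
        // A rejected package is held until rectified (resubmitted) or deleted.
        ALLOWED.put(PackageState.REJECTED,
                EnumSet.of(PackageState.SUBMITTED, PackageState.DELETED));
        ALLOWED.put(PackageState.ARCHIVED, EnumSet.noneOf(PackageState.class));
        ALLOWED.put(PackageState.DELETED, EnumSet.noneOf(PackageState.class));
    }

    private PackageState state = PackageState.CREATED;

    PackageState state() { return state; }

    boolean canMoveTo(PackageState next) {
        return ALLOWED.get(state).contains(next);
    }

    void moveTo(PackageState next) {
        if (!canMoveTo(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}

final class PackageLifecycleDemo {
    public static void main(String[] args) {
        PackageLifecycle pkg = new PackageLifecycle();
        pkg.moveTo(PackageState.SUBMITTED);
        pkg.moveTo(PackageState.REJECTED);   // held for correction
        pkg.moveTo(PackageState.SUBMITTED);  // rectified and resubmitted
        pkg.moveTo(PackageState.APPROVED);
        pkg.moveTo(PackageState.ARCHIVED);
        System.out.println(pkg.state());
    }
}
```

Keeping the transition table in one place makes it easy to see that a package cannot reach the archive without passing through approval.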
Expanding a Simple Workflow
• Support for multiple workflows.
  – Grouped into logical domains
• Definable roles per workflow
• Pluggable components for assembly and archival publishing
• Distributed components
  – Web-service based components
Domain Organization
• Producers are organized into domains; each domain contains a transfer agreement negotiated with the archive.
• Each domain contains a hierarchical organization of data grouped into record sets/templates (convenient groupings from the transfer agreement).
• Each domain contains its own users.
• An end-user operates within a set of record sets.
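One way to picture this organization is as a small data model. The sketch below is an assumption-laden illustration rather than PAWN's schema: the classes `Domain` and `RecordSet`, the `grant`/`mayUse` methods, and the rule that a grant does not extend to other record sets are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A record set (template) node; record sets nest to form the domain's hierarchy.
final class RecordSet {
    final String name;
    final List<RecordSet> children = new ArrayList<>();

    RecordSet(String name) { this.name = name; }

    // Creates a nested record set and returns it so callers can keep nesting.
    RecordSet addChild(String childName) {
        RecordSet child = new RecordSet(childName);
        children.add(child);
        return child;
    }
}

// Hypothetical domain: one transfer agreement, one hierarchy, its own users.
final class Domain {
    final String name;
    final String transferAgreement;  // identifier of the negotiated agreement
    final RecordSet root = new RecordSet("root");
    // Maps each user of this domain to the record sets that user may work in.
    private final Map<String, Set<RecordSet>> grants = new HashMap<>();

    Domain(String name, String transferAgreement) {
        this.name = name;
        this.transferAgreement = transferAgreement;
    }

    void grant(String user, RecordSet recordSet) {
        grants.computeIfAbsent(user, u -> new HashSet<>()).add(recordSet);
    }

    // True only if the user was granted exactly this record set.
    boolean mayUse(String user, RecordSet recordSet) {
        return grants.getOrDefault(user, Collections.emptySet()).contains(recordSet);
    }
}

final class DomainDemo {
    public static void main(String[] args) {
        Domain domain = new Domain("example-domain", "agreement-001");
        RecordSet publication = domain.root.addChild("Publication");
        RecordSet reports = publication.addChild("Technical Reports");
        domain.grant("alice", reports);
        System.out.println(domain.mayUse("alice", reports));
    }
}
```

Because users and grants live inside the `Domain` object, a user of one domain has no standing in another, which mirrors the statement that each domain contains its own users.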
Domain Example
Custom Roles
• Actions in PAWN can be grouped together to create roles.
  – There are no common roles between archives, so allow custom ones.
• Default roles
  – Producer: individual data supplier
  – Records Manager: oversight of producers
  – Archive Manager: final review and archive publishing
  – Global Administrator: creates domains, sysadmin-like account
• Sample Actions
  – Setting permissions on record sets
  – Record Schedule creation and modification
  – Add or delete whole packages
  – Modify items in a package
  – …
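Grouping actions into roles can be sketched as a named set of permitted actions. This is a minimal illustration, assuming hypothetical names throughout: the `Action` constants paraphrase the sample actions on the slide, and which actions each role receives is an assumption, since the slides do not say.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical action names paraphrasing the sample actions on the slide.
enum Action {
    SET_RECORD_SET_PERMISSIONS,
    EDIT_RECORD_SCHEDULE,
    ADD_PACKAGE,
    DELETE_PACKAGE,
    MODIFY_PACKAGE_ITEMS
}

// A role is just a named group of actions, so archives can define their own.
final class RoleRegistry {
    private final Map<String, EnumSet<Action>> roles = new HashMap<>();

    void define(String roleName, EnumSet<Action> actions) {
        // Copy defensively so later changes by the caller do not alter the role.
        roles.put(roleName, EnumSet.copyOf(actions));
    }

    boolean permits(String roleName, Action action) {
        EnumSet<Action> actions = roles.get(roleName);
        return actions != null && actions.contains(action);
    }
}

final class RoleDemo {
    public static void main(String[] args) {
        RoleRegistry registry = new RoleRegistry();
        // Assumed assignments: the slides name the roles but not their actions.
        registry.define("Producer",
                EnumSet.of(Action.ADD_PACKAGE, Action.MODIFY_PACKAGE_ITEMS));
        registry.define("Records Manager",
                EnumSet.of(Action.SET_RECORD_SET_PERMISSIONS, Action.EDIT_RECORD_SCHEDULE,
                        Action.DELETE_PACKAGE));
        System.out.println(registry.permits("Producer", Action.ADD_PACKAGE));
        System.out.println(registry.permits("Producer", Action.EDIT_RECORD_SCHEDULE));
    }
}
```

A custom role for a particular archive is then one more `define` call with whatever combination of actions that archive needs.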
Custom Package Building
• PAWN provides an API for developing custom package builders
• Custom package builders are written in Java and implement a simple interface.
• Builders interact with a hierarchical structured package
Package structure (from the slide diagram):
· Manifest: Namespace, Type, Descriptive Name
· Data: Type, Descriptive Name, Bits
· Metadata …
· Manifest …
· Metadata: Type, Bits, Name
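To make the idea of a builder interface concrete, here is a hedged sketch. PAWN's real builder API is not shown in the slides, so the interface `PackageBuilder`, its `build` method, and the classes `Manifest` and `DataItem` are hypothetical; only the field names (namespace, type, descriptive name, bits) come from the diagram.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// One data item: type, descriptive name and raw bits, as labeled in the diagram.
final class DataItem {
    final String type;
    final String descriptiveName;
    final byte[] bits;

    DataItem(String type, String descriptiveName, byte[] bits) {
        this.type = type;
        this.descriptiveName = descriptiveName;
        this.bits = bits.clone();
    }
}

// A manifest groups data items and may nest further manifests.
final class Manifest {
    final String namespace;
    final String type;
    final String descriptiveName;
    final List<DataItem> data = new ArrayList<>();
    final List<Manifest> children = new ArrayList<>();

    Manifest(String namespace, String type, String descriptiveName) {
        this.namespace = namespace;
        this.type = type;
        this.descriptiveName = descriptiveName;
    }
}

// Hypothetical builder contract: turn some source material into a package tree.
interface PackageBuilder {
    Manifest build(String packageName);
}

// Toy builder that wraps a single text note in a one-level package.
final class SingleNoteBuilder implements PackageBuilder {
    private final String note;

    SingleNoteBuilder(String note) { this.note = note; }

    @Override
    public Manifest build(String packageName) {
        Manifest root = new Manifest("urn:example", "package", packageName);
        root.data.add(new DataItem("text/plain", "note",
                note.getBytes(StandardCharsets.UTF_8)));
        return root;
    }
}

final class BuilderDemo {
    public static void main(String[] args) {
        Manifest m = new SingleNoteBuilder("hello").build("demo-package");
        System.out.println(m.descriptiveName + " holds " + m.data.size() + " item(s)");
    }
}
```

A builder for a richer source, such as a scanned book, would implement the same interface and populate nested manifests instead of a single item.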
PAWN Archive Gateway
• Pluggable component that provides an API for developing gateways into various services.
• Each gateway may have multiple instances, each configured differently
• PAWN manages gateways and associates them with the appropriate data.
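The gateway idea above (one pluggable interface, several differently configured instances, and a registry that ties each instance to the data it serves) might look roughly like the following. Every name here is an assumption made for illustration: `ArchiveGateway`, `InMemoryGateway`, `GatewayRegistry`, and the choice to key the association by record set name are not taken from PAWN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical gateway contract: publish a package and return where it landed.
interface ArchiveGateway {
    String publish(String packageId, byte[] content);
}

// Toy gateway that keeps content in memory under a configurable prefix.
final class InMemoryGateway implements ArchiveGateway {
    private final String prefix;
    final Map<String, byte[]> stored = new HashMap<>();

    InMemoryGateway(String prefix) { this.prefix = prefix; }

    @Override
    public String publish(String packageId, byte[] content) {
        String location = prefix + "/" + packageId;
        stored.put(location, content.clone());
        return location;
    }
}

// Associates each record set with the gateway instance that should receive it.
final class GatewayRegistry {
    private final Map<String, ArchiveGateway> byRecordSet = new HashMap<>();

    void associate(String recordSet, ArchiveGateway gateway) {
        byRecordSet.put(recordSet, gateway);
    }

    String publish(String recordSet, String packageId, byte[] content) {
        ArchiveGateway gateway = byRecordSet.get(recordSet);
        if (gateway == null) {
            throw new IllegalArgumentException("no gateway for record set " + recordSet);
        }
        return gateway.publish(packageId, content);
    }
}

final class GatewayDemo {
    public static void main(String[] args) {
        GatewayRegistry registry = new GatewayRegistry();
        // Two instances of the same gateway type, configured differently.
        registry.associate("Technical Reports", new InMemoryGateway("store-a"));
        registry.associate("Presentations", new InMemoryGateway("store-b"));
        System.out.println(registry.publish("Presentations", "pkg-1", new byte[] {1, 2, 3}));
    }
}
```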
PAWN Architecture
• Divided into producer and archive side components
  – Producer: data supplying and domain management
  – Archive: data storage, resource allocation and archival publishing
• Web-service based communication
• Trust relationship between producer and archive components
  – SAML and PKI
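The slides name SAML and PKI as the basis of trust but give no further detail. As an illustration of the PKI half only, the sketch below signs a package manifest on the producer side and verifies it on the archive side using Java's standard `java.security` API. The class `ManifestSigner`, the choice of RSA with SHA-256, and the idea of signing a manifest are assumptions, not a description of what PAWN does.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper; checked security exceptions are wrapped for brevity.
final class ManifestSigner {
    private static final String ALGORITHM = "SHA256withRSA";  // assumed choice

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Producer side: sign the manifest bytes with the producer's private key.
    static byte[] sign(byte[] manifest, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(key);
            signer.update(manifest);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Archive side: check the signature against the producer's public key.
    static boolean verify(byte[] manifest, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(key);
            verifier.update(manifest);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class SignerDemo {
    public static void main(String[] args) {
        KeyPair producerKeys = ManifestSigner.newRsaKeyPair();
        byte[] manifest = "package-1: report.pdf".getBytes(StandardCharsets.UTF_8);
        byte[] signature = ManifestSigner.sign(manifest, producerKeys.getPrivate());
        System.out.println(ManifestSigner.verify(manifest, signature, producerKeys.getPublic()));
    }
}
```

In a real deployment the archive would obtain the producer's public key from a certificate it trusts rather than from a freshly generated key pair.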
Components
Case Studies
• ICDL Book Builder
  – Custom package builder
  – Multiple data sources
  – Model logical books
• SLAC Record Ingestion
  – Sample NARA ingestion
  – Model government roles
  – DOE Record Schedule
• 10,000 CD-ROMs
  – Remote ingestion
  – Unskilled labor
  – Custom hardware
PAWN Summary
• Platform for ingestion
• Customizable Components
  – Roles, ingest and publishing
• Distributed architecture
More information
• Web site:
  – http://www.umiacs.umd.edu/research/adapt
• Wiki link for technical details.
• Or “I’m feeling lucky” Google keywords:
  – ADAPT UMIACS