Reference Model for an Open Archival Information System (OAIS) and Submission Agreements
NOAA DSA TIM
Donald Sawyer/NASA/GSFC
26-October 2005
Topics (time permitting)
• OAIS Reference Model
• Producer-Archive Interface Methodology Abstract Standard
• Submission Information Package (SIP) standardization (separate presentation)
OAIS Reference Model
• Consultative Committee for Space Data Systems (CCSDS)
• International group of space agencies
• Developed variety of science discipline-independent standards
• Became working body for ISO TC 20/SC 13 about 1990
– TC20: Aircraft and Space Vehicles
– SC13: Space Data and Information Transfer Systems
– Ensured broad participation, including traditional archives (not restricted to space communities; all participation was welcomed!)
What is a Reference Model?
• A framework
– for understanding significant relationships among the entities of some environment, and
– for the development of consistent standards or specifications supporting that environment.
• A reference model
– is based on a small number of unifying concepts
– is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
– may be used as a basis for education and explaining standards to a non-specialist.
Organizational Approach
• Organized US contribution under a framework with NASA lead
– Established liaison with Federal Geographic Data Committee (FGDC) and National Archives and Records Administration (NARA)
– Agency archives and users must be represented in this process
• An “Open” process
– Important to stimulate dialogue with broad archive/user communities
– Results of US and International workshops put on the Web
– Supported e-mail comments/critiques
Technical Approach: 1
• Investigate other Reference Models.
– ISO “Seven Layer” Communications Reference Model
– ISO Reference Model for Open Distributed Processing
– ISO TC211 Reference Model for Geomatics
• Define what is meant by ‘archiving of data’
• Break ‘archiving’ into a few functional areas (e.g., ingest, storage, access, and preservation planning)
Technical Approach: 2
• Define a set of interfaces between the functional areas
• Define a set of data classes for use in Archiving
• Choose formal specification techniques
– Data flow diagrams for functional models and interfaces
– Unified Modeling Language (UML) for data classes
Results: 1
• Reference Model targeted to several categories of reader
– Archive designers
– Archive users
– Archive managers, to clarify digital preservation issues and assist in securing appropriate resources
– Standards developers
• Adopted terminology that crosses various disciplines
– Traditional archivists
– Scientific data centers
– Digital libraries
Results: 2
• Widely adopted as starting point in digital preservation efforts
– Digital libraries (e.g., Netherlands National Library)
– Traditional archives (e.g., US National Archives)
– Scientific data centers (e.g., National Space Science Data Center)
– Commercial Organizations (e.g., Aerospace Industries Association preservation working team)
• Published as final CCSDS standard (Blue Book) available from: http://www.ccsds.org/documents/650x0b1.pdf
• Published as a final ISO standard: ISO 14721:2003
Purpose and Scope: 1
• Framework for understanding and applying concepts needed for long-term digital information preservation
– Long-term is long enough to be concerned about changing technologies
• Also can be starting point for model addressing non-digital information
Purpose and Scope: 2
• Provides set of minimal responsibilities to distinguish an OAIS from other uses of ‘archive’
• Framework for comparing architectures and operations of existing and future archives
Purpose and Scope: 3
• Basis for development of additional related standards
• Addresses a full range of archival functions
– Ingest, Archival Storage, Data Management, Access, Preservation Planning, Administration
Applicability
• Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation
• Does NOT specify an implementation
Conformance
• How does an archive conform?
– It discharges the set of minimal responsibilities
– It supports the basic information concepts that address a definition of information and types of information packages
• How do other documents conform?
– By using OAIS terms and concepts
Who wants to conform to OAIS?
• All organizations that need to preserve digital information for extended periods
– To demonstrate a level of awareness of digital preservation needs
• Other standards and documents
– For effective communication and integration
Open Archival Information System (OAIS)
• Information
– Any type of knowledge that can be exchanged
– Data are the representation forms of information
• Archival Information System
– Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
View of an OAIS Environment
[Figure: the OAIS (archive) and its environment: Producer, Management, and Consumer]
• Producer provides the information to be preserved
• Management sets overall OAIS policy
• Consumer seeks and acquires preserved information of interest
OAIS Responsibilities: 1
• Negotiates and accepts information from information producers
• Obtains sufficient control to ensure long-term preservation
• Determines which communities (designated) need to be able to understand the preserved information
OAIS Responsibilities: 2
• Ensures the information to be preserved is independently understandable to the Designated Communities
• Follows documented policies and procedures that ensure the information is preserved against all reasonable contingencies
• Makes the preserved information available to the Designated Communities in forms understandable to those communities
OAIS Information Definition
• Information is always expressed (i.e., represented) by some type of data
• Data interpreted using its Representation Information yields Information
[Figure: Data Object, interpreted using its Representation Information, yields Information Object]
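The definition above can be illustrated with a minimal sketch, assuming a Data Object that is nothing more than eight bytes holding two packed numbers. The class `RepresentationInformation`, the function `interpret`, and the field names are hypothetical and are not taken from the OAIS Reference Model; they only show that the bytes become information once the layout and meaning are supplied.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RepresentationInformation:
    """Hypothetical description of how to read a Data Object's bits."""
    struct_format: str            # byte layout, in `struct` notation
    field_names: Tuple[str, ...]  # meaning of each decoded value
    units: str                    # semantics the bits themselves do not carry


def interpret(data_object: bytes, rep: RepresentationInformation) -> Dict[str, float]:
    """Data Object interpreted using Representation Information yields information."""
    values = struct.unpack(rep.struct_format, data_object)
    return dict(zip(rep.field_names, values))


# Illustrative Data Object: eight opaque bytes.
data_object = struct.pack("<2f", 21.5, 19.0)

# Illustrative Representation Information (all values are assumptions).
rep_info = RepresentationInformation(
    struct_format="<2f",  # two little-endian 32-bit floats
    field_names=("max_temp", "min_temp"),
    units="degrees Celsius",
)

information_object = interpret(data_object, rep_info)
print(information_object, rep_info.units)
```

Without `rep_info`, the same eight bytes could be read as two integers or eight characters, which is why the Representation Information has to be preserved along with the data.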
Information Package Definition
• An Information Package is a conceptual container holding two types of information
– Content Information
– Preservation Description Information (PDI)
[Figure: Information Package containing Content Information and Preservation Description Information]
Content Information
• The information that is the original target of preservation
• Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
• The Content Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., microfilm, a physical sample)
Preservation Description Information (PDI): 1
• Reference Information
– Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
• Provenance Information
– Describes the source of Content Information, who has had custody of it, what is its history
Preservation Description Information (PDI): 2
• Context Information
– Describes how the Content Information relates to other information outside the Information Package
• Fixity Information
– Protects the Content Information from undocumented alteration
Examples of PDI
• Reference
– Bibliographic description; Persistent Ids
• Provenance
– Metadata on preservation process
• Context
– Pointers to related collections
• Fixity
– Digital signatures, checksums
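The four PDI categories and the checksum example can be sketched as a small record. This is an illustration under stated assumptions, not a format defined by OAIS: the class `PreservationDescriptionInformation`, its field names, the choice of SHA-256, and the identifier string are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PreservationDescriptionInformation:
    """Hypothetical container for the four PDI categories."""
    reference: str               # e.g., a persistent identifier
    provenance: Tuple[str, ...]  # e.g., custody and processing history
    context: Tuple[str, ...]     # e.g., pointers to related collections
    fixity_sha256: str           # e.g., a checksum over the content


def sha256_hex(content: bytes) -> str:
    """Checksum used here as the Fixity Information (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def fixity_ok(content: bytes, pdi: PreservationDescriptionInformation) -> bool:
    """True when the content still matches the recorded checksum."""
    return sha256_hex(content) == pdi.fixity_sha256


content = b"sea surface temperature grid, version 1"
pdi = PreservationDescriptionInformation(
    reference="example-id:0001",  # made-up identifier
    provenance=("received from producer", "format checked at ingest"),
    context=("related collection: example-collection",),
    fixity_sha256=sha256_hex(content),
)

print(fixity_ok(content, pdi))         # unchanged content
print(fixity_ok(content + b"!", pdi))  # altered content
```

A checksum only detects an undocumented alteration; keeping the content safe still depends on the archive's storage and procedures.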
Information Package Variants
• Submission Information Package (SIP)
– Negotiated between Producer and OAIS
– Sent to OAIS by a Producer
• Archival Information Package (AIP)
– Information Package used for preservation
– Holds complete set of Preservation Description Information for the Content Information
• Dissemination Information Package (DIP)
– Includes part or all of one or more Archival Information Packages
– Sent to a Consumer by the OAIS
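The relationship between the three variants can be sketched as follows. This is a simplified illustration, assuming that the archive supplies whatever PDI the Producer did not; the names `InformationPackage`, `ingest`, `disseminate`, and `PDI_CATEGORIES` are hypothetical and the model does not prescribe any such code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PDI_CATEGORIES = ("reference", "provenance", "context", "fixity")


@dataclass
class InformationPackage:
    kind: str                                    # "SIP", "AIP", or "DIP"
    content: List[bytes]                         # Content Information (simplified)
    pdi: Dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage, archive_pdi: Dict[str, str]) -> InformationPackage:
    """Turn a SIP into an AIP; the AIP must hold a complete set of PDI."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    pdi = {**sip.pdi, **archive_pdi}
    missing = [c for c in PDI_CATEGORIES if c not in pdi]
    if missing:
        raise ValueError(f"PDI incomplete, missing: {missing}")
    return InformationPackage("AIP", list(sip.content), pdi)


def disseminate(aips: List[InformationPackage]) -> InformationPackage:
    """Build a DIP from one or more AIPs (here: all content of each AIP)."""
    if any(a.kind != "AIP" for a in aips):
        raise ValueError("disseminate expects AIPs")
    content = [item for a in aips for item in a.content]
    references = "; ".join(a.pdi["reference"] for a in aips)
    return InformationPackage("DIP", content, {"reference": references})


# Illustrative values only.
sip = InformationPackage("SIP", [b"observations-2005 bytes"],
                         {"provenance": "created by the Producer"})
aip = ingest(sip, {"reference": "aip-0001",
                   "context": "part of an example collection",
                   "fixity": "checksum recorded at ingest"})
dip = disseminate([aip])
print(aip.kind, sorted(aip.pdi), dip.kind)
```

The check in `ingest` mirrors the one property the slide states for an AIP: it holds the complete set of PDI, whereas a SIP or DIP may carry less.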
External Data Flow View
[Figure: the Producer sends Submission Information Packages to the OAIS, which holds Archival Information Packages; the Consumer sends queries and orders to the OAIS and receives result sets and Dissemination Information Packages]
OAIS Archival Information Package
• Archival Information Package (AIP)
• Content Information, further described by the Preservation Description Information (PDI)
– e.g., hardcopy document
– e.g., document as an electronic file together with its format description
– e.g., scientific data set consisting of image file, text file, and format descriptions file describing the other files
• Preservation Description Information (PDI)
– e.g., how the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured
• Packaging Information (the AIP is delimited by it)
– e.g., how to find Content Information and PDI on some medium
• Package Description (derived from the AIP)
– e.g., information supporting customer searches for AIP
Packaging Information
• Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
• Examples of typical Packaging Information include tar files, directory structures, filenames, and tape marks
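The tar file and directory structure examples can be made concrete with a short sketch that binds Content Information and PDI into one archive. The member paths `content/data.bin` and `pdi/pdi.json` and the helper names are assumptions for illustration; OAIS does not prescribe a layout.

```python
import io
import json
import tarfile
from typing import Dict, List


def build_package(content: bytes, pdi: Dict[str, str]) -> bytes:
    """Bind content and PDI into one tar archive held in memory."""
    members = (
        ("content/data.bin", content),                                  # hypothetical path
        ("pdi/pdi.json", json.dumps(pdi, sort_keys=True).encode("utf-8")),  # hypothetical path
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_members(package: bytes) -> List[str]:
    """Use the Packaging Information (tar structure, paths) to locate the parts."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r") as tar:
        return tar.getnames()


package = build_package(b"\x00\x01\x02", {"reference": "example-id:0001"})
print(list_members(package))
```

Here the tar container and the two path names play the role of Packaging Information: they are what lets a reader find the Content Information and the PDI inside the single byte stream.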
Package Description
• Contains the data that serves as the input to documents or applications called Access Aids.
• Access Aids can be used by a Consumer to locate, analyze, retrieve, or order information from the OAIS.
Functional Entities: 1
• Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive
• Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages
• Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information that identifies and documents archive holdings and internal archive administrative data.
Functional Entities: 2
• Administration: This entity manages the overall operation of the archive system
• Preservation Planning: This entity monitors the environment of the OAIS and provides recommendations to ensure that the information stored in the OAIS remains accessible to the Designated Community over the long term.
• Access: This entity supports Consumers in determining the existence, description, location and availability of information stored in the OAIS and allows Consumers to request and receive information products
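OAIS Functional Entities
[Figure: OAIS functional entities. The PRODUCER sends SIPs to Ingest; Ingest passes AIPs to Archival Storage and Descriptive Info. to Data Management; Access draws on Descriptive Info. from Data Management and AIPs from Archival Storage, answers the CONSUMER's queries with result sets, and fills orders with DIPs. Administration and Preservation Planning complete the set of entities; MANAGEMENT is shown alongside the OAIS.]
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package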
Submission Agreement
• Negotiated between Producer and Archive
• Identifies the SIPs to be submitted by the Producer
• May include mandatory requirements
• Not further expanded in the OAIS Reference Model
Reference Model Summary
• Reference model is applicable to all digital archives, and their Producers and Consumers
• Establishes common terms and concepts for comparing implementations, but does not specify an implementation
• Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
• Provides detailed models of both archival functions and archival information
• Also discusses OAIS information migration and interoperability among OAISs
Producer-Archive Interface Methodology Abstract Standard (PAIMAS)
C. Huc/CNES, D. Boucon/CNES-SILOGIC, D.M. Sawyer/NASA/GSFC, J.G. Garrett/NASA-Raytheon
NOAA DSA TIM
Why a new standard? Needs for standardization: problems
• The relations between archives and data Producers are rarely simple and easy:
– nonconformity of received data
– unclear and imprecise definition of the data to be delivered
– failure to meet delivery schedule
– late detection of errors in archived data
– non-management of modifications
==> Can be detrimental to archived information quality and the cost of the operation.
• Ever increasing diversity of the producers
• Data complexity
• Each project develops its own methodology on the basis of a process that is roughly the same from one project to another
==> Work duplicated, no generality, excessively high costs, etc.
Methodology: Context
[Figure: OAIS functional entities diagram (PRODUCER, Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning, CONSUMER, MANAGEMENT) marked with the PAIMAS Focus]
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package
Methodology: Description
• The archive project is broken down into 4 main phases:
– Preliminary Phase
– Formal Definition Phase
– Transfer Phase
– Validation Phase
• Each phase has extensive action tables.
• Specialization for a community.
Methodology: The phases: relationships
• Preliminary Phase
– Phase objective: define the information to be archived (identification and preliminary [...], resources estimation)
– Result: Preliminary Agreement
• Formal Definition Phase
– Phase objective: develop and negotiate the Submission Agreement (data to be delivered, complementary elements, schedule)
– Results: Dictionary, Formal model, Submission Agreement
• Transfer Phase
– Phase objective: actual transfer of the objects
– Result: Transferred object files
• Validation Phase
– Phase objective: validate the transferred objects
– Results: Validation agreement, Data ready to archive, Anomalies
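The four phases and their results can be written down as a small lookup, which is sometimes handy for tracking where a Producer-Archive project stands. The sketch below is only an illustration: the names `Phase`, `RESULTS`, and `next_phase` are hypothetical, and the rule that anomalies found during validation send the project back to the Transfer Phase is an assumption made for the example, not a statement of what PAIMAS requires.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Phase(Enum):
    PRELIMINARY = 1
    FORMAL_DEFINITION = 2
    TRANSFER = 3
    VALIDATION = 4


# Results of each phase, as named on the slide.
RESULTS: Dict[Phase, Tuple[str, ...]] = {
    Phase.PRELIMINARY: ("Preliminary Agreement",),
    Phase.FORMAL_DEFINITION: ("Dictionary", "Formal model", "Submission Agreement"),
    Phase.TRANSFER: ("Transferred object files",),
    Phase.VALIDATION: ("Validation agreement", "Data ready to archive"),
}


def next_phase(current: Phase, anomalies_found: bool = False) -> Optional[Phase]:
    """Return the following phase, or None when the project is complete.

    Assumption: anomalies detected in validation lead back to the Transfer Phase.
    """
    if current is Phase.VALIDATION:
        return Phase.TRANSFER if anomalies_found else None
    return Phase(current.value + 1)


phase: Optional[Phase] = Phase.PRELIMINARY
while phase is not None:
    print(phase.name, "->", ", ".join(RESULTS[phase]))
    phase = next_phase(phase)
```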
Methodology: Preliminary phase: context
[Figure: Producer and Archive carry out the Preliminary Phase, which results in the Preliminary Agreement]
Methodology: Preliminary phase: sub-phases
• First contact
• Preliminary definition, feasibility and assessment
– Description: Information to be archived, Quantification, Legal and contractual aspects, permanent impact on the Archive, Summary of costs, etc.
• Establishment of a preliminary agreement
Action table
Id | Preliminary phase: quantification | Involves
P-19 | Estimate the data volume to be transmitted to the Archive | Producer
P-20 | Assess the permanent data volume to store | Archive
P-21 | Assess the storage capability need for the ingest process | Archive
P-22 | Assess the associated costs | Archive
Methodology: Formal Definition Phase: context
[Figure: the Formal Definition Phase starts from the Preliminary Agreement and produces the Dictionary, Data Model, and Submission Agreement]
Methodology: Formal Definition Phase: sub-phases and action table
• Organization of the Formal Definition Phase
• Formal definition
• Submission Agreement
Id | Formal Definition Phase: Submission Agreement | Involves
F-36 | Draw up the Submission Agreement | Producer and/or Archive
• information to be transferred (e.g., SIP contents, SIP packaging, data models, Designated Community, legal and contractual aspects);
• transfer definition (e.g., specification of the Data Submission Sessions);
• validation definition;
• change management (e.g., conditions for modification of the agreement, for breaking the agreement);
• schedule (submission timetable).
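The five topics listed for action F-36 can serve as a completeness checklist while the agreement is being drafted. A minimal sketch, assuming each topic is captured as free text; the class `SubmissionAgreement`, its field names, and `missing_sections` are hypothetical and do not come from PAIMAS.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SubmissionAgreement:
    """Hypothetical record with one free-text field per topic of action F-36."""
    information_to_be_transferred: str = ""  # SIP contents, packaging, data models, ...
    transfer_definition: str = ""            # e.g., Data Submission Sessions
    validation_definition: str = ""
    change_management: str = ""              # conditions for modifying or breaking the agreement
    schedule: str = ""                       # submission timetable


def missing_sections(agreement: SubmissionAgreement) -> List[str]:
    """Names of the topics that have not been filled in yet."""
    return [f.name for f in fields(agreement)
            if not getattr(agreement, f.name).strip()]


# Illustrative draft with only two topics settled.
draft = SubmissionAgreement(
    information_to_be_transferred="SIP contents and packaging as negotiated",
    schedule="one Data Submission Session per month",
)
print(missing_sections(draft))
```

Keeping the topics as explicit fields makes it visible when a draft agreement has left, for example, change management undefined.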
Methodology: Transfer Phase
• Actual transfer of the objects:
– carry out the transfer test
– manage the transfer
[Figure: Transfer Phase, with inputs Data, Model of object files to deliver, and Schedule, and output Transferred object files]
Methodology: Validation Phase
• Validate the transferred objects:
– carry out the validation test
– manage the validation
[Figure: Validation Phase, with input Transferred object files, outputs Data ready to archive and Anomalies, and a Validation acknowledgement to the Producer]
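One possible validation test is to compare each transferred object against a checksum agreed in advance, sorting the results into data ready to archive and anomalies. This is a sketch under that assumption only: PAIMAS leaves the validation definition to the Submission Agreement, and the function `validate_transfer`, the use of SHA-256, and the file names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def validate_transfer(expected_sha256: Dict[str, str],
                      received: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Split transferred object files into (ready to archive, anomalies)."""
    ready: List[str] = []
    anomalies: List[str] = []
    for name in sorted(expected_sha256):
        payload = received.get(name)
        if payload is None:
            anomalies.append(f"{name}: missing")
        elif hashlib.sha256(payload).hexdigest() != expected_sha256[name]:
            anomalies.append(f"{name}: checksum mismatch")
        else:
            ready.append(name)
    # Objects that arrived but were never agreed on are also reported.
    for name in sorted(set(received) - set(expected_sha256)):
        anomalies.append(f"{name}: not listed in agreement")
    return ready, anomalies


# Illustrative transfer: one good file, one corrupted, one missing, one unexpected.
expected = {
    "a.dat": hashlib.sha256(b"alpha").hexdigest(),
    "b.dat": hashlib.sha256(b"beta").hexdigest(),
    "c.dat": hashlib.sha256(b"gamma").hexdigest(),
}
received = {"a.dat": b"alpha", "b.dat": b"bet4", "x.dat": b"extra"}

ready, anomalies = validate_transfer(expected, received)
print(ready)
print(anomalies)
```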
Specialization
• Adapt the generic standard to a particular community (which can range from an international organization to a simple archive service)
• Steps involved to define a community standard
– terminology,
– data dictionary and information model,
– standards,
– common tools.
• Analyze each action of the generic standard (add and delete actions as appropriate)
Conclusion
• PAIMAS identifies:
– the phases in the process of transferring information,
– the objective of the phases,
– the actions that must be carried out,
– the expected results.
• PAIMAS is a basis:
– for further specialization by a particular community,
– for the identification of standards and implementation guides,
– for identification and development of a set of software tools.
PAIMAS Status
• PAIMAS approved as final Consultative Committee for Space Data Systems standard
– http://public.ccsds.org/publications/archive/651x0b1.pdf
• PAIMAS is undergoing ISO review as a final ISO standard
– Expect approval this Fall, 2005