BNSC Report Fall 2007 David Giaretta. CASPAR Consortium Integrated project Total spend 16MEuro.
BNSC Report Fall 2007
David Giaretta
CASPAR Consortium
http://www.casparpreserves.eu
Integrated project
Total spend 16MEuro
…CASPAR
• Strongly based on OAIS
• Passed 1st year EU review
CASPAR Aims
• Produce tools and techniques to support digital preservation and make it easier to share the cost
– must be relatively easy to use
– must have a low “buy-in” in terms of effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards
• Cannot do everything
– Working closely with other projects
Validation
• How can we judge any proposed solution?
• CASPAR validation metrics:
– Theoretic underpinning
– Testbed scenarios addressing real issues
• No “hand-waving” – use what is there now
• Accelerated lifetime tests
– Hardware and Software
– Environment
– People
– Improved “trustability”/“certifiability”
Live a long time
Evidence - not proof
CASPAR information flow architecture
[Diagram. Labels: Rep Info, Virtualisation, RegRep, Data, Curator, RepInfo toolkit, Repository, Gap Manager, Orchestration, Application, User, Data Source]
INFRASTRUCTURE ELEMENTS
Preservation Aware Storage and Preservation DataStores
• Preservation Aware Storage - The storage component of a digital preservation system that has built-in support for both bit preservation and logical preservation.
• Preservation DataStores (PDS) is a new OAIS-based preservation-aware storage. It offloads functionality to the storage layer
– Decrease the probability of data loss
– Simplify the applications
– Provide improved performance and robustness
– Utilize locality properties
• Compute data intensive functions internally e.g. fixity
• Provide better support for links among objects
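A fixity computation of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the PDS implementation: the slides do not name a digest algorithm, so SHA-256 is an assumption, and the names `FixitySketch`, `digestHex` and `verify` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only: not the PDS implementation.
final class FixitySketch {

    // Compute a SHA-256 digest (assumed algorithm) and return it as lowercase hex.
    static String digestHex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(data);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Fixity validation: recompute the digest and compare it with the recorded value.
    static boolean verify(byte[] data, String recordedHex) {
        return digestHex(data).equalsIgnoreCase(recordedHex);
    }

    public static void main(String[] args) {
        byte[] content = "archived bytes".getBytes(StandardCharsets.UTF_8);
        String recorded = digestHex(content);           // value recorded at ingest
        System.out.println(verify(content, recorded));  // true while the bits are intact
        content[0] ^= 1;                                // simulate a flipped bit
        System.out.println(verify(content, recorded));  // false after corruption
    }
}
```

Running such a check inside the storage component, rather than in the application, is what lets it avoid shipping the raw data across the network just to hash it.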
Preservation Aware Storage Functionality
Functionality: Physically co-locate the Information Object (AIP). However, this is relaxed if the AIP data already resides in an existing archive
Rationale: Ensure metadata is never lost when raw data survives
Functionality: Execute data intensive functions at the storage component:
– fixity computations and validation
– data transformation
Rationale: Utilize the data locality property; lessen data transfers to applications
Functionality: Handle technical provenance events internally, e.g. migration and copy occur at the storage
Rationale: Simplify applications
Functionality: Support the loading and execution of external transformations
Rationale: Ideally performed during bit-migration, performed close to data
Preservation Aware Storage Functionality (Cont.)
Functionality / Rationale
Maintain referential integrity
Update links during migration
Ideally done during migration
Ensure readability of the data by a different system in the future.
Support global self-described formats
Interaction with backend storage
Support media migration
Load and execute transformations
Portable export format
Interaction with backend storage
Support a graceful loss of data
Self-describing self-contained media format
Minimize the effect of media loss/corruption
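One way to picture "update links during migration" is a storage-side pass that rewrites every stored link whose target has received a new identifier. The sketch below rests on simplifying assumptions: identifiers are plain strings, links live in an in-memory map, and the names `LinkMigrationSketch` and `updateLinks` are hypothetical rather than taken from PDS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: identifiers are plain strings here.
final class LinkMigrationSketch {

    // links:    object id -> ids of the objects it refers to.
    // migrated: old id -> new id for every object moved during migration.
    // Returns a new link table in which both sources and targets use the new ids.
    static Map<String, List<String>> updateLinks(Map<String, List<String>> links,
                                                 Map<String, String> migrated) {
        Map<String, List<String>> updated = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : links.entrySet()) {
            String source = migrated.getOrDefault(entry.getKey(), entry.getKey());
            List<String> targets = new ArrayList<>();
            for (String target : entry.getValue()) {
                targets.add(migrated.getOrDefault(target, target));
            }
            updated.put(source, targets);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("aip-1", List.of("aip-2", "aip-3"));
        Map<String, String> migrated = Map.of("aip-2", "aip-2-v2");
        System.out.println(updateLinks(links, migrated));
    }
}
```

Doing this rewrite as part of the migration itself means no link is ever left pointing at an identifier that no longer exists.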
PDS Architecture
[Diagram. Labels: Applications (Ingest, Access, Administration, …); Preservation Web Services; Preservation DataStore backend; Preservation Engine Layer; XAM Layer; Object/File Layer; AIP]
• Layered approach
• Prototype based on open standards
– OAIS, XAM, OSD
• Generic gradual mapping from logical to physical object
• Independent of physical storage
• Independent of stored data type
• Scalable
PDS Architecture
[Diagram. Labels: Applications (Ingest, Access, Administration, …); Preservation WSDL; Preservation Web Services; WAS CE; Security; Admin web service; Preservation Engine; RepInfo Mgr; Placement Mgr; Migration Mgr; PDI Mgr; XAM API; XAM Library; VIM API; XAM to OSD; XAM to FS; sockets; posix I/O; HL OSD + Object Store; HL OSD; File System; AIP; Preservation DataStore backend; Preservation Engine Layer; XAM Layer; Object Layer]
Preservation DataStores
• Preservation DataStores are OAIS-based preservation aware storage
• API covers different options for ingest and access, configures policies and enables updates of AIPs and PDS code
• Prototype implements mainly ingest and access using web services
• References
– “Towards OAIS-Based Preservation Aware Storage - A White Paper”.
• http://www.haifa.il.ibm.com/projects/storage/datastores/public.html
– “The Need for Preservation Aware Storage - A Position Paper”.
• ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems, Volume 41, Issue 1 (Jan 2007), pp 19-23.
– “Preservation DataStores: Architecture for Preservation Aware Storage”, to appear in 24th IEEE Conference on Mass Storage Systems and Technologies (MSST), 2007.
– Web site - http://www.haifa.il.ibm.com/projects/storage/datastores/index.html
Virtualisation - building up data types…
[Diagram. Labels: Data Value, Vector, Image, Earth Observation image, Astronomical image, Spectrum, Time Series, 3-D data]
Content dependent components
• Representation Information tools
– Structure
• EAST
• DRB
• DFDL
• Virtualisation assistant
– Semantics
• RDF editors
• RDFSuite
• Terminology capture
– Software
• UVC
• Hardware emulators
• Trust, Authenticity & Provenance tools
– Certification assistant
– PREMIS
• Packaging tools
– XFDU toolkit
Use existing tools where applicable
Develop new tools as needed and resources allow
Strawman Architecture…
…CASPAR Architecture Overview
CASPAR meets OAIS - 2
OAIS Information Model and CASPAR API
OAIS Information Model
Capture in UML diagrams
1. Add “obvious” methods
• get/set for sub-components e.g. we know AIP has PDI so need get/setPDI
2. Add “best guess” methods
• Iterators over contents
• May need to change
class Identifier Taxonomy
java.lang.Comparable
«interface»
Identifier
+ getLocators() : Collection<Locator>
+ setLocator(Collection<Locator>) : void
DataObject
«interface»
PhysicalObjectLocator
«interface»
InfoObjectIdentifier
«interface»
PersistentIdentifier
«interface»
CurationPersistentIdentifier
«interface»
Locator
+ getIdValue() : String
+ getResolver() : String
+ setIdValue(String) : void
+ setResolver(String) : void
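The Identifier and Locator signatures above can be read as plain Java. The following is a simplified sketch, not the CASPAR API: the classes `SimpleIdentifier` and `Locator` are hypothetical stand-ins, ordering identifiers by a string name is an assumption (the diagram only shows that Identifier is Comparable), and the resolver URL is a placeholder.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified stand-ins for the Identifier and Locator interfaces.
final class IdentifierSketch {

    // A Locator pairs an identifier value with the resolver that understands it.
    static final class Locator {
        private String idValue;
        private String resolver;

        Locator(String idValue, String resolver) {
            this.idValue = idValue;
            this.resolver = resolver;
        }

        String getIdValue() { return idValue; }
        String getResolver() { return resolver; }
        void setIdValue(String idValue) { this.idValue = idValue; }
        void setResolver(String resolver) { this.resolver = resolver; }
    }

    // Assumption: identifiers are ordered by a plain string name.
    static final class SimpleIdentifier implements Comparable<SimpleIdentifier> {
        private final String name;
        private final List<Locator> locators = new ArrayList<>();

        SimpleIdentifier(String name) { this.name = name; }

        String getName() { return name; }
        Collection<Locator> getLocators() { return locators; }

        @Override
        public int compareTo(SimpleIdentifier other) { return name.compareTo(other.name); }
    }

    public static void main(String[] args) {
        SimpleIdentifier b = new SimpleIdentifier("id-b");
        SimpleIdentifier a = new SimpleIdentifier("id-a");
        a.getLocators().add(new Locator("id-a", "http://resolver.example"));
        TreeSet<SimpleIdentifier> sorted = new TreeSet<>(List.of(b, a));
        System.out.println(sorted.first().getName());
    }
}
```

Being Comparable is what allows identifiers to be kept in sorted collections such as the `TreeSet` used in `main`.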
class Representation Information
ContentInformation
«interface»
RepresentationInformation
+ getClassificationConcepts() : Collection<ClassificationConcept>
+ getLatestVersion() : CurationPersistentIdentifier
+ getStatus() : String
+ setClassificationConcepts(Collection<ClassificationConcept>) : void
«interface»
SemanticRepInfo
«interface»
StructureRepInfo
«interface»
OtherRepresentationInformation
«interface»
RepresentationRenderingSoftware
«interface»
AccessSoftware
«interface»
RepInfoLabel
+ getDOM() : org.w3c.dom.Document
+ setDOM(org.w3c.dom.Document) : void
XXX Have made RepresentationInformation extend InformationPackage
«interface»
ClassificationConcept
+ getConceptPath() : List<Concept>
+ setConceptPath(List<Concept>) : void
«interface»
Concept
+ getDescription() : String
+ getName() : String
+ setDescription(String) : void
+ setName(String) : void
Interpreted using
class Information Package Contents
java.lang.Comparable
«interface»
InformationPackage
+ getContentInformation() : ContentInformation
+ getPackageDescription() : PackageDescription
+ getPackagingInformation() : PackagingInformation
+ getPDI() : PreservationDescriptionInformation
+ getVersion() : Version
+ setContentInformation(ContentInformation) : void
+ setPackageDescription(PackageDescription) : void
+ setPackagingInformation(PackagingInformation) : void
+ setPDI(PreservationDescriptionInformation) : void
InformationObject
«interface»
ContentInformation
InformationObject
«interface»
PreservationDescriptionInformation
+ getContextInformation() : ContextInformation
+ getFixityInformation() : FixityInformation
+ getProvenanceInformation() : ProvenanceInformation
+ getReferenceInformation() : ReferenceInformation
InformationObject
«interface»
PackagingInformation
«interface»
PackageDescription
ISSUE: Versions
Identifiers point to specific versions. This may cause an issue with Provenance and handing an AIP from one OAIS to another - if the Provenance changes then so does the version (and therefore the identifier) associated with that AIP.
[Diagram association labels: delimited by; described by; further described by; identifies; derived from. Multiplicities shown: 1, *, 0..1]
class Archival Information Package Contents
InformationPackage
«interface»
ArchivalInformationPackage
+ isValid() : boolean
InformationObject
«interface»
PreservationDescriptionInformation
+ getContextInformation() : ContextInformation
+ getFixityInformation() : FixityInformation
+ getProvenanceInformation() : ProvenanceInformation
+ getReferenceInformation() : ReferenceInformation
Note that an AIP must have some PDI. A general Information Package is not required to have any PDI.
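The constraint in the note above can be expressed through `isValid()`. The sketch below is a simplified illustration: the nested interfaces are cut-down stand-ins for those in the diagram, `SimpleAip` is hypothetical, and treating "valid" as meaning only "some PDI is present" is an assumption, since the slides do not say what `isValid()` checks.

```java
// Hypothetical, simplified stand-ins for the interfaces in the diagram.
final class AipValiditySketch {

    interface PreservationDescriptionInformation { }

    // A general Information Package may or may not carry PDI.
    interface InformationPackage {
        PreservationDescriptionInformation getPDI();
        void setPDI(PreservationDescriptionInformation pdi);
    }

    // An AIP adds a validity check on top of the general package.
    interface ArchivalInformationPackage extends InformationPackage {
        boolean isValid();
    }

    static final class SimpleAip implements ArchivalInformationPackage {
        private PreservationDescriptionInformation pdi;

        @Override public PreservationDescriptionInformation getPDI() { return pdi; }
        @Override public void setPDI(PreservationDescriptionInformation pdi) { this.pdi = pdi; }

        // Assumption: validity here means only that some PDI is present.
        @Override public boolean isValid() { return pdi != null; }
    }

    public static void main(String[] args) {
        SimpleAip aip = new SimpleAip();
        System.out.println(aip.isValid()); // false: no PDI yet
        aip.setPDI(new PreservationDescriptionInformation() { });
        System.out.println(aip.isValid()); // true: PDI attached
    }
}
```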
class Information Object Contents
«interface»
InformationObject
+ getDataObject() : DataObject
+ getRepresentationInformation() : RepresentationInformation
+ setDataObject(DataObject) : void
+ setRepresentationInformation(RepresentationInformation) : void
«interface»
DataObject
ContentInformation
«interface»
RepresentationInformation
+ getClassificationConcepts() : Collection<ClassificationConcept>
+ getLatestVersion() : CurationPersistentIdentifier
+ getStatus() : String
+ setClassificationConcepts(Collection<ClassificationConcept>) : void
«interface»
DigitalObject
+ getDataResource() : DataResource
+ getInformationsObjects() : Collection<InformationObject>
+ setDataResource(DataResource) : void
+ setInformationObjects(Collection<InformationObject>) : void
«interface»
BitSequence
Identifier
«interface»
PhysicalObjectLocator
[Diagram association label: Interpreted using. Multiplicities shown: 1..*, 1, *]
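The "Interpreted using" relation is recursive in OAIS: the Representation Information needed to interpret a Data Object is itself an Information Object, which may need further Representation Information. A minimal sketch of walking that network follows. Every name in it (`RepInfoChainSketch`, `InfoObject`, `requiredRepInfo`) is hypothetical, and the model is collapsed to a single class for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: every name below is a stand-in, not the CASPAR API.
final class RepInfoChainSketch {

    // An object together with the Representation Information used to interpret it.
    // That Representation Information is modelled with the same class, so the
    // relation is recursive.
    static final class InfoObject {
        final String name;
        final List<InfoObject> interpretedUsing = new ArrayList<>();

        InfoObject(String name) { this.name = name; }
    }

    // Collect the names of everything needed to interpret 'root', depth-first,
    // visiting each object once so that cyclic networks terminate.
    static List<String> requiredRepInfo(InfoObject root) {
        List<String> order = new ArrayList<>();
        collect(root, new HashSet<>(), order);
        return order;
    }

    private static void collect(InfoObject node, Set<InfoObject> seen, List<String> order) {
        for (InfoObject repInfo : node.interpretedUsing) {
            if (seen.add(repInfo)) {
                order.add(repInfo.name);
                collect(repInfo, seen, order);
            }
        }
    }

    public static void main(String[] args) {
        InfoObject image = new InfoObject("image data");
        InfoObject format = new InfoObject("format description");
        InfoObject dictionary = new InfoObject("data dictionary");
        image.interpretedUsing.add(format);
        format.interpretedUsing.add(dictionary);
        System.out.println(requiredRepInfo(image));
    }
}
```

The visited set matters because pieces of Representation Information can refer to one another; without it the traversal would not terminate on a cycle.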
Summary
• The Conceptual Model is based on OAIS and works out some implications
• It suggests areas of Research
– Intelligibility
– Structure
• Virtualisation
– Authenticity
• It leads into the Architecture which is
– Broadly applicable
– Useful not just for Preservation but also interoperability
• Note - Registry/Repository of Representation Information
– http://registry.casparpreserves.eu
– http://registry.dcc.ac.uk
Digital Curation Centre
• DCC Development closely linked to CASPAR
• Other linked JISC funded projects:
– SCARP
– Significant properties of software
– …may be others
Audit and Certification
The need for Trustable Repositories
• Task Force on Archiving of Digital Information (1996) declared,
– “a critical component of digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating, and providing access to digital collections.”
– “a process of certification for digital archives is needed to create an overall climate of trust about the prospects of preserving digital information.”
• A recurring request in many subsequent studies and workshops
Trusted Digital Repositories
• Invited group, hosted by Research Libraries Group (RLG)
• Concerned with organisational and financial issues
• Trusted Digital Repositories: Attributes and Responsibilities (TDR)
– http://www.rlg.org/legacy/longterm/repositories.pdf
Critique of TRAC
• Closed process
– Single review of draft document
• Many changes based on unpublished “test audits”
• Underplays “understandability”
– Important for data
– Assumed not to be important for “documents”
• Simple list
– Do ALL boxes have to be ticked?
– What does a “tick” mean anyway?
• Link to other standards
– ISO 17799/27001 for security (overlap with TRAC section C)
– ISO 9000 – say what you do and do what you say
– but impractical to demand multiple independent audits
ISO process status
• New group set up with the primary aim of producing an ISO standard
– Repository Audit and Certification (RAC)
• OPEN process
– Wiki open to all
– Mailing list open to all
– Virtual meetings normally every week
– See http://wiki.digitalrepositoryauditandcertification.org
• Into ISO via CCSDS – same route as OAIS
– Some organisational/procedural changes in CCSDS
• Currently a Birds of a Feather (BoF) group
– To demonstrate adequate support for the work
• Subsequently should become a Working Group
• Documents agreed by the WG will then be reviewed by CCSDS and more broadly via international ISO review process
Current status
• Reviewing and comparing
– TRAC
– NESTOR
– DCC documents
• Do we need another ISO standard?
– Could we simply add to existing standards e.g. ISO 27001?
– The view is that ISO 27001 CANNOT be modified adequately
• Its view of Information is too limited
• Started drafting a straw man document
– Taking TRAC and adding concepts from other docs
Key Issues
• How to get from a checklist to an international accreditation/certification system?
• Evidence – short term
• Evidence – long term
– The real crunch!
• Quantification
– The marking system
• Levels of audit?
– External review
– Internal maturity
The Market
• Transparency
• Trustable?
– certified by whom?
– to what level?
– what evidence?
– for what Designated Community
• relevant/sensible?
• What cost?
Links
• RAC group Wiki:
– http://wiki.digitalrepositoryauditandcertification.org
• TRAC document
– http://www.crl.edu/PDF/trac.pdf
• Digital Curation Centre
– http://www.dcc.ac.uk
• CASPAR project
– EU project on digital preservation – Science, Culture and Arts data
• Infrastructure, tools and detailed case studies – what does one need to actually “understand” the data?
– http://www.casparpreserves.eu
Alliance for Permanent Access
• Members:
– Science and Technology Facilities Council
– Koninklijke Bibliotheek
– Deutsche Nationalbibliothek
– Max Planck Gesellschaft
– International Association of Scientific, Technical and Medical Publishers
– European Space Agency, ESRIN
– Fernuniversität in Hagen
– European Organization for Nuclear Research
– Georg-August-Universitat Gottingen Stiftung Oeffentlichen Rechts
– European Science Foundation
– Centre National d’Etudes Spatiales
– Centre Informatique National de l’Enseignement Supérieur
– UK Joint Information Systems Committee
– British Library
– National Archives of Sweden
Alliance status
• First stage – fairly informal sign-up
• Preparing for Conference in Nov
• More formal framework next year
PARSE bid
• Consortium is a sub-group of the Alliance
• EU bid
• Aims at E-Infrastructure for Preservation
– Roadmap
– Survey of what is in place and planned
– Gap Analysis
– Impact Analysis tool
Other opportunities
• NSF solicitation, entitled Sustainable Digital Data Preservation and Access Network Partners (DataNet)
– http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.pdf
– informational meeting for prospective Principal Investigators will be held 10 am to noon, Tuesday, November 6, 2007, Room 595 NSF Stafford II building, Arlington, Virginia.
– www.nsf.gov/dir/index.jsp?org=OCI