CODATA 2006, Beijing, China, 23-25 Oct 2006
CASPAR: Early results and future goals
David Giaretta
CASPAR aims
• Produce tools and techniques to support digital preservation and make it easier to share the cost
– must be relatively easy to use
– must have a low “buy-in” in terms of effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards
• Cannot do everything but should do something broadly useful
• Working closely with the UK Digital Curation Centre
Digital Preservation…
• Easy to do…
• …as long as you can provide money forever
• Easy to test claims about tools…
• …as long as you live a long time
Validation
• Demonstrate theoretical basis
• “Accelerated lifetime” tests
– Changes in hardware
– Changes in environment
– Changes in Designated Community
• Demonstrate increased trustworthiness
– Measured using draft Certification Standard
Digital Preservation
• Need to preserve information & knowledge – not just “the bits”
– Documents, videos are rendered – simple?
– Data – must be processed – harder
• Need to manage knowledge to keep archives alive through time
– Preservation is a process, not a one-time event
– Preservation is expensive – costs need to be shared
• The alternative is money – endless supplies of money
• Open Archival Information Systems Reference Model (ISO 14721) provides a general conceptual framework
Immediate benefits of Digital Preservation: Use of Unfamiliar Data
• Global Cyber-Infrastructures allow users to find and try to use data from many sources
– Some sources will be familiar
– Most available sources will be unfamiliar
• How can one be sure that the unfamiliar data is used correctly?
• Garbage in – garbage out
• Need to be able to deal with unfamiliar data whether it is contemporary or old (preserved)
OAIS Reference Model
• ISO 14721: Reference Model for an Open Archival Information System (OAIS). http://public.ccsds.org/publications/archive/650x0b1.pdf
• An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
• Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.
• Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.
• Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
• Independently Understandable: has sufficient documentation to allow the information to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.
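OAIS Information Model
[Figure: an Information Object consists of a Data Object that is interpreted using one or more (1+) items of Representation Information. A Data Object is either a Physical Object or a Digital Object; a Digital Object consists of one or more (1+) Bit Sequences. Representation Information is itself interpreted using further Representation Information.]
Recursion ends at KNOWLEDGE BASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)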
Rep.Info. Classification
[Figure: network of items around a FITS file. Nodes: FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY, DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION]
Representation Information
• The Data Object is “interpreted using” the Representation Information (RepInfo)
• The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately
• Hence:
– Take account of the Designated Community and its associated Knowledge Base
• The amount of RepInfo is not fixed
– Additional RepInfo will be needed over time
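As a rough illustration of why the amount of RepInfo is not fixed, the sketch below walks a chain of Representation Information and stops wherever the Designated Community's Knowledge Base already covers an item. The class and function names, the links between the FITS items, and both knowledge bases are assumptions made for this example; they are not taken from the OAIS standard or from CASPAR software.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """Representation Information; may itself need RepInfo to be understood."""
    name: str
    interpreted_using: list = field(default_factory=list)


def needed_rep_info(roots, knowledge_base):
    """List the RepInfo an archive must supply, in first-seen order.

    The walk stops at anything already in the Designated Community's
    Knowledge Base, which is where the recursion ends.
    """
    needed = []

    def visit(item):
        if item.name in knowledge_base or item.name in needed:
            return
        needed.append(item.name)
        for dependency in item.interpreted_using:
            visit(dependency)

    for root in roots:
        visit(root)
    return needed


# The links below are assumed for illustration only.
pdf_software = RepInfo("PDF s/w")
pdf_standard = RepInfo("PDF STANDARD", [pdf_software])
fits_standard = RepInfo("FITS STANDARD", [pdf_standard])
java_vm = RepInfo("JAVA VM")
fits_software = RepInfo("FITS JAVA s/w", [java_vm])
fits_file_rep_info = [fits_standard, fits_software]

today = {"PDF STANDARD", "JAVA VM"}  # hypothetical Knowledge Base now
later = {"JAVA VM"}                  # hypothetical: PDF is no longer familiar

print(needed_rep_info(fits_file_rep_info, today))
print(needed_rep_info(fits_file_rep_info, later))
```

With the smaller Knowledge Base the same FITS file needs more RepInfo, which is the sense in which additional RepInfo is needed over time.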
Early Results
• High level architecture for sharing cost and access to Representation Information
• Detailed examinations of specific datasets to understand what is really needed to keep them understandable and usable
Rep. Info. Use and maintenance
Registry for Representation Info
[Figure: a User, an Archive and a network of Rep. Info. Registries/Repositories; each Digital Object and each item of Representation Information carries a CPID]
• 1 – User gets data from archive. Data has associated Curation Persistent Identifier (CPID)
• 2 – User unfamiliar with data so requests Rep. Info. using CPID
• 3 – User receives Rep. Info. – which has its own CPID in case it is not immediately usable
• The Digital Object could have RepInfo packed with it, as well as CPID
• Support automated access & processing
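The three steps above can be sketched as a lookup loop. This is a minimal sketch under stated assumptions: the registry is an in-memory dictionary, and `RegistryEntry`, `fetch_rep_info` and the CPID strings are hypothetical names invented for the example, not part of any CASPAR interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    """One item of Rep. Info. held by a registry (hypothetical shape)."""
    description: str
    own_cpid: Optional[str] = None  # CPID of the Rep. Info. needed to use this entry


# Hypothetical registry contents; the CPID strings are invented.
REGISTRY = {
    "cpid:fits-standard": RegistryEntry("FITS standard (PDF document)", "cpid:pdf-standard"),
    "cpid:pdf-standard": RegistryEntry("PDF standard"),
}


def fetch_rep_info(cpid, registry, understood):
    """Follow CPIDs until the user reaches something already understood."""
    fetched = []
    seen = set()
    while cpid is not None and cpid not in understood:
        if cpid in seen:
            raise ValueError(f"circular Rep. Info. reference at {cpid}")
        seen.add(cpid)
        entry = registry[cpid]             # step 2: request Rep. Info. using the CPID
        fetched.append(entry.description)
        cpid = entry.own_cpid              # step 3: the Rep. Info. has its own CPID
    return fetched


data_cpid = "cpid:fits-standard"           # step 1: CPID that arrived with the data
print(fetch_rep_info(data_cpid, REGISTRY, understood={"cpid:pdf-standard"}))
print(fetch_rep_info(data_cpid, REGISTRY, understood=set()))
```

A user who already understands PDF stops after one request; a user who does not keeps following CPIDs, which is why the returned Rep. Info. carries its own identifier.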
CASPAR information flow architecture
[Figure: information flow architecture diagram; the only surviving label is “Rep Info”]
CASPAR Testbeds
• Three testbeds
– Cultural: UNESCO
– Performing Arts: INA, IRCAM
– Scientific: ESA and CCLRC
• Complex, multi-source, multifaceted data
• Many common preservation & evaluation & validation issues
• Some specific requirements on preservation (technical, delivery, legal)
– Specific user communities / Knowledge bases
• Also test the OAIS model
Science: CCLRC example
Ionosonde data
World map of ionosondes
Some Issues
• Difficult to derive physical quantities from data
– Can be analysed in multiple ways
– Raises fundamental questions about Representation Information
• Common automated method is proprietary
– Data structure also proprietary
– Paper documentation – restricted access
• Provenance and trust
ESA example
GOME: Global Ozone Monitoring Experiment on ERS-2
GOME data processing
GOME Level 4 product: Integration of GOME, other data and models
GOME Level 3 product: Integration of time and space data
GOME Level 2 product: Ozone profile at given location
Some Issues
• Provenance and Context of processed data: relationship to
– Representation Information of raw data, and
– Knowledge base of Designated Community
UNESCO examples
DATA:
• Scanned documents and maps
• Aerial and close range photography (Digital photogrammetry)
• Monument measurements (Laser scanning)
• Satellite images (Remote sensing and image processing)
• Multi-scale digital cartography (Geographic information systems (GIS) and CAD)
• 3D models, virtual tours (Computer visualization)
Mandatory Documentation:
• Identification of property
• Description of property
• Justification of inscription
• State of conservation and factors affecting the property
• Protection and Management
• Monitoring
• Documentation
• Contact information of responsible authorities
• Signature on behalf of the State Party(ies)
World Heritage List
Performing Arts examples
Examples:
• Score
• MAX/MSP patches
• Additional instructions
Some Issues
• What is Preservation of “performability”?
– Composer’s intention
• Authenticity
• Proprietary software and hardware
• Copyright
• Digital Rights Management
Shared Infrastructure
• Registries of Representation Information
• Persistent Identifier name resolvers
– DOI? ARK? URL? – none are guaranteed
• Interfaces – support preservation and interoperability
• Standards
– Preservation Description Information
– Fixity, Provenance, Reference, Context
• Accreditation/Certification for repositories
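The four Preservation Description Information components named above (Fixity, Provenance, Reference, Context) could be carried alongside a digital object roughly as follows. This is only a sketch: the field layout, the choice of SHA-256 for fixity, and the sample values are assumptions for illustration, not something the slides or the OAIS model prescribe.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescriptionInformation:
    fixity: str        # SHA-256 hex digest of the data (algorithm is an assumption)
    provenance: tuple  # ordered history of the object
    reference: str     # identifier for the object
    context: str       # how the object relates to other information


def describe(data, reference, provenance, context):
    """Build a PDI record for a byte sequence."""
    digest = hashlib.sha256(data).hexdigest()
    return PreservationDescriptionInformation(digest, tuple(provenance), reference, context)


def fixity_ok(data, pdi):
    """Return True if the data still matches the recorded digest."""
    return hashlib.sha256(data).hexdigest() == pdi.fixity


# Hypothetical sample values.
original = b"ozone profile at a given location"
pdi = describe(
    original,
    reference="cpid:example-object",
    provenance=["received from instrument", "processed to Level 2"],
    context="derived from raw instrument data",
)

print(fixity_ok(original, pdi))
print(fixity_ok(original + b" (altered)", pdi))
```

Fixity is the only component that can be checked mechanically in this sketch; Provenance, Reference and Context are recorded as plain values for a person or a later tool to interpret.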
Knowledge at the heart of preservation
• Knowledge driven approach
• Knowledge management to support long-term preservation of concepts/information including:
– Single, complex, on demand, interactive objects
– DRM
– Authenticity
– Access
– Storage
– Designated Community – descriptions
• Knowledge base definition
• Ontologies
WHEN
• Component architecture and prototypes by month 12
• Framework architecture month 18
• Component integration months 24-30
• Testbed implementations months 30-36
• Project completion month 42
www.casparpreserves.eu
Conclusions
• Science Data and Knowledge – needs more than just storing the “bits”
• Understanding and being able to process the vast amount of unfamiliar data which is available is hard
• It is expensive
– Costs must be shared
• So far the Open Archival Information Systems Reference Model is OK
– Many similarities can be exploited
– Many subtleties need to be explored
• Watch this space