Digital preservation
National Library of Australia
Digital preservation for ongoing access
Presentation for Council July 2008
David Pearson
Manager, Digital Preservation Section
Overview
1. We have lots of “digital stuff” in our collections and it is growing
2. We will lose access to it unless we take action
3. We need to manage the process of keeping it accessible and usable
4. Solutions have to be scalable, reliable and automated
1. “Digital stuff” – many collections
Oral History
Pictures
Historical Newspapers
Maps
Manuscripts
Books
Web sites
Ephemera
Sheet music
Serials
How does it grow?
1. We collect it
– Physical carriers
– Online
• PANDORA web archive
• Australian web domain harvests
2. We create it
– Oral history interviews
– Photographs
– Publications
3. We convert it
– Digitise our collections
Web Archives
• Web sites are collected selectively
– Individually for access via PANDORA, or
– On a large scale via annual domain snapshots
• No control over content creation
• Lots of
– File formats
– Individual files (PANDORA ≈ 51 million, domain harvest ≈ 1.3 billion files)
– Links
– Software (browser, plug-ins, readers)
• Internet content changes over time
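Understanding a collection with this many file formats usually starts with identifying what each file actually is. The following is a minimal sketch only, assuming identification by leading "magic" bytes for three widely used formats (TIFF, JPEG, PDF). The names `SIGNATURES`, `identify_format` and `format_inventory` are hypothetical, and a production workflow would rely on a maintained format registry rather than a hand-written table.

```python
from collections import Counter

# Leading "magic" bytes for a few common formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"II*\x00", "TIFF"),       # little-endian TIFF
    (b"MM\x00*", "TIFF"),       # big-endian TIFF
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the first bytes of a file's content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def format_inventory(samples: dict) -> Counter:
    """Count files per format label; samples maps a file name to its bytes."""
    return Counter(identify_format(content) for content in samples.values())
```

Files reported as "unknown" are the ones that need closer attention, since their access software cannot be assumed.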
Digitisation
• Around 135,000 items digitised
• Newspaper project = 4 million pages by 2010
• Internally created so we can control
– Standards
– File formats (e.g. TIFF, JPEG, PDF)
– Metadata
– Workflows
• Issues
– Growing volume
Physical carriers
• Approx. 12,000 items – grows by 1,000 a year
Issues
• No control over creation
• Time lag before acquisition
• Variety of carriers (fragile) and file formats
• Require various hardware, software, operating systems, drivers to access
• Labour intensive to process and transfer to safe storage (growing backlog)
Growth: digital collection storage
[Chart: storage size (terabytes), scale 0 to 350, January 2003 to July 2008. Labelled series: Australian Web Harvests, Newspapers]
Type of Digital Collections 2008
• Pandora: 3%
• Maps: 2%
• Sheet Music: 4%
• Manuscripts: 2%
• Pictures: 7%
• Oral History: 18%
• Other: 3%
• Historical Newspapers: 21%
• Australian Web Harvest: 40%
Growth: compared to books
[Chart: comparison of books collection and digital collection "book equivalents". "Book equivalents" (millions), scale 0 to 6, year end June 2005 to 2008. Series: Digital Collection (20 MB "book equivalents"), Books Collection]
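The "book equivalent" measure is a simple unit conversion: storage volume divided by 20 MB per book. A minimal sketch of the arithmetic follows, assuming decimal units (1 TB = 1,000,000 MB). The function name `book_equivalents` and the 100 TB input are illustrative assumptions and are not read from the chart.

```python
BOOK_EQUIVALENT_MB = 20  # from the chart legend: one "book equivalent" = 20 MB


def book_equivalents(storage_tb: float, mb_per_tb: int = 1_000_000) -> float:
    """Convert storage in terabytes to 20 MB "book equivalents".

    mb_per_tb defaults to decimal units; this is an assumption, since the
    slides do not say whether decimal or binary units were used.
    """
    return storage_tb * mb_per_tb / BOOK_EQUIVALENT_MB


# Illustrative figure only: 100 TB of storage.
print(book_equivalents(100))  # prints 5000000.0
```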
2. Act or risk losing it
• “Digital stuff” is dependent on technology at all stages
– Creation/capture
– Storage
– Access
• Technology changes rapidly, so software, hardware, media, file formats and operating systems become obsolete
• Unless managed, deterioration can occur rapidly, e.g. data can be corrupted or lost in storage or during transfer
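Corruption in storage or transfer is commonly detected with checksums (often called fixity checks). The following is a minimal sketch, assuming SHA-256 and local file paths; the function names are hypothetical and this is not a description of the Library's actual workflow.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity_check(source: Path, destination: Path) -> str:
    """Copy a file, verify the copy matches, and return the checksum to record."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"Checksum mismatch after copying {source} to {destination}")
    return expected


def is_intact(path: Path, recorded_checksum: str) -> bool:
    """Re-check a stored file against the checksum recorded at transfer time."""
    return sha256_of(path) == recorded_checksum
```

Recording the checksum at transfer time and re-running `is_intact` on a schedule is what allows corruption in storage to be noticed at all.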
Computer Museum
3. Managing to keep it
• “Not managing it” is not an option
• We need to
– Understand our “digital stuff” & associated risks
– Provide safe storage & ensure integrity
– Ensure access over time as technology changes
– Develop & implement preservation workflows, skills, standards, & strategies for ongoing access
– Enable content to be shared and used in different ways in the future
4. Solutions and implications
• Large scale automated processes
• Original research & time to deliver the solutions
• Reasonably long lead times
• Audit processes and quality control monitoring are critical
• Significant resources are required
Conclusions
• We are responsible for a lot of “digital stuff”
• If we simply collect and store it, it will become unusable in a relatively short time as technologies change
• Maintaining the ability to access it requires a lot of good management, planning, & dedicated resources
• We have to find and use solutions that can be applied automatically and reliably to billions of digital files