Organized Digital Library Development … from the Bottom Up
University of Alabama Libraries
Jody L. DeRidder [email protected]
Image courtesy of Life Magazine
Libraries organize information… primarily books.
Trinity College Library, Dublin, as captured by Candida Höfer in her book Libraries (Thames and Hudson, UK: 2005).
Photo credit: Flickr user "Libby", used with permission (creative commons)
If libraries organize books… Why not digital files??
It’s all information!
A digital object may belong in MANY potential virtual collections…
… but it originated from ONE SINGLE ANALOG collection. Provenance trumps all!
Slavery, African Americans, Sheet Music, Tombigbee River, Southern History … and more
“Gum Tree Canoe,” published by G.P. Reed (Boston: 1847). Wade Hall collection of Southern History and Culture, Hoole Special Collections, University of Alabama Libraries.
Bringing Order to Chaos
University of Alabama Libraries
Holder ID: u0003
Collection ID: 0000023
Item ID: 0000007
Sequence ID: 0005
Archival File: u0003_0000023_0000007_0005.tif
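The naming scheme above can be sketched in a few lines of Python. This is only an illustration, assuming the four segments are joined with underscores exactly as in the archival file name shown; the names `FileId`, `build_name` and `parse_name` are hypothetical and not part of any library tooling.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    """The four identifier segments plus the file extension."""
    holder: str
    collection: str
    item: str
    sequence: str
    extension: str


def build_name(holder, collection, item, sequence, extension):
    """Join the segments with underscores and append the extension."""
    return "_".join([holder, collection, item, sequence]) + "." + extension


def parse_name(filename):
    """Split a file name back into its segments (assumes exactly four)."""
    stem, _, extension = filename.rpartition(".")
    parts = stem.split("_")
    if not stem or len(parts) != 4:
        raise ValueError(f"unexpected file name: {filename!r}")
    return FileId(parts[0], parts[1], parts[2], parts[3], extension)


name = build_name("u0003", "0000023", "0000007", "0005", "tif")
print(name)
print(parse_name(name))
```

Because every segment is recoverable from the name alone, a file that strays from its folder still identifies its holder, collection, item and page.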
1) Clarity
2) Low cost
3) Simple
4) Extensible
u0003_0001980_0000001 is the first digitized item in the MSS 1980 collection (holder ID: u0003; collection ID: 0001980)
The Digitization Working Area…
Collection folders are named for the collection identifier. Allowed subfolders include:
Admin, Metadata, Scans, Transcripts
Compound objects have their own subfolders for pages, named for the item.
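A working-area convention like this is easy to set up and check with a script. The sketch below is a guess at how that might look, not the library's actual tooling: the function names are hypothetical, the subfolder names are taken literally from the slide, and the example collection identifier `u0003_0000023` is an assumption about how the collection folder is named.

```python
from pathlib import Path

# Subfolder names as listed on the slide; exact spelling and case are assumed.
ALLOWED_SUBFOLDERS = ("Admin", "Metadata", "Scans", "Transcripts")


def create_collection_folder(working_area, collection_id):
    """Create a collection folder containing the allowed subfolders."""
    folder = Path(working_area) / collection_id
    for name in ALLOWED_SUBFOLDERS:
        (folder / name).mkdir(parents=True, exist_ok=True)
    return folder


def unexpected_subfolders(collection_folder):
    """Return the names of subfolders that are not on the allowed list."""
    return sorted(
        entry.name
        for entry in Path(collection_folder).iterdir()
        if entry.is_dir() and entry.name not in ALLOWED_SUBFOLDERS
    )
```

Calling `create_collection_folder("working_area", "u0003_0000023")` and then `unexpected_subfolders(...)` on the result should report nothing until someone adds a folder outside the convention.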
And a Collection Folder in the Working Area
Bringing Content Up to the Level Of the WEB!!! Greater Usability and Access == Longer Life
Images … ImageMagick: http://www.imagemagick.org (it’s free!)
Protected archive area: u0003/ 0000023/ 0000007/ 0005/ u0003_0000023_0000007_0005.tif
Web accessible area: u0003/ 0000023/ 0000007/ 0005/ thumb and large-size derivatives
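Because the archive area and the web area share one directory layout, the target location of a derivative follows from the archival file name. The sketch below builds an ImageMagick `convert ... -resize` command for one derivative. Everything beyond that command is an assumption made for illustration: the function name, the root paths, the `_thumb` suffix, the JPEG format and the pixel size are all hypothetical.

```python
from pathlib import PurePosixPath


def derivative_command(archive_root, web_root, filename, size, suffix):
    """Build an ImageMagick command that writes a resized JPEG derivative.

    The derivative goes in the web area under the same identifier-based
    directory as the archival file. The suffix and size are hypothetical.
    """
    stem = filename.rsplit(".", 1)[0]
    relative_dir = PurePosixPath(*stem.split("_"))
    source = PurePosixPath(archive_root) / relative_dir / filename
    target = PurePosixPath(web_root) / relative_dir / f"{stem}_{suffix}.jpg"
    return ["convert", str(source), "-resize", size, str(target)]


command = derivative_command(
    "/archive", "/web", "u0003_0000023_0000007_0005.tif", "128x128", "thumb"
)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` on a machine with ImageMagick installed; newer ImageMagick releases use `magick` in place of `convert`.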
Audio … LAME: http://lame.sourceforge.net
OCR … TESSERACT: http://code.google.com/p/tesseract-ocr/
Identification, Organization and Consistency
Each segment of numbers (Holder ID, Collection ID, Item ID, Sequence ID) is used in the directory structure.
The directory for u0003_0000003_0002_001.tif is simply: u0003/0000003/0002/001/
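The mapping from file name to directory is mechanical, which is the point of the scheme. A minimal sketch, assuming the segments are always separated by underscores; the function name `directory_for` and the optional `root` parameter are hypothetical.

```python
from pathlib import PurePosixPath


def directory_for(filename, root=""):
    """Return the directory implied by an identifier-based file name.

    Each underscore-separated segment of the name (without its extension)
    becomes one level of the directory hierarchy.
    """
    stem = filename.rsplit(".", 1)[0]
    return PurePosixPath(root, *stem.split("_"))


print(directory_for("u0003_0000003_0002_001.tif"))
print(directory_for("u0003_0000003_0002_001.tif", root="/archive"))
```

No lookup table or database is needed to locate a file: the name is the address.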
Dropping the Technical Metadata in… where it belongs
Makes METS creation a Piece of Cake!
(and redundant!)
Using FITS, the File Information Tool Set developed by Harvard, which encapsulates JHOVE, DROID, ExifTool and other tools: http://code.google.com/p/fits/
An Example of the Lowest-Cost Model: The Alabama Digital Preservation Network http://www.adpn.org/
http://www.lockss.org/
Lots of Copies Keeps Stuff Safe!!
storage area
Simple, Clear Hierarchical Organization:
Holder ID / Collection ID / Item ID / Sequence ID
http://acumen.lib.ua.edu
ACCESS! Via Acumen
(also free!)
XML agnostic
No ingest
No metadata modifications
All content easily accessible
Open to search engines
Now it’s organized. But can users find what they need?
Trinity College Library, Dublin, as captured by Candida Höfer in her book Libraries (Thames and Hudson, UK: 2005).
Usability Testing
Participant number   1  2  3  4  5  6  7  8  9  11 12 13 14 15 16 17 18 19 20 21
Educational status*  G  G  U  G  G  G  S  G  U  G  U  U  U  U  U  U  PG G  G  G
* U = Undergraduate, G = Graduate student, PG = Postgraduate volunteer, S = University staff
Educational background in history: 5 of 20 participants
Previous special collections experience: 12 of 20 participants
Previous digital collections experience: 17 of 20 participants
English as a second language: 5 of 20 participants
http://transcribe.lib.ua.edu
http://tagit.lib.ua.edu
S. R. Ranganathan (1931), paraphrased:
Information is for use.
Every user his / her information.
Every information its user.
Save the time of the user.
The library is a growing organism.
Remember why we’re here…
Image: jscreationzs / FreeDigitalPhotos.net