Digitizing Spectator - Libraries Digital Program
Columbia Spectator Archive
Progress Report on Phase 1
Stephen Paul Davis
Columbia University Libraries
Digital Program
June 27, 2012
The Plan
• Partnership between Columbia Libraries /
Information Services and the Spectator
• High quality scanning of original Spectator
issues from Columbia University Archives and
the Spectator Editorial Offices
• State-of-the-art text processing (OCR) of
scanned images to allow searching at the article level
• Feature-rich online presentation
• Permanent, long-term digital preservation
The Players
• The Spectator staff and board
• University Archives
• Libraries’ Preservation & Digital Conversion Division
• Libraries’ Digital Program Division
• Libraries’ Information Technology Division
• Digital Divide Data
• Brechin Imaging Services
• Digital Library Consulting (Veridian provider)
• Cornell University Libraries [behind the scenes]
The Context
Columbia Libraries Digital Program’s mission:
• To carry out digitization and access projects chiefly
from Columbia’s rare and special collections (2002-)
• To build and support Columbia’s long-term digital
preservation infrastructure (2010-)
• To develop and support preservation of and access
to born-digital archival collections (2011-)
Columbia Libraries Digitization Program
• Digitization Projects (Digital Scriptorium, APIS (papyrus project), John Jay Papers,
Herbert Lehman Papers, etc.)
• Digital Exhibitions (See especially: Core Curriculum:CC, Core Curriculum:LitHum,
1968:Columbia in Crisis, Varsity Show)
• ‘Born-Digital’ & Web Archives (Columbia University, Human Rights Organizations, etc.)
Columbia’s Technology Platforms
Columbia University Libraries / Information Services
has a robust repository infrastructure that follows
national and international standards and ‘best practices’
to support:
• digital publishing
• long-term digital preservation
Newspaper Access …
• Providing flexible access to newspaper
content is complicated and expensive
• Not cost-effective for single institutions to
build custom, newspaper-oriented software
• Only two major vendors provide software
optimized for newspapers
• DL Consulting’s Veridian is by far the better and
more frequently chosen option for research libraries
Spectator Stats
Spectator run from 1877-2009:
Number of volumes = 155
Estimated no. of pages = 79,145
Average pages per volume = 500 (wide variation!)
Est. vols. requiring disbinding = 100
Est. vols. unable to be digitized = 10
NB: Most volumes contain severely brittle paper; only 24 volumes have flexible paper
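As a rough check on the figures above, the short Python sketch below recomputes the average pages per volume and expresses the volume counts as shares of the full run. The variable and function names are illustrative only; the numbers are the estimates quoted on this slide.

```python
# Estimates quoted on the "Spectator Stats" slide (names are illustrative).
TOTAL_VOLUMES = 155
ESTIMATED_PAGES = 79_145
VOLS_REQUIRING_DISBINDING = 100
VOLS_UNABLE_TO_DIGITIZE = 10
VOLS_WITH_FLEXIBLE_PAPER = 24


def share_of_run(volume_count: int) -> float:
    """Return a volume count as a percentage of the full run."""
    return 100 * volume_count / TOTAL_VOLUMES


# Average pages per volume implied by the two totals.
average_pages = ESTIMATED_PAGES / TOTAL_VOLUMES

print(f"Average pages per volume: {average_pages:.0f}")
print(f"Requiring disbinding:     {share_of_run(VOLS_REQUIRING_DISBINDING):.1f}%")
print(f"Unable to be digitized:   {share_of_run(VOLS_UNABLE_TO_DIGITIZE):.1f}%")
print(f"Flexible paper:           {share_of_run(VOLS_WITH_FLEXIBLE_PAPER):.1f}%")
```

The computed average comes out slightly above the rounded figure of 500 given on the slide, which is consistent with the slide's note that page counts vary widely between volumes.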
Phase 1 Completion
• Prep, rehouse, digitize & encode Spec volumes
for 1955-1992: completed June 15th
• Load into Veridian test system: June 29th
• Design Spectator Archive website: July 15th
• Move test system to production environment:
July 30th
• Do user testing and quality review: August 15th
• Launch new public site: September 4th
Demo of Test System
• 1964: http://tinyurl.com/78hhypj
• 1968: http://tinyurl.com/7jk6ynz
• 1973: http://tinyurl.com/7gu55p6
• 1983: http://tinyurl.com/7dq8zly
• Searching “coeducation”: http://tinyurl.com/7cwd95g
• Partial content list: http://tinyurl.com/7q8w4nq
[Note that these are all temporary links that work as of 6/28/2012 but which
will stop working altogether at some point in the next few weeks.]
Phase 2 Goals
Finish the Project!
(Prep, rehouse, repair, digitize & encode Spec
volumes for 1877-1954 and 1992-2009)
Phase 2 Costs (for ca. 55,000 pages)
• Preparation, rehousing, repair
= will be covered by CU Libraries
• Scanning of 55,000 pages
= $55,000 + $5,000 contingency
• OCR, segmentation, selective text correction
= $55,000 + $5,000 contingency
• Load into host system, license, maintenance
= already covered by CU Libraries
• Long-term preservation of master image (TIFF) files
= may require additional fundraising
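The two priced line items above can be totaled with a small Python sketch. It assumes the contingency applies once per line item, as the slide lists it, and it leaves out the items covered by CU Libraries and the unpriced preservation cost. All names are illustrative.

```python
# Priced Phase 2 line items from the slide (names are illustrative).
PHASE2_PAGES = 55_000
LINE_ITEMS = {
    "Scanning": {"base": 55_000, "contingency": 5_000},
    "OCR, segmentation, selective text correction": {"base": 55_000, "contingency": 5_000},
}


def phase2_totals(items: dict) -> tuple[int, int, int]:
    """Return (base total, contingency total, grand total) in dollars."""
    base = sum(item["base"] for item in items.values())
    contingency = sum(item["contingency"] for item in items.values())
    return base, contingency, base + contingency


base_total, contingency_total, grand_total = phase2_totals(LINE_ITEMS)

print(f"Base cost:        ${base_total:,}")
print(f"Contingency:      ${contingency_total:,}")
print(f"Total to raise:   ${grand_total:,}")
print(f"Base cost / page: ${base_total / PHASE2_PAGES:.2f}")
```

Under these assumptions the priced items work out to about one dollar per page for each of scanning and text processing, before contingency.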
Final, key points
• The Spectator Archive project is extremely important
for preservation of and access to Columbia
University’s history
• This is an archival preservation project as well as an
information access project
• Columbia Libraries is making a major, long-term
investment to ensure the success of this project
• The Libraries and the Spec have made a great start,
but additional funding is needed to complete the job
Questions
Stephen Paul Davis, Director
Libraries Digital Program
Columbia University