Technology Support for ESSSS

TECHNOLOGY SUPPORT FOR ESSSS Progress, Issues, and Challenges Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Founder and Publisher, Library Technology Guides http://www.librarytechnology.org/ http://twitter.com/mbreeding ESSSS Digital Archive Workshop February 4, 2012



Page 1: Technology Support  for ESSSS

TECHNOLOGY SUPPORT FOR ESSSS

Progress, Issues, and Challenges

Marshall Breeding
Director for Innovative Technology and Research
Vanderbilt University Library
Founder and Publisher, Library Technology Guides
http://www.librarytechnology.org/
http://twitter.com/mbreeding
ESSSS Digital Archive Workshop
February 4, 2012

Page 2: Technology Support  for ESSSS

Turning Pages on Paper to Digital Images

Digitizing in the field involves many compromises compared to what can be done in more controlled settings
Access to archives may be of limited duration
  Arbitrary and political
Materials deteriorating rapidly
Practices related to physical preservation tend to be minimal
Must be light, fast, and inexpensive

Page 6: Technology Support  for ESSSS

Achieve best results possible
Maximize quality and consistency
Handheld digital cameras
  Rapid advancement in capabilities
  Early images done at lower resolutions compared with what is possible today
Fixed camera stands
  Consistency in orientation and framing
Organization of images (folders / image names)
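
One way to keep folders and image names consistent is to generate every path from the same rule. The sketch below is illustrative only: the layout (collection / volume / volume_page) and the names `page_image_path`, `collection`, `volume_id`, and `page_seq` are assumptions, not the scheme actually used for ESSSS.

```python
from pathlib import PurePosixPath


def page_image_path(collection: str, volume_id: str, page_seq: int,
                    ext: str = "tif") -> PurePosixPath:
    """Build a predictable relative path for one page image (hypothetical scheme)."""
    if page_seq < 1:
        raise ValueError("page_seq must be 1 or greater")
    # Zero-pad the sequence so files sort in page order in any file browser.
    filename = f"{volume_id}_{page_seq:04d}.{ext}"
    return PurePosixPath(collection) / volume_id / filename


if __name__ == "__main__":
    print(page_image_path("essss", "vol012", 7))
```

Generating names from a single function means every camera operator's images land in the same structure, whatever the camera named the files originally.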

Page 7: Technology Support  for ESSSS

Image Standards

TIFF: Currently regarded as best image format for archiving images
RAW: Native proprietary format of a camera
JPEG: Compressed images for display on the Web
  Data lost during compression: non-reversible
  VU system creates multiple sizes of JPEG images
JPEG2000
  Lossless compression method
  Not well supported on the Web
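
Creating multiple JPEG sizes from one archival master comes down to computing target dimensions that keep the aspect ratio. The sketch below shows only that arithmetic. The size names and pixel limits in `DERIVATIVES` are assumptions for illustration; the sizes the VU system actually produces are not stated in these slides.

```python
# Hypothetical derivative names and longest-side limits in pixels.
DERIVATIVES = {"thumbnail": 200, "screen": 1200, "zoom": 3000}


def derivative_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never enlarge beyond the master
    # Multiply before dividing to keep the aspect ratio as exact as possible.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


def plan_derivatives(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return the pixel dimensions of each JPEG derivative for one master image."""
    return {name: derivative_size(width, height, limit)
            for name, limit in DERIVATIVES.items()}


if __name__ == "__main__":
    print(plan_derivatives(4000, 3000))
```

Because JPEG compression is not reversible, the derivatives are always computed from the master, never from another derivative.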

Page 8: Technology Support  for ESSSS

Bringing Images to the Web

Take advantage of infrastructure developed by the Vanderbilt University Library to manage images
Digital Library framework:
  Presentation and functionality created in Perl-based interface
  Data and metadata stored in MySQL relational tables
  ODBC connectivity between presentation layer and MySQL
  Microsoft Windows Server/IIS for Web server
  Images reside on digital storage provided by the Vanderbilt University Library

Page 9: Technology Support  for ESSSS

Digital Preservation

Disaster Recovery
  Ability to restore files in the case of any hardware, software, or human error
Digital Preservation
  Commitment and processes in place to preserve digital information for the very long term
  Multiple replications
  Migration of data into future formats as current standards become obsolete
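
Multiple replications only help if the copies can be shown to be identical. A common approach is a fixity check that compares checksums of each replica against the master. The sketch below is a minimal illustration of that idea, not a description of the Vanderbilt process; the function names `sha256_of` and `find_mismatched_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_replicas(master: Path, replicas: list[Path]) -> list[Path]:
    """List replicas that are missing or whose content differs from the master."""
    expected = sha256_of(master)
    return [replica for replica in replicas
            if not replica.exists() or sha256_of(replica) != expected]
```

Running a check like this on a schedule would flag a damaged or missing copy while a good copy still exists to restore from.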

Page 10: Technology Support  for ESSSS

Building structure through Metadata

Metadata structure based on Dublin Core
Volume-level descriptive metadata
  Courtney Campbell designed metadata structure and is analyzing volumes to populate metadata for each volume
EXIF data extracted from images into the individual records for each page
Page-level structure
  Supports ability to select volumes and browse page images
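
The volume-level and page-level structure can be pictured as two related tables. The sketch below uses SQLite from the Python standard library as a stand-in for MySQL. Every table and column name is an assumption chosen for illustration (the `dc_` columns mirror a few Dublin Core elements); the actual ESSSS schema is not shown in these slides.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical volume and page tables."""
    conn.executescript("""
        CREATE TABLE volume (
            volume_id   INTEGER PRIMARY KEY,
            dc_title    TEXT NOT NULL,   -- Dublin Core: title
            dc_creator  TEXT,            -- Dublin Core: creator
            dc_date     TEXT,            -- Dublin Core: date
            dc_language TEXT             -- Dublin Core: language
        );
        CREATE TABLE page (
            page_id       INTEGER PRIMARY KEY,
            volume_id     INTEGER NOT NULL REFERENCES volume(volume_id),
            sequence      INTEGER NOT NULL,  -- position of the page in the volume
            image_file    TEXT NOT NULL,
            exif_datetime TEXT,              -- copied from the image's EXIF data
            exif_camera   TEXT,
            UNIQUE (volume_id, sequence)
        );
    """)


def pages_for_volume(conn: sqlite3.Connection, volume_id: int) -> list[tuple]:
    """Return (sequence, image_file) rows for one volume, in page order."""
    rows = conn.execute(
        "SELECT sequence, image_file FROM page "
        "WHERE volume_id = ? ORDER BY sequence",
        (volume_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder rows, not real ESSSS records.
    conn.execute("INSERT INTO volume (volume_id, dc_title) VALUES (1, 'Placeholder volume')")
    conn.executemany(
        "INSERT INTO page (volume_id, sequence, image_file) VALUES (?, ?, ?)",
        [(1, 2, "vol001_0002.tif"), (1, 1, "vol001_0001.tif")],
    )
    print(pages_for_volume(conn, 1))
```

Selecting a volume and browsing its pages then reduces to one query ordered by page sequence.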

Page 11: Technology Support  for ESSSS

Demonstration

Image management environment
Interface
Metadata
Page Images

Page 12: Technology Support  for ESSSS

Turning Pages into Data

The contents of the page images contain valuable data

Page images can be read by humans but do not support essential features: search, computer analysis, etc.

Full value of these collections can be realized through transcription

Page 13: Technology Support  for ESSSS

Challenges in transcription

Page characteristics
  Handwritten by many different hands
  Many names and numbers
  Spanish language
  Varying contrast
  Many defects: water damage, insects, etc.

Page 18: Technology Support  for ESSSS

Human transcription

Scholars that work with pages of interest can create transcriptions manually

Optical character recognition?
  Highly accurate for typescript
  Not effective for handwritten manuscripts

Page 19: Technology Support  for ESSSS

Crowdsourcing

Find ways to have large numbers of persons create transcript snippets

Google uses crowdsourcing to improve transcripts for the Google Books project.

Page 20: Technology Support  for ESSSS

Google ReCAPTCHA:

“Digitizing books one word at a time”
Each transaction transcribes one or two words
Each word is transcribed many times
Results compared to determine correct version
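
Comparing many transcriptions of the same word to settle on one version can be illustrated with a simple majority vote. This is a generic sketch, not Google's actual method, which these slides do not describe. The function name `consensus` and the thresholds `min_votes` and `min_share` are assumptions.

```python
from collections import Counter
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace so trivially different entries count as the same vote."""
    return " ".join(text.split())


def consensus(transcriptions: list[str], min_votes: int = 3,
              min_share: float = 0.6) -> Optional[str]:
    """Return the agreed reading, or None if agreement is too weak."""
    votes = Counter(normalize(t) for t in transcriptions if t.strip())
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    total = sum(votes.values())
    # Require enough independent readings and a clear majority among them.
    if total >= min_votes and count / total >= min_share:
        return winner
    return None


if __name__ == "__main__":
    print(consensus(["Maria", "Maria ", "Marta", "Maria"]))
    print(consensus(["Juan", "Joan"]))
```

Words that fail to reach agreement return None and could be shown to more transcribers or passed to a scholar for review.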

Page 21: Technology Support  for ESSSS

Google ReCAPTCHA

Page 22: Technology Support  for ESSSS

Crowdsourcing to Transcribe ESSSS

Scholars contribute any transcriptions created as they work with any given set of pages
Students assigned to create transcriptions
  Language, history, LIS
Collaboration with some organization with ReCAPTCHA-like infrastructure