Down and Dirty Digitization: Everything you need to know about putting content online
![Page 1: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/1.jpg)
Down and Dirty Digitization: Everything you need to know about putting content online
Roy Tennant
California Digital Library
![Page 2: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/2.jpg)
Outline
Project Planning
Selecting Material to Digitize
Digitization Purpose
Basic Imaging Principles
Capturing Images
Editing Images
Best Practices
Conversion to Text
Metadata
Access Systems
Skills Required of Staff
Preservation
![Page 3: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/3.jpg)
Project Planning
Who will do the work?
What systems will be required?
What are the specifications for images and metadata?
How much will the project cost?
Who will own and manage the digital products that will be produced?
Steve Chapman, from Handbook for Digital Projects, NEDCC
![Page 4: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/4.jpg)
Selecting Material to Digitize
Publishing rights
Available support/funding opportunity
Critical mass
Uniqueness
Reputation
Audience and potential use
Diversity of material type
Ability to stand on its own and fit in with other collections
![Page 5: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/5.jpg)
What Do We Preserve?
The body or the soul?
The artifact
The intellectual content
How do we decide that the artifact has preservation value?
Who decides?
![Page 6: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/6.jpg)
The Artifact
The “look and feel”
The experience of interacting with a specific object
Consequences:
  Choices for providing access are limited
  Time and money spent on recreating the artifact may be better spent on increasing access
  In some cases, preserving the look and feel actually harms other uses
![Page 7: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/7.jpg)
![Page 8: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/8.jpg)
Written Material
Handwritten texts (diaries, etc.), or those with handwritten notations (manuscript drafts, etc.) can easily be considered to have artifactual value
But how much artifactual value do printed texts have?
And born-digital texts?
What’s it worth to you?
![Page 9: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/9.jpg)
“If the goal of preservation is persistent utility, then functionality rather than aesthetics should drive system design.”
— Stephen Chapman, “Content Follows Form: Preservation via Systems Design,” Microform & Imaging Review
![Page 10: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/10.jpg)
Persistent Utility
Form must be allowed to be altered or destroyed to retain or enhance function
If function cannot be retained or enhanced, then form should be preserved
![Page 11: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/11.jpg)
Considerations for Retaining Items in Original Format
Age
Evidential value
Aesthetic value
Scarcity
Associational value
Market value
Exhibition value
![Page 12: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/12.jpg)
“The issue is not to evaluate the artifact per se to determine what survives and what does not…The issue is the need to agree on a method for interrogating the individual artifact, that would, in a climate of finite resources, help make a good decision about whether and how to preserve it.”
— Council on Library and Information Resources, The Evidence in Hand: the Report of the Task Force on the Artifact in Library Collections
![Page 13: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/13.jpg)
How Do We Preserve It?
[Bar chart: cost by preservation method, vertical axis from $0 to $2,000. Methods compared: Bind/Box, Deacidify, Microfilm, Digitize Simple Book, Digitize Complex Book, Conserve]
Preservation costs by method calculated by the Library of Congress Preservation Directorate
![Page 14: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/14.jpg)
Types of Materials
Printed text / simple line art
Manuscripts
Halftones
Continuous Tone
Mixed
From Anne Kenney, et al., Moving Theory into Practice
![Page 15: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/15.jpg)
Benchmarking
The process whereby you determine your digitization requirements using the material you will digitize
![Page 16: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/16.jpg)
Resolution
One pixel
The number of pixels in a given area defines the resolution of an image
1”
500 x 1,000 pixels
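The relationship between physical size, scanning resolution, and pixel count can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function name `pixel_dimensions` and the letter-size example are assumptions chosen for the sketch.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for an original scanned at `dpi` pixels per inch."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical example: an 8.5" x 11" page scanned at 300 dpi.
print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)
```

Doubling the dpi doubles each dimension, so the total number of pixels grows fourfold.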
![Page 17: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/17.jpg)
Dynamic Range (bit-depth)
Example images: 1 bit, 8 bit grayscale, 8 bit color, 24 bit color (file formats shown: GIF, GIF, JPEG)
1 bit = black or white
8 bits = 256 shades
16 bits = thousands
24 bits = millions
36 bits = billions
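The counts on this slide follow from the fact that n bits can encode 2^n distinct values. A small sketch (the function name `tonal_values` is made up for illustration):

```python
def tonal_values(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


for bits in (1, 8, 16, 24, 36):
    print(f"{bits:>2} bits = {tonal_values(bits):,} values")
```

This gives 2, 256, 65,536, 16,777,216 and 68,719,476,736 values, which matches the "thousands", "millions" and "billions" shorthand.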
![Page 18: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/18.jpg)
RGB Color Space
Red
Green
Blue
8 bits per channel = 24 bit color image
12 bits per channel = 36 bit color image
Color Channels
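Bits per channel times three channels gives bits per pixel, and from there an estimate of uncompressed image size. The sketch below assumes tightly packed pixel data with no header or padding, which real file formats do not guarantee (12-bit channels, for instance, are often stored in 16-bit words). The function name is hypothetical.

```python
def uncompressed_bytes(width_px, height_px, bits_per_channel, channels=3):
    """Estimate raw image size in bytes, assuming tightly packed pixels (an assumption)."""
    total_bits = width_px * height_px * bits_per_channel * channels
    return (total_bits + 7) // 8  # round up to whole bytes


# A 6,000 x 8,000 pixel capture at 8 bits per channel (24 bit RGB):
size = uncompressed_bytes(6000, 8000, 8)
print(size, "bytes, about", round(size / 2**20), "MiB")
```

For 6,000 x 8,000 pixels at 24 bits this comes to 144,000,000 bytes, roughly 137 MiB.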
![Page 19: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/19.jpg)
Image Compression
Lossless: the image is unchanged after compression (no image data is lost)
  Typical file size: 50% of original
  Example: LZW compression
Lossy: the image is altered after compression (image data is lost)
  Example: JPEG
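What "lossless" means can be shown with a round trip: compress, decompress, and compare byte for byte. Note the substitution: this sketch uses DEFLATE from Python's standard `zlib` module, not LZW, because LZW is not in the standard library. The lossless property it demonstrates is the same. The sample data is synthetic and far more repetitive than a real scan, so the ratio it achieves is not representative.

```python
import zlib

# Synthetic "image data": a repeating black/white pattern (not a real scan).
original = bytes([0, 0, 0, 255, 255, 255] * 1000)

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the original.
print(restored == original)
print(len(original), "bytes before,", len(compressed), "bytes after")
```

A lossy scheme such as JPEG would fail the equality check by design: it trades exact reconstruction for smaller files.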
![Page 20: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/20.jpg)
TIFF
Tagged Image File Format
Most often used to save “master versions” of images (unedited)
Can be compressed or uncompressed
![Page 21: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/21.jpg)
CompuServe GIF
Graphics Interchange Format (GIF)
Maximum 8 bits/pixel: 256 colors (shades)
Good for:
  Text and line art
  Thumbnails
Not good for:
  Full-color pictures
  Anything that requires more than 256 colors
![Page 22: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/22.jpg)
JPEG
Joint Photographic Experts Group
JPEG is actually a compression scheme; the image file format is JFIF (JPEG File Interchange Format)
Good for:
  Full-color pictures
  Anything that requires more than 256 colors
Not good for:
  Text or line art
![Page 23: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/23.jpg)
New Image Formats
Portable Network Graphics (PNG) - from the W3C to replace the CompuServe GIF format and provide more capabilities
JPEG2000 - An upgrade of the JPEG format
Flashpix - from a consortium of commercial companies, to provide much higher-resolution images in a way that allows speedy network delivery
MrSID - From LizardTech, good for large format materials (maps, panoramic photos, etc.)
![Page 24: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/24.jpg)
Capturing Images
Technologies
  Digital Cameras
  Flatbed Scanners
  Film Scanners
  Kodak PhotoCD
Outsourcing
Standards and Best Practices
![Page 25: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/25.jpg)
Digital Cameras
BetterLight Super6K
  6,000 x 8,000 pixels, 136MB (24 bit RGB)
  $16,990
Phase One PowerPhase FX
  10,500 x 12,600 pixels, 760MB (48 bit RGB)
![Page 26: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/26.jpg)
Flatbed Scanners
Minimum requirements:
  600 x 1200 dpi optical resolution
  36-bit color
Not for slides or transparencies; best for 8 1/2” x 11” or 8 1/2” x 14” originals
Sheet feeder (often optional) helpful for digitizing text
![Page 27: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/27.jpg)
Film Scanners
For 35mm slides and negatives; others available for larger formats
$600 - $3,000
Most around 2700-4000 dpi, 30-36 bit color
![Page 28: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/28.jpg)
Kodak PhotoCD
Take pictures with a normal camera, but have your pictures “developed” onto a PhotoCD
A proprietary image format: ImagePAC, but very high resolution (4 different resolutions)
![Page 29: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/29.jpg)
Outsourcing: Pros and Cons
Benefits:
  No ramp-up costs (both time and money)
  Probably higher quality, at least to begin with
  High volume capability
Drawbacks:
  May be more costly if you have underutilized staff time
  No internal capability or experience developed (that is, when the money runs out, so does your chance to do anything more)
  Rare items may require in-house digitization
![Page 30: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/30.jpg)
Outsourcing: How
Write an RFQ (Request for Quote) outlining:
  Type and amount of material being digitized
  Quality requirements
  Volume per unit of time requirements
For RFQ guidance and samples, see RLG Tools for Digital Imaging: www.rlg.org/preserv/RLGtools.html
![Page 31: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/31.jpg)
Digital Image Work Flow
Original: TIFF or PCD, 10-100+ MB, RGB color space, stored offline
Edit: rotate, crop, retouch, adjust brightness/contrast
Derive: resize, sharpen
JPEG: 100K, RGB color space, stored online
GIF: 10K, indexed color space, stored online
![Page 32: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/32.jpg)
Editing Images
Rotating
Cropping
Retouching
Adjusting
Resizing
Sharpening
Saving
![Page 33: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/33.jpg)
Image Editing Demonstration
![Page 34: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/34.jpg)
Conversion to Text
Optical Character Recognition (OCR) software is required (Caere OmniPage Pro, Xerox TextBridge, etc.)
The quality and typography of the originals are key
If OCR accuracy falls below 99.5%, it is less expensive to have the text re-keyed offshore than to correct it
For some applications, uncorrected text is sufficient
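One way to apply the 99.5% rule of thumb is to hand-correct a sample page and compare it with the raw OCR output. The sketch below assumes "accuracy" means character accuracy (one minus edit distance divided by the length of the corrected text). The slide does not say how accuracy is measured, so that definition, and the names `character_accuracy` and `should_rekey`, are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """Fraction of characters correct, measured against a hand-corrected sample."""
    if not ground_truth:
        raise ValueError("ground truth sample must not be empty")
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1 - errors / len(ground_truth))


REKEY_THRESHOLD = 0.995  # the 99.5% figure from the slide


def should_rekey(accuracy, threshold=REKEY_THRESHOLD):
    """True when accuracy is low enough that re-keying is likely the cheaper route."""
    return accuracy < threshold
```

For example, 10 wrong characters in a 1,000-character sample gives 99% accuracy, which falls under the threshold. The pure-Python distance function is quadratic in text length, so it suits page-sized samples rather than whole books.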
![Page 35: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/35.jpg)
Imaging Best Practices
General guidelines for archival versions:
  Photos, illustrations, maps, etc.: 300-600 dpi, 24-36 bit color
  B/W text documents: 300-600 dpi, 8 bit grayscale
  Negatives and slides: 2000-4000 pixels in longest dimension; 24-36 bit color for color, 8 bit grayscale for B/W
![Page 36: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/36.jpg)
Imaging Best Practices
“The key to image quality is not to capture at the highest resolution or bit depth possible, but to match the conversion process to the informational content of the original, and to scan at that level--no more, no less.” — Moving Theory Into Practice
![Page 37: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/37.jpg)
Metadata: Types
Structured description of an object or collection of objects
Three basic types:
  descriptive - e.g., title, creator, subject - used for discovery
  administrative - e.g., resolution, bit depth - used for managing the collection
  structural - e.g., table of contents page, page 34, etc. - used for navigation
![Page 38: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/38.jpg)
Metadata: Appropriate Level
Collection-level access:
  Discovery metadata describes the collection
  Example: archival finding aid encoded in SGML; see http://www.oac.cdlib.org/
Item-level access:
  Discovery metadata describes the item
  Example: individual metadata records for each item; see http://jarda.cdlib.org/cgi-bin/imagesearch.pl
![Page 39: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/39.jpg)
Collection Level Access
[Diagram components: Search Interface (library catalog or dedicated), Individual Finding Aids, Images]
![Page 40: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/40.jpg)
![Page 41: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/41.jpg)
![Page 42: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/42.jpg)
![Page 43: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/43.jpg)
Item Level Access
[Diagram components: Search Interface (dedicated), Finding Aids, Images]
![Page 44: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/44.jpg)
jarda.cdlib.org/search.html
![Page 45: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/45.jpg)
Metadata: Granularity
<name>William Randolph Hearst</name>
<name>
  <first>William</first>
  <middle>Randolph</middle>
  <last>Hearst</last>
</name>
Consider all uses for the metadata
Design for the most granular use
Store it in a machine-parseable format
![Page 46: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/46.jpg)
Metadata: Qualification
<name role="creator">William Randolph Hearst</name>
<subject scheme="LCSH">Builder -- Castles -- Southern California</subject>
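Qualifiers pay off when software needs to pick out one kind of name or subject from a record. Here is a rough sketch using Python's standard `xml.etree.ElementTree`. The `<record>` wrapper element and the helper `values_where` are hypothetical, added only so the two elements from the slide can be parsed together.

```python
import xml.etree.ElementTree as ET

# The <record> wrapper is hypothetical; the two child elements are the slide's examples.
RECORD = """<record>
  <name role="creator">William Randolph Hearst</name>
  <subject scheme="LCSH">Builder -- Castles -- Southern California</subject>
</record>"""


def values_where(root, tag, attribute, value):
    """Return the text of every <tag> child whose attribute equals value."""
    path = f"{tag}[@{attribute}='{value}']"
    return [element.text.strip() for element in root.findall(path)]


root = ET.fromstring(RECORD)
creators = values_where(root, "name", "role", "creator")
lcsh_subjects = values_where(root, "subject", "scheme", "LCSH")
print(creators)
print(lcsh_subjects)
```

Without the `role` and `scheme` attributes, a program could not tell a creator from any other named person, or an LCSH heading from a free-text keyword.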
![Page 47: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/47.jpg)
Metadata: Machine Parseability
The ability to pull apart and reconstruct metadata via software
For example, this:
<name>
  <first>William</first>
  <middle>Randolph</middle>
  <last>Hearst</last>
</name>
Can easily become this:
<DC.creator>Hearst, William Randolph</DC.creator>
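The conversion on this slide, from a granular name to an inverted Dublin Core creator string, takes only a few lines of code. This is an illustrative sketch with Python's standard `xml.etree.ElementTree`. The function name `to_dc_creator` is made up, and the "Last, First Middle" output pattern is taken from the slide's example.

```python
import xml.etree.ElementTree as ET

GRANULAR = (
    "<name><first>William</first><middle>Randolph</middle>"
    "<last>Hearst</last></name>"
)


def to_dc_creator(name_xml):
    """Rebuild a granular <name> element as an inverted <DC.creator> element."""
    name = ET.fromstring(name_xml)
    first = name.findtext("first", default="").strip()
    middle = name.findtext("middle", default="").strip()
    last = name.findtext("last", default="").strip()
    given = " ".join(part for part in (first, middle) if part)
    creator = ET.Element("DC.creator")
    creator.text = f"{last}, {given}" if given else last
    return ET.tostring(creator, encoding="unicode")


print(to_dc_creator(GRANULAR))
# <DC.creator>Hearst, William Randolph</DC.creator>
```

Going the other way, from "William Randolph Hearst" as a single string to separate first, middle, and last fields, cannot be done reliably by software. That is the argument for storing the most granular form.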
![Page 48: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/48.jpg)
Metadata: Standards
Collection level:
  Encoded Archival Description (EAD) - lcweb.loc.gov/ead/
Item level:
  MARC
  Dublin Core - purl.org/DC/
  MODS - www.loc.gov/standards/mods/
Harvesting:
  Open Archives Initiative - www.openarchives.org
![Page 49: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/49.jpg)
Access Systems
Exhibit
Browse
Search
![Page 50: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/50.jpg)
Access Systems: Exhibit
Goals:
  Inviting
  Easy to navigate
  Highlight selected parts of a collection
  Teach
Requirements:
  Great graphic design
  Informative and succinct commentary
  Interesting subject matter
![Page 51: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/51.jpg)
![Page 52: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/52.jpg)
![Page 53: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/53.jpg)
Access Systems: Browse
Goals:
  Provide intriguing and interesting paths into and throughout a collection
  Give a broad sense of a collection, but not necessarily show everything
Requirements:
  Logical browse paths
  May have multiple paths to the same items (e.g., time, geography, subject)
![Page 54: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/54.jpg)
![Page 55: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/55.jpg)
![Page 56: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/56.jpg)
![Page 57: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/57.jpg)
Access Systems: Search
Goals:
  To provide post-coordinate access to all items in a collection relevant to a particular query
  To provide good methods to create a search as well as refine or alter the display as required
Requirements:
  Good search software (database or indexing software)
  Good metadata (minimum is probably a title or caption for each item)
  Good interface (options for navigation, search refinement, etc.)
![Page 58: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/58.jpg)
![Page 59: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/59.jpg)
Skills Required of Staff
Imaging
OCR
Markup languages (HTML, XML)
Cataloging & metadata
Indexing and database technology
User interface design
Programming
Web technology
Project management
![Page 60: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/60.jpg)
How Does Digital Data Die?
Let me count the ways…
New replaces old
Death of a sponsor
Sponsor loses interest
Lost functionality
Format rot
Media format obsolescence
Content format obsolescence
Disaster
![Page 61: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/61.jpg)
Preserving Digital Content
No preservation format
Digital preservation techniques:
  Print (on acid free paper!)
  Store
  Refresh
  Encapsulate
  Emulate
  Proliferate (Lots Of Copies Keep Stuff Safe, or LOCKSS)
![Page 62: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/62.jpg)
Preserving Digital Content
Institutional commitment
Consortial agreements
Cooperatively funded central repositories
Preservation Open Market
![Page 63: Down and Dirty Digitization: Everything you need to know about putting content online](https://reader036.fdocuments.net/reader036/viewer/2022081519/56813bf4550346895da53662/html5/thumbnails/63.jpg)
The Best Defense
What will ensure that material will not be preserved?
  Ignorance of its existence
  Ignorance of its worth
  Inability or unwillingness to pay for its preservation
Access helps with all of these problems