1 CS 502: Computing Methods for Digital Libraries Lecture 9 Conversion to Digital Formats Anne Kenney, Cornell University Library

CS 502: Computing Methods for Digital Libraries

Lecture 9

Conversion to Digital Formats

Anne Kenney, Cornell University Library

What are Digital Images?

• Electronic snapshots taken of a scene or scanned from documents

• sampled and mapped as a grid of dots or picture elements (pixels)

• pixel assigned a tonal value (black, white, grays, colors), represented in binary code

• code stored or reduced (compressed)

• read and interpreted to create analog version

Four Scanning Methods

• Bitonal

• Grayscale

• Color

• Special treatment

Digital Image Quality is Governed By:

• resolution and threshold

• bit depth

• image enhancement

• color management

• compression

• system performance

• operator judgment and care

Resolution

• determined by number of pixels used to represent the image

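• expressed in dots per inch (dpi), a linear measure; the number of pixels per square inch is the dpi squared

• increasing resolution increases the level of detail captured and increases file size geometrically: doubling the dpi quadruples the pixel count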
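
The relationship between resolution and file size can be sketched with a little arithmetic. This is a minimal illustration, not part of the lecture: the function name `uncompressed_size_bytes` and the 8 x 10 inch page are assumptions chosen for the example, and file-format headers are ignored.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed image size in bytes (hypothetical helper).

    Ignores file-format headers and any row padding.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed example page: 8 x 10 inches, bitonal (1 bit per pixel).
for dpi in (200, 300, 600):
    size = uncompressed_size_bytes(8, 10, dpi, bits_per_pixel=1)
    print(f"{dpi} dpi: {size:,} bytes")
```

Going from 300 dpi to 600 dpi doubles the linear resolution and multiplies the byte count by four.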

Effects of Resolution

[Figure: sample scans at 600 dpi, 300 dpi, and 200 dpi]

Threshold Setting in Bitonal Scanning

defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
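
A threshold of this kind can be sketched in a few lines. The sketch assumes that 0 is black and 255 is white, and that values at or above the threshold become white; scanner software may use the opposite convention. The name `to_bitonal` is hypothetical.

```python
def to_bitonal(gray_rows, threshold):
    """Map 8-bit gray values to 0 (black) or 1 (white).

    Assumption: values at or above `threshold` are treated as white.
    """
    return [[1 if value >= threshold else 0 for value in row]
            for row in gray_rows]


sample = [[30, 70, 110, 200]]      # one row of gray values
print(to_bitonal(sample, 100))     # [[0, 0, 1, 1]]
print(to_bitonal(sample, 60))      # [[0, 1, 1, 1]]
```

Under this convention, lowering the threshold turns more mid-gray pixels white, which thins dark strokes; raising it does the reverse.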

Effects of Threshold

[Figure: sample bitonal scans at threshold = 100 and threshold = 60]

Bit Depth

• number of bits used to represent each pixel, typically 8 bits or more per channel

• representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color

example: 8-bit grayscale pixel

00000000 = black

11111111 = white
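
The level counts follow directly from the number of bits. The sketch below is illustrative only: `tonal_levels` and `requantize` are hypothetical helper names, and the requantization rule (equal-width bins scaled back to 0-255) is one simple choice among several.

```python
def tonal_levels(bits):
    """Number of distinct values representable with `bits` bits."""
    return 2 ** bits


def requantize(value_8bit, bits):
    """Reduce an 8-bit gray value to `bits` bits, then scale it back
    onto 0-255 so the coarser result can be compared with the original."""
    levels = tonal_levels(bits)
    index = value_8bit * levels // 256    # which of the `levels` bins
    return index * 255 // (levels - 1)    # bin mapped back onto 0..255


print(tonal_levels(8))      # 256 grayscale levels
print(tonal_levels(24))     # 16777216 colors (three 8-bit channels)
print(requantize(128, 3))   # a mid-gray value stored with only 3 bits
```

With 3 bits there are only 8 gray levels, so many distinct 8-bit values collapse onto the same output value.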

Bit Depth

• increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size

• affects resolution requirements

Effects of Grayscale on Image Quality

[Figure: sample image at 3-bit gray and at 8-bit gray]

Image Enhancement

• can be used to improve image capture

• use raises concerns about fidelity and authenticity

Effects of Filters

[Figure: sample scans with no filters used and with maximum enhancement]

Image Editing

Compression

• reduces file size for processing, storage, transmission, and display

• image quality may be affected by the compression techniques used and the level of compression applied

Compression Variables

• lossless versus lossy compression

• proprietary vs. open schemes

• level of industry support

• bitonal vs. gray/color

Common Compression Schemes

• bitonal

– ITU Group 4: lossless

– JBIG (ISO 11544): lossless

– CPC: lossy

– DigiPaper

• grayscale/color

– LZW: lossless

– JPEG: lossy

– Kodak Image Pac: “visually lossless”

– fractal and wavelet compression

Effects of JPEG Compression

[Figure: 300 dpi, 8-bit grayscale image as uncompressed TIFF and with JPEG 18.5:1 compression]

Compression Observations

• the richer the file, the more efficient and sustainable the compression

• the more complex the image, the poorer the compression
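
The second observation can be demonstrated with a rough experiment. This sketch uses Python's `zlib` (DEFLATE) purely as a stand-in lossless compressor; it is not one of the schemes named in the lecture, and the two byte strings are synthetic data, not real scans.

```python
import random
import zlib


def lossless_ratio(data: bytes) -> float:
    """Compression ratio (original size / compressed size) using zlib."""
    return len(data) / len(zlib.compress(data, 9))


rng = random.Random(0)   # fixed seed so the experiment is repeatable
n = 100_000

flat_page = bytes([255]) * n                                # uniform white
noisy_scene = bytes(rng.randrange(256) for _ in range(n))   # no regularity

print(f"uniform data: {lossless_ratio(flat_page):.1f}:1")
print(f"noisy data:   {lossless_ratio(noisy_scene):.2f}:1")
```

The uniform data compresses by a very large factor, while the noisy data barely compresses at all, which is the extreme case of a complex image.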

Equipment used and its performance over time

• scanners offer wide range of capabilities to capture detail, dynamic range, and color

• scanners with same stated functionality can produce different results

• calibration, age of equipment, and environment affect quality

Equipment used and its performance over time

• attributes and capabilities of monitor and/or printer are also factors

• assess quality visually and computationally

– use targets

– control QC environment

– increasing availability of software to assess resolution, tone, color, artifacts

Image Capture:

Create digital objects rich enough to be useful over time in the most cost-effective manner.

How to determine what’s good enough?

• Connoisseurship of document attributes

• Objective characterizations

• Translation between analog and digital

– measurement to scanning requirement to corresponding image metrics

– e.g., detail size → resolution → MTF

– tonal range → bit depth → signal-to-noise ratio

Case Study

• Brittle Books--printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics

• 600 dpi 1-bit capture adequately preserves informational content of text-based materials

Ensuring Full Informational Capture: “No More, No Less”

[Figure: graph relating cost to image quality and utility, with the desired point of capture marked]

Create One Scan To Serve Multiple Uses

• Derive alternative formats/approaches to meet current and future information needs

• Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost

• Understand technical links affecting presentation and utility of derivatives

User Requirements

• completeness

• legibility

• speed of delivery

• “cooked” files

Derivatives from a Digital Master

• the richer the image, the better the derivative

– a derivative from a rich file is superior in quality to one from a poorer scan

– the richer the image, the better the image processing

[Figure: fitting a scanned document to a monitor]

• monitor: 800 x 600 pixels

• document: 8” x 10”, 200 dpi (1,600 x 2,000 pixels)

• document at 60 dpi: 480 x 600 pixels

• document at 100 dpi: 800 x 1,000 pixels
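
The numbers above can be reproduced with simple arithmetic. This is a sketch under the assumption that the derivative is produced by uniform scaling; the function names are hypothetical.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a document scanned or scaled to `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_to_fit(width_in, height_in, screen_w, screen_h):
    """Largest dpi at which the whole document fits on the screen."""
    return min(screen_w / width_in, screen_h / height_in)


def dpi_to_fit_width(width_in, screen_w):
    """Largest dpi at which the document's width fits (height may scroll)."""
    return screen_w / width_in


print(pixel_size(8, 10, 200))          # master scan
print(dpi_to_fit(8, 10, 800, 600))     # whole page visible
print(dpi_to_fit_width(8, 800))        # full width visible
```

On an 800 x 600 monitor the whole 8” x 10” page fits only at 60 dpi, while 100 dpi fills the width and requires vertical scrolling.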

Compression/File Format Comparison for Derivative Files

[Figure: sample derivative as TIFF, uncompressed; GIF, compressed 6:1 (NARA); JPEG, compressed 20:1 (LC)]

Alternatives for Displaying Oversize Images

• File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix

• User tools for representing scale (Blake Project ImageSizer, a Java applet) and for improving image quality

Recommendations Coalescing

• Intent of conversion drives decisions

– issues of access considered at conversion

– notion of long-term utility and cross-institutional resources gaining ground

• Access images will change with:

– changing user needs and capabilities

– changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines