Post on 20-Mar-2016
CSc 461/561
CSc 461/561 Multimedia Systems Part B: 2. Lossy Compression
Summary
(1) Why is lossy compression possible?
(2) Distortion measure
(3) Quantization
(4) Transformation
(5) Introduction to JPEG-Part I
(6) Introduction to MPEG-Part I
1. Why is lossy compression possible?
– some information is more important than other information to human perception
– keep the important part and discard the rest
[Figure: the original image and three compressed versions, with compression ratios 12.3, 7.7, and 33.9]
2. Distortion measure
• Rate
  – number of bits per source symbol
• Distortion
  – one measure: mean square error (MSE)
  – x: original value; y: reconstructed value
  – $\mathrm{MSE} = [(x_1-y_1)^2 + (x_2-y_2)^2 + \dots + (x_N-y_N)^2]/N$
• Rate vs distortion
  – lower rate, higher distortion
[Figure: rate vs. distortion curve, with two operating points A and B]
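A minimal sketch of the MSE formula above. The function name `mse` and the sample values are illustrative only; they do not come from the slides.

```python
def mse(original, reconstructed):
    """Mean square error between two equal-length, non-empty sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(original)
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n


# Illustrative values: four source samples and a lossy reconstruction.
x = [10, 12, 15, 11]
y = [10, 12, 16, 12]
print(mse(x, y))  # (0 + 0 + 1 + 1) / 4 = 0.5
```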
3. Quantization (1)
• Quantization (recall audio A/D)
  – use a discrete value to represent a value range
  – information loss!
• The smaller the range, the less the distortion
  – granular distortion
• Quantization steps
  – uniform: all ranges have the same size
  – non-uniform: otherwise
3. Uniform quantization (2)
• Quantization step: uniform
• Two constructions: midrise, midtread
[Figure: Uniform Midrise Quantizer. Input boundaries at 0, ±∆, ±2∆, ±3∆; reconstruction levels ±0.5∆, ±1.5∆, ±2.5∆, ±3.5∆]
[Figure: Uniform Midtread Quantizer. Input boundaries at ±0.5∆, ±1.5∆, ±2.5∆; reconstruction levels 0, ±∆, ±2∆, ±3∆]
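The two constructions can be sketched as follows. This is a minimal illustration, assuming an unbounded input range (no clipping to a maximum level); the function names `midrise` and `midtread` are chosen here for illustration.

```python
import math


def midrise(x, delta):
    """Uniform midrise quantizer: outputs are odd multiples of delta/2 (no zero level)."""
    return delta * (math.floor(x / delta) + 0.5)


def midtread(x, delta):
    """Uniform midtread quantizer: outputs are integer multiples of delta (zero is a level)."""
    return delta * math.floor(x / delta + 0.5)


delta = 1.0
for x in (-1.2, -0.3, 0.3, 1.2):
    print(x, midrise(x, delta), midtread(x, delta))
```

Small inputs around zero map to ±0.5∆ with midrise but to 0 with midtread, which is the main practical difference between the two.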
3. Signal-to-quantization-noise ratio (3)
• Quantization
  – n bits; $2^n$ steps for $[-X_{max}, X_{max}]$
  – step size: $\Delta = 2X_{max}/2^n$
  – granular distortion: $\sigma_q^2 = \int_{-\Delta/2}^{\Delta/2} (x-0)^2 \frac{1}{\Delta}\,dx = \frac{1}{12}\Delta^2$
• SQNR in dB
  – $10\log_{10}(\text{signal\_energy}/\text{noise\_energy}) = 10\log_{10}\frac{(2X_{max})^2/12}{\Delta^2/12} = 20n\log_{10}2$
• One more bit adds 6 dB to SQNR
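The 6 dB-per-bit rule can be checked numerically. The sketch below assumes a signal uniformly distributed over [-Xmax, Xmax) (approximated by evenly spaced samples) and a midrise quantizer; the function names and the `samples_per_step` parameter are illustrative choices.

```python
import math


def theoretical_sqnr_db(n_bits):
    """SQNR from the derivation above: 20 * n * log10(2), about 6.02 dB per bit."""
    return 20 * n_bits * math.log10(2)


def measured_sqnr_db(n_bits, x_max=1.0, samples_per_step=200):
    """Quantize evenly spaced samples of [-x_max, x_max) and measure SQNR in dB."""
    steps = 2 ** n_bits
    delta = 2 * x_max / steps              # step size: 2*Xmax / 2^n
    total = steps * samples_per_step
    signal = noise = 0.0
    for k in range(total):
        x = -x_max + (k + 0.5) * (2 * x_max / total)
        q = delta * (math.floor(x / delta) + 0.5)   # midrise reconstruction
        signal += x * x
        noise += (x - q) ** 2
    return 10 * math.log10(signal / noise)


for n in (4, 8):
    print(n, round(theoretical_sqnr_db(n), 2), round(measured_sqnr_db(n), 2))
```

Under these assumptions the measured value should track the formula closely; signals that are not uniformly distributed will generally give a different constant offset.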
3. Non-uniform quantization (4)
• Recall µ-law or A-law voice compander
• How to choose quantization steps?
  – $\int_{x_i}^{x_{i+1}} f(x)\,dx = 1/2^n$
[Figure: density f(x) with one quantization interval [x_i, x_{i+1}] marked, for uniform (left) and non-uniform (right) quantization]
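One way to read the rule above is that every interval should hold the same probability mass, 1/2^n. A minimal sketch, assuming the density f(x) is only known through a list of samples, picks the boundaries as empirical quantiles. The function name and the sample data are hypothetical.

```python
def equal_probability_boundaries(samples, n_bits):
    """Return the 2**n_bits - 1 interior boundaries that split the
    sorted samples into intervals holding (about) equally many samples."""
    ordered = sorted(samples)
    levels = 2 ** n_bits
    if len(ordered) < levels:
        raise ValueError("need at least one sample per interval")
    return [ordered[(i * len(ordered)) // levels] for i in range(1, levels)]


# Hypothetical skewed data: many small values, few large ones.
samples = [k * k for k in range(16)]
boundaries = equal_probability_boundaries(samples, n_bits=2)
print(boundaries)  # [16, 64, 144]
```

The intervals are narrow where samples are dense and wide where they are sparse, which is the point of non-uniform quantization.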
3. Non-uniform quantization: more (5)
• How to represent a range?
  – $\int_{x_i}^{y_i} f(x)\,dx = 1/2^{n+1}$
  – when uniform: $y_i = (x_i + x_{i+1})/2$
[Figure: density f(x) with interval [x_i, x_{i+1}] and its representative value y_i marked, for uniform (left) and non-uniform (right) quantization]
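To make the two integral conditions concrete, the sketch below assumes an exponential density f(x) = exp(-x) for x ≥ 0, chosen only because its CDF inverts in closed form. Boundaries x_i sit at cumulative probability i/2^n, and each representative y_i sits 1/2^(n+1) of probability mass past x_i. All function names are illustrative.

```python
import math


def cdf(x):
    """CDF of the assumed density f(x) = exp(-x), x >= 0."""
    return 1.0 - math.exp(-x)


def inverse_cdf(p):
    """Value x with cdf(x) = p; infinity when p = 1."""
    return math.inf if p >= 1.0 else math.log(1.0 / (1.0 - p))


def design_quantizer(n_bits):
    """Return (boundaries, representatives) for 2**n_bits equal-probability intervals."""
    levels = 2 ** n_bits
    boundaries = [inverse_cdf(i / levels) for i in range(levels + 1)]
    representatives = [inverse_cdf((2 * i + 1) / (2 * levels)) for i in range(levels)]
    return boundaries, representatives


x, y = design_quantizer(n_bits=2)
for i in range(len(y)):
    print(f"[{x[i]:.3f}, {x[i + 1]:.3f}) -> {y[i]:.3f}")
```

Because the density is not flat, each y_i lies closer to x_i than to x_{i+1}, unlike the midpoint used in the uniform case.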
4. Transformation (1)
• Transformation
  – represent information in another space
    • identify and remove (hard-to-remove) correlation, i.e., redundancy, in the original space
    • information loss!
  – e.g., time/space => frequency (FFT)
• Inverse transformation
  – represent the info back in the original space
4. Discrete Cosine Transform (2)
• Recall: a wave is made up of many waves
• “Any signal can be expressed as a sum of multiple signals that are sine or cosine waveforms at various amplitudes and frequencies.”
• Cosine transform: using cosine waveforms
• DCT: integer indexes
  – widely used in image compression (e.g., JPEG)
4. DCT: more (3)
• 2-D DCT (8x8)
• Inverse 2-D DCT (IDCT)
• Scaling factor: C(x) = 1/sqrt(2) when x = 0; C(x) = 1 otherwise
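The transform formulas themselves did not survive on this slide. The sketch below assumes the standard 8x8 form used in JPEG, F(u,v) = (1/4) C(u) C(v) Σ_x Σ_y f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), with the matching inverse. It is written for clarity rather than speed, and the function names are illustrative.

```python
import math

N = 8  # block size used on the slide


def c(k):
    """Scaling factor from the slide: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, k):
    """Cosine basis value for sample index i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (spatial domain -> frequency domain)."""
    return [
        [
            0.25 * c(u) * c(v) * sum(
                block[x][y] * basis(x, u) * basis(y, v)
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def idct2(coeffs):
    """Inverse 2-D DCT of an 8x8 block (frequency domain -> spatial domain)."""
    return [
        [
            0.25 * sum(
                c(u) * c(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                for u in range(N) for v in range(N)
            )
            for y in range(N)
        ]
        for x in range(N)
    ]


flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # for a flat block, only the DC coefficient is non-zero
```

A perfectly flat block puts all of its energy into the DC coefficient at position (0, 0), which is why smooth image regions compress well after the DCT.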
4. DCT: examples (4)
[Figure: original values of an 8x8 block (in spatial domain) and the corresponding DCT coefficients (in frequency domain), with the DC component marked]
5. Introduction to JPEG-Part I (1)
• Joint Photographic Experts Group (JPEG)
  – ISO standard (1992)
  – widely used (.jpeg, .jpe, .jpg; C/R: 10~20)
• The family of JPEGs
  – lossless JPEG: prediction-based compression
  – lossy JPEG: DCT-based compression
  – M-JPEG: motion JPEG
  – JPEG2000: discrete wavelet transform; new!
5. Introduction to JPEG-Part I (2)
JPEG compression guidelines
  – Brightness vs color sensitivity
    • RGB => YUV/YIQ
    • chroma subsampling (4:2:0)
  – Spatial correlation among nearby pixels
    • slice an image into 8x8 blocks (bad for text)
  – Remove redundancy in frequency domain
    • discrete cosine transform (DCT)
    • coarse quantization for high freq coefficients
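The first guideline can be sketched as a color conversion followed by chroma subsampling. The conversion below uses the common analog YUV weights (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B − Y), V = 0.877(R − Y)); this is an assumption, since the slides do not give coefficients. Averaging each 2x2 block is one simple way to realize 4:2:0, and real encoders may filter differently. Function names are illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using the assumed weights above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 block.
    Assumes the plane has even width and height."""
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


print(rgb_to_yuv(128, 128, 128))        # a gray pixel carries (almost) no chroma
print(subsample_420([[1, 3], [5, 7]]))  # [[4.0]]
```

After 4:2:0 subsampling each chroma plane has a quarter of its original samples, while the brightness plane Y is kept at full resolution.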
5. Introduction to JPEG-Part I (3)
• Sequential mode
• Progressive mode
  – low quality first, then differential data added
    • DC first, then AC; or MSB first, then LSB
• Hierarchical mode
  – lowest resolution first and then higher resolutions
• Lossless mode
  – prediction and entropy encoding
5. Introduction to JPEG-Part I (4)
• We will revisit the topic later.
6. Introduction to MPEG-Part I (1)
• MPEG-1 (1991): VCD (VCR+CD quality)
  – 352x240, 1.2 Mbps video CBR, 256 Kbps audio
  – progressive scan only (1x CD-ROM)
• MPEG-1 video compression
  – similar to H.261, with a few differences
    • more formats, flexible slices, quantization table
  – I-frame: JPEG-like compression
  – P-frame: prediction-based; B-frame
6. Introduction to MPEG-Part I (2) MPEG-1: more
• Bi-directional search
  – search both previous and next frames for similar macro-blocks
• MPEG-1 GOP
  – I-frame, P-frame, B-frame
    • display order: IBBPBBPBBPBBPBBI (M=3, N=15)
    • coding order: IPBBPBBPBBPBBIBB; timestamps
  – D-frame: for search through the video, DC only
[Figure: frames 1 to 9 in display order: I B B P B B P B B]
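The display and coding orders above are related by a simple rule: a B-frame needs both its previous and its next reference frame, so each I- or P-frame is coded before the B-frames that precede it in display order. A minimal sketch of that reordering follows; the function name is illustrative, and emitting trailing B-frames at the end is an assumption made here.

```python
def display_to_coding_order(display):
    """Reorder a display-order frame-type string (e.g. 'IBBP...') into coding order."""
    coded = []
    pending_b = []                      # B-frames waiting for their next reference
    for frame in display:
        if frame == "B":
            pending_b.append(frame)
        else:                           # I or P: a reference frame
            coded.append(frame)         # code the reference first...
            coded.extend(pending_b)     # ...then the B-frames that depend on it
            pending_b = []
    coded.extend(pending_b)             # assumption: flush any trailing B-frames
    return "".join(coded)


print(display_to_coding_order("IBBPBBPBBPBBPBBI"))  # IPBBPBBPBBPBBIBB
```

Since the coded order differs from the display order, each frame carries a timestamp so the decoder can present frames in the right sequence.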
6. Introduction to MPEG-Part I (3) MPEG-2
• MPEG-2 (1994): DVD, HDTV, etc.
  – also adopted as ITU-T H.262
  – many video formats and data rates; better audio
    • profiles: simple (4:2:0, I/P), main (+B), SNR (+variable quality), spatial (+variable resolution), high (+4:2:2)
    • levels: low (352x288), main (720x576), high 1440 (1440x1152), high (1920x1152)
  – support interlaced video (broadcasting!)
6. Introduction to MPEG-Part I (4) MPEG-2 scalability
• Layered encoding
  – base layer: independent for basic quality
  – enhancement layer: dependent on the base layer
• E.g., SNR scalability
  – base: low SQNR (coarse quantization)
  – enhance: high SQNR (fine quantization of the difference, actual - base)
• E.g., spatial scalability
  – base: low resolution; enhance: high resolution
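SNR scalability can be illustrated on single sample values. In this sketch the base layer quantizes coarsely and the enhancement layer finely quantizes what the base layer missed. The step sizes and sample values are hypothetical, and a real MPEG-2 encoder applies this to DCT coefficients rather than raw numbers.

```python
import math


def quantize(x, step):
    """Uniform midtread quantizer: nearest integer multiple of step."""
    return step * math.floor(x / step + 0.5)


def encode_layers(x, coarse_step, fine_step):
    """Return (base, enhancement): coarse value plus finely quantized residual."""
    base = quantize(x, coarse_step)
    enhancement = quantize(x - base, fine_step)   # fine Q on (actual - base)
    return base, enhancement


values = [37.0, -5.0, 100.0]  # hypothetical sample values
for x in values:
    base, enh = encode_layers(x, coarse_step=16, fine_step=2)
    print(x, "base only:", base, "base + enhancement:", base + enh)
```

A decoder that receives only the base layer still gets a usable, lower-quality value; adding the enhancement layer reduces the error to at most half the fine step.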
6. Introduction to MPEG-Part I (5) MPEG-4
• MPEG-4 (1999): content-based, object-oriented
  – based on H.263, initially for low bit-rate apps
  – video sequence: a collection of media objects
    • objects: still image, moving object, audio, etc.
    • how to decompose is NOT specified (encoder)
  – VOP: video object plane
    • GOV: I-VOP, P-VOP, B-VOP
    • VOP is divided into many macro-blocks
  – motion estimation: bounding box; padding
6. Introduction to MPEG-Part I (5): MPEG-4: object oriented
6. Introduction to MPEG-Part I (6) MPEG-4: more
• Fine grain scalability
  – spatial scalability
  – temporal scalability
  – quality scalability
• MPEG-4 audio
  – general audio (2~64 Kbps)
  – speech (2~4 Kbps: HVXC; 4~24 Kbps: CELP)
  – synthesized (e.g., MIDI, TTS)
6. Introduction to MPEG-Part I (7)
• We will revisit the topic later.