Transform Coding 2005
Transcript of Transform Coding 2005
-
7/27/2019 Transform Coding 2005
1/80
Lecture notes of Image Compression and Video
Compression series 2005
2. Transform Coding
Prof. Amar Mukherjee
Weifeng Sun
-
7/27/2019 Transform Coding 2005
2/80
#2
Topics
z Introduction to Image Compression
z Transform Coding
z Subband Coding, Filter Banks
z Haar Wavelet Transformz SPIHT, EZW, JPEG 2000
z
Motion Compensationz Wireless Video Compression
-
7/27/2019 Transform Coding 2005
3/80
#3
Transform Codingz Why transform Coding?
z Purpose of transformation is to convert the data intoa form where compression is easier. Thistransformation will transform the pixels which arecorrelated into a representation where they aredecorrelated. The new values are usually smaller
on average than the original values. The net effectis to reduce the redundancy of representation.
z For lossy compression, the transformcoefficients can now be quantized according to
their statistical properties, producing a muchcompressed representation of the originalimage data.
-
7/27/2019 Transform Coding 2005
4/80
#4
Transform Coding Block Diagram
z Transmitter
z Receiver
Segment into
n*n BlocksForward
Transform
Quantization
and Coder
Original Image f(j,k) F(u,v) F*(uv)
Channel
Combine n*nBlocks InverseTransform Decoder
Reconstructed Image f(j,k) F(u,v) F*(uv)
-
7/27/2019 Transform Coding 2005
5/80
#5
How Transform Coders Work
z Divide the image into 1x2 blocks
z Typical transforms are 8x8 or 16x16
x1
x1
x2
x2
-
7/27/2019 Transform Coding 2005
6/80
#6
Joint Probability Distribution
z Observe the Joint Probability Distribution
or the Joint Histogram.
x1x2
Probability
-
7/27/2019 Transform Coding 2005
7/80
#7
Pixel Correlation in Image[Amar]
z Rotate 45o clockwise
Source Image: Amar
Before Rotation After Rotation
=
=
2
1
2
1
45cos45sin
45sin45cos
X
X
Y
YY
oo
oo
-
7/27/2019 Transform Coding 2005
8/80
#8
Pixel Correlation Map in [Amar]
-- coordinate distribution
z Upper:Before Rotation
z Lower:After Rotation
z Notice thevariance of Y2 issmaller than thevariance of X2.
z Compression:apply entropycoder on Y2.
-
7/27/2019 Transform Coding 2005
9/80
#9
Pixel Correlation in Image[Lenna]
z Lets look at another example
Before Rotation After Rotation
Source Image: Lenna
=
=
2
1
2
1
45cos45sin
45sin45cos
X
X
Y
YY
oo
oo
-
7/27/2019 Transform Coding 2005
10/80
#10
Pixel Correlation Map in [Lenna]
-- coordinate distributionz Upper:
Before Rotation
z Lower:After Rotation
z Notice thevariance of Y2 issmaller than thevariance of X2.
z Compression:apply entropycoder on Y2.
-
7/27/2019 Transform Coding 2005
11/80
#11
Rotation Matrix
z Rotated 45 degrees clockwise
z Rotation matrix A
=
==
=
2
1
2
1
2
1
2222
2222
45cos45sin
45sin45cos
X
X
X
XAX
Y
YY
oo
oo
=
=
=
11
11
2
2
2222
2222
45cos45sin
45sin45cos
oo
oo
A
-
7/27/2019 Transform Coding 2005
12/80
#12
Orthogonal/orthonormal Matrixz Rotation matrix is orthogonal.
z The dot product of a rowwith itself is nonzero.
z The dot product of different
rows is 0.
z Futhermore, the rotationmatrix is orthonormal.
z The dot product of a row
with itself is 1.
z Example:
=
11
11
2
2A
==
else0
if0 jiAA ji
=
=else0
if1 jiAA ji
-
7/27/2019 Transform Coding 2005
13/80
#13
Reconstruct the Image
z Goal: recover X from Y.
z Since Y = AX, so X = A-1Y
z Because the inverse of an orthonormal matrix
is its transpose, we have A-1 = AT
z So, Y = A-1X = ATX
z We have inverse matrix
=
==11
11
2
2
45cos45sin
45sin45cos1oo
ooT
AA
-
7/27/2019 Transform Coding 2005
14/80
#14
Energy Compaction
z Rotation matrix [A] compacted the energy into Y1.
z Energy is the variance (http://davidmlane.com/hyperstat/A16252.html)
z Given:
z The total variance of X equals to that of Y. It is 41.
z Transformation makes Y2 (0.707) very small.z If we discard min{X}, we have error 42/41 =0.39
z If we discard min{Y}, we have error 0.7072/41 =0.012
z Conclusion: we are more confident to discard min{Y}.
=
=
=
11
11
2
2
2
1
2
1A
X
XA
Y
YY
meantheiswhere)(
1
1 2
1
2 =
=N
i
ix
N
=
=
==
=
0.707
6.364
1
9
2
2
5
4
11
11
2
2AXthen,
5
4X Y
-
7/27/2019 Transform Coding 2005
15/80
#15
Idea of Transform Codingz Transform the input pixels X0,X1,X2,...,Xn-1 into
coefficients Y0,Y1,...,Yn-1 (real values)z The coefficients have the property that most of
them are near zero.
z Most of the energy is compacted into a few
coefficients.
z Scalar quantize the coefficientz This is bit allocation.
z Important coefficients should have morequantization levels.
z Entropy encode the quantization symbols.
-
7/27/2019 Transform Coding 2005
16/80
#16
Forward transform (1D)
z Get the sequence Y from the sequence X.
z Each element of Y is a linear combination ofelements in X.
The element of the matrix are also called the weight of the linear transform,
and they should be independent of the data (except for the KLT transform).
BasisVectors
0 0,0 0, 1 0
1 1,0 1, 1 1
n
n n n n n
Y a a X
Y a a X
=
L
M M O M M
L
=
==1
0
, 1,1,0n
i
iijj njXaY L
AXY =
-
7/27/2019 Transform Coding 2005
17/80
#17
Choosing the Weights of the Basis
Vector
z The general guideline to determine the values
of A is to make Y0 large, while remainingY1,...,Yn-1 to be small.
z The value of the coefficient will be large ifweights a
ij
reinforce the corresponding dataitems Xj. This requires the weights and thedata values to have similar signs. Theconverse is also true: Yi will be small if the
weights and the data values to have dissimilarsigns.
-
7/27/2019 Transform Coding 2005
18/80
#18
Extracting Features of Dataz Thus, the basis vectors should extract distinct
features of the data vectors and must beindependent orthogonal). Note the pattern ofdistribution of +1 and -1 in the matrix. Theyare intended to pick up the low and high
frequency components of data.z Normally, the coefficients decrease in the
order of Y0,Y1,...,Yn-1.
z So, Y is more amenable to compression thanX.
AXY =
-
7/27/2019 Transform Coding 2005
19/80
#19
Energy Preserving (1D)z Another consideration to choose rotation matrix is to conserve
energy.
z For example, we have orthogonal matrix
z Energy before rotation: 42+62+52+22=81
z Energy after rotation: 172+32+(-5)2+12=324
z Energy changed!z Solution: scale W by scale factor. The scaling does not change the fact
that most of the energy is concentrated at the low frequency components.
=
11111111
1111
1111
A
=
25
6
4
X
==
25
3
17
AXY
-
7/27/2019 Transform Coding 2005
20/80
#20
Energy Preserving, Formal Proof
z The sum of the squares
of the transformedsequence is the same
as the sum of the
squares of the originalsequence.
z Most of the energy are
concentrated in the lowfrequency coefficients.
=
=
=
=
==
=
=
1
1
2
1
1
2
)(
)()(
Energy
n
ii
T
TT
TT
T
n
i
T
i
X
XX
XAAXAXAX
AXAX
YYY
Orthonormal
Matrix;See page #12
-
7/27/2019 Transform Coding 2005
21/80
#21
Why we are interested in the
orthonormal matrix?
z Normally, it is
computationally difficultto get the inverse matrix.
z The inverse of the
transformation matrix issimply its transpose.
A-1=AT
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 11 1 1 1 1 1 1 11
1 1 1 1 1 1 1 18
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
A
=
1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 11
1 1 1 1 1 1 1 18
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
TB A A
= = =
-
7/27/2019 Transform Coding 2005
22/80
#22
Two Dimensional Transform
z From input Image I, we get D.
z Given Transform matrix A
z 2D transformation goes as:
z Notice the energy compaction.
=
1111
1111
1111
1111
2
1A
=
9542
6745
6386
9674
D9657
69364425
8764
=
==
75.175.025.125.125.225.025.325.0
75.125.025.375.1
75.375.075.275.22
11111111
1111
1111
2
1
95426745
6386
9674
11111111
1111
1111
2
1TAXAY
-
7/27/2019 Transform Coding 2005
23/80
#23
Two Dimensional Transform
z Because transformation matrix is
orthonormal,z So, we have
z
Forward transform
z Backward transform
1= AAT
TAXAAXAY == 1
YAAYAAXT== 1
-
7/27/2019 Transform Coding 2005
24/80
#24
Linear Separable transformz Two dimensional transform is simplified as two
iterations of one-dimensional transform.
z Column-wise transform and row-wise transform.
1 1
, , , ,
0 0
N N
k l k i i j l j
i j
Y a X a
= =
=
-
7/27/2019 Transform Coding 2005
25/80
#25
Transform and Filteringz Consider the orthonormal
transform.
z If A is used to transform a vector of 2 identicalelements x = [x,x]T, the transformed sequence willindicating the low frequency or the average
value is and the high frequency component is0 because the signal value do not vary.
z If x = [3,1]T or [3,-1]T, the output sequence will beand respectively. Now, the high frequencycomponent has positive value and it is bigger for
[3,-1]T,indicating a much large variation. Thus, thetwo coefficients behave like output of a low-passand a high-pass filters, respectively.
Tx )0,2(
=11
11
2
2A
x2
T)2,22(T
)22,2(
-
7/27/2019 Transform Coding 2005
26/80
#26
Transform and Functional
Approximation
z Transform is a kind of function approximation.
z Image is a data set. Any data set is a function.z Transform is to approximate the image function
by a combination of simpler, well defined
waveforms (basis functions).z Not all basis sets are equal in terms of
compression.
z DCT and Wavelets are computationally easierthan Fourier.
-
7/27/2019 Transform Coding 2005
27/80
#27
Comparison of various transforms
z The KLT is optimal in the sense of decorrelating and energy-packing.
z Walsh-Hadamard Transform is especially easy for implementation.z basis functions are either -1 or +1, only add/sub is necessary.
-
7/27/2019 Transform Coding 2005
28/80
#28
Two-Dimensional Basis Matrixz The outer product of two vectors V1 and V2 is
defined as V1TV2. For example,
z For a matrix A of size n*n, the outer product ofith row and jth column is defined as
[ ]
=
=
cfcecd
bfbebd
afaead
fed
c
b
a
VVT
21
[ ]
=
=
1,1,1,1,0,1,
1,1,1,1,0,1,
1,0,1,0,0,0,
1,1,0,
1,
1,
0,
njnijnijni
njijiji
njijiji
njjj
ni
i
i
ij
aaaaaa
aaaaaa
aaaaaa
aaa
a
a
a
L
MOMML
L
LM
-
7/27/2019 Transform Coding 2005
29/80
#29
Outer Product
z For example, if
z We have:
=
11
11
2
2A
[ ]
=
=
11
11
2
111
2
2
1
1
2
200
[ ]
=
=
1111
2111
22
11
22
01
[ ]
=
=
11
11
2
111
2
2
1
1
2
210
[ ]
=
=
11
11
2
111
2
2
1
1
2
211
-
7/27/2019 Transform Coding 2005
30/80
#30
Outer Product (2)
z We have shown earlier that
z Consider X to be a 2*2 matrix:
YAX T=
1111101001010000
11011000
1101100011011000
1101100011011000
11011000
11011000
1110
0100
1110
0100
11
11
11
11
11
11
11
11
2
1
2
1
1111
21
11
11
11
11
2
1
yyyy
yyyy
yyyyyyyy
yyyyyyyy
yyyyyyyy
yy
yy
xx
xx
+++=
+
+
+
=
++
++++=
++=
=
-
7/27/2019 Transform Coding 2005
31/80
#31
Basis Matrix
z The quantities are called the
basis matrices in 2-D space.z In general, the outer products of a n*n
orhtonormal matrix form a basis matrix set in 2dimension. The quantity is called the DCcoefficient (note all the elements for the DCcoeeficient are 1, indicating an averageoperation), and other coefficients have
alternating values and are called ACcoefficients.
11100100 ,,,
00
-
7/27/2019 Transform Coding 2005
32/80
#32
Fast Cosine Transform
z 2D 8X8 basis functionsof the DCT:
z The horizontal frequencyof the basis functionsincreases from left toright and the vertical
frequency of the basisfunctions increases fromtop to bottom.
.
-
7/27/2019 Transform Coding 2005
33/80
#33
Amplitude distribution of the DCT
coefficients
z Histograms for 8x8 DCT
coefficient amplitudesmeasured for natural
images
z
DC coefficient is typicallyuniformly distributed.
z The distribution of the AC
coefficients have a
Laplacian distributionwith zero-mean.
-
7/27/2019 Transform Coding 2005
34/80
#34
Discrete Cosine Transform (DCT)
z Conventional image data have
reasonably high inter-element correlation.z DCT avoids the generation of the
spurious spectral components which is a
problem with DFT and has a fast
implementation which avoids complex
algebra.
-
7/27/2019 Transform Coding 2005
35/80
#35
One-dimensional DCT
z The basis idea is to decompose the image intoa set of waveforms, each with a particularspecial frequency.
z To human eyes, high spatial frequencies areimperceptible and a good approximation of the
image can be created by keeping only thelower frequencies.
z Consider the one-dimensional case first. The 8arbitrary grayscale values (with range 0 to 255,
shown in the next slide) are level shifted by128 (as is done by JPEG).
-
7/27/2019 Transform Coding 2005
36/80
#36
(b)
-150
-100
-50
0
50
100
150
1 2 3 4 5 6 7 8
x
f(x)
(c)
-150
-100
-50
0
50
100
150
1 2 3 4 5 6 7 8
u
S(u)
-1
0
1
1 2 3 4 5 6 7 8
U=7
Amplitude
-1
0
1
1 2 3 4 5 6 7 8
U=6
Amplitu
de
-1
0
1
1 2 3 4 5 6 7 8
U=5
Amplitude
-1
0
1
1 2 3 4 5 6 7 8
U=4
Amplitude
-1
0
1
1 2 3 4 5 6 7 8
U=3
Amplitude
-1
0
1
1 2 3 4 5 6 7 8
U=2
Amplitu
de
-1
0
1
1 2 3 4 5 6 7 8
U=1
Am
plitude
-1
0
1
1 2 3 4 5 6 7 8
U=0
Amplitude
Before DCT (image data) After DCT (coefficients
An example of 1-D DCT decomposition
The 8 basis functions for 1-D DCT
One-dimensional DCT
-
7/27/2019 Transform Coding 2005
37/80
#37
One-dimensional DCT
z The waveforms cam be denoted as
with frequencies f=0, 1, , 7. Each wave is sampled at 8 points
to form a basis vector.
z The eight basis vector constitutes a matrix A:
= 0with),cos()( ffw
16
15,
16
13,
16
11,
16
9,
16
7,
16
5,
16
3,
16
=
-0.1950.556-0.8310.981-0.9810.831-0.556-0.195
0.383-0.9240.924-0.383-0.3830.924-0.9240.383
-0.5560.981-0.195-0.8310.8310.195-0.9810.556
0.707-0.707-0.7070.7070.707-0.707-0.7070.707
-0.8310.1950.9810.556-0.556-0.981-0.1950.831
0.9240.383-0.383-0.924-0.924-0.3830.3830.924
-0.981-0.831-0.556-0.1950.1950.5560.8310.981
11111111
-
7/27/2019 Transform Coding 2005
38/80
#38
-
7/27/2019 Transform Coding 2005
39/80
#39
One-dimensional DCT
z The output of the DCT transform is:
where A is the 8*8 transformation matrix
defined in the previous slide, and I(x) isthe input signal.
z
S(u) are called the coefficients for theDCT transform for input signal I(x).
[ ]TxIAuS )()( =
-
7/27/2019 Transform Coding 2005
40/80
#40
One-dimensional FDCT and IDCT with N=8
Sample points
z The 1-D DCT in JPEG is defined as:
z FDCT
z IDCT
z Where (u is frequency)z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficientzAnd
0,1,..7for]16
)12(cos[)(
2
)()(
7
0=
+= = u
uxxI
uCuS
x
0,1,..,7for]16
)12(cos[)(
2
)()(
7
0
=+
= =
xux
uSuC
xIu
>
==
0for1
0for2
2
)(
u
uuC
O di i l FDCT d IDCT ith N
-
7/27/2019 Transform Coding 2005
41/80
#41
One-dimensional FDCT and IDCT with N
Sample points
z The 1-D DCT in JPEG is defined as:
z FDCT
z IDCT
z Where (u is frequency)
z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficient and
1-1,2,..Nfor]2
)12(cos[)()(
2)(
1
0
=+=
=
uN
uxxIuC
NuS
N
x
1-0,1,..Nfor]2
)12(cos[)()(
2)(
1
0=
+
=
=xN
uxuSuCNxI
n
u
>
==
0for1
0for2
2
)(
u
uuC
-
7/27/2019 Transform Coding 2005
42/80
#42
One-dimensional FDCT and IDCT
z As an example, let I(x)=[12,10,8,10,12,10,8,11]
z If we now apply IDCT, we will get back I(x).
We can quantize the coefficient S(u) and still
obtain a very good approximation of I(x).z For example,
z While)9989.10,99354.7,9932.9,0164.12,93097.9,96054.7,0233.10,0254.12(
)3.0,2.0,8.1,2.3,8.1,5.0,6.0,6.28(
=
IDCT
[ ] ].309.0,191.0,729.1,182.3,757.1,4619.0,5712.0,6375.28[)()( ==T
xIAuS
)684.10,053.8,014.10,347.12,573.9,6628.7,6244.9,236.11(
)0,0,2,3,2,0,0,28(
=
IDCT
T di i l FDCT d IDCT f 8X8
-
7/27/2019 Transform Coding 2005
43/80
#43
Two-dimensional FDCT and IDCT for 8X8
block
z The 2-D DCT in JPEG is defined as:
z FDCT
z IDCT
z Wherez I(y,x) is the 2-D sample
z S(v,u) is the 2-D DCT coefficientzAnd
==
++=
7
0
7
0
]16
)12(cos[]
16
)12(cos[),(
2
)(
2
)(),(
xy
uxvyxyI
uCvCuvS
==++
=
7
0
7
0]16
)12(
cos[]16
)12(
cos[),(2
)(
2
)(
),(uv
uxvy
uvS
uCvC
xyI
>
==
0for1
0for2
2
)(
u
uuC
>
==
0for1
0for2
2
)(
v
vvC
-
7/27/2019 Transform Coding 2005
44/80
#44
Fast Cosine Transform
z 2D 8X8 basis functionsof the DCT:
z The horizontal frequencyof the basis functionsincreases from left toright and the verticalfrequency of the basisfunctions increases fromtop to bottom.
.
-
7/27/2019 Transform Coding 2005
45/80
#45
Two-dimensional FDCT for NXM block
z Where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT
coefficient and
10,1,...,and10,1,...,for
]2
)12(
cos[]2
)12(
cos[),(2
)(
2
)(2
),(
FDCT
1
0
1
0
==
++
=
=
=
MvN-u
N
ux
M
vy
xyI
uCvC
NMuvS
M
y
N
x
>
==
0for1
0for22)(
u
uuC
>==
0for1
0for2
2
)(
v
vvC
-
7/27/2019 Transform Coding 2005
46/80
#46
Separable Function
z The 2-D FDCT is a separable function because we can express
the formula as
10,1,...,and10,1,...,for
2
1)(2cos]
2
)12(cos[),()(
2),( ][}{
1
0
)(2
1
0
==
++=
=
=
MvN-u
M
uy
N
uxxyIvC
MuvS
M
y
uCN
N
x
This means that we can compute a 2-D FDCT by first computing a row-wise
1-D FDCT and taking the output and perform on it a column-wise 1-D FDCTtransform. This is possible, as we explained earlier, because the basis
matrices are orthonormal.
-
7/27/2019 Transform Coding 2005
47/80
#47
Two-dimensional IDCT for NXM block
.10,1,...,and10,1,...,for
]2
)12(cos[]
2
)12(cos[),()()(
2),(
1
0
1
0
==
++=
=
=
MyNx
N
ux
M
vyuvSvCuC
NMxyI
N
u
M
v
This function is again separable and the computation can be done in
two steps: a row-wise 1-D IDCT followed by a column-wise 1-D IDCT.A straightforward algorithm to compute both FDCT and IDCT will need
( for a block of NXN pixels) an order ofO(N3) multiplication/ addition
operations. After the ground-breaking discovery of O(N logN) algorithm
for Fast Fourier Transform, several researchers proposed efficient
algorithms for DCT computation.
-
7/27/2019 Transform Coding 2005
48/80
#48
JPEG Introduction - The background
z JPEG stands for Joint Photographic Expert Group
z
A standard image compression method is needed toenable interoperability of equipment from different
manufacturer
z It is the first international digital image compression
standard for continuous-tone images (grayscale orcolor)
z The history of JPEG the selection process
-
7/27/2019 Transform Coding 2005
49/80
#49
JPEG Introduction whats the objective?
z
very good or excellent compression rate,reconstructed image quality, transmission rate
z be applicable to practically any kind of continuous-
tone digital source image
z good complexity
z have the following modes of operations:
z sequential encoding
z progressive encodingz lossless encoding
z hierarchical encoding
-
7/27/2019 Transform Coding 2005
50/80
#50
encoder decoder Source
image data
compressed
image data
reconstructed
image data
Image compression system
encoder
statistical
model
entropy
encoder
Encoder
model
Source
image data
compressed
image datadescriptors symbols
model
tables
entropy
coding tables
The basic parts of an JPEG encoder
JPEG Architecture Standard
-
7/27/2019 Transform Coding 2005
51/80
#51
JPEG has the following Operation Modes:
Sequential DCT-based mode
Progressive DCT-based mode
Sequential lossless mode
Hierarchical mode
JPEG entropy coding supports:
Huffman encoding
Arithmetic encoding
JPEG Overview (cont.)
-
7/27/2019 Transform Coding 2005
52/80
#52
JPEG Baseline System
JPEG Baseline system is composed of:
Sequential DCT-based mode Huffman coding
The basic architecture of JPEG Baseline system
Source
image data
quantizerentropy
encoder
compressed
image data
table
specification
table
specification
88 blocksDCT-based encoder
statistical
modelFDCT
-
7/27/2019 Transform Coding 2005
53/80
#53
The Baseline System
The DCT coefficient values can be regarded as the relative amounts ofthe 2-D spatial frequencies contained in the 88 block
the upper-left corner coefficient is called the DC coefficient, which is ameasure of the average of the energy of the block
Other coefficients are calledAC coefficients, coefficients correspondto high frequencies tend to be zero or near zero for most naturalimages
DC coefficient
Q ti ti T bl i DCT
-
7/27/2019 Transform Coding 2005
54/80
#54
Quantization Tables in DCT
z Human eyes are less sensitive to highfrequencies.
z We use different quantization value fordifferent frequencies.z Higher frequency, bigger quantization value.
z
Lower frequency, smaller quantization value.z Each DCT coefficient corresponds to a certain
frequency.
z High-level HVS is much more sensitive to the
variations in the achromatic channel than inthe chromatic channels.
-
7/27/2019 Transform Coding 2005
55/80
#55
The Baseline System Quantization
z Why quantization? .
z to achieve further compression by representing DCT
coefficients with no greater precision than is necessary toachieve the desired image quality
z Generally, the high frequency coefficients has larger quantizationvalues
z Quantization makes most coefficients to be zero, it makes the
compression system efficient, but its the main source that make thesystem lossy and introduces distortion in the reconstructed image.
z The quantization step-size parameters for JPEG are given by two
Quantization Matrices, each element of the matrix being an integer
of size between 1 to 255. The DCT coeeficients are divided by the
corresponding quantization parameters and rounded to nearestinteger.
-
7/27/2019 Transform Coding 2005
56/80
#56
)),(
),((),('
vuQ
vuFRoundvuF =
F(u,v): original DCT coefficient
F(u,v): DCT coefficient after quantization
Q(u,v): quantization value
There are two quantization tables: one for the luminancecomponent and the other for the Chrominance component. JPEG
standard does not prescribe the values of the table; it is left upto
the user, but they recommend the following two tables under
normal circumstances. These tables have been created by
analysing data of human perception and a lot of trial and error.
Quantization Tables in DCT
-
7/27/2019 Transform Coding 2005
57/80
#57
Quantization Tables in DCT
z So, we have two quantization tables.z Measured for an average person.
z Higher frequency, bigger quantization value.
z Lower frequency, smaller quantization value.
9910310011298959272
10112012110387786449
921131048164553524
771031096856372218
6280875129221714
5669574024161314
5560582619141212
6151402416101116
9999999999999999
9999999999999999
9999999999999999
9999999999999999
9999999999996647
9999999999562624
9999999966262118
9999999947241817
Luminance quantization table Chrominance quantization table
Two dimensional DCT
-
7/27/2019 Transform Coding 2005
58/80
#58
Two-dimensional DCT
z The image samples are shifted from unsigned integer with range[0, 2 N-1] to signed integers with range [- 2 N-1, 2 N-1-1]. Thussamples in the range 0-255 are converted in the range -128 to
127 and those in the range 0 to 4095 are converted in the range -2048 to 2047. This zero-shift done for JPEG to reduce theinternal precision requirements in the DCT calculations.
z How to interpret the DCT coefficients?z The DCT coefficient values can be regarded as the relative amounts
of the 2-D spatial frequencies contained in the 88 block.z F(0,0) is called DC coefficient, which is a measure of the average of
the energy of the block.
z Other coefficients are called AC coefficients, coefficients correspondto high frequencies tend to be zero or near zero for most images.
z
The energy is concentrated in the upper-left corner.
-
7/27/2019 Transform Coding 2005
59/80
#59
Baseline System - DC coefficient coding
z After quantization step, the DC coefficient are encoded separatelyby using differential encoding. The DC difference sequence isconstituted by concatenating the DC coefficient of the first block
followed by the difference values of DC coefficients of thesucceeding blocks. The average values of the succeeding blocksare usually correlated.
DPCMquantized DCcoefficients
DC difference
Differential pulse code modulation
-
7/27/2019 Transform Coding 2005
60/80
#60
Baseline System - AC coefficient coding
z AC coefficients are arranged into a zig-zag sequence. This is
because most of the significant coefficients are located in the
upper left corner of the matrix. The high frequency componentsare mostly 0s and can be efficiently coded by run-length
encoding.
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 49 48 57 58 62 63
Horizontal frequency
Verticalfrequency
An Example (Ref:T. Acharya, P. Tsai,JPEG 2000
Standard for Image Compression) Wiley Iterscience
-
7/27/2019 Transform Coding 2005
61/80
#61
Standard for Image Compression), Wiley-Iterscience,
2005
The 8X8 Image Data Block
110 110 118 118 121 126 131 131108 111 125 122 120 125 134 135
106 119 129 127 125 127 138 144
110 126 130 133 133 131 141 148115 116 119 120 122 125 137 139
115 106 99 110 107 116 130 127
110 91 82 101 99 104 120 118103 76 70 95 92 91 107 106
-
7/27/2019 Transform Coding 2005
62/80
#62
-
7/27/2019 Transform Coding 2005
63/80
#63
Baseline System - Statistical modeling
z Statistical modeling translate the inputs
to a sequence of symbols for Huffmancoding to use
z Statistical modeling on DC coefficients:
z Category: different size (SSSS)
z Magnitude: amplitude of difference
(additional bits)
Coding the DC coeeficients
-
7/27/2019 Transform Coding 2005
64/80
#64
Coding the DC coeeficients
The prediction errors are classified into 16 categories in modulo 216. Category
C contains the range of integers [-(2c
-1), + (2c
-1)] excluding the numbers in therange [-(2c-1-1), + (2c-1-1)] which falls in the previous category. The category
number is encoded using either a fixed code (SSSS), unary code (0, 10, 110, )
or a variable length Huffman code.
Coding the DC/AC coefficients
-
7/27/2019 Transform Coding 2005
65/80
#65
Coding the DC/AC coefficients
If the magnitude of the error is positive, it is encoded using category
number of bits with a leading 1. If it is negative, Is complement of
the positive number is used. Thus, if error is -6 and if the Huffman
code for category 3 is 1110, it will get a code (1110001). The JPEG
Standard recommends Huffman two codes for the category
numbers: one for Luminance DC and the other for Chrominance DC
( See Salomon, p.289; it is also described in Annex K ( Table k.3
and K.4 of the Standard)
z Statistical modeling on AC coefficients:
z Run-length: RUN-SIZE=16*RRRR+SSSS
z Amplitude: amplitude of difference (additional bits)
-
7/27/2019 Transform Coding 2005
66/80
#66
The AC coefficients contains just a few nonzero numbers, with runs of
zeros followed by a long run of trailing zeros. For each nonzero
number, two numbers are generated: 1)16*RRRR which gives the
number of consecutive zeros preceding the nonzero coefficient being
encoded, 2)SSSS gives the category corresponding to number of bits
required to encode the AC coefficient using variable length Huffman
code. This is specified in the form of a table (see next slide). The
second attribute of the AC coefficient is its amplitude which is
encoded exactly the same way as the DC values. Note the code (0,0)
is reserved for EOB ( signifying that the remaining AC coefficients are
just a single trailing run of 0s). The code (15,0) is reserved for a
maximum run of 16 zeros. If it is more than 16, it is broken down into
runs of 16 plus possibly a single run less than 16. JPEG recommends
two tables for Huffman codes for AC coefficients, Tables K5 and K6
for luminance and Chrominance values.
Encoding the AC Coefficients
-
7/27/2019 Transform Coding 2005
67/80
#67
Huffman AC statistical model
run-length/amplitude combinations Huffman coding of AC coefficients
Encoding the AC Coefficients
Example (Contd.)
-
7/27/2019 Transform Coding 2005
68/80
#68
Example (Contd.)
z Let us finish our example. The DC coeeficient is -6 has a code o
1110001. The AC coeeficients will be encoded as:
z (0,3)(-6), (0,3)(6), (0,3)(-5), (1,2)(2), (1,1)(-1), (5,1)(-1),(2,1)(-1),(0,1)(1),(0,0). The Table K.5 gives the codes
z 1010, 00, 100, 1100, 11011, 11100 and 11110101 for the pairs of
symbols (0,0), (0,1), (0,3), (1,1), (1,2), (2,1) and (5,1),
respectively. The variable length code for AC coefficients 1, -1, 2,-5, 6 and -6 are 1, 0, 10, 010,110 and 001, respectively. Thus
JPEG will give a compressed representation of the example
image using only 55 bits as:
z 1110001 100001 100110 100010 1101110 11000 11110100
111000 001 1010
z The uncompressed representation would have required 512 bits!
Example to illustrate encoding/decoding
-
7/27/2019 Transform Coding 2005
69/80
#69
p g g
JPEG Progressive Model
-
7/27/2019 Transform Coding 2005
70/80
#70
JPEG Progressive Model
z Why progressive model?
z Quick transmission
z Image built up in a coarse-to-fine passes
z First stage: encode a rough but recognizable version of the image
z Later stage(s): the image refined by successive scans till get the
final image
z Two ways to do this:
z Spectral selection send DC, AC coefficients separately
z Successive approximationsend the most significant bits first
and then the least significant bits
JPEG Lossless Model
-
7/27/2019 Transform Coding 2005
71/80
#71
JPEG Lossless Model
DPCMSample values Descriptors
Differential pulse code modulation
C B
A X
selection value prediction strategy
7
0
1
2
3
(A+B)/2
no prediction
A
B
C
Predictors for lossless coding
B+(A-C)/2
A+B-C
A+(B-C)/2
4
56
JPEG Hierarchical Model
-
7/27/2019 Transform Coding 2005
72/80
#72
JPEG Hierarchical Model
z Hierarchical model is an alternative of progressive
model (pyramid)
z Steps:
z filter and down-sample the original images by the desired
number of multiplies of 2 in each dimension
z Encode the reduced-size image using one of the above
coding model
z Use the up-sampled image as a prediction of the origin at
this resolution, encode the difference
z Repeat till the full resolution image has been encode
The Effect of Segmentation
-
7/27/2019 Transform Coding 2005
73/80
#73
g
z The image samples are grouped into 88 blocks. 2-D DCT isapplied on each 88 blocks.
z Because of blocking, the spatial frequencies in the image and thespatial frequencies of the cosine basis functions are not preciselyequivalent. According to Fouriers theorem, all the harmonics ofthe fundamental frequencies must be present in the basisfunctions to be precise. Nonetheless, the relationship between
the DCT frequency and the spatial frequency is a closeapproximation if we take into account the sensitivity of human eyefor detecting contrast in frequency.
z The segmentation also introduces what is called the blockingartifacts. This becomes very pronounced if the DC coefficients
from block to block vary considerably. These artifacts appear asedges in the image, and abrupt edges imply high frequency. Theeffect can be minimized if the non-zero AC coefficients are kept.
Some other transforms
-
7/27/2019 Transform Coding 2005
74/80
#74
z Discrete Fourier Transform (DFT)
z Haar Transformz Karhunen Love Transform (KLT)
z Walsh-Hadamard Transform (WHT)
Discrete Fourier Transform (DFT)
-
7/27/2019 Transform Coding 2005
75/80
#75
( )
z Well-known for its connection to spectral
analysis and filtering.
z Extensive study done on its fast
implementation (O(Nlog2N) for N-point DFT).
z
Has the disadvantage of storage andmanipulation of complex quantities and
creation of spurious spectral components due
to the assumed periodicity of image blocks.
Haar Transform
-
7/27/2019 Transform Coding 2005
76/80
#76
z Very fast transform.
z The easiest wavelet
transform.z Useful in edge
detection, imagecoding, and image
analysis problems.z Energy Compaction
is fair, not the bestcompression
algorithms.2D basis function of Haar transform
Karhunen Love Transform (KLT)
-
7/27/2019 Transform Coding 2005
77/80
#77
z Karhunen Love Transform (KLT) yieldsdecorrelated transform coefficients.
z Basis functions are eigenvectors of thecovariance matrix of the input signal.
z KLT achieves optimum energy concentration.
z Disadvantages:z KLT dependent on signal statistics
z KLT not separable for image blocks
z Transform matrix cannot be factored into sparsematrices
Walsh-Hadamard Transform (WHT)
-
7/27/2019 Transform Coding 2005
78/80
#78
z Although far from
optimum in an
energy packing
sense for typical
imagery, its simple
implementation(basis functions are
either -1 or +1) has
made it widelypopular.
Transformation matrix:
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 111 1 1 1 1 1 1 18
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
A
=
Walsh-Hadamard Transform (WHT)
-
7/27/2019 Transform Coding 2005
79/80
#79
z Walsh-Hadamard transform requires
adds and subtractsz Use of high speed signal processing has
reduced its use due to improvements
with DCT.
Transform Coding: Summary
-
7/27/2019 Transform Coding 2005
80/80
#80
z Purpose of transformz de-correlation
z
energy concentrationz KLT is optimum, but signal dependent and, hence,
without a fast algorithm
z DCT reduces blocking artifacts appeared in DFT
z
Threshold coding + zig-zag-scan + 8x8 block size iswidely used todayz JPEG, MPEG, ITU-T H.263.
z Fast algorithm for scaled 8-DCTz 5 multiplications, 29 additions
z Audio Codingz MP3 = MPEG 1- Layer 3 uses DCT