Lossless Compression - I
Hao Jiang, Computer Science Department
Sept. 13, 2007
Introduction
Compression methods are key enabling techniques for multimedia applications.
Raw media takes a lot of storage and bandwidth. Consider a raw video at 30 frames/sec, with a resolution of 640x480 and 24-bit color (3 bytes per pixel):
One second of video: 30 * 640 * 480 * 3 = 27,648,000 bytes = 27.648 Mbytes
One hour of video is about 100 Gbytes
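The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `raw_video_bytes` is made up for illustration, and Mbytes/Gbytes are taken as decimal units (10^6 and 10^9 bytes).

```python
def raw_video_bytes(fps, width, height, bytes_per_pixel, seconds):
    """Size in bytes of an uncompressed video clip."""
    return fps * width * height * bytes_per_pixel * seconds

# 24-bit color means 3 bytes per pixel.
one_second = raw_video_bytes(30, 640, 480, 3, 1)
one_hour = raw_video_bytes(30, 640, 480, 3, 3600)

print(one_second / 1e6, "Mbytes per second")
print(one_hour / 1e9, "Gbytes per hour")
```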
Some Terms
Information source -> Data input (a sequence of symbols from an alphabet) -> Encoder (compression) -> Code (a sequence of codewords) -> Storage or networks -> Decoder (decompression) -> Recovered data sequence
Lossless compression: The recovered data is exactly the same as the input.
Lossy compression: The recovered data approximates the input data.
Compression ratio = (bits used to represent the input data) / (bits of the code)
Entropy
The number of bits needed to encode a media source is lower-bounded by its “Entropy”.
Self information of an event A is defined as -log_b P(A),
where P(A) is the probability of event A. If b equals 2, the unit is “bits”. If b equals e, the unit is “nats”. If b is 10, the unit is “hartleys”.
Example
A source outputs two symbols (the alphabet has 2 symbols) 0 or 1. P(0) = 0.25, P(1) = 0.75.
The information we get when receiving a 0 is log_2 (1/0.25) = 2 bits;
when receiving a 1 it is log_2 (1/0.75) = 0.4150 bits.
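A minimal sketch of the self-information definition, reproducing the two values in the example. The function name `self_information` and its `base` parameter are illustrative choices, not part of the slides.

```python
import math

def self_information(p, base=2):
    """Self-information -log_base(p) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p) / math.log(base)

bits_for_0 = self_information(0.25)          # base 2: bits
bits_for_1 = self_information(0.75)
nats_for_0 = self_information(0.25, math.e)  # base e: nats

print(bits_for_0, bits_for_1, nats_for_0)
```

Changing `base` only rescales the result, which is why bits, nats, and hartleys are interchangeable units.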
Properties of Self Information
A letter with a smaller probability has higher self information.
The information we get when receiving two independent letters is the sum of their individual self information:
-log_2 P(sa, sb)
= -log_2 [P(sa) P(sb)]
= [-log_2 P(sa)] + [-log_2 P(sb)]
Entropy
If a source has symbols {s1, s2, …, sn} and the symbols are independent, the average self-information is
H = sum_{i=1..n} P(si) log_2(1/P(si)) bits
H is called the Entropy of the source.
The number of bits per symbol needed to encode a media source is lower-bounded by its “Entropy”.
Entropy (cont)
Example: A source outputs two symbols (the alphabet
has 2 letters) 0 or 1. P(0) = 0.25, P(1) = 0.75.
H = 0.25 * log_2 (1/0.25) + 0.75 * log_2(1/0.75) = 0.8113 bits
We need at least 0.8113 bits per symbol in encoding.
The Entropy of an Image
A grayscale image has 256 possible levels, A = {0, 1, 2, …, 255}. Assuming the pixels are independent and the gray levels have equal probabilities,
H = 256 * (1/256) * log_2(256) = 8 bits
What about an image with only 2 levels, 0 and 255? Assuming P(0) = 0.5 and P(255) = 0.5,
H = 1 bit
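The entropy formula and the examples above can be sketched in Python as follows. The function name `entropy` is an illustrative choice; symbols with zero probability are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math

def entropy(probs):
    """Entropy in bits of a source with independent symbols."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

h_binary = entropy([0.25, 0.75])        # the two-symbol source
h_gray = entropy([1 / 256] * 256)       # 256 equally likely gray levels
h_two_level = entropy([0.5, 0.5])       # image with levels 0 and 255 only

print(h_binary, h_gray, h_two_level)
```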
Estimate the Entropy
Given the sequence: a a a b b b b c c c c d d
P(a) = 3/13, P(b) = 4/13, P(c) = 4/13, P(d) = 2/13
Assuming the symbols are independent:
H = [-P(a)log_2P(a)] + [-P(b)log_2P(b)] + [-P(c)log_2P(c)] + [-P(d)log_2P(d)] = 1.95 bits
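The estimate for the sequence a a a b b b b c c c c d d can be reproduced by counting symbol frequencies and plugging the relative frequencies into the entropy formula. This is a sketch under the same independence assumption; `estimate_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def estimate_entropy(symbols):
    """Estimate entropy (bits/symbol) from relative symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # P(s) is estimated as count/total, so log2(1/P(s)) = log2(total/count).
    return sum((c / total) * math.log2(total / c) for c in counts.values())

h = estimate_entropy("aaabbbbccccdd")
print(round(h, 2))
```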
Coding Schemes
A = {s1, s2, s3, s4}
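P(s1) = 0.125, P(s2) = 0.125, P(s3) = 0.25, P(s4) = 0.5
The entropy of this source is H = 1.75 bits.
Three candidate sets of codewords:
Code 1: 0, 1, 11, 01 (not uniquely decodable: 11 can be read as one codeword or as 1 followed by 1)
Code 2: 0, 10, 111, 110 (good codewords; achieves the lower bound when shorter codewords go to more probable symbols: 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits)
Code 3: 0, 0, 11, 10 (not uniquely decodable: two symbols share the codeword 0)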
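The properties of the candidate codes can be checked with a short sketch. All function names here are made up for illustration, and the symbol-to-codeword assignment in `good` is an assumption (shortest codeword to the most probable symbol). Note that being prefix-free is a sufficient condition for unique decodability, not a necessary one.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another."""
    words = sorted(codewords)
    # After sorting, a prefix is always directly followed by a word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def count_parsings(bits, codewords):
    """Number of ways to split `bits` into a sequence of codewords."""
    ways = [1] + [0] * len(bits)
    for end in range(1, len(bits) + 1):
        for w in set(codewords):
            if bits.endswith(w, 0, end):
                ways[end] += ways[end - len(w)]
    return ways[-1]

def average_length(code, probs):
    """Expected codeword length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in code)

probs = {"s1": 0.125, "s2": 0.125, "s3": 0.25, "s4": 0.5}
# Assumed assignment: shorter codewords for more probable symbols.
good = {"s4": "0", "s3": "10", "s2": "110", "s1": "111"}
ambiguous = ["0", "1", "11", "01"]

print(is_prefix_free(ambiguous), count_parsings("11", ambiguous))
print(is_prefix_free(good.values()), average_length(good, probs))
```

A string with more than one parsing, such as `11` under the first code, is exactly what makes a code not uniquely decodable.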
Huffman Coding
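[Figure: Huffman tree for s1 (0.125), s2 (0.125), s3 (0.25), s4 (0.5). s1 and s2 merge into a node of weight 0.25; that node merges with s3 into a node of weight 0.5; that node merges with s4 into the root of weight 1. Each branch is labeled 0 or 1. Resulting codewords: s4 = 1, s3 = 01, and 000 and 001 for s1 and s2.]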
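The merging procedure in the Huffman example can be sketched with a priority queue: repeatedly take the two least probable nodes, prepend 0 to the codewords in one and 1 to the codewords in the other, and put the merged node back. This is a minimal illustration, not the slides' own implementation; `huffman_code` is a hypothetical name, and the exact bits depend on how ties and 0/1 labels are chosen, so only the codeword lengths are expected to match the example.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a dict of symbol probabilities."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, low = heapq.heappop(heap)    # least probable node
        p1, _, high = heapq.heappop(heap)   # second least probable node
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"s1": 0.125, "s2": 0.125, "s3": 0.25, "s4": 0.5}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg_len)
```

For this source the average length equals the entropy of 1.75 bits, because every probability is a power of 1/2.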
Another Example
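[Figure: Huffman tree for a1 (0.4), a2 (0.2), a3 (0.2), a4 (0.1), a5 (0.1). a4 and a5 merge into a node of weight 0.2; successive merges produce nodes of weight 0.4, 0.6, and the root of weight 1. Each branch is labeled 0 or 1. Resulting codewords: 0, 10, 111, 1101, 1100, with lengths 1, 2, 3, 4, 4 in order of decreasing probability.]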
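As a check on this example, the average codeword length can be compared with the entropy of the source. The sketch below assumes the codewords are paired with the probabilities in decreasing order (most probable symbol gets the shortest codeword); the variable names are illustrative.

```python
import math

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
# Assumed pairing: codewords listed from most to least probable symbol.
codewords = ["0", "10", "111", "1101", "1100"]

avg_len = sum(p * len(c) for p, c in zip(probs, codewords))
H = sum(p * math.log2(1 / p) for p in probs)
kraft_sum = sum(2 ** -len(c) for c in codewords)

print(avg_len, H, kraft_sum)
```

Here the average length (about 2.2 bits) is slightly above the entropy (about 2.12 bits), since the probabilities are not all powers of 1/2. The Kraft sum of exactly 1 indicates that the codeword lengths leave no unused code space.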