Lossless Compression - I


Page 1: Lossless Compression - I

Lossless Compression - I

Hao Jiang, Computer Science Department

Sept. 13, 2007

Page 2: Lossless Compression - I

Introduction

Compression methods are key enabling techniques for multimedia applications.

Raw media takes a great deal of storage and bandwidth. For example, a raw video with 30 frames/sec, a resolution of 640x480, and 24-bit color:

One second of video: 30 * 640 * 480 * 3 bytes = 27.648 Mbytes

One hour of video: about 100 Gbytes
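As a quick check of this arithmetic, a minimal sketch in Python, using the frame rate, resolution, and color depth from the example above:

```python
# Raw video storage: 30 frames/sec, 640x480 resolution, 24-bit color.
bytes_per_pixel = 3                  # 24-bit color = 3 bytes per pixel
frame_bytes = 640 * 480 * bytes_per_pixel
one_second = 30 * frame_bytes        # bytes for one second of video

print(one_second / 1e6)              # 27.648 Mbytes
print(one_second * 3600 / 1e9)       # ~99.5 Gbytes for one hour
```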

Page 3: Lossless Compression - I

Some Terms

[Diagram] Information source → Data input (a sequence of symbols from an alphabet) → Encoder (compression) → Code (a sequence of codewords) → Storage or networks → Decoder (decompression) → Recovered data sequence

Lossless compression: the recovered data is exactly the same as the input.

Lossy compression: the recovered data approximates the input data.

Compression ratio = (bits used to represent the input data) / (bits of the code)
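To make the definition concrete, a small sketch; the bit counts here are hypothetical, chosen only to illustrate the ratio:

```python
def compression_ratio(input_bits: int, code_bits: int) -> float:
    """Ratio of bits in the original data to bits in the compressed code."""
    return input_bits / code_bits

# Hypothetical example: a 1,000,000-bit input compressed to 250,000 bits.
print(compression_ratio(1_000_000, 250_000))  # 4.0, i.e. 4:1 compression
```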

Page 4: Lossless Compression - I

Entropy

The number of bits needed to encode a media source is lower-bounded by its “Entropy”.

Self-information of an event A is defined as

i(A) = -log_b P(A)

where P(A) is the probability of event A. If b equals 2, the unit is “bits”; if b equals e, the unit is “nats”; if b is 10, the unit is “hartleys”.

Page 5: Lossless Compression - I

Example

A source outputs two symbols (the alphabet has 2 symbols) 0 or 1. P(0) = 0.25, P(1) = 0.75.

The information we get when receiving a 0 is log_2(1/0.25) = 2 bits;

when receiving a 1, it is log_2(1/0.75) = 0.415 bits.
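The same numbers can be reproduced directly from the definition; a minimal sketch using Python's math module:

```python
import math

def self_information(p: float, b: float = 2) -> float:
    """Self-information -log_b P(A); with b = 2 the unit is bits."""
    return -math.log(p, b)

print(self_information(0.25))  # 2.0 bits for symbol 0
print(self_information(0.75))  # ~0.415 bits for symbol 1
```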

Page 6: Lossless Compression - I

Properties of Self Information

A letter with smaller probability has higher self-information.

The information we get when receiving two independent letters is the sum of their individual self-information:

-log_2 P(sa, sb) = -log_2 [P(sa) P(sb)] = [-log_2 P(sa)] + [-log_2 P(sb)]
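A quick numeric check of this additivity, reusing the probabilities from the previous example and assuming the two letters are independent:

```python
import math

p_a, p_b = 0.25, 0.75
joint = p_a * p_b  # independence: P(sa, sb) = P(sa) * P(sb)

info_joint = -math.log2(joint)
info_sum = -math.log2(p_a) - math.log2(p_b)
print(info_joint, info_sum)  # both ~2.415 bits
```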

Page 7: Lossless Compression - I

Entropy

A source has symbols {s1, s2, …, sn}. If the symbols are independent, the average self-information is

H = Σ_{i=1}^{n} P(si) log_2(1/P(si)) bits

H is called the Entropy of the source.

The number of bits per symbol needed to encode a media source is lower-bounded by its “Entropy”.

Page 8: Lossless Compression - I

Entropy (cont)

Example: A source outputs two symbols (the alphabet has 2 letters), 0 or 1. P(0) = 0.25, P(1) = 0.75.

H = 0.25 * log_2 (1/0.25) + 0.75 * log_2(1/0.75) = 0.8113 bits

We need at least 0.8113 bits per symbol in encoding.
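A small sketch that evaluates the entropy formula for this source:

```python
import math

def entropy(probs) -> float:
    """H = sum of P(si) * log2(1/P(si)), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.75]))  # ~0.8113 bits per symbol
```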

Page 9: Lossless Compression - I

The Entropy of an Image

A grayscale image with 256 possible levels, A = {0, 1, 2, …, 255}. Assuming the pixels are independent and the gray levels have equal probabilities,

H = 256 * (1/256) * log_2(256) = 8 bits

What about an image with only 2 levels, 0 and 255? Assuming P(0) = 0.5 and P(255) = 0.5,

H = 1 bit
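The entropy formula applied to both image models; a minimal sketch:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# 256 equally likely gray levels:
print(entropy([1 / 256] * 256))  # 8.0 bits

# Two equally likely levels (0 and 255):
print(entropy([0.5, 0.5]))       # 1.0 bit
```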

Page 10: Lossless Compression - I

Estimate the Entropy

Assuming the symbols are independent, estimate each probability by its relative frequency in the sequence:

a a a b b b b c c c c d d

P(a) = 3/13, P(b) = 4/13, P(c) = 4/13, P(d) = 2/13

H = [-P(a) log_2 P(a)] + [-P(b) log_2 P(b)] + [-P(c) log_2 P(c)] + [-P(d) log_2 P(d)] = 1.95 bits
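A sketch that estimates the entropy of the sequence by counting relative frequencies:

```python
import math
from collections import Counter

def estimate_entropy(data: str) -> float:
    """Estimate entropy from symbol frequencies, assuming independent symbols."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(estimate_entropy("aaabbbbccccdd"))  # ~1.95 bits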

Page 11: Lossless Compression - I

Coding Schemes

A = {s1, s2, s3, s4}

P(s1) = 0.125, P(s2) = 0.125, P(s3) = 0.25, P(s4) = 0.5

Three candidate codes:

Code 1: s1 → 0, s2 → 1, s3 → 11, s4 → 01 (not uniquely decodable)

Code 2: s1 → 111, s2 → 110, s3 → 10, s4 → 0 (a prefix code; good codewords)

Code 3: s1 → 0, s2 → 0, s3 → 11, s4 → 10 (not uniquely decodable: s1 and s2 share a codeword)

The entropy of this source is H = 1.75 bits, and Code 2's average length achieves this lower bound.
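To compare a code against the entropy bound, a sketch computing average code length; the codeword assignment follows Code 2 above:

```python
probs = {"s1": 0.125, "s2": 0.125, "s3": 0.25, "s4": 0.5}
code2 = {"s1": "111", "s2": "110", "s3": "10", "s4": "0"}  # the prefix code

avg_len = sum(probs[s] * len(cw) for s, cw in code2.items())
print(avg_len)  # 1.75 bits per symbol, equal to the entropy H
```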

Page 12: Lossless Compression - I

Huffman Coding

[Huffman tree] Repeatedly merge the two least probable nodes:

s1 (0.125) + s2 (0.125) → 0.25
0.25 + s3 (0.25) → 0.5
0.5 + s4 (0.5) → 1.0

Resulting codewords: s1 → (000), s2 → (001), s3 → (01), s4 → (1)
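A minimal Huffman-coding sketch in Python using heapq; tie-breaking may assign different 0/1 labels than the tree above, but the codeword lengths are the same:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least probable nodes."""
    tick = count()  # tie-breaker so heapq never compares the dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend a bit to every codeword in each merged subtree.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

print(huffman_code({"s1": 0.125, "s2": 0.125, "s3": 0.25, "s4": 0.5}))
# Lengths match the tree above: s4 gets 1 bit, s3 gets 2, s1 and s2 get 3
# (the individual bit labels may differ).
```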

Page 13: Lossless Compression - I

Another Example

[Huffman tree] Symbol probabilities: a1 = 0.4, a2 = 0.2, a3 = 0.2, a4 = 0.1, a5 = 0.1. Repeatedly merge the two least probable nodes:

a4 (0.1) + a5 (0.1) → 0.2
a3 (0.2) + 0.2 → 0.4
a2 (0.2) + 0.4 → 0.6
a1 (0.4) + 0.6 → 1.0

Resulting codewords: a1 → (0), a2 → (10), a3 → (111), a4 → (1101), a5 → (1100)
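As a check on this example, a sketch comparing the code's average length with the entropy of the source:

```python
import math

probs = {"a1": 0.4, "a2": 0.2, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = {"a1": "0", "a2": "10", "a3": "111", "a4": "1101", "a5": "1100"}

avg_len = sum(probs[s] * len(code[s]) for s in probs)
H = sum(p * math.log2(1 / p) for p in probs.values())
print(avg_len, H)  # 2.2 bits per symbol vs. entropy ~2.12 bits
```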