MATH3411 Chapter 3


Transcript of MATH3411 Chapter 3

Page 1: MATH3411 Chapter 3

Chapter 3: Compression Coding

Lectures 10-11

Compression coding

Variable length codes

Assume that there is no channel noise: source coding.

Define

a source S with symbols s1, . . . , sq and probabilities p1, . . . , pq

a code C with codewords c1, . . . , cq of lengths ℓ1, . . . , ℓq

and radix r

A code C is
  uniquely decodable (UD) if it can always be decoded unambiguously,
  instantaneous if no codeword is the prefix of another.
Such a code is an I-code.

Example
Morse code is an I-code (due to the stop symbol p that ends each codeword).

A •−p            N −•p            1 •−−−−p
B −•••p          O −−−p           2 ••−−−p
C −•−•p          P •−−•p          3 •••−−p
D −••p           Q −−•−p (0.11%)  4 ••••−p
E •p (12.5%)     R •−•p           5 •••••p
F ••−•p          S •••p           6 −••••p
G −−•p           T −p (9.25%)     7 −−•••p
H ••••p          U ••−p           8 −−−••p
I ••p            V •••−p          9 −−−−•p
J •−−−p          W •−−p           0 −−−−−p
K −•−p           X −••−p
L •−••p          Y −•−−p
M −−p            Z −−••p

(The percentages are approximate English letter frequencies. See Appendix 1 for the full list.)


Page 2: MATH3411 Chapter 3

Example
The standard comma code of length 5 is

s1 → c1 = 0
s2 → c2 = 10
s3 → c3 = 110
s4 → c4 = 1110
s5 → c5 = 11110
s6 → c6 = 11111

This code is an I-code.

Decode

1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1 1 0

as s3 s1 s5 s3 s2 s6 s3.
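Decoding an I-code needs no lookahead: since no codeword is a prefix of another, a codeword can be emitted the moment it is matched. A minimal Python sketch of this (an illustration, not part of the notes):

def decode(bits, code):
    # code: dict mapping each codeword string to its source symbol
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in code:          # at most one codeword can match
            message.append(code[current])
            current = ""
    if current:
        raise ValueError("leftover bits: " + current)
    return message

comma_code = {"0": "s1", "10": "s2", "110": "s3",
              "1110": "s4", "11110": "s5", "11111": "s6"}
print(decode("1100111101101011111110", comma_code))
# ['s3', 's1', 's5', 's3', 's2', 's6', 's3']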

Example
Consider the code C:

s1 → c1 = 0
s2 → c2 = 01
s3 → c3 = 11
s4 → c4 = 00

This code is not uniquely decodable since, for example,

0011

can be decoded as s1s1s3 and s4s3.


Page 3: MATH3411 Chapter 3

Example
Consider the code C:

s1 → c1 = 0
s2 → c2 = 01
s3 → c3 = 011
s4 → c4 = 0111
s5 → c5 = 1111

This code is uniquely decodable but not instantaneous: for instance, c1 = 0 is a prefix of c2 = 01.

Example
Consider the code C:

s1 → c1 = 00
s2 → c2 = 01
s3 → c3 = 10
s4 → c4 = 11

This code is a block code and is thus an I-code.

Example
Consider the code C:

s1 → c1 = 0
s2 → c2 = 100
s3 → c3 = 1011
s4 → c4 = 110
s5 → c5 = 111

This code is an I-code.

Decode

0 0 1 1 0 0 1 0 1 1 1 1 1

as s1 s1 s4 s1 s3 s5.


Page 4: MATH3411 Chapter 3

Decision trees can represent I-codes.

Example
The standard comma code of length 5 is

s1 → c1 = 0
s2 → c2 = 10
s3 → c3 = 110
s4 → c4 = 1110
s5 → c5 = 11110
s6 → c6 = 11111

[Decision tree for the comma code: at each of the first four nodes, edge 0 ends at s1, s2, s3, s4 respectively, and edge 1 continues down; at the last node, edge 0 ends at s5 and edge 1 at s6.]

Example
Consider the block code C:

s1 → c1 = 00
s2 → c2 = 01
s3 → c3 = 10
s4 → c4 = 11

[Decision tree for the block code: root edge 0 leads to a node with leaves s1 (00) and s2 (01); root edge 1 leads to a node with leaves s3 (10) and s4 (11).]

Page 5: MATH3411 Chapter 3

Example
Consider the code C:

s1 → c1 = 0
s2 → c2 = 100
s3 → c3 = 1011
s4 → c4 = 110
s5 → c5 = 111

[Decision tree for this code, with leaves s1 = 0, s2 = 100, s3 = 1011, s4 = 110, s5 = 111.]

Decision trees can represent I-codes.

Branches are numbered from the top down.
Any radix r is allowed.
Two codes are equivalent if their decision trees are isomorphic.
By shuffling source symbols, we may assume that ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓq.

Example

s1 → c1 = 00
s2 → c2 = 01
s3 → c3 = 02
s4 → c4 = 1
s5 → c5 = 20
s6 → c6 = 21

Equivalent codes: [two isomorphic decision trees for this code; in the second tree the leaves under branch 2 are swapped, giving s6 → 20 and s5 → 21].

Page 6: MATH3411 Chapter 3

The Kraft-McMillan Theorem
The following are equivalent:

1 There is a radix r UD-code with codeword lengths ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓq.
2 There is a radix r I-code with codeword lengths ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓq.
3 K = ∑_{i=1}^{q} (1/r)^{ℓi} ≤ 1.

Example
Is there a radix 2 UD-code with codeword lengths 1, 2, 2, 3?
No, by the Kraft-McMillan Theorem:

(1/2)^1 + (1/2)^2 + (1/2)^2 + (1/2)^3 = 9/8 > 1

Example
Is there a radix 3 I-code with codeword lengths 1, 2, 2, 2, 2, 3?
Yes, by the Kraft-McMillan Theorem:

(1/3)^1 + (1/3)^2 + (1/3)^2 + (1/3)^2 + (1/3)^2 + (1/3)^3 = 22/27 ≤ 1

For instance,

s1 → c1 = 0
s2 → c2 = 10
s3 → c3 = 11
s4 → c4 = 12
s5 → c5 = 20
s6 → c6 = 210

[Decision tree with leaves s1 = 0, s2 = 10, s3 = 11, s4 = 12, s5 = 20, s6 = 210.]

This is a standard I-code.
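Condition 3 is easy to test mechanically. A small Python sketch using exact fractions for the two examples above (an illustration, not part of the notes):

from fractions import Fraction

def kraft_sum(lengths, r):
    # K = sum over codeword lengths of (1/r)^length
    return sum(Fraction(1, r**l) for l in lengths)

print(kraft_sum([1, 2, 2, 3], 2))        # 9/8   -> no radix 2 UD-code
print(kraft_sum([1, 2, 2, 2, 2, 3], 3))  # 22/27 -> a radix 3 I-code exists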


Page 7: MATH3411 Chapter 3

Proof
2 ⇒ 1 is trivial. We now prove that 1 ⇒ 3.
Suppose that a radix r UD-code has codeword lengths ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓq. Note that

K^n = ( ∑_{i=1}^{q} (1/r)^{ℓi} )^n = ∑_{j=1}^{∞} Nj / r^j

where Nj = |{(i1, . . . , in) ∈ {ℓ1, . . . , ℓq}^n : i1 + · · · + in = j}|.

For instance, if (ℓ1, . . . , ℓq) = (2, 3) and n = 3, then

K^n = ( 1/r^2 + 1/r^3 )^3
    = ( 1/r^2 + 1/r^3 )( 1/r^2 + 1/r^3 )( 1/r^2 + 1/r^3 )
    = 1/r^{2+2+2} + 1/r^{2+2+3} + 1/r^{2+3+2} + 1/r^{2+3+3} + 1/r^{3+2+2} + 1/r^{3+2+3} + 1/r^{3+3+2} + 1/r^{3+3+3}
    = 1/r^6 + 3/r^7 + 3/r^8 + 1/r^9

Here, N6 = N9 = 1, N7 = N8 = 3, and Nj = 0 if j ≠ 6, 7, 8, 9.

Now, Nj counts the n-codeword messages of total length j. Since the code is UD, each string of length j decodes to at most one such message, and there are only r^j strings of length j, so Nj ≤ r^j. Therefore,

K^n = ∑_{j=1}^{nℓq} Nj / r^j ≤ ∑_{j=1}^{nℓq} r^j / r^j = ∑_{j=1}^{nℓq} 1 = nℓq

This inequality holds for all n = 1, 2, . . . ; if K were greater than 1, the left-hand side would grow exponentially in n, whereas the right-hand side grows only linearly. We conclude that K ≤ 1.

We have proved that 2 ⇒ 1 and that 1 ⇒ 3. To conclude the proof, let us also prove that 3 ⇒ 2 (just for r = 2). Suppose therefore that K ≤ 1; we wish to construct an I-code. Set

c1 = 0···0 (ℓ1 zeros)

and

c2 = 0···01 0···0 (a block 0···01 of ℓ1 digits, followed by ℓ2 − ℓ1 zeros).


Page 8: MATH3411 Chapter 3

For i ≥ 3, set ci = ci1 ci2 · · · ciℓi, where the binary digits ci1, ci2, . . . , ciℓi satisfy

∑_{j=1}^{i−1} 1/2^{ℓj} = ci1/2 + ci2/2^2 + · · · + ciℓi/2^{ℓi} = ∑_{k=1}^{ℓi} cik/2^k

Such ci1, . . . , ciℓi exist since

∑_{j=1}^{q} 1/2^{ℓj} = K ≤ 1.

For instance,

ℓ1 = 2   c1 = 00
ℓ2 = 3   c2 = 010
ℓ3 = 3   c3 = 011
ℓ4 = 4   c4 = 1000

∑_{j=1}^{3} 1/2^{ℓj} = 1/2^2 + 1/2^3 + 1/2^3 = 1/2 = 1/2^1 + 0/2^2 + 0/2^3 + 0/2^4

These binary expansions are unique, so the code is UD.
Assume that the code is not instantaneous. Then some cu is a prefix of some cv, where u < v. Then ℓu < ℓv, so

∑_{j=1}^{v−1} 1/2^{ℓj} − ∑_{j=1}^{u−1} 1/2^{ℓj} = ∑_{j=u}^{v−1} 1/2^{ℓj} ≥ 1/2^{ℓu}

However, since cvk = cuk for k ≤ ℓu, we also have

∑_{j=1}^{v−1} 1/2^{ℓj} − ∑_{j=1}^{u−1} 1/2^{ℓj} = ∑_{k=1}^{ℓv} cvk/2^k − ∑_{k=1}^{ℓu} cuk/2^k = ∑_{k=ℓu+1}^{ℓv} cvk/2^k ≤ ∑_{k=ℓu+1}^{ℓv} 1/2^k < ∑_{k=ℓu+1}^{∞} 1/2^k = 1/2^{ℓu}

This is a contradiction, so the proof is finished. ✷
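The construction in the last part of the proof is entirely mechanical: ci is the first ℓi binary digits of ∑_{j<i} 1/2^{ℓj}. A Python sketch reproducing the instance above (an illustration, not part of the notes):

from fractions import Fraction

def construct_icode(lengths):
    # lengths must be non-decreasing with Kraft sum K <= 1
    codewords, total = [], Fraction(0)
    for l in lengths:
        x, bits = total, ""
        for _ in range(l):           # read off l binary digits of total
            x *= 2
            bits += str(int(x >= 1))
            x -= int(x)
        codewords.append(bits)
        total += Fraction(1, 2**l)
    return codewords

print(construct_icode([2, 3, 3, 4]))  # ['00', '010', '011', '1000']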

Page 9: MATH3411 Chapter 3

Chapter 3: Compression Coding

Lecture 12

Define

a source S with symbols s1, . . . , sq and probabilities p1, . . . , pq

a code C with codewords c1, . . . , cq of lengths ℓ1, . . . , ℓq

and radix r

By shuffling source symbols, we may assume that p1 ≥ p2 ≥ · · · ≥ pq.

The (expected or) average length and variance of codewords in C are

L = ∑_{i=1}^{q} pi ℓi        V = ( ∑_{i=1}^{q} pi ℓi^2 ) − L^2

A UD-code is minimal with respect to p1, . . . , pq if it has minimal average length L.

Example
A code C has the codewords 0, 10, 11 with probabilities 1/2, 1/4, 1/4.
Its average length and variance are

L = 1/2 × 1 + 1/4 × 2 + 1/4 × 2 = 3/2

V = 1/2 × 1^2 + 1/4 × 2^2 + 1/4 × 2^2 − L^2 = 5/2 − (3/2)^2 = 1/4

It is easy to see that C is minimal with respect to 1/2, 1/4, 1/4.

Example
A code C′ has the codewords 10, 0, 11 with probabilities 1/2, 1/4, 1/4.
Its average length is

L = 1/2 × 2 + 1/4 × 1 + 1/4 × 2 = 7/4 > 3/2

We see that C′ is not minimal with respect to 1/2, 1/4, 1/4.

Page 10: MATH3411 Chapter 3

Theorem
If a binary UD-code has minimal average length L with respect to p1, . . . , pq, then, possibly after permuting codewords of equally likely symbols:

1 ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓq
2 The code may be assumed to be instantaneous.
3 K = ∑_{i=1}^{q} 2^{−ℓi} = 1
4 ℓq−1 = ℓq
5 cq−1 and cq differ only in their last place.

Proof
1 Suppose that pm > pn and ℓm > ℓn. Swapping cm and cn gives a new code with smaller L, a contradiction.

2 Use the Kraft-McMillan Theorem.

3 If K < 1, then the code can be shortened, reducing L, a contradiction.

4 We know that ℓq−1 ≤ ℓq. If ℓq−1 < ℓq, then there must be nodes in the decision tree where no choice is made, implying K < 1, a contradiction.

5 The tree must end with a simple fork, with edge 0 ending at sq−1 and edge 1 at sq. Therefore, cq−1 and cq differ only in their last place. ✷


Page 11: MATH3411 Chapter 3

Huffman’s Algorithm (binary)

Input: a source S = {s1, . . . , sq} and probabilities p1, . . . , pq
Output: a code C for S, given by a decision tree

Combining phase
Replace the last two symbols sq−1 and sq by a new symbol sq−1,q with probability pq−1 + pq.
Reorder the symbols s1, . . . , sq−2, sq−1,q by their probabilities.
Repeat until there is only one symbol left.

Splitting phase
Root-label this symbol.
Draw edges from each symbol sa,b to the symbols sa and sb.
Label the edge sa sa,b by 0 and the edge sb sa,b by 1.

The resulting code depends on the reordering of the symbols.

Example
In the place-low strategy, we place sa,b as low as possible.
Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.

[Huffman combining-and-splitting diagram.] The resulting codewords are

s1 → 00, s2 → 10, s3 → 11, s4 → 011, s5 → 0100, s6 → 0101

L = 0.3×2 + 0.2×2 + 0.2×2 + 0.1×3 + 0.1×4 + 0.1×4 = 2.5
V = 0.3×2^2 + 0.2×2^2 + 0.2×2^2 + 0.1×3^2 + 0.1×4^2 + 0.1×4^2 − L^2 = 0.65
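The combining phase is naturally implemented with a priority queue. A compact Python sketch (an illustration, not part of the notes; the heap's tie-breaking will generally reproduce neither the place-low nor the place-high ordering, but any code it returns has the same minimal average length):

import heapq

def huffman(probs):
    # heap entries: (probability, unique tag, {symbol index: codeword})
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # the two least likely symbols
        p2, tag, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tag, merged))
    return heap[0][2]

probs = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
code = huffman(probs)
print(code, sum(p * len(code[i]) for i, p in enumerate(probs)))  # L = 2.5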

Page 12: MATH3411 Chapter 3

Example
In the place-high strategy, we place sa,b as high as possible.
Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1.

[Huffman combining-and-splitting diagram.] The resulting codewords are

s1 → 01, s2 → 11, s3 → 000, s4 → 001, s5 → 100, s6 → 101

L = 0.3×2 + 0.2×2 + 0.2×3 + 0.1×3 + 0.1×3 + 0.1×3 = 2.5
V = 0.3×2^2 + 0.2×2^2 + 0.2×3^2 + 0.1×3^2 + 0.1×3^2 + 0.1×3^2 − L^2 = 0.25

The average length is the same as for the place-low strategy, but the variance is smaller. It turns out that this is always the case, so we will only use the place-high strategy.

The Huffman Code Theorem
For any given source S and corresponding probabilities, the Huffman algorithm yields an instantaneous, minimal UD-code.


Page 13: MATH3411 Chapter 3

Chapter 3: Compression Coding

Lectures 13-14

The Huffman Code Theorem
For any given source S and corresponding probabilities, the Huffman algorithm yields an instantaneous, minimal UD-code.

Proof
We proceed by induction on q = |S|. For q = 2, each Huffman code is an instantaneous UD-code with minimum average length L = 1.

Now assume that each Huffman code on q − 1 symbols is an instantaneous UD-code with minimum average length. Let C be a Huffman code on q symbols with average length L, and let C∗ be any UD-code on q symbols with minimum average length L∗. Denote the codeword lengths of C and C∗ by ℓ1, . . . , ℓq and ℓ∗1, . . . , ℓ∗q. By construction, cq and cq−1 in C differ only in their last coordinate. By minimality, C∗ has codewords c∗q, c∗q−1 differing only in the last coordinate. Combine cq and cq−1 in C to get a Huffman code on q − 1 symbols, and combine c∗q and c∗q−1 in C∗ to get a UD-code on q − 1 symbols. Denote the average lengths of these codes by M and M∗, respectively. By the induction hypothesis, M ≤ M∗, so

L − L∗ = ∑_{i=1}^{q} ℓi pi − ∑_{i=1}^{q} ℓ∗i pi
       = ( ∑_{i=1}^{q−2} ℓi pi + ℓq−1 pq−1 + ℓq pq ) − ( ∑_{i=1}^{q−2} ℓ∗i pi + ℓ∗q−1 pq−1 + ℓ∗q pq )
       = ( ∑_{i=1}^{q−2} ℓi pi + ℓq (pq−1 + pq) ) − ( ∑_{i=1}^{q−2} ℓ∗i pi + ℓ∗q (pq−1 + pq) )
       = ( ∑_{i=1}^{q−2} ℓi pi + (ℓq − 1)(pq−1 + pq) ) − ( ∑_{i=1}^{q−2} ℓ∗i pi + (ℓ∗q − 1)(pq−1 + pq) )
       = M − M∗ ≤ 0

Thus L ≤ L∗, so the Huffman code has minimum average length. The code is created using a decision tree, so it is instantaneous. The proof follows by induction. ✷

Page 14: MATH3411 Chapter 3

Theorem (Knuth)
The average codeword length L of each Huffman code is the sum of all child node probabilities.

[The place-high Huffman diagram from the previous example, with codewords 01, 11, 000, 001, 100, 101 and child node probabilities 0.2, 0.3, 0.4, 0.6, 1.0.]

L = 0.3×2 + 0.2×2 + 0.2×3 + 0.1×3 + 0.1×3 + 0.1×3 = 2.5
L = 1.0 + 0.6 + 0.4 + 0.3 + 0.2 = 2.5

Proof
The tree-path for symbol si passes through exactly ℓi child nodes. But pi occurs as part of the sum in each of these child nodes. So, adding all child node probabilities adds ℓi copies of pi for each si; this is L. ✷

Huffman’s Algorithm also works for radix r: just combine r symbols at each step instead of 2.

However, there are at least two ways to do this:
1 Combine the last r symbols at each combining step.
2 First add dummy symbols; then combine the last r symbols at each step.

It turns out that 2 is the best strategy. If there are k combining steps, then we need

|S| = k(r − 1) + r = (k + 1)(r − 1) + 1

initial symbols. In other words, we must have |S| ≡ 1 (mod r − 1). We can ensure this by adding dummy symbols.

Page 15: MATH3411 Chapter 3

Huffman’s Algorithm (radix r)

Input: a source S = {s1, . . . , sq} and probabilities p1, . . . , pq
Output: a code C for S, given by a decision tree

Add dummy symbols until q = |S| ≡ 1 (mod r − 1).

Combining phase
Replace the last r symbols sq−r+1, . . . , sq by a new symbol with probability pq−r+1 + · · · + pq.
Reorder the symbols by their probabilities.
Repeat until there is only one symbol left.

Splitting phase
Root-label this symbol.
Draw edges from each child node to the r preceding nodes.
Label these edges from top to bottom by 0, . . . , r − 1.

Example

Consider a source s1, . . . , s6 with probabilities 0.3, 0.2, 0.2, 0.1, 0.1, 0.1 .

[Radix-3 Huffman diagram, with one dummy symbol of probability 0.] The resulting codewords are

s1 → 1, s2 → 00, s3 → 01, s4 → 02, s5 → 20, s6 → 21

L = 1.0 + 0.5 + 0.2 = 1.7
V = 0.3×1^2 + · · · + 0.1×2^2 − L^2 = 0.21

With radix r = 3, we have |S| = 6 ≡ 0 (mod r − 1), so we need to add 1 dummy symbol.
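A radix-r version of the earlier Huffman sketch (again an illustration, not part of the notes): pad with zero-probability dummy symbols until |S| ≡ 1 (mod r − 1), then merge the r least likely symbols at each step.

import heapq

def huffman_r(probs, r):
    q = len(probs)
    dummies = (1 - q) % (r - 1)          # dummy symbols needed
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heap += [(0.0, q + j, {}) for j in range(dummies)]
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total, tag = {}, 0.0, None
        for digit in range(r):           # combine the r least likely
            p, tag, c = heapq.heappop(heap)
            total += p
            merged.update({s: str(digit) + w for s, w in c.items()})
        heapq.heappush(heap, (total, tag, merged))
    return heap[0][2]

print(huffman_r([0.3, 0.2, 0.2, 0.1, 0.1, 0.1], 3))  # L = 1.7 here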


Page 16: MATH3411 Chapter 3

Extensions

Given a source S = {s1, . . . , sq} with associated probabilities p1, . . . , pq, the nth extension is the Cartesian product

S^n = S × · · · × S (n factors) = {s′1 · · · s′n : s′1, . . . , s′n ∈ S} = {σ1, . . . , σ_{q^n}}

together with the following probability for each new symbol σi ∈ S^n:

p(n)i = P(σi) = P(s′1 · · · s′n) = P(s′1) · · · P(s′n)

Note that we implicitly assume that successive source symbols are independent.

We usually order the symbols σi so that p(n)1, . . . , p(n)_{q^n} are non-increasing.

Example
Consider the source S = {a, b} with associated probabilities 3/4, 1/4.
We apply the (binary) Huffman algorithm:

S^1 = S   pi     ci
a         3/4    0
b         1/4    1

S^2   p(2)i   ci
aa    9/16    0
ab    3/16    11
ba    3/16    100
bb    1/16    101

S^3   p(3)i   ci
aaa   27/64   1
aab   9/64    001
aba   9/64    010
baa   9/64    011
abb   3/64    00000
bab   3/64    00001
bba   3/64    00010
bbb   1/64    00011

L(1) = L = 1     L(2) = 27/16     L(3) = 158/64

Average length per S-symbol for S:   1
Average length per S-symbol for S^2: L(2)/2 = 27/32 ≈ 0.84
Average length per S-symbol for S^3: L(3)/3 = 158/192 ≈ 0.82

Page 17: MATH3411 Chapter 3

Markov sources

A k-memory source S is one whose symbols each depend on the previous k.

If k = 0, then no symbol depends on any other, and S is memoryless.

If k = 1, then S is a Markov source.

pij = P (si|sj) is the probability of si occurring right after a given sj .

The matrix M = (pij) is the transition matrix.

Entry pij is the probability of getting from state sj to state si.

A Markov process is a set of states (the source S) and probabilities pij = P(si|sj) of getting from state sj to state si.

Example

Consider Sydney, Melbourne, and Elsewhere in Australia. A simple Markov model for the populations of these is that, each year,

population growth by births, deaths, and emigration/immigration is 0%;
of people living in Sydney, 5% move to Melbourne and 3% move Elsewhere;
of people living in Melbourne, 4% move to Sydney and 2% move Elsewhere;
of people living Elsewhere, 7% move to Sydney and 6% move to Melbourne.

S = {Sydney, Melbourne, Elsewhere}

[State diagram with self-loop probabilities 0.92 (S), 0.94 (M), 0.87 (E).]

            From:  S      M      E
M =   To:   S     0.92   0.04   0.07
            M     0.05   0.94   0.06
            E     0.03   0.02   0.87

Lemma
The sum of the entries in any column of M is 1.


Page 18: MATH3411 Chapter 3

Let xk = (sk, mk, ek)ᵀ denote the population distribution after k years.

Then xk+1 = M xk and xk = M^k x0.

Suppose that the initial population distribution is x0 = (4.5M, 4M, 14M)ᵀ.

After k = 20 years, the population distribution is then

x20 = M^20 x0 ≈
[ 0.41  0.34  0.38 ] [ 4.5M ]    [  8.5M ]
[ 0.42  0.52  0.44 ] [  4M  ]  ≈ [ 10.1M ]
[ 0.16  0.15  0.19 ] [ 14M  ]    [  4.0M ]

Note that S and M^20 also form a Markov chain. E.g., after 20 years, most people will have left Sydney (41% remain), whereas most people will have stayed in Melbourne (52%).

To find a stable population distribution, we need to find a state x0 for which xk = xk−1 = · · · = x1 = x0; that is, Mx0 = x0.

In other words, we need an eigenvector x0 of M for the eigenvalue 1; e.g., x0 = (0.6, 0.76, 0.26)ᵀ.

If we want an eigenvector with actual population numbers, then we must scale x0 by (4.5M + 4M + 14M)/(0.6 + 0.76 + 0.26):

x0 = (8.3M, 10.6M, 3.6M)ᵀ

A Markov process M is in equilibrium p if p = Mp. In this case, p is an eigenvector of M for the eigenvalue 1.

We will assume that
M is ergodic: we can get from any state j to any state i;
M is aperiodic: the gcd of the cycle lengths is 1.

Theorem
Under the above assumptions, M has a non-zero equilibrium state.

We will only consider equilibria p with |p| = 1.
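Numerically, an equilibrium is just an eigenvector for the eigenvalue 1. A Python sketch for the population example (an illustration, not part of the notes):

import numpy as np

M = np.array([[0.92, 0.04, 0.07],
              [0.05, 0.94, 0.06],
              [0.03, 0.02, 0.87]])

vals, vecs = np.linalg.eig(M)
p = np.real(vecs[:, np.argmin(np.abs(vals - 1))])  # eigenvector for 1
p = p / p.sum() * 22.5          # scale to the 22.5M total population
print(p)                        # approx. [8.3, 10.6, 3.6] (millions)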

Page 19: MATH3411 Chapter 3

Chapter 3: Compression Coding

Lecture 15

Huffman coding for stationary Markov sources

Consider a Markov source S = {s1, . . . , sq} with probabilities p1, . . . , pq, transition matrix M, and equilibrium p.

Define

HuffE : the binary Huffman code on p (ordered)

Huff(i): the binary Huffman code on the (ordered) ith column of M .

HuffM : the first symbol of a message is encoded by HuffE ; each later symbol is encoded by Huff(i), where si is the symbol immediately before it.

This gives average lengths LE and L(1), . . . , L(q), with

LM ≈ p1 L(1) + · · · + pq L(q).

Importantly, LM ≤ LE.

Example
The transition matrix

M =
[ 0.3   0.1   0.10 ]
[ 0.5   0.1   0.55 ]
[ 0.2   0.8   0.35 ]

has equilibrium p = (1/8) (1, 3, 4)ᵀ.

     pi    HuffE
s1   1/8   01
s2   3/8   00
s3   1/2   1

     pi    Huff(1)
s1   0.3   00
s2   0.5   1
s3   0.2   01

     pi    Huff(2)
s1   0.1   10
s2   0.1   11
s3   0.8   0

     pi     Huff(3)
s1   0.10   11
s2   0.55   0
s3   0.35   10

LE = 1.5     L(1) = 1.5     L(2) = 1.2     L(3) = 1.45

LM = (1/8) L(1) + (3/8) L(2) + (4/8) L(3) ≈ 1.36 < LE

Therefore, compared to a 2-bit block code C, this Huffman code compresses the message length to

LM / LC = 1.36 / 2 = 68%.


Page 20: MATH3411 Chapter 3

Let us now encode the message s1s2s3s3s2s1s2 :

symbol   code to use   encoded symbol
s1       HuffE         01
s2       Huff(1)       1
s3       Huff(2)       0
s3       Huff(3)       10
s2       Huff(3)       0
s1       Huff(2)       10
s2       Huff(1)       1

The message is encoded as 0110100101.

Let us now decode the message 0110100101 :

code to use   encoded symbol   decoded symbol
HuffE         01               s1
Huff(1)       1                s2
Huff(2)       0                s3
Huff(3)       10               s3
Huff(3)       0                s2
Huff(2)       10               s1
Huff(1)       1                s2

The message is decoded as s1s2s3s3s2s1s2.
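A Python sketch of HuffM encoding for this example, with HuffE and the three column codes hard-coded from the tables above (an illustration, not part of the notes):

huff_E   = {"s1": "01", "s2": "00", "s3": "1"}
huff_col = {"s1": {"s1": "00", "s2": "1",  "s3": "01"},   # Huff(1)
            "s2": {"s1": "10", "s2": "11", "s3": "0"},    # Huff(2)
            "s3": {"s1": "11", "s2": "0",  "s3": "10"}}   # Huff(3)

def encode(message):
    out, prev = [], None
    for s in message:
        # first symbol: equilibrium code; later: code of previous column
        table = huff_E if prev is None else huff_col[prev]
        out.append(table[s])
        prev = s
    return "".join(out)

print(encode(["s1", "s2", "s3", "s3", "s2", "s1", "s2"]))  # 0110100101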


Page 21: MATH3411 Chapter 3

Compression Coding

Huffman coding

Huffman coding of extensions

Huffman coding of Markov sources

Arithmetic coding

Dictionary methods

Lossy compression

much more

Arithmetic coding

Input: a source S = {s1, . . . , sq}, where sq = • is a stop symbol,
probabilities p1, . . . , pq,
and a message si1 · · · sin where sin = sq = •

Output: The message encoded, given by a number between 0 and 1

Algorithm:
Partition the interval [0, 1) into sub-intervals of lengths p1, . . . , pq.
Crop to the i1th sub-interval.
Partition this sub-interval according to the relative lengths p1, . . . , pq.
Crop to the i2th sub-sub-interval.
Repeat in this way until the whole message has been encoded.


Page 22: MATH3411 Chapter 3

Example
Consider symbols s1, s2, s3, s4 = • with probabilities 0.4, 0.3, 0.15, 0.15.
Let us encode the message s2s1s3• :

[The interval [0, 1) is partitioned at 0, .4, .7, .85, 1.]

     subinterval start          subinterval width
     0                          1
s2   0 + .4×1 = .4              .3×1 = .3
s1   .4 + 0×.3 = .4             .4×.3 = .12
s3   .4 + .7×.12 = .484         .15×.12 = .018
•    .484 + .85×.018 = .4993    .15×.018 = .0027

We must therefore choose a number in the interval

[0.4993, 0.4993 + 0.0027) = [0.4993, 0.5020)

For instance, we may simply choose the number 0.5.

Example
Consider symbols s1, s2, s3, s4 = • with probabilities 0.4, 0.3, 0.15, 0.15.
Let us decode the number 0.5 :

[The interval [0, 1): s1 on [0, .4), s2 on [.4, .7), s3 on [.7, .85), • on [.85, 1).]

code number                      in interval    decoded symbol
0.5                              [0.4, 0.7)     s2
(0.5 − 0.4)/.3 = 0.33333         [0, 0.4)       s1
(0.33333 − 0)/.4 = 0.83333       [0.7, 0.85)    s3
(0.83333 − 0.7)/.15 = 0.88889    [0.85, 1)      •

The decoded message is then s2s1s3 • .
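Both directions fit in a few lines of Python (an illustration, not part of the notes; floating point suffices for a message this short, whereas practical arithmetic coders use integer arithmetic):

symbols = ["s1", "s2", "s3", "stop"]
probs   = [0.4, 0.3, 0.15, 0.15]
cum     = [0.0, 0.4, 0.7, 0.85]           # sub-interval start points

def encode(message):
    start, width = 0.0, 1.0
    for s in message:
        i = symbols.index(s)
        start += cum[i] * width           # crop to the i-th sub-interval
        width *= probs[i]
    return start, start + width           # pick any number in this range

def decode(x):
    message = []
    while True:
        i = max(j for j in range(len(cum)) if cum[j] <= x)
        message.append(symbols[i])
        if symbols[i] == "stop":
            return message
        x = (x - cum[i]) / probs[i]       # rescale back to [0, 1)

print(encode(["s2", "s1", "s3", "stop"]))  # (0.4993, 0.502)
print(decode(0.5))                         # ['s2', 's1', 's3', 'stop']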


Page 23: MATH3411 Chapter 3

Dictionary methods

LZ77, LZ78, LZW, and others; used, for instance, in gzip, GIF, and PostScript.

LZ78

Input: a message r = r1 · · · rn
Output: the message encoded, given by a dictionary

Algorithm:
Begin with a dictionary D = {∅}.
Find the longest prefix s of r that is in D (possibly ∅), say in entry ℓ.
Find the symbol c just after s.
Append sc to D, remove sc from the front of r, and output (ℓ, c).
Repeat in this way until the whole message has been encoded.

Loosely speaking, LZ78 encodes by finding new codewords, adding them to a dictionary, and recognising them subsequently.


Page 24: MATH3411 Chapter 3

Example
Let us encode the message abbcbcababcaa :

r               s     ℓ   new dictionary entry   output
abbcbcababcaa   ∅     0   1. a                   (0, a)
bbcbcababcaa    ∅     0   2. b                   (0, b)
bcbcababcaa     b     2   3. bc                  (2, c)
bcababcaa       bc    3   4. bca                 (3, a)
babcaa          b     2   5. ba                  (2, a)
bcaa            bca   4   6. bcaa                (4, a)

The message is encoded as (0,a)(0,b)(2,c)(3,a)(2,a)(4,a)

Example
Let us decode the message (0,c)(0,a)(2,a)(3,b)(4,c)(4,b) :

output   new dictionary entry
(0, c)   1. c
(0, a)   2. a
(2, a)   3. aa
(3, b)   4. aab
(4, c)   5. aabc
(4, b)   6. aabb

The message is decoded as caaaaabaabcaabb.
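Both directions of LZ78 are short in Python (an illustration, not part of the notes):

def lz78_encode(message):
    dictionary, output, s = {"": 0}, [], ""
    for c in message:
        if s + c in dictionary:
            s += c                        # keep extending the match
        else:
            output.append((dictionary[s], c))
            dictionary[s + c] = len(dictionary)
            s = ""
    if s:                                 # message ended inside a match
        output.append((dictionary[s[:-1]], s[-1]))
    return output

def lz78_decode(pairs):
    dictionary, out = [""], []
    for idx, c in pairs:
        phrase = dictionary[idx] + c      # entry idx plus one new symbol
        dictionary.append(phrase)
        out.append(phrase)
    return "".join(out)

print(lz78_encode("abbcbcababcaa"))
# [(0,'a'), (0,'b'), (2,'c'), (3,'a'), (2,'a'), (4,'a')]
print(lz78_decode([(0,'c'), (0,'a'), (2,'a'), (3,'b'), (4,'c'), (4,'b')]))
# caaaaabaabcaabb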
