Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer...
Transcript of Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer...
![Page 1: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/1.jpg)
Information TheoryLecture 6LZ codes
STEFAN HÖST
![Page 2: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/2.jpg)
Universal Source Coding
Introduction
If the statistics of the source is unknown:
• Use static dictionary.
• Estimate the probability distribution from the source block anduse e.g. Huffman. (Two sweep)
• Adaptive algorithm– Adaptive Huffman coding. (One sweep or windowed)– Sequential algorithms (windowed).
Stefan Höst Information theory 1
![Page 3: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/3.jpg)
Lempel-Ziv ’77
LZ77 (Lempel-Ziv 1977)
A sliding window technique with two buffers:
• Search buffer of length S (recent symbols)
• Look ahead buffer of length B (upcoming symbols)
• The offset j is the number of steps from the first symbol in thelook ahead buffer to the same symbol in the search buffer.
• The length of a match l is the number of symbols with matchbetween the buffers, starting at the offset.
Stefan Höst Information theory 2
![Page 4: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/4.jpg)
Lempel-Ziv ’77
S B
x : · · · · · · · · · a b c d x · · · a b c d z · · · · · · · · ·
time = n
jl l
Stefan Höst Information theory 3
![Page 5: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/5.jpg)
Lempel-Ziv ’77
LZ77 (Lempel-Ziv 1977)
• Init: Read the first S symbols to the search buffer.
• For the symbol xn, determine the offset with longest match.• The codeword is y = (j, l, r), where
– j is the offset– l is the length– r is the next symbol in the look ahead buffer
If there is no match send (0, 0, xn).
`(x) = dlogSe+ dlogBe+ dlog ke
Stefan Höst Information theory 4
![Page 6: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/6.jpg)
Lempel-Ziv ’77
Example
Consider a source alphabet X = {a, b, c, d , r}Encode the sequence
x =cabracadabrar rar rad . . .
with LZ77 using
• Search buffer of length S = 7
• Look ahead buffer of length B = 6
Stefan Höst Information theory 5
![Page 7: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/7.jpg)
Lempel-Ziv ’77
Improvements of LZ77
• Encode the triples (codewords) with a second, variablecodeword length, code (e.g. Huffman).
• (LZSS) Only reason to include uncoded char is to cope withno match.Use binary prefix to distinguish:
– If x in S: y = (0, j, l)– In x not in S: y = (1, x)
Stefan Höst Information theory 6
![Page 8: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/8.jpg)
Lempel-Ziv ’77
Example
Consider a source alphabet X = {a, b, c, d , r}Encode the sequence
x =cabracadabrar rar rad . . .
with LZSS using
• Search buffer of length S = 7
• Look ahead buffer of length B = 6
Stefan Höst Information theory 7
![Page 9: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/9.jpg)
Lempel-Ziv ’78
LZ77→LZ78
If the repetition period in the source text is longer than the bufferLZ77 will run into problems.
LZ78: Drop the buffer and build a dictionary instead.
Stefan Höst Information theory 8
![Page 10: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/10.jpg)
Lempel-Ziv ’78
LZ78 (Lempel-Ziv 1978)
• Init: Start with an empty dictionary.
• At time n, determine the largest l such that xn . . . xn+l−1 is inthe dictionary but xn . . . xn+l−1xn+l is not. Let j be a pointer tothe dictionary position.
• The codeword for xn . . . xn+l−1xn+l is y = (j, xn+l).Use
⌈log(max index)
⌉bits to encode j
• Add xn . . . xn+l−1xn+l to the dictionary.
• If there is no match send (0, xn).
Stefan Höst Information theory 9
![Page 11: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/11.jpg)
Lempel-Ziv ’78
Example
Consider a source alphabet X = {a, b, c, d , r}Encode the sequence
x =cabracadabrar rar rad . . .
with LZ78.
Stefan Höst Information theory 10
![Page 12: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/12.jpg)
Lempel-Ziv ’78
LZ78→LZW
Get rid of the uncoded symbol in the codeword.
Only used together with (0, r) when a single letter is not in thedictionary.⇒Initiate the dictionary with the alphabet.
Stefan Höst Information theory 11
![Page 13: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/13.jpg)
Lempel-Ziv ’78
LZW (Lempel-Ziv-Welch)
• Init: Start with a dictionary with all letters.
• At time n, determine the largest l such that xn . . . xn+l−1 is inthe dictionary but xn . . . xn+l−1xn+l is not. Let j be a pointer tothe dictionary position.
• The codeword for xn . . . xn+l−1 is y = (j).
• Add xn . . . xn+l−1xn+l to the dictionary.
Stefan Höst Information theory 12
![Page 14: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/14.jpg)
Lempel-Ziv ’78
Example
Consider a source alphabet X = {a, b, c, d , r}Encode the sequence
x =cabracadabrar rar rad . . .
with LZW.
Stefan Höst Information theory 13
![Page 15: Information Theory Lecture 6 LZ codes · LZ78 If the repetition period in the source text is longer than the buffer LZ77 will run into problems. LZ78: Drop the buffer and build a](https://reader033.fdocuments.net/reader033/viewer/2022060500/5f1aa8d9531bbd720a76a27a/html5/thumbnails/15.jpg)
Lempel-Ziv in the real world
Applications of LZ
The LZ algorithms are used in many applications (often followed byHuffman). Among others:• LZ77
– deflate (implementation of LZ77 + Huffman)– zip, gzip, 7z– PNG
• LZW– compress (UNIX)– GIF, TIFF
Stefan Höst Information theory 14