Post on 18-Jan-2018
Lossless Decomposition and Huffman Codes
Sophia Soohoo, CS 157B
Lossless Data Compression
o Any compression algorithm can be viewed as a function that maps sequences of units into other sequences of units.
o Lossless means the original data can be reconstructed exactly from the compressed data.
o Lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange for better compression rates.
David A. Huffman
o BS in Electrical Engineering, Ohio State University
o Worked as a radar maintenance officer for the US Navy
o PhD student in Electrical Engineering at MIT, 1952
o Was given the choice of writing a term paper or taking a final exam
o Paper topic: the most efficient method for representing numbers, letters, or other symbols as binary code
Huffman Coding
o Uses the minimum number of bits
o Variable-length coding – good for data transfer
o Different symbols have different code lengths
o Symbols with the highest frequency get the shortest codewords
o Symbols with lower frequency get longer codewords
o "Z" will have a longer code representation than "E" if looking at the frequency of character occurrences in the alphabet
o No codeword is a prefix of another codeword!
Decoding
Symbol  Code
E       0
T       11
N       100
I       1010
S       1011
To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols.
Decode the following: 11010010010101011
Decoding
Symbol  Code
E       0
T       11
N       100
I       1010
S       1011
11 0 100 100 1010 1011
T  E N   N   I    S
Original string: 11010010010101011
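The left-to-right scan above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; it assumes the symbol table shown on this slide.

```python
# Codeword table from the slide above.
CODE = {"0": "E", "11": "T", "100": "N", "1010": "I", "1011": "S"}

def decode(bits):
    """Scan left to right, emitting a symbol whenever the buffer matches a codeword."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:          # prefix-free: the first match is the only match
            out.append(CODE[buf])
            buf = ""
    return "".join(out)

print(decode("11010010010101011"))  # TENNIS
```

Because no codeword is a prefix of another, the decoder never has to backtrack: as soon as the buffer matches a codeword, that match is final.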
Representing a Huffman Table as a Binary Tree
o Codewords are represented by a binary tree
o Each leaf stores a character
o Each internal node has two children
o Left = 0, Right = 1
o The codeword is the path from the root to the leaf storing a given character
o The code represented by the leaves of the tree is the prefix code
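As a sketch of this representation (my own illustration, not from the slides), the decoding table from the earlier slide can be written as nested tuples, where an internal node is a `(left, right)` pair and a leaf is its character. Reading codewords off the tree is then a depth-first walk:

```python
# Tree for the earlier decoding table: E on the 0 branch; T, N, I, S under 1.
tree = ("E", (("N", ("I", "S")), "T"))

def codewords(node, prefix=""):
    """Walk the tree: a left edge appends 0, a right edge appends 1; leaves yield codes."""
    if isinstance(node, str):          # a leaf stores a character
        return {node: prefix}
    left, right = node
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table

print(codewords(tree))
```

Because every character sits at a leaf, no codeword can be the prefix of another: a prefix of a codeword is the path to an internal node, which stores no character.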
Constructing Huffman Codes
oGoal: construct a prefix code for Σ: associate each letter i with a codeword wi to minimize the average codeword length:
L = Σi pi · |wi|, where pi = probability of letter i and |wi| = length of codeword wi
Example
Letter  pi   wi
A       0.1  000
B       0.1  001
C       0.2  01
D       0.3  10
E       0.3  11
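The average-length formula can be checked directly on this table (a quick illustration, assuming the five letters above): 0.1·3 + 0.1·3 + 0.2·2 + 0.3·2 + 0.3·2 = 2.2 bits per symbol.

```python
# Average codeword length L = sum of p_i * |w_i| over the example table.
table = {"A": (0.1, "000"), "B": (0.1, "001"), "C": (0.2, "01"),
         "D": (0.3, "10"), "E": (0.3, "11")}

L = sum(p * len(w) for p, w in table.values())
print(round(L, 2))  # 2.2
```

For comparison, a fixed-length code for five symbols would need 3 bits each, so this variable-length code saves 0.8 bits per symbol on average.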
Algorithm
1. Make a leaf node for each symbol and assign the symbol's probability to its leaf node.
2. Take the two nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes).
   o Add 1 for the right edge, 0 for the left edge.
   o The probability of the new node is the sum of the probabilities of the two connected nodes.
3. If there is only one node left, the code construction is complete. If not, go back to step 2.
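The steps above can be sketched with a priority queue (a min-heap keyed on probability). This is my own illustrative implementation, not from the slides; which node goes left versus right on ties is a free choice, so other bit patterns of the same lengths are equally valid.

```python
import heapq

def huffman_codes(probs):
    """Repeatedly merge the two lowest-probability nodes, prepending
    0 along the left branch and 1 along the right branch."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # smallest probability
        p2, _, right = heapq.heappop(heap)  # second smallest
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))  # parent = sum of children
        tie += 1
    return heap[0][2]

codes = huffman_codes({"A": 0.387, "B": 0.194, "C": 0.161, "D": 0.129, "E": 0.129})
# With this insertion-order tie-breaking, the result matches the worked
# example that follows: A=0, B=111, C=110, D=100, E=101.
```

The tie-breaker integer in each heap entry keeps tuple comparison away from the dictionaries when two probabilities are equal.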
Example
Symbol  Probability
A       0.387
B       0.194
C       0.161
D       0.129
E       0.129
Example – Creating the tree
(Diagram: one leaf node per symbol — A 0.387, B 0.194, C 0.161, D 0.129, E 0.129.)
Example – Iterate Step 2
Take the two leaf nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes).
o Green nodes – nodes to be evaluated
o White nodes – nodes which have already been evaluated
o Blue nodes – nodes which are added in this iteration
(Diagram: D 0.129 and E 0.129, the two smallest, are joined under a new node with probability 0.258.)
Example – Iterate Step 2
(Diagram: C 0.161 and B 0.194 are now the two smallest and are joined under a new node with probability 0.355.)
Example – Iterate Step 2
Note: when two nodes are connected by a parent, the parent should be evaluated in the next iteration.
(Diagram: the 0.258 and 0.355 nodes are joined under a new node with probability 0.613.)
Example: Completed Tree
(Diagram: the root, probability 1.0, joins A 0.387 on the left and the 0.613 node on the right; 0.613 splits into 0.258 (parent of D and E) on the left and 0.355 (parent of C and B) on the right. Each left edge is labeled 0 and each right edge 1.)
Example: Table for Huffman Code
Symbol  Code
A       0
B       111
C       110
D       100
E       101
Generate the table by reading from the root node to the leaves for each symbol
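Encoding is the reverse lookup: concatenate the codeword for each symbol of the message. A small sketch (my own, assuming the table above):

```python
# Code table read off the completed tree above.
CODES = {"A": "0", "B": "111", "C": "110", "D": "100", "E": "101"}

def encode(text):
    """Concatenate the codeword for each symbol."""
    return "".join(CODES[ch] for ch in text)

print(encode("BEAD"))  # 1111010100
```

Note the asymmetry with decoding: encoding is a direct table lookup per symbol, while decoding must scan bit by bit, which is why the prefix-free property matters.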
Practice
Symbol  Probability  Huffman Code
A       0.45         ?
B       0.13         ?
C       0.12         ?
D       0.16         ?
E       0.09         ?
F       0.05         ?
Practice Solution
(Diagram: F 0.05 and E 0.09 merge into 0.14; C 0.12 and B 0.13 merge into 0.25; 0.14 and D 0.16 merge into 0.30; 0.25 and 0.30 merge into 0.55; finally A 0.45 and 0.55 meet at the root. One valid assignment of 0 to left edges and 1 to right edges gives A 0, B 101, C 100, D 111, E 1101, F 1100.)
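The practice answer can be checked mechanically. The sketch below (my own, not from the slides) reruns the merge steps with a min-heap and reports only the codeword lengths, since the lengths are what the tree pins down; the exact 0/1 bit patterns depend on arbitrary left/right choices.

```python
import heapq

def huffman_lengths(probs):
    """Merge the two smallest nodes until one remains; return codeword lengths.
    Tracking depth instead of bits sidesteps the arbitrary 0/1 labeling."""
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in (*a.items(), *b.items())}  # one level deeper
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_lengths({"A": 0.45, "B": 0.13, "C": 0.12,
                           "D": 0.16, "E": 0.09, "F": 0.05})
print(sorted(lengths.items()))
# [('A', 1), ('B', 3), ('C', 3), ('D', 3), ('E', 4), ('F', 4)]
```

These lengths give an average of 0.45·1 + 0.13·3 + 0.12·3 + 0.16·3 + 0.09·4 + 0.05·4 = 2.24 bits per symbol, versus 3 bits for a fixed-length code over six symbols.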
Questions?