1 Analysis of Algorithms Chapter - 08 Data Compression.
1
Analysis of Algorithms
Chapter - 08 Data Compression
2
This Chapter Contains the following Topics:
1. Why Data Compression?
2. Lossless and Lossy Compression
3. Fixed-Length Coding
4. Variable-Length Coding
5. Huffman Coding
3
Why Data Compression?
What is data compression?
  Transformation of data into a more compact form.
  Compressed data takes fewer bits, so the same information can be transferred in less time than the uncompressed data.
Why compress data?
  Saves storage space.
  Saves transmission time over a network.
Example:
  Suppose the ASCII code of a character takes 1 byte.
  Suppose we have a text file containing one hundred instances of ‘a’, so the file size is about 100 bytes.
  Let us store this as “100a” in a new file to convey the same information.
  The new file size is 4 bytes: 4/100 = 4% of the original, a 96% saving.
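The “100a” idea above is a form of run-length encoding. A minimal Python sketch follows; the function names `rle_encode` and `rle_decode` are illustrative and not part of the slides, and the decoder assumes the original text contains no digit characters.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with <count><char>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

def rle_decode(encoded: str) -> str:
    """Invert rle_encode (assumes the original text had no digits)."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                    # accumulate a multi-digit run length
        else:
            out.append(ch * int(count))    # expand the run
            count = ""
    return "".join(out)

original = "a" * 100                       # one hundred instances of 'a'
packed = rle_encode(original)              # "100a"
saving = 1 - len(packed) / len(original)   # 0.96, i.e. a 96% saving
print(packed, len(packed), f"{saving:.0%}")
```

Because `rle_decode(packed)` reproduces the original string exactly, this is a lossless scheme. It only pays off when the input has long runs; text without repeats would grow rather than shrink.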
4
Lossless and Lossy Data Compression
The last example shows “lossless” compression.
  The original data can be retrieved exactly by decompression.
  Lossless compression is used when data integrity is important.
  Example software: WinZip, gzip, compress, etc.
“Lossy” means the original is not exactly retrievable.
  Reduces size by permanently eliminating certain information.
  When decompressed, only a part of the original information is there (but the user may not notice the difference).
When can we use lossy compression?
  For audio, images, and video.
  JPEG and MPEG are examples of lossy formats.
5
Fixed-Length Coding
Coding: a way to represent information.
Two ways: fixed-length and variable-length coding.
The code for a character is a “codeword”.
We consider binary codes, with each character represented by a unique binary codeword.
Fixed-length coding
  The codeword of every character has the same length.
  E.g., ASCII and fixed-width Unicode encodings such as UTF-32.
  Suppose there are n characters.
  What is the minimum number of bits needed for fixed-length coding? ⌈log2 n⌉
  Example: {a, b, c, d, e}; 5 characters
    ⌈log2 5⌉ = ⌈2.32…⌉ = 3 bits per character
    We can have codewords: a=000, b=001, c=010, d=011, e=100.
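The fixed-length assignment above can be sketched in Python as follows. The helper name `fixed_length_code` is hypothetical; it numbers the characters in the order given and writes each number in binary, padded to the minimal width.

```python
def fixed_length_code(alphabet):
    """Give every symbol a binary codeword of the same minimal length."""
    n = len(alphabet)
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 2, computed without
    # floating point; use at least 1 bit even for a 1-symbol alphabet.
    bits = max(1, (n - 1).bit_length())
    return {sym: format(i, f"0{bits}b") for i, sym in enumerate(alphabet)}

code = fixed_length_code(["a", "b", "c", "d", "e"])
print(code)  # {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
```

With 3 bits there are 8 possible codewords, so three of them (101, 110, 111) go unused for a 5-character alphabet.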
6
Variable-Length Coding
Length of codewords may differ from character to character.
Frequent characters get short codewords.
Infrequent ones get long codewords.
Example:
            a    b    c    d    e     f
Frequency   46   13   12   16   8     5
Codeword    0    101  100  111  1101  1100
Make sure that no codeword occurs as the prefix of another codeword.
  What we need is a “prefix-free code”.
  The last example is a prefix-free code.
Prefix-free codes give unique decoding.
  E.g., “001011101” is decoded as “aabe” based on the table in the last example.
The Huffman coding algorithm shows how to obtain prefix-free codes.
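To illustrate why a prefix-free code decodes uniquely, here is a small Python sketch using the codeword table from the example. The names `CODE`, `is_prefix_free`, and `decode` are illustrative only. The decoder reads bits left to right and emits a character as soon as the bits read so far match a codeword; this is unambiguous precisely because no codeword is a prefix of another.

```python
# Codeword table from the variable-length coding example.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and w.startswith(v)
        for i, v in enumerate(words)
        for j, w in enumerate(words)
    )

def decode(bits, code):
    """Decode a bit string by matching codewords greedily, left to right."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(is_prefix_free(CODE))       # True
print(decode("001011101", CODE))  # aabe
```

A code such as a=0, b=01 would fail the `is_prefix_free` check, and a decoder could not tell after reading “0” whether an ‘a’ had ended or a ‘b’ had begun.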
7
Huffman Coding Algorithm
Huffman invented a greedy method to construct an optimal prefix-free variable-length code.
  The code is based on the frequency of occurrence of each character.
The optimal code is given by a full binary tree.
  Every internal node has 2 children.
  If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.
We build the tree bottom-up.
  Begin with |C| leaves.
  Perform |C|-1 “merging” operations.
Let f [c] denote the frequency of character c.
We use a priority queue Q in which high priority means low frequency.
  GetMin(Q) removes the element with the lowest frequency and returns it.
8
An Algorithm
Input: Alphabet C and frequencies f [ ]
Result: Optimal coding tree for C

Algorithm Huffman(C, f)
{
    n := |C|;
    Q := C;
    for i := 1 to n-1 do
    {
        z := NewNode( );
        x := z.left := GetMin(Q);
        y := z.right := GetMin(Q);
        f [z] := f [x] + f [y];
        Insert(Q, z);
    }
    return GetMin(Q);
}

Running time is O(n lg n) when Q is implemented as a binary min-heap: each of the n-1 iterations performs a constant number of O(lg n) heap operations.
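The pseudocode above can be sketched in Python with the standard `heapq` module playing the role of the priority queue Q. This is one possible rendering, not the slides' own implementation: the function name `huffman_codes`, the tuple representation of internal nodes, and the tie-breaking counter are all assumptions made for illustration. Symbols are assumed to be strings (not tuples), and left edges are labelled 0 and right edges 1.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree from {symbol: frequency}; return {symbol: codeword}."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)                       # Q := C
    for _step in range(len(freq) - 1):        # n-1 merging operations
        fx, _tx, x = heapq.heappop(heap)      # x := GetMin(Q)
        fy, _ty, y = heapq.heappop(heap)      # y := GetMin(Q)
        # z is the pair (left, right); f[z] := f[x] + f[y]
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    _f, _t, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                 # leaf: record its codeword
            codes[node] = prefix or "0"       # single-symbol alphabet gets "0"
    walk(root, "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freq)
print(codes)
```

Different tie-breaking or edge-labelling conventions can produce different codewords, but every Huffman code for the same frequencies has the same total weighted length.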
9
Example
Obtain the optimal coding for the following using the Huffman algorithm:
Character   a    b    c    d    e    f
Frequency   45   13   12   16   9    5
10
Example (Contd.)
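As a check on this example, the total number of bits used by a Huffman code can be computed without building the tree: it equals the sum of the frequencies of all merged (internal) nodes. The Python sketch below uses a hypothetical helper `huffman_cost` and compares the result with 3-bit fixed-length coding for the same six characters.

```python
import heapq

def huffman_cost(freqs):
    """Total encoded length (in bits) of an optimal prefix-free code."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Merge the two lowest frequencies; every merge adds one bit to the
        # codeword of each character underneath the new node.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

freqs = [45, 13, 12, 16, 9, 5]        # a, b, c, d, e, f
variable_bits = huffman_cost(freqs)   # merges: 14 + 25 + 30 + 55 + 100
fixed_bits = 3 * sum(freqs)           # 6 characters need 3 bits each
print(variable_bits, fixed_bits)      # 224 300
```

For these frequencies the variable-length code needs 224 bits per 100 characters against 300 bits for fixed-length coding, a saving of roughly 25%.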
11
End of Chapter - 08