Analysis of Algorithms Chapter - 08 Data Compression



This chapter contains the following topics:

1. Why Data Compression?
2. Lossless and Lossy Compression
3. Fixed-Length Coding
4. Variable-Length Coding
5. Huffman Coding


Why Data Compression?

What is data compression?
Transformation of data into a more compact form.
Because fewer bits have to be moved, compressed data can be transferred in less time than the uncompressed original.

Why compress data?
It saves storage space.
It saves transmission time over a network.

Example: Suppose the ASCII code of a character takes 1 byte, and we have a text file containing one hundred instances of ‘a’. The file size would be about 100 bytes. Let us store this as “100a” in a new file to convey the same information. The new file size would be 4 bytes: 4/100 = 4% of the original size, a 96% saving.
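Storing “100a” instead of a hundred copies of ‘a’ is the idea behind run-length encoding (RLE): each run of a repeated character is written as a count followed by the character. The following is a minimal Python sketch, assuming decimal counts and input text that contains no digit characters (otherwise the decoder could not tell counts from data). The names rle_encode and rle_decode are hypothetical.

```python
def rle_encode(text: str) -> str:
    """Write each run of identical characters as <count><char>.
    Assumes text contains no digits, so counts and data cannot be confused."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend the current run
        out.append(f"{j - i}{text[i]}")
        i = j
    return "".join(out)


def rle_decode(code: str) -> str:
    """Inverse of rle_encode: read digits as a count, then repeat the next character."""
    out = []
    count = ""
    for ch in code:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


original = "a" * 100
encoded = rle_encode(original)
print(encoded)                                    # 100a
print(len(original), len(encoded))                # 100 4
print(f"{1 - len(encoded) / len(original):.0%}")  # 96%
```

Note the trade-off: a character that is not repeated, such as “b”, becomes “1b”, so this particular scheme only saves space when the data contains long runs.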


Lossless and Lossy Data Compression

The last example shows “lossless” compression: the original data can be retrieved exactly by decompression.

Lossless compression is used when data integrity is important. Example software: WinZip, gzip, compress, etc.

“Lossy” means the original is not retrievable. It reduces size by permanently eliminating certain information.

When uncompressed, only part of the original information is there (but the user may not notice it).

When can we use lossy compression? For audio, images, and video. JPEG, MPEG, etc. are examples of lossy compression standards.
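The contrast between the two kinds can be sketched in Python. The lossless half uses the standard zlib module, which implements DEFLATE, the same method gzip uses; decompression gives back the input byte for byte. The lossy half is only a toy illustration, not how JPEG or MPEG actually work: it assumes 8-bit samples and keeps just the top 4 bits of each, so the discarded low bits cannot be recovered.

```python
import zlib

# Lossless: DEFLATE via zlib restores the input exactly.
data = b"abracadabra " * 50
packed = zlib.compress(data)
assert zlib.decompress(packed) == data
print(len(data), len(packed))   # the compressed form is much smaller

# Lossy (toy example): keep only the top 4 bits of each 8-bit sample.
samples = [200, 201, 203, 150, 37]
quantized = [s >> 4 for s in samples]          # 4 bits per sample instead of 8
restored = [(q << 4) | 8 for q in quantized]   # rebuild at the middle of each bucket
print(restored)   # [200, 200, 200, 152, 40]: close to, but not equal to, the original
```

Halving the bits per sample halves the size, but several distinct originals (200, 201, 203) now decode to the same value; that permanently lost detail is what makes the scheme lossy.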


Fixed-Length Coding

Coding: a way to represent information. There are two ways: fixed-length and variable-length coding.

The code for a character is a “codeword”. We consider binary codes, in which each character is represented by a unique binary codeword.

Fixed-length coding: the codeword of every character has the same length, e.g., ASCII, or Unicode in a fixed-width encoding such as UTF-32.

Suppose there are n characters. What is the minimum number of bits needed for fixed-length coding? ⌈log2 n⌉

Example: {a, b, c, d, e}; 5 characters, so ⌈log2 5⌉ = ⌈2.32…⌉ = 3 bits per character.

We can have codewords: a=000, b=001, c=010, d=011, e=100.
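A fixed-length code can be generated by writing each character's index in binary, padded to ⌈log2 n⌉ bits. The Python sketch below assumes characters are numbered in the order given; the helper name fixed_length_code is hypothetical. It computes ⌈log2 n⌉ with integer arithmetic as (n - 1).bit_length(), which avoids floating-point rounding, and uses at least 1 bit so a one-character alphabet still gets a codeword.

```python
def fixed_length_code(alphabet):
    """Assign each character its index in binary, padded to ceil(log2 n) bits."""
    n = len(alphabet)
    bits = max(1, (n - 1).bit_length())   # equals ceil(log2 n) for n >= 2
    return {ch: format(i, f"0{bits}b") for i, ch in enumerate(alphabet)}


code = fixed_length_code("abcde")
print(code)  # {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
```

With 8 characters 3 bits still suffice, but a 9th character forces every codeword up to 4 bits.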


Variable-Length Coding

Length of codewords may differ from character to character.

Frequent characters get short codewords; infrequent ones get long codewords. Example:

Character   a    b    c    d    e     f
Frequency   46   13   12   16   8     5
Codeword    0    101  100  111  1101  1100

Make sure that no codeword occurs as the prefix of another codeword.

What we need is a “prefix-free code”. The last example is a prefix-free code.

Prefix-free codes give unique decoding. E.g., “001011101” is decoded as “aabe” (0 | 0 | 101 | 1101) based on the table in the last example. The Huffman coding algorithm shows how to obtain prefix-free codes.
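Because no codeword is a prefix of another, a decoder can read bits left to right and emit a character as soon as the bits read so far match a codeword. The Python sketch below checks the prefix-free property of the table above, decodes the example string, and totals the bits needed for the 100 characters in the table compared with a 3-bit fixed-length code. The names CODE, FREQ, is_prefix_free and decode are hypothetical.

```python
# Codeword and frequency tables from the variable-length example.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
FREQ = {"a": 46, "b": 13, "c": 12, "d": 16, "e": 8, "f": 5}


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def decode(bits, code):
    """Greedy left-to-right decoding; correct only for prefix-free codes."""
    reverse = {w: ch for ch, w in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:      # prefix-free: the first match is the only match
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(is_prefix_free(CODE))        # True
print(decode("001011101", CODE))   # aabe

# Total bits for all 100 characters: variable-length vs. 3-bit fixed-length.
variable = sum(FREQ[ch] * len(CODE[ch]) for ch in CODE)
fixed = 3 * sum(FREQ.values())
print(variable, fixed)             # 221 300
```

For this frequency table the variable-length code needs 221 bits instead of 300, roughly a 26% saving.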


Huffman Coding Algorithm

Huffman invented a greedy method to construct an optimal prefix-free variable-length code. The code is based on each character's frequency of occurrence.

An optimal code is given by a full binary tree: every internal node has 2 children. If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.

We build the tree bottom-up: begin with |C| leaves and perform |C|-1 “merging” operations. Let f[c] denote the frequency of character c.

We use a priority queue Q in which high priority means low frequency. GetMin(Q) removes the element with the lowest frequency and returns it.
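Python's standard heapq module keeps a list as a binary min-heap, so heapq.heappop can play the role of GetMin(Q) and heapq.heappush the role of Insert(Q, z). The sketch below tracks only frequencies (no tree) to show the |C|-1 merges. It relies on a standard fact: the total number of encoded bits equals the sum of the internal-node frequencies, because each character's frequency is counted once for every internal node above its leaf. The helper name merge_sequence is hypothetical.

```python
import heapq


def merge_sequence(freqs):
    """Return the frequencies of the |C|-1 internal nodes, in merge order."""
    q = list(freqs)
    heapq.heapify(q)                  # min-heap: lowest frequency = highest priority
    merged = []
    for _ in range(len(freqs) - 1):   # |C| - 1 merging operations
        x = heapq.heappop(q)          # GetMin(Q)
        y = heapq.heappop(q)          # GetMin(Q)
        heapq.heappush(q, x + y)      # Insert(Q, z) with f[z] = f[x] + f[y]
        merged.append(x + y)
    return merged


freqs = [46, 13, 12, 16, 8, 5]        # frequencies from the variable-length example
merged = merge_sequence(freqs)
print(merged)        # [13, 25, 29, 54, 100]
print(sum(merged))   # 221
```

The total of 221 bits matches the cost of the codeword table in the variable-length example, which is consistent with that table being an optimal code for those frequencies.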


An Algorithm

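Input: Alphabet C and frequencies f[ ]
Result: Optimal coding tree for C

Algorithm Huffman(C, f)
{
    n := |C|;
    Q := C;
    for i := 1 to n-1 do
    {
        z := NewNode( );
        x := z.left := GetMin(Q);
        y := z.right := GetMin(Q);
        f[z] := f[x] + f[y];
        Insert(Q, z);
    }
    return GetMin(Q);
}

Running time is O(n lg n) when Q is a binary min-heap: building Q takes O(n) time, and each of the n-1 iterations performs two GetMin operations and one Insert, each costing O(lg n).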
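The pseudocode above can be sketched in Python with heapq as the priority queue Q. This is a minimal sketch under stated assumptions, not the slides' own implementation: the names Node, huffman and codewords are hypothetical, left edges are labelled 0 and right edges 1, and an insertion counter breaks frequency ties so heapq never has to compare two Node objects. Other tie-breaking rules can give different, but equally optimal, codes.

```python
import heapq
from itertools import count


class Node:
    """Tree node; leaves carry a character, internal nodes carry two children."""

    def __init__(self, freq, char=None, left=None, right=None):
        self.freq = freq
        self.char = char
        self.left = left
        self.right = right


def huffman(freqs):
    """Build a Huffman tree from a {character: frequency} dict (at least one entry)."""
    tie = count()  # unique tie-breaker so heapq never compares Node objects
    q = [(f, next(tie), Node(f, ch)) for ch, f in freqs.items()]
    heapq.heapify(q)                                # Q := C
    for _ in range(len(freqs) - 1):                 # n - 1 merges
        x = heapq.heappop(q)[2]                     # x := GetMin(Q)
        y = heapq.heappop(q)[2]                     # y := GetMin(Q)
        z = Node(x.freq + y.freq, left=x, right=y)  # f[z] := f[x] + f[y]
        heapq.heappush(q, (z.freq, next(tie), z))   # Insert(Q, z)
    return heapq.heappop(q)[2]                      # return GetMin(Q)


def codewords(node, prefix="", table=None):
    """Walk the tree: a left edge appends 0, a right edge appends 1."""
    if table is None:
        table = {}
    if node.left is None:                   # leaf
        table[node.char] = prefix or "0"    # a one-character alphabet still gets 1 bit
    else:
        codewords(node.left, prefix + "0", table)
        codewords(node.right, prefix + "1", table)
    return table


example = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = codewords(huffman(example))
print(dict(sorted(codes.items())))
# {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
```

The usage lines run the sketch on the frequencies from the example that follows; the resulting codewords coincide with the table in the variable-length coding example.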


Example

Obtain the optimal coding for the following using the Huffman algorithm.

Character   a    b    c    d    e    f
Frequency   45   13   12   16   9    5


Example (Contd.)
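One way to check the answer is a worked cost calculation. Tracing the algorithm on these frequencies, with x placed as the left child and left edges labelled 0 (the edge labelling is an assumption), the merges are f+e = 14, c+b = 25, 14+d = 30, 25+30 = 55 and a+55 = 100. This gives a=0, b=101, c=100, d=111, e=1101, f=1100. Writing d_T(c) for the depth of character c in the tree (a symbol introduced here), the total number of bits for the 100 characters is:

```latex
B(T) = \sum_{c \in C} f[c]\, d_T(c)
     = 45 \cdot 1 + 13 \cdot 3 + 12 \cdot 3 + 16 \cdot 3 + 9 \cdot 4 + 5 \cdot 4
     = 224
     = 14 + 25 + 30 + 55 + 100
```

A fixed-length code for 6 characters needs ⌈log2 6⌉ = 3 bits each, i.e. 300 bits, so the Huffman code saves about 25%.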


End of Chapter - 08
