1 Analysis of Algorithms Chapter - 08 Data Compression.

Analysis of Algorithms

Chapter - 08 Data Compression

This chapter contains the following topics:

1. Why Data Compression?
2. Lossless and Lossy Compression
3. Fixed-Length Coding
4. Variable-Length Coding
5. Huffman Coding

Why Data Compression?

What is data compression? The transformation of data into a more compact form. Because fewer bits need to be sent, compressed data can be transferred in less time than the uncompressed data.

Why compress data? It saves storage space, and it saves transmission time over a network.

Example: Suppose the ASCII code of a character takes 1 byte, and we have a text file containing one hundred instances of 'a'. The file size would be about 100 bytes. Let us store this as "100a" in a new file to convey the same information. The new file size would be 4 bytes: 4/100 = 4% of the original size, a 96% saving.
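
The "100a" trick is a form of run-length encoding. Below is a minimal Python sketch, assuming each run is written as a decimal count followed by the character and that the text itself contains no digits (otherwise decoding would be ambiguous); `rle_encode` and `rle_decode` are illustrative names, not from the slides.

```python
import re

def rle_encode(text: str) -> str:
    """Replace each run of a repeated character by '<count><char>'."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend the current run
        out.append(f"{j - i}{text[i]}")
        i = j
    return "".join(out)

def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode; assumes the original text had no digits."""
    return "".join(ch * int(count) for count, ch in re.findall(r"(\d+)(\D)", encoded))

original = "a" * 100
packed = rle_encode(original)
print(packed)                           # 100a
print(len(packed), "of", len(original), "bytes")  # 4 of 100 bytes
assert rle_decode(packed) == original   # lossless: the original comes back exactly
```

For text without long runs the scheme backfires: "ab" becomes "1a1b", twice as long.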

Lossless and Lossy Data Compression

The last example shows "lossless" compression: the original data can be retrieved exactly by decompression. Lossless compression is used when data integrity is important. Example software: WinZip, gzip, compress, etc.

"Lossy" means the original is not retrievable. Lossy compression reduces size by permanently eliminating certain information. When the data is decompressed, only part of the original information is present (but the user may not notice it).

When can we use lossy compression? For audio, images, and video. JPEG, MPEG, etc. are example formats.
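
To make the contrast concrete, here is a small Python sketch. The lossless half uses the standard-library `zlib` module (DEFLATE, the same method gzip uses) and checks that decompression returns the exact bytes. The lossy half is a hypothetical toy, not how JPEG or MPEG actually work: `quantize` keeps only the top 4 bits of each 8-bit sample, so the low bits are discarded for good.

```python
import zlib

# Lossless: zlib decompression restores the exact original bytes.
data = b"a" * 100
compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data
print(len(data), "->", len(compressed), "bytes")

# Lossy (toy illustration only): keep 4 of every 8 bits per sample.
def quantize(samples: list[int]) -> list[int]:
    """Keep only the high 4 bits of each 0-255 sample; the low 4 bits are lost."""
    return [s >> 4 for s in samples]

def dequantize(levels: list[int]) -> list[int]:
    """Map each 4-bit level back to the middle of its 16-value bucket."""
    return [(q << 4) + 8 for q in levels]

pixels = [12, 200, 201, 255]
restored = dequantize(quantize(pixels))
print(restored)                         # [8, 200, 200, 248]
```

Samples 200 and 201 both come back as 200: storing 4 instead of 8 bits per sample halves the size, at the cost of that detail.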

Fixed-Length Coding

Coding is a way to represent information. There are two ways: fixed-length and variable-length coding. The code for a character is called a "codeword". We consider binary codes, in which each character is represented by a unique binary codeword.

Fixed-length coding: the codeword of every character has the same length, e.g., ASCII, or Unicode in the fixed-width UTF-32 encoding.

Suppose there are n characters. What is the minimum number of bits needed for fixed-length coding? $\lceil \log_2 n \rceil$ bits, since b bits can distinguish at most $2^b$ characters.

Example: {a, b, c, d, e} has 5 characters, so $\lceil \log_2 5 \rceil = \lceil 2.32\ldots \rceil = 3$ bits per character.

We can have codewords: a=000, b=001, c=010, d=011, e=100.
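
A short Python sketch of the rule above. `fixed_length_codes` is a hypothetical helper that numbers the characters in order and writes each index in binary using $\lceil \log_2 n \rceil$ digits. It uses `(n - 1).bit_length()`, which equals $\lceil \log_2 n \rceil$ for n ≥ 2 and avoids floating-point rounding.

```python
def fixed_length_codes(alphabet: list[str]) -> dict[str, str]:
    """Assign each character a distinct binary codeword of ceil(log2 n) bits."""
    n = len(alphabet)
    # (n - 1).bit_length() == ceil(log2 n) for n >= 2; use at least 1 bit
    # so that a one-character alphabet still gets a non-empty codeword.
    width = max(1, (n - 1).bit_length())
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}

print(fixed_length_codes(["a", "b", "c", "d", "e"]))
# {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
```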

Variable-Length Coding

Length of codewords may differ from character to character.

Frequent characters get short codewords. Infrequent ones get long codewords. Example:

Character    a    b     c     d     e      f
Frequency    46   13    12    16    8      5
Codeword     0    101   100   111   1101   1100
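
To see why giving frequent characters short codewords pays off, the sketch below totals the bits needed for the table above, treating each frequency as a count of occurrences in a 100-character text (an assumption; the slide does not state units). The helper name `encoded_bits` is illustrative.

```python
# Frequencies and codewords from the table above.
freq = {"a": 46, "b": 13, "c": 12, "d": 16, "e": 8, "f": 5}
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def encoded_bits(freq: dict[str, int], code: dict[str, str]) -> int:
    """Total bits for a text with these character counts: sum of f[c] * len(code[c])."""
    return sum(f * len(code[c]) for c, f in freq.items())

variable = encoded_bits(freq, code)
fixed = sum(freq.values()) * 3  # 6 characters need ceil(log2 6) = 3 bits each
print(variable, fixed)          # 221 300
```

With these counts the variable-length code needs 221 bits, versus 300 bits for a 3-bit fixed-length code over the same 6 characters.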

Make sure that no codeword occurs as the prefix of another codeword. What we need is a "prefix-free code"; the last example is a prefix-free code.

Prefix-free codes give unique decoding. E.g., based on the table in the last example, "001011101" is decoded as "aabe" (0 | 0 | 101 | 1101).

The Huffman coding algorithm shows how to obtain optimal prefix-free codes.
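
A Python sketch of both ideas, with illustrative helper names: `is_prefix_free` checks the condition directly, and `decode` reads the bits left to right, emitting a character as soon as the buffered bits match a codeword. That greedy step is safe only because no codeword is a prefix of another.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if all codewords are distinct and none is a prefix of another."""
    words = list(code.values())
    if len(set(words)) != len(words):
        return False
    return not any(u != w and w.startswith(u) for u in words for w in words)

def decode(bits: str, code: dict[str, str]) -> str:
    """Greedy decoding: emit a character as soon as the buffer forms a codeword."""
    by_word = {w: ch for ch, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in by_word:      # unambiguous because the code is prefix-free
            out.append(by_word[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits {buf!r} do not form a codeword")
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
assert is_prefix_free(code)
print(decode("001011101", code))  # aabe
```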

Huffman Coding Algorithm

Huffman invented a greedy method to construct an optimal prefix-free variable-length code, based on the frequency of occurrence of each character.

An optimal code is given by a full binary tree: every internal node has 2 children. If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.

We build the tree bottom-up: begin with |C| leaves and perform |C|-1 "merging" operations, each of which creates one internal node. Let f[c] denote the frequency of character c.

We use a priority queue Q in which high priority means low frequency. GetMin(Q) removes the element with the lowest frequency and returns it.

An Algorithm

Input: alphabet C and frequencies f[ ]
Result: optimal coding tree for C

Algorithm Huffman(C, f)
{
    n := |C|;
    Q := C;
    for i := 1 to n-1 do
    {
        z := NewNode( );
        x := z.left := GetMin(Q);
        y := z.right := GetMin(Q);
        f[z] := f[x] + f[y];
        Insert(Q, z);
    }
    return GetMin(Q);
}

Running time is O(n lg n) when Q is implemented as a binary min-heap.
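
A runnable Python translation of the pseudocode, as a sketch: the standard `heapq` module plays the role of the priority queue Q (`heappop` as GetMin, `heappush` as Insert), a counter breaks frequency ties so nodes are never compared directly, and labeling left edges 0 and right edges 1 is an assumed convention. `Node`, `huffman`, and `codewords` are illustrative names.

```python
import heapq
from itertools import count

class Node:
    """Tree node: leaves carry a character, internal nodes carry two children."""
    def __init__(self, freq, char=None, left=None, right=None):
        self.freq, self.char, self.left, self.right = freq, char, left, right

def huffman(freq: dict[str, int]) -> Node:
    """Build the coding tree with n - 1 merges of the two lowest-frequency nodes."""
    tie = count()  # tie-breaker so heapq never compares two Node objects
    q = [(f, next(tie), Node(f, char=c)) for c, f in freq.items()]
    heapq.heapify(q)                                # Q := C
    for _ in range(len(freq) - 1):
        fx, _, x = heapq.heappop(q)                 # x := GetMin(Q)
        fy, _, y = heapq.heappop(q)                 # y := GetMin(Q)
        z = Node(fx + fy, left=x, right=y)          # f[z] := f[x] + f[y]
        heapq.heappush(q, (z.freq, next(tie), z))   # Insert(Q, z)
    return heapq.heappop(q)[2]                      # the root

def codewords(node: Node, prefix: str = "") -> dict[str, str]:
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if node.char is not None:
        return {node.char: prefix or "0"}  # a one-character alphabet still gets 1 bit
    return {**codewords(node.left, prefix + "0"), **codewords(node.right, prefix + "1")}

print(codewords(huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```

Each of the n-1 iterations does two pops and one push at O(lg n) each, and `heapify` builds the initial queue in O(n), which matches the O(n lg n) running time stated above.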

Example

Obtain the optimal coding for the following frequencies using the Huffman algorithm.

Character    a    b    c    d    e    f
Frequency    45   13   12   16   9    5

Example (Contd.)
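
Tracing the algorithm by hand on these frequencies gives the merges below, assuming (as in the pseudocode) that x, the first GetMin result, becomes the left child, and labeling left edges 0 and right edges 1. No two entries in Q are ever tied, so the tree shape is fully determined by the frequencies.

```latex
\begin{aligned}
&\text{Start: } Q = \{f{:}5,\ e{:}9,\ c{:}12,\ b{:}13,\ d{:}16,\ a{:}45\}\\
&\text{Merge 1: } x = f,\ y = e, && f[z_1] = 5 + 9 = 14\\
&\text{Merge 2: } x = c,\ y = b, && f[z_2] = 12 + 13 = 25\\
&\text{Merge 3: } x = z_1,\ y = d, && f[z_3] = 14 + 16 = 30\\
&\text{Merge 4: } x = z_2,\ y = z_3, && f[z_4] = 25 + 30 = 55\\
&\text{Merge 5: } x = a,\ y = z_4, && f[z_5] = 45 + 55 = 100\ (\text{root})\\[4pt]
&a = 0,\quad b = 101,\quad c = 100,\quad d = 111,\quad e = 1101,\quad f = 1100\\
&\text{Cost} = 45 \cdot 1 + 13 \cdot 3 + 12 \cdot 3 + 16 \cdot 3 + 9 \cdot 4 + 5 \cdot 4 = 224 \text{ bits}
\end{aligned}
```

The resulting codewords are exactly those in the variable-length coding table, and the encoded length of 224 bits compares with $3 \cdot 100 = 300$ bits for a 3-bit fixed-length code.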

End of Chapter - 08