Introduction to Computer Science 2 Balanced Binary Search Trees (2) & Extended Binary Trees

Prof. Neeraj Suri, Brahim Ayari
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
© Neeraj Suri, EU-NSF ICT March 2006

ICS-II - 2008

Height of AVL Trees

AVL trees are defined by the height difference of subtrees

Original goal: the tree should be as “balanced” as possible

How balanced is an AVL tree? The answer is given by the theorem on the height of an AVL tree:

Theorem: For the height $h(T)$ of an AVL tree $T$ with $n$ nodes:

$\lfloor \log_2 n \rfloor + 1 \le h(T) \le 1.44 \log_2(n+1)$

Fibonacci Trees

The lower bound $\lfloor \log_2 n \rfloor + 1 \le h(T)$ comes from the minimal height of a balanced binary tree (already shown)

For the proof of the upper bound one needs a special class of AVL trees: Fibonacci trees

Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$

Definition: Fibonacci trees are constructed as follows:
• The empty tree $T_0$ is a Fibonacci tree (height 0)
• The tree $T_1$, which contains only one node, is a Fibonacci tree of height 1
• If $T_{h-1}$ and $T_{h-2}$ are Fibonacci trees of heights $h-1$ and $h-2$, and $x$ is a node, then $T_h = (T_{h-1}, x, T_{h-2})$ is a Fibonacci tree of height $h$
• No other trees are Fibonacci trees

-> Observe: the number of nodes on the path from the root to the deepest leaf gives the height of the Fibonacci tree!

Number of nodes $n_h$ of the Fibonacci tree $T_h$, next to the Fibonacci number $F_h$:

h | T_h | n_h | F_h
0 | empty tree | 0 | 0
1 | one node | 1 | 1
2 | (T_1, x, T_0) | 2 | 1
3 | (T_2, x, T_1) | 4 | 2
4 | (T_3, x, T_2) | 7 | 3
5 | (T_4, x, T_3) | 12 | 5

$T_6$, $T_7$, etc. are built analogously.

Fibonacci and AVL Trees

To prove: Every Fibonacci tree is an AVL tree

Proof (by induction over $h$): Note: $T_h$ is always a tree of height $h$

$T_0$ and $T_1$ are AVL trees

If $T_{h-1}$ and $T_{h-2}$ are AVL trees, build $T_h = (T_{h-1}, x, T_{h-2})$ according to the rules.

As $T_{h-1}$ and $T_{h-2}$ are AVL trees, we only have to check the balance factor of the root:

$BF(T_h) = | h(T_{h-1}) - h(T_{h-2}) | = | (h - 1) - (h - 2) | = 1$
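The construction and the induction claim can be tried out with a small sketch. The class `FibonacciTreeDemo` and its helper names are assumptions made for this illustration and are not part of the lecture code; `null` stands for the empty tree $T_0$, and the node payload x is omitted because only the shape matters.

```java
class FibonacciTreeDemo {
    /** A plain binary tree node; null stands for the empty tree T_0. */
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Builds T_h = (T_{h-1}, x, T_{h-2}); T_0 is empty, T_1 is a single node. */
    static Node build(int h) {
        if (h == 0) return null;
        if (h == 1) return new Node(null, null);
        return new Node(build(h - 1), build(h - 2));
    }

    /** Height counted as the number of nodes on the longest root-to-leaf path. */
    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    /** Number of nodes in the tree. */
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** True if the subtree heights differ by at most 1 at every node. */
    static boolean isAvl(Node t) {
        if (t == null) return true;
        return Math.abs(height(t.left) - height(t.right)) <= 1
                && isAvl(t.left) && isAvl(t.right);
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 6; h++) {
            Node t = build(h);
            System.out.println("h=" + h + " height=" + height(t)
                    + " nodes=" + size(t) + " avl=" + isAvl(t));
        }
    }
}
```

The recursive `height` call inside `isAvl` is deliberately naive; it keeps the sketch short at the price of repeated work.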

Fibonacci and AVL Trees

Special note: for a given Fibonacci tree there is no AVL tree with the same height and fewer nodes

The construction gives AVL trees of maximal height for their number of nodes

One can add more nodes while keeping the height, but one cannot remove a node without violating the AVL criterion (if the height is to stay unchanged)

Fibonacci trees give the maximal height of an AVL tree for a given number of nodes

Note: the number of nodes $n_h$ in $T_h$ is the $(h+2)$-th Fibonacci number minus 1, i.e.,

$n_h = F_{h+2} - 1$ (for $h \ge 0$)
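The formula can be checked by a short induction. The recurrence $n_h = n_{h-1} + n_{h-2} + 1$ used here is read off the construction $T_h = (T_{h-1}, x, T_{h-2})$; it is not stated explicitly on the slide.

```latex
\begin{align*}
n_0 &= 0 = F_2 - 1, \qquad n_1 = 1 = F_3 - 1,\\
n_h &= n_{h-1} + n_{h-2} + 1
     = (F_{h+1} - 1) + (F_h - 1) + 1
     = F_{h+2} - 1 \qquad (h \ge 2).
\end{align*}
```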

Fibonacci and AVL Trees

The following inequality holds for Fibonacci numbers: $F_h \ge \varphi^{h-2}$ for $h \ge 2$, where $\varphi = \tfrac{1}{2}(1 + \sqrt{5})$

Let $n$ be the number of nodes in an AVL tree of height $h$. As $T_h$ contains the minimal number of nodes for this height:

$n \ge n_h$

Insert $n_h = F_{h+2} - 1$:

$n \ge n_h = F_{h+2} - 1 \ge \varphi^h - 1$, thus $n + 1 \ge \varphi^h$

The number of nodes grows exponentially with the height. Conversely:

$h \le \log_\varphi(n + 1) = (1 / \log_2 \varphi) \log_2(n+1) = 1.44\ldots \log_2(n+1)$

Thus: the search path in an AVL tree is in the worst case 44% longer than in a complete tree
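As a rough numeric illustration of the constant and the bound, take $n = 10^6$ (a value chosen only for this example):

```latex
\begin{align*}
\frac{1}{\log_2 \varphi} &\approx \frac{1}{0.6942} \approx 1.4404,\\
n = 10^6:\quad h &\le 1.4404 \cdot \log_2(10^6 + 1) \approx 1.4404 \cdot 19.93 \approx 28.7 .
\end{align*}
```

So an AVL tree with a million nodes has a height of at most 28, while the minimal possible height is $\lfloor \log_2 10^6 \rfloor + 1 = 20$.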

Cost Analysis of AVL Trees

$h \le c \cdot \log_2(n+1)$ means: the height of an AVL tree is bounded by $O(\log_2 n)$

Cost for insertion is in $O(\log_2 n)$
• One only has to consider the path from the root to the insertion point
• Rotations have constant cost

Cost for deletion is in $O(\log_2 n)$
• Every node on the path from the root to the deleted node requires at most one rotation

AVL trees are worst-case efficient implementations of binary search trees
• Natural trees need $\Theta(n)$ steps in the worst case

Calculating the average height is still an open problem

Empirical results give $h = c + \log_2 n$ for $c \approx 0.2$

Weight Balanced Binary Search Trees

Treat the “weight difference” of two subtrees as a measure of balancing

Weight = number of nodes in subtree

The properties are very similar to height balanced binary trees

Let $T$ be a binary search tree, $T_L$ its left subtree and $n(X)$ the number of nodes in a tree $X$

Definition: the value $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$ is the root balance of $T$

Definition: a tree $T$ is $\alpha$-balanced if for every subtree $T'$:

$\alpha \le \rho(T') \le 1 - \alpha$

Condition $\alpha \le \rho(T') \le 1 - \alpha$

The set of all $\alpha$-balanced binary trees is called BB($\alpha$) ("bounded balance").

The definition of balance only considers the left subtree, but in a BB($\alpha$) tree every subtree $T'$ also satisfies $\alpha \le \rho'(T') \le 1 - \alpha$, where $\rho'$ is defined analogously to $\rho$ on the right subtree

The parameter $\alpha$ defines the "distance" from a complete tree:
• $\alpha = 1/2$: only complete trees allowed
• $\alpha < 1/2$: relaxed condition
• $\alpha = 0$: no structural conditions
• $\alpha > 1/2$: makes no sense to consider
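The two definitions translate almost literally into code. The following is a minimal sketch; the class `WeightBalanceDemo` and its method names are hypothetical, and subtree sizes are recomputed on every call for brevity (a real implementation would presumably store them in the nodes).

```java
class WeightBalanceDemo {
    /** Shape-only tree node; keys are irrelevant for the balance check. */
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** n(X): number of nodes in the tree X. */
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Root balance rho(T) = (n(T_L) + 1) / (n(T) + 1) of a non-empty tree. */
    static double rootBalance(Node t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    /** True if alpha <= rho(T') <= 1 - alpha holds for every subtree T'. */
    static boolean isAlphaBalanced(Node t, double alpha) {
        if (t == null) return true;
        double rho = rootBalance(t);
        return alpha <= rho && rho <= 1 - alpha
                && isAlphaBalanced(t.left, alpha)
                && isAlphaBalanced(t.right, alpha);
    }

    public static void main(String[] args) {
        // Left chain of three nodes: all nodes hang in left subtrees.
        Node chain = new Node(new Node(new Node(null, null), null), null);
        System.out.println(rootBalance(chain));
        System.out.println(isAlphaBalanced(chain, 0.3));
    }
}
```

Because the comparison uses floating-point numbers, values of $\rho$ that lie exactly on the boundary $\alpha$ or $1 - \alpha$ may be misclassified for some $\alpha$; comparing the integer cross-products instead would avoid this.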

Example

$\rho(T) = (n(T_L) + 1) / (n(T) + 1)$. Choose $\alpha = 0.3$; then for every subtree $\alpha = 0.3 \le \rho \le 1 - \alpha = 0.7$

The tree is in BB($\alpha$) for $\alpha = 0.3$

Subtree with root | $\rho$
Mars | 3/10 = 0.3
Jupiter | 2/3 = 0.67
Pluto | 3/7 = 0.43
Mercury | 1/3 = 0.33
Uranus | 2/4 = 0.5

[Figure: binary search tree with the nodes Pluto, Mars, Jupiter, Earth, Mercury, Uranus, Venus, Saturn, Neptune]

Notes

Already noted: $\rho = 1/2$ holds for complete trees

Root balance $\rho < 1/2$ means: there are fewer nodes in the left subtree than in the right subtree
• $\alpha$ limits the root balance symmetrically from both sides

If all nodes are in the left subtree (the right subtree is empty): $\rho = n/(n+1)$, so the root balance goes towards 1 with an increasing number of nodes
• Only $\alpha = 0$ allows all "degenerations"

Not every tree (with $n$ nodes) can be transformed into a BB($\alpha$) tree for an arbitrary $\alpha$

There is at least one tree in BB($\alpha$) when $0.25 \le \alpha \le 1 - \tfrac{1}{2}\sqrt{2} \approx 0.292$

Height of Weight Balanced Trees

Note: when traversing the path from the root to the leaves one "loses", depending on $\alpha$, a fraction of the nodes at every step

Consider the path $p = v_1, v_2, \ldots, v_h$

For the left and right subtrees $T_L$ and $T_R$ of a tree $T$ (due to the BB($\alpha$) condition):

$n(T_L) + 1 \le (1 - \alpha)(n(T) + 1)$

$n(T_R) + 1 \le (1 - \alpha)(n(T) + 1)$

Traversal of path $p$, where $n(v_i)$ is the number of nodes in the subtree rooted at $v_i$:

$n(v_2) + 1 \le (1 - \alpha)(n(v_1) + 1)$

$n(v_3) + 1 \le (1 - \alpha)(n(v_2) + 1)$

$\ldots$

$n(v_h) + 1 \le (1 - \alpha)(n(v_{h-1}) + 1)$

Height of Weight Balanced Trees

As $v_1$ is the root and $v_h$ a leaf: $n(T) + 1 = n(v_1) + 1$ and $n(v_h) + 1 = 2$

Chaining the inequalities along the path: $2 = n(v_h) + 1 \le (1 - \alpha)^{h-1} (n(v_1) + 1) = (1 - \alpha)^{h-1} (n(T) + 1)$

Take logarithms on both sides: $1 \le (h - 1)\log_2(1 - \alpha) + \log_2(n(T) + 1)$

Thus, with $c = -\log_2(1 - \alpha)$ (note: $\log_2(1 - \alpha) < 0$ for $\alpha > 0$, so $c > 0$):

$h - 1 \le (\log_2(n(T) + 1) - 1) / c \le \log_2(n(T) + 1) / c \in O(\log_2 n)$

The height of the tree is logarithmic in the number of nodes

Operations on Weight Balanced Binary Trees

Search is the same as for AVL trees
• Cost is logarithmic

For insertion/deletion the root balance must be updated along the path from the root to the corresponding position

If the criterion is violated: rotations as for AVL trees

Open issues:
• Are rotations appropriate measures for restructuring BB($\alpha$) trees?
• How does one efficiently calculate the root balance?

The number of rotations on the path to the root is limited: search/insertion/deletion are all in $O(\log_2 n)$

Positional Search in Balanced Binary Search Trees

Comparison: tree implementations vs. linked lists
• Balanced trees allow (almost) all operations in $O(\log_2 n)$
• Linked lists need $O(n)$ for search/insertion/deletion!
• For sequential traversal both perform in $O(n)$

Should sorted data always be stored in trees?!
• One should not underestimate the implementation costs
• The "last" operation where lists "win" is positional search (the kth element)

Positional search: find the kth element in a list
• For trees the "list" is an inorder traversal

The Problem

For lists: traverse k elements in $O(k)$

For trees:
• One does not "know" whether to go left or right, and one does not know anything about the number of nodes in the subtrees
• In the worst case all nodes must be visited: $O(n)$!

That can be improved!

Rank of a Node

Definition: The rank of a node is the number of nodes in its left subtree plus 1

Rank = position of node x in the subtree of which x is the root

    class BinarySearchTree {
        int K;                  /* key */
        Info info;              /* info */
        int balance;            /* BF, for AVL trees: -1, 0, +1 */
        int rank;               /* number of nodes in the left subtree plus 1 */
        BinarySearchTree L, R;  /* left and right subtree */
        /* constructor and methods ... */

        public BinarySearchTree findPos(int pos) { ... }
    }

Algorithm

Pseudo code:
• Start at the root
• If pos < rank: search in the left subtree
• If pos > rank: subtract the rank from the position and search in the right subtree
• Search stops when pos = rank

Correctness: The rank of a node is always its position in the subtree of which it is the root

Note: when inserting/deleting, every node on the way up to the root that has the change in its left subtree must update its rank

Example

[Figure: binary search tree over the keys Athens, Bern, Bonn, Cairo, Lima, Oslo, Paris, Prague, Rome, Sofia, Tokyo; each node is labelled with its rank, and the inorder positions pos = 1 ... 11 are marked]

pos = 4 -> Cairo
pos = 9 -> Rome

Java Method

    public BinarySearchTree findPos(int pos) {
        BinarySearchTree root = this;
        /* descend until the position matches the rank or the tree ends */
        while ((root != null) && (pos != root.rank)) {
            if (pos < root.rank) {
                root = root.L;
            } else {
                pos = pos - root.rank;
                root = root.R;
            }
        }
        return root;
    }

Complexity in a balanced tree: $O(\log_2 n)$
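To try the method on the city keys of the example, the following self-contained sketch builds a balanced tree with ranks from a sorted array. The class `RankedTreeDemo`, the helper `build` and the constant `CITIES` are assumptions for this illustration; the shape of the tree built here is not claimed to match the tree in the slide figure, only the inorder positions agree.

```java
class RankedTreeDemo {
    /** The keys of the example in sorted (inorder) order. */
    static final String[] CITIES = {
        "Athens", "Bern", "Bonn", "Cairo", "Lima", "Oslo",
        "Paris", "Prague", "Rome", "Sofia", "Tokyo"
    };

    static final class Node {
        final String key;
        final int rank;          // number of nodes in the left subtree plus 1
        final Node left, right;
        Node(String key, int rank, Node left, Node right) {
            this.key = key; this.rank = rank; this.left = left; this.right = right;
        }
    }

    /** Builds a balanced tree from the sorted slice keys[lo..hi). */
    static Node build(String[] keys, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = (lo + hi) / 2;
        // mid - lo keys go to the left subtree, so the rank is mid - lo + 1
        return new Node(keys[mid], mid - lo + 1,
                build(keys, lo, mid), build(keys, mid + 1, hi));
    }

    /** Same loop as findPos on the slide, written as a static method. */
    static Node findPos(Node root, int pos) {
        while (root != null && pos != root.rank) {
            if (pos < root.rank) {
                root = root.left;
            } else {
                pos -= root.rank;
                root = root.right;
            }
        }
        return root;
    }

    /** Key at the given 1-based inorder position, or null if out of range. */
    static String keyAt(Node root, int pos) {
        Node n = findPos(root, pos);
        return n == null ? null : n.key;
    }

    public static void main(String[] args) {
        Node tree = build(CITIES, 0, CITIES.length);
        System.out.println(keyAt(tree, 4));
        System.out.println(keyAt(tree, 9));
    }
}
```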

Summary: Balanced Search Trees

Operation | Sequential list | Linked list | Balanced tree with rank
Search | $O(\log_2 n)$ (binary search) | $O(n)$ | $O(\log_2 n)$
Positional search (kth element) | $O(1)$ | $O(k)$ | $O(\log_2 n)$
Insertion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position | $O(\log_2 n)$
Deletion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position (doubly linked) | $O(\log_2 n)$
Deletion of the kth element | $O(n-k)$ | $O(k)$ | $O(\log_2 n)$
Sequential traversal | $O(n)$ | $O(n)$ | $O(n)$

Extended Binary Trees

• Replace NULL pointers with special (external) nodes.
• A binary tree to which external nodes are added is called an extended binary tree.
• The data can be stored either in the internal or in the external nodes.
• The length of the path to a node illustrates the cost of the search.

External and internal path length

The cost of a search in extended binary trees depends on the following parameters:

• External path length = the sum over all path lengths from the root to the external nodes $S_i$ ($1 \le i \le n+1$):

$Ext_n = \sum_{i=1}^{n+1} depth(S_i)$

• Internal path length = the sum over all path lengths from the root to the internal nodes $K_i$ ($1 \le i \le n$):

$Int_n = \sum_{i=1}^{n} depth(K_i)$

• $Ext_n = Int_n + 2n$ (proof by induction)
• Extended binary trees with a minimal external path length have a minimal internal path length too.

Example

n = 7

External path length $Ext_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25$

Internal path length $Int_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11$

$25 = Ext_n = Int_n + 2n = 11 + 14 = 25$

[Figure: extended binary tree with n = 7 internal nodes and 8 external nodes, each labelled with its depth]

Minimal and maximal length

For a given n, a balanced tree has the minimal internal path length.

Example: within a complete tree of height h, the internal path length is (for $n = 2^h - 1$):

$Int_n = \sum_{i=0}^{h-1} i \cdot 2^i$

The internal path length becomes maximal if the tree degenerates to a linear list:

$Int_n = \sum_{i=1}^{n-1} i = n(n-1)/2$

Example: h = 4, n = 15, Int = 34, Ext = 16 · 4 = 64

For comparison: a list with n = 15 nodes has Int = 105, Ext = 105 + 30 = 135
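The two examples, and the relation Ext = Int + 2n, can be recomputed with a few lines of code. This is a sketch under the assumption that a `null` link plays the role of an external node; `PathLengthDemo`, `complete` and `chain` are names invented for this illustration.

```java
class PathLengthDemo {
    /** Internal node of an extended binary tree; a null link is an external node. */
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Sum of the depths of all internal nodes (the root has depth 0). */
    static int internalPathLength(Node t, int depth) {
        if (t == null) return 0;
        return depth + internalPathLength(t.left, depth + 1)
                     + internalPathLength(t.right, depth + 1);
    }

    /** Sum of the depths of all external nodes, i.e. of all null links. */
    static int externalPathLength(Node t, int depth) {
        if (t == null) return depth;
        return externalPathLength(t.left, depth + 1)
             + externalPathLength(t.right, depth + 1);
    }

    /** Number of internal nodes. */
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Complete tree of height h with 2^h - 1 internal nodes. */
    static Node complete(int h) {
        return h == 0 ? null : new Node(complete(h - 1), complete(h - 1));
    }

    /** Tree degenerated to a linear list of n nodes. */
    static Node chain(int n) {
        Node t = null;
        for (int i = 0; i < n; i++) t = new Node(t, null);
        return t;
    }

    public static void main(String[] args) {
        for (Node t : new Node[] { complete(4), chain(15) }) {
            int n = size(t);
            int in = internalPathLength(t, 0);
            int ex = externalPathLength(t, 0);
            System.out.println("n=" + n + " Int=" + in + " Ext=" + ex
                    + " Int+2n=" + (in + 2 * n));
        }
    }
}
```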

Weighted binary trees

Often weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$).

The weighted external path length is defined as

$Ext_w = \sum_{i=1}^{n+1} depth(S_i) \cdot q_i$

Within weighted binary trees the properties of minimal and maximal path lengths do not apply any more.

The determination of the minimal external path length is an important practical problem...

[Figure: two extended binary trees with the external weights 3, 8, 15, 25. First tree: $Ext_w = 102$. Second tree: $Ext_w = 88$ (less than 102 although the tree is a linear list)]
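The two values can be reproduced by hand. The depths used here are an assumption inferred from the totals, since the figure itself is not available in this transcript: in the first tree all four external nodes are taken to lie at depth 2, in the list-like tree the weight 25 at depth 1, 15 at depth 2, and 8 and 3 at depth 3.

```latex
\begin{align*}
Ext_w^{\text{complete}} &= 2 \cdot 3 + 2 \cdot 8 + 2 \cdot 15 + 2 \cdot 25 = 102,\\
Ext_w^{\text{list}} &= 1 \cdot 25 + 2 \cdot 15 + 3 \cdot 8 + 3 \cdot 3 = 88 .
\end{align*}
```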

Application example: optimal codes

To convert a text file efficiently to bit strings, there are two alternatives:
• Fixed length coding: each character has the same number of bits (e.g., ASCII)
• Variable length coding: some characters are represented using fewer bits than others

Example for coding with fixed length:
• 3-bit code for the alphabet A, B, C, D: A = 001, B = 010, C = 011, D = 100
• Message: ABBAABCDADA is converted to 001010010001001010011100001100001 (length 33 bits)
• Using a 2-bit code the same message can be coded with only 22 bits.
• For decoding the message, group the bits in blocks of 3 (respectively 2) and use a table with the codes and their matching characters.

Application example: optimal codes (2)

Idea: More frequently used characters are coded using fewer bits.

Character | A | B | C | D
Frequency | 5 | 3 | 1 | 2
Coding | 0 | 10 | 111 | 110

• Message: ABBAABCDADA
• Coding: 01010001011111001100
• Length: 20 bits!
• Variable length coding can reduce the memory space needed for storing the file.
• How can this special coding be found and why is the decoding unique?
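Both encodings of the message can be reproduced with a simple table lookup. In this sketch the class `CodingDemo` and the constants `FIXED` and `VARIABLE` are illustrative names; the code tables themselves are the 3-bit code and the variable-length code given on the slides.

```java
import java.util.Map;

class CodingDemo {
    /** Fixed-length 3-bit code from the fixed-length example. */
    static final Map<Character, String> FIXED =
            Map.of('A', "001", 'B', "010", 'C', "011", 'D', "100");

    /** Variable-length code from the table above. */
    static final Map<Character, String> VARIABLE =
            Map.of('A', "0", 'B', "10", 'C', "111", 'D', "110");

    /** Concatenates the code words of all characters of the message. */
    static String encode(String message, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) {
            String word = code.get(c);
            if (word == null) {
                throw new IllegalArgumentException("no code word for " + c);
            }
            bits.append(word);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String message = "ABBAABCDADA";
        System.out.println(encode(message, FIXED).length());    // 33
        System.out.println(encode(message, VARIABLE).length()); // 20
    }
}
```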

Application example: optimal codes (3)

Representation of the frequencies and the coding as a weighted binary tree.

First of all decoding: Given a bit string:
• Use the successive bits in order to traverse the tree starting from the root.
• If you arrive at an external node, use the character stored there.

Example: 010100010111...
• 1st bit = 0: external node, A
• 2nd bit = 1: from the root to the right
• 3rd bit = 0: left, external node, B
• 4th bit = 1: from the root to the right
• 5th bit = 0: left, external node, B
• ...

[Figure: weighted binary tree for the code; edges are labelled 0 and 1, the external nodes are A (weight 5), B (weight 3), D (weight 2) and C (weight 1)]
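The decoding walk can be sketched as follows. The class `PrefixDecoderDemo` and its members are hypothetical names; the tree is rebuilt from the code table A = 0, B = 10, C = 111, D = 110 rather than taken from the figure, with bit 0 leading to one child and bit 1 to the other.

```java
import java.util.Map;

class PrefixDecoderDemo {
    /** The variable-length code of the example. */
    static final Map<Character, String> CODE =
            Map.of('A', "0", 'B', "10", 'C', "111", 'D', "110");

    static final class Node {
        Node zero, one;     // children reached by bit 0 and bit 1
        Character symbol;   // set only in external nodes
    }

    /** Builds the code tree by inserting every code word bit by bit. */
    static Node buildTree(Map<Character, String> code) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.symbol = e.getKey();
        }
        return root;
    }

    /** Walks from the root; at an external node emits its character and restarts. */
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur == null) {
                throw new IllegalArgumentException("invalid bit string");
            }
            if (cur.symbol != null) {
                out.append(cur.symbol.charValue());
                cur = root;
            }
        }
        if (cur != root) {
            throw new IllegalArgumentException("bit string ends inside a code word");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("01010001011111001100", buildTree(CODE)));
    }
}
```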

Correctness condition

Observation: Within variable length coding, the code of one character must not be a prefix of the code of any other character.

If the code is represented in the form of an extended binary tree with the characters in the external nodes, then uniqueness is guaranteed (only one character per external node).

If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal weighted external path length yields an optimal code.

How is a tree with minimal weighted external path length generated?

Huffman Code

Idea: Characters are weighted and sorted according to their frequency
• This also works independently of the text, e.g., for English (characters with relative weights):

E 1231 | T 959 | A 805 | O 794
N 719 | I 718 | S 659 | R 603
H 514 | L 403 | D 365 | C 320
U 310 | P 229 | F 228 | M 225
W 203 | Y 188 | B 162 | G 161
V 93 | K 52 | Q 20 | X 20
J 10 | Z 9

A binary tree with minimal external path length is constructed as follows:
• Each character is represented by a tree of its own with the corresponding weight (only one external node).
• The two trees with the smallest weights are merged into a new tree.
• The root of the new tree is marked with the sum of the weights of the original roots.
• Continue until only one tree remains.

Example 1: Huffman

Alphabet and frequency:

Character | E | T | N | I | S
Frequency | 29 | 10 | 9 | 5 | 4

Step 1: (4, 5, 9, 10, 29): merge 4 and 5, new weight: 9

Step 2: (9, 9, 10, 29): merge 9 and 9, new weight: 18

Step 3: (18, 10, 29), sorted (10, 18, 29): merge 10 and 18, new weight: 28

Step 4: (28, 29): merge 28 and 29, new weight: 57, finished!

[Figure: the partial trees after each step; the two edges below every merged root are labelled 0 and 1]

Resulting tree

Coding:

Character | Code | Weight
E | 1 | 29
T | 00 | 10
N | 011 | 9
I | 0101 | 5
S | 0100 | 4

$Ext_w = 112$

Using this coding, the codes are, e.g.:
• TENNIS = 00101101101010100
• SET = 0100100
• NET = 011100

Decoding as described before.

[Figure: resulting Huffman tree with root weight 57, inner weights 28, 18 and 9, and the external nodes E, T, N, S, I]
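The value of $Ext_w$ follows directly from the code table, since the depth of an external node equals the length of its code word:

```latex
\begin{align*}
Ext_w &= 1 \cdot 29 + 2 \cdot 10 + 3 \cdot 9 + 4 \cdot 5 + 4 \cdot 4 \\
      &= 29 + 20 + 27 + 20 + 16 = 112 .
\end{align*}
```

The same number results from adding the weights of the merged roots, $9 + 18 + 28 + 57 = 112$. For a single word the code length is obtained the same way, e.g. TENNIS needs $2 + 1 + 3 + 3 + 4 + 4 = 17$ bits.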

Some remarks

• The resulting tree is not regular.
• Regular trees are not always optimal. Example: the best nearly complete tree has $Ext_w = 123$
• For the message ABBAABCDADA, 20 bits is optimal (see previous slides)

[Figure: nearly complete tree with the external weights 4, 5, 10, 29, 9]

Example 2: Huffman

Character | p (%) | Code
A | 25 | 00
B | 4 | 1110
C | 13 | 100
D | 7 | 110
E | 35 | 01
F | 11 | 101
G | 2 | 11110
H | 3 | 11111

Average number of bits without Huffman: 3 (because $2^3 = 8$)

Average number of bits using Huffman code:

$0.25 \cdot 2 + 0.04 \cdot 4 + 0.13 \cdot 3 + 0.07 \cdot 3 + 0.35 \cdot 2 + 0.11 \cdot 3 + 0.02 \cdot 5 + 0.03 \cdot 5 = 2.54$

There are other "valid" solutions! But the average number of bits remains the same for all these solutions (equal to Huffman)

Analysis

    /* Algorithm Huffman */
    for (int i = 1; i <= n-1; i++) {
        p1 = smallest element in list L
        remove p1 from L
        p2 = smallest element in L
        remove p2 from L
        create node p
        add p1 and p2 as left and right subtrees to p
        weight p = weight p1 + weight p2
        insert p into L
    }

The run time behavior depends in particular on the implementation of the list
• Time required to find the node with the smallest weight
• Time required to insert a new node

"Naive" implementations give $O(n^2)$, "smarter" ones result in $O(n \log_2 n)$
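One way to obtain the $O(n \log_2 n)$ behavior is to keep the list L in a binary heap. The following sketch uses `java.util.PriorityQueue` for that purpose; this choice, the class `HuffmanDemo` and its method names are assumptions of the example, not the implementation prescribed by the lecture. It expects at least one weight.

```java
import java.util.PriorityQueue;

class HuffmanDemo {
    static final class Node {
        final int weight;
        final Node left, right;   // both null for an external node
        Node(int weight, Node left, Node right) {
            this.weight = weight; this.left = left; this.right = right;
        }
    }

    /** Repeatedly merges the two lightest trees until one tree remains. */
    static Node build(int[] weights) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (int w : weights) queue.add(new Node(w, null, null));
        while (queue.size() > 1) {
            Node p1 = queue.poll();   // smallest element
            Node p2 = queue.poll();   // second smallest element
            queue.add(new Node(p1.weight + p2.weight, p1, p2));
        }
        return queue.poll();
    }

    /** Weighted external path length: sum of depth * weight over external nodes. */
    static int weightedPathLength(Node t, int depth) {
        if (t.left == null) return depth * t.weight;
        return weightedPathLength(t.left, depth + 1)
             + weightedPathLength(t.right, depth + 1);
    }

    public static void main(String[] args) {
        Node tree = build(new int[] { 29, 10, 9, 5, 4 });   // E, T, N, I, S
        System.out.println(weightedPathLength(tree, 0));    // 112
    }
}
```

Each of the n - 1 rounds performs two `poll` calls and one `add`, each logarithmic in the queue size. Ties between equal weights may be broken differently than in the slides, so individual code words can differ, but the weighted external path length stays the same.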

Optimality

Observation: The weight of a node K in the Huffman tree is equal to the sum of the weights of the external nodes below K. The weighted external path length of the subtree having K as root is the sum of the weights of all internal nodes of that subtree.

Theorem: A Huffman tree is an extended binary tree with minimal weighted external path length $Ext_w$.

Proof outline (by induction over n, the number of characters in the alphabet):
• The statement to prove is A(n) = "A Huffman tree with n external nodes has minimal external path length $Ext_w$".
• Consider first n = 2: prove A(2) = "A Huffman tree with 2 external nodes has minimal external path length".

Optimality (2)

Proof:
• n = 2: Only two characters with weights $q_1$ and $q_2$ result in a tree with $Ext_w = q_1 + q_2$. This is minimal, because there are no other trees.
• Induction hypothesis: For all $i \le n$, A(i) is true.
• To prove: A(n+1) is true.

[Figure: tree with root V and subtrees $T_1$ and $T_2$]

Optimality (3)

Proof:
• Consider a Huffman tree T with n+1 nodes. This tree has a root V and two subtrees $T_1$ and $T_2$, which have the weights $q_1$ and $q_2$, respectively.
• From the construction method we can deduce that for the weights $q_i$ of all internal nodes $n_i$ of $T_1$ and $T_2$: $q_i \le \min(q_1, q_2)$.
• That is why for these weights $q_i$: $q_1 + q_2 > q_i$. So if V is replaced by any node in $T_1$ or $T_2$, the resulting tree will have a greater weight.
• Replacing nodes within $T_1$ and $T_2$ does not make sense, because $T_1$ and $T_2$ are already optimal (both are trees with n nodes or fewer and the induction hypothesis holds for them).
• So T is an optimal tree with n+1 nodes.

[Figure: root V with weight $q_1 + q_2$ and subtrees $T_1$, $T_2$ with weights $q_1$, $q_2$]

Huffman Code: Applications

Fax machine

Huffman: Other applications

ZIP coding (at least a similar technique)

In principle: most coding techniques with data reduction (lossless compression)

NOT Huffman: the lossy steps of compression techniques like JPEG, MP3, MPEG, … (JPEG and MP3, for example, still apply Huffman coding as a lossless final stage)