© Neeraj Suri, EU-NSF ICT, March 2006
Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de
Introduction to Computer Science 2
Balanced Binary Search Trees (2) &
Extended Binary Trees
Prof. Neeraj Suri, Brahim Ayari
ICS-II - 2008: Balanced Binary Search Trees (2) & Extended Binary Trees
Height of AVL Trees
AVL trees are defined by the height difference of their subtrees.
Original goal: the tree should be as “balanced” as possible.
How balanced is an AVL tree? The answer is given by the following theorem on the height of an AVL tree:
Theorem: For the height h(T) of an AVL tree T with n nodes:
$\lfloor \log_2 n \rfloor + 1 \le h(T) \le 1.44 \log_2(n+1)$
Fibonacci Trees
The lower bound $\lfloor \log_2 n \rfloor + 1 \le h(T)$ comes from the minimal height of a binary tree with n nodes (already shown).
For the proof of the upper bound, one needs a special class of AVL trees: Fibonacci trees.
Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$
Definition: Fibonacci trees are constructed as follows:
• The empty tree $T_0$ is a Fibonacci tree (height 0).
• The tree $T_1$, which contains only one node, is a Fibonacci tree of height 1.
• If $T_{h-1}$ and $T_{h-2}$ are Fibonacci trees of heights h-1 and h-2, and x is a node, then $T_h = (T_{h-1}, x, T_{h-2})$ is a Fibonacci tree of height h.
• No other trees are Fibonacci trees.
Observe: the height of a Fibonacci tree is the number of nodes on the path from the root to the deepest leaf.
Fibonacci Trees
T0: empty tree; number of nodes n0 = 0 (F0 = 0)
T1: one node; n1 = 1 (F1 = 1)
T2 = (T1, x, T0); n2 = 2 (F2 = 1)
T3 = (T2, x, T1); n3 = 4 (F3 = 2)
[Figure: drawings of T2 and T3, each with root x]
Fibonacci Trees
T4 = (T3, x, T2); n4 = 7 (F4 = 3)
T5 = (T4, x, T3); n5 = 12 (F5 = 5)
[Figure: T4 with root x and subtrees T3 and T2; T5 with root x and subtrees T4 and T3]
T6, T7, etc. are built analogously.
Fibonacci and AVL Trees
To prove: Every Fibonacci tree is an AVL tree
Proof (by induction on h). Note: $T_h$ always has height h.
$T_0$ and $T_1$ are AVL trees.
Assume $T_{h-1}$ and $T_{h-2}$ are AVL trees and build $T_h = (T_{h-1}, x, T_{h-2})$ according to the rule.
Since $T_{h-1}$ and $T_{h-2}$ are AVL trees, we only need to check the balance factor of the root:
$|\mathrm{BF}(T_h)| = |h(T_{h-1}) - h(T_{h-2})| = |(h-1) - (h-2)| = 1 \le 1$, so $T_h$ is an AVL tree.
Fibonacci and AVL Trees
Note: for a given Fibonacci tree, there is no AVL tree with the same height and fewer nodes.
The construction yields AVL trees of maximal height for their number of nodes.
One can add nodes while keeping the height, but one cannot remove any node without either violating the AVL criterion or changing the height.
Thus, Fibonacci trees give the maximal height of an AVL tree for a given number of nodes.
Note: the number of nodes $n_h$ in $T_h$ is the (h+2)-th Fibonacci number minus 1, i.e.,
$n_h = F_{h+2} - 1$ (for $h \ge 0$)
This follows by induction from $n_h = n_{h-1} + n_{h-2} + 1$: the values $n_h + 1$ satisfy the Fibonacci recurrence, starting with $n_0 + 1 = 1 = F_2$ and $n_1 + 1 = 2 = F_3$.
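The construction and the node count can be checked mechanically. The following is a minimal Java sketch, not part of the original material: the class names FibNode and FibTrees are hypothetical, nodes carry no keys because only the shape matters, and the height is counted in nodes as on these slides.

```java
// Minimal sketch (FibNode/FibTrees are hypothetical names): builds T_h = (T_{h-1}, x, T_{h-2})
// and checks n_h = F_{h+2} - 1, the height, and the AVL property.
final class FibNode {
    final FibNode left, right;

    FibNode(FibNode left, FibNode right) {
        this.left = left;
        this.right = right;
    }
}

final class FibTrees {
    // T_0 is the empty tree (null), T_1 a single node.
    static FibNode build(int h) {
        if (h == 0) return null;
        if (h == 1) return new FibNode(null, null);
        return new FibNode(build(h - 1), build(h - 2));
    }

    // Height = number of nodes on the longest root-to-leaf path (empty tree: 0).
    static int height(FibNode t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int size(FibNode t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    // AVL criterion: at every node the subtree heights differ by at most 1.
    static boolean isAvl(FibNode t) {
        if (t == null) return true;
        return Math.abs(height(t.left) - height(t.right)) <= 1
                && isAvl(t.left) && isAvl(t.right);
    }

    // Iterative Fibonacci numbers with F_0 = 0, F_1 = 1.
    static long fib(int k) {
        long a = 0, b = 1;
        for (int i = 0; i < k; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 8; h++) {
            FibNode t = build(h);
            System.out.printf("h=%d  n_h=%d  F_(h+2)-1=%d  height=%d  AVL=%b%n",
                    h, size(t), fib(h + 2) - 1, height(t), isAvl(t));
        }
    }
}
```

Running main prints, for h = 0 to 8, the node count next to F_{h+2} - 1, together with the height and the AVL check. The naive recursion is fine for such small h.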
Fibonacci and AVL Trees
The following inequality holds for Fibonacci numbers: $F_h \ge \varphi^{h-2}$ for $h \ge 2$, where $\varphi = \frac{1}{2}(1 + \sqrt{5}) \approx 1.618$
Let n be the number of nodes in an AVL tree of height h. Since $T_h$ contains the minimal number of nodes for height h:
$n \ge n_h$
Inserting $n_h = F_{h+2} - 1$:
$n \ge n_h = F_{h+2} - 1 \ge \varphi^{h} - 1$, thus $n + 1 \ge \varphi^{h}$
The number of nodes grows exponentially with the height. Conversely:
$h \le \log_\varphi(n + 1) = \frac{1}{\log_2 \varphi} \log_2(n + 1) = 1.44\ldots \cdot \log_2(n + 1)$
Thus, in the worst case, a search path in an AVL tree is about 44% longer than in a complete tree.
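The bound can also be explored numerically. This hedged sketch (AvlHeightBound and its methods are hypothetical names) computes the maximal AVL height for n nodes as the largest h with n_h ≤ n, using the recurrence n_h = n_{h-1} + n_{h-2} + 1, and prints it next to the lower bound ⌊log2 n⌋ + 1 and the upper bound log_φ(n+1); the factor 1/log2 φ evaluates to about 1.4404.

```java
// Illustrative sketch (AvlHeightBound is a hypothetical name): the maximal height of an AVL
// tree with n nodes is the largest h whose Fibonacci tree T_h has at most n nodes.
final class AvlHeightBound {
    static final double PHI = (1 + Math.sqrt(5)) / 2;

    // Largest h with n_h <= n, using n_0 = 0, n_1 = 1, n_h = n_{h-1} + n_{h-2} + 1.
    static int maxAvlHeight(long n) {
        if (n == 0) return 0;
        long prev = 0, cur = 1; // n_{h-1} and n_h for h = 1
        int h = 1;
        while (cur + prev + 1 <= n) { // T_{h+1} still fits into n nodes
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
            h++;
        }
        return h;
    }

    // Upper bound log_phi(n + 1) = (1 / log2 phi) * log2(n + 1).
    static double upperBound(long n) {
        return Math.log(n + 1) / Math.log(PHI);
    }

    // Lower bound floor(log2 n) + 1, valid for n >= 1.
    static int minHeight(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.printf("1 / log2(phi) = %.4f%n", Math.log(2) / Math.log(PHI));
        for (long n : new long[] {7, 12, 100, 1000, 1_000_000}) {
            System.out.printf("n=%d  min height=%d  max AVL height=%d  bound=%.2f%n",
                    n, minHeight(n), maxAvlHeight(n), upperBound(n));
        }
    }
}
```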
Cost Analysis of AVL Trees
$h \le c \cdot \log_2(n+1)$ means: the height of an AVL tree is bounded by $O(\log_2 n)$.
Cost of insertion is in $O(\log_2 n)$: one only has to consider the path from the root to the insertion point, and rotations have constant cost.
Cost of deletion is in $O(\log_2 n)$: each node on the path from the root to the deleted node causes at most one rotation.
AVL trees are worst-case efficient implementations of binary search trees; natural (unbalanced) trees need Θ(n) steps in the worst case.
Calculating the average height is still an open problem.
Empirical results give $h \approx \log_2 n + c$ with $c \approx 0.2$.
Weight Balanced Binary Search Trees
Use the “weight difference” of the two subtrees as the measure of balance.
Weight = number of nodes in a subtree.
The properties are very similar to those of height-balanced binary trees.
Let T be a binary search tree, $T_L$ its left subtree and $n(X)$ the number of nodes in a tree X.
Definition: the value $\rho(T) = \frac{n(T_L) + 1}{n(T) + 1}$ is the root balance of T.
Definition: a tree T is α-balanced if every subtree T′ satisfies:
$\alpha \le \rho(T') \le 1 - \alpha$
Condition: $\alpha \le \rho(T') \le 1 - \alpha$
The set of all α-balanced binary trees is called BB(α) (“bounded balance”).
The definition of balance only considers the left subtree. But since $\rho'(T') = \frac{n(T'_R) + 1}{n(T') + 1} = 1 - \rho(T')$, every subtree of a BB(α) tree also satisfies $\alpha \le \rho'(T') \le 1 - \alpha$, where ρ′ is defined analogously to ρ on the right subtree.
The parameter α defines the “distance” from a complete tree:
• α = ½: only complete trees are allowed
• α < ½: relaxed condition
• α = 0: no structural conditions
• α > ½: makes no sense to consider
Example
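$\rho(T) = \frac{n(T_L) + 1}{n(T) + 1}$. Choose α = 0.3; then every subtree must satisfy $0.3 = \alpha \le \rho \le 1 - \alpha = 0.7$.
The tree is in BB(α) for α = 0.3:
Subtree with root | ρ
Mars | 3/10 = 0.3
Jupiter | 2/3 = 0.67
Pluto | 3/7 = 0.43
Mercury | 1/3 = 0.33
Uranus | 2/4 = 0.5
(each leaf: 1/2 = 0.5)
[Figure: search tree with root Mars; left subtree Jupiter with left child Earth; right subtree Pluto with children Mercury (right child Neptune) and Uranus (children Saturn and Venus)]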
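To recompute the table, here is a minimal Java sketch. The class names are hypothetical, and the tree shape in planets() is an assumption reconstructed from the listed root balances and the alphabetical order of the keys. Sizes are recomputed recursively for clarity rather than stored in the nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (BBNode/BoundedBalance are hypothetical names): computes the root balance
// rho(T) = (n(T_L) + 1) / (n(T) + 1) of every subtree and checks alpha <= rho <= 1 - alpha.
final class BBNode {
    final String key;
    final BBNode left, right;

    BBNode(String key, BBNode left, BBNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

final class BoundedBalance {
    static int size(BBNode t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    static double rootBalance(BBNode t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    // Records rho for every subtree root (preorder) and returns whether t is in BB(alpha).
    static boolean isBB(BBNode t, double alpha, Map<String, Double> rhos) {
        if (t == null) return true;
        double rho = rootBalance(t);
        rhos.put(t.key, rho);
        boolean here = alpha <= rho && rho <= 1 - alpha;
        boolean left = isBB(t.left, alpha, rhos);
        boolean right = isBB(t.right, alpha, rhos);
        return here && left && right;
    }

    static BBNode leaf(String key) {
        return new BBNode(key, null, null);
    }

    // Example tree; the shape is reconstructed from the table of root balances (an assumption).
    static BBNode planets() {
        return new BBNode("Mars",
                new BBNode("Jupiter", leaf("Earth"), null),
                new BBNode("Pluto",
                        new BBNode("Mercury", null, leaf("Neptune")),
                        new BBNode("Uranus", leaf("Saturn"), leaf("Venus"))));
    }

    public static void main(String[] args) {
        Map<String, Double> rhos = new LinkedHashMap<>();
        System.out.println("in BB(0.3): " + isBB(planets(), 0.3, rhos));
        rhos.forEach((key, rho) -> System.out.printf("%-8s %.2f%n", key, rho));
    }
}
```

With α = 0.3 the check succeeds, while a stricter α = 0.35 fails at Mars (0.3) and Mercury (1/3).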
Notes
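As already noted: α = ½ holds only for complete trees.
A root balance < ½ means that there are fewer nodes in the left subtree; α limits the root balance symmetrically from both sides.
If all nodes lie in the left subtree (e.g., a complete left subtree and an empty right subtree), the root balance n/(n+1) tends to 1 as the number of nodes grows; only α = 0 allows all such “degenerations”.
For an arbitrary α, a BB(α) tree with n nodes need not exist: e.g., for α = 0.4 there is no such tree with 2 nodes, because its root balance is 1/3 or 2/3.
For $0.25 < \alpha \le 1 - \frac{1}{2}\sqrt{2} \approx 0.292$, there is at least one tree in BB(α) for every n.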
Height of Weight Balanced Trees
Note: when traversing the path from the root to a leaf, one “loses” a fraction of the nodes (depending on α) at every step.
Consider the path $p = v_1, v_2, \ldots, v_h$ from the root $v_1$ to a leaf $v_h$.
For the left and right subtrees $T_L$ and $T_R$ of a tree T, the BB(α) condition gives:
$n(T_L) + 1 \le (1 - \alpha)(n(T) + 1)$
$n(T_R) + 1 \le (1 - \alpha)(n(T) + 1)$
Traversal of path p:
$n(v_2) + 1 \le (1 - \alpha)(n(v_1) + 1)$
$n(v_3) + 1 \le (1 - \alpha)(n(v_2) + 1)$
$\vdots$
$n(v_h) + 1 \le (1 - \alpha)(n(v_{h-1}) + 1)$
Height of Weight Balanced Trees
As $v_1$ is the root and $v_h$ a leaf: $n(T) + 1 = n(v_1) + 1$ and $n(v_h) + 1 = 2$.
Chaining the inequalities: $2 = n(v_h) + 1 \le (1 - \alpha)^{h-1}(n(v_1) + 1) = (1 - \alpha)^{h-1}(n(T) + 1)$
Applying $\log_2$ to both sides: $1 \le (h - 1)\log_2(1 - \alpha) + \log_2(n(T) + 1)$
Thus, with $c = -\log_2(1 - \alpha) > 0$ (note: $\log_2(1 - \alpha) < 0$ for $\alpha > 0$):
$h - 1 \le \frac{\log_2(n(T) + 1) - 1}{c} \le \frac{\log_2(n(T) + 1)}{c} \in O(\log_2 n)$
The height of the tree is logarithmic in the number of nodes.
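As a worked instance, take α = 0.3 (the value from the earlier example, chosen only for illustration) and the constant c = -log2(1 - α) from the bound:

```latex
\begin{aligned}
c &= -\log_2(1 - \alpha) = -\log_2 0.7 \approx 0.515,\\
h - 1 &\le \frac{\log_2\bigl(n(T) + 1\bigr)}{c} \approx 1.94\,\log_2\bigl(n(T) + 1\bigr).
\end{aligned}
```

So for α = 0.3, a BB(α) tree is at most roughly twice as high (plus one) as a complete tree with the same number of nodes.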
Operations on Weight Balanced Binary Trees
Search is the same as for AVL trees; its cost is logarithmic.
For insertion/deletion, the root balance must be updated along the path from the root to the corresponding position.
If the criterion is violated: rotations as for AVL trees.
Open issues: Are rotations appropriate measures for restructuring BB(α) trees? How does one efficiently calculate the root balance?
The number of rotations on the path to the root is limited: search, insertion and deletion are all in O(log2 n).
Position Search in Balanced Binary Search Tree
Comparison: tree implementations vs. linked lists
• Balanced trees allow (almost) all operations in O(log2 n).
• Linked lists need O(n) for search/insertion/deletion!
• For sequential traversal, both perform in O(n).
Should sorted data always be stored in trees?!
• One should not underestimate the implementation costs.
• The “last” operation where lists “win” is positional search (the pth element).
Positional search: find the kth element in a list. For trees, the “list” is the inorder traversal.
The Problem
For lists: traverse k elements in O(k).
For trees: one does not “know” whether to go left or right, and one knows nothing about the number of nodes in the subtrees. In the worst case, all nodes must be visited: O(n)!
That can be improved!
Rank of a Node
Definition: The rank of a node is the number of nodes in its left subtree plus 1.
Rank = position of node x in the inorder sequence of the subtree whose root is x.
class BinarySearchTree {
    int K;                  /* key */
    Info info;              /* info */
    int balance;            /* BF, for AVL trees: -1, 0, +1 */
    int rank;               /* number of nodes in the left subtree + 1 */
    BinarySearchTree L, R;
    /* constructor and methods ... */

    public BinarySearchTree findPos(int pos) { ... }
}
Algorithm
Pseudo code:
• Start at the root.
• If pos < rank: search in the left subtree.
• If pos > rank: subtract the rank from the position and search in the right subtree.
• The search stops when pos = rank.
Correctness: the rank of a node is always its position in the subtree of which it is the root.
Note: inserting or deleting in the left subtree of a node changes that node's rank; hence, on the path up to the root, every node whose left subtree was affected must update its rank.
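The pseudo code and the rank maintenance can be sketched together as follows. This is a hedged illustration, not the lecture's implementation: RankTree is a hypothetical class with String keys, it does no rebalancing (so its cost is only logarithmic when the tree happens to be balanced), and insertion increments the rank of exactly those nodes on the search path whose left subtree receives the new key.

```java
// Illustrative sketch (RankTree is a hypothetical name): an unbalanced BST whose nodes store
// rank = (number of nodes in the left subtree) + 1; rebalancing is omitted.
final class RankTree {
    private static final class Node {
        final String key;
        int rank = 1; // a new node has an empty left subtree
        Node left, right;

        Node(String key) {
            this.key = key;
        }
    }

    private Node root;

    boolean contains(String key) {
        Node cur = root;
        while (cur != null) {
            int c = key.compareTo(cur.key);
            if (c == 0) return true;
            cur = c < 0 ? cur.left : cur.right;
        }
        return false;
    }

    // Every node whose LEFT subtree receives the new key increments its rank.
    void insert(String key) {
        if (contains(key)) return; // a duplicate would corrupt the ranks
        if (root == null) {
            root = new Node(key);
            return;
        }
        Node cur = root;
        while (true) {
            if (key.compareTo(cur.key) < 0) {
                cur.rank++;
                if (cur.left == null) {
                    cur.left = new Node(key);
                    return;
                }
                cur = cur.left;
            } else {
                if (cur.right == null) {
                    cur.right = new Node(key);
                    return;
                }
                cur = cur.right;
            }
        }
    }

    // Positional search following the pseudo code; returns null if pos is out of range.
    String findPos(int pos) {
        Node hit = find(root, pos);
        return hit == null ? null : hit.key;
    }

    private static Node find(Node t, int pos) {
        if (t == null) return null;
        if (pos < t.rank) return find(t.left, pos);
        if (pos > t.rank) return find(t.right, pos - t.rank);
        return t; // pos == rank
    }
}
```

Inserting the cities Lima, Bonn, Prague, Bern, Cairo, Paris, Sofia, Athens, Oslo, Rome, Tokyo in this order gives the root Lima with rank 5, and findPos(4) returns "Cairo" and findPos(9) returns "Rome", matching the example.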
Example
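[Figure: search tree of cities with the rank of each node: root Lima (5); left child Bonn (3) with children Bern (2; left child Athens (1)) and Cairo (1); right child Prague (3) with children Paris (2; left child Oslo (1)) and Sofia (2; children Rome (1) and Tokyo (1)). Inorder positions pos = 1 to 11: Athens, Bern, Bonn, Cairo, Lima, Oslo, Paris, Prague, Rome, Sofia, Tokyo]
pos = 4 -> Cairo
pos = 9 -> Rome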
Java Method
public BinarySearchTree findPos(int pos) {
    BinarySearchTree root = this;
    while ((root != null) && (pos != root.rank)) {
        if (pos < root.rank) {
            root = root.L;            // position lies in the left subtree
        } else {
            pos = pos - root.rank;    // skip the left subtree and the current node
            root = root.R;
        }
    }
    return root;                      // null if pos is out of range
}
Complexity in a balanced tree: O(log2 n)
Summary: Balanced Search Trees
Operation | Sequential list | Linked list | Balanced tree with rank
Search | O(log2 n) (binary search) | O(n) | O(log2 n)
Positional search (kth element) | O(1) | O(k) | O(log2 n)
Insertion | O(log2 n) + O(n) | O(n); O(1) at known position | O(log2 n)
Deletion | O(log2 n) + O(n) | O(n); O(1) at known position (doubly linked) | O(log2 n)
Deletion of kth element | O(n-k) | O(k) | O(log2 n)
Sequential traversal | O(n) | O(n) | O(n)
Extended Binary Trees
Extended binary trees
Replace NULL pointers with special (external) nodes.
A binary tree to which external nodes are added is called an extended binary tree.
The data can be stored either in the internal or in the external nodes.
The length of the path to a node reflects the cost of searching for it.
External and internal path length
The cost of the search in extended binary trees depends on the following parameters:
External path length = the sum of the path lengths from the root to all external nodes $S_i$ ($1 \le i \le n+1$):
$\mathrm{Ext}_n = \sum_{i=1}^{n+1} \mathrm{depth}(S_i)$
Internal path length = the sum of the path lengths to all internal nodes $K_i$ ($1 \le i \le n$):
$\mathrm{Int}_n = \sum_{i=1}^{n} \mathrm{depth}(K_i)$
$\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ (proof by induction: replacing an external node at depth d by an internal node with two external children adds d to Int and d + 2 to Ext)
Extended binary trees with minimal external path length therefore also have minimal internal path length.
Example
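External path length $\mathrm{Ext}_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25$
Internal path length $\mathrm{Int}_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11$
With n = 7: $25 = \mathrm{Ext}_n = \mathrm{Int}_n + 2n = 11 + 14 = 25$
[Figure: extended binary tree with n = 7 internal nodes at depths 0, 1, 1, 2, 2, 2, 3 and 8 external nodes at depths 2, 3, 3, 3, 3, 3, 4, 4]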
Minimal and maximal length
For a given n, a balanced tree has the minimal internal path length.
Example: in a complete tree of height h, i.e. for $n = 2^h - 1$, the internal path length is
$\mathrm{Int}_n = \sum_{i=1}^{h-1} i \cdot 2^i$
The internal path length becomes maximal if the tree degenerates to a linear list:
$\mathrm{Int}_n = \sum_{i=1}^{n-1} i = n(n-1)/2$
Example: h = 4, n = 15: Int = 34, Ext = 16 · 4 = 64
For comparison: a list with n = 15 nodes has Int = 105, Ext = 105 + 30 = 135
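The identity Ext_n = Int_n + 2n and the numbers above can be checked with a short Java sketch (PathLengths is a hypothetical name). Every null pointer counts as an external node, and depths are measured in edges from the root. The n = 7 tree in example() is one shape consistent with the depths listed in the earlier example; the exact shape of the original figure is an assumption.

```java
// Illustrative sketch (PathLengths is a hypothetical name): every null pointer is treated
// as an external node; depths are counted in edges from the root (root depth 0).
final class PathLengths {
    static final class Node {
        final Node left, right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // Returns {Int_n, Ext_n, n}.
    static long[] lengths(Node t) {
        long[] acc = new long[3];
        walk(t, 0, acc);
        return acc;
    }

    private static void walk(Node t, int depth, long[] acc) {
        if (t == null) { // external node
            acc[1] += depth;
            return;
        }
        acc[0] += depth; // internal node
        acc[2]++;
        walk(t.left, depth + 1, acc);
        walk(t.right, depth + 1, acc);
    }

    // Complete tree of height h with 2^h - 1 internal nodes.
    static Node complete(int h) {
        return h == 0 ? null : new Node(complete(h - 1), complete(h - 1));
    }

    // Degenerate tree: a linear list of n nodes (each node has only a right child).
    static Node list(int n) {
        Node t = null;
        for (int i = 0; i < n; i++) t = new Node(null, t);
        return t;
    }

    // One tree matching the depths of the n = 7 example (the exact shape is an assumption).
    static Node example() {
        Node d = new Node(new Node(null, null), null); // depth 2, with one child at depth 3
        Node e = new Node(null, null);                 // depth 2
        Node f = new Node(null, null);                 // depth 2
        return new Node(new Node(d, e), new Node(f, null));
    }

    public static void main(String[] args) {
        for (Node t : new Node[] {example(), complete(4), list(15)}) {
            long[] r = lengths(t);
            System.out.printf("n=%d  Int=%d  Ext=%d  Int+2n=%d%n", r[2], r[0], r[1], r[0] + 2 * r[2]);
        }
    }
}
```

In each of the three cases, Ext equals Int + 2n.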
Weighted binary trees
Often, weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$).
The weighted external path length is defined as
$\mathrm{Ext}_w = \sum_{i=1}^{n+1} \mathrm{depth}(S_i) \cdot q_i$
For weighted binary trees, the properties of minimal and maximal path lengths no longer apply.
Determining the minimal weighted external path length is an important practical problem...
[Figure: two trees with external weights 3, 8, 15, 25: a balanced tree with Ext_w = 102, and a linear list with Ext_w = 88 (less than 102 although it is a linear list)]
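A small sketch of the weighted external path length follows. It assumes, from the figure, that both trees have the external weights 3, 8, 15, 25: once all at depth 2, and once as a linear list with depths 3, 3, 2, 1. The class WeightedExt and its helpers are hypothetical names.

```java
// Illustrative sketch (WeightedExt is a hypothetical name): Ext_w = sum of depth(S_i) * q_i
// over the external nodes S_i; every internal node has exactly two children here.
final class WeightedExt {
    static final class Node {
        final int weight;       // q_i, meaningful only for external nodes
        final Node left, right; // both null for an external node

        private Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        static Node ext(int q) {
            return new Node(q, null, null);
        }

        static Node inner(Node left, Node right) {
            return new Node(0, left, right);
        }
    }

    static long extW(Node t) {
        return extW(t, 0);
    }

    private static long extW(Node t, int depth) {
        if (t.left == null && t.right == null) return (long) depth * t.weight;
        return extW(t.left, depth + 1) + extW(t.right, depth + 1);
    }

    // All four external nodes at depth 2.
    static Node balanced() {
        return Node.inner(Node.inner(Node.ext(3), Node.ext(8)),
                Node.inner(Node.ext(15), Node.ext(25)));
    }

    // Linear list: depths 3, 3, 2, 1 for the weights 3, 8, 15, 25.
    static Node linear() {
        return Node.inner(Node.inner(Node.inner(Node.ext(3), Node.ext(8)), Node.ext(15)),
                Node.ext(25));
    }

    public static void main(String[] args) {
        System.out.println("balanced: " + extW(balanced()) + ", linear list: " + extW(linear()));
    }
}
```

The heavy weight 25 sits at depth 1 in the list, which is why the list beats the balanced tree here.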
Application example: optimal codes
To convert a text file efficiently into bit strings, there are two alternatives:
• Fixed-length coding: each character has the same number of bits (e.g., ASCII)
• Variable-length coding: some characters are represented using fewer bits than others
Example of fixed-length coding: a 3-bit code for the alphabet A, B, C, D: A = 001, B = 010, C = 011, D = 100
Message: ABBAABCDADA is converted to 001010010001001010011100001100001 (length 33 bits).
Using a 2-bit code, the same message can be coded with only 22 bits.
For decoding the message, group every 3 bits (respectively 2 bits) and use a table mapping each code to its character.
Application example: optimal codes (2)
Idea: more frequently used characters are coded using fewer bits.
Message: ABBAABCDADA
Coding: 01010001011111001100
Length: 20 bits!
Variable-length coding can reduce the memory space needed for storing the file.
How can such a coding be found, and why is the decoding unique?
Character | A | B | C | D
Frequency | 5 | 3 | 1 | 2
Coding | 0 | 10 | 111 | 110
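The bit counts can be reproduced with a small encoder. In this hedged Java sketch, CodeLength is a hypothetical class; the 3-bit and the variable-length tables are the ones given above, while the 2-bit table (A = 00, B = 01, C = 10, D = 11) is an assumed assignment, since only its total length is stated. Map.of requires Java 9 or later.

```java
import java.util.Map;

// Illustrative sketch (CodeLength is a hypothetical name): encodes a message with a code table.
final class CodeLength {
    // 3-bit code from the slide.
    static final Map<Character, String> FIXED3 = Map.of('A', "001", 'B', "010", 'C', "011", 'D', "100");
    // Hypothetical 2-bit assignment (the slides only state the resulting length).
    static final Map<Character, String> FIXED2 = Map.of('A', "00", 'B', "01", 'C', "10", 'D', "11");
    // Variable-length code from the table.
    static final Map<Character, String> VARIABLE = Map.of('A', "0", 'B', "10", 'C', "111", 'D', "110");

    // Concatenates the code words of the message's characters.
    static String encode(String message, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char ch : message.toCharArray()) {
            String word = code.get(ch);
            if (word == null) throw new IllegalArgumentException("no code for " + ch);
            bits.append(word);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String msg = "ABBAABCDADA";
        System.out.println("3-bit:    " + encode(msg, FIXED3).length() + " bits");
        System.out.println("2-bit:    " + encode(msg, FIXED2).length() + " bits");
        String v = encode(msg, VARIABLE);
        System.out.println("variable: " + v + " (" + v.length() + " bits)");
    }
}
```

The three encodings have 33, 22 and 20 bits respectively.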
Application example: optimal codes (3)
Representation of the frequencies and the coding as a weighted binary tree.
First, decoding: given a bit string, use the successive bits to traverse the tree starting from the root (0 = left, 1 = right). When you reach an external node, output the character stored there and continue from the root.
Example: 010100010111...
• 1st bit = 0: left, external node: A
• 2nd bit = 1: from the root to the right
• 3rd bit = 0: left, external node: B
• 4th bit = 1: from the root to the right
• 5th bit = 0: left, external node: B
• ...
[Figure: code tree with edges labeled 0 (left) and 1 (right) and external nodes A (5), B (3), D (2), C (1)]
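Decoding by tree traversal can be sketched as follows. PrefixDecoder is a hypothetical class; it builds the code tree from a code table (assumed to be prefix-free), follows 0 to the left and 1 to the right, and restarts at the root after each external node. Input that ends in the middle of a code word is rejected.

```java
import java.util.Map;

// Illustrative sketch (PrefixDecoder is a hypothetical name): decodes a bit string by walking the code tree.
final class PrefixDecoder {
    static final class Node {
        Node zero, one;   // child for bit 0 (left) and bit 1 (right)
        Character symbol; // non-null only at external nodes
    }

    // Builds the code tree; assumes the table is prefix-free.
    static Node buildTree(Map<Character, String> code) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.symbol = e.getKey();
        }
        return root;
    }

    // Follows the bits from the root; at an external node, emits its character and restarts at the root.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur == null) throw new IllegalArgumentException("bit string does not match the code");
            if (cur.symbol != null) {
                out.append(cur.symbol.charValue());
                cur = root;
            }
        }
        if (cur != root) throw new IllegalArgumentException("bit string ends inside a code word");
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = buildTree(Map.of('A', "0", 'B', "10", 'C', "111", 'D', "110"));
        System.out.println(decode("01010001011111001100", root));
    }
}
```

Because no code word is a prefix of another, the first external node reached is always the only possible match, so the decoding is unique.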
Correctness condition
Observation: in variable-length coding, the code of one character must not be a prefix of the code of any other character.
If the code is represented as an extended binary tree with the characters in the external nodes, uniqueness is guaranteed (only one character per external node).
If the frequency of each character in the original text is taken as the weight of its external node, then a tree with minimal weighted external path length yields an optimal code.
How is a tree with minimal weighted external path length generated?
Huffman Code
Idea: characters are weighted and sorted according to their frequency. This also works independently of a particular text, e.g., for English (characters with relative weights):
E 1231 T 959 A 805 O 794
N 719 I 718 S 659 R 603
H 514 L 403 D 365 C 320
U 310 P 229 F 228 M 225
W 203 Y 188 B 162 G 161
V 93 K 52 Q 20 X 20
J 10 Z 9
A binary tree with minimal weighted external path length is constructed as follows:
• Each character is represented by a tree consisting of a single external node that carries the character's weight.
• The two trees with the smallest weights are merged into a new tree. The root of the new tree is labeled with the sum of the weights of the two original roots.
• Continue until only one tree remains.
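The merging procedure can be traced on the weights alone. The following minimal sketch (HuffmanTrace is a hypothetical name) keeps the current root weights in a list, re-sorts it in every step for simplicity, and records each merged weight. Since every external node's weight is counted once for each internal node above it, the sum of the merged weights equals Ext_w. For E 29, T 10, N 9, I 5, S 4 (Example 1), it yields 9, 18, 28, 57 and Ext_w = 112.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch (HuffmanTrace is a hypothetical name): simulates the merging on weights only.
final class HuffmanTrace {
    // Returns the weights of the newly created roots in the order they are created.
    static List<Long> mergeWeights(long... weights) {
        List<Long> roots = new ArrayList<>();
        for (long w : weights) roots.add(w);
        List<Long> merged = new ArrayList<>();
        while (roots.size() > 1) {
            Collections.sort(roots);   // simple but not efficient: re-sort in every step
            long p1 = roots.remove(0); // smallest weight
            long p2 = roots.remove(0); // second smallest weight
            merged.add(p1 + p2);
            roots.add(p1 + p2);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Long> merged = mergeWeights(29, 10, 9, 5, 4); // E, T, N, I, S
        long extW = 0;
        for (long w : merged) extW += w;
        System.out.println("merged weights " + merged + ", Ext_w = " + extW);
    }
}
```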
Example 1: Huffman
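Alphabet and frequency: E 29, T 10, N 9, I 5, S 4
Step 1: (4, 5, 9, 10, 29): merge 4 + 5, new weight: 9
Step 2: (9, 9, 10, 29): merge 9 + 9, new weight: 18
[Figure: step 1 creates a tree with root 9 and children S (4, edge 0) and I (5, edge 1); step 2 creates a tree with root 18 whose children are this tree (edge 0) and N (9, edge 1)]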
Example 1: Huffman (2)
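Step 3: (18, 10, 29), sorted: (10, 18, 29): merge 10 + 18, new weight: 28
Step 4: (28, 29): merge 28 + 29, new weight: 57. Finished!
[Figure: step 3 creates a tree with root 28 and children T (10, edge 0) and the tree 18 (edge 1); step 4 creates the final tree with root 57 and children 28 (edge 0) and E (29, edge 1)]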
Resulting tree
Coding:
Character | Code | Weight
E | 1 | 29
T | 00 | 10
N | 011 | 9
I | 0101 | 5
S | 0100 | 4
$\mathrm{Ext}_w = 112$
Using this coding, the codes are, e.g.:
TENNIS = 00101101101010100
SET = 0100100
NET = 011100
Decoding works as described before.
[Figure: the resulting Huffman tree with root 57; edges labeled 0 (left) and 1 (right)]
Some remarks
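The resulting tree is not regular (i.e., not nearly complete). Regular trees are not always optimal: the best nearly complete tree for these weights has Ext_w = 123.
For the message ABBAABCDADA, 20 bits is optimal (see previous slides).
[Figure: nearly complete tree with external weights 10, 29, 9 at depth 2 and 4, 5 at depth 3]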
Example 2: Huffman
Average number of bits without Huffman: 3 (because $2^3 = 8$)
Average number of bits using the Huffman code:
$0.25 \cdot 2 + 0.04 \cdot 4 + 0.13 \cdot 3 + 0.07 \cdot 3 + 0.35 \cdot 2 + 0.11 \cdot 3 + 0.02 \cdot 5 + 0.03 \cdot 5 = 2.54$
There are other “valid” solutions! But the average number of bits is the same for all of them (equal to the Huffman value).
Character | p (%) | Code
A | 25 | 00
B | 4 | 1110
C | 13 | 100
D | 7 | 110
E | 35 | 01
F | 11 | 101
G | 2 | 11110
H | 3 | 11111
Analysis
/* Algorithm Huffman */
for (int i = 1; i <= n-1; i++) {
    p1 = smallest element in list L; remove p1 from L;
    p2 = smallest element in L; remove p2 from L;
    create node p; add p1 and p2 as left and right subtrees of p;
    weight(p) = weight(p1) + weight(p2);
    insert p into L;
}
The run time depends in particular on the implementation of the list:
• the time required to find the node with the smallest weight
• the time required to insert a new node
“Naive” implementations give O(n²); “smarter” ones (e.g., a binary heap as priority queue) result in O(n log2 n).
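A “smarter” variant keeps the trees in a binary heap, so each of the n - 1 iterations costs O(log n). This is a hedged Java sketch using java.util.PriorityQueue; the class Huffman and its helpers are hypothetical names. Ties between equal weights may be broken differently than in the examples, so individual code words can differ, but the weighted external path length Ext_w is the same.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch (Huffman is a hypothetical class name): builds a Huffman tree with a
// binary heap, so each of the n - 1 merge steps costs O(log n) and the total is O(n log n).
final class Huffman {
    static final class Node {
        final long weight;
        final Character symbol; // non-null only for external nodes
        final Node left, right;

        Node(long weight, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    static Map<Character, Long> freq(String symbols, long... weights) {
        if (symbols.length() != weights.length) throw new IllegalArgumentException("length mismatch");
        Map<Character, Long> m = new LinkedHashMap<>();
        for (int i = 0; i < weights.length; i++) m.put(symbols.charAt(i), weights[i]);
        return m;
    }

    static Node build(Map<Character, Long> freq) {
        if (freq.isEmpty()) throw new IllegalArgumentException("empty alphabet");
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (Map.Entry<Character, Long> e : freq.entrySet()) {
            pq.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (pq.size() > 1) {
            Node p1 = pq.poll(); // smallest weight
            Node p2 = pq.poll(); // second smallest weight
            pq.add(new Node(p1.weight + p2.weight, null, p1, p2));
        }
        return pq.poll();
    }

    // Left edge = 0, right edge = 1.
    static Map<Character, String> codes(Node root) {
        Map<Character, String> out = new HashMap<>();
        assign(root, "", out);
        return out;
    }

    private static void assign(Node t, String prefix, Map<Character, String> out) {
        if (t.symbol != null) {
            out.put(t.symbol, prefix.isEmpty() ? "0" : prefix); // one-character alphabet: code "0"
            return;
        }
        assign(t.left, prefix + "0", out);
        assign(t.right, prefix + "1", out);
    }

    // Weighted external path length: sum of weight * code length.
    static long extW(Map<Character, Long> freq, Map<Character, String> code) {
        long total = 0;
        for (Map.Entry<Character, Long> e : freq.entrySet()) {
            total += e.getValue() * code.get(e.getKey()).length();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Character, Long> f = freq("ETNIS", 29, 10, 9, 5, 4);
        Map<Character, String> c = codes(build(f));
        System.out.println(c + "  Ext_w = " + extW(f, c));
    }
}
```

For E 29, T 10, N 9, I 5, S 4, this gives code lengths 1, 2, 3, 4, 4 and Ext_w = 112. For the eight characters of Example 2 (weights in percent), it gives Ext_w = 254, i.e. 2.54 bits per character on average.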
Optimality
Observation: the weight of a node K in the Huffman tree is the sum of the weights of the external nodes in the subtree rooted at K. Consequently, the weighted external path length Ext_w of the tree equals the sum of the weights of all its internal nodes.
Theorem: A Huffman tree is an extended binary tree with minimal weighted external path length Ext_w.
Proof outline (by induction over n, the number of characters in the alphabet): the statement to prove is A(n) = “A Huffman tree with n nodes (characters) has minimal external path length Ext_w”.
Consider first n = 2: prove A(2) = “A Huffman tree with 2 nodes has minimal external path length”.
Optimality (2)
Proof: n = 2: only two characters with weights q1 and q2; they result in a tree with Ext_w = q1 + q2. This is minimal, because there are no other trees.
Induction hypothesis: A(i) is true for all i ≤ n. To prove: A(n+1) is true.
[Figure: root V with subtrees T1 and T2]
Optimality (3)
Proof: Consider a Huffman tree T with n+1 nodes. This tree has a root V and two subtrees T1 and T2 with weights q1 and q2, respectively.
From the construction method we can deduce that the weights $q_i$ of all internal nodes of T1 and T2 satisfy $q_i \le \max(q_1, q_2)$.
Hence, for these weights: $q_1 + q_2 > q_i$. So if V is replaced by any node in T1 or T2, the resulting tree will have a greater weight.
Replacing nodes within T1 and T2 makes no sense, because T1 and T2 are already optimal (both are trees with n nodes or fewer, so the induction hypothesis holds for them).
So T is an optimal tree with n+1 nodes.
[Figure: root V with weight q1 + q2 and subtrees T1 (weight q1) and T2 (weight q2)]
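The theorem can be spot-checked by brute force for small alphabets. In this hedged sketch (OptimalityCheck is a hypothetical name), the minimum over all extended binary trees is computed with a subset recurrence: a tree over the leaf set S splits at its root into trees over disjoint nonempty sets A and B, and every leaf moves one level deeper, so cost(S) = sum(S) + min(cost(A) + cost(B)). This exhaustive search is exponential and only meant as a check, not as an alternative algorithm.

```java
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.Random;

// Illustrative sketch (OptimalityCheck is a hypothetical name): compares Huffman's Ext_w
// with an exhaustive minimum over all extended binary trees for small alphabets.
final class OptimalityCheck {
    // cost[mask] = minimal Ext_w of a tree whose external nodes are the weights in mask.
    static long bruteForceMinExtW(long[] w) {
        int n = w.length;
        int full = (1 << n) - 1;
        long[] sum = new long[full + 1];
        long[] cost = new long[full + 1];
        for (int mask = 1; mask <= full; mask++) {
            int low = Integer.numberOfTrailingZeros(mask);
            sum[mask] = sum[mask & (mask - 1)] + w[low];
            if ((mask & (mask - 1)) == 0) continue; // single external node: cost 0
            long best = Long.MAX_VALUE;
            // Every proper nonempty submask a is one possible split at the root.
            for (int a = (mask - 1) & mask; a > 0; a = (a - 1) & mask) {
                best = Math.min(best, cost[a] + cost[mask ^ a]);
            }
            cost[mask] = sum[mask] + best; // all leaves move one level deeper
        }
        return cost[full];
    }

    // Huffman on weights only: Ext_w equals the sum of the weights of all merged nodes.
    static long huffmanExtW(long[] w) {
        PriorityQueue<Long> pq = new PriorityQueue<>();
        for (long x : w) pq.add(x);
        long total = 0;
        while (pq.size() > 1) {
            long merged = pq.poll() + pq.poll();
            total += merged;
            pq.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int trial = 0; trial < 500; trial++) {
            int n = 2 + rnd.nextInt(7); // 2..8 characters
            long[] w = new long[n];
            for (int i = 0; i < n; i++) w[i] = 1 + rnd.nextInt(50);
            if (huffmanExtW(w) != bruteForceMinExtW(w)) {
                throw new AssertionError("mismatch for " + Arrays.toString(w));
            }
        }
        System.out.println("Huffman matched the exhaustive minimum in all trials");
    }
}
```

main compares both values on random weight sets with 2 to 8 characters and stops with an error on the first mismatch.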
Huffman Code: Applications
Fax machine
Huffman: Other applications
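ZIP compression (at least a similar technique)
In principle: most coding techniques with data reduction (lossless compression)
NOT pure Huffman: lossy compression techniques like JPEG, MP3, MPEG, … (their lossy stages discard information; JPEG and MP3 do, however, use Huffman coding as a final lossless stage)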