Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of...

117
Balanced Trees Part One

Transcript of Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of...

Page 1: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Balanced TreesPart One

Page 2: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Balanced Trees

● Balanced trees are surprisingly versatile data structures.

● Many programming languages ship with a balanced tree library.● C++: std::map / std::set● Java: TreeMap / TreeSet● Haskell: Data.Map

● Many advanced data structures are layered on top of balanced trees.● Euler tour trees (next week)● Dynamic graphs (later this quarter)● y-Fast Tries

Page 3: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Outline for This Week

● B-Trees● A simple type of balanced tree developed for

block storage.

● Red/Black Trees● The canonical balanced binary search tree.

● Augmented Search Trees● Adding extra information to balanced trees to

supercharge the data structure.

● Two Advanced Operations● The split and join operations.

Page 4: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Outline for Today

● BST Review● Refresher on basic BST concepts and runtimes.

● Overview of Red/Black Trees● What we're building toward.

● B-Trees● A simple balanced tree in depth.

● Intuiting Red/Black Trees● A much better feel for red/black trees.

Page 5: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

A Quick BST Review

Page 6: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Binary Search Trees

● A binary search tree is a binary tree with the following properties:

● Each node in the BST stores a key, and optionally, some auxiliary information.

● The key of every node in a BST is strictly greater than all keys to its left and strictly smaller than all keys to its right.

● The height of a binary search tree is the length of the longest path from the root to a leaf, measured in the number of edges.

● A tree with one node has height 0.● A tree with no nodes has height -1, by convention.

Page 7: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Inserting into a BST

96

137

42

57

271

161 314

Page 8: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Inserting into a BST

96

137

42

57

271

161 314

166

Page 9: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Deleting from a BST

96

137

42

271

161 314

166

Case 1: If the node has just no children, just remove it.

Case 1: If the node has just no children, just remove it.

Page 10: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Deleting from a BST

137

42 271

161 314

166

Case 2: If the node has just one child, remove it and replace it with its child.

Case 2: If the node has just one child, remove it and replace it with its child.

Page 11: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Deleting from a BST

161

42 271

314166

Case 3: If the node has two children, find its inorder successor (which has zero or one child), replace the node's key with its successor's key, then delete its successor.

Case 3: If the node has two children, find its inorder successor (which has zero or one child), replace the node's key with its successor's key, then delete its successor.

Page 12: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Runtime Analysis

● The time complexity of all these operations is O(h), where h is the height of the tree.● Represents the longest path we can take.

● In the best case, h = O(log n) and all operations take time O(log n).

● In the worst case, h = Θ(n) and some operations will take time Θ(n).

● Challenge: How do you efficiently keep the height of a tree low?

Page 13: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

A Glimpse of Red/Black Trees

Page 14: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

110

107

106

166

161 261

140

Page 15: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

53

31

59 97

58

26 41

Page 16: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

5

2

8

7

1 4

Page 17: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● Theorem: Any red/black tree with n nodes has height O(log n).● We could prove this now, but there's a much

simpler proof of this we'll see later on.

● Given a fixed red/black tree, lookups can be done in time O(log n).

Page 18: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Mutating Red/Black Trees

17

3 11 23 37

7 31

13What are we

supposed to do with this new node?

What are we supposed to do with

this new node?

Page 19: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Mutating Red/Black Trees

17

3 11 23

7 37

How do we fix up the black-height property?

How do we fix up the black-height property?

Page 20: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Fixing Up Red/Black Trees

● The Good News: After doing an insertion or deletion, can locally modify a red/black tree in time O(log n) to fix up the red/black properties.

● The Bad News: There are a lot of cases to consider and they're not trivial.

● Some questions:● How do you memorize / remember all the

different types of rotations?● How on earth did anyone come up with

red/black trees in the first place?

Page 21: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

B-Trees

Page 22: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Generalizing BSTs

● In a binary search tree, each node stores a single key.

● That key splits the “key space” into two pieces, and each subtree stores the keys in those halves.

2

-1 4

-2 0 63

Values less than two Values greater than two

Page 23: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Generalizing BSTs

● In a multiway search tree, each node stores an arbitrary number of keys in sorted order.

● In a node with k keys splits the “key space” into k + 1 pieces, and each subtree stores the keys in those pieces.

2

43

5 19 31 71 83

3 7 11 13 17 23 29 37 41 47 53 67 73 79 89 9759 61

Page 24: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

One Solution: B-Trees

● A B-tree of order b is a multiway search tree with the following properties:

● All leaf nodes are stored at the same depth.

● All non-root nodes have between b – 1 and 2b – 1 keys.

● The root has at most 2b – 1 keys.

2

43

5 19 31 71 83

3 7 11 13 17 23 29 37 41 47 53 67 73 79 89 9759 61

B-tree of order 3

B-tree of order 3

Page 25: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

One Solution: B-Trees● A B-tree of order b is a multiway search tree with the following

properties:

● All leaf nodes are stored at the same depth.

● All non-root nodes have between b – 1 and 2b – 1 keys.

● The root has at most 2b – 1 keys.

B-tree of order 7

B-tree of order 7

1 3 6 10 11 14 19 20 21 23 24 28 29 33 44 48 57 62 77 91

16 36

Page 26: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

One Solution: B-Trees● A B-tree of order b is a multiway search tree with the following

properties:

● All leaf nodes are stored at the same depth.

● All non-root nodes have between b – 1 and 2b – 1 keys.

● The root has at most 2b – 1 keys.

B-tree of order 2

(2-3-4 Tree)

B-tree of order 2

(2-3-4 Tree)

1 2 4 6 7 8 11 12 14 15 17 18 19 21 2 24 26

3 9 10 16 20 25

5 13 23

Page 27: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Tradeoff

● Because B-tree nodes can have multiple keys, we have to spend more work inside each node.

● Insertion and deletion can be expensive – for large b, might have to shuffle thousands or millions of keys over!

● Why would you use a B-tree?

Page 28: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Memory Tradeoffs

● There is an enormous tradeoff between speed and size in memory.

● SRAM (the stuff registers are made of) is fast but very expensive:

● Can keep up with processor speeds in the GHz.

● As of 2010, cost is $5/MB.

● Good luck buying 1TB of the stuff!

● Hard disks are cheap but very slow:

● As of 2014, you can buy a 2TB hard drive for about $80.

● As of 2014, good disk seek times are measured in ms (about two to four million times slower than a processor cycle!)

Page 29: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Memory Hierarchy

● Idea: Try to get the best of all worlds by using multiple types of memory.

256B - 8KB

16KB – 64KB

1MB - 4MB

4GB – 256GB

500GB+

HUGE

0.25 – 1ns

1ns – 5ns

5ns – 25ns

25ns – 100ns

3 – 10ms

10 – 2000ms

L2 Cache

Main Memory

Hard Disk

Network (The Cloud)

Registers

L1 Cache

Page 30: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Why B-Trees?

● Because B-trees have a huge branching factor, they're great for on-disk storage.

● Disk block reads/writes are glacially slow.● Only have to read a few disk pages.● Extra work scanning inside a block offset by these

savings.

● Used extensively in databases, file systems, etc.● Typically, use a B+-tree rather than a B-tree, but

idea is similar.

● Recently, have been gaining traction for main-memory data structures.

● Memory cache effects offset extra searching costs.

Page 31: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Height of a B-Tree

● What is the maximum possible height of a B-tree of order b?

1

b – 1

b – 1 b – 1

b – 1 b – 1

…… …

b – 1

b – 1 b – 1

b – 1 b – 1

…… …

1

2(b - 1)

2b(b - 1)

2b2(b - 1)

2bh-1(b - 1)

b – 1 b – 1 b – 1…

Page 32: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Height of a B-Tree

● Theorem: The maximum height of a B-tree of order b containing n nodes is logb ((n + 1) / 2).

● Proof: Number of nodes n in a B-tree of height h is guaranteed to be at least

= 1 + 2(b – 1) + 2b(b – 1) + 2b2(b – 1) + … + 2bh-1(b – 1)

= 1 + 2(b – 1)(1 + b + b2 + … + bh-1)

= 1 + 2(b – 1)((bh – 1) / (b – 1))

= 1 + 2(bh – 1) = 2bh – 1

● Solving n = 2bh – 1 yields n = logb ((n + 1) / 2)

● Corollary: B-trees of order b have height O(logb n).

Page 33: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Searching in a B-Tree

● Doing a search in a B-tree involves● searching the root node for the key, and● if it's not found, recursively exploring the correct child.

● Using binary search within a given node, can find the key or the correct child in time O(log number-of-keys).

● Repeat this process O(tree-height) times.● Time complexity is

= O(log number-of-keys · tree-height)

= O(log b · logb n)

= O(log b · (log n / log b))

= O(log n)

● Requires reading O(logb n) blocks; this more directly accounts for the total runtime.

Page 34: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

B-Trees are Simple

● Because nodes in a B-tree can store multiple keys, most insertions or deletions are straightforward.

● Here's a B-tree with b = 3 (nodes have between 2 and 5 keys):

2

43

7 19 31 71 83

3 11 12 23 29 37 41 67 73 79 89 976113 14 59

Page 35: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

● What happens if you insert a key into a node that's too full?

● Idea: Split the node in two and propagate upward.

● Here's a 2-3-4 tree (each node has 1 to 3 keys).

1 6

11 21 31

41

16 91

86

26 36 81

56

56

51

46

76

71

66

612

Page 36: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

● What happens if you insert a key into a node that's too full?

● Idea: Split the node in two and propagate upward.

● Here's a 2-3-4 tree (each node has 1 to 3 keys).

1 6

11 21 31

41

16 91

86

26 36 81

56

56

51

46

76

71

66

612 3

Page 37: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

● What happens if you insert a key into a node that's too full?

● Idea: Split the node in two and propagate upward.

● Here's a 2-3-4 tree (each node has 1 to 3 keys).

1 6

11 21 31

41

16 91

86

26 36 81

56

56

51

46

76

71

66

612

3

Page 38: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

● What happens if you insert a key into a node that's too full?

● Idea: Split the node in two and propagate upward.

● Here's a 2-3-4 tree (each node has 1 to 3 keys).

1 6

11 31

41

16 91

86

26 36 81

56

56

51

46

76

71

66

612

3

21

Page 39: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

● What happens if you insert a key into a node that's too full?

● Idea: Split the node in two and propagate upward.

● Here's a 2-3-4 tree (each node has 1 to 3 keys).

1 6

11 31

16 91

86

26 36 81

56

56

51

46

76

71

66

612

3

21

41 Note: B-trees grow upward, not downward.

Note: B-trees grow upward, not downward.

Page 40: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Inserting into a B-Tree

● To insert a key into a B-tree:● Search for the key, insert at the last-visited leaf node.● If the leaf is too big (contains 2b keys):

– Split the node into two nodes of size b each.– Remove the largest key of the first block and make it the

parent of both blocks.– Recursively add that node to the parent, possibly triggering

more upward splitting.

● Time complexity:● O(b) work per level and O(logb n) levels.

● Total work: O(b logb n)

● In terms of blocks read: O(logb n)

Page 41: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Time-Out for Announcements!

Page 42: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Problem Set One

● Problem Set One is due on Wednesday at the start of class (2:15PM).

● Hand in theory questions in hardcopy at the start of lecture and submit coding problems using the submitter.

● Have questions? I'll be holding my office hours here in Hewlett 201 today right after class.

Page 43: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Your Questions!

Page 44: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

“Why were segment trees not covered in the context of RMQs? Segment trees have

complexity ⟨O(n), O(log n)⟩ and as I understand, they are more suitable when the array needs to be modified between

queries.”

Time – if we had more time to spend on RMQ, I definitely would have covered them. I have about 100 slides on them that I had to

cut... sorry about that!

Time – if we had more time to spend on RMQ, I definitely would have covered them. I have about 100 slides on them that I had to

cut... sorry about that!

Page 45: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

“What's your favorite programming language (and why)?”

C++, but that's just my personal preference. I have a lot of

experience with it and it has some amazingly powerful features.

C++, but that's just my personal preference. I have a lot of

experience with it and it has some amazingly powerful features.

Page 46: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

“If you were stuck on a deserted island with just one data structure, which data

structure would you want to have?”

Any fully retroactive data structure.

Any fully retroactive data structure.

Page 47: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Back to CS166!

Page 48: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

6

11 26 36

46

16 31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

21

Page 49: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

6

? 26 36

46

16 31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2111

Page 50: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

6

16 26 36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2111

Page 51: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

16 26 36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2111

Page 52: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

? 26 36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2111 16

Page 53: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

26 36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2111 16

Page 54: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

26 36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

2116

Page 55: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

Page 56: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

36

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

Page 57: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

?

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

Page 58: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

?

46

31 41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

Page 59: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

?

46

41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

Page 60: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

?

41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

46

Page 61: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

46

?

Page 62: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Trickier Cases

41 61

56

51

● How do you delete from a leaf that has only b – 1 keys?

● Idea: Steal keys from an adjacent nodes, or merge the nodes if both are empty.

● Again, a 2-3-4 tree:

36

46

Page 63: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Deleting from a B-Tree

● If not in a leaf, replace the key with its successor from a leaf and delete out of a leaf.

● To delete a key from a node:

● If the node has more than b – 1 keys, or if the node is the root, just remove the key.

● Otherwise, find a sibling node whose shared parent is p.

● If that sibling has at least b – 1 keys, move the max/min key from that sibling into p's place and p down into the current node, then remove the key.

● Fuse the node and its sibling into a single node by adding p into the block, then recursively remove p from the parent node.

● Work done is O(b logb n): O(b) work per level times O(logb n) total levels. Requires O(logb n) block reads/writes.

Page 64: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

So... red/black trees?

Page 65: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

110

107

106

166

161 261

140

Page 66: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

110

107106

166

161 261140

Page 67: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

19

3

11

23 37

7 31

17

13

Page 68: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees

● A red/black tree is a BST with the following properties:● Every node is either

red or black.● The root is black.● No red node has a red

child.● Every root-null path in

the tree passes through the same number of black nodes.

19

3 11 23 37

7

311713

Page 69: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Trees ≡ 2-3-4 Trees

● Red/black trees are an isometry of 2-3-4 trees; they represent the structure of 2-3-4 trees in a different way.

● Many data structures can be designed and analyzed in the same way.

● Huge advantage: Rather than memorizing a complex list of red/black tree rules, just think about what the equivalent operation on the corresponding 2-3-4 tree would be and simulate it with color flips and rotations.

Page 70: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

The Height of a Red/Black Tree

Theorem: Any red/black tree with n nodes hasheight O(log n).

Proof: Contract all red nodes into theirparent nodes to convert the red/blacktree into a 2-3-4 tree. This decreasesthe height of the tree by at most afactor of two. The resulting 2-3-4 tree hasheight O(log n), so the original red/blacktree has height 2 · O(log n) = O(log n). ■

Page 71: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Exploring the Isometry

● Nodes in a 2-3-4 tree are classified into types based on the number of children they can have.● 2-nodes have one key (two children).● 3-nodes have two keys (three children).● 4-nodes have three keys (four children).

● How might these nodes be represented?

Page 72: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Exploring the Isometry

k₁

k₁ k₂

k₁ k₂ k₃

k₁

k₁

k₂ k₁

k₂

k₁ k₃

k₂

Page 73: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

Page 74: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

311713

Page 75: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135

Page 76: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135

Page 77: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

Page 78: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

Page 79: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135

Page 80: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 37

Page 81: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 37

Page 82: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

37

Page 83: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Red/Black Tree Insertion

● Rule #1: When inserting a node, if its parent is black, make the node red and stop.

● Justification: This simulates inserting a key into an existing 2-node or 3-node.

Page 84: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

37

Page 85: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 37

Page 86: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

Page 87: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

Page 88: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

4 23

7 31

17

13

5

37

3

Page 89: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

4 23

7 31

17

13

5

37

3 We need to

1. Change the colors of the nodes, and2. Move the nodes around in the tree.

We need to

1. Change the colors of the nodes, and2. Move the nodes around in the tree.

Page 90: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Tree Rotations

B

A

>B

<A >A<B

B

A

>B

<A

>A<B

Rotate Right

Rotate Left

Page 91: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Tree Rotations

● Tree rotations are a fundamental primitive on binary search trees.

● Makes it possible to locally reorder nodes while preserving the binary search property.

● Most balanced trees use tree rotations at some point.

Page 92: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

3

5

4

3 5

4

3

5

4

3 5

4

applyrotation

changecolors

apply rotation

This applies any time we're inserting a new node into the middle of a “3-node.”

By making observations like these, we can determine

how to update a red/black tree after an insertion.

This applies any time we're inserting a new node into the middle of a “3-node.”

By making observations like these, we can determine

how to update a red/black tree after an insertion.

Page 93: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

37

4

Page 94: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7 31

17

13

5

37

4

Page 95: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17

13

5

374

Page 96: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17

13

5

374

Page 97: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17

13

5

374

15

Page 98: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17

13

5

374

15

Page 99: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17

13

5

374

15

Page 100: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17135

374 15

Page 101: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17135

374 15

Page 102: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374 15

Page 103: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374 15 16

Page 104: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

15

16

Page 105: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

15

16

Page 106: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

15

16

Page 107: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

3117135 374

15

16

Two steps:

1. Split the “5-node” into a “2-node” and a “3-node.”2. Insert the new parent of the two nodes into the parent node.

Two steps:

1. Split the “5-node” into a “2-node” and a “3-node.”2. Insert the new parent of the two nodes into the parent node.

Page 108: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

1713

15

16

1713 15 16

1713

15

161713

15

16

change colors

Page 109: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17135

374 15

16

Page 110: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17135

374 15

16

Page 111: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

23

7 31

17135

374 15

16

Page 112: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3

237

31

17

13

5

37

4

15

16

Page 113: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

311713

5 37

4

15

16

Page 114: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Using the Isometry

19

3 23

7

311713

5 37

4

15

16

Page 115: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Building Up Rules

● All of the crazy insertion rules on red/black trees make perfect sense if you connect it back to 2-3-4 trees.

● There are lots of cases to consider because there are many different ways you can insert into a red/black tree.

● Main point: Simulating the insertion of a key into a node takes time O(1) in all cases. Therefore, since 2-3-4 trees support O(log n) insertions, red/black trees support O(log n) insertions.

● The same is true of deletions.

Page 116: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

My Advice

● Do know how to do B-tree insertions and deletions.

● You can derive these easily if you remember to split and join nodes.

● Do remember the rules for red/black trees and B-trees.

● These are useful for proving bounds and deriving results.

● Do remember the isometry between red/black trees and 2-3-4 trees.

● Gives immediate intuition for all the red/black tree operations.

● Don't memorize the red/black rotations and color flips.

● This is rarely useful. If you're coding up a red/black tree, just flip open CLRS and translate the pseudocode. ☺

Page 117: Balanced Trees - Stanford University trees are surprisingly versatile data ... A simple type of balanced tree developed for ... There is an enormous tradeoff between speed and size

Next Time

● Augmented Trees● Building data structures on top of balanced

BSTs.

● Splitting and Joining Trees● Two powerful operations on balanced trees.