B -Trees G.Kamberova, Algorithms B-Trees: Balanced Trees for Use with Random Access Secondary...

Posted: 22-Dec-2015


B-Trees: Balanced Trees for Use with Random Access Secondary Storage

Gerda Kamberova
Department of Computer Science
Hofstra University


Overview

• Dynamic Set/Dictionary on a Disk Drive: B-trees
• Memory
– Motivation
– Memory hierarchy
– Impact of memory organization on the running time of algorithms
• B-trees
– Definition and examples
– Bounding the height of a B-tree
– Operations on a B-tree: search, insert, delete


Memory Hierarchy

• Up to now, we assumed that reads and writes are done from/to main memory and that each of these operations takes a fixed, minimal amount of time.

• Some applications deal with huge amounts of data that cannot all fit into main memory:
– analysis of scientific data
– processing of financial transactions
– organization and maintenance of databases
– telephone directories
– library catalogs, etc.

• B-Trees are balanced search trees designed to work well on direct access secondary storage devices (minimizing disk I/O operations)


Memory Hierarchy

• Computers have a hierarchy of different memories, which vary in size and speed (listed in order of increasing size and decreasing speed):
– CPU registers
– Cache: larger than the registers, and about an order of magnitude slower
– Main memory (RAM): about 2 orders of magnitude slower than cache
• SRAM
• DRAM
– Disks: 100,000 to 1,000,000 times slower than main memory

• The OS supports general mechanisms that make most memory accesses fast. These mechanisms rely on the locality-of-reference property of most software.


Locality-of-Reference and Memory Access

• Locality-of-reference
– Temporal locality (TL): if a program accesses a certain memory location now, it is likely to access it again in the near future
– Spatial locality (SL): if a program accesses a certain memory location now, it is likely to access other nearby locations in the near future
• Caching and blocking
– design choices for two-level memory systems, present in the interfaces
• between main memory and cache memory
• between external memory and main memory
– Caching: motivated by TL
• bring data from main memory into the cache, hoping that they will be needed soon; the response is then faster than going to main memory
• data are accessed in blocks called cache lines
– Blocking: motivated by SL
• if location x is required from secondary memory, bring into main memory not only the data at x but also the data at locations close to x
• data are accessed in blocks called pages (disk blocks)
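The following toy model in Python makes caching and blocking concrete. It is an illustration under stated assumptions, not how any particular OS works:
- main memory holds a fixed number of pages;
- the page is the unit of transfer;
- the least recently used page is evicted first. This eviction policy is assumed here and is not taken from the slides.

The class name `BufferPool` and its parameters are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy model of blocking + caching between disk and main memory.

    Hypothetical assumptions: addresses are item indices, a page holds
    `page_size` consecutive items, memory holds at most `capacity` pages,
    and the least recently used (LRU) page is evicted first.
    """

    def __init__(self, page_size, capacity):
        self.page_size = page_size
        self.capacity = capacity
        self.pages = OrderedDict()      # resident page numbers, least recent first
        self.disk_reads = 0

    def access(self, address):
        page = address // self.page_size        # blocking: the whole page is transferred
        if page in self.pages:                  # caching: hit, no disk access needed
            self.pages.move_to_end(page)
            return
        self.disk_reads += 1                    # miss: read the whole page from disk
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)      # evict the least recently used page


sequential = BufferPool(page_size=100, capacity=4)
for a in range(10_000):                         # consecutive items share pages
    sequential.access(a)

strided = BufferPool(page_size=100, capacity=4)
for a in range(0, 10_000, 100):                 # exactly one item per page
    strided.access(a)

print(sequential.disk_reads, strided.disk_reads)   # 100 100
```

Scanning 10,000 consecutive items costs one disk read per page, that is, 100 reads. Touching 100 items that lie on different pages also costs 100 reads. Blocking pays off exactly when nearby locations are used together.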


Implications of Locality-of-Reference

• In addition, blocking for external memory is motivated by the hardware characteristics of external storage devices.
– Thanks to blocking, secondary memory is perceived as much faster than it actually is.

• Implications of locality-of-reference for programmers
– The programmer usually does not have to be overly concerned with the memory hierarchy or with how blocking and caching are implemented; still, one should try to
• use TL: if an algorithm calls for several accesses to the same variable, group these accesses as close together as possible in execution order
• use SL: if an algorithm calls for accessing a certain location x in an array or a certain field in an object, group the accesses to locations spatially close to x as close together as possible in execution order

• When selecting an algorithm, can we take advantage of locality of reference?
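The small, hypothetical experiment below illustrates the SL advice. It visits the same 100 × 100 array, stored row by row, first in row order and then in column order. The setup:
- a page holds 100 items;
- only one page is kept in memory.

The function `page_reads` is an illustration of the counting argument, not a measurement of real hardware.

```python
def page_reads(order, cols, page_size):
    """Count page loads while visiting cells of a row-major 2D array.

    Hypothetical setup: cell (r, c) lives at address r * cols + c, and main
    memory holds a single page, so every change of page costs one disk read.
    """
    reads, current = 0, None
    for r, c in order:
        page = (r * cols + c) // page_size
        if page != current:
            reads += 1
            current = page
    return reads


rows, cols, page_size = 100, 100, 100
row_order = [(r, c) for r in range(rows) for c in range(cols)]   # nearby cells grouped
col_order = [(r, c) for c in range(cols) for r in range(rows)]   # jumps a whole row each step
print(page_reads(row_order, cols, page_size))   # 100
print(page_reads(col_order, cols, page_size))   # 10000
```

Row order groups spatially close accesses and needs one read per page. Column order changes page at every step.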


Dynamic Set on Secondary Storage

• Goal: minimize the number of disk accesses needed to perform searches or updates.
– It is preferable to do many main memory accesses instead of one disk access.

• Disk-access complexity of various implementations of a dynamic set
– Use the number of pages (blocks) read from disk as a crude approximation of the time spent accessing the disk
– Doubly linked list: search is O(n), since each successive link may require a different block
– Sorted array: search is O(log n), but insert and delete still require Θ(n/B) accesses
– Balanced BSTs, skip lists, or other structures with logarithmic times: in the worst case, each accessed node is in a different block, so O(log n) accesses
– B-trees: O(log n / log B)
• Idea: trade 1 slow disk access for O(B) very fast main-memory operations, where B is the block size.
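Here is a back-of-the-envelope comparison of the counts above. It assumes, hypothetically, n = 10^9 keys and B = 1000 keys per block. The helper `levels_needed` returns the smallest h with fanout^h ≥ n. It is a rough proxy for the number of blocks read on a search path, and it uses exact integer arithmetic to avoid floating-point rounding in logarithms.

```python
def levels_needed(n, fanout):
    """Smallest h with fanout**h >= n: a rough count of block reads on a
    search path in a tree with the given fan-out (exact integer arithmetic)."""
    h, reach = 0, 1
    while reach < n:
        reach *= fanout
        h += 1
    return h


n = 10**9    # hypothetical number of keys
B = 1000     # hypothetical number of keys per disk block
print("balanced BST, one node per block:", levels_needed(n, 2))    # 30
print("B-tree, about B children per node:", levels_needed(n, B))   # 3
print("sorted array insert, blocks shifted:", -(-n // B))          # 1000000
```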


B-Trees

• Balanced search trees designed to work well on data stored on disks. Multiple keys are stored, sorted, in each node; if a node keeps m keys, it has m+1 children.

• Property: an n-node B-tree has height O(log n).
• The maximum branching factor (BF) depends on the disk block size.
• For large B-trees stored on disk, a branching factor between 50 and 2000 is often used.
• With BF = 1001 (1000 keys per node):
– How many nodes are in a tree of height 2?
– How many keys can be stored in a tree of height 2?
– Since the root is kept in main memory, at most 2 disk accesses are necessary to locate any key.

Figure: example B-tree of height 2. Root(T) is M; its children are [D H] and [Q T X]; the leaves are [B C], [F G], [J K L], [N P], [R S], [V W], [Y Z].
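The questions above can be answered with a short calculation. Assume every node is full, with 1000 keys and 1001 children. A tree of height 2 then has 1 + 1001 + 1001^2 nodes, each holding 1000 keys. The helper `capacity` is introduced here for illustration.

```python
def capacity(branching, height):
    """Return (nodes, keys) of a full B-tree of the given height in which
    every node has `branching` children and `branching - 1` keys."""
    nodes = sum(branching ** d for d in range(height + 1))   # depths 0..height
    return nodes, nodes * (branching - 1)


nodes, keys = capacity(1001, 2)
print(nodes, keys)   # 1003003 1003003000
```

So a tree of height 2 with BF 1001 holds over one billion keys. With the root in main memory, any of them is at most 2 disk accesses away.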


Conventions

• We extend the pseudocode language with
– DiskRead(x): reads the page containing object x into main memory
– DiskWrite(x): writes the page containing object x to secondary storage
• Assume that pages no longer in use are flushed from main memory

• Usually we want a B-tree node to be the size of a whole disk page

• For simplicity, we ignore the “data” (satellite information); in practice, it is most common to store with each key a pointer to another disk page holding the data.
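To make a node the size of a whole disk page, one can pick the largest t whose full node (2t-1 keys, 2t children) still fits. The layout below is an assumption for illustration only:
- 8-byte keys;
- 8-byte pointers, one data pointer per key as described above plus the child pointers;
- a 16-byte header for n[x] and leaf[x].

Real page formats differ. The function name `max_min_degree` is hypothetical.

```python
def max_min_degree(page_bytes, key_bytes, ptr_bytes, header_bytes):
    """Largest minimum degree t whose full node fits in one disk page.

    Hypothetical layout: header + (2t-1) x (key + data pointer) + 2t child pointers.
    """
    def node_bytes(t):
        return header_bytes + (2 * t - 1) * (key_bytes + ptr_bytes) + 2 * t * ptr_bytes

    if node_bytes(2) > page_bytes:
        raise ValueError("page too small for a node with t = 2")
    t = 2
    while node_bytes(t + 1) <= page_bytes:
        t += 1
    return t


t = max_min_degree(page_bytes=4096, key_bytes=8, ptr_bytes=8, header_bytes=16)
print(t, 2 * t - 1, 2 * t)   # 85 169 170
```

Under these assumptions a 4096-byte page gives t = 85. A node then holds at most 169 keys and has BF at most 170, within the 50 to 2000 range quoted for disk-based B-trees.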


B-Tree Definition

A B-tree is a rooted tree whose nodes have the following properties.
1. Every node x has the following fields:
– n[x], the number of keys stored in x
– the n[x] keys themselves, stored in sorted order: key_1[x] ≤ key_2[x] ≤ … ≤ key_{n[x]}[x]
– leaf[x], which is TRUE if x is a leaf and FALSE otherwise

2. If x is an internal node, x has n[x]+1 children, accessed by the pointers c_1[x], c_2[x], …, c_{n[x]+1}[x] (analogous to left[x] and right[x] in a binary tree); leaves have no children.

3. The keys in a node x separate the ranges of keys stored in its subtrees: if k_i is any key stored in the subtree rooted at c_i[x], then k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ … ≤ key_{n[x]}[x] ≤ k_{n[x]+1}.

Figure: node x with keys key_1[x], key_2[x], …, key_{n[x]}[x]; subtree c_1[x] holds keys ≤ key_1[x], c_2[x] holds keys between key_1[x] and key_2[x], …, and c_{n[x]+1}[x] holds keys ≥ key_{n[x]}[x].
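Properties 1–3 can be rendered in Python as a minimal sketch. The names below are hypothetical, chosen to mirror n[x], leaf[x], key_i[x] and c_i[x]:
- `Node`, with its `n` and `leaf` properties;
- `satisfies_order`, which checks the properties;
- `mk`, a helper for building trees.

The example tree is the one from the earlier figure, as reconstructed from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                      # key_1[x] <= ... <= key_{n[x]}[x]
    children: list = field(default_factory=list)    # c_1[x] .. c_{n[x]+1}[x]; empty for a leaf

    @property
    def n(self):                                    # n[x]
        return len(self.keys)

    @property
    def leaf(self):                                 # leaf[x]
        return not self.children


def satisfies_order(x, lo=None, hi=None):
    """Properties 1-3: sorted keys, n[x]+1 children, and separated key ranges."""
    if any(a > b for a, b in zip(x.keys, x.keys[1:])):
        return False
    if any((lo is not None and k < lo) or (hi is not None and k > hi) for k in x.keys):
        return False
    if x.leaf:
        return True
    if len(x.children) != x.n + 1:
        return False
    bounds = [lo] + x.keys + [hi]                   # c_i lies between key_{i-1} and key_i
    return all(satisfies_order(c, bounds[i], bounds[i + 1])
               for i, c in enumerate(x.children))


def mk(keys, *kids):                                # hypothetical helper: node from a string
    return Node(list(keys), list(kids))


tree = mk("M", mk("DH", mk("BC"), mk("FG"), mk("JKL")),
               mk("QTX", mk("NP"), mk("RS"), mk("VW"), mk("YZ")))
print(satisfies_order(tree))   # True
```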


B-Tree Definition (cont)

4. Every leaf has the same depth, which is the height h of the tree.

5. Let t ≥ 2 be an integer, the minimum branching factor (the minimum out-degree) of the B-tree.

– Every node other than the root must have at least t-1 keys (n[x] ≥ t-1), and thus every internal node other than the root has at least t children

– If the tree is not empty, n[root[T]] ≥ 1, and thus a root that is not a leaf has at least 2 children

– Every node contains at most 2t-1 keys, and thus has at most 2t children

Thus: the BF of an internal root is between 2 and 2t, and every other internal node has BF between t and 2t.

Example: t=2, every internal node has between 2 and 4 children (2-3-4 tree)
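Properties 4 and 5 can be checked the same way. This sketch, again with hypothetical names, verifies three things:
- the key-count bounds t-1 ≤ n[x] ≤ 2t-1, relaxed to at least 1 key at the root;
- internal nodes have n[x]+1 children;
- all leaves share one depth.

It treats an empty tree as invalid, since the definition assumes n[root[T]] ≥ 1. Key order is not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf


def satisfies_shape(root, t):
    """Properties 4-5 for minimum degree t >= 2."""
    leaf_depths = set()

    def walk(x, depth):
        low = 1 if x is root else t - 1             # the root may hold a single key
        if not (low <= len(x.keys) <= 2 * t - 1):
            return False
        if not x.children:
            leaf_depths.add(depth)
            return True
        if len(x.children) != len(x.keys) + 1:      # hence BF between t and 2t (2 and 2t at the root)
            return False
        return all(walk(c, depth + 1) for c in x.children)

    return walk(root, 0) and len(leaf_depths) == 1  # every leaf at the same depth h


def mk(keys, *kids):                                # hypothetical helper: node from a string
    return Node(list(keys), list(kids))


tree = mk("M", mk("DH", mk("BC"), mk("FG"), mk("JKL")),
               mk("QTX", mk("NP"), mk("RS"), mk("VW"), mk("YZ")))
print(satisfies_shape(tree, 2), satisfies_shape(tree, 4))   # True False
```

With t = 4 the node [D H] has fewer than t-1 = 3 keys, so the same tree is not a valid B-tree for that minimum degree.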


The Height of a B-Tree

• The number of disk accesses for the basic operations is proportional to the height of the tree, thus O(h).

• Theorem: If n ≥ 1, then for any n-key B-tree T of height h and minimum BF t ≥ 2,

$$h \le \log_t \frac{n+1}{2} = \frac{\log_2(n+1) - 1}{\log_2 t} \le \frac{\log_2 n}{\log_2 t}.$$

Proof: Let M be a B-tree of height h with the minimum possible number of keys. Every B-tree of height h has at least as many keys as M, so if we prove the statement for M, it is true for any tree of height h.


B-Tree Height

• Proof (cont): In M, the root has 1 key and 2 children, and every other node has exactly t-1 keys (so every internal node other than the root has t children):

level    # nodes      # keys
0        1            1
1        2            2(t-1)
2        2t           2t(t-1)
3        2t^2         2t^2(t-1)
…        …            …
k        2t^(k-1)     2t^(k-1)(t-1)

Thus, for any n-key tree of height h,

$$n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} \;=\; 1 + 2(t-1)\,\frac{t^h - 1}{t - 1} \;=\; 2t^h - 1,$$

and solving for h,

$$t^h \le \frac{n+1}{2} \quad\Longrightarrow\quad h \le \log_t \frac{n+1}{2} = \frac{\log_2(n+1) - 1}{\log_2 t} = O\!\left(\frac{\log n}{\log t}\right) = O(\log_t n).$$

• Recall that for an RB tree $h_{RB} \le 2\log_2(n+1)$, while for a B-tree $h \le \log_t \frac{n+1}{2} = \frac{\log_2(n+1) - 1}{\log_2 t}$. Thus, B-trees save a factor of about lg t over RB-trees.
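A quick numerical check of the proof, using only the definitions:
- `min_keys` adds up the per-level key counts from the table;
- the assertion confirms that they telescope to 2t^h - 1;
- `max_height` returns the largest height an n-key tree can have.

Both function names are introduced here for illustration.

```python
import math


def min_keys(t, h):
    """Keys in the sparsest B-tree of height h: 1 key in the root, then
    2 t^(i-1) nodes with t-1 keys each at every level i = 1..h."""
    return 1 + sum(2 * t ** (i - 1) * (t - 1) for i in range(1, h + 1))


def max_height(n, t):
    """Largest h with 2 t^h - 1 <= n, i.e. h <= log_t((n+1)/2)."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


# The level-by-level sum telescopes to 2 t^h - 1, as in the proof.
assert all(min_keys(t, h) == 2 * t ** h - 1 for t in range(2, 10) for h in range(8))

n = 10**9
print(max_height(n, 1000), 2 * math.log2(n + 1))   # B-tree bound vs. red-black bound
```

For n = 10^9 and t = 1000, the B-tree height is at most 2. The red-black bound is about 59.8.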


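Basic Operations

Assume root(T) is always in main memory, so we never do DiskRead on the root; however, we must do DiskWrite whenever the root is changed.

• Searching: a straightforward generalization of BST search
• Ex: search for S
• Complexity:
– the number of DiskReads needed to find (or fail to find) the key is at most h ≤ log_t((n+1)/2) ≤ log(n+1)/log t
– at each node, O(log t) time to binary search the sorted keys and decide which child to go to
– 1 DiskRead to get the page containing the child

Figure: the example B-tree again (Root(T) is M; children [D H] and [Q T X]; leaves [B C], [F G], [J K L], [N P], [R S], [V W], [Y Z]); the search for S visits M, [Q T X], and [R S].

Thus, the total CPU time is O(th) = O(t log_t n) (O(h log t) if each node is binary searched), and the number of disk pages accessed is O(h) = O(log_t n). Note that disk access time dominates the time spent searching for a key within a node.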
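A minimal iterative sketch of the search in Python. It assumes in-memory nodes (the hypothetical `Node` class below) and uses `bisect.bisect_left` for the binary search inside each node. The visit counter stands in for the DiskRead calls of the slides' model. `search` and `mk` are illustrative names.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf


def search(x, k):
    """Return ((node, i), visited) if k == node.keys[i], else (None, visited)."""
    visited = 0
    while True:
        visited += 1
        i = bisect.bisect_left(x.keys, k)          # O(log t) binary search in the node
        if i < len(x.keys) and x.keys[i] == k:
            return (x, i), visited
        if not x.children:                         # reached a leaf: k is absent
            return None, visited
        x = x.children[i]                          # DiskRead(c_i[x]) in the slides' model


def mk(keys, *kids):                               # hypothetical helper: node from a string
    return Node(list(keys), list(kids))


tree = mk("M", mk("DH", mk("BC"), mk("FG"), mk("JKL")),
               mk("QTX", mk("NP"), mk("RS"), mk("VW"), mk("YZ")))
(node, i), visits = search(tree, "S")
print(node.keys, i, visits)   # ['R', 'S'] 1 3
```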


Basic B-Tree Operations

• Creating an empty tree: O(1) time
• Splitting a node

– An important operation for insertion is splitting a full node y (with 2t-1 keys) around its median key into 2 nodes having t-1 keys each.

– The median key moves up into y’s parent (which must not be full prior to splitting y).

– Ex: t=4, so a node holds at most 7 keys and has BF at most 8

– If y is the root, the tree grows in height by 1

Figure (split with t=4): before, x holds … N W … and its child y = [P Q R S T U V] (subtree Tb, between Ta and Tc) has subtrees 1, …, 8; after, x holds … N S W …, and y is replaced by [P Q R] (Tb1, subtrees 1–4) and [T U V] (Tb2, subtrees 5–8).


Basic Operations: Split (cont)

– B_Tree_split-child(x,i,y): y = c_i[x] is a full child of x, with 2t-1 keys
• splits the full child y of the non-full node x (already read into memory) into two nodes
• the median key of y moves into x
– Complexity of Split:
• Time: Θ(t), to copy half of the keys and pointers of y into the newly allocated node and shrink y
• Disk access: allocate one node on disk and write 3 nodes (x, y, and the new node) to disk, O(1)
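A Python sketch of the split, assuming in-memory nodes (the hypothetical `Node`). The slides' B_Tree_split-child also writes x, y and the new node to disk; this sketch does not. y keeps its t-1 smallest keys, a new node gets the t-1 largest keys, and the median moves into x. In the first usage example, the single-key leaves [M] and [X] are hypothetical stand-ins for the other subtrees of x.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf


def split_child(x, i, t):
    """Split the full child y = c_i[x] of the non-full node x around its median."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "y must be full"
    assert len(x.keys) < 2 * t - 1, "x must not be full"
    z = Node(y.keys[t:], y.children[t:])           # the one newly allocated node
    x.keys.insert(i, y.keys[t - 1])                # the median moves up into x
    x.children.insert(i + 1, z)
    del y.keys[t - 1:]                             # y keeps its t-1 smallest keys...
    del y.children[t:]                             # ...and its first t children
    # In the slides' model: DiskWrite(y), DiskWrite(z), DiskWrite(x).


# Slide example, t = 4: the full child [P Q R S T U V] lies between N and W.
x = Node(["N", "W"], [Node(["M"]), Node(list("PQRSTUV")), Node(["X"])])
split_child(x, 1, t=4)
print(x.keys, [c.keys for c in x.children])
# ['N', 'S', 'W'] [['M'], ['P', 'Q', 'R'], ['T', 'U', 'V'], ['X']]

# Splitting a full root under a new empty root makes the tree one level taller.
root = Node([], [Node(list("ADFHLNP"))])
split_child(root, 0, t=4)
print(root.keys, [c.keys for c in root.children])
# ['H'] [['A', 'D', 'F'], ['L', 'N', 'P']]
```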


Basic Operations: Insert

• Idea:

– Use a single pass down the tree, as for search; the search locates the leaf in which to insert the new key. At each non-full node, a binary search decides which subtree to follow.

– Whenever a full node is encountered on the search path, split it, and continue the insertion recursively in the appropriate one of the two newly created nodes.

• Start at the root: if it is full, before continuing, create a new node and split the root, pushing the median key of the root up into the new node. (This is the only way the height of a B-tree grows.)

Figure (root split, t=4): the full root [A D F H L N P] becomes a new root [H] with children [A D F] and [L N P]; this costs Θ(t) CPU time and O(1) disk accesses.


Basic Operations: Insert (cont)

The procedure in the textbook implements this one-pass insert.

• It starts from the root: if the root is full, before continuing, it creates a new node and splits the root, pushing the median key of the root up into the new node. The key is always inserted into a non-full leaf (the terminating condition of the recursion).

– During the descent, the procedure detects a full child that must be visited and splits it before making a recursive call on the appropriate one of the two new children. This guarantees that when the key is inserted into a leaf, the leaf is not full.

• Complexity of Insert:
– The number of disk accesses (nodes read) is O(h); there are at most h splits, so at most O(h) nodes are allocated.
– The CPU time is O(th).
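Here is an iterative Python sketch of the one-pass insert. The textbook procedure is recursive; the loop below follows the same idea. Nodes are hypothetical in-memory objects, and `build`/`show` are helpers added only to set up and print trees. The usage starts from the tree of the first insert example (t = 3) and inserts C, Q, L and F in turn. This reproduces the four insert examples in these slides.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf


def split_child(x, i, t):
    """Split the full child c_i[x]; its median key moves up into x."""
    y = x.children[i]
    z = Node(y.keys[t:], y.children[t:])
    x.keys.insert(i, y.keys[t - 1])
    x.children.insert(i + 1, z)
    del y.keys[t - 1:]
    del y.children[t:]


def insert(root, k, t):
    """One-pass insert of k; returns the root, which is new if the old root was full."""
    if len(root.keys) == 2 * t - 1:                # full root: the only way h grows
        root = Node([], [root])
        split_child(root, 0, t)
    x = root
    while x.children:                              # descend, splitting full children first
        i = bisect.bisect_right(x.keys, k)
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:                      # k belongs in the right half
                i += 1
        x = x.children[i]
    bisect.insort(x.keys, k)                       # x is a non-full leaf
    return root


def build(spec):                                   # helper: (keys, [child specs]) -> Node
    keys, kids = spec
    return Node(list(keys), [build(c) for c in kids])


def show(x):                                       # helper: Node -> (keys, [children])
    return ("".join(x.keys), [show(c) for c in x.children])


root = build(("GMPX", [("ABDE", []), ("JK", []), ("NO", []), ("RSTUV", []), ("YZ", [])]))
for k in "CQLF":
    root = insert(root, k, t=3)
print(show(root))
# ('P', [('CGM', [('AB', []), ('DEF', []), ('JKL', []), ('NO', [])]), ('TX', [('QRS', []), ('UV', []), ('YZ', [])])])
```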


Insert Example

• Given: t=3, so a full node has 5 keys

• Insert C

Before: root [G M P X]; leaves [A B D E], [J K], [N O], [R S T U V], [Y Z]

After: root [G M P X]; leaves [A B C D E], [J K], [N O], [R S T U V], [Y Z]

C is inserted into a non-full leaf.


Insert Example

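• Given: t=3

• Insert Q

Before: root [G M P X]; leaves [A B C D E], [J K], [N O], [R S T U V], [Y Z]

The leaf [R S T U V] on the search path is full: split it around T, which moves up into the root.

After: root [G M P T X]; leaves [A B C D E], [J K], [N O], [Q R S], [U V], [Y Z]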


Insert Example

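• Given: t=3

• Insert L

Before: root [G M P T X]; leaves [A B C D E], [J K], [N O], [Q R S], [U V], [Y Z]

The root is full: split it, so P becomes the new root with children [G M] and [T X].

After: root [P]; internal nodes [G M] and [T X]; leaves [A B C D E], [J K L], [N O], [Q R S], [U V], [Y Z] (L is inserted here, into [J K])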


Insert Example

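• Given: t=3

• Insert F

Before: root [P]; internal nodes [G M] and [T X]; leaves [A B C D E], [J K L], [N O], [Q R S], [U V], [Y Z]

The leaf [A B C D E] on the search path is full: split it around C, which moves up into [G M].

After: root [P]; internal nodes [C G M] and [T X]; leaves [A B], [D E F], [J K L], [N O], [Q R S], [U V], [Y Z]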


Basic Operation: Deletion

• Key idea:

– ensure, as you move down the tree, that each node you visit (i.e., on the path from the root to the node with the key to be deleted) has at least 1+(t-1) = t keys; if not, rearrange the tree before continuing

– this way, when a key is deleted from a node, the minimum number of keys is still maintained

• Let x be the current node when searching for the node with key k to delete

• Case 1: k is in x and x is a leaf: just delete k from x

• Case 2: k is in x and x is internal; let y be the child before k and z the child after k
– Case 2a: y has at least t keys
– Case 2b: z has at least t keys
– Case 2c: y and z both have t-1 keys (we have to rearrange)
– Note: y and z could be leaves


Deletion, Cases 2a,b,c: x has the key

Figure: x holds … c k o …; y = [… f] is the child before k and z = [m …] is the child after k; k’s predecessor j lies in the subtree Ta of y, and k’s successor i lies in the subtree Tb of z.

– 2a (y has at least t keys): replace k in x by its predecessor j (x becomes … c j o …) and delete j recursively from y; keep j in memory to put it into x

– 2b (z has at least t keys): replace k in x by its successor i (x becomes … c i o …) and delete i recursively from z

– 2c (y and z have t-1 keys each): merge y and z, moving k down as the median key of the new node (… f k m …); x becomes … c o …; then delete k recursively from the new node. Note that if x was the root with the single key k, the height shrinks.


Basic Operation: Deletion (cont)

• Key idea: ensure, as you move down the tree, that every node visited has at least t keys, one more than the minimum number allowed, t-1

• Let x be the current node when searching for the node with key k to delete

• Case 1: k is in x and x is a leaf: just delete k from x
• Case 2: k is in x and x is internal
• Case 3: k is not in the current node x, and the child z we want to go to next has t-1 keys (we need to rearrange the tree); let y be an immediate sibling of z
– Case 3a: y has at least t keys
– Case 3b: y and z each have t-1 keys
– Note: the roles of y and z can be interchanged (the rotation then changes direction)
– Also, y and z could be leaves


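Deletion, Cases 3a,b: x does not have the key, and the node to go to next has t-1 keys

Figure (case 3, with y the left sibling of z): the current node x (at least t keys) holds … c i o …; y = [… g h] and z = [j …] are the children separated by i; z, where we want to go next, has t-1 keys. Ta and Tb are the last subtrees of y, and Tc is the first subtree of z.

– 3a (y has at least t keys): do a left-to-right rotation around y: i moves down into z (now [i j …], with t keys), h moves up into x (… c h o …), and Tb becomes the first subtree of z. Then continue the search at z.

– 3b (y has t-1 keys, like z): merge y and z, dropping i from x as the median key of the merged node [… g h i j …] with subtrees Ta, Tb, Tc; x becomes … c o …. Then continue the search at the merged node.

In both cases, the tree is rearranged so that the node to visit has at least t keys.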


Example 1

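• Given the B-tree rooted at x, t=3, delete F (case 1)

Before: root x = [P]; internal nodes [C G M] and [T X]; leaves [A B], [D E F], [J K L], [N O], [Q R S], [U V], [Y Z]

x is not a leaf and does not have the key; every node on the path below the root has 3 = t keys, so no rearranging is needed, and F is deleted from the leaf [D E F].

After: root [P]; internal nodes [C G M] and [T X]; leaves [A B], [D E], [J K L], [N O], [Q R S], [U V], [Y Z]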


Example 1

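• t=3, delete M (case 2a)

x = [C G M] is not a leaf and has the key; y = [J K L], the child to the left of M, has t keys, so put the predecessor L of M up into x and delete L recursively from y.

After: root [P]; internal nodes [C G L] and [T X]; leaves [A B], [D E], [J K], [N O], [Q R S], [U V], [Y Z]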


Example 1

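• t=3, delete G (case 2c)

x = [C G L] is not a leaf and has the key; y = [D E] and z = [J K] have t-1 keys each, so merge them, dropping G into the merged node [D E G J K], and recursively delete G from it.

After: root [P]; internal nodes [C L] and [T X]; leaves [A B], [D E J K], [N O], [Q R S], [U V], [Y Z]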


Example 1

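• t=3, delete D (case 3b)

x = [P] is not a leaf and does not have the key; z = [C L], the node to go to next, has t-1 keys, and its sibling y = [T X] has t-1 keys too, so merge y and z, dropping P from the root, and recursively delete D.

After: the root becomes [C L P T X], with leaves [A B], [D E J K], [N O], [Q R S], [U V], [Y Z]; the height h shrinks. The recursive call then deletes D from the leaf [D E J K].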


Example 1

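• t=3, delete B (case 3a)

x = [C L P T X] is not a leaf and does not have the key; z = [A B], the node to go to next, has t-1 keys, and its sibling y = [E J K] has t keys, so rotate from right to left (C moves down into z, E moves up into x) and recursively delete B.

After the rotation: root [E L P T X]; leaves [A B C], [J K], [N O], [Q R S], [U V], [Y Z]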


Example 2

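• Given the B-tree rooted at x, t=2, delete H (case 3a)

Before: root x = [K] with children [F] and [P W]; [F] has children [B] (leaves [A], [C D]) and [H] (leaves [G], [J]); [P W] has children [M] (leaves [L], [N]), [S U] (leaves [Q R], [T], [V]), and [Y] (leaves [X], [Z])

x does not have H; the node to visit, [F], has 1 key, and its sibling [P W] has 2: K moves down into [F], P moves up into x, and the subtree rooted at [M] moves over to [F K].

After: root x = [P] with children [F K] and [W]; [F K] has children [B], [H], [M]; [W] has children [S U] and [Y]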


Example 2

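• delete H (cont): case 3b (merge + drop)

x = [F K] does not have H; the node to visit, [H], has 1 key, and its sibling [M] has 1 too: merge [H] and [M], dropping K from x.

After: root [P] with children [F] and [W]; [F] has children [B] and [H K M]; [H K M] has leaves [G], [J], [L], [N]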


Example 2

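• delete H (cont): case 2c (merge + drop)

x = [H K M] has H and is not a leaf; y = [G] and z = [J] have 1 key each: merge them, dropping H from x into the merged node [G H J].

After: x = [K M] with children [G H J], [L], [N]

Now x = [G H J] is a leaf containing H: delete it (case 1), leaving [G J].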


Example 2

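• Result of Delete H; now delete L (case 3b, merge + drop)

Before: root x = [P] with children [F] and [W]; [F] has children [B] (leaves [A], [C D]) and [K M] (leaves [G J], [L], [N]); [W] has children [S U] (leaves [Q R], [T], [V]) and [Y] (leaves [X], [Z])

x does not have L; the node to visit, [F], has 1 key, and its sibling [W] has 1 too: merge them, dropping P from the root.

After: root x = [F P W] with children [B], [K M], [S U], [Y]; the height shrinks.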


Example 2

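• Delete L (cont): case 3a

At the root [F P W], the next node [K M] already has t = 2 keys, so we move to x = [K M]. x is not a leaf and does not have the key; we need to go to z = [L], which has 1 key, and its sibling y = [G J] has 2: K moves down into z and J moves up into x.

After: x = [J M] with children [G], [K L], [N]; L is then deleted from the leaf [K L] (case 1).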


B-Tree Delete

• Recall BST delete: delete a key from
– a leaf
– an internal node with one child
– an internal node with two children

• Delete a key k from the B-tree T rooted at x.
– The node x is in memory.

– Go in one pass, from the root down.

– The procedure is always called recursively on a subtree whose root has at least t keys (except for the root of T), so one of these keys can be pushed down to a child before continuing down.

– If the root x ever becomes empty (this may happen in cases 2c and 3b), the only child of x becomes the root, decreasing the height.

Only the root may become empty (every other node still has at least t-1 ≥ 1 keys after the rearrangement).

– Next, we sketch the pseudocode with the above understanding.


Delete(x, k)    // deletes key k from the B-tree rooted at x; n[x] ≥ t unless x is the root
    if leaf(x)                                  // case 1
        if x contains k: delete k from x
        return
    if x contains k {                           // case 2
        let key_i[x] = k, y = c_i[x], z = c_{i+1}[x]
        if n[y] ≥ t {                           // case 2a
            w = predecessor of k (largest key in the subtree rooted at y)
            key_i[x] = w
            Delete(y, w)
        } else if n[z] ≥ t {                    // case 2b
            w = successor of k (smallest key in the subtree rooted at z)
            key_i[x] = w
            Delete(z, w)
        } else {                                // case 2c
            move k from x to z
            move all keys and children of y to z (in front of k)
            free y and Delete(z, k)
            if n[x] == 0 then x = z             // x was the root: shrink h
        }
    }


    else {                                      // case 3: k is not in x
        let z = c_i[x] be the child we want to go to next
        if n[z] == t-1 {
            if z has a left sibling y = c_{i-1}[x] and n[y] ≥ t {          // case 3a
                move key_{i-1}[x] down to z (as its first key)
                move key_{n[y]}[y] up to x
                reconnect the child c_{n[y]+1}[y] to z (as its first child)
            } else if z has a right sibling y = c_{i+1}[x] and n[y] ≥ t {  // case 3a
                move key_i[x] down to z (as its last key)
                move key_1[y] up to x
                reconnect the child c_1[y] to z (as its last child)
            } else {                            // case 3b
                move the key of x that separates z from a sibling y down to z
                move the keys of y to z
                give all children of y to z, free y
                if n[x] == 0 then x = z         // x was the root: shrink h
            }
        }
        Delete(z, k)
    }
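For experimentation, here is a runnable Python rendering of the pseudocode above. It is a sketch under stated assumptions rather than the textbook's exact procedure:
- nodes are in-memory objects, so DiskRead/DiskWrite are omitted;
- keys are distinct;
- case 3a tries the left sibling before the right one, as the pseudocode does;
- case 3b merges with the right sibling when there is one.

`Node`, `_merge`, `_fill`, `_delete`, `delete`, and the helpers `build`/`show` are hypothetical names. Replaying the deletions of Example 1 (F, M, G, D, B with t = 3) prints the final tree.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf


def _merge(x, i):
    """Cases 2c/3b: fold key_i[x] and c_{i+1}[x] into c_i[x]; return the merged node."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    return y


def _fill(x, i, t):
    """Case 3: return the child c_i[x] to descend into, now with at least t keys."""
    z = x.children[i]
    if len(z.keys) >= t:
        return z
    if i > 0 and len(x.children[i - 1].keys) >= t:            # 3a with the left sibling
        y = x.children[i - 1]
        z.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = y.keys.pop()
        if y.children:
            z.children.insert(0, y.children.pop())
        return z
    if i < len(x.keys) and len(x.children[i + 1].keys) >= t:  # 3a with the right sibling
        y = x.children[i + 1]
        z.keys.append(x.keys[i])
        x.keys[i] = y.keys.pop(0)
        if y.children:
            z.children.append(y.children.pop(0))
        return z
    # 3b: the sibling(s) have t-1 keys too; merge with the right one if it exists
    return _merge(x, i) if i < len(x.keys) else _merge(x, i - 1)


def _delete(x, k, t):
    i = bisect.bisect_left(x.keys, k)
    if i < len(x.keys) and x.keys[i] == k:
        if not x.children:                          # case 1
            x.keys.pop(i)
        elif len(x.children[i].keys) >= t:          # case 2a: use the predecessor
            p = x.children[i]
            while p.children:
                p = p.children[-1]
            x.keys[i] = p.keys[-1]
            _delete(x.children[i], x.keys[i], t)
        elif len(x.children[i + 1].keys) >= t:      # case 2b: use the successor
            s = x.children[i + 1]
            while s.children:
                s = s.children[0]
            x.keys[i] = s.keys[0]
            _delete(x.children[i + 1], x.keys[i], t)
        else:                                       # case 2c
            _delete(_merge(x, i), k, t)
    elif x.children:                                # case 3
        _delete(_fill(x, i, t), k, t)
    # else: x is a leaf without k, so k is not in the tree


def delete(root, k, t):
    """Delete k in one pass from the root down; return the (possibly new) root."""
    _delete(root, k, t)
    if not root.keys and root.children:             # the root emptied: h shrinks
        return root.children[0]
    return root


def build(spec):                                    # helper: (keys, [child specs]) -> Node
    keys, kids = spec
    return Node(list(keys), [build(c) for c in kids])


def show(x):                                        # helper: Node -> (keys, [children])
    return ("".join(x.keys), [show(c) for c in x.children])


root = build(("P", [("CGM", [("AB", []), ("DEF", []), ("JKL", []), ("NO", [])]),
                    ("TX", [("QRS", []), ("UV", []), ("YZ", [])])]))
for k in "FMGDB":                                   # the deletions of Example 1, t = 3
    root = delete(root, k, t=3)
print(show(root))
# ('ELPTX', [('AC', []), ('JK', []), ('NO', []), ('QRS', []), ('UV', []), ('YZ', [])])
```

Replaying Example 2 the same way (t = 2, delete H and then L) goes through the cases traced in the slides: 3a, 3b, 2c and 1 for H, then 3b, 3a and 1 for L.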