Chapter 8 Multiway Trees - ccc.cs.lakeheadu.caccc.cs.lakeheadu.ca/cs2412/slides/cs2412-s8.pdf ·...

CS 2412 Data Structures

Chapter 8

Multiway Trees

This chapter studies multiway trees. These trees can be used for external search.

When the data to be searched is large, it is not practical to load all of it into memory.

Searching in high-speed (main) memory is much faster than searching on external devices (hard disks, CDs, etc.).

The idea of external search: each time, read one block of information into memory and use it to decide which block to search next.

Data Structure 2015 R. Wei 2

Definition An m-way tree has the following properties.

• Each node has 0 to m subtrees.

• A node with k ≤ m subtrees contains k subtrees and k − 1 data entries.

• The keys of the data entries are ordered: $\text{key}_1 \le \text{key}_2 \le \cdots \le \text{key}_{k-1}$.

• The key values in the first subtree (the 0th subtree) are all less than $\text{key}_1$; the key values in the $i$th subtree are all greater than or equal to $\text{key}_i$ but less than $\text{key}_{i+1}$ (the last subtree, $i = k-1$, has no upper bound).

• All subtrees are themselves multiway trees.

A 2-way tree is a binary search tree (BST).
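The ordering rule tells a search which subtree to follow when the target is not in the current node: since the $i$th subtree holds keys from $\text{key}_i$ up to (but not including) $\text{key}_{i+1}$, the subtree index equals the number of keys in the node that are ≤ the target. A minimal C sketch, assuming int keys stored in a sorted array; `subtree_index` is a hypothetical name, not part of the chapter's code.

```c
#include <assert.h>

/* Hypothetical helper: choose the subtree (0 .. numKeys) of an m-way node
   that may contain target. keys[0 .. numKeys-1] hold key_1 .. key_{k-1}
   in nondecreasing order. Subtree 0 holds keys < keys[0]; subtree i
   (i >= 1) holds keys >= keys[i-1] and < keys[i]. */
int subtree_index(const int keys[], int numKeys, int target)
{
    int i = 0;

    assert(numKeys >= 0);
    /* Count the keys that are <= target; that count is the subtree index. */
    while (i < numKeys && keys[i] <= target)
        i++;
    return i;
}
```

With keys 10, 20, 30, a target of 25 selects subtree 2 (between 20 and 30). A linear scan is enough for small m; when a node fills a whole disk block, a binary search over the keys is the usual alternative.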


We also want the multiway tree to be balanced.

Definition A B-tree is an m-way tree with the following additional properties:

• The root is either a leaf or it has 2 to m subtrees.

• All internal nodes other than the root have at least ⌈m/2⌉ nonnull subtrees.

• All leaf nodes are at the same level.

• A leaf node other than the root has at least ⌈m/2⌉ − 1 entries.
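The minimum sizes in this definition reduce to integer arithmetic: for a positive integer m, ⌈m/2⌉ equals (m + 1)/2 with C integer division. A minimal sketch of a per-node check, assuming a node is described only by its entry count and whether it is the root and/or a leaf (an internal node with k entries has k + 1 subtrees). The names `ceil_half` and `occupancy_ok` are hypothetical, and the sketch assumes m ≥ 3.

```c
#include <assert.h>
#include <stdbool.h>

/* ceil(m/2) for m > 0, using integer division. */
int ceil_half(int m)
{
    assert(m > 0);
    return (m + 1) / 2;
}

/* Hypothetical check of the B-tree size rules for one node of a tree of
   order m. An internal node with numEntries entries has numEntries + 1
   subtrees. */
bool occupancy_ok(int m, int numEntries, bool isRoot, bool isLeaf)
{
    assert(m >= 3);
    if (numEntries > m - 1)          /* at most m subtrees, i.e. m - 1 entries */
        return false;
    if (isRoot)                      /* leaf root: >= 1 entry; internal root: >= 2 subtrees */
        return numEntries >= 1;
    if (isLeaf)                      /* leaf: at least ceil(m/2) - 1 entries */
        return numEntries >= ceil_half(m) - 1;
    return numEntries + 1 >= ceil_half(m);   /* internal: at least ceil(m/2) subtrees */
}
```

Both the leaf rule and the internal-node rule come to the same minimum of ⌈m/2⌉ − 1 entries; for m = 5 that is 2 entries (3 subtrees).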


Data structure of a B-tree.

The structure of an m-way tree node: each entry contains the data and a pointer to its right subtree. A node contains a pointer to its first subtree (the subtree whose keys are less than the key of the first entry), a count of the number of entries currently in the node, and an array of entries. The array must hold up to m − 1 entries; it can be of size m, which leaves one spare slot for an overflowing entry.

The main operations for B-trees are insert, delete, traverse, and search.


B-tree insertion:

B-tree insertion takes place at a leaf node.

• Locate the leaf node where the data should be inserted.

• If the node is not full (it has fewer than m − 1 entries), insert the data into this node.

• If the node is full (the overflow condition), split the node into two nodes and pass the median entry up to the parent.

A B-tree grows from the bottom up.


Example: insert 11, 21, 14, 78, 97 into a B-tree of order 5.
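The example can be traced with a small sketch. With order 5 a node holds at most 4 entries, so 11, 21, 14 and 78 all fit in the root leaf (kept in sorted order). Inserting 97 overflows it; the five sorted keys split around the median, which moves up into a new root. The sketch models only this single-leaf case, keeps one spare slot for the overflowing key (the chapter's `_splitNode` does not need one), and its function names are hypothetical.

```c
#include <assert.h>

#define ORDER       5                       /* m: maximum subtrees per node */
#define MAX_ENTRIES (ORDER - 1)             /* a full node holds m - 1 entries */
#define MIN_ENTRIES ((ORDER + 1) / 2 - 1)   /* ceil(m/2) - 1 */

/* Insert key into sorted keys[0 .. *n-1]; keys must have room for *n + 1. */
void sorted_insert(int keys[], int *n, int key)
{
    int i = *n;
    while (i > 0 && keys[i - 1] > key) {    /* shift larger keys right */
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = key;
    (*n)++;
}

/* Split an overflowing node of ORDER sorted keys: keys[0 .. MIN_ENTRIES-1]
   stay left, keys[MIN_ENTRIES] is the median passed up, the rest go right. */
int split(const int keys[], int left[], int *nLeft, int right[], int *nRight)
{
    *nLeft = 0;
    *nRight = 0;
    for (int i = 0; i < MIN_ENTRIES; i++)
        left[(*nLeft)++] = keys[i];
    for (int i = MIN_ENTRIES + 1; i < ORDER; i++)
        right[(*nRight)++] = keys[i];
    return keys[MIN_ENTRIES];
}

/* Insert the example keys into one leaf; return the key of the new root. */
int example_root(int left[], int *nLeft, int right[], int *nRight)
{
    const int input[] = {11, 21, 14, 78, 97};
    int leaf[ORDER];                        /* one extra slot holds the overflow */
    int n = 0;

    for (int i = 0; i < 5; i++)
        sorted_insert(leaf, &n, input[i]);
    assert(n == MAX_ENTRIES + 1);           /* the fifth key overflowed the leaf */
    return split(leaf, left, nLeft, right, nRight);
}
```

The result is a root containing 21 with children [11, 14] and [78, 97]; each child has exactly ⌈5/2⌉ − 1 = 2 entries, the minimum allowed.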


The algorithm of B-tree insert:

• If the B-tree is empty, create the root and insert the first entry.

• If the B-tree is not empty, call the insert-node algorithm, which finds the location, inserts the entry, and does any necessary updates (on overflow, split the node and insert the median entry into the parent, and so on).

• If the root needs to split, create a new root.


Algorithm BTreeInsert (tree, data)
if (tree empty)
    create new node
    set left subtree of node to null
    move data to first entry in new node
    set subtree of first entry to null
    set tree root to address of new node
    set number of entries to 1
else
    set taller to insertNode (tree, data, upEntry)
    if (taller)
        create new node
        move upEntry to first entry in new node
        set left subtree of the new node to tree
        set tree root to new node
        set number of entries to 1
    end if
end if


Algorithm searchNode (nodePtr, target)
if (target < key in first entry)
    return 0
end if
set walker to number of entries - 1
loop (target < entry key[walker])
    decrement walker
end loop
return walker

This function returns the index of the last entry whose key ≤ target, or 0 if target is less than the key of the first entry. Because 0 is returned in both cases, the caller compares target with the first key again to tell them apart.
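The C code later in this chapter calls `_searchNode` but does not list it. A minimal sketch that follows the pseudocode above, assuming the `ENTRY`/`NODE`/`BTREE` structures from the C section (repeated here so the block compiles on its own) and a non-empty node; the int comparator `cmp_int` is hypothetical and only used for testing.

```c
#include <assert.h>

#define ORDER 5

typedef struct
{
    void*        dataPtr;
    struct node* rightPtr;
} ENTRY;

typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

typedef struct
{
    int   count;
    NODE* root;
    int (*compare) (void* argu1, void* argu2);
} BTREE;

/* Return the index of the last entry whose key <= target, or 0 when target
   is smaller than every key (the caller compares again to tell
   "below entry 0" apart from "at or after entry 0"). */
int _searchNode (BTREE* tree, NODE* nodePtr, void* target)
{
    int walker;

    assert(nodePtr->numEntries > 0);
    if (tree->compare(target, nodePtr->entries[0].dataPtr) < 0)
        return 0;
    walker = nodePtr->numEntries - 1;
    while (tree->compare(target, nodePtr->entries[walker].dataPtr) < 0)
        walker--;                 /* stops at entry 0 at the latest */
    return walker;
}

/* Hypothetical comparator for int data: negative, zero or positive. */
int cmp_int (void* a, void* b)
{
    int x = *(int*)a;
    int y = *(int*)b;
    return (x > y) - (x < y);
}
```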


Algorithm splitNode (node, entryNdx, newEntryLow, upEntry)
create new node
move high entries to new node
if (entryNdx < minimum entries)
    insert upEntry in node
else
    insert upEntry in new node
end if
move median data to upEntry
make new node first Ptr the right subtree of median data
make new node the right subtree of upEntry


B-tree deletion:

B-tree deletion is somewhat more complicated than insertion.

• Search for the data to be deleted. If it cannot be found, print an error message and quit.

• If the data is found, delete it. Two cases must be considered: the data is in a leaf node, or in a non-leaf (internal) node.

• If an underflow occurs after the deletion (a leaf node has fewer than ⌈m/2⌉ − 1 entries, or an internal node has fewer than ⌈m/2⌉ nonnull subtrees), the tree must be adjusted.


The following algorithm deletes an entry. It handles two special situations: an empty tree, and a root that becomes empty after the deletion. The details of how to treat underflow are left to algorithm delete.

Algorithm BTreeDelete (tree, dltKey)
if (tree empty)
    return false
end if
delete (tree, dltKey, success)
if (success)
    if (tree number of entries zero)
        set tree to left subtree
    end if
end if
return success


    end if
end if
return underflow
end delete


The following algorithm deletes an entry from a leaf node and returns whether the node underflows.

Algorithm deleteEntry (node, entryNdx)
delete entry at entryNdx from node
shift entries after deleted entry to the left
decrement number of entries
if (number of entries less than minimum entries)
    return true
else
    return false
end if
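The C code later in this chapter calls `_deleteEntry` from `_delete` but does not list it. A minimal sketch of the algorithm above in the same style, assuming the chapter's node layout (repeated here) and `MIN_ENTRIES` = ⌈m/2⌉ − 1 with an example order of 5.

```c
#include <assert.h>
#include <stdbool.h>

#define ORDER       5
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)   /* ceil(m/2) - 1 */

typedef struct
{
    void*        dataPtr;
    struct node* rightPtr;
} ENTRY;

typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Remove entries[entryNdx] from a leaf, shift the later entries left,
   and return true if the node now underflows. */
bool _deleteEntry (NODE* node, int entryNdx)
{
    assert(entryNdx >= 0 && entryNdx < node->numEntries);
    for (int shifter = entryNdx + 1; shifter < node->numEntries; shifter++)
        node->entries[shifter - 1] = node->entries[shifter];
    node->numEntries--;
    return node->numEntries < MIN_ENTRIES;
}
```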


When deleting an entry in an internal node, we must find substitute data. We use the immediate predecessor, which is the largest entry in the left subtree of the entry to be deleted. To find it, follow the rightmost subtree pointers down to a leaf; the predecessor is the last entry in that leaf.

Algorithm deleteMid (node, entryNdx, subtree)
if (no rightmost subtree)          // predecessor in a leaf node
    move predecessor's data to deleted entry
    delete predecessor from subtree
    set underflow if subtree entries less than minimum
else
    set underflow to deleteMid (node, entryNdx, rightmost subtree)
    if (underflow)
        set underflow to reFlow (subtree, index of last entry in subtree)
    end if
end if
return underflow


When a node underflows, we must adjust the tree; we call this reflow. Suppose one of the two subtrees of a parent entry contains the underflow node. Two situations must be considered:

• If the other (sibling) subtree has more than the minimum number of entries, we move an entry from it, through the parent, into the underflow node. We call this balance.

• If the other subtree has only the minimum number of entries, we combine the two nodes, together with the parent entry between them, into one node. This is called combine.


Algorithm reflow (root, entryNdx)
set leftTree to left subtree of entry at entryNdx
set rightTree to right subtree of entry at entryNdx
if (rightTree entries greater than minimum entries)
    borrowRight (root, entryNdx, leftTree, rightTree)
    set underflow to false
else if (leftTree entries greater than minimum entries)
    borrowLeft (root, entryNdx, leftTree, rightTree)
    set underflow to false
else
    combine (root, entryNdx, leftTree, rightTree)
    if (root numEntries less than minimum entries)
        set underflow to true
    else
        set underflow to false
    end if
end if
return underflow


Algorithm borrowLeft (root, entryNdx, left, right)
shift all entries in right one position to the right
move root data to first entry in right
move right first pointer to right subtree of first entry
move left last right pointer to right first pointer
move left last entry data to root at entryNdx

In the algorithm above, whenever an entry is moved, the corresponding subtree pointers are adjusted as well (this matters when the underflow node is not a leaf node). The borrowRight algorithm is the mirror image.


Algorithm combine (root, entryNdx, left, right)
move parent entry to first open entry in left subtree
make right subtree's first subtree the right subtree of the moved parent entry
move entries from right subtree to end of left subtree
release right subtree node
shift root data to left


Example


As with a BST, a B-tree is traversed in order. The difference is that, except in leaf nodes, the entries of a node are not processed all at once: each entry is processed after its left subtree and before its right subtree.


Algorithm BTreeTraversal (root)
set scanCount to 0
set nextSubTree to root left subtree
loop (scanCount <= number of entries)
    if (nextSubTree not null)
        BTreeTraversal (nextSubTree)
    end if
    if (scanCount < number of entries)
        process (entry[scanCount])
        set nextSubTree to current entry right subtree
    end if
    increment scanCount
end loop


The B-tree search algorithm follows the same idea as searching a binary search tree, but it must first find the node and then find the entry within that node. In general, therefore, the search returns both the node and the location of the entry in that node (the C function BTree_Search returns the entry's data pointer instead).

Recursion is used to find the node. In each node, the keys are compared from the last entry back to the first.


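#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// ORDER and MIN_ENTRIES are not defined in the slides; 5 is an example order.
#define ORDER       5                          // m: maximum subtrees per node
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)    // ceil(m/2) - 1

typedef struct
{
    void*        dataPtr;     // the data, including its key
    struct node* rightPtr;    // subtree with keys >= this entry's key
} ENTRY;

typedef struct node
{
    struct node* firstPtr;    // subtree with keys < the first entry's key
    int          numEntries;  // entries currently in use
    ENTRY        entries[ORDER - 1];
} NODE;

typedef struct
{
    int   count;              // number of data entries in the tree
    NODE* root;
    int (*compare) (void* argu1, void* argu2);   // < 0, == 0, > 0
} BTREE;

// Prototypes. _searchNode, _insertEntry, _deleteEntry and _borrowLeft are
// called below, but their C code is not listed in these slides.
void* _search      (BTREE* tree, void* targetPtr, NODE* root);
void  _traverse    (NODE* root, void (*process) (void* dataPtr));
bool  _insert      (BTREE* tree, NODE* root, void* dataInPtr, ENTRY* upEntry);
void  _splitNode   (NODE* node, int entryNdx, int compResult, ENTRY* upEntry);
int   _searchNode  (BTREE* tree, NODE* nodePtr, void* target);
void  _insertEntry (NODE* root, int entryNdx, ENTRY upEntry);
bool  _delete      (BTREE* tree, NODE* root, void* dltKeyPtr, bool* success);
bool  _deleteEntry (NODE* node, int entryNdx);
bool  _deleteMid   (NODE* root, int entryNdx, NODE* subtreePtr);
bool  _reFlow      (NODE* root, int entryNdx);
void  _borrowLeft  (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr);
void  _borrowRight (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr);
void  _combine     (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr);
void  _print       (NODE* root, int level);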


void* BTree_Search (BTREE* tree, void* targetPtr)
{
    if (tree->root)
        return _search (tree, targetPtr, tree->root);
    else
        return NULL;
} // BTree_Search


void* _search (BTREE* tree, void* targetPtr, NODE* root)
{
    int entryNo;

    if (!root)
        return NULL;

    // Target smaller than every key in this node: search the first subtree
    if (tree->compare(targetPtr, root->entries[0].dataPtr) < 0)
        return _search (tree, targetPtr, root->firstPtr);

    // Scan from the last entry back to the last key <= target
    entryNo = root->numEntries - 1;
    while (tree->compare(targetPtr, root->entries[entryNo].dataPtr) < 0)
        entryNo--;

    if (tree->compare(targetPtr, root->entries[entryNo].dataPtr) == 0)
        return (root->entries[entryNo].dataPtr);    // found

    return (_search (tree, targetPtr, root->entries[entryNo].rightPtr));
} // _search


void BTree_Traverse (BTREE* tree, void (*process) (void* dataPtr))
{
    if (tree->root)
        _traverse (tree->root, process);
    return;
} // end BTree_Traverse


void _traverse (NODE* root, void (*process) (void* dataPtr))
{
    int   scanCount;
    NODE* ptr;

    scanCount = 0;
    ptr = root->firstPtr;
    while (scanCount <= root->numEntries)
    {
        if (ptr)
            _traverse (ptr, process);

        // Subtree processed -- get next entry
        if (scanCount < root->numEntries)
        {
            process (root->entries[scanCount].dataPtr);
            ptr = root->entries[scanCount].rightPtr;
        } // if scanCount
        scanCount++;
    } // while
    return;
} // _traverse


void BTree_Insert (BTREE* tree, void* dataInPtr)
{
    bool  taller;
    NODE* newPtr;
    ENTRY upEntry;

    if (tree->root == NULL)
    {
        // Empty tree. Insert first node
        newPtr = (NODE*)malloc(sizeof (NODE));
        if (!newPtr)
        {
            printf("Overflow error 100 in BTree_Insert\a\n");
            exit (100);
        }
        newPtr->firstPtr = NULL;
        newPtr->numEntries = 1;
        newPtr->entries[0].dataPtr = dataInPtr;
        newPtr->entries[0].rightPtr = NULL;
        tree->root = newPtr;
        (tree->count)++;
        for (int i = 1; i < ORDER - 1; i++)
        {
            newPtr->entries[i].dataPtr = NULL;
            newPtr->entries[i].rightPtr = NULL;
        } // for
        return;
    } // if empty tree

    taller = _insert (tree, tree->root, dataInPtr, &upEntry);
    if (taller)
    {
        // Tree has grown. Create new root
        newPtr = (NODE*)malloc(sizeof (NODE));
        if (!newPtr)
        {
            printf("Overflow error 101\a\n");
            exit (100);
        }
        newPtr->entries[0] = upEntry;
        newPtr->firstPtr = tree->root;
        newPtr->numEntries = 1;
        tree->root = newPtr;
    } // if taller
    (tree->count)++;
    return;
} // BTree_Insert


bool _insert (BTREE* tree, NODE* root, void* dataInPtr, ENTRY* upEntry)
{
    int   compResult;
    int   entryNdx;
    bool  taller;
    NODE* subtreePtr;

    if (!root)
    {
        // Reached a null subtree: pass the new data up to be inserted
        (*upEntry).dataPtr = dataInPtr;
        (*upEntry).rightPtr = NULL;
        return true;    // tree taller
    } // if NULL tree

    entryNdx = _searchNode (tree, root, dataInPtr);
    compResult = tree->compare(dataInPtr, root->entries[entryNdx].dataPtr);
    if (entryNdx <= 0 && compResult < 0)
        subtreePtr = root->firstPtr;                    // in node's first subtree
    else
        subtreePtr = root->entries[entryNdx].rightPtr;  // in entry's right subtree

    taller = _insert (tree, subtreePtr, dataInPtr, upEntry);
    if (taller)
    {
        if (root->numEntries >= ORDER - 1)
        {
            // Node full: split it and pass the median up
            _splitNode (root, entryNdx, compResult, upEntry);
            taller = true;
        } // node full
        else
        {
            if (compResult >= 0)
                // New data >= current entry -- insert after
                _insertEntry (root, entryNdx + 1, *upEntry);
            else
                // Insert before current entry
                _insertEntry (root, entryNdx, *upEntry);
            (root->numEntries)++;
            taller = false;
        } // else
    } // if taller
    return taller;
} // _insert


void _splitNode (NODE* node, int entryNdx, int compResult, ENTRY* upEntry)
{
    int   fromNdx;
    int   toNdx;
    NODE* rightPtr;

    rightPtr = (NODE*)malloc(sizeof (NODE));
    if (!rightPtr)
    {
        printf("Overflow Error 101 in _splitNode\a\n");
        exit (100);
    }

    // Build the right node; if the new entry belongs in the left half,
    // leave one fewer entry there so it has room
    if (entryNdx < MIN_ENTRIES)
        fromNdx = MIN_ENTRIES;
    else
        fromNdx = MIN_ENTRIES + 1;
    toNdx = 0;
    rightPtr->numEntries = node->numEntries - fromNdx;
    while (fromNdx < node->numEntries)
        rightPtr->entries[toNdx++] = node->entries[fromNdx++];
    node->numEntries = node->numEntries - rightPtr->numEntries;

    // Insert the new entry into the proper half
    if (entryNdx < MIN_ENTRIES)
    {
        if (compResult < 0)
            _insertEntry (node, entryNdx, *upEntry);
        else
            _insertEntry (node, entryNdx + 1, *upEntry);
    } // if
    else
    {
        _insertEntry (rightPtr, entryNdx - MIN_ENTRIES, *upEntry);
        (rightPtr->numEntries)++;
        (node->numEntries)--;
    } // else

    // The median (index MIN_ENTRIES of the left node) moves up
    upEntry->dataPtr = node->entries[MIN_ENTRIES].dataPtr;
    upEntry->rightPtr = rightPtr;
    rightPtr->firstPtr = node->entries[MIN_ENTRIES].rightPtr;
    return;
} // _splitNode
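Both `_insert` and `_splitNode` call `_insertEntry`, which the slides do not show. From the call sites it takes a node, a position, and an entry by value. The sketch below assumes it only shifts and stores, leaving `numEntries` to the caller: that matches `_insert`, which increments the count right after the call, and `_splitNode`, which adjusts the counts itself. It also assumes slot `numEntries` of the array is free, which holds at every call site above. The node layout is repeated so the block compiles on its own.

```c
#include <assert.h>

#define ORDER 5

typedef struct
{
    void*        dataPtr;
    struct node* rightPtr;
} ENTRY;

typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Shift entries[entryNdx .. numEntries-1] one slot right and store upEntry
   at entryNdx. The caller updates numEntries. */
void _insertEntry (NODE* root, int entryNdx, ENTRY upEntry)
{
    assert(entryNdx >= 0 && entryNdx <= root->numEntries);
    assert(root->numEntries < ORDER - 1);          /* needs one free slot */
    for (int shifter = root->numEntries; shifter > entryNdx; shifter--)
        root->entries[shifter] = root->entries[shifter - 1];
    root->entries[entryNdx] = upEntry;
}
```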


bool BTree_Delete (BTREE* tree, void* dltKey)
{
    bool  success;
    NODE* dltPtr;

    if (!tree->root)
        return false;

    _delete (tree, tree->root, dltKey, &success);
    if (success)
    {
        (tree->count)--;
        if (tree->root->numEntries == 0)
        {
            // Root emptied: its only remaining subtree becomes the root
            dltPtr = tree->root;
            tree->root = tree->root->firstPtr;
            free (dltPtr);
        } // root empty
    } // success
    return success;
} // BTree_Delete


bool _delete (BTREE* tree, NODE* root, void* dltKeyPtr, bool* success)
{
    NODE* leftPtr;
    NODE* subTreePtr;
    int   entryNdx;
    bool  underflow;

    if (!root)
    {
        *success = false;
        return false;
    } // null tree

    entryNdx = _searchNode (tree, root, dltKeyPtr);
    if (tree->compare(dltKeyPtr, root->entries[entryNdx].dataPtr) == 0)
    {
        // Found the entry to delete
        *success = true;
        if (root->entries[entryNdx].rightPtr == NULL)
            underflow = _deleteEntry (root, entryNdx);      // leaf node
        else
        {
            // Internal node: replace with the immediate predecessor
            if (entryNdx > 0)
                leftPtr = root->entries[entryNdx - 1].rightPtr;
            else
                leftPtr = root->firstPtr;
            underflow = _deleteMid (root, entryNdx, leftPtr);
            if (underflow)
                underflow = _reFlow (root, entryNdx);
        } // else internal node
    } // if found entry
    else
    {
        // Not in this node: continue in the proper subtree
        if (tree->compare (dltKeyPtr, root->entries[0].dataPtr) < 0)
            subTreePtr = root->firstPtr;
        else
            subTreePtr = root->entries[entryNdx].rightPtr;
        underflow = _delete (tree, subTreePtr, dltKeyPtr, success);
        if (underflow)
            underflow = _reFlow (root, entryNdx);
    } // else not found
    return underflow;
} // _delete


bool _deleteMid (NODE* root, int entryNdx, NODE* subtreePtr)
{
    int  dltNdx;
    int  rightNdx;
    bool underflow;

    if (subtreePtr->firstPtr == NULL)
    {
        // Leaf located. Exchange data & delete leaf entry
        dltNdx = subtreePtr->numEntries - 1;
        root->entries[entryNdx].dataPtr = subtreePtr->entries[dltNdx].dataPtr;
        --subtreePtr->numEntries;
        underflow = subtreePtr->numEntries < MIN_ENTRIES;
    } // if leaf
    else
    {
        // Not located. Traverse right for predecessor
        rightNdx = subtreePtr->numEntries - 1;
        underflow = _deleteMid (root, entryNdx,
                                subtreePtr->entries[rightNdx].rightPtr);
        if (underflow)
            underflow = _reFlow (subtreePtr, rightNdx);
    } // else traverse right
    return underflow;
} // _deleteMid


bool _reFlow (NODE* root, int entryNdx)
{
    NODE* leftTreePtr;
    NODE* rightTreePtr;
    bool  underflow;

    // The underflow node is one of the two subtrees around entries[entryNdx]
    if (entryNdx == 0)
        leftTreePtr = root->firstPtr;
    else
        leftTreePtr = root->entries[entryNdx - 1].rightPtr;
    rightTreePtr = root->entries[entryNdx].rightPtr;

    if (rightTreePtr->numEntries > MIN_ENTRIES)
    {
        _borrowRight (root, entryNdx, leftTreePtr, rightTreePtr);
        underflow = false;
    } // if borrow right
    else
    {
        // Can't borrow from right -- try left
        if (leftTreePtr->numEntries > MIN_ENTRIES)
        {
            _borrowLeft (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = false;
        } // if borrow left
        else
        {
            // Can't borrow. Must combine nodes.
            _combine (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = (root->numEntries < MIN_ENTRIES);
        } // else combine
    } // else borrow right
    return underflow;
} // _reFlow


void _borrowRight (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int shifter;

    // Move the parent's entry down to the end of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;

    // Move the right node's first entry up to the parent
    root->entries[entryNdx].dataPtr = rightTreePtr->entries[0].dataPtr;

    // Its right subtree becomes the right node's first subtree; shift left
    rightTreePtr->firstPtr = rightTreePtr->entries[0].rightPtr;
    shifter = 0;
    while (shifter < rightTreePtr->numEntries - 1)
    {
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter + 1];
        ++shifter;
    } // while
    --rightTreePtr->numEntries;
    return;
} // _borrowRight
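The slides give `_borrowRight` in C but only pseudocode for borrowLeft, which `_reFlow` also calls. A mirror-image sketch that follows the borrowLeft pseudocode step by step, assuming the same node layout (repeated here) and that the right node is the one that underflowed, so it has a free slot.

```c
#include <assert.h>

#define ORDER 5

typedef struct
{
    void*        dataPtr;
    struct node* rightPtr;
} ENTRY;

typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Move the parent's entry down into the front of the right node and the
   left node's last entry up into the parent at entryNdx. */
void _borrowLeft (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr)
{
    int fromNdx;

    assert(leftTreePtr->numEntries > 0);
    assert(rightTreePtr->numEntries < ORDER - 1);
    /* Shift all entries in the right node one position to the right */
    for (int shifter = rightTreePtr->numEntries; shifter > 0; shifter--)
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter - 1];
    /* Root data becomes the right node's first entry; its right subtree is
       the right node's old first subtree */
    rightTreePtr->entries[0].dataPtr = root->entries[entryNdx].dataPtr;
    rightTreePtr->entries[0].rightPtr = rightTreePtr->firstPtr;
    ++rightTreePtr->numEntries;
    /* Left's last right pointer becomes right's first pointer */
    fromNdx = leftTreePtr->numEntries - 1;
    rightTreePtr->firstPtr = leftTreePtr->entries[fromNdx].rightPtr;
    /* Left's last data moves up to the parent */
    root->entries[entryNdx].dataPtr = leftTreePtr->entries[fromNdx].dataPtr;
    --leftTreePtr->numEntries;
}
```

For example, with parent [20], left leaf [5, 10, 15] and underflowing right leaf [25] (order 5, minimum 2 entries), the result is parent [15], left [5, 10], right [20, 25].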


void _combine (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int fromNdx;
    int shifter;

    // Move the parent's entry down to the end of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;
    --root->numEntries;

    // Append all entries of the right node, then release it
    fromNdx = 0;
    toNdx++;
    while (fromNdx < rightTreePtr->numEntries)
        leftTreePtr->entries[toNdx++] = rightTreePtr->entries[fromNdx++];
    leftTreePtr->numEntries += rightTreePtr->numEntries;
    free (rightTreePtr);

    // Close the gap left in the parent
    shifter = entryNdx;
    while (shifter < root->numEntries)
    {
        root->entries[shifter] = root->entries[shifter + 1];
        shifter++;
    } // while
    return;
} // _combine


BTREE* BTree_Create (int (*compare) (void* argu1, void* argu2))
{
    BTREE* tree;

    tree = (BTREE*) malloc (sizeof (BTREE));
    if (tree)
    {
        tree->root = NULL;
        tree->count = 0;
        tree->compare = compare;
    } // if
    return tree;
} // BTree_Create


void BTree_Print (BTREE* tree)
{
    _print (tree->root, 0);
    return;
} // BTree_Print

// Prints the tree sideways (largest keys first), one entry per line with
// its level. Assumes the data stored in the tree are ints.
void _print (NODE* root, int level)
{
    int   scanCount;
    NODE* ptr;
    void* voidPtr;

    if (root)
    {
        scanCount = root->numEntries - 1;
        while (scanCount >= 0)
        {
            ptr = root->entries[scanCount].rightPtr;
            if (ptr)                          // test for subtree
                _print (ptr, level + 1);

            // Subtree processed -- print current entry
            printf("(%02d)", level);
            for (int i = 1; i <= level; i++)
                printf (" .");
            voidPtr = root->entries[scanCount].dataPtr;
            printf("%4d", *((int*)voidPtr));
            printf("\t--Node: %p\n", (void*)root);
            scanCount--;
        } // while

        // Process first pointer
        if (root->firstPtr)
            _print (root->firstPtr, level + 1);
    } // if root
    return;
} // _print


Some special B-trees and variations:

• 2-3 tree: a B-tree of order 3 (suitable for internal search).

• 2-3-4 tree: a B-tree of order 4 (suitable for internal search).

• B*tree: when a node overflows, instead of splitting it immediately, the algorithm tries to redistribute the data among the node's siblings.

• B+tree: some data need to be processed both randomly and sequentially. In a B+tree, all data are stored in leaf nodes; the keys in the internal nodes are used only for searching. Each leaf node has one additional pointer to the next leaf node.
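The linked leaves are what make sequential processing cheap in a B+tree: after the first leaf is located, a range scan just follows the next-leaf pointers without going back up through the internal nodes. A minimal sketch of that scan, assuming a hypothetical `LEAF` type with int keys and a `next` pointer; locating the starting leaf (an ordinary top-down search) is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4

typedef struct leaf
{
    int          keys[LEAF_CAP];   /* sorted within the leaf */
    int          numKeys;
    struct leaf* next;             /* next leaf in key order, or NULL */
} LEAF;

/* Copy every key k with lo <= k <= hi into out (capacity maxOut),
   starting from leaf, and return how many keys were copied. */
int range_scan(const LEAF* leaf, int lo, int hi, int out[], int maxOut)
{
    int count = 0;

    assert(lo <= hi);
    for (; leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->numKeys; i++) {
            int k = leaf->keys[i];
            if (k > hi)
                return count;      /* keys are sorted: nothing more in range */
            if (k >= lo && count < maxOut)
                out[count++] = k;
        }
    }
    return count;
}
```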


Tries

A trie is a multiway tree that is used to search for keys that are sequences of characters (letters or digits, for example).

For example, to search for the key begin, we first find b, then be, then beg, and so on.

For keys made of English letters, the root has 26 children and each node may have at most 26 children, so a trie is based on a 26-way tree. In English, there are no words beginning with ‘bb’, ‘bc’, ‘bf’, ‘bg’, · · · , so the corresponding nodes can be pruned.


To prune the tree, we cut all of the branches that are not needed. For example, if no key starts with the letter X, then the X pointer at level 0 is null. Similarly, after the letter Q the only valid letter is U, so all the pointers in the Q branch except U are set to null.

As an example, consider a trie over only the 5 letters A, B, C, E, and T. Each node contains an array of 5 pointers, one per letter; a pointer is non-null only if some key continues with that letter. In this example, most pointers are null.


Algorithm searchTrie (dictionary, word)
set root to dictionary
set ltrNdx to 0
loop (root not null)
    if (root entry equals word)
        return true
    end if
    if (ltrNdx >= word length)
        return false
    end if
    set chNdx to word[ltrNdx]
    set root to chNdx subtree
    increment ltrNdx
end loop
return false
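A minimal C sketch of a trie and of searchTrie as written above, assuming lowercase words over a 26-letter alphabet and a node that stores the word (if any) ending at it. The helpers `trie_new` and `trie_insert` are hypothetical, added only to build a dictionary; unused child pointers stay NULL, which is the pruning described earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ALPHABET 26

typedef struct trieNode
{
    const char*      entry;              /* word ending here, or NULL */
    struct trieNode* child[ALPHABET];    /* child[c - 'a'], NULL if pruned */
} TRIE_NODE;

/* Hypothetical helper: allocate a node with no entry and no children. */
TRIE_NODE* trie_new(void)
{
    TRIE_NODE* node = malloc(sizeof *node);
    assert(node != NULL);
    node->entry = NULL;
    for (int i = 0; i < ALPHABET; i++)
        node->child[i] = NULL;
    return node;
}

/* Hypothetical helper: add a lowercase word (the string is not copied). */
void trie_insert(TRIE_NODE* root, const char* word)
{
    for (size_t i = 0; word[i] != '\0'; i++) {
        int c = word[i] - 'a';
        assert(c >= 0 && c < ALPHABET);
        if (root->child[c] == NULL)
            root->child[c] = trie_new();
        root = root->child[c];
    }
    root->entry = word;
}

/* searchTrie: follow one child per letter until the word is found
   or a null pointer is reached. */
bool trie_search(const TRIE_NODE* root, const char* word)
{
    size_t len = strlen(word);
    size_t ltrNdx = 0;

    while (root != NULL) {
        if (root->entry != NULL && strcmp(root->entry, word) == 0)
            return true;
        if (ltrNdx >= len)
            return false;
        int c = word[ltrNdx] - 'a';
        if (c < 0 || c >= ALPHABET)
            return false;                /* not a lowercase letter */
        root = root->child[c];
        ltrNdx++;
    }
    return false;
}
```

With "be", "begin" and "bet" inserted, searching for "beg" reaches a node with no entry after the last letter and returns false, while "bee" stops at a null child pointer.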

