Chapter 8 Multiway Trees - ccc.cs.lakeheadu.caccc.cs.lakeheadu.ca/cs2412/slides/cs2412-s8.pdf ·...

CS 2412 Data Structures

Chapter 8

Multiway Trees

This chapter studies multiway trees, which can be used for external search.

When the data to be searched is large, it is not practical to load all of it into memory.

Searching in high-speed memory is much faster than searching on external devices (hard disks, CDs, etc.).

The idea of external search: each step reads one block of information into memory and uses it to decide which block to search next.

Data Structure 2015 R. Wei 2

Definition An m-way tree has the following properties.

• Each node has 0 to m subtrees.

• A node with k ≤ m subtrees contains k subtree pointers and k − 1 data entries.

• The keys of the data entries are ordered: key_1 ≤ key_2 ≤ · · · ≤ key_{k−1}.

• The key values in the first subtree (the 0th subtree) are all less than key_1; the key values in the ith subtree are all greater than or equal to key_i but less than key_{i+1}.

• All subtrees are themselves multiway trees.

A 2-way tree is a BST.

We also want the multiway tree to be balanced.

Definition A B-tree is an m-way tree with the following additional

properties:

• The root is either a leaf or it has 2 to m subtrees.

• All internal nodes have at least ⌈m/2⌉ nonnull subtrees.

• All leaf nodes are at the same level.

• A leaf node has at least ⌈m/2⌉ − 1 entries.
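The bounds in this definition follow directly from the order m. As a minimal sketch (the helper names are hypothetical and m ≥ 3 is assumed), they could be computed as follows:

```c
/* Hypothetical helpers: capacity limits of a B-tree of order m (m >= 3 assumed). */

/* Ceiling of m / 2 using integer arithmetic. */
static int ceil_half(int m) { return (m + 1) / 2; }

/* Minimum number of non-null subtrees in an internal (non-root) node. */
int min_subtrees(int m) { return ceil_half(m); }

/* Minimum number of entries in a non-root node: ceil(m/2) - 1. */
int min_entries(int m) { return ceil_half(m) - 1; }

/* Maximum number of entries in any node: m - 1. */
int max_entries(int m) { return m - 1; }
```

For order 5, a non-root node then holds between 2 and 4 entries, and an internal node has between 3 and 5 subtrees.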

Data structure of B-tree.

The structure of an m-way tree node: each entry contains data and a pointer to its right subtree. A node contains a first pointer (to the subtree whose entries are less than the key of the first entry), a count of the entries currently in the node, and an array of entries. A node holds at most m − 1 entries, so the array needs m − 1 slots (the C declaration later in the chapter uses entries[ORDER - 1]).

The main operations on B-trees are insert, delete, traverse and search.

B-tree insertion:

B-tree insertion takes place at a leaf node.

• Locate the leaf node where the data can be inserted.

• If the node is not full (it has fewer than m − 1 entries), insert the data into this node.

• If the node is full (the overflow condition), split the node into two nodes.

A B-tree grows from the bottom up.

Example Insert 11, 21, 14, 78, 97 to a B-tree of order 5:
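One way to trace this example is the sketch below. It makes simplifying assumptions: keys are plain ints, only a single leaf is modeled, and insert_sorted and split_leaf are hypothetical helper names rather than the course code. After 11, 21, 14 and 78 the single node is full (11, 14, 21, 78). Inserting 97 overflows it, so the median 21 moves up into a new root, leaving 11, 14 in the left node and 78, 97 in the right node.

```c
#include <string.h>

#define ORDER 5                 /* order used in the example */
#define MAX_KEYS (ORDER - 1)    /* a node holds at most m - 1 keys */

/* Hypothetical helper: insert key into a sorted array that holds n keys
   and has room for at least n + 1. */
static void insert_sorted(int keys[], int n, int key)
{
    int i = n;
    while (i > 0 && keys[i - 1] > key) {
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = key;
}

/* Hypothetical helper: split a full leaf that receives one more key.
   The lower keys go to left[], the higher keys to right[], and the
   median key is returned so that it can move up to the parent. */
int split_leaf(const int full[MAX_KEYS], int newKey,
               int left[], int *nLeft, int right[], int *nRight)
{
    int all[MAX_KEYS + 1];
    memcpy(all, full, sizeof(int) * MAX_KEYS);
    insert_sorted(all, MAX_KEYS, newKey);

    int mid = (MAX_KEYS + 1) / 2;          /* index of the median key */
    *nLeft  = mid;
    *nRight = MAX_KEYS - mid;
    memcpy(left, all, sizeof(int) * (size_t)*nLeft);
    memcpy(right, all + mid + 1, sizeof(int) * (size_t)*nRight);
    return all[mid];
}
```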

The algorithm of B-tree insert:

• If the B-tree is empty, create the root and insert the first entry.

• If the B-tree is not empty, call the insert node algorithm, which finds the location, inserts the entry and makes any necessary updates (on overflow, it splits the node and moves the median entry up to the parent, and so on).

• If the root needs to split, create a new root.

Algorithm BTreeInsert (tree, data)
  if (tree empty)
    create new node
    set left subtree of node to null
    move data to first entry in new node
    set subtree of first entry to null
    set tree root to address of new node
    set number of entries to 1
  else
    insertNode (tree, data, upEntry)
  end if
  if (tree higher)
    create new node
    move upEntry to first entry in new node
    set left subtree of the new node to tree
    set tree root to new node
    set number of entries to 1
  end if

Algorithm searchNode (nodePtr, target)
  if (target < key in first entry)
    return 0
  end if
  set walker to number of entries - 1
  loop (target < entry key[walker])
    decrement walker
  end loop
  return walker

This function returns the index of the last entry whose key ≤ target, or 0 if the target is less than the key of the first entry in the node.
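The C listings later in the chapter call _searchNode without showing it. A minimal sketch of the same scan, simplified to a node whose keys are plain ints (an assumption; the course code compares void* data through the tree's compare callback), might look like this:

```c
/* Sketch of searchNode on an array of int keys (simplifying assumption).
   Returns the index of the last key <= target, or 0 if target < keys[0]. */
int search_node(const int keys[], int numEntries, int target)
{
    if (target < keys[0])
        return 0;               /* target belongs in the first subtree */

    int walker = numEntries - 1;
    while (target < keys[walker])
        walker--;               /* cannot pass index 0: keys[0] <= target here */
    return walker;
}
```

A return value of 0 is ambiguous: the target is either less than the first key or lies between the first and second keys. The caller has to compare once more to tell the two cases apart, as _insert and _delete do.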

Algorithm splitNode (node, entryNdx, newEntryLow, upEntry)
  create new node
  move high entries to new node
  if (entryNdx < minimum entries)
    insert upEntry in node
  else
    insert upEntry in new node
  end if
  move median data to upEntry
  make new node first Ptr the right subtree of median data
  make new node the right subtree of upEntry

B-tree deletion:

B-tree deletion is a little more complicated than insertion.

• Search for the data to be deleted. If it cannot be found, print an error message and quit.

• If the data is found, delete it. Two cases must be considered: the data is in a leaf node or in a non-leaf node.

• If an underflow occurs after the deletion (a leaf node has fewer than ⌈m/2⌉ − 1 entries, or an internal node has fewer than ⌈m/2⌉ non-null subtrees), the tree must be adjusted.

The following algorithm deletes an entry. It handles two special situations: an empty tree, and a root that becomes empty after the deletion. The details of handling underflow are left to algorithm delete.

Algorithm BTreeDelete (tree, dltKey)
  if (tree empty)
    return false
  end if
  delete (tree, dltKey, success)
  if (success)
    if (tree number of entries zero)
      set tree to left subtree
    end if
  end if
  return success

Algorithm delete (only the closing lines of the listing are preserved)
      end if
    end if
    return underflow
  end delete

The following algorithm deletes the entry from a leaf node and returns the value of underflow.

Algorithm deleteEntry (node, entryNdx)
  delete entry at entryNdx from node
  shift entries after deleted entry to the left
  if (number of entries less than minimum entries)
    return true
  else
    return false
  end if
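The C listings later in the chapter call _deleteEntry but do not show it. A possible definition is sketched below. The type declarations are repeated so that the block stands alone, the values of ORDER and MIN_ENTRIES are assumed, and the sketch leaves the deleted data itself to the caller (it does not free it).

```c
#include <stdbool.h>

#define ORDER       5                        /* assumed order */
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)  /* ceil(ORDER/2) - 1 */

struct node;
typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node {
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Sketch: remove the entry at entryNdx from a leaf node.
   Returns true if the node now holds fewer than MIN_ENTRIES entries. */
bool _deleteEntry(NODE* node, int entryNdx)
{
    /* Shift the entries after the deleted one to the left. */
    for (int i = entryNdx; i < node->numEntries - 1; i++)
        node->entries[i] = node->entries[i + 1];
    node->numEntries--;

    return node->numEntries < MIN_ENTRIES;
}
```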

When deleting an entry in an internal node, we must find substitute data. We use the immediate predecessor, which is the largest entry in the left subtree of the entry to be deleted. Within that subtree, the largest entry is found by following the rightmost subtree pointers down to a leaf.

Algorithm deleteMid (node, entryNdx, subtree)
  if (no rightmost subtree) // predecessor in a leaf node
    move predecessor’s data to deleted entry
    set underflow if node entries less than minimum
  else
    set underflow to deleteMid (node, entryNdx, right subtree)
    if (underflow)
      set underflow to reFlow (root, entryNdx)
    end if
  end if
  return underflow

When a node underflows, we need to make an adjustment, which we call reflow. Suppose one of the two subtrees on either side of a root entry contains the underflow node. Two situations must be considered:

• If the other subtree has more entries than the minimum number, then we just move an entry over to the underflow node (by way of the root entry). This is called balance.

• If the other subtree has only the minimum number of entries, then we combine the two nodes into one node together with the root entry. This is called combine.

Algorithm reflow (root, entryNdx)
  if (rightTree entries greater minimum entries)
    borrowRight (root, entryNdx, leftTree, rightTree)
    set underflow to false
  else if (leftTree entries greater minimum entries)
    borrowLeft (root, entryNdx, leftTree, rightTree)
    set underflow to false
  else
    combine (root, entryNdx, leftTree, rightTree)
    if (root numEntries less minimum entries)
      set underflow to true
    else
      set underflow to false
    end if
  end if
  return underflow

Algorithm borrowLeft (root, entryNdx, left, right)
  shift all entries in right one position to the right
  move root data to first entry in right
  move right first pointer to right subtree of first entry
  move left last right pointer to right first pointer
  move left last entry data to root at entryNdx

In the above algorithm, when an entry is moved, the corresponding pointers are also adjusted. (To see why, consider the case where the underflow node is not a leaf node.) The algorithm of borrowRight is similar.
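The C listings later in the chapter show _borrowRight but not _borrowLeft. The sketch below follows the borrowLeft pseudocode step by step. It is one possible definition, not the course code: the type declarations are repeated so that the block stands alone, and the value of ORDER is assumed.

```c
#define ORDER 5                 /* assumed order */

struct node;
typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node {
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Sketch: move one entry from the left node, through the root entry at
   entryNdx, into the (underflowing) right node. */
void _borrowLeft(NODE* root, int entryNdx,
                 NODE* leftTreePtr, NODE* rightTreePtr)
{
    /* Shift all entries in the right node one position to the right. */
    for (int shifter = rightTreePtr->numEntries; shifter > 0; shifter--)
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter - 1];

    /* Move the root data down to the first entry of the right node; the
       old first pointer becomes that entry's right subtree. */
    rightTreePtr->entries[0].dataPtr  = root->entries[entryNdx].dataPtr;
    rightTreePtr->entries[0].rightPtr = rightTreePtr->firstPtr;
    ++rightTreePtr->numEntries;

    /* Move the last entry of the left node up to the root; its right
       subtree becomes the first pointer of the right node. */
    int fromNdx = leftTreePtr->numEntries - 1;
    rightTreePtr->firstPtr          = leftTreePtr->entries[fromNdx].rightPtr;
    root->entries[entryNdx].dataPtr = leftTreePtr->entries[fromNdx].dataPtr;
    --leftTreePtr->numEntries;
}
```

For example, with root entry 50, left node (10, 20, 30) and right node (60), the call leaves the root entry 30, the left node (10, 20) and the right node (50, 60).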

Algorithm combine (root, entryNdx, left, right)
  move parent entry to first open entry in left subtree
  move right subtree first subtree to moved parent right subtree
  move entries from right subtree to end of left subtree
  shift root data to left

Example

Similar to a BST, the traversal of a B-tree uses inorder. The difference is that, except in leaf nodes, the entries of a node are not all processed at the same time: each subtree is traversed between the processing of two consecutive entries.

Algorithm BTreeTraversal (root)
  set scanCount to 0
  set nextSubTree to root left subtree
  loop (scanCount <= number of entries)
    if (nextSubTree not null)
      BTreeTraversal (nextSubTree)
    end if
    if (scanCount < number of entries)
      process (entry[scanCount])
      set nextSubTree to current entry right subtree
    end if
    increment scanCount
  end loop

The B-tree search algorithm follows the same idea as searching a binary search tree, but we must first find the node and then find the entry in that node. In this case, we need to return both the node and the location of the entry in that node.

A recursive method is used to find the node. At each node, the target is compared with the entries from the last entry to the first.
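A search that returns both the node and the entry index could be sketched as follows. This is an illustration under simplifying assumptions: keys are plain ints, and INODE, LOCATION and find are hypothetical names, not the course types. Here child[0] plays the role of the first pointer and child[i + 1] is the right subtree of keys[i].

```c
#include <stddef.h>

#define ORDER 5                         /* assumed order */

typedef struct inode {
    int           numEntries;
    int           keys[ORDER - 1];
    struct inode* child[ORDER];         /* child[0] = first pointer */
} INODE;

typedef struct {
    const INODE* node;                  /* NULL if the target is not found */
    int          index;                 /* entry index within node, or -1 */
} LOCATION;

/* Recursive search: descend to the node, then scan it from the last
   entry to the first. */
LOCATION find(const INODE* root, int target)
{
    LOCATION loc = { NULL, -1 };
    if (!root)
        return loc;                     /* fell off a leaf: not found */

    if (target < root->keys[0])
        return find(root->child[0], target);

    int i = root->numEntries - 1;
    while (target < root->keys[i])
        i--;

    if (target == root->keys[i]) {
        loc.node  = root;
        loc.index = i;
        return loc;
    }
    return find(root->child[i + 1], target);
}
```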

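#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// ORDER and MIN_ENTRIES are not defined on these slides; the values
// below are assumed (order 5, minimum of ceil(ORDER/2) - 1 entries).
#define ORDER       5
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)

typedef struct
{
    void*        dataPtr;     // the stored data
    struct node* rightPtr;    // subtree with keys >= this entry
} ENTRY;

typedef struct node
{
    struct node* firstPtr;    // subtree with keys < entries[0]
    int          numEntries;  // entries currently in use
    ENTRY        entries[ORDER - 1];
} NODE;

typedef struct
{
    int   count;              // total number of entries in the tree
    NODE* root;
    int (*compare) (void* argu1, void* argu2);
} BTREE;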

void* BTree_Search (BTREE* tree, void* targetPtr)
{
    if (tree->root)
        return _search (tree, targetPtr, tree->root);
    else
        return NULL;
} // BTree_Search

void* _search (BTREE* tree, void* targetPtr, NODE* root)
{
    int entryNo;

    if (!root)
        return NULL;

    // Target is less than the first entry: search the first subtree
    if (tree->compare(targetPtr, root->entries[0].dataPtr) < 0)
        return _search (tree, targetPtr, root->firstPtr);

    // Scan from the last entry down to the first entry <= target
    entryNo = root->numEntries - 1;
    while (tree->compare(targetPtr, root->entries[entryNo].dataPtr) < 0)
        entryNo--;

    if (tree->compare(targetPtr, root->entries[entryNo].dataPtr) == 0)
        return (root->entries[entryNo].dataPtr);   // found

    return (_search (tree, targetPtr, root->entries[entryNo].rightPtr));
} // _search

void BTree_Traverse (BTREE* tree, void (*process) (void* dataPtr))
{
    if (tree->root)
        _traverse (tree->root, process);
    return;
} // end BTree_Traverse

void _traverse (NODE* root, void (*process) (void* dataPtr))
{
    int   scanCount;
    NODE* ptr;

    scanCount = 0;
    ptr       = root->firstPtr;

    while (scanCount <= root->numEntries)
    {
        // Traverse the subtree to the left of the current entry
        if (ptr)
            _traverse (ptr, process);

        // Subtree processed -- get next entry
        if (scanCount < root->numEntries)
        {
            process (root->entries[scanCount].dataPtr);
            ptr = root->entries[scanCount].rightPtr;
        } // if scanCount
        scanCount++;
    } // while
    return;
} // _traverse

void BTree_Insert (BTREE* tree, void* dataInPtr)
{
    bool  taller;
    NODE* newPtr;
    ENTRY upEntry;

    if (tree->root == NULL)
    {
        // Empty tree. Insert first node
        newPtr = (NODE*)malloc(sizeof (NODE));
        if (!newPtr)
        {
            printf("Overflow error 100 in BTree_Insert\a\n");
            exit (100);
        } // if malloc failed
        newPtr->firstPtr            = NULL;
        newPtr->numEntries          = 1;
        newPtr->entries[0].dataPtr  = dataInPtr;
        newPtr->entries[0].rightPtr = NULL;
        // Clear the unused entries
        for (int i = 1; i < ORDER - 1; i++)
        {
            newPtr->entries[i].dataPtr  = NULL;
            newPtr->entries[i].rightPtr = NULL;
        } // for
        tree->root = newPtr;
        (tree->count)++;
        return;
    } // if empty tree

    taller = _insert (tree, tree->root, dataInPtr, &upEntry);
    if (taller)
    {
        // Tree has grown. Create new root
        newPtr = (NODE*)malloc(sizeof(NODE));
        if (!newPtr)
        {
            printf("Overflow error 101\a\n");
            exit (100);
        } // if malloc failed
        newPtr->entries[0] = upEntry;
        newPtr->firstPtr   = tree->root;
        newPtr->numEntries = 1;
        tree->root         = newPtr;
    } // if taller

    (tree->count)++;
    return;
} // BTree_Insert

bool _insert (BTREE* tree, NODE* root, void* dataInPtr, ENTRY* upEntry)
{
    int   compResult;
    int   entryNdx;
    bool  taller;
    NODE* subtreePtr;

    if (!root)
    {
        // Fell off a leaf: hand the new data back up as upEntry
        (*upEntry).dataPtr  = dataInPtr;
        (*upEntry).rightPtr = NULL;
        return true; // tree taller
    } // if NULL tree

    entryNdx   = _searchNode (tree, root, dataInPtr);
    compResult = tree->compare(dataInPtr, root->entries[entryNdx].dataPtr);
    if (entryNdx <= 0 && compResult < 0)
        // in node's first subtree
        subtreePtr = root->firstPtr;
    else
        // in entry's right subtree
        subtreePtr = root->entries[entryNdx].rightPtr;

    taller = _insert (tree, subtreePtr, dataInPtr, upEntry);
    if (taller)
    {
        if (root->numEntries >= ORDER - 1)
        {
            // Need to create new node
            _splitNode (root, entryNdx, compResult, upEntry);
            taller = true;
        } // node full
        else
        {
            if (compResult >= 0)
                // New data >= current entry -- insert after
                _insertEntry(root, entryNdx + 1, *upEntry);
            else
                // Insert before current entry
                _insertEntry(root, entryNdx, *upEntry);
            (root->numEntries)++;
            taller = false;
        } // else
    } // if taller
    return taller;
} // _insert
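_insert and _splitNode both call _insertEntry, which is not shown on the slides. A possible definition is sketched below. Because _insert increments numEntries itself after the call, the sketch only opens a slot and stores the entry. The type declarations are repeated so that the block stands alone, and the value of ORDER is assumed.

```c
#define ORDER 5                 /* assumed order */

struct node;
typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node {
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Sketch: open a slot at entryNdx and store upEntry there. The caller
   must guarantee a free slot and is responsible for updating numEntries. */
void _insertEntry(NODE* root, int entryNdx, ENTRY upEntry)
{
    int shifter = root->numEntries;
    while (shifter > entryNdx) {
        root->entries[shifter] = root->entries[shifter - 1];
        shifter--;
    }
    root->entries[shifter] = upEntry;
}
```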

void _splitNode (NODE* node, int entryNdx, int compResult, ENTRY* upEntry)
{
    int   fromNdx;
    int   toNdx;
    NODE* rightPtr;

    rightPtr = (NODE*)malloc(sizeof (NODE));
    if (!rightPtr)
    {
        printf("Overflow Error 101 in _splitNode\a\n");
        exit (100);
    } // if malloc failed

    // Choose the first entry that moves to the new (right) node
    if (entryNdx < MIN_ENTRIES)
        fromNdx = MIN_ENTRIES;
    else
        fromNdx = MIN_ENTRIES + 1;

    // Move the high entries to the new node
    toNdx = 0;
    rightPtr->numEntries = node->numEntries - fromNdx;
    while (fromNdx < node->numEntries)
        rightPtr->entries[toNdx++] = node->entries[fromNdx++];
    node->numEntries = node->numEntries - rightPtr->numEntries;

    // Insert the new entry into the left or the right node
    if (entryNdx < MIN_ENTRIES)
    {
        if (compResult < 0)
            _insertEntry (node, entryNdx, *upEntry);
        else
            _insertEntry (node, entryNdx + 1, *upEntry);
    } // if
    else
    {
        _insertEntry (rightPtr, entryNdx - MIN_ENTRIES, *upEntry);
        (rightPtr->numEntries)++;
        (node->numEntries)--;
    } // else

    // The median entry moves up; the new node becomes its right subtree
    upEntry->dataPtr   = node->entries[MIN_ENTRIES].dataPtr;
    upEntry->rightPtr  = rightPtr;
    rightPtr->firstPtr = node->entries[MIN_ENTRIES].rightPtr;
    return;
} // _splitNode

bool BTree_Delete (BTREE* tree, void* dltKey)
{
    bool  success;
    NODE* dltPtr;

    if (!tree->root)
        return false;

    _delete (tree, tree->root, dltKey, &success);

    if (success)
    {
        (tree->count)--;
        if (tree->root->numEntries == 0)
        {
            // Root is empty: its first subtree becomes the new root
            dltPtr     = tree->root;
            tree->root = tree->root->firstPtr;
            free (dltPtr);
        } // root empty
    } // success
    return success;
} // BTree_Delete

bool _delete (BTREE* tree, NODE* root, void* dltKeyPtr, bool* success)
{
    NODE* leftPtr;
    NODE* subTreePtr;
    int   entryNdx;
    bool  underflow;

    if (!root)
    {
        *success = false;
        return false;
    } // null tree

    entryNdx = _searchNode (tree, root, dltKeyPtr);
    if (tree->compare(dltKeyPtr, root->entries[entryNdx].dataPtr) == 0)
    {
        // Found entry to be deleted
        *success = true;
        if (root->entries[entryNdx].rightPtr == NULL)
            // Entry is in a leaf node
            underflow = _deleteEntry (root, entryNdx);
        else
        {
            // Entry is in an internal node: replace it with its predecessor
            if (entryNdx > 0)
                leftPtr = root->entries[entryNdx - 1].rightPtr;
            else
                leftPtr = root->firstPtr;
            underflow = _deleteMid (root, entryNdx, leftPtr);
            if (underflow)
                underflow = _reFlow (root, entryNdx);
        } // else internal node
    } // if found entry
    else
    {
        if (tree->compare (dltKeyPtr, root->entries[0].dataPtr) < 0)
            subTreePtr = root->firstPtr;
        else
            subTreePtr = root->entries[entryNdx].rightPtr;

        underflow = _delete (tree, subTreePtr, dltKeyPtr, success);
        if (underflow)
            underflow = _reFlow (root, entryNdx);
    } // else not found
    return underflow;
} // _delete

bool _deleteMid (NODE* root, int entryNdx, NODE* subtreePtr)
{
    int  dltNdx;
    int  rightNdx;
    bool underflow;

    if (subtreePtr->firstPtr == NULL)
    {
        // leaf located. Exchange data & delete leaf
        dltNdx = subtreePtr->numEntries - 1;
        root->entries[entryNdx].dataPtr = subtreePtr->entries[dltNdx].dataPtr;
        --subtreePtr->numEntries;
        underflow = subtreePtr->numEntries < MIN_ENTRIES;
    } // if leaf
    else
    {
        // Not located. Traverse right for predecessor
        rightNdx  = subtreePtr->numEntries - 1;
        underflow = _deleteMid (root, entryNdx,
                                subtreePtr->entries[rightNdx].rightPtr);
        if (underflow)
            underflow = _reFlow (subtreePtr, rightNdx);
    } // else traverse right
    return underflow;
} // _deleteMid

bool _reFlow (NODE* root, int entryNdx)
{
    NODE* leftTreePtr;
    NODE* rightTreePtr;
    bool  underflow;

    if (entryNdx == 0)
        leftTreePtr = root->firstPtr;
    else
        leftTreePtr = root->entries[entryNdx - 1].rightPtr;
    rightTreePtr = root->entries[entryNdx].rightPtr;

    if (rightTreePtr->numEntries > MIN_ENTRIES)
    {
        _borrowRight (root, entryNdx, leftTreePtr, rightTreePtr);
        underflow = false;
    } // if borrow right
    else
    {
        // Can't borrow from right--try left
        if (leftTreePtr->numEntries > MIN_ENTRIES)
        {
            _borrowLeft (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = false;
        } // if borrow left
        else
        {
            // Can't borrow. Must combine nodes.
            _combine (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = (root->numEntries < MIN_ENTRIES);
        } // else combine
    } // else can't borrow right
    return underflow;
} // _reFlow

void _borrowRight (NODE* root, int entryNdx,
                   NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int shifter;

    // Move the parent entry down to the end of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr  = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;

    // Move the first entry of the right node up to the parent
    root->entries[entryNdx].dataPtr = rightTreePtr->entries[0].dataPtr;
    rightTreePtr->firstPtr          = rightTreePtr->entries[0].rightPtr;

    // Close the gap in the right node
    shifter = 0;
    while (shifter < rightTreePtr->numEntries - 1)
    {
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter + 1];
        ++shifter;
    } // while
    --rightTreePtr->numEntries;
    return;
} // _borrowRight

void _combine (NODE* root, int entryNdx,
               NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int fromNdx;
    int shifter;

    // Move the parent entry down to the first open entry of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr  = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;
    --root->numEntries;

    // Append the entries of the right node to the left node
    fromNdx = 0;
    toNdx++;
    while (fromNdx < rightTreePtr->numEntries)
        leftTreePtr->entries[toNdx++] = rightTreePtr->entries[fromNdx++];
    leftTreePtr->numEntries += rightTreePtr->numEntries;
    free (rightTreePtr);

    // Close the gap left in the parent
    shifter = entryNdx;
    while (shifter < root->numEntries)
    {
        root->entries[shifter] = root->entries[shifter + 1];
        shifter++;
    } // while
    return;
} // _combine

BTREE* BTree_Create (int (*compare) (void* argu1, void* argu2))
{
    BTREE* tree;

    tree = (BTREE*) malloc (sizeof (BTREE));
    if (tree)
    {
        tree->root    = NULL;
        tree->count   = 0;
        tree->compare = compare;
    } // if
    return tree;
} // BTree_Create
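BTree_Create expects a comparison callback that returns a negative, zero or positive value. For a tree whose data pointers refer to ints, such a callback might look like the sketch below (compareInt is a hypothetical name, not part of the course code):

```c
/* Hypothetical comparison callback for trees that store pointers to int.
   Returns -1, 0 or 1 as the first value is less than, equal to or
   greater than the second. */
int compareInt(void* argu1, void* argu2)
{
    int a = *(int*)argu1;
    int b = *(int*)argu2;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}
```

A tree would then be created with BTree_Create (compareInt), and every pointer passed to BTree_Insert must refer to an int that outlives the tree.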

void BTree_Print (BTREE* tree)
{
    _print (tree->root, 0);
    return;
} // BTree_Print

void _print (NODE* root, int level)
{
    int   scanCount;
    NODE* ptr;
    void* voidPtr;

    if (root)
    {
        // Visit entries from last to first, right subtree before entry
        scanCount = root->numEntries - 1;
        while (scanCount >= 0)
        {
            ptr = root->entries[scanCount].rightPtr;
            // Test for subtree
            if (ptr)
                _print (ptr, level + 1);

            // Subtree processed -- print current entry
            printf("(%02d)", level);
            for (int i = 1; i <= level; i++ )
                printf (" ." );
            voidPtr = root->entries[scanCount].dataPtr;
            printf("%4d", *((int*)voidPtr));   // assumes the data are ints
            printf("\t--Node: %p\n", (void*)root);
            scanCount--;
        } // while

        // Process first pointer
        if (root->firstPtr)
            _print (root->firstPtr, level + 1);
    } // if root
    return;
} // _print

Some special B-trees and variations:

• 2-3 tree: a B-tree of order 3 (suitable for internal search).

• 2-3-4 tree: a B-tree of order 4 (suitable for internal search).

• B*tree: when a node overflows, instead of being split immediately, its data are first redistributed among the node’s siblings, if possible.

• B+tree: some data need to be processed both randomly and sequentially. In a B+tree, all data are stored in leaf nodes. The keys in the internal nodes are used only for searching. Each leaf node has one additional pointer that points to the next leaf node.
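The extra leaf pointer is what makes sequential processing cheap in a B+tree. The sketch below models only the leaf level. LEAF, LEAF_CAP and count_range are hypothetical names, keys are plain ints, and the leaves are assumed to be sorted and linked in key order.

```c
#include <stddef.h>

#define LEAF_CAP 4              /* assumed leaf capacity */

typedef struct leaf {
    int          numKeys;
    int          keys[LEAF_CAP];
    struct leaf* next;          /* the additional pointer to the next leaf */
} LEAF;

/* Count the keys in [low, high] by walking the leaf chain from start.
   No internal node is visited once the starting leaf is known. */
int count_range(const LEAF* start, int low, int high)
{
    int count = 0;
    for (const LEAF* leaf = start; leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->numKeys; i++) {
            if (leaf->keys[i] > high)
                return count;   /* keys are sorted: nothing further matches */
            if (leaf->keys[i] >= low)
                count++;
        }
    }
    return count;
}
```

In a full B+tree, the starting leaf would be located by a random-access search through the internal nodes, and the sequential part would then follow the next pointers as above.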

Tries

A trie is a multiway tree used to search for keys that are sequences of characters (letters or digits, for example).

For example, to search for the key begin, we first find b, then be, then beg, and so on.

With keys made of English letters, the root has 26 children, and each node may have at most 26 children, so the trie is based on a 26-way tree. In English, there are no words beginning with ‘bb’, ‘bc’, ‘bf’, ‘bg’, · · · , so the corresponding nodes can be pruned.

To prune the tree, we cut all of the branches that are not needed. For example, if no key starts with the letter X, then at level 0 the X pointer is null. Similarly, after the letter Q, the only valid letter is U, so all the pointers in the Q branch except U are set to null.

As an example, we display a trie that contains only the 5 letters A, B, C, E and T. Each node contains an array of 5 pointers. A node’s pointer for a letter is non-null only if that letter exists at that position. In this example, most pointers are null.

Algorithm searchTrie (dictionary, word)
  set root to dictionary
  set ltrNdx to 0
  loop (root not null)
    if (root entry equals word)
      return true
    end if
    if (ltrNdx >= word length)
      return false
    end if
    set chNdx to word[ltrNdx]
    set root to chNdx subtree
    increment ltrNdx
  end loop
  return false
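A trie search can be sketched in C as follows. This is a simplified variant rather than a transcription of searchTrie: instead of storing a whole entry in a node and comparing it with the word, each node carries an end-of-word flag, and only the lowercase letters a to z are accepted. TRIE, trie_new, trie_insert and trie_search are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26             /* lowercase letters a..z only (assumption) */

typedef struct trie {
    bool         isWord;        /* true if a key ends at this node */
    struct trie* child[ALPHABET];
} TRIE;

/* Allocate an empty node; every branch starts out pruned (null). */
TRIE* trie_new(void)
{
    TRIE* node = malloc(sizeof(TRIE));
    if (node) {
        node->isWord = false;
        for (int i = 0; i < ALPHABET; i++)
            node->child[i] = NULL;
    }
    return node;
}

/* Insert word, creating nodes only along its path. Returns false on an
   invalid character or allocation failure. Assumes 'a'..'z' are
   contiguous in the character set, as in ASCII. */
bool trie_insert(TRIE* root, const char* word)
{
    for (; *word; word++) {
        int ch = *word - 'a';
        if (ch < 0 || ch >= ALPHABET)
            return false;
        if (!root->child[ch]) {
            root->child[ch] = trie_new();
            if (!root->child[ch])
                return false;
        }
        root = root->child[ch];
    }
    root->isWord = true;
    return true;
}

/* Follow one pointer per letter; a null pointer means the key is absent. */
bool trie_search(const TRIE* root, const char* word)
{
    for (; root && *word; word++) {
        int ch = *word - 'a';
        if (ch < 0 || ch >= ALPHABET)
            return false;
        root = root->child[ch];
    }
    return root != NULL && root->isWord;
}
```

After inserting begin, be and bet, a search for beg follows existing pointers but ends at a node whose flag is not set, so it fails. A search for bat fails as soon as it reaches the null a pointer under b.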
