1 2-3-4 Trees Chapter 10 – Part 1 2-3-4 Trees and Related Topics.


1

2-3-4 Trees

Chapter 10 – Part 1

2-3-4 Trees and Related Topics

2

Introduction

• Multiway trees are trees whose nodes can have more than two children and more than one data item. A 2-3-4 tree is a multiway tree with up to four children and three data items per node.

• 2-3-4 Trees: very nice features
– Are always balanced. (What does balanced mean?)
– Reasonably easy to program.
– Serve as an introduction to the understanding of B-Trees.

• B-Trees: another kind of multiway tree, particularly useful in organizing external storage, like files.
– B-Trees (later lectures) can have dozens or hundreds of children, with hundreds of thousands of records!

3

Introduction to 2-3-4 Trees

• Nodes are drawn as lozenge shapes.

• In a 2-3-4 tree, all leaf nodes are at the same level. (but data can appear in all nodes)

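Figure: a 2-3-4 tree (each bracket is one node)

Root: [50]
Level 1: [30] [60 70 80]
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]

[30] links to [10 20] and [40]; [60 70 80] links to [55], [62 64 66], [75], and [83 86].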

4

• The 2, 3, and 4 in the name refer to how many links to child nodes can potentially be contained in a given node.

• For non-leaf nodes, three arrangements are possible:
– A node with one data item always has two children.
– A node with two data items always has three children.
– A node with three data items always has four children.

• Every node holds at least one data item (a node with zero data items does not exist), so a non-leaf node has 2, 3, or 4 links.

5

• A non-leaf node always has one more child (link) than it has data items (see below).
– Equivalently, if the number of child links is L and the number of data items is D, then L = D + 1.

Figure: a 2-3-4 tree (each bracket is one node)

Root: [50]
Level 1: [30] [60 70 80]
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]
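The relationship L = D + 1 can be checked mechanically. The following is a minimal sketch, not part of the lecture code: the class name `LinkRule` and the method `isLegal` are made up for illustration, and a node is reduced to two counts.

```java
// Hypothetical helper (not from the lecture code): checks the 2-3-4 node shape rule.
public class LinkRule {
    // dataItems: number of data items D in the node.
    // childLinks: number of non-null child links L of the node.
    // Returns true if these counts describe a legal 2-3-4 node.
    static boolean isLegal(int dataItems, int childLinks) {
        if (dataItems < 1 || dataItems > 3) {
            return false;                     // a node holds one, two, or three items
        }
        if (childLinks == 0) {
            return true;                      // a leaf has no children at all
        }
        return childLinks == dataItems + 1;   // non-leaf: L = D + 1
    }

    public static void main(String[] args) {
        System.out.println(isLegal(1, 2));    // 2-node such as [50]
        System.out.println(isLegal(3, 4));    // 4-node such as [60 70 80]
        System.out.println(isLegal(2, 0));    // leaf such as [10 20]
        System.out.println(isLegal(1, 1));    // a single link is not permitted
    }
}
```

Treating a leaf as "zero links" is a simplification chosen here; the rule L = D + 1 applies only to non-leaf nodes.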

6

More Introductory stuff

• Critical relationships determine the structure of 2-3-4 trees:

• A leaf node has no children (all of its links are null), but it can still contain one, two, or three data items; it cannot be empty. (See figure above.)

Because a 2-3-4 tree can have nodes with up to four children, it’s called a multiway tree of order 4.

Figure: a 2-3-4 tree (each bracket is one node)

Root: [50]
Level 1: [30] [60 70 80]
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]

7

Still More Introductory stuff

• Binary (and red-black) trees may be referred to as multiway trees of order 2: each node can have up to two children.

• But note: in a binary tree, a node may have up to two child links (and one or both may be null).

• In a 2-3-4 tree, nodes with a single link are NOT permitted:
– a node with one data item must have two links (unless it’s a leaf);
– a node with two data items must have three children;
– a node with three data items must have four children.

• (You will see this more clearly once we talk about how they are actually built.)

Even More Introductory stuff

• These numbers are important.

• A node with one data item links to a lower-level node whose values are less than the value of this item, and to a lower-level node whose values are greater than it. (Equal keys are not permitted.)

• Nodes are named by their link count: a node with two links is called a 2-node; with three links, a 3-node; with four links, a 4-node. (There is no such thing as a 1-node.)

8

9

Figure: a 2-3-4 tree with node types labeled

Root: [50] (2-node)
Level 1: [30] (2-node) [60 70 80] (4-node)
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]

Do you see any 2-nodes? 3-nodes? 4-nodes?

Do you see a node with one data item that has two links? A node with two data items having three children? A node with three data items having four children?

10

2-3-4 Tree Organization

• Very different organization than for a binary tree.

• First, we number the data items in a node 0, 1, 2 and the child links 0, 1, 2, 3. Very important.

• Data items are always in ascending order, left to right, within a node.

• The relationships between data items and child links are easy to understand but critical for processing.

11

More on 2-3-4 Tree Organization

A node with data items A B C has four child links:

– Child 0 points to nodes with keys < A.
– Child 1 points to nodes with keys between A and B.
– Child 2 points to nodes with keys between B and C.
– Child 3 points to nodes with keys > C.

See below. Equal keys are not permitted; leaves are all on the same level; upper-level nodes are often not full; the tree is balanced! Its construction always maintains its balance, even as you add additional data items (ahead).

Figure: a 2-3-4 tree with node types labeled

Root: [50] (2-node)
Level 1: [30] (2-node) [60 70 80] (4-node)
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]
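The link numbering can be turned into a small lookup: given a node's ascending data items and a search key, count how many items the key exceeds, and that count is the child link to follow. This is a minimal sketch under that reading; `ChildLink` and `childIndexFor` are names invented here, and the node is reduced to a plain `int[]`.

```java
// Hypothetical helper (not from the lecture code): picks the child link to follow.
public class ChildLink {
    // keys: the node's data items in ascending order (length 1 to 3).
    // Returns the child link number (0..keys.length) whose subtree could hold target.
    // Assumes target is not equal to any key in the node.
    static int childIndexFor(int[] keys, int target) {
        int j = 0;
        while (j < keys.length && target > keys[j]) {
            j++;                                      // target is larger: one link right
        }
        return j;
    }

    public static void main(String[] args) {
        int[] node = {60, 70, 80};                    // the 4-node from the figure
        System.out.println(childIndexFor(node, 55));  // left of 60
        System.out.println(childIndexFor(node, 64));  // between 60 and 70
        System.out.println(childIndexFor(node, 75));  // between 70 and 80
        System.out.println(childIndexFor(node, 86));  // right of 80
    }
}
```

The four calls correspond to the four leaves under [60 70 80] in the figure: [55], [62 64 66], [75], and [83 86].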

12

Searching a 2-3-4 Tree

• A very nice feature of these trees.
• You have a search key; go to the root.
• Retrieve the node; search its data items.
• If hit:
– done.
• Else:
– Select the link that leads to the subtree with the appropriate range of values.
– If you don’t find your target in that node, go on to its appropriate child. (Notice data items are sequential – VIP later.)
– Etc. Data will ultimately be ‘found’ or ‘not found.’

13

Try it: search for 64, 40, 65

Root: [50] (2-node)
Level 1: [30] (2-node) [60 70 80] (4-node)
Leaves: [10 20] [40] | [55] [62 64 66] [75] [83 86]

Note: Nodes serve as holders of data and holders of ‘indexes’.
Note: You can easily have a ‘no hit’ condition.
Note: After indexing down to a node, the search within that node is sequential.
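The three searches can be traced in code. This is a sketch only: it hard-codes the tree from the figure and uses a stripped-down node class `N` invented for this example (it is not the lecture's `Node` class). `search` returns how many nodes were visited on a hit and -1 on a 'no hit'.

```java
// Hypothetical sketch (not the lecture's Node/Tree234 classes).
public class SearchDemo {
    static class N {
        int[] keys;          // data items, ascending
        N[] kids;            // child links; length 0 for a leaf
        N(int[] keys, N... kids) { this.keys = keys; this.kids = kids; }
    }

    static int[] k(int... keys) { return keys; }   // shorthand for a key array

    // Builds the tree shown in the figure.
    static N figureTree() {
        N left = new N(k(30), new N(k(10, 20)), new N(k(40)));
        N right = new N(k(60, 70, 80),
                new N(k(55)), new N(k(62, 64, 66)), new N(k(75)), new N(k(83, 86)));
        return new N(k(50), left, right);
    }

    // Returns the number of nodes visited if key is found, or -1 if not found.
    static int search(N root, int key) {
        N cur = root;
        int visited = 0;
        while (true) {
            visited++;
            int j = 0;
            while (j < cur.keys.length && key > cur.keys[j]) {
                j++;                               // sequential scan within the node
            }
            if (j < cur.keys.length && cur.keys[j] == key) {
                return visited;                    // hit
            }
            if (cur.kids.length == 0) {
                return -1;                         // reached a leaf: no hit
            }
            cur = cur.kids[j];                     // descend through link j
        }
    }

    public static void main(String[] args) {
        N root = figureTree();
        for (int key : new int[] {64, 40, 65}) {
            System.out.println(key + " -> " + search(root, key));
        }
    }
}
```

Both 64 and 40 are found in a leaf after visiting three nodes (one per level); 65 ends in the leaf [62 64 66] without a match.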

14

So, how do we Insert into this Structure?

• Can be quite easy; sometimes very complex.
– Can do a top-down or a bottom-up approach…

• Easy approach:
– Start by searching to find a spot for the data item.

• We like to insert at the leaf level, but we will take the top-down approach to get there. So:
– Inserting may very likely involve moving data items around to maintain the sequential nature of the data in a leaf.

15

Node Split – a bit more difficult (1 of 2)

• Using a top-down 2-3-4 tree.

• If we encounter a full node while looking for the insertion point:
– We must split the full node.

• You will see that this approach keeps the tree balanced.

16

Node Split – Insertion: more difficult – 2 of 2

Upon encountering a full node (while searching for a place to insert):

1. Split that node at that time.
2. Move the highest data item from the current (full) node into a new node to the right.
3. Move the middle value of the node undergoing the split up to the parent node. (We know we can do this because the parent node was not full.)
4. Retain the lowest item in the node.
5. The new node (to the right) has only one data item (the highest value).
6. The original (formerly full) node contains only the lowest of the three values.
7. The rightmost children of the original full node are disconnected and connected to the new node as appropriate. (They must be disconnected, since their parent data has changed.) The new connections conform to the linkage conventions, as expected.
8. Insert the new data item into the appropriate leaf node.

Note: there can be multiple splits encountered en route to finding the insertion point.

17

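Insert: here, the node to split is NOT the root. Let’s say we want to add a 99 (from book)…

Figure: before the split

Root: [62]
Right child of root: [83 92 104] (full: split this node)
Children of [83 92 104]: [74] [87 89] [97] [112]
(… other stuff under the root’s left link)

99 to be inserted…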

18

Case 1 Insert: split is NOT the root node (adding 99, from book)

Figure: after the split and the insertion

Root: [62 92]
Children of root: (… other stuff) [83] [104]
Children of [83]: [74] [87 89]
Children of [104]: [97 99] [112]

1. 104 starts a new node.
2. 92 moves up to the parent node. (We know it was not full.)
3. 83 stays put.
4. The two rightmost children of the split node are reconnected to the new node.
5. The new data item is moved in.
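The 99 example can be replayed with a small list-based node. This is a sketch, not the lecture's `split()`: the class `N`, the method `splitChild`, and the placeholder leaf standing in for the "other stuff" subtree (its value 50 is made up) are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the lecture's Node class): a node as two lists.
public class SplitDemo {
    static class N {
        List<Integer> keys = new ArrayList<>();
        List<N> kids = new ArrayList<>();
        N(Integer... ks) { keys.addAll(Arrays.asList(ks)); }
        public String toString() { return keys.toString(); }
    }

    // Splits the full child at parent.kids.get(i); the parent must not be full.
    static void splitChild(N parent, int i) {
        N full = parent.kids.get(i);
        N right = new N(full.keys.remove(2));    // highest item starts the new node
        int middle = full.keys.remove(1);        // middle item will go up
        if (!full.kids.isEmpty()) {              // move the two rightmost children
            right.kids.add(full.kids.remove(2));
            right.kids.add(full.kids.remove(2));
        }
        parent.keys.add(i, middle);              // keeps the parent's keys ascending
        parent.kids.add(i + 1, right);           // new node sits just right of 'full'
    }

    // Replays the example: split [83 92 104] under [62], then add 99.
    static N demo() {
        N full = new N(83, 92, 104);
        full.kids.addAll(Arrays.asList(new N(74), new N(87, 89), new N(97), new N(112)));
        N root = new N(62);
        root.kids.add(new N(50));                // placeholder for "other stuff"; made up
        root.kids.add(full);

        splitChild(root, 1);                     // split on the way down
        N newRight = root.kids.get(2);           // the node that now holds 104
        newRight.kids.get(0).keys.add(99);       // 99 goes after 97 in that leaf
        return root;
    }

    public static void main(String[] args) {
        N root = demo();
        System.out.println(root + " " + root.kids.get(1) + " " + root.kids.get(2));
        System.out.println(root.kids.get(1).kids + " " + root.kids.get(2).kids);
    }
}
```

Inserting the promoted key at index `i` and the new node at index `i + 1` is what keeps the parent's items and links lined up with the L = D + 1 convention.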

19

If Root itself is full: Split the Root

• Here, the procedure is the same.

• Root is full. Create a sibling:
– the highest-value data item is moved into the new sibling;
– the first (smallest value) remains in the node;
– the middle value moves up and becomes the data value in the new root.

• Here, two nodes are created:
– a new sibling and a new root.
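A root split is the only operation that adds a level to the tree, and it adds that level at the top, so the leaves stay on one level. The sketch below illustrates this with a list-based node; `N`, `splitRoot`, `height`, and the values 30, 50, 70 are made up for illustration and are not the lecture's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the lecture's Node class).
public class RootSplitDemo {
    static class N {
        List<Integer> keys = new ArrayList<>();
        List<N> kids = new ArrayList<>();
        N(Integer... ks) { keys.addAll(Arrays.asList(ks)); }
        public String toString() { return keys.toString(); }
    }

    // Splits a full root (three items) and returns the new root.
    static N splitRoot(N oldRoot) {
        N sibling = new N(oldRoot.keys.remove(2));   // highest value to new sibling
        N newRoot = new N(oldRoot.keys.remove(1));   // middle value to new root
        if (!oldRoot.kids.isEmpty()) {               // sibling takes two rightmost children
            sibling.kids.add(oldRoot.kids.remove(2));
            sibling.kids.add(oldRoot.kids.remove(2));
        }
        newRoot.kids.add(oldRoot);                   // smallest value stays in old node
        newRoot.kids.add(sibling);
        return newRoot;
    }

    // Number of levels, measured down the leftmost path.
    static int height(N n) {
        return n.kids.isEmpty() ? 1 : 1 + height(n.kids.get(0));
    }

    public static void main(String[] args) {
        N root = new N(30, 50, 70);                  // made-up full root that is also a leaf
        System.out.println(height(root) + " " + root);
        root = splitRoot(root);
        System.out.println(height(root) + " " + root + " " + root.kids);
    }
}
```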

20

Splitting on the Way Down

• Note: once we hit a node that must be split (on the way down), we know that the parent node, which receives the data value we move ‘up’, was not full.
– It may be full ‘now,’ but it wasn’t on the way down.

• Algorithm is reasonably straightforward.

• Do practice the splits on Figure 10.7.
– Will see later on next exam.

• I strongly recommend working the node splits on page 381. Ensure you understand how they work.

• Just remember:
– 1. You are splitting a 4-node. The node being split has three data values. Data on the right goes to a new node; data on the left remains; data in the middle is promoted upward; the new data item is inserted appropriately.
– 2. We do a node split any time we encounter a full node while trying to insert a new data value.

21

Objects of this Class Represent Data Items Actually Stored.

// tree234.java
import java.io.*;
////////////////////////////////////////////////////////////////
class DataItem {
    public int dData;                        // one data item
    //--------------------------------------------------------------
    public DataItem(int dd) {                // constructor
        dData = dd;
    } // end constructor
    //--------------------------------------------------------------
    public void displayItem() {              // display item, format "/27"
        System.out.print("/" + dData);
    } // end displayItem()
    //--------------------------------------------------------------
} // end class DataItem

This is merely a data item stored at the nodes. In practice, this might be an entire record or object. Here we show only the key, which stands in for the entire object.

22

The following code is for the processing that goes on inside the nodes themselves, not the entire tree.
Not trivial code. Try to understand it totally.
We will dissect it carefully and deliberately!

23

class Node {
    private static final int ORDER = 4;
    private int numItems;
    private Node parent;
    private Node childArray[] = new Node[ORDER];
    private DataItem itemArray[] = new DataItem[ORDER-1];
    // -------------------------------------------------------------
    public void connectChild(int childNum, Node child) {
        // connect child to this node
        childArray[childNum] = child;
        if (child != null)
            child.parent = this;
    }
    // -------------------------------------------------------------
    public Node disconnectChild(int childNum) {
        // disconnect child from this node, return it
        Node tempNode = childArray[childNum];
        childArray[childNum] = null;
        return tempNode;
    }
    // -------------------------------------------------------------
    public Node getChild(int childNum) {
        return childArray[childNum];
    }
    // -------------------------------------------------------------
    public Node getParent() {
        return parent;
    }
    // -------------------------------------------------------------
    public boolean isLeaf() {
        return childArray[0] == null;        // no first child means no children
    }
    // -------------------------------------------------------------
    public int getNumItems() {
        return numItems;
    }

This is what a Node looks like.
Note: two arrays: a child array and an item array.
Note their sizes: child array = 4; item array = 3.
The child array has size 4: the maximum number of child links.
The second array, itemArray, has size 3: the maximum number of data items in a node.
numItems is the number of items currently in itemArray.
parent will be used as a reference when inserting.

Major work done by findItem(), insertItem() and removeItem() (next slides) for a given node.

These are complex routines and NOT to be confused with find() and insert() for the Tree234 class itself. These are find() and insert() within THIS node.

Recall: references are automatically initialized to null and numbers to 0 when their object is created. So, Node doesn’t need a Constructor.

One of three slides of code for class Node

24

    public DataItem getItem(int index) {     // get DataItem at index
        return itemArray[index];
    }
    // -------------------------------------------------------------
    public boolean isFull() {
        return numItems == ORDER-1;
    }
    // -------------------------------------------------------------
    public int findItem(int key) {           // return index of item (within node)
        for (int j = 0; j < ORDER-1; j++) {  // if found, otherwise return -1
            if (itemArray[j] == null)
                break;
            else if (itemArray[j].dData == key)
                return j;
        } // end for
        return -1;
    } // end findItem
    // -------------------------------------------------------------
    public DataItem removeItem() {           // removes largest item
        // assumes node not empty
        DataItem temp = itemArray[numItems-1];   // use index, save item
        itemArray[numItems-1] = null;            // disconnect it
        numItems--;                              // one less item
        return temp;                             // return item
    }
    // -------------------------------------------------------------
    public void displayNode() {              // format "/24/56/74/"
        for (int j = 0; j < numItems; j++)
            itemArray[j].displayItem();      // "/56"
        System.out.println("/");             // final "/"
    }
} // end class Node

Class Node (continued)

Find routine: looks for the data within the node where we are located.

Delete routine: saves the deleted item, sets the location contents to null, decrements the number of items at the node, and returns the deleted data item.

25

    // -------------------------------------------------------------
    public int insertItem(DataItem newItem) {
        // assumes node is not full
        numItems++;                          // will add new item
        int newKey = newItem.dData;          // key (int value) of new item

        for (int j = ORDER-2; j >= 0; j--) { // start on right to examine data,
                                             // looking for spot to insert
            if (itemArray[j] == null)        // if item null, go left one cell
                continue;                    // Recall: what does 'continue' do?
            else {                           // if not null, get its key
                int itsKey = itemArray[j].dData;     // not necessary, but ...
                if (newKey < itsKey)                 // if existing key is bigger,
                    itemArray[j+1] = itemArray[j];   // shift it right
                else {                       // otherwise, insert new item
                    itemArray[j+1] = newItem;        // copies over moved item
                    return j+1;                      // and return its index
                }
            } // end else (not null)
        } // end for
        // shifted all items,
        itemArray[0] = newItem;              // so insert new item at the front
        return 0;
    } // end insertItem()

Class Node (continued)

Insert routine: increments the number of items in the node, gets the key of the new item, then loops.

Go through the code and my comments. Start looking for the place to insert the data item: start on the right and proceed left, looking for the proper place.
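To watch the right-to-left shifting in isolation, the loop can be run on a bare array. This is a simplified stand-in for `insertItem()`: it stores `Integer` keys instead of `DataItem` objects and takes the array as a parameter, both choices made for this sketch only.

```java
import java.util.Arrays;

// Standalone trace of the shifting logic (a simplification, not the lecture's Node class).
public class InsertItemTrace {
    static final int ORDER = 4;

    // items: ascending keys with null cells on the right, length ORDER-1.
    // Inserts newKey in order and returns the index where it landed.
    // Assumes the array is not full.
    static int insertItem(Integer[] items, int newKey) {
        for (int j = ORDER - 2; j >= 0; j--) {   // start on the right
            if (items[j] == null) {
                continue;                        // empty cell: go left one cell
            }
            if (newKey < items[j]) {
                items[j + 1] = items[j];         // shift the bigger item right
            } else {
                items[j + 1] = newKey;           // found the spot
                return j + 1;
            }
        }
        items[0] = newKey;                       // every item shifted, or node was empty
        return 0;
    }

    public static void main(String[] args) {
        Integer[] a = {40, 60, null};
        int where = insertItem(a, 50);           // 60 shifts right, 50 takes its cell
        System.out.println(where + " " + Arrays.toString(a));

        Integer[] b = {40, null, null};
        where = insertItem(b, 30);               // 40 shifts right, 30 goes to cell 0
        System.out.println(where + " " + Arrays.toString(b));
    }
}
```

Because the scan starts at the right end, each larger item is moved exactly one cell before the new key is dropped into the gap.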

26

Code for the 2-3-4 Tree itself

27

class Tree234App {
    public static void main(String[] args) throws IOException {
        int value;                           // int, to match insert(int) and find(int)
        Tree234 theTree = new Tree234();
        theTree.insert(50);
        theTree.insert(40);
        theTree.insert(60);
        theTree.insert(30);
        theTree.insert(70);

        while (true) {
            System.out.print("Enter first letter of ");
            System.out.print("show, insert, or find: ");
            char choice = getChar();
            switch (choice) {
                case 's':
                    theTree.displayTree();
                    break;
                case 'i':
                    System.out.print("Enter value to insert: ");
                    value = getInt();
                    theTree.insert(value);
                    break;
                case 'f':
                    System.out.print("Enter value to find: ");
                    value = getInt();
                    int found = theTree.find(value);
                    if (found != -1)
                        System.out.println("Found " + value);
                    else
                        System.out.println("Could not find " + value);
                    break;
                default:
                    System.out.print("Invalid entry\n");
            } // end switch
        } // end while
    } // end main()

This code is merely the interface for a client. Pretty easy to follow.

All the complex processing is undertaken at the tree level and at the node level.

Be certain to recognize this.

Note the ‘break’ in the case statements…

28

    //--------------------------------------------------------------
    public static String getString() throws IOException {
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(isr);
        String s = br.readLine();
        return s;
    }
    //--------------------------------------------------------------
    public static char getChar() throws IOException {
        String s = getString();
        return s.charAt(0);
    }
    //--------------------------------------------------------------
    public static int getInt() throws IOException {
        String s = getString();
        return Integer.parseInt(s);
    }
    //--------------------------------------------------------------
} // end class Tree234App

Class Tree234App(continued)

29

class Tree234 {
    private Node root = new Node();          // make root node

    public int find(int key) {
        Node curNode = root;
        int childNumber;
        while (true) {
            if ((childNumber = curNode.findItem(key)) != -1)
                return childNumber;          // found; recall findItem returns index value
            else if (curNode.isLeaf())
                return -1;                   // can't find it
            else                             // search deeper
                curNode = getNextChild(curNode, key);
        } // end while
    } // end find()

    public void insert(int dValue) {         // insert a DataItem
        Node curNode = root;
        DataItem tempItem = new DataItem(dValue);
        while (true) {
            if (curNode.isFull()) {
                split(curNode);                          // call to split node
                curNode = curNode.getParent();           // back up
                curNode = getNextChild(curNode, dValue); // search once
            } // end if(node is full)
            else if (curNode.isLeaf())       // if node is leaf, insert data
                break;
            else                             // node is not full, not a leaf,
                curNode = getNextChild(curNode, dValue); // so go to lower level
        } // end while
        curNode.insertItem(tempItem);        // insert new DataItem
    } // end insert()

The object of type Tree234 IS the entire tree.
Note: the tree has only one attribute, its root. This is all it needs.

Finding, inserting, and splitting algorithms are all shown here.

Call split routine

See creation of additional two nodes

30

    public void split(Node thisNode) {       // split the node
        // assumes node is full; when you get here, you know you need to split
        DataItem itemB, itemC;
        Node parent, child2, child3;
        int itemIndex;

        itemC = thisNode.removeItem();           // remove items from this node
        itemB = thisNode.removeItem();           // note: these are the third and second items
        child2 = thisNode.disconnectChild(2);    // remove the two rightmost
        child3 = thisNode.disconnectChild(3);    // children from this node
        Node newRight = new Node();              // make new node

        if (thisNode == root) {              // if the node we're looking at is the root,
            root = new Node();                   // make new root
            parent = root;                       // root is our parent
            root.connectChild(0, thisNode);      // connect to parent
        }
        else                                 // this node to be split is not the root
            parent = thisNode.getParent();       // get parent

        // deal with parent
        itemIndex = parent.insertItem(itemB);    // item B to parent
        int n = parent.getNumItems();            // total items?

        for (int j = n-1; j > itemIndex; j--) {      // move parent's connections
            Node temp = parent.disconnectChild(j);   // one child to the right
            parent.connectChild(j+1, temp);
        }
        parent.connectChild(itemIndex+1, newRight);  // connect newRight to parent

        // deal with newRight
        newRight.insertItem(itemC);              // item C to newRight
        newRight.connectChild(0, child2);        // connect to 0 and 1
        newRight.connectChild(1, child3);        // on newRight
    } // end split()

Class Tree234(continued)

Be careful in here. Remember, the middle value is moved to the parent and must be disconnected (and made null).

Rightmost element is moved into new node as leftmost data item.

Review the process and then note the code that implements the process.

31

    // gets appropriate child of node during search for value
    public Node getNextChild(Node theNode, int theValue) {
        int j;
        // assumes node is not empty, not full, not a leaf
        int numItems = theNode.getNumItems();
        for (j = 0; j < numItems; j++)       // for each item in node
            if (theValue < theNode.getItem(j).dData)    // are we less?
                return theNode.getChild(j);  // return left child
        // end for
        return theNode.getChild(j);          // we're greater, so return right child
    } // end getNextChild()
    // -------------------------------------------------------------
    public void displayTree() {
        recDisplayTree(root, 0, 0);
    }
    // -------------------------------------------------------------
    private void recDisplayTree(Node thisNode, int level, int childNumber) {
        System.out.print("level=" + level + " child=" + childNumber + " ");
        thisNode.displayNode();              // display this node
        // call ourselves for each child of this node
        int numItems = thisNode.getNumItems();
        for (int j = 0; j < numItems+1; j++) {
            Node nextNode = thisNode.getChild(j);
            if (nextNode != null)
                recDisplayTree(nextNode, level+1, j);
            else
                return;
        }
    } // end recDisplayTree()
    // -------------------------------------------------------------
} // end class Tree234

Class Tree234(continued)

32

Efficiency Considerations

33

Efficiency Considerations for 2-3-4 Trees

• Searching:
• 2-3-4 Trees: one node must be visited at each level, but
– there is more data per node / level;
– searches are fast.

• Recognize that all data items at a node must be checked in a 2-3-4 tree,
• but this is very fast and is done sequentially.

• All nodes in the 2-3-4 tree are NOT always full.

Overall, for 2-3-4 trees, the increased number of items per node (which increases the processing / search time within each node) tends to cancel out the speed gained from the decreased height of the tree.
– An increased number of data items per node implies fewer node retrievals.

So, the search times for a 2-3-4 tree and for a balanced binary tree are approximately equal, and both are O(log n).
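The O(log n) claim follows from two bounds on the number of levels. If every node is a 2-node, h levels hold 2^h - 1 items; if every node is a full 4-node, h levels hold 4^h - 1 items. So a tree with n items has between log4(n + 1) and log2(n + 1) levels. The sketch below computes both bounds with integer arithmetic; `HeightBounds`, `minLevels`, and `maxLevels` are names invented for this illustration.

```java
// Hypothetical helper: bounds on the number of levels of a 2-3-4 tree with n items.
public class HeightBounds {
    // Fewest levels: every node is a full 4-node, so h levels hold 4^h - 1 items.
    static int minLevels(long n) {
        int h = 0;
        long capacity = 0;                   // items that fit in h levels of 4-nodes
        while (capacity < n) {
            capacity = capacity * 4 + 3;     // 4^(h+1) - 1 = 4 * (4^h - 1) + 3
            h++;
        }
        return h;
    }

    // Most levels: every node is a 2-node, so h levels need 2^h - 1 items.
    static int maxLevels(long n) {
        int h = 0;
        long needed = 0;                     // items needed for h levels of 2-nodes
        while (needed * 2 + 1 <= n) {
            needed = needed * 2 + 1;         // 2^(h+1) - 1 = 2 * (2^h - 1) + 1
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        // 15 is the number of data items in the example tree, which has 3 levels.
        System.out.println(minLevels(15) + " to " + maxLevels(15));
        System.out.println(minLevels(1_000_000) + " to " + maxLevels(1_000_000));
    }
}
```

The two bounds differ only by a factor of two, which is why the height, and so the number of node visits per search, is logarithmic in n either way.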

34

Efficiency Considerations for 2-3-4 Trees

• Storage

• 2-3-4 Trees: a node can have three data items and up to four references.

• The references can be stored as an array or as four specific variables.

• If not all of this space is used, there can be considerable waste. In 2-3-4 trees, it is quite common to see many nodes that are not full.