Data Structures: Trees and Grammars
Readings: Sections 6.1, 7.1-7.4 (more from Ch. 6 later)



Goals for this Unit

• Continue focus on data structures and algorithms
• Understand concepts of reference-based data structures (e.g. linked lists, binary trees)
  – Some implementation for binary trees
• Understand usefulness of trees and hierarchies as data models
  – Recursion used to define data organization


Taxonomy of Data Structures

• From the text:
  – Data type: collection of values and operations
    • Compare to Abstract Data Type!
  – Simple data types vs. composite data types
• Book: Data structures are composite data types
  – Definition: a collection of elements that are some combination of primitive and other composite data types


Book’s Classification of Data Structures

• Four groupings:
  – Linear Data Structures
  – Hierarchical
  – Graph
  – Sets and Tables
• When defining these, note an element has:
  – one or more information fields
  – relationships with other elements


Note on Our Book and Our Course

• Our book’s strategy
  – In Ch. 6, discuss principles of Lists
    • Give an interface, then implement from scratch
  – In Ch. 7, discuss principles of Trees
  – Later, in Ch. 9, see what Java gives us
• Our course’s strategy
  – We did Ch. 9 first. Saw List interfaces and operations
  – Then, Ch. 8 on maps and sets
  – Now, trees with some implementation too


Trees Represent…

• Concept of a tree very common and important.
• Tree terminology:
  – Nodes have one parent
  – A node’s children
    • Leaf nodes: no children
    • Root node: top or start; no parent
• Data structures that store trees
• Execution or processing that can be expressed as a tree
  – E.g. method calls as a program runs
  – Searching a maze or puzzle


Trees are Important

• Trees are important for cognition and computation
  – computer science
  – language processing (human or computer)
    • parse trees
  – knowledge representation (or modeling of the “real world”)
    • E.g. family trees; the Linnaean taxonomy (kingdom, phylum, …, species); etc.


Another Tree Example: File System

• What about file links (Unix) or shortcuts (Windows)?

[Figure: directory tree rooted at C:\; surviving labels are the folders CS216, CS120, MyMail, school, pers, lab1, lab2, lab3 and the files list.h, list.cpp, calc.cpp]


Another Tree Example: XML and HTML documents

<HTML>
<HEAD>…</HEAD>
<BODY>
<H1>My Page</H1>
<P> Blah
<PRE>blah
blah</PRE>
End</P>
</BODY>
</HTML>

How is this a tree? What are the leaves?


Tree Data Structures

• Why this now?
  – Very useful in coding
    • TreeMap in Java Collections Framework
  – Example of recursive data structures
  – Methods are recursive algorithms


Tree Definitions and Terms

• First, general trees vs. binary trees
  – Each node in a binary tree has at most two children
• General tree definition:
  – Set of nodes T (possibly empty?) with a distinguished node, the root
  – All other nodes form a set of disjoint subtrees Ti
    • each a tree in its own right
    • each connected to the root with an edge
• Note the recursive definition
  – Each node is the root of a subtree


Picture of Tree Definition

• And all subtrees are recursively defined as:
  – a node with…
  – subtrees attached to it

[Figure: root node r with subtrees T1, T2 and T3 attached to it]


Tree Terminology

• A node’s parent
• A node’s children
  – Binary tree: left child and right child
  – Sibling nodes
  – Descendants, ancestors
• A node’s degree (how many children)
• Leaf nodes or terminal nodes
• Internal or non-terminal nodes


Recursive Data Structure

• Recursive Data Structure: a data structure that contains a pointer or reference to an instance of itself:
    public class TreeNode<T> {
        T nodeItem;
        TreeNode<T> left, right;
        TreeNode<T> parent;
        …
    }
• Recursion is a natural way to express many algorithms.
• For recursive data-structures, recursive algorithms are a natural choice


General Trees

• Representing general trees is a bit harder
  – Each node has a list of child nodes
• Turns out that:
  – Binary trees are simpler and still quite useful
• From now on, let’s focus on binary-trees only
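
To make the "list of child nodes" idea concrete, here is a minimal Java sketch of a general-tree node. The class name GeneralTreeNode and its methods are hypothetical and are not taken from the textbook.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical general-tree node: any number of children, kept in a list.
class GeneralTreeNode<T> {
    T item;
    GeneralTreeNode<T> parent;                          // null for the root
    final List<GeneralTreeNode<T>> children = new ArrayList<>();

    GeneralTreeNode(T item) { this.item = item; }

    // Create a child node holding childItem and attach it to this node.
    GeneralTreeNode<T> addChild(T childItem) {
        GeneralTreeNode<T> child = new GeneralTreeNode<>(childItem);
        child.parent = this;
        children.add(child);
        return child;
    }

    // Degree = number of children.
    int degree() { return children.size(); }

    boolean isLeaf() { return children.isEmpty(); }

    // Recursive size: this node plus the sizes of all of its subtrees.
    int size() {
        int count = 1;
        for (GeneralTreeNode<T> c : children) count += c.size();
        return count;
    }
}
```

A binary tree replaces the list with exactly two references (left and right), which is part of why it is simpler to implement.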


ADT Tree

• Remember definition of an ADT?
  – Model of information: we just covered that
  – Operations? See pages 405-406 in textbook
• Many are similar to ADT List or any data structure
  – The “CRUD” operations: create, read, update, delete
• Important about this list of operations
  – some are in terms of one specified node, e.g. hasParent()
  – others are “tree-wide”, e.g. size(), traversal


Classes for Binary Trees (pp. 416-431)

• class LinkedBinaryTree (p. 425, bottom)
  – reference to root BinaryTreeNode
  – methods: tree-level operations
• class BinaryTreeNode (p. 416)
  – data: an object (of some type)
  – left: references root of left-subtree (or null)
  – right: references root of right-subtree (or null)
  – parent: references this node’s parent node
    • Could this be null? When should it be?
  – methods: node-level operations


Two-class Strategy for Recursive Data Structures

• Common design: use two classes for a Tree or List
• “Top” class
  – has reference to “first” node
  – other things that apply to the whole data-structure object (e.g. the tree-object)
    • both methods and fields
• Node class
  – Recursive definitions are here as references to other node objects
  – Also data (of course)
  – Methods defined in this class are recursive


Binary Tree and Node Class

• LinkedBinaryTree class has:
  – reference to root node
  – reference to a current node, a cursor
  – non-recursive methods like:
      boolean find(tgt) // see if tgt is in the whole tree
• Node class has:
  – data, references to left and right subtrees
  – recursive versions of methods like find:
      boolean find(tgt) // is tgt here or in my subtrees?
• Note: BinaryTree.find() just calls Node.find() on the root node!
  – Other methods work this way too
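
The delegation described above can be sketched in Java as follows. This is an illustration only: BtNode and SimpleBinaryTree are hypothetical names, not the book's LinkedBinaryTree and BinaryTreeNode, and the cursor is left out to keep the sketch short.

```java
// Node class: the recursive structure and the recursive find live here.
class BtNode<T> {
    T data;
    BtNode<T> left, right;   // roots of the subtrees, or null

    BtNode(T data) { this.data = data; }

    // Is tgt here or in one of my subtrees? (plain binary tree, so both sides are searched)
    boolean find(T tgt) {
        if (data.equals(tgt)) return true;
        if (left != null && left.find(tgt)) return true;
        return right != null && right.find(tgt);
    }
}

// "Top" class: holds the root and exposes the tree-level operation.
class SimpleBinaryTree<T> {
    BtNode<T> root;          // null for an empty tree

    // Non-recursive: hand the work to the root node.
    boolean find(T tgt) {
        return root != null && root.find(tgt);
    }
}
```

The top class also handles the empty tree, a case the node class never has to consider.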


Why Does This Matter Now?

• This illustrates (again) important design ideas
• The tree itself is what we’re interested in
  – There are tree-level operations on it (“ADT level” operations)
• The implementation is a recursive data structure
  – There are recursive methods inside the lower-level classes that are closely related (same name!) to the ADT-level operation
• Principles? abstraction (hiding details), delegation (helper classes, methods)


ADT Tree Operations: “Navigation”

• Positioning:
  – toRoot(), toParent(), toLeftChild(), toRightChild(), find(Object o)
• Checking:
  – hasParent(), hasLeftChild(), etc.
  – equals(Object tree2)
    • Book calls this a “deep compare”
    • Do two distinct objects have the same structure and contents?
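
A deep compare can be written recursively. The sketch below is one possible version, not the book's equals(); CmpNode, DeepCompare and sameTree are hypothetical names.

```java
import java.util.Objects;

// Hypothetical minimal node for the comparison sketch.
class CmpNode<T> {
    T data;
    CmpNode<T> left, right;

    CmpNode(T data, CmpNode<T> left, CmpNode<T> right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}

class DeepCompare {
    // True when both trees have the same shape and equal data at every position.
    static <T> boolean sameTree(CmpNode<T> a, CmpNode<T> b) {
        if (a == null || b == null) return a == b;   // both empty, or only one is
        return Objects.equals(a.data, b.data)
            && sameTree(a.left, b.left)
            && sameTree(a.right, b.right);
    }
}
```

Two separately built trees with the same shape and contents compare as equal here even though they are distinct objects, which is the difference from comparing references with ==.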


ADT Tree Operations: Mutators

• Mutators:
  – insertRight(Object o), insertLeft(Object o)
    • create a new node containing new data
    • make this new node be the child of the current node
    • Important: We use these to build trees!
  – prune()
    • delete the subtree rooted by the current node
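
As a rough sketch of how cursor-based mutators might behave, the Java below builds and prunes a small tree. CursorNode and CursorTree are hypothetical names, and two behaviors are assumptions rather than the book's specification: inserting replaces any existing child on that side, and prune() moves the cursor to the parent.

```java
// Hypothetical node with a parent reference.
class CursorNode<T> {
    T data;
    CursorNode<T> left, right, parent;

    CursorNode(T data, CursorNode<T> parent) {
        this.data = data;
        this.parent = parent;
    }
}

class CursorTree<T> {
    CursorNode<T> root;
    CursorNode<T> current;   // the cursor

    CursorTree(T rootItem) {
        root = new CursorNode<>(rootItem, null);
        current = root;
    }

    // Create a node holding item and make it a child of the current node.
    // Assumption: an existing subtree on that side is simply replaced.
    void insertLeft(T item)  { current.left  = new CursorNode<>(item, current); }
    void insertRight(T item) { current.right = new CursorNode<>(item, current); }

    // Move the cursor; return false and leave it unchanged if there is no such child.
    boolean toLeftChild() {
        if (current.left == null) return false;
        current = current.left;
        return true;
    }

    boolean toRightChild() {
        if (current.right == null) return false;
        current = current.right;
        return true;
    }

    // Delete the subtree rooted at the current node.
    // Assumption: the cursor then moves to the parent; pruning the root empties the tree.
    void prune() {
        CursorNode<T> p = current.parent;
        if (p == null) { root = null; current = null; return; }
        if (p.left == current) p.left = null; else p.right = null;
        current = p;
    }
}
```

Once the root has been pruned, current is null, so a fuller implementation would need to guard the other methods against an empty tree.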


Next: Implementation

• Next (in the book)
  – How to implement Java classes for binary trees
  – Class for node, another class for BinTree
  – Interface for both, then two implementations (array and reference)
• But for us:
  – We’ll skip some of this, particularly the array version
  – We’ll only look at the reference-based implementation
  – After that: concept of a binary search tree


Understanding Implementations

• Let’s review some of the methods on pp. 416-431
  – (Done in class, looking at code in book.)
• Some topics discussed:
  – Node class. Parent reference or not?
  – Are two trees equal?
  – Traversal strategies: Section 7.3.2 in book
  – visit() method and callback (also 7.3.2)
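
The three standard traversal orders, each taking a visit callback, might look like the sketch below. It uses java.util.function.Consumer as the callback type, which is an assumption; the book's visit() interface may be defined differently, and TravNode is a hypothetical name.

```java
import java.util.function.Consumer;

// Hypothetical node class with recursive traversals that call back on each item.
class TravNode<T> {
    T data;
    TravNode<T> left, right;

    TravNode(T data, TravNode<T> left, TravNode<T> right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Pre-order: node, then left subtree, then right subtree.
    void preorder(Consumer<T> visit) {
        visit.accept(data);
        if (left != null) left.preorder(visit);
        if (right != null) right.preorder(visit);
    }

    // In-order: left subtree, then node, then right subtree.
    void inorder(Consumer<T> visit) {
        if (left != null) left.inorder(visit);
        visit.accept(data);
        if (right != null) right.inorder(visit);
    }

    // Post-order: left subtree, then right subtree, then node.
    void postorder(Consumer<T> visit) {
        if (left != null) left.postorder(visit);
        if (right != null) right.postorder(visit);
        visit.accept(data);
    }
}
```

The traversal code decides the order; the callback decides what "visiting" means (printing, counting, collecting into a list), so one traversal method serves many purposes.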


Binary Search Trees

• We often need collections that store items
  – Maybe a long series of inserts or deletions
• We want fast lookup, and often we want to access in sorted order
  – Lists: O(n) lookup
  – Could sort them for O(lg n) lookup
    • Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items
• Solution: search tree


Binary Search Trees

• Associated with each node is a key value that can be compared.
• Binary search tree property, which holds at every node (each node is the root of its own subtree):
  – every node in the left subtree has a key less than the root’s key, and
  – every node in the right subtree has a key greater than the root’s key.


Example

[Figure: example of a BINARY SEARCH TREE]


Counterexample

[Figure: example of a tree that is NOT A BINARY SEARCH TREE]


Find and Insert in BST

• Find: look for where it should be
• If not there, that’s where you insert


Recursion and Tree Operations

• Recursive code for tree operations is simple, natural, elegant

• Example: pseudo-code for Node.find()
    boolean find(Comparable tgt) {
        Node next = null;
        if (this.data matches tgt)
            return true
        else if (tgt’s data < this.data)
            next = this.leftChild
        else // tgt’s data > this.data
            next = this.rightChild
        // next points to left or right subtree
        if (next == null)
            return false // no subtree
        else
            return next.find(tgt) // search on
    }
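
A runnable Java version of the find pseudo-code, together with the matching insert ("look for where it should be; if not there, that's where you insert"), could look like this. BstNode is a hypothetical class name, and rejecting duplicate keys is an assumption of this sketch.

```java
// Hypothetical BST node with recursive find and insert.
class BstNode<K extends Comparable<K>> {
    K key;
    BstNode<K> left, right;

    BstNode(K key) { this.key = key; }

    // Is tgt here or in the one subtree where it could be?
    boolean find(K tgt) {
        int cmp = tgt.compareTo(key);
        if (cmp == 0) return true;                       // this.key matches tgt
        BstNode<K> next = (cmp < 0) ? left : right;      // the only subtree worth searching
        return next != null && next.find(tgt);           // no subtree means not found
    }

    // Walk down exactly as find does; attach a new leaf where the search falls off the tree.
    // Returns false if newKey is already present (assumption: no duplicates).
    boolean insert(K newKey) {
        int cmp = newKey.compareTo(key);
        if (cmp == 0) return false;
        if (cmp < 0) {
            if (left == null) { left = new BstNode<>(newKey); return true; }
            return left.insert(newKey);
        } else {
            if (right == null) { right = new BstNode<>(newKey); return true; }
            return right.insert(newKey);
        }
    }
}
```

Unlike find on a plain binary tree, each call here follows only one subtree, so the work is proportional to the height of the tree rather than its size.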


Order in BSTs

• How could we traverse a BST so that the nodes are visited in sorted order?
  – Does one of our traversal strategies work?
• A very useful property about BSTs
• Consider Java’s TreeSet and TreeMap
  – A search tree (not a plain BST, but one of its better “cousins”)
    • In CS2150: AVL trees, Red-Black trees
  – Guarantee: search times are O(lg n)
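
An in-order traversal visits a BST's keys in ascending order, and Java's TreeSet and TreeMap expose the same behavior through ordinary iteration. The short sketch below shows this with the standard library; SortedOrderDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

class SortedOrderDemo {
    // Insert keys in any order; iteration over a TreeSet comes back sorted.
    static List<Integer> sortedKeys(int... keys) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int k : keys) set.add(k);
        return new ArrayList<>(set);
    }

    // TreeMap keeps its entries ordered by key, so the smallest key is cheap to find.
    static String smallestName(TreeMap<String, Integer> ages) {
        return ages.firstKey();
    }
}
```

For example, sortedKeys(5, 3, 8, 1, 4, 7, 11) returns the list [1, 3, 4, 5, 7, 8, 11].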


Deleting from a BST

• Removing a node requires
  – Moving its left and right subtrees
  – Not hard if one not there
  – But if both there?
• Answer: not too tough, but wait for CS2150 to see!
• In CS2110, we’ll not worry about this


Next: Grammars, Trees, Recursion

• Languages are governed by a set of rules called a grammar
  – Is a statement legal?
  – Generate or derive a new legal statement
• Natural language grammars
  – Language processing by computers
• But grammars are used a lot in computing
  – Grammar for a programming language
  – Grammar for valid inputs, messages, data, etc.


Backus-Naur Form

• http://en.wikipedia.org/wiki/Backus-Naur_form
• BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data
• BNF specifies a grammar as a set of derivation rules of this form:
    <symbol> ::= <expression with symbols>
• Look at website and example there (also on next slide)
  – How are trees involved here? Is it recursive?


BNF for Postal Address

1. <postal-address> ::= <name-part> <street-address> <zip-part>
2. <personal-part> ::= <first-name> | <initial> "."
3. <name-part> ::= <personal-part> <last-name> [<jr-part>] <EOL> | <personal-part> <name-part>
4. <street-address> ::= [<apt>] <house-num> <street-name> <EOL>
5. <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

Example:
Ann Marie G. Jones
123 Main St.
Hooville, VA 22901

Where’s the recursion?


Grammars in Language

• Rule-based grammars describe
  – how legal statements can be produced
  – how to tell if a statement is legal
• Study textbook, pp. 389-391, to see rule-based grammar for simple Java-like arithmetic expressions
  – four rules for expressions, terms, factors, and letter
  – Study how a (possibly) legal statement is parsed to generate a parse tree


Computing Parse-Tree Example

• Expression: a * b + c


Grammar Terms and Concepts

• First, this is what’s called a context-free grammar
  – For CS2110, let’s not worry about what this means! (But in CS2102, you learn this.)
• A CFG has
  – a set of variables (AKA non-terminals)
  – a set of terminal symbols
  – a set of productions
  – a starting symbol


Previous Parse Tree

• Terminal symbols:
  – <operator> could be: + *
  – <letter> could be: a b c
• Production: <factor> ::= <letter> | <number>
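
To show how a grammar with recursive productions leads to a parse tree, here is a small recursive-descent sketch in Java. The grammar in the comments is an assumption made for this illustration (the textbook's four rules on pp. 389-391 may differ, and <number> is omitted); ParseNode and ExprParser are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Parse-tree node: non-terminals are internal nodes, terminal symbols are leaves.
class ParseNode {
    final String label;
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String label) { this.label = label; }

    ParseNode add(ParseNode child) { children.add(child); return this; }

    // Reading the leaves left to right gives back the parsed statement.
    String leaves() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder();
        for (ParseNode c : children) sb.append(c.leaves());
        return sb.toString();
    }
}

// Assumed grammar (one method per rule):
//   <expression> ::= <term> | <term> + <expression>
//   <term>       ::= <factor> | <factor> * <term>
//   <factor>     ::= <letter>
class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String text) { src = text.replace(" ", ""); }

    static ParseNode parse(String text) {
        ExprParser p = new ExprParser(text);
        ParseNode tree = p.expression();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        return tree;
    }

    private ParseNode expression() {
        ParseNode n = new ParseNode("<expression>").add(term());
        if (peek() == '+') { pos++; n.add(new ParseNode("+")).add(expression()); }
        return n;
    }

    private ParseNode term() {
        ParseNode n = new ParseNode("<term>").add(factor());
        if (peek() == '*') { pos++; n.add(new ParseNode("*")).add(term()); }
        return n;
    }

    private ParseNode factor() {
        char c = peek();
        if (!Character.isLetter(c))
            throw new IllegalArgumentException("letter expected at position " + pos);
        pos++;
        ParseNode letter = new ParseNode("<letter>").add(new ParseNode(String.valueOf(c)));
        return new ParseNode("<factor>").add(letter);
    }

    // Next unread character, or '\0' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
}
```

Calling ExprParser.parse("a * b + c") yields a tree whose root is <expression> with three children: a <term> covering a * b, the terminal +, and an <expression> covering c. An illegal statement such as "a +" is rejected with an exception, which is the "is this statement legal?" half of what a grammar provides.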


Natural Language Parse Tree

• Statement: The man bit the dog


How Can We Use Grammars?

• Parsing
  – Is a given statement a valid statement in the language? (Is the statement recognized by the grammar?)
  – Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.)
• Production
  – Generate a legal statement for this grammar
  – Demo: generate random statements!
    • See link on website next to slides


Demo’s Poem-grammar data file

{
<start>
The <object> <verb> tonight
}

{
<object>
waves
big yellow flowers
slugs
}

{
<verb>
sigh <adverb>
portend like <object>
die <adverb>
}

{
<adverb>
warily
grumpily
}

• Note: no recursive productions in this example!
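
Production from this grammar can be sketched as a recursive expansion: start at <start>, pick one alternative at random, and expand every non-terminal it contains. The Java below is an illustration, not the course demo's code; PoemGrammar is a hypothetical name and the rules are typed in by hand instead of being read from the data file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class PoemGrammar {
    private final Map<String, List<String>> rules = new HashMap<>();
    private final Random rng;

    PoemGrammar(Random rng) {
        this.rng = rng;
        // One entry per non-terminal; each list element is one alternative.
        rules.put("<start>", Arrays.asList("The <object> <verb> tonight"));
        rules.put("<object>", Arrays.asList("waves", "big yellow flowers", "slugs"));
        rules.put("<verb>", Arrays.asList("sigh <adverb>", "portend like <object>", "die <adverb>"));
        rules.put("<adverb>", Arrays.asList("warily", "grumpily"));
    }

    String generate() { return expand("<start>"); }

    // Terminals are returned as-is; non-terminals are replaced by a randomly chosen alternative.
    private String expand(String symbol) {
        List<String> options = rules.get(symbol);
        if (options == null) return symbol;              // terminal word
        String choice = options.get(rng.nextInt(options.size()));
        StringBuilder out = new StringBuilder();
        for (String token : choice.split(" ")) {
            if (out.length() > 0) out.append(' ');
            out.append(expand(token));
        }
        return out.toString();
    }
}
```

A call such as new PoemGrammar(new Random()).generate() can produce statements like "The slugs die grumpily tonight". The expansion method is recursive even though the grammar has no recursive productions, so every derivation is guaranteed to terminate; with a recursive production the same code would still work but could generate arbitrarily long statements.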