Welcome to CIS 068 !


Page 1: Welcome to CIS 068 !

CIS 068

Welcome to CIS 068 !

Lesson 10: Data Structures

Page 2: Welcome to CIS 068 !

CIS 068

Overview

Description, Usage and Java-Implementation of

Collections

Lists

Sets

Hashing

Page 3: Welcome to CIS 068 !

CIS 068

Definition

Data Structures

Definition (www.nist.gov):

"An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person."

Page 4: Welcome to CIS 068 !

CIS 068

Efficiency

“An organization of information …for better algorithm efficiency...”:

Isn’t the efficiency of an algorithm defined by the order of magnitude O( )?

Page 5: Welcome to CIS 068 !

CIS 068

Efficiency

Yes, but it is dependent on its implementation.

Page 6: Welcome to CIS 068 !

CIS 068

Introduction

• Data structures define the structure of a collection of data, i.e. values of primitive data types or objects

• The structure provides different ways to access the data

• Different tasks need different ways to access the data

• Different tasks need different data structures

Page 7: Welcome to CIS 068 !

CIS 068

Introduction

Typical properties of different structures:

• fixed length / variable length

• access by index / access by iteration

• duplicate elements allowed / not allowed

Page 8: Welcome to CIS 068 !

CIS 068

Examples

Tasks:

• Read 300 integers

• Read an unknown number of integers

• Read 5th element of sorted collection

• Read next element of sorted collection

• Merge element at 5th position into collection

• Check if object is in collection

Page 9: Welcome to CIS 068 !

CIS 068

Examples

Although you can invent any data structure you want, there are 'classic structures', providing:

• Coverage of most (classic) problems

• Analysis of efficiency

• Basic implementation in modern languages, like Java

Page 10: Welcome to CIS 068 !

CIS 068

Data Structures in Java

Let's see what Java has to offer:

Page 11: Welcome to CIS 068 !

CIS 068

The Collection Hierarchy

Collection: top interface, specifying requirements for all collections

Page 12: Welcome to CIS 068 !

CIS 068

Collection Interface

Page 13: Welcome to CIS 068 !

CIS 068

Collection Interface!

Page 14: Welcome to CIS 068 !

CIS 068

Iterator Interface

Purpose:

• Sequential access to collection elements

• Note: the technique used so far, accessing the elements one after another by index, is not reasonable in general (why ?) !

Methods:

Page 15: Welcome to CIS 068 !

CIS 068

Iterator Interface

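Iterator points 'between' the elements of the collection (example elements: 1 2 3 4 5):

• First position: hasNext() = true, remove() throws an error

• Current position (after 2 calls to next()): remove() deletes element 2

• next() returns the element it steps over (the returned element) and leaves the iterator at the position behind it

• Last position: hasNext() = false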

Page 16: Welcome to CIS 068 !

CIS 068

Iterator Interface Usage

Typical usage of iterator:
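The code from this slide is not preserved in the transcript. A typical loop might look like the following minimal sketch; the class name `IteratorUsageDemo`, the helper `removeEvens` and the sample values are illustrative assumptions, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

class IteratorUsageDemo {
    // Removes all even numbers using only Iterator methods
    // and returns how many elements were removed.
    static int removeEvens(Collection<Integer> c) {
        int removed = 0;
        Iterator<Integer> it = c.iterator();
        while (it.hasNext()) {        // is there another element?
            int value = it.next();    // step over it and return it
            if (value % 2 == 0) {
                it.remove();          // deletes the element just returned
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Collection<Integer> c = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        int removed = removeEvens(c);
        System.out.println(removed + " removed, remaining: " + c);
    }
}
```

The loop never asks for an index, so it works unchanged for any Collection, whether it is backed by an array or by linked nodes.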

Page 17: Welcome to CIS 068 !

CIS 068

Back to Collections

AbstractCollection

Page 18: Welcome to CIS 068 !

CIS 068

AbstractCollection

• Facilitates implementation of Collection interface

• Providing a skeletal implementation

• Implementation of a concrete class:

• Provide data structure (e.g. array)

• Provide access to data structure

Page 19: Welcome to CIS 068 !

CIS 068

AbstractCollection

• Concrete class must provide an implementation of Iterator

• To maintain the 'abstract character' of the data, the implemented (non-abstract) methods of AbstractCollection use only Iterator methods to access the data

AbstractCollection:

    add() { Iterator i = iterator(); ... }
    clear() { Iterator i = iterator(); ... }

myCollection (extends AbstractCollection, implements Iterator):

    int[] data;
    Iterator iterator() { return this; }
    hasNext() { ... }
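As a hedged illustration of this division of labour, the sketch below defines a hypothetical read-only class `IntArrayCollection` (the name and its contents are assumptions). In `java.util.AbstractCollection` only `iterator()` and `size()` are abstract; inherited methods such as `contains()` and `toString()` are built on them, while `add()` throws `UnsupportedOperationException` unless it is overridden.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only collection backed by an int array.
class IntArrayCollection extends AbstractCollection<Integer> {
    private final int[] data;

    IntArrayCollection(int... data) {
        this.data = data.clone();
    }

    @Override
    public int size() {
        return data.length;
    }

    @Override
    public Iterator<Integer> iterator() {
        // A fresh iterator per call (instead of "return this"),
        // so that several traversals can run independently.
        return new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < data.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return data[next++];
            }
        };
    }

    public static void main(String[] args) {
        IntArrayCollection c = new IntArrayCollection(3, 1, 4);
        // contains() and toString() are inherited and work through iterator().
        System.out.println(c.contains(4) + " " + c);
    }
}
```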

Page 20: Welcome to CIS 068 !

CIS 068

Back to Collections

List Interface

Page 21: Welcome to CIS 068 !

CIS 068

List Interface

• Extends the Collection Interface

• Adds methods to insert and retrieve objects by their position (index)

• Note: Collection Interface could NOT specify the position

• A new Iterator, the ListIterator, is introduced

• ListIterator extends Iterator, allowing for bidirectional traversal (hasPrevious(), previous(), previousIndex(), ...)

Page 22: Welcome to CIS 068 !

CIS 068

List Interface

Incorporates index !

A new Iterator type (can move forward and backward)

Page 23: Welcome to CIS 068 !

CIS 068

Example: Selection-Sorting a List

Part 1: call to selection sort

• Actual implementation of List does not matter !

• Call to SelectionSort

• Use only Iterator properties of ListIterator (upcasting)

Page 24: Welcome to CIS 068 !

CIS 068

Example: Selection-Sorting a List

Part 2: selection sort

• Access at index 'fill'

• Inner loop

• Swap
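The original code of both parts is not preserved in the transcript. The following is a reconstruction under assumptions (the class name `SelectionSortDemo`, the generic signature and the sample data are not from the slides); it keeps the three annotated steps: a ListIterator started at index 'fill', an inner loop that finds the minimum, and a swap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class SelectionSortDemo {
    // Sorts the list in place into ascending order.
    static <T extends Comparable<T>> void selectionSort(List<T> list) {
        for (int fill = 0; fill < list.size() - 1; fill++) {
            // Inner loop: find the smallest element from index 'fill' to the end,
            // walking that range with a ListIterator positioned at 'fill'.
            ListIterator<T> it = list.listIterator(fill);
            int minIndex = fill;
            T min = it.next();
            while (it.hasNext()) {
                int index = it.nextIndex();
                T candidate = it.next();
                if (candidate.compareTo(min) < 0) {
                    min = candidate;
                    minIndex = index;
                }
            }
            // Swap the smallest element into position 'fill'.
            list.set(minIndex, list.get(fill));
            list.set(fill, min);
        }
    }

    public static void main(String[] args) {
        // The concrete List implementation does not matter to selectionSort.
        List<Integer> a = new ArrayList<>(Arrays.asList(5, 2, 4, 1, 3));
        List<Integer> b = new LinkedList<>(Arrays.asList(5, 2, 4, 1, 3));
        selectionSort(a);
        selectionSort(b);
        System.out.println(a + " " + b);
    }
}
```

Note that `get` and `set` by index are cheap on an ArrayList but take O(n) on a LinkedList, so the same code runs correctly on both while its cost differs.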

Page 25: Welcome to CIS 068 !

CIS 068

Back to Collections

AbstractList: ...again the implementation of some methods...

Note:

Still ABSTRACT !

Page 26: Welcome to CIS 068 !

CIS 068

Concrete Lists

ArrayList and Vector:

at last concrete implementations !

Page 27: Welcome to CIS 068 !

CIS 068

ArrayList and Vector

Vector:

• For compatibility reasons (only)

• Use ArrayList instead

ArrayList:

• Underlying data structure is an array

• List properties add advantages over a plain array:

• Size can grow and shrink

• Elements can be inserted and removed in the middle

Page 28: Welcome to CIS 068 !

CIS 068

An Alternative Implementation (1)

Page 29: Welcome to CIS 068 !

CIS 068

An Alternative Implementation (2)

Page 30: Welcome to CIS 068 !

CIS 068

An Alternative Implementation (3)

Page 31: Welcome to CIS 068 !

CIS 068

Collections

The underlying array-datastructure has

• advantages for index-based access

• disadvantages for insertion / removal of middle elements: the following elements must be copied, so insertion / removal takes O(n)

• Alternative: linked lists

Page 32: Welcome to CIS 068 !

CIS 068

Linked List

Flexible structure, providing

• Insertion and removal at any place in O(1) once the node at that place has been reached, compared to O(n) for array-based list

• Sequential access

• Random access at O(n), compared to O(1) for array-based list

Page 33: Welcome to CIS 068 !

CIS 068

Linked List

• List of dynamically allocated nodes

• Nodes arranged into a linked structure

• Data Structure ‘node‘ must provide

• Data itself (example: the bead-body)

• A possible link to another node (ex.: the link)

Children’s pop-beads as an example for a linked list

Page 34: Welcome to CIS 068 !

CIS 068

Linked List

(Figure: the old node's 'next' points to the new node; the new node's 'next' is (null).)

Page 35: Welcome to CIS 068 !

CIS 068

Connecting Nodes

creating the nodes

connecting
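The code shown on this slide is missing from the transcript. A minimal sketch of the two steps could look as follows; the class name `BeadNode` and the sample values are assumptions, while the field name `link` follows the later slides.

```java
// Hypothetical node type for a single linked list.
class BeadNode {
    String data;     // the data itself (the bead body)
    BeadNode link;   // link to the next node, or null at the end of the list

    BeadNode(String data) {
        this.data = data;
    }

    public static void main(String[] args) {
        // Creating the nodes
        BeadNode p = new BeadNode("red");
        BeadNode q = new BeadNode("blue");
        // Connecting: p now points to q, and q ends the list (q.link == null)
        p.link = q;
        System.out.println(p.data + " -> " + p.link.data);
    }
}
```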

Page 36: Welcome to CIS 068 !

CIS 068

Inserting Nodes

Insert node r between p and q:

p.link = r

r.link = q

Afterwards q can be accessed by p.link.link
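The two assignments can be wrapped in a helper, sketched here with a hypothetical class `InsertNode` and method `insertAfter` (both names are assumptions). The sketch sets `r.link` first, so that no separate variable for q is needed; the order used on the slide works as well because q is still held in its own variable.

```java
class InsertNode {
    String data;
    InsertNode link;

    InsertNode(String data) {
        this.data = data;
    }

    // Inserts r directly after p; whatever followed p now follows r.
    static void insertAfter(InsertNode p, InsertNode r) {
        r.link = p.link;  // r points to p's old successor (q)
        p.link = r;       // p points to r
    }

    public static void main(String[] args) {
        InsertNode p = new InsertNode("p");
        InsertNode q = new InsertNode("q");
        InsertNode r = new InsertNode("r");
        p.link = q;
        insertAfter(p, r);
        System.out.println(p.link.data + " " + p.link.link.data);
    }
}
```

Only two links change, independent of the length of the list, which is why insertion at a known node is O(1).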

Page 37: Welcome to CIS 068 !

CIS 068

Removing Nodes

(Figure: node q, the successor of p, is unlinked from the list.)
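Since the slide's figure is lost, here is one common way to remove the node that follows p, given as a sketch; the class `RemoveNode` and the method `removeAfter` are hypothetical names.

```java
class RemoveNode {
    String data;
    RemoveNode link;

    RemoveNode(String data) {
        this.data = data;
    }

    // Unlinks the node that follows p and returns it (null if p is the last node).
    static RemoveNode removeAfter(RemoveNode p) {
        RemoveNode q = p.link;
        if (q != null) {
            p.link = q.link;  // bypass q
            q.link = null;    // detach q from the list
        }
        return q;
    }

    public static void main(String[] args) {
        RemoveNode p = new RemoveNode("p");
        RemoveNode q = new RemoveNode("q");
        RemoveNode r = new RemoveNode("r");
        p.link = q;
        q.link = r;
        RemoveNode removed = removeAfter(p);
        System.out.println("removed " + removed.data + ", p now points to " + p.link.data);
    }
}
```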

Page 38: Welcome to CIS 068 !

CIS 068

Traversing a List

(null)
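A traversal follows the links until it reaches (null). The sketch below sums the data of all nodes; the class `TraverseNode` and the method `sum` are assumptions made for illustration.

```java
class TraverseNode {
    int data;
    TraverseNode link;

    TraverseNode(int data, TraverseNode link) {
        this.data = data;
        this.link = link;
    }

    // Visits every node once, starting at head, until the link is null.
    static int sum(TraverseNode head) {
        int total = 0;
        for (TraverseNode cur = head; cur != null; cur = cur.link) {
            total += cur.data;
        }
        return total;
    }

    public static void main(String[] args) {
        // Builds the list 1 -> 2 -> 3 -> (null)
        TraverseNode head = new TraverseNode(1, new TraverseNode(2, new TraverseNode(3, null)));
        System.out.println(sum(head));
    }
}
```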

Page 39: Welcome to CIS 068 !

CIS 068

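Double Linked Lists

Single linked list: each node holds data and a successor link; the successor of the last node is (null)

Double linked list: each node holds data, a successor link and a predecessor link; the predecessor of the first node and the successor of the last node are (null)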

Page 40: Welcome to CIS 068 !

CIS 068

Back to Collections

AbstractSequentialList and LinkedList

Page 41: Welcome to CIS 068 !

CIS 068

LinkedList

An implementation example:

See textbook

Page 42: Welcome to CIS 068 !

CIS 068

Sets

Example task:

Check whether the collection contains object o

Solution using a List:

-> O(n) operation !
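The List-based solution from the slide is not preserved; it presumably amounts to a linear search like the following sketch (the names `ListContainsDemo` and `containsLinear` are assumptions). `List.contains()` in `java.util` behaves the same way for ArrayList and LinkedList.

```java
import java.util.Arrays;
import java.util.List;

class ListContainsDemo {
    // Linear search: in the worst case every element is compared with o.
    static boolean containsLinear(List<?> list, Object o) {
        for (Object element : list) {
            if (element == null ? o == null : element.equals(o)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Apu", "Bob", "Daria");
        System.out.println(containsLinear(names, "Daria") + " " + containsLinear(names, "Abe"));
    }
}
```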

Page 43: Welcome to CIS 068 !

CIS 068

Sets

Comparison to List:

• Set is designed to overcome the limitation of O(n)

• Contains unique elements

• contains() / remove() operate in O(1) or O(log n)

• No get() method, no index-access...

• ...but iterator can (still) be used to traverse set
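These properties can be tried out with the two standard implementations in `java.util`: HashSet (expected O(1) for contains() and remove()) and TreeSet (O(log n)). The sketch below is illustrative only; the class `SetDemo` and the helper `countDistinct` are assumed names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetDemo {
    // Counts the distinct values by collecting them in a Set.
    static int countDistinct(String... values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v);   // add() returns false and changes nothing for duplicates
        }
        return set.size();
    }

    public static void main(String[] args) {
        Set<String> names = new TreeSet<>();        // sorted set, O(log n) operations
        names.add("Daria");
        names.add("Apu");
        names.add("Apu");                           // duplicate, ignored
        System.out.println(names.contains("Apu"));  // membership test, no index involved
        for (String name : names) {                 // traversal still works via the iterator
            System.out.println(name);
        }
        System.out.println(countDistinct("Apu", "Bob", "Apu"));
    }
}
```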

Page 44: Welcome to CIS 068 !

CIS 068

Back to Collections

Interface Set

Page 45: Welcome to CIS 068 !

CIS 068

Hashing

How can the method 'contains()' be implemented to be an O(1) operation ?

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

Page 46: Welcome to CIS 068 !

CIS 068

Hashing

How can the method 'contains()' be implemented to be an O(1) operation ?

Idea:

• Retrieving an object of an array can be done in O(1) if the index is known

• Determine the index to store and retrieve an object by the object itself !

Page 47: Welcome to CIS 068 !

CIS 068

Hashing

Determine the index ... by the object itself:

Example:

Store the Strings "Apu", "Bob", "Daria" as a Set.

Define function H: String -> integer:

• Take first character, A=1, B=2,...

Store names in String array at position H(name)

Page 48: Welcome to CIS 068 !

CIS 068

Hashing

Apu: first character: A, H(A) = 1

Bob: first character: B, H(B) = 2

Daria: first character: D, H(D) = 4

Array: [1] Apu, [2] Bob, [3] (unused), [4] Daria, [5] (unused), ...
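The first-character function H can be written down directly. The sketch below is a minimal illustration; the class name `FirstCharHash`, the method name `h` and the table size of 27 are assumptions, and the function only makes sense for non-empty names that start with a letter A to Z.

```java
class FirstCharHash {
    // H: String -> integer, using only the first character: A=1, B=2, ...
    // Assumes a non-empty name starting with a letter A-Z or a-z.
    static int h(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        return first - 'A' + 1;
    }

    public static void main(String[] args) {
        String[] table = new String[27];   // indices 1..26 are used, index 0 stays empty
        for (String name : new String[] {"Apu", "Bob", "Daria"}) {
            table[h(name)] = name;         // store each name at position H(name)
        }
        for (int i = 1; i <= 5; i++) {
            System.out.println(i + ": " + (table[i] == null ? "(unused)" : table[i]));
        }
    }
}
```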

Page 49: Welcome to CIS 068 !

CIS 068

Hashing

• The function H(o) is called the hashcode of the object o

• Properties of a hashcode function:

• If a.equals(b) then H(a) = H(b)

• BUT NOT NECESSARILY VICE VERSA:

• H(a) = H(b) does NOT guarantee a.equals(b) !

• If H() has 'sufficient variation', then it is most likely that different objects have different hashcodes
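Both directions of this rule can be observed in Java. The sketch below uses a hypothetical value class `PersonName` (an assumption, not from the slides) whose `equals()` and `hashCode()` are consistent, and two standard strings, "Aa" and "BB", which share a hashcode without being equal.

```java
import java.util.Objects;

// Hypothetical value class used only for illustration.
class PersonName {
    final String first;
    final String last;

    PersonName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PersonName)) return false;
        PersonName that = (PersonName) other;
        return first.equals(that.first) && last.equals(that.last);
    }

    // Built from the same fields as equals(), so equal objects get equal hashcodes.
    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }

    public static void main(String[] args) {
        PersonName a = new PersonName("Daria", "M");
        PersonName b = new PersonName("Daria", "M");
        System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode()));
        // Same hashcode, but not equal:
        System.out.println("Aa".hashCode() + " " + "BB".hashCode() + " " + "Aa".equals("BB"));
    }
}
```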

Page 50: Welcome to CIS 068 !

CIS 068

Hashing

• Additionally, an array is needed that has sufficient space to contain at least all elements.

• The hashcode must not address an index outside the array; this can easily be achieved by:

• H1(o) = H(o) % n

• % = modulo function, n = array length

• The larger the array, the more H1() varies !

Array: [1] Apu, [2] Bob, [3] (unused), [4] Daria, [5] (unused)
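One Java-specific detail, added here as a side note: `hashCode()` may return a negative int, and Java's `%` operator then yields a negative result, which is not a valid array index. A small sketch (the names `HashIndex` and `indexFor` are assumptions) uses `Math.floorMod` to stay within 0..n-1.

```java
class HashIndex {
    // Maps an arbitrary hashcode to a valid array index in the range 0..n-1.
    static int indexFor(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);            // remainder keeps the sign of the dividend
        System.out.println(indexFor(-7, 5));   // always in 0..4
        System.out.println(indexFor("Daria".hashCode(), 5));
    }
}
```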

Page 51: Welcome to CIS 068 !

CIS 068

Hashing

Back to the example:

Insert 'Abe'

First character: A, H(A) = 1

H(Apu) = H(Abe); this is called a collision

Array: [1] Apu, [2] Bob, [3] (unused), [4] Daria, [5] (unused)

Page 52: Welcome to CIS 068 !

CIS 068

Solving Collisions

Method 1:

Don't use an array of objects, but an array of linked lists !

Array (each slot contains the start of a linked list): [1] Apu -> Abe, [2] Bob, [3] (unused), [4] Daria, [5] (unused)
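Method 1 could be sketched as below, reusing the first-character hash of the example so that 'Apu' and 'Abe' land in the same slot. The class `ChainedNameSet` and all of its member names are assumptions; for simplicity a new node is put at the start of its list.

```java
class ChainedNameSet {
    // Wrapper node: the stored name plus a link to the next node in the same slot.
    private static final class Node {
        final String value;
        final Node link;

        Node(String value, Node link) {
            this.value = value;
            this.link = link;
        }
    }

    private final Node[] array;

    ChainedNameSet(int n) {
        array = new Node[n];
    }

    // First-character hash (A=1, B=2, ...) reduced modulo the array length.
    // Assumes a non-empty name starting with a letter.
    private int indexFor(String name) {
        int h = Character.toUpperCase(name.charAt(0)) - 'A' + 1;
        return h % array.length;
    }

    boolean contains(String name) {
        for (Node cur = array[indexFor(name)]; cur != null; cur = cur.link) {
            if (cur.value.equals(name)) return true;
        }
        return false;
    }

    // Returns false if the name was already present.
    boolean add(String name) {
        if (contains(name)) return false;
        int index = indexFor(name);
        array[index] = new Node(name, array[index]);  // new node becomes the start of the list
        return true;
    }

    // Number of names stored in one slot (length of its linked list).
    int chainLength(int index) {
        int length = 0;
        for (Node cur = array[index]; cur != null; cur = cur.link) length++;
        return length;
    }

    public static void main(String[] args) {
        ChainedNameSet set = new ChainedNameSet(5);
        for (String s : new String[] {"Apu", "Bob", "Daria", "Abe"}) set.add(s);
        System.out.println(set.contains("Abe") + " " + set.chainLength(1));
    }
}
```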

Page 53: Welcome to CIS 068 !

CIS 068

Solving Collisions

Drawback:

• Objects must be 'wrapped' in a node structure to provide links, introducing a huge overhead

(Figure: the String 'Apu' is wrapped in a Node that holds 'Apu' and a link.)

Page 54: Welcome to CIS 068 !

CIS 068

Solving Collisions

Method 2:

• Iteratively apply different hashcodes H0, H1, H2, ... to object o until the collision is resolved

• As long as the different hashcodes are used in the same order, the search is guaranteed to be consistent

(Figure: 'Apu' is tried at the positions H0, H1, H2 of the array Apu, Bob, (unused), Daria, (unused).)

Page 55: Welcome to CIS 068 !

CIS 068

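Solving Collisions

The easiest hashcode series Hinc:

H_0 = H

H_i = H_0 + i (that is, H_i = H_{i-1} + 1)

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

(Figure: 'Apu' is tried at H0, H1, H2, which are neighbouring positions of the array Apu, Bob, (unused), Daria, (unused).)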

Page 56: Welcome to CIS 068 !

CIS 068

add

Example implementation of 'add(Object o)' using Hinc

(assume array A has length n and at least one free slot, H as given above)

    index = H(o) % n;
    while (A[index] != null) {
        if (o.equals(A[index]))
            break;                     // o is already stored
        else
            index = (index + 1) % n;   // try the next slot
    }
    A[index] = o;                      // add element at position A[index]

Page 57: Welcome to CIS 068 !

CIS 068

contains

Example implementation of 'contains(Object o)' using Hinc

(assume array A has length n and at least one free slot, H as given above)

    index = H(o) % n;
    found = false;
    while (A[index] != null) {
        if (o.equals(A[index])) {
            found = true;
            break;
        } else {
            index = (index + 1) % n;   // try the next slot
        }
    }
    // 'found' is true if the set contains object o
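The pseudocode for add and contains can be combined into one runnable class. This is a sketch under assumptions: the class `ProbingSet` and its members are hypothetical names, `hashCode()` stands in for H, and one slot is always kept free so that the probing loop is guaranteed to stop.

```java
class ProbingSet {
    private final Object[] a;
    private int size;

    // n is the array length; assumes n >= 2.
    ProbingSet(int n) {
        a = new Object[n];
    }

    // Returns the index where o is stored, or the first free slot on its probe path.
    // Terminates because add() always keeps at least one slot free.
    private int probe(Object o) {
        int index = Math.floorMod(o.hashCode(), a.length);
        while (a[index] != null && !o.equals(a[index])) {
            index = (index + 1) % a.length;   // Hinc: try the next slot
        }
        return index;
    }

    // Returns false if o was already present.
    boolean add(Object o) {
        int index = probe(o);
        if (a[index] != null) return false;
        if (size + 1 == a.length) {
            throw new IllegalStateException("table is full, rehashing needed");
        }
        a[index] = o;
        size++;
        return true;
    }

    boolean contains(Object o) {
        return a[probe(o)] != null;
    }

    public static void main(String[] args) {
        ProbingSet set = new ProbingSet(5);
        set.add(1);
        set.add(6);    // same start index as 1 (6 % 5 == 1), stored in the next slot
        System.out.println(set.contains(6) + " " + set.contains(11));
    }
}
```

Integer is convenient for experiments because `Integer.hashCode()` returns the value itself, so collisions are easy to construct.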

Page 58: Welcome to CIS 068 !

CIS 068

Analysis

• If there is no collision, contains() operates in O(1)

• If the set contains elements having the same hashcode, there is a collision. With dupmax being the maximum number of elements having the same hashcode, contains() operates in O(dupmax)

• If dupmax is near n, there is no increase in speed, since contains() operates in O(n)

Page 59: Welcome to CIS 068 !

CIS 068

A Real Hashcode

• Java provides a hashcode for every object (method hashCode in java.lang.Object)

• The hashCode of e.g. String is computed by:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

n = length of string, s[i] = character at position i
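The formula can be checked against `String.hashCode()` by evaluating it with Horner's scheme in int arithmetic. The class `StringHashDemo` and the method `stringHash` in this sketch are assumed names.

```java
class StringHashDemo {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic.
    // Overflow simply wraps around, as it does in String.hashCode().
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Apu", "Bob", "Daria"}) {
            System.out.println(s + ": " + stringHash(s) + " / " + s.hashCode());
        }
    }
}
```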

Page 60: Welcome to CIS 068 !

CIS 068

Rehashing a table

What happens if the array is full ?

• Create a new array, e.g. of double size, and insert all elements of the old table into the new table

• Note: the elements won't keep their index, since the modulo function applied to the hashcode has changed !
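A minimal sketch of rehashing, assuming the linear probing scheme Hinc and `hashCode()` as H; the class `RehashDemo` and its two methods are hypothetical names.

```java
class RehashDemo {
    // Inserts o into the table using linear probing; assumes a free slot exists.
    static void insert(Object[] table, Object o) {
        int index = Math.floorMod(o.hashCode(), table.length);
        while (table[index] != null) {
            index = (index + 1) % table.length;
        }
        table[index] = o;
    }

    // Creates a table of double size and re-inserts every element of the old one.
    static Object[] rehash(Object[] old) {
        Object[] bigger = new Object[old.length * 2];
        for (Object o : old) {
            if (o != null) insert(bigger, o);
        }
        return bigger;
    }

    public static void main(String[] args) {
        Object[] table = new Object[4];
        insert(table, 6);                 // 6 % 4 == 2
        Object[] bigger = rehash(table);  // 6 % 8 == 6: the element moves to a new index
        System.out.println(table[2] + " " + bigger[6]);
    }
}
```

Copying slot by slot would not work: every element has to be inserted again so that it ends up at the index computed with the new array length.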

Page 61: Welcome to CIS 068 !

CIS 068

Hashcode Summary

• A hashtable provides the Set operations add(), contains() in O(1) if the hashcode is chosen properly and the array allows for sufficient variation

• Speed is gained by usage of more memory

• If multiple collisions occur, a hashtable might be slower than a list due to overhead (computation of H, ...)