CSC 212 – Data Structures Lecture 26: Hash Tables.

Question of the Day

Two English words change their pronunciation when their first letter is capitalized. What are they?

Polish/polish

Reading/reading

Entry Interface

An ADT representing search data. Each Entry is a key-value pair: the key is what we have and use… but what we actually want is the value.

public interface Entry<K,V> {
  public K key();
  public V value();
}
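
One way to implement this interface is sketched below, assuming an immutable pair is all a Map needs to hand back to its users. The class name SimpleEntry and the demo class EntryDemo are hypothetical, not part of the course library; the interface is repeated so the file compiles on its own.

```java
// Hypothetical demo file: the Entry interface from the slide plus one simple implementation.
interface Entry<K, V> {
    K key();
    V value();
}

public class EntryDemo {
    // Hypothetical immutable key-value pair; not the course's own class.
    static final class SimpleEntry<K, V> implements Entry<K, V> {
        private final K key;
        private final V value;

        SimpleEntry(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public K key() { return key; }

        public V value() { return value; }

        @Override
        public String toString() { return key + " | " + value; }
    }

    public static void main(String[] args) {
        Entry<Long, String> e = new SimpleEntry<>(4512290004L, "Jill Roe");
        System.out.println(e); // prints 4512290004 | Jill Roe
    }
}
```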

Map ADT

Represents searchable Collections. Data items are entries (key-value pairs); instances map keys to values.

Each key is contained in at most one entry, so each key maps to at most one value.

Values may appear in multiple entries, so many keys can refer to the same value.

Basis of searching data structures
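
To illustrate these rules, the sketch below uses Java's built-in java.util.HashMap as a stand-in for the course's Map ADT (an assumption: the course interface is its own type and declares InvalidKeyException). Putting an existing key replaces its value and returns the old one, while two different keys may share a value. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// java.util.HashMap stands in for the course's Map ADT in this sketch.
public class MapRulesDemo {
    // Builds a map where one key was put twice and two keys share a value.
    static Map<String, Integer> buildExample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("apple", 1);
        m.put("apple", 2); // same key again: its single entry now holds 2
        m.put("pear", 2);  // different key, same value: allowed
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("apple", 1)); // prints null: no previous value
        System.out.println(m.put("apple", 2)); // prints 1: old value returned and replaced

        Map<String, Integer> ex = buildExample();
        System.out.println(ex.size());      // prints 2: "apple" occupies only one entry
        System.out.println(ex.get("pear")); // prints 2
    }
}
```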

Map Interface

public interface Map<K,V> extends Collection {
  public V put(K key, V val) throws InvalidKeyException;
  public V get(K key) throws InvalidKeyException;
  public V remove(K key) throws InvalidKeyException;
  public Iterable<K> keys();
  public Iterable<V> values();
  public Iterable<Entry<K,V>> entries();
}

PositionList-Based Implementation

PositionList holds the entries in any order. The Map is independent of PositionList’s implementation: it relies only on the methods defined by the interface.

[Figure: a Map implemented with a PositionList. Each Position holds an Entry; the example entries are (9, c), (6, c), (5, c), (8, c).]
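
A minimal sketch of a list-based Map follows, assuming java.util.LinkedList as a stand-in for the course's PositionList and non-null keys. Every operation scans the list, which is where the O(n) cost comes from. The class name ListMap and its nested Node class are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Sketch of a list-based map; java.util.LinkedList stands in for PositionList.
// Assumes keys are non-null.
public class ListMap<K, V> {
    // Hypothetical mutable entry used only inside this sketch.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Node<K, V>> entries = new LinkedList<>();

    // O(n): scan the entries until the key is found.
    public V get(K key) {
        for (Node<K, V> n : entries) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    // O(n): must scan first so each key stays in at most one entry.
    public V put(K key, V value) {
        for (Node<K, V> n : entries) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        entries.addLast(new Node<>(key, value));
        return null;
    }

    // O(n): scan, then unlink the matching entry.
    public V remove(K key) {
        Iterator<Node<K, V>> it = entries.iterator();
        while (it.hasNext()) {
            Node<K, V> n = it.next();
            if (n.key.equals(key)) {
                it.remove();
                return n.value;
            }
        }
        return null;
    }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        ListMap<Integer, String> m = new ListMap<>();
        m.put(9, "c");
        m.put(6, "c");
        m.put(5, "c");
        m.put(8, "c");
        System.out.println(m.get(5));    // prints c
        System.out.println(m.remove(6)); // prints c
        System.out.println(m.size());    // prints 3
    }
}
```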

Map Performance

Want a simple & fast implementation. (Google: search speed is measured in TB/s.) In a list-based Map, get, remove, and put each take O(n) time.

Would love to use arrays: the implementation is easy, and insertion, access, and removal take O(1) time. But ranks and array indices are ints, not K.

Hashing To The Rescue

For each key, a hash function computes an integer from 0 to N - 1. For example, h(x) = x mod N. The value h(x) is the “hash value” of x.

A hash table stores all the entries (nothing to do with eateries in Amsterdam). It is really just an array of size N.

The goal is to store entry (k, v) at index h(k). This is our 1st (good) implementation of a Map.
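
The example hash h(x) = x mod N can be sketched in Java as below. One Java-specific detail is an assumption worth flagging: the % operator can return a negative result for a negative x, so this sketch uses Math.floorMod to keep every hash value in 0 to N - 1. The class name ModHash is hypothetical.

```java
public class ModHash {
    // h(x) = x mod N, always in the range 0..N-1.
    // Math.floorMod is used because Java's % can be negative for negative x,
    // which would not be a valid array index.
    static int hash(long x, int n) {
        return (int) Math.floorMod(x, (long) n);
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println(hash(37, n)); // prints 7
        System.out.println(hash(-3, n)); // prints 7
        System.out.println(-3 % n);      // prints -3: plain % is not a valid index here
    }
}
```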

Hash Table Example

Stores instances of Entry<SSN,Name>

Array has 10,000 indices

Hash function is h(x) = x mod 10,000

What happens if we execute the call put(212710001, “Ike Oh”)?

Index: Entry
0: (empty)
1: 0256120001 | “Jay Doe”
2: 9811010002 | “Bob Dole”
3: (empty)
4: 4512290004 | “Jill Roe”
…
9997: (empty)
9998: 2007519998 | “Rhi Smith”
9999: (empty)
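
Working the example through in code makes the answer concrete: 212710001 mod 10,000 = 1, which is already occupied by Jay Doe. The sketch below is a simplification under stated assumptions: the hypothetical class SsnTableDemo stores only names, and a slot can hold one value, so it can detect the collision but not resolve it.

```java
public class SsnTableDemo {
    static final int N = 10_000;

    // h(x) = x mod 10,000, as on the slide (keys are non-negative, so % is safe).
    static int h(long key) {
        return (int) (key % N);
    }

    public static void main(String[] args) {
        // Keys from the example; 0256120001 is written without its leading zero,
        // because a Java integer literal starting with 0 is read as octal.
        long[] keys = {256120001L, 9811010002L, 4512290004L, 2007519998L};
        String[] names = {"Jay Doe", "Bob Dole", "Jill Roe", "Rhi Smith"};

        // Naive table: one value per slot, no collision handling yet.
        String[] table = new String[N];
        for (int i = 0; i < keys.length; i++) {
            table[h(keys[i])] = names[i];
            System.out.println(keys[i] + " -> index " + h(keys[i]));
        }

        long newKey = 212710001L;
        int slot = h(newKey); // 212710001 mod 10,000 = 1
        if (table[slot] != null) {
            System.out.println("Collision at index " + slot + " with " + table[slot]);
        }
    }
}
```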

Collisions

A collision is when two keys hash to the same index. An ideal hash spreads entries out equally and evenly.

We want to limit or avoid collisions, but we also want to keep the table small.

But a good hash is hard to find, since it depends on what you have to work with, and it is even harder to make a good hash.

Alternatively, we could try to work with collisions.
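
How well h(x) = x mod N spreads entries depends on the keys it sees. The sketch below counts how many distinct indices ten hypothetical keys (10, 20, …, 100, all multiples of 10) reach for two table sizes; the key set, the sizes 10 and 11, and the class name CollisionCount are illustrative assumptions, not values from the lecture.

```java
import java.util.HashSet;
import java.util.Set;

public class CollisionCount {
    // Number of distinct indices that h(x) = x mod n assigns to the given keys.
    static int distinctSlots(long[] keys, int n) {
        Set<Integer> used = new HashSet<>();
        for (long k : keys) {
            used.add((int) Math.floorMod(k, (long) n));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Hypothetical keys: all multiples of 10.
        long[] keys = new long[10];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 10L * (i + 1);
        }
        System.out.println(distinctSlots(keys, 10)); // prints 1: every key lands at index 0
        System.out.println(distinctSlots(keys, 11)); // prints 10: no two keys collide
    }
}
```

With N = 10 the keys all share a factor with N and pile into one index; with N = 11 the same keys spread out, which is one reason a hash's quality cannot be judged without knowing the keys.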

Bucket Arrays

Each item in the array is itself a List. We “chain” whenever there is a collision (nothing to do with road rage): instead of losing an entry, just add the new Entry onto the List.
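
A bucket array with chaining might look like the sketch below. It rests on several assumptions that are not from the lecture: indices come from the key's hashCode() compressed with Math.floorMod rather than a hand-written h(x), the buckets are an ArrayList of LinkedLists (Java cannot directly create a generic array), keys are non-null, and the capacity is fixed with no resizing. The class ChainedHashMap and its method chainLength are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Sketch of a bucket array: every slot holds a List ("chain") of entries.
// Assumes non-null keys and a fixed capacity (no resizing).
public class ChainedHashMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Node<K, V>>> buckets;
    private int size = 0;

    public ChainedHashMap(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Compress the key's hashCode into an index from 0 to N - 1.
    private LinkedList<Node<K, V>> chainFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    public V get(K key) {
        for (Node<K, V> n : chainFor(key)) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public V put(K key, V value) {
        LinkedList<Node<K, V>> chain = chainFor(key);
        for (Node<K, V> n : chain) {
            if (n.key.equals(key)) { // key already present: replace its value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        chain.addLast(new Node<>(key, value)); // new key: just add it onto the chain
        size++;
        return null;
    }

    public V remove(K key) {
        Iterator<Node<K, V>> it = chainFor(key).iterator();
        while (it.hasNext()) {
            Node<K, V> n = it.next();
            if (n.key.equals(key)) {
                it.remove();
                size--;
                return n.value;
            }
        }
        return null;
    }

    public int size() { return size; }

    // Length of the chain that the given key hashes into.
    public int chainLength(K key) { return chainFor(key).size(); }

    public static void main(String[] args) {
        ChainedHashMap<Integer, String> m = new ChainedHashMap<>(10_000);
        m.put(256120001, "Jay Doe");
        m.put(212710001, "Ike Oh"); // Integer's hashCode is its value, so both land at index 1
        System.out.println(m.get(256120001));         // prints Jay Doe
        System.out.println(m.get(212710001));         // prints Ike Oh
        System.out.println(m.chainLength(212710001)); // prints 2
    }
}
```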

Bucket Arrays

But what if we have a really bad hash? Suppose every key hashes to the same index.

All entries are now in a single List, so we are back to O(n) execution times. (Also get a bad case of the munchies.)
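
The cost of a lookup with chaining is driven by the length of the chain being scanned. The sketch below only counts chain lengths rather than storing entries, and it assumes hypothetical keys 0 to n - 1 and a table of 100 buckets; the class name BadHashDemo is also hypothetical. It contrasts a hash that always returns 0 with h(x) = x mod N.

```java
import java.util.function.IntUnaryOperator;

public class BadHashDemo {
    // Length of the longest chain after hashing keys 0..n-1
    // into a bucket array of the given capacity using hash function h.
    static int longestChain(int n, int capacity, IntUnaryOperator h) {
        int[] chainLengths = new int[capacity]; // each bucket's list length
        int longest = 0;
        for (int key = 0; key < n; key++) {
            int index = Math.floorMod(h.applyAsInt(key), capacity);
            chainLengths[index]++;
            longest = Math.max(longest, chainLengths[index]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int n = 1000, capacity = 100;
        // Really bad hash: every key goes to index 0, so one chain holds all n entries.
        System.out.println(longestChain(n, capacity, k -> 0));            // prints 1000
        // h(x) = x mod N spreads these consecutive keys evenly: 10 per chain.
        System.out.println(longestChain(n, capacity, k -> k % capacity)); // prints 10
    }
}
```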

Your Turn

Get back into your groups and do the activity.

Before Next Lecture…

Keep up with your reading!

Complete Week #10 Assignment

Review Programming Assignment #3