Grokking Hash Tables

Page 1: Grokking Hash Tables

Grokking Hash Tables

Page 2: Grokking Hash Tables

A hash table is…

Just a data structure that’s used to establish a mapping between arbitrary objects

Not to be confused with Map: an interface that specifies methods that a program could call to get/set/query key/value mappings

Basically, Map defines what mappings are

A hash table is one way (but not the only one) to make a Map

And your MondoHashTable is just an implementation of that hash table data structure that happens to match the Map interface
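To make the interface/implementation split concrete, here is a minimal sketch. The names MapDemo and countWords are made up for illustration; HashMap and TreeMap are the standard java.util classes. The helper is written against Map only, so it works the same whether the mapping is stored in a hash table or in a tree.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapDemo {
    // Written against the Map interface only; it does not care how the
    // mapping is stored. (countWords is a hypothetical helper.)
    static void countWords(Map<String, Integer> counts, String... words) {
        for (String w : words) {
            Integer old = counts.get(w);
            counts.put(w, old == null ? 1 : old + 1);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>(); // backed by a hash table
        Map<String, Integer> sorted = new TreeMap<>(); // backed by a tree
        countWords(hashed, "nigeria", "cs35", "nigeria");
        countWords(sorted, "nigeria", "cs35", "nigeria");
        System.out.println(hashed.get("nigeria")); // 2
        System.out.println(sorted.equals(hashed)); // true: same mappings
    }
}
```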

Page 3: Grokking Hash Tables

Example:

import java.util.Map;

/**
 * MondoHashTable is a hash table based
 * implementation of the Map interface
 */
public class MondoHashTable implements Map {
    // get(), put(), keySet(), and the other Map methods go here
}

Page 4: Grokking Hash Tables

So what’s a mapping?

A mapping is just a pair relationship between two things

In math, we write a mapping m: x → y to denote that m establishes relationships between things of type x and things of type y

Big restriction: every unique x must map to a single y

Essentially: every left hand side has one and only one right hand side
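The "one and only one right hand side" rule shows up directly in java.util.Map: storing a second value under an existing key replaces the first. A small sketch, where the class name OneValuePerKey and the method demo are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

class OneValuePerKey {
    // Builds a small mapping and stores a second value under an existing key.
    static Map<String, Integer> demo() {
        Map<String, Integer> m = new HashMap<>();
        m.put("nigeria", 114);
        m.put("cs35", 12);
        m.put("cs35", 13); // same left-hand side: the old right-hand side is replaced
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = demo();
        System.out.println(m.get("cs35")); // 13
        System.out.println(m.size());      // 2
    }
}
```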

Page 5: Grokking Hash Tables

Examples of mappings

37 → 6.0827 (abs(sqrt(37)))

79609 → “Prof Lane’s office phone”

123456789 → studentRecord(“J. Student”)

-6.0827 → 36.999

studentRecord(“J. Student”) → gradeSet()

largeFunkyDataObject() → otherObject()

“nigeria” → 114

“cs35” → 12

Left side is the key; right side is the value

Page 6: Grokking Hash Tables

Small integers are easy…

37 → 6.0827

double[] mapArray = new double[200];
mapArray[37] = Math.sqrt(37);

Page 7: Grokking Hash Tables

What about non-integers?

Desire: have a table that gives quick lookup for arbitrary objects

Doesn’t require vast space

Answer: introduce an intermediate step

Hash function turns key into hash code

Use hash code to look up key/val pair in table

table[h(key)]=<key,value>
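The table[h(key)]=<key,value> idea can be sketched as below. TinyTable, Entry, and the table size of 101 are assumptions made for illustration, not part of the assignment. The sketch takes the hash code modulo the table length to get a valid index, and it deliberately ignores collisions: a second key that lands in an occupied cell simply overwrites it, which a real table must not do.

```java
class TinyTable {
    // One <key, value> pair per cell. Entry and TinyTable are illustrative names.
    static final class Entry {
        final Object key;
        final Object value;
        Entry(Object key, Object value) { this.key = key; this.value = value; }
    }

    private final Entry[] table = new Entry[101];

    // h(key): turn the key's hash code into a valid index.
    private int h(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // table[h(key)] = <key, value>; collisions are NOT handled in this sketch.
    void put(Object key, Object value) {
        table[h(key)] = new Entry(key, value);
    }

    Object get(Object key) {
        Entry e = table[h(key)];
        return (e != null && e.key.equals(key)) ? e.value : null;
    }

    public static void main(String[] args) {
        TinyTable t = new TinyTable();
        t.put("nigeria", 114);
        t.put("cs35", 12);
        System.out.println(t.get("nigeria")); // 114
        System.out.println(t.get("missing")); // null
    }
}
```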

Page 8: Grokking Hash Tables

The big picture…

[Diagram: get(“nigeria”) on MondoHashTable computes h(“nigeria”) = 1945462417, which locates the table cell holding <nigeria, 114>.]

Page 16: Grokking Hash Tables

Some practical questions

1.9 billion? Isn’t that a little much?

A: reduce mod table size

int h = hashFunction(a) % table.length;

Careful: hashCode() can be negative and % keeps the sign, so in practice use Math.floorMod(hashFunction(a), table.length)

Where do the hash functions come from?

A: java.lang.Object.hashCode()

How big should the table be, initially?

Good choice: pick a prime #

Ask the user (arg to constructor)

What happens if the table gets too full?

A: resize it!
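A short sketch of the "reduce mod table size" step. IndexDemo, indexFor, and the table length of 101 are illustrative choices; String.hashCode(), Integer.hashCode(), and Math.floorMod are standard Java. It also shows why a plain % is not quite enough when the hash code is negative.

```java
class IndexDemo {
    // Reduce a hash code to the range [0, tableLength).
    // Math.floorMod is used instead of % because hashCode() may be negative.
    static int indexFor(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 101; // a prime, chosen here only as an example

        System.out.println("nigeria".hashCode());             // 1945462417
        System.out.println(indexFor("nigeria", tableLength)); // 13

        Integer key = -7; // an Integer's hash code is its own int value
        System.out.println(key.hashCode() % tableLength);     // -7, not a valid index
        System.out.println(indexFor(key, tableLength));       // 94
    }
}
```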

Page 17: Grokking Hash Tables

#1 killer question: Collisions

What happens if (a.hashCode()%tSz)==(b.hashCode()%tSz)?

Depends…

If a.equals(b), then these are the same key

If not… This is a hash collision

Basically, you have two different keys pointing at the same location in the hash table

Have to resolve this somehow -- find unique storage for every key and don’t lose anything
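Collisions are easy to produce on purpose. The sketch below uses a hypothetical helper, collide, and a table size of 101 chosen only for the example. It relies on the documented String.hashCode() formula, under which the strings "Aa" and "BB" both hash to 2112.

```java
class CollisionDemo {
    // True when two distinct keys land in the same cell of a table of size tSz.
    static boolean collide(Object a, Object b, int tSz) {
        return !a.equals(b)
            && Math.floorMod(a.hashCode(), tSz) == Math.floorMod(b.hashCode(), tSz);
    }

    public static void main(String[] args) {
        // Different strings, identical hash code (2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(collide("Aa", "BB", 101));           // true

        // Different hash codes can still collide after the mod: 3 and 104 both give 3.
        System.out.println(collide(3, 104, 101));               // true

        // Equal keys are the same key, not a collision.
        System.out.println(collide("Aa", "Aa", 101));           // false
    }
}
```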

Page 18: Grokking Hash Tables

Collision strategy 1: Chaining

Make each cell in the hash table a “bucket” containing multiple key/value pairs

h(“nigeria”) → <nigeria, 114>

h(“viagra”) → <viagra, 29>
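A minimal sketch of chaining, not the MondoHashTable solution itself. ChainedTable and Pair are illustrative names, and the growth policy (grow to 2n+1 buckets once there are more pairs than buckets) is an assumption made for the example. Each bucket is a list of pairs; put either replaces the value of an equal key or appends a new pair to the chain.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedTable<K, V> {
    private static final class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private List<List<Pair<K, V>>> buckets;
    private int size;

    ChainedTable(int initialCapacity) {
        buckets = newBuckets(initialCapacity);
    }

    private static <K, V> List<List<Pair<K, V>>> newBuckets(int n) {
        List<List<Pair<K, V>>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new ArrayList<>());
        return b;
    }

    private List<Pair<K, V>> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    V get(Object key) {
        for (Pair<K, V> p : bucketFor(key)) {
            if (p.key.equals(key)) return p.value;
        }
        return null;
    }

    void put(K key, V value) {
        for (Pair<K, V> p : bucketFor(key)) {
            if (p.key.equals(key)) { p.value = value; return; } // same key: replace
        }
        bucketFor(key).add(new Pair<>(key, value)); // new key: append to the chain
        size++;
        if (size > buckets.size()) resize(); // assumed policy: grow when load factor > 1
    }

    private void resize() {
        List<List<Pair<K, V>>> old = buckets;
        buckets = newBuckets(2 * old.size() + 1);
        for (List<Pair<K, V>> chain : old) {
            for (Pair<K, V> p : chain) bucketFor(p.key).add(p); // rehash into the new table
        }
    }

    int size() { return size; }

    public static void main(String[] args) {
        // Start with a single bucket so the first keys are forced to share a chain.
        ChainedTable<String, Integer> t = new ChainedTable<>(1);
        t.put("nigeria", 114);
        t.put("viagra", 29);
        System.out.println(t.get("nigeria")); // 114
        System.out.println(t.get("viagra"));  // 29
    }
}
```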

Page 19: Grokking Hash Tables

Collision strategy 2: Open addressing

Each cell of the table actually holds a key/value pair

When you have a collision, rehash to find a new location for the new pair

Linear probing: try next cell in line

Quadratic probing: try cell h+1, then h+4, h+9, h+16, h+25 …

Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod table.size(), for probe number i = 0, 1, 2, …

Repeat probes until you find an empty spot
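Open addressing with linear probing might look roughly like this. LinearProbingTable and findSlot are illustrative names; the sketch assumes a fixed capacity, non-null keys, and no removal (deleting from an open-addressed table needs extra care that is left out here). Replacing the offset i with i*i in findSlot would give the quadratic sequence h, h+1, h+4, h+9, and so on.

```java
class LinearProbingTable {
    private final Object[] keys;
    private final Object[] values;
    private int size;

    LinearProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Index of the cell holding key, or of the first empty cell on its
    // probe path; -1 if every cell was probed without finding either.
    private int findSlot(Object key) {
        int h = Math.floorMod(key.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (h + i) % keys.length; // linear probing: h, h+1, h+2, ...
            if (keys[slot] == null || keys[slot].equals(key)) return slot;
        }
        return -1;
    }

    void put(Object key, Object value) {
        int slot = findSlot(key);
        if (slot < 0) throw new IllegalStateException("table full; a real table would resize");
        if (keys[slot] == null) { keys[slot] = key; size++; }
        values[slot] = value;
    }

    Object get(Object key) {
        int slot = findSlot(key);
        return (slot >= 0 && keys[slot] != null) ? values[slot] : null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        t.put("Aa", 1); // "Aa" and "BB" share a hash code, so they collide
        t.put("BB", 2); // lands in the next free cell after "Aa"
        System.out.println(t.get("Aa")); // 1
        System.out.println(t.get("BB")); // 2
    }
}
```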

Page 20: Grokking Hash Tables

Map.keySet()

A common operation on hash tables (Maps): get all of the keys

You’ll probably use this in the “dump” functionality of SpamBGon

Map requires:

Set keySet(): Returns a set view of the keys contained in this map.

What’s a view?

Page 21: Grokking Hash Tables

Views of data

Different interface to the same underlying data

Doesn’t copy the data -- just provides new methods for accessing it

A “set view of the keys”, then, is an object that behaves like (i.e., implements) a set, but gives you access to all of the keys of the hash table.
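The "no copy" property can be observed with the standard java.util.HashMap, whose keySet() is such a view. KeySetViewDemo and removeViaView are names invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {
    // Removes a key through the set view; returns true if the key was present.
    static boolean removeViaView(Map<String, Integer> m, String key) {
        return m.keySet().remove(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("nigeria", 114);
        Set<String> keys = m.keySet(); // a view, not a copy

        m.put("cs35", 12);                            // change the map afterwards...
        System.out.println(keys.contains("cs35"));    // true: the view sees it

        removeViaView(m, "nigeria");                  // remove through the view...
        System.out.println(m.containsKey("nigeria")); // false: the map changed too
        System.out.println(keys.size());              // 1
    }
}
```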

Page 22: Grokking Hash Tables

The set view picture

[Diagram: MondoHashTable’s keySet() returns a Set; the Set’s size(), iterator(), and contains() methods read the table’s own keys.]
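One possible way to build such a view is to extend java.util.AbstractSet and answer size(), iterator(), and contains() straight from the table's storage. This is only a sketch under simplifying assumptions, not the required MondoHashTable design: ViewTable is a made-up class that stores keys only (no values), uses linear probing in a fixed array of 11 cells that is assumed never to fill up, and does not support removal.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

class ViewTable {
    private final Object[] keys = new Object[11]; // fixed size, assumed never full
    private int size;

    // Linear probing: the cell holding key, or the first empty cell on its path.
    private int slotFor(Object key) {
        int h = Math.floorMod(key.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (h + i) % keys.length;
            if (keys[slot] == null || keys[slot].equals(key)) return slot;
        }
        throw new IllegalStateException("table full");
    }

    void add(Object key) {
        int slot = slotFor(key);
        if (keys[slot] == null) { keys[slot] = key; size++; }
    }

    // The view holds no data of its own; every method reads the table's array.
    Set<Object> keySet() {
        return new AbstractSet<Object>() {
            @Override public int size() { return ViewTable.this.size; }

            @Override public boolean contains(Object o) {
                return o != null && keys[slotFor(o)] != null;
            }

            @Override public Iterator<Object> iterator() {
                return new Iterator<Object>() {
                    private int next = skipEmpty(0);

                    private int skipEmpty(int i) {
                        while (i < keys.length && keys[i] == null) i++;
                        return i;
                    }

                    @Override public boolean hasNext() { return next < keys.length; }

                    @Override public Object next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        Object k = keys[next];
                        next = skipEmpty(next + 1);
                        return k;
                    }
                };
            }
        };
    }

    public static void main(String[] args) {
        ViewTable t = new ViewTable();
        Set<Object> keys = t.keySet(); // taken before any key is added
        t.add("nigeria");
        t.add("cs35");
        System.out.println(keys.size());              // 2
        System.out.println(keys.contains("nigeria")); // true
    }
}
```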