Hashtables. An Abstract data type that supports the following operations: –Insert –Find...

Hashtables

• An abstract data type that supports the following operations:
– Insert
– Find
– Remove

• Search trees support the same operations, but they require an order relation to be defined on the elements, and each operation takes logarithmic time.

• Hashtables do not require an order relationship on the elements and all operations take O(1) time on average.

Direct Access Tables

• Assume that the keys are distinct numbers in the range U = {1, 2, 3, ..., m}. Use an array of size m and place the element with key k at index k of the array.

• O(1) time for all operations

• Problem: wasteful for small sets and impractical if m is very large

Hashtables

• Main idea: instead of using the keys themselves as indices into the table, use a hash function h that maps keys to indices:

$h : U \to \{0, 1, \ldots, m-1\}$

• Note that U is the set of all possible keys, so it is usually much larger than m.

Simple Uniform Hashing

• We assume a hash function that, given a key, hashes it into any of the m slots with equal probability.

• We will try to provide some reasonable hash functions later

hash functions

• The hash function is responsible for mapping keys to integers (slot numbers). A good hash function should have the following properties:

– 1. Easy to evaluate: h(x) is computed in O(1) time
– 2. Uniform distribution over all the table slots
– 3. Similar keys are mapped to different slots

hash functions

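• The first step is to represent the key as a natural number.

• For example, if the key is a String, we can interpret it as an integer value using the formula

$\sum_{i=0}^{\mathrm{keylength}-1} 128^{i} \cdot \mathrm{char}(\mathrm{key}[i])$

where char(key[i]) is the character code of the i-th character, counting from 0.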
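The conversion above can be sketched in Java as follows. This is only an illustration: the class name RadixKey and the helper slot are hypothetical, the characters are assumed to be 7-bit ASCII, and BigInteger is used because the sum can exceed the range of a long once the key is longer than 9 characters.

```java
import java.math.BigInteger;

final class RadixKey {
    // Interprets a string as a base-128 number: sum over i of 128^i * key[i].
    static BigInteger toNatural(String key) {
        BigInteger radix = BigInteger.valueOf(128);
        BigInteger power = BigInteger.ONE;   // holds 128^i
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < key.length(); i++) {
            sum = sum.add(power.multiply(BigInteger.valueOf(key.charAt(i))));
            power = power.multiply(radix);
        }
        return sum;
    }

    // Hypothetical helper: reduces the natural number modulo the table size m
    // to obtain a slot number in 0..m-1.
    static int slot(String key, int m) {
        return toNatural(key).mod(BigInteger.valueOf(m)).intValue();
    }
}
```

For the key "ab" the sum is 97 + 128 · 98 = 12641.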

Collisions

• Mapping keys to indices can cause collisions: two distinct keys k_1 and k_2 collide if the hash function maps them to the same index, that is

$h(k_1) = h(k_2)$

• Solutions:
– Chaining
– Open addressing

Collision resolution - Chaining

• All keys that have the same hash value are placed in a linked list

• Insertion can be done at the beginning of the list in O(1) time

• Searching is proportional to the length of the list

Collision resolution by chaining

• Let T be a hash table with 9 slots and h(k) = k mod 9. Insert the elements:

6, 43, 23, 62, 1, 13, 34, 55, 25

h(6) = 6 mod 9 = 6
h(43) = 43 mod 9 = 7
h(23) = 23 mod 9 = 5
h(62) = 62 mod 9 = 8
h(1) = 1 mod 9 = 1
h(13) = 13 mod 9 = 4
h(34) = 34 mod 9 = 7
h(55) = 55 mod 9 = 1
h(25) = 25 mod 9 = 7
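The example can be replayed with a minimal chained table in Java. This is a sketch under assumptions: the class ChainedTable and its method names are made up for illustration, keys are plain ints, and new keys go to the front of the chain as described earlier.

```java
import java.util.LinkedList;

final class ChainedTable {
    private final LinkedList<Integer>[] slots;

    @SuppressWarnings("unchecked")
    ChainedTable(int m) {
        slots = new LinkedList[m];
        for (int i = 0; i < m; i++) slots[i] = new LinkedList<>();
    }

    // h(k) = k mod m (floorMod keeps the result non-negative)
    private int h(int k) { return Math.floorMod(k, slots.length); }

    // O(1): the new key is placed at the beginning of its chain.
    void insert(int k) { slots[h(k)].addFirst(k); }

    // Takes time proportional to the length of the chain.
    boolean find(int k) { return slots[h(k)].contains(k); }

    boolean remove(int k) { return slots[h(k)].remove(Integer.valueOf(k)); }

    // Text form of one chain, front of the list first.
    String chain(int slot) { return slots[slot].toString(); }
}
```

After inserting 6, 43, 23, 62, 1, 13, 34, 55, 25 into a table of 9 slots, slot 7 holds the chain [25, 34, 43] and slot 1 holds [55, 1].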

Analysis

• The load factor of a hashtable is the number of elements stored in the table divided by the number of slots:

$\alpha = n / m$

• Under the assumption of simple uniform hashing, a search takes on average

$\Theta(1 + \alpha)$

Division method

• An appropriate hash function for a hashtable that uses chaining is the division method:

$h(k) = k \bmod m$

• Powers of 10 and 2 should be avoided as values of m
• Good values of m are primes not close to powers of 2

Open Addressing

• Each element occupies a single slot in the hashtable. No chaining is done

• To insert an element, we probe the table according to the hash function until an empty slot is found.

• The hash function is now a function of both the key and the number of attempts (probes) made so far in the insertion process:

$h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$

Hash Insert

• HashInsert(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);               // slot examined on attempt i
        if (T[j] == null) break;   // empty slot found
    }
    if (i < m)
        T[j] = k;
    else
        error: hashtable overflow
  }

Hash Search

• HashSearch(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);
        if (T[j] == null)
            return not found;      // an empty slot ends the probe sequence
        else if (T[j] == k)
            return j;
    }
    return not found;              // all m slots were probed
  }

Linear probing

• Linear probing takes an ordinary hash function h’, such as one using the division method, and turns it into:

$h(k, i) = (h'(k) + i) \bmod m$

• If a slot is occupied, we try the subsequent slot, and so on. Thus the initial slot determines the whole probing sequence for insertion and search.

Linear Probing

• Easy to implement but suffers from primary clustering.

• An empty slot that follows a run of occupied slots is more likely to be filled next than any other slot, so runs of occupied slots tend to grow.

Linear Probing

• Given a hash function h’, the linear probing scheme is simply

$h(k, i) = (h'(k) + i) \bmod m$

Exercise

• You are given a hash table with 11 slots. Demonstrate inserting the following elements using linear probing with the auxiliary hash function h’(k) = k mod m

– 10,22,31,4,15,28,17,88,59

Solution

• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4
• h(15,1) = (15 mod 11 + 1) mod 11 = 5
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6
• h(17,1) = (17 mod 11 + 1) mod 11 = 7
• h(88,0) = (88 mod 11 + 0) mod 11 = 0
• h(88,1) = (88 mod 11 + 1) mod 11 = 1
• h(59,0) = (59 mod 11 + 0) mod 11 = 4
• h(59,1) = (59 mod 11 + 1) mod 11 = 5
• h(59,2) = (59 mod 11 + 2) mod 11 = 6
• h(59,3) = (59 mod 11 + 3) mod 11 = 7
• h(59,4) = (59 mod 11 + 4) mod 11 = 8

Resulting table (slots 2 and 3 stay empty):

Slot:  0   1   2   3   4   5   6   7   8   9   10
Key:   22  88  -   -   4   15  28  17  59  31  10
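The exercise can be checked with a small Java version of HashInsert and HashSearch specialised to linear probing. This is a sketch, not a reference implementation: the class name LinearProbingTable is hypothetical, keys are ints, and null marks an empty slot as in the pseudocode.

```java
final class LinearProbingTable {
    private final Integer[] table;   // null marks an empty slot

    LinearProbingTable(int m) { table = new Integer[m]; }

    // h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m
    private int h(int k, int i) {
        int m = table.length;
        return (Math.floorMod(k, m) + i) % m;
    }

    // Stores k in the first empty slot of its probe sequence and returns that slot.
    int insert(int k) {
        for (int i = 0; i < table.length; i++) {
            int j = h(k, i);
            if (table[j] == null) {
                table[j] = k;
                return j;
            }
        }
        throw new IllegalStateException("hashtable overflow");
    }

    // Returns the slot holding k, or -1 if k is not in the table.
    int search(int k) {
        for (int i = 0; i < table.length; i++) {
            int j = h(k, i);
            if (table[j] == null) return -1;   // empty slot ends the search
            if (table[j] == k) return j;
        }
        return -1;                             // all m slots were probed
    }
}
```

With 11 slots, inserting 10, 22, 31, 4, 15, 28, 17, 88, 59 in that order places them in slots 10, 0, 9, 4, 5, 6, 7, 1, 8.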

Quadratic Probing

• Quadratic probing again uses an initial hash function h’, and the hash function is now

$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$

• The slot chosen after an occupied slot depends on the probe number i.

• Quadratic probing suffers from a secondary form of clustering, since the initial probe alone determines the entire probing sequence:

$h(k_1, i) = h(k_2, i) \Rightarrow h(k_1, i+1) = h(k_2, i+1)$

Quadratic Probing

• Given a hash function h’, quadratic probing is done by:

$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$

Example

• You are given a hash table with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function

$h(k, i) = (k \bmod m + i + 3i^2) \bmod m$

– 10,22,31,4,15,28,17,88,59

• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4
• h(15,1) = (15 mod 11 + 1 + 3) mod 11 = 8
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6
• h(17,1) = (17 mod 11 + 1 + 3) mod 11 = 10
• h(17,2) = (17 mod 11 + 2 + 12) mod 11 = 9
• h(17,3) = (17 mod 11 + 3 + 27) mod 11 = 3
• h(88,0) = (88 mod 11 + 0) mod 11 = 0
• h(88,1) = (88 mod 11 + 1 + 3) mod 11 = 4
• h(88,2) = (88 mod 11 + 2 + 12) mod 11 = 3
• h(88,3) = (88 mod 11 + 3 + 27) mod 11 = 8
• h(88,4) = (88 mod 11 + 4 + 48) mod 11 = 8
• h(88,5) = (88 mod 11 + 5 + 75) mod 11 = 3
• h(88,6) = (88 mod 11 + 6 + 108) mod 11 = 4
• h(88,7) = (88 mod 11 + 7 + 147) mod 11 = 0
• h(88,8) = (88 mod 11 + 8 + 192) mod 11 = 2
• h(59,0) = (59 mod 11 + 0) mod 11 = 4
• h(59,1) = (59 mod 11 + 1 + 3) mod 11 = 8
• h(59,2) = (59 mod 11 + 2 + 12) mod 11 = 7

Resulting table (slots 1 and 5 stay empty):

Slot:  0   1   2   3   4   5   6   7   8   9   10
Key:   22  -   88  17  4   -   28  59  15  31  10
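The quadratic probing example, with h(k, i) = (k mod m + i + 3i²) mod m, can be replayed with the Java sketch below. The class QuadraticProbing and the method insertAll are illustrative names only. Note that this probe sequence may revisit slots (key 88 needs nine attempts in the example), so the sketch gives up after m attempts even if the table is not full.

```java
final class QuadraticProbing {
    // h(k, i) = (k mod m + i + 3 i^2) mod m, i.e. c1 = 1 and c2 = 3
    static int h(int k, int i, int m) {
        return (Math.floorMod(k, m) + i + 3 * i * i) % m;
    }

    // Inserts the keys in order and returns the table; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            int i = 0;
            while (i < m && table[h(k, i, m)] != null) i++;   // skip occupied slots
            if (i == m) throw new IllegalStateException("no empty slot found in m attempts");
            table[h(k, i, m)] = k;
        }
        return table;
    }
}
```

For the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 and m = 11 the returned array is [22, null, 88, 17, 4, null, 28, 59, 15, 31, 10].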

Double Hashing

• Given two hash functions $h_1, h_2$:

$h(k, i) = (h_1(k) + i \, h_2(k)) \bmod m$

• Problem: $h_2(k)$ and $m$ should not have any common divisors greater than 1.

Double Hashing

• Example 1:

select m to be a power of 2, and design $h_2$ to produce odd numbers.

• Example 2:

select m to be prime, and m’ to be m-1:

$h_1(k) = k \bmod m$

$h_2(k) = 1 + (k \bmod m')$
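Example 2 can be sketched in Java to show why a step size without common divisors with m matters: the m probes then visit every slot exactly once. The class DoubleHashing and the method probeSequence are hypothetical names, and the key and table size in the note below are chosen only for illustration.

```java
final class DoubleHashing {
    // Example 2: m is prime and m' = m - 1.
    static int h1(int k, int m) { return Math.floorMod(k, m); }

    // Always in 1..m-1, so it shares no divisor greater than 1 with a prime m.
    static int h2(int k, int m) { return 1 + Math.floorMod(k, m - 1); }

    // h(k, i) = (h1(k) + i * h2(k)) mod m
    static int h(int k, int i, int m) {
        return (h1(k, m) + i * h2(k, m)) % m;
    }

    // The m probes h(k,0), ..., h(k,m-1) in order.
    static int[] probeSequence(int k, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) seq[i] = h(k, i, m);
        return seq;
    }
}
```

For k = 14 and m = 13 we get h1 = 1 and h2 = 3, and the probe sequence is 1, 4, 7, 10, 0, 3, 6, 9, 12, 2, 5, 8, 11, which covers all 13 slots.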

Analysis

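• In open addressing the load factor $\alpha$ cannot be more than 1.

• Insertion and unsuccessful search require on average at most

$\frac{1}{1-\alpha}$

attempts (assuming uniform hashing and $\alpha < 1$).

• A successful search takes on average at most

$\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$

attempts.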

Analysis

• When the table is 50% full, a successful search requires at most 1.387 probes on average

• When the table is 90% full, a successful search requires at most 2.559 probes on average
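The two bounds are easy to evaluate numerically. The Java sketch below (class and method names are illustrative) computes 1/(1 − α) for insertion and unsuccessful search, and (1/α) ln(1/(1 − α)) for successful search.

```java
final class ProbeBounds {
    // Bound on the expected number of probes for insertion or unsuccessful search.
    static double unsuccessful(double alpha) {
        return 1.0 / (1.0 - alpha);
    }

    // Bound on the expected number of probes for a successful search.
    static double successful(double alpha) {
        return (1.0 / alpha) * Math.log(1.0 / (1.0 - alpha));
    }
}
```

For α = 0.5 the successful-search bound is 2 ln 2 ≈ 1.386, and for α = 0.9 it is ln(10)/0.9 ≈ 2.558. The corresponding bounds for an unsuccessful search are 2 and 10 probes.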

Problems with open addressing

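• If an element is deleted, we cannot simply empty its slot, since later searches for keys that probed past that slot may fail. Rehashing the table after every deletion would ruin the running time.

• Solution: mark the slot with a special DELETED value. Search continues past a DELETED slot, and insertion may reuse it.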
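A minimal Java sketch of the DELETED marker, assuming linear probing with h’(k) = k mod m and distinct int keys. The class name TombstoneTable is hypothetical, and the marker is modelled as a private sentinel object that can never equal a key.

```java
final class TombstoneTable {
    private static final Object DELETED = new Object();   // marks a removed key
    private final Object[] table;                          // null = never used

    TombstoneTable(int m) { table = new Object[m]; }

    private int h(int k, int i) {
        return (Math.floorMod(k, table.length) + i) % table.length;
    }

    // Assumes k is not already present; reuses DELETED slots.
    void insert(int k) {
        for (int i = 0; i < table.length; i++) {
            int j = h(k, i);
            if (table[j] == null || table[j] == DELETED) {
                table[j] = k;
                return;
            }
        }
        throw new IllegalStateException("hashtable overflow");
    }

    // Returns the slot holding k, or -1. A DELETED slot does not stop the search.
    int search(int k) {
        for (int i = 0; i < table.length; i++) {
            int j = h(k, i);
            if (table[j] == null) return -1;
            if (table[j].equals(k)) return j;
        }
        return -1;
    }

    // Replaces the key with the marker instead of emptying the slot.
    boolean remove(int k) {
        int j = search(k);
        if (j < 0) return false;
        table[j] = DELETED;
        return true;
    }
}
```

With 11 slots, the keys 4, 15 and 59 all start at slot 4 and end up in slots 4, 5 and 6. After removing 15, a search for 59 still succeeds because it walks over the marker in slot 5 instead of stopping there.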

Rehashing

• If we do not know the number of elements in advance, we use a technique similar to the one used in vectors: once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
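One possible way to sketch rehashing in Java is shown below, using chaining for simplicity. Everything specific here is an assumption made for illustration: the class name GrowingTable, the initial size of 4, the threshold of 0.75 and the doubling policy.

```java
import java.util.ArrayList;
import java.util.List;

final class GrowingTable {
    private static final double MAX_LOAD = 0.75;   // assumed threshold
    private List<List<Integer>> slots = newSlots(4);
    private int n = 0;                             // number of stored keys

    private static List<List<Integer>> newSlots(int m) {
        List<List<Integer>> s = new ArrayList<>(m);
        for (int i = 0; i < m; i++) s.add(new ArrayList<>());
        return s;
    }

    void insert(int k) {
        // Grow before the load factor n/m would exceed the threshold.
        if ((double) (n + 1) / slots.size() > MAX_LOAD) rehash(2 * slots.size());
        slots.get(Math.floorMod(k, slots.size())).add(k);
        n++;
    }

    boolean find(int k) {
        return slots.get(Math.floorMod(k, slots.size())).contains(k);
    }

    int capacity() { return slots.size(); }

    // Every key is reinserted, because its slot depends on the table size.
    private void rehash(int newSize) {
        List<List<Integer>> old = slots;
        slots = newSlots(newSize);
        for (List<Integer> chain : old) {
            for (int k : chain) slots.get(Math.floorMod(k, newSize)).add(k);
        }
    }
}
```

A single rehash costs time linear in the number of stored keys, but because the size doubles each time, the cost averaged over all insertions stays constant, as with a growing vector.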

Example

• Given a set S of unique integers and a number z, find $x, y \in S$ such that x+y = z

– An efficient worst case algorithm
– An efficient average case algorithm

An efficient worst case algorithm

• 1. Sort all elements in S: $O(n \log n)$

• 2. For every x in S we search for y = z-x in S using binary search: $n \cdot O(\log n) = O(n \log n)$

Total of $O(n \log n)$

An efficient average case algorithm

• 1. We use a hash table where m is of order n. For all $x \in S$ we execute insert(x): $n \cdot \Theta(1) = \Theta(n)$

• 2. For all $x \in S$ we execute search(z-x): $n \cdot \Theta(1) = \Theta(n)$

Total - average case: $\Theta(n)$
Total - worst case: $\Theta(n^2)$
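The average-case algorithm can be sketched in Java with java.util.HashSet standing in for the hash table. The class PairSum and the method findPair are illustrative names. As in the two steps above, the search for z − x is run for every x, so x and y may turn out to be the same element, and integer overflow of z − x is ignored.

```java
import java.util.HashSet;
import java.util.Set;

final class PairSum {
    // Returns {x, y} with x + y == z and both values in s, or null if there is no such pair.
    static int[] findPair(int[] s, int z) {
        Set<Integer> table = new HashSet<>();
        for (int x : s) table.add(x);               // step 1: n inserts
        for (int x : s) {                           // step 2: n searches
            if (table.contains(z - x)) return new int[] {x, z - x};
        }
        return null;
    }
}
```

For S = {1, 5, 9, 14} and z = 15 the sketch returns the pair (1, 14). For S = {1, 5, 9} and z = 7 it returns null.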

Example

• Given a set S of sortable items, we are asked whether all items in S are unique.

• 1. Sort the elements of S: $O(n \log n)$

• 2. Iterate over the elements of S, looking for adjacent equal values: $\Theta(n)$

• Execution time: $O(n \log n)$

Example

• 1. Use a hash table where m is of order n. For all $x \in S$ we execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation).

• Execution time - average case: $n \cdot \Theta(1) = \Theta(n)$
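In Java the modified insert is already available: Set.add returns false when the element is present. A minimal sketch, with the hypothetical names Uniqueness and allUnique, and assuming the items implement equals and hashCode consistently:

```java
import java.util.HashSet;
import java.util.Set;

final class Uniqueness {
    // Returns true if no two items are equal.
    static <T> boolean allUnique(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) return false;   // add reports an existing element
        }
        return true;
    }
}
```

The loop stops at the first duplicate, so the work done is at most n inserts.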

Java hashcode

• Each Java object has a method public int hashCode(), which is defined in class Object and is supported for the benefit of hash tables such as Hashtable and HashMap.

• The default implementation typically returns distinct numbers for distinct objects, for example a value derived from the memory location of the object, but uniqueness is not guaranteed.

• If two objects are equal according to equals(), they must have the same hash code.

Java hashcode

• It is not required that distinct objects have distinct hash codes, but distinct hash codes improve the performance of hashtables.

• Can the hash code of an object change throughout its life cycle?
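A small illustration of the contract, using a hypothetical class GridPoint. Its fields are final, so its hash code cannot change while the object is stored in a hash table. For a mutable class, changing a field that takes part in hashCode() after insertion can make the object impossible to find in the table.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals, so equal points get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Two separately created GridPoint(1, 2) objects are equal and have the same hash code, so a HashSet containing one of them reports that it contains the other.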
