Hashtables. An Abstract data type that supports the following operations: –Insert –Find...
kimberly-obrien
Hashtables
• An abstract data type that supports the following operations:
– Insert
– Find
– Remove
• Search trees support the same operations, but they require an order relation to be defined on the keys, and each operation takes logarithmic time.
• Hashtables do not require an order relationship on the elements and all operations take O(1) time on average.
Direct Access Tables
• Assume that the keys are distinct numbers in the range U = {1, 2, 3, …, m}. Use an array of size m and store the element with key k in the kth cell of the array.
• O(1) time for all operations
• Problem: wasteful for small sets and impractical if m is very large
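A minimal Java sketch of a direct access table, assuming integer keys in the range 1..m. The class name `DirectAccessTable` and its method names are illustrative and do not come from the slides.

```java
// Hypothetical illustration: a direct access table for keys in 1..m.
class DirectAccessTable<V> {
    private final Object[] slots; // slots[k - 1] holds the value stored under key k

    DirectAccessTable(int m) {
        slots = new Object[m];
    }

    // Each operation touches exactly one array cell, hence O(1).
    void insert(int key, V value) {
        slots[key - 1] = value;
    }

    @SuppressWarnings("unchecked")
    V find(int key) {
        return (V) slots[key - 1]; // null means "no element with this key"
    }

    void remove(int key) {
        slots[key - 1] = null;
    }
}
```

The array is allocated for all m possible keys up front, which is exactly the waste the slide points out when only a few keys are in use.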
Hashtables
• Main idea: instead of using the keys themselves as indices into the table, use a hash function h that maps keys to indices:
$h : U \to \{0, 1, \ldots, m-1\}$
• Note that U is the set of all possible keys, so it is usually much larger than m.
Simple Uniform Hashing
• We assume a hash function that, given a key, hashes the key into any of the m slots with equal probability.
• We will try to provide some reasonable hash functions later.
Hash functions
• The hash function is responsible for mapping keys to integers (slot numbers). A good hash function has the following properties:
– 1. Easy to evaluate: h(x) can be computed in O(1) time.
– 2. Uniform distribution over all the table slots.
– 3. Similar keys are mapped to different slots.
Hash functions
• The first step is to represent the key as a natural number.
• For example, if the key is a String, we can interpret it as an integer value using the formula
$\sum_{i=0}^{\text{keylength}-1} 128^{i} \cdot \text{char}(\text{key}[i])$
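The radix-128 interpretation above can be sketched in Java as follows. The class name `StringKeys` and both method names are hypothetical. The sketch assumes zero-based character positions and uses `BigInteger` for the exact value, because the sum grows too quickly for a primitive type.

```java
import java.math.BigInteger;

class StringKeys {
    // Exact value of the sum over i of 128^i * key[i].
    static BigInteger toNatural(String key) {
        BigInteger value = BigInteger.ZERO;
        BigInteger radix = BigInteger.valueOf(128);
        // Horner's rule from the last character down, so key[i] ends up multiplied by 128^i.
        for (int i = key.length() - 1; i >= 0; i--) {
            value = value.multiply(radix).add(BigInteger.valueOf(key.charAt(i)));
        }
        return value;
    }

    // The same value reduced modulo m, computed without big numbers
    // by taking the remainder after every step.
    static int hash(String key, int m) {
        long h = 0;
        for (int i = key.length() - 1; i >= 0; i--) {
            h = (h * 128 + key.charAt(i)) % m;
        }
        return (int) h;
    }
}
```

In practice only the reduced form is needed for a table of m slots; the exact form is shown to make the correspondence with the formula visible.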
Collisions
• Mapping keys to indices can cause collisions: two distinct keys $k_1 \neq k_2$ may be mapped by the hash function to the same index:
$h(k_1) = h(k_2)$
• Solutions:
– Chaining
– Open addressing
Collision resolution - Chaining
• All keys that have the same hash value are placed in a linked list
• Insertion can be done at the beginning of the list in O(1) time
• Search time is proportional to the length of the list.
Collision resolution by chaining
• Let T be a hash table with 9 slots and h(k) = k mod 9. Insert the elements:
6, 43, 23, 62, 1, 13, 34, 55, 25
h(6) = 6 mod 9 = 6
h(43) = 43 mod 9 = 7
h(23) = 23 mod 9 = 5
h(62) = 62 mod 9 = 8
h(1) = 1 mod 9 = 1
h(13) = 13 mod 9 = 4
h(34) = 34 mod 9 = 7
h(55) = 55 mod 9 = 1
h(25) = 25 mod 9 = 7
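The chaining example can be replayed with a small Java sketch. The class `ChainedHashTable` and its methods are illustrative names, and the sketch assumes integer keys with h(k) = k mod m and insertion at the head of each list, as described on the previous slide.

```java
import java.util.LinkedList;

class ChainedHashTable {
    private final LinkedList<Integer>[] slots; // one linked list (chain) per slot

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        slots = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            slots[i] = new LinkedList<>();
        }
    }

    private int h(int k) {
        return Math.floorMod(k, slots.length); // k mod m, also safe for negative k
    }

    void insert(int k) {
        slots[h(k)].addFirst(k); // O(1): new keys go to the head of the chain
    }

    boolean find(int k) {
        return slots[h(k)].contains(k); // walks one chain only
    }

    boolean remove(int k) {
        return slots[h(k)].remove(Integer.valueOf(k));
    }

    LinkedList<Integer> chain(int slot) {
        return slots[slot];
    }
}
```

With m = 9 and the nine keys above inserted in order, slot 7 ends up holding the chain 25, 34, 43 and slot 1 holds 55, 1, since later keys are placed in front of earlier ones.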
Analysis
• The load factor of a hashtable is defined as the number of elements stored in the table divided by the number of slots:
$\alpha = n/m$
• Under the assumption of simple uniform hashing, a search takes $\Theta(1 + \alpha)$ time on average.
Division method
• An appropriate hash function for a hashtable that uses chaining is the division method:
$h(k) = k \bmod m$
• Powers of 10 and 2 should be avoided as values of m.
• Good values of m are primes not close to powers of 2.
Open Addressing
• Each element occupies a single slot in the hashtable. No chaining is done
• To insert an element, we probe the table according to the hash function until an empty slot is found.
• The hash function is now a function of both the key and the number of attempts made so far in the insertion process:
$h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$
Hash Insert
• HashInsert(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);              // slot examined on probe number i
        if (T[j] == null) break;  // empty slot found
    }
    if (i < m)
        T[j] = k;
    else
        error "hashtable overflow";
}
Hash Search
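• HashSearch(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);
        if (T[j] == null)
            return not found;     // an empty slot ends the probe sequence
        else if (T[j] == k)
            return j;
    }
    return not found;             // all m slots were probed without finding k
}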
Linear probing
• Linear probing takes an ordinary hash function h', such as one based on the division method, and turns it into:
$h(k, i) = (h'(k) + i) \bmod m$
• If a slot is occupied, we try the subsequent slot, and so on. Thus the initial slot determines the entire probing sequence for insertion and search.
Linear Probing
• Easy to implement but suffers from primary clustering.
• An empty slot that follows a run of occupied slots is more likely to be filled next than any other empty slot, so runs of occupied slots tend to grow longer.
Linear Probing
• Given a hash function h', the linear probing scheme is simply
$h(k, i) = (h'(k) + i) \bmod m$
Exercise
• You are given a hash table T with 11 slots. Demonstrate inserting the following elements using linear probing and the hash function h'(k) = k mod m, where m = 11:
– 10, 22, 31, 4, 15, 28, 17, 88, 59
Solution
• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15,1) = (15 mod 11 + 1) mod 11 = 5
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17,1) = (17 mod 11 + 1) mod 11 = 7
• h(88,0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88,1) = (88 mod 11 + 1) mod 11 = 1
• h(59,0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59,1) = (59 mod 11 + 1) mod 11 = 5 (occupied)
• h(59,2) = (59 mod 11 + 2) mod 11 = 6 (occupied)
• h(59,3) = (59 mod 11 + 3) mod 11 = 7 (occupied)
• h(59,4) = (59 mod 11 + 4) mod 11 = 8
Resulting table:
Slot: 0   1   2  3  4  5   6   7   8   9   10
Key:  22  88  -  -  4  15  28  17  59  31  10
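The exercise can be checked with a short Java sketch of open addressing with linear probing. `LinearProbingTable` is an illustrative name. The sketch assumes integer keys, h'(k) = k mod m, and no deletions.

```java
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int m) {
        slots = new Integer[m];
    }

    // h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m
    private int probe(int k, int i) {
        int m = slots.length;
        return (Math.floorMod(k, m) + i) % m;
    }

    // Returns the slot in which k was stored, or -1 if the table is full.
    int insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) {
                slots[j] = k;
                return j;
            }
        }
        return -1; // hashtable overflow
    }

    // Returns the slot holding k, or -1 if k is not in the table.
    int search(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) return -1; // an empty slot ends the probe sequence
            if (slots[j] == k) return j;
        }
        return -1;
    }
}
```

Inserting 10, 22, 31, 4, 15, 28, 17, 88, 59 into a table of 11 slots places 59 in slot 8 after four collisions, which matches the worked solution.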
Quadratic Probing
• Quadratic probing again uses an initial hash function h', and the hash function is now
$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
• When a slot is occupied, the choice of the next slot depends on the probe number i.
• Quadratic probing suffers from a secondary form of clustering, since the initial probe alone determines the entire probing sequence:
$h(k_1, i) = h(k_2, i) \Rightarrow h(k_1, i+1) = h(k_2, i+1)$
Quadratic Probing
• Given a hash function h', quadratic probing is done by:
$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
Example
• You are given a hash table T with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function
$h(k, i) = (k \bmod m + i + 3i^2) \bmod m$
– 10, 22, 31, 4, 15, 28, 17, 88, 59
• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15,1) = (15 mod 11 + 1 + 3) mod 11 = 8
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17,1) = (17 mod 11 + 1 + 3) mod 11 = 10 (occupied)
• h(17,2) = (17 mod 11 + 2 + 12) mod 11 = 9 (occupied)
• h(17,3) = (17 mod 11 + 3 + 27) mod 11 = 3
• h(88,0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88,1) = (88 mod 11 + 1 + 3) mod 11 = 4 (occupied)
• h(88,2) = (88 mod 11 + 2 + 12) mod 11 = 3 (occupied)
• h(88,3) = (88 mod 11 + 3 + 27) mod 11 = 8 (occupied)
• h(88,4) = (88 mod 11 + 4 + 48) mod 11 = 8 (occupied)
• h(88,5) = (88 mod 11 + 5 + 75) mod 11 = 3 (occupied)
• h(88,6) = (88 mod 11 + 6 + 108) mod 11 = 4 (occupied)
• h(88,7) = (88 mod 11 + 7 + 147) mod 11 = 0 (occupied)
• h(88,8) = (88 mod 11 + 8 + 192) mod 11 = 2
• h(59,0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59,1) = (59 mod 11 + 1 + 3) mod 11 = 8 (occupied)
• h(59,2) = (59 mod 11 + 2 + 12) mod 11 = 7
Resulting table:
Slot: 0   1  2   3   4  5  6   7   8   9   10
Key:  22  -  88  17  4  -  28  59  15  31  10
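The quadratic probing example can be replayed with the following Java sketch. `QuadraticProbing`, `probe` and `insertAll` are illustrative names. The sketch assumes non-negative constants c1 and c2 and takes h'(k) = k mod m, as in the example.

```java
class QuadraticProbing {
    // h(k, i) = (k mod m + c1*i + c2*i^2) mod m
    static int probe(int k, int i, int m, int c1, int c2) {
        long offset = (long) c1 * i + (long) c2 * i * i; // long avoids int overflow
        return (int) ((Math.floorMod(k, m) + offset) % m);
    }

    // Inserts the keys in order into a fresh table of m slots and returns it;
    // null marks an empty slot.
    static Integer[] insertAll(int m, int c1, int c2, int... keys) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            boolean placed = false;
            for (int i = 0; i < m && !placed; i++) {
                int j = probe(k, i, m, c1, c2);
                if (table[j] == null) {
                    table[j] = k;
                    placed = true;
                }
            }
            if (!placed) {
                throw new IllegalStateException("hashtable overflow for key " + k);
            }
        }
        return table;
    }
}
```

Calling `insertAll(11, 1, 3, 10, 22, 31, 4, 15, 28, 17, 88, 59)` reproduces the table above. Note that key 88 needs nine probes and revisits slots 8, 3, 4 and 0 along the way: a quadratic probe sequence does not necessarily visit every slot, so an insertion can fail even when the table still has free slots.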
Double Hashing
• Given two hash functions $h_1, h_2$:
$h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$
• Problem: $h_2(k)$ and m should not have any common divisors other than 1.
Double Hashing
• Example 1:
select m to be a power of 2, and design $h_2$ to produce odd numbers.
• Example 2:
select m to be prime, and m' to be m-1.
$h_1(k) = k \bmod m$
$h_2(k) = 1 + (k \bmod m')$
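A minimal Java sketch of Example 2, assuming m is prime and m' = m - 1. `DoubleHashing`, `probe` and `probeSequence` are illustrative names.

```java
class DoubleHashing {
    // h(k, i) = (h1(k) + i * h2(k)) mod m,
    // with h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)).
    static int probe(int k, int i, int m) {
        int h1 = Math.floorMod(k, m);
        int h2 = 1 + Math.floorMod(k, m - 1); // always in 1..m-1, never 0
        return (int) ((h1 + (long) i * h2) % m);
    }

    // The full probe sequence h(k, 0), ..., h(k, m - 1).
    static int[] probeSequence(int k, int m) {
        int[] sequence = new int[m];
        for (int i = 0; i < m; i++) {
            sequence[i] = probe(k, i, m);
        }
        return sequence;
    }
}
```

Because m is prime and h2(k) lies between 1 and m - 1, the two share no common divisor other than 1, so the sequence visits every slot exactly once. For instance, with m = 11 the keys 4, 15 and 59 all start at slot 4 but then continue with step sizes 5, 6 and 10 respectively, so their probe sequences diverge after the first probe.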
Analysis
• In open addressing the load factor $\alpha$ cannot be more than 1.
• Insertion and unsuccessful search require at most $\frac{1}{1-\alpha}$ probes on average.
• A successful search takes at most $\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$ probes on average.
Analysis
• When the table is 50% full, a successful search requires at most 1.387 probes on average.
• When the table is 90% full, a successful search requires at most 2.559 probes on average.
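As a quick check of these figures, the successful-search bound from the previous slide can be evaluated at α = 0.5 and α = 0.9. The unsuccessful-search bound is shown next to it for comparison. The values are rounded, and the bounds assume uniform hashing.

```latex
\begin{align*}
\alpha = 0.5 &: \quad \frac{1}{\alpha}\ln\frac{1}{1-\alpha} = 2 \ln 2 \approx 1.386,
  \qquad \frac{1}{1-\alpha} = 2 \\
\alpha = 0.9 &: \quad \frac{1}{\alpha}\ln\frac{1}{1-\alpha} = \frac{\ln 10}{0.9} \approx 2.558,
  \qquad \frac{1}{1-\alpha} = 10
\end{align*}
```

The gap between the two columns widens quickly as the table fills: at 90% load a miss costs up to 10 probes on average while a hit costs fewer than 3.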
Problems with open addressing
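• If an element is deleted, we cannot simply empty its slot, since later searches for keys whose probe sequence passed through that slot would stop early and fail. Rehashing the remaining elements after every deletion would ruin the running time.
• Solution: mark the slot with a special DELETED value. A search continues past a DELETED slot, and an insertion may reuse it.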
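The DELETED marker can be sketched in Java as follows, using linear probing for simplicity. `TombstoneTable` is an illustrative name, and the sketch assumes the caller never inserts a key that is already present (the slides' HashInsert does not check for this either).

```java
class TombstoneTable {
    private static final Object DELETED = new Object(); // marker for a removed key
    private final Object[] slots;                       // null = never used

    TombstoneTable(int m) {
        slots = new Object[m];
    }

    private int probe(int k, int i) {
        return (Math.floorMod(k, slots.length) + i) % slots.length;
    }

    // Stores k in the first empty or DELETED slot of its probe sequence.
    boolean insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null || slots[j] == DELETED) {
                slots[j] = k;
                return true;
            }
        }
        return false; // hashtable overflow
    }

    // Returns the slot holding k, or -1 if k is absent.
    int search(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) return -1;    // a never-used slot ends the search
            if (slots[j].equals(k)) return j;   // DELETED never equals a key
        }
        return -1;
    }

    boolean delete(int k) {
        int j = search(k);
        if (j < 0) return false;
        slots[j] = DELETED; // keep the probe sequence intact for later searches
        return true;
    }
}
```

With 11 slots, inserting 4, 15 and 26 fills slots 4, 5 and 6. After deleting 15, a search for 26 still succeeds because it steps over the DELETED marker in slot 5 instead of stopping there.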
Rehashing
• If we do not know the number of elements in advance, we use a technique similar to the one used in vectors: once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
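A minimal Java sketch of rehashing on top of linear probing. The names `ResizingTable`, `MAX_LOAD` and `capacity` are hypothetical, the threshold of 0.5 and the doubling policy are assumptions chosen for illustration, and keys are assumed to be distinct. Doubling keeps the code short but ignores the earlier advice to prefer prime table sizes.

```java
class ResizingTable {
    private static final double MAX_LOAD = 0.5; // assumed threshold, not from the slides
    private Integer[] slots = new Integer[4];   // null marks an empty slot
    private int size = 0;

    void insert(int k) {
        // Grow before the load factor would exceed the threshold.
        if (size + 1 > MAX_LOAD * slots.length) {
            rehash(2 * slots.length);
        }
        place(slots, k);
        size++;
    }

    boolean contains(int k) {
        int m = slots.length;
        for (int i = 0; i < m; i++) {
            Integer s = slots[(Math.floorMod(k, m) + i) % m];
            if (s == null) return false;
            if (s == k) return true;
        }
        return false;
    }

    int capacity() {
        return slots.length;
    }

    // Every key must be re-inserted, because k mod m changes with m.
    private void rehash(int newM) {
        Integer[] bigger = new Integer[newM];
        for (Integer k : slots) {
            if (k != null) place(bigger, k);
        }
        slots = bigger;
    }

    private static void place(Integer[] table, int k) {
        int m = table.length;
        int j = Math.floorMod(k, m);
        while (table[j] != null) {
            j = (j + 1) % m; // terminates because the table is never full
        }
        table[j] = k;
    }
}
```

As with vectors, a single rehash costs time proportional to the number of stored elements, but it happens rarely enough that the cost averaged over all insertions stays constant.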
Example
• Given a set S of unique integers and a number z, find $x, y \in S$ such that x + y = z.
– An efficient worst case algorithm
– An efficient average case algorithm
An efficient worst case algorithm
• 1. Sort all elements in S: $O(n \log n)$.
• 2. For every x in S, search for y = z-x in S using binary search: $n \cdot O(\log n) = O(n \log n)$.
Total: $O(n \log n)$
An efficient average case algorithm
• 1. We use a hash table where m is of order n. For all $x \in S$ we execute insert(x): $n \cdot \Theta(1) = \Theta(n)$.
• 2. For all $x \in S$ we execute search(z-x): $n \cdot \Theta(1) = \Theta(n)$.
Total, average case: $\Theta(n)$
Total, worst case: $\Theta(n^2)$
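The two steps of the average case algorithm can be sketched in Java with the standard `java.util.HashSet` standing in for the hash table. `PairSum` and `findPair` are illustrative names. Like the steps as written, the sketch does not require x and y to be different elements.

```java
import java.util.HashSet;
import java.util.Set;

class PairSum {
    // Returns {x, y} with x + y == z, or null if S contains no such pair.
    static int[] findPair(int[] s, int z) {
        Set<Integer> table = new HashSet<>();
        for (int x : s) {
            table.add(x);                       // step 1: n inserts
        }
        for (int x : s) {                       // step 2: n searches
            long y = (long) z - x;              // long avoids int overflow
            if (y >= Integer.MIN_VALUE && y <= Integer.MAX_VALUE
                    && table.contains((int) y)) {
                return new int[] {x, (int) y};
            }
        }
        return null;
    }
}
```

For example, `findPair(new int[] {1, 5, 9, 14}, 15)` returns the pair 1 and 14, while a target of 7 on the set {1, 5, 9} yields null.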
Example
• Given a set S of sortable items, we are asked whether all items in S are unique.
• 1. Sort the elements of S: $O(n \log n)$.
• 2. Iterate over the elements of S, looking for adjacent equal values: $\Theta(n)$.
• Execution time: $O(n \log n)$
Example
• 1. Use a hash table where m is of order n. For all $x \in S$ we execute insert(x). We modify the insert operation to signal if x already exists in the table
(every insert includes a search operation).
• Execution time, average case: $n \cdot \Theta(1) = \Theta(n)$
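In Java the modified insert already exists: `Set.add` returns false when the element is already present. A minimal sketch, with `Uniqueness` and `allUnique` as illustrative names, assuming the items implement `equals` and `hashCode` consistently:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Uniqueness {
    // True if no two items in the list are equal.
    static <T> boolean allUnique(List<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) {
                return false; // add signals that the item was already in the table
            }
        }
        return true;
    }
}
```

Unlike the sorting approach, this version does not need the items to be sortable, only hashable.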
Java hashCode
• Every Java object has a method public int hashCode(), which is defined in class Object and exists to support hash-based collections such as Hashtable and HashMap.
• The default implementation typically returns distinct integers for distinct objects (it is often derived from the object's identity or memory location), but uniqueness is not guaranteed.
• If two objects are equal according to equals(), they must have the same hash code.
Java hashCode
• It is not required that distinct objects have distinct hash codes, but distinct hash codes improve the performance of hashtables.
• Can the hash code of an object change throughout its life cycle?
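A minimal Java sketch of a class that keeps equals() and hashCode() consistent. `GridPoint` is a hypothetical example class. Its fields are final, so the hash code is computed from state that cannot change after construction.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    // Built from exactly the fields that equals() compares,
    // so equal points always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Two separately constructed `GridPoint(1, 2)` objects are equal and share a hash code, so a `HashSet` that contains one of them reports that it contains the other. If the hash code depended on a field that is modified while the object sits in a hash-based collection, later lookups could fail to find it.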