Hashing Powerpoint
Uploaded by marcus-aurelius
Transcript of Hashing Powerpoint
-
7/26/2019 Hashing Powerpoint
Dictionaries, Tables
Hashing
TCSS 342
-
The Dictionary ADT
A dictionary (table) is an abstract model of a database.
Like a priority queue, a dictionary stores key-element pairs.
The main operation supported by a dictionary is searching by key.
-
Examples
Telephone directory
Library catalogue
Books in print: key ISBN
FAT (File Allocation Table)
-
Main Issues
Size
Operations: search, insert, delete; also create reports? list the items?
What will be stored in the dictionary?
How will items be identified?
-
The Dictionary ADT
simple container methods:
size()
isEmpty()
elements()
query methods:
findElement(k)
findAllElements(k)
-
The Dictionary ADT
update methods:
insertItem(k, e)
removeElement(k)
removeAllElements(k)
special element:
NO_SUCH_KEY, returned by an unsuccessful search
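A minimal Java sketch of the ADT above, written as an interface. The name DictionaryADT, the Object-typed keys and elements (matching the insertItem code later in these slides), the Iterator return types, and modeling NO_SUCH_KEY as a unique sentinel object are all assumptions for illustration.

```java
import java.util.Iterator;

// Hypothetical interface for the Dictionary ADT; signatures are assumptions.
interface DictionaryADT {
    // Sentinel returned by an unsuccessful search (modeled as a unique object).
    Object NO_SUCH_KEY = new Object();

    // simple container methods
    int size();
    boolean isEmpty();
    Iterator<Object> elements();

    // query methods
    Object findElement(Object k);               // some element with key k, or NO_SUCH_KEY
    Iterator<Object> findAllElements(Object k); // every element with key k

    // update methods
    void insertItem(Object k, Object e);
    Object removeElement(Object k);               // remove one item with key k; return its element or NO_SUCH_KEY
    Iterator<Object> removeAllElements(Object k); // remove every item with key k; return their elements
}
```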
-
Implementing a Dictionary with a Sequence
Unordered sequence
searching and removing take O(n) time
inserting takes O(1) time
application to log files (frequent insertions, rare searches and removals)
34 14 12 22 18
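A sketch of the unordered (log-file) implementation, assuming an ArrayList of key-element pairs and using null in place of NO_SUCH_KEY; the class name LogFileDictionary is hypothetical. Appending is amortized O(1); search and removal scan the list.

```java
import java.util.ArrayList;

// Log-file dictionary: items kept in an unordered list (hypothetical class).
class LogFileDictionary {
    private final ArrayList<Object[]> items = new ArrayList<>(); // each entry is {key, element}

    // Amortized O(1): append at the end, no ordering maintained.
    void insertItem(Object k, Object e) {
        items.add(new Object[] {k, e});
    }

    // O(n): scan until the key matches.
    Object findElement(Object k) {
        for (Object[] item : items) {
            if (item[0].equals(k)) return item[1];
        }
        return null; // stands in for NO_SUCH_KEY
    }

    // O(n): locate the item, then fill its slot with the last item (order does not matter).
    Object removeElement(Object k) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i)[0].equals(k)) {
                Object e = items.get(i)[1];
                Object[] last = items.remove(items.size() - 1);
                if (i < items.size()) items.set(i, last); // skip if the removed item was the last one
                return e;
            }
        }
        return null; // stands in for NO_SUCH_KEY
    }

    int size() { return items.size(); }
}
```

Because order does not matter, removal can move the last item into the hole instead of shifting everything after it.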
-
Implementing a Dictionary with a Sequence
Array-based ordered sequence (assumes keys can be ordered)
searching takes O(log n) time (binary search)
inserting and removing take O(n) time
application to look-up tables (frequent searches, rare insertions and removals)
12 14 18 22 34
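A sketch of the ordered look-up table, assuming int keys in a growable array (the class name SortedArrayTable is hypothetical). Search uses java.util.Arrays.binarySearch; insertion and removal shift keys, which is where the O(n) cost comes from.

```java
import java.util.Arrays;

// Look-up table with keys kept sorted in an array (hypothetical class, int keys assumed).
class SortedArrayTable {
    private int[] keys = new int[16];
    private int n = 0; // number of keys in use: keys[0..n-1]

    // O(log n): binary search over the occupied prefix
    boolean contains(int k) {
        return Arrays.binarySearch(keys, 0, n, k) >= 0;
    }

    // O(n): shift larger keys one slot right, then drop k into the gap
    void insert(int k) {
        if (n == keys.length) keys = Arrays.copyOf(keys, 2 * n); // grow when full
        int i = n;
        while (i > 0 && keys[i - 1] > k) {
            keys[i] = keys[i - 1];
            i--;
        }
        keys[i] = k;
        n++;
    }

    // O(n): close the gap by shifting later keys one slot left
    boolean remove(int k) {
        int i = Arrays.binarySearch(keys, 0, n, k);
        if (i < 0) return false;
        System.arraycopy(keys, i + 1, keys, i, n - i - 1);
        n--;
        return true;
    }

    int[] toArray() { return Arrays.copyOf(keys, n); }
}
```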
-
Binary Search
Narrow down the search range in stages (like the high-low guessing game).
findElement(22)
2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37
low at 2, mid at 14, high at 37
-
Binary Search
2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37
22 > 14, so low moves to 17: low at 17, mid at 25, high at 37
22 < 25, so high moves to 22: low at 17, mid at 19, high at 22
22 > 19, so low moves to 22: low = mid = high at 22, and the key is found
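The trace above can be reproduced with a short iterative version that records the key examined at each mid; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Iterative binary search that returns the sequence of keys probed at mid.
class BinarySearchTrace {
    static List<Integer> probes(int[] s, int k) {
        List<Integer> probed = new ArrayList<>();
        int low = 0, high = s.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            probed.add(s[mid]);
            if (k == s[mid]) break;              // found
            else if (k < s[mid]) high = mid - 1; // discard the upper half
            else low = mid + 1;                  // discard the lower half
        }
        return probed;
    }
}
```

Calling probes on the array above with k = 22 visits 14, 25, 19, 22, matching the slides.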
-
Pseudocode for Binary Search
Algorithm BinarySearch(S, k, low, high)
    if low > high then
        return NO_SUCH_KEY
    else
        mid ← ⌊(low + high) / 2⌋
        if k = key(mid) then
            return elem(mid)
        else if k < key(mid) then
            return BinarySearch(S, k, low, mid - 1)
        else
            return BinarySearch(S, k, mid + 1, high)
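A direct Java transcription of the pseudocode, assuming the sequence S is stored as two parallel arrays (sorted keys and their elements) and NO_SUCH_KEY is a sentinel object; these representation choices are assumptions.

```java
// Recursive binary search following the pseudocode (hypothetical class).
class RecursiveBinarySearch {
    static final Object NO_SUCH_KEY = new Object(); // sentinel for an unsuccessful search

    static Object binarySearch(int[] keys, Object[] elems, int k, int low, int high) {
        if (low > high) return NO_SUCH_KEY;  // empty range: key is absent
        int mid = (low + high) >>> 1;        // floor((low + high) / 2) without int overflow
        if (k == keys[mid]) return elems[mid];
        else if (k < keys[mid]) return binarySearch(keys, elems, k, low, mid - 1);
        else return binarySearch(keys, elems, k, mid + 1, high);
    }
}
```

The initial call covers the whole array: binarySearch(keys, elems, k, 0, keys.length - 1).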
-
Running Time of Binary Search
The range of candidate items to be searched is halved after each comparison.
Comparison    Search Range
0             $n$
1             $n/2$
...           ...
$i$           $n/2^i$
$\log_2 n$    1
-
Running Time of Binary Search
In the array-based implementation, access by rank takes O(1) time, so binary search runs in O(log n) time.
Binary search is applicable only to random-access structures (arrays, vectors).
-
Implementations
Sorted? Unsorted?
Elementary: arrays, vectors, linked lists
Organization: none (log file), sorted, hashed
Advanced: balanced trees
-
Skip Lists
Simulate binary search on a linked list.
A linked list allows easy insertion and deletion.
http://www.epaperpress.com/s_man.html
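A compact skip-list sketch (search and insert only) showing how the "move right, else drop down a level" search imitates binary search on linked nodes. The coin-flip promotion with probability 1/2, the 16-level cap, and the fixed random seed are assumptions of this sketch, not details from the slides.

```java
import java.util.Random;

// Minimal skip list of int keys (hypothetical class; no deletion).
class SkipList {
    private static final int MAX_LEVEL = 16;

    private static final class Node {
        final int key;
        final Node[] next; // next[i] is the successor on level i
        Node(int key, int levels) { this.key = key; this.next = new Node[levels]; }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL); // sentinel, never compared
    private final Random rng = new Random(42);
    private int level = 1; // number of levels currently in use

    // Move right while the next key is smaller; otherwise drop down one level.
    boolean contains(int key) {
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key < key) x = x.next[i];
        }
        x = x.next[0];
        return x != null && x.key == key;
    }

    // Ordinary linked-list insertion on every level the new node reaches.
    void insert(int key) {
        Node[] update = new Node[MAX_LEVEL]; // last node before key on each level
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key < key) x = x.next[i];
            update[i] = x;
        }
        if (x.next[0] != null && x.next[0].key == key) return; // ignore duplicates
        int lvl = 1;
        while (lvl < MAX_LEVEL && rng.nextBoolean()) lvl++;  // coin flips pick the height
        for (int i = level; i < lvl; i++) update[i] = head; // brand-new levels start at head
        level = Math.max(level, lvl);
        Node node = new Node(key, lvl);
        for (int i = 0; i < lvl; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
    }
}
```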
-
A FAT Example
Directory: key: file name. Data: time, date, size, and the location of the first block in the FAT.
If the first block is in physical location #23 (disk block number), look up position #23 in the FAT. That entry either marks the end of the file or holds the number of the next block on disk.
Example: directory entry: first block #4
FAT index:  0  1  2  3  4  5  6  7  8  9  10
FAT entry:  x  x  x  F  5  6  10 x  23 25 3
The file occupies blocks 4, 5, 6, 10, 3 (F marks the end of the file).
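Following the chain in the example takes one FAT lookup per block. In this sketch the slide's F is modeled as END = -1 and each x as FREE = -2; both encodings and the class name are assumptions, and the chain is assumed to be well formed.

```java
import java.util.ArrayList;
import java.util.List;

// Walks a FAT chain: fat[b] is the block after b, or END (hypothetical encoding).
class FatChain {
    static final int END = -1, FREE = -2;

    static List<Integer> blocksOf(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != END; b = fat[b]) blocks.add(b); // one lookup per block
        return blocks;
    }
}
```

With the table above and first block 4, the walk visits 4, 5, 6, 10, 3 and stops at the F stored in entry 3.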
-
Hashing
Place the item with key k in position h(k).
Hope: h is one-to-one (distinct keys go to distinct positions).
Requires: unique keys (unless multiple items with the same key are allowed). Keys must be protected from change (use an abstract class that provides only a constructor).
Keys must be comparable.
-
Hashing Problem
RT&T is a large phone company, and they want to provide enhanced caller ID capability:
given a phone number, return the caller's name
phone numbers are in the range 0 to $R = 10^{10} - 1$
n is the number of phone numbers used
want to do this as efficiently as possible
-
Generalized indexing
Hash table
Data storage associated with a key
The key need not be an integer
-
Hash Tables
A data structure
The location of an item is determined directly as a function of the item itself, not by a sequence of trial-and-error comparisons
Commonly used to provide faster searching:
O(n) for linear search
O(log n) for binary search
O(1) expected for a hash table
-
Another Solution
A hash table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table
Like an array, but with a function to map the large range of keys into a smaller one
e.g., take the original key, mod the size of the table, and use that as an index
-
Example
Insert item (401-863-7639, Roberto) into a table of size 5
4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table
A lookup uses the same process: map the key to an index, then check the array cell at that index
Slots 0 to 3: empty; slot 4: (401-863-7639, Roberto)
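The compression map in this example can be sketched as follows, assuming the key is the phone number with its dashes removed, read as a long; the class name PhoneHash is hypothetical.

```java
// Division-method compression map for phone-number keys (hypothetical class).
class PhoneHash {
    static int slot(String phone, int tableSize) {
        long key = Long.parseLong(phone.replace("-", "")); // "401-863-7639" -> 4018637639
        return (int) (key % tableSize);                     // keys are non-negative here
    }
}
```

slot("401-863-7639", 5) gives 4. The same map sends 401-863-9350 to slot 0 and 401-863-2234 to slot 4.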
-
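Collision
Insert (401-863-9350, Andy): 4018639350 mod 5 = 0, so it goes in slot 0.
Then insert (401-863-2234, Devin): 4018632234 mod 5 = 4, but slot 4 already holds Roberto. We have a collision!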
-
Collision Resolution
How do we deal with two keys that map to the same cell of the array?
Use chaining: set up lists of items with the same index
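A sketch of chaining with one java.util.LinkedList per cell, assuming long keys, String values, and the same key mod N compression map as the phone example; the class name ChainedHashTable is hypothetical.

```java
import java.util.LinkedList;

// Separate chaining: each cell holds a list of the entries that hash there (hypothetical class).
class ChainedHashTable {
    private static final class Entry {
        final long key;
        String value;
        Entry(long key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;
    private final int n;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int n) {
        this.n = n;
        buckets = (LinkedList<Entry>[]) new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    private int index(long key) { return (int) Math.floorMod(key, (long) n); } // key mod N, never negative

    void put(long key, String value) {
        for (Entry e : buckets[index(key)]) {
            if (e.key == key) { e.value = value; return; } // existing key: replace its value
        }
        buckets[index(key)].add(new Entry(key, value));    // colliding keys share the list
    }

    String get(long key) {
        for (Entry e : buckets[index(key)]) if (e.key == key) return e.value;
        return null; // stands in for NO_SUCH_KEY
    }

    int chainLength(int slot) { return buckets[slot].size(); }
}
```

Inserting Roberto, Andy, and Devin into a table of size 5 leaves two entries on the chain at slot 4 and one at slot 0.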
-
Hash Function
The function h defined by h(i) = i determines the location of an item i in the hash table.
Such a function is called a hash function.
To reduce the large size of the hash table, use h(i) = i mod 25.
-
Popular Hash-Code Maps
The polynomial is computed with Horner's rule, ignoring overflows, at a fixed value x:
$a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-2} + x\,a_{n-1})\cdots))$
The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words.
Why is the component-sum hash code bad for strings?
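The polynomial above can be evaluated with Horner's rule in Java, where int arithmetic wraps around, so overflow is ignored automatically. The component-sum version is included for contrast; both method names are hypothetical, and a_i is taken to be the i-th character code.

```java
// Polynomial hash via Horner's rule vs. component sum (hypothetical class).
class StringHashCodes {
    // a0 + x(a1 + x(a2 + ... + x(a_{n-2} + x a_{n-1})...)), with a_i = s.charAt(i)
    static int polynomial(String s, int x) {
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = s.charAt(i) + x * h; // innermost term first; int overflow wraps silently
        }
        return h;
    }

    // Sum of character codes: the order of characters is ignored.
    static int componentSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h += s.charAt(i);
        return h;
    }
}
```

The anagrams "stop" and "pots" have the same component sum, because a sum ignores character order, while their polynomial codes at x = 33 differ. This is one answer to the question above.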
-
Random Hashing
Random hashing
uses a simple random-number generation technique
scatters the items randomly throughout the hash table
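One common way to realize random hashing is to seed a pseudo-random generator with the key, so the same key always lands in the same slot. This sketch assumes java.util.Random as the generator and a hypothetical class name; the slides do not say which generator they use.

```java
import java.util.Random;

// Random hashing sketch: the key seeds the generator, so the slot is repeatable.
class RandomHash {
    static int slot(long key, int tableSize) {
        return new Random(key).nextInt(tableSize); // same seed -> same sequence -> same slot
    }
}
```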
-
Hash code
static int hashCode(long i) {
    // fold a 64-bit key into 32 bits: high 32 bits + low 32 bits, overflow ignored
    return (int) ((i >> 32) + (int) i);
}
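For comparison, java.lang.Long.hashCode(long) folds the two halves with XOR, (int)(value ^ (value >>> 32)), rather than adding them. The sketch below puts the slide's addition next to the XOR version; the class and method names are hypothetical.

```java
// Two ways to fold a 64-bit long into a 32-bit hash code (hypothetical class).
class LongFolding {
    static int sumFold(long i) {
        return (int) ((i >> 32) + (int) i); // slide's version: high half + low half
    }

    static int xorFold(long i) {
        return (int) (i ^ (i >>> 32));      // same formula as Long.hashCode(long)
    }
}
```

For a key such as 2^32 + 7, addition gives 8 while XOR gives 6; both keep information from the high half instead of simply truncating it.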
-
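Linear Probing Hash Table
/** Constructor providing the hash comparator and the capacity
 *  of the bucket array. */
public LinearProbingHashTable(HashComparator hc, int bN) {
    h = hc;          // hash comparator, supplies hashValue(key)
    N = bN;          // capacity of the bucket array
    A = new Item[N]; // bucket array; every cell starts out null (empty)
}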
-
Dictionary
public void insertItem(Object key, Object element)
        throws InvalidKeyException {
    check(key);                   // validate the key (may throw InvalidKeyException)
    int i = h.hashValue(key) % N; // division method compression map
    int j = i;                    // probe index, starting at the home slot i
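The fragment above stops after choosing the starting slot. A self-contained sketch of how linear probing typically continues is below; it assumes long keys and String values, uses a hypothetical class name, and does not use the slides' HashComparator, Item, or check(key), whose code is not shown.

```java
// Linear probing sketch: on a collision, try the next cell, wrapping around (hypothetical class).
class LinearProbingTable {
    private final long[] keys;
    private final String[] values; // a null value marks an empty cell
    private int size = 0;

    LinearProbingTable(int capacity) {
        keys = new long[capacity];
        values = new String[capacity];
    }

    private int hash(long key) {
        return (int) Math.floorMod(key, (long) keys.length); // division-method compression map
    }

    void insertItem(long key, String value) {
        if (value == null) throw new IllegalArgumentException("null values not supported");
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            int j = (i + probes) % keys.length;                  // try i, i+1, i+2, ... wrapping around
            if (values[j] == null) {                              // empty cell: store the item here
                keys[j] = key;
                values[j] = value;
                size++;
                return;
            }
            if (keys[j] == key) { values[j] = value; return; }   // same key: overwrite
        }
        throw new IllegalStateException("table is full");
    }

    String findElement(long key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            int j = (i + probes) % keys.length;
            if (values[j] == null) return null;   // an empty cell ends the probe sequence
            if (keys[j] == key) return values[j];
        }
        return null;                              // stands in for NO_SUCH_KEY
    }

    int size() { return size; }
}
```

With a table of size 5, Roberto goes to slot 4, Andy to slot 0, and Devin (home slot 4) probes slots 4 and 0 before landing in slot 1. The sketch omits removal: simply emptying a cell would cut off later probe sequences, so removal usually needs a special marker for deleted cells.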