Hash Tables. Group Members: Syed Husnain Bukhari SP10-BSCS-92 Ahmad Inam SP10-BSCS-06 M.Umair Sharif...

Hash Tables

Group Members:

Syed Husnain Bukhari SP10-BSCS-92

Ahmad Inam SP10-BSCS-06

M.Umair Sharif SP10-BSCS-38

Description

• A hash table is a data structure that stores items and allows insertions, lookups, and deletions to be performed in O(1) time on average; in the worst case, when many keys collide, an operation can take O(n) time.

• A hash function (the hash code generator) converts a key, typically a string, to a number. The number is then compressed to fit the size of the table and used as an index.

• Distinct keys may be mapped to the same index. This is called a collision, and it must be resolved.
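The two steps in the second bullet (hash code generation, then compression) can be sketched in Python as follows. This is a minimal illustration, assuming a polynomial string hash with multiplier 31 as the hash code generator and the modulo operation as the compression step; the function names are hypothetical, and the indices they produce will not necessarily match the figure.

```python
def hash_code(key: str) -> int:
    """Hash code generator: turn a string into a (possibly large) integer.

    Assumption: a polynomial hash with multiplier 31, chosen for illustration.
    """
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)
    return code


def compress(code: int, table_size: int) -> int:
    """Compression: map any integer onto an index in 0..table_size-1."""
    return code % table_size


def table_index(key: str, table_size: int = 10) -> int:
    """Full key-to-index mapping: hash code generator, then compression."""
    return compress(hash_code(key), table_size)


print(hash_code("ab"))          # 97 * 31 + 98 = 3105
print(table_index("ab", 1000))  # 3105 % 1000 = 105
```

Because the compression step throws information away, two different keys can end up with the same index, which is exactly the collision described in the third bullet.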

Figure: Key -> Hash Code Generator -> Number -> Compression -> Index. The key "Smith" is mapped to index 7 of a table with slots 0 to 9, where Bob Smith's record (123 Main St., Orlando, FL) is stored.

Definition:

Hashing is a key-to-address mapping process.

Terms to be familiar with:

• Collision: A collision occurs when a hashing algorithm produces an address for an insertion key and that address is already occupied.
• Home address: The address produced by the hashing algorithm is known as the home address.
• Prime area: The memory that contains all of the home addresses is known as the prime area.
• Probe: Each calculation of an address and test for success is known as a probe.

Collision Resolution

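• There are two kinds of collision resolution:
1 – Chaining makes each entry a linked list, so that when a collision occurs the new entry is added to the end of the list.
2 – Open addressing uses probing to discover an empty spot.

• With chaining, the table does not have to be resized to accept more entries, although the chains grow longer and lookups slow down as the number of elements rises. With open addressing, each slot holds at most one entry, so the table must be resized before the number of elements exceeds its capacity; in practice it is usually resized well before it is full.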

Figure (Chaining): the key "Smith" maps to index 7 of a table with slots 0 to 9. Bob Smith's record (123 Main St., Orlando, FL) and Jim Smith's record (123 Elm St., Orlando, FL) collide there, and both are kept in the chain at slot 7.
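A minimal chaining sketch follows, assuming integer keys and Python's built-in hash() followed by % as the hash and compression steps; a Python list stands in for each slot's linked list, and new entries go at the end of it. The class and method names are hypothetical, and the keys 17 and 27 are chosen only because they collide in a 10-slot table.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, size: int = 10):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per slot

    def _index(self, key) -> int:
        # hash() acts as the hash code generator; % compresses to a slot index.
        return hash(key) % self.size

    def insert(self, key, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: replace its value
                return
        chain.append((key, value))       # empty slot or collision: add to end of chain

    def lookup(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key) -> None:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)


table = ChainedHashTable(10)
table.insert(17, "Bob Smith")
table.insert(27, "Jim Smith")  # 27 % 10 == 7, so it collides with key 17
print(table.slots[7])          # [(17, 'Bob Smith'), (27, 'Jim Smith')]
```

The table never runs out of room, but every lookup in slot 7 now has to walk a two-entry chain.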

Figure (Probing): the key "Smith" maps to index 7, which already holds Bob Smith's record (123 Main St., Orlando, FL); Jim Smith's record (123 Elm St., Orlando, FL) is placed in an empty slot found by probing.
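Probing (open addressing) can be sketched with linear probing, which tries the next slot and wraps around. Linear probing is only one common strategy, and the source does not say which one the figure uses. The sketch assumes integer keys, uses hash(key) % size as the home address, and reserves None to mark an empty slot; the class name is hypothetical.

```python
class LinearProbingTable:
    """Open-addressing hash table that resolves collisions by linear probing."""

    def __init__(self, size: int = 10):
        self.size = size
        self.keys = [None] * size    # None marks an empty slot
        self.values = [None] * size

    def _home(self, key) -> int:
        return hash(key) % self.size  # home address

    def insert(self, key, value) -> int:
        index = self._home(key)
        for _ in range(self.size):                    # at most one probe per slot
            if self.keys[index] is None or self.keys[index] == key:
                self.keys[index] = key
                self.values[index] = value
                return index
            index = (index + 1) % self.size           # probe the next slot
        raise OverflowError("table is full; it must be resized")

    def lookup(self, key):
        index = self._home(key)
        for _ in range(self.size):
            if self.keys[index] is None:
                break                                 # empty slot: key is absent
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size
        raise KeyError(key)


table = LinearProbingTable(10)
print(table.insert(17, "Bob Smith"))  # 7: the home address is free
print(table.insert(27, "Jim Smith"))  # 8: slot 7 is taken, so slot 8 is probed
print(table.lookup(27))               # Jim Smith
```

Deletion is left out on purpose: removing an entry from an open-addressing table needs extra care (for example, marking the slot as deleted rather than empty) so that later probe sequences are not cut short.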

Hashing Methods

Eight common hashing methods are:

1: Direct method
2: Subtraction method
3: Modulo-division
4: Mid square
5: Digit extraction
6: Rotation
7: Folding
8: Pseudorandom generation

• Direct Method
In direct hashing, the key is the address, without any algorithmic manipulation. Direct hashing is limited, because the table needs a slot for every possible key, but it can be very powerful because it guarantees that there are no synonyms (distinct keys that hash to the same address) and therefore no collisions.
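A minimal sketch of direct hashing, assuming integer keys in the range 0 to max_key (for example, hypothetical employee numbers up to 100). The class name is hypothetical; the point is that the key is used as the array index unchanged.

```python
class DirectAddressTable:
    """Direct hashing: the key itself is the address, so no collisions occur.

    Assumption: keys are integers in the range 0..max_key.
    """

    def __init__(self, max_key: int):
        # One slot for every possible key; this is the method's main limitation.
        self.slots = [None] * (max_key + 1)

    def insert(self, key: int, value) -> None:
        self.slots[key] = value  # address == key, no manipulation

    def lookup(self, key: int):
        return self.slots[key]   # None if nothing was stored under this key


employees = DirectAddressTable(max_key=100)
employees.insert(42, "record for employee 42")
print(employees.lookup(42))  # record for employee 42
```

The array must be as large as the key range, so this only pays off when the keys are small and densely packed.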

• Modulo-division Method

This is also known as the division-remainder method.

This algorithm works with any list size, but a list size that is a prime number tends to produce fewer collisions than other list sizes.

The formula to calculate the address is:

Address = (key MODULO listsize) + 1

where listsize is the number of elements in the array. The + 1 shifts the remainder (0 to listsize - 1) into the range 1 to listsize.

• Example:

Given data: the keys are 137456, 214562, and 140145, and the list size is 19.

137456 % 19 + 1 = 11
214562 % 19 + 1 = 15
140145 % 19 + 1 = 2
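The modulo-division formula and the worked example translate directly into Python. The list size of 19 is taken from the example; the function name is hypothetical.

```python
def modulo_division_hash(key: int, list_size: int) -> int:
    """Address = (key MODULO list_size) + 1, so addresses run from 1 to list_size."""
    return key % list_size + 1


for key in (137456, 214562, 140145):
    print(key, "->", modulo_division_hash(key, 19))
# 137456 -> 11
# 214562 -> 15
# 140145 -> 2
```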

• Digit-extraction Method
Using digit extraction, selected digits are extracted from the key and used as the address.

Example: to hash a six-digit employee number to a three-digit address (000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.

The keys are:
379452 -> 394
121267 -> 112
378845 -> 388
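A sketch of the digit-extraction example, assuming six-digit keys and 1-based digit positions counted from the left (first, third, and fourth, as in the slide). The function name and its parameters are hypothetical.

```python
def digit_extraction_hash(key: int, positions=(1, 3, 4), key_digits: int = 6) -> int:
    """Build an address from selected digits of the key.

    positions are 1-based, counted from the left.
    """
    digits = str(key).zfill(key_digits)  # keep leading zeros of short keys
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845):
    print(key, "->", f"{digit_extraction_hash(key):03d}")
# 379452 -> 394
# 121267 -> 112
# 378845 -> 388
```

Formatting the result with `:03d` keeps addresses such as 012 in the three-digit 000-999 form.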

• Folding Method
Two folding methods are used:

1: Fold shift

2: Fold boundary

1: Fold Shift
In fold shift, the key value is divided into parts whose size matches the size of the required address. Then the left and right parts are shifted and added to the middle part.
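A sketch of fold shift, assuming the key is split into three-digit parts from the left and that any carry beyond the address size is discarded (a common convention; the slide does not say how overflow is handled). The key 123456789 and the function name are hypothetical.

```python
def fold_shift_hash(key: int, address_digits: int = 3) -> int:
    """Fold shift: split the key into address-sized parts and add them."""
    s = str(key)
    parts = [s[i:i + address_digits] for i in range(0, len(s), address_digits)]
    total = sum(int(part) for part in parts)
    # Assumption: keep only the low-order digits that fit the address size.
    return total % 10 ** address_digits


print(fold_shift_hash(123456789))  # 123 + 456 + 789 = 1368 -> 368
```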

2: Fold Boundary
In fold boundary, the left and right parts are folded on a fixed boundary between them and the center part, so the two outside values are reversed before the parts are added.
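A sketch of fold boundary under the same assumptions as a fold-shift sketch would use: three-digit parts taken from the left, and carry digits beyond the address size discarded. Only the two outside parts are reversed. The key and function name are hypothetical.

```python
def fold_boundary_hash(key: int, address_digits: int = 3) -> int:
    """Fold boundary: reverse the two outside parts, then add all parts.

    Assumes the key splits into at least three parts.
    """
    s = str(key)
    parts = [s[i:i + address_digits] for i in range(0, len(s), address_digits)]
    parts[0] = parts[0][::-1]    # left part folded over the boundary
    parts[-1] = parts[-1][::-1]  # right part folded over the boundary
    total = sum(int(part) for part in parts)
    # Assumption: keep only the low-order digits that fit the address size.
    return total % 10 ** address_digits


print(fold_boundary_hash(123456789))  # 321 + 456 + 987 = 1764 -> 764
```

Compared with fold shift on the same key (123 + 456 + 789), only the outer parts change, which is enough to give a different address.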

• Midsquare Method
In midsquare hashing, the key is squared and the address is selected from the middle of the squared number.

The main limitation is the size of the key: the square of a large key may not fit in a fixed-size integer.

Example:

9452² = 89340304: address is 3403
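A sketch of the midsquare example, assuming a four-digit address taken from the middle of the decimal square (when the square has an odd number of digits, this version leans one digit to the left). The function name is hypothetical.

```python
def midsquare_hash(key: int, address_digits: int = 4) -> int:
    """Square the key and take address_digits digits from the middle."""
    square = str(key * key)
    start = (len(square) - address_digits) // 2
    return int(square[start:start + address_digits])


print(midsquare_hash(9452))  # 9452 * 9452 = 89340304 -> 3403
```

Python integers have arbitrary precision, so the key-size limitation does not show up here; in a language with fixed-size integers, squaring a six-digit key already needs more than 32 bits.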

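• Rotation Method
Rotation hashing moves digits from one end of the key to the other, for example by rotating the rightmost digit to the leftmost position. The rotation method is generally not used by itself but is combined with other hashing methods.

It is most useful when keys are assigned serially, because serial keys differ mainly in their last digits, and rotation moves that variation to the front of the key.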
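A sketch of rotation, assuming six-digit keys and a single-digit rotation of the rightmost digit to the leftmost position. The serial keys and the function name are hypothetical.

```python
def rotate_hash(key: int, key_digits: int = 6) -> int:
    """Rotation: move the rightmost digit of the key to the leftmost position."""
    s = str(key).zfill(key_digits)
    return int(s[-1] + s[:-1])


for key in (600101, 600102, 600103):  # serially assigned keys
    print(key, "->", rotate_hash(key))
# 600101 -> 160010
# 600102 -> 260010
# 600103 -> 360010
```

Since rotation alone does not fit the value into a table, it would normally feed another method, for instance `rotate_hash(key) % list_size + 1` with modulo division.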

• Pseudorandom Hashing
A common random-number generator is shown below.

y = ax + c

To use the pseudorandom-number generator as a hashing method, we set x to the key, multiply it by the coefficient a, and then add the constant c. The result is then divided by the list size, with the remainder being the hashed address.

Example:
y = ((17 * 121267) + 7) modulo 307
y = (2061539 + 7) modulo 307
y = 2061546 modulo 307
y = 41
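The pseudorandom method and its example can be checked in a few lines. The defaults a = 17, c = 7, and a list size of 307 come from the example; the function name is hypothetical.

```python
def pseudorandom_hash(key: int, a: int = 17, c: int = 7, list_size: int = 307) -> int:
    """y = (a * x + c) MODULO list_size, with x set to the key."""
    return (a * key + c) % list_size


print(pseudorandom_hash(121267))  # (17 * 121267 + 7) % 307 = 2061546 % 307 = 41
```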

Hash Table Uses

• Compilers use hash tables for symbol storage.

• The Linux kernel uses hash tables to manage memory pages and buffers.

• High-speed routing tables use hash tables.

• Database systems use hash tables, for example in hash indexes and hash joins.

Summary

• A hash table is a convenient data structure for storing items, with O(1) access time on average.

• The concepts that drive the selection of the hash code generation and compression functions can be complicated, but a large body of research is available.

• There are many areas where hash tables are used.

• Modern programming languages typically provide a hash table implementation ready for use in applications.
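As one example of a ready-made implementation, Python's built-in dict is a hash table, so insertion, lookup, and deletion take O(1) time on average. The names and addresses below are illustrative sample data only.

```python
# Python's built-in dict is implemented as a hash table.
addresses = {}
addresses["Bob Smith"] = "123 Main St."  # insertion
addresses["Jim Smith"] = "123 Elm St."
print(addresses["Jim Smith"])            # lookup: 123 Elm St.
del addresses["Bob Smith"]               # deletion
print("Bob Smith" in addresses)          # False
```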

Any questions?