Hashing Powerpoint

  • 7/26/2019 Hashing Powerpoint

    Dictionaries, Tables

    Hashing

    TCSS 342

    The Dictionary ADT

    a dictionary (table) is an abstract model of a database

    like a priority queue, a dictionary stores key-element pairs

    the main operation supported by a dictionary is searching by key

    Examples

    Telephone directory

    Library catalogue

    Books in print: key ISBN

    FAT (File Allocation Table)

    Main Issues

    Size

    Operations: search, insert, delete. Others? Create reports? List?

    What will be stored in the dictionary?

    How will items be identified?

    The Dictionary ADT

    simple container methods:

    size()

    isEmpty()

    elements()

    query methods:

    findElement(k)

    findAllElements(k)

    The Dictionary ADT

    update methods:

    insertItem(k, e)

    removeElement(k)

    removeAllElements(k)

    special element

    NO_SUCH_KEY, returned by an unsuccessful search

    Implementing a Dictionary with a Sequence

    unordered sequence

    searching and removing take O(n) time

    inserting takes O(1) time

    applications to log files (frequent insertions, rare searches and removals)

    34 14 12 22 18
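
    The unordered-sequence (log file) implementation can be sketched in Java roughly as follows. This is a minimal illustration, not the course's code: the class name LogFileDictionary and the use of ArrayList are assumptions, and only a few of the ADT methods are shown. Appending to an ArrayList is O(1) amortized rather than strictly O(1).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log-file dictionary: an unordered sequence of key-element pairs. */
final class LogFileDictionary {
    /** Sentinel returned by an unsuccessful search, as in the ADT. */
    static final Object NO_SUCH_KEY = new Object();

    // Each item is a two-cell array: {key, element}.
    private final List<Object[]> items = new ArrayList<>();

    int size() { return items.size(); }

    boolean isEmpty() { return items.isEmpty(); }

    /** O(1) amortized: append at the end without inspecting existing items. */
    void insertItem(Object k, Object e) {
        items.add(new Object[] { k, e });
    }

    /** O(n): scan the sequence until a matching key is found. */
    Object findElement(Object k) {
        for (Object[] item : items) {
            if (item[0].equals(k)) return item[1];
        }
        return NO_SUCH_KEY;
    }

    /** O(n): scan the sequence, then remove the first matching item. */
    Object removeElement(Object k) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i)[0].equals(k)) return items.remove(i)[1];
        }
        return NO_SUCH_KEY;
    }
}
```

    Insertion never looks at the existing items, which is why this layout suits workloads with frequent insertions and rare searches.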

    Implementing a Dictionary with a Sequence

    array-based ordered sequence (assumes keys can be ordered)

    - searching takes O(log n) time (binary search)

    - inserting and removing take O(n) time

    - application to look-up tables (frequent searches, rare insertions and removals)

    12 14 18 22 34

    Binary Search

    narrow down the search range in stages

    high-low game

    findElement(22)

    2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

    low at 2, mid at 14, high at 37: 22 > 14, so search the right half

    Binary Search

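    2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

    low at 17, mid at 25, high at 37: 22 < 25, so search the left half

    2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

    low at 17, mid at 19, high at 22: 22 > 19, so search the right half

    2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

    low = mid = high at 22: key found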

    Pseudocode for Binary Search

    Algorithm BinarySearch(S, k, low, high)
        if low > high then
            return NO_SUCH_KEY
        else
            mid ← ⌊(low + high) / 2⌋
            if k = key(mid) then
                return key(mid)
            else if k < key(mid) then
                return BinarySearch(S, k, low, mid-1)
            else
                return BinarySearch(S, k, mid+1, high)
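
    The pseudocode translates to Java almost line for line. The sketch below is one possible rendering under a few assumptions: keys are ints held in a sorted array, the method returns the index of the key rather than the key itself, and NO_SUCH_KEY is modelled as -1. The class and method names are illustrative.

```java
/** Hypothetical recursive binary search over a sorted int array. */
final class BinarySearchSketch {
    /** Stand-in for the NO_SUCH_KEY sentinel (an assumption of this sketch). */
    static final int NO_SUCH_KEY = -1;

    /** Returns the index of k within sorted[low..high], or NO_SUCH_KEY. */
    static int binarySearch(int[] sorted, int k, int low, int high) {
        if (low > high) {
            return NO_SUCH_KEY; // empty range: unsuccessful search
        }
        // Equivalent to (low + high) / 2 but cannot overflow for large indices.
        int mid = low + (high - low) / 2;
        if (k == sorted[mid]) {
            return mid;
        } else if (k < sorted[mid]) {
            return binarySearch(sorted, k, low, mid - 1); // left half
        } else {
            return binarySearch(sorted, k, mid + 1, high); // right half
        }
    }
}
```

    Called on the 16-key example array with k = 22, low = 0 and high = 15, the successive mid keys are 14, 25, 19 and 22, matching the trace on the preceding slides.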

    Running Time of Binary Search

    The range of candidate items to be searched is halved after each comparison

    Comparison    Search Range
    0             n
    1             n/2
    i             n/2^i
    log_2 n       1

    Running Time of Binary Search

    In the array-based implementation, access by rank takes O(1) time, thus binary search runs in O(log n) time

    Binary Search is applicable only to Random Access structures (Arrays, Vectors)

    Implementations

    Sorted? Non Sorted?

    Elementary: Arrays, vectors, linked lists

    Organization: None (log file), Sorted, Hashed

    Advanced: balanced trees

    Skip Lists

    Simulate Binary Search on a linked list.

    Linked list allows easy insertion and deletion.

    http://www.epaperpress.com/s_man.html

    A FAT Example

    Directory: Key: file name. Data: (time, date, size), location of first block in the FAT.

    If the first block is in physical location #23 (disk block number), look up position #23 in the FAT. The entry either shows end of file or holds the number of the file's next block on disk.

    Example: Directory entry: block #4. FAT (positions 0 to 10, F marks end of file): x x x F 5 6 10 x 23 25 3

    The file occupies blocks 4, 5, 6, 10, 3.
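
    Following the chain in the example can be sketched as below. This is only an illustration of the lookup rule described on the slide, not how any real file system is coded: the FAT is modelled as an int array indexed from 0, and the constants UNUSED (for the "x" entries, whose meaning the slide does not state) and END_OF_FILE (for "F") are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walk along a FAT chain, using the slide's example table. */
final class FatChainSketch {
    static final int UNUSED = -1;      // stands in for the "x" entries (meaning assumed)
    static final int END_OF_FILE = -2; // stands in for the "F" entry

    /** Returns the blocks of a file in order, starting from its first block. */
    static List<Integer> blocksOf(int firstBlock, int[] fat) {
        List<Integer> blocks = new ArrayList<>();
        int block = firstBlock;
        while (block != END_OF_FILE) {
            // Guard against entries that point nowhere and against cycles.
            if (block < 0 || block >= fat.length || blocks.size() > fat.length) {
                throw new IllegalStateException("broken chain at block " + block);
            }
            blocks.add(block);
            block = fat[block]; // the FAT entry names the next block of the file
        }
        return blocks;
    }

    /** The example FAT from the slide, positions 0 to 10. */
    static int[] exampleFat() {
        int x = UNUSED;
        int f = END_OF_FILE;
        return new int[] { x, x, x, f, 5, 6, 10, x, 23, 25, 3 };
    }
}
```

    Starting from block 4, the walk visits 4, 5, 6, 10, 3 and stops at the end-of-file entry in position 3.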

    Hashing

    Place item with key k in position h(k).

    Hope: h(k) is 1-1.

    Requires: unique key (unless multiple items allowed). Key must be protected from change (use abstract class that provides only a constructor).

    Keys must be comparable.

    Hashing Problem

    RT&T is a large phone company, and they want to provide enhanced caller ID capability:

    given a phone number, return the caller's name

    phone numbers are in the range 0 to R = 10^10 - 1

    n is the number of phone numbers used

    want to do this as efficiently as possible

    Generalized indexing

    Hash table

    Data storage associated with a key

    The key need not be an integer

    Hash Tables

    A data structure

    The location of an item is determined directly as a function of the item itself

    Not by a sequence of trial and error comparisons

    Commonly used to provide faster searching

    O(n) for linear searches

    O(log n) for binary search

    O(1) expected for hash table

    Another Solution

    A Hash Table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table

    Like an array, but with a function to map the large range of keys into a smaller one

    e.g., take the original key, mod the size of the table, and use that as an index

    Example

    Insert item (401-863-7639, Roberto) into a table of size 5

    4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table

    A lookup uses the same process: map the key to an index, then check the array cell at that index

    slot:  0   1   2   3   4
    item:  -   -   -   -   (401-863-7639, Roberto)
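
    The insert and lookup steps of this example can be sketched as a tiny Java table. It is deliberately naive: there is no collision handling, so a second key that maps to an occupied slot simply overwrites it. The class name ModTableSketch and its method names are made up for illustration, and phone numbers are assumed to be stored as long values.

```java
/** Hypothetical table that uses "key mod table size" as the index; no collision handling. */
final class ModTableSketch {
    private final long[] keys;
    private final String[] names; // null marks an empty slot

    ModTableSketch(int size) {
        keys = new long[size];
        names = new String[size];
    }

    /** Compression step: map the large key range onto 0..size-1. */
    int slotOf(long key) {
        return (int) Math.floorMod(key, (long) keys.length);
    }

    /** Stores the item in its slot, overwriting whatever was there. */
    void insertItem(long key, String name) {
        int slot = slotOf(key);
        keys[slot] = key;
        names[slot] = name;
    }

    /** Maps the key to its slot, then checks the cell at that slot. */
    String findElement(long key) {
        int slot = slotOf(key);
        return (names[slot] != null && keys[slot] == key) ? names[slot] : null;
    }
}
```

    With a table of size 5, slotOf(4018637639L) is 4, so findElement(4018637639L) finds Roberto by inspecting a single cell.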

    Collision

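    Insert (401-863-9350, Andy): 4018639350 mod 5 = 0, so the item goes in slot 0

    And insert (401-863-2234, Devin): 4018632234 mod 5 = 4, but slot 4 already holds Roberto. We have a collision!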

    Collision Resolution

    How to deal with two keys which map to the same cell of the array?

    Use chaining

    Set up lists of items with the same index
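
    Chaining can be sketched as an array of lists, one list per index. The following is a minimal illustration under stated assumptions: keys are phone numbers stored as long values, the index is the key mod the number of buckets, and the names ChainedTableSketch, Entry and chainLength are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical hash table with separate chaining. */
final class ChainedTableSketch {
    private static final class Entry {
        final long key;
        final String name;

        Entry(long key, String name) {
            this.key = key;
            this.name = name;
        }
    }

    // One list per index; items whose keys share an index share a list.
    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    ChainedTableSketch(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int indexOf(long key) {
        return (int) Math.floorMod(key, (long) buckets.size());
    }

    /** Appends the item to the chain at its index; colliding keys coexist. */
    void insertItem(long key, String name) {
        buckets.get(indexOf(key)).add(new Entry(key, name));
    }

    /** Scans only the chain at the key's index; returns null if absent. */
    String findElement(long key) {
        for (Entry e : buckets.get(indexOf(key))) {
            if (e.key == key) return e.name;
        }
        return null;
    }

    /** Number of items stored at the given index. */
    int chainLength(int index) {
        return buckets.get(index).size();
    }
}
```

    With five buckets, Roberto (401-863-7639) and Devin (401-863-2234) both land in the chain at index 4, while Andy (401-863-9350) sits alone at index 0. A lookup costs time proportional to the length of one chain rather than to the whole table.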

    Hash Function

    Function h defined by h(i) = i

    Determines the location of an item i in the hash table

    Called a hash function.

    To reduce the large size of a hash table use h(i) = i mod 25

    Popular Hash-Code Maps

    The polynomial is computed with Horner's rule, ignoring overflows, at a fixed value x:

    a_0 + x (a_1 + x (a_2 + ... + x (a_{n-2} + x a_{n-1}) ... ))

    The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words

    Why is the component-sum hash code bad for strings?
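
    One way to see the difference between the two hash codes is to compute both for a pair of anagrams. The sketch below assumes that a_i is the character at position i of the string; the class and method names are illustrative. Java's int arithmetic wraps on overflow, which matches "ignoring overflows".

```java
/** Hypothetical comparison of polynomial and component-sum hash codes for strings. */
final class PolynomialHashSketch {
    /** Horner's rule: a_0 + x (a_1 + x (a_2 + ... + x a_{n-1})), overflow ignored. */
    static int polynomialHash(String s, int x) {
        int h = 0;
        // Work from the innermost term a_{n-1} outward to a_0.
        for (int i = s.length() - 1; i >= 0; i--) {
            h = s.charAt(i) + x * h; // int overflow wraps silently
        }
        return h;
    }

    /** Component sum: simply adds the character codes, so order is lost. */
    static int componentSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }
}
```

    Because addition is commutative, componentSum gives "stop" and "pots" the same code, as it does for every set of anagrams. polynomialHash with x = 33 weights each position differently and separates the two.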

    Random Hashing

    Random hashing

    Uses a simple random number generation technique

    Scatters the items randomly throughout the hash table

    Hash code

    static int hashCode(long i) {
        return (int) ((i >> 32) + (int) i);
    }

    Linear Probing Hash Table

    /** constructor providing the hash comparator and the capacity
     * of the bucket array */
    public LinearProbingHashTable(HashComparator hc, int bN) {
        h = hc;
        N = bN;
        A = new Item[N];
    }

    Dictionary

    public void insertItem(Object key, Object element)
            throws InvalidKeyException {
        check(key);
        int i = h.hashValue(key) % N; // division method compression map
        int j = i;
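
    The listing above breaks off after the starting index is computed. The following is a separate, self-contained sketch of how linear probing generally proceeds from that point. It is not a reconstruction of the slide's code: it uses Object.hashCode() in place of the HashComparator, plain arrays in place of Item, and the names LinearProbingSketch and slotOf are hypothetical. Removal is left out, since it needs extra care (for example, marker entries for deleted cells).

```java
/** Hypothetical open-addressing table with linear probing (insert and find only). */
final class LinearProbingSketch {
    private final Object[] keys;     // null marks an empty cell
    private final Object[] elements;
    private int n;                   // number of stored items

    LinearProbingSketch(int capacity) {
        keys = new Object[capacity];
        elements = new Object[capacity];
    }

    /** Probes forward from the home cell until an empty cell is found. */
    void insertItem(Object key, Object element) {
        if (n == keys.length) {
            throw new IllegalStateException("table is full");
        }
        // Division-method compression map; floorMod keeps the index non-negative.
        int j = Math.floorMod(key.hashCode(), keys.length);
        while (keys[j] != null) {
            j = (j + 1) % keys.length; // wrap around at the end of the array
        }
        keys[j] = key;
        elements[j] = element;
        n++;
    }

    /** Returns the cell holding the key, or -1 if the key is absent. */
    int slotOf(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        int j = i;
        do {
            if (keys[j] == null) return -1;    // empty cell ends the probe
            if (keys[j].equals(key)) return j;
            j = (j + 1) % keys.length;
        } while (j != i);                      // back at the start: table scanned
        return -1;
    }

    Object findElement(Object key) {
        int j = slotOf(key);
        return j < 0 ? null : elements[j];
    }
}
```

    In a table of capacity 5, the Integer keys 4 and 9 both have home cell 4. The second one inserted probes past the occupied cell and wraps around to cell 0.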
