Balls Bins Adv

download Balls Bins Adv

of 131

Transcript of Balls Bins Adv

  • 7/27/2019 Balls Bins Adv

    1/131

    Outline

    Balls and Bins (Advanced)

    K. Subramani1

    1 Lane Department of Computer Science and Electrical Engineering

    West Virginia University

    6 March, 2012

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    2/131

    Outline

    Outline

    1 Recap

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    3/131

    Outline

    Outline

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    4/131

    Outline

    Outline

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    3 Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    5/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    6/131

  • 7/27/2019 Balls Bins Adv

    7/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins,

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    8/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    9/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    10/131

    Recap

  • 7/27/2019 Balls Bins Adv

    11/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,

    such as expected maximum load, expected number of balls in a bin,

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    12/131

    Recap

  • 7/27/2019 Balls Bins Adv

    13/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,

    such as expected maximum load, expected number of balls in a bin, expected number of empty

    bins, and expected number of bins with r balls.

    Subramani Balls and Bins

    Recap

  • 7/27/2019 Balls Bins Adv

    14/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Recap

    Main issues

    The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,

    such as expected maximum load, expected number of balls in a bin, expected number of empty

    bins, and expected number of bins with r balls. We also examined the Poisson random variable

    and its applications to Balls and Bins questions.

    Subramani Balls and Bins

    Recap

  • 7/27/2019 Balls Bins Adv

    15/131

    p

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    Recap

  • 7/27/2019 Balls Bins Adv

    16/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Subramani Balls and Bins

    Recap

  • 7/27/2019 Balls Bins Adv

    17/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    Subramani Balls and Bins

    Recap

    Th P i A i i S h d l

  • 7/27/2019 Balls Bins Adv

    18/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    We know that if m balls are thrown uniformly and independently into n bins, the distribution

    is approximately Poisson with mean mn

    .

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    19/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    We know that if m balls are thrown uniformly and independently into n bins, the distribution

    is approximately Poisson with mean mn

    .

    We wish to approximate the load at each bin with independent Poisson random variables.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    20/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    We know that if m balls are thrown uniformly and independently into n bins, the distribution

    is approximately Poisson with mean mn

    .

    We wish to approximate the load at each bin with independent Poisson random variables.

    We will show that this can be achieved by provable bounds.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    21/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    We know that if m balls are thrown uniformly and independently into n bins, the distribution

    is approximately Poisson with mean mn

    .

    We wish to approximate the load at each bin with independent Poisson random variables.

    We will show that this can be achieved by provable bounds.

    Note

    There is a difference between throwing m balls randomly and assigning each bin a number of

    balls that is Poisson distributed with mean mn .

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    22/131

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

    The Poisson Approximation

    Main Issues

    Is bin emptiness events independent?

    We know that if m balls are thrown uniformly and independently into n bins, the distribution

    is approximately Poisson with mean mn

    .

    We wish to approximate the load at each bin with independent Poisson random variables.

    We will show that this can be achieved by provable bounds.

    Note

    There is a difference between throwing m balls randomly and assigning each bin a number of

    balls that is Poisson distributed with mean mn . However, if you use Poisson distribution and endwith m balls, the distributions are identical!

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    23/131

    pp

    Applications to Hashing

    Outline

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    3 Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    24/131

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    25/131

    Applications to Hashing

    Theorem I

    Theorem

    Let X(m)i , 1 i n be the number of balls in the ith bin.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    26/131

    Applications to Hashing

    Theorem I

    Theorem

    Let X(m)i , 1 i n be the number of balls in the ith bin. Let Y(m)i , 1 i n denote independentPoisson random variables with mean m

    n.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    A li ti t H hi

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    27/131

    Applications to Hashing

    Theorem I

    Theorem

    Let X(m)i , 1 i n be the number of balls in the ith bin. Let Y(m)i , 1 i n denote independentPoisson random variables with mean m

    n.

    The distribution of(Y(m)

    1 , . . . , Y(m)n ) conditioned oni Y

    (m)i = k is the same as(X

    (k)1 , . . . , X

    (k)n ).

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    28/131

    Applications to Hashing

    Theorem II

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    29/131

    Applications to Hashing

    Theorem II

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    30/131

    Applications to Hashing

    Theorem II

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    31/131

    pp cat o s to as g

    Theorem II

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,

    E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    32/131

    pp g

    Theorem II

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,

    E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]

    Corollary

    Any event that takes place with probability p in the Poisson case, takes place with probability at

    most

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    33/131

    Theorem II

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,

    E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]

    Corollary

    Any event that takes place with probability p in the Poisson case, takes place with probability at

    most pem in the exact case.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    34/131

    Theorem III

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    35/131

    Theorem III

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function,

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    36/131

    Theorem III

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X

    (m)n )] is either

    monotonically increasing or monotonically decreasing in m.

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    37/131

  • 7/27/2019 Balls Bins Adv

    38/131

    RecapThe Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    39/131

    Theorem III

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X

    (m)n )] is either

    monotonically increasing or monotonically decreasing in m. Then,

    E[f(X(m)1 , . . . , X

    (m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]

    Corollary

    Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls.

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    40/131

    RecapThe Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    41/131

    Theorem III

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X

    (m)n )] is either

    monotonically increasing or monotonically decreasing in m. Then,

    E[f(X(m)1 , . . . , X

    (m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]

    Corollary

    Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls. If has probability p in the Poisson case, then it has probability at most 2 p inthe exact case.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Some theorems and lemmas

  • 7/27/2019 Balls Bins Adv

    42/131

    Theorem III

    Theorem

    Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X

    (m)n )] is either

    monotonically increasing or monotonically decreasing in m. Then,

    E[f(X(m)1 , . . . , X

    (m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]

    Corollary

    Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls. If has probability p in the Poisson case, then it has probability at most 2 p inthe exact case.

    Lemma

    When n balls are thrown independently into n bins, the maximum load is at least ln nlnln n

    with

    probability at least(1 1n

    ), for sufficiently large n.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    43/131

    Outline

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    3 Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Ch i H hi

  • 7/27/2019 Balls Bins Adv

    44/131

    Chain Hashing

    Main Issues

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Ch i H hi

  • 7/27/2019 Balls Bins Adv

    45/131

    Chain Hashing

    Main Issues

    The password checking problem.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Ch i H hi

  • 7/27/2019 Balls Bins Adv

    46/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    47/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    48/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    49/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    50/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    ,

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    51/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    52/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    53/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    54/131

    Chain Hashing

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    If the word is not in dictionary, expected time is mn

    ,

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    55/131

    g

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    If the word is not in dictionary, expected time is mn

    , otherwise, expected time is 1 + m1n

    .

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    56/131

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    57/131

    g

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    If the word is not in dictionary, expected time is mn

    , otherwise, expected time is 1 + m1n

    .

    Choosing n= m, gives constant expected search time.

    When n= m, the maximum load is ( ln nlnln n), w.h.p.;

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    58/131

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    If the word is not in dictionary, expected time is mn

    , otherwise, expected time is 1 + m1n

    .

    Choosing n= m, gives constant expected search time.

    When n= m, the maximum load is ( ln nlnln n), w.h.p.; faster than binary search.

    Subramani Balls and Bins

    RecapThe Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Chain Hashing

  • 7/27/2019 Balls Bins Adv

    59/131

    Main Issues

    The password checking problem.

    The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.

    Assumption: Hash function maps words into bins in random fashion.

    For each x U, the probability that f(x) = j is 1n

    , and the values of f(x) for each x areindependent of each other.

    Search approach.

    If the word is not in dictionary, expected time is mn

    , otherwise, expected time is 1 + m1n

    .

    Choosing n= m, gives constant expected search time.

    When n= m, the maximum load is ( ln nlnln n), w.h.p.; faster than binary search.

    Wasted space.

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    60/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    61/131

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    62/131

    Main Issues

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    63/131

  • 7/27/2019 Balls Bins Adv

    64/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    65/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    66/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    67/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U,

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    68/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way,

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    69/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    70/131

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    71/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are

    willing to tolerate some error.

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    72/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are

    willing to tolerate some error.

    Assume that we use b bits to create a fingerprint.

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    73/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are

    willing to tolerate some error.

    Assume that we use b bits to create a fingerprint. The probability that an acceptable

    password has a fingerprint that is different from any specific password is

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing

  • 7/27/2019 Balls Bins Adv

    74/131

    Main Issues

    Goal is to save space instead of time.

    Assume that passwords are restricted to 64 bits and that a hash function maps words into

    32 bits.

    Fingerprinting and false positives.

    The approximate set membership problem:

    Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are

    willing to tolerate some error.

    Assume that we use b bits to create a fingerprint. The probability that an acceptable

    password has a fingerprint that is different from any specific password is (1 12b ).

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    75/131

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    76/131

    Main Issues (contd.)

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String HashingBloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    77/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S?

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    78/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    79/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    80/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    81/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    82/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    em2b 1 c

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    83/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    em2b 1 c b log2

    m

    ln 11c

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    84/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    em2b 1 c b log2

    m

    ln 11c

    If we choose b= 2 log2 m, the probability of a false positive is:

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    85/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    em2b 1 c b log2

    m

    ln 11c

    If we choose b= 2 log2 m, the probability of a false positive is: 1 (1 1m2 )m

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bit String Hashing (contd.)

  • 7/27/2019 Balls Bins Adv

    86/131

    Main Issues (contd.)

    Since S has size m, what is the probability that an acceptable password fingerprint will

    match the fingerprints of one of the elements of S? 1 (1 12b

    )m 1em2b .Since we want the probability of a false positive to be less than c, we need,

    1

    em2b

    c

    em2b 1 c b log2

    m

    ln 11c

    If we choose b= 2 log2 m, the probability of a false positive is: 1 (1 1m2 )m < 1m.

    If our dictionary has 216 words, using 32 bits when hashing, leads to an error probability of atmost 1

    65,536.

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Outline

  • 7/27/2019 Balls Bins Adv

    87/131

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    3 Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    88/131

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    89/131

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    90/131

    Main Issues

    Chain hashing optimizes time,

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    91/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space.

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    92/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off?

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    93/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    94/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1],

    Subramani Balls and Bins

    Recap

    The Poisson ApproximationApplications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    95/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    96/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    97/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    98/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    99/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S?

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    100/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    101/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.How could we go wrong?

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Bloom Filters

  • 7/27/2019 Balls Bins Adv

    102/131

    Main Issues

    Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a

    better trade-off? Bloom filters!

    A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.How could we go wrong? False positives.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    103/131

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    104/131

    Analysis

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    105/131

    Analysis

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    106/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing?

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    107/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    108/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    109/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    110/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1,

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    111/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1, which is,

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    112/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1, which is,

    (1 (1 1n

    )km)k

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom FiltersBreaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    113/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1, which is,

    (1 (1 1n

    )km)k = (1e kmn )k

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    114/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1, which is,(1 (1 1

    n)km)k = (1e kmn )k

    = f()

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Analyzing error probability

  • 7/27/2019 Balls Bins Adv

    115/131

    Analysis

    What is the probability that a specific bit is 0, after the preprocessing? (1 1n

    )km = ekm

    n .

    Assume that a fraction p= ekm

    n of the entries are still 0, after S has been hashed into the

    Bloom filter.

    The probability of a false positive is then precisely the probability that all the hash functions

    map the input string x S, to 1, which is,(1 (1 1

    n)km)k = (1e kmn )k

    = f()

    Optimizing for k, we get k = ln2 and f (0.6185)mn .

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Outline

  • 7/27/2019 Balls Bins Adv

    116/131

    1 Recap

    2 The Poisson Approximation

    Some theorems and lemmas

    3 Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    117/131

    Main Issues

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    118/131

    Main IssuesResource contention.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    119/131

    Main IssuesResource contention.

    Sequential ordering.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    120/131

    Main IssuesResource contention.

    Sequential ordering.

    The Hashing approach.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

  • 7/27/2019 Balls Bins Adv

    121/131

    Main IssuesResource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    122/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value?

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    123/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    124/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value? Fix one user. What is the

    probability that some other user obtains the same hash value?

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    125/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value? Fix one user. What is the

    probability that some other user obtains the same hash value?

    1 (1 12b

    )n1

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    126/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    127/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value? Fix one user. What is the

    probability that some other user obtains the same hash value?

    1 (1 12b

    )n1 n12b

    By the union bound, the probability that any user has the same hash values as the fixed

    user is

    Subramani Balls and Bins

  • 7/27/2019 Balls Bins Adv

    128/131

  • 7/27/2019 Balls Bins Adv

    129/131

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    130/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value? Fix one user. What is the

    probability that some other user obtains the same hash value?

    1 (1 12b

    )n1 n12b

    By the union bound, the probability that any user has the same hash values as the fixed

    user isn(n1)

    2b. In order to guarantee success with probability (1 1

    n), choose

    b= 3 log n2.

    Subramani Balls and Bins

    Recap

    The Poisson Approximation

    Applications to Hashing

    Chain Hashing

    Bit String Hashing

    Bloom Filters

    Breaking Symmetry

    Breaking Symmetry

    Main Issues

  • 7/27/2019 Balls Bins Adv

    131/131

    Resource contention.

    Sequential ordering.

    The Hashing approach. Hash each user identifier into b bits.

    How likely that two users will have the same hash value? Fix one user. What is the

    probability that some other user obtains the same hash value?

    1 (1 12b

    )n1 n12b

    By the union bound, the probability that any user has the same hash values as the fixed

    user isn(n1)

    2b. In order to guarantee success with probability (1 1

    n), choose

    b= 3 log n2.Can also be used for the leader election problem.

    Subramani Balls and Bins