Balls Bins Adv
-
Upload
barbara-fleming -
Category
Documents
-
view
217 -
download
0
Transcript of Balls Bins Adv
-
7/27/2019 Balls Bins Adv
1/131
Outline
Balls and Bins (Advanced)
K. Subramani1
1 Lane Department of Computer Science and Electrical Engineering
West Virginia University
6 March, 2012
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
2/131
Outline
Outline
1 Recap
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
3/131
Outline
Outline
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
4/131
Outline
Outline
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
3 Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
5/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
6/131
-
7/27/2019 Balls Bins Adv
7/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins,
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
8/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
9/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
10/131
Recap
-
7/27/2019 Balls Bins Adv
11/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,
such as expected maximum load, expected number of balls in a bin,
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
12/131
Recap
-
7/27/2019 Balls Bins Adv
13/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,
such as expected maximum load, expected number of balls in a bin, expected number of empty
bins, and expected number of bins with r balls.
Subramani Balls and Bins
Recap
-
7/27/2019 Balls Bins Adv
14/131
Recap
The Poisson Approximation
Applications to Hashing
Recap
Main issues
The experiment of throwing m balls into nbins, each bin being chosen independently anduniformly at random.Several questions regarding the above random process were examined,
such as expected maximum load, expected number of balls in a bin, expected number of empty
bins, and expected number of bins with r balls. We also examined the Poisson random variable
and its applications to Balls and Bins questions.
Subramani Balls and Bins
Recap
-
7/27/2019 Balls Bins Adv
15/131
p
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
Recap
-
7/27/2019 Balls Bins Adv
16/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Subramani Balls and Bins
Recap
-
7/27/2019 Balls Bins Adv
17/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
Subramani Balls and Bins
Recap
Th P i A i i S h d l
-
7/27/2019 Balls Bins Adv
18/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
We know that if m balls are thrown uniformly and independently into n bins, the distribution
is approximately Poisson with mean mn
.
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
19/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
We know that if m balls are thrown uniformly and independently into n bins, the distribution
is approximately Poisson with mean mn
.
We wish to approximate the load at each bin with independent Poisson random variables.
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
20/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
We know that if m balls are thrown uniformly and independently into n bins, the distribution
is approximately Poisson with mean mn
.
We wish to approximate the load at each bin with independent Poisson random variables.
We will show that this can be achieved by provable bounds.
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
21/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
We know that if m balls are thrown uniformly and independently into n bins, the distribution
is approximately Poisson with mean mn
.
We wish to approximate the load at each bin with independent Poisson random variables.
We will show that this can be achieved by provable bounds.
Note
There is a difference between throwing m balls randomly and assigning each bin a number of
balls that is Poisson distributed with mean mn .
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
22/131
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
The Poisson Approximation
Main Issues
Is bin emptiness events independent?
We know that if m balls are thrown uniformly and independently into n bins, the distribution
is approximately Poisson with mean mn
.
We wish to approximate the load at each bin with independent Poisson random variables.
We will show that this can be achieved by provable bounds.
Note
There is a difference between throwing m balls randomly and assigning each bin a number of
balls that is Poisson distributed with mean mn . However, if you use Poisson distribution and endwith m balls, the distributions are identical!
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
23/131
pp
Applications to Hashing
Outline
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
3 Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
24/131
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
25/131
Applications to Hashing
Theorem I
Theorem
Let X(m)i , 1 i n be the number of balls in the ith bin.
Subramani Balls and Bins
Recap
The Poisson Approximation Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
26/131
Applications to Hashing
Theorem I
Theorem
Let X(m)i , 1 i n be the number of balls in the ith bin. Let Y(m)i , 1 i n denote independentPoisson random variables with mean m
n.
Subramani Balls and Bins
Recap
The Poisson Approximation
A li ti t H hi
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
27/131
Applications to Hashing
Theorem I
Theorem
Let X(m)i , 1 i n be the number of balls in the ith bin. Let Y(m)i , 1 i n denote independentPoisson random variables with mean m
n.
The distribution of(Y(m)
1 , . . . , Y(m)n ) conditioned oni Y
(m)i = k is the same as(X
(k)1 , . . . , X
(k)n ).
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
28/131
Applications to Hashing
Theorem II
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
29/131
Applications to Hashing
Theorem II
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
30/131
Applications to Hashing
Theorem II
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
31/131
pp cat o s to as g
Theorem II
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,
E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
32/131
pp g
Theorem II
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,
E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]
Corollary
Any event that takes place with probability p in the Poisson case, takes place with probability at
most
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
33/131
Theorem II
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function. Then,
E[f(X(m)1 , . . . , X(m)n )] emE[f(Y(m)1 , . . . , Y(m)n )]
Corollary
Any event that takes place with probability p in the Poisson case, takes place with probability at
most pem in the exact case.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
34/131
Theorem III
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
35/131
Theorem III
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function,
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
36/131
Theorem III
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X
(m)n )] is either
monotonically increasing or monotonically decreasing in m.
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
37/131
-
7/27/2019 Balls Bins Adv
38/131
RecapThe Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
39/131
Theorem III
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X
(m)n )] is either
monotonically increasing or monotonically decreasing in m. Then,
E[f(X(m)1 , . . . , X
(m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]
Corollary
Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls.
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
40/131
RecapThe Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
41/131
Theorem III
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X
(m)n )] is either
monotonically increasing or monotonically decreasing in m. Then,
E[f(X(m)1 , . . . , X
(m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]
Corollary
Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls. If has probability p in the Poisson case, then it has probability at most 2 p inthe exact case.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Some theorems and lemmas
-
7/27/2019 Balls Bins Adv
42/131
Theorem III
Theorem
Let f(x1, x2, . . . , xn) denote a nonnegative function, such that E[f(X(m)1 , . . . , X
(m)n )] is either
monotonically increasing or monotonically decreasing in m. Then,
E[f(X(m)1 , . . . , X
(m)n )] 2 E[f(Y(m)1 , . . . , Y(m)n )]
Corollary
Let be an event whose probability is either monotonically increasing or decreasing in thenumber of balls. If has probability p in the Poisson case, then it has probability at most 2 p inthe exact case.
Lemma
When n balls are thrown independently into n bins, the maximum load is at least ln nlnln n
with
probability at least(1 1n
), for sufficiently large n.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
43/131
Outline
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
3 Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Ch i H hi
-
7/27/2019 Balls Bins Adv
44/131
Chain Hashing
Main Issues
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Ch i H hi
-
7/27/2019 Balls Bins Adv
45/131
Chain Hashing
Main Issues
The password checking problem.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Ch i H hi
-
7/27/2019 Balls Bins Adv
46/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
47/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
48/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
49/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
50/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
,
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
51/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
52/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
53/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
54/131
Chain Hashing
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
If the word is not in dictionary, expected time is mn
,
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
55/131
g
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
If the word is not in dictionary, expected time is mn
, otherwise, expected time is 1 + m1n
.
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
56/131
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
57/131
g
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
If the word is not in dictionary, expected time is mn
, otherwise, expected time is 1 + m1n
.
Choosing n= m, gives constant expected search time.
When n= m, the maximum load is ( ln nlnln n), w.h.p.;
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
58/131
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
If the word is not in dictionary, expected time is mn
, otherwise, expected time is 1 + m1n
.
Choosing n= m, gives constant expected search time.
When n= m, the maximum load is ( ln nlnln n), w.h.p.; faster than binary search.
Subramani Balls and Bins
RecapThe Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Chain Hashing
-
7/27/2019 Balls Bins Adv
59/131
Main Issues
The password checking problem.
The Hashing Approach and Hash functions. f : U [0, n1].Chain Hashing.
Assumption: Hash function maps words into bins in random fashion.
For each x U, the probability that f(x) = j is 1n
, and the values of f(x) for each x areindependent of each other.
Search approach.
If the word is not in dictionary, expected time is mn
, otherwise, expected time is 1 + m1n
.
Choosing n= m, gives constant expected search time.
When n= m, the maximum load is ( ln nlnln n), w.h.p.; faster than binary search.
Wasted space.
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
60/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
61/131
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
62/131
Main Issues
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
63/131
-
7/27/2019 Balls Bins Adv
64/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
65/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
66/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
67/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U,
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
68/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way,
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
69/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
70/131
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
71/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are
willing to tolerate some error.
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
72/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are
willing to tolerate some error.
Assume that we use b bits to create a fingerprint.
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
73/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are
willing to tolerate some error.
Assume that we use b bits to create a fingerprint. The probability that an acceptable
password has a fingerprint that is different from any specific password is
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing
-
7/27/2019 Balls Bins Adv
74/131
Main Issues
Goal is to save space instead of time.
Assume that passwords are restricted to 64 bits and that a hash function maps words into
32 bits.
Fingerprinting and false positives.
The approximate set membership problem:
Given a set S= {s1, s2, . . . , sm} of m elements from a large universe U, we would like torepresent elements in such a way, as to efficiently answer queries of the form Is x S?The disallowed passwords correspond to S. We also want to be space efficient and are
willing to tolerate some error.
Assume that we use b bits to create a fingerprint. The probability that an acceptable
password has a fingerprint that is different from any specific password is (1 12b ).
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
75/131
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
76/131
Main Issues (contd.)
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String HashingBloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
77/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S?
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
78/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
79/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
80/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
81/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
82/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
em2b 1 c
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
83/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
em2b 1 c b log2
m
ln 11c
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
84/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
em2b 1 c b log2
m
ln 11c
If we choose b= 2 log2 m, the probability of a false positive is:
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
85/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
em2b 1 c b log2
m
ln 11c
If we choose b= 2 log2 m, the probability of a false positive is: 1 (1 1m2 )m
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bit String Hashing (contd.)
-
7/27/2019 Balls Bins Adv
86/131
Main Issues (contd.)
Since S has size m, what is the probability that an acceptable password fingerprint will
match the fingerprints of one of the elements of S? 1 (1 12b
)m 1em2b .Since we want the probability of a false positive to be less than c, we need,
1
em2b
c
em2b 1 c b log2
m
ln 11c
If we choose b= 2 log2 m, the probability of a false positive is: 1 (1 1m2 )m < 1m.
If our dictionary has 216 words, using 32 bits when hashing, leads to an error probability of atmost 1
65,536.
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Outline
-
7/27/2019 Balls Bins Adv
87/131
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
3 Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
88/131
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
89/131
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
90/131
Main Issues
Chain hashing optimizes time,
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
91/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space.
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
92/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off?
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
93/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
94/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1],
Subramani Balls and Bins
Recap
The Poisson ApproximationApplications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
95/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
96/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
97/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
98/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
99/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S?
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
100/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
101/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.How could we go wrong?
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Bloom Filters
-
7/27/2019 Balls Bins Adv
102/131
Main Issues
Chain hashing optimizes time, while bit string hashing optimizes space. Can we get a
better trade-off? Bloom filters!
A Bloom filter consists of an array of nbits, A[0] to A[n1], initially set to 0.k hash function h1, h2, . . . , hk with range {0, . . . , n1} are used.We wish to represent a set S= {s1, s2, . . . , sm} of m elements.Set A[hi(s)] to 1, for each 1 i k and each s S.How to check if x S? Check all locations A[hi(x)], 1 i k.How could we go wrong? False positives.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
103/131
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
104/131
Analysis
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
105/131
Analysis
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
106/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing?
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
107/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
108/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
109/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
110/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1,
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
111/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1, which is,
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
112/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1, which is,
(1 (1 1n
)km)k
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom FiltersBreaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
113/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1, which is,
(1 (1 1n
)km)k = (1e kmn )k
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
114/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1, which is,(1 (1 1
n)km)k = (1e kmn )k
= f()
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Analyzing error probability
-
7/27/2019 Balls Bins Adv
115/131
Analysis
What is the probability that a specific bit is 0, after the preprocessing? (1 1n
)km = ekm
n .
Assume that a fraction p= ekm
n of the entries are still 0, after S has been hashed into the
Bloom filter.
The probability of a false positive is then precisely the probability that all the hash functions
map the input string x S, to 1, which is,(1 (1 1
n)km)k = (1e kmn )k
= f()
Optimizing for k, we get k = ln2 and f (0.6185)mn .
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Outline
-
7/27/2019 Balls Bins Adv
116/131
1 Recap
2 The Poisson Approximation
Some theorems and lemmas
3 Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
117/131
Main Issues
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
118/131
Main IssuesResource contention.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
119/131
Main IssuesResource contention.
Sequential ordering.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
120/131
Main IssuesResource contention.
Sequential ordering.
The Hashing approach.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
-
7/27/2019 Balls Bins Adv
121/131
Main IssuesResource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
122/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value?
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
123/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
124/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value? Fix one user. What is the
probability that some other user obtains the same hash value?
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
125/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value? Fix one user. What is the
probability that some other user obtains the same hash value?
1 (1 12b
)n1
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
126/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
127/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value? Fix one user. What is the
probability that some other user obtains the same hash value?
1 (1 12b
)n1 n12b
By the union bound, the probability that any user has the same hash values as the fixed
user is
Subramani Balls and Bins
-
7/27/2019 Balls Bins Adv
128/131
-
7/27/2019 Balls Bins Adv
129/131
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
130/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value? Fix one user. What is the
probability that some other user obtains the same hash value?
1 (1 12b
)n1 n12b
By the union bound, the probability that any user has the same hash values as the fixed
user isn(n1)
2b. In order to guarantee success with probability (1 1
n), choose
b= 3 log n2.
Subramani Balls and Bins
Recap
The Poisson Approximation
Applications to Hashing
Chain Hashing
Bit String Hashing
Bloom Filters
Breaking Symmetry
Breaking Symmetry
Main Issues
-
7/27/2019 Balls Bins Adv
131/131
Resource contention.
Sequential ordering.
The Hashing approach. Hash each user identifier into b bits.
How likely that two users will have the same hash value? Fix one user. What is the
probability that some other user obtains the same hash value?
1 (1 12b
)n1 n12b
By the union bound, the probability that any user has the same hash values as the fixed
user isn(n1)
2b. In order to guarantee success with probability (1 1
n), choose
b= 3 log n2.Can also be used for the leader election problem.
Subramani Balls and Bins