Probability in Computing

LECTURE 5: MORE APPLICATIONS WITH PROBABILISTIC ANALYSIS, BINS AND BALLS

© 2010, Quoc Le & Van Nguyen, Probability for Computing


Agenda

Review: Coupon Collector's problem and Packet Sampling
Analysis of Quick-Sort
Birthday Paradox and applications
The Bins and Balls Model

Coupon Collector Problem

Problem: Suppose that each box of cereal contains one of n different coupons. Once you obtain one of every type of coupon, you can send in for a prize.

Question: How many boxes of cereal must you buy before obtaining at least one of every type of coupon?

Let X be the number of boxes bought until at least one of every type of coupon is obtained.

$E[X] = n H(n) \approx n \ln n$, where $H(n) = 1 + 1/2 + \dots + 1/n$ is the n-th harmonic number.
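The coupon collector expectation can be checked numerically. The sketch below assumes every box holds a coupon chosen uniformly at random from the n types; the function names are illustrative and not part of the lecture.

```python
import random


def harmonic(n):
    """Return the n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def expected_boxes(n):
    """Exact expectation E[X] = n * H(n) for n coupon types."""
    return n * harmonic(n)


def simulate_boxes(n, rng):
    """Buy boxes until all n coupon types have been seen; return the count."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # assumption: uniform coupon in each box
        boxes += 1
    return boxes


def average_boxes(n, trials, seed=0):
    """Average number of boxes over independent simulated collectors."""
    rng = random.Random(seed)
    return sum(simulate_boxes(n, rng) for _ in range(trials)) / trials
```

For moderate n, `average_boxes(n, trials)` should land close to `expected_boxes(n)`, and both exceed n ln n by roughly 0.58 n because H(n) = ln n + Θ(1).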

Application: Packet Sampling

Sampling packets on a router with probability p: the number of packets transmitted after the last sampled packet until and including the next sampled packet is geometrically distributed.

From the point of view of the destination host, determining all the routers on the path is like a coupon collector's problem.

If there are n routers, then the expected number of packets that must arrive before the destination host knows all of the routers on the path is about n ln(n).
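A small sketch of the geometric gap between sampled packets, assuming each packet is sampled independently with probability p (the function name is hypothetical). The mean gap should be close to 1/p.

```python
import random


def packets_until_next_sample(p, rng):
    """Count packets after the last sampled one, up to and including
    the next sampled packet. Assumes independent sampling with 0 < p <= 1."""
    count = 1
    while rng.random() >= p:  # this packet was not sampled
        count += 1
    return count


def mean_gap(p, trials, seed=0):
    """Empirical mean of the gap; the geometric distribution predicts 1/p."""
    rng = random.Random(seed)
    return sum(packets_until_next_sample(p, rng) for _ in range(trials)) / trials
```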

DoS attack


IP traceback

Marking and Reconstruction
Node append vs. node sampling

Node append

[Figure: attackers A1, A2, A3 send packets through routers R1 to R7 toward the victim V. Each router on the path appends its own address to the packet D, so the packet grows hop by hop: D, D R6, D R6 R3, D R6 R3 R2, D R6 R3 R2 R1.]

Node Sampling

[Figure: attackers A1, A2, A3 send packets through routers R1 to R7 toward the victim V. The packet D1 carries a single router mark. In the example, router R2 uses p = 0.51 and draws x = 0.2 < p, so the mark changes from D1 R7 to D1 R2.]
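The node sampling idea can be sketched as follows, assuming every router on the path overwrites the single mark field with its own address with probability p, independently per packet. The router names and function names are illustrative. Under this assumption the marks seen by the victim are not uniform over the routers (routers far from the victim are overwritten more often), so the coupon collector estimate is only a rough guide.

```python
import random


def mark_packet(path, p, rng):
    """Send one packet along `path` (ordered from the attacker side to the
    victim). Each router overwrites the mark with probability p.
    Returns the mark that reaches the victim, or None if no router marked."""
    mark = None
    for router in path:
        if rng.random() < p:
            mark = router
    return mark


def packets_until_path_known(path, p, rng):
    """Count packets until the victim has seen every router at least once."""
    seen = set()
    packets = 0
    while len(seen) < len(path):
        packets += 1
        mark = mark_packet(path, p, rng)
        if mark is not None:
            seen.add(mark)
    return packets
```

For example, `packets_until_path_known(["R6", "R3", "R2", "R1"], 0.51, random.Random(1))` returns how many packets one simulated victim needed to see all four routers.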

Expected Run-Time of QuickSort

Analysis

Worst case: $n^2$ comparisons.
The running time depends on how we choose the pivot.
Good pivot (divides the list into two sub-lists of nearly equal length) vs. bad pivot.
With a good pivot at every step -> n lg(n) [by solving the recurrence].

If we choose the pivot uniformly at random, we get a randomized version of QuickSort.

Analysis

Let y1, y2, …, yn be the input values in sorted order. For i < j, let Xij be a random variable that takes value 1 if yi and yj are compared with each other, and 0 if they are not compared.

The total number of comparisons is X, so by linearity of expectation:

$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$

$E[X_{ij}] = \frac{2}{j-i+1}$

Consider when the set Yij = {yi, yi+1, …, yj} is "touched" by a pivot for the first time. If this pivot is either yi or yj, then the two will be compared. Otherwise they are never compared, because the pivot splits them into different sub-lists (S1, S2).

Using k = j-i+1, we can compute $E[X] \approx 2n \ln n$.
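The comparison count can be checked empirically. The sketch below is one simple list-based randomized QuickSort, assuming distinct input values and counting one comparison per non-pivot element per partition step; it is not claimed to be the lecture's implementation, and the function names are illustrative.

```python
import random


def quicksort_comparisons(values, rng):
    """Sort distinct values with a uniformly random pivot.
    Returns (sorted_list, number_of_comparisons_with_pivots)."""
    if len(values) <= 1:
        return list(values), 0
    pivot = values[rng.randrange(len(values))]
    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    # Every non-pivot element is compared with the pivot exactly once.
    comparisons = len(values) - 1
    left, left_count = quicksort_comparisons(smaller, rng)
    right, right_count = quicksort_comparisons(larger, rng)
    return left + [pivot] + right, comparisons + left_count + right_count


def expected_comparisons(n):
    """Sum of E[X_ij] = 2/(j-i+1) over all pairs, grouped by k = j-i+1."""
    return sum((n + 1 - k) * 2.0 / k for k in range(2, n + 1))
```

Averaging the count over many runs on the same input should approach `expected_comparisons(n)`, which grows like 2n ln n.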

Detailed analysis
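One standard way to carry out the detailed computation, given as a sketch that starts from $E[X_{ij}] = 2/(j-i+1)$ and substitutes $k = j-i+1$:

```latex
\begin{align*}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}
      = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
     &= \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k}
      = \sum_{k=2}^{n} (n+1-k)\,\frac{2}{k} \\
     &= (2n+2) \sum_{k=1}^{n} \frac{1}{k} - 4n
      = 2n \ln n + \Theta(n).
\end{align*}
```

The last step uses $\sum_{k=1}^{n} 1/k = \ln n + \Theta(1)$.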

Birthday "Paradox"

What is the probability that two persons in a room of 30 have the same birthday?

Birthday Paradox

Ways to assign k different birthdays without duplicates:

$N = 365 \cdot 364 \cdots (365 - k + 1) = \frac{365!}{(365 - k)!}$

Ways to assign k birthdays with possible duplicates:

$D = 365 \cdot 365 \cdots 365 = 365^{k}$

Birthday "Paradox"

Assuming real birthdays are assigned randomly:

N/D = probability there are no duplicates

1 - N/D = probability there is a duplicate

$= 1 - \frac{365!}{(365 - k)! \, 365^{k}}$

Generalizing Birthdays

$P(n, k) = 1 - \frac{n!}{(n-k)! \, n^{k}}$

Given k random selections from n possible values, P(n, k) gives the probability that there is at least 1 duplicate.
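A minimal sketch for evaluating P(n, k) exactly, assuming the k selections are independent and uniform over the n values. It multiplies the factors (n - i)/n one at a time instead of computing factorials, which avoids huge intermediate numbers; the function name is illustrative.

```python
def prob_duplicate(n, k):
    """Probability that k uniform selections from n values contain
    at least one duplicate: 1 - n! / ((n-k)! * n^k)."""
    if k > n:
        return 1.0  # pigeonhole: a duplicate is certain
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (n - i) / n
    return 1.0 - p_all_different
```

For the room of 30 people, `prob_duplicate(365, 30)` is about 0.706, and the probability first exceeds one half at k = 23.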

Birthday Probabilities

Here N is the number of possible values (N = 365 for birthdays), k is the number of trials, and P is the probability that all k trials are different.

P(at least two match) = 1 - P(all are different)

P(2 chosen from N are different)
$= 1 - 1/N$

P(3 are all different)
$= (1 - 1/N)(1 - 2/N)$

P(k trials are all different)
$= (1 - 1/N)(1 - 2/N) \cdots (1 - (k-1)/N)$

$\ln(P) = \ln(1 - 1/N) + \ln(1 - 2/N) + \dots + \ln(1 - (k-1)/N)$

Happy Birthday Bob!

$\ln(P) = \ln(1 - 1/N) + \dots + \ln(1 - (k-1)/N)$

For $0 < x < 1$: $\ln(1 - x) < -x$

$\ln(P) < -\left(1/N + 2/N + \dots + (k-1)/N\right)$

Gauss says: $1 + 2 + 3 + 4 + \dots + (n-1) + n = \tfrac{1}{2} n (n+1)$

So,

$\ln(P) < -\tfrac{1}{2}(k-1)k/N$

$P < e^{-\frac{1}{2}(k-1)k/N}$

Probability of match $> 1 - e^{-\frac{1}{2}(k-1)k/N}$

Applying Birthdays

$P(n, k) > 1 - e^{-k(k-1)/(2n)}$

For n = 365, k = 20:
$P(365, 20) > 1 - e^{-20 \cdot 19/(2 \cdot 365)}$
$P(365, 20) > .4058$

For $n = 2^{64}$, $k = 2^{32}$: $P(2^{64}, 2^{32}) > .39$
For $n = 2^{64}$, $k = 2^{33}$: $P(2^{64}, 2^{33}) > .86$
For $n = 2^{64}$, $k = 2^{34}$: $P(2^{64}, 2^{34}) > .9996$

Application: Digital Signatures
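The lower bound is easy to evaluate even for very large n, since Python integers do not overflow. A sketch (the function name is illustrative) that reproduces the numbers quoted on the slide:

```python
import math


def duplicate_lower_bound(n, k):
    """Lower bound 1 - exp(-k(k-1)/(2n)) on the probability of a duplicate
    among k uniform selections from n values."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate when x is small.
    return -math.expm1(-k * (k - 1) / (2 * n))
```

`duplicate_lower_bound(365, 20)` is about 0.4058, and `duplicate_lower_bound(2**64, 2**32)` is about 0.39.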

Digital Signature Scheme: Using Hash Functions

A hash function H maps a message of variable length n bits to a fingerprint of fixed length m bits, with m < n.

This hash value is also called a digest (of the original message).

Since n > m, there exist many messages X that map to the same digest: a collision.

DS schemes with hash functions

[Figure: Signature Generator. The message X is hashed to H(X), DA is applied to H(X), and the result is concatenated (C) with X to give X.DA(H(X)).]

[Figure: Signature Verifier. Given X.DA(H(X)), H is applied to X and EA to the signature part, and the two results are compared (+). 0: Accept, 1: Reject.]

Main properties

Given a hash function H: X → Y

Long message → short, fixed-length hash

One-way property: given y ∈ Y, it is computationally infeasible to find a value x ∈ X s.t. H(x) = y

Collision resistance (collision-free): it is computationally infeasible to find any two distinct values x', x ∈ X s.t. H(x') = H(x)

This property protects against signature forgery.

Collisions

Avoiding collisions is theoretically impossible
Dirichlet principle: if n+1 rabbits go into n cages, at least 2 rabbits go to the same cage
This suggests exhaustive search: try |Y|+1 messages, and a collision must be found (H: X → Y)

In practice
Choose |Y| large enough so that exhaustive search is computationally infeasible.
|Y| should not be too large, or signatures become long and processing slow

However, collision-freeness is still hard

Birthday attack

Can hash values be of 64 bits?
Looks good, initially, since a space of size $2^{64}$ is too large to do exhaustive search or compute that many hash values

However, a birthday attack can easily break a DS with a 64-bit hash function
In fact, the attacker only needs to create a bunch of $2^{32}$ messages and then launch the attack with a reasonably high probability of success.

How the attack works

Goal: given H, find x, x' such that H(x) = H(x')

Algorithm:
pick a random set S of q values in X
for each x ∈ S, compute $h_x = H(x)$
if $h_x = h_{x'}$ for some x' ≠ x then collision found: (x, x'), else fail

The average success probability is $\varepsilon \approx 1 - \exp\left(-\frac{q(q-1)}{2|Y|}\right)$

Suppose Y has size $2^m$ and choose $q \approx 2^{m/2}$: then $\varepsilon \approx 0.39$, and $q \approx 1.18 \cdot 2^{m/2}$ already gives $\varepsilon \approx 0.5$!
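The attack can be tried on a deliberately weak toy hash. The sketch below assumes a hypothetical `toy_hash` that keeps only the top `bits` bits of SHA-256, and it uses q distinct counter-based messages instead of a random set S, on the assumption that their hash values behave like uniform random values.

```python
import hashlib


def toy_hash(message, bits=16):
    """Hypothetical weak hash: the top `bits` bits of SHA-256(message)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def birthday_attack(q, bits=16):
    """Hash q distinct messages and look for two with the same toy hash.
    Returns a colliding pair (x, x_prime), or None if the attempt fails."""
    seen = {}  # hash value -> first message that produced it
    for i in range(q):
        x = b"message-%d" % i
        h = toy_hash(x, bits)
        if h in seen:
            return seen[h], x
        seen[h] = x
    return None
```

With a 16-bit toy hash, a few hundred messages (around 2^8) already give a fair chance of a collision, whereas any q above 2^16 forces one by the Dirichlet principle.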

Balls into Bins

We have m balls that are thrown into n bins, with the location of each ball chosen independently and uniformly at random from n possibilities.

What does the distribution of the balls into the bins look like?
"Birthday paradox" question: is there a bin with at least 2 balls?
How many of the bins are empty?
How many balls are in the fullest bin?

Answers to these questions give solutions to many problems in the design and analysis of algorithms.

The maximum load

When n balls are thrown independently and uniformly at random into n bins, the probability that the maximum load is more than $3 \ln n / \ln\ln n$ is at most $1/n$ for n sufficiently large.

By the union bound over all sets of M balls, Pr[bin 1 receives at least M balls] is at most

$\binom{n}{M} \left(\frac{1}{n}\right)^{M}$

Note that:

$\binom{n}{M} \left(\frac{1}{n}\right)^{M} \le \frac{1}{M!} \le \left(\frac{e}{M}\right)^{M}$

Now, using the union bound again, Pr[any bin receives at least M balls] is at most

$n \left(\frac{e}{M}\right)^{M}$

which is at most $1/n$ for $M \ge 3 \ln n / \ln\ln n$ and n sufficiently large.
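The maximum load statement can be explored with a short simulation. This is a sketch with illustrative function names, assuming n balls thrown uniformly and independently into n bins.

```python
import math
import random
from collections import Counter


def max_load(n, rng):
    """Throw n balls into n bins uniformly at random; return the fullest bin's load."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())


def load_threshold(n):
    """The threshold 3 ln n / ln ln n from the maximum load statement (needs n >= 3)."""
    return 3 * math.log(n) / math.log(math.log(n))
```

For n = 1000 the threshold is about 10.7, while simulated maximum loads are typically much smaller, so the bound is far from tight at this size.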

Application: Bucket Sort

A sorting algorithm that breaks the Ω(n log n) lower bound under certain input assumptions.

Bucket sort works as follows:
Set up an array of initially empty "buckets."
Scatter: Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Gather: Visit the buckets in order and put all elements back into the original array.

A set of $n = 2^m$ integers, randomly chosen from $[0, 2^k)$, $k \ge m$, can be sorted in expected time O(n).
Why: will analyze later!
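The scatter, sort and gather steps can be sketched for the integer case stated on the slide. The choice of n buckets indexed by the top m bits of each value is an assumption made here so that a uniform input puts one element per bucket on average; the function returns a new list instead of writing back into the original array.

```python
def bucket_sort(values, k):
    """Sort n = 2^m integers drawn from [0, 2^k), with k >= m.
    Assumption: n buckets, bucket index = top m bits of the value."""
    n = len(values)
    if n == 0:
        return []
    m = n.bit_length() - 1
    if n != 1 << m or k < m:
        raise ValueError("expected n = 2^m values and k >= m")
    shift = k - m
    buckets = [[] for _ in range(n)]
    for v in values:  # scatter
        buckets[v >> shift].append(v)
    result = []
    for bucket in buckets:  # sort each bucket, then gather in order
        bucket.sort()
        result.extend(bucket)
    return result
```

Because the bucket index is the high-order part of the value, visiting the buckets in order yields a fully sorted sequence.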

The Poisson Distribution

Consider m balls, n bins.

Pr[a given bin is empty] = $\left(1 - \frac{1}{n}\right)^{m} \approx e^{-m/n}$

Let $X_j$ be an indicator r.v. that is 1 if bin j is empty, 0 otherwise.
Let X be a r.v. that represents the number of empty bins, so $E[X] = \sum_{j} E[X_j] = n \left(1 - \frac{1}{n}\right)^{m} \approx n e^{-m/n}$

Generalizing this argument, Pr[a given bin has r balls] = $\binom{m}{r} \left(\frac{1}{n}\right)^{r} \left(1 - \frac{1}{n}\right)^{m-r}$

Approximately, $p_r \approx \frac{e^{-m/n} (m/n)^{r}}{r!}$

So: the number of balls in a given bin is approximately Poisson distributed with mean $\mu = m/n$.
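A sketch comparing the exact binomial probability, the Poisson approximation and a simulation for the load of a bin. The function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math
import random


def binomial_pmf(r, m, n):
    """Exact Pr[a given bin has r balls] for m balls and n bins."""
    return math.comb(m, r) * (1 / n) ** r * (1 - 1 / n) ** (m - r)


def poisson_pmf(r, mu):
    """Poisson probability e^(-mu) * mu^r / r!."""
    return math.exp(-mu) * mu ** r / math.factorial(r)


def empirical_load_fractions(m, n, rng, max_r=5):
    """Throw m balls into n bins; return the fraction of bins holding
    exactly r balls for r = 0..max_r."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return [sum(1 for load in loads if load == r) / n for r in range(max_r + 1)]
```

With m = n, the fraction of empty bins should be close to `poisson_pmf(0, 1.0)`, that is about 0.368.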

Limit of the Binomial Distribution
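A sketch of the limit that the title refers to, consistent with the bins-and-balls approximation (where $p = 1/n$ and $\mu = m/n$). The symbols $p$ and $\mu$ are assumptions of this sketch: $\mu$ is held fixed while $m \to \infty$ with $p = \mu/m$.

```latex
\begin{align*}
\binom{m}{r} p^{r} (1-p)^{m-r}
  &= \frac{m(m-1)\cdots(m-r+1)}{m^{r}} \cdot \frac{\mu^{r}}{r!}
     \cdot \left(1 - \frac{\mu}{m}\right)^{m-r} \\
  &\longrightarrow \frac{e^{-\mu} \mu^{r}}{r!}
     \quad \text{as } m \to \infty .
\end{align*}
```

The first factor tends to 1 for fixed r, and $\left(1 - \mu/m\right)^{m-r}$ tends to $e^{-\mu}$.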
