Theorem 1.7 [Bayes' Law]: Assume that $E_1, E_2, \dots, E_n$ are mutually disjoint events in the sample space $\Omega$ s.t. $\bigcup_{i=1}^{n} E_i = \Omega$. Then

$$\Pr(E_j \mid B) = \frac{\Pr(E_j \cap B)}{\Pr(B)} = \frac{\Pr(B \mid E_j)\Pr(E_j)}{\sum_{i=1}^{n}\Pr(B \mid E_i)\Pr(E_i)}.$$
• We are given three coins and are told that two of the coins are fair and the third coin is biased, landing heads with probability 2/3
• We permute the coins randomly, and then flip each of the coins

18-Jan-17 MAT-72306 RandAl, Spring 2017 49

• The first and second coins come up heads, and the third comes up tails
? What is the probability that the first coin is the biased one?
• The coins are in a random order and so, before our observing the outcomes of the coin flips, each of the three coins is equally likely to be the biased one
• Let $E_i$ be the event that the $i$th coin flipped is the biased one, and let $B$ be the event that the three coin flips came up heads, heads, and tails
• Before we flip the coins, $\Pr(E_i) = 1/3$ for all $i$
• The probability of the event $B$ conditioned on $E_i$:

$$\Pr(B \mid E_1) = \Pr(B \mid E_2) = \frac{2}{3}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{6}$$

and

$$\Pr(B \mid E_3) = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{3} = \frac{1}{12}$$

• Applying Bayes' law, we have

$$\Pr(E_1 \mid B) = \frac{\Pr(B \mid E_1)\Pr(E_1)}{\sum_{i=1}^{3}\Pr(B \mid E_i)\Pr(E_i)} = \frac{1/18}{1/18 + 1/18 + 1/36} = \frac{2}{5}$$
• The three coin flips increase the likelihood that the first coin is the biased one from 1/3 to 2/5
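The update above can be checked numerically with exact arithmetic; a minimal sketch (the function name `posterior` is ours, not from the slides):

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' law: posterior for each hypothesis given one observation."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Three coins, each equally likely to be the biased one.
priors = [Fraction(1, 3)] * 3
# Pr(HHT | coin i is biased): the biased coin lands heads with prob 2/3.
likelihoods = [
    Fraction(2, 3) * Fraction(1, 2) * Fraction(1, 2),  # coin 1 biased
    Fraction(1, 2) * Fraction(2, 3) * Fraction(1, 2),  # coin 2 biased
    Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 3),  # coin 3 biased
]
post = posterior(priors, likelihoods)
print(post[0])  # 2/5
```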
• In the randomized matrix multiplication test, we want to evaluate the increase in confidence in the matrix identity obtained through repeated tests
• In the Bayesian approach one starts with a prior model, giving some initial value to the model parameters
• This model is then modified, by incorporating new observations, to obtain a posterior model that captures the new information
• If we have no information about the process that generated the identity, then a reasonable prior assumption is that the identity is correct with probability 1/2
• Let $E$ be the event that the identity is correct, and let $B$ be the event that the test returns that the identity is correct
• We start with $\Pr(E) = \Pr(\bar{E}) = 1/2$, and since the test has a one-sided error bounded by $1/2$, we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$
• Applying Bayes' law yields

$$\Pr(E \mid B) = \frac{\Pr(B \mid E)\Pr(E)}{\Pr(B \mid E)\Pr(E) + \Pr(B \mid \bar{E})\Pr(\bar{E})} \ge \frac{1/2}{1/2 + (1/2)(1/2)} = \frac{2}{3}$$
• Assume now that we run the randomized test again and it again returns that the identity is correct
• After the first test, we may have revised our prior model, so that we believe $\Pr(E) \ge 2/3$ and $\Pr(\bar{E}) \le 1/3$
• Now let $B$ be the event that the new test returns that the identity is correct; since the tests are independent, as before we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$
• Applying Bayes' law then yields

$$\Pr(E \mid B) \ge \frac{2/3}{2/3 + (1/3)(1/2)} = \frac{4}{5}$$

• In general: If our prior model (before running the test) is that $\Pr(E) \ge 2^i/(2^i + 1)$ and if the test returns that the identity is correct (event $B$), then

$$\Pr(E \mid B) \ge \frac{2^{i+1}}{2^{i+1} + 1} = 1 - \frac{1}{2^{i+1} + 1}$$

• Thus, if all 100 calls to the matrix identity test return that it is correct, our confidence in the correctness of this identity is at least $1 - 1/(2^{100} + 1)$
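The iterated update can be verified exactly; a sketch under the slide's assumptions (one-sided error bounded by 1/2; `updated_confidence` is our name):

```python
from fractions import Fraction

def updated_confidence(prior, error_bound=Fraction(1, 2)):
    """Posterior lower bound after one passing test with one-sided error."""
    # Pr(B | E) = 1; Pr(B | not E) <= error_bound.
    return prior / (prior + (1 - prior) * error_bound)

conf = Fraction(1, 2)      # prior: identity correct with probability 1/2
for _ in range(100):       # 100 tests, all returning "correct"
    conf = updated_confidence(conf)
print(conf == 1 - Fraction(1, 2**100 + 1))  # True
```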
1.4. A Randomized Min-Cut Algorithm
• A cut-set in a graph is a set of edges whose removal breaks the graph into two or more connected components
• Given a graph $G = (V, E)$ with $n$ vertices, the minimum cut – or min-cut – problem is to find a minimum cardinality cut-set in $G$
• Minimum cut problems arise in many contexts, including the study of network reliability
• Minimum cuts also arise in clustering problems
• For example, if nodes represent Web pages (or any documents in a hypertext-based system)
– and two nodes have an edge between them if the corresponding nodes have a hyperlink between them,
– then small cuts divide the graph into clusters of documents with few links between clusters
• Documents in different clusters are likely to be unrelated
• The main operation is edge contraction
• In contracting an edge $\{u, v\}$ we merge vertices $u$ and $v$ into one, eliminate all edges connecting $u$ and $v$, and retain all other edges in the graph
• The new graph may have parallel edges but no self-loops
• The algorithm consists of $n - 2$ iterations
• Each iteration picks an edge from the existing edges in the graph and contracts that edge
• Our randomized algorithm chooses the edge uniformly at random from the remaining edges
• Each iteration reduces the number of vertices by one
• After $n - 2$ iterations, there are two vertices
• The algorithm outputs the set of edges connecting the two remaining vertices
• Any cut-set in an intermediate iteration of the algorithm is also a cut-set of the original graph
• Not every cut-set of the original graph is one in an intermediate iteration, since some edges may have been contracted in previous iterations
• As a result, the output of the algorithm is always a cut-set of the original graph but not necessarily the minimum cardinality cut-set
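The contraction algorithm described above can be sketched in a few lines; this is illustrative, not an efficient implementation, and all names are ours:

```python
import random

def contract_min_cut(edges, seed=None):
    """One run of the randomized contraction (min-cut) algorithm.

    `edges` is a list of vertex pairs; parallel edges are kept,
    self-loops are dropped after each contraction.
    """
    rng = random.Random(seed)
    edges = list(edges)
    vertices = {v for e in edges for v in e}
    while len(vertices) > 2:
        u, v = rng.choice(edges)  # edge chosen uniformly at random
        vertices.discard(v)       # merge v into u
        edges = [(u if x == v else x, u if y == v else y) for x, y in edges]
        edges = [(x, y) for x, y in edges if x != y]  # remove self-loops
    return edges  # edges connecting the two remaining vertices

# Example: a 4-cycle, whose min-cut has size 2.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
best = min(len(contract_min_cut(cycle, seed)) for seed in range(20))
print(best)  # 2
```

On a cycle every contraction just shortens the cycle, so here every run returns a cut of size 2; on general graphs the output is a cut-set but not always a minimum one, which is why one runs the algorithm many times and keeps the smallest cut found.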
Theorem 1.8: The algorithm outputs a min-cut set with probability at least $2/(n(n-1))$.

Proof: Let $k$ be the size of the min-cut set of $G$. The graph may have several cut-sets of minimum size. We compute the probability of finding one specific such set $C$.

Since $C$ is a cut-set in the graph, removal of the set $C$ partitions the set of vertices into two sets, $S$ and $V \setminus S$, such that there are no edges connecting vertices in $S$ to those in $V \setminus S$.
Assume that, throughout an execution of the algorithm, we contract only edges that connect two vertices in $S$ or two in $V \setminus S$, but not edges in $C$. In that case, all the edges eliminated throughout the execution will be edges connecting vertices in $S$ or vertices in $V \setminus S$, and after $n - 2$ iterations the algorithm returns a graph with two vertices connected by the edges in $C$. We may conclude that, if the algorithm never chooses an edge of $C$ in its $n - 2$ iterations, then the algorithm returns $C$ as the minimum cut-set.
If the size of the cut is small, then the probability that the algorithm chooses an edge of $C$ is small – at least when the number of edges remaining is large compared to $C$.

Let $E_i$ be the event that the edge contracted in iteration $i$ is not in $C$, and let $F_i = \bigcap_{j=1}^{i} E_j$ be the event that no edge of $C$ was contracted in the first $i$ iterations. We need to compute $\Pr(F_{n-2})$.

Start by computing $\Pr(E_1) = \Pr(F_1)$. Since the minimum cut-set has $k$ edges, all vertices in the graph must have degree $k$ or larger. If each vertex is adjacent to at least $k$ edges, then the graph must have at least $nk/2$ edges.
Since there are at least $nk/2$ edges in the graph and since $C$ has $k$ edges, the probability that we do not choose an edge of $C$ in the first iteration is

$$\Pr(E_1) = \Pr(F_1) \ge 1 - \frac{k}{nk/2} = 1 - \frac{2}{n}$$

Suppose that the first contraction did not eliminate an edge of $C$. I.e., we condition on the event $F_1$. Then, after the first iteration, we are left with an $(n-1)$-node graph with minimum cut-set of size $k$. Again, the degree of each vertex in the graph must be at least $k$, and the graph must have at least $k(n-1)/2$ edges.
Hence

$$\Pr(E_2 \mid F_1) \ge 1 - \frac{k}{k(n-1)/2} = 1 - \frac{2}{n-1}$$

Similarly,

$$\Pr(E_{i+1} \mid F_i) \ge 1 - \frac{k}{k(n-i)/2} = 1 - \frac{2}{n-i}$$

To compute $\Pr(F_{n-2})$, we use

$$\Pr(F_{n-2}) = \Pr(E_{n-2} \cap F_{n-3}) = \Pr(E_{n-2} \mid F_{n-3}) \cdot \Pr(F_{n-3}) = \Pr(E_{n-2} \mid F_{n-3}) \cdots \Pr(E_2 \mid F_1) \cdot \Pr(F_1)$$

$$\ge \prod_{i=1}^{n-2}\left(1 - \frac{2}{n-i+1}\right) = \prod_{i=1}^{n-2}\frac{n-i-1}{n-i+1} = \left(\frac{n-2}{n}\right)\left(\frac{n-3}{n-1}\right)\left(\frac{n-4}{n-2}\right)\cdots\left(\frac{2}{4}\right)\left(\frac{1}{3}\right) = \frac{2}{n(n-1)}.$$
2. Discrete Random Variables and Expectation

Random Variables and Expectation
The Bernoulli and Binomial Random Variables
Conditional Expectation
The Geometric Distribution
The Expected Run-Time of Quicksort
• In tossing two dice we are often interested in the sum of the dice rather than their separate values
• The sample space in tossing two dice consists of 36 events of equal probability, given by the ordered pairs of numbers {(1,1), (1,2), … , (6,6)}
• If the quantity we are interested in is the sum of the two dice, then we are interested in 11 events (of unequal probability)
• Any such function from the sample space to the real numbers is called a random variable
2.1. Random Variables and Expectation
Definition 2.1: A random variable (RV) $X$ on a sample space $\Omega$ is a real-valued function on $\Omega$; that is, $X : \Omega \to \mathbb{R}$. A discrete random variable is a RV that takes on only a finite or countably infinite number of values

• For a discrete RV $X$ and a real value $a$, the event "$X = a$" includes all the basic events of the sample space in which $X$ assumes the value $a$
• I.e., "$X = a$" represents the set $\{s \in \Omega : X(s) = a\}$
• We denote the probability of that event by

$$\Pr(X = a) = \sum_{s \in \Omega:\ X(s) = a} \Pr(s)$$

• If $X$ is the RV representing the sum of the two dice, the event $X = 4$ corresponds to the set of basic events $\{(1,3), (2,2), (3,1)\}$
• Hence,

$$\Pr(X = 4) = \frac{3}{36} = \frac{1}{12}$$
Definition 2.2: Two RVs $X$ and $Y$ are independent if and only if

$$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$$

for all values $x$ and $y$. Similarly, RVs $X_1, X_2, \dots, X_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$ and any values $x_i$, $i \in I$,

$$\Pr\left(\bigcap_{i \in I} (X_i = x_i)\right) = \prod_{i \in I} \Pr(X_i = x_i)$$
Definition 2.3: The expectation of a discrete RV $X$, denoted by $\mathrm{E}[X]$, is given by

$$\mathrm{E}[X] = \sum_i i \Pr(X = i),$$

where the summation is over all values $i$ in the range of $X$. The expectation is finite if $\sum_i |i| \Pr(X = i)$ converges; otherwise, it is unbounded.

• E.g., the expectation of the RV $X$ representing the sum of two dice is

$$\mathrm{E}[X] = \frac{1}{36}\cdot 2 + \frac{2}{36}\cdot 3 + \frac{3}{36}\cdot 4 + \cdots + \frac{1}{36}\cdot 12 = 7$$
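The sum can be checked by brute force over the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# Expectation of the sum of two dice, computed directly over the
# 36 equally likely outcomes of the sample space.
expectation = sum(Fraction(a + b, 36) for a, b in product(range(1, 7), repeat=2))
print(expectation)  # 7
```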
• As an example of where the expectation of a discrete RV is unbounded, consider a RV $X$ that takes on the value $2^i$ with probability $1/2^i$ for $i = 1, 2, \dots$
• The expected value of $X$ is

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \frac{1}{2^i} \cdot 2^i = \sum_{i=1}^{\infty} 1 = \infty$$

• Writing $\mathrm{E}[X] = \infty$ expresses that $\mathrm{E}[X]$ is unbounded
2.1.1. Linearity of Expectations
• By this property, the expectation of the sum of RVs is equal to the sum of their expectations

Theorem 2.1 [Linearity of Expectations]: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations,

$$\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$$
Proof: We prove the statement for two random variables $X$ and $Y$ (the general case follows by induction). The summations that follow are understood to be over the ranges of the corresponding RVs:

$$\mathrm{E}[X + Y] = \sum_x \sum_y (x + y) \Pr((X = x) \cap (Y = y))$$
$$= \sum_x \sum_y x \Pr((X = x) \cap (Y = y)) + \sum_x \sum_y y \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \sum_y \Pr((X = x) \cap (Y = y)) + \sum_y y \sum_x \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \Pr(X = x) + \sum_y y \Pr(Y = y)$$
$$= \mathrm{E}[X] + \mathrm{E}[Y]$$

The first equality follows from Definition 1.2. The penultimate equation uses Theorem 1.6, the law of total probability.
• Let us now compute the expected sum of two standard dice
• Let $X = X_1 + X_2$, where $X_i$ represents the outcome of die $i$ for $i = 1, 2$
• Then

$$\mathrm{E}[X_i] = \sum_{j=1}^{6} \frac{1}{6}\, j = \frac{7}{2}$$

• Applying the linearity of expectations, we have $\mathrm{E}[X] = \mathrm{E}[X_1] + \mathrm{E}[X_2] = 7$
• Linearity of expectations holds for any collection of RVs, even if they are not independent
• Consider, e.g., the previous example and let the random variable $Y = X_1 + X_1^2$
• We have

$$\mathrm{E}[Y] = \mathrm{E}[X_1 + X_1^2] = \mathrm{E}[X_1] + \mathrm{E}[X_1^2]$$

even though $X_1$ and $X_1^2$ are clearly dependent
• Verify the identity by considering the six possible outcomes for $X_1$
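The suggested verification, enumerating the six equally likely outcomes for $X_1$:

```python
from fractions import Fraction

# E[Y] for Y = X1 + X1^2, compared with E[X1] + E[X1^2]:
# linearity holds even though X1 and X1^2 are dependent.
outcomes = range(1, 7)
e_y = sum(Fraction(x + x * x, 6) for x in outcomes)
e_x1 = sum(Fraction(x, 6) for x in outcomes)
e_x1_sq = sum(Fraction(x * x, 6) for x in outcomes)
print(e_y == e_x1 + e_x1_sq)  # True
```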
Lemma 2.2: For any constant $c$ and discrete RV $X$, $\mathrm{E}[cX] = c\,\mathrm{E}[X]$.

Proof: The lemma is obvious for $c = 0$. For $c \neq 0$,

$$\mathrm{E}[cX] = \sum_j j \Pr(cX = j) = c \sum_j (j/c) \Pr(X = j/c) = c \sum_k k \Pr(X = k) = c\,\mathrm{E}[X].$$
2.1.2. Jensen's Inequality
• Let us choose the length $X$ of a side of a square uniformly at random from the range [1,99]
• What is the expected value of the area?
• We can write this as $\mathrm{E}[X^2]$
• It is tempting to think of this as being equal to $(\mathrm{E}[X])^2$, but a simple calculation shows that this is not correct
• In fact, $(\mathrm{E}[X])^2 = 50^2 = 2500$ whereas $\mathrm{E}[X^2] = \frac{9950}{3} \approx 3316.7 > 2500$
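Assuming the side length is drawn uniformly from the integers 1 through 99 (which matches the numbers quoted above), both quantities can be computed exactly:

```python
from fractions import Fraction

# Side length X uniform on the integers 1..99.
sides = range(1, 100)
e_x = sum(Fraction(x, 99) for x in sides)
e_x2 = sum(Fraction(x * x, 99) for x in sides)
print(e_x ** 2)  # 2500
print(e_x2)      # 9950/3, about 3316.7 > 2500
```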
• More generally, $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$
• Consider $Y = (X - \mathrm{E}[X])^2$
• The RV $Y$ is nonnegative and hence its expectation must also be nonnegative:

$$0 \le \mathrm{E}[Y] = \mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2] = \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2 = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$$

• To obtain the penultimate line, use the linearity of expectations
• To obtain the last line, use Lemma 2.2 to simplify $\mathrm{E}[2X\,\mathrm{E}[X]] = 2\,\mathrm{E}[X]\,\mathrm{E}[X]$
• The fact that $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$ is an example of Jensen's inequality
• Jensen's inequality shows that, for any convex function $f$, we have $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$

Definition 2.4: A function $f : \mathbb{R} \to \mathbb{R}$ is said to be convex if, for any $x_1, x_2$ and $0 \le \lambda \le 1$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

Lemma 2.3: If $f$ is a twice differentiable function, then $f$ is convex if and only if $f''(x) \ge 0$
Theorem 2.4 [Jensen's Inequality]: If $f$ is a convex function, then $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$.

Proof: We prove the theorem assuming that $f$ has a Taylor expansion. Let $\mu = \mathrm{E}[X]$. By Taylor's theorem there is a value $c$ such that

$$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(c)(x - \mu)^2}{2} \ge f(\mu) + f'(\mu)(x - \mu),$$

since $f''(c) \ge 0$ by convexity. Taking expectations and applying linearity of expectations and Lemma 2.2 yields:

$$\mathrm{E}[f(X)] \ge \mathrm{E}[f(\mu) + f'(\mu)(X - \mu)] = f(\mu) + f'(\mu)(\mathrm{E}[X] - \mu) = f(\mu) = f(\mathrm{E}[X]).$$
2.2. The Bernoulli and Binomial Random Variables

• We run an experiment that succeeds with probability $p$ and fails with probability $1 - p$
• Let $Y$ be a RV such that

$$Y = \begin{cases} 1 & \text{if the experiment succeeds,} \\ 0 & \text{otherwise} \end{cases}$$

• The variable $Y$ is called a Bernoulli or an indicator random variable
• Note that, for a Bernoulli RV, $\mathrm{E}[Y] = p \cdot 1 + (1 - p) \cdot 0 = p = \Pr(Y = 1)$
• If we, e.g., flip a fair coin and consider heads a success, then the expected value of the corresponding indicator RV is 1/2
• Consider a sequence of $n$ independent coin flips
• What is the distribution of the number of heads in the entire sequence?
• More generally, consider a sequence of $n$ independent experiments, each of which succeeds with probability $p$
• If we let $X$ represent the number of successes in the $n$ experiments, then $X$ has a binomial distribution
Definition 2.5: A binomial RV $X$ with parameters $n$ and $p$, denoted by $B(n, p)$, is defined by the following probability distribution on $j = 0, 1, 2, \dots, n$:

$$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n-j}$$

• I.e., the binomial RV (BRV) equals $j$ when there are exactly $j$ successes and $n - j$ failures in $n$ independent experiments, each of which is successful with probability $p$
• Definition 2.5 ensures that the BRV is a valid probability function (Definition 1.2):

$$\sum_{j=0}^{n} \Pr(X = j) = 1$$
• We want to gather data about the packets going through a router
• We want to know the approximate fraction of packets from a certain source / of a certain type
• We store a random subset – or sample – of the packets for later analysis
• If each packet is stored with probability $p$ and $n$ packets go through the router each day, then the number $X$ of sampled packets each day is a BRV with parameters $n$ and $p$
• To know how much memory is necessary for such a sample, determine the expectation of $X$
• If $X$ is a BRV with parameters $n$ and $p$, then $X$ is the number of successes in $n$ trials, where each trial is successful with probability $p$
• Define a set of indicator RVs $X_1, \dots, X_n$, where $X_i = 1$ if the $i$th trial is successful and 0 otherwise
• Clearly, $\mathrm{E}[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$ and so, by the linearity of expectations,

$$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np$$
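Both Definition 2.5 and the expectation $np$ can be checked with exact arithmetic (`binomial_pmf` is our name, not from the slides):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, j):
    """Pr(X = j) for X ~ B(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 10, Fraction(3, 10)
pmf = [binomial_pmf(n, p, j) for j in range(n + 1)]
print(sum(pmf) == 1)                                   # a valid distribution
print(sum(j * pmf[j] for j in range(n + 1)) == n * p)  # E[X] = np
```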
2.3. Conditional Expectation
Definition 2.6: The conditional expectation of $Y$ given $Z = z$ is

$$\mathrm{E}[Y \mid Z = z] = \sum_y y \Pr(Y = y \mid Z = z),$$

where the summation is over all $y$ in the range of $Y$

• The conditional expectation of a RV is, like $\mathrm{E}[Y]$, a weighted sum of the values it assumes
• Now each value is weighted by the conditional probability that the variable assumes that value
• Suppose that we independently roll two standard six-sided dice
• Let $X_1$ be the number that shows on the first die, $X_2$ the number on the second die, and $X$ the sum of the numbers on the two dice
• Then

$$\mathrm{E}[X \mid X_1 = 2] = \sum_x x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$$
• As another example, consider $\mathrm{E}[X_1 \mid X = 5]$:

$$\mathrm{E}[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}$$
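Both conditional expectations can be verified by enumerating the sample space (`cond_exp` is our helper, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Sample space of two dice, all 36 outcomes equally likely.
rolls = list(product(range(1, 7), repeat=2))

def cond_exp(value, condition):
    """E[value(s) | condition(s)] over the uniform sample space."""
    kept = [s for s in rolls if condition(s)]
    return Fraction(sum(value(s) for s in kept), len(kept))

print(cond_exp(lambda s: s[0] + s[1], lambda s: s[0] == 2))    # 11/2
print(cond_exp(lambda s: s[0], lambda s: s[0] + s[1] == 5))    # 5/2
```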
Lemma 2.5: For any RVs $X$ and $Y$,

$$\mathrm{E}[X] = \sum_y \Pr(Y = y)\,\mathrm{E}[X \mid Y = y],$$

where the sum is over all values $y$ in the range of $Y$ and all of the expectations exist.

Proof:

$$\sum_y \Pr(Y = y)\,\mathrm{E}[X \mid Y = y] = \sum_y \Pr(Y = y) \sum_x x \Pr(X = x \mid Y = y)$$
$$= \sum_x \sum_y x \Pr(X = x \mid Y = y)\Pr(Y = y)$$
$$= \sum_x \sum_y x \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \Pr(X = x) = \mathrm{E}[X]$$
• The linearity of expectations also extends to conditional expectations

Lemma 2.6: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations and for any RV $Y$,

$$\mathrm{E}\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} \mathrm{E}[X_i \mid Y = y]$$
• Confusingly, the conditional expectation is also used to refer to the following RV

Definition 2.7: The expression $\mathrm{E}[Y \mid Z]$ is a RV $f(Z)$ that takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$

• $\mathrm{E}[Y \mid Z]$ is not a real value; it is actually a function of the RV $Z$
• Hence $\mathrm{E}[Y \mid Z]$ is itself a function from the sample space to the real numbers and can therefore be thought of as a RV
• In the previous example of rolling two dice,

$$\mathrm{E}[X \mid X_1] = \sum_x x \Pr(X = x \mid X_1) = X_1 + \frac{7}{2}$$

• We see that $\mathrm{E}[X \mid X_1]$ is a RV whose value depends on $X_1$
• If $\mathrm{E}[Y \mid Z]$ is a RV, then it makes sense to consider its expectation $\mathrm{E}[\mathrm{E}[Y \mid Z]]$
• We found that $\mathrm{E}[X \mid X_1] = X_1 + 7/2$
• Thus,

$$\mathrm{E}[\mathrm{E}[X \mid X_1]] = \mathrm{E}\left[X_1 + \frac{7}{2}\right] = \frac{7}{2} + \frac{7}{2} = 7 = \mathrm{E}[X]$$
• More generally,

Theorem 2.7: $\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \mid Z]]$

Proof: From Definition 2.7 we have $\mathrm{E}[Y \mid Z] = f(Z)$, where $f(Z)$ takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$. Hence

$$\mathrm{E}[\mathrm{E}[Y \mid Z]] = \mathrm{E}[f(Z)] = \sum_z \mathrm{E}[Y \mid Z = z] \Pr(Z = z)$$

The right-hand side equals $\mathrm{E}[Y]$ by Lemma 2.5.
• Consider a program that includes one call to a process $S$
• Assume that each call to process $S$ recursively spawns new copies of the process $S$, where the number of new copies is a BRV with parameters $n$ and $p$
• We assume that these random variables are independent for each call to $S$
• What is the expected number of copies of the process $S$ generated by the program?
• To analyze this recursive spawning process, we use generations
• The initial process is in generation 0
• Otherwise, we say that a process is in generation $i$ if it was spawned by another process in generation $i - 1$
• Let $Y_i$ denote the number of processes in generation $i$
• Since we know that $Y_0 = 1$, the number of processes in generation 1 has a binomial distribution
• Thus, $\mathrm{E}[Y_1] = np$
• Similarly, suppose we knew that the number of processes in generation $i - 1$ was $y_{i-1}$, so $Y_{i-1} = y_{i-1}$
• Then

$$\mathrm{E}[Y_i \mid Y_{i-1} = y_{i-1}] = np\,y_{i-1}$$

• Applying Theorem 2.7, we can compute the expected size of the $i$th generation inductively
• We have

$$\mathrm{E}[Y_i] = \mathrm{E}[\mathrm{E}[Y_i \mid Y_{i-1}]] = \mathrm{E}[np\,Y_{i-1}] = np\,\mathrm{E}[Y_{i-1}]$$

• By induction on $i$, and using the fact that $Y_0 = 1$, we then obtain $\mathrm{E}[Y_i] = (np)^i$
• The expected total number of copies of process $S$ generated by the program is given by

$$\mathrm{E}\left[\sum_{i \ge 0} Y_i\right] = \sum_{i \ge 0} \mathrm{E}[Y_i] = \sum_{i \ge 0} (np)^i$$

• If $np \ge 1$ then the expectation is unbounded; if $np < 1$, the expectation is $1/(1 - np)$
• The expected number of processes generated by the program is bounded iff the expected number of processes spawned by each process is less than 1
• This is a simple example of a branching process, a probabilistic paradigm extensively studied in probability theory
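The spawning process above can be simulated; a small Monte Carlo sketch (names and parameters are ours; with $np = 0.6 < 1$ the expected total, including the initial process, is $1/(1 - np) = 2.5$):

```python
import random

def total_copies(n, p, rng, max_procs=10**6):
    """Simulate the spawning process; return total processes, incl. the first."""
    total, frontier = 1, 1        # generation 0: the initial process
    while frontier and total < max_procs:
        # Each process in the current generation spawns B(n, p) copies.
        spawned = sum(sum(rng.random() < p for _ in range(n))
                      for _ in range(frontier))
        total += spawned
        frontier = spawned
    return total

rng = random.Random(0)
n, p, trials = 3, 0.2, 2000       # np = 0.6 < 1, so the series converges
avg = sum(total_copies(n, p, rng) for _ in range(trials)) / trials
print(avg)  # close to 1/(1 - np) = 2.5
```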
2.4. The Geometric Distribution
• Let us flip a coin until it lands on heads
? What is the distribution of the number of flips?
• This is an example of a geometric distribution
• It arises when we perform a sequence of independent trials until the first success, where each trial succeeds with probability $p$

Definition 2.8: A geometric RV $X$ with parameter $p$ is given by the following probability distribution on $n = 1, 2, \dots$:

$$\Pr(X = n) = (1 - p)^{n-1} p$$
• Geometric RVs are said to be memoryless because the probability that you will reach your first success $n$ trials from now is independent of the number of failures you have experienced
• Informally, one can ignore past failures – they do not change the distribution of the number of future trials until first success
• Formally, we have the following

Lemma 2.8: For a geometric RV $X$ with parameter $p$ and for $k > 0$,

$$\Pr(X = n + k \mid X > k) = \Pr(X = n)$$
• When a RV takes values in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \dots\}$ there is an alternative formula for calculating its expectation

Lemma 2.9: Let $X$ be a discrete RV that takes on only nonnegative integer values. Then

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$$

Proof:

$$\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X]$$
• For a geometric RV $X$ with parameter $p$,

$$\Pr(X \ge i) = \sum_{n=i}^{\infty} (1 - p)^{n-1} p = (1 - p)^{i-1}$$

• Hence

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} (1 - p)^{i-1} = \frac{1}{1 - (1 - p)} = \frac{1}{p}$$

• Thus, for a fair coin where $p = 1/2$, on average it takes two flips to see the first heads
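A quick simulation of the fair-coin case (names are ours):

```python
import random

def flips_until_heads(p, rng):
    """Number of flips until the first heads; geometric with parameter p."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(1)
trials = 100_000
avg = sum(flips_until_heads(0.5, rng) for _ in range(trials)) / trials
print(avg)  # close to 1/p = 2
```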
• We can also find the expectation of a geometric RV $X$ with parameter $p$ using conditional expectations and the memoryless property of geometric RVs
• Recall that $X$ corresponds to the number of flips until the first heads given that each flip is heads with probability $p$
• Let $Y = 0$ if the first flip is tails and $Y = 1$ if the first flip is heads
• By the identity from Lemma 2.5,

$$\mathrm{E}[X] = \Pr(Y = 0)\,\mathrm{E}[X \mid Y = 0] + \Pr(Y = 1)\,\mathrm{E}[X \mid Y = 1] = (1 - p)\,\mathrm{E}[X \mid Y = 0] + p\,\mathrm{E}[X \mid Y = 1]$$
• If $Y = 1$ then $X = 1$, so $\mathrm{E}[X \mid Y = 1] = 1$
• If $Y = 0$, then $X > 1$
• In this case, let the number of remaining flips (after the first flip until the first heads) be $Z$
• Then, by the linearity of expectations, $\mathrm{E}[X] = (1 - p)\,\mathrm{E}[Z + 1] + p \cdot 1 = (1 - p)\,\mathrm{E}[Z] + 1$
• By the memoryless property of geometric RVs, $Z$ is also a geometric RV with parameter $p$
• Hence $\mathrm{E}[Z] = \mathrm{E}[X]$, since they both have the same distribution
• We therefore have $\mathrm{E}[X] = (1 - p)\,\mathrm{E}[Z] + 1 = (1 - p)\,\mathrm{E}[X] + 1$, which yields $\mathrm{E}[X] = 1/p$
2.4.1. Example: Coupon Collector's Problem

• Each box of cereal contains one of $n$ different coupons
• Once you obtain one of every type of coupon, you can send in for a prize
• The coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, and you do not collaborate with others to collect coupons
? How many boxes of cereal must you buy before you obtain at least one of every type of coupon?
• Let $X$ be the number of boxes bought until at least one of every type of coupon is obtained
• If $X_i$ is the number of boxes bought while you had exactly $i - 1$ different coupons, then clearly $X = \sum_{i=1}^{n} X_i$
• The advantage of breaking $X$ into a sum of random variables $X_i$, $i = 1, \dots, n$, is that each $X_i$ is a geometric RV
• When exactly $i - 1$ coupons have been found, the probability of obtaining a new coupon is

$$p_i = 1 - \frac{i - 1}{n} = \frac{n - i + 1}{n}$$
• Hence, $X_i$ is a geometric RV with parameter $p_i$:

$$\mathrm{E}[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$$

• Using the linearity of expectations, we have that

$$\mathrm{E}[X] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$$
• The summation $\sum_{i=1}^{n} 1/i$ is known as the harmonic number $H(n)$

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} 1/i$ satisfies $H(n) = \ln n + \Theta(1)$.

• Thus, for the coupon collector's problem, the expected number of random coupons required to obtain all $n$ coupons is $n \ln n + \Theta(n)$
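The exact expectation $n\,H(n)$ is easy to compute (`expected_boxes` is our name):

```python
from fractions import Fraction

def expected_boxes(n):
    """E[X] = n * H(n): sum of the expectations of the geometric stages."""
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

e10 = expected_boxes(10)
print(e10, float(e10))  # 7381/252, about 29.3 boxes for n = 10
```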