Theorem 1.7 [Bayes' Law]: Assume that $E_1, E_2, \dots, E_n$ are mutually disjoint events in the sample space $\Omega$ s.t. $\bigcup_{i=1}^{n} E_i = \Omega$. Then

$$\Pr(E_j \mid B) = \frac{\Pr(E_j \cap B)}{\Pr(B)} = \frac{\Pr(B \mid E_j)\Pr(E_j)}{\sum_{i=1}^{n}\Pr(B \mid E_i)\Pr(E_i)}.$$
• We are given three coins and are told that two of the coins are fair and the third coin is biased, landing heads with probability 2/3
• We permute the coins randomly, and then flip each of the coins

18-Jan-17 MAT-72306 RandAl, Spring 2017 49

• The first and second coins come up heads, and the third comes up tails
? What is the probability that the first coin is the biased one?
• The coins are in a random order and so, before our observing the outcomes of the coin flips, each of the three coins is equally likely to be the biased one
• Let $E_i$ be the event that the $i$th coin flipped is the biased one, and let $B$ be the event that the three coin flips came up heads, heads, and tails
• Before we flip the coins, $\Pr(E_i) = 1/3$ for all $i$
• The probability of the event $B$ conditioned on $E_i$:

$$\Pr(B \mid E_1) = \Pr(B \mid E_2) = \frac{2}{3}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{6}$$

and

$$\Pr(B \mid E_3) = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{3} = \frac{1}{12}$$

• Applying Bayes' law, we have

$$\Pr(E_1 \mid B) = \frac{\Pr(B \mid E_1)\Pr(E_1)}{\sum_{i=1}^{3}\Pr(B \mid E_i)\Pr(E_i)} = \frac{1/18}{1/18 + 1/18 + 1/36} = \frac{2}{5}$$
• The three coin flips increase the likelihood that the first coin is the biased one from 1/3 to 2/5
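The update above can be checked numerically with exact arithmetic; a minimal sketch (the function name `posterior` is ours, not from the slides):

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' law: posterior for each hypothesis given one observation."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Three coins, each equally likely to be the biased one.
priors = [Fraction(1, 3)] * 3
# Pr(HHT | coin i is biased): the biased coin lands heads with prob 2/3.
likelihoods = [
    Fraction(2, 3) * Fraction(1, 2) * Fraction(1, 2),  # coin 1 biased
    Fraction(1, 2) * Fraction(2, 3) * Fraction(1, 2),  # coin 2 biased
    Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 3),  # coin 3 biased
]
post = posterior(priors, likelihoods)
print(post[0])  # 2/5
```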
• In the randomized matrix multiplication test, we want to evaluate the increase in confidence in the matrix identity obtained through repeated tests
• In the Bayesian approach one starts with a prior model, giving some initial value to the model parameters
• This model is then modified, by incorporating new observations, to obtain a posterior model that captures the new information
• If we have no information about the process that generated the identity, then a reasonable prior assumption is that the identity is correct with probability 1/2
• Let $E$ be the event that the identity is correct, and let $B$ be the event that the test returns that the identity is correct
• We start with $\Pr(E) = \Pr(\bar{E}) = 1/2$, and since the test has a one-sided error bounded by $1/2$, we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$
• Applying Bayes' law yields

$$\Pr(E \mid B) = \frac{\Pr(B \mid E)\Pr(E)}{\Pr(B \mid E)\Pr(E) + \Pr(B \mid \bar{E})\Pr(\bar{E})} \ge \frac{1/2}{1/2 + (1/2)(1/2)} = \frac{2}{3}$$
• Assume now that we run the randomized test again and it again returns that the identity is correct
• After the first test, we may have revised our prior model, so that we believe $\Pr(E) \ge 2/3$ and $\Pr(\bar{E}) \le 1/3$
• Now let $B$ be the event that the new test returns that the identity is correct; since the tests are independent, as before we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$
• Applying Bayes' law then yields

$$\Pr(E \mid B) \ge \frac{2/3}{2/3 + (1/3)(1/2)} = \frac{4}{5}$$

• In general: If our prior model (before running the test) is that $\Pr(E) \ge 2^i/(2^i + 1)$ and if the test returns that the identity is correct (event $B$), then

$$\Pr(E \mid B) \ge \frac{2^{i+1}}{2^{i+1} + 1} = 1 - \frac{1}{2^{i+1} + 1}$$

• Thus, if all 100 calls to the matrix identity test return that it is correct, our confidence in the correctness of this identity is at least $1 - 1/(2^{100} + 1)$
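The iterated update can be verified exactly; a sketch under the slide's assumptions (one-sided error bounded by 1/2; `updated_confidence` is our name):

```python
from fractions import Fraction

def updated_confidence(prior, error_bound=Fraction(1, 2)):
    """Posterior lower bound after one passing test with one-sided error."""
    # Pr(B | E) = 1; Pr(B | not E) <= error_bound.
    return prior / (prior + (1 - prior) * error_bound)

conf = Fraction(1, 2)      # prior: identity correct with probability 1/2
for _ in range(100):       # 100 tests, all returning "correct"
    conf = updated_confidence(conf)
print(conf == 1 - Fraction(1, 2**100 + 1))  # True
```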
1.4. A Randomized Min-Cut Algorithm
• A cut-set in a graph is a set of edges whose removal breaks the graph into two or more connected components
• Given a graph $G = (V, E)$ with $n$ vertices, the minimum cut – or min-cut – problem is to find a minimum cardinality cut-set in $G$
• Minimum cut problems arise in many contexts, including the study of network reliability
• Minimum cuts also arise in clustering problems
• For example, if nodes represent Web pages (or any documents in a hypertext-based system)
– and two nodes have an edge between them if the corresponding nodes have a hyperlink between them,
– then small cuts divide the graph into clusters of documents with few links between clusters
• Documents in different clusters are likely to be unrelated
• The main operation is edge contraction
• In contracting an edge $\{u, v\}$ we merge vertices $u$ and $v$ into one, eliminate all edges connecting $u$ and $v$, and retain all other edges in the graph
• The new graph may have parallel edges but no self-loops
• The algorithm consists of $n - 2$ iterations
• Each iteration picks an edge from the existing edges in the graph and contracts that edge
• Our randomized algorithm chooses the edge uniformly at random from the remaining edges
• Each iteration reduces the number of vertices by one
• After $n - 2$ iterations, there are two vertices
• The algorithm outputs the set of edges connecting the two remaining vertices
• Any cut-set in an intermediate iteration of the algorithm is also a cut-set of the original graph
• Not every cut-set of the original graph is one in an intermediate iteration, since some edges may have been contracted in previous iterations
• As a result, the output of the algorithm is always a cut-set of the original graph but not necessarily the minimum cardinality cut-set
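The contraction algorithm described above can be sketched in a few lines; this is illustrative, not an efficient implementation, and all names are ours:

```python
import random

def contract_min_cut(edges, seed=None):
    """One run of the randomized contraction (min-cut) algorithm.

    `edges` is a list of vertex pairs; parallel edges are kept,
    self-loops are dropped after each contraction.
    """
    rng = random.Random(seed)
    edges = list(edges)
    vertices = {v for e in edges for v in e}
    while len(vertices) > 2:
        u, v = rng.choice(edges)  # edge chosen uniformly at random
        vertices.discard(v)       # merge v into u
        edges = [(u if x == v else x, u if y == v else y) for x, y in edges]
        edges = [(x, y) for x, y in edges if x != y]  # remove self-loops
    return edges  # edges connecting the two remaining vertices

# Example: a 4-cycle, whose min-cut has size 2.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
best = min(len(contract_min_cut(cycle, seed)) for seed in range(20))
print(best)  # 2
```

On a cycle every contraction just shortens the cycle, so here every run returns a cut of size 2; on general graphs the output is a cut-set but not always a minimum one, which is why one runs the algorithm many times and keeps the smallest cut found.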
Theorem 1.8: The algorithm outputs a min-cut set with probability at least $2/(n(n-1))$.

Proof: Let $k$ be the size of the min-cut set of $G$. The graph may have several cut-sets of minimum size. We compute the probability of finding one specific such set $C$.

Since $C$ is a cut-set in the graph, removal of the set $C$ partitions the set of vertices into two sets, $S$ and $V \setminus S$, such that there are no edges connecting vertices in $S$ to those in $V \setminus S$.
Assume that, throughout an execution of the algorithm, we contract only edges that connect two vertices in $S$ or two in $V \setminus S$, but not edges in $C$. In that case, all the edges eliminated throughout the execution will be edges connecting vertices in $S$ or vertices in $V \setminus S$, and after $n - 2$ iterations the algorithm returns a graph with two vertices connected by the edges in $C$. We may conclude that, if the algorithm never chooses an edge of $C$ in its $n - 2$ iterations, then the algorithm returns $C$ as the minimum cut-set.
If the size of the cut is small, then the probability that the algorithm chooses an edge of $C$ is small – at least when the number of edges remaining is large compared to $C$.

Let $E_i$ be the event that the edge contracted in iteration $i$ is not in $C$, and let $F_i = \bigcap_{j=1}^{i} E_j$ be the event that no edge of $C$ was contracted in the first $i$ iterations. We need to compute $\Pr(F_{n-2})$.

Start by computing $\Pr(E_1) = \Pr(F_1)$. Since the minimum cut-set has $k$ edges, all vertices in the graph must have degree $k$ or larger. If each vertex is adjacent to at least $k$ edges, then the graph must have at least $nk/2$ edges.
Since there are at least $nk/2$ edges in the graph and since $C$ has $k$ edges, the probability that we do not choose an edge of $C$ in the first iteration is

$$\Pr(E_1) = \Pr(F_1) \ge 1 - \frac{k}{nk/2} = 1 - \frac{2}{n}$$

Suppose that the first contraction did not eliminate an edge of $C$. I.e., we condition on the event $F_1$. Then, after the first iteration, we are left with an $(n-1)$-node graph with minimum cut-set of size $k$. Again, the degree of each vertex in the graph must be at least $k$, and the graph must have at least $k(n-1)/2$ edges.
Hence

$$\Pr(E_2 \mid F_1) \ge 1 - \frac{k}{k(n-1)/2} = 1 - \frac{2}{n-1}$$

Similarly,

$$\Pr(E_{i+1} \mid F_i) \ge 1 - \frac{k}{k(n-i)/2} = 1 - \frac{2}{n-i}$$

To compute $\Pr(F_{n-2})$, we use

$$\Pr(F_{n-2}) = \Pr(E_{n-2} \cap F_{n-3}) = \Pr(E_{n-2} \mid F_{n-3}) \cdot \Pr(F_{n-3}) = \Pr(E_{n-2} \mid F_{n-3}) \cdots \Pr(E_2 \mid F_1) \cdot \Pr(F_1)$$

$$\ge \prod_{i=1}^{n-2}\left(1 - \frac{2}{n-i+1}\right) = \prod_{i=1}^{n-2}\frac{n-i-1}{n-i+1} = \left(\frac{n-2}{n}\right)\left(\frac{n-3}{n-1}\right)\left(\frac{n-4}{n-2}\right)\cdots\left(\frac{2}{4}\right)\left(\frac{1}{3}\right) = \frac{2}{n(n-1)}.$$
2. Discrete Random Variables and Expectation

Random Variables and Expectation
The Bernoulli and Binomial Random Variables
Conditional Expectation
The Geometric Distribution
The Expected Run-Time of Quicksort
• In tossing two dice we are often interested in the sum of the dice rather than their separate values
• The sample space in tossing two dice consists of 36 events of equal probability, given by the ordered pairs of numbers {(1,1), (1,2), … , (6,6)}
• If the quantity we are interested in is the sum of the two dice, then we are interested in 11 events (of unequal probability)
• Any such function from the sample space to the real numbers is called a random variable
2.1. Random Variables and Expectation
Definition 2.1: A random variable (RV) $X$ on a sample space $\Omega$ is a real-valued function on $\Omega$; that is, $X : \Omega \to \mathbb{R}$. A discrete random variable is a RV that takes on only a finite or countably infinite number of values

• For a discrete RV $X$ and a real value $a$, the event "$X = a$" includes all the basic events of the sample space in which $X$ assumes the value $a$
• I.e., "$X = a$" represents the set $\{s \in \Omega : X(s) = a\}$
• We denote the probability of that event by

$$\Pr(X = a) = \sum_{s \in \Omega:\ X(s) = a} \Pr(s)$$

• If $X$ is the RV representing the sum of the two dice, the event $X = 4$ corresponds to the set of basic events $\{(1,3), (2,2), (3,1)\}$
• Hence,

$$\Pr(X = 4) = \frac{3}{36} = \frac{1}{12}$$
Definition 2.2: Two RVs $X$ and $Y$ are independent if and only if

$$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$$

for all values $x$ and $y$. Similarly, RVs $X_1, X_2, \dots, X_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$ and any values $x_i$, $i \in I$,

$$\Pr\left(\bigcap_{i \in I} (X_i = x_i)\right) = \prod_{i \in I} \Pr(X_i = x_i)$$
Definition 2.3: The expectation of a discrete RV $X$, denoted by $\mathrm{E}[X]$, is given by

$$\mathrm{E}[X] = \sum_i i \Pr(X = i),$$

where the summation is over all values $i$ in the range of $X$. The expectation is finite if $\sum_i |i| \Pr(X = i)$ converges; otherwise, it is unbounded.

• E.g., the expectation of the RV $X$ representing the sum of two dice is

$$\mathrm{E}[X] = \frac{1}{36}\cdot 2 + \frac{2}{36}\cdot 3 + \frac{3}{36}\cdot 4 + \cdots + \frac{1}{36}\cdot 12 = 7$$
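The sum can be checked by brute force over the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# Expectation of the sum of two dice, computed directly over the
# 36 equally likely outcomes of the sample space.
expectation = sum(Fraction(a + b, 36) for a, b in product(range(1, 7), repeat=2))
print(expectation)  # 7
```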
• As an example of where the expectation of a discrete RV is unbounded, consider a RV $X$ that takes on the value $2^i$ with probability $1/2^i$ for $i = 1, 2, \dots$
• The expected value of $X$ is

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \frac{1}{2^i} \cdot 2^i = \sum_{i=1}^{\infty} 1 = \infty$$

• Writing $\mathrm{E}[X] = \infty$ expresses that $\mathrm{E}[X]$ is unbounded
2.1.1. Linearity of Expectations
• By this property, the expectation of the sum of RVs is equal to the sum of their expectations

Theorem 2.1 [Linearity of Expectations]: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations,

$$\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$$
Proof: We prove the statement for two random variables $X$ and $Y$ (the general case follows by induction). The summations that follow are understood to be over the ranges of the corresponding RVs:

$$\mathrm{E}[X + Y] = \sum_x \sum_y (x + y) \Pr((X = x) \cap (Y = y))$$
$$= \sum_x \sum_y x \Pr((X = x) \cap (Y = y)) + \sum_x \sum_y y \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \sum_y \Pr((X = x) \cap (Y = y)) + \sum_y y \sum_x \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \Pr(X = x) + \sum_y y \Pr(Y = y)$$
$$= \mathrm{E}[X] + \mathrm{E}[Y]$$

The first equality follows from Definition 1.2. The penultimate equation uses Theorem 1.6, the law of total probability.
• Let us now compute the expected sum of two standard dice
• Let $X = X_1 + X_2$, where $X_i$ represents the outcome of die $i$ for $i = 1, 2$
• Then

$$\mathrm{E}[X_i] = \sum_{j=1}^{6} \frac{1}{6}\, j = \frac{7}{2}$$

• Applying the linearity of expectations, we have $\mathrm{E}[X] = \mathrm{E}[X_1] + \mathrm{E}[X_2] = 7$
• Linearity of expectations holds for any collection of RVs, even if they are not independent
• Consider, e.g., the previous example and let the random variable $Y = X_1 + X_1^2$
• We have

$$\mathrm{E}[Y] = \mathrm{E}[X_1 + X_1^2] = \mathrm{E}[X_1] + \mathrm{E}[X_1^2]$$

even though $X_1$ and $X_1^2$ are clearly dependent
• Verify the identity by considering the six possible outcomes for $X_1$
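The suggested verification, enumerating the six equally likely outcomes for $X_1$:

```python
from fractions import Fraction

# E[Y] for Y = X1 + X1^2, compared with E[X1] + E[X1^2]:
# linearity holds even though X1 and X1^2 are dependent.
outcomes = range(1, 7)
e_y = sum(Fraction(x + x * x, 6) for x in outcomes)
e_x1 = sum(Fraction(x, 6) for x in outcomes)
e_x1_sq = sum(Fraction(x * x, 6) for x in outcomes)
print(e_y == e_x1 + e_x1_sq)  # True
```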
Lemma 2.2: For any constant $c$ and discrete RV $X$, $\mathrm{E}[cX] = c\,\mathrm{E}[X]$.

Proof: The lemma is obvious for $c = 0$. For $c \neq 0$,

$$\mathrm{E}[cX] = \sum_j j \Pr(cX = j) = c \sum_j (j/c) \Pr(X = j/c) = c \sum_k k \Pr(X = k) = c\,\mathrm{E}[X].$$
2.1.2. Jensen's Inequality
• Let us choose the length $X$ of a side of a square uniformly at random from the range [1,99]
• What is the expected value of the area?
• We can write this as $\mathrm{E}[X^2]$
• It is tempting to think of this as being equal to $(\mathrm{E}[X])^2$, but a simple calculation shows that this is not correct
• In fact, $(\mathrm{E}[X])^2 = 50^2 = 2500$ whereas $\mathrm{E}[X^2] = \frac{9950}{3} \approx 3316.7 > 2500$
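Assuming the side length is drawn uniformly from the integers 1 through 99 (which matches the numbers quoted above), both quantities can be computed exactly:

```python
from fractions import Fraction

# Side length X uniform on the integers 1..99.
sides = range(1, 100)
e_x = sum(Fraction(x, 99) for x in sides)
e_x2 = sum(Fraction(x * x, 99) for x in sides)
print(e_x ** 2)  # 2500
print(e_x2)      # 9950/3, about 3316.7 > 2500
```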
• More generally, $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$
• Consider $Y = (X - \mathrm{E}[X])^2$
• The RV $Y$ is nonnegative and hence its expectation must also be nonnegative:

$$0 \le \mathrm{E}[Y] = \mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2] = \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2 = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$$

• To obtain the penultimate line, use the linearity of expectations
• To obtain the last line, use Lemma 2.2 to simplify $\mathrm{E}[2X\,\mathrm{E}[X]] = 2\,\mathrm{E}[X]\,\mathrm{E}[X]$
• The fact that $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$ is an example of Jensen's inequality
• Jensen's inequality shows that, for any convex function $f$, we have $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$

Definition 2.4: A function $f : \mathbb{R} \to \mathbb{R}$ is said to be convex if, for any $x_1, x_2$ and $0 \le \lambda \le 1$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

Lemma 2.3: If $f$ is a twice differentiable function, then $f$ is convex if and only if $f''(x) \ge 0$
Theorem 2.4 [Jensen's Inequality]: If $f$ is a convex function, then $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$.

Proof: We prove the theorem assuming that $f$ has a Taylor expansion. Let $\mu = \mathrm{E}[X]$. By Taylor's theorem there is a value $c$ such that

$$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(c)(x - \mu)^2}{2} \ge f(\mu) + f'(\mu)(x - \mu),$$

since $f''(c) \ge 0$ by convexity. Taking expectations and applying linearity of expectations and Lemma 2.2 yields:

$$\mathrm{E}[f(X)] \ge \mathrm{E}[f(\mu) + f'(\mu)(X - \mu)] = f(\mu) + f'(\mu)(\mathrm{E}[X] - \mu) = f(\mu) = f(\mathrm{E}[X]).$$
2.2. The Bernoulli and Binomial Random Variables

• We run an experiment that succeeds with probability $p$ and fails with probability $1 - p$
• Let $Y$ be a RV such that

$$Y = \begin{cases} 1 & \text{if the experiment succeeds,} \\ 0 & \text{otherwise} \end{cases}$$

• The variable $Y$ is called a Bernoulli or an indicator random variable
• Note that, for a Bernoulli RV, $\mathrm{E}[Y] = p \cdot 1 + (1 - p) \cdot 0 = p = \Pr(Y = 1)$
• If we, e.g., flip a fair coin and consider heads a success, then the expected value of the corresponding indicator RV is 1/2
• Consider a sequence of $n$ independent coin flips
• What is the distribution of the number of heads in the entire sequence?
• More generally, consider a sequence of $n$ independent experiments, each of which succeeds with probability $p$
• If we let $X$ represent the number of successes in the $n$ experiments, then $X$ has a binomial distribution
Definition 2.5: A binomial RV $X$ with parameters $n$ and $p$, denoted by $B(n, p)$, is defined by the following probability distribution on $j = 0, 1, 2, \dots, n$:

$$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n-j}$$

• I.e., the binomial RV (BRV) equals $j$ when there are exactly $j$ successes and $n - j$ failures in $n$ independent experiments, each of which is successful with probability $p$
• Definition 2.5 ensures that the BRV is a valid probability function (Definition 1.2):

$$\sum_{j=0}^{n} \Pr(X = j) = 1$$
• We want to gather data about the packets going through a router
• We want to know the approximate fraction of packets from a certain source / of a certain type
• We store a random subset – or sample – of the packets for later analysis
• If each packet is stored with probability $p$ and $n$ packets go through the router each day, then the number $X$ of sampled packets each day is a BRV with parameters $n$ and $p$
• To know how much memory is necessary for such a sample, determine the expectation of $X$
• If $X$ is a BRV with parameters $n$ and $p$, then $X$ is the number of successes in $n$ trials, where each trial is successful with probability $p$
• Define a set of indicator RVs $X_1, \dots, X_n$, where $X_i = 1$ if the $i$th trial is successful and 0 otherwise
• Clearly, $\mathrm{E}[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$ and so, by the linearity of expectations,

$$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np$$
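Both Definition 2.5 and the expectation $np$ can be checked with exact arithmetic (`binomial_pmf` is our name, not from the slides):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, j):
    """Pr(X = j) for X ~ B(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 10, Fraction(3, 10)
pmf = [binomial_pmf(n, p, j) for j in range(n + 1)]
print(sum(pmf) == 1)                                   # a valid distribution
print(sum(j * pmf[j] for j in range(n + 1)) == n * p)  # E[X] = np
```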
2.3. Conditional Expectation
Definition 2.6: The conditional expectation of $Y$ given $Z = z$ is

$$\mathrm{E}[Y \mid Z = z] = \sum_y y \Pr(Y = y \mid Z = z),$$

where the summation is over all $y$ in the range of $Y$

• The conditional expectation of a RV is, like $\mathrm{E}[Y]$, a weighted sum of the values it assumes
• Now each value is weighted by the conditional probability that the variable assumes that value
• Suppose that we independently roll two standard six-sided dice
• Let $X_1$ be the number that shows on the first die, $X_2$ the number on the second die, and $X$ the sum of the numbers on the two dice
• Then

$$\mathrm{E}[X \mid X_1 = 2] = \sum_x x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$$
• As another example, consider $\mathrm{E}[X_1 \mid X = 5]$:

$$\mathrm{E}[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}$$
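Both conditional expectations can be verified by enumerating the sample space (`cond_exp` is our helper, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Sample space of two dice, all 36 outcomes equally likely.
rolls = list(product(range(1, 7), repeat=2))

def cond_exp(value, condition):
    """E[value(s) | condition(s)] over the uniform sample space."""
    kept = [s for s in rolls if condition(s)]
    return Fraction(sum(value(s) for s in kept), len(kept))

print(cond_exp(lambda s: s[0] + s[1], lambda s: s[0] == 2))    # 11/2
print(cond_exp(lambda s: s[0], lambda s: s[0] + s[1] == 5))    # 5/2
```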
Lemma 2.5: For any RVs $X$ and $Y$,

$$\mathrm{E}[X] = \sum_y \Pr(Y = y)\,\mathrm{E}[X \mid Y = y],$$

where the sum is over all values $y$ in the range of $Y$ and all of the expectations exist.

Proof:

$$\sum_y \Pr(Y = y)\,\mathrm{E}[X \mid Y = y] = \sum_y \Pr(Y = y) \sum_x x \Pr(X = x \mid Y = y)$$
$$= \sum_x \sum_y x \Pr(X = x \mid Y = y)\Pr(Y = y)$$
$$= \sum_x \sum_y x \Pr((X = x) \cap (Y = y))$$
$$= \sum_x x \Pr(X = x) = \mathrm{E}[X]$$
• The linearity of expectations also extends to conditional expectations

Lemma 2.6: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations and for any RV $Y$,

$$\mathrm{E}\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} \mathrm{E}[X_i \mid Y = y]$$
• Confusingly, the conditional expectation is also used to refer to the following RV

Definition 2.7: The expression $\mathrm{E}[Y \mid Z]$ is a RV $f(Z)$ that takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$

• $\mathrm{E}[Y \mid Z]$ is not a real value; it is actually a function of the RV $Z$
• Hence $\mathrm{E}[Y \mid Z]$ is itself a function from the sample space to the real numbers and can therefore be thought of as a RV
• In the previous example of rolling two dice,

$$\mathrm{E}[X \mid X_1] = \sum_x x \Pr(X = x \mid X_1) = X_1 + \frac{7}{2}$$

• We see that $\mathrm{E}[X \mid X_1]$ is a RV whose value depends on $X_1$
• If $\mathrm{E}[Y \mid Z]$ is a RV, then it makes sense to consider its expectation $\mathrm{E}[\mathrm{E}[Y \mid Z]]$
• We found that $\mathrm{E}[X \mid X_1] = X_1 + 7/2$
• Thus,

$$\mathrm{E}[\mathrm{E}[X \mid X_1]] = \mathrm{E}\left[X_1 + \frac{7}{2}\right] = \frac{7}{2} + \frac{7}{2} = 7 = \mathrm{E}[X]$$
• More generally,

Theorem 2.7: $\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \mid Z]]$

Proof: From Definition 2.7 we have $\mathrm{E}[Y \mid Z] = f(Z)$, where $f(Z)$ takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$. Hence

$$\mathrm{E}[\mathrm{E}[Y \mid Z]] = \mathrm{E}[f(Z)] = \sum_z \mathrm{E}[Y \mid Z = z] \Pr(Z = z)$$

The right-hand side equals $\mathrm{E}[Y]$ by Lemma 2.5.
• Consider a program that includes one call to a process $S$
• Assume that each call to process $S$ recursively spawns new copies of the process $S$, where the number of new copies is a BRV with parameters $n$ and $p$
• We assume that these random variables are independent for each call to $S$
• What is the expected number of copies of the process $S$ generated by the program?
• To analyze this recursive spawning process, we use generations
• The initial process is in generation 0
• Otherwise, we say that a process is in generation $i$ if it was spawned by another process in generation $i - 1$
• Let $Y_i$ denote the number of processes in generation $i$
• Since we know that $Y_0 = 1$, the number of processes in generation 1 has a binomial distribution
• Thus, $\mathrm{E}[Y_1] = np$
• Similarly, suppose we knew that the number of processes in generation $i - 1$ was $y_{i-1}$, so $Y_{i-1} = y_{i-1}$
• Then

$$\mathrm{E}[Y_i \mid Y_{i-1} = y_{i-1}] = np\,y_{i-1}$$

• Applying Theorem 2.7, we can compute the expected size of the $i$th generation inductively
• We have

$$\mathrm{E}[Y_i] = \mathrm{E}[\mathrm{E}[Y_i \mid Y_{i-1}]] = \mathrm{E}[np\,Y_{i-1}] = np\,\mathrm{E}[Y_{i-1}]$$

• By induction on $i$, and using the fact that $Y_0 = 1$, we then obtain $\mathrm{E}[Y_i] = (np)^i$
• The expected total number of copies of process $S$ generated by the program is given by

$$\mathrm{E}\left[\sum_{i \ge 0} Y_i\right] = \sum_{i \ge 0} \mathrm{E}[Y_i] = \sum_{i \ge 0} (np)^i$$

• If $np \ge 1$ then the expectation is unbounded; if $np < 1$, the expectation is $1/(1 - np)$
• The expected number of processes generated by the program is bounded iff the expected number of processes spawned by each process is less than 1
• This is a simple example of a branching process, a probabilistic paradigm extensively studied in probability theory
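The spawning process above can be simulated; a small Monte Carlo sketch (names and parameters are ours; with $np = 0.6 < 1$ the expected total, including the initial process, is $1/(1 - np) = 2.5$):

```python
import random

def total_copies(n, p, rng, max_procs=10**6):
    """Simulate the spawning process; return total processes, incl. the first."""
    total, frontier = 1, 1        # generation 0: the initial process
    while frontier and total < max_procs:
        # Each process in the current generation spawns B(n, p) copies.
        spawned = sum(sum(rng.random() < p for _ in range(n))
                      for _ in range(frontier))
        total += spawned
        frontier = spawned
    return total

rng = random.Random(0)
n, p, trials = 3, 0.2, 2000       # np = 0.6 < 1, so the series converges
avg = sum(total_copies(n, p, rng) for _ in range(trials)) / trials
print(avg)  # close to 1/(1 - np) = 2.5
```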
2.4. The Geometric Distribution
• Let us flip a coin until it lands on heads
? What is the distribution of the number of flips?
• This is an example of a geometric distribution
• It arises when we perform a sequence of independent trials until the first success, where each trial succeeds with probability $p$

Definition 2.8: A geometric RV $X$ with parameter $p$ is given by the following probability distribution on $n = 1, 2, \dots$:

$$\Pr(X = n) = (1 - p)^{n-1} p$$
• Geometric RVs are said to be memoryless because the probability that you will reach your first success $n$ trials from now is independent of the number of failures you have experienced
• Informally, one can ignore past failures – they do not change the distribution of the number of future trials until first success
• Formally, we have the following

Lemma 2.8: For a geometric RV $X$ with parameter $p$ and for $k > 0$,

$$\Pr(X = n + k \mid X > k) = \Pr(X = n)$$
• When a RV takes values in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \dots\}$ there is an alternative formula for calculating its expectation

Lemma 2.9: Let $X$ be a discrete RV that takes on only nonnegative integer values. Then

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$$

Proof:

$$\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X]$$
• For a geometric RV $X$ with parameter $p$,

$$\Pr(X \ge i) = \sum_{n=i}^{\infty} (1 - p)^{n-1} p = (1 - p)^{i-1}$$

• Hence

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} (1 - p)^{i-1} = \frac{1}{1 - (1 - p)} = \frac{1}{p}$$

• Thus, for a fair coin where $p = 1/2$, on average it takes two flips to see the first heads
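A quick simulation of the fair-coin case (names are ours):

```python
import random

def flips_until_heads(p, rng):
    """Number of flips until the first heads; geometric with parameter p."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(1)
trials = 100_000
avg = sum(flips_until_heads(0.5, rng) for _ in range(trials)) / trials
print(avg)  # close to 1/p = 2
```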
• We can also find the expectation of a geometric RV $X$ with parameter $p$ using conditional expectations and the memoryless property of geometric RVs
• Recall that $X$ corresponds to the number of flips until the first heads given that each flip is heads with probability $p$
• Let $Y = 0$ if the first flip is tails and $Y = 1$ if the first flip is heads
• By the identity from Lemma 2.5,

$$\mathrm{E}[X] = \Pr(Y = 0)\,\mathrm{E}[X \mid Y = 0] + \Pr(Y = 1)\,\mathrm{E}[X \mid Y = 1] = (1 - p)\,\mathrm{E}[X \mid Y = 0] + p\,\mathrm{E}[X \mid Y = 1]$$
• If $Y = 1$ then $X = 1$, so $\mathrm{E}[X \mid Y = 1] = 1$
• If $Y = 0$, then $X > 1$
• In this case, let the number of remaining flips (after the first flip until the first heads) be $Z$
• Then, by the linearity of expectations, $\mathrm{E}[X] = (1 - p)\,\mathrm{E}[Z + 1] + p \cdot 1 = (1 - p)\,\mathrm{E}[Z] + 1$
• By the memoryless property of geometric RVs, $Z$ is also a geometric RV with parameter $p$
• Hence $\mathrm{E}[Z] = \mathrm{E}[X]$, since they both have the same distribution
• We therefore have $\mathrm{E}[X] = (1 - p)\,\mathrm{E}[Z] + 1 = (1 - p)\,\mathrm{E}[X] + 1$, which yields $\mathrm{E}[X] = 1/p$
2.4.1. Example: Coupon Collector's Problem

• Each box of cereal contains one of $n$ different coupons
• Once you obtain one of every type of coupon, you can send in for a prize
• The coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, and you do not collaborate with others to collect coupons
? How many boxes of cereal must you buy before you obtain at least one of every type of coupon?
• Let $X$ be the number of boxes bought until at least one of every type of coupon is obtained
• If $X_i$ is the number of boxes bought while you had exactly $i - 1$ different coupons, then clearly $X = \sum_{i=1}^{n} X_i$
• The advantage of breaking $X$ into a sum of random variables $X_i$, $i = 1, \dots, n$, is that each $X_i$ is a geometric RV
• When exactly $i - 1$ coupons have been found, the probability of obtaining a new coupon is

$$p_i = 1 - \frac{i - 1}{n} = \frac{n - i + 1}{n}$$
• Hence, $X_i$ is a geometric RV with parameter $p_i$:

$$\mathrm{E}[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$$

• Using the linearity of expectations, we have that

$$\mathrm{E}[X] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$$
• The summation $\sum_{i=1}^{n} 1/i$ is known as the harmonic number $H(n)$

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} 1/i$ satisfies $H(n) = \ln n + \Theta(1)$.

• Thus, for the coupon collector's problem, the expected number of random coupons required to obtain all $n$ coupons is $n \ln n + \Theta(n)$
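The exact expectation $n\,H(n)$ is easy to compute (`expected_boxes` is our name):

```python
from fractions import Fraction

def expected_boxes(n):
    """E[X] = n * H(n): sum of the expectations of the geometric stages."""
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

e10 = expected_boxes(10)
print(e10, float(e10))  # 7381/252, about 29.3 boxes for n = 10
```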