CS 109 Lecture 12
April 22, 2016
Four Prototypical Trajectories
Today:
1. Multi-variable RVs
2. Expectation with multiple RVs
3. Independence with multiple RVs
Review
• For two discrete random variables X and Y, the Joint Probability Mass Function is:
  p_{X,Y}(a, b) = P(X = a, Y = b)
• Marginal distributions:
  p_X(a) = P(X = a) = Σ_y p_{X,Y}(a, y)
  p_Y(b) = P(Y = b) = Σ_x p_{X,Y}(x, b)
• Example: X = value of die D1, Y = value of die D2
  P(X = 1) = Σ_y p_{X,Y}(1, y) = 6 · (1/36) = 1/6
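The review above can be sketched in code; here is a minimal example (two fair dice stored as a dictionary joint PMF; the helper name is mine, not from the slides):

```python
from fractions import Fraction

# Joint PMF of two fair six-sided dice: p_{X,Y}(a, b) = 1/36 for every pair
joint = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

def marginal_x(joint, a):
    """p_X(a) = sum over y of p_{X,Y}(a, y): sum out the other variable."""
    return sum(p for (x, y), p in joint.items() if x == a)

print(marginal_x(joint, 1))  # 1/6, matching the slide's example
```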
Discrete Joint Mass Function
Probability Table
• States all possible outcomes with several discrete variables
• Often is not “parametric”
• If # variables is > 2, you can have a probability table, but you can’t draw it on a slide
[Table: rows are all values of A, columns are all values of B; the cell at (a, b) holds P(A = a, B = b)]
• Remember “,” means “and”
• Every outcome falls into a bucket
• Random variables X and Y are Jointly Continuous if there exists a PDF f_{X,Y}(x, y) defined over –∞ < x, y < ∞ such that:
  P(a1 < X ≤ a2, b1 < Y ≤ b2) = ∫_{a1}^{a2} ∫_{b1}^{b2} f_{X,Y}(x, y) dy dx
Jointly Continuous
[Figure: dart board image over pixel coordinates, x and y axes running from 0 to 900]
Jointly Continuous
P(a1 < X ≤ a2, b1 < Y ≤ b2) = ∫_{a1}^{a2} ∫_{b1}^{b2} f_{X,Y}(x, y) dy dx
[Figure: joint density surface f_{X,Y}(x, y) over the region a1 < x ≤ a2, b1 < y ≤ b2]
Can calculate probabilities
Darts!
[Figure: dart locations in pixel coordinates, with X-pixel and Y-pixel marginal distributions]
X ~ N(900/2, 900/2)    Y ~ N(900/3, 900/5)
Can calculate marginal probabilities
Transfer Learning
Way Back
Permutations
How many ways are there to order n distinct objects?
n!
Multinomial
How many ways are there to order n objects such that:
  n1 are the same (indistinguishable)
  n2 are the same (indistinguishable)
  …
  nr are the same (indistinguishable)?
n! / (n1! n2! … nr!) = (n choose n1, n2, …, nr)
Called the “multinomial” because of the multinomial theorem from algebra
Binomial
How many ways are there to order n objects such that:
  r are the same (indistinguishable)
  (n – r) are the same (indistinguishable)?
Called the Binomial (Multi -> Bi)
How many ways are there to make an unordered selection of r objects from n objects?
n! / (r! (n – r)!) = (n choose r)
• Consider n independent trials of Ber(p) rand. var.
§ X is number of successes in n trials
§ X is a Binomial Random Variable: X ~ Bin(n, p)
  P(X = i) = (n choose i) p^i (1 – p)^(n – i),  i = 0, 1, …, n
Binomial Distribution
P(X = i) is the probability of exactly i successes: (n choose i) counts the ways of ordering the i successes, and p^i (1 – p)^(n – i) is the probability of each ordering (the orderings are equally likely and mutually exclusive)
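A minimal sketch of the Bin(n, p) PMF above, using only the Python standard library (the function name is mine):

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Bin(n, p): comb(n, i) orderings, each with prob p^i (1-p)^(n-i)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Sanity checks: the PMF sums to 1, and Bin(4, 0.5) gives P(X = 2) = 6/16
assert abs(sum(binomial_pmf(i, 10, 0.3) for i in range(11)) - 1) < 1e-12
assert abs(binomial_pmf(2, 4, 0.5) - 0.375) < 1e-12
```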
End Review
• Multinomial distribution
§ n independent trials of experiment performed
§ Each trial results in one of m outcomes, with respective probabilities: p1, p2, …, pm where Σ_{i=1}^{m} p_i = 1
§ Xi = number of trials with outcome i
  P(X1 = c1, X2 = c2, …, Xm = cm) = (n choose c1, c2, …, cm) p1^c1 p2^c2 … pm^cm
  where Σ_{i=1}^{m} c_i = n and (n choose c1, c2, …, cm) = n! / (c1! c2! … cm!)
Welcome Back the Multinomial
The joint distribution is a multinomial: the multinomial coefficient counts the ways of ordering the successes, and the probabilities of each ordering are equal and mutually exclusive
• 6-sided die is rolled 7 times
§ Roll results: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes
  P(X1 = 1, X2 = 1, X3 = 0, X4 = 2, X5 = 0, X6 = 3)
    = 7! / (1! 1! 0! 2! 0! 3!) · (1/6)^1 (1/6)^1 (1/6)^0 (1/6)^2 (1/6)^0 (1/6)^3 = 420 (1/6)^7
• This is generalization of Binomial distribution
§ Binomial: each trial had 2 possible outcomes
§ Multinomial: each trial has m possible outcomes
Hello Die Rolls, My Old Friends
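The seven-roll computation above can be verified exactly; this sketch (helper names are mine) evaluates the multinomial PMF with exact fractions:

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """n!/(c1!...cm!) * p1^c1 * ... * pm^cm for outcome counts c1, ..., cm."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = Fraction(1)
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

counts = (1, 1, 0, 2, 0, 3)            # 1 one, 1 two, 2 fours, 3 sixes in 7 rolls
result = multinomial_pmf(counts, [Fraction(1, 6)] * 6)
assert result == Fraction(420, 6**7)   # 420 * (1/6)^7, as on the slide
```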
• Ignoring order of words, what is probability of any given word you write in English?
§ P(word = “the”) > P(word = “transatlantic”)
§ P(word = “Stanford”) > P(word = “Cal”)
§ Probability of each word is just multinomial distribution
• What about probability of those same words in someone else’s writing?
§ P(word = “probability” | writer = you) > P(word = “probability” | writer = non-CS109 student)
§ After estimating P(word | writer) from known writings, use Bayes’ Theorem to determine P(writer | word) for new writings!
Probabilistic Text Analysis
Example document:
“Pay for Viagra with a credit-card. Viagra is great. So are credit-cards. Risk free Viagra. Click for free.”  n = 18
Text is a Multinomial
Word counts: Viagra = 2, Free = 2, Risk = 1, Credit-card = 2, …, For = 2
  P(doc | spam) = n! / (2! 2! … 2!) · p_viagra^2 · p_free^2 · … · p_for^2
This is the probability of seeing this document given spam. It’s a Multinomial!
Each p_word (e.g. p_viagra) is the probability of a word in spam email being that word
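A sketch of the spam likelihood above. The per-word probabilities here are invented for illustration (in practice they would be estimated from known spam), and only the four listed words are included:

```python
from math import factorial

# Hypothetical values of P(word | spam) -- made up, not from the slides
p_word_given_spam = {"viagra": 0.05, "free": 0.04, "credit-card": 0.03, "for": 0.02}

def doc_likelihood(word_counts, word_probs):
    """Multinomial P(document | class) from a bag-of-words count dictionary."""
    coef = factorial(sum(word_counts.values()))
    prob = 1.0
    for word, count in word_counts.items():
        coef //= factorial(count)
        prob *= word_probs[word] ** count
    return coef * prob

counts = {"viagra": 2, "free": 2, "credit-card": 2, "for": 2}
print(doc_likelihood(counts, p_word_given_spam))
```

Comparing this quantity across classes (spam vs. not spam), together with Bayes’ Theorem, is exactly the filtering idea sketched on the next slide.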
• Authorship of “Federalist Papers”
§ 85 essays advocating ratification of US constitution
§ Written under pseudonym “Publius”
  o Really, Alexander Hamilton, James Madison and John Jay
§ Who wrote which essays?
  o Analyzed probability of words in each essay versus word distributions from known writings of three authors
• Filtering Spam
§ P(word = “Viagra” | writer = you) << P(word = “Viagra” | writer = spammer)
Old and New Analysis
Expectation with Multiple Variables?
Joint Expectation
• Expectation over a joint isn’t nicely defined because it is not clear how to compose the multiple variables: Add them? Multiply them?
• Lemma: For a function g(X, Y) we can calculate the expectation of that function:
  E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)
• By the way, this also holds for single random variables:
  E[g(X)] = Σ_x g(x) p(x)    E[X] = Σ_x x p(x)

E[X + Y] = E[X] + E[Y]
Generalized:
  E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E[Xi]
Holds regardless of dependency between Xi’s
Expected Values of Sums
Skeptical Chris Wants a Proof! Let g(X, Y) = X + Y

E[X + Y] = E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)         (what a useful lemma)
         = Σ_{x,y} (x + y) p(x, y)                      (by the definition of g(x, y))
         = Σ_{x,y} x p(x, y) + Σ_{x,y} y p(x, y)        (break that sum into parts!)
         = Σ_x x Σ_y p(x, y) + Σ_y y Σ_x p(x, y)        (change the sum over (x, y) into separate sums)
         = Σ_x x p(x) + Σ_y y p(y)                      (that is the definition of marginal probability)
         = E[X] + E[Y]                                  (that is the definition of expectation)
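The lemma and the sum rule can also be checked numerically; a small sketch using two fair dice (my example, not the slides’):

```python
from fractions import Fraction

# Joint PMF of two fair dice
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# E[g(X, Y)] with g(x, y) = x + y, straight from the lemma
e_sum = sum((x + y) * p for (x, y), p in joint.items())

# E[X] and E[Y] separately, from the same joint
e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())

assert e_sum == e_x + e_y == Fraction(7)   # 3.5 + 3.5 = 7
```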
Independence and Random Variables
• Two discrete random variables X and Y are called independent if:
  p_{X,Y}(x, y) = p_X(x) p_Y(y)  for all x, y
• Intuitively: knowing the value of X tells us nothing about the distribution of Y (and vice versa)
§ If two variables are not independent, they are called dependent
• Similar conceptually to independent events, but we are dealing with multiple variables
§ Keep your events and variables distinct (and clear)!
Independent Discrete Variables
• Flip coin with probability p of “heads”
§ Flip coin a total of n + m times
§ Let X = number of heads in first n flips
§ Let Y = number of heads in next m flips
  P(X = x, Y = y) = (n choose x) p^x (1 – p)^(n – x) · (m choose y) p^y (1 – p)^(m – y) = P(X = x) P(Y = y)
§ X and Y are independent
§ Let Z = number of total heads in n + m flips
§ Are X and Z independent?
  o What if you are told Z = 0?
Coin Flips
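A small exact check of the punchline: the joint over X and Y factors, but X and Z cannot be independent (n = 3, m = 4, p = 0.5 are arbitrary example values of mine):

```python
from math import comb

def bin_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, m, p = 3, 4, 0.5   # arbitrary example parameters

# Z = X + Y, so P(X = x, Z = z) = P(X = x) * P(Y = z - x)
def joint_xz(x, z):
    return bin_pmf(x, n, p) * bin_pmf(z - x, m, p) if 0 <= z - x <= m else 0.0

# If X and Z were independent, P(X = 2, Z = 0) would equal P(X = 2) P(Z = 0) > 0.
# But 2 heads in the first n flips is impossible with 0 heads total:
assert joint_xz(2, 0) == 0.0
assert bin_pmf(2, n, p) * bin_pmf(0, n + m, p) > 0
```

This is exactly the “what if you are told Z = 0?” hint: conditioning on Z pins down X.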
• Let N = # of requests to web server/day
§ Suppose N ~ Poi(λ)
§ Each request comes from a human (probability = p) or from a “bot” (probability = (1 – p)), independently
§ X = # requests from humans/day, (X | N) ~ Bin(N, p)
§ Y = # requests from bots/day, (Y | N) ~ Bin(N, 1 – p)

P(X = i, Y = j) = P(X = i, Y = j | X + Y = i + j) P(X + Y = i + j)
                + P(X = i, Y = j | X + Y ≠ i + j) P(X + Y ≠ i + j)

The first term multiplies the probability of i human requests and j bot requests given that we got i + j requests by the probability that the number of requests in a day was i + j. The second term drops out:

§ Note: P(X = i, Y = j | X + Y ≠ i + j) = 0
  (you got i human requests and j bot requests, so you cannot have gotten other than i + j requests)

§ P(X + Y = i + j) = e^{–λ} λ^{i+j} / (i + j)!
§ P(X = i, Y = j | X + Y = i + j) = ((i + j) choose i) p^i (1 – p)^j

P(X = i, Y = j) = ((i + j) choose i) p^i (1 – p)^j · e^{–λ} λ^{i+j} / (i + j)!
               = e^{–λ} · ((i + j)! / (i! j!)) · p^i (1 – p)^j · λ^{i+j} / (i + j)!
               = e^{–λ} · (λp)^i / i! · (λ(1 – p))^j / j!
               = e^{–λp} (λp)^i / i! · e^{–λ(1 – p)} (λ(1 – p))^j / j! = P(X = i) P(Y = j)

§ Where X ~ Poi(λp) and Y ~ Poi(λ(1 – p))
§ X and Y are independent!
Web Server Requests
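This splitting result can be verified numerically; the sketch below (λ = 4, p = 0.3 are arbitrary example values of mine) compares the joint built from the conditioning argument with the product of the two Poisson marginals:

```python
from math import comb, exp, factorial

lam, p = 4.0, 0.3   # arbitrary example rate and human-probability

def poi_pmf(i, mean):
    return exp(-mean) * mean**i / factorial(i)

def joint(i, j):
    """P(X = i, Y = j) via conditioning on N = i + j, as in the derivation."""
    n = i + j
    return poi_pmf(n, lam) * comb(n, i) * p**i * (1 - p)**j

# The joint factors into Poi(lam*p) and Poi(lam*(1-p)) marginals: independence
for i in range(6):
    for j in range(6):
        assert abs(joint(i, j) - poi_pmf(i, lam * p) * poi_pmf(j, lam * (1 - p))) < 1e-12
```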
• Two continuous random variables X and Y are called independent if:
  P(X ≤ a, Y ≤ b) = P(X ≤ a) P(Y ≤ b)  for any a, b
• Equivalently:
  F_{X,Y}(a, b) = F_X(a) F_Y(b)  for all a, b
  f_{X,Y}(a, b) = f_X(a) f_Y(b)  for all a, b
• More generally, joint density factors separately:
  f_{X,Y}(x, y) = h(x) g(y)  where –∞ < x, y < ∞
Independent Continuous Variables
• Consider joint density function of X and Y:
  f_{X,Y}(x, y) = 6 e^{–3x} e^{–2y}  for 0 < x, y < ∞
§ Are X and Y independent?  Yes!
  Let h(x) = 3e^{–3x} and g(y) = 2e^{–2y}, so f_{X,Y}(x, y) = h(x) g(y)
• Consider joint density function of X and Y:
  f_{X,Y}(x, y) = 4xy  for 0 < x, y < 1
§ Are X and Y independent?  Yes!
  Let h(x) = 2x and g(y) = 2y, so f_{X,Y}(x, y) = h(x) g(y)
§ Now add constraint that: 0 < (x + y) < 1
§ Are X and Y independent?  No!
  o Cannot capture constraint on x + y in factorization!
Pop Quiz (just kidding)
• Two people set up a meeting for 12pm
§ Each arrives independently at time uniformly distributed between 12pm and 12:30pm
§ X = # min. past 12pm person 1 arrives, X ~ Uni(0, 30)
§ Y = # min. past 12pm person 2 arrives, Y ~ Uni(0, 30)
§ What is P(first to arrive waits > 10 min. for other)?

P(X + 10 < Y) + P(Y + 10 < X) = 2 P(X + 10 < Y)  by symmetry

2 P(X + 10 < Y) = 2 ∫∫_{x+10<y} f_{X,Y}(x, y) dx dy = 2 ∫∫_{x+10<y} f_X(x) f_Y(y) dx dy
  = 2 ∫_{10}^{30} ∫_{0}^{y–10} (1/30)^2 dx dy
  = (2/30^2) ∫_{10}^{30} (y – 10) dy
  = (2/30^2) [y^2/2 – 10y] evaluated from 10 to 30
  = (2/900) ((450 – 300) – (50 – 100)) = 400/900 = 4/9
Dating at Stanford
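A quick Monte Carlo sanity check of the 4/9 answer (the sampling setup and seed are mine; seeded so the run is repeatable):

```python
import random

random.seed(109)
trials = 200_000
# Count runs where the first arrival waits more than 10 minutes for the other
waits = sum(
    1 for _ in range(trials)
    if abs(random.uniform(0, 30) - random.uniform(0, 30)) > 10
)
estimate = waits / trials
print(estimate)   # should land close to 4/9 ≈ 0.4444
```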
• n random variables X1, X2, …, Xn are called independent if:
  P(X1 = x1, X2 = x2, …, Xn = xn) = Π_{i=1}^{n} P(Xi = xi)  for all x1, x2, …, xn
• Analogously, for continuous random variables:
  P(X1 ≤ a1, X2 ≤ a2, …, Xn ≤ an) = Π_{i=1}^{n} P(Xi ≤ ai)  for all a1, a2, …, an
Independence of Multiple Variables
• If random variables X and Y independent, then
§ X independent of Y, and Y independent of X
• Duh!? Duh, indeed...
§ Let X1, X2, ... be a sequence of independent and identically distributed (I.I.D.) continuous random vars
§ Say Xn > Xi for all i = 1, ..., n – 1 (i.e. Xn = max(X1, ..., Xn))
  o Call Xn a “record value”
§ Let event Ai indicate Xi is a “record value”
  o Is An+1 independent of An? Is An independent of An+1?
  o Easier to answer: Yes!
  o By symmetry, P(An) = 1/n and P(An+1) = 1/(n + 1)
  o P(An An+1) = (1/n)(1/(n + 1)) = P(An) P(An+1)
Independence is Symmetric
Earth Day
Choosing a Random Subset
• From set of n elements, choose a subset of size k such that all possibilities are equally likely
§ Only have random(), which simulates X ~ Uni(0, 1)
• Brute force:
§ Generate (an ordering of) all (n choose k) subsets of size k
§ Randomly pick one of the (n choose k) subsets (divide (0, 1) into intervals)
§ Expensive with regard to time and space
§ Bad times!
(Happily) Choosing a Random Subset
• Good times:

int indicator(double p) {
    if (random() < p) return 1;
    else return 0;
}

// array I[] indexed from 1 to n
subset rSubset(k, set of size n) {
    subset_size = 0;
    I[1] = indicator((double)k/n);
    for (i = 1; i < n; i++) {
        subset_size += I[i];
        I[i+1] = indicator((double)(k - subset_size)/(n - i));
    }
    return (subset containing element[i] iff I[i] == 1);
}
P(I[1] = 1) = k/n  and  P(I[i + 1] = 1 | I[1], …, I[i]) = (k – Σ_{j=1}^{i} I[j]) / (n – i),  where 1 ≤ i < n
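A Python rendering of the same algorithm (0-indexed; random.random() plays the role of random(); the function name is mine):

```python
import random

def r_subset(n, k):
    """Uniformly random size-k subset of {0, ..., n-1} using only Uni(0, 1) draws."""
    chosen = []
    for i in range(n):
        # Include element i with prob (slots still to fill) / (elements left)
        if random.random() < (k - len(chosen)) / (n - i):
            chosen.append(i)
    return chosen

random.seed(109)
print(r_subset(5, 2))   # some 2-element subset of {0, 1, 2, 3, 4}
```

Note the algorithm always returns exactly k elements: once k are chosen the inclusion probability drops to 0, and when the remaining slots equal the remaining elements it rises to 1.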
Random Subsets the Happy Way
• Proof (Induction on (k + n)): (i.e., why this algorithm works)
§ Base Case: k = 1, n = 1, Set S = {a}, rSubset returns {a} with p = 1/(1 choose 1) = 1
§ Inductive Hypoth. (IH): for k + x ≤ c, given set S, |S| = x and k ≤ x, rSubset returns any subset S’ of S, where |S’| = k, with p = 1/(x choose k)
§ Inductive Case 1: (where k + n ≤ c + 1) |S| = n (= x + 1), I[1] = 1
  o Elem 1 in subset, choose k – 1 elems from remaining n – 1
  o By IH: rSubset returns subset S’ of size k – 1 with p = 1/((n – 1) choose (k – 1))
  o P(I[1] = 1, subset S’) = (k/n) · 1/((n – 1) choose (k – 1)) = 1/(n choose k)
§ Inductive Case 2: (where k + n ≤ c + 1) |S| = n (= x + 1), I[1] = 0
  o Elem 1 not in subset, choose k elems from remaining n – 1
  o By IH: rSubset returns subset S’ of size k with p = 1/((n – 1) choose k)
  o P(I[1] = 0, subset S’) = (1 – k/n) · 1/((n – 1) choose k) = 1/(n choose k)
Sum of Independent Binomial RVs
• Let X and Y be independent random variables
§ X ~ Bin(n1, p) and Y ~ Bin(n2, p)
§ X + Y ~ Bin(n1 + n2, p)
• Intuition:
§ X has n1 trials and Y has n2 trials
  o Each trial has same “success” probability p
§ Define Z to be n1 + n2 trials, each with success prob. p
§ Z ~ Bin(n1 + n2, p), and also Z = X + Y
• More generally: Xi ~ Bin(ni, p) for 1 ≤ i ≤ N
  Σ_{i=1}^{N} Xi ~ Bin(Σ_{i=1}^{N} ni, p)
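The closure of binomials under addition can be checked by direct convolution (n1 = 3, n2 = 5, p = 0.4 are arbitrary example values of mine):

```python
from math import comb

def bin_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n1, n2, p = 3, 5, 0.4   # arbitrary example parameters

# P(X + Y = s) by convolving the two independent PMFs
for s in range(n1 + n2 + 1):
    conv = sum(bin_pmf(x, n1, p) * bin_pmf(s - x, n2, p)
               for x in range(max(0, s - n2), min(n1, s) + 1))
    assert abs(conv - bin_pmf(s, n1 + n2, p)) < 1e-12   # matches Bin(n1 + n2, p)
```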
Sum of Independent Poisson RVs
• Let X and Y be independent random variables
§ X ~ Poi(λ1) and Y ~ Poi(λ2)
§ X + Y ~ Poi(λ1 + λ2)
• Proof: (just for reference)
§ Rewrite (X + Y = n) as (X = k, Y = n – k) where 0 ≤ k ≤ n

P(X + Y = n) = Σ_{k=0}^{n} P(X = k, Y = n – k) = Σ_{k=0}^{n} P(X = k) P(Y = n – k)
  = Σ_{k=0}^{n} e^{–λ1} λ1^k / k! · e^{–λ2} λ2^{n–k} / (n – k)!
  = e^{–(λ1+λ2)} Σ_{k=0}^{n} λ1^k λ2^{n–k} / (k! (n – k)!)
  = (e^{–(λ1+λ2)} / n!) Σ_{k=0}^{n} (n! / (k! (n – k)!)) λ1^k λ2^{n–k}

§ Noting Binomial theorem: Σ_{k=0}^{n} (n! / (k! (n – k)!)) λ1^k λ2^{n–k} = (λ1 + λ2)^n

P(X + Y = n) = e^{–(λ1+λ2)} (λ1 + λ2)^n / n!

§ so, X + Y ~ Poi(λ1 + λ2)
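The same convolution check works for the Poisson proof above (λ1 = 2.0, λ2 = 3.5 are arbitrary example rates of mine):

```python
from math import exp, factorial

def poi_pmf(n, mean):
    return exp(-mean) * mean**n / factorial(n)

lam1, lam2 = 2.0, 3.5   # arbitrary example rates

# P(X + Y = n) = sum_k P(X = k) P(Y = n - k), which should equal Poi(lam1 + lam2)
for n in range(15):
    conv = sum(poi_pmf(k, lam1) * poi_pmf(n - k, lam2) for k in range(n + 1))
    assert abs(conv - poi_pmf(n, lam1 + lam2)) < 1e-12
```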