Lecture •Review syllabus and class logistics •Intro ...
1
Lecture 1: Monday August 23, 2021
Lecture
• Review syllabus and class logistics
• Intro & motivation
• Probability review
2
Class Website: https://barry.ece.gatech.edu/6601/
3
Progression
In a nutshell:
• probability theory
• random variables X, Y
• random vectors X = [X1, X2, ..., Xn]^T
• random sequences Xk
• random processes X( t )
4
Topics
• Review Probability (Chapters 1-3)
  Axioms, conditional probability, Bayes Theorem
• Random Variables (Chapters 4 and 5)
  The cdf, pmf (discrete), and pdf (continuous)
  Expectation and moments, the mgf
• Pairs of Random Variables (Chapter 6)
  Joint, marginal, and conditional
  Independence and correlation
  Law of total expectation
• Random Vectors (Chapter 6)
  Jointly Gaussian random vectors
  Conditional pdfs for a Gaussian random vector
  Minimum mean-square error (MMSE) prediction
• Limit Theorems (Chapter 7)
  The Central Limit Theorem
5
• Discrete-Time Random Sequences (Chapters 9 and 11)
  Stationarity, ergodicity
  Autocorrelation and power spectral density
  Spectral factorization and innovations
  Linear prediction
• Continuous-Time Random Processes (Chapters 10 and 11)
• The Poisson Process
  Discrete-Time Bernoulli Process: Binomial, Geometric, and NegBin
  Continuous-Time Poisson Point Process: Poisson, Exp, and Erlang
• Kalman filters (Chapter 13)
• Markov Chains (Chapters 15 and 16)
6
Examples
• Thermal noise [Figure: thermal noise (histogram)]
• Wiener process
• Bernoulli: ... 00101001000000001011010011011010001100110011010001 ...
• Poisson Counting [Figure: N(t) versus t]
7
• Discrete-Time Random Telegraph
• Random Telegraph
• Random PAM
8
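2 Birds in a Cage
Given that one is male, what is the probability that the other is male?
Four equally likely outcomes: MM, MF, FM, FF.
Knowing that at least one bird is male rules out FF, leaving MM, MF, FM
⇒ P(MM | at least one male) = 1/3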
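The 1/3 answer can be checked by brute-force enumeration. The sketch below assumes the four ordered outcomes are equally likely and reads "one is male" as "at least one is male"; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of sexes, assumed equally likely.
outcomes = list(product("MF", repeat=2))      # MM, MF, FM, FF

# Condition on "at least one bird is male": this removes FF.
given = [o for o in outcomes if "M" in o]
both_male = [o for o in given if o == ("M", "M")]

p_both_male = Fraction(len(both_male), len(given))
print(p_both_male)  # 1/3
```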
9
Why we need this class
The information age: information = –log(probability)
Data analysis, ML, AI
To design
• communication systems
• radar, GPS: minimize P(miss) given P(false alarm)?
• Queueing systems
  airport security, check-out counters
  average wait time, #servers needed to meet a target waiting time t0
• Forecasting and prediction
  weather, financial markets, speech, video (e.g. for compression)
10
Sets
A set is a collection of objects called elements.
Examples:
F = {apple, banana}
B = {0, 1, 2, …}
Z = {…, –3, –2, –1, 0, 1, 2, 3, …}
R+ = {x : x > 0}
A subset B of a set A is itself a set, all of whose elts are in A.
Notation: B ⊂ A
Equality: A = B iff A ⊂ B and B ⊂ A
In our discussion, all sets will be subsets of a universal set, S
A set with n = 6 elts (die) has 2^n = 2^6 = 64 subsets: ø, {1}, {2}, …, {2,3,4,5,6}
11
Set Operations
Sum (Union): C = A ∪ B
The set of elts that are in A, or in B, or both
[Venn diagram: A and B inside S, with A ∪ B shaded]
Product (Intersection): C = A ∩ B
The set of elts that are common to A and B
[Venn diagram: A and B inside S, with A ∩ B shaded]
12
Disjoint Sets
Two sets A and B are disjoint or mutually exclusive when they share no elts in common.
Equivalent condition: A ∩ B = ø
[Venn diagram: non-overlapping A and B inside S]
13
Set Complement
The complement A^c is the set of elts of S that are not in A
Properties:
ø^c = S
S^c = ø
(A^c)^c = A
A ∩ A^c = ø
A ∪ A^c = S
[Venn diagram: A inside S, with A^c the region of S outside A]
14
Properties of Set Operations
• Commutative: A ∪ B = B ∪ A
• Associative: A ∪ (B ∪ C) = (A ∪ B) ∪ C
• Distributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Double complement: (A^c)^c = A
• Mutual exclusion: A ∩ A^c = ø
• Inclusion: A ∩ S = A
• DeMorgan:
  (A ∪ B)^c = A^c ∩ B^c
  (A ∩ B)^c = A^c ∪ B^c
15
Example
Let S = {1, 2, 3, 4, 5, 6} with subsets A = {2, 4, 6} and B = {1, 2, 3, 4}
Find (A ∩ B^c) ∪ B = ?
(A ∩ B^c) ∪ B = {6} ∪ {1, 2, 3, 4} = {1, 2, 3, 4, 6}
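The same calculation can be reproduced with Python's built-in set type. This is only a sketch: the helper `complement` is a name introduced here for illustration, and the last two lines check DeMorgan's laws for this particular A and B, not in general.

```python
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3, 4}

def complement(X, universe=S):
    """Elements of the universal set that are not in X."""
    return universe - X

result = (A & complement(B)) | B
print(sorted(result))  # [1, 2, 3, 4, 6]

# DeMorgan's laws, checked for this specific A and B.
de_morgan_union = complement(A | B) == complement(A) & complement(B)
de_morgan_inter = complement(A & B) == complement(A) | complement(B)
```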
16
Experiment
Three components:
• procedure (e.g., flip a coin)
• observable (e.g., which side lands up)
• model (e.g., heads and tails equally likely)
Key property:
• Outcome uncertain
Example of different observables for the same physical act of flipping a coin:
• observe heads or tails
• count bounces
• measure settling time
In the context of a random experiment, we define:
• The sample space is the set of possible outcomes
• An event is any subset of the sample space
• An elementary event is any singleton subset; i.e., a single outcome
17
Pop Quiz
Flip a pair of coins. What is the sample space?
It depends on the observable!
Three options:
• order matters ⇒ S1 = {TT, TH, HT, HH}
• order does not matter ⇒ S2 = {both tails, both heads, one of each}
• distance between coins ⇒ S3 = {d : d > 0}
18
Relative Frequency Approach
Assume a finite sample space S = {A, B, C, D}
After N trials, let N_A = #times that A occurs.
Define the “probability of A” by P(A) = lim_{N→∞} N_A / N
[Plot: batting average (0 to 0.4) versus number of at bats (0 to 1000). P(hit)?]
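A small simulation illustrates how the relative frequency settles down as the number of trials grows. This is a sketch under assumptions: the "true" probability 0.3 is a made-up value for a hypothetical batter, and `relative_frequency` is a name introduced here.

```python
import random

def relative_frequency(p_true, n_trials, seed=0):
    """Fraction of n_trials independent trials in which the event occurs."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    hits = sum(rng.random() < p_true for _ in range(n_trials))
    return hits / n_trials

# Hypothetical batter with P(hit) = 0.3 (an assumed value).
for n in (10, 100, 1000, 100_000):
    print(n, relative_frequency(0.3, n))
```

For small N the ratio fluctuates noticeably; for large N it stays close to 0.3, which is the behavior the limit definition relies on.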
19
Properties
• 0 ≤ N_A ≤ N ⇒ 0 ≤ P(A) ≤ 1
• N_A + N_B + … + N_D = N ⇒ P(A) + P(B) + … + P(D) = 1
• A impossible ⇒ P(A) = 0
• A certain ⇒ P(A) = 1
20
Limitations of Relative-Freq Approach
How to handle uncountable sample spaces?
Example experiment: flip a coin, measure settling time ⇒ S = {t : t > 0}
After repeated trials
• T1 = 0.1235124234245423554345346351013293127825897523124376
• T2 = 3.0021235543453463510132931278258975234245421243769001
• T3 = 17.1235293127825897523124376
• T4 = 0.5
• etc.... Now what?
Adopt the axiomatic approach to probability
• based on set theory
21
The 3 Axioms of Probability
With respect to a random experiment, define
• a sample space S = set of possible outcomes.
• Any measurable subset defines an event.
Example:
ø = the null event = the impossible event = “nothing happened”
S = the certain event = “something happened”
The probability P(A) of an event A is a number that satisfies 3 axioms:
(1) P(A) ≥ 0
(2) P(S) = 1
(3) If A ∩ B = ø then P(A ∪ B) = P(A) + P(B)
Corollaries that follow from the axioms:
(4) P(A) ≤ 1
(5) P(ø) = 0
(6) P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
22
Corollaries
Corollary 4: P(A) ≤ 1
Why? 1 = P(S) by (2), and P(S) = P(A ∪ A^c) = P(A) + P(A^c) by (3)
⇒ P(A) = 1 – P(A^c) ≤ 1, since P(A^c) ≥ 0 by (1).
Corollary 5: P(ø) = 0
Why? P(S) = P(S ∪ ø) = P(S) + P(ø) by (3) ⇒ P(ø) = 0
The axiom only requires that probability be nonnegative.
The corollary ensures that it is also no bigger than unity.
23
Intuition
For finite sample spaces, the axioms are met by assigning a probability value to each possible outcome, a number between 0 and 1, such that they sum to unity.
24
How to Interpret Combined Events
Suppose A, B, C are subsets of S
Interpretation of:
P(A ) = probability that A occurs
P(A ∪ B) = probability that A or B occurs
P(A ∩ B) = probability that A and B occur
25
Corollary 6
Corollary 6: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Proof? Split A and B into disjoint pieces:
P(A) = P(A ∩ B^c) + P(A ∩ B)
P(B) = P(B ∩ A^c) + P(A ∩ B)
Add these two equations:
P(A) + P(B) = P(A ∩ B^c) + P(B ∩ A^c) + P(A ∩ B) + P(A ∩ B)
= P(A ∪ B) + P(A ∩ B), Q.E.D.
[Venn diagram: A and B inside S, split into the disjoint regions A ∩ B^c, A ∩ B, and B ∩ A^c]
26
Remark
While it is true that A = ø implies that P(A) = 0,
the converse is false: P(A) = 0 does not imply A = ø
Example: settling time after tossing coin, let A = {0.1 seconds exactly}
27
Generalize to 3 Events
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
– P(A ∩ B) – P(A ∩ C) – P(B ∩ C)
+ P(A ∩ B ∩ C)
[Venn diagram: three overlapping sets A, B, and C inside S]
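The three-event formula can be sanity-checked numerically on a small uniform sample space. In the sketch below the sample space (two fair dice) and the events A, B, C are arbitrary choices made for illustration; they are not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered rolls of two fair dice (36 equally likely outcomes).
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) by counting."""
    return Fraction(len(event), len(S))

# Three illustrative events (hypothetical choices).
A = {o for o in S if o[0] % 2 == 0}    # first die is even
B = {o for o in S if sum(o) >= 9}      # sum is at least 9
C = {o for o in S if o[0] == o[1]}     # doubles

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs == rhs)  # True
```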
28
Methods for Calculating Probabilities
Any solution to a problem of the form “Find the probability that …” will likely use one of four methods:
(1) Counting method (for finite uniform distributions): P[A] = (size of A)/(size of S)
(2) Multiplication Rule (chain rule)
(3) Law of Total Probability (divide-and-conquer)
(4) Combinations of the above three (e.g. using Bayes rule)
29
Example: Prob of 2 Pair, Neither Faced?
Suppose five cards are drawn at random from a standard deck. Find the probability that there are two pair, neither of which is a “face” card. [Assume that the 5th card has a different rank than the others (otherwise it would be a “full house”, not “2 pair”). Assume that an ace is not a face card.]
There are $|S| = \binom{52}{5}$ possible hands, all equally likely
⇒ we can use the counting method ⇒ P(A) = |A|/|S|
The question becomes: what is |A|? In other words, how many distinct hands are there that have two pair, neither of which is a face card?
There are $\binom{10}{2}$ ways to specify the ranks of the two pairs.
There are $\binom{4}{2}$ ways to specify the suits of the smaller pair.
There are $\binom{4}{2}$ ways to specify the suits of the larger pair.
The fifth card can be any of the 48 – 4 = 44 cards that remain with a different rank.
⇒ $P(A) = \dfrac{\binom{10}{2}\binom{4}{2}\binom{4}{2}(44)}{\binom{52}{5}} = \dfrac{71280}{2598960} \approx 2.7\%$.
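The counting argument translates directly into a few lines using `math.comb` from the standard library. The variable names are illustrative; the sketch simply multiplies the four counts described above.

```python
from math import comb

ranks = comb(10, 2)        # ranks of the two pairs, from the 10 non-face ranks (A, 2, ..., 10)
suits_small = comb(4, 2)   # suits of the smaller pair
suits_large = comb(4, 2)   # suits of the larger pair
fifth_card = 48 - 4        # remaining cards whose rank differs from both pairs

favorable = ranks * suits_small * suits_large * fifth_card
total = comb(52, 5)        # all five-card hands

print(favorable, total)             # 71280 2598960
print(round(favorable / total, 4))  # 0.0274
```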
30
Example: Roll 2 Dice
If 2 dice are distinguishable ⇒ 36 possible outcomes
S1 = {(1,1), (2,1), (3,1), …, (6,1), (1,2), (2,2), …, (6,2), …, (5,6), (6,6)}
If not ⇒ 21 possible outcomes
S2 = {(1,1), (2,1), (3,1), …, (6,1), (2,2), …, (6,5), (6,6)}
Which sample space is preferable?
The 1st one, because it ensures that all outcomes are equally likely
⇒ computing the probability of an event reduces to a counting exercise.
31
Example: Roll 2 Dice
Let A = event that bigger is double smaller.
Let B = event that bigger is smaller + 1.
Find P(A ∪ B) = “probability that either A, or B, or both occur”
Approach 1: A ∪ B has 14 equiprobable elts ⇒ Pr[A ∪ B] = 14/36 = 7/18
Approach 2: Pr[A ∪ B] = Pr[A] + Pr[B] – Pr[A ∩ B]
= Pr[A] + Pr[B] – Pr[(1,2) or (2,1)]
= 6/36 + 10/36 – 2/36 = same answer.
[Figure: 6×6 grid of outcomes (first die versus second die) with A = {(1,2), (2,1), (2,4), (4,2), (3,6), (6,3)} and B marked; they overlap at (1,2) and (2,1)]
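Both approaches can be confirmed by enumerating the 36 ordered outcomes. This is a minimal sketch; `P` is a helper name introduced here and it assumes all ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 ordered, equally likely rolls

A = {(i, j) for (i, j) in S if max(i, j) == 2 * min(i, j)}   # bigger is double smaller
B = {(i, j) for (i, j) in S if max(i, j) == min(i, j) + 1}   # bigger is smaller + 1

def P(event):
    """Counting method: |event| / |S|."""
    return Fraction(len(event), len(S))

p_direct = P(A | B)                      # Approach 1: count the union
p_incl_excl = P(A) + P(B) - P(A & B)     # Approach 2: inclusion-exclusion

print(len(A), len(B), len(A & B), p_direct)  # 6 10 2 7/18
```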
32
Conditional Probability
In previous example: if we somehow knew that B occurred, this would change the probability of A.
This new probability, which takes into account the knowledge that B has occurred, is called the conditional probability.
• Notation: P(A|B)
• It is the new probability of A, conditioned on the event B.
When P(B) > 0, the conditional probability of an event A, given B, is
P(A|B) = P(A ∩ B) / P(B).
33
Can Verify: Conditional Probability Satisfies Axioms
When B is taken as the new sample space:
(1) P(A|B) ≥ 0
(2) P(B|B) = 1
(3) If A ∩ C = ø then P(A ∪ C|B) = P(A|B) + P(C|B)
34
Special Cases
(a) A and B are disjoint ⇒ P(A|B) = 0.
Examples:
• P(head| tail) = 0.
• P(odd|even) = 0.
(b) What happens when P(B ) = 0?
Don’t we have to worry about dividing by zero?
No. P(B) = 0 means that B is impossible.
So P(A|B) doesn’t even make sense.
35
Back to Example: Roll 2 Dice
Let A = event that bigger is double smaller.
Let B = event that bigger is smaller + 1.
• If B is known to occur, then B becomes the new sample space.
  – All events outside of B become impossible
  – probabilities outside of B are set to zero
  – probabilities inside of B sum to 1
Each of the 10 outcomes in B is equally likely
⇒ P(A|B) = P(A ∩ B)/P(B) = (2/36)/(10/36) = 1/5.
[Figure: 6×6 grid of outcomes (1st die versus 2nd die) with A = {(1,2), (2,1), (2,4), (4,2), (3,6), (6,3)} and B marked]
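The "B becomes the new sample space" view can be sketched by counting only inside B. The helper `cond_prob` is a hypothetical name, and the shortcut of dividing counts is valid only because the outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))
A = {(i, j) for (i, j) in S if max(i, j) == 2 * min(i, j)}
B = {(i, j) for (i, j) in S if max(i, j) == min(i, j) + 1}

def cond_prob(event, given):
    """P(event | given) for equally likely outcomes: count inside the new sample space."""
    if not given:
        raise ValueError("conditioning event is empty, so P(event | given) is undefined")
    return Fraction(len(event & given), len(given))

print(cond_prob(A, B))  # 1/5
print(cond_prob(B, A))  # 1/3
```

Here P(A) = 6/36 = 1/6, so knowing that B occurred raises the probability of A slightly, from 1/6 to 1/5.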
36
Conditioning Sometimes Increases, Sometimes Decreases
• A and B disjoint ⇒ P(A|B) = 0/P(B) = 0
• B ⊂ A ⇒ P(A|B) = P(B)/P(B) = 1
• A ⊂ B ⇒ P(A|B) = P(A)/P(B) ≥ P(A)
• A and B partially overlap ⇒ P(A|B) could be bigger than P(A) (when P(B – A) is small) or could be smaller than P(A) (when P(A ∩ B) is small)
[Venn diagrams: one sketch of A and B inside S for each of the four cases]
38
Ex: Roll 2 Dice, Find P(odd|small)
Roll two fair dice, and observe the sum.
Let A be the “odd” event; i.e.,
A = {3, 5, 7, 9, 11}.
Let B be the “small” event; i.e.,
B = {2, 3}.
Then P(A|B) = P(A ∩ B)/P(B) = P(3)/P(B) = (2/36)/(3/36) = 2/3.
(not 1/2 as you might first guess)
[Figure: 6×6 grid of outcomes, first die versus second die]
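The reason the answer is not 1/2 is that the sums are not equally likely. A sketch that first builds the pmf of the sum from the 36 equally likely rolls, and then applies the definition of conditional probability, makes this visible. The names `pmf` and `P` are introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# pmf of the sum of two fair dice, built from the 36 equally likely rolls.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in counts.items()}

A = {3, 5, 7, 9, 11}   # "odd" sums
B = {2, 3}             # "small" sums

def P(event):
    """Probability of a set of sums: add up the pmf values."""
    return sum(pmf[s] for s in event)

p_A_given_B = P(A & B) / P(B)
print(pmf[2], pmf[3])   # 1/36 1/18
print(p_A_given_B)      # 2/3
```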
39
Chain Rule
Solving P(B|A) = P(A ∩ B)/P(A) for the numerator
⇒ P(AB) = P(A)P(B|A), where AB denotes A ∩ B.
• This is the chain rule (or the multiplication rule).
• Use it to compute the probability that both events A and B occur.
Example: Draw 2 cards from deck, probability both are red cards is
P(first red and second red) = P(first red)P(second red | first red)
= (26/52)(25/51)
40
Chain Rule = Multiplication Rule
To calculate the probability that 5 events A1, …, A5 happen simultaneously:
P(A1A2A3A4A5) =P(A1)P(A2A3A4A5|A1)
=P(A1)P(A2|A1)P(A3A4A5|A1A2)
=P(A1)P(A2|A1)P(A3|A1A2)P(A4A5|A1A2A3)
= P(A1)P(A2|A1)P(A3|A1A2)P(A4|A1A2A3)P(A5|A1A2A3A4)
Describe tree interpretation.
41
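Example
10 coins in a bag: 2 Red, 3 Blue, 5 Green
Draw 2 coins without replacement. Find probability of (first Green, then Blue).
Probability tree:
First draw: R (0.2), B (0.3), G (0.5)
Second draw, given the first:
• after R: R 1/9, B 3/9, G 5/9
• after B: R 2/9, B 2/9, G 5/9
• after G: R 2/9, B 3/9, G 4/9
P(first Green, then Blue) = (0.5)(3/9) = 1/6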
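Walking one path of the tree is the chain rule in action: multiply the probability of each draw given what has already been removed. The function below is a sketch with a hypothetical name (`draw_sequence_prob`); it assumes every remaining item is equally likely to be drawn.

```python
from fractions import Fraction

def draw_sequence_prob(bag, sequence):
    """Chain rule for draws without replacement.

    bag: dict mapping a label to how many such items are in the bag.
    sequence: labels in the order they are to be drawn.
    """
    remaining = dict(bag)
    prob = Fraction(1)
    for label in sequence:
        total = sum(remaining.values())
        prob *= Fraction(remaining[label], total)   # P(this draw | earlier draws)
        remaining[label] -= 1                       # the drawn item leaves the bag
    return prob

coins = {"R": 2, "B": 3, "G": 5}
print(draw_sequence_prob(coins, "GB"))   # 1/6

# The two-red-cards example uses the same rule: (26/52)(25/51).
deck = {"red": 26, "black": 26}
print(draw_sequence_prob(deck, ["red", "red"]))   # 25/102
```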
42
Birthday Problem: A Classic Chain Rule Example
Assume 365 days (ignore leap yrs), all equally likely.
How big must class be to ensure P(at least one match) > 1/2?
Experiment:
• Arrange n students in random order
• Let Ai be the event that the i-th birthday does not match a previous one
P(at least one match) = 1 – P(no match)
= 1 – P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ … ∩ An)
= 1 – P(A1)P(A2|A1)P(A3|A1,A2)P(A4|A1,A2,A3) …
= 1 – (1)(364/365)(363/365)(362/365) … ((365 – n + 1)/365)
43
[Plot: probability of at least one match versus class size n (0 to 50); the probability first exceeds 1/2 at n = 23]
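The product formula is easy to evaluate in a loop. The sketch below uses the same assumptions as the slide (365 equally likely days, no leap years); `p_match` is a name chosen here.

```python
def p_match(n, days=365):
    """P(at least one shared birthday among n people), via the chain rule."""
    p_no_match = 1.0
    for i in range(n):
        # i-th person avoids the i birthdays already taken
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match

# Smallest class size for which a match is more likely than not.
smallest = next(n for n in range(1, 366) if p_match(n) > 0.5)
print(smallest)                 # 23
print(round(p_match(23), 4))    # 0.5073
```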
44
Law of Total Probability
{B1, …, Bn} is a partition iff (1) they are mutually disjoint, and (2) ∪i Bi = S.
Then P(A) = P(∪i (A ∩ Bi))
= Σi P(A ∩ Bi)
= Σi P(A|Bi)P(Bi)
⇒ “Law of Total Probability”: P(A) = Σi P(A|Bi)P(Bi)
Special case: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
45
Law of Total Probability
“Divide and Conquer”
Given any partition B1, …, Bn,
P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + … + P(A|Bn)P(Bn).
Example: 10 coins in a bag:
• 2 Red coins with P(H|R) = 0.5
• 3 Blue coins with P(H|B) = 0.6
• 5 Green coins with P(H|G) = 0.7
Pick one at random
⇒ P(H) = P(H|R)P(R) + P(H|B)P(B) + P(H|G)P(G)
= (0.5)(0.2) + (0.6)(0.3) + (0.7)(0.5)
= 0.1 + 0.18 + 0.35 = 0.63
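The weighted sum can be written as a one-line computation over the partition. Exact fractions are used in this sketch to avoid floating-point rounding; the dictionary names are illustrative.

```python
from fractions import Fraction

# P(color) when one of the 10 coins is picked at random.
prior = {"R": Fraction(2, 10), "B": Fraction(3, 10), "G": Fraction(5, 10)}
# P(H | color) for each kind of coin.
p_heads_given = {"R": Fraction(5, 10), "B": Fraction(6, 10), "G": Fraction(7, 10)}

# Law of total probability: sum over the partition {R, B, G}.
p_heads = sum(p_heads_given[c] * prior[c] for c in prior)
print(p_heads, float(p_heads))  # 63/100 0.63
```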
46
Cheater Meter
Every test is scanned by a cheater meter.
P(flash|cheated) = 0.9.
P(flash|not cheated) = 0.2.
P(cheater) = 0.01.
Quiz: Suppose a randomly chosen test flashes, how likely is it a cheat?
Let F = event that the test causes light to flash.
Let C = event that the tester cheated.
We want P(C|F) = P(F ∩ C)/P(F) = P(F|C)P(C) / [P(F|C)P(C) + P(F|C^c)P(C^c)]
= (0.9)(0.01) / [(0.9)(0.01) + (0.2)(0.99)] = 0.043.
This is “Bayes Rule” or “Bayes Theorem.”
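The same arithmetic as a small function for the two-hypothesis case (an event and its complement). The function and parameter names are hypothetical, chosen for this sketch.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """P(H | observation) for a hypothesis H and its complement, by Bayes' rule."""
    joint_h = p_obs_given_h * prior                  # P(obs and H)
    joint_not_h = p_obs_given_not_h * (1.0 - prior)  # P(obs and not H)
    return joint_h / (joint_h + joint_not_h)         # divide by P(obs)

p_cheat_given_flash = posterior(prior=0.01, p_obs_given_h=0.9, p_obs_given_not_h=0.2)
print(round(p_cheat_given_flash, 3))  # 0.043
```

Even though the meter flashes for 90% of cheaters, a flash only raises the probability of cheating from 1% to about 4%, because honest tests vastly outnumber cheated ones.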
47
Bayes Theorem Summary
For any partition {A1, A2, …, An},
P(Ai|B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + … + P(B|An)P(An)].
Bayes theorem turns the conditional probabilities around.
48
Coins in a Bag
Example: 10 coins in a bag:
• 2 Red coins with Pr[H|R] = 0.5
• 3 Blue coins with Pr[H|B] = 0.6
• 5 Green coins with Pr[H|G] = 0.7
Find: P(G|H) = ?
Solution:
P(G|H) = P(H|G)P(G) / [P(H|R)P(R) + P(H|B)P(B) + P(H|G)P(G)]
= (0.7)(0.5) / [(0.5)(0.2) + (0.6)(0.3) + (0.7)(0.5)]
= 0.35 / (0.1 + 0.18 + 0.35)
= 0.35/0.63 = 5/9 ≈ 0.556.
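Bayes theorem over a full partition can be sketched as a function that returns the posterior for every hypothesis at once. The name `bayes` and the dictionary layout are assumptions made for this illustration; exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction

def bayes(prior, likelihood):
    """Posterior P(A_i | B) over a partition {A_i}, given P(A_i) and P(B | A_i)."""
    joint = {h: likelihood[h] * prior[h] for h in prior}   # P(B and A_i)
    evidence = sum(joint.values())                         # P(B), law of total probability
    return {h: joint[h] / evidence for h in joint}

prior = {"R": Fraction(2, 10), "B": Fraction(3, 10), "G": Fraction(5, 10)}
likelihood = {"R": Fraction(5, 10), "B": Fraction(6, 10), "G": Fraction(7, 10)}

post = bayes(prior, likelihood)
print(post["R"], post["B"], post["G"])  # 10/63 2/7 5/9
```

The posteriors sum to 1, and observing heads shifts weight toward the green coins, which are the most likely to land heads.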