1 Bits of probability. 2 Why? We need a concept of probability to make judgements about our...
-
date post
22-Dec-2015 -
Bits of probability
Why?
We need a concept of probability to make judgements about our hypotheses in the scientific method. Is the data consistent with our hypotheses?
What is probability?
• Relative frequency: if a process or an experiment is repeated a large number of times, n, and the characteristic E occurs m times, then the relative frequency m/n of E will be approximately equal to the probability of E: P(E) ≈ m / n
• Personal probability: what is the probability of life on Mars?
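The relative-frequency definition can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the fair-die event and the fixed seed are assumptions made for the illustration, not part of the slides.

```python
import random

def relative_frequency(event, experiment, n, seed=0):
    """Repeat `experiment` n times and return m/n, the fraction of runs where `event` holds."""
    rng = random.Random(seed)  # fixed seed (an assumption) so that runs are repeatable
    m = sum(1 for _ in range(n) if event(experiment(rng)))
    return m / n

# Hypothetical example: E = "a fair die shows a 6", so P(E) = 1/6
def roll(rng):
    return rng.randint(1, 6)

def is_six(x):
    return x == 6

for n in (100, 10_000, 100_000):
    # m/n should get closer to 1/6 as n grows
    print(n, relative_frequency(is_six, roll, n))
```

As n grows, the printed ratio should settle near 1/6 ≈ 0.167.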
Pictures
[Figure: Venn diagram of the event space U containing the event A; the rest of U is "Not A".]
P(A) = (Area of A) / (Area of U), which is implicitly P(A|U)
Operations on event sets
Union of 2 events (OR): probability of the union = P(E1 or E2) = P(E1 ∪ E2)
[Figure: Venn diagram of E1 and E2 inside U, with the union E1 ∪ E2 highlighted.]
Operations on event sets
Intersection of 2 events (AND): probability of the intersection = P(E1 and E2) = P(E1 ∩ E2)
[Figure: Venn diagram of E1 and E2 inside U, with the intersection E1 ∩ E2 highlighted.]
Probability properties
1. 0 ≤ P(Ei) ≤ 1: the probability of Ei is always a number between 0 and 1.
2. Σi P(Ei) = 1: the sum over all the outcomes Ei ∈ U (the event space) is 1.
3. Additivity: P(E1 ∪ E2) = P(E1) + P(E2) - P(E1 ∩ E2)
1st experiment: toss 2 dice; result = sum of the outcomes
x = sum of the 2 dice

x    Possible outcomes               p(x)
2    1,1                             1/36
3    1,2  2,1                        2/36
4    1,3  3,1  2,2                   3/36
5    1,4  4,1  3,2  2,3              4/36
6    1,5  5,1  2,4  4,2  3,3         5/36
7    1,6  6,1  2,5  5,2  3,4  4,3    6/36
8    2,6  6,2  3,5  5,3  4,4         5/36
9    3,6  6,3  4,5  5,4              4/36
10   4,6  6,4  5,5                   3/36
11   5,6  6,5                        2/36
12   6,6                             1/36

P(sum is even or ≥ 7) = P(sum even) + P(sum ≥ 7) - P(sum = 8, 10, 12) = 18/36 + 21/36 - 9/36 = 30/36
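The additivity rule on this slide can be checked by enumerating the 36 outcomes. The sketch below assumes the second event is "sum ≥ 7" (the reading consistent with the 21/36 on the slide); the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on a pair (d1, d2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def even(o):
    return (o[0] + o[1]) % 2 == 0

def at_least_7(o):
    return o[0] + o[1] >= 7

# Direct count of the union versus the additivity formula
p_union = prob(lambda o: even(o) or at_least_7(o))
p_additivity = prob(even) + prob(at_least_7) - prob(lambda o: even(o) and at_least_7(o))
print(p_union, p_additivity)  # both equal 30/36, shown in lowest terms
```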
Probability additivity
If E1 and E2 are mutually exclusive, then
P(E1 ∪ E2) = P(E1) + P(E2)
For instance
P(sum = 2 or 3) = 1/36 + 2/36 = 3/36
2nd experiment: joint probability of parents-children
Event = a pair of values: one for each variable

                   Parent title
Children title     primary   high school   degree   total
primary            0.04      0.01          0.00     0.05
high school        0.06      0.24          0.05     0.35
degree             0.05      0.30          0.25     0.60
total              0.15      0.55          0.30     1.00

Marginal probability:
P(Pd) = P(parent title = degree) = 0.30
P(Cd) = P(child title = degree) = 0.60
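The marginal probabilities are the row and column sums of the joint table. A minimal sketch, with the joint values copied from the slide; the dictionary layout `joint[child][parent]` and the function names are assumptions made for the illustration.

```python
levels = ["primary", "high school", "degree"]

# joint[child][parent] = P(child title, parent title), values from the slide
joint = {
    "primary":     {"primary": 0.04, "high school": 0.01, "degree": 0.00},
    "high school": {"primary": 0.06, "high school": 0.24, "degree": 0.05},
    "degree":      {"primary": 0.05, "high school": 0.30, "degree": 0.25},
}

def marginal_child(child):
    """P(child title = child): sum the row over all parent titles."""
    return sum(joint[child][parent] for parent in levels)

def marginal_parent(parent):
    """P(parent title = parent): sum the column over all child titles."""
    return sum(joint[child][parent] for child in levels)

print(round(marginal_parent("degree"), 2))  # P(Pd)
print(round(marginal_child("degree"), 2))   # P(Cd)
```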
2nd experiment: joint probability of parents-children
Union probabilities:
P(Pd ∪ Cd) = P[(parent = degree) or (child = degree)] = 0.30 + 0.60 - 0.25 = 0.65
P(Pp ∪ Cp) = P[(parent = primary) or (child = primary)] = 0.15 + 0.05 - 0.04 = 0.16
P(Pd ∪ Cp) = P[(parent = degree) or (child = primary)] = 0.30 + 0.05 - 0.00 = 0.35
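The union probabilities can also be obtained without the additivity formula, by summing every cell of the joint table that satisfies either condition. The sketch below does that; the list layout and the name `p_union` are illustrative assumptions.

```python
levels = ["primary", "high school", "degree"]

# joint[i][j] = P(child = levels[i], parent = levels[j]), values from the slides
joint = [
    [0.04, 0.01, 0.00],
    [0.06, 0.24, 0.05],
    [0.05, 0.30, 0.25],
]

def p_union(parent, child):
    """P(parent title = parent OR child title = child), summing the matching cells."""
    i_child, j_parent = levels.index(child), levels.index(parent)
    return sum(
        joint[i][j]
        for i in range(3)
        for j in range(3)
        if i == i_child or j == j_parent
    )

print(round(p_union("degree", "degree"), 2))    # 0.65
print(round(p_union("primary", "primary"), 2))  # 0.16
print(round(p_union("degree", "primary"), 2))   # 0.35
```

Each cell is counted once, which is why no intersection term has to be subtracted here.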
Conditional probability
P(E1|E2) = P(E1 ∩ E2) / P(E2)
P(E1 ∩ E2) = P(E1|E2) P(E2)
Using the joint table of parent and children levels of study:
P(Cd | Pp) = P[(child = degree) given (parent = primary)] = 0.05/0.15 = 0.33
P(Cd | Phs) = P[(child = degree) given (parent = high school)] = 0.30/0.55 = 0.55
P(Cd | Pd) = P[(child = degree) given (parent = degree)] = 0.25/0.30 = 0.83
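A short sketch of the same conditional probabilities, dividing each joint cell by the marginal of the conditioning event. The table values come from the slides; the function name is an assumption.

```python
levels = ["primary", "high school", "degree"]

# joint[i][j] = P(child = levels[i], parent = levels[j]), values from the slides
joint = [
    [0.04, 0.01, 0.00],
    [0.06, 0.24, 0.05],
    [0.05, 0.30, 0.25],
]

def p_child_given_parent(child, parent):
    """P(child | parent) = P(child, parent) / P(parent)."""
    j = levels.index(parent)
    p_parent = sum(row[j] for row in joint)  # marginal of the conditioning event
    return joint[levels.index(child)][j] / p_parent

for parent in levels:
    print(parent, round(p_child_given_parent("degree", parent), 3))
```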
Conditional probability
"Conditioning on an event" implies that the new total event space is reduced to that event. This is why we divide by its probability.
P(E1|E2) = P(E1 ∩ E2) / P(E2)
P(E1 ∩ E2) = P(E1|E2) P(E2)
Independent events
Two events E1 and E2 are independent when
P(E1|E2) = P(E1) and P(E2|E1) = P(E2)
both hold.
Independent events?
Two dice case: is P(A) = P(A|B)?

P(A)                         P(A|B)                                         Equal?
P(first is even) = 18/36     P(first is even | dice are equal) = 3/6        True
P(dice are equal) = 6/36     P(dice are equal | first is even) = 3/18       True
P(sum is even) = 18/36       P(sum is even | dice are different) = 12/30    False
P(sum = 10) = 3/36           P(sum = 10 | dice are equal) = 1/6             False
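The four comparisons can be reproduced by counting outcomes. This is a minimal sketch; the helper names `prob` and `independent` are assumptions, and independence is tested here as P(A|B) = P(A).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def prob(event, given=lambda o: True):
    """P(event | given), by counting inside the reduced event space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))

def independent(a, b):
    """True when conditioning on b does not change the probability of a."""
    return prob(a, given=b) == prob(a)

def first_even(o): return o[0] % 2 == 0
def equal(o): return o[0] == o[1]
def different(o): return o[0] != o[1]
def sum_even(o): return (o[0] + o[1]) % 2 == 0
def sum_10(o): return o[0] + o[1] == 10

print(independent(first_even, equal))    # True
print(independent(equal, first_even))    # True
print(independent(sum_even, different))  # False
print(independent(sum_10, equal))        # False
```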
Computing the joint probability
Hint: assuming E2 is a certain event, we can compute P(E1|E2). Then we can relax this assumption by multiplying the result by P(E2). The product is the joint probability (intersection) of the 2 events:
P(E1 ∩ E2) = P(E1|E2) P(E2)
If E1 and E2 are independent, then P(E1|E2) = P(E1), and this implies
P(E1 ∩ E2) = P(E1) P(E2)
Note:
1) 2 mutually exclusive events (each with non-zero probability) cannot be independent
2) 2 independent events (each with non-zero probability) are not mutually exclusive
Partition
If U = ∪i Bi and Bi ∩ Bj = ∅ for all i ≠ j, then {Bi} is a partition of U
[Figure: the event space U divided into the disjoint regions B1, ..., B6.]
Partition
If {Bi} is a partition of U, then
P(A) = Σi P(A,Bi) = Σi P(A|Bi) P(Bi)
[Figure: the event space U divided into B1, ..., B6, with the event A overlapping several of the Bi.]
Bayes' Rule
• Suppose that B1, B2, ..., Bk form a partition of S:
Bi ∩ Bj = ∅ for i ≠ j; ∪i Bi = S
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
Pr(Bi|A) = Pr(Bi ∩ A) / Pr(A) = Pr(A|Bi) Pr(Bi) / Pr(A)
[Figure: the space S divided into B1, ..., B6, with the event A overlapping several of the Bi.]
Bayes Theorem
Joint probability: P(X,Y) = P(X|Y) P(Y) = P(Y|X) P(X)
So:
P(Y|X) = P(X|Y) P(Y) / P(X)
For a conclusion M and an evidence s:
P(M|s) = P(s|M) P(M) / P(s)
where P(M) and P(s) are the a priori probabilities.
[Diagram: P(M|s) and P(s|M) link the evidence s and the conclusion M in opposite directions.]
Bayes' rule: Example
• A rare disease affects 1 out of 100,000 people.
• A test shows positive
  - with probability 0.99 when applied to an ill person, and
  - with probability 0.01 when applied to a healthy person.
• You test positive. ARE YOU ILL?
Bayes' rule: Example
P(+|ill) = 0.99    P(+|healthy) = 0.01    P(ill) = 10^-5
P(ill|+) = P(+|ill) P(ill) / [P(+|ill) P(ill) + P(+|healthy) P(healthy)]
         = 0.99 × 10^-5 / [0.99 × 10^-5 + 0.01 × (1 - 10^-5)]
         = 9.89 × 10^-4
Happy End: more likely the test is incorrect!!
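The arithmetic of this example can be reproduced in a few lines. A minimal sketch, assuming only two states (ill, healthy); the function and parameter names are illustrative.

```python
def posterior(prior, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | +) from Bayes' rule with a two-state partition (ill, healthy)."""
    # Total probability of a positive test over the partition
    p_pos = p_pos_given_ill * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_ill * prior / p_pos

p_ill = posterior(prior=1e-5, p_pos_given_ill=0.99, p_pos_given_healthy=0.01)
print(f"{p_ill:.2e}")  # 9.89e-04
```

Even with a positive result the probability of being ill stays below 0.1%, because healthy people vastly outnumber ill ones.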
Is the pope an alien?
Since the probability
P(Pope|Human) = 1/(6,000,000,000)
does this imply that the Pope is not a human being?
Beck-Bornholdt HP, Dubben HH, Nature 381, 730 (1996)
THAT IS:
if Human → Pope is RARE, is Pope → Human RARE?
(Human → ~ Not Pope) ? (Pope → ~ Not Human)
The pope is (probably) not an alien
P(Pope|Human) is not the same as P(Human|Pope):
P(Human|Pope) = P(Pope|Human) P(Human) / [P(Pope|Human) P(Human) + P(Pope|Alien) P(Alien)]
but P(Alien) ~ 0
So
P(Human|Pope) ~ 1.0
S Eddy and D MacKay's answer
More examples of fallacious inference
• Since most sports accidents occur when playing soccer, Stern ran the headline "SOCCER IS THE MOST DANGEROUS SPORT" (without considering that soccer is probably the most common sport).
• Since a third of all fatal accidents in Germany occur in private homes, Die Welt ran the headline "PRIVATE HOMES AS DANGER SPOTS" (without considering that home is where people spend most of their time).
• Since most of the cars entering one-way streets in the wrong direction are driven by women, Bild ran the headline "WOMEN MORE DISORIENTED DRIVERS" (without considering whether the samples of men and women drivers had the same size).
From: Kramer W, Gigerenzer G, Statistical Science 20:223-230 (2005)
33 Pirates (zecchino d'oro)
• 11 pirates have a patch over one eye (sight problem)
• 11 pirates are lame in one leg (leg problem)
• 11 pirates cannot hear the trumpet (hearing problem)
What is the probability of:
• Having all three injuries
• Having 2 injuries
• Having 1 injury
• No injury
33 Pirates (zecchino d'oro)
Suppose that the problems are independent:
P(I_i) = 1/3 (probability of having injury i), P(NI_i) = 2/3 (probability of not having injury i)
• Having all three injuries
  P(S) × P(L) × P(H) = 1/3 × 1/3 × 1/3 = 1/27
• Having 2 injuries
  P(i,j,not k) + P(j,k,not i) + P(k,i,not j) = 3 × 1/3 × 1/3 × 2/3 = 6/27
• Having 1 injury
  P(i,not (j,k)) + P(j,not (i,k)) + P(k,not (i,j)) = 3 × 1/3 × 2/3 × 2/3 = 12/27
• No injury
  P(not S) × P(not L) × P(not H) = 2/3 × 2/3 × 2/3 = 8/27
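The four cases can be checked by enumerating the 8 combinations of injuries, under the same independence assumption with P(injury) = 1/3. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p_injury = Fraction(1, 3)  # probability of each single injury, assumed independent

# dist[k] = probability of having exactly k of the 3 injuries
dist = {k: Fraction(0) for k in range(4)}
for combo in product([True, False], repeat=3):  # (sight, leg, hearing)
    p = Fraction(1)
    for has_injury in combo:
        p *= p_injury if has_injury else 1 - p_injury
    dist[sum(combo)] += p

for k in range(4):
    print(k, dist[k])  # fractions are shown in lowest terms
```

The probabilities are 8/27, 12/27, 6/27 and 1/27 for 0, 1, 2 and 3 injuries, and they add up to 1.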
Game: 1 car and 2 sheep
From "The Curious Incident of the Dog in the Night-Time" by Mark Haddon
• Two sheep and a car are hidden behind three different doors
Game: 1 car and 2 sheep
[Figure: three doors labelled 1, 2, 3.]
• The game: you select one door (e.g. 1)
• From the remaining two, one door with a sheep is shown to you (e.g. 2)
• You may change your door (selecting 3) or keep your first choice (1)
Game: 1 car and 2 sheep
[Figure: three doors labelled 1, 2, 3.]
Question: are the 2 choices
1. Equivalent
2. Better to change door
3. Better to keep the first choice
Game: 1 car and 2 sheep
Suppose you select x (y and z are the alternatives).
P(x) = P(y) = P(z) = 1/3 (probability that the car is behind each door)
P(Sz) = probability of showing z
P(first) = P(x) = 1/3 (probability of winning by keeping the first choice)
P(second) = P(y,Sz) + P(z,Sy) = P(Sz|y) P(y) + P(Sy|z) P(z) = 1 × 1/3 + 1 × 1/3 = 2/3 (probability of winning by changing door)
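Game: 1 car and 2 sheep
Write a program to test it (Python version of the pseudocode):

    import random

    MAX_ITER = 100_000  # number of simulated games (arbitrary choice)
    first_ok = 0        # wins when keeping the first choice
    second_ok = 0       # wins when changing door

    for _ in range(MAX_ITER):
        doors = [0, 0, 0]
        doors[random.randrange(3)] = 1  # put the car (1) behind a random door
        first = random.randrange(3)     # first choice, selected at random
        # door shown: not the first choice, and hiding a sheep (0)
        shown = random.choice(
            [d for d in range(3) if d != first and doors[d] == 0]
        )
        # second choice: the remaining door, neither first nor shown
        second = next(d for d in range(3) if d not in (first, shown))
        if doors[first] == 1:
            first_ok += 1
        elif doors[second] == 1:
            second_ok += 1

    prob_first = first_ok / MAX_ITER    # expected to be close to 1/3
    prob_second = second_ok / MAX_ITER  # expected to be close to 2/3
    print(prob_first, prob_second)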
Some useful measures: odds ratio and log-odds score
A measure of the relative influence of A and B is
odd(A,B) = P(A,B) / (P(A) P(B))
If A and B are independent, odd(A,B) ~ 1
Alternatively, log(odd(A,B)) >> 0 or << 0 indicates strong correlation
Ex: substitution matrices
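As a small illustration of the log-odds score, the sketch below applies the definition to values from the parent/child table of the 2nd experiment. The function name `log_odds` and the use of the natural logarithm are assumptions; the slides do not fix the base.

```python
import math

def log_odds(p_ab, p_a, p_b):
    """log of odd(A,B) = P(A,B) / (P(A) P(B)); natural log is an assumption."""
    return math.log(p_ab / (p_a * p_b))

# child = degree and parent = degree: P(A,B) = 0.25, P(A) = 0.60, P(B) = 0.30
print(log_odds(0.25, 0.60, 0.30))  # positive: the pair occurs more often than under independence

# child = degree and parent = primary: P(A,B) = 0.05, P(A) = 0.60, P(B) = 0.15
print(log_odds(0.05, 0.60, 0.15))  # negative: the pair occurs less often than under independence
```

A score of 0 corresponds to independence, since then P(A,B) = P(A) P(B).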
Probabilistic training of a parametric method
Generally speaking, a parametric model M aims to reproduce a set of known data.
[Diagram: the model M with parameters T produces the modelled data, shown next to the real data (D).]
How to compare them?
Maximum likelihood
D = data, M = model, T = model parameters
T* = argmax_T P(D|T,M) = argmax_T log(P(D|T,M))
Example (coin tossing)
Given N tosses of a coin (our data D), the outcomes are h heads and t tails (N = t + h)
ASSUME the model
P(D|M) = p^h (1-p)^t
To compute the maximum likelihood of P(D|M), set its derivative with respect to p to zero:
d P(D|M) / dp = p^(h-1) (1-p)^(t-1) (h(1-p) - t p) = 0
We obtain that our estimate of p is
p = h / (h+t) = h / N
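The closed-form estimate can be compared with a brute-force search over p. The sketch below maximizes the log-likelihood on a grid; the grid resolution, the function names and the counts h = 7, t = 3 are assumptions chosen only for the illustration.

```python
import math

def log_likelihood(p, h, t):
    """log P(D|M) for the model p^h (1-p)^t."""
    return h * math.log(p) + t * math.log(1 - p)

def mle_grid(h, t, steps=10_000):
    """Grid search for the p in (0, 1) that maximizes the log-likelihood."""
    grid = [i / steps for i in range(1, steps)]  # excludes p = 0 and p = 1
    return max(grid, key=lambda p: log_likelihood(p, h, t))

h, t = 7, 3  # hypothetical counts of heads and tails
print(mle_grid(h, t), h / (h + t))  # the two estimates should agree closely
```

Maximizing the logarithm gives the same p as maximizing P(D|M) itself, because the logarithm is monotonic.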
Example (error measure)
Suppose you think that your data are affected by a Gaussian error, so that they are distributed according to
F(xi) = A exp[-(xi - μ)² / (2σ²)]
with A = 1/(σ sqrt(2π))
If your measurements are independent, the data likelihood is
P(Data|model) = Πi F(xi)
Find the μ and σ that maximize P(Data|model)
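For this Gaussian model the maximizing values have a known closed form: μ is the sample mean and σ² is the mean squared deviation from it (dividing by N, not N-1). The sketch below computes them and compares the log-likelihood with nearby values; the data are made up for the illustration and the function names are assumptions.

```python
import math

def log_likelihood(xs, mu, sigma):
    """log of the product of F(xi), with F the Gaussian density defined above."""
    return sum(
        -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

def gaussian_mle(xs):
    """Maximum likelihood estimates of mu and sigma for independent Gaussian data."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # divides by n, not n - 1
    return mu, sigma

data = [4.9, 5.1, 5.0, 4.8, 5.3, 5.2]  # made-up measurements
mu_hat, sigma_hat = gaussian_mle(data)
best = log_likelihood(data, mu_hat, sigma_hat)
print(mu_hat, sigma_hat, best)

# Moving either parameter away from the estimate lowers the log-likelihood
print(log_likelihood(data, mu_hat + 0.05, sigma_hat) < best)
print(log_likelihood(data, mu_hat, sigma_hat + 0.05) < best)
```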