Introduction to Probability and Bayesian
Decision Making
Soo-Hyung Kim
Department of Computer Science
Chonnam National University
- Bayesian Decision Making
- Definition of Probability
- Conditional Probability
- Bayes’ Theorem
- Probability Distribution
- Gaussian Random Variable
- Naïve Bayesian Decision
- References
Bayesian Decision Making (1/2)
Male/Female Classification
Given a priori data of <height, sex> pairs:
(168, m) (146, f) (173, m) (160, f) (157, m) (156, f) (163, m) (159, f) (162, m) (149, f)
What is the sex of a person whose height is 160?
$P(m \mid \text{height} = 160) = ?$, $P(f \mid \text{height} = 160) = ?$
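As a first step toward answering this question, the ten pairs can be stored as a list and the class priors $P(m)$ and $P(f)$ estimated by counting. This is a minimal sketch; the helper name `class_priors` is hypothetical.

```python
# Training pairs <height, sex> from the slide above
data = [(168, "m"), (146, "f"), (173, "m"), (160, "f"), (157, "m"),
        (156, "f"), (163, "m"), (159, "f"), (162, "m"), (149, "f")]

def class_priors(pairs):
    """Estimate P(class) as the relative frequency of each label."""
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    total = len(pairs)
    return {label: n / total for label, n in counts.items()}

priors = class_priors(data)
print(priors)  # {'m': 0.5, 'f': 0.5}
```

With five samples per class, both priors come out equal, so the decision will rest entirely on the height likelihoods.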
Bayesian Decision Making (2/2)
UCI-Iris Data Classification
Given a dataset of 150 tuples <l1, w1, l2, w2, class> with 4 numeric attributes (sepal length l1, sepal width w1, petal length l2, petal width w2):

Attribute       Min   Max   Mean   SD     Correlation with class
sepal length    4.3   7.9   5.84   0.83    0.7826
sepal width     2.0   4.4   3.05   0.43   -0.4194
petal length    1.0   6.9   3.76   1.76    0.9490
petal width     0.1   2.5   1.20   0.76    0.9565

3 classes: Iris Setosa, Iris Versicolour, Iris Virginica
What is the class of the data <5.1, 3.0, 4.9, 0.5>?
$P(\text{Setosa} \mid 5.1, 3.0, 4.9, 0.5) = ?$, $P(\text{Versicolour} \mid 5.1, 3.0, 4.9, 0.5) = ?$, $P(\text{Virginica} \mid 5.1, 3.0, 4.9, 0.5) = ?$
UCI-Iris Data
Definition of Probability (1/4)
Experiment & Probability
- Experiment = procedure + observation
- Sample: a possible outcome of an experiment
- Sample space: the set of all samples, $\Omega = \{s_1, s_2, \dots, s_N\}$
- Event: a set of samples (i.e., a subset of $\Omega$, $A \subseteq \Omega$)
- Probability: a value associated with an event, $P(A)$
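Definition of Probability (2/4)
Various Definitions of Probability
- Relative frequency: $P(A) = \lim_{n \to \infty} \frac{N(A, n)}{n}$, where $N(A, n)$ is the number of times $A$ occurs in $n$ trials
- Classical: $P(A) = \frac{|A|}{|\Omega|}$, where samples are all equally likely
- Axiomatic model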
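The first two definitions can be compared numerically. The sketch below assumes one roll of a fair die and the event "the outcome is even" (an illustrative choice, not from the slides); the classical value is exact, and the relative frequency from a seeded simulation should approach it as $n$ grows.

```python
import random
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}               # sample space of one fair die roll (illustrative)
A = {s for s in omega if s % 2 == 0}     # event A: the outcome is even

# Classical definition: P(A) = |A| / |Omega|, valid when samples are equally likely
p_classical = Fraction(len(A), len(omega))

# Relative-frequency definition: N(A, n) / n approaches P(A) as n grows
def relative_frequency(event, n, seed=0):
    rng = random.Random(seed)
    outcomes = sorted(omega)
    hits = sum(rng.choice(outcomes) in event for _ in range(n))  # N(A, n)
    return hits / n

print(p_classical)                      # 1/2
print(relative_frequency(A, 100_000))   # close to 0.5; exact value depends on the seed
```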
Definition of Probability (3/4)
Probability Axioms (A. N. Kolmogorov)
1. For any event $A$, $P(A) \ge 0$.
2. $P(\Omega) = 1$.
3. For any countable collection $A_1, A_2, \dots$ of mutually exclusive events, $P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$.
Definition of Probability (4/4)
Properties of Probability
- $P(\emptyset) = 0$
- $P(A^c) = 1 - P(A)$
- $P(A \cup B) = P(A) + P(B) - P(A, B)$, where $P(A, B) = P(A \cap B)$
- If $A$ and $B$ are mutually exclusive, $P(A \cup B) = P(A) + P(B)$
- If $A \subseteq B$, $P(A) \le P(B)$
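These properties can be checked exactly on a small equally likely sample space. The sketch below assumes one fair die roll and two illustrative events; exact `Fraction` arithmetic avoids floating-point noise, and the set intersection `A & B` plays the role of $P(A, B)$.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die (illustrative choice)

def P(event):
    """Classical probability on an equally likely finite sample space."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # outcome is even
B = {4, 5, 6}   # outcome is greater than 3

assert P(set()) == 0                           # P(empty set) = 0
assert P(omega - A) == 1 - P(A)                # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)      # union rule, with P(A, B) = P(A & B)
assert P({1, 2} | {5}) == P({1, 2}) + P({5})   # mutually exclusive events
assert P({4}) <= P(A)                          # {4} is a subset of A
print(P(A | B))  # 2/3
```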
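Conditional Probability (1/3)
Probability of event $A$ given the occurrence of event $B$ (for $P(B) > 0$):
$P(A \mid B) = \frac{P(A, B)}{P(B)}$
so that $P(A, B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$, and, applying this repeatedly (chain rule),
$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A, B)$
Independence: if $A$ and $B$ are independent events,
$P(A \mid B) = P(A)$, $P(B \mid A) = P(B)$, and $P(A, B) = P(A)\,P(B)$
If $A$, $B$, and $C$ are mutually independent, $P(A, B, C) = P(A)\,P(B)\,P(C)$.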
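A small enumeration makes the definition, the chain rule, and independence concrete. The sketch assumes two fair dice (36 equally likely ordered outcomes); the events `A`, `B`, and `C` are illustrative choices, and `P_given` is a hypothetical helper implementing $P(A \mid B) = P(A, B)/P(B)$.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (36 equally likely samples)
omega = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

def P_given(a, b):
    """P(A | B) = P(A, B) / P(B), defined only when P(B) > 0."""
    return P(a & b) / P(b)

A = {s for s in omega if s[0] == 6}          # first die shows 6
B = {s for s in omega if s[1] % 2 == 0}      # second die is even
C = {s for s in omega if sum(s) >= 10}       # total is at least 10

# A and B involve different dice, so they are independent
assert P_given(A, B) == P(A)
assert P(A & B) == P(A) * P(B)

# A and C are dependent: knowing the first die is 6 makes a large total likelier
print(P(C), P_given(C, A))  # 1/6 1/2

# Chain rule: P(A, B, C) = P(A) P(B | A) P(C | A, B)
assert P(A & B & C) == P(A) * P_given(B, A) * P_given(C, A & B)
```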
Conditional Probability (2/3)
Properties of $P(A \mid B)$
1. For any events $A$ and $B$, $P(A \mid B) \ge 0$.
2. $P(B \mid B) = 1$.
3. If $A = A_1 \cup A_2 \cup \cdots$, where $A_1, A_2, \dots$ are mutually exclusive, $P(A \mid B) = P(A_1 \mid B) + P(A_2 \mid B) + \cdots$.
Conditional Probability (3/3)
Total Probability Law
Event space: a set $\{B_1, B_2, \dots, B_m\}$ of events that are
- mutually exclusive: $B_i \cap B_j = \emptyset$ for $i \ne j$, and
- collectively exhaustive: $B_1 \cup B_2 \cup \cdots \cup B_m = \Omega$.
For an event space $\{B_1, B_2, \dots, B_m\}$ with $P(B_i) > 0$,
$P(A) = \sum_{i=1}^{m} P(A \cap B_i) = \sum_{i=1}^{m} P(A \mid B_i)\,P(B_i)$
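As a numerical illustration of the total probability law, the sketch below uses a hypothetical event space of three machines that produce parts, with $A$ = "the part is defective". All numbers are made-up assumptions, not data from the slides.

```python
# Hypothetical event space {B1, B2, B3}: which of three machines made a part.
# All numbers are illustrative assumptions, not data from the slides.
p_B = [0.5, 0.3, 0.2]              # P(B_i): mutually exclusive, summing to 1
p_A_given_B = [0.01, 0.02, 0.05]   # P(A | B_i), where A = "the part is defective"

def total_probability(priors, likelihoods):
    """P(A) = sum_i P(A | B_i) P(B_i) over an event space."""
    assert abs(sum(priors) - 1.0) < 1e-12, "the B_i must be collectively exhaustive"
    return sum(p * l for p, l in zip(priors, likelihoods))

print(total_probability(p_B, p_A_given_B))  # about 0.021
```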
Bayes’ Theorem (1/2)
From the definition of conditional probability,
$P(C_i, A) = P(C_i \mid A)\,P(A) = P(A \mid C_i)\,P(C_i)$, so $P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{P(A)}$
If the set $\{C_1, C_2, \dots, C_m\}$ is an event space then, from the total probability law,
$P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{\sum_{j=1}^{m} P(A \mid C_j)\,P(C_j)}$
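The final formula translates directly into a function that normalizes $P(A \mid C_i)\,P(C_i)$ over the event space. The function name `posteriors` and the numbers in the usage line are hypothetical.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over an event space {C_1, ..., C_m}:
    P(C_i | A) = P(A | C_i) P(C_i) / sum_j P(A | C_j) P(C_j)."""
    joint = [l * p for p, l in zip(priors, likelihoods)]  # P(A | C_i) P(C_i)
    evidence = sum(joint)                                 # P(A), by the total probability law
    return [j / evidence for j in joint]

# Hypothetical numbers: P(C_i) and P(A | C_i) for three classes
post = posteriors([0.5, 0.3, 0.2], [0.01, 0.02, 0.05])
print([round(p, 3) for p in post])  # [0.238, 0.286, 0.476]
```

Note how the class with the smallest prior ends up most probable once its larger likelihood is taken into account.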
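Bayes’ Theorem (2/2): Posterior Probability
Example application: $A$ = cough; set of diseases $C = \{C_1 \text{ (flu)}, C_2 \text{ (hyperlipidemia)}, \dots, C_m \text{ (lung cancer)}\}$
Determine which disease a coughing patient has by comparing
$P(\text{flu} \mid \text{cough}), P(\text{hyperlipidemia} \mid \text{cough}), \dots, P(\text{lung cancer} \mid \text{cough})$
Generalization: posterior = likelihood × prior / evidence
$P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{P(A)}$
With several observed attributes $A_1, \dots, A_n$:
$P(C_i \mid A_1, A_2, \dots, A_n) \propto P(A_1, A_2, \dots, A_n \mid C_i)\,P(C_i)$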
Probability Distribution
Probability model, $P(\cdot)$: a function that assigns a probability to each sample
It can be given as a histogram, a table, or a mathematical formula.
Random Variable
A function that assigns a real value to each element of the sample space $\Omega$:
$X: s_i \mapsto x$, where $s_i \in \Omega$, $x \in \mathbb{R}$
Example: if $s_i$ = aaraaa and $X$ counts the letter a, then $X(s_i) = 5$.
Probability model for a discrete random variable: $P_K(k)$, the probability mass function (PMF)
Probability model for a continuous random variable: $f_X(x)$, the probability density function (PDF)
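The idea of a random variable as a function on the sample space, and of a PMF derived from it, can be sketched by enumeration. The sketch assumes three fair coin tosses (an illustrative choice); `X` counts heads, just as the slide's example counts the letter a.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all strings of three fair coin tosses, e.g. "HTH" (8 equally likely samples)
omega = ["".join(t) for t in product("HT", repeat=3)]

def X(s):
    """Random variable: maps a sample to a real number (here, the number of 'H')."""
    return s.count("H")

# PMF P_X(x) = P[X = x], obtained by pushing the sample probabilities through X
counts = Counter(X(s) for s in omega)
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}
print(pmf)
```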
Cumulative Distribution Function (CDF)
$F_R(r) = P[R \le r]$
PMF vs PDF
PMF: $P_K(k) = P[K = k]$
PDF: $f_X(x) = \frac{dF_X(x)}{dx}$
$P[x_1 < X \le x_2] = \int_{x_1}^{x_2} f_X(x)\,dx = F_X(x_2) - F_X(x_1)$
$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
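The PDF/CDF relations above can be verified numerically. The sketch assumes an exponential distribution with rate `lam = 2.0` (chosen only because its CDF has a closed form). It integrates the PDF with a midpoint rule and differentiates the CDF with a central difference.

```python
import math

lam = 2.0  # rate parameter of an exponential distribution (illustrative assumption)

def pdf(x):
    """f_X(x) = lam * exp(-lam x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    """F_X(x) = 1 - exp(-lam x) for x >= 0, the integral of pdf from -inf to x."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

x1, x2 = 0.25, 1.0
from_pdf = integrate(pdf, x1, x2)  # P[x1 < X <= x2] by integrating the PDF
from_cdf = cdf(x2) - cdf(x1)       # the same probability as F_X(x2) - F_X(x1)

# f_X(x) = dF_X(x)/dx, checked with a central difference at x = 0.5
h = 1e-6
deriv = (cdf(0.5 + h) - cdf(0.5 - h)) / (2 * h)
print(from_pdf, from_cdf, deriv, pdf(0.5))
```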
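Gaussian Random Variable (1/6)
The PDF of a Gaussian random variable $X$ has the form
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
where $\mu$ is the mean and $\sigma$ is the standard deviation ($\sigma > 0$).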
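The density translates directly into a small function. The name `gaussian_pdf` is hypothetical. The last line is a rough Riemann-sum check that the density integrates to about 1 over a wide interval.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

print(gaussian_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, 1/sqrt(2*pi)

# Riemann sum over [-10, 10] with step 0.001; should be very close to 1
total = sum(gaussian_pdf(-10 + k * 0.001, 0.0, 1.0) for k in range(20_001)) * 0.001
print(total)
```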
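Gaussian Random Variable (2/6)
Example #1: 10 pairs of <height, sex>: (168, m) (146, f) (173, m) (160, f) (157, m) (156, f) (163, m) (159, f) (162, m) (149, f)
MLE for the PDF of $H$ (height) for class m:
$\mu_H = (168 + 173 + 157 + 163 + 162)/5 = 164.6$
$\sigma_H = \sqrt{\frac{1}{5}\sum_{i=1}^{5} (\text{height}_i - \mu_H)^2} \approx 5.5$
$f_H(\text{height} \mid m) = \frac{1}{\sqrt{2\pi}\,(5.5)} \exp\!\left(-\frac{(\text{height} - 164.6)^2}{2\,(5.5)^2}\right)$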
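The maximum-likelihood estimates for class m can be reproduced in a few lines. The helper `gaussian_mle` is hypothetical. It uses the $1/n$ normalization that the MLE gives, so the unrounded standard deviation is about 5.46, which the slide rounds to 5.5.

```python
import math

heights_m = [168, 173, 157, 163, 162]  # male heights from Example #1

def gaussian_mle(samples):
    """Maximum-likelihood estimates: sample mean and the 1/n (biased) standard deviation."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)

mu_m, sigma_m = gaussian_mle(heights_m)
print(round(mu_m, 1), round(sigma_m, 2))  # 164.6 5.46
```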
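Gaussian Random Variable (3/6)
MLE for the PDF of $H$ (height) for class f: $\mu_H = 154.0$, $\sigma_H \approx 5.6$
$f_H(\text{height} \mid f) = \frac{1}{\sqrt{2\pi}\,(5.6)} \exp\!\left(-\frac{(\text{height} - 154.0)^2}{2\,(5.6)^2}\right)$
Classification of a person whose height is 160, with $P(m) = P(f) = 0.5$ (five samples per class):
$P(m \mid \text{height} = 160) = \frac{f_H(160 \mid m)\,P(m)}{f_H(160)} = \frac{f_H(160 \mid m)\,P(m)}{f_H(160 \mid m)\,P(m) + f_H(160 \mid f)\,P(f)} \approx 0.56$
$P(f \mid \text{height} = 160) = 1 - P(m \mid \text{height} = 160) \approx 0.44$
Classify the data as male (with a probability of about 0.56).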
[Figure: class-conditional height PDFs for f and m, with the height axis marked at 154, 160, and 164]
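The posterior computation for height 160 can be sketched as follows. It assumes the rounded parameters from the slides ($\mu_m = 164.6$, $\sigma_m = 5.5$, $\mu_f = 154.0$, $\sigma_f = 5.6$) and equal priors; `params`, `priors`, and `posterior` are hypothetical names.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

# Rounded MLE parameters from the slides and equal priors P(m) = P(f) = 0.5
params = {"m": (164.6, 5.5), "f": (154.0, 5.6)}
priors = {"m": 0.5, "f": 0.5}

def posterior(height):
    """P(class | height) via Bayes' theorem with Gaussian class-conditional PDFs."""
    joint = {c: gaussian_pdf(height, *params[c]) * priors[c] for c in params}
    evidence = sum(joint.values())  # f_H(height), by the total probability law
    return {c: j / evidence for c, j in joint.items()}

post = posterior(160)
print({c: round(p, 2) for c, p in post.items()})  # {'m': 0.56, 'f': 0.44}
```

Because the priors are equal, they cancel, and the decision depends only on which class-conditional density is larger at 160.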
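Gaussian Random Variable (4/6)
The PDF of an n-D Gaussian random vector $\mathbf{X}$ has the form
$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|C_{\mathbf{X}}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})^{T} C_{\mathbf{X}}^{-1} (\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})\right)$
where $\mathbf{x} = [x_1, \dots, x_n]^T$, $\boldsymbol{\mu}_{\mathbf{X}} = [\mu_1, \dots, \mu_n]^T$ is the mean vector, and $C_{\mathbf{X}} = [c_{ij}]$ is the covariance matrix with $c_{ij} = \mathrm{Cov}(x_i, x_j) = E(x_i x_j) - \mu_i \mu_j$.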
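A minimal NumPy sketch of the n-D density and of its MLE parameters follows. The function names are hypothetical. The covariance follows the slide's formula $c_{ij} = E(x_i x_j) - \mu_i \mu_j$ with $1/N$ averaging, and the quadratic form is computed with a linear solve rather than an explicit inverse.

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """n-D Gaussian density:
    f(x) = exp(-0.5 (x - mu)^T C^{-1} (x - mu)) / sqrt((2 pi)^n det C)."""
    x, mu, cov = np.asarray(x, float), np.asarray(mu, float), np.asarray(cov, float)
    n = x.size
    diff = x - mu
    # Solve C y = diff instead of forming C^{-1} explicitly (numerically safer)
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def mle_gaussian(samples):
    """MLE mean vector and covariance, c_ij = E(x_i x_j) - mu_i mu_j (1/N normalization)."""
    X = np.asarray(samples, float)
    mu = X.mean(axis=0)
    cov = (X.T @ X) / len(X) - np.outer(mu, mu)
    return mu, cov

mu, cov = mle_gaussian([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
print(mu, cov)
print(mvn_pdf([2.5, 2.5], mu, cov))  # about 0.159 = 1/(2*pi), since det C = 1 here
```

With a diagonal covariance matrix the density factors into a product of 1-D Gaussians, which is a quick sanity check for the implementation.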
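Gaussian Random Variable (5/6)
Example #2: UCI-Iris data
Learning phase
- MLE of the PDF of the 4-D random vector $\mathbf{X}$ for each class, using part of the data (e.g., 30 of the 50 samples of each class): $f_{\mathbf{X}}(\mathbf{x} \mid \text{Setosa})$, $f_{\mathbf{X}}(\mathbf{x} \mid \text{Versicolour})$, $f_{\mathbf{X}}(\mathbf{x} \mid \text{Virginica})$
Generalization (testing) phase
- Using a sample that was not used in learning, e.g. $\mathbf{x} = [5.1, 3.0, 4.9, 0.5]^T$, compute $P(\text{Setosa} \mid \mathbf{x})$, $P(\text{Versicolour} \mid \mathbf{x})$, $P(\text{Virginica} \mid \mathbf{x})$
- Classify $\mathbf{x}$ into the class with the maximum posterior probability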
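Gaussian Random Variable (6/6)
$f_{\mathbf{X}}(\mathbf{x}) = f_{\mathbf{X}}(\mathbf{x} \mid \text{Setosa})\,P(\text{Setosa}) + f_{\mathbf{X}}(\mathbf{x} \mid \text{Versicolour})\,P(\text{Versicolour}) + f_{\mathbf{X}}(\mathbf{x} \mid \text{Virginica})\,P(\text{Virginica})$
$P(\text{Setosa} \mid \mathbf{x}) = \frac{f_{\mathbf{X}}(\mathbf{x} \mid \text{Setosa})\,P(\text{Setosa})}{f_{\mathbf{X}}(\mathbf{x})}$
$P(\text{Versicolour} \mid \mathbf{x}) = \frac{f_{\mathbf{X}}(\mathbf{x} \mid \text{Versicolour})\,P(\text{Versicolour})}{f_{\mathbf{X}}(\mathbf{x})}$
$P(\text{Virginica} \mid \mathbf{x}) = \frac{f_{\mathbf{X}}(\mathbf{x} \mid \text{Virginica})\,P(\text{Virginica})}{f_{\mathbf{X}}(\mathbf{x})}$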
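The learning and testing phases can be sketched as a Gaussian Bayes classifier with one full covariance matrix per class. The Iris data itself is not included here, so the usage lines substitute synthetic 2-D data with 30 samples per class (an assumption). The names `fit`, `log_joint`, and `posterior` are hypothetical. Log densities are used to avoid underflow, and the normalization over classes plays the role of the denominator $f_{\mathbf{X}}(\mathbf{x})$.

```python
import numpy as np

def fit(X, y):
    """Learning phase: MLE mean vector, covariance matrix, and prior for each class."""
    X, y = np.asarray(X, float), np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False, bias=True)  # 1/N (MLE) covariance
        model[c] = (mu, cov, len(Xc) / len(X))
    return model

def log_joint(x, mu, cov, prior):
    """log f_X(x | C) + log P(C) for a multivariate Gaussian class model."""
    diff = np.asarray(x, float) - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + len(diff) * np.log(2 * np.pi)) + np.log(prior)

def posterior(model, x):
    """Testing phase: normalize f_X(x | C) P(C) over the classes (divide by f_X(x))."""
    classes = list(model)
    logs = np.array([log_joint(x, *model[c]) for c in classes])
    w = np.exp(logs - logs.max())  # subtract the max for numerical stability
    return dict(zip(classes, w / w.sum()))

# Synthetic stand-in for the Iris data: three 2-D clusters, 30 samples each
rng = np.random.default_rng(0)
centers = {"A": [0.0, 0.0], "B": [4.0, 4.0], "C": [0.0, 5.0]}
X = np.vstack([rng.normal(c, 0.7, size=(30, 2)) for c in centers.values()])
y = np.repeat(list(centers), 30)

model = fit(X, y)
post = posterior(model, [3.8, 4.1])
print(max(post, key=post.get))  # the class with the maximum posterior probability
```

For the real task, `X` would hold the four Iris attributes and `y` the three species labels. The code itself does not depend on the dimension.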
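Naïve Bayesian Decision
The accuracy of a Bayesian decision depends on how well the class-conditional probability $P(\mathbf{x} \mid C_i)$ is estimated. By the chain rule,
$P(C_i \mid \mathbf{x}) \propto P(x_1, \dots, x_n \mid C_i)\,P(C_i) = P(x_1 \mid C_i)\,P(x_2 \mid C_i, x_1)\,P(x_3 \mid C_i, x_1, x_2) \cdots P(C_i)$
and the higher-order conditional terms need far more data to estimate. Assuming that the attributes are conditionally independent given the class makes the problem tractable:
$P(C_i \mid \mathbf{x}) \propto P(x_1 \mid C_i)\,P(x_2 \mid C_i) \cdots P(x_n \mid C_i)\,P(C_i)$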
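Under the independence assumption, each attribute needs only its own 1-D model per class. The sketch below assumes Gaussian per-attribute likelihoods (one common choice; the slides do not fix the form) and sums log-probabilities instead of multiplying. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_naive_bayes(X, y):
    """Per class: prior P(C) and a 1-D Gaussian (mu, sigma) for every attribute x_j.
    Assumes each attribute varies within each class (sigma > 0)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        stats = []
        for col in zip(*rows):  # iterate attribute by attribute
            mu = sum(col) / n
            var = sum((v - mu) ** 2 for v in col) / n
            stats.append((mu, math.sqrt(var)))
        model[label] = (n / len(X), stats)
    return model

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(math.sqrt(2 * math.pi) * sigma)

def predict(model, x):
    """argmax_i  log P(C_i) + sum_j log P(x_j | C_i)  (conditional independence)."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(log_gauss(v, mu, s) for v, (mu, s) in zip(x, stats))
    return max(model, key=score)

# Hypothetical toy data with two attributes and two classes
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.9], [3.2, 1.1], [2.9, 1.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_naive_bayes(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 1.0]))  # a b
```

Compared with the full-covariance model, this needs only $2n$ parameters per class instead of a full $n \times n$ covariance matrix, at the cost of ignoring correlations between attributes.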
UCI Data
References
Textbooks
- R.D. Yates and D.J. Goodman, Probability and Stochastic Processes, 2nd ed., Wiley, 2005.
- 송홍엽, 정하봉, 확률과 랜덤변수 및 랜덤과정 (Probability, Random Variables, and Random Processes), 교보문고, 2006.
- R.E. Walpole et al., Probability and Statistics for Engineers and Scientists, 7th ed., Prentice Hall, 2002.
- W. Mendenhall, Probability and Statistics, 12th ed., Thomson Brooks/Cole, 2006.
- 신양우, 기초확률론 (Basic Probability Theory), 경문사, 2000.
- UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html