Pattern recognition binoy 05-naive bayes classifier
Naïve Bayes Classifier
Dr. Binoy B Nair
Algorithm
• A Naive Bayesian model is easy to build: it needs no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used.
• Assume that the dataset has n features; an instance is then written X = {x1, x2, …, xn}.
Naïve Bayes - Details
• Bayes classification:
$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$$
Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$.
• Naïve Bayes classification: assume that all input features are conditionally independent given the class:
$$P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C)$$
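Naïve Bayes
• NB classification rule (MAP):
• For a given $\mathbf{X} = (x_1, \ldots, x_n)$ and L classes $c_1, c_2, \ldots, c_L$, the vector X is assigned to class $c^*$ when:
$$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \neq c^*,\; c = c_1, \ldots, c_L$$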
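The classification rule can be illustrated with a minimal Python sketch. The function name `map_class` and the numbers in the usage lines are hypothetical and chosen only for illustration; the sketch assumes the per-feature conditional probabilities have already been estimated.

```python
from math import prod

def map_class(priors, likelihoods):
    """Return the class c* that maximises P(c) * prod_j P(x_j | c).

    priors:      dict mapping class label -> P(c)
    likelihoods: dict mapping class label -> list of P(x_j | c), one per feature
    """
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    return max(scores, key=scores.get), scores

# Hypothetical numbers for a two-class, two-feature problem.
priors = {"c1": 0.5, "c2": 0.5}
likelihoods = {"c1": [0.2, 0.1], "c2": [0.3, 0.4]}
best, scores = map_class(priors, likelihoods)
```

With these made-up inputs the score of c2 (0.5 x 0.3 x 0.4) exceeds that of c1 (0.5 x 0.2 x 0.1), so `best` is "c2".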
Naïve Bayes
• Algorithm: Continuous-valued Features
– The conditional probability is often modeled with the normal distribution:
$$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
$\mu_{ji}$: mean (average) of the values of feature $X_j$ over the examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the values of feature $X_j$ over the examples for which $C = c_i$
– Learning Phase: for $\mathbf{X} = (X_1, \ldots, X_n)$ and $C = c_1, \ldots, c_L$
Output: $n \times L$ normal distributions and $P(C = c_i)$, $i = 1, \ldots, L$
– Test Phase: Given an unknown instance $\mathbf{X}' = (a_1, \ldots, a_n)$
• Instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase
• Apply the MAP rule to make a decision
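A minimal Python sketch of the test phase for continuous features follows. The names `gaussian_pdf` and `classify`, and the one-feature demo parameters, are assumptions made for illustration; the sketch takes the means and standard deviations from the learning phase as given.

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density N(x; mean, std^2), used as the estimate of P(X_j = x | C = c_i)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * std)

def classify(instance, priors, params):
    """Apply the MAP rule with Gaussian class-conditional densities.

    params[c] is a list of (mean, std) pairs, one per feature.
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for a, (mu, sigma) in zip(instance, params[c]):
            score *= gaussian_pdf(a, mu, sigma)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical one-feature problem: class "a" centred at 0, class "b" at 5.
demo_priors = {"a": 0.5, "b": 0.5}
demo_params = {"a": [(0.0, 1.0)], "b": [(5.0, 1.0)]}
demo_label = classify((0.5,), demo_priors, demo_params)
```

In the demo, the value 0.5 lies much closer to the mean of class "a", so `demo_label` is "a".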
Example 3 - Naïve Bayes Classifier with Continuous Attributes
• Problem: classify whether a given person is male or female based on measured features: height, weight, and foot size.
Training
Example training set:
Sex (o/p class)   Height (ft)   Weight (lbs)   Foot size (inches)
male              6             180            12
male              5.92          190            11
male              5.58          170            12
male              5.92          165            10
female            5             100            6
female            5.5           150            8
female            5.42          130            7
female            5.75          150            9
Example 3
• Solution
• Phase 1: Training
• The classifier created from the training set under a Gaussian distribution assumption would be:
Sex      Mean (height)   Variance (height)   Mean (weight)   Variance (weight)   Mean (foot size)   Variance (foot size)
male     5.855           3.50E-02            176.25          1.23E+02            11.25              9.17E-01
female   5.4175          9.72E-02            132.5           5.58E+02            7.5                1.67E+00
The variances are sample variances (sum of squared deviations divided by N - 1, with N = 4 examples per class).
The classes are equiprobable in the dataset, so P(male) = P(female) = 0.5.
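The training phase can be reproduced with a short Python sketch. It assumes the table was built with the sample variance (dividing by N - 1), which is what `statistics.variance` computes and which matches the tabulated values; the function name `fit` is hypothetical.

```python
from statistics import mean, variance

# Training set from the example: (height ft, weight lbs, foot size inches).
training = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

def fit(training):
    """Return {class: [(mean, variance), ...]} with one pair per feature."""
    params = {}
    for label, rows in training.items():
        columns = list(zip(*rows))  # transpose rows into per-feature columns
        params[label] = [(mean(col), variance(col)) for col in columns]
    return params

params = fit(training)

# Class priors as relative frequencies of each class in the training set.
total = sum(len(rows) for rows in training.values())
priors = {label: len(rows) / total for label, rows in training.items()}
```

Here `params["male"][0]` holds the mean and variance of male height (about 5.855 and 0.0350), and both priors come out as 0.5.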
Example 3
• Phase 2: Testing
• Below is a sample X to be classified as male or female.
Sex           Height (ft)   Weight (lbs)   Foot size (inches)
To identify   6             130            8
Solution:
X = {6, 130, 8}
Given this information, we wish to determine which is greater, p(male|X) or p(female|X).
p(male|X) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence
p(female|X) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence
Example 3
• The evidence (also termed normalizing constant) may be calculated since the sum of the posteriors equals one.
• evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) + P(female)*P(height|female)*P(weight|female)*P(foot size|female)
• The evidence may be ignored for the decision, since it is a positive constant and is the same for both classes. (Normal densities are always positive.)
Example 3
• We now determine the sex of the sample.
• P(male) = 0.5
• P(height|male) = 1.5789 (A probability density greater than 1 is OK. It is the area under the bell curve that is equal to 1.)
• P(weight|male) = 5.9881e-06
• P(foot size|male) = 1.3112e-3
• numerator of p(male|X) = their product = 6.1984e-09
Example 3
• P(female) = 0.5
• P(height|female) = 2.2346e-1
• P(weight|female) = 1.6789e-2
• P(foot size|female) = 2.8669e-1
• numerator of p(female|X) = their product = 5.3778e-04
Result: Since the posterior numerator of p(female|X) is greater than the posterior numerator of p(male|X), the sample is classified as female.
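The numbers in this example can be checked with a small Python sketch. It is only a sketch: the variances are recomputed from the training rows to more digits than the rounded table shows, so the results agree with the slide values to roughly four significant digits. The helper name `gaussian_pdf` is hypothetical.

```python
import math

# (mean, variance) per feature: height, weight, foot size.
# Variances carry more digits than the rounded table (recomputed from the training rows).
params = {
    "male":   [(5.855, 0.035033), (176.25, 122.92), (11.25, 0.91667)],
    "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
}
priors = {"male": 0.5, "female": 0.5}
sample = (6.0, 130.0, 8.0)

def gaussian_pdf(x, mean, var):
    """Normal density parameterised by the variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Posterior numerators: prior times the product of per-feature densities.
numerators = {}
for label, feats in params.items():
    score = priors[label]
    for x, (mu, var) in zip(sample, feats):
        score *= gaussian_pdf(x, mu, var)
    numerators[label] = score

# The evidence is the sum of the numerators, so the posteriors sum to one.
evidence = sum(numerators.values())
posteriors = {label: n / evidence for label, n in numerators.items()}
predicted = max(numerators, key=numerators.get)
```

Dividing by the evidence does not change the decision; it only rescales both numerators, and the normalised posterior for "female" comes out very close to 1.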
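Naïve Bayes
• Algorithm: Discrete-Valued Features
– Learning Phase: Given a training set S,
For each target value $c_i$ ($c_i = c_1, \ldots, c_L$):
$\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S;
For every feature value $x_{jk}$ of each feature $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
$\hat{P}(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with examples in S;
Output: conditional probability tables; for $X_j$, $N_j \times L$ elements
– Test Phase: Given an unknown instance $\mathbf{X}' = (a_1, \ldots, a_n)$, look up the tables and assign the label $c^*$ to $\mathbf{X}'$ if
$$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c), \quad c \neq c^*,\; c = c_1, \ldots, c_L$$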
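The two phases for discrete features can be sketched in Python as follows. The function names `learn_tables` and `predict` and the toy dataset are hypothetical; probabilities are estimated by plain relative frequency with no smoothing, which is an assumption of this sketch.

```python
from collections import Counter

def learn_tables(examples):
    """Estimate P(C = c) and P(X_j = x | C = c) by relative frequency.

    examples: list of (feature_tuple, label) pairs.
    Returns (priors, tables) where tables[j][(x, c)] = P(X_j = x | C = c).
    """
    class_counts = Counter(label for _, label in examples)
    n_features = len(examples[0][0])
    joint_counts = [Counter() for _ in range(n_features)]
    for features, label in examples:
        for j, value in enumerate(features):
            joint_counts[j][(value, label)] += 1

    priors = {c: n / len(examples) for c, n in class_counts.items()}
    tables = [
        {(value, c): n / class_counts[c] for (value, c), n in counts.items()}
        for counts in joint_counts
    ]
    return priors, tables

def predict(instance, priors, tables):
    """MAP rule by table lookup; an unseen (value, class) pair counts as 0."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(instance):
            score *= tables[j].get((value, c), 0.0)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: (weather, wind) -> label.
toy = [
    (("sunny", "weak"), "yes"),
    (("sunny", "strong"), "no"),
    (("rain", "strong"), "no"),
    (("rain", "weak"), "yes"),
    (("sunny", "weak"), "yes"),
]
priors, tables = learn_tables(toy)
label = predict(("sunny", "weak"), priors, tables)
```

In the toy data "weak" wind never occurs with the label "no", so that class scores zero and `label` is "yes".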
Example
• Example: Play Tennis
Given a new instance, predict its label: x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
Example
• Learning Phase
There are four variables; for each one we calculate the conditional probability table.
Outlook       Play=Yes   Play=No
Sunny         2/9        3/5
Overcast      4/9        0/5
Rain          3/9        2/5
Temperature   Play=Yes   Play=No
Hot           2/9        2/5
Mild          4/9        2/5
Cool          3/9        1/5
Humidity      Play=Yes   Play=No
High          3/9        4/5
Normal        6/9        1/5
Wind          Play=Yes   Play=No
Strong        3/9        3/5
Weak          6/9        2/5
P(Play=Yes) = 9/14
P(Play=No) = 5/14
Example
• Test Phase
– Given a new instance, predict its label: x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase:
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
– Decision making with the MAP rule (the values below are posterior numerators, not normalized posteriors):
P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
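The lookup and decision can be verified with exact fractions. The sketch below copies only the table entries needed for x’; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction as F

# Table entries for x' copied from the learning phase.
cond = {
    "Yes": {"Outlook=Sunny": F(2, 9), "Temperature=Cool": F(3, 9),
            "Humidity=High": F(3, 9), "Wind=Strong": F(3, 9)},
    "No":  {"Outlook=Sunny": F(3, 5), "Temperature=Cool": F(1, 5),
            "Humidity=High": F(4, 5), "Wind=Strong": F(3, 5)},
}
prior = {"Yes": F(9, 14), "No": F(5, 14)}
x_new = ["Outlook=Sunny", "Temperature=Cool", "Humidity=High", "Wind=Strong"]

# Posterior numerator per class: prior times the looked-up conditionals.
scores = {}
for label in prior:
    score = prior[label]
    for item in x_new:
        score *= cond[label][item]
    scores[label] = score

decision = max(scores, key=scores.get)

# Normalising by the sum of the numerators gives proper posteriors.
total = sum(scores.values())
posterior = {label: float(s / total) for label, s in scores.items()}
```

The exact numerators are 1/189 (about 0.0053) for Yes and 18/875 (about 0.0206) for No, so `decision` is "No"; after normalisation the posterior for No is about 0.795.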
Example 2: Training dataset
age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
Data sample: X = (age<=30, income=medium, student=yes, credit_rating=fair)
Naïve Bayesian Classifier: Example 2
• Compute the prior P(Ci) for each class
P(buys_computer=“yes”) = 9/14 = 0.643
P(buys_computer=“no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class
P(age=“<=30” | buys_computer=“yes”) = 2/9 = 0.222
P(age=“<=30” | buys_computer=“no”) = 3/5 = 0.6
P(income=“medium” | buys_computer=“yes”) = 4/9 = 0.444
P(income=“medium” | buys_computer=“no”) = 2/5 = 0.4
P(student=“yes” | buys_computer=“yes”) = 6/9 = 0.667
P(student=“yes” | buys_computer=“no”) = 1/5 = 0.2
P(credit_rating=“fair” | buys_computer=“yes”) = 6/9 = 0.667
P(credit_rating=“fair” | buys_computer=“no”) = 2/5 = 0.4
• X = (age<=30, income=medium, student=yes, credit_rating=fair)
P(X|Ci):
P(X|buys_computer=“yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer=“no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci):
P(X|buys_computer=“yes”) * P(buys_computer=“yes”) = 0.028
P(X|buys_computer=“no”) * P(buys_computer=“no”) = 0.007
Since 0.028 > 0.007, X belongs to class “buys_computer=yes”.
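The counts in this example can be reproduced directly from the training rows. The sketch below assumes relative-frequency estimates without smoothing; the function name `score` is hypothetical.

```python
from collections import Counter

# Training rows: (age, income, student, credit_rating, buys_computer).
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def score(sample, rows):
    """Return {class: P(X | class) * P(class)} using relative-frequency estimates."""
    class_counts = Counter(r[-1] for r in rows)
    result = {}
    for c, n_c in class_counts.items():
        in_class = [r for r in rows if r[-1] == c]
        p = n_c / len(rows)  # prior P(class)
        for j, value in enumerate(sample):
            # P(X_j = value | class) as a fraction of the rows in this class
            p *= sum(1 for r in in_class if r[j] == value) / n_c
        result[c] = p
    return result

sample = ("<=30", "medium", "yes", "fair")
scores = score(sample, rows)
prediction = max(scores, key=scores.get)
```

The resulting scores are about 0.028 for "yes" and 0.007 for "no", so `prediction` is "yes".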
Summary
• Naïve Bayes: the conditional independence assumption
• Training is very easy and fast; it only requires considering each attribute in each class separately
• Testing is straightforward; it only requires looking up tables or calculating conditional probabilities with the estimated distributions
• A popular generative model
• Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• A good candidate for a base learner in ensemble learning
• Apart from classification, naïve Bayes can do more…
Thank You