Pattern recognition binoy 05-naive bayes classifier

Naïve Bayes Classifier

Dr. Binoy B Nair

Algorithm

• A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.

• Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used.

Assume that the dataset has n features, so that each sample is X = {x1, x2, …, xn}.

Naïve Bayes - Details

• Bayes classification:

$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$$

Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$

• Naïve Bayes classification: assumption that all input features are conditionally independent!

$$P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C)$$

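Naïve Bayes

• NB classification rule:

• For a given $\mathbf{X} = (x_1, \ldots, x_n)$ and L classes C1, C2, ..., CL, the vector X is assigned to class c* when:

$$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$$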
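The classification rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's implementation: the function name `map_class`, the `likelihood` callback interface, and the toy probabilities are all assumptions made up for this example.

```python
def map_class(x, priors, likelihood):
    """Return (c*, scores), where c* maximizes P(c) * prod_j P(x_j | c).

    priors:     dict mapping class label -> P(c)
    likelihood: callable (j, x_j, c) -> P(x_j | c)  (hypothetical interface)
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, xj in enumerate(x):
            score *= likelihood(j, xj, c)  # naive assumption: multiply per-feature terms
        scores[c] = score
    return max(scores, key=scores.get), scores


# Made-up numbers, purely for illustration (not from the slides).
toy_priors = {"c1": 0.6, "c2": 0.4}
toy_tables = {
    "c1": [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}],
    "c2": [{"a": 0.3, "b": 0.7}, {"a": 0.5, "b": 0.5}],
}
best, scores = map_class(
    ("b", "b"), toy_priors, lambda j, v, c: toy_tables[c][j][v]
)
print(best, scores)
```

With these toy tables the score for c1 is 0.6 × 0.1 × 0.8 = 0.048 and for c2 is 0.4 × 0.7 × 0.5 = 0.14, so the vector is assigned to c2 even though c1 has the larger prior.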

Naïve Bayes

• Algorithm: Continuous-valued Features
– Conditional probability often modeled with the normal distribution:

$$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$

$\mu_{ji}$: mean (average) of the values of feature $X_j$ over the examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the values of feature $X_j$ over the examples for which $C = c_i$

– Learning Phase: for $\mathbf{X} = (X_1, \ldots, X_n)$, $C = c_1, \ldots, c_L$
Output: $n \times L$ normal distributions and $P(C = c_i)$, $i = 1, \ldots, L$

– Test Phase: Given an unknown instance $\mathbf{X}' = (a_1, \ldots, a_n)$
• Instead of looking up tables, calculate conditional probabilities with all the normal distributions obtained in the learning phase
• Apply the MAP rule to make a decision
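Because the MAP rule multiplies n density values, the product can underflow when n is large. A common equivalent form (not shown on the slides) compares log scores instead. Writing $a_j$ for the j-th feature value of the unknown instance and taking the logarithm of the Gaussian model gives:

$$c^* = \arg\max_{c_i} \left[ \log P(C = c_i) - \sum_{j=1}^{n} \left( \log\!\left(\sqrt{2\pi}\,\sigma_{ji}\right) + \frac{(a_j - \mu_{ji})^2}{2\sigma_{ji}^2} \right) \right]$$

Since the logarithm is monotonic, this selects the same class as the product form.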

Example 3 - Naïve Bayes Classifier with Continuous Attributes

• Problem: classify whether a given person is male or female based on the measured features. The features are height, weight, and foot size.

Training

Example training set:

| Sex (output class) | Height (ft) | Weight (lbs) | Foot size (inches) |
|---|---|---|---|
| male | 6 | 180 | 12 |
| male | 5.92 | 190 | 11 |
| male | 5.58 | 170 | 12 |
| male | 5.92 | 165 | 10 |
| female | 5 | 100 | 6 |
| female | 5.5 | 150 | 8 |
| female | 5.42 | 130 | 7 |
| female | 5.75 | 150 | 9 |

Example 3

• Solution
• Phase 1: Training
• The classifier created from the training set using a Gaussian distribution assumption would be:

| Sex | Mean (height) | Variance (height) | Mean (weight) | Variance (weight) | Mean (foot size) | Variance (foot size) |
|---|---|---|---|---|---|---|
| male | 5.855 | 3.50E-02 | 176.25 | 1.23E+02 | 11.25 | 9.17E-01 |
| female | 5.4175 | 9.72E-02 | 132.5 | 5.58E+02 | 7.5 | 1.67E+00 |

The classes are equiprobable in the dataset, so P(male) = P(female) = 0.5.
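The training phase can be reproduced with the Python standard library. This sketch assumes the tabulated variances are unbiased sample variances (dividing by N − 1), which is what `statistics.variance` computes and what matches the values above. The variable names are illustrative.

```python
from statistics import mean, variance

# Training set from the slide: (height ft, weight lbs, foot size inches).
training = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
features = ("height", "weight", "foot size")

# Class priors estimated by relative frequency.
total = sum(len(rows) for rows in training.values())
priors = {sex: len(rows) / total for sex, rows in training.items()}

# params[sex][feature] = (mean, sample variance); variance() divides by N - 1.
params = {
    sex: {name: (mean(col), variance(col)) for name, col in zip(features, zip(*rows))}
    for sex, rows in training.items()
}

for sex, stats in params.items():
    summary = {k: (round(m, 4), f"{v:.2E}") for k, (m, v) in stats.items()}
    print(sex, priors[sex], summary)
```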

Example 3

• Phase 2: Testing
• Below is a sample X to be classified as male or female.

| Sex | Height (ft) | Weight (lbs) | Foot size (inches) |
|---|---|---|---|
| To identify | 6 | 130 | 8 |

Solution:

X = {6, 130, 8}

Given this information, we wish to determine which is greater, p(male|X) or p(female|X).

p(male|X) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence

p(female|X) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence

Example 3

• The evidence (also termed normalizing constant) may be calculated since the sum of the posteriors equals one.

• evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) + P(female)*P(height|female)*P(weight|female)*P(foot size|female)

• The evidence may be ignored when comparing the classes, since it is a positive constant that is the same for both. (Normal densities are always positive.)

Example 3

• We now determine the sex of the sample.

• P(male) = 0.5

• P(height|male) = 1.5789 (A probability density greater than 1 is OK. It is the area under the bell curve that is equal to 1.)

• P(weight|male) = 5.9881e-06

• P(foot size|male) = 1.3112e-3

• numerator of p(male|X) = their product = 6.1984e-09
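The slides list only the male-class terms. The sketch below evaluates both numerators under the same Gaussian assumption, using sample variances (N − 1 denominator) computed from the training set. The helper `gaussian_pdf` and the variable names are illustrative, not taken from the lecture.

```python
import math
from statistics import mean, variance

# Training set from the slide: (height ft, weight lbs, foot size inches).
training = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
x = (6.0, 130.0, 8.0)  # sample to classify


def gaussian_pdf(value, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-((value - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


prior = 0.5  # equiprobable classes, as stated on the slide
numerators = {}
for sex, rows in training.items():
    score = prior
    for value, col in zip(x, zip(*rows)):
        score *= gaussian_pdf(value, mean(col), variance(col))
    numerators[sex] = score

# The evidence is the sum of the numerators; dividing by it yields posteriors.
evidence = sum(numerators.values())
posteriors = {sex: num / evidence for sex, num in numerators.items()}
prediction = max(numerators, key=numerators.get)
print(numerators, posteriors, prediction)
```

Under these assumptions the female numerator comes out at roughly 5.4e-04, several orders of magnitude above the male one, so the sample would be labelled female.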

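Naïve Bayes

• Algorithm: Discrete-Valued Features
– Learning Phase: Given a training set S,

For each target value $c_i$ ($c_i = c_1, \ldots, c_L$):
$\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S;
For every feature value $x_{jk}$ of each feature $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
$\hat{P}(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with examples in S;

Output: conditional probability tables; for $X_j$, $N_j \times L$ elements

– Test Phase: Given an unknown instance $\mathbf{X}' = (a_1, \ldots, a_n)$, look up the tables to assign the label c* to X' if

$$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$$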
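The learning and test phases for discrete features amount to counting and table lookup. The following is a minimal sketch under stated assumptions: probabilities are plain relative frequencies (no smoothing), an unseen feature value gets probability 0, and the function names and the tiny fruit dataset are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_discrete_nb(samples, labels):
    """Learning phase: estimate P(C=c) and P(X_j=x | C=c) by relative frequency."""
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}
    counts = defaultdict(Counter)  # (j, c) -> Counter of feature values
    for x, c in zip(samples, labels):
        for j, value in enumerate(x):
            counts[(j, c)][value] += 1
    cond = {
        key: {v: k / class_counts[key[1]] for v, k in counter.items()}
        for key, counter in counts.items()
    }
    return priors, cond


def predict(x, priors, cond):
    """Test phase: table lookup followed by the MAP rule."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= cond[(j, c)].get(value, 0.0)  # unseen value -> probability 0
        scores[c] = score
    return max(scores, key=scores.get), scores


# Tiny made-up dataset (hypothetical, not from the slides).
samples = [("red", "round"), ("red", "long"), ("yellow", "long"), ("yellow", "long")]
labels = ["apple", "apple", "banana", "banana"]
priors, cond = fit_discrete_nb(samples, labels)
label, scores = predict(("yellow", "long"), priors, cond)
print(label, scores)
```

The zero assigned to unseen values means a single missing count eliminates a class entirely; practical implementations usually add a smoothing term, which the slides do not cover.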

Example

• Example: Play Tennis

Given a new instance x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), predict its label.

Example

• Learning Phase

| Outlook | Play=Yes | Play=No |
|---|---|---|
| Sunny | 2/9 | 3/5 |
| Overcast | 4/9 | 0/5 |
| Rain | 3/9 | 2/5 |

| Temperature | Play=Yes | Play=No |
|---|---|---|
| Hot | 2/9 | 2/5 |
| Mild | 4/9 | 2/5 |
| Cool | 3/9 | 1/5 |

| Humidity | Play=Yes | Play=No |
|---|---|---|
| High | 3/9 | 4/5 |
| Normal | 6/9 | 1/5 |

| Wind | Play=Yes | Play=No |
|---|---|---|
| Strong | 3/9 | 3/5 |
| Weak | 6/9 | 2/5 |

P(Play=Yes) = 9/14
P(Play=No) = 5/14

We have four variables; for each one we calculate the conditional probability table.
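The test phase for the instance (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong) is a table lookup followed by the MAP rule. This sketch copies the relevant entries from the conditional probability tables and uses exact fractions. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction as F

# Entries copied from the conditional probability tables for the query instance.
cond = {
    "Yes": {"Outlook=Sunny": F(2, 9), "Temperature=Cool": F(3, 9),
            "Humidity=High": F(3, 9), "Wind=Strong": F(3, 9)},
    "No":  {"Outlook=Sunny": F(3, 5), "Temperature=Cool": F(1, 5),
            "Humidity=High": F(4, 5), "Wind=Strong": F(3, 5)},
}
priors = {"Yes": F(9, 14), "No": F(5, 14)}

# MAP rule: prior times the product of the looked-up conditionals.
scores = {}
for label, table in cond.items():
    score = priors[label]
    for p in table.values():
        score *= p
    scores[label] = score

prediction = max(scores, key=scores.get)
for label, score in scores.items():
    print(f"P(x'|{label}) P({label}) = {float(score):.4f}")
print("Predicted Play =", prediction)
```

The scores are 1/189 ≈ 0.0053 for Play=Yes and 18/875 ≈ 0.0206 for Play=No, so the instance is labelled Play=No.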

Example 2: Training dataset

| age | income | student | credit_rating | buys_computer |
|---|---|---|---|---|
| <=30 | high | no | fair | no |
| <=30 | high | no | excellent | no |
| 31…40 | high | no | fair | yes |
| >40 | medium | no | fair | yes |
| >40 | low | yes | fair | yes |
| >40 | low | yes | excellent | no |
| 31…40 | low | yes | excellent | yes |
| <=30 | medium | no | fair | no |
| <=30 | low | yes | fair | yes |
| >40 | medium | yes | fair | yes |
| <=30 | medium | yes | excellent | yes |
| 31…40 | medium | no | excellent | yes |
| 31…40 | high | yes | fair | yes |
| >40 | medium | no | excellent | no |

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data sample: X = (age<=30, income=medium, student=yes, credit_rating=fair)

Naïve Bayesian Classifier: Example 2

• Class priors

P(buys_computer=“yes”) = 9/14

P(buys_computer=“no”) = 5/14

• Compute P(X|Ci) for each class

P(age=“<=30” | buys_computer=“yes”) = 2/9 = 0.222

P(age=“<=30” | buys_computer=“no”) = 3/5 = 0.6

P(income=“medium” | buys_computer=“yes”) = 4/9 = 0.444

P(income=“medium” | buys_computer=“no”) = 2/5 = 0.4

P(student=“yes” | buys_computer=“yes”) = 6/9 = 0.667

P(student=“yes” | buys_computer=“no”) = 1/5 = 0.2

P(credit_rating=“fair” | buys_computer=“yes”) = 6/9 = 0.667

P(credit_rating=“fair” | buys_computer=“no”) = 2/5 = 0.4

• X = (age<=30, income=medium, student=yes, credit_rating=fair)

P(X|Ci):

P(X|buys_computer=“yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044

P(X|buys_computer=“no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|Ci) * P(Ci):

P(X|buys_computer=“yes”) * P(buys_computer=“yes”) = 0.028

P(X|buys_computer=“no”) * P(buys_computer=“no”) = 0.007

Since 0.028 > 0.007, X belongs to class “buys_computer=yes”.
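The hand calculation for Example 2 can be checked by counting directly in the 14-row training set. This is a minimal sketch using relative frequencies only. The tuple layout and variable names are illustrative, and the age band of the third row is assumed to be 31…40, as in the other rows of that band.

```python
# Rows: (age, income, student, credit_rating, buys_computer).
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")  # data sample X

scores = {}
for label in ("yes", "no"):
    in_class = [r for r in rows if r[-1] == label]
    prior = len(in_class) / len(rows)
    likelihood = 1.0
    for j, value in enumerate(x):
        # Relative frequency of this feature value within the class.
        likelihood *= sum(r[j] == value for r in in_class) / len(in_class)
    scores[label] = (likelihood, likelihood * prior)

prediction = max(scores, key=lambda c: scores[c][1])
for label, (lik, post) in scores.items():
    print(f"P(X|{label}) = {lik:.3f}, P(X|{label}) P({label}) = {post:.3f}")
print("Predicted buys_computer =", prediction)
```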

Summary

• Naïve Bayes: the conditional independence assumption

• Training is very easy and fast: it only requires considering each attribute in each class separately

• Testing is straightforward: just look up tables or calculate conditional probabilities with the estimated distributions

• A popular generative model

• Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated

• Many successful applications, e.g., spam mail filtering

• A good candidate for a base learner in ensemble learning

• Apart from classification, naïve Bayes can do more…

Thank You
