What we will cover here
• What is a classifier
• The difference between learning/training and classifying
• Math reminder for Naïve Bayes
• Tennis example with naïve Bayes
• What may be wrong with your Bayes classifier
Classifier?
QUIZ: Probability Basics
• Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side "3", (B) die 2 lands on side "1", and (C) the two dice sum to eight. Answer the following questions:
1) P(A)?
2) P(B)?
3) P(C)?
4) P(A|B)?
5) P(C|A)?
6) P(A,B)?
7) P(A,C)?
8) Is P(A,C) equal to P(A)P(C)?
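All eight quiz questions can be answered by brute-force enumeration of the 36 equally likely outcomes of the two dice. A minimal sketch (the helper name `prob` is illustrative, not from the lecture):

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of an event, given as a predicate over one outcome."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

A = lambda o: o[0] == 3           # (A) die 1 lands on "3"
B = lambda o: o[1] == 1           # (B) die 2 lands on "1"
C = lambda o: o[0] + o[1] == 8    # (C) the two dice sum to eight

P_A, P_B, P_C = prob(A), prob(B), prob(C)
P_AB = prob(lambda o: A(o) and B(o))
P_AC = prob(lambda o: A(o) and C(o))
P_A_given_B = P_AB / P_B          # P(A|B) by the definition of conditional probability
P_C_given_A = P_AC / P_A          # P(C|A)

print(P_A, P_B, P_C)              # 1/6, 1/6, 5/36
print(P_AB, P_AC)                 # 1/36, 1/36
print(P_A_given_B, P_C_given_A)   # 1/6, 1/6
print(P_AC == P_A * P_C)          # False: A and C are NOT independent
```

Note question 8: P(A,C) = 1/36 while P(A)P(C) = 5/216, so A and C are not independent, even though P(C|A) = 1/6 happens to equal P(B).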
Outline
• Background
• Probability Basics
• Probabilistic Classification
• Naïve Bayes
• Example: Play Tennis
• Relevant Issues
• Conclusions
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model: P(C|X), C = c_1, …, c_L, X = (X_1, …, X_n)

A discriminative probabilistic classifier takes the input vector x = (x_1, x_2, …, x_n) and outputs the posterior probabilities P(c_1|x), P(c_2|x), …, P(c_L|x), one per class.

What is a discriminative probabilistic classifier?
Example:
• C1 – benign mole
• C2 – cancer
Probabilistic Classification
Vectors of random variables
• Establishing a probabilistic model for classification (cont.)
– Generative model: P(X|C), C = c_1, …, c_L, X = (X_1, …, X_n)

There is one generative probabilistic model per class; each takes the input vector x = (x_1, x_2, …, x_n) and outputs the class-conditional likelihood:
• Generative probabilistic model for class 1: P(x|c_1)
• Generative probabilistic model for class 2: P(x|c_2)
• …
• Generative probabilistic model for class L: P(x|c_L)
Probability that this fruit is an apple
Probability that this fruit is an orange
Background: methods to create classifiers
• There are three methods to establish a classifier:
a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class memberships given input data
   Example: perceptron with the cross-entropy cost
c) Make a probabilistic model of the data within each class
   Examples: naïve Bayes, model-based classifiers
• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification
GOOD NEWS: You can create your own hardware/software classifiers!
LAST LECTURE REMINDER: Probability Basics
• We defined prior, conditional and joint probability for random variables
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)
• Bayesian rule:
  P(C|X) = P(X|C)P(C) / P(X)
  i.e., Posterior = (Likelihood × Prior) / Evidence
Method: Probabilistic Classification with MAP
• MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*, c = c_1, …, c_L

• Method of generative classification with the MAP rule:
1. Apply the Bayesian rule to convert the class-conditional likelihoods into posterior probabilities:
   P(C = c_i|X = x) = P(X = x|C = c_i)P(C = c_i) / P(X = x)
                    ∝ P(X = x|C = c_i)P(C = c_i)   for i = 1, 2, …, L
2. Then apply the MAP rule
We use this rule in many applications
Naïve Bayes
• Bayes classification:
  P(C|X) ∝ P(X|C)P(C) = P(X1, …, Xn|C)P(C)
  Difficulty: learning the joint probability P(X1, …, Xn|C)

• Naïve Bayes classification
– Assumption: all input attributes are conditionally independent given the class!
  P(X1, X2, …, Xn|C) = P(X1|X2, …, Xn, C)P(X2, …, Xn|C)
                     = P(X1|C)P(X2, …, Xn|C)
                     = P(X1|C)P(X2|C) ··· P(Xn|C)
– MAP classification rule: for x = (x_1, x_2, …, x_n), assign x to c* if
  [P(x_1|c*) ··· P(x_n|c*)]P(c*) > [P(x_1|c) ··· P(x_n|c)]P(c), c ≠ c*, c = c_1, …, c_L

For a class, the previous generative model can be decomposed into n generative models, one per single input attribute.
Product of individual probabilities.
Naïve Bayes Algorithm
• The Naïve Bayes algorithm (for discrete input attributes) has two phases
– 1. Learning phase: Given a training set S,
  For each target value c_i (i = 1, …, L):
    estimate P̂(C = c_i) with the fraction of examples in S with C = c_i;
  For every attribute value x_jk of each attribute X_j (j = 1, …, n; k = 1, …, N_j):
    estimate P̂(X_j = x_jk|C = c_i) with the fraction of examples in S with X_j = x_jk and C = c_i.
  Output: conditional probability tables; for X_j, a table with N_j × L elements.
– 2. Test phase: Given an unknown instance X' = (a'_1, …, a'_n),
  look up the tables to assign the label c* to X' if
  [P̂(a'_1|c*) ··· P̂(a'_n|c*)]P̂(c*) > [P̂(a'_1|c) ··· P̂(a'_n|c)]P̂(c), c ≠ c*, c = c_1, …, c_L
Classification is easy, just multiply probabilities
Learning is easy, just create probability tables.
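The two phases above can be sketched in a few lines of Python. This is a minimal illustration for discrete attributes, not production code; the function names `learn` and `classify` are my own, not from the lecture:

```python
from collections import Counter, defaultdict

def learn(examples, labels):
    """Learning phase: estimate P(C) and the per-attribute
    conditional probability tables P(X_j = x | C) by counting."""
    n = len(labels)
    class_counts = Counter(labels)
    prior = {c: class_counts[c] / n for c in class_counts}
    # cond[j][(v, c)] = number of examples with attribute j == v and class c
    cond = defaultdict(Counter)
    for x, c in zip(examples, labels):
        for j, v in enumerate(x):
            cond[j][(v, c)] += 1
    def likelihood(j, v, c):
        return cond[j][(v, c)] / class_counts[c]
    return prior, likelihood

def classify(x_new, prior, likelihood):
    """Test phase: MAP rule -- pick the class c maximizing
    P(c) * prod_j P(x_j | c)."""
    best_c, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x_new):
            score *= likelihood(j, v, c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

As the slide says, learning is just building count tables and classification is just multiplying the looked-up probabilities.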
The learning phase for the tennis example

Outlook    Play=Yes  Play=No
Sunny      2/9       3/5
Overcast   4/9       0/5
Rain       3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity   Play=Yes  Play=No
High       3/9       4/5
Normal     6/9       1/5

Wind       Play=Yes  Play=No
Strong     3/9       3/5
Weak       6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14

We have four variables; for each we calculate the conditional probability table.
Formulation of a Classification Problem
• Given the data as found on the last slide:
• Find, for a new point in space (a vector of values), the group to which it belongs (classify it)
The test phase for the tennis example

• Test phase
– Given a new instance of variable values, x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Given the calculated look-up tables
– Use the MAP rule to decide Yes or No

P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14

P(Yes|x') ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x') ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206

Given that P(Yes|x') < P(No|x'), we label x' as "No".
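The arithmetic above can be verified with exact fractions; a quick sketch:

```python
from fractions import Fraction as F

# Likelihoods and priors looked up from the learned tennis tables.
p_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)   # [P(x'|Yes)] P(Play=Yes)
p_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)   # [P(x'|No)]  P(Play=No)

print(float(p_yes))   # ~0.0053
print(float(p_no))    # ~0.0206
print(p_yes < p_no)   # True -> label x' as "No"
```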
Example: software exists
Issues Relevant to Naïve Bayes
1. Violation of the independence assumption
2. Zero conditional probability problem
Issues Relevant to Naïve Bayes

1. Violation of the independence assumption
• For many real-world tasks, P(X1, …, Xn|C) ≠ P(X1|C) ··· P(Xn|C), because the events (attributes) are correlated
• Nevertheless, naïve Bayes works surprisingly well anyway!
First Issue
Issues Relevant to Naïve Bayes

2. Zero conditional probability problem
– The problem arises when no training example of class c_i contains the attribute value X_j = a_jk, so that P̂(X_j = a_jk|C = c_i) = 0
– In this circumstance, during test the whole product collapses to zero:
  P̂(x_1|c_i) ··· P̂(a_jk|c_i) ··· P̂(x_n|c_i) = 0
– As a remedy, conditional probabilities are estimated with
  P̂(X_j = a_jk|C = c_i) = (n_c + m·p) / (n + m)
  n   : number of training examples for which C = c_i
  n_c : number of training examples for which X_j = a_jk and C = c_i
  p   : prior estimate (usually p = 1/t for t possible values of X_j)
  m   : weight given to the prior (number of "virtual" examples, m ≥ 1)
Second Issue
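The smoothed estimate above is a one-liner; a small sketch (the function name `m_estimate` is my own):

```python
from fractions import Fraction

def m_estimate(n_c, n, p, m):
    """Smoothed estimate of P(X_j = a_jk | C = c_i):
    n_c : examples with X_j = a_jk and C = c_i,
    n   : examples with C = c_i,
    p   : prior estimate (usually 1/t for t possible attribute values),
    m   : weight given to the prior ("virtual" examples)."""
    return (Fraction(n_c) + m * p) / (n + m)

# Outlook=Overcast given Play=No was 0/5 in the tennis tables;
# with t = 3 outlook values (p = 1/3) and m = 3 virtual examples:
print(m_estimate(0, 5, Fraction(1, 3), 3))   # 1/8 instead of 0
```

The zero entry no longer wipes out the whole product during the test phase.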
Another Problem: Continuous-valued Input Attributes

• What to do in such a case?
– The attribute takes on numberless (continuous) values
– The conditional probability is then modeled with the normal distribution:
  P̂(X_j|C = c_i) = 1/(√(2π)·σ_ji) · exp(−(X_j − μ_ji)² / (2σ_ji²))
  μ_ji : mean (average) of the attribute values X_j of the examples for which C = c_i
  σ_ji : standard deviation of the attribute values X_j of the examples for which C = c_i
– Learning phase: for X = (X1, …, Xn), C = c_1, …, c_L
  Output: n × L normal distributions and P(C = c_i), i = 1, …, L
– Test phase: for X' = (X'1, …, X'n),
  1. Calculate the conditional probabilities with all the normal distributions
  2. Apply the MAP rule to make a decision
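The normal density used for continuous attributes can be sketched directly from the formula above (the function name is illustrative):

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """P̂(X_j = x | C = c_i) under a normal model with the per-class
    mean mu_ji and standard deviation sigma_ji estimated in learning."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```

In the test phase, these densities simply replace the table look-ups in the product that the MAP rule compares across classes.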
Conclusion on classifiers

• Naïve Bayes is based on the independence assumption
– Training is very easy and fast; it just requires considering each attribute in each class separately
– Testing is straightforward; it is just looking up tables or calculating conditional probabilities with normal distributions

• Naïve Bayes is a popular generative classifier model
1. The performance of naïve Bayes is competitive with most state-of-the-art classifiers, even in the presence of violations of the independence assumption
2. It has many successful applications, e.g., spam mail filtering
3. It is a good candidate for a base learner in ensemble learning
4. Apart from classification, naïve Bayes can do more…