Bayesian Decision Theory · hic/CS7616/pdf/lecture2.pdf · 2016-01-19
Bayesian Decision Theory
Chapter 2 (Duda, Hart & Stork)
CS7616 - Pattern Recognition
Henrik I Christensen, Georgia Tech.

Bayesian Decision Theory
• Design classifiers to recommend decisions that minimize some total expected "risk".
  – The simplest risk is the classification error (i.e., costs are equal).
  – Typically, the risk includes the cost associated with different decisions.

Terminology
• State of nature ω (random variable):
  – e.g., ω1 for sea bass, ω2 for salmon
• Probabilities P(ω1) and P(ω2) (priors):
  – e.g., prior knowledge of how likely it is to get a sea bass or a salmon
• Probability density function p(x) (evidence):
  – e.g., how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness)

Terminology (cont'd)
• Conditional probability density p(x/ωj) (likelihood):
  – e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj
  – e.g., lightness distributions of the salmon and sea bass populations

Terminology (cont'd)
• Conditional probability P(ωj/x) (posterior):
  – e.g., the probability that the fish belongs to class ωj given measurement x.

Decision Rule Using Prior Probabilities
Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

$$P(error) = \begin{cases} P(\omega_1) & \text{if we decide } \omega_2 \\ P(\omega_2) & \text{if we decide } \omega_1 \end{cases}$$

or P(error) = min[P(ω1), P(ω2)]
• Favours the most likely class.
• This rule makes the same decision every time.
  – i.e., optimum if no other information is available

Decision Rule Using Conditional Probabilities
• Using Bayes' rule, the posterior probability of category ωj given measurement x is given by:

$$P(\omega_j / x) = \frac{p(x / \omega_j)\, P(\omega_j)}{p(x)} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

where (i.e., scale factor – sum of probs = 1)

$$p(x) = \sum_{j=1}^{2} p(x / \omega_j)\, P(\omega_j)$$

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2
or
Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2
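The rule above can be sketched in a few lines of Python. This is only an illustration: the function names `posteriors` and `decide` and the likelihood values are hypothetical, and only the priors 2/3 and 1/3 appear in the slides.

```python
def posteriors(likelihoods, priors):
    """Return P(w_j / x) for every class from p(x / w_j) and P(w_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the scale factor that makes the posteriors sum to 1
    return [j / evidence for j in joint]


def decide(likelihoods, priors):
    """Index of the class with the largest posterior (0 stands for w_1)."""
    post = posteriors(likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical likelihood values p(x/w_1), p(x/w_2) at one measurement x.
post = posteriors([0.6, 0.3], [2 / 3, 1 / 3])
choice = decide([0.6, 0.3], [2 / 3, 1 / 3])
print(post, choice)
```

Because the evidence p(x) is the same for every class, comparing the products p(x/ωj)P(ωj) gives the same decision as comparing the posteriors.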

Decision Rule Using Conditional pdf (cont'd)
[Figure: class-conditional densities p(x/ωj) and the resulting posteriors P(ωj/x) for priors P(ω1) = 2/3 and P(ω2) = 1/3]

Probability of Error
• The probability of error is defined as:

$$P(error / x) = \begin{cases} P(\omega_1 / x) & \text{if we decide } \omega_2 \\ P(\omega_2 / x) & \text{if we decide } \omega_1 \end{cases}$$

or P(error/x) = min[P(ω1/x), P(ω2/x)]
• What is the average probability error?

$$P(error) = \int_{-\infty}^{\infty} P(error, x)\, dx = \int_{-\infty}^{\infty} P(error / x)\, p(x)\, dx$$

• The Bayes rule is optimum, that is, it minimizes the average probability error!
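As a rough numerical check of the average error integral, the sketch below assumes two one-dimensional Gaussian class-conditional densities (an assumption made here for illustration, with hypothetical parameters) and integrates min[p(x/ω1)P(ω1), p(x/ω2)P(ω2)], which equals P(error/x)p(x) under the Bayes rule.

```python
import math


def gauss(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def bayes_error(mu1, s1, p1, mu2, s2, p2, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule estimate of the integral of min[p(x/w1)P(w1), p(x/w2)P(w2)]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += min(gauss(x, mu1, s1) * p1, gauss(x, mu2, s2) * p2)
    return total * h


# Hypothetical classes: means 0 and 2, unit variance, equal priors.
err = bayes_error(0.0, 1.0, 0.5, 2.0, 1.0, 0.5)
print(err)
```

For this symmetric case the boundary sits at x = 1, so the estimate can be compared with the Gaussian tail probability beyond one standard deviation.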

Where do Probabilities Come From?
• There are two competing answers to this question:
(1) Relative frequency (objective) approach.
  – Probabilities can only come from experiments.
(2) Bayesian (subjective) approach.
  – Probabilities may reflect degree of belief and can be based on opinion.

Example (objective approach)
• Classify cars by whether they cost more or less than $50K:
  – Classes: C1 if price > $50K, C2 if price <= $50K
  – Features: x, the height of a car
• Use Bayes' rule to compute the posterior probabilities:

$$P(C_i / x) = \frac{p(x / C_i)\, P(C_i)}{p(x)}$$

• We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)

Example (cont'd)
• Collect data
  – Ask drivers how much their car cost and measure its height.
• Determine prior probabilities P(C1), P(C2)
  – e.g., 1209 samples: #C1 = 221, #C2 = 988

$$P(C_1) = \frac{221}{1209} = 0.183, \qquad P(C_2) = \frac{988}{1209} = 0.817$$

Example (cont'd)
• Determine class conditional probabilities (likelihood)
  – Discretize car height into bins and use a normalized histogram
[Figure: normalized histograms of p(x/Ci)]

Example (cont'd)
• Calculate the posterior probability for each bin:

$$P(C_1 / x = 1.0) = \frac{p(x = 1.0 / C_1)\, P(C_1)}{p(x = 1.0 / C_1)\, P(C_1) + p(x = 1.0 / C_2)\, P(C_2)} = \frac{0.2081 \times 0.183}{0.2081 \times 0.183 + 0.0597 \times 0.817} = 0.438$$

[Figure: posterior P(Ci/x) for each bin]
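The arithmetic of the car example can be reproduced directly from the sample counts and the two histogram values quoted above. The variable names are illustrative; the numbers are the ones given in the slides.

```python
# Sample counts from the example: 1209 cars in total.
n_c1, n_c2 = 221, 988
total = n_c1 + n_c2

# Priors estimated as relative frequencies.
prior_c1 = n_c1 / total
prior_c2 = n_c2 / total

# Histogram values for the bin x = 1.0, as quoted in the example.
lik_c1, lik_c2 = 0.2081, 0.0597

# Bayes' rule: posterior = likelihood * prior / evidence.
evidence = lik_c1 * prior_c1 + lik_c2 * prior_c2
post_c1 = lik_c1 * prior_c1 / evidence
post_c2 = lik_c2 * prior_c2 / evidence

print(round(prior_c1, 3), round(prior_c2, 3), round(post_c1, 3))
```

Even though an expensive car is much rarer a priori, the likelihood at this bin raises its posterior from about 0.18 to about 0.44.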

A More General Theory
• Use more than one feature.
• Allow more than two categories.
• Allow actions other than classifying the input to one of the possible categories (e.g., rejection).
• Employ a more general error function (i.e., "risk" function) by associating a "cost" ("loss" function) with each error (i.e., wrong action).

Terminology
• Features form a vector $\mathbf{x} \in R^d$
• A finite set of c categories ω1, ω2, …, ωc
• Bayes rule (i.e., using vector notation):

$$P(\omega_j / \mathbf{x}) = \frac{p(\mathbf{x} / \omega_j)\, P(\omega_j)}{p(\mathbf{x})} \quad \text{where} \quad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} / \omega_j)\, P(\omega_j)$$

• A finite set of l actions α1, α2, …, αl
• A loss function λ(αi/ωj)
  – the cost associated with taking action αi when the correct classification category is ωj

Conditional Risk (or Expected Loss)
• Suppose we observe x and take action αi
• Suppose that the cost associated with taking action αi with ωj being the correct category is λ(αi/ωj)
• The conditional risk (or expected loss) of taking action αi is:

$$R(\alpha_i / \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i / \omega_j)\, P(\omega_j / \mathbf{x})$$

Overall Risk
• Suppose α(x) is a general decision rule that determines which action α1, α2, …, αl to take for every x; then the overall risk is defined as:

$$R = \int R(\alpha(\mathbf{x}) / \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

• The optimum decision rule is the Bayes rule

Overall Risk (cont'd)
• The Bayes decision rule minimizes R by:
  (i) Computing R(αi/x) for every αi given an x
  (ii) Choosing the action αi with the minimum R(αi/x)
• The resulting minimum overall risk is called Bayes risk and is the best (i.e., optimum) performance that can be achieved:

$$R^* = \min R$$
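Steps (i) and (ii) can be sketched as follows. The loss matrix and posterior values are hypothetical and chosen only to show that the minimum-risk action need not be the most probable class; `loss[i][j]` stands for λ(αi/ωj).

```python
def conditional_risks(loss, post):
    """R(a_i / x) = sum_j loss[i][j] * P(w_j / x), for every action a_i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]


def bayes_action(loss, post):
    """Return the index of the minimum-risk action together with all risks."""
    risks = conditional_risks(loss, post)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# Hypothetical losses: deciding w_1 when the truth is w_2 costs 5,
# deciding w_2 when the truth is w_1 costs 1, correct decisions cost 0.
loss = [[0.0, 5.0],
        [1.0, 0.0]]
post = [0.7, 0.3]  # hypothetical posteriors P(w_1/x), P(w_2/x)

action, risks = bayes_action(loss, post)
print(action, risks)
```

Here ω1 has the larger posterior, yet action α2 (index 1) has the lower conditional risk because the error it avoids is five times as costly.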

Example: Two-category classification
• Define
  – α1: decide ω1
  – α2: decide ω2
  – λij = λ(αi/ωj)
• The conditional risks are:

$$R(\alpha_i / \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i / \omega_j)\, P(\omega_j / \mathbf{x}) \qquad (c = 2)$$

$$R(\alpha_1 / \mathbf{x}) = \lambda_{11} P(\omega_1 / \mathbf{x}) + \lambda_{12} P(\omega_2 / \mathbf{x})$$

$$R(\alpha_2 / \mathbf{x}) = \lambda_{21} P(\omega_1 / \mathbf{x}) + \lambda_{22} P(\omega_2 / \mathbf{x})$$

Example: Two-category classification (cont'd)
• Minimum risk decision rule:

Decide ω1 if R(α1/x) < R(α2/x); otherwise decide ω2

or

Decide ω1 if (λ21 − λ11) P(ω1/x) > (λ12 − λ22) P(ω2/x); otherwise decide ω2

or (i.e., using likelihood ratio)

$$\text{Decide } \omega_1 \text{ if } \underbrace{\frac{p(\mathbf{x} / \omega_1)}{p(\mathbf{x} / \omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}}_{\text{threshold}}; \text{ otherwise decide } \omega_2$$

Special Case: Zero-One Loss Function
• Assign the same loss to all errors:

$$\lambda(\alpha_i / \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$

• The conditional risk corresponding to this loss function:

$$R(\alpha_i / \mathbf{x}) = \sum_{j \neq i} P(\omega_j / \mathbf{x}) = 1 - P(\omega_i / \mathbf{x})$$

Special Case: Zero-One Loss Function (cont'd)
• The decision rule becomes:

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2

or

Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2

or

Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2

• In this case, the overall risk is the average probability error!

Example
Assuming zero-one loss:
Decide ω1 if p(x/ω1)/p(x/ω2) > θa; otherwise decide ω2, where

$$\theta_a = \frac{P(\omega_2)}{P(\omega_1)}$$

Assuming general loss:
Decide ω1 if p(x/ω1)/p(x/ω2) > θb; otherwise decide ω2, where

$$\theta_b = \frac{P(\omega_2)\,(\lambda_{12} - \lambda_{22})}{P(\omega_1)\,(\lambda_{21} - \lambda_{11})}$$

assume: λ12 > λ21
[Figure: decision regions for the thresholds θa and θb]
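The two thresholds can be compared with a small sketch. The priors, losses, and likelihood values below are hypothetical; the functions only restate θa and θb as defined in this example.

```python
def threshold_zero_one(p1, p2):
    """theta_a = P(w_2) / P(w_1)."""
    return p2 / p1


def threshold_general(p1, p2, l11, l12, l21, l22):
    """theta_b = P(w_2)(l12 - l22) / (P(w_1)(l21 - l11))."""
    return (p2 * (l12 - l22)) / (p1 * (l21 - l11))


def decide(lik1, lik2, theta):
    """Return 1 to decide w_1 when the likelihood ratio exceeds theta, else 2."""
    return 1 if lik1 / lik2 > theta else 2


# Hypothetical setting: P(w_1) = 2/3, P(w_2) = 1/3, and l12 = 2 > l21 = 1.
theta_a = threshold_zero_one(2 / 3, 1 / 3)
theta_b = threshold_general(2 / 3, 1 / 3, 0.0, 2.0, 1.0, 0.0)

# A measurement whose likelihood ratio is 0.8 (hypothetical values).
print(theta_a, decide(0.8, 1.0, theta_a))
print(theta_b, decide(0.8, 1.0, theta_b))
```

With λ12 > λ21 the threshold rises, so the region in which ω1 is chosen shrinks: the same measurement is assigned to ω1 under zero-one loss and to ω2 under the general loss.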

Discriminant Functions
• A useful way to represent classifiers is through discriminant functions gi(x), i = 1, ..., c, where a feature vector x is assigned to class ωi if:
gi(x) > gj(x) for all j ≠ i

Discriminants for Bayes Classifier
• Assuming a general loss function:
gi(x) = −R(αi/x)
• Assuming the zero-one loss function:
gi(x) = P(ωi/x)

Discriminants for Bayes Classifier (cont'd)
• Is the choice of gi unique?
  – Replacing gi(x) with f(gi(x)), where f() is monotonically increasing, does not change the classification results.

$$g_i(\mathbf{x}) = P(\omega_i / \mathbf{x})$$

$$g_i(\mathbf{x}) = \frac{p(\mathbf{x} / \omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

$$g_i(\mathbf{x}) = p(\mathbf{x} / \omega_i)\, P(\omega_i)$$

$$g_i(\mathbf{x}) = \ln p(\mathbf{x} / \omega_i) + \ln P(\omega_i)$$

We'll use the last form extensively!
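A quick way to see that these forms are interchangeable is to evaluate all of them at one point and compare the winning class. The likelihood and prior values are hypothetical.

```python
import math


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical p(x/w_i) and P(w_i) for three classes at a single point x.
liks = [0.05, 0.20, 0.15]
priors = [0.6, 0.1, 0.3]

evidence = sum(l * p for l, p in zip(liks, priors))

g_posterior = [l * p / evidence for l, p in zip(liks, priors)]     # P(w_i/x)
g_joint = [l * p for l, p in zip(liks, priors)]                    # p(x/w_i)P(w_i)
g_log = [math.log(l) + math.log(p) for l, p in zip(liks, priors)]  # log form

print(argmax(g_posterior), argmax(g_joint), argmax(g_log))
```

Dividing by the positive constant p(x) and taking the logarithm are both monotonically increasing transformations, so the three discriminants pick the same class.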

Case of two categories
• More common to use a single discriminant function (dichotomizer) instead of two:

$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x}); \quad \text{decide } \omega_1 \text{ if } g(\mathbf{x}) > 0, \text{ otherwise decide } \omega_2$$

• Examples:

$$g(\mathbf{x}) = P(\omega_1 / \mathbf{x}) - P(\omega_2 / \mathbf{x})$$

$$g(\mathbf{x}) = \ln \frac{p(\mathbf{x} / \omega_1)}{p(\mathbf{x} / \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$

Decision Regions and Boundaries
• Decision rules divide the feature space into decision regions R1, R2, …, Rc, separated by decision boundaries.
• For two categories, the decision boundary is defined by:
g1(x) = g2(x)

Discriminant Function for Multivariate Gaussian Density
• Consider the following discriminant function:

$$g_i(\mathbf{x}) = \ln p(\mathbf{x} / \omega_i) + \ln P(\omega_i)$$

• If p(x/ωi) is the multivariate Gaussian N(μi, Σi), then:

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma_i^{-1} (\mathbf{x} - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$$

Multivariate Gaussian Density: Case I
• Σi = σ²I (diagonal)
  – Features are statistically independent
  – Each feature has the same variance
• Dropping the terms that are the same for every class, the discriminant becomes:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)$$

  – The term ln P(ωi) favours the a-priori more likely category
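
Multivariate Gaussian Density: Case I (cont'd)
• Expanding the squared norm and dropping $\mathbf{x}^T \mathbf{x}$, which is the same for every class, gives a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i)$$

• The decision boundary gi(x) = gj(x) is the hyperplane:

$$\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0, \qquad \mathbf{w} = \mu_i - \mu_j, \qquad \mathbf{x}_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\mu_i - \mu_j)$$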

Multivariate Gaussian Density: Case I (cont'd)
• Properties of decision boundary:
  – It passes through x0
  – It is orthogonal to the line linking the means.
  – What happens when P(ωi) = P(ωj)?
  – If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
  – If σ is very small, the position of the boundary is insensitive to P(ωi) and P(ωj)
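These properties can be checked numerically. The sketch below uses the standard Case I result x0 = ½(μi + μj) − (σ² / ‖μi − μj‖²) ln[P(ωi)/P(ωj)] (μi − μj); the function name `boundary_point` and all numeric values are hypothetical.

```python
import math


def boundary_point(mu_i, mu_j, sigma, p_i, p_j):
    """Point x0 on the line joining the means through which the boundary passes."""
    diff = [a - b for a, b in zip(mu_i, mu_j)]
    norm_sq = sum(d * d for d in diff)
    shift = (sigma ** 2 / norm_sq) * math.log(p_i / p_j)
    return [0.5 * (a + b) - shift * d for a, b, d in zip(mu_i, mu_j, diff)]


mu_i, mu_j = (0.0, 0.0), (4.0, 0.0)  # hypothetical class means

equal = boundary_point(mu_i, mu_j, 1.0, 0.5, 0.5)   # equal priors
skewed = boundary_point(mu_i, mu_j, 1.0, 0.8, 0.2)  # w_i more likely
tight = boundary_point(mu_i, mu_j, 0.1, 0.8, 0.2)   # same priors, small sigma

print(equal, skewed, tight)
```

With equal priors x0 is the midpoint of the means; when ωi is more likely, x0 moves toward μj, which enlarges the region assigned to ωi; with a small σ the shift becomes negligible.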

Multivariate Gaussian Density: Case I (cont'd)
If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
[Figures: decision boundaries for unequal priors]

Multivariate Gaussian Density: Case I (cont'd)
• Minimum distance classifier
  – When P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -\|\mathbf{x} - \mu_i\|^2$$

  – Assign x to the class with the maximum gi(x), i.e., the class with the nearest mean.
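A minimal sketch of the minimum distance classifier, with hypothetical class means and test point:

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between x and mu."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def min_distance_classify(x, means):
    """Return the index i that maximizes g_i(x) = -||x - mu_i||^2."""
    g = [-squared_distance(x, mu) for mu in means]
    return max(range(len(g)), key=g.__getitem__)


means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical class means
label = min_distance_classify((3.0, 1.0), means)
print(label)
```

The squared distances from the test point to the three means are 10, 2, and 18, so the second class (index 1) is selected.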

Multivariate Gaussian Density: Case II
• Σi = Σ
• Dropping the terms that are the same for every class, the discriminant becomes:

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma^{-1} (\mathbf{x} - \mu_i) + \ln P(\omega_i)$$
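
Multivariate Gaussian Density: Case II (cont'd)
• Expanding the quadratic form and dropping $\mathbf{x}^T \Sigma^{-1} \mathbf{x}$, which is the same for every class, gives a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \Sigma^{-1} \mu_i, \qquad w_{i0} = -\frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)$$

• The decision boundary gi(x) = gj(x) is the hyperplane:

$$\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0, \qquad \mathbf{w} = \Sigma^{-1} (\mu_i - \mu_j), \qquad \mathbf{x}_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln [P(\omega_i) / P(\omega_j)]}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\, (\mu_i - \mu_j)$$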

Multivariate Gaussian Density: Case II (cont'd)
• Properties of hyperplane (decision boundary):
  – It passes through x0
  – It is generally not orthogonal to the line linking the means.
  – What happens when P(ωi) = P(ωj)?
  – If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case II (cont'd)
If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
[Figures: decision boundaries for unequal priors]

Multivariate Gaussian Density: Case II (cont'd)
• Mahalanobis distance classifier
  – When P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma^{-1} (\mathbf{x} - \mu_i)$$

  – Assign x to the class with the maximum gi(x), i.e., the class whose mean is nearest in Mahalanobis distance.
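The following sketch shows a Mahalanobis distance classifier for two features, written with plain tuples so that it needs no external library. The shared covariance matrix, the means, and the test point are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mahalanobis_sq(x, mu, sigma_inv):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) in 2-D."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    return (d0 * (sigma_inv[0][0] * d0 + sigma_inv[0][1] * d1)
            + d1 * (sigma_inv[1][0] * d0 + sigma_inv[1][1] * d1))


def mahalanobis_classify(x, means, sigma):
    """Return the index i that maximizes g_i(x) = -0.5 * squared Mahalanobis distance."""
    sigma_inv = inverse_2x2(sigma)
    g = [-0.5 * mahalanobis_sq(x, mu, sigma_inv) for mu in means]
    return max(range(len(g)), key=g.__getitem__)


sigma = ((9.0, 0.0), (0.0, 1.0))  # hypothetical shared covariance
means = [(0.0, 0.0), (3.0, 2.0)]  # hypothetical class means
x = (3.0, 0.0)

label = mahalanobis_classify(x, means, sigma)
print(label)
```

In Euclidean terms the test point is closer to the second mean (squared distances 9 and 4), but the large variance along the first feature makes the first mean closer in Mahalanobis distance (1 versus 4), so index 0 is chosen.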

Multivariate Gaussian Density: Case III
• Σi = arbitrary
• The discriminant is quadratic in x:

$$g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{W}_i = -\frac{1}{2} \Sigma_i^{-1}, \qquad \mathbf{w}_i = \Sigma_i^{-1} \mu_i, \qquad w_{i0} = -\frac{1}{2} \mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$$

• The decision boundaries are hyperquadrics, e.g., hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, etc.
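A one-dimensional sketch illustrates why arbitrary covariances lead to non-linear boundaries. It evaluates g_i(x) = ln p(x/ωi) + ln P(ωi) for Gaussian classes, dropping the constant shared by all classes; the class parameters are hypothetical.

```python
import math


def quadratic_discriminant(x, mu, var, prior):
    """g_i(x) for a 1-D Gaussian class, without the shared constant -0.5*ln(2*pi)."""
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(var) + math.log(prior)


def classify(x, classes):
    """classes is a list of (mean, variance, prior); return the winning index."""
    scores = [quadratic_discriminant(x, mu, var, p) for mu, var, p in classes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical classes with the same mean but different variances.
classes = [(0.0, 1.0, 0.5), (0.0, 9.0, 0.5)]
labels = [classify(x, classes) for x in (-5.0, 0.0, 5.0)]
print(labels)
```

The narrow class wins near the shared mean and the wide class wins on both sides, so the decision region of the wide class is not connected. A single linear boundary cannot produce this.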

Example - Case III
P(ω1) = P(ω2)
[Figure: decision boundary for the example]
The boundary does not pass through the midpoint of μ1, μ2

Multivariate Gaussian Density: Case III (cont'd)
[Figure: non-linear decision boundaries]

Multivariate Gaussian Density: Case III (cont'd)
• More examples
[Figure: further examples of decision boundaries]

Error Bounds
• Exact error calculations could be difficult – easier to estimate error bounds!

$$P(error) = \int P(error / \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}, \qquad P(error / \mathbf{x}) = \min[P(\omega_1 / \mathbf{x}), P(\omega_2 / \mathbf{x})]$$

• The bounds follow from the inequality min[a, b] ≤ a^β b^(1−β) for a, b ≥ 0 and 0 ≤ β ≤ 1.

Error Bounds (cont'd)
• If the class conditional distributions are Gaussian, then

$$P(error) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2)\, e^{-k(\beta)}$$

where k(β) is computed from the class means μ1, μ2 and the covariance matrices Σ1, Σ2.

Error Bounds (cont'd)
• The Chernoff bound corresponds to the β that minimizes e^(−k(β))
  – This is a 1-D optimization problem, regardless of the dimensionality of the class conditional densities.
[Figure: e^(−k(β)) as a function of β; the bound is loose at both ends of the range and tight at the minimum]

Error Bounds (cont'd)
• Bhattacharyya bound
  – Approximate the error bound using β = 0.5
  – Easier to compute than the Chernoff bound but looser.
• The Chernoff and Bhattacharyya bounds will not be good bounds if the distributions are not Gaussian.
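For one-dimensional Gaussians the Bhattacharyya exponent has a simple closed form, k(0.5) = (μ2 − μ1)² / (4(σ1² + σ2²)) + ½ ln[(σ1² + σ2²) / (2σ1σ2)], which is the scalar case of the usual multivariate expression. The sketch below evaluates the bound for hypothetical parameters and compares it with the exact error of that particular symmetric case.

```python
import math


def bhattacharyya_k(mu1, var1, mu2, var2):
    """k(0.5) for two 1-D Gaussians with means mu1, mu2 and variances var1, var2."""
    mean_term = (mu2 - mu1) ** 2 / (4.0 * (var1 + var2))
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def bhattacharyya_bound(p1, p2, k):
    """Upper bound sqrt(P(w_1) P(w_2)) * exp(-k(0.5)) on P(error)."""
    return math.sqrt(p1 * p2) * math.exp(-k)


# Hypothetical classes: means 0 and 2, unit variances, equal priors.
k = bhattacharyya_k(0.0, 1.0, 2.0, 1.0)
bound = bhattacharyya_bound(0.5, 0.5, k)

# Exact Bayes error for this symmetric case: the Gaussian tail beyond x = 1.
exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
print(k, bound, exact)
```

The bound (about 0.30) is valid but noticeably larger than the exact error (about 0.16), which illustrates the sense in which it is loose.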

Example
Bhattacharyya error:
k(0.5) = 4.06
P(error) ≤ 0.0087

Receiver Operating Characteristic (ROC) Curve
• Every classifier employs some kind of threshold, e.g.:

$$\theta_a = \frac{P(\omega_2)}{P(\omega_1)}, \qquad \theta_b = \frac{P(\omega_2)\,(\lambda_{12} - \lambda_{22})}{P(\omega_1)\,(\lambda_{21} - \lambda_{11})}$$

• Changing the threshold affects the performance of the system.
• ROC curves can help us evaluate system performance for different thresholds.

Example: Person Authentication
• Authenticate a person using biometrics (e.g., fingerprints).
• There are two possible distributions (i.e., classes):
  – Authentic (A) and Impostor (I)
[Figure: distributions of the Impostor (I) and Authentic (A) classes]

Example: Person Authentication (cont'd)
• Possible decisions:
  – (1) correct acceptance (true positive): X belongs to A, and we decide A
  – (2) incorrect acceptance (false positive): X belongs to I, and we decide A
  – (3) correct rejection (true negative): X belongs to I, and we decide I
  – (4) incorrect rejection (false negative): X belongs to A, and we decide I
[Figure: distributions I and A with the regions labelled false positive, correct acceptance, correct rejection, and false negative]
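To see how the four outcomes trade off as the threshold moves, the sketch below assumes that the matching score of each class is Gaussian and that a person is accepted when the score exceeds the threshold t. Both the Gaussian model and every parameter value are assumptions made for illustration.

```python
import math


def tail_prob(t, mu, sigma):
    """P(score > t) for a Gaussian score with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def roc_points(thresholds, mu_a, mu_i, sigma):
    """(false positive rate, true positive rate) for each threshold."""
    return [(tail_prob(t, mu_i, sigma), tail_prob(t, mu_a, sigma))
            for t in thresholds]


# Hypothetical score model: impostors centred at 0, authentic users at 2.
thresholds = [-1.0, 0.0, 1.0, 2.0, 3.0]
points = roc_points(thresholds, mu_a=2.0, mu_i=0.0, sigma=1.0)

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"t={t:+.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}  FNR={1 - tpr:.3f}")
```

Raising the threshold lowers the false positive rate and raises the false negative rate; plotting the true positive rate against the false positive rate over all thresholds gives the ROC curve.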

Error vs Threshold
[Figure: error rates as a function of the threshold, and the corresponding ROC curve]

False Negatives vs Positives
[Figure: false negatives versus false positives]

Next Lecture
• Linear Classification Methods
  – Hastie et al, Chapter 4
• Paper list will be available by the weekend
  – Bidding to start on Monday

Bayes Decision Theory: Case of Discrete Features
• Replace $\int p(\mathbf{x} / \omega_j)\, d\mathbf{x}$ with $\sum_{\mathbf{x}} P(\mathbf{x} / \omega_j)$
• See section 2.9

Missing Features
• Consider a Bayes classifier using uncorrupted data.
• Suppose x = (x1, x2) is a test vector where x1 is missing and the value of x2 is $\hat{x}_2$. How can we classify it?
  – If we set x1 equal to the average value, we will classify x as ω3
  – But $p(\hat{x}_2 / \omega_2)$ is larger; maybe we should classify x as ω2?

Missing Features (cont'd)
• Suppose x = [xg, xb] (xg: good features, xb: bad features)
• Derive the Bayes rule using the good features:

$$P(\omega_i / \mathbf{x}_g) = \frac{p(\omega_i, \mathbf{x}_g)}{p(\mathbf{x}_g)} = \frac{\int p(\omega_i, \mathbf{x}_g, \mathbf{x}_b)\, d\mathbf{x}_b}{p(\mathbf{x}_g)} = \frac{\int P(\omega_i / \mathbf{x}_g, \mathbf{x}_b)\, p(\mathbf{x}_g, \mathbf{x}_b)\, d\mathbf{x}_b}{\int p(\mathbf{x}_g, \mathbf{x}_b)\, d\mathbf{x}_b}$$

• That is, marginalize the posterior probability over the bad features.
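For discrete features the integral over the bad feature becomes a sum, which makes the marginalization easy to sketch. The probability tables and priors below are hypothetical; `likelihood[i][xb][xg]` stands for P(xg, xb/ωi).

```python
def posterior_good_only(likelihood, priors, xg):
    """P(w_i / x_g): sum the class-conditional table over the missing feature x_b."""
    joint = [p * sum(row[xg] for row in table)
             for table, p in zip(likelihood, priors)]
    evidence = sum(joint)  # p(x_g)
    return [j / evidence for j in joint]


# Hypothetical tables for two classes; rows index x_b, columns index x_g.
likelihood = [
    [[0.1, 0.4],
     [0.2, 0.3]],   # P(x_g, x_b / w_1)
    [[0.3, 0.1],
     [0.5, 0.1]],   # P(x_g, x_b / w_2)
]
priors = [0.5, 0.5]

post = posterior_good_only(likelihood, priors, xg=1)
print(post)
```

Summing over every possible value of the missing feature uses all the information in the good feature without committing to a single guessed value for the bad one.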

Compound Bayesian Decision Theory
• Sequential decision
  (1) Decide as each fish emerges.
• Compound decision
  (1) Wait for n fish to emerge.
  (2) Make all n decisions jointly.
  – Could improve performance when consecutive states of nature are not statistically independent.

Compound Bayesian Decision Theory (cont'd)
• Suppose Ω = (ω(1), ω(2), …, ω(n)) denotes the n states of nature, where ω(i) can take one of c values ω1, ω2, …, ωc (i.e., c categories)
• Suppose P(Ω) is the prior probability of the n states of nature.
• Suppose X = (x1, x2, …, xn) are n observed vectors.

Compound Bayesian Decision Theory (cont'd)

$$P(\Omega / \mathbf{X}) = \frac{p(\mathbf{X} / \Omega)\, P(\Omega)}{p(\mathbf{X})}$$

• Assuming $p(\mathbf{X} / \Omega) = \prod_{i=1}^{n} p(\mathbf{x}_i / \omega(i))$ is acceptable!
• Assuming $P(\Omega) = \prod_{i=1}^{n} P(\omega(i))$ is not acceptable, i.e., consecutive states of nature may not be statistically independent!