
Bayesian Decision Theory

Chapter 2 (Duda, Hart & Stork)

CS7616 - Pattern Recognition

Henrik I Christensen, Georgia Tech

Bayesian Decision Theory

• Design classifiers to recommend decisions that minimize some total expected "risk".
– The simplest risk is the classification error (i.e., costs are equal).

– Typically, the risk includes the cost associated with different decisions.

Terminology

• State of nature ω (random variable):
– e.g., ω1 for sea bass, ω2 for salmon

• Probabilities P(ω1) and P(ω2) (priors):
– e.g., prior knowledge of how likely it is to get a sea bass or a salmon

• Probability density function p(x) (evidence):
– e.g., how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness)

Terminology (cont'd)

• Conditional probability density p(x/ωj) (likelihood):
– e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj

e.g., lightness distributions between salmon/sea-bass populations

Terminology (cont'd)

• Conditional probability P(ωj/x) (posterior):
– e.g., the probability that the fish belongs to class ωj given measurement x.

Decision Rule Using Prior Probabilities

Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

or P(error) = min[P(ω1), P(ω2)]

• Favours the most likely class.
• This rule will make the same decision at all times.

– i.e., optimum if no other information is available

$$P(error) = \begin{cases} P(\omega_1) & \text{if we decide } \omega_2 \\ P(\omega_2) & \text{if we decide } \omega_1 \end{cases}$$

Decision Rule Using Conditional Probabilities

• Using Bayes' rule, the posterior probability of category ωj given measurement x is given by:

$$P(\omega_j/x) = \frac{p(x/\omega_j)\,P(\omega_j)}{p(x)} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

where $p(x) = \sum_{j=1}^{2} p(x/\omega_j)\,P(\omega_j)$ (i.e., a scale factor so that the posteriors sum to 1).

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2

or

Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2
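A minimal numeric sketch of this rule in Python (the likelihood values and priors below are illustrative, not taken from the lecture; numpy assumed):

```python
import numpy as np

# Hypothetical class-conditional likelihoods p(x/w1), p(x/w2) at a measured x
likelihoods = np.array([0.6, 0.2])
priors = np.array([2/3, 1/3])          # P(w1), P(w2)

joint = likelihoods * priors           # p(x/wj) * P(wj)
evidence = joint.sum()                 # p(x) = sum_j p(x/wj) P(wj)
posteriors = joint / evidence          # P(wj/x)

decision = np.argmax(posteriors) + 1   # decide w1 if P(w1/x) > P(w2/x)
print(posteriors, "-> decide w", decision)
```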

Decision Rule Using Conditional pdf (cont'd)

(figures: class-conditional densities p(x/ωj) and posteriors P(ωj/x), for priors P(ω1) = 2/3 and P(ω2) = 1/3)

Probability of Error

• The probability of error is defined as:

$$P(error/x) = \begin{cases} P(\omega_1/x) & \text{if we decide } \omega_2 \\ P(\omega_2/x) & \text{if we decide } \omega_1 \end{cases}$$

or

P(error/x) = min[P(ω1/x), P(ω2/x)]

• What is the average probability of error?

$$P(error) = \int_{-\infty}^{\infty} P(error, x)\,dx = \int_{-\infty}^{\infty} P(error/x)\,p(x)\,dx$$

• The Bayes rule is optimum, that is, it minimizes the average probability of error!

Where do Probabilities Come From?

• There are two competing answers to this question:

(1) Relative frequency (objective) approach.
– Probabilities can only come from experiments.

(2) Bayesian (subjective) approach.
– Probabilities may reflect degrees of belief and can be based on opinion.

Example (objective approach)

• Classify cars according to whether they cost more or less than $50K:
– Classes: C1 if price > $50K, C2 if price <= $50K
– Features: x, the height of a car

• Use Bayes' rule to compute the posterior probabilities:

$$P(C_i/x) = \frac{p(x/C_i)\,P(C_i)}{p(x)}$$

• We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)

Example (cont'd)

• Collect data
– Ask drivers how much their car cost and measure its height.

• Determine prior probabilities P(C1), P(C2)
– e.g., 1209 samples: #C1 = 221, #C2 = 988

$$P(C_1) = \frac{221}{1209} = 0.183 \qquad P(C_2) = \frac{988}{1209} = 0.817$$

Example (cont'd)

• Determine the class-conditional probabilities (likelihood)
– Discretize car height into bins and use the normalized histogram

(figure: normalized histograms p(x/Ci))

Example (cont'd)

• Calculate the posterior probability for each bin, e.g., for x = 1.0:

$$P(C_1/x=1.0) = \frac{p(x=1.0/C_1)P(C_1)}{p(x=1.0/C_1)P(C_1) + p(x=1.0/C_2)P(C_2)} = \frac{0.2081 \times 0.183}{0.2081 \times 0.183 + 0.0597 \times 0.817} = 0.438$$

(figure: posteriors P(Ci/x) per bin)
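The same computation, reproduced as a short Python snippet using the numbers on this slide (numpy not needed here):

```python
# Posterior for the histogram bin containing x = 1.0
p_x_C1, p_x_C2 = 0.2081, 0.0597      # normalized-histogram likelihoods
P_C1, P_C2 = 221/1209, 988/1209      # priors estimated from the counts above

post_C1 = p_x_C1 * P_C1 / (p_x_C1 * P_C1 + p_x_C2 * P_C2)
print(round(post_C1, 3))             # ~0.438 -> decide C1 for this bin
```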

A More General Theory

• Use more than one feature.
• Allow more than two categories.
• Allow actions other than classifying the input to one of the possible categories (e.g., rejection).

• Employ a more general error function (i.e., a "risk" function) by associating a "cost" ("loss" function) with each error (i.e., wrong action).

Terminology

• Features form a vector $\mathbf{x} \in R^d$
• A finite set of c categories ω1, ω2, …, ωc

• Bayes rule (i.e., using vector notation):

$$P(\omega_j/\mathbf{x}) = \frac{p(\mathbf{x}/\omega_j)\,P(\omega_j)}{p(\mathbf{x})}, \quad \text{where } p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x}/\omega_j)\,P(\omega_j)$$

• A finite set of l actions α1, α2, …, αl

• A loss function λ(αi/ωj)
– the cost associated with taking action αi when the correct classification category is ωj

Conditional Risk (or Expected Loss)

• Suppose we observe x and take action αi

• Suppose that the cost associated with taking action αi with ωj being the correct category is λ(αi/ωj)

• The conditional risk (or expected loss) of taking action αi is:

$$R(\alpha_i/\mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i/\omega_j)\,P(\omega_j/\mathbf{x})$$

Overall Risk

• Suppose α(x) is a general decision rule that determines which of the actions α1, α2, …, αl to take for every x; then the overall risk is defined as:

$$R = \int R(\alpha(\mathbf{x})/\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$

• The optimum decision rule is the Bayes rule

Overall Risk (cont'd)

• The Bayes decision rule minimizes R by:
(i) Computing R(αi/x) for every αi given an x
(ii) Choosing the action αi with the minimum R(αi/x)

• The resulting minimum overall risk is called the Bayes risk and is the best (i.e., optimum) performance that can be achieved:

$$R^* = \min R$$
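A minimal sketch of this rule: compute R(αi/x) for each action and take the one with the smallest conditional risk (the loss matrix and posteriors below are hypothetical; numpy assumed):

```python
import numpy as np

# Hypothetical loss matrix lam[i, j] = lambda(a_i / w_j)
lam = np.array([[0.0, 2.0],    # a1: decide w1
                [1.0, 0.0]])   # a2: decide w2
posteriors = np.array([0.3, 0.7])          # P(w1/x), P(w2/x)

cond_risk = lam @ posteriors               # R(a_i/x) = sum_j lam(a_i/w_j) P(w_j/x)
best_action = np.argmin(cond_risk)         # Bayes rule: pick the minimum-risk action
print(cond_risk, "-> take action a", best_action + 1)
```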

Example: Two-category classification

• Define
– α1: decide ω1
– α2: decide ω2
– λij = λ(αi/ωj)

• The conditional risks are:

$$R(\alpha_i/\mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i/\omega_j)\,P(\omega_j/\mathbf{x}) \quad (c = 2)$$

$$R(\alpha_1/\mathbf{x}) = \lambda_{11}P(\omega_1/\mathbf{x}) + \lambda_{12}P(\omega_2/\mathbf{x})$$
$$R(\alpha_2/\mathbf{x}) = \lambda_{21}P(\omega_1/\mathbf{x}) + \lambda_{22}P(\omega_2/\mathbf{x})$$

Example: Two-category classification (cont'd)

• Minimum risk decision rule:

Decide ω1 if $(\lambda_{21} - \lambda_{11})\,P(\omega_1/\mathbf{x}) > (\lambda_{12} - \lambda_{22})\,P(\omega_2/\mathbf{x})$; otherwise decide ω2

or (i.e., using the likelihood ratio):

Decide ω1 if $\frac{p(\mathbf{x}/\omega_1)}{p(\mathbf{x}/\omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\,P(\omega_2)}{(\lambda_{21} - \lambda_{11})\,P(\omega_1)}$; otherwise decide ω2

(likelihood ratio > threshold)

Special Case: Zero-One Loss Function

• Assign the same loss to all errors:

$$\lambda(\alpha_i/\omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$

• The conditional risk corresponding to this loss function:

$$R(\alpha_i/\mathbf{x}) = \sum_{j \neq i} P(\omega_j/\mathbf{x}) = 1 - P(\omega_i/\mathbf{x})$$

Special Case: Zero-One Loss Function (cont'd)

• The decision rule becomes:

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2

or

Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2

or

Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2

• In this case, the overall risk is the average probability of error!

Example

• Assuming zero-one loss:

Decide ω1 if p(x/ω1)/p(x/ω2) > θa; otherwise decide ω2, where $\theta_a = P(\omega_2)/P(\omega_1)$

• Assuming general loss (assume λ12 > λ21):

Decide ω1 if p(x/ω1)/p(x/ω2) > θb; otherwise decide ω2, where $\theta_b = \frac{P(\omega_2)(\lambda_{12}-\lambda_{22})}{P(\omega_1)(\lambda_{21}-\lambda_{11})}$

(figure: the likelihood ratio and the decision regions induced by the two thresholds)
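A quick check of the two thresholds, with made-up priors and losses chosen to satisfy λ12 > λ21 (not the lecture's numbers):

```python
# Likelihood-ratio thresholds from the definitions above (hypothetical values)
P1, P2 = 2/3, 1/3
lam11, lam12, lam21, lam22 = 0.0, 2.0, 1.0, 0.0   # assumes lam12 > lam21

theta_a = P2 / P1                                           # zero-one loss threshold
theta_b = (P2 * (lam12 - lam22)) / (P1 * (lam21 - lam11))   # general-loss threshold
# decide w1 wherever p(x/w1)/p(x/w2) exceeds the applicable threshold
print(theta_a, theta_b)   # with lam12 > lam21, theta_b > theta_a
```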

Discriminant Functions

• A useful way to represent classifiers is through discriminant functions gi(x), i = 1, ..., c, where a feature vector x is assigned to class ωi if:

gi(x) > gj(x) for all j ≠ i

Discriminants for Bayes Classifier

• Assuming a general loss function:

gi(x) = -R(αi/x)

• Assuming the zero-one loss function:

gi(x) = P(ωi/x)

Discriminants for Bayes Classifier (cont'd)

• Is the choice of gi unique?
– Replacing gi(x) with f(gi(x)), where f() is monotonically increasing, does not change the classification results.

$$g_i(\mathbf{x}) = P(\omega_i/\mathbf{x}) = \frac{p(\mathbf{x}/\omega_i)\,P(\omega_i)}{p(\mathbf{x})}$$

$$g_i(\mathbf{x}) = p(\mathbf{x}/\omega_i)\,P(\omega_i)$$

$$g_i(\mathbf{x}) = \ln p(\mathbf{x}/\omega_i) + \ln P(\omega_i)$$

we'll use this form extensively!
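A small sketch of evaluating the log-form discriminant gi(x) = ln p(x/ωi) + ln P(ωi) when the class models are Gaussian (all means, covariances and priors below are made up; numpy assumed):

```python
import numpy as np

# Log of a multivariate Gaussian density, used as ln p(x/w_i)
def log_gaussian(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    return -0.5 * diff @ np.linalg.solve(Sigma, diff) \
           - 0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))

means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # hypothetical class models
covs   = [np.eye(2), 2.0 * np.eye(2)]
priors = [0.6, 0.4]

x = np.array([1.0, 2.0])
g = [log_gaussian(x, means[i], covs[i]) + np.log(priors[i]) for i in range(2)]
print("decide w", int(np.argmax(g)) + 1)   # largest discriminant wins
```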

Case of two categories

• More common to use a single discriminant function (dichotomizer) instead of two:

• Examples:

$$g(\mathbf{x}) = P(\omega_1/\mathbf{x}) - P(\omega_2/\mathbf{x})$$

$$g(\mathbf{x}) = \ln \frac{p(\mathbf{x}/\omega_1)}{p(\mathbf{x}/\omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$

Decision Regions and Boundaries

• Decision rules divide the feature space into decision regions R1, R2, …, Rc, separated by decision boundaries.

A decision boundary is defined by:

g1(x)=g2(x)

Discriminant Function for Multivariate Gaussian Density

• Consider the following discriminant function:

$$g_i(\mathbf{x}) = \ln p(\mathbf{x}/\omega_i) + \ln P(\omega_i)$$

where $p(\mathbf{x}/\omega_i) \sim N(\mu_i, \Sigma_i)$ (multivariate Gaussian density)

Multivariate Gaussian Density: Case I

• Σi = σ²I (diagonal)
– Features are statistically independent
– Each feature has the same variance

(the ln P(ωi) term favours the a priori more likely category)

Multivariate Gaussian Density: Case I (cont'd)

• The discriminant is linear:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \quad \mathbf{w}_i = \frac{\mu_i}{\sigma^2}, \quad w_{i0} = -\frac{\mu_i^T\mu_i}{2\sigma^2} + \ln P(\omega_i)$$

• The decision boundary gi(x) = gj(x) is the hyperplane

$$\mathbf{w}^T(\mathbf{x} - \mathbf{x}_0) = 0, \quad \mathbf{w} = \mu_i - \mu_j, \quad \mathbf{x}_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)$$

Multivariate Gaussian Density: Case I (cont'd)

• Properties of the decision boundary:
– It passes through x0
– It is orthogonal to the line linking the means.
– What happens when P(ωi) = P(ωj)?
– If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
– If σ is very small, the position of the boundary is insensitive to P(ωi) and P(ωj)

Multivariate Gaussian Density: Case I (cont'd)

(figures) If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case I (cont'd)

• Minimum distance classifier
– When the P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -\|\mathbf{x} - \mu_i\|^2$$

(assign x to the class with the maximum gi(x), i.e., the nearest mean)
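A minimal minimum-distance classifier sketch (the class means are made up; numpy assumed):

```python
import numpy as np

# Case I with equal priors: g_i(x) = -||x - mu_i||^2  (nearest-mean rule)
means = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])   # hypothetical class means
x = np.array([1.0, 2.5])

g = -np.sum((means - x) ** 2, axis=1)    # one discriminant value per class
print("decide w", int(np.argmax(g)) + 1) # the nearest mean wins
```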

Multivariate Gaussian Density: Case II

• Σi = Σ (common covariance matrix)

Multivariate Gaussian Density: Case II (cont'd)

• The discriminant is again linear:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \quad \mathbf{w}_i = \Sigma^{-1}\mu_i, \quad w_{i0} = -\frac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \ln P(\omega_i)$$

• The decision boundary is the hyperplane

$$\mathbf{w}^T(\mathbf{x} - \mathbf{x}_0) = 0, \quad \mathbf{w} = \Sigma^{-1}(\mu_i - \mu_j), \quad \mathbf{x}_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^T\Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j)$$

Multivariate Gaussian Density: Case II (cont'd)

• Properties of the hyperplane (decision boundary):
– It passes through x0
– It is not orthogonal to the line linking the means.
– What happens when P(ωi) = P(ωj)?
– If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case II (cont'd)

(figures) If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case II (cont'd)

• Mahalanobis distance classifier
– When the P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -(\mathbf{x} - \mu_i)^T\Sigma^{-1}(\mathbf{x} - \mu_i)$$

(assign x to the class with the maximum gi(x))
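A corresponding Mahalanobis-distance sketch with a shared, hypothetical covariance matrix (numpy assumed):

```python
import numpy as np

# Case II with equal priors: pick the class with the smallest Mahalanobis distance
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                      # shared (made-up) covariance
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
x = np.array([1.0, 2.0])

d2 = [(x - m) @ np.linalg.solve(Sigma, x - m) for m in means]   # (x-mu)^T Sigma^{-1} (x-mu)
print("decide w", int(np.argmin(d2)) + 1)
```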

Multivariate Gaussian Density: Case III

• Σi = arbitrary
– The decision boundaries are hyperquadrics; e.g., hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, etc.
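A sketch of the general (quadratic) discriminant for Case III, with the class-independent constant dropped; the class parameters below are made up (numpy assumed):

```python
import numpy as np

# Case III (arbitrary Sigma_i):
# g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - 1/2 ln|Sigma_i| + ln P(w_i)
def g_quadratic(x, mu, Sigma, prior):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Hypothetical two-class model with different covariances
params = [(np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 0.5]]), 0.5),
          (np.array([2.0, 2.0]), np.array([[2.0, 0.3], [0.3, 1.5]]), 0.5)]
x = np.array([1.0, 1.0])
scores = [g_quadratic(m, S, P) if False else g_quadratic(x, m, S, P) for m, S, P in params]
print("decide w", int(np.argmax(scores)) + 1)
```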

Example - Case III

P(ω1) = P(ω2)

decision boundary:

The boundary does not pass through the midpoint of μ1, μ2

Multivariate Gaussian Density: Case III (cont'd)

(figure: non-linear decision boundaries)

Multivariate Gaussian Density: Case III (cont'd)

• More examples

Error Bounds

• Exact error calculations could be difficult – easier to estimate error bounds!

(figure: P(error) / min[P(ω1/x), P(ω2/x)])

Error Bounds (cont'd)

• If the class-conditional distributions are Gaussian, then

$$P(error) \le P(\omega_1)^{\beta}\,P(\omega_2)^{1-\beta}\,e^{-k(\beta)}$$

where:

$$k(\beta) = \frac{\beta(1-\beta)}{2}(\mu_2 - \mu_1)^T\left[\beta\Sigma_1 + (1-\beta)\Sigma_2\right]^{-1}(\mu_2 - \mu_1) + \frac{1}{2}\ln\frac{|\beta\Sigma_1 + (1-\beta)\Sigma_2|}{|\Sigma_1|^{\beta}\,|\Sigma_2|^{1-\beta}}$$

Error Bounds (cont'd)

• The Chernoff bound corresponds to the β that minimizes e⁻ᵏ⁽ᵝ⁾
– This is a 1-D optimization problem, regardless of the dimensionality of the class-conditional densities.

(figure: e⁻ᵏ⁽ᵝ⁾ vs β – the minimizing β gives the tight bound; other values give loose bounds)

Error Bounds (cont'd)

• Bhattacharyya bound
– Approximate the error bound using β = 0.5
– Easier to compute than the Chernoff bound but looser.

• The Chernoff and Bhattacharyya bounds will not be good bounds if the distributions are not Gaussian.

Example

Bhattacharyya error: k(0.5) = 4.06

P(error) ≤ 0.0087
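A sketch of computing the Bhattacharyya bound for two Gaussian classes, using the k(1/2) expression above; the means, covariances, and priors here are made up, so the numbers will not match the slide's example (numpy assumed):

```python
import numpy as np

# Bhattacharyya bound (beta = 0.5):
# k(1/2) = 1/8 (mu2-mu1)^T [(S1+S2)/2]^{-1} (mu2-mu1)
#          + 1/2 ln( |(S1+S2)/2| / sqrt(|S1||S2|) )
# P(error) <= sqrt(P(w1) P(w2)) * exp(-k(1/2))
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])   # hypothetical parameters
S1, S2 = np.eye(2), 2.0 * np.eye(2)
P1, P2 = 0.5, 0.5

S = 0.5 * (S1 + S2)
diff = mu2 - mu1
k_half = 0.125 * diff @ np.linalg.solve(S, diff) \
         + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
bound = np.sqrt(P1 * P2) * np.exp(-k_half)
print("P(error) <=", bound)
```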

Receiver Operating Characteristic (ROC) Curve

• Every classifier employs some kind of threshold.

• Changing the threshold affects the performance of the system.

• ROC curves can help us evaluate system performance for different thresholds.

$$\theta_a = \frac{P(\omega_2)}{P(\omega_1)} \qquad \theta_b = \frac{P(\omega_2)(\lambda_{12}-\lambda_{22})}{P(\omega_1)(\lambda_{21}-\lambda_{11})}$$

Example: Person Authentication

• Authenticate a person using biometrics (e.g., fingerprints).

• There are two possible distributions (i.e., classes):
– Authentic (A) and Impostor (I)

(figure: score distributions for I and A)

Example: Person Authentication (cont'd)

• Possible decisions:
– (1) correct acceptance (true positive):
• X belongs to A, and we decide A
– (2) incorrect acceptance (false positive):
• X belongs to I, and we decide A
– (3) correct rejection (true negative):
• X belongs to I, and we decide I
– (4) incorrect rejection (false negative):
• X belongs to A, and we decide I

(figure: the I and A score densities, with the threshold splitting them into regions labeled false positive, correct acceptance, correct rejection, and false negative)
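A small empirical sketch of how sweeping the decision threshold trades false positives against false negatives; the two Gaussian score models are hypothetical (numpy assumed):

```python
import numpy as np

# Simulated 1-D scores from hypothetical Impostor (I) and Authentic (A) models
rng = np.random.default_rng(0)
scores_I = rng.normal(0.0, 1.0, 5000)    # impostor scores
scores_A = rng.normal(4.0, 1.0, 5000)    # authentic scores

for t in (1.0, 2.0, 3.0):                # a few candidate thresholds
    fpr = np.mean(scores_I > t)          # false positives: impostors accepted
    fnr = np.mean(scores_A <= t)         # false negatives: authentics rejected
    print(f"threshold {t}: FPR={fpr:.3f}  FNR={fnr:.3f}")
```

Plotting the (FPR, 1 - FNR) pairs over a fine sweep of thresholds traces the ROC curve.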

Error vs Threshold

ROC

False Negatives vs Positives

Next Lecture

• Linear Classification Methods
– Hastie et al, Chapter 4

• Paper list will be available by the weekend
– Bidding to start on Monday

Bayes Decision Theory: Case of Discrete Features

• Replace $\int p(\mathbf{x}/\omega_j)\,d\mathbf{x}$ with $\sum_{\mathbf{x}} P(\mathbf{x}/\omega_j)$

• See section 2.9

Missing Features

• Consider a Bayes classifier using uncorrupted data.
• Suppose x = (x1, x2) is a test vector where x1 is missing and the value of x2 is x̂2 – how can we classify it?
– If we set x1 equal to the average value, we will classify x as ω3
– But p(x̂2/ω2) is larger; maybe we should classify x as ω2?

Missing Features (cont'd)

• Suppose x = [xg, xb] (xg: good features, xb: bad features)
• Derive the Bayes rule using the good features:

$$P(\omega_i/\mathbf{x}_g) = \frac{\int P(\omega_i/\mathbf{x}_g, \mathbf{x}_b)\,p(\mathbf{x}_g, \mathbf{x}_b)\,d\mathbf{x}_b}{\int p(\mathbf{x}_g, \mathbf{x}_b)\,d\mathbf{x}_b}$$

– Marginalize the posterior probability over the bad features.
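A discretized sketch of this marginalization, summing the joint likelihood over a grid of values for the missing feature; all numbers below are hypothetical (numpy assumed):

```python
import numpy as np

# P(w_i / x_g) ∝ sum_b p(x_g, x_b / w_i) P(w_i), summed over a grid of x_b values
# joint_like[i, b] holds hypothetical values of p(x_g, x_b / w_i) on that grid
joint_like = np.array([[0.02, 0.05, 0.01],
                       [0.01, 0.03, 0.08]])
priors = np.array([0.5, 0.5])

numer = (joint_like * priors[:, None]).sum(axis=1)   # marginalize over x_b
posteriors = numer / numer.sum()                     # normalize by p(x_g)
print(posteriors)
```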

Compound Bayesian Decision Theory

• Sequential decision
(1) Decide as each fish emerges.

• Compound decision
(1) Wait for n fish to emerge.
(2) Make all n decisions jointly.

– Could improve performance when consecutive states of nature are not statistically independent.

Compound Bayesian Decision Theory (cont'd)

• Suppose Ω = (ω(1), ω(2), …, ω(n)) denotes the n states of nature, where ω(i) can take one of c values ω1, ω2, …, ωc (i.e., c categories)

• Suppose P(Ω) is the prior probability of the n states of nature.

• Suppose X = (x1, x2, …, xn) are the n observed vectors.

Compound Bayesian Decision Theory (cont'd)

• i.e., consecutive states of nature may not be statistically independent!

• Assuming that the observations are conditionally independent given the states, $p(X/\Omega) = \prod_{i=1}^{n} p(\mathbf{x}_i/\omega(i))$, is acceptable!