Bayesian Classification, with a brief introduction to pattern recognition
Modified from slides by Michael L. Raymer, Ph.D.

Page 1: Bayesian Classification with a brief introduction to pattern recognition Modified from slides by Michael L. Raymer, Ph.D.

Bayesian Classification, with a brief introduction to pattern recognition

Modified from slides by Michael L. Raymer, Ph.D.

Page 2:

8/29/03 M. Raymer – WSU, FBS 2

The pattern recognition paradigm

• Fruit on an assembly line: oranges, grapefruit, lemons, cherries, apples
• Sensors measure: red intensity, yellow intensity, mass (kg), approximate volume
• At the end of the line, a gate switches to deposit the fruit into the correct bin

Page 3:

Training the algorithm

Sensors, scales, etc. measure one training example:

Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → label: Apple

Page 4:

Training (2)

Many labeled examples, each of the form
(Red = ???, Yellow = ???, Mass = ???, Volume = ???) → Label
— e.g. (Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21) → Apple —
are fed to the classifier during training.

Page 5:

Testing

A new, unlabeled measurement
(Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21) → ??
is given to the trained classifier, which outputs a predicted label.

Page 6:

Pattern Matrix

        V1    V2    V3    V4    V5   Class
Ex 1   3.06  2.05  6.39  7.84  6.75    1
Ex 2   8.25  0.72  2.52  0.50  9.08    1
Ex 3   2.72  9.32  5.68  7.83  7.86    1
Ex 4   7.37  1.30  2.97  0.61  3.49    2
Ex 5   0.73  1.46  6.60  6.08  0.78    2
Ex 6   4.85  5.08  4.87  8.06  8.65    2
Ex 7   5.89  1.23  6.38  2.81  6.84    3
Ex 8   0.52  6.57  4.08  3.62  0.59    3
Ex 9   5.66  3.65  6.87  6.90  7.93    3
Ex 10  3.92  0.73  1.01  3.57  2.47    4
Ex 11  8.84  1.42  2.79  3.40  3.19    4
Ex 12  5.63  4.32  8.08  0.82  4.74    4
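In code, a pattern matrix is just a list of labeled feature vectors. A minimal Python sketch, with the examples and class column copied from the table:

```python
# Pattern matrix: one row per example, features V1..V5 plus a class label.
pattern_matrix = [
    ([3.06, 2.05, 6.39, 7.84, 6.75], 1),
    ([8.25, 0.72, 2.52, 0.50, 9.08], 1),
    ([2.72, 9.32, 5.68, 7.83, 7.86], 1),
    ([7.37, 1.30, 2.97, 0.61, 3.49], 2),
    ([0.73, 1.46, 6.60, 6.08, 0.78], 2),
    ([4.85, 5.08, 4.87, 8.06, 8.65], 2),
    ([5.89, 1.23, 6.38, 2.81, 6.84], 3),
    ([0.52, 6.57, 4.08, 3.62, 0.59], 3),
    ([5.66, 3.65, 6.87, 6.90, 7.93], 3),
    ([3.92, 0.73, 1.01, 3.57, 2.47], 4),
    ([8.84, 1.42, 2.79, 3.40, 3.19], 4),
    ([5.63, 4.32, 8.08, 0.82, 4.74], 4),
]

features = [row for row, _ in pattern_matrix]    # 12 x 5 feature block
labels = [label for _, label in pattern_matrix]  # the class column
```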

Page 7:

Nearest Neighbor Classification

[Scatter plot: Red Intensity (normalized) vs. Mass (normalized), both axes 0–10, with an unknown point "?" to be assigned the class of its nearest labeled neighbor.]
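The nearest-neighbor rule on this slide fits in a few lines: classify the unknown point by the label of the closest training example in feature space. A minimal 1-NN sketch; the training points below are hypothetical, chosen only for illustration:

```python
import math

def nearest_neighbor(query, examples):
    """Classify `query` with the label of the closest training
    example (1-NN, Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for point, label in examples:
        d = math.dist(query, point)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Hypothetical training points: (mass, red intensity) -> class
train = [((2.0, 8.0), "apple"), ((2.2, 7.5), "apple"),
         ((6.0, 3.0), "grapefruit"), ((6.5, 2.8), "grapefruit")]

label = nearest_neighbor((2.1, 7.9), train)  # closest example is an apple
```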

Page 8:

Evaluating Accuracy

[Two scatter plots of Red Intensity (normalized) vs. Mass (normalized), both axes 0–10: the training data, and a held-out testing data set.]

Page 9:

Problems with KNN classifiers

• Lots of memorization
• Slow (lots of distance calculations)
• Incorrect features cause problems
• Features are assumed to all be of equal importance in classification
• Odd exemplars (e.g. green/yellow apples) cause problems
• What value for k?

Page 10:

Distributions

• Bayesian classifiers start with an estimate of the distribution of the features

[Left: Binomial distribution (discrete) — P(N) vs. N = # heads in 20 tosses. Right: Gaussian distribution (continuous) — P(N) vs. N.]

Page 11:

Density Estimation

• Parametric: assume a Gaussian (e.g.) distribution; estimate the parameters (μ, σ).
• Non-parametric: histogram sampling; bin size is critical; Gaussian smoothing can help.
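Both styles can be sketched in a few lines of Python. The synthetic sample (a Gaussian with mean 5.0 and standard deviation 1.5) and the 0.5 bin width are illustrative choices, not values from the slides:

```python
import math
import random

random.seed(0)
# Synthetic feature sample: 1000 draws from a Gaussian (mu=5.0, sigma=1.5).
sample = [random.gauss(5.0, 1.5) for _ in range(1000)]

# Parametric: assume a Gaussian and estimate its two parameters.
mu = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))

# Non-parametric: histogram sampling; the bin width is the critical choice.
bin_width = 0.5
counts = {}
for x in sample:
    b = math.floor(x / bin_width)
    counts[b] = counts.get(b, 0) + 1

def hist_density(x):
    """Histogram estimate of the density at x."""
    return counts.get(math.floor(x / bin_width), 0) / (len(sample) * bin_width)
```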

Page 12:

The Gaussian distribution

Univariate:

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Multivariate (d-dimensional):

f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\mathsf{T} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)

A parametric Bayesian classifier must estimate μ and Σ from the training samples.
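These densities translate directly to code. A sketch of the univariate form, plus a diagonal-covariance special case of the multivariate form (the full version would also need a matrix inverse and determinant):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x) = exp(-(x-mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def diag_gaussian_pdf(x, mu, sigma):
    """d-dimensional Gaussian with a diagonal covariance: the density
    factors into a product of per-feature univariate terms."""
    p = 1.0
    for xi, mi, si in zip(x, mu, sigma):
        p *= gaussian_pdf(xi, mi, si)
    return p
```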

Page 13:

Making decisions

• Once you have the distributions for each feature and each class…
• You can ask questions like: if I have an apple, what is the probability that the diameter will be between 3.2 and 3.5 inches?

Page 14:

More decisions…

Non-parametric (histogram):

\frac{\text{count in bins representing 3.1 through 3.5 inches}}{\text{count in all bins}}

Parametric (Gaussian):

\int_{3.1}^{3.5} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2 / (2\sigma^2)} \, dx
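The parametric integral has no closed form, but it can be evaluated with the error function from the standard library. The parameters μ = 3.3" and σ = 0.2" below are hypothetical, chosen only so the interval [3.1, 3.5] is μ ± σ:

```python
import math

def norm_cdf(x, mu, sigma):
    """Gaussian cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical apple-diameter model (illustrative only): mu = 3.3", sigma = 0.2".
mu, sigma = 3.3, 0.2
p = norm_cdf(3.5, mu, sigma) - norm_cdf(3.1, mu, sigma)  # P(3.1 <= d <= 3.5 | apple)
```

With these parameters the interval is one standard deviation either side of the mean, so p comes out near the familiar 0.68.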

Page 15:

A Simple Example

• You are given a fruit with a diameter of 4" – is it a pear or an apple?
• To begin, we need to know the distributions of diameters for pears and apples.

Page 16:

Maximum Likelihood

[Class-conditional distributions P(x | apple) and P(x | pear) plotted over x = diameter, 1"–6".]

Page 17:

What are we asking?

• If the fruit is an apple, how likely is it to have a diameter of 4"?
• If the fruit is a xenofruit from planet Xircon, how likely is it to have a diameter of 4"?

Is this the right question to ask?

Page 18:

A Key Problem

• We based this decision on P(x | pear) (the class-conditional probability)
• What we really want to use is P(pear | x) (the posterior probability)
• What if we found the fruit in a pear orchard?
• We need to know the prior probability of finding an apple or a pear!

Page 19:

Statistical decisions…

• If a fruit has a diameter of 4", how likely is it to be an apple?

[Diagram: the set of apples overlapping the set of 4" fruit.]

Page 20:

“Inverting” the question

Given an apple, what is the probability that it will have a diameter of 4"?  P(x = 4.0 | apple)

Given a 4" diameter fruit, what is the probability that it is an apple?  P(apple | x = 4.0)

Page 21:

Prior Probabilities

• Prior probability + evidence → posterior probability
• Without evidence, what is the “prior probability” that a fruit is an apple?

Page 22:

The heart of it all

• Bayes Rule:

P(\text{class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{class}) \, P(\text{class})}{\sum_{\text{all classes}} P(\text{evidence} \mid \text{class}) \, P(\text{class})}

For the fruit example:

P(\text{apple} \mid d = 4") = \frac{p(d = 4" \mid \text{apple}) \, P(\text{apple})}{p(d = 4" \mid \text{apple}) \, P(\text{apple}) + p(d = 4" \mid \text{pear}) \, P(\text{pear})}

Page 23:

Bayes Rule

P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{\sum_{j=1}^{c} p(x \mid \omega_j) \, P(\omega_j)}

or

P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{p(x)}
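Bayes rule in this form is a one-liner per class. A minimal sketch:

```python
def posterior(likelihoods, priors):
    """Bayes rule: P(w_j | x) = p(x | w_j) P(w_j) / sum_k p(x | w_k) P(w_k).
    Both arguments are dicts keyed by class name."""
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}
```

Because the denominator p(x) is just the numerators summed over all classes, the returned posteriors always sum to 1.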

Page 24:

Example Revisited

• Is it an ordinary apple or an uncommon pear?

p(d = 4" | apple) = 0.4     P(apple) = 0.1
p(d = 4" | pear) = 0.05     P(pear) = 0.9

Page 25:

Bayes Rule Example

P(\text{apple} \mid d = 4") = \frac{p(d = 4" \mid \text{apple}) \, P(\text{apple})}{p(d = 4" \mid \text{apple}) \, P(\text{apple}) + p(d = 4" \mid \text{pear}) \, P(\text{pear})}
= \frac{0.4 \times 0.1}{0.4 \times 0.1 + 0.05 \times 0.9} = \frac{0.04}{0.085} \approx 0.47

Page 26:

Bayes Rule Example

P(\text{pear} \mid d = 4") = \frac{p(d = 4" \mid \text{pear}) \, P(\text{pear})}{p(d = 4" \mid \text{apple}) \, P(\text{apple}) + p(d = 4" \mid \text{pear}) \, P(\text{pear})}
= \frac{0.05 \times 0.9}{0.4 \times 0.1 + 0.05 \times 0.9} = \frac{0.045}{0.085} \approx 0.53
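The two calculations above, checked in Python with the slide's numbers:

```python
# Slide numbers: p(d=4"|apple) = 0.4, p(d=4"|pear) = 0.05,
#                P(apple) = 0.1, P(pear) = 0.9
p_d_apple, p_d_pear = 0.4, 0.05
p_apple, p_pear = 0.1, 0.9

evidence = p_d_apple * p_apple + p_d_pear * p_pear  # 0.085
p_apple_d = p_d_apple * p_apple / evidence          # 0.04 / 0.085, ~0.47
p_pear_d = p_d_pear * p_pear / evidence             # 0.045 / 0.085, ~0.53
```

Despite the larger class-conditional likelihood for apples, the 0.9 prior on pears tips the posterior toward pear.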

Page 27:

Solution

P(\text{guilt} \mid \text{pos}) = \frac{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt})}{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt}) + p(\text{pos} \mid \text{innocent}) \, P(\text{innocent})}
= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999} = \frac{0.00099}{0.0010899} \approx 0.908
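The Solution slide's arithmetic, checked in Python. The problem setup is not in the transcript, but the numbers correspond to p(pos|guilt) = 0.99, P(guilt) = 0.001, p(pos|innocent) = 0.0001, P(innocent) = 0.999:

```python
# A highly accurate test for a very rare condition still yields a
# posterior well below certainty once the tiny prior is factored in.
num = 0.99 * 0.001              # p(pos|guilt) * P(guilt)
alt = 0.0001 * 0.999            # p(pos|innocent) * P(innocent)
p_guilt_pos = num / (num + alt)  # ~0.908
```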

Page 28:

Marginal Distributions

[Four marginal class-conditional distributions: P(x₁ | apple), P(x₁ | pear), P(x₂ | apple), P(x₂ | pear).]

Page 29:

Combining Marginals

• Assuming independent features:

P(x \mid \omega_j) = P(x_1 \mid \omega_j) \, P(x_2 \mid \omega_j) \cdots P(x_d \mid \omega_j)

• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
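Under the independence assumption the class-conditional is a simple product of per-feature marginals. A sketch, where each marginal is any callable returning P(x_i | ω):

```python
def naive_likelihood(x, marginals):
    """Naive Bayes class-conditional: P(x | w) = product over features
    of the per-feature marginals P(x_i | w), assuming independence.
    `marginals` is one callable per feature."""
    p = 1.0
    for xi, p_i in zip(x, marginals):
        p *= p_i(xi)
    return p
```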

Page 30:

Bayes Decision Rule

Predict class \omega_i such that P(\omega_i \mid x) \ge P(\omega_j \mid x) for all j \ne i.

• Provably optimal when the features (evidence) follow Gaussian distributions, and are independent.

Page 31:

Likelihood Ratios

• When deciding between two possibilities, we don’t need the exact probabilities; we only need to know which one is greater.
• The denominator is the same for every class, so it can be eliminated – especially useful when there are many possible classes.

Page 32:

Likelihood Ratio Example

P(\text{apple} \mid d = 4") = \frac{p(d = 4" \mid \text{apple}) \, P(\text{apple})}{p(d = 4" \mid \text{apple}) \, P(\text{apple}) + p(d = 4" \mid \text{pear}) \, P(\text{pear})}

P(\text{pear} \mid d = 4") = \frac{p(d = 4" \mid \text{pear}) \, P(\text{pear})}{p(d = 4" \mid \text{apple}) \, P(\text{apple}) + p(d = 4" \mid \text{pear}) \, P(\text{pear})}

Both posteriors share the same denominator.

Page 33:

Likelihood Ratio Example

Compare the numerators directly:

p(d = 4" \mid \text{pear}) \, P(\text{pear}) \quad \text{vs.} \quad p(d = 4" \mid \text{apple}) \, P(\text{apple})

Page 34:

In-class example: Oranges vs. Grapefruit

[Four histograms of bin frequencies from the training data: red intensity for oranges, mass for oranges, red intensity for grapefruit, and mass for grapefruit.]

Page 35:

Example (cont’d)

• After observing several hundred fruit pass down the assembly line, we observe that 72% are oranges and 28% are grapefruit.
• Fruit ‘x’: red intensity = 8.2, mass = 7.6.

What shall we predict for the class of fruit ‘x’?

Page 36:

The whole enchilada

P(\text{orange} \mid 8.2, 7.6) = \frac{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange})}{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange}) + p(8.2, 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}

and…

P(8.2, 7.6 \mid \text{orange}) = P(\text{red} = 8.2 \mid \text{orange}) \, P(\text{mass} = 7.6 \mid \text{orange})   (Naïve assumption)

Repeat for grapefruit and predict the more probable class.

Page 37:

The whole enchilada (2)

P(\text{orange} \mid 8.2, 7.6) = \frac{P(\text{red} = 8.2 \mid \text{orange}) \, P(\text{mass} = 7.6 \mid \text{orange}) \, P(\text{orange})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red} = 8.2 \mid f) \, P(\text{mass} = 7.6 \mid f) \, P(f)}

= \frac{0.08 \times 0.12 \times 0.72}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28} \approx 0.39

Page 38:

The whole enchilada (3)

P(\text{grapefruit} \mid 8.2, 7.6) = \frac{P(\text{red} = 8.2 \mid \text{grapefruit}) \, P(\text{mass} = 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red} = 8.2 \mid f) \, P(\text{mass} = 7.6 \mid f) \, P(f)}

= \frac{0.19 \times 0.20 \times 0.28}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28} \approx 0.61

Page 39:

Conclusion

P(orange | x) ≈ 0.39
P(grapefruit | x) ≈ 0.61

Predict that fruit ‘x’ is a grapefruit, despite the relative scarcity of grapefruits on the conveyor belt.
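The whole calculation, checked in Python with the slide's estimates:

```python
# Slide estimates:
#   P(red=8.2|orange) = 0.08, P(mass=7.6|orange) = 0.12, P(orange) = 0.72
#   P(red=8.2|grapefruit) = 0.19, P(mass=7.6|grapefruit) = 0.20, P(grapefruit) = 0.28
num_orange = 0.08 * 0.12 * 0.72
num_grapefruit = 0.19 * 0.20 * 0.28
evidence = num_orange + num_grapefruit

p_orange = num_orange / evidence          # ~0.39
p_grapefruit = num_grapefruit / evidence  # ~0.61
prediction = "grapefruit" if p_grapefruit > p_orange else "orange"
```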

Page 40:

Abbreviated

• Since the denominator is the same for all classes, we can just compare:

P(\text{red} = 8.2 \mid \text{orange}) \, P(\text{mass} = 7.6 \mid \text{orange}) \, P(\text{orange})

and

P(\text{red} = 8.2 \mid \text{grapefruit}) \, P(\text{mass} = 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})

Page 41:

Likelihood comparison

P(\text{red} = 8.2 \mid \text{orange}) \, P(\text{mass} = 7.6 \mid \text{orange}) \, P(\text{orange}) = 0.08 \times 0.12 \times 0.72 \approx 0.0069

P(\text{red} = 8.2 \mid \text{grapefruit}) \, P(\text{mass} = 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit}) = 0.19 \times 0.20 \times 0.28 \approx 0.0106
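The abbreviated comparison in code; skipping the shared denominator gives the same decision as the full posterior calculation:

```python
# Compare unnormalized scores p(x|class) * P(class) directly.
score_orange = 0.08 * 0.12 * 0.72      # ~0.0069
score_grapefruit = 0.19 * 0.20 * 0.28  # ~0.0106
prediction = max([("orange", score_orange), ("grapefruit", score_grapefruit)],
                 key=lambda t: t[1])[0]
```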