Chapter 4 (Part 1): Non-Parametric Classification

Introduction · Density Estimation · Parzen Windows

All materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher.


Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Ch4 (Part 1)


Introduction

All parametric densities are unimodal (they have a single local maximum), whereas many practical problems involve multi-modal densities.

Nonparametric procedures can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.

There are two types of nonparametric methods:

- Estimating the class-conditional densities $p(x \mid \omega_j)$
- Bypassing probability estimation and going directly to a-posteriori probability estimation


Density Estimation

Basic idea:

The probability that a vector x will fall in region R is:

$$P = \int_{\mathcal{R}} p(x')\,dx' \qquad (1)$$

P is a smoothed (or averaged) version of the density function p(x). If we have a sample of size n, the probability that exactly k of the n points fall in R is given by the binomial law:

$$P_k = \binom{n}{k} P^k (1-P)^{n-k} \qquad (2)$$

and the expected value for k is:

$$E(k) = nP \qquad (3)$$
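As a quick sanity check on equations (1)–(3), the sketch below (an illustrative setup, not from the slides, assuming NumPy) draws n samples from a standard normal and verifies that the fraction falling in R = [−1, 1] is close to P:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# R = [-1, 1]; for N(0, 1), P = integral of p(x') over R = erf(1/sqrt(2)).
P_true = erf(1 / sqrt(2))  # about 0.6827

n = 10_000
samples = rng.standard_normal(n)
k = np.sum((samples >= -1) & (samples <= 1))

print(k / n)  # close to P_true, consistent with E(k) = n*P
```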


The maximum-likelihood estimate of P, i.e. the value $\hat\theta$ maximizing $P_k$ as a function of $\theta = P$, is reached for

$$\hat\theta = \frac{k}{n} \cong P$$

Therefore, the ratio k/n is a good estimate for the probability P and hence for the density function p.

If p(x) is continuous and the region R is so small that p does not vary significantly within it, we can write:

$$\int_{\mathcal{R}} p(x')\,dx' \cong p(x)\,V \qquad (4)$$

where x is a point within R and V the volume enclosed by R.


Combining equations (1), (3) and (4) yields:

$$p(x) \cong \frac{k/n}{V}$$
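The estimate $p(x) \cong (k/n)/V$ can be sketched directly in one dimension. The example below (an illustrative setup, not from the slides, assuming NumPy) counts samples of a standard normal inside a small interval around x:

```python
import numpy as np

def density_estimate(x, samples, V):
    """Basic estimate p(x) ~ (k/n)/V: k counts the samples inside
    the interval of length V centered at x (1-D case)."""
    n = len(samples)
    k = np.sum(np.abs(samples - x) <= V / 2)
    return (k / n) / V

rng = np.random.default_rng(1)
samples = rng.standard_normal(50_000)

# True N(0, 1) density at x = 0 is 1/sqrt(2*pi), about 0.3989.
print(density_estimate(0.0, samples, V=0.2))
```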


Density Estimation (cont'd)

Justification of equation (4):

We assume that p(x) is continuous and that region R is so small that p does not vary significantly within R. Since p(x) is approximately constant over R, it can be taken outside the integral:

$$\int_{\mathcal{R}} p(x')\,dx' \cong p(x)\,V \qquad (4)$$


Since $p(x) \cong p(x') = $ constant within R, we have (written for $\mathbb{R}^3$):

$$\int_{\mathcal{R}} p(x')\,dx' \cong p(x') \int_{\mathcal{R}} dx' = p(x')\,\mu(\mathcal{R})$$

where $\mu(\mathcal{R})$ is a surface (area) in $\mathbb{R}^2$, a volume in $\mathbb{R}^3$, and a hypervolume in $\mathbb{R}^n$. Hence

$$\int_{\mathcal{R}} p(x')\,dx' \cong p(x)\,V \quad \text{and} \quad p(x) \cong \frac{k}{nV}$$


Condition for convergence

The fraction k/(nV) is a space-averaged value of p(x). Convergence to p(x) is obtained only if V approaches zero.

$$\lim_{V \to 0,\, k = 0} p(x) = 0 \quad (\text{if } n \text{ fixed})$$

This is the case where no samples are included in R: it is an uninteresting case!

$$\lim_{V \to 0,\, k \neq 0} p(x) = \infty$$

In this case, the estimate diverges: it is also an uninteresting case!
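These two degenerate limits are easy to see numerically. The sketch below (illustrative, assuming NumPy and standard-normal data) fixes n and shrinks V:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100                        # fixed sample size
samples = rng.standard_normal(n)

x = 0.0
for V in [1.0, 0.1, 0.001, 1e-6]:
    k = int(np.sum(np.abs(samples - x) <= V / 2))
    est = (k / n) / V
    print(f"V={V:g}  k={k}  estimate={est:g}")
# As V -> 0 with n fixed, eventually k = 0 and the estimate is 0;
# if a sample happened to sit inside R, the estimate would instead
# blow up like 1/V.
```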


The volume V needs to approach 0 anyway if we want to use this estimation.

Practically, V cannot be allowed to become arbitrarily small, since the number of samples is always limited.

One will have to accept a certain amount of variance in the ratio k/n.

Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty. To estimate the density at x, we form a sequence of regions R1, R2, … containing x: the first region contains one sample, the second two samples, and so on.

Let Vn be the volume of Rn, kn the number of samples falling in Rn, and pn(x) the nth estimate for p(x):

$$p_n(x) = \frac{k_n/n}{V_n} \qquad (7)$$


Three necessary conditions must hold if we want $p_n(x)$ to converge to p(x):

$$1)\ \lim_{n \to \infty} V_n = 0 \qquad 2)\ \lim_{n \to \infty} k_n = \infty \qquad 3)\ \lim_{n \to \infty} k_n/n = 0$$

There are two different ways of obtaining sequences of regions that satisfy these conditions:

(a) Shrink an initial region, e.g. $V_n = 1/\sqrt{n}$, and show that $p_n(x) \to p(x)$. This is called the Parzen-window estimation method.

(b) Specify $k_n$ as some function of n, such as $k_n = \sqrt{n}$; the volume $V_n$ is grown until it encloses the $k_n$ nearest neighbors of x. This is called the $k_n$-nearest-neighbor estimation method.
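Method (b) can be sketched in a few lines. The following (an illustrative 1-D version, assuming NumPy; the function name is mine) chooses kn = √n and grows an interval around x until it encloses the kn nearest samples:

```python
import numpy as np

def knn_density_estimate(x, samples):
    """Method (b): choose k_n = sqrt(n) and grow the interval around x
    until it encloses the k_n nearest samples; then p_n(x) = (k_n/n)/V_n."""
    n = len(samples)
    k_n = int(np.sqrt(n))
    dists = np.sort(np.abs(samples - x))
    V_n = 2 * dists[k_n - 1]   # length of the smallest centered interval
    return (k_n / n) / V_n

rng = np.random.default_rng(3)
samples = rng.standard_normal(100_000)
print(knn_density_estimate(0.0, samples))  # true N(0,1) density at 0 ~ 0.3989
```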


Parzen Windows

The Parzen-window approach to density estimation assumes that the region $\mathcal{R}_n$ is a d-dimensional hypercube with edge length $h_n$, so that

$$V_n = h_n^d$$

Let $\varphi(u)$ be the following window function:

$$\varphi(u) = \begin{cases} 1 & |u_j| \le \tfrac{1}{2},\ j = 1, \dots, d \\ 0 & \text{otherwise} \end{cases}$$

Then $\varphi\big((x - x_i)/h_n\big)$ is equal to unity if $x_i$ falls within the hypercube of volume $V_n$ centered at x, and equal to zero otherwise.


The number of samples in this hypercube is:

$$k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$$

Substituting $k_n$ into equation (7), we obtain the following estimate:

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\!\left(\frac{x - x_i}{h_n}\right)$$

$p_n(x)$ estimates p(x) as an average of functions of x and the samples $x_i$ (i = 1, …, n). These window functions can be quite general!
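The hypercube window and the resulting estimate can be sketched as follows (an illustrative implementation, assuming NumPy; the function names are mine):

```python
import numpy as np

def phi(u):
    """Hypercube window: 1 if every coordinate satisfies |u_j| <= 1/2."""
    u = np.atleast_2d(u)
    return np.all(np.abs(u) <= 0.5, axis=1).astype(float)

def parzen_estimate(x, samples, h_n):
    """p_n(x) = (1/n) sum_i (1/V_n) phi((x - x_i)/h_n), with V_n = h_n^d."""
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    V_n = h_n ** d
    return phi((x - samples) / h_n).sum() / (n * V_n)

rng = np.random.default_rng(4)
samples = rng.standard_normal((20_000, 1))   # 1-D data, d = 1

# True N(0, 1) density at x = 0 is about 0.3989.
print(parzen_estimate(np.array([0.0]), samples, h_n=0.25))
```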


Illustration

The behavior of the Parzen-window method:

Case where $p(x) \sim N(0, 1)$. Let

$$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \quad \text{and} \quad h_n = \frac{h_1}{\sqrt{n}} \quad (n > 1)$$

where $h_1$ is a known parameter. Thus

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n}\, \varphi\!\left(\frac{x - x_i}{h_n}\right)$$

is an average of normal densities centered at the samples $x_i$.


Numerical results:

For n = 1 and $h_1 = 1$:

$$p_1(x) = \varphi(x - x_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - x_1)^2/2} \sim N(x_1, 1)$$

For n = 10 and h = 0.1, the contributions of the individual samples are clearly observable!
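The Gaussian-window version with the shrinking width hn = h1/√n can be sketched like this (illustrative, assuming NumPy and standard-normal training data):

```python
import numpy as np

def parzen_gaussian(x, samples, h1):
    """Parzen estimate with Gaussian window phi(u) = exp(-u^2/2)/sqrt(2*pi)
    and shrinking width h_n = h1 / sqrt(n)."""
    n = len(samples)
    h_n = h1 / np.sqrt(n)
    u = (x - samples) / h_n
    phi = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    return phi.sum() / (n * h_n)

rng = np.random.default_rng(5)
samples = rng.standard_normal(10_000)
print(parzen_gaussian(0.0, samples, h1=1.0))  # ~0.40: the N(0,1) density at 0
```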


Analogous results are also obtained in two dimensions, as illustrated.


Case where $p(x) = \lambda_1\, U(a, b) + \lambda_2\, T(c, d)$ (unknown density): a mixture of a uniform and a triangle density.


Classification example

In classifiers based on Parzen-window estimation:

We estimate the densities for each category and classify a test point by the label corresponding to the maximum posterior.

The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.
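A minimal two-class sketch of such a classifier, assuming NumPy, Gaussian windows, synthetic data, and equal priors (so the maximum posterior reduces to the maximum estimated class-conditional density):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Gaussian-window Parzen estimate built from one class's samples."""
    u = (x - samples) / h
    return (np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)).sum() / (len(samples) * h)

def classify(x, class_samples, h=0.5):
    """Pick the class whose estimated density at x is largest
    (equal priors assumed, so this is the maximum posterior)."""
    return int(np.argmax([parzen_density(x, s, h) for s in class_samples]))

rng = np.random.default_rng(6)
class_samples = [rng.normal(-2.0, 1.0, 500),   # class 0
                 rng.normal(+2.0, 1.0, 500)]   # class 1

print(classify(-2.0, class_samples))  # -> 0
print(classify(+2.0, class_samples))  # -> 1
```

With unequal priors, one would multiply each estimated density by its prior before taking the argmax.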
