Chapter 4 (Part 1): Non-Parametric Classification

Topics: Introduction, Density Estimation, Parzen Windows

All materials used in this course were taken from the textbook "Pattern Classification" by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher.
Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Ch4 (Part 1)
Introduction
All parametric densities are unimodal (they have a single local maximum), whereas many practical problems involve multimodal densities.

Nonparametric procedures can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.

There are two types of nonparametric methods:
- estimating the class-conditional densities p(x | \omega_j)
- bypassing density estimation and going directly to a-posteriori probability estimation
Density Estimation
Basic idea:

The probability that a vector x will fall in region R is:

    P = \int_R p(x') \, dx'    (1)

P is a smoothed (or averaged) version of the density function p(x). If we have a sample of size n, the probability that exactly k of the n points fall in R is given by the binomial law:

    P_k = \binom{n}{k} P^k (1 - P)^{n - k}    (2)

and the expected value for k is:

    E(k) = nP    (3)
The ML estimate of P, i.e. the value maximizing P_k, is reached for

    \hat{P} = k / n

Therefore, the ratio k/n is a good estimate for the probability P and hence for the density function p.

If p(x) is continuous and the region R is so small that p does not vary significantly within it, we can write:

    \int_R p(x') \, dx' \simeq p(x) V    (4)

where x is a point within R and V is the volume enclosed by R.
Combining equations (1), (3), and (4) yields:

    p(x) \simeq \frac{k / n}{V}
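The first ingredient of this estimate, k/n ≈ P, can be checked numerically. The sketch below is a minimal Monte Carlo illustration, assuming a standard normal density p(x) and the region R = [-1, 1] (both arbitrary choices for the example): it draws n samples and compares the relative frequency k/n with the true probability mass P in R.

```python
import math
import random

random.seed(0)

# Assumed setup: p(x) is the standard normal density and R = [-1, 1].
n = 100_000
a, b = -1.0, 1.0

# Draw n samples from p(x) and count how many fall in R.
k = sum(1 for _ in range(n) if a <= random.gauss(0.0, 1.0) <= b)

# The relative frequency k/n estimates P (recall E(k) = nP, equation (3)).
P_hat = k / n
P_true = math.erf(1.0 / math.sqrt(2.0))  # Phi(1) - Phi(-1) for N(0, 1)

print(f"k/n = {P_hat:.4f}, true P = {P_true:.4f}")
```

With n = 100,000 samples the standard deviation of k/n is about sqrt(P(1-P)/n) ≈ 0.0015, so the two printed values agree to roughly two decimal places.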
Density Estimation (cont'd)

Justification of equation (4):

We assume that p(x) is continuous and that region R is so small that p does not vary significantly within R. Since p(x) is then approximately constant over R, it can be taken out of the integral:

    \int_R p(x') \, dx' \simeq p(x) V    (4)
where V(R) is an area in IR^2, a volume in IR^3, and a hypervolume in IR^n.

Since p(x) \simeq p(x') = constant over R, we have (in IR^3):

    \int_R p(x') \, dx' = p(x') \int_R dx' = p(x') \, V(R)

and therefore

    p(x) \simeq \frac{k / n}{V}
Condition for convergence

The fraction k/(nV) is a space-averaged value of p(x); p(x) itself is obtained only if V approaches zero.

    \lim_{V \to 0,\, k = 0} p(x) = 0    (if n fixed)

This is the case where no samples are included in R: it is an uninteresting case!

    \lim_{V \to 0,\, k \neq 0} p(x) = \infty

In this case, the estimate diverges: it is an uninteresting case!
The volume V needs to approach 0 anyway if we want to use this estimation.

Practically, V cannot be allowed to become arbitrarily small, since the number of samples is always limited.

One will have to accept a certain amount of variance in the ratio k/n.

Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty. To estimate the density at x, we form a sequence of regions R_1, R_2, ... containing x: the first region is used with one sample, the second with two samples, and so on.

Let V_n be the volume of R_n, k_n the number of samples falling in R_n, and p_n(x) the n-th estimate for p(x):

    p_n(x) = \frac{k_n / n}{V_n}    (7)
Three necessary conditions should apply if we want p_n(x) to converge to p(x):

    1) \lim_{n \to \infty} V_n = 0
    2) \lim_{n \to \infty} k_n = \infty
    3) \lim_{n \to \infty} k_n / n = 0

There are two different ways of obtaining sequences of regions that satisfy these conditions:

(a) Shrink an initial region by specifying the volume V_n as a function of n, such as V_n = 1/\sqrt{n}, and show that p_n(x) \to p(x). This is called "the Parzen-window estimation method".

(b) Specify k_n as a function of n, such as k_n = \sqrt{n}; the volume V_n is grown until it encloses k_n neighbors of x. This is called "the k_n-nearest-neighbor estimation method".
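Approach (b) can be sketched in one dimension, where the "volume" grown around x is simply the length of the smallest interval centered at x that contains k_n samples; the toy sample set below is an assumption for illustration, not from the text.

```python
import math

def knn_density_1d(x, samples, k=None):
    """k_n-nearest-neighbor density estimate in one dimension.

    The volume V_n around x is the length of the smallest interval
    centered at x containing k_n samples; p_n(x) = (k_n / n) / V_n.
    """
    n = len(samples)
    if k is None:
        k = max(1, round(math.sqrt(n)))  # k_n = sqrt(n), as in the text
    # Distance from x to each sample; the k-th smallest sets the radius.
    dists = sorted(abs(x - s) for s in samples)
    radius = dists[k - 1]
    V = 2.0 * radius  # interval (x - radius, x + radius)
    return (k / n) / V

# Toy data: a cluster near 0 should give a higher estimate there
# than in the sparse region around 4.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 2.0, 2.1, 5.0, -3.0, 0.05]
print(knn_density_1d(0.0, samples))  # density near the cluster
print(knn_density_1d(4.0, samples))  # density in a sparse region
```

Note that V_n adapts to the data: it is small where samples are dense and large where they are sparse, which is exactly the behavior that distinguishes method (b) from the fixed-volume schedule of method (a).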
Parzen Windows
The Parzen-window approach to estimating densities assumes that the region R_n is a d-dimensional hypercube:

    V_n = h_n^d    (h_n: length of an edge of R_n)

Let \varphi(u) be the following window function:

    \varphi(u) = 1    if |u_j| \le 1/2,  j = 1, ..., d
    \varphi(u) = 0    otherwise

\varphi((x - x_i)/h_n) is equal to unity if x_i falls within the hypercube of volume V_n centered at x, and equal to zero otherwise.
The number of samples in this hypercube is:

    k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)

By substituting k_n in equation (7), we obtain the following estimate:

    p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)

p_n(x) estimates p(x) as an average of functions of x and the samples x_i (i = 1, ..., n). These window functions can be quite general!
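The hypercube estimator above can be sketched directly; the 2-D toy samples and the edge length h = 1 below are illustrative assumptions.

```python
def phi_hypercube(u):
    """Unit hypercube window: 1 if every |u_j| <= 1/2, else 0."""
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0

def parzen_estimate(x, samples, h):
    """Parzen-window density estimate with a hypercube window.

    p_n(x) = (1/n) * sum_i (1/V_n) * phi((x - x_i) / h),  V_n = h**d.
    """
    n = len(samples)
    d = len(x)
    V = h ** d
    total = sum(
        phi_hypercube([(xj - sj) / h for xj, sj in zip(x, s)])
        for s in samples
    )
    return total / (n * V)

# Toy 2-D data: four points near the origin, one far away.
samples = [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.0), (0.0, -0.1), (3.0, 3.0)]
print(parzen_estimate((0.0, 0.0), samples, h=1.0))  # 4 of 5 samples in the cube
```

At (0, 0) with h = 1 the unit square centered at x captures the four nearby samples, giving p_n = (4/5)/1 = 0.8, while at (3, 3) only one sample contributes.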
Illustration

The behavior of the Parzen-window method:

– Case where p(x) \sim N(0, 1)

Let \varphi(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2) and h_n = h_1/\sqrt{n} (n > 1), where h_1 is a known parameter.

Thus:

    p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)

is an average of normal densities centered at the samples x_i.
– Numerical results:

For n = 1 and h_1 = 1:

    p_1(x) = \frac{1}{h_1} \varphi\!\left(\frac{x - x_1}{h_1}\right) = \frac{1}{\sqrt{2\pi}} e^{-(x - x_1)^2/2} \sim N(x_1, 1)

For n = 10 and h = 0.1, the contributions of the individual samples are clearly observable!
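The Gaussian-window estimate with h_n = h_1/\sqrt{n} can be sketched as follows; the sample location x_1 = 0.5 is an arbitrary choice for the n = 1 check.

```python
import math

def gaussian_window(u):
    """phi(u) = (1 / sqrt(2*pi)) * exp(-u**2 / 2)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def parzen_gaussian(x, samples, h1=1.0):
    """p_n(x) = (1/n) * sum_i (1/h_n) * phi((x - x_i)/h_n), h_n = h1/sqrt(n)."""
    n = len(samples)
    hn = h1 / math.sqrt(n)
    return sum(gaussian_window((x - xi) / hn) for xi in samples) / (n * hn)

# For n = 1 and h1 = 1 the estimate reduces to N(x1, 1), as in the text:
x1 = 0.5
p = parzen_gaussian(x1, [x1], h1=1.0)
print(p)  # value of N(x1, 1) at its mode, i.e. 1/sqrt(2*pi)
```

Evaluating at the single sample gives the mode of N(x_1, 1), i.e. 1/\sqrt{2\pi} \approx 0.3989, matching the n = 1 formula above.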
Analogous results are also obtained in two dimensions, as illustrated.
– Case where p(x) = \lambda_1 U(a, b) + \lambda_2 T(c, d) (an unknown density: a mixture of a uniform and a triangle density)
Classification example
In classifiers based on Parzen-window estimation:

We estimate the densities for each category and classify a test point by the label corresponding to the maximum posterior.

The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.
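A classifier along these lines can be sketched as follows; the Gaussian window, the fixed bandwidth h = 0.5, the equal priors, and the toy one-dimensional training sets are all assumptions for illustration.

```python
import math

def gaussian_window(u):
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, h):
    """One-dimensional Parzen estimate with a Gaussian window of width h."""
    n = len(samples)
    return sum(gaussian_window((x - xi) / h) for xi in samples) / (n * h)

def classify(x, class_samples, priors, h=0.5):
    """Pick the label maximizing the unnormalized posterior
    P(w_j) * p_n(x | w_j), with each density estimated by Parzen windows."""
    scores = {
        label: priors[label] * parzen_density(x, samples, h)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)

# Hypothetical training data: class "A" around 0, class "B" around 4.
class_samples = {"A": [-0.3, 0.0, 0.2, 0.1], "B": [3.8, 4.0, 4.3, 4.1]}
priors = {"A": 0.5, "B": 0.5}

print(classify(0.1, class_samples, priors))  # -> "A"
print(classify(3.9, class_samples, priors))  # -> "B"
```

Changing the window function or the bandwidth h reshapes each estimated density and therefore moves the decision boundary, which is the dependence the figure is meant to illustrate.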