

Pattern Recognition 33 (2000) 2045–2054

Gamma mixture models for target recognition

Andrew R. Webb*

Defence Evaluation and Research Agency, St Andrews Road, Malvern, Worcestershire WR14 3PS, UK

Received 22 August 1997; received in revised form 14 October 1998; accepted 27 August 1999

Abstract

This paper considers a mixture model approach to automatic target recognition using high-resolution radar measurements. The mixture model approach is motivated from several perspectives, including requirements that the target classifier is robust to uncertainty in amplitude scaling, rotation and translation of the target. Estimation of the model parameters is achieved using the expectation-maximisation (EM) algorithm. Gamma mixtures are introduced and the re-estimation equations derived. The models are applied to the classification of high-resolution radar range profiles of ships and results compared with a previously published self-organising map approach. Crown Copyright © 2000 Published by Elsevier Science Ltd on behalf of Pattern Recognition Society. All rights reserved.

Keywords: Mixture models; Gamma distribution; Discrimination; Automatic target recognition; Radar; Reliability; Imprecision

1. Introduction

This paper addresses the problem of automatic target recognition using mixture models for the density of the radar cross-section (RCS) of targets and applies these models to the classification of radar range profiles of ships. Our aim is to obtain an estimate of the posterior probability of class membership, p(j|x), for class j and measurement vector x. We seek to achieve this via Bayes' theorem,

p(j|x) = p(x|j) p(j) / Σ_j p(x|j) p(j),   (1)

where p(x|j) are the class-conditional densities and p(j) are the class priors.

The reason why we seek direct estimates of p(j|x), rather than to design a classifier with class decisions as the output (for example, a nearest-neighbour classifier), is that the classifier will in general form part of a hierarchical decision-making process. Classification is not an end in itself and will lead to some actions. The cost of making decisions will need to be considered. Also, supplementary domain-specific information (such as intelligence reports) may need to be combined with sensor-derived results in the decision-making process.

* Tel.: +44-1684-894490; fax: +44-1684-894384. E-mail address: webb@signal.dera.gov.uk (A.R. Webb).
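The posterior computation in Eq. (1) is a one-liner once the class-conditional densities have been evaluated at x. A minimal numerical sketch (the density and prior values below are hypothetical, purely for illustration):

```python
import numpy as np

def posterior(class_cond, priors):
    """Bayes' theorem, Eq. (1): posterior p(j|x) from the values of the
    class-conditional densities p(x|j) at a fixed x and the priors p(j)."""
    joint = np.asarray(class_cond, dtype=float) * np.asarray(priors, dtype=float)
    return joint / joint.sum()

# Two classes with equal priors; suppose p(x|1) = 0.5 and p(x|2) = 0.1 at x.
post = posterior([0.5, 0.1], [0.5, 0.5])  # -> [5/6, 1/6]
```

With equal priors the posterior reduces to the normalised class-conditional densities, which is why a rule that compares the p(x|j) directly (as in Section 4) gives the same class decisions.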

Consequently, assessment of the performance of the classifier should not simply be based on error rate and should take into account the estimates of the posterior probabilities, particularly in a target recognition context. However, error rate or misclassification rate is the most common measure of performance of a supervised classification rule. Yet error rate suffers from several disadvantages. For example, it treats all correct classifications as equally valuable and all incorrect classifications as equally bad. In the two-class case, if comparing estimated probabilities with some threshold t, error rate makes no distinction between estimates close to t and those far from t, provided that they are on the same side of t. Furthermore, simple error rate gives no indication of the local accuracy of the rule: for example, if points with measurement vector x are predicted to have probability p of being in class 1, then error rate does not allow us to deduce that about a proportion p of objects with this measurement vector will lie in class 1. Related to this, error rate will not distinguish between a rule which is bad because the classes have considerable overlap in the measurement space and a rule which is bad because the class membership probabilities of highly separable classes are poorly estimated. Hand [1] discusses the shortcomings of error rate in more detail.

0031-3203/00/$20.00 Crown Copyright © 2000 Published by Elsevier Science Ltd on behalf of Pattern Recognition Society. All rights reserved. PII: S0031-3203(99)00195-8

¹ Each component of the vector x_i corresponds to a measurement on a fixed-size range gate. Thus, large targets occupy more range gates than small targets.

However, estimation of the posterior probabilities (probabilities of class membership), p(j|x), through estimation of the class-conditional densities, p(x|j), and Bayes' theorem (1) is not without its difficulties. The main difficulty is in the estimation of target densities for data lying in a high-dimensional space. For the range profile data considered later in this paper, the data vectors x_i lie in a 130-dimensional space¹ (x_i ∈ R^130) and nonparametric methods of density estimation (for example, kernel methods) are impractical due to the unrealistic amounts of data required for accurate density estimation. An alternative approach is to trade flexibility for robustness and to use some simple parametric forms for the densities (for example, multivariate normal distributions). Yet these may impose too much rigidity on the allowable forms of the density.

The approach considered in this paper is to model the density as a mixture of simple parametric forms whose parameters are estimated through some optimisation procedure. The advantage of such an approach, as we see in Section 2, is that it may be used to incorporate desirable features such as robustness to translation of a test image with respect to some centroid position (due to uncertainty in the true centroid) and robustness to scale. Further, angular dependence of the target return can naturally be expressed as a mixture, and different scattering models may also be incorporated into the same framework.

In order to clarify the difference between the mixture model approach and a template matching approach, consider the following illustration. Let each class, c, be characterised by N templates, μ_i^c, i = 1,…,N; c = 1,…,C. Given a measurement vector, y, we may compute the distance, D, of y from class c as

D(y, c) = min_i d(y, μ_i^c),

where d(·,·) is the distance between two vectors. If the centroid position of y is unknown, then it may be shifted to that of the template μ_i^c, or a minimum over all shifts taken. The pattern y is assigned to the class corresponding to the minimum-distance template. In the approach presented in this paper, we regard the vectors μ_i^c as characterising component distributions and represent the density for class c as

p_c(x) = Σ_{i=1}^{N} p(x|μ_i^c) π_i,

where π_i are the component weights, Σ_i π_i = 1. Thus, all components contribute to the estimate of the density which is used in the classification process. In particular, for a test pattern, y, we consider all possible shifts that the density p(x|μ_i^c) may give rise to:

p_c(y) = Σ_{i=1}^{N} Σ_s p(y_s|μ_i^c) π_i,

where p(y_s|μ_i^c) is the density evaluated for y with centroid shifted to position s. Thus, instead of a single template characterising the class of a test pattern, all templates contribute to a greater or lesser extent. This is explored further in the following section.
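The distinction between the two rules can be made concrete with a small sketch. Everything here (the templates, the Euclidean distance, and the Gaussian kernel standing in for the component densities) is an illustrative assumption, not the paper's choice of gamma components:

```python
import numpy as np

# Hypothetical class templates mu_i^c: 2 classes, N = 3 templates each, d = 4.
templates = {
    0: np.array([[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0]]),
    1: np.array([[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.2, 0.8], [0.0, 0.0, 0.4, 0.6]]),
}

def template_distance(y, c):
    """Template matching: D(y, c) = min_i d(y, mu_i^c), Euclidean d."""
    return min(np.linalg.norm(y - mu) for mu in templates[c])

def mixture_density(y, c, sigma=0.5):
    """Mixture view: every template contributes a component centred on
    mu_i^c (isotropic Gaussian kernel as a stand-in), weights pi_i = 1/N."""
    mus = templates[c]
    return sum(np.exp(-np.sum((y - mu) ** 2) / (2 * sigma ** 2)) for mu in mus) / len(mus)

y = np.array([0.9, 0.1, 0.0, 0.0])  # test pattern close to class 0
```

The template rule keeps only the single nearest template, whereas the mixture density pools evidence from all N components of a class; the two agree on easy cases like this one but can differ when several moderately close templates collectively outweigh one slightly closer template from another class.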

The basic distribution that we use at the heart of our model is a gamma distribution, whose use may be motivated from physical arguments and results of empirical investigations.

The outline of the paper is as follows. Section 2 describes the mixture model approach to target classification and how robustness of the model to parameter uncertainty may be incorporated into the mixture model framework. Section 3 derives the re-estimation procedure for the mixture model parameters; Section 4 presents results of applying the approach to the classification of radar range profiles. The paper concludes with a summary of the main results and discussion of ways forward.

2. Mixture models

In constructing the target probability density function, we wish to exploit expected structure in the data, without making assumptions that are too restrictive or unrealistic, such as normal distributions. We do this through mixture distributions. We shall assume that data are gathered on each of C target types as a function of aspect angle. For simplicity, we assume a single angle coordinate (data gathered as a function of azimuth at a fixed elevation), although in principle both azimuth and elevation may be considered. The data comprise radar cross-section range profiles, and a set of d measurements corresponding to d range gates encompassing the target is extracted. Thus, for each class, j, the training data set is {x_i, i = 1,…,N_j; x_i ∈ R^d}, where N_j is the number of patterns in class j. The x_i are usually ordered by angle.

On test, we cannot assume that we know the physical position of the target precisely. Therefore, we must have a strategy for extracting an appropriate set of d range gate measurements from the range profile.

We start from the premise that we wish to model the probability density function of the target return, p_c(x), c = 1,…,C, where x is a set of target measurements. We choose to model this density as a finite mixture [2,3]. Mixture models have been used with success in a wide variety of applications [4] and here we consider their application to radar target modelling. We motivate their use by addressing, in turn, several important issues concerning the properties of the probability density function. We drop the suffix c since we are not concerned with a specific class.

² We assume that the training data comprise measurements on a fixed number of range gates that span the target.

2.1. Angle dependence

We express the overall distribution for a given target, p(x), as

p(x) = ∫ p(x|θ) p(θ) dθ   (2)

for some variable θ, notionally an angle coordinate. A finite sample approximation to Eq. (2), based on g components, is

p(x) = Σ_{i=1}^{g} p(x|θ_i) π_i,   (3)

where Σ π_i = 1 and π_i = p(θ_i) dθ_i. Eq. (3) is a finite mixture distribution, and the problem of calculating p(x) is replaced by one of calculating, or specifying, the component distributions, p(x|θ_i), the parameters associated with these distributions, and the component distribution weights, π_i. Given a data set, and some simple parametric form for each of the p(x|θ_i), a maximum likelihood approach, perhaps based on the EM algorithm, may be employed.

Here, we shall regard the variable θ as the angle of view of the target. This may be a multivariate quantity comprising azimuth and elevation, but in the examples of Section 4 we consider azimuth only. Thus, Eq. (2) states that for a given angle of look, θ, the target return, x, is a sample from a distribution, p(x|θ), and the total distribution is an integral over all angles of look.

The interpretation of θ as angle allows a simple scheme to be employed for obtaining the parameters of the mixture (3), as follows. Suppose that we have data gathered as a function of azimuth. Partition the data set according to angle into g equal-sized sectors. Use the data within each sector to estimate the parameters (maximum likelihood estimation) of each component distribution separately (assuming a simple parametric form, such as exponential or gamma). The component distribution weights are set to π_i = 1/g. This scheme does not maximise the likelihood of the data given the mixture model (3), but provides an approximation in which the likelihood of the data within each sector, given the local model, is maximised.

Thus, a mixture model arises naturally if we consider the target density to be a function of angle. Initialisation of the mixture components can be achieved using a data set labelled by angle. Refinement of the mixture components using a full maximum likelihood approach is described in Section 3.
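The sector-based initialisation just described can be sketched as follows. As a simplifying assumption, per-sector method-of-moments estimates (mean κ̂ per gate and order m̂ = κ̂²/s²) stand in for the per-sector maximum likelihood fit, and the data are synthetic:

```python
import numpy as np

def init_by_sector(angles, X, g):
    """Partition the data into g equal angle sectors and fit one gamma
    component (per-gate mean kappa and order m) per sector; the
    component weights are set to pi_i = 1/g."""
    sectors = np.minimum((np.asarray(angles) / 360.0 * g).astype(int), g - 1)
    comps = []
    for i in range(g):
        Xi = X[sectors == i]
        kappa = Xi.mean(axis=0)
        var = Xi.var(axis=0) + 1e-12       # guard against zero variance
        m = kappa ** 2 / var               # moment estimate of the gamma order
        comps.append((kappa, m, 1.0 / g))
    return comps

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 360.0, 400)                # synthetic azimuths
X = rng.gamma(shape=2.0, scale=1.0, size=(400, 5))   # synthetic 5-gate profiles
comps = init_by_sector(angles, X, g=4)
```

These per-sector estimates then serve as the starting point for the full EM refinement of Section 3.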

2.2. Uncertainty in centroid position

If we gather our training data² by positioning a vehicle on a turntable, then we know the range gate in which the physical centre of the target lies. Given a test range profile, then ideally we need to extract the test image x so that the centre of the target lies in the same range gate. We then evaluate p(x) for each class. However, in practice, we do not know the physical centre of the target (we may not know it for the training data if the target translates as it rotates; see Section 4), but we can calculate a centroid using the test range profile. We then need to extract d range gates around this centroid by deciding which range gate (1 to d) to position the centroid in.

Consider a single component of the mixture, p(x|θ_i), that we have used to model the radar cross-section density. Suppose that we generate data (that is, sample range profile returns) from this component by random sampling. For each sample generated, calculate its centroid. We use a simple centre-of-mass calculation; for a range profile, x, we take the position of the centroid as

s_n = (Σ_{j=1}^{N} j x_j) / (Σ_{j=1}^{N} x_j).

The centroid positions will not necessarily be in the same range gate as the centroid of the distribution mean, but there will be a distribution, p(s_n), of centroid positions, s_n ∈ {1,…,d}.

We may partition the distribution, p(x|θ_i), according to the positions of the centroids of the generated data,

p(x|θ_i) = Σ_{s_n} π_i(s_n) p(x|θ_i, s_n),   (4)

where p(x|θ_i, s_n) is the probability distribution of samples whose centroids are in cell s_n.

The quantity π_i(s_n) is the probability that the centroid occurs in s_n for data generated by p(x|θ_i). It is determined by the distribution p(x|θ_i) and may be estimated from that distribution through Monte Carlo simulation, by generating data and noting the distribution of centroid positions. The probabilities, π_i(s_n), depend on i, the mixture component.

Therefore, the overall target distribution may be written as the mixture

p(x) = Σ_{i=1}^{g} π_i Σ_{s_n=1}^{d} π_i(s_n) p(x|θ_i, s_n).   (5)
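The Monte Carlo estimate of π_i(s_n) described above (sample from the component, compute each sample's centre-of-mass centroid, histogram the resulting gates) can be sketched as follows; the nine-gate component parameters are hypothetical:

```python
import numpy as np

def centroid_gate(x):
    """Centre-of-mass centroid s_n = (sum_j j x_j) / (sum_j x_j),
    with 1-based gate indices, rounded to the nearest gate."""
    j = np.arange(1, len(x) + 1)
    return int(round((j * x).sum() / x.sum()))

def centroid_distribution(kappa, m, n_samples=2000, seed=0):
    """Estimate pi_i(s_n) by sampling profiles from an independent-gamma
    component with per-gate means kappa and order parameters m."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(kappa) + 1)        # index 0 unused (1-based gates)
    for _ in range(n_samples):
        x = rng.gamma(shape=m, scale=kappa / m)   # per-gate mean kappa, order m
        counts[centroid_gate(x)] += 1
    return counts / n_samples

# Hypothetical component: target energy concentrated around gate 5 of 9.
kappa = np.array([0.1, 0.1, 0.5, 2.0, 4.0, 2.0, 0.5, 0.1, 0.1])
m = np.full(9, 2.0)
p_s = centroid_distribution(kappa, m)
```

Most of the estimated mass falls within a couple of gates of the centroid of the component mean, which mirrors the empirical finding reported in Section 4 (over 99% of samples within two range gates).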


³ For example, the test data may be gathered at a different range than that of the training data in a carefully controlled experiment.

To evaluate the above equation for a test sample x, we may position the centroid of x over each allowable centroid position (that is, each possible centroid position permitted by the distributions p(x|θ_i)) in turn, then

1. sum over centroid positions (many may not contribute, since π_i(s_n) = 0 when the centroid position is not allowable for that component),
2. sum over components.

Thus, we have expressed the problem of uncertainty in pattern translation as a mixture formulation (Eqs. (4) and (5)). In theory, the particular algorithm used to calculate the centroid position is unimportant. Although the distribution of centroid positions does depend on this algorithm, we integrate over all possible centroid positions. In practice, however, it may be preferable to have an estimate of centroid position with a narrow distribution, π_i(s_n). This would reduce computational costs, since for many values of s_n, π_i(s_n) = 0: samples with that centroid do not occur for p(x|θ_i).

2.3. Uncertainty in amplitude

We may need to scale the test image to normalise the data. This may arise because the test samples may be measured at a different signal-to-noise ratio than the training data.³ If this is the case, how should the test image be scaled in order to bring it in line with the distribution of training images? Again, we may use our model of the data to determine an appropriate range of amplitude values. In the above analysis, we partitioned the data generated by a component of the mixture distribution according to the centroid position. We may also partition according to the overall amplitude level. Let A denote some overall amplitude measurement of a pattern. Then a component distribution may be written

p(x|θ_i) = Σ_A p(x|θ_i, A) π_i(A),

where p(x|θ_i, A) is the probability distribution of samples whose overall amplitude is A, assumed to take discrete values.

To evaluate for a given x, we scale it to have amplitude A and substitute into

p(x) = Σ_{i=1}^{g} π_i Σ_A p(x|θ_i, A) π_i(A).

Thus, in a similar manner to uncertainty in shifts, we can treat uncertainty in amplitude scaling by formulating a mixture model.

For computational convenience, we need to discretize A. Also, we need to define a scheme for calculating the amplitude of a pattern. As in the centroid situation, in principle it does not matter how we calculate A, given x, since we integrate over the distribution of A. In practice, it may be important. If our estimator of amplitude has a narrow distribution, then we could go to the one extreme in which the amplitude distribution is approximated by a single cell at the distribution mean. Thus, all test images are scaled to the distribution mean.

2.4. Target distribution

We still need to specify the forms of the mixture component distributions. We take as our component distribution the gamma distribution, given by

p(x) = [m / ((m − 1)! κ)] (m x / κ)^{m−1} exp(−m x / κ),

where κ is the mean and m is an order parameter. This is motivated by the fact that this distribution has, as special cases, the Swerling 1 and 2 models (m = 1, Rayleigh statistics), Swerling 3 and 4 (m = 2) and the non-fluctuating target (m → ∞), although other values of m have been observed empirically [5].
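A direct transcription of this density, for integer order m (an illustrative helper, not code from the paper):

```python
import math

def gamma_pdf(x, kappa, m):
    """p(x) = [m / ((m-1)! kappa)] (m x / kappa)^(m-1) exp(-m x / kappa):
    gamma density with mean kappa and integer order parameter m."""
    return (m / (math.factorial(m - 1) * kappa)
            * (m * x / kappa) ** (m - 1)
            * math.exp(-m * x / kappa))

# m = 1 recovers the exponential density exp(-x/kappa)/kappa
# (Swerling 1 and 2, i.e. Rayleigh amplitude statistics).
```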

A simple multivariate form is to assume independence of range gates, so that the component distribution is given by

p(x|θ_i) = Π_{j=1}^{d} [m_{ji} / ((m_{ji} − 1)! κ_{ij})] (m_{ji} x_j / κ_{ij})^{m_{ji}−1} exp(−m_{ji} x_j / κ_{ij}),

where m_{ji} is the order parameter for range gate j of component i and κ_{ij} is the mean. Thus, for each component, there are two parameters associated with each range gate.

We may of course represent the component distribution itself as a mixture in order to model different scattering distributions. Further, we may partition it into two basic components,

p(x|θ_i) = π_t t(x) + π_n n(x),   (6)

where t(x) is a target distribution, n(x) a noise or clutter distribution, and π_t and π_n are prior probabilities (π_t + π_n = 1). This allows the presence of noise to be taken into account. We would need to assume a form for the noise distribution. This will depend on the application (we could assume Gaussian noise or Weibull clutter, for example) and may include free parameters to be specified or estimated. The target distribution could itself be a mixture or a single-component gamma density.


Fig. 1. A representation of a target mixture model trained using gathered data for each target separately.

Fig. 2. Evaluating the probability density function for a test range profile, x. x(s_i, A_j) is the test image extracted so that its centroid is at s_i and its amplitude is A_j.

2.5. Summary

In this section we have described several different ways in which mixture models may arise in a target modelling situation:

1. to incorporate angular dependencies;
2. to model the uncertainty in target centroid position by summing over unknown centroid positions;
3. to model the uncertainty in overall amplitude variation by summing over unknown amplitude values;
4. to incorporate both noise and target models to reduce sensitivity to noise on test.

The advantage of the mixture model approach is that it provides a simple, flexible model for the distribution of target returns, while also incorporating into the same framework features, such as the ability to handle uncertainty in parameter estimates, that are important in a practical implementation.

The main philosophy behind the mixture model approach is that we take account of uncertainty in estimates of quantities such as angle, amplitude and centroid. In a typical template matching approach, we would take our test pattern x, suitably scaled (perhaps normalised to unit magnitude) and centred, and compare it with a set of reference templates. The pattern x is assigned to the class of the template that has minimum distance (in some sense) from x. In the mixture model approach, we acknowledge that there is uncertainty in our estimates of amplitude and centroid and estimate the probability density function at x, which includes summations over angle, amplitude and centroid.

How do we use mixture models in practice? In training a model, we specify the form of the component distributions, p(x|θ_i) (either as single-component exponential or gamma, or as a mixture), and the number of component distributions in the mixture (ideally, this is determined from the data), and use a maximum likelihood approach to determine the model parameters for each class in turn.

The training model is depicted schematically in Fig. 1. In the figure, p(x|θ_i) is expressed as a sum of a noise distribution and a target distribution, which itself is modelled as a sum of two distributions. In the examples of Section 4, we simply use a single-component multivariate gamma distribution for p(x|θ_i).

Once the parameters of the component distributions have been obtained, we consider each component in turn to estimate (perhaps through Monte Carlo simulation) the distribution of centroid positions and amplitudes.

On test, we need to extract a test image from a set of range gates and scale it appropriately. For each component, we extract the image so that its centroid lies in each of the allowable centroid positions for that component. Similarly, we scale it to each possible amplitude (suitably quantised). We then evaluate the probability density function for the component and sum over all amplitudes and centroid positions. This is shown in Fig. 2 for a model where component i gives rise to three centroid positions and two amplitude values.


⁴ We are assuming that each mixture component can be represented by a product of univariate gamma distributions. This does not imply that we are making the independence assumption for the mixture distribution, but only for each component. Thus, interpreting θ_i as an angle indicator, we are assuming that locally the range gates are independent, but not globally, since scatterers move from one range gate to another as the target rotates.

3. Parameter estimation

We now address the problem of estimating the parameters of the mixture components. We shall adopt a maximum likelihood approach and derive update equations for the model parameters for gamma mixture models.

Given a set of n observations (x_1,…,x_n), the likelihood function is

L_0(Ψ) = Π_{i=1}^{n} Σ_{j=1}^{g} π_j p(x_i|Ψ_j),   (7)

where Ψ denotes the set of parameters {Ψ_1,…,Ψ_g} and Ψ_j are the parameters associated with component j. In general, it is not possible to solve ∂L_0/∂Ψ = 0 explicitly for the parameters of the model and iterative schemes must be employed. One approach for maximising the likelihood L_0(Ψ) is to use a general class of iterative procedures known as EM (expectation-maximisation) algorithms, introduced in the context of missing data estimation by Dempster et al. [6], though the method had appeared in many forms previously [7].

The EM procedure is well known (see, for example, Ref. [3]) and we do not present the stages in the derivation of the re-estimation equations here, simply quoting the results.

Let Ψ_k^(m) be the estimate of the parameters of the kth component at the mth stage of the iteration. Let w_ij be the probability of group j given x_i and Ψ^(m), given by

w_ij = π_j^(m) p(x_i|Ψ_j^(m)) / Σ_j π_j^(m) p(x_i|Ψ_j^(m)).   (8)

The re-estimate of the component weights, π_j, is given by

π̂_j = (1/n) Σ_{i=1}^{n} w_ij.   (9)
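Eqs. (8) and (9) translate directly into code; the density values below are hypothetical placeholders for p(x_i|Ψ_j):

```python
import numpy as np

def e_step(densities, weights):
    """Eq. (8): responsibilities w_ij = pi_j p(x_i|Psi_j) / sum_j pi_j p(x_i|Psi_j).
    `densities` is an (n, g) array of component density values at each x_i."""
    joint = densities * weights                     # broadcast over components
    return joint / joint.sum(axis=1, keepdims=True)

def update_weights(w):
    """Eq. (9): pi_j = (1/n) sum_i w_ij."""
    return w.mean(axis=0)

# Three observations, two components, equal initial weights (illustrative).
dens = np.array([[0.9, 0.1],
                 [0.2, 0.8],
                 [0.5, 0.5]])
w = e_step(dens, np.array([0.5, 0.5]))
pi_new = update_weights(w)
```

Each row of w sums to one, and the updated weights remain a valid probability vector, as required by the mixture model.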

We assume that each range gate has a mixture of gamma distributions, with components given by the distribution (independence assumption)⁴

p(x|Ψ_k) = Π_{j=1}^{d} [m_{jk} / ((m_{jk} − 1)! κ_{kj})] (m_{jk} x_j / κ_{kj})^{m_{jk}−1} exp(−m_{jk} x_j / κ_{kj}),

where x = (x_1,…,x_d)^T is the vector of measurements; the parameters of the kth group, Ψ_k, are {μ_k, m_{jk}}, where μ_k = (κ_{k1},…,κ_{kd})^T and m_{jk} is the value of m for the kth group and jth range gate. The re-estimation of the mean vector of the kth mixture component is

μ_k = Σ_{i=1}^{n} w_ik x_i / Σ_{i=1}^{n} w_ik   (10)

and the m_{jk} satisfy

l(m_{jk}) = − Σ_{i=1}^{n} w_ik log(x_{ij}/κ_{kj}) / Σ_{i=1}^{n} w_ik,   (11)

where l(m) = log(m) − ψ(m) and ψ(m) is the digamma function.

The function l(m) satisfies

l(m) ≈ 1/(2m) + 1/(12m²) + ⋯ as m → ∞,

l(m) ≈ 1/(2m) + const. as m → 0.

Therefore, there is at least one solution m > 0 for which l(m) = δ, for any δ > 0. A simple gradient scheme should be sufficient to obtain a solution.
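Since l(m) = log(m) − ψ(m) is positive and strictly decreasing on (0, ∞), the solution of l(m) = δ is unique and easy to bracket. A self-contained sketch, using bisection rather than the gradient scheme suggested above, and a standard series approximation for the digamma function:

```python
import math

def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x and the
    asymptotic series for x >= 6."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def l(m):
    """l(m) = log(m) - psi(m), as in Eq. (11)."""
    return math.log(m) - digamma(m)

def solve_order(delta, lo=1e-6, hi=1e6):
    """Solve l(m) = delta for m > 0 by bisection in log space."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if l(mid) > delta:      # l is decreasing: the root lies above mid
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

m = solve_order(0.1)  # the order m for which l(m) = 0.1
```

Bisection trades the step-size tuning of a gradient scheme for guaranteed convergence on the bracketed root.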

Thus, the operations in the maximisation step of the EM procedure are to estimate μ_k using Eq. (10) and then to solve Eq. (11) for the m_{jk}, using an iterative gradient procedure.

4. Application to radar range profiles

4.1. Data

The data consist of range profiles of ships of seven class types. There are 19 files, each of which contains range profiles of a ship recorded as the ship turns through 360°. The aspect angle of the ship varies smoothly as one steps through the file, and the centroid of the ship drifts across range gates as the ship turns. Each range profile consists of 130 measurements on radar returns from 130 range gates of 3 m in extent. The radar operating frequency was 10 GHz. These data sets have been used by Luttrell [8] and details are given in Table 1. The data sets are divided into 11 training files and 8 test files. As we can see from the table, there is no test data available for class 7. Several other classes have more than one rotation available for training and testing.

Figs. 3–6 show three-dimensional plots of the sector mean templates (from 40 sectors) as a function of range gate number for four selected ship classes: ships 1, 3, 5 and 6. The extent of the target, in terms of the number of range gates occupied, varies as a function of angle. This is clearly seen in Fig. 3 in particular.


Table 1
Details of data files (number of profiles per file)

Target class    Train                 Test
1               3334                  2085
2               2636, 4128            2116
3               2604, 2248, 2923      2476, 3619
4               8879, 3516            4966, 2560
5               3872                  3643
6               1696                  2216
7               1839                  (none)

Fig. 3. RCS versus range gate number for sector means of ship 1 (40 templates).

Fig. 4. RCS versus range gate number for sector means of ship 3 (40 templates).

Fig. 5. RCS versus range gate number for sector means of ship 5 (40 templates).

Fig. 6. RCS versus range gate number for sector means of ship 6 (40 templates).

In each of the experiments below, 1200 samples over 360° from each of the training set files were used in model training. This enables comparison to be made with the results of Luttrell [8].

4.2. Implementation details

In each experiment, a mixture model density was constructed for each file, and those densities corresponding to the same target type were combined with equal weight. For a mixture model with g components, the parameters of the mixture model were initialised by dividing the data into g equal angle sectors and, for each sector separately, calculating the maximum likelihood estimates of the mean and order parameters of the gamma distribution. The EM algorithm was then run on the whole training data set and the final value of the log-likelihood, log(L), at convergence recorded.

Fig. 7. Mean RCS (solid line) and mean ± standard deviation (dashed lines) for profile 30 of ship 1.

Fig. 8. Mean RCS (solid line) and mean ± standard deviation (dashed lines) for profile 15 of ship 3.

Fig. 9. Mean RCS (solid line) and mean ± standard deviation (dashed lines) for profile 20 of ship 5.

Fig. 10. Mean RCS (solid line) and mean ± standard deviation (dashed lines) for profile 20 of ship 6.

There has been considerable research into model selection for multivariate mixtures. We adopted a simple approach and took our model selection criterion to be

AIC2 = −2 log(L) + 2 N_p,

where N_p is the number of parameters in the model,

N_p = 2(d + 1)g − 1.

This criterion has been considered by Bozdogan and Sclove [9]; other measures are described and assessed by Celeux and Soromenho [10].
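Computing the criterion once the log-likelihood at convergence has been recorded is immediate; the values below are illustrative only, not results from the paper:

```python
def aic2(log_likelihood, d, g):
    """AIC2 = -2 log(L) + 2 N_p, with N_p = 2(d+1)g - 1 as given above,
    for a g-component mixture over d range gates."""
    n_params = 2 * (d + 1) * g - 1
    return -2.0 * log_likelihood + 2.0 * n_params

# d = 130 range gates, g = 4 components, hypothetical log-likelihood.
score = aic2(-1000.0, d=130, g=4)  # -> 2000 + 2*1047 = 4094.0
```

The model (number of components g) with the smallest AIC2 over the candidate fits is selected, penalising the improved likelihood of larger mixtures by their parameter count.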

Once the components of the mixture model have beendetermined, samples from the component distributionswere generated and the distribution of the centroidsmeasured for each mixture component. It was found thatmost samples ('99%) lay within two range gates of theposition of the centroid of the component mean. There-fore on test, a test pattern was shifted to all positionswithin 2 range gates of the component mean.

The amplitude of the test pattern was scaled to the amplitude of the component mean.
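The shift-and-scale test procedure above can be sketched as follows. This is an assumption-laden illustration: `logpdf` stands for any scorer of a profile under one mixture component, and both the circular shift via `np.roll` and the total-amplitude matching are our simplifications, since the paper does not spell out the edge handling or the exact normalisation used.

```python
import numpy as np


def aligned_score(test_profile, comp_mean, logpdf, max_shift=2):
    # Shift the test profile to every position within max_shift range
    # gates, rescale its amplitude to that of the component mean, and
    # keep the best-scoring alignment under the component density.
    best = -np.inf
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(test_profile, s)          # circular shift (simplification)
        scaled = shifted * (comp_mean.sum() / shifted.sum())  # amplitude match
        best = max(best, logpdf(scaled))
    return best
```

With five candidate shifts per component, the extra cost at test time is modest compared with evaluating the mixture densities themselves.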

4.3. Ship profiles

Below we give results of the method applied to the ship data. We report confusion matrices despite their limitations as measures of classifier performance. Ideally, we would like to say how well the estimate of the posterior density, p̂(j | x), approximates the true density p(j | x). A measure of this discrepancy is the reliability of the classifier [11] or imprecision [1]. However, p(j | x) is unknown in practice and techniques for evaluating bounds on imprecision are currently under investigation.

Classification is performed by assigning x to the class j for which p̂(x | j) is the greatest.
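A sketch of this decision rule, combining the per-file mixture densities of each target type with equal weight as described in Section 4.2 (the `file_loglik` scorer and the dictionary layout are illustrative assumptions, not the paper's code):

```python
import numpy as np
from scipy.special import logsumexp


def classify(x, class_files, file_loglik):
    # class_files: {class j: list of per-file mixture models for that class}
    # file_loglik(model, x): returns log p(x | model) for one file's mixture.
    # Per-file densities of the same target type are combined with equal
    # weight, and x is assigned to the class with the largest p(x | j).
    scores = {}
    for j, models in class_files.items():
        ll = np.array([file_loglik(m, x) for m in models])
        # Equal-weight average of the per-file densities, in log space.
        scores[j] = logsumexp(ll) - np.log(len(models))
    return max(scores, key=scores.get)
```

Because the per-file densities are averaged rather than multiplied, a class with several files is not unfairly favoured or penalised for the number of files it contributes.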

Figs. 7-10 give illustrations of sample component distributions (a mean profile and standard deviations about the mean) after maximum likelihood estimation for four ship types.


Table 2
Gamma mixture results (top) and self-organising network results (bottom) [8] for a test-set pattern ±40° bow-on or stern-on to the radar

Predicted          True class
class          1      2      3      4      5      6

1           84.4    8.9    8.5    4.2    2.3   10.3
2           12.6   86.1    5.8    6.0    0.5   10.7
3            0.6    0.2   68.9    3.4    5.5   23.2
4            1.0    2.0    5.5   73.1   17.0   11.4
5            0.8    0.0    5.3    8.8   57.1    0.9
6            0.3    0.0    2.9    1.7   11.3   37.7
7            0.3    2.9    3.1    2.8    6.3    5.8

1           72.0    4.4    2.9    5.2    1.7    6.0
2           11.3   70.7   10.1    4.8    2.8    8.5
3            9.3   10.0   67.7   10.0   24.6   20.2
4            0.6    3.0    1.8   59.6    4.4    0.7
5            0.2    2.3    3.3    9.0   57.9    2.3
6            2.5    5.9    9.8    4.5    6.2   59.7
7            4.2    3.6    4.4    6.7    2.3    2.5

Fig. 11. AIC2 as a function of the number of mixture components for each file in the training set.

Fig. 12. Classification rate as a function of the number of mixture components (equal numbers of components per class).

Table 3
Gamma mixture results for the whole test set for a model chosen according to minimum values of AIC2

Predicted          True class
class          1      2      3      4      5      6

1           59.1    5.6    6.5    5.7    0.5   10.0
2           12.1   67.2    3.2    1.7    0.0    5.3
3            5.7    8.9   64.2    5.5    3.4   28.9
4            9.9   10.5   11.0   72.5   25.2   20.7
5            0.1    4.3    8.3   12.9   67.3    3.1
6           12.8    2.8    4.0    1.0    2.3   30.7
7            0.2    0.7    2.9    0.6    1.3    1.2

4.3.1. Experiment 1

In this experiment, the classifier is trained with 40 components per file and tested on the test data files with the ship orientated so that it is in the range ±40° bow-on or stern-on to the radar. This restriction is applied so that the results may be compared with those given by Luttrell [8], where a classifier based on a self-organising network was designed. Table 2 reproduces the results of Luttrell [8] and gives the mixture model results alongside. There are 10,510 test patterns.

The classification rate on test of the mixture model approach is 68.6%, compared to 63.9% given by Luttrell [8]. There are some notable differences in performance: the classification rate is much better on classes 1 and 2 and much poorer on class 6. However, given that there are 10,510 test patterns, the improvement is significant.

4.3.2. Experiment 2

In this experiment, mixture models were trained with varying numbers of components and tested on the whole of the test set (there is no angle restriction). Again, 1200 samples per file were used. Fig. 11 plots the model selection criterion AIC2 as a function of the number of components for each of the training files, and Fig. 12 plots the classification rate as a function of the number of components, where each model has the same number of components. The minimum of AIC2 occurs for the 11 files when the numbers of components are (80, 80, 70, 70, 60, 80, 70, 70, 50, 70, 100). Thus, each model requires a different number of components. The classification performance for this model is given in Table 3.

The overall classification rate is 64% for the selected model. This is about the level of the plateau region in Fig. 12. Thus, the AIC2 criterion has provided a model that is close to the best test-set performance over the range of model sizes considered.

About the Author

ANDREW WEBB received the B.Sc. degree in mathematics from the University of Manchester in 1976 and the Ph.D. degree from the University of St. Andrews in 1980. Since 1979, he has worked for the Defence Evaluation and Research Agency (formerly the Royal Signals and Radar Establishment) at Malvern, England. His major research interests include statistical pattern recognition, pattern processing applications, neural networks and radar signal processing. He is a Fellow of the Institute of Mathematics and Its Applications (C.Math.) and a Fellow of the Royal Statistical Society. He is an honorary member of staff at the University of Birmingham. He is the author of the book Statistical Pattern Recognition, published by Arnold, and he has published papers in a number of his specialities. He is an associate editor of the journals IEEE Transactions on Pattern Analysis and Machine Intelligence and Statistics and Computing.

5. Summary and discussion

In this paper we have developed a gamma mixture model approach to the classification of radar range profiles. Uncertainty in amplitude scaling of the test pattern and unknown orientation and location of the target can be taken into account in the mixture model framework. The approach has been applied to the classification of ship profiles and improved performance (in terms of error rate) achieved compared with previously published results.

However, error rate is only one measure of performance. It does not tell us how good the classifier is. We may get an error rate of 40%, say, but if the classes are indeed separable (for the given features), then we could improve performance by better classifier design. However, if the Bayes error rate is itself 40%, then we are wasting effort trying to improve classifier design; we must seek additional variables or features. This is one of the motivations behind the work in this paper: to provide estimates of the posterior probabilities that may be combined with other information (for example, misclassification costs, domain-specific data, intelligence reports) in a hierarchical manner for decision making.

Thus, we have developed a semi-parametric density estimator that incorporates robustness features and makes use of physical/empirical scattering distributions. The approach may not give better performance in terms of error rate than some other classifiers (although it is clearly better than single-component parametric distributions), but it should give better approximations to the true posterior probabilities. This is difficult to assess and remains an issue for continuing study. Other areas for further work include assessment on two-dimensional (SAR/ISAR) images, sensitivity to noise and clutter, and modelling correlation between components using latent variable models.

6. Summary

The aim of this research is to develop an approach to target classification that produces estimates of the posterior probabilities that may be used as part of a hierarchical classification scheme. This paper adopts a mixture model approach to modelling the class-conditional densities of high-resolution radar measurements. The mixture model approach is motivated from several perspectives, including requirements that the target classifier is robust to uncertainty in amplitude scaling, rotation and translation of the target. The components of the mixture are chosen to be gamma distributions, since these have physical and empirical justification.

Estimation of the mixture model parameters is achieved using the expectation-maximisation (EM) algorithm, and the re-estimation equations are derived. The models are applied to the classification of high-resolution radar range profiles of ships, and results are compared with a previously published self-organising map approach, with improved performance (in terms of error rate) achieved.

References

[1] D.J. Hand, Construction and Assessment of Classification Rules, Wiley, Chichester, 1997.

[2] B.S. Everitt, D.J. Hand, Finite Mixture Distributions, Chapman & Hall, London, 1981.

[3] D.M. Titterington, A.F.M. Smith, U.E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, New York, 1985.

[4] A.R. Webb, Statistical Pattern Recognition, Arnold, London, 1999.

[5] M. Skolnik, Introduction to Radar Systems, 2nd Edition, McGraw-Hill, New York, 1980.

[6] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. Ser. B 39 (1977) 1-38.

[7] X.-L. Meng, D. van Dyk, The EM algorithm: an old folk-song sung to a fast new tune (with discussion), J. R. Statist. Soc. Ser. B 59 (3) (1997) 511-567.

[8] S.P. Luttrell, Using self-organising maps to classify radar range profiles, in: Proceedings of the Fourth International Conference on Artificial Neural Networks, Cambridge, IEE, 1995, pp. 335-340.

[9] H. Bozdogan, S. Sclove, Multi-sample cluster analysis using Akaike's information criterion, Ann. Inst. Statist. Math. 36 (1984) 163-180.

[10] G. Celeux, G. Soromenho, An entropy criterion for assessing the number of clusters in a mixture model, J. Classification 13 (1996) 195-212.

[11] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York, 1992.
