
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 41, NO. 3, MARCH 2003 675

Unsupervised Classification of Radar Images Using Hidden Markov Chains and Hidden Markov Random Fields

Roger Fjørtoft, Member, IEEE, Yves Delignon, Member, IEEE, Wojciech Pieczynski, Marc Sigelle, Member, IEEE, and Florence Tupin

Abstract—Due to the enormous quantity of radar images acquired by satellites and through shuttle missions, there is an evident need for efficient automatic analysis tools. This paper describes unsupervised classification of radar images in the framework of hidden Markov models and generalized mixture estimation. Hidden Markov chain models, applied to a Hilbert–Peano scan of the image, constitute a fast and robust alternative to hidden Markov random field models for spatial regularization of image analysis problems, even though the latter provide a finer and more intuitive modeling of spatial relationships. We here compare the two approaches and show that they can be combined in a way that conserves their respective advantages. We also describe how the distribution families and parameters of classes with constant or textured radar reflectivity can be determined through generalized mixture estimation. Sample results obtained on real and simulated radar images are presented.

Index Terms—Generalized mixture estimation, hidden Markov chains, hidden Markov random fields, radar images, unsupervised classification.

I. INTRODUCTION

BOTH visual interpretation and automatic analysis of data from imaging radars are complicated by a fading effect called speckle, which manifests itself as a strong granularity in detected images (amplitude or intensity). For example, simple classification methods based on thresholding of gray levels are generally inefficient when applied to speckled images, due to the high degree of overlap between the distributions of the different classes. Speckle is caused by the constructive and destructive interference between waves returned from elementary scatterers within each resolution cell. It is generally modeled as a multiplicative random noise [1], [2]. At full resolution (single-look images), the standard deviation of the intensity is

Manuscript received December 4, 2001; revised October 3, 2002. This work was supported by the Groupe des Ecoles des Télécommunications (GET) in the framework of the project “Segmentation d'images radar.”

R. Fjørtoft is with the Norwegian Computing Center (NR), 0314 Oslo, Norway (e-mail: [email protected]).

Y. Delignon is with the Ecole Nouvelle d'Ingénieurs en Communications (ENIC), Département Electronique, 59658 Villeneuve d'Ascq, France (e-mail: [email protected]).

W. Pieczynski is with the Institut National des Télécommunications, Département CITI, 91011 Evry, France (e-mail: [email protected]).

M. Sigelle and F. Tupin are with the Ecole Nationale Supérieure des Télécommunications, Département TSI, 75634 Paris, France (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TGRS.2003.809940

equal to the local mean reflectivity, corresponding to an SNR of 0 dB. One possible way to overcome this problem is to exploit spatial dependencies among different random variables via a hidden Markov model.

Markov random fields are frequently used to model stochastic interactions among classes and to allow a global Bayesian optimization of the classification result [3], according to criteria such as the maximum a posteriori (MAP) or the maximum posterior marginal (MPM) [3]–[10]. However, the computing time is considerable and often prohibitive with this approach. A substantially quicker alternative is to use Markov chains, which can be adapted to two-dimensional (2-D) analysis through a Hilbert–Peano scan of the image [11]–[14].

In the case of unsupervised classification, the statistical properties of the different classes are unknown and must be estimated. For each of the Markov models cited above, we can estimate characteristic parameters with iterative methods such as expectation–maximization (EM) [15]–[17], stochastic expectation–maximization (SEM) [18], or iterative conditional estimation (ICE) [19], [20]. Classic mixture estimation consists in identifying the parameters of a set of Gaussian distributions corresponding to the different classes of the image. The weighted sum (or mixture) of the distributions of the different classes should approach the overall distribution of the image. In generalized mixture estimation, we are not limited to Gaussian distributions; each class distribution may belong to any family in a finite set of distribution families [14], [21]. For each class we thus seek both the distribution family and the parameters that best describe its samples. In this paper, we consider some distribution families that are well adapted to single- or multilook amplitude radar images and to classes with or without texture [2]. By texture, we here refer to spatial variations in the underlying radar reflectivity and not the variations due to speckle. According to the multiplicative noise model, the observed intensity is the product of the radar reflectivity and the speckle.

In this study, we limit ourselves to the ICE estimation method and the MPM classification criterion. When analyzing an image, the only input parameters entered by the user are the number of classes and the list of distribution families that are allowed in the generalized mixture. The estimation and classification schemes are described separately for the two Markov models. We compare their performances and show in particular that the Markov chain method can compete with the Markov random field method in terms of estimation and classification accuracy, while being much faster. We furthermore introduce a hybrid



method that combines the speed and robustness of the estimation and classification methods based on Markov chains with the fine spatial modeling of Markov random fields. Tests have been conducted on both real and simulated synthetic aperture radar (SAR) data, with different compositions of distribution families.

The paper is organized as follows. In Section II, we introduce the hidden Markov random field and hidden Markov chain models, as well as the different probabilities that are needed for parameter estimation and classification. For simplicity, we first describe MPM classification in Section III, assuming known parameters for both Markov models. The ICE parameter estimation methods are presented in Section IV, first in the framework of classic mixture estimation (only Gaussian distributions) and then for generalized mixture estimation (several possible distribution families). Specific adaptations to radar images are mentioned. All the algorithms are described in sufficient detail to be implemented by readers that are familiar with statistical image processing. Estimation and classification results obtained on real and simulated SAR images with the two approaches are reported in Section V. We here also describe the improvements obtained by combining Markov chain methods and Markov random field methods in the estimation phase. Our conclusions are given in Section VI.

II. MODELS

Let $S$ be a finite set corresponding to the pixels of an image. We consider two random processes $Y = (Y_s)_{s \in S}$ and $X = (X_s)_{s \in S}$. $Y$ represents the observed image and $X$ the unknown class image. Each random variable $X_s$ takes its values from the finite set of classes $\Omega = \{\omega_1, \ldots, \omega_K\}$, whereas each $Y_s$ takes its values in the set of real numbers. We denote realizations of $X$ and $Y$ by $x = (x_s)_{s \in S}$ and $y = (y_s)_{s \in S}$, respectively.¹

We here suppose that the random variables $Y_s$ are conditionally independent with respect to $X$, and that the distribution of each $Y_s$ conditional on $X$ is equal to its distribution conditional on $X_s$.² In practical applications, the assumption of conditional independence is often difficult to justify. However, numerous algorithms based on hidden Markov models that make this assumption have been successfully applied to real images.

All the distributions of $Y$ conditional on $X = x$ are then determined by the distributions of the $Y_s$ with respect to the classes $\omega_k$, which will be denoted $f_{\omega_k}$:

$$P(Y = y \mid X = x) = \prod_{s \in S} f_{x_s}(y_s). \qquad (1)$$

¹As a practical example, consider a radar image Y = y covering an area composed of agricultural fields. We can imagine a corresponding class image X = x, where the different crops are identified by discrete labels. Each observed pixel amplitude depends on several factors, including the characteristic mean radar reflectivity and texture of the underlying class, the speckle phenomenon, the transfer function of the imaging system, and thermal noise. The latter is in general negligible.

²In the case of radar images, this means that we suppose uncorrelated speckle, i.e., that the radar system has a constant transfer function. This is generally not the case, but it is often used as an approximation.

Fig. 1. Pixel neighborhood and associated clique families in the case of four-connectivity.

As indicated above, we will model the interactions among the random variables $X_s$ by considering that the prior distribution of $X$ can be modeled by a Markov process. We refer to it as a hidden Markov model, as $X$ is not directly observable. The classification problem consists in estimating $X = x$ from the observation $Y = y$.

A. Hidden Markov Random Fields

Let $V_s$ denote a neighborhood of the pixel $s$ whose geometric shape is independent of $s$. $X$ is a Markov random field if and only if

$$P(X_s = x_s \mid X_r = x_r,\ r \in S \setminus \{s\}) = P(X_s = x_s \mid X_r = x_r,\ r \in V_s) \qquad (2)$$

i.e., if the probability that the pixel $s$ belongs to a certain class conditional on the classes attributed to the pixels in the rest of the image is equal to the probability of $x_s$ conditional on the classes of the pixels in the neighborhood $V_s$.

Assuming that the probability of all possible realizations of $X$ is strictly positive, the Hammersley–Clifford theorem [4] establishes the equivalence between a Markov random field, defined with respect to a neighborhood structure, and a Gibbs field whose potentials are associated with the cliques. The elementary relationships within the neighborhood $V_s$ are given by the system of cliques $C$. Fig. 1 shows the neighborhood and the associated cliques in the case of four-connectivity.

For a Gibbs field

$$P(X = x) = \frac{1}{Z} \exp\left(-U(x)\right) \qquad (3)$$

where $U(x)$ is the energy of $x$, and $Z$ is a normalizing factor. The latter is in practice impossible to compute due to the very high number of possible configurations $x$. The Hammersley–Clifford theorem makes it possible to relate local and global probabilities [7], [21]. Indeed, the local conditional probabilities can be written as

$$P(X_s = x_s \mid X_r = x_r,\ r \in V_s) = \frac{1}{Z_s} \exp\left(-U_s(x_s \mid x_r,\ r \in V_s)\right) \qquad (4)$$

where $U_s$ is the local energy function, and $Z_s$ is a normalizing factor. There are several ways of defining the local energy $U_s$, and finding a good compromise between its modeling power (which increases with its complexity) and the possibilities for efficient estimation of it in unsupervised classification methods (which decrease with its complexity) remains an open problem. For simplicity, we will


restrict ourselves to the Potts model, four-connectivity, and cliques of type $\{s, s'\}$ [7], in which case $U_s$ is the number of pixels $r \in V_s$ for which $x_r \neq x_s$, minus the number of pixels $r \in V_s$ for which $x_r = x_s$, multiplied by a regularity parameter $\beta$. The global energy $U(x)$ in (3) is computed in a similar way, except that we sum the potentials over all vertical and horizontal cliques in the image. As will be explained later, it can be advantageous to have different regularity parameters $\beta_h$ and $\beta_v$ horizontally and vertically.

Bayes' rule and the conditional independence of the samples (1) allow us to write the a posteriori probability as

$$P(X = x \mid Y = y) = \frac{1}{Z'} \exp\left(-U(x)\right) \prod_{s \in S} f_{x_s}(y_s) \qquad (5)$$

and the corresponding local conditional distributions as

$$P(X_s = x_s \mid X_r = x_r,\ r \in V_s,\ Y = y) = \frac{1}{Z'_s}\, f_{x_s}(y_s) \exp\left(-U_s(x_s \mid x_r,\ r \in V_s)\right) \qquad (6)$$

where $Z'$ and $Z'_s$ are normalizing factors obtained by summing over all possible numerators, similar to $Z$ and $Z_s$. Substituting (1) into (5), we see that $X$ conditional on $Y = y$ is a Gibbs field (3).

It is not possible to create a posteriori realizations of $X$ according to (5) directly, but they can be approximated iteratively with the Metropolis algorithm [22] or the Gibbs sampler [5]. We will here only consider the Gibbs sampler, which includes the following steps.

• We start from an initial class image $x^{[0]}$.
• We sweep the image repeatedly until convergence. For each iteration and for each pixel $s$:
  a) the local a posteriori distribution given by (6) is computed;
  b) the pixel is attributed to a class drawn randomly according to this distribution.³

In a similar way, a priori realizations of $X$ obeying (3) can be computed iteratively using (4).
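A minimal sketch of one Gibbs sampler sweep for the posterior model, reusing `local_energy` from above and assuming an array `logf` of shape (rows, cols, K) that holds $\ln f_{\omega_k}(y_s)$ for each pixel and class (names and layout are ours):

```python
import numpy as np

def gibbs_sweep(x, logf, beta, rng):
    """One full sweep of the Gibbs sampler according to the local
    posterior distribution (6): p(k) proportional to f_k(y_s) * exp(-U_s(k))."""
    rows, cols, K = logf.shape
    for r in range(rows):
        for c in range(cols):
            # log of the unnormalized local posterior for each class
            logp = np.array([logf[r, c, k] - local_energy(x, (r, c), k, beta)
                             for k in range(K)])
            p = np.exp(logp - logp.max())       # subtract max to avoid overflow
            p /= p.sum()
            # random sampling according to this distribution (footnote 3)
            x[r, c] = rng.choice(K, p=p)
    return x
```

Starting from a random class image and sweeping until the global energy stabilizes yields one a posteriori realization; dropping the `logf` term samples from the prior (4) instead.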

B. Hidden Markov Chains

A 2-D image can easily be transformed into a one-dimensional chain, e.g., by sweeping the image line by line or column by column. Another alternative is to use a Hilbert–Peano scan [11], as illustrated in Fig. 2. Generalized Hilbert–Peano scans [12] can be applied to images whose length and width are not powers of two. In a slightly different sense than above, now let $X = (X_1, \ldots, X_N)$ and $Y = (Y_1, \ldots, Y_N)$ be the vectors of random variables ordered according to such a transformation of the class image and the observed image, respectively. Their realizations will consequently be denoted $x = (x_1, \ldots, x_N)$ and $y = (y_1, \ldots, y_N)$.

³Random sampling of a class according to a distribution can be done in the following way: We consider the interval [0,1] and attribute to each class a subinterval whose width is equal to the probability of that class. A uniformly distributed random number in [0,1] is generated, and the class is selected according to the subinterval in which this random number falls.

Fig. 2. Construction of a Hilbert–Peano scan for an 8 × 8 image. (a) Initialization. (b) Intermediate stage. (c) Result.
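The recursive construction illustrated in Fig. 2 can be sketched as follows (a rough illustration under the assumption of a square image whose side is a power of two; the function name is ours):

```python
def hilbert_scan(order):
    """Visiting order (row, col) of a Hilbert-Peano scan of a
    2**order x 2**order image, built by the recursion of Fig. 2."""
    if order == 0:
        return [(0, 0)]
    half = 2 ** (order - 1)
    sub = hilbert_scan(order - 1)
    path = [(c, r) for (r, c) in sub]                              # quadrant 1, transposed
    path += [(r + half, c) for (r, c) in sub]                      # quadrant 2, shifted up
    path += [(r + half, c + half) for (r, c) in sub]               # quadrant 3, shifted up/right
    path += [(half - 1 - c, 2 * half - 1 - r) for (r, c) in sub]   # quadrant 4, anti-transposed
    return path

# Example: order the pixels of a 512 x 512 image as a chain
# chain = [image[r, c] for (r, c) in hilbert_scan(9)]
```

Every step of the resulting path moves to a four-connected neighbor, so spatially close pixels tend to stay close in the chain.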

According to the definition, $X$ is a Markov chain if

$$P(X_{n+1} = x_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1) = P(X_{n+1} = x_{n+1} \mid X_n = x_n) \qquad (7)$$

for $n \in \{1, \ldots, N-1\}$. The distribution of $X$ will consequently be determined by the distribution of $X_1$, denoted by $\pi$, and the set of transition matrices $A^{(n)}$, whose elements are $a_{ij}^{(n)} = P(X_{n+1} = \omega_j \mid X_n = \omega_i)$. We will in the following assume that the probabilities

$$p_{ij} = P(X_n = \omega_i, X_{n+1} = \omega_j) \qquad (8)$$

are independent of $n$. The initial distribution then becomes

$$\pi_i = P(X_n = \omega_i) = \sum_{j=1}^{K} p_{ij} \qquad (9)$$

and the transition matrix is constant (independent of $n$) and given by

$$a_{ij} = P(X_{n+1} = \omega_j \mid X_n = \omega_i) = \frac{p_{ij}}{\pi_i}. \qquad (10)$$

Hence, the a priori distribution of $X$ is that of a stationary Markov chain. It is entirely determined by the parameters $p_{ij}$, and we can write

$$P(X = x) = \pi_{i_1}\, a_{i_1 i_2}\, a_{i_2 i_3} \cdots a_{i_{N-1} i_N} \qquad (11)$$

where $i_n$ is the index of the class of the $n$th element of the chain.

The so-called forward and backward probabilities

$$\alpha_n(i) = P(X_n = \omega_i, Y_1 = y_1, \ldots, Y_n = y_n) \qquad (12)$$

and

$$\beta_n(i) = P(Y_{n+1} = y_{n+1}, \ldots, Y_N = y_N \mid X_n = \omega_i) \qquad (13)$$

can be calculated recursively. Unfortunately, the original forward–backward recursions derived from (12) and (13) are subject to serious numerical problems [13], [23]. Devijver et al. [23] have proposed to replace the joint probabilities by a posteriori probabilities, in which case

$$\alpha_n(i) = P(X_n = \omega_i \mid Y_1 = y_1, \ldots, Y_n = y_n) \qquad (14)$$

and

$$\beta_n(i) = \frac{P(Y_{n+1} = y_{n+1}, \ldots, Y_N = y_N \mid X_n = \omega_i)}{P(Y_{n+1} = y_{n+1}, \ldots, Y_N = y_N \mid Y_1 = y_1, \ldots, Y_n = y_n)}. \qquad (15)$$


In the following, we use the numerically stable forward–backward recursions resulting from these approximations.

• Forward initialization:

$$\alpha_1(i) = \frac{\pi_i\, f_{\omega_i}(y_1)}{\sum_{j=1}^{K} \pi_j\, f_{\omega_j}(y_1)} \qquad (16)$$

for $i \in \{1, \ldots, K\}$.

• Forward induction:

$$\alpha_{n+1}(i) = \frac{f_{\omega_i}(y_{n+1}) \sum_{j=1}^{K} \alpha_n(j)\, a_{ji}}{\sum_{l=1}^{K} f_{\omega_l}(y_{n+1}) \sum_{j=1}^{K} \alpha_n(j)\, a_{jl}} \qquad (17)$$

for $n \in \{1, \ldots, N-1\}$ and $i \in \{1, \ldots, K\}$.

• Backward initialization:

$$\beta_N(i) = 1 \qquad (18)$$

for $i \in \{1, \ldots, K\}$.

• Backward induction:

$$\beta_n(i) = \frac{\sum_{j=1}^{K} \beta_{n+1}(j)\, a_{ij}\, f_{\omega_j}(y_{n+1})}{\sum_{l=1}^{K} f_{\omega_l}(y_{n+1}) \sum_{j=1}^{K} \alpha_n(j)\, a_{jl}} \qquad (19)$$

for $n \in \{N-1, \ldots, 1\}$ and $i \in \{1, \ldots, K\}$.

The joint probability of the classes of two subsequent elements given all the observations,

$$\psi_n(i, j) = P(X_n = \omega_i, X_{n+1} = \omega_j \mid Y = y) \qquad (20)$$

can be written as a function of the forward–backward probabilities:

$$\psi_n(i, j) = \frac{\alpha_n(i)\, a_{ij}\, f_{\omega_j}(y_{n+1})\, \beta_{n+1}(j)}{\sum_{l=1}^{K} \sum_{m=1}^{K} \alpha_n(l)\, a_{lm}\, f_{\omega_m}(y_{n+1})\, \beta_{n+1}(m)}. \qquad (21)$$

The marginal a posteriori probability, i.e., the probability of having class $\omega_i$ in element number $n$ given all the observations $y$, can then be obtained from (21):

$$\xi_n(i) = P(X_n = \omega_i \mid Y = y) = \sum_{j=1}^{K} \psi_n(i, j). \qquad (22)$$
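As a rough illustration, the recursions (16)–(19) and the probabilities (21)–(22), as reconstructed above, might be implemented as follows; the arrays `f` (per-element class likelihoods, shape (N, K), row n holding $f_{\omega_i}$ of the nth chain element), `a` (transition matrix), and `pi` (initial probabilities) are our notation:

```python
import numpy as np

def forward_backward(f, a, pi):
    """Numerically stable forward-backward recursions (16)-(19).
    Returns alpha, beta, the marginals xi (22) and joints psi (21)."""
    N, K = f.shape
    alpha = np.empty((N, K))
    beta = np.empty((N, K))
    alpha[0] = pi * f[0]
    alpha[0] /= alpha[0].sum()                      # (16)
    for n in range(N - 1):                          # (17)
        u = f[n + 1] * (alpha[n] @ a)
        alpha[n + 1] = u / u.sum()
    beta[N - 1] = 1.0                               # (18)
    for n in range(N - 2, -1, -1):                  # (19)
        v = a @ (f[n + 1] * beta[n + 1])
        beta[n] = v / (f[n + 1] * (alpha[n] @ a)).sum()
    # joint probabilities (21), normalized over (i, j) for each n
    psi = alpha[:-1, :, None] * a[None] * (f[1:, None, :] * beta[1:, None, :])
    psi /= psi.sum(axis=(1, 2), keepdims=True)
    xi = np.empty((N, K))
    xi[:-1] = psi.sum(axis=2)                       # marginals (22)
    xi[N - 1] = alpha[N - 1] * beta[N - 1]
    xi[N - 1] /= xi[N - 1].sum()
    return alpha, beta, xi, psi
```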

It can be shown [14] that the a posteriori distribution of $X$, i.e., $P(X = x \mid Y = y)$, is that of a nonstationary Markov chain, with transition matrices given by

$$P(X_{n+1} = \omega_j \mid X_n = \omega_i, Y = y) = \frac{\psi_n(i, j)}{\xi_n(i)}. \qquad (23)$$

Using (23), we can simulate a posteriori realizations of $X$ directly, i.e., without iterative procedures as in the case of hidden Markov random fields. The class $x_1$ of the first element of the chain is drawn randomly according to the marginal a posteriori distribution (22). Subsequently, for each new element $n \in \{2, \ldots, N\}$, the transition probabilities (23) are computed, the class $x_{n-1}$ of the precedent element being fixed, and $x_n$ is obtained by random sampling according to this distribution.
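Direct simulation of an a posteriori realization via (23) is then straightforward (continuing the hypothetical names from the sketch above):

```python
import numpy as np

def sample_posterior_chain(xi, psi, rng):
    """Draw one a posteriori realization of X using the nonstationary
    posterior transition matrices (23)."""
    N, K = xi.shape
    x = np.empty(N, dtype=int)
    x[0] = rng.choice(K, p=xi[0])                            # first element from (22)
    for n in range(1, N):
        trans = psi[n - 1, x[n - 1]] / xi[n - 1, x[n - 1]]   # one row of (23)
        x[n] = rng.choice(K, p=trans / trans.sum())          # renormalize for rounding
    return x
```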

III. CLASSIFICATION

Let us first assume that we know the distribution $f_{\omega_k}$ and the associated parameter vector $\theta_k$ of each class (in the case of a scalar Gaussian distribution, for example, $\theta_k = [\mu_k, \sigma_k^2]$, where $\mu_k$ is the mean value and $\sigma_k^2$ the variance), as well as the regularity parameters of the underlying Markov model ($\beta$ or $a_{ij}$). In a Bayesian framework, the goal of the classification is to determine the realization $x$ that best explains the observation $Y = y$, in the sense that it minimizes a certain cost function. Several cost functions can be envisaged, leading to different estimators, such as the MAP, which aims at maximizing the global a posteriori probability $P(X = x \mid Y = y)$, and the MPM, which consists in maximizing the posterior marginal distribution for each pixel, i.e., finding the mode of each local posterior distribution. In this study, we use MPM classification.

A. Hidden Markov Random Fields

In the case of hidden Markov random fields, the MPM is computed as follows [6].

• A series of a posteriori realizations of $X$ are computed iteratively, using the Gibbs sampler (or the Metropolis algorithm) based on the local a posteriori probability function (6). Each realization is obtained by generating an initial class image with a random generator and by performing a certain number of iterations with the Gibbs sampler (until approximate convergence of the global energy).
• For each pixel $s$, we retain the most frequently occurring class.

The required number of a posteriori realizations and iterations per realization will be discussed in Section V.

B. Hidden Markov Chains

The MPM solution can be calculated directly for hidden Markov chains, based on one forward–backward computation.

• For every element $n$ in the chain, and for every possible class $\omega_i$, we compute:
  a) forward probabilities $\alpha_n(i)$ (17);
  b) backward probabilities $\beta_n(i)$ (19);
  c) marginal a posteriori probabilities $\xi_n(i)$ (22).
• Each element $n$ is attributed to the class $\omega_i$ that maximizes $\xi_n(i)$.
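With the forward–backward sketch from Section II-B, the hidden Markov chain MPM reduces to two lines (same hypothetical names as before):

```python
alpha, beta, xi, psi = forward_backward(f, a, pi)   # one forward-backward pass
x_mpm = xi.argmax(axis=1)                           # mode of each marginal (22)
```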

IV. MIXTURE ESTIMATION

In practice, the regularity parameters and the parameters of the distributions of the classes are often unknown and must be estimated from the observation $Y = y$. As the distribution of $Y_s$ can be written as a weighted mixture of probability densities, $p(y_s) = \sum_{k=1}^{K} P(X_s = \omega_k)\, f_{\omega_k}(y_s)$, such a problem is called a mixture estimation problem. The problem is then double: we do not know the characteristics of the classes, and we do not know which pixels are representative for each class. We first present classic mixture estimation, where all classes are supposed to be Gaussian, and then introduce generalized mixture

Page 5: Unsupervised classification of radar images using hidden Markov ...

FJØRTOFTet al.: UNSUPERVISED CLASSIFICATION OF RADAR IMAGES 679

estimation, where several distribution families are possible for each class. There are several iterative methods for mixture estimation, including EM [15], SEM [18], and ICE [19], [20]. Here we will only consider ICE, which has some relations with EM [24]. The ICE method has been applied to a variety of image analysis problems, including classification of forest based on remote sensing images [25], segmentation of sonar images [26], segmentation of brain SPECT images [27], video segmentation [28], multiresolution image segmentation [29], and fuzzy image segmentation [30], [31]. In this paper, we also use the generalized ICE, which is an extension of the classic ICE to the case where the distribution family can vary with the class and is not known. We will only present what is necessary for the implementation, and not the underlying theory, which can be found in [14] for hidden Markov chains, and in [21] for hidden Markov random fields. Furthermore, we propose a special version of the ICE for hidden Markov random fields, where the estimation of the regularity parameter is inspired by the generalized EM method proposed in [32].

A. Hidden Markov Random Fields

The ICE algorithm iteratively creates a posteriori realizations and recalculates the class and regularity parameters. In the framework of classic mixture estimation, the following computation steps are carried out for each iteration.

• A certain number of a posteriori realizations (with index $p$) are computed with the Gibbs sampler, using the parameters obtained in the previous iteration (which define (6)).
• The class parameters $\theta_k^{[p]}$ are estimated for each a posteriori realization and then averaged to obtain $\theta_k$.
• The regularity parameter $\beta$ is estimated with a stochastic gradient approach [32]–[35], based on a series of a priori realizations computed with the Gibbs sampler.

The Gibbs sampler is initialized with a random image for each a priori or a posteriori realization. In order to reduce the computation time, we generally compute only one a posteriori realization for each iteration of the ICE and only one a priori realization for each iteration of the stochastic gradient. This simplification does not imply any significant performance loss.

We use a coarse approximation of the stochastic gradient equation to compute $\beta$. Let $U^{\text{post}}$ be the energy (3) of the current a posteriori realization and $U^{[q]}$ the energy of the a priori realization in iteration $q$ of the stochastic gradient. Setting $\beta^{[0]}$ to the value obtained in the precedent ICE iteration, we repeatedly compute $\beta^{[q+1]}$ until convergence:

$$\beta^{[q+1]} = \beta^{[q]} + \mu\left(U^{[q]} - U^{\text{post}}\right) \qquad (24)$$

where $\mu$ is a step size.

The ICE algorithm needs an initial class image from which the initial parameters of the classes are computed. We have used the K-means algorithm, which iteratively subdivides the gray levels into K distinct classes. The initial class centers are uniformly distributed over the range of gray levels. We sweep the image repeatedly until stability, attributing each pixel to the class with the closest center, and recompute the class centers from all the attributed samples at the end of each iteration. It should be noted that this method can only be used to initiate classes with different mean values. It is, for example, not suited when classes with the same mean value but different variance must be distinguished. K-means is basically a thresholding method, so if there is much overlap between the true class distributions, as for radar images, the resulting class image will be quite irregular and the initial class statistics will not be very representative. The consequences of this, and a possible remedy, will be commented on in Section V. The initial value of $\beta$ is predefined.
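A minimal sketch of this K-means initialization on gray levels (our code, subject to the stated limitation that classes must differ in their means):

```python
import numpy as np

def kmeans_init(y, K, n_iter=50):
    """Initial classification by K-means on gray levels: centers start
    uniformly spread over the gray-level range, each pixel goes to the
    closest center, and centers are recomputed until stability."""
    y = y.ravel().astype(float)
    centers = np.linspace(y.min(), y.max(), K + 2)[1:-1]   # uniform initial centers
    labels = np.zeros(y.size, dtype=int)
    for _ in range(n_iter):
        new_labels = np.abs(y[:, None] - centers[None, :]).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                          # stability reached
        labels = new_labels
        for k in range(K):
            if np.any(labels == k):
                centers[k] = y[labels == k].mean()
    return labels, centers
```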

B. Hidden Markov Chains

We initiate the ICE algorithm in a similar way for hidden Markov chains, using K-means to define the class parameters $\theta_k$ and thus the marginal conditional distributions $f_{\omega_k}$, uniformly distributed a priori probabilities $\pi_i = 1/K$, and a generic transition matrix whose diagonal elements dominate ($a_{ij}$ large when $i = j$ and small when $i \neq j$). Each ICE iteration is based on one forward–backward computation and includes the following steps.

• For every element $n$ in the chain, and for every possible class $\omega_i$, we compute:
  a) forward probabilities $\alpha_n(i)$ (17), based on $y$, $f_{\omega_k}$, and $a_{ij}$;
  b) backward probabilities $\beta_n(i)$ (19), based on $y$, $f_{\omega_k}$, and $a_{ij}$;
  c) marginal a posteriori probabilities $\xi_n(i)$ (22).
• This allows us to compute:
  a) new joint conditional probabilities $\psi_n(i, j)$ (21), using $\alpha_n(i)$, $\beta_{n+1}(j)$, $a_{ij}$, and $f_{\omega_j}$;
  b) new elements of the stationary transition matrix

$$a_{ij} = \frac{\sum_{n=1}^{N-1} \psi_n(i, j)}{\sum_{n=1}^{N-1} \xi_n(i)} \qquad (25)$$

  c) new initial probabilities

$$\pi_i = \frac{1}{N-1} \sum_{n=1}^{N-1} \xi_n(i). \qquad (26)$$

• We compute a series of a posteriori realizations based on (23) with $\xi_n(i)$, $\psi_n(i, j)$, and $y$. For each realization (with index $p$), we estimate the class parameters $\theta_k^{[p]}$, which are averaged to obtain $\theta_k$ and thus $f_{\omega_k}$.

As for hidden Markov random fields, we generally limit the number of a posteriori realizations per ICE iteration to one. A sketch of one such iteration follows.

C. Generalized Mixture Estimation

In generalized mixture estimation, the distribution $f_{\omega_k}$ of each class is not necessarily Gaussian, but can belong to any distribution family in a predefined set. This implies the following modifications to the above ICE algorithms.

• The parameter vectors of all possible distribution families are computed from the a posteriori realizations for each class.


• The Kolmogorov distance [36] is used to determine the most appropriate distribution family for each class [14].
  a) We compute the cumulative distributions for all the distribution families, based on the estimated parameters of the class.
  b) The cumulative normalized histogram of the class is computed.
  c) We retain the distribution family having the smallest maximum difference between the cumulative distribution and the cumulative normalized histogram of the class.

As the pixel values are discrete, the computation and comparison of cumulative distributions and cumulative normalized histograms are straightforward.
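A sketch of this selection rule, assuming each fitted candidate family exposes a cumulative distribution function `cdf(t)` (a hypothetical interface, and integer gray levels):

```python
import numpy as np

def select_family(samples, fitted_cdfs):
    """Pick the family index minimizing the Kolmogorov distance, i.e. the
    maximum absolute difference between the fitted CDF and the cumulative
    normalized histogram of the class samples."""
    t = np.arange(samples.min(), samples.max() + 1)          # discrete gray levels
    hist = np.array([(samples <= v).mean() for v in t])      # empirical CDF
    dists = [np.max(np.abs(cdf(t) - hist)) for cdf in fitted_cdfs]
    return int(np.argmin(dists))
```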

D. Application to Radar Images

For simplicity, we here consider only two distribution families, corresponding to classes with constant and textured radar reflectivity, respectively. Assuming the speckle to be spatially uncorrelated, the observed intensity in a zone of constant reflectivity is Gamma distributed. The corresponding amplitude distribution is

$$f_R(y) = \frac{2}{\Gamma(L)} \left(\frac{L}{R}\right)^{L} y^{2L-1} \exp\left(-\frac{L y^2}{R}\right) \qquad (27)$$

for $y \geq 0$ and $f_R(y) = 0$ for $y < 0$. $L$ is here the equivalent number of independent looks of the image, which should be provided by the data supplier. Let the estimated moments of order $q$ be denoted by $\hat{m}_q$. Assuming that the zone of constant reflectivity mentioned above corresponds to a certain class, the estimated mean radar reflectivity $\hat{R} = \hat{m}_2$ in (27) is computed over all pixels attributed to this class.

If we assume that the radar reflectivity texture of a class is Gamma distributed, the observed intensity will obey a K distribution [37]–[39]. The corresponding amplitude distribution is

$$f_{R,\nu}(y) = \frac{4}{\Gamma(L)\,\Gamma(\nu)} \left(\frac{L\nu}{R}\right)^{\frac{L+\nu}{2}} y^{L+\nu-1}\, K_{\nu-L}\!\left(2 y \sqrt{\frac{L\nu}{R}}\right) \qquad (28)$$

for $y \geq 0$ and $f_{R,\nu}(y) = 0$ for $y < 0$. $K_{\nu-L}$ is here the modified Bessel function of the second kind, and the estimated radiometric parameters $\hat{R}$ and $\hat{\nu}$ are computed from the moments as follows. The texture parameter $\hat{\nu}$ is obtained by solving the moment equation $\hat{m}_4/\hat{m}_2^2 = (1 + 1/L)(1 + 1/\hat{\nu})$, which has a valid solution provided that $\hat{m}_4/\hat{m}_2^2 > 1 + 1/L$, and in all cases $\hat{R} = \hat{m}_2$. If no valid texture parameter can be found, we consider (28) as unsuited, and we make sure that it is not selected. Moreover, if the radar reflectivity texture is very weak (i.e., $\hat{\nu}$ is very large), we approximate (28) by (27).

For simplicity, we will in the following refer to (27) and (28) as Gamma and K distributions, respectively, even though they actually are the amplitude distributions corresponding to Gamma and K distributed intensities. A more comprehensive set of distribution families is described in [40].
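The two amplitude densities, as reconstructed in (27) and (28), and a method-of-moments estimate of the K-distribution parameters can be sketched as follows (scipy supplies the Gamma function and the modified Bessel function; the moment relation used below is a standard property of the K distribution, stated here as our assumption):

```python
import numpy as np
from scipy.special import gamma, kv

def gamma_amp_pdf(y, R, L):
    """Amplitude density (27) for Gamma-distributed intensity, with mean
    reflectivity R and equivalent number of independent looks L."""
    return 2.0 / gamma(L) * (L / R) ** L * y ** (2 * L - 1) * np.exp(-L * y ** 2 / R)

def k_amp_pdf(y, R, nu, L):
    """Amplitude density (28) for K-distributed intensity, with texture
    parameter nu; kv is the modified Bessel function of the second kind."""
    c = L * nu / R
    return (4.0 / (gamma(L) * gamma(nu)) * c ** ((L + nu) / 2)
            * y ** (L + nu - 1) * kv(nu - L, 2 * y * np.sqrt(c)))

def estimate_k_params(samples, L):
    """Method of moments: R = m2, and nu solves m4/m2^2 = (1+1/L)(1+1/nu)."""
    m2 = np.mean(samples ** 2.0)
    m4 = np.mean(samples ** 4.0)
    ratio = m4 / m2 ** 2 / (1.0 + 1.0 / L) - 1.0
    nu = 1.0 / ratio if ratio > 0 else None        # None: (28) unsuited for this class
    return m2, nu
```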

V. RESULTS

The schemes for unsupervised classification of radar images described above have been tested on a wide range of real SAR images, as well as on a few simulated ones. We will here only present a small but representative selection of results.

A. Real SAR Image With Three Classes

Fig. 3(a) shows a 512 × 512 extract of a Japanese Earth Resources Satellite (JERS) three-look amplitude image of a tropical forest with some burnt plots related to agricultural activity. It was decided to classify the image into three classes. Fig. 3(b) shows the initial classification obtained with the K-means algorithm. The classes are represented by different gray levels (the gray levels do not correspond to the mean amplitudes of the different classes, but they are ordered according to these mean values). We note that the K-means class image is very irregular due to the speckle. A more regular initial classification could be obtained by applying an adaptive speckle filter prior to the K-means algorithm [41]–[43]. The subsequent mixture estimation and classification are of course carried out on the original radar image. Our tests indicate that such speckle filtering prior to the K-means initialization generally has little influence on the final classification result.

Let us first consider the ICE and MPM algorithms based on the hidden Markov random field model. We use 30 iterations for the ICE algorithm, with only one a posteriori realization per iteration. The regularity parameter is initialized to its predefined value (Section IV-A). Within each ICE iteration, the maximum number of iterations for the stochastic gradient is set to ten, with one a priori realization per iteration. We interrupt the iteration earlier if $\beta$ differs by less than 0.01 from its previous value. Except for the first ICE iterations, the stochastic gradient estimation generally requires very few iterations. The Gibbs sampler with as much as 100 iterations is used to generate each a priori and a posteriori realization. The convergence of the global energy is in fact quite slow, especially for realizations according to the a priori distribution. The MPM classification based on the hidden Markov random field model relies on ten a posteriori realizations. Fig. 3(c) presents the obtained classification result. The regularity parameter was in this case estimated by the stochastic gradient, and the Gamma distribution was retained for all three classes. The classification result corresponds quite well to our thematic conception of the image based on visual inspection, except that many fine structures seem to be lost, and the region borders are somewhat rounded.

The number of ICE iterations is set to 30 also for the corresponding analysis scheme based on Markov chains, and we compute only one a posteriori realization per iteration. Fig. 3(d) shows the result of the MPM classification. It is generally less regular than the classification in Fig. 3(c), but it represents small structures more precisely. The K distribution was here selected by the algorithm for the darkest class, whereas the Gamma distribution gave the best fit for the two others.

The overall quality is comparable for the two classification results, but the computing time is quite different: The program based on Markov random fields spent about 53 min on a


Fig. 3. Classification of a 512 × 512 extract of a JERS three-look image of a tropical forest into three classes. (a) Original amplitude image. (b) Initial K-means classification. Result obtained (c) with the Markov random field method and (d) with the Markov chain method.

PC with a Pentium III 733-MHz processor running Linux, whereas the program based on Markov chains only needed 2 min and 27 s.

B. Simulated SAR Image With Three Classes

In order to examine the performances more carefully, let us also consider simulated SAR images. Fig. 4(a) and (b) represents an ideal class image with three classes and its three-look speckled counterpart. The darkest and brightest classes are both Gamma distributed, whereas the one in the middle is K distributed with a texture parameter according to (28). The contrast between two consecutive classes is 3.5 dB. The image size is 512 × 512 pixels. The ideal class image is in fact derived from a resampled optical image by K-means classification, so it is neither a realization of a Markov random field nor a realization of a Markov chain, but represents a natural scene. It should therefore permit a fair comparison of the two analysis schemes.

The parameter settings described in Section V-A were applied here as well, except that we allowed different regularity parameters vertically and horizontally for the method based on Markov random fields, as the resolution is not the same in the two directions. The vertical and horizontal regularity parameters were estimated separately. However, the classification result in Fig. 4(c) is far too irregular, and only 72.7% of the pixels are correctly classified. The method based on Markov chains here gives a more satisfactory result, shown in Fig. 4(d), with 83.9% of correctly classified pixels. In particular, the overall regularity of the image seems to be better estimated.

Both methods correctly identify Gamma distributions for the darkest and brightest classes, but only the Markov chain method found that the class in the middle was K distributed. The fit between the estimated and the true distributions in Fig. 5 is generally very good, except for the case where the Markov random field method chooses the wrong distribution family. This can be part of the reason why the estimation of the regularity parameters here fails, but the main reason is probably the simplicity of the Markov random field model compared to the true characteristics of the ideal class image.

The program based on Markov chains was here 37 times quicker than the one based on Markov random fields.

C. Simulated SAR Image With Four Classes

Fig. 6(a) represents an ideal and approximately isotropic 512 × 512 class image with four classes, which has been derived from an optical satellite image by K-means classification. Fig. 6(b) shows the corresponding simulated SAR image. The


Fig. 4. Classification of a 512 × 512 simulated three-look SAR image into three classes. (a) Ideal class image. (b) Speckled amplitude image. Results obtained (c) with the Markov random field method and (d) with the Markov chain method. The fraction of correctly classified pixels is (c) 72.7% and (d) 83.9%.

Fig. 5. Fit of the generalized mixture estimation performed (a) by the Markov random field method and (b) by the Markov chain method on the simulated three-look SAR image with three classes.

parameter settings and the radiometric characteristics of the classes are the same as for the simulated image with three classes shown in Fig. 4(b), except that an additional Gamma distributed class with higher reflectivity has been added. Fig. 6(c) and (d) represents the results obtained with the Markov random field and Markov chain methods, respectively.


Fig. 6. Classification of a 512 × 512 simulated three-look SAR image into four classes. (a) Ideal class image. (b) Speckled amplitude image. Results obtained (c) with the Markov random field method and (d) with the Markov chain method. The fraction of correctly classified pixels is (c) 87.0% and (d) 85.2%.

Fig. 7. Fit of the generalized mixture estimation performed (a) by the Markov random field method and (b) by the Markov chain method on the simulated three-look SAR image with four classes.

The regularity parameter estimated by the method based on Markov random fields here gives a visually very satisfactory result. The proportion of correctly classified pixels is 87.0%, whereas it is 85.2% for the method based on Markov chains. The borders are slightly more irregular for the latter method, but some of the narrow structures are better preserved.


Fig. 8. Classification results for the simulated three-look SAR images with (a) three classes and (b) four classes, obtained with the new method combining Markov chain and Markov random field algorithms. The fraction of correctly classified pixels is (a) 85.8% and (b) 87.0%.

Fig. 9. Fit of the generalized mixture estimation for the simulated three-look SAR images with (a) three classes and (b) four classes, obtained with the new method combining Markov chain and Markov random field algorithms.

The distribution families of the four classes were correctly identified by both methods, and the fit of the estimated parameters is comparable, as can be seen from Fig. 7. The Markov chain algorithm was, however, 25 times faster.

D. Hybrid Method

The scheme based on Markov random fields appears to have difficulties in estimating the regularity parameters for many images, whereas the estimation method based on Markov chains is very robust. The latter method is also much faster. Nevertheless, in cases where the Markov random field method estimates the regularity parameters correctly, it provides a more satisfactory classification result. In particular, it produces more regular region borders. We therefore propose a hybrid method, where we first run the ICE algorithm in the framework of the hidden Markov chain model. Initializing with the estimated parameters and the last a posteriori realization, we thereafter only go through a very restricted number of ICE iterations and the final MPM classification with the Markov random field versions of the algorithms. Here, we have used 30 ICE iterations for the Markov chain part, followed by one single ICE iteration with one a posteriori realization and the MPM with ten a posteriori realizations for the Markov random field algorithms.
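Schematically, the hybrid scheme chains the two tool sets. The sketch below leans on the earlier snippets and on several hypothetical helpers (`init_chain_model`, `densities_from_theta`, `chain_to_image`, `mrf_ice_iteration`, `mrf_mpm`), which are ours, not the authors' code:

```python
import numpy as np

def hybrid_classification(image, K, rng, n_chain_ice=30, n_mpm_realizations=10):
    """Hybrid scheme: full ICE under the hidden Markov chain model,
    then a single MRF ICE iteration and the MRF-based MPM."""
    scan = hilbert_scan(int(np.log2(image.shape[0])))     # assumes square, power-of-two side
    y = np.array([image[r, c] for (r, c) in scan])        # image as a chain
    labels, _ = kmeans_init(y, K)                         # K-means initialization
    f, a, pi = init_chain_model(y, labels, K)             # hypothetical helper
    for _ in range(n_chain_ice):                          # chain-based ICE
        a, pi, theta = ice_iteration_chain(y, f, a, pi, rng)
        f = densities_from_theta(y, theta)                # hypothetical helper
    alpha, beta, xi, psi = forward_backward(f, a, pi)
    x = sample_posterior_chain(xi, psi, rng)              # last a posteriori realization
    x_img = chain_to_image(x, scan, image.shape)          # hypothetical helper
    theta, reg = mrf_ice_iteration(image, x_img, theta)   # one MRF ICE iteration
    return mrf_mpm(image, theta, reg, n_mpm_realizations, rng)  # final MRF-based MPM
```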

Fig. 8(a) represents the classification result obtained with the new method on the simulated SAR image with three classes. Visually, the classification result is far better than those given in Fig. 4, and the portion of correctly classified pixels has increased to 85.8%. The distribution families of the different classes are correctly identified, and reasonable estimates are obtained for both regularity parameters. Comparing Fig. 9(a) with Fig. 5(a) and (b), we see that the overall fit of the estimated distributions is improved. The computing time is 11 min and 38 s, which is nearly four times slower than the algorithm based on Markov chains, but ten times faster than the method based on Markov random fields.

Likewise, Fig. 8(b) represents the classification result obtained with the hybrid method on the simulated SAR image


TABLE I. CLASSIFICATION ACCURACY AND COMPUTING TIME FOR THE MARKOV RANDOM FIELD, MARKOV CHAIN, AND HYBRID METHODS APPLIED TO THE SIMULATED SAR IMAGES

with four classes. It is very close to the classification result in Fig. 6(c), obtained with the Markov random field method. The portion of correctly classified pixels is the same (87.0%), and the estimated regularity parameter is very close; however, the new method is approximately seven times faster. The fit of the estimated distributions, shown in Fig. 9(b), is also slightly better.

Table I summarizes the classification accuracy and computing time of the three methods applied to the simulated SAR images.

VI. CONCLUSION

This paper describes unsupervised classification of radar images in the framework of hidden Markov models and generalized mixture estimation. We describe how the distribution families and parameters of classes with constant or textured radar reflectivity can be determined through generalized mixture estimation. For simplicity, we have restricted ourselves to Gamma and K distributed intensities. Hidden Markov random fields are frequently used to impose spatial regularity constraints in the parameter estimation and classification stages. This approach produces excellent results in many cases, but our experiments indicate that the estimation of the regularity parameter is a difficult problem, especially when the SNR is low. When the estimation of the regularity parameters fails, the mixture estimation and classification results are poor. The considerable computing time is another drawback of this approach. Methods based on a hidden Markov chain model, applied to a Hilbert–Peano scan of the image, constitute an interesting alternative. The estimation of the regularity parameters, which here are the elements of a transition matrix, seems to be more robust, but the region borders generally get slightly more irregular than with the corresponding scheme based on Markov random fields. We therefore propose a new hybrid method, where we use the Markov chain algorithms first and those based on Markov random fields only for the final estimation and classification, thus obtaining both fast and robust mixture estimation and well regularized classification results.

ACKNOWLEDGMENT

We acknowledge the contributions of our colleagues J.-M. Boucher, R. Garello, J.-M. Le Caillec, H. Maître, and J.-M. Nicolas.

REFERENCES

[1] J. S. Lee, “Speckle analysis and smoothing of synthetic aperture radar images,” Comput. Graph. Image Process., vol. 17, pp. 24–32, 1981.
[2] Y. Delignon, R. Garello, and A. Hillion, “Statistical modeling of ocean SAR images,” Proc. Inst. Elect. Eng. Radar, Sonar Navig., vol. 144, no. 6, pp. 348–354, 1997.
[3] P. Pérez, “Markov random fields and images,” CWI Q., vol. 11, no. 4, pp. 413–437, 1998.
[4] J. Besag, “Spatial interactions and the statistical analysis of lattice systems,” J. R. Stat. Soc., vol. B-36, pp. 192–236, 1974.
[5] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, Nov. 1984.
[6] J. Marroquin, S. Mitter, and T. Poggio, “Probabilistic solution of ill-posed problems in computational vision,” J. Amer. Stat. Assoc., vol. 82, no. 397, pp. 76–89, 1987.
[7] R. Chellappa and A. Jain, Eds., Markov Random Fields. New York: Academic, 1993.
[8] H. Derin and H. Elliott, “Modeling and segmentation of noisy and textured images using Gibbs random fields,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39–55, Jan. 1987.
[9] H. Maître, Ed., “Traitement des images de radar à synthèse d'ouverture,” in Collection Trait. du Signal et des Images. Paris, France: Editions Hermes, 2001.
[10] P. A. Kelly, H. Derin, and K. D. Hartt, “Adaptive segmentation of speckled images using a hierarchical random field model,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1628–1641, Oct. 1988.
[11] K. Abend, T. J. Harley, and L. N. Kanal, “Classification of binary random patterns,” IEEE Trans. Inform. Theory, vol. IT-11, no. 4, 1965.
[12] W. Skarbek, “Generalized Hilbert scan in image printing,” in Theoretical Foundations of Computer Vision, R. Klette and W. G. Kropetsh, Eds. Berlin, Germany: Akademie Verlag, 1992.
[13] B. Benmiloud and W. Pieczynski, “Estimation de paramètres dans les chaînes de Markov cachées et segmentation d'images,” Trait. du Signal, vol. 12, no. 5, pp. 433–454, 1995.
[14] N. Giordana and W. Pieczynski, “Estimation of generalized multisensor hidden Markov chains and unsupervised image segmentation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 465–475, May 1997.
[15] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Stat., vol. 41, pp. 164–171, 1970.
[16] J. Zhang, J. W. Modestino, and D. Langan, “Maximum-likelihood parameter estimation for unsupervised stochastic model-based image segmentation,” IEEE Trans. Image Processing, vol. 3, pp. 404–420, July 1994.
[17] G. J. McLachlan and T. Krishnan, “The EM algorithm and extensions,” in Series in Probability and Statistics. New York: Wiley, 1997.
[18] G. Celeux and J. Diebolt, “L'algorithme SEM : Un algorithme d'apprentissage probabiliste pour la reconnaissance de mélanges de densités,” Revue de Stat. Appl., vol. 34, no. 2, 1986.
[19] W. Pieczynski, “Statistical image segmentation,” Mach. Graph. Vision, vol. 1, no. 1/2, pp. 261–268, 1992.
[20] W. Pieczynski, “Champs de Markov cachés et estimation conditionnelle itérative,” Trait. du Signal, vol. 11, no. 2, 1994.
[21] Y. Delignon, A. Marzouki, and W. Pieczynski, “Estimation of generalized mixtures and its application in image segmentation,” IEEE Trans. Image Processing, vol. 6, pp. 1364–1375, Oct. 1997.
[22] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equations of state calculations by fast computing machines,” J. Chem. Phys., vol. 21, pp. 1087–1091, 1953.
[23] P. A. Devijver and M. Dekesel, “Champs aléatoires de Pickard et modélisation d'images digitales,” Trait. du Signal, vol. 5, no. 5, pp. 131–150, 1988.
[24] J.-P. Delmas, “An equivalence of the EM and ICE algorithm for exponential family,” IEEE Trans. Signal Processing, vol. 45, pp. 2613–2615, Oct. 1997.
[25] J. M. Boucher and P. Lena, “Unsupervised Bayesian classification, application to the forest of Paimpont (Brittany),” Photo Interprétation, vol. 22, no. 1994/4, 1995/1-2, pp. 79–81, 1995.
[26] M. Mignotte, C. Collet, P. Pérez, and P. Bouthémy, “Sonar image segmentation using an unsupervised hierarchical MRF model,” IEEE Trans. Image Processing, vol. 9, pp. 1216–1231, July 2000.
[27] M. Mignotte, J. Meunier, J.-P. Soucy, and C. Janicki, “Segmentation and classification of brain SPECT images using 3D Markov random field and density mixture estimations,” in Proc. 5th World Multi-Conf. Systemics, Cybernetics and Informatics, Concepts and Applications of Systemics and Informatics, vol. X, Orlando, FL, July 2001, pp. 239–244.
[28] A. Peng and W. Pieczynski, “Adaptive mixture estimation and unsupervised local Bayesian image segmentation,” Graph. Models Image Process., vol. 57, no. 5, pp. 389–399, 1995.
[29] Z. Kato, J. Zerubia, and M. Berthod, “Unsupervised parallel image classification using Markovian models,” Pattern Recognit., vol. 32, pp. 591–604, 1999.
[30] H. Caillol, W. Pieczynski, and A. Hillion, “Estimation of fuzzy Gaussian mixture and unsupervised statistical image segmentation,” IEEE Trans. Image Processing, vol. 6, pp. 425–440, Mar. 1997.
[31] F. Salzenstein and W. Pieczynski, “Sur le choix de la méthode de segmentation statistique d'images,” Trait. du Signal, vol. 15, no. 2, pp. 119–127, 1995.
[32] G. Gravier, M. Sigelle, and G. Chollet, “Markov random field modeling for speech recognition,” Aust. J. Intell. Inform. Process. Syst., vol. 5, no. 4, pp. 245–252, 1999.
[33] L. Younes, “Estimation and annealing for Gibbsian fields,” Ann. l'Inst. Henri Poincaré, vol. 24, no. 2, pp. 269–294, 1988.
[34] L. Younes, “Parameter estimation for imperfectly observed Gibbsian fields,” Prob. Theory Related Fields, vol. 82, pp. 625–645, 1989.
[35] L. Younes, “Parameter estimation for imperfectly observed Gibbs fields and some comments on Chalmond's EM Gibbsian algorithm,” in Proc. Stochastic Models, Statistical Methods and Algorithms in Image Analysis, ser. Lecture Notes in Statistics, P. Barone and A. Frigessi, Eds. Berlin, Germany: Springer, 1992.
[36] S. Siegel, Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1956.
[37] E. Jakeman and P. N. Pusey, “Significance of K distribution in scattering experiments,” Phys. Rev. Lett., vol. 40, pp. 546–550, 1978.
[38] E. Jakeman, “On the statistics of K distributed noise,” J. Phys. A: Math. Gen., vol. 13, pp. 31–48, 1980.
[39] J. K. Jao, “Amplitude of composite terrain radar clutter and the K distribution,” IEEE Trans. Antennas Propagat., vol. AP-32, pp. 1049–1062, Oct. 1984.
[40] Y. Delignon, R. Fjørtoft, and W. Pieczynski, “Compound distributions for radar images,” in Proc. Scandinavian Conf. Image Analysis, Bergen, Norway, June 11–14, 2001, pp. 741–748.
[41] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel, “Adaptive noise smoothing filter for images with signal-dependent noise,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, pp. 165–177, Feb. 1985.
[42] A. Lopès, R. Touzi, E. Nezry, and H. Laur, “Maximum a posteriori speckle filtering and first order textural models in SAR images,” in Proc. IGARSS, vol. 3, College Park, MD, May 1990, pp. 2409–2412.
[43] A. Lopès, R. Touzi, E. Nezry, and H. Laur, “Structure detection and statistical adaptive speckle filtering in SAR images,” Int. J. Remote Sens., vol. 14, no. 9, pp. 1735–1758, June 1993.

Roger Fjørtoft (M'01) received the M.S. degree in electronics from the Norwegian Institute of Technology (NTH), Trondheim, Norway, in 1993, and the Ph.D. degree in signal processing, image processing, and communications from the Institut National Polytechnique (INP), Toulouse, France, in 1999.

He is currently a Research Scientist at the Norwegian Computing Center (NR), Oslo, Norway, where he works on automatic image analysis for remote sensing applications, in particular multisensor and multiscale analysis. From 1999 to 2000, he worked on unsupervised classification of radar images in the framework of a PostDoc at the Groupe des Ecoles des Télécommunications, Paris, France.

Yves Delignon (M'99) received the Ph.D. degree from the University of Rennes I, Rennes, France, in 1993.

He is currently with the Institut d'Electronique et de Microélectronique du Nord, Centre National de la Recherche Scientifique, Villeneuve d'Ascq, France. From 1990 to 1992, he was at the Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, where he studied the statistical modeling of radar images. In 1993, he joined the Ecole Nouvelle d'Ingénieurs en Communications as an Assistant Professor. His research focused on unsupervised statistical segmentation of radar images until 2000. His present research interests include MIMO systems, digital communications, and wireless ad hoc networks.

Wojciech Pieczynski received the Doctorat d'Etat en Mathématiques from the Université de Paris VI, Paris, France, in 1986.

He has been with France Telecom, Paris, France, since 1987 and is currently a Professor at the Institut National des Télécommunications d'Evry, Evry, France. His main research interests include stochastic modeling, statistical techniques of image processing, and theory of evidence.

Marc Sigelle (M'91) received the engineer diploma from Ecole Polytechnique, Paris, France, in 1975, and from Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1977. He received the Ph.D. degree from ENST in 1993.

Before joining ENST in 1989, he was with the Centre National d'Etudes des Télécommunications, working on physics and computer algorithms. His main subjects of interest are restoration and segmentation of signals and images with Markov random fields, hyperparameter estimation methods, and relationships with statistical physics. His research has concerned reconstruction in angiographic medical imaging, processing of satellite and SAR images, and speech and character recognition with Markov random fields and Bayesian networks.

Florence Tupin received the engineering and Ph.D. degrees from Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1994 and 1997, respectively.

She is currently an Associate Professor at ENST in the Image and Signal Processing (TSI) Department. Her main research interests are image analysis and interpretation, Markov random field techniques, and SAR remote sensing.