Unsupervised clustering of PolSAR data using Polarimetric G Distribution and Markov Random Fields
Salman Khan, Surrey Space Centre, University of Surrey, [email protected], United Kingdom
Anthony Paul Doulgeris, University of Tromsø, [email protected], Norway
Abstract
In this paper an unsupervised PolSAR data clustering algorithm utilizing the flexible polarimetric G distribution is
proposed for the first time. This algorithm has been demonstrated in earlier contributions using other non-Gaussian
distributions like K, G0, and U distributions. The K and G0 distributions suffer from limited modeling capability due
to the presence of only one shape parameter, while the U distribution, although as flexible as the G model, has a very
cumbersome probability distribution function, making its software implementation difficult and computation slow. The
proposed algorithm with the G distribution has a similar non-Gaussian modeling accuracy to the U model, a more easily
implementable probability distribution function, and a much faster computation time. The only disadvantage is that
the log cumulants of the G model are only computable using numerical differentiation, and hence fractional moment
estimators are used in this analysis.
1 Introduction
Synthetic Aperture Radar (SAR) data are of significant interest because SAR sensors operate independently of
weather and light conditions. These data offer a viable alternative in situations where other sensors (e.g.
optical) cannot penetrate cloud cover, rain, or smoke, or are limited by light conditions. The recent advent of
high resolution SARs now provides data with sub-meter resolution comparable to that of some optical sensors.
Further, the availability of polarimetric SAR (PolSAR) data provides more diverse information through the
different transmit and receive polarization pairs. This helps in differentiating between the physical scattering
mechanisms occurring at the target of interest.
SAR data are inherently statistical due to the presence of speckle, which is a characteristic phenomenon of a
coherent imaging system. Therefore, such data must be analyzed with a probabilistic approach. Gaussian statistics
model low resolution SAR data reasonably well; however, when the resolution increases and the central limit
theorem is no longer strictly satisfied, non-Gaussian statistics are observed. Consequently, many non-Gaussian
probability models have been used to describe both single-channel and PolSAR data. For multilook PolSAR data,
which is the format analyzed in this research, the underlying Gaussian statistics are modeled by the scaled
complex Wishart distribution, sWd, while the non-Gaussian statistics are derived using the product model [1],
which states that the backscattered signal results from the product of a Gaussian speckle noise random variate
and a positive scalar texture random variable.
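The product model can be illustrated with a small simulation. The sketch below is a single-channel simplification and not the matrix-variate case treated in this paper: it assumes unit-mean gamma speckle for a multilook intensity and a unit-mean gamma texture (the texture law behind the K model). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_intensity(n, looks, texture_shape, seed=0):
    """Draw n multilook intensities as texture * speckle (scalar product model)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Unit-mean gamma speckle of a multilook intensity (assumed Gaussian scattering).
        speckle = rng.gammavariate(looks, 1.0 / looks)
        # Unit-mean gamma texture; this choice yields K-distributed intensity.
        texture = rng.gammavariate(texture_shape, 1.0 / texture_shape)
        samples.append(texture * speckle)
    return samples


def mean_and_variance(values):
    """Sample mean and (biased) sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


looks, texture_shape = 5, 4.0
mean, variance = mean_and_variance(simulate_intensity(20000, looks, texture_shape))
# Theoretical variance of the product of two independent unit-mean gamma variates.
expected_variance = (1 + 1 / texture_shape) * (1 + 1 / looks) - 1
print(mean, variance, expected_variance, 1 / looks)
```

The sample variance should sit near the product-model value and well above the speckle-only value 1/L, which is the excess spread that the texture variable is meant to capture.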
The non-Gaussian multilook polarimetric Kd [2] and G0d [3] distributions are relatively more flexible (one
texture parameter each), and successfully model many PolSAR scenes. However, it has been noted that sometimes
more modeling flexibility is needed for real PolSAR data. In this regard, the multilook polarimetric Ud [4] and
Gd [3] distributions are more flexible with two texture shape parameters each, and have the sWd, Kd, and G0d
distributions as special cases. In fact, the modeling flexibility of the Ud and Gd distributions is very similar,
as shown recently in [5].
In many applications, clustering or segmentation of SAR data is of interest. These include land monitoring,
mapping, change detection, damage assessment and detection, and rescue and recovery operations. Some recent
algorithms are presented in [4, 6–10]. The clustering algorithm of interest in this paper is a modified version
of the unsupervised expectation maximization (EM) algorithm. This algorithm was proposed by Doulgeris et al. in
[7] and later extended in [10] to incorporate contextual smoothing through the use of Markov random fields (MRF).
The algorithm uses one of the aforementioned probability distributions as the underlying statistical model, and
has recently been proposed with the flexible Ud distribution in [11]. However, the pdf of the Ud distribution is
computationally challenging as it involves Kummer-U functions, which do not have readily available logarithmic
implementations, and are only computable through numerical integration. This is the reason for the slow
computation time of this segmentation algorithm as noted in [11]. In contrast, the similarly flexible Gd
distribution pdf contains modified Bessel functions of the second kind, which have stable and well tested
logarithmic implementations in the GNU Scientific Library (GSL) [12]. It is therefore expected to be
computationally faster with a similar modeling capability. This is exactly the motivation behind the current
research.
The rest of the paper has been organized as follows. Section 2 gives a brief overview of the clustering
algorithm. The Gd distribution and its estimator are presented in Section 3. Section 4 depicts the application of
the algorithm to PolSAR data. Section 5 discusses the observed numerical inaccuracies during parameter
estimation, while Section 6 lists some conclusions and possible future study.
EUSAR 2014
ISBN 978-3-8007-3607-2 / ISSN 2197-4403 © VDE VERLAG GMBH ∙ Berlin ∙ Offenbach, Germany
2 Clustering Algorithm
The clustering algorithm is developed for multilook polarimetric data available in the form of polarimetric
covariance matrices. It is also assumed that the scalar product model is valid, and that the multilooking
procedure is a simple box-car multilook averaging of single-look scattering vectors. Currently, the equivalent
number of looks (ENL) is estimated only once from a homogeneous area in the image, and is used throughout the run
of the clustering algorithm, although an adaptive ENL estimation could be used instead.
The clustering algorithm uses the method of multivariate fractional moments (MoMFM), recently proposed in [13],
to estimate the texture shape parameters of the Gd distribution within the expectation maximization framework. It
is pertinent to introduce the two dimensional matrix log cumulants (2D-MLC) diagram here. Figure 1 shows the
2D-MLC diagram, depicting the color-coded manifolds spanned by the theoretical MLCs of several matrix-variate
compound distributions. The dimension spanned by the manifold is equal to the number of texture parameters
present in the compound PDF. The two texture shape parameters of the Gd distribution only need to be estimated
when the sample matrix log cumulants fall within the Gd distribution manifold in the 2D-MLC diagram. In this case
MoMFM is used to estimate the texture shape parameters. Outside this domain, only one texture parameter needs to
be estimated, which corresponds to the texture parameter of the Kd or G0d distributions depending on whether the
sample matrix log cumulant falls on the left or the right side of the Gd distribution matrix log cumulant
manifold, respectively. In this case the corresponding method of matrix log cumulants (MoMLC) is used for texture
shape parameter estimation [14].
[Figure 1 image: 2D-MLC diagram with κ3 on the horizontal axis (−4 to 4) and κ2 on the vertical axis (0 to 8);
labeled regions: U/G, W, M, K, G0, Wis.]
Figure 1: Manifolds of different models in matrix log cumulants diagram. The U and G models have the same
manifold. The W and M models are currently considered invalid and ignored.
The algorithm separates the image pixels into clusters based on the Gd distribution. It uses a modified version
of the iterative expectation maximization algorithm as detailed in [7]. Contextual smoothing is achieved with an
MRF approach, which combines the Gd distribution, used to model the statistics of each image cluster, with a
Potts model for the spatial context.
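As a rough illustration of how a Potts prior can be combined with per-cluster likelihoods, the sketch below computes posterior label probabilities for a single pixel. It is a generic textbook form, not the specific formulation of [10]: the 4-neighbourhood, the smoothing weight `beta`, and the function name are assumptions made only for this example.

```python
import math


def potts_posterior(log_likelihoods, neighbour_labels, beta):
    """Posterior label probabilities for one pixel under a Potts prior.

    log_likelihoods[k] is the log pdf of the pixel under cluster k;
    neighbour_labels holds the current labels of the neighbouring pixels.
    """
    scores = []
    for k, log_like in enumerate(log_likelihoods):
        # The Potts prior rewards labels shared with the neighbours.
        agree = sum(1 for label in neighbour_labels if label == k)
        scores.append(log_like + beta * agree)
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


# Two clusters with equal data likelihood; three of four neighbours carry label 0.
probs = potts_posterior([0.0, 0.0], [0, 0, 1, 0], beta=1.0)
print(probs)
```

With equal likelihoods the spatial context alone decides the label, so the pixel is pulled towards the majority label of its neighbourhood; a larger `beta` strengthens the smoothing.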
The goodness-of-fit (GoF) testing in the algorithm is performed using Pearson's chi-squared GoF test instead of
the matrix log cumulants based GoF procedures used in [11]. The primary reason for not using matrix log cumulants
based GoF testing is that the theoretical matrix log cumulants of the Gd distribution do not have closed forms
and can only be computed using numerical differentiation. Pearson's GoF test is carried out by comparing the
fitted model to the histogram of the determinants of the multilook polarimetric covariance matrices.
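A minimal sketch of Pearson's chi-squared statistic on binned data is given below. The bin counts and model probabilities are made-up numbers standing in for a histogram of determinants and the fitted model's bin probabilities; the critical value is the standard 95% quantile for three degrees of freedom, which assumes four bins and no correction for estimated parameters.

```python
def pearson_chi2(observed_counts, model_probs):
    """Pearson chi-squared statistic between bin counts and model bin probabilities."""
    n = sum(observed_counts)
    statistic = 0.0
    for observed, prob in zip(observed_counts, model_probs):
        expected = n * prob  # expected count in this bin under the model
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical histogram (4 bins) and model bin probabilities.
counts = [18, 32, 30, 20]
probs = [0.2, 0.3, 0.3, 0.2]
CRITICAL_95_DF3 = 7.815  # chi-squared 95% quantile, 3 degrees of freedom

statistic = pearson_chi2(counts, probs)
good_fit = statistic < CRITICAL_95_DF3
print(statistic, good_fit)
```

A cluster whose statistic exceeds the critical value would be flagged as a poor fit at the chosen confidence level.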
The GoF of each cluster is used to automatically determine the number of significant clusters within the
data-set. Poorly fitting clusters are split into two clusters and the EM-algorithm is re-applied to convergence.
The algorithm stops when all clusters are considered good fits to the data histograms. Consistent initialisation
is achieved by always starting with one cluster. This results in as many statistically distinct classes as
allowed by the chosen underlying pdf, the number of data samples, and the chosen confidence level, e.g., 95%. The
algorithm optionally includes adaptive sensitivity and sub-sampling ability as explained in [7].
3 The Gd Distribution
The multilook polarimetric Gd distribution was initially proposed in [3]. It has two texture shape parameters,
α ∈ ℝ and ω > 0. When ω → 0+, it reduces to the Kd or G0d distributions if α is positive or negative,
respectively. When |α| → ∞ or ω → ∞, it reduces to the Gaussian case of the sWd distribution. Its pdf is given by
[5, 13]:
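\[
p_{\mathbf{C}}(\mathbf{C}; L, \boldsymbol{\Gamma}, \alpha, \omega, \eta)
= \frac{L^{Ld}\,|\mathbf{C}|^{L-d}}{\Gamma_d(L)\,|\boldsymbol{\Gamma}|^{L}}
\,\frac{1}{\eta^{\alpha} K_{\alpha}(\omega)}
\left( \frac{2L\,\mathrm{Tr}\left(\boldsymbol{\Gamma}^{-1}\mathbf{C}\right) + \omega\eta}{\omega/\eta} \right)^{\frac{\alpha - Ld}{2}}
K_{\alpha - Ld}\!\left( \sqrt{\frac{\omega}{\eta}\left( 2L\,\mathrm{Tr}\left(\boldsymbol{\Gamma}^{-1}\mathbf{C}\right) + \omega\eta \right)} \right),
\tag{1}
\]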
where L is the number of looks, d is the dimension of the covariance matrix C, Γ is the normalized sample
covariance matrix, Γd(·) is the multivariate gamma function, η is the scale parameter, and Kν(·) is the modified
Bessel function of the second kind of order ν.
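Because the pdf mixes very large and very small factors, it is usually evaluated in the log domain. The sketch below is one possible way to do so and is not the authors' MATLAB implementation. It takes the scalar summaries log|C|, log|Γ| and Tr(Γ⁻¹C) as inputs, and it uses a hypothetical pure-Python helper `log_bessel_k` based on the integral representation K_ν(x) = ∫₀^∞ exp(−x cosh t) cosh(νt) dt. In practice a library routine such as GSL's `gsl_sf_bessel_lnKnu` would normally take its place; the helper here is only meant for moderate orders.

```python
import math


def log_bessel_k(nu, x, steps=200):
    """log K_nu(x), x > 0, by trapezoidal integration of the integral representation."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    nu = abs(nu)  # K_{-nu}(x) = K_nu(x)

    def log_integrand(t):
        # log of exp(-x (cosh t - 1)) * cosh(nu t), arranged to avoid overflow
        a = nu * t
        return -x * (math.cosh(t) - 1.0) + a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

    # Extend the upper limit until the integrand is negligible relative to its peak.
    upper, peak = 0.0, log_integrand(0.0)
    while upper < 700.0:
        upper += 0.25
        value = log_integrand(upper)
        peak = max(peak, value)
        if value < peak - 50.0:
            break
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_integrand(i * h) - peak)
    return -x + peak + math.log(h * total)


def log_gd_pdf(log_det_c, log_det_gamma, trace_term, looks, d, alpha, omega, eta):
    """Log of equation (1); trace_term is Tr(Gamma^{-1} C)."""
    ld = looks * d
    # Log of the complex multivariate gamma function Gamma_d(L).
    log_gamma_d = 0.5 * d * (d - 1) * math.log(math.pi) + sum(
        math.lgamma(looks - i) for i in range(d))
    s = 2.0 * looks * trace_term + omega * eta
    ratio = omega / eta
    return (ld * math.log(looks) + (looks - d) * log_det_c
            - log_gamma_d - looks * log_det_gamma
            - alpha * math.log(eta) - log_bessel_k(alpha, omega)
            + 0.5 * (alpha - ld) * (math.log(s) - math.log(ratio))
            + log_bessel_k(alpha - ld, math.sqrt(ratio * s)))


# Example: one 2x2 covariance matrix summarised by its scalars (made-up values).
value = log_gd_pdf(log_det_c=-0.3, log_det_gamma=0.0, trace_term=2.1,
                   looks=5, d=2, alpha=3.7, omega=1.0, eta=1.0)
print(value)
```

Working with log-determinants and log-Bessel values keeps every term finite even when the individual factors of the pdf would overflow or underflow in double precision.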
Its MoMFM estimator can be derived from the following
equation:
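\[
E\left\{ \mathrm{Tr}\left( \boldsymbol{\Sigma}^{-1}\mathbf{C} \right)^{\nu} \right\}
= \frac{K_{\hat{\alpha}+\nu}(\hat{\omega})\, K_{\hat{\alpha}}^{\nu-1}(\hat{\omega})}{K_{\hat{\alpha}+1}^{\nu}(\hat{\omega})}
\times \frac{\Gamma(Ld+\nu)}{L^{\nu}\,\Gamma(Ld)},
\tag{2}
\]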
by simultaneously solving two equations with ν = 1/8 and ν = 1/4. Outside the Gd distribution manifold ω → 0+;
therefore only α needs to be estimated, which can be easily done using the MoMLC estimators for the Kd and G0d
distributions listed in [14]. The MoMLC estimator for the texture shape parameters of the Ud distribution is also
listed in [14]. Moreover, the performance analysis of these estimators on simulated PolSAR data can be found in
[13, 14].
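For readers who want to see where the fractional moment expression in equation (2) comes from, the following is a short sketch of one route to it. It rests on assumptions that are ours rather than stated explicitly here: the product model C = τW with W following the sWd distribution with parameters L and Σ, independent of the texture τ, and τ following a generalized inverse Gaussian law with parameters (α, ω, η) normalized to unit mean.

\begin{align*}
\mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{C}\right) &= \tau\,\mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\right),
\qquad L\,\mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\right) \sim \mathrm{Gamma}(Ld, 1), \\
E\left\{ \mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\right)^{\nu} \right\} &= \frac{\Gamma(Ld+\nu)}{L^{\nu}\,\Gamma(Ld)}, \\
E\{\tau^{\nu}\} &= \eta^{\nu}\,\frac{K_{\alpha+\nu}(\omega)}{K_{\alpha}(\omega)},
\qquad E\{\tau\} = 1 \;\Rightarrow\; \eta = \frac{K_{\alpha}(\omega)}{K_{\alpha+1}(\omega)}, \\
E\left\{ \mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{C}\right)^{\nu} \right\}
&= E\{\tau^{\nu}\}\, E\left\{ \mathrm{Tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\right)^{\nu} \right\}
= \frac{K_{\alpha+\nu}(\omega)\, K_{\alpha}^{\nu-1}(\omega)}{K_{\alpha+1}^{\nu}(\omega)} \cdot \frac{\Gamma(Ld+\nu)}{L^{\nu}\,\Gamma(Ld)}.
\end{align*}

Replacing the left-hand side by its sample average for two values of ν gives two equations in the two unknowns α and ω, which is why two fractional orders are needed.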
[Figure 2 image: label map with classes 1 to 6.]
Figure 2: Clustering of simulated Gd data with 6 classes, sub-sampling = 4.
[Figure 3 image: six histogram panels, horizontal axis 0 to 7. Panel titles:
cluster 1: α = 19, ω = 0.0, |Σ|^(1/d) = 32.9
cluster 2: α = −8.2, ω = 9.7, |Σ|^(1/d) = 10.2
cluster 3: α = 3.7, ω = 0.0, |Σ|^(1/d) = 9.24
cluster 4: α = −33, ω = 0.0, |Σ|^(1/d) = 3.67
cluster 5: α = 3.7, ω = 0.0, |Σ|^(1/d) = 0.731
cluster 6: α = −2.1, ω = 0.0, |Σ|^(1/d) = 2.43]
Figure 3: Fitting of estimated G pdf to cluster histograms of simulated data.
4 Results
The clustering algorithm using the Gd distribution has been used to cluster both simulated and real PolSAR data.
Selected results for each case are shown below.
4.1 Simulated PolSAR Data
Synthetic dual-pol PolSAR data (250 × 250 pixels) with six distinct classes of Gd distribution were generated
using 5 looks. The parameters chosen for the simulated data were collected from real data samples. The results of
the clustering algorithm, with a sub-sampling of four, after MRF smoothing are shown in Figure 2. The labeled
classes show a nearly perfect clustering performance. The algorithm took approximately 1 minute and 2 seconds to
compute the shown results using an Intel quad core 3.1 GHz processor with 8 GB RAM and MATLAB software. The
fitting of the G pdf to the six detected clusters is shown in Figure 3.
[Figure 4 image: Pauli RGB composite and label map with classes 1 to 9.]
Figure 4: (Top) False color Pauli RGB image. (Bottom) Clustering of quad-pol TerraSAR-X data with 6 looks and
sub-sampling = 8.
[Figure 5 image: nine histogram panels, horizontal axis 0 to 14. Panel titles:
cluster 1: α = 5.783E+03, ω = 779, |Σ|^(1/d) = 0.856
cluster 2: α = −1.9, ω = 0.0, |Σ|^(1/d) = 0.36
cluster 3: α = −11, ω = 0.0, |Σ|^(1/d) = 0.0652
cluster 4: α = −11, ω = 0.0, |Σ|^(1/d) = 0.0727
cluster 5: α = −24, ω = 0.0, |Σ|^(1/d) = 0.0724
cluster 6: α = 2.6, ω = 3.5, |Σ|^(1/d) = 0.0368
cluster 7: α = −2.340E+03, ω = 0.0, |Σ|^(1/d) = 0.0573
cluster 8: α = −1.000E+04, ω = 0.0, |Σ|^(1/d) = 0.0548
cluster 9: α = −1.000E+04, ω = 0.0, |Σ|^(1/d) = 0.0425]
Figure 5: Fitting of estimated G pdf to cluster histograms of real data.
4.2 Real PolSAR Data
The algorithm is also applied to quad-pol TerraSAR-X data (300 × 400 pixels) with an ENL of six, using a
sub-sampling of eight. The results are shown in Figure 4, where they can be compared to the corresponding Pauli
decomposed false color image. Nine distinct classes were found, with the first class containing only one pixel. A
comparison of the clustering results with the Pauli decomposed image shows a visually acceptable performance. The
algorithm took 5 minutes and 11 seconds to compute the shown results on the same computing platform. The fitting
of the G pdf to the nine detected clusters is shown in Figure 5.
5 Numerical Inaccuracy of Parameter Estimation
It has been experimentally observed that parameter estimation using multivariate fractional moments has slight
numerical inaccuracies, which become more pronounced on real PolSAR data. The authors agree that, for practical
purposes, the fast computation time of MoMFM estimators outweighs their slight numerical inaccuracy. In the
proposed version of the algorithm, this effect has been mitigated by using sub-sampling and by limiting the
maximum sample size to 10,000 pixels. This reduces the sensitivity of the GoF test enough to offset the small
inaccuracy in estimation. However, improving the accuracy of these estimators is a suitable subject for future
research, as the fast computation time is highly desirable.
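The link between sample size and GoF sensitivity can be seen from the form of Pearson's statistic, which grows linearly with the number of samples for a fixed relative misfit. The sketch below uses made-up bin fractions to show the effect; the four-bin layout and the critical value (95%, three degrees of freedom) are assumptions for illustration only.

```python
def chi2_for_sample_size(n, observed_fractions, model_probs):
    """Pearson statistic when the same bin fractions are observed from n samples."""
    return n * sum((f - p) ** 2 / p for f, p in zip(observed_fractions, model_probs))


model = [0.25, 0.25, 0.25, 0.25]
observed = [0.255, 0.245, 0.25, 0.25]  # small, fixed misfit between data and model
CRITICAL_95_DF3 = 7.815  # chi-squared 95% quantile, 3 degrees of freedom

capped = chi2_for_sample_size(10_000, observed, model)
uncapped = chi2_for_sample_size(100_000, observed, model)
print(capped < CRITICAL_95_DF3, uncapped < CRITICAL_95_DF3)
```

The same small misfit passes the test with 10,000 samples but is rejected with 100,000, which is consistent with capping the sample size to keep slight estimation errors from triggering unnecessary cluster splits.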
6 Conclusions and Future Work
A fast unsupervised clustering algorithm for multilook PolSAR data has been proposed using the flexible G
distribution for the first time. The results on simulated and real PolSAR data look very promising. The
computation has been found to be fast and the software implementation straightforward. The only drawback is the
numerical inaccuracy during parameter estimation, which will form a topic of further investigation.
7 Acknowledgments
This work has been funded by the EC FP7 project Demining ToolBOX (D-BOX), grant agreement no. 284996, and the
TerraSAR-X dataset has been provided by DLR.
References
[1] C. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images, 2nd ed. Raleigh, NC: SciTech
Publishing, 2004.
[2] J. Lee, D. Schuler, R. Lang, and K. Ranson, “K-distribution for multi-look processed polarimetric SAR
imagery,” in Proc. IGARSS, vol. 4, Pasadena, CA, Aug. 1994, pp. 2179–2181.
[3] C. Freitas, A. Frery, and A. Correia, “The polarimetric G distribution for SAR data analysis,”
Environmetrics, vol. 16, no. 1, pp. 13–31, Feb. 2005.
[4] L. Bombrun, G. Vasile, M. Gay, and F. Totir, “Hierarchical segmentation of polarimetric SAR images using
heterogeneous clutter models,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 726–737, Feb. 2011.
[5] S. Khan and R. Guida, “Application of Mellin-kind statistics to polarimetric G distribution for SAR data,”
IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–16, 2013.
[6] J.-M. Beaulieu and R. Touzi, “Segmentation of textured polarimetric SAR scenes by likelihood approximation,”
IEEE Trans. Geosci. Remote Sens., vol. 42, no. 10, pp. 2063–2072, Oct. 2004.
[7] A. Doulgeris, S. Anfinsen, and T. Eltoft, “Automated non-Gaussian clustering of polarimetric synthetic
aperture radar images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3665–3676, Oct. 2011.
[8] A. C. Frery, J. Jacobo-Berlles, J. Gambini, and M. Mejail, “Polarimetric SAR image segmentation with
B-splines and a new statistical model,” Multidimensional Syst. Signal Process., vol. 21, no. 4, pp. 319–342,
Dec. 2010.
[9] O. Harant, L. Bombrun, M. Gay, R. Fallourd, E. Trouvé, and F. Tupin, “Segmentation and classification of
polarimetric SAR data based on the KummerU distribution,” in Proc. PolInSAR, Frascati, Italy, 2011.
[10] V. Akbari, A. P. Doulgeris, G. Moser, T. Eltoft, S. N. Anfinsen, and S. B. Serpico, “A textural-contextual
model for unsupervised segmentation of multipolarization synthetic aperture radar images,” IEEE Trans. Geosci.
Remote Sens., no. 99, pp. 1–12, 2012.
[11] A. Doulgeris, V. Akbari, and T. Eltoft, “Automatic PolSAR segmentation with the U-distribution and Markov
Random Fields,” in Proc. EUSAR, Nuremberg, Germany, Apr. 2012, pp. 183–186.
[12] B. Gough, GNU Scientific Library Reference Manual - Third Edition. Network Theory Ltd., 2009.
[13] S. Khan and R. Guida, “On fractional moments of multilook polarimetric whitening filter for polarimetric SAR
data,” IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–11, 2013.
[14] S. Anfinsen and T. Eltoft, “Application of the matrix-variate Mellin transform to analysis of polarimetric
radar images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2281–2295, Jun. 2011.