

1057-7149 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2675165, IEEE Transactions on Image Processing


Fast unsupervised Bayesian image segmentation with adaptive spatial regularisation

Marcelo Pereyra and Steve McLaughlin

Abstract—This paper presents a new Bayesian estimation technique for hidden Potts-Markov random fields with unknown regularisation parameters, with application to fast unsupervised K-class image segmentation. The technique is derived by first removing the regularisation parameter from the Bayesian model by marginalisation, followed by a small-variance-asymptotic (SVA) analysis in which the spatial regularisation and the integer-constrained terms of the Potts model are decoupled. The evaluation of this SVA Bayesian estimator is then relaxed into a problem that can be computed efficiently by iteratively solving a convex total-variation denoising problem and a least-squares clustering (K-means) problem, both of which can be solved straightforwardly, even in high dimensions, and with parallel computing techniques. This leads to a fast, fully unsupervised Bayesian image segmentation methodology in which the strength of the spatial regularisation is adapted automatically to the observed image during the inference procedure, and that can be easily applied in large 2D and 3D scenarios or in applications requiring low computing times. Experimental results on synthetic and real images, as well as extensive comparisons with state-of-the-art algorithms, confirm that the proposed methodology offers extremely fast convergence and produces accurate segmentation results, with the important additional advantage of self-adjusting regularisation parameters.

Index Terms—Image segmentation, Bayesian methods, spatial mixture models, Potts Markov random field, convex optimisation.

I. INTRODUCTION

Image segmentation is a canonical inverse problem which involves classifying image pixels into clusters that are spatially coherent and have well defined boundaries. It is widely accepted that this task can be formulated as a statistical inference problem, and most state-of-the-art image segmentation methods compute solutions by performing statistical inference (e.g., computing penalised maximum likelihood or maximum-a-posteriori estimates). In this paper we focus on new Bayesian computation methodology for hidden Potts-Markov random fields (MRFs) [1], a powerful class of statistical models that is widely used in Bayesian image segmentation methods (see [2], [3], [4], [5] for recent examples in hyperspectral, non-destructive testing, ultrasound, and fMRI imaging).

This work was funded in part by the SuSTaIN program (EPSRC grant EP/D063485/1) at the Department of Mathematics, University of Bristol, in part by a postdoctoral fellowship from the French Ministry of Defence, and in part by the EPSRC via grant EP/J015180/1. This paper was presented in part at EUSIPCO'14 in Lisbon, September 2014.

Marcelo Pereyra holds a Marie Curie Intra-European Fellowship for Career Development at the University of Bristol, School of Mathematics, University Walk, BS8 1TW, UK, and is a Visiting Scholar at Heriot-Watt University, Mathematical and Computer Sciences, Edinburgh, EH14 4AS (e-mail: [email protected]).

Steve McLaughlin is with Heriot-Watt University, Engineering and Physical Sciences, Edinburgh, EH14 4AS, UK (e-mail: [email protected]).

Despite the wide range of applications, performing inference on hidden Potts MRFs remains a computationally challenging problem. In particular, computing the maximum-a-posteriori (MAP) estimator for these models is generally NP-hard, and thus most image processing methods compute approximate estimators. This has driven the development of efficient approximate inference algorithms, particularly over the last decade. The current predominant approaches for approximate inference on MRFs are based on convex models and convex approximations that can be solved efficiently by convex optimisation [6], [7], [8], and on approximate estimators computed with graph-cut [9], [10] and message passing algorithms [11], [12], [13]. In a similar fashion, modern algorithms to solve active contour models, the other main class of models for image segmentation, are also principally based on convex relaxations and convex optimisation [14], [15] and on Riemannian steepest descent optimisation schemes [16], [17], [18], [19].

An important limitation of these computationally efficient approaches is that they are supervised, in the sense that they require practitioners to specify the value of the regularisation parameter of the Potts MRF. However, it is well known that appropriate values for regularisation parameters can be highly image dependent and sometimes difficult to select a priori, thus requiring practitioners to set parameter values heuristically or by visual cross-validation. The Bayesian framework offers a range of strategies to circumvent this problem and to design unsupervised image segmentation inference procedures that self-adjust their regularisation parameters. Unfortunately, the computations involved in these inferences are beyond the scope of existing fast approximate inference algorithms. As a consequence, unsupervised image segmentation methods have to use more computationally intensive strategies such as Monte Carlo approximations [20], [21], variational Bayes approximations [22], and EM algorithms based on mean-field-like approximations [23], [24].

In this paper we propose a highly efficient Bayesian computation approach specifically designed for performing approximate inference on hidden Potts-Markov random fields with unknown regularisation parameters, with application to fast unsupervised K-class image segmentation. A main original contribution of our development reported here is to use a small-variance-asymptotic (SVA) analysis to design an approximate MAP estimator in which the spatial regularisation and the integer-constrained terms of the Potts model are decoupled. The evaluation of this SVA Bayesian estimator can then be relaxed into a problem that can be computed efficiently by iteratively solving a convex total-variation denoising problem and a least-squares clustering (K-means) problem, both of which can be solved straightforwardly, even in high dimensions, and with parallel computing techniques.

Small-variance asymptotics estimators were introduced in [25] as a computationally efficient framework for performing inference in Dirichlet process mixture models and have been recently applied to other important machine learning classification models such as the Beta process and sequential hidden Markov models [26], as well as to the problem of configuration alignment and matching [27]. Here we exploit these same techniques for the hidden Potts MRF to develop an accurate and computationally efficient image segmentation methodology for the fully unsupervised case of unknown class statistical parameters (e.g., class means) and unknown Potts regularisation parameter.

The paper is organised as follows: in Section II we present a brief background to Bayesian image segmentation using the Potts MRF. This is then followed by a detailed development of our proposed methodology. In Sections IV and V the methodology is applied to synthetic and real test images and compared to other state-of-the-art image segmentation approaches. Finally, some brief conclusions are drawn in Section VI.

II. BACKGROUND

We begin by recalling the standard Bayesian model used in image segmentation problems, which is based on a finite mixture model and a hidden Potts-Markov random field with known regularisation parameter β. For simplicity we focus on univariate Gaussian mixture models. However, the results presented hereafter can be generalised to all exponential-family mixture models (e.g., mixtures of multivariate Gaussian, Rayleigh, Poisson, Gamma, Binomial, etc.) by following the approach described in [28].

Let yn ∈ R denote the nth observation (i.e., pixel or voxel) in a lexicographically vectorised image y = (y1, . . . , yN)^T ∈ R^N. We assume that y is made up of K regions {C1, . . . , CK} such that the observations in the kth class are distributed according to the following conditional marginal observation model

yn | n ∈ Ck ∼ N(µk, σ²),  (1)

where µk ∈ R represents the mean intensity of class Ck. For identifiability we assume that µk ≠ µj for all k ≠ j.
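As a concrete illustration, the observation model (1) is straightforward to simulate. The sketch below is our own illustrative code, not the paper's: the class means, noise level and label layout are assumed toy values.

```python
# Sampling a K-class image from the conditional observation model (1):
# y_n | n in C_k ~ N(mu_k, sigma^2). Toy layout and parameters (assumed).
import numpy as np

rng = np.random.default_rng(0)
K, sigma = 3, 0.1
mu = np.array([0.0, 0.5, 1.0])        # class means mu_k, all distinct

z = np.zeros((32, 32), dtype=int)     # ground-truth labels z_n in {0,...,K-1}
z[:, 10:22] = 1
z[:, 22:] = 2

y = mu[z] + sigma * rng.standard_normal(z.shape)   # eq. (1), pixel-wise
print(y.shape, [round(float(y[z == k].mean()), 2) for k in range(K)])
```

The empirical class means recovered from the sample are close to the chosen µk, as expected from (1).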

To perform segmentation, a label vector z = (z1, . . . , zN)^T is introduced to map or classify observations y to classes C1, . . . , CK (i.e., zn = k if and only if n ∈ Ck). Assuming that observations are conditionally independent given z and given the parameter vector µ = (µ1, . . . , µK), the likelihood of y can be expressed as follows

f(y|z, µ) = ∏_{k=1}^K ∏_{n∈Sk} pN(yn|µk, σ²),  (2)

with Sk = {n : zn = k} (to simplify notation the dependence of distributions on σ² is omitted). A Bayesian model for image segmentation is then defined by specifying the prior distribution of the unknown parameter vector (z, µ). The prior for z is the homogeneous K-state Potts MRF [29]

f(z|β) = exp[βH(z)] / C(β),  (3)

with regularisation hyper-parameter β ∈ R+, Hamiltonian

H(z) = ∑_{n=1}^N ∑_{n′∈V(n)} δ(zn = zn′),  (4)

where δ(·) is the Kronecker function and V(n) is the index set of the neighbours of the nth voxel (most methods use the 1st-order neighbourhoods depicted in Fig. 2), and normalising constant (or partition function)

C(β) = ∑_z exp[βH(z)].  (5)

Notice that the Potts prior (3) is defined conditionally to a given value of β. Most image segmentation methods based on this prior are supervised; i.e., they assume that the value of β is known and specified a priori by the practitioner. Alternatively, unsupervised methods consider that β is unknown and seek to adjust its value automatically during the image segmentation procedure (this point is explained in detail in Section III).
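To make the Hamiltonian (4) concrete, it can be evaluated by counting agreeing neighbour pairs. The sketch below is our own illustrative code (not from the paper) for the 4-pixel neighbourhood of Fig. 2; since n′ ∈ V(n) implies n ∈ V(n′), every unordered neighbour pair contributes twice to the double sum.

```python
# Potts Hamiltonian H(z) of eq. (4) with a 4-pixel neighbourhood.
import numpy as np

def potts_hamiltonian(z):
    """H(z) = sum_n sum_{n' in V(n)} delta(z_n = z_{n'})."""
    h = np.sum(z[:, :-1] == z[:, 1:])   # horizontal agreements (unordered)
    v = np.sum(z[:-1, :] == z[1:, :])   # vertical agreements (unordered)
    return 2 * (h + v)                  # each pair counted in both directions

z = np.array([[0, 0, 1],
              [0, 1, 1]])
# unordered agreements: (0,0) in row 0, (1,1) in row 1, (0,0) in col 0,
# (1,1) in col 2 -> 4 pairs, so H(z) = 8
print(potts_hamiltonian(z))  # -> 8
```

Larger H(z) corresponds to smoother label maps, which is why exp[βH(z)] with β > 0 promotes spatially coherent segmentations.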

In a similar fashion, the class means are considered prior independent and assigned Gaussian priors µk ∼ N(0, ρ²) with fixed variance ρ²,

f(µ) = ∏_{k=1}^K pN(µk|0, ρ²).  (6)

(To simplify notation the dependence of distributions on the fixed quantity ρ² is omitted.)

Then, using Bayes' theorem and taking into account the conditional independence structure of the model (see Fig. 1), the joint posterior distribution of (z, µ) given y and β can be expressed as follows

f(z, µ|y, β) ∝ f(y|z, µ) f(z|β) f(µ),  (7)

where ∝ denotes proportionality up to a normalising constant that can be retrieved by setting ∫ f(z, µ|y, β) dz dµ = 1.

The graphical structure of this Bayesian model is summarised in Fig. 1 below. Notice the Markovian structure of z and that observations yn are conditionally independent given the model parameters z, µ and σ².

Fig. 1. [Left] Directed acyclic graph of the standard Bayesian model for image segmentation (parameters with fixed values are represented using black boxes). [Right] Local hierarchical representation of the hidden Potts MRF and the observed image for 4 neighbouring pixels. (Figure not reproduced in this transcript.)


Fig. 2. 4-pixel (left) and 6-voxel (right) neighbourhood structures. The pixel/voxel considered appears as a void red circle, whereas its neighbours are depicted in full black and blue.

Finally, given the Bayesian model (7), a segmentation of y is typically obtained by computing the MAP estimator

(ẑ, µ̂) = argmax_{z,µ} f(z, µ|y, β),  (8)

which can also be obtained by solving the equivalent optimisation problem

(ẑ, µ̂) = argmin_{z,µ} − log f(z, µ|y, β).  (9)

Unfortunately these optimisation problems are known to be NP-hard due to the combinatorial nature of the Potts Hamiltonian H(z) defined in (4). As mentioned previously, modern image segmentation methods based on (7) typically address this issue by using approximate (local) integer optimisation algorithms (e.g., graph-cut, message passing) [10], [11], [12], and more recently with convex relaxations of the Potts model (see for instance [6], [7]).

III. PROPOSED METHOD

This section presents a highly computationally efficient approach for performing approximate inference on z when the value of the regularisation parameter β is unknown. The approach is based on a small-variance asymptotics (SVA) analysis combined with a convex relaxation and a pseudo-likelihood approximation of the Potts MRF. Our development has three main steps. In the first step we adopt a hierarchical Bayesian approach to remove β from the model by marginalisation; because marginalising w.r.t. β requires knowledge of the intractable Potts partition function (5), we use a pseudo-likelihood approximation. However, performing inference with the resulting marginalised model is still NP-hard. In the second part of our development we address this difficulty by using auxiliary variables and an SVA analysis to decouple the spatial regularisation and the integer-constrained terms of the Potts model. The evaluation of the resulting SVA Bayesian estimator is then relaxed into a problem that can be computed efficiently by iteratively solving a convex total-variation denoising problem and a least-squares clustering problem, both of which can be solved straightforwardly, even in high dimensions, with parallel implementations of Chambolle's optimisation algorithm [30] and of K-means [31].

A. Marginalisation of the regularisation parameter β

Following a hierarchical Bayesian approach, we address the fact that the value of β is unknown by modelling it as an additional random variable of the Bayesian model. Precisely, we assign β a prior distribution f(β) and define an augmented model that includes β within its unknown parameter vector. By using Bayes' theorem we obtain the joint posterior distribution

f(z, µ, β|y) ∝ f(y|z, µ) f(µ) f(z|β) f(β),  (10)

which includes β as an unknown variable. The rationale for replacing the fixed regularisation parameter β of (7) by a random variable with prior f(β) is that it is often possible to specify this prior distribution such that the amount of regularisation enforced by the Potts MRF is driven by the data and the impact of f(β) on the inferences is minimal. At the same time, experienced practitioners with knowledge of good values of β can specify f(β) to exploit their prior beliefs. In this paper we use a gamma (hyper-)prior distribution

f(β) = γ^α β^(α−1) exp(−γβ) 1_{R+}(β) / Γ(α)

because it has favourable analytical tractability properties that will be useful for our development (appropriate values for the fixed parameters α and γ will be derived later through a small-variance asymptotics analysis).

Moreover, in order to marginalise β from the model we notice that β is conditionally independent of y given z; to be precise, that f(z, µ, β|y) = f(β|z) f(z, µ|y). Therefore, integrating f(z, µ, β|y) with respect to β is equivalent to redefining the posterior distribution (7) with the marginal prior f(z) = ∫_{R+} f(z, β) dβ. Evaluating this marginal prior exactly is not possible because it requires computing the normalising constant of the Potts model C(β) defined in (5), which is a reputedly intractable problem [20]. To obtain an analytically tractable approximation for this marginal prior we adopt a pseudo-likelihood approach [32] and use the approximation C(β) ∝ β^{−N}, leading to

f(z) = ∫_{R+} f(z, β) dβ
     ∝ ∫_{R+} β^N exp(βH(z)) β^(α−1) exp(−γβ) dβ
     ∝ [γ − H(z)]^{−(α+N)},  (11)

and to the following (marginal) posterior distribution

f(z, µ|y) ∝ [∏_{k=1}^K ∏_{n∈Sk} pN(yn|µk, σ²)] × f(µ) [γ − H(z)]^{−(α+N)},  (12)

that does not depend on the regularisation parameter β.
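The gamma integral behind (11) can be checked numerically. The sketch below uses toy values of N, α, γ and H(z) that are our own assumptions (with γ > H(z)); it verifies that the integral equals the closed form Γ(α+N)/(γ − H)^{α+N}, so that f(z) ∝ [γ − H(z)]^{−(α+N)}.

```python
# Numerical check of the marginalisation integral in eq. (11):
# ∫ beta^{N+alpha-1} exp(-(gamma - H) beta) dbeta = Gamma(alpha+N) / (gamma-H)^{alpha+N}
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

N, alpha, gamma_, H = 5, 2.0, 20.0, 12.0   # toy values (assumed), gamma_ > H

val, _ = quad(lambda b: b ** (N + alpha - 1) * np.exp(-(gamma_ - H) * b),
              0, np.inf)
closed = np.exp(gammaln(alpha + N) - (alpha + N) * np.log(gamma_ - H))
print(val, closed)   # the two values agree
```

The β^N factor in the integrand comes from the pseudo-likelihood approximation C(β) ∝ β^{−N} appearing in the denominator of the Potts prior (3).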

B. Small-variance approximation

The next step of our development is to conduct a small-variance asymptotics analysis on (12) and derive the asymptotic MAP estimator of z, µ. We begin by introducing a carefully selected auxiliary vector x such that y and (z, µ) are conditionally independent given x, and that the posterior f(x, z, µ|y) has the same maximisers as (7) (after projection on the space of (z, µ)). More precisely, we define a random vector x ∈ R^N with degenerate prior

f(x|z, µ) = ∏_{k=1}^K ∏_{n∈Sk} δ(xn − µk),  (13)


and express the likelihood of y given x, z and µ as

f(y|x, z, µ) = f(y|x) = ∏_{n=1}^N pN(yn|xn, σ²).

The prior distributions for z and µ remain as defined above. The joint posterior distribution of x, z, µ is given by

f(x, z, µ|y) ∝ f(y|x) f(x|z, µ) f(z) f(µ)
           ∝ [∏_{k=1}^K ∏_{n∈Sk} pN(yn|xn, σ²) δ(xn − µk)] × f(µ) [γ − H(z)]^{−(α+N)}.  (14)

Notice that from an inference perspective (14) is equivalent to (12), in the sense that marginalising x in (14) results in (12).

Moreover, we define H∗(z) as the "complement" of the Hamiltonian H(z), in the sense that for any z ∈ {1, . . . , K}^N, H(z) + H∗(z) = ∑_{n=1}^N |V(n)|,

where |V(n)| denotes the cardinality of the neighbourhood structure of the nth pixel. For the Potts MRF this complement is given by

H∗(z) ≜ ∑_{n=1}^N ∑_{n′∈V(n)} δ(zn ≠ zn′).  (15)

Replacing H(z) = ∑_{n=1}^N |V(n)| − H∗(z) in (14) we obtain

f(x, z, µ|y) ∝ [∏_{k=1}^K ∏_{n∈Sk} pN(yn|xn, σ²) δ(xn − µk)] × f(µ) [H∗(z) + (γ − ∑_{n=1}^N |V(n)|)]^{−(α+N)}.  (16)

Furthermore, noting that H∗(z) only measures whether neighbouring labels are identical or not, regardless of their values, it is easy to check that the posterior (16) remains unchanged if we substitute H∗(z) with H∗(x):

f(x, z, µ|y) ∝ [∏_{k=1}^K ∏_{n∈Sk} pN(yn|xn, σ²) δ(xn − µk)] × f(µ) [H∗(x) + (γ − ∑_{n=1}^N |V(n)|)]^{−(α+N)}.  (17)

Finally, we make the observation that for 1st-order neighbourhoods (see Fig. 2) we have H∗(x) = 2||∇x||0, where ||∇x||0 = ||∇h x||0 + ||∇v x||0 denotes the ℓ0 norm of the horizontal and vertical components of the 1st-order discrete gradient of x, and therefore

f(x, z, µ|y) ∝ [∏_{k=1}^K ∏_{n∈Sk} pN(yn|xn, σ²) δ(xn − µk)] × f(µ) [||∇x||0 + (γ − ∑_{n=1}^N |V(n)|)/2]^{−(α+N)}.  (18)

The graphical structure of this equivalent hierarchical Bayesian model is summarised in Fig. 3 below. Notice that in this model x separates y and σ² from the other model parameters, that the regularisation parameter β has been marginalised, that the MRF is now enforcing spatial smoothness on x not z, and that the elements of z are prior independent.

Fig. 3. [Left] Directed acyclic graph of the proposed Bayesian model, augmented by the auxiliary variable x decoupling µ and z from y, and with marginalisation of the regularisation parameter β (parameters with fixed values are represented using solid black boxes, marginalised variables appear in dashed boxes). [Right] Local representation of three layers of the model for 4 neighbouring pixels. (Figure not reproduced in this transcript.)

We are now ready to conduct a small-variance asymptotics analysis on (18) and derive the asymptotic MAP estimator of x, z, µ, which is defined for our model as [25]

argmin_{x,z,µ} lim_{σ²→0} −σ² log f(x, z, µ|y).

First, we use the fact that δ(s) = lim_{τ²→0} pN(s|0, τ²) to express (18) as follows

f(x, z, µ|y)
∝ lim_{τ²→0} [∏_{k=1}^K ∏_{n∈Sk} pN(yn|xn, σ²) pN(xn|µk, τ²)] × f(µ) [||∇x||0 + (γ − ∑_{n=1}^N |V(n)|)/2]^{−(α+N)}
∝ lim_{τ²→0} [∏_{k=1}^K ∏_{n∈Sk} exp(−(xn − yn)²/(2σ²) − (xn − µk)²/(2τ²))] × f(µ) [||∇x||0 + (γ − ∑_{n=1}^N |V(n)|)/2]^{−(α+N)}.  (19)

Then, in a manner akin to Broderick et al. [25], we allow the model's hyperparameters to scale with σ² in order to preserve the balance between the prior and the likelihood and avoid a trivial limit. More precisely, we set α = N/σ² and assume that σ² vanishes at the same speed as τ². Then, the limit of −σ² log f(x, z, µ|y) as σ² → 0 is given by

lim_{σ²→0} −σ² log f(x, z, µ|y) = ∑_{k=1}^K ∑_{n∈Sk} (1/2)(xn − yn)² + (1/2)(xn − µk)² + N log(||∇x||0 + (γ − ∑_{n=1}^N |V(n)|)/2),  (20)


and the MAP asymptotic estimators of x, z,µ by

argmin_{x,z,µ} ∑_{k=1}^K ∑_{n∈Sk} (1/2)(xn − yn)² + (1/2)(xn − µk)² + N log(||∇x||0 + 1),  (21)

where we have set γ = 2 + ∑_{n=1}^N |V(n)|, so that (γ − ∑_{n=1}^N |V(n)|)/2 = 1 and the penalty log[||∇x||0 + (γ − ∑_{n=1}^N |V(n)|)/2] = log(||∇x||0 + 1) ≥ 0.

C. Convex relaxation and optimisation

Computing the estimator (21) is still NP-hard due to log(||∇x||0 + 1). To address this difficulty we use a convex relaxation of ||∇x||0 and exploit the concavity of the logarithmic function. Precisely, we replace ||∇x||0 by the convex approximation TV(x) = ||∇x||1−2 (i.e., the isotropic total-variation pseudo-norm of x [33]), and obtain the following optimisation problem

argmin_{x,z,µ} ∑_{k=1}^K ∑_{n∈Sk} (1/2)(xn − yn)² + (1/2)(xn − µk)² + N log(TV(x) + 1),  (22)

which can be very efficiently computed by iterative minimisation w.r.t. x, z and µ. The minimisation of (22) w.r.t. z (with x and µ fixed) is a trivial separable integer problem that can be formulated as N independent (pixel-wise) minimisation problems over 1, . . . , K (these unidimensional integer problems can be solved by simply checking the value zn = 1, . . . , K that minimises (22) for each pixel n = 1, . . . , N). Similarly, the minimisation with respect to µ is a trivial quadratic least-squares fitting problem with analytic solution (i.e., by setting µk = (1/|Sk|) ∑_{n∈Sk} xn for each k = 1, . . . , K, where |Sk| denotes the cardinality of Sk). Also note that iteratively minimising (22) with respect to z and µ, with fixed x, is equivalent to solving a least-squares clustering problem with the popular K-means algorithm [31]. Moreover, the minimisation of (22) w.r.t. x (with z and µ fixed) is achieved by solving the non-convex optimisation problem

argmin_x ∑_{k=1}^K ∑_{n∈Sk} (1/2)(xn − yn)² + (1/2)(xn − µk)² + N log[TV(x) + 1],  (23)

which was studied in detail in [34]. Essentially, given some initial condition v^(0) ∈ R^N, (23) can be efficiently minimised by majorisation-minimisation (MM), by iteratively solving the following sequence of trivial convex problems,

v^(ℓ+1) = argmin_x ∑_{k=1}^K ∑_{n∈Sk} (1/2)(xn − yn)² + (1/2)(xn − µk)² + λℓ TV(x),  with λℓ = N / (TV[v^(ℓ)] + 1),  (24)

in which λ_ℓ plays the role of a regularisation parameter, and where we have used the majorant [34]

q(x|v^(ℓ)) = (TV(x) − TV(v^(ℓ))) / (TV(v^(ℓ)) + 1) + log(TV(v^(ℓ)) + 1) ≥ log(TV(x) + 1).   (25)

Notice that each step of (24) is equivalent to a trivial convex total-variation denoising problem that can be very efficiently solved, even in high-dimensional scenarios, by using modern convex optimisation techniques (in this paper we used a parallel implementation of Chambolle's algorithm [30]).
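To make the MM scheme concrete, the sketch below implements one possible version of it: a basic serial NumPy implementation of Chambolle's dual projection algorithm [30] for the inner TV-denoising problems, and the λ_ℓ update of (24). Note that the two quadratic terms of (24) combine, up to a constant, into a single term (x_n − (y_n + µ_{z_n})/2)², so each MM step reduces to TV denoising of (y + µ_z)/2 with weight λ_ℓ/2. The discretisation, step size and iteration counts are our own choices (the paper's parallel implementation is not reproduced here):

```python
import numpy as np

def grad(u):
    # forward differences with Neumann boundary conditions
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv(u):
    gx, gy = grad(u)
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))  # isotropic TV

def tv_denoise(f, lam, n_iter=200, tau=0.125):
    # Chambolle's dual algorithm [30] for argmin_u 0.5*||u - f||^2 + lam*TV(u);
    # tau <= 1/8 guarantees convergence
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        denom = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / denom
        py = (py + tau * gy) / denom
    return f - lam * div(px, py)

def mm_x_update(y, mu_z, n_mm=5):
    # MM iteration (24): 0.5*(x-y)^2 + 0.5*(x-mu_z)^2 equals (x - w)^2 + const
    # with w = (y + mu_z)/2, so each MM step is TV denoising of w with
    # effective weight lambda_l / 2.
    N = y.size
    w = 0.5 * (y + mu_z)
    v = y.copy()
    for _ in range(n_mm):
        lam = N / (tv(v) + 1.0)   # lambda_l = N / (TV(v^l) + 1)
        v = tv_denoise(w, lam / 2.0)
    return v, lam
```

Each call to tv_denoise is the "trivial convex total-variation denoising problem" referred to above; any other TV solver could be substituted.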

The proposed unsupervised segmentation algorithm based on (22) is summarised in Algo. 1 below. We note at this point that, because the overall minimisation problem is not convex, the solution obtained by iterative minimisation of (22) might depend on the initial values of x, z, µ. In all our experiments we have used the initialisation x^(0) = 2y, z = [1, . . . , 1]^T, µ = [0, . . . , 0]^T, which produced good estimation results.

Algorithm 1 Unsupervised Bayesian segmentation algorithm
1: Input: Image y, maximum number of outer iterations T and inner iterations L, tolerance ε.
2: Initialise x^(0) = 2y, z = [1, . . . , 1]^T, µ = [0, . . . , 0]^T.
3: for t = 1 : T do
4:   Set v^(0) = x^(t−1).
5:   for ℓ = 0 : L do
6:     Set λ_ℓ = N/{TV[v^(ℓ)] + 1}.
7:     Compute v^(ℓ+1) using (24), with fixed z = z^(t−1) and µ = µ^(t−1), using Chambolle's algorithm [30].
8:     if (N/{TV[v^(ℓ+1)] + 1} − λ_ℓ) ≥ ελ_ℓ then
9:       Set ℓ = ℓ + 1.
10:    else
11:      Exit to line 14.
12:    end if
13:  end for
14:  Set x^(t) = v^(L).
15:  Compute z^(t) and µ^(t) by least-squares clustering of x^(t) using the K-means algorithm [31].
16:  if z^(t) ≠ z^(t−1) then
17:    Set t = t + 1.
18:  else
19:    Exit to line 22.
20:  end if
21: end for
22: Output: Segmentation z^(t), µ^(t), λ = N/(TV[x^(t)] + 1).
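For illustration, the structure of Algorithm 1 can be sketched end-to-end as follows. To keep the sketch short and self-contained we work on a 1D signal, use the anisotropic TV, replace Chambolle's algorithm [30] in step 7 by a simple iteratively-reweighted least-squares (IRLS) stand-in, and seed the K-means step with quantiles of x rather than the paper's zero initialisation (which relies on a full K-means run [31]); all of these are our simplifications, not the paper's choices:

```python
import numpy as np

def tv1(x):
    return np.sum(np.abs(np.diff(x)))  # anisotropic TV of a 1D signal

def x_step(y, mu_z, lam, n=20, delta=1e-6):
    # IRLS stand-in for step 7: approximately solves
    # argmin_x (x - w)^2 + (lam/2)*TV(x) with w = (y + mu_z)/2,
    # by reweighting |d| via d^2 / sqrt(d0^2 + delta)
    w = 0.5 * (y + mu_z)
    N = w.size
    D = np.diff(np.eye(N), axis=0)          # (N-1, N) finite-difference matrix
    x = w.copy()
    for _ in range(n):
        wgt = 1.0 / np.sqrt(np.diff(x) ** 2 + delta)
        A = 2.0 * np.eye(N) + 0.5 * lam * (D.T * wgt) @ D
        x = np.linalg.solve(A, 2.0 * w)
    return x

def kmeans1d(x, K, n=10):
    # step 15: least-squares clustering, seeded with quantiles of x
    mu = np.quantile(x, np.linspace(0.05, 0.95, K))
    z = np.zeros(x.size, dtype=int)
    for _ in range(n):
        z = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)
        for k in range(K):
            if np.any(z == k):
                mu[k] = x[z == k].mean()
    return z, mu

def algorithm1(y, K, T=50, L=25, eps=1e-3):
    N = y.size
    x = 2.0 * y                              # step 2: x^(0) = 2y
    z = np.zeros(N, dtype=int)
    mu = np.zeros(K)
    for _ in range(T):                       # outer loop (steps 3-21)
        v = x.copy()
        lam = N / (tv1(v) + 1.0)             # step 6
        for _ in range(L):                   # inner MM loop (steps 5-13)
            v = x_step(y, mu[z], lam)        # step 7
            lam_new = N / (tv1(v) + 1.0)
            stop = (lam_new - lam) < eps * lam   # stopping rule of step 8
            lam = lam_new
            if stop:
                break
        x = v                                # step 14
        z_new, mu = kmeans1d(x, K)           # step 15
        if np.array_equal(z_new, z):         # stopping rule of step 16
            break
        z = z_new
    return z, mu, N / (tv1(x) + 1.0)         # step 22
```

Even in this crude form the two stopping rules behave as in the paper: the inner loop exits once λ_ℓ stabilises, and the outer loop exits once the labels z stop changing.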

IV. VALIDATION WITH SYNTHETIC DATA

In this section we validate the proposed Bayesian image segmentation methodology with a series of experiments on synthetic data for which we have ground truth available. To assess the accuracy of our method we compare the results with the estimates produced by the Markov chain Monte Carlo algorithm [20], which estimates the marginal posterior of the segmentation labels f(z|y) with very high accuracy. For completeness, we also report comparisons with the Iterated Conditional Modes (ICM) method [35], which is the predominant approach to perform approximate inferences with the hidden Potts MRF model. We consider two fully unsupervised instances of this method. The first is a non-iterative algorithm in which µ, σ, z are initialised by K-means clustering, followed by β estimated from z by pseudo-likelihood estimation [36], and finally z estimated by ICM conditionally on the values


of µ, σ and β. The second instance is an iterative algorithm in which we alternately update z by ICM, β by pseudo-likelihood estimation, and µ and σ by maximum-likelihood estimation, until the estimate of z stabilises (this algorithm is also initialised by K-means clustering). The iterative instance is generally more accurate than the non-iterative one because the estimates of µ, σ and β are refined in each iteration; however, it is also more computationally expensive.

We tested the algorithms with the four synthetic datasets displayed in Figure 4, which we have designed to represent a range of challenging segmentation conditions related to high noise, large numbers of classes, and model misspecification (i.e., deviations from the model such as heteroscedasticity and non-Gaussianity):

1) GMM4: Gaussian mixture model with K = 4 regions with parameters µ = {0, 1, 2, 4}, σ = {1, √2, √3/2, √2}, and spatial organisation according to a Potts MRF with β = 1.2 and size 256×256 pixels, resulting in a signal-to-noise ratio (SNR) of 7.7dB. This dataset is challenging because there is strong overlap between the distributions of the mixture components (i.e., low SNR) as well as heteroscedasticity, which our method does not take into account and hence represents a case of mild likelihood misspecification.

2) GMM8: Gaussian mixture model with K = 8 regions with parameters µ = {1, 2, . . . , 8}, σ = {0.3, . . . , 0.3}, and spatial organisation according to a Potts MRF with β = 1.5 and size 256×256 pixels. The main challenge here is the large number of mixture components, which is further complicated by the fact that the distributions overlap partially (the SNR for this dataset is 24.1dB).

3) LMM2: Laplace mixture model with K = 2 components with parameters µ = {1, 2}, σ = {1, 1}, and checkerboard spatial organisation (size 256×256 pixels), resulting in a very low SNR value of 1.0dB. This dataset is challenging because it strongly deviates from the Bayesian model considered, which is misspecified both at the level of the prior and the likelihood (note that deviations from the model can significantly degrade segmentation performance [37]). Also, both mixture components overlap significantly, making the segmentation even more difficult.

4) PMM3: Poisson mixture model with K = 3 components with parameters µ = {1, 6, 11}, and spatial organisation according to the three main structures of the Shepp-Logan phantom. This dataset has a low SNR value of 4.7dB. Again, the challenges here are the strong misspecification in the likelihood and prior, and that the mixture components overlap.

All experiments have been conducted using a MATLAB implementation of Algo. 1 with parameters T = 50, L = 25, ε = 10⁻³, and computed on an Intel i7 quad-core workstation running MATLAB 2014a. For the ICM algorithms we have used the MATLAB implementation of [37]. The implementation of the MCMC algorithm [20] is written in MATLAB with specific functions in C, so it has an advantage in terms of computational performance.

Table I reports the segmentation accuracy for the four test datasets and each method (we measure accuracy as the percentage of correctly classified pixels with respect to the ground truth). We observe that the MCMC method produced the most accurate segmentation results, with a remarkable accuracy of the order of 95%−99%, followed by the proposed method, which achieved 90%−96% of correctly classified pixels. Moreover, we also observe that the ICM algorithms were less accurate on average, and struggled particularly with dataset 2 (GMM8) because they were not able to correctly estimate the mixture model parameters. More importantly, Table II reports the computing times associated with these experiments. Observe that the proposed method is very computationally efficient and was one or two orders of magnitude faster than the ICM and MCMC approaches in all experiments (notice that these low computing times are in agreement with a large body of literature reporting that convex relaxations, combined with convex optimisation algorithms, lead to state-of-the-art computational performance). In conclusion, these experiments with synthetic data indicate that the proposed methodology offers extremely fast and accurate segmentation results. Finally, for completeness, we note that in all cases Algo. 1 converged in t = 2 iterations, and determined the following values for the regularisation parameter λ_ℓ: GMM4, λ_ℓ = 5.5603; GMM8, λ_ℓ = 5.6314; LMM2, λ_ℓ = 6.5282; and PMM3, λ_ℓ = 4.8355.
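For reference, the accuracy figure used in Table I (percentage of correctly classified pixels) can be computed as below. Since unsupervised methods recover the classes only up to a relabelling, we maximise over label permutations; the paper does not state how labels are matched to the ground truth, so this matching step is our assumption:

```python
import numpy as np
from itertools import permutations

def segmentation_accuracy(z_est, z_true, K):
    # percentage of correctly classified pixels w.r.t. the ground truth,
    # maximised over the K! possible relabellings of the estimated classes
    best = 0.0
    for perm in permutations(range(K)):
        relabelled = np.asarray(perm)[z_est]
        best = max(best, 100.0 * np.mean(relabelled == z_true))
    return best
```

This brute-force matching is fine for the values K ≤ 8 used here; for large K a Hungarian-assignment matching would be preferable.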

Fig. 4. The four synthetic datasets used to benchmark the proposed image segmentation methodology: (a) GMM4, (b) GMM8, (c) LMM2, (d) PMM3. Segmentation accuracy and computing times are reported in Tables I-II.

V. EXPERIMENTAL RESULTS AND OBSERVATIONS

In this section we demonstrate the proposed methodology empirically with a series of experiments with real data and comparisons with state-of-the-art algorithms. Similarly to Section IV, to assess the accuracy of our method we compare the results with the estimates produced by a Markov chain


TABLE I
SEGMENTATION ACCURACY (PIXELS CORRECTLY CLASSIFIED) FOR THE FOUR DATA DISPLAYED IN FIG. 4.

                    GMM4    GMM8    LMM2    PMM3
Proposed            88.9%   89.6%   96.0%   93.1%
MCMC                94.4%   98.6%   96.1%   98.1%
Iterative ICM       79.5%   18.0%   96.0%   81.5%
Non-iterative ICM   78.4%   60.4%   94.8%   95.6%

TABLE II
COMPUTING TIMES (SECONDS) FOR THE FOUR DATA DISPLAYED IN FIG. 4.

                    GMM4    GMM8    LMM2    PMM3
Proposed            1.41    1.45    1.75    1.54
MCMC                208     252     76.5    73.0
Iterative ICM       113     144     30      108
Non-iterative ICM   20.4    60.4    7.37    21.0

Monte Carlo algorithm [20], which estimates the marginal posterior of the segmentation labels f(z|y) with very high accuracy. We also report comparisons with four supervised fast image segmentation techniques that we have chosen to represent different efficient algorithmic approaches to image segmentation (e.g. MRF energy minimisation solved by graph-cut, active contour solved by Riemannian gradient descent, and two convex models solved by convex optimisation). The specific methods used in the comparison are as follows:
• The two-stage smoothing-followed-by-thresholding algorithm (TSA) [15], which is closely related to a semi-supervised instance of Algo. 1 with a single iteration (TV-denoising followed by K-means), and with a fixed regularisation parameter λ specified by the practitioner.

• Hidden Potts MRF segmentation (7) with fixed β, solved by graph max-flow/min-cut approximation [38].

• Chan-Vese active contour by natural gradient descent [16] (to our knowledge this method is currently the fastest approach for solving active contour models).

• The fast global minimisation algorithm (FGMA) [14] for active contour models. In a similar fashion to our method, this algorithm also involves a model with a TV convex relaxation that is solved by convex optimisation.

We emphasise that, unlike the proposed method, all these efficient approaches are supervised, i.e., they require the specification of regularisation parameters. In the experiments reported hereafter we have tuned the parameters of each algorithm to each image by visual cross-validation to ensure we produce the best results for each method on each image.

To guarantee that the comparisons are fair, we have applied the six algorithms considered in this paper to four images with very different characteristics: the Lungs and Bacteria images from the supplementary material of [14]; one slice of a 3D in-vivo MRI image of a human brain composed of biological tissues (white matter and grey matter) with complex shapes and textures, making the segmentation problem challenging; and a SAR image of an agricultural region in Bourges, France, containing three types of crops (note that SAR image segmentation is challenging because of the presence of strong non-Gaussian noise). The four test images are depicted in Figure 5. These images have been selected as they are composed of different types and numbers of objects; objects which have different shapes (regular and irregular); different noise characteristics; and a range of potential segmentation solutions. Again, all experiments have been conducted using a MATLAB implementation of Algo. 1 with parameters T = 50, L = 25, ε = 10⁻³, and computed on an Intel i7 quad-core workstation running MATLAB 2014a. With regards to the algorithms used for comparison, when possible we have used MATLAB codes made available by the respective authors. It should be noted that these are mainly MATLAB scripts; however, the graph-cut method is written in C++ (the [39] implementation was used here), and the MCMC method is partly written in C, so they have an advantage in terms of computational performance.

We emphasise at this point that we do not seek to explicitly compare the accuracy of the methods because: 1) there is no objective ground truth; 2) the "correct" segmentation is often both subjective and application-specific; and 3) the segmentations can often be marginally improved by fine-tuning the regularisation parameters. What our experiments seek to demonstrate is that our method performs similarly to the most efficient deterministic state-of-the-art approaches, both in terms of segmentation results and computing speed, with the fundamental advantage that it does not require specification of the value of regularisation parameters (i.e., it is fully unsupervised).

Fig. 5. The four test images used in the experiments: (a) Lungs (336×336 pixels), (b) Bacteria (380×380 pixels), (c) Brain (256×256 pixels), and (d) SAR (256×256 pixels).


Figures 6, 7, 8 and 9 respectively show the segmentation results obtained for the Lungs, Bacteria, Brain and SAR test images with each method. The segmentations of the Lungs and Bacteria images have been computed using K = 2 classes to enable comparison with the natural gradient method [16] and FGMA [14] (these methods are based on an active contour model that only supports binary segmentations), whereas the Brain image has been computed using K = 3 classes to produce a clear segmentation of the grey matter and the white matter. Similarly, the SAR image has also been computed using K = 3 to identify the three crops. The computing times associated with these experiments are reported in Table III. Observe that all six methods produced similar segmentation results that are in good visual agreement with each other. In particular, we observe that the proposed method successfully determined the appropriate level of regularisation for each image and produced segmentations that are very similar to the results obtained with the supervised methods graph-cut [38] and TSA [15], and with the unsupervised MCMC algorithm [20], which in a sense represents a benchmark for these approximate inference methods. Moreover, Table III shows that the proposed method was only 2 or 3 times slower than state-of-the-art supervised approaches, which is an excellent performance for a fully unsupervised method. This additional computing time is mainly due to the additional computations related to the non-convex program (23); however, we emphasise that this algorithm has the property of adapting the level of regularisation to the image automatically, and that the computing times reported in Table III do not take into account the time involved in running the supervised algorithms repeatedly to adjust their regularisation parameters. Finally, for completeness, we note that in all cases Algo. 1 converged in t = 2 iterations, and determined the following values for the regularisation parameter λ_ℓ: Lungs, λ_ℓ = 0.065; Bacteria, λ_ℓ = 0.110; Brain, λ_ℓ = 0.095; and SAR, λ_ℓ = 6.6314. Observe the large range of values of λ_ℓ, which highlights the fact that different images do require very different amounts of regularisation, and that the capacity of the proposed methodology to self-adjust λ_ℓ represents an important advantage over supervised approaches.

TABLE III
COMPUTING TIMES (SECONDS) FOR THE LUNGS, BACTERIA, BRAIN AND SAR IMAGES DISPLAYED IN FIGS. 6-9.

                    Lungs   Bacteria   Brain   SAR
Proposed            0.65    0.80       0.23    0.23
TSA [15]            0.20    0.21       0.17    0.18
Graph-Cut [38]      0.30    0.30       0.21    0.23
Nat. grad. [16]     0.20    0.18       n/a     n/a
FGMA [14]           0.32    0.47       n/a     n/a
MCMC [20]           900     1150       533     535

VI. CONCLUSIONS

We have presented a new fully unsupervised approach for computationally efficient image segmentation. The approach is based on a new approximate Bayesian estimator for hidden Potts-Markov random fields with unknown regularisation

Fig. 6. Comparison with the state-of-the-art methods [16], [38], [14], and [15] using the Lungs image (336×336 pixels) from the supplementary material of [14]: (a) Proposed, (b) MCMC [20], (c) TSA [15], (d) Graph-Cut [38], (e) Natural grad. [16], (f) FGMA [14].

parameter β. The estimator is based on a small-variance-asymptotic analysis of an augmented Bayesian model and a convex relaxation combined with a majorisation-minimisation technique. This estimator can be very efficiently computed by using an alternating direction scheme based on a convex total-variation denoising step and a least-squares (K-means) clustering step, both of which can be computed straightforwardly, even in large 2D and 3D scenarios, and with parallel computing techniques. Experimental results on synthetic and real images, as well as extensive comparisons with state-of-the-art algorithms, showed that the resulting new image segmentation methodology performs similarly, in terms of segmentation results and of computing times, to the most efficient supervised image segmentation methods, with the important additional advantage of self-adjusting regularisation parameters. A detailed analysis of the theoretical properties of small-variance-asymptotic estimators in general, and in particular of the methods described in this paper, is currently under investigation. Potential future research topics include the extension of these methods to non-Gaussian statistical models from the exponential family, taking into consideration linear degradation effects such as blurring and missing pixels [40], model choice techniques to address segmentation problems where the number of classes K is unknown, applications to ultrasound and PET image segmentation, and comparisons with other Bayesian segmentation methods based on alternative hidden MRF models that can also be solved by convex


Fig. 7. Comparison with the state-of-the-art methods [16], [38], [14] and [15] using the Bacteria image (380×380 pixels) from the supplementary material of [14]: (a) Proposed, (b) MCMC [20], (c) TSA [15], (d) Graph-Cut [38], (e) Natural gradient [16], (f) FGMA [14].

optimisation, such as [8].

REFERENCES

[1] S. Z. Li, Markov random field modeling in image analysis. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2001.

[2] O. Eches, N. Dobigeon, and J.-Y. Tourneret, "Enhancing hyperspectral image unmixing with spatial correlations," IEEE Trans. Geoscience and Remote Sensing, vol. 49, no. 11, pp. 4239–4247, Nov. 2011.

[3] H. Ayasso and A. Mohammad-Djafari, "Joint NDT image restoration and segmentation using Gauss Markov Potts prior models and variational Bayesian computation," IEEE Trans. Image Process., vol. 19, no. 9, pp. 2265–2277, Sept. 2010.

[4] M. Pereyra, N. Dobigeon, H. Batatia, and J.-Y. Tourneret, "Segmentation of skin lesions in 2D and 3D ultrasound images using a spatially coherent generalized Rayleigh mixture model," IEEE Trans. Med. Imag., vol. 31, no. 8, pp. 1509–1520, Aug. 2012.

[5] T. Vincent, L. Risser, and P. Ciuciu, "Spatially adaptive mixture modeling for analysis of fMRI time series," IEEE Trans. Med. Imag., vol. 29, no. 4, pp. 1059–1074, April 2010.

[6] V. Kolmogorov, M. Heath, I. Re, P. H. S. Torr, and M. Wainwright, "An analysis of convex relaxations for MAP estimation of discrete MRFs," Journal of Machine Learning Research, pp. 71–106, 2009.

[7] N. Komodakis, N. Paragios, and G. Tziritas, "MRF energy minimization and beyond via dual decomposition," IEEE Trans. Patt. Anal. Mach. Intell., vol. 33, no. 3, pp. 531–552, March 2011.

Fig. 8. Segmentation of a brain MRI image (256×256 pixels): (a) Proposed, (b) MCMC [20], (c) TSA [15], (d) Graph-Cut [38].

Fig. 9. Segmentation of a SAR image of an agricultural region (256×256 pixels): (a) Proposed, (b) MCMC [20], (c) TSA [15], (d) Graph-Cut [38].

[8] J. Bioucas-Dias, F. Condessa, and J. Kovacevic, "Alternating direction optimization for image segmentation using hidden Markov measure field models," in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, Feb. 2014, pp. 90190P–90190P.

[9] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Patt. Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, 2001.

[10] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, pp. 65–81, 2004.

[11] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," IEEE Trans. Patt. Anal. Mach. Intell., vol. 28, no. 10, pp. 1568–1583, Oct. 2006.

[12] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," Int. J. Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.

[13] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," IEEE Trans. Patt. Anal. Mach. Intell., vol. 30, no. 6, pp. 1068–1080, 2008.

[14] X. Bresson, S. Esedoglu, P. Vandergheynst, J.-P. Thiran, and S. Osher, "Fast global minimization of the active contour/snake model," J. Math. Imaging Vis., vol. 28, no. 2, pp. 151–167, June 2007.

[15] X. Cai, R. H. Chan, and T. Zeng, "A two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding," SIAM J. Imaging Sci., vol. 6, no. 1, pp. 368–390, Aug. 2013.

[16] M. Pereyra, H. Batatia, and S. McLaughlin, "Exploiting information geometry to improve the convergence properties of variational active contours," IEEE J. Sel. Topics Signal Processing, vol. 7, no. 4, pp. 1–8, Aug. 2013.

[17] ——, "Exploiting information geometry to improve the convergence of nonparametric active contours," IEEE Trans. Image Process., vol. 1, pp. 1–10, 2015.

[18] L. Bar and G. Sapiro, "Generalized Newton-type methods for energy formulations in image processing," SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 508–531, 2009.

[19] G. Sundaramoorthi, A. Yezzi, A. Mennucci, and G. Sapiro, "New possibilities with Sobolev active contours," International Journal of Computer Vision, vol. 84, no. 2, pp. 113–129, May 2009.

[20] M. Pereyra, N. Dobigeon, H. Batatia, and J.-Y. Tourneret, "Estimating the granularity parameter of a Potts-Markov random field within an MCMC algorithm," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2385–2397, June 2013.

[21] M. Pereyra, N. Whiteley, C. Andrieu, and J.-Y. Tourneret, "Maximum marginal likelihood estimation of the granularity coefficient of a Potts-Markov random field within an MCMC algorithm," in Statistical Signal Processing (SSP), 2014 IEEE Workshop on, June 2014, pp. 121–124.

[22] C. McGrory, D. Titterington, R. Reeves, and A. Pettitt, "Variational Bayes for estimating the parameters of a hidden Potts model," Statistics and Computing, vol. 19, no. 3, pp. 329–340, Sept. 2009.

[23] G. Celeux, F. Forbes, and N. Peyrard, "EM procedures using mean field-like approximations for Markov model-based image segmentation," Pattern Recognition, vol. 36, no. 1, pp. 131–144, Jan. 2003.

[24] F. Forbes and G. Fort, "Combining Monte Carlo and mean field like methods for inference in hidden Markov random fields," IEEE Trans. Image Process., vol. 16, no. 3, pp. 824–837, March 2007.

[25] T. Broderick, B. Kulis, and M. I. Jordan, "MAD-Bayes: MAP-based asymptotic derivations from Bayes," J. Mach. Learn. Res., vol. 28, no. 3, pp. 226–234, 2013.

[26] A. Roychowdhury, K. Jiang, and B. Kulis, "Small-variance asymptotics for hidden Markov models," in Advances in Neural Information Processing Systems 26, C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, Eds., 2013, pp. 2103–2111. [Online]. Available: http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/1045.pdf

[27] P. J. Green, "MAD-Bayes matching and alignment for labelled and unlabelled configurations," in Geometry driven statistics, I. L. Dryden and J. T. Kent, Eds. Chichester: Wiley, 2015, ch. 19, pp. 365–375.

[28] K. Jiang, B. Kulis, and M. I. Jordan, "Small-variance asymptotics for exponential family Dirichlet process mixture models," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 3167–3175.

[29] F. Y. Wu, "The Potts model," Rev. Mod. Phys., vol. 54, no. 1, pp. 235–268, Jan. 1982.

[30] A. Chambolle, "An algorithm for total variation minimization and applications," J. Math. Imaging Vis., vol. 20, no. 1-2, pp. 89–97, 2004.

[31] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, 1967, pp. 281–297.

[32] J. Besag, "On the statistical analysis of dirty pictures," J. Roy. Stat. Soc. Ser. B, vol. 48, no. 3, pp. 259–302, 1986.

[33] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, no. 1-4, pp. 259–268, Nov. 1992.

[34] J. Oliveira, J. Bioucas-Dias, and M. Figueiredo, "Adaptive total variation image deblurring: A majorization-minimization approach," Signal Process., vol. 89, no. 9, pp. 1683–1693, 2009.

[35] J. Besag, "On the statistical analysis of dirty pictures," J. Roy. Stat. Soc. B., vol. 48, no. 3, pp. 259–302, 1986.

[36] A. L. M. Levada, N. D. A. Mascarenhas, and A. Tannus, "Pseudolikelihood equations for the Potts MRF model parameters estimation on higher order neighborhood systems," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 522–526, 2008.

[37] J. Gimenez, A. C. Frery, and A. G. Flesia, "When data do not bring information: A case study in Markov random fields estimation," IEEE J. Sel. Topics Appl. Earth Observ. in Remote Sens., vol. 8, no. 1, pp. 195–203, Jan. 2015.

[38] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, September 2004.

[39] S. Bagon, "Matlab wrapper for graph cut," December 2006. [Online]. Available: http://www.wisdom.weizmann.ac.il/~bagon

[40] X. Cai, "Variational image segmentation model coupled with image restoration achievements," Pattern Recognition, vol. 48, no. 1, pp. 2029–2042, Jan. 2015.

Marcelo Pereyra was born in Buenos Aires, Argentina, in 1984. He studied Electronic Engineering and received a double M.Eng. degree from ITBA (Argentina) and INSA Toulouse (France), together with a M.Sc. degree from INSA Toulouse, in June 2009. In July 2012 he obtained a Ph.D. degree in Signal Processing from the University of Toulouse. From 2012 to 2016 he held a Brunel Postdoctoral Research Fellowship in Statistics, a Postdoctoral Research Fellowship from the French Ministry of Defence, and a Marie Curie Intra-European Fellowship for Career Development at the School of Mathematics of the University of Bristol (UK). In 2017 he joined Heriot-Watt University as an Assistant Professor at the School of Mathematical and Computer Sciences. Dr Pereyra is also the recipient of a Leopold Escande PhD Thesis excellence award from the University of Toulouse (2012), an INFOTEL R&D excellence award from the Association of Engineers of INSA Toulouse (2009), and an ITBA R&D excellence award from the Buenos Aires Institute of Technology (2007). His research activities are centred around Bayesian statistical analysis and computation for high-dimensional inverse problems related to mathematical and computational imaging.

Steve McLaughlin was born in Clydebank, Scotland, in 1960. He received the B.Sc. degree in Electronics and Electrical Engineering from the University of Glasgow in 1981 and the Ph.D. degree from the University of Edinburgh in 1990. From 1981 to 1984 he was a Development Engineer in industry involved in the design and simulation of integrated thermal imaging and fire control systems. From 1984 to 1986 he worked on the design and development of high frequency data communication systems. In 1986 he joined the Dept. of Electronics and Electrical Engineering at the University of Edinburgh as a research fellow, where he studied the performance of linear adaptive algorithms in high noise and nonstationary environments. In 1988 he joined the academic staff at Edinburgh, and from 1991 until 2001 he held a Royal Society University Research Fellowship to study nonlinear signal processing techniques. In 2002 he was awarded a personal Chair in Electronic Communication Systems at the University of Edinburgh. In October 2011 he joined Heriot-Watt University as a Professor of Signal Processing and Head of the School of Engineering and Physical Sciences. His research interests lie in the fields of adaptive signal processing and nonlinear dynamical systems theory and their applications to biomedical, energy and communication systems. Prof. McLaughlin is a Fellow of the Royal Academy of Engineering, of the Royal Society of Edinburgh, of the Institution of Engineering and Technology and of the IEEE.