ICTP Lecture Notes
SCHOOL ON
MATHEMATICAL PROBLEMS
IN IMAGE PROCESSING
4 - 22 September 2000
Editor
Charles E. Chidume
The Abdus Salam ICTP
Trieste, Italy
SCHOOL ON MATHEMATICAL PROBLEMS IN IMAGE PROCESSING
— First edition
Copyright © 2000 by The Abdus Salam International Centre for Theoretical Physics
The Abdus Salam ICTP has the irrevocable and indefinite authorization to reproduce and
disseminate these Lecture Notes, in printed and/or computer readable form, from each author.
ISBN 92-95003-04-7
Printed in Trieste by The Abdus Salam ICTP Publications & Printing Section
PREFACE
One of the main missions of the Abdus Salam International Centre for
Theoretical Physics in Trieste, Italy, founded in 1964 by Abdus Salam, is to
foster the growth of advanced studies and research in developing countries.
To this aim, the Centre organizes a large number of schools and workshops
in a great variety of physical and mathematical disciplines.
Since unpublished material presented at the meetings might prove of
great interest also to scientists who did not take part in the schools the Centre
has decided to make it available through a new publication titled ICTP
Lecture Note Series. It is hoped that this formally structured pedagogical
material in advanced topics will be helpful to young students and researchers,
in particular to those working under less favourable conditions.
The Centre is grateful to all lecturers and editors who kindly authorize
the ICTP to publish their notes as a contribution to the series.
Since the initiative is new, comments and suggestions are most welcome
and greatly appreciated. Information can be obtained from the Publications
Section or by e-mail to "[email protected]". The series is
published in house and also made available on-line via the ICTP web site:
"http://www.ictp.trieste.it".
M.A. Virasoro
Director
Introduction
This is the second volume of a new series of lecture notes of the Abdus
Salam International Centre for Theoretical Physics. These new lecture notes
are put onto the web pages of the ICTP to allow people from all over the
world to access them freely. In addition a limited number of hard copies is
printed to be distributed to scientists and institutions which otherwise do
not have access to the web pages.
This volume contains the lecture notes given by A. Chambolle during
the School on Mathematical Problems in Image Processing that took place
at the Abdus Salam International Centre for Theoretical Physics from 4
to 22 September 2000 under the direction of L. Ambrosio (Scuola Normale
Superiore di Pisa), G. Dal Maso (Scuola Internazionale Superiore di Studi
Avanzati) and J.-M. Morel (École Normale Supérieure, Cachan).
The topic of Chambolle's course was "Inverse problems in image process-
ing and image segmentation: some mathematical and numerical aspects".
The School consisted of two weeks of lecture courses and one week of
conference. It was financially supported by the Abdus Salam International
Centre for Theoretical Physics, SISSA (Scuola Internazionale Superiore di
Studi Avanzati, Trieste) and Scuola Normale Superiore di Pisa. I take this
opportunity to express our gratitude to all the lecturers and speakers at the
conference for their contribution towards the success of the School.
Charles E. Chidume
November, 2000
Inverse problems in Image processing and Image
segmentation: some mathematical
and numerical aspects
A. Chambolle
CEREMADE (CNRS, UMR 7534),
Université de Paris-Dauphine, 75775 Paris cedex 16, France
Lecture given at the
School on Mathematical Problems in Image Processing
Trieste, 4–22 September 2000
LNS002001
Abstract
These notes contain an introduction to some approaches to the regularization of inverse problems in image processing, and to the mathematical tools that are necessary to handle these approaches correctly. The methods we consider here are variational methods. We consider mainly the minimization of two kinds of functionals: functionals based on the total variation of the image, and the so-called Mumford and Shah functional, which penalizes the edge set and the gradient of the image. In both cases we study mathematically the existence of a solution in the space of functions with bounded variation (BV), and then discuss some approximations and numerical methods for computing solutions.

Keywords: Image processing, inverse problems, image segmentation, functions with bounded variation, Γ-convergence, iterative algorithms.

AMS Classification numbers: 26A45, 49J45, 49Q20, 68U10
Contents

1 Introduction: denoising and deblurring images
1.1 The "classical approach"
1.2 The total variation criterion
1.3 The segmentation of images
1.3.1 A statistical approach to image denoising
1.3.2 The Mumford–Shah functional

2 Some mathematical preliminaries
2.1 The functions with bounded variation (BV)
2.1.1 Why we need bounded variation functions
2.1.2 BV functions: definition and main properties
2.1.3 Existence for the Rudin–Osher approach
2.2 More properties of BV functions
2.2.1 The jump set $S_u$
2.2.2 BV functions in one dimension
2.2.3 The jump set and the singular part of $Du$
2.2.4 Special BV functions, in dimension one
2.2.5 The general, N-dimensional case
2.2.6 Special BV functions
2.2.7 Ambrosio's compactness theorem
2.2.8 Slicing
2.3 Back to the Mumford–Shah functional
2.3.1 Existence for the weak formulation
2.3.2 From the weak to the strong formulation
2.4 Variational approximations and Γ-convergence

3 The numerical analysis of the total variation minimization
3.1 The discrete energy
3.2 The method
3.3 Proof of the convergence of the algorithm
3.4 Two examples

4 The numerical analysis of the Mumford–Shah problem (I)
4.1 Ambrosio and Tortorelli's approximate energy
4.2 Sketch of the proof of Ambrosio and Tortorelli's theorem, in dimension one
4.2.1 Proof of (i)
4.2.2 Proof of (ii)
4.3 Higher dimensions
4.3.1 The first inequality
4.3.2 The second inequality

5 The numerical analysis of the Mumford–Shah problem (II)
5.1 Rescaling Blake and Zisserman's functional
5.2 The Γ-limit of the rescaled 1-dimensional functional
5.2.1 Proof of (i)
5.2.2 Proof of (ii)
5.3 The Γ-limit of the rescaled 2-dimensional functional
5.4 More general finite-differences approximations

6 A numerical method for minimizing the Mumford–Shah functional
6.1 An iterative procedure for minimizing (34)
6.2 Anisotropy of the length term
6.3 Numerical experiments

A Proof of Theorems 11 and 12
A.1 A compactness lemma
A.2 Estimate from below the Γ-limit
A.3 Estimate from above the Γ-limit
A.4 Proof of Theorem 12

References
Main notations

- $a \vee b$, $a \wedge b$: respectively, the max and the min of the two real numbers $a, b \in \mathbb{R}$.

- $\mathcal{H}^k$: the k-dimensional Hausdorff measure. In particular, for every set $E \subseteq \mathbb{R}^N$, $\mathcal{H}^0(E)$ is the cardinality of E, also denoted by $\#E$.

- $\chi_E(x)$: the characteristic function of a set E, i.e., $\chi_E(x) = 1$ if $x \in E$ and $\chi_E(x) = 0$ otherwise.

- $|E| = \mathcal{L}^N(E) = \int_{\mathbb{R}^N} \chi_E(x)\,dx$: the Lebesgue measure of $E \subseteq \mathbb{R}^N$.

- $C_c(\Omega)$, $C^1_c(\Omega)$, $C^\infty_c(\Omega)$: the spaces of compactly supported continuous (respectively, continuously differentiable, infinitely differentiable) real-valued functions on the domain $\Omega \subseteq \mathbb{R}^N$. $C^\infty_c(\Omega)$ is also denoted by $\mathcal{D}(\Omega)$ when it is equipped with the appropriate topology in order to define the distributions by duality (see [55]). $C_c(\Omega;\mathbb{R}^N) = [C_c(\Omega)]^N$, etc.

- $C_0(\Omega)$: the space of the real-valued functions that are continuous on $\Omega$ and vanish at the boundary and/or at infinity, in the sense that if $\varphi \in C_0(\Omega)$, then $\forall \varepsilon > 0$ there exists a compact set $K \subseteq \Omega$ such that $\sup_{\Omega \setminus K} |\varphi| \le \varepsilon$. The norm on $C_0(\Omega)$ is $\|\varphi\| = \sup_\Omega |\varphi|$. With this norm, $C_0(\Omega)$ is the closure of $C_c(\Omega)$. Similarly, $C_0(\Omega;\mathbb{R}^N) = [C_0(\Omega)]^N$.

- $\mathcal{M}(\Omega)$: the space of bounded Radon measures on $\Omega$. It is isomorphic (and isometric) to the topological dual $C_0(\Omega)'$ of $C_0(\Omega)$. $\mathcal{M}(\Omega;\mathbb{R}^N) = [\mathcal{M}(\Omega)]^N = C_0(\Omega;\mathbb{R}^N)'$.

- $\langle x', x \rangle$: the Euclidean scalar product of $x, x' \in \mathbb{R}^N$, or the duality product between an element $x \in X$ of a space X and an element $x' \in X'$ of its dual (also sometimes denoted by $\langle x', x \rangle_{X',X}$). The Euclidean norm in $\mathbb{R}^N$ is usually denoted by $|\cdot| = \sqrt{\langle \cdot, \cdot \rangle}$.

- $(a,b)$: the set $\{t \in \mathbb{R} : a < t < b\}$. $[a,b] = \{t \in \mathbb{R} : a \le t \le b\}$, $(a,b] = \{t \in \mathbb{R} : a < t \le b\}$, etc.

- $S^{N-1} = \{\xi \in \mathbb{R}^N : |\xi| = 1\}$: the $(N-1)$-dimensional unit sphere in $\mathbb{R}^N$.
1 Introduction: denoising and deblurring images
One fundamental branch of image processing concerns the problem of
reconstructing images: given some data (which may be a corrupted image,
but also any other kind of signal, such as the output of a tomography device
or of a satellite antenna), how does one reconstruct a clear and clean image
that can be correctly understood by a human operator or post-processed by
other image analysis methods?
The most basic examples of image reconstruction problems are the problems
of denoising and of deblurring an image. Although they are the simplest,
they share many common features with more complicated problems that are
usually too specific for the purpose of short lectures. All these problems
belong to the class usually known as inverse problems. This means that the
process through which the data is obtained from the physical characteristics
of the observed scene corresponds to transformations that are roughly well
understood and can be more or less correctly modelled mathematically, but
whose inverse either is not known or is not computable by direct methods,
or whose computation is highly unstable and sensitive to small changes in the
data (or noise), so that the scene itself is difficult to reconstruct.
First we will describe the main classical approach to denoising and de-
blurring (more or less the "standard" method for solving inverse problems)
and will try to explain why it is not well suited to the nature and structure
of images. Then, we will introduce solutions that have been proposed in
recent years to improve this approach.
1.1 The "classical approach"
Assume you observe a signal (an image) which is a matrix $G = (g_{i,j})_{1 \le i,j \le n}$ of grey level values in $[0,1]$, and suppose you know that this signal is the sum of a "perfect world" unknown signal $U = (u_{i,j})_{1 \le i,j \le n}$ and an additive Gaussian noise $N = (n_{i,j})_{1 \le i,j \le n}$, where, for instance, all $n_{i,j}$ are independent and have mean 0 and known variance $\sigma^2$.

From a different point of view, in the continuous setting, you can assume that the signal you observe is a bounded grey-level function

$$g : \Omega \to [0,1]$$

where $\Omega$ is "the screen", usually an open domain of $\mathbb{R}^2$ (although lower or higher dimensions may be considered), and most of the time, in the applications, a rectangle, e.g., $(0,1) \times (0,1)$. This function $g(x)$ will be assumed
to be the sum $u(x) + n(x)$ of a "good image" $u(x)$ and an oscillation $n(x)$ that we would like to remove. We will assume that $\int_\Omega n(x)\,dx = 0$ and that $\int_\Omega n(x)^2\,dx = \sigma^2$ is known or can be correctly estimated.
The first point of view (the discrete setting) describes well the structure of
digital images, and is usually adopted in statistical approaches to image
reconstruction. We will return to this setting in Section 1.3, devoted to the
image segmentation problem, since the origins of the approach that we will
discuss in these notes are to be found in the statistical approach to image
denoising. However, in the PDE or variational approach that we will usually
adopt here, it is more common and more convenient to work in the continuous
setting, and except where otherwise mentioned we will consider this point of
view in the sequel.
Up to now we have just considered an image corrupted by some noise, but usually an image also goes through all kinds of degradations, which are usually modelled by a blur with a more or less known kernel. This means that instead of $g(x) = u(x) + n(x)$, the correct model should be $g(x) = Au(x) + n(x)$, where A is a linear operator, say, from $L^2(\Omega)$ into $L^2(\Omega)$ (or any kind of reasonable function space). Usually

$$Au(x) = \varphi * u(x) = \int_\Omega \varphi(x-y)\,u(y)\,dy$$

is simply a blur (a convolution), with $\varphi$ some (usually non-negative) kernel that is known or estimated, but one may imagine more complex operators (like tomography kernels, or all sorts of transformations).
Then, the problem we need to solve is the following: given g and an estimation of A and $\sigma^2$, is it possible to get a good approximation of u?

The first idea would be to compute $A^{-1}g = u + A^{-1}n$; however, this is not feasible in practice: the operator A is often not invertible, or its inverse is impossible to compute. Consider for instance the case where $Au = \varphi * u$. In the Fourier domain, we find that $\widehat{Au} = \hat{\varphi}\,\hat{u}$, where $\hat{\cdot}$ denotes the Fourier transform. So $u = A^{-1}v$ if and only if $\hat{u} = \hat{v}/\hat{\varphi}$. But, even if $\hat{\varphi}$ does not vanish, this ratio is usually not in $L^2$ for an arbitrary $v \in L^2$. Moreover, if $\varphi$ is a smooth low-pass filter, then $\hat{\varphi}(\xi)$ is very small for large frequencies $|\xi|$, so that in the case where v is the oscillatory signal n, for which $|\hat{n}(\xi)|$ remains strictly greater than zero for large $|\xi|$, the ratio $\hat{n}(\xi)/\hat{\varphi}(\xi)$ will become very large and go to $+\infty$ as $|\xi|$ increases. This amplification of the high frequencies gives birth to wild oscillations and artifacts that make the image $u + A^{-1}n$ impossible to read.
A better approach to this kind of problem, therefore, is the following: we will try to find the "best" function u among all u satisfying

$$\int_\Omega \big(Au(x) - g(x)\big)\,dx = 0, \qquad \int_\Omega |Au(x) - g(x)|^2\,dx = \sigma^2. \qquad (1)$$

So the main issue, now, is to find a good criterion for characterizing what the "best" function u is.

The classical approach of Tichonov consists in minimizing some quadratic norm of u, like $\int_\Omega |u|^2$ or $\int_\Omega |\nabla u|^2$, under the constraints (1). Both problems can easily be solved (using the Fourier transform), and the linear transformation of g that gives the solution u is called a Wiener filter.
Figure 1: From left to right: a white square on a black background; the same image with noise added; the Tichonov reconstruction by minimizing $\int |u|^2$; the minimization of $\int |\nabla u|^2$.
However, while the first criterion is not regularizing enough and produces images that still look very noisy, the criterion $\int_\Omega |\nabla u|^2$ is not well suited either for the analysis of images (see Fig. 1). Indeed, if it is finite, it means that the image belongs to the Sobolev space $H^1(\Omega)$, and it is well known that a function in that space cannot have discontinuities along a hypersurface, whereas the grey level of an image should be allowed to have such discontinuities, which correspond to edges and boundaries of objects in the image. For instance, in dimension 1, it is well known that if $u \in H^1(I)$ (I being some interval of $\mathbb{R}$), then for every $x, y \in I$ with $x \le y$,

$$u(y) - u(x) = \int_x^y u'(s)\,ds \;\le\; \sqrt{y - x}\,\sqrt{\int_x^y |u'(s)|^2\,ds},$$

so that $u \in C^{0,\frac12}(I)$ (the space of continuous $\frac12$-Hölder functions on I) and cannot have discontinuities.

This motivates the introduction of the criterion that we discuss in the next section.
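The Hölder estimate above is just Cauchy–Schwarz, and can be checked numerically on any smooth function; the function and evaluation points below are arbitrary choices for illustration:

```python
import numpy as np

# Numerical check of |u(y) - u(x)| <= sqrt(y - x) * sqrt(int_x^y |u'|^2 ds)
# for an arbitrary smooth (hence H^1) function on [0, 1].
xs = np.linspace(0.0, 1.0, 10001)
dx = xs[1] - xs[0]
u = np.sin(3 * np.pi * xs) + xs**2
du = 3 * np.pi * np.cos(3 * np.pi * xs) + 2 * xs   # exact derivative u'

i, j = 1200, 8700                                  # x = xs[i], y = xs[j]
lhs = abs(u[j] - u[i])
rhs = np.sqrt(xs[j] - xs[i]) * np.sqrt(np.sum(du[i:j] ** 2) * dx)
print(lhs <= rhs)
```

The slack in the inequality is large here; it becomes tight only when $u'$ is constant on $[x, y]$.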
1.2 The total variation criterion
In their paper [54], Rudin, Osher and Fatemi describe a different approach (see also [46, 53, 60, 61, 24, 25, 44, 33, 47]). Their idea is to try to find a criterion of minimization that corresponds better to the structure of images. They propose to consider the "total variation" of the function u as a measure of the optimality of an image.

The total variation (which will be introduced properly in Section 2.1) is roughly the integral $\int_\Omega |\nabla u(x)|\,dx$. Its main advantage is that it can be defined for functions that have discontinuities along hypersurfaces (in 2-dimensional images, along 1-dimensional curves), and this is essential to get a correct representation of the edges in an image.

The problem to solve is thus the following:

$$\min\Big\{ \int_\Omega |\nabla u(x)|\,dx \;:\; u \text{ satisfies } (1) \Big\}. \qquad (2)$$

We will show in Section 2.1 that under some simple and natural assumptions, this problem has a solution. Then, we will propose a numerical approach for computing a solution.
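In a discrete setting the total variation is simple to compute, which gives a feel for why it discriminates between edges and noise. The sketch below is our own illustration (the isotropic forward-difference discretization is one common choice among several); it compares the TV of a clean white square with that of its noisy version:

```python
import numpy as np

def total_variation(u):
    """Isotropic discrete total variation: sum of |grad u| over pixels,
    computed with forward differences (one common discretization)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return float(np.sqrt(dx**2 + dy**2).sum())

rng = np.random.default_rng(0)
n = 64
square = np.zeros((n, n))
square[16:48, 16:48] = 1.0                      # white square on black background
noisy = square + 0.1 * rng.standard_normal((n, n))

tv_clean = total_variation(square)              # close to 4 * 32, the perimeter
tv_noisy = total_variation(noisy)
print(tv_clean, tv_noisy)                       # the noise inflates the TV a lot
```

A sharp edge contributes its length times the jump height, while oscillatory noise contributes at every pixel; minimizing (2) therefore removes oscillations while still allowing sharp jumps.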
1.3 The segmentation of images
The last approach that we will discuss in these notes can be seen as an
independent problem, although historically it has the same origin. It is called
the problem of image segmentation, and can be described as the problem
of �nding a simple representation of a given image in terms of edges and
smooth areas. The proposition of D. Mumford and J. Shah [49, 50]) to solve
this problem by minimizing a functional is indeed derived from statistical
approaches to image denoising, introduced in particular by S. and D. Geman,
that we will describe in the next section. Again, the problem of Geman and
Geman was to regularize correctly an inverse problem (the problem that we
have described in the previous paragraphs, written in the discrete setting),
and to restore correctly the edges of the image. Thus we will brie�y describe
the point of view of Geman and Geman, and explain then how Mumford and
Shah derived their continuous formulation.
1.3.1 A statistical approach to image denoising
The origins of the variational approaches to image segmentation are to be
found in Geman and Geman's famous paper [41], in which they introduce a
statistical approach to image analysis that has proved to be very efficient.
First we will briefly explain how it appeared in the probabilistic setting.
We return to the discrete setting of the image denoising problem: the observed signal (or image) is a matrix $G = (g_{i,j})_{1 \le i,j \le n}$ of grey level values in $[0,1]$, and is the combination of a "perfect world" unknown signal $U = (u_{i,j})_{1 \le i,j \le n}$ and an additive Gaussian noise $N = (n_{i,j})_{1 \le i,j \le n}$. The $n_{i,j}$ are independent and have mean 0 and variance $\sigma^2$. If you know the a priori probability $P(U)$ of the perfect world signal U, since for a given $G = U + N$ the probability of G knowing U is $P(G|U) = P(N = G - U) \propto \exp(-\|G - U\|^2 / 2\sigma^2)$, Bayes' rule tells you that

$$P(U|G)\,P(G) = P(G|U)\,P(U),$$

so that $P(U|G)$, up to a constant, is $P(G|U)\,P(U)$, that is:

$$\frac{1}{(\sqrt{2\pi}\,\sigma)^{n \times n}} \exp\Big( -\frac{1}{2\sigma^2} \sum_{i,j} (g_{i,j} - u_{i,j})^2 \Big)\, P(U).$$
Geman and Geman proposed the following a priori probability for U: they considered that most scenes are piecewise smooth with possible discontinuities (the edges), and introduced an edge set (or line process) $L = \big( (l_{i+\frac12,j})_{1 \le i < n,\, 1 \le j \le n},\ (l_{i,j+\frac12})_{1 \le i \le n,\, 1 \le j < n} \big)$, where each variable $l_{\cdot,\cdot}$ is either 0 or 1, and (see Fig. 2):

$$l_{i+\frac12,j} = \begin{cases} 1 & \text{if there is a break (a vertical piece of edge) between } (i,j) \text{ and } (i+1,j), \\ 0 & \text{if } U \text{ has to be smooth between } (i,j) \text{ and } (i+1,j), \end{cases}$$

$$l_{i,j+\frac12} = \begin{cases} 1 & \text{if there is a break (a horizontal piece of edge) between } (i,j) \text{ and } (i,j+1), \\ 0 & \text{if } U \text{ has to be smooth between } (i,j) \text{ and } (i,j+1). \end{cases}$$

They then proposed the following probability law for $(U,L)$:

$$P(U,L) = \frac{1}{Z} \exp\Big\{ -\sum_{i,j} \Big[ \lambda (1 - l_{i+\frac12,j})(u_{i+1,j} - u_{i,j})^2 + \alpha\, l_{i+\frac12,j} + \lambda (1 - l_{i,j+\frac12})(u_{i,j+1} - u_{i,j})^2 + \alpha\, l_{i,j+\frac12} \Big] \Big\}$$
Figure 2: The line process: $l_{i+\frac12,j}$ and $l_{i,j+\frac12}$.
where $\lambda$, $\alpha$ are two positive weights, and Z is computed in order to have $\sum_{U,L} P(U,L) = 1$, the sum being computed over all the possible states $(U,L)$.
The problem that needs to be solved is therefore the following:

"Among all possible images U and line processes L, find the one that has the greatest probability

$$P(U, L \mid G) \propto e^{-E(U,L,G)}, \qquad (3)$$

where the free energy $E(U,L,G)$ is given by

$$E(U,L,G) = \sum_{i,j} \lambda \Big[ (1 - l_{i+\frac12,j})(u_{i+1,j} - u_{i,j})^2 + (1 - l_{i,j+\frac12})(u_{i,j+1} - u_{i,j})^2 \Big] + \alpha \Big[ l_{i+\frac12,j} + l_{i,j+\frac12} \Big] + \frac{1}{2\sigma^2} (g_{i,j} - u_{i,j})^2 \qquad (4)$$

and $G = (g_{i,j})_{1 \le i,j \le n}$ is the given data."
In what follows, since the observed data G will be fixed, we will drop the
dependence on G in the notation and merely write $E(U,L)$.
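The free energy (4) is easy to evaluate on a toy image, which shows the role of the line process: paying the price α per edge element can be cheaper than paying λ times the squared jump. The sketch below is our own illustration; the parameter values and the array convention for $l_{i+\frac12,j}$, $l_{i,j+\frac12}$ are arbitrary choices:

```python
import numpy as np

def free_energy(U, G, lv, lh, lam=1.0, alpha=0.5, sigma=0.1):
    """Free energy E(U, L; G) of eq. (4).
    Convention (ours): lv[i, j] stands for l_{i+1/2, j} (break between
    rows i and i+1); lh[i, j] for l_{i, j+1/2} (break between cols j, j+1)."""
    E = lam * ((1 - lv) * (U[1:, :] - U[:-1, :]) ** 2).sum()
    E += lam * ((1 - lh) * (U[:, 1:] - U[:, :-1]) ** 2).sum()
    E += alpha * (lv.sum() + lh.sum())
    E += ((G - U) ** 2).sum() / (2 * sigma**2)
    return float(E)

# A 4x4 image with a unit jump between columns 1 and 2, observed noiselessly.
U = np.zeros((4, 4))
U[:, 2:] = 1.0
G = U.copy()

E_no_edges = free_energy(U, G, np.zeros((3, 4)), np.zeros((4, 3)))
lh = np.zeros((4, 3))
lh[:, 1] = 1.0                      # declare an edge along the jump
E_with_edges = free_energy(U, G, np.zeros((3, 4)), lh)
print(E_no_edges, E_with_edges)     # 4.0 vs 2.0: the edge set is worth its price
```

With these weights, each unit jump costs λ = 1 without an edge element and only α = 0.5 with one, so the line process "switches on" along true discontinuities.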
Then, Geman and Geman proposed to maximize the probability (3) using
a simulated annealing algorithm (see for instance [11, 12], [15], [32], the
book [51] for more general segmentation models, and the book on Markov
Random Field Modeling in Computer Vision by Li [45] for a general intro-
duction to the field). This kind of method is still widely used in the computer
vision community and gives good results. It has to be adapted to each partic-
ular segmentation problem (among which the problem we have described is
one of the simplest, but might not be the most interesting!).
We will present other approaches, since in some simple cases it might be
too costly to implement a simulated annealing algorithm. Notice that the
problem of maximizing (3) is equivalent to the problem of finding a minimizer
of the free energy $E(U,L)$ that appears in the exponential in (3). The prob-
lem is that this energy is not convex, so that there is no known deterministic
(i.e., non-probabilistic) algorithm that can be proved to surely converge to
the minimum. The history of the minimization of $E(U,L)$ is therefore mostly
a competition for finding a "better" algorithm, in any possible sense.
Already in the 1980s, many authors suggested deterministic methods to min-
imize the energy (4) directly. See for instance [38, 39], or [40], and more re-
cently [10], but of course this list is far from exhaustive.

In most of these papers, the problem is iteratively approximated by a
sequence of simpler problems, each one becoming "less convex" as the process
evolves. This is the central idea of the book Visual Reconstruction by Blake
and Zisserman [14], who introduce the so-called "Graduated Non-Convexity"
(GNC) algorithm.
They first noticed that, minimizing with respect to L, the energy E in (4) can be rewritten as

$$E(U) = \sum_{i,j} W_{\alpha,\lambda}(u_{i+1,j} - u_{i,j}) + W_{\alpha,\lambda}(u_{i,j+1} - u_{i,j}) + \frac{1}{2\sigma^2} (g_{i,j} - u_{i,j})^2 \qquad (5)$$

where the non-convex potential $W_{\alpha,\lambda}$ is (see Figure 3)

$$W_{\alpha,\lambda}(x) = \min(\lambda x^2, \alpha).$$
(We will also denote $\min(\lambda x^2, \alpha)$ by $(\lambda x^2) \wedge \alpha$.)

Figure 3: The function $W_{\alpha,\lambda}(x)$ for $\alpha = \lambda = 1$.

Blake and Zisserman call $E(U)$ the "weak membrane" energy, since it looks like the potential of an elastic membrane that can break when the elastic energy becomes locally too high. Notice now that their problem is very similar to the inverse problems that we have presented in the previous sections. Now, instead of minimizing a regularizing term (rather more complex than Tichonov's) under a constraint like (1) (with $A = \mathrm{Id}$), the energy has a term $\frac{1}{2\sigma^2}(g_{i,j} - u_{i,j})^2$ that can be seen as a Lagrange multiplier for the constraint $\int_\Omega |u(x) - g(x)|^2\,dx = \sigma^2$.
Their idea to minimize $E(U)$ is to replace $W_{\alpha,\lambda}$ with a family of potentials $W^\gamma_{\alpha,\lambda}$, $\gamma \in [0,1]$, with $W^\gamma_{\alpha,\lambda}$ convex for $\gamma = 0$ and tending gradually to $W_{\alpha,\lambda}$ as $\gamma$ increases to 1. They then propose to solve the problem for small $\gamma$, and then to increase $\gamma$ slowly to improve the solution.
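A minimal continuation of this kind can be sketched as follows. The interpolated family and all parameter values below are our own simplification for illustration; this is not Blake and Zisserman's exact family of potentials, only the same idea: descend on a convex energy first, then slide towards the non-convex one.

```python
import numpy as np

lam, alpha, s = 1.0, 0.05, 0.2     # weights (arbitrary illustrative values)

def dW(x, gamma):
    # Derivative of W_gamma(x) = (1-gamma)*lam*x^2 + gamma*min(lam*x^2, alpha):
    # a simple convex-to-nonconvex interpolation (not Blake-Zisserman's family).
    return 2 * lam * x * ((1 - gamma) + gamma * (lam * x**2 < alpha))

def grad_energy(u, g, gamma):
    # Gradient of sum_i W_gamma(u[i+1]-u[i]) + (1/2s^2) sum_i (u[i]-g[i])^2
    d = np.diff(u)
    w = dW(d, gamma)
    grad = np.zeros_like(u)
    grad[:-1] -= w                  # d/du[i]   of W(u[i+1]-u[i])
    grad[1:] += w                   # d/du[i+1] of W(u[i+1]-u[i])
    grad += (u - g) / s**2
    return grad

rng = np.random.default_rng(0)
n = 100
g = (np.arange(n) >= n // 2).astype(float) + 0.05 * rng.standard_normal(n)

u = g.copy()
for gamma in np.linspace(0.0, 1.0, 11):   # graduation: 0 = convex, 1 = weak membrane
    for _ in range(300):                  # plain gradient descent at fixed gamma
        u -= 1e-3 * grad_energy(u, g, gamma)

print(u[n // 2] - u[n // 2 - 1])          # the jump survives the smoothing
```

Each stage starts from the previous stage's minimizer, which is the whole point of the graduation: the convex stages choose the basin, and the later, less convex stages stop penalizing large jumps, so the edge is preserved while the small noisy differences keep being smoothed.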
1.3.2 The Mumford–Shah functional

In order to study the energies (4) or (5), Mumford and Shah (see [49, 50]) proposed to rewrite them in a continuous setting. They considered an observed image $g(x,y)$, with $(x,y) \in \Omega$, a bounded open set of $\mathbb{R}^2$, and $g(x,y) \in [0,1]$ for (almost) every $(x,y)$. They then noticed that the variable L, or rather the set $\{L = 1\}$, describes the "discontinuity" or "jump set" $K \subseteq \Omega$ of a piecewise regular function $u(x,y)$, $(x,y) \in \Omega$, whereas the finite differences $u_{i+1,j} - u_{i,j}$ (resp., $u_{i,j+1} - u_{i,j}$) are approximations of the partial derivatives $\frac{\partial u}{\partial x}(x,y)$ (resp., $\frac{\partial u}{\partial y}(x,y)$). The energy they wrote was thus (with the standard notation $\nabla u = (\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y})$ for the gradient)

$$E(u,K) = \lambda \int_{\Omega \setminus K} |\nabla u(x,y)|^2\,dx\,dy + \alpha \cdot \mathrm{length}(K \cap \Omega) + \mu \int_\Omega (u(x,y) - g(x,y))^2\,dx\,dy \qquad (6)$$

where $\lambda, \alpha, \mu$ are positive parameters. They then proposed to study the problem of minimizing the energy (6).
In these lecture notes we will try to explain briefly

(a) how this problem can be handled mathematically, in what setting and in what function space, and in what sense it has a solution,

(b) a first approximation result that has been proposed in order to minimize the energy $E(u,K)$ more easily, in a continuous setting,

(c) in what sense one can say that $E(u,K)$ and $E(U,L)$ are the "same energies", in a continuous and in a discrete setting,

(d) how it is possible to approximate $E(u,K)$ by discrete energies that are, in some sense, "better" than the energy $E(U,L)$.

What we will not describe, on the other hand, are the possible finite-element approaches that have also been proposed for solving the Mumford–Shah problem. It is still not clear whether or not they are of interest for image processing applications. They are usually useful in other fields where similar problems are relevant, in particular in fracture mechanics. The interested reader may consult [13, 37, 16], or [23, 17].
2 Some mathematical preliminaries
2.1 The functions with bounded variation (BV )
2.1.1 Why we need bounded variation functions
For the study of Rudin and Osher's problem (2), the correct mathematical setting is clearly that of the functions with bounded variation (the criterion they propose to minimize being simply the semi-norm defining such functions), which we will define in the next paragraph. Although it may be not as clear, this is also true for the analysis of the Mumford–Shah functional. Indeed, in order to study the energy E, Ambrosio and De Giorgi suggested introducing a "weak formulation" depending only on the variable u. This formulation assumes that we are able to define, given a function u, a set of discontinuities $S_u$ and a gradient $\nabla u$ everywhere outside of $S_u$. The weak Mumford–Shah energy is then

$$E(u) = \int_\Omega |\nabla u(x)|^2\,dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega |u(x) - g(x)|^2\,dx. \qquad (7)$$

Here we consider that u, g are defined on a domain $\Omega$ of a space of arbitrary dimension N, and the set $S_u$ is $(N-1)$-dimensional; for images you can just replace N by 2 everywhere in the notes. $\mathcal{H}^{N-1}$ denotes the $(N-1)$-dimensional Hausdorff measure (see for instance [35]). It is a Borel measure on $\mathbb{R}^N$ that agrees with the traditional definition of the surface area for every regular hypersurface in $\mathbb{R}^N$ (any bounded part of a hyperplane, a sphere, ...).
The discontinuity set $S_u$ can be defined for very general functions, but it usually has no kind of regularity. A correct definition of the gradient $\nabla u$ requires more regularity of u. Usually, we can define a gradient $\nabla u$ as an integrable function if u belongs to the Sobolev space $W^{1,1}(\Omega)$ (or at least $W^{1,1}_{\mathrm{loc}}(\Omega)$). But in this case it is possible to show that $S_u$ is almost empty (in fact, $\mathcal{H}^{N-1}(S_u) = 0$: we say that $S_u$ is $\mathcal{H}^{N-1}$-essentially empty).

The space of bounded variation functions, which we are going to introduce, does not suffer from this drawback. It contains functions for which it is possible to define correctly $S_u$ and the gradient $\nabla u$, in such a way that $0 < \mathcal{H}^{N-1}(S_u) < +\infty$ and $\int_\Omega |\nabla u(x)|^2\,dx < +\infty$. Such a function combines some regularity with discontinuities across the essentially $(N-1)$-dimensional set $S_u$.
2.1.2 BV functions: definition and main properties

The space of bounded variation functions on $\Omega$, denoted by $BV(\Omega)$, is defined in the following way:

$$BV(\Omega) = \big\{ u \in L^1(\Omega) : Du \text{ is a bounded Radon vector measure on } \Omega \big\}, \qquad (8)$$

where $Du$ is the distributional (or weak) derivative of u, defined by

$$\langle Du, \phi \rangle_{\mathcal{D}'(\Omega;\mathbb{R}^N),\,\mathcal{D}(\Omega;\mathbb{R}^N)} = -\int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx$$

for any vector field $\phi \in \mathcal{D}(\Omega;\mathbb{R}^N)$, i.e., $C^\infty$ with compact support in $\Omega$.

Let us denote by $\mathcal{M}(\Omega;\mathbb{R}^N)$ the space of N-dimensional bounded (vector-valued) Radon measures on $\Omega$. It is well known (as a consequence of Riesz' representation theorem) that $\mathcal{M}(\Omega;\mathbb{R}^N)$ is identified with the dual of $C_0(\Omega;\mathbb{R}^N)$, the space of all continuous vector fields vanishing at the boundary (this means that if $\phi \in C_0(\Omega;\mathbb{R}^N)$, for every $\varepsilon > 0$ there exists a compact set $K \subseteq \Omega$ such that $\sup_{x \notin K} |\phi| < \varepsilon$), on which the norm is given by $\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} = \sup_{x \in \Omega} |\phi(x)|$. If $\mu \in \mathcal{M}(\Omega;\mathbb{R}^N)$ is a measure, we can define its variation as the positive Borel measure given by

$$|\mu|(E) = \sup\Big\{ \sum_{i=1}^n |\mu(E_i)| : \bigcup_{i=1}^n E_i \subseteq E,\ E_i \cap E_j = \emptyset\ \forall i \neq j \Big\}, \qquad (9)$$

for every Borel set $E \subseteq \Omega$ (here the $E_i$, $i = 1, \dots, n$, are disjoint Borel sets). Saying that the measure $\mu$ is bounded is nothing else than saying that $|\mu|(\Omega) < +\infty$; the quantity $|\mu|(\Omega)$ is called the total variation of $\mu$ (on $\Omega$) and defines the usual norm on the Banach space $\mathcal{M}(\Omega;\mathbb{R}^N)$.
As an element of the dual $C_0(\Omega;\mathbb{R}^N)'$ of $C_0(\Omega;\mathbb{R}^N)$, $\mu$ also has a norm given by

$$\|\mu\|_{C_0(\Omega;\mathbb{R}^N)'} = \sup_{\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} \le 1} \langle \mu, \phi \rangle = \sup_{\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} \le 1} \int_\Omega \phi(x)\,\mu(dx).$$

In fact, both norms coincide, which means that for every $\mu \in \mathcal{M}(\Omega;\mathbb{R}^N)$,

$$|\mu|(\Omega) = \sup\Big\{ \int_\Omega \phi(x)\,\mu(dx) : \phi \in C_0(\Omega;\mathbb{R}^N),\ |\phi(x)| \le 1\ \forall x \in \Omega \Big\}.$$

The weak-* convergence of a sequence of measures $(\mu_n)$ is understood as the weak-* convergence in the dual of $C_0(\Omega;\mathbb{R}^N)$, which means that $\mu_n \rightharpoonup \mu$ weakly-* if and only if

$$\int_\Omega \phi(x)\,\mu_n(dx) \to \int_\Omega \phi(x)\,\mu(dx)$$

for every $\phi \in C_0(\Omega;\mathbb{R}^N)$.

If $Du$ is a bounded Radon measure, then since $\int_\Omega \phi\,Du = -\int_\Omega u\,\mathrm{div}\,\phi$ for every $\phi$ in $C^\infty_c(\Omega;\mathbb{R}^N)$, we deduce (the compactly supported $C^\infty$-regular functions being dense in the space $C_0(\Omega;\mathbb{R}^N)$) that if $u \in L^1(\Omega)$,

$$u \in BV(\Omega) \iff V(u,\Omega) := \sup\Big\{ \int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx : \phi \in C^\infty_c(\Omega;\mathbb{R}^N),\ |\phi(x)| \le 1\ \forall x \in \Omega \Big\} < +\infty. \qquad (10)$$

The quantity $V(u,\Omega)$ coincides with the total variation of the measure $Du$, i.e., $V(u,\Omega) = |Du|(\Omega)$. In fact, saying that $V(u,\Omega)$ must be finite is an equivalent way to define the space $BV(\Omega)$. We call $V(u,\Omega) = |Du|(\Omega)$ the total variation of u on $\Omega$. If $u \in C^1(\Omega)$, or u is in the Sobolev space $W^{1,1}(\Omega)$, then the notation in (2) is valid since it is simple to show that $|Du|(\Omega) = \int_\Omega |\nabla u(x)|\,dx$. The space $BV(\Omega)$, endowed with the norm $\|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |Du|(\Omega)$, is a Banach space.

Exercise. Prove that, given $u \in L^1(\Omega)$, $Du \in \mathcal{M}(\Omega;\mathbb{R}^N)$ if and only if $V(u,\Omega)$ (given by (10)) is finite.
The first result we can state about the total variation is the following semicontinuity property:

Theorem 1 (Semicontinuity of the total variation) The convex functional $u \mapsto V(u,\Omega) = |Du|(\Omega) \in [0,+\infty]$ is lower semicontinuous in the $L^1_{\mathrm{loc}}(\Omega)$ topology.

This means that if $u_n$ goes to u in $L^1(\Omega')$ for every $\Omega' \subset\subset \Omega$, then $|Du|(\Omega) \le \liminf_{n\to\infty} |Du_n|(\Omega)$. The proof of Theorem 1 is straightforward if we consider the definition (10) of the variation of u. Indeed, in (10), $V(u,\Omega)$ is built as the sup of the linear functionals $u \mapsto \int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx$ for $\phi \in C^\infty_c(\Omega;\mathbb{R}^N)$. Since each of these functionals is continuous in the $L^1_{\mathrm{loc}}$ topology, we deduce that the sup is lower semicontinuous.
Next, we have the following Poincaré inequalities.

Theorem 2 (Poincaré inequalities) There exists a constant c = c(N) such that if u ∈ L¹_loc(ℝ^N), then

    ‖u‖_{L^{N/(N−1)}(ℝ^N)} ≤ c |Du|(ℝ^N),

and if B is a ball and u ∈ L¹(B),

    ‖u − ⨍_B u‖_{L^{N/(N−1)}(B)} ≤ c |Du|(B).

Here and everywhere in the notes ⨍_X u = ⨍_X u(x) dx denotes the average (1/|X|) ∫_X u(x) dx.
If Ω is a bounded Lipschitz-regular open set (this will always be assumed in what follows) we can build a continuous linear extension operator T_{Ω,Ω′} from BV(Ω) to BV(Ω′) for every Ω′ with Ω ⊂⊂ Ω′, which means that for every u ∈ BV(Ω) we can find u′ ∈ BV(Ω′) with u′ ≡ u on Ω and ‖u′‖_{BV(Ω′)} ≤ c‖u‖_{BV(Ω)}, the constant c depending only on Ω, Ω′. This extension allows us to generalize the second inequality in the last theorem to any such Ω: we deduce that there exists a constant c = c(Ω) such that

    ‖u − ⨍_Ω u‖_{L^{N/(N−1)}(Ω)} ≤ c |Du|(Ω)   (11)

for every u ∈ BV(Ω) (see [34] for details).

We state, still without any proof, the two next theorems, which are fundamental for the study of the space BV(Ω).
Theorem 3 (Sobolev embeddings) Let Ω be bounded and Lipschitz-regular. Then the space BV(Ω) is continuously embedded in L^{N/(N−1)}(Ω), and compactly embedded in L^p(Ω) for every 1 ≤ p < N/(N−1).

The first assertion is a consequence of the previous theorem. The second means that if a sequence of functions (u_j)_{j≥1} is bounded in BV(Ω), i.e.,
Inverse problems in Image processing and Image segmentation 19
sup_j ‖u_j‖_{L¹(Ω)} + |Du_j|(Ω) < +∞, then we can extract a subsequence u_{j_k} and there exists a function u ∈ BV(Ω) such that, as k→∞, Du_{j_k} ⇀ Du weakly-* as a measure and u_{j_k} → u strongly in L^p(Ω), for every p < N/(N−1).
Theorem 4 (Approximation by smooth functions) Let u ∈ BV(Ω). Then there exists a sequence (u_n)_{n≥1} ⊂ C^∞(Ω) such that, as n→∞, u_n → u in L¹(Ω), Du_n ⇀ Du weakly-* as measures, and

    |Du_n|(Ω) = ∫_Ω |∇u_n(x)| dx → |Du|(Ω).

These properties (in fact, mainly Theorem 2) are sufficient to derive the existence for problem (2), as we are going to show in the next section.
2.1.3 Existence for the Rudin-Osher approach

The existence for problem (2) in dimension N = 1 or N = 2 is ensured provided we assume that

- the operator A satisfies A1 = 1 (i.e., the image of a constant function is the same function);
- the initial data satisfies ∫_Ω |g(x) − ⨍_Ω g|² dx ≥ σ²;
- there exists a ũ satisfying (1) such that |Dũ|(Ω) < +∞.

The first assumption is not absolutely necessary (we only need A1 ≢ 0) but simplifies the proof a lot; it is obviously satisfied if A corresponds to a convolution with a kernel of integral 1 (Au = φ ⋆ u, ∫φ = 1), provided the boundary effects are treated correctly. The second assumption is needed: observe that if the model g = Au + n is correct, then it should be satisfied (with n rapidly oscillating, so that ∫ Au·n ≃ 0). The last assumption means that I = inf {|Du|(Ω) : u satisfies (1)} < +∞; otherwise any u satisfying (1) is a solution, but the problem is of little interest. In the general continuous setting the existence of such a ũ is not absolutely obvious.
The following proof is taken from [24]. We consider a minimizing sequence (u_n)_{n≥1} for (2), of functions u_n that all satisfy the constraints and such that |Du_n|(Ω) → I as n→∞. Such a sequence exists because of our third assumption. We assume, in order to simplify the notations, that |Ω| = 1 (so that in particular ⨍_Ω u = ∫_Ω u for every u). We show, first, that the average
m_n = ∫_Ω u_n remains bounded. This is obvious if A is the identity, or has a continuous inverse. Otherwise, we can write (since A1 = 1)

    σ² = ∫_Ω |Au_n − g|² = ∫_Ω |Au_n − m_n + m_n − g|² = ∫_Ω |A(u_n − m_n) + m_n − g|²

so that

    σ ≥ ‖m_n − g‖_{L²(Ω)} − ‖A(u_n − m_n)‖_{L²(Ω)} ≥ ‖m_n − g‖_{L²(Ω)} − ‖A‖ ‖u_n − m_n‖_{L²(Ω)},

where ‖A‖ denotes the norm of A as a continuous operator of L²(Ω). Since N = 1 or 2, 2 ≤ N/(N−1) and by (11),

    ‖u_n − m_n‖_{L²(Ω)} = ‖u_n − ∫_Ω u_n(x) dx‖_{L²(Ω)} ≤ c |Du_n|(Ω).   (12)

The total variation |Du_n|(Ω) remains bounded, therefore m_n = ∫_Ω u_n is also bounded. This implies (using again (12)) that u_n is bounded in L²(Ω).
Upon extracting a subsequence we may thus assume that there exists u ∈ L²(Ω) ∩ BV(Ω) such that u_n ⇀ u weakly in L² and Du_n ⇀ Du weakly-* as a measure. We also have (since A is continuous and linear) Au_n ⇀ Au, therefore by semicontinuity we get

    |Du|(Ω) ≤ lim inf_{n→∞} |Du_n|(Ω) = I,  and  ∫_Ω |Au(x) − g(x)|² dx ≤ σ²,  ∫_Ω Au(x) dx = ∫_Ω g(x) dx.

(Alternatively, we could invoke Theorem 3 to deduce that some subsequence of (u_n) converges to some u strongly in L¹(Ω), and Theorem 1 to conclude that |Du|(Ω) ≤ lim inf_{n→∞} |Du_n|(Ω) = I.)
We now introduce for t ∈ [0,1] the function u^t = tu + (1−t) ∫_Ω g. We have for every t, |Du^t|(Ω) = t|Du|(Ω) ≤ tI ≤ I and ∫_Ω Au^t = ∫_Ω g; moreover ∫_Ω |Au⁰ − g|² = ∫_Ω |g − ∫_Ω g|² ≥ σ² (by assumption), and ∫_Ω |Au¹ − g|² = ∫_Ω |Au − g|² ≤ σ². By continuity of the map t ↦ ∫_Ω |Au^t − g|², there exists therefore a t₀ ∈ [0,1] such that u^{t₀} satisfies (1), and |Du^{t₀}|(Ω) ≤ I. Necessarily we must have |Du^{t₀}|(Ω) = I, so that t₀ = 1 and u is the solution of problem (2).
2.2 More properties of BV functions

In the previous section we introduced just the very basic properties of BV functions that allowed us to state problem (2) correctly and show that it is well posed. Now, if we want to study the weak Mumford–Shah energy (7), we see that we need to know more properties of these functions. In particular, we must define the discontinuity set S_u correctly and study its regularity. We also need to describe the measure Du precisely. This will be done in the next sections. We will not prove all the results, since that is too difficult for the purpose of these lectures, but we will try to give a correct idea of these results by describing the simpler one-dimensional case with more precision.
2.2.1 The jump set S_u

Let us first introduce the approximate limits of a function u at some point x ∈ Ω. Given a measurable function u : Ω → [−∞,+∞], we can define the approximate upper limit of u at x ∈ Ω as

    u⁺(x) = inf { t ∈ [−∞,+∞] : lim_{ρ↓0} |{y : u(y) > t} ∩ B_ρ(x)| / ρ^N = 0 },

where B_ρ(x) is the ball of radius ρ centered at x and |E| denotes the Lebesgue measure of the set E. u⁺(x) is thus the greatest lower bound of the set of values t for which the set {u > t} has (Lebesgue) density 0 at x; on the other hand, if t < u⁺(x), then this set must have strictly positive density at x. The approximate lower limit u⁻(x) is defined in the same way, i.e.,

    u⁻(x) = −(−u)⁺(x) = sup { t ∈ [−∞,+∞] : lim_{ρ↓0} |{y : u(y) < t} ∩ B_ρ(x)| / ρ^N = 0 }.

The set

    S_u = {x ∈ Ω : u⁻(x) < u⁺(x)}

is the set of essential discontinuities of u; it is a (Lebesgue-)negligible Borel set. If x ∉ S_u, we write ũ(x) = u⁻(x) = u⁺(x) = ap lim_{y→x} u(y), and when ũ(x) ≠ ±∞ we say that u is approximately continuous at x.
Let us first analyse the one-dimensional case, which is simpler.

2.2.2 BV functions in one dimension

In this section, we consider a (bounded) interval I = (a,b) ⊂ ℝ (here, a < b are two real numbers). In this case the total variation of a function u ∈ L¹(I) is simply

    V(u,I) = sup { ∫_I u(x) v′(x) dx : v ∈ C¹_c(I;[−1,1]) }.
Notice that the usual "classical" definition is different:

    Var(u,I) = sup { Σ_{i=1}^{n−1} |u(t_{i+1}) − u(t_i)| : a < t₁ < ⋯ < t_n < b },

and in general we do not have V(u,I) = Var(u,I). Indeed, the second definition depends on the pointwise values of the function u, and the value of Var(u,I) can be made infinite by changing u on a set of measure zero (for instance on a sequence (x_n)_{n≥1} of points in I), whereas the first definition gives the same value for two functions that are almost everywhere equal. In fact, if u ∈ C¹(I), then clearly ∫_I uv′ = −∫_I u′v for every v ∈ C¹_c(I;[−1,1]), and we deduce that V(u,I) = ∫_I |u′|; in this case it is easy to show that V(u,I) = Var(u,I).
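A quick numerical illustration of this difference (our addition, not from the notes): changing a function on a single point, a set of measure zero, can blow up the pointwise variation Var(u,I), while V(u,I) is unchanged since the two functions agree a.e.

```python
# Illustration (ours): the "classical" variation Var(u, I) depends on
# pointwise values, unlike the distributional variation V(u, I).

def pointwise_variation(u, points):
    """sum_i |u(t_{i+1}) - u(t_i)| over the given partition points."""
    vals = [u(t) for t in points]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))

def step(x):            # u = 0 on [0, 1/2), 1 on [1/2, 1]
    return 0.0 if x < 0.5 else 1.0

def step_modified(x):   # same function, changed on the null set {0.3}
    return 5.0 if x == 0.3 else step(x)

grid = [k / 1000 for k in range(1001)]           # partition containing 0.3
print(pointwise_variation(step, grid))           # 1.0
print(pointwise_variation(step_modified, grid))  # 11.0 (= 5 up, 5 down, then the unit jump)
```

Both functions have the same distributional derivative Du = δ_{1/2}, hence V(u,I) = 1 in both cases, in agreement with V(u,I) = min{Var(v,I) : v = u a.e.} stated below.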
Exercise. Show that for every u, V(u,I) ≤ Var(u,I). [Hint: consider v ∈ C¹_c(I;[−1,1]) and remark that lim_{h→0} (1/h) ∫_{I_h} (v(x+h) − v(x)) u(x) dx = ∫ v′(x) u(x) dx. (Here I_h = {x ∈ I : x+h ∈ I}.) Prove then that for h > 0 sufficiently small, (1/h) ∫_{I_h} (v(x+h) − v(x)) u(x) dx ≤ Var(u,I).]

In the general case we have that V(u,I) ≤ Var(v,I) whenever v = u a.e., and V(u,I) = min{Var(v,I) : v = u a.e.} (see the next exercise).

The distributional derivative of u ∈ L¹(I) is the distribution Du defined
by

    ⟨Du, φ⟩_{D′(I),D(I)} = −∫_I u(x) φ′(x) dx

for every φ ∈ D(I) (i.e., the set C^∞_c(I) with the appropriate topology). The function u is in BV(I) if and only if Du is a bounded Radon measure on I, which means that Du ∈ M(I) ≃ C_0(I)′, the dual of C_0(I), the set of continuous functions u on Ī = [a,b] such that u(a) = u(b) = 0. It can be proved (quite easily) that Du is a bounded Radon measure on I if and only if V(u,I) < +∞, and that in this case we have

    V(u,I) = |Du|(I) = sup { Σ_{i=1}^n |Du(I_i)| : ∪_{i=1}^n I_i ⊂ I, I_i ∩ I_j = ∅ ∀ i ≠ j }

(where the sets I_i are Borel sets), the right-hand side of the last equation being the standard definition of the total variation of the measure Du (which is also the norm of Du when it is seen as an element of the dual C_0(I)′). (More generally, the variation of a vector-valued (or real-valued) Borel measure μ is the positive Borel measure |μ| defined by equation (9).)
We now introduce two functions u_l and u_r, defined for every x ∈ I by

    u_l(x) = Du((a,x))  and  u_r(x) = Du((a,x]).

Here, as usual, (a,x) = {y : a < y < x} denotes the open interval of extremities a and x > a, which is sometimes also denoted by ]a,x[, whereas (a,x] is the interval {y : a < y ≤ x}.

Lemma 1 The function u_l is left-continuous, while u_r is right-continuous. Moreover, u_l = u_r except on a set that is at most countable.

Proof. First of all, u_r(x) − u_l(x) = Du({x}), so that u_r(x) = u_l(x) except when x is an atom of the measure Du (a point such that Du({x}) ≠ 0), but a bounded vector-valued (or real-valued) measure can have at most a countable number of atoms.
To show that u_l is left-continuous, for any x ∈ I and each sequence of non-negative numbers ρ_n ↓ 0 we must show that u_l(x − ρ_n) goes to u_l(x) as n→∞. But |u_l(x−ρ_n) − u_l(x)| = |Du([x−ρ_n, x))| ≤ |Du|([x−ρ_n, x)), and by standard properties of positive measures we know that (assuming, without loss of generality, that ρ_n is a decreasing sequence) lim |Du|([x−ρ_n, x)) = |Du|(∩_{n≥1} [x−ρ_n, x)) = |Du|(∅) = 0. Therefore u_l is left-continuous. For the same reason, u_r is right-continuous (indeed, u_r(x) = Du(I) − Du((x,b)) ).

Remark. More precisely, we can show in the same way that

    lim_{ρ↓0} u_l(x−ρ) = lim_{ρ↓0} u_r(x−ρ) = u_l(x),  and  lim_{ρ↓0} u_l(x+ρ) = lim_{ρ↓0} u_r(x+ρ) = u_r(x).

Lemma 2 The distributional derivatives of u_l, u_r and u are equal (Du_l = Du_r = Du).
Proof. Let us show, for instance, that Du_l = Du. Consider φ ∈ D(I). We have (using Fubini's theorem)

    ∫_I φ Du_l = −∫_a^b φ′(x) Du((a,x)) dx = −∫_a^b φ′(x) { ∫_{y∈(a,x)} Du(dy) } dx
              = −∫_{y∈(a,b)} { ∫_{x∈(y,b)} φ′(x) dx } Du(dy) = −∫_I −φ(y) Du(dy) = ∫_I φ Du,

showing the desired equality.

In particular, we deduce from the last lemma that D(u−u_l) = D(u−u_r) = D(u_l−u_r) = 0, so that the functions u, u_l, u_r can differ at most by a constant. We can redefine the functions u_l and u_r by adding the appropriate constant so that u_l = u and u_r = u almost everywhere in I (i.e., now, u_l(x) = c + Du((a,x)) and u_r(x) = c + Du((a,x]) with c ∈ ℝ appropriately chosen to have u_l = u_r = u a.e.). We have shown so far the following proposition.
Proposition 1 Every u ∈ BV(I) has a left-continuous and a right-continuous representant¹.

Exercise. Show that |Du|(I) = Var(u_l, I) = Var(u_r, I).
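A concrete example may help (our addition, not in the notes): take I = (0,1) and the step function u = χ_{[1/2,1)}, so that Du = δ_{1/2} and, with the constant c = 0,

```latex
u_l(x) = Du((0,x)) =
  \begin{cases} 0, & x \le 1/2,\\ 1, & x > 1/2, \end{cases}
\qquad
u_r(x) = Du((0,x]) =
  \begin{cases} 0, & x < 1/2,\\ 1, & x \ge 1/2. \end{cases}
% u_l is left-continuous, u_r is right-continuous, and they differ exactly
% on the atom \{1/2\} of Du; here u_r = u everywhere, and
% |Du|(I) = \mathrm{Var}(u_l, I) = \mathrm{Var}(u_r, I) = 1.
```

The two representants differ only on the (countable) set of atoms of Du, as Lemma 1 predicts.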
We now introduce the function

    u̇ = Du/L¹,

which is the Radon–Nikodym derivative of the measure Du with respect to the Lebesgue measure L¹ on I (in particular, u̇ ∈ L¹(I)). The Radon–Nikodym derivation theorem states that for L¹-a.e. x ∈ I,

    u̇(x) = lim_{ρ→0} Du((x−ρ, x+ρ))/(2ρ) = lim_{ρ→0} Du([x−ρ, x+ρ])/(2ρ),

and we can write the measure Du as

    Du = u̇(x) dx + Dˢu

with Dˢu ⊥ L¹, which means that there exists a Borel set E ⊂ I such that |E| = L¹(E) = 0 and |Dˢu|(I ∖ E) = 0. In particular the Radon–Nikodym derivative |Dˢu|/L¹ is zero, so that for L¹-a.e. x ∈ I, lim_{ρ→0} |Dˢu|((x−ρ, x+ρ))/(2ρ) = 0.

¹We recall that a representant of a function u ∈ L¹ is a function ũ a.e. equal to u, or more precisely belonging to the equivalence class of a.e.-equal functions defining u.

Consider now a Lebesgue point x of u̇, i.e., such that lim_{ρ→0} (1/ρ) ∫_{x−ρ}^{x+ρ} |u̇(y) − u̇(x)| dy = 0 (a.e. x ∈ I satisfies this property), and also assume that lim_{ρ→0} |Dˢu|([x−ρ, x+ρ])/(2ρ) = 0. Then:
    lim sup_{ρ↓0} | (u_l(x+ρ) − u_l(x))/ρ − u̇(x) |
        = lim sup_{ρ↓0} | (1/ρ) ( ∫_x^{x+ρ} u̇(y) dy + Dˢu([x, x+ρ)) ) − u̇(x) |
        ≤ lim sup_{ρ↓0} (1/ρ) |Dˢu|([x, x+ρ)) + lim sup_{ρ↓0} (1/ρ) ∫_x^{x+ρ} |u̇(y) − u̇(x)| dy = 0.

In the same way, we can prove that lim sup_{ρ↓0} |(u_l(x) − u_l(x−ρ))/ρ − u̇(x)| = 0, showing that u_l has a (classical) derivative at x, which is u̇(x). We have shown the following proposition.

Proposition 2 The functions u_l and u_r have a derivative a.e. in I, and u_l′(x) = u_r′(x) = u̇(x) for a.e. x ∈ I.
Remark. In a similar way we can show that at a.e. x,

    lim sup_{ρ→0} (1/(2ρ)) ∫_{|y−x|<ρ} |u(y) − u(x) − u̇(x)(y−x)| / |y−x| dy = 0,   (13)

which expresses the fact that u̇(x) is the approximate derivative of u at x. This property will have a generalization in higher dimension.
2.2.3 The jump set and the singular part of Du

Now we will try to describe better the singular part Dˢu of the measure Du, and the set S_u. The first property is the following.

Proposition 3 At every x ∈ I, u⁺(x) = u_l(x) ∨ u_r(x) and u⁻(x) = u_l(x) ∧ u_r(x). In particular, S_u = {x ∈ I : u_l(x) ≠ u_r(x)}.

Remark. By Lemma 1 we deduce that the set S_u is at most countable. In fact, since u_r(x) − u_l(x) = Du({x}), it is the set of the atoms of the measure Du.
Proof. Let us first show that u⁺(x) ≥ u_l(x) for every x ∈ I. Let t < u_l(x): by the left-continuity of u_l there exists ρ > 0 such that x−ρ < y ≤ x ⟹ u_l(y) > t. Therefore {y : u_l(y) > t} ⊃ (x−ρ, x), so that if ρ′ ≤ ρ,

    ρ′ = |(x−ρ′, x)| ≤ |{y : u_l(y) > t} ∩ B_{ρ′}(x)| = |{y : u(y) > t} ∩ B_{ρ′}(x)|,

where the last equality comes from the fact that u = u_l a.e. in I. We deduce that lim inf_{ρ′↓0} |{y : u(y) > t} ∩ B_{ρ′}(x)|/ρ′ ≥ 1, so that (by the definition of u⁺) t ≤ u⁺(x). Thus u_l(x) ≤ u⁺(x). In the same way we get that u_r(x) ≤ u⁺(x).

Conversely, let t > u_l(x) ∨ u_r(x). By left- and right-continuity we know that there exists ρ > 0 such that x−ρ < y < x ⟹ u_l(y) < t and x < y < x+ρ ⟹ u_r(y) < t. As before we deduce this time that lim sup_{ρ′↓0} |{y : u(y) > t} ∩ B_{ρ′}(x)|/ρ′ = 0. Thus, u⁺(x) ≤ u_l(x) ∨ u_r(x). This proves that u⁺ = u_l ∨ u_r on I. The proof of the equality u⁻ = u_l ∧ u_r is identical.
Now, we split the measure Dˢu into two parts, called respectively Ju ("J" for "jumps") and Cu ("C" for "Cantor"):

    Ju = Dˢu ⌊ S_u  and  Cu = Dˢu ⌊ (I ∖ S_u).

(Notice that, since |S_u| = 0 (S_u is finite or countable), Ju is also Du ⌊ S_u.) Since S_u is the set of the atoms of the measure Du, we have

    Ju = Σ_{x∈S_u} Du({x}) δ_x = Σ_{x∈S_u} (u_r(x) − u_l(x)) δ_x.

(δ_x stands for the Dirac mass at x.) This measure represents the jumps of u across its discontinuities. It can also be written as

    Ju = Σ_{x∈S_u} (u⁺(x) − u⁻(x)) ν_u(x) δ_x = (u⁺ − u⁻) ν_u H⁰ ⌊ S_u,   (14)

where ν_u(x) ∈ {−1,+1} represents the direction of the jump of u at x: ν_u(x) = +1 if u_l(x) = u⁻(x), u_r(x) = u⁺(x), so that u is "increasing" at x (u_l(x) < u_r(x)), whereas ν_u(x) = −1 when u_l(x) = u⁺(x), u_r(x) = u⁻(x), meaning u is "decreasing" at x (u_l(x) > u_r(x)). This last expression (14) will be generalized in higher dimension.
Consider now the measure Cu. It has no atoms (i.e., Cu({x}) = 0 for every x ∈ I), since Du({x}) = 0 and Dˢu({x}) = 0 for every x ∈ I ∖ S_u. On the other hand, it is singular with respect to the Lebesgue measure L¹ (i.e., Cu ⊥ L¹). It is called the Cantor part of Du. We will soon show an example of a function u with Du having a Cantor part.

Let us now return for a while to the weak Mumford–Shah functional (7). In one dimension, we can write it

    E(u) = ∫_I |u̇(x)|² dx + H⁰(S_u) + ∫_I |u(x) − g(x)|² dx.

(Here the zero-dimensional Hausdorff measure of S_u is simply the cardinality #S_u of the set S_u.)
With our definitions of u̇(x) and S_u, we see that the weak energy E(u) is correctly defined. However, if we try to find a minimum of E(u) in the class of all functions with bounded variation, we realize that inf{E(u) : u ∈ BV(I)} = 0 and that it is in general not reached! This happens because it is possible to approximate every function in L²(I) (here, g) by BV functions such that S_u = ∅, u̇(x) = 0 a.e., and the whole derivative is the Cantor part (Du = Cu). A typical example of such a function is the "Cantor–Vitali" function, defined as the (uniform) limit of the continuous functions on [0,1]

    u_k(x) = |C_k ∩ [0,x]| / |C_k|,

where C₀ = [0,1] and, for k ≥ 1,

    C_k = C_{k−1} ∖ ∪_{n=1}^{3^{k−1}} ( (3n−2)/3^k , (3n−1)/3^k ),

see Figure 4. The set C = ∩_{k=0}^∞ C_k = lim_k C_k is the Cantor set; it has zero length. The function u is continuous, and u′ = 0 in [0,1] ∖ C, i.e., almost everywhere in (0,1). The derivative Du is entirely supported by the negligible set C, and is therefore singular with respect to the Lebesgue measure. Thus Du = Cu.

Exercise. Show that any function f ∈ L²(0,1) can be approximated in L² norm by a sequence f_n of functions in BV(0,1) with Df_n = Cf_n (ḟ_n = 0, S_{f_n} = ∅ for every n).
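A small numerical experiment (our addition, not part of the notes) with the approximations u_k, using their self-similar recursion: each u_k is continuous and non-decreasing with total variation 1, yet it is locally constant on a set whose measure tends to 1 as k grows.

```python
# Sketch (ours): the k-th Cantor-Vitali approximation via the standard
# self-similar recursion; u_k is constant on every interval removed so far.

def u(x, k):
    """k-th Cantor-Vitali approximation on [0, 1]."""
    if k == 0:
        return x                        # u_0(x) = x
    if x < 1 / 3:
        return u(3 * x, k - 1) / 2      # left third: rescaled half-copy
    if x <= 2 / 3:
        return 0.5                      # removed middle third: constant
    return 0.5 + u(3 * x - 2, k - 1) / 2

# pointwise variation over a uniform grid: stays 1 for every k,
# while the derivative vanishes a.e. in the limit
grid = [i / 729 for i in range(730)]
vals = [u(x, 6) for x in grid]
var = sum(abs(b - a) for a, b in zip(vals, vals[1:]))
print(var)        # approximately 1.0 (u_6 increases monotonically from 0 to 1)
print(u(0.5, 6))  # 0.5
```

The recursion mirrors the set construction: halving on the outer thirds, constant on the removed middle third.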
If we want to minimize E(u), we have to restrict the set of functions we consider. We will therefore introduce a new subspace of BV(I), made of the functions for which Cu is zero.
2.2.4 Special BV functions, in dimension one

Definition. We say that a function u ∈ BV(I) is a special function with bounded variation if Cu = 0, which means that the singular part Dˢu of the distributional derivative Du is concentrated on the jump set S_u. We denote by SBV(I) the space of such functions.

Figure 4: The Cantor-Vitali function (the approximations u_1, u_2 and u_20 are shown).
The main tool to prove the existence of a minimizer for the weak Mumford–Shah energy E is the following compactness and semicontinuity theorem, due to Ambrosio.

Theorem 5 (Ambrosio, one-dimensional version) Let I ⊂ ℝ be an open and bounded interval and (u_j) be a sequence in SBV(I). Suppose that

    sup_j ∫_I u̇_j(x)² dx + H⁰(S_{u_j}) + ‖u_j‖_{L^∞(I)} < +∞.

Then there exist a subsequence (not relabeled) and a function u ∈ SBV(I) such that

    u_j(x) → u(x) a.e. in I,
    u̇_j ⇀ u̇ weakly in L²(I),   (15)
    H⁰(S_u) ≤ lim inf_{j→∞} H⁰(S_{u_j}).
Proof. Consider such a sequence u_j. In what follows we will extract several subsequences from u_j that will all still be denoted by u_j. Remark that since sup_j H⁰(S_{u_j}) = sup_j #S_{u_j} < +∞, there exists an integer k such that k = lim inf_j #S_{u_j}, and we can extract a first subsequence such that #S_{u_j} = k for every j. We let S_{u_j} = {x_j¹, …, x_j^k}, with a < x_j¹ < x_j² < ⋯ < x_j^k < b. Extracting a further subsequence we may assume that each x_j^n converges to some x^n ∈ Ī = [a,b]. For t ≥ 0 we will set I_t = I ∖ ∪_{n=1}^k [x^n − t, x^n + t]. For a fixed ℓ ≥ 1, if j is large enough we have that x_j^n ∉ I_{1/ℓ} for every n = 1,…,k. In this case, u_j ∈ H¹(I_{1/ℓ}) and is uniformly bounded:

    sup_j ∫_{I_{1/ℓ}} |u_j′|² dx = sup_j ∫_{I_{1/ℓ}} |u̇_j|² dx < +∞,  and  sup_j ‖u_j‖_{L^∞(I_{1/ℓ})} < +∞.

We can therefore extract a subsequence such that u_j converges to some function u ∈ H¹(I_{1/ℓ}), uniformly on I_{1/ℓ}, and u̇_j = u_j′ ⇀ u′ weakly in L²(I_{1/ℓ}). Using a diagonal procedure, since ∪_{ℓ≥1} I_{1/ℓ} = I₀, we can in this way build a function u ∈ H¹_loc(I₀) such that u_j → u locally uniformly on I₀ and u̇_j ⇀ u′ weakly in L²_loc(I₀). But since u̇_j is bounded in L²(I) = L²(I₀), we deduce that u̇_j ⇀ u′ weakly in L²(I). In particular, u′ ∈ L²(I) and u ∈ H¹(I ∖ {x¹,…,x^k}) ∩ L^∞(I), so that u ∈ SBV(I), u̇ = u′, and S_u ⊂ {x¹,…,x^k}, showing also that #S_u ≤ k and completing the proof of Theorem 5.
Exercise. Show that u ∈ H¹(I ∖ {x¹,…,x^k}) ∩ L^∞(I) ⟹ u ∈ SBV(I), u̇ = u′, and S_u ⊂ {x¹,…,x^k}.

Exercise. Use Theorem 5 to show that the weak Mumford–Shah energy E has a minimizer in SBV(I).
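In a discrete setting the one-dimensional minimization can even be carried out exactly, at least for the piecewise-constant simplification of E (no gradient term; u is forced to be the mean of g on each segment, and each jump costs α). The following dynamic program over the position of the last jump is our sketch, not an algorithm from the notes:

```python
# Sketch (ours): exact minimizer of the discrete piecewise-constant
# ("Potts") simplification of the 1-D weak Mumford-Shah energy,
#     E(u) = sum_i (u_i - g_i)^2 + alpha * #(jumps of u),
# by dynamic programming over the last jump position.

def potts_segmentation(g, alpha):
    """Return the minimizing u for the discrete piecewise-constant energy."""
    n = len(g)
    s = [0.0]; s2 = [0.0]              # prefix sums for O(1) segment errors
    for x in g:
        s.append(s[-1] + x); s2.append(s2[-1] + x * x)
    def err(i, j):                      # squared deviation from the mean on g[i:j]
        return s2[j] - s2[i] - (s[j] - s[i]) ** 2 / (j - i)
    best = [0.0] * (n + 1)              # best[j] = optimal energy for g[:j]
    cut = [0] * (n + 1)                 # start index of the last segment
    for j in range(1, n + 1):
        best[j], cut[j] = min(
            (best[i] + err(i, j) + (alpha if i > 0 else 0.0), i)
            for i in range(j))
    u, j = [0.0] * n, n                 # backtrack and fill segment means
    while j > 0:
        i = cut[j]
        mean = (s[j] - s[i]) / (j - i)
        for t in range(i, j):
            u[t] = mean
        j = i
    return u

g = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]   # a noisy step
print(potts_segmentation(g, alpha=0.5))
# two constant segments, close to [0, 0, 0, 0, 1, 1, 1, 1]
```

The existence of a minimizer here is trivial (finitely many segmentations); Theorem 5 is what replaces this argument in the continuous setting.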
2.2.5 The general, N-dimensional case

We return to the general case of functions defined on an open set Ω ⊂ ℝ^N, N ≥ 1. If u ∈ BV(Ω), it can be shown that the set S_u is countably (H^{N−1}, N−1)-rectifiable, i.e.,

    S_u = ∪_{i=1}^∞ K_i ∪ N,

where H^{N−1}(N) = 0 and each K_i is a compact subset of a C¹-hypersurface Γ_i. Note that this is a very weak notion of regularity: the set S_u could still be, for instance, dense in Ω.

There exists a Borel function ν_u : S_u → S^{N−1} such that H^{N−1}-a.e. in S_u the vector ν_u(x) is normal to S_u at x, in the sense that it is normal to Γ_i if x ∈ K_i. For every u, v ∈ BV(Ω), we must therefore have ν_u = ±ν_v H^{N−1}-a.e. in S_u ∩ S_v.
As in the one-dimensional case, the derivative Du of every u ∈ BV(Ω) can be decomposed as follows:

    Du = ∇u(x) dx + Ju + Cu = ∇u(x) dx + (u⁺ − u⁻) ν_u H^{N−1} ⌊ S_u + Cu,

where ∇u = Du/L^N, the Radon–Nikodym derivative of Du with respect to the Lebesgue measure L^N, is also the approximate gradient of u, defined a.e. in Ω by

    ap lim_{y→x} ( u(y) − u(x) − ⟨∇u(x), y−x⟩ ) / |y−x| = 0

(remember equation (13)). H^{N−1} ⌊ S_u is the restriction of the (N−1)-dimensional Hausdorff measure to the set S_u, so that Ju = (u⁺−u⁻) ν_u H^{N−1} ⌊ S_u is the "jump" part of the measure Du, which is carried by the discontinuity set of u (compare with equation (14)). Eventually, Cu is the Cantor part of the measure Du, which is singular with respect to the Lebesgue measure and such that |Cu|(E) = 0 for any (N−1)-dimensional set E with H^{N−1}(E) < +∞.

With these definitions of ∇u(x) and S_u, we see here again that the weak energy (7), E(u), is correctly defined. Here again, as in the one-dimensional case, we have inf{E(u) : u ∈ BV(Ω)} = 0 and the infimum is usually not reached. We must consider, as previously, the functions u ∈ BV(Ω) such that Cu is zero.
2.2.6 Special BV functions

Definition. We say that a function u ∈ BV(Ω) is a special function with bounded variation if Cu = 0, which means that the singular part of the distributional derivative Du is concentrated on the jump set S_u. We denote by SBV(Ω) the space of such functions. We also define the space GSBV(Ω) of generalized SBV functions as the set of all measurable functions u : Ω → [−∞,+∞] such that for any k > 0, u_k = (−k ∨ u) ∧ k ∈ SBV(Ω) (where X ∧ Y = min(X,Y) and X ∨ Y = max(X,Y)). (This follows Ambrosio's definition in [2]; notice that sometimes GSBV(Ω) is defined as the space we call hereafter GSBV_loc(Ω), which is the space of functions that belong to GSBV(A) for any open set A ⊂⊂ Ω, i.e., such that Ā is compact and included in Ω.)

If u ∈ GSBV_loc(Ω) ∩ L¹_loc(Ω), u has an approximate gradient a.e. in Ω; moreover, as k ↑ ∞, the function u_k = (−k ∨ u) ∧ k satisfies

    ∇u_k → ∇u a.e. in Ω, and |∇u_k| ↑ |∇u| a.e. in Ω;   (16)
    S_{u_k} ⊂ S_u, H^{N−1}(S_{u_k}) → H^{N−1}(S_u), and ν_{u_k} = ν_u H^{N−1}-a.e. in S_{u_k}.   (17)
2.2.7 Ambrosio's compactness theorem

We mention the following compactness and lower semicontinuity result that was proved in [2]:

Theorem 6 (Ambrosio) Let Ω be an open subset of ℝ^N and let (u_j) be a sequence in GSBV(Ω). Suppose that there exist p ∈ [1,∞] and a constant C such that

    ∫_Ω |∇u_j|² dx + H^{N−1}(S_{u_j}) + ‖u_j‖_{L^p(Ω)} ≤ C < +∞

for every j. Then there exist a subsequence (not relabeled) and a function u ∈ GSBV(Ω) ∩ L^p(Ω) such that

    u_j(x) → u(x) a.e. in Ω,
    ∇u_j ⇀ ∇u weakly in L²(Ω;ℝ^N),   (18)
    H^{N−1}(S_u) ≤ lim inf_{j→∞} H^{N−1}(S_{u_j}).

Moreover,

    ∫_{S_u} |⟨ν_u, ξ⟩| dH^{N−1} ≤ lim inf_{j→∞} ∫_{S_{u_j}} |⟨ν_{u_j}, ξ⟩| dH^{N−1}   (19)

for every ξ ∈ S^{N−1}.

There exist variants of this theorem, with different proofs (see [3, 4, 5]). We need however in these lectures to consider this version, since the conclusion (19) will be useful in order to study the anisotropic variants of the Mumford–Shah functional that appear in the finite-difference discretizations that are common in image processing.
Remark. By a standard diagonalization technique Theorem 6 also holds if u_j and u are only in GSBV_loc(Ω).

In this setting we are now able to show the existence of a minimizer of the weak Mumford–Shah functional of Ambrosio and De Giorgi (cf. section 2.3.1). However, first we end this section on functions with bounded variation with a paragraph about some useful additional properties.
2.2.8 Slicing

We now explain how a (special) bounded variation function can be described, and its properties recovered, from its one-dimensional "slices", i.e., its restrictions to one-dimensional lines. Many results of sections 2.2.2-2.2.4 can be extended to the N-dimensional case using the following properties. In fact, most of Theorem 6 (in the case p = ∞) can be recovered from Theorem 3 and Theorem 5 in this way, the very difficult part being to show that ∇u_j ⇀ ∇u. Many of the following results will be needed in order to study the variational approximations of the Mumford–Shah functional.

We consider for ξ ∈ S^{N−1} the sets ξ⊥ = {x ∈ ℝ^N : ⟨ξ,x⟩ = 0} and, for any z ∈ ξ⊥, Ω_{z,ξ} = {t ∈ ℝ : z + tξ ∈ Ω}. On Ω_{z,ξ} we define a function u_{z,ξ} : Ω_{z,ξ} → [−∞,+∞] by u_{z,ξ}(s) = u(z + sξ). If u ∈ BV(Ω), we have the following classical representation (see for instance [2, 7]): for H^{N−1}-a.e. z ∈ ξ⊥, u_{z,ξ} ∈ BV(Ω_{z,ξ}) and for any Borel set B ⊂ Ω

    ⟨Du, ξ⟩(B) = ⟨Du(B), ξ⟩ = ∫_{ξ⊥} dH^{N−1}(z) Du_{z,ξ}(B_{z,ξ}),

where B_{z,ξ} is defined in the same way as Ω_{z,ξ}; conversely, if u_{z,ξ} ∈ BV(Ω_{z,ξ}) for at least N independent vectors ξ ∈ S^{N−1} and H^{N−1}-a.e. z ∈ ξ⊥, and if

    ∫_{ξ⊥} dH^{N−1}(z) |Du_{z,ξ}|(Ω_{z,ξ}) < +∞,

then u ∈ BV(Ω). Now (see [3, 2]), if u ∈ SBV_loc(Ω), then for almost every z ∈ ξ⊥, u_{z,ξ} ∈ SBV_loc(Ω_{z,ξ}) (the converse is true provided this property is satisfied for at least N independent vectors ξ and u has locally bounded variation), and the approximate derivative satisfies

    u̇_{z,ξ}(s) = ⟨∇u(z + sξ), ξ⟩

for a.e. s ∈ Ω_{z,ξ}; moreover

    S_{u_{z,ξ}} = {s ∈ Ω_{z,ξ} : z + sξ ∈ S_u},
    (u_{z,ξ})^±(s) = u^±(z + sξ)  ∀ s ∈ S_{u_{z,ξ}},

and for any Borel set B ⊂ Ω

    ∫_{ξ⊥} dH^{N−1}(z) H⁰(B_{z,ξ} ∩ S_{u_{z,ξ}}) = ∫_{B∩S_u} |⟨ν_u(x), ξ⟩| dH^{N−1}(x).

The reader interested in knowing more about the space BV and how the results in this section are proved should consult, for instance, the books [6, 34, 36, 42, 62].
2.3 Back to the Mumford-Shah functional

2.3.1 Existence for the weak formulation

Now, in this setting, it is clear that the weak Mumford–Shah functional (7) has a minimum in GSBV(Ω) (which, in fact, is in SBV(Ω)). Indeed, consider a minimizing sequence (u_j)_{j≥1} for the problem inf_u E(u) in GSBV(Ω). Then, this sequence satisfies the conditions of Theorem 6 (with p = 2). Therefore, some subsequence (still denoted by u_j) converges almost everywhere to a function u ∈ GSBV(Ω) with

    ∫_Ω |∇u(x)|² dx ≤ lim inf_{j→∞} ∫_Ω |∇u_j(x)|² dx

(since ∇u_j goes to ∇u weakly in L²(Ω), by (18)),

    H^{N−1}(S_u) ≤ lim inf_{j→∞} H^{N−1}(S_{u_j}),  and  ∫_Ω |u(x) − g(x)|² dx ≤ lim inf_{j→∞} ∫_Ω |u_j(x) − g(x)|² dx

(by Fatou's lemma). Therefore E(u) ≤ lim inf_{j→∞} E(u_j) and u is a minimizer of the weak Mumford–Shah functional. Notice that since g is bounded, we can always replace u with its truncation at level ‖g‖_∞, (−‖g‖_∞ ∨ u(x)) ∧ ‖g‖_∞, and decrease the energy, so that the minimum u has to satisfy ‖u‖_∞ ≤ ‖g‖_∞ and is in SBV(Ω) ∩ L^∞(Ω).

In the next section we will explain how the weak problem is then related to the strong original one (that is, the minimization of E(u,K)).
2.3.2 From the weak to the strong formulation

Once we have proved the existence of a minimizer for the weak Mumford–Shah energy E(u) using Theorem 6, we need to show that it can also be considered as a minimizer for the original energy E(u,K) defined by (6). In arbitrary dimension N the general definition for E is

    E(u,K) = ∫_{Ω∖K} |∇u(x)|² dx + H^{N−1}(K ∩ Ω) + ∫_Ω |u(x) − g(x)|² dx,

where the length has been replaced with the (N−1)-dimensional Hausdorff measure. (We have also dropped the constant weighting parameters.)

The natural way to associate a set K to u ∈ SBV(Ω) is to set K = S̄_u. However, if u is arbitrary, we could have H^{N−1}(K ∩ Ω) > H^{N−1}(S_u). For instance, the function

    v(x) = Σ_{k=1}^∞ 2^{−k} χ_{B_{2^{−k}}(x_k)},

where the sequence (x_k)_{k≥1} is the set of all points in Ω with rational coordinates, is such that S_v = Ω ∩ ∪_{k=1}^∞ ∂B_{2^{−k}}(x_k). This has finite length, but is dense in Ω. Thus H^{N−1}(S̄_v ∩ Ω) = +∞.

A minimizer u of E(u) will be a minimizer of E if and only if we can prove that H^{N−1}(S̄_u ∩ Ω) = H^{N−1}(S_u). (Conversely, it is "simple" to show that if u ∈ H¹(Ω ∖ K) and K is a closed set with H^{N−1}(K) < +∞, then u ∈ SBV(Ω) and H^{N−1}(S_u ∖ K) = 0, that is, S_u is included in K up to an H^{N−1}-negligible set.)

This difficult result was proved by De Giorgi, Carriero and Leaci [29], and independently in dimension N = 2 by Dal Maso, Morel and Solimini [28] (see also the book [48] for a general overview of the problem). They proved that if u minimizes E(u), then H^{N−1}(Ω ∩ S̄_u ∖ S_u) = 0 and u ∈ C¹(Ω ∖ S̄_u), so that (u, S̄_u) minimizes E.
We make here the observation that if we slightly change the problem, introducing an anisotropy in the energy, then these results still hold. Indeed, if we consider a weak functional

    E′(u) = ∫_Ω Q(∇u(x)) dx + ∫_{S_u} N(ν_u(x)) dH^{N−1}(x) + ∫_Ω |u(x) − g(x)|² dx,   (20)

where Q is a positive definite quadratic form on ℝ^N and N is a norm on ℝ^N (a 1-homogeneous convex function with 0 < min_{ν∈S^{N−1}} N(ν) ≤ max_{ν∈S^{N−1}} N(ν) < +∞), then E′ has a minimizer u in GSBV(Ω) (exercise: you need to use inequality (19) in Theorem 6); moreover, it is possible to adapt the proofs in [29] and show that H^{N−1}(Ω ∩ S̄_u ∖ S_u) = 0 and u ∈ C¹(Ω ∖ S̄_u).
2.4 Variational approximations and Γ-convergence

In these lectures we will describe a few ways to approximate the Mumford–Shah problem, or variants of this problem. This has to be done because, numerically, it is difficult to deal with a jump set K. We introduce in this part a special notion of convergence that is adapted to variational problems. As a matter of fact, if you are looking for the minimizer of a function F(x), x ∈ X (where X is some space), and want to approximate it with minimizers (x_n) of approximate problems min_{x∈X} F_n(x), when can you say that (x_n) converges to a minimizer of F? If you consider the classical notions of limits of functions, then only uniform convergence seems suitable to handle this problem. However, this notion of convergence is far too strong for most applications. This motivates the following definition of Γ-convergence, specially invented for studying the limit of variational problems. We will limit ourselves to the case where X is a metric space. For more details we refer to [27].

Given a metric space (X,d) and a sequence of functions F_k : X → [−∞,+∞], we define for every u ∈ X the Γ-lim inf of F_k

    F′(u) = Γ-lim inf_{k→∞} F_k(u) = inf_{u_k→u} lim inf_{k→∞} F_k(u_k)

and the Γ-lim sup of F_k

    F″(u) = Γ-lim sup_{k→∞} F_k(u) = inf_{u_k→u} lim sup_{k→∞} F_k(u_k),

and we say that F_k Γ-converges to F : X → [−∞,+∞] if F′ = F″ = F. The functions F′, F″, and F (if they exist) are lower semicontinuous on X. We have the following two properties:

1. F_k Γ-converges to F if and only if for every u ∈ X,
   (i) for every sequence u_k converging to u, F(u) ≤ lim inf_{k→∞} F_k(u_k);
   (ii) there exists a sequence u_k that converges to u and such that lim sup_{k→∞} F_k(u_k) ≤ F(u).

2. If G : X → ℝ is continuous and F_k Γ-converges to F, then F_k + G Γ-converges to F + G.
The following result makes clear the interest of the notion of Γ-convergence:

Theorem 7 Assume F_k Γ-converges to F and for every k let u_k be a minimizer of F_k over X. Then, if the sequence (or a subsequence) u_k converges to some u ∈ X, u is a minimizer of F and F_k(u_k) converges to F(u).
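A classical worked example (our addition, standard in the Γ-convergence literature) shows how oscillations are handled:

```latex
F_k(x)=\sin(kx),\qquad X=\mathbb{R}.
% For every x there are x_k \to x with \sin(kx_k) = -1
% (take the nearest point of the form (2n\pi - \pi/2)/k, at distance \le 2\pi/k),
% so the \Gamma\text{-}\liminf at x is at most -1; since F_k \ge -1 always,
\Gamma\hbox{-}\lim_{k\to\infty}\sin(kx) = -1 \quad\text{for every } x\in\mathbb{R},
% even though \sin(kx) has no pointwise limit for most x. Minimizers of F_k
% (points where \sin(kx)=-1) can be chosen converging to any given point,
% consistently with Theorem 7: every point minimizes the \Gamma\text{-}limit.
```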
Eventually, we give the following definition of Γ-convergence in the case where (F_h)_{h>0} is a family of functionals on X indexed by a continuous parameter h: we say that F_h Γ-converges to F in X as h ↓ 0 if and only if for every sequence (h_j) that converges to zero as j→∞, F_{h_j} Γ-converges to F.

The reader who would like to know more about Γ-convergence may consult the books [9, 27]. Also, the excellent notes [1] by G. Alberti contain a good introduction to this theory as well as to its applications to phase transition problems, which are very close (at least technically) to the methods and techniques of section 4.
3 The numerical analysis of the total variation min-
imization
3.1 The discrete energy
Let us consider problem (2), in dimension 2, and let us try to find a way to compute a solution. We will discuss the approach studied by Vogel and Oman [60, 61] (see also [24, 31]).
Although it is not absolutely obvious, we will first assume that there exists a Lagrange multiplier $\lambda > 0$ such that problem (2) is equivalent to the problem
$$\min_{u \in BV(\Omega)}\; |Du|(\Omega) + \lambda \int_\Omega |Au(x) - g(x)|^2\, dx \tag{21}$$
(see [24] for details; we must assume here $A1 = 1$, so that a minimizer of (21) automatically satisfies $\int_\Omega Au = \int_\Omega g$, as well as the other assumptions of section 2.1.3). The problem of determining the correct $\lambda$ is also difficult; we will not consider it in this short section.
First we must discretize (21). For simplicity, we assume that $u$ and $g$ are discretized on the same square lattice, $i, j = 1, \dots, L$. (This is the case in some situations, but there exist other common situations, like the reconstruction of tomographic data or zooming, where it is not true.)
Inverse problems in Image processing and Image segmentation 37
The functions $u$ and $g$ are approximated by discrete matrices $U = (U_{i,j})_{1\le i,j\le L}$ and $G = (G_{i,j})_{1\le i,j\le L}$. The term $\lambda \int_\Omega |Au(x) - g(x)|^2\, dx$ is replaced, in the discrete setting, by a term $\lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2$. (We omit the scale factor, but it is important in practical applications.) In the discrete formula, $A$ denotes a linear operator on $\mathbb{R}^N = \mathbb{R}^{L\times L}$ (we set $N = L^2$) and $(AU)_{i,j}$ is the component $i,j$ of $AU$.
There are several ways to approximate $|Du|(\Omega)$. The simplest (which, however, has several drawbacks) is to consider the variation along the horizontal and vertical directions,
$$\sum_{1\le i<L}\;\sum_{1\le j\le L} |U_{i+1,j} - U_{i,j}| \;+\; \sum_{1\le i\le L}\;\sum_{1\le j<L} |U_{i,j+1} - U_{i,j}|$$
(here again we omitted the scale factor). Due to the strong anisotropy of this approximation, the results it gives are not very good (in fact it is an approximation of $|D_1 u|(\Omega) + |D_2 u|(\Omega)$ — where in $\mathbb{R}^2$, $D_i u$ is the derivative of $u$ along the $i$th direction, $i = 1, 2$ — which is a semi-norm on $BV(\Omega)$ that is equivalent to $|Du|(\Omega)$ but not invariant under rotations in the plane), and many authors consider a "more isotropic" approximation of the total variation (see for instance [60, 33, 47]).
Therefore the discrete energy we need to minimize is the following:
$$E(U) = \sum_{i,j} \bigl( |U_{i+1,j} - U_{i,j}| + |U_{i,j+1} - U_{i,j}| \bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2. \tag{22}$$
3.2 The method
Due to the strong nonlinearity of (22) (or rather of its derivative $D_U E$), it is difficult (although feasible) to minimize it by a straightforward gradient descent method. The nonexistence of the derivative of the absolute value $|x|$ at $x = 0$ (a problem that is often overcome by replacing $|x|$ with $\sqrt{\epsilon + x^2}$, with $\epsilon$ a small parameter) is not the only difficulty. Another approach would be to define a dual problem, using convex duality. In the one-dimensional case, and when $A$ is the identity, this leads to a very simple and efficient algorithm, but in other situations it is not very practical. (If you know a bit of convex analysis you may think about it!) For a similar approach see [25].
The solution we will study here is common in the image processing literature (see [10, 15, 38, 39, 40, 60, 61]). It is closely related to the method we will use in section 6, and to the approach in section 4.
It consists in noticing that for every $x \in \mathbb{R}$, $x \neq 0$,
$$|x| = \min_{v>0} \left( \frac{v}{2}x^2 + \frac{1}{2v} \right),$$
the minimum being reached for $v = 1/|x|$. We thus introduce the function $f(x,v) = vx^2/2 + 1/(2v)$, a new field $V = (V_{i+1/2,j})_{1\le i<L,\,1\le j\le L} \cup (V_{i,j+1/2})_{1\le i\le L,\,1\le j<L} \in \mathbb{R}^{(L-1)\times L + L\times(L-1)}_{+}$ (of positive real numbers), and a new energy,
$$F(U,V) = \sum_{i,j} \Bigl( f\bigl(|U_{i+1,j}-U_{i,j}|,\, V_{i+\frac12,j}\bigr) + f\bigl(|U_{i,j+1}-U_{i,j}|,\, V_{i,j+\frac12}\bigr) \Bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2$$
$$= \sum_{i,j} \left( \frac{V_{i+\frac12,j}}{2}|U_{i+1,j}-U_{i,j}|^2 + \frac{V_{i,j+\frac12}}{2}|U_{i,j+1}-U_{i,j}|^2 + \frac{1}{2V_{i+\frac12,j}} + \frac{1}{2V_{i,j+\frac12}} \right) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2,$$
and we notice that
$$\min_V F(U,V) = E(U),$$
the minimum being reached for $V_{i+1/2,j} = 1/|U_{i+1,j}-U_{i,j}|$ (or at $+\infty$ if $U_{i+1,j} = U_{i,j}$) and $V_{i,j+1/2} = 1/|U_{i,j+1}-U_{i,j}|$. We choose some starting values $U^0, V^0$ and compute for every $n \ge 1$
$$U^n = \arg\min_U F(U, V^{n-1}), \quad\text{and}\quad V^n = \arg\min_V F(U^n, V).$$
The idea is that as $n$ becomes large, $U^n$ will converge to the minimizer of (22). This is actually true if we slightly modify the algorithm (and the function $E$ we minimize).
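The half-quadratic identity $|x| = \min_{v>0}(vx^2/2 + 1/(2v))$ above is easy to check numerically; a quick sketch (ours, not part of the notes):

```python
import numpy as np

def f(x, v):
    # f(x, v) = v*x^2/2 + 1/(2v), the half-quadratic surrogate of |x|
    return 0.5 * v * x**2 + 0.5 / v

for x in [0.3, 1.0, -2.5]:
    # the minimum over v > 0 is attained at v = 1/|x| and equals |x|
    assert abs(f(x, 1.0 / abs(x)) - abs(x)) < 1e-12
    # a brute-force grid minimum over v agrees up to discretization error
    v_grid = np.linspace(1e-3, 100.0, 1_000_000)
    assert abs(f(x, v_grid).min() - abs(x)) < 1e-4
```
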
We choose an $\varepsilon > 0$ and introduce the convex closed set $K_\varepsilon = \{V : \varepsilon \le V_{i+1/2,j} \le 1/\varepsilon \text{ and } \varepsilon \le V_{i,j+1/2} \le 1/\varepsilon \ \forall i,j\}$ in $\mathbb{R}^M$ ($M = (L-1)\times L + L\times(L-1)$). We define a new energy $E_\varepsilon(U) = \min_{V \in K_\varepsilon} F(U,V)$. It is possible to show that as $\varepsilon$ becomes small, the minimizer of $E_\varepsilon$ approaches the minimizer of $E$. Moreover, it is easy to compute $E_\varepsilon$ explicitly:
$$E_\varepsilon(U) = \sum_{i,j} \bigl( j_\varepsilon(U_{i+1,j}-U_{i,j}) + j_\varepsilon(U_{i,j+1}-U_{i,j}) \bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2,$$
where
$$j_\varepsilon(x) = \min_{\varepsilon \le v \le 1/\varepsilon} f(x,v) = \begin{cases} \dfrac{x^2}{2\varepsilon} + \dfrac{\varepsilon}{2} & \text{if } |x| \le \varepsilon, \\[4pt] |x| & \text{if } \varepsilon \le |x| \le \dfrac{1}{\varepsilon}, \\[4pt] \dfrac{\varepsilon}{2}x^2 + \dfrac{1}{2\varepsilon} & \text{if } |x| \ge \dfrac{1}{\varepsilon}. \end{cases}$$
Define $\varphi_\varepsilon(x) = (\varepsilon \vee 1/|x|) \wedge 1/\varepsilon$ ($= 1/|x|$ if $\varepsilon \le |x| \le 1/\varepsilon$, $1/\varepsilon$ if $|x| \le \varepsilon$, and $\varepsilon$ if $|x| \ge 1/\varepsilon$). Then $\varphi_\varepsilon(x)$ is the unique value in $[\varepsilon, 1/\varepsilon]$ such that $j_\varepsilon(x) = f(x, \varphi_\varepsilon(x))$. We deduce that the unique $V \in K_\varepsilon$ for which $E_\varepsilon(U) = \min_{K_\varepsilon} F(U,\cdot) = F(U,V)$ is given by $V_{i+\frac12,j} = \varphi_\varepsilon(U_{i+1,j}-U_{i,j})$ and $V_{i,j+\frac12} = \varphi_\varepsilon(U_{i,j+1}-U_{i,j})$ for every $i,j$. In this case we set $\varphi_\varepsilon(U) = V$, and this defines a continuous function $\varphi_\varepsilon : \mathbb{R}^N \to K_\varepsilon \subset \mathbb{R}^M$.
The algorithm now consists in computing for every $n \ge 1$, the starting values $U^0, V^0$ being chosen,
$$U^n = \arg\min_U F(U, V^{n-1}), \quad\text{and}\quad V^n = \arg\min_{V \in K_\varepsilon} F(U^n, V) = \varphi_\varepsilon(U^n).$$
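A minimal sketch of this modified algorithm in the denoising case ($A$ = identity) follows; the grid size, the parameter values, and the dense linear solve for the $U$-step are our own illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
L, lam, eps = 16, 8.0, 1e-3

# forward-difference operators on the L x L grid (row-major flattening)
B  = np.eye(L)[1:] - np.eye(L)[:-1]        # 1-D forward difference, (L-1) x L
Dv = np.kron(B, np.eye(L))                 # differences U[i+1,j] - U[i,j]
Dh = np.kron(np.eye(L), B)                 # differences U[i,j+1] - U[i,j]

def phi_eps(x):
    # minimizer of f(x, .) over [eps, 1/eps]: clamp 1/|x| to the box
    return np.clip(1.0 / np.maximum(np.abs(x), 1e-300), eps, 1.0 / eps)

def j_eps(x):
    # j_eps(x) = min over eps <= v <= 1/eps of f(x, v), piecewise formula
    ax = np.abs(x)
    return np.where(ax <= eps, x**2 / (2*eps) + eps/2,
           np.where(ax >= 1/eps, eps * x**2 / 2 + 1/(2*eps), ax))

def E_eps(U):
    return (j_eps(Dv @ U).sum() + j_eps(Dh @ U).sum()
            + lam * ((U - g)**2).sum())

# noisy piecewise-constant test image (our synthetic datum)
g = np.zeros((L, L)); g[:, L//2:] = 1.0
g = (g + 0.1 * rng.standard_normal((L, L))).ravel()

U, energies = g.copy(), []
for n in range(30):
    Vv, Vh = phi_eps(Dv @ U), phi_eps(Dh @ U)            # V-step (explicit)
    M = Dv.T * Vv @ Dv + Dh.T * Vh @ Dh + 2*lam*np.eye(L*L)
    U = np.linalg.solve(M, 2*lam*g)                      # U-step: D_U F = 0
    energies.append(E_eps(U))

# the sequence E_eps(U^n) is non-increasing, as Lemma 4 below guarantees
assert all(energies[k+1] <= energies[k] + 1e-9 for k in range(len(energies)-1))
```

Each iteration performs the exact $V$-step via $\varphi_\varepsilon$ and the exact $U$-step by solving the linear system $D_U F(U, V^{n-1}) = 0$, so the energy decrease of Lemma 4 can be observed numerically.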
3.3 Proof of the convergence of the algorithm
We assume (as in the continuous formulation) that $A 1_N = 1_N$, where $1_N$ is the vector in $\mathbb{R}^N$ defined by $(1_N)_{i,j} = 1$ for every $1 \le i,j \le L$ (remember $N = L \times L$ is the dimension of the space where $U$ lives). Then we have the following proposition.

Proposition 4 There exist $\bar U$, $\bar V = \varphi_\varepsilon(\bar U)$ such that as $n \to \infty$, $U^n \to \bar U$ and $V^n \to \bar V$, and $\bar U$ is a (the) minimizer of $E_\varepsilon$.
Proof. First we claim that the following holds.

Lemma 3 There exist $0 < \alpha < \beta < +\infty$ such that the second derivatives $D^2_{UU}F$ and $D^2_{VV}F$ satisfy
$$\alpha I_N \le D^2_{UU}F(U,V) \le \beta I_N \quad\text{and}\quad \alpha I_M \le D^2_{VV}F(U,V) \le \beta I_M$$
for every $U \in \mathbb{R}^N$ and $V \in K_\varepsilon$.

This is equivalent to saying that for every $U \in \mathbb{R}^N$, $V \in K_\varepsilon$, $\xi \in \mathbb{R}^N$ and $\eta \in \mathbb{R}^M$,
$$\alpha|\xi|^2 \le \langle D^2_{UU}F(U,V)\xi, \xi\rangle \le \beta|\xi|^2 \quad\text{and}\quad \alpha|\eta|^2 \le \langle D^2_{VV}F(U,V)\eta, \eta\rangle \le \beta|\eta|^2.$$
Here $\alpha$ and $\beta$ both depend on $\varepsilon$.
Proof. We will leave to the reader the proof of three of the inequalities of the lemma, and will prove the first one, which is the most difficult. We first recall the following "Poincaré inequality" (in finite dimension): there exists a constant $c > 0$ such that for every $\xi \in \mathbb{R}^N = \mathbb{R}^{L\times L}$ such that $\sum_{i,j}\xi_{i,j} = 0$,
$$\sum_{1\le i,j\le L} |\xi_{i,j}|^2 \;\le\; c\left( \sum_{1\le i<L,\,j} |\xi_{i+1,j}-\xi_{i,j}|^2 + \sum_{i,\,1\le j<L} |\xi_{i,j+1}-\xi_{i,j}|^2 \right). \tag{23}$$
Exercise. Prove this inequality. Hint: suppose it is not true, and consider a sequence $\xi^n$ such that for every $n$, $\sum_{i,j}\xi^n_{i,j} = 0$ and $1 = \sum_{i,j}|\xi^n_{i,j}|^2 \ge n\sum_{i,j}\bigl(|\xi^n_{i+1,j}-\xi^n_{i,j}|^2 + |\xi^n_{i,j+1}-\xi^n_{i,j}|^2\bigr)$. Then, if $\xi$ is the limit of a subsequence $\xi^{n_k}$ of $\xi^n$, find a contradiction on $\xi$.
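Numerically, the best constant $c$ in (23) is $1/\lambda_2$, where $\lambda_2$ is the smallest nonzero eigenvalue of the grid Laplacian $D^\top D$ built from the two difference operators. A quick illustration (ours, not from the notes):

```python
import numpy as np

L = 8
B  = np.eye(L)[1:] - np.eye(L)[:-1]          # 1-D forward difference
Dv = np.kron(B, np.eye(L))                   # vertical differences
Dh = np.kron(np.eye(L), B)                   # horizontal differences
D  = np.vstack([Dv, Dh])

# the graph Laplacian D^T D has kernel = constant vectors; the best
# constant in (23) is the inverse of its smallest nonzero eigenvalue
lam = np.linalg.eigvalsh(D.T @ D)
assert lam[0] < 1e-10                        # constant vector: eigenvalue 0
c_best = 1.0 / lam[1]

# check (23) with c = c_best on random zero-mean fields xi
rng = np.random.default_rng(1)
for _ in range(100):
    xi = rng.standard_normal(L * L)
    xi -= xi.mean()                          # enforce sum_{i,j} xi = 0
    assert (xi**2).sum() <= c_best * ((D @ xi)**2).sum() + 1e-9
```
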
Notice that for every $U$, $V \in K_\varepsilon$ and $\xi \in \mathbb{R}^N$,
$$\langle D^2_{UU}F(U,V)\xi, \xi\rangle = \sum_{i,j}\Bigl( V_{i+\frac12,j}|\xi_{i+1,j}-\xi_{i,j}|^2 + V_{i,j+\frac12}|\xi_{i,j+1}-\xi_{i,j}|^2 \Bigr) + |A\xi|^2$$
$$\ge\; \varepsilon\sum_{i,j}\Bigl( |\xi_{i+1,j}-\xi_{i,j}|^2 + |\xi_{i,j+1}-\xi_{i,j}|^2 \Bigr) + |A\xi|^2.$$
In particular, letting $m(\xi) = (1/N)\sum_{i,j}\xi_{i,j}$ be the average of $\xi$, we have (since $A1_N = 1_N$)
$$\sqrt{\langle D^2_{UU}F(U,V)\xi, \xi\rangle} \;\ge\; |A\xi| = |A(\xi - m(\xi)1_N) + m(\xi)1_N| \;\ge\; |m(\xi)1_N| - |A|\,|\xi - m(\xi)1_N|.$$
But by (23), $|\xi - m(\xi)1_N|^2 \le c\sum_{i,j}\bigl(|\xi_{i+1,j}-\xi_{i,j}|^2 + |\xi_{i,j+1}-\xi_{i,j}|^2\bigr) \le (c/\varepsilon)\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle$, therefore
$$|m(\xi)1_N| \;\le\; c\sqrt{\langle D^2_{UU}F(U,V)\xi, \xi\rangle}$$
(here $c$ denotes any positive constant that does not depend on $U, V, \xi$). Moreover, using again (23), $c\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle \ge |\xi - m(\xi)1_N|^2$. Since $1_N$ and $\xi - m(\xi)1_N$ are orthogonal, we deduce that $|\xi|^2 \le c\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle$.
Remark. Observe that this proof is essentially identical to the proof of the coerciveness of the energy in section 2.1.3.
We next prove the following lemma.
Lemma 4 For every $n \ge 1$,
$$E_\varepsilon(U^{n-1}) - E_\varepsilon(U^n) \;\ge\; \frac{\alpha}{2}\Bigl( |U^{n-1}-U^n|^2 + |V^{n-1}-V^n|^2 \Bigr).$$
Proof. For every $n \ge 1$, we have $D_U F(U^n, V^{n-1}) = 0$, while
$$\langle D_V F(U^n, V^n), V - V^n\rangle \ge 0$$
for every $V \in K_\varepsilon$. We deduce that (using Lemma 3)
$$F(U^n, V^{n-1}) = F(U^n, V^n) + \bigl\langle D_V F(U^n, V^n),\, V^{n-1}-V^n \bigr\rangle$$
$$\qquad + \int_0^1 (1-t)\,\bigl\langle D^2_{VV}F\bigl(U^n,\, V^n + t(V^{n-1}-V^n)\bigr)(V^{n-1}-V^n),\, V^{n-1}-V^n \bigr\rangle\, dt$$
$$\ge\; F(U^n, V^n) + \frac{\alpha}{2}|V^{n-1}-V^n|^2.$$
In a similar way we prove that $F(U^{n-1}, V^{n-1}) \ge F(U^n, V^{n-1}) + \frac{\alpha}{2}|U^{n-1}-U^n|^2$. Since $E_\varepsilon(U^n) = F(U^n, V^n)$, the lemma is proved.
Since by construction the sequence $E_\varepsilon(U^n) = F(U^n, V^n)$ is nonincreasing and bounded from below, it must have a limit in $\mathbb{R}_+$ and $E_\varepsilon(U^{n-1}) - E_\varepsilon(U^n) \to 0$; therefore $U^{n-1}-U^n$ and $V^{n-1}-V^n$ go to zero as $n \to \infty$.

Now, we notice that $E_\varepsilon$ is coercive, which means that for every $c > 0$ the set $\{E_\varepsilon \le c\}$ is bounded in $\mathbb{R}^N$ (and closed), hence compact (this can be deduced from Lemma 3). Thus we may extract a subsequence $U^{n_k}$ and find a $\bar U \in \mathbb{R}^N$ such that as $k \to \infty$, $U^{n_k} \to \bar U$. By continuity, $V^{n_k} = \varphi_\varepsilon(U^{n_k}) \to \varphi_\varepsilon(\bar U)$, and we let $\bar V = \varphi_\varepsilon(\bar U)$. We also have $D_U F(U^{n_k}, V^{n_k-1}) = 0$, and since $V^{n_k-1} - V^{n_k} \to 0$ by Lemma 4, $V^{n_k-1} \to \bar V$, so that by continuity $D_U F(\bar U, \bar V) = 0$.
We conclude using the following lemma.

Lemma 5 Let $\bar U, \bar V$ satisfy $D_U F(\bar U, \bar V) = 0$ and $\bar V = \arg\min_{V \in K_\varepsilon} F(\bar U, V) = \varphi_\varepsilon(\bar U)$. Then $D_U E_\varepsilon(\bar U) = 0$.
Proof. Let $h \in \mathbb{R}^N$ and $t > 0$. Letting $V_t = \varphi_\varepsilon(\bar U + th)$ (which goes to $\bar V$ as $t \downarrow 0$), we have
$$E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U) = F\bigl(\bar U + th,\, \varphi_\varepsilon(\bar U + th)\bigr) - F(\bar U, \bar V) = \bigl(F(\bar U + th, V_t) - F(\bar U, V_t)\bigr) + \bigl(F(\bar U, V_t) - F(\bar U, \bar V)\bigr).$$
Since $V_t \in K_\varepsilon$, $F(\bar U, V_t) \ge F(\bar U, \bar V)$, so that $E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U) \ge F(\bar U + th, V_t) - F(\bar U, V_t)$. Since
$$F(\bar U + th, V_t) - F(\bar U, V_t) = t\,\langle D_U F(\bar U, V_t), h\rangle + \int_0^t (t-s)\,\langle D^2_{UU}F(\bar U, V_t)h, h\rangle\, ds$$
and $\int_0^t (t-s)\,\langle D^2_{UU}F(\bar U, V_t)h, h\rangle\, ds \le \beta t^2|h|^2/2$, we deduce that
$$\langle D_U E_\varepsilon(\bar U), h\rangle = \lim_{t\downarrow 0} \frac{E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U)}{t} \;\ge\; \langle D_U F(\bar U, \bar V), h\rangle = 0.$$
Since $h$ is arbitrary, $D_U E_\varepsilon(\bar U) = 0$.
Since $E_\varepsilon$ is ($C^1$ and) strictly convex² (meaning that for every $U, U'$ and $0 < \lambda < 1$, $E_\varepsilon(\lambda U + (1-\lambda)U') < \lambda E_\varepsilon(U) + (1-\lambda)E_\varepsilon(U')$ unless $U = U'$), it has a unique minimizer, characterized by the equation $D_U E_\varepsilon = 0$. We deduce that $\bar U$ is the unique minimizer of $E_\varepsilon$. This completes the proof of Proposition 4, since by uniqueness of this minimizer any subsequence of $(U^n)$ must converge to the same value $\bar U$, so that the whole sequence $(U^n)$ converges to $\bar U$.
Exercise. Prove that as $\varepsilon \to 0$, $E_\varepsilon$ $\Gamma$-converges to $E$, so that the minimizer of $E_\varepsilon$ tends to the minimizer of $E$.
3.4 Two examples
Figure 5: A noisy image and the reconstruction.
We just show two examples of image denoising (i.e., $A$ is the identity matrix) obtained with this method (these are taken from [24]). The first one (Fig. 5) represents the reconstruction of a piecewise constant image to which a Gaussian noise (of standard deviation 60, for values between 0 and 255) has been added. On the left, the noisy image is presented, while on the right the reconstruction with an appropriate value of $\lambda$ is shown. The second example (Fig. 6) shows a "true picture" that has been corrupted by some Gaussian noise (here the standard deviation is approximately 30, again for values between 0 and 255). The reconstruction (on the right) is less good,
²We have to assume that $A$ is injective. Otherwise, $\bar U$ is still a minimizer of $E_\varepsilon$ but might not be unique.
Figure 6: Another example of total variation minimization.
since the total variation tends to favor piecewise constant images, so that the result is a bit "blocky".
4 The numerical analysis of the Mumford–Shah problem (I)
4.1 Ambrosio and Tortorelli's approximate energy
Now we will describe the first attempt that was made to provide an approximation of the Mumford–Shah functional by simpler (elliptic) variational problems. This result is due to Ambrosio and Tortorelli (see [7] and [8]). They proposed to replace the set $K$ by a function $v(x)$, and to design energies depending on a scale parameter $\varepsilon$, so that as $\varepsilon$ goes to zero the function $1 - v(x)$, in some sense, becomes an approximation of the characteristic function of the discontinuity set $K$.
Their approximation $F_\varepsilon(u,v)$, defined over the space $L^2(\Omega) \times L^2(\Omega)$, is the following:
$$F_\varepsilon(u,v) = \int_\Omega (v(x)^2 + k_\varepsilon)|\nabla u(x)|^2\, dx + \int_\Omega \varepsilon|\nabla v(x)|^2 + \frac{(1-v(x))^2}{4\varepsilon}\, dx + \int_\Omega |u(x)-g(x)|^2\, dx \tag{24}$$
for $u, v \in H^1(\Omega)$, and they set $F_\varepsilon(u,v) = +\infty$ if $u$ or $v$ is in $L^2(\Omega) \setminus H^1(\Omega)$. The parameter $k_\varepsilon > 0$ is needed in order to ensure that for $\varepsilon > 0$, $F_\varepsilon$ is coercive in $H^1(\Omega) \times H^1(\Omega)$ (i.e., greater than a constant times the norm of $(u,v)$ in this space). It has to go to zero faster than $\varepsilon$ as $\varepsilon$ goes to zero (i.e., $\lim_{\varepsilon\downarrow 0} k_\varepsilon/\varepsilon = 0$); the reason for this will be made clear in the proof.
In order to show that $F_\varepsilon$ $\Gamma$-converges to an energy such as the weak Mumford–Shah energy $E$, they need to redefine the latter on the same space as $F_\varepsilon$, that is, $L^2(\Omega) \times L^2(\Omega)$. To do this, they use the fact that as $\varepsilon$ goes to zero, they want $v$ to become almost everywhere equal to 1, and thus they set, for every $u, v \in L^2(\Omega)$:
$$F(u,v) = \begin{cases} \displaystyle\int_\Omega |\nabla u(x)|^2\, dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega (u(x)-g(x))^2\, dx & \text{if } u \in GSBV(\Omega) \text{ and } v = 1 \text{ a.e. in } \Omega, \\[4pt] +\infty & \text{otherwise.} \end{cases}$$
Notice that $F(u,v)$ is just $E(u)$ when $v = 1$ a.e., and $+\infty$ otherwise. Then, they are able to show the following theorem.

Theorem 8 (Ambrosio–Tortorelli) As $\varepsilon$ goes to zero, $F_\varepsilon$ $\Gamma$-converges to $F$.

In particular, this means that for small $\varepsilon$, the minimizers of $F_\varepsilon$ will be close to minimizers of $F$.
This approximation has been used in efforts to compute image segmentations and for other applications (see [16, 37, 52], which are based on a finite-element version of Theorem 8 established by Bellettini and Coscia [13], and [57, 56, 58, 59], where in particular Shah considers the gradient flow of $F_\varepsilon(u,v)$ and of similar energies). It works quite well; however, it has the feature that the approximation of the Mumford–Shah functional will be correct only if the discretization step (the pixels' size) is much smaller than the scale parameter $\varepsilon$. This is not very convenient for most applications. For this reason, we will study in the following sections the problem from a different point of view, considering the original discrete energies ($E(U,L)$ or $E(U)$) and explaining how one can find the energy they approximate in the continuous setting (which will be a variant of the Mumford–Shah energy).
We quickly explain in one dimension how the proof of Theorem 8 goes. In dimension greater than one, the proof is obtained mostly through a localization and a slicing argument, which we will briefly mention in the subsequent section.
4.2 Sketch of the proof of Ambrosio and Tortorelli's theorem,
in dimension one
Basically, in order to show Theorem 8, you have to choose any sequence $(\varepsilon_j)_{j\ge1}$ of positive numbers with $\lim_{j\to\infty}\varepsilon_j = 0$, and to prove that for any $(u,v)$:

i) if $(u_j, v_j)$ goes to $(u,v)$ (in $L^2$-norm) as $j \to \infty$, then $F(u,v) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$, and

ii) there exists $(u_j, v_j)$ that converges to $(u,v)$ such that $\limsup_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) \le F(u,v)$.
4.2.1 Proof of (i)
To prove (i), we consider a sequence $(u_j, v_j)$. We can of course assume that $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) < +\infty$ (otherwise there is nothing to prove), and we can extract a subsequence (unless really needed — i.e., if we need to keep track of the original sequence or compare two different subsequences — we will always denote subsequences like the original sequence) such that $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) = \lim_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. In particular we have $c = \sup_j F_{\varepsilon_j}(u_j, v_j) < +\infty$, thus $\int_\Omega (1-v_j(x))^2\, dx \le 4c\varepsilon_j \to 0$ as $j \to \infty$. Then, since $v_j$ goes to 1 in $L^2(\Omega)$, we must have $v = 1$ a.e. We then just need to show that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. Since it is clear that $\int_\Omega (u_j(x)-g(x))^2\, dx$ goes to $\int_\Omega (u(x)-g(x))^2\, dx$ as $j \to \infty$, we have to show that $u \in GSBV(\Omega)$ and estimate the other two terms, $\mathcal{H}^0(S_u) = \#S_u$ (notice that the measure $\mathcal{H}^0(S_u)$ is the cardinality of the set $S_u$, which we also denote by $\#S_u$), and $\int_\Omega \dot u(x)^2\, dx$.
Now let $\Sigma$ be the set of points $x \in \Omega$ such that for every $\delta > 0$ (small, so that $(x-\delta, x+\delta) \subset \Omega$), $u \notin H^1(x-\delta, x+\delta)$. We will show that this set is finite. Indeed, choose $x_1, \dots, x_k \in \Sigma$ with $x_1 < x_2 < \dots < x_k$. Choose also $\delta$ such that $x_i + \delta < x_{i+1} - \delta$ for every $i = 1, \dots, k-1$ and $(x_i - \delta, x_i + \delta) \subset \Omega$ for every $i$.

Then, we must have for every $i$ that $\limsup_{j\to\infty} \inf_{B_{\delta/2}(x_i)} v_j = 0$. Otherwise, there exist $i$ and a subsequence $(u_{j_k}, v_{j_k})$ such that $\lim_{k\to\infty} \inf_{B_{\delta/2}(x_i)} v_{j_k} = \theta > 0$, and $v_{j_k} \ge \theta/2$ in $B_{\delta/2}(x_i)$ for $k$ large enough. But then, $\int_{x_i-\delta/2}^{x_i+\delta/2} (v_{j_k}^2 + k_{\varepsilon_{j_k}})\, u_{j_k}'(x)^2\, dx \le c$ implies that $\int_{x_i-\delta/2}^{x_i+\delta/2} u_{j_k}'(x)^2\, dx \le 4c/\theta^2$, so that $u_{j_k}$ is uniformly bounded in $H^1(x_i-\delta/2, x_i+\delta/2)$. In this case, its limit $u$ also has to be in $H^1(x_i-\delta/2, x_i+\delta/2)$, and this is in contradiction with the choice of $x_i \in \Sigma$.
Now, for every $i$ we know that $\limsup_{j\to\infty} \inf_{B_{\delta/2}(x_i)} v_j = 0$. We also know that $v_j \to 1$ in $L^2(\Omega)$. Therefore if we fix $i$ and choose $\theta > 0$ small, there will exist for $j$ large enough a point $x_i(j) \in (x_i-\delta/2, x_i+\delta/2)$ with $v_j(x_i(j)) < \theta$, and $x_i'(j) \in (x_i-\delta, x_i-\delta/2)$ and $x_i''(j) \in (x_i+\delta/2, x_i+\delta)$ with $v_j(x_i'(j)) > 1-\theta$ and $v_j(x_i''(j)) > 1-\theta$. We then have (using the fact that $A^2 + B^2 \ge 2AB$)
$$\int_{x_i-\delta}^{x_i+\delta} \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \;\ge\; \int_{x_i-\delta}^{x_i+\delta} |v_j'(x)|\,|1-v_j(x)|\, dx$$
$$\ge\; \int_{x_i'(j)}^{x_i(j)} |1-v_j(x)|\,|v_j'(x)|\, dx + \int_{x_i(j)}^{x_i''(j)} |1-v_j(x)|\,|v_j'(x)|\, dx \;\ge\; \frac{(1-\theta)^2 - \theta^2}{2} + \frac{(1-\theta)^2 - \theta^2}{2} = 1 - 2\theta.$$
In particular, summing over $i = 1, \dots, k$ (the intervals $(x_i-\delta, x_i+\delta)$ are disjoint), we get that for $j$ large enough, $\int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \ge (1-2\theta)k$.
Since this is valid for an arbitrary finite subset $\{x_1, \dots, x_k\}$ of $\Sigma$, it shows that $\#\Sigma < +\infty$, and more precisely,
$$\#\Sigma\,(1-2\theta) \;\le\; \liminf_{j\to\infty} \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \;\le\; c = \sup_j F_{\varepsilon_j}(u_j, v_j) < +\infty.$$
This is true for every $\theta > 0$, so that eventually
$$\#\Sigma \;\le\; \liminf_{j\to\infty} \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx$$
holds.
Now, by the definition of $\Sigma$, $u \in H^1_{loc}(\Omega\setminus\Sigma)$, and we need to find an estimate for $\int_{\Omega\setminus\Sigma} u'(x)^2\, dx$. Indeed, if we knew that $\int_{\Omega\setminus\Sigma} u'(x)^2\, dx < \infty$ (i.e., $u \in H^1(\Omega\setminus\Sigma)$), then it would yield $u \in SBV(\Omega)$ and $S_u = \Sigma$, $\dot u = u'$.

Notice that the proof we have just written could easily be transformed to yield the following lemma:

Lemma 6 Let $\theta > 0$. Then, for every $\delta > 0$, there exists $J$ such that for every $j \ge J$, if $x_1 < x_2 < \dots < x_k$ are such that $v_j(x_i) < \theta$ and $x_{i+1} - x_i > \delta$, then $k \le c/(1-2\theta)$.
We leave the proof of this lemma to the reader (use the same arguments as in the proof above, after having chosen $J$ such that if $j \ge J$, $|\{x : v_j(x) < 1-\theta\}| < \delta$).

In this case, once $\theta > 0$ is chosen, we can choose $\delta > 0$ and select for all $j \ge J$ a maximal set $x_1(j) < x_2(j) < \dots < x_{k(j)}(j)$, with $x_{i+1}(j) - x_i(j) > \delta$ and $v_j(x_i(j)) < \theta$, and we have $k(j) \le c/(1-2\theta)$. Therefore, there exist $k \le c/(1-2\theta)$, a subsequence $(u_{j_l}, v_{j_l})$, and $k$ points $x_1 < x_2 < \dots < x_k$, such that $k(j_l) = k$ for all $l$ and $x_i(j_l) \to x_i$ as $l \to \infty$ for every $i = 1, \dots, k$. If $l$ is large enough we thus have (by the maximality of the set $\{x_1(j), \dots, x_{k(j)}(j)\}$) that $v_{j_l} \ge \theta$ in the open set $\Omega_\delta = \Omega \setminus \bigcup_{i=1}^k [x_i - 2\delta, x_i + 2\delta]$, so that $\int_{\Omega_\delta} u_{j_l}'(x)^2\, dx \le c/\theta^2$, and since $u_{j_l}$ goes to $u$ in $L^2(\Omega)$ we know that this implies the convergence of $u_{j_l}'$ to $u'$ weakly in $L^2(\Omega_\delta)$.

If $w \in L^2(\Omega_\delta)$, the functions $w\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}^2}$ go to $w$ strongly in $L^2(\Omega_\delta)$, since $v_{j_l} \to 1$ in $L^2(\Omega)$.
Thus,
$$\lim_{l\to\infty} \int_{\Omega_\delta} \sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2}\; u_{j_l}'(x)\, w(x)\, dx = \lim_{l\to\infty} \int_{\Omega_\delta} u_{j_l}'(x)\left( w(x)\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2} \right) dx = \int_{\Omega_\delta} u'(x)\, w(x)\, dx,$$
and $\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2}\; u_{j_l}'(x)$ goes weakly to $u'$ in $L^2(\Omega_\delta)$. This yields
$$\int_{\Omega_\delta} u'(x)^2\, dx \;\le\; \liminf_{l\to\infty} \int_{\Omega_\delta} \bigl(k_{\varepsilon_{j_l}} + v_{j_l}(x)^2\bigr)\, u_{j_l}'(x)^2\, dx \;\le\; c < +\infty.$$
Since $\delta$ is arbitrary, we can deduce that
$$\int_\Omega u'(x)^2\, dx \;\le\; \liminf_{j\to\infty} \int_\Omega \bigl(k_{\varepsilon_j} + v_j(x)^2\bigr)\, u_j'(x)^2\, dx,$$
and in particular $u \in H^1(\Omega\setminus\Sigma)$. Therefore $u \in SBV(\Omega)$, $S_u = \Sigma$, $\dot u = u'$, and
$$\int_\Omega \dot u(x)^2\, dx + \#\Sigma + \int_\Omega (u(x)-g(x))^2\, dx \;\le\; \liminf_{j\to\infty}\left[ \int_\Omega \bigl(k_{\varepsilon_j} + v_j(x)^2\bigr)\, u_j'(x)^2\, dx + \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx + \int_\Omega (u_j(x)-g(x))^2\, dx \right],$$
which was the thesis we wanted to prove, and (i) is true.
4.2.2 Proof of (ii)
To prove (ii), we consider $u \in SBV(\Omega)$ with $E(u) = F(u,1) < +\infty$ (otherwise there is nothing to prove). In this case, $S_u$ is a finite set, and $u \in H^1(\Omega\setminus S_u)$ (in particular it is continuous everywhere except on $S_u$).

In order to simplify the proof we will assume that $\Omega = (-1,1)$ and that $u$ has just one jump, at the point 0; the study of the general case can be localized in small intervals around each discontinuity of $u$, so it is not very different.
Consider the function $\psi(t) = 1 - \exp(-t/2)$ for $t \ge 0$. We leave the following result to the reader:

Exercise. Prove that $v_\varepsilon(x) = \psi(x/\varepsilon)$ minimizes
$$\int_0^{\infty} \varepsilon v'(x)^2 + \frac{(1-v(x))^2}{4\varepsilon}\, dx$$
on the set $\{v \in L^2_{loc}(0,+\infty),\ v' \in L^2(0,+\infty),\ v(0) = 0\}$, and that the value of the minimum is $1/2$.
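A numerical check of the exercise (our own sketch): evaluate the functional at $v_\varepsilon(x) = \psi(x/\varepsilon)$ on a truncated grid and compare with $1/2$.

```python
import numpy as np

for eps in [0.1, 0.05, 0.01]:
    # truncate (0, +infinity) at 30*eps; the tail of the integrand is
    # of order exp(-30) and therefore negligible
    x, dx = np.linspace(0.0, 30.0 * eps, 200_001, retstep=True)
    v = 1.0 - np.exp(-x / (2.0 * eps))            # psi(x / eps)
    dv = np.gradient(v, dx)                       # numerical derivative v'
    integrand = eps * dv**2 + (1.0 - v)**2 / (4.0 * eps)
    value = 0.5 * dx * (integrand[:-1] + integrand[1:]).sum()  # trapezoid rule
    assert abs(value - 0.5) < 1e-3                # the minimum value is 1/2
```
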
Now, we set for every $\varepsilon > 0$: $v_\varepsilon(x) = 0$ if $|x| < a_\varepsilon$, and $v_\varepsilon(x) = \psi\bigl(\frac{|x|-a_\varepsilon}{\varepsilon}\bigr)$ otherwise, where $a_\varepsilon$ goes to zero with $\varepsilon$ and will be fixed later on; and we set $u_\varepsilon(x) = u(-a_\varepsilon) + \frac{u(a_\varepsilon)-u(-a_\varepsilon)}{2a_\varepsilon}(x + a_\varepsilon)$ if $|x| < a_\varepsilon$, and $u_\varepsilon(x) = u(x)$ otherwise. Then (using the result of the previous exercise),
$$F_\varepsilon(u_\varepsilon, v_\varepsilon) = \int_{|x|\ge a_\varepsilon} \bigl(k_\varepsilon + v_\varepsilon(x)^2\bigr)\, u'(x)^2\, dx + 2a_\varepsilon k_\varepsilon \left( \frac{u(a_\varepsilon)-u(-a_\varepsilon)}{2a_\varepsilon} \right)^2$$
$$+ \int_{|x|\ge a_\varepsilon} \frac{1}{\varepsilon}\,\psi'\!\left(\frac{|x|-a_\varepsilon}{\varepsilon}\right)^2 + \frac{1}{4\varepsilon}\left( 1 - \psi\!\left(\frac{|x|-a_\varepsilon}{\varepsilon}\right) \right)^2 dx + \frac{2a_\varepsilon}{4\varepsilon} + \int_{|x|\ge a_\varepsilon} (u(x)-g(x))^2\, dx + \int_{-a_\varepsilon}^{a_\varepsilon} (u_\varepsilon(x)-g(x))^2\, dx$$
$$\le\; (1+k_\varepsilon)\int_{-1}^{1} u'(x)^2\, dx + 2\|u\|_\infty^2\,\frac{k_\varepsilon}{a_\varepsilon} + 2\cdot\frac12 + \frac{2a_\varepsilon}{4\varepsilon} + \int_{-1}^{1} (u(x)-g(x))^2\, dx + 2a_\varepsilon\bigl(\|u\|_\infty + \|g\|_\infty\bigr)^2.$$
We see that we will get the result if both $k_\varepsilon/a_\varepsilon$ and $a_\varepsilon/\varepsilon$ go to zero as $\varepsilon$ goes to zero. This is possible if and only if $k_\varepsilon/\varepsilon \to 0$, since in this case we can let $a_\varepsilon = \sqrt{k_\varepsilon\varepsilon}$. Then, we have $\limsup_{\varepsilon\downarrow 0} F_\varepsilon(u_\varepsilon, v_\varepsilon) \le E(u)$ and point (ii) is proved.
4.3 Higher dimensions
In dimension greater than one, the proof is obtained through a localization and a slicing argument. The interested reader should refer to Ambrosio and Tortorelli's paper, to [19] where a similar argument is used, or to Braides' book [18]. We will briefly explain, without giving too many details, how the proof goes.
4.3.1 The first inequality
Consider a sequence $\varepsilon_j \downarrow 0$ and $u_j, v_j$ such that $u_j \to u$ and $v_j \to v$ strongly in $L^2(\Omega)$ as $j \to \infty$. Again, it is clear that $v = 1$ a.e., and we need to show that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. To prove this inequality we first localize the energy $F_{\varepsilon_j}(u_j, v_j)$ by letting, for every open $A \subseteq \Omega$,
$$F_{\varepsilon_j}(u_j, v_j; A) = \int_A \left( \bigl(v_j(x)^2 + k_{\varepsilon_j}\bigr)|\nabla u_j(x)|^2 + \varepsilon_j|\nabla v_j(x)|^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j} + |u_j(x)-g(x)|^2 \right) dx. \tag{25}$$
Then, we fix an open set $A$ and a unit vector $\nu \in S^{N-1}$, and write
$$F_{\varepsilon_j}(u_j, v_j; A) = \int_{\nu^\perp} d\mathcal{H}^{N-1}(z) \int_{A_{z,\nu}} dt \left( \bigl(v_j(z+t\nu)^2 + k_{\varepsilon_j}\bigr)|\nabla u_j(z+t\nu)|^2 + \varepsilon_j|\nabla v_j(z+t\nu)|^2 + \frac{(1-v_j(z+t\nu))^2}{4\varepsilon_j} + |u_j(z+t\nu)-g(z+t\nu)|^2 \right)$$
so that
$$F_{\varepsilon_j}(u_j, v_j; A) \;\ge\; \int_{\nu^\perp} F^{1D}_{\varepsilon_j}\bigl((u_j)_{z,\nu},\, (v_j)_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}\bigr)\, d\mathcal{H}^{N-1}(z).$$
(We follow the notations of section 2.2.8: $A_{z,\nu} = \{t \in \mathbb{R} : z + t\nu \in A\}$, and for every $t \in A_{z,\nu}$, $(u_j)_{z,\nu}(t) = u_j(z+t\nu)$, $(v_j)_{z,\nu}(t) = v_j(z+t\nu)$, $g_{z,\nu}(t) = g(z+t\nu)$.) Here $F^{1D}_{\varepsilon_j}$ denotes the localized Ambrosio–Tortorelli energy (25) in dimension 1 (given $I \subseteq \mathbb{R}$ an open set and $w, r \in H^1(I)$, $h \in L^\infty(I)$):
$$F^{1D}_{\varepsilon_j}(w, r; I, h) = \int_I \bigl(r(t)^2 + k_{\varepsilon_j}\bigr)\, w'(t)^2\, dt + \int_I \varepsilon_j r'(t)^2 + \frac{(1-r(t))^2}{4\varepsilon_j}\, dt + \int_I |w(t)-h(t)|^2\, dt.$$
Since, as $j \to \infty$,
$$\int_\Omega |u_j(x)-u(x)|^2\, dx = \int_{\nu^\perp} \int_{\Omega_{z,\nu}} |(u_j)_{z,\nu}(t) - u_{z,\nu}(t)|^2\, dt\, d\mathcal{H}^{N-1}(z) \;\to\; 0,$$
we may assume we have extracted a subsequence (not relabeled) such that $(u_j)_{z,\nu} \to u_{z,\nu}$ for almost every $z \in \nu^\perp$ (such that $\Omega_{z,\nu} \neq \emptyset$).
Then, the one-dimensional result states that for almost every such $z$,
$$E^{1D}(u_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}) \;\le\; \liminf_{j\to\infty} F^{1D}_{\varepsilon_j}\bigl((u_j)_{z,\nu},\, (v_j)_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}\bigr),$$
where again, if $I \subseteq \mathbb{R}$ is open,
$$E^{1D}(w; I, h) = \int_I \dot w(t)^2\, dt + \mathcal{H}^0(S_w \cap I) + \int_I (w(t)-h(t))^2\, dt$$
if $w \in SBV(I)$, and $E^{1D}(w; I, h) = +\infty$ otherwise. Using Fatou's lemma, we deduce that
$$\int_{\nu^\perp} \left( \int_{A_{z,\nu}} \dot u_{z,\nu}(t)^2\, dt + \mathcal{H}^0(S_{u_{z,\nu}} \cap A_{z,\nu}) + \int_{A_{z,\nu}} (u_{z,\nu}(t)-g_{z,\nu}(t))^2\, dt \right) d\mathcal{H}^{N-1}(z) \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A).$$
In particular we get that if $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A) < +\infty$, then $u_{z,\nu} \in SBV(A_{z,\nu})$ for a.e. $z \in \nu^\perp$, and since this is true for every $\nu$ we deduce that $u \in SBV(A)$. Thanks to the results of section 2.2.8, the last inequality can then be rewritten as
$$E^\nu(u; A) = \int_A \langle \nabla u(x), \nu\rangle^2\, dx + \int_{S_u \cap A} |\langle \nu_u(x), \nu\rangle|\, d\mathcal{H}^{N-1}(x) + \int_A |u(x)-g(x)|^2\, dx \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A).$$
To conclude, we admit (see [19, Prop. 6.5]) that if $(\nu_n)_{n\ge1}$ is a dense sequence of points in $S^{N-1}$, then
$$E(u) = \sup\left\{ \sum_{n=1}^{k} E^{\nu_n}(u; A_n) \;:\; k \in \mathbb{N},\ (A_n)_{n=1,\dots,k} \text{ disjoint open subsets of } \Omega \right\},$$
and we observe that if $(A_n)_{n=1,\dots,k}$ is such a family, then
$$\sum_{n=1}^{k} E^{\nu_n}(u; A_n) \;\le\; \sum_{n=1}^{k} \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A_n) \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j),$$
so that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$.
4.3.2 The second inequality
Now, given $u \in SBV(\Omega)$ such that $E(u) < +\infty$, let us build functions $u_\varepsilon$ and $v_\varepsilon$ such that $u_\varepsilon \to u$, $v_\varepsilon \to 1$ as $\varepsilon \downarrow 0$, and such that $\limsup_{\varepsilon\downarrow 0} F_\varepsilon(u_\varepsilon, v_\varepsilon) \le E(u)$. We will also assume that $u$ is bounded and that $S_u$ is essentially closed in $\Omega$, which means that $\mathcal{H}^{N-1}(\Omega \cap (\overline{S_u} \setminus S_u)) = 0$. This, in fact, is not restrictive, since it is possible to approximate every $u \in SBV(\Omega)$ with a sequence of bounded functions $(u_j)$ such that for every $j$, $\mathcal{H}^{N-1}(\Omega \cap (\overline{S_{u_j}} \setminus S_{u_j})) = 0$, in such a way that $\lim_{j\to\infty} E(u_j) = E(u)$. This is a consequence of the essential closedness of the jump set of the minimizers of the Mumford–Shah functional, mentioned in section 2.3.2.

Exercise. Show this approximation property.
If $S_u$ (which is rectifiable) is essentially closed in $\Omega$, then the limit of the quantities
$$\mathcal{L}_\delta(S_u) = \frac{|\{x \in \Omega : \mathrm{dist}(x, S_u) \le \delta\}|}{2\delta}$$
as $\delta \downarrow 0$, called the Minkowski content of $S_u$, is exactly $\mathcal{H}^{N-1}(S_u)$ (see [36]). Notice moreover that since $\delta \mapsto \mathcal{L}_\delta(S_u)$ is continuous on $(0,+\infty)$ and bounded by $|\Omega|/(2\delta)$, it is bounded, so that there exists a constant $c_L$ such that
$$|\{x \in \Omega : \mathrm{dist}(x, S_u) \le \delta\}| \;\le\; 2c_L\delta \tag{26}$$
for every $\delta \ge 0$.
We consider the function $\psi(\cdot)$ defined in section 4.2.2 and, again, $a_\varepsilon = \sqrt{k_\varepsilon\varepsilon}$, and let $S_\varepsilon = \{x \in \Omega : \mathrm{dist}(x, S_u) \le a_\varepsilon\}$. We set
$$v_\varepsilon(x) = \begin{cases} \psi\!\left( \dfrac{\mathrm{dist}(x, S_u) - a_\varepsilon}{\varepsilon} \right) & \text{if } x \notin S_\varepsilon, \\[4pt] 0 & \text{otherwise,} \end{cases}$$
and
$$u_\varepsilon(x) = u(x)\left( \frac{\mathrm{dist}(x, S_u)}{a_\varepsilon} \wedge 1 \right),$$
so that $u_\varepsilon = u$ on $\Omega\setminus S_\varepsilon$. Then $u_\varepsilon \to u$ and $v_\varepsilon \to 1$ as $\varepsilon \downarrow 0$. We will denote in what follows $\mathrm{dist}(x, S_u) = d(x)$. Out of $S_\varepsilon$, $\nabla u_\varepsilon = \nabla u$, whereas if $x \in S_\varepsilon$, $\nabla u_\varepsilon(x) = \nabla u(x)\,d(x)/a_\varepsilon + u(x)\,\nabla d(x)/a_\varepsilon$, so that $|\nabla u_\varepsilon(x)| \le |\nabla u(x)| + \|u\|_\infty/a_\varepsilon$ (we admit that $\nabla d$ exists a.e. and that $|\nabla d| \le 1$). Therefore,
$$\int_\Omega \bigl(v_\varepsilon(x)^2 + k_\varepsilon\bigr)|\nabla u_\varepsilon(x)|^2\, dx \;\le\; (1+k_\varepsilon)\int_{\Omega\setminus S_\varepsilon} |\nabla u(x)|^2\, dx + 2k_\varepsilon\left( \int_{S_\varepsilon} |\nabla u(x)|^2\, dx + \frac{|S_\varepsilon|\,\|u\|_\infty^2}{a_\varepsilon^2} \right).$$
Since $|S_\varepsilon| = 2a_\varepsilon\mathcal{L}_{a_\varepsilon}(S_u) \sim 2a_\varepsilon\mathcal{H}^{N-1}(S_u)$ as $\varepsilon \to 0$ and $k_\varepsilon/a_\varepsilon \to 0$, we deduce that $\limsup_{\varepsilon\downarrow 0} \int_\Omega (v_\varepsilon^2 + k_\varepsilon)|\nabla u_\varepsilon|^2 \le \int_\Omega |\nabla u|^2$.
Exercise. Show that $u_\varepsilon \in H^1(\Omega)$ and that $\nabla u_\varepsilon = (d\,\nabla u + u\,\nabla d)/a_\varepsilon$ in $S_\varepsilon$.

Let us now show that $\limsup_{\varepsilon\downarrow 0} \int_\Omega \varepsilon|\nabla v_\varepsilon|^2 + (1-v_\varepsilon)^2/(4\varepsilon) \le \mathcal{H}^{N-1}(S_u)$.
Out of $S_\varepsilon$, $\nabla v_\varepsilon(x) = \psi'\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right)\frac{\nabla d(x)}{\varepsilon}$, so that
$$\int_\Omega \varepsilon|\nabla v_\varepsilon(x)|^2 + \frac{(1-v_\varepsilon(x))^2}{4\varepsilon}\, dx \;=\; \frac{|S_\varepsilon|}{4\varepsilon} + \frac{1}{4\varepsilon}\int_{\Omega\setminus S_\varepsilon} 4\,\psi'\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right)^2 + \left( 1 - \psi\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right) \right)^2 dx.$$
The ratio $|S_\varepsilon|/(4\varepsilon)$ is of order $a_\varepsilon/\varepsilon$ and goes to zero as $\varepsilon \downarrow 0$. Since $\psi'(t) = (1-\psi(t))/2 = \exp(-t/2)/2$, the second integral is
$$\frac{1}{2\varepsilon}\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx.$$
We notice that
$$\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx = \int_{\Omega\setminus S_\varepsilon} \int_0^1 \chi_{\{t\,:\, t \le \exp(-(d(x)-a_\varepsilon)/\varepsilon)\}}\, dt\, dx = \int_0^1 \bigl|\{x \in \Omega : a_\varepsilon < d(x) \le a_\varepsilon - \varepsilon\log t\}\bigr|\, dt.$$
Let $h_\varepsilon(t) = \bigl|\{x \in \Omega : a_\varepsilon < d(x) \le a_\varepsilon - \varepsilon\log t\}\bigr|/(2\varepsilon) = (a_\varepsilon/\varepsilon - \log t)\,\mathcal{L}_{(a_\varepsilon - \varepsilon\log t)}(S_u) - (a_\varepsilon/\varepsilon)\,\mathcal{L}_{a_\varepsilon}(S_u)$. By (26), $|h_\varepsilon(t)| \le c_L(a_\varepsilon/\varepsilon - \log t) \le c_L(1 - \log t)$ (if $\varepsilon$ is small enough) for every $t \in (0,1)$, and the latter function is integrable on $(0,1)$. Moreover, we know that $\lim_{\varepsilon\downarrow 0} h_\varepsilon(t) = (-\log t)\,\mathcal{H}^{N-1}(S_u)$. By Lebesgue's dominated convergence theorem we deduce that $\lim_{\varepsilon\downarrow 0} \int_0^1 h_\varepsilon(t)\, dt = \mathcal{H}^{N-1}(S_u)$, so that
$$\lim_{\varepsilon\downarrow 0} \frac{1}{2\varepsilon}\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx = \mathcal{H}^{N-1}(S_u).$$
This shows that $\lim_{\varepsilon\downarrow 0} \int_\Omega \varepsilon|\nabla v_\varepsilon|^2 + (1-v_\varepsilon)^2/(4\varepsilon) = \mathcal{H}^{N-1}(S_u)$, and completes the proof of Theorem 8 in arbitrary dimension.
5 The numerical analysis of the Mumford–Shah problem (II)
Now we consider a different problem, which can be stated in this way: "in what sense is Blake and Zisserman's energy $E(U)$ (or Geman and Geman's energy $E(U,L)$) a discrete approximation of the Mumford–Shah functional?" We will see that in fact it is not an approximation of this functional, but of a slightly different functional, in which the length of the discontinuity set is measured in a different (anisotropic) way.
5.1 Rescaling Blake and Zisserman's functional
First of all, if we want to see $E(U)$ as a discrete approximation of something, we have to introduce a discretization step (or scale parameter) $h > 0$ and explain how the parameters $\lambda$ and $\mu$ in (5) must vary with $h$ in order to get some result. How can this be done?
Consider the simpler, one-dimensional Blake and Zisserman energy
$$E^1(U) = \sum_{i=1}^{n-1} W_{\lambda,\mu}(u_{i+1}-u_i) + \sum_{i=1}^{n} (u_i - g_i)^2,$$
where $U = (u_i)_{i=1}^n$ and $G = (g_i)_{i=1}^n$ are 1-dimensional signals.
First of all, if we want the last term of the energy to be an approximation of an integral $\int_0^1 (u(x)-g(x))^2\, dx$, then we need to assume that the signal $G$ is in fact some discretization $G^h = (g^h_i)_{i=1}^n$, at step $h = 1/n$, of a function $g \in L^2(0,1)$. For instance, we can let $g^h_i = (1/h)\int_{(i-1)h}^{ih} g(x)\, dx$. Then, we know that $\sum_{i=1}^n g^h_i\,\chi_{[(i-1)h,\,ih)}$ will converge to $g$ in $L^2$. In this case, if we consider for all $h = 1/n$ a signal $U = U^h = (u^h_i)_{i=1}^n$, we will have that
$$\sum_{i=1}^{n} h\,(u^h_i - g^h_i)^2 \;\longrightarrow\; \int_0^1 (u(x)-g(x))^2\, dx$$
as $h$ goes to 0 ($n$ to $+\infty$), if the functions $\sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ converge to $u$ in $L^2(0,1)$.
But if this is true, then it is easy to show that, at least in the distributional sense,
$$\sum_{i=1}^{n-1} \frac{u^h_{i+1} - u^h_i}{h}\,\chi_{[(i-1)h,\,ih)} \;\rightharpoonup\; Du,$$
where $Du$ denotes the distributional derivative of $u$. In the regions where $u$ is differentiable, it is therefore reasonable to ask that $(u^h_{i+1}-u^h_i)/h \simeq u'(x)$ if $x \simeq ih$, and the sum $\sum_{i=1}^{n-1} W_{\lambda,\mu}(u^h_{i+1}-u^h_i)$ will be an approximation of $\int u'(x)^2\, dx$ in these regions if, when $u^h_{i+1}-u^h_i$ is small,
$$W_{\lambda,\mu}(u^h_{i+1}-u^h_i) \;\simeq\; h\left( \frac{u^h_{i+1}-u^h_i}{h} \right)^2 = \frac{(u^h_{i+1}-u^h_i)^2}{h}.$$
This implies that we must choose $\lambda \simeq 1/h$.
On the other hand, when $u$ has a jump, if $(i+1)h$ is on one side of the jump and $ih$ on the other side, the difference $u^h_{i+1}-u^h_i$ should go to $u_r - u_l = \pm(u^+ - u^-)$ and thus have the order of magnitude of a constant. In this case, we want to count "1" in the energy. Since the value of $W_{\lambda,\mu}(t)$ for large $t$ is $\mu$, it means that we must choose $\mu \simeq 1$. The rescaled energy then becomes (recalling that $W_{1/h,1}(t) = \min(t^2/h, 1)$)
$$E^1_h(U^h) = \sum_{i=1}^{n-1} \min\left( \frac{(u^h_{i+1}-u^h_i)^2}{h},\; 1 \right) + \sum_{i=1}^{n} h\,(u^h_i - g^h_i)^2.$$
Letting $f(t) = \min(t,1) = t \wedge 1$, this can be written as
$$E^1_h(U^h) = h\left( \sum_{i=1}^{n-1} \frac{1}{h} f\!\left( \frac{(u^h_{i+1}-u^h_i)^2}{h} \right) + \sum_{i=1}^{n} (u^h_i - g^h_i)^2 \right). \tag{27}$$
In a similar way, the rescaled 2-dimensional Blake and Zisserman energy will be
$$E^2_h(U^h) = h^2\left( \sum_{i,j} \frac{1}{h} f\!\left( \frac{(u^h_{i+1,j}-u^h_{i,j})^2}{h} \right) + \frac{1}{h} f\!\left( \frac{(u^h_{i,j+1}-u^h_{i,j})^2}{h} \right) + \sum_{i,j} (u^h_{i,j} - g^h_{i,j})^2 \right) \tag{28}$$
where now $(g^h_{i,j})_{1\le i,j\le n}$ is the appropriate discretization of an image $g \in L^2(\Omega)$ with $\Omega = (0,1)\times(0,1)$: for instance, we can let
$$g^h_{i,j} = \frac{1}{h^2}\int_{(i-1)h}^{ih}\int_{(j-1)h}^{jh} g(x,y)\, dx\, dy.$$
5.2 The $\Gamma$-limit of the rescaled 1-dimensional functional

In order to state a $\Gamma$-convergence result for the energy $E^1_h(U^h)$ defined by (27), we must first consider $E^1_h$ as a functional over $L^2(0,1)$. This is done by defining it, for every $u^h \in L^2(0,1)$, as
$$E^1_h(u^h) = \begin{cases} E^1_h(U^h) & \text{if } u^h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)},\ U^h = (u^h_i)_{i=1}^n, \\ +\infty & \text{otherwise,} \end{cases}$$
which means that $E^1_h(u^h)$ has a finite value only when $u^h$ is a piecewise constant function at scale $h$. Then, we have the following result ([20]).

Theorem 9 $E^1_h$ $\Gamma$-converges to $E^1_0$ as $h$ goes to zero ($n$ goes to infinity), where
$$E^1_0(u) = \begin{cases} \displaystyle\int_0^1 \dot u(x)^2\, dx + \mathcal{H}^0(S_u) + \int_0^1 (u(x)-g(x))^2\, dx & \text{if } u \in SBV(0,1), \\[4pt] +\infty & \text{if } u \in L^2(0,1)\setminus SBV(0,1). \end{cases} \tag{29}$$
We will sketch a proof of this result. We need to prove that:

i. if $(u^h)$ goes to $u$ (in $L^2$-norm) as $h \to 0$, then $E^1_0(u) \le \liminf_{h\to0} E^1_h(u^h)$, and

ii. there exists $(u^h)$ that converges to $u$ such that $\limsup_{h\to0} E^1_h(u^h) \le E^1_0(u)$.
5.2.1 Proof of (i)
To prove (i), we consider a sequence $(u^h)$ converging to $u$ as $h$ goes to zero. We can assume that $\liminf_{h\to0} E^1_h(u^h) < +\infty$ (otherwise there is nothing to prove), and even, by extracting a subsequence, that $\sup_h E^1_h(u^h) < +\infty$. In particular, for every $h$ there is a discrete signal $U^h = (u^h_i)_{i=1}^n$ ($n = 1/h$) such that $u^h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ and $E^1_h(u^h) = E^1_h(U^h)$. Then, we build a new function $v^h$ in the following way:

- if $x \in [0,h)$ we let $v^h(x) = u^h_1$;

- then, for $x \in [ih, (i+1)h)$ ($1 \le i \le n-1$):

  - if $|u^h_{i+1}-u^h_i| \le \sqrt{h}$, we let $v^h(x) = u^h_i + (x-ih)(u^h_{i+1}-u^h_i)/h$;

  - otherwise, if $|u^h_{i+1}-u^h_i| > \sqrt{h}$, we let $v^h(x) = u^h_i$ if $x \in [ih, (i+\frac12)h)$ and $v^h(x) = u^h_{i+1}$ if $x \in [(i+\frac12)h, (i+1)h)$.

We see that $v^h(ih) = u^h_i$ for all $i$, and that $v^h$ is affine in the intervals $[ih, (i+1)h)$ such that $f((u^h_{i+1}-u^h_i)^2/h) = (u^h_{i+1}-u^h_i)^2/h$, and piecewise constant with exactly one jump in the intervals such that $f((u^h_{i+1}-u^h_i)^2/h) = 1$.
With this construction, we have that $v_h \in SBV(0,1)$, and that
\[
\int_0^1 \dot v_h(x)^2\,dx + \mathcal H^0(S_{v_h}) \;=\; h\sum_{i=1}^{n-1}\frac{1}{h}\, f\!\left(\frac{(u^h_{i+1}-u^h_i)^2}{h}\right).
\]
We can show that $\int_0^1 (v_h(x)-g(x))^2\,dx$ is less than some constant times $h\sum_{i=1}^n (u^h_i-g^h_i)^2$, so that we may apply Theorem 6 to deduce that some subsequence of $v_h$ converges a.e. to a function $v \in SBV(0,1)$, with
\[
\int_0^1 \dot v(x)^2\,dx + \mathcal H^0(S_v) \;\le\; \liminf_{h\to 0}\; \int_0^1 \dot v_h(x)^2\,dx + \mathcal H^0(S_{v_h}).
\]
Since it is easy to show that $v_h$ must converge (at least) weakly to $u$ in $L^2(0,1)$, and since $h\sum_{i=1}^n (u^h_i-g^h_i)^2 \to \int_0^1 (u(x)-g(x))^2\,dx$, we get that $v = u$, $u \in SBV(0,1)$, and
\[
\int_0^1 \dot u(x)^2\,dx + \mathcal H^0(S_u) + \int_0^1 (u(x)-g(x))^2\,dx \;\le\; \liminf_{h\to 0} E^1_h(u_h),
\]
which is the thesis we wanted to show.
Remark. We have shown slightly more than just point (i). Notice indeed that we can easily deduce that if $u_h$ is bounded uniformly in $L^\infty(0,1)$ and $\sup_h E^1_h(u_h) < +\infty$, then some subsequence of $u_h$ converges (weakly in $L^2$: easy; strongly in $L^2$: first show that $(u_h)$ is bounded in $BV(0,1)$, hence compact in $L^2$) to a function $u$ with $E^1_0(u) \le \liminf_{h\to 0} E^1_h(u_h)$. In particular, we can deduce from Theorem 9 and this remark that if $u_h$ is for every $h$ a minimizer of $E^1_h$, then it has subsequences that converge to a minimizer of $E^1_0$. (See Theorem 12 in section 5.4 below for a more general statement.)
5.2.2 Proof of (ii)

Now we consider proving (ii). This is very simple. Choose $u \in SBV(0,1)$ with $E^1_0(u) < +\infty$. The function $u$ is piecewise continuous and has a finite number of jumps $x_1 < x_2 < \dots < x_k$. We define for every $n \ge 1$ and $h = 1/n$ the discrete signal $U^h = (u^h_i)_{i=1}^n$ by $u^h_i = u(ih - 0)$, which is the left limit $\lim_{\varepsilon\downarrow 0} u(ih - \varepsilon)$ of $u$ at $ih$ (thus $u^h_i = u(ih)$ if $ih \notin S_u$). It is standard that $u_h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ goes to $u$ in $L^2(0,1)$; in fact it is easy to prove that $u_h$ goes to $u$ uniformly on $(x_l + \delta, x_{l+1} - \delta)$ for every $l = 1,\dots,k$ and small $\delta > 0$.
If $[ih,(i+1)h) \cap S_u = \emptyset$, we have (using the Cauchy–Schwarz inequality)
\[
u^h_{i+1} - u^h_i \;=\; \int_{ih}^{(i+1)h} \dot u(x)\,dx \;\le\; \sqrt h \left( \int_{ih}^{(i+1)h} \dot u(x)^2\,dx \right)^{\frac12},
\]
so that $(u^h_{i+1}-u^h_i)^2/h \le \int_{ih}^{(i+1)h} \dot u(x)^2\,dx$. Thus, denoting by $I_h$ the set $\{i \in [1,n] : [ih,(i+1)h) \cap S_u \ne \emptyset\}$, we have
\[
h\sum_{i=1}^{n-1} \frac1h\, f\!\left(\frac{(u^h_{i+1}-u^h_i)^2}{h}\right) \;\le\; \sum_{i\notin I_h} \int_{ih}^{(i+1)h} \dot u(x)^2\,dx + \#I_h \;\le\; \int_0^1 \dot u(x)^2\,dx + \#S_u.
\]
Therefore $\limsup E^1_h(u_h) \le E^1_0(u)$ and point (ii) is proved.
5.3 The Γ-limit of the rescaled 2-dimensional functional

For the 2-dimensional functional, we have the same kind of result. We also define the functional $E^2_h(U^h)$ as a functional over $L^2(\Omega)$, with $\Omega = (0,1)\times(0,1)$, in the following way: for every $1 \le i,j \le n$ we let $C^h_{i,j}$ be the square $[(i-1)h, ih) \times [(j-1)h, jh)$, and
\[
E^2_h(u^h) \;=\;
\begin{cases}
E^2_h(U^h) & \text{if } u^h = \sum_{1\le i,j\le n} u^h_{i,j}\,\chi_{C^h_{i,j}},\quad U^h = (u^h_{i,j})_{1\le i,j\le n},\\
+\infty & \text{otherwise.}
\end{cases}
\]
Then, we define $E^2_0$ as
\[
E^2_0(u) \;=\;
\begin{cases}
\displaystyle\int_\Omega|\nabla u(x)|^2\,dx + \int_{S_u} |\nu^1_u(x)| + |\nu^2_u(x)|\,d\mathcal H^1 + \int_\Omega(u(x)-g(x))^2\,dx & \text{if } u \in SBV(\Omega),\\
+\infty & \text{if } u \in L^2(\Omega) \setminus SBV(\Omega).
\end{cases}
\tag{30}
\]
Here the vector $\nu_u(x) = (\nu^1_u(x),\nu^2_u(x))$ is the normal vector to the jump set $S_u$ at $x$. Notice that in this case $E^2_0$ is not the Mumford and Shah functional: it is slightly different and measures the length of the jump set in an anisotropic way. We point out the fact that it is of the form of $E_0$ in definition (20), so that it still admits minimizers, and the jump set $S_u$ of a minimizer $u$ is still essentially closed (i.e., $\mathcal H^1(\Omega \cap \overline{S_u} \setminus S_u) = 0$).

The anisotropy in $E^2_0$ is unavoidable, since the discrete energy $E^2_h$ is not isotropic either, as illustrated by the following exercise.
Exercise. Assume $g = 0$. Let $C = [a,b]\times[c,d] \subset \Omega = (0,1)\times(0,1)$, with $\ell = b-a = d-c > 0$, be a square in $\Omega$, and let $C'$ be the same square rotated by $45°$. Let $u^h_{i,j}$ be $1$ if $(ih,jh) \in C$ (i.e., if $a \le ih \le b$ and $c \le jh \le d$) and $0$ otherwise ($u^h$ is an approximation of the characteristic function of $C$). Similarly, let $u'^h_{i,j}$ be $1$ if $(ih,jh) \in C'$ and $0$ otherwise. Show that, as $h$ goes to zero, $E^2_h(u^h) \to 4\ell$. On the other hand, show that $E^2_h(u'^h) \to 4\sqrt 2\,\ell$.
With these definitions of $E^2_h$ and $E^2_0$, we have the following theorem [21]:

Theorem 10  As $h = 1/n$ goes to zero, $E^2_h$ Γ-converges to $E^2_0$ in $L^2(\Omega)$.

The proof of Theorem 10 in [21] is based on the same ideas as the proof of Theorem 9 that was just given, and we will not repeat it here. On the other hand, this result is a particular case of Theorem 11, which will be proved in the next section.
5.4 More general finite-difference approximations

Now we will introduce a general result for finite-difference discrete approximations of the Mumford–Shah functional, of which Theorems 9 and 10 are particular cases. What follows is derived from [22].
In 1995, De Giorgi imagined the following non-local functional, defined for any measurable function $u$ on $\mathbb R^N$,
\[
F_\varepsilon(u) \;=\; \iint_{\mathbb R^N\times\mathbb R^N} \frac{1}{\varepsilon}\arctan\!\left(\frac{(u(x)-u(x+\varepsilon\xi))^2}{\varepsilon}\right) e^{-|\xi|^2}\,dx\,d\xi
\]
as a possible approximation, as $\varepsilon$ goes to zero, of the first-order part of the Mumford–Shah functional (the part that depends on $Du$)
\[
F(u) \;=\; \alpha\int_{\mathbb R^N}|\nabla u(x)|^2\,dx + \beta\,\mathcal H^{N-1}(S_u),
\]
defined on functions $u \in GSBV_{loc}(\mathbb R^N)$. Here $\alpha$, $\beta$ are two positive parameters. Notice that the function $\arctan(t)$ looks like the function $f(t) = \min(t,1)$ that we introduced in the previous sections: in a neighborhood of $0$ it behaves as $t$, whereas as $t\to\infty$ it behaves like a constant ($\frac{\pi}{2}$ instead of $1$). We will see in the sequel that we could consider any non-decreasing function $f$ such that $f(0) = 0$, $f'(0) = 1$ (or some positive constant), and $\lim_{t\to+\infty} f(t) = 1$ (or any other positive constant).

This conjecture of De Giorgi was proved by Gobbino [43]. He established that, as $\varepsilon\downarrow 0$, $F_\varepsilon$ Γ-converges to $F$ in the strong $L^p(\mathbb R^N)$ topology for any $1 \le p < +\infty$, for $\alpha = \frac{\sqrt\pi^{\,N}}{2}$ and $\beta = \frac{\pi\sqrt\pi^{\,N-1}}{2}$.
This result is very close to the Theorems stated in the previous section.
We will show here how we can formulate a discrete version of Gobbino's
theorem, and give its complete proof, based on Gobbino's proof.
Let us give some details. Let $\Omega \subset \mathbb R^N$ be an open domain with Lipschitz boundary, and for every $h > 0$ and every $u : \Omega \cap h\mathbb Z^N \to \mathbb R$ let
\[
F_h(u,\Omega) \;=\; h^N \sum_{\substack{x \in h\mathbb Z^N\\ x \in \Omega}} \;\sum_{\substack{\xi \in \mathbb Z^N\\ x+h\xi \in \Omega}} \frac{1}{h}\, f_\xi\!\left(\frac{(u(x)-u(x+h\xi))^2}{h}\right) \rho(\xi) \;\in\; [0,+\infty],
\tag{31}
\]
where:

- $\rho : \mathbb Z^N \to [0,+\infty)$ is even, satisfies $\rho(0) = 0$, $\sum_{\xi\in\mathbb Z^N} |\xi|^2\rho(\xi) < +\infty$, and $\rho(e_i) > 0$ for any $i = 1,\dots,N$, where $(e_i)_{1\le i\le N}$ is the canonical basis of $\mathbb R^N$ (in practical applications the support of $\rho$ will have to be finite and small);
- for any $\xi$ with $\rho(\xi) > 0$, $f_\xi : [0,+\infty) \to [0,+\infty)$ is a non-decreasing bounded function with $f_\xi \equiv f_{-\xi}$, $f_\xi(0) = 0$, $f'_\xi(0) = \alpha_\xi > 0$, and $\lim_{t\to+\infty} f_\xi(t) = \beta_\xi$, and we assume that $f_\xi$ is below (or equal to) the function $t \mapsto \alpha_\xi t \wedge \beta_\xi$. We also assume that both $\sup_{\xi\in\mathbb Z^N} \alpha_\xi$ and $\sup_{\xi\in\mathbb Z^N} \beta_\xi$ are finite;
- we will adopt in the sequel the convention that any term in the sum above is zero whenever either $x$ or $x+h\xi$ is not in $\Omega$, even if we do not explicitly write these conditions under the summation signs (this convention will be adopted everywhere in what follows unless otherwise stated); as well, we will usually write $F_h(u)$ instead of $F_h(u,\Omega)$ when not ambiguous.
Fix $p \in [1,+\infty)$ (but you can assume $p = 2$, as in the previous sections), and let $\ell^p(\Omega \cap h\mathbb Z^N)$ be the vector space of functions $u : \Omega \cap h\mathbb Z^N \to \mathbb R$ such that the norm
\[
\|u\|_p \;=\; \left\{ h^N \sum_{x\in\Omega\cap h\mathbb Z^N} |u(x)|^p \right\}^{\frac1p}
\]
is finite. In the sequel we will always identify a function $u$ in $\ell^p(\Omega \cap h\mathbb Z^N)$ with the piecewise constant function in $L^p(\mathbb R^N)$ equal to $u(x)$ on $x + \big[-\frac h2,\frac h2\big)^N$ for any $x \in \Omega \cap h\mathbb Z^N$ (and to $0$ elsewhere), so that $\|u\|_p = \|u\|_{L^p(\mathbb R^N)}$ and a sentence such as "$u_h \in \ell^p(\Omega \cap h\mathbb Z^N)$ converges to $u \in L^p(\Omega)$ as $h \downarrow 0$" has a natural sense. We also set $F_h(u) = +\infty$ for any $u \in L^p(\Omega)$ that is not the restriction to $\Omega$ of the piecewise constant extension of a function in $\ell^p(\Omega \cap h\mathbb Z^N)$. Let now, for any $u \in L^p(\Omega) \cap GSBV_{loc}(\Omega)$,
\[
F(u) \;=\; \int_\Omega \sum_{\xi\in\mathbb Z^N} \rho(\xi)\alpha_\xi\, |\langle\nabla u(x),\xi\rangle|^2\,dx + \int_{S_u} \sum_{\xi\in\mathbb Z^N} \rho(\xi)\beta_\xi\, |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x) \;\in\; [0,+\infty],
\tag{32}
\]
and set $F(u) = +\infty$ if $u \in L^p(\Omega) \setminus GSBV_{loc}(\Omega)$. We will also sometimes denote
\[
F(u,B) \;=\; \int_B \sum_{\xi\in\mathbb Z^N} \rho(\xi)\alpha_\xi\, |\langle\nabla u(x),\xi\rangle|^2\,dx + \int_{B\cap S_u} \sum_{\xi\in\mathbb Z^N} \rho(\xi)\beta_\xi\, |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x)
\]
when $B \subset \Omega$ is a Borel set. We have the following theorem.

Theorem 11  $F_h$ Γ-converges to $F$ as $h \downarrow 0$ in $L^p(\Omega)$ (endowed with its strong topology), for any $p \in [1,+\infty)$.
We also have the following compactness result:

Theorem 12  Let $p \in [1,+\infty)$, $g \in L^p(\Omega) \cap L^\infty(\Omega)$, and for any $h > 0$ let $u_h$ be a minimizer over $\ell^p(\Omega \cap h\mathbb Z^N)$ of
\[
F_h(u) + \int_\Omega |u(x)-g(x)|^p\,dx
\tag{33}
\]
(or, equivalently, of
\[
F_h(u) + \big(\|u-g_h\|_p\big)^p
\tag{34}
\]
where $g_h \in \ell^p(\Omega \cap h\mathbb Z^N)$ is a suitable discretization of $g$ at scale $h$, with $g_h \to g$ in $L^p(\Omega)$ as $h \downarrow 0$ and $\|g_h\|_\infty \le \|g\|_\infty$ for all $h$). Then $(u_h)$ is relatively compact in $L^p(\Omega)$, and if some subsequence $u_{h_j}$ goes to $u$ as $j \to \infty$, then $u \in SBV_{loc}(\Omega) \cap L^p(\Omega)$ is a minimizer of
\[
F(u) + \int_\Omega |u(x)-g(x)|^p\,dx.
\]
These theorems, for $p = 2$, provide a generalization of the previous Theorems 9 and 10. For instance, Theorem 10 is the case where $N = 2$, $p = 2$, $\Omega = (0,1)\times(0,1)$, $\rho \equiv 0$ on $\mathbb Z^2$ except $\rho(0,1) = \rho(1,0) = \rho(0,-1) = \rho(-1,0) = 1/2$, and $f_\xi(t) = t \wedge 1$, so that
\[
F(u) \;=\; \int_\Omega|\nabla u(x)|^2\,dx + \int_{S_u} \big(|\nu^1_u(x)|+|\nu^2_u(x)|\big)\,d\mathcal H^1(x).
\]
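A direct transcription of (31) for $N = 2$ can help fix the conventions. The sketch below uses an illustrative grid and image; the nearest-neighbour $\rho$ and $f_\xi(t) = t\wedge 1$ are exactly the choices recovering Theorem 10 above. It evaluates $F_h$ on a half-plane step, for which the jump set is a vertical segment of length 1 with $|\nu^1_u| + |\nu^2_u| = 1$, so $F_h$ should be close to 1:

```python
def F_h(u, h, rho, f):
    """Discrete energy (31), N = 2: h^2 * sum over grid points and offsets xi
    of (1/h) f_xi((u(x) - u(x + h xi))^2 / h) rho(xi).  Terms with
    x + h xi outside the grid are dropped (the convention in the text);
    `rho` maps offsets to weights, `f` maps offsets to the functions f_xi."""
    n = len(u)
    E = 0.0
    for i in range(n):
        for j in range(n):
            for (a, b), w in rho.items():
                if w == 0.0:
                    continue
                if 0 <= i + a < n and 0 <= j + b < n:
                    d2 = (u[i][j] - u[i + a][j + b]) ** 2
                    E += h * f((a, b))(d2 / h) * w  # h^2 * (1/h) * ...
    return E

# The choice recovering Theorem 10: nearest-neighbour rho = 1/2, f = min(t, 1)
rho4 = {(1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5}
f4 = lambda xi: (lambda t: min(t, 1.0))

n = 200
h = 1.0 / n
u = [[1.0 if i >= n // 2 else 0.0 for j in range(n)] for i in range(n)]
print(F_h(u, h, rho4, f4))  # close to 1 (the jump segment has length 1)
```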
Remark. The condition $\rho(e_i) > 0$ for $i = 1,\dots,N$ is necessary only for the coercivity, i.e., to establish Lemma 7 and Theorem 12. This is important in practical applications for the stability of the numerical schemes. Even if we have not discussed it in the previous sections (except in the Remark at the end of section 5.2.1), similar coercivity and compactness results also hold for Theorems 8, 9 and 10.

If we wanted only to prove the Γ-convergence of $F_h$ to $F$, it would be sufficient to assume that $\rho(\xi_i) > 0$, $i = 1,\dots,N$, for some basis $(\xi_i)_{1\le i\le N} \subset \mathbb Z^N$ of $\mathbb R^N$.

We will first describe the implementation of these energies. The proofs of Theorem 11 and Theorem 12 will then be given in the last section of these notes. The next sections are extracted from [22].
6 A numerical method for minimizing the Mumford–Shah functional
In this section we describe a numerical method for the implementation of the energies we have introduced in these lectures. We will describe the minimization of problem (34) for $p = 2$, since the energies $E^1_h$ and $E^2_h$ are particular cases. In particular, we will show how the choice of $\rho$, $\alpha_\xi$ and $\beta_\xi$ influences the results.
6.1 An iterative procedure for minimizing (34)

Let us quickly describe a standard procedure for minimizing energies such as (34). Of course we do not pretend to compute an exact minimizer of the energy, since the high non-convexity of the problem does not allow this. However, the iterative algorithm we describe gives satisfactory results. A variant has been successfully implemented in the case of the approximation of [23] (see [17]). Many other similar implementations have been made for solving image reconstruction problems (see for instance [60, 10], and the pioneering work [40] by D. Geman and G. Reynolds).

We assume $\Omega$ is bounded, so that the discrete problem is finite-dimensional for every fixed $h > 0$ (in the applications $\Omega$ will be a rectangle). The non-convexity in the energy $F_h$ comes from the non-convexity of the functions $f_\xi$, $\xi \in \mathbb Z^N$. In order to simplify the computations we will assume that the $f_\xi$ are all identical, up to a rescaling:
\[
f_\xi(t) \;=\; \beta_\xi\, f\!\left(\frac{\alpha_\xi}{\beta_\xi}\,t\right)
\]
for all $\xi \in \mathbb Z^N$ (with $\rho(\xi) > 0$) and $t \ge 0$. The function $f$ is non-decreasing, and satisfies $f(0) = 0$, $f(+\infty) = 1$, and $f'(0) = 1$. It could be, of course, the function $f(t) = t \wedge 1$ of section 5, except that a differentiable function provides better numerical results. An interpretation of Blake and Zisserman's GNC algorithm would correspond to approximating $t \wedge 1$ gradually with smooth functions.
We will thus assume, as well, that $f$ is concave and differentiable. Thus, $-f$ is convex (we extend it with the value $+\infty$ on $\{t < 0\}$) and lower semicontinuous. Let
\[
\psi(-v) \;=\; \sup_{t\in\mathbb R}\; tv - (-f)(t) \;=\; (-f)^*(v)
\]
be the Legendre–Fenchel transform of $-f$; by a classical result $(-f)^{**} = -f$, so that
\[
-f(t) \;=\; \sup_{v\in\mathbb R}\; tv - \psi(-v) \;=\; -\inf_{v\in\mathbb R}\; tv + \psi(v).
\]
It is well known that the first sup in this equation is attained at $v$ such that $t \in \partial(-f)^*(v)$ (the subdifferential of $(-f)^*$ at $v$), and that this is equivalent to $v \in \partial(-f)(t)$; since $\partial(-f)(t) = \{-f'(t)\}$ for $t > 0$ and $(-\infty,-1]$ for $t = 0$, we deduce that the sup is reached at some $v \in [-1,0]$ (since for $t = 0$ we check that $(-f)^*(-1) = 0$, and thus the sup is reached at $v = -1$). Hence
\[
f(t) \;=\; \min_{v\in[0,1]}\; tv + \psi(v)
\]
and the min is reached for $v = f'(t)$. (If $f(t) = t \wedge 1$, all of this is still true except that for $t = 1$ the min is reached for any $v \in [0,1]$.) We may therefore rewrite $F_h$ in the following way:
\[
F_h(u) \;=\; \min_{v(\cdot,\cdot)} F_h(u,v)
\]
for $v : (\Omega\cap h\mathbb Z^N)\times(\Omega\cap h\mathbb Z^N) \to [0,1]$ and
\[
F_h(u,v) \;=\; h^N \sum_{x\in h\mathbb Z^N} \sum_{\xi\in\mathbb Z^N} \left( \alpha_\xi\, v(x,x+h\xi) \left|\frac{u(x)-u(x+h\xi)}{h}\right|^2 + \beta_\xi\,\frac{\psi(v(x,x+h\xi))}{h} \right) \rho(\xi).
\tag{35}
\]
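The decomposition behind (35) rests on the identity $f(t) = \min_{v\in[0,1]}(tv + \psi(v))$, which can be checked numerically: for a concave differentiable $f$, the pairs $(v,\psi(v))$ are obtained parametrically as $v = f'(t_0)$, $\psi(v) = f(t_0) - t_0 f'(t_0)$ (tangent-line intercepts). A sketch with the arctan-based $f$ used later in these notes; the $t_0$-grid is an arbitrary illustrative choice:

```python
import math

# Parametric recovery of psi: at v = f'(t0), psi(v) = f(t0) - t0 f'(t0)
# (the intercept of the tangent line to the concave f at t0).

f = lambda t: (2 / math.pi) * math.atan(math.pi * t / 2)
fp = lambda t: 1 / (1 + math.pi ** 2 * t ** 2 / 4)

pairs = [(fp(t0), f(t0) - t0 * fp(t0)) for t0 in (k * 0.01 for k in range(2000))]

def f_via_duality(t):
    """min over the sampled v of t*v + psi(v); approximates f(t)."""
    return min(t * v + psi for v, psi in pairs)

print(f_via_duality(1.0), f(1.0))  # the two values nearly coincide
```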
The algorithm consists in minimizing alternately $F_h(u,v) + \|u-g_h\|_2^2$ with respect to $u$ and $v$. The minimization with respect to $v$ is straightforward, since it just consists in computing, for each $x,y \in \Omega\cap h\mathbb Z^N$,
\[
v(x,y) \;=\; f'\!\left(\frac{\alpha_\xi\,(u(x)-u(y))^2}{\beta_\xi\, h}\right),
\]
with $\xi = (y-x)/h$. The minimization with respect to $u$ is also a simple (linear) problem, since the energy is convex and quadratic with respect to $u$. Of course, there is no way of knowing whether the algorithm converges to a solution or not; what is certain is that the energy decreases and goes to some critical level, while the function $u$ converges to either a critical point or, if it exists, a continuum of critical points. Notice that if $f$ is strictly increasing, $v$ is everywhere strictly positive.
In the applications shown in these notes we considered $f(t) = \frac{2}{\pi}\arctan\!\big(\frac{\pi t}{2}\big)$, so that
\[
f'(t) \;=\; \frac{1}{1 + \pi^2 t^2/4}.
\]
Notice that one never has to compute explicitly the position of the edges during the minimization. Once a minimizer of the energy has been found, it is possible to extract the edges from the segmented image by standard algorithms (using Canny's or more sophisticated edge detectors, with a very narrow kernel, since the images on which the edges have to be found are piecewise smooth). The value of the auxiliary function $v$ is also a good indicator of the position of the edges (it is close to zero on the edges and close to one everywhere else), and should be taken into account. An elementary method may be, for instance, to consider the zero-crossings of the (discretized) operator $d^2u(\nabla u,\nabla u)$ in the regions where $v$ is small.
6.2 Anisotropy of the length term

In some of the above-mentioned image processing papers it had been noticed that the segmentations could be improved by trying to modify the energy slightly, making it "less anisotropic". Here we illustrate how the result of Theorem 11 allows us to control this anisotropy and to find explicitly the correct parameters for the "best" energies.

In this section, as in section 5, $n$ will be an integer ($n > 1$); we will set $h = 1/n$, and the functions $u$ and $g_h$ (defined on $[0,1)\times[0,1) \cap h\mathbb Z^2$) will be denoted as $n\times n$ matrices $(u_{i,j})_{0\le i,j<n}$ and $(g^h_{i,j})_{0\le i,j<n}$. We will compare the following two cases (pay attention to the fact that the notations for the energies here are different from the notations in section 5; in fact, the following $E^1_n$ is similar to $E^2_h$ in sec. 5, and the other energies are new):
\[
E^1_n(u) \;=\; h^2\sum_{i,j}\; \frac{\mu_1}{h}\, f\!\left(\frac{\lambda_1|u_{i+1,j}-u_{i,j}|^2}{\mu_1 h}\right) + \frac{\mu_1}{h}\, f\!\left(\frac{\lambda_1|u_{i,j+1}-u_{i,j}|^2}{\mu_1 h}\right) + |u_{i,j}-g^h_{i,j}|^2
\]
(which is Blake and Zisserman's "weak membrane" energy, except that $f$ is smoother than $t \mapsto t \wedge 1$) and
\[
\begin{aligned}
E^2_n(u) \;=\; h^2\sum_{i,j}\; &\frac{\mu_2}{h}\, f\!\left(\frac{\lambda_2|u_{i+1,j}-u_{i,j}|^2}{\mu_2 h}\right) + \frac{\mu_2}{h}\, f\!\left(\frac{\lambda_2|u_{i,j+1}-u_{i,j}|^2}{\mu_2 h}\right)\\
&+ \frac{\mu_2'}{h}\, f\!\left(\frac{\lambda_2'|u_{i+1,j+1}-u_{i,j}|^2}{\mu_2' h}\right) + \frac{\mu_2'}{h}\, f\!\left(\frac{\lambda_2'|u_{i-1,j+1}-u_{i,j}|^2}{\mu_2' h}\right)\\
&+ |u_{i,j}-g^h_{i,j}|^2.
\end{aligned}
\]
By Theorem 12, the limit points of the minimizers of $E^1_n$ and $E^2_n$, as $n\to\infty$, will be minimizers respectively of
\[
E^1_\infty(u) \;=\; \bar\lambda_1\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_1\ell_1(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx
\]
and
\[
E^2_\infty(u) \;=\; \bar\lambda_2\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_2\ell_2(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx
\]
(for $u \in L^2(\Omega)\cap GSBV(\Omega)$, and $+\infty$ otherwise), with $\Omega = (0,1)\times(0,1)$, $\bar\lambda_1 = \lambda_1$, $\bar\lambda_2 = \lambda_2+2\lambda_2'$, and
\[
\bar\mu_1\ell_1(S_u) \;=\; \int_{S_u} \mu_1\big(|\nu_1(x)|+|\nu_2(x)|\big)\,d\mathcal H^1(x),
\]
\[
\bar\mu_2\ell_2(S_u) \;=\; \int_{S_u} \mu_2\big(|\nu_1(x)|+|\nu_2(x)|\big) + \mu_2'\big(|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|\big)\,d\mathcal H^1(x),
\]
where $(\nu_1(x),\nu_2(x))$ is the normal vector to $S_u$ at $x$.

Figure 7: The solid line represents the length of a unit vector, as a function of the angle. Left: for $\ell_1$; right: for $\ell_2$. The dashed line is the unit circle.

Simple computations show that $\mu_2' = \mu_2/\sqrt 2$ is an optimal choice, since it minimizes the ratio of the $\ell_2$-length of the longest vector in $S^1$ over that of the shortest. For any rectifiable 1-set $E \subset \mathbb R^2$ with normal vector $(\nu_1,\nu_2)$ at $x$ we define the lengths
\[
\ell_1(E) \;=\; \frac{\pi}{4}\int_E \big(|\nu_1(x)|+|\nu_2(x)|\big)\,d\mathcal H^1(x)
\]
and
\[
\ell_2(E) \;=\; \frac{\pi}{8}\int_E \left(|\nu_1(x)|+|\nu_2(x)| + \frac{|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|}{\sqrt 2}\right) d\mathcal H^1(x).
\]
(Notice that $\ell_2 = (\ell_1 + \ell_1 \circ R_{\pi/4})/2$, where $R_{\pi/4}$ is the rotation of angle $\pi/4$ in $\mathbb R^2$.) The choice of the parameters $\pi/4$ and $\pi/8$ is made in order to ensure that a "random" set of lines has on average the same length $\ell_1$ and $\ell_2$ (and Euclidean length); in other words, the unit circle has length $2\pi$ in both cases. This is of course not the only possible choice. For instance, one could prefer to parameterize these lengths in such a way that the error (with respect to the Euclidean length) $\ell_i(e_{\max}) - 1$ on the measure of the longest (for $\ell_i$) vector $e_{\max} \in S^1$ is equal to the error $1 - \ell_i(e_{\min})$ on the measure of the shortest vector $e_{\min}$. In this case, one should choose as parameters $\frac{2}{1+\sqrt 2}$ instead of $\pi/4$ for $\ell_1$, and $\frac{2}{1+\sqrt 2+\sqrt{4+2\sqrt 2}}$ instead of $\pi/8$ for $\ell_2$.
With the choice we made, we get that $\bar\mu_1 = 4\mu_1/\pi$ and $\bar\mu_2 = 8\mu_2/\pi$. In both cases the limit energy is anisotropic; what is interesting is that the second length $\ell_2$ is far "less anisotropic" than the first length $\ell_1$. As a matter of fact, the longest vector in $S^1$ for $\ell_1$ is about $41.4\%$ longer than the shortest (the ratio is $\sqrt 2$), while it is only $8.2\%$ longer for the length $\ell_2$ (the ratio is $2\sqrt 2\cos\frac\pi8/(1+\sqrt 2) = \sqrt{4+2\sqrt 2}/(1+\sqrt 2)$). The difference in anisotropy of the two lengths is striking in Figure 7.
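The two anisotropy ratios quoted above can be recovered numerically by scanning the integrands of $\ell_1$ and $\ell_2$ over the unit circle (a sketch; the angular grid is an arbitrary choice):

```python
import math

# Integrands of l1 and l2 as functions of the normal angle,
# (nu1, nu2) = (cos t, sin t).

def l1(t):
    n1, n2 = math.cos(t), math.sin(t)
    return math.pi / 4 * (abs(n1) + abs(n2))

def l2(t):
    n1, n2 = math.cos(t), math.sin(t)
    return math.pi / 8 * (abs(n1) + abs(n2)
                          + (abs(n1 - n2) + abs(n1 + n2)) / math.sqrt(2))

angles = [k * math.pi / 1800 for k in range(3600)]
v1 = [l1(t) for t in angles]
v2 = [l2(t) for t in angles]
r1 = max(v1) / min(v1)
r2 = max(v2) / min(v2)
print(round(r1, 4), round(r2, 4))  # -> 1.4142 1.0824
```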
Figure 8: Original images for the next examples.
6.3 Numerical experiments

We show here a few experiments, so that the reader can see for himself the difference in behaviour of the lengths $\ell_1$ and $\ell_2$. Notice that in all of our comparisons we of course always choose $\bar\lambda_1 = \bar\lambda_2$ and $\bar\mu_1 = \bar\mu_2$. In Figure 9 and Figure 10 (see the original pictures in Figure 8), one notices that the edges are usually nicer when length $\ell_2$ is used, whereas images obtained by minimization of energy $E^1_n$ are more "blocky". Notice in particular the diagonal line at the bottom of the image. However, the vertical edges on the column (Figure 9, second picture) are nicer with energy $E^1_n$. The reason is clear: these edges are vertical, and vertical and horizontal lines have a much lower cost than lines with other orientations with this energy.

Figure 9: The segmented column. Left, by using energy $E^2_n$, then with energy $E^1_n$. The details appear in the same order.

In Figures 10 and 11 the results are similar: the edges look much nicer when energy $E^2_n$ is minimized. The other two figures (Figs. 12 and 13) show segmentations in the presence of noise. In the two segmentations of the disk, the total length of the edges found was $6.56\,R$ with energy $E^2_n$ and $6.40\,R$ with $E^1_n$. These lengths are slightly overestimated because a few spurious edges were found, and also because of some oscillation of the boundary, which is due to the noise. Again, in Figure 13 the result is more blocky with energy $E^1_n$.
Figure 10: Segmented lady with energy $E^2_n$ and two details.

Figure 11: Segmented lady with energy $E^1_n$ and two details.
Figure 12: The noisy disk (grey-level values 64 (disk) and 192 (background); std. dev. of the noise 40). Middle: the segmented disk with energy $E^2_n$. Right: with $E^1_n$.
Figure 13: A noisy image (std. dev. = 25 for values between 0 and 255) and the segmented outputs, obtained by minimizing $E^2_n$ (middle) and $E^1_n$ (right).
In the last two experiments we used another energy, namely
\[
\begin{aligned}
E^3_n(u) \;=\; h^2\sum_{i,j}\; &\frac{\mu_3}{h}\, f\!\left(\frac{\lambda_3|u_{i+1,j}-u_{i,j}|^2}{\mu_3 h}\right) + \frac{\mu_3}{h}\, f\!\left(\frac{\lambda_3|u_{i,j+1}-u_{i,j}|^2}{\mu_3 h}\right)\\
&+ \frac{\mu_3'}{h}\, f\!\left(\frac{\lambda_3'|u_{i+1,j+1}-u_{i,j}|^2}{\mu_3' h}\right) + \frac{\mu_3'}{h}\, f\!\left(\frac{\lambda_3'|u_{i-1,j+1}-u_{i,j}|^2}{\mu_3' h}\right)\\
&+ \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+2,j+1}-u_{i,j}|^2}{\mu_3'' h}\right) + \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+1,j+2}-u_{i,j}|^2}{\mu_3'' h}\right)\\
&+ \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+2,j-1}-u_{i,j}|^2}{\mu_3'' h}\right) + \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i-1,j+2}-u_{i,j}|^2}{\mu_3'' h}\right)\\
&+ |u_{i,j}-g^h_{i,j}|^2.
\end{aligned}
\]
We choose $\mu_3' = \mu_3/\sqrt 2$ and $\mu_3'' = \mu_3/\sqrt 5$. Now, as $n\to\infty$, the limit points of the minimizers of $E^3_n$ minimize
\[
E^3_\infty(u) \;=\; \bar\lambda_3\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_3\ell_3(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx,
\]
with $\bar\lambda_3 = \lambda_3 + 2\lambda_3' + 10\lambda_3''$,
\[
\ell_3(E) \;=\; \frac{\pi}{16}\int_E \bigg( |\nu_1(x)|+|\nu_2(x)| + \frac{|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|}{\sqrt 2}
+ \frac{|2\nu_1(x)-\nu_2(x)|+|\nu_1(x)+2\nu_2(x)|+|2\nu_1(x)+\nu_2(x)|+|\nu_1(x)-2\nu_2(x)|}{\sqrt 5} \bigg)\, d\mathcal H^1(x)
\]
(this time $\ell_3 = (\ell_1 + \ell_1\circ R_{\pi/4} + \ell_1\circ R_\theta + \ell_1\circ R_{-\theta})/4$ with $\theta = \arctan 2$), and $\bar\mu_3 = 16\mu_3/\pi$. Figure 14 illustrates how "isotropic" the measure $\ell_3$ is, and Figures 15 and 16 show examples. (Now, the length of the longest vector in $S^1$ is about $5.0\%$ greater than the length of the shortest.) The results look slightly better than the ones obtained with energy $E^2_n$; however, the computational cost is quite a bit higher.
Figure 14: Same as Figure 7, this time for $\ell_3$.

Figure 15: Segmentation with energy $E^3_n$ (the column).

Figure 16: Segmentation with energy $E^3_n$ (the lady).
A Proof of Theorems 11 and 12

This last section is entirely devoted to the proofs of the theorems of section 5.4. Most of it relies on Gobbino's work [43], except for some adaptations and a slight difference in the proof (which avoids the use of Gobbino's technical Lemmas 3.1 and 3.2 in [43], and makes it simpler).
We let for any $u \in L^p(\Omega)$
\[
F'(u) \;=\; \Big(\Gamma\text{-}\liminf_{h\downarrow 0} F_h\Big)(u)
\]
and
\[
F''(u) \;=\; \Big(\Gamma\text{-}\limsup_{h\downarrow 0} F_h\Big)(u).
\]
In the next section A.1 we will prove a preliminary lemma that will be helpful in the sequel. Then, the aim of the following two sections A.2 and A.3 will be to prove Theorem 11, i.e., to prove that $F'(u) \ge F(u)$ and $F''(u) \le F(u)$ for all $u \in L^p(\Omega)$. Eventually, in section A.4, we will prove Theorem 12.
A.1 A compactness lemma

The lemma we show in this section will be needed to establish Theorem 12, but it will also give some a priori information on the regularity of functions $u \in L^p(\Omega)$ such that $F'(u) < +\infty$.

Lemma 7  Let $h_j \downarrow 0$ and $u_{h_j} \in \ell^p(\Omega \cap h_j\mathbb Z^N)$ be such that
\[
\sup_j F_{h_j}(u_{h_j}) < +\infty \quad\text{and}\quad \sup_j \|u_{h_j}\|_\infty < +\infty.
\]
Then there exist a subsequence (not relabeled) $u_{h_j}$ and $u \in SBV_{loc}(\Omega)$ such that
\[
u_{h_j}(x) \to u(x) \quad\text{a.e. in } \Omega
\]
as $j$ goes to infinity, and
\[
\int_\Omega|\nabla u(x)|^2\,dx + \mathcal H^{N-1}(S_u) < +\infty.
\]
Proof. In order to simplify the notations we drop the subscript $j$. Let $f = \min_{i=1,\dots,N} f_{e_i}$, $c = \min_{i=1,\dots,N} \rho(e_i) > 0$, and choose $\alpha, \beta > 0$ such that $\alpha t \wedge \beta \le f(t)$ for all $t \ge 0$. We have:
\[
F_h(u_h) \;\ge\; 2c\, h^N \sum_{x\in h\mathbb Z^N} \sum_{i=1}^N\; \alpha\left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \wedge\, \frac{\beta}{h}
\]
(remember we only sum on $x$ such that $x, x+he_i \in \Omega$). We first show that the sequence $(u_h)$ is bounded in $BV_{loc}(\Omega)$ (so that it is compact in $L^1_{loc}(\Omega)$). Choose $R > 0$, $i \in \{1,\dots,N\}$, and write (with $\Omega_{\frac12\sqrt N h} = \{x \in \Omega : \mathrm{dist}(x,\partial\Omega) > \frac12\sqrt N h\}$)
\[
|D_i u_h|\big(\Omega_{\frac12\sqrt N h} \cap B_R(0)\big) \;\le\; \sum_{x\in h\mathbb Z^N\cap B_R(0)} h^{N-1}|u_h(x)-u_h(x+he_i)|
\;\le\; \sum_{x\in X^+} 2\|u_h\|_\infty h^{N-1} + h^N\sum_{x\in X^-} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|
\]
where $X^+ = \big\{x \in h\mathbb Z^N\cap B_R(0) : |u_h(x)-u_h(x+he_i)| > \sqrt{\beta h/\alpha}\big\}$ and $X^- = h\mathbb Z^N\cap B_R(0) \setminus X^+$. Of course, we only consider points $x \in h\mathbb Z^N$ such that $x$ and $x+he_i$ belong to $\Omega$. Then, using the Cauchy–Schwarz inequality,
\[
|D_i u_h|\big(\Omega_{\frac12\sqrt N h} \cap B_R(0)\big) \;\le\; 2\|u_h\|_\infty h^{N-1}\,\# X^+ + C R^{\frac N2} \left\{ h^N\sum_{x\in X^-} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \right\}^{\frac12}
\;\le\; \frac{\|u_h\|_\infty F_h(u_h)}{\beta c} + C R^{\frac N2}\sqrt{\frac{F_h(u_h)}{2\alpha c}}
\]
with $C$ some constant depending only on $N$, so that eventually, for any $\delta > 0$,
\[
\sup_h |Du_h|\big(\Omega_\delta \cap B_R(0)\big) < +\infty.
\]
This shows that, upon extracting a subsequence, we may assume that $u_h$ converges almost everywhere in $\Omega$ (and in $L^p_{loc}(\Omega)$, as well) to some function $u$ that belongs to $BV(\Omega \cap B_R(0))$ for any $R > 0$.

Now consider the extension of $u_h$ (to $\mathbb R^N$, $u_h(x)$ being considered to be $0$ outside of $\Omega$)
\[
v_h(y) \;=\; \sum_{x\in h\mathbb Z^N} u_h(x)\,\Lambda_N\Big(\frac{y-x}{h}\Big),
\]
where $\Lambda(t) = (1-|t|)^+$ for any $t \in \mathbb R$ and $\Lambda_N(y) = \prod_{i=1}^N \Lambda(y_i)$ for any $y \in \mathbb R^N$. We estimate $\int|\nabla v_h|^2$ on an "elementary cell", for instance $(0,h)^N$:
\[
\begin{aligned}
\int_{(0,h)^N} |\partial_1 v_h(y)|^2\,dy
&= \int_{(0,h)^N} \bigg| \sum_{x\in\{0,h\}^N} u_h(x)\,\frac1h\,\Lambda'\Big(\frac{y_1-x_1}{h}\Big) \prod_{i=2}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2 dy\\
&= h \int_{(0,h)^{N-1}} dy_2\dots dy_N\, \bigg| \sum_{x\in\{0,h\}^{N-1}} \frac{u_h(h,x)-u_h(0,x)}{h} \prod_{i=2}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2\\
&= h \int_{(0,h)^{N-2}} dy_3\dots dy_N \int_0^h \bigg| \frac{y_2}{h}\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,h,x)-u_h(0,h,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\\
&\qquad\qquad + \Big(1-\frac{y_2}{h}\Big)\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,0,x)-u_h(0,0,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2 dy_2\\
&\le \frac{h^2}{2} \Bigg\{ \int_{(0,h)^{N-2}} dy_3\dots dy_N\, \bigg|\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,h,x)-u_h(0,h,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\bigg|^2\\
&\qquad\quad + \int_{(0,h)^{N-2}} dy_3\dots dy_N\, \bigg|\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,0,x)-u_h(0,0,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\bigg|^2 \Bigg\};
\end{aligned}
\]
by induction we deduce that
\[
\int_{(0,h)^N} |\partial_1 v_h(y)|^2\,dy \;\le\; \frac{h^N}{2^{N-1}} \sum_{x\in\{0,h\}^{N-1}} \left|\frac{u_h(h,x)-u_h(0,x)}{h}\right|^2.
\]
Notice that we could therefore conclude that
\[
\int_{\Omega_{\sqrt N h}} |\nabla v_h(y)|^2\,dy \;\le\; h^N \sum_{x\in h\mathbb Z^N} \sum_{i=1}^N \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2
\]
(with $\Omega_{\sqrt N h} = \{x\in\Omega : \mathrm{dist}(x,\partial\Omega) > \sqrt N h\}$, since we control the gradient of $v_h$ only on the cubes $x+(0,h)^N$, $x \in h\mathbb Z^N$, whose $2^N$ vertices all belong to $\Omega$), but since we cannot control the right-hand side of this expression if it is summed over all $x$, we must introduce a slight modification of $v_h$: we thus define $\tilde v_h = v_h$, except each time
\[
|u_h(x)-u_h(x+he_i)| \;>\; \sqrt{\frac{\beta h}{\alpha}},
\tag{A.1}
\]
in which case we set $\tilde v_h \equiv 0$ on $(x,x+he_i)\times\prod_{i'\ne i}(x-he_{i'},x+he_{i'}) = U^h_{x,e_i}$. The new function $\tilde v_h$ is in $SBV_{loc}(\Omega)$, and $S_{\tilde v_h} \subset \bigcup_{(x,e_i)\in X_h} \partial U^h_{x,e_i}$, where the union is taken over $X_h = \{(x,e_i) : \text{(A.1) holds}\}$. Now, we can write
\[
\int_{\Omega_{\sqrt N h}} |\nabla \tilde v_h(y)|^2\,dy \;\le\; h^N \sum_{(x,e_i)\notin X_h} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \;\le\; \frac{F_h(u_h)}{2c\alpha};
\tag{A.2}
\]
moreover, since $\mathcal H^{N-1}(\partial U^h_{x,e_i}) = \gamma h^{N-1}$ (with $\gamma = 2^{N-1}(N+1)$),
\[
\mathcal H^{N-1}\big(\Omega_{\sqrt N h} \cap S_{\tilde v_h}\big) \;\le\; \# X_h\,\gamma h^{N-1} \;\le\; \gamma\,\frac{F_h(u_h)}{2c\beta}.
\tag{A.3}
\]
From (A.2) and (A.3), and since $\sup_h \|\tilde v_h\|_\infty < +\infty$, we deduce, invoking Ambrosio's Theorem 6 (see section 2.2.6), that some subsequence of $\tilde v_h$ converges to a function $v \in L^\infty(\Omega) \cap SBV_{loc}(\Omega)$, with
\[
\int_\Omega|\nabla v(x)|^2\,dx + \mathcal H^{N-1}(S_v) \;=\; \sup_{A\subset\subset\Omega}\; \int_A|\nabla v(x)|^2\,dx + \mathcal H^{N-1}(A \cap S_v)
\;\le\; \frac{1}{2c}\Big(\frac1\alpha + \frac\gamma\beta\Big) \liminf_{h\downarrow 0} F_h(u_h) \;<\; +\infty.
\]
The proof of the lemma is achieved once we notice that $v$ must be equal to $u$ (as, for instance, by the construction of $v_h$ and $\tilde v_h$ it is simple to check that for any $A \subset\subset \Omega$ with regular boundary, $\int_A (u_h(y)-\tilde v_h(y))\,dy \to 0$ as $h \downarrow 0$).
Remark. If we drop the condition $\rho(e_i) > 0$ for $i = 1,\dots,N$, the result may be false. For instance, if $N = 1$ and $\rho \equiv 0$ except at $-2$ and $2$, where $\rho(-2) = \rho(2) = 1$, the family $(u_h)_{h>0}$ defined by
\[
u_h(kh) \;=\;
\begin{cases}
0 & \text{if } k \in 2\mathbb Z,\\
1 & \text{if } k \in 2\mathbb Z+1,
\end{cases}
\qquad\text{for every } k \in \mathbb Z,
\]
satisfies the assumptions of Lemma 7 but is not compact.
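The counterexample is easy to check numerically: with $\rho$ supported on $\{-2,2\}$, all the differences $u_h(x)-u_h(x+2h)$ of the oscillating family vanish, so the energy is identically zero while the family has no convergent subsequence (a 1-D sketch with illustrative grid sizes):

```python
# 1-D energy F_h with rho supported on {-2, 2}; terms leaving the grid are
# dropped, per the convention in the text.

def F_h_1d(u, h, rho, f):
    E = 0.0
    for k in range(len(u)):
        for xi, w in rho.items():
            if 0 <= k + xi < len(u):
                E += f((u[k] - u[k + xi]) ** 2 / h) * w  # h^N * (1/h), N = 1
    return E

f = lambda t: min(t, 1.0)
rho = {-2: 1.0, 2: 1.0}
for n in (10, 100, 1000):
    h = 1.0 / n
    u = [k % 2 for k in range(n)]  # oscillates between 0 and 1
    print(n, F_h_1d(u, h, rho, f))  # energy is 0 for every n
```

With a nearest-neighbour $\rho$ the same oscillation would be heavily penalized, which is exactly the coercivity provided by $\rho(e_i) > 0$.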
A.2 Estimate from below of the Γ-limit

In this section we wish to prove that for all $u \in L^p(\Omega)$,
\[
F(u) \;\le\; F'(u).
\tag{A.4}
\]
We must therefore prove that for any $u \in L^p(\Omega)$ and any sequence $(u_{h_j})$ that converges to $u$ in $L^p(\Omega)$ as $j\to\infty$ (with $\lim_{j\to\infty} h_j = 0$), we have
\[
F(u) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}).
\tag{A.5}
\]
Let $u \in L^p(\Omega)$; we will suppose first that it is bounded. Choose also an arbitrary decreasing sequence $h_j \downarrow 0$ and functions $u_{h_j}$ that converge to $u$ in $L^p(\Omega)$. We can assume that $\|u_{h_j}\|_\infty \le \|u\|_\infty$, as truncating $u_{h_j}$ decreases its energy $F_{h_j}(u_{h_j})$. It is clearly not restrictive to consider, as well, that the lim inf is in fact a limit, and that $\sup_j F_{h_j}(u_{h_j}) < +\infty$ (since if $\liminf_{j\to\infty} F_{h_j}(u_{h_j}) = +\infty$ the result is obvious). In view of Lemma 7 we deduce that
\[
u \in SBV_{loc}(\Omega) \quad\text{and}\quad \int_\Omega|\nabla u(x)|^2\,dx + \mathcal H^{N-1}(S_u) < +\infty.
\tag{A.6}
\]
In the sequel we will drop the subscripts $j$ and write "$h\downarrow 0$" for "$j\to\infty$". We prove (A.5) following Gobbino's method in [43], with a few modifications and adaptations. Let
\[
\Omega_h \;=\; \bigcup_{x\in\Omega\cap h\mathbb Z^N} x + \Big[-\frac h2,\frac h2\Big)^N
\]
and notice that $\Omega_{\frac12\sqrt N h} = \{x\in\Omega : \mathrm{dist}(x,\partial\Omega) > \frac12\sqrt N h\} \subset \Omega_h$. We have (still using the convention that we only consider in the sums the points that fall inside $\Omega$)
\[
\begin{aligned}
F_h(u_h) &= h^N \sum_{x\in h\mathbb Z^N} \sum_{\xi\in\mathbb Z^N} \frac1h\, f_\xi\!\left(\frac{(u_h(x)-u_h(x+h\xi))^2}{h}\right)\rho(\xi)\\
&= \int_{\Omega_h} dy \sum_{\xi\in\mathbb Z^N\cap\frac1h(\Omega_h-y)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right)\rho(\xi)\\
&= \sum_{\xi\in\mathbb Z^N} \rho(\xi) \int_{\Omega_h\cap(\Omega_h-h\xi)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy.
\end{aligned}
\]
For every $\xi \in \mathbb Z^N$ we let
\[
F_h(u_h,\xi) \;=\; \int_{\Omega_h\cap(\Omega_h-h\xi)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy.
\]
Inequality (A.5) will follow by Fatou's lemma if we prove that, for any $\xi$,
\[
\liminf_{h\downarrow 0}\; F_h(u_h,\xi) \;\ge\; \alpha_\xi \int_\Omega |\langle\nabla u(x),\xi\rangle|^2\,dx + \beta_\xi \int_{S_u} |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x).
\tag{A.7}
\]
We choose $A \subset\subset \Omega$. If $h$ is small enough (i.e., $h \le \mathrm{dist}(A,\partial\Omega)/(|\xi|+\frac12\sqrt N)$), then
\[
F_h(u_h,\xi) \;\ge\; \int_A \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy
\]
and it will be sufficient to show that
\[
\liminf_{h\downarrow 0} \int_A \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy \;\ge\; \alpha_\xi\int_A|\langle\nabla u(x),\xi\rangle|^2\,dx + \beta_\xi\int_{S_u\cap A}|\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x),
\tag{A.8}
\]
as the supremum of the right-hand side of (A.8) over all $A \subset\subset \Omega$ is the right-hand side of (A.7). This is part of Gobbino's result [43], but we present a slightly different approach, still based on the "slicing" (see section 2.2.6 for technical details) of the functions $u_h$ in the direction $\xi$.
Let $\xi^\perp = \{z \in \mathbb R^N : \langle z,\xi\rangle = 0\}$, and for every $z \in \xi^\perp$ let $A_{z,\xi} = \{s \in \mathbb R : z+s\xi \in A\}$ and $(u_h)_{z,\xi}(s) = u_h(z+s\xi)$. We rewrite the first integral over $A$ in (A.8):
\[
\begin{aligned}
&\int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{A_{z,\xi}} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(s)-(u_h)_{z,\xi}(s+h))^2}{h}\right) |\xi|\,ds =\\
&\quad= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \sum_{k\in\mathbb Z} \int_{A_{z,\xi}\cap[kh,kh+h)} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(s)-(u_h)_{z,\xi}(s+h))^2}{h}\right) ds\\
&\quad= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{[0,h)} dt \left\{ \sum_{k\in\mathbb Z} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(t+kh)-(u_h)_{z,\xi}(t+(k+1)h))^2}{h}\right) \right\}
\end{aligned}
\]
(by the change of variable $s = t+kh$), where the sum is taken only over the $k \in \mathbb Z$ such that $t+kh \in A_{z,\xi}$. Now, with the change of variable $t = h\tau$, this becomes
\[
|\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_0^1 d\tau \left\{ h\sum_{k\in\mathbb Z} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}((\tau+k)h)-(u_h)_{z,\xi}((\tau+k+1)h))^2}{h}\right) \right\}.
\]
We will prove that for a.e. $(z,\tau) \in \xi^\perp\times(0,1)$,
\[
\liminf_{h\downarrow 0}\; h \sum_{\substack{k\in\mathbb Z\\ (\tau+k)h\in A_{z,\xi}}} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}((\tau+k)h)-(u_h)_{z,\xi}((\tau+k+1)h))^2}{h}\right) \;\ge\; \alpha_\xi \int_{A_{z,\xi}} |\dot u_{z,\xi}(s)|^2\,ds + \beta_\xi\,\mathcal H^0(S_{u_{z,\xi}} \cap A_{z,\xi}).
\tag{A.9}
\]
In order to prove (A.9), we need some information on the limit of $((u_h)_{z,\xi}((\tau+k)h))_{k\in\mathbb Z}$ as $h \downarrow 0$. Since, using the same changes of variables,
\[
\begin{aligned}
\int_\Omega|u_h(y)-u(y)|^p\,dy &= \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{\Omega_{z,\xi}} |(u_h)_{z,\xi}(s)-u_{z,\xi}(s)|^p\,|\xi|\,ds\\
&= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_0^1 d\tau \left\{ h\sum_{k\in\mathbb Z} |(u_h)_{z,\xi}((\tau+k)h) - u_{z,\xi}((\tau+k)h)|^p \right\}
\end{aligned}
\]
(where in the sum we consider only $k$ such that $(\tau+k)h \in \Omega_{z,\xi}$), we may assume (upon extracting a subsequence) that for a.e. $(z,\tau) \in \xi^\perp\times(0,1)$,
\[
\lim_{h\downarrow 0}\; h\sum_{k\in\mathbb Z} |(u_h)_{z,\xi}((\tau+k)h) - u_{z,\xi}((\tau+k)h)|^p \;=\; 0.
\tag{A.10}
\]
Choose a $(z,\tau)$ such that (A.10) holds. By (A.6) we may also assume, when choosing $z$, that
\[
u_{z,\xi} \in SBV_{loc}(\Omega_{z,\xi}) \quad\text{and}\quad \int_{\Omega_{z,\xi}} |\dot u_{z,\xi}(s)|^2\,ds + \mathcal H^0(S_{u_{z,\xi}}) < +\infty,
\]
so that $u_{z,\xi}$ is continuous except at a finite number of points. Thus, for almost all $s \in \Omega_{z,\xi}$,
\[
\lim_{h\downarrow 0}\; u_{z,\xi}\Big(\Big(\tau + \Big[\frac sh\Big]\Big)h\Big) \;=\; u_{z,\xi}(s)
\]
(where $[\cdot]$ denotes the integer part). We easily deduce from this and (A.10) that the piecewise constant function $v_h : \Omega_{z,\xi}\to\mathbb R$ defined by
\[
v_h(s) \;=\; (u_h)_{z,\xi}\Big(\Big(\tau + \Big[\frac sh\Big]\Big)h\Big)
\]
converges to $u_{z,\xi}$ in $L^p_{loc}(\Omega_{z,\xi})$.
Remark. Following Gobbino (proof of Lemma 3.3, Step 2, in [43]) we could also prove that for a.e. $\tau \in (0,1)$, $u_{z,\xi}((\tau+[s/h])h) \to u_{z,\xi}(s)$ in $L^1_{loc}(\Omega_{z,\xi})$, so that the a priori information on the regularity of $u$ is not really needed.
We return to the proof of inequality (A.9). Notice that if $f(t) = t \wedge 1$, it is simply a consequence of Theorem 9. The proof that follows is needed because we want to consider more general functions $f$, and it provides a generalization of the thesis of Theorem 9 to such functions. For any $I \subset\subset A_{z,\xi}$, we denote
\[
G(v_h,I) \;=\; \int_I \frac1h\, f_\xi\!\left(\frac{|v_h(s+h)-v_h(s)|^2}{h}\right) ds
\;=\; \sum_{k\in\mathbb Z} |(kh,kh+h)\cap I|\,\frac1h\, f_\xi\!\left(\frac{|v_h((k+1)h)-v_h(kh)|^2}{h}\right).
\]
If $h$ is small enough, $(\tau+[s/h])h \in A_{z,\xi}$ for every $s \in I$, so that the lim inf in (A.9) is greater than $\liminf_{h\downarrow 0} G(v_h,I)$. Therefore, we just need to prove that for any $I \subset\subset A_{z,\xi}$,
\[
\liminf_{h\downarrow 0}\; G(v_h,I) \;\ge\; \alpha_\xi \int_I |\dot u_{z,\xi}(s)|^2\,ds + \beta_\xi\,\mathcal H^0(S_{u_{z,\xi}} \cap I);
\tag{A.11}
\]
indeed, taking then the lowest greater bound of the right-hand term of (A.11)
for all I, we will get (A.9). Because of the super-additivity of lim infh#0G(vh; �)we may assume without loss of generality that I is an interval. To prove (A.11),
we then choose �; � > 0 such that �t ^ � � f�(t) for all t � 0 (noticing that
��respectively, ��may be chosen as close as wanted to ���resp., ��), and
we write
\[
G(v_h, I) \;\ge \sum_{(kh, kh+h)\subset I} h\alpha \left| \frac{v_h((k+1)h) - v_h(kh)}{h} \right|^2 \wedge\, \beta.
\]
Redefining a function $\tilde{v}_h$ with $\tilde{v}_h(kh) = v_h(kh)$ for $kh \in I$, affine on the
intervals $(kh, kh+h) \subset I$ such that $h\alpha \big| \frac{v_h((k+1)h) - v_h(kh)}{h} \big|^2 \le \beta$, and piecewise
constant, jumping once, on the intervals with the reverse inequality (just like
in the proof of Theorem 9), we get
\[
G(v_h, I) \;\ge\; \alpha \int_{I_h} |\dot{\tilde{v}}_h(s)|^2\, ds + \beta\, \mathcal{H}^0(S_{\tilde{v}_h} \cap I_h)
\]
with $I_h = \{ x \in I : {\rm dist}(x, \mathbb{R}\setminus I) > h \}$, so that invoking Theorem 6
(section 2.2.6) we get the existence of a function $\tilde{v}$ such that some subsequence
of $\tilde{v}_h$ goes to $\tilde{v}$ a.e., and that satisfies
\[
\alpha \int_I |\dot{\tilde{v}}(s)|^2\, ds + \beta\, \mathcal{H}^0(S_{\tilde{v}} \cap I) \;\le\; \liminf_{h\downarrow 0} G(v_h, I). \tag{A.12}
\]
We check then that $\tilde{v}$ has to be equal to $u_{z,\xi}$ (noticing easily, for instance,
that $(v_h - \tilde{v}_h) \rightharpoonup 0$ weakly in $L^p$). Letting $\alpha \to \overline{\alpha}_\xi$ we deduce from (A.12)
\[
\overline{\alpha}_\xi \int_I |\dot{u}_{z,\xi}(s)|^2\, ds \;\le\; \liminf_{h\downarrow 0} G(v_h, I), \tag{A.13}
\]
82 A. Chambolle
whereas sending $\beta$ to $\overline{\beta}_\xi$ we get
\[
\overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}} \cap I) \;\le\; \liminf_{h\downarrow 0} G(v_h, I). \tag{A.14}
\]
Inequality (A.11) is deduced from the last two inequalities by subdividing the
interval $I$ into suitable subintervals (the connected components of a small
neighborhood of $S_{u_{z,\xi}}$ and of its complement) and using the appropriate
inequality (A.13) or (A.14) in each subinterval. Hence (A.9) holds, and using
Fatou's lemma we deduce (A.8), as
\[
|\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \left( \overline{\alpha}_\xi \int_{A_{z,\xi}} |\dot{u}_{z,\xi}(s)|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}} \cap A_{z,\xi}) \right) =
\]
\[
= \overline{\alpha}_\xi \int_A |\langle \nabla u(x), \xi \rangle|^2\, dx + \overline{\beta}_\xi \int_{S_u \cap A} |\langle \nu_u(x), \xi \rangle|\, d\mathcal{H}^{N-1}(x).
\]
Inequality (A.5) therefore holds in the case $u \in L^\infty(\Omega)$.
Now, if $u \in L^p(\Omega)$ is not bounded, choose again $u_{h_j} \to u$ in $L^p(\Omega)$. Consider
$u^k = (-k \vee u) \wedge k$ and $u^k_{h_j} = (-k \vee u_{h_j}) \wedge k$; clearly $u^k_{h_j} \to u^k$ in $L^p(\Omega)$,
so that
\[
F(u^k) \;\le\; \liminf_{j\to\infty} F_{h_j}(u^k_{h_j}).
\]
But as $f$ is increasing, $F_{h_j}(u^k_{h_j}) \le F_{h_j}(u_{h_j})$, so that
\[
F(u^k) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}).
\]
If this is finite, we conclude by noticing that $\lim_{k\to\infty} F(u^k) = F(u)$ (by (16),
(17)), so that the proof of (A.4) is achieved.
Remark. Notice that if $u_{h_j} \to u$ in $L^p_{\rm loc}(\Omega)$, the result still holds. Indeed,
for any $A \subset\subset \Omega$ we have $u_{h_j} \to u$ in $L^p(A)$, and since the result holds in this
case we can write
\[
F(u, A) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}, A) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}, \Omega).
\]
Then, as $F(u, \Omega) = \sup_{A\subset\subset\Omega} F(u, A)$, we get (A.5). (Thus the $F_h$ also
$\Gamma$-converge to $F$ in $L^p(\Omega)$ endowed with the $L^p_{\rm loc}(\Omega)$ topology.)
A.3 Estimate from above of the $\Gamma$-limit
Given $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$ with $F(u) = F(u, \Omega) < +\infty$, we want to
build $u_h \in \ell^p(\Omega \cap h\mathbb{Z}^N)$ such that $u_h \to u$ in $L^p(\Omega)$ and
\[
\limsup_{h\downarrow 0} F_h(u_h) \;\le\; F(u). \tag{A.15}
\]
In order to be able to assume some regularity on the function $u$ we first prove
the following lemma. It is a (simpler) variant of the results in [30] and [26]
that are usually needed to show the $\Gamma$-$\limsup$ inequality for most approximations
of the Mumford–Shah functional, like Ambrosio and Tortorelli's. For
$F_h$, however, a very strong regularity of the jump set is not needed, and this
lemma is sufficient.
Lemma 8. Let $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$ with $F(u) < +\infty$. There exists
a sequence $(u_k)_{k\ge 1} \subset SBV(\Omega)$ of bounded functions with bounded supports,
that are almost everywhere continuous in $\Omega$ and such that
- $u_k \to u$ in $L^p(\Omega)$ as $k$ goes to infinity,
- $\lim_{k\to\infty} F(u_k) = F(u)$.
Remark. The information on the support of $u_k$ is relevant only when $\Omega$
is unbounded.
Proof. For every integer $k \ge 1$, first let $u^k = (-k \vee u) \wedge k$ be the truncation
of $u$ at level $k$. We choose in $L^p(\mathbb{R}^N)$ a minimizer $v_k$ of
\[
v \;\mapsto\; F(v) + k \int_{\mathbb{R}^N} |v(x) - u^k(x)|^p\, dx.
\]
Then,
\[
\| v_k - u \|_{L^p(\mathbb{R}^N)} \le \| v_k - u^k \|_{L^p(\mathbb{R}^N)} + \| u^k - u \|_{L^p(\mathbb{R}^N)}
\le \left( \frac{1}{k} F(u^k) \right)^{\frac{1}{p}} + \left( \int_{\{|u|>k\}} (|u(x)|-k)^p\, dx \right)^{\frac{1}{p}}
\]
\[
\le \left( \frac{1}{k} F(u) \right)^{\frac{1}{p}} + \left( \int_{\{|u|>k\}} |u(x)|^p\, dx \right)^{\frac{1}{p}} \to 0
\]
as $k\to\infty$; moreover (see the observation about the functional $E_0$ defined by (20)
in section 2.3.2), we know that $\mathcal{H}^{N-1}(S_{v_k} \setminus S_u) = 0$ and $v_k \in C^1(\mathbb{R}^N \setminus S_{v_k})$.
In particular $v_k$ is almost everywhere continuous. We also have $F(v_k) \le F(u^k) \le F(u)$ and $|v_k(x)| \le k$ for all $x \in \Omega$. Set now, for every integer $n > 1$
and $x \in \Omega$,
\[
v_{k,n}(x) =
\begin{cases}
v_k(x) - \frac{1}{n} & \text{if } v_k(x) > 1/n, \\
0 & \text{if } |v_k(x)| \le 1/n, \\
v_k(x) + \frac{1}{n} & \text{if } v_k(x) < -1/n.
\end{cases}
\]
Clearly $v_{k,n}$ is still a.e. continuous and goes to $v_k$ in $L^p(\Omega)$ as $n\to\infty$, so that
we can choose $n_k$ such that $\| v_{k,n_k} - v_k \|_{L^p(\Omega)} \le 1/k$. We set $w_k = v_{k,n_k}$.
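The truncation $v_{k,n}$ is exactly soft-thresholding at level $1/n$; a minimal numerical sketch (our illustration, with `eps` playing the role of $1/n$):

```python
import numpy as np

# v_{k,n} from the proof, with eps = 1/n: values above eps are shifted down by
# eps, values below -eps are shifted up by eps, and the rest are set to 0.
def soft_threshold(v, eps):
    return np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)
```

This changes $v_k$ by at most $1/n$ at every point, while making the set where the result is nonzero of finite measure, by Chebyshev's inequality.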
We also have $S_{w_k} \subset S_{v_k}$,
\[
\nabla w_k =
\begin{cases}
\nabla v_k & \text{a.e. in } \{ x \in \Omega : |v_k(x)| > 1/n_k \}, \\
0 & \text{a.e. in the complement,}
\end{cases}
\]
and
\[
|\{ w_k \ne 0 \}| = \left| \left\{ |v_k| > \frac{1}{n_k} \right\} \right| \le n_k^p \int_\Omega |v_k(x)|^p\, dx < +\infty,
\]
so that in particular $w_k \in L^q(\Omega)$ for any $q \in [1, +\infty]$.
Choose at last $\varphi \in C^1_0(\mathbb{R}^N)$ with $0 \le \varphi \le 1$ and $\varphi \equiv 1$ on $B_1(0)$, and set
for $R > 0$ and any $x \in \Omega$, $w_{k,R}(x) = \varphi\big(\frac{x}{R}\big) w_k(x)$. For any $R$,
\[
S_{w_{k,R}} \subset S_{w_k} \subset S_{v_k},
\]
and if $\xi \in \mathbb{Z}^N$,
\[
\int_\Omega |\langle \nabla w_{k,R}(x), \xi \rangle|^2\, dx =
\int_{B_R(0)\cap\Omega} |\langle \nabla w_k(x), \xi \rangle|^2\, dx
+ \int_{\Omega\setminus B_R(0)} \left| \varphi\Big(\frac{x}{R}\Big) \langle \nabla w_k(x), \xi \rangle + \frac{w_k(x)}{R} \Big\langle \nabla\varphi\Big(\frac{x}{R}\Big), \xi \Big\rangle \right|^2 dx
\]
\[
\le \int_{B_R(0)\cap\Omega} |\langle \nabla v_k(x), \xi \rangle|^2\, dx
+ 2 \int_{\Omega\setminus B_R(0)} |\langle \nabla v_k(x), \xi \rangle|^2\, dx
+ \frac{C}{R^2} |\xi|^2 \int_{\Omega\setminus B_R(0)} |w_k(x)|^2\, dx
\]
with $C = 2\|\nabla\varphi\|^2_{L^\infty(\mathbb{R}^N)}$. Hence
\[
F(w_{k,R}) \;\le\; F(v_k) + \left( \sum_{\xi\in\mathbb{Z}^N} \overline{\alpha}_\xi\, \rho(\xi)\, |\xi|^2 \right)
\left( 2 \int_{\Omega\setminus B_R(0)} |\nabla v_k(x)|^2\, dx + \frac{C}{R^2} \int_{\Omega\setminus B_R(0)} |w_k(x)|^2\, dx \right).
\]
Since $w_k$ and $\nabla v_k$ are in $L^2(\Omega)$, we can choose $R$ large enough in order to
have
\[
F(w_{k,R}) \;\le\; F(v_k) + \frac{1}{k}. \tag{A.16}
\]
Choose $R_k$ large enough so that (A.16) holds and $\| w_{k,R_k} - w_k \|_{L^p(\Omega)} \le 1/k$,
and set $u_k = w_{k,R_k}$. Clearly $u_k$ is still a.e. continuous. Moreover, $F(u_k) \le F(u) + 1/k$, $u_k$ goes to $u$ in $L^p(\Omega)$ as $k\to\infty$, and by Theorem 6 (section 2.2.6)
we deduce that
\[
F(u) \;\le\; \liminf_{k\to\infty} F(u_k),
\]
so that $\lim_{k\to\infty} F(u_k) = F(u)$ and the lemma is proved.

We now establish (A.15). First consider the case $\Omega = \mathbb{R}^N$. Given $u \in GSBV_{\rm loc}(\mathbb{R}^N) \cap L^p(\mathbb{R}^N)$ with $F(u) < +\infty$, we build, invoking Lemma 8, a
sequence of compactly supported, bounded and a.e. continuous functions $u_k$
converging to $u$ such that $F(u_k) \to F(u)$ as $k$ goes to infinity. By a standard
diagonalization procedure, if we know how to build for every $k$ a sequence
$((u_k)_h)_{h>0}$ converging to $u_k$ in $L^p(\mathbb{R}^N)$ as $h\downarrow 0$, such that
\[
\limsup_{h\downarrow 0} F_h((u_k)_h) \;\le\; F(u_k),
\]
we will be able to find $u_h$ with $u_h \to u$ and satisfying (A.15). In the sequel we
may therefore assume that $u$ is bounded, compactly supported, and continuous
at almost every $x \in \mathbb{R}^N$.
For $y \in (0,h)^N$ define $u^y_h \in \ell^p(h\mathbb{Z}^N)$ by $u^y_h(x) = u(y+x)$ for any $x \in h\mathbb{Z}^N$.
We compute the mean of $F_h(u^y_h)$ over $(0,h)^N$:
\[
h^{-N} \int_{(0,h)^N} F_h(u^y_h)\, dy
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \sum_{x\in h\mathbb{Z}^N} \int_{(0,h)^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y+x) - u(y+x+h\xi))^2}{h} \right) dy
\]
\[
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \int_{\mathbb{R}^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y) - u(y+h\xi))^2}{h} \right) dy.
\]
At this point (following exactly Gobbino's proof), we write
\[
\int_{\mathbb{R}^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y) - u(y+h\xi))^2}{h} \right) dy =
\]
\[
= \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \int_{\mathbb{R}} dt\, \frac{1}{h} f_\xi\!\left( \frac{\big( u\big(z + t\frac{\xi}{|\xi|}\big) - u\big(z + t\frac{\xi}{|\xi|} + h\xi\big) \big)^2}{h} \right)
\]
\[
= |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \int_{\mathbb{R}} ds\, \frac{1}{h} f_\xi\!\left( \frac{(u(z+s\xi) - u(z+(s+h)\xi))^2}{h} \right)
= |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z)\, F^1_{\xi,h}(u_{z,\xi})
\]
where $u_{z,\xi}(s) = u(z+s\xi)$ and we have set
\[
F^1_{\xi,h}(v) = \int_{\mathbb{R}} \frac{1}{h} f_\xi\!\left( \frac{(v(s) - v(s+h))^2}{h} \right) ds
\]
for any measurable function $v$. Since we assumed $f_\xi(t) \le \overline{\alpha}_\xi t \wedge \overline{\beta}_\xi$, we have
\[
F^1_{\xi,h}(v) \;\le\; \int_{\mathbb{R}} \overline{\alpha}_\xi \left| \frac{v(s) - v(s+h)}{h} \right|^2 \wedge\, \frac{\overline{\beta}_\xi}{h}\; ds \tag{A.17}
\]
and as shown in [43] by M. Gobbino, this is less than
\[
\overline{\alpha}_\xi \int_{\mathbb{R}} |\dot{v}(s)|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_v)
\]
provided $v \in SBV_{\rm loc}(\mathbb{R})$ and this expression is finite.
Exercise. Check this fact, by computing the integral in (A.17) separately
over $S^h_v = \bigcup_{s\in S_v} [s-h, s]$ and over $\mathbb{R} \setminus S^h_v$.
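For the reader's convenience, here is a sketch of one way to carry out this check (our reconstruction, not taken from the text). Off $S^h_v$ there is no jump in $[s, s+h]$, so $v(s+h) - v(s) = \int_s^{s+h} \dot{v}(t)\, dt$, and by Cauchy-Schwarz and Fubini
\[
\int_{\mathbb{R}\setminus S^h_v} \overline{\alpha}_\xi \left| \frac{v(s) - v(s+h)}{h} \right|^2 ds
\;\le\; \overline{\alpha}_\xi \int_{\mathbb{R}\setminus S^h_v} \frac{1}{h} \int_s^{s+h} |\dot{v}(t)|^2\, dt\, ds
\;\le\; \overline{\alpha}_\xi \int_{\mathbb{R}} |\dot{v}(t)|^2\, dt,
\]
while on $S^h_v$, whose measure is at most $h\, \mathcal{H}^0(S_v)$, the integrand of (A.17) is bounded by $\overline{\beta}_\xi / h$, contributing at most $\overline{\beta}_\xi\, \mathcal{H}^0(S_v)$. Adding the two parts gives the bound.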
Therefore,
\[
h^{-N} \int_{(0,h)^N} F_h(u^y_h)\, dy
\;\le\; \sum_{\xi\in\mathbb{Z}^N} \rho(\xi)\, |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z)\, F^1_{\xi,h}(u_{z,\xi})
\]
\[
\le\; \sum_{\xi\in\mathbb{Z}^N} \rho(\xi)\, |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \left( \overline{\alpha}_\xi \int_{\mathbb{R}} |\langle \nabla u(z+s\xi), \xi \rangle|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}}) \right)
\]
\[
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \left( \int_{\mathbb{R}^N} \overline{\alpha}_\xi |\langle \nabla u(x), \xi \rangle|^2\, dx + \int_{S_u} \overline{\beta}_\xi |\langle \nu_u(x), \xi \rangle|\, d\mathcal{H}^{N-1}(x) \right) = F(u).
\]
Thus, for $y$ in some set of positive measure in $(0,h)^N$,
\[
F_h(u^y_h) \;\le\; F(u). \tag{A.18}
\]
For all $h$ we choose $y_h$ such that inequality (A.18) holds and set $u_h = u^{y_h}_h$.
We easily check that if $u$ is continuous at $x \in \mathbb{R}^N$ then $u_h(x) \to u(x)$ as $h\downarrow 0$
(since $u_h(x) = u(x')$ for some $x'$ such that $|x - x'| < \frac{3}{2}\sqrt{N}\, h$). Since $u$ is
almost everywhere continuous, $u_h$ converges to $u$ a.e. in $\mathbb{R}^N$. We also have
$\| u_h \|_{L^\infty(\mathbb{R}^N)} \le \| u \|_{L^\infty(\mathbb{R}^N)}$ and the functions $u_h$, $u$ are zero outside some
compact set, so that by Lebesgue's theorem $u_h \to u$ in $L^p(\mathbb{R}^N)$. Since (A.15)
clearly holds for this sequence $u_h$, the proof of the case $\Omega = \mathbb{R}^N$ is achieved.
We now return to the general case where $\Omega$ is a Lipschitz domain. The
method used in order to localize the previous result is adapted from [23].
We choose a function $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$, and once again invoking
Lemma 8 we see that it is not restrictive to assume that $u$ is bounded with
bounded support. Since we assumed that $\partial\Omega$ is Lipschitz (and since $u$ is
zero outside some bounded set), we can extend $u$ outside of $\Omega$ (using the
same reflection procedure as, for instance, in [34] for the extension of $W^{1,p}$
functions) into a bounded, compactly supported $SBV$ function (still denoted
by $u$) such that $\mathcal{H}^{N-1}(\partial\Omega \cap S_u) = 0$ and $F(u, \mathbb{R}^N) < +\infty$. Then, we build
$(u_h)$ as previously, such that $u_h$ goes to $u$ in $L^p(\mathbb{R}^N)$ and
\[
\limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\le\; F(u, \mathbb{R}^N).
\]
We can write
\[
F_h(u_h, \mathbb{R}^N) \;\ge\; F_h(u_h, \Omega) + F_h(u_h, \Omega^c)
\]
where $\Omega^c$ is the complement of $\Omega$ in $\mathbb{R}^N$. Notice that we have dropped all
terms involving differences of values of $u_h$ at one point in $\Omega$ and another in
$\Omega^c$. Sending $h$ to zero we get
\[
\limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\ge\; \limsup_{h\downarrow 0} F_h(u_h, \Omega) + \liminf_{h\downarrow 0} F_h(u_h, \Omega^c),
\]
and we deduce from (A.4) that
\[
\limsup_{h\downarrow 0} F_h(u_h, \Omega) + F(u, \Omega^c) \;\le\; \limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\le\; F(u, \mathbb{R}^N).
\]
Thus, $u$ being extended in such a way that $F(u, \Omega^c) < +\infty$,
\[
\limsup_{h\downarrow 0} F_h(u_h, \Omega) \;\le\; F(u, \overline{\Omega}).
\]
Since $\mathcal{H}^{N-1}(\partial\Omega \cap S_u) = 0$, $F(u, \overline{\Omega}) = F(u, \Omega)$ and we get the conclusion. This
achieves the proof of Theorem 11.
A.4 Proof of Theorem 12
For any $h > 0$ let $u_h$ be a minimizer in $\ell^p(\Omega \cap h\mathbb{Z}^N)$ of
\[
F_h(u) + \int_\Omega |u(x) - g(x)|^p\, dx \tag{A.19}
\]
where $g \in L^\infty(\Omega) \cap L^p(\Omega)$. Replacing $u_h$ with $(-\|g\|_{L^\infty(\Omega)} \vee u_h) \wedge \|g\|_{L^\infty(\Omega)}$
decreases the energy, thus in fact $\| u_h \|_{L^\infty(\Omega)} \le \| g \|_{L^\infty(\Omega)}$. In view of Lemma 7,
since $\sup_{h>0} F_h(u_h) < +\infty$, some subsequence $(u_{h_j})_{j\ge 1}$ of $(u_h)_{h>0}$ converges
to a function $u \in SBV_{\rm loc}(\Omega)$ a.e. in $\Omega$. From the uniform bound on $\|u_h\|_\infty$ we
deduce that $u_{h_j} \to u$ in $L^p_{\rm loc}(\Omega)$.
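The truncation step can be checked numerically in dimension one. In the sketch below (an illustration only: $f(t) = t \wedge 1$, $p = 2$, and the grid weights are our arbitrary choices, not the text's exact discretization), clamping a candidate $u$ at $\pm\|g\|_\infty$ never increases the energy, since $f$ is increasing and clamping is 1-Lipschitz.

```python
import numpy as np

# 1-d toy version of (A.19) with f(t) = min(t, 1) and p = 2.
def energy(u, g, h):
    fh = np.sum(np.minimum(np.diff(u)**2 / h, 1.0))   # discrete Mumford-Shah part
    return fh + h * np.sum((u - g)**2)                # plus the fidelity term

rng = np.random.default_rng(0)
h = 0.01
g = rng.uniform(-1.0, 1.0, size=200)
u = rng.normal(0.0, 3.0, size=200)        # values may well exceed ||g||_inf
K = float(np.max(np.abs(g)))
u_clamped = np.clip(u, -K, K)             # (-K v u) ^ K

# Clamping shrinks both the differences and the distance to g, term by term.
assert energy(u_clamped, g, h) <= energy(u, g, h)
```

The inequality holds termwise: clamping can only reduce $|u(x) - u(x')|$ (and $f$ is increasing), and since $g$ takes values in $[-K, K]$ it can only reduce $|u(x) - g(x)|$ as well.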
If $|\Omega| < +\infty$, the convergence is in $L^p(\Omega)$ and we simply conclude invoking
Theorem 7 (section 2.4). Otherwise, we know (by the remark at the end of
section A.2 and Fatou's lemma) that
\[
F(u) + \int_\Omega |u(x) - g(x)|^p\, dx \;\le\; \liminf_{j\to\infty} \left( F_{h_j}(u_{h_j}) + \int_\Omega |u_{h_j}(x) - g(x)|^p\, dx \right).
\]
For any $v \in L^p(\Omega)$, we consider a sequence $(v_{h_j})_{j\ge 1}$ converging to $v$ in $L^p(\Omega)$
such that
\[
\limsup_{j\to\infty} F_{h_j}(v_{h_j}) \;\le\; F(v).
\]
For all $j$ we have
\[
F_{h_j}(u_{h_j}) + \int_\Omega |u_{h_j}(x) - g(x)|^p\, dx \;\le\; F_{h_j}(v_{h_j}) + \int_\Omega |v_{h_j}(x) - g(x)|^p\, dx,
\]
so that in the limit we get
\[
F(u) + \int_\Omega |u(x) - g(x)|^p\, dx \;\le\; F(v) + \int_\Omega |v(x) - g(x)|^p\, dx,
\]
showing the minimality of $u$. If we choose $v = u$, we also deduce that
\[
\lim_{j\to\infty} \| u_{h_j} - g \|_{L^p(\Omega)} = \| u - g \|_{L^p(\Omega)};
\]
thus, by equi-integrability, $u_{h_j} \to u$ strongly in $L^p(\Omega)$, since we had the convergence
in $L^p_{\rm loc}(\Omega)$. In the case where we minimize
\[
F_h(u) + \| u - g_h \|_p^p
\]
instead of (A.19), the proof is not different.
References
[1] G. Alberti. Variational models for phase transitions, an approach via gamma-convergence. In G. Buttazzo et al., editors, Differential Equations and Calculus of Variations. Springer-Verlag, 2000. (Also available at http://cvgmt.sns.it/papers/.)
[2] L. Ambrosio. A compactness theorem for a new class of functions with bounded variation. Boll. Un. Mat. Ital. (7), 3-B:857–881, 1989.
[3] L. Ambrosio. Variational problems in SBV and image segmentation. Acta Appl. Math., 17:1–40, 1989.
[4] L. Ambrosio. Existence theory for a new class of variational problems. Arch. Rat. Mech. Anal., 111(1):291–322, 1990.
[5] L. Ambrosio. A new proof of the SBV compactness theorem. Calc. Var. Partial Differential Equations, 3(1):127–137, 1995.
[6] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. Clarendon Press, Oxford, 2000.
[7] L. Ambrosio and V.M. Tortorelli. Approximation of functionals depending on jumps by elliptic functionals via $\Gamma$-convergence. Comm. Pure Appl. Math., 43(8):999–1036, 1990.
[8] L. Ambrosio and V.M. Tortorelli. On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. (7), 6-B:105–123, 1992.
[9] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Series. Pitman (Advanced Publishing Program), Boston, Mass.–London, 1984.
[10] G. Aubert, M. Barlaud, P. Charbonnier, and L. Blanc-Féraud. Deterministic edge-preserving regularization in computed imaging. Technical Report TR#94-01, I3S, CNRS URA 1376, Sophia-Antipolis, France, 1994.
[11] R. Azencott. Image analysis and Markov fields. In SIAM Proceedings of 1st Int. Conf. Appl. Math., Paris, 1987.
[12] R. Azencott. Markov fields and image analysis. In Proceedings AFCET Congress, Antibes, France, 1987.
[13] G. Bellettini and A. Coscia. Discrete approximation of a free discontinuity problem. Numer. Funct. Anal. Optim., 15(3-4):201–224, 1994.
[14] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 1987.
[15] L. Blanc-Féraud and M. Barlaud. Restauration d'images bruitées par analyse multirésolution et champs de Markov. In Proceedings of "13e colloque GRETSI sur le traitement du signal et des images", Juan-les-Pins, France, pages 829–832, 1991.
[16] B. Bourdin. Image segmentation with a finite element method. M2AN Math. Model. Numer. Anal., 33(2):229–244, 1999.
[17] B. Bourdin and A. Chambolle. Implementation of a finite-elements approximation of the Mumford–Shah functional. Numer. Math., 85(4):609–646, 2000.
[18] A. Braides. Approximation of Free-Discontinuity Problems. Number 1694 in Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1998.
[19] A. Braides and G. Dal Maso. Non-local approximation of the Mumford–Shah functional. Calc. Var. Partial Differential Equations, 5(4):293–322, 1997.
[20] A. Chambolle. Un théorème de $\Gamma$-convergence pour la segmentation des signaux. C. R. Acad. Sci. Paris, t. 314, Série I:191–196, 1992.
[21] A. Chambolle. Image segmentation by variational methods: Mumford and Shah functional and the discrete approximations. SIAM J. Appl. Math., 55(3):827–863, 1995.
[22] A. Chambolle. Finite-differences discretizations of the Mumford–Shah functional. M2AN Math. Model. Numer. Anal., 33(2):261–288, 1999.
[23] A. Chambolle and G. Dal Maso. Discrete approximation of the Mumford–Shah functional in dimension two. M2AN Math. Model. Numer. Anal., 33(4):651–672, 1999.
[24] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numer. Math., 76(2):167–188, 1997.
[25] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput., 20(6):1964–1977, 1999.
[26] G. Cortesani. Strong approximation of GSBV functions by piecewise smooth functions. Ann. Univ. Ferrara Sez. VII (N.S.), 43:27–49, 1997.
[27] G. Dal Maso. An Introduction to $\Gamma$-convergence. Birkhäuser, Boston, 1993.
[28] G. Dal Maso, J.-M. Morel, and S. Solimini. A variational method in image segmentation: Existence and approximation results. Acta Math., 168:89–151, 1992.
[29] E. De Giorgi, M. Carriero, and A. Leaci. Existence theorem for a minimum problem with free discontinuity set. Arch. Rational Mech. Anal., 108:195–218, 1989.
[30] F. Dibos and E. Séré. An approximation result for the minimizers of the Mumford–Shah functional. Boll. Un. Mat. Ital. (7), 11-A:149–162, 1997.
[31] D. C. Dobson and C. R. Vogel. Convergence of an iterative method for total variation denoising. SIAM J. Numer. Anal., 34(5):1779–1791, 1997.
[32] R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen. MRF model-based algorithms for image segmentation. In Proc. 10th IEEE Int. Conf. on Pattern Recognition, Atlantic City, pages 808–814, 1990.
[33] S. Durand, F. Malgouyres, and B. Rougé. Image de-blurring, spectrum interpolation and application to satellite imaging. Technical Report 9916, CMLA, ENS Cachan, 1999.
[34] L. C. Evans and R. F. Gariepy. Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, 1992.
[35] K. J. Falconer. The Geometry of Fractal Sets. Cambridge University Press, Cambridge, 1985.
[36] H. Federer. Geometric Measure Theory. Springer-Verlag, New York, 1969.
[37] S. Finzi-Vita and P. Perugia. Some numerical experiments on the energy minimization problem. In Proc. of the Second European Workshop on Image Processing and Mean Curvature Motion, pages 233–240, Palma de Mallorca, September 1995.
[38] D. Geiger and F. Girosi. Parallel and deterministic algorithms for MRFs: surface reconstruction. IEEE Trans. PAMI, PAMI-13(5):401–412, May 1991.
[39] D. Geiger and A. Yuille. A common framework for image segmentation. Internat. J. Comp. Vision, 6(3):227–243, August 1991.
[40] D. Geman and G. Reynolds. Constrained image restoration and the recovery of discontinuities. IEEE Trans. PAMI, PAMI-14(3):367–383, 1992.
[41] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, PAMI-6(6), November 1984.
[42] E. Giusti. Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Boston, 1984.
[43] M. Gobbino. Finite difference approximation of the Mumford–Shah functional. Comm. Pure Appl. Math., 51(2):197–228, 1998.
[44] F. Guichard and F. Malgouyres. Total variation based interpolation. In Proceedings of the European Signal Processing Conference, volume 3, pages 1741–1744, 1998.
[45] S. Z. Li. Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995. (See also http://markov.eee.ntu.ac.sg:8000/~szli/MRF_Book/MRF_Book.html.)
[46] P.-L. Lions, S. J. Osher, and L. Rudin. Denoising and deblurring using constrained nonlinear partial differential equations. Technical report, Cognitech Inc., Santa Monica, CA, 1992.
[47] F. Malgouyres and F. Guichard. Edge direction preserving image zooming: a mathematical and numerical analysis. Technical Report 9930, CMLA, ENS Cachan, 1999.
[48] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation. Birkhäuser, Boston, 1995.
[49] D. Mumford and J. Shah. Boundary detection by minimizing functionals, I. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, 1985. (Also Image Understanding, 1988.)
[50] D. Mumford and J. Shah. Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42:577–685, 1989.
[51] M. Nitzberg, D. Mumford, and T. Shiota. Filtering, Segmentation and Depth. Number 662 in Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1993.
[52] T. J. Richardson and S. K. Mitter. A variational formulation-based edge focussing algorithm. Sadhana, 22(4):553–574, 1997.
[53] L. Rudin and S. J. Osher. Total variation based image restoration with free local constraints. In Proceedings of the IEEE ICIP'94, volume 1, pages 31–35, Austin, Texas, 1994.
[54] L. Rudin, S. J. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992. (Also in Experimental Mathematics: Computational Issues in Nonlinear Science (Proc. Los Alamos Conf. 1991).)
[55] L. Schwartz. Théorie des distributions. Hermann, Paris, 1966.
[56] J. Shah. Properties of segmentations which minimize energy functionals. Preprint, Northeastern Univ. Math. Dept., Boston, December 1988.
[57] J. Shah. Parameter estimation, multiscale representation and algorithms for energy-minimizing segmentations. In Proc. 10th IEEE Int. Conf. on Pattern Recognition, Atlantic City, pages 815–819, 1990.
[58] J. Shah. Segmentation by nonlinear diffusion. In Proc. IEEE Comp. Soc. Conf. on Pattern Recognition, pages 202–207, 1991.
[59] J. Shah. Segmentation by nonlinear diffusion, II. In Proc. IEEE Comp. Soc. Conf. on Pattern Recognition, Champaign, IL, pages 644–647, June 1992.
[60] C. R. Vogel and M. E. Oman. Iterative methods for total variation denoising. SIAM J. Sci. Comput., 17(1):227–238, 1996. Special issue on iterative methods in numerical linear algebra (Breckenridge, CO, 1994).
[61] C. R. Vogel and M. E. Oman. Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Process., 7(6):813–824, 1998.
[62] W. P. Ziemer. Weakly Differentiable Functions. Springer-Verlag, Berlin, 1989.