ICTP Lecture Notes
SCHOOL ON
MATHEMATICAL PROBLEMS
IN IMAGE PROCESSING
4 - 22 September 2000
Editor
Charles E. Chidume
The Abdus Salam ICTP
Trieste, Italy
SCHOOL ON MATHEMATICAL PROBLEMS IN IMAGE PROCESSING
— First edition
Copyright © 2000 by The Abdus Salam International Centre for Theoretical Physics
The Abdus Salam ICTP has the irrevocable and indefinite authorization to reproduce and
disseminate these Lecture Notes, in printed and/or computer readable form, from each author.
ISBN 92-95003-04-7
Printed in Trieste by The Abdus Salam ICTP Publications & Printing Section
PREFACE
One of the main missions of the Abdus Salam International Centre for
Theoretical Physics in Trieste, Italy, founded in 1964 by Abdus Salam, is to
foster the growth of advanced studies and research in developing countries.
To this aim, the Centre organizes a large number of schools and workshops
in a great variety of physical and mathematical disciplines.
Since unpublished material presented at the meetings might prove of
great interest also to scientists who did not take part in the schools the Centre
has decided to make it available through a new publication titled ICTP
Lecture Note Series. It is hoped that this formally structured pedagogical
material in advanced topics will be helpful to young students and researchers,
in particular to those working under less favourable conditions.
The Centre is grateful to all lecturers and editors who kindly authorize
the ICTP to publish their notes as a contribution to the series.
Since the initiative is new, comments and suggestions are most welcome
and greatly appreciated. Information can be obtained from the Publications
Section or by e-mail to "[email protected]". The series is
published in house and also made available on-line via the ICTP web site:
"http://www.ictp.trieste.it".
M.A. Virasoro
Director
Introduction
This is the second volume of a new series of lecture notes of the Abdus
Salam International Centre for Theoretical Physics. These new lecture notes
are put onto the web pages of the ICTP to allow people from all over the
world to access them freely. In addition a limited number of hard copies is
printed to be distributed to scientists and institutions which otherwise do
not have access to the web pages.
This volume contains the lecture notes given by A. Chambolle during
the School on Mathematical Problems in Image Processing that took place
at the Abdus Salam International Centre for Theoretical Physics from 4
to 22 September 2000 under the direction of L. Ambrosio (Scuola Normale
Superiore di Pisa), G. Dal Maso (Scuola Internazionale Superiore di Studi
Avanzati) and J.-M. Morel (École Normale Supérieure, Cachan).
The topic of Chambolle's course was "Inverse problems in image process-
ing and image segmentation: some mathematical and numerical aspects".
The School consisted of two weeks of lecture courses and one week of
conference. It was financially supported by the Abdus Salam International
Centre for Theoretical Physics, SISSA (Scuola Internazionale Superiore di
Studi Avanzati, Trieste) and Scuola Normale Superiore di Pisa. I take this
opportunity to express our gratitude to all the lecturers and speakers at the
conference for their contribution towards the success of the School.
Charles E. Chidume
November, 2000
Inverse problems in Image processing and Image
segmentation: some mathematical
and numerical aspects
A. Chambolle
CEREMADE (CNRS, UMR 7534),
Université de Paris-Dauphine, 75775 Paris cedex 16, France
Lecture given at the
School on Mathematical Problems in Image Processing
Trieste, 4–22 September 2000
LNS002001
Abstract
These notes contain an introduction to some approaches to the regularization of inverse problems in image processing, and to the mathematical tools that are necessary to handle these approaches correctly. The methods we consider here are variational methods. We consider mainly the minimization of two kinds of functionals: functionals based on the total variation of the image, and the so-called Mumford and Shah functional, which penalizes the edge set and the gradient of the image. In both cases we study mathematically the existence of a solution in the space of functions with bounded variation (BV), and then discuss some approximations and numerical methods for computing solutions.

Keywords: Image processing, inverse problems, image segmentation, functions with bounded variation, Γ-convergence, iterative algorithms.

AMS Classification numbers: 26A45, 49J45, 49Q20, 68U10
Contents

1 Introduction: denoising and deblurring images
1.1 The "classical approach"
1.2 The total variation criterion
1.3 The segmentation of images
1.3.1 A statistical approach to image denoising
1.3.2 The Mumford–Shah functional

2 Some mathematical preliminaries
2.1 The functions with bounded variation (BV)
2.1.1 Why we need bounded variation functions
2.1.2 BV functions: definition and main properties
2.1.3 Existence for the Rudin–Osher approach
2.2 More properties of BV functions
2.2.1 The jump set $S_u$
2.2.2 BV functions in one dimension
2.2.3 The jump set and the singular part of $Du$
2.2.4 Special BV functions, in dimension one
2.2.5 The general, N-dimensional case
2.2.6 Special BV functions
2.2.7 Ambrosio's compactness theorem
2.2.8 Slicing
2.3 Back to the Mumford–Shah functional
2.3.1 Existence for the weak formulation
2.3.2 From the weak to the strong formulation
2.4 Variational approximations and Γ-convergence

3 The numerical analysis of the total variation minimization
3.1 The discrete energy
3.2 The method
3.3 Proof of the convergence of the algorithm
3.4 Two examples

4 The numerical analysis of the Mumford–Shah problem (I)
4.1 Ambrosio and Tortorelli's approximate energy
4.2 Sketch of the proof of Ambrosio and Tortorelli's theorem, in dimension one
4.2.1 Proof of (i)
4.2.2 Proof of (ii)
4.3 Higher dimensions
4.3.1 The first inequality
4.3.2 The second inequality

5 The numerical analysis of the Mumford–Shah problem (II)
5.1 Rescaling Blake and Zisserman's functional
5.2 The Γ-limit of the rescaled 1-dimensional functional
5.2.1 Proof of (i)
5.2.2 Proof of (ii)
5.3 The Γ-limit of the rescaled 2-dimensional functional
5.4 More general finite-differences approximations

6 A numerical method for minimizing the Mumford–Shah functional
6.1 An iterative procedure for minimizing (34)
6.2 Anisotropy of the length term
6.3 Numerical experiments

A Proof of Theorems 11 and 12
A.1 A compactness lemma
A.2 Estimate from below the Γ-limit
A.3 Estimate from above the Γ-limit
A.4 Proof of Theorem 12

References
Main notations

- $a \vee b$, $a \wedge b$: respectively, the max and the min of the two real numbers $a, b \in \mathbb{R}$.

- $\mathcal{H}^k$: the k-dimensional Hausdorff measure. In particular, for every set $E \subseteq \mathbb{R}^N$, $\mathcal{H}^0(E)$ is the cardinality of E, also denoted by $\#E$.

- $\chi_E(x)$: the characteristic function of a set E, i.e., $\chi_E(x) = 1$ if $x \in E$ and $\chi_E(x) = 0$ otherwise.

- $|E| = \mathcal{L}^N(E) = \int_{\mathbb{R}^N} \chi_E(x)\,dx$: the Lebesgue measure of $E \subseteq \mathbb{R}^N$.

- $C_c(\Omega)$, $C^1_c(\Omega)$, $C^\infty_c(\Omega)$: the spaces of compactly supported continuous (respectively, continuously differentiable, infinitely differentiable) real-valued functions on the domain $\Omega \subseteq \mathbb{R}^N$. $C^\infty_c(\Omega)$ is also denoted by $\mathcal{D}(\Omega)$ when it is equipped with the appropriate topology in order to define the distributions by duality (see [55]). $C_c(\Omega;\mathbb{R}^N) = [C_c(\Omega)]^N$, etc.

- $C_0(\Omega)$: the space of the real-valued functions that are continuous on $\Omega$ and vanish at the boundary and/or at infinity, in the sense that if $\varphi \in C_0(\Omega)$, then $\forall \varepsilon > 0$ there exists a compact set $K \subseteq \Omega$ such that $\sup_{\Omega \setminus K} |\varphi| \le \varepsilon$. The norm on $C_0(\Omega)$ is $\|\varphi\| = \sup_\Omega |\varphi|$. With this norm, $C_0(\Omega)$ is the closure of $C_c(\Omega)$. Similarly, $C_0(\Omega;\mathbb{R}^N) = [C_0(\Omega)]^N$.

- $\mathcal{M}(\Omega)$: the space of bounded Radon measures on $\Omega$. It is isomorphic (and isometric) to the topological dual $C_0(\Omega)'$ of $C_0(\Omega)$. $\mathcal{M}(\Omega;\mathbb{R}^N) = [\mathcal{M}(\Omega)]^N = C_0(\Omega;\mathbb{R}^N)'$.

- $\langle x', x \rangle$: the Euclidean scalar product of $x, x' \in \mathbb{R}^N$, or the duality product between an element $x \in X$ of a space X and an element $x' \in X'$ of its dual (also sometimes denoted by $\langle x', x \rangle_{X',X}$). The Euclidean norm in $\mathbb{R}^N$ is usually denoted by $|\cdot| = \sqrt{\langle \cdot, \cdot \rangle}$.

- $(a,b)$: the set $\{t \in \mathbb{R} : a < t < b\}$. $[a,b] = \{t \in \mathbb{R} : a \le t \le b\}$, $(a,b] = \{t \in \mathbb{R} : a < t \le b\}$, etc.

- $S^{N-1} = \{\xi \in \mathbb{R}^N : |\xi| = 1\}$: the $(N-1)$-dimensional unit sphere in $\mathbb{R}^N$.
1 Introduction: denoising and deblurring images
One fundamental branch of image processing concerns the problem of
reconstructing images: given some data (which may be a corrupted image,
but also any other kind of signal, such as the output of a tomography device
or of a satellite antenna), how does one reconstruct a clear and clean image
that can be correctly understood by a human operator or post-processed by
other image analysis methods?
The most basic examples of image reconstruction problems are the problems
of denoising and of deblurring an image. Although they are the simplest,
they share many common features with more complicated problems that are
usually too specific for the purpose of short lectures. All these problems
belong to the class usually known as inverse problems. This means that the
process through which the data is obtained from the physical characteristics
of the observed scene corresponds to transformations that are roughly well
understood and can be more or less correctly modelled mathematically, but
whose inverse either is not known or is not computable by direct methods,
or whose computation is highly unstable and sensitive to small changes in the
data (or noise), so that the scene itself is difficult to reconstruct.
First we will describe the main classical approach to denoising and de-
blurring (more or less the "standard" method for solving inverse problems)
and will try to explain why it is not well suited to the nature and structure
of images. Then, we will introduce solutions that have been proposed in
recent years to improve this approach.
1.1 The "classical approach"
Assume you observe a signal (an image) which is a matrix $G = (g_{i,j})_{1 \le i,j \le n}$ of grey level values in $[0,1]$, and suppose you know that this signal is the sum of a "perfect world" unknown signal $U = (u_{i,j})_{1 \le i,j \le n}$ and an additive Gaussian noise $N = (n_{i,j})_{1 \le i,j \le n}$, where, for instance, all $n_{i,j}$ are independent and have mean 0 and known variance $\sigma^2$.

From a different point of view, in the continuous setting, you can assume that the signal you observe is a bounded grey-level function

$$g : \Omega \to [0,1]$$

where $\Omega$ is "the screen", usually an open domain of $\mathbb{R}^2$ (although lower or higher dimensions may be considered), and most of the time, in the applications, a rectangle, e.g., $(0,1) \times (0,1)$. This function $g(x)$ will be assumed
to be the sum $u(x) + n(x)$ of a "good image" $u(x)$ and an oscillation $n(x)$ that we would like to remove. We will assume that $\int_\Omega n(x)\,dx = 0$ and that $\int_\Omega n(x)^2\,dx = \sigma^2$ is known or can be correctly estimated.
The first point of view (the discrete setting) describes well the structure of
digital images, and is usually adopted in statistical approaches to image
reconstruction. We will return to this setting in Section 1.3, devoted to the
image segmentation problem, since the origins of the approach that we will
discuss in these notes are to be found in the statistical approach to image
denoising. However, in the PDE or variational approach that we will usually
adopt here, it is more common and more convenient to work in the continuous
setting, and except where otherwise mentioned we will consider this point of
view in the sequel.
Up to now we have just considered an image corrupted by some noise, but usually an image also goes through all kinds of degradations, which are usually modelled by a blur with a more or less known kernel. This means that instead of $g(x) = u(x) + n(x)$, the correct model should be $g(x) = Au(x) + n(x)$, where A is a linear operator, say, from $L^2(\Omega)$ into $L^2(\Omega)$ (or any kind of reasonable function space). Usually

$$Au(x) = \varphi * u(x) = \int_\Omega \varphi(x-y)\,u(y)\,dy$$

is simply a blur (a convolution), with $\varphi$ some (usually non-negative) kernel that is known or estimated, but one may imagine more complex operators (like tomography kernels, or all sorts of transformations).
Then, the problem we need to solve is the following: given g and an estimation of A and $\sigma^2$, is it possible to get a good approximation of u?

The first idea would be to compute $A^{-1}g = u + A^{-1}n$; however, this is not feasible in practice: the operator A is often not invertible, or its inverse is impossible to compute. Consider for instance the case where $Au = \varphi * u$. In the Fourier domain, we find that $\widehat{Au} = \hat{\varphi}\,\hat{u}$, where $\hat{\cdot}$ denotes the Fourier transform. So $u = A^{-1}v$ if and only if $\hat{u} = \hat{v}/\hat{\varphi}$. But, even if $\hat{\varphi}$ does not vanish, this ratio is usually not in $L^2$ for an arbitrary $v \in L^2$. Moreover, if $\varphi$ is a smooth low-pass filter, then $\hat{\varphi}(\xi)$ is very small for large frequencies $|\xi|$, so that in the case where v is the oscillatory signal n, for which $|\hat{n}(\xi)|$ remains strictly greater than zero for large $|\xi|$, the ratio $\hat{n}(\xi)/\hat{\varphi}(\xi)$ will become very large and go to $+\infty$ as $|\xi|$ increases. This amplification of the high frequencies gives birth to wild oscillations and artifacts that make the image $u + A^{-1}n$ impossible to read.
A better approach to this kind of problem, therefore, is the following: we will try to find the "best" function u among all u satisfying

$$\int_\Omega \big(Au(x) - g(x)\big)\,dx = 0, \qquad \int_\Omega |Au(x) - g(x)|^2\,dx = \sigma^2. \qquad (1)$$

So the main issue, now, is to find a good criterion for characterizing what the "best" function u is.

The classical approach of Tichonov consists in minimizing some quadratic norm of u, like $\int_\Omega |u|^2$ or $\int_\Omega |\nabla u|^2$, under the constraints (1). Both problems can easily be solved (using the Fourier transform), and the linear transformation of g that gives the solution u is called a Wiener filter.
Figure 1: From left to right: a white square on a black background; the same image with noise added; the Tichonov reconstruction by minimizing $\int |u|^2$; the minimization of $\int |\nabla u|^2$.
However, while the first criterion is not regularizing enough and produces images that still look very noisy, the criterion $\int_\Omega |\nabla u|^2$ is not well suited either for the analysis of images (see Fig. 1). Indeed, if it is finite, it means that the image belongs to the Sobolev space $H^1(\Omega)$, and it is well known that a function in that space cannot have discontinuities along a hypersurface, whereas the grey level of an image should be allowed to have such discontinuities, which correspond to edges and boundaries of objects in the image. For instance, in dimension 1, it is well known that if $u \in H^1(I)$ (I being some interval of $\mathbb{R}$), then for every $x, y \in I$ with $x \le y$,

$$u(y) - u(x) = \int_x^y u'(s)\,ds \;\le\; \sqrt{y - x}\,\sqrt{\int_x^y |u'(s)|^2\,ds},$$

so that $u \in C^{0,\frac12}(I)$ (the space of continuous $\frac12$-Hölder functions on I) and cannot have discontinuities.

This motivates the introduction of the criterion that we discuss in the next section.
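The Hölder estimate above is just Cauchy–Schwarz, and can be checked numerically on any smooth function; the function and evaluation points below are arbitrary choices for illustration:

```python
import numpy as np

# Numerical check of |u(y) - u(x)| <= sqrt(y - x) * sqrt(int_x^y |u'|^2 ds)
# for an arbitrary smooth (hence H^1) function on [0, 1].
xs = np.linspace(0.0, 1.0, 10001)
dx = xs[1] - xs[0]
u = np.sin(3 * np.pi * xs) + xs**2
du = 3 * np.pi * np.cos(3 * np.pi * xs) + 2 * xs   # exact derivative u'

i, j = 1200, 8700                                  # x = xs[i], y = xs[j]
lhs = abs(u[j] - u[i])
rhs = np.sqrt(xs[j] - xs[i]) * np.sqrt(np.sum(du[i:j] ** 2) * dx)
print(lhs <= rhs)
```

The slack in the inequality is large here; it becomes tight only when $u'$ is constant on $[x, y]$.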
1.2 The total variation criterion
In their paper [54], Rudin, Osher and Fatemi describe a different approach (see also [46, 53, 60, 61, 24, 25, 44, 33, 47]). Their idea is to try to find a criterion of minimization that corresponds better to the structure of images. They propose to consider the "total variation" of the function u as a measure of the optimality of an image.

The total variation (which will be introduced properly in Section 2.1) is roughly the integral $\int_\Omega |\nabla u(x)|\,dx$. Its main advantage is that it can be defined for functions that have discontinuities along hypersurfaces (in 2-dimensional images, along 1-dimensional curves), and this is essential to get a correct representation of the edges in an image.

The problem to solve is thus the following:

$$\min\Big\{ \int_\Omega |\nabla u(x)|\,dx \;:\; u \text{ satisfies } (1) \Big\}. \qquad (2)$$

We will show in Section 2.1 that under some simple and natural assumptions, this problem has a solution. Then, we will propose a numerical approach for computing a solution.
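In a discrete setting the total variation is simple to compute, which gives a feel for why it discriminates between edges and noise. The sketch below is our own illustration (the isotropic forward-difference discretization is one common choice among several); it compares the TV of a clean white square with that of its noisy version:

```python
import numpy as np

def total_variation(u):
    """Isotropic discrete total variation: sum of |grad u| over pixels,
    computed with forward differences (one common discretization)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return float(np.sqrt(dx**2 + dy**2).sum())

rng = np.random.default_rng(0)
n = 64
square = np.zeros((n, n))
square[16:48, 16:48] = 1.0                      # white square on black background
noisy = square + 0.1 * rng.standard_normal((n, n))

tv_clean = total_variation(square)              # close to 4 * 32, the perimeter
tv_noisy = total_variation(noisy)
print(tv_clean, tv_noisy)                       # the noise inflates the TV a lot
```

A sharp edge contributes its length times the jump height, while oscillatory noise contributes at every pixel; minimizing (2) therefore removes oscillations while still allowing sharp jumps.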
1.3 The segmentation of images
The last approach that we will discuss in these notes can be seen as an
independent problem, although historically it has the same origin. It is called
the problem of image segmentation, and can be described as the problem
of �nding a simple representation of a given image in terms of edges and
smooth areas. The proposition of D. Mumford and J. Shah [49, 50]) to solve
this problem by minimizing a functional is indeed derived from statistical
approaches to image denoising, introduced in particular by S. and D. Geman,
that we will describe in the next section. Again, the problem of Geman and
Geman was to regularize correctly an inverse problem (the problem that we
have described in the previous paragraphs, written in the discrete setting),
and to restore correctly the edges of the image. Thus we will brie�y describe
the point of view of Geman and Geman, and explain then how Mumford and
Shah derived their continuous formulation.
1.3.1 A statistical approach to image denoising
The origins of the variational approaches to image segmentation are to be
found in Geman and Geman's famous paper [41], in which they introduce a
statistical approach to image analysis that has proved to be very efficient.
First we will briefly explain how it appeared in the probabilistic setting.
We return to the discrete setting of the image denoising problem: the observed signal (or image) is a matrix $G = (g_{i,j})_{1 \le i,j \le n}$ of grey level values in $[0,1]$, and is the combination of a "perfect world" unknown signal $U = (u_{i,j})_{1 \le i,j \le n}$ and an additive Gaussian noise $N = (n_{i,j})_{1 \le i,j \le n}$. The $n_{i,j}$ are independent and have mean 0 and variance $\sigma^2$. If you know the a priori probability $P(U)$ of the perfect world signal U, since for a given $G = U + N$ the probability of G knowing U is $P(G|U) = P(N = G - U) \propto \exp(-\|G - U\|^2 / 2\sigma^2)$, Bayes' rule tells you that

$$P(U|G)\,P(G) = P(G|U)\,P(U),$$

so that $P(U|G)$, up to a constant, is $P(G|U)\,P(U)$, that is:

$$\frac{1}{(\sqrt{2\pi}\,\sigma)^{n \times n}} \exp\Big( -\frac{1}{2\sigma^2} \sum_{i,j} (g_{i,j} - u_{i,j})^2 \Big)\, P(U).$$
Geman and Geman proposed the following a priori probability for U: they considered that most scenes are piecewise smooth with possible discontinuities (the edges), and introduced an edge set (or line process) $L = \big( (l_{i+\frac12,j})_{1 \le i < n,\, 1 \le j \le n},\ (l_{i,j+\frac12})_{1 \le i \le n,\, 1 \le j < n} \big)$, where each variable $l_{\cdot,\cdot}$ is either 0 or 1, and (see Fig. 2):

$$l_{i+\frac12,j} = \begin{cases} 1 & \text{if there is a break (a vertical piece of edge) between } (i,j) \text{ and } (i+1,j), \\ 0 & \text{if } U \text{ has to be smooth between } (i,j) \text{ and } (i+1,j), \end{cases}$$

$$l_{i,j+\frac12} = \begin{cases} 1 & \text{if there is a break (a horizontal piece of edge) between } (i,j) \text{ and } (i,j+1), \\ 0 & \text{if } U \text{ has to be smooth between } (i,j) \text{ and } (i,j+1). \end{cases}$$

They then proposed the following probability law for $(U,L)$:

$$P(U,L) = \frac{1}{Z} \exp\Big\{ -\sum_{i,j} \Big[ \lambda (1 - l_{i+\frac12,j})(u_{i+1,j} - u_{i,j})^2 + \alpha\, l_{i+\frac12,j} + \lambda (1 - l_{i,j+\frac12})(u_{i,j+1} - u_{i,j})^2 + \alpha\, l_{i,j+\frac12} \Big] \Big\}$$
Figure 2: The line process: $l_{i+\frac12,j}$ and $l_{i,j+\frac12}$.
where $\lambda$, $\alpha$ are two positive weights, and Z is computed in order to have $\sum_{U,L} P(U,L) = 1$, the sum being computed over all the possible states $(U,L)$.
The problem that needs to be solved is therefore the following:

"Among all possible images U and line processes L, find the one that has the greatest probability

$$P(U, L \mid G) \propto e^{-E(U,L,G)}, \qquad (3)$$

where the free energy $E(U,L,G)$ is given by

$$E(U,L,G) = \sum_{i,j} \lambda \Big[ (1 - l_{i+\frac12,j})(u_{i+1,j} - u_{i,j})^2 + (1 - l_{i,j+\frac12})(u_{i,j+1} - u_{i,j})^2 \Big] + \alpha \Big[ l_{i+\frac12,j} + l_{i,j+\frac12} \Big] + \frac{1}{2\sigma^2} (g_{i,j} - u_{i,j})^2 \qquad (4)$$

and $G = (g_{i,j})_{1 \le i,j \le n}$ is the given data."
In what follows, since the observed data G will be fixed, we will drop the
dependence on G in the notation and merely write $E(U,L)$.
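The free energy (4) is easy to evaluate on a toy image, which shows the role of the line process: paying the price α per edge element can be cheaper than paying λ times the squared jump. The sketch below is our own illustration; the parameter values and the array convention for $l_{i+\frac12,j}$, $l_{i,j+\frac12}$ are arbitrary choices:

```python
import numpy as np

def free_energy(U, G, lv, lh, lam=1.0, alpha=0.5, sigma=0.1):
    """Free energy E(U, L; G) of eq. (4).
    Convention (ours): lv[i, j] stands for l_{i+1/2, j} (break between
    rows i and i+1); lh[i, j] for l_{i, j+1/2} (break between cols j, j+1)."""
    E = lam * ((1 - lv) * (U[1:, :] - U[:-1, :]) ** 2).sum()
    E += lam * ((1 - lh) * (U[:, 1:] - U[:, :-1]) ** 2).sum()
    E += alpha * (lv.sum() + lh.sum())
    E += ((G - U) ** 2).sum() / (2 * sigma**2)
    return float(E)

# A 4x4 image with a unit jump between columns 1 and 2, observed noiselessly.
U = np.zeros((4, 4))
U[:, 2:] = 1.0
G = U.copy()

E_no_edges = free_energy(U, G, np.zeros((3, 4)), np.zeros((4, 3)))
lh = np.zeros((4, 3))
lh[:, 1] = 1.0                      # declare an edge along the jump
E_with_edges = free_energy(U, G, np.zeros((3, 4)), lh)
print(E_no_edges, E_with_edges)     # 4.0 vs 2.0: the edge set is worth its price
```

With these weights, each unit jump costs λ = 1 without an edge element and only α = 0.5 with one, so the line process "switches on" along true discontinuities.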
Then, Geman and Geman proposed to maximize the probability (3) using
a simulated annealing algorithm (see for instance [11, 12], [15], [32], the
book [51] for more general segmentation models, and the book on Markov
Random Field Modeling in Computer Vision by Li [45] for a general intro-
duction to the field). This kind of method is still widely used in the computer
vision community and gives good results. It has to be adapted to each partic-
ular segmentation problem (among which the problem we have described is
one of the simplest, but might not be the most interesting!).
We will present other approaches, since in some simple cases it might be
too costly to implement a simulated annealing algorithm. Notice that the
problem of maximizing (3) is equivalent to the problem of finding a minimizer
of the free energy $E(U,L)$ that appears in the exponential in (3). The prob-
lem is that this energy is not convex, so that there is no known deterministic
(i.e., non-probabilistic) algorithm that can be proved to surely converge to
the minimum. The history of the minimization of $E(U,L)$ is therefore mostly
a competition for finding a "better" algorithm, in any possible sense.
Already in the 1980s, many authors suggested deterministic methods to min-
imize the energy (4) directly. See for instance [38, 39], or [40], and more re-
cently [10], but of course this list is far from exhaustive.

In most of these papers, the problem is iteratively approximated by a
sequence of simpler problems, each one becoming "less convex" as the process
evolves. This is the central idea of the book Visual Reconstruction by Blake
and Zisserman [14], who introduce the so-called "Graduated Non-Convexity"
(GNC) algorithm.
They first noticed that, minimizing with respect to L, the energy E in (4) can be rewritten as

$$E(U) = \sum_{i,j} W_{\alpha,\lambda}(u_{i+1,j} - u_{i,j}) + W_{\alpha,\lambda}(u_{i,j+1} - u_{i,j}) + \frac{1}{2\sigma^2} (g_{i,j} - u_{i,j})^2 \qquad (5)$$

where the non-convex potential $W_{\alpha,\lambda}$ is (see Figure 3)

$$W_{\alpha,\lambda}(x) = \min(\lambda x^2, \alpha).$$
(We will also denote $\min(\lambda x^2, \alpha)$ by $(\lambda x^2) \wedge \alpha$.)

Figure 3: The function $W_{\alpha,\lambda}(x)$ for $\alpha = \lambda = 1$.

Blake and Zisserman call $E(U)$ the "weak membrane" energy, since it looks like the potential of an elastic membrane that can break when the elastic energy becomes locally too high. Notice now that their problem is very similar to the inverse problems that we have presented in the previous sections. Now, instead of minimizing a regularizing term (rather more complex than Tichonov's) under a constraint like (1) (with $A = \mathrm{Id}$), the energy has a term $\frac{1}{2\sigma^2}(g_{i,j} - u_{i,j})^2$ that can be seen as a Lagrange multiplier for the constraint $\int_\Omega |u(x) - g(x)|^2\,dx = \sigma^2$.
Their idea to minimize $E(U)$ is to replace $W_{\alpha,\lambda}$ with a family of potentials $W^\gamma_{\alpha,\lambda}$, $\gamma \in [0,1]$, with $W^\gamma_{\alpha,\lambda}$ convex for $\gamma = 0$ and tending gradually to $W_{\alpha,\lambda}$ as $\gamma$ increases to 1. They then propose to solve the problem for small $\gamma$, and then to increase $\gamma$ slowly to improve the solution.
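A minimal continuation of this kind can be sketched as follows. The interpolated family and all parameter values below are our own simplification for illustration; this is not Blake and Zisserman's exact family of potentials, only the same idea: descend on a convex energy first, then slide towards the non-convex one.

```python
import numpy as np

lam, alpha, s = 1.0, 0.05, 0.2     # weights (arbitrary illustrative values)

def dW(x, gamma):
    # Derivative of W_gamma(x) = (1-gamma)*lam*x^2 + gamma*min(lam*x^2, alpha):
    # a simple convex-to-nonconvex interpolation (not Blake-Zisserman's family).
    return 2 * lam * x * ((1 - gamma) + gamma * (lam * x**2 < alpha))

def grad_energy(u, g, gamma):
    # Gradient of sum_i W_gamma(u[i+1]-u[i]) + (1/2s^2) sum_i (u[i]-g[i])^2
    d = np.diff(u)
    w = dW(d, gamma)
    grad = np.zeros_like(u)
    grad[:-1] -= w                  # d/du[i]   of W(u[i+1]-u[i])
    grad[1:] += w                   # d/du[i+1] of W(u[i+1]-u[i])
    grad += (u - g) / s**2
    return grad

rng = np.random.default_rng(0)
n = 100
g = (np.arange(n) >= n // 2).astype(float) + 0.05 * rng.standard_normal(n)

u = g.copy()
for gamma in np.linspace(0.0, 1.0, 11):   # graduation: 0 = convex, 1 = weak membrane
    for _ in range(300):                  # plain gradient descent at fixed gamma
        u -= 1e-3 * grad_energy(u, g, gamma)

print(u[n // 2] - u[n // 2 - 1])          # the jump survives the smoothing
```

Each stage starts from the previous stage's minimizer, which is the whole point of the graduation: the convex stages choose the basin, and the later, less convex stages stop penalizing large jumps, so the edge is preserved while the small noisy differences keep being smoothed.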
1.3.2 The Mumford–Shah functional

In order to study the energies (4) or (5), Mumford and Shah (see [49, 50]) proposed to rewrite them in a continuous setting. They considered an observed image $g(x,y)$, with $(x,y) \in \Omega$, a bounded open set of $\mathbb{R}^2$, and $g(x,y) \in [0,1]$ for (almost) every $(x,y)$. They then noticed that the variable L, or rather the set $\{L = 1\}$, describes the "discontinuity" or "jump set" $K \subseteq \Omega$ of a piecewise regular function $u(x,y)$, $(x,y) \in \Omega$, whereas the finite differences $u_{i+1,j} - u_{i,j}$ (resp., $u_{i,j+1} - u_{i,j}$) are approximations of the partial derivatives $\frac{\partial u}{\partial x}(x,y)$ (resp., $\frac{\partial u}{\partial y}(x,y)$). The energy they wrote was thus (with the standard notation $\nabla u = (\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y})$ for the gradient)

$$E(u,K) = \lambda \int_{\Omega \setminus K} |\nabla u(x,y)|^2\,dx\,dy + \alpha \cdot \mathrm{length}(K \cap \Omega) + \mu \int_\Omega (u(x,y) - g(x,y))^2\,dx\,dy \qquad (6)$$

where $\lambda, \alpha, \mu$ are positive parameters. They then proposed to study the problem of minimizing the energy (6).
In these lecture notes we will try to explain briefly

(a) how this problem can be handled mathematically, in what setting and in what function space, and in what sense it has a solution,

(b) a first approximation result that has been proposed in order to minimize the energy $E(u,K)$ more easily, in a continuous setting,

(c) in what sense one can say that $E(u,K)$ and $E(U,L)$ are the "same energies", in a continuous and in a discrete setting,

(d) how it is possible to approximate $E(u,K)$ by discrete energies that are, in some sense, "better" than the energy $E(U,L)$.

What we will not describe, on the other hand, are the possible finite-element approaches that have also been proposed for solving the Mumford–Shah problem. It is still not clear whether or not they are of interest for image processing applications. They are usually useful in other fields where similar problems are relevant, in particular in fracture mechanics. The interested reader may consult [13, 37, 16], or [23, 17].
2 Some mathematical preliminaries
2.1 The functions with bounded variation (BV )
2.1.1 Why we need bounded variation functions
For the study of Rudin and Osher's problem (2), the correct mathematical setting is clearly that of the functions with bounded variation (the criterion they propose to minimize being simply the semi-norm defining such functions), which we will define in the next paragraph. Although it may be not as clear, this is also true for the analysis of the Mumford–Shah functional. Indeed, in order to study the energy E, Ambrosio and De Giorgi suggested introducing a "weak formulation" depending only on the variable u. This formulation assumes that we are able to define, given a function u, a set of discontinuities $S_u$ and a gradient $\nabla u$ everywhere outside of $S_u$. The weak Mumford–Shah energy is then

$$E(u) = \int_\Omega |\nabla u(x)|^2\,dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega |u(x) - g(x)|^2\,dx. \qquad (7)$$

Here we consider that u, g are defined on a domain $\Omega$ of a space of arbitrary dimension N, and the set $S_u$ is $(N-1)$-dimensional; for images you can just replace N by 2 everywhere in the notes. $\mathcal{H}^{N-1}$ denotes the $(N-1)$-dimensional Hausdorff measure (see for instance [35]). It is a Borel measure on $\mathbb{R}^N$ that agrees with the traditional definition of the surface area for every regular hypersurface in $\mathbb{R}^N$ (any bounded part of a hyperplane, a sphere, ...).
The discontinuity set $S_u$ can be defined for very general functions, but it usually has no kind of regularity. A correct definition of the gradient $\nabla u$ requires more regularity of u. Usually, we can define a gradient $\nabla u$ as an integrable function if u belongs to the Sobolev space $W^{1,1}(\Omega)$ (or at least $W^{1,1}_{\mathrm{loc}}(\Omega)$). But in this case it is possible to show that $S_u$ is almost empty (in fact, $\mathcal{H}^{N-1}(S_u) = 0$: we say that $S_u$ is $\mathcal{H}^{N-1}$-essentially empty).

The space of bounded variation functions, which we are going to introduce, does not suffer from this drawback. It contains functions for which it is possible to define correctly $S_u$ and the gradient $\nabla u$, in such a way that $0 < \mathcal{H}^{N-1}(S_u) < +\infty$ and $\int_\Omega |\nabla u(x)|^2\,dx < +\infty$. Such a function combines some regularity with discontinuities across the essentially $(N-1)$-dimensional set $S_u$.
2.1.2 BV functions: definition and main properties

The space of bounded variation functions on $\Omega$, denoted by $BV(\Omega)$, is defined in the following way:

$$BV(\Omega) = \big\{ u \in L^1(\Omega) : Du \text{ is a bounded Radon vector measure on } \Omega \big\}, \qquad (8)$$

where $Du$ is the distributional (or weak) derivative of u, defined by

$$\langle Du, \phi \rangle_{\mathcal{D}'(\Omega;\mathbb{R}^N),\,\mathcal{D}(\Omega;\mathbb{R}^N)} = -\int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx$$

for any vector field $\phi \in \mathcal{D}(\Omega;\mathbb{R}^N)$, i.e., $C^\infty$ with compact support in $\Omega$.

Let us denote by $\mathcal{M}(\Omega;\mathbb{R}^N)$ the space of N-dimensional bounded (vector-valued) Radon measures on $\Omega$. It is well known (as a consequence of Riesz' representation theorem) that $\mathcal{M}(\Omega;\mathbb{R}^N)$ is identified with the dual of $C_0(\Omega;\mathbb{R}^N)$, the space of all continuous vector fields vanishing at the boundary (this means that if $\phi \in C_0(\Omega;\mathbb{R}^N)$, for every $\varepsilon > 0$ there exists a compact set $K \subseteq \Omega$ such that $\sup_{x \notin K} |\phi| < \varepsilon$), on which the norm is given by $\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} = \sup_{x \in \Omega} |\phi(x)|$. If $\mu \in \mathcal{M}(\Omega;\mathbb{R}^N)$ is a measure, we can define its variation as the positive Borel measure given by

$$|\mu|(E) = \sup\Big\{ \sum_{i=1}^n |\mu(E_i)| : \bigcup_{i=1}^n E_i \subseteq E,\ E_i \cap E_j = \emptyset\ \forall i \neq j \Big\}, \qquad (9)$$

for every Borel set $E \subseteq \Omega$ (here the $E_i$, $i = 1, \dots, n$, are disjoint Borel sets). Saying that the measure $\mu$ is bounded is nothing else than saying that $|\mu|(\Omega) < +\infty$; the quantity $|\mu|(\Omega)$ is called the total variation of $\mu$ (on $\Omega$) and defines the usual norm on the Banach space $\mathcal{M}(\Omega;\mathbb{R}^N)$.
As an element of the dual $C_0(\Omega;\mathbb{R}^N)'$ of $C_0(\Omega;\mathbb{R}^N)$, $\mu$ also has a norm given by

$$\|\mu\|_{C_0(\Omega;\mathbb{R}^N)'} = \sup_{\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} \le 1} \langle \mu, \phi \rangle = \sup_{\|\phi\|_{C_0(\Omega;\mathbb{R}^N)} \le 1} \int_\Omega \phi(x)\,\mu(dx).$$

In fact, both norms coincide, which means that for every $\mu \in \mathcal{M}(\Omega;\mathbb{R}^N)$,

$$|\mu|(\Omega) = \sup\Big\{ \int_\Omega \phi(x)\,\mu(dx) : \phi \in C_0(\Omega;\mathbb{R}^N),\ |\phi(x)| \le 1\ \forall x \in \Omega \Big\}.$$

The weak-* convergence of a sequence of measures $(\mu_n)$ is understood as the weak-* convergence in the dual of $C_0(\Omega;\mathbb{R}^N)$, which means that $\mu_n \rightharpoonup \mu$ weakly-* if and only if

$$\int_\Omega \phi(x)\,\mu_n(dx) \to \int_\Omega \phi(x)\,\mu(dx)$$

for every $\phi \in C_0(\Omega;\mathbb{R}^N)$.

If $Du$ is a bounded Radon measure, then since $\int_\Omega \phi\,Du = -\int_\Omega u\,\mathrm{div}\,\phi$ for every $\phi$ in $C^\infty_c(\Omega;\mathbb{R}^N)$, we deduce (the compactly supported $C^\infty$-regular functions being dense in the space $C_0(\Omega;\mathbb{R}^N)$) that if $u \in L^1(\Omega)$,

$$u \in BV(\Omega) \iff V(u,\Omega) := \sup\Big\{ \int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx : \phi \in C^\infty_c(\Omega;\mathbb{R}^N),\ |\phi(x)| \le 1\ \forall x \in \Omega \Big\} < +\infty. \qquad (10)$$

The quantity $V(u,\Omega)$ coincides with the total variation of the measure $Du$, i.e., $V(u,\Omega) = |Du|(\Omega)$. In fact, saying that $V(u,\Omega)$ must be finite is an equivalent way to define the space $BV(\Omega)$. We call $V(u,\Omega) = |Du|(\Omega)$ the total variation of u on $\Omega$. If $u \in C^1(\Omega)$, or u is in the Sobolev space $W^{1,1}(\Omega)$, then the notation in (2) is valid since it is simple to show that $|Du|(\Omega) = \int_\Omega |\nabla u(x)|\,dx$. The space $BV(\Omega)$, endowed with the norm $\|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |Du|(\Omega)$, is a Banach space.

Exercise. Prove that, given $u \in L^1(\Omega)$, $Du \in \mathcal{M}(\Omega;\mathbb{R}^N)$ if and only if $V(u,\Omega)$ (given by (10)) is finite.
The first result we can state about the total variation is the following semicontinuity property:

Theorem 1 (Semicontinuity of the total variation) The convex functional $u \mapsto V(u,\Omega) = |Du|(\Omega) \in [0,+\infty]$ is lower semicontinuous in the $L^1_{\mathrm{loc}}(\Omega)$ topology.

This means that if $u_n$ goes to u in $L^1(\Omega')$ for every $\Omega' \subset\subset \Omega$, then $|Du|(\Omega) \le \liminf_{n\to\infty} |Du_n|(\Omega)$. The proof of Theorem 1 is straightforward if we consider the definition (10) of the variation of u. Indeed, in (10), $V(u,\Omega)$ is built as the sup of the linear functionals $u \mapsto \int_\Omega u(x)\,\mathrm{div}\,\phi(x)\,dx$ for $\phi \in C^\infty_c(\Omega;\mathbb{R}^N)$. Since each of these functionals is continuous in the $L^1_{\mathrm{loc}}$ topology, we deduce that the sup is lower semicontinuous.
Next, we have the following Poincaré inequalities.

Theorem 2 (Poincaré inequalities) There exists a constant c = c(N) such that if u ∈ L¹_loc(ℝ^N), then

    ‖u‖_{L^{N/(N−1)}(ℝ^N)} ≤ c |Du|(ℝ^N),

and if B is a ball and u ∈ L¹(B),

    ‖u − ⨍_B u‖_{L^{N/(N−1)}(B)} ≤ c |Du|(B).

Here and everywhere in the notes ⨍_X u = ⨍_X u(x) dx denotes the average (1/|X|) ∫_X u(x) dx.
If Ω is a bounded Lipschitz-regular open set (this will always be assumed in what follows) we can build a continuous linear extension operator T_{Ω,Ω′} from BV(Ω) to BV(Ω′) for every Ω′ with Ω ⊂⊂ Ω′, which means that for every u ∈ BV(Ω) we can find u′ ∈ BV(Ω′) with u′ ≡ u on Ω and ‖u′‖_{BV(Ω′)} ≤ c‖u‖_{BV(Ω)}, the constant c depending only on Ω, Ω′. This extension allows us to generalize the second inequality in the last theorem to any such Ω: we deduce that there exists a constant c = c(Ω) such that

    ‖u − ⨍_Ω u‖_{L^{N/(N−1)}(Ω)} ≤ c |Du|(Ω)   (11)

for every u ∈ BV(Ω) (see [34] for details).

We state, still without any proof, the two next theorems, which are fundamental for the study of the space BV(Ω).
Theorem 3 (Sobolev embeddings) Let Ω be bounded and Lipschitz-regular. Then the space BV(Ω) is continuously embedded in L^{N/(N−1)}(Ω), and compactly embedded in L^p(Ω) for every 1 ≤ p < N/(N−1).

The first assertion is a consequence of the previous theorem. The second means that if a sequence of functions (u_j)_{j≥1} is bounded in BV(Ω), i.e.,
Inverse problems in Image processing and Image segmentation 19
sup_j ‖u_j‖_{L¹(Ω)} + |Du_j|(Ω) < +∞, then we can extract a subsequence u_{j_k} and there exists a function u ∈ BV(Ω) such that, as k→∞, Du_{j_k} ⇀ Du weakly-* as a measure and u_{j_k} → u strongly in L^p(Ω), for every p < N/(N−1).
Theorem 4 (Approximation by smooth functions) Let u ∈ BV(Ω). Then there exists a sequence (u_n)_{n≥1} ⊂ C^∞(Ω) such that, as n→∞, u_n → u in L¹(Ω), Du_n ⇀ Du weakly-* as measures, and

    |Du_n|(Ω) = ∫_Ω |∇u_n(x)| dx → |Du|(Ω).

These properties (in fact, mainly Theorem 2) are sufficient to derive the existence for problem (2), as we are going to show in the next section.
2.1.3 Existence for the Rudin-Osher approach

The existence for problem (2) in dimension N = 1 or N = 2 is ensured provided we assume that

- the operator A satisfies A1 = 1 (i.e., the image of a constant function is the same function);
- the initial data satisfies ∫_Ω |g(x) − ⨍_Ω g|² dx ≥ σ²;
- there exists a ũ satisfying (1) such that |Dũ|(Ω) < +∞.

The first assumption is not absolutely necessary (we only need A1 ≢ 0) but simplifies the proof a lot; it is obviously satisfied if A corresponds to a convolution with a kernel of integral 1 (Au = φ ⋆ u, ∫φ = 1), provided the boundary effects are treated correctly. The second assumption is needed: observe that if the model g = Au + n is correct, then it should be satisfied (with n rapidly oscillating, so that ∫ Au·n ≃ 0). The last assumption means that I = inf {|Du|(Ω) : u satisfies (1)} < +∞; otherwise any u satisfying (1) is a solution, but the problem is of little interest. In the general continuous setting the existence of such a ũ is not absolutely obvious.
The following proof is taken from [24]. We consider a minimizing sequence (u_n)_{n≥1} for (2), of functions u_n that all satisfy the constraints and such that |Du_n|(Ω) → I as n→∞. Such a sequence exists because of our third assumption. We assume, in order to simplify the notations, that |Ω| = 1 (so that in particular ⨍_Ω u = ∫_Ω u for every u). We show, first, that the average
m_n = ∫_Ω u_n remains bounded. This is obvious if A is the identity, or has a continuous inverse. Otherwise, we can write (since A1 = 1)

    σ² = ∫_Ω |Au_n − g|² = ∫_Ω |Au_n − m_n + m_n − g|² = ∫_Ω |A(u_n − m_n) + m_n − g|²

so that

    σ ≥ ‖m_n − g‖_{L²(Ω)} − ‖A(u_n − m_n)‖_{L²(Ω)} ≥ ‖m_n − g‖_{L²(Ω)} − ‖A‖ ‖u_n − m_n‖_{L²(Ω)},

where ‖A‖ denotes the norm of A as a continuous operator of L²(Ω). Since N = 1 or 2, 2 ≤ N/(N−1) and by (11),

    ‖u_n − m_n‖_{L²(Ω)} = ‖u_n − ∫_Ω u_n(x) dx‖_{L²(Ω)} ≤ c |Du_n|(Ω).   (12)

The total variation |Du_n|(Ω) remains bounded, therefore m_n = ∫_Ω u_n is also bounded. This implies (using again (12)) that u_n is bounded in L²(Ω).
Upon extracting a subsequence we may thus assume that there exists u ∈ L²(Ω) ∩ BV(Ω) such that u_n ⇀ u weakly in L² and Du_n ⇀ Du weakly-* as a measure. We also have (since A is continuous and linear) Au_n ⇀ Au, therefore by semicontinuity we get

    |Du|(Ω) ≤ lim inf_{n→∞} |Du_n|(Ω) = I,  and  ∫_Ω |Au(x) − g(x)|² dx ≤ σ²,  ∫_Ω Au(x) dx = ∫_Ω g(x) dx.

(Alternatively, we could invoke Theorem 3 to deduce that some subsequence of (u_n) converges to some u strongly in L¹(Ω), and Theorem 1 to conclude that |Du|(Ω) ≤ lim inf_{n→∞} |Du_n|(Ω) = I.)
We now introduce for t ∈ [0,1] the function u^t = tu + (1−t) ∫_Ω g. We have for every t, |Du^t|(Ω) = t|Du|(Ω) ≤ tI ≤ I and ∫_Ω Au^t = ∫_Ω g; moreover ∫_Ω |Au⁰ − g|² = ∫_Ω |g − ∫_Ω g|² ≥ σ² (by assumption), and ∫_Ω |Au¹ − g|² = ∫_Ω |Au − g|² ≤ σ². By continuity of the map t ↦ ∫_Ω |Au^t − g|², there exists therefore a t₀ ∈ [0,1] such that u^{t₀} satisfies (1), and |Du^{t₀}|(Ω) ≤ I. Necessarily we must have |Du^{t₀}|(Ω) = I, so that t₀ = 1 and u is the solution of problem (2).
2.2 More properties of BV functions

In the previous section we introduced just the very basic properties of BV functions that allowed us to state problem (2) correctly and show that it is well posed. Now, if we want to study the weak Mumford–Shah energy (7), we see that we need to know more properties of these functions. In particular, we must define the discontinuity set S_u correctly and study its regularity. We also need to describe the measure Du precisely. This will be done in the next sections. We will not prove all the results, since that is too difficult for the purpose of these lectures, but we will try to give a correct idea of these results by describing the simpler one-dimensional case with more precision.
2.2.1 The jump set S_u

Let us first introduce the approximate limits of a function u at some point x ∈ Ω. Given a measurable function u : Ω → [−∞,+∞], we can define the approximate upper limit of u at x ∈ Ω as

    u⁺(x) = inf { t ∈ [−∞,+∞] : lim_{ρ↓0} |{y : u(y) > t} ∩ B_ρ(x)| / ρ^N = 0 },

where B_ρ(x) is the ball of radius ρ centered at x and |E| denotes the Lebesgue measure of the set E. u⁺(x) is thus the greatest lower bound of the set of values t for which the set {u > t} has (Lebesgue) density 0 at x; on the other hand, if t < u⁺(x), then this set must have strictly positive density at x. The approximate lower limit u⁻(x) is defined in the same way, i.e.,

    u⁻(x) = −(−u)⁺(x) = sup { t ∈ [−∞,+∞] : lim_{ρ↓0} |{y : u(y) < t} ∩ B_ρ(x)| / ρ^N = 0 }.

The set

    S_u = {x ∈ Ω : u⁻(x) < u⁺(x)}

is the set of essential discontinuities of u; it is a (Lebesgue-)negligible Borel set. If x ∉ S_u, we write ũ(x) = u⁻(x) = u⁺(x) = ap lim_{y→x} u(y), and when ũ(x) ≠ ±∞ we say that u is approximately continuous at x.
Let us first analyse the one-dimensional case, which is simpler.

2.2.2 BV functions in one dimension

In this section, we consider a (bounded) interval I = (a,b) ⊂ ℝ (here, a < b are two real numbers). In this case the total variation of a function u ∈ L¹(I) is simply

    V(u,I) = sup { ∫_I u(x) v′(x) dx : v ∈ C¹_c(I;[−1,1]) }.
Notice that the usual "classical" definition is different:

    Var(u,I) = sup { Σ_{i=1}^{n−1} |u(t_{i+1}) − u(t_i)| : a < t₁ < ⋯ < t_n < b },

and in general we do not have V(u,I) = Var(u,I). Indeed, the second definition depends on the pointwise values of the function u, and the value of Var(u,I) can be made infinite by changing u on a set of measure zero (for instance on a sequence (x_n)_{n≥1} of points in I), whereas the first definition gives the same value for two functions that are almost everywhere equal. In fact, if u ∈ C¹(I), then clearly ∫_I uv′ = −∫_I u′v for every v ∈ C¹_c(I;[−1,1]), and we deduce that V(u,I) = ∫_I |u′|; in this case it is easy to show that V(u,I) = Var(u,I).
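A quick numerical illustration of this difference (our addition, not from the notes): changing a function on a single point, a set of measure zero, can blow up the pointwise variation Var(u,I), while V(u,I) is unchanged since the two functions agree a.e.

```python
# Illustration (ours): the "classical" variation Var(u, I) depends on
# pointwise values, unlike the distributional variation V(u, I).

def pointwise_variation(u, points):
    """sum_i |u(t_{i+1}) - u(t_i)| over the given partition points."""
    vals = [u(t) for t in points]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))

def step(x):            # u = 0 on [0, 1/2), 1 on [1/2, 1]
    return 0.0 if x < 0.5 else 1.0

def step_modified(x):   # same function, changed on the null set {0.3}
    return 5.0 if x == 0.3 else step(x)

grid = [k / 1000 for k in range(1001)]           # partition containing 0.3
print(pointwise_variation(step, grid))           # 1.0
print(pointwise_variation(step_modified, grid))  # 11.0 (= 5 up, 5 down, then the unit jump)
```

Both functions have the same distributional derivative Du = δ_{1/2}, hence V(u,I) = 1 in both cases, in agreement with V(u,I) = min{Var(v,I) : v = u a.e.} stated below.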
Exercise. Show that for every u, V(u,I) ≤ Var(u,I). [Hint: consider v ∈ C¹_c(I;[−1,1]) and remark that lim_{h→0} (1/h) ∫_{I_h} (v(x+h) − v(x)) u(x) dx = ∫ v′(x) u(x) dx. (Here I_h = {x ∈ I : x+h ∈ I}.) Prove then that for h > 0 sufficiently small, (1/h) ∫_{I_h} (v(x+h) − v(x)) u(x) dx ≤ Var(u,I).]

In the general case we have that V(u,I) ≤ Var(v,I) whenever v = u a.e., and V(u,I) = min{Var(v,I) : v = u a.e.} (see the next exercise).

The distributional derivative of u ∈ L¹(I) is the distribution Du defined
by

    ⟨Du, φ⟩_{D′(I),D(I)} = −∫_I u(x) φ′(x) dx

for every φ ∈ D(I) (i.e., the set C^∞_c(I) with the appropriate topology). The function u is in BV(I) if and only if Du is a bounded Radon measure on I, which means that Du ∈ M(I) ≃ C_0(I)′, the dual of C_0(I), the set of continuous functions u on Ī = [a,b] such that u(a) = u(b) = 0. It can be proved (quite easily) that Du is a bounded Radon measure on I if and only if V(u,I) < +∞, and that in this case we have

    V(u,I) = |Du|(I) = sup { Σ_{i=1}^n |Du(I_i)| : ∪_{i=1}^n I_i ⊂ I, I_i ∩ I_j = ∅ ∀ i ≠ j }

(where the sets I_i are Borel sets), the right-hand side of the last equation being the standard definition of the total variation of the measure Du (which is also the norm of Du when it is seen as an element of the dual C_0(I)′). (More generally, the variation of a vector-valued (or real-valued) Borel measure μ is the positive Borel measure |μ| defined by equation (9).)
We now introduce two functions u_l and u_r, defined for every x ∈ I by

    u_l(x) = Du((a,x))  and  u_r(x) = Du((a,x]).

Here, as usual, (a,x) = {y : a < y < x} denotes the open interval of extremities a and x > a, which is sometimes also denoted by ]a,x[, whereas (a,x] is the interval {y : a < y ≤ x}.

Lemma 1 The function u_l is left-continuous, while u_r is right-continuous. Moreover, u_l = u_r except on a set that is at most countable.

Proof. First of all, u_r(x) − u_l(x) = Du({x}), so that u_r(x) = u_l(x) except when x is an atom of the measure Du (a point such that Du({x}) ≠ 0), but a bounded vector-valued (or real-valued) measure can have at most a countable number of atoms.
To show that u_l is left-continuous, for any x ∈ I and each sequence of non-negative numbers ρ_n ↓ 0 we must show that u_l(x − ρ_n) goes to u_l(x) as n→∞. But |u_l(x−ρ_n) − u_l(x)| = |Du([x−ρ_n, x))| ≤ |Du|([x−ρ_n, x)), and by standard properties of positive measures we know that (assuming, without loss of generality, that ρ_n is a decreasing sequence) lim |Du|([x−ρ_n, x)) = |Du|(∩_{n≥1} [x−ρ_n, x)) = |Du|(∅) = 0. Therefore u_l is left-continuous. For the same reason, u_r is right-continuous (indeed, u_r(x) = Du(I) − Du((x,b)) ).

Remark. More precisely, we can show in the same way that

    lim_{ρ↓0} u_l(x−ρ) = lim_{ρ↓0} u_r(x−ρ) = u_l(x),  and  lim_{ρ↓0} u_l(x+ρ) = lim_{ρ↓0} u_r(x+ρ) = u_r(x).

Lemma 2 The distributional derivatives of u_l, u_r and u are equal (Du_l = Du_r = Du).
Proof. Let us show, for instance, that Du_l = Du. Consider φ ∈ D(I). We have (using Fubini's theorem)

    ∫_I φ Du_l = −∫_a^b φ′(x) Du((a,x)) dx = −∫_a^b φ′(x) { ∫_{y∈(a,x)} Du(dy) } dx
              = −∫_{y∈(a,b)} { ∫_{x∈(y,b)} φ′(x) dx } Du(dy) = −∫_I −φ(y) Du(dy) = ∫_I φ Du,

showing the desired equality.

In particular, we deduce from the last lemma that D(u−u_l) = D(u−u_r) = D(u_l−u_r) = 0, so that the functions u, u_l, u_r can differ at most by a constant. We can redefine the functions u_l and u_r by adding the appropriate constant so that u_l = u and u_r = u almost everywhere in I (i.e., now, u_l(x) = c + Du((a,x)) and u_r(x) = c + Du((a,x]) with c ∈ ℝ appropriately chosen to have u_l = u_r = u a.e.). We have shown so far the following proposition.
Proposition 1 Every u ∈ BV(I) has a left-continuous and a right-continuous representant¹.

Exercise. Show that |Du|(I) = Var(u_l, I) = Var(u_r, I).
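A concrete example may help (our addition, not in the notes): take I = (0,1) and the step function u = χ_{[1/2,1)}, so that Du = δ_{1/2} and, with the constant c = 0,

```latex
u_l(x) = Du((0,x)) =
  \begin{cases} 0, & x \le 1/2,\\ 1, & x > 1/2, \end{cases}
\qquad
u_r(x) = Du((0,x]) =
  \begin{cases} 0, & x < 1/2,\\ 1, & x \ge 1/2. \end{cases}
% u_l is left-continuous, u_r is right-continuous, and they differ exactly
% on the atom \{1/2\} of Du; here u_r = u everywhere, and
% |Du|(I) = \mathrm{Var}(u_l, I) = \mathrm{Var}(u_r, I) = 1.
```

The two representants differ only on the (countable) set of atoms of Du, as Lemma 1 predicts.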
We now introduce the function

    u̇ = Du/L¹,

which is the Radon–Nikodym derivative of the measure Du with respect to the Lebesgue measure L¹ on I (in particular, u̇ ∈ L¹(I)). The Radon–Nikodym derivation theorem states that for L¹-a.e. x ∈ I,

    u̇(x) = lim_{ρ→0} Du((x−ρ, x+ρ))/(2ρ) = lim_{ρ→0} Du([x−ρ, x+ρ])/(2ρ),

and we can write the measure Du as

    Du = u̇(x) dx + Dˢu

with Dˢu ⊥ L¹, which means that there exists a Borel set E ⊂ I such that |E| = L¹(E) = 0 and |Dˢu|(I ∖ E) = 0. In particular the Radon–Nikodym derivative |Dˢu|/L¹ is zero, so that for L¹-a.e. x ∈ I, lim_{ρ→0} |Dˢu|((x−ρ, x+ρ))/(2ρ) = 0.

¹We recall that a representant of a function u ∈ L¹ is a function ũ a.e. equal to u, or more precisely belonging to the equivalence class of a.e.-equal functions defining u.

Consider now a Lebesgue point x of u̇, i.e., such that lim_{ρ→0} (1/ρ) ∫_{x−ρ}^{x+ρ} |u̇(y) − u̇(x)| dy = 0 (a.e. x ∈ I satisfies this property), and also assume that lim_{ρ→0} |Dˢu|([x−ρ, x+ρ])/(2ρ) = 0. Then:
    lim sup_{ρ↓0} | (u_l(x+ρ) − u_l(x))/ρ − u̇(x) |
        = lim sup_{ρ↓0} | (1/ρ) ( ∫_x^{x+ρ} u̇(y) dy + Dˢu([x, x+ρ)) ) − u̇(x) |
        ≤ lim sup_{ρ↓0} (1/ρ) |Dˢu|([x, x+ρ)) + lim sup_{ρ↓0} (1/ρ) ∫_x^{x+ρ} |u̇(y) − u̇(x)| dy = 0.

In the same way, we can prove that lim sup_{ρ↓0} |(u_l(x) − u_l(x−ρ))/ρ − u̇(x)| = 0, showing that u_l has a (classical) derivative at x, which is u̇(x). We have shown the following proposition.

Proposition 2 The functions u_l and u_r have a derivative a.e. in I, and u_l′(x) = u_r′(x) = u̇(x) for a.e. x ∈ I.
Remark. In a similar way we can show that at a.e. x,

    lim sup_{ρ→0} (1/(2ρ)) ∫_{|y−x|<ρ} |u(y) − u(x) − u̇(x)(y−x)| / |y−x| dy = 0,   (13)

which expresses the fact that u̇(x) is the approximate derivative of u at x. This property will have a generalization in higher dimension.
2.2.3 The jump set and the singular part of Du

Now we will try to describe better the singular part Dˢu of the measure Du, and the set S_u. The first property is the following.

Proposition 3 At every x ∈ I, u⁺(x) = u_l(x) ∨ u_r(x) and u⁻(x) = u_l(x) ∧ u_r(x). In particular, S_u = {x ∈ I : u_l(x) ≠ u_r(x)}.

Remark. By Lemma 1 we deduce that the set S_u is at most countable. In fact, since u_r(x) − u_l(x) = Du({x}), it is the set of the atoms of the measure Du.
Proof. Let us first show that u⁺(x) ≥ u_l(x) for every x ∈ I. Let t < u_l(x): by the left-continuity of u_l there exists ρ > 0 such that x−ρ < y ≤ x ⟹ u_l(y) > t. Therefore {y : u_l(y) > t} ⊃ (x−ρ, x), so that if ρ′ ≤ ρ,

    ρ′ = |(x−ρ′, x)| ≤ |{y : u_l(y) > t} ∩ B_{ρ′}(x)| = |{y : u(y) > t} ∩ B_{ρ′}(x)|,

where the last equality comes from the fact that u = u_l a.e. in I. We deduce that lim inf_{ρ′↓0} |{y : u(y) > t} ∩ B_{ρ′}(x)|/ρ′ ≥ 1, so that (by the definition of u⁺) t ≤ u⁺(x). Thus u_l(x) ≤ u⁺(x). In the same way we get that u_r(x) ≤ u⁺(x).

Conversely, let t > u_l(x) ∨ u_r(x). By left- and right-continuity we know that there exists ρ > 0 such that x−ρ < y < x ⟹ u_l(y) < t and x < y < x+ρ ⟹ u_r(y) < t. As before we deduce this time that lim sup_{ρ′↓0} |{y : u(y) > t} ∩ B_{ρ′}(x)|/ρ′ = 0. Thus, u⁺(x) ≤ u_l(x) ∨ u_r(x). This proves that u⁺ = u_l ∨ u_r on I. The proof of the equality u⁻ = u_l ∧ u_r is identical.
Now, we split the measure Dˢu into two parts, called respectively Ju ("J" for "jumps") and Cu ("C" for "Cantor"):

    Ju = Dˢu ⌊ S_u  and  Cu = Dˢu ⌊ (I ∖ S_u).

(Notice that, since |S_u| = 0 (S_u is finite or countable), Ju is also Du ⌊ S_u.) Since S_u is the set of the atoms of the measure Du, we have

    Ju = Σ_{x∈S_u} Du({x}) δ_x = Σ_{x∈S_u} (u_r(x) − u_l(x)) δ_x.

(δ_x stands for the Dirac mass at x.) This measure represents the jumps of u across its discontinuities. It can also be written as

    Ju = Σ_{x∈S_u} (u⁺(x) − u⁻(x)) ν_u(x) δ_x = (u⁺ − u⁻) ν_u H⁰ ⌊ S_u,   (14)

where ν_u(x) ∈ {−1,+1} represents the direction of the jump of u at x: ν_u(x) = +1 if u_l(x) = u⁻(x), u_r(x) = u⁺(x), so that u is "increasing" at x (u_l(x) < u_r(x)), whereas ν_u(x) = −1 when u_l(x) = u⁺(x), u_r(x) = u⁻(x), meaning u is "decreasing" at x (u_l(x) > u_r(x)). This last expression (14) will be generalized in higher dimension.
Consider now the measure Cu. It has no atoms (i.e., Cu({x}) = 0 for every x ∈ I), since Du({x}) = 0 and Dˢu({x}) = 0 for every x ∈ I ∖ S_u. On the other hand, it is singular with respect to the Lebesgue measure L¹ (i.e., Cu ⊥ L¹). It is called the Cantor part of Du. We will soon show an example of a function u with Du having a Cantor part.

Let us now return for a while to the weak Mumford–Shah functional (7). In one dimension, we can write it

    E(u) = ∫_I |u̇(x)|² dx + H⁰(S_u) + ∫_I |u(x) − g(x)|² dx.

(Here the zero-dimensional Hausdorff measure of S_u is simply the cardinality #S_u of the set S_u.)
With our definitions of u̇(x) and S_u, we see that the weak energy E(u) is correctly defined. However, if we try to find a minimum of E(u) in the class of all functions with bounded variation, we realize that inf{E(u) : u ∈ BV(I)} = 0 and that it is in general not reached! This happens because it is possible to approximate every function in L²(I) (here, g) by BV functions such that S_u = ∅, u̇(x) = 0 a.e., and the whole derivative is the Cantor part (Du = Cu). A typical example of such a function is the "Cantor–Vitali" function, defined as the (uniform) limit of the continuous functions on [0,1]

    u_k(x) = |C_k ∩ [0,x]| / |C_k|,

where C₀ = [0,1] and, for k ≥ 1,

    C_k = C_{k−1} ∖ ∪_{n=1}^{3^{k−1}} ( (3n−2)/3^k , (3n−1)/3^k ),

see Figure 4. The set C = ∩_{k=0}^∞ C_k = lim_k C_k is the Cantor set; it has zero length. The function u is continuous, and u′ = 0 in [0,1] ∖ C, i.e., almost everywhere in (0,1). The derivative Du is entirely supported by the negligible set C, and is therefore singular with respect to the Lebesgue measure. Thus Du = Cu.

Exercise. Show that any function f ∈ L²(0,1) can be approximated in L² norm by a sequence f_n of functions in BV(0,1) with Df_n = Cf_n (ḟ_n = 0, S_{f_n} = ∅ for every n).
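A small numerical experiment (our addition, not part of the notes) with the approximations u_k, using their self-similar recursion: each u_k is continuous and non-decreasing with total variation 1, yet it is locally constant on a set whose measure tends to 1 as k grows.

```python
# Sketch (ours): the k-th Cantor-Vitali approximation via the standard
# self-similar recursion; u_k is constant on every interval removed so far.

def u(x, k):
    """k-th Cantor-Vitali approximation on [0, 1]."""
    if k == 0:
        return x                        # u_0(x) = x
    if x < 1 / 3:
        return u(3 * x, k - 1) / 2      # left third: rescaled half-copy
    if x <= 2 / 3:
        return 0.5                      # removed middle third: constant
    return 0.5 + u(3 * x - 2, k - 1) / 2

# pointwise variation over a uniform grid: stays 1 for every k,
# while the derivative vanishes a.e. in the limit
grid = [i / 729 for i in range(730)]
vals = [u(x, 6) for x in grid]
var = sum(abs(b - a) for a, b in zip(vals, vals[1:]))
print(var)        # approximately 1.0 (u_6 increases monotonically from 0 to 1)
print(u(0.5, 6))  # 0.5
```

The recursion mirrors the set construction: halving on the outer thirds, constant on the removed middle third.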
If we want to minimize E(u), we have to restrict the set of functions we consider. We will therefore introduce a new subspace of BV(I), made of the functions for which Cu is zero.
2.2.4 Special BV functions, in dimension one

Definition. We say that a function u ∈ BV(I) is a special function with bounded variation if Cu = 0, which means that the singular part Dˢu of the distributional derivative Du is concentrated on the jump set S_u. We denote by SBV(I) the space of such functions.

Figure 4: The Cantor-Vitali function (the approximations u_1, u_2 and u_20 are shown).
The main tool to prove the existence of a minimizer for the weak Mumford–Shah energy E is the following compactness and semicontinuity theorem, due to Ambrosio.

Theorem 5 (Ambrosio, one-dimensional version) Let I ⊂ ℝ be an open and bounded interval and (u_j) be a sequence in SBV(I). Suppose that

    sup_j ∫_I u̇_j(x)² dx + H⁰(S_{u_j}) + ‖u_j‖_{L^∞(I)} < +∞.

Then there exist a subsequence (not relabeled) and a function u ∈ SBV(I) such that

    u_j(x) → u(x) a.e. in I,
    u̇_j ⇀ u̇ weakly in L²(I),   (15)
    H⁰(S_u) ≤ lim inf_{j→∞} H⁰(S_{u_j}).
Proof. Consider such a sequence u_j. In what follows we will extract several subsequences from u_j that will all still be denoted by u_j. Remark that since sup_j H⁰(S_{u_j}) = sup_j #S_{u_j} < +∞, there exists an integer k such that k = lim inf_j #S_{u_j}, and we can extract a first subsequence such that #S_{u_j} = k for every j. We let S_{u_j} = {x_j¹, …, x_j^k}, with a < x_j¹ < x_j² < ⋯ < x_j^k < b. Extracting a further subsequence we may assume that each x_j^n converges to some x^n ∈ Ī = [a,b]. For t ≥ 0 we will set I_t = I ∖ ∪_{n=1}^k [x^n − t, x^n + t]. For a fixed ℓ ≥ 1, if j is large enough we have that x_j^n ∉ I_{1/ℓ} for every n = 1,…,k. In this case, u_j ∈ H¹(I_{1/ℓ}) and is uniformly bounded:

    sup_j ∫_{I_{1/ℓ}} |u_j′|² dx = sup_j ∫_{I_{1/ℓ}} |u̇_j|² dx < +∞,  and  sup_j ‖u_j‖_{L^∞(I_{1/ℓ})} < +∞.

We can therefore extract a subsequence such that u_j converges to some function u ∈ H¹(I_{1/ℓ}), uniformly on I_{1/ℓ}, and u̇_j = u_j′ ⇀ u′ weakly in L²(I_{1/ℓ}). Using a diagonal procedure, since ∪_{ℓ≥1} I_{1/ℓ} = I₀, we can in this way build a function u ∈ H¹_loc(I₀) such that u_j → u locally uniformly on I₀ and u̇_j ⇀ u′ weakly in L²_loc(I₀). But since u̇_j is bounded in L²(I) = L²(I₀), we deduce that u̇_j ⇀ u′ weakly in L²(I). In particular, u′ ∈ L²(I) and u ∈ H¹(I ∖ {x¹,…,x^k}) ∩ L^∞(I), so that u ∈ SBV(I), u̇ = u′, and S_u ⊂ {x¹,…,x^k}, showing also that #S_u ≤ k and completing the proof of Theorem 5.
Exercise. Show that u ∈ H¹(I ∖ {x¹,…,x^k}) ∩ L^∞(I) ⟹ u ∈ SBV(I), u̇ = u′, and S_u ⊂ {x¹,…,x^k}.

Exercise. Use Theorem 5 to show that the weak Mumford–Shah energy E has a minimizer in SBV(I).
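In a discrete setting the one-dimensional minimization can even be carried out exactly, at least for the piecewise-constant simplification of E (no gradient term; u is forced to be the mean of g on each segment, and each jump costs α). The following dynamic program over the position of the last jump is our sketch, not an algorithm from the notes:

```python
# Sketch (ours): exact minimizer of the discrete piecewise-constant
# ("Potts") simplification of the 1-D weak Mumford-Shah energy,
#     E(u) = sum_i (u_i - g_i)^2 + alpha * #(jumps of u),
# by dynamic programming over the last jump position.

def potts_segmentation(g, alpha):
    """Return the minimizing u for the discrete piecewise-constant energy."""
    n = len(g)
    s = [0.0]; s2 = [0.0]              # prefix sums for O(1) segment errors
    for x in g:
        s.append(s[-1] + x); s2.append(s2[-1] + x * x)
    def err(i, j):                      # squared deviation from the mean on g[i:j]
        return s2[j] - s2[i] - (s[j] - s[i]) ** 2 / (j - i)
    best = [0.0] * (n + 1)              # best[j] = optimal energy for g[:j]
    cut = [0] * (n + 1)                 # start index of the last segment
    for j in range(1, n + 1):
        best[j], cut[j] = min(
            (best[i] + err(i, j) + (alpha if i > 0 else 0.0), i)
            for i in range(j))
    u, j = [0.0] * n, n                 # backtrack and fill segment means
    while j > 0:
        i = cut[j]
        mean = (s[j] - s[i]) / (j - i)
        for t in range(i, j):
            u[t] = mean
        j = i
    return u

g = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]   # a noisy step
print(potts_segmentation(g, alpha=0.5))
# two constant segments, close to [0, 0, 0, 0, 1, 1, 1, 1]
```

The existence of a minimizer here is trivial (finitely many segmentations); Theorem 5 is what replaces this argument in the continuous setting.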
2.2.5 The general, N-dimensional case

We return to the general case of functions defined on an open set Ω ⊂ ℝ^N, N ≥ 1. If u ∈ BV(Ω), it can be shown that the set S_u is countably (H^{N−1}, N−1)-rectifiable, i.e.,

    S_u = ∪_{i=1}^∞ K_i ∪ N,

where H^{N−1}(N) = 0 and each K_i is a compact subset of a C¹-hypersurface Γ_i. Note that this is a very weak notion of regularity: the set S_u could still be, for instance, dense in Ω.

There exists a Borel function ν_u : S_u → S^{N−1} such that H^{N−1}-a.e. in S_u the vector ν_u(x) is normal to S_u at x, in the sense that it is normal to Γ_i if x ∈ K_i. For every u, v ∈ BV(Ω), we must therefore have ν_u = ±ν_v H^{N−1}-a.e. in S_u ∩ S_v.
As in the one-dimensional case, the derivative Du of every u ∈ BV(Ω) can be decomposed as follows:

    Du = ∇u(x) dx + Ju + Cu = ∇u(x) dx + (u⁺ − u⁻) ν_u H^{N−1} ⌊ S_u + Cu,

where ∇u = Du/L^N, the Radon–Nikodym derivative of Du with respect to the Lebesgue measure L^N, is also the approximate gradient of u, defined a.e. in Ω by

    ap lim_{y→x} ( u(y) − u(x) − ⟨∇u(x), y−x⟩ ) / |y−x| = 0

(remember equation (13)). H^{N−1} ⌊ S_u is the restriction of the (N−1)-dimensional Hausdorff measure to the set S_u, so that Ju = (u⁺−u⁻) ν_u H^{N−1} ⌊ S_u is the "jump" part of the measure Du, which is carried by the discontinuity set of u (compare with equation (14)). Eventually, Cu is the Cantor part of the measure Du, which is singular with respect to the Lebesgue measure and such that |Cu|(E) = 0 for any (N−1)-dimensional set E with H^{N−1}(E) < +∞.

With these definitions of ∇u(x) and S_u, we see here again that the weak energy (7), E(u), is correctly defined. Here again, as in the one-dimensional case, we have inf{E(u) : u ∈ BV(Ω)} = 0 and the infimum is usually not reached. We must consider, as previously, the functions u ∈ BV(Ω) such that Cu is zero.
2.2.6 Special BV functions

Definition. We say that a function u ∈ BV(Ω) is a special function with bounded variation if Cu = 0, which means that the singular part of the distributional derivative Du is concentrated on the jump set S_u. We denote by SBV(Ω) the space of such functions. We also define the space GSBV(Ω) of generalized SBV functions as the set of all measurable functions u : Ω → [−∞,+∞] such that for any k > 0, u_k = (−k ∨ u) ∧ k ∈ SBV(Ω) (where X ∧ Y = min(X,Y) and X ∨ Y = max(X,Y)). (This follows Ambrosio's definition in [2]; notice that sometimes GSBV(Ω) is defined as the space we call hereafter GSBV_loc(Ω), which is the space of functions that belong to GSBV(A) for any open set A ⊂⊂ Ω, i.e., such that Ā is compact and included in Ω.)

If u ∈ GSBV_loc(Ω) ∩ L¹_loc(Ω), u has an approximate gradient a.e. in Ω; moreover, as k ↑ ∞, the function u_k = (−k ∨ u) ∧ k satisfies

    ∇u_k → ∇u a.e. in Ω, and |∇u_k| ↑ |∇u| a.e. in Ω;   (16)
    S_{u_k} ⊂ S_u, H^{N−1}(S_{u_k}) → H^{N−1}(S_u), and ν_{u_k} = ν_u H^{N−1}-a.e. in S_{u_k}.   (17)
2.2.7 Ambrosio's compactness theorem

We mention the following compactness and lower semicontinuity result that was proved in [2]:

Theorem 6 (Ambrosio) Let Ω be an open subset of ℝ^N and let (u_j) be a sequence in GSBV(Ω). Suppose that there exist p ∈ [1,∞] and a constant C such that

    ∫_Ω |∇u_j|² dx + H^{N−1}(S_{u_j}) + ‖u_j‖_{L^p(Ω)} ≤ C < +∞

for every j. Then there exist a subsequence (not relabeled) and a function u ∈ GSBV(Ω) ∩ L^p(Ω) such that

    u_j(x) → u(x) a.e. in Ω,
    ∇u_j ⇀ ∇u weakly in L²(Ω;ℝ^N),   (18)
    H^{N−1}(S_u) ≤ lim inf_{j→∞} H^{N−1}(S_{u_j}).

Moreover,

    ∫_{S_u} |⟨ν_u, ξ⟩| dH^{N−1} ≤ lim inf_{j→∞} ∫_{S_{u_j}} |⟨ν_{u_j}, ξ⟩| dH^{N−1}   (19)

for every ξ ∈ S^{N−1}.

There exist variants of this theorem, with different proofs (see [3, 4, 5]). We need however in these lectures to consider this version, since the conclusion (19) will be useful in order to study the anisotropic variants of the Mumford–Shah functional that appear in the finite-difference discretizations that are common in image processing.
Remark. By a standard diagonalization technique Theorem 6 also holds if u_j and u are only in GSBV_loc(Ω).

In this setting we are now able to show the existence of a minimizer of the weak Mumford–Shah functional of Ambrosio and De Giorgi (cf. section 2.3.1). However, first we end this section on functions with bounded variation with a paragraph about some useful additional properties.
2.2.8 Slicing

We now explain how a (special) bounded variation function can be described, and its properties recovered, from its one-dimensional "slices", i.e., its restrictions to one-dimensional lines. Many results of sections 2.2.2-2.2.4 can be extended to the N-dimensional case using the following properties. In fact, most of Theorem 6 (in the case p = ∞) can be recovered from Theorem 3 and Theorem 5 in this way, the very difficult part being to show that ∇u_j ⇀ ∇u. Many of the following results will be needed in order to study the variational approximations of the Mumford–Shah functional.

We consider for ξ ∈ S^{N−1} the sets ξ⊥ = {x ∈ ℝ^N : ⟨ξ,x⟩ = 0} and, for any z ∈ ξ⊥, Ω_{z,ξ} = {t ∈ ℝ : z + tξ ∈ Ω}. On Ω_{z,ξ} we define a function u_{z,ξ} : Ω_{z,ξ} → [−∞,+∞] by u_{z,ξ}(s) = u(z + sξ). If u ∈ BV(Ω), we have the following classical representation (see for instance [2, 7]): for H^{N−1}-a.e. z ∈ ξ⊥, u_{z,ξ} ∈ BV(Ω_{z,ξ}) and for any Borel set B ⊂ Ω

    ⟨Du, ξ⟩(B) = ⟨Du(B), ξ⟩ = ∫_{ξ⊥} dH^{N−1}(z) Du_{z,ξ}(B_{z,ξ}),

where B_{z,ξ} is defined in the same way as Ω_{z,ξ}; conversely, if u_{z,ξ} ∈ BV(Ω_{z,ξ}) for at least N independent vectors ξ ∈ S^{N−1} and H^{N−1}-a.e. z ∈ ξ⊥, and if

    ∫_{ξ⊥} dH^{N−1}(z) |Du_{z,ξ}|(Ω_{z,ξ}) < +∞,

then u ∈ BV(Ω). Now (see [3, 2]), if u ∈ SBV_loc(Ω), then for almost every z ∈ ξ⊥, u_{z,ξ} ∈ SBV_loc(Ω_{z,ξ}) (the converse is true provided this property is satisfied for at least N independent vectors ξ and u has locally bounded variation), and the approximate derivative satisfies

    u̇_{z,ξ}(s) = ⟨∇u(z + sξ), ξ⟩

for a.e. s ∈ Ω_{z,ξ}; moreover

    S_{u_{z,ξ}} = {s ∈ Ω_{z,ξ} : z + sξ ∈ S_u},
    (u_{z,ξ})^±(s) = u^±(z + sξ)  ∀ s ∈ S_{u_{z,ξ}},

and for any Borel set B ⊂ Ω

    ∫_{ξ⊥} dH^{N−1}(z) H⁰(B_{z,ξ} ∩ S_{u_{z,ξ}}) = ∫_{B∩S_u} |⟨ν_u(x), ξ⟩| dH^{N−1}(x).

The reader interested in knowing more about the space BV and how the results in this section are proved should consult, for instance, the books [6, 34, 36, 42, 62].
2.3 Back to the Mumford-Shah functional

2.3.1 Existence for the weak formulation

Now, in this setting, it is clear that the weak Mumford–Shah functional (7) has a minimum in GSBV(Ω) (which, in fact, is in SBV(Ω)). Indeed, consider a minimizing sequence (u_j)_{j≥1} for the problem inf_u E(u) in GSBV(Ω). Then, this sequence satisfies the conditions of Theorem 6 (with p = 2). Therefore, some subsequence (still denoted by u_j) converges almost everywhere to a function u ∈ GSBV(Ω) with

    ∫_Ω |∇u(x)|² dx ≤ lim inf_{j→∞} ∫_Ω |∇u_j(x)|² dx

(since ∇u_j goes to ∇u weakly in L²(Ω), by (18)),

    H^{N−1}(S_u) ≤ lim inf_{j→∞} H^{N−1}(S_{u_j}),  and  ∫_Ω |u(x) − g(x)|² dx ≤ lim inf_{j→∞} ∫_Ω |u_j(x) − g(x)|² dx

(by Fatou's lemma). Therefore E(u) ≤ lim inf_{j→∞} E(u_j) and u is a minimizer of the weak Mumford–Shah functional. Notice that since g is bounded, we can always replace u with its truncation at level ‖g‖_∞, (−‖g‖_∞ ∨ u(x)) ∧ ‖g‖_∞, and decrease the energy, so that the minimum u has to satisfy ‖u‖_∞ ≤ ‖g‖_∞ and is in SBV(Ω) ∩ L^∞(Ω).

In the next section we will explain how the weak problem is then related to the strong original one (that is, the minimization of E(u,K)).
2.3.2 From the weak to the strong formulation

Once we have proved the existence of a minimizer for the weak Mumford–Shah energy E(u) using Theorem 6, we need to show that it can also be considered as a minimizer for the original energy E(u,K) defined by (6). In arbitrary dimension N the general definition for E is

    E(u,K) = ∫_{Ω∖K} |∇u(x)|² dx + H^{N−1}(K ∩ Ω) + ∫_Ω |u(x) − g(x)|² dx,

where the length has been replaced with the (N−1)-dimensional Hausdorff measure. (We have also dropped the constant weighting parameters.)

The natural way to associate a set K to u ∈ SBV(Ω) is to set K = S̄_u. However, if u is arbitrary, we could have H^{N−1}(K ∩ Ω) > H^{N−1}(S_u). For instance, the function

    v(x) = Σ_{k=1}^∞ 2^{−k} χ_{B_{2^{−k}}(x_k)},

where the sequence (x_k)_{k≥1} is the set of all points in Ω with rational coordinates, is such that S_v = Ω ∩ ∪_{k=1}^∞ ∂B_{2^{−k}}(x_k). This has finite length, but is dense in Ω. Thus H^{N−1}(S̄_v ∩ Ω) = +∞.

A minimizer u of E(u) will be a minimizer of E if and only if we can prove that H^{N−1}(S̄_u ∩ Ω) = H^{N−1}(S_u). (Conversely, it is "simple" to show that if u ∈ H¹(Ω ∖ K) and K is a closed set with H^{N−1}(K) < +∞, then u ∈ SBV(Ω) and H^{N−1}(S_u ∖ K) = 0, that is, S_u is included in K up to an H^{N−1}-negligible set.)

This difficult result was proved by De Giorgi, Carriero and Leaci [29], and independently in dimension N = 2 by Dal Maso, Morel and Solimini [28] (see also the book [48] for a general overview of the problem). They proved that if u minimizes E(u), then H^{N−1}(Ω ∩ S̄_u ∖ S_u) = 0 and u ∈ C¹(Ω ∖ S̄_u), so that (u, S̄_u) minimizes E.
We make here the observation that if we slightly change the problem, introducing an anisotropy in the energy, then these results still hold. Indeed, if we consider a weak functional

    E′(u) = ∫_Ω Q(∇u(x)) dx + ∫_{S_u} N(ν_u(x)) dH^{N−1}(x) + ∫_Ω |u(x) − g(x)|² dx,   (20)

where Q is a positive definite quadratic form on ℝ^N and N is a norm on ℝ^N (a 1-homogeneous convex function with 0 < min_{ν∈S^{N−1}} N(ν) ≤ max_{ν∈S^{N−1}} N(ν) < +∞), then E′ has a minimizer u in GSBV(Ω) (exercise: you need to use inequality (19) in Theorem 6); moreover, it is possible to adapt the proofs in [29] and show that H^{N−1}(Ω ∩ S̄_u ∖ S_u) = 0 and u ∈ C¹(Ω ∖ S̄_u).
2.4 Variational approximations and Γ-convergence

In these lectures we will describe a few ways to approximate the Mumford–Shah problem, or variants of this problem. This has to be done because, numerically, it is difficult to deal with a jump set K. We introduce in this part a special notion of convergence that is adapted to variational problems. As a matter of fact, if you are looking for the minimizer of a function F(x), x ∈ X (where X is some space), and want to approximate it with minimizers (x_n) of approximate problems min_{x∈X} F_n(x), when can you say that (x_n) converges to a minimizer of F? If you consider the classical notions of limits of functions, then only uniform convergence seems suitable to handle this problem. However, this notion of convergence is far too strong for most applications. This motivates the following definition of Γ-convergence, specially invented for studying the limit of variational problems. We will limit ourselves to the case where X is a metric space. For more details we refer to [27].

Given a metric space (X,d) and a sequence of functions F_k : X → [−∞,+∞], we define for every u ∈ X the Γ-lim inf of F_k

    F′(u) = Γ-lim inf_{k→∞} F_k(u) = inf_{u_k→u} lim inf_{k→∞} F_k(u_k)

and the Γ-lim sup of F_k

    F″(u) = Γ-lim sup_{k→∞} F_k(u) = inf_{u_k→u} lim sup_{k→∞} F_k(u_k),

and we say that F_k Γ-converges to F : X → [−∞,+∞] if F′ = F″ = F. The functions F′, F″, and F (if they exist) are lower semicontinuous on X. We have the following two properties:

1. F_k Γ-converges to F if and only if for every u ∈ X,
   (i) for every sequence u_k converging to u, F(u) ≤ lim inf_{k→∞} F_k(u_k);
   (ii) there exists a sequence u_k that converges to u and such that lim sup_{k→∞} F_k(u_k) ≤ F(u).

2. If G : X → ℝ is continuous and F_k Γ-converges to F, then F_k + G Γ-converges to F + G.
The following result makes clear the interest of the notion of Γ-convergence:

Theorem 7 Assume F_k Γ-converges to F and for every k let u_k be a minimizer of F_k over X. Then, if the sequence (or a subsequence) u_k converges to some u ∈ X, u is a minimizer of F and F_k(u_k) converges to F(u).
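A classical worked example (our addition, standard in the Γ-convergence literature) shows how oscillations are handled:

```latex
F_k(x)=\sin(kx),\qquad X=\mathbb{R}.
% For every x there are x_k \to x with \sin(kx_k) = -1
% (take the nearest point of the form (2n\pi - \pi/2)/k, at distance \le 2\pi/k),
% so the \Gamma\text{-}\liminf at x is at most -1; since F_k \ge -1 always,
\Gamma\hbox{-}\lim_{k\to\infty}\sin(kx) = -1 \quad\text{for every } x\in\mathbb{R},
% even though \sin(kx) has no pointwise limit for most x. Minimizers of F_k
% (points where \sin(kx)=-1) can be chosen converging to any given point,
% consistently with Theorem 7: every point minimizes the \Gamma\text{-}limit.
```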
Eventually, we give the following definition of Γ-convergence in the case where (F_h)_{h>0} is a family of functionals on X indexed by a continuous parameter h: we say that F_h Γ-converges to F in X as h ↓ 0 if and only if for every sequence (h_j) that converges to zero as j→∞, F_{h_j} Γ-converges to F.

The reader who would like to know more about Γ-convergence may consult the books [9, 27]. Also, the excellent notes [1] by G. Alberti contain a good introduction to this theory as well as to its applications to phase transition problems, which are very close (at least technically) to the methods and techniques of section 4.
3 The numerical analysis of the total variation min-
imization
3.1 The discrete energy
Let us consider problem (2), in dimension 2, and let us try to find a way to compute a solution. We will discuss the approach studied by Vogel and Oman [60, 61] (see also [24, 31]).
Although it is not absolutely obvious, we will first assume that there exists a Lagrange multiplier $\lambda > 0$ such that problem (2) is equivalent to the problem
$$\min_{u \in BV(\Omega)}\; |Du|(\Omega) + \lambda \int_\Omega |Au(x) - g(x)|^2\, dx \tag{21}$$
(see [24] for details; we must assume here $A1 = 1$, so that a minimizer of (21) automatically satisfies $\int_\Omega Au = \int_\Omega g$, as well as the other assumptions of section 2.1.3). The problem of determining the correct $\lambda$ is also difficult; we will not consider it in this short section.
First we must discretize (21). For simplicity, we assume that $u$ and $g$ are discretized on the same square lattice, $i, j = 1, \dots, L$. (This is the case in some situations, but there exist other common situations, like the reconstruction of tomographic data or zooming, where it is not true.)
Inverse problems in Image processing and Image segmentation 37
The functions $u$ and $g$ are approximated by discrete matrices $U = (U_{i,j})_{1\le i,j\le L}$ and $G = (G_{i,j})_{1\le i,j\le L}$. The term $\lambda \int_\Omega |Au(x) - g(x)|^2\, dx$ is replaced, in the discrete setting, by a term $\lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2$. (We omit the scale factor, but it is important in practical applications.) In the discrete formula, $A$ denotes a linear operator on $\mathbb{R}^N = \mathbb{R}^{L\times L}$ (we set $N = L^2$) and $(AU)_{i,j}$ is the component $i,j$ of $AU$.
There are several ways to approximate $|Du|(\Omega)$. The simplest (which, however, has several drawbacks) is to consider the variation along the horizontal and vertical directions,
$$\sum_{1\le i<L}\;\sum_{1\le j\le L} |U_{i+1,j} - U_{i,j}| \;+\; \sum_{1\le i\le L}\;\sum_{1\le j<L} |U_{i,j+1} - U_{i,j}|$$
(here again we omitted the scale factor). Due to the strong anisotropy of this approximation, the results it gives are not very good (in fact it is an approximation of $|D_1 u|(\Omega) + |D_2 u|(\Omega)$ — where in $\mathbb{R}^2$, $D_i u$ is the derivative of $u$ along the $i$th direction, $i = 1, 2$ — which is a semi-norm on $BV(\Omega)$ that is equivalent to $|Du|(\Omega)$ but not invariant under rotations in the plane), and many authors consider a "more isotropic" approximation of the total variation (see for instance [60, 33, 47]).
Therefore the discrete energy we need to minimize is the following:
$$E(U) = \sum_{i,j} \bigl( |U_{i+1,j} - U_{i,j}| + |U_{i,j+1} - U_{i,j}| \bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2. \tag{22}$$
3.2 The method
Due to the strong nonlinearity of (22) (or rather of its derivative $D_U E$), it is difficult (although feasible) to minimize it by a straightforward gradient descent method. The nonexistence of the derivative of the absolute value $|x|$ at $x = 0$ (a problem that is often overcome by replacing $|x|$ with $\sqrt{\epsilon + x^2}$, with $\epsilon$ a small parameter) is not the only difficulty. Another approach would be to define a dual problem, using convex duality. In the one-dimensional case, and when $A$ is the identity, this leads to a very simple and efficient algorithm, but in other situations it is not very practical. (If you know a bit of convex analysis you may think about it!) For a similar approach see [25].
The solution we will study here is common in the image processing literature (see [10, 15, 38, 39, 40, 60, 61]). It is closely related to the method we will use in section 6, and to the approach in section 4.
It consists in noticing that for every $x \in \mathbb{R}$, $x \neq 0$,
$$|x| = \min_{v>0} \left( \frac{v}{2}x^2 + \frac{1}{2v} \right),$$
the minimum being reached for $v = 1/|x|$. We thus introduce the function $f(x,v) = vx^2/2 + 1/(2v)$, a new field $V = (V_{i+1/2,j})_{1\le i<L,\,1\le j\le L} \cup (V_{i,j+1/2})_{1\le i\le L,\,1\le j<L} \in \mathbb{R}^{(L-1)\times L + L\times(L-1)}_{+}$ (of positive real numbers), and a new energy,
$$F(U,V) = \sum_{i,j} \Bigl( f\bigl(|U_{i+1,j}-U_{i,j}|,\, V_{i+\frac12,j}\bigr) + f\bigl(|U_{i,j+1}-U_{i,j}|,\, V_{i,j+\frac12}\bigr) \Bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2$$
$$= \sum_{i,j} \left( \frac{V_{i+\frac12,j}}{2}|U_{i+1,j}-U_{i,j}|^2 + \frac{V_{i,j+\frac12}}{2}|U_{i,j+1}-U_{i,j}|^2 + \frac{1}{2V_{i+\frac12,j}} + \frac{1}{2V_{i,j+\frac12}} \right) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2,$$
and we notice that
$$\min_V F(U,V) = E(U),$$
the minimum being reached for $V_{i+1/2,j} = 1/|U_{i+1,j}-U_{i,j}|$ (or at $+\infty$ if $U_{i+1,j} = U_{i,j}$) and $V_{i,j+1/2} = 1/|U_{i,j+1}-U_{i,j}|$. We choose some starting values $U^0, V^0$ and compute for every $n \ge 1$
$$U^n = \arg\min_U F(U, V^{n-1}), \quad\text{and}\quad V^n = \arg\min_V F(U^n, V).$$
The idea is that as $n$ becomes large, $U^n$ will converge to the minimizer of (22). This is actually true if we slightly modify the algorithm (and the function $E$ we minimize).
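The half-quadratic identity $|x| = \min_{v>0}(vx^2/2 + 1/(2v))$ above is easy to check numerically; a quick sketch (ours, not part of the notes):

```python
import numpy as np

def f(x, v):
    # f(x, v) = v*x^2/2 + 1/(2v), the half-quadratic surrogate of |x|
    return 0.5 * v * x**2 + 0.5 / v

for x in [0.3, 1.0, -2.5]:
    # the minimum over v > 0 is attained at v = 1/|x| and equals |x|
    assert abs(f(x, 1.0 / abs(x)) - abs(x)) < 1e-12
    # a brute-force grid minimum over v agrees up to discretization error
    v_grid = np.linspace(1e-3, 100.0, 1_000_000)
    assert abs(f(x, v_grid).min() - abs(x)) < 1e-4
```
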
We choose an $\varepsilon > 0$ and introduce the convex closed set $K_\varepsilon = \{V : \varepsilon \le V_{i+1/2,j} \le 1/\varepsilon \text{ and } \varepsilon \le V_{i,j+1/2} \le 1/\varepsilon \ \forall i,j\}$ in $\mathbb{R}^M$ ($M = (L-1)\times L + L\times(L-1)$). We define a new energy $E_\varepsilon(U) = \min_{V \in K_\varepsilon} F(U,V)$. It is possible to show that as $\varepsilon$ becomes small, the minimizer of $E_\varepsilon$ approaches the minimizer of $E$. Moreover, it is easy to compute $E_\varepsilon$ explicitly:
$$E_\varepsilon(U) = \sum_{i,j} \bigl( j_\varepsilon(U_{i+1,j}-U_{i,j}) + j_\varepsilon(U_{i,j+1}-U_{i,j}) \bigr) + \lambda \sum_{i,j} |(AU)_{i,j} - g_{i,j}|^2,$$
where
$$j_\varepsilon(x) = \min_{\varepsilon \le v \le 1/\varepsilon} f(x,v) = \begin{cases} \dfrac{x^2}{2\varepsilon} + \dfrac{\varepsilon}{2} & \text{if } |x| \le \varepsilon, \\[4pt] |x| & \text{if } \varepsilon \le |x| \le \dfrac{1}{\varepsilon}, \\[4pt] \dfrac{\varepsilon}{2}x^2 + \dfrac{1}{2\varepsilon} & \text{if } |x| \ge \dfrac{1}{\varepsilon}. \end{cases}$$
Define $\varphi_\varepsilon(x) = (\varepsilon \vee 1/|x|) \wedge 1/\varepsilon$ ($= 1/|x|$ if $\varepsilon \le |x| \le 1/\varepsilon$, $1/\varepsilon$ if $|x| \le \varepsilon$, and $\varepsilon$ if $|x| \ge 1/\varepsilon$). Then $\varphi_\varepsilon(x)$ is the unique value in $[\varepsilon, 1/\varepsilon]$ such that $j_\varepsilon(x) = f(x, \varphi_\varepsilon(x))$. We deduce that the unique $V \in K_\varepsilon$ for which $E_\varepsilon(U) = \min_{K_\varepsilon} F(U,\cdot) = F(U,V)$ is given by $V_{i+\frac12,j} = \varphi_\varepsilon(U_{i+1,j}-U_{i,j})$ and $V_{i,j+\frac12} = \varphi_\varepsilon(U_{i,j+1}-U_{i,j})$ for every $i,j$. In this case we set $\varphi_\varepsilon(U) = V$, and this defines a continuous function $\varphi_\varepsilon : \mathbb{R}^N \to K_\varepsilon \subset \mathbb{R}^M$.
The algorithm now consists in computing for every $n \ge 1$, the starting values $U^0, V^0$ being chosen,
$$U^n = \arg\min_U F(U, V^{n-1}), \quad\text{and}\quad V^n = \arg\min_{V \in K_\varepsilon} F(U^n, V) = \varphi_\varepsilon(U^n).$$
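A minimal sketch of this modified algorithm in the denoising case ($A$ = identity) follows; the grid size, the parameter values, and the dense linear solve for the $U$-step are our own illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
L, lam, eps = 16, 8.0, 1e-3

# forward-difference operators on the L x L grid (row-major flattening)
B  = np.eye(L)[1:] - np.eye(L)[:-1]        # 1-D forward difference, (L-1) x L
Dv = np.kron(B, np.eye(L))                 # differences U[i+1,j] - U[i,j]
Dh = np.kron(np.eye(L), B)                 # differences U[i,j+1] - U[i,j]

def phi_eps(x):
    # minimizer of f(x, .) over [eps, 1/eps]: clamp 1/|x| to the box
    return np.clip(1.0 / np.maximum(np.abs(x), 1e-300), eps, 1.0 / eps)

def j_eps(x):
    # j_eps(x) = min over eps <= v <= 1/eps of f(x, v), piecewise formula
    ax = np.abs(x)
    return np.where(ax <= eps, x**2 / (2*eps) + eps/2,
           np.where(ax >= 1/eps, eps * x**2 / 2 + 1/(2*eps), ax))

def E_eps(U):
    return (j_eps(Dv @ U).sum() + j_eps(Dh @ U).sum()
            + lam * ((U - g)**2).sum())

# noisy piecewise-constant test image (our synthetic datum)
g = np.zeros((L, L)); g[:, L//2:] = 1.0
g = (g + 0.1 * rng.standard_normal((L, L))).ravel()

U, energies = g.copy(), []
for n in range(30):
    Vv, Vh = phi_eps(Dv @ U), phi_eps(Dh @ U)            # V-step (explicit)
    M = Dv.T * Vv @ Dv + Dh.T * Vh @ Dh + 2*lam*np.eye(L*L)
    U = np.linalg.solve(M, 2*lam*g)                      # U-step: D_U F = 0
    energies.append(E_eps(U))

# the sequence E_eps(U^n) is non-increasing, as Lemma 4 below guarantees
assert all(energies[k+1] <= energies[k] + 1e-9 for k in range(len(energies)-1))
```

Each iteration performs the exact $V$-step via $\varphi_\varepsilon$ and the exact $U$-step by solving the linear system $D_U F(U, V^{n-1}) = 0$, so the energy decrease of Lemma 4 can be observed numerically.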
3.3 Proof of the convergence of the algorithm
We assume (as in the continuous formulation) that $A 1_N = 1_N$, where $1_N$ is the vector in $\mathbb{R}^N$ defined by $(1_N)_{i,j} = 1$ for every $1 \le i,j \le L$ (remember $N = L \times L$ is the dimension of the space where $U$ lives). Then we have the following proposition.

Proposition 4 There exist $\bar U$, $\bar V = \varphi_\varepsilon(\bar U)$ such that as $n \to \infty$, $U^n \to \bar U$ and $V^n \to \bar V$, and $\bar U$ is a (the) minimizer of $E_\varepsilon$.
Proof. First we claim that the following holds.

Lemma 3 There exist $0 < \alpha < \beta < +\infty$ such that the second derivatives $D^2_{UU}F$ and $D^2_{VV}F$ satisfy
$$\alpha I_N \le D^2_{UU}F(U,V) \le \beta I_N \quad\text{and}\quad \alpha I_M \le D^2_{VV}F(U,V) \le \beta I_M$$
for every $U \in \mathbb{R}^N$ and $V \in K_\varepsilon$.

This is equivalent to saying that for every $U \in \mathbb{R}^N$, $V \in K_\varepsilon$, $\xi \in \mathbb{R}^N$ and $\eta \in \mathbb{R}^M$,
$$\alpha|\xi|^2 \le \langle D^2_{UU}F(U,V)\xi, \xi\rangle \le \beta|\xi|^2 \quad\text{and}\quad \alpha|\eta|^2 \le \langle D^2_{VV}F(U,V)\eta, \eta\rangle \le \beta|\eta|^2.$$
Here $\alpha$ and $\beta$ both depend on $\varepsilon$.
Proof. We will leave to the reader the proof of three of the inequalities of the lemma, and will prove the first one, which is the most difficult. We first recall the following "Poincaré inequality" (in finite dimension): there exists a constant $c > 0$ such that for every $\xi \in \mathbb{R}^N = \mathbb{R}^{L\times L}$ such that $\sum_{i,j}\xi_{i,j} = 0$,
$$\sum_{1\le i,j\le L} |\xi_{i,j}|^2 \;\le\; c\left( \sum_{1\le i<L,\,j} |\xi_{i+1,j}-\xi_{i,j}|^2 + \sum_{i,\,1\le j<L} |\xi_{i,j+1}-\xi_{i,j}|^2 \right). \tag{23}$$
Exercise. Prove this inequality. Hint: suppose it is not true, and consider a sequence $\xi^n$ such that for every $n$, $\sum_{i,j}\xi^n_{i,j} = 0$ and $1 = \sum_{i,j}|\xi^n_{i,j}|^2 \ge n\sum_{i,j}\bigl(|\xi^n_{i+1,j}-\xi^n_{i,j}|^2 + |\xi^n_{i,j+1}-\xi^n_{i,j}|^2\bigr)$. Then, if $\xi$ is the limit of a subsequence $\xi^{n_k}$ of $\xi^n$, find a contradiction on $\xi$.
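Numerically, the best constant $c$ in (23) is $1/\lambda_2$, where $\lambda_2$ is the smallest nonzero eigenvalue of the grid Laplacian $D^\top D$ built from the two difference operators. A quick illustration (ours, not from the notes):

```python
import numpy as np

L = 8
B  = np.eye(L)[1:] - np.eye(L)[:-1]          # 1-D forward difference
Dv = np.kron(B, np.eye(L))                   # vertical differences
Dh = np.kron(np.eye(L), B)                   # horizontal differences
D  = np.vstack([Dv, Dh])

# the graph Laplacian D^T D has kernel = constant vectors; the best
# constant in (23) is the inverse of its smallest nonzero eigenvalue
lam = np.linalg.eigvalsh(D.T @ D)
assert lam[0] < 1e-10                        # constant vector: eigenvalue 0
c_best = 1.0 / lam[1]

# check (23) with c = c_best on random zero-mean fields xi
rng = np.random.default_rng(1)
for _ in range(100):
    xi = rng.standard_normal(L * L)
    xi -= xi.mean()                          # enforce sum_{i,j} xi = 0
    assert (xi**2).sum() <= c_best * ((D @ xi)**2).sum() + 1e-9
```
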
Notice that for every $U$, $V \in K_\varepsilon$ and $\xi \in \mathbb{R}^N$,
$$\langle D^2_{UU}F(U,V)\xi, \xi\rangle = \sum_{i,j}\Bigl( V_{i+\frac12,j}|\xi_{i+1,j}-\xi_{i,j}|^2 + V_{i,j+\frac12}|\xi_{i,j+1}-\xi_{i,j}|^2 \Bigr) + |A\xi|^2$$
$$\ge\; \varepsilon\sum_{i,j}\Bigl( |\xi_{i+1,j}-\xi_{i,j}|^2 + |\xi_{i,j+1}-\xi_{i,j}|^2 \Bigr) + |A\xi|^2.$$
In particular, letting $m(\xi) = (1/N)\sum_{i,j}\xi_{i,j}$ be the average of $\xi$, we have (since $A1_N = 1_N$)
$$\sqrt{\langle D^2_{UU}F(U,V)\xi, \xi\rangle} \;\ge\; |A\xi| = |A(\xi - m(\xi)1_N) + m(\xi)1_N| \;\ge\; |m(\xi)1_N| - |A|\,|\xi - m(\xi)1_N|.$$
But by (23), $|\xi - m(\xi)1_N|^2 \le c\sum_{i,j}\bigl(|\xi_{i+1,j}-\xi_{i,j}|^2 + |\xi_{i,j+1}-\xi_{i,j}|^2\bigr) \le (c/\varepsilon)\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle$, therefore
$$|m(\xi)1_N| \;\le\; c\sqrt{\langle D^2_{UU}F(U,V)\xi, \xi\rangle}$$
(here $c$ denotes any positive constant that does not depend on $U, V, \xi$). Moreover, using again (23), $c\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle \ge |\xi - m(\xi)1_N|^2$. Since $1_N$ and $\xi - m(\xi)1_N$ are orthogonal, we deduce that $|\xi|^2 \le c\,\langle D^2_{UU}F(U,V)\xi, \xi\rangle$.
Remark. Observe that this proof is essentially identical to the proof of the coerciveness of the energy in section 2.1.3.
We next prove the following lemma.
Lemma 4 For every $n \ge 1$,
$$E_\varepsilon(U^{n-1}) - E_\varepsilon(U^n) \;\ge\; \frac{\alpha}{2}\Bigl( |U^{n-1}-U^n|^2 + |V^{n-1}-V^n|^2 \Bigr).$$
Proof. For every $n \ge 1$, we have $D_U F(U^n, V^{n-1}) = 0$, while
$$\langle D_V F(U^n, V^n), V - V^n\rangle \ge 0$$
for every $V \in K_\varepsilon$. We deduce that (using Lemma 3)
$$F(U^n, V^{n-1}) = F(U^n, V^n) + \bigl\langle D_V F(U^n, V^n),\, V^{n-1}-V^n \bigr\rangle$$
$$\qquad + \int_0^1 (1-t)\,\bigl\langle D^2_{VV}F\bigl(U^n,\, V^n + t(V^{n-1}-V^n)\bigr)(V^{n-1}-V^n),\, V^{n-1}-V^n \bigr\rangle\, dt$$
$$\ge\; F(U^n, V^n) + \frac{\alpha}{2}|V^{n-1}-V^n|^2.$$
In a similar way we prove that $F(U^{n-1}, V^{n-1}) \ge F(U^n, V^{n-1}) + \frac{\alpha}{2}|U^{n-1}-U^n|^2$. Since $E_\varepsilon(U^n) = F(U^n, V^n)$, the lemma is proved.
Since by construction the sequence $E_\varepsilon(U^n) = F(U^n, V^n)$ is nonincreasing and bounded from below, it must have a limit in $\mathbb{R}_+$ and $E_\varepsilon(U^{n-1}) - E_\varepsilon(U^n) \to 0$; therefore $U^{n-1}-U^n$ and $V^{n-1}-V^n$ go to zero as $n \to \infty$.

Now, we notice that $E_\varepsilon$ is coercive, which means that for every $c > 0$ the set $\{E_\varepsilon \le c\}$ is bounded in $\mathbb{R}^N$ (and closed), hence compact (this can be deduced from Lemma 3). Thus we may extract a subsequence $U^{n_k}$ and find a $\bar U \in \mathbb{R}^N$ such that as $k \to \infty$, $U^{n_k} \to \bar U$. By continuity, $V^{n_k} = \varphi_\varepsilon(U^{n_k}) \to \varphi_\varepsilon(\bar U)$, and we let $\bar V = \varphi_\varepsilon(\bar U)$. We also have $D_U F(U^{n_k}, V^{n_k-1}) = 0$, and since $V^{n_k-1} - V^{n_k} \to 0$ by Lemma 4, $V^{n_k-1} \to \bar V$, so that by continuity $D_U F(\bar U, \bar V) = 0$.
We conclude using the following lemma.

Lemma 5 Let $\bar U, \bar V$ satisfy $D_U F(\bar U, \bar V) = 0$ and $\bar V = \arg\min_{V \in K_\varepsilon} F(\bar U, V) = \varphi_\varepsilon(\bar U)$. Then $D_U E_\varepsilon(\bar U) = 0$.
Proof. Let $h \in \mathbb{R}^N$ and $t > 0$. Letting $V_t = \varphi_\varepsilon(\bar U + th)$ (which goes to $\bar V$ as $t \downarrow 0$), we have
$$E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U) = F\bigl(\bar U + th,\, \varphi_\varepsilon(\bar U + th)\bigr) - F(\bar U, \bar V) = \bigl(F(\bar U + th, V_t) - F(\bar U, V_t)\bigr) + \bigl(F(\bar U, V_t) - F(\bar U, \bar V)\bigr).$$
Since $V_t \in K_\varepsilon$, $F(\bar U, V_t) \ge F(\bar U, \bar V)$, so that $E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U) \ge F(\bar U + th, V_t) - F(\bar U, V_t)$. Since
$$F(\bar U + th, V_t) - F(\bar U, V_t) = t\,\langle D_U F(\bar U, V_t), h\rangle + \int_0^t (t-s)\,\langle D^2_{UU}F(\bar U, V_t)h, h\rangle\, ds$$
and $\int_0^t (t-s)\,\langle D^2_{UU}F(\bar U, V_t)h, h\rangle\, ds \le \beta t^2|h|^2/2$, we deduce that
$$\langle D_U E_\varepsilon(\bar U), h\rangle = \lim_{t\downarrow 0} \frac{E_\varepsilon(\bar U + th) - E_\varepsilon(\bar U)}{t} \;\ge\; \langle D_U F(\bar U, \bar V), h\rangle = 0.$$
Since $h$ is arbitrary, $D_U E_\varepsilon(\bar U) = 0$.
Since $E_\varepsilon$ is ($C^1$ and) strictly convex² (meaning that for every $U, U'$ and $0 < \lambda < 1$, $E_\varepsilon(\lambda U + (1-\lambda)U') < \lambda E_\varepsilon(U) + (1-\lambda)E_\varepsilon(U')$ unless $U = U'$), it has a unique minimizer, characterized by the equation $D_U E_\varepsilon = 0$. We deduce that $\bar U$ is the unique minimizer of $E_\varepsilon$. This completes the proof of Proposition 4, since by uniqueness of this minimizer any subsequence of $(U^n)$ must converge to the same value $\bar U$, so that the whole sequence $(U^n)$ converges to $\bar U$.
Exercise. Prove that as $\varepsilon \to 0$, $E_\varepsilon$ $\Gamma$-converges to $E$, so that the minimizer of $E_\varepsilon$ tends to the minimizer of $E$.
3.4 Two examples
Figure 5: A noisy image and the reconstruction.
We just show two examples of image denoising (i.e., $A$ is the identity matrix) obtained with this method (these are taken from [24]). The first one (Fig. 5) represents the reconstruction of a piecewise constant image to which a Gaussian noise (of standard deviation 60, for values between 0 and 255) has been added. On the left, the noisy image is presented, while on the right the reconstruction with an appropriate value of $\lambda$ is shown. The second example (Fig. 6) shows a "true picture" that has been corrupted by some Gaussian noise (here the standard deviation is approximately 30, again for values between 0 and 255). The reconstruction (on the right) is less good,
²We have to assume that $A$ is injective. Otherwise, $\bar U$ is still a minimizer of $E_\varepsilon$ but might not be unique.
Figure 6: Another example of total variation minimization.
since the total variation tends to favor piecewise constant images, so that the result is a bit "blocky".
4 The numerical analysis of the Mumford–Shah problem (I)
4.1 Ambrosio and Tortorelli's approximate energy
Now we will describe the first attempt that was made to provide an approximation of the Mumford–Shah functional by simpler (elliptic) variational problems. This result is due to Ambrosio and Tortorelli (see [7] and [8]). They proposed to replace the set $K$ by a function $v(x)$, and to design energies depending on a scale parameter $\varepsilon$, so that as $\varepsilon$ goes to zero the function $1 - v(x)$, in some sense, becomes an approximation of the characteristic function of the discontinuity set $K$.
Their approximation $F_\varepsilon(u,v)$, defined over the space $L^2(\Omega) \times L^2(\Omega)$, is the following:
$$F_\varepsilon(u,v) = \int_\Omega (v(x)^2 + k_\varepsilon)|\nabla u(x)|^2\, dx + \int_\Omega \varepsilon|\nabla v(x)|^2 + \frac{(1-v(x))^2}{4\varepsilon}\, dx + \int_\Omega |u(x)-g(x)|^2\, dx \tag{24}$$
for $u, v \in H^1(\Omega)$, and they set $F_\varepsilon(u,v) = +\infty$ if $u$ or $v$ is in $L^2(\Omega) \setminus H^1(\Omega)$. The parameter $k_\varepsilon > 0$ is needed in order to ensure that for $\varepsilon > 0$, $F_\varepsilon$ is coercive in $H^1(\Omega) \times H^1(\Omega)$ (i.e., greater than a constant times the norm of $(u,v)$ in this space). It has to go to zero faster than $\varepsilon$ as $\varepsilon$ goes to zero (i.e., $\lim_{\varepsilon\downarrow 0} k_\varepsilon/\varepsilon = 0$); the reason for this will be made clear in the proof.
In order to show that $F_\varepsilon$ $\Gamma$-converges to an energy such as the weak Mumford–Shah energy $E$, they need to redefine the latter on the same space as $F_\varepsilon$, that is, $L^2(\Omega) \times L^2(\Omega)$. To do this, they use the fact that as $\varepsilon$ goes to zero, they want $v$ to become almost everywhere equal to 1, and thus they set, for every $u, v \in L^2(\Omega)$:
$$F(u,v) = \begin{cases} \displaystyle\int_\Omega |\nabla u(x)|^2\, dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega (u(x)-g(x))^2\, dx & \text{if } u \in GSBV(\Omega) \text{ and } v = 1 \text{ a.e. in } \Omega, \\[4pt] +\infty & \text{otherwise.} \end{cases}$$
Notice that $F(u,v)$ is just $E(u)$ when $v = 1$ a.e., and $+\infty$ otherwise. Then, they are able to show the following theorem.

Theorem 8 (Ambrosio–Tortorelli) As $\varepsilon$ goes to zero, $F_\varepsilon$ $\Gamma$-converges to $F$.

In particular, this means that for small $\varepsilon$, the minimizers of $F_\varepsilon$ will be close to minimizers of $F$.
This approximation has been used in efforts to compute image segmentations and for other applications (see [16, 37, 52], which are based on a finite-element version of Theorem 8 established by Bellettini and Coscia [13], and [57, 56, 58, 59], where in particular Shah considers the gradient flow of $F_\varepsilon(u,v)$ and of similar energies). It works quite well; however, it has the feature that the approximation of the Mumford–Shah functional will be correct only if the discretization step (the pixels' size) is much smaller than the scale parameter $\varepsilon$. This is not very convenient for most applications. For this reason, we will study in the following sections the problem from a different point of view, considering the original discrete energies ($E(U,L)$ or $E(U)$) and explaining how one can find the energy they approximate in the continuous setting (which will be a variant of the Mumford–Shah energy).
We quickly explain in one dimension how the proof of Theorem 8 goes. In dimension greater than one, the proof is obtained mostly through a localization and a slicing argument, which we will briefly mention in the subsequent section.
4.2 Sketch of the proof of Ambrosio and Tortorelli's theorem,
in dimension one
Basically, in order to show Theorem 8, you have to choose any sequence $(\varepsilon_j)_{j\ge1}$ of positive numbers with $\lim_{j\to\infty}\varepsilon_j = 0$, and to prove that for any $(u,v)$:

i) if $(u_j, v_j)$ goes to $(u,v)$ (in $L^2$-norm) as $j \to \infty$, then $F(u,v) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$, and

ii) there exists $(u_j, v_j)$ that converges to $(u,v)$ such that $\limsup_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) \le F(u,v)$.
4.2.1 Proof of (i)
To prove (i), we consider a sequence $(u_j, v_j)$. We can of course assume that $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) < +\infty$ (otherwise there is nothing to prove), and we can extract a subsequence (unless really needed — i.e., if we need to keep track of the original sequence or compare two different subsequences — we will always denote subsequences like the original sequence) such that $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j) = \lim_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. In particular we have $c = \sup_j F_{\varepsilon_j}(u_j, v_j) < +\infty$, thus $\int_\Omega (1-v_j(x))^2\, dx \le 4c\varepsilon_j \to 0$ as $j \to \infty$. Then, since $v_j$ goes to 1 in $L^2(\Omega)$, we must have $v = 1$ a.e. We then just need to show that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. Since it is clear that $\int_\Omega (u_j(x)-g(x))^2\, dx$ goes to $\int_\Omega (u(x)-g(x))^2\, dx$ as $j \to \infty$, we have to show that $u \in GSBV(\Omega)$ and estimate the other two terms, $\mathcal{H}^0(S_u) = \#S_u$ (notice that the measure $\mathcal{H}^0(S_u)$ is the cardinality of the set $S_u$, which we also denote by $\#S_u$), and $\int_\Omega \dot u(x)^2\, dx$.
Now let $\Sigma$ be the set of points $x \in \Omega$ such that for every $\delta > 0$ (small, so that $(x-\delta, x+\delta) \subset \Omega$), $u \notin H^1(x-\delta, x+\delta)$. We will show that this set is finite. Indeed, choose $x_1, \dots, x_k \in \Sigma$ with $x_1 < x_2 < \dots < x_k$. Choose also $\delta$ such that $x_i + \delta < x_{i+1} - \delta$ for every $i = 1, \dots, k-1$ and $(x_i - \delta, x_i + \delta) \subset \Omega$ for every $i$.

Then, we must have for every $i$ that $\limsup_{j\to\infty} \inf_{B_{\delta/2}(x_i)} v_j = 0$. Otherwise, there exist $i$ and a subsequence $(u_{j_k}, v_{j_k})$ such that $\lim_{k\to\infty} \inf_{B_{\delta/2}(x_i)} v_{j_k} = \theta > 0$, and $v_{j_k} \ge \theta/2$ in $B_{\delta/2}(x_i)$ for $k$ large enough. But then, $\int_{x_i-\delta/2}^{x_i+\delta/2} (v_{j_k}^2 + k_{\varepsilon_{j_k}})\, u_{j_k}'(x)^2\, dx \le c$ implies that $\int_{x_i-\delta/2}^{x_i+\delta/2} u_{j_k}'(x)^2\, dx \le 4c/\theta^2$, so that $u_{j_k}$ is uniformly bounded in $H^1(x_i-\delta/2, x_i+\delta/2)$. In this case, its limit $u$ also has to be in $H^1(x_i-\delta/2, x_i+\delta/2)$, and this is in contradiction with the choice of $x_i \in \Sigma$.
Now, for every $i$ we know that $\limsup_{j\to\infty} \inf_{B_{\delta/2}(x_i)} v_j = 0$. We also know that $v_j \to 1$ in $L^2(\Omega)$. Therefore if we fix $i$ and choose $\theta > 0$ small, there will exist for $j$ large enough a point $x_i(j) \in (x_i-\delta/2, x_i+\delta/2)$ with $v_j(x_i(j)) < \theta$, and $x_i'(j) \in (x_i-\delta, x_i-\delta/2)$ and $x_i''(j) \in (x_i+\delta/2, x_i+\delta)$ with $v_j(x_i'(j)) > 1-\theta$ and $v_j(x_i''(j)) > 1-\theta$. We then have (using the fact that $A^2 + B^2 \ge 2AB$)
$$\int_{x_i-\delta}^{x_i+\delta} \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \;\ge\; \int_{x_i-\delta}^{x_i+\delta} |v_j'(x)|\,|1-v_j(x)|\, dx$$
$$\ge\; \int_{x_i'(j)}^{x_i(j)} |1-v_j(x)|\,|v_j'(x)|\, dx + \int_{x_i(j)}^{x_i''(j)} |1-v_j(x)|\,|v_j'(x)|\, dx \;\ge\; \frac{(1-\theta)^2 - \theta^2}{2} + \frac{(1-\theta)^2 - \theta^2}{2} = 1 - 2\theta.$$
In particular, summing over $i = 1, \dots, k$ (the intervals $(x_i-\delta, x_i+\delta)$ are disjoint), we get that for $j$ large enough, $\int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \ge (1-2\theta)k$.
Since this is valid for an arbitrary finite subset $\{x_1, \dots, x_k\}$ of $\Sigma$, it shows that $\#\Sigma < +\infty$, and more precisely,
$$\#\Sigma\,(1-2\theta) \;\le\; \liminf_{j\to\infty} \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx \;\le\; c = \sup_j F_{\varepsilon_j}(u_j, v_j) < +\infty.$$
This is true for every $\theta > 0$, so that eventually
$$\#\Sigma \;\le\; \liminf_{j\to\infty} \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx$$
holds.
Now, by the definition of $\Sigma$, $u \in H^1_{loc}(\Omega\setminus\Sigma)$, and we need to find an estimate for $\int_{\Omega\setminus\Sigma} u'(x)^2\, dx$. Indeed, if we knew that $\int_{\Omega\setminus\Sigma} u'(x)^2\, dx < \infty$ (i.e., $u \in H^1(\Omega\setminus\Sigma)$), then it would yield $u \in SBV(\Omega)$ and $S_u = \Sigma$, $\dot u = u'$.

Notice that the proof we have just written could easily be transformed to yield the following lemma:

Lemma 6 Let $\theta > 0$. Then, for every $\delta > 0$, there exists $J$ such that for every $j \ge J$, if $x_1 < x_2 < \dots < x_k$ are such that $v_j(x_i) < \theta$ and $x_{i+1} - x_i > \delta$, then $k \le c/(1-2\theta)$.
We leave the proof of this lemma to the reader (use the same arguments as in the proof above, after having chosen $J$ such that if $j \ge J$, $|\{x : v_j(x) < 1-\theta\}| < \delta$).

In this case, once $\theta > 0$ is chosen, we can choose $\delta > 0$ and select for all $j \ge J$ a maximal set $x_1(j) < x_2(j) < \dots < x_{k(j)}(j)$, with $x_{i+1}(j) - x_i(j) > \delta$ and $v_j(x_i(j)) < \theta$, and we have $k(j) \le c/(1-2\theta)$. Therefore, there exist $k \le c/(1-2\theta)$, a subsequence $(u_{j_l}, v_{j_l})$, and $k$ points $x_1 < x_2 < \dots < x_k$, such that $k(j_l) = k$ for all $l$ and $x_i(j_l) \to x_i$ as $l \to \infty$ for every $i = 1, \dots, k$. If $l$ is large enough we thus have (by the maximality of the set $\{x_1(j), \dots, x_{k(j)}(j)\}$) that $v_{j_l} \ge \theta$ in the open set $\Omega_\delta = \Omega \setminus \bigcup_{i=1}^k [x_i - 2\delta, x_i + 2\delta]$, so that $\int_{\Omega_\delta} u_{j_l}'(x)^2\, dx \le c/\theta^2$, and since $u_{j_l}$ goes to $u$ in $L^2(\Omega)$ we know that this implies the convergence of $u_{j_l}'$ to $u'$ weakly in $L^2(\Omega_\delta)$.

If $w \in L^2(\Omega_\delta)$, the functions $w\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}^2}$ go to $w$ strongly in $L^2(\Omega_\delta)$, since $v_{j_l} \to 1$ in $L^2(\Omega)$.
Thus,
$$\lim_{l\to\infty} \int_{\Omega_\delta} \sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2}\; u_{j_l}'(x)\, w(x)\, dx = \lim_{l\to\infty} \int_{\Omega_\delta} u_{j_l}'(x)\left( w(x)\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2} \right) dx = \int_{\Omega_\delta} u'(x)\, w(x)\, dx,$$
and $\sqrt{k_{\varepsilon_{j_l}} + v_{j_l}(x)^2}\; u_{j_l}'(x)$ goes weakly to $u'$ in $L^2(\Omega_\delta)$. This yields
$$\int_{\Omega_\delta} u'(x)^2\, dx \;\le\; \liminf_{l\to\infty} \int_{\Omega_\delta} \bigl(k_{\varepsilon_{j_l}} + v_{j_l}(x)^2\bigr)\, u_{j_l}'(x)^2\, dx \;\le\; c < +\infty.$$
Since $\delta$ is arbitrary, we can deduce that
$$\int_\Omega u'(x)^2\, dx \;\le\; \liminf_{j\to\infty} \int_\Omega \bigl(k_{\varepsilon_j} + v_j(x)^2\bigr)\, u_j'(x)^2\, dx,$$
and in particular $u \in H^1(\Omega\setminus\Sigma)$. Therefore $u \in SBV(\Omega)$, $S_u = \Sigma$, $\dot u = u'$, and
$$\int_\Omega \dot u(x)^2\, dx + \#\Sigma + \int_\Omega (u(x)-g(x))^2\, dx \;\le\; \liminf_{j\to\infty}\left[ \int_\Omega \bigl(k_{\varepsilon_j} + v_j(x)^2\bigr)\, u_j'(x)^2\, dx + \int_\Omega \varepsilon_j v_j'(x)^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j}\, dx + \int_\Omega (u_j(x)-g(x))^2\, dx \right],$$
which was the thesis we wanted to prove, and (i) is true.
4.2.2 Proof of (ii)
To prove (ii), we consider $u \in SBV(\Omega)$ with $E(u) = F(u,1) < +\infty$ (otherwise there is nothing to prove). In this case, $S_u$ is a finite set, and $u \in H^1(\Omega\setminus S_u)$ (in particular it is continuous everywhere except on $S_u$).

In order to simplify the proof we will assume that $\Omega = (-1,1)$ and that $u$ has just one jump, at the point 0; the study of the general case can be localized in small intervals around each discontinuity of $u$, so it is not very different.
Consider the function $\psi(t) = 1 - \exp(-t/2)$ for $t \ge 0$. We leave the following result to the reader:

Exercise. Prove that $v_\varepsilon(x) = \psi(x/\varepsilon)$ minimizes
$$\int_0^{\infty} \varepsilon v'(x)^2 + \frac{(1-v(x))^2}{4\varepsilon}\, dx$$
on the set $\{v \in L^2_{loc}(0,+\infty),\ v' \in L^2(0,+\infty),\ v(0) = 0\}$, and that the value of the minimum is $1/2$.
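A numerical check of the exercise (our own sketch): evaluate the functional at $v_\varepsilon(x) = \psi(x/\varepsilon)$ on a truncated grid and compare with $1/2$.

```python
import numpy as np

for eps in [0.1, 0.05, 0.01]:
    # truncate (0, +infinity) at 30*eps; the tail of the integrand is
    # of order exp(-30) and therefore negligible
    x, dx = np.linspace(0.0, 30.0 * eps, 200_001, retstep=True)
    v = 1.0 - np.exp(-x / (2.0 * eps))            # psi(x / eps)
    dv = np.gradient(v, dx)                       # numerical derivative v'
    integrand = eps * dv**2 + (1.0 - v)**2 / (4.0 * eps)
    value = 0.5 * dx * (integrand[:-1] + integrand[1:]).sum()  # trapezoid rule
    assert abs(value - 0.5) < 1e-3                # the minimum value is 1/2
```
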
Now, we set for every $\varepsilon > 0$: $v_\varepsilon(x) = 0$ if $|x| < a_\varepsilon$, and $v_\varepsilon(x) = \psi\bigl(\frac{|x|-a_\varepsilon}{\varepsilon}\bigr)$ otherwise, where $a_\varepsilon$ goes to zero with $\varepsilon$ and will be fixed later on; and we set $u_\varepsilon(x) = u(-a_\varepsilon) + \frac{u(a_\varepsilon)-u(-a_\varepsilon)}{2a_\varepsilon}(x + a_\varepsilon)$ if $|x| < a_\varepsilon$, and $u_\varepsilon(x) = u(x)$ otherwise. Then (using the result of the previous exercise),
$$F_\varepsilon(u_\varepsilon, v_\varepsilon) = \int_{|x|\ge a_\varepsilon} \bigl(k_\varepsilon + v_\varepsilon(x)^2\bigr)\, u'(x)^2\, dx + 2a_\varepsilon k_\varepsilon \left( \frac{u(a_\varepsilon)-u(-a_\varepsilon)}{2a_\varepsilon} \right)^2$$
$$+ \int_{|x|\ge a_\varepsilon} \frac{1}{\varepsilon}\,\psi'\!\left(\frac{|x|-a_\varepsilon}{\varepsilon}\right)^2 + \frac{1}{4\varepsilon}\left( 1 - \psi\!\left(\frac{|x|-a_\varepsilon}{\varepsilon}\right) \right)^2 dx + \frac{2a_\varepsilon}{4\varepsilon} + \int_{|x|\ge a_\varepsilon} (u(x)-g(x))^2\, dx + \int_{-a_\varepsilon}^{a_\varepsilon} (u_\varepsilon(x)-g(x))^2\, dx$$
$$\le\; (1+k_\varepsilon)\int_{-1}^{1} u'(x)^2\, dx + 2\|u\|_\infty^2\,\frac{k_\varepsilon}{a_\varepsilon} + 2\cdot\frac12 + \frac{2a_\varepsilon}{4\varepsilon} + \int_{-1}^{1} (u(x)-g(x))^2\, dx + 2a_\varepsilon\bigl(\|u\|_\infty + \|g\|_\infty\bigr)^2.$$
We see that we will get the result if both $k_\varepsilon/a_\varepsilon$ and $a_\varepsilon/\varepsilon$ go to zero as $\varepsilon$ goes to zero. This is possible if and only if $k_\varepsilon/\varepsilon \to 0$, since in this case we can let $a_\varepsilon = \sqrt{k_\varepsilon\varepsilon}$. Then, we have $\limsup_{\varepsilon\downarrow 0} F_\varepsilon(u_\varepsilon, v_\varepsilon) \le E(u)$ and point (ii) is proved.
4.3 Higher dimensions
In dimension greater than one, the proof is obtained through a localization and a slicing argument. The interested reader should refer to Ambrosio and Tortorelli's paper, to [19] where a similar argument is used, or to Braides' book [18]. We will briefly explain, without giving too many details, how the proof goes.
4.3.1 The first inequality
Consider a sequence $\varepsilon_j \downarrow 0$ and $u_j, v_j$ such that $u_j \to u$ and $v_j \to v$ strongly in $L^2(\Omega)$ as $j \to \infty$. Again, it is clear that $v = 1$ a.e., and we need to show that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$. To prove this inequality we first localize the energy $F_{\varepsilon_j}(u_j, v_j)$ by letting, for every open $A \subseteq \Omega$,
$$F_{\varepsilon_j}(u_j, v_j; A) = \int_A \left( \bigl(v_j(x)^2 + k_{\varepsilon_j}\bigr)|\nabla u_j(x)|^2 + \varepsilon_j|\nabla v_j(x)|^2 + \frac{(1-v_j(x))^2}{4\varepsilon_j} + |u_j(x)-g(x)|^2 \right) dx. \tag{25}$$
Then, we fix an open set $A$ and a unit vector $\nu \in S^{N-1}$, and write
$$F_{\varepsilon_j}(u_j, v_j; A) = \int_{\nu^\perp} d\mathcal{H}^{N-1}(z) \int_{A_{z,\nu}} dt \left( \bigl(v_j(z+t\nu)^2 + k_{\varepsilon_j}\bigr)|\nabla u_j(z+t\nu)|^2 + \varepsilon_j|\nabla v_j(z+t\nu)|^2 + \frac{(1-v_j(z+t\nu))^2}{4\varepsilon_j} + |u_j(z+t\nu)-g(z+t\nu)|^2 \right)$$
so that
$$F_{\varepsilon_j}(u_j, v_j; A) \;\ge\; \int_{\nu^\perp} F^{1D}_{\varepsilon_j}\bigl((u_j)_{z,\nu},\, (v_j)_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}\bigr)\, d\mathcal{H}^{N-1}(z).$$
(We follow the notations of section 2.2.8: $A_{z,\nu} = \{t \in \mathbb{R} : z + t\nu \in A\}$, and for every $t \in A_{z,\nu}$, $(u_j)_{z,\nu}(t) = u_j(z+t\nu)$, $(v_j)_{z,\nu}(t) = v_j(z+t\nu)$, $g_{z,\nu}(t) = g(z+t\nu)$.) Here $F^{1D}_{\varepsilon_j}$ denotes the localized Ambrosio–Tortorelli energy (25) in dimension 1 (given $I \subseteq \mathbb{R}$ an open set and $w, r \in H^1(I)$, $h \in L^\infty(I)$):
$$F^{1D}_{\varepsilon_j}(w, r; I, h) = \int_I \bigl(r(t)^2 + k_{\varepsilon_j}\bigr)\, w'(t)^2\, dt + \int_I \varepsilon_j r'(t)^2 + \frac{(1-r(t))^2}{4\varepsilon_j}\, dt + \int_I |w(t)-h(t)|^2\, dt.$$
Since, as $j \to \infty$,
$$\int_\Omega |u_j(x)-u(x)|^2\, dx = \int_{\nu^\perp} \int_{\Omega_{z,\nu}} |(u_j)_{z,\nu}(t) - u_{z,\nu}(t)|^2\, dt\, d\mathcal{H}^{N-1}(z) \;\to\; 0,$$
we may assume we have extracted a subsequence (not relabeled) such that $(u_j)_{z,\nu} \to u_{z,\nu}$ for almost every $z \in \nu^\perp$ (such that $\Omega_{z,\nu} \neq \emptyset$).
Then, the one-dimensional result states that for almost every such $z$,
$$E^{1D}(u_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}) \;\le\; \liminf_{j\to\infty} F^{1D}_{\varepsilon_j}\bigl((u_j)_{z,\nu},\, (v_j)_{z,\nu};\, A_{z,\nu},\, g_{z,\nu}\bigr),$$
where again, if $I \subseteq \mathbb{R}$ is open,
$$E^{1D}(w; I, h) = \int_I \dot w(t)^2\, dt + \mathcal{H}^0(S_w \cap I) + \int_I (w(t)-h(t))^2\, dt$$
if $w \in SBV(I)$, and $E^{1D}(w; I, h) = +\infty$ otherwise. Using Fatou's lemma, we deduce that
$$\int_{\nu^\perp} \left( \int_{A_{z,\nu}} \dot u_{z,\nu}(t)^2\, dt + \mathcal{H}^0(S_{u_{z,\nu}} \cap A_{z,\nu}) + \int_{A_{z,\nu}} (u_{z,\nu}(t)-g_{z,\nu}(t))^2\, dt \right) d\mathcal{H}^{N-1}(z) \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A).$$
In particular we get that if $\liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A) < +\infty$, then $u_{z,\nu} \in SBV(A_{z,\nu})$ for a.e. $z \in \nu^\perp$, and since this is true for every $\nu$ we deduce that $u \in SBV(A)$. Thanks to the results of section 2.2.8, the last inequality can then be rewritten as
$$E^\nu(u; A) = \int_A \langle \nabla u(x), \nu\rangle^2\, dx + \int_{S_u \cap A} |\langle \nu_u(x), \nu\rangle|\, d\mathcal{H}^{N-1}(x) + \int_A |u(x)-g(x)|^2\, dx \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A).$$
To conclude, we admit (see [19, Prop. 6.5]) that if $(\nu_n)_{n\ge1}$ is a dense sequence of points in $S^{N-1}$, then
$$E(u) = \sup\left\{ \sum_{n=1}^{k} E^{\nu_n}(u; A_n) \;:\; k \in \mathbb{N},\ (A_n)_{n=1,\dots,k} \text{ disjoint open subsets of } \Omega \right\},$$
and we observe that if $(A_n)_{n=1,\dots,k}$ is such a family, then
$$\sum_{n=1}^{k} E^{\nu_n}(u; A_n) \;\le\; \sum_{n=1}^{k} \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j; A_n) \;\le\; \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j),$$
so that $E(u) \le \liminf_{j\to\infty} F_{\varepsilon_j}(u_j, v_j)$.
4.3.2 The second inequality
Now, given $u \in SBV(\Omega)$ such that $E(u) < +\infty$, let us build functions $u_\varepsilon$ and $v_\varepsilon$ such that $u_\varepsilon \to u$, $v_\varepsilon \to 1$ as $\varepsilon \downarrow 0$, and such that $\limsup_{\varepsilon\downarrow 0} F_\varepsilon(u_\varepsilon, v_\varepsilon) \le E(u)$. We will also assume that $u$ is bounded and that $S_u$ is essentially closed in $\Omega$, which means that $\mathcal{H}^{N-1}(\Omega \cap (\overline{S_u} \setminus S_u)) = 0$. This, in fact, is not restrictive, since it is possible to approximate every $u \in SBV(\Omega)$ with a sequence of bounded functions $(u_j)$ such that for every $j$, $\mathcal{H}^{N-1}(\Omega \cap (\overline{S_{u_j}} \setminus S_{u_j})) = 0$, in such a way that $\lim_{j\to\infty} E(u_j) = E(u)$. This is a consequence of the essential closedness of the jump set of the minimizers of the Mumford–Shah functional, mentioned in section 2.3.2.

Exercise. Show this approximation property.
If $S_u$ (which is rectifiable) is essentially closed in $\Omega$, then the limit of the quantities
$$\mathcal{L}_\delta(S_u) = \frac{|\{x \in \Omega : \mathrm{dist}(x, S_u) \le \delta\}|}{2\delta}$$
as $\delta \downarrow 0$, called the Minkowski content of $S_u$, is exactly $\mathcal{H}^{N-1}(S_u)$ (see [36]). Notice moreover that since $\delta \mapsto \mathcal{L}_\delta(S_u)$ is continuous on $(0,+\infty)$ and bounded by $|\Omega|/(2\delta)$, it is bounded, so that there exists a constant $c_L$ such that
$$|\{x \in \Omega : \mathrm{dist}(x, S_u) \le \delta\}| \;\le\; 2c_L\delta \tag{26}$$
for every $\delta \ge 0$.
We consider the function $\psi(\cdot)$ defined in section 4.2.2 and, again, $a_\varepsilon = \sqrt{k_\varepsilon\varepsilon}$, and let $S_\varepsilon = \{x \in \Omega : \mathrm{dist}(x, S_u) \le a_\varepsilon\}$. We set
$$v_\varepsilon(x) = \begin{cases} \psi\!\left( \dfrac{\mathrm{dist}(x, S_u) - a_\varepsilon}{\varepsilon} \right) & \text{if } x \notin S_\varepsilon, \\[4pt] 0 & \text{otherwise,} \end{cases}$$
and
$$u_\varepsilon(x) = u(x)\left( \frac{\mathrm{dist}(x, S_u)}{a_\varepsilon} \wedge 1 \right),$$
so that $u_\varepsilon = u$ on $\Omega\setminus S_\varepsilon$. Then $u_\varepsilon \to u$ and $v_\varepsilon \to 1$ as $\varepsilon \downarrow 0$. We will denote in what follows $\mathrm{dist}(x, S_u) = d(x)$. Out of $S_\varepsilon$, $\nabla u_\varepsilon = \nabla u$, whereas if $x \in S_\varepsilon$, $\nabla u_\varepsilon(x) = \nabla u(x)\,d(x)/a_\varepsilon + u(x)\,\nabla d(x)/a_\varepsilon$, so that $|\nabla u_\varepsilon(x)| \le |\nabla u(x)| + \|u\|_\infty/a_\varepsilon$ (we admit that $\nabla d$ exists a.e. and that $|\nabla d| \le 1$). Therefore,
$$\int_\Omega \bigl(v_\varepsilon(x)^2 + k_\varepsilon\bigr)|\nabla u_\varepsilon(x)|^2\, dx \;\le\; (1+k_\varepsilon)\int_{\Omega\setminus S_\varepsilon} |\nabla u(x)|^2\, dx + 2k_\varepsilon\left( \int_{S_\varepsilon} |\nabla u(x)|^2\, dx + \frac{|S_\varepsilon|\,\|u\|_\infty^2}{a_\varepsilon^2} \right).$$
Since $|S_\varepsilon| = 2a_\varepsilon\mathcal{L}_{a_\varepsilon}(S_u) \sim 2a_\varepsilon\mathcal{H}^{N-1}(S_u)$ as $\varepsilon \to 0$ and $k_\varepsilon/a_\varepsilon \to 0$, we deduce that $\limsup_{\varepsilon\downarrow 0} \int_\Omega (v_\varepsilon^2 + k_\varepsilon)|\nabla u_\varepsilon|^2 \le \int_\Omega |\nabla u|^2$.
Exercise. Show that $u_\varepsilon \in H^1(\Omega)$ and that $\nabla u_\varepsilon = (d\,\nabla u + u\,\nabla d)/a_\varepsilon$ in $S_\varepsilon$.

Let us now show that $\limsup_{\varepsilon\downarrow 0} \int_\Omega \varepsilon|\nabla v_\varepsilon|^2 + (1-v_\varepsilon)^2/(4\varepsilon) \le \mathcal{H}^{N-1}(S_u)$.
Out of $S_\varepsilon$, $\nabla v_\varepsilon(x) = \psi'\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right)\frac{\nabla d(x)}{\varepsilon}$, so that
$$\int_\Omega \varepsilon|\nabla v_\varepsilon(x)|^2 + \frac{(1-v_\varepsilon(x))^2}{4\varepsilon}\, dx \;=\; \frac{|S_\varepsilon|}{4\varepsilon} + \frac{1}{4\varepsilon}\int_{\Omega\setminus S_\varepsilon} 4\,\psi'\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right)^2 + \left( 1 - \psi\!\left(\frac{d(x)-a_\varepsilon}{\varepsilon}\right) \right)^2 dx.$$
The ratio $|S_\varepsilon|/(4\varepsilon)$ is of order $a_\varepsilon/\varepsilon$ and goes to zero as $\varepsilon \downarrow 0$. Since $\psi'(t) = (1-\psi(t))/2 = \exp(-t/2)/2$, the second integral is
$$\frac{1}{2\varepsilon}\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx.$$
We notice that
$$\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx = \int_{\Omega\setminus S_\varepsilon} \int_0^1 \chi_{\{t\,:\, t \le \exp(-(d(x)-a_\varepsilon)/\varepsilon)\}}\, dt\, dx = \int_0^1 \bigl|\{x \in \Omega : a_\varepsilon < d(x) \le a_\varepsilon - \varepsilon\log t\}\bigr|\, dt.$$
Let $h_\varepsilon(t) = \bigl|\{x \in \Omega : a_\varepsilon < d(x) \le a_\varepsilon - \varepsilon\log t\}\bigr|/(2\varepsilon) = (a_\varepsilon/\varepsilon - \log t)\,\mathcal{L}_{(a_\varepsilon - \varepsilon\log t)}(S_u) - (a_\varepsilon/\varepsilon)\,\mathcal{L}_{a_\varepsilon}(S_u)$. By (26), $|h_\varepsilon(t)| \le c_L(a_\varepsilon/\varepsilon - \log t) \le c_L(1 - \log t)$ (if $\varepsilon$ is small enough) for every $t \in (0,1)$, and the latter function is integrable on $(0,1)$. Moreover, we know that $\lim_{\varepsilon\downarrow 0} h_\varepsilon(t) = (-\log t)\,\mathcal{H}^{N-1}(S_u)$. By Lebesgue's dominated convergence theorem we deduce that $\lim_{\varepsilon\downarrow 0} \int_0^1 h_\varepsilon(t)\, dt = \mathcal{H}^{N-1}(S_u)$, so that
$$\lim_{\varepsilon\downarrow 0} \frac{1}{2\varepsilon}\int_{\Omega\setminus S_\varepsilon} \exp\left( -\frac{d(x)-a_\varepsilon}{\varepsilon} \right) dx = \mathcal{H}^{N-1}(S_u).$$
This shows that $\lim_{\varepsilon\downarrow 0} \int_\Omega \varepsilon|\nabla v_\varepsilon|^2 + (1-v_\varepsilon)^2/(4\varepsilon) = \mathcal{H}^{N-1}(S_u)$, and completes the proof of Theorem 8 in arbitrary dimension.
5 The numerical analysis of the Mumford–Shah problem (II)
Now we consider a different problem, which can be stated in this way: "in what sense is Blake and Zisserman's energy $E(U)$ (or Geman and Geman's energy $E(U,L)$) a discrete approximation of the Mumford–Shah functional?" We will see that in fact it is not an approximation of this functional, but of a slightly different functional, in which the length of the discontinuity set is measured in a different (anisotropic) way.
5.1 Rescaling Blake and Zisserman's functional
First of all, if we want to see $E(U)$ as a discrete approximation of something, we have to introduce a discretization step (or scale parameter) $h > 0$ and explain how the parameters $\lambda$ and $\mu$ in (5) must vary with $h$ in order to get some result. How can this be done?
Consider the simpler, one-dimensional Blake and Zisserman energy
$$E^1(U) = \sum_{i=1}^{n-1} W_{\lambda,\mu}(u_{i+1}-u_i) + \sum_{i=1}^{n} (u_i - g_i)^2,$$
where $U = (u_i)_{i=1}^n$ and $G = (g_i)_{i=1}^n$ are 1-dimensional signals.
First of all, if we want the last term of the energy to be an approximation of an integral $\int_0^1 (u(x)-g(x))^2\, dx$, then we need to assume that the signal $G$ is in fact some discretization $G^h = (g^h_i)_{i=1}^n$, at step $h = 1/n$, of a function $g \in L^2(0,1)$. For instance, we can let $g^h_i = (1/h)\int_{(i-1)h}^{ih} g(x)\, dx$. Then, we know that $\sum_{i=1}^n g^h_i\,\chi_{[(i-1)h,\,ih)}$ will converge to $g$ in $L^2$. In this case, if we consider for all $h = 1/n$ a signal $U = U^h = (u^h_i)_{i=1}^n$, we will have that
$$\sum_{i=1}^{n} h\,(u^h_i - g^h_i)^2 \;\longrightarrow\; \int_0^1 (u(x)-g(x))^2\, dx$$
as $h$ goes to 0 ($n$ to $+\infty$), if the functions $\sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ converge to $u$ in $L^2(0,1)$.
But if this is true, then it is easy to show that, at least in the distributional sense,
$$\sum_{i=1}^{n-1} \frac{u^h_{i+1} - u^h_i}{h}\,\chi_{[(i-1)h,\,ih)} \;\rightharpoonup\; Du,$$
where $Du$ denotes the distributional derivative of $u$. In the regions where $u$ is differentiable, it is therefore reasonable to ask that $(u^h_{i+1}-u^h_i)/h \simeq u'(x)$ if $x \simeq ih$, and the sum $\sum_{i=1}^{n-1} W_{\lambda,\mu}(u^h_{i+1}-u^h_i)$ will be an approximation of $\int u'(x)^2\, dx$ in these regions if, when $u^h_{i+1}-u^h_i$ is small,
$$W_{\lambda,\mu}(u^h_{i+1}-u^h_i) \;\simeq\; h\left( \frac{u^h_{i+1}-u^h_i}{h} \right)^2 = \frac{(u^h_{i+1}-u^h_i)^2}{h}.$$
This implies that we must choose $\lambda \simeq 1/h$.
On the other hand, when $u$ has a jump, if $(i+1)h$ is on one side of the jump and $ih$ on the other side, the difference $u^h_{i+1}-u^h_i$ should go to $u_r - u_l = \pm(u^+ - u^-)$ and thus have the order of magnitude of a constant. In this case, we want to count "1" in the energy. Since the value of $W_{\lambda,\mu}(t)$ for large $t$ is $\mu$, it means that we must choose $\mu \simeq 1$. The rescaled energy then becomes (recalling that $W_{1/h,1}(t) = \min(t^2/h, 1)$)
$$E^1_h(U^h) = \sum_{i=1}^{n-1} \min\left( \frac{(u^h_{i+1}-u^h_i)^2}{h},\; 1 \right) + \sum_{i=1}^{n} h\,(u^h_i - g^h_i)^2.$$
Letting $f(t) = \min(t,1) = t \wedge 1$, this can be written as
$$E^1_h(U^h) = h\left( \sum_{i=1}^{n-1} \frac{1}{h} f\!\left( \frac{(u^h_{i+1}-u^h_i)^2}{h} \right) + \sum_{i=1}^{n} (u^h_i - g^h_i)^2 \right). \tag{27}$$
In a similar way, the rescaled 2-dimensional Blake and Zisserman energy will be
$$E^2_h(U^h) = h^2\left( \sum_{i,j} \frac{1}{h} f\!\left( \frac{(u^h_{i+1,j}-u^h_{i,j})^2}{h} \right) + \frac{1}{h} f\!\left( \frac{(u^h_{i,j+1}-u^h_{i,j})^2}{h} \right) + \sum_{i,j} (u^h_{i,j} - g^h_{i,j})^2 \right) \tag{28}$$
where now $(g^h_{i,j})_{1\le i,j\le n}$ is the appropriate discretization of an image $g \in L^2(\Omega)$ with $\Omega = (0,1)\times(0,1)$: for instance, we can let
$$g^h_{i,j} = \frac{1}{h^2}\int_{(i-1)h}^{ih}\int_{(j-1)h}^{jh} g(x,y)\, dx\, dy.$$
5.2 The $\Gamma$-limit of the rescaled 1-dimensional functional

In order to state a $\Gamma$-convergence result for the energy $E^1_h(U^h)$ defined by (27), we must first consider $E^1_h$ as a functional over $L^2(0,1)$. This is done by defining it, for every $u^h \in L^2(0,1)$, as
$$E^1_h(u^h) = \begin{cases} E^1_h(U^h) & \text{if } u^h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)},\ U^h = (u^h_i)_{i=1}^n, \\ +\infty & \text{otherwise,} \end{cases}$$
which means that $E^1_h(u^h)$ has a finite value only when $u^h$ is a piecewise constant function at scale $h$. Then, we have the following result ([20]).

Theorem 9 $E^1_h$ $\Gamma$-converges to $E^1_0$ as $h$ goes to zero ($n$ goes to infinity), where
$$E^1_0(u) = \begin{cases} \displaystyle\int_0^1 \dot u(x)^2\, dx + \mathcal{H}^0(S_u) + \int_0^1 (u(x)-g(x))^2\, dx & \text{if } u \in SBV(0,1), \\[4pt] +\infty & \text{if } u \in L^2(0,1)\setminus SBV(0,1). \end{cases} \tag{29}$$
We will sketch a proof of this result. We need to prove that:

i. if $(u^h)$ goes to $u$ (in $L^2$-norm) as $h \to 0$, then $E^1_0(u) \le \liminf_{h\to0} E^1_h(u^h)$, and

ii. there exists $(u^h)$ that converges to $u$ such that $\limsup_{h\to0} E^1_h(u^h) \le E^1_0(u)$.
5.2.1 Proof of (i)
To prove (i), we consider a sequence $(u^h)$ converging to $u$ as $h$ goes to zero. We can assume that $\liminf_{h\to0} E^1_h(u^h) < +\infty$ (otherwise there is nothing to prove), and even, by extracting a subsequence, that $\sup_h E^1_h(u^h) < +\infty$. In particular, for every $h$ there is a discrete signal $U^h = (u^h_i)_{i=1}^n$ ($n = 1/h$) such that $u^h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ and $E^1_h(u^h) = E^1_h(U^h)$. Then, we build a new function $v^h$ in the following way:

- if $x \in [0,h)$ we let $v^h(x) = u^h_1$;

- then, for $x \in [ih, (i+1)h)$ ($1 \le i \le n-1$):

  - if $|u^h_{i+1}-u^h_i| \le \sqrt{h}$, we let $v^h(x) = u^h_i + (x-ih)(u^h_{i+1}-u^h_i)/h$;

  - otherwise, if $|u^h_{i+1}-u^h_i| > \sqrt{h}$, we let $v^h(x) = u^h_i$ if $x \in [ih, (i+\frac12)h)$ and $v^h(x) = u^h_{i+1}$ if $x \in [(i+\frac12)h, (i+1)h)$.

We see that $v^h(ih) = u^h_i$ for all $i$, and that $v^h$ is affine in the intervals $[ih, (i+1)h)$ such that $f((u^h_{i+1}-u^h_i)^2/h) = (u^h_{i+1}-u^h_i)^2/h$, and piecewise constant with exactly one jump in the intervals such that $f((u^h_{i+1}-u^h_i)^2/h) = 1$.
With this construction, we have that $v_h \in SBV(0,1)$, and that
\[
\int_0^1 \dot v_h(x)^2\,dx + \mathcal H^0(S_{v_h}) \;=\; h\sum_{i=1}^{n-1}\frac{1}{h}\, f\!\left(\frac{(u^h_{i+1}-u^h_i)^2}{h}\right).
\]
We can show that $\int_0^1 (v_h(x)-g(x))^2\,dx$ is less than some constant times $h\sum_{i=1}^n (u^h_i-g^h_i)^2$, so that we may apply Theorem 6 to deduce that some subsequence of $v_h$ converges a.e. to a function $v \in SBV(0,1)$, with
\[
\int_0^1 \dot v(x)^2\,dx + \mathcal H^0(S_v) \;\le\; \liminf_{h\to 0}\; \int_0^1 \dot v_h(x)^2\,dx + \mathcal H^0(S_{v_h}).
\]
Since it is easy to show that $v_h$ must converge (at least) weakly to $u$ in $L^2(0,1)$, and since $h\sum_{i=1}^n (u^h_i-g^h_i)^2 \to \int_0^1 (u(x)-g(x))^2\,dx$, we get that $v = u$, $u \in SBV(0,1)$, and
\[
\int_0^1 \dot u(x)^2\,dx + \mathcal H^0(S_u) + \int_0^1 (u(x)-g(x))^2\,dx \;\le\; \liminf_{h\to 0} E^1_h(u_h),
\]
which is the thesis we wanted to show.
Remark. We have shown slightly more than just point (i). Notice indeed that we can easily deduce that if $u_h$ is bounded uniformly in $L^\infty(0,1)$ and $\sup_h E^1_h(u_h) < +\infty$, then some subsequence of $u_h$ converges (weakly in $L^2$: easy; strongly in $L^2$: first show that $(u_h)$ is bounded in $BV(0,1)$, hence compact in $L^2$) to a function $u$ with $E^1_0(u) \le \liminf_{h\to 0} E^1_h(u_h)$. In particular, we can deduce from Theorem 9 and this remark that if $u_h$ is for every $h$ a minimizer of $E^1_h$, then it has subsequences that converge to a minimizer of $E^1_0$. (See Theorem 12 in section 5.4 below for a more general statement.)
5.2.2 Proof of (ii)

Now we consider proving (ii). This is very simple. Choose $u \in SBV(0,1)$ with $E^1_0(u) < +\infty$. The function $u$ is piecewise continuous and has a finite number of jumps $x_1 < x_2 < \dots < x_k$. We define for every $n \ge 1$ and $h = 1/n$ the discrete signal $U^h = (u^h_i)_{i=1}^n$ by $u^h_i = u(ih - 0)$, which is the left limit $\lim_{\varepsilon\downarrow 0} u(ih - \varepsilon)$ of $u$ at $ih$ (thus $u^h_i = u(ih)$ if $ih \notin S_u$). It is standard that $u_h = \sum_{i=1}^n u^h_i\,\chi_{[(i-1)h,\,ih)}$ goes to $u$ in $L^2(0,1)$; in fact it is easy to prove that $u_h$ goes to $u$ uniformly on $(x_l + \delta, x_{l+1} - \delta)$ for every $l = 1,\dots,k$ and small $\delta > 0$.
If $[ih,(i+1)h) \cap S_u = \emptyset$, we have (using the Cauchy–Schwarz inequality)
\[
u^h_{i+1} - u^h_i \;=\; \int_{ih}^{(i+1)h} \dot u(x)\,dx \;\le\; \sqrt h \left( \int_{ih}^{(i+1)h} \dot u(x)^2\,dx \right)^{\frac12},
\]
so that $(u^h_{i+1}-u^h_i)^2/h \le \int_{ih}^{(i+1)h} \dot u(x)^2\,dx$. Thus, denoting by $I_h$ the set $\{i \in [1,n] : [ih,(i+1)h) \cap S_u \ne \emptyset\}$, we have
\[
h\sum_{i=1}^{n-1} \frac1h\, f\!\left(\frac{(u^h_{i+1}-u^h_i)^2}{h}\right) \;\le\; \sum_{i\notin I_h} \int_{ih}^{(i+1)h} \dot u(x)^2\,dx + \#I_h \;\le\; \int_0^1 \dot u(x)^2\,dx + \#S_u.
\]
Therefore $\limsup E^1_h(u_h) \le E^1_0(u)$ and point (ii) is proved.
5.3 The Γ-limit of the rescaled 2-dimensional functional

For the 2-dimensional functional, we have the same kind of result. We also define the functional $E^2_h(U^h)$ as a functional over $L^2(\Omega)$, with $\Omega = (0,1)\times(0,1)$, in the following way: for every $1 \le i,j \le n$ we let $C^h_{i,j}$ be the square $[(i-1)h, ih) \times [(j-1)h, jh)$, and
\[
E^2_h(u^h) \;=\;
\begin{cases}
E^2_h(U^h) & \text{if } u^h = \sum_{1\le i,j\le n} u^h_{i,j}\,\chi_{C^h_{i,j}},\quad U^h = (u^h_{i,j})_{1\le i,j\le n},\\
+\infty & \text{otherwise.}
\end{cases}
\]
Then, we define $E^2_0$ as
\[
E^2_0(u) \;=\;
\begin{cases}
\displaystyle\int_\Omega|\nabla u(x)|^2\,dx + \int_{S_u} |\nu^1_u(x)| + |\nu^2_u(x)|\,d\mathcal H^1 + \int_\Omega(u(x)-g(x))^2\,dx & \text{if } u \in SBV(\Omega),\\
+\infty & \text{if } u \in L^2(\Omega) \setminus SBV(\Omega).
\end{cases}
\tag{30}
\]
Here the vector $\nu_u(x) = (\nu^1_u(x),\nu^2_u(x))$ is the normal vector to the jump set $S_u$ at $x$. Notice that in this case $E^2_0$ is not the Mumford and Shah functional: it is slightly different and measures the length of the jump set in an anisotropic way. We point out the fact that it is of the form of $E_0$ in definition (20), so that it still admits minimizers, and the jump set $S_u$ of a minimizer $u$ is still essentially closed (i.e., $\mathcal H^1(\Omega \cap \overline{S_u} \setminus S_u) = 0$).

The anisotropy in $E^2_0$ is unavoidable, since the discrete energy $E^2_h$ is not isotropic either, as illustrated by the following exercise.
Exercise. Assume $g = 0$. Let $C = [a,b]\times[c,d] \subset \Omega = (0,1)\times(0,1)$, with $\ell = b-a = d-c > 0$, be a square in $\Omega$, and let $C'$ be the same square rotated by $45°$. Let $u^h_{i,j}$ be $1$ if $(ih,jh) \in C$ (i.e., if $a \le ih \le b$ and $c \le jh \le d$) and $0$ otherwise ($u^h$ is an approximation of the characteristic function of $C$). Similarly, let $u'^h_{i,j}$ be $1$ if $(ih,jh) \in C'$ and $0$ otherwise. Show that, as $h$ goes to zero, $E^2_h(u^h) \to 4\ell$. On the other hand, show that $E^2_h(u'^h) \to 4\sqrt 2\,\ell$.
With these definitions of $E^2_h$ and $E^2_0$, we have the following theorem [21]:

Theorem 10  As $h = 1/n$ goes to zero, $E^2_h$ Γ-converges to $E^2_0$ in $L^2(\Omega)$.

The proof of Theorem 10 in [21] is based on the same ideas as the proof of Theorem 9 that was just given, and we will not repeat it here. On the other hand, this result is a particular case of Theorem 11, which will be proved in the next section.
5.4 More general finite-difference approximations

Now we will introduce a general result for finite-difference discrete approximations of the Mumford–Shah functional, of which Theorems 9 and 10 are particular cases. What follows is derived from [22].
In 1995, De Giorgi imagined the following non-local functional, defined for any measurable function $u$ on $\mathbb R^N$,
\[
F_\varepsilon(u) \;=\; \iint_{\mathbb R^N\times\mathbb R^N} \frac{1}{\varepsilon}\arctan\!\left(\frac{(u(x)-u(x+\varepsilon\xi))^2}{\varepsilon}\right) e^{-|\xi|^2}\,dx\,d\xi
\]
as a possible approximation, as $\varepsilon$ goes to zero, of the first-order part of the Mumford–Shah functional (the part that depends on $Du$)
\[
F(u) \;=\; \alpha\int_{\mathbb R^N}|\nabla u(x)|^2\,dx + \beta\,\mathcal H^{N-1}(S_u),
\]
defined on functions $u \in GSBV_{loc}(\mathbb R^N)$. Here $\alpha$, $\beta$ are two positive parameters. Notice that the function $\arctan(t)$ looks like the function $f(t) = \min(t,1)$ that we introduced in the previous sections: in a neighborhood of $0$ it behaves as $t$, whereas as $t\to\infty$ it behaves like a constant ($\frac{\pi}{2}$ instead of $1$). We will see in the sequel that we could consider any non-decreasing function $f$ such that $f(0) = 0$, $f'(0) = 1$ (or some positive constant), and $\lim_{t\to+\infty} f(t) = 1$ (or any other positive constant).

This conjecture of De Giorgi was proved by Gobbino [43]. He established that, as $\varepsilon\downarrow 0$, $F_\varepsilon$ Γ-converges to $F$ in the strong $L^p(\mathbb R^N)$ topology for any $1 \le p < +\infty$, for $\alpha = \frac{\sqrt\pi^{\,N}}{2}$ and $\beta = \frac{\pi\sqrt\pi^{\,N-1}}{2}$.
This result is very close to the Theorems stated in the previous section.
We will show here how we can formulate a discrete version of Gobbino's
theorem, and give its complete proof, based on Gobbino's proof.
Let us give some details. Let $\Omega \subset \mathbb R^N$ be an open domain with Lipschitz boundary, and for every $h > 0$ and every $u : \Omega \cap h\mathbb Z^N \to \mathbb R$ let
\[
F_h(u,\Omega) \;=\; h^N \sum_{\substack{x \in h\mathbb Z^N\\ x \in \Omega}} \;\sum_{\substack{\xi \in \mathbb Z^N\\ x+h\xi \in \Omega}} \frac{1}{h}\, f_\xi\!\left(\frac{(u(x)-u(x+h\xi))^2}{h}\right) \rho(\xi) \;\in\; [0,+\infty],
\tag{31}
\]
where:

- $\rho : \mathbb Z^N \to [0,+\infty)$ is even, satisfies $\rho(0) = 0$, $\sum_{\xi\in\mathbb Z^N} |\xi|^2\rho(\xi) < +\infty$, and $\rho(e_i) > 0$ for any $i = 1,\dots,N$, where $(e_i)_{1\le i\le N}$ is the canonical basis of $\mathbb R^N$ (in practical applications the support of $\rho$ will have to be finite and small);
- for any $\xi$ with $\rho(\xi) > 0$, $f_\xi : [0,+\infty) \to [0,+\infty)$ is a non-decreasing bounded function with $f_\xi \equiv f_{-\xi}$, $f_\xi(0) = 0$, $f'_\xi(0) = \alpha_\xi > 0$, and $\lim_{t\to+\infty} f_\xi(t) = \beta_\xi$, and we assume that $f_\xi$ is below (or equal to) the function $t \mapsto \alpha_\xi t \wedge \beta_\xi$. We also assume that both $\sup_{\xi\in\mathbb Z^N} \alpha_\xi$ and $\sup_{\xi\in\mathbb Z^N} \beta_\xi$ are finite;
- we will adopt in the sequel the convention that any term in the sum above is zero whenever either $x$ or $x+h\xi$ is not in $\Omega$, even if we do not explicitly write these conditions under the summation signs (this convention will be adopted everywhere in what follows unless otherwise stated); as well, we will usually write $F_h(u)$ instead of $F_h(u,\Omega)$ when not ambiguous.
Fix $p \in [1,+\infty)$ (but you can assume $p = 2$, as in the previous sections), and let $\ell^p(\Omega \cap h\mathbb Z^N)$ be the vector space of functions $u : \Omega \cap h\mathbb Z^N \to \mathbb R$ such that the norm
\[
\|u\|_p \;=\; \left\{ h^N \sum_{x\in\Omega\cap h\mathbb Z^N} |u(x)|^p \right\}^{\frac1p}
\]
is finite. In the sequel we will always identify a function $u$ in $\ell^p(\Omega \cap h\mathbb Z^N)$ with the piecewise constant function in $L^p(\mathbb R^N)$ equal to $u(x)$ on $x + \big[-\frac h2,\frac h2\big)^N$ for any $x \in \Omega \cap h\mathbb Z^N$ (and to $0$ elsewhere), so that $\|u\|_p = \|u\|_{L^p(\mathbb R^N)}$ and a sentence such as "$u_h \in \ell^p(\Omega \cap h\mathbb Z^N)$ converges to $u \in L^p(\Omega)$ as $h \downarrow 0$" has a natural sense. We also set $F_h(u) = +\infty$ for any $u \in L^p(\Omega)$ that is not the restriction to $\Omega$ of the piecewise constant extension of a function in $\ell^p(\Omega \cap h\mathbb Z^N)$. Let now, for any $u \in L^p(\Omega) \cap GSBV_{loc}(\Omega)$,
\[
F(u) \;=\; \int_\Omega \sum_{\xi\in\mathbb Z^N} \rho(\xi)\alpha_\xi\, |\langle\nabla u(x),\xi\rangle|^2\,dx + \int_{S_u} \sum_{\xi\in\mathbb Z^N} \rho(\xi)\beta_\xi\, |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x) \;\in\; [0,+\infty],
\tag{32}
\]
and set $F(u) = +\infty$ if $u \in L^p(\Omega) \setminus GSBV_{loc}(\Omega)$. We will also sometimes denote
\[
F(u,B) \;=\; \int_B \sum_{\xi\in\mathbb Z^N} \rho(\xi)\alpha_\xi\, |\langle\nabla u(x),\xi\rangle|^2\,dx + \int_{B\cap S_u} \sum_{\xi\in\mathbb Z^N} \rho(\xi)\beta_\xi\, |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x)
\]
when $B \subset \Omega$ is a Borel set. We have the following theorem.

Theorem 11  $F_h$ Γ-converges to $F$ as $h \downarrow 0$ in $L^p(\Omega)$ (endowed with its strong topology), for any $p \in [1,+\infty)$.
We also have the following compactness result:

Theorem 12  Let $p \in [1,+\infty)$, $g \in L^p(\Omega) \cap L^\infty(\Omega)$, and for any $h > 0$ let $u_h$ be a minimizer over $\ell^p(\Omega \cap h\mathbb Z^N)$ of
\[
F_h(u) + \int_\Omega |u(x)-g(x)|^p\,dx
\tag{33}
\]
(or, equivalently, of
\[
F_h(u) + \big(\|u-g_h\|_p\big)^p
\tag{34}
\]
where $g_h \in \ell^p(\Omega \cap h\mathbb Z^N)$ is a suitable discretization of $g$ at scale $h$, with $g_h \to g$ in $L^p(\Omega)$ as $h \downarrow 0$ and $\|g_h\|_\infty \le \|g\|_\infty$ for all $h$). Then $(u_h)$ is relatively compact in $L^p(\Omega)$, and if some subsequence $u_{h_j}$ goes to $u$ as $j \to \infty$, then $u \in SBV_{loc}(\Omega) \cap L^p(\Omega)$ is a minimizer of
\[
F(u) + \int_\Omega |u(x)-g(x)|^p\,dx.
\]
These theorems, for $p = 2$, provide a generalization of the previous Theorems 9 and 10. For instance, Theorem 10 is the case where $N = 2$, $p = 2$, $\Omega = (0,1)\times(0,1)$, $\rho \equiv 0$ on $\mathbb Z^2$ except $\rho(0,1) = \rho(1,0) = \rho(0,-1) = \rho(-1,0) = 1/2$, and $f_\xi(t) = t \wedge 1$, so that
\[
F(u) \;=\; \int_\Omega|\nabla u(x)|^2\,dx + \int_{S_u} \big(|\nu^1_u(x)|+|\nu^2_u(x)|\big)\,d\mathcal H^1(x).
\]
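A direct transcription of (31) for $N = 2$ can help fix the conventions. The sketch below uses an illustrative grid and image; the nearest-neighbour $\rho$ and $f_\xi(t) = t\wedge 1$ are exactly the choices recovering Theorem 10 above. It evaluates $F_h$ on a half-plane step, for which the jump set is a vertical segment of length 1 with $|\nu^1_u| + |\nu^2_u| = 1$, so $F_h$ should be close to 1:

```python
def F_h(u, h, rho, f):
    """Discrete energy (31), N = 2: h^2 * sum over grid points and offsets xi
    of (1/h) f_xi((u(x) - u(x + h xi))^2 / h) rho(xi).  Terms with
    x + h xi outside the grid are dropped (the convention in the text);
    `rho` maps offsets to weights, `f` maps offsets to the functions f_xi."""
    n = len(u)
    E = 0.0
    for i in range(n):
        for j in range(n):
            for (a, b), w in rho.items():
                if w == 0.0:
                    continue
                if 0 <= i + a < n and 0 <= j + b < n:
                    d2 = (u[i][j] - u[i + a][j + b]) ** 2
                    E += h * f((a, b))(d2 / h) * w  # h^2 * (1/h) * ...
    return E

# The choice recovering Theorem 10: nearest-neighbour rho = 1/2, f = min(t, 1)
rho4 = {(1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5}
f4 = lambda xi: (lambda t: min(t, 1.0))

n = 200
h = 1.0 / n
u = [[1.0 if i >= n // 2 else 0.0 for j in range(n)] for i in range(n)]
print(F_h(u, h, rho4, f4))  # close to 1 (the jump segment has length 1)
```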
Remark. The condition $\rho(e_i) > 0$ for $i = 1,\dots,N$ is necessary only for the coercivity, i.e., to establish Lemma 7 and Theorem 12. This is important in practical applications for the stability of the numerical schemes. Even if we have not discussed it in the previous sections (except in the Remark at the end of section 5.2.1), similar coercivity and compactness results also hold for Theorems 8, 9 and 10.

If we wanted only to prove the Γ-convergence of $F_h$ to $F$, it would be sufficient to assume that $\rho(\xi_i) > 0$, $i = 1,\dots,N$, for some basis $(\xi_i)_{1\le i\le N} \subset \mathbb Z^N$ of $\mathbb R^N$.

We will first describe the implementation of these energies. The proofs of Theorem 11 and Theorem 12 will then be given in the last section of these notes. The next sections are extracted from [22].
6 A numerical method for minimizing the Mumford–Shah functional
In this section we describe a numerical method for the implementation of the energies we have introduced in these lectures. We will describe the minimization of problem (34) for $p = 2$, since the energies $E^1_h$ and $E^2_h$ are particular cases. In particular, we will show how the choice of $\rho$, $\alpha_\xi$ and $\beta_\xi$ influences the results.
6.1 An iterative procedure for minimizing (34)

Let us quickly describe a standard procedure for minimizing energies such as (34). Of course we do not pretend to compute an exact minimizer of the energy, since the high non-convexity of the problem does not allow this. However, the iterative algorithm we describe gives satisfactory results. A variant has been successfully implemented in the case of the approximation of [23] (see [17]). Many other similar implementations have been made for solving image reconstruction problems (see for instance [60, 10], and the pioneering work [40] by D. Geman and G. Reynolds).

We assume $\Omega$ is bounded, so that the discrete problem is finite-dimensional for every fixed $h > 0$ (in the applications $\Omega$ will be a rectangle). The non-convexity in the energy $F_h$ comes from the non-convexity of the functions $f_\xi$, $\xi \in \mathbb Z^N$. In order to simplify the computations we will assume that the $f_\xi$ are all identical, up to a rescaling:
\[
f_\xi(t) \;=\; \beta_\xi\, f\!\left(\frac{\alpha_\xi}{\beta_\xi}\,t\right)
\]
for all $\xi \in \mathbb Z^N$ (with $\rho(\xi) > 0$) and $t \ge 0$. The function $f$ is non-decreasing, and satisfies $f(0) = 0$, $f(+\infty) = 1$, and $f'(0) = 1$. It could be, of course, the function $f(t) = t \wedge 1$ of section 5, except that a differentiable function provides better numerical results. An interpretation of Blake and Zisserman's GNC algorithm would correspond to approximating $t \wedge 1$ gradually with smooth functions.
We will thus assume, as well, that $f$ is concave and differentiable. Thus, $-f$ is convex (we extend it with the value $+\infty$ on $\{t < 0\}$) and lower semicontinuous. Let
\[
\psi(-v) \;=\; \sup_{t\in\mathbb R}\; tv - (-f)(t) \;=\; (-f)^*(v)
\]
be the Legendre–Fenchel transform of $-f$; by a classical result $(-f)^{**} = -f$, so that
\[
-f(t) \;=\; \sup_{v\in\mathbb R}\; tv - \psi(-v) \;=\; -\inf_{v\in\mathbb R}\; tv + \psi(v).
\]
It is well known that the first sup in this equation is attained at $v$ such that $t \in \partial(-f)^*(v)$ (the subdifferential of $(-f)^*$ at $v$), and that this is equivalent to $v \in \partial(-f)(t)$; since $\partial(-f)(t) = \{-f'(t)\}$ for $t > 0$ and $(-\infty,-1]$ for $t = 0$, we deduce that the sup is reached at some $v \in [-1,0]$ (since for $t = 0$ we check that $(-f)^*(-1) = 0$, and thus the sup is reached at $v = -1$). Hence
\[
f(t) \;=\; \min_{v\in[0,1]}\; tv + \psi(v)
\]
and the min is reached for $v = f'(t)$. (If $f(t) = t \wedge 1$, all of this is still true except that for $t = 1$ the min is reached for any $v \in [0,1]$.) We may therefore rewrite $F_h$ in the following way:
\[
F_h(u) \;=\; \min_{v(\cdot,\cdot)} F_h(u,v)
\]
for $v : (\Omega\cap h\mathbb Z^N)\times(\Omega\cap h\mathbb Z^N) \to [0,1]$ and
\[
F_h(u,v) \;=\; h^N \sum_{x\in h\mathbb Z^N} \sum_{\xi\in\mathbb Z^N} \left( \alpha_\xi\, v(x,x+h\xi) \left|\frac{u(x)-u(x+h\xi)}{h}\right|^2 + \beta_\xi\,\frac{\psi(v(x,x+h\xi))}{h} \right) \rho(\xi).
\tag{35}
\]
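The decomposition behind (35) rests on the identity $f(t) = \min_{v\in[0,1]}(tv + \psi(v))$, which can be checked numerically: for a concave differentiable $f$, the pairs $(v,\psi(v))$ are obtained parametrically as $v = f'(t_0)$, $\psi(v) = f(t_0) - t_0 f'(t_0)$ (tangent-line intercepts). A sketch with the arctan-based $f$ used later in these notes; the $t_0$-grid is an arbitrary illustrative choice:

```python
import math

# Parametric recovery of psi: at v = f'(t0), psi(v) = f(t0) - t0 f'(t0)
# (the intercept of the tangent line to the concave f at t0).

f = lambda t: (2 / math.pi) * math.atan(math.pi * t / 2)
fp = lambda t: 1 / (1 + math.pi ** 2 * t ** 2 / 4)

pairs = [(fp(t0), f(t0) - t0 * fp(t0)) for t0 in (k * 0.01 for k in range(2000))]

def f_via_duality(t):
    """min over the sampled v of t*v + psi(v); approximates f(t)."""
    return min(t * v + psi for v, psi in pairs)

print(f_via_duality(1.0), f(1.0))  # the two values nearly coincide
```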
The algorithm consists in minimizing alternately $F_h(u,v) + \|u-g_h\|_2^2$ with respect to $u$ and $v$. The minimization with respect to $v$ is straightforward, since it just consists in computing, for each $x,y \in \Omega\cap h\mathbb Z^N$,
\[
v(x,y) \;=\; f'\!\left(\frac{\alpha_\xi\,(u(x)-u(y))^2}{\beta_\xi\, h}\right),
\]
with $\xi = (y-x)/h$. The minimization with respect to $u$ is also a simple (linear) problem, since the energy is convex and quadratic with respect to $u$. Of course, there is no way of knowing whether the algorithm converges to a solution or not; what is certain is that the energy decreases and goes to some critical level, while the function $u$ converges to either a critical point or, if it exists, a continuum of critical points. Notice that if $f$ is strictly increasing, $v$ is everywhere strictly positive.
In the applications shown in these notes we considered $f(t) = \frac{2}{\pi}\arctan\!\big(\frac{\pi t}{2}\big)$, so that
\[
f'(t) \;=\; \frac{1}{1 + \pi^2 t^2/4}.
\]
Notice that one never has to compute explicitly the position of the edges during the minimization. Once a minimizer of the energy has been found, it is possible to extract the edges from the segmented image by standard algorithms (using Canny's or more sophisticated edge detectors, with a very narrow kernel, since the images on which the edges have to be found are piecewise smooth). The value of the auxiliary function $v$ is also a good indicator of the position of the edges (it is close to zero on the edges and close to one everywhere else), and should be taken into account. An elementary method may be, for instance, to consider the zero-crossings of the (discretized) operator $d^2u(\nabla u,\nabla u)$ in the regions where $v$ is small.
6.2 Anisotropy of the length term

In some of the above-mentioned image processing papers it had been noticed that the segmentations could be improved by trying to modify the energy slightly, making it "less anisotropic". Here we illustrate how the result of Theorem 11 allows us to control this anisotropy and to find explicitly the correct parameters for the "best" energies.

In this section, as in section 5, $n$ will be an integer ($n > 1$); we will set $h = 1/n$, and the functions $u$ and $g_h$ (defined on $[0,1)\times[0,1) \cap h\mathbb Z^2$) will be denoted as $n\times n$ matrices $(u_{i,j})_{0\le i,j<n}$ and $(g^h_{i,j})_{0\le i,j<n}$. We will compare the following two cases (pay attention to the fact that the notations for the energies here are different from the notations in section 5; in fact, the following $E^1_n$ is similar to $E^2_h$ in sec. 5, and the other energies are new):
\[
E^1_n(u) \;=\; h^2\sum_{i,j}\; \frac{\mu_1}{h}\, f\!\left(\frac{\lambda_1|u_{i+1,j}-u_{i,j}|^2}{\mu_1 h}\right) + \frac{\mu_1}{h}\, f\!\left(\frac{\lambda_1|u_{i,j+1}-u_{i,j}|^2}{\mu_1 h}\right) + |u_{i,j}-g^h_{i,j}|^2
\]
(which is Blake and Zisserman's "weak membrane" energy, except that $f$ is smoother than $t \mapsto t \wedge 1$) and
\[
\begin{aligned}
E^2_n(u) \;=\; h^2\sum_{i,j}\; &\frac{\mu_2}{h}\, f\!\left(\frac{\lambda_2|u_{i+1,j}-u_{i,j}|^2}{\mu_2 h}\right) + \frac{\mu_2}{h}\, f\!\left(\frac{\lambda_2|u_{i,j+1}-u_{i,j}|^2}{\mu_2 h}\right)\\
&+ \frac{\mu_2'}{h}\, f\!\left(\frac{\lambda_2'|u_{i+1,j+1}-u_{i,j}|^2}{\mu_2' h}\right) + \frac{\mu_2'}{h}\, f\!\left(\frac{\lambda_2'|u_{i-1,j+1}-u_{i,j}|^2}{\mu_2' h}\right)\\
&+ |u_{i,j}-g^h_{i,j}|^2.
\end{aligned}
\]
By Theorem 12, the limit points of the minimizers of $E^1_n$ and $E^2_n$, as $n\to\infty$, will be minimizers respectively of
\[
E^1_\infty(u) \;=\; \bar\lambda_1\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_1\ell_1(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx
\]
and
\[
E^2_\infty(u) \;=\; \bar\lambda_2\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_2\ell_2(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx
\]
(for $u \in L^2(\Omega)\cap GSBV(\Omega)$, and $+\infty$ otherwise), with $\Omega = (0,1)\times(0,1)$, $\bar\lambda_1 = \lambda_1$, $\bar\lambda_2 = \lambda_2+2\lambda_2'$, and
\[
\bar\mu_1\ell_1(S_u) \;=\; \int_{S_u} \mu_1\big(|\nu_1(x)|+|\nu_2(x)|\big)\,d\mathcal H^1(x),
\]
\[
\bar\mu_2\ell_2(S_u) \;=\; \int_{S_u} \mu_2\big(|\nu_1(x)|+|\nu_2(x)|\big) + \mu_2'\big(|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|\big)\,d\mathcal H^1(x),
\]
where $(\nu_1(x),\nu_2(x))$ is the normal vector to $S_u$ at $x$.

Figure 7: The solid line represents the length of a unit vector, as a function of the angle. Left: for $\ell_1$; right: for $\ell_2$. The dashed line is the unit circle.

Simple computations show that $\mu_2' = \mu_2/\sqrt 2$ is an optimal choice, since it minimizes the ratio of the $\ell_2$-length of the longest vector in $S^1$ over that of the shortest. For any rectifiable 1-set $E \subset \mathbb R^2$ with normal vector $(\nu_1,\nu_2)$ at $x$ we define the lengths
\[
\ell_1(E) \;=\; \frac{\pi}{4}\int_E \big(|\nu_1(x)|+|\nu_2(x)|\big)\,d\mathcal H^1(x)
\]
and
\[
\ell_2(E) \;=\; \frac{\pi}{8}\int_E \left(|\nu_1(x)|+|\nu_2(x)| + \frac{|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|}{\sqrt 2}\right) d\mathcal H^1(x).
\]
(Notice that $\ell_2 = (\ell_1 + \ell_1 \circ R_{\pi/4})/2$, where $R_{\pi/4}$ is the rotation of angle $\pi/4$ in $\mathbb R^2$.) The choice of the parameters $\pi/4$ and $\pi/8$ is made in order to ensure that a "random" set of lines has on average the same length $\ell_1$ and $\ell_2$ (and Euclidean length); in other words, the unit circle has length $2\pi$ in both cases. This is of course not the only possible choice. For instance, one could prefer to parameterize these lengths in such a way that the error (with respect to the Euclidean length) $\ell_i(e_{\max}) - 1$ on the measure of the longest (for $\ell_i$) vector $e_{\max} \in S^1$ is equal to the error $1 - \ell_i(e_{\min})$ on the measure of the shortest vector $e_{\min}$. In this case, one should choose as parameters $\frac{2}{1+\sqrt 2}$ instead of $\pi/4$ for $\ell_1$, and $\frac{2}{1+\sqrt 2+\sqrt{4+2\sqrt 2}}$ instead of $\pi/8$ for $\ell_2$.
With the choice we made, we get that $\bar\mu_1 = 4\mu_1/\pi$ and $\bar\mu_2 = 8\mu_2/\pi$. In both cases the limit energy is anisotropic; what is interesting is that the second length $\ell_2$ is far "less anisotropic" than the first length $\ell_1$. As a matter of fact, the longest vector in $S^1$ for $\ell_1$ is about $41.4\%$ longer than the shortest (the ratio is $\sqrt 2$), while it is only $8.2\%$ longer for the length $\ell_2$ (the ratio is $2\sqrt 2\cos\frac\pi8/(1+\sqrt 2) = \sqrt{4+2\sqrt 2}/(1+\sqrt 2)$). The difference in anisotropy of the two lengths is striking in Figure 7.
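The two anisotropy ratios quoted above can be recovered numerically by scanning the integrands of $\ell_1$ and $\ell_2$ over the unit circle (a sketch; the angular grid is an arbitrary choice):

```python
import math

# Integrands of l1 and l2 as functions of the normal angle,
# (nu1, nu2) = (cos t, sin t).

def l1(t):
    n1, n2 = math.cos(t), math.sin(t)
    return math.pi / 4 * (abs(n1) + abs(n2))

def l2(t):
    n1, n2 = math.cos(t), math.sin(t)
    return math.pi / 8 * (abs(n1) + abs(n2)
                          + (abs(n1 - n2) + abs(n1 + n2)) / math.sqrt(2))

angles = [k * math.pi / 1800 for k in range(3600)]
v1 = [l1(t) for t in angles]
v2 = [l2(t) for t in angles]
r1 = max(v1) / min(v1)
r2 = max(v2) / min(v2)
print(round(r1, 4), round(r2, 4))  # -> 1.4142 1.0824
```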
Figure 8: Original images for the next examples.
6.3 Numerical experiments

We show here a few experiments, so that the reader can see for himself the difference in behaviour of the lengths $\ell_1$ and $\ell_2$. Notice that in all of our comparisons we of course always choose $\bar\lambda_1 = \bar\lambda_2$ and $\bar\mu_1 = \bar\mu_2$. In Figure 9 and Figure 10 (see the original pictures in Figure 8), one notices that the edges are usually nicer when length $\ell_2$ is used, whereas images obtained by minimization of energy $E^1_n$ are more "blocky". Notice in particular the diagonal line at the bottom of the image. However, the vertical edges on the column (Figure 9, second picture) are nicer with energy $E^1_n$. The reason is clear: these edges are vertical, and vertical and horizontal lines have a much lower cost than lines with other orientations with this energy.

Figure 9: The segmented column. Left, by using energy $E^2_n$, then with energy $E^1_n$. The details appear in the same order.

In Figures 10 and 11 the results are similar: the edges look much nicer when energy $E^2_n$ is minimized. The other two figures (Figs. 12 and 13) show segmentations in the presence of noise. In the two segmentations of the disk, the total length of the edges found was $6.56\,R$ with energy $E^2_n$ and $6.40\,R$ with $E^1_n$. These lengths are slightly overestimated because a few spurious edges were found, and also because of some oscillation of the boundary, which is due to the noise. Again, in Figure 13 the result is more blocky with energy $E^1_n$.
Figure 10: Segmented lady with energy $E^2_n$ and two details.

Figure 11: Segmented lady with energy $E^1_n$ and two details.
Figure 12: The noisy disk (grey-level values 64 (disk) and 192 (background); std. dev. of the noise 40). Middle: the segmented disk with energy $E^2_n$. Right: with $E^1_n$.
Figure 13: A noisy image (std. dev. = 25 for values between 0 and 255) and the segmented outputs, obtained by minimizing $E^2_n$ (middle) and $E^1_n$ (right).
In the last two experiments we used another energy, namely
\[
\begin{aligned}
E^3_n(u) \;=\; h^2\sum_{i,j}\; &\frac{\mu_3}{h}\, f\!\left(\frac{\lambda_3|u_{i+1,j}-u_{i,j}|^2}{\mu_3 h}\right) + \frac{\mu_3}{h}\, f\!\left(\frac{\lambda_3|u_{i,j+1}-u_{i,j}|^2}{\mu_3 h}\right)\\
&+ \frac{\mu_3'}{h}\, f\!\left(\frac{\lambda_3'|u_{i+1,j+1}-u_{i,j}|^2}{\mu_3' h}\right) + \frac{\mu_3'}{h}\, f\!\left(\frac{\lambda_3'|u_{i-1,j+1}-u_{i,j}|^2}{\mu_3' h}\right)\\
&+ \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+2,j+1}-u_{i,j}|^2}{\mu_3'' h}\right) + \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+1,j+2}-u_{i,j}|^2}{\mu_3'' h}\right)\\
&+ \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i+2,j-1}-u_{i,j}|^2}{\mu_3'' h}\right) + \frac{\mu_3''}{h}\, f\!\left(\frac{\lambda_3''|u_{i-1,j+2}-u_{i,j}|^2}{\mu_3'' h}\right)\\
&+ |u_{i,j}-g^h_{i,j}|^2.
\end{aligned}
\]
We choose $\mu_3' = \mu_3/\sqrt 2$ and $\mu_3'' = \mu_3/\sqrt 5$. Now, as $n\to\infty$, the limit points of the minimizers of $E^3_n$ minimize
\[
E^3_\infty(u) \;=\; \bar\lambda_3\int_\Omega|\nabla u(x)|^2\,dx + \bar\mu_3\ell_3(S_u) + \int_\Omega|u(x)-g(x)|^2\,dx,
\]
with $\bar\lambda_3 = \lambda_3 + 2\lambda_3' + 10\lambda_3''$,
\[
\ell_3(E) \;=\; \frac{\pi}{16}\int_E \bigg( |\nu_1(x)|+|\nu_2(x)| + \frac{|\nu_1(x)-\nu_2(x)|+|\nu_1(x)+\nu_2(x)|}{\sqrt 2}
+ \frac{|2\nu_1(x)-\nu_2(x)|+|\nu_1(x)+2\nu_2(x)|+|2\nu_1(x)+\nu_2(x)|+|\nu_1(x)-2\nu_2(x)|}{\sqrt 5} \bigg)\, d\mathcal H^1(x)
\]
(this time $\ell_3 = (\ell_1 + \ell_1\circ R_{\pi/4} + \ell_1\circ R_\theta + \ell_1\circ R_{-\theta})/4$ with $\theta = \arctan 2$), and $\bar\mu_3 = 16\mu_3/\pi$. Figure 14 illustrates how "isotropic" the measure $\ell_3$ is, and Figures 15 and 16 show examples. (Now, the length of the longest vector in $S^1$ is about $5.0\%$ greater than the length of the shortest.) The results look slightly better than the ones obtained with energy $E^2_n$; however, the computational cost is quite a bit higher.
Figure 14: Same as Figure 7, this time for $\ell_3$.

Figure 15: Segmentation with energy $E^3_n$ (the column).

Figure 16: Segmentation with energy $E^3_n$ (the lady).
A Proof of Theorems 11 and 12

This last section is entirely devoted to the proofs of the theorems of section 5.4. Most of it relies on Gobbino's work [43], except for some adaptations and a slight difference in the proof (which avoids the use of Gobbino's technical Lemmas 3.1 and 3.2 in [43], and makes it simpler).
We let for any $u \in L^p(\Omega)$
\[
F'(u) \;=\; \Big(\Gamma\text{-}\liminf_{h\downarrow 0} F_h\Big)(u)
\]
and
\[
F''(u) \;=\; \Big(\Gamma\text{-}\limsup_{h\downarrow 0} F_h\Big)(u).
\]
In the next section A.1 we will prove a preliminary lemma that will be helpful in the sequel. Then, the aim of the following two sections A.2 and A.3 will be to prove Theorem 11, i.e., to prove that $F'(u) \ge F(u)$ and $F''(u) \le F(u)$ for all $u \in L^p(\Omega)$. Eventually, in section A.4, we will prove Theorem 12.
A.1 A compactness lemma

The lemma we show in this section will be needed to establish Theorem 12, but it will also give some a priori information on the regularity of functions $u \in L^p(\Omega)$ such that $F'(u) < +\infty$.

Lemma 7  Let $h_j \downarrow 0$ and $u_{h_j} \in \ell^p(\Omega \cap h_j\mathbb Z^N)$ be such that
\[
\sup_j F_{h_j}(u_{h_j}) < +\infty \quad\text{and}\quad \sup_j \|u_{h_j}\|_\infty < +\infty.
\]
Then there exist a subsequence (not relabeled) $u_{h_j}$ and $u \in SBV_{loc}(\Omega)$ such that
\[
u_{h_j}(x) \to u(x) \quad\text{a.e. in } \Omega
\]
as $j$ goes to infinity, and
\[
\int_\Omega|\nabla u(x)|^2\,dx + \mathcal H^{N-1}(S_u) < +\infty.
\]
Proof. In order to simplify the notations we drop the subscript $j$. Let $f = \min_{i=1,\dots,N} f_{e_i}$, $c = \min_{i=1,\dots,N} \rho(e_i) > 0$, and choose $\alpha, \beta > 0$ such that $\alpha t \wedge \beta \le f(t)$ for all $t \ge 0$. We have:
\[
F_h(u_h) \;\ge\; 2c\, h^N \sum_{x\in h\mathbb Z^N} \sum_{i=1}^N\; \alpha\left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \wedge\, \frac{\beta}{h}
\]
(remember we only sum on $x$ such that $x, x+he_i \in \Omega$). We first show that the sequence $(u_h)$ is bounded in $BV_{loc}(\Omega)$ (so that it is compact in $L^1_{loc}(\Omega)$). Choose $R > 0$, $i \in \{1,\dots,N\}$, and write (with $\Omega_{\frac12\sqrt N h} = \{x \in \Omega : \mathrm{dist}(x,\partial\Omega) > \frac12\sqrt N h\}$)
\[
|D_i u_h|\big(\Omega_{\frac12\sqrt N h} \cap B_R(0)\big) \;\le\; \sum_{x\in h\mathbb Z^N\cap B_R(0)} h^{N-1}|u_h(x)-u_h(x+he_i)|
\;\le\; \sum_{x\in X^+} 2\|u_h\|_\infty h^{N-1} + h^N\sum_{x\in X^-} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|
\]
where $X^+ = \big\{x \in h\mathbb Z^N\cap B_R(0) : |u_h(x)-u_h(x+he_i)| > \sqrt{\beta h/\alpha}\big\}$ and $X^- = h\mathbb Z^N\cap B_R(0) \setminus X^+$. Of course, we only consider points $x \in h\mathbb Z^N$ such that $x$ and $x+he_i$ belong to $\Omega$. Then, using the Cauchy–Schwarz inequality,
\[
|D_i u_h|\big(\Omega_{\frac12\sqrt N h} \cap B_R(0)\big) \;\le\; 2\|u_h\|_\infty h^{N-1}\,\# X^+ + C R^{\frac N2} \left\{ h^N\sum_{x\in X^-} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \right\}^{\frac12}
\;\le\; \frac{\|u_h\|_\infty F_h(u_h)}{\beta c} + C R^{\frac N2}\sqrt{\frac{F_h(u_h)}{2\alpha c}}
\]
with $C$ some constant depending only on $N$, so that eventually, for any $\delta > 0$,
\[
\sup_h |Du_h|\big(\Omega_\delta \cap B_R(0)\big) < +\infty.
\]
This shows that, upon extracting a subsequence, we may assume that $u_h$ converges almost everywhere in $\Omega$ (and in $L^p_{loc}(\Omega)$, as well) to some function $u$ that belongs to $BV(\Omega \cap B_R(0))$ for any $R > 0$.

Now consider the extension of $u_h$ (to $\mathbb R^N$, $u_h(x)$ being considered to be $0$ outside of $\Omega$)
\[
v_h(y) \;=\; \sum_{x\in h\mathbb Z^N} u_h(x)\,\Lambda_N\Big(\frac{y-x}{h}\Big),
\]
where $\Lambda(t) = (1-|t|)^+$ for any $t \in \mathbb R$ and $\Lambda_N(y) = \prod_{i=1}^N \Lambda(y_i)$ for any $y \in \mathbb R^N$. We estimate $\int|\nabla v_h|^2$ on an "elementary cell", for instance $(0,h)^N$:
\[
\begin{aligned}
\int_{(0,h)^N} |\partial_1 v_h(y)|^2\,dy
&= \int_{(0,h)^N} \bigg| \sum_{x\in\{0,h\}^N} u_h(x)\,\frac1h\,\Lambda'\Big(\frac{y_1-x_1}{h}\Big) \prod_{i=2}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2 dy\\
&= h \int_{(0,h)^{N-1}} dy_2\dots dy_N\, \bigg| \sum_{x\in\{0,h\}^{N-1}} \frac{u_h(h,x)-u_h(0,x)}{h} \prod_{i=2}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2\\
&= h \int_{(0,h)^{N-2}} dy_3\dots dy_N \int_0^h \bigg| \frac{y_2}{h}\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,h,x)-u_h(0,h,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\\
&\qquad\qquad + \Big(1-\frac{y_2}{h}\Big)\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,0,x)-u_h(0,0,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big) \bigg|^2 dy_2\\
&\le \frac{h^2}{2} \Bigg\{ \int_{(0,h)^{N-2}} dy_3\dots dy_N\, \bigg|\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,h,x)-u_h(0,h,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\bigg|^2\\
&\qquad\quad + \int_{(0,h)^{N-2}} dy_3\dots dy_N\, \bigg|\sum_{x\in\{0,h\}^{N-2}} \frac{u_h(h,0,x)-u_h(0,0,x)}{h} \prod_{i=3}^N \Lambda\Big(\frac{y_i-x_i}{h}\Big)\bigg|^2 \Bigg\};
\end{aligned}
\]
by induction we deduce that
\[
\int_{(0,h)^N} |\partial_1 v_h(y)|^2\,dy \;\le\; \frac{h^N}{2^{N-1}} \sum_{x\in\{0,h\}^{N-1}} \left|\frac{u_h(h,x)-u_h(0,x)}{h}\right|^2.
\]
Notice that we could therefore conclude that
\[
\int_{\Omega_{\sqrt N h}} |\nabla v_h(y)|^2\,dy \;\le\; h^N \sum_{x\in h\mathbb Z^N} \sum_{i=1}^N \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2
\]
(with $\Omega_{\sqrt N h} = \{x\in\Omega : \mathrm{dist}(x,\partial\Omega) > \sqrt N h\}$, since we control the gradient of $v_h$ only on the cubes $x+(0,h)^N$, $x \in h\mathbb Z^N$, whose $2^N$ vertices all belong to $\Omega$), but since we cannot control the right-hand side of this expression if it is summed over all $x$, we must introduce a slight modification of $v_h$: we thus define $\tilde v_h = v_h$, except each time
\[
|u_h(x)-u_h(x+he_i)| \;>\; \sqrt{\frac{\beta h}{\alpha}},
\tag{A.1}
\]
in which case we set $\tilde v_h \equiv 0$ on $(x,x+he_i)\times\prod_{i'\ne i}(x-he_{i'},x+he_{i'}) = U^h_{x,e_i}$. The new function $\tilde v_h$ is in $SBV_{loc}(\Omega)$, and $S_{\tilde v_h} \subset \bigcup_{(x,e_i)\in X_h} \partial U^h_{x,e_i}$, where the union is taken over $X_h = \{(x,e_i) : \text{(A.1) holds}\}$. Now, we can write
\[
\int_{\Omega_{\sqrt N h}} |\nabla \tilde v_h(y)|^2\,dy \;\le\; h^N \sum_{(x,e_i)\notin X_h} \left|\frac{u_h(x)-u_h(x+he_i)}{h}\right|^2 \;\le\; \frac{F_h(u_h)}{2c\alpha};
\tag{A.2}
\]
moreover, since $\mathcal H^{N-1}(\partial U^h_{x,e_i}) = \gamma h^{N-1}$ (with $\gamma = 2^{N-1}(N+1)$),
\[
\mathcal H^{N-1}\big(\Omega_{\sqrt N h} \cap S_{\tilde v_h}\big) \;\le\; \# X_h\,\gamma h^{N-1} \;\le\; \gamma\,\frac{F_h(u_h)}{2c\beta}.
\tag{A.3}
\]
From (A.2) and (A.3), and since $\sup_h \|\tilde v_h\|_\infty < +\infty$, we deduce, invoking Ambrosio's Theorem 6 (see section 2.2.6), that some subsequence of $\tilde v_h$ converges to a function $v \in L^\infty(\Omega) \cap SBV_{loc}(\Omega)$, with
\[
\int_\Omega|\nabla v(x)|^2\,dx + \mathcal H^{N-1}(S_v) \;=\; \sup_{A\subset\subset\Omega}\; \int_A|\nabla v(x)|^2\,dx + \mathcal H^{N-1}(A \cap S_v)
\;\le\; \frac{1}{2c}\Big(\frac1\alpha + \frac\gamma\beta\Big) \liminf_{h\downarrow 0} F_h(u_h) \;<\; +\infty.
\]
The proof of the lemma is achieved once we notice that $v$ must be equal to $u$ (as, for instance, by the construction of $v_h$ and $\tilde v_h$ it is simple to check that for any $A \subset\subset \Omega$ with regular boundary, $\int_A (u_h(y)-\tilde v_h(y))\,dy \to 0$ as $h \downarrow 0$).
Remark. If we drop the condition $\rho(e_i) > 0$ for $i = 1,\dots,N$, the result may be false. For instance, if $N = 1$ and $\rho \equiv 0$ except at $-2$ and $2$, where $\rho(-2) = \rho(2) = 1$, the family $(u_h)_{h>0}$ defined by
\[
u_h(kh) \;=\;
\begin{cases}
0 & \text{if } k \in 2\mathbb Z,\\
1 & \text{if } k \in 2\mathbb Z+1,
\end{cases}
\qquad\text{for every } k \in \mathbb Z,
\]
satisfies the assumptions of Lemma 7 but is not compact.
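The counterexample is easy to check numerically: with $\rho$ supported on $\{-2,2\}$, all the differences $u_h(x)-u_h(x+2h)$ of the oscillating family vanish, so the energy is identically zero while the family has no convergent subsequence (a 1-D sketch with illustrative grid sizes):

```python
# 1-D energy F_h with rho supported on {-2, 2}; terms leaving the grid are
# dropped, per the convention in the text.

def F_h_1d(u, h, rho, f):
    E = 0.0
    for k in range(len(u)):
        for xi, w in rho.items():
            if 0 <= k + xi < len(u):
                E += f((u[k] - u[k + xi]) ** 2 / h) * w  # h^N * (1/h), N = 1
    return E

f = lambda t: min(t, 1.0)
rho = {-2: 1.0, 2: 1.0}
for n in (10, 100, 1000):
    h = 1.0 / n
    u = [k % 2 for k in range(n)]  # oscillates between 0 and 1
    print(n, F_h_1d(u, h, rho, f))  # energy is 0 for every n
```

With a nearest-neighbour $\rho$ the same oscillation would be heavily penalized, which is exactly the coercivity provided by $\rho(e_i) > 0$.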
A.2 Estimate from below of the Γ-limit

In this section we wish to prove that for all $u \in L^p(\Omega)$,
\[
F(u) \;\le\; F'(u).
\tag{A.4}
\]
We must therefore prove that for any $u \in L^p(\Omega)$ and any sequence $(u_{h_j})$ that converges to $u$ in $L^p(\Omega)$ as $j\to\infty$ (with $\lim_{j\to\infty} h_j = 0$), we have
\[
F(u) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}).
\tag{A.5}
\]
Let $u \in L^p(\Omega)$; we will suppose first that it is bounded. Choose also an arbitrary decreasing sequence $h_j \downarrow 0$ and functions $u_{h_j}$ that converge to $u$ in $L^p(\Omega)$. We can assume that $\|u_{h_j}\|_\infty \le \|u\|_\infty$, as truncating $u_{h_j}$ decreases its energy $F_{h_j}(u_{h_j})$. It is clearly not restrictive to consider, as well, that the lim inf is in fact a limit, and that $\sup_j F_{h_j}(u_{h_j}) < +\infty$ (since if $\liminf_{j\to\infty} F_{h_j}(u_{h_j}) = +\infty$ the result is obvious). In view of Lemma 7 we deduce that
\[
u \in SBV_{loc}(\Omega) \quad\text{and}\quad \int_\Omega|\nabla u(x)|^2\,dx + \mathcal H^{N-1}(S_u) < +\infty.
\tag{A.6}
\]
In the sequel we will drop the subscripts $j$ and write "$h\downarrow 0$" for "$j\to\infty$". We prove (A.5) following Gobbino's method in [43], with a few modifications and adaptations. Let
\[
\Omega_h \;=\; \bigcup_{x\in\Omega\cap h\mathbb Z^N} x + \Big[-\frac h2,\frac h2\Big)^N
\]
and notice that $\Omega_{\frac12\sqrt N h} = \{x\in\Omega : \mathrm{dist}(x,\partial\Omega) > \frac12\sqrt N h\} \subset \Omega_h$. We have (still using the convention that we only consider in the sums the points that fall inside $\Omega$)
\[
\begin{aligned}
F_h(u_h) &= h^N \sum_{x\in h\mathbb Z^N} \sum_{\xi\in\mathbb Z^N} \frac1h\, f_\xi\!\left(\frac{(u_h(x)-u_h(x+h\xi))^2}{h}\right)\rho(\xi)\\
&= \int_{\Omega_h} dy \sum_{\xi\in\mathbb Z^N\cap\frac1h(\Omega_h-y)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right)\rho(\xi)\\
&= \sum_{\xi\in\mathbb Z^N} \rho(\xi) \int_{\Omega_h\cap(\Omega_h-h\xi)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy.
\end{aligned}
\]
For every $\xi \in \mathbb Z^N$ we let
\[
F_h(u_h,\xi) \;=\; \int_{\Omega_h\cap(\Omega_h-h\xi)} \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy.
\]
Inequality (A.5) will follow by Fatou's lemma if we prove that, for any $\xi$,
\[
\liminf_{h\downarrow 0}\; F_h(u_h,\xi) \;\ge\; \alpha_\xi \int_\Omega |\langle\nabla u(x),\xi\rangle|^2\,dx + \beta_\xi \int_{S_u} |\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x).
\tag{A.7}
\]
We choose $A \subset\subset \Omega$. If $h$ is small enough (i.e., $h \le \mathrm{dist}(A,\partial\Omega)/(|\xi|+\frac12\sqrt N)$), then
\[
F_h(u_h,\xi) \;\ge\; \int_A \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy
\]
and it will be sufficient to show that
\[
\liminf_{h\downarrow 0} \int_A \frac1h\, f_\xi\!\left(\frac{(u_h(y)-u_h(y+h\xi))^2}{h}\right) dy \;\ge\; \alpha_\xi\int_A|\langle\nabla u(x),\xi\rangle|^2\,dx + \beta_\xi\int_{S_u\cap A}|\langle\nu_u(x),\xi\rangle|\,d\mathcal H^{N-1}(x),
\tag{A.8}
\]
as the supremum of the right-hand side of (A.8) over all $A \subset\subset \Omega$ is the right-hand side of (A.7). This is part of Gobbino's result [43], but we present a slightly different approach, still based on the "slicing" (see section 2.2.6 for technical details) of the functions $u_h$ in the direction $\xi$.
Let $\xi^\perp = \{z \in \mathbb R^N : \langle z,\xi\rangle = 0\}$, and for every $z \in \xi^\perp$ let $A_{z,\xi} = \{s \in \mathbb R : z+s\xi \in A\}$ and $(u_h)_{z,\xi}(s) = u_h(z+s\xi)$. We rewrite the first integral over $A$ in (A.8):
\[
\begin{aligned}
&\int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{A_{z,\xi}} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(s)-(u_h)_{z,\xi}(s+h))^2}{h}\right) |\xi|\,ds =\\
&\quad= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \sum_{k\in\mathbb Z} \int_{A_{z,\xi}\cap[kh,kh+h)} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(s)-(u_h)_{z,\xi}(s+h))^2}{h}\right) ds\\
&\quad= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{[0,h)} dt \left\{ \sum_{k\in\mathbb Z} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}(t+kh)-(u_h)_{z,\xi}(t+(k+1)h))^2}{h}\right) \right\}
\end{aligned}
\]
(by the change of variable $s = t+kh$), where the sum is taken only over the $k \in \mathbb Z$ such that $t+kh \in A_{z,\xi}$. Now, with the change of variable $t = h\tau$, this becomes
\[
|\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_0^1 d\tau \left\{ h\sum_{k\in\mathbb Z} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}((\tau+k)h)-(u_h)_{z,\xi}((\tau+k+1)h))^2}{h}\right) \right\}.
\]
We will prove that for a.e. $(z,\tau) \in \xi^\perp\times(0,1)$,
\[
\liminf_{h\downarrow 0}\; h \sum_{\substack{k\in\mathbb Z\\ (\tau+k)h\in A_{z,\xi}}} \frac1h\, f_\xi\!\left(\frac{((u_h)_{z,\xi}((\tau+k)h)-(u_h)_{z,\xi}((\tau+k+1)h))^2}{h}\right) \;\ge\; \alpha_\xi \int_{A_{z,\xi}} |\dot u_{z,\xi}(s)|^2\,ds + \beta_\xi\,\mathcal H^0(S_{u_{z,\xi}} \cap A_{z,\xi}).
\tag{A.9}
\]
In order to prove (A.9), we need some information on the limit of $((u_h)_{z,\xi}((\tau+k)h))_{k\in\mathbb Z}$ as $h \downarrow 0$. Since, using the same changes of variables,
\[
\begin{aligned}
\int_\Omega|u_h(y)-u(y)|^p\,dy &= \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_{\Omega_{z,\xi}} |(u_h)_{z,\xi}(s)-u_{z,\xi}(s)|^p\,|\xi|\,ds\\
&= |\xi| \int_{\xi^\perp} d\mathcal H^{N-1}(z) \int_0^1 d\tau \left\{ h\sum_{k\in\mathbb Z} |(u_h)_{z,\xi}((\tau+k)h) - u_{z,\xi}((\tau+k)h)|^p \right\}
\end{aligned}
\]
(where in the sum we consider only $k$ such that $(\tau+k)h \in \Omega_{z,\xi}$), we may assume (upon extracting a subsequence) that for a.e. $(z,\tau) \in \xi^\perp\times(0,1)$,
\[
\lim_{h\downarrow 0}\; h\sum_{k\in\mathbb Z} |(u_h)_{z,\xi}((\tau+k)h) - u_{z,\xi}((\tau+k)h)|^p \;=\; 0.
\tag{A.10}
\]
Choose a $(z,\tau)$ such that (A.10) holds. By (A.6) we may also assume, when choosing $z$, that
\[
u_{z,\xi} \in SBV_{loc}(\Omega_{z,\xi}) \quad\text{and}\quad \int_{\Omega_{z,\xi}} |\dot u_{z,\xi}(s)|^2\,ds + \mathcal H^0(S_{u_{z,\xi}}) < +\infty,
\]
so that $u_{z,\xi}$ is continuous except at a finite number of points. Thus, for almost all $s \in \Omega_{z,\xi}$,
\[
\lim_{h\downarrow 0}\; u_{z,\xi}\Big(\Big(\tau + \Big[\frac sh\Big]\Big)h\Big) \;=\; u_{z,\xi}(s)
\]
(where $[\cdot]$ denotes the integer part). We easily deduce from this and (A.10) that the piecewise constant function $v_h : \Omega_{z,\xi}\to\mathbb R$ defined by
\[
v_h(s) \;=\; (u_h)_{z,\xi}\Big(\Big(\tau + \Big[\frac sh\Big]\Big)h\Big)
\]
converges to $u_{z,\xi}$ in $L^p_{loc}(\Omega_{z,\xi})$.
Remark. Following Gobbino (proof of Lemma 3.3, Step 2, in [43]) we could also prove that for a.e. $\tau \in (0,1)$, $u_{z,\xi}((\tau+[s/h])h) \to u_{z,\xi}(s)$ in $L^1_{loc}(\Omega_{z,\xi})$, so that the a priori information on the regularity of $u$ is not really needed.
We return to the proof of inequality (A.9). Notice that if $f(t) = t \wedge 1$, it is simply a consequence of Theorem 9. The proof that follows is needed because we want to consider more general functions $f$, and it provides a generalization of the thesis of Theorem 9 to such functions. For any $I \subset\subset A_{z,\xi}$, we denote
\[
G(v_h,I) \;=\; \int_I \frac1h\, f_\xi\!\left(\frac{|v_h(s+h)-v_h(s)|^2}{h}\right) ds
\;=\; \sum_{k\in\mathbb Z} |(kh,kh+h)\cap I|\,\frac1h\, f_\xi\!\left(\frac{|v_h((k+1)h)-v_h(kh)|^2}{h}\right).
\]
If $h$ is small enough, $(\tau+[s/h])h \in A_{z,\xi}$ for every $s \in I$, so that the lim inf in (A.9) is greater than $\liminf_{h\downarrow 0} G(v_h,I)$. Therefore, we just need to prove that for any $I \subset\subset A_{z,\xi}$,
\[
\liminf_{h\downarrow 0}\; G(v_h,I) \;\ge\; \alpha_\xi \int_I |\dot u_{z,\xi}(s)|^2\,ds + \beta_\xi\,\mathcal H^0(S_{u_{z,\xi}} \cap I);
\tag{A.11}
\]
indeed, taking then the lowest greater bound of the right-hand term of (A.11)
for all I, we will get (A.9). Because of the super-additivity of lim infh#0G(vh; �)we may assume without loss of generality that I is an interval. To prove (A.11),
we then choose �; � > 0 such that �t ^ � � f�(t) for all t � 0 (noticing that
��respectively, ��may be chosen as close as wanted to ���resp., ��), and
we write
\[
G(v_h, I) \;\ge \sum_{(kh, kh+h)\subset I} h\alpha \left| \frac{v_h((k+1)h) - v_h(kh)}{h} \right|^2 \wedge\, \beta.
\]
Redefining a function $\tilde{v}_h$ with $\tilde{v}_h(kh) = v_h(kh)$ for $kh \in I$, affine on the
intervals $(kh, kh+h) \subset I$ such that $h\alpha \big| \frac{v_h((k+1)h) - v_h(kh)}{h} \big|^2 \le \beta$, and piecewise
constant, jumping once, on the intervals with the reverse inequality (just like
in the proof of Theorem 9), we get
\[
G(v_h, I) \;\ge\; \alpha \int_{I_h} |\dot{\tilde{v}}_h(s)|^2\, ds + \beta\, \mathcal{H}^0(S_{\tilde{v}_h} \cap I_h)
\]
with $I_h = \{ x \in I : {\rm dist}(x, \mathbb{R}\setminus I) > h \}$, so that invoking Theorem 6
(section 2.2.6) we get the existence of a function $\tilde{v}$ such that some subsequence
of $\tilde{v}_h$ goes to $\tilde{v}$ a.e., and that satisfies
\[
\alpha \int_I |\dot{\tilde{v}}(s)|^2\, ds + \beta\, \mathcal{H}^0(S_{\tilde{v}} \cap I) \;\le\; \liminf_{h\downarrow 0} G(v_h, I). \tag{A.12}
\]
We check then that $\tilde{v}$ has to be equal to $u_{z,\xi}$ (noticing easily, for instance,
that $(v_h - \tilde{v}_h) \rightharpoonup 0$ weakly in $L^p$). Letting $\alpha \to \overline{\alpha}_\xi$ we deduce from (A.12)
\[
\overline{\alpha}_\xi \int_I |\dot{u}_{z,\xi}(s)|^2\, ds \;\le\; \liminf_{h\downarrow 0} G(v_h, I), \tag{A.13}
\]
82 A. Chambolle
whereas sending $\beta$ to $\overline{\beta}_\xi$ we get
\[
\overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}} \cap I) \;\le\; \liminf_{h\downarrow 0} G(v_h, I). \tag{A.14}
\]
Inequality (A.11) is deduced from the last two inequalities by subdividing the
interval $I$ into suitable subintervals (the connected components of a small
neighborhood of $S_{u_{z,\xi}}$ and of its complement) and using the appropriate
inequality (A.13) or (A.14) in each subinterval. Hence (A.9) holds, and using
Fatou's lemma we deduce (A.8), as
\[
|\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \left( \overline{\alpha}_\xi \int_{A_{z,\xi}} |\dot{u}_{z,\xi}(s)|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}} \cap A_{z,\xi}) \right) =
\]
\[
= \overline{\alpha}_\xi \int_A |\langle \nabla u(x), \xi \rangle|^2\, dx + \overline{\beta}_\xi \int_{S_u \cap A} |\langle \nu_u(x), \xi \rangle|\, d\mathcal{H}^{N-1}(x).
\]
Inequality (A.5) therefore holds in the case $u \in L^\infty(\Omega)$.
Now, if $u \in L^p(\Omega)$ is not bounded, choose again $u_{h_j} \to u$ in $L^p(\Omega)$. Consider
$u^k = (-k \vee u) \wedge k$ and $u^k_{h_j} = (-k \vee u_{h_j}) \wedge k$; clearly $u^k_{h_j} \to u^k$ in $L^p(\Omega)$,
so that
\[
F(u^k) \;\le\; \liminf_{j\to\infty} F_{h_j}(u^k_{h_j}).
\]
But as $f$ is increasing, $F_{h_j}(u^k_{h_j}) \le F_{h_j}(u_{h_j})$, so that
\[
F(u^k) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}).
\]
If this is finite, we conclude by noticing that $\lim_{k\to\infty} F(u^k) = F(u)$ (by (16),
(17)), so that the proof of (A.4) is achieved.
Remark. Notice that if $u_{h_j} \to u$ in $L^p_{\rm loc}(\Omega)$, the result still holds. Indeed,
for any $A \subset\subset \Omega$ we have $u_{h_j} \to u$ in $L^p(A)$, and since the result holds in this
case we can write
\[
F(u, A) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}, A) \;\le\; \liminf_{j\to\infty} F_{h_j}(u_{h_j}, \Omega).
\]
Then, as $F(u, \Omega) = \sup_{A\subset\subset\Omega} F(u, A)$, we get (A.5). (Thus the $F_h$ also
$\Gamma$-converge to $F$ in $L^p(\Omega)$ endowed with the $L^p_{\rm loc}(\Omega)$ topology.)
A.3 Estimate from above of the $\Gamma$-limit
Given $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$ with $F(u) = F(u, \Omega) < +\infty$, we want to
build $u_h \in \ell^p(\Omega \cap h\mathbb{Z}^N)$ such that $u_h \to u$ in $L^p(\Omega)$ and
\[
\limsup_{h\downarrow 0} F_h(u_h) \;\le\; F(u). \tag{A.15}
\]
In order to be able to assume some regularity on the function $u$ we first prove
the following lemma. It is a (simpler) variant of the results in [30] and [26]
that are usually needed to show the $\Gamma$-$\limsup$ inequality for most approximations
of the Mumford–Shah functional, like Ambrosio and Tortorelli's. For
$F_h$, however, a very strong regularity of the jump set is not needed, and this
lemma is sufficient.
Lemma 8. Let $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$ with $F(u) < +\infty$. There exists
a sequence $(u_k)_{k\ge 1} \subset SBV(\Omega)$ of bounded functions with bounded supports,
that are almost everywhere continuous in $\Omega$ and such that
- $u_k \to u$ in $L^p(\Omega)$ as $k$ goes to infinity,
- $\lim_{k\to\infty} F(u_k) = F(u)$.
Remark. The information on the support of $u_k$ is relevant only when $\Omega$
is unbounded.
Proof. For every integer $k \ge 1$, first let $u^k = (-k \vee u) \wedge k$ be the truncation
of $u$ at level $k$. We choose in $L^p(\mathbb{R}^N)$ a minimizer $v_k$ of
\[
v \;\mapsto\; F(v) + k \int_{\mathbb{R}^N} |v(x) - u^k(x)|^p\, dx.
\]
Then,
\[
\| v_k - u \|_{L^p(\mathbb{R}^N)} \le \| v_k - u^k \|_{L^p(\mathbb{R}^N)} + \| u^k - u \|_{L^p(\mathbb{R}^N)}
\le \left( \frac{1}{k} F(u^k) \right)^{\frac{1}{p}} + \left( \int_{\{|u|>k\}} (|u(x)|-k)^p\, dx \right)^{\frac{1}{p}}
\]
\[
\le \left( \frac{1}{k} F(u) \right)^{\frac{1}{p}} + \left( \int_{\{|u|>k\}} |u(x)|^p\, dx \right)^{\frac{1}{p}} \to 0
\]
as $k\to\infty$; moreover (see the observation about the functional $E_0$ defined by (20)
in section 2.3.2), we know that $\mathcal{H}^{N-1}(S_{v_k} \setminus S_u) = 0$ and $v_k \in C^1(\mathbb{R}^N \setminus S_{v_k})$.
In particular $v_k$ is almost everywhere continuous. We also have $F(v_k) \le F(u^k) \le F(u)$ and $|v_k(x)| \le k$ for all $x \in \Omega$. Set now, for every integer $n > 1$
and $x \in \Omega$,
\[
v_{k,n}(x) =
\begin{cases}
v_k(x) - \frac{1}{n} & \text{if } v_k(x) > 1/n, \\
0 & \text{if } |v_k(x)| \le 1/n, \\
v_k(x) + \frac{1}{n} & \text{if } v_k(x) < -1/n.
\end{cases}
\]
Clearly $v_{k,n}$ is still a.e. continuous and goes to $v_k$ in $L^p(\Omega)$ as $n\to\infty$, so that
we can choose $n_k$ such that $\| v_{k,n_k} - v_k \|_{L^p(\Omega)} \le 1/k$. We set $w_k = v_{k,n_k}$.
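The truncation $v_{k,n}$ is exactly soft-thresholding at level $1/n$; a minimal numerical sketch (our illustration, with `eps` playing the role of $1/n$):

```python
import numpy as np

# v_{k,n} from the proof, with eps = 1/n: values above eps are shifted down by
# eps, values below -eps are shifted up by eps, and the rest are set to 0.
def soft_threshold(v, eps):
    return np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)
```

This changes $v_k$ by at most $1/n$ at every point, while making the set where the result is nonzero of finite measure, by Chebyshev's inequality.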
We also have $S_{w_k} \subset S_{v_k}$,
\[
\nabla w_k =
\begin{cases}
\nabla v_k & \text{a.e. in } \{ x \in \Omega : |v_k(x)| > 1/n_k \}, \\
0 & \text{a.e. in the complement,}
\end{cases}
\]
and
\[
|\{ w_k \ne 0 \}| = \left| \left\{ |v_k| > \frac{1}{n_k} \right\} \right| \le n_k^p \int_\Omega |v_k(x)|^p\, dx < +\infty,
\]
so that in particular $w_k \in L^q(\Omega)$ for any $q \in [1, +\infty]$.
Choose at last $\varphi \in C^1_0(\mathbb{R}^N)$ with $0 \le \varphi \le 1$ and $\varphi \equiv 1$ on $B_1(0)$, and set
for $R > 0$ and any $x \in \Omega$, $w_{k,R}(x) = \varphi\big(\frac{x}{R}\big) w_k(x)$. For any $R$,
\[
S_{w_{k,R}} \subset S_{w_k} \subset S_{v_k},
\]
and if $\xi \in \mathbb{Z}^N$,
\[
\int_\Omega |\langle \nabla w_{k,R}(x), \xi \rangle|^2\, dx =
\int_{B_R(0)\cap\Omega} |\langle \nabla w_k(x), \xi \rangle|^2\, dx
+ \int_{\Omega\setminus B_R(0)} \left| \varphi\Big(\frac{x}{R}\Big) \langle \nabla w_k(x), \xi \rangle + \frac{w_k(x)}{R} \Big\langle \nabla\varphi\Big(\frac{x}{R}\Big), \xi \Big\rangle \right|^2 dx
\]
\[
\le \int_{B_R(0)\cap\Omega} |\langle \nabla v_k(x), \xi \rangle|^2\, dx
+ 2 \int_{\Omega\setminus B_R(0)} |\langle \nabla v_k(x), \xi \rangle|^2\, dx
+ \frac{C}{R^2} |\xi|^2 \int_{\Omega\setminus B_R(0)} |w_k(x)|^2\, dx
\]
with $C = 2\|\nabla\varphi\|^2_{L^\infty(\mathbb{R}^N)}$. Hence
\[
F(w_{k,R}) \;\le\; F(v_k) + \left( \sum_{\xi\in\mathbb{Z}^N} \overline{\alpha}_\xi\, \rho(\xi)\, |\xi|^2 \right)
\left( 2 \int_{\Omega\setminus B_R(0)} |\nabla v_k(x)|^2\, dx + \frac{C}{R^2} \int_{\Omega\setminus B_R(0)} |w_k(x)|^2\, dx \right).
\]
Since $w_k$ and $\nabla v_k$ are in $L^2(\Omega)$, we can choose $R$ large enough in order to
have
\[
F(w_{k,R}) \;\le\; F(v_k) + \frac{1}{k}. \tag{A.16}
\]
Choose $R_k$ large enough so that (A.16) holds and $\| w_{k,R_k} - w_k \|_{L^p(\Omega)} \le 1/k$,
and set $u_k = w_{k,R_k}$. Clearly $u_k$ is still a.e. continuous. Moreover, $F(u_k) \le F(u) + 1/k$, $u_k$ goes to $u$ in $L^p(\Omega)$ as $k\to\infty$, and by Theorem 6 (section 2.2.6)
we deduce that
\[
F(u) \;\le\; \liminf_{k\to\infty} F(u_k),
\]
so that $\lim_{k\to\infty} F(u_k) = F(u)$ and the lemma is proved.

We now establish (A.15). First consider the case $\Omega = \mathbb{R}^N$. Given $u \in GSBV_{\rm loc}(\mathbb{R}^N) \cap L^p(\mathbb{R}^N)$ with $F(u) < +\infty$, we build, invoking Lemma 8, a
sequence of compactly supported, bounded and a.e. continuous functions $u_k$
converging to $u$ such that $F(u_k) \to F(u)$ as $k$ goes to infinity. By a standard
diagonalization procedure, if we know how to build for every $k$ a sequence
$((u_k)_h)_{h>0}$ converging to $u_k$ in $L^p(\mathbb{R}^N)$ as $h\downarrow 0$, such that
\[
\limsup_{h\downarrow 0} F_h((u_k)_h) \;\le\; F(u_k),
\]
we will be able to find $u_h$ with $u_h \to u$ and satisfying (A.15). In the sequel we
may therefore assume that $u$ is bounded, compactly supported, and continuous
at almost every $x \in \mathbb{R}^N$.
For $y \in (0,h)^N$ define $u^y_h \in \ell^p(h\mathbb{Z}^N)$ by $u^y_h(x) = u(y+x)$ for any $x \in h\mathbb{Z}^N$.
We compute the mean of $F_h(u^y_h)$ over $(0,h)^N$:
\[
h^{-N} \int_{(0,h)^N} F_h(u^y_h)\, dy
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \sum_{x\in h\mathbb{Z}^N} \int_{(0,h)^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y+x) - u(y+x+h\xi))^2}{h} \right) dy
\]
\[
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \int_{\mathbb{R}^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y) - u(y+h\xi))^2}{h} \right) dy.
\]
At this point (following exactly Gobbino's proof), we write
\[
\int_{\mathbb{R}^N} \frac{1}{h} f_\xi\!\left( \frac{(u(y) - u(y+h\xi))^2}{h} \right) dy =
\]
\[
= \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \int_{\mathbb{R}} dt\, \frac{1}{h} f_\xi\!\left( \frac{\big( u\big(z + t\frac{\xi}{|\xi|}\big) - u\big(z + t\frac{\xi}{|\xi|} + h\xi\big) \big)^2}{h} \right)
\]
\[
= |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \int_{\mathbb{R}} ds\, \frac{1}{h} f_\xi\!\left( \frac{(u(z+s\xi) - u(z+(s+h)\xi))^2}{h} \right)
= |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z)\, F^1_{\xi,h}(u_{z,\xi})
\]
where $u_{z,\xi}(s) = u(z+s\xi)$ and we have set
\[
F^1_{\xi,h}(v) = \int_{\mathbb{R}} \frac{1}{h} f_\xi\!\left( \frac{(v(s) - v(s+h))^2}{h} \right) ds
\]
for any measurable function $v$. Since we assumed $f_\xi(t) \le \overline{\alpha}_\xi t \wedge \overline{\beta}_\xi$, we have
\[
F^1_{\xi,h}(v) \;\le\; \int_{\mathbb{R}} \overline{\alpha}_\xi \left| \frac{v(s) - v(s+h)}{h} \right|^2 \wedge\, \frac{\overline{\beta}_\xi}{h}\; ds \tag{A.17}
\]
and as shown in [43] by M. Gobbino, this is less than
\[
\overline{\alpha}_\xi \int_{\mathbb{R}} |\dot{v}(s)|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_v)
\]
provided $v \in SBV_{\rm loc}(\mathbb{R})$ and this expression is finite.
Exercise. Check this fact, by computing the integral in (A.17) separately
over $S^h_v = \bigcup_{s\in S_v} [s-h, s]$ and over $\mathbb{R} \setminus S^h_v$.
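For the reader's convenience, here is a sketch of one way to carry out this check (our reconstruction, not taken from the text). Off $S^h_v$ there is no jump in $[s, s+h]$, so $v(s+h) - v(s) = \int_s^{s+h} \dot{v}(t)\, dt$, and by Cauchy-Schwarz and Fubini
\[
\int_{\mathbb{R}\setminus S^h_v} \overline{\alpha}_\xi \left| \frac{v(s) - v(s+h)}{h} \right|^2 ds
\;\le\; \overline{\alpha}_\xi \int_{\mathbb{R}\setminus S^h_v} \frac{1}{h} \int_s^{s+h} |\dot{v}(t)|^2\, dt\, ds
\;\le\; \overline{\alpha}_\xi \int_{\mathbb{R}} |\dot{v}(t)|^2\, dt,
\]
while on $S^h_v$, whose measure is at most $h\, \mathcal{H}^0(S_v)$, the integrand of (A.17) is bounded by $\overline{\beta}_\xi / h$, contributing at most $\overline{\beta}_\xi\, \mathcal{H}^0(S_v)$. Adding the two parts gives the bound.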
Therefore,
\[
h^{-N} \int_{(0,h)^N} F_h(u^y_h)\, dy
\;\le\; \sum_{\xi\in\mathbb{Z}^N} \rho(\xi)\, |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z)\, F^1_{\xi,h}(u_{z,\xi})
\]
\[
\le\; \sum_{\xi\in\mathbb{Z}^N} \rho(\xi)\, |\xi| \int_{\xi^\perp} d\mathcal{H}^{N-1}(z) \left( \overline{\alpha}_\xi \int_{\mathbb{R}} |\langle \nabla u(z+s\xi), \xi \rangle|^2\, ds + \overline{\beta}_\xi\, \mathcal{H}^0(S_{u_{z,\xi}}) \right)
\]
\[
= \sum_{\xi\in\mathbb{Z}^N} \rho(\xi) \left( \int_{\mathbb{R}^N} \overline{\alpha}_\xi |\langle \nabla u(x), \xi \rangle|^2\, dx + \int_{S_u} \overline{\beta}_\xi |\langle \nu_u(x), \xi \rangle|\, d\mathcal{H}^{N-1}(x) \right) = F(u).
\]
Thus, for $y$ in some set of positive measure in $(0,h)^N$,
\[
F_h(u^y_h) \;\le\; F(u). \tag{A.18}
\]
For all $h$ we choose $y_h$ such that inequality (A.18) holds and set $u_h = u^{y_h}_h$.
We easily check that if $u$ is continuous at $x \in \mathbb{R}^N$ then $u_h(x) \to u(x)$ as $h\downarrow 0$
(since $u_h(x) = u(x')$ for some $x'$ such that $|x - x'| < \frac{3}{2}\sqrt{N}\, h$). Since $u$ is
almost everywhere continuous, $u_h$ converges to $u$ a.e. in $\mathbb{R}^N$. We also have
$\| u_h \|_{L^\infty(\mathbb{R}^N)} \le \| u \|_{L^\infty(\mathbb{R}^N)}$ and the functions $u_h$, $u$ are zero outside some
compact set, so that by Lebesgue's theorem $u_h \to u$ in $L^p(\mathbb{R}^N)$. Since (A.15)
clearly holds for this sequence $u_h$, the proof of the case $\Omega = \mathbb{R}^N$ is achieved.
We now return to the general case where $\Omega$ is a Lipschitz domain. The
method used in order to localize the previous result is adapted from [23].
We choose a function $u \in GSBV_{\rm loc}(\Omega) \cap L^p(\Omega)$, and once again invoking
Lemma 8 we see that it is not restrictive to assume that $u$ is bounded with
bounded support. Since we assumed that $\partial\Omega$ is Lipschitz (and since $u$ is
zero outside some bounded set), we can extend $u$ outside of $\Omega$ (using the
same reflection procedure as, for instance, in [34] for the extension of $W^{1,p}$
functions) into a bounded, compactly supported $SBV$ function (still denoted
by $u$) such that $\mathcal{H}^{N-1}(\partial\Omega \cap S_u) = 0$ and $F(u, \mathbb{R}^N) < +\infty$. Then, we build
$(u_h)$ as previously, such that $u_h$ goes to $u$ in $L^p(\mathbb{R}^N)$ and
\[
\limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\le\; F(u, \mathbb{R}^N).
\]
We can write
\[
F_h(u_h, \mathbb{R}^N) \;\ge\; F_h(u_h, \Omega) + F_h(u_h, \Omega^c)
\]
where $\Omega^c$ is the complement of $\Omega$ in $\mathbb{R}^N$. Notice that we have dropped all
terms involving differences of values of $u_h$ at one point in $\Omega$ and another in
$\Omega^c$. Sending $h$ to zero we get
\[
\limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\ge\; \limsup_{h\downarrow 0} F_h(u_h, \Omega) + \liminf_{h\downarrow 0} F_h(u_h, \Omega^c),
\]
and we deduce from (A.4) that
\[
\limsup_{h\downarrow 0} F_h(u_h, \Omega) + F(u, \Omega^c) \;\le\; \limsup_{h\downarrow 0} F_h(u_h, \mathbb{R}^N) \;\le\; F(u, \mathbb{R}^N).
\]
Thus, $u$ being extended in such a way that $F(u, \Omega^c) < +\infty$,
\[
\limsup_{h\downarrow 0} F_h(u_h, \Omega) \;\le\; F(u, \overline{\Omega}).
\]
Since $\mathcal{H}^{N-1}(\partial\Omega \cap S_u) = 0$, $F(u, \overline{\Omega}) = F(u, \Omega)$ and we get the conclusion. This
achieves the proof of Theorem 11.
A.4 Proof of Theorem 12
For any $h > 0$ let $u_h$ be a minimizer in $\ell^p(\Omega \cap h\mathbb{Z}^N)$ of
\[
F_h(u) + \int_\Omega |u(x) - g(x)|^p\, dx \tag{A.19}
\]
where $g \in L^\infty(\Omega) \cap L^p(\Omega)$. Replacing $u_h$ with $(-\|g\|_{L^\infty(\Omega)} \vee u_h) \wedge \|g\|_{L^\infty(\Omega)}$
decreases the energy, thus in fact $\| u_h \|_{L^\infty(\Omega)} \le \| g \|_{L^\infty(\Omega)}$. In view of Lemma 7,
since $\sup_{h>0} F_h(u_h) < +\infty$, some subsequence $(u_{h_j})_{j\ge 1}$ of $(u_h)_{h>0}$ converges
to a function $u \in SBV_{\rm loc}(\Omega)$ a.e. in $\Omega$. From the uniform bound on $\|u_h\|_\infty$ we
deduce that $u_{h_j} \to u$ in $L^p_{\rm loc}(\Omega)$.
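The truncation step can be checked numerically in dimension one. In the sketch below (an illustration only: $f(t) = t \wedge 1$, $p = 2$, and the grid weights are our arbitrary choices, not the text's exact discretization), clamping a candidate $u$ at $\pm\|g\|_\infty$ never increases the energy, since $f$ is increasing and clamping is 1-Lipschitz.

```python
import numpy as np

# 1-d toy version of (A.19) with f(t) = min(t, 1) and p = 2.
def energy(u, g, h):
    fh = np.sum(np.minimum(np.diff(u)**2 / h, 1.0))   # discrete Mumford-Shah part
    return fh + h * np.sum((u - g)**2)                # plus the fidelity term

rng = np.random.default_rng(0)
h = 0.01
g = rng.uniform(-1.0, 1.0, size=200)
u = rng.normal(0.0, 3.0, size=200)        # values may well exceed ||g||_inf
K = float(np.max(np.abs(g)))
u_clamped = np.clip(u, -K, K)             # (-K v u) ^ K

# Clamping shrinks both the differences and the distance to g, term by term.
assert energy(u_clamped, g, h) <= energy(u, g, h)
```

The inequality holds termwise: clamping can only reduce $|u(x) - u(x')|$ (and $f$ is increasing), and since $g$ takes values in $[-K, K]$ it can only reduce $|u(x) - g(x)|$ as well.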
If $|\Omega| < +\infty$, the convergence is in $L^p(\Omega)$ and we simply conclude invoking
Theorem 7 (section 2.4). Otherwise, we know (by the remark at the end of
section A.2 and Fatou's lemma) that
\[
F(u) + \int_\Omega |u(x) - g(x)|^p\, dx \;\le\; \liminf_{j\to\infty} \left( F_{h_j}(u_{h_j}) + \int_\Omega |u_{h_j}(x) - g(x)|^p\, dx \right).
\]
For any $v \in L^p(\Omega)$, we consider a sequence $(v_{h_j})_{j\ge 1}$ converging to $v$ in $L^p(\Omega)$
such that
\[
\limsup_{j\to\infty} F_{h_j}(v_{h_j}) \;\le\; F(v).
\]
For all $j$ we have
\[
F_{h_j}(u_{h_j}) + \int_\Omega |u_{h_j}(x) - g(x)|^p\, dx \;\le\; F_{h_j}(v_{h_j}) + \int_\Omega |v_{h_j}(x) - g(x)|^p\, dx,
\]
so that in the limit we get
\[
F(u) + \int_\Omega |u(x) - g(x)|^p\, dx \;\le\; F(v) + \int_\Omega |v(x) - g(x)|^p\, dx,
\]
showing the minimality of $u$. If we choose $v = u$, we also deduce that
\[
\lim_{j\to\infty} \| u_{h_j} - g \|_{L^p(\Omega)} = \| u - g \|_{L^p(\Omega)};
\]
thus, by equi-integrability, $u_{h_j} \to u$ strongly in $L^p(\Omega)$, since we had the convergence
in $L^p_{\rm loc}(\Omega)$. In the case where we minimize
\[
F_h(u) + \| u - g_h \|_p^p
\]
instead of (A.19), the proof is not different.
References
[1] G. Alberti. Variational models for phase transitions, an approach via gamma-convergence. In G. Buttazzo et al., editors, Differential Equations and Calculus of Variations. Springer-Verlag, 2000. (Also available at http://cvgmt.sns.it/papers/.)
[2] L. Ambrosio. A compactness theorem for a new class of functions with bounded variation. Boll. Un. Mat. Ital. (7), 3-B:857–881, 1989.
[3] L. Ambrosio. Variational problems in SBV and image segmentation. Acta Appl. Math., 17:1–40, 1989.
[4] L. Ambrosio. Existence theory for a new class of variational problems. Arch. Rat. Mech. Anal., 111(1):291–322, 1990.
[5] L. Ambrosio. A new proof of the SBV compactness theorem. Calc. Var. Partial Differential Equations, 3(1):127–137, 1995.
[6] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. Clarendon Press, Oxford, 2000.
[7] L. Ambrosio and V.M. Tortorelli. Approximation of functionals depending on jumps by elliptic functionals via $\Gamma$-convergence. Comm. Pure Appl. Math., 43(8):999–1036, 1990.
[8] L. Ambrosio and V.M. Tortorelli. On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. (7), 6-B:105–123, 1992.
[9] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Series. Pitman (Advanced Publishing Program), Boston, Mass.–London, 1984.
[10] G. Aubert, M. Barlaud, P. Charbonnier, and L. Blanc-Féraud. Deterministic edge-preserving regularization in computed imaging. Technical Report TR#94-01, I3S, CNRS URA 1376, Sophia-Antipolis, France, 1994.
[11] R. Azencott. Image analysis and Markov fields. In SIAM Proceedings of 1st Int. Conf. Appl. Math., Paris, 1987.
[12] R. Azencott. Markov fields and image analysis. In Proceedings AFCET Congress, Antibes, France, 1987.
[13] G. Bellettini and A. Coscia. Discrete approximation of a free discontinuity problem. Numer. Funct. Anal. Optim., 15(3-4):201–224, 1994.
[14] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 1987.
[15] L. Blanc-Féraud and M. Barlaud. Restauration d'images bruitées par analyse multirésolution et champs de Markov. In Proceedings of "13e colloque GRETSI sur le traitement du signal et des images", Juan-les-Pins, France, pages 829–832, 1991.
[16] B. Bourdin. Image segmentation with a finite element method. M2AN Math. Model. Numer. Anal., 33(2):229–244, 1999.
[17] B. Bourdin and A. Chambolle. Implementation of a finite-elements approximation of the Mumford–Shah functional. Numer. Math., 85(4):609–646, 2000.
[18] A. Braides. Approximation of Free-Discontinuity Problems. Number 1694 in Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1998.
[19] A. Braides and G. Dal Maso. Non-local approximation of the Mumford–Shah functional. Calc. Var. Partial Differential Equations, 5(4):293–322, 1997.
[20] A. Chambolle. Un théorème de $\Gamma$-convergence pour la segmentation des signaux. C. R. Acad. Sci. Paris, t. 314, Série I:191–196, 1992.
[21] A. Chambolle. Image segmentation by variational methods: Mumford and Shah functional and the discrete approximations. SIAM J. Appl. Math., 55(3):827–863, 1995.
[22] A. Chambolle. Finite-differences discretizations of the Mumford–Shah functional. M2AN Math. Model. Numer. Anal., 33(2):261–288, 1999.
[23] A. Chambolle and G. Dal Maso. Discrete approximation of the Mumford–Shah functional in dimension two. M2AN Math. Model. Numer. Anal., 33(4):651–672, 1999.
[24] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numer. Math., 76(2):167–188, 1997.
[25] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput., 20(6):1964–1977, 1999.
[26] G. Cortesani. Strong approximation of GSBV functions by piecewise smooth functions. Ann. Univ. Ferrara Sez. VII (N.S.), 43:27–49, 1997.
[27] G. Dal Maso. An Introduction to $\Gamma$-convergence. Birkhäuser, Boston, 1993.
[28] G. Dal Maso, J.-M. Morel, and S. Solimini. A variational method in image segmentation: Existence and approximation results. Acta Math., 168:89–151, 1992.
[29] E. De Giorgi, M. Carriero, and A. Leaci. Existence theorem for a minimum problem with free discontinuity set. Arch. Rational Mech. Anal., 108:195–218, 1989.
[30] F. Dibos and E. Séré. An approximation result for the minimizers of the Mumford–Shah functional. Boll. Un. Mat. Ital. (7), 11-A:149–162, 1997.
[31] D. C. Dobson and C. R. Vogel. Convergence of an iterative method for total variation denoising. SIAM J. Numer. Anal., 34(5):1779–1791, 1997.
[32] R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen. MRF model-based algorithms for image segmentation. In Proc. 10th IEEE Int. Conf. on Pattern Recognition, Atlantic City, pages 808–814, 1990.
[33] S. Durand, F. Malgouyres, and B. Rougé. Image de-blurring, spectrum interpolation and application to satellite imaging. Technical Report 9916, CMLA, ENS Cachan, 1999.
[34] L. C. Evans and R. F. Gariepy. Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, 1992.
[35] K. J. Falconer. The Geometry of Fractal Sets. Cambridge University Press, Cambridge, 1985.
[36] H. Federer. Geometric Measure Theory. Springer-Verlag, New York, 1969.
[37] S. Finzi-Vita and P. Perugia. Some numerical experiments on the energy minimization problem. In Proc. of the Second European Workshop on Image Processing and Mean Curvature Motion, pages 233–240, Palma de Mallorca, September 1995.
[38] D. Geiger and F. Girosi. Parallel and deterministic algorithms for MRFs: surface reconstruction. IEEE Trans. PAMI, PAMI-13(5):401–412, May 1991.
[39] D. Geiger and A. Yuille. A common framework for image segmentation. Internat. J. Comp. Vision, 6(3):227–243, August 1991.
[40] D. Geman and G. Reynolds. Constrained image restoration and the recovery of discontinuities. IEEE Trans. PAMI, PAMI-14(3):367–383, 1992.
[41] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, PAMI-6(6), November 1984.
[42] E. Giusti. Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Boston, 1984.
[43] M. Gobbino. Finite difference approximation of the Mumford–Shah functional. Comm. Pure Appl. Math., 51(2):197–228, 1998.
[44] F. Guichard and F. Malgouyres. Total variation based interpolation. In Proceedings of the European Signal Processing Conference, volume 3, pages 1741–1744, 1998.
[45] S. Z. Li. Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995. (See also http://markov.eee.ntu.ac.sg:8000/~szli/MRF_Book/MRF_Book.html.)
[46] P.-L. Lions, S. J. Osher, and L. Rudin. Denoising and deblurring using constrained nonlinear partial differential equations. Technical report, Cognitech Inc., Santa Monica, CA, 1992.
[47] F. Malgouyres and F. Guichard. Edge direction preserving image zooming: a mathematical and numerical analysis. Technical Report 9930, CMLA, ENS Cachan, 1999.
[48] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation. Birkhäuser, Boston, 1995.
[49] D. Mumford and J. Shah. Boundary detection by minimizing functionals, I. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, 1985. (Also Image Understanding, 1988.)
[50] D. Mumford and J. Shah. Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42:577–685, 1989.
[51] M. Nitzberg, D. Mumford, and T. Shiota. Filtering, Segmentation and Depth. Number 662 in Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1993.
[52] T. J. Richardson and S. K. Mitter. A variational formulation-based edge focussing algorithm. Sadhana, 22(4):553–574, 1997.
[53] L. Rudin and S. J. Osher. Total variation based image restoration with free local constraints. In Proceedings of the IEEE ICIP'94, volume 1, pages 31–35, Austin, Texas, 1994.
[54] L. Rudin, S. J. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992. (Also in Experimental Mathematics: Computational Issues in Nonlinear Science (Proc. Los Alamos Conf. 1991).)
[55] L. Schwartz. Théorie des distributions. Hermann, Paris, 1966.
[56] J. Shah. Properties of segmentations which minimize energy functionals. Preprint, Northeastern Univ. Math. Dept., Boston, December 1988.
[57] J. Shah. Parameter estimation, multiscale representation and algorithms for energy-minimizing segmentations. In Proc. 10th IEEE Int. Conf. on Pattern Recognition, Atlantic City, pages 815–819, 1990.
[58] J. Shah. Segmentation by nonlinear diffusion. In Proc. IEEE Comp. Soc. Conf. on Pattern Recognition, pages 202–207, 1991.
[59] J. Shah. Segmentation by nonlinear diffusion, II. In Proc. IEEE Comp. Soc. Conf. on Pattern Recognition, Champaign, IL, pages 644–647, June 1992.
[60] C. R. Vogel and M. E. Oman. Iterative methods for total variation denoising. SIAM J. Sci. Comput., 17(1):227–238, 1996. Special issue on iterative methods in numerical linear algebra (Breckenridge, CO, 1994).
[61] C. R. Vogel and M. E. Oman. Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Process., 7(6):813–824, 1998.
[62] W. P. Ziemer. Weakly Differentiable Functions. Springer-Verlag, Berlin, 1989.