Sparsity and Compressed Sensing
Gabriel Peyré
www.numerical-tours.com
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Inverse Problems

Forward model: y = K f₀ + w ∈ ℝ^P
(unknown input f₀, operator K : ℝ^Q → ℝ^P, observations y, noise w)

Denoising: K = Id_Q, P = Q.

Inpainting: set Ω of missing pixels, P = Q − |Ω|,
  (Kf)(x) = { 0 if x ∈ Ω, f(x) if x ∉ Ω. }

Super-resolution: Kf = (f ⋆ k) ↓_s, P = Q/s.
Inverse Problem in Medical Imaging

Projection measurements (tomography): Kf = (p_{θₖ})_{1 ≤ k ≤ K}.
Magnetic resonance imaging (MRI): Kf = (f̂(ω))_{ω ∈ Ω}.
Other examples: MEG, EEG, . . .
Inverse Problem Regularization

Noisy measurements: y = K f₀ + w.
Prior model: J : ℝ^Q → ℝ assigns a score to images.

  f⋆ ∈ argmin_{f ∈ ℝ^Q} ½‖y − K f‖² + λ J(f)
       (data fidelity)    (regularity)

Choice of λ: tradeoff between the noise level ‖w‖ and the regularity J(f₀) of f₀.

No noise: λ → 0⁺, minimize
  f⋆ ∈ argmin_{f ∈ ℝ^Q, Kf = y} J(f)
Smooth and Cartoon Priors

Smooth (Sobolev) prior: J(f) = ∫ ‖∇f(x)‖² dx
Cartoon (total variation) prior: J(f) = ∫ ‖∇f(x)‖ dx = ∫_ℝ length(C_t) dt
(co-area formula, with level sets C_t = {x \ f(x) = t})
Inpainting Example
[Images: input y = K f₀ + w; Sobolev recovery; total variation recovery.]
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Redundant Dictionaries

Dictionary Ψ = (ψₘ)ₘ ∈ ℝ^{Q×N}, N ≥ Q.

Fourier: ψₘ = e^{i⟨ωₘ, ·⟩}, m = frequency ωₘ.
Wavelets: ψₘ = ψ(2⁻ʲ R_θ(x − n)), m = (j, θ, n) = (scale, orientation, position).
DCT, curvelets, bandlets, . . .

Synthesis: f = Σₘ xₘ ψₘ = Ψx.
[Images: coefficients x and synthesized image f = Ψx.]
Sparse Priors

Ideal sparsity: for most m, xₘ = 0.
  J₀(x) = # {m \ xₘ ≠ 0}

Sparse approximation: f = Ψx where
  x ∈ argmin_{x ∈ ℝ^N} ‖f₀ − Ψx‖² + T² J₀(x)

Orthogonal Ψ (ΨΨ* = Ψ*Ψ = Id_N): the solution is hard thresholding,
  xₘ = { ⟨f₀, ψₘ⟩ if |⟨f₀, ψₘ⟩| > T, 0 otherwise, }
i.e. f = Ψ ∘ S_T ∘ Ψ*(f₀).

Non-orthogonal Ψ: NP-hard.
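The sketch below illustrates hard thresholding in an orthogonal dictionary, with the orthonormal DCT standing in for Ψ*; the test signal, sizes and threshold T are illustrative assumptions.

```python
# A minimal sketch of sparse approximation in an orthogonal dictionary
# (the orthonormal DCT plays the role of Psi*); signal and T are assumptions.
import numpy as np
from scipy.fft import dct, idct

Q = 256
t = np.linspace(0, 1, Q)
f0 = np.sin(2 * np.pi * 4 * t) + 0.3 * np.sin(2 * np.pi * 19 * t)

x = dct(f0, norm="ortho")          # analysis: x = Psi* f0
T = 0.5
x_sparse = x * (np.abs(x) > T)     # hard thresholding S_T
f = idct(x_sparse, norm="ortho")   # synthesis: f = Psi x

print(f"kept {np.sum(x_sparse != 0)} of {Q} coefficients, "
      f"rel. error {np.linalg.norm(f - f0) / np.linalg.norm(f0):.3f}")
```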
Convex Relaxation: L1 Prior

Image with 2 pixels (x₁, x₂):
  J₀(x) = # {m \ xₘ ≠ 0}
  J₀(x) = 0 ⟺ null image; J₀(x) = 1 ⟺ sparse image; J₀(x) = 2 ⟺ non-sparse image.

ℓq priors (convex for q ≥ 1):
  J_q(x) = Σₘ |xₘ|^q
[Figure: unit balls of J_q for q = 0, 1/2, 1, 3/2, 2.]

Sparse ℓ¹ prior: J₁(x) = Σₘ |xₘ|
L1 Regularization

coefficients x₀ ∈ ℝ^N → (Ψ) → image f₀ = Ψx₀ ∈ ℝ^Q → (K) → observations y = K f₀ + w ∈ ℝ^P

Combined operator: Φ = K Ψ ∈ ℝ^{P×N}.

Sparse recovery: f⋆ = Ψx⋆ where x⋆ solves
  min_{x ∈ ℝ^N} ½‖y − Φx‖² + λ‖x‖₁
   (fidelity)      (regularization)
Noiseless Sparse Regularization

Noiseless measurements: y = Φx₀.

ℓ¹ minimization:
  x⋆ ∈ argmin_{Φx = y} Σₘ |xₘ|
versus ℓ² minimization:
  x⋆ ∈ argmin_{Φx = y} Σₘ |xₘ|²
[Figure: the ℓ¹ ball touches the affine space Φx = y at a sparse point x⋆; the ℓ² ball does not.]

Convex linear program: interior points, cf. [Chen, Donoho, Saunders] "basis pursuit";
Douglas-Rachford splitting, see [Combettes, Pesquet].
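As a concrete instance of the linear-program formulation, here is a minimal basis-pursuit sketch using scipy's LP solver, splitting x into positive and negative parts; the problem sizes are assumptions.

```python
# A minimal basis-pursuit sketch: min ||x||_1 s.t. Phi x = y, recast as an
# LP with x = u - v, u, v >= 0. Sizes and the test signal are assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
P, N, k = 30, 80, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0

c = np.ones(2 * N)                  # minimize sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([Phi, -Phi])       # Phi u - Phi v = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_star = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_star - x0))
```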
Noisy Sparse Regularization

Noisy measurements: y = Φx₀ + w.

Penalized form:
  x⋆ ∈ argmin_{x ∈ ℝ^N} ½‖y − Φx‖² + λ‖x‖₁
   (data fidelity)   (regularization)

Equivalence with the constrained form, λ ↔ ε:
  x⋆ ∈ argmin_{‖Φx − y‖ ≤ ε} ‖x‖₁

Algorithms: iterative soft thresholding / forward-backward splitting,
see [Daubechies et al.], [Pesquet et al.], etc.; Nesterov multi-step schemes.
Image De-blurring

Forward model: y = h ⋆ f₀ + w.

Sobolev regularization:
  f⋆ = argmin_{f ∈ ℝ^N} ‖f ⋆ h − y‖² + λ‖∇f‖²
explicit solution in Fourier:
  f̂⋆(ω) = ĥ(ω)* ŷ(ω) / (|ĥ(ω)|² + λ|ω|²)
(Sobolev: SNR = 22.7 dB)

Sparsity regularization: Ψ = translation invariant wavelets,
  x⋆ ∈ argmin_x ½‖h ⋆ (Ψx) − y‖² + λ‖x‖₁, f⋆ = Ψx⋆
(Sparsity: SNR = 24.7 dB)
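A minimal sketch of the Fourier-domain Sobolev deconvolution formula in 1-D; the Gaussian blur kernel, noise level and λ are illustrative assumptions.

```python
# Sobolev deconvolution via the explicit Fourier formula (1-D sketch).
import numpy as np

rng = np.random.default_rng(2)
N = 256
f0 = np.cumsum(rng.standard_normal(N))           # a rough 1-D "image"
h = np.exp(-0.5 * (np.arange(N) - N // 2) ** 2 / 3.0 ** 2)
h /= h.sum()
h_hat = np.fft.fft(np.roll(h, -N // 2))          # kernel centered at 0
y = np.real(np.fft.ifft(h_hat * np.fft.fft(f0)))
y += 0.05 * rng.standard_normal(N)

lam = 0.01
omega = 2 * np.pi * np.fft.fftfreq(N)            # frequency grid
f_hat = np.conj(h_hat) * np.fft.fft(y) / (np.abs(h_hat) ** 2 + lam * omega ** 2)
f_star = np.real(np.fft.ifft(f_hat))
print("rel. error:", np.linalg.norm(f_star - f0) / np.linalg.norm(f0))
```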
Inpainting Problem

Measurements: y = K f₀ + w, with
  (Kf)(x) = { 0 if x ∈ Ω, f(x) if x ∉ Ω. }
Image Separation

Model: f = f₁ + f₂ + w, (f₁, f₂) components, w noise.

Union dictionary: Ψ = [Ψ₁, Ψ₂] ∈ ℝ^{Q×(N₁+N₂)}.

  (x₁⋆, x₂⋆) ∈ argmin_{x = (x₁, x₂) ∈ ℝ^{N₁+N₂}} ½‖f − Ψx‖² + λ‖x‖₁

Recovered components: fᵢ⋆ = Ψᵢ xᵢ⋆.
Examples of Decompositions
Cartoon+Texture Separation
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Basics of Convex Analysis

Setting: G : H → ℝ ∪ {+∞}. Here: H = ℝ^N.
Problem: min_{x ∈ H} G(x)

Convex: ∀ t ∈ [0, 1],
  G(t x + (1 − t) y) ≤ t G(x) + (1 − t) G(y)

Sub-differential:
  ∂G(x) = {u ∈ H \ ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}
Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}.
Example: G(x) = |x|, ∂G(0) = [−1, 1].

First-order conditions:
  x⋆ ∈ argmin_{x ∈ H} G(x) ⟺ 0 ∈ ∂G(x⋆)
L1 Regularization: First Order Conditions

  x⋆ ∈ argmin_{x ∈ ℝ^N} G(x) = ½‖y − Φx‖² + λ‖x‖₁   (P_λ(y))

Sub-differential:
  ∂G(x) = Φ*(Φx − y) + λ ∂‖·‖₁(x),
  ∂‖·‖₁(x)ᵢ = { sign(xᵢ) if xᵢ ≠ 0, [−1, 1] if xᵢ = 0. }

Support of the solution: I = {i ∈ {0, . . . , N − 1} \ x⋆ᵢ ≠ 0}.
Restrictions: x_I = (xᵢ)_{i ∈ I} ∈ ℝ^{|I|}, Φ_I = (φᵢ)_{i ∈ I} ∈ ℝ^{P×|I|}.

First order condition: Φ*(Φx⋆ − y) + λs = 0 where
  s_I = sign(x⋆_I), ‖s_{Iᶜ}‖_∞ ≤ 1
  ⟹ s_{Iᶜ} = (1/λ) Φ*_{Iᶜ}(y − Φx⋆)

Theorem: x⋆ is a solution of P_λ(y) ⟺ ‖Φ*_{Iᶜ}(Φx⋆ − y)‖_∞ ≤ λ.

Theorem: If Φ_I has full rank and ‖Φ*_{Iᶜ}(Φx⋆ − y)‖_∞ < λ,
then x⋆ is the unique solution of P_λ(y).
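A quick numerical sanity check of these conditions: solve a small lasso by plain iterative soft thresholding (a sketch with assumed sizes), then inspect the correlations Φ*(Φx⋆ − y) on and off the support.

```python
# Verify the first-order condition on a small lasso instance.
import numpy as np

rng = np.random.default_rng(3)
P, N, lam = 20, 40, 0.1
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
y = Phi @ np.r_[rng.standard_normal(3), np.zeros(N - 3)]

x = np.zeros(N)
tau = 1.0 / np.linalg.norm(Phi, 2) ** 2        # step size < 2/L
for _ in range(5000):
    g = x - tau * Phi.T @ (Phi @ x - y)        # gradient step
    x = np.sign(g) * np.maximum(np.abs(g) - tau * lam, 0)  # soft threshold

corr = Phi.T @ (Phi @ x - y)
I = np.abs(x) > 1e-8
print("on support  (~ -lam * sign):", np.round(corr[I], 3))
print("off support (max, <= lam):  ", np.abs(corr[~I]).max())
```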
Local Behavior of the Solution

First order condition ⟹ (implicit equation)
  x⋆_I = Φ⁺_I y − λ(Φ*_I Φ_I)⁻¹ sign(x⋆_I)
       = x_{0,I} + Φ⁺_I w − λ(Φ*_I Φ_I)⁻¹ s_I

Intuition: for small w,
  s_I = sign(x⋆_I) = sign(x_{0,I}) = s_{0,I}
   (unknown)                        (known)

Candidate for the solution:
  x̂_I = x_{0,I} + Φ⁺_I w − λ(Φ*_I Φ_I)⁻¹ s_{0,I}

To prove: x̂ is the unique solution, i.e. ‖(1/λ) Φ*_{Iᶜ}(Φ_I x̂_I − y)‖_∞ < 1.
One computes
  (1/λ) Φ*_{Iᶜ}(Φ_I x̂_I − y) = Φ*_{Iᶜ}(Φ_I Φ⁺_I − Id)(w/λ) − Ω_I s_{0,I}
where Ω_I = Φ*_{Iᶜ} Φ_I^{+,*}: the first term can be made small when w → 0,
while ‖Ω_I s_{0,I}‖_∞ must be < 1.
Robustness to Small Noise

Identifiability criterion: [Fuchs] For s ∈ {−1, 0, +1}^N, let I = supp(s) and
  F(s) = ‖Ω_I s_I‖_∞ where Ω_I = Φ*_{Iᶜ} Φ_I^{+,*}

Theorem: [Fuchs 2004] If F(sign(x₀)) < 1 and T = min_{i ∈ I} |x_{0,i}|,
if ‖w‖/T is small enough and λ ∼ ‖w‖, then
  x_{0,I} + Φ⁺_I w − λ(Φ*_I Φ_I)⁻¹ sign(x_{0,I})
is the unique solution of P_λ(y).
When w = 0: F(sign(x₀)) < 1 ⟹ x⋆ = x₀.

Theorem: [Grasmair et al. 2010] If F(sign(x₀)) < 1,
then for λ ∼ ‖w‖, ‖x⋆ − x₀‖ = O(‖w‖).
Geometric Interpretation

  d_I = Φ_I^{+,*} s_I = Φ_I (Φ*_I Φ_I)⁻¹ s_I,
where d_I is defined by: ∀ i ∈ I, ⟨d_I, φᵢ⟩ = sᵢ.

  F(s) = ‖Ω_I s_I‖_∞ = max_{j ∉ I} |⟨d_I, φⱼ⟩|

Condition F(s) < 1: no vector φⱼ, j ∉ I, inside the cap C_s.
[Figure: atoms φᵢ, φⱼ, φₖ on the sphere, the dual vector d_I and the cap C_s where |⟨d_I, φ⟩| < 1.]
Robustness to Bounded Noise

Exact Recovery Criterion (ERC): [Tropp] For a support I ⊂ {0, . . . , N − 1}
with Φ_I full rank,
  ERC(I) = ‖Ω_I‖_{∞,∞} where Ω_I = Φ*_{Iᶜ} Φ_I^{+,*}
         = ‖Φ⁺_I Φ_{Iᶜ}‖_{1,1} = max_{j ∈ Iᶜ} ‖Φ⁺_I φⱼ‖₁
  (use ‖(aⱼ)ⱼ‖_{1,1} = maxⱼ ‖aⱼ‖₁)

Relation with the F criterion: ERC(I) = max_{s, supp(s) ⊂ I} F(s).

Theorem: If ERC(supp(x₀)) < 1 and λ ∼ ‖w‖, then x⋆ is unique,
satisfies supp(x⋆) ⊂ supp(x₀), and ‖x₀ − x⋆‖ = O(‖w‖).
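A small sketch computing both the Fuchs criterion F(s) and Tropp's ERC(I) for a random Φ, following the pseudo-inverse formulas above; the sizes and the support are assumptions.

```python
# Compute F(s) and ERC(I) for a random Gaussian dictionary.
import numpy as np

rng = np.random.default_rng(4)
P, N, k = 50, 100, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

I = np.arange(k)                        # support (assumed)
Ic = np.arange(k, N)
s_I = rng.choice([-1.0, 1.0], k)        # sign pattern on I

pinv_I = np.linalg.pinv(Phi[:, I])      # Phi_I^+
Omega = Phi[:, Ic].T @ pinv_I.T         # Omega_I = Phi_Ic^* Phi_I^{+,*}
F = np.abs(Omega @ s_I).max()           # F(s) = ||Omega_I s_I||_inf
ERC = np.abs(Omega).sum(axis=1).max()   # ERC(I) = max_j ||Phi_I^+ phi_j||_1
print(f"F(s) = {F:.3f} <= ERC(I) = {ERC:.3f}")
```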
Example: Random Matrix

P = 200, N = 1000.
[Figure: empirical probability, as a function of the sparsity ‖x₀‖₀ ∈ [0, 50],
that F < 1, ERC < 1, w-ERC < 1, and that x⋆ = x₀.]
Example: Deconvolution

  Φx = Σᵢ xᵢ φ(· − i Δ)
Increasing Δ: reduces correlation, but reduces resolution.
[Figure: x₀ and Φx₀; the criteria F(s), ERC(I), w-ERC(I) as functions of Δ.]
Coherence Bounds

Mutual coherence: μ(Φ) = max_{i ≠ j} |⟨φᵢ, φⱼ⟩|.

Theorem: F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ))

Theorem: If ‖x₀‖₀ < ½ (1 + 1/μ(Φ)) and λ ∼ ‖w‖,
one has supp(x⋆) ⊂ I and ‖x₀ − x⋆‖ = O(‖w‖).

One has μ(Φ) ≥ √((N − P)/(P(N − 1))).
Optimistic setting: ‖x₀‖₀ ∼ O(√P).
For Gaussian matrices: μ(Φ) ∼ √(log(PN)/P).
For convolution matrices: useless criterion.
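A minimal sketch computing the mutual coherence of a column-normalized random dictionary and the resulting sparsity guarantee ½(1 + 1/μ); sizes are assumptions.

```python
# Mutual coherence and the coherence-based sparsity bound.
import numpy as np

rng = np.random.default_rng(5)
P, N = 64, 256
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)       # unit-norm columns

G = np.abs(Phi.T @ Phi)                  # Gram matrix |<phi_i, phi_j>|
np.fill_diagonal(G, 0)
mu = G.max()
print(f"mu(Phi) = {mu:.3f}, guaranteed sparsity < {0.5 * (1 + 1 / mu):.1f}")
```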
Spikes and Sinusoids Separation

Incoherent pair of orthobases (Diracs/Fourier):
  Ψ₁ = {k ↦ δ[k − m]}ₘ, Ψ₂ = {k ↦ N^{−1/2} e^{2iπmk/N}}ₘ
  Ψ = [Ψ₁, Ψ₂] ∈ ℝ^{N×2N}

  min_{x ∈ ℝ^{2N}} ½‖y − Ψx‖² + λ‖x‖₁
  ⟺ min_{x₁, x₂ ∈ ℝ^N} ½‖y − Ψ₁x₁ − Ψ₂x₂‖² + λ‖x₁‖₁ + λ‖x₂‖₁

  μ(Ψ) = 1/√N ⟹ separates up to √N / 2 Diracs + sines.
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Pointwise Sampling and Smoothness

Data acquisition: sensors map an analog signal f̃ ∈ L² to samples f ∈ ℝ^N:
  f[i] = f̃(i/N) = ⟨f̃, δ_{i/N}⟩  (Diracs)

Shannon interpolation: if Supp(f̂̃) ⊂ [−Nπ, Nπ],
  f̃(t) = Σᵢ f[i] h(Nt − i) where h(t) = sin(πt)/(πt)

⟹ Natural images are not smooth.
⟹ But can be compressed efficiently.
Single Pixel Camera (Rice)

  y[i] = ⟨f₀, φᵢ⟩
[Images: f₀ at N = 256², recoveries f⋆ for P/N = 0.16 and P/N = 0.02.]
CS Hardware Model

CS is about designing hardware: input signals f̃ ∈ L²(ℝ²).
Physical hardware resolution limit: target resolution f ∈ ℝ^N.

  f̃ ∈ L² → (resolution) → f ∈ ℝ^N → (CS hardware: micromirrors array) → y ∈ ℝ^P

The CS hardware implements the operator K.
Sparse CS Recovery

f₀ ∈ ℝ^N sparse in ortho-basis Ψ: f₀ = Ψx₀.

(Discretized) sampling acquisition:
  y = K f₀ + w = K Ψ x₀ + w = Φ x₀ + w

K drawn from the Gaussian matrix ensemble:
  K_{i,j} ∼ N(0, P^{−1/2}) i.i.d.
⟹ Φ = KΨ also drawn from the Gaussian matrix ensemble.

Sparse recovery:
  min_{‖Φx − y‖ ≤ ε} ‖x‖₁ with ε ∼ ‖w‖, or min_x ½‖Φx − y‖² + λ‖x‖₁ with λ ∼ ‖w‖.
CS Simulation Example

Ψ = translation invariant wavelet frame.
[Image: original f₀.]
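Before recovery, one can sanity-check the claim that Φ = KΨ stays in the Gaussian ensemble when Ψ is orthogonal; this quick sketch compares empirical moments before and after a random rotation (sizes are assumptions).

```python
# Rotation invariance of the Gaussian ensemble: Phi = K Psi.
import numpy as np

rng = np.random.default_rng(6)
P, N = 128, 512
K = rng.standard_normal((P, N)) / np.sqrt(P)
Psi = np.linalg.qr(rng.standard_normal((N, N)))[0]   # random orthobasis

Phi = K @ Psi
print("mean:", K.mean(), Phi.mean())
print("std :", K.std(), Phi.std())                   # both ~ P**-0.5
```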
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
CS with RIP

Restricted Isometry Constants δₖ:
  ∀ ‖x‖₀ ≤ k, (1 − δₖ)‖x‖² ≤ ‖Φx‖² ≤ (1 + δₖ)‖x‖²

ℓ¹ recovery:
  x⋆ ∈ argmin_{‖Φx − y‖ ≤ ε} ‖x‖₁ where y = Φx₀ + w, ‖w‖ ≤ ε

Theorem: [Candès 2009] If δ_{2k} ≤ √2 − 1, then
  ‖x₀ − x⋆‖ ≤ (C₀/√k) ‖x₀ − xₖ‖₁ + C₁ ε
where xₖ is the best k-term approximation of x₀.
Singular Values Distributions

Eigenvalues of Φ*_I Φ_I with |I| = k are essentially in [a, b],
  a = (1 − √β)² and b = (1 + √β)² where β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to
  f_β(λ) = (1/(2πβλ)) √((b − λ)₊ (λ − a)₊)
[Marchenko-Pastur]

Large deviation inequality [Ledoux].
[Figure: empirical eigenvalue histograms vs. f_β(λ) for P = 200 and k = 10, 30, 50.]
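A minimal Monte-Carlo sketch of the Marchenko-Pastur prediction for the eigenvalues of Φ*_I Φ_I; P, k and the number of trials are assumptions.

```python
# Empirical eigenvalue range vs. the Marchenko-Pastur support [a, b].
import numpy as np

rng = np.random.default_rng(7)
P, k, trials = 200, 30, 200
eigs = []
for _ in range(trials):
    Phi_I = rng.standard_normal((P, k)) / np.sqrt(P)
    eigs.extend(np.linalg.eigvalsh(Phi_I.T @ Phi_I))

beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(f"empirical range [{min(eigs):.2f}, {max(eigs):.2f}] "
      f"vs predicted [a, b] = [{a:.2f}, {b:.2f}]")
```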
RIP for Gaussian Matrices

Link with coherence: for μ(Φ) = max_{i ≠ j} |⟨φᵢ, φⱼ⟩|,
  δ₂ = μ(Φ), δₖ ≤ (k − 1) μ(Φ)
For Gaussian matrices: μ(Φ) ∼ √(log(PN)/P).

Stronger result:
Theorem: If k ≤ C P / log(N/P), then δ_{2k} ≤ √2 − 1 with high probability.
Numerics with RIP

Stability constants of A: (1 − δ₁(A))‖α‖² ≤ ‖Aα‖² ≤ (1 + δ₂(A))‖α‖²,
given by the smallest / largest eigenvalues of A*A.

Upper/lower RIC:
  δⁱₖ = max_{|I| = k} δᵢ(Φ_I), δₖ = max(δ¹ₖ, δ²ₖ)

Monte-Carlo estimation: δ̂ₖ ≤ δₖ.
[Figure: estimated δ̂_{2k} vs. k for N = 4000, P = 1000, against the level √2 − 1.]
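The Monte-Carlo estimate δ̂ₖ ≤ δₖ can be sketched by sampling random supports and tracking the extreme eigenvalues; sizes and trial counts are assumptions.

```python
# Monte-Carlo lower bound on the RIP constant delta_k.
import numpy as np

rng = np.random.default_rng(8)
P, N, k, trials = 100, 400, 10, 500
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

delta_hat = 0.0
for _ in range(trials):
    I = rng.choice(N, k, replace=False)
    ev = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])   # ascending order
    delta_hat = max(delta_hat, 1 - ev[0], ev[-1] - 1)
print(f"delta_hat({k}) = {delta_hat:.3f} (lower bound on delta_k)")
```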
Polytopes-based Guarantees

Noiseless recovery:
  x⋆ ∈ argmin_{Φx = y} ‖x‖₁   (P₀(y)), y ↦ x⋆

ℓ¹ ball: B_α = {x \ ‖x‖₁ ≤ α}, α = ‖x₀‖₁.
[Figure: Φ = (φᵢ)ᵢ ∈ ℝ^{2×3} and the projected polytope Φ(B_α) with vertices ±φ₁, ±φ₂, ±φ₃.]

  x₀ solution of P₀(Φx₀) ⟺ Φx₀ ∈ ∂Φ(B_α)
L1 Recovery in 2-D

2-D quadrants: K_s = {(αᵢ sᵢ)ᵢ ∈ ℝ³ \ αᵢ ≥ 0}
2-D cones: C_s = Φ K_s
[Figure: K_{(0,1,1)} and its image C_{(0,1,1)} for Φ = (φᵢ)ᵢ ∈ ℝ^{2×3}.]
Polytope Noiseless Recovery

Counting faces of random polytopes: [Donoho]
  All x₀ such that ‖x₀‖₀ ≤ C_all(P/N) P are identifiable.
  Most x₀ such that ‖x₀‖₀ ≤ C_most(P/N) P are identifiable.
  C_all(1/4) ≈ 0.065, C_most(1/4) ≈ 0.25.

Versus RIP:
→ Sharp constants.
→ No noise robustness.
→ Computation of "pathological" signals [Dossal, P, Fadili, 2010].
[Figure: phase-transition curves for "all", "most" and RIP.]
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Tomography and Fourier Measures

Fourier slice theorem: p̂_θ(ρ) = f̂(ρ cos θ, ρ sin θ)  (1D ↔ 2D Fourier)
  f̂ = FFT2(f)

Partial Fourier measurements: {p_{θₖ}(t)}_{t ∈ ℝ, 0 ≤ k < K}
is equivalent to Φf = (f̂[ω])_{ω ∈ Ω}.
Disclaimer: this is not compressed sensing.
Regularized Inversion

Noisy measurements: ∀ ω ∈ Ω, y[ω] = f̂₀[ω] + w[ω].
Noise: w[ω] ∼ N(0, σ), white noise.

ℓ¹ regularization:
  f⋆ = argmin_f ½ Σ_{ω ∈ Ω} |y[ω] − f̂[ω]|² + λ Σₘ |⟨f, ψₘ⟩|

[Images: pseudo-inverse f⁺ vs. sparse recovery f⋆.]
MRI Imaging

From [Lustig et al.]: Fourier sub-sampling pattern with randomization.

MRI Reconstruction:
[Images: high resolution; low resolution; linear (pseudo-inverse); sparsity (sparse wavelets).]
⟹ Sampling low frequencies helps.
Structured Measurements

Compressive Fourier measurements.
Gaussian matrices: intractable for large N.

Random partial orthogonal matrix: {φ_ω}_ω orthogonal basis,
  Φ = (φ_ω)_{ω ∈ Ω} where |Ω| = P is drawn uniformly at random,
  ∀ ω ∈ Ω, y[ω] = ⟨f, φ_ω⟩ = f̂[ω]
Fast measurements: (e.g. Fourier basis).

Mutual incoherence: μ = √N max_{ω,m} |⟨φ_ω, ψₘ⟩| ∈ [1, √N]
⟹ not universal: requires incoherence.

Theorem: [Rudelson, Vershynin, 2006] With high probability on Ω,
if M ≤ C P / (μ² log(N)⁴), then δ_{2M} ≤ √2 − 1.
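A minimal sketch of a random partial Fourier measurement operator built on the FFT, applied in O(N log N); the subset size P is an assumption, and the last line checks the adjoint pairing.

```python
# Random partial Fourier measurements and their adjoint.
import numpy as np

rng = np.random.default_rng(9)
N, P = 1024, 256
Omega = rng.choice(N, P, replace=False)        # random frequency subset

def Phi(f):
    """y = (hat f[omega])_{omega in Omega}, orthonormalized FFT."""
    return np.fft.fft(f, norm="ortho")[Omega]

def Phi_adjoint(y):
    """Zero-pad on Omega and apply the inverse FFT."""
    F = np.zeros(N, dtype=complex)
    F[Omega] = y
    return np.fft.ifft(F, norm="ortho")

f = rng.standard_normal(N)
lhs = np.vdot(Phi(f), Phi(f)).real             # <Phi f, Phi f>
rhs = np.vdot(f, Phi_adjoint(Phi(f))).real     # <f, Phi* Phi f>
print(np.allclose(lhs, rhs))                   # adjoint consistency check
```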
Overview
• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Convex Optimization

Setting: H: Hilbert space. Here: H = ℝ^N.
Problem: min_{x ∈ H} G(x), G : H → ℝ ∪ {+∞}.

Class of functions:
Convex: ∀ t ∈ [0, 1], G(tx + (1 − t)y) ≤ tG(x) + (1 − t)G(y)
Lower semi-continuous: liminf_{x → x₀} G(x) ≥ G(x₀)
Proper: {x ∈ H \ G(x) ≠ +∞} ≠ ∅

Indicator of C (closed and convex):
  ι_C(x) = { 0 if x ∈ C, +∞ otherwise. }
Proximal Operators

Proximal operator of G:
  Prox_{γG}(x) = argmin_z ½‖x − z‖² + γ G(z)

G(x) = ‖x‖₁ = Σᵢ |xᵢ|  (soft thresholding):
  Prox_{γG}(x)ᵢ = max(0, 1 − γ/|xᵢ|) xᵢ

G(x) = ‖x‖₀ = |{i \ xᵢ ≠ 0}|  (hard thresholding):
  Prox_{γG}(x)ᵢ = { xᵢ if |xᵢ| ≥ √(2γ), 0 otherwise. }

G(x) = Σᵢ log(1 + |xᵢ|²) ⟹ 3rd order polynomial root.

[Figure: the penalties |x|, ‖x‖₀, log(1 + x²) and their proximal maps.]
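A minimal sketch of the first two proximal maps listed above, applied componentwise with numpy; γ and the test vector are assumptions.

```python
# Soft and hard thresholding as proximal maps.
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma * ||.||_1."""
    return np.maximum(0, 1 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma * ||.||_0."""
    return x * (np.abs(x) >= np.sqrt(2 * gamma))

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # [-2. -0.  0.  0.  1.]
print(prox_l0(x, 1.0))   # [-3. -0.  0.  0.  2.]
```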
Proximal Calculus

Separability: G(x) = G₁(x₁) + . . . + Gₙ(xₙ)
  ⟹ Prox_G(x) = (Prox_{G₁}(x₁), . . . , Prox_{Gₙ}(xₙ))

Quadratic functionals: G(x) = ½‖Φx − y‖²
  Prox_{γG}(x) = (Id + γΦ*Φ)⁻¹(x + γΦ*y)
  (using (Id + γΦ*Φ)⁻¹ Φ* = Φ* (Id + γΦΦ*)⁻¹)

Composition by tight frame (A ∘ A* = Id):
  Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A* ∘ A

Indicators: G = ι_C
  Prox_{γG}(x) = Proj_C(x) = argmin_{z ∈ C} ‖x − z‖
Gradient and Proximal Descents

Gradient descent [explicit]: G is C¹ and ∇G is L-Lipschitz,
  x^(ℓ+1) = x^(ℓ) − τ_ℓ ∇G(x^(ℓ))
Theorem: If 0 < τ_ℓ < 2/L, x^(ℓ) → x⋆ a solution.

Sub-gradient descent:
  x^(ℓ+1) = x^(ℓ) − τ_ℓ v^(ℓ), v^(ℓ) ∈ ∂G(x^(ℓ))
Theorem: If τ_ℓ ∼ 1/ℓ, x^(ℓ) → x⋆ a solution.
⟹ Problem: slow.

Proximal-point algorithm [implicit]:
  x^(ℓ+1) = Prox_{τ_ℓ G}(x^(ℓ))
Theorem: If τ_ℓ ≥ c > 0, x^(ℓ) → x⋆ a solution.
⟹ Prox_{γG} hard to compute.
Proximal Splitting Methods

Solve min_{x ∈ H} E(x). Problem: Prox_{γE} is not available.

Splitting: E(x) = F(x) + Σᵢ Gᵢ(x)  (F smooth, Gᵢ simple)

Iterative algorithms using only ∇F(x) and Prox_{γGᵢ}(x):
  Forward-Backward: solves F + G
  Douglas-Rachford: solves Σᵢ Gᵢ
  Generalized FB: solves F + Σᵢ Gᵢ
  Primal-Dual: solves Gᵢ ∘ A
Smooth + Simple Splitting

Inverse problem: y = K f₀ + w, K : ℝ^N → ℝ^P, P ≪ N.
Model: f₀ = Ψx₀ sparse in dictionary Ψ.

Data fidelity: F(x) = ½‖y − Φx‖², Φ = KΨ.
Regularization: G(x) = ‖x‖₁ = Σᵢ |xᵢ|.

Sparse recovery: f⋆ = Ψx⋆ where x⋆ solves min_{x ∈ ℝ^N} F(x) + G(x).
Forward-Backward

  x⋆ ∈ argmin_x F(x) + G(x)   (⋆)
  ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆)
  ⟺ (x⋆ − γ∇F(x⋆)) ∈ x⋆ + γ∂G(x⋆)
  ⟺ x⋆ = Prox_{γG}(x⋆ − γ∇F(x⋆))   (fixed point equation)

Forward-backward:
  x^(ℓ+1) = Prox_{γG}(x^(ℓ) − γ∇F(x^(ℓ)))
Projected gradient descent: the case G = ι_C.

Theorem: Let ∇F be L-Lipschitz. If γ < 2/L, x^(ℓ) → x⋆ a solution of (⋆).

Example: L1 Regularization

  min_x ½‖Φx − y‖² + λ‖x‖₁ is min_x F(x) + G(x) with
  F(x) = ½‖Φx − y‖², ∇F(x) = Φ*(Φx − y), L = ‖Φ*Φ‖,
  G(x) = λ‖x‖₁, Prox_{γG}(x)ᵢ = max(0, 1 − λγ/|xᵢ|) xᵢ

⟹ Forward-backward ⟺ iterative soft thresholding.
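A minimal forward-backward (ISTA) sketch for min_x ½‖Φx − y‖² + λ‖x‖₁, following the iteration above; the problem instance, λ and the iteration count are assumptions.

```python
# Iterative soft thresholding (forward-backward) for the lasso.
import numpy as np

def ista(Phi, y, lam, n_iter=2000):
    L = np.linalg.norm(Phi, 2) ** 2               # Lipschitz constant of grad F
    gamma = 1.0 / L                               # step size < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = x - gamma * Phi.T @ (Phi @ x - y)     # forward (gradient) step
        x = np.sign(g) * np.maximum(np.abs(g) - gamma * lam, 0)  # backward (prox)
    return x

rng = np.random.default_rng(10)
P, N, k = 50, 200, 8
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = 1.0
y = Phi @ x0 + 0.01 * rng.standard_normal(P)

x_star = ista(Phi, y, lam=0.02)
print("rel. error:", np.linalg.norm(x_star - x0) / np.linalg.norm(x0))
```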
Douglas Rachford Scheme

  min_x G₁(x) + G₂(x)   (⋆), with G₁ and G₂ simple.

Reflexive prox: RProx_{γG}(x) = 2 Prox_{γG}(x) − x.

Douglas-Rachford iterations:
  z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γG₁} ∘ RProx_{γG₂}(z^(ℓ))
  x^(ℓ+1) = Prox_{γG₂}(z^(ℓ+1))

Theorem: If 0 < α < 2 and γ > 0, x^(ℓ) → x⋆ a solution of (⋆).
Example: Constrained L1

  min_{Φx = y} ‖x‖₁ ⟺ min_x G₁(x) + G₂(x)

G₁(x) = ι_C(x), C = {x \ Φx = y}:
  Prox_{γG₁}(x) = Proj_C(x) = x + Φ*(ΦΦ*)⁻¹(y − Φx)
⟹ efficient if ΦΦ* is easy to invert.

G₂(x) = ‖x‖₁:
  Prox_{γG₂}(x) = (max(0, 1 − γ/|xᵢ|) xᵢ)ᵢ

Example: compressed sensing, Φ ∈ ℝ^{100×400} Gaussian matrix, y = Φx₀ with ‖x₀‖₀ = 17.
[Figure: convergence of log₁₀(‖x^(ℓ)‖₁ − ‖x⋆‖₁) for γ = 0.01, 1, 10.]
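A minimal Douglas-Rachford sketch for min_{Φx = y} ‖x‖₁ with the two proximal maps above, as written in the scheme of the previous slide; the sizes, γ and α are assumptions.

```python
# Douglas-Rachford for constrained L1 (basis pursuit).
import numpy as np

rng = np.random.default_rng(11)
P, N, k = 100, 400, 17
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0

gram_inv = np.linalg.inv(Phi @ Phi.T)              # (Phi Phi^*)^{-1}
proj_C = lambda x: x + Phi.T @ (gram_inv @ (y - Phi @ x))
soft = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0)

gamma, alpha = 1.0, 1.0
z = np.zeros(N)
for _ in range(500):
    r = 2 * soft(z, gamma) - z                     # RProx of G2 = ||.||_1
    z = (1 - alpha / 2) * z + (alpha / 2) * (2 * proj_C(r) - r)
x_star = soft(z, gamma)                            # extract Prox of G2
print("recovery error:", np.linalg.norm(x_star - x0))
```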
More than 2 Functionals

  min_x G₁(x) + . . . + Gₖ(x), each Gᵢ simple
  ⟺ min_{(x₁, . . . , xₖ)} G(x₁, . . . , xₖ) + ι_C(x₁, . . . , xₖ)
where G(x₁, . . . , xₖ) = G₁(x₁) + . . . + Gₖ(xₖ) and
  C = {(x₁, . . . , xₖ) ∈ H^k \ x₁ = . . . = xₖ}

G and ι_C are simple:
  Prox_{γG}(x₁, . . . , xₖ) = (Prox_{γGᵢ}(xᵢ))ᵢ
  Prox_{γι_C}(x₁, . . . , xₖ) = (x̄, . . . , x̄) where x̄ = (1/k) Σᵢ xᵢ
Auxiliary Variables

  min_x G₁(x) + G₂ ∘ A(x), G₁, G₂ simple, A : H → E a linear map.

Lifting: min_{z = (x, u) ∈ H × E} G(z) + ι_C(z) with
  G(x, u) = G₁(x) + G₂(u), C = {(x, u) ∈ H × E \ Ax = u}.

  Prox_{γG}(x, u) = (Prox_{γG₁}(x), Prox_{γG₂}(u))
  Proj_C(x, u) = (x̃, Ax̃) where x̃ = (Id + A*A)⁻¹(x + A*u)

⟹ efficient if Id + AA* or Id + A*A is easy to invert.
Example: TV Regularization

Compute the solution of
  min_f ½‖Kf − y‖² + λ‖∇f‖₁, where ‖u‖₁ = Σᵢ ‖uᵢ‖.

With the auxiliary variable u = ∇f:

G₁(u) = λ‖u‖₁: Prox_{γG₁}(u)ᵢ = max(0, 1 − λγ/‖uᵢ‖) uᵢ
G₂(f) = ½‖Kf − y‖²: Prox_{γG₂}(f) = (Id + γK*K)⁻¹(f + γK*y)
C = {(f, u) ∈ ℝ^N × ℝ^{N×2} \ u = ∇f}: Proj_C(f, u) = (f̃, ∇f̃) with
  (Id − Δ) f̃ = f − div(u)
⟹ O(N log N) operations using the FFT.

[Images: original f₀, measurements y = K f₀ + w, recovery f⋆ across iterations ℓ.]
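A minimal sketch of two ingredients of this splitting, a forward-difference gradient and the vector (group) soft-thresholding prox of G₁; the image size and γ are assumptions, and the linear solves for Prox_{γG₂} and Proj_C are omitted.

```python
# Discrete gradient and the prox of the vector l1 norm used in TV splitting.
import numpy as np

def grad(f):
    """Forward-difference gradient of a 2-D image, shape (H, W, 2)."""
    gx = np.roll(f, -1, axis=0) - f
    gy = np.roll(f, -1, axis=1) - f
    return np.stack([gx, gy], axis=-1)

def prox_l1l2(u, gamma):
    """Vector soft thresholding: prox of gamma * sum_i ||u_i||."""
    norms = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True), 1e-12)
    return np.maximum(0, 1 - gamma / norms) * u

f = np.random.default_rng(12).standard_normal((8, 8))
u = prox_l1l2(grad(f), 0.5)
print(u.shape)  # (8, 8, 2)
```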
Conclusion

Sparsity: approximate signals with few atoms in a dictionary.

Compressed sensing ideas:
⟹ CS is about designing new hardware.
⟹ Randomized sensors + sparse recovery.
⟹ Number of measurements ∼ signal complexity.

The devil is in the constants:
⟹ Worst-case analysis is problematic.
⟹ Designing good signal models.

Some Hot Topics

Dictionary learning: learning the dictionary Ψ from exemplars.
[Figures: learned color-image dictionaries and denoising results, from Mairal et al., "Sparse Representation for Color Image Restoration".]

Analysis vs. synthesis priors:
  J_s(f) = min_{f = Ψx} ‖x‖₁  (synthesis)
  J_a(f) = ‖D* f‖₁  (analysis)
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MA
IRA
Letal.:SPA
RSE
RE
PRE
SEN
TAT
ION
FOR
CO
LO
RIM
AG
ER
EST
OR
AT
ION
61
Fig.7.D
atasetused
forevaluating
denoisingexperim
ents.
TAB
LE
IPSN
RR
ESU
LTS
OF
OU
RD
EN
OISIN
GA
LG
OR
ITH
MW
ITH
256A
TO
MS
OF
SIZ
E7
73
FOR
AN
D6
63
FOR
.EA
CH
CA
SEIS
DIV
IDE
DIN
FO
UR
PA
RT
S:TH
ET
OP-L
EFT
RE
SULT
SA
RE
TH
OSE
GIV
EN
BY
MCA
UL
EY
AN
DA
L[28]W
ITH
TH
EIR
“33
MO
DE
L.”T
HE
TO
P-RIG
HT
RE
SULT
SA
RE
TH
OSE
OB
TAIN
ED
BY
APPLY
ING
TH
EG
RA
YSC
AL
EK
-SVD
AL
GO
RIT
HM
[2]O
NE
AC
HC
HA
NN
EL
SE
PAR
AT
ELY
WIT
H8
8A
TO
MS.T
HE
BO
TT
OM
-LE
FTA
RE
OU
RR
ESU
LTS
OB
TAIN
ED
WIT
HA
GL
OB
AL
LYT
RA
INE
DD
ICT
ION
AR
Y.TH
EB
OT
TO
M-R
IGH
TA
RE
TH
EIM
PRO
VE
ME
NT
SO
BTA
INE
DW
ITH
TH
EA
DA
PTIV
EA
PPRO
AC
HW
ITH
20IT
ER
AT
ION
S.B
OL
DIN
DIC
AT
ES
TH
EB
EST
RE
SULT
SFO
RE
AC
HG
RO
UP.
AS
CA
NB
ESE
EN,
OU
RP
RO
POSE
DT
EC
HN
IQU
EC
ON
SISTE
NT
LYP
RO
DU
CE
ST
HE
BE
STR
ESU
LTS
TAB
LE
IIC
OM
PAR
ISON
OF
TH
EPSN
RR
ESU
LTS
ON
TH
EIM
AG
E“C
AST
LE”
BE
TW
EE
N[28]
AN
DW
HA
TW
EO
BTA
INE
DW
ITH
2566
63
AN
D7
73
PA
TC
HE
S.F
OR
TH
EA
DA
PTIV
EA
PPRO
AC
H,20IT
ER
AT
ION
SH
AV
EB
EE
NP
ER
FOR
ME
D.BO
LD
IND
ICA
TE
ST
HE
BE
STR
ESU
LT,IN
DIC
AT
ING
ON
CE
AG
AIN
TH
EC
ON
SISTE
NT
IMPR
OV
EM
EN
TO
BTA
INE
DW
ITH
OU
RP
RO
POSE
DT
EC
HN
IQU
E
patch),inorderto
preventanylearning
ofthese
artifacts(over-
fitting).W
edefine
thenthe
patchsparsity
ofthe
decompo-
sitionas
thisnum
berof
steps.The
stoppingcriteria
in(2)
be-com
esthe
number
ofatom
sused
insteadof
thereconstruction
error.Using
asm
allduring
theO
MP
permits
tolearn
adic-
tionaryspecialized
inproviding
acoarse
approximation.
Our
assumption
isthat
(pattern)artifacts
areless
presentin
coarseapproxim
ations,preventingthe
dictionaryfrom
learningthem
.W
epropose
thenthe
algorithmdescribed
inFig.6.W
etypically
usedto
preventthe
learningof
artifactsand
foundout
thattwo
outeriterationsin
theschem
ein
Fig.6are
sufficienttogive
satisfactoryresults,w
hilew
ithinthe
K-SV
D,10–20
itera-tions
arerequired.
Toconclude,in
ordertoaddressthe
demosaicing
problem,w
euse
them
odifiedK
-SVD
algorithmthatdeals
with
nonuniformnoise,as
describedin
previoussection,and
addto
itanadaptive
dictionarythathas
beenlearned
with
lowpatch
sparsityin
orderto
avoidover-fitting
them
osaicpattern.T
hesam
etechnique
canbe
appliedto
genericcolor
inpaintingas
demonstrated
inthe
nextsection.
V.
EX
PER
IME
NTA
LR
ESU
LTS
We
arenow
readyto
presentthe
colorim
agedenoising,in-
painting,anddem
osaicingresultsthatare
obtainedw
iththe
pro-posed
framew
ork.
A.
Denoising
Color
Images
The
state-of-the-artperform
anceof
thealgorithm
ongrayscale
images
hasalready
beenstudied
in[2].
We
nowevaluate
ourextension
forcolor
images.
We
trainedsom
edictionaries
with
differentsizesof
atoms
55
3,66
3,7
73
and8
83,
on200
000patches
takenfrom
adatabase
of15
000im
agesw
iththe
patch-sparsityparam
eter(six
atoms
inthe
representations).We
usedthe
databaseL
abelMe
[55]to
buildour
image
database.T
henw
etrained
eachdictionary
with
600iterations.
This
providedus
aset
ofgeneric
dictionariesthat
we
usedas
initialdictionaries
inour
denoisingalgorithm
.C
omparing
theresults
obtainedw
iththe
globalapproach
andthe
adaptiveone
permits
usto
seethe
improvem
entsin
thelearning
process.W
echose
toevaluate
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61
Fig. 7. Data set used for evaluating denoising experiments.
TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY
APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.
BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS
TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.
FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE
patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.
To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.
V. EXPERIMENTAL RESULTS
We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.
A. Denoising Color Images
The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter
(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate
Dictionary learning:
Analysis vs. synthesis:
learning
�
Js(f) = minf=�x
||x||1
Some Hot Topics
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MA
IRA
Letal.:SPA
RSE
RE
PRE
SEN
TAT
ION
FOR
CO
LO
RIM
AG
ER
EST
OR
AT
ION
61
Fig.7.D
atasetused
forevaluating
denoisingexperim
ents.
TAB
LE
IPSN
RR
ESU
LTS
OF
OU
RD
EN
OISIN
GA
LG
OR
ITH
MW
ITH
256A
TO
MS
OF
SIZ
E7
73
FOR
AN
D6
63
FOR
.EA
CH
CA
SEIS
DIV
IDE
DIN
FO
UR
PA
RT
S:TH
ET
OP-L
EFT
RE
SULT
SA
RE
TH
OSE
GIV
EN
BY
MCA
UL
EY
AN
DA
L[28]W
ITH
TH
EIR
“33
MO
DE
L.”T
HE
TO
P-RIG
HT
RE
SULT
SA
RE
TH
OSE
OB
TAIN
ED
BY
APPLY
ING
TH
EG
RA
YSC
AL
EK
-SVD
AL
GO
RIT
HM
[2]O
NE
AC
HC
HA
NN
EL
SE
PAR
AT
ELY
WIT
H8
8A
TO
MS.T
HE
BO
TT
OM
-LE
FTA
RE
OU
RR
ESU
LTS
OB
TAIN
ED
WIT
HA
GL
OB
AL
LYT
RA
INE
DD
ICT
ION
AR
Y.TH
EB
OT
TO
M-R
IGH
TA
RE
TH
EIM
PRO
VE
ME
NT
SO
BTA
INE
DW
ITH
TH
EA
DA
PTIV
EA
PPRO
AC
HW
ITH
20IT
ER
AT
ION
S.B
OL
DIN
DIC
AT
ES
TH
EB
EST
RE
SULT
SFO
RE
AC
HG
RO
UP.
AS
CA
NB
ESE
EN,
OU
RP
RO
POSE
DT
EC
HN
IQU
EC
ON
SISTE
NT
LYP
RO
DU
CE
ST
HE
BE
STR
ESU
LTS
TAB
LE
IIC
OM
PAR
ISON
OF
TH
EPSN
RR
ESU
LTS
ON
TH
EIM
AG
E“C
AST
LE”
BE
TW
EE
N[28]
AN
DW
HA
TW
EO
BTA
INE
DW
ITH
2566
63
AN
D7
73
PA
TC
HE
S.F
OR
TH
EA
DA
PTIV
EA
PPRO
AC
H,20IT
ER
AT
ION
SH
AV
EB
EE
NP
ER
FOR
ME
D.BO
LD
IND
ICA
TE
ST
HE
BE
STR
ESU
LT,IN
DIC
AT
ING
ON
CE
AG
AIN
TH
EC
ON
SISTE
NT
IMPR
OV
EM
EN
TO
BTA
INE
DW
ITH
OU
RP
RO
POSE
DT
EC
HN
IQU
E
patch),inorderto
preventanylearning
ofthese
artifacts(over-
fitting).W
edefine
thenthe
patchsparsity
ofthe
decompo-
sitionas
thisnum
berof
steps.The
stoppingcriteria
in(2)
be-com
esthe
number
ofatom
sused
insteadof
thereconstruction
error.Using
asm
allduring
theO
MP
permits
tolearn
adic-
tionaryspecialized
inproviding
acoarse
approximation.
Our
assumption
isthat
(pattern)artifacts
areless
presentin
coarseapproxim
ations,preventingthe
dictionaryfrom
learningthem
.W
epropose
thenthe
algorithmdescribed
inFig.6.W
etypically
usedto
preventthe
learningof
artifactsand
foundout
thattwo
outeriterationsin
theschem
ein
Fig.6are
sufficienttogive
satisfactoryresults,w
hilew
ithinthe
K-SV
D,10–20
itera-tions
arerequired.
Toconclude,in
ordertoaddressthe
demosaicing
problem,w
euse
them
odifiedK
-SVD
algorithmthatdeals
with
nonuniformnoise,as
describedin
previoussection,and
addto
itanadaptive
dictionarythathas
beenlearned
with
lowpatch
sparsityin
orderto
avoidover-fitting
them
osaicpattern.T
hesam
etechnique
canbe
appliedto
genericcolor
inpaintingas
demonstrated
inthe
nextsection.
V.
EX
PER
IME
NTA
LR
ESU
LTS
We
arenow
readyto
presentthe
colorim
agedenoising,in-
painting,anddem
osaicingresultsthatare
obtainedw
iththe
pro-posed
framew
ork.
A.
Denoising
Color
Images
The
state-of-the-artperform
anceof
thealgorithm
ongrayscale
images
hasalready
beenstudied
in[2].
We
nowevaluate
ourextension
forcolor
images.
We
trainedsom
edictionaries
with
differentsizesof
atoms
55
3,66
3,7
73
and8
83,
on200
000patches
takenfrom
adatabase
of15
000im
agesw
iththe
patch-sparsityparam
eter(six
atoms
inthe
representations).We
usedthe
databaseL
abelMe
[55]to
buildour
image
database.T
henw
etrained
eachdictionary
with
600iterations.
This
providedus
aset
ofgeneric
dictionariesthat
we
usedas
initialdictionaries
inour
denoisingalgorithm
.C
omparing
theresults
obtainedw
iththe
globalapproach
andthe
adaptiveone
permits
usto
seethe
improvem
entsin
thelearning
process.W
echose
toevaluate
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61
Fig. 7. Data set used for evaluating denoising experiments.
TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY
APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.
BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS
TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.
FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE
patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.
To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.
V. EXPERIMENTAL RESULTS
We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.
A. Denoising Color Images
The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter
(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate
Image f = �x
Coe�cients x
�
Dictionary learning:
Analysis vs. synthesis:
learning
�
Ja(f) = ||D�f ||1
Js(f) = minf=�x
||x||1
Some Hot Topics
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57
Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.
Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,
dB.
Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.
Fig. 7. Data set used for evaluating the denoising experiments.
TABLE I. PSNR results of the denoising algorithm with 256 atoms of size 7×7×3 and 6×6×3 (depending on the noise level). Each case is divided in four parts: top-left, the results of McAuley et al. [28] with their "3×3 model"; top-right, the grayscale K-SVD algorithm [2] applied to each channel separately with 8×8 atoms; bottom-left, our results with a globally trained dictionary; bottom-right, the improvements obtained with the adaptive approach (20 iterations). Bold indicates the best result in each group; as can be seen, the proposed technique consistently produces the best results.
TABLE II. Comparison of the PSNR results on the image "castle" between [28] and the proposed method with 256 patches of size 6×6×3 and 7×7×3 (adaptive approach, 20 iterations). Bold indicates the best result, once again showing the consistent improvement obtained with the proposed technique.
patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small patch sparsity during the OMP makes it possible to learn a dictionary specialized in providing coarse approximations. Our assumption is that (pattern) artifacts are less present in coarse approximations, preventing the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used this setting to prevent the learning of artifacts, and found that two outer iterations of the scheme in Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.
To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.
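As a concrete illustration of this fixed-sparsity stopping rule, here is a minimal NumPy sketch of orthogonal matching pursuit that halts after a prescribed number k0 of atoms rather than at a target reconstruction error. The dictionary and sparsity level below are toy placeholders, not the exact setup of the paper.

import numpy as np

def omp_fixed_sparsity(D, y, k0):
    # Greedy OMP over a dictionary D (columns = unit-norm atoms):
    # stop after k0 atoms, not at a residual threshold.
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k0):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit the coefficients on the current support by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# Toy usage: random unit-norm dictionary, 3-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(256)
x_true[[5, 40, 100]] = [1.0, -2.0, 0.5]
y = D @ x_true
x_hat = omp_fixed_sparsity(D, y, k0=3)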
V. EXPERIMENTAL RESULTS
We are now ready to present the color image denoising, inpainting, and demosaicing results that are obtained with the proposed framework.
A. Denoising Color Images
The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with different atom sizes, 5×5×3, 6×6×3, 7×7×3, and 8×8×3, on 200,000 patches taken from a database of 15,000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. We then trained each dictionary with 600 iterations. This provided us with a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and the adaptive one shows the improvement brought by the learning process. We chose to evaluate
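Schematically, the denoising stage described above sparse-codes every overlapping patch over the (learned) dictionary and averages the reconstructions. The sketch below is a simplified grayscale, single-channel version with hypothetical parameters, reusing omp_fixed_sparsity from the previous sketch; the actual method works on 3-channel color patches with a modified metric.

import numpy as np

def denoise_patches(noisy, D, patch=8, k0=6):
    # noisy: 2-D array; D: (patch*patch, N) dictionary with unit-norm atoms.
    H, W = noisy.shape
    out = np.zeros_like(noisy, dtype=float)
    weight = np.zeros_like(noisy, dtype=float)
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            p = noisy[i:i+patch, j:j+patch].ravel()
            mean = p.mean()  # code the zero-mean patch
            x = omp_fixed_sparsity(D, p - mean, k0)
            rec = (D @ x + mean).reshape(patch, patch)
            out[i:i+patch, j:j+patch] += rec      # accumulate overlaps
            weight[i:i+patch, j:j+patch] += 1.0
    return out / weight  # average the overlapping reconstructions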
Image f = Φx
Coefficients x, c = D*f
Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1,+1}², and the atomic norm is the ℓ∞ norm.
natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly, the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.
We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond
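For the atom set of unit one-sparse vectors, the atomic-norm program specializes to basis pursuit, min ||x||_1 subject to Ax = y, which can be recast as a linear program. A minimal sketch with generic Gaussian measurements (SciPy assumed available; all dimensions are illustrative):

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # min ||x||_1 s.t. Ax = y, via the split x = xp - xm with xp, xm >= 0:
    # minimize sum(xp) + sum(xm) subject to A xp - A xm = y.
    P, Q = A.shape
    c = np.ones(2 * Q)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * Q))
    return res.x[:Q] - res.x[Q:]

rng = np.random.default_rng(1)
Q, P, s = 128, 40, 4                          # ambient dim., measurements, sparsity
x0 = np.zeros(Q)
x0[rng.choice(Q, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((P, Q)) / np.sqrt(P)  # generic Gaussian measurements
x_rec = basis_pursuit(A, A @ x0)              # exact recovery w.h.p. for small s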
Some Hot Topics
Dictionary learning: learning Φ
Analysis vs. synthesis:
Ja(f) = ||D*f||_1 (analysis prior)
Js(f) = min_{f=Φx} ||x||_1 (synthesis prior)
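In the simplest denoising setting (K = Id) with an orthogonal Φ and D = Φ, the analysis and synthesis formulations coincide, and the minimizer is obtained by soft thresholding the coefficients. A minimal sketch, with a random orthogonal matrix standing in for a wavelet transform:

import numpy as np

def soft_threshold(c, lam):
    # Proximal operator of lam*||.||_1: shrink each coefficient toward 0.
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def synthesis_denoise(y, Phi, lam):
    # argmin_x 0.5*||y - Phi x||^2 + lam*||x||_1 for orthogonal Phi:
    # threshold the analysis coefficients Phi^T y, then synthesize.
    return Phi @ soft_threshold(Phi.T @ y, lam)

rng = np.random.default_rng(2)
Phi, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # toy orthogonal "dictionary"
y = rng.standard_normal(64)
f = synthesis_denoise(y, Phi, lam=0.3)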
Other sparse priors:
|x1| + |x2| (the ℓ1 norm)    max(|x1|, |x2|) (the ℓ∞ norm)
|x1| + (x2^2 + x3^2)^(1/2) (a group/block sparsity norm)
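These priors are typically used as regularizers through their proximal operators (cf. the proximal-splitting part of this deck). A minimal sketch of the prox of the ℓ1 norm and of the group norm |x1| + (x2^2 + x3^2)^(1/2); the grouping below is illustrative:

import numpy as np

def prox_l1(v, lam):
    # prox of lam*||.||_1: coordinate-wise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_group_l1(v, groups, lam):
    # prox of lam * sum_g ||v_g||_2: block soft thresholding,
    # so each group is shrunk (or zeroed) jointly.
    out = v.astype(float).copy()
    for g in groups:
        n = np.linalg.norm(v[g])
        out[g] = 0.0 if n <= lam else (1.0 - lam / n) * v[g]
    return out

v = np.array([0.2, 1.5, -0.4])
print(prox_l1(v, 0.5))                       # promotes |x1| + |x2| + |x3|
print(prox_group_l1(v, [[0], [1, 2]], 0.5))  # promotes |x1| + (x2^2 + x3^2)^(1/2)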
Nuclear norm ||A||_* (the sum of singular values; its unit ball is the convex hull of the unit-Euclidean-norm rank-one matrices, cf. Figure 1(b))
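The proximal operator of the nuclear norm soft-thresholds the singular values (singular value thresholding), which promotes low rank. A minimal sketch with illustrative dimensions:

import numpy as np

def prox_nuclear(M, lam):
    # prox of lam*||.||_* : soft-threshold the singular values of M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(3)
L = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 20))  # rank-3 matrix
M = L + 0.1 * rng.standard_normal((20, 20))                      # plus noise
print(np.linalg.matrix_rank(prox_nuclear(M, 1.0)))               # low rank again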