
arXiv:1405.4969v5 [cs.CV] 14 Aug 2015

Sparsity Based Methods for Overparameterized Variational Problems

Raja Giryes*, Michael Elad**, and Alfred M. Bruckstein**

*Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, 27708, USA. [email protected].

**Department of Computer Science, The Technion - Israel Institute of Technology, Haifa, 32000, Israel. {elad, freddy}@cs.technion.ac.il

Abstract—Two complementary approaches have been extensively used in signal and image processing leading to novel results, the sparse representation methodology and the variational strategy. Recently, a new sparsity based model has been proposed, the cosparse analysis framework, which may potentially help in bridging sparse approximation based methods to the traditional total-variation minimization. Based on this, we introduce a sparsity based framework for solving overparameterized variational problems. The latter has been used to improve the estimation of optical flow and also for general denoising of signals and images. However, the recovery of the space varying parameters involved was not adequately addressed by traditional variational methods. We first demonstrate the efficiency of the new framework for one dimensional signals in recovering a piecewise linear and polynomial function. Then, we illustrate how the new technique can be used for denoising and segmentation of images.

I. INTRODUCTION

Many successful signal and image processing techniques rely on the fact that the given signals or images of interest belong to a class described by a certain a priori known model. Given the model, the signal is processed by estimating the "correct" parameters of the model. For example, in the sparsity framework the assumption is that the signals belong to a union of low dimensional subspaces [1], [2], [3], [4]. In the variational strategy, a model is imposed on the variations of the signal, e.g., its derivatives are required to be smooth [5], [6], [7], [8].

Though both sparsity-based and variational-based approaches are widely used for signal processing and computer vision, they are often viewed as two different methods with little in common between them. One of the well known variational tools is the total-variation regularization, used mainly for denoising and inverse problems. It can be formulated as [6]

$$\min_{\tilde f} \|g - M\tilde f\|_2^2 + \lambda \|\nabla \tilde f\|_1, \qquad (1)$$

where g = Mf + e ∈ R^m are the given noisy measurements, M ∈ R^{m×d} is a measurement matrix, e ∈ R^m is an additive (typically white Gaussian) noise, λ is a regularization parameter, f ∈ R^d is the original unknown signal to be recovered, and ∇f is its gradient vector.

The anisotropic version of (1), which we will use in this work, is

$$\min_{\tilde f} \|g - M\tilde f\|_2^2 + \lambda \|\Omega_{\text{DIF}} \tilde f\|_1, \qquad (2)$$

where Ω_DIF is the finite-difference operator that returns the derivatives of the signal. In the 1D case it applies the filter [1,−1], i.e.,

$$\Omega_{\text{DIF}} = \Omega_{\text{1D-DIF}} = \begin{bmatrix} 1 & -1 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & -1 & \cdots & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -1 & 0 \\ 0 & 0 & \cdots & \cdots & 1 & -1 \end{bmatrix}. \qquad (3)$$

For images it returns the horizontal and vertical derivatives using the filters [1,−1] and [1,−1]^T respectively. Note that for one dimensional signals there is no difference between (1) and (2) as the gradient equals the derivative. However, in the 2D case the first (Eq. (1)) considers the sum of gradient magnitudes (the square root of the squared sum of the directional derivatives), while the second (Eq. (2)) considers the absolute sum of the directional derivatives, approximated by finite differences.
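To make the convention above concrete, the following short numpy sketch builds the 1D operator of Eq. (3) as a (d−1)×d matrix and applies the two 2D filters to an image. It is only an illustration of the definitions, not code from the paper, and the boundary handling (valid differences only) is an assumption.

```python
import numpy as np

def finite_difference_1d(d):
    # Omega_1D-DIF from Eq. (3): a (d-1) x d matrix applying the filter [1, -1]
    Omega = np.zeros((d - 1, d))
    rows = np.arange(d - 1)
    Omega[rows, rows] = 1.0
    Omega[rows, rows + 1] = -1.0
    return Omega

def finite_difference_2d(image):
    # horizontal and vertical derivatives via the filters [1, -1] and [1, -1]^T
    horizontal = image[:, :-1] - image[:, 1:]
    vertical = image[:-1, :] - image[1:, :]
    return horizontal, vertical
```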

Recently, a very interesting connection has been drawn between the total-variation minimization problem and the sparsity model. It has been shown that (2) can be viewed as an ℓ_1-relaxation technique for approximating signals that are sparse in their derivatives domain, i.e., after applying the operator Ω_DIF on them [9], [10], [11]. Such signals are said to be cosparse under the operator Ω_DIF in the analysis (co)sparsity model [10].

Notice that the TV regularization is only one example from the variational framework. Another recent technique, which is the focus of this paper, is the overparameterization idea, which represents the signal as a combination of known functions weighted by space-variant parameters of the model [12], [13].

Let us introduce this overparameterized model via an example. If a 1D signal f is known to be piecewise linear, its i-th element can be written as f(i) = a(i) + b(i)·i, where a(i) and b(i) are the local coefficients describing the local line-curve. As such, the vectors a and b should be piecewise constant, with discontinuities in the same locations. Each constant interval in a and b corresponds to one linear segment in f.


When put in matrix-vector notation, f can be written alternatively as

$$f = a + Zb, \qquad (4)$$

where Z ∈ R^{d×d} is a diagonal matrix with the values 1, 2, ..., d on its main diagonal. For images this parameterization would similarly be f(i, j) = a(i, j) + b_1(i, j)·i + b_2(i, j)·j.

This strategy is referred to as "overparameterization" because the number of representation parameters is larger than the signal size. In the above 1D example, while the original signal contains d unknown values, the recovery problem that seeks a and b has twice as many variables. Clearly, there are many other parameterization options for signals, beyond the linear one. Such parameterizations have been shown to improve the denoising performance of the solution of the problem posed in (1) in some cases [12], and to provide very high quality results for optical flow estimation [13], [14].
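As a small illustration of the parameterization in Eq. (4), the sketch below builds Z and a piecewise-linear signal from jointly piecewise-constant a and b. The jump locations, offsets and slopes are hypothetical values chosen only for this example.

```python
import numpy as np

d = 300
Z = np.diag(np.arange(1, d + 1).astype(float))   # diag(1, 2, ..., d)

# piecewise-constant a and b with discontinuities at the same (hypothetical) locations
jumps = [100, 200]
segments = np.split(np.arange(d), jumps)
offsets, slopes = [0.5, -1.0, 2.0], [0.01, -0.02, 0.005]
a = np.concatenate([np.full(len(s), o) for s, o in zip(segments, offsets)])
b = np.concatenate([np.full(len(s), sl) for s, sl in zip(segments, slopes)])

f = a + Z @ b      # the resulting signal is piecewise linear with jumps at 100 and 200
```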

A. Our Contribution

The true force behind overparameterization is that while it uses more variables than needed for representing the signal, these are often more naturally suited to describe its structure. For example, if a signal is piecewise linear then we may impose a constraint on the overparameterization coefficients a and b to be piecewise constant.

Note that piecewise constant signals are sparse under the Ω_DIF operator. Therefore, for each of the coefficients we can use the tools developed in the analysis sparsity model [11], [15], [16], [17], [18]. However, in our case a and b are jointly sparse, i.e., their change points are collocated, and therefore an extension is necessary.

Constraints on the structure of the sparsity pattern of a representation have already been analyzed in the literature. They are commonly referred to as joint sparsity models, and are found both in the context of handling groups of signals [19], [20], [21], [22], [23], [24], and when considering blocks of non-zeros in a single representation vector [25], [26], [27], [28], [29]. We use these tools to extend the existing analysis techniques to handle the block sparsity in our overparameterized scheme.

In this paper we introduce a general sparsity based framework for solving overparameterized variational problems. As the structure of these problems enables segmentation while recovering the signal, we provide an elegant way for recovering a signal from its deteriorated measurements by using an ℓ_0 approach, which is accompanied by theoretical guarantees. We demonstrate the efficiency of the new framework for one dimensional functions in recovering piecewise polynomial signals. Then we shift our view to images and demonstrate how the new approach can be used for denoising and segmentation.

B. Organization

This paper is organized as follows: In Section II we present the overparameterized variational model in more detail. In Section III we briefly describe the synthesis and analysis sparsity models. In Sections IV and V we introduce a new framework for solving overparameterized variational problems using sparsity. Section IV proposes a recovery strategy for the 1D polynomial case based on the SSCoSaMP technique with optimal projections [30], [31], [32]. We provide stable recovery guarantees for this algorithm for the case of an additive adversarial noise and denoising guarantees for the case of a zero-mean white Gaussian noise. In Section V we extend our scheme beyond the 1D case to higher dimensional polynomial functions such as images. We employ an extension of the GAPN algorithm [33] for block sparsity for this task. In Section VI we present experiments for linear overparameterization of images and one dimensional signals. We demonstrate how the proposed method can be used for image denoising and segmentation. Section VII concludes our work and proposes future directions of research.

II. THE OVERPARAMETERIZED VARIATIONAL FRAMEWORK

Considering again the linear relationship between the measurements and the unknown signal,

$$g = Mf + e, \qquad (5)$$

note that without prior knowledge on f we cannot recover it from g if m < d or e ≠ 0.

In the variational framework, a regularization is imposed on the variations of the signal f. One popular strategy for recovering the signal in this framework is by solving the following minimization problem:

$$\min_{f} \|g - Mf\|_2^2 + \lambda \|A(f)\|_p, \qquad (6)$$

where λ is the regularization weight and p ≥ 1 is the type of norm used with the regularization operator A, which is typically a local operator. For example, for p = 1 and A = ∇ we get the TV minimization (Eq. (2)). Another example for a regularization operator is the Laplace operator A = ∇². Other types of regularization operators and variational formulations can be found in [34], [35], [36], [37].

Recently, the overparameterized variational framework has been introduced as an extension to the traditional variational methodology [12], [13], [14], [38]. Instead of applying a regularization on the signal itself, it is applied on the coefficients of the signal under a global parameterization of the space. Each element of the signal can be modeled as f(i) = Σ_{j=1}^n b_j(i) x_j(i), where {b_j}_{j=1}^n are the coefficient vectors and {x_j}_{j=1}^n contain the parameterization basis functions for the space.

Denoting by X_i ≜ diag(x_i) the diagonal matrix that has the vector x_i on its diagonal, we can rewrite the above as f = Σ_{j=1}^n X_j b_j. With these notations, the overparameterized minimization problem becomes

$$\min_{b_i,\, 1 \le i \le n} \Big\| g - M \sum_{i=1}^{n} X_i b_i \Big\|_2^2 + \sum_{i=1}^{n} \lambda_i \|A_i(b_i)\|_{p_i}, \qquad (7)$$

where each coefficient b_i is regularized separately by the operator A_i (which can be the same one for all the coefficients)^1.

^1 We note that it is possible to have more than one regularization for each coefficient, as practiced in [38].


Returning to the example of a linear overparameterization, we have that f = a + Zb, where in this case X_1 = I (the identity matrix) and X_2 = Z = diag(1, ..., d), a diagonal matrix with 1, ..., d on its diagonal. If f is a piecewise linear function then the coefficient vectors a and b should be piecewise constant, and therefore it would be natural to regularize these coefficients with the gradient operator. This leads to the following minimization problem:

$$\min_{a, b} \|g - M(a + Zb)\|_2^2 + \lambda_1 \|\nabla a\|_1 + \lambda_2 \|\nabla b\|_1, \qquad (8)$$

which is a special case of (7). The two main advantages of using the overparameterized formulation are these: (i) the new unknowns have a simpler form (e.g., a piecewise linear signal is treated by piecewise constant unknowns), and thus are easier to recover; and (ii) this formulation leads to recovering the parameters of the signal along with the signal itself.
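For concreteness, one possible way to handle the convex problem (8) is to pass it to a generic solver. The sketch below is only an illustration under several assumptions: it uses cvxpy (which the paper neither uses nor prescribes), replaces ∇ by the finite-difference matrix D, and picks arbitrary values for λ_1, λ_2 and the data.

```python
import numpy as np
import cvxpy as cp

d = 200
M = np.eye(d)                                      # denoising setting, M = I
Z = np.diag(np.arange(1, d + 1).astype(float))
D = np.eye(d - 1, d) - np.eye(d - 1, d, 1)         # finite-difference stand-in for the gradient
g = np.random.randn(d)                             # placeholder for the noisy measurements

a, b = cp.Variable(d), cp.Variable(d)
lambda1, lambda2 = 1.0, 1.0                        # illustrative regularization weights
objective = cp.Minimize(cp.sum_squares(g - M @ (a + Z @ b))
                        + lambda1 * cp.norm1(D @ a)
                        + lambda2 * cp.norm1(D @ b))
cp.Problem(objective).solve()
f_hat = a.value + Z @ b.value                      # recovered signal estimate
```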

The overparameterization idea, as introduced in [12], [13], [14], [38], builds upon the vast body of work in signal processing on variational methods. As such, there are no known guarantees for the quality of the recovery of the signal when using the formulation posed in (8) or its variants. Moreover, it has been shown in [38] that even for the case of M = I (and obviously, e ≠ 0), a poor recovery is achieved in recovering f and its parameterization coefficients. Note that the same happens even if more sophisticated regularizations are combined and applied on a, b, and eventually on f [38].

This leads us to look for another strategy to approach the problem of recovering a piecewise linear function from its deteriorated measurement g. Before describing our new scheme, we introduce in the next section the sparsity model that will aid us in developing this alternative strategy.

III. THE SYNTHESIS AND ANALYSIS SPARSITY MODELS

A popular prior for recovering a signal f from its distorted measurements (as posed in (5)) is the sparsity model [1], [2]. The idea behind it is that if we know a priori that f resides in a union of low dimensional subspaces, which do not intersect trivially with the null space of M, then we can estimate f stably by selecting the signal that belongs to this union of subspaces and is the closest to g [3], [4].

In the classical sparsity model, the signal f is assumed to have a sparse representation α under a given dictionary D, i.e., f = Dα, ‖α‖_0 ≤ k, where ‖·‖_0 is the ℓ_0 pseudo-norm that counts the number of non-zero entries in a vector, and k is the sparsity of the signal. Note that each low dimensional subspace in the standard sparsity model, known also as the synthesis model, is spanned by a collection of k columns from D. With this model we can recover f by solving

$$\min_{\alpha} \|g - MD\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le k, \qquad (9)$$

if k is known, or

$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|g - MD\alpha\|_2^2 \le \|e\|_2^2, \qquad (10)$$

if we have information about the energy of the noise e. Obviously, once we get α, the desired recovered signal is simply Dα. As both of these minimization problems are NP-hard [39], many approximation techniques have been proposed to approximate their solution, accompanied by recovery guarantees that depend on the properties of the matrices M and D. These include ℓ_1-relaxation [40], [41], [42], known also as LASSO [43], matching pursuit (MP) [44], orthogonal matching pursuit (OMP) [45], [46], compressive sampling matching pursuit (CoSaMP) [47], subspace pursuit (SP) [48], iterative hard thresholding (IHT) [49] and hard thresholding pursuit (HTP) [50].
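As a reminder of how one of these pursuits operates, here is a minimal textbook-style OMP sketch (not taken from the paper); A plays the role of MD, and k is the assumed sparsity.

```python
import numpy as np

def omp(A, g, k):
    # greedy loop: pick the column most correlated with the residual, then re-fit
    residual, support = g.copy(), []
    x_s = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], g, rcond=None)
        residual = g - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x
```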

Another framework for modeling a union of low dimensional subspaces is the analysis one [10], [40]. This model considers the behavior of Ωf, the signal after applying a given operator Ω on it, and assumes that this vector is sparse. Note that here the zeros are those that characterize the subspace in which f resides, as each zero in Ωf corresponds to a row in Ω to which f is orthogonal. Therefore, f resides in a subspace orthogonal to the one spanned by these rows. We say that f is cosparse under Ω with a cosupport Λ if Ω_Λ f = 0, where Ω_Λ is a sub-matrix of Ω with the rows corresponding to the set Λ.
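In code, the cosupport is simply the index set of the rows of Ω that annihilate f; a minimal sketch (with a numerical tolerance standing in for exact zeros):

```python
import numpy as np

def cosupport(Omega, f, tol=1e-10):
    # indices of the rows of Omega to which f is (numerically) orthogonal
    return np.where(np.abs(Omega @ f) <= tol)[0]
```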

The analysis variants of (9) and (10) for estimating f are

$$\min_{\tilde f} \|g - M\tilde f\|_2^2 \quad \text{s.t.} \quad \|\Omega \tilde f\|_0 \le k, \qquad (11)$$

where k is the number of non-zeros in Ωf, and

$$\min_{\tilde f} \|\Omega \tilde f\|_0 \quad \text{s.t.} \quad \|g - M\tilde f\|_2^2 \le \|e\|_2^2. \qquad (12)$$

As in the synthesis case, these minimization problems are also NP-hard [10] and approximation techniques have been proposed, including Greedy Analysis Pursuit (GAP) [10], GAP noise (GAPN) [33], analysis CoSaMP (ACoSaMP), analysis SP (ASP), analysis IHT (AIHT) and analysis HTP (AHTP) [11].

IV. OVERPARAMETERIZATION VIA THE ANALYSIS SPARSITY MODEL

With the sparsity models now defined, we revisit the overparameterized variational problem. If we know that our signal f is piecewise linear, then it is clear that the coefficient parameters should be piecewise constant with the same discontinuity locations, when linear overparameterization is used. We denote by k the number of these discontinuity locations.

As a reminder, we rewrite f = [I, Z][a^T, b^T]^T. Note that a and b are jointly sparse under Ω_DIF, i.e., Ω_DIF a and Ω_DIF b have the same non-zero locations. With this observation we can extend the analysis minimization problem (11) to support the structured sparsity in the vector [a^T, b^T]^T, leading to the following minimization problem:

$$\min_{a, b} \Big\| g - M[I, Z] \begin{bmatrix} a \\ b \end{bmatrix} \Big\|_2^2 \quad \text{s.t.} \quad \big\| \, |\Omega_{\text{DIF}} a| + |\Omega_{\text{DIF}} b| \, \big\|_0 \le k, \qquad (13)$$

where |Ω_DIF a| denotes applying an element-wise absolute value on the entries of Ω_DIF a.
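The quantity constrained in (13) can be evaluated directly. The small sketch below counts the rows where either Ω_DIF a or Ω_DIF b is non-zero; the tolerance is an assumption for numerical data.

```python
import numpy as np

def joint_cosparsity_level(Omega, a, b, tol=1e-10):
    # || |Omega a| + |Omega b| ||_0 : number of rows active in either coefficient vector
    magnitude = np.abs(Omega @ a) + np.abs(Omega @ b)
    return int(np.sum(magnitude > tol))
```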


Note that we can have a similar formulation for this problem also in the synthesis framework, using the Heaviside dictionary

$$D_{\text{HS}} = \begin{bmatrix} 1 & 1 & \cdots & 1 & 1 \\ 0 & 1 & \cdots & \cdots & 1 \\ \vdots & 0 & \ddots & & \vdots \\ \vdots & & \ddots & 1 & 1 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}, \qquad (14)$$

whose atoms are step functions of different lengths. We use the known observation that every one dimensional signal with k change points can be sparsely represented using k + 1 atoms from D_HS (k columns for representing the change points plus one for the DC). One way to observe that is by the fact that Ω_DIF D̄_HS = I, where D̄_HS is a submatrix of D_HS obtained by removing the last column of D_HS (the DC component). Therefore, one may recover the coefficient parameters a and b, through their sparse representations α and β, by solving

$$\min_{\alpha, \beta} \Big\| g - M[I, Z] \begin{bmatrix} D_{\text{HS}} & 0 \\ 0 & D_{\text{HS}} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \Big\|_2^2 \quad \text{s.t.} \quad \big\| \, |\alpha| + |\beta| \, \big\|_0 \le k, \qquad (15)$$

where a = D_HS α and b = D_HS β. This minimization problem can be approximated using block-sparsity techniques such as the group-LASSO estimator [25], the mixed ℓ_2/ℓ_1 relaxation (an extension of the ℓ_1 relaxation) [26], [27], the Block OMP (BOMP) algorithm [28] or the extensions of CoSaMP and IHT for structured sparsity [29]. The joint sparsity framework can also be used with (15) [19], [20], [21], [22], [23], [24].
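The observation Ω_DIF D̄_HS = I is easy to verify numerically; a small illustrative check (the dimension is arbitrary):

```python
import numpy as np

d = 6
D_HS = np.triu(np.ones((d, d)))                       # Heaviside dictionary of Eq. (14)
D_HS_bar = D_HS[:, :-1]                               # drop the last (DC) column
Omega_DIF = np.eye(d - 1, d) - np.eye(d - 1, d, 1)    # the [1, -1] filter of Eq. (3)

assert np.allclose(Omega_DIF @ D_HS_bar, np.eye(d - 1))
```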

The problem with the above synthesis techniques is twofold: (i) no recovery guarantees exist for this formulation with the dictionary D_HS; (ii) it is hard to generalize the model in (9) to higher order signals, e.g., images.

The reason that no theoretical guarantees are provided for the D_HS dictionary is the high correlation between its columns. These create high ambiguity, causing the classical synthesis techniques to fail in recovering the representations α and β. This problem has been addressed in several contributions that have treated the signal directly and not its representation [30], [31], [32], [51], [52], [53].

We introduce an algorithm that approximates the solutions of both (9) and (11) and has theoretical reconstruction performance guarantees for one dimensional functions f with matrices M that are near isometric for piecewise polynomial functions. In the next section we shall present another algorithm that does not have such guarantees but is generalizable to higher order functions.

Though till now we have restricted our discussion only to piecewise linear functions, we turn now to look at the more general case of piecewise 1D polynomial functions of degree n. Note that this method approximates the following minimization problem, which is a generalization of (13) to any polynomial of degree n,

$$\min_{b_0, b_1, \ldots, b_n} \Big\| g - M[I, Z, Z^2, \ldots, Z^n] \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix} \Big\|_2^2 \quad \text{s.t.} \quad \Big\| \sum_{i=0}^{n} |\Omega_{\text{DIF}} b_i| \Big\|_0 \le k, \qquad (16)$$

where |Ω_DIF b_i| is an element-wise operation that calculates the absolute value of each entry in Ω_DIF b_i.

We employ the signal space CoSaMP (SSCoSaMP) strategy [30], [31]^2 to approximate the solution of (16). This algorithm assumes the existence of a projection that, for a given signal, finds its closest signal (in the ℓ_2-norm sense) that belongs to the model^3, where in our case the model is piecewise polynomial functions with k jump points. This algorithm, along with the required projection, is presented in Appendix A.

A. Recovery Guarantees for Piecewise Polynomial Functions

To provide theoretical guarantees for the recovery by SSCoSaMP we employ two theorems from [31] and [32]. These lead to reconstruction error bounds for SSCoSaMP that guarantee stable recovery if the noise is adversarial, and an effective denoising effect if it is zero-mean white Gaussian.

Both theorems rely on the following property of the measurement matrix M, which is a special case of the D-RIP [18] and Ω-RIP [11].

Definition 4.1: A matrix M has a polynomial restricted isometry property of order n (P_n-RIP) with a constant δ_k if for any piecewise polynomial function f of order n with k jumps we have

$$(1 - \delta_k)\|f\|_2^2 \le \|Mf\|_2^2 \le (1 + \delta_k)\|f\|_2^2. \qquad (17)$$

Having the P_n-RIP definition we turn to present the first theorem, which treats the adversarial noise case.

Theorem 4.2 (Based on Corollary 3.2 in [31]^4): Let f be a piecewise polynomial function of order n, e be an adversarial bounded noise and M satisfy the P_n-RIP (17) with a constant δ_{4k} < 0.046. Then after a finite number of iterations, SSCoSaMP yields

$$\|f - \hat f\|_2 \le C \|e\|_2, \qquad (18)$$

where C > 2 is a constant depending on δ_{4k}.

Note that the above theorem implies that we may compressively sense piecewise polynomial functions and achieve a perfect recovery in the noiseless case e = 0. Note also that if M is a subgaussian random matrix then it is sufficient to use only m = O(k(n + log(d))) measurements [3], [11].

Though the above theorem is important for compressed sensing, it does not guarantee noise reduction, even for the case M = I, as C > 2. The reason for this is that the noise here is adversarial, leading to a worst-case bound. By introducing a random distribution for the noise, one may get better reconstruction guarantees. The following theorem assumes that the noise is randomly Gaussian distributed, thereby enabling us to provide effective denoising guarantees.

^2 In a very similar way we could have used the analysis CoSaMP (ACoSaMP) [11], [16].

^3 Note that in [30], [31] the projection might be allowed to be near-optimal in the sense that the projection error is close to the optimal error up to a multiplicative constant.


[Fig. 1 — six panels: (a) Noisy Function σ = 0.1; (b) Function Recovery for σ = 0.1 (original, BGAPN Recovery, BGAPN Continuous Recovery); (c) Coefficients Parameters Recovery for σ = 0.1 (a and b estimates vs. true); (d) Noisy Function σ = 0.25; (e) Function Recovery for σ = 0.25; (f) Coefficients Parameters Recovery for σ = 0.25.]

Fig. 1. Recovery of a piecewise linear function using the BGAPN algorithm with and without a constraint on the continuity.


Theorem 4.3 (Based on Theorem 1.7 in [32]^5): Assume the conditions of Theorem 4.2 such that e is a random zero-mean white Gaussian noise with variance σ². Then after a finite number of iterations, SSCoSaMP yields

$$\|f - \hat f\|_2 \le C\sqrt{(1 + \delta_{3k})\, 3k}\left(1 + \sqrt{2(1+\beta)\log(nd)}\right)\sigma, \qquad (19)$$

with probability exceeding $1 - \tfrac{2}{(3k)!}(nd)^{-\beta}$.

The bound in the theorem can be given on the expected error instead of being given only with high probability using the proof technique in [54]. We remark that if we were given an oracle that foreknows the locations of the jumps in the parameterization, the error we would get would be O(√k σ). As the log(nd) factor in our bound is inevitable [55], we may conclude that our guarantee is optimal up to a constant factor.

V. SPARSITY BASED OVERPARAMETERIZED VARIATIONAL ALGORITHM FOR HIGH DIMENSIONAL FUNCTIONS

We now turn to generalize the model in (13) to support other overparameterization forms, including higher dimensional functions such as images. We consider the case where an upper bound for the noise energy is given and not the sparsity k, as is common in many applications. Notice that for the synthesis model, such a generalization is not trivial because, while it is easy to extend the Ω_DIF operator to high dimensions, it is not clear how to do this for the Heaviside dictionary.

Therefore we consider an overparameterized version of (12), where the noise energy is known and the analysis model is used. Let X_1, ..., X_n be matrices of the space variables and b_1, ..., b_n their coefficient parameters. For example, in the 2D (image) case of a piecewise linear function, X_1 will be the identity matrix, X_2 will be a diagonal matrix with the values [1, 2, ..., d, 1, 2, ..., d, ..., 1, 2, ..., d] on its main diagonal, and X_3 will similarly be a diagonal matrix with [1, 1, ..., 1, 2, 2, ..., 2, ..., d, d, ..., d] on its main diagonal. Assuming that all the coefficient parameters are jointly sparse under a general operator Ω, we may recover these coefficients by solving

$$[\hat b_1^T, \ldots, \hat b_n^T]^T = \arg\min_{b_1, \ldots, b_n} \sum_{i=1}^{n} \|\Omega b_i\|_0 \quad \text{s.t.} \quad \Big\| g - M[X_1, \ldots, X_n] \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \Big\|_2 \le \|e\|_2. \qquad (20)$$

Having an estimate for all these coefficients, our approximation of the original signal f is $\hat f = [X_1, \ldots, X_n][\hat b_1^T, \ldots, \hat b_n^T]^T$.
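For a concrete picture of the image-domain parameterization matrices X_1, X_2, X_3 described above, here is an illustrative construction for a small s × s image. The column-stacking convention (and the toy size s) are assumptions made only for this sketch, not taken from the paper.

```python
import numpy as np

s = 4                                  # hypothetical image side length; d = s * s pixels
d = s * s
i_idx, j_idx = np.meshgrid(np.arange(1, s + 1), np.arange(1, s + 1), indexing="ij")

X1 = np.eye(d)                                            # DC / offset term a(i, j)
X2 = np.diag(i_idx.flatten(order="F").astype(float))      # row coordinate i: 1,...,s,1,...,s,...
X3 = np.diag(j_idx.flatten(order="F").astype(float))      # column coordinate j: 1,...,1,2,...,2,...

# a piecewise-linear image f(i, j) = a(i, j) + b1(i, j)*i + b2(i, j)*j is then stacked as
# f = X1 @ a + X2 @ b1 + X3 @ b2 for jointly piecewise-constant a, b1, b2
```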

As the minimization problem in (20) is NP-hard, we suggest solving it by a generalization of the GAPN algorithm [33], the block GAPN (BGAPN). We introduce this extension in Appendix B.


[Fig. 2 — six panels: (a) Noisy Function σ = 0.1; (b) Function Recovery for σ = 0.1 (original, BGAPN Recovery, BGAPN Continuous Recovery); (c) Coefficients Parameters Recovery for σ = 0.1 (a, b and c estimates vs. true); (d) Noisy Function σ = 0.25; (e) Function Recovery for σ = 0.25; (f) Coefficients Parameters Recovery for σ = 0.25.]

Fig. 2. Recovery of a piecewise second-order polynomial function using the BGAPN algorithm with and without a constraint on the continuity.


This algorithm aims at finding in a greedy way the rows of Ω that are orthogonal to the space variables b_1, ..., b_n. Notice that once we find the indices of these rows, i.e., the set Λ that satisfies Ω_Λ b_i = 0 for i = 1, ..., n (Ω_Λ is the submatrix of Ω with the rows corresponding to the set Λ), we may approximate b_1, ..., b_n by solving

$$[\hat b_1^T, \ldots, \hat b_n^T]^T = \arg\min_{b_1, \ldots, b_n} \sum_{i=1}^{n} \|\Omega_\Lambda b_i\|_2^2 \quad \text{s.t.} \quad \Big\| g - M[X_1, \ldots, X_n] \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \Big\|_2 \le \|e\|_2. \qquad (21)$$

Therefore, BGAPN approximates b_1, ..., b_n by finding Λ first. It starts with Λ containing all the rows of Ω and then gradually removes elements from it: at each iteration it solves the problem posed in (21) and then finds the row in Ω that has the largest correlation with the current temporal solution $[\hat b_1^T, \ldots, \hat b_n^T]^T$.
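To make the greedy cosupport pruning tangible, here is a hypothetical simplification of this loop, not the authors' implementation: the constrained step (21) is replaced by a quadratic penalty with weight rho, and the stopping rule follows the small-|Ωb̂_i| criterion discussed in Appendix B. All names and parameter values are illustrative.

```python
import numpy as np

def bgapn_sketch(g, M, Xs, Omega, eps=1e-3, rho=1e3, max_iter=100):
    # Xs: list of the parameterization matrices X_1, ..., X_n (each d x d)
    n, d = len(Xs), Xs[0].shape[1]
    A = M @ np.hstack(Xs)                          # acts on the stacked vector [b_1; ...; b_n]
    active = np.ones(Omega.shape[0], dtype=bool)   # cosupport Lambda: all rows at the start
    blocks = np.zeros((n, d))
    for _ in range(max_iter):
        O = Omega[active]
        B = np.kron(np.eye(n), O)                  # applies Omega_Lambda to every block b_i
        H = rho * (A.T @ A) + B.T @ B + 1e-10 * np.eye(n * d)
        b = np.linalg.solve(H, rho * (A.T @ g))    # penalized surrogate for the step in (21)
        blocks = b.reshape(n, d)
        cand = np.where(active)[0]
        if cand.size == 0:
            break
        energies = np.sum((Omega[cand] @ blocks.T) ** 2, axis=1)
        if energies.max() < eps ** 2:              # remaining rows are (nearly) orthogonal
            break
        active[cand[np.argmax(energies)]] = False  # drop the most correlated row from Lambda
    return blocks                                  # estimates of b_1, ..., b_n
```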

Note that there are no known recovery guarantees for BGAPN of the form we have had for SSCoSaMP before. Therefore, we present its efficiency in several experiments in the next section. As explained in Appendix B, the advantages of BGAPN over SSCoSaMP, despite the lack of theoretical guarantees, are that (i) it does not need k to be foreknown and (ii) it is easier to use with higher dimensional functions.

Before we move to the next section we note that one of the advantages of the above formulation and the BGAPN algorithm is the relative ease of adding new constraints to it. For example, we may encounter piecewise polynomial functions that are also continuous. However, we do not have such a continuity constraint in the current formulation. As we shall see in the next section, the absence of such a constraint allows jumps at the discontinuity points between the polynomial segments, and therefore it is important to add it to the algorithm to get a better reconstruction.

One possibility to solve this problem is to add a continuity constraint on the jump points of the signal. In Appendix B we also present a modified version of the BGAPN algorithm that imposes such a continuity constraint, and in the next section we shall see how this handles the problem. Note that this is only one example of a constraint that one may add to the BGAPN technique. For example, in images one may add a smoothness constraint on the edges' directions.

VI. EXPERIMENTS

For demonstrating the efficiency of the proposed method we perform several tests. We start with the one dimensional case, testing our polynomial fitting approach with and without the continuity constraint for continuous piecewise polynomials of first and second degrees. We compare these results with the optimal polynomial approximation scheme presented in Section IV and with the variational approach in [38]. We continue with a compressed sensing experiment for discontinuous piecewise polynomials, comparing BGAPN with SSCoSaMP. Then we perform some tests on images using BGAPN. We start by denoising cartoon images using the piecewise linear model. We compare our outcome with the one of TV denoising [6] and show that our result does not suffer from a staircasing effect [56]. We also compare to a TV denoising version with overparameterization [12]. Then we show how our framework may be used for image segmentation, drawing a connection to the Mumford-Shah functional [57], [58]. We compare our results with the ones obtained by the popular graph-cuts based segmentation [59].


[Fig. 3 — two plots of Mean Squared Error versus σ; curves: BGAPN, BGAPN Continuous, Optimal Projection, Optimal Continuous, TVOPNL.]

Fig. 3. MSE of the recovered piecewise linear functions (left) and piecewise second-order polynomial functions (right) as a function of the noise variance σ for the methods BGAPN with and without the continuity constraint and the optimal approximation with and without continuity post-processing. As a reference we compare to the non-local overparameterized TV (TVOPNL) approach introduced in [38].

[Fig. 4 — plot of Recovery Rate versus Sampling Rate m/d; curves: BGAPN, SSCoSaMP.]

Fig. 4. Recovery rate of piecewise second-order polynomial functions as a function of the sampling rate m/d for the methods BGAPN and SSCoSaMP.


A. Continuous Piecewise Polynomial Functions Denoising

In order to check the performance of the polynomial fitting, we generate random continuous piecewise-linear and second-order polynomial functions with 300 samples, 6 jumps and a dynamic range [−1, 1]. Then we contaminate the signal with a white Gaussian noise with a standard deviation from the set {0.05, 0.1, 0.15, ..., 0.5}.

We compare the recovery result of BGAPN with and without the continuity constraint with the one of the optimal approximation^6. Figs. 1 and 2 present BGAPN reconstruction results for the linear and second order polynomial cases respectively, for two different noise levels. It can be observed that the addition of the continuity constraint is essential for the correctness of the recovery. Indeed, without it we get jumps between the segments. Note also that the number of jumps in our recovery may be different from the one of the original signal, as BGAPN does not have preliminary information about it. However, it still manages to recover the parameterization well, especially in the lower noise case.

The possibility to provide a parametric representation is one of the advantages of our method. Indeed, one may achieve good denoising results, in terms of mean squared error (MSE), without using the linear model, e.g., with methods such as free-knot splines [60]. However, the approximated function is not guaranteed to be piecewise linear and therefore learning the change points from it is sub-optimal. See [38] and the references therein for more details.

^6 We have done the same experiment with the BOMP algorithm [28], adopting the synthesis framework, with and without the continuity constraint, and observed that it performs very similarly to BGAPN.



To evaluate our method with respect to its MSE we compare it with the optimal approximation for piecewise polynomial functions presented in Appendix A-A. Note that the target signals are continuous while this algorithm does not use this assumption. Therefore, we add the continuity constraint to this method as a post-processing step (unlike BGAPN, which merges it into its steps). We take the change points it has recovered and project the noisy measurement g onto its closest continuous piecewise polynomial function with the same discontinuities.

Figure 3 presents the recovery performance of BGAPN and the projection algorithm with and without the continuity constraint. Without the constraint, it can be observed that BGAPN achieves better recovery performance. This is due to the fact that it is not restricted to the number of change points in the initial signal and therefore it can use more points and thus adapt itself better to the signal, achieving a lower MSE. However, after adding the constraint, in the piecewise linear case the optimal projection achieves a better recovery error. The reason is that, as the optimal projection uses the exact number of points, it finds the change locations more accurately. Note though that in the case of second order polynomial functions, BGAPN gets a better recovery. This happens because this program uses the continuity constraint also within its iterations and not only at the final step, as is the case with the projection algorithm. As the second order polynomial case is more complex than the piecewise linear one, the impact of using the continuity prior is higher and more significant than the information on the number of change points.

We also compare to the non-local overparameterized TV algorithm (TVOPNL) in [38]^7, which was shown to be better for the task of line segmentation when compared with several alternatives, including the ones reported in [12] and [13]. Clearly, our proposed scheme achieves better recovery performance than TVOPNL, demonstrating the advantage of our line segmentation strategy.

B. Compressed Sensing of Piecewise Polynomial Functions

We also perform a compressed sensing experiment in which we compare the performance of SSCoSaMP, with the optimal projection, and BGAPN for recovering a second order polynomial function with 6 jumps from a small set of linear measurements. Each entry in the measurement matrix M is selected from an i.i.d. normal distribution and then all columns are normalized to have unit norm. The polynomial functions are selected as in the previous experiment but with two differences: (i) we omit the continuity constraint; and (ii) we normalize the signals to have unit norm.

Fig. 4 presents the recovery rate (noiseless case σ = 0) of each program as a function of the number of measurements m. Note that for a very small or large number of samples BGAPN behaves better. However, in the middle range SSCoSaMP achieves a better reconstruction rate. Nonetheless, we may say that their performance is more or less the same.

^7 Code provided by the authors.

[Fig. 5 — (a) Original Image; (b) Noisy Image σ = 20; (c) BGAPN with Ω_DIF, PSNR = 40.09 dB; (d) TV recovery, PSNR = 38.95 dB; (e) TV OP recovery, PSNR = 37.41 dB.]

Fig. 5. Denoising of swoosh using the BGAPN algorithm with and without diagonal derivatives. Notice that we do not have the staircasing effect that appears in the TV reconstruction.


C. Cartoon Image Denoising

We turn to evaluate the performance of our approach on images. A piecewise smooth model is considered to be a good model for images, especially for ones with no texture, i.e., cartoon images [61], [62]. Therefore, we use a linear overparameterization of the two dimensional plane and employ the two dimensional difference operator Ω_DIF that calculates the horizontal and vertical discrete derivatives of an image by applying the filters [1,−1] and [1,−1]^T on it.


[Fig. 6 — (a) Original Image; (b) Noisy Image σ = 20; (c) BGAPN with Ω_DIF, PSNR = 34.02 dB; (d) TV recovery, PSNR = 33.69 dB; (e) TV OP recovery, PSNR = 31.83 dB.]

Fig. 6. Denoising of sign using the BGAPN algorithm. The results of TV and OP-TV are presented as a reference.

In this case, the problem in (20) becomes (notice that M = I)

$$[\hat b_0^T, \hat b_h^T, \hat b_v^T]^T = \arg\min_{b_0, b_h, b_v} \Big\| \, |\Omega_{\text{DIF}} b_0|^2 + |\Omega_{\text{DIF}} b_h|^2 + |\Omega_{\text{DIF}} b_v|^2 \, \Big\|_0 \quad \text{s.t.} \quad \Big\| g - [X_0, X_h, X_v] \begin{bmatrix} b_0 \\ b_h \\ b_v \end{bmatrix} \Big\|_2 \le \|e\|_2, \qquad (22)$$

where X_0, X_h, X_v are the matrices that contain the DC, the horizontal and the vertical parameterizations respectively; and b_0, b_h, b_v are their corresponding space variables.

We apply this scheme for denoising two cartoon images, swoosh and sign.

[Fig. 7 — (a) Original Image; (b) Noisy Image σ = 20; (c) BGAPN with Ω_DIF, PSNR = 30.77 dB; (d) TV recovery, PSNR = 31.44 dB; (e) TV OP recovery, PSNR = 30.6 dB.]

Fig. 7. Denoising of house using the BGAPN algorithm. Notice that we do not have the staircasing effect that appears in the TV reconstruction. Because our model is linear we do not recover the texture and thus we get slightly inferior results compared to TV with respect to PSNR. Note that if we use a cubic overparameterization with BGAPN instead of a linear one we get a PSNR (31.81 dB) better than that of TV.

We compare our results with the ones of TV denoising [6]. Figs. 5 and 6 present the recovery of swoosh and sign from their noisy versions contaminated with an additive white Gaussian noise with σ = 20. Note that we achieve better recovery results than TV and do not suffer from its staircasing effect. We have tuned the parameters of TV separately for each image to optimize its output quality, while we have used the same setup for our method in all the denoising experiments. To get good quality with BGAPN, we run the algorithm several times with different sets of parameters (which are the same for all images) and then provide as output the average image of all the runs. Notice that using this technique with TV degrades its results.

To test whether our better denoising is just a result of using overparameterization or an outcome of our new framework, we also compare to TV with linear overparameterization [12]^8. Notice that while plugging overparameterization directly into TV improves the results in some cases [12], this is not the case with the images here. Therefore, we see that our new framework that links sparsity with overparameterization has an advantage over the older approach that still acts within the variational scheme.


[Fig. 8 — (a) Original Image Gradients; (b) Recovered Image Gradients.]

Fig. 8. Gradient map of the clean house image and our recovered image from Fig. 7.


We could use other forms of overparameterization such as cubic instead of planar, or add other derivative directions in addition to the horizontal and vertical ones. For example, one may apply our scheme using an operator that also calculates the diagonal derivatives with the filters $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Such choices may lead to an improvement in different scenarios. A future work should focus on learning the overparameterizations and the type of derivatives that should be used for denoising and for other tasks. We believe that such learning has the potential to lead to state-of-the-art results.

D. Image Segmentation

As a motivation for the task of segmentation we present the denoising of an image with texture. We continue using the model in (22) and consider the house image as an example. Fig. 7 demonstrates the denoising result we get for this image. Note that here as well we do not suffer from the staircasing effect that appears in the TV recovery. However, due to the nature of our model we lose the texture and therefore achieve an inferior PSNR compared to the TV denoising^9.

Though the removal of texture is not favorable for the task of denoising, it makes the recovery of salient edges in the original image easier. In Fig. 8 we present the gradient map of our recovered image and the one of the original image. It can be seen that while the gradients of the original image capture also the texture changes, our method finds only the main edges^10. This motivates us to use our scheme for segmentation.

^8 Code provided by the authors.

^9 The lower PSNR we get with our method is because our model is linear and therefore is less capable of adapting itself to the texture. By using a cubic overparameterization we get a PSNR equal to that of TV. Note also that for larger noise magnitudes the recovery performance of our algorithm in terms of PSNR becomes better than TV also with the linear model, as in these conditions we tend to lose the texture anyway.


Since our scheme divides the image into piecewise linear regions, we can view our strategy as an approach that minimizes the Mumford-Shah functional [57], [58]. On the other hand, if the image has only two regions, our segmentation result can be viewed as a solution of the Chan-Vese functional, with the difference that we model each region by a polynomial function instead of approximating it by a constant [63].

We present our segmentation results for three images, and for each we display the piecewise linear version of the image together with its boundary map. Our segmentation results appear in Figs. 9, 10 and 11. We compare our results to the popular graph-cuts based segmentation [59]. Notice that we achieve a comparable performance, where in some places our method behaves better and in others the strategy in [59] provides a better result.

Though we get a good segmentation, it is clear that there is still large room for improvement compared to the current state-of-the-art. One direction for improvement is to use more filters within Ω. Another one is to calculate the gradients of the coefficient parameters and not of the recovered image, as they are supposed to be truly piecewise constant. We leave these ideas to future work.

VII. CONCLUSION AND FUTURE WORK

This work has presented a novel framework for solving the overparameterized variational problem using sparse representations. We have demonstrated how this framework can be used both for one dimensional and two dimensional functions, while a generalization to other higher dimensions (such as 3D) is straightforward. We have solved the problem of line fitting for piecewise polynomial 1D signals and then shown how the new technique can be used for compressed sensing, denoising and segmentation.

Though this work has focused mainly on linear overparameterizations, the extension to other forms is straightforward. However, to keep the discussion as simple as possible, we have chosen to use simple forms of overparameterization in the experiments section. As future research, we believe that a learning process should be added to our scheme. It should adapt the functions of the space variables X_1, ..., X_n and the filters in Ω to the signal at hand. We believe that this has the potential to lead to state-of-the-art results in segmentation, denoising and other signal processing tasks. Combining our scheme with the standard sparse representation approach may provide the possibility to add support for images with texture. This would lead to a scheme that works globally on the image for the cartoon part and locally for the texture part. Another route for future work is to integrate our scheme into the state-of-the-art overparameterization based algorithm for optical flow in [13].

ACKNOWLEDGMENT

The authors would like to thank Tal Nir and Guy Rosman for providing their code for the experiments. The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Program, ERC Grant agreement no. 320649. This research is partially supported by AFOSR, ARO, NSF, ONR, and NGA. The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving this paper.

^10 We have tested our approach also in the presence of a blur operator. The edges in this case are preserved as well.


[Fig. 9 — (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59].]

Fig. 9. Piecewise linear version of the coins image together with the segmentation result. We compare to the popular graph-cuts based segmentation [59].


APPENDIX A
THE SSCOSAMP ALGORITHM

For approximating (16), we use a block sparsity variant of SSCoSaMP [32] and adapt it to our model. It is presented in Algorithm 1. Due to the equivalence between D_HS and Ω_DIF, we use the latter in the algorithm.

This method uses a projection S_n(·, k) that, given a signal, finds its closest piecewise polynomial function with k jump points. We calculate this projection using dynamic programming. Our strategy is a generalization of the one that appears in [11], [64] and is presented in the next subsection.

The halting criterion we use in our work in Algorithm 1 is ‖g_r^t‖_2 ≤ ε for a given small constant ε. Other options for stopping criteria are discussed in [47].

A. Optimal Approximation using Piecewise Polynomial Functions

Our projection technique uses the fact that once the jump points are set, the optimal parameters of the polynomial in a segment [t, l] can be calculated optimally by solving a least squares minimization problem

$$\Big\| [I[t,l], Z[t,l], \ldots, Z^n[t,l]] \begin{bmatrix} b_0[t,l] \\ b_1[t,l] \\ \vdots \\ b_n[t,l] \end{bmatrix} - g[t,l] \Big\|_2^2, \qquad (23)$$

where g[t, l] is the sub-vector of g supported by the indices t to l (t ≤ l) and Z^i[t, l] is the (square) sub-matrix of Z^i corresponding to the indices t to l. We denote by P_n(g[t, l]) the polynomial function we get by solving (23).


[Fig. 10 — (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59].]

Fig. 10. Piecewise linear version of the airplane image together with the segmentation result. We compare to the popular graph-cuts based segmentation [59].

Indeed, in the case that the size of the segment [t, l] is smaller than the number of parameters, e.g., a segment of size one for a linear function, the above minimization problem has infinitely many options for setting the parameters. However, all of them lead to the same result, which is keeping the values of the points in the segment, i.e., having P_n(g[t, l]) = g[t, l].

Denote by S_n(g[1, d], k) the optimal approximation of the signal g[1, d] by a piecewise polynomial function with k jumps. It can be calculated by solving the following recursive minimization problem

$$\hat t = \arg\min_{1 \le t < d} \; \|S_n(g[1,t], k-1) - g[1,t]\|_2^2 + \|P_n(g[t+1,d]) - g[t+1,d]\|_2^2, \qquad (24)$$

and setting

$$S_n(g[1,d], k) = \begin{bmatrix} S_n(g[1,\hat t], k-1) \\ P_n(g[\hat t+1, d]) \end{bmatrix}. \qquad (25)$$

The vectors S_n(g[1, t], k − 1) can be calculated recursively using (24). The recursion ends with the base case S_n(g[1, t], 0) = P_n(g[1, t]).

This leads us to the following algorithm for calculating an optimal approximation for a signal g. Notice that this algorithm also provides us with the parameterization of the piecewise polynomial.

1) Calculate S_n(g[1, t], 0) = P_n(g[1, t]) for 1 ≤ t ≤ d.
2) For k̃ = 1 : k − 1 do
   • Calculate S_n(g[1, t], k̃) for 1 ≤ t ≤ d using (24) and (25).
3) Calculate S_n(g[1, d], k) using (24) and (25).

Denoting by T the worst case complexity of calculating P_n(g[t, l]) for any pair t, l, we have that the complexity of step 1) is O(dT); of step 2) is O(kd²(T + d)), as the computation of the projection error is of complexity O(d); and of step 3) O(d(T + d)). Summing all together we get a total complexity of O(kd²(T + d)) for the algorithm, which is a polynomial complexity since T is polynomial.
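A possible dynamic-programming realization of steps 1)–3) is sketched below. It is illustrative only (0-based indexing, a dense precomputed error table, and it returns just the recovered jump locations); it is not the authors' implementation.

```python
import numpy as np

def optimal_piecewise_poly_jumps(g, n, k):
    # best approximation of g by a degree-n piecewise polynomial with k jumps, via (24)-(25)
    d = len(g)
    grid = np.arange(1, d + 1, dtype=float)

    def seg_err(t, l):
        # squared error of P_n(g[t, l]): least-squares polynomial fit on samples t..l
        V = np.vander(grid[t:l + 1], n + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(V, g[t:l + 1], rcond=None)
        return float(np.sum((V @ coef - g[t:l + 1]) ** 2))

    err = [[seg_err(t, l) if t <= l else np.inf for l in range(d)] for t in range(d)]
    S = np.full((k + 1, d), np.inf)            # S[j, l]: best error on g[0..l] with j jumps
    S[0] = [err[0][l] for l in range(d)]
    jump = np.zeros((k + 1, d), dtype=int)
    for j in range(1, k + 1):
        for l in range(1, d):
            costs = [S[j - 1, t] + err[t + 1][l] for t in range(l)]
            t_best = int(np.argmin(costs))
            S[j, l], jump[j, l] = costs[t_best], t_best

    cuts, l = [], d - 1                        # backtrack the jump locations
    for j in range(k, 0, -1):
        t = jump[j, l]
        cuts.append(t + 1)                     # the last segment at level j starts at t + 1
        l = t
    return sorted(cuts)
```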

APPENDIX B
THE BLOCK GAPN ALGORITHM

For approximating (20), we extend the GAPN technique [33] to block sparsity and adapt it to our model. It is presented in Algorithm 2. Notice that this program, unlike SSCoSaMP, does not assume the knowledge of k or the existence of an optimal projection onto the signals' low dimensional union of subspaces. Note also that it suits a general form of overparameterization and not only 1D piecewise polynomial functions. It is possible to accelerate BGAPN for highly scaled problems by removing several elements from the cosupport at a time instead of one in the update cosupport stage.


[Fig. 11 — (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59].]

Fig. 11. Piecewise linear version of the man image together with the segmentation result. We compare to the popular graph-cuts based segmentation [59].

Ideally, we would expect that after several iterations of updating the cosupport in BGAPN we would have Ω_Λ b̂_i = 0. However, many signals are only nearly cosparse, i.e., have k significantly large values in Ωb_i while the rest are smaller than a small constant ε. Therefore, a natural stopping criterion in this case would be to stop when the maximal value in |Ωb̂_i| is smaller than ε. This is the stopping criterion we use throughout this paper for BGAPN. Of course, this is not the only option for a stopping criterion, e.g., one may look at the relative solution change in each iteration or use a constant number of iterations if k is foreknown.

We present also a modified version of BGAPN in Algorithm 3 that imposes a continuity constraint on the change points. This is done by creating a binary diagonal matrix W = diag(w_1, ..., w_p) such that in each iteration of the program the i-th element w_i is 1 if it corresponds to a change point and zero otherwise. This matrix serves as a weight matrix that penalizes discontinuity at the change points. This is done by adding the regularizing term

$$\gamma \Big\| W \Omega [X_1, \ldots, X_n] \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \Big\|_2^2$$

to the minimization problem in (26) (Eq. (27) in Algorithm 3), which leads to the additional step (Eq. (28)) in the modified program.

REFERENCES

[1] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.

[2] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320–3325, Dec. 2003.

[3] T. Blumensath and M. Davies, "Sampling theorems for signals from the union of finite-dimensional linear subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1872–1882, Apr. 2009.

[4] Y. M. Lu and M. N. Do, "A theory for sampling signals from a union of subspaces," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2334–2345, Jun. 2008.

[5] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 629–639, Jul. 1990.

[6] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.

[7] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," International Journal of Computer Vision, vol. 22, no. 1, pp. 61–79, 1997.

[8] J. Weickert, Anisotropic Diffusion in Image Processing. Teubner, Stuttgart, Germany, 1998.

[9] D. Needell and R. Ward, "Stable image reconstruction using total variation minimization," SIAM Journal on Imaging Sciences, vol. 6, no. 2, pp. 1035–1058, 2013.

[10] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, "The cosparse analysis model and algorithms," Appl. Comput. Harmon. Anal., vol. 34, no. 1, pp. 30–56, 2013.


Algorithm 1 Signal Space CoSaMP (SSCoSaMP) for Piecewise Polynomial Functions
Input: $k, M, g, \gamma$, where $g = Mf + e$, $f = [I, Z, Z^2, \ldots, Z^n][b_0^T, b_1^T, \ldots, b_n^T]^T$ is a piecewise polynomial function of order $n$, $k = \left\| \sum_{i=0}^{n} |\Omega_{\mathrm{DIF}} b_i| \right\|_0$ is the number of jumps in the representation coefficients of $f$, $e$ is an additive noise, and $\gamma$ is a parameter of the algorithm. $S_n(\cdot, k)$ is a procedure that approximates a given signal by a piecewise polynomial function of order $n$ with $k$ jumps.
Output: $\hat{f}$: a piecewise polynomial with $k+1$ segments that approximates $f$.
• Initialize the jumps' locations $T^0 = \emptyset$, the residual $g_r^0 = g$, and set $t = 0$.
while halting criterion is not satisfied do
• $t = t + 1$.
• Find the parameterization $b_{r,0}, b_{r,1}, \ldots, b_{r,n}$ of the residual's polynomial approximation by calculating $S_n(M^T g_r^{t-1}, \gamma k)$.
• Find new temporal jump locations: $T_\Delta$ = the support of $\sum_{i=0}^{n} |\Omega_{\mathrm{DIF}} b_{r,i}|$.
• Update the jumps' location indices: $T^t = T^{t-1} \cup T_\Delta$.
• Compute temporal parameters: $[b_{p,0}, \ldots, b_{p,n}] = \operatorname{argmin}_{b_0, \ldots, b_n} \left\| g - M[I, Z, Z^2, \ldots, Z^n][b_0^T, b_1^T, \ldots, b_n^T]^T \right\|_2^2$ s.t. $(\Omega_{\mathrm{DIF}} b_0)_{(T^t)^C} = 0, \ldots, (\Omega_{\mathrm{DIF}} b_n)_{(T^t)^C} = 0$.
• Calculate a polynomial approximation of order $n$: $f^t = S_n\left([I, Z, Z^2, \ldots, Z^n][b_{p,0}^T, \ldots, b_{p,n}^T]^T, k\right)$.
• Find new jump locations: $T^t$ = the locations of the jumps in the parameterization of $f^t$.
• Update the residual: $g_r^t = g - M f^t$.
end while
• Form the final solution $\hat{f} = f^t$.
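For concreteness, the “compute temporal parameters” step of Algorithm 1 can be sketched as follows (an illustration, not the authors' code). It assumes $Z$ is the diagonal matrix of the sample positions $1, \ldots, d$, and the hypothetical argument `jump_locs` lists the indices at which a new constant segment of the coefficient vectors begins, so that $\Omega_{\mathrm{DIF}} b_i$ vanishes elsewhere:

```python
import numpy as np

def fit_piecewise_poly_params(M, g, jump_locs, d, n):
    """Sketch of the constrained least-squares step of Algorithm 1: each b_i is
    restricted to be piecewise constant with jumps only at jump_locs, i.e.,
    b_i = P c_i for a segment-indicator matrix P, and the per-segment values
    c_0, ..., c_n are fit jointly by least squares."""
    edges = [0] + sorted(jump_locs) + [d]            # segment boundaries
    P = np.zeros((d, len(edges) - 1))
    for s in range(len(edges) - 1):
        P[edges[s]:edges[s + 1], s] = 1.0            # indicator of the s-th segment
    Z = np.diag(np.arange(1, d + 1, dtype=float))    # assumed sample positions
    A = np.hstack([M @ np.linalg.matrix_power(Z, i) @ P for i in range(n + 1)])
    c = np.linalg.lstsq(A, g, rcond=None)[0]
    b_list = [P @ ci for ci in np.split(c, n + 1)]   # recover b_0, ..., b_n
    f = sum(np.linalg.matrix_power(Z, i) @ b for i, b in enumerate(b_list))
    return f, b_list
```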

[11] R. Giryes, S. Nam, M. Elad, R. Gribonval, and M. E. Davies, “Greedy-like algorithms for the cosparse analysis model,” Linear Algebra and its Applications, vol. 441, pp. 22–60, Jan. 2014, Special Issue on Sparse Approximate Solution of Linear Systems.
[12] T. Nir and A. Bruckstein, “On over-parameterized model based TV-denoising,” in International Symposium on Signals, Circuits and Systems (ISSCS 2007), vol. 1, Jul. 2007, pp. 1–4.
[13] T. Nir, A. M. Bruckstein, and R. Kimmel, “Over-parameterized variational optical flow,” International Journal of Computer Vision, vol. 76, no. 2, pp. 205–216, 2008.
[14] G. Rosman, S. Shem-Tov, D. Bitton, T. Nir, G. Adiv, R. Kimmel, A. Feuer, and A. M. Bruckstein, “Over-parameterized optical flow using a stereoscopic constraint,” in Scale Space and Variational Methods in Computer Vision, ser. Lecture Notes in Computer Science, A. M. Bruckstein, B. M. Haar Romeny, A. M. Bronstein, and M. M. Bronstein, Eds. Springer Berlin Heidelberg, 2012, vol. 6667, pp. 761–772.
[15] R. Giryes, S. Nam, R. Gribonval, and M. E. Davies, “Iterative cosparse projection algorithms for the recovery of cosparse vectors,” in The 19th European Signal Processing Conference (EUSIPCO-2011), Barcelona, Spain, 2011.
[16] R. Giryes and M. Elad, “CoSaMP and SP for the cosparse analysis model,” in The 20th European Signal Processing Conference (EUSIPCO-2012), Bucharest, Romania, 2012.
[17] T. Peleg and M. Elad, “Performance guarantees of the thresholding algorithm for the co-sparse analysis model,” submitted to IEEE Trans. on Information Theory.

Algorithm 2 The Block GAPN Algorithm
Input: $M, g, \Omega$, where $g = Mf + e$, $f = [X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T$ such that $\sum_{i=1}^{n} |\Omega b_i|$ is sparse, and $e$ is an additive noise.
Output: $\hat{f} = [X_1, \ldots, X_n][\hat{b}_1^T, \ldots, \hat{b}_n^T]^T$: an estimate for $f$ such that $\sum_{i=1}^{n} |\Omega \hat{b}_i|$ is sparse.
Initialize the cosupport $\Lambda = \{1, \ldots, p\}$ and set $t = 0$.
while halting criterion is not satisfied do
$t = t + 1$.
Calculate a new estimate:
$[\hat{b}_1^T, \ldots, \hat{b}_n^T]^T = \operatorname{argmin}_{b_1, \ldots, b_n} \sum_{i=1}^{n} \left\| \Omega_\Lambda b_i \right\|_2^2$   (26)
s.t. $\left\| g - M[X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T \right\|_2 \le \|e\|_2$.
Update the cosupport: $\Lambda = \Lambda \setminus \left\{ \operatorname{argmax}_j \sum_{i=1}^{n} \left\| \Omega_j \hat{b}_i \right\|_2^2 \right\}$.
end while
Form an estimate for the original signal: $\hat{f} = [X_1, \ldots, X_n][\hat{b}_1^T, \ldots, \hat{b}_n^T]^T$.
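A rough NumPy sketch of this loop is given below (not the authors' implementation). As an assumption of this sketch, the constrained step (26) is replaced by a common penalized least-squares relaxation with a weight `lam`; `X_blocks` holds the overparameterization matrices $X_1, \ldots, X_n$:

```python
import numpy as np

def block_gapn(M, g, Omega, X_blocks, n_iters=20, lam=1e-2):
    """Sketch of Block GAPN (Algorithm 2).  The constrained problem (26) is
    approximated by  min ||g - M X b||^2 + lam * sum_i ||Omega_Lam b_i||^2,
    solved as one stacked least-squares system."""
    p, d = Omega.shape
    n = len(X_blocks)
    X = np.hstack(X_blocks)                   # [X_1, ..., X_n], maps stacked b to f
    A = M @ X
    Lam = list(range(p))                      # cosupport starts as all rows of Omega
    for _ in range(n_iters):
        O = np.kron(np.eye(n), Omega[Lam, :])             # block analysis operator
        K = np.vstack([A, np.sqrt(lam) * O])
        rhs = np.concatenate([g, np.zeros(O.shape[0])])
        b = np.linalg.lstsq(K, rhs, rcond=None)[0]
        B = b.reshape(n, d)                               # rows are b_1, ..., b_n
        # Remove from the cosupport the row of Omega with the largest energy.
        energies = np.sum((Omega[Lam, :] @ B.T) ** 2, axis=1)
        Lam.pop(int(np.argmax(energies)))
        if not Lam:
            break
    return X @ b, B, Lam
```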

[18] E. J. Candes, Y. C. Eldar, D. Needell, and P. Randall, “Compressed sensing with coherent and redundant dictionaries,” Appl. Comput. Harmon. Anal., vol. 31, no. 1, pp. 59–73, 2011.

[19] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse solutions to linear inverse problems with multiple measurement vectors,” IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477–2488, Jul. 2005.
[20] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Processing, vol. 86, no. 3, pp. 572–588, 2006, Sparse Approximations in Signal and Image Processing.
[21] J. A. Tropp, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation,” Signal Processing, vol. 86, no. 3, pp. 589–602, 2006, Sparse Approximations in Signal and Image Processing.
[22] D. Wipf and B. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.
[23] M. Mishali and Y. C. Eldar, “Reduce and boost: Recovering arbitrary sets of jointly sparse vectors,” IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4692–4702, Oct. 2008.
[24] M. Fornasier and H. Rauhut, “Recovery algorithms for vector-valued data with joint sparsity constraints,” SIAM Journal on Numerical Analysis, vol. 46, no. 2, pp. 577–613, 2008.
[25] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[26] Y. C. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, Nov. 2009.
[27] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” IEEE Trans. Signal Process., vol. 57, no. 8, pp. 3075–3085, Aug. 2009.
[28] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, Jun. 2010.
[29] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, Apr. 2010.
[30] M. A. Davenport, D. Needell, and M. B. Wakin, “Signal space CoSaMP for sparse recovery with redundant dictionaries,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6820–6829, Oct. 2013.


Algorithm 3 The Block GAPN Algorithm with Continuity Constraint
Input: $M, g, \Omega, \gamma$, where $g = Mf + e$, $f = [X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T$ such that $\sum_{i=1}^{n} |\Omega b_i|$ is sparse, $e$ is an additive noise, and $\gamma$ is a weight for the continuity constraint.
Output: $\hat{f} = [X_1, \ldots, X_n][\hat{b}_1^T, \ldots, \hat{b}_n^T]^T$: an estimate for $f$ such that $\sum_{i=1}^{n} |\Omega \hat{b}_i|$ is sparse.
Initialize the cosupport $\Lambda = \{1, \ldots, p\}$ and set $t = 0$.
while halting criterion is not satisfied do
$t = t + 1$.
Calculate a new estimate:
$[\hat{b}_1^T, \ldots, \hat{b}_n^T]^T = \operatorname{argmin}_{b_1, \ldots, b_n} \sum_{i=1}^{n} \left\| \Omega_\Lambda b_i \right\|_2^2$   (27)
s.t. $\left\| g - M[X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T \right\|_2 \le \|e\|_2$.
Update the cosupport: $\Lambda = \Lambda \setminus \left\{ \operatorname{argmax}_j \sum_{i=1}^{n} \left\| \Omega_j \hat{b}_i \right\|_2^2 \right\}$.
Create the weight matrix: $W = \operatorname{diag}(w_1, \ldots, w_p)$, where $w_i = 0$ if $i \in \Lambda$ and $w_i = 1$ otherwise.
Recalculate the estimate:
$[\hat{b}_1^T, \ldots, \hat{b}_n^T]^T = \operatorname{argmin}_{b_1, \ldots, b_n} \sum_{i=1}^{n} \left\| \Omega_\Lambda b_i \right\|_2^2 + \gamma \left\| W \Omega [X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T \right\|_2^2$   (28)
s.t. $\left\| g - M[X_1, \ldots, X_n][b_1^T, \ldots, b_n^T]^T \right\|_2 \le \|e\|_2$.
end while
Form an estimate for the original signal: $\hat{f} = [X_1, \ldots, X_n][\hat{b}_1^T, \ldots, \hat{b}_n^T]^T$.
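Under the same penalized relaxation used in the sketch after Algorithm 2, the recalculation step (28) can be illustrated as follows (again a sketch rather than the authors' code, with `lam` and `gamma` as assumed weights):

```python
import numpy as np

def bgapn_continuity_update(M, g, Omega, X_blocks, Lam, lam=1e-2, gamma=1e-1):
    """Sketch of step (28) of Algorithm 3: re-estimate the stacked coefficients
    with the extra term gamma * ||W Omega [X_1,...,X_n] b||_2^2, where diag(W)
    is 1 exactly at the change points (rows removed from the cosupport Lam)."""
    p, d = Omega.shape
    n = len(X_blocks)
    X = np.hstack(X_blocks)
    A = M @ X
    w = np.ones(p)
    w[list(Lam)] = 0.0                                   # weight only the change-point rows
    O_cosupp = np.kron(np.eye(n), Omega[list(Lam), :])   # sum_i ||Omega_Lam b_i||^2 term
    O_cont = (w[:, None] * Omega) @ X                    # W Omega [X_1, ..., X_n] term
    K = np.vstack([A, np.sqrt(lam) * O_cosupp, np.sqrt(gamma) * O_cont])
    rhs = np.concatenate([g, np.zeros(K.shape[0] - g.shape[0])])
    b = np.linalg.lstsq(K, rhs, rcond=None)[0]
    return X @ b, b.reshape(n, d)
```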

[31] R. Giryes and D. Needell, “Greedy signal space methods for incoherence and beyond,” Appl. Comput. Harmon. Anal., vol. 39, no. 1, pp. 1–20, Jul. 2015.
[32] R. Giryes and D. Needell, “Near oracle performance and block analysis of signal space greedy methods,” Journal of Approximation Theory, vol. 194, pp. 157–174, Jun. 2015.
[33] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, “Recovery of cosparse signals with greedy analysis pursuit in the presence of noise,” in 4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec. 2011, pp. 361–364.
[34] J. M. Morel and S. Solimini, Variational Methods in Image Segmentation. Cambridge, MA, USA: Birkhauser Boston Inc., 1995.
[35] T. Chan and J. Shen, Image Processing and Analysis. Society for Industrial and Applied Mathematics, 2005.
[36] J. Weickert, A. Bruhn, T. Brox, and N. Papenberg, “A survey on variational optic flow methods for small displacements,” in Mathematical Models for Registration and Applications to Medical Imaging, ser. Mathematics in Industry. Springer Berlin Heidelberg, 2006, vol. 10, pp. 103–136.
[37] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Springer, 2006, vol. 147.
[38] S. Shem-Tov, G. Rosman, G. Adiv, R. Kimmel, and A. M. Bruckstein, “On globally optimal local modeling: From moving least squares to over-parametrization,” in Innovations for Shape Analysis. Springer, 2013, pp. 379–405.
[39] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy approximations,” Constructive Approximation, vol. 50, pp. 57–98, 1997.
[40] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, no. 3, pp. 947–968, Jun. 2007.
[41] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[42] D. L. Donoho and M. Elad, “On the stability of the basis pursuit in the presence of noise,” Signal Process., vol. 86, no. 3, pp. 511–532, 2006.
[43] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Statist. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.
[44] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, pp. 3397–3415, 1993.
[45] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods and their application to non-linear system identification,” International Journal of Control, vol. 50, no. 5, pp. 1873–1896, 1989.
[46] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive time-frequency decompositions,” Optical Engineering, vol. 33, no. 7, pp. 2183–2191, Jul. 1994.
[47] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[48] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[49] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009.
[50] S. Foucart, “Hard thresholding pursuit: an algorithm for compressive sensing,” SIAM J. Numer. Anal., vol. 49, no. 6, pp. 2543–2563, 2011.
[51] R. Giryes and M. Elad, “Can we allow linear dependencies in the dictionary in the synthesis framework?” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[52] R. Giryes and M. Elad, “Iterative hard thresholding for signal recovery using near optimal projections,” in 10th Int. Conf. on Sampling Theory Appl. (SAMPTA), 2013.
[53] R. Giryes and M. Elad, “OMP with highly coherent dictionaries,” in 10th Int. Conf. on Sampling Theory Appl. (SAMPTA), 2013.
[54] R. Giryes and M. Elad, “RIP-based near-oracle performance guarantees for SP, CoSaMP, and IHT,” IEEE Trans. Signal Process., vol. 60, no. 3, pp. 1465–1468, Mar. 2012.
[55] E. J. Candes, “Modern statistical estimation via oracle inequalities,” Acta Numerica, vol. 15, pp. 257–325, 2006.
[56] J. Savage and K. Chen, “On multigrids for solving a class of improved total variation based staircasing reduction models,” in Image Processing Based on Partial Differential Equations, ser. Mathematics and Visualization, X.-C. Tai, K.-A. Lie, T. Chan, and S. Osher, Eds. Springer Berlin Heidelberg, 2007, pp. 69–94.
[57] D. Mumford and J. Shah, “Optimal approximations by piecewise smooth functions and associated variational problems,” Communications on Pure and Applied Mathematics, vol. 42, no. 5, pp. 577–685, 1989.
[58] L. Ambrosio and V. M. Tortorelli, “Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence,” Communications on Pure and Applied Mathematics, vol. 43, no. 8, pp. 999–1036, 1990.
[59] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[60] D. Jupp, “Approximation to data by splines with free knots,” SIAM Journal on Numerical Analysis, vol. 15, no. 2, pp. 328–343, 1978.
[61] E. J. Candes and D. L. Donoho, “Curvelets – a surprisingly effective nonadaptive representation for objects with edges,” in Curves and Surface Fitting: Saint-Malo 99, A. Cohen, C. Rabut, and L. Schumaker, Eds. Vanderbilt University Press, Nashville, 2000, pp. 105–120.
[62] D. L. Donoho, “Sparse components of images and optimal atomic decompositions,” Constructive Approximation, vol. 17, no. 3, pp. 353–382, 2001.
[63] T. F. Chan and L. Vese, “Active contours without edges,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, Feb. 2001.

[64] T. Han, S. Kay, and T. Huang, “Optimal segmentation of signals and its application to image denoising and boundary feature extraction,” in IEEE International Conference on Image Processing (ICIP), vol. 4, Oct. 2004, pp. 2693–2696.