Source: homepages.vub.ac.be/~ndeligia/pubs/corpcaVideo_v21.pdf


Compressive Online Robust Principal Component Analysis via n-`1 Minimization

Huynh Van Luong, IEEE Member, Nikos Deligiannis, IEEE Member, Jürgen Seiler, IEEE Member, Søren Forchhammer, IEEE Member, and André Kaup, IEEE Fellow

Abstract—This work considers online robust principal component analysis (RPCA) in time-varying decomposition problems such as video foreground-background separation. We propose a compressive online RPCA algorithm that recursively decomposes a sequence of data vectors (e.g., frames) into sparse and low-rank components. Unlike conventional batch RPCA, which processes all the data directly, our approach considers a small set of measurements taken per data vector (frame). Moreover, our algorithm can incorporate prior information from multiple previously decomposed vectors via a proposed n-`1 minimization method. At each time instance, the algorithm recovers the sparse vector by solving the n-`1 minimization problem, which promotes not only the sparsity of the vector but also its correlation with multiple previously recovered sparse vectors, and subsequently updates the low-rank component using incremental singular value decomposition. We also establish theoretical bounds on the number of measurements required to guarantee successful compressive separation under the assumptions of static or slowly-changing low-rank components. We evaluate the proposed algorithm using numerical experiments and online video foreground-background separation experiments. The experimental results show that the proposed method outperforms the existing methods.

Index Terms—Robust PCA, compressed sensing, sparse signal, low-rank model, prior information.

PRINCIPAL component analysis (PCA) [3] forms the crux of dimensionality reduction for high-dimensional data analysis. Robust PCA (RPCA) [4]–[7] has emerged as a counterpart to address the sensitivity of PCA to outliers and structured noise. A popular approach to RPCA decomposes a data matrix M ∈ Rn×t into the sum of a sparse component X and a low-rank component L by solving the principal component pursuit (PCP) [5] problem:

min_{L,X} ‖L‖∗ + λ‖vec(X)‖1  s.t.  M = L + X,  (1)

where ‖L‖∗ = Σ_i σ_i(L) is the nuclear norm (the sum of the singular values σ_i(L)) of the matrix L, ‖vec(X)‖1 = Σ_{i,j} |x_{i,j}| is the `1-norm of X organized in a vector, and λ is a tuning parameter. RPCA has found many applications in computer vision, web data analysis, anomaly detection, and data visualization (see [5], [8] and references therein). In video analysis,

Parts of this work have been presented at the IEEE ICIP 2016 [1] and the IEEE GlobalSIP 2017 [2].

H. V. Luong, J. Seiler, and A. Kaup are with the Chair of Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany (e-mail: [email protected], [email protected], [email protected]).

N. Deligiannis is with the Department of Electronics and Informatics, Vrije Universiteit Brussel, 1050 Brussels, Belgium, and with imec, Kapeldreef 75, 3001 Leuven, Belgium (e-mail: [email protected]).

S. Forchhammer is with the Department of Photonics Engineering, Technical University of Denmark, 2800 Lyngby, Denmark (e-mail: [email protected]).

in particular, a sequence of vectorized frames (modeled by M) is separated into the slowly-changing background L and the sparse foreground X. Batch RPCA methods [4]–[7] typically assume that the low-dimensional subspace where the data lives is static; namely, the L component does not vary across the data samples. This assumption is typically invalid; even in static-camera video sequences, for example, the background might change due to illumination variations, moving water, or leaves. In addition, these methods process all the data directly, leading to high computation and memory demands when high-dimensional streaming data such as video is analyzed.

In order to address these shortcomings, online RPCA methods [9]–[17] perform the decomposition recursively, i.e., per data vector (a.k.a., column-vector in M). As such, online RPCA methods can process the incoming data vectors on-the-fly, without needing to store them in memory for computation at a later instance (as batch methods do). Several approaches to online RPCA have been proposed, including: (i) re-weighting each incoming data vector with a pre-defined influence function of its residual to the current estimated subspace [9]; (ii) performing matrix factorization of the low-rank component [10]; (iii) leveraging an exponentially-weighted least squares estimator [11]; (iv) assuming slow variation of the low-rank component and leveraging compressed sensing (CS) [12], [13] to recover the sparse component [14], [15]; and (v) applying alternating minimization to solve the PCP problem [16] and extending it to its incremental form [17]. Moreover, the online RPCA methods in [15], [18], [19] can operate on compressive measurements (a.k.a., subsampled or incomplete data vectors), thereby further reducing the associated computation, communication, and storage costs.

Problem definition. We consider a compressive online RPCA algorithm that recursively decomposes the data samples, observed through low-dimensional measurements, by leveraging information from multiple previously decomposed data samples. Formally, at time instance t, we wish to decompose Mt = Lt + Xt ∈ Rn×t into Xt = [x1 x2 ... xt] and Lt = [v1 v2 ... vt], where [·] denotes a matrix and xt, vt ∈ Rn are column-vectors in Xt and Lt, respectively. We assume that Lt−1 = [v1 v2 ... vt−1] and Xt−1 = [x1 x2 ... xt−1] have been recovered at time instance t − 1 and that at time instance t we have access to compressive measurements of the vector xt + vt; that is, we observe yt = Φ(xt + vt), where Φ ∈ Rm×n (m ≪ n) is a random projection [12]. At time instance t, we formulate the decomposition problem

min_{xt,vt} ‖[Lt−1 vt]‖∗ + λ1‖xt‖1 + λ2 f(xt, Xt−1)  s.t.  yt = Φ(xt + vt),  (2)


where Φ is assumed known, f(·) is a function that expresses the similarity of xt with the previously recovered sparse components Xt−1, and λ1, λ2 are tuning parameters. Problem (2) emerges naturally in scenarios where the data samples have strong temporal correlation; for example, there is high correlation across the backgrounds and the foregrounds in multiple adjacent video frames [20], [21].

Relation to prior work. Unlike batch [4]–[7] or online [9]–[11], [17] RPCA approaches that consume full data samples, operating on compressive measurements (as our method does) improves the computational efficiency and reduces the data communication cost. Examples of compressive batch RPCA solutions include Compressive PCP [22] and SpaRCS [23]. The former solves a modified version of Problem (1) with constraint D = Φ(L + X) using a fast interior-point algorithm [24], while the latter adheres to a greedy approach, combining CoSaMP [25] and ADMiRA [26] for sparse and low-rank recovery, respectively. Other studies have proposed online RPCA methods that consider sub-sampled data: GRASTA [18] alternates between an ADMM (alternating direction method of multipliers [27]) step to update the sparse component and an incremental gradient descent method on the Grassmannian manifold to update the low-rank component. GOSUS [28] improves over GRASTA by accounting for structured sparsity in the foreground, and ROSETA [29] by updating the subspace using a proximal point iteration.

Recently, in the context of foreground-background separation, advanced factorization methods that achieve robust background subtraction have been proposed [30]–[32]. A low-rank matrix factorization method with a general noise distribution model was developed in [30]. Alternatively, a Gaussian mixture model that is updated frame-by-frame and regularized by the previous foreground-background frames was developed in [31]. The method in [32] studied background subtraction from compressive measurements by proposing a tensor RPCA to exploit the spatial-temporal correlation in the background and the continuity of the foreground. The study in [19] presented an online compressive RPCA method that is supported by performance guarantees. However, these methods, as well as the methods in [18], [28], [29], do not explore the correlation across adjacent sparse components, as we propose in Problem (2).

The problem of reconstructing a sequence of time-varying sparse signals using prior information has been explored in online RPCA [15] and in recursive CS [33]–[35] (see [36] for a review). The study in [37] extended the ReProCS [14], [15] algorithm to the compressive case and used modified-CS [38] to leverage prior support knowledge under the condition of slowly-varying support. However, no explicit condition on the number of measurements required for successful recovery was reported. Alternatively, the studies in [33]–[35], [39] assumed the low-rank vector to be non-varying (v = v1 = · · · = vt−1 = vt) and recovered the sparse component using `1 [35], [39] or `1-`1 [33], [34] minimization. The studies in [33]–[35] also provide a method to compute the number of compressive measurements needed to recover xt per time instance. However, the methods in [34], [35], [37] do not exploit prior information from multiple previously recovered frames.

Contributions. We propose CORPCA, a compressive online RPCA algorithm with multiple prior information that solves Problem (2). At each time instance, the algorithm recovers the sparse vector by solving an n-`1 minimization problem that promotes the sparsity of the vector and its correlation with multiple previously recovered sparse vectors. To this end, we use a reconstruction algorithm with multiple side information using adaptive weights (RAMSIA) [1], our recent algorithm for compressed sensing with multiple prior information signals. Moreover, CORPCA updates the low-rank component via an incremental SVD [40] method, assuming that it varies slowly. Furthermore, we establish theoretical bounds on the number of measurements required by the proposed n-`1 minimization, which serve as performance guarantees for CORPCA to obtain a successful decomposition. We evaluate CORPCA using synthetic data as well as video data for foreground-background separation. The results show that the performance of CORPCA follows our theoretical analysis and surpasses that of existing methods, namely RPCA [5], GRASTA [18], and ReProCS [37].

Outline. The paper is organized as follows. Section I presents the background of the work, and Section II establishes our proposal for the function f(·) in (2) and elaborates on our algorithm to solve the n-`1 minimization problem. Section III gives the full description of the CORPCA method and derives its performance guarantees. Section IV describes our experimental results, and Section V concludes the paper.

I. BACKGROUND

We first review fundamental problems related to our work, namely, sparse signal recovery via `1 [12], [13] and `1-`1 minimization [41]–[43], and PCP [5]. Let x ∈ Rn denote a finite-length, sparse or compressible signal for which we have access to low-dimensional, linear measurements y = Φx ∈ Rm, with m ≪ n. The measurements are random projections obtained using a measurement matrix Φ ∈ Rm×n, whose elements are sampled from an i.i.d. Gaussian distribution. According to the compressed sensing (CS) theory [12], [13], x can be recovered with overwhelming probability by solving the following `1 minimization problem:

min_x ‖x‖1  s.t.  y = Φx,  (3)

provided that [44]

m_{`1} ≥ 2 s0 log(n/s0) + (7/5) s0 + 1,  (4)

where s0 := ‖x‖0 = |{i : x_i ≠ 0}| denotes the number of nonzero elements in x, with |·| being the cardinality of a set and ‖·‖0 the `0 pseudo-norm. Problem (3) can be solved via the relaxed problem

min_x {H(x) = f(x) + g(x)},  (5)

where f : Rn → R is a smooth convex function whose gradient ∇f has Lipschitz constant L∇f [45], and g : Rn → R is a continuous, possibly non-smooth, convex function. Problem (3) arises from (5) when g(x) = λ‖x‖1, with λ > 0 being a regularization parameter, and f(x) = (1/2)‖Φx − y‖2^2. By using a proximal gradient method [45], x(k) at iteration k can be iteratively computed as

x(k) = Γ_{(1/L)g}( x(k−1) − (1/L)∇f(x(k−1)) ),  (6)
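With g(x) = λ‖x‖1, the iteration (6) is the classic ISTA: a gradient step on f followed by elementwise soft thresholding. The following minimal numpy sketch illustrates it; the problem sizes, seed, and λ are illustrative assumptions, not values from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(y, Phi, lam=0.01, iters=2000):
    # Proximal gradient iteration (6) for f(x) = 0.5*||Phi x - y||_2^2, g(x) = lam*||x||_1.
    L = np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the gradient of f
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, m, s = 64, 32, 4                        # illustrative sizes: s-sparse x, m measurements
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = ista(Phi @ x_true, Phi)
```

With m comfortably above the bound (4) for these sizes, the iterate converges to a close approximation of the true sparse vector.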


where L ≥ L∇f, and Γ_{(1/L)g}(x) is the proximal operator defined as

Γ_{(1/L)g}(x) = arg min_{v∈Rn} { (1/L) g(v) + (1/2)‖v − x‖2^2 }.  (7)

The performance of CS can be improved by leveraging a signal z correlated with the target signal x, called prior (or side) information, which is available at the reconstruction side [38], [41], [42]. The studies in [41], [42] showed that by solving an `1-`1 minimization problem the performance of CS can be improved dramatically, under the condition that the prior information is of sufficiently good quality [41], [42]. The `1-`1 minimization problem attempts to reconstruct x given a side information signal z ∈ Rn by solving Problem (5) with g(x) = λ(‖x‖1 + ‖x − z‖1), that is,

min_x { (1/2)‖Φx − y‖2^2 + λ(‖x‖1 + ‖x − z‖1) }.  (8)

The studies in [41], [42] derived the following bound on the number of measurements required for Problem (8) to successfully reconstruct x:

m_{`1-`1} ≥ 2h log( n / (s0 + ξ/2) ) + (7/5)(s0 + ξ/2) + 1,  (9)

where ξ and h are quantities, expressing the correlation between the target signal and the prior information, defined as

ξ := |{i : z_i ≠ x_i = 0}| − |{i : z_i = x_i ≠ 0}|,  (10a)
h := |{i : x_i > 0, x_i > z_i} ∪ {i : x_i < 0, x_i < z_i}|,  (10b)

wherein x_i, z_i are the i-th elements of x and z.

The PCP [5] problem defined in (1) subsumes the CS problem. To show this, we follow the formulation in (5) and write Problem (1) as [4]

min_{L,X} {H(L,X) = F(L,X) + G(L,X)},  (11)

where F(L,X) = (1/2)‖M − L − X‖F^2 and G(L,X) = µ‖L‖∗ + µλ‖X‖1, with ‖·‖F denoting the Frobenius norm. Solving (11) using proximal gradient methods [45] gives L(k+1) and X(k+1) at iteration k + 1, iteratively computed via the singular value thresholding operator [46] for L and the soft thresholding operator [45] for X:

L(k+1) = arg min_L { µ‖L‖∗ + ‖L − (L(k) − (1/2)∇_L F(L(k),X(k)))‖F^2 },  (12a)
X(k+1) = arg min_X { µλ‖X‖1 + ‖X − (X(k) − (1/2)∇_X F(L(k),X(k)))‖F^2 }.  (12b)
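The correlation quantities (10) and the measurement bounds (4) and (9) are directly computable from a target signal and a candidate prior; a small numpy sketch with a hypothetical example (the vectors below are not from the paper):

```python
import numpy as np

def corr_quantities(x, z):
    # xi and h from (10a) and (10b).
    xi = np.sum((z != 0) & (x == 0)) - np.sum((z == x) & (x != 0))
    h = np.sum(((x > 0) & (x > z)) | ((x < 0) & (x < z)))
    return int(xi), int(h)

def m_l1(n, s0):
    # Bound (4) for plain l1 minimization.
    return 2 * s0 * np.log(n / s0) + (7 / 5) * s0 + 1

def m_l1_l1(n, s0, xi, h):
    # Bound (9) for l1-l1 minimization with prior z.
    return 2 * h * np.log(n / (s0 + xi / 2)) + (7 / 5) * (s0 + xi / 2) + 1

n, s0 = 1000, 50
x = np.zeros(n); x[:s0] = 1.0          # s0 nonzeros
z = x.copy()                            # a perfect prior
xi, h = corr_quantities(x, z)           # xi = -s0, h = 0
print(m_l1(n, s0), m_l1_l1(n, s0, xi, h))
```

With a perfect prior, (9) collapses to (7/5)(s0/2) + 1, far below (4); a poorly correlated prior drives ξ and h up and can make the bound larger instead.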

II. SPARSE SIGNAL RECOVERY WITH MULTIPLE PRIOR INFORMATION

A. Problem Statement

We consider the problem of efficiently reconstructing a sparse signal given multiple prior signals; specifically, we study how to leverage, and adapt to, the varying correlations among a target sparse signal and multiple known prior signals. Let us consider the application of video reconstruction: Figs. 1(a)-1(c) depict three consecutive frames from the Bootstrap [47] video sequence and Figs. 1(d), 1(f), 1(h) depict the corresponding foregrounds after performing manual

[Fig. 1 graphics: video frames with background/foreground panels (a)–(i); the vectorized-foreground plots show gray level (0–250) versus vectorized pixel index (0 to 2×10^4).]

Fig. 1. Consecutive frames in the Bootstrap video sequence [frames no. 1284, 1285, 1286 in (a), (b), (c), respectively], as well as the depictions [(d), (f), (h)] and vectorized pixel values [(e), (g), (i)] of the corresponding foregrounds.

separation. Each foreground signal is vectorized (in the form of a column-vector in X = [. . . , xt−2, xt−1, xt, . . . ]) and is depicted in Figs. 1(e), 1(g), and 1(i), respectively. It is clear that (i) each foreground signal xt is sparse; and (ii) there is high (temporal) correlation across the consecutive foreground signals. Suppose that we have access to a low-dimensional measurement vector yt = Φxt from the foreground at time instance t (corresponding, for example, to frame no. 1286). Since xt is sparse, we can recover it by solving Problem (3) provided that the dimension of yt adheres to (4) [this was the main concept behind the compressive foreground extraction work in [39], where it was assumed that the background is constant]. Alternatively, we can attempt to recover xt by solving the `1-`1 minimization problem in (8) [41], [42], where a previously recovered signal (for example, xt−1 or xt−2) can play the role of the prior information signal z [this was in turn the idea behind the work in [33], [34], where, as in [39], it was assumed that the background is non-varying]. It might, however, be that the `1-`1 minimization method performs worse than `1 minimization due to poor correlation between the prior signal z and the target signal xt [41], [42]. To address this issue, we propose RAMSIA, a new sparse signal reconstruction algorithm that uses adaptively-learned weights to balance the contributions from multiple prior information signals. RAMSIA formulates an n-`1 minimization problem whose cost function is obtained by replacing the following function in (5):

g(x) = λ( β0‖W0 x‖1 + Σ_{j=1}^{J} βj‖Wj(x − zj)‖1 ),  (13)


where x (a.k.a., xt) is the signal to be recovered, z1, . . . , zJ are J prior information signals (a.k.a., zj corresponds to xt−j, j ∈ {1, . . . , J}), βj > 0 are weights across the prior information vectors, and the element-wise weights, Wj = diag(wj1, wj2, ..., wjn), are assigned to the elements xi − zji of the vector x − zj. By writing the first term in (13) as β0‖W0(x − z0)‖1 with z0 = 0, we can compact the function as

g(x) = λ Σ_{j=0}^{J} βj‖Wj(x − zj)‖1.  (14)

We thus formulate the objective function of the n-`1 problem as

min_x { H(x) = (1/2)‖Φx − y‖2^2 + λ Σ_{j=0}^{J} βj‖Wj(x − zj)‖1 }.  (15)

The use of multiple prior information signals is the main novelty of our algorithm compared to prior studies. Furthermore, the use of weights across the side information vectors as well as across the elements of each vector differentiates RAMSIA from our previous work [48]. We note that (15) boils down to Problem (8) when βj = 1, W0 = W1 = I, and Wj = 0 (j ≥ 2), where I is the n × n identity matrix.
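The stated reduction of (15) to the `1-`1 objective of (8) can be checked numerically; the sketch below evaluates both objectives on random data (the data and λ are hypothetical, and each diagonal Wj is stored as a weight vector):

```python
import numpy as np

def h_nl1(x, y, Phi, Z, beta, W, lam):
    # Objective of the n-l1 problem (15); W[j] holds the diagonal of W_j.
    fid = 0.5 * np.sum((Phi @ x - y) ** 2)
    reg = sum(b * np.sum(np.abs(w * (x - z))) for b, w, z in zip(beta, W, Z))
    return fid + lam * reg

rng = np.random.default_rng(1)
n, m, lam = 20, 10, 0.1
Phi = rng.standard_normal((m, n)); y = rng.standard_normal(m)
x = rng.standard_normal(n); z = rng.standard_normal(n)
# beta_j = 1, W_0 = W_1 = I, W_2 = 0 switches off the third prior, as in the text.
Z = [np.zeros(n), z, rng.standard_normal(n)]
beta = [1.0, 1.0, 1.0]
W = [np.ones(n), np.ones(n), np.zeros(n)]
h_l1l1 = 0.5 * np.sum((Phi @ x - y) ** 2) \
         + lam * (np.sum(np.abs(x)) + np.sum(np.abs(x - z)))
print(h_nl1(x, y, Phi, Z, beta, W, lam), h_l1l1)
```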

B. The proposed RAMSIA Algorithm

We solve the n-`1 minimization problem in (15) by modifying the fast iterative soft thresholding (FISTA) method [45], where, at every iteration k of the algorithm, we update the weights βj and Wj, and also compute x. In this way, we adaptively weight the multiple prior information signals according to their qualities during the iterative process. Different strategies to update the weights are possible depending on the constraints; in this work,¹ we set the constraints as Σ_{i=1}^{n} w_{ji} = n and Σ_{j=0}^{J} βj = 1.

At each iteration, our algorithm proceeds as follows. Firstly, we keep Wj and βj fixed, and we compute x by minimizing Problem (15). Adhering to the proximal gradient method [45], x(k) at iteration k is obtained from (6), where g(x) = λ Σ_{j=0}^{J} βj‖Wj(x − zj)‖1. The proximal operator Γ_{(1/L)g}(x) in (7) for our problem is given by [we restate here the result derived in Proposition A.1 in Appendix A]

Γ_{(1/L)g}(x_i) = { x_i − (λ/L) Σ_{j=0}^{J} βj w_{ji} (−1)^{b(l<j)}   if (17a),
                    z_{li}                                           if (17b),  (16)

with

z_{li} + (λ/L) Σ_{j=0}^{J} βj w_{ji} (−1)^{b(l<j)} < x_i < z_{(l+1)i} + (λ/L) Σ_{j=0}^{J} βj w_{ji} (−1)^{b(l<j)},  (17a)

z_{li} + (λ/L) Σ_{j=0}^{J} βj w_{ji} (−1)^{b(l−1<j)} ≤ x_i ≤ z_{li} + (λ/L) Σ_{j=0}^{J} βj w_{ji} (−1)^{b(l<j)},  (17b)

where, without loss of generality, we have assumed that −∞ = z_{(−1)i} ≤ z_{0i} ≤ z_{1i} ≤ . . . ≤ z_{Ji} ≤ z_{(J+1)i} = ∞, and we have defined a boolean function

b(l < j) = { 1, if l < j; 0, otherwise,  (18)

with l ∈ {−1, . . . , J}. It is worth noting that (17a) and (17b) are disjoint.

¹In [48], we used a different weighting strategy; specifically, Σ_{j=0}^{J} w_{ji} = 1 and βj = 1, ∀j ∈ {0, 1, . . . , J}.

Secondly, given x and βj, we update Wj per prior information zj. In order to improve the reconstruction of x, we aim at assigning each element xi − zji of the vector x − zj a weight wji that is higher when |xi − zji| is low. In this way, we can promote the sparsity of the solution of the n-`1 minimization problem.² Therefore, we set wji = η/(|xi − zji| + ε) for all i ∈ {1, . . . , n}, where a small parameter ε > 0 is added to ensure that zero-valued elements |xi − zji| do not cause a breakdown of the method. Setting the constraint Σ_{i=1}^{n} wji = n, i.e., Σ_{i=1}^{n} (η/(|xi − zji| + ε)) = n, we derive

η = n / Σ_{l=1}^{n} (|x_l − z_{jl}| + ε)^{−1}.  (19)

Then, we get

w_{ji} = n (|x_i − z_{ji}| + ε)^{−1} / Σ_{l=1}^{n} (|x_l − z_{jl}| + ε)^{−1}.  (20)

Thirdly, keeping x and Wj fixed, we compute βj across the prior information vectors; βj should be higher when the corresponding prior zj is closer to x. Similarly to wji, we obtain βj as

βj = η / (‖Wj(x − zj)‖1 + ε).  (21)

Combining (21) with the constraint Σ_{j=0}^{J} βj = 1, similarly to the derivation of wji in (20), we get

βj = (‖Wj(x − zj)‖1 + ε)^{−1} / Σ_{l=0}^{J} (‖Wl(x − zl)‖1 + ε)^{−1}.  (22)
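Both normalizations can be verified directly: (20) makes each row of element weights sum to n, and (22) makes the prior weights sum to one while favoring priors closer to x. A small sketch with one good and one poor hypothetical prior:

```python
import numpy as np

def update_weights(x, Z, eps=1e-6):
    # Element-wise weights (20): w_ji proportional to 1/(|x_i - z_ji| + eps),
    # normalized so that each row sums to n.
    n = Z.shape[1]
    d = 1.0 / (np.abs(x[None, :] - Z) + eps)
    W = n * d / d.sum(axis=1, keepdims=True)
    # Prior weights (22): beta_j proportional to 1/(||W_j (x - z_j)||_1 + eps).
    g = 1.0 / ((W * np.abs(x[None, :] - Z)).sum(axis=1) + eps)
    return W, g / g.sum()

rng = np.random.default_rng(2)
x = rng.standard_normal(8)
Z = np.stack([x + 0.01 * rng.standard_normal(8),   # good prior: close to x
              x + 1.00 * rng.standard_normal(8)])  # poor prior: far from x
W, beta = update_weights(x, Z)
print(W.sum(axis=1), beta)
```

Each row of W sums to n = 8, beta sums to one, and the good prior receives the larger beta, exactly the adaptive behavior described above.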

The steps of RAMSIA³ are summarized in Algorithm 1. Its main difference w.r.t. FISTA [45] is the iterative updating of the weights using (20) and (22). The stopping criterion for Algorithm 1 can be either a maximum number of iterations kmax reached, a small relative variation of the objective function H(x) in (15), or a small change in the number of nonzero components of the estimate x(k) at iteration k. In this work, we use the small relative variation of H(x).

III. THE PROPOSED CORPCA METHOD

We now present our compressive online RPCA (CORPCA) algorithm and derive a bound on the number of measurements that guarantees successful signal decomposition.

²The `1 minimization is a relaxation of the `0 minimization problem for recovering a sparse signal. The `1 norm depends on the magnitude of the nonzero signal coefficients, while the `0 norm only counts the number of nonzero coefficients in the signal. Hence, as proposed in [49], the weights in the re-weighted version of `1 minimization are designed to reduce the impact of the magnitude of the nonzero elements, thereby leading to a solution that better approximates the one obtained with `0 minimization. For further details on re-weighted `1 minimization, we refer to [49].

³The code is available online at https://github.com/huynhlvd/ramsia.


Algorithm 1: The proposed RAMSIA algorithm.
Input: y, Φ, z1, z2, ..., zJ;
Output: x;
// Initialization.
1: u(1) = x(0) = 0; W0 = I; β0 = 1; Wj = 0, βj = 0 (1 ≤ j ≤ J); L = L∇f; λ, ε > 0; ξ1 = 1; k = 0;
2: while stopping criterion is false do
3:   k = k + 1;
     // Solving (15) given the weights.
4:   ∇f(u(k)) = ΦT(Φu(k) − y);
5:   x(k) = Γ_{(1/L)g}( u(k) − (1/L)∇f(u(k)) ), where Γ_{(1/L)g}(·) is given by (16);
     // Computing the updated weights.
6:   w_{ji} = n (|x(k)_i − z_{ji}| + ε)^{−1} / Σ_{l=1}^{n} (|x(k)_l − z_{jl}| + ε)^{−1};
7:   βj = (‖Wj(x(k) − zj)‖1 + ε)^{−1} / Σ_{l=0}^{J} (‖Wl(x(k) − zl)‖1 + ε)^{−1};
     // Updating new values.
8:   ξ_{k+1} = (1 + √(1 + 4ξ_k²)) / 2;
9:   u(k+1) = x(k) + ((ξ_k − 1)/ξ_{k+1}) (x(k) − x(k−1));
10: end
11: return x = x(k);

A. The CORPCA Objective Function

At time instance t, the algorithm receives as input a random projection yt = Φt(xt + vt) ∈ Rmt of a data vector (a.k.a., video frame) and estimates the sparse and low-rank components (xt and vt, respectively) with the aid of prior information gleaned from multiple previously decomposed vectors. The algorithm solves the following optimization problem:

min_{xt,vt} { H(xt, vt) = (1/2)‖Φt(xt + vt) − yt‖2^2 + λµ Σ_{j=0}^{J} βj‖Wj(xt − zj)‖1 + µ‖[Bt−1 vt]‖∗ },  (23)

where λ, µ > 0 are tuning parameters, and Zt−1 := {zj}_{j=1}^{J} and Bt−1 ∈ Rn×d denote a set of vectors and a matrix, respectively, which serve as prior information for xt and vt. The components in Zt−1 and Bt−1 can be a direct (sub-)set of the previously reconstructed data vectors {x1, ..., xt−1} and {v1, ..., vt−1}, or can be formed after applying a processing step. In the case of video data, the processing step can compensate for the motion across the frames, for example, by means of optical flow [50]. Finally, the reconstructed vectors xt and vt are used to update the prior information Zt and Bt, which is used for the decomposition at time instance t + 1.

We note that Problem (23) is obtained by replacing the terms λ1‖xt‖1 + λ2 f(xt, Xt−1) in Problem (2) with the function g(x) in the objective function of RAMSIA [cf. (14)]. It is also worth observing that when vt is non-varying, Problem (23) becomes Problem (15). Furthermore, when xt and vt are batch variables and we do not take the prior information, Zt−1 and Bt−1, and the projection Φt into account, Problem (23) becomes Problem (1).

B. The CORPCA Algorithm

The proposed CORPCA⁴ algorithm operates in two steps: it first solves Problem (23) given Zt−1 and Bt−1, and it then updates the matrices Zt and Bt, which are used at the following time instance.

Solution of Problem (23). Let us denote f(vt, xt) = (1/2)‖Φt(xt + vt) − yt‖²_2, g(xt) = λ Σ_{j=0}^{J} βj‖Wj(xt − zj)‖_1, and h(vt) = ‖[Bt−1 vt]‖_*. As shown in Algorithm 2, CORPCA solves (23) using proximal gradient methods [4], [45]. Specifically, in Lines 3-9 of Algorithm 2, the algorithm computes x(k+1)_t and v(k+1)_t at iteration k + 1 via the soft thresholding operator [45] and the singular value thresholding operator [46], respectively:

x(k+1)_t = arg min_{xt} { µg(xt) + ‖xt − (x(k)_t − (1/2)∇_{xt} f(v(k)_t, x(k)_t))‖²_2 },   (24)

v(k+1)_t = arg min_{vt} { µh(vt) + ‖vt − (v(k)_t − (1/2)∇_{vt} f(v(k)_t, x(k)_t))‖²_2 }.   (25)

The proximal operator Γ_{τg1}(·) in Line 7 of Algorithm 2 is defined as

Γ_{τg1}(X) = arg min_V { τg1(V) + (1/2)‖V − X‖²_F },   (26)

where g1(·) = ‖·‖_1. The weights Wj and βj are updated per iteration of the algorithm (see Lines 10-11). As suggested in [4], the convergence of the convex problem (23) in Line 2 of Algorithm 2 is determined by evaluating the criterion ‖∂H(xt, vt)|_{x(k+1)_t, v(k+1)_t}‖²_2 < 2 × 10⁻⁷ ‖(x(k+1)_t, v(k+1)_t)‖²_2.
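Both updates reduce to simple proximal maps. The following NumPy sketch (function names are ours) shows the two building blocks: entrywise soft thresholding, i.e., the proximal operator of τ‖·‖1 in (26), and singular value thresholding, which applies it to the spectrum:

```python
import numpy as np

def prox_l1(X, tau):
    """Proximal operator of tau*||.||_1 as in (26): entrywise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft-thresholds the singular values of X,
    giving the proximal operator of tau*||.||_* used for the low-rank update."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(prox_l1(s, tau)) @ Vt
```

Shrinking the singular values by τ and discarding those below it is what drives the `[Bt−1 vt]` block toward a low-rank solution.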

Prior Update. Updating Zt and Bt is carried out after each time instance (see Lines 15-16 in Algorithm 2). Due to the correlation between subsequent signals (e.g., video frames in our example), we update the prior information Zt using the J latest recovered sparse components, that is, Zt := {zj = x_{t−J+j}}_{j=1}^{J}. For Bt ∈ R^{n×d}, we consider an adaptive update, which operates on a fixed number d of the columns of Bt. To this end, we use the incremental singular value decomposition (SVD) [40] method [the function incSVD(·) in Line 6 of Algorithm 2]. It is worth noting that the update Bt = Ut Γ_{(µk/2)g1}(Σt) V^T_t causes the dimension of Bt to grow to R^{n×(d+1)} after each time instance. In order to maintain a constant number d of columns, we instead take Bt = Ut(:, 1:d) Γ_{(µk/2)g1}(Σt)(1:d, 1:d) V_t(:, 1:d)^T. The computational cost of incSVD(·) is lower than that of the conventional SVD [17], [40], since we only compute the full SVD of the small middle matrix of size (d+1) × (d+1), where d ≪ n, instead of an n × (d+1) matrix.

The computation of incSVD(·) proceeds as follows. The goal is to compute incSVD([Bt−1 vt]), i.e., [Bt−1 vt] = Ut Σt V^T_t. Taking the SVD of Bt−1 ∈ R^{n×d} gives Bt−1 = Ut−1 Σt−1 V^T_{t−1}. We can therefore derive the components (Ut, Σt, Vt) from (Ut−1, Σt−1, Vt−1) and the vector vt. We write the matrix [Bt−1 vt] as

4The code for CORPCA is available online [51].

Compressive Online Robust Principal Component Analysis — homepages.vub.ac.be/~ndeligia/pubs/corpcaVideo_v21.pdf

Algorithm 2: The proposed CORPCA algorithm.
Input: yt, Φt, Zt−1, Bt−1;
Output: xt, vt, Zt, Bt;
// Initialize variables and parameters.
1  x(−1)_t = x(0)_t = 0; v(−1)_t = v(0)_t = 0; W0 = I; β0 = 1; Wj = 0; βj = 0 (1 ≤ j ≤ J); ξ−1 = ξ0 = 1; µ0 = 0; µ > 0; λ > 0; 0 < ε < 1; k = 0; g1(·) = ‖·‖_1;
2  while not converged do
       // Solve Problem (23).
3      v̄(k)_t = v(k)_t + ((ξ_{k−1} − 1)/ξ_k)(v(k)_t − v(k−1)_t);
4      x̄(k)_t = x(k)_t + ((ξ_{k−1} − 1)/ξ_k)(x(k)_t − x(k−1)_t);
5      ∇_{vt} f(v̄(k)_t, x̄(k)_t) = ∇_{xt} f(v̄(k)_t, x̄(k)_t) = Φ^T_t(Φt(v̄(k)_t + x̄(k)_t) − yt);
6      (Ut, Σt, Vt) = incSVD([Bt−1 (v̄(k)_t − (1/2)∇_{vt} f(v̄(k)_t, x̄(k)_t))]);
7      Θt = Ut Γ_{(µk/2)g1}(Σt) V^T_t;
8      v(k+1)_t = Θt(:, end);
9      x(k+1)_t = Γ_{(µk/2)g}(x̄(k)_t − (1/2)∇_{xt} f(v̄(k)_t, x̄(k)_t)), where Γ_{(µk/2)g}(·) is given by (16);
       // Compute the updated weights.
10     w_ji = n(|x(k+1)_ti − z_ji| + ε)^{-1} / Σ_{l=1}^{n} (|x(k+1)_tl − z_jl| + ε)^{-1};
11     β_j = (‖Wj(x(k+1)_t − zj)‖_1 + ε)^{-1} / Σ_{l=0}^{J} (‖Wl(x(k+1)_t − zl)‖_1 + ε)^{-1};
12     ξ_{k+1} = (1 + √(1 + 4ξ_k²))/2; µ_{k+1} = max(εµ_k, µ);
13     k = k + 1;
14 end
// Update prior information.
15 Zt := {zj = x(k+1)_{t−J+j}}_{j=1}^{J};
16 Bt = Ut(:, 1:d) Γ_{(µk/2)g1}(Σt)(1:d, 1:d) V_t(:, 1:d)^T;
17 return xt = x(k+1)_t, vt = v(k+1)_t, Zt, Bt;

[Bt−1 vt] = [Ut−1  δt/‖δt‖2] · [ Σt−1  et ; 0^T  ‖δt‖2 ] · [ V^T_{t−1}  0 ; 0^T  1 ],   (27)

where et = U^T_{t−1} vt and δt = vt − Ut−1 et. Taking the SVD of the middle matrix on the right-hand side of (27), [ Σt−1  et ; 0^T  ‖δt‖2 ] = Ũ Σ̃ Ṽ^T, we finally obtain Ut = [Ut−1  δt/‖δt‖2]·Ũ, Σt = Σ̃, and Vt = [ Vt−1  0 ; 0^T  1 ]·Ṽ.

C. Performance Guarantees

We establish a bound on the minimum number of measurements required to solve the n-ℓ1 minimization in Problem (23). The bound depends on the support of the sparse vector xt to be recovered and on the correlations between xt and the multiple priors zj, expressed via the supports of the differences xt − zj. The bound serves as a performance guarantee on the number of measurements for CORPCA, which iteratively solves the n-ℓ1 minimization problem (23). It is also useful for adaptively determining the compressive rate in recursive separation; namely, at a given time instance, the bound can predict the number of measurements required at the next time instance. In our theoretical analysis, we assume that vt is slowly-changing, incurring a measurement error bounded by σ, i.e., ‖Φt(vt − v̂t)‖2 < σ, where v̂t is the recovered low-rank component. In the rest of the paper, we use s0 to denote the size of the support of the source xt and sj to denote that of each difference vector xt − zj; namely, we can write ‖xt − zj‖0 = sj, where j ∈ {0, . . . , J}.

Theorem III.1. In Problem (23), let mt be the number of measurements required to successfully separate xt (‖xt‖0 = s0) and vt from the compressive measurements yt = Φt(xt + vt)—where the elements of Φt ∈ R^{mt×n} are drawn i.i.d. from a Gaussian distribution—given the prior information Zt−1 := {zj}_{j=1}^{J}, with ‖xt − zj‖0 = sj and zj ∈ R^n, and Bt−1 ∈ R^{n×d}. Let λ_{mt} denote the expected length of a zero-mean, unit-variance mt-dimensional random Gaussian vector and ω(xt) the Gaussian width [44, Definition 3.1].

1) If vt is not changing in time, we can recover xt successfully with probability greater than 1 − exp(−(1/2)[λ_{mt} − ω(xt)]²), provided that

mt ≥ 2α log( n / Σ_{j=0}^{J} βj sj ) + (7/5)( Σ_{j=0}^{J} βj sj ) + 1.   (28)

2) If vt is slowly-changing, that is, Bt−1 and [Bt−1 vt] are low-rank, we obtain x̂t and v̂t by solving the problem in (23). Assuming that the measurement error of vt is bounded as ‖Φt(vt − v̂t)‖2 < σ, we have ‖x̂t − xt‖2 ≤ 2σ/(1 − √ρ), where 0 < ρ < 1, with probability greater than 1 − exp(−(1/2)[λ_{mt} − ω(xt) − (1 − √ρ)√mt]²), provided that

mt ≥ (2α/ρ) log( n / Σ_{j=0}^{J} βj sj ) + (7/(5ρ))( Σ_{j=0}^{J} βj sj ) + 3/(2ρ),   (29)

where, in both (28) and (29)⁵, α = (ε²/η²) Σ_{j=0}^{J} βj Σ_{i=1}^{sj} w²_ji, with ε > 0 and η = min_j { n / Σ_{i=1}^{n} (|x_ti − z_ji| + ε)^{-1} }.

Proof: The proof is given in Appendix B.
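The right-hand sides of (28) and (29) are directly computable from xt and the priors. The sketch below (function name ours) adopts the convention z0 = 0 and reads the inner sum in α as running over the support of xt − zj; both are our assumptions about the intended indexing:

```python
import numpy as np

def measurement_bound(x, Z, eps=0.8, rho=None):
    """Evaluate the bound (28) (rho=None, static v_t) or (29) (0 < rho < 1,
    slowly-changing v_t), computing w_ji, beta_j, eta, and alpha from the
    true x_t and the priors z_j (cf. footnote 5)."""
    n = x.size
    Zall = [np.zeros(n)] + list(Z)                    # z_0 = 0 convention
    invs = [1.0 / (np.abs(x - z) + eps) for z in Zall]
    W = [n * inv / inv.sum() for inv in invs]         # weights w_ji
    d = np.array([1.0 / (np.sum(w * np.abs(x - z)) + eps)
                  for w, z in zip(W, Zall)])
    beta = d / d.sum()                                # weights beta_j
    s = np.array([np.count_nonzero(x - z) for z in Zall])  # support sizes s_j
    eta = min(n / inv.sum() for inv in invs)
    # alpha: inner sum taken over the support of x - z_j (our reading)
    alpha = (eps ** 2 / eta ** 2) * sum(
        b * np.sum(w[np.nonzero(x - z)] ** 2)
        for b, w, z in zip(beta, W, Zall))
    bs = float(beta @ s)
    if rho is None:
        return 2 * alpha * np.log(n / bs) + (7 / 5) * bs + 1.0          # (28)
    return (2 * alpha / rho) * np.log(n / bs) + (7 / (5 * rho)) * bs + 3 / (2 * rho)  # (29)
```

As expected from the theorem, better priors (smaller sj, larger βj on good priors) shrink the bound, and the slowly-changing case (29) requires more measurements than the static case (28).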

IV. EXPERIMENTAL RESULTS

We first assess the performance of RAMSIA (Section IV-A) on sparse signal recovery and the performance of CORPCA (Section IV-B) on compressive signal decomposition using synthetic data. We then assess the performance of CORPCA using real video data (Section IV-C).

A. Performance Evaluation of RAMSIA Using Synthetic Data

1) Experimental Setup: We consider the reconstruction of a sparse vector x given three known prior information vectors z1, z2, z3. We generate x with n = 1000 entries,

⁵ We obtain wji and βj from (20) and (22), respectively, given xt and zj. Meanwhile, wji (Line 10) and βj (Line 11) in Algorithm 2 of CORPCA are iteratively updated based on the current reconstruction x(k+1)_t, which converges to xt upon successful separation.


128 of which are nonzero (‖x‖0 = s0 = 128) and drawn from a standard Gaussian distribution. The prior information signals are generated as zj = x − ςj, j = {1, 2, 3}, where ςj is a 64-sparse vector whose nonzero entries are drawn from a standard Gaussian distribution. The supports of x and ςj coincide in 51 positions. Such prior information differs significantly from x; specifically, ‖x − zj‖2/‖x‖2 ≈ 0.56. We also consider the case where the prior information signals have poorer quality, expressed by larger supports of ςj (specifically, sj = ‖ςj‖0 = ‖x − zj‖0 = 256 and sj = 352). In this case, the number of positions in which the supports of x and ςj coincide is set to 128, leading to a relative distance of ‖x − zj‖2/‖x‖2 ≈ 1.12 for sj = 256. We clarify that, for simplicity, we set all sj, j = {1, 2, 3}, equal.
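The setup just described can be reproduced as follows (a sketch; the function name and the use of NumPy's default generator are ours):

```python
import numpy as np

def make_priors(n=1000, s0=128, sj=64, overlap=51, J=3, seed=0):
    """Generate x and priors z_j = x - varsigma_j as in Section IV-A:
    varsigma_j is sj-sparse with `overlap` of its nonzeros on supp(x)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    supp = rng.choice(n, s0, replace=False)
    x[supp] = rng.standard_normal(s0)
    off_support = np.setdiff1d(np.arange(n), supp)
    Z = []
    for _ in range(J):
        vs = np.zeros(n)
        on = rng.choice(supp, overlap, replace=False)          # on supp(x)
        off = rng.choice(off_support, sj - overlap, replace=False)
        idx = np.concatenate([on, off])
        vs[idx] = rng.standard_normal(sj)
        Z.append(x - vs)
    return x, Z
```

The same routine with sj = 256 or sj = 352 and overlap = 128 produces the poorer-quality priors used in Figs. 2(b), (c).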

2) Performance Evaluation: We compare the reconstruction performance of the proposed RAMSIA algorithm against existing methods based on ℓ1-norm minimization. For a given number of measurements m, we compute the probability of successful recovery Pr(success) as the number of times the signal x is recovered within an error ‖x̂ − x‖2/‖x‖2 ≤ 10⁻², divided by the total number of 100 trials (each trial considers different x, z1, z2, z3, and Φ).
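This success-rate protocol is a plain Monte-Carlo loop; a generic sketch follows (the callables `recover` and `make_trial` are our assumed interface, not part of the paper):

```python
import numpy as np

def pr_success(recover, make_trial, trials=100, tol=1e-2):
    """Monte-Carlo estimate of Pr(success): the fraction of trials in which
    the recovered x_hat is within relative error `tol` of the true x.

    make_trial() -> (x, args); recover(*args) -> x_hat.
    """
    ok = 0
    for _ in range(trials):
        x, args = make_trial()
        x_hat = recover(*args)
        if np.linalg.norm(x_hat - x) / np.linalg.norm(x) <= tol:
            ok += 1
    return ok / trials
```

Any of the solvers compared below (RAMSIA, RAMSI, FISTA, Mod-CS) can be plugged in as `recover`.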

Let RAMSIA-1-ℓ1, RAMSIA-2-ℓ1, and RAMSIA-3-ℓ1 denote the proposed RAMSIA algorithm configured to use one (z1), two (z1, z2), and three (z1, z2, z3) prior information signals, respectively. We set λ = 10⁻⁵ and ε = 0.8 in RAMSIA, as experimentation revealed that these values led to the best performance. Similarly, RAMSI-1-ℓ1, RAMSI-2-ℓ1, and RAMSI-3-ℓ1 are the corresponding three versions of our previous algorithm in [48]. Moreover, let FISTA and FISTA-ℓ1-ℓ1 denote the versions of the FISTA [45] algorithm solving the ℓ1 and the ℓ1-ℓ1 minimization problem [41], [42], respectively (the latter using the signal z1 as prior information). Finally, Mod-CS corresponds to the Modified-CS [52] algorithm. We assess the recovery performance of the aforementioned algorithms by reporting the success rate versus the number of measurements m in Fig. 2. Specifically, Fig. 2(a) and Figs. 2(b), (c) present the results for the experiments corresponding to the good (sj = 64) and the poor (sj = 256 and sj = 352) quality prior information, respectively. It is clear that RAMSIA-3-ℓ1 successfully leverages the correlation of the various prior information signals with the target signal, delivering the highest probability of success among all competing algorithms in all experiments. It is also evident that the performance of RAMSIA increases with the number of prior signals used. As anticipated, the increase in performance is more apparent when the quality of the prior information is high [compare, for example, the results in Fig. 2(a) and Fig. 2(c)]. Importantly, RAMSIA-1-ℓ1 significantly (and systematically) outperforms RAMSI-1-ℓ1 [48], FISTA-ℓ1-ℓ1 [41], [42], and Mod-CS [52], which also use only one prior information signal to aid the reconstruction. Interestingly, Fig. 2(c) reveals that the accuracy of FISTA-ℓ1-ℓ1 and Mod-CS drops below that of the conventional FISTA algorithm; namely, in these algorithms the poor-quality prior information hinders the recovery performance. This does not happen for RAMSIA-1-ℓ1, which always improves over FISTA irrespective of the side information quality. These results highlight the advantages of the proposed algorithm over the state of the art.

Fig. 2. Probability of success in recovering the synthetic signal x ∈ R^1000 (with sparsity s0 = 128) versus the number of measurements m. The reconstruction is aided by prior information signals (zj, j = {1, 2, 3}) with sparsity of the difference vector ςj = x − zj equal to: (a) sj = 64, (b) sj = 256, and (c) sj = 352.

3) Prior-Information Quality Analysis: We further analyze how the prior information quality influences the number of measurements required to successfully reconstruct the signal. Specifically, we keep the sparsity of x constant at s0 = 128 and assess the performance of RAMSIA, our previous RAMSI in [48], and the other algorithms when sj—that is, the sparsity of ςj, j = {1, 2, 3}—varies from 20 to 400 (for simplicity, we set s1 = s2 = s3). In this experiment, we declare successful recovery when Pr(success) ≥ 0.98. Fig. 3 depicts the number of measurements required for successful reconstruction versus the value of sj, j = {1, 2, 3}, for the different algorithms. Evidently, using prior information reduces the number of measurements compared to FISTA. However, at sj = 315, the performance of Modified-CS and FISTA-ℓ1-ℓ1 becomes worse than that of FISTA, thereby illustrating the limitation of the ℓ1-ℓ1 minimization approach. On the contrary, the proposed weighted n-ℓ1 minimization strategy leads to systematically improved performance compared to FISTA.

Fig. 3. The number of measurements required to recover the synthetic signal x ∈ R^1000 (with sparsity s0 = 128) versus the sparsity s1 = s2 = s3 of the difference vectors ςj = x − zj, j = {1, 2, 3}.

B. Performance Evaluation of CORPCA Using Synthetic Data

We evaluate the performance of Algorithm 2 employing the proposed n-ℓ1 minimization method as well as the existing ℓ1 [45] and ℓ1-ℓ1 [41], [42] minimization methods for the sparse recovery step. We denote the three configurations CORPCA-n-ℓ1, CORPCA-ℓ1-ℓ1, and CORPCA-ℓ1, respectively. We also compare CORPCA against RPCA [5], GRASTA [18], and ReProCS [37]. RPCA [5] is a batch method assuming full access to the data, while GRASTA [18] and ReProCS [37] are online methods. For compressive measurements, ReProCS (see Algorithm 4 in [37]) recovers the sparse components, while GRASTA recovers the low-rank components (GRASTA was assessed using full video data for the separation; see the experiments in [18]).

We generate our data as follows. First, we generate the low-rank component as L = UV^T, where U ∈ R^{n×r} and V ∈ R^{(d+q)×r} are random matrices whose entries are drawn from the standard normal distribution. We set n = 500, r = 5 (the rank of L), d = 100 (the number of training vectors), and q = 100 (the number of testing vectors); this yields L = [v1 . . . vd+q]. Secondly, we generate X = [x1 . . . xd+q]. Specifically, at time instance t = 1, we draw x1 ∈ R^n from the standard normal distribution with s0 nonzero elements. Then, we generate a sequence of correlated sparse vectors xt, t = {2, 3, . . . , d + q}, where each xt satisfies ‖xt − xt−1‖0 = s0/2. As this could lead to ‖xt‖0 > s0, we add the constraint ‖xt‖0 ∈ [s0, s0 + 15]: whenever ‖xt‖0 > s0 + 15, we reset xt to ‖xt‖0 = s0 by setting ‖xt‖0 − s0 randomly selected positions to zero. Thirdly, we initialize the prior information; in order to address realistic scenarios, where the sparse and low-rank components are unknown, we use the batch RPCA [5] method to separate the training set M0 = [x1 + v1 . . . xd + vd] so as

to obtain B0 = [v1 . . . vd]. In this experiment, we use three (J = 3) sparse components as prior information and initialize Z0 := {0, 0, 0}.
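The generation of the correlated sparse sequence described above can be sketched as follows (names are ours; in steps where the reset fires, the consecutive difference temporarily exceeds s0/2):

```python
import numpy as np

def sparse_sequence(n=500, s0=50, T=200, seed=0):
    """Correlated sparse vectors x_t: each step redraws s0/2 entries, and
    ||x_t||_0 is kept within [s0, s0 + 15] by the reset rule of Section IV-B."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    x[rng.choice(n, s0, replace=False)] = rng.standard_normal(s0)
    seq = [x.copy()]
    for _ in range(1, T):
        x = seq[-1].copy()
        # perturb s0/2 randomly chosen entries
        idx = rng.choice(n, s0 // 2, replace=False)
        x[idx] = rng.standard_normal(s0 // 2)
        # reset: if the support grew beyond s0 + 15, zero out the excess
        nz = np.flatnonzero(x)
        if nz.size > s0 + 15:
            drop = rng.choice(nz, nz.size - s0, replace=False)
            x[drop] = 0.0
        seq.append(x)
    return seq
```

Splitting such a sequence into d training and q testing columns reproduces the experimental protocol above.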

We then evaluate the CORPCA method (see Section III-B) on the test set of vectors M = [xd+1 + vd+1 . . . xd+q + vd+q]. Specifically, we vary s0 (from 10 to 90) and the measurement rate m/n, and we assess the probability of successful reconstruction Pr(success). In this experiment, we attempt to recover both the sparse component xt and the low-rank component vt for all t = d + 1, d + 2, . . . , d + q. Hence, we assess the probability of success for the sparse component, Prsparse(success), and for the low-rank component, Prlow-rank(success), averaged over the test vectors t = d + 1, d + 2, . . . , d + q. Prsparse(success) [resp. Prlow-rank(success)] is defined as the number of times in which the sparse component xt (resp. the low-rank component vt) is recovered within an error ‖x̂t − xt‖2/‖xt‖2 ≤ 10⁻² (resp. ‖v̂t − vt‖2/‖vt‖2 ≤ 10⁻²), divided by the total of 50 Monte Carlo runs. We note that in Algorithm 2 we set ε = 0.8, λ = 1/√n, and µ = 10⁻³. In addition, we evaluate our bound for the n-ℓ1 minimization method in (29) as well as the corresponding bounds for ℓ1 and ℓ1-ℓ1 minimization⁶. We set the parameter ρ to 0.8/3, 0.6/3, and 0.4/3 for the bounds of n-ℓ1, ℓ1-ℓ1, and ℓ1 minimization, respectively, as experimentation showed that these values yield the most accurate prediction of the actual performance.

The results in Fig. 4 demonstrate the efficiency of the proposed CORPCA algorithm employing the proposed n-ℓ1 minimization method. At specific sparsity levels, we can recover the 500-dimensional data from measurements of much lower dimension [m/n = 0.3 to 0.6; see the white areas in Fig. 4(a)]. It is also clear that the ℓ1 and ℓ1-ℓ1 minimization methods [see Figs. 4(b), 4(c)] require a higher number of measurements, thereby illustrating the benefit of incorporating multiple side information into the problem. As mentioned before, the ReProCS and GRASTA methods focus primarily on recovering the sparse components and the low-rank components from compressive measurements, respectively. Fig. 4(d) shows that the performance of ReProCS is worse than that of CORPCA-n-ℓ1 [see Fig. 4(a)]. Moreover, Fig. 4(e) shows that GRASTA delivers lower low-rank recovery performance than the proposed CORPCA method. It is also worth observing that the measurement bound for recovering the sparse component with CORPCA [red line in Fig. 4(a)] is sharper than the existing bounds for ℓ1-ℓ1 and ℓ1 minimization, depicted in Fig. 4(b) and Fig. 4(c), respectively.

Moreover, we evaluate CORPCA when only subsampled data of the training set M0 is available. To obtain the prior B0, we perform compressive batch RPCA—using the SpaRCS [23] method—on the subsampled version of M0 obtained via a projection Φ ∈ R^{m×n}. The testing set M is generated similarly to the full-training-set case above. We vary s0 from 5 to 90 and set n = 512, r = 5 (the rank of L), the number of training vectors d = 128, and the number of

⁶ The bounds on the number of measurements required by ℓ1 and ℓ1-ℓ1 minimization to successfully recover the sparse component, assuming the low-rank component is slowly-varying (see Theorem III.1), are m_ℓ1 ≥ (2s0/ρ) log(n/s0) + (7/5)(s0/ρ) + 3/(2ρ) [44] and m_ℓ1-ℓ1 ≥ (2h/ρ) log( n/(s0 + ξ/2) ) + (7/(5ρ))(s0 + ξ/2) + 3/(2ρ) [41], [42], respectively.


Fig. 4. Average success probabilities for CORPCA (for recovering xt, vt ∈ R^500), ReProCS [37] (for xt), and GRASTA [18] (for vt). Panels: (a) CORPCA-n-ℓ1, (b) CORPCA-ℓ1-ℓ1, (c) CORPCA-ℓ1, (d) ReProCS [37], (e) GRASTA [18], (f) color scale. Each panel plots the number of nonzeros s0 (10 to 90) versus the measurement rate m/n (0.2 to 1), with color encoding Pr(success) [%]; the measurement bounds mt, m_ℓ1-ℓ1, and m_ℓ1 are overlaid on panels (a), (b), and (c), respectively.

components X_{t−1}, and λ1, λ2 are tuning parameters. Problem (2) emerges naturally in scenarios where the data samples have strong temporal correlation; for example, there is high correlation across the backgrounds and the foregrounds in multiple adjacent video frames [18], [19].

Relation to prior work. Unlike batch [4]–[7] or online [9]–[11], [17] RPCA approaches that consume full data samples, operating on compressive measurements (as our method does) improves the computational efficiency and reduces the data communication cost. Examples of compressive batch RPCA solutions include Compressive PCP [20] and SpaRCS [21]. The former solves a modified version of Problem (1) with the constraint D = Φ(L + X) using a fast interior-point algorithm [22], while the latter adheres to a greedy approach, combining CoSaMP [23] and ADMiRA [24] for sparse and low-rank recovery, respectively. Other studies have proposed online RPCA methods that consider sub-sampled data: GRASTA [25] alternates between an ADMM (alternating direction method of multipliers [26]) step to update the sparse component and an incremental gradient descent method on the Grassmannian manifold to update the low-rank component. GOSUS [27] improves over GRASTA by accounting for structured sparsity in the foreground, and ROSETA [28] by updating the subspace using a proximal point iteration.

Recently, in the context of foreground-background separation, the works in [29]–[31] proposed advanced factorization methods to obtain robust background subtraction. A new low-rank matrix factorization with a general noise distribution model was developed in [29]. Meanwhile, a mixture-of-Gaussians model was developed in [30] that can be updated frame-by-frame and regularized by the previous foreground-background frames. The method in [31] studied background subtraction from compressive measurements by proposing a tensor RPCA that exploits the spatio-temporal correlation in the background and the continuity of the foreground. The study in [32] presented an online compressive RPCA method that is supported by performance guarantees. However, these methods, as well as the methods in [25], [27], [28], do not explore the correlation across adjacent sparse components, as we propose in Problem (2).

The problem of reconstructing a series of time-varying sparse signals using prior information has been explored in online RPCA [15] and in recursive CS [33]–[35] (see [36] for a review). The study in [37] extended the ReProCS [14], [15] algorithm to the compressive case and used modified-CS [38] to leverage prior support knowledge under the condition of slowly-varying support. However, no explicit condition on the number of measurements required for successful recovery was reported. Alternatively, the studies in [33]–[35], [39] assumed the low-rank vector non-varying (v = v1 = · · · = v_{t−1} = v_t) and recovered the sparse component using ℓ1 [35], [39] or ℓ1-ℓ1 [33], [34] minimization. The studies in [33]–[35] also

[Fig. 5 plot area: Pr(success) [%] versus measurement rate m/n for the sparse and low-rank components; panels (a) Full, (b) 70%, and (c) 35%.]

Fig. 5. Average success probabilities for CORPCA (for recovering x_t, v_t ∈ R^512) using (a) a full training set and subsampled training sets of (b) 70% and (c) 35%.

testing vectors q = 128 (since the SpaRCS code⁷ requires dimensions that are a power of 2). The corresponding results are shown in Fig. 5, where m/n = {1, 0.7, 0.35} are tested. We observe that the performance of CORPCA under the subsampled training sets [see Figs. 5(b) and 5(c)] degrades significantly. These degradations are caused by the poor quality of B0 obtained by SpaRCS: in these experiments, SpaRCS does not work well for the given rank r = 5 of L and support sizes s0 > 15 on the subsampled training sets. In realistic scenarios, the rank of the low-rank component and the sparsity degree of the sparse components should be taken into account to avoid obtaining a poor B0 from a subsampled training set.

C. Compressive Video Foreground-Background Separation

We now assess our CORPCA method in the application of compressive video background-foreground separation using real video content and compare it against existing methods, namely, GRASTA [18] and ReProCS [37].

1) Visual Evaluation: We consider several video sequences [47], namely, Bootstrap (rescaled to 60×80 pixels), Curtain (rescaled to 64×80 pixels), Hall (rescaled to 72×88 pixels), and Fountain (rescaled to 64×80 pixels), where Bootstrap and Hall have static backgrounds and Curtain and Fountain have dynamic backgrounds (see Fig. 6). For all methods, we use the first d = 100 frames for training and the subsequent q = 2955 (Bootstrap), q = 2864 (Curtain), q = 3484 (Hall), and q = 423 (Fountain) frames for evaluation. We use three sparse components as prior information: x_{t−1}, x_{t−2}, and x_{t−3}. We consider the immediately previously reconstructed foreground as the first side information signal, i.e., z1 = x_{t−1}. The other two side information signals are formed by applying motion-compensated extrapolation using the three previously reconstructed frames. Specifically, we perform forward optical-flow-based [50], [53] motion estimation from x_{t−2} to x_{t−1} (resp., x_{t−3} to x_{t−1}) and then apply the motion vectors to x_{t−1} to generate z2 (resp., z3).

⁷The code is available at https://www.ece.rice.edu/~aew2/sparcs.html
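The prior-generation step above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a single global translation estimated via FFT-based cross-correlation instead of the dense optical-flow motion estimation of [50], [53], and the function names are hypothetical.

```python
import numpy as np

def estimate_shift(prev, curr):
    """Estimate a global integer translation from prev to curr
    via the peak of the FFT-based circular cross-correlation."""
    f = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    corr = np.real(np.fft.ifft2(f))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # map peaks beyond half the frame size to negative shifts
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return dy, dx

def extrapolate_prior(x_prev2, x_prev1):
    """Form a side-information frame by applying the motion from
    x_prev2 -> x_prev1 once more onto x_prev1 (circular shift)."""
    dy, dx = estimate_shift(x_prev2, x_prev1)
    return np.roll(x_prev1, (dy, dx), axis=(0, 1))

# toy foreground: a bright block moving 2 pixels right per frame
frame = np.zeros((16, 16))
frame[4:8, 2:6] = 1.0
x_tm2 = frame                       # foreground at t-2
x_tm1 = np.roll(frame, 2, axis=1)   # foreground at t-1
z2 = extrapolate_prior(x_tm2, x_tm1)
# the extrapolated prior predicts the block position at time t
assert np.allclose(z2, np.roll(frame, 4, axis=1))
```

With dense motion fields, the same extrapolation idea applies per pixel rather than per frame.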

We first consider the case where we have full access to the video frames (that is, the number of measurements m equals the dimension n of the data); the visual results of the various methods are illustrated in Fig. 6. For all video sequences, CORPCA delivers superior visual results compared with the other methods, which suffer from fewer details in the separated foreground and from noise in the background. Fig. 7 presents the results of CORPCA under various compressive rates m/n. The results show that we can recover the foreground and background even from a small number of measurements; for instance, with m/n = 0.6 for Bootstrap [Fig. 7(a)] and Hall [Fig. 7(c)], and m/n = 0.4 for Curtain [Fig. 7(b)] and Fountain [Fig. 7(d)]. Bootstrap and Hall require more measurements than Curtain and Fountain due to their more complex foregrounds. For comparison, we also illustrate the visual results of the foreground recovered by ReProCS in Fig. 8. It is clear that the foreground images recovered with ReProCS have poorer visual quality compared with CORPCA, even at the high rate m/n = 0.8. The original and the reconstructed video sequences are available online [51].

In addition, we illustrate the results of CORPCA when directly using the reconstructed foregrounds without motion compensation, as well as when considering a different number (J) of priors. Fig. 9 shows the visual results of CORPCA for Bootstrap (frame no. 2213) and Curtain (frame no. 2866) using three priors, z_j = x_{t−j} (1 ≤ j ≤ 3), taken directly from previously reconstructed foregrounds. These results are worse than those of CORPCA with motion compensation (see Fig. 7); in particular, at rates m/n = 0.4 and m/n = 0.2, Figs. 9(a) and 9(b) show poorer foregrounds than the ones depicted in Figs. 7(a) and 7(b), respectively. To demonstrate the influence of the number (J) of foreground priors, Fig. 10 shows the visual results of CORPCA using one and five priors. Setting J = 1 [Fig. 10(a)] delivers worse results than setting J = 5 [Fig. 10(b)] or J = 3 [Fig. 7(a)]. Furthermore, the results using J = 5 are only slightly better than those using J = 3, so a moderate number of priors, J = 3, is sufficient to capture the temporal correlation.

Fig. 6. Visual results when assuming access to the entire data vectors: (a) Bootstrap (frame no. 2213), (b) Curtain (frame no. 2866), (c) Hall (frame no. 1963), and (d) Fountain (frame no. 480).

Fig. 7. Visual performance of CORPCA under different measurement rates: (a) Bootstrap (frame no. 2213), (b) Curtain (frame no. 2866), (c) Hall (frame no. 1963), and (d) Fountain (frame no. 480).

2) Quantitative Results: We now present quantitative results using the receiver operating characteristic (ROC) metric [54] (the true-positive and false-positive rates are defined as in [54]). The ROC results for CORPCA deploying different recovery algorithms, namely n-ℓ1, ℓ1-ℓ1, and ℓ1 minimization, are depicted in Fig. 11 for the Bootstrap and Curtain sequences. It is clear that CORPCA-ℓ1-ℓ1 [Figs. 11(b), 11(e)] and CORPCA-ℓ1 [Figs. 11(c), 11(f)] deliver worse performance than CORPCA-n-ℓ1 [Figs. 11(a), 11(d)]. It is worth observing that CORPCA-n-ℓ1 delivers high performance even at low rates, that is, at m/n = 0.6 for Bootstrap [Fig. 11(a)] and m/n = 0.4 for Curtain [Fig. 11(d)]. Fig. 12 illustrates the ROCs of CORPCA, RPCA, GRASTA, and ReProCS when assuming full data access (i.e., m/n = 1). The results show that CORPCA delivers higher performance than the other methods, especially for the Curtain video sequence [see Fig. 12(b)]. Furthermore, we compare the foreground recovery performance of CORPCA against ReProCS for different compressive measurement rates, m/n = {0.8, 0.6, 0.4, 0.2}. The ROC results in Fig. 13 show that CORPCA achieves relatively high performance with a small number of measurements, whereas the ROC results for ReProCS degrade quickly even at the high compressive measurement rate m/n = 0.8.
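The true-positive and false-positive rates underlying such ROC curves can be computed from foreground masks as in the sketch below. This is a generic illustration under the usual definitions (the exact metrics follow [54]); the threshold sweep and toy data are assumptions for demonstration.

```python
import numpy as np

def roc_point(recovered, ground_truth, threshold):
    """One ROC point: threshold the recovered foreground magnitude
    and compare the resulting binary mask against the ground truth."""
    detected = np.abs(recovered) > threshold
    tp = np.sum(detected & ground_truth)
    fp = np.sum(detected & ~ground_truth)
    tpr = tp / max(np.sum(ground_truth), 1)   # true-positive rate
    fpr = fp / max(np.sum(~ground_truth), 1)  # false-positive rate
    return fpr, tpr

# toy example: 20 of 100 pixels are foreground, recovery is noisy
gt = np.zeros(100, dtype=bool)
gt[:20] = True
rng = np.random.default_rng(0)
rec = np.where(gt, 0.9, 0.1) + 0.05 * rng.standard_normal(100)

# sweep the threshold to trace the ROC curve
curve = [roc_point(rec, gt, t) for t in np.linspace(0.0, 1.0, 11)]
assert all(0 <= f <= 1 and 0 <= t <= 1 for f, t in curve)
```

As the threshold grows the detected set shrinks, so both rates are non-increasing along the sweep.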

Memory and Computational Needs: In the considered video analysis application, batch RPCA [5] needs to load all raw video samples, of size n×p pixels (where n is the number of pixels per frame and p is the number of frames), into memory to perform the decomposition. Batch-based compressive SpaRCS [21] needs to load a reduced amount of video samples, namely, m×p pixels, where m ≪ n is the number of measurements. Both methods have high computational and memory demands, especially when the number of processed video frames increases. Alternatively, recursive methods like GRASTA [18], ReProCS [37], and CORPCA only load one video frame at each time instance and do not incur a processing delay of p frames. The advantage of CORPCA over non-compressive online RPCA, such as incremental PCP [17], is that it accesses subsampled data per frame (namely, m ≪ n instead of n samples), thereby further reducing the associated data storage and communication requirements.

Fig. 8. Foreground recovered using ReProCS [37] under different measurement rates, m/n = {0.8, 0.6, 0.4, 0.2}: (a) Bootstrap (frame no. 2213) and (b) Curtain (frame no. 2866).

Fig. 9. Visual performance of CORPCA for (a) Bootstrap (frame no. 2213) and (b) Curtain (frame no. 2866) using three (J = 3) directly reconstructed foregrounds as priors without motion compensation.

Fig. 10. Visual performance of CORPCA for Bootstrap (frame no. 2213) using different numbers of foreground priors: (a) J = 1 and (b) J = 5.

Fig. 11. ROC for CORPCA deploying different sparse recovery algorithms with different measurement rates: (a) Bootstrap: CORPCA-n-ℓ1, (b) Bootstrap: CORPCA-ℓ1-ℓ1, (c) Bootstrap: CORPCA-ℓ1, (d) Curtain: CORPCA-n-ℓ1, (e) Curtain: CORPCA-ℓ1-ℓ1, and (f) Curtain: CORPCA-ℓ1.

Fig. 12. ROC for the different separation methods with full data (m/n = 1): (a) Bootstrap and (b) Curtain.

TABLE I
COMPUTATIONAL TIMES PER FRAME FOR BOOTSTRAP AND CURTAIN

                          CORPCA   GRASTA [18]   ReProCS [37]
Bootstrap (sec/frame)
  m/n = 1                  0.75     0.003          0.29
  m/n = 0.6                1.72     -              1.39
Curtain (sec/frame)
  m/n = 1                  0.22     0.003          0.13
  m/n = 0.4                1.59     -              0.82

We evaluate the computational time of CORPCA by comparing it against the other online methods, that is, GRASTA [18] and ReProCS [37]. Matlab implementations of these algorithms are executed on a 64-bit desktop with a 3.5 GHz Core i5 CPU and 8 GB of memory, running Windows 7, and the execution times are measured. Table I reports the average time per frame for processing full data (m/n = 1) and subsampled data, with m/n = 0.6 and m/n = 0.4, respectively, from the Bootstrap and Curtain video sequences. The results show that CORPCA requires the highest computational time among the compared methods. This is because (i) CORPCA requires additional computations for incorporating multiple priors in the optimization problem and (ii) the code is not optimized for speed. By using a C++ implementation running on GPUs, CORPCA could be accelerated to process high-resolution videos in real time.

Fig. 13. ROC for CORPCA and ReProCS [37] with compressive measurements, m/n = {0.8, 0.6, 0.4, 0.2}: (a) Bootstrap and (b) Curtain.

V. CONCLUSION

This paper proposed a compressive online robust PCA (CORPCA) algorithm that decomposes a data vector per time instance into a low-rank and a sparse component using low-dimensional measurements. A key feature of CORPCA is that it incorporates multiple prior information in the sparse recovery problem by establishing and solving an n-ℓ1 minimization problem. Moreover, we established theoretical bounds on the number of measurements that guarantee successful recovery by the n-ℓ1 minimization problem. Numerical results have shown the efficiency of CORPCA and its consistency with the theoretical bounds. We have also assessed our method against the state of the art in compressive video foreground-background separation. The results revealed the advantage of incorporating multiple prior information in the problem and demonstrated the superior performance of CORPCA compared with existing methods.

VI. ACKNOWLEDGEMENT

This work was supported by the Humboldt Research Fellowship of the Alexander von Humboldt Foundation, 53173 Bonn, Germany.

APPENDIX A
THE PROXIMAL OPERATOR FOR RAMSIA

Proposition A.1. The proximal operator \Gamma_{\frac{1}{L}g}(x) in (7) for the problem of signal recovery with multiple side information, for which g(x) = \lambda\sum_{j=0}^{J}\beta_j\|W_j(x - z_j)\|_1, is given element-wise by

\Gamma_{\frac{1}{L}g}(x_i) = \begin{cases} x_i - \dfrac{\lambda}{L}\displaystyle\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)} & \text{if (17a)}; \\ z_{li} & \text{if (17b)}. \end{cases} \qquad (30)

Proof: We compute the proximal operator \Gamma_{\frac{1}{L}g}(x) in (7) with g(x) = \lambda\sum_{j=0}^{J}\beta_j\|W_j(x - z_j)\|_1. From (7), \Gamma_{\frac{1}{L}g}(x) is expressed as

\Gamma_{\frac{1}{L}g}(x) = \arg\min_{v\in\mathbb{R}^n}\Big\{\frac{\lambda}{L}\sum_{j=0}^{J}\beta_j\|W_j(v - z_j)\|_1 + \frac{1}{2}\|v - x\|_2^2\Big\}. \qquad (31)

We note that both terms in (31) are separable in v and thus we can minimize each element v_i of v individually as

\Gamma_{\frac{1}{L}g}(x_i) = \arg\min_{v_i\in\mathbb{R}}\Big\{h(v_i) = \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}\,|v_i - z_{ji}| + \frac{1}{2}(v_i - x_i)^2\Big\}. \qquad (32)

We consider \partial h(v_i)/\partial v_i. Without loss of generality, we assume -\infty \le z_{0i} \le z_{1i} \le \dots \le z_{Ji} \le \infty. For convenience, let us denote z_{(-1)i} = -\infty and z_{(J+1)i} = \infty. When v_i is located in one of the intervals, we suppose v_i \in (z_{li}, z_{(l+1)i}) with -1 \le l \le J, where \partial h(v_i) exists. Taking the derivative of h(v_i) in (z_{li}, z_{(l+1)i}) delivers

\frac{\partial h(v_i)}{\partial v_i} = \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}\,\mathrm{sign}(v_i - z_{ji}) + (v_i - x_i), \qquad (33)

where sign(·) is the sign function. From (18), b(·) denotes a boolean function; consequently, sign(v_i − z_{ji}) = (−1)^{b(l<j)}, and using (33) we rewrite:

\frac{\partial h(v_i)}{\partial v_i} = \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)} + (v_i - x_i). \qquad (34)

Setting \partial h(v_i)/\partial v_i = 0 to minimize h(v_i), we derive:

v_i = x_i - \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)}. \qquad (35)

The v_i in (35) is only valid in (z_{li}, z_{(l+1)i}), i.e.,

z_{li} + \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)} < x_i < z_{(l+1)i} + \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)}. \qquad (36)

For the remaining range of x_i, namely, the case in which

z_{li} + \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l-1<j)} \le x_i \le z_{li} + \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}(-1)^{b(l<j)}, \qquad (37)

we prove in the following Lemma A.2 that the minimum of h(v_i) in (32) is obtained at v_i = z_{li}.

Lemma A.2. Given x_i belonging to the intervals represented in (37), h(v_i) in (32) is minimized at v_i = z_{li}.

Proof: We re-express h(v_i) in (32) as

h(v_i) = \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}\big|(v_i - z_{li}) - (z_{ji} - z_{li})\big| + \frac{1}{2}\big((v_i - z_{li}) - (x_i - z_{li})\big)^2. \qquad (38)

Applying the inequality |a − b| ≥ |a| − |b|, where a, b ∈ R, to the first term and expanding the second term in (38), we obtain:

h(v_i) \ge \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}|v_i - z_{li}| - \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}|z_{ji} - z_{li}| + \frac{1}{2}(v_i - z_{li})^2 - (v_i - z_{li})(x_i - z_{li}) + \frac{1}{2}(x_i - z_{li})^2. \qquad (39)

It can be noted that −(v_i − z_{li})(x_i − z_{li}) ≥ −|v_i − z_{li}||x_i − z_{li}|. Thus the inequality in (39) yields:

h(v_i) \ge |v_i - z_{li}|\frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji} - |v_i - z_{li}||x_i - z_{li}| + \frac{1}{2}(v_i - z_{li})^2 - \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}|z_{ji} - z_{li}| + \frac{1}{2}(x_i - z_{li})^2. \qquad (40)

We can then derive from the expression in (37) the following:

-\frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji} \le x_i - z_{li} \le \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji} \;\Leftrightarrow\; |x_i - z_{li}| \le \frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji}. \qquad (41)

Eventually, we observe that the part of the right-hand side of inequality (40) that involves v_i is

|v_i - z_{li}|\Big(\frac{\lambda}{L}\sum_{j=0}^{J}\beta_j w_{ji} - |x_i - z_{li}|\Big) + \frac{1}{2}(v_i - z_{li})^2. \qquad (42)

By (41), the expression in (42) is minimized when v_i = z_{li}. Therefore, we deduce that h(v_i) in (32) is minimized at v_i = z_{li}.

In summary, from (35) with the conditions in (36) and (37) and Lemma A.2, we obtain (30), which completes the proof.
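The case analysis of Proposition A.1 can be sanity-checked numerically: the scalar function h(v_i) in (32) is minimized over a fine grid and compared against the closed form. The function name and toy parameters below are illustrative, not from the paper.

```python
import numpy as np

def prox_n_ell1(x, z, beta, w, lam_over_L):
    """Scalar proximal operator of g(v) = (lam/L) * sum_j beta_j w_j |v - z_j|,
    following the case analysis of Proposition A.1 / equation (30)."""
    order = np.argsort(z)
    z, beta, w = z[order], beta[order], w[order]
    J = len(z)
    bounds = np.concatenate(([-np.inf], z, [np.inf]))
    # case (17a): a stationary point inside some interval (z_l, z_{l+1})
    for l in range(-1, J):
        sgn = np.array([1.0 if l >= j else -1.0 for j in range(J)])  # sign(v - z_j)
        v = x - lam_over_L * np.sum(beta * w * sgn)                  # cf. (35)
        if bounds[l + 1] < v < bounds[l + 2]:
            return v
    # case (17b): the minimum sits at one of the kinks z_l (Lemma A.2)
    h = lambda v: lam_over_L * np.sum(beta * w * np.abs(v - z)) + 0.5 * (v - x) ** 2
    return z[np.argmin([h(zl) for zl in z])]

# brute-force verification against a dense grid search
rng = np.random.default_rng(1)
for _ in range(50):
    z = rng.normal(size=3)
    beta = np.ones(3) / 3
    w = rng.uniform(0.5, 2.0, 3)
    x = rng.normal()
    v_star = prox_n_ell1(x, z, beta, w, 0.5)
    grid = np.linspace(-8.0, 8.0, 160001)
    h_grid = 0.5 * np.sum(beta * w * np.abs(grid[:, None] - z), axis=1) \
             + 0.5 * (grid - x) ** 2
    assert abs(v_star - grid[np.argmin(h_grid)]) < 1e-3
```

With a single zero prior (J = 0, z = 0, w = 1) the operator reduces to the familiar soft-thresholding rule.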

APPENDIX B
BOUND ON COMPRESSIVE MEASUREMENTS

Measurement Condition. We use the notations x, v, and y without the index t in the following proof. We restate the measurement condition in convex optimization [44], which is used in the derivation of the measurement bound. The problem in this paper is an instance of (5), where f(x) := \frac{1}{2}\|\Phi x - y\|_2^2 is a smooth convex function and g : \mathbb{R}^n \to \mathbb{R} is a continuous convex function, possibly non-smooth. The subdifferential \partial g(\cdot) [55] of a convex function g(\cdot) at a point x \in \mathbb{R}^n is given by \partial g(x) := \{u \in \mathbb{R}^n : g(y) \ge g(x) + u^{\mathrm T}(y - x) \text{ for all } y \in \mathbb{R}^n\}. Let g \sim N(0, I_n) denote a vector of n independent, zero-mean, unit-variance Gaussian random variables and E_g[\cdot] the expectation with respect to g. The Euclidean distance of g to a convex cone C [44] is defined by

\mathrm{dist}(g, C) := \min_u \{\|g - u\|_2 : u \in C\}. \qquad (43)

Proposition B.1. [44, Corollary 3.3, Proposition 3.6] Let \Phi \in \mathbb{R}^{m\times n} be a random projection whose elements are drawn from an i.i.d. Gaussian distribution, and let \lambda_m and \omega(x) be defined as in Theorem III.1 and [44, Definition 3.1].

1) By observing y = \Phi x \in \mathbb{R}^m and solving (5), x \in \mathbb{R}^n is successfully recovered with probability greater than 1 - \exp(-\frac{1}{2}[\lambda_m - \omega(x)]^2), provided that m \ge U_g + 1.

2) If we observe y = \Phi x + \vartheta with noise \vartheta bounded as \|\vartheta\|_2 < \sigma, let \hat{x} denote any solution of (5) and let 0 < \rho < 1. Then \|\hat{x} - x\|_2 < \frac{2\sigma}{1 - \sqrt{\rho}} holds with probability greater than 1 - \exp(-\frac{1}{2}[\lambda_m - \omega(x) - (1 - \sqrt{\rho})\sqrt{m}]^2), provided that m \ge \frac{U_g + 3/2}{\rho}.

The quantity U_g is calculated, given a convex norm function g : \mathbb{R}^n \to \mathbb{R}, by

U_g = \min_{\tau \ge 0} E_g\big[\mathrm{dist}^2(g, \tau\cdot\partial g(x))\big]. \qquad (44)

Supporting results. Recall that the probability density $\psi(x)$ of the normal distribution $\mathcal{N}(0, 1)$ with zero mean and unit variance is given by
\[
\psi(x) := (1/\sqrt{2\pi})\, e^{-x^2/2}. \tag{45}
\]

We also consider the following inequality [42], which holds for all $x > 1$:
\[
\frac{1 - x^{-1}}{\sqrt{\pi \log x}} \leq \frac{1}{\sqrt{2\pi}} \leq \frac{2}{5}. \tag{46}
\]

Moreover, adhering to the formulation in [42], we use the following inequality with $x > 0$ in our derivations:
\[
\frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} (v - x)^2 e^{-v^2/2}\, dv \leq \frac{\psi(x)}{x}. \tag{47}
\]
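As a quick numerical check (our own sketch, not part of the paper), inequality (47) can be verified by Monte Carlo, since its left-hand side equals $\mathbb{E}[\max(G - x, 0)^2]$ for $G \sim \mathcal{N}(0, 1)$:

```python
import numpy as np

# Monte Carlo sanity check of the tail inequality (47):
#   (1/sqrt(2*pi)) * int_x^inf (v - x)^2 * exp(-v^2/2) dv <= psi(x)/x,  x > 0.
# The left-hand side equals E[max(G - x, 0)^2] for G ~ N(0, 1), which we
# estimate from samples; psi is the standard normal density in (45).

rng = np.random.default_rng(0)
G = rng.standard_normal(2_000_000)

def psi(x):
    """Standard normal density, Eq. (45)."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

for x in [0.5, 1.0, 2.0, 3.0]:
    lhs = np.mean(np.maximum(G - x, 0.0) ** 2)  # Monte Carlo estimate of the integral
    rhs = psi(x) / x
    assert lhs <= rhs, (x, lhs, rhs)
```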

Proof of Theorem III.1: We first derive the bound in (28); the bound in (29) is then obtained as its noisy-case counterpart. Both derivations are based on Proposition B.1.

We first compute the subdifferential $\partial g(x)$ and then the distance $\mathrm{dist}(\cdot)$ between the standard normal vector $g$ and $\tau \cdot \partial g(x)$. An element $u \in \partial g(x)$ is obtained from the separate components of the sum $g(x) = \sum_{j=0}^{J} \beta_j g_j(x)$, where $g_j(x) = \|W_j(x - z_j)\|_1$. As a result, $\partial g(x) = \sum_{j=0}^{J} \beta_j \partial g_j(x)$.

Considering the distance (43) from the standard normal vector $g$ to the subdifferential $\partial g(x)$, the convexity of $\mathrm{dist}^2(g, \cdot)$ relates this distance to the separate distances for the $g_j(x)$ as
\[
\mathrm{dist}^2(g, \tau \cdot \partial g(x)) \leq \sum_{j=0}^{J} \beta_j\, \mathrm{dist}^2(g, \tau \cdot \partial g_j(x)), \tag{48}
\]
where $\sum_{j=0}^{J} \beta_j = 1$. Taking the expectation of (48) gives
\[
\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g(x))] \leq \sum_{j=0}^{J} \beta_j\, \mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g_j(x))]. \tag{49}
\]

1) Distance Expectation: We now consider each component $\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g_j(x))]$, where $g_j(x) = \|W_j(x - z_j)\|_1$. We first consider $x - z_j$. Without loss of generality, we assume that the sparse vector $x - z_j$ has $s_j$ supports, i.e., $\|x - z_j\|_0 = s_j$, and takes the form $x - z_j = (x_1 - z_{j1}, \dots, x_{s_j} - z_{js_j}, 0, \dots, 0)$. The subdifferential $u \in \partial g_j(x)$ of $g_j(x)$ is given by
\[
\begin{cases}
u_i = w_{ji}\,\mathrm{sign}(x_i - z_{ji}), & i = 1, \dots, s_j, \\
|u_i| \leq w_{ji}, & i = s_j + 1, \dots, n.
\end{cases} \tag{50}
\]

Let us consider the weights $w_{ji}$ in Line 10 of Algorithm 2 for a specific $z_j$ and denote $\eta_j = \frac{n}{\sum_{i=1}^{n} (|x_i - z_{ji}| + \varepsilon)^{-1}}$. We can express $w_{ji}$ for the source $x - z_j$ as
\[
w_{ji} =
\begin{cases}
\dfrac{\eta_j}{|x_i - z_{ji}| + \varepsilon}, & i = 1, \dots, s_j, \\[4pt]
\dfrac{\eta_j}{\varepsilon}, & i = s_j + 1, \dots, n.
\end{cases} \tag{51}
\]
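For illustration, the weights (51) can be computed directly from a difference vector $x - z_j$; the function and the test vectors below are our own sketch, not the paper's implementation:

```python
import numpy as np

def reweighting_weights(x, z_j, eps=1e-3):
    """Weights w_ji of Eq. (51) for one prior z_j.

    eta_j = n / sum_i (|x_i - z_ji| + eps)^{-1} normalizes the inverse
    magnitudes; coordinates where x and z_j agree (zero difference) get
    the largest weight eta_j / eps, so the weighted l1 term keeps them
    at zero during recovery.
    """
    d = np.abs(x - z_j) + eps
    eta_j = len(x) / np.sum(1.0 / d)
    return eta_j / d

x   = np.array([0.9, -0.5, 0.0, 0.0])
z_j = np.array([1.0,  0.0, 0.0, 0.0])   # prior sharing part of the support
w = reweighting_weights(x, z_j)
# Off-support coordinates (indices 2 and 3) receive the largest weights.
```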

We can then compute the distance from the standard normal vector $g$ to the subdifferential $\partial g_j(x)$ based on (43) as
\[
\mathrm{dist}^2(g, \tau \cdot \partial g_j(x)) = \sum_{i=1}^{s_j} \big(g_i - \tau w_{ji}\,\mathrm{sign}(x_i - z_{ji})\big)^2 + \sum_{i=s_j+1}^{n} \big(\max(|g_i| - \tau w_{ji}, 0)\big)^2, \tag{52}
\]

where $\max(a, 0)$ returns the maximum of $a \in \mathbb{R}$ and $0$. Further, taking the expectation of (52), we obtain
\[
\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g_j(x))] = s_j + \tau^2 \sum_{i=1}^{s_j} w_{ji}^2 + \sqrt{\frac{2}{\pi}} \sum_{i=s_j+1}^{n} \int_{\tau w_{ji}}^{\infty} (v - \tau w_{ji})^2 e^{-v^2/2}\, dv. \tag{53}
\]

We apply (47) to the third term of (53) to get
\[
\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g_j(x))] \leq s_j + \tau^2 \sum_{i=1}^{s_j} w_{ji}^2 + 2 \sum_{i=s_j+1}^{n} \frac{\psi(\tau w_{ji})}{\tau w_{ji}}. \tag{54}
\]
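As a sanity check (our own sketch; the values of $w_{ji}$, $\tau$, and the signs are illustrative), the closed form (52) agrees with the distance obtained by projecting $g$ coordinate-wise onto the set $\tau \cdot \partial g_j(x)$ described by (50):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s_j, tau = 6, 2, 0.7
w = rng.uniform(0.5, 2.0, n)               # illustrative weights w_ji
sgn = np.array([1.0, -1.0])                # sign(x_i - z_ji) on the support
g = rng.standard_normal(n)

# Closed form (52): on the support, tau * dg_j(x) is a single point;
# off the support, each coordinate ranges over [-tau*w_i, tau*w_i].
closed = np.sum((g[:s_j] - tau * w[:s_j] * sgn) ** 2) \
       + np.sum(np.maximum(np.abs(g[s_j:]) - tau * w[s_j:], 0.0) ** 2)

# Direct computation: the nearest subgradient u is found coordinate-wise
# by clipping g onto the interval allowed by (50), scaled by tau.
u = np.concatenate([tau * w[:s_j] * sgn,
                    np.clip(g[s_j:], -tau * w[s_j:], tau * w[s_j:])])
direct = np.sum((g - u) ** 2)

assert np.isclose(closed, direct)
```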

2) Bound Derivation: Inserting (54) into (49) for all functions $g_j(x)$ gives
\[
\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g(x))] \leq \sum_{j=0}^{J} \beta_j s_j + \tau^2 \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2 + 2 \sum_{j=0}^{J} \beta_j (n - s_j) \frac{\psi(\tau \eta_j/\varepsilon)}{\tau \eta_j/\varepsilon}, \tag{55}
\]

where $w_{ji} = \eta_j/\varepsilon$ from (51) is used in the third term of (55). Let us denote $\eta = \min_j \{\eta_j\}$. It can be shown that the function $\psi(x)/x$ is decreasing for $x > 0$; thus $\frac{\psi(\tau \eta_j/\varepsilon)}{\tau \eta_j/\varepsilon} \leq \frac{\psi(\tau \eta/\varepsilon)}{\tau \eta/\varepsilon}$. We can then write (55) as
\[
\mathbb{E}_g[\mathrm{dist}^2(g, \tau \cdot \partial g(x))] \leq \sum_{j=0}^{J} \beta_j s_j + \tau^2 \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2 + 2\Big(n - \sum_{j=0}^{J} \beta_j s_j\Big) \frac{\psi(\tau \eta/\varepsilon)}{\tau \eta/\varepsilon}. \tag{56}
\]

Inequality (56) holds because $\sum_{j=0}^{J} \beta_j (n - s_j) = n - \sum_{j=0}^{J} \beta_j s_j$. Denoting $s = \sum_{j=0}^{J} \beta_j s_j$, we obtain from (44) and (56)
\[
U_g \leq \min_{\tau \geq 0} \Big\{ s + \tau^2 \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2 + 2(n - s) \frac{\psi(\tau \eta/\varepsilon)}{\tau \eta/\varepsilon} \Big\}. \tag{57}
\]

Inserting (45) into (57) gives
\[
U_g \leq \min_{\tau \geq 0} \Big\{ s + \tau^2 \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2 + (n - s) \frac{2}{\sqrt{2\pi}} \cdot \frac{e^{-\tau^2 \eta^2/(2\varepsilon^2)}}{\tau \eta/\varepsilon} \Big\}. \tag{58}
\]

We can select any parameter $\tau > 0$ in (58) to obtain a valid bound; setting $\tau = (\varepsilon/\eta)\sqrt{2\log(n/s)}$ gives
\[
U_g \leq s + 2\log\Big(\frac{n}{s}\Big) \frac{\varepsilon^2}{\eta^2} \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2 + \frac{s(1 - s/n)}{\sqrt{\pi \log(n/s)}}. \tag{59}
\]

We denote by $\alpha = \frac{\varepsilon^2}{\eta^2} \sum_{j=0}^{J} \beta_j \sum_{i=1}^{s_j} w_{ji}^2$ the factor appearing in the second term of (59). Finally, applying (46) with $x = n/s$ to the last term of (59) gives
\[
U_g \leq 2\alpha \log\Big(\frac{n}{s}\Big) + \frac{7}{5} s. \tag{60}
\]


As a result, from Proposition B.1, we get the bound in (28) as
\[
m_t \geq 2\alpha \log\Big(\frac{n}{s}\Big) + \frac{7}{5} s + 1. \tag{61}
\]

For slowly-changing $v_t$, we assume that the measurement error is bounded as $\|\Phi(v_t - \hat{v}_t)\|_2 < \sigma$, where $\hat{v}_t$ denotes the estimate of $v_t$. Applying Proposition B.1 for the noisy case, we get the bound in (29).
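As a numerical illustration (our own sketch; `measurement_bound` and the test vectors are hypothetical, not part of CORPCA's code), the right-hand side of (61) can be evaluated once the prior differences $x - z_j$ and the weights $\beta_j$ are fixed, using the weights of (51):

```python
import numpy as np

def measurement_bound(n, diffs, beta, eps=1e-3):
    """Right-hand side of (61): 2*alpha*log(n/s) + (7/5)*s + 1.

    diffs : list of difference vectors x - z_j, for j = 0, ..., J
    beta  : convex combination weights beta_j (summing to 1)
    """
    eta = min(n / np.sum(1.0 / (np.abs(d) + eps)) for d in diffs)
    s, alpha = 0.0, 0.0
    for b, d in zip(beta, diffs):
        supp = np.abs(d) > 0
        s += b * np.count_nonzero(supp)          # s = sum_j beta_j * s_j
        eta_j = n / np.sum(1.0 / (np.abs(d) + eps))
        w = eta_j / (np.abs(d[supp]) + eps)      # on-support weights of (51)
        alpha += b * np.sum(w ** 2)
    alpha *= (eps / eta) ** 2                    # alpha of Eq. (59)-(60)
    return 2.0 * alpha * np.log(n / s) + 1.4 * s + 1.0

rng = np.random.default_rng(3)
n = 1000
x = np.zeros(n); x[:20] = rng.standard_normal(20)   # 20-sparse foreground
z_far  = np.zeros(n)                  # uninformative prior: x - z has 20 supports
z_near = x.copy(); z_near[:5] = 0.0   # good prior: x - z has only 5 supports
m_far  = measurement_bound(n, [x - z_far],  [1.0])
m_near = measurement_bound(n, [x - z_near], [1.0])
# A prior close to x (smaller support of x - z_j) yields a smaller bound.
```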

REFERENCES

[1] H. V. Luong, J. Seiler, A. Kaup, and S. Forchhammer, "Sparse signal reconstruction with multiple side information using adaptive weights for multiview sources," in Proc. of IEEE Int. Conf. on Image Process., 2016.

[2] H. V. Luong, N. Deligiannis, J. Seiler, S. Forchhammer, and A. Kaup, "Compressive online robust principal component analysis with multiple prior information," in Proc. of IEEE Global Conf. Signal Inf. Process., 2017.

[3] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1-3, pp. 37–52, 1987.

[4] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization," in Proc. of Advances in Neural Information Processing Systems, 2009.

[5] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM (JACM), vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011.

[6] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.

[7] X. Ding, L. He, and L. Carin, "Bayesian robust principal component analysis," IEEE Trans. Image Process., vol. 20, no. 12, pp. 3419–3430, 2011.

[8] K. Slavakis, G. B. Giannakis, and G. Mateos, "Modeling and optimization for big data analytics: (Statistical) learning tools for our era of data deluge," IEEE Signal Process. Mag., vol. 31, no. 5, pp. 18–31, 2014.

[9] Y. Li, "On incremental and robust subspace learning," Pattern Recognition, vol. 37, no. 7, pp. 1509–1518, 2004.

[10] J. Feng, H. Xu, and S. Yan, "Online robust PCA via stochastic optimization," in Proc. of Advances in Neural Information Processing Systems 26, 2013.

[11] G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier-sparsity regularization," IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5176–5190, 2012.

[12] E. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, 2006.

[13] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[14] C. Qiu and N. Vaswani, "Real-time robust principal components' pursuit," in Proc. of Allerton Conf. Commun., Control, Comput., 2010, pp. 591–598.

[15] C. Qiu, N. Vaswani, B. Lois, and L. Hogben, "Recursive robust PCA or recursive sparse recovery in large but structured noise," IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 5007–5039, 2014.

[16] P. Rodriguez and B. Wohlberg, "Fast principal component pursuit via alternating minimization," in Proc. of IEEE Int. Conf. Image Process., 2013.

[17] ——, "Incremental principal component pursuit for video background modeling," Journal of Mathematical Imaging and Vision, vol. 55, no. 1, pp. 1–18, 2016.

[18] J. He, L. Balzano, and A. Szlam, "Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, June 2012.

[19] P. Pan, J. Feng, L. Chen, and Y. Yang, "Online compressed robust PCA," in Proc. of Int. Joint Conf. on Neural Networks (IJCNN), 2017, pp. 1041–1048.

[20] A. M. Tekalp, Digital Video Processing. Prentice Hall Press, 2015.

[21] N. Deligiannis, A. Munteanu, T. Clerckx, J. Cornelis, and P. Schelkens, "On the side-information dependency of the temporal correlation in Wyner-Ziv video coding," in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Process., 2009.

[22] J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Inf. and Inference, vol. 2, no. 1, pp. 32–68, 2013.

[23] A. E. Waters, A. C. Sankaranarayanan, and R. Baraniuk, "SpaRCS: Recovering low-rank and sparse matrices from compressive measurements," in Proc. of Advances in Neural Information Processing Systems, 2011, pp. 1089–1097.

[24] S. R. Becker, E. J. Candes, and M. C. Grant, "Templates for convex cone problems with applications to sparse signal recovery," Mathematical Programming Computation, vol. 3, no. 3, pp. 165–218, 2011.

[25] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.

[26] K. Lee and Y. Bresler, "ADMiRA: Atomic decomposition for minimum rank approximation," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4402–4416, 2010.

[27] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[28] J. Xu, V. K. Ithapu, L. Mukherjee, J. M. Rehg, and V. Singh, "GOSUS: Grassmannian online subspace updates with structured-sparsity," in Proc. of IEEE International Conference on Computer Vision, Dec. 2013.

[29] H. Mansour and X. Jiang, "A robust online subspace estimation and tracking algorithm," in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Apr. 2015.

[30] X. Cao, Q. Zhao, D. Meng, Y. Chen, and Z. Xu, "Robust low-rank matrix factorization under general mixture noise distributions," IEEE Trans. Image Process., vol. 25, no. 10, pp. 4677–4690, Oct. 2016.

[31] H. Yong, D. Meng, W. Zuo, and L. Zhang, "Robust online matrix factorization for dynamic background subtraction," IEEE Trans. Pattern Anal. Mach. Intell., vol. PP, no. 99, pp. 1–1, Oct. 2017.

[32] W. Cao, Y. Wang, J. Sun, D. Meng, C. Yang, A. Cichocki, and Z. Xu, "Total variation regularized tensor RPCA for background subtraction from compressive measurements," IEEE Trans. Image Process., vol. 25, no. 9, pp. 4075–4090, Sept. 2016.

[33] J. F. Mota, N. Deligiannis, A. Sankaranarayanan, V. Cevher, and M. R. Rodrigues, "Dynamic sparse state estimation using $\ell_1$-$\ell_1$ minimization: Adaptive-rate measurement bounds, algorithms and applications," in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Brisbane, Australia, Apr. 2015.

[34] J. F. Mota, N. Deligiannis, A. C. Sankaranarayanan, V. Cevher, and M. R. Rodrigues, "Adaptive-rate reconstruction of time-varying signals with application in compressive foreground extraction," IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3651–3666, 2016.

[35] G. Warnell, S. Bhattacharya, R. Chellappa, and T. Basar, "Adaptive-rate compressive sensing using side information," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3846–3857, 2015.

[36] N. Vaswani and J. Zhan, "Recursive recovery of sparse signal sequences from compressive measurements: A review," IEEE Trans. Signal Process., vol. 64, no. 13, pp. 3523–3549, 2016.

[37] H. Guo, C. Qiu, and N. Vaswani, "An online algorithm for separating sparse and low-dimensional signal sequences from their sum," IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4284–4297, 2014.

[38] N. Vaswani and W. Lu, "Modified-CS: Modifying compressive sensing for problems with partially known support," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4595–4607, Sep. 2010.

[39] V. Cevher, A. Sankaranarayanan, M. F. Duarte, D. Reddy, R. G. Baraniuk, and R. Chellappa, "Compressive sensing for background subtraction," in Proc. of European Conference on Computer Vision, 2008, pp. 155–168.

[40] M. Brand, "Incremental singular value decomposition of uncertain data with missing values," in Proc. of Europ. Conf. Computer Vision, 2002.

[41] J. F. Mota, N. Deligiannis, and M. R. Rodrigues, "Compressed sensing with side information: Geometrical interpretation and performance bounds," in Proc. of IEEE Global Conf. on Signal and Information Processing, Austin, Texas, USA, Dec. 2014.

[42] J. F. C. Mota, N. Deligiannis, and M. R. D. Rodrigues, "Compressed sensing with prior information: Strategies, geometry, and bounds," IEEE Trans. Inf. Theory, vol. 63, no. 7, pp. 4472–4496, Jul. 2017.

[43] L. Weizman, Y. C. Eldar, and D. B. Bashat, "Compressed sensing for longitudinal MRI: An adaptive-weighted approach," Medical Physics, vol. 42, no. 9, pp. 5195–5207, 2015.

[44] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, 2012.

[45] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.


[46] J.-F. Cai, E. J. Candes, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. on Optimization, vol. 20, no. 4, pp. 1956–1982, Mar. 2010.

[47] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, 2004.

[48] H. V. Luong, J. Seiler, A. Kaup, and S. Forchhammer, "A reconstruction algorithm with multiple side information for distributed compression of sparse sources," in Proc. of Data Compr. Conf., 2016.

[49] E. Candes, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted $\ell_1$ minimization," J. Fourier Anal. Appl., vol. 14, no. 5-6, pp. 877–905, 2008.

[50] T. Brox and J. Malik, "Large displacement optical flow: Descriptor matching in variational motion estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 500–513, 2011.

[51] The MATLAB code of the CORPCA algorithm, the test sequences, and the corresponding outcomes. [Online]. Available: https://github.com/huynhlvd/corpca

[52] N. Vaswani, "Stability (over time) of modified-CS for recursive causal sparse reconstruction," in Proc. of Allerton Conf. Communications, Control, and Computing, Monticello, Illinois, USA, Sep. 2010.

[53] S. Prativadibhayankaram, H. V. Luong, T. H. Le, and A. Kaup, "Compressive online robust principal component analysis with optical flow for video foreground-background separation," in Proc. of the Eighth International Symposium on Information and Communication Technology, Nha Trang City, Viet Nam, Dec. 2017.

[54] M. Dikmen, S. F. Tsai, and T. S. Huang, "Base selection in estimating sparse foreground in video," in Proc. of IEEE Int. Conf. Image Process., 2009.

[55] J.-B. Hiriart-Urruty and C. Lemarechal, Fundamentals of Convex Analysis. Springer, 2004.

Huynh Van Luong (M'09) received the M.Sc. degree in computer engineering from the University of Ulsan, Korea, in 2009, and the Ph.D. degree from the Coding and Visual Communication Group at the Technical University of Denmark, Denmark, in 2013.

He was a postdoctoral researcher with the Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, USA, in 2014. He was awarded a Humboldt Research Fellowship from the Alexander von Humboldt Foundation, Germany, under which he was with the Chair of Multimedia Communications and Signal Processing (LMS), University of Erlangen-Nuremberg, Germany, from 2015 to 2018. He is currently a researcher with the Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel, Belgium.

His research interests include image and video analysis, compressive sensing, distributed source coding, and high-dimensional data analytics using artificial intelligence.

Nikos Deligiannis (S'08-M'10) is an assistant professor with the Electronics and Informatics Department at Vrije Universiteit Brussel (VUB) and a principal investigator in data science at the imec institute in Belgium. He received the Diploma in electrical and computer engineering from the University of Patras, Greece, in 2006, and the Ph.D. (Hons.) in applied sciences from Vrije Universiteit Brussel, Belgium, in 2012. From Oct. 2013 to Feb. 2015, he was a senior researcher at the Department of Electronic and Electrical Engineering at University College London, UK, and a technical consultant on big visual data technologies at the British Academy of Film and Television Arts (BAFTA), UK.

His research interests include big data mining, processing and analysis, signal processing, machine learning, internet-of-things networking, and image/video processing and understanding. He has authored over 100 journal and conference publications, book chapters, and two international patent applications.

He has received the 2017 EURASIP Best PhD Award, the 2013 Scientific Prize IBM-FWO Belgium for Informatics, and the Best Paper Award at the 2011 ACM/IEEE International Conference on Distributed Smart Cameras.

Jürgen Seiler (M'09-SM'16) is a senior scientist and lecturer at the Chair of Multimedia Communications and Signal Processing at the Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, where he also received his doctoral degree in 2011 and his diploma degree in electrical engineering, electronics and information technology in 2006.

He received the dissertation award of the Information Technology Society of the German Electrical Engineering Association as well as the dissertation award of the Staedtler Foundation, both in 2012. In 2007, he received diploma awards from the Institute of Electrical Engineering, Electronics and Information Technology, Erlangen, as well as from the German Electrical Engineering Association. He also received scholarships from the German National Academic Foundation and the Lucent Technologies Foundation. He is the co-recipient of four best paper awards and has authored or co-authored more than 70 technical publications.

From 2012 to 2016, he was a member of the Management Committee of EU COST Action IC1105 3D-ConTourNet, "3D Content Creation, Coding and Transmission over Future Media Networks," for which he also served as coordinator for Short Term Scientific Missions. In 2016, he received an LFUI Guest Professorship at the University of Innsbruck.

His research interests include image and video signal processing, signal reconstruction and coding, signal transforms, and linear systems theory.

Søren Forchhammer (M'04) received the M.S. degree in engineering and the Ph.D. degree from the Technical University of Denmark, Lyngby, in 1984 and 1988, respectively.

Currently, he is a Professor with DTU Fotonik, Technical University of Denmark, where he is Head of the Coding and Visual Communication Group. His main interests include source coding, image and video coding, distributed source coding, distributed video coding, compressive sensing, processing for image displays, two-dimensional information theory, visual communications, and coding for optical communication.

André Kaup (M'96-SM'99-F'13) received the Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering from Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany, in 1989 and 1995, respectively.

He was with the Institute for Communication Engineering, RWTH Aachen University, from 1989 to 1995. He joined the Networks and Multimedia Communications Department, Siemens Corporate Technology, Munich, Germany, in 1995 and became Head of the Mobile Applications and Services Group in 1999. Since 2001 he has been a Full Professor and the Head of the Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany. From 1997 to 2001 he was the Head of the German MPEG delegation. From 2005 to 2007 he was a Vice Speaker of the DFG Collaborative Research Center 603. From 2015 to 2017 he served as Head of the Department of Electrical Engineering and Vice Dean of the Faculty of Engineering. He has authored around 350 journal and conference papers and has over 70 patents granted or pending. His research interests include image and video signal processing and coding, and multimedia communication.

André Kaup is a member of the IEEE Multimedia Signal Processing Technical Committee, a member of the scientific advisory board of the German VDE/ITG, and a Fellow of the IEEE. He served as an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology and was a Guest Editor for the IEEE Journal of Selected Topics in Signal Processing. From 1998 to 2001 he served as an Adjunct Professor with the Technical University of Munich. He was a Siemens Inventor of the Year 1998 and received the 1999 ITG Award. He has received several best paper awards, including the Paul Dan Cristea Special Award from the International Conference on Systems, Signals, and Image Processing in 2013. His group won the Grand Video Compression Challenge at the Picture Coding Symposium 2013, and he received the teaching award of the Faculty of Engineering in 2015.