
Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization

Robin Scheibler, Senior Member, IEEE

Abstract—We propose a new algorithm for blind source separation of convolutive mixtures using independent vector analysis. This is an improvement over the popular auxiliary function based independent vector analysis (AuxIVA) with iterative projection (IP) or iterative source steering (ISS). We introduce iterative projection with adjustment (IPA), whereby we update one demixing filter and jointly adjust all the other sources along its current direction. We implement this scheme as multiplicative updates by a rank-2 perturbation of the identity matrix. Each update involves solving a non-convex minimization problem that we term log-quadratically penalized quadratic minimization (LQPQM), which we believe is of interest beyond this work. We find that the global minimum of an LQPQM can be computed efficiently. In the general case, we show that all its stationary points can be characterized as zeros of a kind of secular equation, reminiscent of modified eigenvalue problems. We further prove that the global minimum corresponds to the largest of these zeros. We propose a simple procedure based on Newton-Raphson, seeded with a good initial point, to compute it efficiently. We validate the performance of the proposed method for blind acoustic source separation via numerical experiments with reverberant speech mixtures. We show that not only is the convergence faster in terms of iterations, but each update is also computationally cheaper. Notably, for four and five sources, AuxIVA with IPA converges more than twice as fast as competing methods.

Index Terms—blind source separation, array signal processing, optimization, non-convex, majorization-minimization

I. INTRODUCTION

BLIND source separation (BSS) deals with decomposing a mixture of signals into its constitutive components without any prior information [1]. It has found prominent application in multichannel audio processing [2], e.g., for the separation of speech [3] and music [4], but also in biomedical signal processing for electrocardiogram [5] and electroencephalogram [6], and in digital communications [7]. For multichannel signals, independent component analysis (ICA) makes BSS possible, requiring only statistical independence of the sources [8]. For convolutive mixtures such as those found in audio processing, the separation can be done in the frequency domain, where convolution becomes pointwise multiplication [9]. In this case, the separation must be performed jointly in all frequency sub-bands in parallel. Without further considerations, this introduces a permutation ambiguity, whereby the order of the extracted sources may differ at each frequency. Independent vector analysis (IVA) solves this problem by assuming a multivariate distribution of the sources over the frequency sub-bands and performing the separation jointly [10], [11], [12]. The source model is used to express

LINE Corporation, Tokyo, Japan (e-mail: [email protected]). The software to reproduce the results of this paper is available at https://github.com/fakufaku/auxiva-ipa.

the likelihood of the input data, which is then maximized to estimate the source signals. This optimization problem is non-convex and has no known closed-form solution. Auxiliary function based IVA (AuxIVA) was proposed as a fast and stable optimization method to solve IVA [13]. It relies on the majorization-minimization (MM) technique [14] and is applicable to super-Gaussian source models. AuxIVA majorizes the IVA cost function with a quadratic surrogate, leading to the so-called hybrid exact-approximate diagonalization problem (HEAD) [15], [16]. Solving the HEAD for more than two sources is still an open problem, and instead AuxIVA performs alternating minimization of the surrogate with respect to the demixing filters of the sources [13]. This approach has been coined iterative projection (IP). A similar solution was also proposed in the context of Gaussian sources [17]. Alternatives to the MM approach have been proposed. For example, proximal splitting allows for a versatile algorithm with a heuristic extension based on masking [18], [19]. Another approach, specialized for two sources, is based on expectation-maximization and a Gaussian mixture model [20].

This paper focuses on the MM approach, which underpins many algorithms with more sophisticated source models. These include source models that are non-negative and low-rank [21], based on a variational auto-encoder [22], based on a deep network [23], or using inter-clique dependence [24]. In addition, algorithms for overdetermined IVA (OverIVA), i.e., when there are more channels than sources, also rely on IP for estimating the demixing matrix [25]. As such, any improvement to the optimization of the surrogate function in AuxIVA directly translates to improvements for all of these algorithms. For two sources, the HEAD problem can be solved by a generalized eigenvalue decomposition [26], and thus globally optimal updates of the surrogate are possible. A similar situation arises for blind extraction of a single source with the fast independent vector extraction algorithm [27], [28]. For three and more sources, iterative projection 2 (IP2) does pairwise updates of two sources at a time, leading to faster convergence [29], [30]. Finally, iterative source steering (ISS) performs a series of rank-1 updates of the demixing matrix, which correspond in fact to alternating updates of the steering vectors [31]. While the convergence of ISS is similar to that of IP, it does not require matrix inversion, unlike IP, and has an overall lower computational complexity. However, when separating three or more sources, IP, IP2, and ISS all fix the other sources while updating one of them. This means that further correction can only happen at the next iteration.

In this work, we propose iterative projection with adjustment (IPA), a joint update of one demixing filter with an extra rank-1 modification of the rest of the demixing matrix. As



opposed to IP, IP2, and ISS, when updating the demixing filter of one source, we simultaneously correct the demixing filters of all other sources accordingly. Intuitively, this allows the algorithm to make progress in the demixing of all sources at every update. Concretely, we adopt a multiplicative update form whereby the current demixing matrix is multiplied by a rank-2 perturbation of the identity matrix. We show that the minimization of the IVA surrogate function with respect to the multiplicative update leads to an optimization problem that we believe is of independent interest. We term this problem log-quadratically penalized quadratic minimization (LQPQM).

Problem 1 (LQPQM). Let $A, C \in \mathbb{C}^{d \times d}$ be Hermitian positive definite and positive semi-definite, respectively, let $b, d \in \mathbb{C}^d$, and let $z \in \mathbb{R}$, $z \geq 0$. Then, the LQPQM problem is
$$\min_{x \in \mathbb{C}^d} \ (x - b)^H A (x - b) - \log\left( (x - d)^H C (x - d) + z \right). \tag{P1}$$

For a sneak peek of what the objective function looks like in two dimensions, skip to Fig. 1. One of the main contributions of this paper is to show that, despite being non-convex, the global minimum of (P1) can be computed efficiently. In the general case, we show that all the stationary points of the objective of (P1) can be characterized as the zeros of a secular equation. Then, we prove that the value of the objective function decreases for increasing values of the zeros, and the global minimum thus corresponds to the largest zero. Furthermore, we find that its location is the only zero of the secular equation larger than the largest generalized eigenvalue of the problem $Cx = \varphi Ax$. Thus, we propose to use the Newton-Raphson root finding algorithm in this interval. We find that a good initial point for the root finding is given by the largest real root of a third order polynomial, with which the procedure converges in just a few iterations. The procedure we propose is reminiscent of other algorithms for problems involving pairs of quadratic forms, such as modified eigenvalue problems [32], [33], [34], generalized trust region subproblems [35], or applications in robust beamforming [36], multi-lateration [37], and direction of arrival estimation [38].

We validate the performance of the proposed method via numerical experiments for the separation of speech mixtures with two to five sources. Compared to competing methods, the proposed algorithm not only converges in fewer iterations, but is also faster overall. For up to three sources, IPA and IP2 are similar, with a slight advantage for the former. For four sources and more, IPA converges more than twice as fast.

The rest of this paper is organized as follows. We cover the background on IVA, MM optimization, and AuxIVA in Section II. Section III describes IPA, the proposed AuxIVA updates, and proves that they are given by the solution to an LQPQM. The procedure to find the global minimum of an LQPQM is stated and proved in Section IV. We evaluate the performance of AuxIVA with IPA updates and compare to IP, ISS, and IP2 in Section V. Section VI concludes this paper.

II. BACKGROUND

We consider the problem of separating $F$ mixtures of $K$ sources, recorded by $M$ sensors,
$$x_{fn} = A_f s_{fn}, \quad n = 1, \dots, N, \tag{1}$$

where $x_{fn} \in \mathbb{C}^M$ and $s_{fn}$ are the measurement and source vectors, respectively, in mixture $f$ and at frame $n$. Here, $A_f \in \mathbb{C}^{M \times K}$ is the mixing matrix whose entry $(A_f)_{mk}$ is the transfer function from source $k$ to sensor $m$. Such parallel mixtures most frequently appear as the result of time-frequency domain processing for the separation of convolutive mixtures, e.g., of audio sources [9]. From here on, we assume that we operate in the determined regime where $M = K$, i.e., the number of sources and sensors is the same. In this case, the separation may be done by finding the $M \times M$ demixing matrices,

$$W_f = \begin{bmatrix} w_{1f} & \cdots & w_{Mf} \end{bmatrix}^H, \quad f = 1, \dots, F, \tag{2}$$
such that an estimate of the sources is
$$y_{fn} = W_f x_{fn}. \tag{3}$$

Thus, row $k$ of $W_f$ contains the demixing filter $w_{kf}^H$ for source $k$, and $y_{fn}$ is the estimated source vector. Finding the matrices
$$\mathcal{W} = \{ W_f : f = 1, \dots, F \} \tag{4}$$

is the object of IVA.

In the rest of the manuscript, we use lower and upper case bold letters for vectors and matrices, respectively. Furthermore, $A^\top$, $A^H$, and $\det(A)$ denote the transpose, conjugate transpose, and determinant of matrix $A$, respectively. The conjugate of a complex scalar $z \in \mathbb{C}$ is denoted $z^*$. Let $v \in \mathbb{C}^d$ be a complex $d$-dimensional vector. The vector $v^*$ contains the conjugated coefficients of $v$. The Euclidean norm of $v$ is $\|v\| = (v^H v)^{1/2}$. Unless specified otherwise, indices $f$, $k$, $m$, and $n$ always take the ranges defined in this section.

A. Independent Vector Analysis

IVA estimates the demixing matrices by maximum likelihood. The observed data are the time-frequency vectors $x_{fn}$, and the parameters to estimate are the demixing matrices $W_f$. For convenience, we also define the vector of source $k$ over frequencies, at frame $n$, as
$$s_{kn} = \begin{bmatrix} s_{k1n} & \cdots & s_{kFn} \end{bmatrix}^\top. \tag{5}$$

The likelihood function is derived on the basis of the two following hypotheses.

Hypothesis 1 (Independence of Sources). The sources are statistically independent, i.e.,
$$s_{kn} \perp s_{k'n'}, \quad \forall k \neq k', \ \forall n, n'. \tag{6}$$

Hypothesis 2 (Source Model). The sources follow a multivariate distribution, i.e.,
$$p(s_{kn}) = \frac{1}{Z} e^{-F(s_{kn})}, \quad \forall k, \tag{7}$$
where $F(s)$ is called the contrast function and $Z$ is a normalizing constant that does not depend on the source.

Let us apply the change of variables $y_{kfn} = w_{kf}^H x_{fn}$ and define $y_{kn}$ similarly to (5),
$$y_{kn} = \begin{bmatrix} y_{k1n} & \cdots & y_{kFn} \end{bmatrix}^\top. \tag{8}$$


By further using independence, the joint distribution of the sources is just their product. Thus, the likelihood of the observations is
$$\mathcal{L}(\mathcal{W}) = \prod_{kn} p(y_{kn}) \prod_f |\det W_f|^{2N}. \tag{9}$$

Finally, IVA estimates the $W_f$ by minimizing the negative log-likelihood, shown here with constant terms omitted,
$$\ell(\mathcal{W}) = \sum_{kn} F(y_{kn}) - 2 \sum_f \log|\det W_f|. \tag{10}$$
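As a concrete illustration of the cost (10), a short numpy sketch is given below; it assumes separated signals stored as an (F, M, N) array and, for simplicity, the Laplace contrast G(r) = r. The function name and array layout are mine, not the paper's.

```python
import numpy as np

def iva_cost(W, Y, contrast=lambda r: r):
    """Negative log-likelihood (10), up to constants (illustrative sketch).

    W: (F, M, M) demixing matrices, Y: (F, M, N) separated signals,
    contrast: the function G applied to per-source norms over frequencies.
    """
    # r_kn = ||y_kn||: norm over the frequency axis for each source/frame
    r = np.linalg.norm(Y, axis=0)                                 # shape (M, N)
    data_term = contrast(r).sum()
    # -2 * sum_f log |det W_f|
    logdet_term = -2.0 * np.sum(np.log(np.abs(np.linalg.det(W))))
    return data_term + logdet_term
```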

The choice of the contrast function and the minimization of the negative log-likelihood have been the object of considerable work [10], [11], [12], [26], [13], [29], [30]. Source models based on spherical super-Gaussian distributions [26], [13], [30] underpin AuxIVA, described in Section II-C. They make it possible to apply the MM optimization technique that we describe next.

B. Majorization-Minimization Optimization

MM optimization is an iterative technique that makes use of a surrogate function that touches the cost function at the current iterate and majorizes it everywhere. Repeatedly minimizing the surrogate also minimizes the original cost function.

Proposition 1. Let $Q(\theta, \bar\theta)$ be a surrogate function such that
$$Q(\theta, \theta) = f(\theta), \quad \text{and} \quad Q(\theta, \bar\theta) \geq f(\theta), \quad \forall \theta, \bar\theta. \tag{11}$$
Given an initial point $\theta_0$, consider the sequence of iterates
$$\theta_t = \arg\min_{\theta} Q(\theta, \theta_{t-1}), \quad t = 1, \dots, T. \tag{12}$$
Then, the cost function is monotonically decreasing on the sequence $\theta_0, \theta_1, \dots, \theta_T$, i.e.,
$$f(\theta_0) \geq f(\theta_1) \geq \dots \geq f(\theta_T). \tag{13}$$

Proof. Applying the properties of the surrogate,
$$f(\theta_{t-1}) = Q(\theta_{t-1}, \theta_{t-1}) \geq \min_\theta Q(\theta, \theta_{t-1}) = Q(\theta_t, \theta_{t-1}) \geq f(\theta_t), \tag{14}$$
where we used, in order, (11) (left), (12), and (11) (right).

Note that the proposition still holds even if the minimization in (12) is replaced by any operation that merely reduces the value of $Q(\theta, \theta_{t-1})$. MM optimization has many desirable properties. It can tackle non-convex and/or non-smooth objectives. Unlike gradient descent, it does not require tuning of a step size. Finally, the derived updates often have an intuitive interpretation. It has been applied to multi-dimensional scaling [39], sparse norm minimization as the popular iteratively reweighted least-squares algorithm [40], sub-sample time delay estimation [41], and direction-of-arrival estimation [38]. For in-depth theory, a general introduction, or more applications in signal processing, see [14], [42], [43].
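To make Proposition 1 concrete, here is a minimal Python sketch of the generic MM loop (12), applied to a toy scalar problem with the standard quadratic majorizer of |x|; the example is illustrative and not taken from the paper.

```python
import numpy as np

def mm_minimize(f, surrogate_argmin, theta0, n_iter=30):
    """Generic MM loop of Proposition 1: repeatedly minimize the surrogate."""
    theta, history = theta0, [f(theta0)]
    for _ in range(n_iter):
        theta = surrogate_argmin(theta)   # step (12)
        history.append(f(theta))          # non-increasing by (13)
    return theta, history

# Toy example (IRLS flavor): f(x) = (x - 1)^2 + lam * |x|,
# with |x| majorized by x^2 / (2 |x0|) + |x0| / 2 (tangent at x0).
lam = 0.5
f = lambda x: (x - 1.0) ** 2 + lam * abs(x)
argmin_Q = lambda x0: 1.0 / (1.0 + lam / (2.0 * abs(x0)))  # closed-form minimizer of Q
x_opt, hist = mm_minimize(f, argmin_Q, theta0=3.0)
assert all(a >= b - 1e-12 for a, b in zip(hist, hist[1:]))  # monotone decrease
```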

Input: Microphone signals $x_{fn} \in \mathbb{C}^M$, $\forall f, n$
Output: Separated signals $y_{fn} \in \mathbb{C}^M$, $\forall f, n$
$W_f \leftarrow I_K$, $\forall f$
$y_{fn} \leftarrow x_{fn}$, $\forall f, n$
for loop $\leftarrow 1$ to max. iterations do
    $r_{kn} \leftarrow \sqrt{\sum_f |y_{kfn}|^2}$, $\forall k, n$
    $V_{kf} \leftarrow \frac{1}{N} \sum_n \frac{G'(r_{kn})}{2 r_{kn}} x_{fn} x_{fn}^H$, $\forall k, f$
    for $f \leftarrow 1$ to $F$ do
        $W_f \leftarrow$ Update($W_f, V_{1f}, \dots, V_{Mf}$)
        $y_{fn} \leftarrow W_f x_{fn}$, $\forall n$

Algorithm 1: AuxIVA. The sub-routine Update performs one of IP, IP2, ISS, or IPA.

C. Auxiliary function based IVA

AuxIVA applies the MM technique to the IVA cost function (10) [13]. This is done by restricting the contrast function to the class of spherical super-Gaussian source models.

Definition 1 (Spherical super-Gaussian contrast function [26]). A spherical super-Gaussian contrast function depends only on the magnitude of the source vector, i.e.,
$$F(s_{kn}) = G(\|s_{kn}\|), \tag{15}$$
and, in addition, $G : \mathbb{R}_+ \to \mathbb{R}$ is a real, continuous, and differentiable function such that $G'(r)/r$ is continuous everywhere and monotonically decreasing for $r > 0$. Function $G'(r)$ is the derivative of $G(r)$.

These contrast functions include the Laplace, time-varying Gauss, Cauchy, and other popular source models [26], [30]. Conveniently, they can be majorized by a quadratic function.

Lemma 1 (Theorem 1 in [26]). Let $G$ be as in Definition 1. Then,
$$G(r) \leq \frac{G'(r_0)}{2 r_0} r^2 + \left( G(r_0) - \frac{r_0}{2} G'(r_0) \right), \tag{16}$$
with equality for $r = r_0$.

Equipped with this inequality, we can form $\ell_2$, a surrogate of (10) such that $\ell(\mathcal{W}) \leq N \ell_2(\mathcal{W}) + \text{constant}$,
$$\ell_2(\mathcal{W}) = \sum_{kf} w_{kf}^H V_{kf} w_{kf} - 2 \sum_f \log|\det W_f|, \tag{17}$$
where
$$V_{kf} = \frac{1}{N} \sum_n \frac{G'(r_{kn})}{2 r_{kn}} x_{fn} x_{fn}^H, \tag{18}$$
and $r_{kn}$ is an auxiliary variable. Conveniently, the surrogate is separable over $f$. In AuxIVA, described in Algorithm 1, the optimization at different frequencies is tied together by $r_{kn}$. Taking $r_{kn} = \|y_{kn}\|$, with $y_{kn}$ from (8), ensures that the surrogate is tangent to the objective, i.e., (11) (left). Interestingly, $r_{kn}$ is the magnitude of the source estimate from the previous iteration. Closed-form minimization of (17) is possible for two sources via the generalized eigenvalue decomposition. However, for more than two sources, it is still an open problem. Instead, a number of strategies that update the parameters alternatingly in a block-coordinate descent fashion have been proposed.
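For illustration, one outer iteration of Algorithm 1 could be sketched in numpy as follows, assuming the Laplace contrast G(r) = r so that G'(r)/(2r) = 1/(2r); `update_demixing` stands for one of the IP, IP2, ISS, or IPA sub-routines, and all names are illustrative.

```python
import numpy as np

def auxiva_iteration(X, W, update_demixing, eps=1e-10):
    """One outer iteration of Algorithm 1 (sketch, Laplace contrast G(r) = r).

    X: (F, M, N) microphone STFT frames, W: (F, M, M) demixing matrices,
    update_demixing(W_f, V_f): one of the IP / IP2 / ISS / IPA sub-routines.
    """
    F, M, N = X.shape
    Y = W @ X                                            # current estimates y_fn
    r = np.maximum(np.linalg.norm(Y, axis=0), eps)       # r_kn = ||y_kn||, shape (M, N)
    weights = 1.0 / (2.0 * r)                            # G'(r_kn) / (2 r_kn) = 1 / (2 r_kn)
    # Weighted covariance matrices (18): V[f, k] = V_kf
    V = np.einsum("kn,fmn,fpn->fkmp", weights, X, X.conj()) / N
    for f in range(F):
        W[f] = update_demixing(W[f], V[f])
    return W, W @ X
```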


One of them is IP [13], [17]. It considers minimization of (17) with respect to only one demixing filter, e.g., $w_{kf}$, keeping everything else fixed. In this case, the following closed-form solution exists,
$$w_{kf} \leftarrow \frac{(W_f V_{kf})^{-1} e_k}{\sqrt{e_k^\top W_f^{-H} V_{kf}^{-1} W_f^{-1} e_k}}. \tag{19}$$
The update is applied for $k = 1, \dots, M$, in order.
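A minimal numpy sketch of the IP update (19) for a single frequency might look as follows; the variable names are mine and `V[k]` stands for $V_{kf}$.

```python
import numpy as np

def ip_update(W, V):
    """Iterative projection (19): update each demixing filter in turn (sketch).

    W: (M, M) demixing matrix of one frequency, rows are w_kf^H.
    V: (M, M, M) array with V[k] the weighted covariance of source k.
    """
    M = W.shape[0]
    for k in range(M):
        w = np.linalg.solve(W @ V[k], np.eye(M)[:, k])   # (W_f V_kf)^{-1} e_k
        w /= np.sqrt(np.real(w.conj() @ V[k] @ w))       # normalize so w^H V_kf w = 1
        W[k, :] = w.conj()                               # row k of W_f is w_kf^H
    return W
```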

IP2 is an improvement over IP whereby (17) is minimized with respect to two demixing filters, e.g., $w_{kf}, w_{mf}$, keeping everything else fixed [29], [30]. First, form $P_{uf} = (W_f V_{uf})^{-1} [e_k \ e_m]$, and let $\tilde{V}_{uf} = P_{uf}^H V_{uf} P_{uf}$, for $u = k, m$. Then, the new demixing filters are given by the generalized eigenvectors of the generalized eigenvalue problem $\tilde{V}_{kf} x = \varphi \tilde{V}_{mf} x$. The update is applied to pairs of sources, e.g., for three sources: $(1, 2), (3, 1), (2, 3)$, etc.

Finally, ISS updates the whole demixing matrix [31],
$$W_f \leftarrow W_f - v_{kf} w_{kf}^H, \tag{20}$$
where the $m$th coefficient of $v_{kf}$ is given by
$$v_{mkf} = \begin{cases} \dfrac{w_{mf}^H V_{mf} w_{kf}}{w_{kf}^H V_{mf} w_{kf}} & \text{if } m \neq k, \\[1ex] 1 - (w_{kf}^H V_{kf} w_{kf})^{-1/2} & \text{if } m = k. \end{cases} \tag{21}$$
This corresponds in fact to an update of the $k$th steering vector. It is performed for $k = 1, \dots, M$, in order, once per iteration.
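Similarly, a sketch of the ISS update (20)-(21) for one frequency, with the same conventions as the IP sketch above:

```python
import numpy as np

def iss_update(W, V):
    """Iterative source steering (20)-(21) for one frequency (illustrative sketch).

    W: (M, M) demixing matrix with rows w_kf^H; V: (M, M, M) with V[m] = V_mf.
    """
    M = W.shape[0]
    for k in range(M):
        w_k = W[k, :].conj()                            # demixing filter w_kf
        v = np.zeros(M, dtype=complex)
        for m in range(M):
            d = w_k.conj() @ V[m] @ w_k                 # w_kf^H V_mf w_kf
            if m != k:
                v[m] = (W[m, :] @ V[m] @ w_k) / d       # w_mf^H V_mf w_kf / d
            else:
                v[m] = 1.0 - 1.0 / np.sqrt(np.real(d))  # 1 - (w_kf^H V_kf w_kf)^{-1/2}
        W = W - np.outer(v, W[k, :])                    # rank-1 update (20)
    return W
```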

III. ITERATIVE PROJECTION WITH ADJUSTMENT

We propose to perform an update that blends IP and ISS. We completely replace the $k$th demixing filter and, jointly, we adjust the values of all the other filters by taking a step aligned with the current estimate of source $k$. This is implemented as the following multiplicative update of the demixing matrix,
$$W \leftarrow T_k(u, q) W, \tag{22}$$
where $u \in \mathbb{C}^M$, $q \in \mathbb{C}^{M-1}$, and
$$T_k(u, q) = I + e_k (u - e_k)^H + E_k q^* e_k^\top, \tag{23}$$
with $E_k$ being the $M \times (M-1)$ matrix containing all canonical basis vectors but the $k$th,
$$E_k = \begin{bmatrix} e_1 & \cdots & e_{k-1} & e_{k+1} & \cdots & e_M \end{bmatrix}. \tag{24}$$
Without loss of generality, we can assume $W = I$, since in (17) it can be absorbed into the weighted covariance matrices $V_{kf}$ and some constant factors. Note that we removed the index $f$ to lighten the notation, and because the optimization of (17) can be carried out separately for different $f$.

Plugging (22) into the IVA surrogate (17), we obtain
$$\ell_2(u, q) = \sum_{m \neq k} (e_m + q_m e_k)^H V_m (e_m + q_m e_k) + u^H V_k u - 2 \log|\det T_k(u, q)|, \tag{25}$$
and we want to find the optimal values of $u$ and $q$, i.e.,
$$u^\star, q^\star = \arg\min_{u \in \mathbb{C}^M,\, q \in \mathbb{C}^{M-1}} \ \ell_2(u, q). \tag{26}$$

Input: $W$, $V_1, \dots, V_M$
Output: Updated matrix $W$
for $k \leftarrow 1$ to $M$ do
    $A \leftarrow \mathrm{diag}(\dots, w_k^H V_m w_k, \dots)$, $m \neq k$
    $b \leftarrow [\cdots \ w_k^H V_m w_m \ \cdots]^\top$, $m \neq k$
    $\bar{V} \leftarrow ((W V_k W^H)^{-1})^*$
    $C \leftarrow E_k^\top \bar{V} E_k$
    $g \leftarrow E_k^\top \bar{V} e_k$
    $z \leftarrow e_k^\top \bar{V} e_k - g^H C^{-1} g$
    $q, \lambda \leftarrow$ LQPQM($A, -A^{-1} b, C, C^{-1} g, z$)
    $u \leftarrow \frac{1}{\sqrt{\lambda}} \bar{V} (e_k - E_k q^*)$
    $W \leftarrow (I + e_k (u^H - e_k^\top) + E_k q^* e_k^\top) W$

Algorithm 2: UpdateIPA: Update sub-routine of AuxIVA implementing IPA.
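For illustration, Algorithm 2 could be sketched in numpy as below; `lqpqm` stands for an LQPQM solver implementing Algorithm 3 (a sketch is given in Section IV), and all names are mine.

```python
import numpy as np

def ipa_update(W, V, lqpqm):
    """IPA update of Algorithm 2 (illustrative sketch).

    W: (M, M) demixing matrix; V: (M, M, M) with V[m] = V_m for this frequency;
    lqpqm(A, b, C, d, z) returns the minimizer q of Problem 1 and the value lambda.
    """
    M = W.shape[0]
    I = np.eye(M)
    for k in range(M):
        others = [m for m in range(M) if m != k]
        E_k, e_k = I[:, others], I[:, k]
        w_k = W[k, :].conj()                                  # current filter w_k
        A = np.diag([np.real(w_k.conj() @ V[m] @ w_k) for m in others])
        b = np.array([w_k.conj() @ V[m] @ W[m, :].conj() for m in others])
        Vb = np.conj(np.linalg.inv(W @ V[k] @ W.conj().T))    # ((W V_k W^H)^{-1})^*
        C = E_k.T @ Vb @ E_k
        g = E_k.T @ Vb @ e_k
        z = np.real(e_k @ Vb @ e_k - g.conj() @ np.linalg.solve(C, g))
        q, lam = lqpqm(A, -np.linalg.solve(A, b), C, np.linalg.solve(C, g), z)
        u = Vb @ (e_k - E_k @ q.conj()) / np.sqrt(lam)
        T = I + np.outer(e_k, u.conj() - e_k) + np.outer(E_k @ q.conj(), e_k)
        W = T @ W                                             # multiplicative update (22)
    return W
```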

Albeit not convex, it turns out that the solution of this optimization problem can be found efficiently. First, we show that a closed-form solution for $u$ as a function of $q$ exists. Then, plugging the expression for $u$ back into the cost function, we find that the optimal $q$ is given by the solution of Problem 1. This is formalized in Theorem 1. An efficient algorithm to solve Problem 1 is described in the following section, and the final procedure is given in Algorithm 2.

Theorem 1. Let $V_1, \dots, V_M$ be $M$ Hermitian positive definite matrices. Then, the solution of (26) is as follows.

1) The optimal vector $u^\star$ is given by
$$u^\star = \frac{V_k^{-1} q_k}{\sqrt{q_k^H V_k^{-1} q_k}} e^{j\theta}, \tag{27}$$
where we defined, for convenience,
$$q_k = e_k - E_k q^*, \tag{28}$$
and $\theta \in [0, 2\pi]$ is an arbitrary phase.

2) The optimal $q^\star$ is the solution to the following instance of Problem 1,
$$\min_{q \in \mathbb{C}^{M-1}} \ (q + A^{-1} b)^H A (q + A^{-1} b) - \log\left( (q - C^{-1} g)^H C (q - C^{-1} g) + z \right), \tag{29}$$
with
$$A = \mathrm{diag}(\dots, e_k^\top V_m e_k, \dots), \quad m \neq k, \tag{30}$$
$$b = \begin{bmatrix} \cdots & e_k^\top V_m e_m & \cdots \end{bmatrix}^\top, \quad m \neq k, \tag{31}$$
$$C = E_k^\top (V_k^{-1})^* E_k, \tag{32}$$
$$g = E_k^\top (V_k^{-1})^* e_k, \tag{33}$$
$$z = e_k^\top (V_k^{-1})^* e_k - g^H C^{-1} g. \tag{34}$$

Proof. We prove the two parts of the theorem in order.

1) Let us take the derivative of (25) with respect to $u^*$,
$$\nabla_{u^*} \ell_2 = V_k u - T_k^{-1}(u, q) e_k. \tag{35}$$


Equating to zero and multiplying by $T_k(u, q)$ from the left, we obtain the following equations,
$$u^H V_k u = 1, \tag{36}$$
$$(E_k^\top + q^* e_k^\top) V_k u = 0. \tag{37}$$
We can find an equation for $u$ by seeing (37) as a null space constraint. Adding the extra equation $e_k^\top V_k u = \eta$, we have
$$(I + E_k q^* e_k^\top) V_k u = \eta e_k, \tag{38}$$
where $\eta \in \mathbb{C}$ is an extra variable. We can use the matrix inversion lemma to give a closed-form solution for $u$ as a function of $q$ and $\eta$,
$$u = \eta V_k^{-1} (I + E_k q^* e_k^\top)^{-1} e_k \tag{39}$$
$$= \eta V_k^{-1} \left( I - \frac{E_k q^* e_k^\top}{1 + e_k^\top E_k q^*} \right) e_k \tag{40}$$
$$= \eta V_k^{-1} (e_k - E_k q^*) = \eta V_k^{-1} q_k, \tag{41}$$
where we used the fact that $e_k^\top E_k q = 0$. Now, we can compute $\eta$ from (36),
$$u^H V_k u = |\eta|^2 q_k^H V_k^{-1} V_k V_k^{-1} q_k = 1, \tag{42}$$
and thus
$$\eta = e^{j\theta} (q_k^H V_k^{-1} q_k)^{-1/2}. \tag{43}$$
Together with (41), this gives (27).

2) The proof of the second part follows from substituting $u^\star$ from (27) into the objective function (25).

a) By (36), the quadratic term in $u$ equals one.

b) Now, we handle the log-determinant part. In Appendix A, we show that
$$\det(T_k) = u^H q_k. \tag{44}$$
Substituting $u^\star$, we further have
$$|(u^\star)^H q_k| = \left| \frac{q_k^H V_k^{-1} q_k}{\sqrt{q_k^H V_k^{-1} q_k}} \right| = \sqrt{q_k^H V_k^{-1} q_k}. \tag{45}$$
Finally, with a little algebra, one can check that
$$q_k^H V_k^{-1} q_k = (q - C^{-1} g)^H C (q - C^{-1} g) + z.$$

c) As shown in Appendix B, the remaining quadratic terms can be transformed into a standard quadratic form as follows,
$$\sum_{m \neq k} (e_m + q_m e_k)^H V_m (e_m + q_m e_k) = (q + A^{-1} b)^H A (q + A^{-1} b) - b^H A^{-1} b + \mathbf{1}^\top c, \tag{46}$$
where $c_m = e_m^\top V_m e_m$, and $\mathbf{1}$ is the all-one vector. Removing the constant terms yields the proof.

IV. LOG-QUADRATICALLY PENALIZED QUADRATIC MINIMIZATION

We will now provide an efficient algorithm to compute the solution of Problem 1. It is instructive to take a look at the landscape of one instance of the 2D problem, as shown in Fig. 1. Let us first give an intuitive and informal description of the problem. The quadratic term of the objective forms the familiar bowl shape, and the log-quadratic term appears as if someone had pinched and pulled up the "fabric" of the cost function at one point. The location of the "pinch", described by the offset vectors $b$ and $d$, as well as the offset $z$ in the log, may create different patterns of stationary points. In the 2D case of Fig. 1, we observe two "bowls", separated by a kind of ridge, which is due to the log-quadratic term. There are in fact only a finite number of stationary points, five in Fig. 1, to be precise. In the rest of this section, we make this intuitive description precise and give a procedure to find the global minimum.

Since $A$ (in Problem 1) is Hermitian positive definite, it has a Cholesky decomposition, which can be inverted. This allows us to consider the following alternative form of LQPQM instead.

Problem 2 (LQPQM alternative form). Let $U \in \mathbb{C}^{d \times d}$ be Hermitian positive semi-definite, and $v \in \mathbb{C}^d$. Then,
$$\min_{y \in \mathbb{C}^d} \ y^H y - \log\left( (y + v)^H U (y + v) + z \right). \tag{P2}$$

The two problems are equivalent, as we explain now.

Proposition 2. Let $G$ be such that $A = G^H G$. Further, let $y^\star$ be the optimum of (P2) with
$$U = G^{-H} C G^{-1}, \quad \text{and} \quad v = G(b - d). \tag{47}$$
Then, the optimum of Problem 1 is
$$x^\star = G^{-1} y^\star + b. \tag{48}$$

Proof. Let $y = G(x - b)$, and substitute in (P1).
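A small numpy/scipy sketch of this change of variables, under the assumption that A is well-conditioned enough for a plain Cholesky factorization; all names are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def to_standard_form(A, b, C, d, z):
    """Reduce Problem 1 to the alternative form (P2) via Proposition 2 (sketch)."""
    G = cholesky(A, lower=False)              # A = G^H G, G upper triangular
    Ginv = solve_triangular(G, np.eye(A.shape[0]))
    U = Ginv.conj().T @ C @ Ginv              # U = G^{-H} C G^{-1}
    v = G @ (b - d)                           # v = G (b - d)
    return U, v, z

def from_standard_form(y_star, A, b):
    """Map the minimizer of (P2) back to Problem 1: x* = G^{-1} y* + b."""
    G = cholesky(A, lower=False)
    return solve_triangular(G, y_star) + b
```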

Proposition 3. The objective function of (P2) is bounded from below and takes its minimum at a finite value.

Proof. See Appendix C.

The next two theorems fully characterize the solutions of Problems 1 and 2. Theorem 2 handles the case when the offset vector $v$ is zero (or $b = d$ in Problem 1). There, the solution can be obtained from the eigendecomposition of $U$. Note that the eigendecomposition of $U$ is equivalent to the generalized eigendecomposition of $A$ and $C$. When $v \neq 0$, the solution can be computed by solving a secular equation, as explained in Theorem 3. An algorithmic instantiation of these two theorems is provided by Algorithm 3.

Theorem 2 (Special Case, $v = 0$). The global minimum of (P2) is characterized as follows. Let $\varphi_1 \leq \dots \leq \varphi_d$ be the eigenvalues of $U$, and $\sigma_1, \dots, \sigma_d$ the corresponding eigenvectors.

1) If $z \geq \varphi_d$, the unique global minimum of (P2) is $y^\star = 0$.
2) If $z < \varphi_d$, the minimum of (P2) is given by
$$y^\star = e^{j\theta} \sqrt{\frac{\varphi_d - z}{\sigma^H U \sigma}} \, \sigma, \tag{49}$$


where $\theta \in [0, 2\pi]$ is an arbitrary phase. If $\varphi_d > \varphi_{d-1}$, the global minimum is unique (up to the phase $\theta$) and given by $\sigma = \sigma_d$. If the largest eigenvalue has multiplicity $k$, then any linear combination $\sigma$ of $\sigma_{d-k}, \dots, \sigma_d$ is a global minimum.

Fig. 1: The loss landscape of an instance of the 2D LQPQM shown as 3D (left) and 2D (right) contour plots. The global minimum is indicated by an × in the right figure.

Theorem 3 (General Case, $v \neq 0$). Let $U = \Sigma \Phi \Sigma^H$ be the eigendecomposition of $U$, with $\Phi = \mathrm{diag}(\varphi_1, \dots, \varphi_d)$, where $\varphi_1 \leq \dots \leq \varphi_d$ are the eigenvalues of $U$. Then, the unique global minimum of (P2) is
$$y^\star = (\lambda^\star I - U)^{-1} U v, \tag{50}$$
where $\lambda^\star$ is the largest root of the function $f : \mathbb{R}_+ \to \mathbb{R}$,
$$f(\lambda) = \lambda^2 \sum_{m \in S} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)^2} - \lambda + z, \tag{51}$$
where $\tilde{v}_m$ are the coefficients of the vector $\tilde{v} = \Sigma^H v$, and $S$ is the common support of $\tilde{v}$ and the eigenvalues,
$$S = \{ m : \varphi_m |\tilde{v}_m|^2 \neq 0 \}. \tag{52}$$
Furthermore, the largest root is the unique root located in the interval $(\max(\varphi_{\max}, z), +\infty)$, where $\varphi_{\max} = \max_{m \in S} \varphi_m$. In this interval, $f(\lambda)$ is strictly decreasing.

Because the optimal $\lambda$ is restricted to an interval where $f(\lambda)$ is strictly decreasing, we may use a root finding algorithm such as Newton-Raphson to compute it efficiently. Initialization and stability aspects of the root finding are covered in Section IV-C. The complete procedure for LQPQM is described in Algorithm 3. Algorithm 4 is the sub-routine that solves the secular equation.

A. Proof of Theorem 2

The special case, $v = 0$, leads to the simpler problem,
$$\min_{y \in \mathbb{C}^d} \ y^H y - \log(y^H U y + z). \tag{53}$$
Equating the gradient to zero, and adding an extra non-negative variable $\lambda \geq 0$, we obtain the following first order necessary optimality conditions,
$$\begin{cases} U y = \lambda y, \\ \lambda = y^H U y + z. \end{cases} \tag{54}$$

Input: $A$, $b$, $C$, $d$, $z$
Output: $x$, $\lambda$, solution to Problem 1
$G \leftarrow$ Cholesky($A$)
$U \leftarrow G^{-H} C G^{-1}$
$\Phi, \Sigma \leftarrow$ EigenValueDecomposition($U$)
if $b = d$ then
    if $z \geq \varphi_d$ then
        $\lambda \leftarrow z$
        $y \leftarrow 0$
    else
        $\lambda \leftarrow \varphi_d$
        $y \leftarrow \sqrt{\frac{\varphi_d - z}{\sigma_d^H U \sigma_d}} \, \sigma_d$
else
    $\tilde{v} \leftarrow \Sigma^H G (b - d)$
    $\mu \leftarrow$ SolveSecularEquation($\Phi / \varphi_{\max}$, $\tilde{v}$, $z / \varphi_{\max}$)
    $\lambda \leftarrow \mu \, \varphi_{\max}$
    $y \leftarrow \Sigma (\lambda I - \Phi)^{-1} \Phi \tilde{v}$
$x \leftarrow G^{-1} y + b$

Algorithm 3: LQPQM
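A numpy version of Algorithm 3 might look as follows; `solve_secular` stands for Algorithm 4 (sketched in Section IV-C), rescaling by the largest eigenvalue is omitted here for brevity, and all names are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigh

def lqpqm(A, b, C, d, z, solve_secular, tol=1e-10):
    """Global minimizer of Problem 1 following Algorithm 3 (sketch)."""
    n = A.shape[0]
    G = cholesky(A, lower=False)                       # A = G^H G
    Ginv = solve_triangular(G, np.eye(n))
    U = Ginv.conj().T @ C @ Ginv
    phi, Sigma = eigh(U)                               # eigenvalues in ascending order
    if np.linalg.norm(b - d) < tol:                    # special case v = 0 (Theorem 2)
        if z >= phi[-1]:
            lam, y = z, np.zeros(n, dtype=complex)
        else:
            s = Sigma[:, -1]
            lam = phi[-1]
            y = np.sqrt((phi[-1] - z) / np.real(s.conj() @ U @ s)) * s
    else:                                              # general case (Theorem 3)
        v = Sigma.conj().T @ (G @ (b - d))
        lam = solve_secular(phi, v, z)                 # optionally rescale as in (82)
        y = Sigma @ ((phi / (lam - phi)) * v)          # (lambda I - Phi)^{-1} Phi v
    x = solve_triangular(G, y) + b                     # x = G^{-1} y + b
    return x, lam
```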

Solutions to this system of equations are stationary points.
• The trivial solution to (54): $\lambda = z$, $y = 0$.
• The eigenvalues/eigenvectors of $U$ also provide solutions,
$$\lambda = \varphi_i, \quad y = e^{j\theta} \sqrt{\frac{\varphi_i - z}{\sigma_i^H U \sigma_i}} \, \sigma_i, \tag{55}$$
where $\theta \in [0, 2\pi]$ is an arbitrary phase, for all $\varphi_i \geq z$.

From (54), we can obtain $y^H y = (\lambda - z)/\lambda$. Together with the second equation in (54), we can rewrite the objective as a function of $\lambda$,
$$g(\lambda) = -\log \lambda + \frac{\lambda - z}{\lambda}. \tag{56}$$
The derivative is
$$g'(\lambda) = \frac{z - \lambda}{\lambda^2}, \tag{57}$$
and $g(\lambda)$ is thus decreasing for $\lambda > z$. Thus, if $\varphi_d \geq z$, the solution is given by the largest eigenvector (or eigenvectors if the multiplicity of the largest eigenvalue is more than one). Otherwise, the optimum is zero. ∎

B. Proof of Theorem 3

Equating the gradient of the objective of (P2) with respect to $y^*$ to zero, we obtain the following equation,
$$y - \frac{U(y + v)}{(y + v)^H U (y + v) + z} = 0. \tag{58}$$
As in the previous section, we isolate the quadratic term in a second equation by adding the non-negative variable $\lambda \geq 0$, and obtain the following first order optimality conditions,
$$\begin{cases} U(y + v) = \lambda y, \\ \lambda = (y + v)^H U (y + v) + z. \end{cases} \tag{59}$$


Solving for $y$, we obtain a solution as a function of $\lambda$,
$$y(\lambda) = (\lambda I - U)^{-1} U v. \tag{60}$$
Switching to the eigenbasis of $U$ and substituting into the second equation leads to
$$\lambda = \|\Phi^{1/2}((\lambda I - \Phi)^{-1}\Phi + I)\tilde{v}\|^2 + z \tag{61}$$
$$= \lambda^2 \sum_{m \in S} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)^2} + z. \tag{62}$$
This gives us the necessary condition that $f(\lambda) = 0$ for any stationary point of (P2). Now, this equation may have multiple roots, so we need to find the one with the lowest value of the objective. It turns out that the value of the objective can also be written as a function of $\lambda$ only. First, we expand the left-most factor of the second equation in (59) to obtain
$$\lambda = y^H U (y + v) + v^H U (y + v) + z. \tag{63}$$
From the first equation in (59), we have
$$y^H U (y + v) = \lambda y^H y. \tag{64}$$
Then, by (60), we find the second term
$$v^H U (y + v) = v^H (U (\lambda I - U)^{-1} U + U) v. \tag{65}$$
Substituting into (63) gives us
$$\lambda = \lambda y^H y + v^H (U (\lambda I - U)^{-1} U + U) v + z. \tag{66}$$
Using the eigendecomposition of $U$ and rearranging terms,
$$y^H y = 1 - \sum_{m \in S} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)} - \frac{z}{\lambda}. \tag{67}$$
Finally, replacing into the objective, we obtain
$$g(\lambda) = 1 - \sum_{m \in S} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)} - \frac{z}{\lambda} - \log \lambda. \tag{68}$$
Thus, the optimal $\lambda$ is the solution to the following optimization problem,
$$\min_{\lambda \in \mathbb{R}_+} \ g(\lambda), \quad \text{subject to} \quad f(\lambda) = 0, \tag{P3}$$
where $f(\lambda)$ is defined in (51). In Fig. 2, we show the functions $g(\lambda)$ and $f(\lambda)$ for the instance of LQPQM of Fig. 1. This new problem is highly non-linear and the objective is not even continuous. However, we can show that $f(\lambda)$ only has a finite number of roots and that the largest, $\lambda^\star$, has the minimum value of the objective among them. In the next series of lemmas, we characterize all the roots of $f(\lambda)$. We show that the value of the cost function decreases for increasing roots. As a consequence, the largest root is the global minimum of the cost function. In the following, to lighten the notation, we assume, without loss of generality, that $S = \{1, \dots, d\}$.

Lemma 2. The function $f(\lambda)$ has
1) no roots smaller than or equal to $z$,
2) zero, one, or two roots in $(z, \varphi_k)$, with $\varphi_k$ being the smallest eigenvalue larger than $z$, if such an eigenvalue exists,
3) zero, one, or two roots in $(\varphi_{L-1}, \varphi_L)$ for $L = k + 1, \dots, d$,
4) a unique root in the interval $(\max(\varphi_{\max}, z), +\infty)$.

Proof. The proof proceeds by inspection of the first and second derivatives of $f(\lambda)$,
$$f'(\lambda) = -2\lambda \sum_{m \in S} \frac{\varphi_m^2 |\tilde{v}_m|^2}{(\lambda - \varphi_m)^3} - 1, \tag{69}$$
$$f''(\lambda) = 2 \sum_{m \in S} \varphi_m^2 |\tilde{v}_m|^2 \frac{2\lambda + \varphi_m}{(\lambda - \varphi_m)^4}. \tag{70}$$

1) Follows from $z - \lambda \geq 0$ in $(0, z)$, and
$$\lambda^2 \sum_{m \in S} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)^2} > 0, \quad \text{if } \lambda > 0. \tag{71}$$
2) In $(z, \varphi_k)$, we have
$$f(z) > 0, \quad f(\varphi_k - \varepsilon) \xrightarrow[\varepsilon \to 0]{} +\infty, \tag{72}$$
and because $f''(\lambda) > 0$ in this interval, the function there is strictly convex with a unique minimum. If the minimum is larger than zero, there is no root. If the minimum is zero, there is one root. If the minimum is less than zero, there are two roots.
3) In $(\varphi_{L-1}, \varphi_L)$, we have
$$f(\varphi_{L-1} + \varepsilon) \xrightarrow[\varepsilon \to 0]{} +\infty, \quad f(\varphi_L - \varepsilon) \xrightarrow[\varepsilon \to 0]{} +\infty, \tag{73}$$
and $f''(\lambda) > 0$; thus, $f(\lambda)$ is strictly convex with a unique minimum, as in 2).
4) In $(\max(\varphi_{\max}, z), +\infty)$, $f'(\lambda) < 0$ because $\varphi_m > 0$ for all $m$, and $\lambda > \max(\varphi_{\max}, z)$. In addition, we have
$$f(\varphi_{\max} + \varepsilon) \xrightarrow[\varepsilon \to 0]{} +\infty, \quad \text{and} \quad f(\lambda) \xrightarrow[\lambda \to +\infty]{} -\infty,$$
and thus there is exactly one root in this interval. By 1), the root is in $(z, +\infty)$ if $z > \varphi_{\max}$.

Corollary 1. The roots of f(λ) are strictly larger than 0.

Proof. By Lemma 2, 1), if f(λ) = 0, then λ > z ≥ 0.

Fact 1. The derivative of $g(\lambda)$ is $g'(\lambda) = \frac{1}{\lambda^2} f(\lambda)$.

Lemma 3. If $f(\lambda)$ has roots $0 < \lambda_1 \leq \lambda_2$ in $(\varphi_{L-1}, \varphi_L)$, then $g(\lambda_1) \geq g(\lambda_2)$.

Proof. From Fact 1, we know that the roots of $f(\lambda)$ are stationary points of $g(\lambda)$. Moreover, because $f(\lambda)$ is convex with a unique minimum in the interval, $f(\lambda) < 0$ for $\lambda \in (\lambda_1, \lambda_2)$. Thus, $g'(\lambda) = \frac{1}{\lambda^2} f(\lambda) < 0$ for $\lambda \in (\lambda_1, \lambda_2)$, and the proof follows.

Lemma 4. Let $\lambda_1 \in (\varphi_{L-1}, \varphi_L)$ and $\lambda_2 \in (\varphi_{L+K}, \varphi_{L+K+1})$ such that $f(\lambda_1) = f(\lambda_2) = 0$, for some $L \in \{1, \dots, d\}$ and $K \in \{0, \dots, d - L\}$. For convenience, we define $\varphi_0 = z$ and $\varphi_{d+1} = +\infty$. Then $g(\lambda_1) \geq g(\lambda_2)$.

Proof. First, we define two functions $\bar{f}_A(\lambda)$ and $\bar{g}_A(\lambda)$ that are similar to $f(\lambda)$ and $g(\lambda)$, respectively, but with all the discontinuous terms between $\lambda_1$ and $\lambda_2$ removed. Then, we show that $\bar{g}_A(\lambda)$ is decreasing in $(\lambda_1, \lambda_2)$, with $g(\lambda_1)$ and $g(\lambda_2)$ strictly above and below $\bar{g}_A(\lambda)$, respectively.


Fig. 2: The secular equation $f(\lambda)$ corresponding to the 2D LQPQM in Fig. 1, its objective $g(\lambda)$, and the cubic polynomial used for the initialization of the root finding. The optimal $\lambda^\star$ is the largest root of $f(\lambda)$.

Input: $\Phi$, $\tilde{v}$, $z$
Output: Largest zero of $f(\lambda)$
$\lambda \leftarrow$ InitCubicPoly($\varphi_{\max}, \tilde{v}_{\max}, z$)
$\lambda \leftarrow \max(\lambda, z)$
while $|f(\lambda)| > \epsilon$ do
    $\mu \leftarrow \lambda - f(\lambda) / f'(\lambda)$
    if $\mu > \varphi_{\max}$ then
        $\lambda \leftarrow \mu$
    else
        $\lambda \leftarrow (\varphi_{\max} + \lambda) / 2$

Algorithm 4: SolveSecularEquation. The sub-routine InitCubicPoly returns the largest real root of the cubic polynomial (81).

Let $A = \{L, \dots, L+K\}$ and define
$$f_A(\lambda) = \lambda^2 \sum_{m \in A} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)^2} \geq 0, \tag{74}$$
$$g_A(\lambda) = -\sum_{m \in A} \frac{\varphi_m |\tilde{v}_m|^2}{(\lambda - \varphi_m)} \ \begin{cases} > 0 & \text{if } \lambda < \varphi_L, \\ < 0 & \text{if } \lambda > \varphi_{L+K}. \end{cases} \tag{75}$$
Then, let $\bar{f}_A(\lambda) = f(\lambda) - f_A(\lambda)$, and $\bar{g}_A(\lambda) = g(\lambda) - g_A(\lambda)$. Note that these two functions are continuous in $(\lambda_1, \lambda_2)$. Since $f_A(\lambda) \geq 0$, we have
$$\bar{f}_A(\lambda_p) \leq f(\lambda_p) = 0, \quad \text{for } p = 1, 2. \tag{76}$$
Together with Lemma 2, this means that $\bar{f}_A(\lambda)$ has two roots in $(\varphi_{L-1}, \varphi_{L+K+1})$, or just one if $\varphi_{L+K+1} = +\infty$. As a consequence, $\bar{g}_A'(\lambda) = \frac{1}{\lambda^2} \bar{f}_A(\lambda) < 0$ for $\lambda \in (\lambda_1, \lambda_2)$, and, thus, $\bar{g}_A(\lambda)$ is strictly decreasing on this interval.

Then, because $g_A(\lambda_1) > 0$ and $g_A(\lambda_2) < 0$, we have
$$g(\lambda_1) > \bar{g}_A(\lambda_1), \quad \text{and} \quad g(\lambda_2) < \bar{g}_A(\lambda_2), \tag{77}$$
respectively. Finally, because $\bar{g}_A(\lambda)$ is strictly decreasing in the interval,
$$g(\lambda_1) > \bar{g}_A(\lambda_1) > \bar{g}_A(\lambda_2) > g(\lambda_2), \tag{78}$$
which concludes the proof.

C. Root finding

The solution to the general problem (P2) is given by the largest root of $f(\lambda)$, from (51). We have shown that the root is in $(\max(\varphi_{\max}, z), +\infty)$, and we can thus use a root finding algorithm to find it. Several methods are possible, but we opt for Newton-Raphson,
$$\lambda_t \leftarrow \lambda_{t-1} - \frac{f(\lambda_{t-1})}{f'(\lambda_{t-1})}, \quad t = 1, \dots, T, \tag{79}$$
where $f'(\lambda)$ is given in (69). With a good enough starting point $\lambda_0$, this method converges in just a few iterations.

1) Initialization: Because the inverse square terms in $f(\lambda)$ decay quickly when $\lambda > \varphi_{\max}$, we can approximate
$$f(\lambda) \approx \lambda^2 \frac{\varphi_{\max} |\tilde{v}_{\max}|^2}{(\lambda - \varphi_{\max})^2} - \lambda + z, \tag{80}$$
where $\varphi_{\max} = \varphi_d$ is the largest eigenvalue and $\tilde{v}_{\max}$ is the corresponding coefficient of $\tilde{v}$. Note that this approximation is guaranteed to have its largest zero in the same interval as $f(\lambda)$, which is important for Newton-Raphson. Equating to zero and multiplying by $(\lambda - \varphi_{\max})^2$ on both sides leads to a cubic equation in $\lambda$ (see also Fig. 2),
$$-\lambda^3 + (\varphi_{\max} |\tilde{v}_{\max}|^2 + 2\varphi_{\max} + z)\lambda^2 - (\varphi_{\max} + 2z)\varphi_{\max}\lambda + \varphi_{\max}^2 z = 0. \tag{81}$$
A cubic equation has three solutions, at least one of which is real and two of which may be complex. We thus use the largest real solution as the starting point for the root finding.

2) Numerical Stability: When the eigenvalues are large, the computation of $(\lambda - \varphi_m)^{-2}$ may lead to an overflow, jeopardizing the algorithm. Instead, we consider the rescaled function
$$\bar{f}(\mu) = \frac{1}{\varphi_{\max}} f(\varphi_{\max} \mu) = \mu^2 \sum_m \frac{\bar{\varphi}_m |\tilde{v}_m|^2}{(\mu - \bar{\varphi}_m)^2} - \mu + \bar{z}, \tag{82}$$
with $\bar{\varphi}_m = \varphi_m / \varphi_{\max}$ and $\bar{z} = z / \varphi_{\max}$. We can find the largest root $\mu^\star$ of $\bar{f}(\mu) = 0$ according to Lemma 2. Then, the largest root of $f(\lambda)$ is $\lambda^\star = \varphi_{\max} \mu^\star$.
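A possible numpy implementation of Algorithm 4, with the cubic initialization (81), is sketched below; it is illustrative only, the names are mine, and the inputs may be pre-scaled as in (82) for stability.

```python
import numpy as np

def solve_secular(phi, v, z, tol=1e-12, max_iter=100):
    """Largest root of the secular equation (51) by Newton-Raphson (Algorithm 4, sketch).

    phi: eigenvalues of U, v: coefficients of v in the eigenbasis, z: offset.
    """
    w = phi * np.abs(v) ** 2
    phi_s, w_s = phi[w > 0], w[w > 0]               # restrict to the support S of (52)
    phi_max = phi_s.max()

    f = lambda lam: lam**2 * np.sum(w_s / (lam - phi_s) ** 2) - lam + z
    fp = lambda lam: -2.0 * lam * np.sum(phi_s * w_s / (lam - phi_s) ** 3) - 1.0

    # Initial point: largest real root of the cubic approximation (81)
    a = w_s[np.argmax(phi_s)]                        # phi_max * |v_max|^2
    roots = np.roots([-1.0, a + 2.0 * phi_max + z,
                      -(phi_max + 2.0 * z) * phi_max, phi_max**2 * z])
    lam = max(np.max(roots[np.abs(roots.imag) < 1e-9].real), z)

    for _ in range(max_iter):                        # Newton updates (79)
        if abs(f(lam)) < tol:
            break
        mu = lam - f(lam) / fp(lam)
        lam = mu if mu > phi_max else 0.5 * (phi_max + lam)   # stay in (phi_max, +inf)
    return lam
```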

V. NUMERICAL EXPERIMENTS

The effectiveness of the proposed IPA updates for AuxIVA is compared to that of competing methods: IP [13], ISS [31], and IP2 [29], [30]. The experiments are done on simulated reverberant speech mixtures, and the performance is evaluated in terms of scale-invariant signal-to-distortion and signal-to-interference ratios (SI-SDR and SI-SIR, respectively) [44]. We evaluate different numbers of sources/microphones and signal-to-noise ratios (SNR).

A. Setup

We simulate 100 random rectangular rooms with the pyroomacoustics Python package [45]. The walls are between 6 m and 10 m long, and the ceiling from 2.8 m to 4.5 m high. Simulated reverberation times (T60) are approximately uniformly sampled between 60 ms and 450 ms. Sources and microphone array are placed at random, at least 50 cm away from the walls, and between 1 m and 2 m high. The array is circular and regular with 2, 3, 4, or 5 microphones, and radius


Fig. 3: Box-plots of the final SI-SDR (left) and SI-SIR (right) values after a hundred iterations. From top to bottom, the SNR is 5 dB, 15 dB, and 25 dB. In subplots, from left to right, the number of sources goes from two to five.

such that neighboring elements are 10 cm apart. All sources are placed further from the array than the critical distance of the room, i.e., the distance where direct sound and reverberation have equal energy. It is computed as $d_{\text{crit}} = 0.057 \sqrt{V / T_{60}}$ m, with $V$ the volume of the room [46]. Uncorrelated Gaussian noise with variance $\sigma_n^2$ is added to the microphone inputs. The source signals are normalized to have equal variance $\sigma_s^2$ at the first microphone, and $\sigma_n^2$ is chosen so that the SNR, defined as $\text{SNR} = M \sigma_s^2 / \sigma_n^2$, where $M$ is the number of sources, is 5 dB, 15 dB, or 25 dB.

The simulation is conducted at 16 kHz with concatenated utterances from the CMU Arctic corpus [47], [48]. We use an STFT with a 4096-point Hamming analysis window and 3/4-overlap. Reconstruction uses the optimal synthesis window [49]. All algorithms are run for 100 iterations with the microphone signals as initial source estimates. Because IP2 and IPA update twice as many parameters per iteration as IP and ISS, each of their iterations is counted twice, so that the total number of parameter updates is about the same for all algorithms. The scale of the output is restored by minimizing distortion with respect to the first microphone [50], [51].
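For reference, the STFT configuration described above could be reproduced with scipy as in the following sketch; scipy.signal.stft is used here only for illustration, and the paper's own code may use a different implementation (including the optimal synthesis window of [49]).

```python
import numpy as np
from scipy.signal import stft, get_window

fs = 16000                  # sampling rate used in the experiments
n_fft = 4096                # Hamming analysis window length
hop = n_fft // 4            # 3/4-overlap

def analysis(x):
    """STFT of a multichannel signal x with shape (M, n_samples) (sketch)."""
    win = get_window("hamming", n_fft)
    f, t, X = stft(x, fs=fs, window=win, nperseg=n_fft, noverlap=n_fft - hop)
    return X                # shape (M, F, N); transpose to (F, M, N) as needed
```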

B. Results

First, we compare the final values of the SI-SDR and SI-SIR after 100 iterations. Fig. 3 shows box-plots for different numbers of sources and SNR. In all cases, the proposed IPA updates attain the same or a higher final performance. For larger numbers of sources and lower SNR especially, we observe that IPA performs clearly better, even compared to IP2. For two sources, IP and ISS have slightly higher final SI-SIR values. This is due to IP and ISS converging more slowly, but also overshooting in terms of SI-SIR, as seen in Fig. 4.

Next, we look at the convergence of the SI-SIR as a function of the number of iterations and runtime. Fig. 4 shows the results for an SNR of 25 dB. This is where IPA really shines, as it outperforms all other methods for all numbers of sources. In terms of number of iterations, IPA is closely tied to IP2 for two and three microphones. But when the x-axis is scaled according to the runtime of one iteration (for 1 s of input signal, the so-called real-time factor), then IPA converges faster overall. In the two-source case, IP2 performs globally optimal minimization of the surrogate function (17), and it is thus surprising that IPA seems to converge faster. This might suggest some subtle effects in the minimization of the underlying objective function (10) that we believe deserve further study. For four and five sources, IPA converges more than twice as fast as IP2, which makes it a good candidate for high performance implementations.

VI. CONCLUSION

We proposed a new update algorithm for the MM-based independent vector analysis method AuxIVA. Unlike previous methods that only update parts of the demixing matrix at a time, the proposed iterative projection with adjustment (IPA) updates the whole demixing matrix with a multiplicative update. In the derivation of the IPA update, we stumbled upon a generic optimization problem that we call log-quadratically penalized quadratic minimization (LQPQM). Despite being non-convex, its global minimum can be computed efficiently. To the best of our knowledge, this problem had not been solved before. We assessed the performance of AuxIVA with the IPA updates in numerical experiments with simulated reverberant speech mixtures. We found IPA to outperform all other methods in terms of convergence speed, both in iteration count and runtime, significantly so for four sources and more.

In future work, we hope to evaluate the impact of IPA updates on more source models, e.g., in ILRMA [21], and in the overdetermined [25], [30] and underdetermined [52] regimes. Another interesting question is whether LQPQM is applicable in other contexts. The log-penalty suggests it might be useful for barrier-based interior point methods. Another possibility is the maximization of the information theoretic capacity subject to a quadratic penalty or constraint [53].

ACKNOWLEDGMENT

We would like to acknowledge the work of the open source scientific Python community, on which the code for this paper relies. In particular, numpy for the computations [54], [55], pandas for the statistical analysis of the results [56], and matplotlib and seaborn for the figures [57], [58].

REFERENCES

[1] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications. Oxford, UK: Academic Press/Elsevier, 2010.
[2] S. Makino, Ed., Audio Source Separation, ser. Signals and Communication Technology. Cham, CH: Springer International Publishing, 2018.
[3] S. Makino, H. Sawada, and T.-W. Lee, Eds., Blind Speech Separation, ser. Signals and Communication Technology. Cham, CH: Springer, 2007.


Fig. 4: Evolution of the average SI-SIR over number of iterations or runtime in the top and bottom row, respectively. The number of sources increases from two to five from left to right. The SNR is 25 dB.

[4] E. Cano, D. FitzGerald, A. Liutkus, M. D. Plumbley, and F.-R. Stöter, “Musical source separation: An introduction,” IEEE Signal Process. Mag., vol. 36, no. 1, pp. 31–40, Jan. 2019.
[5] V. Zarzoso, A. K. Nandi, and E. Bacharakis, “Maternal and foetal ECG separation using blind source separation methods,” IMA J. Math. Appl. Med. Biol., vol. 14, no. 3, pp. 207–225, Sep. 1997.
[6] F. Cong, “Blind source separation,” in EEG Signal Processing and Feature Extraction, L. Hu and Z. Zhang, Eds. Singapore: Springer, 2019, ch. 7, pp. 117–140.
[7] H. Yang, H. Zhang, J. Li, L. Yang, and W. Ding, “Baseband communication signal blind separation algorithm based on complex nonparametric probability density estimation,” IEEE Access, vol. 6, pp. 22434–22440, Apr. 2018.
[8] P. Comon, “Independent component analysis, a new concept?” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[9] P. Smaragdis, “Blind separation of convolved mixtures in the frequency domain,” Neurocomputing, vol. 22, no. 1-3, pp. 21–34, Nov. 1998.
[10] A. Hiroe, “Solution of permutation problem in frequency domain ICA, using multivariate probability density functions,” in Proc. ICA. Berlin, Heidelberg: Springer, 2006, pp. 601–608.
[11] T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, “Blind source separation exploiting higher-order frequency dependencies,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 70–79, Dec. 2006.
[12] I. Lee, T. Kim, and T.-W. Lee, “Independent vector analysis for convolutive blind speech separation,” in Blind Speech Separation. Dordrecht: Springer, 2007, pp. 169–192.
[13] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2011, pp. 189–192.
[14] K. Lange, MM Optimization Algorithms. SIAM, 2016.
[15] A. Yeredor, “On hybrid exact-approximate joint diagonalization,” in Proc. IEEE CAMSAP, Dec. 2009, pp. 312–315.
[16] A. Weiss, A. Yeredor, S. Cheema, and M. Haardt, “The extended “sequentially drilled” joint congruence transformation and its application in Gaussian independent vector analysis,” IEEE Trans. Signal Process., vol. 65, no. 23, pp. 6332–6344, Dec. 2017.
[17] S. Degerine and A. Zaidi, “Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1499–1512, Jun. 2004.
[18] K. Yatabe and D. Kitamura, “Determined blind source separation via proximal splitting algorithm,” in Proc. IEEE ICASSP, Calgary, CA, Apr. 2018, pp. 776–780.
[19] ——, “Determined BSS based on time-frequency masking and its application to harmonic vector analysis,” arXiv, Apr. 2020.
[20] Z. Gu, J. Lu, and K. Chen, “Speech separation using independent vector analysis with an amplitude variable Gaussian mixture model,” in Proc. Interspeech, Graz, Austria, Sep. 2019, pp. 1358–1362.
[21] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
[22] H. Kameoka, L. Li, S. Inoue, and S. Makino, “Supervised determined source separation with multichannel variational autoencoder,” Neural Computation, vol. 31, no. 9, pp. 1891–1914, Sep. 2019.
[23] N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent deeply learned matrix analysis for determined audio source separation,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 10, pp. 1601–1615, 2019.
[24] U.-H. Shin and H.-M. Park, “Auxiliary-function-based independent vector analysis using generalized inter-clique dependence source models with clique variance estimation,” IEEE Access, vol. 8, pp. 68103–68113, Apr. 2020.
[25] R. Scheibler and N. Ono, “Independent vector analysis with more microphones than sources,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2019, pp. 185–189.
[26] N. Ono and S. Miyabe, “Auxiliary-function-based independent component analysis for super-Gaussian sources,” in Proc. LVA/ICA, vol. 6365, Sep. 2010, pp. 165–172.
[27] R. Scheibler and N. Ono, “Fast independent vector extraction by iterative SINR maximization,” in Proc. IEEE ICASSP, Barcelona, ES, May 2020, accepted.
[28] R. Ikeshita, T. Nakatani, and S. Araki, “Overdetermined independent vector analysis,” in Proc. IEEE ICASSP, Barcelona, ES, May 2020, accepted.
[29] N. Ono, “Fast algorithm for independent component/vector/low-rank matrix analysis with three or more sources,” in Proc. Acoustical Society of Japan, Mar. 2018, pp. 437–438.
[30] R. Scheibler and N. Ono, “MM algorithms for joint independent subspace analysis with application to blind single and multi-source extraction,” arXiv, Apr. 2020.
[31] ——, “Fast and stable blind source separation with rank-1 updates,” in Proc. IEEE ICASSP, Barcelona, ES, May 2020, pp. 236–240.
[32] G. H. Golub, “Some modified matrix eigenvalue problems,” SIAM Review, vol. 15, no. 2, pp. 318–334, Apr. 1973.
[33] J. R. Bunch, C. P. Nielsen, and D. C. Sorensen, “Rank-one modification of the symmetric eigenproblem,” Numerische Mathematik, vol. 31, no. 1, pp. 31–48, Mar. 1978.
[34] K.-B. Yu, “Recursive updating the eigenvalue decomposition of a covariance matrix,” IEEE Trans. Signal Process., vol. 39, no. 5, pp. 1136–1145, May 1991.
[35] J. J. Moré, “Generalizations of the trust region problem,” Optim. Method Softw., vol. 2, no. 3-4, pp. 189–209, Jan. 1993.
[36] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, 2005.
[37] A. Beck, P. Stoica, and J. Li, “Exact and approximate solutions of source localization problems,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1770–1778, Apr. 2008.
[38] M. Togami and R. Scheibler, “Sparseness-aware DOA estimation with majorization minimization,” in Proc. Interspeech, Oct. 2020, accepted.


[39] J. de Leeuw and W. J. Heiser, “Convergence of correction matrix algorithms for multidimensional scaling,” in Geometric Representations of Relational Data, J. C. Lingoes, E. Roskam, and I. Borg, Eds. Ann Arbor, MI, 1977, pp. 735–752.
[40] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk, “Iteratively reweighted least squares minimization for sparse recovery,” Communications on Pure and Applied Mathematics, vol. 63, no. 1, pp. 1–38, Jan. 2010.
[41] K. Yamaoka, R. Scheibler, N. Ono, and Y. Wakabayashi, “Sub-sample time delay estimation via auxiliary-function-based iterative updates,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2019, pp. 130–134.
[42] D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” The American Statistician, vol. 58, no. 1, pp. 30–37, Feb. 2004.
[43] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
[44] J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR – half-baked or well done?” in Proc. IEEE ICASSP, Brighton, UK, May 2019, pp. 626–630.
[45] R. Scheibler, E. Bezzam, and I. Dokmanic, “Pyroomacoustics: A Python package for audio room simulations and array processing algorithms,” in Proc. IEEE ICASSP, Calgary, CA, Apr. 2018, pp. 351–355.
[46] H. Kuttruff, Room Acoustics. CRC Press, 2009.
[47] J. Kominek and A. W. Black, “CMU ARCTIC databases for speech synthesis,” Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Tech. Rep. CMU-LTI-03-177, 2003.
[48] R. Scheibler, “CMU ARCTIC concatenated 15s,” Zenodo. [Online]. Available: http://doi.org/10.5281/zenodo.3066489
[49] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, 1984.
[50] K. Matsuoka and S. Nakashima, “Minimal distortion principle for blind source separation,” in Proc. ICA, San Diego, Dec. 2001, pp. 722–727.
[51] K. Matsuoka, “Minimal distortion principle for blind source separation,” in Proc. SICE, Aug. 2002, pp. 2138–2143.
[52] K. Sekiguchi, A. A. Nugraha, Y. Bando, and K. Yoshii, “Fast multichannel source separation based on jointly diagonalizable spatial covariance matrices,” in Proc. EUSIPCO, Sep. 2019.
[53] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, Jul. 2006.
[54] T. E. Oliphant, “Python for scientific computing,” Computing in Science & Engineering, vol. 9, no. 3, pp. 10–20, 2007.
[55] S. van der Walt, S. C. Colbert, and G. Varoquaux, “The NumPy array: A structure for efficient numerical computation,” Computing in Science & Engineering, vol. 13, no. 2, pp. 22–30, Feb. 2011.
[56] W. McKinney, “Data structures for statistical computing in Python,” in Proc. 9th Python Sci. Conf., S. van der Walt and J. Millman, Eds., 2010, pp. 56–61.
[57] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
[58] M. Waskom, O. Botvinnik, J. Ostblom, M. Gelbart, S. Lukauskas, P. Hobson, D. C. Gemperline, T. Augspurger, Y. Halchenko, J. B. Cole, et al., “mwaskom/seaborn: v0.10.1 (April 2020),” Apr. 2020. [Online]. Available: https://github.com/mwaskom/seaborn

APPENDIX A
DETERMINANT OF $T_k$

The proof uses the matrix determinant lemma and the fact that $e_k^\top E_k q = 0$ several times,

$$\begin{aligned}
\det(T_k) &= \det(I + e_k(u - e_k)^H + E_k q^* e_k^\top) \\
&= \det\left( I_2 + \begin{bmatrix} u^H - e_k^\top \\ e_k^\top \end{bmatrix} \begin{bmatrix} e_k & E_k q^* \end{bmatrix} \right) \\
&= \det\left( \begin{bmatrix} u_k^* & u^H E_k q^* \\ 1 & 1 \end{bmatrix} \right) = u^H (e_k - E_k q^*). 
\end{aligned} \tag{83}$$
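The identity (83) is easy to check numerically; the following snippet is an illustrative verification with random u and q, not part of the paper's code.

```python
import numpy as np

# Quick numerical check of (83): det(T_k) = u^H (e_k - E_k q^*).
rng = np.random.default_rng(0)
M, k = 4, 1
I = np.eye(M)
e_k = I[:, k]
E_k = np.delete(I, k, axis=1)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)
q = rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)
T_k = I + np.outer(e_k, u.conj() - e_k) + np.outer(E_k @ q.conj(), e_k)
assert np.allclose(np.linalg.det(T_k), u.conj() @ (e_k - E_k @ q.conj()))
```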

APPENDIX B
QUADRATIC FORM

Let $a_m = e_k^\top V_m e_k$, $b_m = e_k^\top V_m e_m$, $c_m = e_m^\top V_m e_m$, and $\mathbf{1}$ be the all-one vector. Further, let $A = \mathrm{diag}(\dots, a_m, \dots)$, $m \neq k$. Then,
$$\begin{aligned}
\sum_{m \neq k} (e_m + q_m e_k)^H V_m (e_m + q_m e_k) &= \sum_{m \neq k} a_m |q_m|^2 + (b_m^* q_m + b_m q_m^*) + c_m \\
&= q^H A q + (b^H q + q^H b) + \mathbf{1}^\top c \\
&= (q + A^{-1} b)^H A (q + A^{-1} b) - b^H A^{-1} b + \mathbf{1}^\top c.
\end{aligned}$$

APPENDIX C
PROOF OF PROPOSITION 3

We can lower bound the objective in (P2) as follows,
$$y^H y - \log\left( y^H U y + 2\,\mathrm{Re}\{y^H U v\} + v^H U v + z \right) \geq \|y\|^2 - \log(a \|y\|^2 + b \|y\| + c), \tag{84}$$
where $a = \lambda_{\max}(U)$ is the largest eigenvalue of $U$, $b = 2\|Uv\|$, and $c = v^H U v + z$. We used the spectral norm of $U$ to bound the quadratic term, and Cauchy-Schwarz for the linear term. Thus, we can equivalently study the real function $f(x) = x^2 - \log(a x^2 + b x + c)$, for $x \geq 0$, with $a > 0$ and $b, c \geq 0$. One can show that the stationary points of this function are the zeros of a third order polynomial. Thus, by the properties of cubic polynomials, $f(x)$ has either one or three stationary points. Furthermore, $f(x) \to +\infty$ when $x \to +\infty$, since the quadratic term grows faster than the log decreases. Thus, with a single stationary point, $f(x)$ is strictly decreasing to a minimum, and then increasing. With three stationary points, it must be strictly decreasing, increasing, decreasing, and increasing, with two minima and one maximum. By continuity, in both cases, $f$ is bounded from below. ∎

Robin Scheibler (M'07, SM'20) is a senior researcher at LINE Corporation. Robin received his B.Sc., M.Sc., and Ph.D. from Ecole Polytechnique Fédérale de Lausanne (EPFL, Switzerland). He also worked at the research labs of NEC Corporation (Kawasaki, Japan) and IBM Research (Zürich, Switzerland). From 2017 to 2019, he was a postdoctoral fellow at Tokyo Metropolitan University, and then a specially appointed associate professor until February 2020. Robin's research interests are in efficient algorithms for signal processing, and array signal processing more particularly. He also likes to build large microphone arrays and is the lead developer of pyroomacoustics, an open source library for room acoustics simulation and array signal processing.