
UNIVERSIDADE DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Distributed and robust network localization algorithms

Cláudia Alexandra Magalhães Soares

Supervisor: Doutor João Pedro Castilho Pereira Santos Gomes

Thesis approved in public session to obtain the PhD Degree in Electrical and Computer Engineering

Jury final classification: Pass with Distinction and Honour

Provisional Thesis

December 2015

Abstract

Signal processing over networks has been a broad and active topic in the signal processing community in recent years. Networks of agents typically rely on known node positions, even when localization is not the main goal of the network. A network of agents may comprise a large set of miniature, low-cost, low-power autonomous sensing nodes. In this scenario it is generally impractical, or even impossible, to accurately deploy all nodes at predefined locations within the network operation area. GPS is also ruled out for indoor applications, or because of cost and energy consumption constraints. Moreover, mobile agents need localization for, e.g., motion planning or formation control, and GPS may be unavailable in many environments. Real-world conditions imply noisy environments, and network operation calls for fast and reliable estimation of the agents' locations. Galvanized by these compelling applications and by the difficulty of the problem itself, researchers have devoted considerable work to locating the nodes of a network. Some develop centralized methods, while others pursue distributed, scalable solutions, either by developing approximations or by tackling the nonconvex problem directly, sometimes combining both approaches. With growing networks of devices constrained in energy expenditure and computational power, the need for simple, fast, and distributed network localization algorithms spurred the work presented in this thesis. Here, we approach the problem starting from minimal data collection, aggregating only range measurements and a few landmark positions, and deliver a good approximate solution that can then be fed to our fast yet simple maximum-likelihood method, which returns highly accurate solutions. We explore tailored solutions, drawing on optimization and probability tools that improve performance under noise and in unstructured environments. The main contributions of this thesis are:

• Distributed localization algorithms characterized by their simplicity, but also by strong guarantees;

• Analyses of convergence, iteration complexity, and optimality bounds for the designed procedures;

• Novel majorization approaches tailored to the specific problem structure.

Keywords

Distributed algorithms, convex relaxations, nonconvex optimization, maximum-likelihood estimation, distributed iterative agent localization, robust estimation, noisy range measurements, network localization, majorization-minimization, optimal gradient methods.

Resumo

Signal processing over networks has been a broad and abundant topic in the scientific community in recent years. Networks of agents generally base their operation on knowledge of the positions of their nodes, even in situations where localization is not the main objective of the operation. A network of agents may be composed of a large set of autonomous, low-cost, low-power nodes. In this scenario it is generally unsuitable, or even impossible, to position the nodes at predefined locations within the operation area. The use of GPS is also excluded for applications inside buildings, or because of its cost or energy requirements. On the other hand, mobile agents need to know their location to plan their motion or perform formation control, for example, and resources such as GPS may not be accessible. The real world further implies noisy environments, and the purpose of the network demands fast and reliable estimation of the positions of the various agents. Thus, researchers in the field have devoted work to localizing the nodes of a network, galvanized by the relevant applications but also by the challenge of the problem. Some lines of work have focused on developing centralized methods, while others pursue distributed and scalable solutions, either by developing approximations or by directly tackling the nonconvex problem, at times combining both approaches. With the growth in size of networks of devices limited in energy and computational resources, the need for simple, fast, and distributed algorithms drove the work presented in this thesis. In it, we approach the problem starting with a minimal data set, aggregating only distance measurements and the positions of a few landmarks, and deliver a medium-accuracy approximate solution which can subsequently be fed to our fast and yet simple maximum-likelihood method, returning solutions of very high accuracy. To meet these needs we explore tailored solutions drawing on optimization and probability techniques that enhance accuracy and speed even in the presence of noise and unstructured environments. Thus, the main contributions of this thesis are:

• Distributed localization algorithms, characterized by their simplicity but also by strong convergence guarantees;

• Analyses of convergence, iteration complexity, and optimality bounds for the procedures in question;

• Novel majorization approaches tailored to the specific structure of the problem.

Palavras Chave

Distributed algorithms, convex relaxations, nonconvex optimization, maximum-likelihood estimation, distributed localization of networks of agents, robust estimation, noisy distance measurements, network localization, majorization-minimization, optimal gradient methods.

Contents

1 Introduction
1.1 Motivation and related work
1.1.1 Scalability and networks
1.1.2 Robustness and harsh environments
1.2 Objectives and contributions
1.3 Agent localization on a network

2 Distributed network localization without initialization: Tight convex underestimator-based procedure
2.1 Contributions
2.2 Related work
2.3 Convex underestimator
2.4 Distributed sensor network localization
2.4.1 Gradient and Lipschitz constant of f
2.4.2 Parallel method
2.4.3 Asynchronous method
2.5 Analysis
2.5.1 Quality of the convexified problem
2.5.2 Parallel method: convergence guarantees and iteration complexity
2.5.3 Asynchronous method: convergence guarantees and iteration complexity
2.6 Numerical experiments
2.6.1 Assessment of the convex underestimator performance
2.6.2 Performance of distributed optimization algorithms
2.6.3 Performance of the asynchronous algorithm
2.7 Proofs
2.7.1 Convex envelope
2.7.2 Lipschitz constant of ∇φ_{Bij}
2.7.3 Auxiliary Lemmas
2.7.4 Theorems
2.8 Summary and further extensions
2.8.1 Heterogeneous data fusion application

3 Distributed network localization with initialization: Nonconvex procedures
3.1 Related work
3.2 Distributed Majorization-Minimization with quadratic majorizer
3.2.1 Contributions
3.2.2 Problem reformulation
3.2.3 Majorization-Minimization
3.2.4 Distributed sensor network localization
3.2.5 Experimental results
3.2.6 Summary
3.3 Majorization-Minimization with convex tight majorizer
3.3.1 Majorization function
3.3.2 Experimental results on majorization function quality
3.3.3 Distributed optimization of the proposed majorizer using ADMM
3.3.4 Experimental setup
3.3.5 Proof of majorization function properties
3.3.6 Proof of Proposition 9
3.3.7 Proof of (3.31)
3.3.8 Summary
3.4 Sensor network localization: a graphical model approach
3.4.1 Uncertainty models
3.4.2 Optimization problem
3.4.3 Combinatorial problem
3.4.4 Related work
3.4.5 Contributions
3.4.6 Algorithms
3.4.7 Experimental results
3.4.8 Summary

4 Robust algorithms for sensor network localization
4.1 Related work and contributions
4.2 Discrepancy measure
4.3 Convex underestimator
4.3.1 Approximation quality of the convex underestimator
4.4 Numerical experiments
4.5 Summary

5 Conclusions and perspectives
5.1 Distributed network localization without initialization
5.2 Addressing the nonconvex problem
5.2.1 With more computations we can do better
5.2.2 Network of agents as a graphical model
5.3 Robust network localization
5.4 In summary

List of Figures

2.1 Convex envelope for one-dimensional example
2.2 One-dimensional example of the quality of the approximation of the true nonconvex cost
2.3 Two-dimensional star network to assess the quality of optimality bounds
2.4 Proximal minimization evolution for the toy problem
2.5 Proximal minimization cost evolution for the toy problem
2.6 Network 1. Topology with 4 anchors and 10 sensors. Anchors are marked with blue squares and sensors with red stars.
2.7 Network 2. Topology with 4 anchors and 50 sensors. Anchors are also marked with blue squares and sensors with red stars.
2.8 Relaxation quality experiment for different noise levels
2.9 Relaxation quality experiment for high power noise
2.10 Estimates for the location of the sensor nodes (network with 10 agents)
2.11 Performance comparison: Algorithm 1 vs. projection method
2.12 Performance comparison: Algorithm 1 vs. ESDP method
2.13 Performance of the asynchronous algorithm
3.1 Nonconvex reformulation illustration
3.2 Evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under low power noise
3.3 Evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under medium power noise
3.4 Evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under high power noise
3.5 Tightness evaluation for the proposed majorizer in (3.15)
3.6 Evaluation of majorizer performance for different initializations
3.7 Performance comparison: Algorithm 7 vs. SGO; noiseless range measurements
3.8 Performance comparison: Algorithm 7 vs. SGO; noisy range measurements
3.9 Performance comparison: Algorithm 7 vs. SGO; noisy range measurements, random anchors
3.10 Performance comparison: Algorithm 7 vs. SGO; with increasing measurement noise
3.11 Performance comparison: Algorithm 7 vs. SGO; accuracy and communications
3.12 Performance comparison: Algorithm 7 vs. SGO; increasing parameter value
3.13 Average cost over Monte Carlo trials
3.14 Mean positioning error per sensor over Monte Carlo trials
3.15 Rank of the solution matrix E in the tested Monte Carlo trials
4.1 Comparison of nonconvex cost functions
4.2 Quality of the proposed relaxation (4.4)
4.3 Illustration of the relaxation quality of (4.4)
4.4 Estimates for sensor positions for the three discrepancy functions
4.5 Average positioning error vs. the value of the Huber function parameter

List of Tables

2.1 Bounds on the optimality gap for the example in Figure 2.2
2.2 Bounds on the optimality gap for the 2D example in Figure 2.3
2.3 Number of communications per sensor for the results in Fig. 2.12
3.1 Mean positioning error, with measurement noise
3.2 Squared error dispersion over Monte Carlo trials for Figure 3.7
3.3 Squared error dispersion over Monte Carlo trials for Figure 3.8
3.4 Squared error dispersion over Monte Carlo trials for Figure 3.9
3.5 Squared error dispersion over Monte Carlo trials for Figure 3.10
3.6 Cost values per sensor
4.1 Bounds on the optimality gap for the example in Figure 4.3
4.2 Average positioning error per sensor (MPE/sensor), in meters
4.3 Average positioning error per sensor (MPE/sensor), in meters, for the biased experiment

List of Algorithms

1 Parallel method
2 Asynchronous method
3 Asynchronous update at each node i
4 Distributed nonconvex localization algorithm
5 Minimization of the tight majorizer in (3.19)
6 Nesterov's optimal method for (3.35)
7 Step 2 of Algorithm 5 using ADMM: position updates
8 Distributed monotonic spanning tree-based algorithm
9 Coordinate descent algorithm

1 Introduction

Contents
1.1 Motivation and related work
1.1.1 Scalability and networks
1.1.2 Robustness and harsh environments
1.2 Objectives and contributions
1.3 Agent localization on a network

Networks of agents are becoming ubiquitous. From environmental and infrastructure monitoring to surveillance and healthcare, networked extensions of the human senses in contemporary technological societies are improving our quality of life, our productivity, and our safety. Applications of such networks recurrently need to be aware of node positions to fulfill their tasks and deliver meaningful information. Nevertheless, locating the nodes is not trivial: these small, low-cost, low-power devices are deployed in large numbers, often with imprecise prior knowledge of their locations, and might be equipped with minimal processing capabilities. Such limitations call for localization algorithms which are scalable, fast, and parsimonious in their communication and computational requirements.

1.1 Motivation and related work

The network localization problem is not new (see, e.g., Bulusu et al. [1], published in 2000); nevertheless, the scientific community is still striving for a usable, practicable solution to the problem of spatially localizing a network of agents. Nowadays, with the increasing number of networked devices running localization-dependent applications, scalability imposes itself as a real and urgent need. Solutions based on centralized semidefinite programming relaxations, as in [2], are thus not suitable for problems with hundreds of nodes, and many distributed approaches, like [3], demand that each node solve a semidefinite program at each algorithm iteration, a requirement difficult to meet with the small, low-power, inexpensive hardware usually deployed in applications. There is a need for simple, distributed, and fast localization methods, easy to understand by a non-specialist engineer, but with no concessions in terms of convergence and expected behavior. This thesis provides such methods, achieving the best performance in accuracy and convergence rate while demanding only simple arithmetic operations.

Moreover, the signal processing community has approached the network localization problem with noisy range measurements by assuming that the noise is Gaussian, as did Biswas et al. [2]. Nevertheless, in many applications, on top of the (indeed Gaussian) measurement noise we find outlier data, due to, e.g., environmental conditions, or malfunctioning or malicious nodes. This problem, with great practical impact in applications, is also recognized as interesting in the literature (see, e.g., Ihler et al. [4] or, more recently, Simonetto et al. [3]). Two noteworthy works addressing this largely unexplored problem are presented by Korkmaz et al. in [5], where the authors provide a distributed nonconvex procedure using the Huber M-estimator which is, however, highly dependent on the initialization, and by Oğuz-Ekim et al. [6], whose approach does not need initialization information but is centralized, assumes a Laplacian noise model, and does not scale well with the size of the network.


Our work bridges this gap by providing an estimate that does not depend on initialization, does not assume an outlier distribution, and is lightweight in its computing needs.

As the present work spans such a varied set of tools and flavors of the sensor network localization problem, a more in-depth treatment of related work is included in each chapter.

The remainder of this section briefly discusses some aspects that should be taken into account when devising a system that effectively delivers localization information to a network of agents.

1.1.1 Scalability and networks

With agents operating in networks and processing increasing amounts of data, today's computational paradigm is shifting from centralized to distributed operation. Self-localization of agents is one of the basic requirements of such networked environments, enabling other applications.

This paradigm shift raises questions such as: is it possible to pursue parallel, and even asynchronous, optimization algorithms and, at the same time, ensure their optimality? Considering that communication expends more energy than computation, fewer algorithm iterations mean not only a faster response, but also longer battery life and thus extended operation.

The simplicity of agents might mean that computation is also a limited resource, and that only simple operations are available.

Also, the spread of this type of solution across applications imposes simplicity constraints on the procedures to be implemented, and demands the avoidance of tuning parameters, so they can be intuitively and swiftly interpreted by a nonspecialist.

1.1.2 Robustness and harsh environments

Agents deployed in unstructured or uncontrolled environments must often cope with sporadic but strong measurement impairments that are difficult to characterize probabilistically, and which can greatly perturb the accuracy of standard algorithms. Hence, in addition to scalability and communication performance, we must work toward behavior that is robust to outlier measurements. With this concern in mind, this thesis contains some work in progress in the area of robust network localization, also aiming at simple, efficient, and understandable solutions.

1.2 Objectives and contributions

The focus of this work is to develop approaches to the network localization problem that lead to efficient, simple, and intuitive distributed algorithms, always seeking rigorous performance analysis and provable convergence guarantees. Whenever we design a convexified proxy for the problem, we try to produce a bound on the optimality gap of the resulting estimate, also in order to understand how the method can be better tailored to the specific problem, taking the most out of the problem structure. When designing nonconvex refinement solutions, we aim at both performance and guaranteed convergence. The broad goals of this thesis are to:


• Study optimization strategies to ensure accuracy and simplicity;

• Consider probabilistic tools to analyze performance in highly unstructured settings.

Thus, the main contributions of this thesis are the following:

• A distributed network localization algorithm that requires neither parameter tuning nor initialization in the vicinity of the solution. This algorithm has two flavors: a synchronous, parallel one and an asynchronous one. Convergence guarantees, optimality bounds, and iteration complexity are provided for both. The main body of this work was accepted for publication in the IEEE Transactions on Signal Processing [7].

• A distributed network localization algorithm with no parameter tuning, directly addressing the nonconvex maximum-likelihood estimation problem for Gaussian noise. We provide convergence guarantees to a stationary point, capitalizing on the properties of the Majorization-Minimization (MM) framework. This work was presented at GlobalSIP 2014 [8].

• A novel, tight majorization function specially crafted for the nonconvex maximum-likelihood problem for Gaussian noise, whose preliminary experimental results show a substantial improvement in performance over the general-purpose quadratic majorizer. The working draft for this paper is [9].

• A probabilistic approach to the problem, relying on the framework of graphical models. We re-derive the standard LP relaxation for the maximum a posteriori problem and, capitalizing on this reformulation, propose an SDP relaxation which is tighter and shows better results in the localization problem. We also propose a descent method, requiring no initialization, based on least squares and the majorization-minimization framework, which is more accurate than a coordinate descent method.

• A novel convex relaxation for a robust formulation of the discrepancy function arising from the maximum-likelihood problem for Gaussian noise. Instead of the square of the difference between acquired measurements and the distances of the estimated points, we consider the Huber M-estimator. This leads to an interesting improvement in performance under Gaussian and outlier noise, while preserving the maximum-likelihood properties within a prescribed region. The working draft of this paper is [10].

A common denominator of the contributions in this thesis is the exploitation of problem structure, crafting solutions which are intuitive, simple, and robust.

1.3 Agent localization on a network

We represent the network of agents mathematically as an undirected graph G = (V, E), where the node set V = {1, 2, ..., n} designates the agents with unknown positions. An edge i ∼ j ∈ E between agents i and j means that a noisy range measurement between nodes i and j is known to both, and that i and j can communicate with each other. Anchors are elements with known positions, collected in the set A = {1, ..., m}. For each agent i ∈ V, we define A_i ⊂ A as the subset of anchors with a quantified range measurement to agent i. The set N_i collects the neighboring agents of node i.

Let R^p be the space of interest (p = 2 for planar networks, p = 3 otherwise). We denote by x_i ∈ R^p the position of agent i, and by d_{ij} the noisy range measurement between agents i and j, available at both i and j. Anchor positions are denoted by a_k ∈ R^p. We let r_{ik} denote the noisy range measurement between agent i and anchor k, available at agent i.

The distributed network localization problem addressed in this thesis consists in estimating the agents' positions x = {x_i : i ∈ V} from the available measurements {d_{ij} : i ∼ j} ∪ {r_{ik} : i ∈ V, k ∈ A_i}, through collaborative message passing between neighboring agents in the communication graph G.

Under the assumption of zero-mean, independent and identically distributed, additive Gaussian measurement noise, the maximum-likelihood estimator for the agent positions is the solution of the optimization problem

    \text{minimize}_x \; f(x),    (1.1)

where

    f(x) = \sum_{i \sim j} \frac{1}{2} \left( \| x_i - x_j \| - d_{ij} \right)^2 + \sum_{i} \sum_{k \in A_i} \frac{1}{2} \left( \| x_i - a_k \| - r_{ik} \right)^2.

Problem (1.1) is nonconvex and NP-hard for generic network configurations [11].¹
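As a concrete check of the cost in (1.1), the following sketch evaluates f(x) for a toy network; the function and variable names are hypothetical, and NumPy is assumed:

```python
import numpy as np

def ml_cost(x, edges, d, anchors, anchor_meas):
    """Evaluate the maximum-likelihood cost f(x) of (1.1) under i.i.d.
    Gaussian range noise.

    x           : dict, agent index -> position (np.ndarray of shape (p,))
    edges       : iterable of agent pairs (i, j) with a range measurement
    d           : dict, (i, j) -> measured inter-agent range d_ij
    anchors     : dict, anchor index k -> known position a_k
    anchor_meas : dict, (i, k) -> measured agent-anchor range r_ik
    """
    cost = 0.0
    for (i, j) in edges:                         # agent-agent terms
        cost += 0.5 * (np.linalg.norm(x[i] - x[j]) - d[(i, j)]) ** 2
    for (i, k), r_ik in anchor_meas.items():     # agent-anchor terms
        cost += 0.5 * (np.linalg.norm(x[i] - anchors[k]) - r_ik) ** 2
    return cost

# Tiny 2D example: two agents, one anchor, noiseless measurements.
x = {1: np.array([0.0, 0.0]), 2: np.array([1.0, 0.0])}
anchors = {1: np.array([0.0, 1.0])}
d = {(1, 2): 1.0}
r = {(1, 1): 1.0}
print(ml_cost(x, [(1, 2)], d, anchors, r))  # 0.0 at the true configuration
```

With noiseless measurements the cost vanishes exactly at the true configuration; with noisy ranges, the minimizer of f is the maximum-likelihood position estimate.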

Problem (1.1) is similar to the multidimensional scaling (MDS) problem presented by Costa et al. in [13]. In classical MDS, one must have access to distance measurements between all nodes, or be able to estimate them. To circumvent this in sparse networks, such as the geometric topologies of sensor networks, the authors write the MDS problem as

    \text{minimize}_x \; \sum_{i \in V} \Big( \sum_{j > i} w_{ij} \left( \| x_i - x_j \| - d_{ij} \right)^2 + \sum_{k \in A} w_{ik} \left( \| x_i - a_k \| - r_{ik} \right)^2 \Big),

where the weights w_{ij} are zero whenever there is no measurement between nodes i and j, and otherwise positive, chosen by the user to reflect how accurate the range measurements d_{ij} are.

¹ In [12] the authors prove that highly dense networks in R² (with edge set cardinality |E| ≥ 2|V| + |V|(|V|+1)/2) can be localized in polynomial time. This density would correspond to an average node degree ⟨k⟩ ≥ 5 + |V|, which is not realistic in practice.
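The weighted MDS-style stress above can be sketched numerically as follows; array names and layout are hypothetical, with zero weights encoding missing measurements:

```python
import numpy as np

def mds_stress(X, D, W, anchors, R, WA):
    """Weighted MDS-style stress for sparse networks.

    X       : (n, p) agent positions
    D, W    : (n, n) measured ranges and weights; W[i, j] = 0 encodes a
              missing agent-agent measurement
    anchors : (m, p) anchor positions
    R, WA   : (n, m) agent-anchor ranges and weights
    """
    n = X.shape[0]
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):                # j > i, as in the formulation
            s += W[i, j] * (np.linalg.norm(X[i] - X[j]) - D[i, j]) ** 2
        for k in range(anchors.shape[0]):        # agent-anchor terms
            s += WA[i, k] * (np.linalg.norm(X[i] - anchors[k]) - R[i, k]) ** 2
    return s

# Two agents, one anchor; agent 2 has no anchor measurement (weight 0).
X = np.array([[0.0, 0.0], [1.0, 0.0]])
D = np.array([[0.0, 1.0], [1.0, 0.0]])
W = np.array([[0.0, 1.0], [1.0, 0.0]])
anchors = np.array([[0.0, 1.0]])
R = np.array([[1.0], [0.0]])
WA = np.array([[1.0], [0.0]])
print(mds_stress(X, D, W, anchors, R, WA))  # 0.0 at the true configuration
```

Setting all nonzero weights to 1/2 recovers the terms of the maximum-likelihood cost f in (1.1) restricted to the available measurements.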


2 Distributed network localization without initialization: Tight convex underestimator-based procedure

Contents
2.1 Contributions
2.2 Related work
2.3 Convex underestimator
2.4 Distributed sensor network localization
2.4.1 Gradient and Lipschitz constant of f
2.4.2 Parallel method
2.4.3 Asynchronous method
2.5 Analysis
2.5.1 Quality of the convexified problem
2.5.2 Parallel method: convergence guarantees and iteration complexity
2.5.3 Asynchronous method: convergence guarantees and iteration complexity
2.6 Numerical experiments
2.6.1 Assessment of the convex underestimator performance
2.6.2 Performance of distributed optimization algorithms
2.6.3 Performance of the asynchronous algorithm
2.7 Proofs
2.7.1 Convex envelope
2.7.2 Lipschitz constant of ∇φ_{Bij}
2.7.3 Auxiliary Lemmas
2.7.4 Theorems
2.8 Summary and further extensions
2.8.1 Heterogeneous data fusion application


After deploying a network of agents we would like to estimate the optimal configuration, given

a set of acquired range measurements and a (small) set of reference positions. As we have seen,

solving (1.1) is, in general, an NP-hard problem, and any polynomial time algorithm can only

aspire to deliver a local minimizer; the returned estimate will be initialization dependent, but at

this point of the network’s operation, any meaningful initialization might be impossible. For such

scenarios where there is no possibility of producing a valuable hint on the true agent positions, one

might turn to some approximation of the problem (1.1) that can be globally minimized and, at the

same time, captures the “main shape” of the original problem, particularly, the location of its global

minimizer. This kind of estimate can have low precision, but it may be sufficient for some practical

purposes. Even if this is not the case, it returns invaluable information to initialize a descent

algorithm over the nonconvex problem (1.1) to refine the solution. The work described in this

Chapter was published in the IEEE Transactions on Signal Processing. Some new, unsubmitted,

material was added to this thesis, as signaled in the text.

2.1 Contributions

We propose a convex underestimator of the maximum likelihood cost for the sensor network localization problem (1.1) based on the convex envelopes of its terms. We also obtain a simple bound

for the optimality gap given an estimate produced by the algorithm. We present an optimal synchronous and parallel algorithm to minimize this convex underestimator with proven convergence

guarantees. We also propose an asynchronous variant of this algorithm, prove that it converges

almost surely, and we analyze its iteration complexity.

Moreover, we assert the superior performance of our algorithms by computer simulations; we

compared several aspects of our method with [3], [6], and [14], and our approach always yields

better performance metrics. When compared with the method in [3], which operates under the same

conditions, our method outperforms it by one order of magnitude in accuracy and in communication

volume.

2.2 Related work

Reference [15] proposes a parallel distributed algorithm to minimize a discrepancy function

based on squared distances, which is known to amplify large variance noise. Also, each element

in the sensor network must solve a second-order cone program at each algorithm iteration, which

can be a demanding task for the simple hardware used in such networks. Furthermore, the formal


convergence properties of the algorithm are not established. The work in [16] considers network

localization outside a maximum-likelihood framework. The approach is not parallel, operating

sequentially through layers of nodes: neighbors of anchors estimate their positions and become

anchors themselves, making it possible in turn for their neighbors to estimate their positions, and

so on. Position estimation is based on planar geometry-based heuristics. In [17], the authors

propose an algorithm with assured asymptotic convergence, but the solution is computationally

complex since a triangulation set must be calculated, and matrix operations are pervasive. Furthermore, in order to attain good accuracy, a large number of range measurement rounds must

be acquired, one per iteration of the algorithm, thus increasing energy expenditure. On the other

hand, the algorithm presented in [18], based on the nonlinear Gauss-Seidel framework, has

a pleasingly simple implementation, combined with convergence guarantees inherited from the

framework. Notwithstanding, this algorithm is sequential, i.e., nodes perform their calculations

in turn, not in a parallel fashion. This entails the existence of a network-wide coordination procedure to precompute the processing schedule upon startup, or whenever a node joins or leaves

the network. The sequential nature of the work in [18] was superseded by the work in [3] which

puts forward a parallel method based on two consecutive relaxations of the maximum likelihood

estimator in (1.1). The first relaxation is a semi-definite program with a rank relaxation, while the

second is an edge based relaxation, best suited for the Alternating Direction Method of Multipliers

(ADMM). The main drawbacks are the amount of communication required to manage the local copies of the ADMM variables and the prohibitive complexity of the problem at each node. In fact, each one of the simple sensing units must solve a semidefinite program at each ADMM iteration and, after the update, copies of the edge variables must be exchanged with each neighbor. A simpler

approach was devised in [14] by extending the source localization Projection Onto Convex Sets

algorithm in [19] to the problem of sensor network localization. The proposed method is sequential,

activating nodes one at a time according to a predefined cyclic schedule; thus, it does not take

advantage of the parallel nature of the network and imposes a stringent timetable for individual

node activity.

2.3 Convex underestimator

Problem (1.1) can be written as

  minimize_x ∑_{i∼j} ½ d²_{Sij}(x_i − x_j) + ∑_i ∑_{k∈A_i} ½ d²_{Saik}(x_i),   (2.1)

where d²_C(x) represents the squared Euclidean distance of point x to the set C, i.e.,

  d²_C(x) = inf_{y∈C} ‖x − y‖²,


Figure 2.1: Illustration of the convex envelope for intersensor terms of the nonconvex cost function (2.1). The squared distance to the ball B_ij (dotted line) is the convex hull of the squared distance to the sphere S_ij (dashed line). In this one-dimensional example the value of the range measurement is d_ij = 0.5.

and the sets S_ij and S_aik are defined as the spheres generated by the noisy measurements d_ij and r_ik:

  S_ij = {z : ‖z‖ = d_ij},   S_aik = {z : ‖z − a_k‖ = r_ik}.

Nonconvexity of (2.1) follows from the nonconvexity of the building block

  ½ d²_{Sij}(z) = ½ inf_{‖y‖=d_ij} ‖z − y‖².   (2.2)

A simple convexification consists in replacing it by

  ½ d²_{Bij}(z) = ½ inf_{‖y‖≤d_ij} ‖z − y‖²,   (2.3)

where B_ij = {z ∈ Rᵖ : ‖z‖ ≤ d_ij} is the convex hull of S_ij. Actually, (2.3) is the convex envelope¹ of (2.2). This fact is illustrated in Figure 2.1 with a one-dimensional example; a formal proof for the generic case is given in Section 2.7.1.

The terms of (2.1) associated with anchor measurements are similarly relaxed as

  d²_{Baik}(z) = inf_{‖y−a_k‖≤r_ik} ‖z − y‖²,   (2.4)

where the set B_aik is the convex hull of S_aik: B_aik = {z ∈ Rᵖ : ‖z − a_k‖ ≤ r_ik}. Replacing the

nonconvex terms in (2.1) by (2.3) and (2.4) we obtain the convex problem

  minimize_x f(x) = ∑_{i∼j} ½ d²_{Bij}(x_i − x_j) + ∑_i ∑_{k∈A_i} ½ d²_{Baik}(x_i).   (2.5)

The function in Problem (2.5) is an underestimator of (2.1) but it is not the convex envelope of the original function. We argue that in our application of sensor network localization it is generally a very good approximation whose sub-optimality can be quantified, as discussed in Section 2.5.1.

¹The convex envelope (or convex hull) of a function γ is its best possible convex underestimator, i.e., conv γ(x) = sup{η(x) : η ≤ γ, η is convex}, and is hard to determine in general.
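Because projecting onto a sphere or a ball has a closed form, both the sphere term (2.2) and its envelope (2.3) can be evaluated directly: d_{Sij}(z) = |‖z‖ − d_ij| and d_{Bij}(z) = max(‖z‖ − d_ij, 0). A minimal Python sketch (the numeric values are hypothetical, chosen only for illustration) showing that (2.3) underestimates (2.2) and that the two coincide outside the ball:

```python
import math

def phi_sphere(z, d):
    """1/2 squared distance from vector z to the sphere {y : ||y|| = d}, cf. (2.2)."""
    nz = math.sqrt(sum(c * c for c in z))
    return 0.5 * (nz - d) ** 2

def phi_ball(z, d):
    """1/2 squared distance from z to the ball {y : ||y|| <= d} -- the convex envelope (2.3)."""
    nz = math.sqrt(sum(c * c for c in z))
    return 0.5 * max(nz - d, 0.0) ** 2

# The ball term underestimates the sphere term, and the two agree outside the ball.
d = 0.5
inside, outside = (0.2, 0.1), (0.6, 0.4)
assert phi_ball(inside, d) == 0.0 < phi_sphere(inside, d)
assert phi_ball(outside, d) == phi_sphere(outside, d)
```

The flat region of `phi_ball` inside the ball is exactly where the relaxation loses information, which is what the optimality-gap analysis of Section 2.5.1 quantifies.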


The cost function (2.5) also appears in [14], albeit via a different reasoning; our convexification

mechanism seems more intuitive. But the striking difference with respect to [14] is how (2.5) is

exploited to generate distributed solution methods. Whereas [14] lays out a sequential block-coordinate approach, we show that (2.5) is amenable to distributed solutions either via the fast Nesterov's gradient method (for synchronous implementations) or exact/inexact randomized block-coordinate methods (for asynchronous implementations).

2.4 Distributed sensor network localization

We propose two distributed algorithms: a synchronous one, where nodes work in parallel, and

an asynchronous, gossip-like algorithm, where each node starts its processing step according to

some probability distribution. Both algorithms entail computing the gradient of the cost function

and its Lipschitz constant. In order to achieve this it is convenient to rewrite Problem (2.5) as

  minimize_x ½ d²_B(Ax) + ∑_i ∑_{k∈A_i} ½ d²_{Baik}(x_i),   (2.6)

where A = C ⊗ I_p, C is the arc-node incidence matrix of G, I_p is the identity matrix of size p, and B is the Cartesian product of the balls B_ij corresponding to all the edges in E. We denote the two

terms in (2.6) as

  g(x) = ½ d²_B(Ax),   h(x) = ∑_i h_i(x_i),

where h_i(x_i) = ∑_{k∈A_i} ½ d²_{Baik}(x_i). Problems (2.5) and (2.6) are equivalent since Ax is the vector

(x_i − x_j : i ∼ j) and function g(x) in (2.6) can be written as

  g(x) = ½ d²_B(Ax)
       = ½ inf_{y∈B} ‖Ax − y‖²
       = ½ inf_{‖y_ij‖≤d_ij} ∑_{i∼j} ‖x_i − x_j − y_ij‖².

As all the terms are non-negative and the constraint set is a Cartesian product, we can exchange inf with the summation, resulting in

  g(x) = ½ ∑_{i∼j} inf_{‖y_ij‖≤d_ij} ‖x_i − x_j − y_ij‖² = ∑_{i∼j} ½ d²_{Bij}(x_i − x_j),

which is the corresponding term in (2.5).

2.4.1 Gradient and Lipschitz constant of f

To simplify notation, let us define the functions

  φ_{Bij}(z) = ½ d²_{Bij}(z),   φ_{Baik}(z) = ½ d²_{Baik}(z).


Now we call on a key result from convex analysis (see [20, Prop. X.3.2.2, Th. X.3.2.3]): the function in (2.3), φ_{Bij}(z) = ½ d²_{Bij}(z), is convex, differentiable, and its gradient is

  ∇φ_{Bij}(z) = z − P_{Bij}(z),   (2.7)

where P_{Bij}(z) is the orthogonal projection of point z onto the closed convex set B_ij:

  P_{Bij}(z) = argmin_{y∈Bij} ‖z − y‖.

Further, the function φ_{Bij} has a Lipschitz continuous gradient with constant L_φ = 1, i.e.,

  ‖∇φ_{Bij}(x) − ∇φ_{Bij}(y)‖ ≤ ‖x − y‖.   (2.8)

We show (2.8) in Section 2.7.2.
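The identity (2.7) and the unit Lipschitz constant (2.8) are easy to probe numerically; the following sketch (ball center and radius are hypothetical) implements the projection, forms the gradient as z − P_B(z), and checks the nonexpansiveness bound on random point pairs:

```python
import math, random

def proj_ball(z, center, radius):
    """Orthogonal projection of z onto the closed ball {y : ||y - center|| <= radius}."""
    diff = [zi - ci for zi, ci in zip(z, center)]
    n = math.sqrt(sum(d * d for d in diff))
    if n <= radius:
        return list(z)                      # already inside: the projection is z itself
    s = radius / n
    return [ci + s * di for ci, di in zip(center, diff)]

def grad_phi(z, center, radius):
    """Gradient of 1/2 d^2_B(z): z - P_B(z), cf. (2.7)."""
    p = proj_ball(z, center, radius)
    return [zi - pi for zi, pi in zip(z, p)]

# Numerical check of (2.8): ||grad(x) - grad(y)|| <= ||x - y||.
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(2)]
    y = [random.uniform(-2, 2) for _ in range(2)]
    gx, gy = grad_phi(x, [0, 0], 0.5), grad_phi(y, [0, 0], 0.5)
    assert math.dist(gx, gy) <= math.dist(x, y) + 1e-12
```

The bound holds because z − P_B(z) is the gradient of a Moreau-envelope-type function, which is firmly nonexpansive.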

Let us define the function φ_B(y) = ∑_{i∼j} φ_{Bij}(y_ij), acting on the stacked edge vector y; then g(x) = φ_B(Ax). From this relation, and using (2.7), we can compute the gradient of g(x):

  ∇g(x) = A⊤∇φ_B(Ax) = A⊤(Ax − P_B(Ax)) = Lx − A⊤P_B(Ax),   (2.9)

where the second equality follows from (2.7) and L = A⊤A = L ⊗ I_p, with L being the Laplacian

matrix of G. This gradient is Lipschitz continuous and we can obtain an easily computable Lipschitz

constant Lg as follows

  ‖∇g(x) − ∇g(y)‖ = ‖A⊤(∇φ_B(Ax) − ∇φ_B(Ay))‖
                  ≤ |||A||| ‖Ax − Ay‖
                  ≤ |||A|||² ‖x − y‖
                  = λ_max(A⊤A) ‖x − y‖ (a)= λ_max(L) ‖x − y‖
                  ≤ 2δ_max ‖x − y‖,   (2.10)

which yields L_g = 2δ_max. Here |||A||| is the maximum singular value norm; equality (a) is a consequence of Kronecker product properties. In (2.10) we denote the maximum node degree of G by δ_max. A proof of the bound λ_max(L) ≤ 2δ_max can be found in [21]².

The gradient of h is ∇h(x) = (∇h_1(x_1), …, ∇h_n(x_n)), where the gradient of each h_i is

  ∇h_i(x_i) = ∑_{k∈A_i} ∇φ_{Baik}(x_i).   (2.11)

²A tighter bound would be λ_max(L) ≤ max_{i∼j} (δ_i + δ_j − c(i, j)), where δ_i is the degree of node i and c(i, j) is the number of vertices that are adjacent to both i and j [22, Th. 4.13]; nevertheless, 2δ_max is easier to compute in a distributed way.


The gradient of h is also Lipschitz continuous. The constants L_{hi} for ∇h_i follow from

  ‖∇h_i(x_i) − ∇h_i(y_i)‖ ≤ ∑_{k∈A_i} ‖∇φ_{Baik}(x_i) − ∇φ_{Baik}(y_i)‖ ≤ |A_i| ‖x_i − y_i‖,   (2.12)

where |C| is the cardinality of set C. We now have an overall constant L_h for ∇h:

  ‖∇h(x) − ∇h(y)‖ = √(∑_i ‖∇h_i(x_i) − ∇h_i(y_i)‖²)
                  ≤ √(∑_i |A_i|² ‖x_i − y_i‖²)
                  ≤ max{|A_i| : i ∈ V} ‖x − y‖,   (2.13)

so L_h = max{|A_i| : i ∈ V}.

We are now able to write ∇f, the gradient of our cost function, as

  ∇f(x) = Lx − A⊤P_B(Ax) + ( ∑_{k∈A_1} (x_1 − P_{Ba1k}(x_1)), …, ∑_{k∈A_n} (x_n − P_{Bank}(x_n)) ).   (2.14)

A Lipschitz constant L_f is, thus,

  L_f = 2δ_max + max{|A_i| : i ∈ V}.   (2.15)

This constant is easy to precompute through in-network processing by, e.g., a diffusion algorithm (cf. [23, Ch. 9] for more information).
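As an illustration of such a precomputation, a small sketch evaluating (2.15) from per-node degree and anchor counts (the path network and anchor assignments below are hypothetical, not one of the thesis test networks):

```python
def lipschitz_constant(neighbors, anchors):
    """L_f = 2*delta_max + max_i |A_i|, cf. (2.15).

    neighbors[i]: list of neighbors of node i; anchors[i]: anchors heard by node i.
    In a real deployment both maxima would be obtained by in-network diffusion."""
    delta_max = max(len(n) for n in neighbors.values())
    max_anchor = max(len(a) for a in anchors.values())
    return 2 * delta_max + max_anchor

# Hypothetical 4-node path network; nodes 0 and 3 each hear one anchor.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
anchors = {0: ["a0"], 1: [], 2: [], 3: ["a1"]}
print(lipschitz_constant(neighbors, anchors))  # 2*2 + 1 = 5
```

Each node only contributes two local integers (its degree and its anchor count), which is what makes the constant cheap to obtain distributedly.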

Although we restricted ourselves to a fixed (time-invariant) topology, it is easy to show, by taking the worst-case scenario, that a Lipschitz constant can also be computed for time-varying topologies. For this worst-case constant we replace the maximum node degree δ_max and the maximum number of anchors connected to a single node, max{|A_i| : i ∈ V}, by the corresponding values for the complete network, resulting in

  L_f = 2(|V| − 1) + |A|.

Note, however, that with this larger constant the algorithm would be slower, although the constant remains valid for any topology.

In summary, we can compute the gradient of f using Equation (2.14) and a Lipschitz constant

by (2.15), which leads us to the algorithms described in Sections 2.4.2 and 2.4.3 for minimizing f .

2.4.2 Parallel method

Since f has a Lipschitz continuous gradient we can follow Nesterov’s optimal method [24]. Our

approach is detailed in Algorithm 1. In Step 7, c(i∼j,i) is the entry (i ∼ j, i) in the arc-node

incidence matrix C, and δi is the degree of node i.

13

2. Distributed network localization without initialization: Tight convexunderestimator-based procedure

Algorithm 1 Parallel method
Input: L_f; {d_ij : i∼j ∈ E}; {r_ik : i ∈ V, k ∈ A_i};
Output: x
1: k = 0;
2: each node i chooses random x_i(0) = x_i(−1);
3: while some stopping criterion is not met, each node i do
4:   k = k + 1;
5:   w_i = x_i(k−1) + ((k−2)/(k+1)) (x_i(k−1) − x_i(k−2));
6:   node i broadcasts w_i to its neighbors;
7:   ∇g_i(w_i) = δ_i w_i − ∑_{j∈N_i} w_j + ∑_{j∈N_i} c_{(i∼j,i)} P_{Bij}(w_i − w_j);
8:   ∇h_i(w_i) = ∑_{k∈A_i} (w_i − P_{Baik}(w_i));
9:   x_i(k) = w_i − (1/L_f)(∇g_i(w_i) + ∇h_i(w_i));
10: end while
11: return x = x(k)

Parallel nature of Algorithm 1

The updates in Step 9 of the algorithm require the computation of the gradient of the cost

w.r.t. the position of node i. This corresponds to the i-th entry of ∇f , given in (2.14). The last

summand in (2.14) is simply ∇h(x), and the i-th entry of ∇h(x) is given in (2.11). This can be

easily computed independently by each node. The i-th entry of Lx can be computed by node i,

from its current position estimate and the position estimates of the neighbors, in particular, it

holds (Lx)i = δixi−∑j∈Ni

xj . The less obvious parallel term is A>PB(Ax). We start the analysis

by the concatenated projections PB(Ax) = PBij(xi − xj)i∼j∈E . Each one of those projections

only depends on the edge terminals and the noisy measurement dij . The product with A> will

collect, at the entries corresponding to each node, the sum of the projections relative to edges where

it intervenes, with a positive or negative sign depending on the arbitrary edge direction agreed upon

at the onset of the algorithm. More specifically, (A>PB(Ax))i =∑j∈Ni

c(i∼j,i)PBij(xi − xj), aspresented in Step 7 of Algorithm 1.
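To make the update concrete, the following sketch runs Algorithm 1 on a hypothetical two-sensor, two-anchor instance with noiseless ranges (positions and network data are illustrative, not from the thesis experiments); the edge-direction bookkeeping via c_{(i∼j,i)} is absorbed here because projection onto a ball centered at the origin is an odd function:

```python
import math, random

def proj_ball(z, c, r):
    """Orthogonal projection onto the closed ball {y : ||y - c|| <= r}."""
    d = [zi - ci for zi, ci in zip(z, c)]
    n = math.sqrt(sum(t * t for t in d))
    if n <= r:
        return list(z)
    return [ci + (r / n) * di for ci, di in zip(c, d)]

anchors = {0: (0.0, 0.0), 1: (1.0, 0.0)}       # hypothetical anchor positions
truth = {0: (0.3, 0.4), 1: (0.7, 0.4)}          # true sensor positions (unknown to nodes)
d01 = math.dist(truth[0], truth[1])             # noiseless range between the two sensors
alinks = {0: {0: math.dist(truth[0], anchors[0])},   # node 0 hears anchor 0
          1: {1: math.dist(truth[1], anchors[1])}}   # node 1 hears anchor 1
nbrs = {0: [1], 1: [0]}
Lf = 2 * 1 + 1  # (2.15): 2*delta_max + max |A_i|

def grad_i(i, pos):
    """i-th block of the gradient (2.14)."""
    g = [len(nbrs[i]) * pos[i][t] - sum(pos[j][t] for j in nbrs[i]) for t in (0, 1)]
    for j in nbrs[i]:
        p = proj_ball([pos[i][t] - pos[j][t] for t in (0, 1)], (0.0, 0.0), d01)
        g = [g[t] - p[t] for t in (0, 1)]
    for k, rik in alinks[i].items():
        p = proj_ball(pos[i], anchors[k], rik)
        g = [g[t] + pos[i][t] - p[t] for t in (0, 1)]
    return g

def cost(pos):
    """Convex underestimator (2.5) for this instance."""
    f = 0.5 * max(math.dist(pos[0], pos[1]) - d01, 0.0) ** 2
    for i in (0, 1):
        for k, rik in alinks[i].items():
            f += 0.5 * max(math.dist(pos[i], anchors[k]) - rik, 0.0) ** 2
    return f

random.seed(1)
x_prev = {i: [random.random(), random.random()] for i in (0, 1)}
x_curr = {i: list(x_prev[i]) for i in (0, 1)}
for k in range(1, 5001):
    beta = (k - 2) / (k + 1)                    # momentum weight of Step 5
    w = {i: [x_curr[i][t] + beta * (x_curr[i][t] - x_prev[i][t]) for t in (0, 1)]
         for i in (0, 1)}
    x_next = {}
    for i in (0, 1):                            # Steps 7-9, executable in parallel
        g = grad_i(i, w)
        x_next[i] = [w[i][t] - g[t] / Lf for t in (0, 1)]
    x_prev, x_curr = x_curr, x_next

assert cost(x_curr) < 1e-5                      # the O(1/k^2) rate drives f to f* = 0
```

With noiseless ranges the truth attains zero cost in (2.5), so the final assertion simply checks that the accelerated iteration reaches the optimal value.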

2.4.3 Asynchronous method

The method described in Algorithm 1 is fully parallel but still depends on some synchronization

between all the nodes — so that their updates of the gradient are consistent. This requirement

can be inconvenient in some applications of sensor networks; to circumvent it, we present a fully

asynchronous method, achieved by means of a broadcast gossip scheme (cf. [25] for an extended survey of gossip algorithms).

Nodes are equipped with independent clocks ticking at random times (say, as Poisson point

processes). When node i’s clock ticks, it performs the update of its variable xi and broadcasts the

update to its neighbors. Let the order of node activation be collected in {ξ_k}_{k∈N}, a sequence of

14

2.4 Distributed sensor network localization

Algorithm 2 Asynchronous method
Input: L_f; {d_ij : i∼j ∈ E}; {r_ik : i ∈ V, k ∈ A_i};
Output: x
1: each node i chooses random x_i(0);
2: k = 0;
3: while some stopping criterion is not met, each node i do
4:   k = k + 1;
5:   if ξ_k = i then
6:     x_i(k) = argmin_{w_i} f(x_1(k−1), …, w_i, …, x_n(k−1))
7:   else
8:     x_i(k) = x_i(k−1)
9:   end if
10: end while
11: return x = x(k)

independent random variables taking values on the set V, such that

  P(ξ_k = i) = P_i > 0.   (2.16)

Then, the asynchronous update of variable xi on node i can be described as in Algorithm 2.

To compute the minimizer in Step 6 of Algorithm 2 it is useful to recast Problem (2.6) as

  minimize_x ∑_i ( ∑_{j∈N_i} ¼ d²_{Bij}(x_i − x_j) + ∑_{k∈A_i} ½ d²_{Baik}(x_i) ),   (2.17)

where the factor ¼ accounts for the duplicate terms when considering summations over nodes instead of over edges. By fixing the neighbor positions, each node solves a single-source localization

problem; this setup leads to the Problem

  minimize_{x_i} f_{sli}(x_i) := ∑_{j∈N_i} ¼ d²_{Bsij}(x_i) + ∑_{k∈A_i} ½ d²_{Baik}(x_i),   (2.18)

where Bs_ij = {z ∈ Rᵖ : ‖z − x_j‖ ≤ d_ij}. We call the reader's attention to the fact that the function in (2.18) is continuous and coercive; thus, the optimization problem (2.18) has a solution.

We solve (2.18) at each node by employing Nesterov’s optimal accelerated gradient method as

described in Algorithm 3. The asynchronous method proposed in Algorithm 2 converges to the set

of minimizers of function f , as established in Theorem 2, in Section 2.5.

We also propose an inexact version in which nodes do not solve Problem (2.18) but instead take just one gradient step. That is, simply replace Step 6 in Algorithm 2 by

  x_i(k) = x_i(k−1) − (1/L_f) ∇_i f(x(k−1)),   (2.19)

where ∇_i f(x_1, …, x_n) is the gradient with respect to x_i, and assume

  P(ξ_k = i) = 1/n.   (2.20)

The convergence of the resulting algorithm is established in Theorem 3, Section 2.5.
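A minimal sketch of this inexact variant on a hypothetical one-dimensional instance (anchors at 0 and 3, two sensors truly at 1 and 2, noiseless ranges; all values illustrative): each ball term of (2.5) contributes u − P(u) to the block gradient, where P is here just a clamp onto an interval.

```python
import random

def clamp(u, lo, hi):
    return max(lo, min(hi, u))

# Hypothetical 1-D instance: d12 = 1, anchor balls [-1, 1] and [2, 4].
d12, Lf = 1.0, 3.0  # Lf = 2*delta_max + max|A_i| = 3, cf. (2.15)

def grad(i, x):
    """Block gradient nabla_i f: each ball term contributes u - P(u)."""
    if i == 0:
        u = x[0] - x[1]
        return (u - clamp(u, -d12, d12)) + (x[0] - clamp(x[0], -1.0, 1.0))
    u = x[1] - x[0]
    return (u - clamp(u, -d12, d12)) + (x[1] - clamp(x[1], 2.0, 4.0))

def cost(x):
    t1 = (x[0] - x[1]) - clamp(x[0] - x[1], -d12, d12)
    t2 = x[0] - clamp(x[0], -1.0, 1.0)
    t3 = x[1] - clamp(x[1], 2.0, 4.0)
    return 0.5 * (t1 * t1 + t2 * t2 + t3 * t3)

random.seed(0)
x = [-2.0, 5.0]
for _ in range(20000):
    i = random.randrange(2)      # uniform random activation, cf. (2.20)
    x[i] -= grad(i, x) / Lf      # one inexact gradient step, cf. (2.19)
assert cost(x) < 1e-6
```

Each activated node performs a single step with the global constant 1/L_f, so the cost is nonincreasing along the random activations, which is the mechanism behind the almost sure convergence results of Section 2.5.3.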


Algorithm 3 Asynchronous update at each node i
Input: ξ_k; L_f; {d_ij : j ∈ N_i}; {r_ik : k ∈ A_i};
Output: x_i(k)
1: if ξ_k ≠ i then
2:   x_i(k) = x_i(k−1);
3:   return x_i(k);
4: end if
5: choose random z(0) = z(−1);
6: l = 0;
7: while some stopping criterion is not met do
8:   l = l + 1;
9:   w = z(l−1) + ((l−2)/(l+1)) (z(l−1) − z(l−2));
10:  ∇f_{sli}(w) = ½ ∑_{j∈N_i} (w − P_{Bsij}(w)) + ∑_{k∈A_i} (w − P_{Baik}(w));
11:  z(l) = w − (1/L_f) ∇f_{sli}(w);
12: end while
13: return x_i(k) = z(l)

2.5 Analysis

A relevant question regarding Algorithms 1 and 2 is whether they will return a good solution to

the problem they are designed to solve, after a reasonable amount of computations. Sections 2.5.2

and 2.5.3 address convergence issues of the proposed methods, and discuss some of the assumptions

on the problem data. Section 2.5.1 provides a formal bound for the gap between the original and

the convexified problems.

2.5.1 Quality of the convexified problem

While evaluating any approximation method it is important to know how far the approximate

optimum is from the original one. In this Section we will focus on this analysis.

It was already noted in Section 2.3 that φ_{Bij}(z) = φ_{Sij}(z) for ‖z‖ ≥ d_ij; when the functions differ, for ‖z‖ < d_ij, we have φ_{Bij}(z) = 0. The same applies to the terms related to anchor measurements. Denoting the original nonconvex cost in (2.1) by f̂, its optimal value f̂⋆ is bounded by

  f⋆ = f(x⋆) ≤ f̂⋆ ≤ f̂(x⋆),

where x⋆ is the minimizer of the convexified problem (2.5), f⋆ is the optimal value of (2.5), and

  f̂⋆ = inf_x f̂(x)

is the minimum of the function f̂. With these inequalities we can compute a bound for the optimality


Figure 2.2: One-dimensional example of the quality of the approximation of the true nonconvex cost f̂(x) by the convexified function f(x) in a star network. Here the node positioned at x = 3 has 3 neighbors.

gap, after (2.5) is solved, as

  f̂⋆ − f⋆ ≤ f̂(x⋆) − f⋆
          = ∑_{i∼j∈E} ½ (d²_{Sij}(x⋆_i − x⋆_j) − d²_{Bij}(x⋆_i − x⋆_j)) + ∑_{i∈V} ∑_{k∈A_i} ½ (d²_{Saik}(x⋆_i) − d²_{Baik}(x⋆_i))
          = ∑_{i∼j∈E₂} ½ d²_{Sij}(x⋆_i − x⋆_j) + ∑_{i∈V} ∑_{k∈A₂ᵢ} ½ d²_{Saik}(x⋆_i).   (2.21)

In Equation (2.21), we denote by E₂ = {i∼j ∈ E : d²_{Bij}(x⋆_i − x⋆_j) = 0} the set of edges where the distance between the estimated positions is less than the distance measurement, and similarly A₂ᵢ = {k ∈ A_i : d²_{Baik}(x⋆_i) = 0}. Inequality (2.21) suggests a simple method to compute a bound for the optimality gap of the solution returned by the algorithms:

1. Compute the optimal solution x⋆ using Algorithm 1 or 2;

2. Select the terms of the convexified problem (2.5) which are zero;

3. Add the nonconvex costs of each of these edges, as in (2.21).
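The three steps above can be sketched as follows (pure Python, with hypothetical network data structures; `math.dist` plays the role of ‖·‖):

```python
import math

def gap_bound(pos, edges, anchors, alinks):
    """A posteriori bound (2.21): add the nonconvex cost of every term whose
    ball-relaxed cost is zero at the returned estimate pos."""
    b = 0.0
    for (i, j), d in edges.items():
        sep = math.dist(pos[i], pos[j])
        if sep <= d:                       # edge belongs to E2: d^2_Bij = 0
            b += 0.5 * (sep - d) ** 2      # nonconvex term 1/2 d^2_Sij
    for i, links in alinks.items():
        for k, r in links.items():
            sep = math.dist(pos[i], anchors[k])
            if sep <= r:                   # k belongs to A2_i
                b += 0.5 * (sep - r) ** 2
    return b

def apriori_bound(edges, alinks):
    """A priori bound (2.22), independent of the estimate."""
    return (0.5 * sum(d * d for d in edges.values())
            + 0.5 * sum(r * r for l in alinks.values() for r in l.values()))

# Hypothetical single-sensor example: the estimate sits strictly inside the anchor ball.
pos, anchors = {0: (0.2, 0.0)}, {0: (0.0, 0.0)}
edges, alinks = {}, {0: {0: 0.5}}
print(gap_bound(pos, edges, anchors, alinks))   # about 0.5*(0.2-0.5)^2 = 0.045
print(apriori_bound(edges, alinks))             # 0.5*0.5^2 = 0.125
```

Only terms that went flat in the relaxation contribute, which is why the a posteriori bound is typically far tighter than (2.22).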

Our bound is tighter than the one (available a priori) from applying [26, Th. 1], which is

  f̂⋆ − f⋆ ≤ ∑_{i∼j∈E} ½ d²_ij + ∑_{i∈V} ∑_{k∈A_i} ½ r²_ik.   (2.22)

For the one-dimensional example of the star network costs depicted in Figure 2.2, the bounds in (2.21) and (2.22), averaged over 500 Monte Carlo trials, are presented in Table 2.1. The true average gap f̂⋆ − f⋆ is also shown. In the Monte Carlo trials we sampled a zero mean

Gaussian random variable with σ = 0.25 and obtained a noisy range measurement as described

later by (2.31).


Table 2.1: Bounds on the optimality gap for the example in Figure 2.2

  f̂⋆ − f⋆   Equation (2.21)   Equation (2.22)
  0.0367     0.0487            3.0871

Figure 2.3: 2D network example to assess the quality of the bound in Equation (2.21). Blue squares stand for anchors, while the red star is a sensor with unknown position.

A two-dimensional example was also produced to check whether the bound is informative in 2D. Our bound is of the same order of magnitude as the true optimality gap, whereas the bound in (2.22) is two orders of magnitude greater. For the simple example in Figure 2.3 we obtain the

results of Table 2.2.

These results show the tightness of the convexified function and how loose the bound (2.22) is

when applied to our problem.

2.5.2 Parallel method: convergence guarantees and iteration complexity

As Problem (2.6) is convex and the cost function has a Lipschitz continuous gradient, Algorithm 1 is known to converge at the optimal rate O(1/k²) [24], [27]:

  f(x(k)) − f⋆ ≤ (2L_f / (k+1)²) ‖x(0) − x⋆‖².

Table 2.2: Bounds on the optimality gap for the 2D example in Figure 2.3

  f̂⋆ − f⋆   Equation (2.21)   Equation (2.22)
  4.5801     6.0899            384.1226


2.5.3 Asynchronous method: convergence guarantees and iteration complexity

To state the convergence properties of Algorithm 2 we only need Assumption 1; it is used to

prove coerciveness of the relaxed cost in (2.5).

Assumption 1. There is at least one anchor linked to some sensor and the graph G is connected

(there is a path between any two sensors).

This assumption holds generally as one needs p+ 1 anchors to eliminate translation, rotation,

and flip ambiguities while performing localization in Rp, which exceeds the assumption requirement.

We present two convergence results (Theorems 2 and 3) and the iteration complexity

analysis for Algorithm 2 in Proposition 4. Proofs of the Theorems are detailed in Section 2.7.

The following Theorem establishes the almost sure (a.s.) convergence of Algorithm 2.

Theorem 2 (Almost sure convergence of Algorithm 2). Let {x(k)}_{k∈N} be the sequence of points produced by Algorithm 2, or by Algorithm 2 with the update (2.19), and let X⋆ = {x⋆ : f(x⋆) = f⋆} be the set of minimizers of the function f defined in (2.5). Then it holds that

  lim_{k→∞} d_{X⋆}(x(k)) = 0, a.s.   (2.23)

In words, with probability one, the iterates x(k) will approach the set X⋆ of minimizers of f; this does not imply that {x(k)}_{k∈N} will converge to one single x⋆ ∈ X⋆, but it does imply that lim_{k→∞} f(x(k)) = f⋆, since X⋆ is a compact set, as proved in Section 2.7, Lemma 5.

Theorem 3 (Almost sure convergence to a point). Let {x(k)}_{k∈N} be a sequence of points generated by Algorithm 2, with the update (2.19) in Step 6, and let all nodes start computations with uniform probability. Then, with probability one, there exists a minimizer of f, denoted by x⋆ ∈ X⋆, such that

  x(k) → x⋆.   (2.24)

This result not only tells us that the iterates of Algorithm 2 with the modified Step 6 stated in

Equation (2.19) converge to the solution set, but it also guarantees that they will not be jumping

around the solution set X ? (unlikely to occur in Algorithm 2, but not ruled out by the analysis).

One of the practical benefits of Theorem 3 is that the stopping criterion can safely probe the

stability of the estimates along iterations. To the best of our knowledge, this kind of strong type

of convergence (the whole sequence converges to a point in X ?) was not established previously in

the context of randomized approaches for convex functions with Lipschitz continuous gradients,

though it was derived previously for randomized proximal-based minimizations of a large number

of convex functions, cf. [28, Proposition 9].

Proposition 4 (Iteration complexity for Algorithm 2). Let {x(k)}_{k∈N} be a sequence of points generated by Algorithm 2, with the update (2.19) in Step 6, and let the nodes be activated with equal probability. Choose 0 < ε < f(x(0)) − f⋆ and ρ ∈ (0, 1). There exists a constant b(ρ, x(0)) such that

  P(f(x(k)) − f⋆ ≤ ε) ≥ 1 − ρ   (2.25)

for all

  k ≥ K = 2n b(ρ, x(0)) / ε + 2 − n.   (2.26)

The constant b(ρ, x(0)) can be computed from inequality (19) in [29]; it depends only on the initialization and the chosen ρ. Recall that n is the number of sensor nodes. Proposition 4 says that, with high probability, the function value f(x(k)) for all k ≥ K will be within ε of the optimal value, and the number of iterations K depends inversely on the chosen ε.

Proof of Proposition 4. As f is differentiable and has Lipschitz gradient, the result trivially follows

from [29, Th. 2].

A natural question to pose is whether the strong convergence properties of the inexact version still apply to the exact version. Actually, we can disprove this with a small toy example.

A toy example. The explanation of this counter-intuitive phenomenon lies in the lack of unique-

ness of minimizers for the function in (2.18), as this function is not necessarily strictly convex at

all iterations. The ambiguity in selecting minimizers across iterations may generate oscillations.

Consider a network localization problem in R with two anchors placed at 0 and 3, and two

nodes placed at 1 and 2. Assume that node 1 measures its distance to anchor 1 with no noise,

node 2 measures its distance to anchor 2 with no noise, and nodes 1 and 2 measure their mutual

distance (with noise) as 1.2. The problem we face is the minimization of

  f(x₁, x₂) = ½ d²_B(x₁ − x₂) + ½ d²_{A₁}(x₁) + ½ d²_{A₂}(x₂),   (2.27)

where A₁ = [−1, 1], A₂ = [2, 4] and B = [−1.2, 1.2].

Consider the initialization x₂(0) = 2.6 and assume that nodes minimize (2.27) alternately:

  x₁(k+1) = argmin_{x₁} f(x₁, x₂(k)),   x₂(k+1) = argmin_{x₂} f(x₁(k+1), x₂),   (2.28)

for k = 0, 1, …. It is straightforward to check that the assignments x₁(1) = 1.2, x₂(k) = 2.05 + 0.05(−1)ᵏ for k ≥ 1, and x₁(k) = 1 − 0.1/k for k ≥ 2 obey (2.28), i.e., are valid algorithm outputs.

In this example x1(k) converges whereas x2(k) oscillates (we can adjust the example such that

both oscillate in the optimal set). Note, however, that x1(k) and x2(k) are optimal for (2.27) as

soon as k ≥ 2 (of course, for larger networks, optimality cannot be certified at a single node as in

this simple scenario).

The subproblem faced by node 2 depends only on x₁(k+1). Thus, selecting one minimizer (when it has many) corresponds to establishing a "rule" for each given x₁(k+1). The example shows that not every rule will lead to strong convergence.


Figure 2.4: Proximal minimization evolution for the toy problem: iterates x₁(k) (lower curve, blue) and x₂(k) (upper curve, red) for k = 1, …, 30.

Proximal minimization. A possible approach to circumvent non-uniqueness of minimizers

in (2.18) is to add a proximal term (as this makes the function strictly convex). In the context of

the toy problem, this translates into replacing (2.28) with

  x₁(k+1) = argmin_{x₁} f(x₁, x₂(k)) + (c/2)(x₁ − x₁(k))²,   (2.29)

and

  x₂(k+1) = argmin_{x₂} f(x₁(k+1), x₂) + (c/2)(x₂ − x₂(k))²   (2.30)

for some c > 0, possibly time-varying. Problems (2.29)-(2.30) now have unique solutions at all iterations. However, the proximal terms tend to slow down convergence. With the same initialization as above (x₂(0) = 2.6, x₁(1) = 1.2) and c = 1, Figure 2.4 shows the first 30 iterations of (2.29)-(2.30), and Figure 2.5 shows the corresponding cost function values (2.27). We see that optimality is not reached after 30 iterations (recall that, for (2.28), it is attained at the 2nd iterate, with zero cost).

Other approaches. Another option would be to set a systematic rule for selecting minimizers whenever there are many, for example always picking the one with lowest norm. Intuitively, this


Figure 2.5: Function values f(x₁(k), x₂(k)), cf. (2.27), corresponding to the iterates in Figure 2.4.


should stabilize iterations, but the implementation of such a rule would complicate the numerical solution of the inner problems (2.18) substantially (also, the theoretical analysis seems very challenging).

In sum, given that oscillations of the iterations for the exact version of Algorithm 2 are rarely

observed in practice (the example above is highly artificial), it is unclear if alternative approaches

that secure strong convergence are worth pursuing, from a practical standpoint. Note that our

inexact version that guarantees strong convergence is both simple to implement and to certify

theoretically.

2.6 Numerical experiments

In this Section we present experimental results that demonstrate the superior performance of our methods when compared with four state-of-the-art algorithms: Euclidean Distance Matrix (EDM)

completion presented in [6], Semidefinite Program (SDP) relaxation and Edge-based Semidefinite

Program (ESDP) relaxation, both implemented in [3], and a sequential projection method (PM)

in [14] optimizing the same convex underestimator as the present work, with a different algorithm.

The first two methods (EDM completion and SDP relaxation) are centralized, whereas the ESDP relaxation and PM are distributed.

Setup

We conducted simulations with two uniquely localizable geometric networks with sensors distributed in a two-dimensional square of unit area, with 4 anchors in the corners. Network 1, depicted in Figure 2.6, has 10 sensor nodes with an average node degree³ of 4.3, while network 2, shown in Figure 2.7, has 50 sensor nodes and an average node degree of 6.1. The ESDP method was only evaluated on network 1 due to simulation time constraints, since it involves solving an SDP at each node at each iteration. The noisy range measurements are generated according to

  d_ij = | ‖x⋆_i − x⋆_j‖ + ν_ij |,   r_ik = | ‖x⋆_i − a_k‖ + ν_ik |,   (2.31)

where x⋆_i is the true position of node i, and {ν_ij : i ∼ j ∈ E} ∪ {ν_ik : i ∈ V, k ∈ A_i} are independent Gaussian random variables with zero mean and standard deviation σ. The accuracy

of the algorithms is measured by the original nonconvex cost in (1.1) and by the Root Mean

Squared Error (RMSE) per sensor, defined as

RMSE =

√√√√ 1

n

(1

M

M∑m=1

‖x? − x(m)‖2), (2.32)

where M is the number of Monte Carlo trials performed.

³To characterize the networks used we resort to the concepts of node degree k_i, which is the number of edges connected to node i, and average node degree ⟨k⟩ = (1/n) ∑_{i=1}^{n} k_i.


2. Distributed network localization without initialization: Tight convex underestimator-based procedure

Figure 2.6: Network 1. Topology with 4 anchors and 10 sensors. Anchors are marked with blue squares and sensors with red stars.


Figure 2.7: Network 2. Topology with 4 anchors and 50 sensors. Anchors are also marked with blue squares and sensors with red stars.


Figure 2.8: Relaxation quality: root mean square error comparison of EDM completion in [6], SDP relaxation in [3], and the disk relaxation (2.5); measurements were perturbed with noise with different values of the standard deviation σ. The disk relaxation approach in (2.5) improved on the RMSE values of both EDM completion and SDP relaxation for all noise levels, even though it does not rely on the SDP machinery. The performance gap to EDM completion is substantial.

2.6.1 Assessment of the convex underestimator performance

The first experiment aimed at comparing the performance of the convex underestimator (2.5) with two other state-of-the-art convexifications. For the proposed disk relaxation (2.5), Algorithm 1 was stopped when the gradient norm $\|\nabla f(x)\|$ reached $10^{-6}$, while both EDM completion and SDP relaxation were solved with the default SeDuMi solver [30] with eps $= 10^{-9}$, so that algorithm properties did not mask the real quality of the relaxations. Figures 2.8 and 2.9 report the results of the experiment with 50 Monte Carlo trials over network 2 and measurement noise with $\sigma \in \{0.01, 0.05, 0.1, 0.3\}$; we thus had a total of 200 runs, equally divided among the 4 noise levels. In Figure 2.8 we can see that the disk relaxation in (2.5) has better performance for all noise levels.

Figure 2.9 depicts the results of optimizing the three convex functions for the same problems, as RMSE versus execution time, which reflects, albeit imperfectly, the complexities of the considered algorithms. The convex surrogate (2.5) used in the present work, combined with our methods, is faster by at least one order of magnitude.

We tested all convex relaxations for robustness to sensors outside the convex hull of the anchors, and they all performed worse in such conditions. This type of behavior has been previously noted by several authors. The noise-free network with 10 nodes and 4 anchors is depicted in Figure 2.6. Notice that there are some sensors placed near the boundary of the anchors' convex hull. The result of 40 Monte Carlo runs is shown in Figure 2.10. This plot is illustrative of the behavior of the tested algorithms: both Algorithm 1 and the centralized version of [3] are somewhat better at the more interior nodes, like 9 and 7, and perform worse at the more peripheral nodes, near the boundary of the anchors' convex hull, like 5 and 14.


Figure 2.9: Relaxation quality: comparison of the best achievable root mean square error versus overall execution time of the algorithms. Measurements were contaminated with noise with σ = 0.1. Although the disk relaxation (2.5) has a distributed implementation, running it sequentially can be faster by one order of magnitude than the centralized methods.

Figure 2.10: Estimates for the locations of the sensor nodes (measurement noise σ = 0.01), based on 40 Monte Carlo trials, for network 1, shown in Figure 2.6. Red dots express the output of Algorithm 1, blue circles indicate the estimates of the centralized version of [3], and yellow stars represent the EDM completion algorithm in [6].


Figure 2.11: Performance of the proposed method in Algorithm 1 and of the projection method presented in [14], in RMSE versus communications per sensor, for noise levels σ = 0.01 and σ = 0.05. The stopping criterion for both algorithms was a relative improvement of $10^{-6}$ in the estimate. The proposed method uses fewer communications to achieve better RMSE for the tested noise levels. Our method outperforms the projection method with one fourth the number of communications for a noise level of 0.01.

2.6.2 Performance of distributed optimization algorithms

To measure the performance of the presented Algorithm 1 in a distributed setting, we compared it with the state-of-the-art method in [14] and the distributed algorithm in [3]. The results are shown, respectively, in Figures 2.11 and 2.12. The experimental setups differ, since the authors proposed different stopping criteria for their algorithms; in order to make a fair comparison, we ran our algorithm with the specific criterion set by each benchmark method. Also, to compare with the distributed ESDP method in [3], we had to use a smaller network of 10 sensors because of simulation time constraints — as the ESDP method entails solving an SDP problem at each node, the simulation time becomes prohibitively large, at least using a general-purpose solver. The number of Monte Carlo trials was 32, with 3 noise levels, leading to 96 noise realizations in total. In the experiment illustrated in Figure 2.11, the stopping criterion for both the projection method and the presented method was the relative improvement of the solution; we stress that this is not a distributed stopping criterion, and we adopted it just for the sake of algorithm comparison. We can see that the proposed method fares better not only in RMSE but, foremost, in communication cost. This experiment comprised 120 Monte Carlo trials and two noise levels.

Table 2.3: Number of communications per sensor for the results in Fig. 2.12

    ESDP method    Algorithm 1
    21600          2000

From the analysis of both Figure 2.12 and Table 2.3 we can see that the ESDP method is one order of magnitude worse in RMSE than Algorithm 1, while using one order of magnitude more communications.


Figure 2.12: Performance of the proposed method in Algorithm 1 and of the ESDP method in [3], in RMSE versus measurement noise σ. The stopping criterion for both algorithms was the number of algorithm iterations. The performance advantage of the proposed method in Algorithm 1 is even more remarkable when considering the number of communications presented in Table 2.3.

Figure 2.13: Final cost of the parallel Algorithm 1 and its asynchronous counterpart in Algorithm 2 with an exact update, for the same number of communications. Results for the asynchronous version degrade less than those of the parallel one as the noise level increases. The stochastic Gauss-Seidel iterations prove to be more robust to intense noise.

2.6.3 Performance of the asynchronous algorithm

A second set of experiments examined the performance of the parallel and asynchronous flavors of our method, presented respectively in Algorithms 1 and 2, the latter with exact updating. The metric was the value of the convex cost function $f$ in (2.5) evaluated at each algorithm's estimate of the minimum. For fairness, both algorithms were allowed to run until they reached a preset number of communications. Figure 2.13 presents the effectiveness of both algorithms in optimizing the disk relaxation cost (2.5) with the same amount of communications. We chose the random variables $\xi_k$, representing the sequence of updating nodes in the asynchronous version of our method, to be uniformly distributed. Again, we ran 50 Monte Carlo trials, each with 3 noise levels, thus leading to 150 samplings of the noise variables in (2.31).


The more robust behavior of the asynchronous version is a phenomenon empirically observed in other optimization algorithms when comparing deterministic and randomized versions. In [31, Section 6.3.5] the authors prove that, for a fixed-point algorithm with given properties and "with bounded communication delays, the convergence rate is geometric and, under certain conditions, it is superior to the convergence rate of the corresponding synchronous iteration". Also, in [28] the author states that, for the problem considered in the cited paper, "the randomized order provides a worst-case performance advantage over the cyclic order". Our numerical experiments suggest a similar behavior of our Algorithms 1 and 2, but we do not have further theoretical support for these observations.

2.7 Proofs

2.7.1 Convex envelope

We show that the function in (2.3) is the convex envelope of the function in (2.2). Refer to $\alpha$ as the function in (2.2) and $\beta$ as the function in (2.3). We show that $\alpha^{\star\star} = \beta$, where $f^\star$ denotes the Fenchel conjugate of a function $f$, cf. [20, Cor. 1.3.6, p. 45, v. 2].

We start by computing $\alpha^\star$:
$$
\begin{aligned}
\alpha^\star(s) &= \sup_z \; s^\top z - \alpha(z)\\
&= \sup_z \; s^\top z - \Bigl(\tfrac{1}{2}\inf_{\|y\| = d_{ij}} \|z - y\|^2\Bigr)\\
&= \sup_z \sup_{\|y\| = d_{ij}} \; s^\top z - \tfrac{1}{2}\|z - y\|^2\\
&= \sup_{\|y\| = d_{ij}} \sup_z \; s^\top z - \tfrac{1}{2}\|z - y\|^2\\
&= \sup_{\|y\| = d_{ij}} \; \tfrac{1}{2}\|s\|^2 + s^\top y\\
&= \tfrac{1}{2}\|s\|^2 + d_{ij}\|s\|.
\end{aligned}
$$

Thus, $\alpha^\star$ is the sum of two closed convex functions: $\alpha^\star = g + h$, where $g(s) = \frac{1}{2}\|s\|^2$ and $h(s) = d_{ij}\|s\|$. Note that $h(s) = \sigma_{B(0, d_{ij})}(s)$, where $\sigma_C(s) = \sup\{s^\top x : x \in C\}$ denotes the support function of a set $C$. Thus, using [20, Th. 2.3.1, p. 61, v. 2], we have
$$\alpha^{\star\star}(z) = \inf_{z_1 + z_2 = z} g^\star(z_1) + h^\star(z_2).$$
Since $g^\star(z_1) = \frac{1}{2}\|z_1\|^2$ [20, Ex. 1.1.3, p. 38, v. 2] and $h^\star(z_2) = i_{B_{ij}}(z_2)$ [20, Ex. 1.1.5, p. 39, v. 2], where $i_C(x) = 0$ if $x \in C$ and $i_C(x) = +\infty$ if $x \notin C$ denotes the indicator of a set $C$, we conclude that
$$\alpha^{\star\star}(z) = \inf_{z_1 + z_2 = z} \tfrac{1}{2}\|z_1\|^2 + i_{B_{ij}}(z_2) = \inf_{z_2 \in B_{ij}} \tfrac{1}{2}\|z - z_2\|^2 = \beta(z).$$
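The envelope relation can also be checked numerically. In closed form, $\alpha(z) = \frac{1}{2}(\|z\| - d_{ij})^2$ (half the squared distance to the sphere) and $\beta(z) = \frac{1}{2}\max(\|z\| - d_{ij}, 0)^2$ (half the squared distance to the ball), so $\beta$ underestimates $\alpha$ everywhere and coincides with it outside the ball. A small sketch with an illustrative radius:

```python
import numpy as np

d_ij = 1.0  # illustrative radius

def alpha(z):
    """0.5 * squared distance from z to the sphere {y : ||y|| = d_ij}, cf. (2.2)."""
    return 0.5 * (np.linalg.norm(z) - d_ij) ** 2

def beta(z):
    """0.5 * squared distance from z to the ball {y : ||y|| <= d_ij}, cf. (2.3)."""
    return 0.5 * max(np.linalg.norm(z) - d_ij, 0.0) ** 2

pts = [np.array([t, 0.3]) for t in np.linspace(-2, 2, 41)]
# beta underestimates alpha everywhere...
assert all(beta(z) <= alpha(z) + 1e-12 for z in pts)
# ...and matches it outside the ball, where the envelope is tight.
assert all(abs(beta(z) - alpha(z)) < 1e-12 for z in pts if np.linalg.norm(z) >= d_ij)
```

Inside the ball, $\beta$ is identically zero while $\alpha$ is positive, which is exactly where the envelope gains convexity.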


2.7.2 Lipschitz constant of $\nabla\phi_{B_{ij}}$

We prove the inequality in (2.8),
$$\bigl\|\nabla\phi_{B_{ij}}(x) - \nabla\phi_{B_{ij}}(y)\bigr\| \le \|x - y\|, \quad (2.33)$$
where $\nabla\phi_{B_{ij}}(z) = z - P_{B_{ij}}(z)$, and $P_{B_{ij}}(z)$ is the projector onto $B_{ij} = \{z \in \mathbb{R}^p : \|z\| \le d_{ij}\}$. Squaring both sides of (2.33) gives the equivalent inequality
$$2(P(x) - P(y))^\top (x - y) - \|P(x) - P(y)\|^2 \ge 0, \quad (2.34)$$
where, to simplify notation, we let $P(z) := P_{B_{ij}}(z)$. Inequality (2.34) can be rewritten as
$$(P(x) - P(y))^\top (x - y) + (P(x) - P(y))^\top (P(y) - y) + (P(x) - P(y))^\top (x - P(x)) \ge 0. \quad (2.35)$$
By the properties of projectors onto closed convex sets, $(z - P(z))^\top (w - P(z)) \le 0$ for any $w \in B_{ij}$ and any $z$, cf. [20, Th. 3.1.1, p. 117, v. 1]. Thus, the last two terms on the left-hand side of (2.35) are nonnegative. Moreover, the first term is nonnegative due to [20, Prop. 3.1.3, p. 118, v. 1]. Inequality (2.35) is proved.
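A quick numerical sanity check of inequality (2.33): sampling random pairs of points and comparing $\|\nabla\phi_{B_{ij}}(x) - \nabla\phi_{B_{ij}}(y)\|$ against $\|x - y\|$ (the radius and dimension below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d_ij, p = 1.0, 3  # illustrative ball radius and ambient dimension

def proj_ball(z, d):
    """Projection onto B_ij = {z : ||z|| <= d}: scale down iff z lies outside."""
    nz = np.linalg.norm(z)
    return z if nz <= d else z * (d / nz)

def grad_phi(z):
    """Gradient of phi_Bij: z - P_Bij(z)."""
    return z - proj_ball(z, d_ij)

# Check the Lipschitz inequality (2.33) on random pairs of points.
for _ in range(1000):
    x, y = rng.normal(size=p) * 3, rng.normal(size=p) * 3
    assert np.linalg.norm(grad_phi(x) - grad_phi(y)) <= np.linalg.norm(x - y) + 1e-12
print("inequality (2.33) holds on all sampled pairs")
```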

2.7.3 Auxiliary Lemmas

In this Section we establish basic properties of Problem (2.6) in Lemma 5, as well as two technical lemmas, instrumental in proving our convergence results in Theorem 2.

Lemma 5 (Basic properties). Let $f$ be as defined in (2.5). Then the following properties hold:

1. $f$ is coercive;
2. $f^\star \ge 0$ and $\mathcal{X}^\star \ne \emptyset$;
3. $\mathcal{X}^\star$ is compact.

Proof.

1. By Assumption 1 there is a path from each node $i$ to some node $j$ which is connected to an anchor $k$. If $\|x_i\| \to \infty$ then there are two cases: (1) there is at least one edge $t \sim u$ along the path from $i$ to $j$ where $\|x_t\| \to \infty$ and $\|x_u\| \not\to \infty$, and so $d^2_{B_{tu}}(x_t - x_u) \to \infty$; (2) $\|x_u\| \to \infty$ for all $u$ in the path between $i$ and $j$, in which case in particular $\|x_j\| \to \infty$ and so $d^2_{Ba_{jk}}(x_j) \to \infty$. In both cases $f \to \infty$; thus, $f$ is coercive.

2. Function $f$ defined in (2.5) is a sum of squares; it is a continuous, convex, real-valued function, lower bounded by zero, so the infimum $f^\star$ exists and is nonnegative. To prove this infimum is attained and $\mathcal{X}^\star \ne \emptyset$, we consider the set $T = \{x : f(x) \le \alpha\}$; $T$ is a sublevel set of a continuous, coercive function and, thus, it is compact. As $f$ is continuous, by the Weierstrass Theorem the value $p = \inf_{x \in T} f(x)$ is attained; the equality $f^\star = p$ is evident.


3. $\mathcal{X}^\star$ is a sublevel set of a continuous, coercive function and, thus, compact.

Lemma 6. Let $\{x(k)\}_{k \in \mathbb{N}}$ be the sequence of iterates of Algorithm 2, or of Algorithm 2 with the update (2.19), and $\nabla f(x(k))$ be the gradient of the function $f$ evaluated at each iterate. Then,

1. $\sum_{k \ge 1} \|\nabla f(x(k))\|^2 < \infty$, a.s.;
2. $\nabla f(x(k)) \to 0$, a.s.

Proof. Let $\mathcal{F}_k = \sigma(x(0), \ldots, x(k))$ be the sigma-algebra generated by all the algorithm iterates up to time $k$. We are interested in $E[f(x(k)) \,|\, \mathcal{F}_{k-1}]$, the expected cost value at the $k$th iteration, given the knowledge of the past $k-1$ iterations. Firstly, let us examine the function $\phi : \mathbb{R}^p \to \mathbb{R}$, the slice of $f$ along a coordinate direction, $\phi(y) = f(x_1, \ldots, x_{i-1}, y, x_{i+1}, \ldots, x_n)$. As $f$ has a Lipschitz continuous gradient with constant $L_f$, so does $\phi$: $\|\nabla\phi(y) - \nabla\phi(z)\| \le L_f \|y - z\|$ for all $y$ and $z$, and, thus, it inherits the property
$$\phi(y) \le \phi(z) + \langle \nabla\phi(z), y - z \rangle + \frac{L_f}{2}\|y - z\|^2. \quad (2.36)$$
Inequality (2.36) is known as the Descent Lemma [32, Prop. A.24]. The minimizer of the quadratic upper bound in (2.36) is $z - \frac{1}{L_f}\nabla\phi(z)$, which can be plugged back into (2.36), obtaining
$$\phi^\star \le \phi\Bigl(z - \frac{1}{L_f}\nabla\phi(z)\Bigr) \le \phi(z) - \frac{1}{2L_f}\|\nabla\phi(z)\|^2. \quad (2.37)$$
In the sequel, for a given $x = (x_1, \ldots, x_n)$, we let
$$f_i^\star(x_{-i}) = \inf\{ f(x_1, \ldots, x_{i-1}, z, x_{i+1}, \ldots, x_n) : z \}.$$
Going back to the expectation $E[f(x(k)) \,|\, \mathcal{F}_{k-1}] = \sum_{i=1}^n P_i f_i^\star(x_{-i}(k-1))$, we can bound it from above, resorting to (2.37), by
$$
\begin{aligned}
\sum_{i=1}^n P_i \Bigl( f(x(k-1)) - \frac{1}{2L_f}\|\nabla_i f(x(k-1))\|^2 \Bigr)
&= f(x(k-1)) - \frac{1}{2L_f}\sum_{i=1}^n P_i \|\nabla_i f(x(k-1))\|^2\\
&\overset{(a)}{\le} f(x(k-1)) - \frac{P_{\min}}{2L_f}\|\nabla f(x(k-1))\|^2, \quad (2.38)
\end{aligned}
$$
where we used $0 < P_{\min} \le P_i$ for all $i \in \{1, \ldots, n\}$ in (a). To alleviate notation, let $g(k) = \nabla f(x(k))$; we then have
$$\|g(k)\|^2 = \sum_{i \le k}\|g(i)\|^2 - \sum_{i \le k-1}\|g(i)\|^2,$$
and, adding $\frac{P_{\min}}{2L_f}\sum_{i \le k-1}\|g(i)\|^2$ to both sides of the inequality in (2.38), we find that
$$E[Y_k \,|\, \mathcal{F}_{k-1}] \le Y_{k-1}, \quad (2.39)$$


where $Y_k = f(x(k)) + \frac{P_{\min}}{2L_f}\sum_{i \le k-1}\|g(i)\|^2$. Inequality (2.39) defines the sequence $\{Y_k\}_{k \in \mathbb{N}}$ as a supermartingale. As $f(x)$ is always nonnegative, $Y_k$ is also nonnegative, and so [33, Corollary 27.1]
$$Y_k \to Y, \quad \text{a.s.}$$
In words, the sequence $Y_k$ converges almost surely to an integrable random variable $Y$. This entails that $\sum_{k \ge 1} \|g(k)\|^2 < \infty$ a.s., and so $g(k) \to 0$ a.s.

The previous arguments show that Lemma 6 holds for Algorithm 2. To show that Lemma 6 also holds for Algorithm 2 with the update (2.19), it suffices to redefine
$$f_i^\star(x_{-i}) := f\Bigl(x_1, \ldots, x_i - \frac{1}{L_f}\nabla_i f(x), \ldots, x_n\Bigr).$$
As the second inequality in (2.37) shows, we have the bound
$$f_i^\star(x_{-i}(k-1)) \le f(x(k-1)) - \frac{1}{2L_f}\bigl\|\nabla_i f(x(k-1))\bigr\|^2,$$
and the rest of the proof holds intact.

Lemma 7. Let $\{x(k)\}_{k \in \mathbb{N}}$ be one of the sequences generated with probability one according to Lemma 6. Then,

1. the function value decreases to the optimum: $f(x(k)) \downarrow f^\star$;
2. there exists a subsequence of $\{x(k)\}_{k \in \mathbb{N}}$ converging to a point in $\mathcal{X}^\star$: $x(k_l) \to y$, $y \in \mathcal{X}^\star$.

Proof. As $f$ is coercive, the sublevel set $X_f = \{x : f(x) \le f(x(0))\}$ is compact and, because $f(x(k))$ is nonincreasing, all elements of $\{x(k)\}_{k \in \mathbb{N}}$ belong to this set. From the compactness of $X_f$, there is a convergent subsequence $x(k_l) \to y$. We evaluate the gradient at this accumulation point, $\nabla f(y) = \lim_{l \to \infty} \nabla f(x(k_l))$, which, by assumption, vanishes, and we therefore conclude that $y$ belongs to the solution set $\mathcal{X}^\star$. Moreover, the function value at this point is, by definition, the optimal value.

2.7.4 Theorems

Equipped with the previous lemmas, we are now ready to prove the theorems stated in Section 2.5.

Proof of Theorem 2. Suppose the distance does not converge to zero. Then, there exist an $\varepsilon > 0$ and a subsequence $\{x(k_l)\}_{l \in \mathbb{N}}$ such that $d_{\mathcal{X}^\star}(x(k_l)) > \varepsilon$. But $f$ is coercive (by Lemma 5), continuous, and convex, and its gradient vanishes along the iterates by Lemma 6; hence, by Lemma 7, there is a subsequence of $\{x(k_l)\}_{l \in \mathbb{N}}$ converging to a point in $\mathcal{X}^\star$, which is a contradiction.

Proof of Theorem 3. Fix an arbitrary point $x^\star \in \mathcal{X}^\star$. We start by proving that the sequence of squared distances to $x^\star$ of the estimates produced by Algorithm 2, with the update defined in Equation (2.19), converges almost surely; that is, the sequence $\{\|x(k) - x^\star\|^2\}_{k \in \mathbb{N}}$ is convergent with probability one. We have
$$E\bigl[\|x(k) - x^\star\|^2 \,\big|\, \mathcal{F}_{k-1}\bigr] = \sum_{i=1}^n \frac{1}{n}\Bigl\| x(k-1) - \frac{1}{L_f} g_i(k-1) - x^\star \Bigr\|^2, \quad (2.40)$$
where $g_i(k-1) = (0, \ldots, 0, \nabla_i f(x(k-1)), 0, \ldots, 0)$ and $\mathcal{F}_k = \sigma(x(1), \ldots, x(k))$ is the sigma-algebra generated by all iterates up to time $k$. Expanding the right-hand side of (2.40) yields
$$\|x(k-1) - x^\star\|^2 + \frac{1}{nL_f^2}\bigl\|\nabla f(x(k-1))\bigr\|^2 - \frac{2}{nL_f}(x(k-1) - x^\star)^\top \nabla f(x(k-1)).$$
Since $(x(k-1) - x^\star)^\top \nabla f(x(k-1)) = (x(k-1) - x^\star)^\top\bigl(\nabla f(x(k-1)) - \nabla f(x^\star)\bigr) \ge 0$, we conclude that
$$E\bigl[\|x(k) - x^\star\|^2 \,\big|\, \mathcal{F}_{k-1}\bigr] \le \|x(k-1) - x^\star\|^2 + \frac{1}{nL_f^2}\bigl\|\nabla f(x(k-1))\bigr\|^2.$$
Now, as proved in Lemma 6, the sum $\sum_k \|\nabla f(x(k))\|^2$ converges almost surely. Thus, invoking the result in [34], we get that $\|x(k) - x^\star\|^2$ converges almost surely.

We can now invoke the technique at the end of the proof of [28, Prop. 9] to conclude that $x(k)$ converges to some optimal point $x^\star$.

2.8 Summary and further extensions

Experiments in Section 2.6 show that our method is superior to the state of the art in all measured indicators. While the comparison with the projection method published in [14] is favorable to our proposal, it should be further noted that the projection method has a different nature from ours: it is sequential, and such algorithms will always have a larger computation time than parallel ones, since nodes run in sequence; moreover, this computation time grows with the number of sensors, while parallel methods retain similar speed no matter how many sensors the network has.

When comparing with a distributed and parallel method similar to Algorithm 1, like the ESDP method in [3], we see an improvement of one order of magnitude in RMSE with one order of magnitude fewer communications for our method — and this score is achieved with a simpler, easy-to-implement algorithm, performing simple computations at each node that are well suited to the kind of hardware commonly found in sensor networks. Also, unlike SDP methods, our proposal operates directly on positions, which need not be recovered from an SDP estimate.


There are some important questions not addressed here. For example, it is not clear what influence the number of anchors and their spatial distribution have on the performance of the proposed and state-of-the-art algorithms. Also, an exhaustive study of the impact of varying topologies and numbers of sensors could lead to interesting results. Some preliminary experiments show that all convex relaxations experience some performance degradation when tested for robustness to sensors outside the convex hull of the anchors. This issue has been noted by several authors, but a more exhaustive study exceeds the scope of this thesis.

But with the data presented here one can already grasp the advantages of our fast and easily

implementable distributed method, where the optimality gap of the solution can also be easily

quantified, and which offers two implementation flavours for different localization needs.

2.8.1 Heterogeneous data fusion application

A spin-off of the presented method was developed and has already been submitted for publication. The problem in (2.5) can be thought of as a minimization of the squared discrepancy between a data model and measured data. In this perspective, we envisioned an extension of the present work that fuses the range measurements with angle information. This can be done by considering a new edge set $E_u$ containing pairs of nodes with a measured angle between them, and the squared distance $d^2_{\ell_{tv}}(\cdot)$ to a line $\ell_{tv}$ passing through the origin and defined by the unit vector $u_{tv}$. The problem is, then,

$$
\underset{x}{\text{minimize}} \;
\sum_{i \sim j \in E} \frac{1}{2\sigma^2} d^2_{B_{ij}}(x_i - x_j)
+ \sum_{t \sim v \in E_u} \frac{1}{2\sigma_\ell^2} d^2_{\ell_{tv}}(x_t - x_v)
+ \sum_i \Biggl( \sum_{k \in A_i} \frac{1}{2\sigma^2} d^2_{Ba_{ik}}(x_i)
+ \sum_{k \in A^u_i} \frac{1}{2\sigma_\ell^2} d^2_{\ell a_{ik}}(x_i) \Biggr),
\quad (2.41)
$$

where $A^u_i$ is the set of anchors with angle measurements related to node $i$, $\sigma$ is the standard deviation of the Gaussian noise term in (2.31), and $\sigma_\ell$ is the standard deviation of the noise in the angle measurement statistics⁴. The simulation and real-data results are very encouraging, and the possibility of fusing two different types of information in the same minimization offers a new flexibility to localization.

⁴Note the simplifying assumption that the statistics of the measured angle follow a Gaussian distribution.


3 Distributed network localization with initialization: Nonconvex procedures

Contents
3.1 Related work
3.2 Distributed Majorization-Minimization with quadratic majorizer
  3.2.1 Contributions
  3.2.2 Problem reformulation
  3.2.3 Majorization-Minimization
  3.2.4 Distributed sensor network localization
  3.2.5 Experimental results
  3.2.6 Summary
3.3 Majorization-Minimization with convex tight majorizer
  3.3.1 Majorization function
  3.3.2 Experimental results on majorization function quality
  3.3.3 Distributed optimization of the proposed majorizer using ADMM
  3.3.4 Experimental setup
  3.3.5 Proof of majorization function properties
  3.3.6 Proof of Proposition 9
  3.3.7 Proof of (3.31)
  3.3.8 Summary
3.4 Sensor network localization: a graphical model approach
  3.4.1 Uncertainty models
  3.4.2 Optimization problem
  3.4.3 Combinatorial problem
  3.4.4 Related work
  3.4.5 Contributions
  3.4.6 Algorithms
  3.4.7 Experimental results
  3.4.8 Summary


Now imagine we have some prior knowledge of where our agents are located. This knowledge can come either from deployment instructions, from a previous run of a convexified algorithm like the one presented in Chapter 2, or, maybe, from known (or estimated) positions at a previous moment in time. Imagine you want an accurate estimate of your network configuration, but you need it fast and with a simple and stable implementation. This Chapter addresses this scenario, providing results concerning MM with a quadratic majorizer, MM with a tighter majorizer, and a graphical model approach to the problem. The work described in the next Section was partially presented in the 2014 IEEE GlobalSIP conference.

3.1 Related work

As we have seen previously, distributed and maximum-likelihood (thus nonconvex) approaches to the sensor network localization problem are much less common than centralized or relaxation-based approaches, despite this computational paradigm being better suited to the problem at hand. The work in [15] proposes a parallel distributed algorithm. However, it adopts a discrepancy function between squared distances which, unlike the ones in maximum likelihood (ML) methods, is known to amplify measurement errors and outliers. The convergence properties of the algorithm are not studied theoretically. The work in [16] also considers network localization outside an ML framework. The approach proposed in [16] is not parallel, operating sequentially through layers of nodes: neighbors of anchors estimate their positions and become anchors themselves, making it possible in turn for their neighbors to estimate their positions, and so on. Position estimation is based on planar geometry-based heuristics. In [17], the authors propose an algorithm with assured asymptotic convergence, but the solution is computationally complex, since a triangulation set must be calculated and matrix operations are pervasive. Furthermore, in order to attain good accuracy, a large number of range measurement rounds must be acquired, one per iteration of the algorithm, thus increasing energy expenditure. The algorithm presented in [18] is a nonlinear Gauss-Seidel approach: only one node works at a time and solves a source localization problem with neighbors playing the role of anchors. The nodes activate sequentially in a round-robin scheme. Thus, the time to complete just one cycle becomes proportional to the network size. Parallel algorithms — the ones we are interested in, in Chapter 2 and the present one — avoid this issue altogether, as all nodes operate simultaneously; moreover, adding or deleting a node raises no special synchronization concern. The work presented in [35] puts forward a two-stage algorithm which is parallel: in a first consensus phase, a Barzilai-Borwein (BB) step size is calculated, followed by a local gradient computation phase. It is known that BB steps do


not necessarily decrease the objective function; as discussed in [36], an outer globalization scheme

involving line searches is needed to ensure its stability. However, line searches are cumbersome to

implement in a distributed setting and are, in fact, absent in [35]. Further, the algorithm requires

the step size to be computed via consensus, and thus the number of consensus rounds needed is a

parameter to tune.

3.2 Distributed Majorization-Minimization with quadratic majorizer

We propose a simple, stable and distributed algorithm which directly optimizes the nonconvex maximum likelihood criterion for sensor network localization, with no need to tune any free parameter. We reformulate the problem to obtain a cost with a Lipschitz continuous gradient; by shifting to this cost function we enable a Majorization-Minimization (MM) approach based on quadratic upper bounds that decouple across nodes; the resulting algorithm happens to be distributed, with all nodes working in parallel. Our method inherits the stability of MM: each communication round decreases the cost function. Numerical simulations indicate that the proposed approach tops the performance of state-of-the-art algorithms, both in accuracy and in communication cost.

The algorithm we present has an astonishingly simple implementation which is both parallel and

stable, with no free parameters. In Section 3.4.7 we will compare experimentally the performance

of our method with the distributed, parallel, state of the art method in [35].

3.2.1 Contributions

We tackle the nonconvex problem in (1.1) directly, with a simple and efficient algorithm which:

1. is parallel;

2. does not involve any free parameter definition;

3. is proven not to increase the value of the cost function at each iteration (thus, stable);

4. has better performance in positioning error and cost value than a state of the art method,

while requiring fewer communications.

The first and second claims are addressed in Section 3.2.4, the third in Section 3.4.7.B and the last

one in Section 3.4.7, dedicated to numerical experiments.

3.2.2 Problem reformulation

We can reformulate Problem (1.1) as
$$
\begin{array}{ll}
\underset{x_i,\, y_{ij},\, w_{ik}}{\text{minimize}} & \displaystyle\sum_{i \sim j} \frac{1}{2}\|x_i - x_j - y_{ij}\|^2 + \sum_i \sum_{k \in A_i} \frac{1}{2}\|x_i - a_k - w_{ik}\|^2 \\
\text{subject to} & \|y_{ij}\| = d_{ij}, \quad \|w_{ik}\| = r_{ik}.
\end{array}
\quad (3.1)
$$

39

3. Distributed network localization with initialization: Nonconvex procedures

Figure 3.1: Illustration of the reformulation in (3.1) of Problem (1.1). The sphere $S_{ij}$ of radius $d_{ij}$ is defined as $\{y \in \mathbb{R}^p : \|y\| = d_{ij}\}$, and the squared distance to it is $d^2_{S_{ij}}(x_i - x_j) = \min\{\|x_i - x_j - y_{ij}\|^2 : \|y_{ij}\| = d_{ij}\}$.

This reformulation is illustrated in Figure 3.1. We now rewrite (3.1) as
$$
\begin{array}{ll}
\underset{x_i,\, y_{ij},\, w_{ik}}{\text{minimize}} & \frac{1}{2}\|Ax - y\|^2 + \displaystyle\sum_i \frac{1}{2}\|x_i \otimes 1 - \alpha_i - w_i\|^2 \\
\text{subject to} & \|y_{ij}\| = d_{ij}, \quad \|w_{ik}\| = r_{ik},
\end{array}
\quad (3.2)
$$
with concatenated vectors $x = (x_i)_{i \in V}$, $y = (y_{ij})_{i \sim j}$, $\alpha_i = (a_{ik})_{k \in A_i}$, and $w_i = (w_{ik})_{k \in A_i}$. In (3.2), the symbol $1$ stands for the vector of ones. Matrix $A$ is the result of the Kronecker product of the arc-node incidence matrix¹ $C$ with the identity matrix $I_p$: $A = C \otimes I_p$. Problem (3.2) is equivalent to
$$
\begin{array}{ll}
\underset{x_i,\, y_{ij},\, w_{ik}}{\text{minimize}} & \frac{1}{2}\left\|\begin{bmatrix} A & -I & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ w \end{bmatrix}\right\|^2 + \frac{1}{2}\|Ex - \alpha - w\|^2 \\
\text{subject to} & \|y_{ij}\| = d_{ij}, \quad \|w_{ik}\| = r_{ik},
\end{array}
$$
where $\alpha = (\alpha_i)_{i \in V}$, $w = (w_i)_{i \in V}$, and $E$ is a matrix of zeros and ones, selecting the entries in $\alpha$ and $w$ corresponding to each sensor node. We now collect all the optimization variables in $z = (x, y, w)$ and rewrite our problem as
$$
\begin{array}{ll}
\underset{z}{\text{minimize}} & \frac{1}{2}\left\|\begin{bmatrix} A & -I & 0 \end{bmatrix} z\right\|^2 + \frac{1}{2}\left\|\begin{bmatrix} E & 0 & -I \end{bmatrix} z - \alpha\right\|^2 \\
\text{subject to} & z \in Z,
\end{array}
$$
where
$$Z = \bigl\{ z = (x, y, w) : \|y_{ij}\| = d_{ij},\; i \sim j, \quad \|w_{ik}\| = r_{ik},\; i \in V,\; k \in A_i \bigr\}.$$
Problem (3.2) can be written as
$$
\begin{array}{ll}
\underset{z}{\text{minimize}} & f(z) = \frac{1}{2} z^\top M z - b^\top z \quad (3.3)\\
\text{subject to} & z \in Z, \quad (3.4)
\end{array}
$$
for $M$ and $b$ defined as
$$
M = M_1 + M_2, \qquad b = \begin{bmatrix} E^\top \\ 0 \\ -I \end{bmatrix} \alpha, \quad (3.5)
$$
$$
M_1 = \begin{bmatrix} A^\top \\ -I \\ 0 \end{bmatrix}\begin{bmatrix} A & -I & 0 \end{bmatrix}, \qquad
M_2 = \begin{bmatrix} E^\top \\ 0 \\ -I \end{bmatrix}\begin{bmatrix} E & 0 & -I \end{bmatrix}.
$$

¹Each edge is arbitrarily assigned a direction by the two incident nodes.
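The algebra leading to (3.3)–(3.5) can be verified on a toy instance: building $A = C \otimes I_p$, $M$ and $b$ for a hypothetical two-sensor network and checking that $\frac{1}{2} z^\top M z - b^\top z + \frac{1}{2}\|\alpha\|^2$ matches the sum-of-squares cost in (3.2) at a random point (the constant $\frac{1}{2}\|\alpha\|^2$ is the term dropped when passing to the quadratic form):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 2                                   # spatial dimension
# Toy network (illustrative): 2 sensors joined by one edge; sensor 0 sees one anchor.
C = np.array([[1.0, -1.0]])             # arc-node incidence matrix (1 edge x 2 nodes)
A = np.kron(C, np.eye(p))               # A = C (x) I_p
E = np.hstack([np.eye(p), np.zeros((p, p))])  # selects sensor 0's position block
alpha = rng.normal(size=p)              # stacked anchor positions

nx, ny, nw = 2 * p, p, p                # sizes of the x, y, w blocks
B1 = np.hstack([A, -np.eye(ny), np.zeros((ny, nw))])   # [A  -I   0]
B2 = np.hstack([E, np.zeros((p, ny)), -np.eye(nw)])    # [E   0  -I]
M = B1.T @ B1 + B2.T @ B2               # M = M1 + M2, cf. (3.5)
b = B2.T @ alpha                        # b = [E  0  -I]^T alpha

z = rng.normal(size=nx + ny + nw)
f_quadratic = 0.5 * z @ M @ z - b @ z + 0.5 * alpha @ alpha
x, y, w = z[:nx], z[nx:nx + ny], z[nx + ny:]
f_sum = (0.5 * np.linalg.norm(A @ x - y) ** 2
         + 0.5 * np.linalg.norm(E @ x - alpha - w) ** 2)
assert abs(f_quadratic - f_sum) < 1e-10  # the two forms of f agree, cf. (3.3)
```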


3.2.3 Majorization-Minimization

To solve Problem (3.3) in a distributed way we must deal with the complicating off-diagonal entries of $M$ that couple the sensors' variables. We emphasize a simple but key fact:

Remark 8. The function optimized in Problem (3.3) is quadratic in $z$ and, thus, has a Lipschitz continuous gradient [32], i.e.,
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|,$$
for some $L$ and all $x, y$.

From this property of the function $f$ we can obtain the upper bound (also found in [32]) $f(z) \le f(z^t) + \langle \nabla f(z^t), z - z^t \rangle + \frac{L}{2}\|z - z^t\|^2$, valid for any point $z^t$, and use it as a majorizer in the Majorization-Minimization framework [37]. This majorizer decouples the variables and allows for a distributed solution: the quadratic term involves a diagonal matrix, so there are no off-diagonal terms to couple the sensors' position variables. Our algorithm is simply
$$z^{t+1} = \underset{z \in Z}{\operatorname{argmin}} \; f(z^t) + \bigl\langle \nabla f(z^t), z - z^t \bigr\rangle + \frac{L}{2}\bigl\|z - z^t\bigr\|^2. \quad (3.6)$$

The solution of (3.6) is the projected gradient iteration [32]

    z^{t+1} = P_Z( z^t − (1/L) ∇f(z^t) ),    (3.7)

where P_Z(z) is the projection of point z onto Z. This projection has a closed-form expression,

    P_Z(z) = ( x, P_Y(y), P_W(w) ),   P_Y(y) = [ (d_{ij}/‖y_{ij}‖) y_{ij} ]_{i∼j},   P_W(w) = [ (r_{ik}/‖w_{ik}‖) w_{ik} ]_{i∈V, k∈A_i}.

The gradient in (3.7) can be easily computed as the affine function ∇f(z) = Mz − b. See the recent work [38] for interesting convergence properties of the recursion (3.7). In particular, we emphasize that the cost function is non-increasing per iteration.
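To make (3.7) concrete, the following minimal sketch (our own construction, not the thesis code) runs the projected gradient iteration on an invented toy instance with p = 1: two collinear sensors, one inter-sensor range d, and one anchor range r. The gradient ∇f(z) = Mz − b is written out componentwise, and in one dimension the projection onto {u : |u| = d} reduces to sign(u)·d.

```python
import math

# Toy instance (ours, for illustration): p = 1, sensors x1, x2,
# one edge with range d, one anchor at a = 0 observed by sensor 1.
d, r, a = 1.0, 1.0, 0.0
L = 5.0  # Lipschitz bound (3.8): 2*delta_max + max|A_i| + 2 = 2*1 + 1 + 2

def cost(x1, x2, y, w):
    # f(z) = (1/2)||A x - y||^2 + (1/2)||E x - alpha - w||^2
    return 0.5 * (x1 - x2 - y) ** 2 + 0.5 * (x1 - a - w) ** 2

def project(u, radius):
    # closed-form projection onto {u : |u| = radius} (the p = 1 case)
    return math.copysign(radius, u) if u != 0 else radius

# initialization near the true configuration x* = (1, 2)
x1, x2 = 0.9, 2.2
y = project(x1 - x2, d)
w = project(x1 - a, r)

for _ in range(500):
    # gradient of f at the current z = (x1, x2, y, w)
    g_x1 = (x1 - x2 - y) + (x1 - a - w)
    g_x2 = -(x1 - x2 - y)
    g_y = -(x1 - x2 - y)
    g_w = -(x1 - a - w)
    # z <- P_Z(z - (1/L) grad f(z)), cf. (3.7)
    x1, x2 = x1 - g_x1 / L, x2 - g_x2 / L
    y = project(y - g_y / L, d)
    w = project(w - g_w / L, r)

print(round(x1, 4), round(x2, 4))
```

With the bound L = 5 from (3.8), the iterates settle on the nonconvex constraint set at the global minimizer (x1, x2) = (1, 2), and the cost is non-increasing along the way.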

We now compute a Lipschitz constant L for the gradient of the quadratic function in Problem (3.3) that is easy to estimate in a distributed way:

    L = λ_max(M)
      ≤ λ_max(M₁) + λ_max(M₂)
      = λ_max(AA^T + I) + λ_max(EE^T + I)
      ≤ λ_max(A^T A) + λ_max(EE^T) + 2
      ≤ 2δ_max + max_{i∈V} |A_i| + 2,    (3.8)

where λ_max denotes the largest eigenvalue, |A| is the cardinality of set A, and δ_max is the maximum node degree of the network. We note that λ_max(A^T A) is the maximum eigenvalue of the Laplacian matrix of graph G; the proof that it is upper-bounded by 2δ_max can be found in [21] and was discussed in Section 2.4.1. This Lipschitz constant can be computed in a distributed way by, e.g., a diffusion algorithm (cf. [23, Ch. 9]).
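As a quick sanity check on (3.8), the sketch below (ours, not from the thesis) builds M = M₁ + M₂ explicitly for a toy network with two sensors, one edge, and one anchor (p = 1), estimates λ_max(M) by power iteration, and confirms that 2δ_max + max_i |A_i| + 2 = 5 dominates it.

```python
# Toy network (ours): two sensors, one edge, one anchor on sensor 1, p = 1.
# z = (x1, x2, y, w); J = [A -I 0] = [1, -1, -1, 0]; K = [E 0 -I] = [1, 0, 0, -1].
J = [1.0, -1.0, -1.0, 0.0]
K = [1.0, 0.0, 0.0, -1.0]

# M = J^T J + K^T K, cf. (3.5)
M = [[J[i] * J[j] + K[i] * K[j] for j in range(4)] for i in range(4)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

# power iteration for the largest eigenvalue of the symmetric PSD matrix M
v = [1.0, 0.3, -0.5, 0.2]
for _ in range(200):
    u = matvec(M, v)
    nrm = sum(x * x for x in u) ** 0.5
    v = [x / nrm for x in u]
lam_max = sum(vi * ui for vi, ui in zip(v, matvec(M, v)))  # Rayleigh quotient

delta_max, max_anchors = 1, 1  # max node degree and max |A_i| in this network
bound = 2 * delta_max + max_anchors + 2  # right-hand side of (3.8)
print(lam_max <= bound)
```

For this instance the exact value is λ_max(M) = (5 + √5)/2 ≈ 3.62, comfortably below the distributed-friendly bound of 5.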


3. Distributed network localization with initialization: Nonconvex procedures

Algorithm 4 Distributed nonconvex localization algorithm
Input: x⁰; L; {d_{ij} : j ∈ N_i}; {r_{ik} : k ∈ A_i}
Output: x̂
1: set y⁰_{ij} = P_{Y_{ij}}(x⁰_i − x⁰_j), with Y_{ij} = {y : ‖y‖ = d_{ij}}, and w⁰_{ik} = P_{W_{ik}}(x⁰_i − a_k), with W_{ik} = {w : ‖w‖ = r_{ik}}
2: t = 0
3: while some stopping criterion is not met, each node i do
4:   x^{t+1}_i = b_i x^t_i + (1/L) Σ_{j∈N_i} ( x^t_j + C_{(i∼j,i)} y^t_{ij} ) + (1/L) Σ_{k∈A_i} ( w^t_{ik} + a_k )
5:   for all neighboring j, compute y^{t+1}_{ij} = P_{Y_{ij}}( ((L−1)/L) y^t_{ij} + (1/L) C_{(i∼j,i)} (x^t_i − x^t_j) )
6:   for each of the connected anchors k ∈ A_i, compute w^{t+1}_{ik} = P_{W_{ik}}( ((L−1)/L) w^t_{ik} + (1/L) (x^t_i − a_k) )
7:   broadcast x^{t+1}_i to neighbors
8:   t = t + 1
9: end while
10: return x̂_i = x^t_i

3.2.4 Distributed sensor network localization

At this point, the recursion in Eq. (3.7) is already distributed, as detailed below. From (3.7) we will obtain the update rules for the variables x, y and w. For this we write matrix M as follows:

    M = [ A^T A + E^T E   −A^T   −E^T ]
        [      −A            I      0 ]        (3.9)
        [      −E            0      I ]

and denote B = A^T A + E^T E. Then, each block of z is updated according to

    x^{t+1} = ( I − (1/L) B ) x^t + (1/L) A^T y^t + (1/L) E^T (w^t + α),    (3.10)
    y^{t+1} = P_Y( ((L−1)/L) y^t + (1/L) A x^t ),                           (3.11)
    w^{t+1} = P_W( ((L−1)/L) w^t + (1/L) E x^t − α/L ),                     (3.12)

where Y and W are the constraint sets associated with the acquired measurements between sensors, and between anchors and sensors, respectively, and N_i is the set of the neighbors of node i. We observe that each block of z = (x, y, w) at iteration t + 1 will only need local neighborhood information, as clarified in Algorithm 4. Each node i will update the current estimate of its own position, each one of the y_{ij} for all the incident edges i ∼ j, and the anchor terms w_{ik}, if any. The symbol C_{(i∼j,i)} denotes the arc-node incidence matrix entry relative to edge i ∼ j (row index) and node i (column index). The constant in step 4 of Algorithm 4 is defined as b_i = (L − δ_i − |A_i|)/L.

3.2.5 Experimental results

We present numerical experiments to ascertain the performance of the proposed Algorithm 4, both in accuracy and in communication cost. For a fixed graph, accuracy will be measured by 1) the mean positioning error, defined as

    MPE = (1/M) Σ_{m=1}^{M} Σ_{i=1}^{n} ‖x̂_i(m) − x⋆_i‖,    (3.13)


Table 3.1: Mean positioning error, with measurement noise

    σ      Proposed method   BB method
    0.01   0.0053            0.0059
    0.05   0.0143            0.0154
    0.10   0.0210            0.0221

where M is the total number of Monte Carlo trials, x̂_i(m) is the estimate generated by an algorithm at Monte Carlo trial m, and x⋆_i is the true position of node i, and 2) by evaluating the cost function in (1.1), averaged over the Monte Carlo trials, as in (3.65).
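In code, (3.13) is a short double loop; a minimal pure-Python transcription (function and variable names are ours):

```python
def mpe(estimates, truth):
    """Mean positioning error (3.13): average over Monte Carlo trials of the
    summed per-node distances between estimated and true positions."""
    M = len(estimates)  # number of Monte Carlo trials
    total = 0.0
    for trial in estimates:  # trial[i] is the estimate of node i's position
        for x_hat, x_star in zip(trial, truth):
            total += sum((u - v) ** 2 for u, v in zip(x_hat, x_star)) ** 0.5
    return total / M

# toy data: two trials, two nodes in the plane
truth = [(0.0, 0.0), (1.0, 0.0)]
trials = [[(0.3, 0.4), (1.0, 0.0)],  # summed error 0.5 + 0
          [(0.0, 0.0), (1.1, 0.0)]]  # summed error 0 + 0.1
print(mpe(trials, truth))  # (0.5 + 0.1) / 2 = 0.3
```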

In the previous Chapter we used the RMSE as a performance measure. Both MPE and RMSE characterize the localization error, albeit in different ways: RMSE penalizes larger discrepancies, while MPE weights outliers less. As we are dealing with the nonconvex maximum likelihood cost directly, a discrepancy in the estimate may be due to the presence of measurement noise (which shifts the cost minimum), to possible ambiguities in the network, but also to the existence of different local minima attracting the measured algorithm. The fact that the algorithm converged to a local minimum which is different from the global optimum should be penalized; nevertheless, the resulting individual distance ‖x̂_i(m) − x⋆_i‖ can be large, and may not adequately represent the overall performance of the algorithm in question.

Communication cost will be measured taking into account that each iteration of Algorithm 4 involves communicating pn real numbers. We will compare the performance of the proposed method with the Barzilai-Borwein algorithm in [35], whose communication cost per iteration is n(2T + p), where T is the number of consensus rounds needed to estimate the Barzilai-Borwein step size. We use T = 20 as in [35]. The setup for the experiments is a geometric network with 50 sensors randomly distributed in the two-dimensional square [0, 1] × [0, 1], with an average node degree of about 6, and 4 anchors placed at the vertices of this square. The network remains fixed during all the Monte Carlo trials. Both algorithms are initialized by a convex approximation method; the initialization will hopefully hand the nonconvex refinement algorithms a point near the basin of attraction of the true minimum. We generate noisy range measurements according to

    d_{ij} = | ‖x⋆_i − x⋆_j‖ + ν_{ij} |,    r_{ik} = | ‖x⋆_i − a_k‖ + η_{ik} |,

where {ν_{ij} : i ∼ j ∈ E} ∪ {η_{ik} : i ∈ V, k ∈ A_i} are independent Gaussian random variables with zero mean and standard deviation σ. We conducted 100 Monte Carlo trials for each standard deviation σ ∈ {0.01, 0.05, 0.1}. If we spread the sensors over a square area with 1 km sides, this means the measurements are affected by noise with standard deviations of 10 m, 50 m, and 100 m, respectively. In terms of mean positioning error the proposed algorithm fares better than the benchmark: Table 3.1 shows the mean error defined in (3.13) after the algorithms have stabilized, or reached a maximum iteration number. In the simulated setup, we improve the accuracy of the gradient descent with Barzilai-Borwein steps by about 1 m per sensor, even for high-power noise. Figure 3.2 depicts the averaged evolution of the error per sensor of both algorithms as a function of the volume of accumulated communications, and also


[Figure: MPE per sensor (a) and average cost per sensor (b) vs. communications per sensor, for gradient descent with BB steps and for the proposed method.]

(a) The proposed method improves on the competing algorithm, both in accuracy and communication cost. Our proposed method improves on the state-of-the-art method in [35] by about 60 cm in mean positioning error per sensor, delivering a consistent and stable progression of the error of the estimates.

(b) The final costs are 1.7392 × 10⁻⁴ for the BB method and 1.5698 × 10⁻⁴ for the proposed method: a small difference in cost that translates into a considerable distance in error, as depicted in Figure 3.2(a) and Table 3.1.

Figure 3.2: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark. Noisy distance measurements with σ = 0.01, representing 10 m for a square with 1 km sides. The proposed method shows a faster and smoother progression, while the benchmark bounces, always above the proposed method.

the evolution of the cost for low-power noise. Gradient descent with Barzilai-Borwein steps shows an irregular error pattern, only vaguely matching the variation in the corresponding cost (Figure 3.2(b)), thus leaving some uncertainty regarding when to stop the algorithm and which estimate to keep. The presented method reaches the final cost value per sensor much faster and more steadily than the benchmark for medium-low measurement noise. In fact, our method takes almost one order of magnitude fewer communications than the benchmark to approach the minimum cost value (compare the cost at about 1500 communications with the benchmark's at 15000). The most realistic case of medium noise power led to the results presented in Figure 3.3. Again, the characteristic irregularity of the BB method prevents it from delivering, on average, better solutions than our


[Figure: MPE per sensor (a) and average cost per sensor (b) vs. communications per sensor, for gradient descent with BB steps and for the proposed method.]

(a) For medium noise power the algorithms' performance comparison follows the one under low noise power. The accuracy gain is more than 1 m per sensor.

(b) Under medium noise the proposed method also reaches a smaller value for the average cost per sensor: 0.0031, vs. 0.0032 for the BB method.

Figure 3.3: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under medium-power noise. Distance measurements contaminated with noise, with σ = 0.05, representing 50 m for a square with 1 km sides. The proposed method continues to outperform the benchmark, and evolves much more predictably than the BB method.

stable, guaranteed method. The error curves in Figure 3.3(a) are increasing because the error is not the quantity being directly optimized, and the medium-high noise power in the measurement data shifts the optimal points of the cost function relative to the nominal positions. Under high noise power, our method tops the performance of the benchmark in cost function terms, as shown in Figure 3.4(b), not only in convergence speed, but also in the final value reached. Again, our method expends almost one order of magnitude fewer communications to achieve its plateau, which is itself, on average, better than the alternative method's (compare the performance at 700 communications with the one at 7000 for the benchmark).


[Figure: MPE per sensor (a) and average cost per sensor (b) vs. communications per sensor, for gradient descent with BB steps and for the proposed method.]

(a) The proposed algorithm tops the benchmark in error, under high noise power, by more than 1 m, when considering a square deployment area with 1 km sides.

(b) Under heavy noise the proposed method reaches a smaller value for the average cost per sensor: 0.0096, vs. 0.0099 for the BB method.

Figure 3.4: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under high-power noise. Distance measurements contaminated with noise, with σ = 0.1, representing 100 m for a square with 1 km sides.

3.2.6 Summary

The monotonicity of the proposed method is a strong feature for sensor network localization applications. Our method proves to be not only fast and resilient, but also simple to implement and deploy, with no free parameters to tune. The steady accuracy gain over the competing method also makes it usable in contexts with different noise powers. The presented method can be useful both as a refinement algorithm and as a tracking method, e.g., for mobile robot formations where position estimates computed at a given time step are used as the initialization for the next one. An asynchronous flavor of the algorithm would be, as far as we know, restricted to a broadcast gossip scheme, following a block-coordinate descent model. This line of research is in progress.


3.3 Majorization-Minimization with convex tight majorizer

A quadratic majorizer such as the one used in Section 3.2 is a common choice for the MM framework. As one would expect, preliminary simulation results show that using a tighter majorizer improves localization performance. In the following sections we describe a particularly tight convex majorization function and point out some directions of research towards devising a distributed method to optimize it.

3.3.1 Majorization function

Commonly, MM techniques resort to quadratic majorizers which, albeit easy to minimize, show a considerable mismatch with most cost functions (in particular, with f in (1.1)). To overcome this problem, we introduce a key novel majorizer. It is specifically adapted to f, tighter than a quadratic, convex, and easily optimizable.

Before proceeding it is useful to rewrite (1.1) as

    f(x) = Σ_{i∼j} f_{ij}(x_i, x_j) + Σ_i Σ_{k∈A_i} f_{ik}(x_i),

where f_{ij}(x_i, x_j) = φ_{d_{ij}}(x_i − x_j) and f_{ik}(x_i) = φ_{r_{ik}}(x_i − a_k), both defined in terms of the basic building block

    φ_d(u) = (‖u‖ − d)².    (3.14)

3.3.1.A Majorization function for (3.14)

Let v ∈ R^p be given, assumed nonzero. We provide a majorizer Φ_d(· | v) for φ_d in (3.14) which is tight at v, i.e., φ_d(u) ≤ Φ_d(u | v) for all u and φ_d(v) = Φ_d(v | v).

Proposition 9. Let

    Φ_d(u | v) = max{ g_d(u), h_d(v^T u/‖v‖ − d) },    (3.15)

where

    g_d(u) = (‖u‖ − d)²₊,    (3.16)

(r)²₊ = (max{0, r})², and

    h_R(r) = { 2R|r| − R²   if |r| ≥ R
               r²           if |r| < R }    (3.17)

is the Huber function of parameter R. Then, the function Φ_d(· | v) is convex, is tight at v, and majorizes φ_d.

Proof. See Section 3.3.6.
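The pieces of Proposition 9 are straightforward to transcribe; the sketch below (our own numerical check, not the thesis code) evaluates φ_d, g_d, the Huber function, and Φ_d(· | v) on a grid of 2-D points and verifies majorization everywhere and tightness at v.

```python
import math

def phi(u, d):
    # basic building block (3.14): (||u|| - d)^2
    return (math.hypot(*u) - d) ** 2

def g(u, d):
    # (3.16): (||u|| - d)_+^2
    return max(0.0, math.hypot(*u) - d) ** 2

def huber(r, R):
    # (3.17): Huber function of parameter R
    return 2 * R * abs(r) - R ** 2 if abs(r) >= R else r ** 2

def Phi(u, v, d):
    # (3.15): max of g_d and the Huber term along the direction v/||v||
    nv = math.hypot(*v)
    s = (v[0] * u[0] + v[1] * u[1]) / nv
    return max(g(u, d), huber(s - d, d))

d, v = 0.5, (0.1, 0.05)
# tightness at v: Phi_d(v | v) = phi_d(v)
assert abs(Phi(v, v, d) - phi(v, d)) < 1e-12
# majorization: phi_d(u) <= Phi_d(u | v) on a grid of points
grid = [(0.1 * i, 0.1 * j) for i in range(-20, 21) for j in range(-20, 21)]
assert all(phi(u, d) <= Phi(u, v, d) + 1e-12 for u in grid)
print("majorization and tightness verified on the grid")
```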

Further, we propose the following conjecture:

Conjecture 10. The majorizer Φ_d(· | v) in (3.15) is a tight convex majorizer of the nonconvex function φ_d(·) in (3.14), i.e., for all convex functions ψ : R^n → R such that φ_d(x) ≤ ψ(x) ≤ Φ_d(x | v) for all x ∈ R^n we have ψ(x) = Φ_d(x | v).


[Plot: φ_d(u), Q_d(u | v), and Φ_d(u | v) over u ∈ [−3, 1].]

Figure 3.5: Nonconvex cost function (black, dash-dotted) in (3.14) against the proposed majorizer (red, solid) in (3.15) and a vanilla quadratic majorizer (blue, dashed) in (3.18), for d = 0.5 and v = 0.1. The proposed convex majorizer is a much more accurate approximation.

The tightness of the proposed majorization function is illustrated in Figure 3.5, in which we depict, for a one-dimensional argument u, d = 0.5 and v = 0.1: the nonconvex cost function in (3.14), the proposed majorizer in (3.15), and a quadratic majorizer

    Q_d(u | v) = ‖u‖² + d² − 2d v^T u/‖v‖,    (3.18)

obtained through routine manipulations of (3.14), e.g., expanding the square and linearizing ‖u‖ at v, which is common in MM approaches (cf. [5, 6] for quadratic majorizers applied to the sensor network localization problem and [39] for an application in robust MDS). Clearly, the proposed convex majorizer is a better approximation to the nonconvex cost function2. As an expected corollary, it also outperforms the quadratic majorizer in accuracy when embedded in the MM framework, as shown in the experimental results of Section 3.3.2. The proof of Conjecture 10 is being addressed.
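The gap visible in Figure 3.5 can be reproduced numerically; the sketch below (ours) compares φ_d, the majorizer Φ_d of (3.15), and the quadratic Q_d of (3.18) in the one-dimensional setting of the figure (d = 0.5, v = 0.1), confirming the ordering φ_d ≤ Φ_d ≤ Q_d at every sample point.

```python
def phi(u, d):          # (3.14) in one dimension
    return (abs(u) - d) ** 2

def Phi(u, v, d):       # (3.15): v/||v|| reduces to sign(v) in one dimension
    g = max(0.0, abs(u) - d) ** 2
    r = (1 if v > 0 else -1) * u - d
    h = 2 * d * abs(r) - d ** 2 if abs(r) >= d else r ** 2   # Huber (3.17)
    return max(g, h)

def Q(u, v, d):         # quadratic majorizer (3.18)
    return u ** 2 + d ** 2 - 2 * d * (1 if v > 0 else -1) * u

d, v = 0.5, 0.1
samples = [-3 + 0.01 * k for k in range(401)]   # the range shown in Figure 3.5
assert all(phi(u, d) <= Phi(u, v, d) + 1e-12 <= Q(u, v, d) + 2e-12
           for u in samples)
# both majorizers touch phi at v, but Phi stays much closer elsewhere
gap_Phi = max(Phi(u, v, d) - phi(u, d) for u in samples)
gap_Q = max(Q(u, v, d) - phi(u, d) for u in samples)
print(gap_Phi < gap_Q)
```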

3.3.1.B Majorization function for the sensor network localization problem

We remind the reader about the maximum likelihood estimation problem for sensor network

localization defined in (1.1).

2The fact that both majorizers have a coincident minimum is an artifact of this toy example, and does not hold in general.


Now, for given x[l], consider the function

    F(x | x[l]) = Σ_{i∼j} F_{ij}(x_i, x_j) + Σ_i Σ_{k∈A_i} F_{ik}(x_i),    (3.19)

where

    F_{ij}(x_i, x_j) = Φ_{d_{ij}}(x_i − x_j | x_i[l] − x_j[l])    (3.20)

and

    F_{ik}(x_i) = Φ_{r_{ik}}(x_i − a_k | x_i[l] − a_k).    (3.21)

Given Proposition 9, it is clear that F(· | x[l]) majorizes f and is tight at x[l]. Moreover, it is convex as a sum of convex functions.

3.3.2 Experimental results on majorization function quality

To initialize the algorithms we take the true sensor positions x⋆ = {x⋆_i : i ∈ V} and perturb them by adding independent zero-mean Gaussian noise, according to

    x_i[0] = x⋆_i + η_i,    (3.22)

where η_i ∼ N(0, σ²_init I_p) and I_p is the identity matrix of size p × p. The parameter σ_init is detailed ahead.

We compare the performance of our proposed majorizer in (3.19) with a standard one built out of quadratic functions, e.g., the one used in [6]. We submitted a simple source localization problem with one sensor and 4 anchors to two MM algorithms, each associated with one of the majorization functions. They ran for a fixed number of 30 iterations. At each Monte Carlo trial, the true sensor positions were corrupted by zero-mean Gaussian noise, as in (3.22), with standard deviation σ_init ∈ [0.01, 1]. The range measurements are taken to be noiseless, i.e., σ = 0 in (3.64), in order to create an idealized scenario for a direct comparison of the two approaches. The evolution of the RMSE as a function of the initialization noise intensity is illustrated in Figure 3.6. There is a clear advantage in using this majorization function when the initialization is within a radius of the true location which is 30% of the square size.

3.3.3 Distributed optimization of the proposed majorizer using ADMM

At the l-th iteration of the nonconvex minimization algorithm, the convex function in (3.19)

must be minimized. We now show how this optimization problem can be solved collaboratively by

the network in a distributed, parallel manner.

We propose a first distributed algorithm to tackle Problem (1.1). Starting from an initialization x[0] for the unknown sensors' positions x, it generates a sequence of iterates (x[l])_{l≥1} which, hopefully, converges to a solution of (1.1). We apply the majorization-minimization (MM) framework [37] to (1.1): at each iteration l, we minimize (3.19), a majorizer of f, tight at the current


[Plot: RMSE vs. σ_init on logarithmic axes, for the proposed majorizer F and the quadratic majorizer Q.]

Figure 3.6: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurementsare noiseless: σ = 0 in (3.64). Anchors are at the unit square corners. The proposed majorizer(red, solid) outperforms the quadratic majorizer (blue, dashed) in accuracy.

Algorithm 5 Minimization of the tight majorizer in (3.19)
Input: x[0]
Output: x[L]
1: for l = 0 to L − 1 do
2:   x[l + 1] = argmin_x F(x | x[l])
3: end for
4: return x[L]

iterate x[l], to obtain the next iterate x[l + 1]. The algorithm is outlined in Algorithm 5 for a fixed number of iterations L. Here, F(· | x[l]) denotes a majorizer of f (i.e., f(x) ≤ F(x | x[l]) for all x) which is tight at x[l] (i.e., f(x[l]) = F(x[l] | x[l])). The majorizer is detailed in Section 3.3.1.B. Note that f(x[l + 1]) ≤ f(x[l]), that is, f is monotonically decreasing along iterations, an important property of the MM framework.

Algorithm 5 is a distributed algorithm because, as we shall see, the minimization of the upper bounds F can be achieved in a distributed manner.
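As an illustration of the monotone decrease (not of the distributed solver derived in the next sections), the sketch below runs Algorithm 5 on a toy one-dimensional source localization problem: a single sensor, two anchors, noiseless ranges, with the inner convex minimization done by ternary search. All problem data are invented for the demo.

```python
def phi(u, d):
    return (abs(u) - d) ** 2

def Phi(u, v, d):
    # tight convex majorizer (3.15), one-dimensional case
    g = max(0.0, abs(u) - d) ** 2
    r = (1 if v > 0 else -1) * u - d
    h = 2 * d * abs(r) - d ** 2 if abs(r) >= d else r ** 2
    return max(g, h)

anchors, ranges = [0.0, 2.0], [0.7, 1.3]    # true source at 0.7, noiseless

def f(x):                                   # cost (1.1) for this instance
    return sum(phi(x - a, r) for a, r in zip(anchors, ranges))

def F(x, x_prev):                           # majorizer (3.19), tight at x_prev
    return sum(Phi(x - a, x_prev - a, r) for a, r in zip(anchors, ranges))

def argmin_convex(fun, lo=-2.0, hi=4.0, iters=200):
    # ternary search: valid because the majorizer is convex in x
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = 1.5                                     # initialization x[0]
costs = [f(x)]
for _ in range(10):                         # the loop of Algorithm 5
    x = argmin_convex(lambda t, xp=x: F(t, xp))
    costs.append(f(x))

# f is monotonically nonincreasing along MM iterations
assert all(c1 <= c0 + 1e-9 for c0, c1 in zip(costs, costs[1:]))
print(round(x, 3))  # converges to the true source position 0.7
```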

3.3.3.A Problem reformulation

In the distributed algorithm, the working node will operate on local copies of the estimated positions of its neighbors and of itself. So, it is convenient to introduce new variables. Let V_i = {j : j ∼ i} denote the neighbors of sensor i. We also define the closed neighborhood V̄_i = V_i ∪ {i}. For each i, we duplicate x_i into new variables y_{ji}, j ∈ V̄_i, and z_{ik}, k ∈ A_i. This choice of notation is not fortuitous: the first subscript reveals which physical node will store the variable in our proposed implementation; thus, x_i and z_{ik} are stored at node i, whereas y_{ji} is managed by node j. We write the minimization of (3.19) as the optimization problem

    minimize   F(y, z)
    subject to y_{ji} = x_i,  j ∈ V̄_i,    (3.23)
               z_{ik} = x_i,  k ∈ A_i,

where y = {y_{ji} : i ∈ V, j ∈ V̄_i}, z = {z_{ik} : i ∈ V, k ∈ A_i}, and

    F(y, z) = Σ_{i∼j} ( F_{ij}(y_{ii}, y_{ij}) + F_{ij}(y_{ji}, y_{jj}) ) + 2 Σ_i Σ_{k∈A_i} F_{ik}(z_{ik}).    (3.24)

In passing from (3.19) to (3.23) we used the identity F_{ij}(x_i, x_j) = (1/2) F_{ij}(y_{ii}, y_{ij}) + (1/2) F_{ij}(y_{ji}, y_{jj}), due to y_{ji} = x_i. Also, for convenience, we have rescaled the objective by a factor of two.

3.3.3.B Algorithm derivation

Problem (3.23) is in the form

    minimize   F(y, z) + G(x)
    subject to A(y, z) + Bx = 0,    (3.25)

where F is the convex function in (3.24), G is the identically zero function, A is the identity operator, and B is a matrix whose rows belong to the set {−e_i^T : i ∈ V}, with e_i the i-th column of the identity matrix of size |V|. For a connected network, B is full column rank, so the problem is suited for the Alternating Direction Method of Multipliers (ADMM). See [40] and references therein for more details on this method; see also [41–46] for applications of ADMM in distributed optimization settings.

Let λ_{ji} be the Lagrange multiplier associated with the constraint y_{ji} = x_i and λ = {λ_{ji}} the collection of all such multipliers. Similarly, let µ_{ik} be the Lagrange multiplier associated with the constraint z_{ik} = x_i and µ = {µ_{ik}}.

The ADMM framework generates a sequence (y(t), z(t), x(t), λ(t), µ(t))_{t≥1} such that

    (y(t+1), z(t+1)) = argmin_{y,z} L_ρ( y, z, x(t), λ(t), µ(t) ),          (3.26)
    x(t+1) = argmin_x L_ρ( y(t+1), z(t+1), x, λ(t), µ(t) ),                 (3.27)
    λ_{ji}(t+1) = λ_{ji}(t) + ρ( y_{ji}(t+1) − x_i(t+1) ),                  (3.28)
    µ_{ik}(t+1) = µ_{ik}(t) + ρ( z_{ik}(t+1) − x_i(t+1) ),                  (3.29)

where L_ρ is the augmented Lagrangian, defined as

    L_ρ(y, z, x, λ, µ) = F(y, z) + Σ_i Σ_{j∈V̄_i} ( λ_{ji}^T (y_{ji} − x_i) + (ρ/2) ‖y_{ji} − x_i‖² )
                                  + Σ_i Σ_{k∈A_i} ( µ_{ik}^T (z_{ik} − x_i) + (ρ/2) ‖z_{ik} − x_i‖² ).    (3.30)


Here, ρ > 0 is a pre-chosen constant.

In our implementation, we let node i store the variables x_i, y_{ij}, λ_{ij}, λ_{ji}, for j ∈ V̄_i, and z_{ik}, µ_{ik}, for k ∈ A_i. Note that a copy of λ_{ij} is maintained at both nodes i and j (this avoids extra communication steps). For t = 0, we can set λ(0) and µ(0) to a pre-chosen constant (e.g., zero) at all nodes. Also, we assume that, at the beginning of the iterations (i.e., for t = 0), node i knows x_j(0) for j ∈ V_i (this can be accomplished, e.g., by having each node i communicate x_i(0) to all its neighbors). This property will be preserved for all t ≥ 1 in our algorithm, via communication steps.

We now show that the minimizations in (3.26) and (3.27) can be implemented in a distributed

manner and require low computational cost at each node.

3.3.3.C ADMM: Solving Problem (3.26)

As shown in Section 3.3.7, the augmented Lagrangian in (3.30) can be written as

    L_ρ(y, z, x, λ, µ) = Σ_i ( Σ_{j∈V̄_i} L_{ij}(y_{ii}, y_{ij}, x_j, λ_{ij}) + Σ_{k∈A_i} L_{ik}(z_{ik}, x_i, µ_{ik}) ),    (3.31)

where

    L_{ij}(y_{ii}, y_{ij}, x_j, λ_{ij}) = F_{ij}(y_{ii}, y_{ij}) + λ_{ij}^T (y_{ij} − x_j) + (ρ/2) ‖y_{ij} − x_j‖²

and

    L_{ik}(z_{ik}, x_i, µ_{ik}) = 2 F_{ik}(z_{ik}) + µ_{ik}^T (z_{ik} − x_i) + (ρ/2) ‖z_{ik} − x_i‖².

In (3.31) we let F_{ii} ≡ 0. It is clear from (3.31) that Problem (3.26) decouples across sensors i ∈ V, since we are optimizing only over y and z. Further, at each sensor i, it decouples into two types of subproblems: one involving the variables y_{ij}, j ∈ V̄_i, given by

    minimize_{y_{ij}, j∈V̄_i}  Σ_{j∈V̄_i} L_{ij}(y_{ii}, y_{ij}, x_j, λ_{ij}),    (3.32)

and |A_i| subproblems of the form

    minimize_{z_{ik}}  L_{ik}(z_{ik}, x_i, µ_{ik}),    (3.33)

each involving one variable z_{ik}, k ∈ A_i. Note that the problems related with anchors are simpler and, since there are usually few anchors in a network, they do not occur frequently.

A – Solving Problem (3.32). First, note that node i can indeed address Problem (3.32), since all the data defining it is available at node i: it stores λ_{ji}(t), j ∈ V̄_i, and it knows x_j(t) for all neighbors j ∈ V_i (this holds trivially for t = 0 by construction, and it is preserved by our approach, as shown ahead).


To alleviate notation, we now suppress the indication of the working node i, i.e., variable y_{ij} is simply written as y_j. Problem (3.32) can be written as

    minimize_{y_j, j∈V̄_i}  Σ_{j∈V_i} ( F_{ij}(y_i, y_j) + (ρ/2) ‖y_j − γ_{ij}‖² ) + (ρ/2) ‖y_i − γ_{ii}‖²,    (3.34)

where γ_{ij} = x_j − λ_{ij}/ρ.

We make the crucial observation that, for fixed y_i, the problem is separable in the remaining variables y_j, j ∈ V_i. This motivates writing (3.34) as the master problem

    minimize_{y_i}  H(y_i) = Σ_{j∈V_i} H_{ij}(y_i) + (ρ/2) ‖y_i − γ_{ii}‖²,    (3.35)

where

    H_{ij}(y_i) = min_{y_j} F_{ij}(y_i, y_j) + (ρ/2) ‖y_j − γ_{ij}‖².    (3.36)

We now state important properties of H_{ij}.

Proposition 11. Define H_{ij} as in (3.36). Then:

1. Optimization problem (3.36) has a unique solution y_j for any given y_i, henceforth denoted y⋆_j(y_i);

2. Function H_{ij} is convex and differentiable, with gradient

    ∇H_{ij}(y_i) = ρ( y⋆_j(y_i) − γ_{ij} );    (3.37)

3. The gradient of H_{ij} is Lipschitz continuous with parameter ρ, i.e., ‖∇H_{ij}(u) − ∇H_{ij}(v)‖ ≤ ρ ‖u − v‖ for all u, v ∈ R^p.

Proof. 1. Recall from (3.20) that F_{ij}(y_i, y_j) = Φ_d(y_i − y_j | v), where d = d_{ij} and v = x_i[l] − x_j[l]. We have H_{ij}(y_i) = Θ(y_i − γ_{ij}), where

    Θ(w) = min_u Φ_d(u | v) + (ρ/2) ‖u − w‖².    (3.38)

Moreover, u⋆ solves (3.38) if and only if y⋆_j = y_i − u⋆ solves (3.36). Now, the cost function in (3.38) is clearly continuous, coercive (i.e., it converges to +∞ as ‖u‖ → +∞) and strictly convex, the last two properties arising from the quadratic term. Thus, it has a unique solution;

2. The function Θ is the Moreau-Yosida regularization of the convex function Φ_d(· | v) [20, XI.3.4.4]. As Θ is known to be convex and H_{ij} is the composition of Θ with an affine map, H_{ij} is convex. It is also known that the gradient of Θ is

    ∇Θ(w) = ρ( w − u⋆(w) ),

where u⋆(w) is the unique solution of (3.38) for a given w. Thus,

    ∇H_{ij}(y_i) = ∇Θ(y_i − γ_{ij}) = ρ( y_i − γ_{ij} − u⋆(y_i − γ_{ij}) ).

Unwinding the change of variable, i.e., using y⋆_j(y_i) = y_i − u⋆(y_i − γ_{ij}), we obtain (3.37);

3. Follows from the well-known fact that the gradient of Θ is Lipschitz continuous with parameter ρ.
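The Moreau-Yosida gradient identity ∇Θ(w) = ρ(w − u⋆(w)) used above is easy to verify on a toy convex function; in the sketch below (our illustration) Φ_d(· | v) is replaced by the absolute value, so u⋆ has a closed form (the soft-threshold), and the identity is compared with a finite-difference derivative.

```python
rho = 2.0

def prox_abs(w):
    # u*(w) = argmin_u |u| + (rho/2)(u - w)^2  -> soft-thresholding at 1/rho
    t = 1.0 / rho
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def theta(w):
    # Moreau-Yosida regularization of |.| with parameter rho, cf. (3.38)
    u = prox_abs(w)
    return abs(u) + 0.5 * rho * (u - w) ** 2

for w in (-1.3, -0.2, 0.05, 0.8, 2.0):
    grad_identity = rho * (w - prox_abs(w))   # nabla Theta(w) = rho (w - u*(w))
    eps = 1e-6
    grad_fd = (theta(w + eps) - theta(w - eps)) / (2 * eps)
    assert abs(grad_identity - grad_fd) < 1e-4
print("gradient identity verified")
```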

As a consequence, we obtain several nice properties of the function H.

Theorem 12. Function H in (3.35) is strongly convex with parameter ρ, i.e., H − (ρ/2)‖·‖² is convex. Furthermore, it is differentiable, with gradient

    ∇H(y_i) = ρ Σ_{j∈V_i} ( y⋆_j(y_i) − γ_{ij} ) + ρ( y_i − γ_{ii} ).    (3.39)

The gradient of H is Lipschitz continuous with parameter L_H = ρ(|V_i| + 1).

Proof. Since H is a sum of convex functions, it is convex. It is strongly convex with parameter ρ due to the presence of the strongly convex term (ρ/2)‖y_i − γ_{ii}‖². As a sum of differentiable functions, it is differentiable, and the given formula for the gradient follows from Proposition 11. Finally, since H is the sum of |V_i| + 1 functions with Lipschitz continuous gradients with parameter ρ, the claim is proved.

The properties established in Theorem 12 show that the optimization problem (3.35) is suited

for Nesterov’s optimal method for the minimization of strongly convex functions with Lipschitz

continuous gradient [27, Theorem 2.2.3]. The resulting algorithm is outlined in Algorithm 6, which

is guaranteed to converge to the solution of (3.35).

Algorithm 6 Nesterov's optimal method for (3.35)
1: ȳ_i(0) = y_i(0)
2: for s ≥ 0 do
3:   y_i(s + 1) = ȳ_i(s) − (1/L_H) ∇H(ȳ_i(s))
4:   ȳ_i(s + 1) = y_i(s + 1) + ((√L_H − √ρ)/(√L_H + √ρ)) ( y_i(s + 1) − y_i(s) )
5: end for
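The sketch below (our own toy, not thesis code) applies the iteration of Algorithm 6 to a generic strongly convex quadratic with known strong-convexity parameter ρ and gradient Lipschitz constant L_H, which is exactly the structure Algorithm 6 relies on.

```python
# minimize H(y) = (1/2) y^T Q y - c^T y with Q = diag(3, 1), c = (3, 2):
# strongly convex with rho = 1, gradient Lipschitz with L_H = 3; minimizer (1, 2).
Q, c = (3.0, 1.0), (3.0, 2.0)
rho, LH = 1.0, 3.0

def grad(y):
    return [Q[k] * y[k] - c[k] for k in range(2)]

beta = (LH ** 0.5 - rho ** 0.5) / (LH ** 0.5 + rho ** 0.5)  # momentum of step 4
y = [0.0, 0.0]
y_bar = list(y)          # auxiliary sequence of Algorithm 6
for _ in range(100):
    g = grad(y_bar)
    y_new = [y_bar[k] - g[k] / LH for k in range(2)]                  # step 3
    y_bar = [y_new[k] + beta * (y_new[k] - y[k]) for k in range(2)]   # step 4
    y = y_new

print(round(y[0], 6), round(y[1], 6))  # close to the minimizer (1, 2)
```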

B – Solving Problem (3.36). It remains to show how to solve (3.36) at a given sensor node.

Any off-the-shelf convex solver, e.g. based on interior-point methods, could handle it. However,

we present a simpler method that avoids expensive matrix operations, typical of interior point

methods, by taking advantage of the problem structure at hand. This is important in sensor

networks where the sensors have stringent computational resources.


First, as shown in the proof of Proposition 11, it suffices to focus on solving (3.38) for a given w: solving (3.36) amounts to solving (3.38) for w = y_i − γ_{ij}, obtaining u⋆ = u⋆(w), and setting y⋆_j(y_i) = y_i − u⋆. Note from (3.15) that Φ_d(· | v) only depends on v/‖v‖, so we can assume, without loss of generality, that ‖v‖ = 1.

From (3.15), we see that Problem (3.38) can be rewritten as

    minimize   r + (ρ/2) ‖u − w‖²
    subject to g_d(u) ≤ r,                (3.40)
               h_d(v^T u − d) ≤ r,

with optimization variable (u, r). The Lagrange dual (cf., for example, [20]) of (3.40) is given by

    maximize   ψ(ω)
    subject to 0 ≤ ω ≤ 1,                 (3.41)

where ψ(ω) = inf{Ψ(ω, u) : u ∈ R^n} and

    Ψ(ω, u) = (ρ/2) ‖u − w‖² + ω g_d(u) + (1 − ω) h_d(v^T u − d).    (3.42)

We propose to solve the dual problem (3.41), which involves the single variable ω, by bisection: we maintain an interval [a, b] ⊂ [0, 1] (initially, [a, b] = [0, 1]); we evaluate the derivative ψ′(c) at the midpoint c = (a + b)/2; if ψ′(c) > 0, we set a = c, otherwise, b = c; the scheme is repeated until the uncertainty interval is sufficiently small.
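The bisection scheme only needs the sign of the derivative of the concave dual; a minimal sketch (ours, with a stand-in concave function whose maximizer is known to be 0.3 playing the role of ψ):

```python
def dpsi(omega):
    # derivative of the stand-in concave dual psi(omega) = -(omega - 0.3)^2
    return -2.0 * (omega - 0.3)

a, b = 0.0, 1.0                    # initially [a, b] = [0, 1]
while b - a > 1e-10:               # stop once the uncertainty interval is small
    c = (a + b) / 2
    if dpsi(c) > 0:                # psi still increasing: maximizer to the right
        a = c
    else:
        b = c
print(round((a + b) / 2, 6))
```

The same loop applies verbatim with the derivative formula (3.45) supplying dpsi, once u⋆(ω) is available.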

In order to make this approach work, we must prove first that the dual function ψ is indeed

differentiable in the open interval Ω = (0, 1) and find a convenient formula for its derivative. We

will need the following useful result from convex analysis.

Lemma 13. Let X ⊂ R^n be an open convex set and Y ⊂ R^p be a compact set. Let F : X × Y → R. Assume that F(x, ·) is lower semi-continuous for all x ∈ X and F(·, y) is concave and differentiable for all y ∈ Y. Let f : X → R, f(x) = inf{F(x, y) : y ∈ Y}. Assume that, for any x ∈ X, the infimum is attained at a unique y⋆(x) ∈ Y. Then, f is differentiable everywhere and its gradient at x ∈ X is given by

    ∇f(x) = ∇F(x, y⋆(x)),    (3.43)

where ∇ refers to differentiation with respect to x.

Proof. This is essentially [20, VI.4.4.5], after one exchanges concave for convex, lower semi-continuous for upper semi-continuous, and inf for sup.

Now, view Ψ in (3.42) as defined on Ω × R^n. It is clear that Ψ(ω, ·) is lower semi-continuous for all ω (in fact, continuous) and Ψ(·, u) is concave (in fact, affine) and differentiable for all u. In fact, some even nicer properties hold.


Lemma 14. Let ω ∈ Ω. The function Ψ_ω = Ψ(ω, ·) is strongly convex with parameter ρ and differentiable everywhere, with gradient

    ∇Ψ_ω(u) = ρ(u − w) + 2ω( u − π(u) ) + (1 − ω) h′_d(v^T u − d) v,    (3.44)

where π(u) denotes the projection of u onto the closed ball of radius d centered at the origin. Furthermore, the gradient of Ψ_ω is Lipschitz continuous with parameter ρ + 2.

Proof. We start by noting that g_d in (3.16) can be written as g_d(u) = d_C(u)², where C is the closed ball with radius d centered at the origin, and d_C denotes the distance to the closed convex set C. It is known that g_d is convex and differentiable, that its gradient is given by ∇g_d(u) = 2(u − π(u)), and that this gradient is Lipschitz continuous with parameter 2 [20, X.3.2.3]. Also, the function h_d in (3.17) is convex and differentiable. Thus, the function Ψ_ω is convex (resp. differentiable) as a sum of three convex (resp. differentiable) functions. It is strongly convex with parameter ρ due to the first term (ρ/2)‖· − w‖². The gradient in (3.44) is clear. Finally, from |h′_d(r) − h′_d(s)| ≤ 2|r − s| for all r, s, there holds, for any u₁, u₂,

    |h′_d(v^T u₁ − d) − h′_d(v^T u₂ − d)| ≤ 2 |v^T (u₁ − u₂)| ≤ 2 ‖u₁ − u₂‖,

where ‖v‖ = 1 and the Cauchy-Schwarz inequality was used in the last step. We conclude from (3.44) that, for any u₁, u₂,

    ‖∇Ψ_ω(u₁) − ∇Ψ_ω(u₂)‖ ≤ ( ρ + 2ω + 2(1 − ω) ) ‖u₁ − u₂‖,

i.e., the gradient of Ψ_ω is Lipschitz continuous with parameter ρ + 2.

Using Lemma 14, we see that the infimum of Ψ_ω is attained at a single point u⋆(ω), since Ψ_ω is a continuous, strongly convex function. The derivative of ψ in (3.41) relies on u⋆(ω), as seen in Lemma 15.

Lemma 15. The function ψ in (3.41) is differentiable and its derivative is

ψ′(ω) = gd(u?(ω)) − hd(v>u?(ω) − d). (3.45)

Proof. We begin by bounding the norm of u?(ω). From the necessary stationarity condition ∇Ψω(u?(ω)) = 0

and (3.44) we conclude

(ρ + 2ω)u?(ω) = ρw + 2ωπ(u?(ω)) − (1 − ω)h′d(v>u?(ω) − d)v. (3.46)

Since |h′d(t)| ≤ 2d for all t (see (3.17)), ‖π(u)‖ ≤ d for all u, ‖v‖ = 1, and 0 ≤ ω ≤ 1, we can bound

the norm of the right-hand side of (3.46) by ρ‖w‖ + 4d. Thus,

‖u?(ω)‖ ≤ (1/(ρ + 2ω))(ρ‖w‖ + 4d) ≤ (1/ρ)(ρ‖w‖ + 4d) = ‖w‖ + 4d/ρ.


3.3 Majorization-Minimization with convex tight majorizer

Introduce the compact set U = {u ∈ Rn : ‖u‖ ≤ ‖w‖ + 4d/ρ}. The previous analysis has shown

that the dual function in (3.41) can also be represented as ψ(ω) = inf{Ψ(ω, u) : u ∈ U}, i.e., we

can restrict the search to U and view Ψ as defined on Ω × U. We can thus invoke Lemma 13 to

conclude that ψ is differentiable and (3.45) holds.

Finding u?(ω) To obtain u?(ω) we must minimize Ψω. Given the properties established in

Lemma 14, the simple optimal Nesterov method, described in Algorithm 6, is also applicable here.
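Algorithm 6 itself is not reproduced in this excerpt; the sketch below is a standard constant-momentum Nesterov iteration for a µ-strongly convex function with L-Lipschitz gradient (here µ = ρ and L = ρ + 2 by Lemma 14), shown only to indicate the shape of such a solver, not as the exact scheme of Algorithm 6.

```python
import numpy as np

def nesterov(grad, x0, mu, L, iters=300):
    # Constant-momentum Nesterov method for a mu-strongly convex, L-smooth function
    q = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x = y = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_new = y - grad(y) / L        # gradient step from the extrapolated point
        y = x_new + q * (x_new - x)    # momentum extrapolation
        x = x_new
    return x
```

On a strongly convex quadratic this converges linearly at the accelerated rate, which is why it is a natural fit for minimizing Ψω.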

C – Solving problem (3.33) Note that node i stores xi(t) and µik(t), k ∈ Ai. Thus, it can

indeed address Problem (3.33). Problem (3.33) is similar to (in fact, much simpler than) (3.32),

and following the previous steps leads to the same Nesterov optimal method. We omit this

straightforward derivation.

3.3.3.D ADMM: Solving Problem (3.27)

Looking at (3.30), it is clear that Problem (3.27) decouples across nodes also. Furthermore,

at node i a simple unconstrained quadratic problem with respect to xi must be solved, whose

closed-form solution is

xi(t + 1) = (1/(|Vi| + |Ai|)) [ ∑j∈Vi ((1/ρ)λji(t) + yji(t + 1)) + ∑k∈Ai ((1/ρ)µik(t) + zik(t + 1)) ]. (3.47)

For node i to carry out this update, it first needs to receive yji(t + 1) from its neighbors j ∈ Vi. This requires a communication step.
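In code, the update (3.47) at node i is a plain average of the shifted copies received from neighbors and the local anchor copies; the container layout below is hypothetical.

```python
import numpy as np

def x_update(rho, lam_y_pairs, mu_z_pairs):
    # ADMM position update (3.47) at a single node:
    # average of (1/rho)*lambda_ji + y_ji over neighbors j in V_i and
    # (1/rho)*mu_ik + z_ik over anchors k in A_i
    terms = [lam / rho + y for lam, y in lam_y_pairs]
    terms += [mu / rho + z for mu, z in mu_z_pairs]
    return sum(terms) / len(terms)
```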

3.3.3.E ADMM: Implementing (3.28) and (3.29)

Recall that the dual variable λji is maintained at both nodes i and j. Node i can carry the

update λji(t+ 1) in (3.28), for all j ∈ Vi, since the needed data are available (recall that yji(t+ 1)

is available from the previous communication step). To update λij(t + 1) = λij(t) + ρ(yij(t + 1) − xj(t + 1)), node i needs to receive xj(t + 1) from its neighbors j ∈ Vi. This requires a communication

step.

3.3.3.F Summary of the distributed algorithm

Our ADMM-based algorithm stops after a fixed number of iterations, denoted T . Algorithm 7

outlines the procedure derived in Sections 3.3.3.C and 3.3.3.D, and corresponds to step 2 of the

ADMM-based algorithm (Algorithm 5). Note that, in order to implement step 5 of Algorithm 7,

one must adapt Algorithm 6 to the problem at hand.


Algorithm 7 Step 2 of Algorithm 5 using ADMM: position updates
Input: x[l]
Output: x[l + 1]
1: for t = 0 to T − 1 do
2:   for each node i ∈ V in parallel do
3:     Solve Problem (3.32) by minimizing H in (3.35) with Alg. 6 to obtain yij(t + 1), j ∈ Vi
4:     for k = 1 to |Ai| do
5:       Solve Problem (3.33) to obtain zik(t + 1)
6:     end for
7:     Send yij(t + 1) to neighbor j ∈ Vi
8:     Compute xi(t + 1) from (3.47)
9:     Send xi(t + 1) to all j ∈ Vi
10:    Update λji(t + 1), µik(t + 1), j ∈ Vi, k ∈ Ai as in (3.28) and (3.29)
11:  end for
12: end for
13: return x[l + 1] = x(T)

A – Communication load Algorithm 7 shows two communication steps: step 7 and step 9.

At step 7 each node i sends |Vi| vectors in Rp, each to one neighboring sensor, and at step 9 a

vector in Rp is broadcast to all nodes in Vi. This results in 2TL|Vi| communications of Rp vectors

for node i for the overall algorithm. When comparing with SGO in [18], for T iterations, node i

sends T |Vi| vectors in Rp. The increase in communications is the price to pay for the parallel

nature of the ADMM-based algorithm.

3.3.4 Experimental setup

Unless otherwise specified, the generated geometric networks are composed of 4 anchors and 50

sensors, with an average node degree, i.e., (1/|V|)∑i∈V |Vi|, of about 6. In all experiments the

sensors are distributed uniformly at random on a 1 × 1 square and, unless otherwise stated, the

anchors are placed at the four corners of the unit square (to follow [18]), namely, at (0, 0), (0, 1),

(1, 0) and (1, 1). These properties require a communication range of about R = 0.24. Since

localizability is an issue when assessing the accuracy of sensor network localization algorithms,

the networks used are first checked to be generically globally rigid, so that a small disturbance in

measurements does not create placement ambiguities. To detect generic global rigidity, we used

the methodologies in [47, Section 2]. The results for the proposed algorithm consider L = 40 MM

iterations, unless otherwise stated.
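The setup above can be reproduced with a short generator; names and structure below are illustrative, and the generic global rigidity check of [47] is not sketched.

```python
import numpy as np

def random_geometric_network(n_sensors=50, comm_range=0.24, seed=0):
    # Sensors uniform on the unit square; anchors at its four corners;
    # an edge joins any two sensors within communication range
    rng = np.random.default_rng(seed)
    sensors = rng.uniform(size=(n_sensors, 2))
    anchors = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    edges = [(i, j) for i in range(n_sensors) for j in range(i + 1, n_sensors)
             if np.linalg.norm(sensors[i] - sensors[j]) <= comm_range]
    avg_degree = 2.0 * len(edges) / n_sensors  # (1/|V|) * sum_i |V_i|
    return sensors, anchors, edges, avg_degree
```

With the stated R = 0.24 the average degree hovers around 6, matching the setup, although the exact value depends on the random draw.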

3.3.4.A ADMM and SGO: RMSE vs. initialization noise

Two sets of experiments were made to compare the RMSE performance of SGO in [18] and the

proposed Algorithm 5, termed DCOOL-NET, as a function of the initialization quality (i.e., σinit

in (3.22)). In the first set, range measurements are noiseless (i.e., σ = 0 in (3.64)), whereas in the

second set we consider noisy range measurements (σ > 0).


Figure 3.7: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noiseless: σ = 0 in (3.64). Anchors are at the unit square corners. Proposed DCOOL-NET (red, solid) and SGO (blue, dashed) attain comparable accuracy.

Table 3.2: Squared error dispersion over Monte Carlo trials for Figure 3.7.

σinit    DCOOL-NET    SGO
0.01     0.0002       0.0007
0.10     0.0638       0.1290
0.30     0.2380       0.3400

Noiseless range measurements In this setup 300 Monte Carlo trials were run. As the measurements are accurate (σ = 0 in (3.64)), one would expect not only negligible RMSE values, but

also considerable agreement among the Monte Carlo trials on the solution for sufficiently close

initializations. Figure 3.7 confirms that both DCOOL-NET and SGO achieve small positioning

errors, and their accuracies are comparable. As stated before, SGO also has a low computational

complexity, in fact lower than that of DCOOL-NET (although DCOOL-NET is fully parallel

across nodes, whereas SGO operates by activating the nodes sequentially, implying some high-level

coordination). Table 3.2 shows the squared error dispersion over all Monte Carlo trials, i.e., the

standard deviation of the data {SEm : m = 1, . . . , M}, SEm = ‖xm − x?‖², for both algorithms.

We see that DCOOL-NET exhibits a more stable performance, in the sense that it has a lower

squared error dispersion.

Noisy range measurements We set σ = 0.12 in the noise model (3.64). Figure 3.8 shows that


Figure 3.8: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noisy: σ = 0.12 in (3.64). Anchors are at the unit square corners. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) in accuracy.

Table 3.3: Squared error dispersion over Monte Carlo trials for Figure 3.8.

σinit    DCOOL-NET    SGO
0.00     0.0118       0.0783
0.01     0.0121       0.0775
0.10     0.0727       0.1610
0.30     0.2490       0.3320

DCOOL-NET fares better than SGO: the gap between the performances of both algorithms is

now quite significant. The squared error dispersion over all Monte Carlo trials for both algorithms

is given in Table 3.3. As before, we see that DCOOL-NET is more reliable, in the sense that it

exhibits lower variance of estimates across Monte Carlo experiments.

We also considered placing the anchors randomly within the unit square, instead of at the

corners. This is a more realistic and challenging setup, where the sensors are no longer necessarily

located inside the convex hull of the anchors. The corresponding results are shown in Figure 3.9 and

Table 3.4, for 250 Monte Carlo trials. Again, DCOOL-NET achieves better accuracy. Comparing

the dispersions in Tabs. 3.3 and 3.4 also reveals that the gap in reliability between SGO and our

algorithm is now wider.


Figure 3.9: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noisy: σ = 0.12 in (3.64). Anchors were randomly placed in the unit square. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) in accuracy.

Table 3.4: Squared error dispersion over Monte Carlo trials for Figure 3.9.

σinit    DCOOL-NET    SGO
0.00     0.0097       0.0712
0.01     0.0099       0.0709
0.10     0.1550       0.3160
0.30     0.4350       0.8440
0.50     0.8330       1.3000

3.3.4.B ADMM and SGO: RMSE vs. measurement noise

To evaluate the sensitivity of both algorithms to the intensity of the noise present in range measurements (i.e., σ in (3.64)), 300 Monte Carlo trials were run for σ ∈ {0.01, 0.1, 0.12, 0.15, 0.17, 0.2, 0.3}.

Both algorithms were initialized at the true sensor positions, i.e., σinit = 0 in (3.22), and ADMM

performs L = 100 iterations³. Figure 3.10 and Table 3.5 summarize the computer simulations for

this setup. As before, ADMM consistently achieves better accuracy and stability.

3.3.4.C ADMM and SGO: RMSE vs. communication cost

We assessed how the RMSE varies with the communication load incurred by both algorithms.

We considered the general setup described in Section 3.3.4. The results are displayed in Figure 3.11.

³This is to guarantee that, in practice, ADMM indeed attained a fixed point, but the results barely changed for L = 40.


Figure 3.10: RMSE vs. σ, the intensity of measurement noise in (3.64). No initialization noise: σinit = 0 in (3.22). Anchors are at the unit square corners. Proposed ADMM (red, solid) outperforms SGO (blue, dashed) in accuracy.

Table 3.5: Squared error dispersion over Monte Carlo trials for Figure 3.10.

σ        DCOOL-NET    SGO
0.01     0.0002       0.0016
0.10     0.0177       0.0688
0.12     0.0218       0.0702
0.15     0.0326       0.0921
0.17     0.0394       0.0993
0.20     0.0525       0.1090
0.30     0.1020       0.1630

We see an interesting tradeoff: SGO converges much faster than ADMM (in terms of communication rounds) and attains a lower RMSE sooner. However, ADMM can improve its accuracy

through more communications, whereas SGO remains trapped in a suboptimal solution.

3.3.4.D ADMM: RMSE vs. parameter ρ

The parameter ρ plays a role in the augmented Lagrangian discussed in Section 3.3.3, and is

user-selected. As such, it is important to study the sensitivity of ADMM to this parameter choice.

For this purpose, we have tested several values of ρ between 1 and 200. For each choice, 300 Monte Carlo

trials were performed using noisy measurements and initializations. Figure 3.12 portrays RMSE

against ρ for L = 40 iterations of ADMM. The RMSE does not vary widely, especially for values of


Figure 3.11: RMSE versus total number of two-dimensional vectors communicated in the network. The range measurements are noiseless: σ = 0 in (3.64). Initialization is noisy: σinit = 0.1 in (3.22). Anchors are at the unit square corners. Proposed ADMM (red, solid) outperforms SGO (blue, dashed) in accuracy, at the expense of more communications.

ρ over 30, which offers some confidence in the algorithm's resilience to this parameter, a pivotal

feature from the practical standpoint. However, an analytical approach for selecting the optimal ρ

is beyond the scope of this work and is left for future research. Note that adaptive schemes

to adjust ρ do exist for centralized settings, e.g., [40], but seem impractical for distributed setups

as they require global computations.

3.3.5 Proof of majorization function properties

We now prove Proposition 9. We write Φd(u) instead of Φd(u|v) and we let 〈x, y〉 = x>y.

Convexity

Note that gd is convex as the composition of the convex, non-decreasing function (·)2+ with the

convex function ‖·‖− d. Also, hd(〈v/‖v‖, ·〉 − d) is convex as the composition of the convex Huber

function hd(·) with the affine map 〈v/‖v‖, ·〉 − d. Finally, Φd is convex as the pointwise maximum

of two convex functions.


Figure 3.12: RMSE vs. ρ. The range measurements are noisy: σ = 0.05 in (3.64). Initialization is noisy: σinit = 0.1 in (3.22). Anchors are at the unit square corners.

Tightness

It is straightforward to check that φd(v) = Φd(v) by examining separately the three cases

‖v‖ < d, d ≤ ‖v‖ < 2d and ‖v‖ ≥ 2d.

Majorization

We must show that Φd(u) ≥ φd(u) for all u. First, consider ‖u‖ ≥ d. Then, gd(u) = φd(u)

and it follows that Φd(u) = max{gd(u), hd(〈v/‖v‖, u〉 − d)} ≥ φd(u). Now, consider ‖u‖ < d and

write u = Rū, where R = ‖u‖ < d and ‖ū‖ = 1. It is straightforward to check that, in terms of R

and ū, we have φd(u) = (R − d)² and Φd(u) = hd(R〈v̄, ū〉 − d), where v̄ = v/‖v‖. Thus, we must

show that hd(R〈v̄, ū〉 − d) ≥ (R − d)². Motivated by the definition of the Huber function hd in two

branches, we divide the analysis in two cases.

Case 1: |R〈v̄, ū〉 − d| ≤ d. In this case, hd(R〈v̄, ū〉 − d) = (R〈v̄, ū〉 − d)². Noting that |〈v̄, ū〉| ≤ 1,

there holds

(R〈v̄, ū〉 − d)² ≥ inf{(Rz − d)² : |z| ≤ 1} = (R − d)²,

where the fact that R < d was used to compute the infimum over z (attained at z = 1).

Case 2: |R〈v̄, ū〉 − d| > d. In this case, hd(R〈v̄, ū〉 − d) = 2d|R〈v̄, ū〉 − d| − d². Thus,

hd(R〈v̄, ū〉 − d) ≥ d² ≥ (d − R)²,

where the last inequality follows from 0 ≤ R < d.



3.3.7 Proof of (3.31)

We show how to rewrite (3.30) as (3.31). First, note that F(y, z) in (3.24) can be rewritten as

F(y, z) = ∑i ∑j∈Vi Fij(yii, yij) + 2 ∑i ∑k∈Ai Fik(zik). (3.48)


Here, we used the fact that Fij(yji, yjj) = Fji(yjj, yji), which follows from dij = dji and Φd(u|v) = Φd(−u|−v), see (3.15). In addition, there holds

∑i ∑j∈Vi λ>ji(yji − xi) + (ρ/2)‖yji − xi‖²

= ∑j ∑i∈Vj λ>ij(yij − xj) + (ρ/2)‖yij − xj‖²

= ∑i ∑j∈Vi λ>ij(yij − xj) + (ρ/2)‖yij − xj‖². (3.49)

The first equality follows from interchanging i with j. The second equality follows from noting

that i ∈ Vj if and only if j ∈ Vi. Using (3.48) and (3.49) in (3.30) gives (3.31).

3.3.8 Summary

We presented a convex majorizer, crafted to be a tight fit to the sensor network localization

problem (1.1).

We developed a distributed, fully parallel algorithm to optimize the convex majorizer, based

on the ADMM. This choice allowed the problem to be distributed, but at the expense of an

impractical communication load. This behavior can be explained by the increase in the number

of variables when adding edge variables to the equivalent problems, but mainly by the fact that

the node subproblems do not have closed-form exact solutions, so ADMM has to compensate for

the deviations of the partial iterative solutions with more communication rounds.

We are currently establishing the proof of tightness in Rn for n > 1, as stated in Conjecture 10,

and investigating a proximal method to efficiently minimize each majorizer in a distributed

fashion, also allowing for gossip-like asynchronous solutions.

3.4 Sensor network localization: a graphical model approach

This section focuses on the sensor network localization problem when one has access to the

mean and variance of normally distributed priors on the sensor positions. In this setting we do

not need landmarks or anchors to resolve rotation, translation, or flip ambiguities, so this solution

is appropriate when anchors are not easy to determine but, on deployment, we have some notion

of the sensors' drop-off locations and spread.

The problem is cast under the formalism of probabilistic graphical models and the optimization

problem to obtain the MAP (maximum a posteriori) estimate of the sensor positions is stated. The

proposed goals concentrate on suboptimal approximation methods for the derived combinatorial

problem.

In general, the deployment of the sensors in the terrain is not accurate, but it is sometimes

possible to delimit regions that contain each sensor with some probability. Many

sensor networks can also acquire noisy distance measurements between neighboring nodes, thus

obtaining data to estimate their true positions. Under such conditions, each node’s position can


be seen as a random variable whose distribution depends on the distribution of the noisy mea-

surements, the prior on its own position and the distributions of the neighboring nodes positions.

Here, the probabilistic graphical models framework may capture this complex set of dependen-

cies between random variables and enable the use of general purpose algorithms for performing

inference.

The graphical model for the sensor network coincides with the measurement model.

3.4.1 Uncertainty models

In order to establish the graphical model formalism for our problem, we restate several objects

already defined, now framed in the probabilistic setting. Range measurements are contaminated

by zero-mean independent Gaussian noise, so the measured distance between node t and node u

can be expressed as

dtu = ‖x?t − x?u‖ + νtu, νtu ~ N(0, σ) i.i.d., (3.50)

where x?t is the true position of node t. A set of measurements corresponding to a subset of edges

I ⊂ E is denoted by dI and, in the same way, a set of positions corresponding to a subset of

nodes V ⊂ V is denoted by xV. The probability distribution of νtu will be denoted by pν(νtu). The

noisy range measurement acquired between sensor t and anchor k ∈ At is modelled by

rtk = ‖x?t − ak‖+ νtk, (3.51)

where ak is the anchor position and νtk is a random variable with probability distribution pν(νtk),

the same as in (3.50).

It is assumed that dtu = dut and that the position variables x are independent of the random

variables νE = {νtu : t ∼ u ∈ E} and νV = {νtk : t ∈ V, k ∈ At}. Additionally, it is presumed that

each sensor position xt has a prior distribution pt(xt) = N(xt; µt, Rt). Each xt is independent

of xV−t.

The joint distribution is, thus,

p(x, dE) = p(dE|x) p(x) = ∏t∼u p(dtu|xt, xu) ∏t pt(xt) ∏t ∏k∈At p(rtk|xt), (3.52)

and the a posteriori distribution is proportional to the joint distribution in (3.52), i.e.,

p(x|dE) ∝ ∏t∼u pν(‖xt − xu‖ − dtu) ∏t pt(xt) ∏t ∏k∈At pν(‖xt − ak‖ − rtk), (3.53)

where we explicitly wrote the conditional probabilities in terms of pν .

3.4.2 Optimization problem

As defined in the previous section, all probability distributions are Gaussian. To find the

maximum a posteriori (MAP) estimate of the sensor positions, an optimization problem is cast

by taking the negative logarithm of Eq. (3.53), thus obtaining

minimize_x ∑t∼u θtu(xt, xu) + ∑t θt(xt), (3.54)

where the pairwise potentials are

θtu(xt, xu) = (1/σ²)(‖xt − xu‖ − dtu)²,

and the single-node potentials are

θt(xt) = (1/σ²) ∑k∈At (‖xt − ak‖ − rtk)² + (xt − µt)>Rt⁻¹(xt − µt).

Problem (3.54) is known to be NP-hard for generic graphs, as stated earlier.
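For concreteness, the potentials in (3.54) can be transcribed directly into code; the function names below are ours, not part of the text.

```python
import numpy as np

def pairwise_potential(x_t, x_u, d_tu, sigma):
    # theta_tu(x_t, x_u) = (1/sigma^2) * (||x_t - x_u|| - d_tu)^2
    return (np.linalg.norm(x_t - x_u) - d_tu) ** 2 / sigma ** 2

def node_potential(x_t, anchors, ranges, mu_t, R_t, sigma):
    # theta_t(x_t): anchor range residuals plus the Gaussian prior term
    anchor_cost = sum((np.linalg.norm(x_t - a) - r) ** 2
                      for a, r in zip(anchors, ranges)) / sigma ** 2
    diff = x_t - mu_t
    return anchor_cost + diff @ np.linalg.solve(R_t, diff)
```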

3.4.3 Combinatorial problem

We discretize the 95% confidence region of each prior distribution, collecting an alphabet

of candidate node positions Xt = {αt(1), . . . , αt(nt)} for each sensor node t, where nt is the cardinality

of each alphabet, and we formulate a combinatorial problem over the collection of such alphabets

as

minimize_{xt∈Xt} ∑t∼u θtu(xt, xu) + ∑t θt(xt). (3.55)

To rewrite the problem over binary variables, we translate this functional form into matrix form;

we define the matrix Θtu as the evaluation of the pairwise potential θtu(xt, xu) on all points of the

intervening nodes' alphabets,

Θtu := [θtu(αt(i), αu(j))], i = 1, . . . , nt, j = 1, . . . , nu, (3.56)

that is, entry (i, j) of Θtu is the pairwise potential evaluated at the i-th candidate of node t and the j-th candidate of node u,

and the vector θt of all evaluations of the node potential function θt(xt) over the alphabet Xt,

θt := (θt(αt(1)), θt(αt(2)), . . . , θt(αt(nt)))>.

We also specify the set ∆t := {et : et ∈ {0, 1}nt, e>t 1 = 1}; it is now possible to rewrite the problem

over the binary variables et as

minimize_{et∈∆t} ∑t∼u e>t Θtu eu + ∑t e>t θt. (3.57)

The formulation in Problem (3.57) is well known in the probabilistic graphical models literature.
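The binary reformulation works because e>t Θtu eu with indicator vectors simply selects one entry of Θtu; the minimal sketch below (illustrative names) makes this concrete.

```python
import numpy as np

def indicator(size, idx):
    # A vertex of Delta_t: binary vector with a single one
    e = np.zeros(size)
    e[idx] = 1.0
    return e

def discrete_cost(assignment, Theta, theta, edges):
    # Objective of (3.57) for an assignment {node: chosen alphabet index}
    e = {t: indicator(len(theta[t]), i) for t, i in assignment.items()}
    edge_term = sum(e[t] @ Theta[(t, u)] @ e[u] for t, u in edges)
    node_term = sum(e[t] @ theta[t] for t in theta)
    return edge_term + node_term
```

Evaluating this cost over all joint assignments reproduces the combinatorial search of (3.55), which is why (3.55) and (3.57) are equivalent.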


3.4.4 Related work

3.4.4.A Linear relaxation and tree-reweighted message passing algorithms

The problem of minimizing (3.55) is widely attacked by means of a linear programming (LP)

relaxation (see, e.g., Wainwright et al. [48]). This approach relies on minimizing (3.55) over the local

marginal polytope, relaxing the integer constraints to non-negativity constraints. This relaxation is tight

for tree-structured graphs. Nevertheless, the number of variables is very large and the method

does not scale well. To address this issue, other approaches rely on maximizing the dual

of (3.55). The tree-reweighted message passing algorithms solve a dual problem determined by a

convex combination of trees.

3.4.4.B Dual decomposition

Komodakis et al. [49] proposed a Lagrangian relaxation of the MAP problem related to the

optimization technique of dual decomposition. Rather than minimizing (3.55) directly, the problem

is decomposed into a set of subproblems which are easier to solve. As the subproblems emerge

from dualization, the sum of their minima is a lower bound on the optimal value of (3.55). One can

apply different decompositions, resulting in distinct relaxations; if the minimization of (3.55) is

decomposed over a set of trees, this relaxation is equivalent to the LP relaxation. A major issue

arising from the dual nature of this algorithm is how to recover a primal solution.

Remarks

1. Problem (3.54) is very hard to solve. A possible approach is to discretize the 6-sigma ellipsoid

defined by the prior distributions, as mentioned in Section 3.4.3, obtaining a combinatorial

problem for which it is possible to design linear and semi-definite relaxations (see Wainwright

et al. [48]), in order to approximate the optimal solution.

2. Dealing with graphical models with cycles — like the ones generally arising from geometric

networks — is also a difficult task. In fact, there are no guarantees of convergence of the

sum-product updates on such topologies.

3. As observed in Ihler et al. [4], only a coarse discretization of the 2D or 3D space leads to

a computationally tractable problem. Nevertheless, the obtained result can provide

an initialization to be refined by local optimization methods as explored, e.g., in Soares et

al. [8].

3.4.5 Contributions

1. Design an effective convex approximation to Problem (3.54), following the approach sketched

in Remark 1;


2. Formulate an iterative scheme for monotonically decreasing the cost function, by performing

inference on judiciously chosen spanning trees of the graph, thus tackling the issue noted in

Remark 2;

3. Provide numerical results assessing the value of the approaches taken in this section.

3.4.6 Algorithms

3.4.6.A Linear and semidefinite relaxations

The work of Wainwright and Jordan [48] establishes a linear relaxation of Problem (3.57) that

we will derive in a different way. The cited work [48] proves that the relaxation is exact for

tree-structured graphs. We begin by defining:

Θ :=
[ 0    Θ12  Θ13  · · ·  Θ1n ]
[ Θ21  0    Θ23  · · ·  Θ2n ]
[ ...             ...       ]
[ Θn1  · · ·            0   ]

θ := (θ1, θ2, . . . , θn)>,  e := (e1, e2, . . . , en)>,

where Θtu is as in (3.56) if the edge t ∼ u belongs to the edge set E, and the zero matrix otherwise.

We reformulate Problem (3.57) as

minimize   Tr( [0, θ>/2; θ/2, Θ]> E )
subject to E = [1; e][1; e]>
           et ∈ ∆t (3.58)

and rewrite the restrictions to isolate the nonconvexity in a rank constraint. In order to do so, we

write our variable E as

E = [1, E21>; E21, E22],

where E21 = e and E22 = ee>. We can write the equivalent restrictions as

E = E>
E ≥ 0
E11 = 1
diag((E22)ii) = (E21)i
(E22)ij 1 = (E21)i
1>(E21)i = 1
Rank(E) = 1. (3.59)

To achieve the linear relaxation of (3.58) we drop the rank constraint in (3.59). As stated before,

this linear program is only exact for tree-structured graphs, as shown in [48]. For graphs with

cycles we propose an SDP reformulation of (3.58), with the constraints in (3.59) except the first,

E = E>, which we replace with

E ⪰ 0. (3.60)


One could expect this stronger constraint to yield a better approximation in graphs with

cycles, at the expense of the increase in computational cost incurred when passing from the linear

to the semidefinite problem.

3.4.6.B Distributed tree-based inference

The second contribution of this work is an iterative scheme to monotonically decrease the cost

function.

We know that inference on trees is exact. In fact, the methods referred to in Section 3.4.4

work with a dual problem to have access to this important property. Our approach will perform

inference on trees, but still retains a primal nature. At each step we choose a spanning tree over

the geometric measurement graph and perform inference on it. In order to choose the edges that

go into the spanning tree, for each edge t ∼ u, we find the vectors at and au such that a separable

matrix majorizes as tightly as possible the edge potentials matrix. Here, separable matrix means

we can impute to each alphabet element of each node a constant part of the entries of the edge

potentials matrix Θtu. Mathematically, for each edge t ∼ u, we solve the problem

minimize_{at,au}  ‖at1> + 1a>u − Θtu‖F
subject to        at1> + 1a>u ≥ Θtu
                  e>t at + e>u au = e>t Θtu eu, (3.61)

where ‖·‖F denotes the Frobenius norm. The values for et and eu are given as an initialization. The

first restriction in (3.61) ensures that the separable approximation to the edge potentials matrix

lies above it, whereas the second one guarantees it is tight at the initialization point. The optimal

value of problem (3.61) represents the cost of breaking the edge t ∼ u and at and au are the vectors

to add to the node potentials of nodes t and u, respectively, in case the edge t ∼ u is not present

in the spanning tree. Mathematically, we construct a sequence of problems solvable in polynomial

time that majorize (3.57):

minimize_{et∈∆t}  f(e) = ∑t∼u∈T e>t Θtu eu + ∑t e>t θ̄t, (3.62)

where θ̄t is

θ̄t = θt + ∑{t∼v∈E, t∼v∉T} (at)t∼v. (3.63)

We build the spanning tree with the edges that are more expensive to break, thus retaining

those which are less separable, and perform exact inference on the resulting tree. The method

then builds the maximum spanning tree T of the measurement graph G breaking the cheapest

edges. There are many algorithms which are able to compute minimum spanning trees, even in a

distributed way (see for example the work of Gallager et al. in [50]). The maximum spanning tree

is obtained by invoking the chosen minimum spanning tree algorithm with the edge weights w′t∼u = max{wi∼j : i ∼ j ∈ E} − wt∼u + 1. We note that using an upper bound on max{wi∼j : i ∼ j ∈ E} will also work and spares the distributed computation of the maximum.

Algorithm 8 Distributed monotonic spanning tree-based algorithm
Input: Initialization e
Output: Estimate e
1: while some stopping criterion is not met do
2:   for t ∼ u ∈ E do
3:     Solve problem (3.61) for edge t ∼ u
4:     wt∼u = optimal value of problem (3.61)
5:     (at, au)t∼u = (at, au) optimal points of problem (3.61)
6:   end for
7:   Compute maximum spanning tree T of G, feeding edge weights w′t∼u = max{wi∼j : i ∼ j ∈ E} − wt∼u + 1 to a (distributed) minimum spanning tree algorithm
8:   Increment node potentials of broken edges θ̄t as in (3.63)
9:   Perform exact inference on T to solve problem (3.62), obtaining a new estimate e
10: end while
11: return e

Algorithm 8 monotonically

decreases the cost in (3.57) at each iteration. These properties are inherited from the Majorization-Minimization framework (see Hunter et al. [37] for an in-depth treatment of the framework). We

stress that Algorithm 8 does not prescribe any method to perform exact inference on each tree T.

This flexibility is also found in Komodakis et al. [49]. To obtain a distributed algorithm, we can

use a message passing max-sum method, which is distributed across nodes.
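The weight transform w′t∼u = max{w} − wt∼u + 1 reduces the maximum spanning tree to a minimum spanning tree computation. A centralized Kruskal sketch of this reduction follows (illustrative only; the thesis relies on distributed algorithms such as [50]).

```python
def maximum_spanning_tree(n_nodes, weighted_edges):
    # Maximum spanning tree via w' = max(w) - w + 1 fed to Kruskal's algorithm.
    # weighted_edges: list of (t, u, w); returns the kept (t, u) pairs.
    w_max = max(w for _, _, w in weighted_edges)
    by_transformed = sorted(weighted_edges, key=lambda e: w_max - e[2] + 1)
    parent = list(range(n_nodes))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for t, u, _ in by_transformed:
        rt, ru = find(t), find(u)
        if rt != ru:
            parent[rt] = ru
            tree.append((t, u))
    return tree
```

Sorting by the transformed weight is equivalent to sorting by decreasing original weight, so the heaviest (least separable) edges are kept first.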

As with all Majorization-Minimization algorithms, Algorithm 8 requires an initialization. Our

strategy is to have an initial iteration of Algorithm 8 where, instead of solving problem (3.61), we

solve the least squares problem

minimize_{at,au}  ‖at1> + 1a>u − Θtu‖²F

for each edge t ∼ u. As in (3.61), we are looking for the best fit between a separable matrix and

the edge potential matrix Θtu but, as the separable matrix will only be used in an initialization step,

it does not majorize the edge potential matrix, nor does it obey the tightness requirement.
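This least-squares initialization has a simple closed form, unique up to the additive gauge (at + c1, au − c1): row and column means of Θtu centered by half the grand mean. The closed form is our own observation, not stated in the text; stationarity can be checked numerically.

```python
import numpy as np

def separable_ls_fit(Theta):
    # One minimizer of ||a 1^T + 1 b^T - Theta||_F^2:
    # row/column means centered by half the grand mean
    grand = Theta.mean()
    a = Theta.mean(axis=1) - grand / 2.0
    b = Theta.mean(axis=0) - grand / 2.0
    return a, b
```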

3.4.6.C Distributed nature of the algorithm

Algorithm 8 is distributed because, at each edge, the intervening nodes can agree on which one

will perform Step 3; the computing node, say t, will then communicate with the neighbor u to

pass wt∼u and au, thus enabling the distributed computation of the maximum spanning tree T.

Inference in Step 9 is also distributed, as mentioned earlier.

3.4.7 Experimental results

In this Section we present numerical experiments assessing the quality of the proposed algo-

rithms in the context of the localization problem.


Methods

We conducted simulations with several uniquely localizable geometric networks with 5 sensors

randomly distributed in a two-dimensional square of size 1 × 1. The discretization was random,

with 13 elements in each alphabet.

The noisy range measurements are generated according to

dij = | ‖x⋆i − x⋆j‖ + νij |,   rik = | ‖x⋆i − ak‖ + νik |,   (3.64)

where x⋆i is the true position of node i, and {νij : i ∼ j ∈ E} ∪ {νik : i ∈ V, k ∈ Ai} are

independent Gaussian random variables with zero mean and standard deviation σ. The accuracy

of the algorithms is measured by the original nonconvex cost value in (3.55) and by the mean

positioning error, defined as

MPE = (1/M) ∑_{m=1}^{M} ∑_{i=1}^{n} ‖x̂i(m) − x⋆i‖,   (3.65)

where M is the total number of Monte Carlo trials, x̂i(m) is the estimate generated by an algorithm at Monte Carlo trial m, and x⋆i is the true position of node i.
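For concreteness, the measurement model (3.64) and the metric (3.65) can be sketched in a few lines of numpy. The true positions and the per-trial estimates below are hypothetical stand-ins, not the output of any of the compared algorithms:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, sigma = 5, 50, 0.01                    # 5 sensors, 50 trials, noise level
x_true = rng.uniform(0.0, 1.0, size=(n, 2))  # deployment in the unit square

def noisy_ranges(x, sigma, rng):
    """Generate d_ij = | ||x_i - x_j|| + nu_ij | for all pairs, as in (3.64)."""
    d = {}
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            d[i, j] = abs(np.linalg.norm(x[i] - x[j]) + sigma * rng.standard_normal())
    return d

def mpe(estimates, x_true):
    """Mean positioning error (3.65): node errors summed, averaged over trials."""
    return sum(np.linalg.norm(x_hat - x_true, axis=1).sum()
               for x_hat in estimates) / len(estimates)

# Stand-in estimates: the truth perturbed by small errors, one per trial.
estimates = [x_true + 0.01 * rng.standard_normal((n, 2)) for _ in range(M)]
print(round(mpe(estimates, x_true), 4))
```

With sigma set to zero, `noisy_ranges` reduces to the exact inter-node distances, which is a convenient sanity check on the model.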

3.4.7.A Linear and semidefinite relaxations

The first experiment aimed at comparing the performance of the linear and semidefinite re-

laxations. In Figure 3.13, we can observe the average nonconvex cost over Monte Carlo trials,

Figure 3.13: Average cost over Monte Carlo trials, for the LP and SDP relaxations.


3. Distributed network localization with initialization: Nonconvex procedures

stabilizing on 1.745 for the LP, and 1.478 for the SDP relaxations. Thus, we obtain an improve-

ment of 15% by using the proposed tighter relaxation. The mean positioning error depicted in

Figure 3.14: Mean positioning error per sensor over Monte Carlo trials, for the LP and SDP relaxations.

Figure 3.14 also shows that, as expected, a tighter relaxation can perform better in terms of

accuracy. The rank of the solution matrix in the experiments also shows the superiority of our

Figure 3.15: Rank of the solution matrix E in the tested Monte Carlo trials, for the LP and SDP relaxations.

relaxation in accuracy: for all the trials, the SDP method is rank 1, proving that the solution of the

SDP problem in (3.58), with the restriction (3.60), is the solution to the nonconvex, combinatorial

problem (3.57). The LP relaxation, on the other hand, is very loose in most of the trials, as seen in

Figure 3.15. The price to pay for the accuracy gains is a noticeable increase in execution time,

from less than a second for the LP to several minutes for the SDP.


Algorithm 9 Coordinate descent algorithm
Input: Initialization x
Output: Estimate x
1: while some stopping criterion is not met do
2:   for t ∈ V do
3:     Compute the cost C for all elements of the alphabet Xt, considering xV−t fixed
4:     xt = argmin_y C(x1, · · · , xt−1, y, xt+1, · · · , xn)
5:   end for
6: end while
7: return x
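Algorithm 9 translates directly into code. The sketch below is generic over any cost C and finite per-node alphabets; the three-node cost at the bottom is a toy stand-in (not the localization potentials), included to show that coordinate descent can stall at a coordinate-wise minimum:

```python
def coordinate_descent(cost, alphabets, x0, max_sweeps=100):
    """Algorithm 9: cyclically minimize cost over one coordinate at a time.

    cost      -- function mapping a tuple of labels to a real number
    alphabets -- list of finite alphabets X_t, one per node
    x0        -- initial labeling (e.g., an LP estimate)
    """
    x = list(x0)
    for _ in range(max_sweeps):
        changed = False
        for t, alphabet in enumerate(alphabets):
            # evaluate the cost for every element of X_t, with the others fixed
            best = min(alphabet, key=lambda y: cost(tuple(x[:t] + [y] + x[t + 1:])))
            if best != x[t]:
                x[t], changed = best, True
        if not changed:  # stopping criterion: a full sweep with no change
            break
    return tuple(x)

# Toy cost on 3 nodes with alphabet {0,...,4}: one unary term per end node
# and a pairwise coupling between nodes 0 and 1.
cost = lambda x: (x[0] - 2) ** 2 + (x[1] - x[0]) ** 2 + (x[2] - 3) ** 2
x_hat = coordinate_descent(cost, [range(5)] * 3, (0, 0, 0))
print(x_hat)  # → (1, 1, 3): a coordinate-wise minimum, while (2, 2, 3) is global
```

This stalling behavior is precisely why the MM refinement of Algorithm 8 can outperform plain coordinate descent in the experiments that follow.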

Table 3.6: Cost values per sensor

LP       MM+LS    MM+LP    CD+LP
0.3617   0.0358   0.0788   0.0719

3.4.7.B Distributed majorization-minimization

In a second experiment, we compared the performance of our Algorithm 8 with a vanilla co-

ordinate descent procedure, described in Algorithm 9. We ran our MM algorithm initialized as

explained at the end of Section 3.4.6.B (MM+LS), and our MM algorithm and the coordinate

descent method, both initialized with an LP estimate (MM+LP and CD+LP). Measurements

were contaminated with white Gaussian noise with standard deviation σ = 0.01. The resulting

cost values per sensor are shown in Table 3.6. Here we can see that all refinement strategies im-

prove the initialization score, but our combination of Majorization-Minimization plus least squares

initialization (MM+LS) decreases the cost by one order of magnitude.

3.4.8 Summary

We addressed the sensor network localization problem with a graphical model approach, by

proposing an SDP relaxation which outperforms the standard LP approximation. Also, we pro-

posed a distributed iterative algorithm to estimate the solution of the problem by means of the

Majorization-Minimization framework, by judiciously choosing spanning trees where inference

could be exactly performed. This method outperformed the coordinate descent method initial-

ized with the LP estimate.


4 Robust algorithms for sensor network localization

Contents
4.1 Related work and contributions . . . . . 78
4.2 Discrepancy measure . . . . . 79
4.3 Convex underestimator . . . . . 79
    4.3.1 Approximation quality of the convex underestimator . . . . . 80
4.4 Numerical experiments . . . . . 82
4.5 Summary . . . . . 83


In practice, network applications have to deal with failing nodes, malicious attacks, or, some-

how, nodes facing highly corrupted data — generally classified as outliers. This calls for robust,

uncomplicated, and efficient methods. We propose a dissimilarity model for network localization

which is robust to high-power noise, but also discriminative in the presence of regular Gaussian

noise. We capitalize on the known properties of the M-estimator Huber penalty function to obtain

a robust, but nonconvex, problem, and devise a convex underestimator, tight in the function terms,

that can be minimized in polynomial time. Simulations show the performance advantage of using

this dissimilarity model in the presence of outliers and under regular Gaussian noise: our proposal

consistently outperforms the L1 norm, roughly halving the positioning error.

4.1 Related work and contributions

Some approaches to robust localization rely on identifying outliers from regular data. Then,

outliers are removed from the estimation of sensor positions. The work in [4] formulates the

network localization problem as an inference problem in a graphical model. To approximate an

outlier process the authors add a high-variance Gaussian to the Gaussian mixtures and employ

nonparametric belief propagation to approximate the solution. In the same vein, [51] employs the

EM algorithm to jointly estimate outliers and sensor positions. Recently, the work [52] tackled

robust localization with estimation of positions, mixture parameters, and outlier noise model for

unknown propagation conditions.

Alternatively, methods may perform a soft rejection of outliers, still allowing them to con-

tribute to the solution. In the work [6] a maximum likelihood estimator for Laplacian noise was

derived and subsequently relaxed to a convex program by linearization and dropping a rank
constraint. The authors in [39] present a robust Multidimensional Scaling based on the least-trimmed

squares criterion minimizing the squares of the smallest residuals. In [5] the authors use the Huber

loss [53] composed with a discrepancy between measurements and estimate distances, in order to

achieve robustness to outliers. The resulting cost is nonconvex, and optimized by means of the

Majorization-Minimization technique.

The cost function we present incorporates outliers into the estimation process and does not

assume any outlier model. We capitalize on the robust estimation properties of the Huber function

but, unlike [5], we do not address the nonconvex cost in our proposal. Instead, we produce a convex

relaxation which numerically outperforms other natural formulations of the problem.

We present a tight convex underestimator to each term of the robust discrepancy measure for

sensor network localization. Further, we analyze its tightness and compare it with other discrepancy


measures and appropriate relaxations. Our approach assumes no specific outlier model, and all

measurements contribute to the estimate. Numerical simulations illustrate the quality of the convex

underestimator.

4.2 Discrepancy measure

The maximum-likelihood estimator for the sensor positions with additive i.i.d. Gaussian noise

contaminating range measurements is the solution of the optimization problem

minimize_x  fG(x),

where

fG(x) = ∑_{i∼j} ½ (‖xi − xj‖ − dij)² + ∑_i ∑_{k∈Ai} ½ (‖xi − ak‖ − rik)²

is the cost in (1.1). However, outlier measurements will heavily bias the solutions of the optimization problem, since their magnitude will be amplified by the quadratic loss hQ(t) = t² at each outlier

term. From robust estimation, we know some alternatives to perform soft rejection of outliers,

namely, using the L1 loss h|·|(t) = |t| or the Huber loss

hR(t) = { t²,           if |t| ≤ R,
        { 2R|t| − R²,   if |t| > R.    (4.1)
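For reference, (4.1) translates directly into code. The sketch below checks the two regimes and continuity at |t| = R; here R is a single free parameter, although in the localization cost it is chosen per edge:

```python
def huber(t, R):
    """Huber loss h_R(t) of (4.1): quadratic for |t| <= R, linear beyond."""
    return t * t if abs(t) <= R else 2 * R * abs(t) - R * R

R = 0.1
print(huber(0.05, R))                    # quadratic regime: t**2
print(huber(1.0, R))                     # linear regime: 2*R*|t| - R**2
print(abs(huber(R, R) - R * R) < 1e-15)  # both branches meet at |t| = R: True
```

The quadratic segment keeps the estimator efficient under regular Gaussian noise, while the linear tails cap the influence of any single large residual.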

The Huber loss joins the best of two worlds: It is robust for large values of the argument — like

the L1 loss — and for reasonable noise levels it behaves quadratically, thus leading to the maximum-

likelihood estimator adapted to regular Gaussian noise. Figure 4.1 depicts a one-dimensional

example of these different costs. We can observe in this simple example the main properties

of the different cost functions, in terms of adaptation to low/medium-power Gaussian noise and

high-power outlier spikes. Using (4.1) we can write our modified robust localization problem as

minimize_x  fR(x),    (4.2)

where

fR(x) = ∑_{i∼j} ½ hRij(‖xi − xj‖ − dij) + ∑_i ∑_{k∈Ai} ½ hRik(‖xi − ak‖ − rik).    (4.3)

This function is nonconvex and, in general, difficult to minimize. We shall provide a convex

underestimator, that tightly bounds each term of (4.3), thus leading to better estimation results

than other relaxations which are not tight [3].

4.3 Convex underestimator

To convexify fR we can replace each term by its convex hull as depicted in Figure 4.2. Here,

we observe that the high-power behavior is maintained, whereas the medium/low-power is only


Figure 4.1: The different cost functions under consideration: the maximum-likelihood independent white Gaussian noise term fQ(xi, xj) = (‖xi − xj‖ − dij)² shows the steepest tails, which act as outlier amplifiers; the L1 loss f|·|(xi, xj) = |‖xi − xj‖ − dij|, associated with impulsive noise, which fails to model the Gaussianity of regular operating noise; and, finally, the Huber loss fR(xi, xj) = hR(‖xi − xj‖ − dij), which combines robustness to high-power outliers and adaptation to medium-power Gaussian noise.

altered in the convexified area. We define the convex costs by composing each of the convex functions h with the nondecreasing function (·)+,

(t)+ = max{0, t},

which, in turn, transforms the discrepancies

δij(x) = ‖xi − xj‖ − dij,    δik(xi) = ‖xi − ak‖ − rik.

As (δij(x))+ and (δik(x))+ are convex and each one of the functions h is convex and nondecreasing over the nonnegative reals, then

f̂R(x) = ∑_{i∼j} ½ h((‖xi − xj‖ − dij)+) + ∑_i ∑_{k∈Ai} ½ h((‖xi − ak‖ − rik)+)    (4.4)

is also convex.
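A quick numerical sanity check of the composition, sketched in one dimension for the Huber case with a hypothetical anchor term (anchor at 0, measured range 1): the convexified term never exceeds the original one, and the two coincide whenever the discrepancy is nonnegative.

```python
import numpy as np

def huber(t, R):
    return t * t if abs(t) <= R else 2 * R * abs(t) - R * R

def term_nonconvex(x, a, r, R):
    """Original Huber term h_R(||x - a|| - r) of (4.3), in 1-D."""
    return huber(abs(x - a) - r, R)

def term_convex(x, a, r, R):
    """Convexified term h_R((||x - a|| - r)_+) of (4.4), in 1-D."""
    return huber(max(0.0, abs(x - a) - r), R)

a, r, R = 0.0, 1.0, 0.1
for x in np.linspace(-2.0, 2.0, 401):
    assert term_convex(x, a, r, R) <= term_nonconvex(x, a, r, R) + 1e-12
    if abs(x - a) >= r:  # outside the measured range the two terms coincide
        assert term_convex(x, a, r, R) == term_nonconvex(x, a, r, R)
print("underestimator verified")
```

Inside the ball ‖x − a‖ < r the clipped argument is zero, so the convexified term vanishes there; this is exactly the flattened region visible in Figure 4.2.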

4.3.1 Approximation quality of the convex underestimator

The quality of the convexified quadratic problem was addressed in Section 2.5.1, which we

summarize here for convenience of the reader and extend to the two other convex problems.


Figure 4.2: All functions f̂(x) = h(max{0, |xi − xj| − dij}) are tight underestimators of the nonconvex costs in Figure 4.1. They are the convex envelopes and, thus, the best convex approximations to each one of the original nonconvex cost terms. The convexification is performed by restricting the arguments to be nonnegative.

The optimal value of the nonconvex f, denoted by f⋆, is bounded by

f̂⋆ = f̂(x̂⋆) ≤ f⋆ ≤ f(x̂⋆),

where x̂⋆ is the minimizer of the convex underestimator f̂, and

f⋆ = min_x f(x)

is the minimum of the function f. A bound for the optimality gap is, thus,

f⋆ − f̂⋆ ≤ f(x̂⋆) − f̂⋆.

It is evident that in all cases (quadratic, Huber, and absolute value) f̂ is equal to f when ‖xi − xj‖ ≥ dij and ‖xi − ak‖ ≥ rik. When the function terms differ, say for all edges i ∼ j ∈ E2 ⊂ E, we have h((‖xi − xj‖ − dij)+) = 0, and similarly for the anchor terms, leading to

f⋆Q − f̂⋆Q ≤ ∑_{i∼j∈E2} ½ (‖x̂⋆i − x̂⋆j‖ − dij)²    (4.5)

f⋆|·| − f̂⋆|·| ≤ ∑_{i∼j∈E2} ½ |‖x̂⋆i − x̂⋆j‖ − dij|    (4.6)

f⋆R − f̂⋆R ≤ ∑_{i∼j∈E2} ½ hRij(‖x̂⋆i − x̂⋆j‖ − dij),    (4.7)

where

E2 = {i ∼ j ∈ E : ‖x̂⋆i − x̂⋆j‖ < dij}.


Table 4.1: Bounds on the optimality gap for the example in Figure 4.3

Cost             f⋆ − f̂⋆    Eqs. (4.5)-(4.7)    Eqs. (4.8)-(4.10)
Quadratic        3.7019      5.5250              11.3405
Absolute value   1.1416      1.1533              3.0511
Robust Huber     0.1784      0.1822              0.4786

These bounds are an optimality gap guarantee available after the convexified problem is solved; they tell us how low our estimates can bring the original cost. Our bounds are tighter than the ones available a priori from applying [26, Th. 1], which are

f⋆Q − f̂⋆Q ≤ ∑_{i∼j} ½ d²ij    (4.8)

f⋆|·| − f̂⋆|·| ≤ ∑_{i∼j} ½ dij    (4.9)

f⋆R − f̂⋆R ≤ ∑_{i∼j} ½ hRij(dij).    (4.10)

For the one-dimensional example of the star network costs depicted in Figure 4.3 the bounds

in (4.5)-(4.7), and (4.8)-(4.10), averaged over 500 Monte Carlo trials, are presented in Table 4.1.

The true average gap f⋆ − f̂⋆ is also shown. In the Monte Carlo trials we sampled a set of zero

mean Gaussian random variables with σ = 0.04 for the baseline Gaussian noise and obtained

a noisy range measurement as in (4.11). One of the measurements is then corrupted by a zero

mean random variable with σ = 4, modelling outlier noise. These results show the tightness of

the convexified function under such noisy conditions and also demonstrate the behaviour of the a

priori bounds in (4.8)-(4.10).
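These guarantees are easy to verify numerically. The sketch below builds a hypothetical one-dimensional star instance with the quadratic cost (one deliberately outlying range), locates the minimizers of the cost and of its underestimator by grid search, and confirms that the true gap respects the a posteriori bound (4.5), which in turn lies below the a priori bound (4.8):

```python
import numpy as np

anchors = np.array([0.0, 2.0, 4.0])  # hypothetical neighbor positions
meas = np.array([3.0, 1.2, 0.9])     # range measurements; the first is an outlier

grid = np.linspace(-2.0, 8.0, 100001)
disc = np.abs(grid[:, None] - anchors) - meas                 # per-edge discrepancies
f_vals = 0.5 * np.sum(disc ** 2, axis=1)                      # nonconvex quadratic cost
fhat_vals = 0.5 * np.sum(np.maximum(0.0, disc) ** 2, axis=1)  # convex underestimator

f_star = f_vals.min()            # f* (grid approximation)
i_hat = fhat_vals.argmin()
x_hat = grid[i_hat]              # minimizer of the underestimator
gap = f_star - fhat_vals[i_hat]  # true gap f* - fhat*

# a posteriori bound (4.5): only edges whose fit falls short of the range contribute
E2 = np.abs(x_hat - anchors) < meas
post = 0.5 * np.sum((np.abs(x_hat - anchors)[E2] - meas[E2]) ** 2)
prior = 0.5 * np.sum(meas ** 2)  # a priori bound (4.8)
print(gap <= post + 1e-9 <= prior)  # → True
```

Each E2 term in the a posteriori bound is strictly smaller than the corresponding ½ d²ij term, and edges outside E2 contribute nothing, which is why (4.5) is always at least as tight as (4.8).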

4.4 Numerical experiments

We assess the performance of the three considered loss functions through simulation. The

experimental setup consists of a uniquely localizable geometric network deployed in a square area
with a side of 1 km, four anchors (blue squares in Figure 4.4) located at the corners, and ten

sensors (red stars). Measurements are also visible as dotted green lines. The average node degree

of the network is 4.3. The regular noisy range measurements are generated according to

dij = | ‖x⋆i − x⋆j‖ + νij |,   rik = | ‖x⋆i − ak‖ + νik |,    (4.11)

expressed in km, where x⋆i is the true position of node i, and {νij : i ∼ j ∈ E} ∪ {νik : i ∈ V, k ∈ Ai} are independent Gaussian random variables with zero mean and standard deviation 0.04,

corresponding to an uncertainty of about 40 m. Node 7 is malfunctioning and all measurements
related to it are perturbed with Gaussian noise with standard deviation 4, corresponding to an
uncertainty of 4 km. The convex optimization problems were solved with cvx [54]. We ran 100

Monte Carlo trials, sampling both regular and outlier noise.


Table 4.2: Average positioning error per sensor (MPE/sensor), in meters

f|·|     fQ       fR
59.50    32.16    31.06

Table 4.3: Average positioning error per sensor (MPE/sensor), in meters, for the biased experiment

f|·|     fQ       fR
80.98    58.31    47.08

The performance metric used to assess accuracy is the average positioning error defined as (3.65).

In Figure 4.4 we can observe that clouds of estimates from fR and fQ gather around the true po-

sitions, except for the malfunctioning node 7. Note the spread of blue dots in the surroundings

of the edges connecting node 7, indicating that fR better preserves the nodes’ ability to localize

themselves, despite their confusing neighbor, node 7. This intuition is confirmed by the analysis of

the data in Table 4.2, which demonstrates that, even with only one disrupted sensor, our robust

cost can reduce the error per sensor by 1.1 meters. Also, as expected, the malfunctioning node

cannot be reliably located by any of the methods. The sensitivity to the value of the Huber param-

eter R in (4.1) is moderate, as shown in Figure 4.5. In fact, the error per sensor of the proposed

estimator is always the smallest for all tested values of the parameter. We observe that the error

increases when R approaches the standard deviation of the regular Gaussian noise, meaning that

the Huber loss gets closer to the L1 loss and, thus, is no longer adapted to the regular noise (R = 0

corresponds exactly to the L1 loss); in the same way, as R increases, so does the quadratic section,

and the estimator gets less robust to outliers, so, again, the error increases.

Another interesting experiment is to see what happens when the faulty sensor produces mea-

surements with consistent errors or bias. So, we ran 100 Monte Carlo trials in the same setting, but

the measurements of node 7 are consistently 10% of the real distance to each neighbor. The average

positioning error per sensor is shown in Table 4.3. Here we observe a significant performance gap

between the alternative costs, and our formulation proves to be, by far, superior.

4.5 Summary

We proposed an easy to motivate and effective dissimilarity model, which accounts for outliers

without prescribing a model for outlier noise. This dissimilarity model was convexified by means

of the convex envelopes of its terms, leading to a problem with a unique minimum value attainable

in polynomial time. Further, we studied the optimality gap of the discrepancies, both a priori and

after obtaining an estimate, thus providing bounds for the suboptimality of the convexification —

guarantees useful in practice.

Different types of algorithms can be designed to attack the discrepancy measure presented in

this work, since the function is continuous and convex (in the previous section the optimization


problem of minimizing (4.4) was solved using the cvx general-purpose convex solver). Due to the

distributed nature of networks of sensors or, generically, agents, we aim at investigating a dis-

tributed minimization of the proposed robust loss. There are also several nice properties regarding

distributed operation: the adjustable Huber parameter is local to each edge and, if desired, can be

dynamically adjusted to the local environmental noise conditions, in a distributed manner.


(a) Quadratic cost. (b) Absolute value cost. (c) Robust Huber cost.

Figure 4.3: One-dimensional example of the quality of the approximation of the true nonconvex costs f(x) by the convexified functions f̂(x) in a star network. The node positioned at x = 3 has 3 neighbors.


Figure 4.4: Estimates of sensor positions for the three loss functions. We plot the results of minimizing the L1 loss f|·| (yellow), the quadratic loss fQ (blue), and the proposed robust estimator with Huber loss fR (magenta). It is noticeable that the L1 loss is not able to correctly estimate positions whose measurements are corrupted with Gaussian noise. The outlier measurements at node 7 have a larger impact on the dispersion of the blue dots than of the magenta dots around its neighbors.

Figure 4.5: Average positioning error per sensor versus the value of the Huber function parameter R. The accuracy is maintained even for a wide range of parameter values. We stress that the error will increase largely when R → 0 and R → ∞, since these situations correspond to the L1 and L2 cases, respectively.


5 Conclusions and perspectives

Contents
5.1 Distributed network localization without initialization . . . . . 88
5.2 Addressing the nonconvex problem . . . . . 88
    5.2.1 With more computations we can do better . . . . . 89
    5.2.2 Network of agents as a graphical model . . . . . 89
5.3 Robust network localization . . . . . 90
5.4 In summary . . . . . 90


In this thesis we presented a flow of methods, from the initialization-free convex relaxation

method, which only requires knowledge of noisy measurements and a few anchor locations, to more

precise algorithms that, given also a good initial guess, will provide a highly accurate estimate

of the positions of the nodes. We also addressed localization in harsh environments, prone to

outliers, which need especially robust algorithms to overcome the negative influence of corrupted

or malicious data.

5.1 Distributed network localization without initialization

We presented a simple, fast and convergent relaxation method for synchronous and asyn-

chronous time models. From the analysis of the problem, we uncovered key properties which

allow a synchronous time, distributed gradient algorithm with an optimal convergence rate. We

also presented an asynchronous randomized method, more suited for unstructured and large scale

networks. We proved not only almost sure convergence of the cost value, but also almost sure

convergence to a point, which is, as far as we know, an absolutely novel result for distributed

gradient algorithms in general. This stronger convergence result has a significant impact on
real-time applications, because nodes can safely probe the amount of change in the estimates to stop

computing. The methods were published in IEEE Transactions on Signal Processing. Extending

this work, we interpreted each term of the cost as a discrepancy measure between a model and the

noisy measurement and generalized it to include in the same cost heterogeneous measurements. In

particular, we fused range and angle information, obtaining very interesting results, which were

already submitted for publication.

5.2 Addressing the nonconvex problem

In some applications it is fundamental to obtain a very accurate estimate of the positions of the

agents. Sometimes the convex approximation discussed above does not achieve these tight precision

requirements. In such cases one should address the nonconvex estimator problem directly, whenever

armed with a good starting point — for example, the convex approximation solution. With this

need in mind, we presented a simple, distributed, and efficient algorithm, proven to converge1,

requiring no parameter tuning. The method turns out to be a member of the majorization-

minimization family where the majorization function is a quadratic.

An alternative to the majorization-minimization framework, initialized with, e.g., the estimate

¹ More precisely, every limit point of the algorithm is a stationary point of the cost function.


from the methods described in Chapter 2, could be a homotopy continuation method (see
Allgower [55] for in-depth information on homotopy methods). The downside of such methods is that

they can become very difficult when applied to nonconvex functions, e.g., if we maintain the acces-

sibility condition that all isolated solutions can be reached. This might lead to bifurcations of the

method and, so, a combinatorial problem. Nevertheless, Moré and Wu [56] propose a smoothing

Gaussian kernel, also used in the paper by Destino and Abreu [57] for localization by continuation.

The first work addresses a squared discrepancy with squared distances, whereas the second applies

the same smoothing kernel to the maximum likelihood formulation.

Experimentally, our method has substantial performance improvements over the state of the

art, and adding the convergence properties, the algorithm stands in the small club of distributed,

nonconvex, and provable maximum-likelihood estimators for the network localization problem,

given a good starting point. We presented the method in IEEE GlobalSIP 2014.

5.2.1 With more computations we can do better

In order to be less dependent on the initialization, we aimed at a tighter convex majorization

function as the tool for a majorization-minimization algorithm. The quality of the estimate from

the MM procedure with this novel, tighter approximation was verified experimentally as being more

than one order of magnitude in root mean squared error. This extraordinary result encourages

us to pursue a tailored minimization algorithm for this tighter majorizer. A first attempt was to

follow the Alternating Direction Method of Multipliers (ADMM) strategy, but, even though we

obtained a distributed method with far better accuracy than the benchmark, we were not satisfied

with the amount of communications expended on the process, that are also commonly observed in

distributed methods using ADMM. Also, as the ADMM subproblems at each node did not have

closed form solution, the overall estimate degraded very sharply with a small degradation in the

solutions of the subproblems at each node, thus leading to a less interesting performance than that

of our previously mentioned work. Our next step is to devise a proximal algorithm, taking into

account the non-differentiability of our novel majorizer.

5.2.2 Network of agents as a graphical model

Another perspective on the localization problem is to consider the positions of the nodes as

random variables, and the measurement network as an undirected graphical model. This perspec-

tive was explored in Section 3.4 and the known linear relaxation to the resulting combinatorial

problem was re-derived. The resulting formulation of the problem led to a novel SDP relaxation,

tighter than the linear one and, thus, obtaining better experimental results in terms of root mean

squared error. A faster and more accurate descent algorithm with no initialization and attacking

the nonconvex cost was also presented. This novel method improved the error not only of the

linear relaxation but also of a vanilla coordinate descent initialized with the estimate from the


linear relaxation.

5.3 Robust network localization

Sometimes our nodes are malfunctioning or malicious and their collected measurements behave

like outliers. Despite the practical pertinence of this problem, research on this topic is still meager.

To bridge this gap, we designed a soft outlier rejection approach, by considering the known outlier

rejecting penalties of the L1 norm and the Huber function. Nevertheless, using these penalties

leads to very difficult, nonconvex problems. We convexified these penalties and the approximated

estimator for the Huber function performed far better in a scenario with malfunctioning nodes than

the approach presented in Chapter 2 and discussed in Section 5.1. The results are very exciting

and our next step is to shape an algorithm to optimize the tight convex approximations in a way

that is distributed, fast and simple to implement.

5.4 In summary

Throughout this work we were dedicated to achieving useful estimates for the agent positions

given a sparse noisy range measurement network and a small set of reference nodes. Here we

presented methods for network localization which are defined by the following principles:

• Full network localization solutions, from uninformed agent deployment to a refined estimate;

• Emphasis on scalable, distributed, solutions;

• Novel, tighter, approximations to the Maximum-Likelihood cost function, thus leading to

estimates that are more resilient to noise;

• Intuitive derivations and simple to implement algorithms;

• Fast and reliable estimates;

• Provable convergence of algorithms.

Open perspectives of work include deepening our approaches, but also considering new settings,

like:

• Online optimization variants of the proposed algorithms;

• Applying our approaches to the mobile setting, by introducing dynamics in the problem

formulation;

• Considering that the noise variance is not constant but rather a function of the measured distance;

• Determining the optimal placement of anchors and sensor nodes;


• Approaching the robust estimation topic with a Laplacian noise model. Here we can write the

Laplacian distribution as a mixture of Gaussians with the same mean but different variances,

as described in Girosi [58]. In this setting, the expectation-maximization algorithm could be

employed.


Bibliography

[1] N. Bulusu, J. Heidemann, and D. Estrin, “GPS-less low-cost outdoor localization for very small

devices,” Personal Communications, IEEE, vol. 7, no. 5, pp. 28–34, 2000.

[2] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms

for sensor network localization,” ACM Transactions on Sensor Networks (TOSN), vol. 2, no. 2,

pp. 188–220, 2006.

[3] A. Simonetto and G. Leus, “Distributed maximum likelihood sensor network localization,”

Signal Processing, IEEE Transactions on, vol. 62, no. 6, pp. 1424–1437, Mar. 2014.

[4] A. Ihler, J. W. Fisher III, R. Moses, and A. Willsky, “Nonparametric belief propagation for

self-localization of sensor networks,” Selected Areas in Communications, IEEE Journal on,

vol. 23, no. 4, pp. 809 – 819, Apr. 2005.

[5] S. Korkmaz and A.-J. van der Veen, “Robust localization in sensor networks with iterative

majorization techniques,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.

IEEE International Conference on, Apr. 2009, pp. 2049 –2052.

[6] P. Oguz-Ekim, J. Gomes, J. Xavier, and P. Oliveira, “Robust localization of nodes and time-

recursive tracking in sensor networks using noisy range measurements,” Signal Processing,

IEEE Transactions on, vol. 59, no. 8, pp. 3930 –3942, Aug. 2011.

[7] C. Soares, J. Xavier, and J. Gomes, “Simple and fast convex relaxation method for coop-

erative localization in sensor networks using range measurements,” Signal Processing, IEEE

Transactions on, vol. 63, no. 17, pp. 4532–4543, Sept 2015.

[8] ——, “Distributed, simple and stable network localization,” in Signal and Information

Processing (GlobalSIP), 2014 IEEE Global Conference on, Dec 2014, pp. 764–768.

[9] ——, “DCOOL-NET: Distributed cooperative localization for sensor networks,” submitted,

http://arxiv.org/abs/1211.7277.

[10] C. Soares and J. Gomes, “Robust dissimilarity measure for network localization,” arXiv

preprint arXiv:1410.2327, 2014.


[11] J. Aspnes, D. Goldenberg, and Y. R. Yang, “On the computational complexity of sensor

network localization,” in Algorithmic Aspects of Wireless Sensor Networks. Springer, 2004,

pp. 32–44.

[12] P. Biswas and Y. Ye, “Semidefinite programming for ad hoc wireless sensor network localiza-

tion,” in Proceedings of the 3rd international symposium on Information processing in sensor

networks. ACM, 2004, pp. 46–54.

[13] J. Costa, N. Patwari, and A. Hero III, “Distributed weighted-multidimensional scaling for

node localization in sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 2,

no. 1, pp. 39–64, 2006.

[14] M. Gholami, L. Tetruashvili, E. Strom, and Y. Censor, “Cooperative wireless sensor network

positioning via implicit convex feasibility,” Signal Processing, IEEE Transactions on, vol. 61,

no. 23, pp. 5830–5840, Dec. 2013.

[15] S. Srirangarajan, A. Tewfik, and Z.-Q. Luo, “Distributed sensor network localization using

SOCP relaxation,” Wireless Communications, IEEE Transactions on, vol. 7, no. 12, pp. 4886

–4895, Dec. 2008.

[16] F. Chan and H. So, “Accurate distributed range-based positioning algorithm for wireless sensor

networks,” Signal Processing, IEEE Transactions on, vol. 57, no. 10, pp. 4100–4105, Oct. 2009.

[17] U. Khan, S. Kar, and J. Moura, “DILAND: An algorithm for distributed sensor localization

with noisy distance measurements,” Signal Processing, IEEE Transactions on, vol. 58, no. 3,

pp. 1940–1947, Mar. 2010.

[18] Q. Shi, C. He, H. Chen, and L. Jiang, “Distributed wireless sensor network localization via

sequential greedy optimization algorithm,” Signal Processing, IEEE Transactions on, vol. 58,

no. 6, pp. 3328–3340, June 2010.

[19] D. Blatt and A. Hero, “Energy-based sensor network source localization via projection onto

convex sets,” Signal Processing, IEEE Transactions on, vol. 54, no. 9, pp. 3614–3619, Sept.

2006.

[20] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms.

Springer-Verlag Limited, 1993.

[21] F. R. Chung, Spectral graph theory. American Mathematical Society, 1997, vol. 92.

[22] R. B. Bapat, Graphs and matrices. Springer, 2010.

[23] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks. Princeton

University Press, 2010.

[24] Y. Nesterov, “A method of solving a convex programming problem with convergence rate

O(1/k²),” in Soviet Mathematics Doklady, vol. 27, no. 2, 1983, pp. 372–376.

[25] D. Shah, Gossip algorithms. Now Publishers Inc., 2009.

[26] M. Udell and S. Boyd, “Bounding duality gap for problems with separable objective,” online,

2014.

[27] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Aca-

demic Publishers, 2004.

[28] D. Bertsekas, “Incremental proximal methods for large scale convex optimization,”

Mathematical Programming, vol. 129, pp. 163–195, 2011.

[29] Z. Lu and L. Xiao, “On the complexity analysis of randomized block-coordinate descent meth-

ods,” arXiv preprint arXiv:1305.4723, 2013.

[30] J. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,”

Optimization Methods and Software, vol. 11–12, pp. 625–653, 1999, version 1.05 available

from http://fewcal.kub.nl/sturm.

[31] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods.

Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1989.

[32] D. P. Bertsekas, Nonlinear programming. Athena Scientific, 1999.

[33] J. Jacod and P. Protter, Probability Essentials. Springer, 2003, vol. 1.

[34] H. Robbins and D. Siegmund, “A convergence theorem for non negative almost supermartin-

gales and some applications,” in Herbert Robbins Selected Papers. Springer, 1985, pp.

111–135.

[35] G. Calafiore, L. Carlone, and M. Wei, “Distributed optimization techniques for range local-

ization in networked systems,” in Decision and Control (CDC), 2010 49th IEEE Conference

on, Dec. 2010, pp. 2221–2226.

[36] M. Raydan, “The Barzilai and Borwein gradient method for the large scale unconstrained

minimization problem,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 26–33, 1997.

[37] D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” The American Statistician,

vol. 58, no. 1, pp. 30–37, Feb. 2004.

[38] A. Beck and Y. Eldar, “Sparsity constrained nonlinear optimization: Optimality conditions

and algorithms,” SIAM Journal on Optimization, vol. 23, no. 3, pp. 1480–1509, 2013.

[Online]. Available: http://dx.doi.org/10.1137/120869778

[39] P. Forero and G. Giannakis, “Sparsity-exploiting robust multidimensional scaling,” Signal

Processing, IEEE Transactions on, vol. 60, no. 8, pp. 4118–4134, Aug. 2012.

[40] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[41] I. Schizas, A. Ribeiro, and G. Giannakis, “Consensus in ad hoc WSNs with noisy links — Part

I: Distributed estimation of deterministic signals,” Signal Processing, IEEE Transactions on,

vol. 56, no. 1, pp. 350–364, Jan. 2008.

[42] H. Zhu, G. Giannakis, and A. Cano, “Distributed in-network channel decoding,” Signal

Processing, IEEE Transactions on, vol. 57, no. 10, pp. 3970–3983, Oct. 2009.

[43] P. Forero, A. Cano, and G. Giannakis, “Consensus-based distributed support vector machines,”

The Journal of Machine Learning Research, vol. 11, pp. 1663–1707, 2010.

[44] J. Bazerque and G. Giannakis, “Distributed spectrum sensing for cognitive radio networks by

exploiting sparsity,” Signal Processing, IEEE Transactions on, vol. 58, no. 3, pp. 1847–1862,

Mar. 2010.

[45] T. Erseghe, D. Zennaro, E. Dall’Anese, and L. Vangelista, “Fast consensus by the alternating

direction multipliers method,” Signal Processing, IEEE Transactions on, vol. 59, no. 11, pp.

5523–5537, Nov. 2011.

[46] J. Mota, J. Xavier, P. Aguiar, and M. Puschel, “Distributed basis pursuit,” Signal Processing,

IEEE Transactions on, vol. 60, no. 4, pp. 1942–1956, Apr. 2012.

[47] B. D. O. Anderson, I. Shames, G. Mao, and B. Fidan, “Formal theory of noisy sensor network

localization,” SIAM Journal on Discrete Mathematics, vol. 24, no. 2, pp. 684–698, 2010.

[48] M. Wainwright and M. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.

[49] N. Komodakis, N. Paragios, and G. Tziritas, “MRF energy minimization and beyond via dual

decomposition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33,

no. 3, pp. 531–552, Mar. 2011.

[50] R. G. Gallager, P. A. Humblet, and P. M. Spira, “A distributed algorithm for minimum-weight

spanning trees,” ACM Transactions on Programming Languages and Systems (TOPLAS),

vol. 5, no. 1, pp. 66–77, 1983.

[51] J. Ash and R. Moses, “Outlier compensation in sensor network self-localization via the EM

algorithm,” in Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP ’05).

IEEE International Conference on, vol. 4, Mar. 2005, pp. iv/749–iv/752.

[52] F. Yin, A. Zoubir, C. Fritsche, and F. Gustafsson, “Robust cooperative sensor network local-

ization via the EM criterion in LOS/NLOS environments,” in Signal Processing Advances in

Wireless Communications (SPAWC), 2013 IEEE 14th Workshop on, June 2013, pp. 505–509.

[53] P. J. Huber, “Robust estimation of a location parameter,” The Annals of Mathematical

Statistics, vol. 35, no. 1, pp. 73–101, 1964.

[54] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version

1.21,” http://cvxr.com/cvx, Apr. 2011.

[55] E. L. Allgower and K. Georg, Numerical continuation methods: an introduction. Springer

Science & Business Media, 2012, vol. 13.

[56] J. J. Moré and Z. Wu, “Global continuation for distance geometry problems,” SIAM Journal

on Optimization, vol. 7, no. 3, pp. 814–836, 1997.

[57] G. Destino and G. Abreu, “On the maximum likelihood approach for source and network

localization,” Signal Processing, IEEE Transactions on, vol. 59, no. 10, pp. 4954 –4970, Oct.

2011.

[58] F. Girosi, “Models of noise and robust estimates,” 1991.
