UNIVERSIDADE DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Distributed and robust network localization algorithms
Cláudia Alexandra Magalhães Soares
Supervisor: Doctor João Pedro Castilho Pereira Santos Gomes
Thesis approved in public session to obtain the PhD Degree in
Electrical and Computer Engineering
Jury final classification: Pass with Distinction and Honour
Provisional Thesis
December 2015
Abstract
Signal processing over networks has been a broad and active research topic in recent years. Networks of agents typically rely on known node positions, even
if the main goal of the network is not localization. A network of agents may comprise a large
set of miniature, low cost, low power autonomous sensing nodes. In this scenario it is generally
unsuitable or even impossible to accurately deploy all nodes in a predefined location within the
network operation area. GPS is also discarded as an option for indoor applications or due to cost
and energy consumption constraints. Moreover, mobile agents need localization for, e.g., motion planning
or formation control, and GPS might not be available in many environments. Real world conditions
imply noisy environments, and the network operation calls for fast and reliable estimation of the
agents’ locations. Galvanized by the compelling applications and by the difficulty of the problem
itself, researchers have dedicated considerable work to locating the nodes of such networks. Some develop centralized
methods, while others pursue distributed, scalable solutions, either by developing approximations
or tackling the nonconvex problem, sometimes combining both approaches. With the growing
size of networks of devices constrained in energy expenditure and computational power, the need
for simple, fast, and distributed algorithms for network localization spurred the work presented in
this thesis. Here, we approach the problem starting from minimal data collection, aggregating only
range measurements and a few landmark positions, and deliver a good approximate solution
that can then be fed to our fast yet simple maximum-likelihood method, which returns highly
accurate solutions. We explore tailored solutions, resorting to optimization and probability tools
that improve performance under noise and in unstructured environments. Thus, the main contributions
of this thesis are:
• Distributed localization algorithms, characterized by their simplicity but also by strong
guarantees;
• Analyses of convergence, iteration complexity, and optimality bounds for the designed pro-
cedures;
• Novel majorization approaches which are tailored to the specific problem structure.
Keywords
Distributed algorithms, convex relaxations, nonconvex optimization, maximum likelihood es-
timation, distributed iterative agent localization, robust estimation, noisy range measurements,
network localization, majorization-minimization, optimal gradient methods.
Resumo
Signal processing over networks has been a broad and prolific topic in recent years within
the scientific community. Networks of agents generally rely on knowledge of the positions of their
nodes, even in situations where localization is not the main goal of the operation. A network of
agents may be composed of a large set of autonomous, low-cost, low-power nodes. In this scenario
it is generally unsuitable, or even impossible, to place the nodes at predefined locations within the
operation area. The use of GPS is also excluded for indoor applications, or due to its cost or energy
requirements. On the other hand, mobile agents need to know their location in order to plan their
motion or perform formation control, for example, and resources such as GPS may not be available.
The real world further implies noisy environments, and the purpose of the network demands fast
and reliable estimation of the positions of the various agents. Thus, researchers in the field have
dedicated work to localizing the nodes of the network, galvanized by the relevant applications, but
also by the challenge posed by the problem. Some lines of work have focused on the development
of centralized methods, while others pursue distributed and scalable solutions, either by developing
approximations, by directly tackling the nonconvex problem, or at times by combining both
approaches. With the growth of networks of devices scarce in energy and computational resources,
the need for simple, fast, and distributed algorithms drove the work presented in this thesis. In it
we approach the problem starting from a minimal set of data, aggregating only distance measurements
and the positions of a few landmarks, and delivering a medium-accuracy approximate solution
which can subsequently be fed to our fast yet simple maximum-likelihood method, returning
solutions of very high accuracy. To meet these needs we explore tailored solutions resorting to
optimization and probability techniques that enhance accuracy and speed even in the presence of
noise and in unstructured environments. Thus, the main contributions of this thesis are:
• Distributed localization algorithms, characterized by their simplicity but also by strong
convergence guarantees;
• Analyses of convergence, iteration complexity, and optimality bounds for the procedures in
question;
• Novel majorization approaches tailored to the specific structure of the problem.
Keywords
Distributed algorithms, convex relaxations, nonconvex optimization, maximum likelihood
estimation, distributed localization of agent networks, robust estimation, noisy distance
measurements, network localization, majorization-minimization, optimal gradient methods.
Contents
1 Introduction 1
1.1 Motivation and related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Scalability and networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Robustness and harsh environments . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Objectives and contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Agent localization on a network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Distributed network localization without initialization: Tight convex underestimator-
based procedure 7
2.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Convex underestimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Distributed sensor network localization . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 Gradient and Lipschitz constant of f . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Parallel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.3 Asynchronous method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.1 Quality of the convexified problem . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.2 Parallel method: convergence guarantees and iteration complexity . . . . . 18
2.5.3 Asynchronous method: convergence guarantees and iteration complexity . . 19
2.6 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.1 Assessment of the convex underestimator performance . . . . . . . . . . . . 26
2.6.2 Performance of distributed optimization algorithms . . . . . . . . . . . . . . 28
2.6.3 Performance of the asynchronous algorithm . . . . . . . . . . . . . . . . . . 29
2.7 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7.1 Convex envelope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7.2 Lipschitz constant of ∇φBij. . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7.3 Auxiliary Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7.4 Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Summary and further extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.8.1 Heterogeneous data fusion application . . . . . . . . . . . . . . . . . . . . . 35
3 Distributed network localization with initialization: Nonconvex procedures 37
3.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Distributed Majorization-Minimization with quadratic majorizer . . . . . . . . . . 39
3.2.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Problem reformulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.3 Majorization-Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4 Distributed sensor network localization . . . . . . . . . . . . . . . . . . . . 42
3.2.5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Majorization-Minimization with convex tight majorizer . . . . . . . . . . . . . . . . 47
3.3.1 Majorization function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.2 Experimental results on majorization function quality . . . . . . . . . . . . 49
3.3.3 Distributed optimization of the proposed majorizer using ADMM . . . . . . 49
3.3.4 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.5 Proof of majorization function properties . . . . . . . . . . . . . . . . . . . 63
3.3.6 Proof of Proposition 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.7 Proof of (3.31) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Sensor network localization: a graphical model approach . . . . . . . . . . . . . . . 66
3.4.1 Uncertainty models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Optimization problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.3 Combinatorial problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.4.4 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4.6 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.7 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4 Robust algorithms for sensor network localization 77
4.1 Related work and contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 Discrepancy measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Convex underestimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.1 Approximation quality of the convex underestimator . . . . . . . . . . . . . 80
4.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5 Conclusions and perspectives 87
5.1 Distributed network localization without initialization . . . . . . . . . . . . . . . . 88
5.2 Addressing the nonconvex problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2.1 With more computations we can do better . . . . . . . . . . . . . . . . . . . 89
5.2.2 Network of agents as a graphical model . . . . . . . . . . . . . . . . . . . . 89
5.3 Robust network localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4 In summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
List of Figures
2.1 Convex envelope for one-dimensional example . . . . . . . . . . . . . . . . . . . . . 10
2.2 One-dimensional example of the quality of the approximation of the true nonconvex
cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Two-dimensional star network to assess the quality of optimality bounds . . . . . . 18
2.4 Proximal minimization evolution for the toy problem . . . . . . . . . . . . . . . . . 21
2.5 Proximal minimization cost evolution for the toy problem . . . . . . . . . . . . . . 22
2.6 Network 1. Topology with 4 anchors and 10 sensors. Anchors are marked with blue
squares and sensors with red stars. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Network 2. Topology with 4 anchors and 50 sensors. Anchors are also marked with
blue squares and sensors with red stars. . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Relaxation quality experiment for different noise levels . . . . . . . . . . . . . . . . 26
2.9 Relaxation quality experiment for high power noise . . . . . . . . . . . . . . . . . . 27
2.10 Estimates for the location of the sensor nodes (network with 10 agents). . . . . . . 27
2.11 Performance comparison: Algorithm 1 vs. projection method. . . . . . . . . . . . . 28
2.12 Performance comparison: Algorithm 1 vs. ESDP method. . . . . . . . . . . . . . . 29
2.13 Performance of the asynchronous algorithm. . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Nonconvex reformulation illustration . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Evolution of cost and average error per sensor with communications, for Algorithm 4
and the benchmark, under low power noise. . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Evolution of cost and average error per sensor with communications, for Algorithm 4
and the benchmark, under medium power noise. . . . . . . . . . . . . . . . . . . . 45
3.4 Evolution of cost and average error per sensor with communications, for Algorithm 4
and the benchmark, under high power noise. . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Tightness evaluation for the proposed majorizer in (3.15). . . . . . . . . . . . . . . 48
3.6 Evaluation of majorizer performance for different initializations. . . . . . . . . . . . 50
3.7 Performance comparison: Algorithm 7 vs. SGO; noiseless range measurements. . . 59
3.8 Performance comparison: Algorithm 7 vs. SGO; noisy range measurements. . . . . 60
3.9 Performance comparison: Algorithm 7 vs. SGO; noisy range measurements, random
anchors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.10 Performance comparison: Algorithm 7 vs. SGO; with increasing measurement noise. 62
3.11 Performance comparison: Algorithm 7 vs. SGO; accuracy and communications. . . 63
3.12 Performance comparison: Algorithm 7 vs. SGO; increasing parameter value. . . . . 64
3.13 Average cost over Monte Carlo trials. . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.14 Mean positioning error per sensor over Monte Carlo trials. . . . . . . . . . . . . . . 74
3.15 Rank of the solution matrix E in the tested Monte Carlo trials. . . . . . . . . . . . 74
4.1 Comparison of nonconvex cost functions. . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2 Quality of the proposed relaxation (4.4). . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Illustration of the relaxation quality of (4.4). . . . . . . . . . . . . . . . . . . . . . 85
4.4 Estimates for sensor positions for the three discrepancy functions. . . . . . . . . . 86
4.5 Average positioning error vs. the value of the Huber function parameter. . . . . . . 86
List of Tables
2.1 Bounds on the optimality gap for the example in Figure 2.2 . . . . . . . . . . . . . 18
2.2 Bounds on the optimality gap for the 2D example in Figure 2.3 . . . . . . . . . . . 18
2.3 Number of communications per sensor for the results in Fig. 2.12 . . . . . . . . . . 28
3.1 Mean positioning error, with measurement noise . . . . . . . . . . . . . . . . . . . 43
3.2 Squared error dispersion over Monte Carlo trials for Figure 3.7. . . . . . . . . . . 59
3.3 Squared error dispersion over Monte Carlo trials for Figure 3.8. . . . . . . . . . . . 60
3.4 Squared error dispersion over Monte Carlo trials for Figure 3.9. . . . . . . . . . . 61
3.5 Squared error dispersion over Monte Carlo trials for Figure 3.10. . . . . . . . . . . 62
3.6 Cost values per sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1 Bounds on the optimality gap for the example in Figure 4.3 . . . . . . . . . . . . . 82
4.2 Average positioning error per sensor (MPE/sensor), in meters . . . . . . . . . . . . 83
4.3 Average positioning error per sensor (MPE/sensor), in meters, for the biased exper-
iment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
List of Algorithms
1 Parallel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Asynchronous method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Asynchronous update at each node i . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Distributed nonconvex localization algorithm . . . . . . . . . . . . . . . . . . . . . 42
5 Minimization of the tight majorizer in (3.19) . . . . . . . . . . . . . . . . . . . . . 50
6 Nesterov’s optimal method for (3.35) . . . . . . . . . . . . . . . . . . . . . . . . . 54
7 Step 2 of Algorithm 5 using ADMM: position updates . . . . . . . . . . . . . . . . 58
8 Distributed monotonic spanning tree-based algorithm . . . . . . . . . . . . . . . . 72
9 Coordinate descent algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
1 Introduction
Contents
1.1 Motivation and related work . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Scalability and networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Robustness and harsh environments . . . . . . . . . . . . . . . . . . . . 3
1.2 Objectives and contributions . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Agent localization on a network . . . . . . . . . . . . . . . . . . . . . . 4
Networks of agents are becoming ubiquitous. From environmental and infrastructure monitoring
to surveillance and healthcare, networked extensions of the human senses in contemporary
technological societies are improving our quality of life, our productivity, and our safety. Appli-
cations of such networks recurrently need to be aware of node positions to fulfill their tasks and
deliver meaningful information. Nevertheless, locating the nodes is not trivial: these small, low
cost, low power devices are deployed in large numbers, often with imprecise prior knowledge of
their locations, and might be equipped with minimal processing capabilities. Such limitations call
for localization algorithms which are scalable, fast, and parsimonious in their communication and
computational requirements.
1.1 Motivation and related work
The network localization problem is not new (see, e.g., Bulusu et al. [1], published in 2000);
nevertheless, the scientific community is still striving for a usable, practical solution to the prob-
lem of spatially localizing a network of agents. Nowadays, with the increasing number of networked
devices running localization-dependent applications, scalability imposes itself as a real and urgent
need. Solutions based on centralized semidefinite programming relaxations, like the one in [2], are
thus unsuitable for problems with hundreds of nodes, and many distributed approaches, like the
one in [3], demand that each node solve a semidefinite program at each algorithm iteration, a
requirement difficult to meet with the small, low-power, inexpensive hardware usually deployed in
applications. There is a need for simple, distributed, and fast localization methods, methods
easy to understand by a non-specialist engineer, but with no concessions in terms of convergence
and expected behavior. This thesis provides such methods, with state-of-the-art accuracy and
convergence rate, while demanding only simple arithmetic operations.
Also, the signal processing community has approached the network localization problem with
noisy range measurements assuming that noise is Gaussian, as did Biswas et al. [2]. Nevertheless,
in many applications, on top of the Gaussian measurement noise we also find outlier data, due
to, e.g., environmental conditions or malfunctioning or malicious nodes. This problem, with great
practical impact in applications, is also recognized as interesting in the literature (see e.g., Ihler
et al. [4] or more recently Simonetto et al. [3]). Two noteworthy works that address this largely
unexplored problem are presented by Korkmaz et al. in [5], where the authors provide a distributed
nonconvex procedure using the Huber M-estimator, which is, however, highly dependent on the initialization,
and the approach by Oğuz-Ekim et al. [6], which does not need initialization information but is
centralized, assumes a Laplacian noise model, and does not scale well with the size of the network.
Our work bridges this gap by providing an estimate that does not depend on the initialization,
does not assume an outlier distribution, and is lightweight in its computing needs.
As the present work spans such a varied set of tools and flavors of the sensor network
localization problem, a more in-depth treatment of related work is included in each chapter.
The remainder of this section briefly discusses some aspects that should be taken into account
when devising a system that effectively delivers localization information to a network of agents.
1.1.1 Scalability and networks
With agents operating in networks and processing increasing amounts of data, today’s computa-
tional paradigm is shifting from a centralized to a distributed operation. Self-localization of agents
is one of the basic requirements of such networked environments, enabling other applications.
This paradigm shift raises questions such as: is it possible to pursue parallel and even asyn-
chronous optimization algorithms and, at the same time, ensure their optimality? Considering
that communication expends more energy than computation, fewer algorithm iterations will mean
not only a faster response, but also longer battery life and thus extended operation.
The simplicity of agents might mean that the amount of computation is also a limited resource,
and only simple operations are available.
Also, the spread of this type of solution throughout applications imposes simplicity con-
straints on the procedures to be implemented, as well as the avoidance of tuning parameters, so
they can be intuitively and swiftly interpreted by a nonspecialist.
1.1.2 Robustness and harsh environments
Agents deployed in unstructured or uncontrolled environments must often cope with sporadic
but strong measurement impairments that are difficult to characterize probabilistically, and which
can greatly perturb the accuracy of regular algorithms. In this sense, in addition to the scalability
and communication performance issues, we must work toward robust behavior to outlier measure-
ments. With this concern in mind, this thesis contains some work in progress in the area of robust
network localization, also aiming at simple, efficient, and understandable solutions.
1.2 Objectives and contributions
The focus of this work is to develop approaches to the network localization problem that lead
to efficient, simple, and intuitive distributed algorithms, always seeking a rigorous performance
analysis, and provable convergence guarantees. Whenever we design a convexified proxy to the
problem, we always try to produce a bound for the optimality gap of the resulting estimate, also to
understand how we can better tailor the method to the specific problem, making the most of the
problem structure. When designing nonconvex refinement solutions, we aim at both performance
and guaranteed convergence. The broad goals of this thesis are to:
• Study optimization strategies to ensure accuracy and simplicity;
• Consider probabilistic tools to analyze performance in highly unstructured procedures.
Thus, the main contributions of this thesis are the following:
• A distributed network localization algorithm that requires neither parameter tuning nor ini-
tialization in the vicinity of the solution. This algorithm has two flavors: a synchronous,
parallel one and an asynchronous one. Convergence guarantees, optimality bounds and iter-
ation complexity are provided for both. The main body of this work was accepted for publication
in the IEEE Transactions on Signal Processing [7].
• A distributed network localization algorithm with no parameter tuning, addressing directly
the nonconvex Maximum Likelihood estimation problem for Gaussian noise. We provide
convergence guarantees to a stationary point, capitalizing on the properties of the Majorization-
Minimization (MM) framework. This work was presented at GlobalSIP 2014 [8].
• A novel, tight majorization function specially crafted for the nonconvex Maximum Likelihood
problem for Gaussian noise, whose preliminary experimental results show a substantial im-
provement in performance over the general purpose quadratic majorizer. The working draft
for this paper is [9].
• A probabilistic approach to the problem, relying on the framework of graphical models.
Also, we re-derive the standard LP relaxation for the maximum a posteriori problem and,
capitalizing on this reformulation, we propose an SDP relaxation which is tighter and shows
better results in the localization problem. We also propose a descent method, requiring no
initialization, based on least squares and the majorization-minimization framework, which is more
accurate than a coordinate descent method.
• A novel convex relaxation for a robust formulation of the discrepancy function arising from
the Maximum Likelihood problem for Gaussian noise. Instead of considering the square of
the difference between acquired measurements and distances of estimated points, we consider
the Huber M-estimator. This leads to an interesting improvement in the performance under
Gaussian and outlier noise, while preserving the Maximum Likelihood properties within a
prescribed region. The working draft of this paper is [10].
A common denominator of the present thesis’s contributions is the exploitation of problem structure,
crafting solutions which are intuitive, simple, and robust.
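Several of the contributions above lean on the Majorization-Minimization framework. As a minimal one-dimensional sketch of that idea (our own illustration with a generic quadratic majorizer, not an algorithm from this thesis), each MM step minimizes a quadratic upper bound on the cost, which guarantees monotone descent:

```python
# Minimal 1D Majorization-Minimization (MM) sketch -- our own illustration.
# At iterate x_t, f is upper-bounded by the quadratic majorizer
#   g(x | x_t) = f(x_t) + f'(x_t)(x - x_t) + (L/2)(x - x_t)^2,
# valid when f' is L-Lipschitz on the region visited. Minimizing g gives
# x_{t+1} = x_t - f'(x_t)/L, and majorization guarantees f(x_{t+1}) <= f(x_t).

def mm_minimize(f, grad, L, x0, iters=50):
    x, costs = x0, [f(x0)]
    for _ in range(iters):
        x = x - grad(x) / L          # exact minimizer of the quadratic surrogate
        costs.append(f(x))
    return x, costs

# Toy nonconvex cost: a double well, loosely mimicking a 1D ranging term.
f = lambda x: x**4 - 2.0 * x**2
grad = lambda x: 4.0 * x**3 - 4.0 * x
x, costs = mm_minimize(f, grad, L=44.0, x0=2.0)   # f'' <= 44 on [-2, 2]
print(round(x, 3))                                 # converges near the minimizer x = 1
```

The monotone decrease of `costs` is exactly the stationary-point convergence property that the MM-based contributions above exploit.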
1.3 Agent localization on a network
We represent mathematically the network of agents as an undirected graph G = (V, E), where
the node set V = {1, 2, . . . , n} designates agents with unknown positions. An edge i ∼ j ∈ E
between sensors i and j means there is a noisy range measurement between nodes i and j known
to both, and i and j can communicate with each other. Anchors are elements with known positions
and they are collected in the set A = {1, . . . , m}. For each agent i ∈ V, we define Ai ⊂ A as the
subset of anchors with a quantified range measurement to agent i. The set Ni collects the neighboring
agents of node i.
Let Rp be the space of interest (p = 2 for planar networks, and p = 3 otherwise). We denote
by xi ∈ Rp the position of agent i, and by dij the noisy range measurement between agents i and
j, available at both i and j.
Anchor positions are denoted by ak ∈ Rp. We let rik denote the noisy range measurement
between agent i and anchor k, available at agent i.
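As a concrete sketch of this data model, the per-node quantities defined above can be held in plain dictionaries (all names and numeric values below are illustrative, not taken from the thesis):

```python
# Hypothetical containers for the quantities defined above.

# Undirected edges i ~ j with their noisy range measurements d_ij,
# known to both endpoints.
edge_measurements = {
    (0, 1): 1.12,
    (1, 2): 0.95,
}

# Anchor positions a_k in R^p (here p = 2).
anchor_positions = {0: (0.0, 0.0), 1: (1.0, 0.0)}

# For each agent i, the subset A_i of ranged anchors and the measurements r_ik.
anchor_measurements = {
    0: {0: 0.51},   # agent 0 has a range measurement to anchor 0
    2: {1: 0.77},   # agent 2 has a range measurement to anchor 1
}

def neighbors(i, edges=edge_measurements):
    """The set N_i of agents sharing an edge (and a measurement) with agent i."""
    return {a if b == i else b for (a, b) in edges if i in (a, b)}

print(neighbors(1))  # the neighbor set N_1 of agent 1
```

Each agent only needs its own rows of these structures, which is what makes the message-passing formulation below natural.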
The distributed network localization problem addressed in this thesis consists in estimating the
agents’ positions x = {xi : i ∈ V} from the available measurements {dij : i ∼ j} ∪ {rik : i ∈ V, k ∈ Ai},
through collaborative message passing between neighboring agents in the communication graph G.
Under the assumption of zero-mean, independent and identically distributed, additive Gaussian
measurement noise, the maximum likelihood estimator for the agent positions is the solution of
the optimization problem
minimize_x f(x),    (1.1)
where
f(x) = ∑_{i∼j} (1/2) (‖xi − xj‖ − dij)² + ∑_{i} ∑_{k∈Ai} (1/2) (‖xi − ak‖ − rik)².
Problem (1.1) is nonconvex and NP-hard for generic network configurations [11]1.
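The cost f in (1.1) translates directly into code. The sketch below (function and variable names are ours, not the thesis’s) evaluates f for a tiny network; with noiseless measurements, the true configuration attains zero cost:

```python
import numpy as np

def ml_cost(x, edge_meas, anchor_pos, anchor_meas):
    """Maximum-likelihood cost f(x) of problem (1.1).

    x           : dict, agent i -> position x_i in R^p (numpy array)
    edge_meas   : dict, edge (i, j) -> noisy range d_ij
    anchor_pos  : dict, anchor k -> position a_k
    anchor_meas : dict, agent i -> {k: r_ik for k in A_i}
    """
    f = 0.0
    for (i, j), d_ij in edge_meas.items():
        f += 0.5 * (np.linalg.norm(x[i] - x[j]) - d_ij) ** 2
    for i, ranges in anchor_meas.items():
        for k, r_ik in ranges.items():
            f += 0.5 * (np.linalg.norm(x[i] - anchor_pos[k]) - r_ik) ** 2
    return f

# Two agents, one anchor, noiseless ranges: the true positions give f = 0.
x_true  = {0: np.array([0.0, 1.0]), 1: np.array([1.0, 1.0])}
anchors = {0: np.array([0.0, 0.0])}
edges   = {(0, 1): 1.0}      # true distance between agents 0 and 1
ranges  = {0: {0: 1.0}}      # true distance from agent 0 to anchor 0
print(ml_cost(x_true, edges, anchors, ranges))  # → 0.0
```

Note that each term of f couples only an agent with its neighbors or its ranged anchors, which is the structure the distributed algorithms in this thesis exploit.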
Problem (1.1) is similar to the multidimensional scaling (MDS) problem presented by Costa et
al. in [13]. In classical MDS, one must have access to distance measurements between all nodes,
or be able to estimate them. To circumvent this in sparse networks, such as the geometric topologies
typical of sensor networks, the authors write the MDS problem as
minimize_x ∑_{i∈V} ( ∑_{j>i} wij (‖xi − xj‖ − dij)² + ∑_{k∈A} wik (‖xi − ak‖ − rik)² ),
where the weights wij are zero whenever there is no measurement between nodes i and j, and
otherwise take positive values, chosen by the user, reflecting how accurate the range measurements dij are.
1 In [12] the authors prove that highly dense networks in R² (with edge set cardinality |E| ≥ 2|V| + |V|(|V|+1)/2)
can be localized in polynomial time. This density would correspond to an average node degree 〈k〉 ≥ 5 + |V|, which
is not realistic in practice.
2 Distributed network localization without initialization: Tight convex underestimator-based procedure
Contents
2.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Convex underestimator . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Distributed sensor network localization . . . . . . . . . . . . . . . . . 11
2.4.1 Gradient and Lipschitz constant of f . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Parallel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.3 Asynchronous method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.1 Quality of the convexified problem . . . . . . . . . . . . . . . . . . . . . 16
2.5.2 Parallel method: convergence guarantees and iteration complexity . . . 18
2.5.3 Asynchronous method: convergence guarantees and iteration complexity 19
2.6 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.1 Assessment of the convex underestimator performance . . . . . . . . . . 26
2.6.2 Performance of distributed optimization algorithms . . . . . . . . . . . . 28
2.6.3 Performance of the asynchronous algorithm . . . . . . . . . . . . . . . . 29
2.7 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7.1 Convex envelope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7.2 Lipschitz constant of ∇φBij . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7.3 Auxiliary Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7.4 Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Summary and further extensions . . . . . . . . . . . . . . . . . . . . . 34
2.8.1 Heterogeneous data fusion application . . . . . . . . . . . . . . . . . . . 35
After deploying a network of agents we would like to estimate the optimal configuration, given
a set of acquired range measurements and a (small) set of reference positions. As we have seen,
solving (1.1) is, in general, an NP-hard problem, and any polynomial time algorithm can only
aspire to deliver a local minimizer; the returned estimate will be initialization dependent, but at
this point of the network’s operation, any meaningful initialization might be impossible. For such
scenarios where there is no possibility of producing a valuable hint on the true agent positions, one
might turn to some approximation of the problem (1.1) that can be globally minimized and, at the
same time, captures the “main shape” of the original problem, particularly, the location of its global
minimizer. This kind of estimate can have low precision, but it may be sufficient for some practical
purposes. Even if this is not the case, it returns invaluable information to initialize a descent
algorithm over the nonconvex problem (1.1) to refine the solution. The work described in this
Chapter was published in the IEEE Transactions on Signal Processing. Some new, unsubmitted,
material was added to this thesis, as signaled in the text.
2.1 Contributions
We propose a convex underestimator of the maximum likelihood cost for the sensor network lo-
calization problem (1.1) based on the convex envelopes of its terms. We also obtain a simple bound
for the optimality gap given an estimate produced by the algorithm. We present an optimal syn-
chronous and parallel algorithm to minimize this convex underestimator with proven convergence
guarantees. We also propose an asynchronous variant of this algorithm, prove that it converges
almost surely, and we analyze its iteration complexity.
Moreover, we demonstrate the superior performance of our algorithms through computer simulations; we compared several aspects of our method with [3], [6], and [14], and our approach always yields better performance metrics. When compared with the method in [3], which operates under the same conditions, our method outperforms it by one order of magnitude in accuracy and in communication volume.
2.2 Related work
Reference [15] proposes a parallel distributed algorithm to minimize a discrepancy function based on squared distances, which is known to amplify high-variance noise. Moreover, each element in the sensor network must solve a second-order cone program at each algorithm iteration, which can be a demanding task for the simple hardware used in such networks. Furthermore, the formal
convergence properties of the algorithm are not established. The work in [16] considers network
localization outside a maximum-likelihood framework. The approach is not parallel, operating
sequentially through layers of nodes: neighbors of anchors estimate their positions and become
anchors themselves, making it possible in turn for their neighbors to estimate their positions, and
so on. Position estimation is based on planar geometry-based heuristics. In [17], the authors
propose an algorithm with assured asymptotic convergence, but the solution is computationally
complex since a triangulation set must be calculated, and matrix operations are pervasive. Fur-
thermore, in order to attain good accuracy, a large number of range measurement rounds must
be acquired, one per iteration of the algorithm, thus increasing energy expenditure. On the other
hand, the algorithm presented in [18], based on the nonlinear Gauss–Seidel framework, has
a pleasingly simple implementation, combined with convergence guarantees inherited from the
framework. Notwithstanding, this algorithm is sequential, i.e., nodes perform their calculations
in turn, not in a parallel fashion. This entails the existence of a network-wide coordination pro-
cedure to precompute the processing schedule upon startup, or whenever a node joins or leaves
the network. The sequential nature of the work in [18] was superseded by the work in [3] which
puts forward a parallel method based on two consecutive relaxations of the maximum likelihood
estimator in (1.1). The first relaxation is a semi-definite program with a rank relaxation, while the
second is an edge based relaxation, best suited for the Alternating Direction Method of Multipliers
(ADMM). The main drawbacks are the amount of communication required to manage the local copies of the ADMM edge variables and the prohibitive complexity of the problem solved at each node. In fact, each of the simple sensing units must solve a semidefinite program at each ADMM iteration, and after the update it must exchange copies of the edge variables with each neighbor. A simpler
approach was devised in [14] by extending the source localization Projection Onto Convex Sets
algorithm in [19] to the problem of sensor network localization. The proposed method is sequential,
activating nodes one at a time according to a predefined cyclic schedule; thus, it does not take
advantage of the parallel nature of the network and imposes a stringent timetable for individual
node activity.
2.3 Convex underestimator
Problem (1.1) can be written as
\[
\operatorname*{minimize}_{x} \quad \sum_{i\sim j} \frac{1}{2}\, d^2_{S_{ij}}(x_i - x_j) \;+\; \sum_{i} \sum_{k\in A_i} \frac{1}{2}\, d^2_{S_{a_{ik}}}(x_i), \tag{2.1}
\]
where $d^2_C(x)$ represents the squared Euclidean distance of point $x$ to the set $C$, i.e.,
\[
d^2_C(x) = \inf_{y\in C} \|x - y\|^2,
\]
[Figure 2.1: Illustration of the convex envelope for inter-sensor terms of the nonconvex cost function (2.1). The squared distance to the ball $B_{ij}$ (dotted line) is the convex hull of the squared distance to the sphere $S_{ij}$ (dashed line). In this one-dimensional example the value of the range measurement is $d_{ij} = 0.5$.]
and the sets $S_{ij}$ and $S_{a_{ik}}$ are defined as the spheres generated by the noisy measurements $d_{ij}$ and $r_{ik}$:
\[
S_{ij} = \{ z : \|z\| = d_{ij} \}, \qquad S_{a_{ik}} = \{ z : \|z - a_k\| = r_{ik} \}.
\]
Nonconvexity of (2.1) follows from the nonconvexity of the building block
\[
\frac{1}{2}\, d^2_{S_{ij}}(z) = \frac{1}{2} \inf_{\|y\| = d_{ij}} \|z - y\|^2. \tag{2.2}
\]
A simple convexification consists in replacing it by
\[
\frac{1}{2}\, d^2_{B_{ij}}(z) = \frac{1}{2} \inf_{\|y\| \le d_{ij}} \|z - y\|^2, \tag{2.3}
\]
where $B_{ij} = \{ z\in\mathbb{R}^p : \|z\| \le d_{ij} \}$ is the convex hull of $S_{ij}$. In fact, (2.3) is the convex envelope¹ of (2.2). This fact is illustrated in Figure 2.1 with a one-dimensional example; a formal proof for the generic case is given in Section 2.7.1.
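The envelope relation between (2.2) and (2.3) is easy to check numerically in the one-dimensional setting of Figure 2.1. A minimal sketch (the helper names `phi_sphere` and `phi_ball` are ours):

```python
import numpy as np

def phi_sphere(z, d):
    # 1/2 * squared distance of z to the sphere {y : |y| = d}; nonconvex term (2.2)
    return 0.5 * (np.abs(z) - d) ** 2

def phi_ball(z, d):
    # 1/2 * squared distance of z to the ball {y : |y| <= d}; convex term (2.3)
    return 0.5 * np.maximum(np.abs(z) - d, 0.0) ** 2

d = 0.5
z = np.linspace(-1.0, 1.0, 401)

# The ball term underestimates the sphere term everywhere ...
assert np.all(phi_ball(z, d) <= phi_sphere(z, d) + 1e-12)
# ... coincides with it outside the ball ...
outside = np.abs(z) >= d
assert np.allclose(phi_ball(z[outside], d), phi_sphere(z[outside], d))
# ... and is identically zero inside, where the envelope flattens the cost.
assert np.all(phi_ball(z[np.abs(z) < d], d) == 0.0)
```

The three assertions are exactly the three features visible in Figure 2.1.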
The terms of (2.1) associated with anchor measurements are similarly relaxed as
\[
d^2_{B_{a_{ik}}}(z) = \inf_{\|y - a_k\| \le r_{ik}} \|z - y\|^2, \tag{2.4}
\]
where the set $B_{a_{ik}}$ is the convex hull of $S_{a_{ik}}$: $B_{a_{ik}} = \{ z\in\mathbb{R}^p : \|z - a_k\| \le r_{ik} \}$. Replacing the nonconvex terms in (2.1) by (2.3) and (2.4), we obtain the convex problem
\[
\operatorname*{minimize}_{x} \quad f(x) = \sum_{i\sim j} \frac{1}{2}\, d^2_{B_{ij}}(x_i - x_j) \;+\; \sum_{i} \sum_{k\in A_i} \frac{1}{2}\, d^2_{B_{a_{ik}}}(x_i). \tag{2.5}
\]
The function in Problem (2.5) is an underestimator of (2.1), but it is not the convex envelope of the original function. We argue that in our application of sensor network localization it is generally a very good approximation whose sub-optimality can be quantified, as discussed in Section 2.5.1.

¹ The convex envelope (or convex hull) of a function $\gamma$ is its best possible convex underestimator, i.e., $\operatorname{conv}\gamma(x) = \sup\{ \eta(x) : \eta \le \gamma,\ \eta \text{ convex} \}$, and it is hard to determine in general.
The cost function (2.5) also appears in [14], albeit via a different reasoning; our convexification
mechanism seems more intuitive. But the striking difference with respect to [14] is how (2.5) is
exploited to generate distributed solution methods. Whereas [14] lays out a sequential block-
coordinate approach, we show that (2.5) is amenable to distributed solutions either via the fast
Nesterov’s gradient method (for synchronous implementations) or exact/inexact randomized block-
coordinate methods (for asynchronous implementations).
2.4 Distributed sensor network localization
We propose two distributed algorithms: a synchronous one, where nodes work in parallel, and
an asynchronous, gossip-like algorithm, where each node starts its processing step according to
some probability distribution. Both algorithms entail computing the gradient of the cost function
and its Lipschitz constant. To achieve this it is convenient to rewrite Problem (2.5) as
\[
\operatorname*{minimize}_{x} \quad \frac{1}{2}\, d^2_{B}(Ax) \;+\; \sum_{i} \sum_{k\in A_i} \frac{1}{2}\, d^2_{B_{a_{ik}}}(x_i), \tag{2.6}
\]
where $A = C\otimes I_p$, $C$ is the arc-node incidence matrix of $\mathcal{G}$, $I_p$ is the identity matrix of size $p$, and $B$ is the Cartesian product of the balls $B_{ij}$ corresponding to all the edges in $\mathcal{E}$. We denote the two
terms in (2.6) as
\[
g(x) = \frac{1}{2}\, d^2_B(Ax), \qquad h(x) = \sum_i h_i(x_i),
\]
where $h_i(x_i) = \sum_{k\in A_i} \frac{1}{2}\, d^2_{B_{a_{ik}}}(x_i)$. Problems (2.5) and (2.6) are equivalent, since $Ax$ is the vector $(x_i - x_j : i\sim j)$ and the function $g$ in (2.6) can be written as
\[
g(x) = \frac{1}{2}\, d^2_B(Ax) = \frac{1}{2} \inf_{y\in B} \|Ax - y\|^2 = \frac{1}{2} \inf_{\|y_{ij}\|\le d_{ij}} \sum_{i\sim j} \|x_i - x_j - y_{ij}\|^2.
\]
As all the terms are non-negative and the constraint set is a Cartesian product, we can exchange the infimum with the summation, yielding
\[
g(x) = \frac{1}{2} \sum_{i\sim j} \inf_{\|y_{ij}\|\le d_{ij}} \|x_i - x_j - y_{ij}\|^2 = \sum_{i\sim j} \frac{1}{2}\, d^2_{B_{ij}}(x_i - x_j),
\]
which is the corresponding term in (2.5).
2.4.1 Gradient and Lipschitz constant of f

To simplify notation, let us define the functions
\[
\varphi_{B_{ij}}(z) = \frac{1}{2}\, d^2_{B_{ij}}(z), \qquad \varphi_{B_{a_{ik}}}(z) = \frac{1}{2}\, d^2_{B_{a_{ik}}}(z).
\]
Now we call on a key result from convex analysis (see [20, Prop. X.3.2.2, Th. X.3.2.3]): the function in (2.3), $\varphi_{B_{ij}}(z) = \frac{1}{2} d^2_{B_{ij}}(z)$, is convex and differentiable, and its gradient is
\[
\nabla\varphi_{B_{ij}}(z) = z - P_{B_{ij}}(z), \tag{2.7}
\]
where $P_{B_{ij}}(z)$ is the orthogonal projection of point $z$ onto the closed convex set $B_{ij}$,
\[
P_{B_{ij}}(z) = \operatorname*{argmin}_{y\in B_{ij}} \|z - y\|.
\]
Further, the function $\varphi_{B_{ij}}$ has a Lipschitz continuous gradient with constant $L_\varphi = 1$, i.e.,
\[
\|\nabla\varphi_{B_{ij}}(x) - \nabla\varphi_{B_{ij}}(y)\| \le \|x - y\|. \tag{2.8}
\]
We prove (2.8) in Section 2.7.2.
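Equations (2.7) and (2.8) are also easy to sanity-check numerically. The sketch below (helper names ours) projects onto an origin-centered ball of radius $d_{ij}$ and verifies the nonexpansiveness in (2.8) on random pairs of points:

```python
import numpy as np

def proj_ball(z, d):
    # Orthogonal projection of z onto the ball {y : ||y|| <= d}.
    nz = np.linalg.norm(z)
    return z if nz <= d else (d / nz) * z

def grad_phi(z, d):
    # Gradient of phi_Bij, Equation (2.7): z - P_Bij(z).
    return z - proj_ball(z, d)

rng = np.random.default_rng(0)
d = 0.5
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # Lipschitz continuity of the gradient with constant 1, Equation (2.8).
    assert (np.linalg.norm(grad_phi(x, d) - grad_phi(y, d))
            <= np.linalg.norm(x - y) + 1e-12)
```

The property holds because projections onto closed convex sets are firmly nonexpansive, so $I - P_{B_{ij}}$ is nonexpansive.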
Define $\varphi_B(y) = \sum_{i\sim j} \varphi_{B_{ij}}(y_{ij})$ for a stacked vector $y = (y_{ij} : i\sim j)$, so that $g(x) = \varphi_B(Ax)$. From this relation, and using (2.7), we can compute the gradient of $g$:
\[
\nabla g(x) = A^\top \nabla\varphi_B(Ax) = A^\top\big(Ax - P_B(Ax)\big) = \mathcal{L}x - A^\top P_B(Ax), \tag{2.9}
\]
where the second equality follows from (2.7) and $\mathcal{L} = A^\top A = L\otimes I_p$, with $L$ being the Laplacian matrix of $\mathcal{G}$. This gradient is Lipschitz continuous, and we can obtain an easily computable Lipschitz constant $L_g$ as follows:
\[
\begin{aligned}
\|\nabla g(x) - \nabla g(y)\| &= \big\| A^\top\big(\nabla\varphi_B(Ax) - \nabla\varphi_B(Ay)\big) \big\| \\
&\le \|A\| \, \|Ax - Ay\| \\
&\le \|A\|^2 \, \|x - y\| \\
&= \lambda_{\max}(A^\top A)\,\|x - y\| \;\stackrel{(a)}{=}\; \lambda_{\max}(L)\,\|x - y\| \\
&\le \underbrace{2\delta_{\max}}_{L_g} \, \|x - y\|,
\end{aligned} \tag{2.10}
\]
where $\|A\|$ is the maximum singular value norm; equality (a) is a consequence of Kronecker product properties. In (2.10) we denote the maximum node degree of $\mathcal{G}$ by $\delta_{\max}$. A proof of the bound $\lambda_{\max}(L) \le 2\delta_{\max}$ can be found in [21].²
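The bound $\lambda_{\max}(L) \le 2\delta_{\max}$ is also easy to confirm numerically on an arbitrary graph; a small sketch (random graph of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# Random undirected graph via a symmetric 0/1 adjacency matrix.
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T
deg = A.sum(axis=1)
L = np.diag(deg) - A                  # graph Laplacian
lam_max = np.linalg.eigvalsh(L)[-1]   # largest eigenvalue
# Lipschitz bound used in (2.10): lambda_max(L) <= 2 * delta_max.
assert lam_max <= 2 * deg.max() + 1e-9
```

Only the maximum degree is needed, which is why the constant is cheap to obtain in-network.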
The gradient of $h$ is $\nabla h(x) = (\nabla h_1(x_1), \dots, \nabla h_n(x_n))$, where the gradient of each $h_i$ is
\[
\nabla h_i(x_i) = \sum_{k\in A_i} \nabla\varphi_{B_{a_{ik}}}(x_i). \tag{2.11}
\]
² A tighter bound would be $\lambda_{\max}(L) \le \max_{i\sim j}\big(\delta_i + \delta_j - c(i,j)\big)$, where $\delta_i$ is the degree of node $i$ and $c(i,j)$ is the number of vertices adjacent to both $i$ and $j$ [22, Th. 4.13]; nevertheless, $2\delta_{\max}$ is easier to compute in a distributed way.
The gradient of $h$ is also Lipschitz continuous. The constants $L_{h_i}$ for $\nabla h_i$ satisfy
\[
\|\nabla h_i(x_i) - \nabla h_i(y_i)\| \le \sum_{k\in A_i} \|\nabla\varphi_{B_{a_{ik}}}(x_i) - \nabla\varphi_{B_{a_{ik}}}(y_i)\| \le |A_i| \, \|x_i - y_i\|, \tag{2.12}
\]
where $|C|$ denotes the cardinality of a set $C$. We now have an overall constant $L_h$ for $\nabla h$:
\[
\|\nabla h(x) - \nabla h(y)\| = \sqrt{\sum_i \|\nabla h_i(x_i) - \nabla h_i(y_i)\|^2} \le \sqrt{\sum_i |A_i|^2 \|x_i - y_i\|^2} \le \underbrace{\max\{|A_i| : i\in\mathcal{V}\}}_{L_h} \, \|x - y\|. \tag{2.13}
\]
We are now able to write $\nabla f$, the gradient of our cost function, as
\[
\nabla f(x) = \mathcal{L}x - A^\top P_B(Ax) + \begin{bmatrix} \sum_{k\in A_1}\big(x_1 - P_{B_{a_{1k}}}(x_1)\big) \\ \vdots \\ \sum_{k\in A_n}\big(x_n - P_{B_{a_{nk}}}(x_n)\big) \end{bmatrix}, \tag{2.14}
\]
with $\mathcal{L} = A^\top A$ as above. A Lipschitz constant $L_f$ is, thus,
\[
L_f = 2\delta_{\max} + \max\{|A_i| : i\in\mathcal{V}\}. \tag{2.15}
\]
This constant is easy to precompute through in-network processing by, e.g., a diffusion algorithm
— c.f. [23, Ch. 9] for more information.
Although we have restricted ourselves to a fixed (time-invariant) topology, the computation of a Lipschitz constant is also possible for time-varying topologies by taking the worst-case scenario: we replace the maximum node degree $\delta_{\max}$ and the maximum number of anchors connected to a single node, $\max\{|A_i| : i\in\mathcal{V}\}$, by the corresponding values for the complete network, resulting in
\[
L_f = 2(|\mathcal{V}| - 1) + |\mathcal{A}|.
\]
The algorithm would be slower, but this constant remains valid for any topology.

In summary, we can compute the gradient of $f$ using Equation (2.14) and a Lipschitz constant via (2.15), which leads us to the algorithms described in Sections 2.4.2 and 2.4.3 for minimizing $f$.
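As a consistency check, the closed-form gradient (2.14) can be compared against finite differences of the cost (2.5) on a toy instance (the network, measurements, and helper names below are all our own illustrative choices):

```python
import numpy as np

def dist_ball_sq_grad(z, center, radius):
    # phi(z) = 1/2 * dist(z, Ball(center, radius))^2 and its gradient z - P(z).
    v = z - center
    nv = np.linalg.norm(v)
    p = z if nv <= radius else center + (radius / nv) * v
    return 0.5 * np.linalg.norm(z - p) ** 2, z - p

def cost_and_grad(x, edges, d, anchors, links, r):
    # x: (n, 2) positions; edges: list of (i, j); anchors: (K, 2);
    # links: list of (i, k) node-anchor pairs with measurement r[(i, k)].
    f, g = 0.0, np.zeros_like(x)
    for (i, j) in edges:                      # inter-sensor terms of (2.5)
        fij, gij = dist_ball_sq_grad(x[i] - x[j], np.zeros(2), d[(i, j)])
        f += fij
        g[i] += gij
        g[j] -= gij
    for (i, k) in links:                      # anchor terms of (2.5)
        fik, gik = dist_ball_sq_grad(x[i], anchors[k], r[(i, k)])
        f += fik
        g[i] += gik
    return f, g

rng = np.random.default_rng(2)
x = rng.random((3, 2))
edges, d = [(0, 1), (1, 2)], {(0, 1): 0.4, (1, 2): 0.3}
anchors = np.array([[0.0, 0.0], [1.0, 1.0]])
links, r = [(0, 0), (2, 1)], {(0, 0): 0.5, (2, 1): 0.5}

f0, g = cost_and_grad(x, edges, d, anchors, links, r)
# Central finite differences of the cost, coordinate by coordinate.
num, eps = np.zeros_like(x), 1e-6
for i in range(x.shape[0]):
    for c in range(2):
        xp, xm = x.copy(), x.copy()
        xp[i, c] += eps
        xm[i, c] -= eps
        num[i, c] = (cost_and_grad(xp, edges, d, anchors, links, r)[0]
                     - cost_and_grad(xm, edges, d, anchors, links, r)[0]) / (2 * eps)
assert np.allclose(g, num, atol=1e-5)         # matches Equation (2.14)
```

The per-node blocks of `g` are exactly what each node computes locally in the algorithms that follow.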
2.4.2 Parallel method
Since f has a Lipschitz continuous gradient we can follow Nesterov’s optimal method [24]. Our
approach is detailed in Algorithm 1. In Step 7, c(i∼j,i) is the entry (i ∼ j, i) in the arc-node
incidence matrix C, and δi is the degree of node i.
Algorithm 1 Parallel method
Input: $L_f$; $\{d_{ij} : i\sim j\in\mathcal{E}\}$; $\{r_{ik} : i\in\mathcal{V},\ k\in A_i\}$
Output: $\hat{x}$
1: $k = 0$;
2: each node $i$ chooses random $x_i(0) = x_i(-1)$;
3: while some stopping criterion is not met, each node $i$ does
4: &nbsp; $k = k + 1$;
5: &nbsp; $w_i = x_i(k-1) + \frac{k-2}{k+1}\big(x_i(k-1) - x_i(k-2)\big)$;
6: &nbsp; node $i$ broadcasts $w_i$ to its neighbors;
7: &nbsp; $\nabla g_i(w_i) = \delta_i w_i - \sum_{j\in N_i} w_j + \sum_{j\in N_i} c_{(i\sim j,\,i)} P_{B_{ij}}(w_i - w_j)$;
8: &nbsp; $\nabla h_i(w_i) = \sum_{k\in A_i}\big(w_i - P_{B_{a_{ik}}}(w_i)\big)$;
9: &nbsp; $x_i(k) = w_i - \frac{1}{L_f}\big(\nabla g_i(w_i) + \nabla h_i(w_i)\big)$;
10: end while
11: return $\hat{x} = x(k)$
Parallel nature of Algorithm 1

The updates in Step 9 of the algorithm require the computation of the gradient of the cost w.r.t. the position of node $i$. This corresponds to the $i$-th entry of $\nabla f$, given in (2.14). The last summand in (2.14) is simply $\nabla h(x)$, and the $i$-th entry of $\nabla h(x)$ is given in (2.11); it can easily be computed independently by each node. The $i$-th entry of $\mathcal{L}x$ can be computed by node $i$ from its current position estimate and the position estimates of its neighbors; in particular, $(\mathcal{L}x)_i = \delta_i x_i - \sum_{j\in N_i} x_j$. The less obvious parallel term is $A^\top P_B(Ax)$. We start the analysis with the concatenated projections $P_B(Ax) = \big(P_{B_{ij}}(x_i - x_j)\big)_{i\sim j\in\mathcal{E}}$. Each of these projections depends only on the edge terminals and the noisy measurement $d_{ij}$. The product with $A^\top$ collects, at the entries corresponding to each node, the sum of the projections relative to the edges in which it intervenes, with a positive or negative sign depending on the arbitrary edge direction agreed upon at the onset of the algorithm. More specifically, $(A^\top P_B(Ax))_i = \sum_{j\in N_i} c_{(i\sim j,\,i)} P_{B_{ij}}(x_i - x_j)$, as presented in Step 7 of Algorithm 1.
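To make the mechanics concrete, the following sketch simulates, in a centralized loop, the per-node computations of Algorithm 1 on a toy two-sensor network (all values and names are our own). It uses the orientation-free form of Step 7, in which the signs of the incidence entries fold into the projection arguments:

```python
import numpy as np

def proj_ball(z, center, radius):
    # Orthogonal projection onto Ball(center, radius).
    v = z - center
    nv = np.linalg.norm(v)
    return z if nv <= radius else center + (radius / nv) * v

def half_sq_dist(z, center, radius):
    # (1/2) * squared distance of z to Ball(center, radius).
    return 0.5 * np.linalg.norm(z - proj_ball(z, center, radius)) ** 2

# Toy network: 2 sensors, 2 anchors, one inter-sensor edge.
anchors = np.array([[0.0, 0.0], [1.0, 0.0]])
links = {0: [0], 1: [1]}                 # sensor i -> indices of nearby anchors
r = {(0, 0): 0.3, (1, 1): 0.3}           # anchor range measurements
edges, d = [(0, 1)], {(0, 1): 0.45}      # inter-sensor range measurement

Lf = 2 * 1 + 1                           # Equation (2.15): 2*delta_max + max|A_i|

def grad_node(i, w):
    # i-th block of the gradient (2.14), orientation-free form of Step 7.
    g = np.zeros(2)
    for (a, b) in edges:
        if i in (a, b):
            e = w[i] - w[b if i == a else a]
            g += e - proj_ball(e, np.zeros(2), d[(a, b)])
    for k in links[i]:
        g += w[i] - proj_ball(w[i], anchors[k], r[(i, k)])
    return g

def cost(x):
    return (sum(half_sq_dist(x[a] - x[b], np.zeros(2), d[(a, b)])
                for (a, b) in edges)
            + sum(half_sq_dist(x[i], anchors[k], r[(i, k)])
                  for i in links for k in links[i]))

rng = np.random.default_rng(3)
x_prev = x = rng.random((2, 2))
for k in range(1, 501):
    w = x + (k - 2) / (k + 1) * (x - x_prev)   # Step 5: Nesterov extrapolation
    x_prev = x
    x = w - np.array([grad_node(i, w) for i in range(2)]) / Lf  # Steps 7-9
assert cost(x) < 1e-3                          # near a minimizer of (2.5)
```

In a real deployment each `grad_node` call runs at its own node; the loop body is the only part that needs the one-hop broadcast of Step 6.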
2.4.3 Asynchronous method
The method described in Algorithm 1 is fully parallel but still depends on some synchronization
between all the nodes — so that their updates of the gradient are consistent. This requirement
can be inconvenient in some applications of sensor networks; to circumvent it, we present a fully
asynchronous method, achieved by means of a broadcast gossip scheme (c.f. [25] for an extended
survey of gossip algorithms).
Nodes are equipped with independent clocks ticking at random times (say, as Poisson point processes). When node $i$'s clock ticks, it performs the update of its variable $x_i$ and broadcasts the update to its neighbors. Let the order of node activation be collected in $\{\xi_k\}_{k\in\mathbb{N}}$, a sequence of independent random variables taking values in the set $\mathcal{V}$, such that
\[
\mathbb{P}(\xi_k = i) = P_i > 0. \tag{2.16}
\]
Then, the asynchronous update of variable $x_i$ at node $i$ can be described as in Algorithm 2.

Algorithm 2 Asynchronous method
Input: $L_f$; $\{d_{ij} : i\sim j\in\mathcal{E}\}$; $\{r_{ik} : i\in\mathcal{V},\ k\in\mathcal{A}\}$
Output: $\hat{x}$
1: each node $i$ chooses random $x_i(0)$;
2: $k = 0$;
3: while some stopping criterion is not met, each node $i$ does
4: &nbsp; $k = k + 1$;
5: &nbsp; if $\xi_k = i$ then
6: &nbsp;&nbsp; $x_i(k) = \operatorname*{argmin}_{w_i} f(x_1(k-1), \dots, w_i, \dots, x_n(k-1))$
7: &nbsp; else
8: &nbsp;&nbsp; $x_i(k) = x_i(k-1)$
9: &nbsp; end if
10: end while
11: return $\hat{x} = x(k)$
To compute the minimizer in Step 6 of Algorithm 2 it is useful to recast Problem (2.6) as
\[
\operatorname*{minimize}_{x} \quad \sum_i \Bigg( \sum_{j\in N_i} \frac{1}{4}\, d^2_{B_{ij}}(x_i - x_j) + \sum_{k\in A_i} \frac{1}{2}\, d^2_{B_{a_{ik}}}(x_i) \Bigg), \tag{2.17}
\]
where the factor $\frac{1}{4}$ accounts for the duplicate terms when summing over nodes instead of over edges. By fixing the neighbor positions, each node solves a single-source localization problem; this setup leads to the problem
\[
\operatorname*{minimize}_{x_i} \quad f_{\mathrm{sl}_i}(x_i) := \sum_{j\in N_i} \frac{1}{4}\, d^2_{Bs_{ij}}(x_i) + \sum_{k\in A_i} \frac{1}{2}\, d^2_{B_{a_{ik}}}(x_i), \tag{2.18}
\]
where $Bs_{ij} = \{ z\in\mathbb{R}^p : \|z - x_j\| \le d_{ij} \}$. We call the reader's attention to the fact that the function in (2.18) is continuous and coercive; thus, the optimization problem (2.18) has a solution.
We solve (2.18) at each node by employing Nesterov’s optimal accelerated gradient method as
described in Algorithm 3. The asynchronous method proposed in Algorithm 2 converges to the set
of minimizers of function f , as established in Theorem 2, in Section 2.5.
We also propose an inexact version in which nodes do not solve Problem (2.18), but instead take just one gradient step; that is, Step 6 in Algorithm 2 is simply replaced by
\[
x_i(k) = x_i(k-1) - \frac{1}{L_f}\, \nabla_i f(x(k-1)), \tag{2.19}
\]
where $\nabla_i f(x_1, \dots, x_n)$ is the gradient of $f$ with respect to $x_i$, and we assume
\[
\mathbb{P}(\xi_k = i) = \frac{1}{n}. \tag{2.20}
\]
The convergence of the resulting algorithm is established in Theorem 3, Section 2.5.
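A minimal simulation of the inexact asynchronous scheme — uniform activation (2.20) and one block gradient step (2.19) per activation — on a toy two-sensor network (all values are our own illustrative choices):

```python
import numpy as np

def proj_ball(z, center, radius):
    # Orthogonal projection onto Ball(center, radius).
    v = z - center
    nv = np.linalg.norm(v)
    return z if nv <= radius else center + (radius / nv) * v

# Toy network: 2 sensors, each linked to one anchor, one inter-sensor edge.
anchors = np.array([[0.0, 0.0], [1.0, 0.0]])
r, d = 0.3, 0.45                         # anchor and inter-sensor measurements
Lf = 2 * 1 + 1                           # Equation (2.15)

def grad_node(i, x):
    # Block gradient of f w.r.t. x_i, cf. (2.14).
    j = 1 - i
    e = x[i] - x[j]
    g = e - proj_ball(e, np.zeros(2), d)           # inter-sensor term
    g += x[i] - proj_ball(x[i], anchors[i], r)     # anchor term
    return g

def cost(x):
    c = 0.5 * np.linalg.norm(x[0] - x[1]
                             - proj_ball(x[0] - x[1], np.zeros(2), d)) ** 2
    for i in range(2):
        c += 0.5 * np.linalg.norm(x[i] - proj_ball(x[i], anchors[i], r)) ** 2
    return c

rng = np.random.default_rng(4)
x = rng.random((2, 2))
for _ in range(20000):
    i = rng.integers(2)                  # uniform activation, Equation (2.20)
    x[i] -= grad_node(i, x) / Lf         # inexact update, Equation (2.19)
assert cost(x) < 1e-3                    # iterates approach the minimizers of f
```

No node ever waits for another: each activation reads the neighbor's last broadcast estimate and updates one block only.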
Algorithm 3 Asynchronous update at each node $i$
Input: $\xi_k$; $L_f$; $\{d_{ij} : j\in N_i\}$; $\{r_{ik} : k\in A_i\}$
Output: $x_i(k)$
1: if $\xi_k \ne i$ then
2: &nbsp; $x_i(k) = x_i(k-1)$;
3: &nbsp; return $x_i(k)$;
4: end if
5: choose random $z(0) = z(-1)$;
6: $l = 0$;
7: while some stopping criterion is not met do
8: &nbsp; $l = l + 1$;
9: &nbsp; $w = z(l-1) + \frac{l-2}{l+1}\big(z(l-1) - z(l-2)\big)$;
10: &nbsp; $\nabla f_{\mathrm{sl}_i}(w) = \frac{1}{2}\sum_{j\in N_i}\big(w - P_{Bs_{ij}}(w)\big) + \sum_{k\in A_i}\big(w - P_{B_{a_{ik}}}(w)\big)$;
11: &nbsp; $z(l) = w - \frac{1}{L_f}\,\nabla f_{\mathrm{sl}_i}(w)$;
12: end while
13: return $x_i(k) = z(l)$
2.5 Analysis
A relevant question regarding Algorithms 1 and 2 is whether they will return a good solution to
the problem they are designed to solve, after a reasonable amount of computations. Sections 2.5.2
and 2.5.3 address convergence issues of the proposed methods, and discuss some of the assumptions
on the problem data. Section 2.5.1 provides a formal bound for the gap between the original and
the convexified problems.
2.5.1 Quality of the convexified problem
While evaluating any approximation method it is important to know how far the approximate
optimum is from the original one. In this Section we will focus on this analysis.
It was already noted in Section 2.3 that $\varphi_{B_{ij}}(z) = \varphi_{S_{ij}}(z)$ for $\|z\| \ge d_{ij}$; where the functions differ, i.e., for $\|z\| < d_{ij}$, we have $\varphi_{B_{ij}}(z) = 0$. The same applies to the terms related to anchor measurements. Let $f_S$ denote the original nonconvex cost in (2.1), with optimal value $f_S^\star = \inf_x f_S(x)$. Then
\[
f^\star = f(x^\star) \;\le\; f_S^\star \;\le\; f_S(x^\star),
\]
where $x^\star$ is the minimizer of the convexified problem (2.5) and $f^\star$ its optimal value. With these inequalities we can compute a bound for the optimality
[Figure 2.2: One-dimensional example of the quality of the approximation of the true nonconvex cost $f_S(x)$ by the convexified function $f(x)$ in a star network. Here the node positioned at $x = 3$ has 3 neighbors.]
gap, after (2.5) is solved, as
\[
\begin{aligned}
f_S^\star - f^\star \le f_S(x^\star) - f^\star
&= \sum_{i\sim j\in\mathcal{E}} \frac{1}{2}\Big( d^2_{S_{ij}}(x_i^\star - x_j^\star) - d^2_{B_{ij}}(x_i^\star - x_j^\star) \Big)
+ \sum_{i\in\mathcal{V}}\sum_{k\in A_i} \frac{1}{2}\Big( d^2_{S_{a_{ik}}}(x_i^\star) - d^2_{B_{a_{ik}}}(x_i^\star) \Big) \\
&= \sum_{i\sim j\in\mathcal{E}_2} \frac{1}{2}\, d^2_{S_{ij}}(x_i^\star - x_j^\star)
+ \sum_{i\in\mathcal{V}}\sum_{k\in A_{2i}} \frac{1}{2}\, d^2_{S_{a_{ik}}}(x_i^\star),
\end{aligned} \tag{2.21}
\]
where $f_S$ denotes the original nonconvex cost in (2.1). In Equation (2.21) we denote by $\mathcal{E}_2 = \{ i\sim j\in\mathcal{E} : d^2_{B_{ij}}(x_i^\star - x_j^\star) = 0 \}$ the set of edges where the distance between the estimated positions is smaller than the distance measurement, and similarly $A_{2i} = \{ k\in A_i : d^2_{B_{a_{ik}}}(x_i^\star) = 0 \}$. Inequality (2.21) suggests a simple method to compute a bound for the
optimality gap of the solution returned by the algorithms:
1. Compute the optimal solution x? using Algorithm 1 or 2;
2. Select the terms of the convexified problem (2.5) which are zero;
3. Add the nonconvex costs of each of these edges, as in (2.21).
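These three steps amount to very little computation; a sketch in one dimension (helper names, estimate, and measurement values are all ours) evaluates the a posteriori bound (2.21) next to the a priori bound (2.22):

```python
import numpy as np

def sq_dist_ball(z, d):
    # Squared distance of z to the origin-centered ball of radius d.
    return max(np.linalg.norm(z) - d, 0.0) ** 2

def sq_dist_sphere(z, d):
    # Squared distance of z to the origin-centered sphere of radius d.
    return (np.linalg.norm(z) - d) ** 2

# Hypothetical estimate x_hat (as returned by Algorithm 1 or 2) and measurements.
x_hat = {1: np.array([0.9]), 2: np.array([2.1])}
edge_meas = {(1, 2): 1.4}                      # inter-sensor measurements d_ij
anchor_meas = {(1, 0.0): 1.0, (2, 3.0): 1.1}   # (node, anchor position) -> r_ik

gap_21, gap_22 = 0.0, 0.0
for (i, j), dij in edge_meas.items():
    z = x_hat[i] - x_hat[j]
    if sq_dist_ball(z, dij) == 0.0:            # edge in E_2: relaxed term is zero
        gap_21 += 0.5 * sq_dist_sphere(z, dij)
    gap_22 += 0.5 * dij ** 2
for (i, ak), rik in anchor_meas.items():
    z = x_hat[i] - ak
    if sq_dist_ball(z, rik) == 0.0:            # pair in A_2i
        gap_21 += 0.5 * sq_dist_sphere(z, rik)
    gap_22 += 0.5 * rik ** 2

# The a posteriori bound (2.21) is never worse than the a priori bound (2.22).
assert 0.0 <= gap_21 <= gap_22
```

Only the zero-valued relaxed terms contribute to (2.21), which is why it tracks the true gap so much more closely than (2.22).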
Our bound is tighter than the one (available a priori) from applying [26, Th. 1], which is
\[
f_S^\star - f^\star \le \sum_{i\sim j\in\mathcal{E}} \frac{1}{2}\, d_{ij}^2 + \sum_{i\in\mathcal{V}}\sum_{k\in A_i} \frac{1}{2}\, r_{ik}^2. \tag{2.22}
\]
For the one-dimensional example of the star network costs depicted in Figure 2.2, the bounds in (2.21) and (2.22), averaged over 500 Monte Carlo trials, are presented in Table 2.1, together with the true average gap. In each Monte Carlo trial we sampled a zero-mean Gaussian random variable with $\sigma = 0.25$ and obtained a noisy range measurement as described later in (2.31).
Table 2.1: Bounds on the optimality gap for the example in Figure 2.2

    True gap    Equation (2.21)    Equation (2.22)
    0.0367      0.0487             3.0871
[Figure 2.3: 2D network example to assess the quality of the bound in Equation (2.21). Blue squares stand for anchors, while the red star is a sensor with unknown position.]
A two-dimensional example was also produced to check whether the bound remains informative in 2D. Our bound is of the same order of magnitude as the true optimality gap, whereas the bound in (2.22) is two orders of magnitude larger. For the simple example in Figure 2.3 we obtain the results of Table 2.2.
These results show the tightness of the convexified function and how loose the bound (2.22) is
when applied to our problem.
2.5.2 Parallel method: convergence guarantees and iteration complexity

As Problem (2.6) is convex and the cost function has a Lipschitz continuous gradient, Algorithm 1 is known to converge at the optimal rate $O(k^{-2})$ [24], [27]:
\[
f(x(k)) - f^\star \le \frac{2 L_f}{(k+1)^2}\, \|x(0) - x^\star\|^2.
\]
Table 2.2: Bounds on the optimality gap for the 2D example in Figure 2.3

    True gap    Equation (2.21)    Equation (2.22)
    4.5801      6.0899             384.1226
2.5.3 Asynchronous method: convergence guarantees and iteration complexity
To state the convergence properties of Algorithm 2 we only need Assumption 1; it is used to
prove coerciveness of the relaxed cost in (2.5).
Assumption 1. There is at least one anchor linked to some sensor and the graph G is connected
(there is a path between any two sensors).
This assumption generally holds, as one needs at least $p+1$ anchors to eliminate translation, rotation, and flip ambiguities when localizing in $\mathbb{R}^p$, which already exceeds what the assumption requires.
We present two convergence results — Theorem 2 and Theorem 3 — and the iteration complexity analysis for Algorithm 2 in Proposition 4. Proofs of the Theorems are detailed in Section 2.7.
The following Theorem establishes the almost sure (a.s.) convergence of Algorithm 2.
Theorem 2 (Almost sure convergence of Algorithm 2). Let $\{x(k)\}_{k\in\mathbb{N}}$ be the sequence of points produced by Algorithm 2, or by Algorithm 2 with the update (2.19), and let $\mathcal{X}^\star = \{ x^\star : f(x^\star) = f^\star \}$ be the set of minimizers of the function $f$ defined in (2.5). Then it holds that
\[
\lim_{k\to\infty} d_{\mathcal{X}^\star}(x(k)) = 0 \quad \text{a.s.} \tag{2.23}
\]
In words, with probability one the iterates $x(k)$ approach the set $\mathcal{X}^\star$ of minimizers of $f$; this does not imply that $\{x(k)\}_{k\in\mathbb{N}}$ converges to a single $x^\star\in\mathcal{X}^\star$, but it does imply that $\lim_{k\to\infty} f(x(k)) = f^\star$, since $\mathcal{X}^\star$ is a compact set, as proved in Section 2.7, Lemma 5.
Theorem 3 (Almost sure convergence to a point). Let $\{x(k)\}_{k\in\mathbb{N}}$ be a sequence of points generated by Algorithm 2 with the update (2.19) in Step 6, and let all nodes start computations with uniform probability. Then, with probability one, there exists a minimizer of $f$, denoted by $x^\star\in\mathcal{X}^\star$, such that
\[
x(k) \to x^\star. \tag{2.24}
\]
This result not only tells us that the iterates of Algorithm 2 with the modified Step 6 stated in Equation (2.19) converge to the solution set, but it also guarantees that they will not keep jumping around the solution set $\mathcal{X}^\star$ (unlikely to occur in Algorithm 2, but not ruled out by the analysis).
One of the practical benefits of Theorem 3 is that the stopping criterion can safely probe the stability of the estimates along iterations. To the best of our knowledge, this strong type of convergence (the whole sequence converges to a point in $\mathcal{X}^\star$) had not been previously established in the context of randomized approaches for convex functions with Lipschitz continuous gradients, although it has been derived for randomized proximal-based minimization of a large number of convex functions, cf. [28, Proposition 9].
Proposition 4 (Iteration complexity of Algorithm 2). Let $\{x(k)\}_{k\in\mathbb{N}}$ be a sequence of points generated by Algorithm 2, with the update (2.19) in Step 6, and let the nodes be activated with equal probability. Choose $0 < \varepsilon < f(x(0)) - f^\star$ and $\rho\in(0,1)$. There exists a constant $b(\rho, x(0))$ such that
\[
\mathbb{P}\big( f(x(k)) - f^\star \le \varepsilon \big) \ge 1 - \rho \tag{2.25}
\]
for all
\[
k \ge K = \frac{2n\, b(\rho, x(0))}{\varepsilon} + 2 - n. \tag{2.26}
\]
The constant $b(\rho, x(0))$ can be computed from inequality (19) in [29]; it depends only on the initialization and the chosen $\rho$. Recall that $n$ is the number of sensor nodes. Proposition 4 says that, with high probability, the function value $f(x(k))$ for all $k \ge K$ will be at a distance no larger than $\varepsilon$ from the optimal value, and that the number of iterations $K$ grows inversely with the chosen $\varepsilon$.

Proof of Proposition 4. As $f$ is differentiable and has a Lipschitz continuous gradient, the result follows directly from [29, Th. 2].
A natural question is whether the strong convergence properties of the inexact version also apply to the exact version. In fact, we can disprove this with a small toy example.
A toy example. The explanation of this counter-intuitive phenomenon lies in the lack of unique-
ness of minimizers for the function in (2.18), as this function is not necessarily strictly convex at
all iterations. The ambiguity in selecting minimizers across iterations may generate oscillations.
Consider a network localization problem in R with two anchors placed at 0 and 3, and two
nodes placed at 1 and 2. Assume that node 1 measures its distance to anchor 1 with no noise,
node 2 measures its distance to anchor 2 with no noise, and nodes 1 and 2 measure their mutual
distance (with noise) as 1.2. The problem we face is the minimization of
\[
f(x_1, x_2) = \frac{1}{2}\, d^2_{B}(x_1 - x_2) + \frac{1}{2}\, d^2_{A_1}(x_1) + \frac{1}{2}\, d^2_{A_2}(x_2), \tag{2.27}
\]
where $A_1 = [-1, 1]$, $A_2 = [2, 4]$, and $B = [-1.2, 1.2]$.
Consider the initialization $x_2(0) = 2.6$ and assume that nodes minimize (2.27) alternately,
\[
x_1(k+1) = \operatorname*{argmin}_{x_1}\; f(x_1, x_2(k)), \qquad x_2(k+1) = \operatorname*{argmin}_{x_2}\; f(x_1(k+1), x_2), \tag{2.28}
\]
for $k = 0, 1, \dots$. It is straightforward to check that the assignments $x_1(1) = 1.2$, $x_2(k) = 2.05 + 0.05(-1)^k$ for $k\ge 1$, and $x_1(k) = 1 - 0.1/k$ for $k\ge 2$ obey (2.28), i.e., they are valid algorithm outputs. In this example $x_1(k)$ converges whereas $x_2(k)$ oscillates (the example can be adjusted so that both oscillate in the optimal set). Note, however, that $x_1(k)$ and $x_2(k)$ are optimal for (2.27) as soon as $k\ge 2$ (of course, for larger networks, optimality cannot be certified at a single node as in this simple scenario).
The subproblem faced by node 2 depends only on x1(k + 1). Thus, selecting one minimizer—
when it has many—corresponds to establishing a “rule” for each given x1(k + 1). The example
shows that not any rule will lead to strong convergence.
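The nonuniqueness at the heart of the example is easy to verify numerically: with $x_1 = 1.2$ fixed, the cost (2.27) is flat in $x_2$ over the whole interval $[2, 2.4]$. A short check (the interval-distance helper is ours):

```python
def sq_dist_interval(x, lo, hi):
    # Squared distance of a scalar x to the interval [lo, hi].
    return max(lo - x, 0.0, x - hi) ** 2

def f(x1, x2):
    # Toy cost (2.27): A1 = [-1, 1], A2 = [2, 4], B = [-1.2, 1.2].
    return (0.5 * sq_dist_interval(x1 - x2, -1.2, 1.2)
            + 0.5 * sq_dist_interval(x1, -1.0, 1.0)
            + 0.5 * sq_dist_interval(x2, 2.0, 4.0))

# With x1 = 1.2 (the first alternating-minimization iterate), any x2 in
# [2, 2.4] minimizes f(1.2, .): the x2-dependent terms vanish on that interval.
vals = [f(1.2, x2) for x2 in (2.0, 2.05, 2.1, 2.2, 2.4)]
assert max(vals) - min(vals) < 1e-12      # flat over the minimizer set
assert f(1.2, 2.6) > vals[0]              # strictly worse outside [2, 2.4]
```

Any selection rule over this flat interval is a valid output of (2.28), which is exactly what allows the oscillating iterates above.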
[Figure 2.4: Proximal minimization evolution for the toy problem: iterates $x_1(k)$ (lower curve, blue) and $x_2(k)$ (upper curve, red) for $k = 1, \dots, 30$.]
Proximal minimization. A possible approach to circumvent non-uniqueness of minimizers
in (2.18) is to add a proximal term (as this makes the function strictly convex). In the context of
the toy problem, this translates into replacing (2.28) with
\[
x_1(k+1) = \operatorname*{argmin}_{x_1}\; f(x_1, x_2(k)) + \frac{c}{2}\,(x_1 - x_1(k))^2, \tag{2.29}
\]
and
\[
x_2(k+1) = \operatorname*{argmin}_{x_2}\; f(x_1(k+1), x_2) + \frac{c}{2}\,(x_2 - x_2(k))^2 \tag{2.30}
\]
for some $c > 0$, possibly time-varying. Problems (2.29)-(2.30) now have unique solutions at all iterations. However, the proximal terms tend to slow down convergence. With the same initialization as above ($x_2(0) = 2.6$, $x_1(1) = 1.2$) and $c = 1$, Figure 2.4 shows the first 30 iterations of (2.29)-(2.30), and Figure 2.5 shows the corresponding cost function values (2.27). We see that optimality is not reached after 30 iterations (recall that, for (2.28), it is attained at the 2nd iterate, with zero cost).
Other approaches. Another option would be to set a systematic rule for selecting minimizers,
whenever there are many. For example, always picking the one with lowest norm. Intuitively, this
[Figure 2.5: Function values $f(x_1(k), x_2(k))$, cf. (2.27), corresponding to the iterates in Figure 2.4.]
should stabilize the iterations, but the implementation of such a rule would substantially complicate the numerical solution of the inner problems (2.18) (also, the theoretical analysis seems very challenging).
In sum, given that oscillations of the iterations for the exact version of Algorithm 2 are rarely
observed in practice (the example above is highly artificial), it is unclear if alternative approaches
that secure strong convergence are worth pursuing, from a practical standpoint. Note that our
inexact version that guarantees strong convergence is both simple to implement and to certify
theoretically.
2.6 Numerical experiments
In this Section we present experimental results that demonstrate the superior performance of our
methods when compared with four state of the art algorithms: Euclidean Distance Matrix (EDM)
completion presented in [6], Semidefinite Program (SDP) relaxation and Edge-based Semidefinite
Program (ESDP) relaxation, both implemented in [3], and a sequential projection method (PM)
in [14] optimizing the same convex underestimator as the present work, with a different algorithm.
The first two methods — EDM completion and SDP relaxation — are centralized, whereas the ESDP relaxation and PM are distributed.
Setup
We conducted simulations with two uniquely localizable geometric networks with sensors distributed in a two-dimensional square of unit area, with 4 anchors in the corners. Network 1, depicted in Figure 2.6, has 10 sensor nodes with an average node degree³ of 4.3, while network 2, shown in Figure 2.7, has 50 sensor nodes and an average node degree of 6.1. The ESDP method was only evaluated on network 1 due to simulation time constraints, since it involves solving an SDP at each node and at each iteration. The noisy range measurements are generated according to
\[
d_{ij} = \big|\, \|x_i^\star - x_j^\star\| + \nu_{ij} \,\big|, \qquad r_{ik} = \big|\, \|x_i^\star - a_k\| + \nu_{ik} \,\big|, \tag{2.31}
\]
where $x_i^\star$ is the true position of node $i$, and $\{\nu_{ij} : i\sim j\in\mathcal{E}\} \cup \{\nu_{ik} : i\in\mathcal{V},\, k\in A_i\}$ are independent Gaussian random variables with zero mean and standard deviation $\sigma$. The accuracy of the algorithms is measured by the original nonconvex cost in (1.1) and by the Root Mean Squared Error (RMSE) per sensor, defined as
\[
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \left( \frac{1}{M} \sum_{m=1}^{M} \big\| x^\star - \hat{x}(m) \big\|^2 \right) }, \tag{2.32}
\]
where $M$ is the number of Monte Carlo trials performed and $\hat{x}(m)$ is the estimate produced in trial $m$.
³ To characterize the networks used, we resort to the concepts of node degree $k_i$, the number of edges connected to node $i$, and average node degree $\langle k\rangle = \frac{1}{n}\sum_{i=1}^{n} k_i$.
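The measurement model (2.31) and the error metric (2.32) are straightforward to reproduce; a sketch with arbitrary values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)
n, M, sigma = 10, 50, 0.05
x_true = rng.random((n, 2))              # true sensor positions in the unit square
anchors = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Noisy ranges per (2.31): absolute value of true distance plus Gaussian noise.
i, j = 0, 1
d_ij = abs(np.linalg.norm(x_true[i] - x_true[j]) + sigma * rng.standard_normal())
r_ik = abs(np.linalg.norm(x_true[i] - anchors[0]) + sigma * rng.standard_normal())

# RMSE per sensor, Equation (2.32), over M Monte Carlo estimates x_hat[m]
# (mock estimates: truth plus small perturbations, for illustration only).
x_hat = x_true[None] + 0.01 * rng.standard_normal((M, n, 2))
rmse = np.sqrt(np.mean(np.sum((x_hat - x_true) ** 2, axis=(1, 2))) / n)
assert d_ij >= 0 and r_ik >= 0
assert rmse < 0.1
```

The absolute value in (2.31) keeps the simulated ranges physically meaningful (non-negative) even at high noise levels.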
[Figure 2.6: Network 1. Topology with 4 anchors and 10 sensors. Anchors are marked with blue squares and sensors with red stars.]
[Figure 2.7: Network 2. Topology with 4 anchors and 50 sensors. Anchors are also marked with blue squares and sensors with red stars.]
[Figure 2.8: Relaxation quality: root mean square error comparison of EDM completion in [6], SDP relaxation in [3], and the disk relaxation (2.5); measurements were perturbed with noise with different values of the standard deviation $\sigma$. The disk relaxation approach in (2.5) improved on the RMSE values of both EDM completion and SDP relaxation for all noise levels, even though it does not rely on the SDP machinery. The performance gap to EDM completion is substantial.]
2.6.1 Assessment of the convex underestimator performance
The first experiment aimed at comparing the performance of the convex underestimator (2.5)
with two other state of the art convexifications. For the proposed disk relaxation (2.5), Algorithm 1
was stopped when the gradient norm $\|\nabla f(x)\|$ reached $10^{-6}$, while both EDM completion and SDP relaxation were solved with the default SeDuMi solver [30] with eps $= 10^{-9}$, so that algorithm properties did not mask the real quality of the relaxations. Figures 2.8 and 2.9 report the results of the experiment with 50 Monte Carlo trials over network 2 and measurement noise with $\sigma \in \{0.01, 0.05, 0.1, 0.3\}$; so, we had a total of 200 runs, equally divided among the 4 noise levels. In
Figure 2.8 we can see that the disk relaxation in (2.5) has better performance for all noise levels.
Figure 2.9 depicts the results of optimizing the three convex functions for the same problems in
RMSE vs. execution time, which reflects, albeit imperfectly, the complexities of the considered
algorithms. The convex surrogate (2.5) used in the present work combined with our methods is
faster by at least one order of magnitude.
We tested all convex relaxations for robustness to sensors outside the convex hull of the anchors
and they all performed worse in such conditions. This type of behavior has been previously noted
by several authors. The noise-free network with 10 nodes and 4 anchors is depicted in Figure 2.6.
Notice that there are some sensors placed near the boundary of the anchors’ convex hull. The
result of 40 Monte Carlo runs is shown in Figure 2.10. This plot is illustrative of the behavior of
the tested algorithms: both Algorithm 1 and the centralized version of [3] are somewhat better at more interior nodes, like 9 and 7, and perform less well at the more peripheral nodes near the boundary of the anchors' convex hull, like 5 and 14.
Figure 2.9: Relaxation quality: comparison of the best achievable root mean square error versus overall execution time of the algorithms. Measurements were contaminated with noise with $\sigma = 0.1$. Although the disk relaxation (2.5) has a distributed implementation, running it sequentially can be faster by one order of magnitude than the centralized methods.
Figure 2.10: Estimates for the location of the sensor nodes, based on 40 Monte Carlo trials, for network 1, shown in Figure 2.6, with measurement noise $\sigma = 0.01$. Red dots express the output of Algorithm 1, blue circles indicate the estimates of the centralized version of [3], and yellow stars represent the EDM completion algorithm in [6].
Figure 2.11: Performance of the proposed method in Algorithm 1 and of the projection method presented in [14]. The stopping criterion for both algorithms was a relative improvement of $10^{-6}$ in the estimate. The proposed method uses fewer communications to achieve better RMSE for the tested noise levels. Our method outperforms the projection method with one fourth the number of communications for a noise level of 0.01.
2.6.2 Performance of distributed optimization algorithms
To measure the performance of the presented Algorithm 1 in a distributed setting we compared
it with the state of the art methods in [14] and the distributed algorithm in [3]. The results
are shown, respectively, in Figures 2.11 and 2.12. The experimental setups were different, since
the authors proposed different stopping criteria for their algorithms and, in order to do a fair
comparison, we ran our algorithm with the specific criterion set by each benchmark method. Also,
to compare with the distributed ESDP method in [3], we had to use a smaller network of 10 sensors
because of simulation time constraints — as the ESDP method entails solving an SDP problem at
each node, the simulation time becomes prohibitively large, at least using a general-purpose solver.
The number of Monte Carlo trials was 32, with 3 noise levels, leading to 96 realizations for each
noisy measurement. In the experiment illustrated in Figure 2.11, the stopping criterion for both
the projection method and the presented method was the relative improvement of the solution; we
stress that this is not a distributed stopping criterion, we adopted it just for the sake of algorithm
comparison. We can see that the proposed method fares better not only in RMSE but, foremost,
in communication cost. The experiment comprised 120 Monte Carlo trials and two noise levels.
Table 2.3: Number of communications per sensor for the results in Fig. 2.12

    ESDP method    Algorithm 1
    21600          2000
From the analysis of both Figure 2.12 and Table 2.3 we can see that the ESDP method is one
order of magnitude worse in RMSE performance, using one order of magnitude more communica-
tions, than Algorithm 1.
Figure 2.12: Performance of the proposed method in Algorithm 1 and of the ESDP method in [3]. The stopping criterion for both algorithms was the number of algorithm iterations. The performance advantage of the proposed method in Algorithm 1 is even more remarkable when considering the number of communications presented in Table 2.3.
Figure 2.13: Final cost of the parallel Algorithm 1 and its asynchronous counterpart in Algorithm 2 with an exact update, for the same number of communications. Results for the asynchronous version degrade less than those of the parallel one as the noise level increases. The stochastic Gauss-Seidel iterations prove to be more robust to intense noise.
2.6.3 Performance of the asynchronous algorithm
A second set of experiments examined the performance of the parallel and asynchronous flavors
of our method, presented respectively in Algorithms 1 and 2, the latter with exact updating. The
metric was the value of the convex cost function f in (2.5) evaluated at each algorithm’s estimate
of the minimum. For fairness, both algorithms were allowed to run until they reached a preset
number of communications. In Figure 2.13 we present the effectiveness of both algorithms in
optimizing the disk relaxation cost (2.5) with the same amount of communications. We chose the random variables $\xi_k$, representing the sequence of updating nodes in the asynchronous version of our method, to be uniformly distributed. Again, we ran 50 Monte Carlo trials, each with 3 noise levels, thus leading to 150 samplings of the noise variables in (2.31).
The more robust behavior of the asynchronous version is a phenomenon empirically observed
in other optimization algorithms, when comparing deterministic and randomized versions. In [31,
Section 6.3.5] the authors prove that, for a fixed-point algorithm with given properties and “with
bounded communication delays, the convergence rate is geometric and, under certain conditions,
it is superior to the convergence rate of the corresponding synchronous iteration”. Also, in [28] the
author states that, for the problem considered in the cited paper, “the randomized order provides
a worst-case performance advantage over the cyclic order". Our numerical experiments suggest similar behavior for our Algorithms 1 and 2, but we do not have further theoretical support for these observations.
2.7 Proofs
2.7.1 Convex envelope
We show that the function in (2.3) is the convex envelope of the function in (2.2). Refer to $\alpha$ as the function in (2.2) and $\beta$ as the function in (2.3). We show that $\alpha^{\star\star} = \beta$, where $f^\star$ denotes the Fenchel conjugate of a function $f$, cf. [20, Cor. 1.3.6, p. 45, v. 2].

We start by computing $\alpha^\star$:
\begin{align*}
\alpha^\star(s) &= \sup_z \; s^\top z - \alpha(z) \\
&= \sup_z \; s^\top z - \Bigl(\tfrac{1}{2}\inf_{\|y\|=d_{ij}} \|z-y\|^2\Bigr) \\
&= \sup_z \sup_{\|y\|=d_{ij}} \; s^\top z - \tfrac{1}{2}\|z-y\|^2 \\
&= \sup_{\|y\|=d_{ij}} \sup_z \; s^\top z - \tfrac{1}{2}\|z-y\|^2 \\
&= \sup_{\|y\|=d_{ij}} \; \tfrac{1}{2}\|s\|^2 + s^\top y \\
&= \tfrac{1}{2}\|s\|^2 + d_{ij}\|s\|.
\end{align*}
Thus, $\alpha^\star$ is the sum of two closed convex functions: $\alpha^\star = g + h$, where $g(s) = \tfrac{1}{2}\|s\|^2$ and $h(s) = d_{ij}\|s\|$. Note that $h(s) = \sigma_{B(0,d_{ij})}(s)$, where $\sigma_C(s) = \sup\{s^\top x : x \in C\}$ denotes the support function of a set $C$. Thus, using [20, Th. 2.3.1, p. 61, v. 2], we have
$$\alpha^{\star\star}(z) = \inf_{z_1+z_2=z} \; g^\star(z_1) + h^\star(z_2).$$
Since $g^\star(z_1) = \tfrac{1}{2}\|z_1\|^2$ [20, Ex. 1.1.3, p. 38, v. 2] and $h^\star(z_2) = i_{B_{ij}}(z_2)$ [20, Ex. 1.1.5, p. 39, v. 2], where $i_C(x) = 0$ if $x \in C$ and $i_C(x) = +\infty$ if $x \notin C$ denotes the indicator of a set $C$, we conclude that
\begin{align*}
\alpha^{\star\star}(z) &= \inf_{z_1+z_2=z} \; \tfrac{1}{2}\|z_1\|^2 + i_{B_{ij}}(z_2) \\
&= \inf_{z_2 \in B_{ij}} \; \tfrac{1}{2}\|z - z_2\|^2 \\
&= \beta(z).
\end{align*}
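As a numerical sanity check of this envelope result (our sketch, not part of the thesis), note that for a single term $\alpha(z) = \tfrac{1}{2}\inf_{\|y\|=d}\|z-y\|^2 = \tfrac{1}{2}(\|z\|-d)^2$ and $\beta(z) = \tfrac{1}{2}\operatorname{dist}(z, B(0,d))^2$; the envelope should lie below $\alpha$ everywhere and agree with it outside the ball:

```python
import numpy as np

def alpha(z, d):
    # 0.5 * squared distance from z to the sphere {y : ||y|| = d}
    return 0.5 * (np.linalg.norm(z) - d) ** 2

def beta(z, d):
    # 0.5 * squared distance from z to the ball {y : ||y|| <= d}
    return 0.5 * max(np.linalg.norm(z) - d, 0.0) ** 2

rng = np.random.default_rng(1)
d = 1.0
for _ in range(1000):
    z = rng.standard_normal(2) * 2
    assert beta(z, d) <= alpha(z, d) + 1e-12          # envelope lies below alpha
    if np.linalg.norm(z) >= d:
        assert abs(beta(z, d) - alpha(z, d)) < 1e-12  # they agree outside the ball
print("envelope checks passed")
```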
2.7.2 Lipschitz constant of $\nabla\phi_{B_{ij}}$

We prove the inequality in (2.8):
$$\bigl\|\nabla\phi_{B_{ij}}(x) - \nabla\phi_{B_{ij}}(y)\bigr\| \le \|x - y\| \tag{2.33}$$
where $\nabla\phi_{B_{ij}}(z) = z - P_{B_{ij}}(z)$, and $P_{B_{ij}}(z)$ is the projector onto $B_{ij} = \{z \in \mathbb{R}^p : \|z\| \le d_{ij}\}$. Squaring both sides of (2.33) gives the equivalent inequality
$$2\,(P(x) - P(y))^\top (x - y) - \|P(x) - P(y)\|^2 \ge 0 \tag{2.34}$$
where, to simplify notation, we let $P(z) := P_{B_{ij}}(z)$. Inequality (2.34) can be rewritten as
$$(P(x) - P(y))^\top (x - y) + (P(x) - P(y))^\top (P(y) - y) + (P(x) - P(y))^\top (x - P(x)) \ge 0. \tag{2.35}$$
By the properties of projectors onto closed convex sets, $(z - P(z))^\top (w - P(z)) \le 0$ for any $w \in B_{ij}$ and any $z$, cf. [20, Th. 3.1.1, p. 117, v. 1]. Thus, the last two terms on the left-hand side of (2.35) are nonnegative. Moreover, the first term is nonnegative due to [20, Prop. 3.1.3, p. 118, v. 1]. Inequality (2.35) is proved.
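The nonexpansiveness just proved can be illustrated numerically; the sketch below (ours, for illustration only) evaluates $\nabla\phi_B(z) = z - P_B(z)$ for the Euclidean ball and checks inequality (2.33) on random pairs of points:

```python
import numpy as np

def grad_phi(z, d):
    # gradient of phi_B(z) = 0.5 * dist(z, B(0,d))^2, i.e. z - P_B(z)
    nz = np.linalg.norm(z)
    proj = z if nz <= d else z * (d / nz)  # projection onto the ball of radius d
    return z - proj

rng = np.random.default_rng(2)
d = 1.0
for _ in range(1000):
    x, y = rng.standard_normal(2) * 3, rng.standard_normal(2) * 3
    lhs = np.linalg.norm(grad_phi(x, d) - grad_phi(y, d))
    assert lhs <= np.linalg.norm(x - y) + 1e-12  # (2.33): the gradient is 1-Lipschitz
print("Lipschitz check passed")
```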
2.7.3 Auxiliary Lemmas
In this section we establish basic properties of Problem (2.6) in Lemma 5, and also two technical lemmas instrumental in proving our convergence results in Theorem 2.

Lemma 5 (Basic properties). Let $f$ be as defined in (2.5). Then the following properties hold:
1. $f$ is coercive;
2. $f^\star \ge 0$ and $\mathcal{X}^\star \ne \emptyset$;
3. $\mathcal{X}^\star$ is compact.
Proof.
1. By Assumption 1 there is a path from each node $i$ to some node $j$ which is connected to an anchor $k$. If $\|x_i\| \to \infty$ there are two cases: (1) there is at least one edge $t \sim u$ along the path from $i$ to $j$ where $\|x_t\| \to \infty$ and $\|x_u\| \not\to \infty$, and so $d^2_{B_{tu}}(x_t - x_u) \to \infty$; (2) $\|x_u\| \to \infty$ for all $u$ in the path between $i$ and $j$; in particular we have $\|x_j\| \to \infty$, and so $d^2_{Ba_{jk}}(x_j) \to \infty$. In both cases $f \to \infty$; thus, $f$ is coercive.
2. Function $f$ defined in (2.5) is a sum of squares; it is a continuous, convex, real-valued function, lower bounded by zero, so the infimum $f^\star$ exists and is non-negative. To prove this infimum is attained and $\mathcal{X}^\star \ne \emptyset$, we consider the set $T = \{x : f(x) \le \alpha\}$; $T$ is a sublevel set of a continuous, coercive function and, thus, it is compact. As $f$ is continuous, by the Weierstrass Theorem the value $p = \inf_{x \in T} f(x)$ is attained; the equality $f^\star = p$ is evident.
3. $\mathcal{X}^\star$ is a sublevel set of a continuous, coercive function and, thus, compact.
Lemma 6. Let $\{x(k)\}_{k\in\mathbb{N}}$ be the sequence of iterates of Algorithm 2, or of Algorithm 2 with the update (2.19), and $\nabla f(x(k))$ be the gradient of function $f$ evaluated at each iterate. Then,
1. $\sum_{k\ge 1} \|\nabla f(x(k))\|^2 < \infty$, a.s.;
2. $\nabla f(x(k)) \to 0$, a.s.
Proof. Let $\mathcal{F}_k = \sigma(x(0), \dots, x(k))$ be the sigma-algebra generated by all the algorithm iterations until time $k$. We are interested in $\mathbb{E}\bigl[f(x(k)) \mid \mathcal{F}_{k-1}\bigr]$, the expected cost value at the $k$th iteration, given the knowledge of the past $k-1$ iterations. Firstly, let us examine the function $\phi : \mathbb{R}^p \to \mathbb{R}$, the slice of $f$ along a coordinate direction, $\phi(y) = f(x_1, \dots, x_{i-1}, y, x_{i+1}, \dots, x_n)$. As $f$ has Lipschitz continuous gradient with constant $L_f$, so will $\phi$: $\|\nabla\phi(y) - \nabla\phi(z)\| \le L_f\|y - z\|$ for all $y$ and $z$, and, thus, it will inherit the property
$$\phi(y) \le \phi(z) + \langle \nabla\phi(z), y - z\rangle + \frac{L_f}{2}\|y - z\|^2. \tag{2.36}$$
Inequality (2.36) is known as the Descent Lemma [32, Prop. A.24]. The minimizer of the quadratic upper bound in (2.36) is $z - \frac{1}{L_f}\nabla\phi(z)$, which can be plugged back into (2.36), obtaining
$$\phi^\star \le \phi\Bigl(z - \frac{1}{L_f}\nabla\phi(z)\Bigr) \le \phi(z) - \frac{1}{2L_f}\|\nabla\phi(z)\|^2. \tag{2.37}$$
In the sequel, for a given $x = (x_1, \dots, x_n)$, we let
$$f^\star_i(x_{-i}) = \inf\{f(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_n) : z\}.$$
Going back to the expectation $\mathbb{E}\bigl[f(x(k)) \mid \mathcal{F}_{k-1}\bigr] = \sum_{i=1}^n P_i f^\star_i(x_{-i}(k-1))$, we can bound it from above, recurring to (2.37), by
\begin{align}
\sum_{i=1}^n P_i \Bigl(f(x(k-1)) - \frac{1}{2L_f}\|\nabla_i f(x(k-1))\|^2\Bigr)
&= f(x(k-1)) - \frac{1}{2L_f}\sum_{i=1}^n P_i \|\nabla_i f(x(k-1))\|^2 \nonumber\\
&\overset{(a)}{\le} f(x(k-1)) - \frac{P_{\min}}{2L_f}\|\nabla f(x(k-1))\|^2, \tag{2.38}
\end{align}
where we used $0 < P_{\min} \le P_i$ for all $i \in \{1, \dots, n\}$ in (a). To alleviate notation, let $g(k) = \nabla f(x(k))$; we then have
$$\|g(k)\|^2 = \sum_{i \le k} \|g(i)\|^2 - \sum_{i \le k-1} \|g(i)\|^2,$$
and adding $\frac{P_{\min}}{2L}\sum_{i \le k-1} \|g(i)\|^2$ to both sides of the inequality in (2.38), we find that
$$\mathbb{E}[Y_k \mid \mathcal{F}_{k-1}] \le Y_{k-1}, \tag{2.39}$$
where $Y_k = f(x(k)) + \frac{P_{\min}}{2L}\sum_{i \le k-1} \|g(i)\|^2$. Inequality (2.39) defines the sequence $\{Y_k\}_{k\in\mathbb{N}}$ as a supermartingale. As $f(x)$ is always non-negative, $Y_k$ is also non-negative and so [33, Corollary 27.1],
$$Y_k \to Y, \quad \text{a.s.}$$
In words, the sequence $Y_k$ converges almost surely to an integrable random variable $Y$. This entails that $\sum_{k\ge 1} \|g(k)\|^2 < \infty$, a.s., and so $g(k) \to 0$, a.s.
The previous arguments show that Lemma 6 holds for Algorithm 2. To show that Lemma 6 also holds for Algorithm 2 with the update (2.19) it suffices to redefine
$$f^\star_i(x_{-i}) := f\Bigl(x_1, \dots, x_i - \frac{1}{L_f}\nabla_i f(x), \dots, x_n\Bigr).$$
As the second inequality in (2.37) shows, we have the bound
$$f^\star_i(x_{-i}(k-1)) \le f(x(k-1)) - \frac{1}{2L_f}\bigl\|\nabla_i f(x(k-1))\bigr\|^2$$
and the rest of the proof holds intact.
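The Descent Lemma bound (2.37), which drives the proof above, can be illustrated on a toy quadratic (our example, not from the thesis): for $\phi(z) = \tfrac{1}{2}z^\top A z - b^\top z$ the gradient is Lipschitz with constant $L = \lambda_{\max}(A)$, and one gradient step of size $1/L$ decreases the cost by at least $\|\nabla\phi(z)\|^2/(2L)$:

```python
import numpy as np

# Toy quadratic phi(z) = 0.5 z^T A z - b^T z, with A symmetric positive definite
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of the gradient
phi = lambda z: 0.5 * z @ A @ z - b @ z
grad = lambda z: A @ z - b
phi_star = phi(np.linalg.solve(A, b))    # unconstrained minimum value

rng = np.random.default_rng(3)
for _ in range(100):
    z = rng.standard_normal(2) * 5
    step = z - grad(z) / L               # minimizer of the quadratic majorizer
    assert phi_star <= phi(step) + 1e-10                                 # first inequality of (2.37)
    assert phi(step) <= phi(z) - np.linalg.norm(grad(z)) ** 2 / (2 * L) + 1e-10  # second inequality
print("descent lemma check passed")
```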
Lemma 7. Let $\{x(k)\}_{k\in\mathbb{N}}$ be one of the sequences generated with probability one according to Lemma 6. Then,
1. the function value decreases to the optimum: $f(x(k)) \downarrow f^\star$;
2. there exists a subsequence of $\{x(k)\}_{k\in\mathbb{N}}$ converging to a point in $\mathcal{X}^\star$: $x(k_l) \to y$, $y \in \mathcal{X}^\star$.
Proof. As $f$ is coercive, the sublevel set $X_f = \{x : f(x) \le f(x(0))\}$ is compact and, because $f(x(k))$ is non-increasing, all elements of $\{x(k)\}_{k\in\mathbb{N}}$ belong to this set. From the compactness of $X_f$ we have that there is a convergent subsequence $x(k_l) \to y$. We evaluate the gradient at this accumulation point, $\nabla f(y) = \lim_{l\to\infty} \nabla f(x(k_l))$, which, by assumption, vanishes, and we therefore conclude that $y$ belongs to the solution set $\mathcal{X}^\star$. Moreover, the function value at this point is, by definition, the optimal value.
2.7.4 Theorems
Equipped with the previous lemmas, we are now ready to prove the Theorems stated in Sec-
tion 2.5.
Proof of Theorem 2. Suppose the distance does not converge to zero. Then there exist an $\varepsilon > 0$ and some subsequence $\{x(k_l)\}_{l\in\mathbb{N}}$ such that $d_{\mathcal{X}^\star}(x(k_l)) > \varepsilon$. But, as $f$ is coercive (by Lemma 5), continuous, and convex, and its gradient vanishes along the iterates (by Lemma 6), Lemma 7 yields a subsequence of $\{x(k_l)\}_{l\in\mathbb{N}}$ converging to a point in $\mathcal{X}^\star$, which is a contradiction.
Proof of Theorem 3. Fix an arbitrary point $x^\star \in \mathcal{X}^\star$. We start by proving that the sequence of squared distances to $x^\star$ of the estimates produced by Algorithm 2, with the update defined in
Equation (2.19), converges almost surely; that is, the sequence $\{\|x(k) - x^\star\|^2\}_{k\in\mathbb{N}}$ is convergent with probability one. We have
$$\mathbb{E}\bigl[\|x(k) - x^\star\|^2 \mid \mathcal{F}_{k-1}\bigr] = \sum_{i=1}^n \frac{1}{n}\Bigl\|x(k-1) - \frac{1}{L_f} g_i(k-1) - x^\star\Bigr\|^2 \tag{2.40}$$
where $g_i(k-1) = (0, \dots, 0, \nabla_i f(x(k-1)), 0, \dots, 0)$ and $\mathcal{F}_k = \sigma(x(1), \dots, x(k))$ is the sigma-algebra generated by all iterates until time $k$. Expanding the right-hand side of (2.40) yields
$$\|x(k-1) - x^\star\|^2 + \frac{1}{nL_f^2}\bigl\|\nabla f(x(k-1))\bigr\|^2 - \frac{2}{nL_f}(x(k-1) - x^\star)^\top \nabla f(x(k-1)).$$
Since $(x(k-1) - x^\star)^\top \nabla f(x(k-1)) = (x(k-1) - x^\star)^\top\bigl(\nabla f(x(k-1)) - \nabla f(x^\star)\bigr) \ge 0$, we conclude that
$$\mathbb{E}\bigl[\|x(k) - x^\star\|^2 \mid \mathcal{F}_{k-1}\bigr] \le \|x(k-1) - x^\star\|^2 + \frac{1}{nL_f^2}\bigl\|\nabla f(x(k-1))\bigr\|^2.$$
Now, as proved in Lemma 6, the sum $\sum_k \|\nabla f(x(k))\|^2$ converges almost surely. Thus, invoking the result in [34], we get that $\|x(k) - x^\star\|^2$ converges almost surely.

We can now invoke the technique at the end of the proof of [28, Prop. 9] to conclude that $x(k)$ converges to some optimal point $x^\star$.
2.8 Summary and further extensions
Experiments in Section 2.6 show that our method is superior to the state of the art in all measured indicators. While the comparison with the projection method published in [14] is favorable to our proposal, it should be further noted that the projection method is of a different nature from ours: it is sequential, and such algorithms will always have a larger computation time than parallel ones, since nodes run in sequence; moreover, this computation time grows with the number of sensors, while parallel methods retain similar speed no matter how many sensors the network has.
When comparing with a distributed and parallel method similar to Algorithm 1, like the ESDP method in [3], our method shows a one-order-of-magnitude improvement in RMSE while using one order of magnitude fewer communications. This score is achieved with a simpler, easy-to-implement algorithm, performing simple computations at each node that are well suited to the kind of hardware commonly found in sensor networks. Also, unlike SDP methods, our proposal works directly with positions, which need not be recovered from an SDP estimate.
There are some important questions not addressed here. For example, it is not clear what influence the number of anchors and their spatial distribution have on the performance of the proposed and state-of-the-art algorithms. Also, an exhaustive study on the impact of varying topologies and numbers of sensors could lead to interesting results. Some preliminary experiments show that all convex relaxations experience some performance degradation when tested for robustness to sensors outside the convex hull of the anchors. This issue has been noted by several authors, but a more exhaustive study exceeds the scope of this thesis.
But with the data presented here one can already grasp the advantages of our fast and easily
implementable distributed method, where the optimality gap of the solution can also be easily
quantified, and which offers two implementation flavours for different localization needs.
2.8.1 Heterogeneous data fusion application
A spin-off of the presented method was developed and already submitted for publication. The
problem in (2.5) can be thought of as a minimization of the squared discrepancy between a data
model and measured data. In this perspective, we envisioned an extension of the present work, by
fusing the range measurement procedure with angle information. This can be done by considering a new edge set $\mathcal{E}_u$ containing pairs of nodes with a measured angle between them, and the squared distance $d^2_{\ell_{tv}}(\cdot)$ to a line $\ell_{tv}$, passing through the origin and defined by the unit vector $u_{tv}$. The problem is, then,
$$\operatorname*{minimize}_{x} \;\; \sum_{i\sim j \in \mathcal{E}} \frac{1}{2\sigma^2} d^2_{B_{ij}}(x_i - x_j) + \sum_{t\sim v \in \mathcal{E}_u} \frac{1}{2\sigma_\ell^2} d^2_{\ell_{tv}}(x_t - x_v) + \sum_i \Biggl(\sum_{k \in \mathcal{A}_i} \frac{1}{2\sigma^2} d^2_{Ba_{ik}}(x_i) + \sum_{k \in \mathcal{A}_{ui}} \frac{1}{2\sigma_\ell^2} d^2_{\ell a_{ik}}(x_i)\Biggr), \tag{2.41}$$
where $\mathcal{A}_{ui}$ is the set of anchors with angle measurements related to node $i$, $\sigma$ is the standard deviation of the Gaussian noise term in (2.31), and $\sigma_\ell$ is the standard deviation of the noise in the angle measurement statistics⁴. The simulation and real data results are very encouraging, and
the possibility of fusing two different types of information on the same minimization offers a new
flexibility to localization.
⁴Note the simplifying assumption that the measured angle follows a Gaussian distribution.
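For illustration only (our sketch, not the thesis' implementation): for a line $\ell$ through the origin with unit direction $u$, the squared-distance term in (2.41) has the closed form $d^2_\ell(x) = \|x - (u^\top x)u\|^2 = \|x\|^2 - (u^\top x)^2$, which the following fragment evaluates:

```python
import numpy as np

def d2_line(x, u):
    # squared distance from x to the line {t*u : t in R}, with u a unit vector
    return float(np.dot(x, x) - np.dot(u, x) ** 2)

u = np.array([1.0, 0.0])  # line along the first coordinate axis
print(d2_line(np.array([3.0, 4.0]), u))  # the offset orthogonal to u is 4, so 16.0
```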
3 Distributed network localization with initialization: Nonconvex procedures
Contents
3.1 Related work . . . 38
3.2 Distributed Majorization-Minimization with quadratic majorizer . . . 39
    3.2.1 Contributions . . . 39
    3.2.2 Problem reformulation . . . 39
    3.2.3 Majorization-Minimization . . . 41
    3.2.4 Distributed sensor network localization . . . 42
    3.2.5 Experimental results . . . 42
    3.2.6 Summary . . . 46
3.3 Majorization-Minimization with convex tight majorizer . . . 47
    3.3.1 Majorization function . . . 47
    3.3.2 Experimental results on majorization function quality . . . 49
    3.3.3 Distributed optimization of the proposed majorizer using ADMM . . . 49
    3.3.4 Experimental setup . . . 58
    3.3.5 Proof of majorization function properties . . . 63
    3.3.6 Proof of Proposition 9 . . . 65
    3.3.7 Proof of (3.31) . . . 65
    3.3.8 Summary . . . 66
3.4 Sensor network localization: a graphical model approach . . . 66
    3.4.1 Uncertainty models . . . 67
    3.4.2 Optimization problem . . . 67
    3.4.3 Combinatorial problem . . . 68
    3.4.4 Related work . . . 69
    3.4.5 Contributions . . . 69
    3.4.6 Algorithms . . . 70
    3.4.7 Experimental results . . . 72
    3.4.8 Summary . . . 75
Now imagine we have some prior knowledge on where our agents are located. This knowledge
can come from either deployment instructions or a previous run of a convexified algorithm, like the
one presented in Chapter 2, or, maybe, from known (or estimated) positions at a previous moment
in time. Imagine you want an accurate estimate of your network configuration, but you need it
fast and with a simple and stable implementation. This Chapter addresses this scenario, providing
results concerning MM with a quadratic majorizer, MM with a tighter majorizer, and a graphical
model approach to the problem. The work described in the next section was partially presented at the 2014 IEEE GlobalSIP conference.
3.1 Related work
As we have seen previously, distributed and maximum-likelihood (thus nonconvex) approaches to the sensor network localization problem are much less common than centralized or relaxation-based approaches, despite this computational paradigm being better suited to the problem at hand. The work in [15] proposes a parallel distributed algorithm. However, the sensor network localization problem adopts a discrepancy function between squared distances which, unlike the ones in maximum likelihood (ML) methods, is known to amplify measurement errors and outliers.
The convergence properties of the algorithm are not studied theoretically. The work in [16] also
considers network localization outside a ML framework. The approach proposed in [16] is not par-
allel, operating sequentially through layers of nodes: neighbors of anchors estimate their positions
and become anchors themselves, making it possible in turn for their neighbors to estimate their
positions, and so on. Position estimation is based on planar geometry-based heuristics. In [17], the
authors propose an algorithm with assured asymptotic convergence, but the solution is computa-
tionally complex since a triangulation set must be calculated, and matrix operations are pervasive.
Furthermore, in order to attain good accuracy, a large number of range measurement rounds must
be acquired, one per iteration of the algorithm, thus increasing energy expenditure. The algorithm
presented in [18] is a nonlinear Gauss-Seidel approach: only one node works at a time and solves a
source localization problem with neighbors playing the role of anchors. The nodes activate sequen-
tially in a round-robin scheme. Thus, the time to complete just one cycle becomes proportional to
the network size. Parallel algorithms, the ones we are interested in both in Chapter 2 and in the present one, avoid this issue altogether, as all nodes operate simultaneously; moreover, adding or deleting a node raises no special synchronization concern. The work presented in [35] puts forward
a two-stage algorithm which is parallel: in a first consensus phase, a Barzilai-Borwein (BB) step
size is calculated, followed by a local gradient computation phase. It is known that BB steps do
not necessarily decrease the objective function; as discussed in [36], an outer globalization scheme
involving line searches is needed to ensure its stability. However, line searches are cumbersome to
implement in a distributed setting and are, in fact, absent in [35]. Further, the algorithm requires
the step size to be computed via consensus, and thus the number of consensus rounds needed is a
parameter to tune.
3.2 Distributed Majorization-Minimization with quadratic majorizer
We propose a simple, stable and distributed algorithm which directly optimizes the nonconvex
maximum likelihood criterion for sensor network localization, with no need to tune any free pa-
rameter. We reformulate the problem to obtain a gradient Lipschitz cost; by shifting to this cost
function we enable a Majorization-Minimization (MM) approach based on quadratic upper bounds
that decouple across nodes; the resulting algorithm happens to be distributed, with all nodes work-
ing in parallel. Our method inherits the stability of MM: each communication cuts down the cost
function. Numerical simulations indicate that the proposed approach tops the performance of state
of the art algorithms, both in accuracy and communication cost.
The algorithm we present has an astonishingly simple implementation which is both parallel and
stable, with no free parameters. In Section 3.4.7 we will compare experimentally the performance
of our method with the distributed, parallel, state of the art method in [35].
3.2.1 Contributions
We tackle the nonconvex problem in (1.1) directly, with a simple and efficient algorithm which:
1. is parallel;
2. does not involve any free parameter definition;
3. is proven not to increase the value of the cost function at each iteration (thus, stable);
4. has better performance in positioning error and cost value than a state of the art method,
while requiring fewer communications.
The first and second claims are addressed in Section 3.2.4, the third in Section 3.4.7.B and the last
one in Section 3.4.7, dedicated to numerical experiments.
3.2.2 Problem reformulation
We can reformulate Problem (1.1) as
$$\operatorname*{minimize}_{x_i,\, y_{ij},\, w_{ik}} \;\; \sum_{i\sim j} \frac{1}{2}\|x_i - x_j - y_{ij}\|^2 + \sum_i \sum_{k \in \mathcal{A}_i} \frac{1}{2}\|x_i - a_k - w_{ik}\|^2 \quad\text{subject to}\quad \|y_{ij}\| = d_{ij}, \;\; \|w_{ik}\| = r_{ik}. \tag{3.1}$$
Figure 3.1: Illustration of the reformulation in (3.1) of Problem (1.1): $d^2_{S_{ij}}(x_i - x_j) = \min_{y_{ij}}\{\|x_i - x_j - y_{ij}\|^2 : \|y_{ij}\| = d_{ij}\}$. The sphere $S_{ij}$ of radius $d_{ij}$ is defined as $\{y \in \mathbb{R}^p : \|y\| = d_{ij}\}$.
This reformulation is illustrated in Figure 3.1. We now rewrite (3.1) as
$$\operatorname*{minimize}_{x_i,\, y_{ij},\, w_{ik}} \;\; \frac{1}{2}\|Ax - y\|^2 + \sum_i \frac{1}{2}\|x_i \otimes \mathbf{1} - \alpha_i - w_i\|^2 \quad\text{subject to}\quad \|y_{ij}\| = d_{ij}, \;\; \|w_{ik}\| = r_{ik}, \tag{3.2}$$
with concatenated vectors $x = (x_i)_{i\in\mathcal{V}}$, $y = (y_{ij})_{i\sim j}$, $\alpha_i = (a_{ik})_{k\in\mathcal{A}_i}$, and $w_i = (w_{ik})_{k\in\mathcal{A}_i}$. In (3.2), the symbol $\mathbf{1}$ stands for the vector of ones. Matrix $A$ is the result of the Kronecker product of the arc-node incidence matrix¹ $C$ with the identity matrix $I_p$: $A = C \otimes I_p$. Problem (3.2) is equivalent to
$$\operatorname*{minimize}_{x_i,\, y_{ij},\, w_{ik}} \;\; \frac{1}{2}\left\|\begin{bmatrix} A & -I & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ w \end{bmatrix}\right\|^2 + \frac{1}{2}\|Ex - \alpha - w\|^2 \quad\text{subject to}\quad \|y_{ij}\| = d_{ij}, \;\; \|w_{ik}\| = r_{ik},$$
where $\alpha = (\alpha_i)_{i\in\mathcal{V}}$, $w = (w_i)_{i\in\mathcal{V}}$, and $E$ is a matrix with zeros and ones, selecting the entries in $\alpha$ and $w$ corresponding to each sensor node. We now collect all the optimization variables in $z = (x, y, w)$ and rewrite our problem as
$$\operatorname*{minimize}_{z} \;\; \frac{1}{2}\bigl\|\begin{bmatrix} A & -I & 0 \end{bmatrix} z\bigr\|^2 + \frac{1}{2}\bigl\|\begin{bmatrix} E & 0 & -I \end{bmatrix} z - \alpha\bigr\|^2 \quad\text{subject to}\quad z \in \mathcal{Z},$$
where
$$\mathcal{Z} = \{z = (x, y, w) : \|y_{ij}\| = d_{ij},\; i \sim j,\;\; \|w_{ik}\| = r_{ik},\; i \in \mathcal{V},\, k \in \mathcal{A}_i\}.$$
Problem (3.2) can be written as
$$\begin{aligned} \operatorname*{minimize}_{z} \;\; & f(z) = \tfrac{1}{2} z^\top M z - b^\top z & \qquad & (3.3)\\ \text{subject to} \;\; & z \in \mathcal{Z}, & & (3.4)\end{aligned}$$
for $M$ and $b$ defined as
$$M = M_1 + M_2, \qquad b = \begin{bmatrix} E^\top \\ 0 \\ -I \end{bmatrix}\alpha, \tag{3.5}$$
$$M_1 = \begin{bmatrix} A^\top \\ -I \\ 0 \end{bmatrix}\begin{bmatrix} A & -I & 0 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} E^\top \\ 0 \\ -I \end{bmatrix}\begin{bmatrix} E & 0 & -I \end{bmatrix}.$$

¹Each edge is arbitrarily assigned a direction by the two incident nodes.
3.2.3 Majorization-Minimization
To solve Problem (3.3) in a distributed way we must deal with the complicating off-diagonal
entries of M that couple the sensors’ variables. We emphasize a simple, but key fact:
Remark 8. The function optimized in Problem (3.3) is quadratic in $z$ and, thus, has a Lipschitz continuous gradient [32], i.e.,
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|,$$
for some $L$ and all $x, y$.
From this property of the function $f$ we can obtain the upper bound (also found in [32]) $f(z) \le f(z^t) + \langle \nabla f(z^t), z - z^t\rangle + \frac{L}{2}\|z - z^t\|^2$, for any point $z^t$, and use it as a majorizer in the Majorization-Minimization framework [37]. This majorizer decouples the variables and allows for a distributed solution. This happens because the quadratic term involves a diagonal matrix and, so, there are no off-diagonal terms to couple the sensors' position variables. Our algorithm is simply
$$z^{t+1} = \operatorname*{argmin}_{z \in \mathcal{Z}} \; f(z^t) + \bigl\langle \nabla f(z^t), z - z^t\bigr\rangle + \frac{L}{2}\bigl\|z - z^t\bigr\|^2. \tag{3.6}$$
The solution of (3.6) is the projected gradient iteration [32]
$$z^{t+1} = P_{\mathcal{Z}}\Bigl(z^t - \frac{1}{L}\nabla f(z^t)\Bigr), \tag{3.7}$$
where $P_{\mathcal{Z}}(z)$ is the projection of the point $z$ onto $\mathcal{Z}$. This projection has a closed-form expression,
$$P_{\mathcal{Z}}(z) = \begin{bmatrix} x \\[2pt] P_{\mathcal{Y}}(y) = \Bigl[\dfrac{y_{ij}}{\|y_{ij}\|}\, d_{ij}\Bigr]_{i\sim j} \\[6pt] P_{\mathcal{W}}(w) = \Bigl[\dfrac{w_{ik}}{\|w_{ik}\|}\, r_{ik}\Bigr]_{i\in\mathcal{V},\, k\in\mathcal{A}_i} \end{bmatrix}.$$
The gradient in (3.7) can be easily computed as the affine function $\nabla f(z) = Mz - b$. See the recent work [38] for interesting convergence properties of the recursion (3.7). In particular, we emphasize that the cost function is nonincreasing per iteration.
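The recursion (3.7) can be sketched generically in Python (a minimal sketch under our own naming; it assumes $M$, $b$, and the index ranges of the sphere-constrained blocks are given):

```python
import numpy as np

def project_sphere(v, radius):
    # closed-form projection of v onto the sphere {y : ||y|| = radius}
    nv = np.linalg.norm(v)
    if nv == 0.0:
        # degenerate case: any point of the sphere is a valid projection
        return np.array([radius] + [0.0] * (len(v) - 1))
    return v * (radius / nv)

def mm_step(z, M, b, L, sphere_blocks):
    # one iteration of (3.7): z+ = P_Z( z - (1/L) * (M z - b) );
    # sphere_blocks is a list of (slice, radius) pairs marking the y_ij and
    # w_ik blocks of z, which are projected back onto their spheres
    z_new = z - (M @ z - b) / L
    for sl, radius in sphere_blocks:
        z_new[sl] = project_sphere(z_new[sl], radius)
    return z_new
```

The position block of $z$ is left untouched, matching the closed form of $P_{\mathcal{Z}}$ above.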
We now compute a Lipschitz constant $L$ for the gradient of the quadratic function in Problem (3.3), such that it is easy to estimate in a distributed way:
$$\begin{aligned} L = \lambda_{\max}(M) &\le \lambda_{\max}(M_1) + \lambda_{\max}(M_2) \\ &= \lambda_{\max}\bigl(AA^\top + I\bigr) + \lambda_{\max}\bigl(EE^\top + I\bigr) \\ &\le \lambda_{\max}\bigl(A^\top A\bigr) + \lambda_{\max}\bigl(EE^\top\bigr) + 2 \\ &\le 2\delta_{\max} + \max_{i\in\mathcal{V}} |\mathcal{A}_i| + 2, \end{aligned} \tag{3.8}$$
where $\lambda_{\max}$ denotes the largest eigenvalue, $|\mathcal{A}|$ is the cardinality of the set $\mathcal{A}$, and $\delta_{\max}$ is the maximum node degree of the network. We note that $\lambda_{\max}(A^\top A)$ is the maximum eigenvalue of the Laplacian matrix of the graph $\mathcal{G}$; the proof that it is upper-bounded by $2\delta_{\max}$ can be found in [21] and was discussed in Section 2.4.1. This Lipschitz constant can be computed in a distributed way by, e.g., a diffusion algorithm (cf. [23, Ch. 9]).
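As an illustration, the final bound in (3.8) depends only on simple local quantities; the sketch below (ours, with hypothetical names) evaluates it from the neighborhood and anchor lists:

```python
def lipschitz_bound(neighbors, anchors):
    # L <= 2*delta_max + max_i |A_i| + 2, as in (3.8)
    # neighbors: dict mapping a node to the list of its neighboring nodes
    # anchors:   dict mapping a node to the list of anchors it hears
    delta_max = max(len(v) for v in neighbors.values())  # maximum node degree
    a_max = max((len(v) for v in anchors.values()), default=0)
    return 2 * delta_max + a_max + 2

# toy 4-node path graph; node 0 hears two anchors
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ancs = {0: ["a1", "a2"], 1: [], 2: [], 3: []}
print(lipschitz_bound(nbrs, ancs))  # 2*2 + 2 + 2 = 8
```

In a real deployment these two maxima would be obtained by the diffusion (max-consensus) procedure mentioned above rather than by a global pass.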
Algorithm 4 Distributed nonconvex localization algorithm
Input: $x^0$; $L$; $\{d_{ij} : j \in N_i\}$; $\{r_{ik} : k \in \mathcal{A}_i\}$
Output: $\hat{x}$
1: set $y^0_{ij} = P_{Y_{ij}}(x^0_i - x^0_j)$, $Y_{ij} = \{y : \|y\| = d_{ij}\}$, and $w^0_{ik} = P_{W_{ik}}(x^0_i - a_k)$, $W_{ik} = \{w : \|w\| = r_{ik}\}$
2: $t = 0$
3: while some stopping criterion is not met, each node $i$ do
4:   $x^{t+1}_i = b_i x^t_i + \frac{1}{L}\sum_{j\in N_i}\bigl(x^t_j + C_{(i\sim j,i)}\, y^t_{ij}\bigr) + \frac{1}{L}\sum_{k\in\mathcal{A}_i}\bigl(w^t_{ik} + a_{ik}\bigr)$
5:   for all neighboring $j$, compute $y^{t+1}_{ij} = P_{Y_{ij}}\bigl(\frac{L-1}{L}\, y^t_{ij} + \frac{1}{L}\, C_{(i\sim j,i)}(x^t_i - x^t_j)\bigr)$
6:   for each of the connected anchors $k \in \mathcal{A}_i$, compute $w^{t+1}_{ik} = P_{W_{ik}}\bigl(\frac{L-1}{L}\, w^t_{ik} + \frac{1}{L}(x^t_i - a_{ik})\bigr)$
7:   broadcast $x^{t+1}_i$ to the neighbors
8:   $t = t + 1$
9: end while
10: return $\hat{x}_i = x^t_i$
3.2.4 Distributed sensor network localization
At this point, the recursion in Eq. (3.7) is already distributed, as detailed below. From (3.7) we will obtain the update rules for the variables x, y, and w. For this we write matrix M as
\[
M = \begin{bmatrix} A^\top A + E^\top E & -A^\top & -E^\top \\ -A & I & 0 \\ -E & 0 & I \end{bmatrix}, \tag{3.9}
\]
and denote $B = A^\top A + E^\top E$. Then, each block of z is updated according to
\[
x^{t+1} = \Big(I - \frac{1}{L}B\Big)x^t + \frac{1}{L}A^\top y^t + \frac{1}{L}E^\top (w^t + \alpha), \tag{3.10}
\]
\[
y^{t+1} = P_{\mathcal{Y}}\Big(\frac{L-1}{L}\,y^t + \frac{1}{L}\,Ax^t\Big), \tag{3.11}
\]
\[
w^{t+1} = P_{\mathcal{W}}\Big(\frac{L-1}{L}\,w^t + \frac{1}{L}\,Ex^t - \frac{\alpha}{L}\Big), \tag{3.12}
\]
where $\mathcal{Y}$ and $\mathcal{W}$ are the constraint sets associated with the acquired measurements between sensors, and between anchors and sensors, respectively, and $N_i$ is the set of the neighbors of node i. We observe that each block of $z = (x, y, w)$ at iteration t + 1 needs only local neighborhood information, as clarified in Algorithm 4. Each node i updates the current estimate of its own position, each of the $y_{ij}$ for all incident edges $i \sim j$, and the anchor terms $w_{ik}$, if any. The symbol $C_{(i\sim j,i)}$ denotes the arc-node incidence matrix entry relative to edge $i \sim j$ (row index) and node i (column index). The constant in step 4 of Algorithm 4 is defined as $b_i = \frac{L - \delta_i - |\mathcal{A}_i|}{L}$.
3.2.5 Experimental results
We present numerical experiments to ascertain the performance of the proposed Algorithm 4,
both in accuracy and in communication cost. For a fized graph, accuracy will be measured in
1) mean positioning error defined as
MPE =1
M
M∑m=1
n∑i=1
‖xi(m)− x?i ‖, (3.13)
Table 3.1: Mean positioning error, with measurement noise

σ      Proposed method   BB method
0.01   0.0053            0.0059
0.05   0.0143            0.0154
0.10   0.0210            0.0221
where M is the total number of Monte Carlo trials, $x_i(m)$ is the estimate generated by an algorithm at Monte Carlo trial m, and $x^\star_i$ is the true position of node i; and 2) by evaluating the cost function in (1.1), averaged over the Monte Carlo trials, as in (3.65).

In the previous Chapter we used RMSE as a performance measure. Both MPE and RMSE characterize the localization error, albeit in different ways: RMSE penalizes larger discrepancies more heavily, while MPE weights outliers less. Since we are dealing with the nonconvex Maximum Likelihood cost directly, a discrepancy in the estimate can be due to the presence of measurement noise (which shifts the cost minimum), to possible ambiguities in the network, but also to the existence of different local minima attracting the algorithm under evaluation. The fact that an algorithm converged to a local minimum different from the global optimum should be penalized. Nevertheless, the resulting individual distance $\|x_i(m) - x^\star_i\|$ can be large, and may not adequately represent the overall performance of the algorithm in question.
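For concreteness, the two error measures can be computed as below (our own code; `est[m][i]` holds the estimate of node i at trial m, and the RMSE normalization follows the same 1/M convention as (3.13), which may differ from the previous Chapter's exact definition):

```python
import numpy as np

def mpe(est, true_pos):
    """Mean positioning error (3.13): per-node errors summed, averaged over trials."""
    return sum(np.linalg.norm(e_i - t_i)
               for e in est for e_i, t_i in zip(e, true_pos)) / len(est)

def rmse(est, true_pos):
    """Root mean squared error: squaring penalizes large discrepancies more."""
    sq = sum(np.linalg.norm(e_i - t_i) ** 2
             for e in est for e_i, t_i in zip(e, true_pos))
    return float(np.sqrt(sq / len(est)))
```

A single large per-node error inflates the RMSE much more than the MPE, which is exactly the distinction drawn above.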
Communication cost will be measured taking into account that each iteration in Algorithm 4
involves communicating pn real numbers. We will compare the performance of the proposed method
with the Barzilai-Borwein algorithm in [35], whose communication cost per iteration is n(2T + p),
where T is the number of consensus rounds needed to estimate the Barzilai-Borwein step size.
We use T = 20 as in [35]. The setup for the experiments is a geometric network with 50 sensors
randomly distributed in the two-dimensional square [0, 1] × [0, 1], with average node degree of
about 6, and 4 anchors placed at the vertices of this square. The network remains fixed during all
the Monte Carlo trials. Both algorithms are initialized by a convex approximation method. The
initialization will hopefully hand the nonconvex refinement algorithms a point near the basin of
attraction of the true minimum. For this purpose we generate noisy range measurements according to $d_{ij} = |\,\|x^\star_i - x^\star_j\| + \nu_{ij}|$ and $r_{ik} = |\,\|x^\star_i - a_k\| + \eta_{ik}|$, where $\{\nu_{ij} : i \sim j \in \mathcal{E}\} \cup \{\eta_{ik} : i \in \mathcal{V}, k \in \mathcal{A}_i\}$ are independent Gaussian random variables with zero mean and standard deviation σ. We conducted 100 Monte Carlo trials for each standard deviation σ ∈ {0.01, 0.05, 0.1}. If we spread the sensors over a square area with 1 km sides, this means measurements are affected by noise with standard deviations of 10 m, 50 m, and 100 m. In terms of mean positioning error, the proposed
algorithm fares better than the benchmark: Table 3.1 shows the mean error defined in (3.13) after the algorithms have stabilized, or reached a maximum iteration number. In the simulated
setup, we improve the accuracy of the gradient descent with Barzilai-Borwein steps by about 1m
per sensor, even for high power noise. Figure 3.2 depicts the averaged evolution of the error per
sensor of both algorithms as a function of the volume of accumulated communications, and also
[Figure 3.2(a): MPE per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(a) The proposed method improves on the comparing algorithm, both in accuracy and communication cost. Our proposed method improves on the state-of-the-art method in [35] by about 60 cm in mean positioning error per sensor, delivering a consistent and stable progression of the error of the estimates.
[Figure 3.2(b): average cost per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(b) The final costs are, for the BB method, 1.7392 × 10⁻⁴ and, for the proposed method, 1.5698 × 10⁻⁴: a small difference in cost that translates into a considerable distance in error, as depicted in Figure 3.2(a) and Table 3.1.
Figure 3.2: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark. Noisy distance measurements with σ = 0.01, representing 10 m for a square with 1 km sides. The proposed method shows a faster and smoother progression, while the benchmark bounces, always above the proposed method.
the evolution of the cost for low-power noise. Gradient descent with Barzilai-Borwein steps shows an irregular pattern for the error, only vaguely matching the variation in the corresponding cost (Figure 3.2(b)), thus leaving some uncertainty regarding when to stop the algorithm and which estimate to keep. The presented method reaches the final cost value per sensor much faster and more steadily than the benchmark for medium-low measurement noise. In fact, our method takes about one order of magnitude fewer communications than the benchmark to approach the minimum cost value (compare the cost at about 1500 communications with 15000 for the benchmark). The most realistic case of medium noise power led to the results presented in Figure 3.3. The characteristically irregular BB method again fails to deliver better solutions on average than our
[Figure 3.3(a): MPE per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(a) For medium noise power the comparison of the algorithms' performance follows the one under low noise power. The accuracy gain is more than 1 m per sensor.
[Figure 3.3(b): average cost per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(b) Under medium noise the proposed method also reaches a smaller value for the average cost per sensor: 0.0031, vs. 0.0032 for the BB method.
Figure 3.3: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under medium-power noise. Distance measurements contaminated with noise, with σ = 0.05, representing 50 m for a square with 1 km sides. The proposed method continues to outperform the benchmark, and evolves much more predictably than the BB method.
stable, guaranteed method. The error curves in Figure 3.3(a) are increasing because the error is
not the quantity being directly optimized and the medium-high noise power in measurement data
shifts the optimal points of the cost function relative to the nominal positions. Under high noise
power, our method tops the performance of the benchmark in cost function terms, as shown in
Figure 3.4(b), not only in terms of convergence speed, but also in the final value reached. Again,
our method expends almost one order of magnitude fewer communications to achieve its plateau,
which is itself, on average, better than the alternative method (compare the performance at 700
communications with the one at 7000 for the benchmark).
[Figure 3.4(a): MPE per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(a) The proposed algorithm tops the benchmark in error, under high noise power, by more than 1 m, when considering a square deployment area with 1 km sides.
[Figure 3.4(b): average cost per sensor vs. communications per sensor, for gradient descent with BB steps and the proposed method.]

(b) Under heavy noise the proposed method reaches a smaller value for the average cost per sensor: 0.0096, vs. 0.0099 for the BB method.
Figure 3.4: Comparison of the evolution of cost and average error per sensor with communications, for Algorithm 4 and the benchmark, under high-power noise. Distance measurements contaminated with noise, with σ = 0.1, representing 100 m for a square with 1 km sides.
3.2.6 Summary
The monotonicity of the proposed method is a strong feature for applications of sensor network
localization. Our method proves to be not only fast and resilient, but also simple to implement
and deploy, with no free parameters to tune. The steady accuracy gain over the competing method
also makes it usable in contexts of different noise powers. The presented method can be useful
both as a refinement algorithm and as a tracking method, e.g., for mobile robot formations where
position estimates computed on a given time step are used as initialization for the next one.
An asynchronous flavor of the algorithm would be, as far as we know, restricted to a broadcast
gossip scheme, following a block-coordinate descent model. This line of research is in progress.
3.3 Majorization-Minimization with convex tight majorizer
A quadratic majorizer such as the one used in Section 3.2 is a common choice in the MM framework. As one would expect, preliminary simulation results show that using a tighter majorizer improves localization performance. In the following Sections we describe a particularly tight convex majorization function and point out some directions of research towards a distributed method to optimize it.
3.3.1 Majorization function
Commonly, MM techniques resort to quadratic majorizers which, albeit easy to minimize, show
a considerable mismatch with most cost functions (in particular, with f in (1.1)). To overcome
this problem, we introduce a key novel majorizer. It is specifically adapted to f , tighter than a
quadratic, convex, and easily optimizable.
Before proceeding it is useful to rewrite (1.1) as
\[
f(x) = \sum_{i\sim j} f_{ij}(x_i, x_j) + \sum_{i} \sum_{k\in\mathcal{A}_i} f_{ik}(x_i),
\]
where $f_{ij}(x_i, x_j) = \phi_{d_{ij}}(x_i - x_j)$ and $f_{ik}(x_i) = \phi_{r_{ik}}(x_i - a_k)$, both defined in terms of the basic building block
\[
\phi_d(u) = (\|u\| - d)^2. \tag{3.14}
\]
3.3.1.A Majorization function for (3.14)
Let $v \in \mathbb{R}^p$ be given, assumed nonzero. We provide a majorizer $\Phi_d(\cdot \mid v)$ for $\phi_d$ in (3.14) which is tight at v, i.e., $\phi_d(u) \le \Phi_d(u \mid v)$ for all u and $\phi_d(v) = \Phi_d(v \mid v)$.

Proposition 9. Let
\[
\Phi_d(u \mid v) = \max\left\{ g_d(u),\; h_d\!\left(\frac{v^\top u}{\|v\|} - d\right) \right\}, \tag{3.15}
\]
where
\[
g_d(u) = (\|u\| - d)_+^2, \tag{3.16}
\]
$(r)_+^2 = (\max\{0, r\})^2$, and
\[
h_R(r) = \begin{cases} 2R|r| - R^2 & \text{if } |r| \ge R \\ r^2 & \text{if } |r| < R \end{cases} \tag{3.17}
\]
is the Huber function of parameter R. Then, the function $\Phi_d(\cdot \mid v)$ is convex, is tight at v, and majorizes $\phi_d$.
Proof. See Section 3.3.6.
Further, we propose the following Conjecture:
Conjecture 10. The majorizer $\Phi_d(\cdot \mid v)$ in (3.15) is the tightest convex majorizer of the nonconvex function $\phi_d(\cdot)$ in (3.14), i.e., for all convex functions $\psi : \mathbb{R}^n \to \mathbb{R}$ such that $\phi_d(x) \le \psi(x) \le \Phi_d(x \mid v)$ for all $x \in \mathbb{R}^n$, we have $\psi(x) = \Phi_d(x \mid v)$.
[Figure 3.5: one-dimensional plot of the cost φd(u), the quadratic majorizer Qd(u | v), and the proposed majorizer Φd(u | v).]

Figure 3.5: Nonconvex cost function (black, dash-dot) in (3.14) against the proposed majorizer (red, solid) in (3.15) and a vanilla quadratic majorizer (blue, dashed) in (3.18), for d = 0.5 and v = 0.1. The proposed convex majorizer is a much more accurate approximation.
The tightness of the proposed majorization function is illustrated in Figure 3.5, in which we depict, for a one-dimensional argument u, d = 0.5, and v = 0.1, the nonconvex cost function in (3.14), the proposed majorizer in (3.15), and a quadratic majorizer
\[
Q_d(u \mid v) = \|u\|^2 + d^2 - 2d\,\frac{v^\top u}{\|v\|}, \tag{3.18}
\]
obtained through routine manipulations of (3.14), e.g., expanding the square and linearizing $\|u\|$ at v, which is common in MM approaches (cf. [5, 6] for quadratic majorizers applied to the sensor network localization problem and [39] for an application in robust MDS). Clearly, the proposed convex majorizer is a better approximation to the nonconvex cost function². As an expected corollary, it also outperforms the quadratic majorizer in accuracy when embedded in the MM framework, as shown in the experimental results of Section 3.3.2. The proof of Conjecture 10 is being addressed.
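As a numerical sanity check on these claims (our own illustration, not thesis code), the snippet below implements $\phi_d$, $g_d$, $h_R$, $\Phi_d$, and $Q_d$ and verifies, on the setting of Figure 3.5, that both functions majorize $\phi_d$, that both are tight at v, and that the proposed $\Phi_d$ never lies above $Q_d$ on the sampled grid.

```python
import numpy as np

# Sanity check (our own illustration) of the majorizers in the setting of
# Figure 3.5: phi_d from (3.14), g_d from (3.16), the Huber function from
# (3.17), Phi_d from (3.15), and the quadratic Q_d from (3.18).

def phi(u, d):
    return (np.linalg.norm(u) - d) ** 2

def g(u, d):
    return max(0.0, np.linalg.norm(u) - d) ** 2

def huber(r, R):
    return 2 * R * abs(r) - R ** 2 if abs(r) >= R else r ** 2

def Phi(u, v, d):
    return max(g(u, d), huber(v @ u / np.linalg.norm(v) - d, d))

def Q(u, v, d):
    return u @ u + d ** 2 - 2 * d * (v @ u) / np.linalg.norm(v)

d, v = 0.5, np.array([0.1])
assert np.isclose(Phi(v, v, d), phi(v, d))        # Phi_d is tight at v
assert np.isclose(Q(v, v, d), phi(v, d))          # Q_d is tight at v as well
for t in np.linspace(-3.0, 1.0, 81):
    u = np.array([t])
    assert Phi(u, v, d) >= phi(u, d) - 1e-12      # both majorize phi_d ...
    assert Q(u, v, d) >= Phi(u, v, d) - 1e-12     # ... and Phi_d sits below Q_d on this grid
```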
3.3.1.B Majorization function for the sensor network localization problem
We remind the reader about the maximum likelihood estimation problem for sensor network
localization defined in (1.1).
²The fact that both majorizers have a coincident minimum is an artifact of this toy example, and does not hold in general.
Now, for given x[l], consider the function
\[
F(x \mid x[l]) = \sum_{i\sim j} F_{ij}(x_i, x_j) + \sum_i \sum_{k\in\mathcal{A}_i} F_{ik}(x_i), \tag{3.19}
\]
where
\[
F_{ij}(x_i, x_j) = \Phi_{d_{ij}}\big(x_i - x_j \mid x_i[l] - x_j[l]\big) \tag{3.20}
\]
and
\[
F_{ik}(x_i) = \Phi_{r_{ik}}\big(x_i - a_k \mid x_i[l] - a_k\big). \tag{3.21}
\]
Given Proposition 9, it is clear that $F(\cdot \mid x[l])$ majorizes f and is tight at x[l]. Moreover, it is convex as a sum of convex functions.
3.3.2 Experimental results on majorization function quality
To initialize the algorithms we take the true sensor positions $x^\star = \{x^\star_i : i \in \mathcal{V}\}$ and perturb them by adding independent zero-mean Gaussian noise, according to
\[
x_i[0] = x^\star_i + \eta_i, \tag{3.22}
\]
where $\eta_i \sim \mathcal{N}(0, \sigma^2_{\mathrm{init}} I_p)$ and $I_p$ is the identity matrix of size p × p. The parameter $\sigma_{\mathrm{init}}$ is detailed ahead.
We compare the performance of our proposed majorizer in (3.19) with a standard one built out of quadratic functions, e.g., the one used in [6]. We ran two MM algorithms, each associated with one of the majorization functions, on a simple source localization problem with one sensor and 4 anchors, for a fixed number of 30 iterations. At each Monte Carlo trial, the true sensor positions were corrupted by zero-mean Gaussian noise, as in (3.22), with standard deviation σinit ∈ [0.01, 1]. The range measurements are taken to be noiseless, i.e., σ = 0 in (3.64),
in order to create an idealized scenario for direct comparison of the two approaches. The evolution
of RMSE as a function of initialization noise intensity is illustrated in Figure 3.6. There is a clear
advantage of using this majorization function when the initialization is within a radius of the true
location which is 30% of the square size.
3.3.3 Distributed optimization of the proposed majorizer using ADMM
At the l-th iteration of the nonconvex minimization algorithm, the convex function in (3.19)
must be minimized. We now show how this optimization problem can be solved collaboratively by
the network in a distributed, parallel manner.
We propose a first distributed algorithm to tackle problem (1.1). Starting from an initializa-
tion x[0] for the unknown sensors’ positions x, it generates a sequence of iterates (x[l])l≥1 which,
hopefully, converges to a solution of (1.1). We apply the majorization minimization (MM) frame-
work [37] to (1.1): at each iteration l, we minimize (3.19), a majorizer of f , tight at the current
[Figure 3.6: log-log plot of RMSE vs. initialization noise intensity σinit, for the proposed majorizer F and the quadratic Q.]

Figure 3.6: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noiseless: σ = 0 in (3.64). Anchors are at the unit square corners. The proposed majorizer (red, solid) outperforms the quadratic majorizer (blue, dashed) in accuracy.
Algorithm 5 Minimization of the tight majorizer in (3.19)
Input: x[0]
Output: x[L]
1: for l = 0 to L − 1 do
2:   $x[l+1] = \operatorname*{argmin}_x F(x \mid x[l])$
3: end for
4: return x[L]
iterate x[l], to obtain the next iterate x[l+ 1]. The algorithm is outlined in Algorithm 5 for a fixed
number of iterations L. Here, F (· |x[l]) denotes a majorizer of f (i.e., f(x) ≤ F (x |x[l]) for all
x) which is tight at x[l] (i.e., f(x[l]) = F (x[l] |x[l])). The majorizer is detailed in Section 3.3.1.B.
Note that f(x[l+1]) ≤ f(x[l]), that is, f is monotonically decreasing along iterations, an important
property of the MM framework.
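As a toy illustration of this monotonicity (our own example, not the thesis's algorithm), consider MM on the 1-D building block $\phi_d$ of (3.14) with the quadratic majorizer $Q_d$ of (3.18), whose minimizer at v is $d \cdot \mathrm{sign}(v)$:

```python
import math

# Toy MM loop in the spirit of Algorithm 5, on the 1-D building block
# f(u) = (|u| - d)^2, using the quadratic majorizer Q_d(u | v) of (3.18);
# in 1-D its minimizer at v is u = d * sign(v). Illustration only.

def mm_1d(v0, d, iters=5):
    f = lambda u: (abs(u) - d) ** 2
    x = v0
    costs = [f(x)]
    for _ in range(iters):
        x = d * math.copysign(1.0, x)   # argmin of the majorizer tight at the current x
        costs.append(f(x))
    return x, costs

x, costs = mm_1d(0.1, 0.5)
# costs is nonincreasing, illustrating f(x[l+1]) <= f(x[l])
```

The cost sequence never increases, which is the MM guarantee exploited throughout this Section.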
Algorithm 5 is a distributed algorithm because, as we shall see, the minimization of the upper-
bounds F can be achieved in a distributed manner.
3.3.3.A Problem reformulation
In the distributed algorithm, the working node will operate on local copies of the estimated
positions of its neighbors and of itself. So, it is convenient to introduce new variables. Let
Vi = j : j ∼ i denote the neighbors of sensor i. We also define the closed neighborhood
50
3.3 Majorization-Minimization with convex tight majorizer
Vi = Vi ∪ i. For each i, we duplicate xi into new variables yji, j ∈ Vi, and zik, k ∈ Ai. This
choice of notation is not fortuitous: the first subscript reveals which physical node will store the variable in our proposed implementation; thus, $x_i$ and $z_{ik}$ are stored at node i, whereas $y_{ji}$ is managed by node j. We write the minimization of (3.19) as the optimization problem
\[
\begin{aligned}
&\text{minimize} && F(y, z) \\
&\text{subject to} && y_{ji} = x_i, \quad j \in \bar{V}_i \\
& && z_{ik} = x_i, \quad k \in \mathcal{A}_i,
\end{aligned}
\tag{3.23}
\]
where $y = \{y_{ji} : i \in \mathcal{V},\, j \in \bar{V}_i\}$, $z = \{z_{ik} : i \in \mathcal{V},\, k \in \mathcal{A}_i\}$, and
\[
F(y, z) = \sum_{i\sim j}\big(F_{ij}(y_{ii}, y_{ij}) + F_{ij}(y_{ji}, y_{jj})\big) + 2\sum_i \sum_{k\in\mathcal{A}_i} F_{ik}(z_{ik}). \tag{3.24}
\]
In passing from (3.19) to (3.23) we used the identity $F_{ij}(x_i, x_j) = \frac{1}{2}F_{ij}(y_{ii}, y_{ij}) + \frac{1}{2}F_{ij}(y_{ji}, y_{jj})$, due to $y_{ji} = x_i$. Also, for convenience, we have rescaled the objective by a factor of two.
3.3.3.B Algorithm derivation
Problem (3.23) is in the form
\[
\begin{aligned}
&\text{minimize} && F(y, z) + G(x) \\
&\text{subject to} && A(y, z) + Bx = 0,
\end{aligned}
\tag{3.25}
\]
where F is the convex function in (3.24), G is the identically zero function, A is the identity operator, and B is a matrix whose rows belong to the set $\{-e_i^\top : i \in \mathcal{V}\}$, where $e_i$ is the ith column of the identity matrix of size $|\mathcal{V}|$. For a connected network, B is full column rank, so the problem is suited for the Alternating Direction Method of Multipliers (ADMM). See [40] and references therein for more details on this method. See also [41–46] for applications of ADMM in distributed optimization settings.
Let $\lambda_{ji}$ be the Lagrange multiplier associated with the constraint $y_{ji} = x_i$ and $\lambda = \{\lambda_{ji}\}$ the collection of all such multipliers. Similarly, let $\mu_{ik}$ be the Lagrange multiplier associated with the constraint $z_{ik} = x_i$ and $\mu = \{\mu_{ik}\}$. The ADMM framework generates a sequence $(y(t), z(t), x(t), \lambda(t), \mu(t))_{t\ge 1}$ such that
\[
(y(t+1), z(t+1)) = \operatorname*{argmin}_{y,z}\; L_\rho\big(y, z, x(t), \lambda(t), \mu(t)\big) \tag{3.26}
\]
\[
x(t+1) = \operatorname*{argmin}_{x}\; L_\rho\big(y(t+1), z(t+1), x, \lambda(t), \mu(t)\big) \tag{3.27}
\]
\[
\lambda_{ji}(t+1) = \lambda_{ji}(t) + \rho\,\big(y_{ji}(t+1) - x_i(t+1)\big) \tag{3.28}
\]
\[
\mu_{ik}(t+1) = \mu_{ik}(t) + \rho\,\big(z_{ik}(t+1) - x_i(t+1)\big), \tag{3.29}
\]
where $L_\rho$ is the augmented Lagrangian defined as
\[
L_\rho(y, z, x, \lambda, \mu) = F(y, z) + \sum_i \sum_{j\in\bar{V}_i} \Big(\lambda_{ji}^\top (y_{ji} - x_i) + \frac{\rho}{2}\,\|y_{ji} - x_i\|^2\Big) + \sum_i \sum_{k\in\mathcal{A}_i} \Big(\mu_{ik}^\top (z_{ik} - x_i) + \frac{\rho}{2}\,\|z_{ik} - x_i\|^2\Big). \tag{3.30}
\]
Here, ρ > 0 is a pre-chosen constant.
In our implementation, we let node i store the variables $x_i$, $y_{ij}$, $\lambda_{ij}$, $\lambda_{ji}$, for $j \in \bar{V}_i$, and $z_{ik}$, $\mu_{ik}$, for $k \in \mathcal{A}_i$. Note that a copy of $\lambda_{ij}$ is maintained at both nodes i and j (this is to avoid extra communication steps). For t = 0, we can set λ(0) and µ(0) to a pre-chosen constant (e.g., zero) at
all nodes. Also, we assume that, at the beginning of the iterations (i.e., for t = 0), node i knows $x_j(0)$ for $j \in V_i$ (this can be accomplished, e.g., by having each node i communicate $x_i(0)$ to all its neighbors). This property will be preserved for all t ≥ 1 in our algorithm, via communication
steps.
We now show that the minimizations in (3.26) and (3.27) can be implemented in a distributed
manner and require low computational cost at each node.
3.3.3.C ADMM: Solving Problem (3.26)
As shown in Section 3.3.7, the augmented Lagrangian in (3.30) can be written as
\[
L_\rho(y, z, x, \lambda, \mu) = \sum_i \left( \sum_{j\in\bar{V}_i} L_{ij}\big(y_{ii}, y_{ij}, x_j, \lambda_{ij}\big) + \sum_{k\in\mathcal{A}_i} L_{ik}\big(z_{ik}, x_i, \mu_{ik}\big) \right), \tag{3.31}
\]
where
\[
L_{ij}\big(y_{ii}, y_{ij}, x_j, \lambda_{ij}\big) = F_{ij}(y_{ii}, y_{ij}) + \lambda_{ij}^\top (y_{ij} - x_j) + \frac{\rho}{2}\,\|y_{ij} - x_j\|^2
\]
and
\[
L_{ik}\big(z_{ik}, x_i, \mu_{ik}\big) = 2F_{ik}(z_{ik}) + \mu_{ik}^\top (z_{ik} - x_i) + \frac{\rho}{2}\,\|z_{ik} - x_i\|^2.
\]
In (3.31) we let $F_{ii} \equiv 0$. It is clear from (3.31) that Problem (3.26) decouples across sensors $i \in \mathcal{V}$, since we are optimizing only over y and z. Further, at each sensor i, it decouples into two types of subproblems: one involving the variables $y_{ij}$, $j \in \bar{V}_i$, given by
\[
\underset{y_{ij},\, j\in\bar{V}_i}{\text{minimize}} \;\; \sum_{j\in\bar{V}_i} L_{ij}\big(y_{ii}, y_{ij}, x_j, \lambda_{ij}\big), \tag{3.32}
\]
and into $|\mathcal{A}_i|$ subproblems of the form
\[
\underset{z_{ik}}{\text{minimize}} \;\; L_{ik}\big(z_{ik}, x_i, \mu_{ik}\big), \tag{3.33}
\]
involving the variable $z_{ik}$, $k \in \mathcal{A}_i$.
involving the variable zik, k ∈ Ai. Note that problems related with anchors are simpler, and, since
there are usually few anchors in a network, they do not occur frequently.
A – Solving Problem (3.32). First, note that node i can indeed address Problem (3.32) since
all the data defining it is available at node i: it stores λji(t), j ∈ Vi, and it knows xj(t) for all
neighbors j ∈ Vi (this holds trivially for t = 0 by construction, and it is preserved by our approach,
as shown ahead).
To alleviate notation, we now suppress the indication of the working node i, i.e., variable $y_{ij}$ is simply written as $y_j$. Problem (3.32) can be written as
\[
\underset{y_j,\, j\in\bar{V}_i}{\text{minimize}} \;\; \sum_{j\in V_i}\Big(F_{ij}(y_i, y_j) + \frac{\rho}{2}\,\|y_j - \gamma_{ij}\|^2\Big) + \frac{\rho}{2}\,\|y_i - \gamma_{ii}\|^2, \tag{3.34}
\]
where $\gamma_{ij} = x_j - \frac{\lambda_{ij}}{\rho}$.
We make the crucial observation that, for fixed $y_i$, the problem is separable in the remaining variables $y_j$, $j \in V_i$. This motivates writing (3.34) as the master problem
\[
\underset{y_i}{\text{minimize}} \;\; H(y_i) = \sum_{j\in V_i} H_{ij}(y_i) + \frac{\rho}{2}\,\|y_i - \gamma_{ii}\|^2, \tag{3.35}
\]
where
\[
H_{ij}(y_i) = \min_{y_j}\; F_{ij}(y_i, y_j) + \frac{\rho}{2}\,\|y_j - \gamma_{ij}\|^2. \tag{3.36}
\]
We now state important properties of Hij .
Proposition 11. Define $H_{ij}$ as in (3.36). Then:

1. Optimization problem (3.36) has a unique solution $y_j$ for any given $y_i$, henceforth denoted $y^\star_j(y_i)$;

2. Function $H_{ij}$ is convex and differentiable, with gradient
\[
\nabla H_{ij}(y_i) = \rho\,\big(y^\star_j(y_i) - \gamma_{ij}\big); \tag{3.37}
\]

3. The gradient of $H_{ij}$ is Lipschitz continuous with parameter ρ, i.e.,
\[
\|\nabla H_{ij}(u) - \nabla H_{ij}(v)\| \le \rho\,\|u - v\|
\]
for all $u, v \in \mathbb{R}^p$.
Proof. 1. Recall from (3.20) that $F_{ij}(y_i, y_j) = \Phi_d(y_i - y_j \mid v)$, where $d = d_{ij}$ and $v = x_i[l] - x_j[l]$. We have $H_{ij}(y_i) = \Theta(y_i - \gamma_{ij})$, where
\[
\Theta(w) = \min_u \; \Phi_d(u \mid v) + \frac{\rho}{2}\,\|u - w\|^2. \tag{3.38}
\]
Moreover, $u^\star$ solves (3.38) if and only if $y^\star_j = y_i - u^\star$ solves (3.36). Now, the cost function in (3.38) is clearly continuous, coercive (i.e., it converges to +∞ as ‖u‖ → +∞) and strictly convex, the last two properties arising from the quadratic term. Thus, it has a unique solution;
2. The function Θ is the Moreau-Yosida regularization of the convex function $\Phi_d(\cdot \mid v)$ [20, XI.3.4.4]. As Θ is known to be convex and $H_{ij}$ is the composition of Θ with an affine map, $H_{ij}$ is convex. It is also known that the gradient of Θ is
\[
\nabla\Theta(w) = \rho\,\big(w - u^\star(w)\big),
\]
where $u^\star(w)$ is the unique solution of (3.38) for a given w. Thus,
\[
\nabla H_{ij}(y_i) = \nabla\Theta(y_i - \gamma_{ij}) = \rho\,\big(y_i - \gamma_{ij} - u^\star(y_i - \gamma_{ij})\big).
\]
Unwinding the change of variable, i.e., using $y^\star_j(y_i) = y_i - u^\star(y_i - \gamma_{ij})$, we obtain (3.37);

3. Follows from the well-known fact that the gradient of Θ is Lipschitz continuous with parameter ρ.
As a consequence, we obtain several nice properties of the function H.
Theorem 12. Function H in (3.35) is strongly convex with parameter ρ, i.e., $H - \frac{\rho}{2}\|\cdot\|^2$ is convex. Furthermore, it is differentiable, with gradient
\[
\nabla H(y_i) = \rho \sum_{j\in V_i}\big(y^\star_j(y_i) - \gamma_{ij}\big) + \rho\,(y_i - \gamma_{ii}). \tag{3.39}
\]
The gradient of H is Lipschitz continuous with parameter $L_H = \rho\,(|V_i| + 1)$.
Proof. Since H is a sum of convex functions, it is convex. It is strongly convex with parameter ρ due to the presence of the strongly convex term $\frac{\rho}{2}\|y_i - \gamma_{ii}\|^2$. As a sum of differentiable functions, it is differentiable, and the given formula for the gradient follows from Proposition 11. Finally, since H is the sum of $|V_i| + 1$ functions with Lipschitz continuous gradients with parameter ρ, the claim is proved.
The properties established in Theorem 12 show that the optimization problem (3.35) is suited
for Nesterov’s optimal method for the minimization of strongly convex functions with Lipschitz
continuous gradient [27, Theorem 2.2.3]. The resulting algorithm is outlined in Algorithm 6, which
is guaranteed to converge to the solution of (3.35).
Algorithm 6 Nesterov’s optimal method for (3.35)1: yi(0) = yi(0)2: for s ≥ 0 do3: yi(s+ 1) = yi(s)− 1
LH∇H(yi(s))
4: yi(s+ 1) = yi(s+ 1) +√LH−
√ρ√
LH+√ρ(yi(s+ 1)− yi(s))
5: end for
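To make the iteration concrete, here is a runnable sketch of Nesterov's optimal method for strongly convex functions with Lipschitz gradient, applied to a toy quadratic (our own stand-in for $\nabla H$; the constants mirror $L_H$ and ρ):

```python
import numpy as np

# Sketch of Algorithm 6 on a toy problem. `grad`, `L_H`, and `rho` stand in
# for the thesis's nabla H, L_H = rho(|V_i| + 1), and the strong-convexity
# parameter rho.

def nesterov(grad, y0, L_H, rho, iters=200):
    y = y0.copy()
    y_bar = y0.copy()                       # extrapolated ("bar") sequence
    beta = (np.sqrt(L_H) - np.sqrt(rho)) / (np.sqrt(L_H) + np.sqrt(rho))
    for _ in range(iters):
        y_next = y_bar - grad(y_bar) / L_H  # step 3: gradient step from the extrapolated point
        y_bar = y_next + beta * (y_next - y)  # step 4: momentum/extrapolation step
        y = y_next
    return y

# Toy strongly convex quadratic f(y) = 0.5 y'Ay - b'y, minimizer A^{-1} b.
A = np.diag([1.0, 10.0])                    # strong convexity 1, Lipschitz constant 10
b = np.array([1.0, 10.0])
y_star = nesterov(lambda y: A @ y - b, np.zeros(2), L_H=10.0, rho=1.0)
```

The momentum coefficient $(\sqrt{L_H}-\sqrt{\rho})/(\sqrt{L_H}+\sqrt{\rho})$ is what yields the accelerated linear rate for this problem class.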
B – Solving problem (3.36). It remains to show how to solve (3.36) at a given sensor node. Any off-the-shelf convex solver, e.g., one based on interior-point methods, could handle it. However,
we present a simpler method that avoids expensive matrix operations, typical of interior point
methods, by taking advantage of the problem structure at hand. This is important in sensor
networks where the sensors have stringent computational resources.
First, as shown in the proof of Proposition 11, it suffices to focus on solving (3.38) for a given w: solving (3.36) amounts to solving (3.38) for $w = y_i - \gamma_{ij}$ to obtain $u^\star = u^\star(w)$ and setting $y^\star_j(y_i) = y_i - u^\star$. Note from (3.15) that $\Phi_d(\cdot \mid v)$ only depends on $v/\|v\|$, so we can assume, without loss of generality, that ‖v‖ = 1.
From (3.15), we see that Problem (3.38) can be rewritten as
\[
\begin{aligned}
&\text{minimize} && r + \frac{\rho}{2}\,\|u - w\|^2 \\
&\text{subject to} && g_d(u) \le r \\
& && h_d(v^\top u - d) \le r,
\end{aligned}
\tag{3.40}
\]
with optimization variable (u, r). The Lagrange dual (cf., for example, [20]) of (3.40) is given by
\[
\begin{aligned}
&\text{maximize} && \psi(\omega) \\
&\text{subject to} && 0 \le \omega \le 1,
\end{aligned}
\tag{3.41}
\]
where $\psi(\omega) = \inf\{\Psi(\omega, u) : u \in \mathbb{R}^n\}$ and
\[
\Psi(\omega, u) = \frac{\rho}{2}\,\|u - w\|^2 + \omega\, g_d(u) + (1-\omega)\, h_d(v^\top u - d). \tag{3.42}
\]
We propose to solve the dual problem (3.41), which involves the single variable ω, by bisection: we maintain an interval $[a, b] \subset [0, 1]$ (initially, $[a, b] = [0, 1]$); we evaluate the derivative $\psi'(c)$ at the midpoint $c = (a+b)/2$; if $\psi'(c) > 0$ we set $a = c$, otherwise $b = c$; the scheme is repeated until the uncertainty interval is sufficiently small.
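The bisection scheme itself is a few lines; the sketch below assumes access to a routine returning the derivative in (3.45), and uses a made-up monotone surrogate in place of it so the code can be exercised:

```python
# Sketch of the bisection on omega in [0, 1] for the dual (3.41). Since psi
# is concave, its derivative is nonincreasing, so the sign of the derivative
# tells which half-interval contains the maximizer. `dpsi` below is a made-up
# surrogate for the derivative in (3.45).

def bisect_dual(dpsi, tol=1e-8):
    a, b = 0.0, 1.0
    while b - a > tol:
        c = 0.5 * (a + b)
        if dpsi(c) > 0:
            a = c          # psi still increasing: maximizer lies to the right
        else:
            b = c
    return 0.5 * (a + b)

# Example with the surrogate derivative psi'(w) = 0.7 - w:
omega_star = bisect_dual(lambda w: 0.7 - w)
```

Each derivative evaluation requires one inner minimization to find $u^\star(\omega)$, which is where most of the computation lies.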
In order to make this approach work, we must prove first that the dual function ψ is indeed
differentiable in the open interval Ω = (0, 1) and find a convenient formula for its derivative. We
will need the following useful result from convex analysis.
Lemma 13. Let $X \subset \mathbb{R}^n$ be an open convex set and $Y \subset \mathbb{R}^p$ be a compact set. Let $F : X \times Y \to \mathbb{R}$. Assume that $F(x, \cdot)$ is lower semi-continuous for all $x \in X$ and that $F(\cdot, y)$ is concave and differentiable for all $y \in Y$. Let $f : X \to \mathbb{R}$, $f(x) = \inf\{F(x, y) : y \in Y\}$. Assume that, for any $x \in X$, the infimum is attained at a unique $y^\star(x) \in Y$. Then, f is differentiable everywhere and its gradient at $x \in X$ is given by
\[
\nabla f(x) = \nabla F\big(x, y^\star(x)\big), \tag{3.43}
\]
where ∇ refers to differentiation with respect to x.

Proof. This is essentially [20, VI.4.4.5], after one exchanges concave for convex, lower semi-continuous for upper semi-continuous, and inf for sup.
Now, view Ψ in (3.42) as defined on Ω × $\mathbb{R}^n$. It is clear that Ψ(ω, ·) is lower semi-continuous for all ω (in fact, continuous) and Ψ(·, u) is concave (in fact, affine) and differentiable for all u. In fact, some even nicer properties hold.
Lemma 14. Let ω ∈ Ω. The function $\Psi_\omega = \Psi(\omega, \cdot)$ is strongly convex with parameter ρ and differentiable everywhere, with gradient
\[
\nabla\Psi_\omega(u) = \rho\,(u - w) + 2\omega\,\big(u - \pi(u)\big) + (1-\omega)\,h_d'(v^\top u - d)\,v, \tag{3.44}
\]
where π(u) denotes the projection of u onto the closed ball of radius d centered at the origin and $h_d'$ denotes the derivative of the Huber function in (3.17). Furthermore, the gradient of $\Psi_\omega$ is Lipschitz continuous with parameter ρ + 2.
Proof. We start by noting that $g_d$ in (3.16) can be written as $g_d(u) = d_C^2(u)$, where C is the closed ball with radius d centered at the origin and $d_C$ denotes the distance to the closed convex set C. It is known that $g_d$ is convex and differentiable, that its gradient is given by $\nabla g_d(u) = 2(u - \pi(u))$, and that this gradient is Lipschitz continuous with parameter 2 [20, X.3.2.3]. Also, the function $h_d$ in (3.17) is convex and differentiable. Thus, the function $\Psi_\omega$ is convex (resp. differentiable) as a sum of three convex (resp. differentiable) functions. It is strongly convex with parameter ρ due to the first term $\frac{\rho}{2}\|\cdot - w\|^2$. The gradient in (3.44) is clear. Finally, from $|h_d'(r) - h_d'(s)| \le 2|r - s|$ for all r, s, there holds, for any $u_1, u_2$,
\[
|h_d'(v^\top u_1 - d) - h_d'(v^\top u_2 - d)| \le 2\,|v^\top(u_1 - u_2)| \le 2\,\|u_1 - u_2\|,
\]
where ‖v‖ = 1 and the Cauchy-Schwarz inequality was used in the last step. We conclude from (3.44) that, for any $u_1, u_2$,
\[
\|\nabla\Psi_\omega(u_1) - \nabla\Psi_\omega(u_2)\| \le \big(\rho + 2\omega + 2(1-\omega)\big)\,\|u_1 - u_2\|,
\]
i.e., the gradient of $\Psi_\omega$ is Lipschitz continuous with parameter ρ + 2.
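The gradient formula (3.44) is simple enough to code directly; the sketch below is our own illustration (we take the last term to use the Huber derivative, as the chain rule requires, and assume ‖v‖ = 1 as in the text):

```python
import numpy as np

# Numerical sketch (our own code) of the gradient (3.44): proj_ball is the
# projection pi onto the ball of radius d, and huber_prime the derivative of
# the Huber function h_d of (3.17). We assume ||v|| = 1, as in the text.

def proj_ball(u, d):
    n = np.linalg.norm(u)
    return u if n <= d else d * u / n

def huber_prime(r, R):
    return 2.0 * R * np.sign(r) if abs(r) >= R else 2.0 * r

def grad_Psi(u, omega, w, v, d, rho):
    return (rho * (u - w)
            + 2.0 * omega * (u - proj_ball(u, d))
            + (1.0 - omega) * huber_prime(v @ u - d, d) * v)
```

Note that $|h_d'| \le 2d$ everywhere, the bound used in the proof of Lemma 15 below.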
Using Lemma 14, we see that the infimum of $\Psi_\omega$ is attained at a single $u^\star(\omega)$, since $\Psi_\omega$ is a continuous, strongly convex function. The derivative of ψ in (3.41) relies on $u^\star(\omega)$, as seen in Lemma 15.

Lemma 15. Function ψ in (3.41) is differentiable and its derivative is
\[
\psi'(\omega) = g_d\big(u^\star(\omega)\big) - h_d\big(v^\top u^\star(\omega) - d\big). \tag{3.45}
\]
Proof. We begin by bounding the norm of $u^\star(\omega)$. From the necessary stationarity condition $\nabla\Psi_\omega(u^\star(\omega)) = 0$ and (3.44) we conclude
\[
(\rho + 2\omega)\,u^\star(\omega) = \rho w + 2\omega\,\pi\big(u^\star(\omega)\big) - (1-\omega)\,h_d'\big(v^\top u^\star(\omega) - d\big)\,v. \tag{3.46}
\]
Since $|h_d'(t)| \le 2d$ for all t (see (3.17)), $\|\pi(u)\| \le d$ for all u, ‖v‖ = 1, and 0 ≤ ω ≤ 1, we can bound the norm of the right-hand side of (3.46) by $\rho\|w\| + 4d$. Thus,
\[
\|u^\star(\omega)\| \le \frac{\rho\|w\| + 4d}{\rho + 2\omega} \le \frac{\rho\|w\| + 4d}{\rho} = \|w\| + \frac{4d}{\rho}.
\]
Introduce the compact set $U = \{u \in \mathbb{R}^n : \|u\| \le \|w\| + 4d/\rho\}$. The previous analysis has shown that the dual function in (3.41) can also be represented as $\psi(\omega) = \inf\{\Psi(\omega, u) : u \in U\}$, i.e., we can restrict the search to U and view Ψ as defined on Ω × U. We can thus invoke Lemma 13 to conclude that ψ is differentiable and (3.45) holds.
Finding $u^\star(\omega)$. To obtain $u^\star(\omega)$ we must minimize $\Psi_\omega$; given its properties in Lemma 14, the simple optimal Nesterov method described in Algorithm 6 is also applicable here.
C – Solving problem (3.33). Note that node i stores $x_i(t)$ and $\mu_{ik}(t)$, $k \in \mathcal{A}_i$; thus, it can indeed address Problem (3.33). Problem (3.33) is similar to (in fact, much simpler than) (3.32), and following the previous steps leads to the same Nesterov optimal method. We omit this straightforward derivation.
3.3.3.D ADMM: Solving Problem (3.27)
Looking at (3.30), it is clear that Problem (3.27) also decouples across nodes. Furthermore, at node i a simple unconstrained quadratic problem with respect to $x_i$ must be solved, whose closed-form solution is
\[
x_i(t+1) = \frac{1}{|\bar{V}_i| + |\mathcal{A}_i|}\left(\sum_{j\in\bar{V}_i}\Big(\frac{1}{\rho}\,\lambda_{ji}(t) + y_{ji}(t+1)\Big) + \sum_{k\in\mathcal{A}_i}\Big(\frac{1}{\rho}\,\mu_{ik}(t) + z_{ik}(t+1)\Big)\right). \tag{3.47}
\]
For node i to carry out this update, it first needs to receive $y_{ji}(t+1)$ from its neighbors $j \in V_i$. This requires a communication step.
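A minimal sketch of this averaging update, with hypothetical container names (`lam`, `y_new` for the closed neighborhood; `mu`, `z_new` for the anchors), might read:

```python
import numpy as np

# Sketch (our own names) of the closed-form x-update (3.47) at node i:
# an average of the neighbor copies y_ji and anchor copies z_ik, shifted
# by the scaled dual variables.

def x_update(lam, y_new, mu, z_new, rho):
    terms = [lam[j] / rho + y_new[j] for j in y_new]
    terms += [mu[k] / rho + z_new[k] for k in z_new]
    return sum(terms) / (len(y_new) + len(z_new))
```

The update is a plain average, so its per-node cost is linear in the number of neighbors and anchors.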
3.3.3.E ADMM: Implementing (3.28) and (3.29)
Recall that the dual variable $\lambda_{ji}$ is maintained at both nodes i and j. Node i can carry out the update of $\lambda_{ji}(t+1)$ in (3.28), for all $j \in \bar{V}_i$, since the needed data are available (recall that $y_{ji}(t+1)$ is available from the previous communication step). To update $\lambda_{ij}(t+1) = \lambda_{ij}(t) + \rho\,(y_{ij}(t+1) - x_j(t+1))$, node i needs to receive $x_j(t+1)$ from its neighbors $j \in V_i$. This requires a communication step.
3.3.3.F Summary of the distributed algorithm
Our ADMM-based algorithm stops after a fixed number of iterations, denoted T. Algorithm 7 outlines the procedure derived in Sections 3.3.3.C and 3.3.3.D, and corresponds to step 2 of the MM algorithm (Algorithm 5). Note that, in order to implement step 5 of Algorithm 7, one must adapt Algorithm 6 to the problem at hand.
3. Distributed network localization with initialization: Nonconvex procedures
Algorithm 7 Step 2 of Algorithm 5 using ADMM: position updates
Input: x[l]
Output: x[l + 1]
1: for t = 0 to T − 1 do
2:   for each node i ∈ V in parallel do
3:     Solve Problem (3.32) by minimizing H in (3.35) with Alg. 6 to obtain yij(t+1), j ∈ Vi
4:     for k = 1 to |Ai| do
5:       Solve Problem (3.33) to obtain zik(t+1)
6:     end for
7:     Send yij(t+1) to neighbor j ∈ Vi
8:     Compute xi(t+1) from (3.47)
9:     Send xi(t+1) to all j ∈ Vi
10:    Update λji(t+1), µik(t+1), j ∈ Vi, k ∈ Ai as in (3.28) and (3.29)
11:  end for
12: end for
13: return x[l + 1] = x(T)
A – Communication load Algorithm 7 contains two communication steps: step 7 and step 9.
At step 7 each node i sends |Vi| vectors in Rp, each to one neighboring sensor, and at step 9 a
vector in Rp is broadcast to all nodes in Vi. This results in 2TL|Vi| communications of Rp vectors
for node i over the overall algorithm (T ADMM iterations within each of the L MM iterations).
When comparing with SGO in [18], for T iterations, node i sends T|Vi| vectors in Rp. The increase
in communications is the price to pay for the parallel nature of the ADMM-based algorithm.
3.3.4 Experimental setup
Unless otherwise specified, the generated geometric networks are composed of 4 anchors and 50
sensors, with an average node degree, i.e., (1/|V|) ∑_{i∈V} |Vi|, of about 6. In all experiments the sensors
are distributed uniformly at random on the unit square, and anchors are placed, unless
otherwise stated, at the four corners of the unit square (to follow [18]), namely, at (0, 0), (0, 1),
(1, 0) and (1, 1). These properties require a communication range of about R = 0.24. Since
localizability is an issue when assessing the accuracy of sensor network localization algorithms,
the networks used are first checked to be generically globally rigid, so that a small disturbance in
measurements does not create placement ambiguities. To detect generic global rigidity, we used
the methodologies in [47, Section 2]. The results for the proposed algorithm consider L = 40 MM
iterations, unless otherwise stated.
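A minimal sketch of this setup in Python (the helper name `geometric_network` is ours; the generic global rigidity check of [47] is not reproduced here):

```python
import numpy as np

def geometric_network(n_sensors=50, radius=0.24, seed=0):
    """Sample sensors uniformly on the unit square and connect every pair
    of sensors closer than the communication range (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, size=(n_sensors, 2))
    edges = [(i, j) for i in range(n_sensors) for j in range(i + 1, n_sensors)
             if np.linalg.norm(pos[i] - pos[j]) <= radius]
    degree = np.zeros(n_sensors, dtype=int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # Average node degree: (1/|V|) * sum_i |V_i|
    return pos, edges, degree.mean()

# Anchors at the unit-square corners, as in the experiments
anchors = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
pos, edges, avg_deg = geometric_network()
```

With R = 0.24 and 50 sensors, the resulting average degree is typically near the value of 6 used in the experiments, though it varies with the random draw.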
3.3.4.A ADMM and SGO: RMSE vs. initialization noise
Two sets of experiments were made to compare the RMSE performance of SGO in [18] and the
proposed Algorithm 5, termed DCOOL-NET, as a function of the initialization quality (i.e., σinit
in (3.22)). In the first set, range measurements are noiseless (i.e., σ = 0 in (3.64)), whereas in the
second set we consider noisy range measurements (σ > 0).
3.3 Majorization-Minimization with convex tight majorizer
Figure 3.7: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noiseless: σ = 0 in (3.64). Anchors are at the unit square corners. Proposed DCOOL-NET (red, solid) and SGO (blue, dashed) attain comparable accuracy.
Table 3.2: Squared error dispersion over Monte Carlo trials for Figure 3.7.

σinit   DCOOL-NET   SGO
0.01    0.0002      0.0007
0.10    0.0638      0.1290
0.30    0.2380      0.3400
Noiseless range measurements In this setup 300 Monte Carlo trials were run. As the measurements are accurate (σ = 0 in (3.64)), one would expect not only insignificant RMSE values,
but also considerable agreement between all the Monte Carlo trials on the solution for sufficiently close initializations. Figure 3.7 confirms that both DCOOL-NET and SGO achieve small
position errors, and their accuracies are comparable. As stated before, SGO also has a low computational complexity; in fact, it is lower than DCOOL-NET's (although DCOOL-NET is fully parallel
across nodes, whereas SGO operates by activating the nodes sequentially, implying some high-level
coordination). Table 3.2 shows the squared error dispersion over all Monte Carlo trials, i.e., the
standard deviation of the data {SEm : m = 1, . . . , M}, SEm = ‖xm − x⋆‖², for both algorithms.
We see that DCOOL-NET exhibits a more stable performance, in the sense that it has a lower
squared error dispersion.
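The dispersion statistic reported in Tables 3.2–3.5 is simply the standard deviation of the squared errors over trials; a small illustrative sketch (function name ours):

```python
import numpy as np

def squared_error_dispersion(estimates, x_true):
    """Standard deviation of SE_m = ||x_m - x*||^2 over Monte Carlo trials."""
    se = [np.linalg.norm(xm - x_true) ** 2 for xm in estimates]
    return float(np.std(se))

# Three trials, all at the same distance 0.1 from the true position,
# so every squared error equals 0.01 and the dispersion is zero.
x_true = np.array([0.5, 0.5])
trials = [x_true + 0.1 * np.array([np.cos(a), np.sin(a)]) for a in (0.0, 1.0, 2.0)]
disp = squared_error_dispersion(trials, x_true)
```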
Noisy range measurements We set σ = 0.12 in the noise model (3.64). Figure 3.8 shows that
Figure 3.8: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noisy: σ = 0.12 in (3.64). Anchors are at the unit square corners. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) in accuracy.
Table 3.3: Squared error dispersion over Monte Carlo trials for Figure 3.8.

σinit   DCOOL-NET   SGO
0.00    0.0118      0.0783
0.01    0.0121      0.0775
0.10    0.0727      0.1610
0.30    0.2490      0.3320
DCOOL-NET fares better than SGO: the gap between the performances of both algorithms is
now quite significant. The squared error dispersion over all Monte Carlo trials for both algorithms
is given in Table 3.3. As before, we see that DCOOL-NET is more reliable, in the sense that it
exhibits lower variance of estimates across Monte Carlo experiments.
We also considered placing the anchors randomly within the unit square, instead of at the
corners. This is a more realistic and challenging setup, where the sensors are no longer necessarily
located inside the convex hull of the anchors. The corresponding results are shown in Figure 3.9 and
Table 3.4, for 250 Monte Carlo trials. Again, DCOOL-NET achieves better accuracy. Comparing
the dispersions in Tables 3.3 and 3.4 also reveals that the gap in reliability between SGO and our
algorithm is now wider.
Figure 3.9: RMSE vs. σinit, the intensity of initialization noise in (3.22). The range measurements are noisy: σ = 0.12 in (3.64). Anchors were randomly placed in the unit square. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) in accuracy.
Table 3.4: Squared error dispersion over Monte Carlo trials for Figure 3.9.

σinit   DCOOL-NET   SGO
0.00    0.0097      0.0712
0.01    0.0099      0.0709
0.10    0.1550      0.3160
0.30    0.4350      0.8440
0.50    0.8330      1.3000
3.3.4.B ADMM and SGO: RMSE vs. measurement noise
To evaluate the sensitivity of both algorithms to the intensity of the noise present in range measurements (i.e., σ in (3.64)), 300 Monte Carlo trials were run for σ ∈ {0.01, 0.1, 0.12, 0.15, 0.17, 0.2, 0.3}.
Both algorithms were initialized at the true sensor positions, i.e., σinit = 0 in (3.22), and ADMM
performs L = 100 iterations³. Figure 3.10 and Table 3.5 summarize the computer simulations for
this setup. As before, ADMM consistently achieves better accuracy and stability.
3.3.4.C ADMM and SGO: RMSE vs. communication cost
We assessed how the RMSE varies with the communication load incurred by both algorithms.
We considered the general setup described in Section 3.3.4. The results are displayed in Figure 3.11.
³This is to guarantee that, in practice, ADMM indeed attained a fixed point; the results barely changed for L = 40.
Figure 3.10: RMSE vs. σ, the intensity of measurement noise in (3.64). No initialization noise: σinit = 0 in (3.22). Anchors are at the unit square corners. Proposed ADMM (red, solid) outperforms SGO (blue, dashed) in accuracy.
Table 3.5: Squared error dispersion over Monte Carlo trials for Figure 3.10.

σ       DCOOL-NET   SGO
0.01    0.0002      0.0016
0.10    0.0177      0.0688
0.12    0.0218      0.0702
0.15    0.0326      0.0921
0.17    0.0394      0.0993
0.20    0.0525      0.1090
0.30    0.1020      0.1630
We see an interesting tradeoff: SGO converges much more quickly than ADMM (in terms of communication rounds), and attains a lower RMSE sooner. However, ADMM can improve its accuracy
through more communications, whereas SGO remains trapped in a suboptimal solution.
3.3.4.D ADMM: RMSE vs. parameter ρ
The parameter ρ plays a role in the augmented Lagrangian discussed in Section 3.3.3, and is
user-selected. As such, it is important to study the sensitivity of ADMM to this parameter choice.
For this purpose, we have tested several ρ between 1 and 200. For each choice, 300 Monte Carlo
trials were performed using noisy measurements and initializations. Figure 3.12 portrays RMSE
against ρ for L = 40 iterations of ADMM. There is no ample variation, especially for values of
Figure 3.11: RMSE versus total number of two-dimensional vectors communicated in the network. The range measurements are noiseless: σ = 0 in (3.64). Initialization is noisy: σinit = 0.1 in (3.22). Anchors are at the unit square corners. Proposed ADMM (red, solid) outperforms SGO (blue, dashed) in accuracy, at the expense of more communications.
ρ over 30, which offers some confidence in the algorithm's resilience to this parameter, a pivotal
feature from the practical standpoint. However, an analytical approach for selecting the optimal ρ
is beyond the scope of this work, and is postponed for future research. Note that adaptive schemes
to adjust ρ do exist for centralized settings, e.g., [40], but seem impractical for distributed setups
as they require global computations.
3.3.5 Proof of majorization function properties
We now prove Proposition 9. We write Φd(u) instead of Φd(u|v) and we let 〈x, y〉 = x>y.
Convexity
Note that gd is convex as the composition of the convex, non-decreasing function (·)²₊ with the
convex function ‖·‖ − d. Also, hd(⟨v/‖v‖, ·⟩ − d) is convex as the composition of the convex Huber
function hd(·) with the affine map ⟨v/‖v‖, ·⟩ − d. Finally, Φd is convex as the pointwise maximum
of two convex functions.
Figure 3.12: RMSE vs. ρ. The range measurements are noisy: σ = 0.05 in (3.64). Initialization is noisy: σinit = 0.1 in (3.22). Anchors are at the unit square corners.
Tightness
It is straightforward to check that φd(v) = Φd(v) by examining separately the three cases
‖v‖ < d, d ≤ ‖v‖ < 2d and ‖v‖ ≥ 2d.
Majorization
We must show that Φd(u) ≥ φd(u) for all u. First, consider ‖u‖ ≥ d. Then, gd(u) = φd(u)
and it follows that Φd(u) = max{gd(u), hd(⟨v/‖v‖, u⟩ − d)} ≥ φd(u). Now, consider ‖u‖ < d and
write u = Rū, where R = ‖u‖ < d and ‖ū‖ = 1. It is straightforward to check that, in terms of R
and ū, we have φd(u) = (R − d)² and Φd(u) = hd(R⟨v̄, ū⟩ − d), where v̄ = v/‖v‖. Thus, we must
show that hd(R⟨v̄, ū⟩ − d) ≥ (R − d)². Motivated by the definition of the Huber function hd in two
branches, we divide the analysis in two cases.
Case 1: |R⟨v̄, ū⟩ − d| ≤ d. In this case, hd(R⟨v̄, ū⟩ − d) = (R⟨v̄, ū⟩ − d)². Noting that |⟨v̄, ū⟩| ≤ 1,
there holds

(R⟨v̄, ū⟩ − d)² ≥ inf{(Rz − d)² : |z| ≤ 1} = (R − d)²,

where the fact that R < d was used to compute the infimum over z (attained at z = 1).
Case 2: |R⟨v̄, ū⟩ − d| > d. In this case, hd(R⟨v̄, ū⟩ − d) = 2d|R⟨v̄, ū⟩ − d| − d². Thus,

hd(R⟨v̄, ū⟩ − d) ≥ d² ≥ (d − R)²,
where the last inequality follows from 0 ≤ R < d.
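Although the argument above is complete, the key inequality hd(R⟨v̄, ū⟩ − d) ≥ (R − d)² can also be spot-checked numerically by sampling the reduced variables R and z = ⟨v̄, ū⟩ (an illustrative sketch, not part of the proof):

```python
import numpy as np

def huber(s, d):
    """The Huber function h_d appearing in the majorizer."""
    return s * s if abs(s) <= d else 2 * d * abs(s) - d * d

# Check h_d(R*z - d) >= (R - d)^2 for 0 <= R < d and |z| <= 1,
# which is exactly the inequality established by the two cases above.
d = 1.0
rng = np.random.default_rng(1)
ok = all(huber(R * z - d, d) >= (R - d) ** 2 - 1e-12
         for R in rng.uniform(0, d, 200)
         for z in rng.uniform(-1, 1, 50))
```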
3.3.7 Proof of (3.31)
We show how to rewrite (3.30) as (3.31). First, note that F(y, z) in (3.24) can be rewritten as

F(y, z) = ∑_i ∑_{j∈Vi} Fij(yii, yij) + 2 ∑_i ∑_{k∈Ai} Fik(zik). (3.48)
65
3. Distributed network localization with initialization: Nonconvex procedures
Here, we used the fact that Fij(yji, yjj) = Fji(yjj, yji), which follows from dij = dji and Φd(u|v) = Φd(−u|−v), see (3.15). In addition, there holds

∑_i ∑_{j∈Vi} λji⊤(yji − xi) + (ρ/2)‖yji − xi‖²
= ∑_j ∑_{i∈Vj} λij⊤(yij − xj) + (ρ/2)‖yij − xj‖²
= ∑_i ∑_{j∈Vi} λij⊤(yij − xj) + (ρ/2)‖yij − xj‖². (3.49)
The first equality follows from interchanging i with j. The second equality follows from noting
that i ∈ Vj if and only if j ∈ Vi. Using (3.48) and (3.49) in (3.30) gives (3.31).
3.3.8 Summary
We presented a convex majorizer, crafted to be a tight fit to the sensor network localization
problem (1.1).
We developed a distributed, fully parallel algorithm to optimize the convex majorizer, based
on the ADMM. This choice allowed for the distribution of the problem, but at the expense of an
impractical communication load. This behavior of the algorithm can be explained by the increase
in the number of variables when adding edge variables to the equivalent problems, but mainly
by the fact that the node subproblems do not have closed-form exact solutions, so ADMM has to
compensate for the deviations of the partial iterative solutions with more communication rounds.
We are currently establishing the proof of tightness in Rⁿ for n > 1, as enunciated in Conjecture 10, and investigating a proximal method to efficiently minimize each majorizer in a distributed
fashion, also allowing for gossip-like asynchronous solutions.
3.4 Sensor network localization: a graphical model approach
This Section focuses on the sensor network localization problem when one has access to the
mean and variance of normally distributed priors on the sensor positions. In this setting we
do not need landmarks or anchors to resolve rotation, translation, or flip ambiguities; so, when
anchors are hard to determine but, upon deployment, we have some notion of the drop-off points
of the sensors and of their spread, this solution is appropriate.
The problem is cast under the formalism of probabilistic graphical models, and the optimization
problem to obtain the MAP (maximum a posteriori) estimate of the sensor positions is stated. The
proposed goals concentrate on suboptimal approximation methods for the derived combinatorial
problem.
In general, the sensors cannot be deployed in the terrain accurately, but
sometimes it is possible to delimit regions with some probability of containing each sensor. Many
sensor networks can also acquire noisy distance measurements between neighboring nodes, thus
obtaining data to estimate their true positions. Under such conditions, each node's position can
be seen as a random variable whose distribution depends on the distribution of the noisy mea-
surements, the prior on its own position and the distributions of the neighboring nodes positions.
Here, the probabilistic graphical models framework may capture this complex set of dependen-
cies between random variables and enable the use of general purpose algorithms for performing
inference.
The graphical model for the sensor network coincides with the measurement model.
3.4.1 Uncertainty models
In order to establish the graphical model formalism on our problem, we restate several objects
already defined, but now framed in the probabilistic setting. Range measurements are contaminated by zero-mean independent Gaussian noise. So, the distance measurement between node t and node u can
be expressed as

dtu = ‖x⋆t − x⋆u‖ + νtu,   νtu ∼ N(0, σ), i.i.d., (3.50)
where x⋆t is the true position of node t. A set of measurements corresponding to a subset of edges
I ⊂ E is denoted by dI and, in the same way, a set of positions corresponding to a subset of
nodes V ⊂ V is denoted as xV. The probability distribution of νtu will be denoted by pν(νtu). The
noisy range measurement acquired between sensor t and anchor k ∈ At is modelled by
rtk = ‖x?t − ak‖+ νtk, (3.51)
where ak is the anchor position and νtk is a random variable with probability distribution pν(νtk),
the same as in (3.50).
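A sketch of the measurement models (3.50)–(3.51) in Python (the helper name `range_measurements` is ours; with σ = 0 the exact distances are recovered):

```python
import numpy as np

def range_measurements(positions, anchors, edges, sigma, seed=0):
    """Noisy ranges d_tu = ||x_t - x_u|| + nu_tu and r_tk = ||x_t - a_k|| + nu_tk,
    with nu ~ N(0, sigma) i.i.d., as in (3.50)-(3.51)."""
    rng = np.random.default_rng(seed)
    d = {(t, u): np.linalg.norm(positions[t] - positions[u]) + rng.normal(0, sigma)
         for t, u in edges}
    r = {(t, k): np.linalg.norm(positions[t] - anchors[k]) + rng.normal(0, sigma)
         for t in range(len(positions)) for k in range(len(anchors))}
    return d, r

# Two sensors, one anchor, noiseless case for easy checking
pos = np.array([[0.2, 0.2], [0.8, 0.2]])
anchors = np.array([[0.0, 0.0]])
d, r = range_measurements(pos, anchors, [(0, 1)], sigma=0.0)
```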
It is assumed that dtu = dut and that the position variables x are independent of the random
variables νE = {νtu : t ∼ u ∈ E} and νV = {νtk : t ∈ V, k ∈ At}. Additionally, it is presumed that
each sensor position xt has a prior distribution pt(xt) = N(xt; µt, Rt). Each xt is independent
of xV−t.
The joint distribution is, thus,

p(x, dE) = p(dE|x) p(x) = ∏_{t∼u} p(dtu|xt, xu) ∏_t pt(xt) ∏_t ∏_{k∈At} p(rtk|xt), (3.52)
and the a posteriori distribution is proportional to the joint distribution in (3.52), i.e.,

p(x|dE) ∝ ∏_{t∼u} pν(‖xt − xu‖ − dtu) ∏_t pt(xt) ∏_t ∏_{k∈At} pν(‖xt − ak‖ − rtk), (3.53)
where we explicitly wrote the conditional probabilities in terms of pν .
3.4.2 Optimization problem
As defined in the previous section, all probability distributions are Gaussian. To find the
maximum a posteriori (MAP) estimate for the sensor positions, an optimization problem is cast
by taking the negative logarithm of Eq. (3.53), thus obtaining
minimize_x ∑_{t∼u} θtu(xt, xu) + ∑_t θt(xt), (3.54)

where the pairwise potentials are

θtu(xt, xu) = (1/σ²)(‖xt − xu‖ − dtu)²,

and the single-node potentials are

θt(xt) = (1/σ²) ∑_{k∈At} (‖xt − ak‖ − rtk)² + (xt − µt)⊤Rt⁻¹(xt − µt).
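The two potentials can be transcribed directly; the following sketch assumes numpy arrays for positions and uses hypothetical helper names:

```python
import numpy as np

def pairwise_potential(xt, xu, dtu, sigma):
    """theta_tu(x_t, x_u) = (1/sigma^2)(||x_t - x_u|| - d_tu)^2"""
    return (np.linalg.norm(xt - xu) - dtu) ** 2 / sigma ** 2

def node_potential(xt, anchors, ranges, mu_t, R_t, sigma):
    """theta_t(x_t): anchor terms plus the Gaussian-prior term."""
    cost = sum((np.linalg.norm(xt - a) - r) ** 2 for a, r in zip(anchors, ranges))
    cost /= sigma ** 2
    diff = xt - mu_t
    return cost + diff @ np.linalg.inv(R_t) @ diff

# One anchor at distance 5 with a matching range, so only the prior term is left
xt = np.array([0.0, 0.0])
val = node_potential(xt, [np.array([3.0, 4.0])], [5.0],
                     mu_t=np.array([1.0, 0.0]), R_t=np.eye(2), sigma=1.0)
```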
Problem (3.54) is known to be NP-hard for generic graphs, as stated earlier.
3.4.3 Combinatorial problem
We discretize the 95% confidence region of each prior distribution, collecting an alphabet
of candidate node positions Xt = {αt^(1), . . . , αt^(nt)} for each sensor node t, where nt is the cardinality
of each alphabet, and we formulate a combinatorial problem over the collection of such alphabets
as

minimize_{xt∈Xt} ∑_{t∼u} θtu(xt, xu) + ∑_t θt(xt). (3.55)
To rewrite the problem over binary variables, we translate this functional form into matrix form;
we define the matrix Θtu as the evaluation of the pairwise potential θtu(xt, xu) on all points of the
intervening nodes' alphabets as

Θtu := [ θtu(αt^(1), αu^(1))   θtu(αt^(1), αu^(2))   · · ·   θtu(αt^(1), αu^(nu))
         θtu(αt^(2), αu^(1))   · · ·
         ⋮                                ⋱                  ⋮
         θtu(αt^(nt), αu^(1))  · · ·                         θtu(αt^(nt), αu^(nu)) ], (3.56)
and the vector θt of all evaluations of the node potential function θt(xt) over the alphabet Xt to obtain

θt := [ θt(αt^(1)), θt(αt^(2)), . . . , θt(αt^(nt)) ]⊤.

We also specify a set ∆t := {et : et ∈ {0, 1}^nt, et⊤1 = 1}; it is now possible to rewrite the problem
over the binary variables et as

minimize_{et∈∆t} ∑_{t∼u} et⊤Θtueu + ∑_t et⊤θt. (3.57)
The formulation in Problem (3.57) is well known in the probabilistic graphical models literature.
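For concreteness, the objective of (3.57) with one-hot selector vectors et can be evaluated as follows (a sketch with hypothetical container names; each et picks one entry of the corresponding potential):

```python
import numpy as np

def energy(e, Theta, theta, edges):
    """Evaluate sum_{t~u} e_t' Theta_tu e_u + sum_t e_t' theta_t
    for one-hot selector vectors e_t, i.e., the objective of (3.57)."""
    val = sum(e[t] @ Theta[(t, u)] @ e[u] for t, u in edges)
    val += sum(et @ th for et, th in zip(e, theta))
    return float(val)

# Two nodes with alphabets of size 2 and a single edge 0~1
Theta = {(0, 1): np.array([[1.0, 3.0], [0.5, 2.0]])}
theta = [np.array([0.1, 0.2]), np.array([0.3, 0.4])]
e = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]  # selects Theta[1, 0] = 0.5
val = energy(e, Theta, theta, [(0, 1)])
```

Here the pairwise term contributes 0.5 and the node terms 0.2 + 0.3, so the energy is 1.0.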
3.4.4 Related work
3.4.4.A Linear relaxation and tree-reweighted message passing algorithms
The problem of minimizing (3.55) is widely attacked by means of a linear programming (LP)
relaxation (see, e.g., Wainwright et al. [48]). This approach relies on minimizing (3.55) over the local
marginal polytope, relaxing the integer constraints to non-negativity constraints. This relaxation is tight
for tree-structured graphs. Nevertheless, the number of variables is very large and the method
does not scale well. To address this issue, other approaches rely on maximizing the dual
of (3.55). The tree-reweighted message passing algorithms solve a dual problem determined by a
convex combination of trees.
3.4.4.B Dual decomposition
Komodakis et al. [49] proposed a Lagrangian relaxation of the MAP problem related to the
optimization technique of dual decomposition. Rather than directly minimizing (3.55), the problem
is decomposed into a set of subproblems which are easier to solve. As the subproblems emerge
from dualization, the sum of their minima is a lower bound on the value of (3.55). One can
apply different decompositions, resulting in distinct relaxations; if the minimization of (3.55) is
decomposed into a set of trees, this relaxation is equivalent to the LP relaxation. A major issue
arising from the dual nature of this algorithm is how to recover a primal solution.
Remarks
1. Problem (3.54) is very hard to solve. A possible approach is to discretize the 6-sigma ellipsoid
defined by the prior distributions, as mentioned in Section 3.4.3, obtaining a combinatorial
problem for which it is possible to design linear and semidefinite relaxations (see Wainwright
et al. [48]) in order to approximate the optimal solution.
2. Dealing with graphical models with cycles — like the ones generally arising from geometric
networks — is also a difficult task. In fact, there are no guarantees of convergence of the
sum-product updates on such topologies.
3. As observed in Ihler et al. [4], only a coarse discretization of the 2D or 3D space leads to
a problem which is computationally effective. Nevertheless, the obtained result can provide
an initialization to be refined by local optimization methods as explored, e.g., in Soares et
al. [8].
3.4.5 Contributions
1. Design an effective convex approximation to Problem (3.54), following the approach sketched
in Remark 1;
2. Formulate an iterative scheme for monotonically decreasing the cost function, by performing
inference on judiciously chosen spanning trees of the graph, thus tackling the issue noted in
Remark 2;
3. Provide numerical results assessing the value of the approaches taken on this Section.
3.4.6 Algorithms
3.4.6.A Linear and semidefinite relaxations
The work of Wainwright and Jordan [48] establishes a linear relaxation of Problem (3.57) that
we will derive in a different way. The cited work [48] proves that the relaxation is exact for
tree-structured graphs. We begin by defining:
Θ := [ 0    Θ12  Θ13  · · ·  Θ1n
       Θ21  0    Θ23  · · ·  Θ2n
       ⋮               ⋱     ⋮
       Θn1  · · ·            0 ],   θ := [ θ1; θ2; . . . ; θn ],   e := [ e1; e2; . . . ; en ],

where Θtu is as in (3.56) if the edge t ∼ u belongs to the edge set E, and the zero matrix otherwise.
We reformulate Problem (3.57) as

minimize   Tr( [ 0    θ⊤/2
                 θ/2  Θ ]⊤ E )
subject to E = [1; e][1; e]⊤
           et ∈ ∆t (3.58)
and rewrite the restrictions to isolate the nonconvexity in a rank constraint. In order to do so, we
write our variable E as

E = [ 1    E21⊤
      E21  E22 ],

where E21 = e and E22 = ee⊤. We can write the equivalent restrictions as

E = E⊤
E ≥ 0
E11 = 1
diag((E22)ii) = (E21)i
(E22)ij 1 = (E21)i
1⊤(E21)i = 1
Rank(E) = 1. (3.59)
To achieve the linear relaxation of (3.58) we drop the rank constraint in (3.59). As stated before,
this linear program is only exact for tree-structured graphs, as shown in [48]. For graphs with
cycles we propose an SDP reformulation of (3.58), with the constraints in (3.59) except the first,
E = E⊤, which we replace with

E ⪰ 0. (3.60)
One could expect this stronger constraint to yield a tighter approximation in graphs with
cycles, at the expense of the increase in computational cost incurred when passing from the linear
to the semidefinite problem.
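The rationale behind (3.59) can be illustrated numerically: for any feasible one-hot e, the rank-one lifted matrix E = [1; e][1; e]⊤ satisfies the listed restrictions by construction (a small sketch; the node structure is flattened for brevity):

```python
import numpy as np

# Two nodes with alphabets of size 2, each selector one-hot
e = np.concatenate([np.array([0.0, 1.0]), np.array([1.0, 0.0])])
v = np.concatenate([[1.0], e])
E = np.outer(v, v)   # the lifted variable of (3.58)

symmetric = np.allclose(E, E.T)                    # E = E'
nonneg = bool((E >= 0).all())                      # E >= 0 entrywise
corner = E[0, 0] == 1.0                            # E11 = 1
diag_ok = np.allclose(np.diag(E)[1:], E[1:, 0])    # diag(ee') = e for binary e
rank_one = np.linalg.matrix_rank(E) == 1           # the dropped/relaxed constraint
```

The last property is exactly the constraint dropped by the LP relaxation and weakened to E ⪰ 0 by the SDP relaxation.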
3.4.6.B Distributed tree-based inference
The second contribution of this work is an iterative scheme to monotonically decrease the cost
function.
We know that inference on trees is exact. In fact, the methods referred to in Section 3.4.4
work with a dual problem to have access to this important property. Our approach will perform
inference on trees, but still retains a primal nature. At each step we choose a spanning tree over
the geometric measurement graph and perform inference on it. In order to choose the edges that
go into the spanning tree, for each edge t ∼ u, we find the vectors at and au such that a separable
matrix majorizes the edge potentials matrix as tightly as possible. Here, a separable matrix means
that we can impute to each alphabet element of each node a constant part of the entries of the edge
potentials matrix Θtu. Mathematically, for each edge t ∼ u, we solve the problem
minimize_{at,au}  ‖at1⊤ + 1au⊤ − Θtu‖F
subject to  at1⊤ + 1au⊤ ≥ Θtu
            et⊤at + eu⊤au = et⊤Θtueu, (3.61)
where ‖·‖F denotes the Frobenius norm. The values for et and eu are given as an initialization. The
first restriction in (3.61) ensures that the separable approximation to the edge potentials matrix
lies above it, whereas the second one guarantees it is tight at the initialization point. The optimal
value of problem (3.61) represents the cost of breaking the edge t ∼ u and at and au are the vectors
to add to the node potentials of nodes t and u, respectively, in case the edge t ∼ u is not present
in the spanning tree. Mathematically, we construct a sequence of problems solvable in polynomial
time that majorize (3.57):

minimize_{et∈∆t} f(e) = ∑_{t∼u∈T} et⊤Θtueu + ∑_t et⊤θ̃t, (3.62)

where θ̃t is

θ̃t = θt + ∑_{t∼v∈E, t∼v∉T} (at)t∼v. (3.63)
We build the spanning tree with the edges that are most expensive to break, thus retaining
those which are least separable, and perform exact inference on the resulting tree. The method
thus builds the maximum spanning tree T of the measurement graph G, breaking the cheapest
edges. There are many algorithms able to compute minimum spanning trees, even in a
distributed way (see, for example, the work of Gallager et al. [50]). The maximum spanning tree
is obtained by invoking the chosen minimum spanning tree algorithm with the edge weights w′t∼u = max{wi∼j : i ∼ j ∈ E} − wt∼u + 1. We note that using an upper bound on max{wi∼j : i ∼ j ∈ E} will also work, and spares the distributed computation of the maximum.

Algorithm 8 Distributed monotonic spanning tree-based algorithm
Input: Initialization e
Output: Estimate ē
1: while some stopping criterion is not met do
2:   for t ∼ u ∈ E do
3:     Solve problem (3.61) for edge t ∼ u
4:     wt∼u = optimal value of problem (3.61)
5:     (at, au)t∼u = (at, au), the optimal points of problem (3.61)
6:   end for
7:   Compute the maximum spanning tree T of G, feeding the edge weights w′t∼u = max{wi∼j : i ∼ j ∈ E} − wt∼u + 1 to a (distributed) minimum spanning tree algorithm
8:   Increment the node potentials θ̃t of broken edges as in (3.63)
9:   Perform exact inference on T to solve problem (3.62), obtaining a new estimate e
10: end while
11: return ē = e

Algorithm 8 monotonically
decreases the cost in (3.57) at each iteration. These properties are inherited from the Majorization-Minimization framework (see Hunter et al. [37] for an in-depth treatment). We
stress that Algorithm 8 does not prescribe any method to perform exact inference on each tree T. This flexibility is also found in Komodakis et al. [49]. To obtain a distributed algorithm, we can
use a message passing max-sum method which is distributed across nodes.
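The weight transformation above reduces the maximum spanning tree to a minimum spanning tree; equivalently, a Kruskal-type method can simply scan edges from heaviest to cheapest. A compact centralized sketch (function name ours; a distributed implementation would follow [50]):

```python
def max_spanning_tree(n, weighted_edges):
    """Kruskal-style maximum spanning tree on n nodes; equivalent to running
    a minimum spanning tree algorithm on w' = max(w) - w + 1."""
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):  # heaviest first
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Edges as (weight, i, j); the cheapest edge (1.0, 1, 2) closes a cycle and is broken
edges = [(5.0, 0, 1), (1.0, 1, 2), (4.0, 0, 2), (2.0, 2, 3)]
tree = max_spanning_tree(4, edges)
```

Here the tree keeps the three heaviest edges and breaks the cheapest one, mirroring the edge-breaking rule of Algorithm 8.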
As with all Majorization-Minimization algorithms, Algorithm 8 requires an initialization. Our
strategy is to run an initial iteration of Algorithm 8 where, instead of solving problem (3.61), we
solve the least squares problem

minimize_{at,au} ‖at1⊤ + 1au⊤ − Θtu‖²F

for each edge t ∼ u. As in (3.61), we are looking for the best fit between a separable matrix and
the edge potential matrix Θtu; but, as the separable matrix will only be used as an initializing step, it
neither majorizes the edge potential matrix nor obeys the tightness requirement.
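This least squares problem has a classical closed form: the fitted separable entry is the row mean plus the column mean minus the overall mean of Θtu (one representative of the solution set, which is invariant to a constant shift between at and au). A sketch:

```python
import numpy as np

# Least-squares fit of a_t 1' + 1 a_u' to Theta_tu, two-way closed form:
# fitted entry (i, j) = rowmean_i + colmean_j - overall mean.
rng = np.random.default_rng(0)
Theta = rng.uniform(0.0, 1.0, size=(4, 5))   # a generic edge potential matrix

mu = Theta.mean()
a_t = Theta.mean(axis=1) - mu / 2            # split the overall mean evenly
a_u = Theta.mean(axis=0) - mu / 2
fit = a_t[:, None] + a_u[None, :]

# Stationarity check: the residual has zero row and column sums
residual = Theta - fit
```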
3.4.6.C Distributed nature of the algorithm
Algorithm 8 is distributed because, at each edge, the intervening nodes can agree on which one
will perform Step 3; the computing node, say t, will then communicate with the neighbor u to
pass wt∼u and au, thus enabling the distributed computation of the maximum spanning tree T. Inference in Step 9 is also distributed, as mentioned earlier.
3.4.7 Experimental results
In this Section we present numerical experiments assessing the quality of the proposed algo-
rithms in the context of the localization problem.
Methods
We conducted simulations with several uniquely localizable geometric networks with 5 sensors
randomly distributed in a two-dimensional square of size 1 × 1. The discretization was random,
with 13 elements in each alphabet.
The noisy range measurements are generated according to
dij = |‖x⋆i − x⋆j‖ + νij|,   rik = |‖x⋆i − ak‖ + νik|, (3.64)

where x⋆i is the true position of node i, and {νij : i ∼ j ∈ E} ∪ {νik : i ∈ V, k ∈ Ai} are
independent Gaussian random variables with zero mean and standard deviation σ. The accuracy
of the algorithms is measured by the original nonconvex cost value in (3.55) and by the mean
positioning error, defined as
MPE = (1/M) ∑_{m=1}^{M} ∑_{i=1}^{n} ‖xi(m) − x⋆i‖, (3.65)

where M is the total number of Monte Carlo trials, xi(m) is the estimate generated by an algorithm
at the Monte Carlo trial m, and x⋆i is the true position of node i.
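The error metric (3.65) can be computed as follows (function name ours; the per-trial estimates are stacked as n × 2 arrays):

```python
import numpy as np

def mean_positioning_error(estimates, true_positions):
    """MPE as in (3.65): average over trials of the summed per-node errors."""
    M = len(estimates)
    return sum(np.linalg.norm(est - true_positions, axis=1).sum()
               for est in estimates) / M

# Two nodes, two trials, every coordinate off by 0.1 in each trial
x_true = np.array([[0.0, 0.0], [1.0, 1.0]])
trials = [x_true + 0.1, x_true - 0.1]
mpe = mean_positioning_error(trials, x_true)
```

Each node error is 0.1·√2, so the MPE here is 0.2·√2 ≈ 0.283.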
3.4.7.A Linear and semidefinite relaxations
The first experiment aimed at comparing the performance of the linear and semidefinite re-
laxations. In Figure 3.13, we can observe the average nonconvex cost over Monte Carlo trials,
Figure 3.13: Average cost over Monte Carlo trials.
stabilizing at 1.745 for the LP and 1.478 for the SDP relaxation. Thus, we obtain an improvement of 15% by using the proposed tighter relaxation. The mean positioning error depicted in
Figure 3.14: Mean positioning error per sensor over Monte Carlo trials.
Figure 3.14 also shows that, as expected, a tighter relaxation can perform better in terms of
accuracy. The rank of the solution matrix in the experiments also shows the superiority of our
Figure 3.15: Rank of the solution matrix E in the tested Monte Carlo trials.
relaxation in accuracy: in all trials the SDP solution has rank 1, proving that the solution of the
SDP problem in (3.58), with the restriction (3.60), is the solution to the nonconvex, combinatorial
problem (3.57). The LP relaxation, on the other hand, is very loose in most of the trials, as seen in
Figure 3.15. The price to pay for the accuracy gains is a noticeable increase in execution time,
from less than a second for the LP to several minutes for the SDP.
Algorithm 9 Coordinate descent algorithm
Input: Initialization x
Output: Estimate x̄
1: while some stopping criterion is not met do
2:   for t ∈ V do
3:     Compute the cost C for all elements of the alphabet Xt, considering xV−t fixed
4:     xt = argmin_y C(x1, · · · , xt−1, y, xt+1, · · · , xn)
5:   end for
6: end while
7: return x̄ = x
Table 3.6: Cost values per sensor.

LP       MM+LS    MM+LP    CD+LP
0.3617   0.0358   0.0788   0.0719
3.4.7.B Distributed majorization-minimization
In a second experiment, we compared the performance of our Algorithm 8 with a vanilla coordinate descent procedure, described in Algorithm 9. We ran our MM algorithm initialized as
explained at the end of Section 3.4.6.B (MM+LS), and both our MM algorithm and the coordinate
descent method initialized with an LP estimate (MM+LP and CD+LP). Measurements
were contaminated with white Gaussian noise with standard deviation σ = 0.01. The resulting
cost values per sensor are shown in Table 3.6. Here we can see that all refinement strategies improve on the initialization score, but our combination of Majorization-Minimization with least squares
initialization (MM+LS) decreases the cost by one order of magnitude.
3.4.8 Summary
We addressed the sensor network localization problem with a graphical model approach, by
proposing an SDP relaxation which outperforms the standard LP approximation. Also, we pro-
posed a distributed iterative algorithm to estimate the solution of the problem by means of the
Majorization-Minimization framework, by judiciously choosing spanning trees where inference
could be exactly performed. This method outperformed the coordinate descent method initial-
ized with the LP estimate.
4. Robust algorithms for sensor network localization
Contents
4.1 Related work and contributions . . . 78
4.2 Discrepancy measure . . . 79
4.3 Convex underestimator . . . 79
4.3.1 Approximation quality of the convex underestimator . . . 80
4.4 Numerical experiments . . . 82
4.5 Summary . . . 83
In practice, network applications have to deal with failing nodes, malicious attacks, or nodes
otherwise facing highly corrupted data — generally classified as outliers. This calls for robust,
uncomplicated, and efficient methods. We propose a dissimilarity model for network localization
which is robust to high-power noise, but also discriminative in the presence of regular Gaussian
noise. We capitalize on the known properties of the M-estimator Huber penalty function to obtain
a robust, but nonconvex, problem, and devise a convex underestimator, tight in the function terms,
that can be minimized in polynomial time. Simulations show the performance advantage of using
this dissimilarity model in the presence of outliers and under regular Gaussian noise: our proposal
consistently outperforms the L1 norm, roughly halving the positioning error.
4.1 Related work and contributions
Some approaches to robust localization rely on identifying outliers from regular data. Then,
outliers are removed from the estimation of sensor positions. The work in [4] formulates the
network localization problem as an inference problem in a graphical model. To approximate an
outlier process the authors add a high-variance Gaussian to the Gaussian mixtures and employ
nonparametric belief propagation to approximate the solution. In the same vein, [51] employs the
EM algorithm to jointly estimate outliers and sensor positions. Recently, the work [52] tackled
robust localization with estimation of positions, mixture parameters, and outlier noise model for
unknown propagation conditions.
Alternatively, methods may perform a soft rejection of outliers, still allowing them to con-
tribute to the solution. In the work [6] a maximum likelihood estimator for Laplacian noise was
derived and subsequently relaxed to a convex program by linearization and dropping a rank
constraint. The authors in [39] present a robust Multidimensional Scaling method based on the
least-trimmed squares criterion, minimizing the sum of the smallest squared residuals. In [5] the authors use the Huber
loss [53] composed with a discrepancy between measurements and estimate distances, in order to
achieve robustness to outliers. The resulting cost is nonconvex, and optimized by means of the
Majorization-Minimization technique.
The cost function we present incorporates outliers into the estimation process and does not
assume any outlier model. We capitalize on the robust estimation properties of the Huber function
but, unlike [5], we do not address the nonconvex cost in our proposal. Instead, we produce a convex
relaxation which numerically outperforms other natural formulations of the problem.
We present a tight convex underestimator to each term of the robust discrepancy measure for
sensor network localization. Further, we analyze its tightness and compare it with other discrepancy
measures and appropriate relaxations. Our approach assumes no specific outlier model, and all
measurements contribute to the estimate. Numerical simulations illustrate the quality of the convex
underestimator.
4.2 Discrepancy measure
The maximum-likelihood estimator for the sensor positions with additive i.i.d. Gaussian noise
contaminating range measurements is the solution of the optimization problem
$$\operatorname*{minimize}_{x}\; f_{\mathcal{G}}(x),$$

where

$$f_{\mathcal{G}}(x) = \sum_{i \sim j} \frac{1}{2}\left(\|x_i - x_j\| - d_{ij}\right)^2 + \sum_{i}\sum_{k \in \mathcal{A}_i} \frac{1}{2}\left(\|x_i - a_k\| - r_{ik}\right)^2$$
is the cost in (1.1). However, outlier measurements will heavily bias the solutions of the optimization
problem, since their magnitude is amplified by the quadratic loss $h_Q(t) = t^2$ in each outlier
term. From robust estimation we know some alternatives that perform soft rejection of outliers,
namely the L1 loss $h_{|\cdot|}(t) = |t|$ or the Huber loss
$$h_R(t) = \begin{cases} t^2 & \text{if } |t| \le R, \\ 2R|t| - R^2 & \text{if } |t| > R. \end{cases} \qquad (4.1)$$
The Huber loss joins the best of two worlds: It is robust for large values of the argument — like
the L1 loss — and for reasonable noise levels it behaves quadratically, thus leading to the maximum-
likelihood estimator adapted to regular Gaussian noise. Figure 4.1 depicts a one-dimensional
example of these different costs. We can observe in this simple example the main properties
of the different cost functions, in terms of adaptation to low/medium-power Gaussian noise and
high-power outlier spikes. Using (4.1) we can write our modified robust localization problem as
$$\operatorname*{minimize}_{x}\; f_R(x), \qquad (4.2)$$

where

$$f_R(x) = \sum_{i \sim j} \frac{1}{2} h_{R_{ij}}\!\left(\|x_i - x_j\| - d_{ij}\right) + \sum_{i}\sum_{k \in \mathcal{A}_i} \frac{1}{2} h_{R_{ik}}\!\left(\|x_i - a_k\| - r_{ik}\right). \qquad (4.3)$$
This function is nonconvex and, in general, difficult to minimize. We shall provide a convex
underestimator that tightly bounds each term of (4.3), thus leading to better estimation results
than other relaxations which are not tight [3].
4.3 Convex underestimator
To convexify fR we can replace each term by its convex hull as depicted in Figure 4.2. Here,
we observe that the high-power behavior is maintained, whereas the medium/low-power is only
Figure 4.1: The different cost functions under consideration: the maximum-likelihood term for independent white Gaussian noise, $f_Q(x_i, x_j) = (\|x_i - x_j\| - d_{ij})^2$, shows the steepest tails, which act as outlier amplifiers; the L1 loss, $f_{|\cdot|}(x_i, x_j) = |\|x_i - x_j\| - d_{ij}|$, associated with impulsive noise, fails to model the Gaussianity of regular operating noise; finally, the Huber loss, $f_R(x_i, x_j) = h_R(\|x_i - x_j\| - d_{ij})$, combines robustness to high-power outliers and adaptation to medium-power Gaussian noise.
altered in the convexified area. We define the convex costs by composition of any of the convex
functions $h$ with the nondecreasing function $(\cdot)_+$,

$$(t)_+ = \max\{0, t\},$$

which, in turn, transforms the discrepancies

$$\delta_{ij}(x) = \|x_i - x_j\| - d_{ij}, \qquad \delta_{ik}(x_i) = \|x_i - a_k\| - r_{ik}.$$
As (δij(x))+ and (δik(x))+ are nondecreasing and each one of the functions h is convex, then
$$\hat f_R(x) = \sum_{i \sim j} \frac{1}{2} h\!\left(\left(\|x_i - x_j\| - d_{ij}\right)_+\right) + \sum_{i}\sum_{k \in \mathcal{A}_i} \frac{1}{2} h\!\left(\left(\|x_i - a_k\| - r_{ik}\right)_+\right) \qquad (4.4)$$
is also convex.
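A numerical sketch of one pairwise term of (4.3) and its convexified counterpart in (4.4) may clarify the construction; the helper names and sample positions are hypothetical:

```python
import numpy as np

def huber(t, R):
    """Huber loss h_R(t) of (4.1)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= R, t**2, 2 * R * np.abs(t) - R**2)

def hinge(t):
    """(t)_+ = max{0, t}."""
    return np.maximum(0.0, t)

def original_term(xi, xj, dij, R):
    """One pairwise term of the nonconvex cost (4.3)."""
    return 0.5 * float(huber(np.linalg.norm(xi - xj) - dij, R))

def convexified_term(xi, xj, dij, R):
    """The matching term of the underestimator (4.4): h_R applied to the
    clipped discrepancy (||xi - xj|| - dij)_+."""
    return 0.5 * float(huber(hinge(np.linalg.norm(xi - xj) - dij), R))

xi, xj = np.array([0.0, 0.0]), np.array([1.0, 0.0])
# The terms coincide when ||xi - xj|| >= dij ...
assert convexified_term(xi, xj, 0.8, 0.1) == original_term(xi, xj, 0.8, 0.1)
# ... and the convexified term underestimates otherwise.
assert convexified_term(xi, xj, 1.5, 0.1) <= original_term(xi, xj, 1.5, 0.1)
```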
4.3.1 Approximation quality of the convex underestimator
The quality of the convexified quadratic problem was addressed in Section 2.5.1, which we
summarize here for convenience of the reader and extend to the two other convex problems.
Figure 4.2: All functions $\hat f$ are tight underestimators of the functions $f$ in Figure 4.1. They are the convex envelopes and, thus, the best convex approximations to each of the original nonconvex cost terms. The convexification is performed by restricting the arguments to be nonnegative.
The optimal value of the nonconvex $f$, denoted by $f^\star$, is bounded by

$$\hat f^\star = \hat f(\hat x^\star) \le f^\star \le f(\hat x^\star),$$

where $\hat x^\star$ is the minimizer of the convex underestimator $\hat f$, and

$$\hat f^\star = \min_x \hat f(x)$$

is the minimum of the function $\hat f$. A bound for the optimality gap is, thus,

$$f^\star - \hat f^\star \le f(\hat x^\star) - \hat f^\star.$$
It is evident that in all cases (quadratic, Huber, and absolute value) $\hat f$ equals $f$ when $\|x_i - x_j\| \ge d_{ij}$ and $\|x_i - a_k\| \ge r_{ik}$. When the function terms differ, say for all edges $i \sim j \in \mathcal{E}_2 \subset \mathcal{E}$, we have $\left(\|\hat x_i^\star - \hat x_j^\star\| - d_{ij}\right)_+ = 0$, and similarly for the anchor terms, leading to
$$f_Q^\star - \hat f_Q^\star \le \sum_{i \sim j \in \mathcal{E}_2} \frac{1}{2}\left(\|\hat x_i^\star - \hat x_j^\star\| - d_{ij}\right)^2 \qquad (4.5)$$

$$f_{|\cdot|}^\star - \hat f_{|\cdot|}^\star \le \sum_{i \sim j \in \mathcal{E}_2} \frac{1}{2}\left|\|\hat x_i^\star - \hat x_j^\star\| - d_{ij}\right| \qquad (4.6)$$

$$f_R^\star - \hat f_R^\star \le \sum_{i \sim j \in \mathcal{E}_2} \frac{1}{2} h_{R_{ij}}\!\left(\|\hat x_i^\star - \hat x_j^\star\| - d_{ij}\right), \qquad (4.7)$$

where

$$\mathcal{E}_2 = \left\{\, i \sim j \in \mathcal{E} : \|\hat x_i^\star - \hat x_j^\star\| < d_{ij} \,\right\}.$$
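Once a solver has returned an estimate, the a posteriori Huber bound (4.7) can be evaluated in a few lines; this sketch assumes a common Huber parameter R for all edges and uses hypothetical data structures:

```python
import numpy as np

def huber(t, R):
    """Huber loss h_R(t) of (4.1)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= R, t**2, 2 * R * np.abs(t) - R**2)

def huber_gap_bound(x_hat, edges, d, R):
    """A posteriori bound (4.7): sum the Huber terms over the edges in
    E2, i.e., those where ||x_i - x_j|| < d_ij at the convex solution.

    x_hat : dict node -> estimated position (solution of the convex problem)
    edges : list of (i, j) pairs
    d     : dict (i, j) -> measured range d_ij
    R     : Huber parameter, taken common to all edges here
    """
    bound = 0.0
    for (i, j) in edges:
        delta = np.linalg.norm(x_hat[i] - x_hat[j]) - d[(i, j)]
        if delta < 0:  # edge in E2: terms of f and its underestimator differ
            bound += 0.5 * float(huber(delta, R))
    return bound

# Toy check: with a single contracted edge the bound is one Huber term.
x_hat = {0: np.array([0.0, 0.0]), 1: np.array([0.5, 0.0])}
assert np.isclose(huber_gap_bound(x_hat, [(0, 1)], {(0, 1): 0.6}, R=0.2),
                  0.5 * 0.1**2)
```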
Table 4.1: Bounds on the optimality gap for the example in Figure 4.3

Cost             $f^\star - \hat f^\star$   Eqs. (4.5)-(4.7)   Eqs. (4.8)-(4.10)
Quadratic        3.7019                     5.5250             11.3405
Absolute value   1.1416                     1.1533             3.0511
Robust Huber     0.1784                     0.1822             0.4786
These bounds are an optimality gap guarantee available after the convexified problem is solved; they tell us how low our estimates can bring the original cost. Our bounds are tighter than the ones available a priori from applying [26, Th. 1], which are
$$f_Q^\star - \hat f_Q^\star \le \sum_{i \sim j} \frac{1}{2} d_{ij}^2 \qquad (4.8)$$

$$f_{|\cdot|}^\star - \hat f_{|\cdot|}^\star \le \sum_{i \sim j} \frac{1}{2} d_{ij} \qquad (4.9)$$

$$f_R^\star - \hat f_R^\star \le \sum_{i \sim j} \frac{1}{2} h_{R_{ij}}(d_{ij}). \qquad (4.10)$$
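The a priori bounds, by contrast, need only the measurements themselves; a sketch, again with a common Huber parameter and illustrative ranges:

```python
import numpy as np

def huber(t, R):
    """Huber loss h_R(t) of (4.1)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= R, t**2, 2 * R * np.abs(t) - R**2)

def apriori_bounds(d, R):
    """A priori gap bounds (4.8)-(4.10): available before solving, they
    depend only on the measured ranges d_ij and the Huber parameter."""
    d = np.asarray(list(d.values()), dtype=float)
    return {
        "quadratic": 0.5 * np.sum(d**2),    # (4.8)
        "absolute": 0.5 * np.sum(d),        # (4.9)
        "huber": 0.5 * np.sum(huber(d, R)), # (4.10)
    }

b = apriori_bounds({(0, 1): 0.6, (1, 2): 0.8}, R=0.1)
# Since h_R(t) <= t^2, the Huber bound never exceeds the quadratic one.
assert b["huber"] <= b["quadratic"]
```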
For the one-dimensional example of the star network costs depicted in Figure 4.3 the bounds
in (4.5)-(4.7), and (4.8)-(4.10), averaged over 500 Monte Carlo trials, are presented in Table 4.1.
The true average gap $f^\star - \hat f^\star$ is also shown. In the Monte Carlo trials we sampled a set of zero
mean Gaussian random variables with σ = 0.04 for the baseline Gaussian noise and obtained
a noisy range measurement as in (4.11). One of the measurements is then corrupted by a zero
mean random variable with σ = 4, modelling outlier noise. These results show the tightness of
the convexified function under such noisy conditions and also demonstrate the behaviour of the a
priori bounds in (4.8)-(4.10).
4.4 Numerical experiments
We assess the performance of the three considered loss functions through simulation. The
experimental setup consists of a uniquely localizable geometric network deployed in a square area
with side 1 km, with four anchors (blue squares in Figure 4.4) located at the corners, and ten
sensors (red stars). Measurements are also visible as dotted green lines. The average node degree
of the network is 4.3. The regular noisy range measurements are generated according to
$$d_{ij} = \left|\,\|x_i^\star - x_j^\star\| + \nu_{ij}\,\right|, \qquad r_{ik} = \left|\,\|x_i^\star - a_k\| + \nu_{ik}\,\right|, \qquad (4.11)$$

expressed in km, where $x_i^\star$ is the true position of node $i$, and $\{\nu_{ij} : i \sim j \in \mathcal{E}\} \cup \{\nu_{ik} : i \in \mathcal{V},\, k \in \mathcal{A}_i\}$ are independent Gaussian random variables with zero mean and standard deviation 0.04, corresponding to an uncertainty of about 40 m. Node 7 is malfunctioning and all measurements related to it are perturbed with Gaussian noise with standard deviation 4, corresponding to an uncertainty of 4 km. The convex optimization problems were solved with cvx [54]. We ran 100 Monte Carlo trials, sampling both regular and outlier noise.
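The measurement generation in (4.11), with one faulty node, can be sketched as follows; the positions, seed, and fully connected edge set are illustrative (the experiment in the text uses a geometric network):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_range(p, q, sigma):
    """One range measurement per (4.11): |true distance + Gaussian noise|."""
    return abs(np.linalg.norm(p - q) + rng.normal(0.0, sigma))

# Hypothetical deployment in a 1 km x 1 km square (positions in km);
# sigma_reg is the regular noise level and sigma_out the faulty-node level.
positions = {i: rng.uniform(0.0, 1.0, size=2) for i in range(10)}
sigma_reg, sigma_out, faulty = 0.04, 4.0, 7

# Fully connected edge set for illustration only.
edges = [(i, j) for i in range(10) for j in range(i + 1, 10)]
d = {(i, j): noisy_range(positions[i], positions[j],
                         sigma_out if faulty in (i, j) else sigma_reg)
     for (i, j) in edges}

# Every generated range is a valid nonnegative measurement.
assert all(v >= 0 for v in d.values())
```

The absolute value in (4.11) is what keeps the simulated ranges physically meaningful even when a large outlier noise sample would otherwise make them negative.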
Table 4.2: Average positioning error per sensor (MPE/sensor), in meters

$f_{|\cdot|}$   $f_Q$   $f_R$
59.50           32.16   31.06
Table 4.3: Average positioning error per sensor (MPE/sensor), in meters, for the biased experiment

$f_{|\cdot|}$   $f_Q$   $f_R$
80.98           58.31   47.08
The performance metric used to assess accuracy is the average positioning error defined as (3.65).
In Figure 4.4 we can observe that clouds of estimates from fR and fQ gather around the true po-
sitions, except for the malfunctioning node 7. Note the spread of blue dots in the surroundings
of the edges connecting node 7, indicating that fR better preserves the nodes’ ability to localize
themselves, despite their confusing neighbor, node 7. This intuition is confirmed by the analysis of
the data in Table 4.2, which demonstrates that, even with only one disrupted sensor, our robust
cost can reduce the error per sensor by 1.1 meters. Also, as expected, the malfunctioning node
cannot be reliably located by any of the methods. The sensitivity to the value of the Huber param-
eter R in (4.1) is moderate, as shown in Figure 4.5. In fact, the error per sensor of the proposed
estimator is always the smallest for all tested values of the parameter. We observe that the error
increases when R approaches the standard deviation of the regular Gaussian noise, meaning that
the Huber loss gets closer to the L1 loss and, thus, is no longer adapted to the regular noise (R = 0
corresponds exactly to the L1 loss); in the same way, as R increases, so does the quadratic section,
and the estimator gets less robust to outliers, so, again, the error increases.
Another interesting experiment is to see what happens when the faulty sensor produces mea-
surements with consistent errors or bias. So, we ran 100 Monte Carlo trials in the same setting, but
the measurements of node 7 are consistently 10% of the real distance to each neighbor. The average
positioning error per sensor is shown in Table 4.3. Here we observe a significant performance gap
between the alternative costs, and our formulation proves to be, by far, superior.
4.5 Summary
We proposed an easy to motivate and effective dissimilarity model, which accounts for outliers
without prescribing a model for outlier noise. This dissimilarity model was convexified by means
of the convex envelopes of its terms, leading to a problem with a unique minimum value attainable
in polynomial time. Further, we studied the optimality gap of the discrepancies, both a priori and
after obtaining an estimate, thus providing bounds for the suboptimality of the convexification —
guarantees useful in practice.
Different types of algorithms can be designed to attack the discrepancy measure presented in
this work, since the function is continuous and convex (in the previous section the optimization
problem of minimizing (4.4) was solved using the cvx general-purpose convex solver). Due to the
distributed nature of networks of sensors or, generically, agents, we aim at investigating a dis-
tributed minimization of the proposed robust loss. There are also several nice properties regarding
distributed operation: the adjustable Huber parameter is local to each edge and, if desired, can be
dynamically adjusted to the local environmental noise conditions, in a distributed manner.
(a) Quadratic cost. (b) Absolute value cost. (c) Robust Huber cost.
Figure 4.3: One-dimensional example of the quality of the approximation of the true nonconvex costs $f(x)$ by the convexified functions $\hat f(x)$ in a star network. The node positioned at $x = 3$ has 3 neighbors.
Figure 4.4: Estimates of sensor positions for the three loss functions. We plot the results of minimizing the L1 loss $f_{|\cdot|}$ (yellow), the quadratic loss $f_Q$ (blue), and the proposed robust estimator with Huber loss $f_R$ (magenta). It is noticeable that the L1 loss is unable to correctly estimate positions whose measurements are corrupted with Gaussian noise. The outlier measurements at node 7 cause a larger dispersion of blue dots than of magenta dots around its neighbors.
Figure 4.5: Average positioning error versus the value of the Huber function parameter R. The accuracy is maintained over a wide range of parameter values. We stress that the error will increase sharply as R → 0 and R → ∞, since these situations correspond to the L1 and L2 cases, respectively.
5. Conclusions and perspectives
Contents
5.1 Distributed network localization without initialization . . . 88
5.2 Addressing the nonconvex problem . . . 88
5.2.1 With more computations we can do better . . . 89
5.2.2 Network of agents as a graphical model . . . 89
5.3 Robust network localization . . . 90
5.4 In summary . . . 90
In this thesis we presented a flow of methods, from the initialization-free convex relaxation
method, which only requires knowledge of noisy measurements and a few anchor locations, to more
precise algorithms that, given also a good initial guess, will provide a highly accurate estimate
of the positions of the nodes. We also addressed localization in harsh environments, prone to
outliers, which need especially robust algorithms to overcome the negative influence of corrupted
or malicious data.
5.1 Distributed network localization without initialization
We presented a simple, fast and convergent relaxation method for synchronous and asyn-
chronous time models. From the analysis of the problem, we uncovered key properties which
allow a synchronous time, distributed gradient algorithm with an optimal convergence rate. We
also presented an asynchronous randomized method, more suited for unstructured and large scale
networks. We proved not only almost sure convergence of the cost value, but also almost sure
convergence to a point which, to the best of our knowledge, is a novel result for distributed
gradient algorithms in general. This stronger convergence result has a significant impact in real
time applications because nodes can safely probe the amount of change in the estimates to stop
computing. The methods were published in IEEE Transactions on Signal Processing. Extending
this work, we interpreted each term of the cost as a discrepancy measure between a model and the
noisy measurement and generalized it to include in the same cost heterogeneous measurements. In
particular, we fused range and angle information, obtaining very interesting results, which were
already submitted for publication.
5.2 Addressing the nonconvex problem
In some applications it is fundamental to obtain a very accurate estimate of the positions of the
agents. Sometimes the convex approximation discussed above does not achieve these tight precision
requirements. In such cases one should address the nonconvex estimator problem directly, whenever
armed with a good starting point — for example, the convex approximation solution. With this
need in mind, we presented a simple, distributed, and efficient algorithm, proven to converge1,
requiring no parameter tuning. The method turns out to be a member of the majorization-
minimization family where the majorization function is a quadratic.
An alternative to the majorization-minimization framework, initialized with, e.g., the estimate
1More precisely, every limit point of the algorithm is a stationary point of the cost function.
from the methods described in Chapter 2, could be a homotopy continuation method (see Allgower [55] for in-depth information on homotopy methods). The downside of such methods is that
they can become very difficult when applied to nonconvex functions, e.g., if we maintain the acces-
sibility condition that all isolated solutions can be reached. This might lead to bifurcations of the
method and, so, a combinatorial problem. Nevertheless, Moré and Wu [56] propose a smoothing
Gaussian kernel, also used in the paper by Destino and Abreu [57] for localization by continuation.
The first work addresses a squared discrepancy with squared distances, whereas the second applies
the same smoothing kernel to the maximum likelihood formulation.
Experimentally, our method has substantial performance improvements over the state of the
art and, adding the convergence properties, the algorithm stands in the small club of distributed,
provably convergent estimators for the nonconvex maximum-likelihood network localization problem,
given a good starting point. We presented the method in IEEE GlobalSIP 2014.
5.2.1 With more computations we can do better
In order to be less dependent on the initialization, we aimed at a tighter convex majorization
function as the tool for a majorization-minimization algorithm. The quality of the estimate from
the MM procedure with this novel, tighter approximation was experimentally verified to improve
the root mean squared error by more than one order of magnitude. This remarkable result encourages
us to pursue a tailored minimization algorithm for this tighter majorizer. A first attempt was to
follow the Alternating Direction Method of Multipliers (ADMM) strategy, but, even though we
obtained a distributed method with far better accuracy than the benchmark, we were not satisfied
with the amount of communication expended in the process, a drawback commonly observed in
distributed methods using ADMM. Also, as the ADMM subproblems at each node did not have
closed-form solutions, the overall estimate degraded sharply with even a small degradation in the
solutions of the subproblems at each node, thus leading to a less interesting performance than that
of our previously mentioned work. Our next step is to devise a proximal algorithm, taking into
account the non-differentiability of our novel majorizer.
5.2.2 Network of agents as a graphical model
Another perspective on the localization problem is to consider the positions of the nodes as
random variables, and the measurement network as an undirected graphical model. This perspec-
tive was explored in Section 3.4 and the known linear relaxation to the resulting combinatorial
problem was re-derived. The resulting formulation of the problem led to a novel SDP relaxation,
tighter than the linear one and, thus, obtaining better experimental results in terms of root mean
squared error. A faster and more accurate descent algorithm with no initialization and attacking
the nonconvex cost was also presented. This novel method improved the error not only of the
linear relaxation but also of a vanilla coordinate descent initialized with the estimate from the
linear relaxation.
5.3 Robust network localization
Sometimes our nodes are malfunctioning or malicious and their collected measurements behave
like outliers. Despite the practical relevance of this problem, research on this topic is still meager.
To bridge this gap, we designed a soft outlier rejection approach, by considering the known outlier
rejecting penalties of the L1 norm and the Huber function. Nevertheless, using these penalties
leads to very difficult, nonconvex problems. We convexified these penalties and the approximated
estimator for the Huber function performed far better in a scenario with malfunctioning nodes than
the approach presented in Chapter 2 and discussed in Section 5.1. The results are very exciting
and our next step is to shape an algorithm to optimize the tight convex approximations in a way
that is distributed, fast and simple to implement.
5.4 In summary
Throughout this work we were dedicated to achieving useful estimates for the agent positions
given a sparse noisy range measurement network and a small set of reference nodes. Here we
presented methods for network localization which are defined by the following principles:
• Full network localization solutions, from uninformed agent deployment to a refined estimate;
• Emphasis on scalable, distributed solutions;
• Novel, tighter, approximations to the Maximum-Likelihood cost function, thus leading to
estimates that are more resilient to noise;
• Intuitive derivations and simple to implement algorithms;
• Fast and reliable estimates;
• Provable convergence of algorithms.
Open perspectives of work include deepening our approaches, but also considering new settings,
like:
• Online optimization variants of the proposed algorithms;
• Applying our approaches to the mobile setting, by introducing dynamics in the problem
formulation;
• Considering that the noise variance is not constant but rather a function of distance measured;
• Determining the optimal placement of anchors and sensor nodes;
• Approaching the robust estimation topic with a Laplacian noise model. Here we can write the
Laplacian distribution as a mixture of Gaussians with the same mean but different variances,
as described in Girosi [58]. In this setting, the expectation-maximization algorithm could be
employed.
Bibliography
[1] N. Bulusu, J. Heidemann, and D. Estrin, “GPS-less low-cost outdoor localization for very small
devices,” Personal Communications, IEEE, vol. 7, no. 5, pp. 28–34, 2000.
[2] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms
for sensor network localization,” ACM Transactions on Sensor Networks (TOSN), vol. 2, no. 2,
pp. 188–220, 2006.
[3] A. Simonetto and G. Leus, “Distributed maximum likelihood sensor network localization,”
Signal Processing, IEEE Transactions on, vol. 62, no. 6, pp. 1424–1437, Mar. 2014.
[4] A. Ihler, J. W. Fisher, R. Moses, and A. Willsky, “Nonparametric belief propagation for
self-localization of sensor networks,” Selected Areas in Communications, IEEE Journal on,
vol. 23, no. 4, pp. 809 – 819, Apr. 2005.
[5] S. Korkmaz and A.-J. van der Veen, “Robust localization in sensor networks with iterative
majorization techniques,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.
IEEE International Conference on, Apr. 2009, pp. 2049 –2052.
[6] P. Oguz-Ekim, J. Gomes, J. Xavier, and P. Oliveira, “Robust localization of nodes and time-
recursive tracking in sensor networks using noisy range measurements,” Signal Processing,
IEEE Transactions on, vol. 59, no. 8, pp. 3930 –3942, Aug. 2011.
[7] C. Soares, J. Xavier, and J. Gomes, “Simple and fast convex relaxation method for coop-
erative localization in sensor networks using range measurements,” Signal Processing, IEEE
Transactions on, vol. 63, no. 17, pp. 4532–4543, Sept 2015.
[8] ——, “Distributed, simple and stable network localization,” in Signal and Information
Processing (GlobalSIP), 2014 IEEE Global Conference on, Dec 2014, pp. 764–768.
[9] ——, “DCOOL-NET: Distributed cooperative localization for sensor networks,” submitted,
http://arxiv.org/abs/1211.7277.
[10] C. Soares and J. Gomes, “Robust dissimilarity measure for network localization,” arXiv
preprint arXiv:1410.2327, 2014.
[11] J. Aspnes, D. Goldenberg, and Y. R. Yang, “On the computational complexity of sensor
network localization,” in Algorithmic Aspects of Wireless Sensor Networks. Springer, 2004,
pp. 32–44.
[12] P. Biswas and Y. Ye, “Semidefinite programming for ad hoc wireless sensor network localiza-
tion,” in Proceedings of the 3rd international symposium on Information processing in sensor
networks. ACM, 2004, pp. 46–54.
[13] J. Costa, N. Patwari, and A. Hero III, “Distributed weighted-multidimensional scaling for
node localization in sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 2,
no. 1, pp. 39–64, 2006.
[14] M. Gholami, L. Tetruashvili, E. Strom, and Y. Censor, “Cooperative wireless sensor network
positioning via implicit convex feasibility,” Signal Processing, IEEE Transactions on, vol. 61,
no. 23, pp. 5830–5840, Dec. 2013.
[15] S. Srirangarajan, A. Tewfik, and Z.-Q. Luo, “Distributed sensor network localization using
SOCP relaxation,” Wireless Communications, IEEE Transactions on, vol. 7, no. 12, pp. 4886
–4895, Dec. 2008.
[16] F. Chan and H. So, “Accurate distributed range-based positioning algorithm for wireless sensor
networks,” Signal Processing, IEEE Transactions on, vol. 57, no. 10, pp. 4100 –4105, Oct. 2009.
[17] U. Khan, S. Kar, and J. Moura, “DILAND: An algorithm for distributed sensor localization
with noisy distance measurements,” Signal Processing, IEEE Transactions on, vol. 58, no. 3,
pp. 1940 –1947, Mar. 2010.
[18] Q. Shi, C. He, H. Chen, and L. Jiang, “Distributed wireless sensor network localization via
sequential greedy optimization algorithm,” Signal Processing, IEEE Transactions on, vol. 58,
no. 6, pp. 3328 –3340, June 2010.
[19] D. Blatt and A. Hero, “Energy-based sensor network source localization via projection onto
convex sets,” Signal Processing, IEEE Transactions on, vol. 54, no. 9, pp. 3614–3619, Sept.
2006.
[20] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms.
Springer-Verlag Limited, 1993.
[21] F. R. Chung, Spectral graph theory. American Mathematical Soc., 1997, vol. 92.
[22] R. B. Bapat, Graphs and matrices. Springer, 2010.
[23] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks. Princeton
University Press, 2010.
[24] Y. Nesterov, “A method of solving a convex programming problem with convergence rate
O(1/k2),” in Soviet Mathematics Doklady, vol. 27, no. 2, 1983, pp. 372–376.
[25] D. Shah, Gossip algorithms. Now Publishers Inc, 2009.
[26] M. Udell and S. Boyd, “Bounding duality gap for problems with separable objective,” ONLINE,
2014.
[27] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Aca-
demic Publishers, 2004.
[28] D. Bertsekas, “Incremental proximal methods for large scale convex optimization,”
Mathematical Programming, vol. 129, pp. 163–195, 2011.
[29] Z. Lu and L. Xiao, “On the complexity analysis of randomized block-coordinate descent meth-
ods,” arXiv preprint arXiv:1305.4723, 2013.
[30] J. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,”
Optimization Methods and Software, vol. 11–12, pp. 625–653, 1999, version 1.05 available
from http://fewcal.kub.nl/sturm.
[31] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods.
Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1989.
[32] D. P. Bertsekas, Nonlinear programming. Athena Scientific, 1999.
[33] J. Jacod and P. Protter, Probability Essentials. Springer, 2003, vol. 1.
[34] H. Robbins and D. Siegmund, “A convergence theorem for non negative almost supermartin-
gales and some applications,” in Herbert Robbins Selected Papers. Springer, 1985, pp.
111–135.
[35] G. Calafiore, L. Carlone, and M. Wei, “Distributed optimization techniques for range local-
ization in networked systems,” in Decision and Control (CDC), 2010 49th IEEE Conference
on, Dec. 2010, pp. 2221–2226.
[36] M. Raydan, “The Barzilai and Borwein gradient method for the large scale unconstrained
minimization problem,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 26–33, 1997.
[37] D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” The American Statistician,
vol. 58, no. 1, pp. 30–37, Feb. 2004.
[38] A. Beck and Y. Eldar, “Sparsity constrained nonlinear optimization: Optimality conditions
and algorithms,” SIAM Journal on Optimization, vol. 23, no. 3, pp. 1480–1509, 2013.
[Online]. Available: http://dx.doi.org/10.1137/120869778
[39] P. Forero and G. Giannakis, “Sparsity-exploiting robust multidimensional scaling,” Signal
Processing, IEEE Transactions on, vol. 60, no. 8, pp. 4118 –4134, Aug. 2012.
[40] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statis-
tical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[41] I. Schizas, A. Ribeiro, and G. Giannakis, “Consensus in ad hoc WSNs with noisy links — Part
I: Distributed estimation of deterministic signals,” Signal Processing, IEEE Transactions on,
vol. 56, no. 1, pp. 350–364, Jan. 2008.
[42] H. Zhu, G. Giannakis, and A. Cano, “Distributed in-network channel decoding,” Signal
Processing, IEEE Transactions on, vol. 57, no. 10, pp. 3970–3983, Oct. 2009.
[43] P. Forero, A. Cano, and G. Giannakis, “Consensus-based distributed support vector machines,”
The Journal of Machine Learning Research, vol. 11, pp. 1663–1707, 2010.
[44] J. Bazerque and G. Giannakis, “Distributed spectrum sensing for cognitive radio networks by
exploiting sparsity,” Signal Processing, IEEE Transactions on, vol. 58, no. 3, pp. 1847–1862,
Mar. 2010.
[45] T. Erseghe, D. Zennaro, E. Dall’Anese, and L. Vangelista, “Fast consensus by the alternating
direction multipliers method,” Signal Processing, IEEE Transactions on, vol. 59, no. 11, pp.
5523–5537, Nov. 2011.
[46] J. Mota, J. Xavier, P. Aguiar, and M. Püschel, “Distributed basis pursuit,” Signal Processing,
IEEE Transactions on, vol. 60, no. 4, pp. 1942–1956, Apr. 2012.
[47] B. D. O. Anderson, I. Shames, G. Mao, and B. Fidan, “Formal theory of noisy sensor network
localization,” SIAM Journal on Discrete Mathematics, vol. 24, no. 2, pp. 684–698, 2010.
[48] M. Wainwright and M. Jordan, “Graphical models, exponential families, and variational in-
ference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.
[49] N. Komodakis, N. Paragios, and G. Tziritas, “MRF energy minimization and beyond via dual
decomposition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33,
no. 3, pp. 531–552, Mar. 2011.
[50] R. G. Gallager, P. A. Humblet, and P. M. Spira, “A distributed algorithm for minimum-weight
spanning trees,” ACM Transactions on Programming Languages and Systems (TOPLAS),
vol. 5, no. 1, pp. 66–77, 1983.
[51] J. Ash and R. Moses, “Outlier compensation in sensor network self-localization via the EM
algorithm,” in Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP ’05).
IEEE International Conference on, vol. 4, Mar. 2005, pp. iv/749–iv/752.
[52] F. Yin, A. Zoubir, C. Fritsche, and F. Gustafsson, “Robust cooperative sensor network local-
ization via the EM criterion in LOS/NLOS environments,” in Signal Processing Advances in
Wireless Communications (SPAWC), 2013 IEEE 14th Workshop on, June 2013, pp. 505–509.
[53] P. J. Huber, “Robust estimation of a location parameter,” The Annals of Mathematical
Statistics, vol. 35, no. 1, pp. 73–101, 1964.
[54] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version
1.21,” http://cvxr.com/cvx, Apr. 2011.
[55] E. L. Allgower and K. Georg, Numerical continuation methods: an introduction. Springer
Science & Business Media, 2012, vol. 13.
[56] J. J. Moré and Z. Wu, “Global continuation for distance geometry problems,” SIAM Journal
on Optimization, vol. 7, no. 3, pp. 814–836, 1997.
[57] G. Destino and G. Abreu, “On the maximum likelihood approach for source and network
localization,” Signal Processing, IEEE Transactions on, vol. 59, no. 10, pp. 4954 –4970, Oct.
2011.
[58] F. Girosi, “Models of noise and robust estimates,” 1991.