Page 1: MAXIMUM-LIKELIHOOD REGISTRATION OF NETWORKScochran.faculty.asu.edu/papers/2016_Howard_Registration... · 2016-03-24 · used to determine a unique o set value to register any pair

MAXIMUM-LIKELIHOOD REGISTRATION OF NETWORKS∗

STEPHEN D. HOWARD† , DOUGLAS COCHRAN‡ , WILLIAM MORAN§ , AND

FREDERICK R. COHEN¶

Abstract. Alignment of measurements collected at the nodes of a network is treated from an estimation-theoretic perspective. Particular attention is given to the role of the topology of the network graph in constraining the maximum-likelihood estimation problem for a global set of inter-node offsets using noisy data, and to elucidating how network topology manifests in the forms of the estimators and their respective Fisher information matrices. More specifically, each node possesses a datum that is an element of a Lie group and each edge in the network graph has a corresponding noisy measurement of the difference between the data values at the nodes it joins. Explicit estimators and their Fisher information are derived for two Lie groups, R^d and the unit circle, and the more general case of a connected abelian Lie group is discussed. In all cases, the estimators are expressed in terms of the distribution of the measurement noise and standard algebraic invariants of the network's graph structure. A relationship between maximum-likelihood estimation in this setting and Kirchhoff's laws on the network graph is pointed out, and distributed estimation algorithms that exploit this relationship are described.

1. Introduction. Registration of data across a network is a ubiquitous problem in distributed sensing. Over more than three decades, much effort has been expended on development of algorithms just to provide time synchronization across a distributed network; e.g., [12, 17, 24]. Synchronization of this kind is important for distributed parallel processing as well as data fusion across a sensor network. It is typically the case that the network is not complete; i.e., each node does not communicate directly with every other node. A large fraction of algorithms described in the literature seek to minimize an error or objective function based on least squares, often within power or other resource constraints. Leaving aside the latter issue, the problem in this setting is to assign a clock adjustment to every node based on knowledge of the clock differences, generally noisy, between some pairs of nodes in the network. Even if clock difference measurements are available for every pair of nodes in the network, the presence of noise still raises consistency considerations [25, 26]; e.g., the true offsets must sum to zero around any closed cycle.

The domain of modern network synchronization problems is by no means limited to clock offsets, nor is the natural measurement space restricted to the real line. Determining the relative positions of sensor nodes in deployed sensor networks is essential in localization and tracking, for example [18]. Moreover, individual nodes may possess multiple data to be registered across the network, and the noise affecting such vector data may be correlated across its components. Recent perspectives on network signal processing consider multi-dimensional data (e.g., [13]) as well as noisy and otherwise imperfect information exchange between nodes (e.g., [20, 28]). Research in this arena has led to sophisticated distributed algorithms for estimation problems

∗This work was supported in part by the U.S. Air Force under grants FA9550-12-1-0225 and FA9550-12-1-0418, the U.S. Army Research Office under MURI grant W911NF-11-1-0391, the Defence Science and Technology Group, Australia, the DARPA SToMP program, and the Australian Research Council.

†Defence Science and Technology Group, PO Box 1500, Edinburgh 5111 SA, Australia.

‡School of Mathematical and Statistical Sciences, Arizona State University, Tempe AZ 85287-1804 USA ([email protected]).

§School of Electrical and Computer Engineering, RMIT University, Melbourne 3000 VIC, Australia ([email protected]).

¶Department of Mathematics, University of Rochester, Rochester NY 14627, USA ([email protected]).



2 S. D. HOWARD, D. COCHRAN, W. MORAN, AND F. R. COHEN

that apply beyond the realm of network alignment [4, 5, 14, 19], but little attention has been given to situations in which the natural measurement space is a Lie group [3, 7] rather than a linear space. In phase synchronization, for example, typical data could be measurements of the phase differences between local oscillators at the nodes. In this case, the natural measurement space is the circle T = R/2πZ rather than the real line R. If several local oscillators are involved, measurements might lie on the torus T^n. Another important practical example where the measurement space is a nonlinear multi-dimensional manifold is registration of local coordinate systems, for which the natural setting is the special orthogonal group SO(3). Even in the context of clocks, if both offset and clock speed are adjustable locally the offsets are elements of the affine group A. These examples illustrate that practical problems can entail data on Lie groups that are compact (e.g., T or SO(3)), non-compact (R^d), abelian (R^d, T), or non-abelian (SO(3) or A).

It is common to represent networks by graphs. In this setting, the network nodes that provide data to be registered or synchronized are represented by vertices labeled with their associated parameters, such as local clock time or local coordinate system. Each pair of vertices corresponding to a pair of nodes that are in direct communication is joined by an edge. Information is shared between vertices along such edges, each of which is labeled by a noisy measurement of the difference between the parameters at the end vertices of the edge. The notion of difference between two parameter values depends on the algebraic structure of the parameter space. In R^d, for example, it is defined by subtraction of vectors. In other spaces, which are assumed to be Lie groups, difference is defined in terms of the group operation. The estimation problem on which this paper focuses arises precisely because the difference values that label the edges are corrupted by noise in a sense that will be made precise later in the paper. As a consequence of this corruption, the edge labels in any graph with cycles will generally be inconsistent; i.e., the edge labels around closed cycles will not sum to zero, even though the true difference values must do so. For a connected graph, the desired synchronization should provide a set of consistent edge labels that can be used to determine a unique offset value to register any pair of vertices in the graph. If one vertex label is known, this is equivalent to assigning labels to all other vertices consistently throughout the graph.
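This cycle-consistency failure is easy to see numerically. The following sketch uses a hypothetical three-node graph, offset values, and noise level (the paper itself contains no code; NumPy is an implementation convenience):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vertex labels (e.g., clock offsets) on a directed 3-cycle
# v0 -> v1 -> v2 -> v0.
x = np.array([0.0, 1.3, -0.4])
edges = [(0, 1), (1, 2), (2, 0)]

# True edge labels: label at the end vertex minus label at the start vertex.
omega = np.array([x[v] - x[u] for u, v in edges])
cycle_sum_true = omega.sum()          # exactly zero by construction

# Noisy measurements generally violate this consistency condition.
r = omega + rng.normal(scale=0.1, size=len(edges))
cycle_sum_noisy = r.sum()             # nonzero with probability one
```

The estimation problem developed below is precisely to recover a consistent set of labels from such inconsistent measurements.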

The first goal of this paper is to frame a class of network synchronization problems encompassed by the model just described in terms of statistical estimation theory. The second goal is to describe the estimation-theoretic limits of these problems in terms of Fisher information and to quantify the role of the topology of the network graph in these limits. In the process, ML estimators are derived for a significant subclass of these problems, and interpretation of these estimators in terms of Kirchhoff's laws for the graph suggests how they can be realized by local algorithms. Specifically, these estimation problems entail estimation of the vertex labels (parameters) from the edge labels (noisy data). This paper focuses upon two cases in which the noise models are classical: Gaussian noise on R^d and von Mises distributed noise on T. Between them, these cases encompass most of the elements encountered in the situation where the measurements and noise are on connected abelian Lie groups, which is the most general setting addressed in this paper.

In each form of the estimation problem treated here, the Fisher information and ML estimator are derived in closed forms that depend both on the distribution of the noise and on the structure of the graph in intuitively appealing ways. In each of these cases, it is observed that the ML estimator is non-local in the sense that the offsets for a node relative to its neighbors cannot be estimated at that node using only



information obtained from its neighbors. Nevertheless, it is possible to find iterative (gradient descent) algorithms that are local and do converge to an ML estimator when they converge, which happens consistently in empirical tests.

Interesting problems associated with local maxima of the likelihood function that arise in the setting of compact Lie groups are discussed in connection with the circle case treated here. The value of the Fisher information and functions thereof (e.g., its determinant) in designing networks that support accurate registration is also discussed.

Structurally, Section 2 begins with a synopsis of the essential concepts of graph theory and the associated notation that will be used throughout the paper. Section 3 treats the case of Gaussian noise on R^d, with local estimators for this case developed in Section 4. Section 5 covers the case of von Mises noise on T, and Section 6 discusses the situation where the data reside in a connected abelian Lie group. Finally, some concluding remarks appear in Section 7.

2. Graph Theory Preliminaries. The aspects of elementary graph theory needed in the remainder of this paper are covered in many standard references, such as [2]. This section establishes specific graph-theoretic notation and terminology as they are used throughout the remainder of the paper. In what follows, a graph Γ will be assumed to have a finite vertex set V(Γ). The collection of edges in Γ will be denoted by E(Γ), where the edge e = (u, v) joins vertices u and v. If the graph is directed then edge (u, v) starts at u and ends at v. If the graph is not directed then (u, v) ≡ (v, u). Unless otherwise noted, graphs in this paper will be directed, and the values labeling the edges will correspond to the difference between the label on the vertex at the end of the edge and the label on the vertex at the start of the edge. At many places in the paper, however, the graph orientation is irrelevant.

In the following definitions of spaces of functions on V(Γ) and E(Γ), the functions are assumed to be real-valued, as in the clock synchronization example. Later these definitions will be extended to cover other possible parameter spaces. The vertex space C0(Γ) of a graph Γ is the real vector space of functions V(Γ) → R; elements of C0(Γ) are vectors of real numbers indexed by the vertices. Similarly, the edge space C1(Γ) is the real vector space of functions E(Γ) → R. The number of vertices in Γ will be denoted by |V(Γ)| = n and the number of edges by |E(Γ)| = m. It will be convenient to fix an ordering on the sets V and E in order to construct bases for C0(Γ) and C1(Γ). Denoting V(Γ) = {v_1, . . . , v_n}, a basis for C0(Γ) is obtained by defining functions v_1, . . . , v_n according to v_i(v_j) = δ_ij for i, j = 1, . . . , n, so that any element of C0(Γ) can be written

x = ∑_{i=1}^n x_i v_i.

C0(Γ) is an inner product space with inner product 〈·, ·〉_{C0} such that 〈v_i, v_j〉_{C0} = δ_ij, the Kronecker delta

δ_ij = { 1 if i = j; 0 if i ≠ j }.

With this inner product and its associated norm ‖·‖_{C0}, the functions v_1, . . . , v_n form an orthonormal basis of the real inner product space C0(Γ). The linear map A : C0(Γ) → C0(Γ) defined



by

A v_ℓ = ∑_{v_j ∼ v_ℓ} v_j,

where v_j ∼ v_ℓ indicates that there is an edge connecting v_j and v_ℓ, is called the (undirected) adjacency map. The corresponding matrix in the v_ℓ basis is called the adjacency matrix.

The degree d_v of a vertex v is the number of edges either starting or ending at v, and the linear map N : C0(Γ) → C0(Γ) defined by N v_ℓ = d_{v_ℓ} v_ℓ is called the degree map. The corresponding matrix in the v_j basis is called the degree matrix and is diagonal with the degrees of the vertices of Γ on its main diagonal. The Kirchhoff map (unnormalized Laplacian) of Γ is defined by L = N − A : C0(Γ) → C0(Γ). L is positive semidefinite and the dimension of its zero eigenspace (null space) is the number of connected components of Γ.
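These three maps are straightforward to realize as matrices. The sketch below uses a hypothetical five-vertex graph with two connected components (a numerical illustration, not part of the paper) and checks that the nullity of L equals the number of components:

```python
import numpy as np

# Hypothetical graph: a triangle {0,1,2} and a disjoint edge {3,4}.
n = 5
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]

A = np.zeros((n, n))                  # adjacency matrix
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

N = np.diag(A.sum(axis=1))            # degree matrix
L = N - A                             # Kirchhoff map (unnormalized Laplacian)

# L is positive semidefinite; its zero eigenvalues count the components.
eigvals = np.linalg.eigvalsh(L)
num_components = int(np.sum(np.abs(eigvals) < 1e-9))
```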

Indexing the edges of Γ as E(Γ) = {e_1, . . . , e_m}, a basis for C1(Γ) can be constructed in similar fashion to the one for C0(Γ). Specifically, define functions e_1, . . . , e_m by e_i(e_j) = δ_ij for i, j = 1, . . . , m. Then any element of C1(Γ) can be written

x = ∑_{i=1}^m x_i e_i.

Defining an inner product 〈·, ·〉_{C1} such that 〈e_i, e_j〉_{C1} = δ_ij makes C1(Γ) into an inner product space with norm ‖·‖_{C1}. Linear source and target maps, respectively s, t : C1(Γ) → C0(Γ), are defined by s(e_j) = u and t(e_j) = v where e_j = (u, v). Finally, the (directed) incidence (or boundary) map of Γ is D : C1(Γ) → C0(Γ), where D = t − s. Certain concepts will be needed from homology theory, and will be introduced informally as required. The map D is, in homological terms, a boundary operator, which applied to any edge e = (u, v) gives D(e) = t(e) − s(e) = v − u. The Kirchhoff map can be written in terms of the incidence map as L = D D^T, where the adjoint map D^T : C0(Γ) → C1(Γ) is the coboundary operator defined by

D^T(v) = ∑_{e_j : t(e_j) = v} e_j − ∑_{e_j : s(e_j) = v} e_j.

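The factorization L = D D^T can be verified numerically on a small hypothetical graph (a sketch only; the incidence matrix convention follows the definitions above):

```python
import numpy as np

# Directed version of a triangle plus a disjoint edge.
n = 5
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]      # (start, end) pairs
m = len(edges)

# Incidence (boundary) map D : C1 -> C0 as an n x m matrix, D(e) = t(e) - s(e).
D = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    D[u, j] -= 1.0                            # source vertex
    D[v, j] += 1.0                            # target vertex

# Kirchhoff map of the underlying undirected graph.
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

factorization_holds = np.allclose(L, D @ D.T)
```

Note that L = D D^T is independent of the chosen edge orientations, as the text's remark about orientation irrelevance suggests.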

The cycle space Z(Γ) is defined as follows. A cycle is a closed path in Γ; i.e., a sequence of vertices L = v_1 v_2 v_3 · · · v_q in Γ where v_i is adjacent to v_{i+1} for i = 1, . . . , q − 1 and v_q is adjacent to v_1. The corresponding element of Z(Γ) is a function z_L ∈ C1(Γ) given by

(1) z_L(e_j) = { 1 if e_j ∈ L and e_j is oriented as L; −1 if e_j ∈ L and e_j is oriented opposite to L; 0 otherwise }.

Z(Γ) is the linear subspace of C1(Γ) spanned by the z_L as L runs over all cycles in Γ. The cycle space is exactly the kernel of D; i.e., for all z ∈ Z(Γ),

(2) Dz = 0,

and every z ∈ C1(Γ) satisfying (2) is a linear combination of the z_L. This condition implies that, for z ∈ Z(Γ), the oriented sum of the values on the set of edges meeting at any of the vertices is zero; i.e.,

(3) ∑_{e_j : t(e_j) = v} z_j = ∑_{e_j : s(e_j) = v} z_j,

for all v ∈ V(Γ). This, of course, is a statement of Kirchhoff's current law. For a graph with k connected components, the dimension of Z(Γ) is the first Betti number of the graph; i.e., dim Z(Γ) = m − n + k.
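A sketch of the dimension count (hypothetical connected graph with n = 4 and m = 5, so the first Betti number is m − n + 1 = 2):

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 4-cycle with a chord
m = len(edges)

D = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    D[u, j] -= 1.0
    D[v, j] += 1.0

# By rank-nullity, dim Z = dim ker D = m - rank(D); for a connected graph
# rank(D) = n - 1, so dim Z = m - n + 1.
dim_Z = m - np.linalg.matrix_rank(D)

# The indicator of the cycle 0 -> 1 -> 2 -> 3 -> 0 lies in ker D.
z = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
z_in_kernel = np.allclose(D @ z, 0)
```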

A second subspace of C1(Γ) that arises in the development to follow is the cocycle space (cut set) U(Γ), an element of which is defined by fixing a partition P of the vertex set V(Γ) into two disjoint sets; i.e., V(Γ) = V_1 ∪ V_2. With respect to this partition, define a vector ω_P ∈ U(Γ) to be

ω_P(e_j) = { 1 if e_j joins V_1 to V_2; −1 if e_j joins V_2 to V_1; 0 otherwise }.

U(Γ) is the linear subspace of C1(Γ) spanned by the ω_P as P runs over all partitions of V(Γ).

The cocycle space is exactly the orthogonal complement of Z(Γ) in C1(Γ). Thus, for any z ∈ Z(Γ) and any ω ∈ U(Γ),

(4) 〈z, ω〉_{C1} = 0.

This implies that every ω = ∑_j ω_j e_j ∈ U(Γ) satisfies

∑_{e_j ∈ L} (−1)^{σ_j} ω_j = 0,

for all cycles L, where σ_j = 0 if e_j is oriented as L and σ_j = 1 if e_j is oriented opposite to L. This is Kirchhoff's voltage law. Furthermore, every vector x ∈ C1(Γ) can be uniquely decomposed as x = ω + z with ω ∈ U(Γ) and z ∈ Z(Γ). In other words, C1(Γ) = Z(Γ) ⊕ U(Γ). It is customary in the mathematical literature (e.g., [23]) not to choose a basis to identify the C_i(Γ) (i = 0, 1) with their duals and to regard boundary and coboundary maps as being on different spaces. It is convenient here to identify them.

The cocycle space U(Γ) is also the image of C0(Γ) under the coboundary operator: U(Γ) = Im(D^T). The kernel of D^T is the space of locally constant functions on V(Γ); i.e., functions that are constant on the vertices of connected components. In the development to follow, there is no loss of generality in working on different connected components separately. So from this point Γ will be assumed to be connected, and hence the kernel of D^T is span{1}, where 1 denotes the unit constant function on V(Γ). With this connectedness assumption, every x ∈ C0(Γ) can be decomposed uniquely as

(5) x = Dω + α1,



where ω ∈ U(Γ) and α ∈ R. Formally, this decomposition is orthogonal since the kernel of D^T is orthogonal to the image of D.

A spanning tree S in Γ is a subgraph with V(S) = V(Γ) such that every pair of vertices is joined by exactly one path in S. Equivalently, S is a maximal subtree of Γ. Kirchhoff's matrix tree theorem implies that the number of spanning trees t(Γ) in Γ is equal to the absolute value of any cofactor of L(Γ); equivalently, t(Γ) is 1/n times the product of the n − 1 largest eigenvalues of L(Γ).
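For example, for the complete graph K4 the matrix tree theorem gives t(K4) = 4^{4−2} = 16, which the following sketch confirms both via a cofactor and via the eigenvalue form:

```python
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)            # complete graph K4
L = np.diag(A.sum(axis=1)) - A

# Matrix-tree theorem: any cofactor of L counts the spanning trees.
t_cofactor = round(np.linalg.det(L[1:, 1:]))

# Eigenvalue form: t = (1/n) * product of the n-1 nonzero eigenvalues.
nonzero_eigs = np.sort(np.linalg.eigvalsh(L))[1:]
t_eigen = round(np.prod(nonzero_eigs) / n)
```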

In graph theory, one often encounters maps W : C1(Γ) → C1(Γ) that describe some weighting on the edges of Γ. In the standard basis, such a map has a diagonal matrix, and a weighted Laplacian can be defined by L = D W D^T. The matrix tree theorem can be generalized to state that the absolute value of any cofactor of L is equal to

∑_S ∏_{e ∈ S} W(e),

where the sum extends over all spanning trees S of Γ.
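A sketch of the weighted version on a triangle with hypothetical weights w = (2, 3, 5): dropping any one edge leaves a spanning tree, so the cofactor should equal w_1 w_2 + w_1 w_3 + w_2 w_3 = 31:

```python
import numpy as np

w = np.array([2.0, 3.0, 5.0])                  # hypothetical edge weights
edges = [(0, 1), (1, 2), (2, 0)]
n, m = 3, 3

D = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    D[u, j] -= 1.0
    D[v, j] += 1.0

L = D @ np.diag(w) @ D.T                       # weighted Laplacian D W D^T
cofactor = np.linalg.det(L[1:, 1:])

# Sum over spanning trees of the product of tree-edge weights.
expected = w[0]*w[1] + w[0]*w[2] + w[1]*w[2]   # = 31
```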

3. Gaussian Noise on R^d. This section provides precise formulations of estimation problems that arise in connection with registration on networks in situations where the parameter values at each node are real numbers or vectors in R^d. The measurements of differences between parameter values at communicating nodes are corrupted by zero-mean additive Gaussian noise. The one-dimensional problem with independent noise on each measurement is treated first. Explicit expressions for the Fisher information, its determinant, and the ML estimator are obtained for this case, and their value in network design problems is discussed. This is followed by treatment of the multi-dimensional problem, in which both correlated and independent noise are considered.

Fig. 1. Each edge e of a directed graph Γ is labeled with a value r_e that is the difference ω_e of the labels at its boundary vertices plus a noise value ε_e. (The depicted example has vertices v_1, . . . , v_4 and measurements r_(1,2), r_(4,1), r_(4,2), r_(4,3).)

3.1. One-dimensional Gaussian problem. The situation in which the parameter space is the real line and the differences between communicating nodes are corrupted by zero-mean additive Gaussian noise is developed here in detail because it illustrates the approach that will be used in the more complicated cases that follow. In this setting, the data vector r ∈ C1(Γ) is a sum of a vector of true difference values and noise; i.e.,

(6) r = ω + ε



where r and ε are in C1(Γ). Because the true difference values must sum to zero around any cycle, ω ∈ U(Γ). For the moment, assume that ε is jointly normal with mean zero and covariance matrix σ²I; i.e., that the random variables ε_i are independent and identically distributed (iid) with variance σ². With this assumption, the conditional probability density function for r given ω is

p(r|ω) = ∏_{e_j ∈ E(Γ)} (2πσ²)^{−1/2} exp( −(r_j − ω_j)² / (2σ²) )
       = (2πσ²)^{−|E(Γ)|/2} exp( −‖r − ω‖²_{C1} / (2σ²) ),

and the log-likelihood function is thus

ℓ(r|ω) = −(1/(2σ²)) ‖r − ω‖²_{C1} + constant.

The ML estimator minimizes ‖r − ω‖_{C1}, and is hence given by

(7) ω̂ = Π_U r,

where Π_U denotes orthogonal projection onto U(Γ) with respect to the inner product on C1. The residual r − ω̂ then resides in the cycle space Z(Γ). This result provides a useful characterization of the ML estimator: the estimate satisfies Kirchhoff's voltage law and the residual satisfies Kirchhoff's current law.
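A numerical sketch of (7), on a hypothetical graph with hypothetical offsets and noise (the projector is realized here with a pseudoinverse, one of several equivalent computations), checking the estimate against the voltage law and the residual against the current law:

```python
import numpy as np

rng = np.random.default_rng(1)

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n, m = 4, len(edges)
x_true = np.array([0.0, 0.7, -0.2, 1.1])       # hypothetical vertex offsets

D = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    D[u, j] -= 1.0
    D[v, j] += 1.0

omega_true = D.T @ x_true                      # consistent labels, in U(Gamma)
r = omega_true + rng.normal(scale=0.1, size=m) # noisy edge measurements

# Orthogonal projector onto U(Gamma) = Im(D^T).
Pi_U = D.T @ np.linalg.pinv(D.T)
omega_hat = Pi_U @ r                           # ML estimate (7)

# The residual lies in the cycle space Z(Gamma) = ker D: current law.
residual_in_Z = np.allclose(D @ (r - omega_hat), 0)
```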

To explicitly compute (7), it is desirable to parameterize the space U(Γ), whose definition (4) does not well elucidate the nature of its elements. Two basic parameterizations are useful. In the first, a particular vertex is chosen as a reference. Then (5) implies that the value of ω ∈ U(Γ) is determined by the relative offsets of the other |V(Γ)| − 1 = n − 1 vertices, denoted by W. The second parameterization of U(Γ) is obtained by choosing a spanning tree S ⊆ E(Γ) of Γ and noting that if ω is known on S, then all n − 1 offsets relative to a reference vertex can be determined by following the tree.

In the first of these cases, a basis {v_1, . . . , v_{n−1}} for span(W) is chosen and x ∈ span(W) is expressed as

x = ∑_{j=1}^{n−1} x_j v_j.

Alternatively, for the fixed spanning tree S, one may write ν ∈ S = span(S) as

ν = ∑_{j=1}^{n−1} ν_j e_{ρ(j)},

where the ρ(j) label the n − 1 edges comprising the spanning tree S. These representations enable the definition of the (n − 1) × m matrix D_W and the (n − 1) × (n − 1) matrix D_WS with respective entries

[D_W]_{ij} = 〈v_i, D e_j〉_{C0},
[D_WS]_{ij} = 〈v_i, D e_{ρ(j)}〉_{C0}.



D_WS is the matrix of the restriction of D to span(S). With these definitions,

(8) ω = D_W^T x

and

(9) ν = D_WS^T x.

D_WS is invertible; in fact [11, p. 101],

(10) det D_WS = ±1.

Taking inverses in (9) yields x = (D_WS^T)^{−1} ν = (D_WS^T)^{−1} P_S ω, where [P_S]_{ij} = δ_{ρ(i),j} is the matrix of the orthogonal projection onto S. The matrix L_W = D_W D_W^T is the Laplacian (Kirchhoff) matrix with the row and column corresponding to v_n removed.

With this notation, the respective ML estimates for x and ν are

(11) x̂ = L_W^{−1} D_W r,
     ν̂ = D_WS^T L_W^{−1} D_W r,

and the estimate of ω is

ω̂ = D_W^T L_W^{−1} D_W r.

The Fisher information matrix for estimation of the offsets x_j is

F_W = −E{∇²_x log p(r|ω)} = (1/σ²) (∇_x ω)^T ∇_x ω.

Equation (8) implies that

(12) ∇_x ω = [∂ω_i/∂x_j] = D_W^T,

and so

F_W = (1/σ²) D_W D_W^T = (1/σ²) L_W.

Hence F_W is proportional to the matrix of the Kirchhoff map in the standard basis, but with the row and column corresponding to the reference vertex removed. In a similar way, the Fisher information matrix for estimation of the ν_j is

F_S = (1/σ²) D_WS^T L_W D_WS.

Note that det L_W is a minor of the Kirchhoff matrix L, and so by the Kirchhoff matrix tree theorem it is equal to the number t(Γ) of spanning trees of Γ. By (10), the determinant of the Fisher information is thus

(13) det F_W = det F_S = σ^{−2(n−1)} det L_W = σ^{−2(n−1)} t(Γ).

The best possible situation occurs when the pairwise difference between all nodes is measured. In this case, Γ is a complete graph and the number of spanning trees is known to be t(Γ) = n^{n−2}. In this situation, the "average" Fisher information per node is

(det F_W)^{1/(n−1)} = (1/σ²) n^{(n−2)/(n−1)} ∼ n/σ²  as n → ∞.

Both of the ML estimators x̂ and ω̂ are unbiased, and as a result, since the covariance matrix of r is σ²I,

C_x̂ = E{(x̂ − x)(x̂ − x)^T} = σ² L_W^{−1},

and its determinant is

(14) det C_x̂ = σ^{2(n−1)} / t(Γ),

as anticipated from (13).

as anticipated from (13). The covariance matrix of the estimator ω is

Cω = E{(ω − ω)(ω − ω)T} = DTW (FW )−1DW = σ2DT

WL−1W DW = σ2PU

since PU = DTWL−1W DW is the orthogonal projection onto U(Γ). An interesting con-

sequence of this observation is that TrCω = σ2 dimU(Γ).

3.2. Network design for independent errors on edges. Before going on to other measurement models, it is instructive to consider briefly the consequences of the above results in the design of a synchronization or registration scheme for a network. Since the ML estimator (11) is unbiased for any graph Γ, the role of the number of spanning trees t(Γ) in the determinant of the estimator covariance matrix (14), or equivalently in the determinant of the Fisher information matrix (13), shows that a large number of spanning trees is desirable for good estimator performance.

Fig. 2. Two networks with the same number of nodes and links. Network (a) has three spanning trees while network (b) has five spanning trees and is hence superior in the estimation context of this paper.

The network depicted in Figure 2(a) has the same number of nodes and links as the one in Figure 2(b). But the number of spanning trees in the former is three while the number of spanning trees in the latter is five. So, under the model assumed in this paper, estimation fidelity will be better for the network of Figure 2(b). From the perspective of design, if one has the opportunity to add one link to the acyclic network shown in Figure 3, the best choice in the context of this paper is to create a ring network (six spanning trees) and the worst is to create the network of Figure 2(a).
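The comparison can be reproduced with the matrix tree theorem. The exact topologies in Figures 2 and 3 are not recoverable from the text, so the six-node graphs below are hypothetical stand-ins chosen to match the stated counts of three, five, and six spanning trees:

```python
import numpy as np

def spanning_trees(n, edges):
    """Count spanning trees via a Laplacian cofactor (matrix-tree theorem)."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return round(np.linalg.det(L[1:, 1:]))

net_a = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]  # triangle + tail
net_b = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (4, 5)]  # 5-cycle + pendant
ring  = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # 6-cycle

counts = (spanning_trees(6, net_a), spanning_trees(6, net_b),
          spanning_trees(6, ring))
```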



Fig. 3. An acyclic network. If one link can be added, the best choice is to create a ring network. The worst choice is to create the network shown in Figure 2(a).

3.3. Estimation with correlated measurements. This section further examines the situation where G = R and the measurement model is given by (6), but now with the measurement errors, in the standard basis for C1(Γ), being jointly Gaussian with covariance matrix R. In this setting, the probability density of the measurements is

p(r|ω) = ((2π)^m det R)^{−1/2} exp( −(1/2)(r − ω)^T R^{−1} (r − ω) ).

The ML estimate of ω ∈ U(Γ) is obtained by splitting the data r as

(15) r = ω̂ + ε̂,

where ω̂ ∈ U(Γ) and the residual ε̂ satisfies R^{−1}ε̂ ∈ Z(Γ). That is, ω̂ = Q_U r, where Q_U is now an oblique projection with range U(Γ) and null space R^{−1}(Z(Γ)). This

Fig. 4. The residual ε̂ of the ML estimate is such that R^{−1}ε̂ satisfies the Kirchhoff current law. For the triangle shown, with edge variances σ_0², σ_1², σ_2², the estimate satisfies ω̂_0 + ω̂_1 − ω̂_2 = 0 while the residual satisfies ε̂_0/σ_0² + ε̂_1/σ_1² + ε̂_2/σ_2² = 0.

situation is illustrated in Figure 4 for a diagonal covariance matrix with entries σ_j². The estimate ω̂ satisfies the Kirchhoff voltage law; i.e., the oriented sum of ω̂ around any cycle is zero. It is interesting to observe that it is now R^{−1}ε̂ that satisfies the Kirchhoff current law; i.e., the oriented sum at any vertex is zero, with the covariance R playing the role of resistance in a way akin to Ohm's law.

The columns of D_W^T are a basis for U(Γ) and (2) implies that the columns of R^{−1}D_W^T are a basis for (R^{−1}(Z(Γ)))^⊥. Thus,

ω̂ = D_W^T (D_W R^{−1} D_W^T)^{−1} D_W R^{−1} r.

The corresponding ML estimate for the vertex offsets x is

x̂ = (D_W R^{−1} D_W^T)^{−1} D_W R^{−1} r.



Motivated by these expressions, it is convenient to define the weighted Laplacian L = D R^{−1} D^T, and similarly L_W = D_W R^{−1} D_W^T.

By (12), the Fisher information matrix for the vertex parametrization is given by

F_W = −E{∇²_x log p(r|ω)} = D_W R^{−1} D_W^T = L_W.

The Cauchy-Binet formula [8] allows the determinant of the Fisher information to be written as

det F_W = det(D_W R^{−1} D_W^T) = Σ_T Σ_{T'} det(D_{WT}) det([R^{−1}]_{TT'}) det(D_{WT'}),

where T and T' denote subsets of columns (edges) which are retained. The sum extends over all subsets of E(Γ) of order n − 1. The matrices D_{WT} have the property [11, p. 101]

det D_{WT} = ±1 if T is a spanning tree, and 0 otherwise.

The only terms in the above sum over T which are non-zero are those corresponding to spanning trees, and so

(16)    det F_W = Σ_{S,S'} α_{SS'} det([R^{−1}]_{SS'}).

The quantity

(17)    α_{SS'} = det(D_{WS} D_{WS'}^T)

takes values ±1 depending on the pair of spanning trees S and S', independently of the choice of W.

With the assumption R = diag(σ_1^2, ..., σ_m^2), so that det([R^{−1}]_{SS'}) = 0 unless S = S', (16) simplifies to

det F_W = Σ_S Π_{e_j ∈ S} 1/σ_j^2,

since D_{WS} D_{WS}^T is positive definite. This reduces further to (13) when σ_j^2 = σ^2 for all e_j ∈ E(Γ).
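The weighted matrix-tree identity above can be checked numerically. The sketch below (a toy triangle graph with arbitrarily chosen variances; all names are ours) compares det F_W with the sum over spanning trees of the products of edge weights:

```python
import numpy as np
from itertools import combinations

# Triangle graph, reduced incidence matrix (reference vertex dropped).
D = np.array([[-1,  0, -1],
              [ 1, -1,  0],
              [ 0,  1,  1]], dtype=float)
DW = D[1:, :]
sigma2 = np.array([0.5, 2.0, 1.25])          # per-edge noise variances (illustrative)
Rinv = np.diag(1.0 / sigma2)

detF = np.linalg.det(DW @ Rinv @ DW.T)       # determinant of the Fisher information

# Sum over spanning trees of the product of edge weights 1/sigma_j^2.
tree_sum = 0.0
for T in combinations(range(3), 2):          # candidate edge subsets of size n - 1
    DWT = DW[:, list(T)]
    if abs(np.linalg.det(DWT)) > 0.5:        # |det D_WT| = 1 iff T is a spanning tree
        tree_sum += np.prod(1.0 / sigma2[list(T)])

assert np.isclose(detF, tree_sum)            # both equal 3.0 for these variances
```

For K3 every pair of edges is a spanning tree, so the sum has t(Γ) = 3 terms.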

The ML estimate of the vertex offsets x is unbiased and its covariance matrix is

C_x = E{(x̂ − x)(x̂ − x)^T} = L_W^{−1},

which has determinant

det C_x = 1 / Σ_{S,S'} α_{SS'} det([R^{−1}]_{SS'})

from (16). The ML estimate of ω is also unbiased and has covariance

C_ω = E{(ω̂ − ω)(ω̂ − ω)^T} = D_W^T L_W^{−1} D_W = Q_U R.


3.4. Multi-dimensional Gaussian problem. This section further generalizes the setting to G = R^d, where the state of each vertex in the network is a vector in R^d. In this situation, it is necessary to consider more general functions of the graph Γ than were treated in Section 2. The vertex space now consists of functions V(Γ) → R^d and is correspondingly denoted by C0(Γ, R^d). In fact, C0(Γ, R^d) = R^d ⊗ C0(Γ, R). In terms of the standard basis q_j, j = 1, ..., d for R^d, any element of C0(Γ, R^d) can be expressed as

x = Σ_{i=1}^d Σ_{j=1}^n x_{ij} q_i ⊗ v_j = Σ_{j=1}^n x_j ⊗ v_j.

Similarly, the vector space of functions E(Γ) → R^d is C1(Γ, R^d) = R^d ⊗ C1(Γ, R). The boundary map on C1(Γ, R^d) is I ⊗ D, where I is the identity map on R^d and D is the boundary map on C1(Γ). The coboundary map on C0(Γ, R^d) is I ⊗ D^T.

As in the one-dimensional case, the cycle space Z(Γ, R^d) is defined to be the kernel of the boundary map; i.e., the set of z ∈ C1(Γ, R^d) such that (I ⊗ D)z = 0. Writing

z = Σ_{j=1}^m z_j ⊗ e_j,

the cycles satisfy a vector form of (3):

Σ_{e_j : t(e_j) = v} z_j = Σ_{e_j : s(e_j) = v} z_j;

i.e., at any vertex, the oriented vector sum of the values of z on the set of edges meeting at the vertex is zero.

The cocycle space U(Γ, R^d) is the image of the coboundary map I ⊗ D^T. For any

ω = Σ_{j=1}^m ω_j ⊗ e_j ∈ U(Γ, R^d),

the equation

Σ_{e_j ∈ L} (−1)^{σ_j} ω_j = 0

must hold for all cycles L. In this expression, σ_j = 0 if e_j is oriented as L and σ_j = 1 if e_j is oriented opposite to L.

An inner product can be defined on C1(Γ, R^d), based on the inner product defined earlier on C1(Γ) and the standard inner product on R^d. Specifically, for s_1, s_2 ∈ R^d and x_1, x_2 ∈ C1(Γ),

⟨s_1 ⊗ x_1, s_2 ⊗ x_2⟩ = ⟨s_1, s_2⟩_{R^d} ⟨x_1, x_2⟩_{C1(Γ)}.

For all z ∈ Z(Γ, R^d) and ω ∈ U(Γ, R^d), ⟨z, ω⟩ = 0; i.e., Z(Γ, R^d) is the orthogonal complement of U(Γ, R^d).

3.4.1. Independent identically distributed edge measurements. Recall that the measurement model is r = ω + ε with ε ∼ N(0, R), r and ε in C1(Γ, R^d), and


ω ∈ U(Γ, R^d). The first case of interest is when the errors ε_j ∈ R^d are independent. Their individual probability densities are

p(ε_j) = (1/√((2π)^d det R)) exp( −(1/2) ⟨ε_j, R^{−1} ε_j⟩_{R^d} )

for j = 1, ..., m. By independence of the ε_j, the joint probability density for the data r ∈ C1(Γ, R^d) is thus

p(r|ω) = (1/((2π)^d det R)^{m/2}) exp( −(1/2) ⟨r − ω, (R^{−1} ⊗ I)(r − ω)⟩ ).

The ML estimate of ω is obtained by splitting the data r as r = ω̂ + ε̂, where ω̂ ∈ U(Γ, R^d) and ε̂ ∈ Z(Γ, R^d). Explicitly, ω ∈ U(Γ, R^d) is parameterized in terms of the offsets of n − 1 vertices with respect to a reference vertex,

ω = (I ⊗ D_W^T) x,

or alternatively, in terms of a spanning tree S, as

(18)    ω = (I ⊗ D_W^T D_{WS}^{−1}) ν,

where x and ν are related by x = (I ⊗ D_{WS}^{−1}) ν. In terms of these parameterizations,

the ML estimates are

x̂ = (R ⊗ L_W^{−1})(R^{−1} ⊗ D_W) r = (I ⊗ L_W^{−1} D_W) r

and

ν̂ = (I ⊗ D_{WS} L_W^{−1} D_W) r.

So ω̂ = (I ⊗ D_W^T L_W^{−1} D_W) r.

The Fisher information matrix for x is given by

F_W = −E{∇_x^2 log p(r|ω)} = (∇_x ω)^T (R^{−1} ⊗ I) ∇_x ω.

Equation (18) implies that ∇_x ω = I ⊗ D_W^T, so that

F_W = R^{−1} ⊗ D_W D_W^T = R^{−1} ⊗ L_W.

In a similar way, the Fisher information matrix for estimation of ν is

F_S = R^{−1} ⊗ D_{WS}^{−T} L_W D_{WS}^{−1}.

The determinant of the Fisher information in both cases is

(19)    det F_W = det F_S = (det L_W)^d / (det R)^{n−1} = t(Γ)^d / (det R)^{n−1},

which is obtained by using the tensor product identity

det(A ⊗ B) = (det A)^{dim B} (det B)^{dim A}.

Note the similarity of (19) and the corresponding expression (13) for the scalar case.
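The tensor-product determinant identity underlying (19) is easy to check numerically. In the sketch below (dimensions and the random positive-definite stand-ins for R^{−1} and L_W are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n1 = 2, 3                                  # dim R^d and n - 1 (size of L_W)

# Random symmetric positive-definite matrices standing in for R^{-1} and L_W.
A = rng.standard_normal((d, d))
A = A @ A.T + d * np.eye(d)
B = rng.standard_normal((n1, n1))
B = B @ B.T + n1 * np.eye(n1)

# det(A (x) B) = (det A)^{dim B} (det B)^{dim A}
lhs = np.linalg.det(np.kron(A, B))
rhs = np.linalg.det(A) ** n1 * np.linalg.det(B) ** d
assert np.isclose(lhs, rhs)
```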


Finally, the error covariance of the unbiased ML estimate ω̂ is

C_ω = E{(ω̂ − ω)(ω̂ − ω)^T} = R ⊗ D_W^T L_W^{−1} D_W = R ⊗ P_U,

where P_U is the orthogonal projection onto U(Γ). Taking the partial trace over C1(Γ) yields

Tr_{C1(Γ)} C_ω = (dim U(Γ)) R.

3.4.2. Correlated edge measurements. Now assume the errors on the edge measurements ε_j ∈ R^d have joint probability density

p(ε) = (1/√((2π)^{md} det R)) exp( −(1/2) ⟨ε, R^{−1} ε⟩_{C1(Γ,R^d)} ),

where R now denotes the md × md covariance matrix of the m d-dimensional edge measurements. The probability density for the data r ∈ C1(Γ, R^d) is

p(r|ω) = (1/√((2π)^{md} det R)) exp( −(1/2) ⟨r − ω, R^{−1}(r − ω)⟩ ),

where the inner product in the exponent is in C1(Γ, R^d). The ML estimate involves a decomposition of the data as r = ω̂ + ε̂, where ω̂ ∈ U(Γ, R^d) and ε̂ ∈ R^{−1}(Z(Γ, R^d)). Explicitly,

ω̂ = Q_U r,

where Q_U is the oblique projection with range U(Γ, R^d) and null space R^{−1}(Z(Γ, R^d)); i.e.,

Q_U = (I ⊗ D_W^T) ((I ⊗ D_W) R^{−1} (I ⊗ D_W^T))^{−1} (I ⊗ D_W) R^{−1}.

The ML estimate of the vertex offsets is

x̂ = ((I ⊗ D_W) R^{−1} (I ⊗ D_W^T))^{−1} (I ⊗ D_W) R^{−1} r.

The Fisher information for x is

F_W = (I ⊗ D_W) R^{−1} (I ⊗ D_W^T).

In order to calculate the determinant of F_W, it is helpful to first find

det (I ⊗ D_W)_T,

where T denotes a subset of (n − 1)d of the columns. Now (I ⊗ D_W) is a block diagonal matrix with diagonal blocks all D_W. Write T = (T_1, T_2, ..., T_d), where T_k denotes the set of columns selected from the kth block. Since Γ is assumed to be connected, D_W has rank n − 1; thus det (I ⊗ D_W)_T = 0 unless exactly n − 1 columns are chosen from each block, in which case

det (I ⊗ D_W)_T = Π_{j=1}^d det(D_{WT_j}) = ±1 if each T_j is a spanning tree, and 0 otherwise.


S = (S_1, S_2, ..., S_d) will be called a multi-spanning tree if it consists of one spanning tree for each dimension of the parameter space R^d. Applying the Cauchy-Binet formula gives

(20)    det F_W = Σ_{T,T'} det((I ⊗ D_W)_T) det([R^{−1}]_{TT'}) det((I ⊗ D_W^T)_{T'}) = Σ_{S,S'} α_{SS'} det([R^{−1}]_{SS'}),

where the sum is over multi-spanning trees. In this expression,

α_{SS'} = Π_{j=1}^d α_{S_j S'_j},

with α_{S_j S'_j} given by (17), takes values ±1. If the edge measurement errors are independent, then (20) reduces to

(21)    det F_W = Σ_S det([R^{−1}]_{SS}).

The error covariance of the ML estimate of the vertex offsets x is

E{(x̂ − x)(x̂ − x)^T} = ((I ⊗ D_W) R^{−1} (I ⊗ D_W^T))^{−1}.

So,

det E{(x̂ − x)(x̂ − x)^T} = 1 / Σ_{S,S'} α_{SS'} det([R^{−1}]_{SS'}),

and

E{(ω̂ − ω)(ω̂ − ω)^T} = (I ⊗ D_W^T) ((I ⊗ D_W) R^{−1} (I ⊗ D_W^T))^{−1} (I ⊗ D_W) = Q_U R.

4. Local Estimators for Gaussian Noise on R. The estimators presented thus far have either implicitly or explicitly assumed a fixed reference vertex or a particular spanning tree. This is intrinsically incompatible with the development of local estimators; i.e., estimators that can be implemented in such a way that each vertex follows some procedure using only information accessible from its nearest neighbors. Consider again the situation involving Gaussian noise on R with the noise on edge measurements being independent, treated in Section 3.1. Recall that the ML estimate for ω is such that the residual (r − ω̂) ∈ Z(Γ); i.e., it obeys Kirchhoff's current law. Thus, ω̂ is the solution of D ω̂ = D r. Indeed, if x̂ is any element of C0(Γ) satisfying

(22)    L x̂ = D r,

then ω̂ = D^T x̂ will be a ML estimate of ω. The quantity x̂ may be interpreted as the collection of offsets each vertex needs to apply to its own coordinate in order for the entire network to be aligned in a maximum-likelihood sense.

If the linear system (22) is solved using Jacobi's method [6], the structure of the Laplacian matrix L ensures this algorithm is local. Jacobi's method involves writing L = N − A in terms of the diagonal degree matrix N and the adjacency matrix A, and applying the recursion

(23)    x^{(t+1)} = N^{−1}(D r + A x^{(t)}).

Page 16: MAXIMUM-LIKELIHOOD REGISTRATION OF NETWORKScochran.faculty.asu.edu/papers/2016_Howard_Registration... · 2016-03-24 · used to determine a unique o set value to register any pair

16 S. D. HOWARD, D. COCHRAN, W. MORAN, AND F. R. COHEN

A fixed point of this recursion satisfies (22). Thus if the method converges, it gives the ML estimate. Jacobi's method is known to converge if the matrix L is diagonally dominant [27]; i.e.,

(24)    |L_{ii}| > Σ_{j ≠ i} |L_{ij}|,

although this is not a necessary condition. The Laplacian matrix of a graph gives equality in (24). Another sufficient condition for convergence of Jacobi's method is the so-called walk-summability condition [15]. This specifies that the spectral radius of N^{−1}A is less than one; i.e., ρ(N^{−1}A) < 1. The Gershgorin circle theorem [8] implies ρ(N^{−1}A) ≤ 1.

The Jacobi algorithm has been applied in MAP estimation using Gaussian belief propagation in Bayesian belief networks [10]. In that work, it is noted that when neither (24) nor ρ(N^{−1}A) < 1 is satisfied and Jacobi's method fails to converge, convergence can be forced by using a double-loop iterative method. In the estimation problem of interest in this section, the possible violations of the sufficient conditions for convergence are as mild as can be found in practice. In experiments run to date, Jacobi's method has never failed to converge.

Once the recursion (23) is written out in detail, the update for the kth vertex becomes

x_k^{(t+1)} = (1/n_k) Σ_{v_ℓ ∼ v_k} ( x_ℓ^{(t)} + r_{(ℓ,k)} ),

where r_{(ℓ,k)} is r_e if e = (v_ℓ, v_k) ∈ E(Γ) or −r_e if e = (v_k, v_ℓ) ∈ E(Γ). Thus the recursion is indeed local. Note that x_ℓ^{(t)} + r_{(ℓ,k)} is the current prediction, at the neighboring vertex ℓ, of what the value at the kth vertex should be. At each vertex, a single iteration of the algorithm can be summarized as "become the mean of what your neighbors say your value should be."
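A minimal sketch of this local Jacobi iteration on a toy ring network (the graph, seed, and iteration count are our illustrative choices, not from the paper) confirms that the fixed point matches the direct solution of (22):

```python
import numpy as np

# Toy ring of n = 5 vertices; edge j points from v_j to v_{j+1 mod n}.
n = 5
edges = [(j, (j + 1) % n) for j in range(n)]
rng = np.random.default_rng(2)
x_true = rng.standard_normal(n)
x_true -= x_true[0]                               # reference vertex v_0
r = {e: x_true[e[1]] - x_true[e[0]] + 0.05 * rng.standard_normal()
     for e in edges}

# Each vertex k stores, for each neighbour l, the signed measurement r_(l,k).
nbrs = {k: [] for k in range(n)}
for (s, t) in edges:
    nbrs[t].append((s, r[(s, t)]))                # l = s predicts x_t = x_s + r_e
    nbrs[s].append((t, -r[(s, t)]))               # l = t predicts x_s = x_t - r_e

# Jacobi recursion (23): become the mean of what your neighbours say.
x = np.zeros(n)
for _ in range(2000):
    x = np.array([np.mean([x[l] + rl for (l, rl) in nbrs[k]])
                  for k in range(n)])
x -= x[0]                                         # re-reference to v_0

# Compare against the direct ML solution of the reduced system L x = D r.
D = np.zeros((n, len(edges)))
for j, (s, t) in enumerate(edges):
    D[t, j], D[s, j] = 1.0, -1.0
rvec = np.array([r[e] for e in edges])
L = D @ D.T
x_ml = np.zeros(n)
x_ml[1:] = np.linalg.solve(L[1:, 1:], (D @ rvec)[1:])
assert np.allclose(x, x_ml, atol=1e-8)
```

An odd cycle is used deliberately: for a bipartite graph the Jacobi iteration matrix N^{−1}A has −1 in its spectrum and the plain iteration can oscillate, which is consistent with the convergence caveats discussed above.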

An alternate way to state the recursion, which will be important in subsequent application to estimation in Lie groups other than R^d, is as follows. Denote an action of R on itself by

T_r x = x + r.

Then (23) can be written as x^{(t+1)} = Q x^{(t)}, where

Q_{k,ℓ} = (1/n_k) T_{r_{(k,ℓ)}} if (v_ℓ, v_k) ∈ E(Γ); (1/n_k) T^{−1}_{r_{(k,ℓ)}} if (v_k, v_ℓ) ∈ E(Γ); and 0 otherwise.

For G = R^d, with no correlation between the measurement noise on different edges of the graph, local algorithms and analysis follow by direct analogy with the case d = 1 and are not given here. Finally, although it may be possible to develop local algorithms for some forms of correlation between measurement noise on different edges of the graph, this possibility has not been explored.

5. Phase and Directional Alignment. This section addresses the alignment of directional sensors, such as sensors recording wind direction, across a network, or alternatively the alignment of the phase of oscillators across a network. For this problem, the natural parameter space for the data at the vertices is the circle. This can be thought of as the group of real numbers modulo 1, or equivalently as the group of complex numbers with unit modulus; i.e., T = {e^{2πiθ} : θ ∈ [0, 1)}. In what follows, it is convenient to use the latter description.

In the preceding sections, the offsets between measurements at vertices naturallyform elements of a vector space. In the current situation this will no longer be true,resulting in significant differences in both theory and algorithms.

5.1. Cycles and Cocycles. It is necessary to modify the treatment in Section 2 to accommodate the current case. The first change is that the spaces C0(Γ, C) and C1(Γ, C) are now the vector spaces of functions V(Γ) → C and E(Γ) → C, respectively. The boundary operator is defined by analogy with the real case, and the cycle space Z1(Γ, C) ⊂ C1(Γ, C) consists of the functions z satisfying

D z = 0.

The state of the network is an element of the set C0(Γ, T) ⊂ C0(Γ, C) of functions V(Γ) → T. Note that C0(Γ, T) is not a vector space. Similarly, C1(Γ, T) ⊂ C1(Γ, C) is the set of functions E(Γ) → T. The coboundary operator D* : C0(Γ, T) → C1(Γ, T) is defined to take x ∈ C0(Γ, T) to

D* x = Σ_{j=1}^m x_{t(e_j)} x̄_{s(e_j)} e_j.

The T-cocycle space U(Γ, T) is the image of C0(Γ, T) under D*; i.e., U(Γ, T) = {D* x | x ∈ C0(Γ, T)}. Members ω of U(Γ, T) share a property corresponding to (4); i.e., around any cycle L (see Equation (1)),

Π_{e ∈ E(Γ)} ω_e^{z_L(e)} = 1.

This is the analogue of Kirchhoff's voltage law in this setting: the oriented product of the elements of ω around any cycle is the identity element of T (i.e., 1).

5.2. The model. In the statistical analysis of directional or phase data, one must use probability distributions defined on the circle. The distribution which takes the place of the ubiquitous Gaussian distribution in this setting is the von Mises distribution [16]. The property that makes the von Mises distribution ideally suited to modeling measurement errors in data on a circle is that it is the maximum entropy distribution on the circle for a given circular mean and circular variance, just as the Gaussian is the maximum entropy distribution on the real line for a given mean and variance.

Given a set of N points z_j ∈ T, the sample circular mean z̄ ∈ T is given by

A z̄ = (1/N) Σ_{j=1}^N z_j.

The quantity A is a measure of concentration and is related to the circular variance ρ of the sample by ρ = 1 − A^2. For a probability density p(θ) on the circle, the circular mean µ and the circular variance ρ are defined by

A µ = ∫_T e^{iθ} p(e^{iθ}) dθ,


where the circular variance is ρ = 1 − A^2.

The von Mises density function with circular mean µ and circular variance ρ = 1 − (I_1(κ)/I_0(κ))^2 is

(25)    p(z = e^{iθ} | µ, κ) = (1/(2π I_0(κ))) e^{κ cos(θ−µ)} = (1/(2π I_0(κ))) e^{(κ/2)(z̄_0 z + z_0 z̄)},

where I_0 and I_1 are respectively the order-zero and order-one modified Bessel functions of the first kind, and z_0 = e^{iµ}. The value of ρ uniquely determines κ and vice versa. The notation M(µ, κ) will be used for this distribution.
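As a small illustration of these definitions (the sample data and variable names are ours), the sample circular mean and circular variance can be computed directly from complex samples on T:

```python
import numpy as np

# Sample circular mean and circular variance of points on the unit circle.
rng = np.random.default_rng(4)
theta = 0.3 + 0.1 * rng.standard_normal(10000)   # angles clustered near 0.3 rad
z = np.exp(1j * theta)

m = np.mean(z)                 # equals A * zbar: concentration times circular mean
A = np.abs(m)
zbar = m / A                   # sample circular mean, a point on T
rho = 1.0 - A ** 2             # sample circular variance

assert abs(np.angle(zbar) - 0.3) < 0.01          # mean direction recovered
assert 0.0 < rho < 0.05                          # tightly concentrated sample
```

For a tightly concentrated sample with angular standard deviation σ, one expects A ≈ e^{−σ²/2}, so ρ ≈ σ², which the assertions reflect.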

Von Mises distributed noise is used as the basis of a measurement model as follows. The state of the network is

x = (x_v)_{v ∈ V(Γ)} ∈ C0(Γ, T).

For each edge e ∈ E(Γ), a measurement on e is made which has the form

(26)    r_e = ω_e ε_e,

where the errors ε_e are independently distributed with ε_e ∼ M(1, κ_e) and ω ∈ U(Γ, T) has the form ω = D* x.

5.3. The estimation problem. The joint distribution of the noise ε ∈ C1(Γ, T) is taken to be

p(ε) = Π_{e ∈ E(Γ)} (1/(2π I_0(κ_e))) exp( (κ_e/2)(ε_e + ε̄_e) ).

Consequently, the density of r conditioned on ω is

p(r|ω) = [ Π_{e ∈ E(Γ)} (1/(2π I_0(κ_e))) ] exp( Σ_{e ∈ E(Γ)} (κ_e/2)(ω̄_e r_e + ω_e r̄_e) ).

The log-likelihood is thus

(27)    ℓ(r|ω) = Σ_{e ∈ E(Γ)} (κ_e/2)(ω̄_e r_e + ω_e r̄_e) + Σ_{e ∈ E(Γ)} log(1/(2π I_0(κ_e))).

The Fisher information for this estimation problem can be found by first differentiating ℓ(r|ω) with respect to θ_v (cf. (25)):

(28)    ∂ℓ(r|ω)/∂θ_v = Im( Σ_{e: t(e)=v} κ_e ω̄_e r_e − Σ_{e: s(e)=v} κ_e ω̄_e r_e ) = Im Σ_{e ∈ E(Γ)} D_{ve} κ_e ω̄_e r_e.

Differentiating again with respect to θ_u yields

∂²ℓ(r|ω)/∂θ_u ∂θ_v = −Re Σ_{e ∈ E(Γ)} κ_e D_{ue} D_{ve} ω̄_e r_e.

Noting that E{r_e} = ω_e I_1(κ_e)/I_0(κ_e), the Fisher information is thus

F_{uv} = −E{ ∂²ℓ(r|ω)/∂θ_u ∂θ_v } = Re Σ_{e ∈ E(Γ)} κ_e D_{ue} D_{ve} ω̄_e (I_1(κ_e)/I_0(κ_e)) ω_e,


which reduces to F = D_W 𝓕 D_W^T, where 𝓕 is the diagonal matrix

𝓕 = diag( κ_{e_1} I_1(κ_{e_1})/I_0(κ_{e_1}), ..., κ_{e_m} I_1(κ_{e_m})/I_0(κ_{e_m}) ).

The quantity κ_e I_1(κ_e)/I_0(κ_e) associated with an edge e may be interpreted as follows. Consider the problem of estimating ω_e given the datum r_e. As above, the conditional density is

p(r_e|ω_e) = (1/(2π I_0(κ_e))) exp( (κ_e/2)(ω̄_e r_e + ω_e r̄_e) ).

The parameterization ω_e = exp(iφ_e) yields the Fisher information for estimation of φ_e from r_e as

F_e = −E{ (∂²/∂φ_e²) log p(r_e|ω_e) } = κ_e I_1(κ_e)/I_0(κ_e).

Thus, when the F_e are regarded as edge weights, the graph Fisher information is the weighted Laplacian with these weights. The determinant of the Fisher information is, by Kirchhoff's matrix tree theorem,

det F = Σ_S Π_{e ∈ S} F_e.

If the edges of Γ share a common κ, this reduces to

det F = (κ I_1(κ)/I_0(κ))^{n−1} t(Γ).

Note the similarities of this expression with (13) and (19).
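The sketch below checks det F = F_e^{n−1} t(Γ) on a toy triangle with a common κ (our illustrative example). The Bessel functions are evaluated from their integral representation so the snippet is self-contained; scipy.special.iv would do equally well:

```python
import numpy as np

def bessel_i(nu, kappa, K=200000):
    # I_nu(kappa) = (1/pi) * integral_0^pi exp(kappa cos t) cos(nu t) dt,
    # evaluated by the midpoint rule (valid for integer order nu).
    t = (np.arange(K) + 0.5) * np.pi / K
    return float(np.mean(np.exp(kappa * np.cos(t)) * np.cos(nu * t)))

kappa = 3.0
f_e = kappa * bessel_i(1, kappa) / bessel_i(0, kappa)   # common edge weight F_e

# Triangle graph: n - 1 = 2 and t(Gamma) = 3 spanning trees.
D = np.array([[-1, 0, -1], [1, -1, 0], [0, 1, 1]], dtype=float)
DW = D[1:, :]
detF = np.linalg.det(DW @ (f_e * np.eye(3)) @ DW.T)

assert np.isclose(detF, f_e ** 2 * 3.0)                 # F_e^{n-1} * t(Gamma)
assert 0.80 < bessel_i(1, 3.0) / bessel_i(0, 3.0) < 0.82
```

The ratio I_1(κ)/I_0(κ) plays the role of 1 − ρ, approaching 1 as the noise concentrates.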

5.4. Maximum-likelihood estimator. This section considers the problem of determining a ML estimator x̂ for the vertex offsets from data r. Equation (28) implies that critical points of the likelihood (27) must satisfy

Im Σ_{e ∈ E(Γ)} D_{ve} κ_e ω̄̂_e r_e = 0

at every vertex v ∈ V(Γ). In this expression, ω̂ denotes a ML estimate of ω. If the map K : C1(Γ, T) → C1(Γ, C) is defined by

K(ε) = Σ_{e ∈ E(Γ)} κ_e ε_e e,

and the residual is written as ε̂_e = ω̄̂_e r_e, the critical points of the likelihood satisfy

(29)    ω̂ ∈ U(Γ, T),    Im(K(ε̂)) ∈ Z(Γ, C),

and Re(K(ε̂)) is the corresponding value of the log-likelihood (up to a positive constant). These correspond to (15) for the Gaussian case. This condition can be rearranged as

Im( Σ_{e: v=s(e)} κ_e ω̂_e r̄_e + Σ_{e: v=t(e)} κ_e ω̄̂_e r_e ) = 0

or, more succinctly,

(30)    Σ_{e ∈ E(v)} κ_e ( ω̄̂_e r_e )^{D_{ve}} = ρ_v ∈ R

for all v ∈ V(Γ), since (ω̄̂_e r_e)^{−1} = ω̂_e r̄_e. In other words, the weighted sum of the directed residuals at any vertex must be real. This condition may be seen as a generalization of the Kirchhoff current law applicable to the group T and circular statistics (see Figure 5). ML estimates for x will be among the solutions of (30). In contrast to the situation for R^d, there are now a number of critical points. Consequently, to obtain a reliable fast estimator, it is necessary to distinguish the global maxima from the other critical points.

Fig. 5. Schematic of Kirchhoff laws for the circle group T on a three-edge cycle with edge offsets ω_0, ω_1, ω_2 (satisfying ω_0 ω_1 ω_2 = 1), concentrations κ_0, κ_1, κ_2, and residuals ε_0, ε_1, ε_2 with Im(κ_0 ε_0 + κ_1 ε_1^{−1} + κ_2 ε_2) = 0. At a maximum of the likelihood (27), the residual ε̂ satisfies Kirchhoff's current law in the form (29).

Writing

κ_v = (1/d_v) Σ_{e ∈ E(v)} κ_e,

so that κ_v d_v measures the strength of the connection of the vertex v to the rest of the network, reveals that what distinguishes ML estimates from the other critical points of the likelihood function is that, for moderate noise values,

(31)    (1/d_v) Σ_{e ∈ E(v)} (κ_e/κ_v) ( ω̄̂_e r_e )^{D_{ve}} ≈ 1.

In terms of the augmented adjacency matrix

(32)    A_{vv'} = κ_{(v,v')} r_{(v,v')} if (v, v') ∈ E(Γ); κ_{(v,v')} r^{−1}_{(v,v')} if (v', v) ∈ E(Γ); and 0 otherwise,

equation (30) can be reformulated with the aid of (26) as

(33)    Q x = N^{−1} A x = G x,

where N_{v,v'} = κ_v d_v δ_{v,v'} and G is a real diagonal matrix. The approximation (31) corresponds to κ_v d_v ≈ ρ_v, or G ≈ I. When κ_e = κ for all e ∈ E(Γ), the matrix N is κ times the degree matrix.


A fast estimator for x may be obtained as follows. First, find the eigenvector y ∈ C^n corresponding to the largest eigenvalue of Q,

(34)    Q y = λ y.

Then x̂_v = y_v/|y_v| for all v ∈ V(Γ). The structure of this estimator is elucidated by consideration of its local implementation, which can be obtained by applying the power method [8] to (34). This involves applying the matrix Q repeatedly to a random initial vector y^{(0)}. The method converges to an eigenvector of Q corresponding to the eigenvalue of largest magnitude, provided this is the only eigenvalue of largest magnitude and the initial y^{(0)} is not orthogonal to a corresponding left eigenvector of Q. In practice, to avoid numerical overflow, the resultant vector is re-normalized after each application of Q. This is a critical point in turning (34), or any other eigenvector-based estimator, into an algorithm that can be run locally on the network. The normalization of y is a global operation, since it involves each vertex knowing the values of y across the entire network. Without the ability to normalize the vector, it is necessary to ensure that the eigenvalue of largest magnitude is close enough to unity that the iteration converges well before an overflow or underflow condition is reached. The algorithm proposed here indeed has the property that the largest-magnitude eigenvalue λ is approximately one. Further, as demonstrated in the simulations below, it operates reliably to align the network without renormalization.

To examine the local operation of this algorithm more closely, write y_v = a_v x_v, where a_v is a real amplitude and x_v ∈ T for each v ∈ V(Γ). In addition to its phase x_v ∈ T, each vertex keeps an amplitude 0 ≤ a_v ≤ 1 which measures the degree to which it agrees with its neighbors. The local update rule is

(35)    a_v^{(m+1)} x_v^{(m+1)} = (1/d_v) Σ_{u ∼ v} (κ_{(u,v)}/κ_v) a_u^{(m)} ( x_u^{(m)} r_{(u,v)} )^{D_{v,(u,v)}}.

Thus, at each update, the vertex v resets to the weighted circular mean of what its nearest neighbors predict its phase should be (compare this to Jacobi's method for alignment in R). The weighting is based on how well each of the neighbors is aligned with its own neighbors. In this way, the contributions of vertices that have not yet converged are discounted relative to those from vertices that have converged to alignment with their neighbors.
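A sketch of this unnormalized power iteration y ← Qy on a toy ring-with-chord network (the graph, common κ, seed, and iteration count are illustrative choices of ours, not from the paper); the recovered phases match the truth up to the expected global phase ambiguity:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
edges = [(j, (j + 1) % n) for j in range(n)] + [(0, 4)]   # ring plus one chord
x_true = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))    # true phases in T
kappa = 500.0                                             # common concentration
r = {e: x_true[e[1]] * np.conj(x_true[e[0]])              # omega_e = x_t conj(x_s)
        * np.exp(1j * rng.vonmises(0.0, kappa)) for e in edges}

# Neighbour predictions: x_t ~ x_s * r_e and x_s ~ x_t * conj(r_e).
nbrs = {v: [] for v in range(n)}
for (s, t) in edges:
    nbrs[t].append((s, r[(s, t)]))
    nbrs[s].append((t, np.conj(r[(s, t)])))

# Unnormalized power iteration y <- Q y; |y_v| tracks neighbourhood agreement.
y = np.ones(n, dtype=complex)
for _ in range(200):
    y = np.array([np.mean([y[u] * re for (u, re) in nbrs[v]])
                  for v in range(n)])
x_hat = y / np.abs(y)

# Alignment is recovered up to a global phase; compare relative to vertex v_0.
rel_err = 1.0 - abs(np.mean(x_hat * np.conj(x_hat[0])
                            * np.conj(x_true * np.conj(x_true[0]))))
assert rel_err < 1e-2
```

Because each measurement is nearly consistent, the dominant eigenvalue is close to one and the amplitudes |y_v| neither overflow nor underflow over the run, illustrating the point made above about omitting renormalization.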

This estimator exhibits remarkably good performance. In cases tested, the estimator gives mean circular error performance indistinguishable from that of actual maximum likelihood, even when κ ≈ 1. Although it seems to have no real bearing on practical estimation performance, it is possible to construct estimators that give values closer to the ML estimate. One example of such an estimator, the hybrid maximum-likelihood algorithm, starts by running a suitable number of iterations of the local algorithm just described. It then switches to the iteration

(36)    a_v^{(m+1)} x_v^{(m+1)} = (1/d_v) Σ_{u ∼ v} (κ_{(u,v)}/κ_v^{(m)}) a_u^{(m)} ( x_u^{(m)} r_e )^{D_{ve}},

where

κ_v^{(m)} = (1/d_v) Re( (A x^{(m)})_v / x_v^{(m)} ).


Fig. 6. Mean circular error versus concentration parameter κ for the four estimators (local Q eigenvector, local hybrid ML, global Q eigenvector, global A eigenvector), compared with Tr F^{−1} per node, operating on the 31-node network shown in the inset.

At a particular vertex, the algorithm stops when the size of the imaginary component of (A x^{(m)})_v / x_v^{(m)} drops below some threshold. At this point, x̂ satisfies the condition (33) with tolerance corresponding to the threshold, and thus a maximum of the likelihood function to this tolerance is obtained.

The algorithms described above were tested on a number of networks with randomly chosen parameters and also on some networks with specifically chosen characteristics (the results for a single network are given here). The errors on the edges were identically distributed with concentration parameter κ. Two global algorithms were tested. The first is a method proposed in [21] that uses the eigenvector corresponding to the largest eigenvalue of the adjacency matrix (32). The other uses the eigenvector corresponding to the largest eigenvalue of the matrix Q given in (33). These were compared with the two local algorithms; i.e., the algorithm (35) for arriving at the eigenvector of Q corresponding to the largest eigenvalue, and the hybrid ML algorithm described in (36). In general, across the set of random networks simulated, results indicated that the local Q eigenvector estimator works as well as the global methods except at very high noise levels (κ < 3). This local scheme gives essentially ML performance, with average error nearly equal to the trace of the inverse Fisher information. The local algorithms were tested in simulations using only nearest-neighbor communication and with purely local stopping criteria. In each simulation, the mean circular error across the aligned network,

CE(x̂) = 1 − | (1/(|V(Γ)| − 1)) Σ_{v ∈ V(Γ)\{v_0}} x̂_v x̄_v |²,

was computed, where x_v is the true value of the node offset. These results were compared with the trace of the inverse of the per-node Fisher information. Figure 6 shows an example of a 31-node network in which there is a significant difference in performance.

6. Abelian Lie Groups. This section briefly describes the situation in whichthe measurements reside in a general connected abelian Lie group. Such groups are


just products of the groups considered in the previous sections; i.e., G is of the form R^d × T^q, and its elements may thus be written as g = (x, z), where x ∈ R^d and z = (e^{2πiθ_k})_{k=1}^q. Its dual group, which is the group of homomorphisms from G to T, is identifiable with R^d × Z^q, with elements τ = (y, n), via the pairing

⟨g, τ⟩ = ⟨(x, z), (y, n)⟩ = exp i( x·y + 2π Σ_{k=1}^q n_k θ_k ).

As in the formulations treated previously, the edges of the graph Γ are labeled with offsets that are elements of G. Rather than treating this case in complete detail, the focus of what follows is on a synopsis of the aspects of the problem where significant modification of the approach described for earlier cases is needed to treat this more general situation.

A key issue in this setting is the choice of a suitable model for the distribution of the noise corrupting the edge labels. Little work has been done on probability distributions on abelian Lie groups. Even the case of the q-torus T^q for q > 2 is little explored, although the case q = 2 is addressed in [22], where a version of the von Mises distribution is used in connection with an application to estimation of torsion angles in complex molecules. Specifically, the density used there is

p(θ_1, θ_2) = C exp( κ_1 cos(θ_1 − µ_1) + κ_2 cos(θ_2 − µ_2) + λ sin(θ_1 − µ_1) sin(θ_2 − µ_2) ).

The authors have studied this topic and intend to address it in a later paper, where a detailed analysis of appropriate error distributions on such groups will be presented. For the present purpose, however, it suffices to assume sufficiently smooth densities p_e(g) = p_e(x, z) for g = (x, z) ∈ G = R^d × T^q and e ∈ E(Γ) to allow the Fisher information to be calculated. Also, for this discussion, attention is restricted to the case where noise on distinct edges is independent.

The methods adopted in the previous examples can be mimicked and generalized as follows, where now the group operation is written additively (so that the circle group T is represented as the real numbers modulo 2π). The data vector r is, therefore, given by

r_e = ω_e + ε_e = x_{t(e)} − x_{s(e)} + ε_e

for each e ∈ E(Γ). In this expression, ω is parameterized in terms of the vertex offsets x. The probability density for ε is

f(ε) = Π_{e ∈ E(Γ)} f_e(ε_e).

The density for the noisy measurements is

f(r|x) = Π_{e ∈ E(Γ)} f_e(r_e − x_{t(e)} + x_{s(e)}),

and, for given data r, the log-likelihood function is

ℓ(r|x) = Σ_{e ∈ E(Γ)} log f_e(r_e − x_{t(e)} + x_{s(e)}),

where r_e and the x_v are elements of G. It remains to define an "edge Fisher information" F_e as

F_e = −E{ ∇²_{ω_e} log p_e(r_e − ω_e) },

where ∇² involves partial derivatives in each of


the real and circular coordinates of G. Observe that F_e is a (d + q) × (d + q) matrix indexed by the coordinates of G.

In analogy to the case of R^d, the Fisher information matrix for x is given by

F_W = −E{∇²_x log p(r|ω)} = (∇_x ω)^T 𝓕 ∇_x ω,

where 𝓕 is an m(d + q) × m(d + q) matrix with components 𝓕_{(e,s),(e',t)} = δ_{e,e'} [F_e]_{s,t} for s, t = 1, ..., d + q. Using ∇_x ω = I ⊗ D_W^T gives F_W = (I ⊗ D_W) 𝓕 (I ⊗ D_W^T). In a way that closely parallels the multi-dimensional Gaussian case treated in Section 3.4 (cf. equation (21)), the determinant of the Fisher information in this case is

det F_W = Σ_S det(𝓕_{SS}),

where the sum is over all multi-spanning trees S = (S_1, S_2, ..., S_{d+q}) consisting of one spanning tree for each dimension of G.

7. Discussion and Conclusions. The preceding sections have formulated the network alignment problem as one of statistically estimating true offsets between data at communicating nodes from noisy measurements of these values. The Fisher information was derived for the cases treated, thereby providing the corresponding Riemannian metric structure on the parameter manifold as well as bounds on estimator performance. The Fisher information, and thus the information geometry it induces on the Lie group measurement space, were shown to depend on algebraic invariants of the network graph topology in a direct way that provides insight about how network topology can be designed and adapted to promote accurate synchronization across the network. Additionally, explicit estimators were derived and analyzed for important special cases, including the one in which the measurement space is R^d and the noise on the difference measurements labeling the graph edges is additive zero-mean Gaussian with arbitrary covariance and arbitrary correlation from edge to edge. These solutions were pointed out to have a homological character leading to conditions akin to Kirchhoff's laws that ML estimates and their residual errors must satisfy.

The phase alignment problem in which the data are on the circle T has also been treated explicitly and has been pointed out to manifest the critical properties of the more general case in which the data belong to a connected abelian Lie group. In both the R^d and T settings, the Kirchhoff conditions that ML estimates and their residuals must satisfy have been shown to lead to fast approximate ML alignment algorithms that can be implemented locally. Important practical cases, including alignment of local coordinate systems, involve non-abelian Lie groups. Application of the approach set forth here to this class of problems will be treated in a sequel to this paper. Finally, the authors have noted the invariance of estimators for this problem to gauge transformations and have published preliminary results of investigations along this line [9].

REFERENCES

[1] A. S. Bandeira, A. Singer, and D. A. Spielman, A Cheeger inequality for the graph connection Laplacian, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 1611–1630.

[2] B. Bollobás, Modern Graph Theory, Graduate Texts in Mathematics 184, Springer-Verlag, 1998.



[3] D. Bump, Lie Groups, Graduate Texts in Mathematics 225, Springer-Verlag, New York, 2004.

[4] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, Diffusion recursive least-squares for distributed estimation over adaptive networks, IEEE Transactions on Signal Processing, 56 (2008), pp. 1865–1877.

[5] J. Chen and A. H. Sayed, Diffusion adaptation strategies for distributed optimization and learning over networks, IEEE Transactions on Signal Processing, 60 (2012), pp. 4289–4305.

[6] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd ed., 1996.

[7] B. C. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Introduction, Graduate Texts in Mathematics 222, Springer-Verlag, New York, 2003.

[8] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1993.

[9] S. D. Howard, D. Cochran, and W. Moran, Gauge-invariant registration in networks, in Proceedings of the International Conference on Information Fusion, July 2015, pp. 1526–1532.

[10] J. K. Johnson, D. Bickson, and D. Dolev, Fixing the convergence of the Gaussian belief propagation algorithm, in IEEE International Symposium on Information Theory, 2009.

[11] D. Jungnickel, Graphs, Networks and Algorithms, Algorithms and Computation in Mathematics, Springer-Verlag, 3rd ed., 2008.

[12] L. Lamport, Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, 21 (1978).

[13] J.-W. Lee, S.-E. Kim, W.-J. Song, and A. H. Sayed, Spatio-temporal diffusion strategies for estimation and detection over networks, IEEE Transactions on Signal Processing, 60 (2012), pp. 4017–4034.

[14] C. G. Lopes and A. H. Sayed, Diffusion least-mean squares over adaptive networks: Formulation and performance analysis, IEEE Transactions on Signal Processing, 56 (2008), pp. 3122–3136.

[15] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, Walk-sums and belief propagation in Gaussian graphical models, Journal of Machine Learning Research, 7 (2006).

[16] K. V. Mardia and P. Jupp, Directional Statistics, Wiley, 2nd ed., 2000.

[17] K.-L. Noh, Y.-C. Wu, K. Qaraqe, and B. W. Suter, Extension of pairwise broadcast clock synchronization for multicluster sensor networks, EURASIP Journal on Advances in Signal Processing, (2008), pp. 1–10.

[18] N. Patwari, A. O. Hero III, M. Perkins, N. S. Correal, and R. J. O'Dea, Relative location estimation in wireless sensor networks, IEEE Transactions on Signal Processing, 51 (2003), pp. 2137–2148.

[19] M. G. Rabbat and R. D. Nowak, Quantized incremental algorithms for distributed optimization, IEEE Journal on Selected Areas in Communications, 23 (2005), pp. 798–808.

[20] G. Scutari, S. Barbarossa, and L. Pescosolido, Distributed decision through self-synchronizing sensor networks in the presence of propagation delays and asymmetric channels, IEEE Transactions on Signal Processing, 56 (2008), pp. 1667–1684.

[21] A. Singer, Angular synchronization by eigenvectors and semidefinite programming, Applied and Computational Harmonic Analysis, 30 (2011), pp. 20–36.

[22] H. Singh, V. Hnizdo, and E. Demchuk, Probabilistic model for two dependent circular variables, Biometrika, 89 (2002), pp. 719–723.

[23] S. Smale, On the mathematical foundations of electrical circuit theory, Journal of Differential Geometry, 7 (1972), pp. 193–210.

[24] W. Su and I. F. Akyildiz, Time-diffusion synchronization protocol for wireless sensor networks, IEEE/ACM Transactions on Networking, 13 (2005), pp. 384–397.

[25] E. B. Sudderth, Embedded trees: Estimation of Gaussian processes on graphs with cycles, master's thesis, MIT, 2002.

[26] M. J. Wainwright, E. B. Sudderth, and A. S. Willsky, Tree-based modeling and estimation of Gaussian processes on graphs with cycles, in Advances in Neural Information Processing Systems 13, T. Leen, T. Dietterich, and V. Tresp, eds., MIT Press, Cambridge MA, May 2001.

[27] Y. Weiss and W. T. Freeman, Correctness of belief propagation in Gaussian graphical models of arbitrary topology, Neural Computation, 13 (2001), pp. 2173–2200.

[28] X. Zhao, S.-Y. Tu, and A. H. Sayed, Diffusion adaptation over networks under imperfect information exchange and non-stationary data, IEEE Transactions on Signal Processing, 60 (2012), pp. 3460–3475.