TSE‐652
“How to calculate the barycenter of a weighted graph”
Sébastien Gadat, Ioana Gavra and Laurent Risser
May 2016
How to calculate the barycenter of a weighted graph
Sébastien Gadat, Toulouse School of Economics, Université Toulouse 1 Capitole, http://perso.math.univ-toulouse.fr/gadat/
Ioana Gavra, Institut de Mathématiques de Toulouse, Université Toulouse 3 Paul Sabatier
Laurent Risser, Institut de Mathématiques de Toulouse, CNRS, http://laurent.risser.free.fr/
Discrete structures like graphs make it possible to naturally and flexibly model complex phenomena. Since graphs that represent various types of information are increasingly available today, their analysis has become a popular subject of research. The graphs studied in the field of data science at this time generally have a large number of nodes that are not fairly weighted and connected to each other, translating a structural specification of the data. Yet, even an algorithm for locating the average position in graphs is lacking, although this knowledge would be of primary interest for statistical or representation problems. In this work, we develop a stochastic algorithm for finding the Fréchet mean of weighted undirected metric graphs. This method relies on a noisy simulated annealing algorithm dealt with using homogenization. We then illustrate our algorithm with two examples (subgraphs of a social network and of a collaboration and citation network).
Key words: metric graphs; Markov process; simulated annealing; homogenization
MSC2000 subject classification: 60J20, 60J60, 90B15
OR/MS subject classification: Primary: Probability - Markov processes; secondary: Networks/graphs - Traveling salesman
History:
1. Introduction.
Numerous open questions in a very wide variety of scientific domains involve complex discrete structures that are easily modeled by graphs. The nature of these graphs may be weighted or not, directed or not, observed online or by using batch processing, each time implying new problems and sometimes leading to difficult mathematical or numerical questions. Graphs are the subject of perhaps one of the most impressive growing bodies of literature dealing with potential applications in statistical or quantum physics (see, e.g., [21]), economics (dynamics in economy structured as networks), biology (regulatory networks of genes, neural networks), informatics (Web understanding and representation), social sciences (dynamics in social networks, analysis of citation graphs). We refer to [37] and [43] for recent communications on the theoretical aspects of random graph models, questions in the field of statistics and graphical models, and related numerical algorithms. In [34], the authors have developed an overview of numerous possible applications using graphs and networks in the fields of industrial organization and economics. Additional applications, details and references in the field of machine learning may also be found in [28].
In the meantime, the nature of the mathematical questions raised by the models that involve networks is very extensive and may concern geometry, statistics, algorithms or dynamical evolution over the network, to name a few. For example, we can be interested in the definition of suitable random graph models that make it possible to detect specific shape phenomena observed at different scales and frameworks (the small world networks of [47], the existence of a giant connected component for a specific range of parameters in the Erdős-Rényi random graph model [20]). A complete survey may be found in [39]. Another important field of investigation is dedicated to graph visualization (see some popular methods in [45] and [35], for example). In statistics, a popular topic
Author: On the calculation of a graph barycenter, 00(0), pp. 000–000, © 0000 INFORMS
deals with the estimation of a natural clustering when the graph follows a specific random graph model (see, among others, the recent contribution [36] that proposes an optimal estimator for the stochastic block model and smooth graphons). Other approaches rely on efficient algorithms that analyze the spectral properties of adjacency matrices representing the networks (see, e.g., [6]). A final important field of interest deals with the evolution of a dynamical system defined through a discrete graph structure: this is, for instance, the question raised by gossip models that may describe belief evolution over a social network (see [1] and the references therein).
We address a problem here that may be considered as very simple at first glance: we aim to define and estimate the barycenter of weighted graphs. Hence, this problem involves questions that straddle the area of statistics and the geometry of graphs. Surprisingly, as far as we know, this question has received very little attention, although a good assessment of the location of a weighted graph barycenter could be used for fair representation issues or for understanding the graph structure from a statistical point of view. In particular, it could be used and extended to produce “second order” moment analyses of graphs. We could therefore generalize a Principal Component Analysis by extending this framework. This intermediary step was used, for example, by [13] to extend the definition of PCA on the space of probability measures on $\mathbb{R}$, and by [14] to develop a suitable geometric PCA of a set of images. A popular strategy to define moments in complex metric spaces is to use the variational interpretation of means (or barycenters), which leads to the introduction of Fréchet (or Karcher) means (see Section 2.2 for an accurate definition). This approach has been introduced in the seminal contribution [23] that makes it possible to define $p$-means over any metric probability space.
The use of Fréchet means has met with great interest, especially in the field of bio-statistics and signal processing, and mathematical and statistical derivations around this notion constitute a growing field of interest.
• In continuous domains, many authors have recently proposed limit theorems on the convergence of the empirical Fréchet sample mean (only a sample of size $n$ of the probability law is observed) towards its population counterpart. These works were mostly guided by applications to continuous manifolds that describe shape spaces introduced in [19]. For example, [38] establishes the consistency of the population Fréchet mean and derives applications in the Kendall space. The study of [9] establishes the consistency of Fréchet empirical means and derives its asymptotic distribution when $n \to +\infty$, whereas the observations live in more general Riemannian manifolds. Finer results can be obtained in some non-parametric restrictive situations (see, e.g., [11, 12, 15] dedicated to the so-called shape invariant model). Many applications in various domains involving signal processing can also be found: ECG curve analysis [10] and image analysis [44, 2], to name a few.
• Recent works treat Fréchet means in a discrete setting, especially when dealing with phylogenetic trees that have an important hierarchical structure property. In particular, [8] proposed a central limit theorem in this discrete case, whereas [41] used an idea of Sturm for spaces with non-positive curvature to define an algorithm for the computation of the population Fréchet mean. Other works deal with the averaging of discrete structure sequences such as diagrams using the Wasserstein metric (see, e.g., [42]) or graphs [27].
Our work departs from the above-mentioned contributions since we build an algorithm that recovers the Fréchet mean of a weighted graph while observing an infinite sequence of nodes of the graph, instead of finding the population Fréchet mean of a set of discrete structures, as proposed in [8, 41, 42, 27]. Our algorithm uses recent contributions on simulated annealing ([3, 4]). It relies on a continuous-time noisy simulated annealing Markov process on graphs, as well as a second process that accelerates and homogenizes the updates of the noisy transitions in the simulated annealing procedure.
The paper is organized as follows: Section 2 highlights the different difficulties raised by the computation of the Fréchet mean of weighted graphs and describes the algorithm we propose. In
Section 3.1, theoretical backgrounds to understand the behavior of our method are developed on the basis of this algorithm, and Section 3.2 states our main results. Simulations and numerical insights are then given in Section 4. The convergence of the algorithm is theoretically established in Section 5, whereas Section 6 describes functional inequalities in quantum graphs that will be introduced below.
Acknowledgments.
The authors gratefully acknowledge Laurent Miclo for his stimulating discussions and helpful comments throughout the development of this work, and Nathalie Villa-Vialaneix for her interest and advice concerning simulations. The authors are also indebted to zbMATH for making their database available to produce numerical simulations.
2. Stochastic algorithm on quantum graphs
We consider $G = (V,E)$ a finite connected and undirected graph with no loop, where $V = \{1, \ldots, N\}$ refers to the $N$ vertices (also called nodes) of $G$, and $E$ the set of edges that connect some couples of vertices in $G$.
2.1. Undirected weighted graphs
The structure of $G$ may be described by the adjacency matrix $W$ that gives a non-negative weight to each edge of $E$ (pair of connected vertices), so that $W = (w_{i,j})_{1 \leq i,j \leq N}$ while $w_{i,j} = +\infty$ if there is no direct link between node $i$ and node $j$. $W$ indicates the length of each direct link in $E$: a small positive value of $w_{i,j}$ represents a small length of the edge $\{i,j\}$. We assume that $G$ is undirected (so that the adjacency matrix $W$ is symmetric) and connected: for any couple of nodes in $V$, we can always find a path that connects these two nodes. Finally, we assume that $G$ has no self loop. Hence, the matrix $W$ satisfies:
$$\forall i \neq j \quad w_{i,j} = w_{j,i} \qquad \text{and} \qquad \forall i \in V \quad w_{i,i} = 0.$$
We define $d(x,y)$ as the geodesic distance between two points $(x,y) \in V^2$, which is the length of the shortest path between them. The length of this path is given by the addition of the length of traversed edges:
$$\forall (i,j) \in V^2 \quad d(i,j) = \min_{i = i_1 \to i_2 \to \ldots \to i_{k+1} = j} \ \sum_{\ell=1}^{k} w_{i_\ell, i_{\ell+1}}.$$
When the length of the edges is constant and equal to 1, it simply corresponds to the number of traversed edges. Since the graph is connected and finite, we introduce the definition of the diameter of $G$:
$$D_G := \sup_{(x,y) \in V^2} d(x,y).$$
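The definitions of $d$ and $D_G$ above can be illustrated with a minimal sketch. It assumes a small hypothetical weight matrix `W` (not taken from the paper) and uses the Floyd-Warshall recursion, which is one standard way to obtain all geodesic distances; the function names are illustrative only.

```python
import math

def geodesic_distances(w):
    """All-pairs shortest-path lengths (Floyd-Warshall) for a symmetric
    weight matrix w, where math.inf marks the absence of a direct edge."""
    n = len(w)
    d = [row[:] for row in w]          # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def diameter(d):
    """Largest geodesic distance between two nodes."""
    return max(max(row) for row in d)

INF = math.inf
# Hypothetical graph: path 0 - 1 - 2 - 3 plus a long direct edge 0 - 3.
W = [
    [0.0, 1.0, INF, 5.0],
    [1.0, 0.0, 2.0, INF],
    [INF, 2.0, 0.0, 1.0],
    [5.0, INF, 1.0, 0.0],
]
D = geodesic_distances(W)
D_G = diameter(D)
print(D[0][3], D_G)
```

In this toy graph the direct edge between nodes 0 and 3 has length 5, but the path through nodes 1 and 2 is shorter, so the geodesic distance differs from the edge weight.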
To define a barycenter of a graph, it is necessary to introduce a discrete probability distribution $\nu$ over the set of vertices $V$. This probability distribution is used to measure the influence of each node on the graph.
Example 1. Let us consider a simple scientometric example illustrated in Figure 1: a “toy” co-authorship relation that could be obtained in a subgraph of a collaboration network like zbMATH¹. If two authors $A$ and $B$ share $k_{A,B}$ joint papers, it is a reasonable choice to use a weight $w_{A,B} = \varphi(k_{A,B})$, where $\varphi$ is a convex function satisfying $\varphi(0) = +\infty$, $\varphi(1) = 1$ and $\varphi(+\infty) = 0$. This means that no joint paper between $A$ and $B$ leads to the absence of a direct link between $A$ and $B$
1 https://zbmath.org/
Figure 1. Example of a weighted graph between five authors, where the probability mass is {0.5, 0.1, 0.1, 0.2, 0.1}, that is, $\nu(1) = 0.5$, $\nu(2) = 0.1$, $\nu(3) = 0.1$, $\nu(4) = 0.2$, $\nu(5) = 0.1$. In this graph, Author 1 shares five publications with Author 2 and ten publications with Author 5, and so on. [Figure not reproduced; the edge lengths displayed are 1/5, 1/10, 1, 1/10, 1/20 and 1/30.]
on the graph. On the contrary, the more papers there are between $A$ and $B$, the closer $A$ and $B$ will be on the graph. Of course, this graph may be embedded in a probability space with the additional definition of a probability distribution over the authors that can be naturally proportional to the number of citations of each author. This is a generalization of the Erdős graph. Note that this type of example can also be encountered when dealing with movies and actors, leading, for example, to the Bacon number and graph (see the website: www.oracleofbacon.org/).
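As a hedged illustration of Example 1, the sketch below builds a weight matrix and a distribution $\nu$ from invented co-authorship and citation counts. The choice $\varphi(k) = 1/k$ is an assumption of this sketch: it satisfies the three stated conditions and appears consistent with the edge lengths of Figure 1, but the text does not impose it. All author names and counts are hypothetical.

```python
import math

def phi(k):
    """One admissible choice (assumed here): phi(k) = 1/k, which is convex
    on k > 0 with phi(0) = +inf, phi(1) = 1 and phi(k) -> 0 as k grows."""
    return math.inf if k == 0 else 1.0 / k

# Hypothetical joint-paper counts between three authors.
joint_papers = {("A", "B"): 5, ("A", "C"): 0, ("B", "C"): 2}
authors = ["A", "B", "C"]
index = {a: i for i, a in enumerate(authors)}

# Edge lengths: zero on the diagonal, +inf when there is no joint paper.
n = len(authors)
W = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
for (a, b), k in joint_papers.items():
    i, j = index[a], index[b]
    W[i][j] = W[j][i] = phi(k)

# Hypothetical citation counts, normalised into the distribution nu.
citations = {"A": 50, "B": 30, "C": 20}
total = sum(citations.values())
nu = [citations[a] / total for a in authors]
print(W, nu)
```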
2.2. Fréchet mean of an undirected weighted graph
Following the simple remark that the $p$-mean of any distribution $\nu$ on $\mathbb{R}^d$ is the point that minimizes
$$x \longmapsto \mathbb{E}_{Z \sim \nu}\left[|x - Z|^p\right],$$
it is legitimate to be interested in the Fréchet mean of a graph $(G,\nu)$ where $\nu$ refers to the probability distribution over each node. The Fréchet mean is introduced in [23] to generalize this variational approach to any metric space. Although we have chosen to restrict our work to the case of $p = 2$, which corresponds to the Fréchet mean definition, we believe that our work could be generalized to any value of $p \geq 1$. If $d$ denotes the geodesic distance w.r.t. $G$, we are interested in solving the following minimization problem:
$$M_\nu := \arg\min_{x \in E} U_\nu(x) \quad \text{where} \quad U_\nu(x) = \frac{1}{2} \int_G d^2(x,y)\, d\nu(y). \tag{1}$$
Hence, the Fréchet mean of $(G,\nu)$, denoted $M_\nu$, is the set of all possible minimizers of $U_\nu$. Note that this set is not necessarily a singleton and this uniqueness property generally requires some additional topological assumptions (see [3], for example). At this point, we want to make three important remarks about the difficulty of this optimization problem:
• The problem of finding $M_\nu$ involves the minimization of $U_\nu$, which is a non-convex function with the possibility of numerous local traps. To our knowledge, this problem cannot be efficiently solved using either a relaxed solution or using a greedy/dynamic programming algorithm (in the spirit of the Dijkstra method that makes it possible to compute geodesic paths [18]).
• It is thus natural to think about the use of a global minimization procedure, and, in particular, the simulated annealing (S.A. for short) method. S.A. is a standard strategy to minimize a function over discrete spaces and its computational cost is generally high. It relies on an inhomogeneous Markov process that evolves on the graph with a transition kernel depending on the energy estimates $U_\nu$. In other words, it requires the computation of $U_\nu$ that depends on an integral w.r.t. $\nu$. Since we plan to handle large graphs, this last dependency can be a very strong limitation.
• In some cases, even the global knowledge of $\nu$ may not be realistic, and the importance of each node can only be revealed through i.i.d. sequential arrivals of new observations in $E$ that are distributed according to $\nu$. This may be the case, for example, if we consider a probability distribution over $E$ that is cropped while gathering interactive forms on a website.
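To make the cost mentioned in the remarks above concrete, here is a minimal brute-force sketch that evaluates $U_\nu$ at every node and returns the minimizers. It assumes that the geodesic distance matrix `D` is already available and that $\nu$ is fully known, which is exactly what the second and third remarks identify as unrealistic on large graphs; the search is also restricted to nodes, whereas the algorithm of the paper works on the quantum graph. The toy path graph is hypothetical.

```python
def frechet_mean_nodes(D, nu):
    """Minimisers and values of U_nu(x) = 0.5 * sum_y d(x, y)^2 * nu(y),
    with x restricted to the nodes; costs O(N^2) distance look-ups."""
    n = len(D)
    U = [0.5 * sum(D[x][y] ** 2 * nu[y] for y in range(n)) for x in range(n)]
    best = min(U)
    argmin = [x for x in range(n) if abs(U[x] - best) < 1e-12]
    return argmin, U

# Hypothetical path graph 0 - 1 - 2 with unit edge lengths.
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
nu = [0.2, 0.3, 0.5]
M, U = frechet_mean_nodes(D, nu)
print(M, U)
```

Each evaluation of $U_\nu(x)$ sums over all nodes, so a full scan needs the complete distance matrix and the complete distribution, which motivates the sampled approach developed in the next sections.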
2.3. Outline of Simulated Annealing (S.A.)
The optimization with S.A. introduces a Markov random dynamical system that evolves either in continuous time (generally for continuous spaces) or in discrete time (for discrete spaces). When dealing with a discrete setting, S.A. is based on a Markov proposal kernel $L(\cdot,\cdot) : V \times V \to [0,1]$ related to the 1-neighborhoods of the Markov chain, as introduced in [30]. It is based on an inhomogeneous Metropolis-Hastings scheme, which is recalled in Algorithm 1. We can derive asymptotic guarantees of the convergence towards a minimum of $U$ as soon as the cooling schedule is well chosen.
Algorithm 1: M.-H. Simulated Annealing
Data: Function $U$. Decreasing temperature sequence $(T_k)_{k \geq 0}$
1 Initialization: $X_0 \in V$;
2 for $k = 0 \ldots +\infty$ do
3     Draw $x' \sim L(X_k, \cdot)$ and compute $p_k = 1 \wedge \left\{ e^{T_k^{-1}[U(X_k) - U(x')]} \, \frac{L(x', X_k)}{L(X_k, x')} \right\}$
4     Update $X_{k+1}$ according to: $X_{k+1} = x'$ with probability $p_k$, and $X_{k+1} = X_k$ with probability $1 - p_k$
5 end
6 Output: $\lim_{k \to +\infty} X_k$
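Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the proposal kernel is taken to be uniform over the neighbors of the current node, the run is truncated after `n_steps` iterations, the exponent is capped to avoid floating-point overflow, and the energies and logarithmic cooling schedule in the example are hypothetical.

```python
import math
import random

def simulated_annealing(U, neighbors, x0, n_steps, temperature, rng):
    """Metropolis-Hastings simulated annealing on a finite graph.

    U: dict node -> energy; neighbors: dict node -> list of adjacent nodes;
    temperature: function k -> T_k > 0; rng: a random.Random instance."""
    x = x0
    for k in range(n_steps):
        x_new = rng.choice(neighbors[x])
        # Ratio L(x', x) / L(x, x') for the uniform-neighbour proposal.
        ratio = len(neighbors[x]) / len(neighbors[x_new])
        # Cap the exponent (safeguard of this sketch) to avoid overflow.
        exponent = min((U[x] - U[x_new]) / temperature(k), 50.0)
        p = min(1.0, math.exp(exponent) * ratio)
        if rng.random() < p:
            x = x_new
    return x

# Hypothetical path graph 0 - 1 - 2 with node energies.
U = {0: 1.15, 1: 0.35, 2: 0.55}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x_end = simulated_annealing(
    U, neighbors, x0=0, n_steps=2000,
    temperature=lambda k: 1.0 / math.log(2.0 + k),
    rng=random.Random(0),
)
print(x_end)
```

Note that every iteration reads the exact energies `U[x]` and `U[x_new]`, which is the step that becomes expensive when $U = U_\nu$ requires an integral over $\nu$.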
When dealing with a continuous setting, S.A. uses a drifted diffusion with a vanishing variance $(\varepsilon_t)_{t \geq 0}$ over $V$, or an increasing drift coefficient $(\beta_t)_{t \geq 0}$. We refer to [32, 40] for details and we recall its Langevin formulation in Algorithm 2.
Algorithm 2: Langevin Simulated Annealing
Data: Function $U$. Increasing inverse temperature $(\beta_t)_{t \geq 0}$
1 Initialization: $X_0 \in V$;
2 $\forall t \geq 0 \quad dX_t = -\beta_t \nabla U(X_t)\,dt + dB_t$
3 Output: $\lim_{t \to +\infty} X_t$
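A minimal sketch of Algorithm 2 on the real line, assuming an Euler-Maruyama time discretization with a fixed step `dt` (the paper states the continuous-time equation only). The quadratic energy, the schedule and the step size are illustrative choices.

```python
import math
import random

def langevin_sa(grad_U, x0, t_end, dt, beta, rng):
    """Euler-Maruyama discretisation of dX = -beta(t) grad U(X) dt + dB."""
    x, t = x0, 0.0
    while t < t_end:
        noise = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += -beta(t) * grad_U(x) * dt + noise
        t += dt
    return x

# Hypothetical energy U(x) = 0.5 * (x - 2)^2 with gradient x - 2.
x_end = langevin_sa(
    grad_U=lambda x: x - 2.0,
    x0=-1.0, t_end=20.0, dt=0.01,
    beta=lambda t: math.log(2.0 + t),
    rng=random.Random(0),
)
print(x_end)
```

The explicit scheme is only stable when `beta(t) * dt` stays small relative to the curvature of $U$, which is why the step is kept short here.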
In both cases, we can see that S.A. with $U = U_\nu$ given by (1) involves the computation of the value of $U$ in Line 4 of Algorithm 1, or the computation of $\nabla U$ in Line 2 of Algorithm 2. These two computations are problematic for our Fréchet mean problem: the integration over $\nu$ is intractable in the situation of large graphs and we are naturally driven to consider a noisy version of S.A. A possible alternative method for this problem is to use a homogenization technique: replacing $U_\nu$ in the definition of $p_k$ by $U_y(\cdot) = d^2(\cdot, y)$, where $y$ is a value from an i.i.d. sequence $(Y_n)_{n \geq 0}$ distributed according to $\nu$. Such methods have been developed in [29] as a modification of Algorithm 1 with an additional Monte-Carlo step in Line 4, when $U_Y(x)$ follows a Gaussian distribution centered around the true value of $U(x) = \mathbb{E}_{Y \sim \nu} U_Y(x)$. This approach is still problematic in our case since the Gaussian assumption on the random variable $d^2(x,Y)$ is unrealistic here. Another limitation of this MC step relies on the fact that it requires a batch average of several $(U(x,Y_j))_{1 \leq j \leq n_k}$ where
$n_k$ is the number of observations involved at iteration $k$, whereas we also plan to develop an algorithm that may be adapted to on-line arrivals of the observations $(Y_n)_{n \geq 0}$. Lastly, it is important to observe that the non-linearity of the exponential prevents the use of only one observation $Y_k$ in the acceptance/rejection ratio involved in the S.A. since it does not lead to an unbiased evaluation of the true transition:
$$\mathbb{E}_{Y \sim \nu}\left[ e^{T_k^{-1}[U_Y(X_k) - U_Y(x')]} \right] \neq e^{T_k^{-1} \mathbb{E}_{Y \sim \nu}[U_Y(X_k) - U_Y(x')]}.$$
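The bias stated in the inequality above can be checked numerically on a toy case. The sketch assumes a two-point distribution $\nu$ and invented energy differences $\Delta_y = U_y(X_k) - U_y(x')$; by Jensen's inequality the left-hand side is never smaller than the right-hand side.

```python
import math

def one_sample_bias(deltas, nu, T):
    """Return (E[exp(Delta_Y / T)], exp(E[Delta_Y] / T)) for a discrete nu,
    where deltas[i] is the energy difference under the i-th outcome."""
    lhs = sum(p * math.exp(d / T) for d, p in zip(deltas, nu))
    rhs = math.exp(sum(p * d for d, p in zip(deltas, nu)) / T)
    return lhs, rhs

# Hypothetical differences: +1 or -1 with equal probability, T = 1.
lhs, rhs = one_sample_bias([1.0, -1.0], [0.5, 0.5], 1.0)
print(lhs, rhs)
```

Here the averaged difference is zero, so the exact transition would use a ratio of 1, while averaging the exponential of single observations gives $\cosh(1)$.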
This difficulty does not arise in the homogenization of the simulated annealing algorithm in the continuous case since the exponential is replaced by a gradient, i.e., the process we use is a Markov process of the form:
$$dX_t = -\beta_t \nabla U_{Y_t}(X_t)\,dt + dB_t,$$
where $B_t$ is a “Brownian motion” on the graph $G$, $\beta_t^{-1}$ refers to the temperature (so that $\beta_t$ is the inverse temperature), and $Y_t$ is a continuous time Markov process obtained from the sequence $(Y_n)_{n \in \mathbb{N}}$. This point motivates the introduction of the quantum graph induced by the initial graph. Of course, dealing with a continuous diffusion over a quantum graph $G$ deserves special theoretical attention, which will be given in Section 3.1.
2.4. Homogenized S.A. algorithm on a quantum graph
We now present the proposed algorithm for estimating Fréchet means. To do so, we first introduce the quantum graph $\Gamma_G$ derived from $G = (V,E)$ that corresponds to the set of points living inside the edges $e \in E$ of the initial graph. Once an orientation is arbitrarily fixed for each edge of $E$, the location of a point in $\Gamma_G$ depends on the choice of an edge $e \in E$ and on a coordinate $x_e \in [0, L_e]$ where $L_e$ is the length of edge $e$ on the initial graph. The coordinate 0 then refers to the initial point of $e$ and $L_e$ refers to the other extremity.
While considering the quantum graph $\Gamma_G$, it is still possible to define the geodesic distance between any point $x \in \Gamma_G$ and any node $y \in G$. In particular, when $x \in V$, we use the initial definition of the geodesic distance over the discrete graph, whereas when $x \in e \in E$ with a coordinate $x_e \in [0, L_e]$, the geodesic distance between $x$ and $y$ is:
$$d(x,y) = \{x_e + d(e(0), y)\} \wedge \{L_e - x_e + d(e(L_e), y)\}.$$
This definition can be naturally generalized to any two points of $\Gamma_G$, enabling us to consider the metric space $(\Gamma_G, d)$.
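The distance between a point inside an edge and a node translates directly into code. The sketch below assumes that the node-to-node distance matrix `D` has been computed beforehand and that edges are stored as ordered pairs `(e(0), e(L_e))`; the triangle graph and the container names are hypothetical.

```python
def dist_point_to_node(edge, x_e, y, D, lengths):
    """Geodesic distance from the point at coordinate x_e on `edge` = (u, v),
    with u = e(0) and v = e(L_e), to the node y."""
    u, v = edge
    L_e = lengths[edge]
    return min(x_e + D[u][y], L_e - x_e + D[v][y])

# Hypothetical triangle with unit edge lengths.
D = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
lengths = {(0, 1): 1.0}

d_a = dist_point_to_node((0, 1), 0.3, 2, D, lengths)   # leaves through node 0
d_b = dist_point_to_node((0, 1), 0.5, 2, D, lengths)   # both directions tie
print(d_a, d_b)
```

At the midpoint of the edge the two candidate paths have the same length, which is the kind of point where the squared distance to $y$ stops being differentiable.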
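Consider a positive, continuous and increasing function $t \longmapsto \alpha_t$ such that:
$$\lim_{t \to +\infty} \alpha_t = +\infty \quad \text{and} \quad \forall t \geq 0 \quad \beta_t = o(\alpha_t).$$
We introduce $(N^\alpha_t)_{t \geq 0}$, an inhomogeneous Poisson process over $\mathbb{R}_+$ with intensity $\alpha$. It is standard to represent $N^\alpha$ through a homogeneous Poisson process $H$ of intensity 1 using the relationship:
$$\forall t \geq 0 \quad N^\alpha_t = H_{h(t)}, \quad \text{where} \quad h(t) = \int_0^t \alpha_s\,ds.$$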
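The time-change representation above gives a direct way to simulate the arrival times of $N^\alpha$. The sketch assumes the specific intensity $\alpha_s = 1 + s$ (the choice later used in Theorem 1), for which $h(t) = t + t^2/2$ and $h^{-1}(s) = -1 + \sqrt{1 + 2s}$; the function names are illustrative.

```python
import math
import random

def h(t):
    """Cumulative intensity h(t) for alpha_s = 1 + s."""
    return t + 0.5 * t * t

def h_inverse(s):
    """Inverse of h on the non-negative half-line."""
    return -1.0 + math.sqrt(1.0 + 2.0 * s)

def arrival_times(t_end, rng):
    """Jump times of N^alpha on [0, t_end], obtained by mapping the jump
    times of a unit-rate Poisson process H through h_inverse."""
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)      # next jump of H
        t = h_inverse(s)
        if t > t_end:
            return times
        times.append(t)

times = arrival_times(10.0, random.Random(0))
print(len(times))
```

The expected number of arrivals on $[0, 10]$ is $h(10) = 60$, and the gaps between consecutive arrivals shrink as the intensity grows.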
Using this accelerated process $N^\alpha$, our optimization algorithm over $\Gamma_G$ is based on the Markov process that solves the following stochastic differential equation over $\Gamma_G$:
$$\begin{cases} X_0 \in \Gamma_G \\ dX_t = -\beta_t \nabla U_{Y_{N^\alpha_t}}(X_t)\,dt + dB_t. \end{cases} \tag{2}$$
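Using the definition of $N^\alpha_t$ and the basic properties of a Poisson process, it can be observed that for all $\varepsilon > 0$ and all $t \geq 0$:
$$\mathbb{E}\left[ \frac{N^\alpha_{t+\varepsilon} - N^\alpha_t}{\varepsilon} \right] = \frac{1}{\varepsilon} \int_t^{t+\varepsilon} \alpha_s\,ds.$$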
Hence, $\alpha$ should be understood as the speed of new arrivals in the sequence $(Y_n)_{n \in \mathbb{N}}$.
Our definition (2) is slightly inaccurate: for any $y \in V(G)$, the function $U_y$ is $C^1$ except at a finite set of points of $\Gamma_G$:
• This can be the case inside an edge $x \in e$ with $x_e \in (0, L_e)$ when at least two different geodesic paths from $x$ to $y$ start in opposite directions. Then, we define $\nabla U_y(x) = 0$.
• This can also be the case at a node $x \in V$ when several geodesic paths start from $x$ to $y$. In that case, we once again arbitrarily impose a null value for the “gradient” of $U_y$ at node $x$. This prior choice will not have any influence on the behavior of the algorithm.
We will first detail below the theoretical objects involved in (2), then we will describe an efficient discretization of (2) that makes it possible to derive our practical optimization algorithm.
Algorithm 3: Homogenized Simulated Annealing over a quantum graph
Data: Function $U$. Increasing inverse temperature $(\beta_t)_{t \geq 0}$. Intensity $(\alpha_t)_{t \geq 0}$
1 Initialization: Pick $X_0 \in \Gamma_G$;
2 $T_0 = 0$;
3 for $k = 0 \ldots +\infty$ do
4     while $N^\alpha_t = k$ do
5         $X_t$ evolves as a Brownian motion, relatively to the structure of $\Gamma_G$, initialized at $X^-_{T_k}$.
6     end
7     $T_{k+1} := \inf\{t : N^\alpha_t = k+1\}$;
8     At time $t = T_{k+1}$, draw $Y_{k+1} = Y_{N^\alpha_t}$ according to $\nu$.
9     The process $X_t$ jumps from $X^-_t$ towards $Y_{k+1}$:
          $$X_t = X^-_t + \beta_t \alpha_t^{-1} \overrightarrow{X_t Y_{N^\alpha_t}}, \tag{3}$$
      where $\overrightarrow{X_t Y_{N^\alpha_t}}$ represents the shortest (geodesic) path from $X_t$ to $Y_{N^\alpha_t}$ in $\Gamma_G$.
10 end
11 Output: $\lim_{t \to +\infty} X_t$
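The following sketch illustrates the structure of Algorithm 3 in the simplest possible setting and should not be read as the authors' implementation. It assumes that $G$ is a path graph, so that $\Gamma_G$ can be identified with a segment $[0, L]$: the Brownian motion is then a Brownian motion reflected at both ends, geodesics are straight, and the jump (3) is a convex step towards the sampled node. It also assumes $\alpha_t = 1 + t$ and $\beta_t = b \log(1 + t)$, truncates the run at `t_end`, and caps the jump ratio at 1 as a safeguard; node positions and masses are hypothetical.

```python
import math
import random

def fold(x, length):
    """Reflect a real number into [0, length] (reflected Brownian motion)."""
    x = x % (2.0 * length)
    return 2.0 * length - x if x > length else x

def homogenized_sa_on_path(nodes, nu, x0, t_end, b, rng):
    """Sketch of Algorithm 3 when Gamma_G is a segment [0, nodes[-1]].

    nodes: increasing node coordinates starting at 0; nu: node masses."""
    length = nodes[-1]
    x, t, s = x0, 0.0, 0.0
    while True:
        s += rng.expovariate(1.0)                   # unit-rate clock H
        # Next jump time h^{-1}(s) for alpha_t = 1 + t, truncated at t_end.
        t_next = min(-1.0 + math.sqrt(1.0 + 2.0 * s), t_end)
        # Brownian displacement between two jump times, reflected at the ends.
        x = fold(x + math.sqrt(t_next - t) * rng.gauss(0.0, 1.0), length)
        t = t_next
        if t >= t_end:
            return x
        # Jump of relative size beta_t / alpha_t towards a node drawn from nu.
        y = rng.choices(nodes, weights=nu)[0]
        ratio = min(1.0, b * math.log(1.0 + t) / (1.0 + t))
        x += ratio * (y - x)

nodes = [0.0, 1.0, 3.0]
nu = [0.2, 0.3, 0.5]
x_final = homogenized_sa_on_path(nodes, nu, x0=0.0, t_end=200.0, b=1.0,
                                 rng=random.Random(0))
print(x_final)
```

On a segment, the minimizer of $U_\nu$ over $\Gamma_G$ is the ordinary mean of $\nu$ (1.8 for these values), so the output of a long run is expected to fluctuate around that point; a single finite run remains random.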
Algorithm 3 could be studied following the road map of [5]. Nevertheless, this implies serious regularity difficulties on the densities and the Markov semi-group involved. Hence, we have chosen to consider Algorithm 3 as a natural Euler explicit discretization of our Markov evolution (2): for a large value of $k$, the average time needed to travel from $T_k$ to $T_{k+1}$ is approximately $\alpha_{T_k}^{-1} \to 0$ as $k \to +\infty$. On this short time interval, the drift term in (2) is the gradient of the squared geodesic distance between $X_{T_k}$ and $Y_{N^\alpha_{T_k}}$, which is approximated by our vector $\overrightarrow{X_t Y_k}$, multiplied by $\beta_{T_k}$, leading to (3). $X_t$ now evolves as a Brownian motion over $\Gamma_G$ between two jump times and this evolution can be simulated with a Gaussian random variable using a (symmetric) random walk when the algorithm hits a node of $\Gamma_G$. Figure 2 proposes a schematic evolution of $(X_t)_{t \geq 0}$ over a simple graph $\Gamma_G$ with five nodes. We will prove the following result.
Theorem 1. A constant $c^\star(U_\nu)$ exists such that if $\alpha_t = 1 + t$ and $\beta_t = b \log(1 + t)$ with $b < \{c^\star(U_\nu)\}^{-1}$, then $(X_t)_{t \geq 0}$ defined in (2) converges a.s. to $M_\nu$ defined in (1), when $t$ goes to infinity.
A more rigorous form of this result and more details about the process defined in (2) are presented in Section 3.2. The proof of Theorem 3 is deferred to Section 5.
3. Inhomogeneous Markov process over $\Gamma_G$
This section presents the theoretical background needed to define the Markov evolution (2).
Figure 2. Schematic evolution of the Homogenized S.A. described in Algorithm 3 over the quantum graph. We first observe a jump at time $T_1$ towards $Y_{T_1} = V_1$ and a Brownian motion on $\Gamma_G$ during $T_2 - T_1$. A second node is then sampled according to $\nu$: here, $Y_{T_2} = V_5$ and a jump towards $Y_{T_2}$ occurs at time $T_2$. [Figure not reproduced; it shows five nodes $V_1, \ldots, V_5$, the positions $X^-_{T_1}$, $X_{T_1}$, $X^-_{T_2}$, $X_{T_2}$, a Brownian displacement between the two jump times, and jumps of size $\{\beta_{T_k} \alpha_{T_k}^{-1}\} \times d(X^-_{T_k}, Y_{T_k})$ for $k = 1, 2$.]
3.1. Diffusion processes on quantum graphs
We adopt here the convention introduced in [24] and fix for any edge $e \in E$ of length $L_e$ an orientation (and parametrization $s_e$). This means that $s_e(0)$ is one of the extremities of $e$ and $s_e(L_e)$ is the other one. By doing so, we have determined an orientation for $\Gamma_G$.
Dynamical system inside one edge. Following the parametrization of each edge, we can define the second order elliptic operator $\mathcal{L}_e$ as $\forall (x,y,t) \in e \times V(G) \times \mathbb{R}_+$:
$$\mathcal{L}_e f(x,y,t) = -\beta_t \nabla_x U_y(x)\, \nabla_x f(x,y) + \frac{1}{2} \Delta_x f(x,y) + \alpha_t \int_G [f(x,y') - f(x,y)]\,d\nu(y'), \tag{4}$$
which is associated with (2) when $x \in e$: the $x$-component follows a standard diffusion drifted by $\nabla_x U_y(\cdot)$ inside the edge $e$, whereas the $y$-component jumps over the nodes of the initial graph $G$ with a jump distribution $\nu$ and a rate $\alpha_t$.
Since the drift term $\nabla U_y(\cdot)$ is measurable w.r.t. the Lebesgue measure over $e$ and the second-order part of the operator is uniformly elliptic, $\mathcal{L}_e$ uniquely defines (in the weak sense) a diffusion process up to the first time the process hits one of the extremities of $e$ (see, e.g., [33]), which leads to a Feller Markov semi-group.
Dynamical system near one node. We adopt the notation of [26] and write $e \sim v$ when a vertex $v \in V$ is an extremity of an edge $e \in E$. For any function $f$ on $\Gamma_G$, at any point $v \in V$, we can define the directional derivative of $f$ with respect to an edge $e \sim v$ according to the parametrization of $e$. If $n_v$ denotes the number of edges $e$ such that $e \sim v$, we then obtain $n_v$ directional derivatives designated as $(d_e f(v))$:
$$d_e f(v) = \begin{cases} \displaystyle \lim_{h \to 0^+} \frac{f(s_e(h)) - f(s_e(0))}{h} & \text{if } s_e(0) = v \\[2ex] \displaystyle \lim_{h \to 0^-} \frac{f(s_e(L_e + h)) - f(s_e(L_e))}{h} & \text{if } s_e(L_e) = v. \end{cases}$$
It is shown in [26] that general dynamics over quantum graphs depend on a set of positive coefficients:
$$\mathcal{A} := \left\{ (a_v, (a_{e,v})_{e \sim v}) \in \mathbb{R}_+^{1+n_v} \ \text{s.t.} \ a_v + \sum_{e \sim v} a_{e,v} > 0 \ : \ v \in V \right\}.$$
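There then exists a one-to-one correspondence between $\mathcal{A}$ and the set of all possible continuous Markov Feller processes on $\Gamma_G$. More precisely, if the global generator $\mathcal{L}_t$ is defined as
$$\forall (x,y) \in \Gamma_G \times V \quad \mathcal{L}_t f(x,y) = \mathcal{L}_{e,t}(f)(x,y) \quad \text{when } x \in e,$$
while $f$ belongs to the domain:
$$\mathcal{D}(\mathcal{L}) := \left\{ \forall y \in G \quad f(\cdot, y) \in \mathcal{C}_b^\infty(\Gamma_G) \ : \ \forall v \in V \quad \sum_{e \sim v} a_{e,v}\, d_e f(v,y) = 0 \right\}, \tag{5}$$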
then the martingale problem is well-posed (see [26, 22]). In our setting, the gluing conditions are defined through the following set of coefficients $\mathcal{A}$:
$$\forall v \in V \quad a_v = 0 \quad \text{and} \quad \forall e \sim v \quad a_{e,v} = \frac{1}{n_v}.$$
The gluing conditions defined in (5) induce the following dynamics: when the $x$ component of $(X_t, Y_t)_{t \geq 0}$ hits an extremity $v \in V$ of an edge $e$, it is instantaneously reflected in one of the $n_v$ edges connected to $v$ (with a uniform probability distribution over the connected edges) while spending no time on $v$. Using the uniform ellipticity of $\mathcal{L}_e$ and the measurability of the drift term, Theorem 2.1 of [26] can be adapted, providing the well-posedness of the martingale problem associated with $(\mathcal{L}, \mathcal{D}(\mathcal{L}))$, and the next preliminary result can then be obtained.
Theorem 2. The operator $\mathcal{L}$ associated with the gluing conditions $\mathcal{A}$ generates a Feller Markov process on $\Gamma_G \times V \times \mathbb{R}_+$, with continuous sample paths on the $x$ component. This process is weakly unique and follows the S.D.E. (2) on each $e \in E$.
For simplicity, $\Delta_x$ will refer to the Laplacian operator with respect to the $x$-coordinate on the quantum graph $\Gamma_G$, using our gluing conditions (5) and the formalism introduced in [26].
3.2. Convergence of the homogenized S.A. over $\Gamma_G$
As mentioned earlier, we use a homogenization technique that involves an auxiliary sequence of random variables $(Y_n)_{n \geq 1}$, which are distributed according to $\nu$. More specifically, the stochastic process $(X_t, Y_t)_{t \geq 0}$ described above is depicted by its inhomogeneous Markov generator, which can be split into two parts:
$$\mathcal{L}_t f(x,y) = \mathcal{L}_{1,t} f(x,y) + \mathcal{L}_{2,t} f(x,y).$$
In the equality above, $\mathcal{L}_{1,t}$ is the part of the generator that acts on $Y_t$:
$$\mathcal{L}_{1,t} f(x,y) = \alpha_t \int [f(x,y') - f(x,y)]\,\nu(dy'), \tag{6}$$
describing the arrival of a new observation $Y \sim \nu$ with a rate $\alpha_t$ at time $t$. Concerning the action on the $x$ component, the generator is:
$$\mathcal{L}_{2,t} f(x,y) = \frac{1}{2} \Delta_x f(x,y) - \beta_t \langle \nabla_x U_y, \nabla_x f(x,y) \rangle. \tag{7}$$
Since the couple $(X_t, Y_t)_{t \geq 0}$ is Markov with a renewal of $Y$ with $\nu$, it can be immediately observed that the $y$ component is distributed at any time according to $\nu$. We introduce the notation $m_t$ to refer to the distribution of the couple $(X_t, Y_t)$ at time $t$, and we define $n_t$ as the marginal distribution of $X_t$. In the following, we will also need to deal with the conditional distribution of
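$Y_t$ given the position $X_t$ in $\Gamma_G$. We will refer to this probability distribution as $m_t(y|x)$. To sum up, we have:
$$\mathscr{L}(X_t, Y_t) = m_t \quad \text{with} \quad n_t(x)\,dx = \int_V m_t(x,y)\,dy \quad \text{and} \quad m_t(y\,|\,x) := \mathbb{P}[Y_t = y \,|\, X_t = x]. \tag{8}$$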
A traditional method for establishing the convergence of S.A. towards the minimum of a function $U_\nu$ consists in studying the evolution of the law of $(X_t)_{t \geq 0}$ and, in particular, its close relationship with the Gibbs field $\mu_{\beta_t}$ with energy $U_\nu$ and inverse temperature $\beta_t$:
$$\mu_{\beta_t}(x) = \frac{1}{Z_{\beta_t}} \exp(-\beta_t U_\nu(x)), \tag{9}$$
where $Z_{\beta_t}$ is the normalization factor, i.e., $Z_{\beta_t} = \int e^{-\beta_t U_\nu(x)}\,dx$.
Using a slowly decreasing temperature scheme $t \longmapsto \beta_t^{-1}$, it is expected that the process $(X_t)_{t \geq 0}$ evolves sufficiently fast (over the state space) in the ergodic sense so that its law remains close to $\mu_{\beta_t}$. In addition, the Laplace method on the sequence $(\mu_{\beta_t})_{t \geq 0}$ ensures that the measure is concentrated near the global minimum of $U_\nu$ (see, for example, the large deviation principle associated with $(\mu_\beta)_{\beta \to +\infty}$ in [25]).
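The concentration of the Gibbs measure can be observed on a discrete analogue. The sketch below assumes a finite list of energies (a node-only stand-in for $U_\nu$ on $\Gamma_G$, with the counting measure replacing the Lebesgue measure in (9)) and hypothetical values; it shows the mass moving to the minimizer as $\beta$ grows.

```python
import math

def gibbs(U, beta):
    """Discrete Gibbs weights proportional to exp(-beta * U(x))."""
    u_min = min(U)
    # Subtracting the minimum leaves the measure unchanged and avoids underflow
    # of every weight at once.
    w = [math.exp(-beta * (u - u_min)) for u in U]
    z = sum(w)
    return [v / z for v in w]

U = [1.15, 0.35, 0.55]          # hypothetical energies, minimum at index 1
for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, gibbs(U, beta))
```

At $\beta = 0$ the measure is uniform, and for large $\beta$ almost all of the mass sits on the index with the smallest energy.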
Hence, a natural consequence of the convergence “$\mathscr{L}(X_t) \to \mu_{\beta_t}$” and of the weak asymptotic concentration of $(\mu_{\beta_t})_{t \geq 0}$ around $M_\nu$ would be the almost sure convergence of the algorithm towards the Fréchet mean:
$$\lim_{t \to +\infty} \mathbb{P}(X_t \in M_\nu) = 1.$$
We refer to [32] for further details. In particular, a strong requirement for this convergence can be considered through the relative entropy of $n_t$ (the law of $X_t$) with respect to $\mu_{\beta_t}$:
$$J_t := KL(n_t \,\|\, \mu_{\beta_t}) = \int_{\Gamma_G} \log\left[ \frac{n_t(x)}{\mu_{\beta_t}(x)} \right] dn_t(x). \tag{10}$$
The function $\beta_t$ will be chosen as a $C^1$ function of $\mathbb{R}_+$, and since $\mu_\beta$ is a strictly positive measure over $\Gamma_G$, it implies that $t \longmapsto \mu_{\beta_t}(x)^{-1}$ is $C^1(\mathbb{R}_+ \times \Gamma_G)$. Moreover, $(t,x) \longmapsto n_t(x)$ follows the backward Kolmogorov equation, which induces a $C^1(\mathbb{R}_+ \times \Gamma_G)$ function. Since the semi-group is uniformly elliptic on the $x$-component, we have:
$$\forall t > 0 \quad \forall x \in \Gamma_G \quad n_t(x) > 0.$$
On the basis of these arguments, we can deduce:
Proposition 1. Assume that $t \longmapsto \beta_t$ is $C^1$, then $(t,x) \longmapsto n_t(x)$ defines a positive $C^1(\mathbb{R}_+ \times \Gamma_G)$ function and $t \longmapsto J_t$ is differentiable for any $t > 0$.
If we define:
$$\alpha_t = \lambda(t+1) \quad \text{and} \quad \beta_t = b \log(t+1),$$
with $b$ a constant smaller than $c^\star(U_\nu)^{-1}$, where $c^\star(U_\nu)$ is the maximal depth of a well containing a local but not global minimum of $U_\nu$, defined in (34), then our main result can be stated as follows:
Theorem 3. For any constant $\lambda > 0$, if $\alpha_t = \lambda(t+1)$ and $\beta_t = b \log(1+t)$ with $b < c^\star(U_\nu)^{-1}$, then:
$$\lim J_t = 0 \quad \text{as } t \to +\infty.$$
This ensures that the process $X_t$ will almost surely converge towards $M_\nu$ and, therefore, towards a global minimum of $U_\nu$.
The idea of the proof is to obtain a differential inequality for $J_t$, which implies its convergence towards 0. It is well known that the Gibbs measure $\mu_{\beta_t}$ is the unique invariant distribution of the stochastic process that evolves only on the $x$ component, whose Markov generator is given by:
$$\widehat{\mathcal{L}}_{2,t}(f)(x) = \frac{1}{2} \Delta_x f - \beta_t \langle \nabla_x f, \nabla_x U_\nu \rangle. \tag{11}$$
Therefore, a natural step of the proof will be to control the difference between $\mathcal{L}_{2,t}$ and $\widehat{\mathcal{L}}_{2,t}$ and to use this difference to study the evolution of $m_t$ and $n_t$. It can be seen that $\widehat{\mathcal{L}}_{2,t}$ may be written as an average action of the operator thanks to the linearity of the gradient operator:
$$\widehat{\mathcal{L}}_{2,t}(f) = \frac{1}{2} \Delta_x f - \beta_t\, \mathbb{E}_{y \sim \nu} \langle \nabla_x f, \nabla_x U_y \rangle.$$
When $X_t = x$, we know that $Y_t$ is distributed according to the distribution $m_t(y|x)$. Consequently, the average action of $\mathcal{L}_{2,t}$ on the $x$ component is:
$$\widetilde{\mathcal{L}}_{2,t}(f) = \frac{1}{2} \Delta_x f - \beta_t\, \mathbb{E}_{y \sim m_t(\cdot|x)} \langle \nabla_x f, \nabla_x U_y \rangle = \frac{1}{2} \Delta_x f - \beta_t \int_V \langle \nabla_x f, \nabla_x U_y \rangle\, m_t(y|x)\,dy, \tag{12}$$
whose expression may be close to that of $\widehat{\mathcal{L}}_{2,t}$ if $m_t(\cdot|x)$ is close to $\nu$.
Thus, another important step is to choose appropriate values for $\alpha_t$ and $\beta_t$, i.e., to find the balance between the increasing intensity of the Poisson process and the decreasing temperature schedule, in order to quantify the distance between $\nu$ and $m_t(\cdot|x)$. The main core of the proof brings together these two aspects and is detailed in Section 5.
Another important step will be the use of functional inequalities (Poincare and log-Sobolev inequalities) over $\Gamma_G$ for the measure $\mu_\beta$ when $\beta = 0$ and $\beta \to +\infty$. The proofs of these technical results are given in Section 6.3.
Corollary 1. Assume that $\beta_t = b \log(t+1)$ with $b < c^\star(U_\nu)^{-1}$ and $\alpha_t = \lambda(t+1)^\gamma$ with $\gamma \geq 1$, then for any neighborhood $N$ of $M_\nu$:
$$\lim_{t \to +\infty} P[X_t \in N] = 1.$$
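Proof: The argument follows from Theorem 3. Consider any neighborhood $N$ of $M_\nu$. The continuity of $U_\nu$ shows that:
$$\exists \delta > 0 \qquad N^c \subset \{x \in \Gamma_G : U(x) > \min U + \delta\}.$$
Hence,
$$\begin{aligned}
P[X_t \in N^c] &\leq P[U(X_t) > \min U + \delta] \\
&= \int_{\Gamma_G} 1_{U(x) > \min U + \delta} \, dn_t(x) \\
&= \int_{\Gamma_G} 1_{U(x) > \min U + \delta} \, d\mu_{\beta_t}(x) + \int_{\Gamma_G} 1_{U(x) > \min U + \delta} \, [n_t(x) - \mu_{\beta_t}(x)] \, dx \\
&\leq \mu_{\beta_t}\{U > \min U + \delta\} + 2 d_{TV}(n_t, \mu_{\beta_t}) \\
&\leq \mu_{\beta_t}\{U > \min U + \delta\} + \sqrt{2 J_t},
\end{aligned}$$
where we used the variational formulation of the total variation distance and the Csiszar-Kullback inequality. As soon as $\lim_{t \to +\infty} J_t = 0$, we can conclude the proof observing that $\mu_{\beta_t}\{U > \min U + \delta\} \to 0$ as $\beta_t \to +\infty$. □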
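The last step of the proof relies on the concentration of the Gibbs measure on the minimizers of U as β grows. As a rough numerical illustration only, the sketch below replaces ΓG with a finite set of five points carrying hypothetical potential values (the list `potential`, the threshold `delta` and the function names are assumptions made for this example, not objects from the paper) and evaluates the mass of {U > min U + δ} for increasing β.

```python
import math

def gibbs_measure(potential, beta):
    """Discrete Gibbs weights proportional to exp(-beta * U) (toy stand-in for mu_beta)."""
    u_min = min(potential)
    # Subtracting the minimum avoids underflow and does not change the normalized measure.
    weights = [math.exp(-beta * (u - u_min)) for u in potential]
    z = sum(weights)
    return [w / z for w in weights]

def mass_above(potential, beta, delta):
    """Gibbs mass of the set {U > min U + delta}."""
    u_min = min(potential)
    mu = gibbs_measure(potential, beta)
    return sum(p for p, u in zip(mu, potential) if u > u_min + delta)

# Hypothetical potential values on five points; the global minimum is the first point.
potential = [0.0, 0.3, 1.0, 0.2, 2.5]
masses = [mass_above(potential, beta, delta=0.1) for beta in (0.0, 1.0, 5.0, 25.0)]
```

In this toy setting the mass starts at 4/5 for β = 0 (uniform measure) and decreases towards 0, which is the behavior used to conclude the proof.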
It can actually be proven (not shown here) that $\lim_{t \to +\infty} P[X_t \in N] = 1$ if and only if the constant $b$ is chosen to be lower than $c^\star(U_\nu)^{-1}$. This means that this algorithm does not allow faster cooling schedules than the classical S.A. algorithm. Nevertheless, this is a positive result since this homogenized S.A. can be numerically computed quickly and easily on large graphs. Finally, we should consider this result to be theoretical: in practice, the simulation of this homogenized S.A. is performed over a finite time horizon, and efficient implementations certainly deserve a specific theoretical study following the works of [17] and [46].
4. Numerical simulation results
We now present different practical aspects of our strategy. After giving numerical details about how to make it usable on large graphs, plus different insights into its calibration, we will present results obtained on social network subgraphs of Facebook² and a citation subgraph of zbMATH.
4.1. Algorithmic complexity
It is widely recognized that the algorithmic complexity of numerical strategies dealing with graphs can be an issue. The number of vertices and edges of real-life graphs can indeed be quite large. For instance, the zbMATH subgraph of Section 4.3.2 has 13000 nodes and approximately 48000 edges. In this context, it is worth justifying that the motion of $(X_t)_{t \geq 0}$ on the graph $\Gamma_G$ across the iterations of Algorithm 3 is reasonably demanding in terms of computational resources. We recall that this motion is driven by a Brownian motion (lines 4 to 6 of Algorithm 3) and an attraction towards the vertex $Y_{N^\alpha_t}$ (line 9 of Algorithm 3) at random times $(T_k)_{k \geq 0}$. The numbers of vertices and of undirected edges with non-null weights in $\Gamma_G$ are denoted by $N$ and $|E|$, respectively. Note finally that $N - 1 \leq |E| \leq N(N-1)/2$ if $\Gamma_G$ has a unique connected component.
Neighborhood structure A first computational issue that can arise when moving $X_t$ is to find all possible neighbors of a specific vertex $v$. This is indeed performed every time $X_t$ moves from one edge to another and directly depends on how $\Gamma_G$ is encoded in the memory. Encoding $\Gamma_G$ in a list of edges with non-null weights is common practice. In that case, the computer checks the vertex pairs linked by all edges to find those containing $v$, so the algorithmic cost is $2|E|$. A second classic strategy is to encode the graph in a connectivity matrix. In this case, the computer has to go through all the $N$ indexes of the column representing $v$ to find the non-null weights. We instead sparsely encode the graph in a list of lists: the main list has a size $N$ and each of its elements lists the neighbors of a specific vertex $v$. If the graph is complete, i.e., all pairs of vertices are linked by edges with strictly positive weights, this strategy has a computational cost $N$, and in all of the other cases, the cost is $< N$. On average, the computational cost is $2|E|/N$ so that this strategy is particularly advantageous for sparse graphs, where $|E| \ll N(N-1)/2$, which are common for the targeted applications. For instance, $|E| = 4 \times 10^3$ and $N(N-1)/2 = 1.24 \times 10^5$ on the smallest Facebook subgraph of Section 4.3.1, and $|E| = 4.8 \times 10^4$ and $N(N-1)/2 = 8.44 \times 10^7$ on the zbMATH subgraph of Section 4.3.2.
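The difference between the edge-list and list-of-lists encodings can be sketched as follows. This is a minimal illustration under our own assumptions (the function names, the `(u, v, length)` triple format and the four-edge toy graph are hypothetical and are not taken from the authors' implementation).

```python
from collections import defaultdict

def build_adjacency(edges):
    """Sparse 'list of lists' encoding: one neighbor list per vertex."""
    adjacency = defaultdict(list)
    for u, v, length in edges:
        # The graph is undirected, so each edge is stored in both directions.
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    return dict(adjacency)

def neighbors_from_edge_list(edges, v):
    """Baseline encoding: every neighbor query scans the whole edge list."""
    found = []
    for a, b, length in edges:
        if a == v:
            found.append((b, length))
        elif b == v:
            found.append((a, length))
    return found

# Toy graph with 4 vertices and 4 edges of length 1 (hypothetical data).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0)]
adjacency = build_adjacency(edges)

# Average size of a neighbor list, i.e. 2|E|/N.
average_degree = sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)
```

With the adjacency encoding, a neighbor query for `v` reads only `adjacency[v]`, whereas the edge-list baseline always traverses all |E| entries.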
Geodesic paths Another potential issue with our strategy is that it seeks at least one optimal path between the vertices $X_t$ and $Y_{N^\alpha_t}$ at each jump time (line 9 of Algorithm 3). This may be efficiently done using a fast marching propagation algorithm (see, e.g., [18] for details), where:
(i) the distance to $X_t$ is iteratively propagated on the whole graph $\Gamma_G$ until no more optimal distance to reach $X_t$ is updated, and
(ii) the shortest path between $X_t$ and $Y_{N^\alpha_t}$ is then considered.
In this case, step (i) is particularly time-consuming and cannot be reasonably performed at each step of Algorithm 3. Fortunately, the algorithmic cost to compute the distance between all pairs of vertices is equivalent to that of computing step (i). We then compute these distances once and for all at the beginning of the computations and store the result in an $N \times N$ matrix in the RAM of the computer³. We can therefore very quickly use these results at each iteration of the algorithm and, in particular, deduce the useful part of the geodesic paths involved to travel from $X_t$ to $Y_{N^\alpha_t}$. The only limitation of this strategy is that the distance matrix can be memory-consuming. It can therefore be used on small to large graphs but not on huge graphs (typically when $N > 10^5$) on current desktops. An extension to deal with this scalability issue is a current subject of research, and we will therefore not describe applications of Algorithm 3 on huge graphs in this work.

² Subgraphs obtained on the Stanford Large Network Dataset Collection: https://snap.stanford.edu/data/
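The precomputation described above can be sketched in pure Python. This is only an illustration under our own assumptions: the experiments rely on NetworkX, whereas the code below uses a textbook Dijkstra search with a binary heap, and the function names and the toy graph are hypothetical.

```python
import heapq

def dijkstra(adjacency, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # Stale heap entry: a shorter path to u was already found.
        for v, length in adjacency[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs_distances(adjacency):
    """Computed once, before the simulation starts (memory grows like N^2)."""
    return {u: dijkstra(adjacency, u) for u in adjacency}

def next_vertex_towards(adjacency, dist, x, y):
    """Neighbor of x lying on a shortest path from x to y (assumes x != y)."""
    return min(adjacency[x], key=lambda item: item[1] + dist[item[0]][y])[0]

# Toy graph: path 0-1-2-3 plus a chord 1-3, all edges of length 1 (hypothetical data).
adjacency = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0), (3, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0), (1, 1.0)],
}
dist = all_pairs_distances(adjacency)
step = next_vertex_towards(adjacency, dist, 0, 3)
```

Once `dist` is stored, the direction of a geodesic from the current position towards a target vertex is obtained with a single pass over the neighbors of the current vertex, with no further graph search.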
4.2. Parameter tuning
Several parameters influence the behavior of the simulated process Xt. Some of them are directly
introduced in the theoretical construction of the algorithm, i.e., the intensity of the Poisson process
or the temperature schedule. Others come from the practical implementation of the algorithm,
i.e., the maximal time up to which we generate Xt. The theoretical result given in Theorem 3 yields
a bound on the probability that Xt is far from the set of global minima. This
bound depends on βt and αt as well as on different characteristics of the graph such as its diameter
and number of nodes. We then propose an empirical strategy for parameter tuning, which we will
use later in Section 4.3.
For a graph $G$ with $N$ nodes and a diameter $D_G$, we define:
$$T^*_{\max} = 100 + 0.1 N \quad \text{and} \quad \beta^*_t = 2 \log(t+1)/D_G.$$
We then choose the intensity of the Poisson process so that it will generate a reasonably large number of $Y_n$ at the end of the algorithm, depending on the discrete probability distribution $\nu$. More specifically, let $S^*$ be the average number of jumping times between $T^*_{\max} - 1$ and $T^*_{\max}$. We then set $S^* = 1000$, which can be obtained by choosing:
$$\alpha^*_t = \lambda(2t+2) \quad \text{with} \quad \lambda = \frac{S^*}{2 T^*_{\max} + 1}.$$
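The choice of λ follows from integrating the intensity over the last unit of time: the integral of λ(2t+2) between T − 1 and T equals λ(2T + 1). The sketch below evaluates these default parameters and simulates the jump times by inverting the cumulative intensity, a standard time-change construction that we use here for illustration and that is not claimed to be the one used in Algorithm 3. The function names and the values N = 500 and DG = 8 are hypothetical.

```python
import math
import random

def default_parameters(n_nodes, s_star=1000.0):
    """Empirical defaults: T*_max = 100 + 0.1 N and lambda = S* / (2 T*_max + 1)."""
    t_max = 100.0 + 0.1 * n_nodes
    lam = s_star / (2.0 * t_max + 1.0)
    return t_max, lam

def beta_star(t, diameter):
    """Inverse temperature schedule 2 log(t + 1) / D_G."""
    return 2.0 * math.log(t + 1.0) / diameter

def cumulative_intensity(t, lam):
    """Integral of lam * (2s + 2) for s in [0, t]."""
    return lam * (t * t + 2.0 * t)

def sample_jump_times(lam, t_max, rng):
    """Jump times on [0, t_max] of a Poisson process with intensity lam * (2t + 2)."""
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)  # Unit-rate exponential gaps in the rescaled time.
        t = -1.0 + math.sqrt(1.0 + s / lam)  # Inverse of the cumulative intensity.
        if t > t_max:
            return times
        times.append(t)

t_max, lam = default_parameters(n_nodes=500)  # Hypothetical graph size.
expected_last = cumulative_intensity(t_max, lam) - cumulative_intensity(t_max - 1.0, lam)
jump_times = sample_jump_times(lam, t_max, random.Random(0))
observed_last = sum(1 for t in jump_times if t > t_max - 1.0)
```

Here `expected_last` recovers S* = 1000 up to rounding, and `observed_last` is one random realization of the number of jumps in the last unit of time.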
4.3. Results
This section presents results obtained on graphs of different sizes and using different parameters.
We used the empirical method of Section 4.2 to define default parameters and altered them to
quantify the sensitivity of our algorithm to parameter variations.
Since the process Xt lives on a quantum graph, its location at time t is between two vertices, on
an edge of ΓG. However, our primary interest is to study the properties of the initial discrete graph
and therefore, the output of the algorithm will be the vertex considered as the graph barycenter.
To achieve this, we associate a frequency to each vertex. For a vertex v, this frequency is the
portion of time during which v was the closest vertex to the simulated process Xt. In our results,
we consider frequencies computed over the last 10% of the iterations of the algorithm.
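The frequency-based decision rule can be sketched as follows, assuming that the closest vertex to Xt has been recorded at each iteration and that iterations correspond to equal time steps (both are assumptions of this example; the function names and the toy trajectory are hypothetical).

```python
from collections import Counter

def vertex_frequencies(closest_vertices, tail_fraction=0.1):
    """Share of the last iterations during which each vertex was the closest one."""
    n_tail = max(1, int(len(closest_vertices) * tail_fraction))
    tail = closest_vertices[-n_tail:]
    counts = Counter(tail)
    return {v: c / len(tail) for v, c in counts.items()}

def estimated_barycenter(closest_vertices, tail_fraction=0.1):
    """Vertex with the largest frequency over the retained iterations."""
    freqs = vertex_frequencies(closest_vertices, tail_fraction)
    return max(freqs, key=freqs.get)

# Toy record of 100 iterations: the process ends up fluctuating around vertex 2.
trajectory = [3] * 50 + [1] * 30 + [2, 2, 2, 2, 2, 2, 2, 2, 1, 2] * 2
freqs = vertex_frequencies(trajectory)
best = estimated_barycenter(trajectory)
```

On this toy record, only the last 10 entries are retained, so vertex 2 is selected even though vertex 3 dominates the early iterations.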
4.3.1. Facebook subgraphs
3 In our experiments, we used the all-shortest-paths function of the Python library NetworkX.
Experimental protocol We first tested our algorithm on three subgraphs of Facebook from the
Stanford Large Network Dataset Collection: (FB500) has 500 nodes and 4337 edges and contains
two obvious clusters; (FB2000) has 2000 nodes and 37645 edges and fully contains (FB500);
and (FB4000) has 4039 nodes and 88234 edges and fully contains (FB2000). For each of these
subgraphs, we considered the probability measure ν as the uniform distribution over the graph’s
vertices and used a length of 1 for all edges. We also explicitly computed the barycenter of these
graphs using an exhaustive search procedure. We are able to do that because we considered a
simplified case where the distribution over the nodes is uniform. For example, this exhaustive search
procedure required approximately 6 hours for the (FB4000) subgraph. Nevertheless, it allowed us
to compare our strategy with ground-truth results.
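A ground-truth computation of this kind can be sketched as an exhaustive minimization of the weighted sum of squared distances over the vertices. The code below is a minimal illustration under our own assumptions (search restricted to vertices, unit edge lengths handled by breadth-first search, hypothetical function names and toy graph); it is not the authors' procedure.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop distances from source; valid as geodesic distances when all edges have length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def frechet_objective(adjacency, nu, x):
    """Weighted sum of squared distances from x to all vertices."""
    dist = bfs_distances(adjacency, x)
    return sum(weight * dist[y] ** 2 for y, weight in nu.items())

def exhaustive_barycenter(adjacency, nu):
    """Vertex minimizing the objective, found by testing every vertex."""
    return min(adjacency, key=lambda x: frechet_objective(adjacency, nu, x))

# Toy connected graph: the path 0-1-2-3-4 with the uniform distribution on its vertices.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nu = {v: 1.0 / len(adjacency) for v in adjacency}
barycenter = exhaustive_barycenter(adjacency, nu)
```

Each candidate vertex requires one full graph traversal, so the total cost grows like N(N + |E|), which is consistent with exhaustive search becoming slow on the larger subgraphs.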
We used the strategy of Section 4.2 to define default parameters adapted to each subgraph. We
also tested different values for parameters β, S and Tmax in order to quantify their influence. We
repeated our algorithm 100 times for each parameter set to evaluate the algorithm stability.
In the tables representing quantitative results, Error represents the number of times, out of
100, that the algorithm converged to a node different from the ground-truth barycenter. It is a
rough indicator of the ability of the algorithm to locate the barycenter of the graph, that could be
replaced by a measure of the average distance between the last iterations of the algorithm and the
ground-truth barycenter (not shown in this work). For each subgraph and parameter set, column
Av. time contains the average times in seconds for the barycenter estimation, keeping in mind that
the Dijkstra algorithm was performed once and for all before the 100 estimations. This preliminary
computation requires approximately 1, 30 and 80 seconds on a standard laptop for FB500, FB2000 and FB4000, respectively.
Effect of β From a theoretical point of view, $(\beta_t)_{t \geq 0}$ should be chosen in relation to the constant
$c^\star(U_\nu)$, which is unknown in practice. Therefore, the practical choice of $(\beta_t)_{t \geq 0}$ is a real issue to
obtain a good behavior of the algorithm. Table 1 gives the results obtained with our algorithm for
different values of $\beta_t$.
          FB500                            FB2000                           FB4000
β         Error   Med. Freq.   Av. time    Error   Med. Freq.   Av. time    Error   Med. Freq.   Av. time
β*/4      15 %    0.6042       0.95 s      28 %    0.3519       10.12 s     27 %    0.3314       12.41 s
β*/2      2 %     0.4184       1.38 s      1 %     0.7268       4.25 s      5 %     0.6534       14.11 s
β*        0 %     0.8008       1.48 s      0 %     0.9418       12.36 s     1 %     0.8913       13.55 s
2β*       0 %     0.8321       11.31 s     0 %     0.9892       8.52 s      0 %     0.9647       16.79 s
4β*       0 %     0.8233       13.19 s     0 %     0.9930       15.58 s     0 %     0.9824       31.43 s
8β*       0 %     0.7717       2.36 s      0 %     0.9750       23.25 s     0 %     0.9445       44.46 s
Table 1 Experiments on the three Facebook subgraphs, while varying the values of $(\beta_t)_{t \geq 0}$.
We can observe that when $(\beta_t)_{t \geq 0}$ is too small, the behavior of the algorithm deteriorates,
revealing the tendency of the process $(X_t)_{0 \leq t \leq T^*_{\max}}$ to have an excessively slow convergence rate
towards its local attractor in the graph. Roughly speaking, in such a situation, the process does
not learn fast enough. When the value of β is chosen in the range [β∗; 2β∗], we can observe a really
good behavior of the algorithm: it almost always locates the correct barycenter in a quite reasonable
computation time (less than 20 seconds for the largest graph). Finally, we can observe in the
column Med. Freq. that in most of the last iterations of the algorithm (more than 80%), the
process evolves around its estimated barycenter, so that the decision to produce an estimator is
quite easy when looking at an execution of the algorithm.
Effect of S and Tmax Table 2 gives the results obtained with our algorithm while using different values of $S$ and $T_{\max}$. As expected, we observe that increasing the ending time of simulation always improves the convergence rate (column Error in Table 2) of the algorithm towards the right node. The behavior of the algorithm is also improved by increasing the value of $S$, which quantifies the number of arrivals of nodes observed along the averaging procedure. Of course, the counterpart of increasing both $S$ and $T_{\max}$ is an increasing cost of simulation (see column Av. time).
                   FB500                            FB2000                           FB4000
S       Tmax       Error   Med. Freq.   Av. time    Error   Med. Freq.   Av. time    Error   Med. Freq.   Av. time
S*/2    T*max      2 %     0.7667       0.59 s      0 %     0.9344       3.22 s      0 %     0.8614       6.70 s
S*      2T*max     0 %     0.8049       2.79 s      0 %     0.9610       9.30 s      0 %     0.8970       24.42 s
S*      4T*max     0 %     0.8101       5.60 s      0 %     0.9677       22.21 s     0 %     0.9222       54.73 s
2S*     T*max      1 %     0.8345       2.26 s      0 %     0.9512       8.39 s      0 %     0.9062       20.95 s
2S*     2T*max     0 %     0.8361       4.57 s      0 %     0.9586       23.30 s     0 %     0.9121       23.30 s
2S*     4T*max     0 %     0.8423       11.16 s     0 %     0.9735       11.16 s     0 %     0.9366       96.84 s
Table 2 Experiments on the three Facebook subgraphs, while varying the values of $S$ and $T_{\max}$.
As an illustration of the (small) complexity of the Facebook subgraphs used for benchmarking our algorithm, we provide a representation of the FB500 graph in Figure 3. This representation has been obtained with the help of Cytoscape software and is not a result of our own algorithm. In Figure 3, the red node is the estimated barycenter, which is also the ground-truth barycenter located by a direct exhaustive computation. The blue nodes are the “second rank” nodes visited by our method.
Figure 3 Region of interest presenting the results obtained on the FB500 graph.
4.3.2. zbMATH subgraph
The zbMATH subgraph built from zbMATH⁴ has been obtained by an iterative exploration of the co-authorship relationship in an alphabetical order. We thus naturally obtained a connected graph. This graph has been weighted by the complete number of citations obtained by each author, leading to the probability distribution $\nu \propto \#(\text{number of citations})$. Finally, all the edges in the graph are fixed to have a length of 1. This exploration was initialized on the entry of the first author’s name (i.e., S. Gadat) and we stopped the process when we obtained 13000 nodes (authors) on the graph. This stopping criterion in the exploration of the zbMATH database corresponds to a technical limitation of 40 GB memory required by the distance matrix obtained with the Dijkstra algorithm. In particular, this limitation and the starting point of the exploration induce an important bias in the community of authors used to build the subgraph from the zbMATH dataset: the researchers obtained in the subgraph are generally French and applied mathematicians. The graph is more or less focused on the following research themes: probability, statistics and P.D.E. Consequently, the results provided below should be understood as an illustration of our algorithm and not as a bibliometric study!

⁴ https://zbmath.org/authors/
Our experiments rely on the same choice of parameters $(\beta^*_t)_{t \geq 0}$, $S^*$, $T^*_{\max}$ and $\alpha^*_t$ as above, indicated at the beginning of Section 4.2. Again, our algorithm produces the same outputs but of course, in this situation, we do not know the ground-truth barycenter of the zbMATH subgraph. In Figure 4, we present a representation of the subgraph obtained with Cytoscape software, and a zoom on a region of interest (ROI for short).
Again, the main nodes visited by our algorithm are represented with a red square, and the size of the square is larger when the node is more frequently visited. In view of the results obtained on the Facebook subgraphs, we then assume that the largest red square is the barycenter of the zbMATH subgraph.
Figure 4 Left: General overview of the zbMATH subgraph extracted for our experiments, containing approximately 13000 nodes. Right: Region of interest (ROI) presenting the main results obtained on the zbMATH subgraph; node labels are author names.
It is also possible to assess the robustness of our method with respect to several Monte Carlo runs of our algorithm. We have produced some boxplots for each of the main authors located in the subgraph, according to the occupation measure of the process over the last 10% of the iterations with 10 Monte Carlo replications. These “violin” plots are represented in Figure 5. Each execution of the algorithm requires approximately 3 hours of computations. The algorithm seems to produce reliable conclusions concerning the top nodes visited all along the ending iterations. Nevertheless, it appears to be necessary to extend our investigations in order to obtain a scalable method for handling larger graphs.
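The summary statistics behind such a plot (minimal, average and maximal frequency of each node across replications) can be sketched as follows. The function name, the node labels "a" and "b" and the frequency values are hypothetical and only serve to show the aggregation step.

```python
def summarize_replications(runs):
    """runs: list of dicts mapping a node to its frequency in one replication.

    Returns, for each node, the tuple (min, mean, max) across replications;
    a node that was not visited in a replication counts with frequency 0.
    """
    nodes = set().union(*(run.keys() for run in runs))
    summary = {}
    for node in nodes:
        values = [run.get(node, 0.0) for run in runs]
        summary[node] = (min(values), sum(values) / len(values), max(values))
    return summary

# Three hypothetical replications with two hypothetical nodes.
runs = [{"a": 0.06, "b": 0.03}, {"a": 0.08, "b": 0.02}, {"a": 0.07}]
summary = summarize_replications(runs)
```

A small spread between the minimal and maximal values of the top nodes is what suggests that the conclusions are stable across runs.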
Figure 5 “Violin” plot of the occupation measure of the algorithm on the zbMATH subgraph for 10 MC replications (x-axis: authors Yor, Cohen, Stroock, Ledoux, Jaffard, Brezis, Zeitouni, Lions and Nualart; y-axis: frequency in %, with ticks from 1 to 8). The average frequency is located in the middle of the “violin” plot, while the minimal and maximal values are shown at its extremities.

5. Proof of the main result (Theorem 3)
We establish a differential inequality that will imply the convergence of $J_t$. The computations actually lead us to a system of two differential inequalities. We therefore introduce another quantity $I_t$ that measures the average closeness (w.r.t. x) of the conditional law of y given x at time t to $\nu$, defined as:
$$I_t := \int KL(m_t(\cdot|x) \,\|\, \nu) \, dn_t(x) = \int \left[ \int \log \frac{m_t(y|x)}{\nu(y)} \, m_t(y|x) \, dy \right] dn_t(x). \tag{13}$$
The next proposition links the evolution of $J_t$ (in terms of an upper bound of $\partial_t J_t$) with the spectral gap of $\mu_{\beta_t}$ over $\Gamma_G$, the diameter of the graph $D_G$ and the divergence $I_t$.
5.1. Study of $\partial_t J_t$
Proposition 2. Let $c^\star(U_\nu)$ be the term defined in Equation (34) and $C_{\Gamma_G}$ be the constant given in Proposition 5. We then have:
$$\partial_t J_t \leq D_G^2 \beta'_t + 8 D_G^2 \beta_t^2 I_t - \frac{e^{-c^\star(U_\nu)\beta_t}}{C_{\Gamma_G}(1+\beta_t)} J_t.$$
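Proof: We compute the derivative of $J_t$ and separately study each of the three terms:
$$\partial_t J_t = \underbrace{\int \partial_t\{\log(n_t(x))\} \, dn_t(x)}_{J_{1,t}} \; \underbrace{- \int \partial_t\{\log(\mu_{\beta_t}(x))\} \, dn_t(x)}_{J_{2,t}} + \underbrace{\int \log\left[\frac{n_t(x)}{\mu_{\beta_t}(x)}\right] \partial_t n_t(x) \, dx}_{J_{3,t}} \tag{14}$$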
Study of $J_{1,t}$: This term is easy to deal with:
$$J_{1,t} = \int_{\Gamma_G} \partial_t\{\log(n_t(x))\} \, n_t(x) \, dx = \int_{\Gamma_G} \frac{\partial_t\{n_t(x)\}}{n_t(x)} \, n_t(x) \, dx = \int_{\Gamma_G} \partial_t\{n_t(x)\} \, dx = \partial_t\left\{\int_{\Gamma_G} n_t(x) \, dx\right\} = \partial_t\{1\} = 0. \tag{15}$$
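Study of $J_{2,t}$: Using the definition of $\mu_{\beta_t}$ given in Equation (9), we obtain for the second term:
$$\begin{aligned}
J_{2,t} &= -\int_{\Gamma_G} \partial_t\{\log(\mu_{\beta_t}(x))\} \, dn_t(x) = -\int_{\Gamma_G} \partial_t\{-\beta_t U_\nu(x) - \log Z_{\beta_t}\} \, dn_t(x) \\
&= \beta'_t \int_{\Gamma_G} U_\nu(x) \, dn_t(x) + \frac{\partial_t\{Z_{\beta_t}\}}{Z_{\beta_t}}.
\end{aligned}$$
According to the definition of $Z_{\beta_t}$, we have:
$$\frac{\partial_t\{Z_{\beta_t}\}}{Z_{\beta_t}} = Z_{\beta_t}^{-1} \, \partial_t\left\{\int_{\Gamma_G} e^{-\beta_t U_\nu(x)} \, dx\right\} = -\beta'_t \int_{\Gamma_G} U_\nu(x) \frac{e^{-\beta_t U_\nu(x)}}{Z_{\beta_t}} \, dx = -\partial_t\{\beta_t\} \int_{\Gamma_G} U_\nu(x) \, \mu_{\beta_t}(x) \, dx.$$
Hence, we obtain
$$J_{2,t} = \beta'_t \int_{\Gamma_G} U_\nu(x) \, [n_t(x) - \mu_{\beta_t}(x)] \, dx.$$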
The graph has a finite diameter $D_G$. We therefore have:
$$\int_{\Gamma_G} U_\nu(x) \, dn_t(x) \leq D_G^2.$$
The same inequality holds using the measure $\mu_{\beta_t}$ so that:
$$|J_{2,t}| \leq D_G^2 \beta'_t. \tag{16}$$
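Study of $J_{3,t}$: The last term $J_{3,t}$ involves the backward Kolmogorov equation. First, since $n_t$ is the marginal law of $X_t$, we have: $n_t(x) = \int m_t(x,y) \, dy$.
Using the backward Kolmogorov equation for the Markov process $(X_t, Y_t)_{t \geq 0}$ and the Fubini theorem, we have, for any smooth enough function $f_t : x \in \Gamma_G \mapsto \mathbb{R}$:
$$\begin{aligned}
\int_{\Gamma_G} f_t(x) \, \partial_t\{n_t(x)\} \, dx &= \int_{\Gamma_G} f_t(x) \, \partial_t\left\{\int_V m_t(x,y) \, dy\right\} dx = \int_{\Gamma_G} \int_V f_t(x) \, \partial_t\{m_t(x,y)\} \, dx \, dy \\
&= \int_{\Gamma_G} \int_V L_t(f_t)(x) \, m_t(x,y) \, dx \, dy = \int_{\Gamma_G} \int_V [L_{1,t} + L_{2,t}](f_t)(x) \, m_t(x,y) \, dx \, dy,
\end{aligned}$$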
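where $L_{1,t}$ and $L_{2,t}$ are defined in Equations (6) and (7). Since the function $f_t$ is independent of $y$, we have $L_{1,t}(f_t) = 0$. For the part corresponding to $L_{2,t}$, we have:
$$\begin{aligned}
\int_{\Gamma_G} \int_V L_{2,t}(f_t)(x) \, m_t(x,y) \, dx \, dy &= \int_{\Gamma_G} \int_V \left[\frac{1}{2}\Delta_x f_t(x) - \beta_t \nabla_x U_y(x) \nabla_x f_t(x)\right] m_t(x,y) \, dx \, dy \\
&= \int_{\Gamma_G} \frac{1}{2}\Delta_x f_t(x) \, n_t(x) \, dx - \beta_t \int_{\Gamma_G} \int_V \nabla_x U_y(x) \nabla_x f_t(x) \, n_t(x) \, m_t(y|x) \, dx \, dy,
\end{aligned}$$
because $n_t$ is the marginal distribution of $X_t$ and $m_t(y|x) \times n_t(x) = m_t(x,y)$.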
Thus, for any smooth enough function $f_t(x)$, using the operator introduced in (12), we have:
$$\int_{\Gamma_G} f_t(x) \, \partial_t n_t(x) \, dx = \int_{\Gamma_G} \widetilde{L}_{2,t}(f_t)(x) \, n_t(x) \, dx.$$
Replacing $f_t(x)$ by $\log \frac{n_t(x)}{\mu_{\beta_t}(x)}$, we obtain:
$$J_{3,t} = \int_{\Gamma_G} \log \frac{n_t(x)}{\mu_{\beta_t}(x)} \, \partial_t n_t(x) \, dx = \int_{\Gamma_G} \widetilde{L}_{2,t}\left[\log \frac{n_t(x)}{\mu_{\beta_t}(x)}\right] n_t(dx).$$
Since µβt is the invariant distribution of L2,t (see Equation (11)), it is natural to insert L2,t:
J3,t =
∫ΓG
L2,t
[log
nt(x)
µβt(x)
]nt(dx)
=
∫ΓG
L2,t
[log
nt(x)
µβt(x)
]nt(dx)−
∫ΓG
(L2,t−L2,t
)[log
nt(x)
µβt(x)
]nt(dx)︸ ︷︷ ︸
:=κt
. (17)
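Since the operator $\widehat{L}_{2,t}$ of Equation (11) is a diffusion operator and $\mu_{\beta_t}$ is its invariant measure, it is well known (see, e.g., [7]) that the action of $\widehat{L}_{2,t}$ on the entropy is closely linked to the Dirichlet form in the following way:
$$\int_{\Gamma_G} \widehat{L}_{2,t}\left[\log\frac{n_t(x)}{\mu_{\beta_t}(x)}\right] n_t(dx) = -2\int_{\Gamma_G} \left(\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\right)^2 \mu_{\beta_t}(dx), \tag{18}$$
and therefore translates a mean reversion towards $\mu_{\beta_t}$ in the first term of (17).
We now study the size of the difference between $\widehat{L}_{2,t}$ and the operator $\widetilde{L}_{2,t}$ of Equation (12), and introduce the "approximation" term of $\nabla_x U_\nu(x)$ at time $t$:
$$R_t(x) = \int_V \nabla_x(U_y)(x)\,m_t(y|x)\,dy.$$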
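The relationship $\nabla_x \log(f) = 2\,\frac{\nabla_x \sqrt{f}}{\sqrt{f}}$, the Cauchy-Schwarz inequality and $2ab \le a^2 + b^2$ yield:
$$\begin{aligned}
|\kappa_t| &= \beta_t \left|\int_{\Gamma_G} \left(R_t(x) - \nabla_x U_\nu(x)\right)\nabla_x\left\{\log\frac{n_t(x)}{\mu_{\beta_t}(x)}\right\} n_t(dx)\right| \\
&= 2\beta_t \left|\int_{\Gamma_G} \left(R_t(x) - \nabla_x U_\nu(x)\right)\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\sqrt{\frac{\mu_{\beta_t}(x)}{n_t(x)}}\; n_t(dx)\right| \\
&\le 2\beta_t \sqrt{\int_{\Gamma_G} \left(R_t(x) - \nabla_x U_\nu(x)\right)^2 n_t(dx)} \cdot \sqrt{\int_{\Gamma_G} \left(\nabla_x\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right)^2 \mu_{\beta_t}(x)\,dx} \\
&\le \beta_t^2 \int_{\Gamma_G} \left(R_t(x) - \nabla_x U_\nu(x)\right)^2 n_t(x)\,dx + \int_{\Gamma_G} \left(\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\right)^2 \mu_{\beta_t}(dx).
\end{aligned}$$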
If $d_{TV}$ denotes the total variation distance, the first term of the right-hand side leads to:
$$\begin{aligned}
\left|R_t(x) - \nabla_x U_\nu(x)\right| &= \left|\int_V \nabla_x d^2(x,y)\,[m_t(y|x) - \nu(y)]\,dy\right| \le \|\nabla_x d^2(x,y)\|_\infty \int_V \left|m_t(y|x) - \nu(y)\right| dy \\
&\le 2\|\nabla_x d^2(x,y)\|_\infty\, d_{TV}(m_t(\cdot|x), \nu) \le \sqrt{2}\,\|\nabla_x d^2(x,y)\|_\infty \sqrt{\int_V \log\frac{m_t(y|x)}{\nu(y)}\,m_t(y|x)\,dy},
\end{aligned}$$
where the last line comes from the Csiszar-Kullback inequality. Since $d^2(\cdot, y)$ is differentiable a.e. and its derivative is bounded for all $y \in V$ by $2D_G$, we can use $I_t$ defined in Equation (13) to obtain:
$$\int_{\Gamma_G} \left(R_t - \nabla_x U_\nu\right)^2(x)\,n_t(x)\,dx \le 8 D_G^2\, I_t.$$
Consequently, we obtain:
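$$|\kappa_t| \le 8 D_G^2\,\beta_t^2\, I_t + \int_{\Gamma_G} \left(\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\right)^2 \mu_{\beta_t}(dx) \tag{19}$$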
Taking the inequalities (18) and (19), we now obtain in (17):
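$$J_{3,t} \le 8 D_G^2\,\beta_t^2\, I_t - \int_{\Gamma_G} \left(\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\right)^2 \mu_{\beta_t}(dx).$$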
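We denote $f_t = \sqrt{\frac{n_t}{\mu_{\beta_t}}}$. Since $\|f_t\|_{2,\mu_{\beta_t}}^2 = 1$, one can easily see that $\mathrm{Ent}_{\mu_{\beta_t}}(f_t^2) = J_t$. Now, the Logarithmic Sobolev inequality on the (quantum) graph $\Gamma_G$ for the measure $\mu_{\beta_t}$ stated in Proposition 5 shows that:
$$\int_{\Gamma_G} \left(\nabla_x\left\{\sqrt{\frac{n_t(x)}{\mu_{\beta_t}(x)}}\right\}\right)^2 \mu_{\beta_t}(dx) \ge \frac{e^{-c^\star(U_\nu)\beta_t}}{C_{\Gamma_G}(1+\beta_t)}\, J_t,$$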
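where $c^\star(U_\nu)$ is defined in Equation (34) and is related to the maximal depth of a well containing a local but not global minimum. We thus obtain:
$$J_{3,t} = \int_{\Gamma_G} \widetilde{L}_{2,t}\left[\log\frac{n_t(x)}{\mu_{\beta_t}(x)}\right] n_t(dx) \le 8 D_G^2\,\beta_t^2\, I_t - \frac{e^{-c^\star(U_\nu)\beta_t}}{C_{\Gamma_G}(1+\beta_t)}\, J_t.$$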
The proof is concluded by regrouping the three terms. □
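The Csiszar-Kullback step used in the proof above, $2\,d_{TV}(m,\nu) \le \sqrt{2\int \log(m/\nu)\,dm}$, can be sanity-checked numerically on a finite set of nodes. The following is only an illustrative sketch: the function names, the random test distributions and the choice of six nodes are assumptions made for this example and do not come from the paper.

```python
import math
import random


def total_variation(p, q):
    """Total variation distance sup_A |p(A) - q(A)|, i.e. half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def relative_entropy(p, q):
    """Relative entropy sum_y p(y) log(p(y)/q(y)); q must be strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def random_distribution(size, rng):
    """Draw a strictly positive probability vector (hypothetical test data)."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def pinsker_slack(p, q):
    """sqrt(2 * Ent(p|q)) - 2 * d_TV(p, q); nonnegative by Csiszar-Kullback."""
    return math.sqrt(2.0 * relative_entropy(p, q)) - 2.0 * total_variation(p, q)


rng = random.Random(0)
slacks = [
    pinsker_slack(random_distribution(6, rng), random_distribution(6, rng))
    for _ in range(1000)
]
# True when the inequality holds (up to rounding) on every sampled pair.
print(min(slacks) >= -1e-12)
```

Squaring the inequality and multiplying by $\|\nabla_x d^2\|_\infty^2 \le 4D_G^2$ gives the factor $8D_G^2$ that appears in front of $I_t$.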
5.2. Study of ∂tIt
Proposition 3. Assume that $D_G \ge 1$ and $\beta_t$ is an increasing inverse temperature with $\beta_t \ge 1$, then:
$$\partial_t I_t \le -\alpha_t I_t - \partial_t J_t + D_G^2\,[\beta_t' + 6\beta_t^2]. \tag{20}$$
Proof: We compute the derivative of It. Observing that mt(x, y) = nt(x)mt(y|x), we have:
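$$\begin{aligned}
\partial_t I_t &= \partial_t\left\{\int_{\Gamma_G} n_t(x)\left(\int_V \log\frac{m_t(y|x)}{\nu(y)}\,m_t(y|x)\,dy\right)dx\right\} = \partial_t\left\{\int_{\Gamma_G}\int_V \log\frac{m_t(y|x)}{\nu(y)}\,m_t(x,y)\,dx\,dy\right\} \\
&= \underbrace{\int_{\Gamma_G}\int_V \partial_t\{\log m_t(y|x)\}\,m_t(x,y)\,dx\,dy}_{:=I_{1,t}} - \underbrace{\int_{\Gamma_G}\int_V \partial_t\{\log \nu(y)\}\,m_t(x,y)\,dx\,dy}_{:=I_{2,t}} + \underbrace{\int_{\Gamma_G}\int_V \log\frac{m_t(y|x)}{\nu(y)}\,\partial_t m_t(x,y)\,dx\,dy}_{:=I_{3,t}}
\end{aligned}$$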
The computation of the first two terms is straightforward. For the first one we have:
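$$\begin{aligned}
I_{1,t} &= \int_{\Gamma_G}\int_V \partial_t\{\log m_t(y|x)\}\,m_t(x,y)\,dx\,dy = \int_{\Gamma_G}\int_V \frac{\partial_t m_t(y|x)}{m_t(y|x)}\,m_t(x,y)\,dx\,dy \\
&= \int_{\Gamma_G}\int_V \partial_t m_t(y|x)\,n_t(x)\,dx\,dy = \int_{\Gamma_G} n_t(x)\left(\int_V \partial_t m_t(y|x)\,dy\right)dx \\
&= \int_{\Gamma_G} n_t(x)\,\partial_t \underbrace{\int_V m_t(y|x)\,dy}_{=1}\,dx = 0.
\end{aligned}$$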
The computation of I2,t is easy since ν does not depend on t, implying that I2,t = 0.
For the third term, we use the backward Kolmogorov equation and obtain:
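$$\begin{aligned}
I_{3,t} &= \int_{\Gamma_G}\int_V \log\frac{m_t(y|x)}{\nu(y)}\,\partial_t m_t(x,y)\,dx\,dy = \int_{\Gamma_G}\int_V L_t\left(\log\frac{m_t(y|x)}{\nu(y)}\right) m_t(x,y)\,dx\,dy \\
&= \underbrace{\int_{\Gamma_G}\int_V L_{1,t}\left(\log\frac{m_t(y|x)}{\nu(y)}\right) m_t(x,y)\,dx\,dy}_{:=I^1_{3,t}} + \underbrace{\int_{\Gamma_G}\int_V L_{2,t}\left(\log\frac{m_t(y|x)}{\nu(y)}\right) m_t(x,y)\,dx\,dy}_{:=I^2_{3,t}}
\end{aligned}$$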
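The jump part $L_{1,t}$ exhibits a mean reversion on the entropy of the conditional law: applying the Jensen inequality for the logarithmic function and the measure $\nu$, we obtain:
$$\begin{aligned}
I^1_{3,t} &= \alpha_t \int_{\Gamma_G}\int_V \left[\int_V \left(\log\frac{m_t(y'|x)}{\nu(y')} - \log\frac{m_t(y|x)}{\nu(y)}\right)\nu(y')\,dy'\right] m_t(x,y)\,dx\,dy \\
&= \alpha_t \int_{\Gamma_G}\int_V \left[\int_V \left(\log\frac{m_t(y'|x)}{\nu(y')}\right)\nu(y')\,dy'\right] m_t(x,y)\,dx\,dy - \alpha_t I_t \\
&\le \alpha_t \int_{\Gamma_G}\int_V \log\left(\int_V \frac{m_t(y'|x)}{\nu(y')}\,\nu(y')\,dy'\right) m_t(x,y)\,dx\,dy - \alpha_t I_t \\
&\le \alpha_t \int_{\Gamma_G}\int_V \log\left(\int_V m_t(y'|x)\,dy'\right) m_t(x,y)\,dx\,dy - \alpha_t I_t \\
&\le \alpha_t \int_{\Gamma_G}\int_V \log 1 \cdot m_t(x,y)\,dx\,dy - \alpha_t I_t \le -\alpha_t I_t.
\end{aligned} \tag{21}$$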
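If we consider the action of $L_{2,t}$ on the entropy of the conditional law, using $m_t(x,y) = m_t(y|x)\,n_t(x)$ yields:
$$\begin{aligned}
I^2_{3,t} &= \int_{\Gamma_G}\int_V L_{2,t}\left(\log\frac{m_t(y|x)}{\nu(y)}\right) m_t(x,y)\,dx\,dy \\
&= \int_{\Gamma_G}\int_V L_{2,t}\left(\log\frac{m_t(x,y)}{\nu(y)\cdot n_t(x)}\right) m_t(x,y)\,dx\,dy \\
&= \int_{\Gamma_G}\int_V L_{2,t}\log(m_t(x,y))\,m_t(x,y)\,dx\,dy - \int_{\Gamma_G}\int_V L_{2,t}\log(n_t(x))\,m_t(x,y)\,dx\,dy,
\end{aligned} \tag{22}$$
because $L_{2,t}(\log\nu)(y) = 0$ ($L_{2,t}$ only involves the $x$ component). We now study the first term of (22):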
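$$\begin{aligned}
\int_{\Gamma_G}\int_V L_{2,t}(\log m_t(x,y))\,dm_t(x,y) &= \frac{1}{2}\int_{\Gamma_G}\int_V \Delta_x[\log m_t(x,y)]\,dm_t(x,y) - \beta_t \int_{\Gamma_G}\int_V \langle \nabla_x U_y, \nabla_x \log m_t(x,y)\rangle\,dm_t(x,y) \\
&= \frac{1}{2}\int_{\Gamma_G}\int_V \partial_x^2\{\log m_t(x,y)\}\,dm_t(x,y) - \beta_t \int_{\Gamma_G}\int_V \partial_x U_y\,\partial_x \log m_t(x,y)\,dm_t(x,y)
\end{aligned}$$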
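The first term of the right-hand side deserves special attention:
$$\begin{aligned}
\int_{\Gamma_G}\int_V \partial_x^2\{\log m_t(x,y)\}\,dm_t(x,y) &= \int_{\Gamma_G}\int_V \partial_x\left\{\frac{\partial_x\{m_t(x,y)\}}{m_t(x,y)}\right\} dm_t(x,y) \\
&= \int_{\Gamma_G}\int_V \frac{\partial_x^2\{m_t(x,y)\}}{m_t(x,y)}\,dm_t(x,y) - \int_{\Gamma_G}\int_V \frac{[\partial_x\{m_t(x,y)\}]^2}{m_t(x,y)}\,dx\,dy \\
&= \int_{\Gamma_G}\int_V \partial_x^2\{m_t(x,y)\}\,dx\,dy - 4\int_{\Gamma_G}\int_V \left(\partial_x\sqrt{m_t(x,y)}\right)^2 dx\,dy \\
&= \sum_{e\in E}\int_V \left[\partial_x m_t(e(L_e),y) - \partial_x m_t(e(0),y)\right] dy - 4\int_{\Gamma_G}\int_V \left(\partial_x\sqrt{m_t(x,y)}\right)^2 dx\,dy,
\end{aligned}$$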
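where $\{e(s), 0 \le s \le L_e\}$ is the parametrization of the edge $e \in E$ introduced in Section 3.1. The gluing conditions (5) yield:
$$\sum_{e\in E}\int_V \left[\partial_x m_t(e(L_e),y) - \partial_x m_t(e(0),y)\right] dy = \int_V \sum_{v\in V}\left[\sum_{e\sim v} -d_e m_t(v,y)\right] dy = 0.$$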
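As a consequence, we obtain:
$$\int_{\Gamma_G}\int_V L_{2,t}(\log m_t(x,y))\,dm_t(x,y) = -2\int_{\Gamma_G}\int_V \left(\partial_x\sqrt{m_t(x,y)}\right)^2 dx\,dy - \beta_t \int_{\Gamma_G}\int_V \partial_x U_y\,\partial_x \log m_t(x,y)\,dm_t(x,y) \tag{23}$$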
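The Cauchy-Schwarz inequality applied to the second term leads to:
$$\begin{aligned}
\left|\beta_t \int_{\Gamma_G}\int_V \partial_x U_y\,\partial_x \log m_t(x,y)\,dm_t(x,y)\right| &\le \beta_t\,\|\partial_x d^2(\cdot,\cdot)\|_{2,m_t}\sqrt{\int_{\Gamma_G}\int_V (\partial_x \log m_t(x,y))^2\,dm_t(x,y)} \\
&\le \beta_t\,\|\partial_x d^2(x,y)\|_\infty \sqrt{\int_{\Gamma_G}\int_V (\partial_x \log m_t(x,y))^2\,m_t(x,y)\,dx\,dy} \\
&\le 4\beta_t D_G \sqrt{\int_{\Gamma_G}\int_V \left(\partial_x\sqrt{m_t(x,y)}\right)^2 dx\,dy} \\
&\le 2 D_G^2\,\beta_t^2 + 2\int_{\Gamma_G}\int_V \left(\partial_x\sqrt{m_t(x,y)}\right)^2 dx\,dy.
\end{aligned}$$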
Inserting this in Equation (23) leads to:
$$\int_{\Gamma_G}\int_V L_{2,t}\log m_t(x,y)\,dm_t(x,y) \le 2 D_G^2\,\beta_t^2. \tag{24}$$
We study the second term of (22) and use the proof of Proposition 2: the decomposition (14) with Equations (15) and (16) yields:
$$\partial_t J_t \le \beta_t' D_G^2 + \int_{\Gamma_G}\int_V L_{2,t}\left(\log\frac{n_t(x)}{\mu_{\beta_t}(x)}\right) m_t(dx,dy).$$
This implies:
$$-\int_{\Gamma_G}\int_V L_{2,t}(\log n_t(x))\,m_t(dx,dy) \le -\partial_t J_t + \beta_t' D_G^2 - \int_{\Gamma_G}\int_V L_{2,t}(\log \mu_{\beta_t}(x))\,m_t(dx,dy). \tag{25}$$
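Using the definition of $\mu_{\beta_t}$, we obtain:
$$\begin{aligned}
-\int_{\Gamma_G}\int_V L_{2,t}\log\frac{e^{-\beta_t U_\nu(x)}}{Z_{\beta_t}}\,m_t(dx,dy) &= \int_{\Gamma_G}\int_V \left[L_{2,t}(\beta_t U_\nu(x)) + L_{2,t}(\log Z_{\beta_t})\right] m_t(dx,dy) \\
&= \beta_t \int_{\Gamma_G}\int_V L_{2,t}(U_\nu(x))\,m_t(dx,dy) \\
&= \beta_t \int_{\Gamma_G}\int_V \left[\frac{1}{2}\Delta_x U_\nu(x) - \beta_t \nabla_x U_y(x)\nabla_x U_\nu(x)\right] m_t(dx,dy) \\
&\le \frac{\beta_t}{2} + 4\beta_t^2 D_G^2.
\end{aligned}$$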
This inequality used in (25) yields:
$$-\int_{\Gamma_G}\int_V L_{2,t}(\log n_t(x))\,m_t(dx,dy) \le -\partial_t J_t + \beta_t' D_G^2 + \frac{\beta_t}{2} + 4\beta_t^2 D_G^2. \tag{26}$$
We now use (24) and (26) in (22) and our assumptions $\beta_t \ge 1$ and $D_G \ge 1$ to obtain:
$$I^2_{3,t} \le -\partial_t J_t + D_G^2\,[\beta_t' + 6\beta_t^2]. \tag{27}$$
Combining (21) and (27) leads to the desired inequality given by Equation (20). □
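The Jensen step in (21) amounts to the fact that $\int_V \log\frac{m(y')}{\nu(y')}\,\nu(y')\,dy' \le \log\int_V m(y')\,dy' = 0$ for any two probability densities $m$ and $\nu$ on $V$. A minimal numerical sketch on a finite node set is given below; the function name and the sample distributions are hypothetical and serve only to illustrate the sign of this term.

```python
import math


def jensen_term(m, nu):
    """Discrete analogue of the inner integral in (21):
    sum over y' of log(m(y') / nu(y')) * nu(y').
    Both inputs are strictly positive probability vectors on the same nodes."""
    return sum(v * math.log(a / v) for a, v in zip(m, nu))


# Hypothetical conditional law m and node distribution nu on four nodes.
m = [0.7, 0.1, 0.1, 0.1]
nu = [0.25, 0.25, 0.25, 0.25]
print(jensen_term(m, nu) <= 0.0)   # the term is nonpositive
print(jensen_term(nu, nu) == 0.0)  # and vanishes when m equals nu
```

The term is zero exactly when the conditional law already equals $\nu$, which is why only the contribution $-\alpha_t I_t$ survives in the bound.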
5.3. Convergence of the entropy
The use of Propositions 2 and 3 makes it possible to obtain a system of coupled differential inequalities.
Proof of Theorem 3: If we denote $a = D_G^2$, we can write:
$$\begin{cases}
J_t' \le -\dfrac{e^{-c^\star(U_\nu)\beta_t}}{C_{\Gamma_G}(1+\beta_t)}\,J_t + a\beta_t' + 8a\beta_t^2 I_t \\[2mm]
I_t' \le -\alpha_t I_t - J_t' + a(\beta_t' + 6\beta_t^2)
\end{cases}$$
We introduce an auxiliary function $K_t = J_t + k_t I_t$, where $k_t$ is a smooth positive decreasing function for $t$ large enough, so that $\lim_{t\to\infty} k_t = 0$. We use the system above to deduce that:
$$\begin{aligned}
K_t' &= J_t' + k_t' I_t + k_t I_t' \\
&\le J_t' + k_t I_t' \\
&\le J_t' + k_t(-\alpha_t I_t - J_t') + a k_t(\beta_t' + 6\beta_t^2) \le (1-k_t)J_t' - k_t\alpha_t I_t + a k_t(\beta_t' + 6\beta_t^2).
\end{aligned}$$
In the first inequality, we use the fact that $k_t' \le 0$ and $I_t$ is positive. The second inequality is given by (20). The upper bound of Proposition 2 now leads to:
$$K_t' \le -\varepsilon_t(1-k_t)J_t - k_t\alpha_t I_t + 8a(1-k_t)\beta_t^2 I_t + a\beta_t' + 6a k_t\beta_t^2,$$
where we denoted $\varepsilon_t = \frac{e^{-c^\star(U_\nu)\beta_t}}{C_{\Gamma_G}(1+\beta_t)}$. We choose the function $k_t$ to obtain a mean reversion on $K_t$:
$$k_t := \frac{8a\beta_t^2}{\alpha_t + 8a\beta_t^2 - \varepsilon_t/2}. \tag{28}$$
Note that this function is decreasing for sufficiently large $t$ as soon as $\beta_t = o(\alpha_t)$, which is the case according to the choices described in Theorem 3. Moreover, a straightforward consequence is $\lim_{t\to+\infty} k_t = 0$. This ensures that a positive $T_0$ exists such that:
$$\forall t \ge T_0 \qquad 0 \le k_t \le \frac{1}{2}.$$
Consequently, we deduce that:
$$\begin{aligned}
\forall t \ge T_0 \qquad K_t' &\le -\varepsilon_t(1-k_t)J_t - k_t\frac{\varepsilon_t}{2}I_t + a\beta_t' + 6a\beta_t^2 k_t \\
&\le -\frac{\varepsilon_t}{2}J_t - k_t\frac{\varepsilon_t}{2}I_t + a\beta_t' + 6a\beta_t^2 k_t \le -\frac{\varepsilon_t}{2}K_t + \underbrace{a\beta_t' + 6a\beta_t^2 k_t}_{:=\eta_t}
\end{aligned}$$
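The next bound is an easy consequence of the Gronwall Lemma:
$$\forall t \ge T_0 \qquad K_t \le K_{T_0}\,e^{-\int_{T_0}^{t}\frac{\varepsilon_s}{2}\,ds} + \int_{T_0}^{t} \eta_s\, e^{-\int_s^t \frac{\varepsilon_u}{2}\,du}\,ds,$$
which in turn implies that $K_t \to 0$ as $t \to +\infty$ as soon as $\eta_t = o_{t\sim+\infty}(\varepsilon_t)$ with $\int_0^\infty \varepsilon_u\,du = +\infty$.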
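We are now looking for a suitable choice for $\alpha_t$ and $\beta_t$. Let us assume that $\alpha_t \sim \lambda t^\gamma$ for some $\gamma > 0$ and that $\beta_t \sim b\log(t+1)$. This choice leads to:
$$k_t \sim \frac{8ab^2\log(t+1)^2}{\lambda t^\gamma},$$
so that:
$$\eta_t \sim \frac{ab}{t} + \frac{48a^2b^4\log(t+1)^4}{\lambda t^\gamma} = O\left(t^{-(1\wedge\gamma)}\log(t)^4\right).$$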
At the same time, we can check that $\varepsilon_t \sim \frac{t^{-c^\star(U_\nu)b}}{b\,C_{\Gamma_G}\log t}$. Now, our conditions on $\eta_t$ and $\varepsilon_t$ imply that:
$$\int_0^\infty \varepsilon_u\,du = +\infty \iff b\,c^\star(U_\nu) \le 1.$$
At the same time:
$$\eta_t = o(\varepsilon_t) \iff 1\wedge\gamma > b\,c^\star(U_\nu).$$
The optimal calibration of our parameters $\gamma$ and $b$ (minimal value of $\gamma$, maximal value of $b$) induces the choice $\gamma = 1$ and $b < c^\star(U_\nu)^{-1}$. Up to the choices $\alpha_t \sim \lambda t$ and $\beta_t \sim b\log(t)$, we deduce that $K_t \to 0$ when $t$ goes to infinity. Since $I_t$, $k_t$ and $J_t$ are positive, this also implies that $J_t \to 0$ when $t$ goes to infinity. □
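The calibration above can be explored numerically by evaluating $k_t$ from (28), $\eta_t = a\beta_t' + 6a\beta_t^2 k_t$ and $\varepsilon_t$ along the schedules $\alpha_t = \lambda t^\gamma$ and $\beta_t = b\log(t+1)$. The sketch below is only illustrative: the default values of $a$, $\lambda$, $b$, $c^\star(U_\nu)$ and $C_{\Gamma_G}$ are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def schedules(t, lam=1.0, b=0.2, gamma=1.0):
    """Return (alpha_t, beta_t, beta_t') for alpha_t = lam * t**gamma and
    beta_t = b * log(t + 1)."""
    return lam * t ** gamma, b * math.log(t + 1.0), b / (t + 1.0)


def epsilon(beta, c_star, c_graph=1.0):
    """epsilon_t = exp(-c_star * beta_t) / (C * (1 + beta_t))."""
    return math.exp(-c_star * beta) / (c_graph * (1.0 + beta))


def eta_over_epsilon(t, a=1.0, c_star=1.0, c_graph=1.0, lam=1.0, b=0.2, gamma=1.0):
    """Ratio eta_t / epsilon_t, with k_t taken from Equation (28)."""
    alpha, beta, beta_prime = schedules(t, lam, b, gamma)
    eps = epsilon(beta, c_star, c_graph)
    k = 8.0 * a * beta ** 2 / (alpha + 8.0 * a * beta ** 2 - eps / 2.0)
    eta = a * beta_prime + 6.0 * a * beta ** 2 * k
    return eta / eps


def is_admissible(b, c_star, gamma):
    """Condition eta_t = o(epsilon_t): min(1, gamma) > b * c_star."""
    return b * c_star < min(1.0, gamma)


for t in (1e3, 1e6, 1e9, 1e12):
    # With b * c_star = 0.2 < 1 the ratio decreases along this grid.
    print(t, eta_over_epsilon(t))
```

With $b\,c^\star(U_\nu) > 1$ (for instance `b=1.5`, `c_star=1.0`) the same ratio grows with $t$, in line with the equivalence stated above.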
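6. Functional inequalities
This section is devoted to the proof of the Log-Sobolev inequality for the measure $(\mu_\beta)_{\beta>0}$. We dealt with the Dirichlet form using this inequality (see Equation (18)) in Proposition 2. In particular, we need to obtain an accurate estimate when $\beta \to +\infty$. For this purpose, we introduce the generic notation for Dirichlet forms (see, e.g., [7] for a more in-depth description):
$$\forall \beta \in \mathbb{R}_+^* \quad \forall f \in W^{1,2}(\mu_\beta) \qquad \mathcal{E}_\beta(f,f) = \|\nabla f\|_{2,\mu_\beta}^2 = \int_{\Gamma_G} |\nabla f(x)|^2\,d\mu_\beta(x).$$
If we denote $\langle f\rangle_{\mu_\beta} = \mu_\beta(f) = \int_{\Gamma_G} f\,d\mu_\beta$, we are interested in showing the Poincare inequality:
$$\|f - \langle f\rangle_{\mu_\beta}\|_{2,\mu_\beta}^2 = \int_{\Gamma_G} (f - \langle f\rangle_{\mu_\beta})^2\,d\mu_\beta \le \lambda(\beta)\,\mathcal{E}_\beta(f,f) \tag{29}$$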
and the Log-Sobolev Inequality (referred to as LSI below):
$$\int_{\Gamma_G} f^2 \log\left(\frac{f}{\|f\|_{2,\mu_\beta}}\right)^2 d\mu_\beta \le C(\beta)\,\mathcal{E}_\beta(f,f) \tag{30}$$
The specific feature of this functional inequality deals with the quantum graph settings and deserves careful adaptation of the pioneering work of [31]. This technical section is split into two parts. The first one establishes a preliminary estimate when β = 0 (i.e., when dealing with the uniform measure on ΓG). The second one then uses this estimate to derive the asymptotic behavior of the LSI when β −→ +∞.
6.1. Preliminary control for µ0 on ΓG
We consider $\mu_0$ the normalized Lebesgue measure and use the standard notation for any measure $\mu$ on $\Gamma_G$:
$$\|f\|_{W^{1,p}(\mu)} := \|f\|_{p,\mu} + \|\nabla f\|_{p,\mu}.$$
Let us establish the next elementary result:
Lemma 1. The ordinary Sobolev inequality holds on $\Gamma_G$, i.e., for all measurable functions $f$ we have:
$$\|f - \langle f\rangle_0\|_{p,\mu_0}^2 \le e^{2/e}L^2\left[\frac{D_G L}{2} + 1\right]\mathcal{E}_0(f,f), \tag{31}$$
where $D_G$ is the diameter of the graph and $L$ its perimeter defined by:
$$L = \sum_{e\in E} L_e.$$
Proof: First, let us remind the reader that for a given interval $I$ in dimension 1 equipped with the Lebesgue measure $\lambda_I$, the Sobolev space $W^{1,p}(\lambda_I)$ is continuously embedded in $L^\infty(I)$ (compact injection when $I$ is bounded). In particular, it can be shown (see, e.g., [16]) that while integrating w.r.t. the unnormalized Lebesgue measure:
$$\forall p \ge 1 \qquad \|f\|_{L^\infty(I)} \le e^{1/e}\,\|f\|_{W^{1,p}(\lambda_I)}.$$
We now consider $f \in W^{1,p}(\mu_0)$. Since $G$ has a finite number of edges we can write $\Gamma_G = \bigcup_{i=1}^n e_i$, where each $e_i$ can be seen as an interval of length $\ell_i$. We have seen that for each edge $e_i$:
$$\|f\|_{L^\infty(e_i)} \le e^{1/e}\,\|f\|_{W^{1,p}(\lambda_{e_i})}.$$
We can use a union bound since $\Gamma_G$ represents the union of edges, and deduce that:
$$\|f\|_{L^\infty(\Gamma_G)} = \max_{1\le i\le n}\|f\|_{L^\infty(e_i)} \le e^{1/e}\max_{1\le i\le n}\|f\|_{W^{1,p}(\lambda_{e_i})} \le e^{1/e}\,\|f\|_{W^{1,p}(\lambda_{\Gamma_G})},$$
where the inequality above holds w.r.t. the unnormalized Lebesgue measure. If $L$ denotes the sum of the lengths of all edges in $\Gamma_G$, we obtain:
$$\|f\|_{L^\infty(\Gamma_G)} \le e^{1/e}L\,\|f\|_{W^{1,p}(\mu_0)}. \tag{32}$$
Second, we establish a simple Poincare inequality for $\mu_0$ on $\Gamma_G$. For any function $f \in W^{1,2}(\mu_0)$, we use the equality:
$$\|f - \langle f\rangle_0\|_{2,\mu_0}^2 = \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} [f(x) - f(y)]^2\,d\mu_0(x)\,d\mu_0(y) = \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G}\left[\int_{\gamma_{x,y}} f'(s)\,ds\right]^2 d\mu_0(x)\,d\mu_0(y),$$
where $\gamma_{x,y}$ is the shortest path that connects $x$ to $y$, parametrized with speed 1, and $f'(s)$ refers to the derivative of $f$ w.r.t. this parametrization at time $s$. It should be noted that such a path exists because the graph $\Gamma_G$ is connected. The Cauchy-Schwarz inequality yields:
$$\|f - \langle f\rangle_0\|_{2,\mu_0}^2 \le \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G}\left[\int_{\gamma_{x,y}} |\nabla f(s)|^2\,ds\right]|\gamma_{x,y}|\,d\mu_0(x)\,d\mu_0(y) \le \frac{D_G L}{2}\,\|\nabla f\|_{2,\mu_0}^2. \tag{33}$$
The Sobolev inequality is now an obvious consequence of the previous inequality: consider $f \in W^{1,p}(\mu_0)$ and note that since $\mu_0(\Gamma_G) = 1$, then (32) applied with $p = 2$ leads to:
$$\forall q \ge 1 \qquad \|f - \langle f\rangle_0\|_{q,\mu_0}^2 = \left(\int_{\Gamma_G} |f - \langle f\rangle_0|^q\,d\mu_0\right)^{2/q} \le \|f - \langle f\rangle_0\|_{L^\infty(\Gamma_G)}^2\,\mu_0(\Gamma_G) \le e^{2/e}L^2\,\|f - \langle f\rangle_0\|_{W^{1,2}(\mu_0)}^2.$$
We can use the Poincare inequality established for $\mu_0$ on $\Gamma_G$ in Equation (33) and obtain:
$$\forall q \ge 1 \qquad \|f - \langle f\rangle_0\|_{q,\mu_0}^2 \le e^{2/e}L^2\left[\frac{D_G L}{2} + 1\right]\|\nabla f\|_{2,\mu_0}^2 = e^{2/e}L^2\left[\frac{D_G L}{2} + 1\right]\mathcal{E}_0(f,f),$$
which concludes the proof. □
Remark 1. The constants obtained in the proof of Lemma 1 above could certainly be improved. Nevertheless, such an improvement would have little importance for the final estimates obtained in Proposition 5.
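As a rough illustration of how loose the constant in the Poincare inequality (33) can be, the sketch below evaluates both sides on the simplest quantum graph, a single edge of length $L$ (so that $D_G = L$), with a midpoint quadrature. The choice of a one-edge graph, the test functions and the function name are assumptions made for this example only.

```python
import math


def poincare_sides(f, df, length=1.0, cells=20000):
    """Return (variance, bound) for inequality (33) on a single edge [0, length].

    variance = ||f - <f>_0||^2 in L^2(mu_0), bound = (D_G * L / 2) * ||f'||^2,
    both computed with the midpoint rule w.r.t. the normalized Lebesgue measure.
    """
    h = length / cells
    xs = [(i + 0.5) * h for i in range(cells)]
    mean = sum(f(x) for x in xs) * h / length
    variance = sum((f(x) - mean) ** 2 for x in xs) * h / length
    energy = sum(df(x) ** 2 for x in xs) * h / length
    diameter = length  # for a single segment the diameter equals its length
    return variance, 0.5 * diameter * length * energy


for name, f, df in [
    ("identity", lambda x: x, lambda x: 1.0),
    ("sine", lambda x: math.sin(3.0 * x), lambda x: 3.0 * math.cos(3.0 * x)),
]:
    variance, bound = poincare_sides(f, df, length=2.0)
    print(name, variance <= bound)
```

The gap between the two sides on such examples is consistent with Remark 1: the constants are not sharp, but sharpness is not needed for the final estimates.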
6.2. Poincare Inequality on µβ
In the following, we show a Poincare Inequality for the measure $\mu_\beta$ for large values of $\beta$. This preliminary estimate will be useful for deriving LSI on $\mu_\beta$. This functional inequality is strongly related to the classical minimal elevation of the energy function $U_\nu$ for joining any state $x$ to any state $y$.
We first introduce some useful notations. For any couple of vertices $(x,y)$ of $\Gamma_G$, and for any path $\gamma_{x,y}$ that connects them, we define $h(\gamma_{x,y})$ as the highest value of $U_\nu$ on $\gamma_{x,y}$:
$$h(\gamma_{x,y}) = \max_{s\in\gamma_{x,y}} U_\nu(s).$$
We define $H(x,y)$ as the smallest value of $h(\gamma_{x,y})$ obtained for all possible paths from $x$ to $y$:
$$H(x,y) = \min_{\gamma: x\to y} h(\gamma).$$
Now, for any pair of vertices $x$ and $y$, the notation $\gamma_{x,y}$ will be reserved for the path that attains the minimum in the definition of $H(x,y)$. Such a path exists for any $x, y$ because $\Gamma_G$ is connected and possesses a finite number of paths that connect any two given vertices.
Finally, we introduce the quantity that will mainly determine the size of the spectral gap involved in the Poincare inequality and the constant in the LSI (see the seminal works of [25] for a LDP probabilistic interpretation and [32] for a functional analysis point of view):
$$c^\star(U_\nu) := \max_{(x,y)\in\Gamma_G^2}\left[H(x,y) - U_\nu(x) - U_\nu(y)\right] + \min_{x\in\Gamma_G} U_\nu(x) \tag{34}$$
In the following, every time we write $\min U_\nu$ we refer to $\min_{x\in\Gamma_G} U_\nu(x)$.
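To make the definitions of $H(x,y)$ and $c^\star(U_\nu)$ concrete, the following sketch computes them on a finite set of points. It assumes, as a simplification not made in the paper, that $U_\nu$ is only sampled at the vertices (or at discretization points treated as vertices), so that the maximum of $U_\nu$ along an edge is replaced by the maximum of its two endpoint values. The function names are hypothetical; the minimax elevation is obtained with a Floyd-Warshall style recursion.

```python
def minimax_elevation(n, edges, u):
    """H[i][j] = min over paths from i to j of the max of u along the path,
    where u is only known at the n vertices (discretization assumption)."""
    inf = float("inf")
    H = [[inf] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = u[i]
    for i, j in edges:
        H[i][j] = H[j][i] = max(u[i], u[j])
    # Floyd-Warshall recursion in the (min, max) semiring.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                candidate = max(H[i][k], H[k][j])
                if candidate < H[i][j]:
                    H[i][j] = candidate
    return H


def critical_depth(n, edges, u):
    """Discrete version of Equation (34):
    max over (x, y) of [H(x, y) - u(x) - u(y)] + min u."""
    H = minimax_elevation(n, edges, u)
    best = max(H[i][j] - u[i] - u[j] for i in range(n) for j in range(n))
    return best + min(u)


# Path graph 0 - 1 - 2 with energies 0, 2, 1: vertex 2 is a local but not
# global minimum, separated from the global minimum by a barrier of height 2.
print(critical_depth(3, [(0, 1), (1, 2)], [0, 2, 1]))
```

On this toy example the value is the depth of the well around the non-global minimum, which matches the interpretation of $c^\star(U_\nu)$ given in the proof of Proposition 2.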
Theorem 4 (Poincare inequality for µβ). For all measurable functions $g$ defined on $\Gamma_G$:
$$\mathrm{Var}_{\mu_\beta}(g) \le \frac{|E|\max_{e\in E} L_e\; e^{2D_G}}{2}\,\beta\, e^{\beta c^\star(U_\nu)}\,\mathcal{E}_{\mu_\beta}(g,g).$$
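Proof: Since the graph $\Gamma_G$ is connected, for any two points $x$ and $y$, we can find a minimal path $\gamma_{x,y}$ that links them and that minimizes $H$. We denote $x_0 = x, x_1, \cdots, x_l, x_{l+1} = y$, where the sequence $x_1, \cdots, x_l$ refers to the nodes included in the path $\gamma_{x,y}$.
$$\begin{aligned}
\mathrm{Var}_{\mu_\beta}(g) &= \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} (g(y) - g(x))^2\,d\mu_\beta(x)\,d\mu_\beta(y) \\
&= \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} \left(\sum_{i=0}^{l} \left(g(x_{i+1}) - g(x_i)\right)\right)^2 d\mu_\beta(x)\,d\mu_\beta(y) \\
&= \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} \left(\sum_{i=0}^{l} \int_{x_i}^{x_{i+1}} g'(s)\,ds\right)^2 d\mu_\beta(x)\,d\mu_\beta(y) \\
&= \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} \left(\int_{\gamma_{x,y}} g'(s)\,ds\right)^2 d\mu_\beta(x)\,d\mu_\beta(y) \\
&\le \frac{1}{2}\int_{\Gamma_G}\int_{\Gamma_G} \int_{\gamma_{x,y}} |\nabla g(s)|^2\,ds\;|\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y).
\end{aligned}$$
The last line is implied by the Cauchy-Schwarz inequality.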
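We can organize the terms involved in the above upper bound in the following way: for any edge of the graph $e \in E$, we denote $V_e$ the set of points $(x,y)$ such that $\gamma_{x,y} \cap e \ne \emptyset$. We therefore have:
$$\begin{aligned}
\mathrm{Var}_{\mu_\beta}(g) &\le \frac{1}{2}\sum_{e\in E}\int_e |\nabla g(s)|^2\,ds \int_{V_e} |\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y) \\
&\le \frac{1}{2}\sum_{e\in E}\int_e |\nabla g(s)|^2\,\mu_\beta(s) \int_{V_e} |\gamma_{x,y}|\,\frac{1}{\mu_\beta(s)}\,d\mu_\beta(x)\,d\mu_\beta(y)\,ds \\
&\le \frac{1}{2}\sum_{e\in E}\left[\int_e |\nabla g(s)|^2\,\mu_\beta(s)\,ds\right]\left[\sup_{s\in e}\frac{1}{\mu_\beta(s)}\int_{V_e} |\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y)\right] \\
&\le \frac{1}{2}\int_{\Gamma_G} |\nabla g(s)|^2\,\mu_\beta(s)\,ds\;\sup_{s\in\Gamma_G}\frac{1}{\mu_\beta(s)}\int_{V_{e_s}} |\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y).
\end{aligned}$$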
In this case, $e_s$ denotes the edge of the graph that contains the point $s$ and the set $V_{e_s}$ is still the set of couples $(x,y)$ defined above, associated with each edge $e_s$. We introduce the quantity $A$ defined as:
$$A = \sup_{s\in\Gamma_G}\frac{1}{\mu_\beta(s)}\int_{V_{e_s}} |\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y).$$
Using this notation, we have obtained that for all functions $g$, we have the Poincare inequality:
$$\mathrm{Var}_{\mu_\beta}(g) \le \frac{A}{2}\,\mathcal{E}_{\mu_\beta}(g,g). \tag{35}$$
All that remains to be done is to obtain an upper bound of $A$.
$$\begin{aligned}
A &= \sup_{s\in\Gamma_G}\frac{1}{\mu_\beta(s)}\int_{V_{e_s}} |\gamma_{x,y}|\,d\mu_\beta(x)\,d\mu_\beta(y) \\
&= \sup_{s\in\Gamma_G} Z_\beta\, e^{\beta U_\nu(s)}\int_{V_{e_s}} \frac{e^{-\beta(U_\nu(x)+U_\nu(y))}}{Z_\beta^2}\,|\gamma_{x,y}|\,dx\,dy \\
&= \frac{1}{Z_\beta}\sup_{s\in\Gamma_G}\int_{V_{e_s}} e^{\beta(U_\nu(s) - U_\nu(x) - U_\nu(y))}\,|\gamma_{x,y}|\,dx\,dy.
\end{aligned}$$
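Since $\gamma_{x,y}$ is the minimal path for $H(x,y)$, we have $H(x,y) = \max_{s\in\gamma_{x,y}} U_\nu(s)$. Therefore:
$$\begin{aligned}
A &\le \frac{1}{Z_\beta}\sup_{s\in\Gamma_G}\int_{V_{e_s}} e^{\beta(H(x,y) - U_\nu(x) - U_\nu(y))}\,|\gamma_{x,y}|\,dx\,dy \\
&\le \frac{1}{Z_\beta}\sup_{s\in\Gamma_G}\int_{V_{e_s}} e^{\beta(c^\star(U_\nu) - \min(U_\nu))}\,|\gamma_{x,y}|\,dx\,dy \\
&\le \frac{e^{\beta(c^\star(U_\nu) - \min(U_\nu))}}{Z_\beta}\sup_{s\in\Gamma_G}\int_{V_{e_s}} |\gamma_{x,y}|\,dx\,dy \\
&\le \frac{e^{\beta(c^\star(U_\nu) - \min(U_\nu))}}{Z_\beta}\,|E|\max_{e\in E} L_e.
\end{aligned}$$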
Using the definition of $Z_\beta$, we have:
$$A \le \frac{e^{\beta c^\star(U_\nu)}\,|E|\max_{e\in E} L_e}{\int_{\Gamma_G} e^{-\beta(U_\nu(x) - \min(U_\nu))}\,dx}. \tag{36}$$
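If $x^\star \in M_\nu$ is a Frechet mean that minimizes $U_\nu$, we designate $B\left(x^\star, \frac{1}{\beta}\right)$ as the ball of center $x^\star$ and radius $\frac{1}{\beta}$, for the geodesic distance $d$ on the graph $\Gamma_G$. It is easy to check that:
$$|U_\nu(x) - U_\nu(x^\star)| = \left|\mathbb{E}_{Y\sim\nu}\left[d^2(x,Y) - d^2(x^\star,Y)\right]\right| \le \mathbb{E}_{Y\sim\nu}\left|d^2(x,Y) - d^2(x^\star,Y)\right| \le 2D_G \times d(x,x^\star).$$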
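We can then deduce a lower bound on the denominator involved in (36):
$$\begin{aligned}
\int_{\Gamma_G} e^{-\beta(U_\nu(x) - \min(U_\nu))}\,dx &\ge \int_{B\left(x^\star,\frac{1}{\beta}\right)} e^{-\beta(U_\nu(x) - \min(U_\nu))}\,dx \\
&\ge \int_{B\left(x^\star,\frac{1}{\beta}\right)} e^{-2D_G\beta\, d(x,x^\star)}\,dx \\
&\ge e^{-2D_G}\int_{B\left(x^\star,\frac{1}{\beta}\right)} dx.
\end{aligned}$$
The Lebesgue measure of $B\left(x^\star,\frac{1}{\beta}\right)$ may be lower bounded by $\beta^{-1}$ since there is at least one path in $\Gamma_G$ passing through the point $x^\star$. Inserting this inequality in (36) gives:
$$A \le |E|\max_{e\in E} L_e\; e^{2D_G}\,\beta\, e^{\beta c^\star(U_\nu)}.$$
Using this upper bound in (35) leads to the desired Poincare inequality. □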
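The Lipschitz-type bound $|U_\nu(x) - U_\nu(x^\star)| \le 2D_G\,d(x,x^\star)$ used in the proof can be illustrated on a small weighted graph. The sketch below makes two simplifying assumptions that are not in the paper: candidate points $x$ are restricted to the vertices, and $D_G$ is taken as the largest vertex-to-vertex distance. All function names and the example graph are hypothetical.

```python
def all_pairs_distances(n, weighted_edges):
    """Geodesic distances between vertices of an undirected weighted graph
    (Floyd-Warshall); weighted_edges contains triples (i, j, length)."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j, w in weighted_edges:
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def frechet_energy(d, nu):
    """U_nu(x) = sum over y of nu(y) * d(x, y)^2, evaluated at every vertex x."""
    return [sum(p * d[x][y] ** 2 for y, p in enumerate(nu)) for x in range(len(nu))]


def vertex_barycenter(d, nu):
    """Vertex minimizing U_nu (a Frechet mean restricted to the vertex set)."""
    u = frechet_energy(d, nu)
    return min(range(len(u)), key=u.__getitem__)


def lipschitz_violations(d, nu):
    """Vertices x where |U(x) - U(x*)| exceeds 2 * D_G * d(x, x*)."""
    u = frechet_energy(d, nu)
    x_star = vertex_barycenter(d, nu)
    diameter = max(max(row) for row in d)
    return [
        x for x in range(len(u))
        if abs(u[x] - u[x_star]) > 2.0 * diameter * d[x][x_star] + 1e-12
    ]


# Hypothetical 4-cycle with unequal edge lengths and node weights.
dist = all_pairs_distances(4, [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (3, 0, 0.5)])
weights = [0.1, 0.4, 0.2, 0.3]
print(vertex_barycenter(dist, weights), lipschitz_violations(dist, weights))
```

An empty list of violations is what the bound predicts; it follows from $|d^2(x,y) - d^2(x^\star,y)| = |d(x,y) - d(x^\star,y)|\,(d(x,y) + d(x^\star,y))$ and the triangle inequality.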
6.3. Sobolev Inequalities on µβ
6.3.1. Preliminary control on Dirichlet forms
The next result will be useful to derive a LSI for $\mu_\beta$ from a Poincare inequality on $\mu_\beta$ (given in Theorem 4). It generalizes the Poincare inequality for $p$ norms with $p > 2$ while using the Sobolev inequality given in Lemma 1.
First, we introduce the maximal elevation of $U_\nu$ as:
$$M = \sup_{x\in\Gamma_G} U_\nu(x) - \inf_{x\in\Gamma_G} U_\nu(x). \tag{37}$$
Note that in our case, $M$ may be upper bounded by $D_G^2$.
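Proposition 4. For any $p > 2$ and all measurable functions $f$, we have:
$$\forall \beta \ge 0 \qquad \|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^2 \le 4L^2\left(\frac{D_G L}{2} + 1\right)e^{2/e + M\beta}\,\mathcal{E}_\beta(f,f). \tag{38}$$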
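Proof: The Jensen inequality applied to the convex function $x \longmapsto x^p$ yields:
$$\begin{aligned}
\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^p &= \int_{\Gamma_G} |f - \langle f\rangle_\beta|^p\,d\mu_\beta \\
&= \int_{\Gamma_G} \left|f(x) - \int_{\Gamma_G} f(y)\,d\mu_\beta(y)\right|^p d\mu_\beta(x) \\
&= \int_{\Gamma_G} \left|\int_{\Gamma_G} (f(x) - f(y))\,d\mu_\beta(y)\right|^p d\mu_\beta(x) \\
&\le \int_{\Gamma_G}\int_{\Gamma_G} |f(x) - f(y)|^p\,d\mu_\beta(y)\,d\mu_\beta(x) \\
&\le \int_{\Gamma_G}\int_{\Gamma_G} \left(|f(x) - \langle f\rangle_0| + |f(y) - \langle f\rangle_0|\right)^p d\mu_\beta(y)\,d\mu_\beta(x).
\end{aligned}$$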
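Again, the Jensen inequality $|a+b|^p \le 2^{p-1}[|a|^p + |b|^p]$ implies that:
$$\begin{aligned}
\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^p &\le 2^{p-1}\int_{\Gamma_G}\int_{\Gamma_G} \left(|f(x) - \langle f\rangle_0|^p + |f(y) - \langle f\rangle_0|^p\right) d\mu_\beta(y)\,d\mu_\beta(x) \\
&\le 2^p \int_{\Gamma_G} |f(x) - \langle f\rangle_0|^p\,d\mu_\beta(x) = 2^p\,\|f - \langle f\rangle_0\|_{p,\mu_\beta}^p.
\end{aligned}$$
We conclude that:
$$\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^2 \le 4\,\|f - \langle f\rangle_0\|_{p,\mu_\beta}^2. \tag{39}$$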
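Using the fact that:
$$\mu_\beta(x) = \frac{e^{-\beta U_\nu(x)}}{Z_\beta} \le \frac{e^{-\beta\min U_\nu}}{Z_\beta} \times Z_0\,\mu_0(x),$$
we also have $\|f - \langle f\rangle_0\|_{p,\mu_\beta}^p \le \frac{L\,e^{-\beta\min U_\nu}}{Z_\beta}\,\|f - \langle f\rangle_0\|_{p,\mu_0}^p$, since $Z_0$ is the perimeter $L$ of the graph $\Gamma_G$. Consequently, we obtain:
$$\|f - \langle f\rangle_0\|_{p,\mu_\beta}^2 = \left(\|f - \langle f\rangle_0\|_{p,\mu_\beta}^p\right)^{2/p} \le \left(\frac{L\,e^{-\beta\min U_\nu}}{Z_\beta}\right)^{2/p}\|f - \langle f\rangle_0\|_{p,\mu_0}^2.$$
Using inequality (39), the fact that $Z_\beta \le L\,e^{-\beta\min U_\nu}$ and the assumption $2/p < 1$, we conclude that:
$$\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^2 \le \frac{4L\,e^{-\beta\min U_\nu}}{Z_\beta}\,\|f - \langle f\rangle_0\|_{p,\mu_0}^2.$$
The Sobolev inequality of Lemma 1 implies that:
$$\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^2 \le \frac{4L^3\,(D_G L/2 + 1)\,e^{2/e - \beta\min U_\nu}}{Z_\beta}\,\mathcal{E}_0(f,f). \tag{40}$$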
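We now find an upper bound for the Dirichlet form $\mathcal{E}_0(f,f)$ that involves $\mathcal{E}_\beta(f,f)$:
$$\begin{aligned}
\mathcal{E}_0(f,f) &= \int_{\Gamma_G} \langle \nabla f, \nabla f\rangle\,d\mu_0 \\
&= Z_\beta \int_{\Gamma_G} \langle \nabla f, \nabla f\rangle\, e^{\beta U_\nu(x)}\,\frac{e^{-\beta U_\nu(x)}}{Z_\beta}\,d\mu_0 \\
&\le \frac{Z_\beta \exp\left(\beta\sup_{x\in\Gamma_G} U_\nu(x)\right)}{Z_0}\int_{\Gamma_G} \langle \nabla f, \nabla f\rangle\,d\mu_\beta \\
&\le \frac{Z_\beta \exp\left(\beta\sup_{x\in\Gamma_G} U_\nu(x)\right)}{L}\,\mathcal{E}_\beta(f,f).
\end{aligned} \tag{41}$$
Putting (40) and (41) together concludes the proof. □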
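6.3.2. Log-Sobolev Inequality on µβ
For all probability measures $\mu$ and all measurable functions $f$, we denote:
$$\mathrm{Ent}_\mu(f^2) = \int_{\Gamma_G} f^2 \log\left(\frac{f^2}{\|f\|_{2,\mu}^2}\right) d\mu.$$
Proposition 5. The Log-Sobolev Inequality holds on $\Gamma_G$. A constant $C_{\Gamma_G}$ exists such that for all $\beta \ge 0$ and all $\mu_\beta$-measurable functions $f$, we have:
$$\mathrm{Ent}_{\mu_\beta}(f^2) \le C_{\Gamma_G}\,[1+\beta]\,e^{c^\star(U_\nu)\beta}\,\mathcal{E}_\beta(f,f).$$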
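Proof: We consider $\beta \ge 0$ and a $\mu_\beta$-measurable function $f$. We apply the Jensen inequality for the logarithmic function and the measure $\frac{f^2}{\|f\|_{2,\mu_\beta}^2}\,d\mu_\beta$ to obtain:
$$\begin{aligned}
\int_{\Gamma_G} \left(\frac{f}{\|f\|_{2,\mu_\beta}}\right)^2 \log\left(\frac{f^2}{\|f\|_{2,\mu_\beta}^2}\right) d\mu_\beta &= \frac{2}{p-2}\int_{\Gamma_G} \left(\frac{f}{\|f\|_{2,\mu_\beta}}\right)^2 \log\left(\frac{|f|^{p-2}}{\|f\|_{2,\mu_\beta}^{p-2}}\right) d\mu_\beta \\
&\le \frac{2}{p-2}\log\left(\int_{\Gamma_G} \frac{|f|^{p-2}}{\|f\|_{2,\mu_\beta}^{p-2}}\,\frac{f^2}{\|f\|_{2,\mu_\beta}^2}\,d\mu_\beta\right) \\
&\le \frac{2}{p-2}\log\left(\frac{\|f\|_{p,\mu_\beta}^p}{\|f\|_{2,\mu_\beta}^p}\right) = \frac{p}{p-2}\log\left(\frac{\|f\|_{p,\mu_\beta}^2}{\|f\|_{2,\mu_\beta}^2}\right).
\end{aligned}$$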
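Observing that for all $x, \delta > 0$ we have $\log(x\delta) \le \delta x$, we obtain $\log(x) \le x\delta + \log(\delta^{-1})$. Therefore:
$$\begin{aligned}
\forall \delta > 0 \qquad \int_{\Gamma_G} f^2 \log\left(\frac{f}{\|f\|_{2,\mu_\beta}}\right)^2 d\mu_\beta &= \|f\|_{2,\mu_\beta}^2 \int_{\Gamma_G} \left(\frac{f}{\|f\|_{2,\mu_\beta}}\right)^2 \log\left(\frac{f^2}{\|f\|_{2,\mu_\beta}^2}\right) d\mu_\beta \\
&\le \frac{p\,\|f\|_{2,\mu_\beta}^2}{p-2}\log\left(\frac{\|f\|_{p,\mu_\beta}^2}{\|f\|_{2,\mu_\beta}^2}\right) \\
&\le \frac{p\,\|f\|_{2,\mu_\beta}^2}{p-2}\left[\delta\,\frac{\|f\|_{p,\mu_\beta}^2}{\|f\|_{2,\mu_\beta}^2} + \log\frac{1}{\delta}\right].
\end{aligned}$$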
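Replacing $f$ with $f - \langle f\rangle_\beta$ and choosing $\delta = e^{-M\beta}$ leads to:
$$\int_{\Gamma_G} (f - \langle f\rangle_\beta)^2 \log\left(\frac{f - \langle f\rangle_\beta}{\|f - \langle f\rangle_\beta\|_{2,\mu_\beta}}\right)^2 d\mu_\beta \le \frac{p}{p-2}\left[e^{-\beta M}\,\|f - \langle f\rangle_\beta\|_{p,\mu_\beta}^2 + M\beta\,\|f - \langle f\rangle_\beta\|_{2,\mu_\beta}^2\right].$$
Proposition 4 and the Poincare inequality established in Theorem 4 yield:
$$\begin{aligned}
\mathrm{Ent}_{\mu_\beta}\left[(f - \langle f\rangle_\beta)^2\right] &= \int_{\Gamma_G} (f - \langle f\rangle_\beta)^2 \log\left(\frac{f - \langle f\rangle_\beta}{\|f - \langle f\rangle_\beta\|_{2,\mu_\beta}}\right)^2 d\mu_\beta \\
&\le \frac{p}{p-2}\left(4L^2\,\frac{D_G L + 2}{2}\,e^{2/e} + M\beta\,|E|\max_{e\in E} L_e\,\frac{e^{2D_G}}{2}\,\beta\, e^{c^\star(U_\nu)\beta}\right)\mathcal{E}_\beta(f,f).
\end{aligned} \tag{42}$$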
It remains to use Rothaus' Lemma (see Lemma 5.1.4 of [7]), which states that for any measure $\mu$ and any constant $a$:
$$\mathrm{Ent}_\mu\left((g+a)^2\right) \le \mathrm{Ent}_\mu(g^2) + 2\int_{\Gamma_G} g^2\,d\mu.$$
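Let $p = 4$ in Equation (42). Putting this together with Rothaus' Lemma and Theorem 4, we obtain the following:
$$\begin{aligned}
\mathrm{Ent}_{\mu_\beta}(f^2) &= \int_{\Gamma_G} f^2 \log\left(\frac{f^2}{\|f\|_{2,\mu_\beta}^2}\right) d\mu_\beta \\
&\le \mathrm{Ent}_{\mu_\beta}\left[(f - \langle f\rangle_\beta)^2\right] + 2\,\mathrm{Var}_{\mu_\beta}(f) \\
&\le \mathcal{E}_\beta(f,f)\left[\beta\,|E|\max_{e\in E} L_e\, e^{2D_G}(1 + M\beta)\,e^{c^\star(U_\nu)\beta} + 4L^2(2 + D_G L)\,e^{2/e}\right] \\
&\le C_{\Gamma_G}\,[1+\beta]\,e^{c^\star(U_\nu)\beta}\,\mathcal{E}_\beta(f,f),
\end{aligned}$$
where $C_{\Gamma_G}$ is a large enough constant (independent of $\beta$) that could be made explicit in terms of constants $D_G$, $|E|$ and $L$ since we trivially have $M \le D_G^2$. □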
References
[1] Acemoglu D., Fagnani F., Ozdaglar A., Como G. 2013. Opinion fluctuations and disagreement in social networks. Math. Oper. Res. 38(1) 1–27.
[2] Allassonniere S., Trouve A., Kuhn E. 2010. Construction of bayesian deformable models via stochastic approximation algorithm: A convergence study. Bernoulli 16 641–678.
[3] Arnaudon M., Miclo L. 2014. Means in complete manifolds: uniqueness and approximation. ESAIM: Probability and Statistics 18 185–206.
[4] Arnaudon M., Miclo L. 2014. A stochastic algorithm finding generalized means on compact manifolds. Stochastic Processes and their Applications 124 3463–3479.
[5] Arnaudon M., Miclo L. 2016. A stochastic algorithm finding p-means on the circle. Bernoulli, in press.
[6] Bach F., Jordan M. 2004. Learning spectral clustering. Advances in Neural Information Processing Systems 305–312.
[7] Bakry D., Ledoux M., Gentil I. 2014. Analysis and Geometry of Markov Diffusion Operators, Grundlehren der mathematischen Wissenschaften, vol. 348. Springer.
[8] Barden D., Owen M., Le H. 2013. Central limit theorems for Frechet means in the space of phylogenetic trees. Electronic Journal of Probability 18 1–25.
[9] Bhattacharya R., Patrangenaru V. 2003. Large sample theory of intrinsic and extrinsic sample means on manifolds. Ann. Statist. 31 1–29.
[10] Bigot, J. 2013. Frechet means of curves for signal averaging and application to ECG data analysis. Annals of Applied Statistics 7 1837–2457.
[11] Bigot J., Gadat S. 2010. A deconvolution approach to estimation of a common shape in a shifted curves model. Annals of Statistics 38 224–243.
[12] Bigot J., Gendre X. 2013. Minimax properties of Frechet means of discretely sampled curves. Annals of Statistics 41 923–956.
[13] Bigot J., Klein T., Lopez A., Gouet R. 2015. Geodesic PCA in the Wasserstein space by convex PCA. Annales de l’Institut Henri Poincare B: Probability and Statistics, to appear.
[14] Bigot J., Lopez A., Gouet R. 2013. Geometric PCA of images. SIAM J. Imaging Sci. 6 1851–1879.
[15] Bontemps D., Gadat S. 2014. Bayesian methods for the shape invariant model. Electronic Journal of Statistics 8 1522–1568.
[16] Brezis, H. 1987. Analyse fonctionnelle. Masson, Paris.
[17] Catoni, O. 1992. Rough large deviation estimates for simulated annealing: application to exponential schedules. Ann. Probab. 20 1109–1146.
[18] Dijkstra, E. W. 1959. A note on two problems in connexion with graphs. Numer. Math. 1 269–271.
[19] Dryden I. L., Mardia K. V. 1998. Statistical Shape Analysis.
[20] Erdos P., Renyi A. 1960. The evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5 17–61.
[21] Estrada, E. 2015. Introduction to Complex Networks. Structure and Dynamics, chapter of Evolutionary Equations with Applications to Natural Sciences. Lecture Notes in Mathematics, Springer.
[22] Ethier S. N., Kurtz T. 2005. Markov processes. Characterization and convergence. Wiley Series in Probability and Statistics, John Wiley & Sons.
[23] Frechet, M. 1948. Les elements aleatoires de nature quelconque dans un espace distancie. Annales de l’Institut Henri Poincare (B) 10 215–310.
[24] Freidlin M., Sheu S.J. 2000. Diffusion processes on graphs: stochastic differential equations, large deviation principle. Probab. Theory Relat. Fields 116 181–220.
[25] Freidlin M. I., Wentzell A. D. 1979. Random perturbations of dynamical systems, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 260. 2nd ed. Springer-Verlag, New York. Translated from the 1979 Russian original by Joseph Szucs.
[26] Freidlin M. I., Wentzell A. D. 1995. Random perturbations of Hamiltonian systems, Memoirs of the American Mathematical Society, vol. 523. A.M.S., New York.
[27] Ginestet, C.E. 2013. Strong consistency of set-valued Frechet sample means in metric spaces. Preprint.
[28] Goldenberg A., Fienberg S.E., Airoldi E. M., Zheng A. X. 2010. A survey of statistical network models. Found. Trends Mach. Learn. 2(2) 129–233.
[29] Gutjahr W. J., Pflug G. 1996. Simulated annealing for noisy cost functions. J. Global Optim. 8(1) 1–13.
[30] Hajek, B. 1988. Cooling schedules for optimal annealing. Mathematics of Operations Research 13 311–329.
[31] Holley, Richard, Daniel Stroock. 1988. Simulated annealing via Sobolev inequalities. Comm. Math. Phys. 115(4) 553–569.
[32] Holley R., Stroock D., Kusuoka D. 1989. Asymptotics of the spectral gap with applications to the theory of simulated annealing. J. Funct. Anal. 83(2) 333–347.
[33] Ikeda N., Watanabe S. 1981. Stochastic Differential Equations and Diffusion Processes. North-Holland.
[34] Jackson, M. O. 2008. Social and Economic Networks. Princeton Univ. Press, Princeton, NJ.
[35] Kaufmann M., Wagner D. 2001. Drawing graphs: methods and models. Lecture Notes in Computer Science, Springer.
[36] Klopp O., Verzelen N., Tsybakov A.B. 2016. Oracle inequalities for network models and sparse graphon estimation. Annals of Statistics, in press.
[37] Kolaczyk, E. D. 2009. Statistical analysis of network data: methods and models. Springer Series in Statistics, Springer New-York.
[38] Le, H. 2001. Locating Frechet means with application to shape spaces. Adv. Appl. Probab. 33 324–338.
[39] Lovasz, L. 2012. Large networks and graph limits, vol. 60. American Mathematical Society.
[40] Miclo, Laurent. 1992. Recuit simule sur Rn. Etude de l’evolution de l’energie libre. Ann. Inst. H. Poincare Probab. Statist. 28(2) 235–266.
[41] Miller E., Provan J., Owen M. 2015. Polyhedral computational geometry for averaging metric phylogenetic trees. Advances in Applied Mathematics 68 51–91.
[42] Munch E., Bendich P., Mukherjee S., Mattingly J., Harer J., Turner K. 2015. Probabilistic Frechet means for time varying persistence diagrams. Electronic Journal of Statistics 9 1173–1204.
[43] Newman, M. 2010. Networks, An Introduction. Oxford University Press.
[44] Pennec, X. 2006. Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. J. Math. Imaging Vision 25 127–154.
[45] Shneiderman B., Aris A. 2006. Network visualization by semantic substrates. IEEE Transactions on Visualization and Computer Graphics 12 733–740.
[46] Trouve, A. 1993. Parallelisation massive du recuit simule. PhD Thesis, Universite d’Orsay.
[47] Watts D. J., Strogatz S. H. 1998. Collective dynamics of ’small-world’ networks. Nature 393(6684) 409–10.