arXiv:2012.00675v2 [q-bio.NC] 16 Dec 2020


Topological Learning for Brain Networks

Tananun Songdechakraiwut, Moo K. Chung

Department of Biostatistics and Medical Informatics

University of Wisconsin–Madison

Abstract

This paper proposes a novel topological learning framework that can integrate networks of different sizes and topology through persistent homology. This is possible through the introduction of a new topological loss function that enables such a challenging task. The use of the proposed loss function bypasses the intrinsic computational bottleneck associated with matching networks. We validate the method in extensive statistical simulations with ground truth to assess the effectiveness of the topological loss in discriminating networks with different topology. The method is further applied to a twin brain imaging study in determining if the brain network is genetically heritable. The challenge is in overlaying the topologically different functional brain networks obtained from resting-state functional MRI (fMRI) onto the template structural brain network obtained through diffusion MRI (dMRI).


1 Introduction

Networks are useful representations for complex data and are often represented using graphs consisting of nodes and edges. In the usual network analysis, the focus is mainly on analyzing how data at nodes interact with each other. The strength of interaction is represented as edge weights. In social science, human interactions are modeled using networks by treating humans as nodes and interactions between humans as edges [Scott, 1988]. In molecular network studies, interatomic distances in a molecule are measured across atoms and serve as edge weights while the atoms themselves serve as nodes [Chung and Ombao, 2021, Xia and Wei, 2014]. In brain imaging studies, the whole brain is parcellated into hundreds of disjoint regions, which serve as network nodes [Arslan et al., 2018, Desikan et al., 2006, Fornito et al., 2016, Hagmann et al., 2007, Tzourio-Mazoyer et al., 2002], while brain activities measured as correlations between parcellations serve as edge weights.

In the standard graph theory based network analysis [Sporns, 2003, Wijk et al., 2010], graph theory features such as node degrees and clustering coefficients are often obtained from adjacency matrices after thresholding edge weights. The final statistical results can differ depending on the choice of threshold [Lee et al., 2012]. Thus, there is a need to develop a multiscale network model that provides consistent results and interpretation regardless of the choice of threshold. Topological Data Analysis (TDA) [Edelsbrunner et al., 2000, Wasserman, 2018], a general framework based on algebraic topology, can provide a novel solution to the multiscale network analysis challenge. Instead of examining networks using graphs at one fixed scale, persistent homology identifies persistent topological features that are robust under different scales.

Numerous TDA studies have been applied to increasingly diverse biomedical problems such as genetics [Chung et al., 2017b, 2019b], epileptic seizure detection [Wang et al., 2018], sexual dimorphism in the human brain [Songdechakraiwut and Chung, 2020], analysis of brain arteries [Bendich et al., 2016], image segmentation [Clough et al., 2019], classification [Chen et al., 2019, Reininghaus et al., 2015, Singh et al., 2014], clinical predictive models [Crawford et al., 2020] and persistence-based clustering [Chazal et al., 2013].

Persistent homology is beginning to emerge as a powerful mathematical representation to understand, characterize and quantify the topology of networks. In persistent homology, topological features are measured across different spatial resolutions. As the resolution changes, such features are born and die. Persistent homology associates a life-time to these features in the form of a 1D interval from birth to death. The collection of such intervals is summarized as a barcode, which completely characterizes the topology of the underlying data [Ghrist, 2008]. Long-lived barcodes persist over a long range of resolutions and are considered as signal [Carlsson, 2009]. Recently, it was proposed to penalize barcodes through a topological loss for image segmentation [Hu et al., 2019]. While the approach allows topological information to be incorporated into the segmentation problem, the method has been limited to 2D image segmentation with a small number of topological features due to its expensive optimization process involving O(|V|^6) run-time for |V| vertices [Edmonds and Karp, 1972, Kerber et al., 2017]. Barcodes are typically computed at a finite set of pre-specified resolutions. A sufficient number of such resolutions is required to give a reasonably accurate estimation of barcodes, which quickly increases the computational complexity as the size of the data increases [Chung et al., 2019a, Hu et al., 2019]. This is impractical in brain networks with a far larger number of topological features involving hundreds of connected components and thousands of cycles. In this paper, motivated by [Cohen-Steiner et al., 2010, Hu et al., 2019], we propose a more principled approach that learns the topological structure of brain networks with a large number of features in O(|E| log |V|) run-time for |E| edges and |V| vertices. Our proposed method bypasses the intrinsic computational bottleneck and thus enables us to perform various topology computations and optimizations at every possible resolution.

We illustrate the proposed topological learning method on brain network data obtained from the resting-state functional magnetic resonance images of 194 twin pairs from the Human Connectome Project [Van Essen et al., 2012, 2013]. The HCP twin brain imaging data is considered the gold standard, where zygosity is confirmed by blood and saliva tests [Gritsenko et al., 2020]. Monozygotic (MZ) twins share 100% of genes while dizygotic (DZ) twins share 50% of genes [Falconer and Mackay, 1995]. MZ-twins are more similar or concordant than DZ-twins for cognitive aging, cognitive dysfunction and Alzheimer's disease [Reynolds and Phillips, 2015]. These genetic differences allow us to pull apart and examine genetic and environmental influences easily in vivo. The difference between MZ- and DZ-twins directly quantifies the extent to which phenotypes are influenced by genetic factors. If MZ-twins show more similarity on a given trait compared to DZ-twins, this provides evidence that genes significantly influence that trait.


Figure 1: Schematic of topological learning on brain networks. (a) The Automated Anatomical Labeling (AAL) atlas obtained through structural MRI is used to partition the human brain into 116 disjoint regions, which form the nodes of brain networks. In functional MRI, brain activity at each node is measured as a time series of changes associated with the relative blood oxygenation level (b-top). The functional connectivity between two nodes is given as the correlation between their fMRI time series, resulting in the functional network G through a metric transform (c-top). The structural connectivity between two brain regions is measured by the number of white matter fiber tracts passing through them using dMRI (b-bottom). Structural connectivities over all subjects are then normalized and scaled, resulting in the structural network P (c-bottom) that serves as the template where statistical analysis can be performed. The structural network P is sparse while the functional network G is densely connected. Since both networks are topologically different, it is difficult to integrate them in a coherent model. Simply overlaying functional brain networks on top of the structural network, as usually done in the field, will completely destroy the 1D topology (cycles) of the functional networks [Zhu et al., 2014]. (d) Using the proposed framework, we learn a network Θ that has the topological characteristics of both functional and structural networks.

Previous twin brain imaging studies mainly used univariate imaging phenotypes such as cortical surface thickness [McKay et al., 2014], fractional anisotropy [Chiang et al., 2011] and functional activation [Blokland et al., 2011, Glahn et al., 2010, Smit et al., 2008] in determining heritability in a few regions of interest. Compared to existing studies on univariate imaging phenotypes, there are not many studies on the heritability of whole brain functional networks [Blokland et al., 2011]. Measures of network topology and features may be worth investigating as intermediate phenotypes that indicate the genetic risk for a neuropsychiatric disorder [Bullmore and Sporns, 2009]. However, brain network analysis has not yet been adapted for this purpose beyond a small number of regions. Determining the extent of heritability of whole brain networks is the first necessary prerequisite for identifying network-based endophenotypes. The proposed method will be used to determine the heritability of functional brain networks while integrating the structural brain network information. We demonstrate that our method increases the sensitivity in detecting subtle genetic signals.

2 Methods

2.1 Preliminary

Consider a complete network represented as a graph G = (V, w) comprising a set of nodes V and unique positive symmetric edge weights w = (wij). The proposed method is translation invariant, so any negative edge weights can be made positive by translation. The condition of having unique edge weights is not restrictive in practice. Assuming edge weights follow some continuous distribution, the probability of any two edge weights being equal is zero. This is particularly true in functional brain networks where edge weights are given as the Pearson correlation between time series of brain activity [Fornito et al., 2016]. In the case of equal edge weights, we can simply add infinitesimally small noise to break the tie. Since the proposed topological loss is based on the Wasserstein distance, which enjoys the stability theorem [Cohen-Steiner et al., 2010], adding infinitesimally small noise will not affect the final numerical outcome.

The cardinality of a set is denoted using | · |. The numbers of nodes and edges are then denoted as |V| and |E|. Since G is a complete graph, we have |E| = |V|(|V| − 1)/2. Any incomplete graph can be treated as a special case of a complete graph with zero edge weights. Then we can simply add infinitesimally small noise to the zero edge weights and break ties. We emphasize that the proposed method works for arbitrary graphs. However, assuming the graph to be complete with unique positive edge weights makes the exposition of the method straightforward.
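This preprocessing is simple to implement. The following is a minimal sketch, assuming a raw symmetric connectivity matrix C; the function name, the translation offset and the jitter scale are our own illustrative choices, not part of the paper:

```python
import numpy as np

def to_complete_weighted_graph(C, eps=1e-10, seed=0):
    """Turn a symmetric connectivity matrix (possibly with negative or
    tied entries) into a complete graph with unique positive edge weights
    by translation and infinitesimal jitter, as described in the text."""
    rng = np.random.default_rng(seed)
    W = (C + C.T) / 2.0                    # enforce symmetry
    W = W - W.min() + 1e-3                 # translate so all weights are positive
    jitter = np.triu(rng.uniform(0, eps, W.shape), 1)
    W = W + jitter + jitter.T              # break ties with tiny symmetric noise
    np.fill_diagonal(W, 0)                 # no self-loops
    return W
```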


Figure 2: (a) Graph filtration on a four-node network G. β0 is monotonically increasing while β1 is monotonically decreasing over the filtration. Connected components are born at the edge weights w3, w5, w6 while cycles die at the edge weights w1, w2, w4. A cycle consisting of w4, w5, w6 persists longer than any other 1D feature and is considered as topological signal. The 0D barcode P0 = {(−∞,∞), (w3,∞), (w5,∞), (w6,∞)} is represented using the birth values as I0(G) = {w3, w5, w6}. The 1D barcode P1 = {(−∞, w1), (−∞, w2), (−∞, w4)} is represented using the death values as I1(G) = {w1, w2, w4}. (b) We can prove that the 0D and 1D barcodes uniquely partition the edge weight set W, i.e., W = I0(G) ∪ I1(G) with I0(G) ∩ I1(G) = ∅.

The binary graph Gε = (V, wε) of G is defined as a graph consisting of the node set V and binary edge weights wε given by

$$w_{\epsilon} = (w_{\epsilon,ij}) = \begin{cases} 1 & \text{if } w_{ij} > \epsilon, \\ 0 & \text{otherwise.} \end{cases}$$

A graph filtration of G is defined as a collection of nested binary networks [Lee et al., 2011b, 2012]:

$$G_{\epsilon_0} \supset G_{\epsilon_1} \supset \cdots \supset G_{\epsilon_k},$$

where ε0 < ε1 < · · · < εk are filtration values. Traditionally, edge weights are simply taken as the filtration values. During the filtration, each removed edge either increases β0 or decreases β1 by at most one [Chung et al., 2019b]. Figure 2 displays an example of the graph filtration on a four-node network.
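The Betti numbers over a graph filtration can be computed directly from the thresholded binary graphs. Below is a minimal sketch, assuming a symmetric weight matrix W; for a graph (1-skeleton), β1 follows from the Euler formula β1 = |E_ε| − |V| + β0. The function name is ours.

```python
import numpy as np

def betti_curves(W, thresholds):
    """Betti numbers (beta0, beta1) of the binary graphs G_eps over a
    graph filtration: an edge survives in G_eps if w_ij > eps. For a
    graph (1-skeleton), beta1 = |E_eps| - |V| + beta0 (Euler formula)."""
    n = W.shape[0]
    iu, ju = np.triu_indices(n, 1)
    beta0s, beta1s = [], []
    for eps in thresholds:
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path compression
                x = parent[x]
            return x

        keep = W[iu, ju] > eps
        for i, j in zip(iu[keep], ju[keep]):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj                 # union two components
        beta0 = len({find(i) for i in range(n)})
        beta0s.append(beta0)
        beta1s.append(int(keep.sum()) - n + beta0)
    return np.array(beta0s), np.array(beta1s)
```

For instance, calling betti_curves with all sorted edge weights as thresholds evaluates β0 and β1 at every possible resolution.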

In persistent homology, the 0-dimensional topological feature is a connected component, which is a set of nodes and edges that are connected to each other by paths. In Figure 2, there is only one connected component in graph G0 since every node and edge is connected through paths, while Gw6 has four connected components since the four nodes cannot be reached from each other by any path. The number of connected components is called the 0-th Betti number β0. Thus we can write β0(G0) = 1 and β0(Gw6) = 4.

The 1-dimensional topological feature is a cycle or loop, which is a path that starts and ends at the same node with no other nodes in the path overlapping. In Gw3, there is one cycle consisting of edges w4, w5 and w6. The cycle can be algebraically represented as [w4] + [w5] + [w6] with the convention of putting clockwise orientation along the edges. In Gw1, there are three cycles: [w4] + [w5] + [w6], −[w5] + [w3] + [w2] and [w4] + [w3] + [w2] + [w6]. However, they are linearly dependent in the sense that the cycle consisting of four nodes can be written as the sum of the two other smaller cycles:

[w4] + [w3] + [w2] + [w6] = ([w4] + [w5] + [w6]) + (−[w5] + [w3] + [w2]).

Thus, there are only two algebraically independent cycles in Gw1. The total number of algebraically independent cycles is the 1-st Betti number β1. We can write β1(Gw3) = 1 and β1(Gw1) = 2. Unlike Rips complexes [Ghrist, 2008], which are often used in persistent homology, there are no higher dimensional topological features beyond the 0D and 1D features.

Persistent homology keeps track of the appearances (births) and disappearances (deaths) of connected components and cycles over the filtration, and associates their persistence (the duration from birth to death) to them. Longer persistence indicates the presence of a larger topological signal [Edelsbrunner and Harer, 2008]. The persistence of topological features is algebraically represented as the collection of intervals [εb, εd], where a feature appears at the filtration value εb and vanishes at the filtration value εd [Adler et al., 2010, Ghrist, 2008, Lee et al., 2011a, Songdechakraiwut et al., 2021].

2.2 Birth-death decomposition

During the graph filtration, ignoring all the death values of connected components at ∞ and the birth value at −∞, we can represent the 0D barcode for connected components as increasing birth values

$$I_0(G): \epsilon_{b_1} < \epsilon_{b_2} < \cdots < \epsilon_{b_{m_0}},$$

where m0 = β0(G∞) − 1 = |V| − 1. I0(G) forms the maximum spanning tree (MST) [Lee et al., 2012]. Ignoring all the birth values of cycles at −∞, we can represent the 1D barcode for cycles as increasing death values

$$I_1(G): \epsilon_{d_1} < \epsilon_{d_2} < \cdots < \epsilon_{d_{m_1}},$$

where m1 = β1(G0) = (|V| − 1)(|V| − 2)/2. Deleting an edge wij in the graph filtration results in either the birth of a connected component or the death of a cycle. However, the birth of a component and the death of a cycle cannot happen at the same time. Thus, every edge weight must be in either the 0D barcode or the 1D barcode but not both (Figure 2-b). Thus, we have

Theorem 1. The birth set I0(G) and the death set I1(G) partition the edge weight set W such that W = I0(G) ∪ I1(G) with I0(G) ∩ I1(G) = ∅. The cardinalities of I0(G) and I1(G) are |V| − 1 and (|V| − 1)(|V| − 2)/2, respectively. Further, I0(G) forms the MST of G and I1(G) is the non-MST part of the edges.

Theorem 1 is a non-trivial statement and is used in the development of the proposed topological learning framework.
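Theorem 1 suggests a direct way to compute the barcodes: take the maximum spanning tree weights as I0(G) and the remaining weights as I1(G). A minimal sketch, assuming a complete symmetric weight matrix W with unique positive weights (the function name is ours):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def birth_death_decomposition(W):
    """Split the edge weights of a complete graph into the 0D birth set I0
    (maximum spanning tree weights) and the 1D death set I1 (the remaining
    non-MST weights), following Theorem 1."""
    n = W.shape[0]
    shifted = W.max() + 1.0 - W            # order-reversing, keeps weights positive
    np.fill_diagonal(shifted, 0)           # no self-loops
    mst = minimum_spanning_tree(shifted)   # minimum ST of shifted = maximum ST of W
    rows, cols = mst.nonzero()
    I0 = np.sort(W[rows, cols])            # |V| - 1 birth values
    iu, ju = np.triu_indices(n, 1)
    I1 = np.setdiff1d(W[iu, ju], I0)       # (|V|-1)(|V|-2)/2 death values, sorted
    return I0, I1
```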

2.3 Topological loss

Since a network is topologically completely characterized by its 0D and 1D barcodes, the topological similarity between two networks can be measured using the differences of such barcodes. We modified the Wasserstein distance to measure the differences in the barcodes for 1-skeletons [Clough et al., 2019, Cohen-Steiner et al., 2010, Hu et al., 2019, Kolouri et al., 2019, Rabin et al., 2011]. Let Θ = (VΘ, wΘ) and P = (VP, wP) be two given networks. For now, we simply assume that the two networks have the same size, i.e., |VΘ| = |VP|. The case |VΘ| ≠ |VP| will be explained later. The topological loss Ltop(Θ, P) is defined as the optimal matching cost

$$L_{top}(\Theta, P) = \min_{\tau} \Big( \sum_{\epsilon_b \in I_0(\Theta)} \big[\epsilon_b - \tau(\epsilon_b)\big]^2 + \sum_{\epsilon_d \in I_1(\Theta)} \big[\epsilon_d - \tau(\epsilon_d)\big]^2 \Big),$$

where τ is a bijection from I0(Θ) ∪ I1(Θ) to I0(P) ∪ I1(P). It is reasonable to match 0D to 0D persistences and 1D to 1D persistences. Thus, we further restrict the bijection to map from I0(Θ) to I0(P) and from I1(Θ) to I1(P). Subsequently, it is equivalent to optimize separately as

$$L_{top}(\Theta, P) = \min_{\tau_0} \sum_{\epsilon_b \in I_0(\Theta)} \big[\epsilon_b - \tau_0(\epsilon_b)\big]^2 + \min_{\tau_1} \sum_{\epsilon_d \in I_1(\Theta)} \big[\epsilon_d - \tau_1(\epsilon_d)\big]^2,$$


where τ0 is a bijection from I0(Θ) to I0(P) and τ1 is a bijection from I1(Θ) to I1(P). We call the first term the 0D topological loss, which measures the topological similarity between the two networks Θ and P using the difference in 0D barcodes, and denote it by L0D(Θ, P). Similarly, we call the second term the 1D topological loss, which measures the topological similarity through the difference of 1D barcodes, and denote it by L1D(Θ, P). The 0D topological loss L0D is a variation of the standard assignment problem, usually solved with the Hungarian algorithm in O(|I0(Θ)|³), or equivalently O(|VΘ|³), run-time in combinatorial optimization [Edmonds and Karp, 1972]. However, for 1-skeletons, the minimum matching is given exactly and can be numerically computed in O(|I0(Θ)| log |I0(Θ)|) time:

Theorem 2.

$$\min_{\tau_0} \sum_{\epsilon_b \in I_0(\Theta)} \big[\epsilon_b - \tau_0(\epsilon_b)\big]^2 = \sum_{\epsilon_b \in I_0(\Theta)} \big[\epsilon_b - \tau_0^*(\epsilon_b)\big]^2,$$

where τ0* maps the i-th smallest birth value in I0(Θ) to the i-th smallest birth value in I0(P) for all i.

Similarly for the 1D topological loss L1D, we also have

Theorem 3.

$$\min_{\tau_1} \sum_{\epsilon_d \in I_1(\Theta)} \big[\epsilon_d - \tau_1(\epsilon_d)\big]^2 = \sum_{\epsilon_d \in I_1(\Theta)} \big[\epsilon_d - \tau_1^*(\epsilon_d)\big]^2,$$

where τ1* maps the i-th smallest death value in I1(Θ) to the i-th smallest death value in I1(P) for all i.

The optimal bijections τ0*, τ1* can be computed in O(|I1(Θ)| log |I1(Θ)|) time. The minimization in Theorems 2 and 3 is equivalent to the following assignment problem related to the sliced Wasserstein distance and Wasserstein barycenter [Rabin et al., 2011, Vayer et al., 2019]. For monotonic sequences

$$a_1 < a_2 < \cdots < a_n, \qquad b_1 < b_2 < \cdots < b_n,$$

we consider finding $\min_{\tau} \sum_{i=1}^n (a_i - \tau(a_i))^2$ over all possible bijections τ. The optimal bijection is simply given by the identity permutation τ(ai) = bi and is proved by induction.


The problem statement can be extended to a more general assignment problem between different numbers of data points

a1 < a2 < · · · < am, b1 < b2 < · · · < bn.

However, we no longer have the analytic solution given in Theorems 2 and 3 that enables the fast scalable algorithms for topological learning. The unbalanced matching issue is discussed in the next section.
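For equal-size networks, Theorems 2 and 3 reduce the topological loss to squared differences of order statistics, which is easy to implement. A minimal sketch under that assumption (the function name is ours):

```python
import numpy as np

def topological_loss(I0_T, I1_T, I0_P, I1_P):
    """Topological loss L_top between two networks of the same size.
    By Theorems 2 and 3, the optimal matching pairs sorted birth values
    with sorted birth values and sorted death values with sorted death
    values, so the loss reduces to squared differences of order
    statistics, computable in O(n log n)."""
    L0 = np.sum((np.sort(I0_T) - np.sort(I0_P)) ** 2)   # 0D loss
    L1 = np.sum((np.sort(I1_T) - np.sort(I1_P)) ** 2)   # 1D loss
    return L0 + L1
```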

2.4 Topological loss between networks of different sizes

Let Θ = (VΘ, wΘ) and P = (VP, wP) be networks of different sizes, i.e., |VΘ| > |VP|. The case |VΘ| < |VP| can be argued similarly and so is not discussed. Theorem 1 implies that the numbers of births and deaths are larger for Θ, i.e.,

$$|I_0(\Theta)| > |I_0(P)| \quad \text{and} \quad |I_1(\Theta)| > |I_1(P)|.$$

Thus, there is no bijection between the birth sets or between the death sets. In this case, we can directly generalize Theorems 2 and 3 by relaxing τ0 and τ1 to no longer be bijections. Then τ0 and τ1 must assign multiple values to one value. There are a few algorithms available for handling the general case, but such algorithms do not yield the analytic solutions given in Theorems 2 and 3 [Ramshaw and Tarjan, 2012]. In this study, we explore two major methods for handling networks of different sizes: data augmentation and empirical distributions.

Data augmentation is probably the most popular technique in matching topological features of different sizes [Hu et al., 2019], trees of different sizes [Guo and Srivastava, 2020] and point sets of different sizes [Chung et al., 2019b]. The augmentation is often done by adding dummy values that represent short-lived topological features [Edelsbrunner and Harer, 2008, Ghrist, 2008]. Since |I0(Θ)| > |I0(P)|, some birth values in Θ may not have corresponding matches in P. Since connected components that are born later have shorter persistences, we may match any unmatched birth values in Θ to the largest birth value, which has the least persistence, in P. We augment I0(P) with |I0(Θ)| − |I0(P)| dummy values, each equal to the largest edge weight in P, to ensure that bijections between I0(Θ) and I0(P) exist. Similarly, we can bypass the cardinality difference between the death values by augmenting I1(P) with |I1(Θ)| − |I1(P)| dummy values, each equal to the smallest edge weight, corresponding to the shortest-lived cycle in P. The proposed data augmentation penalizes unmatched points in Θ based on their distances to the least persistent features in P, and thus prioritizes prominent topological features with longer persistences while likely overlooking the contribution of shorter persistence points [Patrangenaru et al., 2019, Robins and Turner, 2016, Xia and Wei, 2014].

An alternate approach is to write down the topological loss using the empirical distribution function [Bonneel et al., 2015, Carriere et al., 2017, Deshpande et al., 2018, Karras et al., 2018, Kolouri et al., 2017, Liutkus et al., 2019]. The empirical distribution functions of the birth sets of Θ and P are defined as

$$F_{\Theta}(x) = \frac{1}{|I_0(\Theta)|} \sum_{\epsilon_b \in I_0(\Theta)} \mathbb{1}_{\epsilon_b \le x}, \qquad (1)$$

$$F_{P}(x) = \frac{1}{|I_0(P)|} \sum_{\epsilon_b \in I_0(P)} \mathbb{1}_{\epsilon_b \le x}, \qquad (2)$$

where $\mathbb{1}_{\epsilon_b \le x}$ is the indicator taking the value 1 if εb ≤ x and the value 0 otherwise. Their pseudoinverses $F_{\Theta}^{-1}(z)$ and $F_{P}^{-1}(z)$ are defined as the smallest x for which FΘ(x) ≥ z and FP(x) ≥ z, respectively. Then the 0D topological loss is given by [Kolouri et al., 2017]

$$L_{0D}(\Theta, P) = \int_0^1 \big(F_{\Theta}^{-1}(x) - F_{P}^{-1}(x)\big)^2 \, dx. \qquad (3)$$

Similarly, the 1D topological loss can be defined in terms of the empirical distribution function of the death sets. By computing the integral numerically, we can compute the loss between networks of different sizes. When the data sizes are identical, this method reduces to our analytic expression. Since the cumulative distributions are well defined even with tied birth and death values, the Wasserstein distances remain well defined in the case of ties and have identical mathematical forms.
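Equation (3) is straightforward to approximate numerically through the pseudoinverses. A minimal sketch for the 0D loss between birth sets of different cardinalities (the function name and the midpoint-rule grid are our own choices):

```python
import numpy as np

def loss_unequal_sizes(births_T, births_P, grid=10000):
    """0D topological loss between birth sets of different cardinalities
    via the empirical-quantile form of equation (3): the pseudoinverse of
    the empirical CDF at z is the ceil(z * n)-th order statistic."""
    a = np.sort(np.asarray(births_T, dtype=float))
    b = np.sort(np.asarray(births_P, dtype=float))
    z = (np.arange(grid) + 0.5) / grid          # midpoint rule on (0, 1)
    qa = a[np.ceil(z * len(a)).astype(int) - 1]
    qb = b[np.ceil(z * len(b)).astype(int) - 1]
    return float(np.mean((qa - qb) ** 2))       # approximates the integral
```

When len(births_T) == len(births_P), this reduces to the sorted-matching expression of Theorem 2 up to the normalization by the set size.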

Other than these two major techniques for handling networks of different sizes, there are a few other available methods [Deshpande et al., 2018, Karras et al., 2018]. In particular, Marchese and Maroulas [2018] proposed a variant of the Wasserstein distance that explicitly penalizes cardinality differences in shorter persistence points, and showed that the variant better utilizes geometric information and hence improves discriminative power in a classification task [Love et al., 2021].


In actual brain imaging applications, brain networks are usually constructed following the same pipeline, which results in networks with the same number of nodes [Sporns, 2003, Fornito et al., 2016]. Thus, matching brain networks of different sizes can often be avoided, and there is no need to augment data or use the empirical cumulative distribution approach (3). Comparing the two techniques, when numerical accuracy against the definition of the Wasserstein distance matters, data augmentation will not provide the exact distance. In such a case, the empirical cumulative distribution approach (3) is preferable.

2.5 Topological learning

There have been many attempts to combine topology with learning and inference frameworks. Adler et al. [2017] proposed a parametric model for persistence diagrams based on the Gibbs distribution and developed modeling, replication and inference procedures. Marchese and Maroulas [2018] built a probability measure on the metric space of persistence diagrams and used it in classifying signals. In Maroulas et al. [2020], a Bayesian framework was developed by modeling persistence diagrams as a Poisson point process. Naitzat et al. [2020] investigated how Betti numbers change when different activation functions are used in deep neural networks, and suggested using deep neural networks as input to topological learning frameworks. Love et al. [2021] proposed topological convolutional neural networks in deep learning. Learning can also be done by minimizing a topological loss that replaces the usual Euclidean distance based losses in learning models [Chen et al., 2019].

Let G1 = (V, w1), · · · , Gn = (V, wn) be the observed networks with identical node set V that will be used as training networks. Let P = (VP, wP) be a network expressing prior topological knowledge. In brain network analysis [Lv et al., 2010, Zhu et al., 2014, Kang et al., 2017], Gk can be the functional brain network of the k-th subject obtained from resting-state fMRI and P can be the template structural brain network obtained through dMRI, on which functional brain networks are often overlaid (Figure 1). The node sets V and VP may differ. This can happen if we try to integrate brain networks obtained from different parcellations and studies.



Figure 3: Topological learning of the functional brain networks of a subject. Left: When λk = 0, the learned network Θk is simply the individual network Gk. As λk increases, Θk is deformed such that the topology of Θk is closer to the structural template P. Middle: The sum of losses as a function of λk for 5 different subjects. The solid line is the subject with the smallest losses. λk is chosen to minimize the topological loss individually. Right: Distribution of the optimal λk, centered around λ = 1.0000 ± 0.0002.

We are interested in learning the individual network model from the given k-th functional network Gk = (V, wk) by minimizing

$$\Theta_k = \arg\min_{\Theta} L_F(\Theta, G_k) + \lambda_k L_{top}(\Theta, P), \qquad (4)$$

where the squared Frobenius loss $L_F(\Theta, G_k) = \|w^{\Theta} - w^{k}\|_F^2$ is the goodness-of-fit term between the model and the individual observation. The optimal parameter λk is chosen differently for each subject. The parameter λk controls the amount of topological information of the network P we introduce into the model. The larger the value of λk, the more we learn toward P. If λk = 0, we no longer learn the topology of P but simply fit the data to the individual network Gk. Since P is the average structural brain network obtained from diffusion MRI, we are trying to learn toward the population average. Optimization is done as follows. For each fixed λk, we find the optimal Θ by minimizing the loss (4) over all possible Θ. We then determine which λk gives the minimum possible loss. Figure 3-middle displays how the sum of the estimated individual losses LF and Ltop behaves for five subjects for each fixed λk. Figure 3-right displays the histogram of the distribution of the optimal λk. The average optimal λk over all subjects is λk = 1.0000 ± 0.0002, showing a highly stable result. Such a stable result is not possible with non-topological loss functions. Similar to the stability results in persistent homology [Cohen-Steiner et al., 2010], we can algebraically show

$$L_{top}(\Theta, P) \le C \, \|w^{\Theta} - w^{P}\|_F^2$$


Figure 4: Group-level networks of females (top row) and males (bottom row) are estimated by minimizing the objective function (5) with different λ = 0, 1 and 100.

for some C, providing the stability of the topological loss. For the real data used in this study, we have the least upper bound 0.4102 for all subjects.

Although group-level learning is not the focus of this study, we can also learn a network Θ using all the training data such that

$$\Theta = \arg\min_{\Theta} \frac{1}{n} \sum_{k=1}^{n} L_F(\Theta, G_k) + \lambda L_{top}(\Theta, P). \qquad (5)$$

Figure 4 displays the average networks of females and males obtained by minimizing the objective function (5) with different λ = 0, 1 and 100. The larger the value of λ, the more we reinforce the topology of the sparse structural brain network onto the functional brain network (Figure 5). Even though we do not show it in this paper, the statistical significance of network differences between females and males can be determined using the exact topological inference developed in our previous work [Chung et al., 2017b, 2019b].


Figure 5: (a) The group-level networks are learned by minimizing the objective function (5) over all subjects with different λ = 0, 1 and 100. The template structural network P is shown in the last column. (b) As λ increases, the Betti-plots of the group-level network are adjusted toward those of P. The β0-plot shows that the connected components in the structural network P are gradually born over a wide range of edge weights during the graph filtration. The β1-plot shows the topological sparsity, i.e., the lack of cycles, in the structural network P. While the group-level functional network (when λ = 0) is densely connected with the maximum number of 6555 cycles, the structural network is sparsely connected with only 1709 cycles.


2.6 Averaging networks of different sizes and topology

As an application of the proposed topological learning framework, it is also possible to average networks of different sizes and topology directly without aligning to the template network as in model (5). This might be useful in a situation where we do not have a template or it is not necessary to align networks to a template. Averaging networks of different topology is a difficult task with existing methods.

Given n networks G1 = (V1, w1), · · · , Gn = (Vn, wn) with different node sets, we are interested in obtaining their average, which we will call the topological mean. Since the sizes and topology of the networks are different, we cannot simply average the edge weight matrices w1, · · · , wn directly. Motivated by the Fréchet mean [Le and Kume, 2000, Turner et al., 2014, Zemel and Panaretos, 2019], we obtain the topological mean Θ by minimizing the sum of topological losses

$$\Theta = \arg\min_{\Theta} \sum_{k=1}^{n} L_{top}(\Theta, G_k) = \arg\min_{\Theta} \sum_{k=1}^{n} \big[ L_{0D}(\Theta, G_k) + L_{1D}(\Theta, G_k) \big]. \qquad (6)$$

Θ is viewed as a network that is the topological centroid of the n networks. The optimization can be done analytically as follows [Rabin et al., 2011].

For now, we assume the same number of nodes in the networks, which gives the same number m0 of birth values. The 0D topological loss L0D depends on the birth values of G1, · · · , Gn. Let bk1 < bk2 < · · · < bkm0 be the birth values of network Gk. Let θ1 < θ2 < · · · < θm0 be the birth values of network Θ. By Theorem 2, the first term is equivalent to

$$\sum_{k=1}^{n} L_{0D}(\Theta, G_k) = \sum_{k=1}^{n} (\theta_1 - b_{k1})^2 + \sum_{k=1}^{n} (\theta_2 - b_{k2})^2 + \cdots + \sum_{k=1}^{n} (\theta_{m_0} - b_{km_0})^2.$$

This is quadratic, so we can find the minimum by setting its derivative equal to zero, which gives $\theta_j = \sum_{k=1}^{n} b_{kj} / n$. The second term is similarly represented as the sum of squared differences of death values (Theorem 3). Thus, the i-th smallest birth (or death) value of the topological mean network Θ is given by the mean of the i-th smallest birth (or death) values of the n networks.
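In other words, the barcode of the topological mean is the columnwise average of the sorted birth and death values. A minimal sketch for same-size networks (the function name is ours):

```python
import numpy as np

def topological_mean_barcodes(birth_sets, death_sets):
    """Barcode of the topological mean of n same-size networks: the i-th
    smallest birth (death) value of the mean is the average of the i-th
    smallest birth (death) values of the n networks (equation (6))."""
    B = np.vstack([np.sort(b) for b in birth_sets])   # n x m0, each row sorted
    D = np.vstack([np.sort(d) for d in death_sets])   # n x m1, each row sorted
    return B.mean(axis=0), D.mean(axis=0)             # columnwise averages
```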

Given all the birth and death values of a network, we can completely recover the topology of the network. However, just like the Fréchet mean, the average network is not unique in a geometric sense.


Figure 6: Examples of averaging networks of different sizes and topology using the proposed topological averaging. The topological mean network Θ (right) is the topological centroid of the five networks G1, · · · , G5 (left), showing the average topological pattern. The topological mean network Θ is estimated by minimizing the sum of topological losses $\min_{\Theta} \sum_{k=1}^{5} L_{top}(\Theta, G_k)$. The topological mean network Θ highlights the topological characterization of the five networks. Existing methods will have difficulty averaging networks of different sizes and topology.

It is only unique in the topological sense. We can have multiple different average networks that are geometrically different but with exactly identical topology. For two networks A and B whose edge weights are different, we can have identical birth sets I0(A) = I0(B) and identical death sets I1(A) = I1(B) such that Ltop(A, C) = Ltop(B, C) for any network C. The simplest such example can be obtained by permuting node labels or rotating networks geometrically. Such nonuniqueness may not be disadvantageous in classification tasks and image segmentation. By rotating graphs embedded in 2D images, training samples can be drastically increased, boosting segmentation and classification performance [Marcos et al., 2016, Taylor and Nitschke, 2018]. In our brain network application, to avoid this geometric ambiguity, we introduce the Frobenius loss that constrains the networks geometrically. Figure 6 illustrates toy examples of averaging networks of different sizes and topology. Since we can have many differently shaped networks that are all topologically equivalent, it is not possible to identify Θ uniquely from the averaged birth and death values. For the Figure 6-top example, it would even be possible to have a triangle with one extended edge.


2.7 Numerical implementation

The topological learning (4) estimates Θ = (VΘ, wΘ) iteratively through gradient descent [Bottou, 1998]. The gradient of the topological loss can be computed efficiently without taking numerical derivatives, and its computation mainly comprises the computation of the barcodes I0 and I1 and finding the optimal matchings through Theorems 2 and 3. The gradient of the topological loss ∇Ltop(Θ, P) with respect to the edge weights wΘ = (wΘij) is given as a gradient matrix whose ij-th entry is

$$\frac{\partial L_{top}(\Theta, P)}{\partial w^{\Theta}_{ij}} = \begin{cases} 2\big[w^{\Theta}_{ij} - \tau_0^*(w^{\Theta}_{ij})\big] & \text{if } w^{\Theta}_{ij} \in I_0(\Theta); \\ 2\big[w^{\Theta}_{ij} - \tau_1^*(w^{\Theta}_{ij})\big] & \text{if } w^{\Theta}_{ij} \in I_1(\Theta) \end{cases}$$

since I0(Θ) and I1(Θ) partition the weight set (Theorem 1). Intuitively, by slightly adjusting the edge weight wΘij, we slightly adjust either a birth value in the 0D barcode or a death value in the 1D barcode, which slightly changes the topology of the network. During the estimation of Θ, we take steps in the direction of the negative gradient:

$$w^{\Theta}_{ij} \leftarrow w^{\Theta}_{ij} - 0.1 \Big( 2(w^{\Theta}_{ij} - w^{k}_{ij}) + \lambda \frac{\partial L_{top}(\Theta, P)}{\partial w^{\Theta}_{ij}} \Big),$$

where 0.1 is the learning rate. As each wΘij moves closer to its optimal match, the topology of the estimated network Θ gets closer to that of P, while the Frobenius norm keeps the estimate Θ close to the observed network Gk.

Finding the 0D birth values I0(G) is equivalent to finding the edge weights comprising the maximum spanning tree (MST) of G [Lee et al., 2012]. Once I0 is computed, I1 is simply given as the remaining edge weights (Theorem 1). Then we can compute the optimal matchings τ0* and τ1* between Θ and P by simply sorting the edge weights in ascending order and matching them. The computational complexity of the topological loss gradient is dominated by the computation of the MST using popular algorithms such as Prim's and Kruskal's, which take O(|E| log |V|) run-time for |E| edges and |V| vertices [Lee et al., 2012].
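Putting the pieces together, the update rule above takes only a few lines. The following is a minimal sketch of the gradient descent for equation (4); it reuses the birth_death_decomposition sketch given earlier, and the function name, iteration count and simplifications (e.g., not re-enforcing weight uniqueness after each step) are our own:

```python
import numpy as np

def learn_network(Gk, P, lam, n_iter=100, lr=0.1):
    """Gradient descent for equation (4): Theta minimizes the Frobenius
    loss to the observed network Gk plus lam times the topological loss
    to the template P. Gk, P are same-size symmetric weight matrices;
    birth_death_decomposition is the MST-based sketch from Section 2.2."""
    Theta = Gk.copy().astype(float)
    n = Theta.shape[0]
    iu, ju = np.triu_indices(n, 1)
    I0_P, I1_P = birth_death_decomposition(P)
    for _ in range(n_iter):
        I0_T, I1_T = birth_death_decomposition(Theta)
        # optimal matchings tau0*, tau1*: sorted births to sorted births,
        # sorted deaths to sorted deaths (Theorems 2 and 3)
        match = dict(zip(I0_T, I0_P))
        match.update(zip(I1_T, I1_P))
        grad = np.zeros_like(Theta)
        for i, j in zip(iu, ju):
            g = 2.0 * (Theta[i, j] - Gk[i, j])                   # Frobenius term
            g += lam * 2.0 * (Theta[i, j] - match[Theta[i, j]])  # topological term
            grad[i, j] = grad[j, i] = g
        Theta = Theta - lr * grad
    return Theta
```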

Many fast algorithms for bottleneck and Wasserstein distances utilize the specific geometric structure of the data [Sharathkumar and Agarwal, 2012]. Kerber et al. [2017] proposed an algorithm based on k-d trees with O(n^{3/2} log n) run-time for n scatter points in an arbitrary persistence diagram. Translated into the graph filtration setting, this is equivalent to O(|E|^{3/2} log |E|), or equivalently O(|E|^{3/2} log |V|).


Figure 7: Simulation study 1 with small noise (σ = 0.1). Network difference. The comparison of networks with different topology: c = 5 vs. 20 (first row) and 10 vs. 20 (second row). The change of Betti numbers over the filtration values clearly shows the topological differences. The topological difference in the second row is subtle compared to the first row.

The algorithm in Kerber et al. [2017] is obviously faster than the Hungarian algorithm with O(|E|^6) run-time but slightly slower than the proposed method with O(|E| log |V|). We explicitly utilized the algebraic structure of graph filtrations and simplified the problem of matching 2D scatter points to 1D order statistics of birth and death values.

3 Validation

For validation, we performed two simulation studies to assess the performance of the topological loss as a similarity measure between networks of different topology. Networks of the same size were simulated since many existing methods cannot handle networks of different sizes.


Study 1: Random network model with ground truth

The initial data vector bi at node i was simulated as independent and identically distributed multivariate normal across n subjects, i.e., bi ∼ N(0, In) with the identity matrix In as the covariance matrix of size n × n. The new data vector xi at node i was then generated by introducing additional dependency structures to bi through a mixed-effects model that partitions the covariance matrix of xi into c blocks forming modular structures (Figure 7) [Chung et al., 2019b, Snijders et al., 1995]:

$$\begin{aligned} x_1, \ldots, x_a &= b_1 + \mathcal{N}(0, \sigma^2 I_n),\\ x_{a+1}, \ldots, x_{2a} &= b_{a+1} + \mathcal{N}(0, \sigma^2 I_n),\\ &\;\;\vdots \\ x_{(c-1)a+1}, \ldots, x_{ca} &= b_{(c-1)a+1} + \mathcal{N}(0, \sigma^2 I_n), \end{aligned}$$

where a is the number of nodes in each module. In this simulation, 100-node networks are used. They were partitioned into c modules, where c is chosen such that 100 is divisible by c. This choice of c makes the node partition straightforward. Thus, c = 2, 5, 10, 20 modules are chosen, with a = 50, 20, 10, 5 nodes in each module, respectively. The simulation is done in small noise (σ = 0.1) and large noise (σ = 0.5, 1) settings. We then computed the Pearson correlation coefficient ρxij between xi and xj, which was then translated and scaled as $w^x_{ij} = \sqrt{(1 - \rho^x_{ij})/2}$, a metric [Chung et al., 2019b]. This gives a block network X = (V, wx). The mixed-effects model allows us to explicitly simulate the amount of statistical dependency between modules and nodes, providing control over the topological structures of connectedness. Figure 7 shows examples of simulated modular networks. As the variability increases, it becomes more difficult to discriminate between different modular networks. The variability σ = 1 is large enough to mask the topological differences, and it is expected that no method will perform well.
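The Study 1 generative model is easy to reproduce. A minimal sketch, assuming the mixed-effects block model and metric transform described above (the function name and defaults are ours):

```python
import numpy as np

def simulate_modular_network(n_nodes=100, c=5, n_subjects=7, sigma=0.1, seed=None):
    """One modular network X = (V, w^x) from the Study 1 mixed-effects
    model: nodes in the same module share a base signal b plus
    N(0, sigma^2) noise; correlations are mapped to the metric
    w_ij = sqrt((1 - rho_ij) / 2)."""
    rng = np.random.default_rng(seed)
    a = n_nodes // c                                  # nodes per module
    x = np.empty((n_nodes, n_subjects))
    for m in range(c):
        b = rng.standard_normal(n_subjects)           # shared module signal
        x[m * a:(m + 1) * a] = b + sigma * rng.standard_normal((a, n_subjects))
    rho = np.corrcoef(x)                              # Pearson correlations
    w = np.sqrt(np.maximum(1.0 - rho, 0.0) / 2.0)     # metric transform
    np.fill_diagonal(w, 0)
    return w
```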

Based on the statistical model above, we simulated two groups of networks consisting of n = 7 subjects in each group for two different studies. The small sample size is chosen such that the exact permutation test can be done by generating all $\binom{14}{7} = 3432$ possible permutations. We then tested the performance of the topological loss for network differences. For comparison, we tested the topological loss against widely used Euclidean losses such as the L1-, L2- and L∞-norms.


Table 1: Study 1. The performance results are summarized (from top to bottom, σ = 0.1, 0.5, 1) in terms of the false negative rate (2 vs. 10, 5 vs. 20, 10 vs. 20) and the false positive rate (2 vs. 2, 5 vs. 5, 10 vs. 10). Smaller numbers are better.

c          L1    L2    L∞    GH    Bottleneck  KS(β0)  KS(β1)  Ltop

σ = 0.1
2 vs. 10   0.02  0.00  0.02  0.30  0.40        0.00    0.00    0.02
5 vs. 20   0.06  0.00  0.02  0.20  0.32        0.02    0.00    0.00
10 vs. 20  1.00  0.86  0.34  0.32  0.20        0.24    0.00    0.08
2 vs. 2    0.00  0.00  0.00  0.00  0.00        0.54    0.88    0.00
5 vs. 5    0.00  0.00  0.00  0.00  0.00        0.10    0.42    0.00
10 vs. 10  0.00  0.00  0.00  0.00  0.00        0.02    0.12    0.00

σ = 0.5
2 vs. 10   0.04  0.02  0.42  0.84  0.60        0.00    0.00    0.10
5 vs. 20   0.16  0.08  0.58  0.82  0.72        0.02    0.00    0.08
10 vs. 20  1.00  0.98  0.80  0.92  0.82        0.06    0.00    0.62
2 vs. 2    0.00  0.00  0.00  0.02  0.00        0.80    0.56    0.00
5 vs. 5    0.00  0.00  0.00  0.02  0.00        0.92    0.58    0.00
10 vs. 10  0.00  0.00  0.00  0.02  0.00        0.98    0.84    0.00

σ = 1
2 vs. 10   0.12  0.14  0.92  0.90  0.68        0.02    0.00    0.24
5 vs. 20   0.74  0.62  0.92  0.94  0.84        0.06    0.04    0.56
10 vs. 20  1.00  1.00  1.00  0.94  0.96        0.06    0.38    0.86
2 vs. 2    0.00  0.00  0.02  0.02  0.08        0.88    0.40    0.00
5 vs. 5    0.00  0.00  0.00  0.06  0.00        0.92    0.52    0.00
10 vs. 10  0.00  0.00  0.02  0.06  0.00        0.92    0.74    0.00

We also compared against other topological distances such as the bottleneck, Gromov-Hausdorff (GH) and Kolmogorov-Smirnov (KS) distances [Chazal et al., 2009, Chung et al., 2017a, Cohen-Steiner et al., 2007]. The bottleneck and GH-distances are two widely used baseline distances in persistent homology, often used on persistence diagrams and dendrograms [Carlsson and Memoli, 2010] and brain networks [Lee et al., 2012]. The KS-distance based on β0 and β1 curves was later introduced as a more intuitive alternative that gives results that are easier to interpret [Chung et al., 2013, 2019b]. For the KS-distance, we computed its probability distribution exactly without the permutation test. For all other distances, the permutation test was used on the two-sample t-statistic.

Network difference. We compared networks with different numbers of modules: 2 vs. 10, 5 vs. 20 and 10 vs. 20. Since the networks had different topological structures, the distances were expected to detect the differences (Figure 7). The simulations were independently performed 50 times, and the performance results are given in terms of the false negative rate, computed as the fraction of the 50 simulations that gave p-values above 0.05 (Table 1). In general, the topological loss performed very well in the small noise setting while other distances were less sensitive. When the topological difference is obvious, as for σ = 0.1, the proposed method performed exceptionally well. However, in the large noise settings (σ = 0.5, 1), all the distances including the topological loss did not perform well. Even though the KS-distance on cycles performed well when σ = 1, it is known to be overly sensitive and often produces false positives in small noise settings such as σ = 0.1, as shown in the next simulation.

No network difference. We compared networks with the same numbers of modules: 2 vs. 2, 5 vs. 5 and 10 vs. 10, which should give networks of similar topology in each group. It was expected that the networks were not topologically different and we should not detect network differences. The simulations were independently performed 50 times, and the performance results are given in terms of the false positive rate, computed as the fraction of the 50 simulations that gave p-values below 0.05 (Table 1). While all the distances performed well when there was no network difference, the Euclidean losses, bottleneck and GH-distances had a tendency to produce false negatives when there was a network difference. On the other hand, the KS-distance tended to produce false positives when there was no network difference, notably in the small noise setting (σ = 0.1); however, it was able to discriminate topological signals in the large noise settings (σ = 0.5, 1). Overall, the proposed topological loss performed well.

The bottleneck distance is often used in stability statements in persistent homology. However, it is not necessarily better than other topological distances. The bottleneck distance is a lower bound on the GH-distance, which often performs better in two-sample comparisons and clustering applications [Chazal et al., 2009, Lee et al., 2012]. The proposed topological loss based on the Wasserstein distance performed better than the bottleneck and GH-distances.


Figure 8: Study 2 simulation examples. The network modular structure varies as the parameters p (probability of connection within modules) and c (the number of modules) change. The modular structure becomes more pronounced as p increases. The color bar displays edge weights. An edge weight within the same module follows the normal distribution N(µ, σ²) with probability p and Gaussian noise N(0, σ²) with probability 1 − p. On the other hand, edge weights connecting nodes between different modules have probability 1 − p of being N(µ, σ²) and probability p of being N(0, σ²).

Study 2: Comparison against graph matching

The aim of this simulation is to evaluate the performance of the proposed topological matching process against existing graph matching algorithms [Cho et al., 2010, Gold and Rangarajan, 1996, Leordeanu and Hebert, 2005, Leordeanu et al., 2009, Zhou and De la Torre, 2013] in differentiating networks of different topology. Graph matching algorithms are considered the baseline for establishing correspondence between graphs, and matching is usually done by penalizing network structures that cannot be exactly matched. Let G1 = (V1, w1) and G2 = (V2, w2) be two networks. For a graph matching problem modified for our brain network setting, where the edge weights are not binary but weighted, we need to find a mapping τgm between nodes i1, j1 ∈ V1 and i2, j2 ∈ V2 that best preserves edge attributes between edge weights $w^1_{i_1 j_1} \in w^1$ and $w^2_{i_2 j_2} \in w^2$. In other words, we seek τgm that maximizes the graph matching cost

$$J(\tau_{gm}) = \sum_{w^1_{i_1 j_1},\, w^2_{i_2 j_2}} f\big(w^1_{i_1 j_1}, \tau_{gm}(w^2_{i_2 j_2})\big),$$

where f measures the similarity between edge attributes and the summation is taken over all possible edge weights. The matching cost J(τgm) quantifies the similarity between two networks by taking large values for similar networks and values close to zero for dissimilar networks; hence J(τgm) is roughly the inverse of a distance metric. We compared the proposed topological loss against four well-known graph matching algorithms: graduated assignment (GA) [Gold and Rangarajan, 1996], spectral matching (SM) [Leordeanu and Hebert, 2005], the integer projected fixed point method (IPFP) [Leordeanu et al., 2009] and re-weighted random walk matching (RRWM) [Cho et al., 2010]. Such graph matching methods are widely used as baseline algorithms in medical imaging, computer vision and machine learning studies [Cour et al., 2006, Tian et al., 2012, Wang et al., 2020, Yu et al., 2018, Zhang et al., 2019b, Zhou and De la Torre, 2013]. For all the baseline methods, we used the existing implementation codes from the authors' repository websites listed in the publications. We also used the parameters recommended in the public code for each baseline algorithm without modification. Since we are dealing with weighted edges, graph matching algorithms based on binary edge weights are excluded from the study [Babai and Luks, 1983, Guo and Srivastava, 2020, Zavlanos and Pappas, 2008].

In Study 2, a different random network model from Study 1 is used. We simulate a random modular network X with d nodes and c modules, where the nodes are evenly distributed among the modules. Figure 8 displays modular networks with d = 24 nodes and c = 2, 3, 6 modules such that we have d/c = 12, 8, 4 nodes in each module, respectively. Since the time complexity of the aforementioned graph matching algorithms can be very demanding (Figure 9), we considered d = 12, 18, 24 and c = 2, 3, 6 in this simulation. Each edge connecting two nodes within the same module was assigned a random weight following the normal distribution N(µ, σ²) with probability p or otherwise Gaussian noise N(0, σ²) with probability 1 − p. On the other hand, edge weights connecting nodes between different modules had probability 1 − p of being N(µ, σ²) and probability p of being N(0, σ²). With a larger value of the within-module probability p, we have a more pronounced modular structure. Any negative edge weights were set to zero. This gives a random network X that exhibits topological structures of connectedness.


Figure 9: Study 2 run-time given on a logarithmic scale. The average run-time measures the amount of time each algorithm takes to compute its matching cost between two modular networks of size d, starting from the edge weights as the given input. The run-time performance of the baseline methods is consistent with Cour et al. [2006] for GA and SM, and Zhang et al. [2019b] for IPFP and RRWM.

Figure 8 illustrates the changes of the network modular structure as the parameters p and c vary. We used µ = 1 and σ = 0.25 throughout Study 2.

Based on the statistical model above, we simulated two groups of random modular networks X1, · · · , Xm and Y1, · · · , Yn. If there is a group difference, the topological loss is expected to be relatively small within groups and relatively large between groups. The average topological loss within groups, given by

$$L_W = \frac{\sum_{i<j} L_{top}(X_i, X_j) + \sum_{i<j} L_{top}(Y_i, Y_j)}{\binom{m}{2} + \binom{n}{2}},$$

is expected to be smaller than the average topological loss between groups, given by

$$L_B = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} L_{top}(X_i, Y_j)}{mn}.$$

We measure the disparity between groups as the ratio statistic

$$\phi_L = L_B / L_W.$$

If φL is large, the groups differ significantly in network topology. On the other hand, if φL is small, it is likely that there is no group difference.
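Given a precomputed matrix of pairwise topological losses, the ratio statistic takes only a few lines. A minimal sketch (the function name and the convention that the first m indices form group X are ours):

```python
import numpy as np

def ratio_statistic(L, m):
    """phi_L = L_B / L_W from an (m+n) x (m+n) matrix L of pairwise
    topological losses, where the first m rows/columns are group X."""
    n = L.shape[0] - m
    within_X = L[:m, :m][np.triu_indices(m, 1)].sum()
    within_Y = L[m:, m:][np.triu_indices(n, 1)].sum()
    between = L[:m, m:].sum()
    LW = (within_X + within_Y) / (m * (m - 1) / 2 + n * (n - 1) / 2)
    LB = between / (m * n)
    return LB / LW
```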


Figure 10: Empirical distributions in Study 2. The empirical distributions of the ratio statistic were generated by permutation on two groups, each consisting of 10 modular networks. Here we tested if there is a group difference in networks with varying parameter c = 2 vs. 3 (left) and 3 vs. 3 (right). As expected, the test based on the topological loss rejected the null hypothesis when there was a group difference.

Similarly, we define the ratio statistic for the graph matching cost J as

$$\phi_J = J_W / J_B,$$

where JW is the average graph matching cost within groups and JB is the average graph matching cost between groups. Since the distributions of the ratio statistics φL and φJ are unknown, the permutation test is used to determine the empirical distributions. Figure 10 displays the empirical distribution of φL. By comparing the observed group ratio φL to this empirical distribution, we can determine the statistical significance of the group difference. However, when the sample size is large, existing matching algorithms are too slow for the permutation test. So we adopted a scalable computation strategy as follows [Chung et al., 2019c].

Given two groups of networks, the topological loss (or graph matching cost) for every pair of networks needs to be computed only once, which can then be arranged into a matrix whose rows and columns are networks and whose ij-th entry is the loss between the two networks corresponding to row i and column j (Figure 11). Once we obtain such a matrix, the permutation process is equivalent to rearranging rows and columns based on the permuted group labels. There are $\frac{1}{2}\binom{m+n}{m}$ total permutations excluding the symmetry of the loss functions. Computing the ratio statistic over a permutation requires re-summing over all such losses, which is time consuming. Instead, we can perform the transposition procedure of swapping only one network per group and setting up an iteration of how the ratio statistic changes over the transposition [Chung et al., 2019c].


Let X = (X1, · · · , Xm) and Y = (Y1, · · · , Yn) be the two groups of networks. We transpose the k-th and l-th networks between the groups as

$$\pi_{kl}(X) = (X_1, \cdots, X_{k-1}, Y_l, X_{k+1}, \cdots, X_m),$$

$$\pi_{kl}(Y) = (Y_1, \cdots, Y_{l-1}, X_k, Y_{l+1}, \cdots, Y_n).$$

Over the transposition πkl, the ratio statistic changes from φL(X, Y) to φL(πkl(X), πkl(Y)), which involves the following functions:

$$\nu(X, Y) = \sum_{i<j} L_{top}(X_i, X_j) + \sum_{i<j} L_{top}(Y_i, Y_j),$$

$$\omega(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} L_{top}(X_i, Y_j),$$

where ν is the total sum of within-group losses and ω is the total sum of between-group losses. We then determine how ν and ω change over the transposition πkl. As Xk and Yl are swapped, the function ν is updated over the transposition πkl as (Figure 11)

$$\nu\big(\pi_{kl}(X), \pi_{kl}(Y)\big) = \nu(X, Y) + \delta(X, Y)$$

with

$$\delta(X, Y) = \Big(\sum_{i \neq k} L_{top}(Y_l, X_i) - \sum_{i \neq k} L_{top}(X_k, X_i)\Big) + \Big(\sum_{i \neq l} L_{top}(X_k, Y_i) - \sum_{i \neq l} L_{top}(Y_l, Y_i)\Big). \qquad (7)$$

Similarly, the function ω is updated iteratively over the transposition πkl as

$$\omega\big(\pi_{kl}(X), \pi_{kl}(Y)\big) = \omega(X, Y) - \delta(X, Y).$$

The ratio statistic over the transposition is then computed as

$$\phi_L\big(\pi_{kl}(X), \pi_{kl}(Y)\big) = \frac{\omega\big(\pi_{kl}(X), \pi_{kl}(Y)\big)}{\nu\big(\pi_{kl}(X), \pi_{kl}(Y)\big)} \times \frac{\binom{m}{2} + \binom{n}{2}}{mn}.$$

For each transposition, we store the function values ν and ω and update them sequentially. Each transposition requires manipulating 2(m + n − 2) terms as opposed to the $\binom{m+n}{2}$ total terms over a random permutation. More transpositions than permutations can be generated in the same amount of run-time, which speeds up the convergence of the transposition procedure [Chung et al., 2019c].
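The sequential update takes only a few lines once the pairwise loss matrix is precomputed. A minimal sketch, assuming a loss matrix L whose first m indices start in group X (the function name is ours; the random permutation intermixing described below is omitted for brevity):

```python
import numpy as np

def transposition_test(L, m, n_steps=50000, seed=0):
    """Empirical null distribution of phi_L by sequential transpositions.
    L is the (m+n) x (m+n) matrix of pairwise topological losses; the
    first m indices start in group X. Each swap updates the within-group
    sum nu and the between-group sum omega through delta of equation (7),
    so one step costs O(m + n) instead of a full re-summation."""
    rng = np.random.default_rng(seed)
    N = L.shape[0]
    n = N - m
    X, Y = list(range(m)), list(range(m, N))
    nu = sum(L[i, j] for g in (X, Y) for i in g for j in g if i < j)
    omega = sum(L[i, j] for i in X for j in Y)
    scale = (m * (m - 1) / 2 + n * (n - 1) / 2) / (m * n)
    phis = []
    for _ in range(n_steps):
        k, l = rng.integers(m), rng.integers(n)
        xk, yl = X[k], Y[l]
        delta = (sum(L[yl, X[i]] - L[xk, X[i]] for i in range(m) if i != k)
                 + sum(L[xk, Y[i]] - L[yl, Y[i]] for i in range(n) if i != l))
        nu += delta                        # updated within-group sum
        omega -= delta                     # updated between-group sum
        X[k], Y[l] = yl, xk                # apply the transposition
        phis.append(omega / nu * scale)
    return np.array(phis)
```

The p-value is then the proportion of the sampled null values exceeding the observed φL; intermixing a full random permutation every few hundred transpositions, as described below, further reduces bias.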


Figure 11: Study 2. Two groups, each consisting of 5 modular networks, simulated with parameters c = 2 vs. 3, resulting in small topological losses within groups and large topological losses between groups. The color represents the topological loss Ltop = L0D + L1D that combines the 0D and 1D topological losses. Left: the matrix whose ij-th entry represents the loss between networks i and j. The main diagonal consists of zeros since the topological loss between two identical networks is zero. Right: a transposition between the 2nd network in Group 1 and the 8th network in Group 2. We do not need to recompute all the pairwise losses; we just rearrange them. In particular, only the pairwise losses in the solid lines and dashed lines are rearranged. Thus, we simply need to figure out how the rearrangement of entries changes the ratio statistic in an iterative manner. This enables us to easily perform the permutation test in a scalable fashion. We compute δ in equation (7) by subtracting the sum of the entries within the solid lines from the sum of the entries within the dashed lines.


Figure 12: The transposition test is applied in determining the statistical significance of two groups, each consisting of 10 simulated networks. To further speed up the convergence rate, a random permutation is intermixed for every sequence of 500 transpositions. The left panel shows the convergence of the p-value over 50000 transpositions. For comparison, the ground truth p-value is computed from the exact permutation test by enumerating every possible permutation exactly once. The right panel shows the average relative error against the ground truth across 100 independent simulations.

To further accelerate the rate of convergence and avoid possible bias, we introduce a full permutation into the sequence of transpositions. Figure 12 illustrates the convergence of the transposition procedure.

In each simulation, we generated two groups, each with 10 random modular networks. We then sequentially computed 200000 random transpositions while interjecting a random permutation for every 500 transpositions and obtained the p-value. This guarantees the convergence of the p-value within 2 decimal places (within 0.01) on average. The simulations were independently performed 50 times and the average p-value across the 50 simulations was reported.

Network difference. We compared two groups of networks generated by varying the parameter c = 2 vs. 3, 2 vs. 6 and 3 vs. 6, each with d = 12, 18, 24 nodes and p = 0.6, 0.8 probability of connection within modules. Since each group exhibited a different modular structure, the topological loss and graph matching costs were expected to detect the group difference. Table 2 summarizes the performance results. Networks with d = 12 nodes might be too small to extract the topologically distinct features used in each algorithm. Thus, all graph matching costs performed poorly while the topological loss performed reasonably well. As the number of nodes increases, all methods show overall performance improvement. In all settings, the topological loss significantly outperforms the graph matching algorithms.

No network difference. We compared networks generated with the same


parameter values c = 2 vs. 2, 3 vs. 3 and 6 vs. 6, each with d = 12, 18, 24 nodes and p = 0.6, 0.8 probability of connection within modules. Since no topological difference was expected between networks generated using the same values of the parameters c, d and p, the topological loss and graph matching costs should not detect a group difference. The performance results are summarized in Table 3. In all settings, all methods performed well when there was no group difference.

The baseline graph matching methods have low sensitivity to topological differences such as connected components and cycles in networks. They were unable to detect the network differences, especially subtle topological differences. While it might be possible to extend a graph matching algorithm to encode higher-order geometric relations, a small increment in the order of relations usually results in a combinatorial explosion of the amount of data needed to fit the model [Zhou and De la Torre, 2013]. Thus, most high-order graph matching methods are limited to very sparse networks such as binary trees. They are not practical for dense functional brain networks with far larger numbers of cycles. The proposed topological loss, on the other hand, is able to detect such subtle topological pattern differences with a minimal amount of run-time.

4 Application

Dataset and preprocessing

The dataset is the resting-state fMRI of 412 subjects collected as part of the Human Connectome Project (HCP) twin study [Van Essen et al., 2012, 2013]. The fMRI were collected over 14 minutes and 33 seconds using a gradient-echo-planar imaging (EPI) sequence with multiband factor 8, repetition time (TR) 720 ms, echo time (TE) 33.1 ms, flip angle 52◦, 104 × 90 (RO × PE) matrix size, 72 slices, 2 mm isotropic voxels, and 1200 time points. Subjects without fMRI or the full 1200 time points were excluded. Additional details on the imaging protocol are given at https://protocols.humanconnectome.org/HCP/3T/imaging-protocols.html. During each scanning, participants were at rest with eyes open, with relaxed fixation on a projected bright cross-hair on a dark background [Van Essen et al., 2013]. The standard minimal preprocessing pipelines [Glasser et al., 2013] were applied to the fMRI scans, including spatial distortion removal [Andersson et al., 2003, Jovicich et al., 2006], motion correction [Jenkinson and Smith, 2001, Jenkinson et al., 2002], bias field reduction [Glasser and Van Essen, 2011], registration to the structural MNI template, and data masking using the brain mask obtained from FreeSurfer [Glasser et al., 2013].


Table 2: Study 2. Network difference. The performance results are summarized as average p-values for various parameter settings of d (number of nodes), c (number of modules) and p (within-module probability).

d          c        p    GA           SM           RRWM         IPFP         Ltop
12 vs. 12  2 vs. 3  0.6  0.45 ± 0.27  0.48 ± 0.30  0.28 ± 0.31  0.34 ± 0.28  0.08 ± 0.16
                    0.8  0.26 ± 0.24  0.30 ± 0.28  0.06 ± 0.12  0.28 ± 0.28  0.01 ± 0.03
           2 vs. 6  0.6  0.06 ± 0.10  0.17 ± 0.20  0.04 ± 0.13  0.23 ± 0.28  0.00 ± 0.00
                    0.8  0.00 ± 0.01  0.01 ± 0.03  0.00 ± 0.00  0.02 ± 0.04  0.00 ± 0.00
           3 vs. 6  0.6  0.40 ± 0.29  0.35 ± 0.28  0.24 ± 0.26  0.35 ± 0.28  0.06 ± 0.13
                    0.8  0.21 ± 0.23  0.28 ± 0.27  0.08 ± 0.14  0.26 ± 0.25  0.00 ± 0.01
18 vs. 18  2 vs. 3  0.6  0.25 ± 0.23  0.41 ± 0.26  0.26 ± 0.24  0.42 ± 0.28  0.01 ± 0.02
                    0.8  0.12 ± 0.17  0.19 ± 0.22  0.00 ± 0.00  0.04 ± 0.05  0.00 ± 0.00
           2 vs. 6  0.6  0.02 ± 0.05  0.07 ± 0.17  0.00 ± 0.00  0.14 ± 0.20  0.00 ± 0.00
                    0.8  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00
           3 vs. 6  0.6  0.28 ± 0.24  0.37 ± 0.31  0.21 ± 0.24  0.37 ± 0.30  0.01 ± 0.01
                    0.8  0.15 ± 0.22  0.13 ± 0.14  0.00 ± 0.01  0.16 ± 0.18  0.00 ± 0.00
24 vs. 24  2 vs. 3  0.6  0.23 ± 0.25  0.30 ± 0.26  0.14 ± 0.20  0.31 ± 0.28  0.00 ± 0.01
                    0.8  0.06 ± 0.11  0.12 ± 0.19  0.00 ± 0.00  0.01 ± 0.05  0.00 ± 0.00
           2 vs. 6  0.6  0.00 ± 0.01  0.03 ± 0.06  0.00 ± 0.00  0.09 ± 0.13  0.00 ± 0.00
                    0.8  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00  0.00 ± 0.00
           3 vs. 6  0.6  0.24 ± 0.26  0.29 ± 0.28  0.10 ± 0.13  0.37 ± 0.26  0.00 ± 0.00
                    0.8  0.07 ± 0.12  0.13 ± 0.19  0.00 ± 0.01  0.12 ± 0.19  0.00 ± 0.00

This resulted in resting-state functional time series with 91 × 109 × 91 2-mm isotropic voxels at 1200 time points. The subjects ranged from 22 to 36 years in age, with average age 29.24 ± 3.39 years. There are 172 males and 240 females. Among them, there are 131 MZ-twin pairs and 75 same-sex DZ-twin pairs.

Subsequently, we employed the Automated Anatomical Labeling (AAL) template to parcellate the brain volume into 116 non-overlapping anatomical regions [Tzourio-Mazoyer et al., 2002] (Figure 1). We averaged the fMRI across voxels within each brain parcellation, resulting in 116 average fMRI time series with 1200 time points for each subject. Previous studies reported that head movement produces spatial artifacts in functional connectivity [Power et al., 2012, Van Dijk et al., 2012, Satterthwaite et al., 2012, Caballero-Gaudes and Reynolds, 2017]. Thus, we scrubbed the data to remove fMRI volumes with significant head motion [Power et al., 2012, Huang et al., 2020b].
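The regional averaging step can be sketched as follows, assuming the preprocessed fMRI is available as a 4D array and the AAL parcellation as a 3D label volume on the same voxel grid; the variable names are illustrative only.

    import numpy as np

    def parcel_average(fmri, atlas):
        """Average a 4D fMRI array (x, y, z, t) within each atlas parcel.

        `atlas` is a 3D integer label volume on the same voxel grid, with
        0 = background and labels 1..R for the R parcels (R = 116 for AAL).
        Returns an (R, t) array of mean regional time series."""
        labels = np.unique(atlas)
        labels = labels[labels != 0]               # drop background
        series = np.empty((labels.size, fmri.shape[-1]))
        for k, lab in enumerate(labels):
            mask = atlas == lab
            series[k] = fmri[mask].mean(axis=0)    # mean over voxels in parcel
        return series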


Table 3: Study 2. No network difference. The performance results are summarized as average p-values for various parameter settings of d (number of nodes), c (number of modules) and p (within-module probability).

d          c        p    GA           SM           RRWM         IPFP         Ltop
12 vs. 12  2 vs. 2  0.6  0.49 ± 0.27  0.46 ± 0.30  0.51 ± 0.30  0.47 ± 0.28  0.53 ± 0.29
                    0.8  0.45 ± 0.25  0.47 ± 0.31  0.56 ± 0.29  0.47 ± 0.30  0.50 ± 0.30
           3 vs. 3  0.6  0.45 ± 0.32  0.44 ± 0.26  0.47 ± 0.27  0.51 ± 0.30  0.46 ± 0.31
                    0.8  0.54 ± 0.31  0.51 ± 0.27  0.51 ± 0.29  0.52 ± 0.29  0.51 ± 0.30
           6 vs. 6  0.6  0.57 ± 0.30  0.51 ± 0.28  0.56 ± 0.29  0.45 ± 0.26  0.58 ± 0.29
                    0.8  0.55 ± 0.29  0.48 ± 0.26  0.52 ± 0.27  0.54 ± 0.30  0.49 ± 0.27
18 vs. 18  2 vs. 2  0.6  0.48 ± 0.26  0.49 ± 0.32  0.54 ± 0.29  0.47 ± 0.30  0.54 ± 0.31
                    0.8  0.52 ± 0.28  0.50 ± 0.28  0.46 ± 0.30  0.52 ± 0.25  0.50 ± 0.26
           3 vs. 3  0.6  0.49 ± 0.28  0.58 ± 0.31  0.43 ± 0.28  0.51 ± 0.27  0.53 ± 0.30
                    0.8  0.46 ± 0.30  0.51 ± 0.27  0.52 ± 0.33  0.45 ± 0.29  0.53 ± 0.27
           6 vs. 6  0.6  0.53 ± 0.28  0.48 ± 0.30  0.51 ± 0.30  0.45 ± 0.29  0.44 ± 0.33
                    0.8  0.54 ± 0.27  0.52 ± 0.30  0.48 ± 0.26  0.52 ± 0.31  0.43 ± 0.30
24 vs. 24  2 vs. 2  0.6  0.52 ± 0.28  0.49 ± 0.30  0.50 ± 0.30  0.48 ± 0.28  0.55 ± 0.26
                    0.8  0.53 ± 0.27  0.56 ± 0.30  0.51 ± 0.30  0.56 ± 0.32  0.52 ± 0.30
           3 vs. 3  0.6  0.48 ± 0.29  0.54 ± 0.27  0.49 ± 0.26  0.49 ± 0.30  0.52 ± 0.30
                    0.8  0.55 ± 0.29  0.49 ± 0.27  0.52 ± 0.28  0.49 ± 0.30  0.47 ± 0.26
           6 vs. 6  0.6  0.47 ± 0.30  0.45 ± 0.31  0.51 ± 0.29  0.56 ± 0.28  0.49 ± 0.29
                    0.8  0.51 ± 0.30  0.47 ± 0.28  0.54 ± 0.28  0.56 ± 0.31  0.51 ± 0.31

We calculated the framewise displacement (FD) from the three translational displacements and three rotational displacements at each time point to measure the head movement from one volume to the next. The volumes with FD larger than 0.5 mm and their neighbors (one back and two forward time points) were scrubbed [Van Dijk et al., 2012, Power et al., 2012, Huang et al., 2020b]. We excluded 12 subjects with excessive head movement, in whom more than one third of the 1200 volumes were scrubbed, resulting in fMRI data of 400 subjects (168 males and 232 females). Among the remaining 400 subjects, there are 124 monozygotic (MZ) twin pairs and 70 same-sex dizygotic (DZ) twin pairs. The first 20 time points were removed from all subjects to avoid artifacts in the fMRI data, leaving 1180 time points per subject [Diedrichsen and Shadmehr, 2005, Shah et al., 2016].
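A sketch of this scrubbing rule follows. It assumes the six motion parameters per time point are given, with rotational displacements already converted to millimeters (a convention not detailed in the text); the function names are illustrative.

    import numpy as np

    def framewise_displacement(motion):
        """FD from a (t, 6) array of 3 translations (mm) and 3 rotations,
        with rotations already expressed in mm of arc displacement."""
        diffs = np.abs(np.diff(motion, axis=0))
        return np.concatenate([[0.0], diffs.sum(axis=1)])  # FD at t=0 is 0

    def scrub_mask(fd, threshold=0.5, back=1, forward=2):
        """Boolean mask of volumes to KEEP: drop any volume with FD above
        `threshold` together with one preceding and two following volumes."""
        keep = np.ones(fd.shape, dtype=bool)
        for t in np.flatnonzero(fd > threshold):
            keep[max(t - back, 0):t + forward + 1] = False
        return keep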

For dMRI, the white matter fiber orientation information was extracted by multi-shell, multi-tissue constrained spherical deconvolution from different tissue types such as white matter and gray matter [Callaghan et al., 1988, Jeurissen et al., 2014]. The fiber orientation distribution functions were estimated, and apparent fiber densities were exploited to produce reliable white and gray matter volume maps [Jeurissen et al., 2014, Christiaens et al., 2015]. Subsequently, multiple random seeds were selected in each voxel to generate about 10 million initial streamlines per subject, with the maximum fiber tract length at 250 mm and FA larger than 0.06, using MRtrix3 (http://www.mrtrix.org) [Tournier et al., 2012, Xie et al., 2018]. The Spherical-Deconvolution Informed Filtering of Tractograms (SIFT2) technique, which makes use of complete streamlines, was subsequently applied to generate more biologically accurate brain connectivity, resulting in about 1 million tracts per subject [Smith et al., 2015]. Nonlinear diffeomorphic registration of the subject images to the template was performed using ANTS [Avants et al., 2008, 2011]. AAL was used to parcellate the brain into 116 regions [Tzourio-Mazoyer et al., 2002]. The subject-level connectivity matrices were constructed by counting the number of tracts connecting brain regions. The structural brain network P, which serves as the template to which all the functional networks are aligned, is obtained by computing the one-sample t-statistic map over all the subjects and rescaling the t-statistics to the range (0, 2) by applying the hyperbolic tangent function tanh and adding 1.
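In code, the template construction amounts to a one-sample t-test at every edge followed by the monotone rescaling tanh(t) + 1; a sketch, assuming a stack of subject-level tract-count connectivity matrices, is:

    import numpy as np

    def template_network(conn):
        """Build the template P from a (n_subjects, d, d) stack of structural
        connectivity matrices: one-sample t-statistic per edge, squashed into
        the range (0, 2) via tanh(t) + 1."""
        mean = conn.mean(axis=0)
        sd = conn.std(axis=0, ddof=1)
        se = sd / np.sqrt(conn.shape[0]) + 1e-12   # guard against zero variance
        t = mean / se
        return np.tanh(t) + 1.0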

We illustrate the proposed topological learning method on graphs with brain network data. In the popular brain network modeling framework utilizing functional magnetic resonance images (fMRI), the whole brain is parcellated into d disjoint regions, where d is usually a few hundred [Arslan et al., 2018, Desikan et al., 2006, Eickhoff et al., 2018, Fan et al., 2016, Fornito et al., 2016, 2010, Glasser et al., 2016, Gong et al., 2009, Hagmann et al., 2007, Schaefer et al., 2017, Shattuck et al., 2008, Tzourio-Mazoyer et al., 2002, Zalesky et al., 2010]. Subsequently, functional or structural information is overlaid on top of the parcellation to obtain d × d connectivity matrices that measure the strength of connectivity between brain regions (Figure 1). These disjoint brain regions form nodes in the brain network. Connectivity between brain regions, which defines edges in the brain network, is usually determined by the type of imaging modality [Ombao et al., 2016]. Structural connectivity is obtained through diffusion MRI (dMRI), which can trace the white matter fibers connecting brain regions. The strength of structural connectivity between brain regions is determined by the number of fibers passing through them [Fornito et al., 2016]. The structural brain network is expected to exhibit sparse topology without many loops or cycles (Figure 1) [Chung et al., 2011, Gong et al., 2009, Zhang et al., 2018]. On the other hand, functional connectivity obtained from the resting-state functional MRI (fMRI) is often computed as the Pearson correlation coefficient between brain regions [Bryant et al., 2017, Shappell et al., 2019].


While structural connectivity indicates whether brain regions are physically connected through white matter fibers, functional connectivity can exhibit connections between two regions without direct neuroanatomical connections, through additional intermediate connections [Honey et al., 2007, 2009]. Thus, resting-state functional brain networks are often very dense, with thousands of cycles. Structural and functional brain networks therefore provide topologically different information. Existing graph theory based brain network analyses have shown that there is some common topological profile that is conserved for both structural and functional brain networks [Bullmore and Sporns, 2009]. However, due to the difficulty of integrating both networks in a coherent statistical framework, not much research has been done on integrating such networks at the localized edge level. Many multimodal network researchers focus on comparing summary graph theory features across different networks [Bullmore and Sporns, 2009, Ginestet et al., 2011, Karas et al., 2019]. A few statistical studies have focused on fusing networks derived from both modalities probabilistically, which can easily destroy the aforementioned topological difference of the networks [Kang et al., 2017, Xue et al., 2015]. Thus, there is a need for a new multimodal network model that can easily integrate networks of different topology at the localized connection level.

Learning individual networks

Among the 400 subjects, there are p = 124 monozygotic (MZ) twin pairs and q = 70 same-sex dizygotic (DZ) twin pairs. For subject k, we have resting-state fMRI time series $x = (x_1, x_2, ..., x_{1180})$ for region i and $y = (y_1, y_2, ..., y_{1180})$ for region j with 1180 time points. The correlation $\rho_{ij}^k$ between regions i and j is computed as the Pearson correlation between x and y, as is usually done in the field. This gives the correlation matrix $C_k = (\rho_{ij}^k)$, which is used as the baseline against the proposed method. We then translate and scale the correlation as

$$w_{ij}^k = \sqrt{(1 - \rho_{ij}^k)/2},$$

which is a metric [Chung et al., 2019b]. The subject-level functional brain network is given by $G_k = (V, w^k)$. The t-statistic map P is used as the template structural brain network to which the functional network $G_k$ is matched.
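The conversion from regional time series to the correlation matrix $C_k$ and the metric edge weights $w^k$ takes a few lines; a sketch, assuming a (t, d) array of regional time series for one subject, is:

    import numpy as np

    def functional_network(ts):
        """Pearson correlation matrix C_k and metric edge weights
        w_ij = sqrt((1 - rho_ij) / 2) from a (t, d) array of time series."""
        C = np.corrcoef(ts, rowvar=False)                   # d x d correlations
        W = np.sqrt(np.clip((1.0 - C) / 2.0, 0.0, None))    # metric transform
        np.fill_diagonal(W, 0.0)
        return C, W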

Given $\lambda_k$, we applied topological learning to estimate the subject-level network $\Theta_k(\lambda_k)$ by minimizing the objective (4) using the individual network $G_k$ and the structural network P:

$$\Theta_k(\lambda_k) = \arg\min_{\Theta} \, L_F(\Theta, G_k) + \lambda_k L_{top}(\Theta, P).$$

Θ is initialized to $G_k$. For every subject, we globally used λ = 1.0000, where the average minimum is obtained (Figure 3).
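A minimal sketch of this estimation by gradient descent is given below. It assumes the Frobenius data term $L_F(\Theta, G_k) = \|\Theta - G_k\|_F^2$ and a user-supplied (sub)gradient topo_grad of the topological loss; the paper's closed-form matching gradient is not reproduced here, and the step size and iteration count are illustrative.

    import numpy as np

    def learn_network(G, P, topo_grad, lam=1.0, step=0.01, n_iter=500):
        """Gradient descent on LF(Theta, G) + lam * Ltop(Theta, P), with LF
        the squared Frobenius loss; `topo_grad(Theta, P)` must return a
        (sub)gradient of the topological loss."""
        Theta = G.copy()                         # initialize at the subject network
        for _ in range(n_iter):
            grad = 2.0 * (Theta - G) + lam * topo_grad(Theta, P)
            Theta -= step * grad
            Theta = (Theta + Theta.T) / 2.0      # keep the network symmetric
        return Theta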

Heritability in twins

We investigated which parts of the brain network are genetically heritable. In particular, we investigated whether the estimated network $\Theta_k$ is genetically heritable in twins as follows. At edge ij, let $(a_{1l}, a_{2l})$ be the l-th twin pair among the MZ-twins and $(b_{1l}, b_{2l})$ be the l-th twin pair among the DZ-twins. The MZ-twin and DZ-twin pairs are then represented as

$$a = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ a_{21} & \cdots & a_{2p} \end{pmatrix}, \qquad b = \begin{pmatrix} b_{11} & \cdots & b_{1q} \\ b_{21} & \cdots & b_{2q} \end{pmatrix}.$$

Let $a_r = (a_{r1}, a_{r2}, ..., a_{rp})$ and $b_r = (b_{r1}, b_{r2}, ..., b_{rq})$. Then the MZ-correlation is computed as the Pearson correlation $\gamma_{MZ}(a_1, a_2)$ between $a_1$ and $a_2$, and similarly for the DZ-correlation $\gamma_{DZ}(b_1, b_2)$. In the widely used ACE genetic model, the heritability index (HI) h, which determines the amount of variation caused by genetic factors in a population, is estimated using Falconer's formula [Falconer and Mackay, 1995]. Thus, HI h is given by $h(a, b) = 2(\gamma_{MZ} - \gamma_{DZ})$.
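At a fixed edge, the heritability index follows directly from the two twin correlations; a sketch, assuming (2, p) and (2, q) arrays of edge values for the MZ and DZ pairs, is:

    import numpy as np

    def heritability_index(a, b):
        """Falconer's formula h = 2 * (gamma_MZ - gamma_DZ) at one edge.

        `a` is a (2, p) array of MZ pairs and `b` a (2, q) array of DZ pairs."""
        gamma_mz = np.corrcoef(a[0], a[1])[0, 1]
        gamma_dz = np.corrcoef(b[0], b[1])[0, 1]
        return 2.0 * (gamma_mz - gamma_dz)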

Since the order of the twins is interchangeable, we can transpose the l-th twin pair among the MZ-twins as

$$\pi_l(a_1) = (a_{11}, ..., a_{1,l-1}, a_{2l}, a_{1,l+1}, ..., a_{1p}),$$
$$\pi_l(a_2) = (a_{21}, ..., a_{2,l-1}, a_{1l}, a_{2,l+1}, ..., a_{2p})$$

and obtain another MZ-correlation $\gamma_{MZ}(\pi_l(a_1), \pi_l(a_2))$. Likewise, we obtain many different correlations for the DZ-twins. Similar to the transposition test used in simulation study 2, we perform a sequence of random transpositions iteratively to estimate the twin correlations $\gamma_{MZ}$ and $\gamma_{DZ}$ sequentially as follows [Chung et al., 2019c].

Over the transposition $\pi_l$, the MZ-correlation changes from $\gamma_{MZ}(a_1, a_2)$ to $\gamma_{MZ}(\pi_l(a_1), \pi_l(a_2))$, which involves the following functions:

$$\nu(a_r) = \sum_{j=1}^{p} a_{rj}, \qquad \omega(a_r, a_s) = \sum_{j=1}^{p} \big(a_{rj} - \nu(a_r)/p\big)\big(a_{sj} - \nu(a_s)/p\big).$$

The functions ν and ω are updated iteratively over the transposition $\pi_l$ as

$$\nu(\pi_l(a_r)) = \nu(a_r) - a_{rl} + a_{sl},$$
$$\omega(\pi_l(a_r), \pi_l(a_s)) = \omega(a_r, a_s) + (a_{rl} - a_{sl})^2/p - (a_{rl} - a_{sl})\big(\nu(a_r) - \nu(a_s)\big)/p.$$

The MZ-correlation after the transposition is then calculated as

$$\gamma_{MZ}(\pi_l(a_1), \pi_l(a_2)) = \frac{\omega(\pi_l(a_1), \pi_l(a_2))}{\sqrt{\omega(\pi_l(a_1), \pi_l(a_1))\,\omega(\pi_l(a_2), \pi_l(a_2))}}.$$

The time complexity of computing the correlation iteratively is 33 operations per transposition, which is significantly more efficient than direct correlation computation per permutation. In the numerical implementation, we sequentially perform random transpositions $\pi_{l_1}, \pi_{l_2}, ..., \pi_{l_J}$, which result in J different twin correlations. Let

$$\kappa_1 = \pi_{l_1}, \quad \kappa_2 = \pi_{l_2} \circ \pi_{l_1}, \quad \cdots, \quad \kappa_J = \pi_{l_J} \circ \cdots \circ \pi_{l_2} \circ \pi_{l_1}.$$

The average MZ-correlation $\gamma^{MZ}_J$ of the J correlations is given by

$$\gamma^{MZ}_J = \frac{1}{J} \sum_{j=1}^{J} \gamma_{MZ}(\kappa_j(a_1), \kappa_j(a_2)),$$

which is iteratively updated as

$$\gamma^{MZ}_J = \frac{J-1}{J}\,\gamma^{MZ}_{J-1} + \frac{1}{J}\,\gamma_{MZ}(\kappa_J(a_1), \kappa_J(a_2)).$$

The average correlation $\gamma^{MZ}_J$ converges to the true underlying twin correlation $\gamma_{MZ}$ for sufficiently large J. The DZ-correlation $\gamma_{DZ}$ is estimated similarly.
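The estimation can be sketched as below. For clarity, the correlation is recomputed at every step with np.corrcoef; the paper instead applies the constant-time updates of ν and ω given above. The function name and defaults are illustrative.

    import numpy as np

    def twin_correlation(a, n_transpositions=50000, seed=0):
        """Estimate the underlying twin correlation at one edge by a sequence
        of random transpositions (swapping the two subjects of one random
        pair) and a running average of the resulting correlations.

        `a` is a (2, p) array of twin pairs at one edge."""
        rng = np.random.default_rng(seed)
        a = a.copy()
        p = a.shape[1]
        avg = 0.0
        for J in range(1, n_transpositions + 1):
            l = rng.integers(p)
            a[0, l], a[1, l] = a[1, l], a[0, l]      # transposition pi_l
            gamma = np.corrcoef(a[0], a[1])[0, 1]
            avg = (J - 1) / J * avg + gamma / J      # iterative running average
        return avg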


Results

Using the transposition method, we randomly transposed a twin pair and updated the correlation 50000 times. This process was repeated 100 times, and the total of 50000 × 100 correlations was used in estimating the underlying MZ- and DZ-correlations. At every edge, the standard deviation of the average correlations across the 100 runs was smaller than 0.01, which guarantees the convergence of the estimate within two decimal places on average.

We computed the HI-maps using the original correlation matrix $C_k$ and the proposed topologically learned network $\Theta_k$. Figure 13 displays far more connections with 100% heritability for the topologically learned network $\Theta_k$ compared to the original Pearson correlation matrix $C_k$. As also demonstrated in the simulation studies, the topological approach boosts subtle topological signals. The networks $\Theta_k$ are expected to inherit topological sparsity from the template structural brain network P, which has sparse topology with fewer cycles (Figure 5). This suggests that noisy, short-lived cycles were removed from the functional networks, improving the statistical sensitivity. Comparing the HI-maps from the two methods, there are overlaps, but the topological approach detects more connections with higher heritability. For the network $\Theta_k$, the connection between the left superior parietal lobule and the left amygdala shows the strongest heritability among many other connections (Table 4). There are significant overlaps in the detected connections between the standard method and the topologically learned network. Since it is difficult to visualize dense complete graphs in 3D, only edges with HI value above 0.85 are shown in Figure 13-a. Thus, if correlation values change slightly to below or above 0.85, connections may appear to vanish or emerge. However, both methods result in complete graphs, whose connections differ only in edge weights.

5 Conclusion

We have presented a new topological loss that provides the optimal matching and alignment at the edge level. Unlike many existing graph matching algorithms that simply provide the matching cost without explicitly identifying how edges are matched to each other [Babai and Luks, 1983, Guo and Srivastava, 2020, Tian et al., 2012, Wang et al., 2020, Yu et al., 2018, Zhang et al., 2019b, Zhou and De la Torre, 2013], the proposed method identifies the edge-to-edge correspondence explicitly. Such a mapping enables us to develop the subsequent topological learning framework that can analyze


Figure 13: (a) The HI-maps for the original Pearson correlation matrix (top) and the topologically learned network (bottom), thresholded at HI ≥ 0.85, 0.9, 0.95 and 1. (b) The most highly heritable connections with HI ≥ 1 using the original Pearson correlation matrix (top) and the topologically learned network (bottom).


Table 4: Top ten most heritable connections with HI ≥ 1 using the topologically learned network.

Most heritable connections
Left superior parietal lobule – Left amygdala
Left lobule VIIIB of cerebellar hemisphere – Left globus pallidus
Left lobule III of cerebellar hemisphere – Right crus I of cerebellar hemisphere
Left – Right opercular part of inferior frontal gyrus
Left lobule IV, V of cerebellar hemisphere – Left thalamus
Left lobule IX of cerebellar hemisphere – Left lobule VI of cerebellar hemisphere
Right thalamus – Right superior frontal gyrus, dorsolateral
Left middle frontal gyrus, orbital part – Right caudate nucleus
Right crus II of cerebellar hemisphere – Left globus pallidus
Lobule VIII of vermis – Right fusiform gyrus

networks of different sizes and topology. Given the wide availability of various network data, including social networks, computer networks, and artificial networks such as convolutional neural networks, our method can be easily adapted to other network applications where matching of whole networks or subnetworks is needed.

Among many different learning tasks, the method is illustrated with averaging and regression. The method is shown to average networks of different sizes and topology, which is not easy with existing methods. The average is topologically unique but geometrically not unique; we can have geometrically different networks that have identical topology. The method is further used to set up optimization based regression models at the subject and group levels. We believe the proposed method can easily be applied to other types of network regression problems. Unlike existing methods that minimize the geometric distance between a model and data, the proposed method penalizes topological differences explicitly. Thus, the method should work better in topology related learning tasks such as clustering. It is well known that Euclidean-distance based clustering such as k-means does not perform well against more geometric clustering methods such as spectral clustering [Kriegel et al., 2009, Ng et al., 2002]. A new clustering method utilizing the proposed loss function might therefore perform better than k-means or spectral clustering. The distinct clustering pattern observed in Figure 11 demonstrates the feasibility of using the topological distance in place of k-means or spectral clustering. This is left as a future study.

The limitation of the topological loss is its inability to discriminate geometrically different networks that are identical in topology. We can easily obtain topologically identical networks by simply mirror reflecting one of them. Since the human brain network is asymmetric across hemispheres [Toga and Thompson, 2003], it is critical to be able to discriminate such networks. In our application, we introduced the Frobenius loss to geometrically constrain brain networks. We provided one possible approach for combining the topological and geometrical losses for brain network analyses. It is hoped that this paper serves as a springboard for more refined models in the future.

Existing network prediction models usually employ various forms of regression, such as linear models and logistic regression, that incorporate the accumulated effect of features as the sum of predictive variables in correlating prediction scores [Arslan et al., 2018, Eickhoff et al., 2016, Goodfellow et al., 2016, Kong et al., 2019, Rottschy et al., 2012, Zhang et al., 2019a]. Regression models might be reasonable for determining group-level average patterns. However, the underlying network features that matter most, and in what combinations, might be too complex to be discovered by regression based predictive models. It is possible to address these challenges by developing new prediction models with the topological loss, which is well suited to such tasks [Huang et al., 2020a]. This is left as a future study.

We have also applied the method to twin brain imaging data in analyzing functional and structural brain networks together. Our topological learning framework is more sensitive in detecting subtle network genetic signals than the baseline method. In determining the amount of heritability, we used the heritability index, which is twice the difference between the MZ- and DZ-twin correlations. Due to the possibility of swapping subjects within twin pairs, the resulting twin correlations are not unique. This has been considered the biggest weakness of the widely used ACE model in genetics. We remedied the problem by computing enough permutations over twin label swapping through the transposition test. This enables us to perform a network analysis at the edge level even if the network shapes and topologies are different. We believe the transposition test would be useful in various resampling problems beyond twin correlations. This is also left as a future study.


Acknowledgements

We thank Gary Shiu of the University of Wisconsin–Madison for the discussion on stability theorems. We thank Li Shen of the University of Pennsylvania for providing the t-statistic map of structural brain networks, which is used as the structural template in this study. The t-statistic map is reported in Chung et al. [2019c] and Songdechakraiwut et al. [2021]. We also thank Shih-Gu Huang of the National University of Singapore for providing support for fMRI processing. This study is funded by NIH R01 EB022856, EB02875 and NSF MDS-2010778.

References

R. Adler, O. Bobrowski, M. Borman, E. Subag, and S. Weinberger. Persistent homology for random fields and complexes. In Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown, pages 124–143. Institute of Mathematical Statistics, 2010.

R. Adler, S. Agami, and P. Pranav. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity. Proceedings of the National Academy of Sciences, 114:11878–11883, 2017.

J. L. Andersson, S. Skare, and J. Ashburner. How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. NeuroImage, 20(2):870–888, 2003.

S. Arslan, S. Ktena, A. Makropoulos, E. Robinson, D. Rueckert, and S. Parisot. Human brain mapping: A systematic comparison of parcellation methods for the human cerebral cortex. NeuroImage, 170:5–30, 2018.

B. Avants, C. Epstein, M. Grossman, and J. Gee. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12:26–41, 2008.

B. Avants, N. Tustison, G. Song, P. Cook, A. Klein, and J. Gee. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage, 54:2033–2044, 2011.


L. Babai and E. Luks. Canonical labeling of graphs. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 171–183, 1983.

P. Bendich, J. S. Marron, E. Miller, A. Pieloch, and S. Skwerer. Persistent homology analysis of brain artery trees. The Annals of Applied Statistics, 10(1):198, 2016.

G. Blokland, K. McMahon, P. Thompson, N. Martin, G. de Zubicaray, and M. Wright. Heritability of working memory brain activation. The Journal of Neuroscience, 31:10882–10890, 2011.

N. Bonneel, J. Rabin, G. Peyre, and H. Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015.

L. Bottou. Online learning and stochastic approximations. On-Line Learning in Neural Networks, 17(9):142, 1998.

C. Bryant, H. Zhu, M. Ahn, and J. Ibrahim. LCN: a random graph mixture model for community detection in functional brain networks. Statistics and Its Interface, 10(3):369, 2017.

E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186–198, 2009.

C. Caballero-Gaudes and R. Reynolds. Methods for cleaning the BOLD fMRI signal. NeuroImage, 154:128–149, 2017.

P. Callaghan, C. Eccles, and Y. Xia. NMR microscopy of dynamic displacements: k-space and q-space imaging. Journal of Physics E: Scientific Instruments, 21:820, 1988.

G. Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

G. Carlsson and F. Memoli. Characterization, stability and convergence of hierarchical clustering methods. The Journal of Machine Learning Research, 11:1425–1470, 2010.

M. Carriere, M. Cuturi, and S. Oudot. Sliced Wasserstein kernel for persistence diagrams. pages 664–673, 2017.


F. Chazal, D. Cohen-Steiner, L. J. Guibas, F. Memoli, and S. Y. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. In Computer Graphics Forum, volume 28, pages 1393–1403. Wiley Online Library, 2009.

F. Chazal, L. J. Guibas, S. Y. Oudot, and P. Skraba. Persistence-based clustering in Riemannian manifolds. Journal of the ACM (JACM), 60(6):1–38, 2013.

C. Chen, X. Ni, Q. Bai, and Y. Wang. A topological regularizer for classifiers via persistent homology. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2573–2582, 2019.

M.-C. Chiang, K. McMahon, G. de Zubicaray, N. Martin, I. Hickie, A. Toga, M. Wright, and P. Thompson. Genetics of white matter development: a DTI study of 705 twins and their siblings aged 12 to 29. NeuroImage, 54:2308–2317, 2011.

M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In European Conference on Computer Vision, pages 492–505. Springer, 2010.

D. Christiaens, M. Reisert, T. Dhollander, S. Sunaert, P. Suetens, and F. Maes. Global tractography of multi-shell diffusion-weighted imaging data using a multi-tissue model. NeuroImage, 123:89–101, 2015.

M. Chung and H. Ombao. Lattice paths for persistent diagrams. In Interpretability of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data, LNCS 12929, pages 77–86. 2021.

M. Chung, N. Adluru, K. Dalton, A. Alexander, and R. Davidson. Scalable brain network construction on white matter fibers. In Proc. of SPIE, volume 7962, page 79624G, 2011.

M. K. Chung, J. L. Hanson, H. Lee, N. Adluru, A. L. Alexander, R. J. Davidson, and S. D. Pollak. Persistent homological sparse network approach to detecting white matter abnormality in maltreated children: MRI and DTI multimodal study. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 300–307. Springer, 2013.


M. K. Chung, H. Lee, V. Solo, R. J. Davidson, and S. D. Pollak. Topological distances between brain networks. In International Workshop on Connectomics in NeuroImaging, pages 161–170. Springer, 2017a.

M. K. Chung, V. Villalta-Gil, H. Lee, P. J. Rathouz, B. B. Lahey, and D. H. Zald. Exact topological inference for paired brain networks via persistent homology. In International Conference on Information Processing in Medical Imaging, pages 299–310. Springer, 2017b.

M. K. Chung, S.-G. Huang, A. Gritsenko, L. Shen, and H. Lee. Statistical inference on the number of cycles in brain networks. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pages 113–116. IEEE, 2019a.

M. K. Chung, H. Lee, A. DiChristofano, H. Ombao, and V. Solo. Exact topological inference of the resting-state brain networks in twins. Network Neuroscience, 3(3):674–694, 2019b.

M. K. Chung, L. Xie, S.-G. Huang, Y. Wang, J. Yan, and L. Shen. Rapid acceleration of the permutation test via transpositions. In International Workshop on Connectomics in NeuroImaging, pages 42–53. Springer, 2019c.

J. R. Clough, I. Oksuz, N. Byrne, J. A. Schnabel, and A. P. King. Explicit topological priors for deep-learning based image segmentation using persistent homology. In International Conference on Information Processing in Medical Imaging, pages 16–28. Springer, 2019.

D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.

D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko. Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics, 10(2):127–139, 2010.

T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. Advances in Neural Information Processing Systems, 19:313–320, 2006.

L. Crawford, A. Monod, A. X. Chen, S. Mukherjee, and R. Rabadan. Predicting clinical outcomes in glioblastoma: an application of topological and functional data analysis. Journal of the American Statistical Association, 115(531):1139–1150, 2020.


I. Deshpande, Z. Zhang, and A. G. Schwing. Generative modeling using the sliced Wasserstein distance. pages 3483–3491, 2018.

R. Desikan, F. Segonne, B. Fischl, B. Quinn, B. Dickerson, D. Blacker, R. Buckner, A. Dale, R. Maguire, B. Hyman, S. Marilyn, and J. Ronald. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31:968–980, 2006.

J. Diedrichsen and R. Shadmehr. Detecting and adjusting for artifacts in fMRI time series data. NeuroImage, 27:624–634, 2005.

H. Edelsbrunner and J. Harer. Persistent homology – a survey. Contemporary Mathematics, 453:257–282, 2008.

H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 454–463. IEEE, 2000.

J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM (JACM), 19(2):248–264, 1972.

S. Eickhoff, T. Nichols, A. Laird, F. Hoffstaedter, K. Amunts, P. Fox, D. Bzdok, and C. Eickhoff. Behavior, sensitivity, and power of activation likelihood estimation characterized by massive empirical simulation. NeuroImage, 137:70–85, 2016.

S. Eickhoff, B. Yeo, and S. Genon. Imaging-based parcellations of the human brain. Nature Reviews Neuroscience, 19:672–686, 2018.

D. Falconer and T. Mackay. Introduction to Quantitative Genetics, 4th ed. Longman, 1995.

L. Fan, H. Li, J. Zhuo, Y. Zhang, J. Wang, L. Chen, Z. Yang, C. Chu, S. Xie, A. Laird, P. Fox, S. Eickhoff, C. Yu, and T. Jiang. The human Brainnetome atlas: a new brain atlas based on connectional architecture. Cerebral Cortex, 26:3508–3526, 2016.

A. Fornito, A. Zalesky, and E. Bullmore. Network scaling effects in graph analytic studies of human resting-state fMRI data. Frontiers in Systems Neuroscience, 4:1–16, 2010.


A. Fornito, A. Zalesky, and E. Bullmore. Fundamentals of Brain Network Analysis. Academic Press, New York, 2016.

R. Ghrist. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–75, 2008.

C. E. Ginestet, T. E. Nichols, E. T. Bullmore, and A. Simmons. Brain network analysis: separating cost from topology using cost-integration. PLoS One, 6(7):e21570, 2011.

D. Glahn, A. Winkler, P. Kochunov, L. Almasy, R. Duggirala, M. Carless, J. Curran, R. Olvera, A. Laird, S. Smith, C. Beckmann, P. Fox, and J. Blangero. Genetic control over the resting brain. Proceedings of the National Academy of Sciences, 107:1223–1228, 2010.

M. Glasser and D. Van Essen. Mapping human cortical areas in vivo based on myelin content as revealed by T1- and T2-weighted MRI. Journal of Neuroscience, 31:11597–11616, 2011.

M. Glasser, S. Smith, D. Marcus, J. Andersson, E. Auerbach, T. Behrens, T. Coalson, M. Harms, M. Jenkinson, and S. Moeller. The human connectome project's neuroimaging approach. Nature Neuroscience, 19:1175, 2016.

M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, et al. The minimal preprocessing pipelines for the human connectome project. NeuroImage, 80:105–124, 2013.

S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.

G. Gong, Y. He, L. Concha, C. Lebel, D. Gross, A. Evans, and C. Beaulieu. Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cerebral Cortex, 19:524–536, 2009.

I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.


A. Gritsenko, M. Lindquist, and M. Chung. Twin classification in resting-state brain connectivity. IEEE International Symposium on Biomedical Imaging (ISBI), arXiv:1807.00244, 2020.

X. Guo and A. Srivastava. Representations, metrics and statistics for shape analysis of elastic graphs. pages 832–833, 2020.

P. Hagmann, M. Kurant, X. Gigandet, P. Thiran, V. Wedeen, R. Meuli, and J. Thiran. Mapping human whole-brain structural networks with diffusion MRI. PLoS One, 2(7):e597, 2007.

C. J. Honey, R. Kotter, M. Breakspear, and O. Sporns. Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24):10240–10245, 2007.

C. J. Honey, O. Sporns, L. Cammoun, X. Gigandet, J.-P. Thiran, R. Meuli, and P. Hagmann. Predicting human resting-state functional connectivity from structural connectivity. Proceedings of the National Academy of Sciences, 106(6):2035–2040, 2009.

X. Hu, F. Li, D. Samaras, and C. Chen. Topology-preserving deep image segmentation. In Advances in Neural Information Processing Systems, pages 5657–5668, 2019.

S.-G. Huang, M. Chung, A. Qiu, and A. D. N. Initiative. Revisiting convolutional neural network on graphs with polynomial approximations of Laplace-Beltrami spectral filtering. Neural Computing and Applications, arXiv preprint arXiv:2010.13269, in press, 2020a.

S.-G. Huang, S.-T. Samdin, C. Ting, H. Ombao, and M. Chung. Statistical model for dynamically-changing correlation matrices with application to brain connectivity. Journal of Neuroscience Methods, 331:108480, 2020b.

M. Jenkinson and S. Smith. A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2):143–156, 2001.

M. Jenkinson, P. Bannister, M. Brady, and S. Smith. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2):825–841, 2002.


B. Jeurissen, J.-D. Tournier, T. Dhollander, A. Connelly, and J. Sijbers. Multi-tissue constrained spherical deconvolution for improved analysis of multi-shell diffusion MRI data. NeuroImage, 103:411–426, 2014.

J. Jovicich, S. Czanner, D. Greve, E. Haley, A. van Der Kouwe, R. Gollub, D. Kennedy, F. Schmitt, G. Brown, J. MacFall, et al. Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. NeuroImage, 30(2):436–443, 2006.

H. Kang, H. Ombao, C. Fonnesbeck, Z. Ding, and V. Morgan. A Bayesian double fusion model for resting-state brain connectivity using joint functional and structural data. Brain Connectivity, 7:219–227, 2017.

M. Karas, D. Brzyski, M. Dzemidzic, J. Goni, D. A. Kareken, T. W. Randolph, and J. Harezlak. Brain connectivity-informed regularization methods for regression. Statistics in Biosciences, 11(1):47–90, 2019.

T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. 2018.

M. Kerber, D. Morozov, and A. Nigmetov. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA), 22:1–20, 2017.

S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34(4):43–59, 2017.

S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde. Generalized sliced Wasserstein distances. In Advances in Neural Information Processing Systems, pages 261–272, 2019.

R. Kong, J. Gao, Y. Xu, Y. Pan, J. Wang, and J. Liu. Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier. Neurocomputing, 324:63–68, 2019.

H.-P. Kriegel, P. Kroger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1):1–58, 2009.


H. Le and A. Kume. The Frechet mean shape and the shape of the means. Advances in Applied Probability, pages 101–113, 2000.

H. Lee, M. Chung, H. Kang, B.-N. Kim, and D. Lee. Discriminative persistent homology of brain networks. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 841–844, 2011a.

H. Lee, M. Chung, H. Kang, B.-N. Kim, and D. Lee. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. MICCAI, Lecture Notes in Computer Science, 6892:302–309, 2011b.

H. Lee, H. Kang, M. K. Chung, B.-N. Kim, and D. S. Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277, 2012.

M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, volume 2, pages 1482–1489. IEEE, 2005.

M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122, 2009.

A. Liutkus, U. Simsekli, S. Majewski, A. Durmus, and F.-R. Stoter. Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. pages 4104–4113, 2019.

E. Love, B. Filippenko, V. Maroulas, and G. Carlsson. Topological deep learning. arXiv preprint arXiv:2101.05778, 2021.

J. Lv, L. Guo, X. Hu, T. Zhang, K. Li, D. Zhang, J. Yang, and T. Liu. Fiber-centered analysis of brain connectivities using DTI and resting state fMRI data. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 143–150. Springer, 2010.

A. Marchese and V. Maroulas. Signal classification with a point process distance on the space of persistence diagrams. Advances in Data Analysis and Classification, 12(3):657–682, 2018.


D. Marcos, M. Volpi, and D. Tuia. Learning rotation invariant convolutional filters for texture classification. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 2012–2017. IEEE, 2016.

V. Maroulas, F. Nasrin, and C. Oballe. A Bayesian framework for persistent homology. SIAM Journal on Mathematics of Data Science, 2:48–74, 2020.

D. McKay, E. Knowles, A. Winkler, E. Sprooten, P. Kochunov, R. Olvera, J. Curran, J. Kent Jr., M. Carless, H. Goring, T. Dyer, R. Duggirala, L. Almasy, P. Fox, J. Blangero, and D. Glahn. Influence of age, sex and genetic factors on the human brain. Brain Imaging and Behavior, 8:143–152, 2014.

G. Naitzat, A. Zhitnikov, and L.-H. Lim. Topology of deep neural networks. Journal of Machine Learning Research, 21:1–40, 2020.

A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.

H. Ombao, M. Lindquist, W. Thompson, and J. Aston. Handbook of Neuroimaging Data Analysis. CRC Press, 2016.

V. Patrangenaru, P. Bubenik, R. Paige, and D. Osborne. Challenges in topological object data analysis. Sankhya A, 81:244–271, 2019.

J. Power, K. Barnes, A. Snyder, B. Schlaggar, and S. Petersen. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59:2142–2154, 2012.

J. Rabin, G. Peyre, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 435–446. Springer, 2011.

L. Ramshaw and R. Tarjan. On minimum-cost assignments in unbalanced bipartite graphs. HP Labs, Palo Alto, CA, USA, Tech. Rep. HPL-2012-40R1, 2012.

J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4741–4748, 2015.


C. Reynolds and D. Phillips. Genetics of brain aging – twin aging. 2015.

V. Robins and K. Turner. Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. Physica D: Nonlinear Phenomena, 334:99–117, 2016.

C. Rottschy, R. Langner, I. Dogan, K. Reetz, A. Laird, J. Schulz, P. Fox, and S. Eickhoff. Modelling neural correlates of working memory: a coordinate-based meta-analysis. NeuroImage, 60:830–846, 2012.

T. Satterthwaite, D. Wolf, J. Loughead, K. Ruparel, M. Elliott, H. Hakonarson, R. Gur, and R. Gur. Impact of in-scanner head motion on multiple measures of functional connectivity: relevance for studies of neurodevelopment in youth. NeuroImage, 60:623–632, 2012.

A. Schaefer, R. Kong, E. Gordon, T. Laumann, X.-N. Zuo, A. Holmes, S. Eickhoff, and B. Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex, 28:3095–3114, 2017.

J. Scott. Social network analysis. Sociology, 22:109–127, 1988.

L. Shah, J. Cramer, M. Ferguson, R. Birn, and J. Anderson. Reliability and reproducibility of individual differences in functional connectivity acquired during task and resting state. Brain and Behavior, 6:e00456, 2016.

H. Shappell, B. Caffo, J. Pekar, and M. Lindquist. Improved state change estimation in dynamic functional connectivity using hidden semi-Markov models. bioRxiv, page 519868, 2019.

R. Sharathkumar and P. Agarwal. Algorithms for the transportation problem in geometric settings. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 306–317. SIAM, 2012.

D. Shattuck, M. Mirza, V. Adisetiyo, C. Hojatkashani, G. Salamon, K. Narr, R. Poldrack, R. Bilder, and A. Toga. Construction of a 3D probabilistic atlas of human cortical structures. NeuroImage, 39:1064–1080, 2008.


N. Singh, H. D. Couture, J. Marron, C. Perou, and M. Niethammer. Topological descriptors of histology images. In International Workshop on Machine Learning in Medical Imaging, pages 231–239. Springer, 2014.

D. Smit, C. Stam, D. Posthuma, D. Boomsma, and E. De Geus. Heritability of small-world networks in the brain: a graph theoretical analysis of resting-state EEG functional connectivity. Human Brain Mapping, 29:1368–1378, 2008.

R. Smith, J.-D. Tournier, F. Calamante, and A. Connelly. SIFT2: enabling dense quantitative assessment of brain white matter connectivity using streamlines tractography. NeuroImage, 119:338–351, 2015.

T. Snijders, M. Spreen, and R. Zwaagstra. The use of multilevel modeling for analysing personal networks: Networks of cocaine users in an urban area. Journal of Quantitative Anthropology, 5(2):85–105, 1995.

T. Songdechakraiwut and M. K. Chung. Dynamic topological data analysis for functional brain signals. 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops), pages 1–4, 2020.

T. Songdechakraiwut, L. Shen, and M. Chung. Topological learning and its application to multimodal brain network integration. Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 166–176, 2021.

O. Sporns. Graph Theory Methods for the Analysis of Neural Connectivity Patterns, pages 171–185. Springer US, Boston, MA, 2003.

L. Taylor and G. Nitschke. Improving deep learning with generic data augmentation. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1542–1547. IEEE, 2018.

Y. Tian, J. Yan, H. Zhang, Y. Zhang, X. Yang, and H. Zha. On the convergence of graph matching: Graduated assignment revisited. In European Conference on Computer Vision, pages 821–835. Springer, 2012.

A. Toga and P. Thompson. Mapping brain asymmetry. Nature Reviews Neuroscience, 4:37–48, 2003.

J. Tournier, F. Calamante, A. Connelly, et al. MRtrix: diffusion tractography in crossing fiber regions. International Journal of Imaging Systems and Technology, 22:53–66, 2012.


K. Turner, Y. Mileyko, S. Mukherjee, and J. Harer. Frechet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52:44–70, 2014.

N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer, and M. Joliot. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15:273–289, 2002.

K. Van Dijk, M. Sabuncu, and R. Buckner. The influence of head motion on intrinsic functional connectivity MRI. NeuroImage, 59:431–438, 2012.

D. C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S. W. Curtiss, et al. The human connectome project: a data acquisition perspective. NeuroImage, 62(4):2222–2231, 2012.

D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.-M. H. Consortium, et al. The WU-Minn human connectome project: an overview. NeuroImage, 80:62–79, 2013.

T. Vayer, R. Flamary, N. Courty, R. Tavenard, and L. Chapel. Sliced Gromov-Wasserstein. In Advances in Neural Information Processing Systems, volume 32, 2019. arXiv preprint arXiv:1905.10124.

T. Wang, H. Liu, Y. Li, Y. Jin, X. Hou, and H. Ling. Learning combinatorial solver for graph matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7568–7577, 2020.

Y. Wang, H. Ombao, and M. K. Chung. Topological data analysis of single-trial electroencephalographic signals. The Annals of Applied Statistics, 12(3):1506, 2018.

L. Wasserman. Topological data analysis. Annual Review of Statistics and Its Application, 5:501–532, 2018.

B. C. M. Wijk, C. J. Stam, and A. Daffertshofer. Comparing brain networks of different size and connectivity density using graph theory. PLoS One, 5:e13701, 2010.

K. Xia and G.-W. Wei. Persistent homology analysis of protein structure, flexibility, and folding. International Journal for Numerical Methods in Biomedical Engineering, 30(8):814–844, 2014.


L. Xie, E. Amico, P. Salama, Y.-C. Wu, S. Fang, O. Sporns, A. Saykin, J. Goni, J. Yan, and L. Shen. Heritability estimation of reliable connectomic features. In International Workshop on Connectomics in NeuroImaging, Lecture Notes in Computer Science, volume 11083, pages 58–66, 2018.

W. Xue, F. D. Bowman, A. V. Pileggi, and A. R. Mayer. A multimodal approach for determining brain networks by jointly modeling functional and structural connectivity. Frontiers in Computational Neuroscience, 9:22, 2015.

T. Yu, J. Yan, Y. Wang, W. Liu, et al. Generalizing graph matching beyond quadratic assignment model. In Advances in Neural Information Processing Systems, pages 853–863, 2018.

A. Zalesky, A. Fornito, I. Harding, L. Cocchi, M. Yucel, C. Pantelis, and E. Bullmore. Whole-brain anatomical networks: Does the choice of nodes matter? NeuroImage, 50:970–983, 2010.

M. Zavlanos and G. Pappas. A dynamical systems approach to weighted graph matching. Automatica, 44:2817–2824, 2008.

Y. Zemel and V. Panaretos. Frechet means and Procrustes analysis in Wasserstein space. Bernoulli, 25:932–976, 2019.

G. Zhang, B. Cai, A. Zhang, J. Stephen, T. Wilson, V. Calhoun, and Y.-P. Wang. Estimating dynamic functional brain connectivity with a sparse hidden Markov model. IEEE Transactions on Medical Imaging, 39:488–498, 2019a.

Z. Zhang, M. Descoteaux, J. Zhang, G. Girard, M. Chamberland, D. Dunson, A. Srivastava, and H. Zhu. Mapping population-based structural connectomes. NeuroImage, 172:130–145, 2018.

Z. Zhang, Y. Xiang, L. Wu, B. Xue, and A. Nehorai. KerGM: Kernelized graph matching. In Advances in Neural Information Processing Systems, pages 3335–3346, 2019b.

F. Zhou and F. De la Torre. Deformable graph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2922–2929, 2013.


D. Zhu, T. Zhang, X. Jiang, X. Hu, H. Chen, N. Yang, J. Lv, J. Han, L. Guo, and T. Liu. Fusing DTI and fMRI data: a survey of methods and applications. NeuroImage, 102:184–191, 2014.
