Graph-Adaptive Semi-Supervised Tracking of Dynamic Processes … · 2020-05-08 · 2586 IEEE...

  • 2586 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020

    Graph-Adaptive Semi-Supervised Tracking of Dynamic Processes Over Switching Network Modes

    Qin Lu, Member, IEEE, Vassilis N. Ioannidis, Student Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE

    Abstract—A plethora of network-science related applications call for inference of spatio-temporal graph processes. Such an inference task can be aided by the underlying graph topology that might jump over discrete modes. For example, the connectivity in dynamic brain networks switches among candidate topologies, each corresponding to a different emotional state, also known as the network mode. Taking advantage of limited nodal observations, the present contribution deals with semi-supervised tracking of dynamic processes over a given candidate set of graphs with unknown switches. Towards this end, a dynamical model is introduced to capture the per-slot spatial correlation using the active topology, as well as the temporal variation across slots through a state-space model. A scalable graph-adaptive Bayesian approach is developed, based on what is termed interacting multi-graph model (IMGM), to track the dynamic nodal processes and the active graph topology on-the-fly. Besides switching topologies, the proposed IMGM algorithm can accommodate various generalizations, including multiple dynamic functions, multiple kernels, and adaptive observation noise covariances. IMGM learns the dynamical model that best fits the data from a pool of available models. Thus, the resultant adaptive algorithm does not require offline model training. Numerical tests with synthetic and real datasets demonstrate the superior tracking performance of the novel approach compared to the existing mode-clairvoyant alternatives.

    Index Terms—Dynamic graph processes, switching network modes, online scalable Bayesian inference, multi-kernel learning.

    I. INTRODUCTION

    GRAPHS capture relations among entities (nodes), and have found widespread application in various fields, including sociology, biology, neuroscience and economics [15], [30]. Attributes collected in interdependent feature vectors per node represent processes over the graph. Given such vectors from a subset of nodes, various applications call for semi-supervised learning (SSL) of processes across all network nodes. The scarcity of nodal observations can be due to, e.g., cost, and computational or privacy constraints. For example, individuals in social networks may be reluctant to share personal information, while acquiring nodal samples in brain networks may require invasive procedures such as electrocorticography.

    Manuscript received June 1, 2019; revised January 31, 2020 and March 16, 2020; accepted March 23, 2020. Date of publication April 3, 2020; date of current version May 1, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pierre Borgnat. This work was supported by NSF under Grants 1508993, 1711471, and 1901134. (Corresponding author: Qin Lu.)

    The authors are with the Department of ECE and Digital Technology Center, University of Minnesota, Minneapolis, MN 55414 USA (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TSP.2020.2984889

    SSL tasks over networks can leverage the prior information of the underlying graph topology that captures nodal interdependencies [12]. Existing approaches to reconstructing time-invariant (TI) graph processes often rely on the smoothness of graph processes [16], [31], which asserts that connected vertices have similar features. In social networks where nodes and edges represent users and their friendships, one can infer the age of a specific user from her or his friends’ age. Other than smoothness, inference from limited nodal observations can rely on, e.g., ‘graph bandlimitedness’ [10], [29], sparsity and overcomplete dictionaries [11]. Most of these approaches can be unified under the framework of learning using graph kernels; see e.g., [26].

    The aforementioned SSL task becomes more challenging when nodal processes are nonstationary, and the graph topology is also time-varying. In a brain network for instance, where nodes correspond to brain regions and edges capture dependencies among them, one may be interested in predicting the dynamic processes as well as the varying interconnections. An interesting time-varying topology model switches over a set of connectivity patterns, also known as “network modes” [5]. For example, the connectivity among human brain regions varies as a person’s emotional, mental or physical activities change [36]. Coupled with the topology, the dynamics of nodal processes can also switch among different modes. Switching dynamical models have been typically employed to characterize the multi-modal behavior of control systems [27], as well as kinematics of maneuvering targets such as drones [6]. Nevertheless, graph-based switching dynamical models have not been considered so far.

    Several attempts have been made to reconstruct dynamic graph processes in the presence of possibly time-varying topologies. Inference of slow-varying processes over graphs has been pursued using the so-termed graph bandlimited model in [10], [35]. On the other hand, graph kernel-based estimators have been leveraged in [25], [14] to reconstruct general dynamic processes. All these contemporary approaches rely on a known graph topology and fixed dynamic models. However, the dynamic graph can change or switch in an unknown fashion among a set of possibly known topologies, which may reflect sudden changes in the partially observed signals. Furthermore, even when no topology switches occur, the graph process can evolve over multiple dynamical models across time, and thus a fixed model may be inadequate.

    The present paper puts forth an approach for semi-supervised tracking and extrapolation of dynamic nodal processes over switching graphs. Our contribution is threefold.

    1053-587X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

    Authorized licensed use limited to: University of Minnesota. Downloaded on May 08,2020 at 00:39:08 UTC from IEEE Xplore. Restrictions apply.

  • LU et al.: GRAPH-ADAPTIVE SEMI-SUPERVISED TRACKING OF DYNAMIC PROCESSES OVER SWITCHING NETWORK MODES 2587

    C1. The evolution of dynamic processes over switching graphs is captured by a first-order vector autoregressive model, where the transition matrix and the process noise covariance matrix depend on the active mode-conditioned topology. The resulting graph-adaptive dynamical model accounts for both spatial correlation within one slot and temporal variations across slots.

    C2. Given a candidate set of the aforementioned mode-conditioned dynamical models and measurements on a subset of nodes, we put forth a scalable graph-aware Bayesian tracker, termed interacting multi-graph model (IMGM), to jointly estimate the graph processes and active network modes on-the-fly.

    C3. Further, the proposed IMGM framework accommodates various modeling extensions, including switching nonlinear dynamical functions, multiple kernels, and adaptive observation covariances. By accounting for these dynamical models, IMGM adapts to the observed data and selects the pertinent model per time slot without requiring offline training.

    If observations were available at all nodes, it would have been possible to identify the active topology per slot without explicitly modeling the nodal process dynamics [5]. Relative to [5], this work leverages the dynamics to reconstruct unavailable nodal data, while at the same time identifying the active mode and tracking the nodal processes. Not necessarily graph related, yet similar to that of [5], is the goal of subspace clustering [32]; unlike the present work, however, subspace clustering does not exploit mode dynamics to reconstruct unavailable nodal processes.

    The rest of the paper is organized as follows. Section II starts with preliminaries to formulate the problem that is solved in Section III. Section IV deals with modeling generalizations of the IMGM approach. Numerical results and conclusions are presented in Sections V and VI, respectively. Part of this paper is published in our conference precursors [20], [21].

    Notation: Scalars are denoted by lowercase, column vectors by bold lowercase, and matrices by bold uppercase fonts. Superscripts $\top$, $-1$ and $\dagger$ denote transpose, inverse and pseudo-inverse, respectively; while $\mathbf{1}_N$ stands for the $N \times 1$ all-ones vector; and $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{K})$ for the probability density function (pdf) of a Gaussian random vector $\mathbf{x}$ with mean $\boldsymbol{\mu}$, and covariance matrix $\mathbf{K}$. Finally, if $\mathbf{A}$ is a matrix and $\mathbf{x}$ a vector, then $\|\mathbf{x}\|_{\mathbf{A}}^2 := \mathbf{x}^\top \mathbf{A}^{-1}\mathbf{x}$, $\|\mathbf{x}\|_2^2 := \mathbf{x}^\top\mathbf{x}$, $\|\mathbf{A}\|_1$ represents the $\ell_1$-norm of the vectorized matrix, and $\|\mathbf{A}\|_F^2$ is the squared Frobenius norm of $\mathbf{A}$.

    II. PROBLEM FORMULATION

    Consider a time-varying graph $\mathcal{G}_t$ with $N$ nodes indexed by the vertex set $\mathcal{V} := \{1, \ldots, N\}$. Per slot $t$, the relationship between nodes is captured by an $N \times N$ adjacency matrix $\mathbf{A}_t$ with $\mathbf{A}_t(n, n')$ representing the weight of the edge connecting nodes $n$ and $n'$. The focus will be on graphs whose topology jumps among a known set of $S$ candidate adjacency matrices; that is, $\mathbf{A}_t = \mathbf{A}_t^{\sigma_t} \in \{\mathbf{A}_t^{1}, \ldots, \mathbf{A}_t^{S}\}$, where the per-slot active topology index $\sigma_t \in \mathcal{S} := \{1, \ldots, S\}$ describes the so-called “network mode.” The active mode-conditioned Laplacian matrix is then given by $\mathbf{L}_t^{\sigma_t} = \mathbf{D}_t^{\sigma_t} - \mathbf{A}_t^{\sigma_t}$, where $\mathbf{D}_t^{\sigma_t} = \mathrm{diag}\{\mathbf{A}_t^{\sigma_t}\mathbf{1}_N\}$ denotes the graph degree matrix. Switching topologies emerge in several networked systems. Besides brain networks [36], network topologies from information cascades exhibit switching patterns [5].

    TABLE I: EXAMPLES OF LAPLACIAN KERNELS

    A dynamic graph process is defined as the mapping $x : \mathcal{V} \times \mathcal{T} \to \mathbb{R}$, where $\mathcal{T} := \{1, 2, \ldots\}$ is the set of slot indices. Thus, $x_t(n)$ represents the attribute of node $n$ at slot $t$. For instance, it may represent the value of a stock $n$ at day $t$. The values over all the nodes at slot $t$ are collected in the vector $\mathbf{x}_t := [x_t(1), \ldots, x_t(N)]^\top$.

    In several applications, processes over only a subset of $M < N$ vertices are observed, which yields the observation model

    $$\mathbf{z}_t = \mathbf{H}_t \mathbf{x}_t + \mathbf{e}_t \tag{1}$$

    where $\mathbf{H}_t \in \{0, 1\}^{M \times N}$ is the time-varying observation (or sampling) matrix, whose rows sum up to 1, and $\mathbf{e}_t$ is the observation noise that accounts for unmodeled uncertainties, assumed to be white and Gaussian distributed with mean zero and covariance $\mathbf{R}_t$.
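As a concrete illustration of (1), the following Python sketch builds a sampling matrix and draws one noisy observation. The observed node indices, the noise level, and the helper name `sampling_matrix` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sampling_matrix(observed, n_nodes):
    """Build the M x N selection matrix H with exactly one 1 per row."""
    H = np.zeros((len(observed), n_nodes))
    H[np.arange(len(observed)), observed] = 1.0
    return H

rng = np.random.default_rng(0)
N = 6
observed = [0, 2, 5]                  # hypothetical indices of the M = 3 sampled nodes
H = sampling_matrix(observed, N)
x = rng.standard_normal(N)            # hypothetical nodal values x_t
R = 0.01 * np.eye(len(observed))      # assumed observation-noise covariance R_t
e = rng.multivariate_normal(np.zeros(len(observed)), R)
z = H @ x + e                         # observation model (1)
```

Multiplying by `H` simply picks out the sampled entries of `x`, so the unobserved nodes never enter `z` directly.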

    A. Kernel-Based Inference of TI Graph Processes

    Towards learning dynamic graph processes, it is instructive to first outline the kernel-based inference of TI graph processes. Consider a TI adjacency matrix $\mathbf{A}$ and observation model $\mathbf{z} = \mathbf{H}\mathbf{x} + \mathbf{e}$, which are given by dropping slot index $t$ in the time-varying scenario. To uniquely reconstruct $\mathbf{x}$, one may rely on the regularized least-squares formulation

    $$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \|\mathbf{z} - \mathbf{H}\mathbf{x}\|_2^2 + \mu\,\Omega(\mathbf{x}) \tag{2}$$

    where $\Omega(\cdot)$ is a chosen monotonic regularizing function along with the scalar $\mu \ge 0$ that controls the importance of the regularization term vis-a-vis the fitting error.

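    For undirected graphs with symmetric adjacency matrix $\mathbf{A}$, the so-called Laplacian regularizer is given by

    $$\Omega_{\mathrm{LR}}(\mathbf{x}) := \mathbf{x}^\top \mathbf{L}\mathbf{x} = \frac{1}{2}\sum_{n=1}^{N}\sum_{n'=1}^{N} \mathbf{A}(n, n')\,\big(x(n) - x(n')\big)^2 \tag{3}$$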

    where $\mathbf{L}$ is the TI Laplacian matrix. The regularizer (3) promotes smoothness of the estimated signal on the graph as vertices connected by strong links (large $\mathbf{A}(n, n')$) will have similar signal estimates to minimize (3). To facilitate other properties such as diffusion or graph bandlimitedness, the Laplacian matrix in (3) is replaced by $r(\mathbf{L})$, where the scalar energy mapping $r : \mathbb{R} \to \mathbb{R}_+$ is applied on the eigenvalues of $\mathbf{L}$ to promote desired properties, see e.g., Table I. The pseudo-inverse of $r(\mathbf{L})$ yields the graph Laplacian kernel [26]

    $$\mathbf{K} := r^\dagger(\mathbf{L}). \tag{4}$$

    By considering $\Omega(\mathbf{x}) := \|\mathbf{x}\|_{\mathbf{K}}^2$, we recover the family of kernel ridge regression (KRR) estimators, which enjoys well-documented reconstruction performance [14], [31].
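A minimal NumPy sketch of (4) and the resulting KRR estimator follows. Because the entries of Table I are not reproduced in this transcript, the regularized-Laplacian mapping $r(\lambda) = 1 + \sigma^2\lambda$ is assumed here as one common choice. The function names, the path graph, and the values of $\sigma^2$ and $\mu$ are hypothetical, and the closed form in `krr_estimate` assumes an invertible kernel.

```python
import numpy as np

def laplacian_kernel(A, r):
    """K = r^dagger(L): apply r to the eigenvalues of L = D - A, then pseudo-invert."""
    L = np.diag(A.sum(axis=1)) - A
    lam, V = np.linalg.eigh(L)                 # L is symmetric for undirected graphs
    r_lam = r(lam)
    inv = np.where(r_lam > 1e-12, 1.0 / np.maximum(r_lam, 1e-12), 0.0)
    return (V * inv) @ V.T                     # V diag(inv) V^T

def krr_estimate(z, H, K, mu):
    """Minimizer of ||z - Hx||^2 + mu * x^T K^{-1} x, assuming K is invertible."""
    M = H.shape[0]
    return K @ H.T @ np.linalg.solve(H @ K @ H.T + mu * np.eye(M), z)

# Path graph on 4 nodes; only the two end nodes are observed.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = laplacian_kernel(A, lambda lam: 1.0 + 2.0 * lam)   # assumed sigma^2 = 2
H = np.eye(4)[[0, 3]]
z = np.array([1.0, 3.0])
mu = 0.1
x_hat = krr_estimate(z, H, K, mu)              # estimates for all 4 nodes
```

The estimate at the two unobserved interior nodes is driven entirely by the kernel, which is how the graph topology enters the reconstruction.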

    For directed graphs, one cannot directly apply the KRR framework since $\mathbf{A}$ is not symmetric. Nevertheless, attempts have also been made towards KRR by redefining a positive semidefinite Laplacian matrix [3], [4], [8], [9]. In [9], such a valid matrix is constructed as $\mathbf{L} := \mathbf{U} - (\mathbf{U}\bar{\mathbf{A}} + \bar{\mathbf{A}}^\top \mathbf{U})/2$, where $\bar{\mathbf{A}} := \mathbf{D}^{-1}\mathbf{A}$ and $\mathbf{U} := \mathrm{diag}(\mathbf{u})$ with $\mathbf{u}$ denoting the left eigenvector of $\bar{\mathbf{A}}$. Based on this definition, kernel matrices can be constructed, allowing the KRR framework to accommodate directed graphs as well. The methods in this paper apply to both directed and undirected graphs.

    So far, we outlined SSL on graphs using a deterministic kernel-based framework. It is however instructive to present a Bayesian generative model for KRR estimation. First, consider that the prior pdf of $\mathbf{x}$ is $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \mathbf{0}, \mathbf{K})$ and the likelihood of $\mathbf{x}$ based on observation $\mathbf{z}$ is given by $p(\mathbf{z}|\mathbf{x}) = \mathcal{N}(\mathbf{z}; \mathbf{H}\mathbf{x}, \mathbf{R})$ with $\mathbf{R} = \mu\mathbf{I}_M$. Under these Gaussian densities, the maximum a posteriori (MAP) estimator of $\mathbf{x}$ given $\mathbf{z}$ is equivalent to the KRR estimator, which amounts to the linear minimum mean-square error (LMMSE) estimator

    $$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \; p(\mathbf{x}|\mathbf{z}) = \arg\max_{\mathbf{x}} \; p(\mathbf{z}|\mathbf{x})\,p(\mathbf{x}) = \arg\min_{\mathbf{x}} \; \|\mathbf{z} - \mathbf{H}\mathbf{x}\|_{\mathbf{R}}^2 + \|\mathbf{x}\|_{\mathbf{K}}^2. \tag{5}$$

    Graph processes with arbitrary dynamics render the inference task in (5) intractable, in general. Fortunately, structured dynamical models, such as the one dealt with in the ensuing section, can lead to tractable estimators.
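For the static model, the minimizer of (5) admits a closed form. The short derivation below is a sketch that assumes $\mathbf{K}$ and $\mathbf{R}$ are invertible (an assumption made here for simplicity; with a singular kernel, $\mathbf{K}^{-1}$ would be replaced by the pseudo-inverse and the argument restricted to the range of $\mathbf{K}$).

```latex
\begin{aligned}
\nabla_{\mathbf{x}}\!\left[\|\mathbf{z}-\mathbf{H}\mathbf{x}\|_{\mathbf{R}}^2 + \|\mathbf{x}\|_{\mathbf{K}}^2\right] = \mathbf{0}
&\iff \left(\mathbf{H}^\top\mathbf{R}^{-1}\mathbf{H} + \mathbf{K}^{-1}\right)\hat{\mathbf{x}} = \mathbf{H}^\top\mathbf{R}^{-1}\mathbf{z} \\
&\iff \hat{\mathbf{x}} = \mathbf{K}\mathbf{H}^\top\left(\mathbf{H}\mathbf{K}\mathbf{H}^\top + \mathbf{R}\right)^{-1}\mathbf{z}
\end{aligned}
```

The second line uses the identity $(\mathbf{H}^\top\mathbf{R}^{-1}\mathbf{H} + \mathbf{K}^{-1})^{-1}\mathbf{H}^\top\mathbf{R}^{-1} = \mathbf{K}\mathbf{H}^\top(\mathbf{H}\mathbf{K}\mathbf{H}^\top + \mathbf{R})^{-1}$. The result has the familiar LMMSE form, with $\mathbf{K}\mathbf{H}^\top$ the cross-covariance of $\mathbf{x}$ and $\mathbf{z}$ and $\mathbf{H}\mathbf{K}\mathbf{H}^\top + \mathbf{R}$ the covariance of $\mathbf{z}$.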

    B. Modeling Dynamic Processes Over Switching Graphs

    One possible approach is to pursue an instantaneous per-slot KRR estimator based on $\mathbf{z}_t$ in (1). This estimator, however, does not account for the $\mathbf{x}_{t-1}$ to $\mathbf{x}_t$ transition that can benefit the estimator of $\mathbf{x}_t$ from observations other than $\mathbf{z}_t$, and thus improve estimation performance [14], [25].

    Exploitation of graph process dynamics calls for modeling the evolution from $\mathbf{x}_{t-1}$ to $\mathbf{x}_t$, which arguably depends on the underlying topology [14], [25]. To capture the dynamics of processes over switching graphs, we model the evolution from $\mathbf{x}_{t-1}$ to $\mathbf{x}_t$ as the first-order Markov process

    $$\mathbf{x}_t = \mathbf{F}_t^{\sigma_t}\mathbf{x}_{t-1} + \boldsymbol{\eta}_t^{\sigma_t} \tag{6}$$

    where the state transition matrix is a known function $f$ of the active adjacency matrix given by

    $$\mathbf{F}_t^{\sigma_t} := f\left(\mathbf{A}_t^{\sigma_t}\right). \tag{7}$$

    The mode-conditioned process noise $\boldsymbol{\eta}_t^{\sigma_t}$ is assumed uncorrelated with the state, white, and Gaussian distributed with zero mean and covariance $\mathbf{K}_t^{\sigma_t}$, which is the Laplacian kernel (4).

    The model in (6) accounts for the spatio-temporal dependence of graph processes in the following two aspects.

    i) The temporal dynamics across two consecutive slots are captured by the state transition matrix of the so-termed “transition graph.” With $\mathbf{F}_t^{\sigma_t} = \mathbf{A}_t^{\sigma_t}$, the transition model (6) amounts to a graph diffusion process [29].

    ii) The spatial correlations across nodes within $t$ are captured by the Laplacian kernel $\mathbf{K}_t^{\sigma_t}$ of the process noise covariance. By setting $\mathbf{F}_t^{\sigma_t}\mathbf{x}_{t-1} = \mathbf{0}$, the dynamical model (6) reduces to $\mathbf{x}_t = \boldsymbol{\eta}_t^{\sigma_t}$, which together with (1), constitutes the generative model for TI graph processes, leading to the MAP estimate given by (5). Incidentally, such a covariance model implies that $\mathbf{x}_t$ is “graph stationary” [24]. A related noise model was also adopted in [14] to promote smoothness of the estimates.

    The dynamical model in (6) describes what is also known as a switching linear dynamical system (SLDS) [23], and it is widely employed in the tracking community to capture the kinematic state evolution of maneuvering targets [6].
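To make the switching model (6)-(7) tangible, the sketch below simulates a process over $S = 2$ candidate graphs with a single mode switch. Everything specific is an assumption for illustration: the random graphs, the choice of $f$ as a scaled adjacency matrix (so that the state stays bounded), the regularized-Laplacian kernel, and the switch time.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 5, 50

def random_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix without self-loops (hypothetical test graph)."""
    U = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return U + U.T

def kernel(A, sigma2=1.0):
    """Regularized-Laplacian kernel K = (I + sigma2 * L)^{-1}; one assumed choice of r."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(len(A)) + sigma2 * L)

A = [random_adjacency(N, 0.4, rng), random_adjacency(N, 0.8, rng)]  # S = 2 candidates
# Assumed f: adjacency scaled so the spectral radius of F is at most 0.9.
F = [0.9 * a / max(np.abs(np.linalg.eigvalsh(a)).max(), 1e-12) for a in A]
K = [kernel(a) for a in A]

sigma = np.where(np.arange(T) < T // 2, 0, 1)   # mode switches once, halfway through
x = np.zeros((T + 1, N))
for t in range(1, T + 1):
    s = sigma[t - 1]
    eta = rng.multivariate_normal(np.zeros(N), K[s])   # process noise with covariance K^s
    x[t] = F[s] @ x[t - 1] + eta                       # state transition (6)
```

Feeding `x[t]` through a sampling matrix as in (1) would then produce the partial observations that a tracker has to work with.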

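    Problem statement: Given $T$ observations $\mathbf{Z}_T := [\mathbf{z}_1 \ldots \mathbf{z}_T]$ as in (1), and candidate models $\{\{\mathbf{F}_t^s, \mathbf{K}_t^s\}_{s=1}^{S}\}_{t=1}^{T}$ as in (6), the goal is to jointly track the dynamic graph processes $\mathbf{X}_T := [\mathbf{x}_1 \ldots \mathbf{x}_T]$, and the discrete modes $\{\sigma_t\}_{t=1}^{T}$.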

    III. SCALABLE GRAPH-AWARE BAYESIAN TRACKER

    In this section, we develop a Bayesian approach to track dynamic graph processes over switching graphs. First, given the Markovian state transition model in (6), the prior joint pdf of the nodal processes in $\mathbf{X}_T$ can be expressed as

    $$p(\mathbf{X}_T) = p(\mathbf{x}_T|\mathbf{x}_{T-1}; \sigma_T)\,p(\mathbf{X}_{T-1}) = \cdots = \prod_{t=1}^{T} p(\mathbf{x}_t|\mathbf{x}_{t-1}; \sigma_t) = \prod_{t=1}^{T}\Big(\sum_{s=1}^{S} w_t^s\, p(\mathbf{x}_t|\mathbf{x}_{t-1}; \sigma_t = s)\Big)$$

    where we explicitly incorporate $\sigma_t$ in $p(\mathbf{x}_t|\mathbf{x}_{t-1})$ to stress the active topology present, and $w_t^s$ encodes the existence of the mode $\sigma_t = s$ with $w_t^s \in \{0, 1\}$ and $\sum_{s=1}^{S} w_t^s = 1$.

    Furthermore, since $\mathbf{e}_t$ in (1) is temporally white, the conditional data pdf also factorizes as

    $$p(\mathbf{Z}_T|\mathbf{X}_T) = \prod_{t=1}^{T} p(\mathbf{z}_t|\mathbf{x}_t).$$

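    Hence, Bayes’ rule yields the posterior joint state pdf as

    $$p(\mathbf{X}_T|\mathbf{Z}_T) \propto p(\mathbf{Z}_T|\mathbf{X}_T)\,p(\mathbf{X}_T) = \prod_{t=1}^{T} p(\mathbf{z}_t|\mathbf{x}_t)\Big(\sum_{s=1}^{S} w_t^s\, p(\mathbf{x}_t|\mathbf{x}_{t-1}; \sigma_t = s)\Big). \tag{8}$$

    Since $\mathbf{e}_t$ and $\boldsymbol{\eta}_t^{\sigma_t}$ are Gaussian, the conditional likelihood $p(\mathbf{z}_t|\mathbf{x}_t)$ and the transition pdf $p(\mathbf{x}_t|\mathbf{x}_{t-1}; \sigma_t = s)$ are also Gaussian, that is

    $$p(\mathbf{z}_t|\mathbf{x}_t) = \mathcal{N}(\mathbf{z}_t; \mathbf{H}_t\mathbf{x}_t, \mathbf{R}_t)$$

    $$p(\mathbf{x}_t|\mathbf{x}_{t-1}; \sigma_t = s) = \mathcal{N}(\mathbf{x}_t; \mathbf{F}_t^s\mathbf{x}_{t-1}, \mathbf{K}_t^s).$$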

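    Thus, the MAP state estimates in batch form are (cf. (8))

    $$\arg\min_{\{\mathbf{x}_t\}_{t=1}^{T},\, \{\{w_t^s\}_{s=1}^{S}\}_{t=1}^{T}} \; \frac{1}{2}\sum_{t=1}^{T}\Big[\|\mathbf{z}_t - \mathbf{H}_t\mathbf{x}_t\|_{\mathbf{R}_t}^2 + \sum_{s=1}^{S} w_t^s\, \|\mathbf{x}_t - \mathbf{F}_t^s\mathbf{x}_{t-1}\|_{\mathbf{K}_t^s}^2\Big] \quad \text{s.to} \;\; w_t^s \in \{0, 1\}, \;\; \sum_{s=1}^{S} w_t^s = 1. \tag{9}$$

    Unfortunately, (9) is a mixed integer program whose optimal solution is given by enumerating all the $S^T$ combinations of discrete network modes across $T$ slots, and then applying a Kalman smoother for each mode combination, thus incurring computational complexity $\mathcal{O}(T S^T N^3)$.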
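To give a feel for the $S^T$ growth, the toy snippet below enumerates every mode sequence for a deliberately tiny horizon. The values $S = 3$ and $T = 4$ are arbitrary choices for illustration.

```python
from itertools import product

S, T = 3, 4
# Every candidate mode sequence (sigma_1, ..., sigma_T) that the batch solver would visit.
sequences = list(product(range(S), repeat=T))
n_sequences = len(sequences)      # equals S**T
```

Already for $S = 3$ and $T = 50$ the count $3^{50}$ is on the order of $10^{23}$, which is why exhaustive enumeration is not an option for online tracking.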

    Targeting a computationally efficient solver with $\mathbf{x}_t$ and $\sigma_t$ estimates obtained on-the-fly, we will build on the interacting multi-model (IMM) algorithm [7] that has been applied to target tracking [22] and air traffic control [19], but without graph-related information. Taking into account dynamically switching graph topologies, we will naturally term the resultant algorithm interacting multi-graph model (IMGM). Given partially observed nodal samples $\mathbf{z}_t$ and a candidate set of switching graphs, IMGM is a graph-adaptive Bayesian tracker that estimates the active network mode $\sigma_t$ together with the $N$ scalar nodal processes in $\mathbf{x}_t$.

    Our IMGM replaces the hard constraint $w_t^s \in \{0, 1\}$ with the soft one $w_t^s \in [0, 1]$. To further stress that the weight is based on observations up to $t$, $w_t^s$ is replaced with $w_{t|t}^s$. Thus, one can interpret $w_{t|t}^s$ as the posterior probability mass function (pmf) of mode $s$ being active at slot $t$, namely $w_{t|t}^s = \Pr(\sigma_t = s|\mathbf{Z}_t)$. Different from (9) where $\sigma_t$ was viewed as deterministic, we will next model it as a first-order Markov chain parameterized by the $S \times S$ mode transition matrix $\boldsymbol{\Pi}$, whose $(i, j)$th entry

    $$\pi_{ij} = \Pr(\sigma_t = i|\sigma_{t-1} = j) \tag{10}$$

    denotes the transition probability from mode $j$ at slot $t-1$ to mode $i$ at slot $t$. The parameters of $\boldsymbol{\Pi}$ are pre-selected. A practical choice for $\boldsymbol{\Pi}$ is to set its diagonal entries to $\pi_0 \in [0.9, 1)$, and the rest to $(1 - \pi_0)/(S - 1)$ [6].
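The practical choice of $\boldsymbol{\Pi}$ described above can be written in a few lines. The helper name `mode_transition_matrix` and the default $\pi_0 = 0.95$ are assumptions for illustration. With the convention in (10), column $j$ holds the pmf of $\sigma_t$ given $\sigma_{t-1} = j$, so each column sums to one.

```python
import numpy as np

def mode_transition_matrix(S, pi0=0.95):
    """S x S matrix with pi0 on the diagonal and (1 - pi0) / (S - 1) elsewhere."""
    if S == 1:
        return np.ones((1, 1))
    Pi = np.full((S, S), (1.0 - pi0) / (S - 1))
    np.fill_diagonal(Pi, pi0)
    return Pi

Pi = mode_transition_matrix(3, pi0=0.9)   # off-diagonal entries equal 0.05
```

A diagonal value close to one encodes the belief that the network mode persists for many slots between switches.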

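    IMGM leverages the current observation $\mathbf{z}_t$ to propagate the posterior marginal state pdf $p(\mathbf{x}_{t-1}|\mathbf{Z}_{t-1})$ to $p(\mathbf{x}_t|\mathbf{Z}_t)$. Towards this end, we start by approximating the mode-conditional posterior of $\mathbf{x}_t$ with a Gaussian pdf

    $$p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_t) \approx \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t}^s, \mathbf{P}_{t|t}^s) \tag{11}$$

    where $\hat{\mathbf{x}}_{t|t}^s$ and $\mathbf{P}_{t|t}^s$ are the mean and the covariance matrix associated with mode $s$. Bayes’ rule and the total probability theorem (TPT) yield the marginal posterior

    $$p(\mathbf{x}_t|\mathbf{Z}_t) = \sum_{s=1}^{S} \Pr(\sigma_t = s|\mathbf{Z}_t)\, p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_t) \approx \sum_{s=1}^{S} w_{t|t}^s\, \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t}^s, \mathbf{P}_{t|t}^s) \tag{12}$$

    approximated by a Gaussian mixture (GM) pdf, which is parameterized by the set

    $$\mathcal{P}_t := \{w_{t|t}^s, \hat{\mathbf{x}}_{t|t}^s, \mathbf{P}_{t|t}^s, \; s = 1, \ldots, S\}. \tag{13}$$

    This GM model facilitates the propagation from $p(\mathbf{x}_{t-1}|\mathbf{Z}_{t-1})$ to $p(\mathbf{x}_t|\mathbf{Z}_t)$ through updates of the elements in $\mathcal{P}_{t-1}$ to those in $\mathcal{P}_t$. These updates will be implemented using the prediction and correction of the mode pmf and the mode-conditional state pdf as detailed next.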

    A. Prediction

    At the end of slot $t-1$, the posterior marginal state pdf is characterized by $\mathcal{P}_{t-1}$. Before the arrival of a new observation $\mathbf{z}_t$, IMGM leverages the mode and state evolution models (cf. (10) and (6)) to make predictions about the mode pmf and the mode-conditional state pdf, respectively.

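    1) Predicted Mode Pmf: Based on the Markov transition model (10), the predicted mode pmf is readily obtained via TPT and Bayes’ rule as

    $$\begin{aligned} w_{t|t-1}^s := \Pr(\sigma_t = s|\mathbf{Z}_{t-1}) &= \sum_{s'=1}^{S} \Pr(\sigma_t = s, \sigma_{t-1} = s'|\mathbf{Z}_{t-1}) \\ &= \sum_{s'=1}^{S} \Pr(\sigma_t = s|\sigma_{t-1} = s', \mathbf{Z}_{t-1})\,\Pr(\sigma_{t-1} = s'|\mathbf{Z}_{t-1}) \\ &= \sum_{s'=1}^{S} \pi_{ss'}\, w_{t-1|t-1}^{s'}. \end{aligned} \tag{14}$$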
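In matrix form, (14) is a single matrix-vector product. The sketch below uses hypothetical numbers for $S = 2$ modes, with `Pi[s, s_prev]` following the index convention of (10).

```python
import numpy as np

def predict_mode_pmf(Pi, w_prev):
    """Eq. (14): w_pred[s] = sum over s' of Pi[s, s'] * w_prev[s']."""
    return Pi @ w_prev

Pi = np.array([[0.9, 0.1],
               [0.1, 0.9]])
w_prev = np.array([0.8, 0.2])             # hypothetical posterior mode pmf at slot t-1
w_pred = predict_mode_pmf(Pi, w_prev)     # [0.74, 0.26]
```

The prediction pulls the pmf slightly towards uniform, reflecting the chance that a switch occurred between slots.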

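    2) Predicted State Pdf: Since $p(\mathbf{x}_{t-1}|\sigma_{t-1} = s', \mathbf{Z}_{t-1})$ is Gaussian (cf. (11)), the linear-Gaussian state transition model (6) conditioned on $\sigma_t = s$ allows one to deduce that

    $$p(\mathbf{x}_t|\sigma_t = s, \sigma_{t-1} = s', \mathbf{Z}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t-1}^{s,s'}, \mathbf{P}_{t|t-1}^{s,s'}) \tag{15}$$

    where the superscript $(s, s')$ refers to the conditioning on modes $(s, s')$ at slots $t$ and $t-1$ respectively.

    The first two moments of the pdf in (15) are given by

    $$\hat{\mathbf{x}}_{t|t-1}^{s,s'} = \mathbf{F}_t^s\, \hat{\mathbf{x}}_{t-1|t-1}^{s'} \tag{16a}$$

    $$\mathbf{P}_{t|t-1}^{s,s'} = \mathbf{F}_t^s\, \mathbf{P}_{t-1|t-1}^{s'}\, (\mathbf{F}_t^s)^\top + \mathbf{K}_t^s. \tag{16b}$$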

    Algorithm 1: One Recursion of the IMGM Algorithm.
    Input: $\mathcal{P}_{t-1}$, $\mathbf{z}_t$, $\{\mathbf{F}_t^s, \mathbf{K}_t^s\}_{s=1}^{S}$, $\mathbf{R}_t$, $\mathbf{H}_t$, $\boldsymbol{\Pi}$
    for $s = 1$ to $S$ do
        S1 Prediction
            S1.1 of mode pmf via (14)
            S1.2 of mode-conditional state pdf via (16), (18), and (20)
        S2 Correction
            S2.1 of mode-conditional state pdf via (23)
            S2.2 of mode pmf via (24)
        S3 Fusion of mode-conditional state pdfs via (27)
    end for
    Output: $\mathcal{P}_t$, $\hat{\mathbf{x}}_{t|t}$, $\mathbf{P}_{t|t}$

    Next, using the TPT and Bayes’ rule, we express the predicted mode-conditional state pdf at $t$ as

    $$p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_{t-1}) = \sum_{s'=1}^{S} \Pr(\sigma_{t-1} = s'|\sigma_t = s, \mathbf{Z}_{t-1})\, p(\mathbf{x}_t|\sigma_t = s, \sigma_{t-1} = s', \mathbf{Z}_{t-1}) \tag{17}$$

    where $\Pr(\sigma_{t-1} = s'|\sigma_t = s, \mathbf{Z}_{t-1}) := w_{t-1|t}^{s'|s}$ can be interpreted as the backward mode transition probability, which upon appealing to Bayes’ rule and the TPT, boils down to

    $$w_{t-1|t}^{s'|s} = \frac{\Pr(\sigma_{t-1} = s'|\mathbf{Z}_{t-1})\,\Pr(\sigma_t = s|\sigma_{t-1} = s', \mathbf{Z}_{t-1})}{\sum_{s'=1}^{S} \Pr(\sigma_{t-1} = s'|\mathbf{Z}_{t-1})\,\Pr(\sigma_t = s|\sigma_{t-1} = s', \mathbf{Z}_{t-1})} = \frac{w_{t-1|t-1}^{s'}\,\pi_{ss'}}{\sum_{s'=1}^{S} w_{t-1|t-1}^{s'}\,\pi_{ss'}}. \tag{18}$$
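The backward probabilities in (18) amount to normalizing the joint weights $\pi_{ss'} w_{t-1|t-1}^{s'}$ over $s'$ for each current mode $s$. A small sketch with hypothetical numbers follows; the array layout `W[s_prev, s]` is an assumption of this example, and the division presumes that every predicted mode weight is nonzero.

```python
import numpy as np

def backward_mode_probs(Pi, w_prev):
    """Eq. (18): W[s_prev, s] = Pr(sigma_{t-1} = s_prev | sigma_t = s, Z_{t-1})."""
    joint = Pi * w_prev[None, :]                 # joint[s, s'] = pi_{ss'} * w^{s'}_{t-1|t-1}
    # Row s of `joint` sums to the predicted weight w^{s}_{t|t-1} of (14).
    return (joint / joint.sum(axis=1, keepdims=True)).T

Pi = np.array([[0.9, 0.1],
               [0.1, 0.9]])
w_prev = np.array([0.8, 0.2])
W = backward_mode_probs(Pi, w_prev)              # each column is a pmf over s'
```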

    So far, the predicted mode-conditional state pdf (17) is a GM pdf. A GM prior, however, does not evolve to a Gaussian posterior pdf with Gaussian likelihood. To maintain Gaussianity of the posterior mode-conditional state pdf as in (11), we will approximate (17) by the following Gaussian pdf

    $$p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_{t-1}) \approx \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t-1}^s, \mathbf{P}_{t|t-1}^s) \tag{19}$$

    where $\hat{\mathbf{x}}_{t|t-1}^s$ and $\mathbf{P}_{t|t-1}^s$ are chosen to match the first two moments of the GM pdf (17) as

    $$\hat{\mathbf{x}}_{t|t-1}^s = \sum_{s'=1}^{S} w_{t-1|t}^{s'|s}\, \hat{\mathbf{x}}_{t|t-1}^{s,s'} \tag{20a}$$

    $$\mathbf{P}_{t|t-1}^s = \sum_{s'=1}^{S} w_{t-1|t}^{s'|s} \Big(\mathbf{P}_{t|t-1}^{s,s'} + (\hat{\mathbf{x}}_{t|t-1}^{s,s'} - \hat{\mathbf{x}}_{t|t-1}^s)(\hat{\mathbf{x}}_{t|t-1}^{s,s'} - \hat{\mathbf{x}}_{t|t-1}^s)^\top\Big). \tag{20b}$$

    Approximating non-Gaussian pdfs with Gaussian ones is a well-documented approach to effect scalability in approximate (Bayesian) inference, including variational inference and expectation propagation; see [23] and the references therein. With moments of the approximating Gaussian pdf matched to those of the non-Gaussian one (cf. (20)), the KL divergence between the two pdfs is minimized.
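The per-pair prediction (16) and the moment matching (20) can be sketched together as follows. The function name, the array shapes, and the layout `W[s_prev, s]` of the backward probabilities are assumptions of this example rather than notation from the paper.

```python
import numpy as np

def predict_mode_conditional(F, K, x_prev, P_prev, W):
    """Gaussian prediction per mode pair via (16), then moment matching via (20).

    F, K   : lists of S (N x N) transition and process-noise covariance matrices
    x_prev : (S, N) posterior means at t-1;  P_prev : (S, N, N) posterior covariances
    W      : (S, S) backward probabilities, indexed W[s_prev, s] as in (18)
    """
    S, N = x_prev.shape
    x_pred = np.zeros((S, N))
    P_pred = np.zeros((S, N, N))
    for s in range(S):
        # (16a)-(16b): push every previous-mode posterior through the mode-s dynamics.
        x_ss = np.array([F[s] @ x_prev[sp] for sp in range(S)])
        P_ss = np.array([F[s] @ P_prev[sp] @ F[s].T + K[s] for sp in range(S)])
        # (20a): mean of the Gaussian mixture (17).
        x_pred[s] = W[:, s] @ x_ss
        # (20b): within-component covariance plus spread of the component means.
        d = x_ss - x_pred[s]
        P_pred[s] = np.einsum("k,kij->ij", W[:, s],
                              P_ss + d[:, :, None] * d[:, None, :])
    return x_pred, P_pred
```

When all previous-mode posteriors coincide, the spread term vanishes and the output reduces to a standard Kalman prediction under each mode.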

    Up to now, we have obtained the predicted mode pmf and the mode-conditional state pdf, which will be propagated to their posterior counterparts after a new $\mathbf{z}_t$ is observed.

    B. Correction

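    1) Posterior Mode-Conditional State Pdf: Given the new observation $\mathbf{z}_t$, the approximate predicted mode-conditional state pdf (19) is propagated to its posterior via Bayes’ rule as

    $$p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_t) = p(\mathbf{x}_t|\sigma_t = s, \mathbf{z}_t, \mathbf{Z}_{t-1}) = \frac{p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_{t-1})\, p(\mathbf{z}_t|\mathbf{x}_t, \sigma_t = s, \mathbf{Z}_{t-1})}{p(\mathbf{z}_t|\sigma_t = s, \mathbf{Z}_{t-1})} \tag{21}$$

    where $p(\mathbf{z}_t|\mathbf{x}_t, \sigma_t = s, \mathbf{Z}_{t-1}) = \mathcal{N}(\mathbf{z}_t; \mathbf{H}_t\mathbf{x}_t, \mathbf{R}_t)$, since $\mathbf{z}_t$ is conditionally independent of $\mathbf{Z}_{t-1}$ and $\sigma_t$ given $\mathbf{x}_t$. Hence, with the likelihood and the prior (cf. (19)) being Gaussian, it holds that

    $$p(\mathbf{x}_t|\sigma_t = s, \mathbf{Z}_t) = \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t}^s, \mathbf{P}_{t|t}^s) \tag{22}$$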

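    where the first two moments $\hat{\mathbf{x}}_{t|t}^s$ and $\mathbf{P}_{t|t}^s$ are obtained via the Kalman update as (see e.g., [6])

    $$\hat{\mathbf{z}}_{t|t-1}^s = \mathbf{H}_t\, \hat{\mathbf{x}}_{t|t-1}^s \tag{23a}$$

    $$\boldsymbol{\Phi}_{t|t-1}^s = \mathbf{H}_t\, \mathbf{P}_{t|t-1}^s\, (\mathbf{H}_t)^\top + \mathbf{R}_t \tag{23b}$$

    $$\mathbf{G}_t^s = \mathbf{P}_{t|t-1}^s\, (\mathbf{H}_t)^\top (\boldsymbol{\Phi}_{t|t-1}^s)^{-1} \tag{23c}$$

    $$\hat{\mathbf{x}}_{t|t}^s = \hat{\mathbf{x}}_{t|t-1}^s + \mathbf{G}_t^s\, (\mathbf{z}_t - \hat{\mathbf{z}}_{t|t-1}^s) \tag{23d}$$

    $$\mathbf{P}_{t|t}^s = \mathbf{P}_{t|t-1}^s - \mathbf{G}_t^s\, \boldsymbol{\Phi}_{t|t-1}^s\, (\mathbf{G}_t^s)^\top. \tag{23e}$$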
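The update (23) for a single mode translates directly into NumPy. The sketch below is a plain Kalman correction step; the function name and the toy inputs are hypothetical, and the gain is obtained with a linear solve instead of an explicit inverse.

```python
import numpy as np

def kalman_correct(x_pred, P_pred, z, H, R):
    """Mode-conditional Kalman update, eqs. (23a)-(23e)."""
    z_pred = H @ x_pred                          # (23a) predicted observation
    Phi = H @ P_pred @ H.T + R                   # (23b) innovation covariance
    G = np.linalg.solve(Phi, H @ P_pred).T       # (23c) gain P H^T Phi^{-1}
    x_post = x_pred + G @ (z - z_pred)           # (23d) corrected mean
    P_post = P_pred - G @ Phi @ G.T              # (23e) corrected covariance
    return x_post, P_post, z_pred, Phi

# Toy case: two nodes, only node 0 observed, unit prior and noise covariances.
x_post, P_post, z_pred, Phi = kalman_correct(
    np.zeros(2), np.eye(2), np.array([2.0]), np.array([[1.0, 0.0]]), np.eye(1))
```

In this toy case the observed node moves halfway towards the measurement and its variance is halved, while the unobserved node is left untouched because the prior covariance carries no cross-correlation.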

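    2) Posterior Mode Pmf: Upon applying Bayes' rule, the posterior mode pmf is

    $$w^{s}_{t|t} = \Pr(\sigma_t = s|\mathbf{z}_t, \mathcal{Z}_{t-1}) = \frac{p(\mathbf{z}_t|\sigma_t = s, \mathcal{Z}_{t-1})\Pr(\sigma_t = s|\mathcal{Z}_{t-1})}{\sum_{s'=1}^{S} p(\mathbf{z}_t|\sigma_t = s', \mathcal{Z}_{t-1})\Pr(\sigma_t = s'|\mathcal{Z}_{t-1})} \tag{24}$$

    where the predicted mode pmf $\Pr(\sigma_t = s|\mathcal{Z}_{t-1})$ is computable via (14), and the factor $p(\mathbf{z}_t|\sigma_t = s, \mathcal{Z}_{t-1})$ is the normalizing pdf in (21)

    $$p(\mathbf{z}_t|\sigma_t = s, \mathcal{Z}_{t-1}) = \int p(\mathbf{z}_t, \mathbf{x}_t|\sigma_t = s, \mathcal{Z}_{t-1})\, d\mathbf{x}_t = \int p(\mathbf{z}_t|\mathbf{x}_t)\, p(\mathbf{x}_t|\sigma_t = s, \mathcal{Z}_{t-1})\, d\mathbf{x}_t \tag{25}$$

    that can be shown to be the Gaussian $\mathcal{N}(\mathbf{z}_t; \hat{\mathbf{z}}^{s}_{t|t-1}, \boldsymbol{\Phi}^{s}_{t|t-1})$ with $\hat{\mathbf{z}}^{s}_{t|t-1}$ and $\boldsymbol{\Phi}^{s}_{t|t-1}$ given by (23a) and (23b), respectively.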
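The correction step in (23) and the mode-posterior update in (24)-(25) can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: the helper names (`kalman_correct`, `gaussian_logpdf`, `mode_posterior`) and the toy numbers are hypothetical, and the weights are normalized in the log domain as a numerical-stability choice that the text does not prescribe.

```python
import numpy as np

def kalman_correct(x_pred, P_pred, z, H, R):
    """Mode-conditional Kalman correction, following (23a)-(23e)."""
    z_pred = H @ x_pred                          # (23a)
    Phi = H @ P_pred @ H.T + R                   # (23b)
    # (23c): G = P H^T Phi^{-1}, computed via a linear solve (P and Phi symmetric)
    G = np.linalg.solve(Phi, H @ P_pred).T
    x_post = x_pred + G @ (z - z_pred)           # (23d)
    P_post = P_pred - G @ Phi @ G.T              # (23e)
    return x_post, P_post, z_pred, Phi

def gaussian_logpdf(z, mean, cov):
    """Log of N(z; mean, cov), i.e., the log of the likelihood in (25)."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(z) * np.log(2 * np.pi))

def mode_posterior(w_pred, loglik):
    """Posterior mode pmf (24), normalized in the log domain."""
    log_w = np.log(np.asarray(w_pred, dtype=float)) + np.asarray(loglik, dtype=float)
    log_w -= log_w.max()                         # guard against underflow
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: S = 2 modes, N = 2 nodes, M = 1 observed node (illustrative values).
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
z = np.array([0.2])
x_preds = [np.array([0.0, 0.0]), np.array([5.0, 0.0])]
P_pred = np.eye(2)
w_pred = np.array([0.5, 0.5])

posts, logliks = [], []
for x_pred in x_preds:
    x_post, P_post, z_pred, Phi = kalman_correct(x_pred, P_pred, z, H, R)
    posts.append((x_post, P_post))
    logliks.append(gaussian_logpdf(z, z_pred, Phi))
w_post = mode_posterior(w_pred, logliks)
```

Here the observation lies close to the prediction of the first mode, so nearly all posterior mass is assigned to it.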

    C. Fusion

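    Finally, the marginal posterior state pdf is given by fusing the individual mode-conditional posteriors to obtain the GM

    $$p(\mathbf{x}_t|\mathcal{Z}_t) = \sum_{s=1}^{S} w^{s}_{t|t}\, \mathcal{N}(\mathbf{x}_t; \hat{\mathbf{x}}^{s}_{t|t}, \mathbf{P}^{s}_{t|t}) \tag{26}$$

    whose first two moments are

    $$\hat{\mathbf{x}}_{t|t} = \sum_{s=1}^{S} w^{s}_{t|t}\, \hat{\mathbf{x}}^{s}_{t|t} \tag{27a}$$

    $$\mathbf{P}_{t|t} = \sum_{s=1}^{S} w^{s}_{t|t} \Big( \mathbf{P}^{s}_{t|t} + (\hat{\mathbf{x}}^{s}_{t|t} - \hat{\mathbf{x}}_{t|t})(\hat{\mathbf{x}}^{s}_{t|t} - \hat{\mathbf{x}}_{t|t})^{\top} \Big). \tag{27b}$$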


    Fig. 1. Flowchart of IMGM with $S = 2$ modes for one recursion, where yellow is used for mode 1, and blue for mode 2. Each mode predicts the first two moments of the state pdf at slot $t$ assuming the active mode is 1, or 2, respectively. Then the predictive state pdf conditioned on mode $s \in \{1, 2\}$ at slot $t$ is obtained by fusing the contributions from modes 1 and 2 at slot $t-1$ (denoted by the green lines in the figure). After receiving new observation $\mathbf{z}_t$, each mode updates the first two moments of the state pdf. Aided by $l^{s}_{t} = \mathcal{N}(\mathbf{z}_t; \hat{\mathbf{z}}^{s}_{t|t-1}, \boldsymbol{\Phi}^{s}_{t|t-1})$, each mode obtains the posterior weight, based on which the fused state moments in the green box are acquired.

    Thus, the posterior mean (27a) is the minimum mean-square error (MMSE) estimator of $\mathbf{x}_t$, whose uncertainty is characterized by the covariance matrix (27b). On the other hand, upon approximating the GM in (26) with a single Gaussian pdf having matched moments, (27a) can also be interpreted as the MAP estimator of $\mathbf{x}_t$.
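For completeness, the covariance expression in (27b) can be verified by conditioning on the active mode. The short derivation below is a sketch that relies only on (22), (26), and (27a); the intermediate steps are ours and are not spelled out in the text.

```latex
\begin{aligned}
\mathbf{P}_{t|t}
&= \mathbb{E}\big[(\mathbf{x}_t - \hat{\mathbf{x}}_{t|t})(\mathbf{x}_t - \hat{\mathbf{x}}_{t|t})^{\top} \,\big|\, \mathcal{Z}_t\big]
 = \sum_{s=1}^{S} w^{s}_{t|t}\,
   \mathbb{E}\big[(\mathbf{x}_t - \hat{\mathbf{x}}_{t|t})(\mathbf{x}_t - \hat{\mathbf{x}}_{t|t})^{\top} \,\big|\, \sigma_t = s, \mathcal{Z}_t\big] \\
&= \sum_{s=1}^{S} w^{s}_{t|t}
   \Big( \mathbf{P}^{s}_{t|t}
   + (\hat{\mathbf{x}}^{s}_{t|t} - \hat{\mathbf{x}}_{t|t})(\hat{\mathbf{x}}^{s}_{t|t} - \hat{\mathbf{x}}_{t|t})^{\top} \Big)
\end{aligned}
```

The second line follows by writing $\mathbf{x}_t - \hat{\mathbf{x}}_{t|t} = (\mathbf{x}_t - \hat{\mathbf{x}}^{s}_{t|t}) + (\hat{\mathbf{x}}^{s}_{t|t} - \hat{\mathbf{x}}_{t|t})$ and noting that the cross terms vanish, because $\mathbb{E}[\mathbf{x}_t - \hat{\mathbf{x}}^{s}_{t|t} \,|\, \sigma_t = s, \mathcal{Z}_t] = \mathbf{0}$ by (22).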

    The implementation steps of the IMGM algorithm for one recursion are summarized in Alg. 1, and the flowchart of IMGM for $S = 2$ modes is presented in Fig. 1. Note that at initialization, the mode probabilities and mode-conditional state pdfs are set to be identical across modes; that is,

    $$w^{s}_{0|0} = \frac{1}{S}, \quad \hat{\mathbf{x}}^{s}_{0|0} = \hat{\mathbf{x}}_0, \quad \mathbf{P}^{s}_{0|0} = \mathbf{P}_0, \quad s = 1, \ldots, S \tag{28}$$

    where $\hat{\mathbf{x}}_0$ and $\mathbf{P}_0$ encode our prior information about the initial state distribution.

    IMGM incurs low computational complexity of order $\mathcal{O}(STN^3)$ over $T$ slots, which is clearly more affordable than the exponential complexity of the optimal solution of (9). To further maintain scalability for $N \gg 1$, the graph can be divided into $N_g$ subgraphs, each with at most $\lceil N/N_g \rceil$ nodes. Upon leveraging distributed solvers along the lines of [28], the computational complexity per subgraph is $\mathcal{O}(S\lceil N/N_g \rceil^3)$, yielding an overall complexity of order $\mathcal{O}(SN_g\lceil N/N_g \rceil^3)$. Hence, scalability for large graphs can be effected by adjusting $N_g$. However, how to optimally choose $N_g$ and divide the graph based on the topology is an interesting future direction.
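To get a feel for the savings, the per-slot order expressions above can be evaluated numerically. The sketch below is only a back-of-the-envelope proxy: the function `imgm_cost` is hypothetical, all constants are dropped, and any communication overhead among subgraphs in a distributed solver is ignored.

```python
import math

def imgm_cost(S, N, Ng=1):
    """Per-slot operation-count proxy S * Ng * ceil(N / Ng)^3 (constants dropped)."""
    return S * Ng * math.ceil(N / Ng) ** 3

S, N = 2, 1000                       # illustrative sizes, not taken from the paper
full = imgm_cost(S, N)               # single graph: S * N^3
split = imgm_cost(S, N, Ng=10)       # ten subgraphs with at most 100 nodes each
ratio = full // split
```

Under these assumptions, splitting a 1000-node graph into 10 subgraphs lowers the proxy from 2 x 10^9 to 2 x 10^7, i.e., by a factor of $N_g^2 = 100$.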

    A few remarks are now in order.

    Remark 1: IMGM is a memoryless online algorithm that requires no storage of past observations. All information about the past is summarized by the parameter set $\mathcal{P}_{t-1}$ that defines the GM pdf for the marginal state distribution.

    Remark 2: Different from our IMGM, the classical IMM [6] first approximates a GM by a single Gaussian for each mode that corresponds to an updated mode-conditional state posterior, which is the input to one of $S$ parallel mode-dependent Kalman filters with prediction and correction. Adhering to both Gaussian predicted (19) and posterior (22) mode-conditional state pdfs, IMGM predicts a GM per mode (17) that is then approximated by a single mode-conditional Gaussian (19), before running $S$ parallel Kalman correction steps. The order of approximation and prediction makes no difference for linear state transition models.

    IV. MULTI-KERNEL, TRANSITION, AND NOISE ADAPTIVITY

    This section shows that IMGM can also be utilized to track dynamic processes and adapt the graph kernel(s), transition function, and noise variance per slot even for a fixed graph.

    To start with, the linear state transition model in (6) may not be able to fully capture the dynamics of the graph processes, necessitating a general nonlinear transition model, which is given by a nonlinear function $f(\mathbf{A}^{\sigma_t}, \mathbf{x}_{t-1})$. Moreover, the state in several applications may adhere to a different dynamic model per slot. For example, the stocks in an economic network abide by different evolution patterns, e.g., in a period of economic recession. Hence, the transition function $f$ (7) may jump among a candidate set of $L$ transition functions $\{f^1, \ldots, f^L\}$ for a given topology at slot $t$.

    Besides $f$, the noise model in (6) can be sensitive to the selection of the appropriate kernel (4). To deal with this, a dictionary of candidate 'basis kernels' (4) can be constructed with energy mappings in the set $\{r_1(\cdot), \ldots, r_K(\cdot)\}$. Hence, the extended dynamical model of graph processes that further accounts for multiple kernels and switching nonlinear dynamic functions is given by

    $$\mathbf{x}_t = f^{l_t}(\mathbf{A}^{\sigma_t}_t, \mathbf{x}_{t-1}) + \boldsymbol{\eta}^{k_t,\sigma_t}_t \tag{29}$$

    where $l_t \in \mathcal{L} := \{1, \ldots, L\}$ is the active dynamic function index, and $\sigma_t \in \mathcal{S}$ denotes the active topology index; while the zero-mean Gaussian process noise $\boldsymbol{\eta}^{k_t,\sigma_t}_t$ has covariance matrix $\mathbf{K}^{k_t,\sigma_t}_t$ with active kernel function index $k_t \in \mathcal{K} := \{1, \ldots, K\}$.

    On the other hand, the observation noise covariance in (1) is selected as $\mathbf{R}_t = \mu\mathbf{I}_M$. The scale $\mu$ is typically tuned via cross-validation offline among a candidate set of grid points $\{\mu^1, \ldots, \mu^R\}$. However, $\mu$ then remains fixed for all $t$, and does not adapt to the data across slots. To avoid cross-validation and effect a data-driven choice of $\mu$, we can recast the observation model in (1) as

    $$\mathbf{z}_t = \mathbf{H}_t\mathbf{x}_t + \mathbf{e}^{r_t}_t \tag{30}$$

    where the covariance matrix of $\mathbf{e}^{r_t}_t$ is $\mathbf{R}^{r_t}_t = \mu^{r_t}\mathbf{I}_M$ with $r_t \in \mathcal{R} := \{1, \ldots, R\}$.

    To incorporate switching topologies, dynamic functions, kernels, and observation noise covariances, we construct $\bar{S} = S \times L \times K \times R$ candidate dynamical models with the active model $\{f^{l_t}(\mathbf{A}^{\sigma_t}, \cdot), \mathbf{K}^{k_t,\sigma_t}_t, \mathbf{R}^{r_t}_t\}$ indicated by the extended network mode $\bar{\sigma}_t := (l_t, \sigma_t, k_t, r_t) \in \bar{\mathcal{S}}$, where the extended network mode set is $\bar{\mathcal{S}} := \mathcal{L} \times \mathcal{S} \times \mathcal{K} \times \mathcal{R}$. Before invoking the IMGM algorithm, one has to rescale the cost function in (9) by replacing $\mathbf{R}^{r_t}_t$ with $\mathbf{I}_M$, and subsequently absorbing $\mathbf{R}^{r_t}_t$ into the process noise covariance as $\tilde{\mathbf{K}}^{k_t,\sigma_t,r_t}_t = \mathbf{K}^{k_t,\sigma_t}_t/\mu^{r_t}$, such that the fitting error in (9) will have no scaling factor. The expanded candidate list of dynamical models at slot $t$ is then constructed as $\{f^{l}(\mathbf{A}^{\sigma}, \cdot), \tilde{\mathbf{K}}^{k,\sigma,r}_t, \bar{\sigma} \in \bar{\mathcal{S}}\}$. Subsequently, by changing $\mathbf{R}_t$ to $\mathbf{I}_M$ in (23b), the IMGM algorithm is readily applied with only one revision in (15) for nonlinear dynamical models. As alluded to in the previous discussion, IMGM strives to maintain a Gaussian mode-conditional state pdf. Thus, to approximate the nonlinear transformation of a Gaussian state pdf by another Gaussian, we can leverage the unscented transformation as in the unscented KF [34], or simply linearize the nonlinear transition functions, as in the extended KF [6].
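The construction of the extended mode set and of the rescaled process noise covariances can be sketched as follows. This is a minimal illustration under stated assumptions: the set sizes, the noise scales in `mu`, and the placeholder kernels (scaled identities) are all hypothetical, and zero-based indices are used in place of the one-based indices of the text.

```python
import itertools
import numpy as np

# Hypothetical toy sizes: L transition functions, S topologies, K kernels, R noise scales.
L, S, K, R = 2, 2, 3, 2
N = 4                                   # number of nodes (illustrative)
mu = [1e-2, 1e-1]                       # hypothetical candidate noise scales mu^r

# Placeholder kernels K^{k,sigma}; a real dictionary would come from the graph Laplacians.
kernels = {(k, s): (k + 1) * np.eye(N) for k in range(K) for s in range(S)}

# Extended mode set: all tuples (l, sigma, k, r), i.e., the Cartesian product of the index sets.
extended_modes = list(itertools.product(range(L), range(S), range(K), range(R)))

# Rescaled process noise covariance K~^{k,sigma,r} = K^{k,sigma} / mu^r per extended mode.
rescaled = {(l, s, k, r): kernels[(k, s)] / mu[r] for (l, s, k, r) in extended_modes}
```

With the noise scale absorbed into the kernel in this way, every extended mode can share the same unit observation noise covariance in the correction step.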

    Two more remarks are in order.

    Remark 3: For dynamical models with unknown parameters, candidate dynamical models can be constructed and IMGM can learn the model parameters that best fit the data on-the-fly, thus circumventing extra offline model training. Such a joint system identification and state estimation problem has been considered in the KF literature along three prevailing lines. The first leverages the expectation-maximization algorithm to iterate between state estimation and system identification, but the online characteristic is compromised; see e.g., [33]. The second approach chooses the model parameters from a known dictionary, and applies the classical IMM approach to select the appropriate parameters online [18]. The third, more recent approach targets models with unknown process and observation noise covariance matrices, and employs variational Bayes to obtain pdf estimates [13].

    Remark 4: With only one graph and $\mathbf{F}^{l_t,\sigma_t}_t = \mathbf{0}$, IMGM offers an online probabilistic multi-kernel based alternative to reconstruct time-invariant (TI) graph processes, which complements rather nicely the deterministic multi-kernel KRR framework in [26]. For a fixed set of nodes, the complexity of IMGM for time-invariant graphs is the same as that for the time-varying case.

    V. NUMERICAL TESTS

    In this section, we evaluate the performance of the proposed IMGM approach using synthetic and real data, and compare it with existing algorithms, including the kernel Kalman filter (KKF) [25]; the adaptive least mean-square (LMS) algorithm [10] with bandwidth $B_{\mathrm{LMS}} \in \{2, 4, 6, \ldots, 20\}$ and step size $\mu_{\mathrm{LMS}} \in \{0.5, 0.6, 0.7, \ldots, 2\}$; as well as the distributed least-squares reconstruction (DLSR) [35] with bandwidth $B_{\mathrm{DLSR}} \in \{2, 4, 6, \ldots, 20\}$ and step sizes $\mu_{\mathrm{DLSR}} \in \{0.2, 0.4, 0.6, \ldots, 2\}$ and $\beta_{\mathrm{DLSR}} \in \{0.1, 0.2, \ldots, 0.9\}$. Both LMS and DLSR can track slowly time-varying $B$-bandlimited graph processes. Unless stated otherwise, the reported performance of LMS and DLSR is the best in terms of NMSE over the hyperparameters in the candidate sets. Also, we consider the oracle of IMGM (abbreviated as "IMGM-O"), which relies on the dynamical model (6), but with known $\sigma_t$. To compare on equal footing with LMS and DLSR, which cannot deal with time-varying observation matrices, we set $\mathbf{H}_t = \mathbf{H}$ for all $t \in \{1, \ldots, T\}$. For experiments with switching graphs, the competing algorithms know the active graph topology per slot $t$, whereas our mode-agnostic IMGM estimates $\sigma_t$ on-the-fly. The performance metric is the normalized mean-square error (NMSE) over unobserved nodes, which is given by

    $$\mathrm{NMSE}(t) := \frac{\|\mathbf{H}^{c}_{t}(\hat{\mathbf{x}}_{t|t} - \mathbf{x}_t)\|_2^2}{\|\mathbf{H}^{c}_{t}\mathbf{x}_t\|_2^2} \tag{31}$$

    where $\mathbf{H}^{c}_{t}$ is the sampling matrix for the unobserved nodes. Due to the random sampling scheme, the performance is averaged over 100 random sampling realizations.
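The metric in (31) amounts to restricting both the error and the true signal to the unobserved nodes. A minimal NumPy sketch is given next; the function name `nmse_unobserved` and the toy vectors are hypothetical, and the sampling matrix for the unobserved nodes is represented implicitly by a boolean mask rather than formed explicitly.

```python
import numpy as np

def nmse_unobserved(x_hat, x_true, observed_idx):
    """NMSE over unobserved nodes, cf. (31), for a single slot t."""
    mask = np.ones(x_true.shape[0], dtype=bool)
    mask[observed_idx] = False                  # keep only the unobserved nodes
    err = x_hat[mask] - x_true[mask]
    return float(err @ err / (x_true[mask] @ x_true[mask]))

# Toy example with four nodes, the first two of which are observed.
x_true = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = np.array([1.0, 2.0, 2.0, 5.0])
value = nmse_unobserved(x_hat, x_true, observed_idx=[0, 1])
```

In the experiments this quantity would additionally be averaged over the random sampling realizations.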

    A. Synthetic Data

    We consider a synthetic dynamic process over a network with $N = 60$ nodes, and $S = 2$ modes. The graph topologies associated with the two modes at slot 1 are generated by two symmetric Erdős-Rényi random graphs with edge existence probabilities 0.1 and 0.2, respectively. In the following slots, we randomly choose two pairs of nodes, and the edge between each pair is flipped relative to the previous slot per mode. The network switches from mode 1 to mode 2 at slot 6, and back to mode 1 at slot 11 over a total of $T = 15$ slots. The dynamic graph process $\mathbf{x}_t$ is generated according to (6) with $\mathbf{F}^{\sigma_t}_t = 0.2(\mathbf{A}^{\sigma_t}_t + \mathbf{I}_N)$ and $\boldsymbol{\eta}^{\sigma_t}_t \sim \mathcal{N}(\boldsymbol{\eta}^{\sigma_t}_t; \mathbf{0}, \mathbf{K}^{\sigma_t}_t)$, where $\mathbf{K}^{\sigma_t}_t$ is a diffusion kernel with $a = 0.1$. The observations are generated based on (1) with $M = 30$ and $\mathbf{R} = 4\mathbf{I}_M$. Only IMGM-O was compared with IMGM because the rest of the approaches have no information about the generative model. The average mode posterior probabilities produced by IMGM over 100 Monte-Carlo runs are shown in Fig. 2, which demonstrates that IMGM is capable of keeping track of the active network mode in the presence of unknown switches. Further, Fig. 3 plots the NMSE over time, and illustrates that IMGM achieves essentially the same NMSE as IMGM-O, which relies on extra information. Fig. 4 depicts the estimated processes along with the corresponding true values over an unobserved node. The close tracking of the true signal further validates IMGM's nearly optimal reconstruction performance.

    Fig. 2. Posterior mode pmfs of IMGM for synthetic data.

    Fig. 3. NMSE for synthetic data.

    Fig. 4. True and estimated processes over an unobserved node for synthetic data.
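The topology generation described for the synthetic test can be sketched as follows. This is an illustrative reading of the setup, not the authors' script: the helper names `erdos_renyi` and `flip_random_edges`, the random seed, and the choice to draw the two node pairs independently (so that they may coincide) are assumptions. The diffusion kernel is left out because its exact form is defined elsewhere in the paper.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Symmetric Erdos-Renyi adjacency matrix without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def flip_random_edges(A, n_pairs, rng):
    """Flip the edge between n_pairs randomly chosen node pairs, keeping symmetry."""
    A = A.copy()
    n = A.shape[0]
    for _ in range(n_pairs):
        i, j = rng.choice(n, size=2, replace=False)   # two distinct nodes
        A[i, j] = A[j, i] = 1.0 - A[i, j]
    return A

rng = np.random.default_rng(0)          # assumed seed, for reproducibility only
N, T = 60, 15
edge_prob = {1: 0.1, 2: 0.2}            # edge existence probability per mode

# Per-mode topology sequences: slot 1 is Erdos-Renyi, later slots flip two pairs.
A = {s: [erdos_renyi(N, p, rng)] for s, p in edge_prob.items()}
for t in range(1, T):
    for s in A:
        A[s].append(flip_random_edges(A[s][-1], 2, rng))

# Active mode per slot: mode 1 for slots 1-5, mode 2 for slots 6-10, mode 1 for slots 11-15.
sigma = [1] * 5 + [2] * 5 + [1] * 5
F = [0.2 * (A[sigma[t]][t] + np.eye(N)) for t in range(T)]
```

The list `F` then holds the state transition matrix of the active mode at each slot, which can be plugged into (6) together with a process noise draw.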

    B. Brain ECoG Dataset

    Next, we experiment with the brain ECoG data obtained from an epilepsy study [17]. The ECoG time series were obtained from $N = 76$ electrodes implanted in a patient's brain before and after a seizure, where the onset of the seizure was identified by a neurophysiologist. Therefore, there are $S = 2$ modes, the pre-ictal and the ictal mode, which correspond to before and after the seizure onset. We extract 250 samples from the dataset for each of the two modes, which are preprocessed by subtracting the sample mean and normalizing by the sample standard deviation. The preprocessed samples are then concatenated so that $\sigma_t = 1$ for $t = 1, \ldots, 250$, and $\sigma_t = 2$ for $t = 251, \ldots, 500$. We construct a time-invariant symmetric correlation graph for each of the two modes, which is a special case of the problem statement at the end of Section II. The ECoG signals are modeled to evolve based on (6), where the state transition matrix is $\mathbf{F}^{\sigma_t}_t = 0.15(\mathbf{A}^{\sigma_t} + \mathbf{I}_N)$, and the process noise covariance $\mathbf{K}^{\sigma_t}_t$ is a diffusion kernel with parameter $a = 2$. Here, the values 0.15 and $a = 2$ are selected to yield the lowest NMSE from the sets $\{0.1, 0.11, 0.12, \ldots, 0.3\}$ and $\{1, 1.2, 1.4, \ldots, 3\}$, respectively. The observations are generated as in (1) with $M = 53$, and $\mathbf{R} = 10^{-2}\mathbf{I}_M$.

    Fig. 5 shows the posterior mode probabilities $\{w^{s}_{t|t}\}_{s=1}^{2}$ produced by IMGM over 100 random sampling schemes. Here, IMGM plays the role of a "neurophysiologist" who detects the onset of an epileptic seizure. In addition, the NMSE of IMGM is comparable to that of the mode-clairvoyant IMGM-O, while markedly outperforming KKF, LMS and DLSR, as confirmed by Fig. 6. The NMSEs for all methods peak at the onset of the ictal period, while for LMS and KKF the NMSEs are considerably larger during the ictal period. As shown in Fig. 7, the estimated brain signals from IMGM and IMGM-O over an unobserved node agree quite well with the corresponding true values, which is however not the case for the rest of the approaches. Further, Fig. 8 demonstrates that IMGM enjoys lower NMSE as the number $M$ of sampled nodes grows.


    Fig. 5. Posterior mode pmfs of IMGM for ECoG data.

    Fig. 6. NMSE for ECoG data ($\mu_{\mathrm{LMS}} = 0.6$, $B_{\mathrm{LMS}} = 2$, $\mu_{\mathrm{DLSR}} = 1.2$, $B_{\mathrm{DLSR}} = 6$, $\beta_{\mathrm{DLSR}} = 0.5$).

    C. Temperature Prediction

    The next dataset comprises hourly temperature measurements at $N = 109$ measuring stations across the continental United States in 2010 by the National Climatic Data Center [1]. A time-invariant graph was constructed based on geographical distances as in [25]. Even though only one graph is available, IMGM can still be applied to track the dynamic processes and simultaneously learn the model parameters that best fit the data as in Section IV, which would otherwise need an offline training process. The value $x_t(n)$ represents the $t$th temperature sample recorded at the $n$th station. The sampling interval in our experiment is chosen to be one day. The number of observed nodes is $M = 44$, and the observation noise covariance is selected from the candidate set as $\mathbf{R}^{r_t} = \mu^{r_t}\mathbf{I}_M$, where $\mu^{r_t} \in 10^{-4} \times \{1, 2, \ldots, 5\}$. The transition matrix is taken as $\mathbf{F}^{l_t}_t = c^{l_t}(\mathbf{A} + \mathbf{I}_N)$, where $c^{l_t}$ takes values from 0.05 to 0.15 on a uniform grid with spacing 0.02. The process noise is given by a diffusion kernel with $a = 2$. Thus, IMGM is equipped with $5 \times 6 = 30$ candidate dynamical models, among which the best performing one is assigned to IMGM-O with $\mathbf{F}_t = 0.05(\mathbf{A} + \mathbf{I}_N)$ and $\mathbf{R} = 10^{-4}\mathbf{I}_M$.

    Fig. 7. True and estimated brain signals over an unobserved node for ECoG data ($\mu_{\mathrm{LMS}} = 0.6$, $B_{\mathrm{LMS}} = 2$, $\mu_{\mathrm{DLSR}} = 1.2$, $B_{\mathrm{DLSR}} = 6$, $\beta_{\mathrm{DLSR}} = 0.5$).

    Fig. 8. Overall NMSE of IMGM versus $M$ for ECoG data.

    As shown in Fig. 9, the mode-agnostic IMGM demonstrates superior tracking performance compared to KKF, LMS and DLSR, while it also showcases performance comparable to IMGM-O. Hence, IMGM is capable of selecting the dynamical model that best fits the data on-the-fly. Fig. 10 further corroborates this assertion by displaying the true and estimated temperature values from the candidate approaches over an unobserved node. The probability of each model being active is reported by the posterior mode pmf $w^{s}_{t|t}$, with $w^{1}_{t|t} \approx 1$, and $w^{s}_{t|t} \approx 0$ otherwise.


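    Fig. 9. NMSE for temperature data ($\mu_{\mathrm{LMS}} = 1.5$, $B_{\mathrm{LMS}} = 10$, $\mu_{\mathrm{DLSR}} = 1.2$, $B_{\mathrm{DLSR}} = 4$, $\beta_{\mathrm{DLSR}} = 0.5$).

    Fig. 10. True and estimated temperature values over an unobserved location ($\mu_{\mathrm{LMS}} = 1.5$, $B_{\mathrm{LMS}} = 10$, $\mu_{\mathrm{DLSR}} = 1.2$, $B_{\mathrm{DLSR}} = 4$, $\beta_{\mathrm{DLSR}} = 0.5$).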

    D. Network Delay Prediction

    The last dataset records measurements of path delays on the Internet2 backbone [2]. The network comprises 9 end-nodes and 26 directed links. There are $N = 70$ paths, each connecting two origin-destination nodes by a subset of the 26 links. The active links for each path are described by the path-link routing matrix $\mathbf{B} \in \{0, 1\}^{70 \times 26}$, whose $(n, l)$th entry $B_{n,l}$ is 1 if path $n$ traverses link $l$, and 0 otherwise. With each vertex representing one of these paths, an undirected graph is constructed with the $(n, n')$th entry ($n \neq n'$) of the adjacency matrix as

    $$A(n, n') = \frac{\sum_{l=1}^{26} B_{n,l}B_{n',l}}{\sum_{l=1}^{26} B_{n,l} + \sum_{l=1}^{26} B_{n',l} - \sum_{l=1}^{26} B_{n,l}B_{n',l}}$$

    which places large weights on vertices (paths) with a large number of common links. The graph process $x_t(n)$ represents the delay of path $n$ in minutes.

    Fig. 11. NMSE for network delay data ($\mu_{\mathrm{LMS}} = 1.5$, $B_{\mathrm{LMS}} = 12$).

    Fig. 12. True and estimated network delays over an unobserved path ($\mu_{\mathrm{LMS}} = 1.5$, $B_{\mathrm{LMS}} = 12$).
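The adjacency construction above is the ratio of shared links to the total number of distinct links used by two paths. A minimal NumPy sketch follows; the function name `path_adjacency` and the small routing matrix are hypothetical, and setting the diagonal to zero is an assumption, since the text specifies the entries only for $n \neq n'$.

```python
import numpy as np

def path_adjacency(B):
    """Adjacency among paths from a binary path-link routing matrix B (paths x links)."""
    B = np.asarray(B, dtype=float)
    common = B @ B.T                                   # number of links shared by n and n'
    counts = B.sum(axis=1)                             # number of links per path
    union = counts[:, None] + counts[None, :] - common
    A = np.divide(common, union, out=np.zeros_like(common), where=union > 0)
    np.fill_diagonal(A, 0.0)                           # assumed: no self-loops
    return A

# Toy routing matrix with three paths over four links (illustrative only).
B = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]])
A = path_adjacency(B)
```

In the toy example, the first two paths share one of their three distinct links and obtain weight 1/3, whereas the third path shares no link with the others and is left disconnected.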

    The number of observed nodes is selected to be $M = 20$. The candidate dynamical models for IMGM are configured as follows. The state transition matrix is selected to be $\mathbf{F}_t = 0.17(\mathbf{A} + \mathbf{I}_N)$. The process noise covariance $\mathbf{K}^{k_t}$ is chosen from a set of $K = 8$ diffusion kernels with parameter $a^{k_t}$ taking values from 0.6 to 2 with uniform spacing 0.2. The observation noise covariance for the observation model (30) is $\mathbf{R}^{r_t} = \mu^{r_t}\mathbf{I}_M$, with $\mu^{r_t} \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$. The number of candidate models for IMGM is then $\bar{S} = 8 \times 4 = 32$, among which IMGM-O is equipped with the best performing one: $a = 0.6$ and $\mathbf{R} = 10^{-2}\mathbf{I}_M$. For this experiment, we did not employ DLSR because it did not yield comparable performance to the rest of the alternatives.

    Adaptively choosing a model with kernel parameter $a^{k_t}$ and noise parameter $\mu^{r_t}$ from the candidate set, IMGM exhibits superior tracking performance compared to the single-model alternatives, as confirmed by Fig. 11. This also corroborates that IMGM provides a probabilistic multi-kernel learning approach. The estimated and true network delays over an unobserved path across the entire observation interval are plotted in Fig. 12.

    VI. CONCLUSION

    This paper dealt with tracking dynamic graph processes that evolve over a candidate set of graph topologies with unknown switches. To this end, a dynamical model was introduced to capture both spatial and temporal variations of the graph processes through the notion of active mode-conditioned topology. Subsequently, given observations over a subset of nodes, a scalable Bayesian tracker, termed IMGM, was developed to carry out semi-supervised tracking of the dynamic graph processes jointly with the active network mode. The novel IMGM solver lends itself to several important generalizations, including dynamic function switches, multiple kernels, and adaptive observation noise covariances. Accounting for all these models, IMGM offers an online approach to adaptively select the one that best fits the data, while simultaneously tracking the graph processes. Numerical tests on synthetic and real data corroborated the performance gain of the novel IMGM approach.

    REFERENCES

    [1] “1981-2010 U.S. climate normals,” [Online]. Available: https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/climate-normals/1981-2010-normals-data. Accessed: Apr. 29, 2019.

    [2] “One-way ping internet2,” [Online]. Available: http://software.internet2.edu/owamp/. Accessed: Apr. 29, 2019.

    [3] R. Agaev and P. Chebotarev, “The matrix of maximum out forests of a digraph and its applications,” Autom. Remote Control, vol. 61, no. 9, pp. 1424–1450, 2000.

    [4] R. P. Agaev and P. Chebotarev, “Spanning forests of a digraph and their applications,” Autom. Remote Control, vol. 62, no. 3, pp. 443–466, Mar. 2001.

    [5] B. Baingana and G. B. Giannakis, “Tracking switched dynamic network topologies from information cascades,” IEEE Trans. Signal Process., vol. 65, no. 4, pp. 985–997, Feb. 2017.

    [6] Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation With Applications to Tracking and Navigation: Theory Algorithms and Software. Hoboken, NJ, USA: Wiley, 2004.

    [7] H. A. Blom and Y. Bar-Shalom, “The interacting multiple model algorithm for systems with Markovian switching coefficients,” IEEE Trans. Autom. Control, vol. 33, no. 8, pp. 780–783, Aug. 1988.

    [8] D. Boley, G. Ranjan, and Z.-L. Zhang, “Commute times for a directed graph using an asymmetric Laplacian,” Linear Algebra its Appl., vol. 435, no. 2, pp. 224–242, Jul. 2011.

    [9] F. Chung, “Laplacians and the Cheeger inequality for directed graphs,” Ann. Combinatorics, vol. 9, no. 1, pp. 1–19, Apr. 2005.

    [10] P. Di Lorenzo, S. Barbarossa, P. Banelli, and S. Sardellitti, “Adaptive least mean-squares estimation of graph signals,” IEEE Trans. Signal Inf. Process. Over Netw., vol. 2, no. 4, pp. 555–568, Dec. 2016.

    [11] P. A. Forero, K. Rajawat, and G. B. Giannakis, “Prediction of partially observed dynamical processes over networks via dictionary learning,” IEEE Trans. Signal Process., vol. 62, no. 13, pp. 3305–3320, Jul. 2014.

    [12] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,” Proc. IEEE, vol. 106, no. 5, pp. 787–807, May 2018.

    [13] Y. Huang, Y. Zhang, Z. Wu, N. Li, and J. Chambers, “A novel adaptive Kalman filter with inaccurate process and measurement noise covariance matrices,” IEEE Trans. Autom. Control, vol. 63, no. 2, pp. 594–601, Feb. 2018.

    [14] V. N. Ioannidis, D. Romero, and G. B. Giannakis, “Inference of spatio-temporal functions over graphs via multikernel kriged Kalman filtering,” IEEE Trans. Signal Process., vol. 66, no. 12, pp. 3228–3239, Jun. 2018.

    [15] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models. Berlin, Germany: Springer, 2009.

    [16] R. I. Kondor and J. Lafferty, “Diffusion kernels on graphs and other discrete structures,” in Proc. Int. Conf. Mach. Learn., Jul. 2002, pp. 315–322.

    [17] M. A. Kramer, E. D. Kolaczyk, and H. E. Kirsch, “Emergent network topology at seizure onset in humans,” Epilepsy Res., vol. 79, no. 2/3, pp. 173–186, May 2008.

    [18] X. R. Li and Y. Bar-Shalom, “A recursive multiple model approach to noise identification,” IEEE Trans. Aerosp. Electron. Syst., vol. 30, no. 3, pp. 671–684, Jul. 1994.

    [19] X.-R. Li and Y. Bar-Shalom, “Design of an interacting multiple model algorithm for air traffic control tracking,” IEEE Trans. Control Syst. Technol., vol. 1, no. 3, pp. 186–194, Sep. 1993.

    [20] Q. Lu, V. Ioannidis, and G. B. Giannakis, “Semi-supervised tracking of dynamic processes over switching graphs,” in Proc. IEEE Data Sci. Workshop, Minneapolis, MN, USA, Jun. 2019, pp. 64–68.

    [21] Q. Lu, V. Ioannidis, G. B. Giannakis, and M. Coutino, “Learning graph processes with multiple dynamical models,” in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2019, pp. 1783–1787.

    [22] E. Mazor, A. Averbuch, Y. Bar-Shalom, and J. Dayan, “Interacting multiple model methods in target tracking: A survey,” IEEE Trans. Aerosp. Electron. Syst., vol. 34, no. 1, pp. 103–123, Jan. 1998.

    [23] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012.

    [24] N. Perraudin and P. Vandergheynst, “Stationary signal processing on graphs,” IEEE Trans. Signal Process., vol. 65, no. 13, pp. 3462–3477, Jul. 2017.

    [25] D. Romero, V. N. Ioannidis, and G. B. Giannakis, “Kernel-based reconstruction of space-time functions on dynamic graphs,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 856–869, Sep. 2017.

    [26] D. Romero, M. Ma, and G. B. Giannakis, “Kernel-based reconstruction of graph signals,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 764–778, Feb. 2017.

    [27] A. V. Savkin and R. J. Evans, Hybrid Dynamical Systems: Controller and Sensor Switching Problems. Berlin, Germany: Springer, 2002.

    [28] I. D. Schizas, G. B. Giannakis, S. I. Roumeliotis, and A. Ribeiro, “Consensus in Ad Hoc WSNs with noisy links—part II: Distributed estimation and smoothing of random signals,” IEEE Trans. Signal Process., vol. 56, no. 4, pp. 1650–1666, Apr. 2008.

    [29] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Reconstruction of graph signals through percolation from seeding nodes,” IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4363–4378, Aug. 2016.

    [30] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.

    [31] A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Learning Theory and Kernel Machines. Berlin, Germany: Springer, 2003, pp. 144–158.

    [32] P. A. Traganitis and G. B. Giannakis, “Sketched subspace clustering,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1663–1675, Apr. 2018.

    [33] E. A. Wan and A. T. Nelson, “Dual Kalman filtering methods for nonlinear prediction, smoothing and estimation,” in Proc. Advances Neural Inf. Process. Syst., 1997, pp. 793–799.

    [34] E. A. Wan and R. Van Der Merwe, “The unscented Kalman filter for nonlinear estimation,” in Proc. IEEE Adaptive Syst. for Signal Process., Commun., Control Symp., 2000, pp. 153–158.

    [35] X. Wang, M. Wang, and Y. Gu, “A distributed tracking algorithm for reconstruction of graph signals,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 4, pp. 728–740, Jun. 2015.

    [36] Q. Yu et al., “Application of graph theory to assess static and dynamic brain connectivity: Approaches for building brain graphs,” Proc. IEEE, vol. 106, no. 5, pp. 886–906, May 2018.


    Qin Lu (Member, IEEE) received the Ph.D. degree in electrical engineering from the University of Connecticut (UConn), Mansfield, CT, USA, in 2018. She is currently a Postdoctoral Research Associate with the University of Minnesota, Twin Cities, Minneapolis, MN, USA. Her current research interests include machine learning, data science, and network science. In the past, she has worked on statistical signal processing, multiple target tracking, and underwater acoustic communications. She was awarded the Summer Fellowship and the Doctoral Dissertation Fellowship at UConn. She was also the recipient of the Women of Innovation Award in Collegian Innovation and Leadership by the Connecticut Technology Council in March 2018.

    Vassilis N. Ioannidis (Student Member, IEEE) received the diploma in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 2015, and the M.Sc. degree in electrical engineering in 2017 from the University of Minnesota, Twin Cities, Minneapolis, MN, USA. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering. He received the Doctoral Dissertation Fellowship in 2019 from the University of Minnesota. He also received student travel awards from the IEEE Signal Processing Society in 2017–2018 and from the IEEE (2018). From 2014 to 2015, he was a middleware consultant for Oracle in Athens, Greece, and received a Performance Excellence Award. His research interests include deep graph learning, machine learning, big data analytics, and network science.

    Georgios B. Giannakis (Fellow, IEEE) received the diploma in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1981, and the M.Sc. degree in electrical engineering, the M.Sc. degree in mathematics, and the Ph.D. degree in electrical engineering from the University of Southern California (USC), Los Angeles, CA, USA, in 1983, 1986, and 1986, respectively. From 1982 to 1986, he was with the USC. He was a Faculty Member with the University of Virginia from 1987 to 1998, and since 1999, he has been a Professor with the University of Minnesota, Twin Cities, Minneapolis, MN, USA, where he holds an ADC Endowed Chair, a University of Minnesota McKnight Presidential Chair in ECE, and serves as the Director of the Digital Technology Center. His current research interests focus on data science, and network science with applications to the Internet of Things, and power networks with renewables. His general interests span the areas of statistical learning, communications, and networking, subjects on which he has authored or coauthored more than 460 journal papers, 760 conference papers, 25 book chapters, two edited books and two research monographs. He is the Coinventor of 33 issued patents, and the corecipient of 9 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received the IEEE-SPS Norbert Wiener Society Award (2019); EURASIP's A. Papoulis Society Award (2020); Technical Achievement Awards from the IEEE-SPS (2000) and from EURASIP (2005); the IEEE ComSoc Education Award (2019); the G. W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (2015). He is a fellow of the National Academy of Inventors, the European Academy of Sciences, and EURASIP. He has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SPS.
