Graph Signal Processing: Filterbanks, Sampling and Applications to Machine Learning and Video Coding
Antonio Ortega
Signal and Image Processing Institute, Department of Electrical Engineering
University of Southern California, Los Angeles, California
September 16, 2017
Acknowledgements
I Collaborators
- Sunil Narang (Microsoft), Godwin Shen (NASA-JPL), Akshay Gadde, Hilmi Egilmez (Qualcomm)
- Jessie Chao, Aamir Anis, Eduardo Pavez, Joanne Kao, Keng-Shih Lu (USC)
- Marco Levorato (UCI), Urbashi Mitra (USC), Salman Avestimehr (USC), Aly El Gamal (USC/Purdue), Eyal En Gad (USC), Niccolo Michelusi (USC/Purdue), Gene Cheung (NII), Pierre Vandergheynst (EPFL), Pascal Frossard (EPFL), David Shuman (Macalester College), Yuichi Tanaka (TUAT), David Taubman (UNSW), David Tay (La Trobe).
I Funding
- NASA (AIST-05-0081), NSF (CCF-1018977, CCF-1410009, CCF-1527874)
- Mitsubishi Electric, Samsung, LGE, Google, TUAT
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Motivation
Graphs provide a flexible model to represent many datasets:
I Examples in Euclidean domains
[Figure] (a) Computer graphics, (b) wireless sensor networks, (c) graphs on images
Motivation
I Examples in non-Euclidean settings
[Figure] (a) Social networks, (b) finite state machines (FSM): a combined ARQ-queue model, with states indexed by ARQ and queue state
Graph Signal Processing
I Given a graph (fixed or learned from data) and signals on that graph (a set of scalars associated with the vertices), define frequency, sampling, transforms, etc., in order to solve problems such as compression, denoising, and interpolation.
I Overview papers: [Shuman, Narang, Frossard, Ortega, Vandergheynst, SPM'2013], [Sandryhaila and Moura 2013]
Examples
I Sensor network
  I Relative positions of sensors (kNN), temperature
  I Does temperature vary smoothly?
I Social network
  I Friendship relationship, age
  I Are friends of similar age?
I Images
  I Pixel positions and similarity, pixel values
  I Discontinuities and smoothness
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Basic concepts
I Graph: vertices (nodes) connected via links (edges)
I Graph signal: a set of scalar/vector values defined on the vertices
[Figure] Example graph (27 vertices) with a graph signal defined on its vertices

Graph G = (V, E, w)
  Vertex set: V = {v1, v2, ...}
  Edge set: E = {(v1, v2), (v1, v3), ...}
  Edge weights: w, a set of weights a_ij
Graph signal: x = {x1, x2, ...}
h-hop neighborhood: N_h(i) = {j ∈ V : hop_dist(i, j) ≤ h}
Multiple algebraic representations
[Figure] Example graph (27 vertices)

I Graph G = (V, E, w).
I Adjacency matrix A = (a_ij): a_ij, a_ji are the weights of the link between i and j (they can differ if the graph is directed).
I Degree matrix D = diag{d_i}, in the case of an undirected graph.
I Various algebraic representations:
  I Normalized adjacency: (1/|λ_max|) A
  I Laplacian matrix: L = D − A
  I Symmetric normalized Laplacian: 𝓛 = D^{−1/2} L D^{−1/2}
I Graph signal: f = [f(1), f(2), ..., f(N)]
I Discussion:
  1. Undirected graphs are easier to work with
  2. Some applications require directed graphs
  3. Graphs with self-loops are useful (more on this later)
A minimal construction of these matrices is sketched below.
Graph Fourier Transform (GFT)
I Different results/insights for different choices of operator
I Laplacian L = D − A = U Λ U^T
I Eigenvectors of L: U = {u_k}, k = 1, ..., N
I Eigenvalues of L: diag(Λ) = {λ_1 ≤ λ_2 ≤ ... ≤ λ_N}
I The eigen-pairs (λ_k, u_k) provide a Fourier-like interpretation — the Graph Fourier Transform (GFT)
Graph Fourier Transform (GFT) – Intuition
I Finding eigenvalues/eigenvectors
I Rayleigh quotient formulation:

  λ_k = min_{u ⊥ u_1, ..., u_{k−1}} (u^T L u) / (u^T u)

I For the combinatorial Laplacian L = D − A we have:
  I u_1 = 1
  I λ_1 = 0
  I u^T L u = Σ_{i∼j} w_ij (u_i − u_j)²
I Interpretation: basis vectors (eigenvectors) are ordered by increasing variation on the graph (a numerical sketch follows).
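A minimal GFT sketch (assumed toy example on a 4-node path graph, not from the talk), verifying the variation ordering numerically:

```python
import numpy as np

A = np.diag(np.ones(3), 1); A = A + A.T        # 4-node path graph adjacency
L = np.diag(A.sum(1)) - A                      # combinatorial Laplacian

lam, U = np.linalg.eigh(L)                     # eigh returns lam in ascending order
x = np.array([1.0, 2.0, 2.0, 1.0])             # a graph signal
x_hat = U.T @ x                                # GFT coefficients <u_k, x>

# Variation interpretation: u_k^T L u_k = lam_k; lam[0] = 0 with constant u_1.
assert np.allclose(np.diag(U.T @ L @ U), lam)
```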
Eigenvectors of graph Laplacian
(a) λ = 0.00 (b) λ = 0.04 (c) λ = 0.20
(d) λ = 0.40 (e) λ = 1.20 (f) λ = 1.49
I Basic idea: variation on the graph, measured e.g. by f^T L f, increases as frequency increases
What makes a transform a “graph transform”?
[Diagram] Input signal → Transform → Output signal → Processing/Analysis

I Properties
  I Invertibility, critical sampling, orthogonality
I What makes these "graph transforms"?
  I Frequency interpretation: the operation is diagonalized by U
  I Vertex localization: a polynomial of the operator (A, L, etc.), of small degree (vs. a polynomial of degree N − 1)
I Transforms adapted to the graph: choosing the right graph
I Example designs in the vertex and frequency domains
Spatial Graph Transforms
I Designed in the vertex domain of the graph. Examples:
I Graph wavelets [Crovella'03]
I Approaches for WSN [Wang'06], [Wagner'05], [Shen, ICASSP'08]
I Lifting wavelets on arbitrary graphs [Narang and O., 2009]
I 1-hop averaging transform (see the sketch after this slide):

  y[n] = (1/d_n) Σ_{m=1}^{N} A[n, m] x[m]  ⇒  y = D^{−1} A x = P_rw x

I 1-hop difference transform:

  y[n] = (1/d_n) Σ_{m=1}^{N} A[n, m] (x[n] − x[m])  ⇒  y = L_rw x = x − P_rw x
Spectral Graph Wavelets
I Spectral wavelet transforms (Hammond et al., ACHA): design spectral kernels h(λ): σ(G) → ℝ.

  T_h = h(L) = U h(Λ) U^T,  h(Λ) = diag{h(λ_i)}
I Analogy: FFT implementation of filters
Vertex Localization: SGWT
I Polynomial kernel approximation:

  h(λ) ≈ Σ_{k=0}^{K} a_k λ^k  ⇒  T_h ≈ Σ_{k=0}^{K} a_k L^k

I Since A and L are both 1-hop operations, T_h is K-hop localized: no spectral decomposition is required (see the sketch below).
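A minimal sketch of such a polynomial filter (assumption: plain monomial coefficients a_k are given; Chebyshev recurrences, as in the SGWT paper, would be the numerically preferred variant):

```python
import numpy as np

def poly_filter(L, x, a):
    """Apply T_h x = sum_k a[k] L^k x using only matrix-vector products,
    so no eigendecomposition is needed and the output is K-hop localized."""
    y = a[0] * x
    v = x
    for a_k in a[1:]:
        v = L @ v                 # one more hop per power of L
        y = y + a_k * v
    return y
```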
Summary
I Different types of graphs (directed, undirected, with/without self loops)
I Multiple algebraic representations (A,L, ...)
I Eigenvectors/eigenvalues induce a notion of variation/frequency
I A,L can be viewed as “shifts” or “elementary operators”
I Polynomials of these operators represent “local” processing on the graph
I Choice of graph impacts processing performed
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Graph Filterbank Designs
I Formulation of critically sampled graph filterbank design problem
I Design filters using spectral techniques [Hammond et al. 2009].
I Orthogonal (not compactly supported) [Narang and O. TSP’12]
I Bi-Orthogonal (compactly supported) [Narang and O. TSP’13]
[Diagram] Analysis side: filter → downsample; synthesis side: upsample → filter
Downsampling/Upsampling in Graphs
Downsampling-upsampling operation:
I Regular signals:

  f_du(n) = f(n) if n = 2m;  0 if n = 2m + 1

I Graph signals:

  f_du(n) = f(n) if n ∈ S;  0 if n ∉ S,  for some set S.
(a) regular signal (b) regular signal after DU by 2
(c) graph signal (d) graph signal after DU by 2
I For regular signals, the DU-by-2 operation is equivalent to F_du(e^{jω}) = (1/2)(F(e^{jω}) + F(e^{−jω})) in the DFT domain.
I What is DU by 2 for graph signals in the GFT domain?
Downsampling in Graphs
I Define J_β = diag{β_H(n)}, with β_H(n) = +1 if n ∈ H and −1 otherwise, for a bipartite graph with vertex parts H and H^c.
I In vector form:

  f_du = (1/2)(f + J_β f) = (1/2)(f + f̃)

I Spectral folding [4]: for a bipartite graph, the GFT of f̃ = J_β f at λ equals f̂(2 − λ).
I This leads to solutions for bipartite graph filterbanks (a numerical illustration follows).
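A small numerical illustration (assumed toy example, not from the talk) of DU and spectral folding on a complete bipartite graph:

```python
import numpy as np

# Toy bipartite graph: H = {0, 1}, complement = {2, 3}.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
d = A.sum(1)
Dm = np.diag(d ** -0.5)
Lsym = np.eye(4) - Dm @ A @ Dm     # symmetric normalized Laplacian, spectrum in [0, 2]
lam, U = np.linalg.eigh(Lsym)

beta = np.array([1, 1, -1, -1])    # +1 on H, -1 on the complement
f = np.random.randn(4)
f_du = 0.5 * (f + beta * f)        # keeps the samples on H, zeros the rest

# Folding: if Lsym u = lam u, then Lsym (beta * u) = (2 - lam)(beta * u),
# so the GFT of (beta * f) appears at the mirrored frequencies 2 - lam.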
Wavelet filterbanks on bipartite graphs
I Aliasing cancellation ⇒ B = 0 if, for all λ ∈ σ(G):

  B(λ) = g_1(λ) h_1(2 − λ) − g_0(λ) h_0(2 − λ) = 0

I Perfect reconstruction ⇒ A = cI if, for all λ ∈ σ(G):

  A(λ) = g_1(λ) h_1(λ) + g_0(λ) h_0(λ) = c

I Orthogonal (approximated by polynomials) and biorthogonal (exactly polynomial) designs are possible [Narang and O., IEEE TSP 2012, 2013]
Bipartite Subgraph Decomposition
I But not all graphs are bipartite...
I Solution: "iteratively" decompose the non-bipartite graph G into K bipartite subgraphs such that:
  I each subgraph covers the same vertex set;
  I each edge in G belongs to exactly one bipartite subgraph.
Bipartite Subgraph Decomposition
I Example of a 2-dimensional (K = 2) decomposition:
“Multi-dimensional” Filterbanks on graphs
Two-dimensional two-channel filterbank on graphs:
[Diagram] Two-dimensional two-channel filterbank: a two-channel (downsample-by-2) filterbank applied in cascade on each bipartite subgraph
I Advantages:
  I Perfect reconstruction and orthogonality for any graph and any bipartite decomposition.
  I Metrics are defined to find "good" bipartite decompositions.
Example
Minnesota traffic graph and graph signal
Example
[Figure] Reconstructed graph signals for each channel: LL, LH, HL, and HH.
What about non-bipartite graphs? [Anis and O., ICASSP’17]
I Maintain original (non-bipartite) graph
I Use polynomial analysis/synthesis filters
I Computationally optimize sampling sets to minimize errors
φ(S) = ‖ G T_S^T T_S H − I ‖_F² = ‖ (1/2) Σ_{i∈S} g_i h_i^T − (1/2) Σ_{j∈S^c} g_j h_j^T ‖_F²
Basic greedy minimization
Initialize: S = ∅
1: while |S| < N do
2:   S ← S ∪ {u}, where u = argmin_{v∈S^c} φ(S ∪ {v})
3: end while
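A direct sketch of this greedy loop (the objective φ is left abstract: any callable on vertex sets; here N is the target sampling-set size):

```python
def greedy_min(phi, n_nodes, n_samples):
    S = set()
    while len(S) < n_samples:
        Sc = [v for v in range(n_nodes) if v not in S]
        u = min(Sc, key=lambda v: phi(S | {v}))  # best single addition
        S.add(u)
    return S
```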
Experiments
I H_i, G_i obtained using GraphBior(6,6) [4]
[Figure] Sampling scheme obtained for bipartite graphs (left); sampling scheme obtained on the Minnesota traffic graph (right)
[Figure] Spectral response |T(λ)| (top) and maximum aliasing component max_{μ≠λ} |T(μ)| (bottom) for the filterbank, over λ ∈ [0, 2]
Comparison against simple sampling schemes: mean reconstruction relative error over 1000 random signals

           graph-QMF [5] (poly. 8)   graphBior(6,6) [4]
Random     0.4842 ± 0.0113           0.4629 ± 0.0108
MaxCut     0.1125 ± 0.0069           0.0972 ± 0.0061
Proposed   0.0779 ± 0.0049           0.0664 ± 0.0045

[4] Narang and O., "Compact support biorthogonal wavelet filterbanks for arbitrary graphs", IEEE TSP, 2012.
[5] Narang and Ortega, "Perfect reconstruction two-channel wavelet filterbanks for graph-structured data", IEEE TSP, 2012.
Summary
I Two design approaches: vertex domain/spectral domain
I Polynomials of L serve as connection between domains
I Numerous overcomplete solutions
I Critical sampling is possible for a special case (bipartite graphs) if we use a polynomial reconstruction.
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Sampling
I Sample signal at discrete points in time/space
I Core tool in Digital Signal Processing (DSP)
I From analog to digital
I Digital to digital
I Key concept: signals can be recovered from their samples
I Questions:
I What properties enable recovery?
I How to sample?
I How to reconstruct?
Graph Signal Sampling Examples
I Sensor network
  I Relative positions of sensors (kNN), temperature
  I Measure temperature at a subset of sensors
I Social network
  I Friendship relationship, age
  I Estimate interests for a subset of users
I Images
  I Pixel positions and similarity, pixel values
  I New image sampling techniques
Sampling
Traditional DSP
I Samples dropped in a regular fashion, spectral folding (aliasing).
I Cutoff frequency ⇔ sampling rate.
[Diagram] Bandlimited signal → sampling → reconstruction
Answers:
I What properties enable recovery? Signals are smooth, low frequency
I How to sample? Regular sampling
I How to reconstruct? Low pass filtering
Graph Sampling?
I Measure a few nodes to estimate information throughout the graph
I Reconstruct signal in whole graph
Questions:
I What properties enable recovery? Need to define frequency
I How to sample? No obvious regular sampling
I How to reconstruct? Filtering is needed
Graph sampling results
Conditions for unique bandlimited reconstruction
I [Pesenson '08], [Anis et al. '14-'15], [Chen et al. '15], [Shomorony and Avestimehr '14]
I These results assume that the GFT basis is known
Sampling set selection

Vertex domain methods:
• [Narang and Ortega '10], [Nguyen and Do '15]
• ensure S and S^c are well connected
• optimize a max-cut-like criterion
• used in graph filterbank design
• do not consider stability of bandlimited (BL) reconstruction

Spectral domain methods:
• [Chen et al. '15], [Shomorony and Avestimehr '14]
• assume the GFT basis U is known
• select a subset of rows for stability
• high computational cost, unsuitable for large graphs
Goal: a GFT-free method of sampling set selection for stable BL reconstruction [Anis et al. '14-'15]
Necessary and sufficient condition [Anis, Gadde, O., ICASSP ’14]
I If φ ∈ PW_ω(G), then g = f + φ ∈ PW_ω(G).
I f ≠ g and f(S) = g(S) ⇒ trouble!

Lemma. Let L²(S^c) = {φ : φ(S) = 0}. All signals f ∈ PW_ω(G) can be perfectly recovered from S if and only if PW_ω(G) ∩ L²(S^c) = {0}.
Sampling theorem [Anis, Gadde, O., ICASSP ’14]
Theorem (Sampling theorem)
All signals f ∈ PW_ω(G) can be perfectly recovered from their samples f(S) if and only if

  ω < inf_{φ ∈ L²(S^c)} ω(φ) ≜ ω_c(S)

We call ω_c(S) the true cutoff frequency.
The cutoff frequency depends on the size of S and on the topologies of G and S. To optimize the sampling set, maximize the cutoff frequency.
Sampling set optimization
Formulation. Relax the true cutoff ω_c(S) to Ω_k(S), an approximation based on the k-th power of the Laplacian, and solve:

  Minimize |S| subject to Ω_k(S) ≥ ω_c

Greedy approach to estimate S_opt [Anis, Gadde and O., ICASSP 2014, TSP 2016] (a sketch follows):
I Start with S = ∅.
I At each iteration, estimate the lowest-frequency eigenvector φ*_k of L²(S^c).
I Add nodes from S^c to S one by one, so as to maximize the increase in Ω_k(S) (the estimated cutoff of L²(S^c)) at each step.
I Choose the new node i with the largest entry |φ*_k(i)|.
I Essentially, this finds nodes that are "far" from S at each iteration.
I Note: most alternative methods require knowledge of the GFT.
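A hedged sketch of this greedy spectral-proxy selection (dense NumPy version for clarity; the paper's large-scale variant would use iterative eigensolvers such as LOBPCG instead of a full eigh):

```python
import numpy as np

def greedy_proxy_sampling(L, n_samples, k=2):
    """At each step, find the smoothest signal supported on S^c (smallest
    eigenvector of (L^k) restricted to S^c) and sample where |phi*| peaks."""
    n = L.shape[0]
    Lk = np.linalg.matrix_power(L, k)
    S = []
    for _ in range(n_samples):
        Sc = np.setdiff1d(np.arange(n), S)
        _, V = np.linalg.eigh(Lk[np.ix_(Sc, Sc)])
        phi_star = V[:, 0]                      # lowest-frequency eigenvector
        S.append(int(Sc[np.argmax(np.abs(phi_star))]))
    return S
```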
Sampling set optimization – Intuition

Cutoff frequency ≡ bandwidth of the smoothest signal φ* in L²(S^c).

Which choice of S leads to a higher cutoff frequency?
I Compare φ* ∈ L²(S^c) for both graphs.
I More cross-links ⇒ higher variation ⇒ higher bandwidth.
Summary
I Graph sampling formulation similar to regular domain
I Lack of regular structure prevents regular sampling
I Vertex domain vs spectral solutions
I Simplified approach using spectral proxies
Learning: Motivation and Problem Definition
I Unlabeled data is abundant. Labeled data is expensive and scarce.
I Solution: active semi-supervised learning (SSL).
I Classification performance can be improved when both labeled and unlabeled data are used for training.
I Questions:
  I How to predict unknown labels from the known labels?
  I What is the optimal set of nodes to label, given the learning algorithm?
Graph-based semi-supervised learning
Key Idea:
I Construct a distance-based similarity graph (see the sketch below).
  I Data points → nodes.
  I Edge weights → similarity.
I Sample (label) some of the data points.
I Treat class indicator vectors as graph signals that are
  I consistent with the known labels, and
  I smooth on the graph.
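A sketch of the similarity-graph construction (toy O(n²)-memory version; parameter names are illustrative, with Gaussian weights w_ij = exp(−‖x_i − x_j‖²/(2σ²)) as used in the experiments later):

```python
import numpy as np

def knn_graph(X, K=10, sigma=1.0):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    drop = np.argsort(W, axis=1)[:, :-K]   # indices of all but the K largest
    np.put_along_axis(W, drop, 0.0, axis=1)
    return np.maximum(W, W.T)              # symmetrize
```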
Why does this work?
Connection to Graph Sampling
I Class membership functions can be approximated by bandlimited graph signals.
[Figure 3] Cumulative distribution of energy in the GFT coefficients of one of the class membership functions for the three real-world datasets considered in Section 5: (a) USPS, (b) Isolet, (c) 20 newsgroups. Most of the energy is concentrated in the low-pass region.
The second term is expected to be negligible compared to the first one due to differencing, and we get

  x^T L x ≈ Σ_{j∈S^c} (p_j / d_j) x_j²,   (24)

where p_j = Σ_{i∈S} w_ij is defined as the "partial out-degree" of node j ∈ S^c, i.e., the sum of the weights of the edges crossing over to the set S. Therefore, given the currently selected S, the greedy algorithm selects the next node to add to S so as to maximize the increase in

  Ω_1(S) ≈ inf_{‖x‖=1} Σ_{j∈S^c} (p_j / d_j) x_j².   (25)
Due to the constraint ‖x‖ = 1, the expression being minimized is essentially an infimum over a convex combination of the fractional out-degrees, and its value is largely determined by nodes j ∈ S^c for which p_j/d_j is small. In other words, we must worry about those nodes that have a low ratio of partial degree to actual degree. Thus, in the simplest case, our selection algorithm tries to remove from the unlabeled set those nodes that are weakly connected to nodes in the labeled set. This makes intuitive sense: in the end, most prediction algorithms involve propagation of labels from the labeled to the unlabeled nodes. If an unlabeled node is strongly connected to numerous labeled points, its label can be assigned with greater confidence.

Note that using a higher power k in the cost function, i.e., finding Ω_k(S) for k > 1, involves x^T L^k x, which, loosely speaking, takes into account higher-order interactions between the nodes while choosing the nodes to label. In a sense, we expect it to capture connectivity in a more global sense, beyond local interactions, taking into account the underlying manifold structure of the data.
3.3 Complexity
We now comment on the time and space complexity of our algorithm. The most complex step in the greedy procedure for maximizing Ω_k(S) is computing the smallest eigen-pair of (L^k)_{S^c}. This can be accomplished using an iterative Rayleigh-quotient minimization algorithm; specifically, the locally optimal preconditioned conjugate gradient (LOPCG) method [14] is suitable for this approach. Note that (L^k)_{S^c} can be written as I_{S^c,V} · L · L ⋯ L · I_{V,S^c}, so the eigenvalue computation can be broken into atomic matrix-vector products Lx. Typically, the graphs encountered in learning applications are sparse, leading to efficient implementations of Lx. If |L| denotes the number of non-zero elements in L, then the complexity of the matrix-vector product is O(|L|). The complexity of each eigen-pair computation for (L^k)_{S^c} is then O(k|L|r), where r is a constant equal to the average number of iterations required by LOPCG (r depends on the spectral properties of L and is independent of its size |V|). The complexity of the label selection algorithm then becomes O(k|L|mr), where m is the number of labels requested.

In the iterative reconstruction algorithm, since we use polynomial graph filters (Section 2.5), the atomic step is again the matrix-vector product Lx. The complexity of this algorithm is O(|L|pq), where p is the order of the polynomial used to design the filter and q is the average number of iterations required for convergence; both parameters are independent of |V|. Thus, the overall complexity of our algorithm is O(|L|(kmr + pq)). In addition, our algorithm has major advantages in terms of space complexity: since the atomic operation at each step is the matrix-vector product Lx, we only need to store L and a constant number of vectors. Moreover, the structure of the Laplacian matrix allows one to perform the aforementioned operations in a distributed fashion. This makes the method well suited for large-scale implementations using software packages such as GraphLab [16].
3.4 Prediction Error and Number of Labels
As discussed in Section 2.5, given the samples f(S) of the true graph signal on a subset of nodes S ⊂ V, its estimate on S^c is obtained by solving the following problem:

  f̂(S^c) = U_{S^c,K} α*,  where α* = arg min_α ‖U_{S,K} α − f(S)‖   (26)

Here, K is the index set of eigenvectors with eigenvalues less than the cutoff ω_c(S). If the true signal f ∈ PW_{ω_c(S)}(G), then the prediction is perfect. However, this is not the case in most problems. The prediction error ‖f − f̂‖ roughly equals the portion of energy of the true signal in the [ω_c(S), λ_N] frequency band. By choosing the sampling set S that maximizes ω_c(S), we try to capture most of the signal energy and thus reduce the prediction error.

An important question in the context of active learning is determining the minimum number of labels required so that ...
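A least-squares sketch of the bandlimited reconstruction in eq. (26) (dense version; function and parameter names are illustrative):

```python
import numpy as np

def bl_reconstruct(L, S, f_S, n_freqs):
    """Fit the first n_freqs GFT coefficients to the known samples f(S),
    then synthesize the estimate everywhere (including S^c)."""
    lam, U = np.linalg.eigh(L)
    alpha, *_ = np.linalg.lstsq(U[S, :n_freqs], f_S, rcond=None)
    return U[:, :n_freqs] @ alpha
```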
Summary of the Algorithm [Anis, Gadde, O., KDD 2014]
[Diagram] Input data → construct graph → choose nodes to label by maximizing the cutoff frequency → query labels of chosen nodes → predict labels by signal reconstruction
Results: Real Datasets
[Figure] Classification accuracy vs. percentage of labeled data (1-10%), comparing Random, LLGC Bound, METIS, LLR, and the proposed method on three datasets:

I USPS: handwritten digits
  I x_i = 16×16 image
  I number of classes = 10
  I K-NN graph with K = 10
  I w_ij = exp(−‖x_i − x_j‖² / (2σ²))
I ISOLET: spoken letters
  I x_i ∈ R^617 speech features
  I number of classes = 26
  I K-NN graph with K = 10
  I w_ij = exp(−‖x_i − x_j‖² / (2σ²))
I Newsgroups: documents
  I x_i ∈ R^3000 tf-idf of words
  I number of classes = 10
  I K-NN graph with K = 10
  I w_ij = x_i^T x_j / (‖x_i‖ ‖x_j‖)
Summary
I Graph-based approaches were previously proposed for SSL, including [Zhu et al., 2005]
I The main novelty is a GSP formulation for:
  I reconstruction/interpolation (better filters)
  I sampling (labeling)
I Initial work towards addressing fundamental problems: how does label complexity relate to sampling?
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Motivation
I Fundamental problem: what is the best graph to use for a given task?
  I In some cases the graph is given, e.g., an online social network
  I Otherwise: model-based or data-driven approach
I Basic idea: provide a representation such that "likely" signals in the dataset correspond to low graph frequencies
I The connection with the KLT/PCA will be important
Assumptions
I We have a data matrix X = [x_1, ..., x_N] ∈ R^{n×N}.
I The k-th row of X is attached to the k-th vertex of the graph.
I Each x_i is a graph signal on an unknown graph.
I Sensor network: each vertex is a sensor, the signal is a measurement/time series (fMRI, climate data).
I Images: each vertex is a pixel, the signal is a color value.
Basic Formulation
Basic Problem
I Data model: attractive Gaussian Markov Random Field (a-GMRF) ⇔ Gaussian with a Generalized Laplacian (GL) precision matrix.
  I Q = P + L, with P diagonal (self-loop matrix) and L a combinatorial Laplacian.
  I Q = (q_ij) with q_ij ≤ 0 for all i ≠ j.
I Graph estimation: maximum likelihood under the a-GMRF model. Use block coordinate descent to solve

  min_{Q is GL} −log det(Q) + tr(QS),  with S = (1/N) X X^T.
Gaussian Markov Random Fields
I GMRF ⇔ multivariate Gaussian.
I Work with the precision (inverse covariance) matrix Θ.
I x is Gaussian N(0, Θ^{−1}) with density

  p(x) = (det(Θ)^{1/2} / (2π)^{n/2}) exp(−(1/2) x^T Θ x)

  Model           Θ ≻ 0   sign of Θ   sign of Θ^{−1}   u_1
  GMRF            YES     ANY         ANY              ANY
  a-GMRF          YES     GL          ≥ 0              u_1 ≥ 0
  Θ = σ²I + L     YES     GL          ≥ 0              u_1 = 1
Estimation methods
I GMRF: e.g., graphical lasso [Friedman et al., 2008].
I a-GMRF: [Slawski and Hein, 2015], [Pavez and Ortega, 2016].
I Θ = σ²I + L: proposed by [Lake and Tenenbaum, 2010]; efficient algorithm in [Egilmez, Pavez and O., 2016].
Precision matrix estimation
Compute the empirical covariance matrix S = (1/N) Σ_{i=1}^{N} x_i x_i^T.

ℓ1-regularized inverse covariance estimator (GMRF):

  min_{Θ ≻ 0} −log det(Θ) + tr(ΘS) + ρ‖Θ‖_1

ℓ1-regularized GL estimation (a-GMRF):

  min_{Q ≻ 0, q_ij ≤ 0} −log det(Q) + tr(QS) + ρ Σ_{i≠j} |q_ij|
  ⇔ min_{Q ≻ 0, q_ij ≤ 0} −log det(Q) + tr(QK),  with K = S − ρ(11^T − I).

I Key difference: the off-diagonal entries of a Generalized Laplacian are all non-positive, so there is no need for the absolute value (a baseline sketch follows).
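A baseline for the first (unconstrained GMRF) problem via scikit-learn's graphical lasso; this is only a sketch of the standard estimator, not of the GL-constrained (a-GMRF) variant, which needs a dedicated block coordinate descent solver:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

X = np.random.randn(500, 20)   # hypothetical data: 500 signals on 20 vertices
Theta = GraphicalLasso(alpha=0.1).fit(X).precision_  # estimated inverse covariance
```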
Experiment: Natural image graph
I Randomly sample image patches of size 16×16 from the Berkeley segmentation dataset BSD500 (around 3×10^5 patches from 200 images)
I Estimate the GL matrix from the data, where each patch is a graph signal and each pixel corresponds to a vertex (256 vertices).

[Figure] Generalized Laplacian sparsity (left), graph weights shown on a 2D grid (right).
Experiment: Texture graph
We consider wood textures from the Brodatz dataset, at 0 and 60 degrees of rotation. For each texture, take 8×8 blocks, compute their covariance matrix S, and solve the GL estimation problem with ρ = 0.

[Figure] Texture graphs using our GL estimation (only off-diagonal elements of Q). The graphs have |E| = 130 and |E| = 117 edges, respectively.
Summary
I Learning a graph from data under different assumptions on weights (sign,structure)
I Main idea: approximate inverse covariance
I Generalized Laplacian assumption allows efficient solutions
I Can incorporate structural constraints (which edges can be non-zero)
Video coding application
I Goal: design transforms for video (intra/inter) prediction residuals
I Basic idea: designing the graph G ⇔ designing L ⇔ designing the GBT
I Cases:
  I Fixed: transforms are fixed across all blocks
  I Mode-dependent: transforms are designed offline and tied to each mode
  I Block-adaptive-TS: transforms are designed per block based on edge information
  I RDO-TS: the best transform is selected per block from a given set of transforms using an RD criterion.
Average Design: Graph Learning
2D GMRF modeling for residuals ⇔ learning Generalized Graph Laplacians:

  maximize_{w, v}  log det(L) − tr(LS)
  subject to  L = B diag(w) B^T + diag(v),  w ≥ 0    (1)

B: incidence matrix of a given graph (connectivity); S: sample covariance; w: edge weights; v: self-loop (vertex) weights. A convex-optimization sketch follows.
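A hedged CVXPY transcription of problem (1) (B and S are assumed given; the nonnegativity of v is an assumption of this sketch, and the paper's dedicated solver would be faster):

```python
import cvxpy as cp

def learn_ggl(B, S):
    n, m = B.shape
    w = cp.Variable(m, nonneg=True)            # edge weights
    v = cp.Variable(n, nonneg=True)            # self-loop weights (assumed >= 0)
    L = B @ cp.diag(w) @ B.T + cp.diag(v)      # structured GGL, affine in (w, v)
    # log_det is concave and trace is affine, so this is a convex problem.
    prob = cp.Problem(cp.Maximize(cp.log_det(L) - cp.trace(L @ S)))
    prob.solve()
    return L.value
```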
Graph Learning for Intra Predicted Residuals
[Figure] For each intra prediction mode (DC, horizontal (mode 10), planar, diagonal (mode 18)): residual variance, learned graph weights, and learned self-loop weights.
Graph Learning for Inter Predicted Residuals
HEVC (inter mode): eight PU partitions in HEVC. We define PU-modes and develop PU-adaptive transforms.

This is a simplification: HEVC commonly uses transforms crossing partition boundaries.
Graph Learning for Inter Predicted Residuals
[Figure] For each PU-mode: residual variance, learned graph weights, and learned self-loop weights.
Experimental Setup – GBT vs. KLT
Compare: GBT vs. KLT
[Diagram] Encoder used for the comparison (HEVC setting): residual blocks → transform selection from a set (DCT, average designs, ...) → quantization → entropy coding of coefficients and mode information, with λ-driven RDO selection.
I Average designs:
  I Intra: (i) 35 non-separable GBTs vs. (ii) 35 KLTs
  I Inter: (i) 7 non-separable GBTs vs. (ii) 7 KLTs
I λ = 0.85 × 2^{(QP−12)/3}
Summary: Compression
I RDO-TS is better than MD-TS on average
I RDO w/ EA-GBT tends to improve more for large blocks

  BD-rate   MD-TS    RDO-TS   RDO-TS w/ EA-GBT
  Inter     -0.27    -2.66    -2.89
  Intra     -2.23    -6.30    -6.76

I Rich set of extensions of existing transforms (DCT/ADST)
I Promising results: currently investigating full integration into a codec
Conclusions and FAQs
I GSP has deep roots in the signal processing community
  I DCT [Strang'99], image segmentation using graph cuts [Shi and Malik'00], semi-supervised learning [Belkin, Niyogi'04], [Zhu et al. '05]
I A lot of progress, interesting results
  I Filterbanks, sampling, graph learning
I Many open questions!
I FAQs
  I Scaling to large graphs
  I Graph choice
  I Killer app
I To get started: (Shuman et al., SPM'13), (Sandryhaila & Moura, SPM'14)
References I
S.K. Narang and A. Ortega. Lifting based wavelet transforms on graphs. In Proc. of Asia Pacific Signal and Information Processing Association (APSIPA), October 2009.
S. Narang, G. Shen, and A. Ortega. Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks. In Proc. of ICASSP'10.
S. Narang and A. Ortega. Downsampling graphs using spectral theory. In Proc. of ICASSP'11.
G. Shen and A. Ortega. Transform-based distributed data gathering. IEEE Transactions on Signal Processing.
G. Shen, S. Pattem, and A. Ortega. Energy-efficient graph-based wavelets for distributed coding in wireless sensor networks. In Proc. of ICASSP'09, April 2009.
G. Shen, S. Narang, and A. Ortega. Adaptive distributed transforms for irregularly sampled wireless sensor networks. In Proc. of ICASSP'09, April 2009.
G. Shen and A. Ortega. Tree-based wavelets for image coding: Orthogonalization and tree selection. In Proc. of PCS'09, May 2009.
R. Baraniuk, A. Cohen, and R. Wagner. Approximation and compression of scattered data by meshless multiscale decompositions. Applied and Computational Harmonic Analysis, 25(2):133-147, September 2008.
R.R. Coifman and M. Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53-94, 2006.
References II
M. Crovella and E. Kolaczyk. Graph wavelets for spatial traffic analysis. In IEEE INFOCOM, 2003.
I. Daubechies, I. Guskov, P. Schröder, and W. Sweldens. Wavelets on irregular point sets. Phil. Trans. R. Soc. Lond. A, 357(1760):2397-2413, September 1999.
W. Ding, F. Wu, X. Wu, S. Li, and H. Li. Adaptive directional lifting-based wavelet transform for image coding. IEEE Transactions on Image Processing, 16(2):416-427, February 2007.
D. Jungnickel. Graphs, Networks and Algorithms. Springer-Verlag, 2nd edition, 2004.
M. Maitre and M.N. Do. Shape-adaptive wavelet encoding of depth maps. In Proc. of PCS'09, 2009.
Y. Morvan, P.H.N. de With, and D. Farin. Platelet-based coding of depth maps for the transmission of multiview images. SPIE, vol. 6055, 2006.
E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing, 14(4):423-438, April 2005.
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput., 48(1):96-129, 1998.
A. Said and W.A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243-250, June 1996.
References III
A. Sanchez, G. Shen, and A. Ortega. Edge-preserving depth-map coding using graph-based wavelets. In Proc. of Asilomar'09, 2009.
G. Valiente. Algorithms on Trees and Graphs. Springer, 1st edition, 2002.
V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Directionlets: Anisotropic multidirectional representation with separable filtering. IEEE Transactions on Image Processing, 15(7):1916-1933, July 2006.
R. Wagner, H. Choi, R. Baraniuk, and V. Delouille. Distributed wavelet transform for irregular sensor network grids. In IEEE Stat. Sig. Proc. Workshop (SSP), July 2005.
R. Wagner, R. Baraniuk, S. Du, D.B. Johnson, and A. Cohen. An architecture for distributed wavelet analysis and processing in sensor networks. In IPSN'06, April 2006.
A. Wang and A. Chandrakasan. Energy-efficient DSPs for wireless sensor networks. IEEE Signal Processing Magazine, 19(4):68-78, July 2002.
S.K. Narang, G. Shen, and A. Ortega. Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks. pp. 2902-2905, ICASSP'10, Dallas, April 2010.
S.K. Narang and A. Ortega. Local two-channel critically sampled filter-banks on graphs. Intl. Conf. on Image Proc. (ICIP), 2010.
References IV
R.R. Coifman and M. Maggioni. Diffusion wavelets. Appl. Comp. Harm. Anal., 21(1):53-94, 2006.
A. Sandryhaila and J. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 2013.
D.K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, March 2011.
D. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. Signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular data domains. IEEE Signal Processing Magazine, May 2013.
M. Crovella and E. Kolaczyk. Graph wavelets for spatial traffic analysis. In INFOCOM 2003, March 2003, vol. 3, pp. 1848-1857.
G. Shen and A. Ortega. Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting. In ICASSP'08, April 2008, pp. 2513-2516.
W. Wang and K. Ramchandran. Random multiresolution representations for arbitrary sensor network graphs. In ICASSP, May 2006, vol. 4.
R. Wagner, H. Choi, R. Baraniuk, and V. Delouille. Distributed wavelet transform for irregular sensor network grids. In IEEE Stat. Sig. Proc. Workshop (SSP), July 2005.
S.K. Narang and A. Ortega. Lifting based wavelet transforms on graphs. APSIPA ASC'09, 2009.
References V
B. Zeng and J. Fu. Directional discrete cosine transforms for image coding. In Proc. of ICME 2006, 2006.
E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Trans. Image Proc., 14(4):423-438, April 2005.
V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Directionlets: Anisotropic multidirectional representation with separable filtering. IEEE Trans. Image Proc., 15(7):1916-1933, July 2006.
Y. Morvan, P.H.N. de With, and D. Farin. Platelet-based coding of depth maps for the transmission of multiview images. In Proceedings of SPIE: Stereoscopic Displays and Applications, vol. 6055, 2006.
M. Tanimoto, T. Fujii, and K. Suzuki. View synthesis algorithm in view synthesis reference software 2.0 (VSRS2.0). Tech. Rep. Document M16090, ISO/IEC JTC1/SC29/WG11, February 2009.
S.K. Narang and A. Ortega. Local two-channel critically-sampled filter-banks on graphs. In ICIP'10, September 2010.
S.K. Narang and A. Ortega. Perfect reconstruction two-channel wavelet filter-banks for graph structured data. IEEE Trans. on Sig. Proc., 60(6), June 2012.
J. Pérez-Trufero, S.K. Narang, and A. Ortega. Distributed transforms for efficient data gathering in arbitrary networks. In ICIP'11, September 2011.
E. Martínez-Enríquez, F. Díaz-de-María, and A. Ortega. Video encoder based on lifting transforms on graphs. In ICIP'11, September 2011.
References VI
I. Daubechies, I. Guskov, P. Schröder, and W. Sweldens. Wavelets on irregular point sets. Phil. Trans. R. Soc. Lond. A, 1999.
G. Strang. The discrete cosine transform. SIAM Review, 41(1):135-147, 1999.
S.K. Narang and A. Ortega. Downsampling graphs using spectral theory. In ICASSP'11, May 2011.
I. Pesenson. Sampling in Paley-Wiener spaces on combinatorial graphs. Trans. Amer. Math. Soc., 360(10):5603-5627, 2008.
M. Levorato, S.K. Narang, U. Mitra, and A. Ortega. Reduced dimension policy iteration for wireless network control via multiscale analysis. Globecom, 2012.
A. Gjika, M. Levorato, A. Ortega, and U. Mitra. Online learning in wireless networks via directed graph lifting transform. Allerton, 2012.
W.W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452-473, 1977.
S.K. Narang, Y.H. Chao, and A. Ortega. Graph-wavelet filterbanks for edge-aware image processing. IEEE SSP Workshop, pp. 141-144, August 2012.
A. Sandryhaila and J.M.F. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644-1656, 2013.
References VII
S.K. Narang and A. Ortega. Compact support biorthogonal wavelet filterbanks for arbitrary undirected graphs. IEEE Transactions on Signal Processing, 2013.
V. Ekambaram, G. Fanti, B. Ayazifar, and K. Ramchandran. Critically-sampled perfect-reconstruction spline-wavelet filterbanks for graph signals. IEEE GlobalSIP, 2013.
E. Pavez and A. Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. ICASSP, 2016.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. Preprint.
V. Kalofolias. How to learn a graph from smooth signals. Preprint.
B. Lake and J. Tenenbaum. Discovering structure by learning sparse graphs. Proceedings of the 33rd Annual Cognitive Science Conference.
C. Zhang and D. Florencio. Analyzing the optimality of predictive transform coding using graph-based models. IEEE Signal Processing Letters, January 2013.
M. Maier, U. von Luxburg, and M. Hein. How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, January 2013.
T. Hofmann, B. Schölkopf, and A.J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36(3):1171-1220, June 2008.
References VIII
T. Biyikoglu, J. Leydold, and P.F. Stadler. Laplacian eigenvectors of graphs. Lecture Notes in Mathematics, vol. 1915, 2007.
M. Püschel and J.M.F. Moura. The algebraic approach to the discrete cosine and sine transforms and their fast algorithms. SIAM Journal on Computing, 32(5), 2003.
G. Strang. The discrete cosine transform. SIAM Review, 41(1), 1999.
M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, vol. 473, 2015.
W. Hu, G. Cheung, and A. Ortega. Intra-prediction and generalized graph Fourier transform for image coding. IEEE Signal Processing Letters, 22(11), 2015.
E. Pavez, H.E. Egilmez, Y. Wang, and A. Ortega. GTT: Graph template transforms with applications to image coding. Picture Coding Symposium (PCS), 2015.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.
P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, vol. 5, 2011.
J. Deri and J. Moura. Spectral projector-based graph Fourier transforms. IEEE Journal on Selected Topics in Signal Processing, submitted, 2016.
References IX
E. Pavez and A. Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. ICASSP, 2016.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 2016.
V. Kalofolias. How to learn a graph from smooth signals. AISTATS, 2016.
B. Lake and J. Tenenbaum. Discovering structure by learning sparse graphs. Proceedings of the 33rd Annual Cognitive Science Conference.
S. Hassan-Moghaddam, N.K. Dhingra, and M. Jovanovic. Topology identification of undirected consensus networks via sparse inverse covariance estimation. CDC, 2016.
T. Biyikoglu, J. Leydold, and P.F. Stadler. Laplacian eigenvectors of graphs. Lecture Notes in Mathematics, vol. 1915, 2007.
M. Gavish, B. Nadler, and R.R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. ICML, 2010.
G. Shen and A. Ortega. Transform-based distributed data gathering. IEEE Transactions on Signal Processing, 2010.
S.K. Narang and A. Ortega. Perfect reconstruction two-channel wavelet filter banks for graph structured data. IEEE Transactions on Signal Processing, 2012.
References X
V.Y. Tan, A. Anandkumar, and A.S. Willsky. Learning Gaussian tree models: Analysis of error exponents and extremal structures. IEEE Transactions on Signal Processing, 2010.
C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 1968.
S. Yang, Q. Sun, S. Ji, P. Wonka, I. Davidson, and J. Ye. Structural graphical lasso for learning mouse brain connectivity. ACM SIGKDD, 2015.
P.-L. Loh and M.J. Wainwright. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. NIPS, 2012.
C.-J. Hsieh, A. Banerjee, I.S. Dhillon, and P.K. Ravikumar. A divide-and-conquer method for sparse inverse covariance estimation. NIPS, 2012.
M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, vol. 473, 2015.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.
P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, vol. 5, 2011.
M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 1995.