Graph Signal Processing: Filterbanks, Sampling and Applications to Machine Learning and Video Coding
Antonio Ortega
Signal and Image Processing Institute, Department of Electrical Engineering
University of Southern California, Los Angeles, California
September 16, 2017
Acknowledgements
I Collaborators
- Sunil Narang (Microsoft), Godwin Shen (NASA-JPL), Akshay Gadde, Hilmi Egilmez (Qualcomm)
- Jessie Chao, Aamir Anis, Eduardo Pavez, Joanne Kao, Keng-Shih Lu (USC)
- Marco Levorato (UCI), Urbashi Mitra (USC), Salman Avestimehr (USC), Aly El Gamal (USC/Purdue), Eyal En Gad (USC), Niccolo Michelusi (USC/Purdue), Gene Cheung (NII), Pierre Vandergheynst (EPFL), Pascal Frossard (EPFL), David Shuman (Macalester College), Yuichi Tanaka (TUAT), David Taubman (UNSW), David Tay (La Trobe).
I Funding
- NASA (AIST-05-0081), NSF (CCF-1018977, CCF-1410009, CCF-1527874)
- Mitsubishi Electric, Samsung, LGE, Google, TUAT
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Motivation
Graphs provide a flexible model to represent many datasets:
I Examples in Euclidean domains
[Figure] (a) Computer graphics, (b) wireless sensor networks, (c) graphs on images
Motivation
I Examples in non-Euclidean settings
[Figure] (a) Social networks, (b) finite state machines (FSM): a combined ARQ-queue model, with states indexed by ARQ and queue state
Graph Signal Processing
I Given a graph (fixed or learned from data) and signals on that graph (a set of scalars associated with the vertices), define frequency, sampling, transforms, etc., in order to solve problems such as compression, denoising, and interpolation.
I Overview papers: [Shuman, Narang, Frossard, Ortega, Vandergheynst, SPM'2013], [Sandryhaila and Moura 2013]
Examples
I Sensor network
  I Relative positions of sensors (kNN), temperature
  I Does temperature vary smoothly?
I Social network
  I Friendship relationship, age
  I Are friends of similar age?
I Images
  I Pixel positions and similarity, pixel values
  I Discontinuities and smoothness
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Basic concepts
I Graph: vertices (nodes) connected via links (edges)
I Graph signal: a set of scalar/vector values defined on the vertices
[Figure] Example graph (27 vertices) with a graph signal defined on its vertices

Graph G = (V, E, w)
  Vertex set: V = {v1, v2, ...}
  Edge set: E = {(v1, v2), (v1, v3), ...}
  Edge weights: w, a set of weights a_ij
Graph signal: x = {x1, x2, ...}
h-hop neighborhood: N_h(i) = {j ∈ V : hop_dist(i, j) ≤ h}
Multiple algebraic representations
[Figure] Example graph (27 vertices)

I Graph G = (V, E, w).
I Adjacency matrix A = (a_ij): a_ij, a_ji are the weights of the link between i and j (they can differ if the graph is directed).
I Degree matrix D = diag{d_i}, in the case of an undirected graph.
I Various algebraic representations:
  I Normalized adjacency: (1/|λ_max|) A
  I Laplacian matrix: L = D − A
  I Symmetric normalized Laplacian: 𝓛 = D^{−1/2} L D^{−1/2}
I Graph signal: f = [f(1), f(2), ..., f(N)]
I Discussion:
  1. Undirected graphs are easier to work with
  2. Some applications require directed graphs
  3. Graphs with self-loops are useful (more on this later)
A minimal construction of these matrices is sketched below.
Graph Fourier Transform (GFT)
I Different results/insights for different choices of operator
I Laplacian L = D − A = U Λ U^T
I Eigenvectors of L: U = {u_k}, k = 1, ..., N
I Eigenvalues of L: diag(Λ) = {λ_1 ≤ λ_2 ≤ ... ≤ λ_N}
I The eigen-pairs (λ_k, u_k) provide a Fourier-like interpretation — the Graph Fourier Transform (GFT)
Graph Fourier Transform (GFT) – Intuition
I Finding eigenvalues/eigenvectors
I Rayleigh quotient formulation:

  λ_k = min_{u ⊥ u_1, ..., u_{k−1}} (u^T L u) / (u^T u)

I For the combinatorial Laplacian L = D − A we have:
  I u_1 = 1
  I λ_1 = 0
  I u^T L u = Σ_{i∼j} w_ij (u_i − u_j)²
I Interpretation: basis vectors (eigenvectors) are ordered by increasing variation on the graph (a numerical sketch follows).
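A minimal GFT sketch (assumed toy example on a 4-node path graph, not from the talk), verifying the variation ordering numerically:

```python
import numpy as np

A = np.diag(np.ones(3), 1); A = A + A.T        # 4-node path graph adjacency
L = np.diag(A.sum(1)) - A                      # combinatorial Laplacian

lam, U = np.linalg.eigh(L)                     # eigh returns lam in ascending order
x = np.array([1.0, 2.0, 2.0, 1.0])             # a graph signal
x_hat = U.T @ x                                # GFT coefficients <u_k, x>

# Variation interpretation: u_k^T L u_k = lam_k; lam[0] = 0 with constant u_1.
assert np.allclose(np.diag(U.T @ L @ U), lam)
```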
Eigenvectors of graph Laplacian
(a) λ = 0.00 (b) λ = 0.04 (c) λ = 0.20
(d) λ = 0.40 (e) λ = 1.20 (f) λ = 1.49
I Basic idea: variation on the graph, measured e.g. by f^T L f, increases as frequency increases
What makes a transform a “graph transform”?
[Diagram] Input signal → Transform → Output signal → Processing/Analysis

I Properties
  I Invertibility, critical sampling, orthogonality
I What makes these "graph transforms"?
  I Frequency interpretation: the operation is diagonalized by U
  I Vertex localization: a polynomial of the operator (A, L, etc.), of small degree (vs. a polynomial of degree N − 1)
I Transforms adapted to the graph: choosing the right graph
I Example designs in the vertex and frequency domains
Spatial Graph Transforms
I Designed in the vertex domain of the graph. Examples:
I Graph wavelets [Crovella'03]
I Approaches for WSN [Wang'06], [Wagner'05], [Shen, ICASSP'08]
I Lifting wavelets on arbitrary graphs [Narang and O., 2009]
I 1-hop averaging transform (see the sketch after this slide):

  y[n] = (1/d_n) Σ_{m=1}^{N} A[n, m] x[m]  ⇒  y = D^{−1} A x = P_rw x

I 1-hop difference transform:

  y[n] = (1/d_n) Σ_{m=1}^{N} A[n, m] (x[n] − x[m])  ⇒  y = L_rw x = x − P_rw x
Spectral Graph Wavelets
I Spectral wavelet transforms (Hammond et al., ACHA): design spectral kernels h(λ): σ(G) → ℝ.

  T_h = h(L) = U h(Λ) U^T,  h(Λ) = diag{h(λ_i)}
I Analogy: FFT implementation of filters
Vertex Localization: SGWT
I Polynomial kernel approximation:

  h(λ) ≈ Σ_{k=0}^{K} a_k λ^k  ⇒  T_h ≈ Σ_{k=0}^{K} a_k L^k

I Since A and L are both 1-hop operations, T_h is K-hop localized: no spectral decomposition is required (see the sketch below).
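A minimal sketch of such a polynomial filter (assumption: plain monomial coefficients a_k are given; Chebyshev recurrences, as in the SGWT paper, would be the numerically preferred variant):

```python
import numpy as np

def poly_filter(L, x, a):
    """Apply T_h x = sum_k a[k] L^k x using only matrix-vector products,
    so no eigendecomposition is needed and the output is K-hop localized."""
    y = a[0] * x
    v = x
    for a_k in a[1:]:
        v = L @ v                 # one more hop per power of L
        y = y + a_k * v
    return y
```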
Summary
I Different types of graphs (directed, undirected, with/without self loops)
I Multiple algebraic representations (A,L, ...)
I Eigenvectors/eigenvalues induce a notion of variation/frequency
I A,L can be viewed as “shifts” or “elementary operators”
I Polynomials of these operators represent “local” processing on the graph
I Choice of graph impacts processing performed
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Graph Filterbank Designs
I Formulation of critically sampled graph filterbank design problem
I Design filters using spectral techniques [Hammond et al. 2009].
I Orthogonal (not compactly supported) [Narang and O. TSP’12]
I Bi-Orthogonal (compactly supported) [Narang and O. TSP’13]
[Diagram] Analysis side: filter → downsample; synthesis side: upsample → filter
Downsampling/Upsampling in Graphs
Downsampling-upsampling operation:
I Regular signals:

  f_du(n) = f(n) if n = 2m;  0 if n = 2m + 1

I Graph signals:

  f_du(n) = f(n) if n ∈ S;  0 if n ∉ S,  for some set S.
(a) regular signal (b) regular signal after DU by 2
(c) graph signal (d) graph signal after DU by 2
I For regular signals, the DU-by-2 operation is equivalent to F_du(e^{jω}) = (1/2)(F(e^{jω}) + F(e^{−jω})) in the DFT domain.
I What is DU by 2 for graph signals in the GFT domain?
Downsampling in Graphs
I Define J_β = diag{β_H(n)}, with β_H(n) = +1 if n ∈ H and −1 otherwise, for a bipartite graph with vertex parts H and H^c.
I In vector form:

  f_du = (1/2)(f + J_β f) = (1/2)(f + f̃)

I Spectral folding [4]: for a bipartite graph, the GFT of f̃ = J_β f at λ equals f̂(2 − λ).
I This leads to solutions for bipartite graph filterbanks (a numerical illustration follows).
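A small numerical illustration (assumed toy example, not from the talk) of DU and spectral folding on a complete bipartite graph:

```python
import numpy as np

# Toy bipartite graph: H = {0, 1}, complement = {2, 3}.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
d = A.sum(1)
Dm = np.diag(d ** -0.5)
Lsym = np.eye(4) - Dm @ A @ Dm     # symmetric normalized Laplacian, spectrum in [0, 2]
lam, U = np.linalg.eigh(Lsym)

beta = np.array([1, 1, -1, -1])    # +1 on H, -1 on the complement
f = np.random.randn(4)
f_du = 0.5 * (f + beta * f)        # keeps the samples on H, zeros the rest

# Folding: if Lsym u = lam u, then Lsym (beta * u) = (2 - lam)(beta * u),
# so the GFT of (beta * f) appears at the mirrored frequencies 2 - lam.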
Wavelet filterbanks on bipartite graphs
I Aliasing cancellation ⇒ B = 0 if, for all λ ∈ σ(G):

  B(λ) = g_1(λ) h_1(2 − λ) − g_0(λ) h_0(2 − λ) = 0

I Perfect reconstruction ⇒ A = cI if, for all λ ∈ σ(G):

  A(λ) = g_1(λ) h_1(λ) + g_0(λ) h_0(λ) = c

I Orthogonal (approximated by polynomials) and biorthogonal (exactly polynomial) designs are possible [Narang and O., IEEE TSP 2012, 2013]
Bipartite Subgraph Decomposition
I But not all graphs are bipartite...
I Solution: "iteratively" decompose the non-bipartite graph G into K bipartite subgraphs such that:
  I each subgraph covers the same vertex set;
  I each edge in G belongs to exactly one bipartite subgraph.
Bipartite Subgraph Decomposition
I Example of a 2-dimensional (K = 2) decomposition:
“Multi-dimensional” Filterbanks on graphs
Two-dimensional two-channel filterbank on graphs:
[Diagram] Two-dimensional two-channel filterbank: a two-channel (downsample-by-2) filterbank applied in cascade on each bipartite subgraph
I Advantages:
  I Perfect reconstruction and orthogonality for any graph and any bipartite decomposition.
  I Metrics are defined to find "good" bipartite decompositions.
Example
Minnesota traffic graph and graph signal
Example
[Figure] Reconstructed graph signals for each channel: LL, LH, HL, and HH.
What about non-bipartite graphs? [Anis and O., ICASSP’17]
I Maintain original (non-bipartite) graph
I Use polynomial analysis/synthesis filters
I Computationally optimize sampling sets to minimize errors
φ(S) = ‖ G T_S^T T_S H − I ‖_F² = ‖ (1/2) Σ_{i∈S} g_i h_i^T − (1/2) Σ_{j∈S^c} g_j h_j^T ‖_F²
Basic greedy minimization
Initialize: S = ∅
1: while |S| < N do
2:   S ← S ∪ {u}, where u = argmin_{v∈S^c} φ(S ∪ {v})
3: end while
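A direct sketch of this greedy loop (the objective φ is left abstract: any callable on vertex sets; here N is the target sampling-set size):

```python
def greedy_min(phi, n_nodes, n_samples):
    S = set()
    while len(S) < n_samples:
        Sc = [v for v in range(n_nodes) if v not in S]
        u = min(Sc, key=lambda v: phi(S | {v}))  # best single addition
        S.add(u)
    return S
```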
Experiments
I H_i, G_i obtained using GraphBior(6,6) [4]
[Figure] Sampling scheme obtained for bipartite graphs (left); sampling scheme obtained on the Minnesota traffic graph (right)
[Figure] Spectral response |T(λ)| (top) and maximum aliasing component max_{μ≠λ} |T(μ)| (bottom) for the filterbank, over λ ∈ [0, 2]
Comparison against simple sampling schemes: mean reconstruction relative error over 1000 random signals

           graph-QMF [5] (poly. 8)   graphBior(6,6) [4]
Random     0.4842 ± 0.0113           0.4629 ± 0.0108
MaxCut     0.1125 ± 0.0069           0.0972 ± 0.0061
Proposed   0.0779 ± 0.0049           0.0664 ± 0.0045

[4] Narang and O., "Compact support biorthogonal wavelet filterbanks for arbitrary graphs", IEEE TSP, 2012.
[5] Narang and Ortega, "Perfect reconstruction two-channel wavelet filterbanks for graph-structured data", IEEE TSP, 2012.
Summary
I Two design approaches: vertex domain/spectral domain
I Polynomials of L serve as connection between domains
I Numerous overcomplete solutions
I Critical sampling is possible for a special case (bipartite graphs) if we use a polynomial reconstruction.
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Sampling
I Sample signal at discrete points in time/space
I Core tool in Digital Signal Processing (DSP)
I From analog to digital
I Digital to digital
I Key concept: signals can be recovered from their samples
I Questions:
I What properties enable recovery?
I How to sample?
I How to reconstruct?
Graph Signal Sampling Examples
I Sensor network
  I Relative positions of sensors (kNN), temperature
  I Measure temperature at a subset of sensors
I Social network
  I Friendship relationship, age
  I Estimate interests for a subset of users
I Images
  I Pixel positions and similarity, pixel values
  I New image sampling techniques
Sampling
Traditional DSP
I Samples dropped in a regular fashion, spectral folding (aliasing).
I Cutoff frequency ⇔ sampling rate.
[Diagram] Bandlimited signal → sampling → reconstruction
Answers:
I What properties enable recovery? Signals are smooth, low frequency
I How to sample? Regular sampling
I How to reconstruct? Low pass filtering
Graph Sampling?
I Measure a few nodes to estimate information throughout the graph
I Reconstruct signal in whole graph
Questions:
I What properties enable recovery? Need to define frequency
I How to sample? No obvious regular sampling
I How to reconstruct? Filtering is needed
Graph sampling results
Conditions for unique bandlimited reconstruction
I [Pesenson '08], [Anis et al. '14-'15], [Chen et al. '15], [Shomorony and Avestimehr '14]
I These results assume that the GFT basis is known
Sampling set selection

Vertex domain methods:
• [Narang and Ortega '10], [Nguyen and Do '15]
• ensure S and S^c are well connected
• optimize a max-cut-like criterion
• used in graph filterbank design
• do not consider stability of bandlimited (BL) reconstruction

Spectral domain methods:
• [Chen et al. '15], [Shomorony and Avestimehr '14]
• assume the GFT basis U is known
• select a subset of rows for stability
• high computational cost, unsuitable for large graphs
Goal: a GFT-free method of sampling set selection for stable BL reconstruction [Anis et al. '14-'15]
Necessary and sufficient condition [Anis, Gadde, O., ICASSP ’14]
I If φ ∈ PW_ω(G), then g = f + φ ∈ PW_ω(G).
I f ≠ g and f(S) = g(S) ⇒ trouble!

Lemma. Let L²(S^c) = {φ : φ(S) = 0}. All signals f ∈ PW_ω(G) can be perfectly recovered from S if and only if PW_ω(G) ∩ L²(S^c) = {0}.
Sampling theorem [Anis, Gadde, O., ICASSP ’14]
Theorem (Sampling theorem)
All signals f ∈ PW_ω(G) can be perfectly recovered from their samples f(S) if and only if

  ω < inf_{φ ∈ L²(S^c)} ω(φ) ≜ ω_c(S)

We call ω_c(S) the true cutoff frequency.
The cutoff frequency depends on the size of S and on the topologies of G and S. To optimize the sampling set, maximize the cutoff frequency.
Sampling set optimization
Formulation. Relax the true cutoff ω_c(S) to Ω_k(S), an approximation based on the k-th power of the Laplacian, and solve:

  Minimize |S| subject to Ω_k(S) ≥ ω_c

Greedy approach to estimate S_opt [Anis, Gadde and O., ICASSP 2014, TSP 2016] (a sketch follows):
I Start with S = ∅.
I At each iteration, estimate the lowest-frequency eigenvector φ*_k of L²(S^c).
I Add nodes from S^c to S one by one, so as to maximize the increase in Ω_k(S) (the estimated cutoff of L²(S^c)) at each step.
I Choose the new node i with the largest entry |φ*_k(i)|.
I Essentially, this finds nodes that are "far" from S at each iteration.
I Note: most alternative methods require knowledge of the GFT.
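A hedged sketch of this greedy spectral-proxy selection (dense NumPy version for clarity; the paper's large-scale variant would use iterative eigensolvers such as LOBPCG instead of a full eigh):

```python
import numpy as np

def greedy_proxy_sampling(L, n_samples, k=2):
    """At each step, find the smoothest signal supported on S^c (smallest
    eigenvector of (L^k) restricted to S^c) and sample where |phi*| peaks."""
    n = L.shape[0]
    Lk = np.linalg.matrix_power(L, k)
    S = []
    for _ in range(n_samples):
        Sc = np.setdiff1d(np.arange(n), S)
        _, V = np.linalg.eigh(Lk[np.ix_(Sc, Sc)])
        phi_star = V[:, 0]                      # lowest-frequency eigenvector
        S.append(int(Sc[np.argmax(np.abs(phi_star))]))
    return S
```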
Sampling set optimization – Intuition

Cutoff frequency ≡ bandwidth of the smoothest signal φ* in L²(S^c).

Which choice of S leads to a higher cutoff frequency?
I Compare φ* ∈ L²(S^c) for both graphs.
I More cross-links ⇒ higher variation ⇒ higher bandwidth.
Summary
I Graph sampling formulation similar to regular domain
I Lack of regular structure prevents regular sampling
I Vertex domain vs spectral solutions
I Simplified approach using spectral proxies
Learning: Motivation and Problem Definition
I Unlabeled data is abundant. Labeled data is expensive and scarce.
I Solution: active semi-supervised learning (SSL).
I Classification performance can be improved when both labeled and unlabeled data are used for training.
I Questions:
  I How to predict unknown labels from the known labels?
  I What is the optimal set of nodes to label, given the learning algorithm?
Graph-based semi-supervised learning
Key Idea:
I Construct a distance-based similarity graph (see the sketch below).
  I Data points → nodes.
  I Edge weights → similarity.
I Sample (label) some of the data points.
I Treat class indicator vectors as graph signals that are
  I consistent with the known labels, and
  I smooth on the graph.
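A sketch of the similarity-graph construction (toy O(n²)-memory version; parameter names are illustrative, with Gaussian weights w_ij = exp(−‖x_i − x_j‖²/(2σ²)) as used in the experiments later):

```python
import numpy as np

def knn_graph(X, K=10, sigma=1.0):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    drop = np.argsort(W, axis=1)[:, :-K]   # indices of all but the K largest
    np.put_along_axis(W, drop, 0.0, axis=1)
    return np.maximum(W, W.T)              # symmetrize
```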
Why does this work?
Connection to Graph Sampling
I Class membership functions can be approximated by bandlimited graph signals.
[Figure 3] Cumulative distribution of energy in the GFT coefficients of one of the class membership functions for the three real-world datasets considered in Section 5: (a) USPS, (b) Isolet, (c) 20 newsgroups. Most of the energy is concentrated in the low-pass region.
The second term is expected to be negligible compared to the first one due to differencing, and we get

  x^T L x ≈ Σ_{j∈S^c} (p_j / d_j) x_j²,   (24)

where p_j = Σ_{i∈S} w_ij is defined as the "partial out-degree" of node j ∈ S^c, i.e., the sum of the weights of the edges crossing over to the set S. Therefore, given the currently selected S, the greedy algorithm selects the next node to add to S so as to maximize the increase in

  Ω_1(S) ≈ inf_{‖x‖=1} Σ_{j∈S^c} (p_j / d_j) x_j².   (25)
Due to the constraint ‖x‖ = 1, the expression being minimized is essentially an infimum over a convex combination of the fractional out-degrees, and its value is largely determined by nodes j ∈ S^c for which p_j/d_j is small. In other words, we must worry about those nodes that have a low ratio of partial degree to actual degree. Thus, in the simplest case, our selection algorithm tries to remove from the unlabeled set those nodes that are weakly connected to nodes in the labeled set. This makes intuitive sense: in the end, most prediction algorithms involve propagation of labels from the labeled to the unlabeled nodes. If an unlabeled node is strongly connected to numerous labeled points, its label can be assigned with greater confidence.

Note that using a higher power k in the cost function, i.e., finding Ω_k(S) for k > 1, involves x^T L^k x, which, loosely speaking, takes into account higher-order interactions between the nodes while choosing the nodes to label. In a sense, we expect it to capture connectivity in a more global sense, beyond local interactions, taking into account the underlying manifold structure of the data.
3.3 Complexity
We now comment on the time and space complexity of our algorithm. The most complex step in the greedy procedure for maximizing Ω_k(S) is computing the smallest eigen-pair of (L^k)_{S^c}. This can be accomplished using an iterative Rayleigh-quotient minimization algorithm; specifically, the locally optimal preconditioned conjugate gradient (LOPCG) method [14] is suitable for this approach. Note that (L^k)_{S^c} can be written as I_{S^c,V} · L · L ⋯ L · I_{V,S^c}, so the eigenvalue computation can be broken into atomic matrix-vector products Lx. Typically, the graphs encountered in learning applications are sparse, leading to efficient implementations of Lx. If |L| denotes the number of non-zero elements in L, then the complexity of the matrix-vector product is O(|L|). The complexity of each eigen-pair computation for (L^k)_{S^c} is then O(k|L|r), where r is a constant equal to the average number of iterations required by LOPCG (r depends on the spectral properties of L and is independent of its size |V|). The complexity of the label selection algorithm then becomes O(k|L|mr), where m is the number of labels requested.

In the iterative reconstruction algorithm, since we use polynomial graph filters (Section 2.5), the atomic step is again the matrix-vector product Lx. The complexity of this algorithm is O(|L|pq), where p is the order of the polynomial used to design the filter and q is the average number of iterations required for convergence; both parameters are independent of |V|. Thus, the overall complexity of our algorithm is O(|L|(kmr + pq)). In addition, our algorithm has major advantages in terms of space complexity: since the atomic operation at each step is the matrix-vector product Lx, we only need to store L and a constant number of vectors. Moreover, the structure of the Laplacian matrix allows one to perform the aforementioned operations in a distributed fashion. This makes the method well suited for large-scale implementations using software packages such as GraphLab [16].
3.4 Prediction Error and Number of Labels
As discussed in Section 2.5, given the samples f(S) of the true graph signal on a subset of nodes S ⊂ V, its estimate on S^c is obtained by solving the following problem:

  f̂(S^c) = U_{S^c,K} α*,  where α* = arg min_α ‖U_{S,K} α − f(S)‖   (26)

Here, K is the index set of eigenvectors with eigenvalues less than the cutoff ω_c(S). If the true signal f ∈ PW_{ω_c(S)}(G), then the prediction is perfect. However, this is not the case in most problems. The prediction error ‖f − f̂‖ roughly equals the portion of energy of the true signal in the [ω_c(S), λ_N] frequency band. By choosing the sampling set S that maximizes ω_c(S), we try to capture most of the signal energy and thus reduce the prediction error.

An important question in the context of active learning is determining the minimum number of labels required so that ...
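A least-squares sketch of the bandlimited reconstruction in eq. (26) (dense version; function and parameter names are illustrative):

```python
import numpy as np

def bl_reconstruct(L, S, f_S, n_freqs):
    """Fit the first n_freqs GFT coefficients to the known samples f(S),
    then synthesize the estimate everywhere (including S^c)."""
    lam, U = np.linalg.eigh(L)
    alpha, *_ = np.linalg.lstsq(U[S, :n_freqs], f_S, rcond=None)
    return U[:, :n_freqs] @ alpha
```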
Summary of the Algorithm [Anis, Gadde, O., KDD 2014]
[Diagram] Input data → construct graph → choose nodes to label by maximizing the cutoff frequency → query labels of chosen nodes → predict labels by signal reconstruction
Results: Real Datasets
[Figure] Classification accuracy vs. percentage of labeled data (1-10%), comparing Random, LLGC Bound, METIS, LLR, and the proposed method on three datasets:

I USPS: handwritten digits
  I x_i = 16×16 image
  I number of classes = 10
  I K-NN graph with K = 10
  I w_ij = exp(−‖x_i − x_j‖² / (2σ²))
I ISOLET: spoken letters
  I x_i ∈ R^617 speech features
  I number of classes = 26
  I K-NN graph with K = 10
  I w_ij = exp(−‖x_i − x_j‖² / (2σ²))
I Newsgroups: documents
  I x_i ∈ R^3000 tf-idf of words
  I number of classes = 10
  I K-NN graph with K = 10
  I w_ij = x_i^T x_j / (‖x_i‖ ‖x_j‖)
Summary
I Graph-based approaches were previously proposed for SSL, including [Zhu et al., 2005]
I The main novelty is a GSP formulation for:
  I reconstruction/interpolation (better filters)
  I sampling (labeling)
I Initial work towards addressing fundamental problems: how does label complexity relate to sampling?
Outline
Motivation
Basic Concepts
Graph Transforms – Filterbanks
Graph Sampling – Machine Learning
Graph Learning from Data – Compression
Motivation
I Fundamental problem: what is the best graph to use for a given task?
  I In some cases the graph is given, e.g., an online social network
  I Otherwise: model-based or data-driven approach
I Basic idea: provide a representation such that "likely" signals in the dataset correspond to low graph frequencies
I The connection with the KLT/PCA will be important
Assumptions
I We have a data matrix X = [x_1, ..., x_N] ∈ R^{n×N}.
I The k-th row of X is attached to the k-th vertex of the graph.
I Each x_i is a graph signal on an unknown graph.
I Sensor network: each vertex is a sensor, the signal is a measurement/time series (fMRI, climate data).
I Images: each vertex is a pixel, the signal is a color value.
Basic Formulation
Basic Problem
I Data model: attractive Gaussian Markov Random Field (a-GMRF) ⇔ Gaussian with a Generalized Laplacian (GL) precision matrix.
  I Q = P + L, with P diagonal (self-loop matrix) and L a combinatorial Laplacian.
  I Q = (q_ij) with q_ij ≤ 0 for all i ≠ j.
I Graph estimation: maximum likelihood under the a-GMRF model. Use block coordinate descent to solve

  min_{Q is GL} −log det(Q) + tr(QS),  with S = (1/N) X X^T.
Gaussian Markov Random Fields
I GMRF ⇔ multivariate Gaussian.
I Work with the precision (inverse covariance) matrix Θ.
I x is Gaussian N(0, Θ^{−1}) with density

  p(x) = (det(Θ)^{1/2} / (2π)^{n/2}) exp(−(1/2) x^T Θ x)

  Model           Θ ≻ 0   sign of Θ   sign of Θ^{−1}   u_1
  GMRF            YES     ANY         ANY              ANY
  a-GMRF          YES     GL          ≥ 0              u_1 ≥ 0
  Θ = σ²I + L     YES     GL          ≥ 0              u_1 = 1
Estimation methods
I GMRF: e.g., graphical lasso [Friedman et al., 2008].
I a-GMRF: [Slawski and Hein, 2015], [Pavez and Ortega, 2016].
I Θ = σ²I + L: proposed by [Lake and Tenenbaum, 2010]; efficient algorithm in [Egilmez, Pavez and O., 2016].
Precision matrix estimation
Compute the empirical covariance matrix S = (1/N) Σ_{i=1}^{N} x_i x_i^T.

ℓ1-regularized inverse covariance estimator (GMRF):

  min_{Θ ≻ 0} −log det(Θ) + tr(ΘS) + ρ‖Θ‖_1

ℓ1-regularized GL estimation (a-GMRF):

  min_{Q ≻ 0, q_ij ≤ 0} −log det(Q) + tr(QS) + ρ Σ_{i≠j} |q_ij|
  ⇔ min_{Q ≻ 0, q_ij ≤ 0} −log det(Q) + tr(QK),  with K = S − ρ(11^T − I).

I Key difference: the off-diagonal entries of a Generalized Laplacian are all non-positive, so there is no need for the absolute value (a baseline sketch follows).
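A baseline for the first (unconstrained GMRF) problem via scikit-learn's graphical lasso; this is only a sketch of the standard estimator, not of the GL-constrained (a-GMRF) variant, which needs a dedicated block coordinate descent solver:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

X = np.random.randn(500, 20)   # hypothetical data: 500 signals on 20 vertices
Theta = GraphicalLasso(alpha=0.1).fit(X).precision_  # estimated inverse covariance
```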
Experiment: Natural image graph
I Randomly sample image patches of size 16×16 from the Berkeley segmentation dataset BSD500 (around 3×10^5 patches from 200 images)
I Estimate the GL matrix from the data, where each patch is a graph signal and each pixel corresponds to a vertex (256 vertices).

[Figure] Generalized Laplacian sparsity (left), graph weights shown on a 2D grid (right).
Experiment: Texture graph
We consider wood textures from the Brodatz dataset, at 0 and 60 degrees of rotation. For each texture, take 8×8 blocks, compute their covariance matrix S, and solve the GL estimation problem with ρ = 0.

[Figure] Texture graphs using our GL estimation (only off-diagonal elements of Q). The graphs have |E| = 130 and |E| = 117 edges, respectively.
Summary
I Learning a graph from data under different assumptions on weights (sign,structure)
I Main idea: approximate inverse covariance
I Generalized Laplacian assumption allows efficient solutions
I Can incorporate structural constraints (which edges can be non-zero)
Video coding application
I Goal: design transforms for video (intra/inter) prediction residuals
I Basic idea: designing the graph G ⇔ designing L ⇔ designing the GBT
I Cases:
  I Fixed: transforms are fixed across all blocks
  I Mode-dependent: transforms are designed offline and tied to each mode
  I Block-adaptive-TS: transforms are designed per block based on edge information
  I RDO-TS: the best transform is selected per block from a given set of transforms using an RD criterion.
Average Design: Graph Learning
2D GMRF modeling for residuals ⇔ learning Generalized Graph Laplacians:

  maximize_{w, v}  log det(L) − tr(LS)
  subject to  L = B diag(w) B^T + diag(v),  w ≥ 0    (1)

B: incidence matrix of a given graph (connectivity); S: sample covariance; w: edge weights; v: self-loop (vertex) weights. A convex-optimization sketch follows.
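A hedged CVXPY transcription of problem (1) (B and S are assumed given; the nonnegativity of v is an assumption of this sketch, and the paper's dedicated solver would be faster):

```python
import cvxpy as cp

def learn_ggl(B, S):
    n, m = B.shape
    w = cp.Variable(m, nonneg=True)            # edge weights
    v = cp.Variable(n, nonneg=True)            # self-loop weights (assumed >= 0)
    L = B @ cp.diag(w) @ B.T + cp.diag(v)      # structured GGL, affine in (w, v)
    # log_det is concave and trace is affine, so this is a convex problem.
    prob = cp.Problem(cp.Maximize(cp.log_det(L) - cp.trace(L @ S)))
    prob.solve()
    return L.value
```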
Graph Learning for Intra Predicted Residuals
[Figure] For each intra prediction mode (DC, horizontal (mode 10), planar, diagonal (mode 18)): residual variance, learned graph weights, and learned self-loop weights.
Graph Learning for Inter Predicted Residuals
HEVC (inter mode): eight PU partitions in HEVC. We define PU-modes and develop PU-adaptive transforms.

This is a simplification: HEVC commonly uses transforms crossing partition boundaries.
Graph Learning for Inter Predicted Residuals
[Figure] For each PU-mode: residual variance, learned graph weights, and learned self-loop weights.
Experimental Setup – GBT vs. KLT
Compare: GBT vs. KLT
[Diagram] Encoder used for the comparison (HEVC setting): residual blocks → transform selection from a set (DCT, average designs, ...) → quantization → entropy coding of coefficients and mode information, with λ-driven RDO selection.
I Average designs:
  I Intra: (i) 35 non-separable GBTs vs. (ii) 35 KLTs
  I Inter: (i) 7 non-separable GBTs vs. (ii) 7 KLTs
I λ = 0.85 × 2^{(QP−12)/3}
Summary: Compression
I RDO-TS is better than MD-TS on average
I RDO w/ EA-GBT tends to improve more for large blocks

  BD-rate   MD-TS    RDO-TS   RDO-TS w/ EA-GBT
  Inter     -0.27    -2.66    -2.89
  Intra     -2.23    -6.30    -6.76

I Rich set of extensions of existing transforms (DCT/ADST)
I Promising results: currently investigating full integration into a codec
Conclusions and FAQs
I GSP has deep roots in the signal processing community
  I DCT [Strang'99], image segmentation using graph cuts [Shi and Malik'00], semi-supervised learning [Belkin, Niyogi'04], [Zhu et al. '05]
I A lot of progress, interesting results
  I Filterbanks, sampling, graph learning
I Many open questions!
I FAQs
  I Scaling to large graphs
  I Graph choice
  I Killer app
I To get started: (Shuman et al., SPM'13), (Sandryhaila & Moura, SPM'14)
References I
S.K. Narang and A. Ortega. Lifting based wavelet transforms on graphs. In Proc. of Asia Pacific Signal and Information Processing Association (APSIPA), October 2009.
S. Narang, G. Shen, and A. Ortega. Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks. In Proc. of ICASSP'10.
S. Narang and A. Ortega. Downsampling graphs using spectral theory. In Proc. of ICASSP'11.
G. Shen and A. Ortega. Transform-based distributed data gathering. IEEE Transactions on Signal Processing.
G. Shen, S. Pattem, and A. Ortega. Energy-efficient graph-based wavelets for distributed coding in wireless sensor networks. In Proc. of ICASSP'09, April 2009.
G. Shen, S. Narang, and A. Ortega. Adaptive distributed transforms for irregularly sampled wireless sensor networks. In Proc. of ICASSP'09, April 2009.
G. Shen and A. Ortega. Tree-based wavelets for image coding: Orthogonalization and tree selection. In Proc. of PCS'09, May 2009.
R. Baraniuk, A. Cohen, and R. Wagner. Approximation and compression of scattered data by meshless multiscale decompositions. Applied and Computational Harmonic Analysis, 25(2):133-147, September 2008.
R.R. Coifman and M. Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53-94, 2006.
References II
M. Crovella and E. Kolaczyk. Graph wavelets for spatial traffic analysis. In IEEE INFOCOM, 2003.
I. Daubechies, I. Guskov, P. Schröder, and W. Sweldens. Wavelets on irregular point sets. Phil. Trans. R. Soc. Lond. A, 357(1760):2397-2413, September 1999.
W. Ding, F. Wu, X. Wu, S. Li, and H. Li. Adaptive directional lifting-based wavelet transform for image coding. IEEE Transactions on Image Processing, 16(2):416-427, February 2007.
D. Jungnickel. Graphs, Networks and Algorithms. Springer-Verlag, 2nd edition, 2004.
M. Maitre and M.N. Do. Shape-adaptive wavelet encoding of depth maps. In Proc. of PCS'09, 2009.
Y. Morvan, P.H.N. de With, and D. Farin. Platelet-based coding of depth maps for the transmission of multiview images. SPIE, vol. 6055, 2006.
E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing, 14(4):423-438, April 2005.
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput., 48(1):96-129, 1998.
A. Said and W.A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243-250, June 1996.
References III
A. Sanchez, G. Shen, and A. Ortega. Edge-preserving depth-map coding using graph-based wavelets. In Proc. of Asilomar'09, 2009.
G. Valiente. Algorithms on Trees and Graphs. Springer, 1st edition, 2002.
V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Directionlets: Anisotropic multidirectional representation with separable filtering. IEEE Transactions on Image Processing, 15(7):1916-1933, July 2006.
R. Wagner, H. Choi, R. Baraniuk, and V. Delouille. Distributed wavelet transform for irregular sensor network grids. In IEEE Stat. Sig. Proc. Workshop (SSP), July 2005.
R. Wagner, R. Baraniuk, S. Du, D.B. Johnson, and A. Cohen. An architecture for distributed wavelet analysis and processing in sensor networks. In IPSN'06, April 2006.
A. Wang and A. Chandrakasan. Energy-efficient DSPs for wireless sensor networks. IEEE Signal Processing Magazine, 19(4):68-78, July 2002.
S.K. Narang, G. Shen, and A. Ortega. Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks. pp. 2902-2905, ICASSP'10, Dallas, April 2010.
S.K. Narang and A. Ortega. Local two-channel critically sampled filter-banks on graphs. Intl. Conf. on Image Proc. (ICIP), 2010.
References IV
R.R. Coifman and M. Maggioni. Diffusion wavelets. Appl. Comp. Harm. Anal., 21(1):53-94, 2006.
A. Sandryhaila and J. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 2013.
D.K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, March 2011.
D. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. Signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular data domains. IEEE Signal Processing Magazine, May 2013.
M. Crovella and E. Kolaczyk. Graph wavelets for spatial traffic analysis. In INFOCOM 2003, March 2003, vol. 3, pp. 1848-1857.
G. Shen and A. Ortega. Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting. In ICASSP'08, April 2008, pp. 2513-2516.
W. Wang and K. Ramchandran. Random multiresolution representations for arbitrary sensor network graphs. In ICASSP, May 2006, vol. 4.
R. Wagner, H. Choi, R. Baraniuk, and V. Delouille. Distributed wavelet transform for irregular sensor network grids. In IEEE Stat. Sig. Proc. Workshop (SSP), July 2005.
S.K. Narang and A. Ortega. Lifting based wavelet transforms on graphs. APSIPA ASC'09, 2009.
References V
B. Zeng and J. Fu. Directional discrete cosine transforms for image coding. In Proc. of ICME 2006, 2006.
E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Trans. Image Proc., 14(4):423-438, April 2005.
V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Directionlets: Anisotropic multidirectional representation with separable filtering. IEEE Trans. Image Proc., 15(7):1916-1933, July 2006.
Y. Morvan, P.H.N. de With, and D. Farin. Platelet-based coding of depth maps for the transmission of multiview images. In Proceedings of SPIE: Stereoscopic Displays and Applications, vol. 6055, 2006.
M. Tanimoto, T. Fujii, and K. Suzuki. View synthesis algorithm in view synthesis reference software 2.0 (VSRS2.0). Tech. Rep. Document M16090, ISO/IEC JTC1/SC29/WG11, February 2009.
S.K. Narang and A. Ortega. Local two-channel critically-sampled filter-banks on graphs. In ICIP'10, September 2010.
S.K. Narang and A. Ortega. Perfect reconstruction two-channel wavelet filter-banks for graph structured data. IEEE Trans. on Sig. Proc., 60(6), June 2012.
J. Pérez-Trufero, S.K. Narang, and A. Ortega. Distributed transforms for efficient data gathering in arbitrary networks. In ICIP'11, September 2011.
E. Martínez-Enríquez, F. Díaz-de-María, and A. Ortega. Video encoder based on lifting transforms on graphs. In ICIP'11, September 2011.
References VI
I. Daubechies, I. Guskov, P. Schröder, and W. Sweldens. Wavelets on irregular point sets. Phil. Trans. R. Soc. Lond. A, 1999.
G. Strang. The discrete cosine transform. SIAM Review, 41(1):135-147, 1999.
S.K. Narang and A. Ortega. Downsampling graphs using spectral theory. In ICASSP'11, May 2011.
I. Pesenson. Sampling in Paley-Wiener spaces on combinatorial graphs. Trans. Amer. Math. Soc., 360(10):5603-5627, 2008.
M. Levorato, S.K. Narang, U. Mitra, and A. Ortega. Reduced dimension policy iteration for wireless network control via multiscale analysis. Globecom, 2012.
A. Gjika, M. Levorato, A. Ortega, and U. Mitra. Online learning in wireless networks via directed graph lifting transform. Allerton, 2012.
W.W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452-473, 1977.
S.K. Narang, Y.H. Chao, and A. Ortega. Graph-wavelet filterbanks for edge-aware image processing. IEEE SSP Workshop, pp. 141-144, August 2012.
A. Sandryhaila and J.M.F. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644-1656, 2013.
References VII
S.K. Narang and A. Ortega. Compact support biorthogonal wavelet filterbanks for arbitrary undirected graphs. IEEE Transactions on Signal Processing, 2013.
V. Ekambaram, G. Fanti, B. Ayazifar, and K. Ramchandran. Critically-sampled perfect-reconstruction spline-wavelet filterbanks for graph signals. IEEE GlobalSIP, 2013.
E. Pavez and A. Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. ICASSP, 2016.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. Preprint.
V. Kalofolias. How to learn a graph from smooth signals. Preprint.
B. Lake and J. Tenenbaum. Discovering structure by learning sparse graphs. Proceedings of the 33rd Annual Cognitive Science Conference.
C. Zhang and D. Florencio. Analyzing the optimality of predictive transform coding using graph-based models. IEEE Signal Processing Letters, January 2013.
M. Maier, U. von Luxburg, and M. Hein. How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, January 2013.
T. Hofmann, B. Schölkopf, and A.J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36(3):1171-1220, June 2008.
References VIII
T. Biyikoglu, J. Leydold, and P.F. Stadler. Laplacian eigenvectors of graphs. Lecture Notes in Mathematics, vol. 1915, 2007.
M. Püschel and J.M.F. Moura. The algebraic approach to the discrete cosine and sine transforms and their fast algorithms. SIAM Journal on Computing, 32(5), 2003.
G. Strang. The discrete cosine transform. SIAM Review, 41(1), 1999.
M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, vol. 473, 2015.
W. Hu, G. Cheung, and A. Ortega. Intra-prediction and generalized graph Fourier transform for image coding. IEEE Signal Processing Letters, 22(11), 2015.
E. Pavez, H.E. Egilmez, Y. Wang, and A. Ortega. GTT: Graph template transforms with applications to image coding. Picture Coding Symposium (PCS), 2015.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.
P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, vol. 5, 2011.
J. Deri and J. Moura. Spectral projector-based graph Fourier transforms. IEEE Journal on Selected Topics in Signal Processing, submitted, 2016.
References IX
E. Pavez and A. Ortega. Generalized Laplacian precision matrix estimation for graph signal processing. ICASSP, 2016.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 2016.
V. Kalofolias. How to learn a graph from smooth signals. AISTATS, 2016.
B. Lake and J. Tenenbaum. Discovering structure by learning sparse graphs. Proceedings of the 33rd Annual Cognitive Science Conference.
S. Hassan-Moghaddam, N.K. Dhingra, and M. Jovanovic. Topology identification of undirected consensus networks via sparse inverse covariance estimation. CDC, 2016.
T. Biyikoglu, J. Leydold, and P.F. Stadler. Laplacian eigenvectors of graphs. Lecture Notes in Mathematics, vol. 1915, 2007.
M. Gavish, B. Nadler, and R.R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. ICML, 2010.
G. Shen and A. Ortega. Transform-based distributed data gathering. IEEE Transactions on Signal Processing, 2010.
S.K. Narang and A. Ortega. Perfect reconstruction two-channel wavelet filter banks for graph structured data. IEEE Transactions on Signal Processing, 2012.
References X
V.Y. Tan, A. Anandkumar, and A.S. Willsky. Learning Gaussian tree models: Analysis of error exponents and extremal structures. IEEE Transactions on Signal Processing, 2010.
C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 1968.
S. Yang, Q. Sun, S. Ji, P. Wonka, I. Davidson, and J. Ye. Structural graphical lasso for learning mouse brain connectivity. ACM SIGKDD, 2015.
P.-L. Loh and M.J. Wainwright. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. NIPS, 2012.
C.-J. Hsieh, A. Banerjee, I.S. Dhillon, and P.K. Ravikumar. A divide-and-conquer method for sparse inverse covariance estimation. NIPS, 2012.
M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, vol. 473, 2015.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.
P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, vol. 5, 2011.
M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 1995.