
NATURE BIOTECHNOLOGY

(Supplementary Information Addendum as of 7 April 2015)

Nat. Biotechnol. 31, 726–733 (2013)

Network deconvolution as a general method to distinguish direct dependencies in networks
Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis

1. Additional details on the 26 August 2013 correction notice for Supplementary Figure 4. In the version of Supplementary Figure 4 initially published (14 July 2013), panel a was erroneously plotted to show the performance relative to scaling by the maximum eigenvalue λ_obs of the observed network G_obs on the x axis. The revised panel a (26 August 2013) corrects the error and instead shows performance relative to scaling by the maximum eigenvalue λ_dir of the direct network G_dir, as computed using equation (12) in Supplementary Note 1.6. The corrected plot shows a peak at S(λ_dir = 0.6) = 22.6 instead of S(λ_obs = 0.9) = 21.5, but the difference in performance is only ~5% at any point along the curve. We show an overlay of the original curve (14 July 2013; red) and the revised curve (26 August 2013; blue) in the panel below.

For panel b, the original version (14 July 2013; left panel below) had the same issue of λ_obs vs. λ_dir scaling, and showed the performance for residue pairs within a distance of 11 angstroms, the threshold used in our original submission. This threshold was changed to 7 angstroms in response to reviewer comments, and the corrected figure (26 August 2013; right panel below) shows the performance for the new threshold, using the correct scaling by λ_dir. Even though the scale of the numbers has changed (note the different y axes), this change did not affect our conclusion that higher values of beta lead to higher performance.

2. Original parameter selection rationale. We provide a table detailing the rationale for selecting the alpha (density) and beta (scaling) parameters for each application, as described in Supplementary Note 1.6 for scaling and Supplementary Notes 2.3, 3 and 4 for density:

The density parameter alpha is a preprocessing parameter, set in advance for each application, which represents the expected density of the input network to be deconvolved, namely the fraction of non-zero edges relative to a fully connected network. For applications 2 and 3, we used alpha = 100% density, the default setting of this parameter, which corresponds to utilization of all edges. For application 1, we used alpha = 10% density (as stated in Supplementary Note 2.3), based on the expectation that regulators target, on average, approximately 10% of all genes, consistent with the observed sparsity of ChIP-seq experiments in diverse species (setting this value to 1 would correspond to each regulator targeting 100% of all genes). From a ranking perspective, this corresponds to relative ranks in the top 10% of regulatory connections being meaningful, but the remaining 90% of ranks being uninformative.
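For illustration, here is a minimal sketch (our own code, not the authors' released scripts; the function name and quantile-based cutoff are illustrative) of thresholding an input similarity matrix to a target density alpha before deconvolution:

```python
import numpy as np

def threshold_to_density(g_obs, alpha=0.1):
    """Keep only the top alpha fraction of off-diagonal edge weights.

    A sketch of the density preprocessing described above; the released
    ND.m implements its own variant of this step.
    """
    g = g_obs.copy()
    np.fill_diagonal(g, 0.0)                    # ignore self-edges
    weights = np.abs(g[~np.eye(len(g), dtype=bool)])
    cutoff = np.quantile(weights, 1.0 - alpha)  # weight at the alpha boundary
    g[np.abs(g) < cutoff] = 0.0                 # drop the uninformative ranks
    return g
```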

Application 1: Regulatory networks
- Applied to: 27 networks from DREAM5: CLR, ARACNE, MI, Pearson, Spearman, GENIE3, TIGRESS, Inferelator, ANOV, each applied to the in silico, E. coli and S. cerevisiae networks.
- Network density parameter (alpha): density = 10%, corresponding to each regulator targeting, on average, 10% of all genes.
- Eigenvalue scaling parameter (beta): beta = 0.5, corresponding to relatively shorter propagation of indirect effects through the network.

Application 2: Protein contact inference
- Applied to: 30 networks: Mutual Information (MI) and Direct Information (DI), each applied to 1hzx, 3tgi, 5p21, 1f21, 1e6k, 1bkr, 2it6, 1rqm, 2o72, 1r9h, 1odd, 1g2e, 1wvn, 5pti, 2hda (PDB IDs).
- Network density parameter (alpha): density = 100%, corresponding to the use of the full mutual information and direct information matrices.
- Eigenvalue scaling parameter (beta): beta = 0.99, corresponding to propagation of indirect effects over longer indirect paths.

Application 3: Co-authorship social network
- Applied to: 1 network connecting 1,589 scientists with 2,742 edges. Each edge is binary and corresponds to two scientists co-authoring one or more papers.
- Network density parameter (alpha): density = 100%, corresponding to the full co-authorship matrix (all collaboration links used).
- Eigenvalue scaling parameter (beta): beta = 0.95, corresponding to propagation of indirect effects over longer indirect paths.

[Figure: panel a overlays the revised curve (x axis = maximum eigenvalue of the direct network) and the original curve (x axis = maximum eigenvalue of the observed network) of gene regulatory network prediction score (y axis, 0–25) against maximum eigenvalue (x axis, 0–1). Panel b shows the original (left; y axis 0–100) and corrected (right; y axis 0–50) protein contact map prediction score against beta (x axis, 0–1).]


As described in Supplementary Note 1.6, the beta parameter represents the rate at which indirect effects are expected to propagate through the network. For small values of beta, indirect effects decay rapidly and higher-order transitive interactions play a smaller role. For large values of beta, indirect effects persist for longer periods and higher-order interactions play a more important role. For applications 2 and 3 we used values of beta near 1, corresponding to long persistence of indirect effects (the default value is beta = 0.99). For application 1, we used beta = 0.5, based on the intuition that transitive regulatory effects require rounds of transcription and translation that should delay indirect effects and thus cause them to decay more rapidly.

3. Effect of varying both density (alpha) and eigenvalue scaling (beta) parameters on the DREAM5 networks. To understand the effect of varying both density and scaling, we evaluated the average performance of ND relative to other DREAM5 methods for alpha = 10%, 50% and 100% density, and beta = 0.5 and 0.9 eigenvalue scaling. We found that for all parameter combinations, ND led to improved average performance relative to the other DREAM5 methods (127–146% relative performance on average; see the table below). The improvement was consistently higher for Mutual Information (MI)- and Correlation (Corr.)-based methods, which were the primary motivation behind ND, indicative of their abundant indirect effects. The highest relative performance overall (146%) was obtained for the combination of parameters closest to those used in the protein folding and social network applications (alpha = 100% density, beta = 0.9 eigenvalue scaling), but all parameter combinations tested led to similar performance improvements.

4. Additional scripts and resources. We describe additional scripts and resources that facilitate reproducibility of our work.

• We now provide a script tailored to the regulatory network application (ND_regulatory.m) that implements the steps described in Supplementary Note 1.4, namely padding the M×N (M regulators, N target genes) matrix of the observed network G_obs with zeros to turn it into an N×N square matrix, and adding a random perturbation of the input network to eliminate perfectly aligned subspaces and achieve diagonalizability in the case of non-diagonalizable matrices (a rough sketch of these two steps follows this list).

• We also provide plotting code for generating figures for all three applications of our method, processed input and output datasets for all three network applications, and a try-it-out page with interactive scripts reproducing our results.
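As a rough illustration of the two steps described for ND_regulatory.m above, here is a minimal sketch (our own illustrative code, not the released script; the perturbation magnitude eps and noise model are arbitrary choices):

```python
import numpy as np

def pad_and_perturb(g_obs_mn, n, eps=1e-8, seed=0):
    """Sketch of the preprocessing described for ND_regulatory.m.

    g_obs_mn : (M, N) observed matrix, M regulators x N target genes.
    Returns an (N, N) square matrix: zero-padded, plus a small random
    perturbation to break perfectly aligned subspaces so that the
    matrix becomes diagonalizable.
    """
    m = g_obs_mn.shape[0]
    g = np.zeros((n, n))
    g[:m, :] = g_obs_mn                      # regulators occupy the first M rows
    rng = np.random.default_rng(seed)
    g += eps * rng.standard_normal((n, n))   # tiny random perturbation
    return g
```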

We also make available a demonstration video that uses these scripts to reproduce the key figures for the three applications in our paper on regulatory networks (Fig. 2a), protein folding residue networks (Supplementary Fig. 7a–d), and a social co-authorship network (Fig. 4b). These scripts and resources are made available on our website (http://compbio.mit.edu/nd/) and on FigShare (http://figshare.com/articles/Network_Deconvolution_Code/1349533). The Supplementary Data file (also available at http://compbio.mit.edu/nd/Supplementary-Data/) has been updated to include (i) updated scripts, (ii) the plotting code and datasets for generating figures for all three applications and (iii) corrected deconvolved networks for each application domain.

Average performance of ND relative to other DREAM5 methods (see point 3 above):

Beta = 0.5              Overall   MI/Corr.   Other
Density (alpha) = 0.1    142%      166%      111%
Density (alpha) = 0.5    132%      153%      107%
Density (alpha) = 1.0    127%      144%      106%

Beta = 0.9              Overall   MI/Corr.   Other
Density (alpha) = 0.1    140%      166%      108%
Density (alpha) = 0.5    132%      157%      100%
Density (alpha) = 1.0    146%      182%      101%

CORRECTION NOTICE

Nat. Biotechnol. 31, 726–733 (2013)

Network deconvolution as a general method to distinguish direct dependencies in networks
Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis

In the version of this file originally posted online, in equation (12) in Supplementary Note 1, the word "max" should have been "min." The correct formula was implemented in the source code. Clarification has been made to Supplementary Notes 1.1, 1.3, 1.6 and Supplementary Figure 4 about the practical implementation of network deconvolution and parameter selection for application to the examples used in the paper. In Supplementary Data, in the file "ND.m" on line 55, the parameter delta should have been set to 1 − epsilon (that is, delta = 1 − 0.01), rather than 1.

Supplementary Figure 4 was also updated to correct two plotting errors in panels a and b. (a) For panel a, the original version had an incorrect x axis, plotting the maximum eigenvalue of the observed network instead of the maximum eigenvalue of the direct network. This is now corrected, resulting in a peak at S(0.6) = 22.6 instead of S(0.9) = 21.5 (a difference of 5%). This 5% difference does not affect any of our conclusions; network deconvolution substantially improves upon other methods in the DREAM5 benchmarks for any value of beta between 0.5 and 0.99. (b) For panel b, the original version showed the score evaluated for the protein contact map network connecting all residue pairs within a distance of 11 angstroms, the threshold used in our original submission. This threshold was changed to 7 angstroms in response to reviewer comments, and the corrected figure shows the performance for the new threshold. The change does not affect any of our results or conclusions.

These errors have been corrected in this file and in the Supplementary Data zip file as of 26 August 2013; the second paragraph was added to this notice 12 February 2014.


Supplementary Notes

Network Deconvolution - A General Method to Distinguish Direct Dependencies over Networks

Soheil Feizi^{1,2,3,4}, Daniel Marbach^{1,2}, Muriel Médard^3 and Manolis Kellis^{1,2,4}

1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
2 Broad Institute of MIT and Harvard, Cambridge, MA, USA.
3 Research Laboratory of Electronics (RLE) at MIT, Cambridge, MA, USA.
4 Corresponding author ([email protected], [email protected]).

Supplementary Sections

1 Analyzing General Properties of Network Deconvolution
  1.1 Network deconvolution algorithm
  1.2 Optimality analysis of network deconvolution
  1.3 Modeling assumptions and intuitions
  1.4 Decomposable and non-decomposable matrices
    1.4.1 Asymmetric decomposable matrices
    1.4.2 General non-decomposable matrices
  1.5 Robustness against noise
  1.6 Scaling effects
  1.7 Computational complexity analysis of network deconvolution
    1.7.1 Using sparsity in eigen decomposition
    1.7.2 Parallelization of network deconvolution

2 Inferring Gene Regulatory Networks by Network Deconvolution
  2.1 DREAM framework
  2.2 Mutual Information
  2.3 Evaluation functions
  2.4 Network motif analysis

3 Inferring Protein Structural Constraints by Network Deconvolution

4 Inferring Weak and Strong Collaboration Ties by Network Deconvolution

Figures

S1 Effects of applying network deconvolution on network eigenvalues
S2 Convergence of network deconvolution for non-decomposable matrices
S3 Robustness of network deconvolution against noise
S4 Linear scaling effects on network deconvolution performance
S5 Scalability of network deconvolution to large networks using eigen sparsity and parallelization
S6 DREAM framework
S7 Overall total and non-redundant contact discovery rates
S8 Discovery rate of interacting contacts for all tested proteins
S9 Discovery rate of non-redundant contacts for all tested proteins
S10 Total and non-redundant discovery rate for different thresholds of contact proximity

Tables

S1 General properties of considered networks
S2 Tested protein names and PDB IDs


1 Analyzing General Properties of Network Deconvolution

In this section, we describe the proposed network deconvolution (ND) algorithm and analyze its general properties. First, we show how network deconvolution finds a globally optimal direct dependency matrix by eliminating indirect dependencies from the observed network. We present an extension of the network deconvolution algorithm to non-decomposable matrices based on an iterative method. Then, we analyze the robustness of network deconvolution in the presence of noise and linear scaling. Finally, we analyze the computational complexity of network deconvolution and propose two extensions (using eigen sparsity and parallelization) that make it efficient for very large networks.

1.1 Network deconvolution algorithm

Network deconvolution is a systematic approach for computing direct dependencies in a network by use of local edge weights. Recall that G_obs represents an observed dependency matrix, a properly scaled similarity matrix among variables. The linear scaling depends on the largest absolute eigenvalue of the un-scaled similarity matrix and is discussed in more detail in Section 1.6. Components of G_obs can be derived by use of different pairwise similarity metrics such as correlation or mutual information. In particular, g^obs_{i,j}, the (i,j)-th component of G_obs, represents the similarity value between the observed patterns of variables i and j in the network.

A perennial challenge in inferring networks is that direct and indirect dependencies arise together. A direct information flow modeled by an edge in G_dir can give rise to second- or higher-level indirect information flows captured in G_indir:

$$G_{indir} = G_{dir}^2 + G_{dir}^3 + \cdots \qquad (1)$$

The power associated with each term in G_indir corresponds to the level of indirection contributed by that term. We assume that the observed dependency matrix G_obs comprises both direct and indirect dependency effects (i.e., G_obs = G_dir + G_indir). The case of external noise over observed dependencies is considered in Section 1.5. Further, we assume that G_dir is an n × n decomposable matrix. The case of non-decomposable matrices is considered in Section 1.4.

In Section 1.2, we show that the following network deconvolution algorithm finds an optimal solution for direct dependency weights by using the observed dependencies:

Algorithm 1. Network deconvolution has three steps:

• Linear scaling step: The observed dependency matrix is scaled linearly so that all eigenvalues of the direct dependency matrix are between −1 and 1.

• Decomposition step: The observed dependency matrix G_obs is decomposed into its eigenvalues and eigenvectors such that $G_{obs} = U \Sigma_{obs} U^{-1}$.

• Deconvolution step: A diagonal eigenvalue matrix Σ_dir is formed whose i-th component is $\lambda^{dir}_i = \frac{\lambda^{obs}_i}{1 + \lambda^{obs}_i}$. The output direct dependency matrix is then $G_{dir} = U \Sigma_{dir} U^{-1}$.

In the practical implementation of network deconvolution (ND), matrices are made symmetric, although this was not needed in our three examples, as the input matrices were already symmetric. Network deconvolution provides direct weights for both observed and non-observed interactions, but their use depends on the application. In this paper, we focused on weights for the subset of observed edges, as our goal was to deconvolve indirect effects on the existing interactions. For gene networks we used the top 100,000 edges, for protein structure prediction we used the top 250 edges (between distant residues), and for the co-authorship network we used all observed edges without a cutoff. In practice, the threshold chosen for visualization and subsequent analysis will depend on the specific application. The provided code has the option to report deconvolved weights only for observed interactions, or for all possible interactions, both observed and unobserved.
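For concreteness, here is a minimal sketch of Algorithm 1 in Python/NumPy for a symmetric input that has already been linearly scaled (Section 1.6); the released MATLAB implementation (ND.m) additionally handles the scaling, symmetrization, and density options discussed above:

```python
import numpy as np

def network_deconvolution(g_obs):
    """Minimal sketch of Algorithm 1 for a symmetric observed matrix
    that is assumed to be already linearly scaled (Section 1.6).
    """
    # Decomposition step: G_obs = U diag(lam_obs) U^{-1}
    lam_obs, u = np.linalg.eigh(g_obs)   # eigh assumes a symmetric input
    # Deconvolution step: nonlinear eigenvalue filter of equation (5)
    lam_dir = lam_obs / (1.0 + lam_obs)
    g_dir = u @ np.diag(lam_dir) @ u.T   # U is orthogonal, so U^{-1} = U.T
    return g_dir
```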

1.2 Optimality analysis of network deconvolution

In this section, we show how the network deconvolution algorithm proposed in Algorithm 1 finds an optimal solution for direct dependency weights by using the observed ones.

Suppose U and Σ_dir represent the eigenvectors and the diagonal matrix of eigenvalues of G_dir, where λ^dir_i is the i-th diagonal component of Σ_dir. Also, suppose the largest absolute eigenvalue of G_dir is strictly smaller than one. This assumption holds by using a linear scaling function over the un-scaled observed dependency network and is discussed in Section 1.6. By the eigen decomposition principle [7], we have $G_{dir} = U \Sigma_{dir} U^{-1}$. Therefore,


$$
\begin{aligned}
G_{dir} + G_{indir} &\overset{(a)}{=} G_{dir} + G_{dir}^2 + \cdots \\
&\overset{(b)}{=} (U\Sigma_{dir}U^{-1}) + (U\Sigma_{dir}^2 U^{-1}) + \cdots \\
&= U(\Sigma_{dir} + \Sigma_{dir}^2 + \cdots)U^{-1} \\
&= U \begin{pmatrix} \sum_{i\ge1}(\lambda^{dir}_1)^i & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sum_{i\ge1}(\lambda^{dir}_n)^i \end{pmatrix} U^{-1} \\
&\overset{(c)}{=} U \begin{pmatrix} \frac{\lambda^{dir}_1}{1-\lambda^{dir}_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{\lambda^{dir}_n}{1-\lambda^{dir}_n} \end{pmatrix} U^{-1}. \qquad (2)
\end{aligned}
$$

Equality (a) follows from the definition of the diffusion model of equation (1). Equality (b) follows from the eigen decomposition of the matrix G_dir. Equality (c) uses the geometric series to compute the infinite summation in closed form, since |λ^dir_i| < 1 for all i.

By using the eigen decomposition of the observed network G_obs, we have $G_{obs} = U \Sigma_{obs} U^{-1}$, where

$$\Sigma_{obs} = \begin{pmatrix} \lambda^{obs}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda^{obs}_n \end{pmatrix}. \qquad (3)$$

Therefore, from equations (2) and (3), if

$$\frac{\lambda^{dir}_i}{1 - \lambda^{dir}_i} = \lambda^{obs}_i \quad \forall\, 1 \le i \le n, \qquad (4)$$

then the error term $G_{obs} - (G_{dir} + G_{indir}) = 0$. Rewriting equation (4) leads to the nonlinear filter over eigenvalues used in the network deconvolution algorithm:

$$\lambda^{dir}_i = \frac{\lambda^{obs}_i}{1 + \lambda^{obs}_i} \quad \forall\, 1 \le i \le n. \qquad (5)$$

This nonlinear filter over eigenvalues allows network deconvolution to eliminate indirect dependency effects modeled as in equation (1) and to compute a globally optimal solution for direct dependencies over the network with zero error.
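A quick numerical check of this zero-error property (our own sketch with a synthetic direct matrix, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((20, 20))
g_dir = (a + a.T) / 2
g_dir *= 0.9 / np.max(np.abs(np.linalg.eigvalsh(g_dir)))  # enforce |lambda| <= 0.9 < 1

# Closed-form observed network under equation (1): G_obs = sum_{k>=1} G_dir^k
g_obs = g_dir @ np.linalg.inv(np.eye(20) - g_dir)

# Apply the eigenvalue filter of equation (5) and compare with the true G_dir
lam, u = np.linalg.eigh(g_obs)
g_rec = u @ np.diag(lam / (1 + lam)) @ u.T
print(np.max(np.abs(g_rec - g_dir)))  # ~1e-13: exact recovery up to round-off
```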

1.3 Modeling assumptions and intuitions

In this section, we provide more intuition about network deconvolution and discuss its modeling assumptions and limitations.

As discussed in Sections 1.1 and 1.2, network deconvolution can be viewed as a nonlinear filter applied to the eigenvalues of the observed dependency matrix. Figure S1-a shows an example of how the eigenvalues of the observed matrix change under network deconvolution. In general, ND decreases the magnitudes of large positive eigenvalues of the observed matrix, since transitivity effects modeled as in equation (1) inflate these positive eigenvalues. Network deconvolution reverses this effect by using the described nonlinear filter.

Network deconvolution has the following modeling assumptions and limitations:

• Network deconvolution assumes that networks are linear time-invariant flow-preserving operators, which excludes nonlinear dynamic networks as well as hidden sinks or sources. Under this condition, indirect flows over different paths are modeled by power series of the direct matrix, while observed dependencies are modeled as the sum of direct and indirect effects.

• We assume that the maximum absolute eigenvalue of the direct dependency matrix is strictly smaller than one. This assumption can be satisfied by linearly scaling the observed dependency matrix, as explained in detail in Section 1.6.

When the flow assumptions of network deconvolution hold, network deconvolution removes all indirect flow effects and infers all direct interactions and weights exactly (Figure 1d). This figure validates the theoretical framework of network deconvolution in a setting where we know the maximum absolute eigenvalue of the direct network is less than one, so the linear scaling step is not necessary. However, when these assumptions do not hold, for example when nonlinear probabilistic effects are included through simulations, we show that the practical implementation of ND performs well and infers most direct edges (Figure 1e). In our simulations, we use scale-free networks with 100 nodes and add transitive edges using a nonlinear probabilistic function considering two-hop and three-hop indirect paths over the network. In noisy networks, interactions with strengths higher than the minimum direct weight are displayed. We display the same number of top-scoring edges for the deconvolved networks as for the simulated direct network.

1.4 Decomposable and non-decomposable matrices

The network deconvolution framework described in Algorithm 1 is based on eigen decomposition principles and can therefore only be applied to so-called diagonalizable matrices [7]: a matrix G is diagonalizable if it can be decomposed into its eigenvectors U and a diagonal matrix of eigenvalues Σ such that $G = U \Sigma U^{-1}$. Many symmetric matrices and some asymmetric matrices are diagonalizable [7].

In this section, we first present an example of an asymmetric matrix that is diagonalizable and has real eigenvalues. Then, we propose an extension of network deconvolution for general non-diagonalizable matrices based on an iterative gradient descent method.

1.4.1 Asymmetric decomposable matrices

Here, we present an example of an asymmetric matrix that is decomposable into its eigenvectors and real eigenvalues. The structure of this asymmetric matrix arises in the application of network deconvolution to inferring gene regulatory networks, where the observed dependency matrix is asymmetric.

Example 2. Suppose G is an n × n matrix with the following structure:

$$G = \begin{pmatrix} G_1 & G_2 \\ 0 & 0 \end{pmatrix}, \qquad (6)$$

where G_1 is an m × m symmetric matrix and G_2 is an m × (n − m) matrix which can be asymmetric. We show that the eigenvalues of G are real.

Suppose λ is an eigenvalue of G corresponding to the eigenvector v. By definition, Gv = λv [7]. Write v = [v_1 v_2]′, where v_1 contains the first m elements of v. Therefore,

$$Gv = \lambda v \;\Rightarrow\; \begin{pmatrix} G_1 v_1 + G_2 v_2 \\ 0 \end{pmatrix} = \lambda \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}. \qquad (7)$$

From equation (7), for λ ≠ 0 we have v_2 = 0 and G_1 v_1 = λ v_1 (a zero eigenvalue is trivially real). In other words, λ is also an eigenvalue of the matrix G_1. Since the matrix G_1 is symmetric and diagonalizable, λ is real and G is diagonalizable.

Example 2 illustrates a matrix structure that is common in gene regulatory network inference problems. In that application, the first m rows correspond to transcription factor genes, which can have outgoing and incoming edges. The last n − m rows correspond to genes that are not transcription factors and therefore have only incoming edges. This example shows that the proposed network deconvolution algorithm based on eigen decomposition principles can be applied to gene regulatory network inference problems. We discuss this application of network deconvolution in more detail in Section 2.
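A small numerical illustration of Example 2 with synthetic blocks (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 12
g1 = rng.standard_normal((m, m))
g1 = (g1 + g1.T) / 2                 # symmetric m x m block
g2 = rng.standard_normal((m, n - m)) # arbitrary m x (n-m) block

g = np.zeros((n, n))
g[:m, :m] = g1
g[:m, m:] = g2                       # structure of equation (6)

eigvals = np.linalg.eigvals(g)
print(np.max(np.abs(eigvals.imag)))  # ~0: all eigenvalues are real
```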

1.4.2 General non-decomposable matrices

Most symmetric and some asymmetric matrices are decomposable and hence amenable to the network deconvolution algorithm based on eigen decomposition principles. However, if the input matrix is not decomposable, an extension of network deconvolution based on an iterative gradient descent method can be used to eliminate indirect information effects from observed dependencies. We show that, for a general asymmetric matrix, this variation of ND finds a direct dependency matrix in a network. In the case of convexity, the proposed algorithm converges to a globally optimal solution. We also illustrate the convergence of this iterative algorithm on a simulated asymmetric network.

For simplicity, in this part we only consider the second-degree term of the diffusion model (i.e., G_indir = G_dir^2). If the largest absolute eigenvalue of G_dir is small enough, higher-order diffusion terms decay exponentially and ignoring them can be justified. Also, suppose the observed dependency matrix G_obs is scaled so that g^obs_{i,j} ≤ 1/2. This assumption is required to obtain a convex optimization setup and can be satisfied by linearly scaling the observed dependency matrix.

Suppose Γ(G_dir) represents the energy of the error term:

$$\Gamma(G_{dir}) \triangleq \left\| G_{obs} - (G_{dir} + G_{indir}) \right\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( g^{obs}_{i,j} - \left( g^{dir}_{i,j} + g^{indir}_{i,j}(G_{dir}) \right) \right)^2, \qquad (8)$$

where $\|\cdot\|_F$ represents the Frobenius norm [7]. To compute G_dir, the energy of the error term is minimized. This can be formulated as the following optimization problem:

$$\min_{G_{dir}} \; \Gamma(G_{dir}). \qquad (9)$$

Suppose ∇Γ(G_dir) is the gradient of Γ(G_dir) with respect to G_dir. The (i,j)-th component of ∇Γ(G_dir) is

$$[\nabla\Gamma]_{i,j} = \frac{\partial \Gamma}{\partial g^{dir}_{i,j}}. \qquad (10)$$

A closed-form expression for ∇Γ(G_dir) can be derived by direct differentiation.

Algorithm 3. The following gradient descent algorithm finds a globally optimal solution for the optimization problem of equation (9):

• Step 0: $G^1_{dir} = G_{obs}$.

• Step i: $G^{i+1}_{dir} = G^i_{dir} - \beta_i \nabla\Gamma(G^i_{dir})$.

The step sizes β_i form a decreasing sequence and can be chosen in various ways [2]. Since the optimization problem of equation (9) is convex (the condition g^obs_{i,j} ≤ 1/2 is required for this), the algorithm converges to a globally optimal solution [2]. In non-convex cases, the proposed gradient descent algorithm converges to a locally optimal solution.
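A minimal sketch of Algorithm 3 under the second-order truncation G_indir = G_dir², using a closed-form gradient of equation (8) derived for this truncated model; the step-size schedule and iteration count are illustrative choices, not the paper's:

```python
import numpy as np

def nd_gradient_descent(g_obs, n_iter=500, beta0=0.05):
    """Sketch of the iterative ND variant for non-diagonalizable input.

    Minimizes ||G_obs - (G + G^2)||_F^2 by gradient descent with a
    decreasing step size beta_i = beta0 / sqrt(i).
    """
    g = g_obs.copy()                           # Step 0: initialize at G_obs
    for i in range(1, n_iter + 1):
        e = g_obs - g - g @ g                  # residual of the truncated model
        grad = -2.0 * (e + e @ g.T + g.T @ e)  # closed-form gradient of eq. (8)
        g -= (beta0 / np.sqrt(i)) * grad       # Step i of Algorithm 3
    return g
```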

We illustrate the convergence of Algorithm 3 on a randomly generated asymmetric (directed) network with 50 nodes and 490 edges. Figure S2 shows how the energy of the error goes to zero as the number of iterations increases.

Using the iterative gradient method can be very expensive, especially for very large networks. For example, in the worst case, at each iteration we need to compute the n² gradient terms of equation (10), where each term can be computed in O(n) worst-case time. Therefore, for a fixed number of iterations, the worst-case time complexity of the iterative algorithm can be as high as O(n³). However, we have developed two extensions of network deconvolution that make it scalable to very large networks (Section 1.7): the first exploits the sparsity of eigenvalues of low-rank networks, and the second parallelizes network deconvolution over potentially overlapping subgraphs of the network.

1.5 Robustness against noise

In this section, we show that network deconvolution is provably robust against noise. We also illustrate its robustness in the application of ND to gene regulatory network inference.

Suppose the observed dependency matrix is itself noisy:

$$\tilde{G}_{obs} = G_{obs} + N = G_{dir} + G_{indir} + N,$$

where N represents the noise matrix. In this case, we may not be able to recover G_dir exactly from $\tilde{G}_{obs}$, but we can instead provide an estimate of it, referred to as $\tilde{G}_{dir}$, and show that this estimate is close to G_dir as long as the noise is moderate.

We assume that the noise is moderate. We quantify the noise power by its Euclidean norm, i.e., $\sqrt{\sum_{i,j} n_{i,j}^2}$, where n_{i,j} is the (i,j)-th component of the noise matrix. This norm can be related [7] to the largest absolute eigenvalue of the noise matrix N. Here, we assume that the largest absolute eigenvalue of N, referred to as γ, is much smaller than 1 (γ ≪ 1), corresponding to moderate noise. Also, we assume that the largest absolute eigenvalue of $\tilde{G}_{obs}$ is equal to δ < 1, which guarantees convergence of the Taylor series. This can be obtained by using a linear mapping function over the un-scaled observed dependencies. Thus, when the power of the noise is smaller than the power of the observed network (γ ≪ 1 and δ < 1), we have sufficient information for robust network deconvolution.

By use of network deconvolution over the noisy observed network $\tilde{G}_{obs}$, an estimate $\tilde{G}_{dir}$ of the direct dependency matrix is obtained. We show that this estimate is close to G_dir by bounding the Euclidean norm of the error term ($\|G_{dir} - \tilde{G}_{dir}\|_{\ell_2}$) as a function of δ and γ, the largest absolute eigenvalues of the observed dependency network $\tilde{G}_{obs}$ and the noise matrix N, respectively:

$$
\begin{aligned}
\|G_{dir} - \tilde{G}_{dir}\|_{\ell_2} &\overset{(d)}{=} \left\| G_{obs}(I + G_{obs})^{-1} - \tilde{G}_{obs}(I + \tilde{G}_{obs})^{-1} \right\|_{\ell_2} \\
&\overset{(e)}{=} \left\| (G_{obs} - G_{obs}^2 + \cdots) - (\tilde{G}_{obs} - \tilde{G}_{obs}^2 + \cdots) \right\|_{\ell_2} \\
&= \left\| (G_{obs} - \tilde{G}_{obs}) - (G_{obs}^2 - \tilde{G}_{obs}^2) + \cdots \right\|_{\ell_2} \\
&\overset{(f)}{\le} \|G_{obs} - \tilde{G}_{obs}\|_{\ell_2} + \|G_{obs}^2 - \tilde{G}_{obs}^2\|_{\ell_2} + \cdots \\
&\overset{(g)}{\le} \gamma + \gamma^2 + 2\delta\gamma + \cdots \\
&\overset{(h)}{\le} \gamma + O(\delta^2 + \gamma^2 + \delta\gamma). \qquad (11)
\end{aligned}
$$

Equality (d) follows from the application of network deconvolution to the matrices $G_{obs}$ and $\tilde{G}_{obs}$ as in equation (5). Equality (e) follows from the Taylor series expansion of the function x/(1+x) for matrices. Inequalities (f), (g) and (h) follow from basic algebraic inequalities [7].

Equation (11) shows that, for small enough δ, network deconvolution is robust against noise. Hence, in the different applications, we linearly scale the observed network so that δ < 1. The effect of this linear mapping is discussed in more detail in Section 1.6.

Finally, we illustrate the robustness of network deconvolution in the gene regulatory network inference problem. An inference method is robust against noise when its performance degrades continuously as the noise power increases. Here, for each experiment, we add artificial Gaussian noise with a variance equal to a fraction of the experiment variance and use these noisy datasets to infer networks. Figure S3 shows the performance of the network deconvolution algorithm at different noise levels for the E. coli and in silico networks. Here, to observe noise effects completely, unlike the DREAM framework scoring (see Section 2.3), we do not threshold the inferred networks. As can be seen in this figure, as the noise power increases, the overall score decreases smoothly, which indicates the robustness of network deconvolution against noise.

1.6 Scaling effects

As discussed in Section 1.2, for the right-hand side of equation (1) to converge, the largest absolute eigenvalue of the direct dependency matrix G_dir (say β) should be strictly smaller than one (i.e., β < 1). However, this matrix is unknown in advance; in fact, the whole purpose of the model in equation (1) is to compute the direct dependency matrix. In this section, we show that by linearly scaling the un-scaled observed dependency matrix G^us_obs, it can be guaranteed that the largest absolute eigenvalue of the direct dependency matrix is strictly less than one. In the following, we describe how the linear scaling factor α is chosen, where $G_{obs} = \alpha G^{us}_{obs}$.

Suppose λ^{obs(us)}_+ and λ^{obs(us)}_− are the largest positive and smallest negative eigenvalues of G^us_obs. Then, by having

$$\alpha \le \min\!\left( \frac{\beta}{(1-\beta)\,\lambda^{obs(us)}_+}, \; \frac{-\beta}{(1+\beta)\,\lambda^{obs(us)}_-} \right), \qquad (12)$$

the largest absolute eigenvalue of G_dir will be less than or equal to β < 1.

In the following, we show how inequality (12) is derived. Let λ^dir and λ^{obs(us)} denote eigenvalues of the direct and unscaled observation matrices. By an argument similar to that of equation (2), we have

$$\lambda^{dir} = \frac{\lambda^{obs(us)}}{\frac{1}{\alpha} + \lambda^{obs(us)}}. \qquad (13)$$

This function is depicted in Figure S1-b. We consider three cases: λ^{obs(us)} ≥ 0, −1/α < λ^{obs(us)} < 0, and λ^{obs(us)} < −1/α (in the case λ^{obs(us)} = −1/α, the function is undefined; hence, we choose α to avoid this case).

• Case 1: λ^{obs(us)} ≥ 0. In this case, we have

$$\frac{\lambda^{obs(us)}}{\frac{1}{\alpha} + \lambda^{obs(us)}} \le \beta < 1 \;\Rightarrow\; \alpha \le \frac{\beta}{(1-\beta)\,\lambda^{obs(us)}}.$$

• Case 2: −1/α < λ^{obs(us)} < 0. In this case, we have

$$\frac{-\lambda^{obs(us)}}{\frac{1}{\alpha} + \lambda^{obs(us)}} \le \beta < 1 \;\Rightarrow\; \alpha \le \frac{-\beta}{(1+\beta)\,\lambda^{obs(us)}}.$$

• Case 3: λ^{obs(us)} < −1/α. In this case, we have

$$\frac{\lambda^{obs(us)}}{\frac{1}{\alpha} + \lambda^{obs(us)}} \le \beta < 1 \;\Rightarrow\; \frac{1}{\alpha} \le \frac{1-\beta}{\beta}\,\lambda^{obs(us)} < 0,$$

which is not possible since α > 0. Therefore, α should be chosen so that the eigenvalues of the unscaled observation matrix do not fall in this regime; in other words, for negative λ^{obs(us)}, α < −1/λ^{obs(us)}. This condition holds trivially as a consequence of Case 2.

Putting the three cases together yields inequality (12), which guarantees that the largest absolute eigenvalue of the direct dependency matrix is at most β, where 0 < β < 1.
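A sketch of choosing the scaling factor α according to inequality (12), assuming a symmetric input with both positive and negative eigenvalues (the function name is ours):

```python
import numpy as np

def scaling_factor(g_obs_unscaled, beta=0.9):
    """Largest alpha satisfying inequality (12), so that the direct
    matrix has largest absolute eigenvalue at most beta < 1.
    """
    lam = np.linalg.eigvalsh(g_obs_unscaled)  # symmetric input assumed
    lam_pos = lam.max()                       # largest positive eigenvalue
    lam_neg = lam.min()                       # smallest negative eigenvalue
    return min(beta / ((1 - beta) * lam_pos),
               -beta / ((1 + beta) * lam_neg))  # "min", per the 26 Aug 2013 correction

# Usage: g_obs = scaling_factor(g_us, beta=0.5) * g_us
```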

In equation (1), diffusion effects decay exponentially with the indirect path length. For example, second-order indirect effects decay proportionally to β², where β is the largest absolute eigenvalue of G_dir. Therefore, a smaller β means faster convergence of the right-hand side of equation (1) and, in turn, faster decay of diffusion effects. In other words, if β is very small, higher-order interactions play an insignificant role in the observed dependencies, since they decay proportionally to β^k, where k is the order of the indirect interaction. Figure S4 demonstrates the effect of β on the performance of the three applications: inferring gene regulatory networks, inferring protein evolutionary constraints, and inferring weak and strong social ties. As illustrated in this figure, choosing β close to one (i.e., considering higher-order indirect interactions) leads to high performance in all considered network deconvolution applications. For regulatory network inference we used β = 0.5, for protein contact maps we used β = 0.99, and for the co-authorship network we used β = 0.95. Note that β has a stronger effect in the gene regulatory network and protein evolutionary constraint applications than in the co-authorship network application, which means that low-order indirect interactions are sufficient to characterize indirect flows in the co-authorship network compared to the two other applications. This is mainly due to the special structure of the co-authorship network: it can be decomposed locally into densely connected subgraphs, so most transitive edges occur locally within each module, leading to small indirect effects over longer paths; thus, small β values are sufficient to characterize its indirect flows.

1.7 Computational complexity analysis of network deconvolution

The proposed network deconvolution algorithm based on eigen decomposition requires computing the eigen decomposition of the observed dependency matrix. This step has O(n³) computational complexity, where n is the number of nodes in the network. Further, the gradient descent based algorithm is iterative and can be very costly. Therefore, using the baseline network deconvolution algorithms for very large networks can be unattractive owing to the required computational power. Here, we propose two extensions of network deconvolution for very large networks. First, many large networks are low rank, with many zero eigenvalues, and the eigen decomposition step can therefore be performed efficiently by exploiting this sparsity. Second, network deconvolution can be parallelized by applying it to local subgraphs of the network. In this section, we explain these two algorithms in more detail.

To show the effectiveness of these extensions, we generate simulated scale-free networks with various levels of eigen sparsity and numbers of nodes. We add transitive edges to the network probabilistically by considering only two-hop and three-hop indirect paths. Then, we apply ND to discover the transitive edges.

1.7.1 Using sparsity in eigen decomposition

In many applications, networks are low rank (many of the network's eigenvalues are zero or negligible). In those cases, the eigen decomposition can be performed efficiently by computing only the non-zero eigenvalues [3]. Here, we use MATLAB's sparse eigen decomposition algorithm [15, 16]. Figure S5-a shows the CPU time of performing network deconvolution on networks with 1,000 to 50,000 nodes and various eigen sparsity levels. As illustrated in this figure, network deconvolution can be performed efficiently even on very large networks if the network is low rank. Figure S5-b shows the CPU time of ND for various network sizes, averaged over networks whose eigen sparsity is less than or equal to 100. This figure demonstrates that, unlike the baseline network deconvolution algorithm, which has O(n³) complexity, the complexity of performing network deconvolution on low-rank networks is approximately linear in the number of nodes.
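The experiments above use MATLAB's sparse eigensolver; an analogous sketch in Python/SciPy, computing only the r largest-magnitude eigenpairs of a low-rank symmetric network:

```python
from scipy.sparse.linalg import eigsh

def nd_low_rank(g_obs_sparse, r):
    """ND on a low-rank symmetric network: compute only the r
    largest-magnitude eigenpairs instead of the full decomposition.
    """
    lam, u = eigsh(g_obs_sparse, k=r, which='LM')  # Lanczos, r eigenpairs only
    lam_dir = lam / (1.0 + lam)                    # filter of equation (5)
    return (u * lam_dir) @ u.T                     # rank-r reconstruction
```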

1.7.2 Parallelization of network deconvolution

Parallelization of algorithms that require large computational power is important in practice. Here, we explain how network deconvolution can be performed in parallel for very large networks:

First, the network is divided into subgraphs (possibly overlapping) of a certain size. For each subgraph, the first and second neighbors of the subgraph's nodes that are not themselves in the subgraph are called the marginal nodes of that subgraph. To make sure that edges close to subgraph borders do not suffer from the network partitioning, we add the marginal nodes to each subgraph and then perform network deconvolution. We then keep direct dependencies only for subgraph edges (and not for marginal ones). To merge the results of different subgraphs, if edges overlap, we take the mean of their direct dependencies obtained from the different subgraphs. Figure S5-c shows the total CPU time usage, as well as the CPU time usage per core (per subgraph), for different subgraph sizes of a network with 5,000 nodes.

2 Inferring Gene Regulatory Networks by Network Deconvolution

In this section, we focus on the application of network deconvolution to inferring gene regulatory networks from high-throughput gene expression data. In gene regulatory network inference, network deconvolution denoises the network structure: it eliminates indirect information flows over transitive and noisy edges. Network deconvolution can be used either as a stand-alone network inference method or to improve published inference algorithms. In this section, we present the considered framework and demonstrate detailed evaluation results of network deconvolution.

2.1 DREAM framework

We use datasets and the framework from the DREAM5 project [8] to evaluate the performance of the proposed network deconvolution algorithm as a regulatory inference method. Our evaluations consider inferring genome-scale transcriptional regulatory networks from gene expression microarray compendia for a prokaryotic model organism (E. coli), a eukaryotic model organism (S. cerevisiae) and an in silico benchmark. Each of these compendia is represented as an expression matrix of genes and their chip measurements. Further, a list of candidate transcription factors is provided for each compendium. These datasets are fully anonymized (Figure S6).

For each network, a list of directed, unsigned edges is predicted, ordered according to confidence scores. The confidence scores are used only to order the predicted edges and are not used directly in the scoring. To score each of these predicted networks, organism-specific gold standards containing the known transcription factor to target gene interactions (true positives) are used, as compiled in the DREAM project [8]. These gold standards are not available during the inference part. For the evaluation, all transcription factor–target gene pairs that are not part of the gold standards are considered negatives, although, as the gold standards are based on incomplete knowledge, they might contain yet unknown true interactions.

The two biological networks considered (E. coli and S. cerevisiae) have 4,297 and 5,667 genes, of which 296 and 183, respectively, are transcription factors. The in silico network has 1,643 genes, of which 195 are transcription factors. For each of these networks, gene expression is reported under several conditions, such as time courses, perturbations and knock-outs.

It has been shown that preprocessing the data by assigning different weights to various expression data types can enhance the quality of the predicted networks [8]. However, we avoid data preprocessing in order to demonstrate the gain obtained by the network deconvolution algorithm alone.

A key assumption in inferring gene regulatory networks from gene expression data is that mRNA levels of regulators and their targets show some mutual dependencies. Therefore, the dependency content within the interactions of a network can be used as a coarse estimate of inference difficulty. The E. coli and in silico networks show higher information content over interacting pairs than over non-interacting ones [8]. This holds regardless of whether Pearson correlation or mutual information is used to measure dependencies. In the S. cerevisiae network, however, the dependency distributions of interacting and non-interacting pairs are almost identical, which shows that it is more difficult to infer the regulatory network from gene expression data. This fact is reflected in the low scores of the various inference methods for this network.

2.2 Mutual Information

Mutual information is a quantity that measures the mutual dependence of two random variables. For discrete random variables X and Y, it can be defined as [4]:

$$I(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p(x,y) \log\!\left( \frac{p(x,y)}{p(x)\,p(y)} \right), \qquad (14)$$

where p(x,y) is the joint probability distribution of X and Y over their supports $\mathcal{X}$ and $\mathcal{Y}$, and p(x) and p(y) are the marginal probability distribution functions of X and Y, respectively.

In the case of continuous variables, we use a mutual information implementation based on B-spline smoothing of order 3 and discretization into 10 bins [5].
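As a simple stand-in for that estimator, here is a histogram-based sketch of equation (14) for continuous samples (plain equal-width binning rather than the B-spline smoothing of [5]):

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X;Y) in nats from two continuous samples,
    using equal-width binning (a coarse stand-in for the B-spline
    approach of reference [5]).
    """
    p_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy /= p_xy.sum()                     # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)
    nz = p_xy > 0                          # 0 log 0 = 0 convention
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))
```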

2.3 Evaluation functions

We use the same performance evaluation metrics as in the DREAM5 challenge [8]. Network predictions are evaluated as binary classification tasks in which edges are predicted to be present or absent. Standard performance metrics from machine learning are then used: precision-recall (PR) and receiver operating characteristic (ROC) curves. Predictions are then evaluated statistically, and an overall score summarizing the performance over all three networks is computed. Note that, as in the DREAM5 framework, only the top 100,000 edges of each network are considered in the performance evaluation. However, to perform network deconvolution, dense input matrices (10% density) are used.

For each predicted network, the performance was assessed by the area under the ROC curve (AUROC; true positive rate vs. false positive rate) and the area under the precision vs. recall curve (AUPR). Expressions for the true positive rate (TPR), false positive rate (FPR), precision and recall as a function of the cutoff k in the edge list are as follows [8]:

$$\mathrm{recall}(k) = \frac{TP(k)}{P}, \qquad \mathrm{precision}(k) = \frac{TP(k)}{TP(k) + FP(k)} = \frac{TP(k)}{k},$$

where TP(k) and FP(k) are the numbers of true positives and false positives in the top k predictions, and P is the number of positives in the gold standard. The true positive rate, TPR(k), is the same as recall. The false positive rate is the fraction of negatives incorrectly predicted in the top k predictions:

$$FPR(k) = \frac{FP(k)}{N}, \qquad (15)$$

where N is the number of negatives in the gold standard.
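A sketch of these cutoff-based metrics for a ranked edge list, with AUROC and AUPR computed by trapezoidal integration (our own helper, not the DREAM5 scoring code):

```python
import numpy as np

def pr_roc_curves(labels_ranked):
    """Precision, recall and FPR at every cutoff k for a ranked edge
    list (1 = gold-standard positive, 0 = negative), plus AUROC and
    AUPR by trapezoidal integration.
    """
    labels = np.asarray(labels_ranked, dtype=float)
    tp = np.cumsum(labels)                # TP(k)
    k = np.arange(1, len(labels) + 1)
    fp = k - tp                           # FP(k)
    p_total = labels.sum()                # P: positives in the gold standard
    n_total = len(labels) - p_total      # N: negatives in the gold standard
    recall = tp / p_total                 # = TPR(k)
    precision = tp / k
    fpr = fp / n_total                    # equation (15)
    auroc = np.trapz(recall, fpr)
    aupr = np.trapz(precision, recall)
    return precision, recall, fpr, auroc, aupr
```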

AUROC and AUPR values were separately transformed into p-values by simulating a null distribution from a large number (25,000) of random networks [8]. Overall ROC and PR scores are derived from the geometric mean of the network-specific p-values:

$$\text{ROC score} = \frac{1}{3} \sum_{i=1}^{3} -\log_{10} p^{ROC}_i, \qquad \text{PR score} = \frac{1}{3} \sum_{i=1}^{3} -\log_{10} p^{PR}_i,$$

where $p^{ROC}_i$ and $p^{PR}_i$ represent the p-values of the ROC and PR curves for network i. An overall score summarizing the performance over all three networks is then calculated as the mean of the ROC and PR scores.


2.4 Network motif analysis

We performed a network motif analysis [10, 8] to compare the ability of inference methods to correctly predict feed-forward loops and regulatory cascades before and after ND (Figure 2b). We used the refined approach introduced by Petri et al., which measures performance for different motif types based on the area under the ROC curve (AUROC) [9]. Briefly, this involves: (1) identifying the subset of edges pertaining to a given motif type in the true network; (2) evaluating the performance for this subset of edges using the AUROC. Note that edges are counted only once for a given motif type, even if they are part of several overlapping motif instances of the same type.

Using this approach, we computed the motif-specific AUROC of regulatory cascades and feed-forward loops for each network inference method. The prediction bias is given by the difference between the overall AUROC of a given inference method (computed over the complete set of edges, as described in Methods) and the motif-specific AUROC. Figure 2b shows the average prediction biases over the in silico and E. coli networks (following the DREAM project [8], we excluded S. cerevisiae because the inference methods did not recover enough true edges for a meaningful network motif analysis).

3 Inferring Protein Structural Constraints by Network Deconvolution

Applying network deconvolution to a pairwise evolutionary correlation matrix reduces transitive noise effects, which in turn leads to higher-quality contact map predictions. We evaluate the performance of network deconvolution in inferring contact maps for fifteen tested proteins in different folding classes, with sizes ranging from 50 to 260 residues [11]. The list of these proteins is presented in Table S2. Mutual information (MI) and direct information (DI) [11], a method recently developed for the protein structure prediction problem that seeks to explicitly distribute interaction probabilities maximally across different residue pairs, were used to compute covariation similarities among residues. We used the full MI and DI matrices computed from the alignments at http://evofold.org/ using the code provided on the website. Network deconvolution is then applied to the MI and DI matrices.

We say two amino acids are in contact if their CA atom distance is less than 7 angstroms. The trend of our results is robust to the choice of threshold (Figure S10); however, choosing a small threshold leads to sparse contact maps over which significant statistical results cannot be inferred, while large thresholds may introduce false contacts. We also report results for both a smaller (5 angstroms, Figure S10-a) and a larger (8 angstroms, Figure S10-b) threshold. Distant contacts are contacts whose amino acid indices differ by at least 4, with two modifications: we exclude contacts in alpha-helices between positions i and i+8 when there are already contacts between positions i and i+4 and between i+4 and i+8, and we exclude contacts between positions i and i+4 to remove trivial contacts due to alpha-helices. More elaborate contact refinement schemes could be used as well [11].
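A sketch of this contact definition from CA coordinates (the alpha-helix exclusions described above are omitted here for brevity):

```python
import numpy as np

def distant_contact_map(ca_coords, threshold=7.0, min_sep=4):
    """Binary distant-contact map from an (n, 3) array of CA coordinates:
    contact if the CA-CA distance is below threshold (angstroms) and the
    residue indices differ by at least min_sep. The additional
    alpha-helix exclusions described above are not applied in this sketch.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise CA distances
    i, j = np.indices(dist.shape)
    return (dist < threshold) & (np.abs(i - j) >= min_sep)
```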

For evaluation, we consider the fraction of discovered true positive contacts among the top predictions. We also say a discovered contact is non-redundant when it cannot be explained by previously discovered ones (i.e., there is no indirect path connecting them over the network). Non-redundant contacts provide important constraints for predicting the 3D structure of proteins.

We compare the performance of network deconvolution (ND) against mutual information (MI) and direct information (DI). Figure S7 shows the average total and non-redundant contact discovery rates for 14 of the tested proteins; we excluded protein 1hzx because all methods had very low performance on it, possibly owing to its poor alignment.

The recovery rate of contacts after ND is consistently higher than for MI on each of the 15 tested proteins, leading to an average increase of 90%, 65% and 50% in the top 50, 100 and 250 predictions, respectively. ND leads to a small but consistent improvement across the top predictions of direct information (DI). Remarkably, among the strongest-scoring predictions, ranked between 0 and 50 for each method, ND outperforms DI, both independently and as a post-processing step (Figures S7-a and S8). Focusing on the discovery of non-redundant contacts, which cannot be explained by previously discovered ones, ND shows an even greater improvement over MI, by 116%, 95% and 60% in the top 50, 100 and 250 predictions, respectively (Figure S7-b). Considering non-redundant interactions, ND improves over DI by 9%, 8% and 6% for the top 50, 100 and 250 predictions, respectively (Figures S7-b and S9). We also developed an alternative definition of redundant contacts, using a window-based approach (i.e., predictions that are within a distance window of previously discovered contacts are considered redundant). Under this metric, ND shows a consistent performance increase at all cutoffs over both MI and DI (Figures S7-c,d).


4 Inferring Weak and Strong Collaboration Ties by Network Deconvolution

Network deconvolution provides a systematic way of distinguishing weak and strong ties in a network. Applying network deconvolution to an input adjacency matrix ranks edges based on their global importance to information spread through the network. Top-ranking edges have a greater influence on information spread in the network, while deletion of low-ranking edges has only a minor effect on network information flows.

In this section, we consider the application of network deconvolution to inferring weak and strong ties in a co-authorship network compiled by M. Newman [13]. Here, we introduce this network and a gold standard that assigns tie strengths to edges by using additional information about each publication.

The co-authorship network created by M. Newman [13] is used to evaluate the performance of network deconvolution in distinguishing weak and strong ties. The nodes of this network represent 1,589 scientists working in the network science field. Two scientists are considered connected if they have coauthored at least one paper.

Additional publication details such as the number of pa-pers each pair of scientists has collaborated and the num-ber of coauthors on each of those papers have been usedto calculate a collaboration tie strength for each pair ofscientists as follows14:

First, each joint publication is weighted inversely by its number of coauthors. The intuition is that, in general, two scientists whose names appear on a paper with many other coauthors know one another less well than two who were the sole authors of a paper. Second, the strengths of the ties derived from each of the papers written by a particular pair of scientists are added together. The intuition here is that authors who have written many papers together know one another better, on average, than those who have written few papers together. Naturally, this is only a coarse approximation of collaboration tie strength.
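
A minimal sketch of this weighting scheme [14], assuming each paper is represented as a hypothetical iterable of author identifiers:

    from collections import defaultdict
    from itertools import combinations

    def tie_strengths(papers):
        # each paper with n coauthors contributes 1/(n-1) to every author pair,
        # and contributions are summed over all joint papers
        w = defaultdict(float)
        for authors in papers:
            n = len(authors)
            if n < 2:
                continue  # single-author papers carry no ties
            for a, b in combinations(sorted(authors), 2):
                w[(a, b)] += 1.0 / (n - 1)
        return dict(w)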

As explained above, additional information about each publication can be used to calculate tie strengths [14]. However, we show that network deconvolution can recover many of these collaboration tie strengths solely from the topology of the network.

We say two individuals have a strong coauthorship tie if their assigned coauthorship strength in the calculated gold standard is greater than or equal to 1/2. Around 36% of edges are labeled as strong ties according to this threshold (991 out of 2,742 edges). The performance trend does not change when this threshold is varied.

Applying network deconvolution to the unweighted co-authorship network allows edges to be ranked based on their global significance in the context of the network. The obtained global edge ranking is significantly correlated with the co-authorship tie gold standard calculated from publication details (the R² correlation coefficient is around 0.76). Note that, in Figure 4b, tie strengths are mapped to the interval between 0 and 1 using their edge ranks. In particular, network deconvolution recovers around 77% of strong co-authorship ties solely from the network topology. Moreover, transitive collaborations are recognized as weak ties by both ND and Newman's tie strength weights.
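
The rank-based mapping to [0, 1] used for this comparison could look like the following minimal sketch (the function name is ours; the released code may differ):

    import numpy as np
    from scipy.stats import rankdata

    def rank_normalize(scores):
        # map edge scores to [0, 1] by their ranks; ties receive average ranks
        r = rankdata(np.asarray(scores))
        return (r - 1.0) / (len(r) - 1.0)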

References

[1] G. Altay and F. Emmert-Streib. Inferring the conservative causal core of gene regulatory networks. BMC Systems Biology, 4(1):132, 2010.

[2] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[3] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, 1992.

[4] T. Cover, J. Thomas, J. Wiley, et al. Elements of Information Theory, volume 6. Wiley Online Library, 1991.

[5] J. J. Faith, B. Hayete, J. T. Thaden, I. Mogno, J. Wierzbowski, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology, 5:54–66, 2007.

[6] T. Hopf, L. Colwell, R. Sheridan, B. Rost, C. Sander, and D. Marks. Three-dimensional structures of membrane proteins from genomic sequencing. Cell, 2012.

[7] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1990.

[8] D. Marbach et al. Wisdom of crowds for robust gene network inference. Nature Methods, 2012.

[9] T. Petri, S. Altmann, L. Geistlinger, R. Zimmer, and K. Robert. Inference of eukaryote transcription regulatory networks. In preparation, 2012.

[10] D. Marbach et al. Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences of the United States of America, 107, 2010.

[11] D. Marks, L. Colwell, R. Sheridan, T. Hopf, A. Pagnani, R. Zecchina, and C. Sander. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE, 6(12):e28766, 2011.

[12] P. Meyer, D. Marbach, S. Roy, and M. Kellis. Information-theoretic inference of gene networks using backward elimination, 2010.

[13] M. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.

[14] M. Newman et al. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.

[15] R. B. Lehoucq and D. C. Sorensen. Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM Journal on Matrix Analysis and Applications, 17:789–821, 1996.

[16] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM Journal on Matrix Analysis and Applications, 13:357–385, 1992.


Figure S1: Effects of applying network deconvolution on network eigenvalues. Panel (a) shows an example of how eigenvalues of the observed matrix change under network deconvolution. In general, ND decreases the magnitudes of large positive eigenvalues that were inflated due to transitivity effects over the network. Panel (b) shows the network deconvolution nonlinear mapping function that is applied to eigenvalues of the unscaled observed dependency matrix to compute the eigenvalues of the direct dependency matrix.
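
For reference, under the series model G_obs = G_dir + G_dir^2 + G_dir^3 + ... used throughout (and up to the linear scaling step), the mapping in panel (b) has the closed form, per eigenvalue i:

    \lambda_i^{obs} = \frac{\lambda_i^{dir}}{1 - \lambda_i^{dir}}
    \quad\Longleftrightarrow\quad
    \lambda_i^{dir} = \frac{\lambda_i^{obs}}{1 + \lambda_i^{obs}}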


[Figure S2 plot: error (y axis) versus iteration # (x axis).]

Figure S2: Convergence of network deconvolution for non-decomposable matrices. This figure illustrates the convergence of Algorithm 3 on a random asymmetric network with 50 nodes and 490 edges. As the number of iterations increases, the energy of the error term goes to zero.


Figure S3: Robustness of network deconvolution against noise. This figure illustrates the performance of network deconvolution as a gene regulatory network inference method on noisy data. The y axis represents the inference score (a combination of AUPR and AUROC scores), as explained in Section 2.3. However, here, to observe noise effects fully, unlike the DREAM framework scoring (see Section 2.3), we do not threshold the inferred networks. The E. coli (E) and in silico (I) networks are considered in this setup, where artificial Gaussian noise with a variance equal to a fraction of the experiment variance is added to each experiment in the gene expression datasets. As the figure shows, the overall score decreases smoothly as the noise power increases, which indicates the robustness of network deconvolution against noise. The error bars represent standard deviations of the evaluation points.
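
A minimal sketch of this perturbation, assuming a genes x experiments expression matrix X in which each column is one experiment (the layout is our assumption):

    import numpy as np

    def add_noise(X, frac, rng=np.random.default_rng(0)):
        # per-experiment (column) Gaussian noise whose variance is
        # frac times that experiment's variance across genes
        sigma = np.sqrt(frac * X.var(axis=0))
        return X + rng.normal(0.0, sigma, size=X.shape)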


[Figure S4 plots: (a) gene regulatory network prediction score, (b) protein contact map prediction score, and (c) strong co-authorship tie prediction score, each as a function of beta (0 to 1).]

Figure S4: Linear scaling effects on network deconvolution performance. In the first step of the network deconvolution framework, a linear scaling function is applied to the unscaled observed dependency matrix so that the largest absolute eigenvalue of the direct dependency matrix is less than or equal to β. A smaller β means faster convergence of the right-hand side of equation (1) and, in turn, faster decay of diffusion effects. Therefore, for small β, higher-order indirect interactions play smaller roles in the observed dependencies. This figure illustrates the effect of β on the performance of network deconvolution in the considered applications. Panel (a) shows the overall score of network deconvolution as a gene regulatory network inference method over the three considered networks, E. coli, in silico and S. cerevisiae; this overall score combines AUPR and AUROC scores as explained in Section 2.3 (for more details, see Section 2). Panel (b) shows the effect of β on the performance of network deconvolution in inferring protein evolutionary constraints; the y axis shows the overall contact map prediction rate for the top 100 predictions over all 15 tested proteins, as explained in Section 3 (for more details, see Section 3). Panel (c) illustrates the effect of β on the performance of network deconvolution in inferring weak and strong social ties; the y axis shows the strong tie prediction ratio over the considered co-authorship network, as explained in Section 4. As illustrated in this figure, choosing β close to one (i.e., considering higher-order indirect interactions) leads to high performance in all considered network deconvolution applications. For regulatory network inference we used β = 0.5, for protein contact maps β = 0.99, and for the co-authorship network β = 0.95. Further, note that, beyond the social network application, considering higher-order indirect interactions is also important in the gene regulatory inference and protein structural constraint inference applications.


[Figure S5 plots: (a) CPU time usage of ND (log scale) versus the number of non-zero eigenvalues of the network, for sparse ND and ND on networks of 1000, 5000, 10000 and 50000 nodes; (b) average CPU time usage (log scale) versus the number of nodes in the network, for sparse ND and ND; (c) total CPU time usage and CPU time usage per core versus subgraph size (100%, 15%, 10%, 5%).]

Figure S5: Scalability of network deconvolution to large networks using eigen sparsity and parallelization. Panel (a) shows the CPU time usage of performing network deconvolution on networks with 1,000 to 50,000 nodes at various eigen sparsity levels. As illustrated, network deconvolution can be performed efficiently even on very large networks if the network is low rank. Panel (b) shows the CPU time usage of ND for various network sizes, averaged over networks whose eigen sparsity is less than or equal to 100. This panel demonstrates that, unlike the baseline network deconvolution algorithm, whose complexity is of order O(n^3), the complexity of performing network deconvolution on low-rank networks is approximately linear in the number of nodes in the network. Panel (c): parallelization of network deconvolution is important for applying it effectively to very large networks. This panel shows the total CPU time usage and the CPU time usage per core (per subgraph) for different subgraph sizes of a network with 5,000 nodes.
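
A minimal sketch of the low-rank variant, assuming a symmetric, already-scaled input and a truncated ARPACK eigendecomposition (cf. the implicitly restarted Arnoldi methods of refs. [15, 16]); the function name and interface are illustrative:

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def sparse_nd(G_obs, k):
        # keep only the k largest-magnitude eigenpairs (k << n), so the cost
        # scales far better than the O(n^3) full eigendecomposition
        vals, vecs = eigsh(G_obs, k=k, which='LM')
        vals_dir = vals / (1.0 + vals)   # same eigenvalue mapping as full ND
        return (vecs * vals_dir) @ vecs.T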


Figure S6: DREAM framework. DREAM5 participants were solicited to infer gene regulatory interactions from three different expression compendia. For each network, a list of directed, unsigned edges is predicted, ordered according to confidence scores. To score each of these predicted networks, organism-specific gold standards containing the known transcription factor to target gene interactions (true positives) are used, as compiled in the DREAM project [8]. These gold standards are not available during the inference stage. For the evaluation, all transcription factor-target gene pairs that are not part of the gold standards are considered negatives.
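
Under this setup, the per-network evaluation reduces to standard AUPR/AUROC computations; a minimal sketch using scikit-learn, with hypothetical toy values:

    import numpy as np
    from sklearn.metrics import average_precision_score, roc_auc_score

    def dream_scores(y_true, y_score):
        # y_true: 1 for gold-standard TF-target pairs, 0 for all other pairs;
        # y_score: predicted confidence for each pair, in the same order
        return (average_precision_score(y_true, y_score),
                roc_auc_score(y_true, y_score))

    aupr, auroc = dream_scores(np.array([1, 0, 1, 0]),
                               np.array([0.9, 0.4, 0.7, 0.2]))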


[Figure S7 plots, each versus number of top predictions (0 to 250), with curves for MI, DI, MI+ND and DI+ND: (a) average fraction of discovered contacts; (b) average fraction of non-redundant discovered contacts; (c), (d) average fraction of distributed discovered contacts for window sizes 1 and 2.]

Figure S7: Overall total and non-redundant contact discovery rates. (a) Recovery of truly interacting amino acid residues with increasing numbers of top-scoring pairs for mutual information (MI), direct information (DI), and network deconvolution (ND). Recovery after ND is consistently higher than MI for each of the tested proteins, leading to an average increase of 90%, 65% and 50% in the top 50, 100 and 250 predictions. ND leads to a small but consistent improvement across the top predictions of direct information (DI). In the strongest-scoring predictions, ranked between 0 and 50 for each method, ND outperforms DI, both independently and as a post-processing step. (b) Discovery of non-redundant contacts, which cannot be explained by previously discovered ones. ND shows improvement over both MI and DI, especially in the top predictions. (c), (d) An alternate definition of redundant contacts, using a window-based approach (predictions that are within a distance window of previously-discovered contacts are considered redundant). Under this metric, ND shows a consistent performance increase at all cut-offs.


[Figure S8 plots: fraction of discovered contacts versus number of top predictions (0 to 200) for each tested protein (1hzx, 3tgi, 5p21, 1f21, 1e6k, 1bkr, 2it6, 1rqm, 2o72, 1r9h, 1odd, 1g2e, 1wvn, 5pti, 2hda), with curves for MI, DI, MI+ND and DI+ND.]

Figure S8: Discovery rate of interacting contacts for all tested proteins. The contact recovery rate after ND is consistently higher than MI for each of the 15 tested proteins, leading to an average increase of 90%, 65% and 50% in the top 50, 100 and 250 predictions. ND leads to a small but consistent improvement across the top predictions of direct information (DI). Remarkably, in the strongest-scoring predictions, ranked between 0 and 50 for each method, ND outperforms DI, both independently and as a post-processing step. We excluded protein 1hzx (highlighted in the figure) from overall performance comparisons, as all methods had very low performance on it, possibly due to its poor alignment.


[Figure S9 plots: fraction of discovered non-redundant contacts versus number of top predictions (0 to 200) for each tested protein (same 15 proteins as Figure S8), with curves for MI, DI, MI+ND and DI+ND.]

Figure S9: Discovery rate of non-redundant contacts for all tested proteins. Focusing on non-redundant contacts, which cannot be explained by previously discovered ones, ND shows a significant improvement over the performance of MI, by 116%, 95%, and 60% in the top 50, 100, and 250 predictions, respectively. The improvement of ND over DI is even more pronounced when considering non-redundant interactions, with ND showing improvements of 9%, 8%, and 6% for the top 50, 100, and 250 predictions, respectively.


[Figure S10 plots, each versus number of top predictions (0 to 250), with curves for MI, DI, MI+ND and DI+ND: (a) average fraction of discovered contacts and average fraction of non-redundant discovered contacts at the 5 Angstrom threshold; (b) the same quantities at the 8 Angstrom threshold.]

Figure S10: Total and non-redundant discovery rates for different thresholds of contact proximity. The performance trend of ND is robust to the choice of the proximity threshold used to define contacts. However, choosing a small threshold leads to sparse contact maps from which statistically significant results cannot be inferred, while large thresholds may introduce false contacts. We chose a threshold of 7 Angstroms and also report the results for both a smaller (5 Angstrom, part a) and a larger (8 Angstrom, part b) threshold.


Network name    # of nodes   Network density   Ave. # of neighbors   Clustering coeff.   Network diameter   Network centrality
in silico       1643         0.003             5.107                 0.185               10                 0.021
E. coli         4297         0.004             3.802                 0.19                13                 0.252
S. cerevisiae   5667         0.002             3.947                 0.071               9                  0.108
Co-authorship   1589         0.003             3.754                 0.694               17                 0.021

Table S1: General properties of considered networks. This table presents general properties of the in silico, E. coli, S. cerevisiae and co-authorship networks.


Protein PDB ID   Protein name     Protein PDB ID   Protein name      Protein PDB ID   Protein name
1hzx             OPSD_BOVIN       1bkr             SPTB2_HUMAN       1odd             OMPR_ECOLI
3tgi             TRY2_RAT         2it6             A8MVQ9_HUMAN      1g2e             ELAV4_HUMAN
5p21             RASH_HUMAN       1rqm             THIO_ALIAC        1wvn             PCBP1_HUMAN
1f21             RNH_ECOLI        2o72             CADH1_HUMAN       5pti             BPT1_BOVIN
1e6k             CHEY_ECOLI       1r9h             O45418_CAEEL      2hda             YES_HUMAN

Table S2: Tested protein names and PDB IDs. This table presents the tested protein names and their PDB IDs.
