Generating Scale Free Networks
S Konini and EJ Janse van Rensburg
Mathematics & Statistics, York University
Toronto, Ontario
Models: Barabasi-Albert, Vazquez, Sole, iSite
– Protein interaction network
Saskatoon 2016 2 / 40
Source: Albert, Scale-free networks in cell biology (Journal of Cell Science 2005; 118: 4947–4957; doi: 10.1242/jcs.02714)
– Scale free networks
Degree distribution: P(k) = Prob(degree of node is k)
Normally P(k) is binomially distributed
Scale free if P(k) obeys a power law:

P(k) ≃ C₀ k^(−γ)
Finite network of n nodes:

P(k) ≃ C₀ k^(−γ)

P(k) is not integrable if γ ≤ 1 as n → ∞. However,

∑_{k=0}^{n} P(k) ≃ ∫_1^n P(k) dk ≃ C₀ (1 − n^(1−γ))/(γ − 1)

if γ ≠ 1, is finite for finite n.
The average degree sequence {〈d_k〉} satisfies

〈d_k〉 ≃ n P(k) if n is large
Assume that for finite networks (of order n)

P(k) ≃ C₀ k^(−γ),  1 ≤ k ≤ n

The average degree (connectivity) for networks of size n is

〈k〉_n = (∑_{k=1}^n k P(k)) / (∑_{k=1}^n P(k)) ≃ (∫_1^n k P(k) dk) / (∫_1^n P(k) dk) ≃ ((γ−1)/(γ−2)) (n^γ − n²)/(n^γ − n)

so that

〈k〉_n ≃  ((1−γ)/(2−γ)) n,          if γ < 1;
          n/(log n),                 if γ = 1;
          ((γ−1)/(2−γ)) n^(2−γ),    if 1 < γ < 2;
          log n,                     if γ = 2;
          (γ−1)/(γ−2),               if γ > 2.
Number of edges (by handshaking):

E_n = (1/2) n 〈k〉_n

E_n ≃  ((1−γ)/(2(2−γ))) n²,         if γ < 1 (dense);
       n²/(2 log n),                 if γ = 1 (marginally dense);
       ((γ−1)/(2(2−γ))) n^(3−γ),    if 1 < γ < 2 (super linear);
       (1/2) n log n,                if γ = 2 (marginally sparse);
       ((γ−1)/(2(γ−2))) n,          if γ > 2 (sparse).          (1)
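These regimes are easy to check numerically (an illustration of mine, not from the talk): the ratio of the two integrals defining 〈k〉_n is evaluated by the trapezoid rule, after the substitution k = e^u to tame the integrand near k = 1.

```python
import math

def power_integral(a, n, m=100000):
    """Trapezoid estimate of the integral of k^a over [1, n].

    Substituting k = e^u gives the smooth integrand e^((a+1)u) on [0, log n].
    """
    U = math.log(n)
    h = U / m
    g = lambda u: math.exp((a + 1.0) * u)
    return h * (0.5 * (g(0.0) + g(U)) + sum(g(i * h) for i in range(1, m)))

def mean_degree(gamma, n):
    """<k>_n from P(k) = C0 * k^(-gamma); the constant C0 cancels."""
    return power_integral(1.0 - gamma, n) / power_integral(-gamma, n)

n = 10 ** 6
print(mean_degree(1.5, n))   # ~ n^(2-gamma) = sqrt(n) = 1000 for 1 < gamma < 2
print(mean_degree(2.5, n))   # ~ (gamma-1)/(gamma-2) = 3 for gamma > 2
```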
– Barabasi-Albert clusters

Barabasi & Albert in Science; 286:509–512 (1999)
Attach new nodes preferentially to vertices of high degree

Barabasi-Albert algorithm – growth by preferential attachment:
1  Initiate the network with one node x₀;
2  Suppose that the network consists of nodes {x₀, x₁, . . . , x_{n−1}} of degrees {k₀, k₁, . . . , k_{n−1}};
3  Append a new node x_n by executing step 3(a) or step 3(b):
   3(a)  With probability p: select x_j uniformly and attach x_n to it by inserting the edge 〈x_j∼x_n〉;
   3(b)  With default probability 1−p: attach x_n by adding edges 〈x_j∼x_n〉 with probability k_j/∑_j k_j for 1 ≤ j ≤ n−1;
4  Stop the algorithm when the network has order n.
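The steps above can be sketched in a few lines (my sketch, not the authors' code). Step 3(b) is implemented in its common single-edge form: one neighbour is drawn with probability k_j/∑_j k_j, realised by sampling a uniform entry of the list of edge endpoints, since a node of degree k_j appears k_j times in that list.

```python
import random

def barabasi_albert(n, p, seed=None):
    """Grow a Barabasi-Albert cluster of order n; returns the edge list.

    Step 3(a): with probability p attach the new node to a uniform node.
    Step 3(b): otherwise attach it preferentially, with probability
    k_j / sum_j k_j (a uniform draw from the edge-endpoint list).
    """
    rng = random.Random(seed)
    edges, endpoints = [], []
    for new in range(1, n):
        if new == 1 or rng.random() < p:
            j = rng.randrange(new)        # 3(a): uniform attachment
        else:
            j = rng.choice(endpoints)     # 3(b): degree-proportional
        edges.append((j, new))
        endpoints += (j, new)             # node j now appears k_j times
    return edges
```

With p = 0 this is pure preferential attachment.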
– Modified Barabasi-Albert clusters

Modify the algorithm by introducing parameters (λ, A):
Attach x_n by adding edges 〈x_j∼x_n〉 with probability

q_j = min{(λ k_j + A)/∑_j k_j, 1}

The canonical algorithm is recovered when λ = 1 and A = 0.
Replace step 3(b) by

Modified Barabasi-Albert algorithm (grow Barabasi-Albert clusters of order n):
3(b)  With default probability 1−p: attach x_n by adding edges 〈x_j∼x_n〉 with probability q_j = min{(λ k_j + A)/∑_j k_j, 1}, where λ and A are non-negative parameters of the algorithm;
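The modified step 3(b) can be written down directly (a hypothetical helper of mine): every existing node j receives an edge to the new node independently with probability q_j.

```python
import random

def attach_modified(degrees, lam, A, rng=random):
    """Modified step 3(b): return the nodes j that the new node connects to.

    q_j = min((lam * k_j + A) / sum_j k_j, 1); taking lam = 1 and A = 0
    recovers the canonical probability k_j / sum_j k_j.
    """
    total = sum(degrees)
    return [j for j, k in enumerate(degrees)
            if rng.random() < min((lam * k + A) / total, 1.0)]
```

For example, with λ = 0 and large A every q_j saturates at 1 and the new node connects to every existing node.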
Growing a cluster with p = 0.5 and of order n = 100000
Barabasi-Albert Cluster with p = 0.5
Barabasi-Albert clusters
Modified Barabasi-Albert Clusters (λ = 0.1 left, and λ = 1.5 right)
– Mean field theory for the Modified Barabasi-Albert algorithm

Barabasi-Albert clusters are relatively sparse networks.

k_j(n) = degree of node j after n iterations
Mean field: k_j(n) = 〈k_j(n)〉

Elementary move of the algorithm:
3(a)  With probability p, append a random edge and node:
      probability of attaching to node j = p/n
3(b)  Default: append edges 〈x_j∼x_n〉 with probability
      (1−p) × (λ k_j(n) + A)/∑_j k_j(n)
Expected number of edges added per iteration:
3(a)  p
3(b)  (1−p) × ∑_j (λ k_j(n) + A)/∑_j k_j(n) = (1−p) × (λ + nA/∑_j k_j(n))

The number of edges is E_n = (1/2) ∑_j k_j(n), so

∆E_n = p + (1−p)λ + (1−p) nA/(2E_n).

Approximate by a differential equation:

2E_n (d/dn) E_n = 2(p + (1−p)λ) E_n + (1−p) nA.

Solve:

E_n = (n/2) ((p + (1−p)λ) + √((p + (1−p)λ)² + 2(1−p)A)) = C n

γ > 2 (the network is sparse)
To determine the value of γ, write the recurrence for k_j(n):

k_j(n+1) = k_j(n) + p/n + (1−p)(λ k_j(n) + A)/(2E_n)

where E_n = C n.
Differential equation approximation (node j is added at n_j):

(d/dn) k_j(n) = p/n + (1−p)(λ k_j(n) + A)/(2Cn);   IC: k_j(n_j) = 1

Solve the equation:

k_j(n) = (1 + Q/P) (n/n_j)^P − Q/P,   where Q = p + (1−p)A/(2C) and P = (1−p)λ/(2C)

But

P(k_j(n) < κ) ≳ P( n_j/n > ((Q/P + κ)/(1 + Q/P))^(−1/P) )   for 0 ≤ n_j ≤ n

That is,

P(k_j(n) < κ) ≳ 1 − ((Q/P + κ)/(1 + Q/P))^(−1/P)
Take the derivative to find

P(κ) = P[k_j(n) = κ] = (∂/∂κ) P[k_j(n) < κ] ≃ (P+Q)^(1/P)/(Pκ + Q)^(1+1/P) ∼ κ^(−1−1/P)

so that

γ = 1 + 1/P = 1 + ((p + (1−p)λ) + √((p + (1−p)λ)² + 2(1−p)A))/((1−p)λ)

Canonical Barabasi-Albert clusters (λ = 1 and A = 0) are sparse:

γ = 3 + 2p/(1−p) ≥ 3

If A = 0 then

γ = 3 + 2p/((1−p)λ) ≥ 3

If λ = 1 then

γ = 1 + 1/(1−p) + √(1 + 2(1−p)A)/(1−p) ≥ 3
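The exponent formula packages into a small utility (mine, for checking the special cases quoted above):

```python
import math

def gamma_exponent(p, lam, A):
    """Mean-field gamma = 1 + 1/P for modified Barabasi-Albert clusters."""
    s = p + (1.0 - p) * lam                       # s = p + (1-p)*lambda
    return 1.0 + (s + math.sqrt(s * s + 2.0 * (1.0 - p) * A)) / ((1.0 - p) * lam)

print(gamma_exponent(0.0, 1.0, 0.0))   # canonical case, p = 0: gamma = 3.0
print(gamma_exponent(0.5, 2.0, 0.0))   # A = 0 case: 3 + 2p/((1-p)*lam) = 4.0
```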
Plotting log P(k)/log k against 1/log k for Barabasi-Albert clusters:
If p = 0 then γ = 3.026.
〈k〉_n → constant, and the clusters are sparse.
– Duplication-Divergence clusters

Vazquez et al. in ComplexUs; 1:38–44 (2003)

Duplication-Divergence algorithm:
1  Initiate the network with one node x₀;
2  Duplication: choose a vertex υ uniformly, and duplicate it to υ′;
3  For all edges 〈w∼υ〉 incident with υ, add 〈w∼υ′〉;
4  With probability p add the edge 〈υ∼υ′〉;
5  Divergence: delete one of {〈w∼υ〉, 〈w∼υ′〉} with probability q;
6  Stop the algorithm when a network of order n is grown.
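A sketch of the algorithm (my code, not the authors'; in step 5 I read "delete one of the pair" as deleting a uniformly chosen member of each duplicated edge pair, independently with probability q):

```python
import random

def duplication_divergence(n, p, q, seed=None):
    """Grow a Duplication-Divergence cluster of order n; returns adjacency sets."""
    rng = random.Random(seed)
    adj = [set()]                               # step 1: one node x0
    while len(adj) < n:
        v = rng.randrange(len(adj))             # step 2: pick progenitor v
        vp = len(adj)                           # duplicate v to v'
        adj.append(set())
        pairs = list(adj[v])                    # neighbours to duplicate
        for w in pairs:                         # step 3: add <w ~ v'>
            adj[vp].add(w)
            adj[w].add(vp)
        if rng.random() < p:                    # step 4: edge <v ~ v'>
            adj[v].add(vp)
            adj[vp].add(v)
        for w in pairs:                         # step 5: divergence
            if rng.random() < q:
                u = v if rng.random() < 0.5 else vp
                adj[u].discard(w)
                adj[w].discard(u)
    return adj
```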
Growing a cluster with p = 0.1, q = 0.3 and of order n = 50000
Duplication-Divergence cluster with p = 0.1, q = 0.3
Growing a cluster with p = 0.1, q = 0.7 and of order n = 500000
Duplication-Divergence cluster with p = 0.1, q = 0.7
– Duplication-Divergence clusters
Left: p = 1, q = 0.4, n = 300; Right: p = 1, q = 0.6, n = 300
– Mean field theory for Duplication-Divergence clusters

In each iteration the number of edges grows by

E_{n+1} = E_n + p + k_j(n) − q k_j(n)

Put 2E_n = ∑_j k_j(n) = n a_n; averaging over the choice of duplicated node,

E_{n+1} = E_n + p + 2(1−q) E_n/n

Approximate by a differential equation and solve:

E_n = ((1 − 2(q−p))/(2(1−2q))) n^(2(1−q)) − pn/(1−2q)

Three regimes:

E_n ≃  ((1 − 2(q−p))/(2(1−2q))) n^(2(1−q)),  if q < 1/2;
       pn log n,                              if q = 1/2;
       pn/(2q−1),                             if q > 1/2.
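The recurrence for E_n can be iterated directly to check the growth exponent in the q < 1/2 regime (a check of mine, with an arbitrary initial condition E_1 = 1):

```python
import math

def edges_mean_field(n, p, q, E1=1.0):
    """Iterate E_{m+1} = E_m + p + 2(1-q) E_m / m from m = 1 up to m = n."""
    E = E1
    for m in range(1, n):
        E += p + 2.0 * (1.0 - q) * E / m
    return E

# Effective exponent from doubling n; should approach 2(1-q) = 1.2:
p, q = 0.75, 0.4
tau = math.log(edges_mean_field(200000, p, q)
               / edges_mean_field(100000, p, q)) / math.log(2.0)
print(tau)   # close to 1.2 (the subleading pn term slows convergence)
```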
The mean field degree distribution is

P(k) ∼  (1 − 2(q−p)) k^(−1−2q),  if q < 1/2;
        2p k^(−2),                if q = 1/2;
        C₀ k^(−γ),                if q > 1/2.

This gives the γ exponent

γ = 1 + 2q,  provided that q ≤ 1/2

Test this by plotting

log P(k)/log(k+1) = −γ + C/log(k+1) + o(1/k)
– Duplication-Divergence clusters: p = 1, q = 0.4

(Red)  log P(k)/log(k+1) against 1/log(k+1) for n ≤ 200000
(Blue) log(P(2k)/P(k))/log 2 against log(k+1)/k for n ≤ 200000
– Duplication-Divergence clusters: p = 1, q = 0.6

(Red)  log P(k)/log(k+1) against 1/log(k+1) for n ≤ 200000
(Blue) log(P(2k)/P(k))/log 2 against log(k+1)/k for n ≤ 200000
Connectivity of Duplication-Divergence clusters:

〈k〉_n ≃  ((1 − 2(q−p))/(1−2q)) n^(1−2q) − 2p/(1−2q),  if q < 1/2;
          2p log n + 1,                                  if q = 1/2;
          constant,                                      if q > 1/2.      (2)

In the case that q > 1/2,

E_n ≃ pn/(2q−1) = ((γ−1)/(2(γ−2))) n

This gives the tentative prediction

γ = 1 + 2p/(1 + 2p − 2q),  if q > 1/2
– Connectivity data for Duplication-Divergence clusters (p = 0.75)

      n    q = 0.4   q = 0.5   q = 0.6
   3125      25.9      11.4      5.93
   6250      31.3      12.6      6.14
  12500      37.3      13.6      6.33
  25000      44.8      14.4      6.55
  50000      51.9      15.5      6.64
 100000      60.1      16.8      6.75
 200000      70.3      17.7      6.88

q = 0.4: 〈k〉_n ≈ 8.5 n^(0.2) (the data give 8.0)
q = 0.5: 〈k〉_n ≃ 1.5 log n (the data give 1.45 log n)
q = 0.6: the data give γ ≈ 2.13 (and γ = 1 + 2p/(1 + 2p − 2q) = 2.15)
– Sole clusters

Pastor-Satorras, Smith & Sole in Journal of Theoretical Biology; 222:199–210 (2003)
– iSite evolutionary clusters
Gibson & Goldberg in Bioinformatics; 27:376–382 (2011)
[Figure: schematic of a duplication step in the iSite model. A progenitor protein carrying iSites A and B is duplicated to a successor protein carrying the copy A′. The duplicate A′ is active with probability q, is self-interacting with probability p (adding the edge 〈A∼A′〉), and each existing interaction such as 〈A∼B〉 is copied to 〈A′∼B〉 with probability 1−r.]
The iSite evolutionary algorithm:
iSites are self-interacting with probability p and are active with probability q; interactions are lost with probability r.

1  Initiate the network with one node x₀ with I active iSites;
2  Choose a progenitor protein υ uniformly and duplicate it to a successor protein υ′:
   a duplicated iSite A′ ∈ υ′ is active with probability q;
   a duplicated iSite A′ ∈ υ′ is self-interacting with probability p;
3  If A′ is active, add new interactions as follows:
   if iSite A′ ∈ υ′ is self-interacting then add the edge 〈A∼A′〉;
   if 〈A∼B〉 is an interaction, then duplicate it to 〈A′∼B〉 with probability 1−r;
4  Iterate the algorithm until a network of order N is grown.
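A sketch of an implementation (mine, not the authors' code). Each protein is a list of iSite ids, and each iSite carries the set of iSites it interacts with; following the mean-field convention on the next slides, a duplicated iSite is taken to be silenced (dropped) with probability q.

```python
import random

def isite_model(N, I, p, q, r, seed=None):
    """Grow an iSite cluster of N proteins.

    Returns (proteins, partners): proteins maps each protein to its list
    of active iSite ids; partners maps each iSite id to the set of iSite
    ids it interacts with.
    """
    rng = random.Random(seed)
    proteins = [list(range(I))]                 # protein 0 has I active iSites
    partners = {a: set() for a in range(I)}
    nxt = I
    while len(proteins) < N:
        prog = proteins[rng.randrange(len(proteins))]   # progenitor protein
        succ = []
        for A in prog:                          # duplicate iSite A -> A'
            if rng.random() < q:
                continue                        # A' silenced
            Ap, nxt = nxt, nxt + 1
            partners[Ap] = set()
            succ.append(Ap)
            if rng.random() < p:                # A' self-interacting: <A ~ A'>
                partners[A].add(Ap)
                partners[Ap].add(A)
            for B in list(partners[A]):         # copy <A ~ B> with prob 1 - r
                if B != Ap and rng.random() < 1.0 - r:
                    partners[Ap].add(B)
                    partners[B].add(Ap)
        proteins.append(succ)                   # successor protein
    return proteins, partners
```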
– iSite clusters
– Mean field theory for iSite clusters

i_j(n) = # active iSites on node j after n iterations
The average number of iSites per protein is

i(n) = (1/n) ∑_j i_j(n)

k_j(n) = degree of node j after n iterations, and

2E_n = ∑_j k_j(n)
Mean field: i(n) iSites are created and q i(n) are silenced.
Recurrence for i(n):

(n+1) i(n+1) = n i(n) + (1−q) i(n)

since n i(n) = # iSites and (1−q) i(n) are added in the mean field.
Exact solution (with i(0) = I):

i(n) = I Γ(1−q+n)/(n! Γ(1−q)) ≃ I n^(−q)/Γ(1−q)
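The exact solution can be verified against the recurrence numerically (my check; log-gamma avoids overflow in Γ(1−q+n)/n! for large n):

```python
import math

def i_recurrence(n, I, q):
    """Iterate (m+1) i(m+1) = (m + 1 - q) i(m) from i(0) = I."""
    i = float(I)
    for m in range(n):
        i *= (m + 1.0 - q) / (m + 1.0)
    return i

def i_exact(n, I, q):
    """I * Gamma(1 - q + n) / (n! * Gamma(1 - q)), computed via lgamma."""
    return I * math.exp(math.lgamma(1.0 - q + n)
                        - math.lgamma(n + 1.0) - math.lgamma(1.0 - q))

n, I, q = 5000, 3, 0.4
print(i_recurrence(n, I, q))                 # matches i_exact(n, I, q)
print(I * n ** (-q) / math.gamma(1.0 - q))   # asymptotic I n^-q / Gamma(1-q)
```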
The number of edges increases in the mean field:

E_{n+1} = E_n + 2(1−r) E_n/n + p i(n)

since 〈k_j(n)〉 = 2E_n/n edges are duplicated with probability 1−r, and p i(n) edges are created by self-interacting iSites.
Approximate this with a differential equation:

(d/dn) E_n = 2(1−r) E_n/n + (pI/Γ(1−q)) n^(−q)

Solve this with IC E_1 = 0:

E_n = (pI/((1+q−2r) Γ(1−q))) (n^(2−2r) − n^(1−q)).
Mean field connectivity of iSite clusters:

〈k〉_n = 2E_n/n ≃ (2pI/((1+q−2r) Γ(1−q))) (n^(1−2r) − n^(−q))

〈k〉_n is dominated by the larger of −q and 1−2r. The γ exponent is

γ =  1 + 2r,  if r < (1+q)/2;
     2 + q,   if r > (1+q)/2.

If 2r = 1+q then a different solution is obtained:

〈k〉_n = (pI/Γ(1−q)) n^(−q) log n

so γ = 2+q with a log n correction.
– iSite clusters
Plot of log P(k)/ log(k + 1) against 1/ log(k + 1)
I = 3, p = 0.5, q = 0.4, r = 0.3
For these parameters, γ = 1 + 2r = 1.6
– Connectivity data for iSite clusters

      n    Column 2   Column 3   Column 4   Column 5
   3125     22.385     20.701      4.756      6.648
   6250     26.524     25.752      4.770      6.556
  12500     31.395     29.137      4.677      6.579
  25000     37.808     35.308      4.733      6.358
  50000     45.931     42.244      4.579      6.299
 100000     54.830     50.035      4.584      6.204
 200000     64.668     59.284      4.649      6.071

Column 2: I = 3, p = 0.5, q = 0.4, r = 0.3
Column 3: I = 5, p = 0.5, q = 0.4, r = 0.3
Column 4: I = 3, p = 0.5, q = 0.05, r = 0.8
Column 5: I = 5, p = 0.5, q = 0.05, r = 0.8

Fit 〈k〉_n = ((γ−1)/(2−γ)) n^(2−γ):
Columns 2 & 3: least squares gives γ = 1.74 and γ = 1.74 (mean field γ = 1.6)
Columns 4 & 5: γ = 2.1 and γ = 2.02 (mean field γ = 2.05)
– Computational time complexity of implemented algorithms

Time complexity ∼ n^τ

Algorithm                                   n = 6250   n = 12500   n = 25000   n = 50000     τ
Bar-Alb (p = 0)                               0.602       2.51        9.03        38.0     1.97
Mod Bar-Alb (λ = 2, p = A = 0)                0.618       2.55        10.1        36.3     1.96
Dupl-Div (p = 1, q = 0.4)                     0.349       0.862       2.04        5.01     1.28
Dupl-Div (p = 1, q = 0.6)                     0.155       0.319       0.635       1.31     1.02
Sole (δ = 0.25, α = 0.005)                    4.84        20.5        91.0       436.0     2.16
Sole (δ = 0.75, α = 0.005)                    6.10        20.0        79.5       323.2     1.92
iSite (p = 0.5, q = 0.01, r = 0.8, I = 1)     0.114       0.234       0.454       0.925    1.00
iSite (p = 0.5, q = 0.01, r = 0.8, I = 2)     0.110       0.216       0.458       0.878    1.01
iSite (p = 0.5, q = 0.01, r = 0.8, I = 3)     0.106       0.217       0.432       0.857    1.00
iSite (p = 0.5, q = 0.01, r = 0.8, I = 4)     0.107       0.231       0.422       0.848    0.98
iSite (p = 0.25, q = 0.01, r = 0.8, I = 4)    0.104       0.249       0.415       0.844    0.98
iSite (p = 0.75, q = 0.01, r = 0.8, I = 4)    0.108       0.216       0.437       0.867    1.00
– Conclusions

A variety of algorithms were examined in the mean field.
Mixed success: in some cases the mean field predictions are good, in other cases perhaps not.
The Sole model was also considered: the clusters are not scale free, but do exhibit a distribution which scales.
Variants of the models were also introduced, and their properties considered.
A paper is being written and will be on the arXiv when ready.

Thank you Chris S for the invitation to Saskatoon!