Post on 12-Jul-2015
Sang Hoon Lee Department of Energy Science, Sungkyunkwan University
http://sites.google.com/site/lshlj82
Community Structures of Multilayer Political Cosponsorship Networks
The 2nd Daegu Gyeongbuk International Social Network Conference (DISC 2014)
Community structures in networks
modularity (the objective function to be maximized)
M. A. Porter, J.-P. Onnela, and P. J. Mucha, Not. Am. Math. Soc. 56, 1082 (2009); S. Fortunato, Phys. Rep. 486, 75 (2010).
\[
Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(g_i, g_j)
\]
where the adjacency matrix $A_{ij} \neq 0$ if nodes i and j are connected and $A_{ij} = 0$ otherwise, $k_i$ is the degree (number of neighboring nodes of i) or strength (sum of weights around i), $g_i$ is the community to which i belongs, and m is the total number of edges or sum of weights in the network
resolution parameter $\gamma$: controls the characteristic size of communities
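The definition above can be checked by hand on a small network. The following is a minimal sketch in plain Python, assuming a hypothetical toy network (two triangles joined by one edge) and a function name `modularity` chosen only for illustration; it evaluates Q for a given partition and does not perform the maximization.

```python
from itertools import product

def modularity(adj, groups, gamma=1.0):
    """Q = (1/2m) * sum_ij [A_ij - gamma * k_i k_j / (2m)] * delta(g_i, g_j)."""
    n = len(adj)
    strength = [sum(row) for row in adj]   # k_i: degree or strength of node i
    two_m = sum(strength)                  # 2m: every edge is counted twice
    q = 0.0
    for i, j in product(range(n), repeat=2):
        if groups[i] == groups[j]:         # delta(g_i, g_j) = 1
            q += adj[i][j] - gamma * strength[i] * strength[j] / two_m
    return q / two_m

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

two_triangles = [0, 0, 0, 1, 1, 1]   # one community per triangle
one_group = [0] * 6                  # everything in a single community

print(modularity(A, two_triangles))  # (12 - 7) / 14 = 5/14
print(modularity(A, one_group))      # 0 for gamma = 1
```

For this toy network the two-triangle partition gives Q = (12 - 7γ)/14, so raising γ lowers the score of large communities relative to smaller ones.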
importing network data, identifying community structure, visualizing
smaller communities
TAXONOMIES OF NETWORKS FROM COMMUNITY STRUCTURE, PHYSICAL REVIEW E 86, 036104 (2012)
that all edges are antiferromagnetic at resolution $\lambda = \Lambda_{\max}$ and thereby forces each node into its own community.
III. MESOSCOPIC RESPONSE FUNCTIONS (MRFS)
To describe how a network disintegrates into communities as the value of $\lambda$ is increased from $\Lambda_{\min}$ to $\Lambda_{\max}$ [see Fig. 1(a) for a schematic], one needs to select summary statistics. There are many possible ways to summarize such a disintegration process, and we focus on three diagnostics that characterize fundamental properties of network communities.
First, we use the value of the Hamiltonian $H(\lambda)$ (1), which is a scalar quantity closely related to network modularity and quantifies the energy of the system [13,14]. Second, we calculate a partition entropy $S(\lambda)$ to characterize the community size distribution. To do this, let $n_k$ denote the number of nodes in community k and define $p_k = n_k/N$ to be the probability to choose a node from community k uniformly at random. This yields a (Shannon) partition entropy of $S(\lambda) = -\sum_{k=1}^{\eta(\lambda)} p_k \log p_k$, which quantifies the disorder in the associated community size distribution. Third, we use the number of communities $\eta(\lambda)$.
[Figure 1 image: panels (a), (b), and (c). Panel labels: $\xi = 0$, $\eta = 1$; $\xi = 0.2$, $\eta = 8$; $\xi = 0.4$, $\eta = 12$; $\xi = 0.6$, $\eta = 17$; $\xi = 0.8$, $\eta = 24$; $\xi = 1$, $\eta = 34$. Legend: ferromagnetic links, nonlinks, antiferromagnetic links. Curves: $H_{\mathrm{eff}}$, $S_{\mathrm{eff}}$, $\eta_{\mathrm{eff}}$, with axis ticks from 0 to 1.]
FIG. 1. (Color online) (a) Schematic of some of the ways that a network can break up into communities as the value of $\lambda$ (or $\xi$) is increased. (b) Zachary Karate Club network [23] for different values of the effective fraction of antiferromagnetic edges $\xi$. All interactions are either ferromagnetic or antiferromagnetic; i.e., for the values of $\lambda$ that we used, there are no neutral interactions. We color edges in blue if the corresponding interactions are ferromagnetic, and we color them in red if the interactions are antiferromagnetic. We color the nodes based on community affiliation. (c) The $H_{\mathrm{eff}}$, $S_{\mathrm{eff}}$, and $\eta_{\mathrm{eff}}$ MRFs, and the interaction matrix J for different values of $\xi$. We color elements of the interaction matrix by depicting the absence of an edge in white, ferromagnetic edges in blue (dark gray), and antiferromagnetic edges in red (light gray).
Because we need to normalize H, S, and $\eta$ to compare them across networks, we define an effective energy
\[
H_{\mathrm{eff}}(\lambda) = \frac{H(\lambda) - H_{\min}}{H_{\max} - H_{\min}} = 1 - \frac{H(\lambda)}{H_{\min}}, \tag{4}
\]
where $H_{\min} = H(\Lambda_{\min})$ and $H_{\max} = H(\Lambda_{\max})$; an effective entropy
\[
S_{\mathrm{eff}}(\lambda) = \frac{S(\lambda) - S_{\min}}{S_{\max} - S_{\min}} = \frac{S(\lambda)}{\log N}, \tag{5}
\]
where $S_{\min} = S(\Lambda_{\min})$ and $S_{\max} = S(\Lambda_{\max})$; and an effective number of communities
\[
\eta_{\mathrm{eff}}(\lambda) = \frac{\eta(\lambda) - \eta_{\min}}{\eta_{\max} - \eta_{\min}} = \frac{\eta(\lambda) - 1}{N - 1}, \tag{6}
\]
where $\eta_{\min} = \eta(\Lambda_{\min}) = 1$ and $\eta_{\max} = \eta(\Lambda_{\max}) = N$.
Some networks contain a small number of entries $\Lambda_{ij}$ that are orders of magnitude larger than most other entries. For example, in the network of Facebook friendships at Caltech [21,22], 98% of the $\Lambda_{ij}$ entries are less than 100, but 0.02% of them are larger than 8000. These large $\Lambda_{ij}$ values arise when two low-strength nodes become connected. Using the null model $P_{ij} = k_i k_j/(2m)$, the interaction between two nodes i and j becomes antiferromagnetic when $\lambda > A_{ij}/P_{ij} = 2mA_{ij}/(k_i k_j)$. If a network has a large total edge weight but both i and j have small strengths compared to other nodes in the network, then $\lambda$ needs to be large to make the interaction antiferromagnetic. In prior studies, network community structure has been investigated at different mesoscopic scales by considering plots of various diagnostics as a function of the resolution parameter [13,14,17]. In the present example, such plots would be dominated by interactions that require large resolution-parameter values to become antiferromagnetic. To overcome this issue, we define the effective fraction of antiferromagnetic edges
\[
\xi = \xi(\lambda) = \frac{A(\lambda) - A(\Lambda_{\min})}{A(\Lambda_{\max}) - A(\Lambda_{\min})} \in [0,1], \tag{7}
\]
where $A(\lambda)$ is the total number of antiferromagnetic interactions for the given value of $\lambda$. In other words, it is the number of $\Lambda_{ij}$ elements that are smaller than $\lambda$. Thus, $A(\Lambda_{\min})$ is the largest number of antiferromagnetic interactions for which a network still forms a single community, and the effective number of antiferromagnetic interactions $\xi(\lambda)$ is the number of antiferromagnetic interactions (normalized to the unit interval) in excess of $A(\Lambda_{\min})$. The function $\xi(\lambda)$ increases monotonically in $\lambda$.
Sweeping $\lambda$ from $\Lambda_{\min}$ to $\Lambda_{\max}$ corresponds to sweeping the value of $\xi$ from 0 to 1. (One can think of $\lambda$ as a continuous variable and $\xi$ as a discrete variable that changes with events.) As we perform such sweeping for a given network, the number of communities increases from $\eta(\xi = 0) = 1$ to $\eta(\xi = 1) = N$ and yields a vector $[H_{\mathrm{eff}}(\xi), S_{\mathrm{eff}}(\xi), \eta_{\mathrm{eff}}(\xi)]$ whose components we call the mesoscopic response functions (MRFs) of that network. (We also sometimes refer to the vector itself as an MRF.) Because $H_{\mathrm{eff}} \in [0,1]$, $S_{\mathrm{eff}} \in [0,1]$, $\eta_{\mathrm{eff}} \in [0,1]$, and $\xi \in [0,1]$ for every network, we can compare the MRFs across networks and use them to identify groups of networks with similar mesoscopic structures.
In Fig. 1(b), we show the Zachary Karate Club network [23] for different values of
J.-P. Onnela et al., Phys. Rev. E 86, 036104 (2012).
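The normalized diagnostics of the excerpt are simple to evaluate once a partition is known. The following is a minimal sketch in plain Python, assuming a hypothetical toy network and hypothetical function names; it covers the effective entropy of Eq. (5), the effective number of communities of Eq. (6), and the counting in Eq. (7). It restricts $\Lambda_{ij}$ to linked pairs and takes $\Lambda_{\min}$ as an input, since finding the true $\Lambda_{\min}$ requires optimizing the Hamiltonian, which this sketch does not attempt.

```python
import math
from collections import Counter

def effective_entropy(groups):
    """S_eff = S / log N, with S = -sum_k p_k log p_k and p_k = n_k / N."""
    n = len(groups)
    s = -sum((c / n) * math.log(c / n) for c in Counter(groups).values())
    return s / math.log(n)

def effective_num_communities(groups):
    """eta_eff = (eta - 1) / (N - 1)."""
    return (len(set(groups)) - 1) / (len(groups) - 1)

def lambda_entries(adj):
    """Lambda_ij = A_ij / P_ij = 2m A_ij / (k_i k_j) for linked pairs i < j."""
    k = [sum(row) for row in adj]
    two_m = sum(k)
    n = len(adj)
    return [two_m * adj[i][j] / (k[i] * k[j])
            for i in range(n) for j in range(i + 1, n) if adj[i][j]]

def xi(lam, lambdas, lam_min, lam_max):
    """Effective fraction of antiferromagnetic edges, following Eq. (7)."""
    def count(x):  # A(x): number of Lambda_ij entries smaller than x
        return sum(1 for v in lambdas if v < x)
    return (count(lam) - count(lam_min)) / (count(lam_max) - count(lam_min))

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

groups = [0, 0, 0, 1, 1, 1]
lams = lambda_entries(A)
print(effective_entropy(groups))          # log 2 / log 6
print(effective_num_communities(groups))  # (2 - 1) / (6 - 1)
# lam_min = 0.0 is a placeholder below every entry, not the true Lambda_min.
print(xi(2.0, lams, 0.0, max(lams)))
```

Because every quantity is rescaled to the unit interval, the same three numbers can be compared across networks of different size, which is the point of the normalization in Eqs. (4) to (7).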
www.nature.com/nature
doi: 10.1038/nature09182 SUPPLEMENTARY INFORMATION
[Figure 3 image: network of Les Miserables characters with node labels such as Valjean, Javert, Cosette, Marius, Fantine, and Gavroche, and an axis with ticks from 0 to 1.]
Figure 3: Link communities for the coappearance network of characters in the novel Les Miserables [9]. (Top) the network with link colors indicating the clustering, with grey indicating single-link clusters. Each node is depicted as a pie-chart representing its membership distribution. The main characters have more diverse community membership. (Bottom) the full link dendrogram (left) and partition density (right). Note the internal blue community in the large blue and red clique containing Valjean. Link clustering is able to unveil hierarchical structure even inside of cliques.
2.3.1 Clique percolation
Clique percolation [11, 15] provides an elegant and highly useful method to uncover overlapping community structure [16]. It is currently the most popular and most successful tool available for this task. A particularly interesting feature of this method is that it presents the experimenter with a knob k, the clique size, which can be used to tune the result between high coverage, low community quality (sparse communities) and low coverage, high community quality (dense communities). For some networks, such as the mobile phone network, a precedent exists for the choice of k, which we follow. Whenever that is not the case, we have computed the composite performance for a range of values of k and chosen the k which results in the optimum overall performance (footnote 2). This weighs coverage and quality equally, however, and it remains at the discretion of the researcher to decide if this is optimal for his or her application. See Appendix A.2.
Footnote 2: For some of the very large or very dense networks, we were not able to run clique percolation for large values of k with the fastest existing software (even on a machine with 32 Gb of RAM), using the fast algorithm developed by Kumpula et al. [17].
Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, Nature 466, 761 (2010).
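The role of the knob k can be seen on a small example. The following is a minimal brute-force sketch of k-clique percolation in plain Python, assuming a hypothetical toy graph and a hypothetical function name `clique_percolation`; it is not the fast algorithm cited in the footnote and is only practical for very small graphs.

```python
from itertools import combinations

def clique_percolation(adj, k):
    """Brute-force k-clique percolation for a small undirected graph.

    adj maps each node to the set of its neighbours. Two k-cliques are
    adjacent when they share k - 1 nodes; each community is the union of
    the nodes in one connected group of adjacent k-cliques.
    """
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))   # union-find over the k-cliques

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    merged = {}
    for idx, clique in enumerate(cliques):
        merged.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(c) for c in merged.values())

# Hypothetical toy graph: triangles {0,1,2} and {1,2,3} share an edge,
# triangle {3,4,5} shares only node 3 with them, and node 6 hangs off node 5.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(clique_percolation(adj, 3))  # node 3 belongs to both communities
print(clique_percolation(adj, 4))  # no 4-cliques, so nothing is covered
```

In this toy graph k = 3 leaves node 6 uncovered and lets node 3 sit in two communities, while k = 4 covers no node at all, which illustrates the coverage versus density trade-off described above.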
note: i and j are node indices, and s and r are layer indices.
The adjacency tensor $A_{ijs} \neq 0$ if nodes i and j are connected in layer s, and $A_{ijs} = 0$ otherwise.
$k_{is}$ is the degree (or strength) of node i in layer s,
$m_s$ is the number of edges (or sum of weights) in layer s,
and $\gamma_s = \gamma$ is the resolution parameter in layer s.
$C_{jsr} = \omega \neq 0$ if layers s and r are connected via node j, and $C_{jsr} = 0$ otherwise.
The normalization factor $2\mu = \sum_{ijs} A_{ijs} + \sum_{jsr} C_{jsr}$ gives $Q_{\mathrm{multilayer}} \in [-1, 1]$.
\[
Q_{\mathrm{multilayer}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr})
\]
Community Structure in Time-Dependent, Multiscale, and Multiplex Networks
Peter J. Mucha,1,2* Thomas Richardson,1,3 Kevin Macon,1 Mason A. Porter,4,5 Jukka-Pekka Onnela6,7
Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework allows studies of community structure in a general setting encompassing networks that evolve over time, have multiple types of links (multiplexity), and have multiple scales.
The study of graphs, or networks, has a long tradition in fields such as sociology and mathematics, and it is now ubiquitous in academic and everyday settings. An important tool in network analysis is the detection of mesoscopic structures known as communities (or cohesive groups), which are defined intuitively as groups of nodes that are more tightly connected to each other than they are to the rest of the network (1–3). One way to quantify communities is by a quality function that compares the number of intracommunity edges to what one would expect at random. Given the network adjacency matrix A, where the element $A_{ij}$ details a direct connection between nodes i and j, one can construct a quality function Q (4, 5) for the partitioning of nodes into communities as $Q = \sum_{ij} (A_{ij} - P_{ij}) \delta(g_i, g_j)$, where $\delta(g_i, g_j) = 1$ if the community assignments $g_i$ and $g_j$ of nodes i and j are the same and 0 otherwise, and $P_{ij}$ is the expected weight of the edge between i and j under a specified null model.
The choice of null model is a crucial consideration in studying network community structure (2). After selecting a null model appropriate to the network and application at hand, one can use a variety of computational heuristics to assign nodes to communities to optimize the quality Q (2, 3). However, such null models have not been available for time-dependent networks; analyses have instead depended on ad hoc methods to piece together the structures obtained at different times (6–9) or have abandoned quality functions in favor of such alternatives as the Minimum Description Length principle (10). Although tensor decompositions (11) have been used to cluster network data with different types of connections, no quality-function method has been developed for such multiplex networks.
We developed a methodology to remove these limits, generalizing the determination of community structure via quality functions to multislice networks that are defined by coupling multiple adjacency matrices (Fig. 1). The connections encoded by the network slices are flexible; they can represent variations across time, variations across different types of connections, or even community detection of the same network at different scales. However, the usual procedure for establishing a quality function as a direct count of the intracommunity edge weight minus that expected at random fails to provide any contribution from these interslice couplings. Because they are specified by common identifications of nodes across slices, interslice couplings are either present or absent by definition, so when they do fall inside communities, their contribution in the count of intracommunity edges exactly cancels that expected at random. In contrast, by formulating a null model in terms of stability of communities under Laplacian dynamics, we have derived a principled generalization of community detection to multislice networks,
REPORTS
1 Carolina Center for Interdisciplinary Applied Mathematics, Department of Mathematics, University of North Carolina, Chapel Hill, NC 27599, USA. 2 Institute for Advanced Materials, Nanoscience and Technology, University of North Carolina, Chapel Hill, NC 27599, USA. 3 Operations Research, North Carolina State University, Raleigh, NC 27695, USA. 4 Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford OX1 3LB, UK. 5 CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, UK. 6 Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, USA. 7 Harvard Kennedy School, Harvard University, Cambridge, MA 02138, USA.
*To whom correspondence should be addressed. E-mail: mucha@unc.edu
Fig. 1. Schematic of a multislice network. Four slices s = {1, 2, 3, 4} represented by adjacencies $A_{ijs}$ encode intraslice connections (solid lines). Interslice connections (dashed lines) are encoded by $C_{jrs}$, specifying the coupling of node j to itself between slices r and s. For clarity, interslice couplings are shown for only two nodes and depict two different types of couplings: (i) coupling between neighboring slices, appropriate for ordered slices; and (ii) all-to-all interslice coupling, appropriate for categorical slices.
[Figure 2 image: three panels titled coupling = 0, coupling = 0.1, and coupling = 1; vertical axis: nodes (5 to 30); horizontal axis: resolution parameters (1 to 4).]
Fig. 2. Multislice community detection of the Zachary Karate Club network (22) across multiple resolutions. Colors depict community assignments of the 34 nodes (renumbered vertically to group similarly assigned nodes) in each of the 16 slices (with resolution parameters $\gamma_s$ = {0.25, 0.5, …, 4}), for $\omega$ = 0 (top), $\omega$ = 0.1 (middle), and $\omega$ = 1 (bottom). Dashed lines bound the communities obtained using the default resolution ($\gamma$ = 1).
14 MAY 2010 VOL 328 SCIENCE www.sciencemag.org 876
CORRECTED 16 JULY 2010; SEE LAST PAGE
Multilayer community detection
P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, Science 328, 876 (2010).
different slices: time series or categories
nodes in individual slices
(weighted) edges
with a single parameter controlling the interslice correspondence of communities.
Important to our method is the equivalence between the modularity quality function (12) [with a resolution parameter (5)] and stability of communities under Laplacian dynamics (13), which we have generalized to recover the null models for bipartite, directed, and signed networks (14). First, we obtained the resolution-parameter generalization of Barber's null model for bipartite networks (15) by requiring the independent joint probability contribution to stability in (13) to be conditional on the type of connection necessary to step between two nodes. Second, we recovered the standard null model for directed networks (16, 17) (again with a resolution parameter) by generalizing the Laplacian dynamics to include motion along different kinds of connections, in this case both with and against the direction of a link. By this generalization, we similarly recovered a null model for signed networks (18). Third, we interpreted the stability under Laplacian dynamics flexibly to permit different spreading weights on the different types of links, giving multiple resolution parameters to recover a general null model for signed networks (19).
We applied these generalizations to derive null models for multislice networks that extend the existing quality-function methodology, including an additional parameter $\omega$ to control the coupling between slices. Representing each network slice s by adjacencies $A_{ijs}$ between nodes i and j, with interslice couplings $C_{jrs}$ that connect node j in slice r to itself in slice s (Fig. 1), we have restricted our attention to unipartite, undirected network slices ($A_{ijs} = A_{jis}$) and couplings ($C_{jrs} = C_{jsr}$), but we can incorporate additional structure in the slices and couplings in the same manner as demonstrated for single-slice null models. Notating the strengths of each node individually in each slice by $k_{js} = \sum_i A_{ijs}$ and across slices by $c_{js} = \sum_r C_{jsr}$, we define the multislice strength by $\kappa_{js} = k_{js} + c_{js}$. The continuous-time Laplacian dynamics given by
\[
\dot{p}_{is} = \sum_{jr} \frac{(A_{ijs} \delta_{sr} + \delta_{ij} C_{jsr}) \, p_{jr}}{\kappa_{jr}} - p_{is} \tag{1}
\]
respects the intraslice nature of $A_{ijs}$ and the interslice couplings of $C_{jsr}$. Using the steady-state probability distribution $p^*_{jr} = \kappa_{jr}/2\mu$, where $2\mu = \sum_{jr} \kappa_{jr}$, we obtained the multislice null model in terms of the probability $\rho_{is|jr}$ of sampling node i in slice s conditional on whether the multislice structure allows one to step from (j, r) to (i, s), accounting for intra- and interslice steps separately as
\[
\rho_{is|jr} \, p^*_{jr} = \left[ \frac{k_{is}}{2 m_s} \frac{k_{jr}}{\kappa_{jr}} \delta_{sr} + \frac{C_{jsr}}{c_{jr}} \frac{c_{jr}}{\kappa_{jr}} \delta_{ij} \right] \frac{\kappa_{jr}}{2\mu} \tag{2}
\]
where $m_s = \sum_j k_{js}$. The second term in parentheses, which describes the conditional probability of motion between two slices, leverages the definition of the $C_{jsr}$ coupling. That is, the conditional probability of stepping from (j, r) to (i, s) along an interslice coupling is nonzero if and only if i = j, and it is proportional to the probability $C_{jsr}/\kappa_{jr}$ of selecting the precise interslice link that connects to slice s. Subtracting this conditional joint probability from the linear (in time) approximation of the exponential describing the Laplacian dynamics, we obtained a multislice generalization of modularity (14):
\[
Q_{\mathrm{multislice}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr}) \tag{3}
\]
where we have used reweighting of the conditional probabilities, which allows a different resolution $\gamma_s$ in each slice. We have absorbed the resolution parameter for the interslice couplings into the magnitude of the elements of $C_{jsr}$, which, for simplicity, we presume to take binary values {0, $\omega$} indicating the absence (0) or presence ($\omega$) of interslice links.
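Equation (3) can be evaluated directly for a small multislice network. The following is a minimal sketch in plain Python, assuming a hypothetical toy network, a hypothetical function name, and a single resolution shared by all slices. It follows the slide's convention that $m_s$ counts the edges in slice s, so that $2 m_s = \sum_i k_{is}$, and it only evaluates Q for a given assignment rather than optimizing it.

```python
def multislice_modularity(slices, groups, gamma=1.0, omega=1.0, ordered=True):
    """Evaluate Q_multislice for undirected slices over the same node set.

    slices[s][i][j] is A_ijs and groups[s][i] is g_is. The coupling C_jsr is
    omega between neighbouring slices (ordered=True) or between every pair
    of distinct slices (ordered=False), and 0 otherwise.
    """
    n_slices, n = len(slices), len(slices[0])
    k = [[sum(row) for row in a] for a in slices]   # k_is
    two_m = [sum(ks) for ks in k]                   # 2 m_s (assumed convention)

    def coupling(s, r):
        if s == r:
            return 0.0
        return omega if (not ordered or abs(s - r) == 1) else 0.0

    total = 0.0    # bracketed terms that fall inside communities
    two_mu = 0.0   # 2 mu = sum_ijs A_ijs + sum_jsr C_jsr
    for s in range(n_slices):
        two_mu += two_m[s]
        for i in range(n):
            for j in range(n):
                if groups[s][i] == groups[s][j]:
                    total += slices[s][i][j] - gamma * k[s][i] * k[s][j] / two_m[s]
        for r in range(n_slices):
            c = coupling(s, r)
            for j in range(n):
                two_mu += c
                if groups[s][j] == groups[r][j]:
                    total += c
    return total / two_mu

# Hypothetical example: the same two-triangle network in two slices.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
g = [0, 0, 0, 1, 1, 1]

print(multislice_modularity([A, A], [g, g], omega=1.0))  # 22 / 40
print(multislice_modularity([A, A], [g, g], omega=0.0))  # 10 / 28
```

With $\omega = 0$ the slices decouple and the value reduces to the single-slice modularity of the toy network; increasing $\omega$ rewards assignments in which a node keeps the same community across slices.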
[Figure 3 image: panels A and B. Axis labels: Year (1800 to 2000), Senator, Congress # (10 to 110), and state abbreviations. Party-affiliation counts printed on the community bars: 40PA, 24F, 8AA; 151DR, 30AA, 14PA, 5F; 141F, 43DR; 44D, 2R; 1784R, 276D, 149DR, 162J, 53W, 84 other; 176W, 97AJ, 61DR, 49A, 24D, 19F, 13J, 37 other; 3168D, 252R, 73 other; 222D, 6W, 11 other; 1490R, 247D, 19 other.]
Fig. 3. Multislice community detection of U.S. Senate roll call vote similarities (23) with $\omega$ = 0.5 coupling of 110 slices (i.e., the number of 2-year Congresses from 1789 to 2008) across time. (A) Colors indicate assignments to nine communities of the 1884 unique senators (sorted vertically and connected across Congresses by dashed lines) in each Congress in which they appear. The dark blue and red communities correspond closely to the modern Democratic and Republican parties, respectively. Horizontal bars indicate the historical period of each community, with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration; AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J, Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communities appeared simultaneously. (B) The same assignments according to state affiliations.
JUKKA-PEKKA ONNELA et al. PHYSICAL REVIEW E 86, 036104 (2012)
[Figure 7 image: panels titled Social, Facebook, Political: voting, Political: cosponsorship, Political: committee, Protein interaction, Metabolic, Brain, Fungal, Financial, Language, and Collaboration.]
FIG. 7. (Color online) MRFs for all of the network categories containing at least eight networks (see Table I). At each value of $\xi$, the upper curve shows the maximum value of $H_{\mathrm{eff}}$ (magenta, left panel in each category), $S_{\mathrm{eff}}$ (blue, center panels), and $\eta_{\mathrm{eff}}$ (black, right panels) for all networks in the category and the lower curve shows the minimum value. The dashed curves show the corresponding mean MRFs.
A. Voting in the United States Senate
Our first example deals with roll-call voting in the US Senate [31–34,48]. Establishing a taxonomy of networks detailing the voting similarities of individual legislators complements previous studies of these data, and it facilitates the comparison of voting similarity networks across time. We consider Congresses 1–110, which cover the period 1789–2008. As in Ref. [34], we construct networks from the roll-call data [31,32] for each two-year Congress such that the adjacency matrix element $A_{ij} \in [0,1]$ represents the number of times Senators i and j voted the same way on a bill (either both in favor of it or both against it) divided by the total number of bills on which both of them voted. Following the approach of Ref. [32], we consider only nonunanimous roll-call votes, which are defined as votes in which at least 3% of the Senators were in the minority.
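The network construction described above can be sketched in a few lines. The following plain-Python example assumes a hypothetical encoding of the roll-call data (+1 for yea, -1 for nay, 0 for not voting), a hypothetical function name, and that the 3% minority share is measured among the senators who cast a vote; none of these details are specified in the excerpt.

```python
def agreement_matrix(votes, minority_threshold=0.03):
    """A_ij = (bills on which i and j voted the same way) / (bills both voted on).

    votes[b][i] is +1 (yea), -1 (nay), or 0 (did not vote) for senator i on
    bill b. Bills are kept only if the minority side holds at least
    `minority_threshold` of the votes cast (assumed reading of the 3% rule).
    """
    n = len(votes[0])
    kept = []
    for bill in votes:
        yea, nay = bill.count(1), bill.count(-1)
        cast = yea + nay
        if cast and min(yea, nay) / cast >= minority_threshold:
            kept.append(bill)
    A = [[0.0] * n for _ in range(n)]   # diagonal left at zero (no self-loops)
    for i in range(n):
        for j in range(i + 1, n):
            both = [b for b in kept if b[i] != 0 and b[j] != 0]
            if both:
                same = sum(1 for b in both if b[i] == b[j])
                A[i][j] = A[j][i] = same / len(both)
    return A

# Hypothetical roll calls for four senators; the second bill is unanimous.
votes = [
    [1, 1, -1, -1],
    [1, 1, 1, 1],
    [1, -1, -1, 0],
    [1, 1, -1, 1],
]
A_votes = agreement_matrix(votes)
print(A_votes[0][1])  # agree on 2 of the 3 kept bills
print(A_votes[0][3])  # agree on 1 of the 2 kept bills both voted on
```

The resulting matrix is symmetric with entries in [0, 1], so it can be passed to any weighted community detection routine.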
Much research on the US Congress has been devoted to the ebb and flow of partisan polarization over time and the influence of parties on roll-call voting [33,34]. In highly polarized legislatures, representatives tend to vote along party lines, so there are strong similarities in the voting patterns of members of the same party and strong differences between members of different parties. In contrast, during periods of low polarization, the party lines become blurred. The notion of partisan polarization can be used to help understand the taxonomy of Senates in Fig. 8, in which we consider two measures of polarization. The first measure uses DW-Nominate scores (a multidimensional scaling technique commonly used in political science [32,33]), where the extent of polarization is given by the absolute value of the difference between the mean first-dimension DW-Nominate scores for members of one party and the same mean for members of the other party [31–33]. In particular, we use the simplest such measure of polarization, called MPR polarization, which assumes a competitive two-party system and hence cannot be calculated prior to the 46th Senate. The second measure that we consider is the maximum modularity Q over partitions of
[Figure 8 image: panels (a) and (b); color bars and vertical axes labeled Modularity (Q) and DW-Nominate polarization, both on a scale from 0 to 1; horizontal axis: Congress number, 10 to 110.]
FIG. 8. (Color) (a) Dendrogram for Senate roll-call voting networks for the 1st–110th Congresses. Each leaf in the dendrogram represents a single Senate. The two horizontal color bars below the dendrograms indicate polarization measured in terms of optimized modularity (upper bar) and DW-Nominate scores (lower bar). We color the branches in the dendrogram corresponding to periods of similar polarization. (b) Polarization of the US Senate as a function of time, which we label using the Congress number. The height of each stem indicates the level of polarization measured using optimized modularity, and the color of each stem gives the cluster membership of each Senate in (a). The black curve shows the DW-Nominate polarization. Note that we have normalized both measures to lie in the interval [0,1].
a network. It was shown recently that Q is a good measure of polarization even for Congresses without clear party divisions [34]. Modularity is given in terms of the energy H in Eq. (1) by $Q = -H(\lambda = 1)/(2m)$.
In Fig. 8(a), we include bars under the dendrograms to represent the two polarization measures, both of which have been normalized to lie in the interval [0,1]. The bars demonstrate that Senates with similar levels of polarization (measured in terms of both DW-Nominate scores and optimized modularity values) are usually assigned to the same group, suggesting that our MRF clustering technique groups Senates based on the polarization of roll-call votes. We have also colored dendrogram groups according to their mean levels of polarization using optimized modularity, where the brown group in the dendrogram corresponds to the most polarized Senates and the blue group corresponds to the least polarized Senates. We chose the specific number of groups by inspection of the dendrogram. Although one ought to expect similarity in the results from the modularity-based measure of polarization and the MRF clustering, it is important to stress that the MRF clustering method is based on different principles; modularity attempts to quantify the extent to which a given network is modular, whereas the MRF clustering explicitly
data: US senators
note: i and j are node indices, and s and r are layer indices. The adjacency tensor $A_{ijs} \neq 0$ if nodes i and j are connected in layer s, and $A_{ijs} = 0$ otherwise. $k_{is}$ is the degree (or strength) of node i in layer s, $m_s$ is the number of edges (or sum of weights) in layer s, and $\gamma_s = \gamma$ is the resolution parameter in layer s. $C_{jsr} = \omega \neq 0$ if layers s and r are connected via node j, and $C_{jsr} = 0$ otherwise. The normalization factor $2\mu = \sum_{ijs} A_{ijs} + \sum_{jsr} C_{jsr}$ for $Q_{\mathrm{multilayer}} \in [-1, 1]$.
$$Q_{\mathrm{multilayer}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr})$$
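As a rough illustration of how the multilayer modularity on this slide can be evaluated for a given partition, here is a minimal pure-Python sketch. The function name, the nested-list layout (`A[s][i][j]`, `C[j][s][r]`, `g[s][i]`), and the toy data are assumptions made for this example, not part of the talk; the sketch also assumes every layer has at least one edge and takes $2m_s$ to be the total strength of layer s, following the slide's definition of $m_s$. It only scores a partition and does not search for the optimum.

```python
def multilayer_modularity(A, C, g, gamma=1.0):
    """Evaluate Q_multilayer for a given partition (illustrative sketch).

    A[s][i][j]: symmetric intralayer weight between nodes i and j in layer s
    C[j][s][r]: interlayer coupling of node j between layers s and r (0 or omega)
    g[s][i]:    community label of node i in layer s
    gamma:      resolution parameter, assumed equal in every layer
    """
    n_layers, n = len(A), len(A[0])
    # k[s][i] is the strength of node i in layer s; two_m[s] = 2 m_s
    k = [[sum(A[s][i]) for i in range(n)] for s in range(n_layers)]
    two_m = [sum(k[s]) for s in range(n_layers)]
    # Normalization 2 mu = sum_ijs A_ijs + sum_jsr C_jsr
    two_mu = sum(two_m) + sum(C[j][s][r] for j in range(n)
                              for s in range(n_layers) for r in range(n_layers))
    q = 0.0
    # Intralayer part: (A_ijs - gamma k_is k_js / 2 m_s), only for s == r
    for s in range(n_layers):
        for i in range(n):
            for j in range(n):
                if g[s][i] == g[s][j]:
                    q += A[s][i][j] - gamma * k[s][i] * k[s][j] / two_m[s]
    # Interlayer part: C_jsr, only for the same node j in two layers
    for j in range(n):
        for s in range(n_layers):
            for r in range(n_layers):
                if g[s][j] == g[r][j]:
                    q += C[j][s][r]
    return q / two_mu

# Hypothetical toy data: two identical layers, each with edges 0-1 and 2-3
layer = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
A = [layer, layer]
g = [[0, 0, 1, 1], [0, 0, 1, 1]]
C_coupled = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(4)]    # omega = 1
C_uncoupled = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]  # omega = 0
Q_coupled = multilayer_modularity(A, C_coupled, g)
Q_uncoupled = multilayer_modularity(A, C_uncoupled, g)
```

With the coupling switched off, the value reduces to the ordinary single-layer modularity of the toy graph, which is one quick sanity check on the normalization.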
Community Structure in Time-Dependent, Multiscale, and Multiplex Networks
Peter J. Mucha,1,2* Thomas Richardson,1,3 Kevin Macon,1 Mason A. Porter,4,5 Jukka-Pekka Onnela6,7
Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework allows studies of community structure in a general setting encompassing networks that evolve over time, have multiple types of links (multiplexity), and have multiple scales.
The study of graphs, or networks, has a long tradition in fields such as sociology and mathematics, and it is now ubiquitous in academic and everyday settings. An important tool in network analysis is the detection of mesoscopic structures known as communities (or cohesive groups), which are defined intuitively as groups of nodes that are more tightly connected to each other than they are to the rest of the network (1–3). One way to quantify communities is by a quality function that compares the number of intracommunity edges to what one would expect at random. Given the network adjacency matrix A, where the element $A_{ij}$ details a direct connection between nodes i and j, one can construct a quality function Q (4, 5) for the partitioning of nodes into communities as $Q = \sum_{ij} (A_{ij} - P_{ij}) \delta(g_i, g_j)$, where $\delta(g_i, g_j) = 1$ if the community assignments $g_i$ and $g_j$ of nodes i and j are the same and 0 otherwise, and $P_{ij}$ is the expected weight of the edge between i and j under a specified null model.
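The quality function above leaves the null model $P_{ij}$ open. The following sketch, with hypothetical function names and toy data, plugs in the common Newman-Girvan choice $P_{ij} = \gamma k_i k_j / 2m$ as one possible null model; it returns the unnormalized sum, so dividing by $2m$ gives the usual modularity.

```python
def configuration_null(A, gamma=1.0):
    """Newman-Girvan null model P_ij = gamma * k_i * k_j / (2m) (one common choice)."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[gamma * ki * kj / two_m for kj in k] for ki in k]

def quality(A, P, g):
    """Q = sum_ij (A_ij - P_ij) * delta(g_i, g_j), without the 1/(2m) prefactor."""
    n = len(A)
    return sum(A[i][j] - P[i][j]
               for i in range(n) for j in range(n) if g[i] == g[j])

# Hypothetical toy graph: two triangles joined by the single edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
g = [0, 0, 0, 1, 1, 1]           # one community per triangle
Q = quality(A, configuration_null(A), g)
Q_normalized = Q / sum(map(sum, A))  # divide by 2m
```

Passing the null model in as a matrix keeps the quality function unchanged when a different null model is more appropriate for the network at hand.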
The choice of null model is a crucial consideration in studying network community structure (2). After selecting a null model appropriate to the network and application at hand, one can use a variety of computational heuristics to assign nodes to communities to optimize the quality Q (2, 3). However, such null models have not been available for time-dependent networks; analyses have instead depended on ad hoc methods to piece together the structures obtained at different times (6–9) or have abandoned quality functions in favor of such alternatives as the Minimum Description Length principle (10). Although tensor decompositions (11) have been used to cluster network data with different types of connections, no quality-function method has been developed for such multiplex networks.
We developed a methodology to remove these limits, generalizing the determination of community structure via quality functions to multislice networks that are defined by coupling multiple adjacency matrices (Fig. 1). The connections encoded by the network slices are flexible; they can represent variations across time, variations across different types of connections, or even community detection of the same network at different scales. However, the usual procedure for establishing a quality function as a direct count of the intracommunity edge weight minus that expected at random fails to provide any contribution from these interslice couplings. Because they are specified by common identifications of nodes across slices, interslice couplings are either present or absent by definition, so when they do fall inside communities, their contribution in the count of intracommunity edges exactly cancels that expected at random. In contrast, by formulating a null model in terms of stability of communities under Laplacian dynamics, we have derived a principled generalization of community detection to multislice networks,
REPORTS
1 Carolina Center for Interdisciplinary Applied Mathematics, Department of Mathematics, University of North Carolina, Chapel Hill, NC 27599, USA.
2 Institute for Advanced Materials, Nanoscience and Technology, University of North Carolina, Chapel Hill, NC 27599, USA.
3 Operations Research, North Carolina State University, Raleigh, NC 27695, USA.
4 Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford OX1 3LB, UK.
5 CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, UK.
6 Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, USA.
7 Harvard Kennedy School, Harvard University, Cambridge, MA 02138, USA.
*To whom correspondence should be addressed. E-mail: mucha@unc.edu
Fig. 1. Schematic of a multislice network. Four slices s = {1, 2, 3, 4} represented by adjacencies $A_{ijs}$ encode intraslice connections (solid lines). Interslice connections (dashed lines) are encoded by $C_{jrs}$, specifying the coupling of node j to itself between slices r and s. For clarity, interslice couplings are shown for only two nodes and depict two different types of couplings: (i) coupling between neighboring slices, appropriate for ordered slices; and (ii) all-to-all interslice coupling, appropriate for categorical slices.
[Figure 2: three panels with coupling = 0, 0.1, and 1; horizontal axis: resolution parameters (1 to 4); vertical axis: nodes (5 to 30).]
Fig. 2. Multislice community detection of the Zachary Karate Club network (22) across multiple resolutions. Colors depict community assignments of the 34 nodes (renumbered vertically to group similarly assigned nodes) in each of the 16 slices (with resolution parameters $\gamma_s$ = {0.25, 0.5, …, 4}), for ω = 0 (top), ω = 0.1 (middle), and ω = 1 (bottom). Dashed lines bound the communities obtained using the default resolution (γ = 1).
876 14 MAY 2010 VOL 328 SCIENCE www.sciencemag.org
CORRECTED 16 JULY 2010; SEE LAST PAGE
Multilayer community detection
P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, Science 328, 876 (2010).
different slices: time series or categories
nodes in individual slices
(weighted) edges
with a single parameter controlling the interslice correspondence of communities.
Important to our method is the equivalence between the modularity quality function (12) [with a resolution parameter (5)] and stability of communities under Laplacian dynamics (13), which we have generalized to recover the null models for bipartite, directed, and signed networks (14). First, we obtained the resolution-parameter generalization of Barber's null model for bipartite networks (15) by requiring the independent joint probability contribution to stability in (13) to be conditional on the type of connection necessary to step between two nodes. Second, we recovered the standard null model for directed networks (16, 17) (again with a resolution parameter) by generalizing the Laplacian dynamics to include motion along different kinds of connections, in this case both with and against the direction of a link. By this generalization, we similarly recovered a null model for signed networks (18). Third, we interpreted the stability under Laplacian dynamics flexibly to permit different spreading weights on the different types of links, giving multiple resolution parameters to recover a general null model for signed networks (19).
We applied these generalizations to derive null models for multislice networks that extend the existing quality-function methodology, including an additional parameter ω to control the coupling between slices. Representing each network slice s by adjacencies $A_{ijs}$ between nodes i and j, with interslice couplings $C_{jrs}$ that connect node j in slice r to itself in slice s (Fig. 1), we have restricted our attention to unipartite, undirected network slices ($A_{ijs} = A_{jis}$) and couplings ($C_{jrs} = C_{jsr}$), but we can incorporate additional structure in the slices and couplings in the same manner as demonstrated for single-slice null models. Notating the strengths of each node individually in each slice by $k_{js} = \sum_i A_{ijs}$ and across slices by $c_{js} = \sum_r C_{jsr}$, we define the multislice strength by $\kappa_{js} = k_{js} + c_{js}$. The continuous-time Laplacian dynamics given by
$$\dot{p}_{is} = \sum_{jr} \frac{(A_{ijs} \delta_{sr} + \delta_{ij} C_{jsr})\, p_{jr}}{\kappa_{jr}} - p_{is} \qquad (1)$$
respects the intraslice nature of $A_{ijs}$ and the interslice couplings of $C_{jsr}$. Using the steady-state probability distribution $p^*_{jr} = \kappa_{jr}/2\mu$, where $2\mu = \sum_{jr} \kappa_{jr}$, we obtained the multislice null model in terms of the probability $\rho_{is|jr}$ of sampling node i in slice s conditional on whether the multislice structure allows one to step from (j, r) to (i, s), accounting for intra- and interslice steps separately as
$$\rho_{is|jr}\, p^*_{jr} = \left[ \frac{k_{is}}{2 m_s} \frac{k_{jr}}{\kappa_{jr}} \delta_{sr} + \frac{C_{jsr}}{c_{jr}} \frac{c_{jr}}{\kappa_{jr}} \delta_{ij} \right] \frac{\kappa_{jr}}{2\mu} \qquad (2)$$
where $m_s = \sum_j k_{js}$. The second term in parentheses, which describes the conditional probability of motion between two slices, leverages the definition of the $C_{jsr}$ coupling. That is, the conditional probability of stepping from (j, r) to (i, s) along an interslice coupling is nonzero if and only if i = j, and it is proportional to the probability $C_{jsr}/\kappa_{jr}$ of selecting the precise interslice link that connects to slice s. Subtracting this conditional joint probability from the linear (in time) approximation of the exponential describing the Laplacian dynamics, we obtained a multislice generalization of modularity (14):
$$Q_{\mathrm{multislice}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr}) \qquad (3)$$
where we have used reweighting of the conditional probabilities, which allows a different resolution $\gamma_s$ in each slice. We have absorbed the resolution parameter for the interslice couplings into the magnitude of the elements of $C_{jsr}$, which, for simplicity, we presume to take binary values {0, ω} indicating the absence (0) or presence (ω) of interslice links.
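The steady-state claim used in the derivation can be checked numerically. The sketch below evaluates the right-hand side of the Laplacian dynamics, Eq. (1), at the distribution proportional to the multislice strength and confirms that it vanishes. The function names, the nested-list layout (`A[s][i][j]`, `C[j][s][r]`, `p[s][j]`), and the toy two-slice network are assumptions for illustration; every node-slice is assumed to have positive multislice strength.

```python
def multislice_strengths(A, C):
    """kappa[s][j] = k_js + c_js, with k_js = sum_i A_ijs and c_js = sum_r C_jsr."""
    n_layers, n = len(A), len(A[0])
    return [[sum(A[s][j]) + sum(C[j][s]) for j in range(n)]
            for s in range(n_layers)]

def laplacian_rhs(A, C, p):
    """Right-hand side of Eq. (1), returned as dp[s][i] for every node-slice."""
    n_layers, n = len(A), len(A[0])
    kappa = multislice_strengths(A, C)
    dp = [[0.0] * n for _ in range(n_layers)]
    for s in range(n_layers):
        for i in range(n):
            # Intraslice steps (delta_sr term): neighbors j within slice s
            intra = sum(A[s][i][j] * p[s][j] / kappa[s][j] for j in range(n))
            # Interslice steps (delta_ij term): the same node i in other slices r
            inter = sum(C[i][s][r] * p[r][i] / kappa[r][i] for r in range(n_layers))
            dp[s][i] = intra + inter - p[s][i]
    return dp

# Hypothetical toy network: 3 nodes, 2 slices, coupling omega = 0.5
A = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],   # slice 0: path 0-1-2
    [[0, 0, 2], [0, 0, 1], [2, 1, 0]],   # slice 1: weighted edges 0-2 and 1-2
]
omega = 0.5
C = [[[0.0, omega], [omega, 0.0]] for _ in range(3)]

kappa = multislice_strengths(A, C)
two_mu = sum(map(sum, kappa))
p_star = [[x / two_mu for x in row] for row in kappa]   # p*_jr = kappa_jr / 2 mu
residual = max(abs(x) for row in laplacian_rhs(A, C, p_star) for x in row)
```

The residual is zero up to floating-point rounding because the symmetric $A_{ijs}$ and $C_{jsr}$ make the inflow to each node-slice equal to its own multislice strength divided by $2\mu$.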
[Figure 3: (A) senators (vertical axis) against year, 1800 to 2000 (horizontal axis); (B) states (vertical axis, two-letter abbreviations) against Congress # 10 to 110 (horizontal axis). Community labels in (A): 40PA, 24F, 8AA; 151DR, 30AA, 14PA, 5F; 141F, 43DR; 44D, 2R; 1784R, 276D, 149DR, 162J, 53W, 84other; 176W, 97AJ, 61DR, 49A, 24D, 19F, 13J, 37other; 3168D, 252R, 73other; 222D, 6W, 11other; 1490R, 247D, 19other.]
Fig. 3. Multislice community detection of U.S. Senate roll call vote similarities (23) with ω = 0.5 coupling of 110 slices (i.e., the number of 2-year Congresses from 1789 to 2008) across time. (A) Colors indicate assignments to nine communities of the 1884 unique senators (sorted vertically and connected across Congresses by dashed lines) in each Congress in which they appear. The dark blue and red communities correspond closely to the modern Democratic and Republican parties, respectively. Horizontal bars indicate the historical period of each community, with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration; AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J, Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communities appeared simultaneously. (B) The same assignments according to state affiliations.
JUKKA-PEKKA ONNELA et al. PHYSICAL REVIEW E 86, 036104 (2012)
[Figure 7 panels: Social; Facebook; Political: voting; Political: cosponsorship; Political: committee; Protein interaction; Metabolic; Brain; Fungal; Financial; Language; Collaboration. Each category shows three curves: $H_{\mathrm{eff}}$, $S_{\mathrm{eff}}$, $\eta_{\mathrm{eff}}$.]
FIG. 7. (Color online) MRFs for all of the network categories containing at least eight networks (see Table I). At each value of ξ, the upper curve shows the maximum value of $H_{\mathrm{eff}}$ (magenta, left panel in each category), $S_{\mathrm{eff}}$ (blue, center panels), and $\eta_{\mathrm{eff}}$ (black, right panels) for all networks in the category and the lower curve shows the minimum value. The dashed curves show the corresponding mean MRFs.
A. Voting in the United States Senate
Our first example deals with roll-call voting in the US Senate [31–34,48]. Establishing a taxonomy of networks detailing the voting similarities of individual legislators complements previous studies of these data, and it facilitates the comparison of voting similarity networks across time. We consider Congresses 1–110, which cover the period 1789–2008. As in Ref. [34], we construct networks from the roll-call data [31,32] for each two-year Congress such that the adjacency matrix element $A_{ij} \in [0,1]$ represents the number of times Senators i and j voted the same way on a bill (either both in favor of it or both against it) divided by the total number of bills on which both of them voted. Following the approach of Ref. [32], we consider only nonunanimous roll-call votes, which are defined as votes in which at least 3% of the Senators were in the minority.
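The construction of the voting-similarity matrix described above can be sketched as follows. The input encoding (+1 for yea, -1 for nay, 0 for not voting), the function name, and the toy votes are assumptions for this example. The 3% threshold is applied here to the votes actually cast on each roll call, which is one reading of the definition, and the diagonal is left at zero by choice.

```python
def voting_similarity(votes, min_minority=0.03):
    """Build A_ij = (# agreements) / (# roll calls on which both i and j voted).

    votes[b][i] is +1 (yea), -1 (nay), or 0 (did not vote) for senator i
    on roll call b. Only nonunanimous roll calls are used.
    """
    n = len(votes[0])
    kept = []
    for roll in votes:
        yea, nay = roll.count(1), roll.count(-1)
        cast = yea + nay
        # Keep roll calls whose minority side holds at least 3% of votes cast
        if cast and min(yea, nay) / cast >= min_minority:
            kept.append(roll)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            both = [(r[i], r[j]) for r in kept if r[i] != 0 and r[j] != 0]
            if both:
                agree = sum(1 for a, b in both if a == b)
                A[i][j] = A[j][i] = agree / len(both)
    return A

# Hypothetical toy data: 4 roll calls, 3 senators
votes = [
    [1, 1, -1],
    [1, -1, -1],
    [1, 1, 1],    # unanimous, dropped
    [1, 0, -1],   # senator 1 did not vote
]
A = voting_similarity(votes)
```

Each entry lies in [0, 1] by construction, and a pair of senators who never voted on the same retained roll call keeps the default weight of zero.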
Much research on the US Congress has been devoted to the ebb and flow of partisan polarization over time and the influence of parties on roll-call voting [33,34]. In highly polarized legislatures, representatives tend to vote along party lines, so there are strong similarities in the voting patterns of members of the same party and strong differences between members of different parties. In contrast, during periods of low polarization, the party lines become blurred. The notion of partisan polarization can be used to help understand the taxonomy of Senates in Fig. 8, in which we consider two measures of polarization. The first measure uses DW-Nominate scores (a multidimensional scaling technique commonly used in political science [32,33]), where the extent of polarization is given by the absolute value of the difference between the mean first-dimension DW-Nominate scores for members of one party and the same mean for members of the other party [31–33]. In particular, we use the simplest such measure of polarization, called MPR polarization, which assumes a competitive two-party system and hence cannot be calculated prior to the 46th Senate. The second measure that we consider is the maximum modularity Q over partitions of
[Figure 8: (a) dendrogram with two horizontal color bars, Modularity (Q) and DW-Nominate polarization, color scale 0 to 1; (b) stem plot against Congress number (10 to 110), vertical axis 0 to 1, with the same two measures.]
FIG. 8. (Color) (a) Dendrogram for Senate roll-call voting networks for the 1st–110th Congresses. Each leaf in the dendrogram represents a single Senate. The two horizontal color bars below the dendrograms indicate polarization measured in terms of optimized modularity (upper bar) and DW-Nominate scores (lower bar). We color the branches in the dendrogram corresponding to periods of similar polarization. (b) Polarization of the US Senate as a function of time, which we label using the Congress number. The height of each stem indicates the level of polarization measured using optimized modularity, and the color of each stem gives the cluster membership of each Senate in (a). The black curve shows the DW-Nominate polarization. Note that we have normalized both measures to lie in the interval [0,1].
a network. It was shown recently that Q is a good measure of polarization even for Congresses without clear party divisions [34]. Modularity is given in terms of the energy H in Eq. (1) by $Q = -H(\lambda = 1)/(2m)$.
In Fig. 8(a), we include bars under the dendrograms to represent the two polarization measures, both of which have been normalized to lie in the interval [0,1]. The bars demonstrate that Senates with similar levels of polarization (measured in terms of both DW-Nominate scores and optimized modularity values) are usually assigned to the same group, suggesting that our MRF clustering technique groups Senates based on the polarization of roll-call votes. We have also colored dendrogram groups according to their mean levels of polarization using optimized modularity, where the brown group in the dendrogram corresponds to the most polarized Senates and the blue group corresponds to the least polarized Senates. We chose the specific number of groups by inspection of the dendrogram. Although one ought to expect similarity in the results from the modularity-based measure of polarization and the MRF clustering, it is important to stress that the MRF clustering method is based on different principles; modularity attempts to quantify the extent to which a given network is modular, whereas the MRF clustering explicitly
data: US senators
multilayer community index $g_{is}$: for node i on layer s
all the layers connected to each other: categorical multilayer communities
only the adjacent layers connected to each other: ordered multilayer communities
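The two coupling schemes named on this slide differ only in which pairs of layers are linked through each node. A minimal sketch of building the coupling tensor for either case follows; the function name and the `C[j][s][r]` layout are assumptions, and it further assumes every node is present in every layer, which need not hold for data such as senators who serve in only some Congresses.

```python
def coupling_tensor(n_nodes, n_layers, omega, ordered=True):
    """Return C[j][s][r] = omega where layers s and r are coupled via node j.

    ordered=True  couples only adjacent layers (|s - r| == 1)
    ordered=False couples every pair of distinct layers (categorical)
    """
    C = [[[0.0] * n_layers for _ in range(n_layers)] for _ in range(n_nodes)]
    for j in range(n_nodes):
        for s in range(n_layers):
            for r in range(n_layers):
                if s == r:
                    continue  # a node is never coupled to itself within a layer
                if not ordered or abs(s - r) == 1:
                    C[j][s][r] = omega
    return C

# Hypothetical sizes: 2 nodes, 4 layers, omega = 0.5
C_ordered = coupling_tensor(2, 4, 0.5, ordered=True)
C_categorical = coupling_tensor(2, 4, 0.5, ordered=False)
```

Ordered coupling gives each node 2(L - 1) nonzero entries across L layers, whereas categorical coupling gives L(L - 1), so the same ω carries more total weight in the categorical case.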
different slices: time series or categories
nodes in individual slices
(weighted) edges
with a single parameter controlling the interslicecorrespondence of communities.
Important to our method is the equivalencebetween themodularity quality function (12) [witha resolution parameter (5)] and stability of com-munities under Laplacian dynamics (13), whichwe have generalized to recover the null models forbipartite, directed, and signed networks (14). First,we obtained the resolution-parameter generaliza-
tion of Barbers null model for bipartite networks(15) by requiring the independent joint probabilitycontribution to stability in (13) to be conditionalon the type of connection necessary to stepbetween two nodes. Second, we recovered thestandard null model for directed networks (16, 17)(again with a resolution parameter) by generaliz-ing the Laplacian dynamics to include motionalong different kinds of connectionsin this case,
both with and against the direction of a link. Bythis generalization, we similarly recovered a nullmodel for signed networks (18). Third, weinterpreted the stability under Laplacian dynamicsflexibly to permit different spreading weights onthe different types of links, giving multiple reso-lution parameters to recover a general null modelfor signed networks (19).
We applied these generalizations to derive nullmodels for multislice networks that extend theexisting quality-function methodology, includingan additional parameter w to control the couplingbetween slices. Representing each network slice sby adjacencies Aijs between nodes i and j, withinterslice couplingsCjrs that connect node j in slicer to itself in slice s (Fig. 1), we have restricted ourattention to unipartite, undirected network slices(Aijs = Ajis) and couplings (Cjrs = Cjsr), but we canincorporate additional structure in the slices andcouplings in the same manner as demonstrated forsingle-slice null models. Notating the strengths ofeach node individually in each slice by kjs =iAijsand across slices by cjs = rCjsr, we define themultislice strength by kjs = kjs + cjs. The continuous-time Laplacian dynamics given by
pis jrAijsdsr dijCjsrpjr
kjr pis 1
respects the intraslice nature of Aijs and theinterslice couplings of Cjsr. Using the steady-stateprobability distribution pjr kjr=2m, where 2m = jrkjr, we obtained the multislice null model interms of the probability ris| jr of sampling node i inslice s conditional on whether the multislice struc-ture allowsone to step from ( j, r) to (i, s), accountingfor intra- and interslice steps separately as
risj jrpjr
kis2ms
kjrkjr
dsr Cjsrcjrcjrkjr
dij
! "kjr2m
2
where ms = jkjs. The second term in parentheses,which describes the conditional probability ofmotion between two slices, leverages the definitionof the Cjsr coupling. That is, the conditionalprobability of stepping from ( j, r) to (i, s) alongan interslice coupling is nonzero if and only if i = j,and it is proportional to the probability Cjsr/kjr ofselecting the precise interslice link that connects toslice s. Subtracting this conditional joint probabilityfrom the linear (in time) approximation of theexponential describing the Laplacian dynamics,weobtained a multislice generalization of modularity(14):
Qmultislice 12m ijsrh#
Aijs gskiskjs2ms
dsr$
dijCjsridgis,gjr 3
where we have used reweighting of the conditionalprobabilities, which allows a different resolution gsin each slice. We have absorbed the resolution pa-rameter for the interslice couplings into the mag-nitude of the elements ofCjsr, which, for simplicity,we presume to take binary values {0,w} indicatingthe absence (0) or presence (w) of interslice links.
1800 1820 1840 1860 1880 1900 1920 1940 1960 1980 2000
40PA, 24F, 8AA
151DR, 30AA, 14PA, 5F141F, 43DR
44D, 2R
1784R, 276D, 149DR, 162J, 53W, 84other
176W, 97AJ, 61DR, 49A,24D, 19F, 13J, 37other
3168D, 252R, 73other
222D, 6W, 11other
1490R, 247D, 19other
Year
Sena
tor
10 20 30 40 50 60 70 80 90 100 110CTMEMANHRI VTDE NJNY PAIL INMI OHWI IAKSMNMONENDSDVA ALAR FLGA LAMSNCSC TXKYMDOK TNWVAZCO IDMTNVNMUTWYCAORWAAK HI
Congress #
A
B
Fig. 3. Multislice community detection of U.S. Senate roll call vote similarities (23) withw = 0.5 couplingof 110 slices (i.e., the number of 2-year Congresses from 1789 to 2008) across time. (A) Colors indicateassignments to nine communities of the 1884 unique senators (sorted vertically and connected acrossCongresses by dashed lines) in each Congress in which they appear. The dark blue and red communitiescorrespond closely to the modern Democratic and Republican parties, respectively. Horizontal barsindicate the historical period of each community, with accompanying text enumerating nominal partyaffiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration;AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J,Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communitiesappeared simultaneously. (B) The same assignments according to state affiliations.
www.sciencemag.org SCIENCE VOL 328 14 MAY 2010 877
REPORTS
on
Nove
mbe
r 8, 2
011
www.
scien
cem
ag.o
rgDo
wnloa
ded
from
JUKKA-PEKKA ONNELA et al. PHYSICAL REVIEW E 86, 036104 (2012)
Social Facebook Political: voting
Political: cosponsorship Political: committee Protein interaction
Metabolic Brain Fungal
Financial
Language Collaboration
effeffeff
FIG. 7. (Color online) MRFs for all of the network categoriescontaining at least eight networks (see Table I). At each value of , theupper curve shows the maximum value ofHeff (magenta, left panel ineach category), Seff (blue, center panels), and eff (black, right panels)for all networks in the category and the lower curve shows the mini-mum value. The dashed curves show the corresponding mean MRFs.
A. Voting in the United States SenateOur first example deals with roll-call voting in the US
Senate [3134,48]. Establishing a taxonomy of networksdetailing the voting similarities of individual legislators com-plements previous studies of these data, and it facilitatesthe comparison of voting similarity networks across time.We consider Congresses 1110, which cover the period17892008. As in Ref. [34], we construct networks from theroll-call data [31,32] for each two-year Congress such that theadjacency matrix element Aij [0,1] represents the numberof times Senators i and j voted the same way on a bill (eitherboth in favor of it or both against it) divided by the total numberof bills on which both of them voted. Following the approachof Ref. [32], we consider only nonunanimous roll-call votes,which are defined as votes in which at least 3% of the Senatorswere in the minority.
Much research on the US Congress has been devoted tothe ebb and flow of partisan polarization over time and theinfluence of parties on roll-call voting [33,34]. In highlypolarized legislatures, representatives tend to vote alongparty lines, so there are strong similarities in the votingpatterns of members of the same party and strong differencesbetween members of different parties. In contrast, duringperiods of low polarization, the party lines become blurred.The notion of partisan polarization can be used to helpunderstand the taxonomy of Senates in Fig. 8, in which weconsider two measures of polarization. The first measure usesDW-Nominate scores (a multidimensional scaling techniquecommonly used in political science [32,33]), where the extentof polarization is given by the absolute value of the differencebetween the mean first-dimension DW-Nominate scores formembers of one party and the same mean for members ofthe other party [3133]. In particular, we use the simplestsuch measure of polarization, called MPR polarization, whichassumes a competitive two-party system and hence cannot becalculated prior to the 46th Senate. The second measure thatwe consider is the maximum modularity Q over partitions of
0
0.11
0.22
0.33
0.44
0.56
0.67
0.78
0.89
1
10 20 30 40 50 60 70 80 90 100 1100
0.2
0.4
0.6
0.8
1
0
Mod
ular
ity (Q
)D
W-N
omin
ate
pola
rizat
ion
Mod
ular
ity (Q
)D
W-N
omin
ate
pola
rizat
ion
(a)
(b)
FIG. 8. (Color) (a) Dendrogram for Senate roll-call voting net-works for the 1st110th Congresses. Each leaf in the dendrogramrepresents a single Senate. The two horizontal color bars below thedendrograms indicate polarization measured in terms of optimizedmodularity (upper bar) and DW-Nominate scores (lower bar). Wecolor the branches in the dendrogram corresponding to periods ofsimilar polarization. (b) Polarization of the US Senate as a function oftime, which we label using the Congress number. The height of eachstem indicates the level of polarization measured using optimizedmodularity, and the color of each stem gives the cluster membershipof each Senate in (a). The black curve shows the DW-Nominatepolarization. Note that we have normalized both measures to lie inthe interval [0,1].
a network. It was shown recently that Q is a good measure of polarization even for Congresses without clear party divisions [34]. Modularity is given in terms of the energy H in Eq. (1), evaluated at a resolution parameter of 1, by Q = −H/(2m).
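To make the modularity-based polarization measure concrete, here is a minimal sketch that evaluates Q for a given partition, assuming the standard modularity formula from the earlier slide. The function name `modularity`, the `resolution` argument, and the toy two-triangle network are illustrative assumptions and are not taken from the paper or its data. The sketch only scores a given partition; the paper's measure is the maximum of Q over partitions, which requires an optimization heuristic that is not shown here.

```python
def modularity(adj, labels, resolution=1.0):
    """Q = (1/2m) * sum_ij (A_ij - resolution * k_i * k_j / 2m) * delta(g_i, g_j)."""
    n = len(adj)
    strength = [sum(row) for row in adj]   # k_i: degree (or strength) of node i
    two_m = sum(strength)                  # 2m: every edge weight counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:     # only same-community pairs contribute
                q += adj[i][j] - resolution * strength[i] * strength[j] / two_m
    return q / two_m


# Hypothetical toy network: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_merged = modularity(adj, [0] * 6)             # all nodes in a single community
```

In this toy case the two-community partition scores higher than the single-community one, which is the sense in which a strongly divided voting network yields a large Q.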
In Fig. 8(a), we include bars under the dendrograms to represent the two polarization measures, both of which have been normalized to lie in the interval [0,1]. The bars demonstrate that Senates with similar levels of polarization (measured in terms of both DW-Nominate scores and optimized modularity values) are usually assigned to the same group, suggesting that our MRF clustering technique groups Senates based on the polarization of roll-call votes. We have also colored dendrogram groups according to their mean levels of polarization using optimized modularity, where the brown group in the dendrogram corresponds to the most polarized Senates and the blue group corresponds to the least polarized Senates. We chose the specific number of groups by inspection of the dendrogram. Although one ought to expect similarity in the results from the modularity-based measure of polarization and the MRF clustering, it is important to stress that the MRF clustering method is based on different principles; modularity attempts to quantify the extent to which a given network is modular, whereas the MRF clustering explicitly
data: US senators
multilayer community index for node i on layer s
parameter space = [γ (intralayer resolution), ω (interlayer coupling strength)]
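The role of the two parameters can be sketched in code. The example below assumes the widely used multilayer modularity of Mucha et al. with one resolution value shared by all layers and a coupling of strength omega between copies of the same node in adjacent (temporally ordered) layers. The function name, argument names, and toy layers are hypothetical, and every node is assumed to be present on every layer.

```python
def multilayer_modularity(layers, labels, gamma=1.0, omega=1.0):
    """layers[s][i][j]: symmetric adjacency of layer s.
    labels[s][i]: community index of node i on layer s."""
    n_layers, n = len(layers), len(layers[0])
    total, two_mu = 0.0, 0.0
    for s in range(n_layers):
        adj = layers[s]
        k = [sum(row) for row in adj]      # intralayer strengths k_is
        two_m = sum(k)                     # 2 m_s (layer assumed to have edges)
        two_mu += two_m
        for i in range(n):
            for j in range(n):
                if labels[s][i] == labels[s][j]:
                    total += adj[i][j] - gamma * k[i] * k[j] / two_m
    for s in range(n_layers - 1):
        # Each node is coupled to itself in the next layer; the coupling is
        # counted in both directions (s -> s+1 and s+1 -> s).
        two_mu += 2 * omega * n
        for i in range(n):
            if labels[s][i] == labels[s + 1][i]:
                total += 2 * omega
    return total / two_mu


# Hypothetical example: two identical layers, each made of two disjoint edges.
layer = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
layers = [layer, layer]
stable = [[0, 0, 1, 1], [0, 0, 1, 1]]      # nobody changes community
switched = [[0, 0, 1, 1], [0, 1, 1, 1]]    # node 1 changes community

q_coupled = multilayer_modularity(layers, stable, gamma=1.0, omega=1.0)
q_uncoupled = multilayer_modularity(layers, stable, gamma=1.0, omega=0.0)
q_switched = multilayer_modularity(layers, switched, gamma=1.0, omega=1.0)
```

Larger omega rewards partitions in which a node keeps its community across layers, while gamma sets the community size scale within each layer.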
congressional cosponsorship networks
(a) bipartite network (b) congressperson-mode projection (c) bill-mode projection
FIG. 1: The construction procedure for the weighted projected networks from the bipartite network.
Figure 1 illustrates the method used to project the bipartite cosponsorship network onto the weighted bill and congressperson networks, similar to the one used for the relationship between protein complexes and component proteins [1]; Tables I and II show the detailed statistics divided into eight periods. There are 15 different statuses assigned to the bills, as listed in Table III, but we do not distinguish among them in the following analysis for now. For the bipartite network, Fig. 2 shows the degree distributions of bills and congresspersons, indicating that the number of congresspersons cosponsoring a bill is much more heterogeneously distributed than the number of bills in which an individual congressperson participates.
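The projection step can be sketched as follows. This is a minimal illustration under one assumption: the weight of a projected edge is the number of shared neighbors in the bipartite network (shared bills for two congresspersons, shared cosponsors for two bills). The incidence matrix below is a hypothetical example and does not reproduce the edges drawn in Fig. 1.

```python
def project(incidence):
    """Rows are the nodes to keep; the weight between rows u and v is the
    number of columns in which both rows have a 1."""
    n = len(incidence)
    weights = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            shared = sum(a * b for a, b in zip(incidence[u], incidence[v]))
            weights[u][v] = weights[v][u] = shared
    return weights


# Hypothetical data: rows are bills A, B, C; columns are congresspersons a-e.
bill_by_member = [
    [1, 1, 1, 0, 0],   # bill A cosponsored by a, b, c
    [0, 1, 1, 1, 0],   # bill B cosponsored by b, c, d
    [0, 0, 0, 1, 1],   # bill C cosponsored by d, e
]
member_by_bill = [list(col) for col in zip(*bill_by_member)]  # transpose

bill_net = project(bill_by_member)      # bill-mode projection (3 x 3)
member_net = project(member_by_bill)    # congressperson-mode projection (5 x 5)

# Bipartite degrees, the quantities whose distributions the text discusses.
cosponsors_per_bill = [sum(row) for row in bill_by_member]
bills_per_member = [sum(row) for row in member_by_bill]
```

Here congresspersons b and c share two bills, so their projected edge has weight 2, while a and d share none and remain unconnected.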
The bills are clearly partitioned into ten periods (2006 I, 2006 II, 2007 I, 2007 II, 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, and 2010 II), but the weighted congressperson-mode projection networks for different years (composed of 130 members in total) share most congresspersons (except for those who are invisible in a specific year because they were absent from cosponsoring activities), which allows us to study temporally changing or multiplex relations over the eight periods.
Congress of the Republic of Peru (2006-2011)
Senate of the United States (1973-2009)
temporally ordered multilayer
Senate of the United States (1973-2009)
[Figure: five panels plotting congress index (ticks 93 to 109) against senator index (ticks 50 to 300), each with 19 time points and a = 1.0, for t = 20.0, 40.0, 60.0, 80.0, and 100.0.]
log(modularity Q)
log[flexibility] (flexibility: average number of community switches, normalized by the number of time points)
paint drip plots
[Figure: two panels, each titled "US Senate (93rd-110th): 18 time slices", with t (0 to 100) on the horizontal axis and a (0 to 2) on the vertical axis; color scales run from -9 to 0 and from -10 to 0.]
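The flexibility measure named above can be sketched as follows, assuming that community labels are comparable across time points (as they are in a multilayer partition) and that every node is present at every time point. The function name and the toy label table are hypothetical.

```python
def flexibility(labels):
    """labels[t][i]: community of node i at time point t.
    Returns the number of community switches per node, normalized by the
    number of time points and averaged over nodes."""
    n_times, n_nodes = len(labels), len(labels[0])
    per_node = []
    for i in range(n_nodes):
        switches = sum(
            labels[t][i] != labels[t + 1][i] for t in range(n_times - 1)
        )
        per_node.append(switches / n_times)
    return sum(per_node) / n_nodes


# Hypothetical example: 4 time points, 3 nodes; only node 1 switches (twice).
toy_labels = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
toy_flexibility = flexibility(toy_labels)
frozen_flexibility = flexibility([[0, 1, 1]] * 4)   # nobody ever switches
```

A partition in which no node ever switches has flexibility 0, so its logarithm is undefined; how such cases are handled in the plots is not stated on the slide.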
19 time points: γ = 1, ω = 20
19 time points: γ = 1, ω = 40
19 time points: γ = 1, ω = 60
19 time points: γ = 1, ω = 80
19 time points: γ = 1, ω = 100
congressional cosponsorship networks
[Figure: panels (a) and (b), logarithmic axes with ticks from 10^-4 to 10^0.]