GRAPH Lecture Notes


Networks: Theory and Applications

Autumn 2011

Dr. Michael T. Gastner

References:

• M. E. J. Newman, Networks: An Introduction, Oxford University Press, Oxford (2010).

• C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000).

• T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms (3rd ed.), MIT Press, Cambridge (2009).

• R. K. Ahuja, T. L. Magnanti, J. B. Orlin, Network Flows, Prentice Hall, Upper Saddle River (1993).

• T. Roughgarden, Selfish Routing and the Price of Anarchy, The MIT Press, Cambridge (2005).

1 Introduction

A network is a set of points connected by lines. I will refer to the points as nodes and to the lines as links.

Figure 1: A small network composed of 10 nodes and 9 links.

In different fields, nodes and links are called by different names.

              engineering and      mathematics   physics   social
              computer science                             sciences
  "point"     node                 vertex        site      actor
  "line"      link                 edge          bond      tie
  "network"   network              graph         network   network


Figure 2: The structure of the Internet. The positions of the nodes in the figure are not representative of their real geographic locations. Figure created by the Opte Project (www.opte.org).

Example: the Internet (Fig. 2)

node: class C subnet (a group of computers with similar IP addresses, usually administered by a single organisation)

link: routes taken by IP packets, usually optical fibre

Example: the World Wide Web (Fig. 3)

Not to be confused with the Internet, which is a physical network of computers, the World Wide Web is an information network.

node: web page

link: hyperlink (i.e., the fields to click on to navigate from one page to another)

Note: links are directed (i.e. they can be traversed in one direction, but not necessarily in the opposite direction).

Example: social network (Fig. 4)

node: person

link: friendship, business relationship

Example: scientific collaborations (Fig. 5)

node: scientist


Figure 3: The network of 180 web pages of a large corporation. From M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).

Figure 4: Friendship network of children at a US school. Node colours represent ethnicity. FromJames Moody, American Journal of Sociology 107, 679–716 (2001).


Figure 5: A network of scientific collaborations at the Santa Fe Institute. From M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99, 8271–8276 (2002).

link: shared authorship on a scientific publication

Note: publications can be co-authored by more than two scientists, but we cannot tell this from the above network. The situation is better represented by two types of nodes, scientists and publications, with links only between scientists and the papers they co-authored ⇒ bipartite network.

Example: scientific citation network (Fig. 6)

node: scientific publication

link: there is a link from publication A to publication B if A cites B in its bibliography.

Note: citation networks are (almost) acyclic (i.e. all directed links point backward in time): one cannot cite a paper that is not yet published.

Example: mobile phone call network (Fig. 7)

node: mobile phone user

link: call between two users

Example: food web (Fig. 8)

node: species

link: predator-prey relationship

Example: brain (Fig. 9)

node: neuron


Figure 6: Citation network of early DNA articles. Image from http://www.garfield.library.upenn.edu/papers/vladivostok.html

Figure 7: Part of a large network of mobile phone calls. Image from Wang et al., Science 324, 1071–1076 (2009).


Figure 8: Food web of a Caribbean coral reef. Image by Neo Martinez (Pacific Ecoinformatics and Computational Ecology Lab).

Figure 9: Anatomical representation of brain regions and their connections. From Meunier et al., Frontiers in Neuroinformatics 3, 37 (2009).


link: synchronised activity

Example: metabolic network (Fig. 10)

node: metabolite

link: chemical reaction

Figure 10: A wallchart showing the network formed by major metabolic pathways. Created by David Nicholson.

Example: urban railroads (Fig. 11)

node: station

link: train connection

Example: road map (Fig. 12)

node: junction

link: street

Typical questions in network analysis:

• paths and distances: what is the shortest route between two nodes?

• centrality: who is the most influential person in a social network?

• community structure: can we identify groups of like-minded individuals in a social network?

• flow: how can traffic be routed to avoid congestion?

Figure 11: Rail service map of London. Image by Transport for London.

Figure 12: The road network near Imperial College.

2 Networks represented by their adjacency matrix

2.1 Undirected networks

Definition 2.1:
An undirected, simple network G = (N, L) is an ordered pair of a set of nodes N and a set of links L. The links are subsets of N with exactly two distinct elements.

Note: in a simple network there cannot be multiple links ("multiedges") between two nodes, and no node can be connected to itself (i.e. no "self-loops").

Figure 13: (a) An undirected, simple network (i.e. a network without multiple links between the same pair of nodes or self-loops). (b) An example of a network with multiple links and self-loops.

If we allow multiple links, the network is called a multigraph. (In this course, we will mostly deal with simple networks.)
Let us label the nodes 1, . . . , n. The order does not matter as long as every node label is unique.
The network can be represented by specifying the number of nodes n and the edge list. For example, in Fig. 13a, n = 6 and the links are (1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (3, 5) and (4, 5).

Another representation is the adjacency matrix.

Definition 2.2:
The adjacency matrix A of a simple network is the matrix with elements A_ij such that

A_{ij} = \begin{cases} 1 & \text{if there is a link between nodes } i \text{ and } j \text{ (``}i\text{ and }j\text{ are adjacent''),}\\ 0 & \text{otherwise.} \end{cases}

Example:
The adjacency matrix of the network in Fig. 13a is

A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 1\\
1 & 0 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 0\\
1 & 0 & 1 & 1 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}

Note:

• The diagonal elements Aii are all zero (no self-loops).

• A is symmetric (if there is a link between i and j, then there is also a link between j and i).
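The definition translates directly into code. Below is a minimal sketch in Python (using numpy; the variable names are our own, not from the notes) that builds the adjacency matrix of Fig. 13a from its edge list and checks the two properties just noted:

```python
import numpy as np

# Edge list of the undirected simple network in Fig. 13a (nodes labelled 1..6).
n = 6
links = [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (3, 5), (4, 5)]

A = np.zeros((n, n), dtype=int)
for i, j in links:
    # Subtract 1 because Python arrays are 0-indexed.
    A[i - 1, j - 1] = 1
    A[j - 1, i - 1] = 1          # undirected: A is symmetric

assert (A == A.T).all()          # symmetric
assert (np.diag(A) == 0).all()   # no self-loops
print(A)
```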

2.2 Directed networks

A directed network (also called a directed graph or digraph) is a network where the links only go in one direction.

Formally, in Definition 2.1, the elements of the link set L are now ordered (instead of unordered) pairs of nodes.

Examples: the World Wide Web, food webs, citation networks.

The links can be represented by lines with arrows on them.

Figure 14: A directed network.

Definition 2.3:
The adjacency matrix of a directed network has matrix elements

A_{ij} = \begin{cases} 1 & \text{if there is a link from } j \text{ to } i,\\ 0 & \text{otherwise.} \end{cases}

Note: the direction of the link is counter-intuitive, but this notation will be convenient later on.


Example:
The adjacency matrix of the network in Fig. 14 is

A = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}

Note: A is asymmetric.

2.3 Weighted networks

In some networks, it is useful to assign different weights to links.

Examples:

• Traffic in a transportation network.

• Frequency of contacts in a social network.

• Total energy flow from prey to predator in a food web.

This information can be represented by an adjacency matrix whose entries are not all either 0 or 1.

If weights are non-negative, they can be represented by line thickness.

Example:
The network with the weighted adjacency matrix

A = \begin{pmatrix}
0 & 0 & 0 & 0 & 2 & 0\\
2 & 0 & 0.5 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1.5 & 0 & 0.5 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}

looks like Fig. 15.

Sometimes it is useful to consider negative weights.

Example:
In a social network,

positive weight: friendship,
negative weight: animosity.

A special case is that of signed networks, where all weights are either +1 or −1 (or 0 if there is no link).
Structural balance theory states that signed social networks are stable if and only if either

• two friends have the same friends or

Figure 15: A weighted network.

• my enemy’s enemy is my friend.

A recent study of interactions in a virtual-life game (Szell et al., PNAS 107, 13636 [2010]) with ≈ 300,000 participants confirmed that most triads (i.e. sub-networks of three mutually connected players) satisfy these two rules. Triads with exactly two positive links were less likely than in a null model where the total number of +'s and −'s was fixed, but randomly redistributed over the links. The case of three negative links in a triad is more complicated: there were relatively few such triads, but their number was not much smaller than in the null model.

  triad configuration         + + +      + + −      + − −      − − −
  structural balance theory   stable     unstable   stable     unstable
  N∆                          26,329     4,428      39,519     8,032
  N∆^rand                     10,608     28,545     30,145     9,009

Table 1: Possible triad configurations in a signed network. N∆: empirical number of triads in a large virtual-life community. N∆^rand: expectation value for sign randomisation. Data from Szell et al., PNAS 107, 13636 (2010).

2.4 Cocitation and bibliographic coupling

Cocitation and bibliographic coupling are two different ways of turning a simple unweighted directed network into a weighted undirected network.

Definition 2.4:
The cocitation C_ij of two nodes i and j in a directed network is the number of nodes with links pointing to both i and j.

Example: Academic citation network

Figure 16: Papers i and j are cited together by three papers, so C_ij = 3.

Cocitation and the adjacency matrix:
From the definition of the adjacency matrix A,

C_{ij} = \sum_{k=1}^{n} A_{ik} A_{jk}

or, expressed as the cocitation matrix C,

C = A A^T.

Interpretation of cocitation:
In citation networks, a large cocitation is an indicator that two papers deal with related topics.
C is similar to an adjacency matrix, but it will generally have non-zero entries on the diagonal,

C_{ii} = \sum_{k=1}^{n} A_{ik}^2 = \sum_{k=1}^{n} A_{ik},

thus C_ii is equal to the total number of links pointing to i.

Definition 2.5:
The bibliographic coupling B_ij of two nodes i and j is the number of other nodes to which both point.

Example: Academic citation network

Bibliographic coupling and the adjacency matrix:

B_{ij} = \sum_{k=1}^{n} A_{ki} A_{kj}

or, expressed as the bibliographic coupling matrix B,

B = A^T A.

Interpretation of bibliographic coupling:

Figure 17: Papers i and j cite three of the same papers, so B_ij = 3.

Similar to cocitation, a large value B_ij indicates that papers i and j are about a similar subject.
Difference:

Strong C_ij requires both i and j to be highly cited.
Strong B_ij requires both i and j to cite many papers.

In practice, B_ij works better because the bibliography sizes of papers are more uniform than the numbers of citations received by papers. B_ij is used, for example, by the Science Citation Index in its "Related Records" feature.
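Both matrices are one line of linear algebra each. A minimal Python sketch (numpy; the three-paper example is our own, not from the notes):

```python
import numpy as np

# Directed adjacency matrix with the convention of Def. 2.3:
# A[i, j] = 1 if there is a link from j to i (e.g. paper j cites paper i).
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])   # hypothetical example: paper 3 cites papers 1 and 2

C = A @ A.T   # cocitation: C[i, j] = number of papers citing both i and j
B = A.T @ A   # bibliographic coupling: B[i, j] = number of papers cited by both

print(C)      # C[0, 1] == 1: papers 1 and 2 are cited together once
print(B)      # diagonal of B: size of each paper's bibliography
```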


3 Degree

3.1 Definitions

Definition 3.1:
The degree k_i of a node i in a simple, undirected, unweighted network is the number of links connected to i.

Figure 18: An undirected network. The node in the centre has degree 4.

Remarks:

• The degree can be computed from the adjacency matrix, k_i = \sum_{j=1}^{n} A_{ij}.

• The total number m of links in the network satisfies m = \frac{1}{2} \sum_{i=1}^{n} k_i.

Definition 3.2:
In a directed, unweighted network, the in-degree k^in_i of a node i is the number of ingoing links and the out-degree k^out_i the number of outgoing links.

Figure 19: A directed network. The node in the centre has in-degree 1 and out-degree 4.

Remarks:

• k^in_i = \sum_{j=1}^{n} A_{ij}, k^out_j = \sum_{i=1}^{n} A_{ij}.

• m = \sum_{i=1}^{n} k^in_i = \sum_{j=1}^{n} k^out_j.

3.2 Degree Distributions

Definition 3.3:
In an undirected network, the degree distribution is the sequence p_0, p_1, p_2, . . ., where p_k is the fraction of nodes in the network with degree k.


Example:

p_0 = 1/10, p_1 = 3/10, p_2 = 3/10, p_3 = 2/10, p_4 = 0, p_5 = 1/10.

Remark: In a directed network, we can similarly define the in-degree distribution and out-degree distribution.

Example:

Figure 20: The in- and out-degree distributions of the World Wide Web. From Broder et al., Comput. Netw. 33, 309–320 (2000).

The distributions are often "heavy-tailed": there are some nodes ("hubs") with very high degree. As a first approximation, the distributions can be fitted by power laws. But how to make power-law fits statistically sound is a matter of controversy and current research.


4 Walks, cycles and paths

4.1 Definitions

Here we consider simple unweighted networks. They may be undirected or directed.

Definition 4.1:

• A walk is a sequence of nodes v_1 → v_2 → . . . → v_k in which every consecutive pair of nodes in the sequence is connected by a link in the network (i.e. A_{v_{i+1}, v_i} = 1 for i = 1, . . . , k − 1).

• The length of a walk is the number of links traversed along the walk (i.e. k − 1).

• A cycle is a walk that begins and ends at the same node (i.e. v_1 = v_k).

• A path is a walk that does not contain any cycles.

Remark:
Links and nodes in a walk and in a cycle can be traversed more than once, but in a path multiple traversals are forbidden.

Example:

walk

path

cycle

Figure 21: A walk of length 6, a cycle of length 3 and a path of length 3.

4.2 A reminder: Jordan normal form

We want to relate walks and cycles to the adjacency matrix. For this purpose (and some applications later in the course), it will be convenient to transform the adjacency matrix into Jordan normal form. Here is a brief summary of the properties of the Jordan normal form. Proofs can be found in most linear algebra textbooks.

Theorem 4.2:
For every complex square matrix M, there exists a non-singular matrix P such that J = P^{-1}MP is upper triangular and block diagonal,

J = \begin{pmatrix}
J_1 & 0 & \dots & 0\\
0 & J_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & J_p
\end{pmatrix},


where each "Jordan block" J_i is an upper triangular square matrix of the form

J_i = \begin{pmatrix}
\lambda_i & 1 & 0 & \dots & 0\\
0 & \lambda_i & 1 & \dots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & \dots & 0 & \lambda_i & 1\\
0 & \dots & 0 & 0 & \lambda_i
\end{pmatrix}.

The diagonal entry λi is an eigenvalue of M.
The "Jordan normal form" J is unique up to the order of the Jordan blocks.

Definition 4.3:

• The index of the eigenvalue λi, index(λi), is the size of the largest Jordan block with diagonal entries λi.

• The algebraic multiplicity of λi, alg mul_M(λi), is the number of times λi is repeated on the diagonal of J.

• The geometric multiplicity of λi, geo mul_M(λi), is the number of Jordan blocks with λi on the diagonal.

• The spectral radius ρ(M) of the matrix M is the maximum absolute value of all diagonal entries in J, i.e. ρ(M) = max_i |λi|.

Example:

M = \begin{pmatrix}
8 & -1/2 & 5 & 5\\
0 & -12 & 0 & 0\\
0 & 1/2 & 3 & -5\\
0 & 3/2 & -15 & -7
\end{pmatrix}

can be brought into Jordan normal form

J = P^{-1}MP = \begin{pmatrix}
8 & 0 & 0 & 0\\
0 & 8 & 0 & 0\\
0 & 0 & -12 & 1\\
0 & 0 & 0 & -12
\end{pmatrix}

with

P = \frac{1}{4} \begin{pmatrix}
1 & 1 & -1 & 0\\
0 & 0 & 0 & 2\\
3 & -1 & 1 & 0\\
-3 & 1 & 3 & 0
\end{pmatrix}.

⇒ index(8) = 1, index(−12) = 2,
alg mul_M(8) = alg mul_M(−12) = 2,
geo mul_M(8) = 2, geo mul_M(−12) = 1,
ρ(M) = 12.


4.3 Relating walks and cycles to the adjacency matrix

Proposition 4.4:
Let us denote by N^{(r)}_{ij} the number of walks of length r from node j to node i. If A is the adjacency matrix, then

N^{(r)}_{ij} = [A^r]_{ij},

i.e. N^{(r)}_{ij} is the (i, j)-th entry of the r-th power of the adjacency matrix.

Proof:

• r = 1:
There is a walk from j to i if and only if there is a (directed) link between these two nodes. ⇒ N^{(1)}_{ij} = A_{ij}.

• Induction from r to r + 1:
If there are N^{(r)}_{ik} walks of length r from k to i, then the number of walks of length r + 1 from j to i visiting k as the second node is equal to N^{(r)}_{ik} A_{kj}. Summing over k yields the number of all walks. ⇒

N^{(r+1)}_{ij} = \sum_{k=1}^{n} N^{(r)}_{ik} A_{kj} = \sum_{k=1}^{n} [A^r]_{ik} A_{kj} = [A^{r+1}]_{ij}. □
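Prop. 4.4 suggests computing walk counts by matrix powers. A minimal Python sketch (numpy; the triangle example is our own):

```python
import numpy as np

# Count walks of length r between all node pairs: N^(r) = A^r (Prop. 4.4).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])          # undirected triangle

r = 3
Nr = np.linalg.matrix_power(A, r)
# Nr[i, j] is the number of walks of length r from j to i.
print(Nr)
# For the triangle, Nr[0, 0] == 2: the two walks 1->2->3->1 and 1->3->2->1.
```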

Let us denote by C_r the number of all cycles of length r anywhere in the network. Note that C_r counts, for example, the cycles

1 → 2 → 3 → 1,
2 → 3 → 1 → 2 and
1 → 3 → 2 → 1

as separate cycles.

Proposition 4.5:
Consider an arbitrary (directed or undirected) network with n nodes. Let the (generally complex) eigenvalues of its adjacency matrix A be λ1, . . . , λn. (Note: if eigenvalue λi has algebraic multiplicity ai, it appears ai times in this sequence.) Then the number of cycles of length r is

C_r = \sum_{i=1}^{n} \lambda_i^r.

Proof:
From Prop. 4.4,

C_r = \sum_{i=1}^{n} [A^r]_{ii} = \mathrm{Tr}(A^r). \qquad (1)

Viewing A as a complex matrix, we can transform it into Jordan normal form: J = P^{-1}AP.¹
Because of the upper triangular form of J, T = J^r is upper triangular for any positive integer r and the diagonal entries are λi^r.

¹ If the network is undirected, A is symmetric so that we can even assume J to be diagonal. But for directed networks the general Jordan normal form is the best we can do.

T = J^r = \begin{pmatrix}
\lambda_1^r & T_{12} & T_{13} & \dots & T_{1n}\\
0 & \lambda_2^r & T_{23} & \dots & T_{2n}\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \dots & \lambda_{n-1}^r & T_{n-1,n}\\
0 & 0 & \dots & 0 & \lambda_n^r
\end{pmatrix}.

Now plug this into Eq. 1,

C_r = \mathrm{Tr}(P J^r P^{-1}) \overset{(*)}{=} \mathrm{Tr}(P^{-1} P J^r) = \mathrm{Tr}(J^r) = \sum_{i=1}^{n} \lambda_i^r.

In step (∗) we have used that Tr(M_1 M_2) = Tr(M_2 M_1) for any square matrices M_1, M_2. □
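A short numerical check of Prop. 4.5 (our own example, in Python with numpy):

```python
import numpy as np

# Prop. 4.5: the number of length-r cycles is C_r = sum_i lambda_i^r = Tr(A^r).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])                    # triangle

r = 3
eigenvalues = np.linalg.eigvals(A)           # may be complex for directed networks
C_r = np.sum(eigenvalues**r).real            # imaginary parts cancel in the sum
trace = np.trace(np.linalg.matrix_power(A, r))
print(round(C_r), trace)                     # both 6: each of the 3 nodes starts
                                             # the triangle cycle in 2 directions
```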

4.4 Directed acyclic networks

Definition 4.6:
A directed network with no cycles is called acyclic.

Example: scientific citation network
A paper can only cite another paper if it has already been written.² ⇒ All directed links point backward in time.

Figure 22: An example of a directed acyclic network. (Horizontal axis: time.)

Proposition 4.7:
Consider a directed network whose nodes are labeled 1, . . . , n. Then the following two statements are equivalent.

(A) The network is acyclic.

(B) There exists a sequence t_i ∈ R, i = 1, . . . , n, so that t_j > t_k for all links j → k.

² Rare exceptions exist, for example if an author publishes two papers simultaneously in the same journal and each paper cites the other. Thus, real citation networks have a small number of short cycles.


Remark: ti plays the role of the publication date in citation networks.

Proof: (A) ⇒ (B)
There must be at least one node with out-degree 0. To see this, consider the following path across the network.

(i) Start at an arbitrary node.

(ii) If this node has out-degree 0 → we are done.

(iii) Otherwise choose one of the directed outgoing links and follow it to a new node. Go back to step (ii).

If we pass through step (ii) more than n times, we must have revisited a node that has already been on the path. But then we have found a cycle, contradicting (A).
⇒ The above algorithm must terminate.
⇒ There is at least one node i_1 with out-degree 0. Assign t_{i_1} = 1.
Now remove i_1 and all of the links attached to it from the network.
The remaining network of n − 1 nodes must again have one node i_2 with no outgoing links. Set t_{i_2} = 2. Remove i_2 from the network and repeat this procedure to assign t_{i_3} = 3, . . . , t_{i_n} = n. The sequence t_i satisfies (B).
Note: t_i is not unique. For example, if there is more than one node without outgoing links, we can choose arbitrarily which one we remove next.
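The peeling procedure used in this proof is, in essence, a topological sort. A minimal Python sketch of it (our own illustration; node labels 0, . . . , n−1 instead of 1, . . . , n):

```python
# Assign labels t_i as in the proof of Prop. 4.7: repeatedly remove a node
# with out-degree 0. If we get stuck before removing every node, there is a cycle.
def acyclic_labels(n, links):
    """links: list of directed pairs (j, k) meaning j -> k; nodes are 0..n-1."""
    out_links = {v: set() for v in range(n)}
    for j, k in links:
        out_links[j].add(k)
    t = {}
    label = 1
    remaining = set(range(n))
    while remaining:
        sinks = [v for v in remaining if not (out_links[v] & remaining)]
        if not sinks:
            return None            # every remaining node has an outgoing link -> cycle
        v = sinks[0]               # the choice is arbitrary, so t is not unique
        t[v] = label
        label += 1
        remaining.remove(v)
    return t                       # satisfies t[j] > t[k] for every link j -> k

print(acyclic_labels(3, [(2, 1), (2, 0), (1, 0)]))  # acyclic: {0: 1, 1: 2, 2: 3}
print(acyclic_labels(3, [(0, 1), (1, 2), (2, 0)]))  # cycle: None
```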

Proof: (B) ⇒ (A)
Suppose we found a cycle of nodes n_1 → n_2 → . . . → n_i → n_1. From (B) and the first i − 1 steps in the cycle, we know that t_{n_1} > t_{n_2} > . . . > t_{n_i}. The last step in the cycle, n_i → n_1, demands t_{n_i} > t_{n_1}, in contradiction to the previous inequality. □

Proposition 4.8:
Consider a network with n nodes. The following three statements are equivalent.

(A) The network is acyclic.

(B) The adjacency matrix A satisfies A^n = 0. (This implies that A is nilpotent.)

(C) All (complex) eigenvalues of A are zero.

Proof: (A) ⇒ (B)
Use the algorithm developed in the proof of Prop. 4.7 to find a sequence t_i ∈ {1, . . . , n} so that t_j > t_k for all links j → k.
Define the permutation π so that π(i) = t_i and the n × n permutation matrix

P = \begin{pmatrix} e_{\pi(1)} \\ \vdots \\ e_{\pi(n)} \end{pmatrix},

where e_i = (0, . . . , 0, 1, 0, . . . , 0) with the 1 in the i-th position.
P^{-1}AP is strictly upper triangular (i.e. has only zeros on the diagonal),

P^{-1}AP = \begin{pmatrix}
0 & x_{12} & \cdots & x_{1n}\\
0 & 0 & \cdots & x_{2n}\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 0
\end{pmatrix}.


⇒ (P^{-1}AP)^n = 0 ⇒ P^{-1}A^nP = 0 ⇒ A^n = 0.

Proof: (B) ⇒ (C)
Let λ be an eigenvalue of A with eigenvector v. Then
λv = Av ⇒ λ^n v = A^n v = 0 ⇒ λ = 0.

Proof: (C) ⇒ (A)
This follows from Prop. 4.5.


5 Components

Definition 5.1:
An undirected network is connected if there is a path between every pair of nodes.

An undirected network that is not connected can be divided into components, defined as maximal connected subsets.

Figure 23: An undirected network with three components.

In directed networks the situation is more complicated. If there is a path from node i to j, there may not be a path from j to i.
Weakly connected components: these are the components in the network if all directed links are replaced by undirected links.
Strongly connected components: two nodes i and j belong to the same strongly connected component if there are directed paths from i to j and from j to i.

Figure 24: A directed network with two weakly and four strongly (shaded) connected components.

Example:
Directed acyclic networks have no strongly connected component with more than one node.

Definition 5.2:
The out-component of a node i is the set of all nodes reachable from node i via directed paths, including i itself.
The in-component of i is the set of all nodes from which i can be reached via directed paths, including i itself.


Figure 25: The in- and out-component of a node i in a directed network.

Remark: If node j is in both the in- and out-component of i, then i and j are in the same strongly connected component.
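Both components can be computed by breadth-first search, once along the links and once against them. A minimal Python sketch (our own example network):

```python
# Out- and in-components of a node (Def. 5.2) via breadth-first search.
from collections import deque

def reachable(start, succ):
    """All nodes reachable from start, following the successor lists succ."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

# Hypothetical directed network on nodes 0..3 with links j -> k.
links = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
succ = {v: [] for v in range(n)}     # successors: out-component search
pred = {v: [] for v in range(n)}     # predecessors: in-component search
for j, k in links:
    succ[j].append(k)
    pred[k].append(j)

i = 0
out_comp = reachable(i, succ)
in_comp = reachable(i, pred)
print(out_comp & in_comp)            # strongly connected component of i: {0, 1, 2}
```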

The component structure of directed networks is sometimes visualised in the form of a "bow-tie diagram". Below is the diagram for the World Wide Web.

Figure 26: From Broder et al., Comput. Netw. 33, 309–320 (2000).


6 Cycles in bipartite and signed networks

6.1 Bipartite networks

Definition 6.1:
An undirected network is called bipartite if the nodes can be divided into two disjoint sets N1, N2 so that every link connects one node in N1 to one node in N2.

Figure 27: A small bipartite network. (Nodes 1, . . . , 5 form the set N1; nodes A, B, C form the set N2.)

Examples:

  network                             N1              N2
  scientific co-authorship            author          co-authored publication
  board of directors                  director        board of a company
  recommender systems (e.g. Amazon)   customer        product ("people who bought
                                                      this book, movie etc.")
  public transport                    station, stop   train, tram, bus route
  film actors ("Kevin Bacon game")    actor           cast of a film

Theorem 6.2:
The following two statements are equivalent:

(A) A network is bipartite.

(B) The length of every cycle is an even number.

Proof: (A) ⇒ (B)
Consider an arbitrary cycle v_1 → v_2 → . . . → v_k → v_1. Because the network is bipartite, v_i and v_{i+1} must be in different sets.
Without loss of generality, assume v_1 ∈ N1. (⋆)
Then v_3, v_5, v_7, . . . ∈ N1 and v_2, v_4, v_6, . . . ∈ N2. If k is odd, then v_k ∈ N1 and, because v_1 is adjacent to v_k, v_1 ∈ N2 in contradiction with (⋆).
⇒ The cycle length k is even.

Proof: (B) ⇒ (A)
Let us assume that the network is connected. Choose a node v and define

X = {node x | the shortest path from v to x has even length},
Y = {node y | the shortest path from v to y has odd length}.

We will show that X and Y play the role of N1 and N2 in Def. 6.1.


Let x_1, x_2 be in X and suppose they are adjacent. v is not adjacent to x_1; otherwise the shortest path from v to x_1 would have length 1 and thus would not be even. Therefore v ≠ x_2.
Repeating the same argument with the sub-indices 1 and 2 interchanged, we also know v ≠ x_1.
Let P_1 : v → v_1 → . . . → v_{2k} be a shortest path from v to v_{2k} = x_1 and let P_2 : v → w_1 → . . . → w_{2l} be a shortest path from v to w_{2l} = x_2. Note that both P_1 and P_2 are of even length.
Then the cycle v → v_1 → . . . → x_1 → x_2 → . . . → w_1 → v has odd length, in contradiction to (B).
If the network is not connected, we can apply the above argument to every component. Because a network is bipartite if and only if each component is bipartite, the proof is finished. □

Definition 6.3:
The incidence matrix B of a bipartite network is a |N2| × |N1| matrix with entries

B_{ij} = \begin{cases} 1 & \text{if node } j \in N_1 \text{ is linked to } i \in N_2,\\ 0 & \text{otherwise.} \end{cases}

Example: In Fig. 27,

B = \begin{pmatrix}
1 & 0 & 0 & 1 & 0\\
1 & 1 & 1 & 0 & 0\\
0 & 1 & 1 & 0 & 1
\end{pmatrix}.

Although a bipartite network represents the complete information, it is sometimes more convenient to eliminate either N1 or N2 and only work with links between the same type of nodes.

Example: In the Kevin Bacon game, we try to find the "degree of separation" (i.e. the minimum number of links, a.k.a. Bacon number) between Kevin Bacon and some other actor.³ For example, the Bacon number of Clint Eastwood is 2, because Eastwood played with Glenn Morshower in "Blood Work" (2002) and Morshower with Bacon in "The River Wild" (1994). But to determine the Bacon number, it is enough to know that there is a connection Eastwood ↔ Morshower and Morshower ↔ Bacon. The names of the movies do not matter. This motivates the next definition.

Definition 6.4:
The one-mode projection of a bipartite network on the set N1 is the weighted network with node set N1 whose adjacency matrix A has elements

A_{ij} = \begin{cases} \sum_{k=1}^{|N_2|} B_{ki} B_{kj} & \text{if } i \neq j,\\ 0 & \text{otherwise.} \end{cases}

³ There is a similar game called the "Erdős number" for mathematicians. Here mathematicians are linked if they have co-authored a paper. The Erdős number is the distance from Paul Erdős (1913–1996), a famous Hungarian mathematician, in the one-mode projection. For example, my Erdős number is 4 (to the best of my knowledge). We will encounter the work of Paul Erdős in random network theory later in this course.


Remarks:

• If we define D1 to be the diagonal matrix containing the degrees of the nodes in N1,

D_1 = \begin{pmatrix}
k_1 & 0 & 0 & \dots\\
0 & k_2 & 0 & \dots\\
0 & 0 & k_3 & \dots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},

then A = B^T B − D_1.

• Similarly we can define the one-mode projection on N2 (instead of N1). If D2 contains the degrees in N2,

D_2 = \begin{pmatrix}
k_A & 0 & 0 & \dots\\
0 & k_B & 0 & \dots\\
0 & 0 & k_C & \dots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},

then A = B B^T − D_2.
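A minimal Python sketch of both projections, using the incidence matrix of Fig. 27 from the example above (numpy):

```python
import numpy as np

# One-mode projections of the bipartite network in Fig. 27 (Def. 6.4).
# Incidence matrix: rows A, B, C (set N2), columns 1..5 (set N1).
B = np.array([[1, 0, 0, 1, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 0, 1]])

D1 = np.diag(B.sum(axis=0))      # degrees of the nodes in N1
A1 = B.T @ B - D1                # projection on N1 (diagonal forced to zero)

D2 = np.diag(B.sum(axis=1))      # degrees of the nodes in N2
A2 = B @ B.T - D2                # projection on N2
print(A1)
print(A2)
```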

Example:

(a) (b)

1

2

3

4

51

1

1

2 1

1

AB

C

1

2

Figure 28: (a) One-mode projection of the bipartite network in Fig. 27 on N1. (b) One-modeprojection on N2.

6.2 Structural balance in signed networks.

Recall from Sec. 2.3 that a signed network is a simple weighted network whose weights are all equal to either +1 or −1. In this section, we consider only undirected networks.
In social networks:

• +1: friendship,

• −1: animosity.

Definition 6.5:
An undirected signed network whose nodes can be partitioned into two (possibly empty) sets N1 and N2 so that

• each link v ↔ w with v, w ∈ N1 or v, w ∈ N2 has weight +1,

• each link v ↔ w with v ∈ N1, w ∈ N2 has weight −1,

is called structurally balanced.

Figure 29: A small structurally balanced network.

Theorem 6.6:
The following statements are equivalent.

(A) A signed network is balanced.

(B) The product of the signs around each cycle is positive.

Remark: (B) is a generalisation of the two rules:

• my friend’s friend is my friend,

• my enemy’s enemy is my friend.

See Table 1 for balanced (“stable”) and unbalanced (“unstable”) triads.

Proof: (A) ⇒ (B)
Consider an arbitrary cycle v_1 → v_2 → . . . → v_k → v_1. Every time two consecutive nodes are not in the same set, the sign changes. Because the first and last node are identical, namely v_1, the sign must change an even number of times. Otherwise v_1 would be simultaneously in sets N1 and N2, which is impossible because they partition the node set and thus N1 ∩ N2 = ∅.

Proof: (B) ⇒ (A)
Let us assume that the network is connected. We will assign the nodes to either N1 or N2 according to the following algorithm:

1. Initially N1 = N2 = ∅. Assign a variable p(v) = −1 to every node v.

2. Choose a node u and assign it to set N1.

3. If all nodes were already assigned to either N1 or N2, then terminate.

4. Choose a node v that has not yet been assigned to either N1 or N2, but one of whose neighbours w has been assigned to one of the two sets. Change p(v) to w and

• if w ∈ N1 and the link v ↔ w has weight +1, then assign v to N1,

• otherwise if w ∈ N2 and the link v ↔ w has weight +1, then assign v to N2,

• otherwise if w ∈ N1 and the link v ↔ w has weight −1, then assign v to N2,


• otherwise assign v to N1.

5. Go to step 3.

We must show that the algorithm assigns nodes to N1 and N2 so that

(a) all nodes linked to a node v by a link with weight +1 are in the same set as v,

(b) all nodes linked to v by a link with weight −1 are in the opposite set.

• First case: v ∈ N1, w adjacent to v and the link v ↔ w has weight +1.
Assume w ∈ N2. (⋆)
Let P_1 be the path v → [p(v) = v_1] → [p(v_1) = v_2] → . . . → [p(v_i) = u], where u was the first node assigned to N1 in the algorithm above.
Let P_2 be the path w → [p(w) = w_1] → [p(w_1) = w_2] → . . . → [p(w_j) = u].
Consider the cycle

C : v → v_1 → . . . → v_i → u → w_j → . . . → w_1 → w → v, (2)

where the first part follows P_1 and the part from u to w follows P_2 in the opposite direction.

On our way from v to u, we must encounter an even number of links with weight −1; otherwise v would not be in N1.
Similarly, there is an odd number of −1's between u and w because of assumption (⋆).
This implies that there is an odd number of −1's along C, contradicting (B). Thus w must be in N1.

• Second case: v ∈ N1, w adjacent to v and the link v ↔ w has weight −1.
Assume w ∈ N1. (⋄)
Define P_1 and P_2 as in the first case by tracing back our paths from v and w to u. Form the cycle C as in Eq. 2. This time there is an even number of −1's along P_1 and, because of (⋄), also along P_2, so that C has an odd number of −1's, in contradiction to (B). Thus (⋄) must be false.

• The remaining two cases, namely

– v ∈ N2, w adjacent to v and the link v ↔ w has weight +1,
– v ∈ N2, w adjacent to v and the link v ↔ w has weight −1,

can similarly be shown to imply w ∈ N2 and w ∈ N1, respectively.

If the network is not connected, we can apply the above argument to every component. Because a network is structurally balanced if and only if each component is structurally balanced, the proof is finished. □
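The assignment algorithm from the proof doubles as a balance test: assign the sets as described, then verify conditions (a) and (b) for every link. A minimal Python sketch for a connected network (our own encoding of the nodes and signed links):

```python
# Sketch of the algorithm from the proof of Thm. 6.6 for one connected
# component: spread set labels from a start node, then verify that every +1
# link joins equal labels and every -1 link joins opposite labels.
def is_balanced(n, signed_links):
    """signed_links: list of (v, w, sign) with sign = +1 or -1; nodes 0..n-1."""
    neighbours = {v: [] for v in range(n)}
    for v, w, s in signed_links:
        neighbours[v].append((w, s))
        neighbours[w].append((v, s))
    side = {0: +1}                       # +1 ~ N1, -1 ~ N2; start node in N1
    stack = [0]
    while stack:
        v = stack.pop()
        for w, s in neighbours[v]:
            if w not in side:
                side[w] = side[v] * s    # +1 link: same set, -1 link: other set
                stack.append(w)
    # Check conditions (a) and (b) for all links:
    return all(side[v] * side[w] == s for v, w, s in signed_links)

print(is_balanced(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)]))    # +++ : True
print(is_balanced(3, [(0, 1, 1), (1, 2, 1), (0, 2, -1)]))   # ++- : False
```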


7 Models of spread in networks

In this chapter, we only consider simple undirected networks. The generalisation to directed networks is not straightforward.

7.1 Diffusion

Assume

• there is some commodity distributed on the nodes,

• there is an amount ψi on node i,

• the commodity flows along the links,

• the flow along a link from j to i occurs at a rate C(ψ_j − ψ_i), where C is the so-called diffusion constant.

⇒ \frac{d\psi_i}{dt} = C \sum_j A_{ij}(\psi_j − \psi_i), \qquad (3)

where A is the adjacency matrix. We can rewrite Eq. 3 as

\frac{d\psi_i}{dt} = C \sum_j A_{ij}\psi_j − C\psi_i \sum_j A_{ij} = C \sum_j A_{ij}\psi_j − C\psi_i k_i = C \sum_j (A_{ij} − \delta_{ij}k_i)\psi_j, \qquad (4)

where k_i is the degree of i and δ_ij is the Kronecker delta. In matrix form, Eq. 4 becomes

\frac{d\psi}{dt} = C(A − D)\psi, \qquad (5)

where

D = \begin{pmatrix}
k_1 & 0 & 0 & \dots\\
0 & k_2 & 0 & \dots\\
0 & 0 & k_3 & \dots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}.

Definition 7.1:
The matrix L = D − A is called the graph Laplacian.

The diffusion equation, Eq. 5, can be written as

\frac{d\psi}{dt} = −CL\psi. \qquad (6)

Remark: In continuous space, the diffusion (or heat) equation is ∂ψ/∂t = C∇²ψ. So L plays the same role as the ordinary Laplacian ∇², apart from the minus sign in Eq. 6. We could absorb the minus sign in the definition of L, but unfortunately this is not standard practice.

Because L is symmetric, we can find an orthonormal basis of eigenvectors v_1, . . . , v_n. We can express any solution of Eq. 6 as

\psi(t) = \sum_i a_i(t) v_i,

where a_i(t) are time-dependent coefficients. Let λ_i be the eigenvalue corresponding to the eigenvector v_i. Then it follows from Eq. 6 that

\sum_i \frac{da_i}{dt} v_i = −C \sum_i \lambda_i a_i v_i. \qquad (7)

Because the v_i form a basis, the coefficients on both sides of Eq. 7 must be equal, thus

\frac{da_i}{dt} = −C\lambda_i a_i.

The solution is a_i(t) = a_i(0) \exp(−C\lambda_i t),

⇒ \psi(t) = \sum_i a_i(0) \exp(−C\lambda_i t) v_i. \qquad (8)

In summary, given the initial conditions and the eigenvalues and eigenvectors of L, we can calculate the diffusion dynamics on a network.
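A minimal Python sketch of this recipe (numpy; the three-node path network and the initial condition are our own choices):

```python
import numpy as np

# Diffusion on a network via the spectrum of the Laplacian (Eq. 8).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])            # path network 1 - 2 - 3
L = np.diag(A.sum(axis=1)) - A       # graph Laplacian L = D - A

C = 1.0                              # diffusion constant
psi0 = np.array([1.0, 0.0, 0.0])     # all of the commodity starts on node 1

lam, V = np.linalg.eigh(L)           # orthonormal eigenvectors (L is symmetric)
a0 = V.T @ psi0                      # coefficients a_i(0) in the eigenbasis

def psi(t):
    return V @ (a0 * np.exp(-C * lam * t))

print(psi(0.0))                      # initial condition
print(psi(100.0))                    # -> [1/3, 1/3, 1/3]: equally spread
```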

7.2 Eigenvalues of the graph Laplacian

Proposition 7.2:
All eigenvalues of the graph Laplacian are non-negative.

Proof:
For every link in the network, arbitrarily designate one end of the link to be "end 1" and the other "end 2". If there are m links in total, define the m × n "node-link incidence matrix" B with elements

B_{ij} = \begin{cases} +1 & \text{if end 1 of link } i \text{ is attached to node } j,\\ −1 & \text{if end 2 of link } i \text{ is attached to node } j,\\ 0 & \text{otherwise.} \end{cases}

Consider \sum_k B_{ki} B_{kj}.

• Case i ≠ j:

B_{ki} B_{kj} = \begin{cases} −1 & \text{if link } k \text{ connects nodes } i \text{ and } j,\\ 0 & \text{otherwise.} \end{cases}

In a simple network, there is at most one link between two nodes, so

\sum_k B_{ki} B_{kj} = \begin{cases} −1 & \text{if } i \text{ and } j \text{ are connected},\\ 0 & \text{otherwise.} \end{cases} \qquad (9)

• Case i = j:

B_{ki}^2 = \begin{cases} 1 & \text{if link } k \text{ is connected to node } i,\\ 0 & \text{otherwise.} \end{cases}

⇒ \sum_k B_{ki}^2 = k_i. \qquad (10)

From Eqs. 9 and 10,

B^T B = L. \qquad (11)

Let v_i be a normalised eigenvector of L with eigenvalue λ_i. Then

v_i^T B^T B v_i = v_i^T L v_i = \lambda_i v_i^T v_i = \lambda_i.

Because v_i^T B^T B v_i = |B v_i|² ≥ 0, λ_i cannot be negative. □

Proposition 7.3:
The graph Laplacian has at least one eigenvalue 0.

Proof:
Multiply L with the vector 1 = (1, 1, . . . , 1)^T. The i-th element of the product is

\sum_j L_{ij} \times 1 = \sum_j (\delta_{ij}k_i − A_{ij}) = k_i − \sum_j A_{ij} = k_i − k_i = 0.

In matrix notation, L · 1 = 0. ⇒ 1 is an eigenvector with eigenvalue 0. □

Proposition 7.4:
The multiplicity of the eigenvalue 0 equals the number of connected components in the network.

Proof:
Assume the network consists of c components of sizes n_1, . . . , n_c and the nodes are labeled so that the nodes

• 1, . . . , n_1 belong to the first component,

• n_1 + 1, . . . , n_1 + n_2 to the second component, etc.

Then L is block diagonal,

L = \begin{pmatrix} L_1 & 0 & \dots\\ 0 & L_2 & \dots\\ \vdots & \vdots & \ddots \end{pmatrix},

and the blocks are the Laplacians of the individual components. We can use the same argument as in Prop. 7.3 to show that

v_1 = (1, . . . , 1, 0, . . . , 0)^T (n_1 ones), v_2 = (0, . . . , 0, 1, . . . , 1, 0, . . . , 0)^T (n_1 zeros followed by n_2 ones), . . .

are c linearly independent eigenvectors of L with eigenvalue 0.
We now have to prove that all vectors u satisfying Lu = 0 are linear combinations of v_1, . . . , v_c.

Lu = 0 ⇒ (by Eq. 11) u^T B^T B u = 0 ⇒ |Bu| = 0 ⇒ Bu = 0.

From the definition of B, Bu = 0 implies that u_i = u_j for every link i ↔ j. By induction on the path length, we can show that u_i is constant for all nodes i on a path and hence for all i in the same component. The vector u must then be of the form

u = (a_1, . . . , a_1, a_2, . . . , a_2, . . . , a_c, . . . , a_c)^T = a_1 v_1 + . . . + a_c v_c,

where each a_i is repeated n_i times. □


Remark: In Eq. 8, λ_i ≥ 0 implies that diffusion tends to a stationary solution as t → ∞. In this limit, the only non-zero terms in the sum come from λ_i = 0, so that lim_{t→∞} ψ_j(t) is equal for all nodes j in the same component (i.e. in each component, the commodity is equally spread over all nodes).

7.3 Random walks – Stationary distribution

Definition 7.5:
A random walk starting from a specified initial node n_1 is a sequence of nodes (n_1, n_2, . . .) where the node n_{i+1} is chosen uniformly at random among the nodes linked to n_i.

Proposition 7.6:
Assume the network is connected and has m links, and let p_i(t) be the probability that the walk is at node i at the t-th step. There is a unique "stationary" distribution satisfying p_i(t) = p_i(t − 1) for all i and t, namely p_i = k_i/(2m).

Proof:
From Def. 7.5,

p_i(t) = \sum_{j=1}^{n} \frac{A_{ij}}{k_j} p_j(t − 1), \qquad (12)

or in matrix form p(t) = AD^{-1}p(t − 1).
We are looking for a stationary distribution, i.e. p(t − 1) = p(t) = p, so that p = AD^{-1}p or

(I − AD^{-1})p = (D − A)D^{-1}p = LD^{-1}p = 0. \qquad (13)

Equation 13 implies that D^{-1}p is an eigenvector of L with eigenvalue 0.
From the proof of Prop. 7.4 we know that for a connected network the only such eigenvectors are a1 = a × (1, . . . , 1)^T, where a is a constant.
⇒ p = aD1 ⇒ p_i = ak_i.
Because \sum_i p_i = 1 and \sum_i k_i = 2m, a = 1/(2m). □

Remark: The stationary solution of the random walk is not equal to the flat stationary solution of diffusion.

• The random walk spends time on nodes ∝ k_i because the higher the degree, the more ways there are of reaching the node.

• Diffusion has a flat stationary distribution because particles leave nodes with higher degree more quickly.
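A quick numerical check of Prop. 7.6 (our own example in Python; the walk length and random seed are arbitrary):

```python
import numpy as np

# Stationary distribution of a random walk (Prop. 7.6): p_i = k_i / (2m).
rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])                # connected undirected network
k = A.sum(axis=1)
p_theory = k / k.sum()                      # k_i / (2m)

# Simulate a long walk and compare visit frequencies with the theory.
visits = np.zeros(len(A))
v = 0
for _ in range(100_000):
    v = rng.choice(np.flatnonzero(A[v]))    # jump to a uniformly random neighbour
    visits[v] += 1

print(p_theory)                             # [0.375 0.25  0.25  0.125]
print(visits / visits.sum())                # close to the theoretical values
```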

7.4 Random walks – Mean first passage time

We now want to calculate the mean first passage time from a node u to v, i.e. the average time needed for a random walk starting at u to reach v. The next definition will be useful.


Definition 7.7:

• Let p be a vector in R^n. Define p^{(v−)} to be the (n − 1)-dimensional vector where the v-th entry is removed,

p^{(v−)} = (p_1, . . . , p_{v−1}, p_{v+1}, . . . , p_n)^T.

• Let M be an n × n matrix and 1 ≤ v ≤ n. Define M^{(v−)} to be the (n − 1) × (n − 1) matrix obtained from M by removing the v-th column and the v-th row,

M^{(v−)} = \begin{pmatrix}
M_{11} & \dots & M_{1,v−1} & M_{1,v+1} & \dots & M_{1n}\\
\vdots & & \vdots & \vdots & & \vdots\\
M_{v−1,1} & \dots & M_{v−1,v−1} & M_{v−1,v+1} & \dots & M_{v−1,n}\\
M_{v+1,1} & \dots & M_{v+1,v−1} & M_{v+1,v+1} & \dots & M_{v+1,n}\\
\vdots & & \vdots & \vdots & & \vdots\\
M_{n1} & \dots & M_{n,v−1} & M_{n,v+1} & \dots & M_{nn}
\end{pmatrix}.

In the special case where M is the graph Laplacian L, L^{(v−)} is called the v-th reduced Laplacian.

• Let N be an (n − 1) × (n − 1) matrix. Define N^{(v+)} to be the n × n matrix that is equal to N with a v-th row and column of zeros inserted,

N^{(v+)} = \begin{pmatrix}
N_{11} & \dots & N_{1,v−1} & 0 & N_{1v} & \dots & N_{1,n−1}\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
N_{v−1,1} & \dots & N_{v−1,v−1} & 0 & N_{v−1,v} & \dots & N_{v−1,n−1}\\
0 & \dots & 0 & 0 & 0 & \dots & 0\\
N_{v1} & \dots & N_{v,v−1} & 0 & N_{vv} & \dots & N_{v,n−1}\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
N_{n−1,1} & \dots & N_{n−1,v−1} & 0 & N_{n−1,v} & \dots & N_{n−1,n−1}
\end{pmatrix}.

To calculate the mean first passage time we also need the following proposition.

Proposition 7.8:
Let M be a symmetric matrix. The series \sum_{t=1}^{\infty} t(M^{t−1} − M^t) converges if and only if all eigenvalues satisfy |λ_i| < 1. In that case, \sum_{t=1}^{\infty} t(M^{t−1} − M^t) = (I − M)^{-1}.

Proof:
Because M is symmetric, there exists an orthogonal matrix Q so that

QMQ^{-1} = \begin{pmatrix}
\lambda_1 & 0 & \dots & 0\\
0 & \lambda_2 & & \vdots\\
\vdots & & \ddots & 0\\
0 & \dots & 0 & \lambda_n
\end{pmatrix}.

⇒ Q \left( \sum_{t=1}^{\infty} t(M^{t−1} − M^t) \right) Q^{-1} = \sum_{t=1}^{\infty} t \left( QM^{t−1}Q^{-1} − QM^tQ^{-1} \right) = \sum_{t=1}^{\infty} t \left[ (QMQ^{-1})^{t−1} − (QMQ^{-1})^t \right]

= \begin{pmatrix}
\sum_t t(\lambda_1^{t−1} − \lambda_1^t) & 0 & \dots & 0\\
0 & \sum_t t(\lambda_2^{t−1} − \lambda_2^t) & & \vdots\\
\vdots & & \ddots & 0\\
0 & \dots & 0 & \sum_t t(\lambda_n^{t−1} − \lambda_n^t)
\end{pmatrix}. \qquad (14)

Let us have a closer look at the non-zero entries,

\sum_{t=1}^{\infty} t(\lambda_i^{t−1} − \lambda_i^t) = \lim_{N\to\infty} \left( \lambda_i^0 − \lambda_i^1 + 2\lambda_i^1 − 2\lambda_i^2 + 3\lambda_i^2 − 3\lambda_i^3 + \dots + N\lambda_i^{N−1} − N\lambda_i^N \right) = \sum_{t=0}^{\infty} \lambda_i^t − \lim_{N\to\infty} N\lambda_i^N = \frac{1}{1 − \lambda_i}, \qquad (15)

where the geometric series \sum_{t=0}^{\infty} \lambda_i^t converges and N\lambda_i^N → 0 if and only if |λ_i| < 1.

Insert Eq. 15 in Eq. 14,

Q \left( \sum_{t=1}^{\infty} t(M^{t−1} − M^t) \right) Q^{-1} = \qquad (16)

\begin{pmatrix}
(1 − \lambda_1)^{-1} & 0 & \dots & 0\\
0 & (1 − \lambda_2)^{-1} & & \vdots\\
\vdots & & \ddots & 0\\
0 & \dots & 0 & (1 − \lambda_n)^{-1}
\end{pmatrix} = \qquad (17)

(I − QMQ^{-1})^{-1} = [Q(I − M)Q^{-1}]^{-1} = Q(I − M)^{-1}Q^{-1}. □

Proposition 7.9:
If all eigenvalues λ_i of M = A^{(v−)}(D^{(v−)})^{-1} satisfy |λ_i| < 1, then the mean first passage time for a random walk from node u to v is given by

\tau = \sum_{i=1}^{n} k_i \Lambda_{iu},

where Λ = [(L^{(v−)})^{-1}]^{(v+)}.


Proof:
We change the rules of the random walk slightly to make it absorbing: as soon as the walk reaches v, it cannot leave again. That is, we set A_iv = 0 for all i, rendering A asymmetric.
Define p_v(t) as the probability that a walk reaches v for the first time in ≤ t steps. The probability that the first passage time is equal to t is p_v(t) − p_v(t − 1) and the mean is⁴

\tau = \sum_{t=1}^{\infty} t\,[p_v(t) − p_v(t − 1)]. \qquad (18)

Consider Eq. 12 for i ≠ v:

p_i(t) = \sum_{j=1}^{n} \frac{A_{ij}}{k_j} p_j(t − 1) = \sum_{j \neq v} \frac{A_{ij}}{k_j} p_j(t − 1),

where the second equality uses A_iv = 0. As long as we concentrate on i ≠ v, we can simply remove the v-th column and row from the vectors and matrices,

p^{(v−)}(t) = A^{(v−)}(D^{(v−)})^{-1} p^{(v−)}(t − 1) = M\, p^{(v−)}(t − 1). \qquad (19)

By iterating Eq. 19, we obtain

p^{(v−)}(t) = M^t\, p^{(v−)}(0). \qquad (20)

Next we observe that

p_v(t) = 1 − \sum_{i \neq v} p_i(t) = 1 − 1^T p^{(v−)}(t), \qquad (21)

where 1 = (1, 1, 1, . . .)^T.

⇒ \tau = \sum_{t=1}^{\infty} t\, 1^T \left[ p^{(v−)}(t − 1) − p^{(v−)}(t) \right] = 1^T \left[ \sum_{t=1}^{\infty} t(M^{t−1} − M^t) \right] p^{(v−)}(0) = 1^T (I − M)^{-1} p^{(v−)}(0), \qquad (22)

where the first equality uses Eqs. 18 and 21, the second Eq. 20 and the third Prop. 7.8. From the definition of M,

(I − M)^{-1} = [I − A^{(v−)}(D^{(v−)})^{-1}]^{-1} = D^{(v−)}[D^{(v−)} − A^{(v−)}]^{-1} = D^{(v−)}(L^{(v−)})^{-1}. \qquad (23)

Insert Eq. 23 in Eq. 22,

\tau = 1^T D^{(v−)} (L^{(v−)})^{-1} p^{(v−)}(0).

The only non-zero entry in p^{(v−)}(0) is p^{(v−)}_u(0) = 1 because the random walk is initially at u. Furthermore, the only non-zero entries in D^{(v−)} are the degrees k_i:

\tau = \sum_{i=1}^{n} k_i \left\{ [(L^{(v−)})^{-1}]^{(v+)} \right\}_{iu}. □

⁴ The sum in Eq. 18 is not absolutely convergent, so we cannot change the order of the individual terms.
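Prop. 7.9 is straightforward to evaluate numerically. A minimal Python sketch (numpy; the four-node example is our own):

```python
import numpy as np

# Mean first passage time u -> v (Prop. 7.9) via the reduced Laplacian.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
k = A.sum(axis=1)
L = np.diag(k) - A

u, v = 3, 1                                    # start at node 3, target node 1
keep = [i for i in range(len(A)) if i != v]    # remove the v-th row and column
L_red = L[np.ix_(keep, keep)]                  # reduced Laplacian L^(v-)
Lambda_red = np.linalg.inv(L_red)              # (L^(v-))^-1, still without row/col v

# tau = sum_i k_i * Lambda_iu; the padded v-th row/column of zeros in the
# (v+) operation contributes nothing, so we can sum over the kept indices.
u_red = keep.index(u)
tau = sum(k[i] * Lambda_red[row, u_red] for row, i in enumerate(keep))
print(tau)                                     # 13/3 ~ 4.33 steps from 3 to 1
```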


8 The leading eigenvalue of the adjacency matrix

8.1 Statement of the Perron-Frobenius theorem

The results in this section apply to directed networks (with undirected networks as a special case).

Definition 8.1:
An n × n matrix M is reducible if there exists some permutation matrix P so that P^T M P is block upper triangular,

P^T M P = \begin{pmatrix} X & Y\\ 0 & Z \end{pmatrix}, \qquad (24)

where X and Z are square matrices. Otherwise M is called irreducible.

Proposition 8.2:
Let A be the adjacency matrix of a directed network. (The network may be weighted with link weights ≥ 0.) Then the following three statements are equivalent:

(A) A is irreducible.

(B) The directed network is strongly connected.

(C) For each i and j there exists a k so that (A^k)_{ij} > 0.

Proof: (A) ⇒ (B)
Suppose A is irreducible, but that the network is not strongly connected.
⇒ There exist nodes i and j so that there is no directed path from i to j. (∗)
Define S1 = {node k | there is a path from i to k} and let S2 be its complement.
⇒ For any node p in S1 and q in S2, there is no path from p to q; otherwise q would have been in S1.
Define r = card(S1). Because of (∗), r ≠ 0 and r ≠ n because i ∈ S1 and j ∈ S2.
Without loss of generality assume that the nodes in S1 are labeled 1, . . . , r and thus r + 1, . . . , n are in S2.⁵
⇒ There is no link from k to l for all k = 1, . . . , r and l = r + 1, . . . , n.
⇒ A_lk = 0 for all l = r + 1, . . . , n, k = 1, . . . , r; that is, A has the block upper triangular form of the right-hand side of Eq. 24.
This contradicts that A is irreducible and, hence, the network must be strongly connected.

Proof: (B) ⇒ (C)
This follows from Prop. 4.4.

Proof: (C) ⇒ (A)
We will prove the contrapositive version. Suppose A is reducible and, without loss of generality, block upper triangular as on the right-hand side of Eq. 24.
Generally, if two block upper triangular matrices whose blocks have identical dimensions are multiplied, the result is another block upper triangular matrix with the same dimensions.

⁵ We can make this assumption because we can otherwise apply a permutation transformation Ã = P^T A P which relabels the nodes accordingly in the new adjacency matrix Ã.


That is, if

M_1 = \begin{pmatrix} X_1 & Y_1\\ 0 & Z_1 \end{pmatrix}, \quad M_2 = \begin{pmatrix} X_2 & Y_2\\ 0 & Z_2 \end{pmatrix}

with r × r matrices X_1, X_2 and (n − r) × (n − r) matrices Z_1, Z_2, then

M_1 M_2 = \begin{pmatrix} X_1X_2 & X_1Y_2 + Y_1Z_2\\ 0 & Z_1Z_2 \end{pmatrix}.

If M_1 = M_2 = A, then we know that A² has the same block dimensions. Applying this argument repeatedly, A^k also has the same form. In particular, it keeps an (n − r) × r zero matrix as its lower left block for any k. Hence, (C) does not hold. □

Notation:

• A matrix M or vector v is positive, denoted by M > 0 or v > 0, if all its elements are positive.

• A matrix M or vector v is non-negative, denoted by M ≥ 0 or v ≥ 0, if it does not contain any negative elements.

• Let M be an n × n matrix. An eigenvalue λ_i that maximises max_{j=1,...,n} |λ_j| is called a leading eigenvalue of M. In other words, λ_i is a leading eigenvalue if and only if its absolute value is equal to the spectral radius of M, i.e. |λ_i| = ρ(M). (See Def. 4.3 for the definition of the spectral radius.)

• The 1-norm of an n-dimensional vector v is defined as ||v||_1 = \sum_{i=1}^{n} |v_i|.

If a network is strongly connected, we can apply the next theorem to the adjacency matrix.

Theorem 8.3 (Perron-Frobenius theorem):
If the matrix M ≥ 0 is irreducible and its spectral radius is ρ(M) = r, then

(A) r is an eigenvalue of M,

(B) alg mul_M(r) = 1,

(C) there exists an eigenvector x > 0 of M with eigenvalue r (i.e. Mx = rx),

(D) r > 0,

(E) Let p be the unique vector defined by Mp = rp, p > 0 and ||p||_1 = 1. There are no non-negative eigenvectors of M, regardless of their eigenvalue, except positive multiples of p.
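For an irreducible non-negative matrix, the vector p can be approximated by repeated multiplication ("power iteration"). A minimal Python sketch (our own example; note that convergence additionally requires the matrix to be primitive, see the remark at the end of this chapter):

```python
import numpy as np

# Power iteration towards the Perron vector of an irreducible non-negative
# matrix. (For convergence the matrix should also be primitive.)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)      # strongly connected: irreducible

p = np.ones(len(A)) / len(A)                # start with ||p||_1 = 1
for _ in range(200):
    p = A @ p
    p /= p.sum()                            # renormalise to ||p||_1 = 1

r = (A @ p)[0] / p[0]                       # estimate of the spectral radius
print(r, p)                                 # 2.0, [1/3, 1/3, 1/3]
```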

8.2 Proof for strictly positive matrices

We will first prove Theorem 8.3 for the special case where M > 0. The proof follows C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia (2000). Without loss of generality, we can assume |λ_1| = 1 because, if this is not the case, we can replace M by M/|λ_1|.⁶

⁶ We can rule out λ_1 = 0. Otherwise all eigenvalues are 0, which makes the matrix nilpotent (see Prop. 4.8). But if all M_ij > 0, M cannot be nilpotent.


We will furthermore use the notation |M| to represent the matrix with entries |M_ij| (i.e. we take the absolute values of the entries in M). Note that the notation | · | here indicates absolute values, not determinants. We will need the following lemma for the proof.

Lemma 8.4:
For any complex square matrix M, lim_{k→∞} M^k = 0 if and only if the spectral radius satisfies ρ(M) < 1.

Proof:
If J = P^{-1}MP is the Jordan normal form of M, then

M^k = P J^k P^{-1} = P \begin{pmatrix}
J_1^k & 0 & \dots & 0\\
0 & J_2^k & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & J_p^k
\end{pmatrix} P^{-1}, \qquad (25)

where all Jordan blocks J_i are of the upper bidiagonal form

J_* = \begin{pmatrix} \lambda & 1 & &\\ & \lambda & \ddots &\\ & & \ddots & 1\\ & & & \lambda \end{pmatrix}.

From Eq. 25, lim_{k→∞} M^k = 0 if and only if lim_{k→∞} J_*^k = 0 for every Jordan block, so it suffices to prove that lim_{k→∞} J_*^k = 0 if and only if |λ| < 1.
Suppose J_* is an m × m matrix. Induction on k proves that

J_*^k = \begin{pmatrix}
\lambda^k & \binom{k}{1}\lambda^{k−1} & \binom{k}{2}\lambda^{k−2} & \dots & \binom{k}{m−1}\lambda^{k−m+1}\\
 & \lambda^k & \binom{k}{1}\lambda^{k−1} & \dots & \vdots\\
 & & \ddots & \ddots & \binom{k}{2}\lambda^{k−2}\\
 & & & \lambda^k & \binom{k}{1}\lambda^{k−1}\\
 & & & & \lambda^k
\end{pmatrix}.

From the diagonal entries, we can tell that J_*^k → 0 implies λ^k → 0 and thus |λ| < 1.
We then only need to show that, conversely, |λ| < 1 implies that all entries in J_*^k go to zero. The binomial coefficient can be bounded by

\binom{k}{j} = \frac{k(k − 1)\cdots(k − j + 1)}{j!} \le \frac{k^j}{j!} \;\Longrightarrow\; \left| \binom{k}{j} \lambda^{k−j} \right| \le \frac{k^j}{j!} |\lambda|^{k−j} \xrightarrow{k\to\infty} 0.

The last term goes to zero because k^j increases polynomially, but |λ|^k decays exponentially. □

Lemma 8.5:
If M > 0 and λ_1 is a leading eigenvalue, then the following statements are true.

• M has an eigenvalue equal to |λ_1| > 0.

• If Mx = λ_1x, then M|x| = |λ_1||x| and |x| > 0.

In other words, M has a strictly positive eigenvector whose eigenvalue is the spectral radius ρ(M).


Proof:
Without loss of generality, we can assume |λ_1| = 1. Let x be an eigenvector (and hence x ≠ 0) of M with eigenvalue λ_1. Then

|x| = |\lambda_1||x| = |\lambda_1 x| = |Mx| \overset{(*)}{\le} |M||x| = M|x| \;\Rightarrow\; |x| \le M|x|, \qquad (26)

where (∗) follows from the triangle inequality.
We want to show that equality holds. For convenience, define z = M|x| and

y = z − |x|. \qquad (27)

From Eq. 26, y ≥ 0. Suppose that y ≠ 0, that is, suppose that some y_i > 0. Because M > 0, we must then have My > 0 and, since |x| ≠ 0, z = M|x| > 0. This implies that there exists a number ε > 0 such that My > εz. Then

My = Mz − M|x| = Mz − z > \varepsilon z \;\Rightarrow\; \frac{M}{1 + \varepsilon} z > z.

Define B = M/(1 + ε), so Bz > z. Successively multiplying with B > 0, we find

B^2 z > Bz > z, \quad B^3 z > B^2 z > Bz > z, \quad \dots \;\Rightarrow\; B^k z > z. \qquad (28)

Because λ_1/(1 + ε) is a leading eigenvalue of B, the spectral radius satisfies ρ(B) = |λ_1|/(1 + ε) = 1/(1 + ε) < 1. According to Lemma 8.4, lim_{k→∞} B^k = 0. Taking the limit in Eq. 28, we find 0 ≥ z in contradiction to z > 0, so the assumption y ≠ 0 was false.

⇒ 0 = y = M|x| − |x|,

so |x| is an eigenvector with eigenvalue 1. The proof is completed by observing |x| = M|x| > 0, where the inequality follows from M > 0 and x ≠ 0. □

Next we want to show that there is only one eigenvalue with absolute value ρ(M). In the proof, we will use the ∞-norm for vectors and matrices.

Definition 8.6:

• For a complex n-dimensional vector x, ||x||_∞ = max_i |x_i|.

• For a complex n × n matrix M, ||M||_∞ = max_i \sum_{j=1}^{n} |M_{ij}|.

Proposition 8.7:
The matrix ∞-norm is submultiplicative, i.e. ||AB||_∞ ≤ ||A||_∞ ||B||_∞.

Proof:
One can easily show that

||A||_\infty = \max_{||x||_\infty = 1} ||Ax||_\infty = \max_{x \neq 0} \frac{||Ax||_\infty}{||x||_\infty}. \;\Rightarrow\;

||AB||_\infty = \max_{x \neq 0} \left( \frac{||ABx||_\infty}{||Bx||_\infty} \cdot \frac{||Bx||_\infty}{||x||_\infty} \right) \le \left( \max_{x \neq 0} \frac{||Ax||_\infty}{||x||_\infty} \right) \left( \max_{x \neq 0} \frac{||Bx||_\infty}{||x||_\infty} \right) = ||A||_\infty\, ||B||_\infty. □


Lemma 8.8:
If M > 0 and λ_1 is a leading eigenvalue with |λ_1| = ρ(M), then

(A) λ_1 = ρ(M) (i.e. there is no other eigenvalue with the same absolute value),

(B) index(λ_1) = 1. (See Def. 4.3 for the definition of the index.)

Proof: (A)
Assume without loss of generality ρ(M) = 1. Let x be an eigenvector with eigenvalue λ_1 and |λ_1| = 1.
⇒ M|x| = |Mx| = |λ_1x| = |λ_1||x| = |x| ⇒ M|x| = |x|.
From Lemma 8.5, we can deduce that

|x| > 0. \qquad (29)

We can write the k-th entry in |x| as

|x_k| = (M|x|)_k = \sum_{j=1}^{n} M_{kj}|x_j|. \qquad (30)

But x_k also satisfies

|x_k| = |\lambda_1||x_k| = |(\lambda_1 x)_k| = |(Mx)_k| = \left| \sum_{j=1}^{n} M_{kj} x_j \right|. \qquad (31)

Combining Eqs. 30 and 31,

\left| \sum_{j=1}^{n} M_{kj} x_j \right| = \sum_{j=1}^{n} M_{kj}|x_j|, \qquad (32)

which implies equality in the triangle inequality. From Eq. 29 we know that all terms in the sums are different from zero. Therefore, the equality of Eq. 32 implies that all terms M_kj x_j must have the same sign (otherwise the triangle inequality is strict). Because M_kj > 0 for all k and j, all x_j must have the same sign.
In other words, there must be a vector p > 0, so that x = ap for some constant a ≠ 0. From Mx = λ_1x, we can now deduce

\lambda_1 p = Mp = |Mp| = |\lambda_1 p| = |\lambda_1| p = p

and thus λ_1 = 1.

Proof: (B)
Suppose that index(1) = m > 1. The Jordan normal form J = P^{-1}MP must contain an m × m Jordan block J_* with 1s on the diagonal (see Thm. 4.2).
We know the general shape of J_*^k from the proof of Lemma 8.4. If m > 1, then

||J_*^k||_\infty = \max_{1 \le i \le m} \sum_{j=1}^{m} |(J_*^k)_{ij}| = 1 + \binom{k}{1} + \binom{k}{2} + \dots + \binom{k}{m − 1}.

If m is fixed, the right-hand side diverges for k → ∞ and thus ||J_*^k||_∞ → ∞, which in turn means ||J^k||_∞ → ∞.


From Prop. 8.7 we know that ||J^k||_∞ = ||P^{-1}M^kP||_∞ ≤ ||P^{-1}||_∞ ||M^k||_∞ ||P||_∞ or

||M^k||_\infty \ge \frac{||J^k||_\infty}{||P^{-1}||_\infty\, ||P||_\infty}.

The norms in the denominator are constants and thus ||J^k||_∞ → ∞ implies ||M^k||_∞ → ∞.
Let m^{(k)}_{ij} be the (i, j)-th entry in M^k and let i_k denote the row index for which ||M^k||_∞ = \sum_j m^{(k)}_{i_k j}. From the proof of (A) we know that there exists a vector p > 0 such that p = Mp and consequently p = M^k p. For such a p,

||p||_\infty \ge p_{i_k} = \sum_j m^{(k)}_{i_k j} p_j \ge \left( \sum_j m^{(k)}_{i_k j} \right) \min_i p_i = ||M^k||_\infty \min_i p_i \to \infty.

But this is impossible because p is a constant vector, so the supposition that index(1) > 1 must be false. □

Lemma 8.9:
If M > 0, then alg mul_M(ρ(M)) = 1.

Proof:
Assume without loss of generality ρ(M) = 1. Suppose alg mul_M(1) = m > 1. We know from Lemma 8.8 that alg mul_M(1) = geo mul_M(1), so there are m linearly independent eigenvectors with eigenvalue 1. Let x and y be two such independent eigenvectors, i.e. x ≠ αy for all complex numbers α. Select a non-zero component y_i from y and set z = x − (x_i/y_i)y. Because Mz = z, we know from Lemma 8.5 that M|z| = |z| > 0. But this contradicts z_i = x_i − (x_i/y_i)y_i = 0. The supposition alg mul_M(1) > 1 must thus be false. □

Definition 8.10:
Let M > 0. The unique vector p satisfying

• Mp = ρ(M)p,

• p > 0 and

• ||p||_1 = \sum_i |p_i| = 1

is called the Perron vector of M.
Because M > 0 ⇔ M^T > 0, there is also a Perron vector q of M^T called the left-hand Perron vector. Since ρ(M) = ρ(M^T), it satisfies q^T M = ρ(M)q^T.

Lemma 8.11:
If M > 0, then there are no non-negative eigenvectors of M, regardless of their eigenvalue, except for positive multiples of the Perron vector p.


Proof:
Let y ≥ 0 be an eigenvector (and thus y ≠ 0) with eigenvalue λ and let x > 0 be the left-hand Perron vector of M.

\rho(M)x^T = x^T M \;\Rightarrow\; \rho(M)x^T y = x^T M y = \lambda x^T y. \qquad (33)

Because x > 0 and y ≠ 0, we must have x^T y > 0. From this and Eq. 33 we can conclude λ = ρ(M). So y must be an eigenvector with eigenvalue ρ(M). From Lemma 8.9, we know that the eigenspace corresponding to this eigenvalue is one-dimensional, hence the lemma is proved. □

Combining Lemmas 8.5, 8.8, 8.9 and 8.11 yields Perron's theorem, an important special case of the Perron-Frobenius theorem.

Theorem 8.12 (Perron's theorem):
If M > 0 and r = ρ(M), then

• r > 0,

• r is a leading eigenvalue of M,

• alg mul_M(r) = 1,

• r is the only eigenvalue with absolute value r,

• there exists an eigenvector x > 0 such that Mx = rx,

• the Perron vector p defined in Def. 8.10 is unique and, except for positive multiples of p, there are no other non-negative eigenvectors of M, regardless of the eigenvalue.

Remark: The Perron theorem only applies to the leading eigenvalue. Non-leading eigenvalues can be negative. For example,

M = \begin{pmatrix} 1 & 2\\ 2 & 1 \end{pmatrix}

has an eigenvalue −1. But the elements in the corresponding eigenvectors must have different signs. In this example, the eigenvectors are non-zero multiples of x = (1, −1)^T.

Remark: The Perron theorem does not apply to the adjacency matrices of simple networks because their diagonal entries are zero. So we still have some work to do in order to obtain the more general Perron-Frobenius theorem 8.3.

8.3 Proof for non-negative matrices

For the proof of the next theorem, we need the following lemma.

Lemma 8.13:

(A) For any complex square matrix M, ρ(M) ≤ ||M||_∞.

(B) ρ(M) = \lim_{k\to\infty} (||M^k||_\infty)^{1/k}.

(C) If |M| ≤ N, then ρ(M) ≤ ρ(|M|) ≤ ρ(N).


Proof: (A)
Let x = (x_1, . . . , x_n)^T be an eigenvector with eigenvalue λ. Then the n × n matrix

X = \begin{pmatrix} x_1 & 0 & \dots & 0\\ \vdots & \vdots & & \vdots\\ x_n & 0 & \dots & 0 \end{pmatrix}

satisfies λX = MX. ⇒ |λ| ||X||_∞ = ||λX||_∞ = ||MX||_∞ ≤ ||M||_∞ ||X||_∞.
Since X ≠ 0, |λ| ≤ ||M||_∞ for all eigenvalues λ of M.

Proof: (B)
From the Jordan normal form, we can derive ρ(M)^k = ρ(M^k) and, from (A), ρ(M^k) ≤ ||M^k||_∞. Combining these two inequalities, ρ(M) ≤ (||M^k||_∞)^{1/k}.
Furthermore, ρ(M/(ρ(M) + ε)) < 1 for every ε > 0, so according to Lemma 8.4,

\lim_{k\to\infty} \left( \frac{M}{\rho(M) + \varepsilon} \right)^k = 0 \;\Rightarrow\; \lim_{k\to\infty} \frac{||M^k||_\infty}{(\rho(M) + \varepsilon)^k} = 0.

This implies that there is a K_ε > 0 such that ||M^k||_∞/(ρ(M) + ε)^k < 1 and hence (||M^k||_∞)^{1/k} < ρ(M) + ε for all k ≥ K_ε.
In summary,

\rho(M) \le (||M^k||_\infty)^{1/k} < \rho(M) + \varepsilon \quad \text{for } k \ge K_\varepsilon

for all ε > 0 and thus \lim_{k\to\infty}(||M^k||_\infty)^{1/k} = \rho(M).

Proof: (C)
The triangle inequality implies |M^k| ≤ |M|^k for all k ∈ N. From |M| ≤ N we can further derive |M|^k ≤ N^k. These two inequalities together with (B) yield

||M^k||_\infty = \|\, |M^k| \,\|_\infty \le \|\, |M|^k \,\|_\infty \le ||N^k||_\infty
\;\Rightarrow\; (||M^k||_\infty)^{1/k} \le (\|\, |M|^k \,\|_\infty)^{1/k} \le (||N^k||_\infty)^{1/k}
\;\Rightarrow\; \lim_{k\to\infty} (||M^k||_\infty)^{1/k} \le \lim_{k\to\infty} (\|\, |M|^k \,\|_\infty)^{1/k} \le \lim_{k\to\infty} (||N^k||_\infty)^{1/k}
\;\Rightarrow\; \rho(M) \le \rho(|M|) \le \rho(N). □

Now we have the necessary tools to generalise Perron’s theorem to non-negative matrices.

Theorem 8.14:
For any non-negative square matrix M with r = ρ(M), the following statements are true.

• M has an eigenvalue r (but r = 0 is possible),

• there exists a vector z ≥ 0, z ≠ 0, so that Mz = rz.

Proof:
Let us define E to be the matrix with 1 in every entry and define the sequence

M_k = M + (1/k)E.

Because all M_k are positive, we can apply Perron's theorem 8.12. Let r_k > 0 be the spectral radius of M_k and p_k the Perron vector. The set {p_k}_{k=1}^∞ is bounded by the unit sphere. The Bolzano-Weierstrass theorem states that each bounded sequence has a convergent subsequence, so there must be a subsequence {p_{k_i}}_{i=1}^∞ → z for some vector z. We know that z ≥ 0 because p_{k_i} > 0. We also know that z ≠ 0 because ||p_{k_i}||_1 = 1.
Because M_1 > M_2 > . . . > M, Lemma 8.13(C) implies r_1 ≥ r_2 ≥ . . . ≥ r, so the sequence r_k is monotonically decreasing and bounded from below by r. Therefore, r̃ = lim_{k→∞} r_k exists and

r̃ ≥ r. \qquad (34)

On the other hand, lim_{k→∞} M_k = M so that also lim_{i→∞} M_{k_i} = M and thus

Mz = \lim_{i\to\infty} M_{k_i} \lim_{i\to\infty} p_{k_i} = \lim_{i\to\infty} (M_{k_i} p_{k_i}) = \lim_{i\to\infty} (r_{k_i} p_{k_i}) = \lim_{i\to\infty} r_{k_i} \lim_{i\to\infty} p_{k_i} = r̃ z.

This implies that r̃ is an eigenvalue of M. Since r is the spectral radius of M, r̃ ≤ r. Because of Eq. 34, r̃ = r. □

Theorem 8.14 is as much as we can prove for general non-negative matrices. In the special case where M is irreducible, however, we can recover almost all of Perron's theorem 8.12. The proof requires the following lemma.

Lemma 8.15: If M is irreducible, then (I + M)^(n−1) > 0, where I denotes the identity matrix.

Proof: Let m_{ij}^(k) be the (i, j)-th entry in M^k. From Prop. 8.2(C) we know that for every pair (i, j) there is a k so that m_{ij}^(k) > 0. Hence

[(I + M)^(n−1)]_{ij} = [ Σ_{k=0}^{n−1} (n−1 choose k) M^k ]_{ij} = Σ_{k=0}^{n−1} (n−1 choose k) m_{ij}^(k) > 0. □

Now we are prepared for the proof of the Perron-Frobenius theorem 8.3.

Proof of Thm. 8.3: (A) This follows from Thm. 8.14.

Proof of Thm. 8.3: (B) Let B = (I + M)^(n−1) > 0 be the matrix from Lemma 8.15. Furthermore, let

J = P⁻¹MP = ( J_1 0 . . . 0
              0 J_2 . . . 0
              ⋮   ⋮  ⋱   ⋮
              0   0 . . . J_p )

be the Jordan normal form of M. Then

P⁻¹BP = P⁻¹ ( Σ_{k=0}^{n−1} (n−1 choose k) M^k ) P = Σ_{k=0}^{n−1} (n−1 choose k) (P⁻¹MP)^k = Σ_{k=0}^{n−1} (n−1 choose k) J^k.

We have calculated the general shape of J^k in the proof of Lemma 8.4. From this we can conclude that P⁻¹BP is again block-diagonal with blocks B_i = Σ_{k=0}^{n−1} (n−1 choose k) J_i^k. If we assume that J_i is an m × m block, then B_i is an upper-triangular m × m matrix whose diagonal entries all equal

Σ_{k=0}^{n−1} (n−1 choose k) λ_i^k = (1 + λ_i)^(n−1);

the off-diagonal entries x_{12}, . . . , x_{m−1,m} are irrelevant for the argument. So λ is an eigenvalue of M if and only if (1 + λ)^(n−1) is an eigenvalue of B, and alg mul_M(λ) = alg mul_B[(1 + λ)^(n−1)].
Set r = ρ(M) and b = ρ(B). Since r is an eigenvalue of M,

b = max_{i=1,...,p} |1 + λ_i|^(n−1) = ( max_{i=1,...,p} |1 + λ_i| )^(n−1) = (1 + r)^(n−1).

Suppose alg mul_M(r) > 1. Then alg mul_B(b) > 1, in contradiction to B > 0 and Thm. 8.12. Therefore the supposition was wrong and instead alg mul_M(r) = 1.

Proof of Thm. 8.3: (C) We know from Thm. 8.14 that there is an eigenvector x ≥ 0 with eigenvalue r, Mx = rx.

⇒ Bx = (I + M)^(n−1)x = Σ_{k=0}^{n−1} (n−1 choose k) M^k x = Σ_{k=0}^{n−1} (n−1 choose k) r^k x = (1 + r)^(n−1) x,

which implies that x is a non-negative eigenvector for the leading eigenvalue of B > 0. It follows from Thm. 8.12 that x > 0.

Proof of Thm. 8.3: (D) Let x be an eigenvector with eigenvalue r. Suppose r = 0. Then Mx = 0 and furthermore M ≥ 0 and x > 0. This can only be true if M = 0. But a matrix with all zeros is reducible, so we must have r > 0.

Proof of Thm. 8.3: (E) This can be proved with the same arguments as Lemma 8.11. □

Remark: There is one property of Thm. 8.12 that the Perron-Frobenius theorem 8.3 does not recover, namely that

(∗) an eigenvalue λ with |λ| = ρ(M) must satisfy λ = ρ(M).

For example,

M = ( 0 1
      1 0 )

has eigenvalues 1 and −1. Irreducible matrices with the additional property (∗) are called primitive. Primitive matrices play an important role for random walks on directed networks: if the adjacency matrix is primitive, then the random walk does not have a periodic solution.


9 Centrality measures

There are several measures to quantify how central or important a node is in a network. We have already encountered one simple, but useful, centrality measure: the degree, also sometimes called degree centrality. It is plausible that a hub, i.e. a node with a high (in-)degree, is more important than a node with only a few neighbours.

However, the degree is in many applications a very crude measure. Usually not all neighbours are equally important and, therefore, the number of neighbours alone is not enough to assess centrality. This idea leads to several more advanced centrality measures.

9.1 Eigenvector centrality

Motivation: Consider the example in Fig. 30. Node M has a smaller degree than L and R, but is M really less central? After all, M is connected to the two nodes of highest degree in the network, which should boost its importance. In contrast, L and R are mostly linked to nodes of low degree and thus should be relatively less important than their own degree suggests.
A self-consistent measure of centrality would make the centrality of a node proportional to the sum of its neighbours' centralities. If x_i is the centrality of node i, then we need to solve

x_i = C Σ_{j=1}^{n} A_ij x_j   (35)

self-consistently for some constant C. In matrix form, this is x = CAx. In other words, x is an eigenvector of the adjacency matrix. If we choose x to be the Perron vector, then M in Fig. 30 indeed receives the same centrality as L and R.

Definition 9.1: If A is the adjacency matrix of a strongly connected network with n nodes, then the eigenvector centralities of the nodes 1, . . . , n are the elements of the Perron vector of A (see Def. 8.10 for the definition of the Perron vector).

Motivation: So why do we choose the Perron vector p and not one of the other eigenvectors of A? There are several reasons:

Figure 30: A small illustrative undirected network. Node M has a smaller degree than L and R, but the same eigenvector centrality (indicated by the decimal numbers).


• p has a positive eigenvalue (at least for a strongly connected network) so that C > 0 in Eq. 35, which is sensible.

• p > 0 so that all centralities are positive, which is also reasonable.

• As we will show in the next theorem, the Perron vector is (usually) the asymptotic result of the following iterative procedure, known as von Mises iteration or "power method"; a small numerical sketch follows below.

(i) Set t = 0.

(ii) Make an initial guess x_i^(0) > 0 about the importance of all nodes i = 1, . . . , n (e.g. x_i^(0) = 1 for all i).

(iii) An improved measure of centrality x′_i is the sum of the importance of all nodes pointing towards i,

x′_i = Σ_{j=1}^{n} A_ij x_j^(t),

or in matrix form x′ = Ax^(t).

(iv) Increment t by 1 and define x^(t) to be the normalised vector pointing in the direction of x′,

x^(t) = x′/||x′||₁.

Go back to step (iii).
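A minimal runnable Python sketch of this iteration. (The triangle network at the end is an illustrative assumption, not taken from the notes; any strongly connected example works.)

import numpy as np

def eigenvector_centrality(A, tol=1e-10, max_iter=10000):
    """Von Mises (power) iteration; returns an approximate Perron vector."""
    n = A.shape[0]
    x = np.ones(n) / n               # step (ii): positive initial guess, normalised
    for _ in range(max_iter):
        x_new = A @ x                # step (iii): x' = A x^(t)
        x_new /= x_new.sum()         # step (iv): 1-norm normalisation (entries stay >= 0)
        if np.abs(x_new - x).sum() < tol:
            break                    # converged
        x = x_new
    return x_new

# Example: an undirected triangle, where symmetry forces all centralities to be 1/3.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
print(eigenvector_centrality(A))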

Theorem 9.2: If A ≥ 0 is the adjacency matrix of a strongly connected network, x^(0) > 0,

x^(t) = A^t x^(0) / ||A^t x^(0)||₁,

and ρ(A) is the only eigenvalue on the spectral circle, then

x^(∞) = lim_{t→∞} x^(t) = p,   (36)

where p is the Perron vector of A.

Proof: Let J = P⁻¹AP be the Jordan normal form of A with the leading eigenvalue in the upper left corner. From the Perron-Frobenius theorem 8.3, we know that the leading eigenvalue is ρ(A) > 0 with alg mul(ρ(A)) = 1, which gives J the general form

J = ( ρ(A) 0 . . . 0
      0    J_2     ⋮
      ⋮        ⋱   0
      0  . . .  0  J_p ).   (37)

Because P is non-singular, the column vectors Pe_1, . . . , Pe_n with

e_i = (0, . . . , 0, 1, 0, . . . , 0)^T   (1 in the i-th position)

form a basis of ℂⁿ so that we can express our initial guess x^(0) as

x^(0) = Σ_{i=1}^{n} b_i Pe_i   (38)

for some coefficients b_i ∈ ℂ. We will later on need b_1 ≠ 0, which can be seen as follows. From Eq. 37, ρ(A)e_1^T P⁻¹ = e_1^T J P⁻¹ = e_1^T P⁻¹A, so e_1^T P⁻¹ is a multiple of the left-hand Perron vector. It cannot be zero because otherwise P would be singular. So we can conclude that the elements of e_1^T P⁻¹ are either all positive or all negative. Since we have chosen x^(0) > 0, e_1^T P⁻¹ x^(0) ≠ 0. Now we insert Eq. 38,

0 ≠ e_1^T P⁻¹ Σ_{i=1}^{n} b_i Pe_i = e_1^T Σ_{i=1}^{n} b_i e_i = b_1  ⇒  b_1 ≠ 0.

Multiplying x^(0) with A^t, we obtain

A^t x^(0) = PJ^t P⁻¹ Σ_{i=1}^{n} b_i Pe_i = PJ^t Σ_{i=1}^{n} b_i e_i.

From Eq. 37, Je_1 = ρ(A)e_1 so that

A^t x^(0) = b_1 (ρ(A))^t Pe_1 + PJ^t Σ_{i=2}^{n} b_i e_i = b_1 (ρ(A))^t P [ e_1 + (1/b_1)(J/ρ(A))^t Σ_{i=2}^{n} b_i e_i ].

In the t-th step of the von Mises iteration, the centrality vector is

x^(t) = A^t x^(0) / ||A^t x^(0)||₁ = (b_1/|b_1|) ((ρ(A))^t/|ρ(A)|^t) P[ e_1 + (1/b_1)(J/ρ(A))^t Σ_{i=2}^{n} b_i e_i ] / || P[ e_1 + (1/b_1)(J/ρ(A))^t Σ_{i=2}^{n} b_i e_i ] ||₁.

The matrix J/ρ(A) has an entry 1 in the top left corner, but all other diagonal entries have absolute value < 1. Using the arguments in the proof of Thm. 8.4, we find

lim_{t→∞} (J/ρ(A))^t = ( 1 0 . . . 0
                         0 0 . . . 0
                         ⋮ ⋮       ⋮
                         0 0 . . . 0 ).

Because of this and ρ(A) > 0,

x^(∞) = lim_{t→∞} x^(t) = (b_1/|b_1|) Pe_1/||Pe_1||₁.   (39)

Since APe_1 = PJe_1 = ρ(A)Pe_1, Pe_1 is an eigenvector of A with eigenvalue ρ(A). Additionally, we know

A ≥ 0, x^(0) > 0 ⇒ Ax^(0) > 0,

because zeros in Ax^(0) could only appear if A contained a row of zeros and thus a node of in-degree zero, but then the network would not be strongly connected because there would not be any path to this node. It follows by induction that

A^t x^(0) > 0 ⇒ x^(t) > 0 ⇒ x^(∞) ≥ 0.   (40)

Furthermore, x^(∞) ≠ 0 because ||x^(t)||₁ = 1 ∀t ⇒ ||x^(∞)||₁ = 1. Together with Eq. 39, this implies that x^(∞) is an eigenvector with eigenvalue ρ(A).
In summary, x^(∞) is a non-negative, normalised eigenvector for the leading eigenvalue ρ(A) of the irreducible matrix A. From the Perron-Frobenius theorem 8.3 we know that x^(∞) must then be the Perron vector. □

If the adjacency matrix has more than one eigenvalue on the spectral circle, the von Mises iteration may not converge.

Example: The network in Fig. 31 has the adjacency matrix

A = ( 0 1
      1 0 )

with eigenvalues +1 and −1.

Figure 31: A network for which the von Mises iteration does not converge.

If you start the von Mises iteration for example with the vector x^(0) = (1/4, 3/4)^T, then the solution oscillates:

x^(t) = (1/4, 3/4)^T if t is even, and x^(t) = (3/4, 1/4)^T otherwise.

However, the network is strongly connected and therefore the eigenvector centrality (i.e. the Perron vector) is unique, p = (1/2, 1/2)^T.

If the network is not strongly connected, then the Perron-Frobenius theorem does not apply. In this case, we can still find a normalised non-negative eigenvector, but it may not be unique.

Example:

Figure 32: A network which is not strongly connected with no unique eigenvector centrality.

The adjacency matrix of the network depicted in Fig. 32,

A = ( 0 0 0 0
      1 0 1 0
      0 0 0 0
      0 0 1 0 ),

has two orthogonal, normalised, non-negative eigenvectors: p_1 = (0, 1, 0, 0)^T and p_2 = (0, 0, 0, 1)^T. Any convex combination ap_1 + (1 − a)p_2, a ∈ [0, 1], is an eigenvector with eigenvalue ρ(A) = 0.

There are also cases of networks that are not strongly connected, but the Perron vector is unique.

Example: Consider a directed chain of three nodes, 1 ← 2 ← 3. Then

A = ( 0 1 0
      0 0 1
      0 0 0 )  ⇒  p = (1, 0, 0)^T.

However, do we really want node 2 to have zero centrality? After all, there is one node pointing at it. Intuitively, one would therefore assign a higher importance to 2 than to 3. This motivates the search for alternatives to the eigenvector centrality.

9.2 Katz centrality

One idea to prevent zeros in the last example is to give every node a minimum centrality of β in the following variation of the von Mises iteration:

(i) Set t = 0.

(ii) Make an initial guess for the centrality vector x^(0) ≥ 0.

(iii) Assign an improved centrality x′_i which is a mix of the centrality of the neighbours and an intrinsic centrality β,

x′_i = α Σ_{j=1}^{n} A_ij x_j^(t) + β,   α > 0, β > 0,

or in matrix form x′ = αAx^(t) + β1, where 1 = (1, 1, . . . , 1)^T.

(iv) Increment t by 1 and define x^(t) to be the normalised vector pointing in the direction of x′,

x^(t) = x′/||x′||₁.

Go back to step (iii).

Theorem 9.3: If A ≥ 0 and

0 < α < 1/ρ(A) if ρ(A) > 0,   α > 0 otherwise,

then the modified von Mises iteration converges to

x^Katz(α) = lim_{t→∞} x^(t) = (I − αA)⁻¹1 / ||(I − αA)⁻¹1||₁.   (41)

This limit, called Katz centrality, exists even if the network is not strongly connected.


Proof: By induction, one can show that

x^(t) = [ (αA)^t x^(0) + β( Σ_{k=0}^{t−1} (αA)^k )1 ] / || (αA)^t x^(0) + β( Σ_{k=0}^{t−1} (αA)^k )1 ||₁.   (42)

The spectral radius of αA is ρ(αA) = αρ(A) < 1. According to Thm. 8.4, lim_{t→∞} (αA)^t = 0. Furthermore, (I − αA) is non-singular. This can be seen from the determinant

det(I − αA) = (−α)^n det(A − α⁻¹I) = (−α)^n p_A(α⁻¹),

where p_A is the characteristic polynomial. For p_A(α⁻¹) = 0, α⁻¹ would have to be an eigenvalue of A and hence no larger in absolute value than ρ(A), but this is outside the permitted range. Hence det(I − αA) ≠ 0, which implies that (I − αA)⁻¹ exists. This allows us to rewrite the sums in Eq. 42 as follows. First, it follows from straightforward induction on t that (I − αA) Σ_{k=0}^{t−1} (αA)^k = I − (αA)^t. Then we multiply with (I − αA)⁻¹ from the left to obtain

Σ_{k=0}^{t−1} (αA)^k = (I − αA)⁻¹(I − (αA)^t).   (43)

Taking the limit t → ∞, we obtain Eq. 41. □

Remark: The Katz centrality depends on the parameter α. But the second parameter β in the modified von Mises iteration cancels out because of the normalisation.

What value of α should we choose? A common practice is to pick α close to the maximum. In the limit α → 1/ρ(A), the Katz centrality becomes the Perron vector (proof: homework problem). So, if α is near (but not exactly equal to) this limit, the Katz centrality has a similar interpretation as the eigenvector centrality, but does not suffer from the same problems if the network is not strongly connected.

Example: The network in Fig. 32 has Katz centralities

x^Katz(α) = (1/(4 + 3α)) (1, 1 + 2α, 1, 1 + α)^T.

In the limit α → ∞, the centralities are (0, 2/3, 0, 1/3)^T, which are the in-degree centralities. The limit of the Katz centrality is a sensible way to bypass the ambiguity of the eigenvector centrality.
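As a quick sanity check, a short Python sketch (an illustrative addition; the function name katz is my own) that evaluates Eq. 41 directly for the Fig. 32 matrix:

import numpy as np

def katz(A, alpha):
    """Katz centrality via Eq. 41: normalised (I - alpha*A)^{-1} 1."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))
    return x / x.sum()

A = np.array([[0., 0., 0., 0.],
              [1., 0., 1., 0.],
              [0., 0., 0., 0.],
              [0., 0., 1., 0.]])
# At alpha = 1 the closed form above gives (1, 3, 1, 2)/7:
print(katz(A, 1.0))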

Interpretation of the Katz centrality: We can use Eq. 43 to rewrite the Katz centrality of Eq. 41,

x_i^Katz(α) = ( Σ_{k=0}^{∞} α^k Σ_{j=1}^{n} [A^k]_ij ) / || Σ_{k=0}^{∞} α^k Σ_{j=1}^{n} [A^k]_ij ||₁.

From Prop. 4.4 we know that [A^k]_ij equals the number of walks from j to i of length k. ⇒ The Katz centrality x_i^Katz counts the number of possible ways to reach i, weighting each walk of length k by a factor α^k. (Because the infinite series must converge for α ∈ [0, 1/ρ(A)), this observation can be used to determine bounds for the spectral radii of A.)


9.3 PageRank

Both eigenvector and Katz centrality, by design, give nodes a large boost in centrality if another central node points at them. In certain contexts this may not be desirable. For example, a central web directory like Yahoo! points – rather indiscriminately – at many web sites, including my own, but should my web site receive a disproportionately large centrality in return? In some sense, links from Yahoo! should count relatively little exactly because Yahoo! has so many outgoing links that one particular connection does not have much meaning.

How can we reduce the relative influence of hubs like Yahoo! on the centrality a node i gains from each of its neighbours j? We can keep the idea of the intrinsic importance β from the Katz centrality, but divide neighbour j's centrality x_j by its out-degree k_j^out,

x_i = α Σ_{j=1}^{n} A_ij x_j / k_j^out + β.   (44)

However, Equation 44 is strictly speaking undefined if the denominator k_j^out equals zero. This can be easily cured by replacing k_j^out = 0 by k_j^out = 1 because, for a node j with out-degree zero, A_ij = 0 ∀i and thus A_ij x_j / k_j^out = 0. In other words, j does not contribute to the centrality of any other node i, just as it intuitively ought to be.

We can express this idea in matrix notation by introducing the diagonal matrix D with elements D_ii = max(k_i^out, 1) so that

x = αAD⁻¹x + β1.

Rearranging this equation,

(I − αAD⁻¹)x = β1 ⇒ x = β(I − αAD⁻¹)⁻¹1 = βD(D − αA)⁻¹1,

motivates the next definition.

Definition 9.4: The centrality measure

x^PR(α) = D(D − αA)⁻¹1 / ||D(D − αA)⁻¹1||₁

is called PageRank.

Remark:

• PageRank is one of the main ingredients of the search engine Google.

• Google uses α = 0.85, but this choice is apparently based on experimentation rather than rigorous theory.
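A minimal Python sketch of Def. 9.4 (an illustrative addition; reusing the Fig. 32 matrix is my assumption):

import numpy as np

def pagerank(A, alpha=0.85):
    """PageRank of Def. 9.4: normalised D(D - alpha*A)^{-1} 1 (beta cancels out)."""
    k_out = A.sum(axis=0)                  # column sums = out-degrees k_j^out
    D = np.diag(np.maximum(k_out, 1.0))    # D_ii = max(k_i^out, 1)
    x = D @ np.linalg.solve(D - alpha * A, np.ones(A.shape[0]))
    return x / x.sum()

A = np.array([[0., 0., 0., 0.],
              [1., 0., 1., 0.],
              [0., 0., 0., 0.],
              [0., 0., 1., 0.]])
print(pagerank(A))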

Interpretation of PageRank as a random walk: The solution of Equation 44 can be interpreted as the stationary distribution of the following stochastic process. A random surfer on the World-Wide Web begins surfing at some specified web page. Then the surfer iterates the following steps:


• If the web site has an out-degree k_i^out > 0, then

(i) with probability α the surfer follows one of the outgoing links, chosen uniformly at random, to a new web page,

(ii) with probability (1 − α) the surfer types a new URL, chosen uniformly at random among all existing web pages, into the browser address bar.

• If k_i^out = 0, the surfer performs the "teleportation" described under (ii) above with probability 1.
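A quick Monte Carlo sketch of this process (an illustrative addition; the successor lists below encode the Fig. 32 network, in which nodes 2 and 4 have out-degree zero). The empirical visit frequencies should approach the PageRank vector of Def. 9.4:

import random

succ = {1: [2], 2: [], 3: [2, 4], 4: []}   # node -> outgoing links
alpha, steps = 0.85, 10**6
visits = {u: 0 for u in succ}
page = 1
for _ in range(steps):
    visits[page] += 1
    if succ[page] and random.random() < alpha:
        page = random.choice(succ[page])    # follow a random outgoing link
    else:
        page = random.choice(list(succ))    # "teleportation" to a uniform random page
print({u: round(v / steps, 3) for u, v in visits.items()})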


10 Spectral network partitioning

Note: In this section we focus on undirected networks. The generalisation to directed networks is not straightforward.

10.1 What is network partitioning?

Networks can often be divided into groups of nodes so that

• there are many links within a group,

• there are few links between different groups.

Examples of networks with a clear group structure are shown in Figs. 3, 4 and 5. Sometimes there is additional information about the nodes (e.g. the research area of scientists in collaboration networks) that can be used to partition the network into groups. But often such information is missing and the task is to infer the groups from the adjacency matrix. There are many different versions of this problem. Here we only look at the specific case of

Network bisection: Suppose the network consists of n nodes. We want to partition the nodes into two sets N_1 and N_2 consisting of n_1 and n_2 = n − n_1 nodes, respectively, so that the number R of links connecting different sets is minimised.

The number of possible bisections: There are (n choose n_1) different ways to partition the network. For large n, n_1, n_2 we can use Stirling's formula, lim_{n→∞} n!/(n^{n+1/2} e^{−n}) = √(2π), to find the approximate relationship

(n choose n_1) = n!/(n_1! n_2!) ≈ √(2π) n^{n+1/2} e^{−n} / ( √(2π) n_1^{n_1+1/2} e^{−n_1} · √(2π) n_2^{n_2+1/2} e^{−n_2} ) = n^{n+1/2} / ( √(2π) n_1^{n_1+1/2} n_2^{n_2+1/2} ).

If n_1 ≈ n_2, this is approximately

n^{n+1/2} / ( √(2π) (n/2)^{n+1} ) = 2^{n+1/2}/√(nπ),

which grows almost exponentially in n. Even for medium-size networks, the number of possible partitions becomes too big to investigate every individual case. In practice, one has to resort to heuristic algorithms which, although not strictly exact, typically return near-optimal solutions.
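A quick numerical check of this almost-exponential growth (an illustrative addition):

import math

# Exact number of balanced bisections versus the approximation 2^(n+1/2)/sqrt(n*pi).
for n in (10, 20, 40, 80):
    exact = math.comb(n, n // 2)
    approx = 2 ** (n + 0.5) / math.sqrt(n * math.pi)
    print(n, exact, round(approx))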

10.2 The relaxed problem

Before we develop one such heuristic method, let us write the number R of links between the sets N_1 and N_2 in terms of the adjacency matrix,

R = (1/2) Σ_{i,j in different sets} A_ij,


where we need the factor of 1/2 because the sum contains every pair twice.
We can represent the set to which node i belongs by the auxiliary variable

s_i = +1 if i ∈ N_1,  −1 if i ∈ N_2.   (45)

It follows that

(1/2)(1 − s_i s_j) = 1 if i and j are in different sets, 0 otherwise,

and thus

R = (1/4) Σ_{i=1}^{n} Σ_{j=1}^{n} A_ij (1 − s_i s_j) = (1/4) ( Σ_{i=1}^{n} Σ_{j=1}^{n} A_ij − Σ_{i=1}^{n} Σ_{j=1}^{n} A_ij s_i s_j ).

The first term in the parentheses can be rewritten as

Σ_i Σ_j A_ij = Σ_i k_i = Σ_i k_i s_i² = Σ_i Σ_j k_i δ_ij s_i s_j,

where δ_ij is the Kronecker delta. Then

R = (1/4) Σ_i Σ_j (k_i δ_ij − A_ij) s_i s_j,

or in matrix form

R = (1/4) s^T(D − A)s = (1/4) s^T L s,

where L is the graph Laplacian. ⇒

Network bisection in matrix notation: Minimise s^T L s subject to

(i) s_i ∈ {+1, −1} and

(ii) Σ_i s_i = n_1 − n_2. (This constraint fixes the cardinalities of N_1 and N_2 to be n_1 and n_2, respectively.)

The difficulty of this problem lies in the restriction of s_i to two discrete values. If s_i could take real values, the situation would simplify tremendously because we could then use derivatives to find the minimum. We still keep Σ_i s_i² = n – implicit in constraint (i) above – and constraint (ii), but otherwise allow s_i to have any real value.

"Relaxed" version of network bisection: Minimise s^T L s subject to

(i) Σ_i s_i² = n and

(ii) Σ_i s_i = n_1 − n_2.


10.3 Spectral bisection

The relaxed problem can be solved with the usual methods of constrained optimisation. We introduce two Lagrange multipliers λ and 2µ (the additional factor of 2 will be convenient later on) and form the Lagrange function

L(s_1, . . . , s_n, λ, µ) = Σ_j Σ_k L_jk s_j s_k + λ( n − Σ_j s_j² ) + 2µ( (n_1 − n_2) − Σ_j s_j ),

where the first term is the objective function and the terms multiplying λ and 2µ vanish because of constraints (i) and (ii).

Then the minimum satisfies

∂L/∂s_i = 0 ⇒ Σ_j L_ij s_j = λs_i + µ ⇒ Ls = λs + µ1.   (46)

If we multiply the last equation with 1^T, we can eliminate µ,

1^T L s = λ1^T s + µn ⇒(A) 0 = λ Σ_i s_i + µn ⇒(B) µ = −((n_1 − n_2)/n) λ,

where we have used in (A) that 1 is an eigenvector of L with eigenvalue 0 (see Prop. 7.3) and in (B) that we impose constraint (ii). Let us define the new vector

x = s + (µ/λ)1 = s − ((n_1 − n_2)/n)1.   (47)

It follows from Eq. 46 that

Lx = L( s + (µ/λ)1 ) = Ls = λs + µ1 = λx,

which shows that x is an eigenvector of L with eigenvalue λ. We can, however, rule out that x = a1, a ∈ ℝ, because

1^T x = 1^T s + (µ/λ)1^T 1 = Σ_i s_i − ((n_1 − n_2)/n) n = n_1 − n_2 − (n_1 − n_2) = 0.

This still leaves us with many possible eigenvectors and it is not immediately clear which one is the best candidate. To shed light on this, we note that

R = (1/4) s^T L s = (1/4) x^T L x = (1/4) λ x^T x

and, from Eq. 47,

x^T x = s^T s + (µ/λ)(s^T 1 + 1^T s) + (µ²/λ²)1^T 1 = n − 2((n_1 − n_2)/n)(n_1 − n_2) + (n_1 − n_2)²/n = 4n_1 n_2/n,

thus

R = (n_1 n_2/n) λ.


Since we want to minimise R, we are looking for an eigenvector x which has minimal eigenvalue λ but is not a multiple of 1. We know from Prop. 7.2 that all eigenvalues are ≥ 0. If we sort the eigenvalues λ_1 = 0 ≤ λ_2 ≤ . . . ≤ λ_n and if v_1 = 1, v_2, . . . , v_n is an orthogonal basis of eigenvectors with Lv_i = λ_i v_i, then we are looking for the basis vector v_2.⁷

We can obtain the solution s_rel of the relaxed problem from Eq. 47,

s_rel = v_2 + ((n_1 − n_2)/n)1.

Generally, none of its elements will be +1 or −1, so it is not an exact solution of the original bisection problem. However, one plausible heuristic is to look for the vector s ∈ {−1, +1}ⁿ that is "closest" to s_rel. We are then looking for a minimum of

||s − s_rel||₂² = s^T s + s_rel^T s_rel − 2s^T s_rel = 2n − 2s^T s_rel,

which is attained at the maximum of s^T s_rel = Σ_i s_i s_rel,i. Since we fixed the total number n_1 of elements +1 in s, the sum is maximised by assigning s_i = +1 to those nodes i with the largest values of s_rel,i. But s_rel,i and the i-th element of v_2 only differ by the constant term (n_1 − n_2)/n, so we can equivalently assign s_i = +1 to the n_1 largest entries in v_2.⁸

Clearly, if v_2 is an eigenvector of L with eigenvalue λ_2, then −v_2 is an eigenvector with the same eigenvalue. So another heuristic solution is to assign s_i = +1 to the n_1 smallest entries in v_2. This is tantamount to swapping the group labels "1" and "2" in Eq. 45 and is of course also permitted as a candidate solution. Because the first heuristic solution gives us the second one almost for free, we should investigate both and choose the one with the smaller R.

⇒ Spectral bisection algorithm:

(i) Calculate an eigenvector v_2 of the graph Laplacian with the second-smallest eigenvalue λ_2. (λ_2 is sometimes called "algebraic connectivity".)

(ii) Sort the elements of v_2 in descending order.

(iii) Assign the n_1 nodes corresponding to the largest elements to set N_1, the rest to N_2, and calculate R.

(iv) Then assign the n_1 nodes corresponding to the smallest elements to set N_1, the rest to N_2, and recalculate R.

(v) Between the bisections in steps (iii) and (iv), choose the one with the smaller R.

⁷ The second-smallest eigenvalue λ_2 may be degenerate, for example λ_2 = λ_3. In this case we should in principle investigate all linear combinations v_2 + av_3, but because the relaxed problem is only a heuristic for bisection anyway, let us not become too obsessed by details at this point.

⁸ If the n_1-th largest entry is equal to the (n_1 + 1)-th, (n_1 + 2)-th, . . . largest entry, then we have a choice which entry s_i we want to make +1. Ideally, we would then investigate all possible cases, but again let us not become distracted by details.

Figure 33: A small illustrative network split into groups of 3 and 4 nodes, respectively.

Example: The network depicted in Fig. 33 has the Laplacian

L = (  2 −1  0  0  0  0 −1
      −1  3 −1  0  0  0 −1
       0 −1  4 −1 −1 −1  0
       0  0 −1  3 −1 −1  0
       0  0 −1 −1  3 −1  0
       0  0 −1 −1 −1  4 −1
      −1 −1  0  0  0 −1  3 )

with algebraic connectivity λ_2 ≈ 0.885. The corresponding eigenvector is

v_2 = (1.794, 1.000, −0.679, −1.218, −1.218, −0.679, 1.000)^T.

If we want to split the network into groups of size n_1 = 3 and n_2 = 4, then spectral bisection puts the nodes 1, 2 and 7 in one group and the rest in the other group. This is indeed the optimal split, as one can verify by inspection for this small example.
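The following Python sketch (an illustrative addition) carries out steps (i)-(v) for this Laplacian and reproduces the split {1, 2, 7}:

import numpy as np

L = np.array([[ 2, -1,  0,  0,  0,  0, -1],
              [-1,  3, -1,  0,  0,  0, -1],
              [ 0, -1,  4, -1, -1, -1,  0],
              [ 0,  0, -1,  3, -1, -1,  0],
              [ 0,  0, -1, -1,  3, -1,  0],
              [ 0,  0, -1, -1, -1,  4, -1],
              [-1, -1,  0,  0,  0, -1,  3]], dtype=float)

def cut_size(L, s):
    return float(s @ L @ s / 4.0)             # R = (1/4) s^T L s

eigval, eigvec = np.linalg.eigh(L)            # eigh sorts eigenvalues ascending
v2 = eigvec[:, 1]                             # Fiedler vector, lambda_2 ~ 0.885
n1 = 3
order = np.argsort(v2)[::-1]                  # node indices by descending v2 entry
best = None
for group in (order[:n1], order[-n1:]):       # steps (iii) and (iv)
    s = -np.ones(L.shape[0])
    s[group] = 1.0
    R = cut_size(L, s)
    nodes = sorted(int(i) + 1 for i in group) # convert 0-based indices to node labels
    if best is None or R < best[0]:
        best = (R, nodes)                     # step (v): keep the smaller cut
print(best)                                   # -> (2.0, [1, 2, 7])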

Partitioning a network into more than two groups: So far we have only looked at bisection, that is, splitting the network in two. This appears at first sight to be only a special case of the more general problem of dividing the nodes into multiple groups. However, in practice the vast majority of heuristic algorithms to perform the latter task apply repeated bisection of groups. First the network is split into two groups, then one or both of the groups are bisected, etc.


11 Shortest-path algorithm – the unweighted case

In this section we will develop algorithms to determine the shortest path from a node s to another node t. These algorithms can be implemented as computer programmes applicable to directed networks (with undirected networks as a special case).

(Illustration: two routes from a node s to a node t, one a shortest path, the other not.)

11.1 Network representations

How can we represent networks in computer memory? We have already encountered one important representation, the adjacency matrix A. Many of our theorems and equations were expressed in terms of A. If n is the number of nodes, A can be declared in a computer programme as a two-dimensional n × n array. However, storing the network as a two-dimensional array is often costly in terms of memory. Consider the network in

Figure 34: A small "sparse" network.

Fig. 34 and let n be the number of nodes and m be the number of links. Because n = 5, the adjacency matrix has 5² = 25 elements, but only m = 5 of them are equal to 1, whereas everything else equals 0. This feels like an enormous waste of memory, and for "sparse" networks, where the number of links is much less than the maximum n(n − 1), we can indeed find much less expensive data structures.

Definition 11.1: Let Γ = {G_i = (N_i, L_i), i ∈ ℕ} be a family of networks where the number of nodes n_i = card(N_i) is unbounded. If m_i = card(L_i) = O(n_i), the members of this family are called sparse.

The O-notation in Def. 11.1 is defined as follows.


Definition 11.2: Let f and g be functions ℕ → ℝ. The notation f(n) = O(g(n)) means that there exist positive constants c and n_0 so that 0 ≤ f(n) ≤ cg(n) for all n ≥ n_0.

In this notation, the adjacency matrix of a sparse network needs O(n²) memory to store information for O(n) links. An alternative data structure that requires only O(n) memory is the adjacency list. This is actually not a single list, but consists of one separate list for every node. The list of node i contains the labels of those nodes j for which there is a link i → j.

Example: The adjacency list of the network in Fig. 34 is

node | linked to
 1   | 5
 2   | 1, 5
 3   | 2
 4   | –
 5   | 2

It is usually a good idea to also store the out-degree k_i^out of each node i in a separate array so that we know how many entries there are in i's list. The out-degrees are n integers and, consequently, the total need for memory for the adjacency list plus the out-degrees is still O(n). If n is large – for calculations related to the world-wide web it is not uncommon to encounter n > 10⁷ – the adjacency-list representation saves us a lot of memory compared to the adjacency matrix, but it may cost us in terms of time.
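In Python, for example, the adjacency list of Fig. 34 could be held in a dictionary (a sketch, not prescribed by the notes):

# node -> list of successors; the out-degrees are stored alongside
adj = {1: [5], 2: [1, 5], 3: [2], 4: [], 5: [2]}
k_out = {u: len(targets) for u, targets in adj.items()}
print(k_out)   # {1: 1, 2: 2, 3: 1, 4: 0, 5: 1}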

Example: Determine if there is a link between i and j.

Network representation | Solution strategy | Time needed
Adjacency matrix | Look up A_ji. | If we have random access to the location in memory, O(1).
Adjacency list | Go through all entries in the list of node i. | In the worst case, there can be n − 1 entries in the list and, if j is not linked to i, we must investigate all n − 1 entries. ⇒ O(n)

It is always a good idea to assess the advantages and disadvantages of both representations before writing a computer programme. However, as a rule of thumb, the adjacency-list representation is usually the better choice.

11.2 Queues

Another important consideration when writing computer code is the appropriate way to temporarily store and retrieve information during the execution of the programme. One rather simple data structure for this purpose is a queue. This is an ordered set Q that maintains a list of numbers in a FIFO (i.e. first-in, first-out) order. There are two basic operations, Enqueue and Dequeue. If we Enqueue a number, it is added at the last position of Q. If we Dequeue, the number in the front position of Q is returned as function value and subsequently deleted from Q. After the deletion, the previously second number moves to the front position and all other numbers also proceed one step closer to the start of the queue, similar to customers waiting at one single supermarket till.

Figure 35: A queue implemented using an array Q[1 . . . 10]. (a) The queue has 5 elements in locations Q[3 . . . 7]. (b) The configuration of the queue after calling Enqueue(Q, 3). (c) The configuration after calling Dequeue(Q).

Figure 35 shows an example. The queue consists of an array Q[1 . . . n] where n is the maximum number of elements we wish to store. The queue has two attributes, Q.head and Q.tail. The elements currently in the queue are stored in Q[Q.head . . . Q.tail − 1]. The queue is empty if Q.head = Q.tail. Initially Q.head = Q.tail = 1. If we attempt to dequeue an element from an empty queue, the programme should exit with an error message. Conversely, when Q.tail = n + 1 and we try to enqueue an element, Q overflows and again the code should exit. In the pseudocode below I define one more attribute, Q.length, which takes on the role of n.

Initialise-Queue(Q)

1 Q.head = Q.tail = 1


Enqueue(Q, x)

1  if Q.tail == Q.length + 1
2      error "Queue overflow".
3  Q[Q.tail] = x
4  Q.tail = Q.tail + 1

Dequeue(Q)

1  if Q.head == Q.tail
2      error "Queue underflow".
3  x = Q[Q.head]
4  Q.head = Q.head + 1
5  return x

These three subroutines require O(1) time.

11.3 Breadth-first search

We now develop an algorithm that can find the shortest path from a specified "source" node s to every possible "target" node t in the out-component of s. The algorithm will also be able to tell us if t is not in the out-component.

Figure 36: Upon initialisation, a breadth-first search gives the source node a distance label 0. Then it explores previously unlabelled nodes by preferentially searching for nodes that have a small distance from the source. In this manner, a breadth-first search labels nodes in the shell "1" before continuing to shell "2". After shell "2" is finished, the algorithm explores shell "3" etc.

The strategy of this algorithm is to explore the network by stepping from one node whose distance from s is known to another node whose distance is still unknown. Because the search for nodes with undiscovered distances proceeds preferentially from nodes with small established distances, the algorithm tends to search along the breadth of the known frontier rather than penetrating deeper into unknown territory (Fig. 36). For this reason the algorithm is called breadth-first search.
To maintain breadth-first order, the algorithm maintains a queue Q. Initially Q is empty and all nodes u are given a nominal distance u.d = ∞ until they are "discovered", except the source node s to which we assign s.d = 0. When a node i is discovered via a link from a node j, i is given a distance i.d = j.d + 1 and we store the information that we reached i via j as follows. We say that j is the predecessor of i, and keep this information as a node attribute i.π = j. This attribute will later on allow us to construct the shortest path from the source s to i. The following pseudocode denotes the network by G, the set of nodes by G.N and the adjacency list of node u as G.Adj[u].

BFS(G, s)

1   for each node u ∈ G.N           // Initially all nodes are undiscovered.
2       u.d = ∞
3       u.π = nil
4   s.d = 0                         // Discover the source.
5   Initialise-Queue(Q)
6   Enqueue(Q, s)
7   while Q.head ≠ Q.tail           // Iterate until the queue is empty.
8       u = Dequeue(Q)
9       for each v ∈ G.Adj[u]
10          if v.d == ∞
11              v.d = u.d + 1       // Discover v.
12              v.π = u
13              Enqueue(Q, v)
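For reference, a runnable Python transcription of this pseudocode on the dictionary-based adjacency list from Sec. 11.1 (a sketch; collections.deque plays the role of the queue):

from collections import deque
import math

def bfs(adj, s):
    """Shortest-path distances and predecessors from source s (unweighted)."""
    d = {u: math.inf for u in adj}      # lines 1-2: all nodes undiscovered
    pred = {u: None for u in adj}       # line 3: no predecessors yet
    d[s] = 0                            # line 4: discover the source
    Q = deque([s])                      # lines 5-6
    while Q:                            # line 7
        u = Q.popleft()                 # line 8: Dequeue
        for v in adj[u]:                # line 9
            if d[v] == math.inf:        # line 10
                d[v] = d[u] + 1         # line 11: discover v
                pred[v] = u             # line 12
                Q.append(v)             # line 13: Enqueue
    return d, pred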

Figure 37 shows how BFS operates on a sample network.

How long will Q have to be in the worst case? If s is connected to each of the other n − 1 nodes, Q will hold n − 1 elements after the first iteration of the while loop. Together with s, which still occupies the first entry in the array, Q.length = n is a safe choice. The memory requirements of BFS are then O(n) for the queue plus O(m) for the adjacency list, which is in total O(n) for a sparse network.

The scaling of the running time of BFS is determined by the sum of the running times of the for loop in lines 1-3 and the while loop in lines 7-13. The assignments in lines 4-6 only require O(1) time each and will therefore play no role in the limit n → ∞. The for loop initialises the node distances and predecessors, which are all O(1) operations, and there are n iterations, so this loop requires O(n) time. The O(1) queue operations in the while loop are performed at most n times, and hence are altogether O(n), because no node can enter the queue more than once. In the for subloop of line 9, we also have to go through the adjacency list of the dequeued node, which for all nodes together takes O(m) time. Altogether BFS runs in O(m) + O(n) time, which for sparse networks is O(n).

Let us convince ourselves that the values of v.d calculated by BFS are indeed the shortest distances. Let us denote by δ(s, v) the shortest length of any possible path from s to v. We begin by establishing the following important property of the shortest-path length.


Figure 37: The steps carried out by BFS. Undiscovered nodes are white, nodes in the queue grey, and discovered nodes that have left the queue are black. The link from a node to its predecessor is indicated by a light grey arrow. The numbers in the nodes are the d values. The queue is shown at the beginning of each iteration of the while loop. The numbers below the queue are established d values.

Lemma 11.3: For any node s and any arbitrary link u → v,

δ(s, v) ≤ δ(s, u) + 1.

Proof: If u is in the out-component of s, then v must also be reachable from s. In this case, one possible walk from s to v is the one that follows a shortest path from s to u and then uses the link u → v. This walk of length δ(s, u) + 1 is at least as long as the shortest path from s to v.
If u is not in the out-component of s, then δ(s, u) = ∞, which is certainly at least as large as δ(s, v). □


Next we show that v.d is an upper bound for δ(s, v).

Lemma 11.4: Suppose BFS is run from a source node s. Then at the end of the programme, the computed value v.d satisfies v.d ≥ δ(s, v) for all nodes v.

Proof: The proof is by induction on the number of Enqueue operations. The induction hypothesis is that v.d ≥ δ(s, v) ∀v after each Enqueue.
The basis of the induction is the first time we encounter Enqueue, which occurs in line 6. The induction hypothesis is true because s.d = 0 = δ(s, s) and, for all v ≠ s, v.d = ∞ ≥ δ(s, v).
For the induction step, consider a node v that is discovered from u and then enqueued in line 13. Because of the induction hypothesis, u.d ≥ δ(s, u). Then

v.d =(A) u.d + 1 ≥ δ(s, u) + 1 ≥(B) δ(s, v),

where (A) follows from line 11 in the pseudocode and (B) from Lemma 11.3. The d values of all nodes w ≠ v remained unchanged since the last Enqueue, so the induction hypothesis remains true. □

Before we can establish that v.d = δ(s, v), we first have to show that the queue can at all times only contain nodes with at most two distinct d values.

Lemma 11.5: Suppose that the queue Q contains during the execution of BFS the nodes (v_1, v_2, . . . , v_r), in this particular order. Then v_r.d ≤ v_1.d + 1 and v_i.d ≤ v_{i+1}.d for i = 1, 2, . . . , r − 1.

Proof: We use induction on the number of queue operations. The induction basis is the situation after the first Enqueue in line 6, when only s is in the queue and the lemma is consequently valid.
For the induction step, we must consider the situation immediately after Dequeue and Enqueue.

• Dequeue: If the queue becomes empty after dequeuing v_1, the lemma certainly holds. Let us then assume that there is still an element v_2 left in the queue. From the induction hypothesis, v_1.d ≤ v_2.d and therefore v_r.d ≤ v_1.d + 1 ≤ v_2.d + 1. None of the other inequalities are affected, so the lemma remains true.

• Enqueue: When a node v is enqueued in line 13, it becomes v_{r+1}. At this point in time, its predecessor u has already been removed from Q. The new queue head v_1 was either in the queue together with u at some point in the past or v_1 was discovered from u. In both cases v_1.d ≥ u.d. The d value of the new entry in the queue satisfies v_{r+1}.d = v.d = u.d + 1 ≤ v_1.d + 1. We also have v_r.d ≤ u.d + 1 because of the induction hypothesis. ⇒ v_r.d ≤ u.d + 1 = v.d = v_{r+1}.d. All other inequalities needed for the lemma follow immediately from the induction hypothesis. □


Corollary 11.6: Suppose v_i and v_j are enqueued during BFS and that v_i is enqueued before v_j. Then v_i.d ≤ v_j.d at the time when v_j is enqueued.

Proof: This follows immediately from Lemma 11.5 and the fact that each node receives at most one finite d during the execution of BFS. □

Now we are ready to prove that breadth-first search correctly finds all shortest-path distances.

Theorem 11.7: Suppose BFS is run from a source node s. Then

(A) upon termination, v.d = δ(s, v) for all nodes v,

(B) for any node v ≠ s that is reachable from s, one of the shortest paths from s to v is a shortest path from s to v.π followed by the link v.π → v.

Proof: (A) We try to establish a contradiction, so

(∗) assume there exists a node v with v.d ≠ δ(s, v). If there are several, choose a node with minimal δ(s, v).

We know that s.d = 0 is correct, so v ≠ s. From Lemma 11.4, we know v.d ≥ δ(s, v) and thus v.d > δ(s, v). We can also conclude that v is reachable from s because otherwise δ(s, v) = ∞ ≥ v.d. Let u be the node immediately preceding v on a shortest path from s to v, so that δ(s, v) = δ(s, u) + 1. Because δ(s, u) < δ(s, v), we must have u.d = δ(s, u); otherwise v would not be a misclassified node with minimal distance. Combining these results,

v.d > δ(s, v) = δ(s, u) + 1 = u.d + 1.   (48)

The node u is, because of its definition, reachable from s and has the correct, thus finite, d value. During the execution, BFS hence must dequeue u. At this time, v can be in three different states.

• v.d == ∞: Line 11 in the pseudocode then sets v.d = u.d + 1, contradicting Eq. 48.

• v.d < ∞ and v ∉ Q: The algorithm must have already dequeued v previously. From Corollary 11.6, v.d ≤ u.d, contradicting Eq. 48.

• v.d < ∞ and v ∈ Q: The algorithm must have discovered v from a node w ≠ u that was already dequeued. At the time of v's first discovery we set v.d = w.d + 1. From Corollary 11.6, we also know w.d ≤ u.d. Putting these properties together, v.d = w.d + 1 ≤ u.d + 1, which again contradicts Eq. 48.


As a consequence, assumption (∗) must be wrong. BFS assigns the correct distances to all nodes.

Proof: (B) If v is reachable, we know from (A) that v.d < ∞ and, therefore, v must have been discovered from some node v.π = u with v.d = u.d + 1. Thus, we can obtain a shortest path from s to v by following a shortest path from s to v.π and then taking the link v.π → v. □

We now know that the distances established during BFS are those of the shortest paths, but how do we actually obtain the shortest paths? The next lemma provides an important clue.

Lemma 11.8: We define Nπ as the set of all nodes that have been enqueued during BFS. Then:

(A) We have for every node v ∈ Nπ − {s} that v.π ∈ Nπ, so that we can properly define the auxiliary network Gπ = (Nπ, Lπ) with links Lπ = {link v → v.π : v ∈ Nπ − {s}}.⁹

(B) The out-degree in Gπ of all nodes in Nπ − {s} equals 1. The out-degree of s equals 0.

(C) Gπ is a directed acyclic network.

(D) There is exactly one path from t ∈ Nπ − {s} to s in (Nπ, Lπ). This is a shortest path from s to t in the original network, in reverse order.

⁹ In a directed network, the link v → v.π may not be part of the original network, but v.π → v is guaranteed to exist because v was discovered via this link.

Proof: (A) All nodes in Nπ are, by definition, enqueued at some point during the execution of BFS. Except for s, all of these must have undergone the assignment v.π = u in line 12, where u is a previously enqueued node.

Proof: (B) Follows from (A) and the fact that for every v there is exactly one v.π. The link s → s.π = nil is explicitly excluded by the definition of Lπ.

Proof: (C) Before v is enqueued, lines 11 and 12 in BFS have set v.d = (v.π).d + 1 > (v.π).d. Because all nodes in Nπ are enqueued exactly once, this inequality stays intact until termination. Thus, the distance labels satisfy the conditions of Prop. 4.7(B). The network must therefore be acyclic.

Proof: (D) Consider the following algorithm.


Shortest-BFS-Path(G, s, t)

1  BFS(G, s)
2  if t ≠ s and t.π == nil
3      print t "is not in the out-component of" s
4  else u = t.π
5      print "The predecessor of" t "is" u
6      while u.π ≠ nil
7          v = u.π
8          print "The predecessor of" u "is" v
9          u = v

The while loop repeatedly steps from a node to its predecessor if it exists. The loop must terminate because Gπ is acyclic; otherwise, if we ran into an endless loop, we would have to revisit one of the nodes u and could thus construct a cycle in Gπ. We know from (B) that the only node in Nπ without a predecessor is s, so this must be the node where the while loop terminates. At all previous steps, there was no alternative link in Lπ from the current node, so the path to s is unique. Using Thm. 11.7(B) inductively proves that the discovered path must indeed be a shortest path. □

Remark: BFS calculates the shortest paths from s to every other node in the network. This may look like overkill if all we want is the shortest path from s to one specific target node t. We can of course terminate BFS earlier, namely as soon as we discover t. However, this does not change the worst-case run time O(m) + O(n). In fact, no algorithm is known that finds a single shortest path with a better worst-case performance.


12 Shortest-path algorithm – the weighted case

In Sec. 11 we implicitly used the minimum number of links between two nodes as a measure of distance. This is appropriate in many, but not all, networks. Especially in networks where some commodity is transported across the links, there are usually different costs associated with different links. For example, these may be travel times or ticket costs in a passenger network or transmission delays on the Internet. One important case are costs proportional to geometric distances, measured in kilometres rather than in the number of traversed links (Fig. 38). But even if the costs are not directly determined by

Figure 38: If links are weighted, the path with the smallest number of links may not be the path with the smallest sum of weights. Depicted is an example where links are weighted by Euclidean distance. Obviously, the path with the smallest number of links takes a big geometric detour. Conversely, the path with the shortest geometric distance traverses many different links.

geometry, it is often convenient to interpret them as some kind of distance between the nodes that we would like to minimise over a path from a node s to another node t.

Let us denote by c_ij the cost or weight of a link from j to i. We can store the cost as an additional attribute in the adjacency list, so that the memory requirements remain O(m) + O(n).

Weighted shortest-path problem: For two given nodes s and t, find a path P : s = v_0 → v_1 → . . . → v_k = t so that C(P) = Σ_{i=1}^{k} c_{v_i,v_{i−1}} is minimised.

We will investigate only the case where c_ij ≥ 0 for all links j → i, which covers the most common problems.¹⁰ For example, the shortest-path problem in Sec. 11 is the special case where all c_ij = 1.

12.1 Dijkstra's algorithm

If the c_ij do not have a constant value, breadth-first search does not give the correct answer (Fig. 38). The problem is that, when we discover a node v from another node t, we no longer know with certainty that a shortest path to v will pass through the link t → v (Fig. 39). At the moment of discovery, the best we can do is to provide an upper bound on the distance by adding the link weights to an already established distance (δ(s, t) in Fig. 39). However, we will prove that the smallest estimated distance is exact. The argument in short is this.

¹⁰ If negative costs are permitted, the problem becomes considerably more difficult. For example, if there is a cycle of negative weight, we may be able to minimise the cost by going around the cycle infinitely often.

Figure 39: A network with weighted distances (numbers next to the links). Suppose we have already determined a shortest path from s to t and we know it is of length 9. Exploring the neighbours of t, we can establish upper bounds (red) of their distance from s, but these are generally overestimates, as seen here in the case of v.

Figure 40: Paths in Dijkstra's algorithm.

Consider the situation depicted in Fig. 40. Suppose that we know the shortest path from s to x and we also know that u is the node with the smallest estimated (but not yet certain) distance. If u's estimated distance is not the exact shortest-path distance, then there must be another, shorter path s, . . . , x, y, . . . , u. Because all distances are non-negative, the sub-path from s to y via x must be shorter than the path along which we first discovered u (the upper path in the figure). But this contradicts that u's estimated distance is smaller than y's.

This idea leads to the following procedure, known as Dijkstra's algorithm.

(i) Initialisation: Set S = ∅, S̄ = N. For all nodes v, set v.d = ∞ and v.π = nil. (We will prove that S is the set of nodes with known distance from s and S̄ its complement. But let us assume we do not know this yet.)

(ii) Set s.d = 0, but do not yet move s to S.

(iii) Let u ∈ S̄ be a node for which u.d = min{j.d : j ∈ S̄}. Insert u into S and remove it from S̄.

(iv) Go through all neighbours v of u. If v.d > u.d + c_vu, then update our distance estimate: v.d = u.d + c_vu. In this case also set v.π = u.

(v) If S̄ is not yet equal to ∅, go back to step (iii).
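A direct Python transcription of steps (i)-(v) (a sketch with a naive O(n) minimum search in step (iii); Sec. 12.3 shows how a binary heap speeds this up):

import math

def dijkstra(adj, s):
    """adj maps each node u to a list of (v, c_vu) pairs for links u -> v."""
    d = {u: math.inf for u in adj}          # step (i)
    pred = {u: None for u in adj}
    d[s] = 0                                # step (ii)
    S_bar = set(adj)                        # complement of S
    while S_bar:                            # step (v)
        u = min(S_bar, key=lambda j: d[j])  # step (iii): smallest estimate
        S_bar.remove(u)
        for v, c in adj[u]:                 # step (iv): update the estimates
            if d[v] > d[u] + c:
                d[v] = d[u] + c
                pred[v] = u
    return d, pred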

Figure 41 illustrates the steps of Dijkstra's algorithm with a numerical example.

Figure 41: An illustration of Dijkstra's algorithm. Undiscovered nodes with d value equal to ∞ are white. Grey nodes are discovered, but their distances are only estimates. Black nodes are moved to the set S. The link from a node to its predecessor is indicated by a light grey arrow. The situations depicted are at the beginning of step (iii).

12.2 Proof that Dijkstra’s algorithm is correct

Let us now formally prove that the value u.d in Dijkstra's algorithm returns the correct shortest-path distance δ(s, u). We first need to establish that subpaths of shortest paths are themselves shortest paths.

Lemma 12.1: Let P : v_0 → v_1 → . . . → v_k be a shortest path from v_0 to v_k and, for any i, j with 0 ≤ i ≤ j ≤ k, let P_ij : v_i → v_{i+1} → . . . → v_j be the subpath of P from v_i to v_j. Then P_ij is a shortest path from v_i to v_j.


Proof: If there were another path P′_ij from v_i to v_j with less weight than P_ij, then we could go from v_0 to v_k along the following path P′:

• Follow P from v_0 to v_i.

• Follow P′_ij from v_i to v_j.

• Follow P from v_j to v_k.

This would be a path of smaller weight than P, which contradicts the conditions in the lemma. □

Another fundamental property of shortest paths is a network equivalent of the triangle inequality.

Lemma 12.2: Consider a weighted, directed network with c_vu ≥ 0 for all links u → v and source node s. Then the inequality

δ(s, v) ≤ δ(s, u) + c_vu   (49)

holds for all links u → v.

Proof:
Case 1: u is not in the out-component of s. Then δ(s, u) = ∞ and, regardless of whether δ(s, v) is finite or not, the inequality in Eq. 49 is satisfied.
Case 2: u is in the out-component of s. Then v is also in the out-component. Let P be a shortest path from s to v. This shortest path must, by definition, have no more weight than the particular path that takes a shortest path from s to u followed by the link u → v. □

Next we show that the d labels assigned during Dijkstra's algorithm are upper bounds of the shortest-path distances.

Lemma 12.3: At any moment during the execution of Dijkstra's algorithm, v.d ≥ δ(s, v) for all nodes v.

Proof: The proof is by induction over the number of distance updates in steps (ii) and (iv) of Dijkstra's algorithm.

Induction hypothesis: v.d ≥ δ(s, v) is true for all v after a distance update.

Induction basis: The first update is in step (ii), where we set s.d = 0, which is the correct shortest-path distance δ(s, s). All other distances are at this point v.d = ∞, which is certainly an upper bound for δ(s, v).

Induction step: Consider what happens in step (iv) to a node v whose distance we are about to update because we have discovered a link u → v that improves our estimate. Then

v.d = u.d + c_vu ≥(A) δ(s, u) + c_vu ≥(B) δ(s, v),

where we have used (A) the induction hypothesis and (B) the triangle inequality (Eq. 49). All other distances x.d remain unchanged and satisfy x.d ≥ δ(s, x) because of the induction hypothesis. □


Corollary 12.4: If Dijkstra's algorithm sets v.d = δ(s, v) at any point during its execution, then this equality is maintained until termination.

Proof: In Dijkstra's algorithm, any distance update can only decrease, but never increase, the label v.d. The corollary then follows from Lemma 12.3. □

Now we are prepared for the proof that Dijkstra’s algorithm is correct.

Theorem 12.5: Dijkstra's algorithm, run on a weighted, directed network with weights c_vu ≥ 0 for all links u → v and source s, terminates with u.d = δ(s, u) for all nodes u.

Proof: The proof is by induction on the number of iterations of step (iii).

Induction hypothesis: The distance label of each node in S is correct. It suffices to show that the newly added node u in step (iii) satisfies u.d = δ(s, u) immediately after step (iii). Because of Corollary 12.4 we then know that the d value of this node will not change during the rest of the algorithm.

Induction basis: Initially S = {s}, which has the correct d value s.d = δ(s, s) = 0.

Induction step: Suppose there exists a node u that has u.d ≠ δ(s, u) when it is added to S in step (iii). If there are several such nodes, we take the first misclassified node u encountered during the execution of the algorithm. We know u ≠ s, because we have already established that s is given the correct d value. Therefore, S ≠ ∅ just before u is added. We also know that there must be a path from s to u: otherwise δ(s, u) = ∞ and, together with u.d ≥ δ(s, u) from Lemma 12.3, we would have u.d = δ(s, u). Let us then choose a shortest path P from s to u. Before adding u to S, P connects a node in S (namely s) to a node in the complement S̄ (namely u). Let us consider the first node y along P such that y ∈ S̄ and let x ∈ S be y's predecessor along P. Thus, as Fig. 40 illustrates, we can decompose P into

• a path P_1 from s to x that is completely in S,

• the link x → y,

• the rest of the path, P_2.

It is possible that P_1 or P_2 consist of zero links.
We now want to show that y.d = δ(s, y) when u is added to S. To see that this is the case, we note that x.d = δ(s, x) because of the induction hypothesis. We then must have

y.d ≤ x.d + c_yx   (50)

because we must have already scanned y as a neighbour of x in step (iv), where we either set y.d = x.d + c_yx or we found at that point y.d ≤ x.d + c_yx and did not change y.d. During any subsequent encounter of y in step (iv), y.d cannot have increased, so Eq. 50 must be true. Because x.d = δ(s, x), we can deduce

y.d ≤ δ(s, x) + c_yx.

Moreover, from Lemma 12.1, δ(s, x) + c_yx = δ(s, y), thus

y.d ≤ δ(s, y).

But we already know from Lemma 12.3 that y.d ≥ δ(s, y) and therefore

y.d = δ(s, y).

This now allows us to construct a contradiction to prove u.d = δ(s, u). Because y appears before u on a shortest path from s to u and all link weights are non-negative (including those on path P_2), we have

y.d = δ(s, y) ≤ δ(s, u) ≤ u.d,

where we have again used Lemma 12.3 in the last inequality. However, y is in S̄ and, if y.d < u.d, we should have chosen y instead of u to be added to S first. The only way out of this contradiction is y.d = u.d, but then

y.d = δ(s, y) = δ(s, u) = u.d

and thus δ(s, u) = u.d, violating our assumption about u. In summary, all nodes added to S have the correct distance labels. □

For the sake of completeness, we would still have to show the equivalent of Lemma 11.8 for Dijkstra's algorithm, namely that v.π is indeed a predecessor on a shortest path to v. I will leave this as a homework problem.

12.3 Binary heaps

Dijkstra's algorithm is based on finding the minimum of the d values. Implemented naively, we would in the worst case read through a list of n items to find the minimum, which requires O(n) time for every iteration of step (iii). But we can do much better if we implement Dijkstra's algorithm using a binary heap. It reduces the time per iteration of (iii) to O(log n), which is much less than O(n) because the logarithm increases quite slowly.

A binary heap is a special version of a binary tree with an associated index. Every element in the tree consists of a unique label ∈ ℕ and a value ∈ ℝ. All levels of the tree are completely filled except possibly the lowest level, which is filled contiguously starting from the left.

The tree is stored in memory as an array H whose elements are the same as those in the tree. The array order is determined by going through the tree from left to right and top to bottom. An example is shown in Fig. 42.

The index I is in some sense the inverse of H. If the element labelled i is at the j-th position of H, then j is at the i-th position of I (Fig. 42).

The defining property of the binary heap is that the tree is "partially ordered". This means that the value of every element is greater than or equal to the value of the element above. As a consequence, the element with the smallest value is at the top of the tree. This property allows us to quickly identify the minimum value in a set and is the main reason why binary heaps are used in practice.
We can perform the following operations on the heap.


Figure 42: Illustration of a binary heap.


Figure 43: After inserting the element with label 13 into the heap of Fig. 42, we need two sift-up operations to restore partial order in the heap.


[Figure 44: After deleting the minimum, we need two sift-down operations.]


• Inserting an element: When we add an item to the heap, it is placed in the first available space at the bottom of the tree. If the bottom row is full, we start a new row. The new value may violate the heap property if it is smaller than the value above. To restore the order, we perform a sift-up operation: we swap the element with the one above it. If the tree is still not ordered, we repeat the sift-up operation until the new item either has an upper neighbour of smaller value or has reached the top of the tree (Fig. 43). If there are n elements in the tree, the maximum number of sift-up operations is the depth of the tree, which scales as O(log n).

• Decreasing a value in the heap: If we decrease the value of an element that is already in the heap, we may violate the partial order. To restore it, we perform the same sift-up operation as described above. In the worst case, we need O(log n) iterations until the element has reached its correct position.

• Deleting the minimum value: It is easy to find the minimum: it must be at the top of the tree. What follows after we remove this element is a little more complicated. We first fill the empty space with the last element in the tree. This element usually does not have the minimum value and thus violates the partial order. To move it to the right place, we perform sift-down operations: if the value is bigger than one of the neighbours below, it trades position with the smallest such neighbour. In the worst case, we may have to iterate O(log n) sift-down operations until the element is back at the bottom of the tree.

In pseudo-code notation, we initialise the heap simply by setting its length equal to 0.

Initialise-Heap(H)

1 H.length = 0

We need functions that can determine which elements are above or below a certain element. I will call the element above the "parent" and the two elements below the "left" and "right child". The following functions use the positions in H as input and output.

Parent(c)

1 return ⌊c/2⌋

Left-Child(p)

1 return 2p

Right-Child(p)

1 return 2p+ 1

The array positions form an implicit tree: position 1 is the root, and position p has the children 2p and 2p + 1:

                 1
           2           3
        4     5     6     7
       8 9  10 11 12 13 14 15

The next function returns the array position of the child with the smaller value. Here and in the rest of the pseudo-code, it is prudent to check if the called heap position is indeed in the heap. For the sake of simplicity, I omit such sanity checks here.


Min-Child(H, p)

1 l = Left-Child(p)
2 r = Right-Child(p)
3 if H[l].v < H[r].v
4     return l
5 else return r

We swap two elements as follows.

Swap(H, I, pos1, pos2)

1 auxl = H[pos1].l // Swap the elements at the pos1-th and pos2-th positions in the tree.
2 auxv = H[pos1].v
3 H[pos1].l = H[pos2].l
4 H[pos1].v = H[pos2].v
5 H[pos2].l = auxl
6 H[pos2].v = auxv
7 I[H[pos1].l] = pos1 // Update the index.
8 I[H[pos2].l] = pos2

The sift-up operations perform repeated swaps on a tree element and its parent.

Sift-Up(H, I, pos)

1 c = pos // Child.
2 p = Parent(c) // Parent.
3 while c > 1 and H[c].v < H[p].v // Iterate until the child is at the top of the tree or order is restored.
4     Swap(H, I, c, p)
5     c = p // New child.
6     p = Parent(c) // New parent.

Sifting down involves swaps with the smaller-valued child.

Sift-Down(H, I, pos)

1 p = pos // Parent.
2 c = Min-Child(H, p) // Child.
3 while c ≤ H.length and H[p].v > H[c].v // Iterate until the parent is at the bottom or order is restored.
4     Swap(H, I, p, c)
5     p = c // New parent.
6     c = Min-Child(H, p) // New child.

Insertion adds a new element at the end of the heap which is then sifted up.

Insert(H, I, label, value)

1 H.length = H.length + 1
2 H[H.length].l = label
3 H[H.length].v = value
4 I[label] = H.length
5 Sift-Up(H, I, H.length)


Decreasing the value of an existing element must also be followed by iterative sift-up operations.

Decrease-Value(H, I, label, value)

1 if H[I[label]].v < value
2     error "New value greater than current value"
3 H[I[label]].v = value
4 Sift-Up(H, I, I[label])

Deleting the minimum, on the other hand, requires the sift-down routine.

Delete-Min(H, I)

1 if H.length == 0
2     error "Heap empty"
3 minl = H[1].l // The minimum is at the top of the tree.
4 minv = H[1].v
5 H[1].l = H[H.length].l // Move the last element to the top.
6 H[1].v = H[H.length].v
7 H[H.length].v = ∞ // Make sure the last heap position is never returned as minimum child.
8 H.length = H.length − 1 // Reduce the heap size.
9 I[H[1].l] = 1 // Update the index.
10 Sift-Down(H, I, 1)
11 return minl

The procedures Parent, Left-Child, Right-Child, Min-Child and Swap are all O(1) in time. The while-loops in Sift-Up and Sift-Down are carried out O(log n) times, so that Sift-Up, Sift-Down, Insert, Decrease-Value and Delete-Min are all O(log n) procedures.
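For concreteness, here is a minimal, runnable Python sketch of such an indexed binary heap (the class and method names are my own, not part of these notes). It keeps the (label, value) pairs 1-indexed in a list, mirroring the array H, and stores the index I as a dictionary:

class IndexedMinHeap:
    """Binary min-heap of (label, value) pairs with an index for Decrease-Value."""
    def __init__(self):
        self.h = [None]        # h[1..n] holds (label, value); h[0] is unused padding.
        self.pos = {}          # pos[label] = position of that label in h (the index I).

    def _swap(self, i, j):
        self.h[i], self.h[j] = self.h[j], self.h[i]
        self.pos[self.h[i][0]] = i
        self.pos[self.h[j][0]] = j

    def _sift_up(self, c):
        while c > 1 and self.h[c][1] < self.h[c // 2][1]:
            self._swap(c, c // 2)       # Swap with the parent at position c // 2.
            c //= 2

    def _sift_down(self, p):
        n = len(self.h) - 1
        while 2 * p <= n:
            c = 2 * p                   # Left child; pick the smaller-valued child.
            if c + 1 <= n and self.h[c + 1][1] < self.h[c][1]:
                c += 1
            if self.h[p][1] <= self.h[c][1]:
                break                   # Partial order restored.
            self._swap(p, c)
            p = c

    def insert(self, label, value):
        self.h.append((label, value))
        self.pos[label] = len(self.h) - 1
        self._sift_up(len(self.h) - 1)

    def decrease_value(self, label, value):
        i = self.pos[label]
        assert value <= self.h[i][1], "new value greater than current value"
        self.h[i] = (label, value)
        self._sift_up(i)

    def delete_min(self):
        label, value = self.h[1]
        last = self.h.pop()             # Remove the last element ...
        del self.pos[label]
        if len(self.h) > 1:
            self.h[1] = last            # ... move it to the top and sift down.
            self.pos[last[0]] = 1
            self._sift_down(1)
        return label, value

# Example: insert three elements, decrease one value, extract the minimum.
heap = IndexedMinHeap()
for label, value in [(5, 0.4), (6, 1.9), (1, 1.3)]:
    heap.insert(label, value)
heap.decrease_value(6, 0.1)
print(heap.delete_min())                # -> (6, 0.1)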

12.4 Heap implementation of Dijkstra’s algorithm

Now we are prepared to implement step (iii) in Dijkstra's algorithm, where we need to find the minimum of all estimated distances, using a binary heap. This is admittedly more difficult to code than a straightforward scan through all estimated distances, but for sparse networks it saves us a substantial amount of time.

There is one final subtlety that saves us a little more time. We do not need to keep the entire set S stored in the heap because nodes with an estimated distance ∞ will only be returned as a minimum after the distances in the complete out-component of s are exactly known. But then we can stop the whole process because the remaining infinite distances are correct. We use this observation in the pseudo-code below, where we only transfer nodes with a finite estimated distance to the heap. We denote the network by G, the set of nodes by G.N, the adjacency list of node u by G.Adj[u] and the set of link weights by c.


Dijkstra(G, c, s)

1 for each node u ∈ G.N // Initially all nodes are undiscovered.
2     u.d = ∞
3     u.π = Nil
4 s.d = 0 // Discover the source.
5 Initialise-Heap(H)
6 Insert(H, I, s, s.d)
7 while H.length ≠ 0 // Iterate until the heap is empty.
8     u = Delete-Min(H, I)
9     for each v ∈ G.Adj[u]
10        estimate = u.d + cvu // New distance estimate.
11        if v.d > estimate // Only proceed if the estimate is an improvement.
12            if v.d == ∞
13                v.d = estimate // Discover v.
14                v.π = u
15                Insert(H, I, v, v.d)
16            else v.d = estimate // We have found a better estimate.
17                v.π = u
18                Decrease-Value(H, I, v, v.d)

If we label the nodes 1, . . . , n, the heap H and the index I are both arrays of length n. Including the space needed for the adjacency list, we need a total memory of O(m) + O(n).

The run-time is determined by the number of heap operations Delete-Min, Insert and Decrease-Value. We encounter the first two at most n times and the third at most m times. Since every single heap operation needs O(log n) time, Dijkstra's algorithm runs in O((m + n) log n). For sparse networks, this simplifies to O(n log n), which is the fastest weighted shortest-path algorithm known to date.¹¹
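As an aside, many practical implementations avoid the Decrease-Value operation altogether: whenever an estimate improves, they push a duplicate entry into the heap and discard stale entries when they are popped ("lazy deletion"). A minimal Python sketch of this variant using the standard heapq module (the function name and data layout are my own choices, not part of these notes):

import heapq

def dijkstra(adj, s):
    """adj[u] = list of (v, w) pairs, where w is the weight of the link u -> v.
    Returns dictionaries of distances and predecessors, as in the pseudo-code."""
    dist = {s: 0.0}
    pred = {s: None}
    pq = [(0.0, s)]                          # Heap of (estimated distance, node).
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                         # Stale entry; u was already settled.
        for v, w in adj[u]:
            estimate = d + w                 # New distance estimate.
            if estimate < dist.get(v, float('inf')):
                dist[v] = estimate           # Discover v or improve its estimate.
                pred[v] = u
                heapq.heappush(pq, (estimate, v))
    return dist, pred

# Example: dist == {'s': 0.0, 'a': 2.0, 'b': 3.0}.
dist, pred = dijkstra({'s': [('a', 2), ('b', 5)], 'a': [('b', 1)], 'b': []}, 's')

The heap may then hold up to m entries instead of n, but because log m ≤ 2 log n the asymptotic run time O((m + n) log n) is unchanged.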

If we are interested in the shortest paths as well as the distances, we should run the following code, which is almost identical to the one we have seen in the unweighted case. The only difference is that we call Dijkstra instead of BFS in line 1.

Shortest-Dijkstra-Path(G, c, s, t)

1 Dijkstra(G, c, s)
2 if t ≠ s and t.π == nil
3     print t "is not in the out-component of" s
4 else u = t.π
5     print "The predecessor of" t "is" u
6     while u.π ≠ nil
7         v = u.π
8         print "The predecessor of" u "is" v
9         u = v

¹¹ If the network is not sparse, one can achieve a better asymptotic run time O(m + n log n) with a data structure known as a Fibonacci heap. In practice, most networks are sparse, so that a Fibonacci heap does not accelerate the computation compared to a binary heap. For sparse networks, the Fibonacci heap requires so much computational overhead that it is usually even slower.


13 Minimum cost flows – basic algorithms

13.1 Introduction

In a minimum cost flow problem, we wish to find a flow of a commodity from a set of supply nodes to a set of demand nodes that minimises the total cost caused by transporting the commodity across the network. Minimum cost flow problems arise in many industrial applications.

Example:
A car manufacturer has two production plants, delivers to two retail centres and offers three different car models. The retail centres request a specific number of cars of each model. The firm must

• determine the production plan of each model at each plant,

• find a shipping pattern that satisfies the demands of each retail centre,

• minimise the overall cost of production and transportation.

[Figure 45: Production-distribution model. Four columns of nodes: plant nodes (p1, p2), plant/model nodes (p1/m1, p1/m2, p2/m1, p2/m2, p2/m3), retailer/model nodes (r1/m1, r1/m2, r1/m3, r2/m1, r2/m2) and retailer nodes (r1, r2).]

We can map this problem onto a network by introducing four kinds of nodes (Fig. 45)

• plant nodes, representing the various plants,

• plant/model nodes, corresponding to each model made at a plant,

• retailer/model nodes, corresponding to the models required by each retailer,

• retailer nodes, representing each retailer.

There are three types of links.

• Production links, connecting a plant to a plant/model node. The cost of such a link is the cost of producing the model at this plant.

• Transportation links, connecting plant/model nodes to retailer/model nodes. The cost of such a link is the cost of shipping one car from the plant to the retail centre.


• Demand links, connecting retailer/model nodes to the retailer nodes. These links have zero cost.

An important feature of such distribution problems is the presence of capacity constraints.

• Maximum capacity for production links: Production plants can only manufacture a limited number of cars per unit time.

• Maximum capacity for transportation links: The number of available trains/ships etc. to deliver the products to the retail centres is limited.

• Maximum capacity for demand links: The retail centre can only sell as many cars as demanded by the customers.

The optimal solution for the firm is a minimum cost flow of cars from the plant nodes to the retailer nodes that satisfies these capacity constraints.

13.2 Notation and assumptions

Let G = (N, L) be a directed network with a cost cl and a capacity ul associated with every link l ∈ L. We associate with each node i ∈ N a number ri, which indicates

• the supply if ri > 0,

• the demand if ri < 0.

Definition 13.1:
Let G = (N, L) be a directed network. A vector f = (fl)l∈L that satisfies the constraints

(flow balance)

    ∑_{link l points out of i} fl − ∑_{link l points into i} fl = ri   for all i ∈ N,    (51)

where the first sum is the out-flow and the second sum the in-flow of node i, and

(capacity constraints)

    0 ≤ fl ≤ ul   for all l ∈ L,    (52)

is called a feasible flow.

Minimum cost flow problem:
Find the feasible flow that minimises

    C(f) = ∑_{l∈L} cl fl.    (53)

Assumptions:

(A) All input data (cost, supply/demand, capacity) are integers.

(B) There exists a feasible flow.

(C) The total supply equals the total demand, ∑_{i∈N} ri = 0.

(D) All costs are non-negative, cl ≥ 0 for all l ∈ L.

(E) If L contains a link i → j, then it does not contain a link in the opposite direction j → i.
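Before moving on, it may help to see Definition 13.1 and Eq. (53) spelled out in code. The following is a small Python sketch (the function names and data layout are my own, not part of these notes) that checks whether a given vector f is a feasible flow and evaluates its cost:

def is_feasible(nodes, links, u, r, f, tol=1e-9):
    """Check flow balance (51) and capacity constraints (52).
    links: list of (i, j) pairs; u[k], f[k]: capacity and flow on links[k];
    r[i]: supply (positive) or demand (negative) of node i."""
    if any(f[k] < -tol or f[k] > u[k] + tol for k in range(len(links))):
        return False                         # A capacity constraint is violated.
    for i in nodes:
        out_flow = sum(f[k] for k, (a, b) in enumerate(links) if a == i)
        in_flow = sum(f[k] for k, (a, b) in enumerate(links) if b == i)
        if abs(out_flow - in_flow - r[i]) > tol:
            return False                     # Flow balance fails at node i.
    return True

def total_cost(c, f):
    """Objective (53): C(f) = sum over all links of c_l * f_l."""
    return sum(ck * fk for ck, fk in zip(c, f))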


[Figure 46: Converting a network (a) with antiparallel links to an equivalent one (b) without antiparallel links. The numbers indicate capacities. We add an auxiliary node k and replace the link j → i by the pair of links j → k and k → i with the same capacity u2 as the original link.]

The last assumption is primarily to make the notation simpler. It does not actually cause any loss of generality because, if there are antiparallel links, we can perform the network transformation depicted in Fig. 46.
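As a sketch of how one might automate this transformation (the function and node names are my own; node labels are assumed to be comparable so that each pair is handled exactly once):

def remove_antiparallel(links, u):
    """Reroute one link of every antiparallel pair through a new auxiliary node,
    as in Fig. 46. links: list of (i, j); u: dict of capacities keyed by (i, j)."""
    out_links, out_u = [], {}
    link_set = set(links)
    for (i, j) in links:
        if (j, i) in link_set and i < j:     # Handle each pair once; reroute i -> j.
            k = ('aux', i, j)                # Auxiliary node, assumed unused.
            out_links += [(i, k), (k, j)]
            out_u[(i, k)] = out_u[(k, j)] = u[(i, j)]
        else:
            out_links.append((i, j))
            out_u[(i, j)] = u[(i, j)]
    return out_links, out_u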

Definition 13.2:
Let G = (N, L) be a directed network and f a vector satisfying the capacity constraints 0 ≤ fl ≤ ul. Such a vector is called a pseudo-flow because it may not satisfy the flow balance equation (51). Define an additional set of links Lmirror by:

link i → j ∈ Lmirror ⇔ link j → i ∈ L,

so that Lmirror contains the antiparallel links of L. Because of assumption (E), L and Lmirror are disjoint sets. This allows us to define the function mirror : (L ∪ Lmirror) → (L ∪ Lmirror) with

    mirror(l) = { link j → i ∈ L         if l : i → j ∈ Lmirror,
                { link j → i ∈ Lmirror   if l : i → j ∈ L,

so that mirror(l) is the antiparallel link of l. The residual cost c^res_l is defined for all l ∈ (L ∪ Lmirror) by

    c^res_l = { cl              if l ∈ L,
              { −c_mirror(l)    if l ∈ Lmirror,

and the residual capacity is defined by

    u^res_l = { ul − fl         if l ∈ L,
              { f_mirror(l)     if l ∈ Lmirror.

Definition 13.3:
The residual network for a given network G = (N, L) and a given pseudo-flow f is the network G(f) = (N, L(f)) with L(f) = {l ∈ L ∪ Lmirror : u^res_l > 0}.

Example:
In Fig. 47(a), the black arrows show a directed network G with costs cl and capacities


[Figure 47: (a) The original network with nodes a, b, c, d and (cost, capacity) labels on the links. (b) The flow. (c) The residual network with (residual cost, residual capacity) labels.]

ul indicated near the links. The flows fl are given by the red numbers in (b). The corresponding residual network G(f) is shown in Fig. 47(c), where the numbers near the links are now the residual costs and capacities.

Motivation behind defining the residual network:
Most algorithms to find minimum cost flows are iterative and construct an intermediate solution f. In the next iteration, the algorithm can only add flow to f on the links in the residual network G(f):

• On a link l where fl is at the maximum capacity, we cannot add more flow on l. However, we can send flow on the antiparallel link, which cancels out some of the flow on l.

• If fl = 0, we can add flow to l in the next iteration as long as ul > 0. However, we cannot reduce the flow on l by adding flow on the antiparallel link.

• If 0 < fl < ul, we can either add flow in the direction of l or reduce it by adding flow in the opposite direction.

We need two more definitions, namely node potentials and reduced costs, before we can present our minimum cost flow algorithm.

Definition 13.4:

• Any set of real values π = (π1, . . . , πn) associated with the nodes 1, . . . , n is called a node potential.

• If the link l points from node i to j, its reduced cost with respect to the node potential π is defined by c^π_l = c^res_l − πi + πj.

The following pseudo-code implements one possible technique to solve the minimum cost flow problem, namely the successive shortest path algorithm.


Successive-Shortest-Path(G, {cl}l∈L, {ul}l∈L, {ri}i∈N)

1 for each link l
2     fl = 0
3 for each node i
4     πi = 0
5     ei = ri // Initialise supply and demand.
6 Initialise the sets E = {i : ei > 0} and D = {i : ei < 0}.
7 while E ≠ ∅
8     Select a node p ∈ E and a node q ∈ D.
9     Determine the shortest path distances δ(p, i) from node p to all other nodes i
      in the residual network G(f), where the link weights are the reduced costs c^π_l.
      Let P be the shortest path from node p to node q.
10    for each node i
11        πi = πi − δ(p, i)
12    Determine µ = min({ep, −eq} ∪ {u^res_l : l ∈ P}).
13    Augment µ units of flow along the path P.
14    Update f, G(f), E, D and all reduced costs c^π_l.

Example:
In Fig. 48(a), the only supply node is a and the only demand node is d. Thus, initially E = {a} and D = {d}. The shortest path distances with respect to the reduced costs are δ(a, b) = 2, δ(a, c) = 2 and δ(a, d) = 3. The shortest path is P : a → c → d. Figure 48(b) shows the updated node potentials and reduced costs. We can send µ = min{ea, −ed, u^res_ac, u^res_cd} = min{4, 4, 2, 5} = 2 units of flow along P. Afterwards, the updated residual network looks as in Fig. 48(c).
In the second iteration, we have again E = {a} and D = {d}, but the distances are now δ(a, b) = 0, δ(a, c) = 1 and δ(a, d) = 1. The shortest path is P : a → b → c → d. The resulting node potentials and reduced costs are shown in Fig. 48(d). We can augment the flow by min{ea, −ed, u^res_ab, u^res_bc, u^res_cd} = min{2, 2, 4, 2, 3} = 2 units. At the end of this iteration, ea = eb = ec = ed = 0 and the algorithm terminates.

Remark:
The successive shortest path algorithm is relatively easy to implement and adequate for many purposes. If U is an upper bound on the largest supply ri of a node and if Dijkstra's algorithm is implemented using binary heaps, the run-time scales as O(U(m + n)n log n). However, there are alternative methods (known as capacity scaling or cost scaling algorithms) that achieve better worst-case run times.
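To make the steps above concrete, here is a compact Python sketch of the successive shortest path algorithm under assumptions (A)-(E); all names are my own. For brevity it computes shortest paths in the residual network with Bellman-Ford instead of Dijkstra with node potentials: mirror links can carry negative residual costs, and Bellman-Ford handles these directly at the price of a worse run time.

def successive_shortest_path(nodes, links, c, u, r):
    """Minimum cost flow by successive shortest paths.
    links: list of (i, j); c, u: dicts of costs and capacities keyed by (i, j);
    r: dict of integer supplies (> 0) and demands (< 0) summing to zero."""
    f = {l: 0 for l in links}
    e = dict(r)                                   # Remaining supply/demand.
    while any(e[i] > 0 for i in nodes):
        p = next(i for i in nodes if e[i] > 0)    # A supply node.
        # Residual links: forward links with spare capacity, mirror links with flow.
        res = [(i, j, c[(i, j)], u[(i, j)] - f[(i, j)]) for (i, j) in links]
        res += [(j, i, -c[(i, j)], f[(i, j)]) for (i, j) in links]
        res = [link for link in res if link[3] > 0]
        # Bellman-Ford from p (mirror links may have negative residual costs).
        dist = {i: float('inf') for i in nodes}
        pred = {i: None for i in nodes}
        dist[p] = 0
        for _ in range(len(nodes) - 1):
            for (i, j, w, cap) in res:
                if dist[i] + w < dist[j]:
                    dist[j], pred[j] = dist[i] + w, (i, j, cap)
        q = min((i for i in nodes if e[i] < 0), key=lambda i: dist[i])
        path, node = [], q                        # Trace the path back from q to p.
        while node != p:
            i, j, cap = pred[node]
            path.append((i, j, cap))
            node = i
        mu = min([e[p], -e[q]] + [cap for (_, _, cap) in path])
        for (i, j, _) in path:                    # Augment mu units along the path.
            if (i, j) in f:
                f[(i, j)] += mu                   # Forward link.
            else:
                f[(j, i)] -= mu                   # Mirror link: cancel existing flow.
        e[p] -= mu
        e[q] += mu
    return f

Because the supplies are integers (assumption (A)), every iteration pushes at least one unit, so the loop terminates; assumption (B) guarantees that a demand node is always reachable in the residual network.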

Convex cost flows:
In Equation (53), we have assumed that the cost cl is independent of the flow. In some applications this is not true. For example, in electrical resistor networks, the current f minimises the function C(f) = ∑_l Rl fl², where Rl is the Ohmic resistance. More generally, we would have

    C(f) = ∑_l hl(fl).

If hl is a monotonic, convex, piecewise linear function with hl(0) = 0, there is a "quick and dirty" way to apply the successive shortest path algorithm. Consider a link whose


[Figure 48: Illustration of the successive shortest path algorithm; the links carry (reduced cost, residual capacity) labels, and each node its excess ei and potential πi. (a) Initial network. (b) Network after updating the node potentials π. (c) Network after augmenting two units of flow along the path a → c → d. (d) Network after updating the node potentials π. (e) Network after augmenting two units along a → b → c → d.]


cost is given by the function depicted in Fig. 49.

[Figure 49: Example of a piecewise linear, monotonic, convex function h(x): the slopes on the four unit intervals between x = 0 and x = 4 are 1, 3, 6 and 9.]

If we replace the single link l with 4

different links of capacity 1 and costs equal to the different slopes (Fig. 50), then we can apply exactly the same algorithm as before to this extended network.

[Figure 50: Illustrating the network transformation from (a) a flow-dependent cost, a single link i → j with cost h(f)/f per unit of flow and capacity 4, to (b) a flow-independent cost: four parallel links i → j with (cost, capacity) labels (1, 1), (3, 1), (6, 1) and (9, 1), for the function shown in Fig. 49.]

There are better tailor-made algorithms for convex cost flow problems, but this network transformation is particularly simple to programme.
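A sketch of this transformation in Python (the names are my own; the convexity check matters, because the construction is only valid when the slopes are non-decreasing, so that a min-cost flow fills the cheap parallel links first):

def expand_convex_link(i, j, widths, slopes):
    """Expand a link with piecewise linear convex cost into parallel links (Fig. 50).
    widths: capacity of each linear piece; slopes: the corresponding slopes."""
    assert list(slopes) == sorted(slopes), "convexity requires non-decreasing slopes"
    return [(i, j, s, w) for s, w in zip(slopes, widths)]

# The function of Fig. 49: slopes 1, 3, 6, 9 on four unit intervals.
parallel_links = expand_convex_link('i', 'j', [1, 1, 1, 1], [1, 3, 6, 9])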


14 The Price of Anarchy

In this section, we look at uncapacitated, directed networks. We drop the assumption that flows and costs must be integers.

14.1 Introduction

Example 14.1 (Pigou, 1920):

Figure 51: Pigou’s example

Suppose a total of r = 10 vehicles/minute travel from s to t, so that r = f1 + f2. The travel time per vehicle is

    c1(f1) = f1 on path 1,
    c2(f2) = 10 on path 2.

How do drivers decide if they should take path 1 or 2?

Wardrop’s principles (1952):

• First principle (Nash equilibrium): Travel times on all paths with non-zero traffic ≤ travel time on any (used or unused) path.

• Second principle (social optimum): The sum of all travel times is minimised.

Are both principles equivalent?

Let us call the sum of all travel times in the example above S.

Nash equilibrium:

c1(f1) = c2(f2) ⇒ f1 = 10 ⇒ f2 = r − f1 = 0 ⇒ SNE = 100.

Social optimum:

    S = f1 c1(f1) + f2 c2(f2) = f1² + 10 f2 = f1² + 10(r − f1).

The minimum satisfies dS/df1 = 0 ⇒ 2 f1 − 10 = 0 ⇒ f1 = 5.


(Check: d²S/df1² = 2 > 0. ✓)

⇒ At the social optimum, SSO = 75.

⇒ Wardrop's first and second principles lead to different flows and different travel times.

The ratio ρ := SNE/SSO is called the Price of Anarchy. In Pigou's example, ρ = 4/3.
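As a quick numeric sanity check of these two values (a throwaway Python scan, not part of the notes; the grid search simply stands in for the calculus above):

def pigou(r=10.0, steps=100001):
    """Scan f1 in [0, r] for the social optimum of S = f1*c1(f1) + f2*c2(f2)."""
    S_SO = min((r * k / (steps - 1)) ** 2 + 10.0 * (r - r * k / (steps - 1))
               for k in range(steps))
    S_NE = r * r               # All traffic on path 1, where c1(10) = 10 = c2.
    return S_NE, S_SO, S_NE / S_SO

print(pigou())                 # -> (100.0, 75.0, 1.333...)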

We will prove: for every network with non-negative, linearly increasing costs on all links l (i.e. cl = al fl + bl with al, bl ≥ 0), the Price of Anarchy lies in [1, 4/3]. The lower and upper bounds are tight.

⇒ The Nash equilibrium cost can be at most 4/3 times the social optimum cost, regardless of the exact network topology.

The lower bound immediately follows from the definition of the social optimum as the flow minimising S. The upper bound needs more work.

14.2 Notation

• Consider a network with node set N and link set L.

• There are k source nodes s1, . . . , sk ∈ N and k sink nodes t1, . . . , tk ∈ N .

• Traffic from si is destined for ti and has rate ri.

• An origin-destination pair {si, ti} is called a commodity.

• Pi: set of all paths from si to ti. P := ∪i Pi. (Reminder: a path is a walk that contains no cycles.)

• A flow φ is a non-negative real vector indexed by P. A feasible flow satisfies ∑_{P∈Pi} φP = ri for all i ∈ {1, . . . , k}.

† A flow φ induces a "flow on the links" {fl}l∈L, where fl = ∑_{P∈P: l∈P} φP. Let us call f the "link-representation" of φ.

• Each link l has a cost cl(fl) = alfl + bl, al, bl ≥ 0.

‡ The cost of a path P with respect to a flow φ is χP(φ) := ∑_{l∈P} cl(fl).

• The total cost of φ is S(φ) := ∑_{P∈P} χP(φ) φP.

• Note: we can express S also as a function of the flows on the links:

    S(φ) = ∑_{P∈P} (∑_{l∈P} cl(fl)) φP = ∑_{l∈L} (∑_{P∈P: l∈P} φP) cl(fl) = ∑_{l∈L} fl cl(fl),

  where the first step uses ‡ and the last step uses †.


[Figure 52: A two-commodity network with sources s1, s2, sinks t1, t2 and links 1, . . . , 8. The induced link flows are f1 = r2, f2 = f3 = r1 + (1/3)r2, f4 = f5 = r1, f6 = r2 and f7 = f8 = (2/3)r2.]

14.3 Flows at a social optimum

Additional notation:

• hl(fl) := fl cl(fl) = al fl² + bl fl.
  The derivative h′l(fl) = 2al fl + bl is called the marginal cost function. The cost of adding an infinitesimal amount ε of flow on link l equals ε h′l(fl) + O(ε²).

• h′P(φ) := ∑_{l∈P} h′l(fl) = ∑_{l∈P} (2al fl + bl).

Definition 14.2:
A feasible flow φ is a social optimum if it minimises S(φ) = ∑_{l∈L} hl(fl).

Existence of a social optimum:
S(φ) is continuous and defined in a closed, bounded region (namely the set of feasible flows). ⇒ There is always a social optimum.

Proposition 14.3:
Let φ∗ be a feasible flow and f∗ its link-representation. Then the next three statements are equivalent.

(A) φ∗ is a social optimum.

(B) h′P1(φ∗) ≤ h′P2(φ∗) for every i ∈ {1, . . . , k} and P1, P2 ∈ Pi with φ∗P1 > 0.

(C) For every feasible flow φ with link-representation f,

    ∑_{l∈L} h′l(f∗l) f∗l ≤ ∑_{l∈L} h′l(f∗l) fl.

Note: (B) implies that all paths P with non-zero traffic have equal h′P(φ∗).

Proof "(A) ⇒ (B)":
Suppose φ∗ is an optimal flow. Consider an si-ti path P1 ∈ P with φ∗P1 > 0 and another si-ti path P2 ∈ P. (Note: there must be traffic on P1, but not necessarily on P2.)


[Figure 53: A convex function h(x) (solid curve) and its linear approximation (dashed line) at the point x∗.]

Transfer a small amount of flow ε ∈ (0, φ∗P1] from P1 to P2. This yields a feasible flow φ with total cost

    S(φ) = ∑_{l∈L} hl(f∗l) + ε [ ∑_{l∈P2} h′l(f∗l) − ∑_{l∈P1} h′l(f∗l) ] + ε² [ ∑_{l∈P2} al + ∑_{l∈P1} al ],

where the first term equals S(φ∗) and we have used the fact that all hl are quadratic. Because φ∗ is optimal, we must have S(φ) ≥ S(φ∗). Since ε > 0,

    ∑_{l∈P2} h′l(f∗l) − ∑_{l∈P1} h′l(f∗l) ≥ −ε [ ∑_{l∈P2} al + ∑_{l∈P1} al ].

(B) follows for ε → 0+.

Proof "(B) ⇒ (C)":
Consider

    H(φ) := ∑_{P∈P} h′P(φ∗) φP,

where φ is an arbitrary feasible flow.
Note: φ∗ is fixed. ⇒ h′P(φ∗) is independent of φ. ⇒
Finding the minimum of H(φ) is a congestion-independent min-cost flow problem. ⇒
The problem can be solved by independently minimising the cost for every commodity i. Given (B), the best choice is to route the flow from si to ti on one of those paths P where φ∗P > 0. Because the cost is equal on all these paths, we can, for example, obtain a minimum of H(φ) by routing all flow exactly as in φ∗. ⇒

    ∑_{P∈P} h′P(φ∗) φP ≥ ∑_{P∈P} h′P(φ∗) φ∗P.

Rearranging the terms in the sum,

    ∑_{l∈L} h′l(f∗l) fl ≥ ∑_{l∈L} h′l(f∗l) f∗l.


[Figure 54: A simple example where the social optimum is not unique: two parallel links from s to t with constant costs c1(f1) = 1 and c2(f2) = 1.]

Proof "(C) ⇒ (A)":
al ≥ 0 for all l ∈ L. ⇒ hl(fl) = al fl² + bl fl is convex. ⇒
hl(fl) ≥ hl(f∗l) + (fl − f∗l) h′l(f∗l), see Fig. 53. ⇒

    S(φ) = ∑_{l∈L} hl(fl)
         ≥ ∑_{l∈L} [hl(f∗l) + h′l(f∗l)(fl − f∗l)]
         = ∑_{l∈L} hl(f∗l) + ∑_{l∈L} h′l(f∗l)(fl − f∗l)
         ≥ ∑_{l∈L} hl(f∗l) = S(φ∗),

where the last inequality uses (C). □

Remark 14.4:

• If φ and ψ are two social optima, their costs must be the same: S(φ) = S(ψ). Otherwise one cost would be larger and thus not a social optimum.

• However, the flow is not unique. Consider the network in Fig. 54. If r is the traffic demand between s and t, you can distribute the traffic arbitrarily on the two links.

• On the other hand, it is possible to show that for all social optima the flows on links l with al > 0 are equal. (Hint: use convexity of hl.)

14.4 Flows at Nash equilibrium

A flow is at Nash equilibrium if no user can reduce his/her travel time by unilaterally changing paths. We assume that all users are only responsible for an infinitesimal amount of traffic.

Definition 14.5:
A feasible flow φ is at Nash equilibrium if for all

• commodities i ∈ {1, . . . , k},

• si-ti paths P1, P2 ∈ Pi with φP1 > 0,

• amounts δ ∈ (0, φP1] of traffic on P1,

95

Page 96: GRAPH Lecture Notes

the costs satisfy χP1(φ) ≤ χP2(ψ), where

    ψP = { φP − δ   if P = P1,
         { φP + δ   if P = P2,
         { φP       otherwise,

is the flow obtained by moving δ units of flow from P1 to P2.

Proposition 14.6 (Wardrop's first principle):
Let φ be a feasible flow. The following two statements are equivalent.
(A) φ is at Nash equilibrium.
(B) χP1(φ) ≤ χP2(φ) for every i ∈ {1, . . . , k} and P1, P2 ∈ Pi with φP1 > 0.

Proof "(A) ⇒ (B)":
Let δ → 0 so that ψ → φ. ⇒

    χP1(φ) ≤ lim_{ψ→φ} χP2(ψ) = χP2(lim_{ψ→φ} ψ) = χP2(φ),

where the first equality uses the continuity of χP2.

Proof "(B) ⇒ (A)":
The cost functions cl(fl) are monotonically increasing. ⇒
When moving more flow to P2, χP2 cannot decrease. ⇒
χP2(φ) ≤ χP2(ψ). □

Note the similarity between the statements (B) in Propositions 14.3 and 14.6. This motivates

Proposition 14.7:
Let φ be a feasible flow. The following two statements are equivalent.
(A) φ is at Nash equilibrium.
(B) φ is a minimum of S̃(φ) := ∑_{l∈L} h̃l(fl), where h̃l(fl) = (1/2) al fl² + bl fl.

Proof:
Because χP(φ) = ∑_{l∈P} cl(fl) = ∑_{l∈P} h̃′l(fl) = h̃′P(φ), we have h̃′P1(φ) ≤ h̃′P2(φ). ⇒
The situation is the same as in Proposition 14.3, only with a tilde on all the function names and constants. □

Remark 14.8:
In a Nash equilibrium, the cost

    cl(fl) = al fl + bl

is replaced by

    c̃l(fl) := ãl fl + b̃l,

where

    ãl = (1/2) al,   b̃l = bl.

⇒ The flow-dependent term is given less weight. ⇒

Nash flows are socially optimal flows, but not for the "correct" cost functions!


                                  social optimum            Nash equilibrium

auxiliary congestion-
dependent coefficient             al                        ãl = (1/2) al

auxiliary congestion-
independent coefficient           bl                        b̃l = bl

auxiliary cost per
unit traffic                      cl = al fl + bl           c̃l = ãl fl + b̃l

auxiliary cost for all
traffic on link                   hl = cl fl                h̃l = c̃l fl

auxiliary function
minimised                         S = ∑_{l∈L} hl(fl)        S̃ = ∑_{l∈L} h̃l(fl)

real cost paid by
all users                         S = ∑_{l∈L} hl(fl) in both cases

Pigou's example (see Ex. 14.1):

    c̃1(f1) = (1/2) f1,   c̃2(f2) = 10. ⇒ S̃ = (1/2) f1² + 10 f2.

Because r = f1 + f2 = 10, S̃ = (1/2) f1² + 10(10 − f1).

Minimum: dS̃/df1 = f1 − 10 = 0. ⇒
f1 = 10, f2 = 0, in agreement with our results in Ex. 14.1.
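The same kind of numeric scan as before confirms this (again a throwaway Python check, not part of the notes):

def pigou_nash_via_tilde(r=10.0, steps=100001):
    """Minimise the auxiliary objective S~ = 0.5*f1^2 + 10*f2 over f1 in [0, r]."""
    f1_grid = (r * k / (steps - 1) for k in range(steps))
    return min(f1_grid, key=lambda f1: 0.5 * f1 * f1 + 10.0 * (r - f1))

print(pigou_nash_via_tilde())   # -> 10.0: all traffic on path 1, the Nash flow.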

Lemma 14.9:
Suppose φ is a flow at Nash equilibrium for traffic rates r1, . . . , rk. If the traffic rates are replaced by r1/2, . . . , rk/2, the flow φ/2 is a social optimum for these new rates.

Proof:
Let f be the link-representation of the flow φ. The flow φ/2 then has link-representation f/2 on all the links.

    hl(fl/2) = ((1/2) al fl + bl) (fl/2) = (1/2) h̃l(fl). ⇒

Because φ is a Nash flow and thus minimises S̃ = ∑_{l∈L} h̃l(fl), φ/2 minimises S = ∑_{l∈L} hl(fl/2) = (1/2) S̃. □

Corollary 14.10:
There always exists a Nash equilibrium, and its cost is unique.

Proof (existence):
S̃ is continuous and the space of feasible flows is closed and bounded, so S̃ attains a minimum, which by Prop. 14.7 is a Nash equilibrium.

Proof (uniqueness):
Suppose φ0 and φ1 are flows at Nash equilibrium.
From Prop. 14.7: φ0 and φ1 are global minima of S̃.
In particular, S̃(φ0) = S̃(φ1).
Consider φλ = (1 − λ)φ0 + λφ1, λ ∈ [0, 1].


    S̃(φ0) ≤ S̃(φλ) ≤ (1 − λ) S̃(φ0) + λ S̃(φ1) = S̃(φ0),

where the first inequality holds because φ0 is a global minimum and the second because S̃ is convex.

⇒ S̃(φλ) = (1 − λ) S̃(φ0) + λ S̃(φ1) for all λ ∈ [0, 1].

Let f0, f1 be the induced flows on the links. ⇒

    ∑_{l∈L} h̃l((1 − λ) f0,l + λ f1,l) = ∑_{l∈L} [(1 − λ) h̃l(f0,l) + λ h̃l(f1,l)].

Because all h̃l are convex, equality can only hold if

    h̃l((1 − λ) f0,l + λ f1,l) = (1 − λ) h̃l(f0,l) + λ h̃l(f1,l) for all l.

(Otherwise "=" would turn into "<".)

⇒ All h̃l(fl) = fl c̃l(fl) must be linear between f0,l and f1,l. ⇒ f0,l = f1,l or al = 0.
Since S(φ) = S̃(φ) + ∑_{l∈L} (1/2) al fl², and the second term agrees for f0 and f1 (the flows coincide on every link with al > 0), S̃(φ0) = S̃(φ1) implies

⇒ S(φ0) = S(φ1). □

14.5 How bad is selfish routing?

Definition 14.11:
Let φSO be a socially optimal flow and φNE a flow at Nash equilibrium for the same network. The ratio

    ρ = S(φNE) / S(φSO)

is called the Price of Anarchy.

Remark:
Because of Remark 14.4 and Corollary 14.10, ρ only depends on the network, not on φSO or φNE.

We now want to give an upper bound for ρ. The next two lemmas will be helpful.

Lemma 14.12:
Every flow φ satisfies S(φ/2) ≥ (1/4) S(φ).

Proof:
Let f be the flow induced on the links. Because hl(fl) = al fl² + bl fl with bl ≥ 0,

    hl(fl/2) = (1/4) al fl² + (1/2) bl fl ≥ (1/4) al fl² + (1/4) bl fl = (1/4) hl(fl). ⇒

    S(φ/2) = ∑_{l∈L} hl(fl/2) ≥ (1/4) ∑_{l∈L} hl(fl) = (1/4) S(φ). □

Lemma 14.13:
Let ri be the traffic rate from si to ti and φ∗ a socially optimal flow. Let f∗ be the induced flow on the links. Now consider traffic on the same network with the increased rates (1 + δ)ri, δ > 0. Every flow φ feasible for the augmented rates satisfies

    S(φ) ≥ S(φ∗) + δ ∑_{l∈L} h′l(f∗l) f∗l,

where h′l(fl) = 2al fl + bl is the marginal cost function defined at the beginning of Sec. 14.3.


Proof:
Let f be the flow on the links induced by φ.
All hl are convex. ⇒ hl(fl) ≥ hl(f∗l) + (fl − f∗l) h′l(f∗l), see Fig. 53. ⇒

    S(φ) = ∑_{l∈L} hl(fl) ≥ S(φ∗) + ∑_{l∈L} (fl − f∗l) h′l(f∗l).    (54)

Apply Proposition 14.3(C) to the flow φ/(1 + δ). ⇒

    (1/(1 + δ)) ∑_{l∈L} h′l(f∗l) fl ≥ ∑_{l∈L} h′l(f∗l) f∗l.    (55)

Inserting Eq. (55) into Eq. (54) proves the lemma.

Theorem 14.14:
The Price of Anarchy has an upper bound of 4/3:

    1 ≤ ρ ≤ 4/3.

Proof:
Let φNE be a flow at Nash equilibrium for traffic rates r1, . . . , rk.
Let φ be an arbitrary feasible flow for the same rates.
From Lemma 14.9, φ∗ = (1/2) φNE is optimal for traffic rates r1/2, . . . , rk/2.
Let fNE, f, f∗ be the link representations of φNE, φ, φ∗.
We can apply Lemma 14.13 with δ = 1,

    S(φ) ≥ S(φ∗) + ∑_{l∈L} h′l(f∗l) f∗l = S(φNE/2) + ∑_{l∈L} h′l(fNEl/2) (fNEl/2).

Next, apply Lemma 14.12 to the first term,

    S(φ) ≥ (1/4) S(φNE) + ∑_{l∈L} h′l(fNEl/2) (fNEl/2).

Finally, we use that h′l(fl) = 2al fl + bl and cl(fl) = al fl + bl,

    S(φ) ≥ (1/4) S(φNE) + (1/2) ∑_{l∈L} cl(fNEl) fNEl = (3/4) S(φNE). □

14.6 Braess paradox

Nash equilibrium in Fig. 55a:
Suppose there is a traffic rate r = 10 between the left and right nodes. ⇒
fa = fb = fc = fd = 5 ⇒
cost: Sa = fa² + 10 fb + 10 fc + fd² = 150.

Nash equilibrium in Fig. 55b:


Figure 55: The Braess paradox. The added link in (b) causes additional cost for everybody.

Let us try to reduce the cost by inserting a "perfect" road with cost ce = 0 from top to bottom. ⇒
In the Nash equilibrium, all vehicles will follow the path marked in red (i.e. fa = fd = fe = 10, fb = fc = 0). ⇒
cost: Sb = 10 fa + 10 fd = 200. ⇒
Counterintuitively, Sb > Sa.

Braess paradox: in a Nash equilibrium, network improvements can degrade network performance!

For linear costs, Theorem 14.14 gives an upper bound on the severity of the Braess paradox: added links can increase travel times at most by a factor of 4/3.
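As a final numeric sanity check (again a throwaway Python snippet, not part of the notes; the equilibrium splits are hard-coded from the reasoning above):

def braess_cost(r=10.0, with_shortcut=False):
    """Total travel time at Nash equilibrium in the Braess network of Fig. 55."""
    if not with_shortcut:
        x = r / 2.0                       # Symmetric split: fa = fb = fc = fd = r/2.
        return x * x + 10.0 * x + 10.0 * (r - x) + (r - x) ** 2
    # With the zero-cost shortcut, every driver takes a -> e -> d, since its
    # cost fa + fd = 20 never exceeds the alternatives fa + 10 or 10 + fd.
    return r * r + r * r                  # Links a and d both carry flow r.

print(braess_cost())                      # -> 150.0
print(braess_cost(with_shortcut=True))    # -> 200.0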

Remark:

• If costs are not linear, the Price of Anarchy is in general not bounded and the Braess paradox can be arbitrarily severe.

• For certain classes of non-linear convex cost functions, upper bounds for the Price of Anarchy can be proved. For details, see T. Roughgarden, Selfish Routing and the Price of Anarchy, The MIT Press, Cambridge (2005).
