Social Networks

SOCIAL NETWORKS: FIVE SHORT STORIES. LEONID ZHUKOV, NATIONAL RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS, [email protected]

description

Introductory talk about history, structure, dynamics and processes in social networks

Transcript of Social Networks

Page 1: Social Networks

SOCIAL NETWORKS: FIVE SHORT STORIES

LEONID ZHUKOV

NATIONAL RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS

[email protected]

Page 2: Social Networks

FIVE SHORT STORIES

SCIENTISTS AND POETS

THIS IS A SMALL WORLD

RICH GET RICHER

STRENGTH OF WEAK TIES

ECONOMICS OF FRIENDSHIP

FOLLOWING THE CROWD

Page 3: Social Networks

SCIENTISTS AND POETS: INTRODUCTORY STORY

Page 4: Social Networks

THE VERY BEGINNING

1736: LEONHARD EULER, KOENIGSBERG BRIDGES

1929: FRIGYES KARINTHY, “CHAINS” (“LÁNCSZEMEK”)

Page 5: Social Networks

60S AND 70S

1959: PAUL ERDOS, RANDOM NETWORKS

1967: STANLEY MILGRAM, SMALL WORLD

1973: MARK GRANOVETTER, STRENGTH OF WEAK TIES

Page 6: Social Networks

LAST 10 YEARS

ALBERT-LÁSZLÓ BARABÁSI, NORTHEASTERN, PHYSICS

DUNCAN WATTS, COLUMBIA, SOCIOLOGY

MARK NEWMAN, UNIV OF MICHIGAN, PHYSICS

JON KLEINBERG, CORNELL, COMPUTER SCIENCE

MATTHEW JACKSON, STANFORD, ECONOMICS

Page 7: Social Networks

SUBJECTS

COMPUTER SCIENCE: ALGORITHMS, GRAPH THEORY, GRAPH SEARCH, PATH LENGTHS, CONNECTED COMPONENTS, CLIQUES, GRAPH COLORING, ETC.

SOCIOLOGY: SOCIAL ROLES, STATUS, IDENTITY, COMMUNITIES, INFLUENCE, COHESIVENESS

PHYSICS: STATISTICS, PHASE TRANSITIONS, EVOLUTION MODELS, DYNAMICAL SYSTEMS

ECONOMICS: NETWORK GAMES, OPTIMALITY, EQUILIBRIUM

Page 8: Social Networks

COMPLEX NETWORKS

NETWORK (GRAPH): NODES AND CONNECTIONS (EDGES)

COMPLEX

NEITHER REGULAR NOR RANDOM

VARIOUS

UNIVERSAL

CLASS C NETWORKS IMAGE BY BARRETT LYON

Page 9: Social Networks

COMPLEX NETWORKS

PROTEIN-PROTEIN INTERACTION; MAP OF SCIENTIFIC JOURNALS

IMAGE BY HAWOONG JEONG; IMAGE BY JOHAN BOLLEN

Page 10: Social Networks

COMPLEX NETWORKS

TWITTER FOLLOWERS IMAGE BY BURAK ARIKAN

Page 11: Social Networks

THIS IS A SMALL WORLD: FIRST STORY

Page 12: Social Networks

SMALL WORLD

“THE SMALL-WORLD PROBLEM”. STANLEY MILGRAM. 1967.

“AN EXPERIMENTAL STUDY OF THE SMALL WORLD PROBLEM”, J. TRAVERS, S. MILGRAM, 1969

Page 13: Social Networks

1969 EXPERIMENT

296 VOLUNTEERS, 217 SENT

196 NEBRASKA (1300 MILES)

100 BOSTON (25 MILES)

TARGET IN BOSTON

vaguely ‘out there,’ on the Great Plains or somewhere.” There was little consensus about how many links it would take to connect people from these remote areas. Milgram himself pointed out in 1969, “Recently I asked a person of intelligence how many steps he thought it would take, and he said that it would require 100 intermediate persons, or more, to move from Nebraska to Sharon.”

Milgram’s experiment entailed sending letters to randomly chosen residents of Wichita and Omaha asking them to participate in a study of social contact in American society. The letter contained a short summary of the study’s purpose, a photograph, and the name and address of and other information about one of the target persons, along with the following four-step instructions:

HOW TO TAKE PART IN THIS STUDY

1. ADD YOUR NAME TO THE ROSTER AT THE BOTTOM OF THIS SHEET, so that the next person who receives this letter will know who it came from.

2. DETACH ONE POSTCARD. FILL IT OUT AND RETURN IT TO HARVARD UNIVERSITY. No stamp is needed. The postcard is very important. It allows us to keep track of the progress of the folder as it moves toward the target person.

3. IF YOU KNOW THE TARGET PERSON ON A PERSONAL BASIS, MAIL THIS FOLDER DIRECTLY TO HIM (HER). Do this only if you have previously met the target person and know each other on a first name basis.

4. IF YOU DO NOT KNOW THE TARGET PERSON ON A PERSONAL BASIS, DO NOT TRY TO CONTACT HIM DIRECTLY. INSTEAD, MAIL THIS FOLDER (POSTCARDS AND ALL) TO A PERSONAL ACQUAINTANCE WHO IS MORE LIKELY THAN YOU TO KNOW THE TARGET PERSON. You may send the folder to a friend, relative or acquaintance, but it must be someone you know on a first name basis.

LINKED 28

Milgram had a pressing concern: Would any of the letters make it to the target? If the number of links was indeed around one hundred, as his friend guessed, then the experiment would likely fail, since there is always someone along such a long chain who does not cooperate. It was therefore a pleasant surprise when within a few days the first letter arrived, passing through only two intermediate links! This would turn out to be the shortest path ever recorded, but eventually 42 of the 160 letters made it back, some requiring close to a dozen intermediates. These completed chains allowed Milgram to determine the number of people required to get the letter to the target. He found that the median number of intermediate persons was 5.5, a very small number indeed, and coincidentally, amazingly close to Karinthy’s suggestion. Round it up to 6, however, and you get the famous “six degrees of separation.”

As Thomas Blass, a social psychologist who has devoted the last fifteen years to in-depth research on the life and work of Stanley Milgram, pointed out to me, Milgram himself never used the phrase “six degrees of separation.” John Guare originated the term in his brilliant 1991 play of that title. After an extremely successful season on Broadway, the play was made into a movie with the same title. In the play, Ouisa (played by Stockard Channing in the movie), musing about our interconnectedness, tells her daughter, “Everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. The president of the United States. A gondolier in Venice. . . . It’s not just the big names. It’s anyone. A native in a rain forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail of six people. It’s a profound thought. . . . How every person is a new door opening up into other worlds.”

Milgram’s study was confined to the United States, linking people “out there” in Wichita and Omaha to “over here” in Boston. For Guare’s Ouisa, however, six degrees applied to the whole world. Thus a myth was born. Because more people watch movies than read sociology papers, Guare’s version has prevailed in popular thought.

Six Degrees of Separation 29

NAME, ADDRESS, OCCUPATION, JOB, HOMETOWN

Page 14: Social Networks

1969 EXPERIMENT

REACHED THE TARGET N = 64, 29%

AVE CHAIN LENGTH <L> = 5.2

CHANNELS:

HOMETOWN <L> = 6.1

BUSINESS CONTACTS <L> = 4.6

LOCATION:

BOSTON <L> = 4.4

NEBRASKA <L> = 5.7

Page 15: Social Networks

SIX DEGREES OF SEPARATION

DUNCAN WATTS, 2001, EMAIL, 48,000 SENDERS, <L> ~ 6

JURE LESKOVEC AND ERIC HORVITZ, 2007, MSN MESSENGER, 240 MLN USERS, <L> = 6.6

YAHOO, 2011, “YAHOO RESEARCH SMALL WORLD EXPERIMENT” ON FACEBOOK :)

GRAPH DIAMETER D, AVE PATH LENGTH <L>

CO-AUTHORSHIP NETWORK

IMAGE BY LOTHAR KREMPEL
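Both quantities on this slide can be computed with a breadth-first search from every node. The sketch below is a minimal illustration on a hypothetical four-author graph, not the Krempel co-authorship data. D is the largest shortest-path distance, and <L> is the mean shortest-path distance over all pairs of distinct nodes that can reach each other. The function names are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_avg_path(adj):
    """Return (D, <L>): the longest and the mean shortest-path length over reachable pairs."""
    total = pairs = diameter = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, total / pairs

# Hypothetical toy co-authorship graph: authors a-b-c-d, plus a joint paper b-d.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

D, L = diameter_and_avg_path(adj)
print(D, round(L, 3))  # D = 2, <L> = 8/6
```

On real data with millions of nodes, an all-pairs search like this is too slow. A common workaround is to estimate <L> from BFS runs started at a random sample of source nodes.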

Page 16: Social Networks

CAYLEY TREE (MOORE GRAPH)

6 26 106

A ROUGH ESTIMATE:

EACH HAS D FRIENDS: D^K = N

K = LOG N / LOG D, N = 6 BLN

D = 50 FRIENDS -> K ~ 5.8

EXACT:
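The rough estimate can be checked numerically. The sketch below also counts the nodes of a Cayley tree directly. It assumes every node has exactly D neighbours, so level i ≥ 1 of the tree holds D(D − 1)^(i−1) nodes. This may not be the exact expression from the original slide, and the helper names are hypothetical. Each friend spends one of its links on the person who reached it, so the tree grows a little more slowly than D^K. Even so, for D = 50 and N = 6 billion it still needs only 6 steps.

```python
import math

def rough_hops(d, n):
    """Rough estimate from the slide: d**K = N, so K = log N / log d."""
    return math.log(n) / math.log(d)

def cayley_reach(d, k):
    """Nodes within k hops of the root of a Cayley tree where every node has d neighbours.

    Level 0 is the root; level i >= 1 holds d * (d - 1)**(i - 1) nodes.
    """
    return 1 + sum(d * (d - 1) ** (i - 1) for i in range(1, k + 1))

def hops_needed(d, n):
    """Smallest k such that the Cayley tree reaches at least n nodes."""
    k = 0
    while cayley_reach(d, k) < n:
        k += 1
    return k

N = 6_000_000_000  # world population used on the slide (6 billion)
d = 50             # friends per person
print(round(rough_hops(d, N), 2))  # about 5.76
print(hops_needed(d, N))           # 6
```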

Page 17: Social Networks

SMALL WORLD MODEL

WATTS-STROGATZ MODEL

SOLVABLE MODEL

SMALL WORLD: <L>~ LOG(N)

“COLLECTIVE DYNAMICS OF ‘SMALL-WORLD’ NETWORKS”, D.J. WATTS, S.H. STROGATZ. 1998
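The Watts-Strogatz construction can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' code:

- The network is a ring of n = 200 nodes. Each node starts linked to its 2 nearest neighbours on each side (k = 4).
- Each lattice edge, with probability p, has one end moved to a uniformly chosen node. Self-loops and duplicate edges are not allowed.
- The helper names are hypothetical.

As p grows from 0, <L> drops sharply toward the small-world regime. The clustering coefficient C is 0.5 for this lattice and stays high for small p.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k // 2 nearest neighbours on either side (p = 0)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def rewire(adj, p, rng):
    """With probability p, move end v of each lattice edge (u, v) to a random node w."""
    n = len(adj)
    lattice_edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in lattice_edges:
        if rng.random() < p:
            # no self-loops and no duplicate edges
            choices = [w for w in range(n) if w != u and w not in adj[u]]
            if choices:
                w = rng.choice(choices)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def avg_path_length(adj):
    """Mean shortest-path length <L> over ordered pairs of mutually reachable nodes (BFS)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average local clustering C: fraction of each node's neighbour pairs that are linked."""
    values = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

rng = random.Random(42)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 4), p, rng)
    print(f"p={p}: <L>={avg_path_length(g):.2f}  C={clustering(g):.3f}")
```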

Page 18: Social Networks

RICH GET RICHER: SECOND STORY

Page 19: Social Networks

SIMPLE HYPOTHESIS

WEB SEARCH 1999:

LYCOS, 1994; ALTAVISTA, 1995; YAHOO, 1995; INKTOMI, 1996; GOOGLE, 1998; ...

RAMBLER, 1996; YANDEX, 1997

EACH PAGE LINKS INDEPENDENTLY AT RANDOM; CLT -> NORMAL DISTRIBUTION

“EMERGENCE OF SCALING IN RANDOM NETWORKS”. A-L BARABASI, R ALBERT. 1999
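The "simple hypothesis" on this slide (each page links independently at random) can be tried out in a few lines. This is a minimal sketch with illustrative values: 10,000 pages, each linking to 20 distinct other pages chosen uniformly at random. The function name is hypothetical. Every in-degree is then a sum of many independent yes/no link decisions. Its distribution is therefore roughly bell-shaped around the mean, with about 95% of pages within two standard deviations, and no page comes anywhere near hub status.

```python
import random
from statistics import mean, pstdev

def random_web_indegrees(n_pages, links_per_page, rng):
    """In-degrees when every page links to `links_per_page` distinct other pages, chosen uniformly."""
    indeg = [0] * n_pages
    for page in range(n_pages):
        for t in rng.sample(range(n_pages - 1), links_per_page):
            # map 0..n-2 onto every page except `page`, so there are no self-links
            indeg[t + (t >= page)] += 1
    return indeg

rng = random.Random(0)
indeg = random_web_indegrees(10_000, 20, rng)
mu, sigma = mean(indeg), pstdev(indeg)
share_within_2sd = sum(abs(k - mu) <= 2 * sigma for k in indeg) / len(indeg)
print(mu, round(sigma, 2), max(indeg), round(share_within_2sd, 3))
```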

RANDOM VERSUS SCALE-FREE NETWORKS

RANDOM NETWORKS, which resemble the U.S. highway system (simplified in left map), consist of nodes with randomly placed connections. In such systems, a plot of the distribution of node linkages will follow a bell-shaped curve (left graph), with most nodes having approximately the same number of links.

In contrast, scale-free networks, which resemble the U.S. airline system (simplified in right map), contain hubs (red): nodes with a very high number of links. In such networks, the distribution of node linkages follows a power law (center graph) in that most nodes have just a few connections and some have a tremendous number of links. In that sense, the system has no "scale." The defining characteristic of such networks is that the distribution of links, if plotted on a double-logarithmic scale (right graph), results in a straight line.

[Figure: Random Network and Scale-Free Network maps; Bell Curve Distribution of Node Linkages (number of nodes vs. number of links)]

Specifically, a power law does not have a peak, as a bell curve does, but is instead described by a continuously decreasing function. When plotted on a double-logarithmic scale, a power law is a straight line [see illustration above]. In contrast to the democratic distribution of links seen in random networks, power laws describe systems in which a few hubs, such as Yahoo and Google, dominate.

Hubs are simply forbidden in random networks. When we began to map the Web, we expected the nodes to follow a bell-shaped distribution, as do people's heights. Instead we discovered certain nodes that defied explanation, almost as if we had stumbled on a significant number of people who were 100 feet tall, thus prompting us to coin the term "scale-free."
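The straight-line signature can be reproduced with synthetic data. This is a minimal sketch assuming a continuous power law with illustrative values γ = 2.5 and x_min = 1:

- Samples are drawn by inverse-transform sampling.
- The code fits the slope of the complementary cumulative distribution on log-log axes. For a power law this slope should be close to −(γ − 1).
- It also computes the standard maximum-likelihood estimate γ̂ = 1 + n / Σ ln(x_i / x_min), which is usually more reliable than a fitted slope.

The function names are hypothetical.

```python
import math
import random

def sample_power_law(n, gamma, x_min, rng):
    """Draw n values with density p(x) ~ x**(-gamma), x >= x_min, by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]

def mle_exponent(xs, x_min):
    """Continuous maximum-likelihood estimate: gamma_hat = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def loglog_ccdf_slope(xs, x_min):
    """Least-squares slope of ln P(X >= x) against ln x; a power law gives slope -(gamma - 1)."""
    tail = sorted(x for x in xs if x >= x_min)
    n = len(tail)
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(tail)]
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

rng = random.Random(7)
xs = sample_power_law(50_000, gamma=2.5, x_min=1.0, rng=rng)
print(round(mle_exponent(xs, 1.0), 3), round(loglog_ccdf_slope(xs, 1.0), 3))
```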

www.sciam.com

[Figure: Power Law Distribution of Node Linkages (number of nodes vs. number of links, on linear and log scales)]

Scale-Free Networks Abound

OVER THE PAST several years, researchers have uncovered scale-free structures in a stunning range of systems. When we studied the World Wide Web, we looked at the virtual network of Web pages connected to one another by hyperlinks. In contrast, Michalis Faloutsos of the University of California at Riverside, Petros Faloutsos of the University of Toronto and Christos Faloutsos of Carnegie Mellon University analyzed the physical structure of the Internet. These three computer-scientist brothers investigated the routers connected by optical or other communications lines and found that the topology of that network, too, is scale-free.

Researchers have also discovered that some social networks are scale-free. A collaboration between scientists from Boston University and Stockholm University, for instance, has shown that a network of sexual relationships among people in Sweden followed a power law: although most individuals had only a few sexual partners during their lifetime, a few (the hubs) had hundreds. A recent study led by Stefan Bornholdt of the University of Kiel in Germany concluded that the network of people connected by e-mail is likewise scale-free. Sidney Redner of Boston University demonstrated that the network of scientific papers, connected by citations, follows a power law as well. And Mark Newman of the University of Michigan at Ann Arbor examined collaborations among scientists in several

SCIENTIFIC AMERICAN 53

Page 20: Social Networks

POWER LAW DISTRIBUTION

ing systems form a huge genetic network whose vertices are proteins and genes, the chemical interactions between them representing edges (2). At a different organizational level, a large network is formed by the nervous system, whose vertices are the nerve cells, connected by axons (3). But equally complex networks occur in social science, where vertices are individuals or organizations and the edges are the social interactions between them (4), or in the World Wide Web (WWW), whose vertices are HTML documents connected by links pointing from one page to another (5, 6). Because of their large size and the complexity of their interactions, the topology of these networks is largely unknown.

Traditionally, networks of complex topology have been described with the random graph theory of Erdős and Rényi (ER) (7), but in the absence of data on large networks, the predictions of the ER theory were rarely tested in the real world. However, driven by the computerization of data acquisition, such topological information is increasingly available, raising the possibility of understanding the dynamical and topological stability of large networks.

Here we report on the existence of a high degree of self-organization characterizing the large-scale properties of complex networks. Exploring several large databases describing the topology of large networks that span fields as diverse as the WWW or citation patterns in science, we show that, independent of the system and the identity of its constituents, the probability $P(k)$ that a vertex in the network interacts with $k$ other vertices decays as a power law, following $P(k) \sim k^{-\gamma}$. This result indicates that large networks self-organize into a scale-free state, a feature unpredicted by all existing random network models. To explain the origin of this scale invariance, we show that existing network models fail to incorporate growth and preferential attachment, two key features of real networks. Using a model incorporating these two ingredients, we show that they are responsible for the power-law scaling observed in real networks. Finally, we argue that these ingredients play an easily identifiable and important role in the formation of many complex systems, which implies that our results are relevant to a large class of networks observed in nature.

Although there are many systems that form complex networks, detailed topological data is available for only a few. The collaboration graph of movie actors represents a well-documented example of a social network. Each actor is represented by a vertex, two actors being connected if they were cast together in the same movie. The probability that an actor has $k$ links (characterizing his or her popularity) has a power-law tail for large $k$, following $P(k) \sim k^{-\gamma_{\text{actor}}}$, where $\gamma_{\text{actor}} = 2.3 \pm 0.1$ (Fig. 1A). A more complex network with over 800 million vertices (8) is the WWW, where a vertex is a document and the edges are the links pointing from one document to another. The topology of this graph determines the Web’s connectivity and, consequently, our effectiveness in locating information on the WWW (5). Information about $P(k)$ can be obtained using robots (6), indicating that the probability that $k$ documents point to a certain Web page follows a power law, with $\gamma_{\text{www}} = 2.1 \pm 0.1$ (Fig. 1B) (9). A network whose topology reflects the historical patterns of urban and industrial development is the electrical power grid of the western United States, the vertices being generators, transformers, and substations and the edges being the high-voltage transmission lines between them (10). Because of the relatively modest size of the network, containing only 4941 vertices, the scaling region is less prominent but is nevertheless approximated by a power law with an exponent $\gamma_{\text{power}} \simeq 4$ (Fig. 1C). Finally, a rather large complex network is formed by the citation patterns of the scientific publications, the vertices being papers published in refereed journals and the edges being links to the articles cited in a paper. Recently Redner (11) has shown that the probability that a paper is cited $k$ times (representing the connectivity of a paper within the network) follows a power law with exponent $\gamma_{\text{cite}} = 3$.

The above examples (12) demonstrate that many large random networks share the common feature that the distribution of their local connectivity is free of scale, following a power law for large $k$ with an exponent $\gamma$ between 2.1 and 4, which is unexpected within the framework of the existing network models. The random graph model of ER (7) assumes that we start with $N$ vertices and connect each pair of vertices with probability $p$. In the model, the probability that a vertex has $k$ edges follows a Poisson distribution $P(k) = e^{-\lambda}\lambda^k/k!$, where

$$\lambda = N\binom{N-1}{k}p^k(1-p)^{N-1-k}.$$
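In the ER model, each of the other N − 1 vertices is linked to a given vertex independently with probability p. A vertex's degree is therefore binomially distributed. For large N and small p this is well approximated by a Poisson distribution with mean ⟨k⟩ = p(N − 1).

The sketch below compares the two numerically. It uses that standard mean-degree parametrization, which is an assumption for illustration rather than the λ expression quoted above, and the illustrative values N = 10,000 and p = 0.0005. Note how quickly the probability of a large degree vanishes: this is the "practically absent" high-degree tail discussed next.

```python
import math

def binomial_degree_pmf(k, n, p):
    """Exact ER degree distribution: C(N-1, k) p^k (1-p)^(N-1-k)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(k, mean_degree):
    """Poisson approximation e^(-<k>) <k>^k / k!."""
    return math.exp(-mean_degree) * mean_degree**k / math.factorial(k)

n, p = 10_000, 0.0005
mean_degree = p * (n - 1)  # <k> = p (N - 1), about 5 here
for k in (0, 5, 10, 20, 50):
    print(k, f"{binomial_degree_pmf(k, n, p):.3e}", f"{poisson_pmf(k, mean_degree):.3e}")
```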

In the small-world model recently introduced by Watts and Strogatz (WS) (10), $N$ vertices form a one-dimensional lattice, each vertex being connected to its two nearest and next-nearest neighbors. With probability $p$, each edge is reconnected to a vertex chosen at random. The long-range connections generated by this process decrease the distance between the vertices, leading to a small-world phenomenon (13), often referred to as six degrees of separation (14). For $p = 0$, the probability distribution of the connectivities is $P(k) = \delta(k - z)$, where $z$ is the coordination number in the lattice; whereas for finite $p$, $P(k)$ still peaks around $z$, but it gets broader (15). A common feature of the ER and WS models is that the probability of finding a highly connected vertex (that is, a large $k$) decreases exponentially with $k$; thus, vertices with large connectivity are practically absent. In contrast, the power-law tail characterizing $P(k)$ for the networks studied indicates that highly connected (large $k$) vertices have a large chance of occurring, dominating the connectivity.

There are two generic aspects of real networks that are not incorporated in these models. First, both models assume that we start with a fixed number ($N$) of vertices that are then randomly connected (ER model), or reconnected (WS model), without modifying $N$. In contrast, most real world networks are open and they form by the continuous addition of new vertices to the system, thus the number of vertices $N$ increases throughout the lifetime of the network. For example, the actor network grows by the addition of new actors to the system, the WWW grows exponentially over time by the addition of new Web pages (8), and the research literature constantly grows by the publication of new papers. Consequently, a common feature of

Fig. 1. The distribution function of connectivities for various large networks. (A) Actor collaboration graph with $N = 212{,}250$ vertices and average connectivity $\langle k \rangle = 28.78$. (B) WWW, $N = 325{,}729$, $\langle k \rangle = 5.46$ (6). (C) Power grid data, $N = 4941$, $\langle k \rangle = 2.67$. The dashed lines have slopes (A) $\gamma_{\text{actor}} = 2.3$, (B) $\gamma_{\text{www}} = 2.1$ and (C) $\gamma_{\text{power}} = 4$.

REPORTS

15 OCTOBER 1999 VOL 286 SCIENCE www.sciencemag.org 510


DISTRIBUTION FUNCTION

ACTOR COLLABORATION GAMMA = 2.3

POWER GRID GAMMA = 4

WWW GAMMA = 2.1

Page 21: Social Networks

“GRAPH STRUCTURE IN THE WEB”, ANDREI BRODER, RAVI KUMAR, ET AL. 2000.

GRAPH STRUCTURE OF THE WEB

Page 22: Social Networks

SCALE FREE NETWORKS

[Figure: twelve log-log panels (a) to (l), plotted against word frequency, citations, web hits, books sold, telephone calls received, earthquake magnitude, crater diameter in km, peak intensity, intensity, net worth in US dollars, name frequency, and population of city]

FIG. 4 Cumulative distributions or “rank/frequency plots” of twelve quantities reputed to follow power laws. The distributions were computed as described in Appendix A. Data in the shaded regions were excluded from the calculations of the exponents in Table I. Source references for the data are given in the text. (a) Numbers of occurrences of words in the novel Moby Dick by Herman Melville. (b) Numbers of citations to scientific papers published in 1981, from time of publication until June 1997. (c) Numbers of hits on web sites by 60 000 users of the America Online Internet service for the day of 1 December 1997. (d) Numbers of copies of bestselling books sold in the US between 1895 and 1965. (e) Number of calls received by AT&T telephone customers in the US for a single day. (f) Magnitude of earthquakes in California between January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a power law even though the horizontal axis is linear. (g) Diameter of craters on the moon. Vertical axis is measured per square kilometre. (h) Peak gamma-ray intensity of solar flares in counts per second, measured from Earth orbit between February 1980 and November 1989. (i) Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the participating countries. (j) Aggregate net worth in dollars of the richest individuals in the US in October 2003. (k) Frequency of occurrence of family names in the US in the year 1990. (l) Populations of US cities in the year 2000.

STEVEN H. STROGATZ, 2001; MARK E.J. NEWMAN, 2006

Page 23: Social Networks

PREFERENTIAL ATTACHMENT

BARABASI ALBERT MODEL

GROWING NETWORK: ON EVERY STEP A NEW NODE IS ADDED THAT LINKS TO EXISTING NODES

PREFERENTIAL ATTACHMENT: PROBABILITY OF CONNECTION TO A NODE IS PROPORTIONAL TO THE NODE DEGREE

BIRTH OF A SCALE-FREE NETWORK. A SCALE-FREE NETWORK grows incrementally from two to 11 nodes in this example. When deciding where to establish a link, a new node (green) prefers to attach to an existing node (red) that already has many other connections. These two basic mechanisms, growth and preferential attachment, will eventually lead to the system's being dominated by hubs, nodes having an enormous number of links.


connected actors are more likely to be chosen for new roles. On the Internet the more connected routers, which typically have greater bandwidth, are more desirable for new users. In the U.S. biotech industry, well-established companies such as Genzyme tend to attract more alliances, which further increases their desirability for future partnerships. Likewise, the most cited articles in the scientific literature stimulate even more researchers to read and cite them, a phenomenon that noted sociologist Robert K. Merton called the Matthew effect, after a passage in the New Testament: "For unto every one that hath shall be given, and he shall have abundance."

These two mechanisms, growth and preferential attachment, help to explain the existence of hubs: as new nodes appear, they tend to connect to the more connected sites, and these popular locations thus acquire more links over time than their less connected neighbors. And this "rich get richer" process will generally favor the early nodes, which are more likely to eventually become hubs.

Along with Reka Albert, we have used computer simulations and calculations to show that a growing network with preferential attachment will indeed become scale-free, with its distribution of nodes following a power law. Although this theoretical model is simplistic and needs to be adapted to specific situations, it does appear to confirm our explanation for why scale-free networks are so ubiquitous in the real world.

Growth and preferential attachment can even help explicate the presence of scale-free networks in biological systems. Andreas Wagner of the University of New Mexico and David A. Fell of Oxford Brookes University in England have found, for instance, that the most-connected molecules in the E. coli metabolic network tend to have an early evolutionary history: some are believed to be remnants of the so-called RNA world (the evolutionary step before the emergence of DNA), and others are components of the most ancient metabolic pathways.

Interestingly, the mechanism of preferential attachment tends to be linear. In other words, a new node is twice as likely to link to an existing node that has twice as many connections as its neighbor. Redner and his colleagues at Boston University and elsewhere have investigated different types of preferential attachment and have learned that if the mechanism is faster than linear (for example, a new node is four times as likely to link to an existing node that has twice as many connections), one hub will tend to run away with the lion's share of connections. In such "winner take all" scenarios, the network eventually assumes a star topology with a central hub.
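The contrast between linear and faster-than-linear attachment can be seen in a toy simulation. This is a sketch under simplifying assumptions, not Redner and colleagues' model:

- Each new node brings a single edge, so the network is a tree.
- A new node links to node i with probability proportional to k_i**alpha. Here alpha = 1 is the linear case, and alpha = 2 is the "twice the connections, four times as likely" case from the text.
- The names `grow` and `hub_share` are hypothetical.

With alpha = 2 the best-connected node ends up touching almost every edge, which is the star topology described above.

```python
import random

def grow(n_nodes, alpha, rng):
    """Grow a tree one node at a time; each new node links to node i with probability ~ k_i**alpha."""
    degree = [1, 1]  # seed: two nodes joined by one edge
    for _ in range(2, n_nodes):
        weights = [k ** alpha for k in degree]
        target = rng.choices(range(len(degree)), weights=weights)[0]
        degree[target] += 1
        degree.append(1)  # the newcomer arrives with its single edge
    return degree

def hub_share(degree):
    """Fraction of all edges touching the best-connected node (a tree on n nodes has n - 1 edges)."""
    return max(degree) / (len(degree) - 1)

rng = random.Random(3)
for alpha in (1.0, 2.0):  # linear vs. "twice the links, four times as likely"
    print(alpha, round(hub_share(grow(2000, alpha, rng)), 3))
```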

An Achilles' Heel

AS HUMANITY BECOMES increasingly dependent on power grids and communications webs, a much-voiced concern arises: Exactly how reliable are these types of networks? The good news is that complex systems can be amazingly resilient against accidental failures. In fact, although hundreds of routers routinely malfunction on the Internet at any moment, the network rarely suffers major disruptions. A similar degree of robustness characterizes living systems: people rarely notice the consequences of thousands of errors in their cells, ranging from mutations to misfolded proteins. What is the origin of this robustness?

Intuition tells us that the breakdown of a substantial number of nodes will result in a network's inevitable fragmentation.

ALBERT-LÁSZLÓ BARABÁSI and ERIC BONABEAU study the behavior and characteristics of myriad complex systems, ranging from the Internet to insect colonies. Barabási is Emil T. Hofman Professor of Physics at the University of Notre Dame, where he directs research on complex networks. He is author of Linked: The New Science of Networks. Bonabeau is chief scientist at Icosystem, a consulting firm based in Cambridge, Mass., that applies the tools of complexity science to the discovery of business opportunities. He is co-author of Swarm Intelligence: From Natural to Artificial Systems. This is Bonabeau's second article for Scientific American.

(1999) focuses on the dynamics of node degrees, followed by the master-equation approach of Dorogovtsev, Mendes, and Samukhin (2000a) and the rate-equation approach introduced by Krapivsky, Redner, and Leyvraz (2000). As these methods are often used interchangeably in the subsequent section, we briefly review each of them.

Continuum theory: The continuum approach introduced by Barabasi and Albert (1999) and Barabasi, Albert, and Jeong (1999) calculates the time dependence of the degree $k_i$ of a given node $i$. This degree will increase every time a new node enters the system and links to node $i$, the probability of this process being $\Pi(k_i)$. Assuming that $k_i$ is a continuous real variable, the rate at which $k_i$ changes is expected to be proportional to $\Pi(k_i)$. Consequently $k_i$ satisfies the dynamical equation

$$\frac{\partial k_i}{\partial t} = m\,\Pi(k_i) = m\,\frac{k_i}{\sum_{j=1}^{N-1} k_j}. \tag{79}$$

The sum in the denominator goes over all nodes in the system except the newly introduced one; thus its value is $\sum_j k_j = 2mt - m$, leading to

$$\frac{\partial k_i}{\partial t} = \frac{k_i}{2t}. \tag{80}$$

The solution of this equation, with the initial condition that every node $i$ at its introduction has $k_i(t_i) = m$, is

$$k_i(t) = m\left(\frac{t}{t_i}\right)^{\beta}, \qquad \text{with } \beta = \frac{1}{2}. \tag{81}$$

Equation (81) indicates that the degree of all nodes evolves the same way, following a power law, the only difference being the intercept of the power law.
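The step from Eq. (80) to Eq. (81) can be checked numerically. This minimal sketch iterates the expected-degree update of a node born at time t_i. Each new node adds m·Π(k_i) ≈ m·k_i/(2ms) = k_i/(2s) to that degree, which is Eq. (80) written as a difference equation with Σ_j k_j ≈ 2ms. The result is compared with m(t/t_i)^(1/2). The function names are hypothetical. Note that m cancels in the update, matching the observation that all nodes follow the same law.

```python
import math

def mean_field_degree(m, t_i, t):
    """Step the expected degree of a node born at t_i: each new node adds m * k / (2 m s) = k / (2 s)."""
    k = float(m)                  # initial condition k_i(t_i) = m
    for s in range(t_i, t):
        k += m * k / (2 * m * s)  # m * Pi(k_i) with sum_j k_j ≈ 2 m s, Eq. (80) as a difference equation
    return k

def continuum_degree(m, t_i, t):
    """Eq. (81): k_i(t) = m (t / t_i)**beta with beta = 1/2."""
    return m * math.sqrt(t / t_i)

for t_i in (100, 1_000):
    iterated = mean_field_degree(5, t_i, 100_000)
    predicted = continuum_degree(5, t_i, 100_000)
    print(t_i, round(iterated, 2), round(predicted, 2))
```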

Using Eq. (81), one can write the probability that a node has a degree $k_i(t)$ smaller than $k$, $P[k_i(t) < k]$, as

$$P[k_i(t) < k] = P\left(t_i > \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right). \tag{82}$$

Assuming that we add the nodes at equal time intervals to the network, the $t_i$ values have a constant probability density

$$P(t_i) = \frac{1}{m_0 + t}. \tag{83}$$

Substituting this into Eq. (82) we obtain

$$P\left(t_i > \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right) = 1 - \frac{m^{1/\beta}\,t}{k^{1/\beta}\,(t + m_0)}. \tag{84}$$

The degree distribution $P(k)$ can be obtained using

$$P(k) = \frac{\partial P[k_i(t) < k]}{\partial k} = \frac{2m^{1/\beta}\,t}{m_0 + t}\,\frac{1}{k^{1/\beta + 1}}, \tag{85}$$

predicting that asymptotically ($t \to \infty$)

$$P(k) \sim 2m^{1/\beta}\,k^{-\gamma}, \qquad \text{with } \gamma = \frac{1}{\beta} + 1 = 3, \tag{86}$$

being independent of $m$, in agreement with the numerical results.
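A quick consistency check on Eq. (86), added as a worked step rather than taken from the source, helps here. With $\beta = 1/2$ the asymptotic distribution is $P(k) = 2m^2 k^{-3}$. Take its support as $k \ge m$, since every node enters with degree $m$. It is then normalized, and its mean degree is $2m$, as it must be when each new node adds $m$ edges and every edge contributes two ends:

$$\int_m^\infty 2m^2 k^{-3}\,dk = 2m^2\left[-\frac{1}{2k^2}\right]_m^\infty = 1, \qquad \langle k\rangle = \int_m^\infty k\cdot 2m^2 k^{-3}\,dk = 2m^2\left[-\frac{1}{k}\right]_m^\infty = 2m.$$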

FIG. 21. Numerical simulations of network evolution: (a) Degree distribution of the Barabasi-Albert model, with $N = m_0 + t = 300\,000$ and $m_0 = m = 1$, $m_0 = m = 3$, $m_0 = m = 5$, and $m_0 = m = 7$. The slope of the dashed line is $\gamma = 2.9$, providing the best fit to the data. The inset shows the rescaled distribution (see text) $P(k)/2m^2$ for the same values of $m$, the slope of the dashed line being $\gamma = 3$; (b) $P(k)$ for $m_0 = m = 5$ and various system sizes, $N = 100\,000$, $N = 150\,000$, $N = 200\,000$. The inset shows the time evolution for the degree of two vertices, added to the system at $t_1 = 5$ and $t_2 = 95$. Here $m_0 = m = 5$, and the dashed line has slope 0.5, as predicted by Eq. (81). After Barabasi, Albert, and Jeong (1999).

72 R. Albert and A.-L. Barabasi: Statistical mechanics of complex networks

Rev. Mod. Phys., Vol. 74, No. 1, January 2002

broad range of $p$ is underlined by the results concerning the spectral properties of the Laplacian operator, which tell us about the time evolution of a diffusive field on the graph (Monasson, 2000).

VII. SCALE-FREE NETWORKS

The empirical results discussed in Sec. II demonstrate that many large networks are scale free, that is, their degree distribution follows a power law for large $k$. Furthermore, even for those networks for which $P(k)$ has an exponential tail, the degree distribution significantly deviates from a Poisson distribution. We have seen in Secs. III.D and VI.B.3 that random-graph theory and the WS model cannot reproduce this feature. While it is straightforward to construct random graphs that have a power-law degree distribution (Sec. V), these constructions only postpone an important question: what is the mechanism responsible for the emergence of scale-free networks? We shall see in this section that answering this question will require a shift from modeling network topology to modeling the network assembly and evolution. While at this point these two approaches do not appear to be particularly distinct, we shall find that there is a fundamental difference between the modeling approach we took in random graphs and the small-world models, and the one required to reproduce the power-law degree distribution. While the goal of the former models is to construct a graph with correct topological features, the modeling of scale-free networks will put the emphasis on capturing the network dynamics. That is, the underlying assumption behind evolving or dynamic networks is that if we capture correctly the processes that assembled the networks that we see today, then we will obtain their topology correctly as well. Dynamics takes the driving role, topology being only a by-product of this modeling philosophy.

A. The Barabasi-Albert model

The origin of the power-law degree distribution observed in networks was first addressed by Barabasi and Albert (1999), who argued that the scale-free nature of real networks is rooted in two generic mechanisms shared by many real networks. The network models discussed thus far assume that we start with a fixed number $N$ of vertices that are then randomly connected or rewired, without modifying $N$. In contrast, most real-world networks describe open systems that grow by the continuous addition of new nodes. Starting from a small nucleus of nodes, the number of nodes increases throughout the lifetime of the network by the subsequent addition of new nodes. For example, the World Wide Web grows exponentially in time by the addition of new web pages, and the research literature constantly grows by the publication of new papers.

Second, network models discussed so far assume that the probability that two nodes are connected (or their connection is rewired) is independent of the nodes’ degree, i.e., new edges are placed randomly. Most real networks, however, exhibit preferential attachment, such that the likelihood of connecting to a node depends on the node’s degree. For example, a web page will more likely include hyperlinks to popular documents with already high degrees, because such highly connected documents are easy to find and thus well known, or a new manuscript is more likely to cite well-known and thus much-cited publications than less-cited and consequently less-known papers.

These two ingredients, growth and preferential attachment, inspired the introduction of the Barabasi-Albert model, which led for the first time to a network with a power-law degree distribution. The algorithm of the Barabasi-Albert model is the following:

(1) Growth: Starting with a small number ($m_0$) of nodes, at every time step, we add a new node with $m$ ($\le m_0$) edges that link the new node to $m$ different nodes already present in the system.

(2) Preferential attachment: When choosing the nodes to which the new node connects, we assume that the probability $\Pi$ that a new node will be connected to node $i$ depends on the degree $k_i$ of node $i$, such that

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}. \qquad (78)$$

After $t$ time steps this procedure results in a network with $N = t + m_0$ nodes and $mt$ edges. Numerical simulations indicated that this network evolves into a scale-invariant state with the probability that a node has $k$ edges following a power law with an exponent $\gamma_{BA} = 3$ (see Fig. 21). The scaling exponent is independent of $m$, the only parameter in the model.
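The two rules can be sketched as a short Python simulation. This is a minimal illustration under stated assumptions and not the authors' code. The function names `barabasi_albert` and `degree_counts` are hypothetical. So is the `stubs` list, which holds one entry per unit of degree so that a uniform draw from it realizes Eq. (78). At the very first step every degree is still zero and Eq. (78) is undefined, so the sketch assumes the first node's targets are chosen uniformly. It also obtains m distinct targets by redrawing until m different nodes have been hit.

```python
import random
from collections import Counter


def barabasi_albert(m0, m, t, seed=None):
    """Grow a graph with the two BA ingredients and return its edge list.

    Starts from m0 isolated nodes (labelled 0..m0-1); each of the t steps adds
    one new node with m edges to m distinct nodes already present.
    """
    if not 1 <= m <= m0:
        raise ValueError("the model needs 1 <= m <= m0")
    rng = random.Random(seed)
    edges = []
    # Each node appears in `stubs` once per unit of degree, so a uniform draw
    # from it selects node i with probability k_i / sum_j k_j (Eq. 78).
    stubs = []
    for new in range(m0, m0 + t):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            if stubs:
                targets.add(rng.choice(stubs))
            else:
                # First step only: all degrees are zero, so choose uniformly.
                targets.add(rng.randrange(new))
        for v in targets:
            edges.append((new, v))
            stubs.extend((new, v))
    return edges


def degree_counts(edges):
    """Map every node that has at least one edge to its degree."""
    return Counter(node for edge in edges for node in edge)


edges = barabasi_albert(m0=5, m=5, t=10_000, seed=42)
deg = degree_counts(edges)
print(len(edges), max(deg.values()))  # m*t edges; the largest hub is far above m
```

A histogram of `deg.values()` on log-log axes should show the heavy tail. The fitted slope depends on t and on how the fit is done.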

B. Theoretical approaches

The dynamical properties of the scale-free model can be addressed using various analytic approaches. The continuum theory proposed by Barabasi and Albert

FIG. 20. Spectral density of small-world networks, compared to the semicircle law corresponding to random graphs (solid line). The rewiring probabilities are (a) p = 0; (b) p = 0.01; (c) p = 0.3; and (d) p = 1. After Farkas et al. (2001).

71 R. Albert and A.-L. Barabasi: Statistical mechanics of complex networks

Rev. Mod. Phys., Vol. 74, No. 1, January 2002

these systems is that the network continuously expands by the addition of new vertices that are connected to the vertices already present in the system.

Second, the random network models assume that the probability that two vertices are connected is random and uniform. In contrast, most real networks exhibit preferential connectivity. For example, a new actor is most likely to be cast in a supporting role with more established and better-known actors. Consequently, the probability that a new actor will be cast with an established one is much higher than that the new actor will be cast with other less-known actors. Similarly, a newly created Web page will be more likely to include links to well-known popular documents with already-high connectivity, and a new manuscript is more likely to cite a well-known and thus much-cited paper than its less-cited and consequently less-known peer. These examples indicate that the probability with which a new vertex connects to the existing vertices is not uniform; there is a higher probability that it will be linked to a vertex that already has a large number of connections.

We next show that a model based on these two ingredients naturally leads to the observed scale-invariant distribution. To incorporate the growing character of the network, starting with a small number ($m_0$) of vertices, at every time step we add a new vertex with $m$ ($\le m_0$) edges that link the new vertex to $m$ different vertices already present in the system. To incorporate preferential attachment, we assume that the probability $\Pi$ that a new vertex will be connected to vertex $i$ depends on the connectivity $k_i$ of that vertex, so that $\Pi(k_i) = k_i / \sum_j k_j$. After $t$ time steps, the model leads to a random network with $t + m_0$ vertices and $mt$ edges. This network evolves into a scale-invariant state with the probability that a vertex has $k$ edges, following a power law with an exponent $\gamma_{model} = 2.9 \pm 0.1$ (Fig. 2A). Because the power law observed for real networks describes systems of rather different sizes at different stages of their development, it is expected that a correct model should provide a distribution whose main features are independent of time. Indeed, as Fig. 2A demonstrates, $P(k)$ is independent of time (and subsequently independent of the system size $m_0 + t$), indicating that despite its continuous growth, the system organizes itself into a scale-free stationary state.

The development of the power-law scaling in the model indicates that growth and preferential attachment play an important role in network development. To verify that both ingredients are necessary, we investigated two variants of the model. Model A keeps the growing character of the network, but preferential attachment is eliminated by assuming that a new vertex is connected with equal probability to any vertex in the system [that is, $\Pi(k) = \text{const} = 1/(m_0 + t - 1)$]. Such a model (Fig. 2B) leads to $P(k) \sim \exp(-\beta k)$, indicating that the absence of preferential attachment eliminates the scale-free feature of the distribution. In model B, we start with $N$ vertices and no edges. At each time step, we randomly select a vertex and connect it with probability $\Pi(k_i) = k_i / \sum_j k_j$ to vertex $i$ in the system. Although at early times the model exhibits power-law scaling, $P(k)$ is not stationary: because $N$ is constant and the number of edges increases with time, after $T \simeq N^2$ time steps the system reaches a state in which all vertices are connected. The failure of models A and B indicates that both ingredients, growth and preferential attachment, are needed for the development of the stationary power-law distribution observed in Fig. 1.

Because of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher rate; thus, an initial difference in the connectivity between two vertices will increase further as the network grows. The rate at which a vertex acquires edges is $\partial k_i/\partial t = k_i/2t$, which gives $k_i(t) = m(t/t_i)^{0.5}$, where $t_i$ is the time at which vertex $i$ was added to the system (see Fig. 2C), a scaling property that could be directly tested once time-resolved data on network connectivity becomes available. Thus older (with smaller $t_i$) vertices increase their connectivity at the expense of the younger (with larger $t_i$) ones, leading over time to some vertices that are highly connected, a "rich-get-richer" phenomenon that can be easily detected in real networks. Furthermore, this property can be used to calculate $\gamma$ analytically. The probability that a vertex $i$ has a connectivity smaller than $k$, $P[k_i(t) < k]$, can be written as $P(t_i > m^2 t/k^2)$. Assuming that we add the vertices to the system at equal time intervals, we obtain $P(t_i > m^2 t/k^2) = 1 - P(t_i \le m^2 t/k^2) = 1 - m^2 t/[k^2 (t + m_0)]$. The probability density $P(k)$ can be obtained from $P(k) = \partial P[k_i(t) < k]/\partial k$, which over long time periods leads to the stationary solution

$$P(k) = \frac{2m^2}{k^3},$$

giving $\gamma = 3$, independent of $m$. Although it reproduces the observed scale-free distribution, the proposed model cannot be expected to account for all aspects of the studied networks. For that, we need to model these systems in more detail. For example, in the model we assumed linear preferential attachment; that is, $\Pi(k) \sim k$. However, although in general $\Pi(k)$ could have an arbitrary nonlinear form $\Pi(k) \sim k^{\alpha}$, simulations indicate that scaling is present only for $\alpha = 1$. Furthermore, the exponents obtained for the different networks are scattered between 2.1 and 4. However, it is easy to modify our model to account for exponents different from $\gamma = 3$. For example, if we assume that a fraction $p$ of the links is directed, we obtain $\gamma(p) = 3 - p$, which is supported by numerical simulations (16). Finally, some networks evolve not only by adding new vertices but by adding (and sometimes removing) connections between established vertices. Although these and other system-specific features could modify the exponent $\gamma$, our model offers the first successful mechanism accounting for the scale-invariant nature of real networks.

Growth and preferential attachment are mechanisms common to a number of complex systems, including business networks (17, 18), social networks (describing individuals or organizations), transportation networks (19), and so on. Consequently, we expect that the scale-invariant state observed in all systems for which detailed data has been available to us is a generic property of many complex networks, with applicability reaching far beyond the quoted examples. A better description of these systems would help in understanding other complex systems

Fig. 2. (A) The power-law connectivity distribution at t = 150,000 and t = 200,000 as obtained from the model, using $m_0 = m = 5$. The slope of the dashed line is $\gamma = 2.9$. (B) The exponential connectivity distribution for model A, in the cases $m_0 = m = 1$, 3, 5, and 7. (C) Time evolution of the connectivity for two vertices added to the system at $t_1 = 5$ and $t_2 = 95$. The dashed line has slope 0.5.

REPORTS

www.sciencemag.org SCIENCE VOL 286 15 OCTOBER 1999 511


(1999) focuses on the dynamics of node degrees, followed by the master-equation approach of Dorogovtsev, Mendes, and Samukhin (2000a) and the rate-equation approach introduced by Krapivsky, Redner, and Leyvraz (2000). As these methods are often used interchangeably in the subsequent section, we briefly review each of them.

Continuum theory: The continuum approach introduced by Barabasi and Albert (1999) and Barabasi, Albert, and Jeong (1999) calculates the time dependence of the degree $k_i$ of a given node $i$. This degree will increase every time a new node enters the system and links to node $i$, the probability of this process being $\Pi(k_i)$. Assuming that $k_i$ is a continuous real variable, the rate at which $k_i$ changes is expected to be proportional to $\Pi(k_i)$. Consequently $k_i$ satisfies the dynamical equation

$$\frac{\partial k_i}{\partial t} = m\,\Pi(k_i) = m \frac{k_i}{\sum_{j=1}^{N-1} k_j}. \qquad (79)$$

The sum in the denominator goes over all nodes in the system except the newly introduced one; thus its value is $\sum_j k_j = 2mt - m$, leading to

$$\frac{\partial k_i}{\partial t} = \frac{k_i}{2t}. \qquad (80)$$

The solution of this equation, with the initial condition that every node $i$ at its introduction has $k_i(t_i) = m$, is

$$k_i(t) = m \left(\frac{t}{t_i}\right)^{\beta}, \quad \text{with } \beta = \frac{1}{2}. \qquad (81)$$

Equation (81) indicates that the degree of all nodes evolves the same way, following a power law, the only difference being the intercept of the power law.
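Eqs. (80) and (81) can be checked numerically by integrating the rate equation with forward Euler and comparing the result with the closed form. This is a sketch. The function names and the step count are illustrative choices, not part of the original treatment. The two insertion times match those in the inset of Fig. 21(b).

```python
def integrate_degree(m, t_i, t_end, steps=100_000):
    """Forward-Euler solution of Eq. (80), dk/dt = k / (2t), with k(t_i) = m."""
    h = (t_end - t_i) / steps
    k, t = float(m), float(t_i)
    for _ in range(steps):
        k += h * k / (2.0 * t)
        t += h
    return k


def analytic_degree(m, t_i, t):
    """Eq. (81): k_i(t) = m (t / t_i)^beta with beta = 1/2."""
    return m * (t / t_i) ** 0.5


for t_i in (5, 95):  # insertion times used in Fig. 21(b)
    print(t_i, integrate_degree(5, t_i, 1000), analytic_degree(5, t_i, 1000))
```

The node inserted earlier (t_i = 5) ends with the larger degree, which is the rich-get-richer effect described in the text.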

Using Eq. (81), one can write the probability that a node has a degree $k_i(t)$ smaller than $k$, $P[k_i(t) < k]$, as

$$P[k_i(t) < k] = P\left(t_i > \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right). \qquad (82)$$

Assuming that we add the nodes at equal time intervals to the network, the $t_i$ values have a constant probability density

$$P(t_i) = \frac{1}{m_0 + t}. \qquad (83)$$

Substituting this into Eq. (82) we obtain

$$P\left(t_i > \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right) = 1 - \frac{m^{1/\beta}\,t}{k^{1/\beta}\,(t + m_0)}. \qquad (84)$$

The degree distribution P(k) can be obtained using

$$P(k) = \frac{\partial P[k_i(t) < k]}{\partial k} = \frac{2\,m^{1/\beta}\,t}{m_0 + t}\,\frac{1}{k^{1/\beta + 1}}, \qquad (85)$$

predicting that asymptotically ($t \to \infty$)

$$P(k) \sim 2\,m^{1/\beta}\,k^{-\gamma}, \quad \text{with } \gamma = \frac{1}{\beta} + 1 = 3, \qquad (86)$$

being independent of $m$, in agreement with the numerical results.
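The step from Eq. (83) to Eq. (84) can be checked by sampling. Draw $t_i$ from the uniform density $1/(m_0 + t)$, turn each draw into a degree with Eq. (81), and count how often that degree falls below $k$. The sketch below also encodes Eq. (85), so it can be compared with a numerical derivative of Eq. (84). The helper names are hypothetical, and $\beta = 1/2$ is hard-coded.

```python
import random


def predicted_cdf(k, m, m0, t):
    """Eq. (84) with beta = 1/2: P[k_i(t) < k] = 1 - m^2 t / (k^2 (t + m0))."""
    return 1.0 - m**2 * t / (k**2 * (t + m0))


def predicted_density(k, m, m0, t):
    """Eq. (85) with 1/beta = 2: P(k) = 2 m^2 t / (m0 + t) / k^3."""
    return 2 * m**2 * t / (m0 + t) / k**3


def sampled_cdf(k, m, m0, t, samples=200_000, seed=0):
    """Monte Carlo estimate of P[k_i(t) < k] using Eqs. (81) and (83)."""
    rng = random.Random(seed)
    below = 0
    for _ in range(samples):
        t_i = rng.uniform(0.0, m0 + t)  # Eq. (83): uniform density 1/(m0 + t)
        if t_i > 0.0 and m * (t / t_i) ** 0.5 < k:  # Eq. (81)
            below += 1
    return below / samples


for k in (5, 20):
    print(k, predicted_cdf(k, 3, 3, 1000), sampled_cdf(k, 3, 3, 1000))
```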

FIG. 21. Numerical simulations of network evolution: (a) Degree distribution of the Barabasi-Albert model, with $N = m_0 + t = 300\,000$ and $m_0 = m = 1$, 3, 5, and 7. The slope of the dashed line is $\gamma = 2.9$, providing the best fit to the data. The inset shows the rescaled distribution (see text) $P(k)/2m^2$ for the same values of $m$, the slope of the dashed line being $\gamma = 3$; (b) $P(k)$ for $m_0 = m = 5$ and various system sizes, $N = 100\,000$, 150 000, and 200 000. The inset shows the time evolution for the degree of two vertices, added to the system at $t_1 = 5$ and $t_2 = 95$. Here $m_0 = m = 5$, and the dashed line has slope 0.5, as predicted by Eq. (81). After Barabasi, Albert, and Jeong (1999).


PERSPECTIVE

Scale-Free Networks: A Decade and Beyond

Albert-László Barabási

For decades, we tacitly assumed that the components of such complex systems as the cell, the society, or the Internet are randomly wired together. In the past decade, an avalanche of research has shown that many real networks, independent of their age, function, and scope, converge to similar architectures, a universality that allowed researchers from different disciplines to embrace network theory as a common paradigm. The decade-old discovery of scale-free networks was one of those events that had helped catalyze the emergence of network science, a new research field with its distinct set of challenges and accomplishments.

Nature, society, and many technologies are sustained by numerous networks that are not only too important to fail but paradoxically for decades have also proved too complicated to understand. Simple models, like the one introduced in 1959 by mathematicians Pál Erdős and Alfréd Rényi (1), drove much of our thinking about interconnected systems. They assumed that complex systems are wired randomly together, a hypothesis that was adopted by sociology, biology, and computer science. It had considerable predictive power, explaining for example why everybody is only six handshakes from anybody else (2–5), a phenomenon observed as early as 1929 (2) but which resonated in physical sciences only after Duncan Watts and Stephen Strogatz extended its reach beyond sociology (5). Yet, the undeniable success of the random hypothesis did pose a fundamental question: Are real networks truly random? That is, could systems such as the cell or a society function seamlessly if their nodes, molecules, or people were wired randomly together? This question motivated our work as well, leading 10 years ago to the discovery of the scale-free property (6, 7).

Our first clue that real networks may show manifestly nonrandom features also came 10 years ago from a map of the World Wide Web (WWW) (8), finding that the probability that a Web page has exactly k links (in other words, degree k) follows a power law distribution

$$P(k) \sim k^{-\gamma} \qquad (1)$$

a stunning departure from the Poisson distribution predicted by random network theory (1). Yet, it was not until we realized that Eq. 1 characterizes the network of actors linked by movies and scientific papers linked by citations (9) that we suspected that the scale-free property (6) might not be unique to the WWW. The main purpose of the 1999 Science paper was to report this unexpected similarity between networks of quite different nature and to show that two mechanisms, growth and preferential attachment, are the underlying causes (Fig. 1).

When we concluded in 1999 that we “expect that the scale invariant state […] is a generic property of many complex networks” (7), it was more of a prediction than a fact, because nature could have chosen as many different architectures as there are networks. Yet, probably the most surprising discovery of modern network theory is the universality of the network topology: Many real networks, from the cell to the Internet, independent of their age, function, and scope, converge to similar architectures. It is this universality that allowed researchers from different disciplines to embrace network theory as a common paradigm.

Today, the scale-free nature of networks of key scientific interest, from protein interactions to social networks and from the network of interlinked documents that make up the WWW to the interconnected hardware behind the Internet, has been established beyond doubt. The evidence comes not only from better maps and data sets but also from the agreement between empirical data and analytical models that predict the network structure (10, 11). Yet, the early euphoria was not without negative side effects, prompting some researchers to label many systems scale-free, even when the evidence was scarce at best. However, the net result was to force us to better understand the factors that shape network structure. For ex-

Pushing Networks to the Limit

Center for Complex Network Research, Department of Physics, Biology, and Computer Science, Northeastern University, Boston, MA 02115, USA. Department of Medicine, Harvard Medical School and Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, MA 02115, USA.

Fig. 1. The birth of a scale-free network. (Top and Middle) The simplest process that can produce a scale-free topology was introduced a decade ago in (6), and it is illustrated in the top two rows. Starting from three connected nodes (top left), in each image a new node (shown as an empty circle) is added to the network. When deciding where to link, new nodes prefer to attach to the more connected nodes, a process known as preferential attachment. Thanks to growth and preferential attachment, a rich-gets-richer process is observed, which means that the highly connected nodes acquire more links than those that are less connected, leading to the natural emergence of a few highly connected hubs. The node size, which was chosen to be proportional to the node's degree, illustrates the natural emergence of hubs as the largest nodes. The degree distribution of the resulting network follows the power law (Eq. 1) with exponent $\gamma = 3$. See also movies S1 to S3. (Bottom) Illustration of the growth process in the co-authorship network of physicists. Each node corresponds to an individual author, and two nodes are connected if they co-authored a paper together. The four images show the network's growth at 1-month time intervals, indicating how the network expands in time, leading to the emergence of a clear hub. Once again, the node size was chosen to be proportional to the node's degree. [Credit: D. Wang and G. Palla]

24 JULY 2009 VOL 325 SCIENCE www.sciencemag.org 412


Page 24: Social Networks

PREFERENTIAL ATTACHMENT

24

Page 25: Social Networks

TWO MODELS

25

RANDOM GRAPH BARABASI-ALBERT MODEL

Page 26: Social Networks

STRENGTH OF WEAK TIES

THE THIRD STORY

26

Page 27: Social Networks

THE STRENGTH OF WEAK TIES

“THE STRENGTH OF WEAK TIES”. MARK GRANOVETTER. 1973

27

Page 28: Social Networks

STRENGTH OF TIES

STRENGTH OF A TIE:

AMOUNT OF TIME

EMOTIONAL INTENSITY

INTIMACY

RECIPROCITY

TRIADIC CLOSURE

IF A IS STRONGLY TIED TO BOTH B AND C, THEN A TIE (AT LEAST A WEAK ONE) BETWEEN B AND C IS ALWAYS PRESENT

CLUSTERING COEFFICIENT
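Triadic closure is what the clustering coefficient measures. For a node with k neighbours, it is the fraction of the k(k − 1)/2 neighbour pairs that are themselves linked. Below is a small Python sketch. It assumes an adjacency-set representation and the common convention that nodes of degree below 2 get coefficient 0. The helper `build` and the toy graph are hypothetical.

```python
from itertools import combinations


def build(edge_list):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined for degree < 2, reported as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical example: A, B, C form a closed triad; D hangs off C.
g = build([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print({v: round(local_clustering(g, v), 3) for v in sorted(g)})
# {'A': 1.0, 'B': 1.0, 'C': 0.333, 'D': 0.0}
```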

28

Page 29: Social Networks

BRIDGES

29

BRIDGE - A LINE IN A NETWORK WHICH PROVIDES THE ONLY PATH BETWEEN TWO POINTS

A TIE IS A LOCAL BRIDGE IF ITS ENDPOINTS HAVE NO COMMON NEIGHBORS, SO ITS REMOVAL INCREASES THE DISTANCE BETWEEN THEM TO MORE THAN TWO

NO STRONG TIE IS A BRIDGE

ROLE IN DIFFUSION
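Local bridges are easy to find once "removal increases the distance beyond two" is restated as "the endpoints have no common neighbour". Below is a minimal sketch on a made-up graph of two triangles joined by one edge. The helper names are ours, not from the source.

```python
def build(edge_list):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_bridges(adj):
    """Edges whose endpoints share no neighbour.

    Deleting such an edge pushes the distance between its endpoints above
    two (or disconnects them).
    """
    found = set()
    for u in adj:
        for v in adj[u]:
            if not (adj[u] & adj[v]):
                found.add(frozenset((u, v)))
    return found


# Hypothetical graph: triangles A-B-C and D-E-F joined by the tie C-D.
g2 = build([("A", "B"), ("B", "C"), ("C", "A"),
            ("D", "E"), ("E", "F"), ("F", "D"), ("C", "D")])
print([sorted(e) for e in local_bridges(g2)])  # [['C', 'D']]
```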

Page 30: Social Networks

STRENGTH OF WEAK TIES

WHERE IS THE STRENGTH?

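JOB CHANGERS WHO FOUND THEIR JOB THROUGH A PERSONAL CONTACT, BY HOW OFTEN THEY SAW THAT CONTACT:

16.7% FRIENDS (OFTEN: AT LEAST TWICE A WEEK)

55.6% ACQUAINTANCES (OCCASIONALLY: MORE THAN ONCE A YEAR BUT LESS THAN TWICE A WEEK)

27.8% RARELY (ONCE A YEAR OR LESS)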

WEAK TIES = “LONG TIES”, CONNECT PEOPLE FROM DIFFERENT COMMUNITIES

30

Page 31: Social Networks

FACEBOOK

Figure panels: All Friends; One-way Communication; Mutual Communication; Maintained Relationships

31

DECLARED FRIENDSHIP

MAINTAINED RELATIONSHIPS

ONE-WAY COMMUNICATIONS

CAMERON MARLOW ET AL., 2009

Page 32: Social Networks

IN THE CIRCLE OF FRIENDS

THE FOURTH STORY

32

Page 33: Social Networks

COMMUNITY DETECTION

COMMUNITY - A SET OF NODES MORE DENSELY CONNECTED AMONG THEMSELVES THAN TO THE REST OF THE NETWORK

33

Page 34: Social Networks

GRAPH THEORY METHODS

GRAPH CUTS:

FLOW METHODS

MIN CUT, NORMALIZED CUTS

GREEDY ALGORITHMS

SPECTRAL METHODS

MULTI RESOLUTION METHODS

34

Page 35: Social Networks

BETWEENNESS CENTRALITY

NODE BETWEENNESS CENTRALITY: SUM, OVER ALL PAIRS OF OTHER NODES, OF THE FRACTION OF SHORTEST PATHS BETWEEN THEM THAT GO THROUGH THE NODE

EDGE BETWEENNESS

ITERATIVELY REMOVING THE EDGE WITH THE HIGHEST BETWEENNESS (THE "WEAKEST" LINK BETWEEN COMMUNITIES)
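Node betweenness can be computed for all nodes at once with Brandes' algorithm. One breadth-first search per source counts shortest paths, and a backward pass accumulates how much of each pair's flow passes through every node. The sketch below assumes an unweighted, undirected graph stored as a dict of neighbour sets. It returns unnormalized values over unordered pairs, and the function name is hypothetical.

```python
from collections import deque


def node_betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    For each node v, sums over unordered pairs {s, t} (s, t != v) the
    fraction of shortest s-t paths that pass through v.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS that counts shortest paths (sigma)
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was seen from both ends


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(node_betweenness(star))  # {0: 6.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
```

The centre of a four-leaf star lies on all six leaf-to-leaf shortest paths, which is why it scores 6.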

35

advanced material 69

Figure 3.17. The four steps (a)–(d) of the Girvan–Newman method applied to the network from Figure 3.15.

. . . Proceed in this way as long as edges remain in the graph, in each step recalculating all betweennesses and removing the edge or edges of highest betweenness.

Thus, as the graph falls apart first into large pieces and then into smaller ones, the method naturally exposes a nested structure in the tightly-knit regions. In Figures 3.16 and 3.17 we show how the method operates on the graphs from Figures 3.14(a) and 3.15, respectively. Note how smaller regions emerge from larger ones as edges are successively removed.

The sequence of steps in Figure 3.17 in fact exposes some interesting points about how the method works:

• When we calculate the betweenness in the first step, the 5-7 edge carries all the flow from nodes 1–5 to nodes 7–11, for a betweenness of 25. The 5-6 edge, on the other hand, only carries flow from node 6 to each of nodes 1–5, for a betweenness of 5 (and similarly for the 6-7 edge).
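The values in this example (25 for the 5-7 edge, 5 for the 5-6 edge) can be reproduced with an edge version of the same computation. The exact edges of Figure 3.15 are not given here, so the sketch uses a hypothetical 11-node graph with the same overall shape. Nodes 1–5 and 7–11 form two groups, joined by the edge 5-7 and by node 6, which is adjacent to both 5 and 7. `girvan_newman_split` (our name) performs only the first split. It recomputes betweenness after every removal and removes tied edges together, as the procedure above describes.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((v, w))] += c  # flow of pairs routed over edge v-w
                delta[v] += c
    return {e: b / 2.0 for e, b in bc.items()}  # unordered pairs counted twice


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(adj):
    """Remove top-betweenness edges, recomputing each time, until the graph splits."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        bc = edge_betweenness(adj)
        top = max(bc.values())
        for e in [e for e, b in bc.items() if abs(b - top) < 1e-9]:
            u, v = tuple(e)
            adj[u].discard(v)
            adj[v].discard(u)
    return components(adj)


G = {v: set() for v in range(1, 12)}
for u, v in [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5),
             (7, 8), (7, 9), (8, 9), (8, 10), (9, 11), (10, 11),
             (5, 6), (6, 7), (5, 7)]:
    G[u].add(v)
    G[v].add(u)

bc = edge_betweenness(G)
print(bc[frozenset((5, 7))], bc[frozenset((5, 6))])  # 25.0 5.0
print(sorted(sorted(p) for p in girvan_newman_split(G)))
# [[1, 2, 3, 4, 5], [6], [7, 8, 9, 10, 11]]
```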

“A SET OF MEASURES OF CENTRALITY BASED ON BETWEENNESS”. LINTON C. FREEMAN, 1977.

“FINDING AND EVALUATING COMMUNITY STRUCTURE IN NETWORKS” . MARK E. J. NEWMAN AND MICHELLE GIRVAN. 2004.

Page 36: Social Networks

ECONOMICS OF FRIENDSHIP

THE FIFTH, BUT NOT THE LAST STORY

36

Page 37: Social Networks

GAME THEORY

GAME THEORY IS THE STUDY OF THE WAYS IN WHICH STRATEGIC INTERACTIONS AMONG ECONOMIC AGENTS PRODUCE OUTCOMES WITH RESPECT TO THE UTILITIES OF THOSE AGENTS

NOTION OF PAYOFF

PAYOFF TABLE

RATIONAL PLAYERS, ACTING IN THEIR OWN SELF-INTEREST

NETWORK FORMATION GAME

37

Page 38: Social Networks

UTILITARIAN RELATIONSHIPS

LINKS - SOCIAL RELATIONSHIPS, FRIENDSHIP

CONNECTIONS OFFER BENEFITS: FAVORS, SUPPORT, INFORMATION (0<D<1)

DISTANCE BASED UTILITY FUNCTION

PAY COSTS FOR DIRECT RELATIONSHIPS (0<C<1)

PLAYERS BENEFIT FROM INDIRECT RELATIONSHIPS; BENEFITS DETERIORATE WITH DISTANCE (B^D)

UTILITY FROM RELATIONSHIPS = TOTAL BENEFITS - COSTS

38

1.2. A SET OF EXAMPLES: 27

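[Figure: players 1, 2, 3, 4 linked in a line 1–2–3–4; utilities $u_1 = u_4 = \delta + \delta^2 + \delta^3 - c$ and $u_2 = u_3 = 2\delta + \delta^2 - 2c$]

Figure 1.2.3 The utilities to the players in a three-link four-player network in the symmetric connections model.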

Given a network $g$,¹² write the net utility or payoff $u_i(g)$ that player $i$ receives from a network $g$ as

$$u_i(g) = \sum_{j \ne i:\ i \text{ and } j \text{ are path-connected in } g} \delta^{\ell_{ij}(g)} - d_i(g)\,c,$$

where $\ell_{ij}(g)$ is the number of links in the shortest path between $i$ and $j$, $d_i(g)$ is the number of links that $i$ has ($i$'s degree), and $c > 0$ is the cost for a player of maintaining a link.
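The utility function translates directly into code. A breadth-first search gives $\ell_{ij}(g)$, and the degree gives the cost term. This is a sketch that assumes players are dict keys mapped to neighbour sets. The values δ = 0.5 and c = 0.2 are arbitrary illustrations, and the function names are hypothetical. On the line network of Figure 1.2.3 it reproduces the utilities shown there.

```python
from collections import deque


def distances_from(adj, i):
    """BFS hop counts l_ij(g) from player i to every player reachable from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def utility(adj, i, delta, c):
    """Symmetric connections model: sum of delta**l_ij over reachable j, minus d_i * c."""
    dist = distances_from(adj, i)
    benefit = sum(delta ** d for j, d in dist.items() if j != i)
    return benefit - len(adj[i]) * c


# The three-link line 1-2-3-4 of Figure 1.2.3.
line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print([utility(line, i, delta=0.5, c=0.2) for i in line])
```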

The highly stylized nature of the connections model allows us to begin to answer questions regarding which networks are “best” (most “efficient”) from society's point of view, as well as which networks are likely to form when self-interested players choose their own links.

Let us define a network to be efficient if it maximizes the total utility to all players in the society. That is, $g$ is efficient if it maximizes $\sum_i u_i(g)$.¹³

It is clear that if costs are very low, it will be efficient to include all links in the network. In particular, if $c < \delta - \delta^2$, then adding a link between any two agents $i$ and $j$ will always increase total welfare. This follows because they are each getting at most $\delta^2$ of value from having any sort of indirect connection between them, and since $\delta^2 < \delta - c$ the extra value even accounting for the cost of a direct connection between them increases their utilities (and might also increase the utilities of other agents).

When the cost rises above this level, so that $c > \delta - \delta^2$ but $c$ is not too high (see Exercise 1.3), it turns out that the unique efficient network structure is to have all players arranged in a “star” network. That is, there should be some central player who

variation has been studied by Johnson and Gilles [286] and is discussed in Exercise 6.13.

¹² For complete definitions, see Chapter 2. For now, all that is important is that this tells us which pairs of players are linked.

¹³ This is just one of many possible measures of efficiency and societal welfare, which is a well-studied subject in philosophy and economics. How we measure efficiency has important consequences in network analysis and is discussed in more detail in Chapter 6.
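For a handful of players, efficiency can be checked by brute force over every possible network. The sketch below uses hypothetical names and δ = 0.5, so that δ − δ² = 0.25. In this small four-player example it returns the complete network for c = 0.1 and a star for c = 0.4. For the very high cost c = 1.0 it returns the empty network. Enumeration grows as 2^(n(n−1)/2), so this is only an illustration.

```python
from collections import deque
from itertools import combinations


def total_utility(n, edges, delta, c):
    """Sum over players of u_i(g) in the symmetric connections model."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for i in range(n):
        dist, queue = {i: 0}, deque([i])
        while queue:  # BFS distances l_ij(g)
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(delta ** d for j, d in dist.items() if j != i) - len(adj[i]) * c
    return total


def most_efficient(n, delta, c):
    """Brute force over all networks on n players; feasible only for tiny n."""
    pairs = list(combinations(range(n), 2))
    best, best_edges = float("-inf"), None
    for mask in range(1 << len(pairs)):
        edges = [p for b, p in enumerate(pairs) if mask >> b & 1]
        value = total_utility(n, edges, delta, c)
        if value > best + 1e-12:  # keep the first network reaching the maximum
            best, best_edges = value, edges
    return best, best_edges


for c in (0.1, 0.4, 1.0):
    print(c, most_efficient(4, 0.5, c))
```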


Page 39: Social Networks

STABILITY AND EFFICIENCY

PAIRWISE STABILITY:

NO PLAYER WANTS TO REMOVE A LINK

NO TWO UNLINKED PLAYERS BOTH WANT TO ADD A LINK BETWEEN THEM

EFFICIENCY:

STRONG EFFICIENCY (MAXIMIZES TOTAL UTILITY)

PARETO EFFICIENCY

TENSION BETWEEN STABILITY AND EFFICIENCY

39

JACKSON, M.O. AND WOLINSKY, A. (1996)
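Pairwise stability can be tested one link at a time. For every existing link, neither endpoint may gain by cutting it. For every missing link, it must not be the case that one endpoint gains strictly while the other does not lose. Below is a sketch in the connections model. The values of δ and c and the helper names (`toggle`, `make_star`, `make_complete`) are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def utility(adj, i, delta, c):
    """Connections-model payoff: sum of delta**l_ij over reachable j, minus d_i * c."""
    dist, queue = {i: 0}, deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(delta ** d for j, d in dist.items() if j != i) - len(adj[i]) * c


def toggle(adj, u, v):
    """Copy of the network with link uv removed if present, added otherwise."""
    g = {w: set(nbrs) for w, nbrs in adj.items()}
    if v in g[u]:
        g[u].discard(v)
        g[v].discard(u)
    else:
        g[u].add(v)
        g[v].add(u)
    return g


def is_pairwise_stable(adj, delta, c, tol=1e-12):
    for u, v in combinations(sorted(adj), 2):
        g = toggle(adj, u, v)
        gain_u = utility(g, u, delta, c) - utility(adj, u, delta, c)
        gain_v = utility(g, v, delta, c) - utility(adj, v, delta, c)
        if v in adj[u]:
            # Existing link: neither endpoint may strictly prefer cutting it.
            if gain_u > tol or gain_v > tol:
                return False
        elif (gain_u > tol and gain_v >= -tol) or (gain_v > tol and gain_u >= -tol):
            # Missing link that one endpoint wants and the other does not mind.
            return False
    return True


def make_star(n):
    return {i: ({0} if i else set(range(1, n))) for i in range(n)}


def make_complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}


print(is_pairwise_stable(make_star(4), 0.5, 0.4))      # True
print(is_pairwise_stable(make_complete(4), 0.5, 0.4))  # False
```

With δ = 0.5, the star is stable at c = 0.4 and the complete network is not. At c = 0.1 the situation reverses.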

Page 40: Social Networks

OPTIMAL NETWORK STRUCTURE

IN SOME RANGE OF PARAMETERS, THESE NETWORKS ARE BOTH STABLE AND EFFICIENT

COMPLETE NETWORK

STAR NETWORK

40

Page 41: Social Networks

FOLLOWING THE CROWD

THE LAST STORY

41

Page 42: Social Networks

INFORMATION CASCADE

RESTAURANT CHOICE:

YOUR OWN INFORMATION (PRIVATE SIGNAL)

INFORMATION ABOUT CHOICE MADE BY OTHERS (EXTERNAL SIGNAL)

SEQUENTIAL DECISION MAKING

ONLY INFORMATION BASED

RATIONAL CHOICE IN BEST INTEREST

42

Page 43: Social Networks

NETWORK EFFECT

LOCAL LEVEL OF INTERACTION: INFLUENCE OF FRIENDS (NOT THE OPINION OF THE ENTIRE POPULATION)

INFORMATION EFFECT: OBSERVE THE CHOICE OF OTHERS

DIRECT BENEFIT EFFECT: ADVANTAGE OF COPYING DECISIONS OF OTHERS (MATCHING TECHNOLOGY ETC)

43

Page 44: Social Networks

COORDINATION GAME

THRESHOLD

44

A B

A a,a 0,0

B 0,0 b,b

500 cascading behavior in networks

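Figure 19.1. A-B coordination game. (Players v and w share an edge; if both choose A each gets a, if both choose B each gets b, and if they choose differently each gets 0.)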

this using a game in which v and w are the players and A and B are the possible strategies. The payoffs are defined as follows:

• if v and w both adopt behavior A, they each get a payoff of a > 0;
• if they both adopt B, they each get a payoff of b > 0; and
• if they adopt opposite behaviors, they each get a payoff of 0.

We can write this in terms of a payoff matrix, as in Figure 19.1. Of course, it is easy to imagine many more general models for coordination, but for now we are trying to keep things as simple as possible.

This describes what happens on a single edge of the network, but the point is that each node v is playing a copy of this game with each of its neighbors, and its payoff is the sum of its payoffs in the games played on each edge. Hence, v's choice of strategy will be based on the choices made by all of its neighbors, taken together.

The basic question faced by v is the following: suppose that some of its neighbors adopt A, and some adopt B; what should v do in order to maximize its payoff? This clearly depends on the relative number of neighbors doing each, and on the relation between the payoff values a and b. With a little bit of algebra, we can make up a decision rule for v quite easily, as follows. Suppose that a p fraction of v's neighbors have behavior A, and a (1 − p) fraction have behavior B; that is, if v has d neighbors, then pd adopt A and (1 − p)d adopt B, as shown in Figure 19.2.

Figure 19.2. Node v must choose between behavior A and behavior B, based on what its neighbors are doing. (Of v's d neighbors, pd use A and (1 − p)d use B.)

So if v chooses A, it gets a payoff of pda, and if it chooses B, it gets a payoff of (1 − p)db. Thus, A is the better choice if

$$pda \ge (1 - p)db,$$

or, rearranging terms, if

$$p \ge \frac{b}{a + b}.$$

We’ll use q to denote this expression on the right-hand side. This inequality describes a very simple threshold rule: it says that if a fraction of at least q = b/(a + b) of your neighbors follow behavior A, then you should, too. And it makes sense intuitively: when q is small, then A is the much more enticing behavior, and it only takes a small fraction of your neighbors engaging in A for you to do so as well. However, if q is large, then the opposite holds: B is the attractive behavior, and you need a lot of your friends to engage in A before you switch to A. There is a tie-breaking question when exactly a q fraction of a node’s neighbors follow A; in this case, we will adopt the convention that the node chooses A rather than B.
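The threshold rule can be evaluated with integer arithmetic, which avoids floating-point trouble at the tie p = q. Choosing A is best exactly when n_A·a ≥ n_B·b, where n_A = pd and n_B = (1 − p)d. Below is a sketch with hypothetical function names, assuming integer payoffs a and b.

```python
from fractions import Fraction


def threshold(a, b):
    """q = b / (a + b), kept exact with Fraction."""
    return Fraction(b, a + b)


def chooses_A(n_A, n_B, a, b):
    """Best response of a node with n_A neighbours on A and n_B on B.

    A earns n_A * a and B earns n_B * b; ties go to A, matching p >= q.
    """
    return n_A * a >= n_B * b


print(threshold(3, 2))        # 2/5
print(chooses_A(2, 3, 3, 2))  # True: p = 2/5 = q, tie goes to A
print(chooses_A(1, 2, 3, 2))  # False: p = 1/3 < q
```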

Notice that this is in fact a very simple – and in particular, myopic – model of individual decision making. Each node is optimally updating its decision based on the immediate consideration of what its neighbors are currently doing, but it is an interesting research question to think about richer models in which nodes try to incorporate more long-range considerations into their decisions about switching from B to A.

Cascading Behavior. In any network, there are two obvious equilibria to this network-wide coordination game: one in which everyone adopts A, and another in which everyone adopts B. Guided by diffusion questions, we want to understand how easy it is, in a given situation, to “tip” the network from one of these equilibria to the other. We also want to understand what other “intermediate” equilibria look like – states of coexistence where A is adopted in some parts of the network and B is adopted in others.

Specifically, we consider the following type of situation. Suppose that everyone in the network is initially using B as a default behavior. Then a small set of “initial adopters” all decide to use A. We will assume that the initial adopters have switched to A for some reason outside the definition of the coordination game – they have somehow switched due to a belief in A’s superiority, rather than by following payoffs – but we’ll assume that all other nodes continue to evaluate their payoffs using the coordination game. Given the fact that the initial adopters are now using A, some of their neighbors may decide to switch to A as well, and then some of their neighbors may switch, and so forth, in a potentially cascading fashion. When does this result in every node in the entire network eventually switching over to A? And when this isn’t the result, what causes the spread of A to stop? Clearly the answer depends on the network structure, the choice of initial adopters, and the value of the threshold q that nodes use for deciding whether to switch to A.

The preceding discussion describes the full model. An initial set of nodes adopts A while everyone else adopts B. Time then runs forward in unit steps; in each step, each

PAYOFF MATRIX

Page 45: Social Networks

NETWORK CASCADES

CASCADE IS A “CHAIN REACTION” OF SWITCHING FROM ONE TYPE OF BEHAVIOR TO ANOTHER

45


Page 46: Social Networks

CASCADE SIZE

46

COMPLETE CASCADE PARTIAL CASCADE

q=0.4

NATURAL BOUNDARIES

RELATIVE ADVANTAGES

COMMUNITIES/CLUSTERS

WEAK TIES
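The cascade can be simulated directly. The sketch assumes synchronous rounds, in which every B-node re-evaluates using the adopters from the previous round, and assumes that adopters never switch back. It uses a made-up graph of two 4-node cliques joined by one edge. With q = 2/5 and seeds {0, 1}, A takes over the first clique and then stops. Each node of {4, 5, 6, 7} has at least 3/4 of its neighbours inside that group, which is more than 1 − q = 3/5, so the group acts as a natural boundary. On a simple path, the same rule produces a complete cascade.

```python
from fractions import Fraction


def run_cascade(adj, seeds, q):
    """Threshold cascade: a B-node switches to A once a fraction >= q of its
    neighbours use A; adopters never switch back. Pass q as a Fraction so
    that ties (p == q) are compared exactly."""
    adopters = set(seeds)
    while True:
        switching = {v for v in adj
                     if v not in adopters and adj[v]
                     and len(adj[v] & adopters) >= q * len(adj[v])}
        if not switching:
            return adopters
        adopters |= switching


# Hypothetical graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(sorted(run_cascade(adj, {0, 1}, Fraction(2, 5))))  # [0, 1, 2, 3]

path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 5} for i in range(5)}
print(sorted(run_cascade(path, {0}, Fraction(1, 2))))    # [0, 1, 2, 3, 4]
```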

Page 47: Social Networks

CASCADE MAXIMIZATION

47

A: SEED SET

K: SIZE OF A

CASCADE FROM A
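One simple way to pick a seed set A of size K is greedy selection: repeatedly add the node that most enlarges the cascade. Kempe, Kleinberg, and Tardos analyse greedy selection for probabilistic diffusion models, where it carries an approximation guarantee. That guarantee need not hold for the deterministic threshold rule used here, so treat this as a heuristic sketch. In the example below, nodes 0 and 4 each reach only themselves alone, yet reach five nodes together. The function names and the two-clique graph are hypothetical.

```python
from fractions import Fraction


def run_cascade(adj, seeds, q):
    """Synchronous threshold cascade (q as a Fraction); adopters never revert."""
    adopters = set(seeds)
    while True:
        switching = {v for v in adj
                     if v not in adopters and adj[v]
                     and len(adj[v] & adopters) >= q * len(adj[v])}
        if not switching:
            return adopters
        adopters |= switching


def greedy_seeds(adj, k, q):
    """Add, k times, the node whose inclusion gives the largest cascade.

    Candidates are scanned in sorted order, so ties go to the smallest label.
    """
    seeds = []
    for _ in range(k):
        candidates = [v for v in sorted(adj) if v not in seeds]
        best = max(candidates,
                   key=lambda v: len(run_cascade(adj, set(seeds) | {v}, q)))
        seeds.append(best)
    return seeds, len(run_cascade(adj, set(seeds), q))


# Hypothetical graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(greedy_seeds(adj, 2, Fraction(2, 5)))  # ([0, 4], 5)
```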

Page 48: Social Networks

MARKETING STRATEGY

48

Page 49: Social Networks

COMPLEX NETWORKS

FEATURES:

POWER LAW

SMALL AVERAGE DISTANCE

HIGH CLUSTERING

BUILT BY INDEPENDENT INTERACTING AGENTS

49

Page 50: Social Networks

FACEBOOK WORLD

PAUL BUTLER, FACEBOOK

50

Page 51: Social Networks

TEXTBOOKS

51

Page 52: Social Networks

EASY READ

52

Page 53: Social Networks

REFERENCES

ERDOS, P. AND A. RÉNYI. “ON RANDOM GRAPHS”. PUBLICATIONES MATHEMATICAE DEBRECEN 6: 290-297, 1959

JEFFREY TRAVERS AND STANLEY MILGRAM. “AN EXPERIMENTAL STUDY OF THE SMALL WORLD PROBLEM.” SOCIOMETRY, 32(4):425–443, 1969.

DUNCAN J. WATTS AND STEVEN H. STROGATZ. “COLLECTIVE DYNAMICS OF SMALL-WORLD NETWORKS”. NATURE, 393:440–442, 1998.

MARK GRANOVETTER. “THE STRENGTH OF WEAK TIES” AMERICAN JOURNAL OF SOCIOLOGY, 78:1360–1380, 1973.

C. MARLOW, L. BYRON, T. LENTO, AND I. ROSENN. “MAINTAINED RELATIONSHIPS ON FACEBOOK 2009”. ONLINE AT HTTP://OVERSTATED.NET/2009/03/09/MAINTAINED-RELATIONSHIPS-ON-FACEBOOK.

ALBERT-LÁSZLÓ BARABÁSI AND RÉKA ALBERT. “EMERGENCE OF SCALING IN RANDOM NETWORKS.” SCIENCE, 286:509–512, 1999.

53

Page 54: Social Networks

REFERENCES

ANDREI BRODER, RAVI KUMAR, ET AL. “GRAPH STRUCTURE IN THE WEB”. IN PROC. 9TH INTERNATIONAL WORLD WIDE WEB CONFERENCE, PAGES 309–320, 2000.

LINTON FREEMAN. “A SET OF MEASURES OF CENTRALITY BASED ON BETWEENNESS”. SOCIOMETRY, 40(1):35–41, 1977.

MARK E. J. NEWMAN AND MICHELLE GIRVAN. “FINDING AND EVALUATING COMMUNITY STRUCTURE IN NETWORKS”. PHYSICAL REVIEW E, 69(2):026113, 2004.

JACKSON, M.O. AND WOLINSKY, A. (1996) “A STRATEGIC MODEL OF SOCIAL AND ECONOMIC NETWORKS,” JOURNAL OF ECONOMIC THEORY, VOL. 71, NO. 1, PP. 44–74.

SUSHIL BIKHCHANDANI, DAVID HIRSHLEIFER, AND IVO WELCH. “A THEORY OF FADS, FASHION, CUSTOM AND CULTURAL CHANGE AS INFORMATION CASCADES.” JOURNAL OF POLITICAL ECONOMY, 100:992–1026, 1992.

STEPHEN MORRIS. “CONTAGION”. REVIEW OF ECONOMIC STUDIES, 67:57–78, 2000.

DAVID KEMPE, JON KLEINBERG, AND EVA TARDOS. “MAXIMIZING THE SPREAD OF INFLUENCE IN A SOCIAL NETWORK.” IN PROC. 9TH ACM SIGKDD INT. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING, PAGES 137–146, 2003.

54