Network Science: Theory, Modeling and Applications
Madhav V. Marathe
Dept. of Computer Science & Network Dynamics and Simulation Science Laboratory
Virginia Bioinformatics Institute Virginia Tech
NDSSL TR-10-148
Supported by Grants from NIH MIDAS, NSF HSD, NSF CNS, CDC COE, and DoD.
Network Dynamics & Simulation Science Laboratory
Where: LLNL, Livermore,
Dates: December 1st to December 15th 2010
Hosts: Dr. David Brown and Dr. Celeste Matarazzo
Time: 10.00 am to 11.30 am (Office hours as needed afterwards)
Lecturer: Madhav Marathe, Virginia Tech ([email protected])
Guest Lectures: Christopher Kuhlman (VT), Goran Konjevod (Staff Scientist, LLNL), Anil Vullikanti (Asst Prof. VT and DOE Career award recipient)
Complex networks are pervasive in our society. Realistic biological, information, social and technical networks share a number of unique features that distinguish them from physical networks. Examples of such features include irregularity, time-varying structure, heterogeneity among individual components, selfish/cooperative game-like behavior by individual components, and co-evolution. The size and heterogeneity of these networks, their co-evolving nature, and the technical difficulties in applying the dimension reduction techniques commonly used to analyze physical systems make reasoning about, predicting and controlling these networks even more challenging. Recent quantitative changes in high performance and pervasive computing, including faster machines, distributed sensors and service-oriented software, have created new opportunities for collecting, integrating, analyzing and accessing information related to such large complex networks. The advances in network and information science that build on this new capability provide entirely new ways of reasoning about and controlling these networks. Together, they enhance our ability to formulate, analyze and realize novel public policies pertaining to these complex networks.
The course will cover the mathematical and computational aspects of Network Science. It will provide a broad overview of the area and then focus on:
• Mathematical aspects, including structure theorems and existence proofs
• Computational aspects, including provable lower as well as upper bounds on the computational resources, and efficient algorithms for computing the structure and dynamics over complex networks
• Developing high performance computing based computational models and modeling environments for supporting Network Science
Practical applications arising in the context of infrastructure planning, energy systems, national security and integrated communication systems will be used to illustrate the applicability of the concepts.
Course Synopsis
Work funded in part by NIGMS, NIH MIDAS program, CDC, Center of Excellence in Medical Informatics, DTRA CNIMS, NSF, NeTs, NECO and OCI program, VT Foundation.
• Lada Adamic: for graciously sharing her course notes
• NDSSL Laboratory members, who are in reality coauthors of this material
• Other sources from which I have borrowed material include:
  • Tim Roughgarden’s lectures on Games
  • David Kempe’s lectures on Networks
  • Henning Mortveit’s lectures on SDS
  • Bogdan Oporowski’s lecture on Graph theory
  • Michael Kearns’ lectures on Networks and Games
  • … and many more
• Books
  • Fernando Vega-Redondo, Complex Social Networks, Econometric Society Monographs, Cambridge University Press, 2007
  • D. Easley, J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010
  • J. Kleinberg, E. Tardos, Algorithm Design, Addison Wesley, 2005
  • Matthew Jackson, Social and Economic Networks, Princeton University Press, 2010
  • … and many more
Acknowledgements for Course Material
What is a Network? History, Broad Research Questions, Illustrative Applications
What is a network?
Although there is no formally accepted definition, there appears to be a consensus that all networks comprise the following attributes:
• A set of agents (entities): agents can be simple, game-like, adaptive, …
• Interaction among the entities governed by a graph (a binary or, in general, k-ary relationship). The graph itself can change and co-evolve with the entities
• Entities modify their local states and behavior by interacting with their neighbors
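The three attributes above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the lecture: the four-node graph, the binary states, and the synchronous majority rule used as the local update.

```python
# Agents are the keys; the graph is stored as adjacency lists (hypothetical example).
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

# Each agent holds a local binary state.
state = {"a": 1, "b": 1, "c": 0, "d": 0}

def step(neighbors, state):
    """One synchronous update: each agent adopts the strict majority state of
    its neighbors and keeps its own state on a tie (an assumed rule)."""
    new_state = {}
    for agent, nbrs in neighbors.items():
        ones = sum(state[w] for w in nbrs)
        if 2 * ones > len(nbrs):
            new_state[agent] = 1
        elif 2 * ones < len(nbrs):
            new_state[agent] = 0
        else:
            new_state[agent] = state[agent]
    return new_state

after_one = step(neighbors, state)
after_two = step(neighbors, after_one)
print(after_one)
print(after_two)
```

A co-evolving network would additionally let the update rule rewrite `neighbors`; that is left out here to keep the sketch short.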
Blogosphere (datamining.typepad.com)
Terminology for nodes and edges across fields:
node | edge | field
points | lines |
vertices | edges, arcs | math
nodes | links | computer science
sites | bonds | physics
actors | ties, relations | sociology
Images of Various Networks
Social Networks: Facebook has over 500 million individuals!
http://www.smrfoundation.org/category/industry/companies/facebook/
High School Dating Network (Discovery Magazine 2007)
Router-level network based on ISPs
Delta Airlines Routes (airline routes maps.com)
EU rail network
Biological Networks
Institute of biology and technology - Saclay (iBiTec-S)/ Unités/ Department of Integrative Biology and Molecular Genetics (SBiGeM)/ Integrative biology laboratory (LBI)/ Dynamics of Biological Network (J. Labarre)
http://djpowell.wordpress.com/
http://www.leonelmoura.com/tree.html
In the real world, networks are layered and coupled
Growth of network science as measured by publications
#papers with “complex networks” in the title [National Academy of Science Report, 2007]
Journal special issues on Network Science
Even appears in mainstream publications: yesterday!
The Emerging Network Science?
Newman, Barabasi, Watts: The Structure and Dynamics of Networks: “We argue that the science of networks that has been taking shape over the last few years is distinguished from preceding work on networks in three important ways: (1) by focusing on the properties of real-world networks, it is concerned with empirical as well as theoretical questions; (2) it frequently takes the view that networks are not static, but evolve in time according to various dynamical rules; and (3) it aims, ultimately at least, to understand networks not just as topological objects, but also as the framework upon which distributed dynamical systems are built.”
Kearns: An Emerging Science
• Examine apparent similarities (and differences) between many social, economic, information, biological and technological networks
• Importance of network effects in such systems
• How things are connected matters greatly
• Details of interaction matter greatly
• Qualitative and quantitative; can be very subtle
• A revolution of measurement, theory, and breadth of vision
Science of Networks: A personal (and likely biased) viewpoint 1:
• Real world networks: extremely important, but folks in social sciences, transportation, electrical systems, VLSI, … have all been studying real world networks
• We need to seriously revisit the use of simple random graph models as a way to explain a phenomenon: the mathematics is elegant but often means very little in the real world
• Real world networks are dynamic, coupled and co-evolve
• Ability to collect data that is diverse (spatially, demographically), process it, store it and reason about it very fast. New data should be utilized in developing network models
• New and realistic models of real world networks. Models should represent coupling and co-evolution
Accessibility of Network Science: Pervasive Computing Environment
• High performance computing (larger machines, data intensive systems, distributed systems, …)
• Software as a service: delivering results to specialists who are not interested in becoming computer scientists
• Ability to collect data that is diverse (spatially, demographically), process it, store it and reason about it very fast
• Develop pervasive computing technology to deliver Network Science technology to domain specialists and others who are not computing experts
Science of Networks: Centrality of Computing and Information Science
• From analytical results to an algorithmic viewpoint: in my opinion, this is the essence of the new science if one has to deal with real networks
• Questions that become important are:
  • How can we design certain networks?
  • How can we measure distributed networks?
  • What is a certain set of distributed agents computing: interaction-based computing and social cognition
• Models are not monolithic or federated anymore but really a way to synthesize information by interacting with various components – Milner’s influential idea on interactionism
• The algorithmic viewpoint provides the foundational basis; HPC provides the underlying technology
Inter and intra-discipline interactions – Emergence of a Giant Component!
• We have reached a critical point wherein researchers from diverse disciplines are starting to share their ideas and interact (Gladwell’s Tipping Point)
• Beautiful convergence of ideas and viewpoints in CS, Engineering, Economics, Mathematics, Physics, Social Science, Biology, … (convergence of several events: the world becoming smaller, funding agencies pushing for joint work, global problems, problems that were being solved by disciplinary viewpoints)
• Economic drivers: information economy, distributed logistics, global markets, mobile labor force, funding shortfalls
• Measurement technologies and technologies for developing and sustaining diverse organizations and ecosystems have taken hold
• Multi-disciplinary view is important: from real research social networks!
Culmination of diverse fields: Viewpoints are different and interesting
Engineers
• Understand how infrastructure networks work
• Design and control of these networks
Computer Scientists
• Understand and design complex, distributed networks
• Algorithmic view: design of a system and inferring its semantics
Social Scientists, Behavioral Psychologists, Economists
• Understand human behavior in “simple” settings
• Revised views of economic rationality in humans
Biologists
• Neural networks, gene regulatory networks, …
• Understanding the evolution of networks
Physicists and Mathematicians
• Interest and methods in complex systems
• Theories of macroscopic behavior (phase transitions)
Scientists forming co-evolving networks
World
Proposed Components of a Research Program in Network Science and Engineering
Structural Analysis of Complex Networks
Dynamics on Complex Networks
Co-evolution of dynamics, network and individual behavior
Measurement and Inference
Network Science in the Real World
Key Research Challenges (NA report on Network Science)
1. Dynamics: better understanding of the relationship between structure and function
2. Modeling and Analysis of large networks: tools, abstractions, approximations
3. Design and Synthesis of Networks
4. Increasing level of rigor and mathematical structure
5. Abstracting common concepts across fields
6. Better experiments and measurements of network structure
7. Robustness and Security
Motivating examples/applications
Application 1 (1736): First Use of Graphs – Seven Bridges of Königsberg
Seven Bridges of Königsberg: one of the first problems in graph theory
Is there a route that crosses each bridge exactly once and returns to the starting point?
We will see later how this problem can be solved by modeling it as a graph theory problem
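As a preview, here is a minimal sketch of the graph model. The land masses become vertices and the bridges become edges of a multigraph. Euler's criterion says a connected multigraph has a closed route using every edge exactly once if and only if every vertex has even degree. The vertex labels A to D are an assumed labeling (A for the central island, B and C for the two river banks, D for the eastern land mass).

```python
from collections import Counter, defaultdict

# The seven bridges as edges of a multigraph (vertex labels are assumptions).
BRIDGES = [
    ("A", "B"), ("A", "B"),   # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),   # two bridges from the island to the other bank
    ("A", "D"),               # island to the eastern land mass
    ("B", "D"), ("C", "D"),   # each bank to the eastern land mass
]

def degrees(edges):
    """Number of edge endpoints at each vertex (parallel edges count separately)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def is_connected(edges):
    """True if every vertex that appears in an edge is reachable from any other."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def has_euler_circuit(edges):
    """Euler's criterion: connected and all degrees even."""
    return is_connected(edges) and all(d % 2 == 0 for d in degrees(edges).values())

print(dict(degrees(BRIDGES)))
print(has_euler_circuit(BRIDGES))
```

Every land mass touches an odd number of bridges, so no such closed route exists. A simple four-cycle such as `[("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]` passes the same test.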
Application 2 (1850s): Cholera Pandemic: John Snow
First Cholera Pandemic
Second Cholera Pandemic
During this time the germ theory of disease was not widely accepted.
During John Snow's lifetime there were three pandemics of Asiatic cholera (1817-23, 1826-37 and 1846-63), two of which reached the British Isles.
The epidemic of 1848 to 1849 killed between 50,000 and 70,000 in England and Wales. A third outbreak in 1854 left over 30,000 people dead in London alone.
Vibrio cholerae: toxin alters the sodium pump in intestinal cells, causing fluid loss
Entry: oral. Colonization: small intestine. Symptoms: nausea, diarrhea, muscle cramps, shock
http://www.ph.ucla.edu/epi/snow.html
Application 3 (1950-60) Segregation (Schelling): Micromotives to Macrobehavior
Duncan and Duncan’s (1957) study of Chicago 1940-1950 census tracts: mixed neighborhoods all segregate
Schelling placed pennies and dimes on a chess board and moved them around according to various rules.
• Board = city, square = housing lot; each agent sits at a location
• Pennies and dimes = agents representing two groups in society, e.g. boys and girls, smokers and non-smokers, etc.
• Neighborhood = adjacent locations on the board
• Happy if (neighbors of same type > threshold)
• If unhappy, then move to a random location where the agent is happy
Result: many basic configurations produce segregation
• Relates decisions about where to live (micro) to patterns of segregation (macro)
• No obvious relationship between individual behavior and aggregate outcomes
• Behavior is interdependent: individuals’ behaviors depend on social context (micro)
• Individual behaviors collectively change social context (long term, macro)
http://cs.gmu.edu/~eclab/projects/mason/projects/schelling/
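The penny-and-dime rules can be sketched in a few lines. This is an illustrative toy, not the MASON implementation linked above. The board size, the share of empty squares, the eight-square neighborhood, the threshold value, and the helper names are all assumptions chosen for the example.

```python
import random

EMPTY, PENNY, DIME = 0, 1, 2

def make_board(size, frac_empty, rng):
    """Random board: each square is empty or holds a penny or a dime."""
    return [[EMPTY if rng.random() < frac_empty else rng.choice((PENNY, DIME))
             for _ in range(size)] for _ in range(size)]

def neighbor_values(board, r, c):
    """Contents of the up-to-eight squares adjacent to (r, c)."""
    size = len(board)
    return [board[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < size and 0 <= c + dc < size]

def is_happy(board, r, c, kind, threshold):
    """Happy if the number of same-type neighbors exceeds the threshold."""
    return sum(1 for v in neighbor_values(board, r, c) if v == kind) > threshold

def sweep(board, threshold, rng):
    """Give every unhappy agent one chance to move; return the number of moves."""
    size = len(board)
    agents = [(r, c) for r in range(size) for c in range(size) if board[r][c] != EMPTY]
    rng.shuffle(agents)
    moves = 0
    for r, c in agents:
        kind = board[r][c]
        if is_happy(board, r, c, kind, threshold):
            continue
        board[r][c] = EMPTY  # lift the coin so it does not count as its own neighbor
        options = [(rr, cc) for rr in range(size) for cc in range(size)
                   if board[rr][cc] == EMPTY and is_happy(board, rr, cc, kind, threshold)]
        if options:
            rr, cc = rng.choice(options)
            board[rr][cc] = kind
            moves += 1
        else:
            board[r][c] = kind  # no happy square available: stay put
    return moves

def same_type_share(board):
    """Mean share of like neighbors among occupied neighbors (a crude segregation index)."""
    shares = []
    for r, row in enumerate(board):
        for c, kind in enumerate(row):
            occupied = [v for v in neighbor_values(board, r, c) if v != EMPTY]
            if kind != EMPTY and occupied:
                shares.append(sum(v == kind for v in occupied) / len(occupied))
    return sum(shares) / len(shares) if shares else 0.0

rng = random.Random(0)
board = make_board(size=20, frac_empty=0.2, rng=rng)
pennies_before = sum(row.count(PENNY) for row in board)
dimes_before = sum(row.count(DIME) for row in board)
share_before = same_type_share(board)
for _ in range(50):                      # stop early once nobody moves
    if sweep(board, threshold=2, rng=rng) == 0:
        break
share_after = same_type_share(board)
print(f"like-neighbor share: {share_before:.2f} -> {share_after:.2f}")
```

Comparing the printed index before and after the sweeps, for several thresholds, is one way to explore the micro-to-macro link described above.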
Application 4: Power grids and cascading failures
The vast system of electricity generation, transmission & distribution is essentially a single network
Power flows through all paths from source to sink (flow calculations are important for other networks, even social ones)
All AC lines within an interconnect must be in sync
If frequency varies too much (as a line approaches capacity), a circuit breaker takes the generator out of the system
Larger flows are sent to neighboring parts of the grid – triggering a cascading failure
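The mechanism in the last point can be sketched with a deliberately crude load-redistribution toy. It is not a power-flow calculation: real grids follow AC physics, whereas this sketch simply assumes that a failed component's load is split equally among its surviving neighbors. The function name, the four-node chain, and the load and capacity numbers are all hypothetical.

```python
def cascade(adj, load, capacity, initial_failure):
    """Return the set of failed nodes after one initial failure.

    Assumed rule: when a node fails, its load is divided equally among its
    neighbors that are still alive; any neighbor pushed above its capacity
    fails in turn. Load with no surviving neighbor is simply shed.
    """
    load = dict(load)          # work on a copy
    failed = set()
    pending = [initial_failure]
    while pending:
        node = pending.pop()
        if node in failed:
            continue
        failed.add(node)
        alive = [w for w in adj[node] if w not in failed]
        if not alive:
            continue
        share = load[node] / len(alive)
        load[node] = 0.0
        for w in alive:
            load[w] += share
            if load[w] > capacity[w]:
                pending.append(w)
    return failed

# Hypothetical four-node chain a - b - c - d, each node carrying 8 units.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
load = {v: 8.0 for v in adj}

tight = cascade(adj, load, {v: 10.0 for v in adj}, "a")   # little spare capacity
roomy = cascade(adj, load, {v: 20.0 for v in adj}, "a")   # ample spare capacity
print(sorted(tight), sorted(roomy))
```

With little headroom the single failure takes down the whole chain, while with ample headroom it stays local.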
Application 4: Blackout of 2003
Electrical infrastructure: affected an area with 50 million people in eight US states and two provinces in Canada
Approximately 61,800 megawatts (MW) of load
Most of the cascade happened extremely rapidly, from 4:10 pm to 4:13 pm
Human and information system errors also contributed to the cascade
Other infrastructures, including water, communication, and most notably transportation (rail, road and air), were affected
TV and radio stations were also affected
Timeline for 2003 Blackout: Need for Multi-level networks
The 2003 blackout wasn't just about fallen trees and broken transmission lines. As this timeline from the Department of Energy report shows, it resulted from a combination of many grid events, computer glitches, and human interaction.
Blackout of 2003: Timeline – The Initial Phase
12:15 p.m. Incorrect telemetry data renders inoperative the state estimator, a power flow monitoring tool operated by the Indiana-based Midwest Independent Transmission System Operator (MISO). An operator corrects the telemetry problem but forgets to restart the monitoring tool.
1:31 p.m. The Eastlake, Ohio generating plant shuts down. The plant is owned by FirstEnergy, an Akron, Ohio-based company that had experienced extensive recent maintenance problems.
2:02 p.m. The first of several 345 kV overhead transmission lines in northeast Ohio fails due to contact with a tree in Walton Hills, Ohio.
2:14 p.m. An alarm system fails at FirstEnergy's control room and is not repaired.
3:05 p.m. A 345 kV transmission line known as the Chamberlain-Harding line fails in Parma, south of Cleveland, due to a tree.
3:17 p.m. Voltage dips temporarily on the Ohio portion of the grid. Controllers take no action.
3:32 p.m. Power shifted by the first failure onto another 345 kV power line, the Hanna-Juniper interconnection, causes it to sag into a tree, bringing it offline as well. While MISO and FirstEnergy controllers concentrate on understanding the failures, they fail to inform system controllers in nearby states.
3:39 p.m. A FirstEnergy 138 kV line fails in northern Ohio.
3:41 p.m. A circuit breaker connecting FirstEnergy's grid with that of American Electric Power is tripped as a 345 kV power line (Star-South Canton interconnection) and fifteen 138 kV lines fail in rapid succession in northern Ohio.
http://en.wikipedia.org/wiki/Northeast_Blackout_of_2003
Blackout of 2003: Timeline -- the cascade begins
3:46 p.m. A fifth 345 kV line, the Tidd-Canton Central line, trips offline.
4:05:57 p.m. The Sammis-Star 345 kV line trips due to undervoltage and overcurrent interpreted as a short circuit. Later analysis suggests that the blackout could have been averted prior to this failure by cutting 1.5 GW of load in the Cleveland–Akron area.
4:06–4:08 p.m. Sustained power surge north toward Cleveland overloads three 138 kV lines.
4:09:02 p.m. Voltage sags deeply as Ohio draws 2 GW of power from Michigan, creating simultaneous undervoltage and overcurrent conditions as power attempts to flow in such a way as to rebalance the system's voltage.
4:10:34 p.m. Many transmission lines trip out, first in Michigan and then in Ohio, blocking the eastward flow of power around the south shore of Lake Erie. Suddenly bereft of demand, generating stations go offline, creating a huge power deficit. In seconds, power surges in from the east, overloading east-coast power plants whose generators go offline as a protective measure, and the blackout is on.
4:10:37 p.m. The eastern and western Michigan power grids disconnect from each other. Two 345 kV lines in Michigan trip. A line that runs from Grand Ledge to Ann Arbor known as the Oneida-Majestic interconnection trips. A short time later, a line running from Bay City south to Flint in Consumers Energy's system known as the Hampton-Thetford line also trips.
4:10:38 p.m. Cleveland separates from the Pennsylvania grid.
Blackout of 2003: Timeline -- Crescendo
4:10:39 p.m. 3.7 GW power flows from the east along the north shore of Lake Erie, through Ontario to southern Michigan and northern Ohio, a flow more than ten times greater than the condition 30 seconds earlier, causing a voltage drop across the system.
4:10:40 p.m. Flow flips to 2 GW eastward from Michigan through Ontario (a net reversal of 5.7 GW of power), then reverses back westward again within a half second.
4:10:43 p.m. International connections between the United States and Canada begin failing.
4:10:45 p.m. Northwestern Ontario separates from the east when the Wawa-Marathon 230 kV line north of Lake Superior disconnects. The first Ontario power plants go offline in response to the unstable voltage and current demand on the system.
4:10:46 p.m. New York separates from the New England grid.
4:10:50 p.m. Ontario separates from the western New York grid.
4:11:57 p.m. The Keith-Waterman, Bunce Creek-Scott 230 kV lines and the St. Clair-Lambton #1 230 kV line and #2 345 kV line between Michigan and Ontario fail.
4:12:03 p.m. Windsor, Ontario and surrounding areas drop off the grid.
4:12:58 p.m. Northern New Jersey separates its power grids from New York and the Philadelphia area, causing a cascade of failing secondary generator plants along the Jersey coast and throughout the inland west.
4:13 p.m. End of cascading failure. 256 power plants are offline, 85% of which went offline after the grid separations occurred, most due to the action of automatic protective controls.
Milgram’s Small World Experiment
Travers & Milgram 1969: classic early social network study
• Destination: a Boston stockbroker; lived in Sharon, MA
• Sources: Nebraska stockowners; forward letter to a first-name acquaintance “closer” to target
• Information provided: name, address, occupation, firm, college, wife’s name and hometown; navigational value?
Basic findings:
• 64 of 296 chains reached the target
• 20% of senders reached target
• Average chain length = 6.5: “Six degrees of separation”
• Average length of completed chains: 5.2
• Interaction of chain length and navigational difficulties
• Main approach routes: home (6.1) and work (4.6)
• Boston sources (4.4) faster than Nebraska (5.5)
• No advantage for Nebraska stockowners
Recent small world experiment
Setup
• Email experiment: Dodds, Muhamad, Watts, Science 301 (2003)
• 18 targets, 13 different countries
• 60,000+ participants
• Targets included a professor at an Ivy League university, an archival inspector in Estonia, a technology consultant in India, a policeman in Australia, and a veterinarian in the Norwegian army
Basic Analysis
• Approximately 37% participation rate
• Probability of a chain of length 10 getting through: 0.37^10 ≈ 5 × 10^-5, so only one out of 20,000 chains would make it
• Actual number of completed chains: 384 (1.6% of all chains)
• Average path length: 4, median: 7
• Small changes in attrition rates lead to large changes in completion rates, e.g., a 15% decrease in attrition rate would lead to an 800% increase in completion rate
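The arithmetic behind the basic analysis can be checked with a short sketch. It rests on simplifying assumptions that are not in the slide: every step passes the message on independently with the same probability (37%), the chain has exactly 10 steps, and the "15% decrease" is read as a relative decrease in the attrition rate. The slide's 800% figure presumably comes from the actual data, so only the order of magnitude should be compared.

```python
def completion_probability(pass_rate, length):
    """Chance that a chain survives `length` independent steps."""
    return pass_rate ** length

pass_rate = 0.37                      # participation rate from the slide
attrition = 1.0 - pass_rate           # probability a step drops the message

p10 = completion_probability(pass_rate, 10)

# Assumed reading: attrition shrinks by 15% in relative terms.
lower_attrition = 0.85 * attrition
p10_improved = completion_probability(1.0 - lower_attrition, 10)

ratio = p10_improved / p10
print(f"P(length-10 chain completes) = {p10:.2e}, about 1 in {1 / p10:,.0f}")
print(f"completion ratio after lowering attrition: {ratio:.1f}x")
```

Because the per-step probability is raised to the tenth power, a modest change in attrition is amplified into a change of roughly an order of magnitude in completion.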
Estimating ‘recovered’ chain lengths for uncompleted chains
<L> = 4.05 for all completed chains
L* = estimated 'true' median chain length
• Intra-country chains: L* = 5
• Inter-country chains: L* = 7
• All chains: L* = 7
• Milgram: L* ~ 8-9 hops
Attrition rate stays approx. constant throughout
r_L – probability of not passing on the message at distance L from the source
[Figure legend: average; 95% confidence interval]
Estimated ‘recovered’ chain lengths
[Figure: histograms of observed chain lengths and ‘recovered’ path lengths, shown for inter-country and intra-country chains]
Small world experiment at Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)
. . . and disproportionately avoided
• hubs (8% vs. 1%) (+ no evidence of funnels)
• family/friendship ties (60% vs. 83%)
Strategy: Geography -> Work
How many hops actually separate any two individuals in the world?
• Participants are not perfect in routing messages
• They use only local information
“The accuracy of small world chains in social networks”, Peter D. Killworth, Chris McCarty, H. Russell Bernard & Mark House:
• Analyze 10920 shortest path connections between 105 members of an interviewing bureau, together with the equivalent conceptual, or ‘small world’ routes, which use individuals’ selections of intermediaries
• This permits the first study of the impact of accuracy within small world chains
• The mean small world path length (3.23) is 40% longer than the mean of the actual shortest paths (2.30)
• Model suggests that people make a less than optimal small world choice more than half the time
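The gap between routes chosen with local information and true shortest paths can be illustrated on a synthetic graph. This sketch does not use the Killworth et al. data or their model. It assumes a ring of 60 nodes with 10 random shortcuts and a greedy rule in which each node forwards to the neighbor closest to the target in ring distance. As with the 10920 = 105 × 104 connections in the study, means are taken over all ordered pairs.

```python
import random
from collections import deque

def ring_with_shortcuts(n, n_shortcuts, rng):
    """Ring lattice plus a few random long-range links (assumed toy topology)."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    added = 0
    while added < n_shortcuts:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def ring_distance(u, v, n):
    d = abs(u - v)
    return min(d, n - d)

def shortest_lengths(adj, source):
    """Breadth-first search: true hop counts from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def greedy_length(adj, source, target, n):
    """Hops used when each node forwards to its neighbor nearest the target.
    A ring neighbor is always one step closer, so the loop terminates."""
    hops, current = 0, source
    while current != target:
        current = min(adj[current], key=lambda w: ring_distance(w, target, n))
        hops += 1
    return hops

n = 60
adj = ring_with_shortcuts(n, n_shortcuts=10, rng=random.Random(1))
pairs = total_shortest = total_greedy = 0
for s in range(n):
    dist = shortest_lengths(adj, s)
    for t in range(n):
        if s != t:
            pairs += 1
            total_shortest += dist[t]
            total_greedy += greedy_length(adj, s, t, n)
mean_shortest = total_shortest / pairs
mean_greedy = total_greedy / pairs
print(f"mean shortest = {mean_shortest:.2f}, mean greedy = {mean_greedy:.2f}")
```

Greedy routes can never be shorter than the true shortest paths, and the ratio of the two means plays the role of the 3.23 versus 2.30 comparison above.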
Tentative Schedule
Week 1 – Module 1: December 1-2 (Wednesday & Thursday)
• Wednesday (1st December): Introduction to Network Science
• Thursday (2nd December): SDS and Diffusion on Networks
• Friday (extra class if there is interest): EpiCure – modeling environment for studying malware propagation in wireless networks
Week 2 – Module 2: December 6-9 (Monday, Tuesday, Thursday)
• Monday (6th December): Control and Influence maximization
• Tuesday (7th December): Branching process result, proof of Fastdiffuse. Introduction to various diffusion style modeling environments
• Wednesday (extra class if there is interest): Population and Network Synthesis. Introduction to graph analysis
• Thursday (9th December): SIMDEMICS and related modeling environments
Week 3 – Module 3 and Module 4: December 13-16
• Monday (13th December): Markets, Games, Mechanism Design and SIGMA: a modeling environment to study commodity markets on networks
• Tuesday (14th December): Shortest Paths, Formal language constrained paths, Greedy routing, routing in small world networks, Introduction to TRANSIMS
• Thursday (15th December): Concluding remarks, Brief discussion of uncovered topics, Open Problems, Directions for Future Work