Socialnetworkanalysis (Tin180 Com)

http://tin180.com - A wholesome culture news site

Page 1: Socialnetworkanalysis (Tin180 Com)

April 12, 2023 Data Mining: Concepts and Techniques 1

Social Network Analysis

Social Network Introduction

Statistics and Probability Theory

Models of Social Network Generation

Networks in Biological Systems

Page 2: Socialnetworkanalysis (Tin180 Com)

Society

Nodes: individuals

Links: social relationships (family, work, friendship, etc.)

Social networks: many individuals with diverse social interactions between them.

S. Milgram (1967)

John Guare, "Six Degrees of Separation"

Page 3: Socialnetworkanalysis (Tin180 Com)

Communication networks

The Earth is developing an electronic nervous system, a network whose diverse nodes and links include:

- computers
- routers
- satellites
- phone lines
- TV cables
- EM waves

Communication networks: Many non-identical components with diverse connections between them.

Page 4: Socialnetworkanalysis (Tin180 Com)

Complex systems

Made of many non-identical elements connected by diverse interactions.

NETWORK

Page 5: Socialnetworkanalysis (Tin180 Com)

“Natural” Networks and Universality

Consider many kinds of networks: social, technological, business, economic, content, …

These networks tend to share certain informal properties:
- large scale; continual growth
- distributed, organic growth: vertices “decide” who to link to
- interaction restricted to links
- mixture of local and long-distance connections
- abstract notions of distance: geographical, content, social, …

Do natural networks share more quantitative universals?
- What would these “universals” be?
- How can we make them precise and measure them?
- How can we explain their universality?

This is the domain of social network theory, sometimes also referred to as link analysis.

Page 6: Socialnetworkanalysis (Tin180 Com)

Some Interesting Quantities

- Connected components:
  - how many, and how large?
- Network diameter:
  - maximum (worst-case) or average?
  - exclude infinite distances (disconnected components)?
  - the small-world phenomenon
- Clustering:
  - to what extent do links tend to cluster “locally”?
  - what is the balance between local and long-distance connections?
  - what roles do the two types of links play?
- Degree distribution:
  - what is the typical degree in the network?
  - what is the overall distribution?
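The first two quantities can be computed with breadth-first search. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor lists; the function names and the toy graph are illustrative and do not come from the slides. Infinite distances between disconnected components are excluded, as the slide suggests.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def connected_components(adj):
    """Return a list of vertex sets, one per connected component."""
    seen, components = set(), []
    for s in adj:
        if s not in seen:
            comp = set(bfs_distances(adj, s))
            seen |= comp
            components.append(comp)
    return components

def diameter_stats(adj):
    """Return (maximum, average) finite distance over ordered pairs u != v."""
    finite = [d for u in adj
              for v, d in bfs_distances(adj, u).items() if v != u]
    if not finite:
        return 0, 0.0
    return max(finite), sum(finite) / len(finite)

# Toy graph (an assumption for illustration): a path 0-1-2-3 and a separate edge 4-5.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = connected_components(toy)
worst_case, average = diameter_stats(toy)
print(len(components), worst_case, round(average, 3))
```

Running one BFS per vertex costs O(N·(N + E)), which is fine for small graphs; large networks usually need sampling of source vertices instead.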

Page 7: Socialnetworkanalysis (Tin180 Com)

A “Canonical” Natural Network has…

- Few connected components:
  - often only 1 or a small number, independent of network size
- Small diameter:
  - often a constant independent of network size (like 6)
  - or perhaps growing only logarithmically with network size, or even shrinking?
  - typically exclude infinite distances
- A high degree of clustering:
  - considerably more so than for a random network
  - in tension with small diameter
- A heavy-tailed degree distribution:
  - a small but reliable number of high-degree vertices
  - often of power law form

Page 8: Socialnetworkanalysis (Tin180 Com)

Probabilistic Models of Networks

All of the network generation models we will study are probabilistic or statistical in nature.

- They can generate networks of any size.
- They often have various parameters that can be set:
  - size of the network generated
  - average degree of a vertex
  - fraction of long-distance connections
- The models generate a distribution over networks.
- Statements are always statistical in nature:
  - with high probability, diameter is small
  - on average, degree distribution has heavy tail

Thus, we’re going to need some basic statistics and probability theory.

Page 9: Socialnetworkanalysis (Tin180 Com)

Zipf’s Law

Look at the frequency of English words:
- “the” is the most common, followed by “of”, “to”, etc.
- claim: frequency of the n-th most common word ~ 1/n (power law, α = 1)

General theme:
- rank events by their frequency of occurrence
- the resulting distribution is often a power law!

Other examples:
- North American city sizes
- personal income
- file sizes
- genus sizes (number of species)

People seem to dither over the exact form of these distributions (e.g. the value of α), but not over their heavy tails.
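The rank-frequency idea can be tried out directly. This is a rough sketch, assuming a plain least-squares fit of log(frequency) against log(rank); that is a crude estimator of α chosen for brevity, not a method prescribed by the slides, and the helper names are hypothetical. The synthetic data follow an exact 1/n law with 300 points.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return occurrence counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def fit_power_law_exponent(freqs):
    """Least-squares slope of log(freq) vs. log(rank); returns alpha = -slope."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic data (an assumption for illustration): frequency of rank n is 1000/n.
zipf_freqs = [1000.0 / n for n in range(1, 301)]
alpha = fit_power_law_exponent(zipf_freqs)
print(f"estimated alpha = {alpha:.3f}")
```

On real word counts, `rank_frequency(text.split())` would supply the frequencies; the fitted exponent is then only approximate, which is consistent with the slide's remark that the exact value of α is debated.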

Page 10: Socialnetworkanalysis (Tin180 Com)

Zipf’s Law

[Figure: two panels, one with linear scales on both axes and one with logarithmic scales on both axes.]

The same data plotted on linear and logarithmic scales. Both plots show a Zipf distribution with 300 data points.
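As a reading aid, here is why a power law shows up as a straight line on logarithmic axes. The normalisation constant $C$ is introduced only for this illustration and does not appear in the slides.

```latex
f(n) = C\, n^{-\alpha}
\quad\Longrightarrow\quad
\log f(n) = \log C - \alpha \log n
```

On logarithmic axes the points therefore lie on a line of slope $-\alpha$; for Zipf's law ($\alpha = 1$) the slope is $-1$.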

Page 11: Socialnetworkanalysis (Tin180 Com)

Social Network Analysis

Social Network Introduction

Statistics and Probability Theory

Models of Social Network Generation

Networks in Biological Systems

Summary

Page 12: Socialnetworkanalysis (Tin180 Com)

Some Models of Network Generation

- Random graphs (Erdős-Rényi models):
  - give few components and small diameter
  - do not give high clustering or heavy-tailed degree distributions
  - are the mathematically most well-studied and best-understood model
- Watts-Strogatz models:
  - give few components, small diameter and high clustering
  - do not give heavy-tailed degree distributions
- Scale-free networks:
  - give few components, small diameter and a heavy-tailed degree distribution
  - do not give high clustering
- Hierarchical networks:
  - few components, small diameter, high clustering, heavy-tailed degree distribution
- Affiliation networks:
  - model group-actor formation

Page 13: Socialnetworkanalysis (Tin180 Com)

The Clustering Coefficient of a Network

Let nbr(u) denote the set of neighbors of u in a graph:
- all vertices v such that the edge (u,v) is in the graph

The clustering coefficient of u:
- let k = |nbr(u)| (i.e., number of neighbors of u)
- choose(k,2): max possible # of edges between vertices in nbr(u)
- c(u) = (actual # of edges between vertices in nbr(u)) / choose(k,2)
- 0 <= c(u) <= 1; measure of cliquishness of u’s neighborhood

Clustering coefficient of a graph:
- average of c(u) over all vertices u

Example: k = 4, choose(k,2) = 6, c(u) = 4/6 = 0.666…
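The definition translates almost line by line into code. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets and sets c(u) = 0 for vertices with fewer than two neighbors, where choose(k,2) is zero; that convention is an assumption, since the slide does not cover the case. The example graph is made up to reproduce the slide's numbers: u has 4 neighbors with 4 of the 6 possible edges among them.

```python
from itertools import combinations

def clustering_coefficient(adj, u):
    """c(u) = (edges among neighbors of u) / choose(k, 2)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: undefined case treated as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def graph_clustering(adj):
    """Average of c(u) over all vertices u."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)

# Illustrative graph: neighbors a, b, c, d of u form the cycle a-b-c-d-a.
example = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "d"},
    "b": {"u", "a", "c"},
    "c": {"u", "b", "d"},
    "d": {"u", "c", "a"},
}
print(clustering_coefficient(example, "u"))  # 4/6
print(graph_clustering(example))
```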

Page 14: Socialnetworkanalysis (Tin180 Com)

The Clustering Coefficient of a Network

Clustering: My friends will likely know each other!

C = (# of links between the n neighbors of a node) / (n(n-1)/2)

Probability to be connected: C >> p

Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)].

(C: clustering coefficient; Crand: clustering coefficient of a comparable random graph; L: characteristic path length; N: number of nodes)

Network        C          Crand     L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Metabolic      0.32       0.026     2.9        282
Foodweb        0.22       0.06      2.43       134
C. elegans     0.28       0.05      2.65       282

Page 15: Socialnetworkanalysis (Tin180 Com)

Erdős-Rényi: Clustering Coefficient

- Generate a network G according to G(N,p)
- Examine a “typical” vertex u in G
  - choose u at random among all vertices in G
  - what do we expect c(u) to be?
- Answer: exactly p!
- In G(N,m), expect c(u) to be 2m/(N(N-1))
- Both cases: c(u) entirely determined by overall density
- Baseline for comparison with “more clustered” models
  - Erdős-Rényi has no bias towards clustered or local edges
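The claim that c(u) is about p can be checked empirically. This is a small simulation sketch, assuming G(N,p) is sampled by flipping an independent coin for every vertex pair; the function names, the seed, and the values N = 200 and p = 0.1 are arbitrary choices for illustration. Vertices with fewer than two neighbors are given c(u) = 0 by assumption.

```python
import random
from itertools import combinations

def gnp_graph(n, p, seed=None):
    """Sample an undirected graph from G(n, p) as a dict of neighbor sets."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # each possible edge is present independently
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj, u):
    """c(u) = (edges among neighbors of u) / choose(k, 2); 0 if k < 2."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

n, p = 200, 0.1
graph = gnp_graph(n, p, seed=0)
avg_c = sum(clustering_coefficient(graph, u) for u in graph) / n
# Sum of degrees is 2m, so this equals 2m / (N(N-1)), the overall density.
density = sum(len(nbrs) for nbrs in graph.values()) / (n * (n - 1))
print(f"p = {p}, density = {density:.3f}, average c(u) = {avg_c:.3f}")
```

Both printed values should land close to p, which illustrates why Erdős-Rényi graphs serve as the unclustered baseline.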

Page 16: Socialnetworkanalysis (Tin180 Com)

Scale-free Networks

- The number of nodes (N) is not fixed
  - Networks continuously expand by the addition of new nodes
    - WWW: addition of new nodes
    - Citation: publication of new papers
- The attachment is not uniform
  - A node is linked with higher probability to a node that already has a large number of links
    - WWW: new documents link to well-known sites (CNN, Yahoo, Google)
    - Citation: well-cited papers are more likely to be cited again

Page 17: Socialnetworkanalysis (Tin180 Com)

Scale-Free Networks

- Start with (say) two vertices connected by an edge
- For i = 3 to N:
  - for each 1 <= j < i, d(j) = degree of vertex j so far
  - let Z = Σ d(j) (sum of all degrees so far)
  - add new vertex i with k edges back to {1, …, i-1}:
    - i is connected back to j with probability d(j)/Z
- Vertices j with high degree are likely to get more links!
- “Rich get richer”
- Natural model for many processes:
  - hyperlinks on the web
  - new business and social contacts
  - transportation networks
- Generates a power law distribution of degrees
  - exponent depends on value of k
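The growth procedure above can be sketched in a few lines. This version assumes the k targets of each new vertex are drawn independently (with replacement) from the degrees as they stood before the vertex arrived, so repeated edges to the same target are possible; the slides do not say how duplicates are handled, and the function names are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, k, seed=None):
    """Grow a graph on vertices 1..n; return its edge list as (new, old) pairs."""
    rng = random.Random(seed)
    edges = [(2, 1)]   # start with two vertices connected by an edge
    endpoints = [1, 2]  # vertex j appears d(j) times in this list
    for i in range(3, n + 1):
        # A uniform draw from `endpoints` picks j with probability d(j)/Z.
        # All k draws use the degrees "so far", before i's own edges count.
        targets = [rng.choice(endpoints) for _ in range(k)]
        for j in targets:
            edges.append((i, j))
            endpoints.extend((i, j))
    return edges

def degree_counts(edges):
    """Return a Counter mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

edges = preferential_attachment(1000, 2, seed=1)
deg = degree_counts(edges)
print("max degree:", max(deg.values()))
print("median degree:", sorted(deg.values())[len(deg) // 2])
```

The gap between the maximum and the median degree gives a quick impression of the "rich get richer" effect; a histogram of `deg.values()` on logarithmic axes would show the heavy tail more fully.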

Page 18: Socialnetworkanalysis (Tin180 Com)

Scale-Free Networks

- Preferential attachment explains
  - heavy-tailed degree distributions
  - small diameter (~log(N), via “hubs”)
- Will not generate high clustering coefficient
  - no bias towards local connectivity, but towards hubs

Page 19: Socialnetworkanalysis (Tin180 Com)

Social Network Analysis

Social Network Introduction

Statistics and Probability Theory

Models of Social Network Generation

Networks in Biological Systems

Mining on Social Network

Summary

Page 20: Socialnetworkanalysis (Tin180 Com)

Bio-Map

[Figure: Bio-Map linking GENOME, PROTEOME and METABOLISM through protein-gene interactions, protein-protein interactions and bio-chemical reactions (Citrate Cycle).]

Page 21: Socialnetworkanalysis (Tin180 Com)

Metabolic Network

[Figure: the METABOLISM part of the Bio-Map: bio-chemical reactions (Citrate Cycle).]

Page 22: Socialnetworkanalysis (Tin180 Com)

Page 23: Socialnetworkanalysis (Tin180 Com)

Metabolic Network

Nodes: chemicals (substrates)

Links: bio-chemical reactions

Page 24: Socialnetworkanalysis (Tin180 Com)

Metabolic Network

The metabolic networks of organisms from all three domains of life are scale-free!

[Figure: three panels: Archaea, Bacteria, Eukaryotes.]

H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási, Nature 407, 651 (2000)

Page 25: Socialnetworkanalysis (Tin180 Com)

Bio-Map

[Figure: Bio-Map linking GENOME, PROTEOME and METABOLISM through protein-gene interactions, protein-protein interactions and bio-chemical reactions (Citrate Cycle).]

Page 26: Socialnetworkanalysis (Tin180 Com)

Protein Network

[Figure: the PROTEOME part of the Bio-Map: protein-protein interactions.]

Page 27: Socialnetworkanalysis (Tin180 Com)

Yeast Protein Network

Nodes: proteins

Links: physical interactions (binding)

P. Uetz et al., Nature 403, 623-627 (2000)

Page 28: Socialnetworkanalysis (Tin180 Com)

Topology of the Protein Network

$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$

H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, 41-42 (2001)

Page 29: Socialnetworkanalysis (Tin180 Com)

p53 Network

“One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.”

Nature 408, 307 (2000)

Page 30: Socialnetworkanalysis (Tin180 Com)

p53 Network (mammals)