
John C.S. Lui
Computer Science & Engineering Dept.
The Chinese University of Hong Kong

EXPLORING LARGE GRAPHS: FROM RANDOM WALK TO CYBER-INSURANCE


Motivation

sample of Twitter network

measure characteristics of networks in the wild

measurement distortions

“World Map” in 1459

proved incomplete (Columbus et al., 1492; Australia, 17th century)

wrong proportions (Africa & Asia)

The Fra Mauro world map (1459). Source: Wikipedia

Outline

methods to sample graphs (e.g., online social networks)

uniform vertex sampling vs. uniform edge sampling

random walks

Frontier sampling random walk

results

Sampling graphs

random sampling (uniform & independent): vertex sampling, edge sampling

crawling: BFS sampling, random walk sampling

Independent sampling

uniform vertex sampling

θi - fraction of vertices with degree i

a sampled vertex has degree i with probability θi

uniform edge sampling

πi - probability that a vertex with degree i is sampled

πi = θi × i / <average degree>

estimating θi from πi (uniform edge): trivial to remove bias
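The bias removal mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the input is the list of degrees of vertices obtained through uniform edge sampling; the function name is made up for this example and does not come from the slides.

```python
from collections import Counter

def estimate_theta_from_edge_samples(sampled_degrees):
    """Estimate theta_i from degrees of vertices reached by uniform edge sampling.

    Since pi_i = theta_i * i / <average degree>, theta_i is proportional to
    pi_i / i; normalizing removes the unknown average degree.
    """
    counts = Counter(sampled_degrees)
    total = len(sampled_degrees)
    # Empirical pi_i divided by i is an unnormalized estimate of theta_i.
    weights = {deg: (cnt / total) / deg for deg, cnt in counts.items()}
    norm = sum(weights.values())
    return {deg: w / norm for deg, w in weights.items()}

# Star graph with 3 leaves: theta_1 = 3/4, theta_3 = 1/4, but edge sampling
# returns the center (degree 3) half of the time.
estimate = estimate_theta_from_edge_samples([3, 1, 3, 1])
```

Dividing by the degree and renormalizing is all that is needed, which is why the slide calls the correction trivial.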

Random sampling: accuracy of estimates

estimate: θi - fraction of vertices with degree i

budget: B samples

accuracy metric: Normalized Root Mean Squared Error (NMSE)

compared for: uniform vertex, uniform edge
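As a rough illustration of the accuracy metric, the sketch below measures the NMSE of uniform vertex sampling by repeated simulation on a toy degree sequence. The toy data and function names are assumptions of this example. The closed form used for comparison follows from the binomial variance of independent samples; it is not taken from the slides.

```python
import math
import random

def nmse(estimates, true_value):
    """Normalized root mean squared error: sqrt(E[(est - true)^2]) / true."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

def uniform_vertex_estimate(degrees, i, budget, rng):
    """Estimate theta_i from `budget` vertices drawn uniformly with replacement."""
    hits = sum(1 for _ in range(budget) if rng.choice(degrees) == i)
    return hits / budget

def nmse_uniform_vertex_closed_form(theta_i, budget):
    """With independent uniform samples the hit count is binomial(B, theta_i)."""
    return math.sqrt((1 - theta_i) / (budget * theta_i))

rng = random.Random(0)
degrees = [1] * 900 + [10] * 100          # toy degree sequence, theta_10 = 0.1
runs = [uniform_vertex_estimate(degrees, 10, 200, rng) for _ in range(2000)]
empirical = nmse(runs, 0.1)
predicted = nmse_uniform_vertex_closed_form(0.1, 200)
```

The closed form shows why rare degrees (small θi) are hard for uniform vertex sampling: the error grows as θi shrinks for a fixed budget B.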


Independent sampling: uniform vertex vs. uniform edge

Flickr graph (1.7M vertices, 22M edges)

sampling budget: B = |V|/100 samples

[Plot: accuracy of estimates versus vertex degree, with the average degree marked, for uniform vertex and uniform edge sampling]

uniform vertex: head GOOD, tail BAD

uniform edge: head BAD, tail GOOD

pros & cons

uniform vertex

pros:

◦ independent sampling

requires an OSN with numeric user IDs, e.g., Livejournal, Flickr, MySpace, Facebook, ...

cons:

◦ resource intensive (sparse user ID space)

◦ difficult to sample large degree vertices

uniform edge

pros:

◦ independent sampling

◦ easy to sample large degree vertices

cons:

◦ no public OSN interface to sample edges

◦ difficult to sample small degree vertices

random walk (RW) [crawling]

start at v

randomly select a neighbor of v, move to it, and repeat until B samples are collected

vertices can be sampled multiple times

often (resource-wise) cheaper than uniform vertex sampling

graph should be connected

multiple RWs: m independent walkers, each capturing B/m samples
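A simple random walk of this kind can be sketched as follows. This is a minimal illustration that assumes the graph is given as an adjacency dictionary with no isolated vertices; the function names are chosen for this example only.

```python
import random

def random_walk_sample(adj, start, budget, rng=None):
    """Return `budget` vertices visited by a simple random walk on `adj`.

    `adj` maps each vertex to a list of its neighbors (undirected graph).
    """
    rng = rng or random.Random()
    samples = []
    current = start
    for _ in range(budget):
        current = rng.choice(adj[current])   # move to a uniformly chosen neighbor
        samples.append(current)              # the same vertex may appear repeatedly
    return samples

def multiple_random_walks(adj, starts, budget, rng=None):
    """m independent walkers, each collecting budget // m samples."""
    rng = rng or random.Random()
    per_walker = budget // len(starts)
    return [random_walk_sample(adj, s, per_walker, rng) for s in starts]
```

Each step only needs the neighbor list of the current vertex, which is what makes crawling cheap compared with probing a sparse user ID space.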

RW degree distribution estimation

θi – fraction of vertices with degree i

P[sampled degree = i] = πi

in steady state, RW samples edges uniformly (only if the graph is connected)

RW = uniform edge sampling without independence
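Because a stationary RW observes πi rather than θi, each sample can be weighted by the inverse of its degree to recover θi. A sketch of the resulting estimator, writing v_1, ..., v_B for the visited vertices and ⟨d⟩ for the average degree (notation introduced here for illustration):

```latex
\hat{\theta}_i
  \;=\; \frac{\sum_{t=1}^{B} \mathbf{1}\{\deg(v_t) = i\} / \deg(v_t)}
             {\sum_{t=1}^{B} 1 / \deg(v_t)},
\qquad\text{since}\quad
\pi_i = \frac{i\,\theta_i}{\langle d \rangle}
\;\Longrightarrow\;
\theta_i = \frac{\pi_i / i}{\sum_j \pi_j / j}.
```

The denominator estimates 1/⟨d⟩, so the average degree does not need to be known in advance.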

[Plot: CCDF, P[X > x] versus x = degree (log-scale), under RW sampling: πi is the distribution observed by RW, θi is the true distribution]

pros & cons

uniform vertex

pros:

◦ independent sampling

◦ supported by OSNs with numeric user IDs: Livejournal, Flickr, MySpace, Facebook, ...

cons:

◦ resource intensive (sparse user ID space)

◦ difficult to sample large degree vertices

RW

pros:

◦ easy to sample high degree vertices

◦ resource-wise cheap

cons:

◦ graph must be connected

◦ large estimation errors when graph is loosely connected

◦ should start in steady state (discard transient samples, but transient is unknown)

Hybrid sampling?

uniform vertex samples both A and B subgraphs but is expensive

RW samples either A or B but is cheap

[Figure: subgraphs A and B]

puzzle

design a RW that in steady state samples edges uniformly (importance sampling) & initialize steady state w/ uniform vertex sampling

in steady state we want to sample vertices proportional to degree

yet we want to start with uniformly sampled vertices

Need to think in multiple dimensions (multiple walkers)


Multiple dependent walkers: Frontier Sampling (FS)

B – sampling budget

Let S = {v1, v2, … , vm} be a set of m vertices

(1) select vr ∈ S w.p. proportional to deg(vr)

(2) walk one step from vr

(3) add walked edge to E’ and update vr

(4) return to (1) (until m + |E’| = B)
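The four FS steps translate fairly directly into code. The sketch below is one possible reading. It assumes an adjacency-dictionary graph without isolated vertices, and an initial set S already drawn by uniform vertex sampling (which accounts for m units of the budget). The function name and signature are illustrative only.

```python
import random

def frontier_sampling(adj, initial_vertices, budget, rng=None):
    """Frontier Sampling sketch: m dependent walkers sharing one budget.

    `adj` maps each vertex to a list of neighbors; `initial_vertices` is S.
    Returns the list E' of walked edges.
    """
    rng = rng or random.Random()
    frontier = list(initial_vertices)
    m = len(frontier)
    edges = []
    while m + len(edges) < budget:
        # (1) pick walker r with probability proportional to deg(v_r)
        weights = [len(adj[v]) for v in frontier]
        r = rng.choices(range(m), weights=weights, k=1)[0]
        # (2) walk one step from v_r
        u = frontier[r]
        v = rng.choice(adj[u])
        # (3) record the walked edge and move walker r
        edges.append((u, v))
        frontier[r] = v
    return edges
```

Unlike m independent walkers, only one walker moves per step, and the choice of walker depends on the degrees of all current positions; that coupling is what makes the walkers dependent.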

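FS: an m-dimensional RW

G^m = m-th Cartesian power of G

Frontier sampling = random walk on G^m

[Figure: a graph G with vertices u, j, k and its Cartesian square G^2, whose vertices are the pairs (u,u), (u,j), (u,k), (j,u), (j,j), (j,k), (k,u), (k,j), (k,k)]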
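One way to see the link between FS and a random walk on the Cartesian power: a state is a tuple of m vertices, and a neighboring state differs in exactly one coordinate, by an edge of G. A uniformly chosen neighbor therefore moves walker r with probability deg(v_r) divided by the sum of all walker degrees, which matches FS step (1). The sketch below checks this on a small path graph j - u - k; that edge set is an assumption, since the figure's edges are not in the transcript.

```python
from fractions import Fraction

def cartesian_power_neighbors(adj, state):
    """Neighbors of `state` (a tuple of m vertices of G) in the m-th Cartesian power.

    Two states are adjacent when they differ in exactly one coordinate and the
    differing vertices are adjacent in G.
    """
    result = []
    for r, v in enumerate(state):
        for u in adj[v]:
            result.append(state[:r] + (u,) + state[r + 1:])
    return result

def walker_move_probabilities(adj, state):
    """Probability that a uniform RW step on G^m changes coordinate r."""
    total = len(cartesian_power_neighbors(adj, state))
    return [Fraction(len(adj[v]), total) for v in state]

adj = {"u": ["j", "k"], "j": ["u"], "k": ["u"]}   # hypothetical edges of G
neighbors = cartesian_power_neighbors(adj, ("u", "j"))
probs = walker_move_probabilities(adj, ("u", "j"))
```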

FS property

when in steady state (m → ∞)

FS state at step k: Sk = (v1, v2, ... , vm)

FS state at step k+1: Sk+1 = (v1, u2, ... , vm)

v2, u2 chosen proportional to their degrees

samples edges uniformly (like a RW)

as m → ∞, the number of walkers in v ∈ V is uniformly distributed (uniform vertex sampling)

sample paths of θ1 estimates (Flickr graph)

Flickr: 1.7M vertices, 22M edges

Plot the evolution of the θ1 estimate after n steps, where n = number of steps

4 sample paths = 4 curves

GAB graph

2 Albert-Barabasi graphs (5×10^5 vertices) w/ avg. deg. 2 and 10 connected by 1 edge

[Figure: subgraphs AB2 and AB10 joined by 1 edge]

Outline

Motivation

Model of strategic investment

Effect of cyber-insurance market on strategic investment behavior

Performance Evaluation

Summary & Lessons Learnt

From distribution to Cyber-Insurance

Motivation

Technical measures of security are abundant

Antivirus software, firewalls, intrusion detection…

Ineffective: new viruses, worms or new forms of attack; carelessness or controllability of administrators, etc.

Loss due to lack of network security is still big!

AT&T’s chief security officer: cyber-criminals’ annual profit exceeds $1 trillion, about 7% of US GDP.

In 2009, total reported losses due to payment fraud in US were $641 million.

Why?

Motivation

Virus spreading makes security interdependent

The interactions of nodes form a graph

First goal: model strategic security investment behavior

Example of virus spreading

The investment of nodes can influence each other

Invest in security or not?

Motivation

Security risks cannot be completely eliminated through technical measures

Cyber-insurance, offered by companies (e.g. AIG), can be used to deal with the residual risk.

However, the cyber-insurance market is developing slowly (estimated at $450 million)

Second goal: study the influence of cyber-insurance on strategic security investment

Understand what we need to (re-)engineer so as to activate this new business

Model: epidemic model

Combines epidemic theory and game theory

Epidemic model: spreading of virus

Investment model: decision on security investment

Epidemic model:

Use a graph to denote the interaction relationship

State of node i: healthy or infected

Each infected node contaminates its neighbors with a given prob. (bond percolation process on G)

Initial state of node i: denotes whether the node is attacked

The final state is given by a recursive equation
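The slide's recursive equation did not survive in the transcript, so the sketch below only illustrates the bond percolation idea it describes: every edge transmits with a fixed probability, and a node ends up infected if it was attacked or is reachable from an attacked node through transmitting edges. The graph representation, function name, and the choice of one coin flip per undirected edge are assumptions of this example.

```python
import random
from collections import deque

def final_infected(adj, attacked, contagion_prob, rng=None):
    """Bond percolation sketch: each edge is open with prob. `contagion_prob`,
    and a node ends up infected if an open path connects it to an attacked node.

    `adj` maps each node to a list of neighbors (undirected graph).
    """
    rng = rng or random.Random()
    open_edge = {}

    def is_open(a, b):
        key = frozenset((a, b))            # one coin flip per undirected edge
        if key not in open_edge:
            open_edge[key] = rng.random() < contagion_prob
        return open_edge[key]

    infected = set(attacked)
    queue = deque(attacked)
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in infected and is_open(node, nb):
                infected.add(nb)
                queue.append(nb)
    return infected
```

Averaging the indicator of a node being in the returned set over many runs gives a Monte Carlo estimate of that node's infection probability.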

Model: investment model

Investment model (nodes are risk averse)

Increasing, concave utility function

Assumption: binary action (invest or not invest), each action with its own prob. of being infected initially

Utility of no investment and utility with investment depend on:

◦ the infection probability, determined by the epidemic model (virus spreading process)

◦ the loss of getting infected

◦ the cost of the secure measure
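The utility expressions themselves are missing from the transcript. As a hedged illustration of how such a decision is commonly written, the sketch below compares expected utilities for a risk-averse node with a concave utility. The wealth term, the parameter names, and the exact functional form are assumptions of this example, not the slide's formulas.

```python
import math

def expected_utility(utility, wealth, infection_prob, loss, cost=0.0):
    """Expected utility of a node that pays `cost` and loses `loss` if infected."""
    return (infection_prob * utility(wealth - cost - loss)
            + (1 - infection_prob) * utility(wealth - cost))

def should_invest(utility, wealth, loss, cost, prob_no_invest, prob_invest):
    """Invest when the expected utility with investment is strictly higher."""
    without = expected_utility(utility, wealth, prob_no_invest, loss)
    with_investment = expected_utility(utility, wealth, prob_invest, loss, cost)
    return with_investment > without

# Cheap, effective protection is adopted; expensive, weak protection is not.
cheap = should_invest(math.log, 100, 50, 1, 0.5, 0.1)
pricey = should_invest(math.log, 100, 50, 40, 0.11, 0.1)
```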

Model: Bayesian network game

Practical situation: nodes have limited info on the graph

Assumption: minimum common information, the degree distribution of nodes

It defines a Bayesian network game (BNG)

The analysis of BNG is more tractable

Nodes are classified into types according to degree

Losses follow a distribution given by its CDF

Same cost of security investment for all nodes

Analysis

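Nodes will invest if and only if the utility with investment exceeds the utility of no investment

Problem: how does node i estimate its infection probabilities with incomplete graph info?

By assuming the topology is a random graph with given degree distribution

Make use of the local mean field technique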

Analysis

The structure of a random graph is locally tree-like

The prob. of a node with degree k getting infected depends on the prob. that a neighbor is infected
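The slide's formulas are not in the transcript. The sketch below shows a textbook-style tree recursion of the kind the local mean field argument relies on. It assumes every node is attacked independently with the same initial probability p0 and every edge transmits with probability q. The symbols p0, q, h and the function names are introduced here for illustration only.

```python
def neighbor_infection_prob(degree_dist, initial_prob, contagion_prob,
                            iterations=1000, tol=1e-12):
    """Fixed point h = 1 - (1 - p0) * sum_k (k*theta_k/<k>) * (1 - q*h)**(k-1).

    `degree_dist` maps degree k to theta_k. A neighbor reached along an edge has
    degree k with probability k*theta_k/<k>. Iterating from h = 0 converges to
    the smallest fixed point because the map is increasing in h.
    """
    mean_degree = sum(k * t for k, t in degree_dist.items())
    h = 0.0
    for _ in range(iterations):
        healthy = sum((k * t / mean_degree) * (1 - contagion_prob * h) ** (k - 1)
                      for k, t in degree_dist.items() if k > 0)
        new_h = 1 - (1 - initial_prob) * healthy
        if abs(new_h - h) < tol:
            return new_h
        h = new_h
    return h

def node_infection_prob(k, h, initial_prob, contagion_prob):
    """A degree-k node stays healthy if not attacked and no neighbor infects it."""
    return 1 - (1 - initial_prob) * (1 - contagion_prob * h) ** k
```

Under these assumptions the infection probability grows with the degree k, which is consistent with higher-degree nodes being more exposed to their neighbors' choices.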

Analysis

Prob. reduction of getting infected by taking action

Let the fraction of nodes with degree k taking action be given. The prob. that a neighbor is infected is a decreasing function of this fraction.

As a result, the prob. reduction from taking action is an increasing function of it.

This reflects the positive externality effect: the value of action increases as other nodes take action

Nodes with higher degree are more sensitive to the externality effect

Self-fulfilling expectation equilibrium

Final adoption fraction

Nodes with degree k will take the secure measure if their loss is greater than a threshold; the thresholds and the adoption fractions are given by fixed point equations

Theorem: the fixed point equations have at least one equilibrium

Cyber-insurance

Cyber-insurance

Two main issues with insurance

Moral hazard: user with insurance will invest less in security

Adverse selection: happens when insurance provider cannot observe the protection of nodes

Model of insurance market

Insurance provider offers insurance at a given price: pay a premium, get compensated if infected

Assumption: competitive insurance market

Cyber-insurance

Model of insurance market

Utility of buying a given insurance amount

User will choose an insurance amount equal to the loss to maximize its utility

Full insurance coverage
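The full-coverage result can be checked numerically. The sketch below reads the competitive-market assumption as an actuarially fair premium (price equal to expected payout) and uses a logarithmic utility. Both choices, along with the wealth parameter and function names, are assumptions made for this illustration.

```python
import math

def expected_utility_with_insurance(coverage, wealth, loss, infection_prob,
                                    utility=math.log):
    """Expected utility when buying `coverage` at an actuarially fair premium."""
    premium = infection_prob * coverage          # fair price: zero expected profit
    infected = utility(wealth - premium - loss + coverage)
    healthy = utility(wealth - premium)
    return infection_prob * infected + (1 - infection_prob) * healthy

def best_coverage(wealth, loss, infection_prob, steps=100):
    """Grid search over coverage amounts between 0 and the loss."""
    grid = [loss * s / steps for s in range(steps + 1)]
    return max(grid, key=lambda c: expected_utility_with_insurance(
        c, wealth, loss, infection_prob))

chosen = best_coverage(wealth=100, loss=40, infection_prob=0.3)
```

With a fair premium, a risk-averse user gains from every extra unit of coverage up to the loss, so the search ends at full coverage.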

Effect of cyber-insurance

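With insurance market, user will choose to take the secure measure iff one threshold condition holds

Without insurance market, user will choose to take the secure measure iff a different threshold condition holds

In order for insurance to be a positive incentive, the condition with insurance must be easier to meet than the condition without it

Condition for cyber-insurance to be an incentive: the initial secure condition is bad, while protection is bounded

Effect on nodes with different degree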

Insurance market will be more likely to be an incentive for nodes with higher degree

Simulation & numerical results

Simulation & numerical results

Verifying local mean field on random graphs

Simulation & numerical results

Positive externality

Summary

Random walk on large graphs

Proposed a model, combining epidemic theory and game theory, to study strategic investment behavior

◦ Bayesian network game

◦ Positive externality effect

Studied the effect of cyber-insurance

◦ Positive incentive: initial secure condition is bad, while protection is bounded

Thank you!

Q&A
