EXPLORING LARGE GRAPHS: FROM RANDOM WALK TO CYBER-INSURANCE

John C.S. Lui
Computer Science & Engineering Dept.
The Chinese University of Hong Kong

Motivation

measure characteristics of networks in the wild

Figure: sample of Twitter network

Measurement distortions

"World Map" in 1459
◦ proved incomplete (Columbus et al. 1492; Australia, 17th century)
◦ wrong proportions (Africa & Asia)

Figure: the Fra Mauro world map (1459). Source: Wikipedia

Outline

methods to sample graphs (e.g., online social networks)
uniform vertex sampling vs. uniform edge sampling
random walks
Frontier sampling random walk
results

Sampling graphs

random sampling (uniform & independent)
◦ vertex sampling
◦ edge sampling

crawling
◦ BFS sampling
◦ random walk sampling

Independent sampling

uniform vertex sampling
◦ θi: fraction of vertices with degree i
◦ a vertex with degree i is sampled with probability θi

uniform edge sampling
◦ πi: probability that a vertex with degree i is sampled
◦ πi = θi × i / <average degree>

estimating θi from πi (uniform edge): trivial to remove bias
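The bias removal mentioned above can be sketched as follows. This is a minimal illustration, not the speaker's code: since πi is proportional to θi × i, each sampled degree is weighted by 1/i and the weights are normalized. The function name and the toy degree population are hypothetical.

```python
import random
from collections import Counter

def estimate_theta_from_edge_samples(sampled_degrees):
    """Estimate theta_i from the degrees of vertices seen via uniform edge sampling.

    A degree-i vertex is observed with probability pi_i = theta_i * i / avg_degree,
    so weighting each observation by 1/i and normalizing removes the bias.
    """
    weights = Counter()
    for d in sampled_degrees:
        weights[d] += 1.0 / d
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Hypothetical toy population: 900 vertices of degree 1 and 100 of degree 9,
# so the true values are theta_1 = 0.9 and theta_9 = 0.1.
degrees = [1] * 900 + [9] * 100
rng = random.Random(0)
# Uniform edge sampling reaches a vertex with probability proportional to its degree.
samples = rng.choices(degrees, weights=degrees, k=20000)
theta_hat = estimate_theta_from_edge_samples(samples)
```

In this toy population half of the raw samples have degree 9 even though only 10% of the vertices do; the reweighted estimate should land close to the true fractions.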

Random sampling: accuracy of estimates

estimate: θi, the fraction of vertices with degree i
budget: B samples
accuracy metric: Normalized root Mean Squared Error (NMSE)

NMSE expressions for uniform vertex and uniform edge sampling (formulas not preserved in this transcript)
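As a rough illustration of the accuracy metric, the sketch below measures the NMSE of a uniform vertex estimate empirically. It assumes the usual definition, NMSE = sqrt(E[(estimate − θi)²]) / θi, which is not spelled out in the transcript; the toy degree population and function names are hypothetical.

```python
import math
import random

def nmse(estimates, true_value):
    """Normalized root mean squared error: sqrt(mean((est - true)^2)) / true."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

def uniform_vertex_estimate(degrees, i, budget, rng):
    """Estimate theta_i as the fraction of `budget` uniform vertex samples with degree i."""
    sample = rng.choices(degrees, k=budget)
    return sum(1 for d in sample if d == i) / budget

# Hypothetical toy population: theta_9 = 0.1.
degrees = [1] * 900 + [9] * 100
rng = random.Random(1)
runs = [uniform_vertex_estimate(degrees, 9, 100, rng) for _ in range(2000)]
error = nmse(runs, 0.1)
```

With independent uniform vertex samples the estimate is a binomial proportion, so the measured value should sit near sqrt((1/θi − 1)/B), which is 0.3 for θi = 0.1 and B = 100.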

Independent sampling: uniform vertex vs. uniform edge

Flickr graph (1.7M vertices, 22M edges)
sampling budget: B = |V|/100 samples

uniform vertex: head GOOD, tail BAD
uniform edge: head BAD, tail GOOD

Figure: plot over vertex degree, with the avg. degree marked

Pros & cons: uniform vertex vs. uniform edge

uniform vertex
pros:
◦ independent sampling
◦ OSN needs numeric user IDs (e.g., Livejournal, Flickr, MySpace, Facebook, ...)
cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices

uniform edge
pros:
◦ independent sampling
◦ easy to sample large degree vertices
cons:
◦ no public OSN interface to sample edges
◦ difficult to sample small degree vertices

Random walk (RW) [crawling]

start at v
randomly select a neighbor of v
... until B samples

vertices can be sampled multiple times
often (resource-wise) cheaper than uniform vertex sampling
graph should be connected
multiple RWs: m independent walkers, each capturing B/m samples
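The walk described above can be sketched in a few lines. This is a minimal illustration on a hypothetical toy graph stored as an adjacency list; the function name and parameters are not from the talk.

```python
import random

def random_walk_sample(adj, start, budget, rng):
    """Return the vertices visited by a simple random walk of `budget` steps."""
    visited = []
    v = start
    for _ in range(budget):
        v = rng.choice(adj[v])  # move to a uniformly chosen neighbor
        visited.append(v)       # the same vertex may be recorded many times
    return visited

# Hypothetical toy graph (connected, undirected) as an adjacency list.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
walk = random_walk_sample(adj, start=0, budget=1000, rng=random.Random(0))
```

Running m independent walkers amounts to calling the function m times with a budget of B/m each and different start vertices.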

RW degree distribution estimation

θi: fraction of vertices with degree i
P[sampled degree = i] = πi
in steady state, the RW samples edges uniformly (only if the graph is connected)
RW = uniform edge sampling without independence

Figure: CCDF P[X > x] versus x = degree (log-scale), comparing the distribution observed by RW sampling (πi) with the true distribution (θi)
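Because a RW in steady state behaves like uniform edge sampling, the same 1/degree reweighting can be applied to the degrees it observes. The sketch below is an assumption-laden illustration on a hypothetical four-vertex graph, not the estimator code used for the plots.

```python
import random
from collections import Counter

def rw_degree_distribution(adj, start, budget, rng):
    """Estimate theta_i from a random walk by weighting each visit by 1/degree."""
    weights = Counter()
    v = start
    for _ in range(budget):
        v = rng.choice(adj[v])
        deg = len(adj[v])
        weights[deg] += 1.0 / deg  # undo the degree-proportional visit bias
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Hypothetical toy graph: degrees are 3, 2, 2, 1,
# so the true values are theta_1 = 0.25, theta_2 = 0.5, theta_3 = 0.25.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
theta_hat = rw_degree_distribution(adj, start=0, budget=50000, rng=random.Random(0))
```

Without the 1/degree weights the walk would report πi instead, overstating the share of high degree vertices.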

Pros & cons: uniform vertex vs. RW

uniform vertex
pros:
◦ independent sampling
◦ supported by OSNs with numeric user IDs: Livejournal, Flickr, MySpace, Facebook, ...
cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices

RW
pros:
◦ easy to sample high degree vertices
◦ resource-wise cheap
cons:
◦ samples are not independent
◦ graph must be connected
◦ large estimation errors when graph is loosely connected
◦ should start in steady state (discard transient samples, but the transient length is unknown)

Hybrid sampling?

uniform vertex samples both A and B subgraphs but is expensive
RW samples either A or B but is cheap

Figure: graph with two subgraphs, A and B

Puzzle

design a RW that in steady state samples edges uniformly (importance sampling) & initialize steady state w/ uniform vertex sampling
◦ in steady state we want to sample vertices proportional to degree
◦ but we want to start with uniformly sampled vertices


Need to think in multiple dimensions (multiple walkers)

Frontier Sampling (FS): multiple dependent walkers

B: sampling budget
Let S = {v1, v2, …, vm} be a set of m vertices

(1) select vr ∈ S w.p. proportional to deg(vr)
(2) walk one step from vr
(3) add walked edge to E' and update vr
(4) return to (1) (until m + |E'| = B)
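Steps (1) to (4) can be sketched as below. This is a minimal illustration under stated assumptions: the m starting vertices are drawn uniformly at random (following the earlier "initialize w/ uniform vertex sampling" idea), the graph is a hypothetical adjacency list, and the function name is made up for the example.

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Return the list of edges E' sampled by m dependent walkers."""
    vertices = list(adj)
    # Assumption: the frontier S starts from m uniformly sampled vertices.
    frontier = [rng.choice(vertices) for _ in range(m)]
    edges = []
    while m + len(edges) < budget:
        # (1) select a walker with probability proportional to its vertex degree
        degs = [len(adj[v]) for v in frontier]
        r = rng.choices(range(m), weights=degs, k=1)[0]
        # (2) walk one step from the selected vertex
        u = frontier[r]
        w = rng.choice(adj[u])
        # (3) add the walked edge to E' and update the walker's position
        edges.append((u, w))
        frontier[r] = w
        # (4) repeat until m + |E'| = B
    return edges

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = frontier_sampling(adj, m=3, budget=503, rng=random.Random(0))
```

The m initial vertices count against the budget, which is why the loop stops once m plus the number of sampled edges reaches B.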

FS: an m-dimensional RW

G^m = m-th Cartesian power of G
Frontier sampling = random walk on G^m

Figure: a graph G with vertices u, j, k and its Cartesian power G × G = G^2, whose vertices are the pairs (u,u), (u,j), (u,k), (j,u), (j,j), (j,k), (k,u), (k,j), (k,k)

FS property

FS state at step k: Sk = (v1, v2, ..., vm)
FS state at step k+1: Sk+1 = (v1, u2, ..., vm)
◦ v2, u2 chosen proportional to their degrees

when in steady state (m → ∞)
◦ samples edges uniformly (like a RW)
◦ m → ∞: number of walkers in v ∈ V is uniformly distributed (uniform vertex sampling)

Sample paths of θ1 estimates (Flickr graph)

Flickr: 1.7M vertices, 22M edges
Plot the evolution of the θ1 estimate after n steps, where n = number of steps
4 sample paths = 4 curves

GAB graph

2 Albert-Barabasi graphs (5×10^5 vertices) w/ avg. deg. 2 and 10, connected by 1 edge

Figure: subgraphs AB2 and AB10 joined by 1 edge


Outline

Motivation

Model of strategic investment

Effect of cyber-insurance market on strategic investment behavior

Performance Evaluation

Summary & Lessons Learnt


From distribution to Cyber-Insurance

Motivation

Technical measures of security are abundant
◦ Antivirus software, firewalls, intrusion detection…

Ineffective
◦ New viruses, worms, or new forms of attack
◦ Carelessness or controllability of administrators, etc.

Loss due to lack of network security is still big!
◦ AT&T's chief security officer: cyber-criminals' annual profit exceeds $1 trillion, about 7% of US GDP.
◦ In 2009, total reported losses due to payment fraud in the US were $641 million.

Why?

Motivation

Virus spreading makes security interdependent
◦ The interactions of nodes form a graph
◦ The investments of nodes can influence each other

First goal: model strategic security investment behavior

Figure: example of virus spreading (invest in security or not?)

Motivation

Security risks cannot be completely eliminated through technical measures

Cyber-insurance, offered by companies (e.g., AIG), can be used to deal with the residual risk.

However, the cyber-insurance market is developing slowly (estimated at $450 million)

Second goal: study the influence of cyber-insurance on strategic security investment

Understand what we need to (re-)engineer so as to activate this new business


Model

Model: epidemic model

Combines epidemic theory and game theory
◦ Epidemic model: spreading of virus
◦ Investment model: decision on security investment

Epidemic model:
◦ Use a graph G to denote the interaction relationship
◦ State of node i: healthy or infected
◦ Each infected node contaminates its neighbors with a given probability (bond percolation process on G)
◦ Initial state of node i: denotes whether the node is attacked
◦ The final state is given by a recursive equation
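The spreading process above can be sketched as a simple simulation. This is only an illustration: the contamination probability is called q here because the slide's symbol is not preserved, and the graph and function name are hypothetical. Each edge is tried at most once in the direction that matters, which matches a bond percolation view where every edge is open with probability q.

```python
import random
from collections import deque

def spread(adj, initially_infected, q, rng):
    """Return the final set of infected nodes.

    Each newly infected node gets one chance to contaminate each healthy
    neighbor, succeeding with probability q (hypothetical symbol).
    """
    infected = set(initially_infected)
    queue = deque(infected)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in infected and rng.random() < q:
                infected.add(v)
                queue.append(v)
    return infected

# Hypothetical toy graph: a path 0 - 1 - 2 - 3 with node 0 attacked initially.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
everyone = spread(adj, {0}, q=1.0, rng=random.Random(0))
nobody_else = spread(adj, {0}, q=0.0, rng=random.Random(0))
```

The two extreme calls show the range of outcomes: with q = 1 the infection reaches every node connected to the attacked one, and with q = 0 it stays where it started.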

Model: investment model

Investment model (nodes are risk averse)
◦ Increasing, concave utility function
◦ Assumption: binary action (invest or not); each action has its own probability of being infected initially

Utility of no investment and utility with investment depend on:
◦ the probability of getting infected, determined by the epidemic model (virus spreading process)
◦ the loss of getting infected
◦ the cost of the security measure (with investment only)
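The utility expressions themselves are not preserved in the transcript, so the sketch below assumes a common expected-utility form for this kind of model: with infection probability p, initial wealth w, loss L and security cost c, the expected utility is p·u(w − L − c) + (1 − p)·u(w − c), with c = 0 when the node does not invest. All names and numbers are hypothetical.

```python
import math

def expected_utility(u, wealth, p_infected, loss, cost=0.0):
    """Expected utility of a node that pays `cost` and loses `loss` if infected."""
    return (p_infected * u(wealth - loss - cost)
            + (1 - p_infected) * u(wealth - cost))

def should_invest(u, wealth, p_no_invest, p_invest, loss, cost):
    """Invest iff the expected utility with investment exceeds the one without."""
    with_investment = expected_utility(u, wealth, p_invest, loss, cost)
    without_investment = expected_utility(u, wealth, p_no_invest, loss)
    return with_investment > without_investment

u = math.log  # an increasing, concave utility (risk-averse node)
# Cheap measure that cuts the infection probability a lot.
decision_high = should_invest(u, 100.0, 0.5, 0.1, loss=50.0, cost=5.0)
# Expensive measure that barely helps.
decision_low = should_invest(u, 100.0, 0.5, 0.45, loss=50.0, cost=20.0)
```

The first case favors investing and the second does not, which shows how the decision trades the cost of the measure against the reduction in infection probability.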

Model: Bayesian network game

Practical situation: nodes have limited info on the graph
Assumption: minimum common information, namely the degree distribution of nodes

It defines a Bayesian network game (BNG)
◦ The analysis of the BNG is more tractable

Nodes are classified into types according to degree
◦ Loss distribution given by a CDF
◦ Same cost of security investment for all nodes


Analysis

Nodes will invest if and only if

Problem: how does node i estimate the quantities in the investment condition with incomplete graph info?
◦ Assume the topology is a random graph with the given degree distribution
◦ Make use of the local mean field technique


Analysis

The structure of a random graph is locally tree-like

The prob. of a node with degree k getting infected is expressed in terms of the prob. that a neighbor is infected

Analysis

Prob. reduction of getting infected by taking action

Consider the fraction of nodes with degree k taking action
◦ the prob. that a neighbor is infected is a decreasing function of this fraction
◦ as a result, the prob. reduction from taking action is an increasing function of this fraction

This reflects the positive externality effect
◦ The value of action increases as other nodes take action

Nodes with higher degree are more sensitive to the externality effect

Self-fulfilling expectation equilibrium

Final adoption fraction
◦ Nodes with degree k will take the security measure if their loss is greater than a threshold
◦ The thresholds and the resulting adoption fractions are given by fixed-point equations

Theorem: the fixed-point equations have at least one equilibrium


Cyber-insurance

Cyber-insurance

Two main issues with insurance
◦ Moral hazard: users with insurance will invest less in security
◦ Adverse selection: happens when the insurance provider cannot observe the protection of nodes

Model of insurance market
◦ The insurance provider offers insurance at a given price
◦ The user pays a premium and is compensated if it gets infected
◦ Assumption: competitive insurance market

Cyber-insurance

Model of insurance market
◦ Utility depends on the insurance amount bought
◦ The user will choose an insurance amount equal to the loss to maximize its utility
◦ Full insurance coverage


Effect of cyber-insurance

With insurance market, user will choose iff

Without insurance market, user will choose iff

In order for insurance to be positive incentive,

Define , it


Effect of cyber-insurance

Condition of cyber-insurance to be an incentive greater than

is boundedCondition of


Effect of cyber-insurance

Effect on nodes with different degree For

Thus

Insurance market will be more likely to be an incentive for nodes with higher degree


Simulation & numerical results

Simulation & numerical results

Verifying local mean field on random graphs


Simulation & numerical results

Positive externality

Summary

Random walk on large graphs

Proposed a model, combining epidemic theory and game theory, to study strategic investment behavior
◦ Bayesian network game
◦ Positive externality effect

Studied the effect of cyber-insurance
◦ Positive incentive: initial security condition is bad, while protection is bounded

Thank you!

Q&A