Modeling Malware Spreading Dynamics
![Page 1: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/1.jpg)
Modeling Malware Spreading Dynamics
Michele Garetto (Politecnico di Torino – Italy)
Weibo Gong (University of Massachusetts – Amherst – MA)
Don Towsley (University of Massachusetts – Amherst – MA)
INFOCOM 2003
![Page 2: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/2.jpg)
Outline
- Motivation
- Modeling framework
  - Interactive Markov Chains
- Analysis and simulation of malware propagation dynamics
  - Percolation problem
  - Transient behavior
- Conclusions
![Page 3: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/3.jpg)
Motivation
- The Internet is an easy and powerful mechanism for propagating malicious software programs, the “malware” (email viruses, worms, …)
- Future malware activity is expected to be more prevalent and virulent, resulting in significantly greater damage and economic losses
![Page 4: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/4.jpg)
Motivation
- Dynamics of malware propagation are still not well understood
- We would like to:
  - Predict the temporal evolution of an infection process that starts propagating on a network
  - Design and evaluate effective countermeasures
  - Assess the defensibility and vulnerability of different network architectures
- Need to develop mathematical methodologies that are able to capture the spreading characteristics of malware
![Page 5: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/5.jpg)
Our contribution
- A flexible modeling framework based on Interactive Markov Chains, able to capture the probabilistic nature of malware propagation
- Application of this framework to the case of email viruses:
  - Identification of a “percolation problem”
  - Investigation of the impact of the underlying network topology
- Analytical bounds and approximations validated through extensive simulations
![Page 6: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/6.jpg)
The “Interactive Markov Chain” (IMC) Modeling Framework
- Global structure (the network)
- Local structure (can vary from node to node)
- Each node is represented by a Markov chain, whose state transitions are influenced by the status of its neighbors
- Global network structure ... but locally a Markov chain
[Diagram: example local chains with states alert, failed, normal and secure, insecure, emergency]
![Page 7: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/7.jpg)
Computational complexity issue
- The whole system evolves according to a global Markov chain G, whose state space dimension ($\#G$) is equal to the product of the local chain dimensions ($\#L$): $\#G = (\#L)^N$
- The solution of the global Markov chain is feasible only for small systems
- Example: 20 nodes with binary status (0 = not infected, 1 = infected) give $2^{20}$ states!
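As a quick check of the state-space growth, here is a minimal Python sketch that evaluates $(\#L)^N$ for the 20-node example. The function name `global_state_count` is illustrative and does not come from the slides.

```python
def global_state_count(local_states: int, nodes: int) -> int:
    """Size of the global chain when each of `nodes` nodes has `local_states` local states."""
    return local_states ** nodes


# Binary status (not infected / infected) on 20 nodes.
print(global_state_count(2, 20))  # 1048576
# Three statuses per node on the same 20 nodes.
print(global_state_count(3, 20))  # 3486784401
```

Even a third local status multiplies the 20-node state space by more than three thousand, which is why the exact global chain is only solvable for small systems.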
![Page 8: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/8.jpg)
How can we study very large systems (thousands of nodes)?
- Discrete event simulations of the model
  - How many runs? How long?
  - Could be computationally too expensive
  - Do not help to understand the system dynamics
- Analytical bounds and approximations
  - Quick prediction of the system behavior
  - Gross-level approximations can be sufficient
  - Provide insights into the inner dynamics
![Page 9: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/9.jpg)
E-mail virus propagation
- The virus propagates as an attachment to e-mail messages
- Requires human assistance:
  - A random time elapses before the recipient reads the message
  - The “click” probability
- The virus makes use of the recipient’s address book to send copies of itself
![Page 10: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/10.jpg)
IMC model: global structure
- We consider the network graph induced by email address books
  - Each node stands for an email address
  - Edges represent social or business relationships
- The resulting graph is expected to have “small world” properties:
  - Small characteristic path length
  - High clustering coefficient
![Page 11: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/11.jpg)
IMC model: local structure
- 3 statuses for each node:
  - S (Susceptible): the node can be infected by the virus
  - I (Infected): the node has been infected by the virus
  - M (Immune): the node can no longer be infected by the virus
- Discrete time model, tracking for each node j and time step k:
  - the probability that node j is susceptible at time k
  - the probability that node j is infected at time k
  - the probability that node j is immune at time k
![Page 12: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/12.jpg)
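IMC model
[Diagram: node i connected to node j by an edge labeled $w_{ij}$; local chain of node j with states S, I, M and transition labels $c_j$ and $1 - c_j$]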
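A minimal simulation sketch of an S/I/M process of this kind, under simplifying assumptions that are not taken from the slides: every message is read one time step after it is sent (so the edge weights $w_{ij}$ are ignored), a susceptible recipient becomes infected with a common click probability and immune otherwise, and an infected node sends copies once and then stops. The function name `simulate_sim` and the ring test graph are illustrative.

```python
import random


def simulate_sim(adj, seed, click_prob, rng):
    """Return the cumulative number of infected nodes after each time step."""
    state = {v: "S" for v in adj}
    state[seed] = "I"
    ever_infected = 1
    history = [ever_infected]
    active = [seed]  # nodes infected during the previous step
    while active:
        # Susceptible neighbors of active nodes receive a copy of the virus.
        recipients = {u for v in active for u in adj[v] if state[u] == "S"}
        for v in active:
            state[v] = "M"  # assumption: a node sends copies once, then stops
        active = []
        for u in sorted(recipients):
            if rng.random() < click_prob:
                state[u] = "I"  # the user clicked on the attachment
                active.append(u)
            else:
                state[u] = "M"  # assumption: a user who declines stays immune
        ever_infected += len(active)
        history.append(ever_infected)
    return history


# Ring of 10 nodes, each linked to its two nearest neighbors.
ring = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
print(simulate_sim(ring, 0, 1.0, random.Random(0)))
```

Averaging `history` over many runs gives an empirical curve for the number of infected nodes over time, which is the quantity the analytical model tries to predict.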
![Page 13: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/13.jpg)
Virus propagation model
The numerical solution of the system requires knowing the joint probabilities of neighboring nodes, which are not available (shown as “?” on the slide)
![Page 14: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/14.jpg)
Virus propagation model
Fundamental questions about malware propagation dynamics:
- What is the final size of the infection outbreak? How many nodes (on average) will be reached by the virus at the end?
- How fast is the malware propagation? What is the (average) number of infected nodes as a function of time?
![Page 15: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/15.jpg)
The “small-world” model of Watts and Strogatz
A ring lattice with additional random shortcuts
Parameters:
- N = number of nodes
- S = number of shortcuts (or, equivalently, the shortcut density)
- k = lattice connectivity (number of neighbors on each side of a node)
[Figure: example graph with N = 24, S = 4, k = 3]
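The construction described on this slide can be sketched in a few lines of Python. This is an illustrative sketch: the function name `small_world`, the adjacency-set representation, and the rule that a shortcut never duplicates an existing edge are assumptions, not details from the slides.

```python
import random


def small_world(n, k, shortcuts, rng):
    """Ring lattice with k neighbors on each side, plus `shortcuts` random extra edges."""
    adj = {v: set() for v in range(n)}
    # Lattice edges: connect each node to its k successors on the ring.
    for v in range(n):
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    # Shortcuts: random node pairs that are not yet connected.
    added = 0
    while added < shortcuts:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


# Same parameters as the example on the slide: N = 24, S = 4, k = 3.
g = small_world(24, 3, 4, random.Random(0))
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 76 edges: 72 lattice + 4 shortcuts
```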
![Page 16: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/16.jpg)
What is the final size of the infection outbreak ?
Not all of the susceptible nodes necessarily receive a copy of the virus !
site percolation problem
(node occupation probability = click probability)
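The site percolation view can be illustrated with a small Monte Carlo sketch: each node is "occupied" (its user would click) with the click probability, and the outbreak is the set of occupied nodes connected to the initially infected node through occupied nodes. The function name `outbreak_size`, the ring test graph, and the choice to always treat the seed as occupied are assumptions made for illustration.

```python
import random
from collections import deque


def outbreak_size(adj, seed, click_prob, rng):
    """Size of the occupied cluster containing `seed` in one site-percolation sample."""
    # Decide once per node whether its user would click on the attachment.
    occupied = {v for v in adj if rng.random() < click_prob}
    occupied.add(seed)  # assumption: the initial node is infected regardless
    seen = {seed}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in occupied and u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen)


# Ring of 10 nodes, each linked to its two nearest neighbors.
ring = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
rng = random.Random(1)
sizes = [outbreak_size(ring, 0, 0.5, rng) for _ in range(1000)]
print(sum(sizes) / len(sizes))  # empirical average outbreak size
```

Nodes that are not occupied still receive the virus but block its further spread, which is why some susceptible nodes may never receive a copy.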
![Page 17: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/17.jpg)
Site percolation problem on the small world graph: exact asymptotic result [Moore, Newman 2000]
[Figure: average number of infected sites (log scale, 0.01 to 10000) versus click probability c (0 to 0.8), one curve per shortcut density: 0.1, 0.01, 0.001, 0.0001]
![Page 18: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/18.jpg)
Transient analysis of an infection process: bounds
Probability that a node has been reached by the virus
- Lower bound: property of joint probabilities
- Upper bound: theory of Associated Random Variables
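The slide's own formulas are not preserved in this transcript. As a hedged illustration of how the two named tools can yield such bounds, consider the generic inequalities below. The notation $A_i$ is hypothetical, and this is a sketch of the standard facts, not necessarily the exact expressions used in the talk.

```latex
% A_i: event that the virus has not arrived through neighbor i by a given time (hypothetical notation)
\[
  \prod_{i} P(A_i) \;\le\; P\Bigl(\bigcap_{i} A_i\Bigr) \;\le\; \min_{i} P(A_i)
\]
% The right inequality holds for any events (property of joint probabilities).
% The left inequality holds when the indicator variables of the A_i are associated.
% Since P(node reached) = 1 - P(all A_i occur), it follows that
\[
  1 - \min_{i} P(A_i) \;\le\; P(\text{node reached}) \;\le\; 1 - \prod_{i} P(A_i)
\]
```

Both sides involve only single-event probabilities, so the joint probabilities of neighboring nodes are no longer needed.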
![Page 19: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/19.jpg)
Analytical bounds
[Figure: average number of infected nodes (0 to 8000) versus time (0 to 2500) on an infinite one-dimensional lattice with click probability = 1; simulation, upper bound, and lower bound curves for k = 1, k = 10, k = 100]
![Page 20: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/20.jpg)
Transient analysis of an infection process: approximation
Linear mixing of lower bound and upper bound
- k = local connectivity of the nodes
- s = self-influence probability
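A linear mixing of two bound curves can be sketched as a convex combination. The slide does not preserve the exact formula, so the convention used here (the mixing coefficient weights the upper bound) and the function name `mixed_estimate` are assumptions.

```python
def mixed_estimate(lower, upper, m):
    """Pointwise convex combination m * upper + (1 - m) * lower of two bound curves."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("mixing coefficient must lie in [0, 1]")
    return [m * u + (1.0 - m) * l for l, u in zip(lower, upper)]


# Hypothetical bound values at two time steps, mixed with coefficient 0.5.
print(mixed_estimate([0.0, 2.0], [1.0, 4.0], 0.5))  # [0.5, 3.0]
```

With the coefficient in [0, 1], the estimate always stays between the two bounds, so a fitted coefficient cannot push the approximation outside the proven range.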
![Page 21: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/21.jpg)
Fitting of mixing coefficient M(k,s)
[Figure: mixing coefficient M (0 to 0.7) versus connectivity k (log scale, 1 to 1000) on an infinite one-dimensional lattice; one curve per s = 0, 1/3, 2/3, 0.9]
![Page 22: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/22.jpg)
Approximate analysis on the small world graph: the impact of topology
[Figure: average number of infected nodes (0 to 2000) versus time (0 to 1000), model approximation versus simulation, for 2000 nodes with click probability = 1; curves: k = 10 with no shortcuts; k = 10 with 2 shortcuts; k ~ geom(10) with no shortcuts; k = 10 with 20 shortcuts; fully-connected graph]
![Page 23: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/23.jpg)
Combining transient analysis and percolation on general topologies
Probability not to be reached by the virus = initial immunization
overestimate of the spreading rate of the virus
Upper bound of the reaching probability on general topologies:
Global upper bound for the infection process
![Page 24: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/24.jpg)
Transient analysis and percolation on power-law random graphs
- 10000 nodes, click probability = 0.5
- GLP algorithm (Bu 2002): power-law node degree, small-world properties
- m = initial connectivity
- One initially infected node with degree 10
[Figure: average number of infected nodes (0 to 6000) versus time (0 to 4000), simulation versus model approximation + percolation bound, for m = 1, m = 2, m = 4]
![Page 25: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/25.jpg)
Conclusions
- We have proposed an analytical framework to study the dynamics of malware propagation on a network
- We have obtained useful bounds and approximations to study an infection process on a general topology
- The approach is suitable for analyzing a wide range of “dynamic interactions on networks” (routing protocols, p2p, …)
![Page 26: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/26.jpg)
The End
Thanks…
![Page 27: Modeling Malware Spreading Dynamics](https://reader030.fdocuments.net/reader030/viewer/2022013012/56816842550346895dde13fe/html5/thumbnails/27.jpg)
Site percolation on a given small-world graph
[Figure: reaching probability (0.1 to 1) versus node index (0 to 999); curves: simulation, upper bound (h = 0), upper bound (h = 8), approximation, lower bound (h = 8), lower bound (h = 0)]