Transcript
Page 1: Network Tomography and Anomaly Detection Mark Coates Tarem Ahmed Network map from .

Network Tomography and Anomaly Detection

Mark Coates

Tarem Ahmed

Network map from www.opte.org

Page 2:

Brain mapping (opening it up can disturb the system)

Internet mapping (opening it up can disturb the system)

Page 3:

too complex to measure everywhere, all the time

traffic measurements expensive (hardware, bandwidth)

[Timeline: 1969, 1993, 2005; Internet Boom]

Page 4:

Brain Tomography

[Diagram labels: unknown object; physics; counting and projection; measurements; data; statistical model; Poisson; prior knowledge; MRF model; maximize likelihood; maximum likelihood estimate]

Page 5:

Link-level Network Tomography

[Diagram labels: unknown object; physics; queuing behavior; measurements; end-to-end measurements; data; prior knowledge; maximize likelihood; maximum likelihood estimate]

Page 6:

Link-level Network Tomography

Solely from edge-based traffic measurements, infer internal

• topology / connectivity

• link-level loss probability and delay distribution

Page 7:

Application: Topology Discovery

Challenges:

• 12% never respond, 15% have multiple interfaces - Barford et al. (2000)

• detect level-2 topology “invisible” to IP layer (e.g., switches)

Page 8:

Application: Overlay Voice-over-IP

Multiple paths to choose from: select paths with minimal delay or delay variance

Send a small number of critical packets (vocal transitions) along multiple paths

Use these packets to estimate the path delays (and the extent of path diversity)

[Diagram labels: Access Network; Autonomous System(s); Service Gateway; Overlay Link]
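
The path-selection idea on this slide can be sketched roughly as follows. This is only an illustration: the function name `select_path`, the path labels and the delay values are hypothetical, and the talk does not say which estimator is used.

```python
from statistics import mean, pvariance


def select_path(probe_delays, criterion="mean"):
    """Return the path whose probe delays have the smallest mean or variance."""
    if criterion == "mean":
        score = mean
    elif criterion == "variance":
        score = pvariance
    else:
        raise ValueError("criterion must be 'mean' or 'variance'")
    return min(probe_delays, key=lambda path: score(probe_delays[path]))


# Hypothetical delays (ms) of critical packets sent along three overlay paths.
probe_delays = {
    "path_a": [40.0, 42.0, 41.0, 43.0],
    "path_b": [35.0, 60.0, 30.0, 55.0],
    "path_c": [50.0, 50.0, 51.0, 50.0],
}

best_by_mean = select_path(probe_delays, "mean")
best_by_variance = select_path(probe_delays, "variance")
```

In this made-up data the lowest-mean path and the lowest-variance path differ, which is the trade-off the slide hints at.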

Page 9:

Network Monitoring

Challenges

• Restricted measurement

• High volumes and high rates of data (sampling of traffic on Gb/s routers)

• High dimensional data (source/destination IP addresses, port numbers)

Goals

• Supply networking protocols with relevant performance information.

• Identify anomalous behaviour and operational transitions.

• Provide network administrators with appropriate notification or visualization.

Page 10:

Outline

Inference about network performance based on passive measurements or active probing

Two components to the talk:

• Network tomography

• Network anomaly detection

Focus on online, sequential approaches

• Account for non-stationary behaviour

• Don’t repeat work that has already been done

Page 11:

Network Tomography: Likelihood Formulation

$y = A\theta + \epsilon$

A = routing matrix (graph)

θ = packet loss probabilities or queuing delays for each link

y = packet losses or delays measured at the edge

ε = randomness inherent in traffic measurements

Statistical likelihood function: $l(A, \theta) = f(y \mid A, \theta)$
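
As a rough illustration of the likelihood formulation, the sketch below evaluates log f(y | A, θ) for a two-receiver tree (one shared link, two leaf links). The Gaussian noise model, the value of `sigma` and all numbers are assumptions made for the example; the slide does not specify a noise distribution.

```python
import math


def path_sums(A, theta):
    """Compute A @ theta: the sum of link parameters along each measured path."""
    return [sum(a * t for a, t in zip(row, theta)) for row in A]


def gaussian_log_likelihood(y, A, theta, sigma):
    """log f(y | A, theta) for y = A theta + eps, eps ~ N(0, sigma^2 I) (assumed)."""
    residuals = [yi - mi for yi, mi in zip(y, path_sums(A, theta))]
    n = len(y)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum(r * r for r in residuals) / (2.0 * sigma ** 2))


# Routing matrix: rows are paths (sender to each receiver), columns are links.
A = [[1, 1, 0],   # path to receiver 1 uses the shared link and leaf link 1
     [1, 0, 1]]   # path to receiver 2 uses the shared link and leaf link 2
y = [3.1, 3.9]    # hypothetical end-to-end delays

theta_a = [1.0, 2.0, 3.0]
theta_b = [2.0, 1.0, 2.0]   # different link delays, same path sums
theta_c = [0.0, 0.0, 0.0]

ll_a = gaussian_log_likelihood(y, A, theta_a, sigma=0.5)
ll_b = gaussian_log_likelihood(y, A, theta_b, sigma=0.5)
ll_c = gaussian_log_likelihood(y, A, theta_c, sigma=0.5)
```

Here `theta_a` and `theta_b` give the same likelihood because A has fewer rows than unknowns, so independent path measurements alone cannot separate the shared link from the leaf links.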

Page 12:

Classical Problem

$y = A\theta + \epsilon$

Solve the linear system

Interesting if A, θ, or ε have special structures

Maximize the likelihood function $l(\theta) = f(y \mid A, \theta)$

or: $l(A, \theta) = f(y \mid A, \theta)$

Page 13:

Network Tomography: The Basic Idea

[Figure labels: sender; receivers]

Page 15:

Packet-pair measurements

[Figure: measurement packet pair, packet(1) and packet(2), with cross-traffic and delay]

packet(1) and packet(2) experience (nearly) identical losses and/or delays on shared links

Page 16:

Modelling time-variations

[Figure label: cross-traffic]

Nonstationary cross-traffic induces time-variation

Directly model the dynamics (but not the traffic!)

Goal is to perform online tracking and prediction of network link characteristics

Page 17:

Non-stationary behaviour

Introduce time-dependence in parameters

$y_t = A_t \theta_t + \epsilon_t$

Filtering exercise (track θt):

(1) Describe dynamic behaviour of θt

(2) Form estimate (MMSE): $\hat{\theta}_t = E_{p(\theta_t \mid y_{1:t})}[\theta_t]$
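
A minimal sketch of the two-step filtering exercise, using a bootstrap particle filter on a scalar toy problem. Everything specific here is an assumption for illustration: a single link (A_t = 1), a Gaussian random walk for θ_t, Gaussian measurement noise, and the parameter values. It is not the model used in the talk.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, walk_std=0.1,
                     obs_std=0.5, seed=0):
    """Track a scalar theta_t and return the posterior-mean (MMSE) estimates."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # (1) Dynamic behaviour of theta_t: assumed Gaussian random walk.
        particles = [p + rng.gauss(0.0, walk_std) for p in particles]
        # Weight each particle by the (assumed Gaussian) likelihood of y.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # (2) MMSE estimate: weighted average of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample to avoid weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Synthetic non-stationary parameter and noisy measurements.
data_rng = random.Random(1)
theta, truth, observations = 0.0, [], []
for _ in range(100):
    theta += data_rng.gauss(0.0, 0.1)
    truth.append(theta)
    observations.append(theta + data_rng.gauss(0.0, 0.5))

estimates = bootstrap_filter(observations)
mean_abs_error = sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)
```

Because each step touches only the current particle set, the cost per measurement does not grow with time, which is what makes the approach suitable for online tracking.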

Page 18:

Particle Filtering

[Figure: particle sets indexed $i = 1, \dots, L$ at times $m$ and $m+1$]

Page 19:

Delay Distribution Tracking

• Time-varying delay distribution $T_{m,R}(k)$ of window size R at time m

• In each window, R probe measurements.

• Form estimates of average delay and jitter over short time intervals

[Figure: delay distribution (in delay units) evolving over time]

Page 20:

Dynamic Model

• Queue/traffic model:

reflected random walk on [0, max_del]

$\log \theta_{j,m+1} = \log \theta_{j,m} + N(0, \sigma^2)$

$p_{j,m}(k) \propto \exp(-\theta_{j,m} k)$

[Figure: probability versus delay units, parameterised by $\theta_{j,m}$]

Page 21:

Observations

• Measurements: $x_{j,m} \sim p_{j,m}(k)$

Observe $y^{(i)}(m) = \sum_{j \in \text{Path}_i(m)} x_{j,m}, \quad i = 1, 2$

[Figure labels: packet$^{(1)}(m)$, packet$^{(2)}(m)$, $y^{(1)}(m)$, $y^{(2)}(m)$]

Page 22:

Estimation of Delay Distributions

• Sequential Monte Carlo approximation to posterior mean estimate:

$\hat{p}_{j,m}(k) = \sum_{i=1}^{N} p(x_{j,m} = k \mid \theta_m^{(i)}, y_m)\, w_m^{(i)}$

(annotations: message-passing algorithm; particle weights $w_m^{(i)}$)

• Estimate of time-varying delay distribution:

$\hat{T}_{m,R}(k) = \frac{1}{R} \sum_{l=m-R+1}^{m} \hat{p}(x_{j,l} = k \mid y_{1:l})$
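
The windowed estimate of the time-varying delay distribution amounts to averaging the R most recent per-measurement estimates. A small sketch, assuming those per-measurement pmf estimates are already available as lists; the numbers and the helper names `windowed_distribution` and `mean_delay` are hypothetical, and indices are 0-based here.

```python
def windowed_distribution(pmf_estimates, m, R):
    """Average the R per-measurement pmf estimates ending at index m (0-based)."""
    if R < 1 or m - R + 1 < 0 or m >= len(pmf_estimates):
        raise ValueError("window must lie inside the sequence")
    window = pmf_estimates[m - R + 1 : m + 1]
    n_bins = len(window[0])
    return [sum(pmf[k] for pmf in window) / R for k in range(n_bins)]


def mean_delay(pmf):
    """Average delay, in delay units, implied by a pmf over 0, 1, 2, ..."""
    return sum(k * p for k, p in enumerate(pmf))


# Hypothetical per-measurement estimates over three delay units.
pmf_estimates = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.4, 0.5],
]

T_hat = windowed_distribution(pmf_estimates, m=3, R=2)
average_delay = mean_delay(T_hat)
```

A short window R follows changes quickly but averages fewer probes; a long window is smoother but lags behind non-stationary behaviour.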

Page 23:

Analysis

• Complexity: $O(NLK^2)$ per measurement, where N is the number of particles, L is the average number of unique links, and K is the max. delay units per link

• Convergence analysis of [Crisan, Doucet 01; Le Gland, Oudjane 02] applies.

• The approximation to the posterior mean estimate converges to the true estimate as N → ∞

Page 24:

Simulation Results – ns2

[Figures: mean delay versus time (true and tracking); delay distributions]

Page 25:

Comments

Dynamic models allow us to account for non-stationarity, but realistic models are hard to derive and incorporate

Particle filtering only appropriate when analytical techniques fail

• non-Gaussian or non-linear dynamics or observations

Sequential structure allows on-line implementation

Care must be taken to reduce computation at each step

Page 26:

Network Anomaly Detection

In tomography, a primary challenge is the restriction on available measurements.

Anomaly detection – a primary challenge is the abundance of measurements.

How can we process data at a sufficient rate?

How should we extract relevant information?

Page 27:

Netflow Data

Records of flows.

A flow is defined by: (source IP, dest. IP, source port #, dest. port #)

Packets are sampled at configurable rates.

Exported at 1-minute or 5-minute intervals.

Page 28:

Dataset – Abilene Network

Abilene Weathermap – Indiana University

Thanks to Rick Summerhill and Mark Fullmer at Abilene for providing access to the data.

Page 29:

Principal Component Analysis (PCA)

Goal: Identify a low-dimensional subspace that captures the key components of the feature set

Idea: If (most of) a measurement does not lie in this subspace, then it is anomalous

PCA conducts a linear transformation to choose a new coordinate system

• Projection onto the first principal component has greater variance than any other projection (maximum energy).

• Subsequent principal components capture the greatest remaining energy
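
To make the "maximum energy" property concrete, the sketch below finds the first principal component of a tiny data set by power iteration on the sample covariance matrix. Power iteration is just one convenient way to do this without external libraries, not necessarily what the authors used, and the data are made up.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (unit direction, variance) of the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly apply cov and renormalise.
    # (Assumes the start vector is not orthogonal to the leading eigenvector.)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    return v, variance


# Hypothetical two-feature measurements lying close to a line.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
direction, variance = first_principal_component(rows)
```

The variance along `direction` exceeds the variance along either original coordinate axis, which is the property stated in the first bullet.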

Page 30:

PCA (2)

Reduce dimensionality by eliminating principal components that do not contribute significantly to variance in the dataset (small singular value)

Not optimized for class separability (linear discriminant analysis)

Minimizes reconstruction error under L2 norm.

Page 31:

“Eigenflow” Analysis

Lakhina et al. (2004, 2004b).

PCA analysis of Origin-Destination (OD) Flows

Eigenflow: set of flows mapped onto a single principal component

Intrinsic Dimensionality: Empirical studies for Sprint and Abilene networks indicated that 5-10 principal components sufficed to capture most of the energy.

Page 32:

PCA-based Anomaly Detection

Perform PCA on block of OD flow measurements

Project each measurement onto primary principal components

Test whether the residual energy exceeds a threshold.

Squared prediction error (SPE - Q-statistic) used to test for unusual flow-types.

Prone to Type-I errors (false positives) when applied to transient operations.

In these cases, the assumption that the source data is normally distributed is violated.
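
The residual-energy test can be sketched as below, assuming the principal axes have already been computed and are orthonormal. The axes, the mean and the threshold value are placeholders chosen for the example; in particular the threshold is not derived from the Q-statistic here.

```python
def squared_prediction_error(x, mean, components):
    """Energy of x left over after projecting onto orthonormal principal axes."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    residual = list(centered)
    for axis in components:
        score = sum(c * a for c, a in zip(centered, axis))
        residual = [r - score * a for r, a in zip(residual, axis)]
    return sum(r * r for r in residual)


def is_anomalous(x, mean, components, threshold):
    """Flag a measurement whose residual energy exceeds the threshold."""
    return squared_prediction_error(x, mean, components) > threshold


# Hypothetical 3-dimensional example: the normal subspace is the first two axes.
mean = [0.0, 0.0, 0.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
threshold = 0.5  # placeholder value, not a Q-statistic threshold

normal = [3.0, -2.0, 0.1]
unusual = [3.0, -2.0, 2.0]

spe_normal = squared_prediction_error(normal, mean, components)
spe_unusual = squared_prediction_error(unusual, mean, components)
```

Both points have the same projection onto the normal subspace; only the component outside it separates them.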

Page 33:

Online Method

Don’t need to relearn from scratch when new data arrive

Computational cost per time step should be bounded by constant independent of time

Block-based PCA unattractive

Alternative method: Kernel Recursive Least Squares (KRLS)

Page 34:

KRLS

Represent function as:

$\hat{f}(x) = \sum_{i=1}^{t} \alpha_i k(x_i, x)$

where {xi} are training points

Desire a sparse solution (storage and time savings + generalization ability)

Effective dimensionality of manifold spanned by training feature vectors may be much smaller than feature space dimension

Identify linearly independent feature vectors that approximately span this manifold.

Page 35:

KRLS

Sequentially sample a stream of input/output pairs

$(x_1, y_1), (x_2, y_2), \dots, \quad x_i \in X,\ y_i \in \mathbb{R}$

At time step t, assume we have collected a dictionary of samples:

$D_{t-1} = \{\tilde{x}_j\}_{j=1}^{m_{t-1}}$

where by construction $\{\phi(\tilde{x}_j)\}_{j=1}^{m_{t-1}}$ are linearly independent feature vectors

Page 36:

KRLS

We encounter a new sample xt.

Test whether $\phi(x_t)$ is approximately linearly dependent on the dictionary feature vectors:

$\delta_t = \min_{a} \left\| \sum_{j=1}^{m_{t-1}} a_j \phi(\tilde{x}_j) - \phi(x_t) \right\|^2 \le \nu$

(the sum is the dictionary approximation; $\nu$ is the threshold)

If not, add it to dictionary.
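
The dependence test can be evaluated with kernels alone: minimising over a gives δ = k(x, x) − kᵀ K⁻¹ k, where K is the kernel matrix of the dictionary and k holds the kernel values between the dictionary and the new sample. The sketch below assumes a Gaussian kernel of width 1 and recomputes K from scratch at each step for clarity; a real KRLS implementation would update the inverse recursively. Function names and data are illustrative.

```python
import math


def gaussian_kernel(x, z, width=1.0):
    """Gaussian kernel; the width is an assumed value."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (aug[i][n] - tail) / aug[i][i]
    return a


def ald_delta(dictionary, x_new, kernel=gaussian_kernel):
    """Squared feature-space distance from phi(x_new) to the dictionary span."""
    K = [[kernel(u, v) for v in dictionary] for u in dictionary]
    k_vec = [kernel(u, x_new) for u in dictionary]
    a = solve(K, k_vec)
    return kernel(x_new, x_new) - sum(ai * ki for ai, ki in zip(a, k_vec))


def update_dictionary(dictionary, x_new, nu):
    """Append x_new if it is not approximately dependent; return True if added."""
    if not dictionary or ald_delta(dictionary, x_new) > nu:
        dictionary.append(x_new)
        return True
    return False


dictionary = []
stream = [[0.0, 0.0], [0.01, 0.0], [3.0, 3.0], [0.0, 0.01]]
added = [update_dictionary(dictionary, x, nu=0.1) for x in stream]
```

Samples close to an existing dictionary element are well approximated and are skipped, so the dictionary stays small.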

Page 37:

KRLS Properties

Provided input set X is compact, then number of dictionary elements is finite.

Approximate version of kernel PCA: eigenvectors with eigenvalues significantly larger than ν are projected almost entirely onto the dictionary set.

O(m²) memory and O(tm²) time

Compare exact kernel PCA: O(t²) memory and O(t²p) time.

Page 38:

Application in Networks

Data set is the Origin-Destination Flows (11x11 matrix = 121 dimensional vector per measurement interval).

Normalized, these comprise the features.

We use the total traffic per measurement interval as the associated value y

Page 39:

[Figure: total traffic (no. of packets) versus measurement interval]

0000 hrs on Aug 10, 2005 to 2359 hrs Aug 21, 2005 at Chicago router. Gives 3456 five-minute intervals over the 12-day period.

Page 40:

Origin-Destination Flows

[Figure panels: t = 1, t = 100, t = 1300, t = 3000]

Page 41:

Building the Dictionary

[Figure: number of dictionary elements (# elements) and δ versus measurement interval, for Gaussian and linear kernels, with parameter values 0.1 and 0.2]

Page 42:

Dictionary Components

[Figure panels: Element 5, Element 6, Element 20, Element 22]

Page 43:

KRLS Anomaly Detection Algorithm

1. Based on xt, evaluate δt.

2. If δt < ν1, green-light traffic.

3. If δt > ν2, raise red alarm.

4. If ν1 < δt < ν2, raise orange alarm.

   a. Test usefulness of xt. (Does φ(xt) provide good support for ensuing vectors?)

   b. If yes, add xt to the dictionary.

   c. If no, raise red alarm.

5. Remove any obsolete dictionary elements.
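
The threshold logic of steps 2 to 4 can be sketched as a small decision function. The usefulness test is passed in as a callable because the slide does not define it, and treating values exactly equal to ν1 or ν2 as "orange" is an assumption; step 5 (removing obsolete elements) is not shown.

```python
def classify(delta_t, nu1, nu2, is_useful):
    """Return (alarm, add_to_dictionary) for one measurement.

    is_useful is a hypothetical zero-argument callable standing in for the
    usefulness test of step 4.
    """
    if not nu1 < nu2:
        raise ValueError("thresholds must satisfy nu1 < nu2")
    if delta_t < nu1:
        return "green", False      # step 2: well explained by the dictionary
    if delta_t > nu2:
        return "red", False        # step 3: far from the dictionary span
    # Step 4: orange alarm, resolved by the usefulness test.
    if is_useful():
        return "orange", True      # step 4b: keep x_t as a dictionary element
    return "red", False            # step 4c: escalate to a red alarm


green = classify(0.05, 0.1, 0.3, lambda: True)
red = classify(0.50, 0.1, 0.3, lambda: True)
orange_kept = classify(0.20, 0.1, 0.3, lambda: True)
orange_escalated = classify(0.20, 0.1, 0.3, lambda: False)
```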

Page 44:

Evaluating Usefulness

[Figure: kernel value versus timestep for normal, obsolete and anomalous elements]

Page 45:

Anomaly Detection

[Figure: KRLS, PCA and OCNM compared over timestep; axis labels: Euclidean distance, magnitude of residual]

Page 46:

PCA versus KRLS: Anomaly 1

[Figure: no. of IP flows versus timestep]

Page 47:

PCA versus KRLS: Anomaly 2

[Figure: no. of IP flows and magnitude of projection versus timestep]

Page 48:

Summary and Challenges

Network monitoring presents challenges on different fronts:

• Constraints on available measurements (reconstruction based on partial views)

• High-rate, high-dimensional, distributed data

(Some of the many) open questions:

• Tomography: network models, spatial + temporal correlations, optimal sampling, multiple source.

• Anomaly detection: thresholds, dictionary control, feature space, dataset

Page 49:

Fig 3

[Figure: detection rate (%) versus false alarm rate (%)]

Page 50:

Particle filtering

Objective: Estimate expectations $\int h(\theta_{0:t})\, \pi_t(d\theta_{0:t})$ with respect to a sequence of distributions $\{\pi_t\}_{t \ge 0}$ known up to a normalizing constant, i.e.

$\pi_t(d\theta_{0:t}) \propto \gamma_t(\theta_{0:t})\, d\theta_{0:t}$

Monte Carlo: Obtain N weighted samples $\{\theta_{0:t}^{(i)}, w_t^{(i)}\}_{i=1,\dots,N}$, where $w_t^{(i)} > 0$ and $\sum_{i=1}^{N} w_t^{(i)} = 1$, such that

$\sum_{i=1}^{N} w_t^{(i)} h(\theta_{0:t}^{(i)}) \xrightarrow[N \to \infty]{} \int h(\theta_{0:t})\, \pi_t(d\theta_{0:t})$
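
The weighted-sample idea can be illustrated for a single time step with self-normalized importance sampling, where the target is known only up to a constant. This is a static sketch, not a sequential filter; the Gaussian proposal, the toy target and the sample size are all assumptions made for the example.

```python
import math
import random


def self_normalized_estimate(h, unnormalized_target, n_samples=20000,
                             proposal_std=3.0, seed=0):
    """Estimate E[h] under a target density known up to a constant."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, proposal_std) for _ in range(n_samples)]
    # Importance ratio target / proposal; constants cancel after normalising.
    raw = [unnormalized_target(s) / math.exp(-0.5 * (s / proposal_std) ** 2)
           for s in samples]
    total = sum(raw)
    weights = [r / total for r in raw]   # non-negative, sum to one
    estimate = sum(w * h(s) for w, s in zip(weights, samples))
    return estimate, weights


def toy_target(x):
    """N(1, 1) density multiplied by an arbitrary constant."""
    return 5.0 * math.exp(-0.5 * (x - 1.0) ** 2)


estimate, weights = self_normalized_estimate(lambda x: x, toy_target)
```

The estimate approaches the true mean of the target as the number of samples grows, without the normalizing constant ever being computed.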