Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network...

17
Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates [email protected], [email protected], [email protected] USENIX SysML, Cambridge, MA April 10, 2007 Research supported by Canadian National Science and Engineering Research Council (NSERC) through the Agile All- Photonics Research Network (AAPN) and MITACS projects

Transcript of Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network...

Page 1: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Machine Learning Approaches to Network Anomaly Detection

Tarem Ahmed, Boris Oreshkin and Mark [email protected], [email protected], [email protected]

USENIX SysML, Cambridge, MA

April 10, 2007

Research supported by Canadian National Science and Engineering Research Council (NSERC) through the Agile All-

Photonics Research Network (AAPN) and MITACS projects

Page 2: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Introduction

Network data arevoluminoushigh-dimensionalhigh-rate

What is a network anomaly?rare eventshort-lived

ML-based network anomaly detection methods more general than

model-basedsignature-based

Page 3: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Our Methodology

Show applicability of ML approaches to network anomaly detection

Two example algorithms:One-Class Neighbor Machine (OCNM) [Muñoz 06]Kernel-based Online Anomaly Detection (KOAD) [Ahmed 07]

Two example datasets:Transports QuebecAbilene

10 20 3005

10t=2014

10 20 3005

10t=2016

10 20 3005

10t=2018

Hash of srcIP

Log

of fr

eque

ncy

Page 4: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

One-Class Neighbor Machine (OCNM)

Region of normality should correspond to a Minimum Volume Set (MVS)

OCNM for estimating MVSproposed in [Muñoz 06],

Requires choice of sparsitymeasure, g. Example: k-thnearest-neighbour distance

Sorts list of g, identifies fraction µ inside MVS

Normal

2-D Isomap of CHIN-LOSA backbone flow, srcIP

Anomalous

Page 5: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Kernel-based Online Anomaly Detection (KOAD): Introduction

Sequence of multivariate measurements: {xt}t=1:T

Choose feature space with associated kernel:

where

Then feature vectors corresponding to normal traffic measurements should cluster

( ) ( ) ( ), ,i j i jk ϕ ϕ=x x x x

( ): n Hϕ ϕ ∞∈ → ∈x x

Page 6: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Should be possible to describe region of normality

in feature space using sparse dictionary,

Feature vector is said to be approximately

linearly independent on if [Engel 04]:

(1)

Kernel-based Online Anomaly Detection (KOAD): Dictionary

12

1

min ( ) ( )tm

t j j ta j

aδ φ φ ν−

=

= − >∑ x x

Dictionary approximationThreshold

( ){ }1

M

j jϕ

=x

( )tϕ x

{ }1

M

j jD

== x

Page 7: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

At timestep t with arriving input vector xt :Evaluate according to (1),compare with and where

If , infer xt far from normality: Red1

If , raise Orange, resolve l timestepslater, after “usefulness” test

If , infer xt close to normality: Green

Delete obsolete elements, use exponential forgettingFor details of KOAD see [Ahmed 07]

Kernel-based Online Anomaly Detection (KOAD): Algorithm

tδ1ν 2ν 1 2ν ν<

1tδ ν<

2tδ ν>

1tδ ν>

Page 8: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Dataset 1: Transports Quebec

Cameras monitored

Page 9: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Sample Images (normal)

Camera 1 Camera 2 Camera 3

Camera 4 Camera 5 Camera 6

Page 10: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Sample Images (anomalous)

Camera 4 Camera 5 Camera 6

Camera 1 Camera 2 Camera 3

Page 11: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Feature Extraction: Discrete Wavelet Transform

At each timestep, at each camera, get 6-D wavelet feature vector

0 200 400 600 800 1000-25

-20

-15

-10

-5

0

Timestep

α i(n

), dB

Page 12: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Transports Quebec Results

Camera 1 Camera 6

-25

-20

-15

-10

-5

0

α i, d

B

10-2

10-1

OC

NM

di

stan

ce

100 200 300 400

10-3

10-1

KO

AD

δ

Timestep

Change ofcamera position

Traffic jam

-25

-20

-15

-10

-5

0

α i, d

B

10-2

10-1

OC

NM

di

stan

ce

100 200 300 400

10-3

10-1

KO

AD

δ

Timestep

Traffic jams

Use n = 3 out of c = 6 voting at central monitoring unit

Page 13: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Transports Quebec ROC

KOAD: Gaussian kernel, with varying standard deviation for the kernel function

OCNM: identify 5%-50% of outliers

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

PFA

PD

KOADOCNM

Page 14: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Dataset 2: Abilene

Abilene Weathermap

Data collection11 core routers, 121 backbone flows

4 main pkt header fields collected:(srcIP, dstIP, srcPort, dstPort)

Data processingConstruct histogram of headers

Calculate header entropies for each backbone flow, at each timestep

Variations in entropies (distributions) reveal many anomalies [Lakhina 2005]

10 20 3005

10t=2014

10 20 3005

10t=2016

10 20 3005

10t=2018

Hash of srcIP

Log

of fr

eque

ncy

Page 15: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Abilene Results

0.01

0.03δ

10-5

10-4

Mag

nitu

de o

f res

idua

l

10-0.9

10-0.7

Euc

lidea

n d

ista

nce

400 800 1200 1600 2000

KOAD

PCA

OCNM

Timestep

KOAD

PCA, 10 components

OCNM, 5% outliers

Page 16: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Conclusions and Future Work

Preliminary results indicate potential of ML approaches

Parameters set using supervised learning

Computations must be distributed

Online: complexity must be independent of time

Page 17: Machine Learning Approaches to Network Anomaly Detection · Machine Learning Approaches to Network Anomaly Detection Tarem Ahmed, Boris Oreshkin and Mark Coates tarem.ahmed@mail.mcgill.ca,

Acknowledgements, ReferencesAcknowledgements:

Thanks to Sergio Restrepo for processing Transports Quebec dataand to Anukool Lakhina for providing Abilene dataset.

References:[Ahmed 07]

T. Ahmed, M. Coates, and A. Lakhina, “Multivariate online anomaly detection using kernel recursive least squares,” in Proc. IEEE Infocom, Anchorage, AK, May 2007, to appear.

[Engel 04]Y. Engel, S. Mannor, and R. Meir, “The kernel recursive least squares algorithm,” IEEE Trans. Signal Proc., vol. 52, no. 8, pp. 2275–2285, Aug. 2004.

[Lakhina 05]A. Lakhina, M. Crovella and C. Diot, “Mining anomalies using traffic feature distributions,” in Proc. ACM SIGCOMM, Philadelphia, PA, Aug. 2005.

[Muñoz 06]A. Muñoz and J. Moguerza, “Estimation of high-density regions using one-class neighbor machines,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, num 3, pp 476--480, Mar. 2006.