Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic...

23
Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1 , Wei Fan 2 , Deepak Turaga 2 , Olivier Verscheure 2 , Xiaoqiao Meng 2 , Lu Su 1 ,Jiawei Han 1 1 Department of Computer Science University of Illinois 2 IBM TJ Watson Research Center INFOCOM’2011 Shanghai, China T: multiple simple atomic detectors UT: optimization-based combination mostly consisten all atomic detectors

Transcript of Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic...

Page 1: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

Consensus Extraction from Heterogeneous Detectors to Improve Performance over

Network Traffic Anomaly Detection

Jing Gao1, Wei Fan2, Deepak Turaga2, Olivier Verscheure2, Xiaoqiao Meng2,

Lu Su1,Jiawei Han1

1 Department of Computer Science University of Illinois

2 IBM TJ Watson Research Center

INFOCOM’2011 Shanghai, China

INPUT: multiple simple atomic detectorsOUTPUT: optimization-based combination mostly consistent with all atomic detectors

Page 2: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

2

Network Traffic Anomaly Detection

ComputerNetwork

Network Traffic

Tid SrcIP Start time

Dest IP Dest Port

Number of bytes

1 206.135.38.95 11:07:20 160.94.179.223 139 192

2 206.163.37.95 11:13:56 160.94.179.219 139 195

3 206.163.37.95 11:14:29 160.94.179.217 139 180

4 206.163.37.95 11:14:30 160.94.179.255 139 199

5 206.163.37.95 11:14:32 160.94.179.254 139 19

6 206.163.37.95 11:14:35 160.94.179.253 139 177

7 206.163.37.95 11:14:36 160.94.179.252 139 172

8 206.163.37.95 11:14:38 160.94.179.251 139 285

9 206.163.37.95 11:14:41 160.94.179.250 139 195

… 10

Anomalous or Normal?

Page 3: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

3

Challenges• the normal behavior can be too

complicated to describe.• some normal data could be similar to the

true anomalies

• labeling current anomalies is expensive and slow

• the network attacks adapt themselves continuously – what we know in the past may not work for today

Page 4: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

The Problem

• Simple rules (or atomic rules) are relatively easy to craft.

• Problem: – there can be way too many simple rules– each rule can have high false alarm or FP rate

• Challenge: can we find their non-trivial combination (per event, per detector) that significantly improve accuracy?

Page 5: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

5

Why We Need Combine Detectors?

Count 0.1-0.5

Entropy 0.1-0.5

Count 0.3-0.7

Entropy 0.3-0.7

Count 0.5-0.9

Entropy 0.5-0.9

Label

Too many alarms!

Combined view is better than individual views!!

Page 6: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

6

Combining Detectors

• is non-trivial– We aim at finding a consolidated solution without any

knowledge of the true anomalies (unsupervised)

– But we could improve with limited supervision and incrementally (semi-supervised and incremental)

– We don’t know which atomic detectors are better and which are worse

– At some given moment, it could be some non-trivial and dynamic combination of atomic detectors

– There could be more bad base detectors than good ones, so that majority voting cannot work

Page 7: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

7

Problem Formulation

Record 1

Record 2

Record 3

Record 4

Record 5

Record 6

Record 7

……

A1 A2…… Ak-1 Ak

Which one is anomaly?

Y N N N……

N Y Y N……

Y N N N……

Y Y N Y……

N N Y Y……

N N N N……

N N N N……

Combine atomic detectors into one!We propose a non-trivial combinationConsensus: 1. mostly consistent with all atomic detectors2. optimization-based framework

Page 8: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

8

How to Combine Atomic Detectors?

• Linear Models– As long as one detector is correct, there always exist weights to combine them linearly – Question: how to figure out these weights– Per example & per detector

• Different from majority voting and model averaging

• Principles– Consensus considers the performance among a set of examples and weights each

detectors by considering its performance over others, i.e, each example is no longer i.i.d– Consensus: mostly consistent among all atomic detectors– Atomic detectors are better than random guessing and systematic flipping– Atomic detectors should be weighted according to their detection performance– We should rank the records according to their probability of being an anomaly

• Algorithm– Reach consensus among multiple atomic anomaly detectors

• unsupervised

• Semi-supervised

• incremental

– Automatically derive weights of atomic detectors and records – per detector & per event – no single weight works for all situations.

Page 9: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

9

Framework

],[ 10 iii uuu

],[ 10 jjj qqq

iu

…… ……

Detectors Records

A1

Ak

jq

record i

detector j

probability of anomaly, normal

otherwise

qua jiij 0

1adjacency

initial probability

normal

anomalousy j ]10[

]01[

[1 0] [0 1]

Page 10: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

10

Objectiveminimize disagreement

)||||||||(min 2

11 1

2, jj

v

j

n

i

v

jjiijUQ yqqua

Similar probability of being an anomaly if the record is connected to the detector

Do not deviate much from the initial probability

iu

…… ……

Detectors Records

A1

Ak

jq

[1 0] [0 1]

Page 11: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

11

Methodology

iu

…… ……

Detectors Records

A1

Ak

jq

[1 0] [0 1]

v

jij

v

jjij

i

a

qa

u

1

1

n

iij

j

n

iiij

j

a

yuaq

1

1

Iterate until convergence

Update detector probability

Update record probability

Page 12: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

12

Propagation Process

…… ……

Detectors Records

[1 0] [0 1]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.5 0.5]

[0.7 0.3]

[0.7 0.3]

[0.357 0.643]

[0.357 0.643]

[0.5285 0.4715]

[0.5285 0.4715]

[0.5285 0.4715]

[0.5285 0.4715]

[0.357 0.643]

[0.357 0.643]

[0.357 0.643]

[0.7 0.3]

[0.6828 0.3172]

[0.7514 0.2486]

[0.304 0.696]

[0.304 0.696]

Page 13: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

13

Consensus Combination Reduces Expected Error

• Detector A– Has probability P(A)– Outputs P(y|x,A) for record x regarding y=0 (normal)

and y=1 (anomalous)

• Expected error of single detector

• Expected error of combined detector

• Combined detector has a lower expected error

A yx

S AxyPxyPyxPErr),(

2),|()|(),(

2),(

),|()()|(),( yx A

C AxyPAPxyPyxPErr

SC ErrErr

Page 14: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

14

Extensions

• Semi-supervised– Know the labels of a few records in advance– Improve the performance of the combined

detector by incorporating this knowledge

• Incremental – Records arrive continuously

– Incrementally update the combined detector

Page 15: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

15

Incremental

iu

1nu

nu

…… ……

Detectors Records

A1

Ak

jq

[1 0] [0 1]

v

jij

v

jjij

i

a

qa

u

1

1

nj

n

iij

jnnj

n

iiij

j

aa

yuauaq 1

1

1

1

When a new record arrives

Update detector probability

Update record probability

Page 16: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

16

Semi-supervised

iu

v

jij

v

jijij

i

a

fqa

u

1

1

…… ……

Detectors Records

A1

Ak

jq

[1 0] [0 1]

v

jij

v

jjij

i

a

qa

u

1

1

n

iij

j

n

iiij

j

a

yuaq

1

1

Iterate until convergence

unlabeled

labeled

Page 17: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

17

Benchmark Data Sets• IDN

– Data: A sequence of events: dos flood, syn flood, port scanning, etc, partitioned into intervals

– Detector: setting threshold on two high-level measures describing the probability of observing events during each interval

• DARPA – Data: A series of TCP connection records, collected by

MIT Lincoln labs, each record contains 34 continuous derived features, including duration, number of bytes, error rate, etc.

– Detector: Randomly select a subset of features, and apply unsupervised distance-based anomaly detection algorithm

Page 18: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

18

Benchmark Datasets

• LBNL– Data: an enterprise traffic dataset collected at the

edge routers of the Lawrence Berkeley National Lab. The packet traces were aggregated by intervals spanning 1 minute

– Detector: setting threshold on six metrics including number of TCP SYN packets, number of distinct IPs in the source or destination, maximum number of distinct IPs an IP in the source or destination has contacted, and 6) maximum pairwise distance between distinct IPs an IP has contacted.

Page 19: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

19

Experiments Setup

• Baseline methods– base detectors– majority voting– consensus maximization– semi-supervised (2% labeled)– stream (30% batch, 70% incremental)

• Evaluation measure– area under ROC curve (0-1, 1 is the best)– ROC curve: tradeoff between detection rate

and false alarm rate

Page 20: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

20

AUC on Benchmark Data Setsworst best average MV UC SC IC

IDN 0.5269 0.6671 0.5904 0.7089 0.7255 0.7204 0.7270

0.2832 0.8059 0.5731 0.6854 0.7711 0.8048 0.7552

0.3745 0.8266 0.6654 0.8871 0.9076 0.9089 0.9090

DARPA 0.5804 0.6068 0.5981 0.7765 0.7812 0.8005 0.7730

0.5930 0.6137 0.6021 0.7865 0.7938 0.8173 0.7836

0.5851 0.6150 0.6022 0.7739 0.7796 0.7985 0.7727

LBNL 0.5005 0.8230 0.7101 0.8165 0.8180 0.8324 0.8160

Worst, best and average performance of atomic detectors

Unsupervised, semi-supervised and

incremental version of consensus combinationconsensus combination

Majority voting among detectors

Consensus combination improves anomaly detection performance!

Page 21: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

Stream ComputingContinuous Ingestion Continuous Complex Analysis in low latency

Page 22: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

22

Conclusions

• Consensus Combination– Combine multiple atomic anomaly detectors to a

more accurate one in an unsupervised way

• We give– Theoretical analysis of the error reduction by

detector combination– Extension of the method to incremental and semi-

supervised learning scenarios– Experimental results on three network traffic

datasets

Page 23: Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

23

Thanks!

• Any questions?

Code available upon request