
ITEC 810: Entropy-based anomaly detection systems

Name: Xin Heng

Student No: 41738799

Supervisor: Professor Vijay Varadharajan


Contents:

1. Background
2. Introduction
3. Information theory
4. UNM sendmail system call data case study
5. Comparison of CID with traditional metrics
6. Conclusion

1. Background

The best research intrusion detection (ID) systems had detection rates below 70%, and most of the missed intrusions were new attacks. Anomaly detection is the main technique for detecting novel attacks, but the main reason anomaly detection systems are not deployed is their high false alarm rate. The goal is therefore to develop a significantly better anomaly detection model.

Aim, Approach and Outcomes

Investigate entropy-based anomaly detection systems:

1. Analyze information theory.
2. Study the UNM sendmail system call data case study.
3. Investigate the new metric CID (a metric for evaluating the capability of intrusion detection).

2. Introduction

3. Information-theoretic measures:

- Shannon entropy
- Conditional entropy
- Relative (conditional) entropy
- Information gain (mutual information)

Shannon entropy

H(X) = −Σ_{x∈C_X} p(x) log p(x)

Shannon entropy measures the uncertainty of a collection of data items, where C_X is the set of distinct values x appearing in the dataset X.
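As a quick illustration (a sketch of mine, not from the slides), Shannon entropy can be estimated from the empirical symbol frequencies of a collection; the sample collections below are made up:

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Empirical Shannon entropy, in bits, of a collection of data items."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A skewed collection is more redundant and has lower entropy.
print(shannon_entropy(list("aaaaaaab")))  # ~0.544 bits
print(shannon_entropy(list("abcdefgh")))  # 3.0 bits (uniform)
```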


Comparing different datasets, the smaller the entropy, the more skewed the data collection: the data items are more redundant and more regular. Repeated or redundant data items in previous events imply that the same events will probably show up in future events.

Conditional entropy

Conditional entropy is the amount of uncertainty left in dataset X after dataset Y is known. Because user, program and network activities have sequential characteristics, we need to measure the regularity of the sequence of audit data in order to predict future events.

If each audit trail is a sequence of events of the same type, e.g. X = {aaaaa, bbbbb, ...}, then the conditional entropy is 0 and the event sequence is deterministic. Conversely, a large conditional entropy implies that the sequences are less deterministic and hard to model.
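A minimal sketch of this idea for audit traces (my construction, not the paper's implementation): take X to be a length-n window of the trace and Y its length-(n-1) prefix, so that H(X|Y) = H(X) − H(Y).

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of an empirical distribution of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(trace, n):
    """H(X|Y) where X ranges over length-n windows of the trace and Y over
    their length-(n-1) prefixes; since Y is a function of X, H(X|Y) = H(X) - H(Y)."""
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    return entropy(Counter(windows)) - entropy(Counter(w[:-1] for w in windows))

# A perfectly regular event sequence is deterministic: conditional entropy 0.
print(conditional_entropy(list("ababababab"), 3))  # 0.0
# An irregular sequence leaves uncertainty about the next event.
print(conditional_entropy(list("abaabbbaba"), 3))  # > 0
```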

Relative entropy

Relative (conditional) entropy measures the distance between two audit datasets. The larger the relative entropy between two datasets, the less suitable a model trained on one dataset is for the other.
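One simple way to realise this (my sketch; the smoothing floor eps is an assumption, not from the slides) is the Kullback-Leibler divergence between the empirical distributions of two datasets:

```python
import math
from collections import Counter

def kl_divergence(p_items, q_items):
    """Relative entropy D(p || q) between the empirical distributions of two
    datasets.  Symbols absent from q get a tiny floor probability so the
    divergence stays finite (one simple smoothing choice, not the only one)."""
    p_counts, q_counts = Counter(p_items), Counter(q_items)
    p_total, q_total = len(p_items), len(q_items)
    eps = 1e-9
    return sum((c / p_total) *
               math.log2((c / p_total) / max(q_counts[x] / q_total, eps))
               for x, c in p_counts.items())

# Identical distributions give 0; diverging distributions give larger values.
print(kl_divergence(list("aabb"), list("abab")))  # 0.0
print(kl_divergence(list("aaab"), list("abbb")))  # > 0
```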

Information gain (mutual information)

Information gain is the reduction in entropy when a dataset is partitioned according to a feature value; it is used to classify data items. If all features have low information gain, the classifier performs poorly, because after partitioning the subsets are still impure and have large entropy.
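A small sketch of the idea (mine; the records, the feature name 'syscall' and the labels are hypothetical): information gain is the label entropy minus the weighted entropy of the partitions induced by one feature.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, feature):
    """Entropy reduction from partitioning the labelled rows by one feature."""
    partitions = defaultdict(list)
    for row, label in zip(rows, labels):
        partitions[row[feature]].append(label)
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Hypothetical audit records: the feature separates the classes perfectly.
rows = [{"syscall": "open"}, {"syscall": "open"},
        {"syscall": "exec"}, {"syscall": "exec"}]
labels = ["normal", "normal", "attack", "attack"]
print(information_gain(rows, labels, "syscall"))  # 1.0 bit
```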

4. UNM sendmail system call data

Using the University of New Mexico (UNM) sendmail system call data as a case study, we study how to measure data regularity, how to use conditional entropy to determine the appropriate sequence length (how to build the detection model), and how to demonstrate the performance of the anomaly detection model (why it works).

Experiment result 1:

As the sequence length of each normal trace increases, the conditional entropy decreases.

Experiment result 2:

Conditional entropy can be used as an estimated misclassification rate to decide a suitable sequence length of the audit data when building the anomaly detection model, as sketched below.
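A hedged sketch of how that selection could look (entirely my construction; the trace is synthetic and the flattening point depends on the data): compute the conditional entropy for a range of window lengths and pick the length where the curve stops dropping.

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(trace, n):
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    return entropy(Counter(windows)) - entropy(Counter(w[:-1] for w in windows))

# Synthetic audit trace with a repeating motif: conditional entropy falls as
# n grows, and the n where the curve flattens is a candidate sequence length.
trace = list("abcabcabd" * 20)
for n in range(2, 8):
    print(n, round(conditional_entropy(trace, n), 3))
```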


Experiment result 3:

An abnormal trace is judged by its high misclassification rate, rather than by the mere occurrence of misclassifications.

Experiment result 4:

The larger the relative conditional entropy between two datasets, the less suitable one is as training data for the other.

Experiment result 5:

The information cost should also be considered, choosing the operating point with the best accuracy per unit cost for an IDS.

5. Comparison of metrics

To evaluate the capability of intrusion detection systems:

- Capability of intrusion detection (CID)
- Other IDS metrics: the ROC curve, PPV and NPV, and the cost-based approach

Capability of intrusion detection (CID)

The definition of CID is CID = I(X; Y) / H(X): the ratio of the mutual information between the IDS input X and output Y to the entropy of the IDS input.
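A numerical sketch of the definition (mine; the rates below are hypothetical), computing CID from a base rate, a TP rate and an FP rate via I(X;Y) = H(X) + H(Y) − H(X,Y):

```python
import math

def h(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def cid(base_rate, tp_rate, fp_rate):
    """CID = I(X;Y) / H(X) for a binary IDS: X is the input state
    (intrusion or not), Y is the output (alert or not)."""
    b, t, f = base_rate, tp_rate, fp_rate
    p_alert = b * t + (1 - b) * f
    joint = [b * t, b * (1 - t), (1 - b) * f, (1 - b) * (1 - f)]
    mutual_info = h([b, 1 - b]) + h([p_alert, 1 - p_alert]) - h(joint)
    return mutual_info / h([b, 1 - b])

# Hypothetical IDS: 1% base rate, 99% detection rate, 0.1% FP rate.
print(round(cid(0.01, 0.99, 0.001), 3))
```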

Effect of CID

- Selection of the optimal operating point of an IDS
- Comparison of different IDSs

ROC curve

FP rate (false positive rate): the chance that the IDS raises an alert when there is no intrusion.

TP rate (true positive rate): the chance that the IDS raises an alert when there is an intrusion.

If two ROC curves do not cross, each operating point on the top curve is better than the one below it, because at the same false positive rate the top curve gives a higher true positive rate, as the sketch below checks.
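A small sketch of that dominance check (my construction; the two curves are made-up point lists), interpolating each curve's TP rate at the shared FP rates:

```python
from bisect import bisect_left

def tp_at(curve, fp):
    """Linearly interpolate a ROC curve (fp-sorted (fp, tp) points) at fp."""
    xs = [p[0] for p in curve]
    i = bisect_left(xs, fp)
    if i == 0:
        return curve[0][1]
    if i == len(curve):
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 if x1 == x0 else y0 + (y1 - y0) * (fp - x0) / (x1 - x0)

def dominates(curve_a, curve_b):
    """True if curve A's TP rate is >= curve B's at every listed FP rate."""
    fps = sorted({p[0] for p in curve_a} | {p[0] for p in curve_b})
    return all(tp_at(curve_a, f) >= tp_at(curve_b, f) for f in fps)

a = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]  # top curve
b = [(0.0, 0.0), (0.1, 0.6), (1.0, 1.0)]  # lower curve
print(dominates(a, b))  # True: the curves do not cross
```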

PPV and NPV

PPV (positive predictive value): the chance that there is an intrusion when the IDS outputs an alarm.

NPV (negative predictive value): the chance that there is no intrusion when the IDS does not output an alarm.

Both PPV and NPV are significant from the usability point of view.
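A worked example (mine; the 1% base rate and the error rates are hypothetical) computing both values with Bayes' theorem, which also shows why a low base rate makes PPV the hard part:

```python
def ppv_npv(base_rate, tp_rate, fp_rate):
    """Bayesian PPV and NPV of an IDS.
    PPV = P(intrusion | alarm), NPV = P(no intrusion | no alarm)."""
    b, t, f = base_rate, tp_rate, fp_rate
    ppv = (b * t) / (b * t + (1 - b) * f)
    npv = ((1 - b) * (1 - f)) / ((1 - b) * (1 - f) + b * (1 - t))
    return ppv, npv

# With a 1% base rate, even a 1% FP rate means only half of all alarms
# correspond to real intrusions.
print(ppv_npv(0.01, 0.99, 0.01))  # PPV ~ 0.5, NPV ~ 0.9999
```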

Attributes of CID

- A single unified metric
- Sensitivity and accuracy
- An objective and natural measure compared to traditional measures

6. Conclusion

- Analyzed basic information theory
- Studied the UNM sendmail system call data case study
- Analyzed the new metric CID

An extension of this work would be to propose a new metric based on the existing research and explain its performance.

References

Gu, G, Fogla, P, Dagon, D & Lee, W 2006, 'Measuring intrusion detection capability: an information-theoretic approach', Proc. ACM Symposium on Information, Computer and Communications Security (ASIACCS '06), Taipei, Taiwan.

Lee, W & Xiang, D 2001, 'Information-theoretic measures for anomaly detection', Proc. IEEE Symposium on Security and Privacy, Oakland, CA, pp. 130-143.

Mehta, GK 2009, 'Application of information theoretic metrics in anomaly detection', Department of Computer Science, UCSB.