CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer &...

25
CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection of unknown computer worms based on behavioral classification of the host Robert Moskovitch ,Yuval Elovici ,Lior Rokach

Transcript of CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer &...

Page 1: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Presented by: SandeepDept of Computer & Information Sciences

University of Delaware

Detection of unknown computer worms based on behavioral classification of the host

Robert Moskovitch ,Yuval Elovici ,Lior Rokach

Page 2: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Worms

• Worms are considered malicious in nature

• Worms propagate actively over a network, while other types of malicious codes, such as viruses, commonly require human activity to propagate

• Viruses infect a file (its host), a worm does not require a host file .

Page 3: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

What do Antivirus Packages do ?

• Antivirus software packages inspect each file

that enters the system, looking for known

signatures which uniquely identify an instance of known malcode• Polymorphism and metamorphism are two common obfuscation techniques used by malware

writers• Polymorphic virus obfuscates its decryption loop

using several transformations, such as nop-

insertion, code transposition

Page 4: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Obfuscation Techniques

• Metamorphic viruses attempt to evade detection

by obfuscating the entire virus. When they

replicate, these viruses change their code in a

variety of ways, such as code transposition,

substitution of equivalent instruction sequences,

change of conditional jumps, and register

reassignment

Page 5: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Example

• Virus Code : Morphed Virus Code(From Chernobyl CIH1.4)Loop : Loop : pop ecx Loop: pop ecx

pop ecx nop nop

jecxz SFModMark jecxz SFModMark jmp L1

mov esi , ecx xor ebx , ebx L3: call edi

mov eax , 0d601h beqz N1 xor ebx , ebx

Pop edx N1: mov esi , ecx beqz N2

Pop ecx nop N2: jmp Loop

Call edi mov eax ,0d601h jmp l4

pop edx L2: nop

pop ecx mov eax , 0d601h

nop pop edx Xor ebx , ebx

call edi pop ecx beqz N1

Xor ebx , ebx nop N1: mov esi , ecx

beqz N2 jmp L3 jmp L2

N2: JMP loop L1: jecxz SFModMark L4:

Page 6: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Current Methods

• Existing methods rely on the analysis of the

binary for the detection of unknown malcode.

• Some less typical worms are left undetectable.

Therefore an additional detection layer at

runtime is required

Page 7: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Proposed Approach

• Malicious actions are reflected in the general

behavior of the host. By monitoring the host, one

can inexplicitly identify malcodes.

• A classifier is trained with computer measurements from infected and not infected computers.

Page 8: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Contributions of the Paper

• Machine learning techniques are capable of detecting and classifying worms

• Using feature selection techniques to show that a

relatively small set of features are sufficient for

solving the problem without sacrifice accuracy.

• Empirical results from an extensive study of various machine configurations suggesting that

the proposed methods achieve high detection rates on previously unseen worms.

Page 9: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Train and Test Phase

Page 10: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Dataset creation

• Lab network consisted of seven computers, which

contained heterogenic hardware, and a server

computer simulating the internet.

• Used the windows performance counters and Vtrace which enable monitoring system features

• A vector of 323 features for every second.

• Choose worms that differ in their behavior, from

among the available worms

Page 11: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Dataset Description

Page 12: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Feature selection methods

• Chi-Square

• Gain Ratio

• Relief

• Features’ ensemble :

fi is a feature, filter is one of the k filtering (feature selection) methods.

Page 13: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Consolidating features from different environments:Averaged and Unified

Page 14: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Feature Sets

Page 15: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Classification algorithms

• Decision Trees,

• Naıve Bayes,

• Bayesian Networks

• Artificial Neural Networks

Page 16: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Evaluation measures

Page 17: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Experiment I

• Each classifier is trained on a single dataset i

and tested on each one ( j ) of the eight datasets.

Eight corresponding evaluations were done on each one of the datasets, resulting in 64 evaluation runs.

• When i = j , 10 fold cross validation, in which the

dataset is randomly partitioned into ten partitions and repeatedly the classifier is

trained on nine partitions and tested on the

tenth.

Page 18: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Experiment I (Contd)

• Each evaluation run (out of the 64) was repeated

for each one of the combinations of feature

selection method, classification algorithm, and

number of top features.

• Each evaluation run was repeated for the 33 feature set described earlier

• 132 (four classification algorithms applied to 33

feature sets) evaluations (each comprises 64

runs), summing up to 8448 evaluation runs.

Page 19: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Results

Page 20: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Results(Contd)

Page 21: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Results(Contd)

Page 22: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Experiment II

• Classifiers based on part of the (five) worms and

the none activity, and tested on the excluded

worms (from the training set) and the none activity• Training set consisted of 5 − k worms and the testing set contained the k excluded worms,

while the none activity appeared in both datasets. • This process repeated for all the possible combinations of the k worms (k = 1–4). • The Top20 features, which outperformed in e1 were used

Page 23: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Results

Page 24: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Conclusion

• Q1: In the detection of known malicious code,

based on a computer’s measurements, using

machine learning techniques, what is the

achievable level of accuracy?

• Q2: Is it possible to reduce the number of

features to below 30, while maintaining a high

level of accuracy

Page 25: CISC 879 - Machine Learning for Solving Systems Problems Presented by: Sandeep Dept of Computer & Information Sciences University of Delaware Detection.

CISC 879 - Machine Learning for Solving Systems Problems

Conclusions(Contd)

• Q3: Will the computer configuration and the

computer background activity, from which the

training sets were taken, have a significant

influence on the detection accuracy?

• Q4: Is the detection of unknown worms possible,

based on a training set of known worms?