Salman Bin Abdulaziz University
College of Engineering and Computer Science
Department of Computer Science
0-Day Detection in
Malware Software Dataset
By
Sultan Mohamed Al-Ajmi
Under Supervision of
Dr. Mohammad Alhawarat
1433/2012
Table of Contents
1. Introduction
1.1 Signature-Based
1.2 Heuristic Scanning (0-Day Detection)
2. Literature Review
3. Experiment Setup
3.1 Oracle Relational Database Management System (RDBMS)
3.2 Malicious Software Dataset
3.3 Loading Dataset into the Database using OraLoader
3.4 Oracle Data Mining
4. Experiment Execution
5. Results and Discussion
6. Future Work
7. References
List of Tables & Figures
Tables:
Table 1: Dataset Summary
Figures:
Figure 1: Oracle RDBMS Site
Figure 2: OraLoader Login
Figure 3: OraLoader wizard (Open File Window)
Figure 4: OraLoader wizard (Table Options)
Figure 5: OraLoader wizard (Preview)
Figure 6: Oracle Data Mining Connection
Figure 7: Selecting the Dataset Table
Figure 8: Build Activity
Figure 9: Build Wizard (Model Choosing)
Figure 10: Build Wizard (Table and Primary key)
Figure 11: Build Wizard (Data Usage)
Figure 12: Build Wizard (Model Completion)
Figure 13: Apply Activity
Figure 14: Apply Activity (Model)
Figure 15: Apply Activity (Attributes Options)
Figure 16: Apply Activity (Results)
Abstract
Security is a vital aspect of computer science, because it is a real problem if your files or information are lost, damaged, or shared while they are meant to be private. One solution is to analyze the trace of a program (the sequence of API calls it makes) and determine whether the software is benign or malware. We use Oracle Data Mining with an anomaly detection technique to analyze the traces in a malware dataset. The technique takes a set of records (the training set), each containing a trace and the type of software (benign or malware), as input, and creates a model that can predict the type of new software from its trace. We applied anomaly detection to the Malicious Software Dataset from the data mining competition associated with ICONIP 2010 and obtained a model for predicting whether software is benign or malware. The model's accuracy is 81%. Signature-based techniques can add a malware signature to the antivirus database only after the malware has already spread and infected many computers, whereas a 0-Day detection mechanism aims to detect malware immediately, so that it can be reported for analysis.
1. Introduction:
Previously, strong passwords and firewalls were all that was required to secure corporate networks. Nowadays, intruders' attack methodologies have become more targeted and sophisticated, so new techniques are needed to defend against them. Intrusion detection (ID) is a type of security management for computers and networks. An ID system (IDS) monitors network traffic for suspicious activity and either alerts network administrators or responds by taking a predefined action. ID uses vulnerability assessment, or scanning, a technology developed to assess the security of a computer system or network. What makes ID important is that an IDS can detect intrusions or misuse that fall outside normal system operation, in contrast to signature-based systems, which can only detect attacks for which a signature has previously been created.
Antivirus software uses two strategies to detect malware: signature-based detection and heuristic scanning. These are discussed in the following subsections.
1.1 Signature-Based:
Signature-based detection is a long-established method. It compares strings in a scanned object or file (usually at very specific locations) against known malware or virus string patterns stored in a virus signature database. The antivirus vendor populates and updates this database regularly, and some anti-malware vendors release several updates a day to keep up with the rapid creation of new malware. To add a new signature to its database, the vendor needs reports of infections from its users, so there are always some unlucky users who are among the first to be infected.
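The core of signature matching can be sketched in a few lines of Python: each signature is a byte pattern, and a file is flagged if any pattern occurs in it. The signature names and byte patterns below are hypothetical placeholders, not real antivirus signatures, and real scanners also check specific offsets and use much faster multi-pattern matching.

```python
# Minimal sketch of signature-based scanning.
# The signature database is HYPOTHETICAL; real vendor databases hold many
# thousands of patterns and are updated as new malware is reported.
SIGNATURES: dict[str, bytes] = {
    "Example.TrojanA": b"\xde\xad\xbe\xef",
    "Example.WormB": b"EVIL_PAYLOAD",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path: str) -> list[str]:
    """Read a file in binary mode and scan its whole contents."""
    with open(path, "rb") as f:
        return scan_bytes(f.read())


print(scan_bytes(b"header...EVIL_PAYLOAD...footer"))  # ['Example.WormB']
# A brand-new variant has no signature yet, so it is not detected:
print(scan_bytes(b"brand-new malware body"))  # []
```

The second call shows the weakness described above: until a signature for a new malware is added, the scanner reports nothing.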
1.2 Heuristic Scanning (0-Day Detection):
Heuristic scanning is more advanced than signature-based scanning. In heuristic scanning, the antivirus software detects a malware threat by recognizing certain instructions or commands in the scanned object or file. It compares the instructions or commands found in a file against a set of rules describing behavior commonly used by malicious software or viruses, and it triggers an alarm when it finds a match for a rule. Heuristic scanning is better than signature-based scanning, especially at detecting newly created malware or viruses.
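As an illustration of rule-based heuristic scanning, the Python sketch below checks the API calls observed in a program against a small set of rules, each describing a combination of calls associated with a malicious behavior. The rule names and API combinations are hypothetical examples chosen for illustration (the API names themselves are real Windows functions), not rules taken from any antivirus product.

```python
# Minimal sketch of rule-based heuristic scanning.
# Each HYPOTHETICAL rule is a set of Windows API calls that, when all
# present in one program, suggests a malicious behavior.
RULES: dict[str, frozenset[str]] = {
    "process-injection": frozenset({"OpenProcess", "VirtualAllocEx",
                                    "WriteProcessMemory", "CreateRemoteThread"}),
    "keylogging": frozenset({"SetWindowsHookEx", "GetAsyncKeyState"}),
}


def heuristic_scan(api_calls: list[str],
                   rules: dict[str, frozenset[str]] = RULES) -> list[str]:
    """Return the names of rules whose required calls all appear in api_calls."""
    observed = set(api_calls)
    return [name for name, required in rules.items() if required <= observed]


trace = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
         "CreateRemoteThread", "CloseHandle"]
alarms = heuristic_scan(trace)
print(alarms)  # ['process-injection']
```

Unlike the signature check, such a rule fires for any program that shows the behavior, including a variant never seen before, at the cost of possible false alarms on benign programs that use the same calls.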
2. Literature Review:
In recent studies, researchers have focused on IDS performance, real-time IDSs, and improving IDS detection.
Decision-tree-based lightweight intrusion detection using a wrapper approach is discussed in [3]. The objective of that paper is to construct a lightweight intrusion detection system (IDS) aimed at detecting anomalies in networks. Its goals are (i) removing redundant instances, which would otherwise bias the learning algorithm; (ii) identifying a suitable subset of features by employing a wrapper-based feature selection algorithm; and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy.
An efficient intrusion detection system based on support vector machines and a gradual feature removal method is presented in [4]. In this work, 19 critical features are chosen to represent the various network visits. By combining a clustering method, an ant colony algorithm and a support vector machine (SVM), the authors develop an efficient and reliable classifier that judges whether a network visit is normal. The classifier reaches an accuracy of 98.6249% in 10-fold cross-validation and an average Matthews correlation coefficient (MCC) of 0.861161.
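For reference, the Matthews correlation coefficient reported in [4] is computed from the four cells of a binary confusion matrix (true positives TP, true negatives TN, false positives FP, false negatives FN) as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small Python sketch follows; the counts in the usage line are hypothetical and are not the figures from [4].

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is
    zero, a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 4))
```

MCC is often preferred to plain accuracy because it stays informative when the classes are unbalanced.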
A differentiated one-class classification method with applications to intrusion detection is discussed in [2]. The paper proposes a new one-class classification method with differentiated anomalies to improve intrusion detection performance for harmful attacks. The authors also propose new features for host-based intrusion detection, extracted from three viewpoints of system activity: dimension, structure, and contents. Experiments with a simulated dataset and the DARPA 1998 BSM dataset show that their differentiated intrusion detection method performs better than existing techniques in detecting specific types of attacks.
In [6], the authors propose a real-time intrusion detection approach using a supervised machine learning technique. Their approach is simple and efficient, and it can be used with many machine learning techniques. They applied several well-known machine learning techniques to evaluate the performance of their IDS approach, and their experimental results show that the Decision Tree technique outperforms the others. They therefore developed a real-time intrusion detection system (RT-IDS) that uses the Decision Tree technique to classify online network data as normal or attack data.
A Gaussian-distributed wireless sensor network (WSN) cannot effectively detect an intruder that starts from the network boundary [5]. To address this, the paper introduces a novel k-Gaussian deployment strategy that combines the advantages of uniform and Gaussian random sensor deployment for efficient and effective intrusion detection. The key idea is to employ multiple deployment points in the area of interest and to deploy a subset of the sensors around each deployment point following a Gaussian distribution, forming a k-Gaussian distributed WSN.
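The deployment idea can be sketched in Python as follows: the sensors are divided among k deployment points, and each sensor is placed at a Gaussian offset from its point. The even round-robin split, the single shared standard deviation sigma and the clipping to a square area are simplifying assumptions of this sketch, not details taken from [5].

```python
import random


def k_gaussian_deployment(points: list[tuple[float, float]], n_sensors: int,
                          sigma: float, side: float,
                          seed: int = 0) -> list[tuple[float, float]]:
    """Place n_sensors around the given deployment points.

    Sensors are assigned round-robin to the k = len(points) deployment
    points; each sensor's x and y offsets are drawn from a normal
    distribution with standard deviation sigma, then clipped to the
    square area [0, side] x [0, side].
    """
    rng = random.Random(seed)  # seeded generator for reproducible layouts
    k = len(points)
    sensors = []
    for i in range(n_sensors):
        cx, cy = points[i % k]  # round-robin gives an even split of sensors
        x = min(max(rng.gauss(cx, sigma), 0.0), side)
        y = min(max(rng.gauss(cy, sigma), 0.0), side)
        sensors.append((x, y))
    return sensors


# Example: 4 deployment points in a 100 x 100 area, 200 sensors.
layout = k_gaussian_deployment([(25, 25), (25, 75), (75, 25), (75, 75)],
                               n_sensors=200, sigma=10.0, side=100.0, seed=1)
print(len(layout))  # 200
```

With k = 1 this reduces to ordinary Gaussian deployment around a single point; spreading several points over the area gives better coverage near the boundary.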
3. Experiment Setup:
3.1 Oracle Relational Database Management System (RDBMS):
Oracle RDBMS is one of the most widely used and best-known database management systems. It can handle very large numbers of records easily and efficiently. To use Oracle Database, one of its recent versions must be installed. I chose Oracle 10g, and the installation steps are as follows:
First, paste this URL into your browser's address bar:
http://www.oracle.com/technetwork/database/10204-winx64-vista-win2k8-082253.html
Figure 1: Oracle RDBMS Site
After the download is complete, extract the file and carry out the following steps:
1- Enter the name of the database and the password for the "sys" account, and then click Next.
2- If this window reports an error, your PC does not meet the requirements of Oracle RDBMS; otherwise, click Next.
3- Check the summary information and click Next.
4- Oracle RDBMS is now being installed; wait until the installation finishes, and then click Finish.
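After the installation is complete, open SQL*Plus from the Start menu and log in with sys as the user name and its password (the SYS account must connect AS SYSDBA, for example by entering "sys as sysdba" as the user name). Oracle DBMS is now ready for you to import your dataset.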
3.2 Malicious Software Dataset:
Title                      Malicious Software Dataset
Source                     International Conference on Neural Information Processing (ICONIP 2010)
Features (attributes)      Trace: sequence of API calls
Class attribute            Type of the software: 0 for benign, 1 for malware
No. of instances           252
Missing attribute values   No
Table 1: Dataset Summary
This dataset consists of a selection of Windows API/system-call trace files, intended for testing classifiers that deal with sequence patterns. It was downloaded from the website of the CS Mining Group [9].
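Because each instance is a sequence of API calls rather than a fixed set of columns, one common preprocessing step for sequence classifiers is to turn every trace into counts of short runs of consecutive calls (n-grams). The Python sketch below assumes, hypothetically, that a trace is stored as whitespace-separated API call names; the actual file format of the dataset may differ and should be checked against the files from [9].

```python
from collections import Counter


def trace_ngrams(trace: str, n: int = 2) -> Counter:
    """Count the n-grams of consecutive API calls in one trace.

    ASSUMPTION: the trace is a whitespace-separated string of call names.
    Traces shorter than n calls yield an empty Counter.
    """
    calls = trace.split()
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


example = "LoadLibraryA GetProcAddress CreateFileA WriteFile CreateFileA WriteFile"
print(trace_ngrams(example).most_common(1))
# [(('CreateFileA', 'WriteFile'), 2)]
```

The resulting counts give every trace the same kind of numeric features, which most classifiers and anomaly detectors expect.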
3.3 Loading Dataset into the Database using OraLoader:
One of the tools provided with Oracle DBMS is SQL*Loader, which loads data from flat files, such as CSV or other delimited text files (for example, exported from Excel), into database tables. OraLoader wraps SQL*Loader in an easy-to-use interface. The steps are shown in the following screenshots:
Figure 2: OraLoader Login
Figure 3: OraLoader wizard (Open File Window)
Figure 4: OraLoader wizard (Table Options)
Figure 5: OraLoader wizard (Preview)
Click the Close button; the dataset is now loaded.
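The Build wizard asks for a table and a primary key (Figure 10), so it helps to give every instance a unique case identifier before loading. The Python sketch below writes the instances to a CSV file with hypothetical columns CASE_ID, TRACE and CLASS that a loader such as OraLoader or SQL*Loader could then import; the column names and the in-memory input list are assumptions for illustration.

```python
import csv
import io
from typing import TextIO


def write_loader_csv(instances: list[tuple[str, int]], out: TextIO) -> None:
    """Write (trace, label) pairs as CSV rows with a generated case ID.

    Column names are HYPOTHETICAL; label 0 = benign, 1 = malware.
    """
    writer = csv.writer(out)
    writer.writerow(["CASE_ID", "TRACE", "CLASS"])
    for case_id, (trace, label) in enumerate(instances, start=1):
        if label not in (0, 1):
            raise ValueError(f"unexpected class label: {label!r}")
        writer.writerow([case_id, trace, label])


# Two made-up instances written to an in-memory buffer; to write a real
# file, pass open("dataset.csv", "w", newline="") instead.
buf = io.StringIO()
write_loader_csv([("OpenFile ReadFile", 0),
                  ("OpenProcess WriteProcessMemory", 1)], buf)
print(buf.getvalue())
```

One practical caution: in Oracle 10g a VARCHAR2 column holds at most 4000 bytes, so long traces would need a CLOB column.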
3.4 Oracle Data Mining:
Oracle Data Mining (ODM) embeds data mining within the Oracle
database. ODM algorithms operate natively on relational tables or views,
thus eliminating the need to extract and transfer data into standalone
tools or specialized analytic servers. Download the Oracle Data Mining client software from the Oracle website, run the executable file, and create a connection to the database:
Figure 6: Oracle Data Mining Connection
4. Experiment Execution:
After completing all the requirements in the previous section, apply anomaly detection to the dataset using the steps shown in the following set of figures:
Figure 7: Selecting the Dataset Table
Figure 8: Build Activity
Figure 9: Build Wizard (Model Choosing)
Figure 10: Build Wizard (Table and Primary key)
Figure 11: Build Wizard (Data Usage)
After all activities have completed, click Result, as shown in the following figure:
Figure 12: Build Wizard (Model Completion)
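Oracle Data Mining performs the anomaly detection internally (it implements anomaly detection with a one-class Support Vector Machine), so the algorithm itself does not appear in the wizard screens. To illustrate the general idea only, the Python sketch below swaps in a much simpler technique: it learns the set of API-call bigrams seen in benign training traces and scores a new trace by the fraction of its bigrams never seen before; traces scoring above a threshold are flagged as anomalous (predicted malware). The whitespace-separated trace format and the threshold value are assumptions, and this is not the algorithm used by Oracle Data Mining.

```python
def bigrams(trace: str) -> set[tuple[str, str]]:
    """Set of consecutive API-call pairs (ASSUMES whitespace-separated calls)."""
    calls = trace.split()
    return set(zip(calls, calls[1:]))


class BigramNoveltyDetector:
    """Toy anomaly detector: flags traces rich in never-seen call bigrams.

    Illustrative stand-in only, NOT Oracle Data Mining's one-class SVM.
    """

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # hypothetical cut-off; tune on held-out data
        self.known: set[tuple[str, str]] = set()

    def fit(self, benign_traces: list[str]) -> "BigramNoveltyDetector":
        """Learn the 'normal' profile from benign traces only."""
        for trace in benign_traces:
            self.known |= bigrams(trace)
        return self

    def score(self, trace: str) -> float:
        """Fraction of the trace's distinct bigrams not seen in training."""
        grams = bigrams(trace)
        if not grams:
            return 0.0
        return len(grams - self.known) / len(grams)

    def predict(self, trace: str) -> int:
        """1 = anomalous (predicted malware), 0 = normal (predicted benign)."""
        return int(self.score(trace) > self.threshold)


detector = BigramNoveltyDetector().fit([
    "CreateFileA ReadFile CloseHandle",
    "RegOpenKeyExA RegQueryValueExA RegCloseKey",
])
print(detector.predict("CreateFileA ReadFile CloseHandle"))  # 0
print(detector.predict("OpenProcess VirtualAllocEx WriteProcessMemory CreateRemoteThread"))  # 1
```

Training only on benign traces mirrors the one-class setting of anomaly detection: the model learns what normal looks like and flags whatever departs from it, which is what allows a previously unseen (0-Day) malware to be caught.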
Next, apply the model to the dataset to see how accurate it is, as shown in the following figures:
Figure 13: Apply Activity
Figure 14: Apply Activity (Model)
Figure 15: Apply Activity (Attributes Options)
The result of applying the model to the dataset is shown in Figure 16:
Figure 16: Apply Activity (Results)
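Checking the applied model amounts to comparing each predicted class with the known class of the same instance. A minimal Python sketch of that comparison, assuming the true labels and the predictions are available as two equal-length lists of 0/1 values (for example, exported from the apply results), is:

```python
def confusion_counts(actual: list[int], predicted: list[int]) -> dict[str, int]:
    """Count TP, TN, FP, FN with 1 = malware (positive) and 0 = benign."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1  # malware correctly flagged
        elif a == 0 and p == 0:
            counts["tn"] += 1  # benign correctly passed
        elif a == 0 and p == 1:
            counts["fp"] += 1  # benign wrongly flagged
        else:
            counts["fn"] += 1  # malware missed
    return counts


def accuracy(counts: dict[str, int]) -> float:
    """Fraction of all instances, benign and malware, classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


c = confusion_counts(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 0, 1, 1])
print(c, accuracy(c))
```

Keeping the four counts, rather than only the accuracy, makes it possible to report how many malware samples were missed and how many benign programs were wrongly flagged.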
5. Results and Discussion:
After creating the model, we applied it to the dataset. The model is able to detect malware programs without waiting for the antivirus vendor to update its signature database, so malware can be detected immediately. A signature database is updated from user reports, so a new malware always claims at least some victims, whereas an intrusion detection system might detect it within milliseconds.

The accuracy of the model created in this study is 81%, meaning that 81% of the instances in the dataset were classified correctly. Because accuracy counts benign and malware instances together, it does not by itself mean that 81% of 0-Day malware is detected or that the number of victims falls by 81%; that depends on the model's detection rate (recall) on malware, which would need to be measured separately.
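Because accuracy mixes benign and malware instances, the same accuracy can correspond to very different malware detection rates (recall). The Python sketch below uses hypothetical confusion-matrix counts, not results from this study, in which 81% accuracy coincides with only 62% of the malware being detected.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Detection rate: fraction of actual malware that is flagged."""
    return tp / (tp + fn)


# HYPOTHETICAL counts (not from this study): 100 malware, 100 benign samples.
tp, fn, tn, fp = 62, 38, 100, 0
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")  # accuracy = 0.81
print(f"recall   = {recall(tp, fn):.2f}")            # recall   = 0.62
```

Here every benign program is classified correctly, which lifts the accuracy, while 38 of the 100 malware samples would still reach their victims.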
6. Future Work:
I would like to design, or participate in designing, an IDS or a network IDS. Such an IDS would generate a report for each detected malware, so that 0-day malware is detected effectively. A further improvement would be for the IDS to learn over time and provide better detection; this requires an algorithm that keeps learning as it detects malware.
7. References:
[1] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2006. Introduction to Data Mining. Pearson Addison-Wesley.
[2] Inho Kang, Myong K. Jeong, and Dongjoon Kong. 2012. A differentiated one-class classification method with applications to intrusion detection. Expert Syst. Appl. 39, 4 (March 2012), 3899-3905.
[3] Siva S. Sivatha Sindhu, S. Geetha, and A. Kannan. 2012. Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst. Appl. 39, 1 (January 2012), 129-141.
[4] Yinhui Li, Jingbo Xia, Silan Zhang, Jiakai Yan, Xiaochuan Ai, and Kuobin Dai. 2012. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst. Appl. 39, 1 (January 2012), 424-430.
[5] Yun Wang and Zhengdong Lun. 2011. Intrusion detection in a K-Gaussian distributed wireless sensor network. J. Parallel Distrib. Comput. 71, 12 (December 2011), 1598-1607.
[6] Phurivit Sangkatsanee, Naruemon Wattanapongsakorn, and Chalermpol Charnsripinyo. 2011. Practical real-time intrusion detection using machine learning approaches. Comput. Commun. 34, 18 (December 2011), 2227-2235.
[7] Oracle Documentation Library. http://tahiti.oracle.com/pls/db102/homepage [Access Date 5/2/2012]
[8] Association for Computing Machinery. http://www.acm.org/ [Access Date 4/2/2012]
[9] CS Mining Group. http://csmining.org/index.php/ [Access Date 5/2/2012]