Improving accuracy of malware detection by filtering evaluation dataset based on its similarity


Description

In recent years, it has become more difficult to detect malware with traditional methods such as pattern matching because malware has grown more sophisticated. Machine-learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differ from the training dataset. This study provides a method to improve detection accuracy by filtering the evaluation dataset based on its similarity to the training dataset.

Transcript of Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

FFRI, Inc. (Fourteenforty Research Institute, Inc.)
http://www.ffri.jp

Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Junichi Murakami, Director of Advanced Development, FFRI, Inc.

Preface

• These slides were used for a presentation at CSS2013
  – http://www.iwsec.org/css/2013/english/index.html
• Please refer to the original paper for the detailed data
  – http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf (written in Japanese, but the figures are the same)
• Contact information
  – research-feedback@ffri.jp
  – @FFRI_Research (Twitter)

Agenda

• Background
• Problem
• Scope and purpose
• Experiment 1
• Experiment 2
• Experiment 3
• Consideration
• Conclusion

Background – malware and its detection

[Diagram: targeted attacks (unknown malware), malware generators, and obfuscators drive the increase in malware; signature matching reaches its limits, leading to other methods: heuristics, cloud reputation, and machine learning over big data]

Background – Related works

• Mainly focusing on combinations of the factors below
  – Feature selection and modification, parameter settings
• Some good results are reported (TPR: 90%+, FPR: below 1%)

  Features:   static information / dynamic information / hybrid
  Algorithms: SVM / Naive Bayes / Perceptron, etc.
  Evaluation: TPR/FPR, ROC curve, accuracy, precision, etc.

Problem

• General theory of machine learning:
  – Classification accuracy declines if the trends of the training and testing data differ
• Does the same hold for malware and benign files?

Scope and purpose

① Investigating the difference between the similarity distributions of malware and benign files (Experiment-1)
② Investigating how that difference affects classification accuracy (Experiment-2)
③ Based on the results above, confirming the effect of removing data whose similarity to the training data is low (Experiment-3)

Experiment-1 (1/3)

• Used the FFRI Dataset 2013 and benign files we collected as datasets
• Calculated the pairwise similarity of the malware and of the benign files (Jubatus, MinHash)
• Feature vector: counts of 4-grams of sequential API calls
  – e.g. NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times
  – e.g. NtSetInformationFile_NtClose_NtClose_NtOpenMutext: m times

Pairwise similarity matrix (computed within the malware set and within the benign set):

        A     B     C    ...
  A     –    0.8   0.52  ...
  B     –     –    1.0   ...
  C     –     –     –    ...
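To make the similarity computation concrete, here is a minimal sketch of the pipeline: extract 4-grams from a sequence of API calls, then estimate Jaccard similarity with MinHash. It treats the 4-grams as a set and builds hash functions from seeded MD5; both are illustrative assumptions, not the Jubatus implementation used in the experiments.

import hashlib

def api_4grams(trace):
    # 4-grams of sequential API calls, e.g.
    # "NtCreateFile_NtWriteFile_NtWriteFile_NtClose"
    return {"_".join(trace[i:i + 4]) for i in range(len(trace) - 3)}

def minhash_signature(grams, num_hashes=128):
    # For each seeded hash function, keep the minimum hash value over
    # all 4-grams; matching slots estimate Jaccard similarity.
    def h(seed, gram):
        return int.from_bytes(
            hashlib.md5(f"{seed}:{gram}".encode()).digest()[:8], "big")
    return [min(h(seed, g) for g in grams) for seed in range(num_hashes)]

def similarity(sig_a, sig_b):
    # Fraction of matching signature slots.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical traces for illustration only.
trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtClose"]
trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose",
           "NtSetInformationFile"]
print(similarity(minhash_signature(api_4grams(trace_a)),
                 minhash_signature(api_4grams(trace_b))))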

Experiment-1 (2/3)

Grouping malware and benign files based on their similarities

[Figure: malware and benign files are grouped by sweeping a similarity threshold from 0.0 to 1.0]
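The grouping can be sketched as follows: at a given threshold, call a sample "unique" if no other sample in the same set reaches that similarity, and "not unique" otherwise. The pairwise_sim nested mapping is an assumed data structure for illustration.

def split_by_uniqueness(names, pairwise_sim, threshold):
    # A sample is "unique" if its best similarity to any other sample
    # in the same set stays below the threshold.
    unique, not_unique = [], []
    for a in names:
        best = max((pairwise_sim[a][b] for b in names if b != a),
                   default=0.0)
        (unique if best < threshold else not_unique).append(a)
    return unique, not_unique

Sweeping the threshold over 0.8, 0.85, 0.9, 0.95, and 1.0 yields the proportions shown on the next slide.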

Experiment-1 (3/3)

[Chart: for similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1.0, the proportion (0%–100%) of samples with a similar peer ("not unique") vs. without one ("unique"), shown separately for benign files and malware]

It is more difficult to find similar benign files than similar malware

Experiment-2 (1/3)

• How much does the difference affect the results?
• 50% of the malware/benign files are assigned to a training dataset, the rest to a testing dataset (Jubatus, AROW)

[Figure: Jubatus is trained on the training dataset and then classifies the testing dataset, yielding TPR and FPR]

TPR: True Positive Rate, FPR: False Positive Rate
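A minimal sketch of the measurement, assuming a trained binary classifier with a predict() method that returns True for malware (a stand-in interface; the experiments used Jubatus with the AROW algorithm):

def evaluate(classifier, testing):
    # testing: iterable of (features, is_malware) pairs.
    # Returns (TPR, FPR); assumes both classes are present.
    tp = fn = fp = tn = 0
    for features, is_malware in testing:
        predicted = classifier.predict(features)
        if is_malware:
            tp, fn = tp + predicted, fn + (not predicted)
        else:
            fp, tn = fp + predicted, tn + (not predicted)
    return tp / (tp + fn), fp / (fp + tn)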


Experiment-2 (3/3)

Accuracy declines if the trends of the training and testing data differ

             not unique    unique     change
  TPR (%)      97.996      81.297    −16.699
  FPR (%)       0.624       4.49      +3.866

Experiment-3 (1/6) – After training

[Figure: feature space after training: benign(train) and malware(train) clusters separated by the learned dividing line]

Experiment-3 (2/6) – After classification

[Figure: the testing data, benign(test) and malware(test), plotted against the dividing line; test points on the wrong side become FP (benign classified as malware) or FN (malware classified as benign)]

Experiment-3 (3/6) – Low similarity data

[Figure: test data with low similarity to the training data fall far from the trained clusters; most become FN, and a few land on the correct side only by chance (accidental TP)]
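Experiment-3 removes such low-similarity data from the testing dataset before classification. A minimal sketch of that filtering step, reusing the similarity() helper from the Experiment-1 sketch (the signature representation and the names here are assumptions):

def filter_testing_data(testing, training_signatures, threshold):
    # Keep only test samples whose best similarity to any training
    # sample reaches the threshold; the rest stay unclassified.
    kept, filtered_out = [], []
    for sample, signature in testing:
        best = max(similarity(signature, t) for t in training_signatures)
        (kept if best >= threshold else filtered_out).append(sample)
    return kept, filtered_out

Raising the threshold trades coverage for accuracy: fewer samples get classified, but TPR and FPR on the remaining samples improve, which is what the next three slides measure.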

Experiment-3 (4/6) – Effect on TPR

[Chart: the number of classified data (TP and FN counts, left axis 0–1400) and TPR (right axis 0.88–1.00) vs. the similarity threshold (0, then 0.6–1.0)]

Experiment-3 (5/6) – Effect on FPR

[Chart: the number of classified data (TN and FP counts, left axis 0–2500) and FPR (right axis 0.000–0.014) vs. the similarity threshold (0, then 0.6–1.0)]

Experiment-3 (6/6)

Transition of the number of classified data

[Chart: classified data as a share of all testing data (0%–120%) vs. the similarity threshold (0, then 0.6–1.0), for malware and for benign software]

Consideration (1/3)

• In a real scenario:
  – we try to classify whether an unknown file/process is benign or not
• If we apply Experiment-3:
  – files are classified only if similar data has already been trained on
  – otherwise files are not classified, which results in
    • FN if the file is malware
    • TN if the file is benign (all right as a result)
• Therefore the problem is the "TPR for unique malware" (unique malware is likely to go undetected)

Consideration (2/3)

• If malware continues to have many variants, as it does now
  – ML-based detection works well
• Having many variants ∝ use of malware generators/obfuscators
• We have to investigate
  – trends in the usage of the tools above
  – the possibility of countermeasures against machine-learning-based detection

Consideration (3/3)

• How to deal with unclassified (filtered-out) data:
  1. Use other feature vectors
  2. Enlarge the training dataset (unique → not unique)
  3. Use other methods besides ML

Conclusion

• The similarity distributions of malware and benign files differ (Experiment-1)
• Accuracy declines if the trends of the training and testing data differ (Experiment-2)
• The TPR for unique malware declines when we remove low-similarity data (Experiment-3)
• Continual investigation of trends in malware and its related tools is required
• (It might be necessary to develop technology to determine benign files)