
Ensemble Learning for Low-level Hardware-supported Malware Detection

Khaled N. Khasawneh*, Meltem Ozsoy***, Caleb Donovick**, Nael Abu-Ghazaleh*, and Dmitry Ponomarev**

* University of California, Riverside, ** Binghamton University, *** Intel Corp.

RAID 2015 – Kyoto, Japan, November 2015

Malware Growth

McAfee Labs reports over 350M malware programs in its malware zoo, with 387 new threats every minute.

Malware Detection Analysis

Static analysis searches for signatures in the executable. It can detect all known malware programs with no false alarms, but it cannot detect metamorphic malware, polymorphic malware, or targeted attacks.

Dynamic analysis monitors the behavior of the program. It can detect metamorphic malware, polymorphic malware, and targeted attacks, but it adds substantial overhead to the system and has false positives.

TWO-LEVEL MALWARE DETECTION FRAMEWORK


Two-Level Malware Detection

MAP was introduced by Ozsoy et al. (HPCA 2015):
• Explored a number of sub-semantic feature vectors
• A single hardware-supported detector
• Detects malware online (in real time)
• Two-stage detection (a sketch of the two-stage flow follows)
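To make the two-stage flow concrete, here is a minimal sketch; `hardware_score` and `software_scan` are hypothetical stand-ins for the level-1 hardware detector and the level-2 software analysis, and the threshold is an assumed operating point, not a value from the talk.

```python
SUSPICION_THRESHOLD = 0.5  # assumed operating point, not from the paper

def two_level_detect(process_windows, hardware_score, software_scan):
    """Level 1 cheaply scores each execution window in hardware; only
    processes that look suspicious are escalated to the expensive
    level-2 software analysis."""
    flagged = [w for w in process_windows
               if hardware_score(w) > SUSPICION_THRESHOLD]
    if not flagged:
        return "benign"  # level 2 never runs: that is the saved work/time
    return "malware" if software_scan(flagged) else "benign"
```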


Contributions of this work

• Better hardware malware detection using an ensemble of detectors, each specialized for one type of malware
• Metrics to measure the resulting advantages of the two-level malware detection framework

EVALUATION METHODOLOGY: WORKLOADS, FEATURES, PERFORMANCE MEASURES

Data Set & Data Collection

Total Training Testing Cross-Validati

on

Backdoor

815 489 163 163

Rogue 685 411 137 137

PWS 558 335 111 111

Trojan 1123 673 225 225

Worm 473 283 95 95

Regular 554 332 111 111

• Source of programs– Malware

• MalwareDB• 2011-2014• 3,690 total malware programs

– Regular• Windows system binaries• Other applications like Winrar,

Notepad++, Acrobat Reader

• Dynamic trace– Windows 7 virtual machine – Firewall and security services

were all disabled– Pin tool was used to collect

the features during execution

RAID 2015 – Kyoto, Japan, November 2015

Feature Space

Instruction mix
• INS1: frequency of instruction categories
• INS2: frequency of most variant opcodes
• INS3: presence of instruction categories
• INS4: presence of most variant opcodes

Memory reference patterns
• MEM1: histogram (count) of memory address distances
• MEM2: binary (presence) of memory address distances

Architectural events
• ARCH: total number of memory reads, memory writes, unaligned memory accesses, immediate branches, and taken branches
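To make these definitions concrete, here is a minimal sketch of building the INS1 and MEM1 vectors from a dynamic trace; the category set and the distance-bin scheme are illustrative assumptions, not the paper's exact encoding.

```python
from collections import Counter

# Illustrative instruction categories; the paper's exact x86 category
# set is an assumption not reproduced here.
CATEGORIES = ["arith", "logic", "branch", "load", "store", "other"]

def ins1_vector(trace):
    """INS1: frequency of each instruction category in one window.
    `trace` is a list of (category, mem_addr_or_None) tuples."""
    counts = Counter(cat for cat, _ in trace)
    total = max(len(trace), 1)
    return [counts[c] / total for c in CATEGORIES]

def mem1_vector(trace, num_bins=8):
    """MEM1: histogram of distances between consecutive memory
    addresses, bucketed by magnitude (bin choice is an assumption)."""
    addrs = [a for _, a in trace if a is not None]
    hist = [0] * num_bins
    for prev, cur in zip(addrs, addrs[1:]):
        dist = abs(cur - prev)
        hist[min(dist.bit_length(), num_bins - 1)] += 1
    return hist
```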


Detection Performance Measures

• Sensitivity: percentage of malware that was detected (true positive rate)
• Specificity: percentage of correctly classified regular programs (true negative rate)
• Receiver Operating Characteristic (ROC) curve: summarizes prediction performance across a range of detection thresholds
• Area Under the Curve (AUC): the traditional summary metric for a ROC curve
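A small sketch of these measures, computed from labels and detector scores with scikit-learn (assumed available):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def detection_measures(y_true, scores, threshold=0.5):
    """Sensitivity and specificity at one threshold, plus AUC over
    all thresholds. y_true: 1 = malware, 0 = regular program."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(scores) >= threshold
    sensitivity = (y_pred & (y_true == 1)).sum() / (y_true == 1).sum()
    specificity = (~y_pred & (y_true == 0)).sum() / (y_true == 0).sum()
    fpr, tpr, _ = roc_curve(y_true, scores)   # points on the ROC curve
    auc = roc_auc_score(y_true, scores)       # area under that curve
    return sensitivity, specificity, auc
```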

SPECIALIZING THE DETECTORS FOR DIFFERENT MALWARE TYPES

Constructing Specialized Detectors

Specialized detectors for each malware type were trained only with the data of that type, using supervised learning with logistic regression.
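A minimal training sketch with scikit-learn's logistic regression, assuming the regular (benign) programs serve as the negative class for every type-specific detector:

```python
from sklearn.linear_model import LogisticRegression

def train_specialized_detectors(malware_X_by_type, regular_X):
    """One logistic-regression detector per malware type, trained only
    on that type's feature vectors plus the regular programs.
    malware_X_by_type: e.g. {"Backdoor": rows, "Worm": rows, ...}."""
    detectors = {}
    for mal_type, mal_X in malware_X_by_type.items():
        X = list(mal_X) + list(regular_X)
        y = [1] * len(mal_X) + [0] * len(regular_X)
        detectors[mal_type] = LogisticRegression(max_iter=1000).fit(X, y)
    return detectors
```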

[Figure: MEM1 detectors]


General vs. Specialized Detectors (AUC)

Feature   Detector      Backdoor   PWS     Rogue   Trojan   Worm
INS1      General       0.713      0.909   0.949   0.715    0.705
          Specialized   0.715      0.892   0.962   0.727    0.819
INS2      General       0.905      0.946   0.993   0.768    0.810
          Specialized   0.895      0.954   0.976   0.782    0.984
INS3      General       0.837      0.909   0.924   0.527    0.761
          Specialized   0.840      0.888   0.991   0.808    0.852
INS4      General       0.866      0.868   0.914   0.788    0.830
          Specialized   0.891      0.941   0.993   0.798    0.869
MEM1      General       0.729      0.893   0.424   0.650    0.868
          Specialized   0.868      0.961   0.921   0.867    0.871
MEM2      General       0.833      0.947   0.761   0.866    0.903
          Specialized   0.843      0.979   0.931   0.868    0.871
ARCH      General       0.702      0.919   0.965   0.763    0.602
          Specialized   0.686      0.942   0.970   0.795    0.560


Is There an Opportunity?

Type       Best General (INS4)   Best Specialized per Type   Difference
Backdoor   0.8662                0.8956                      0.0294
PWS        0.8684                0.9795                      0.1111
Rogue      0.9149                0.9937                      0.0788
Trojan     0.7887                0.8676                      0.0789
Worm       0.8305                0.9842                      0.1537
Average    0.8537                0.9441                      0.0904

ENSEMBLE DETECTORS


Ensemble Learning

Multiple diverse base detectors, each built with a different learning algorithm or a different data set, are combined to solve a problem.

Decision Functions

• Or'ing
• High-confidence or'ing
• Majority voting
• Stacking (a sketch of all four follows this list)
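A hedged sketch of the four decision functions, assuming each base detector outputs a malware probability; the thresholds and the choice of meta-learner for stacking are assumptions, not the paper's tuned configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def oring(probs, t=0.5):
    """Or'ing: flag if ANY base detector flags (high sensitivity,
    low specificity, as the results tables below show)."""
    return any(p >= t for p in probs)

def high_confidence_oring(probs, t_high=0.9):
    """High-confidence or'ing: flag only when some detector is very
    confident (t_high is an assumed confidence threshold)."""
    return any(p >= t_high for p in probs)

def majority_voting(probs, t=0.5):
    """Majority voting: flag if more than half the detectors flag."""
    return sum(p >= t for p in probs) > len(probs) / 2

def train_stacker(base_probs, y):
    """Stacking: a meta-classifier learns how to combine the base
    detectors' outputs; base_probs has one row of base-detector
    scores per training sample."""
    return LogisticRegression().fit(np.asarray(base_probs), y)
```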


Ensemble Detectors

• General Ensemble: combines multiple general detectors (the best of INS, MEM, and ARCH)
• Specialized Ensemble: combines the best specialized detector for each malware type
• Mixed Ensemble: combines the best general detector with the best specialized detectors from the same feature vector

Offline Detection Effectiveness

Ensemble               Decision Function   Sensitivity   Specificity   Accuracy
Best General           -                   82.4%         89.3%         85.1%
General Ensemble       Or'ing              99.1%         13.3%         65.0%
                       High Confidence     80.7%         92.0%         85.1%
                       Majority Voting     83.3%         92.1%         86.7%
                       Stacking            80.7%         96.0%         86.8%
Specialized Ensemble   Or'ing              100%          5%            51.3%
                       High Confidence     94.4%         94.7%         94.5%
                       Stacking            95.8%         96.0%         95.9%
Mixed Ensemble         Or'ing              84.2%         70.6%         78.8%
                       High Confidence     83.3%         81.3%         82.5%
                       Stacking            80.7%         96.0%         86.7%


Online Detection Effectiveness

A decision is made after every 10,000 committed instructions, and an Exponentially Weighted Moving Average (EWMA) filters out false alarms (a sketch of the filter follows the table).

Ensemble                          Sensitivity   Specificity   Accuracy
Best General                      84.2%         86.6%         85.1%
General Ensemble (Stacking)       77.1%         94.6%         84.1%
Specialized Ensemble (Stacking)   92.9%         92.0%         92.3%
Mixed Ensemble (Stacking)         85.5%         90.1%         87.4%
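A minimal sketch of the EWMA filtering step; the smoothing factor and alarm threshold here are assumptions, not the paper's tuned values:

```python
def ewma_alarm(window_decisions, alpha=0.2, alarm_threshold=0.7):
    """Smooth the stream of per-10,000-instruction decisions
    (1 = malware, 0 = clean) so isolated false positives do not
    raise an alarm; alert once suspicion stays high."""
    ewma = 0.0
    for d in window_decisions:
        ewma = alpha * d + (1 - alpha) * ewma  # exponential smoothing
        if ewma >= alarm_threshold:
            return True  # sustained positive decisions -> alarm
    return False
```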


METRICS TO ASSESS RELATIVE PERFORMANCE OF TWO-LEVEL DETECTION FRAMEWORK

Metrics

• Work advantage
• Time advantage
• Detection performance


Time & Work Advantage Results

Time Advantage Work Advantage

RAID 2015 – Kyoto, Japan, November 2015

Hardware Implementation

Physical design overhead:
• Area: 2.8% (Ensemble), 0.3% (General)
• Power: 1.5% (Ensemble), 0.1% (General)
• Cycle time: 9.8% (Ensemble), 1.9% (General)

Conclusions & Future WorkEnsemble learning with specialized detectors can significantly improve detection performance

Hardware complexity increases, but several optimizations still possible

Some features are complex to collect; simpler features may carry same information

Future work:Demonstrate a fully functional systemStudy how attackers could evolve and adversarial machine learning

Thank you!


Questions?