Khaled N. Khasawneh*, Meltem Ozsoy***, Caleb Donovick**, Nael Abu-Ghazaleh*, and Dmitry Ponomarev**

Ensemble Learning for Low-level Hardware-supported Malware Detection

* University of California, Riverside, ** Binghamton University, *** Intel Corp.

RAID 2015 – Kyoto, Japan, November 2015

Malware Growth

McAfee Labs
• Over 350M malware programs in their malware zoo
• 387 new threats every minute


Malware Detection Analysis

Static analysis
• Searches for signatures in the executable
• Can detect all known malware programs with no false alarms
• Can't detect metamorphic malware, polymorphic malware, or targeted attacks


Dynamic analysis
• Monitors the behavior of the program
• Can detect metamorphic malware, polymorphic malware, and targeted attacks
• Adds substantial overhead to the system and produces false positives


TWO-LEVEL MALWARE DETECTION FRAMEWORK


Two-Level Malware Detection

• MAP was introduced by Ozsoy et al. (HPCA 2015)
• Explored a number of sub-semantic feature vectors
• Single hardware-supported detector
• Detects malware online (in real time)
• Two-stage detection



Contributions of this work

• Better hardware malware detection using an ensemble of detectors, each specialized for one type of malware

• Metrics to measure the resulting advantages of using the two-level malware detection framework

EVALUATION METHODOLOGY: WORKLOADS, FEATURES, PERFORMANCE MEASURES


Data Set & Data Collection

            Total   Training   Testing   Cross-Validation
Backdoor      815        489       163                163
Rogue         685        411       137                137
PWS           558        335       111                111
Trojan       1123        673       225                225
Worm          473        283        95                 95
Regular       554        332       111                111

• Source of programs
  – Malware
    • MalwareDB
    • 2011-2014
    • 3,690 total malware programs
  – Regular
    • Windows system binaries
    • Other applications like WinRAR, Notepad++, Acrobat Reader
• Dynamic trace
  – Windows 7 virtual machine
  – Firewall and security services were all disabled
  – Pin tool was used to collect the features during execution


Feature Space

Instruction mix
• INS1: frequency of instruction categories
• INS2: frequency of most variant opcodes
• INS3: presence of instruction categories
• INS4: presence of most variant opcodes

Memory reference patterns
• MEM1: histogram (count) of memory address distances
• MEM2: binary (presence) of memory address distances

Architectural events
• ARCH: total number of memory reads, memory writes, unaligned memory accesses, immediate branches, and taken branches
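
The difference between the frequency, presence, and histogram features can be illustrated with a minimal sketch. The category names, the bin edges, and the choice to measure distances between consecutive memory accesses are all assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter

# Hypothetical instruction categories; the slides do not list the real ones.
CATEGORIES = ["arith", "branch", "load", "store", "other"]

def ins_frequency(trace, categories=CATEGORIES):
    """INS1-style vector: fraction of executed instructions per category."""
    counts = Counter(trace)
    total = len(trace) or 1  # avoid division by zero on an empty window
    return [counts[c] / total for c in categories]

def ins_presence(trace, categories=CATEGORIES):
    """INS3-style vector: 1 if the category appears in the window, else 0."""
    seen = set(trace)
    return [1 if c in seen else 0 for c in categories]

def mem_distance_histogram(addresses, bin_edges=(16, 256, 4096)):
    """MEM1-style vector: counts of distances between consecutive accesses.

    bin_edges are illustrative upper bounds; the last bin catches the rest.
    """
    hist = [0] * (len(bin_edges) + 1)
    for prev, cur in zip(addresses, addresses[1:]):
        dist = abs(cur - prev)
        for i, edge in enumerate(bin_edges):
            if dist < edge:
                hist[i] += 1
                break
        else:
            hist[-1] += 1  # distance is at least the largest edge
    return hist

def mem_distance_presence(addresses, bin_edges=(16, 256, 4096)):
    """MEM2-style vector: binary (presence) version of the histogram."""
    return [1 if n > 0 else 0 for n in mem_distance_histogram(addresses, bin_edges)]
```

The presence variants discard counts, so they are cheaper to collect but carry less information than the frequency and histogram variants.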



Detection Performance Measures

• Sensitivity: percent of malware that was detected (true positive rate)

• Specificity: percent of correctly classified regular programs (true negative rate)

• Receiver Operating Characteristic (ROC) curve: summarizes the prediction performance over a range of detection thresholds

• Area Under the Curve (AUC): traditional performance metric for the ROC curve
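
These measures can be computed from detector scores as in the following sketch. It assumes each program gets a score in which higher means more likely malware, and that labels are 1 for malware and 0 for regular programs; the function names are illustrative.

```python
def sensitivity_specificity(labels, scores, threshold):
    """labels: 1 = malware, 0 = regular; a score >= threshold flags malware."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(labels, scores):
    """(false positive rate, true positive rate) for every distinct threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, t)
        points.append((1.0 - spec, sens))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Lowering the threshold raises sensitivity and lowers specificity; the AUC summarizes that trade-off in a single number between 0 and 1.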

SPECIALIZING THE DETECTORS FOR DIFFERENT MALWARE TYPES


Constructing Specialized Detectors

• Specialized detectors for each malware type were trained only with the data of that type
• Supervised learning with logistic regression was used

MEM1 Detectors
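
Training a specialized detector can be sketched as follows. This is a plain gradient-descent logistic regression written for illustration, not the authors' training code. It assumes that "data of that type" means samples of one malware type plus regular programs as the negative class; the learning rate, epoch count, and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n_features = len(samples[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b

def predict(model, x):
    """Probability-like score that x is malware of the detector's type."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_specialized(dataset, malware_type):
    """dataset: list of (feature_vector, type), type is 'regular' or a malware type.

    Assumption: only the requested malware type and regular programs are used;
    samples of every other malware type are dropped.
    """
    subset = [(x, t) for x, t in dataset if t in (malware_type, "regular")]
    xs = [x for x, _ in subset]
    ys = [1 if t == malware_type else 0 for _, t in subset]
    return train_logistic(xs, ys)
```

A general detector would use the same `train_logistic` routine but label every malware sample as 1, regardless of type.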


General vs. Specialized Detectors

Feature   Detector      Backdoor   PWS     Rogue   Trojan   Worm
INS1      General       0.713      0.909   0.949   0.715    0.705
          Specialized   0.715      0.892   0.962   0.727    0.819
INS2      General       0.905      0.946   0.993   0.768    0.810
          Specialized   0.895      0.954   0.976   0.782    0.984
INS3      General       0.837      0.909   0.924   0.527    0.761
          Specialized   0.840      0.888   0.991   0.808    0.852
INS4      General       0.866      0.868   0.914   0.788    0.830
          Specialized   0.891      0.941   0.993   0.798    0.869
MEM1      General       0.729      0.893   0.424   0.650    0.868
          Specialized   0.868      0.961   0.921   0.867    0.871
MEM2      General       0.833      0.947   0.761   0.866    0.903
          Specialized   0.843      0.979   0.931   0.868    0.871
ARCH      General       0.702      0.919   0.965   0.763    0.602
          Specialized   0.686      0.942   0.970   0.795    0.560


Is There an Opportunity?

           General   Specialized   Difference
Backdoor   0.8662    0.8956        0.0294
PWS        0.8684    0.9795        0.1111
Rogue      0.9149    0.9937        0.0788
Trojan     0.7887    0.8676        0.0789
Worm       0.8305    0.9842        0.1537
Average    0.8537    0.9441        0.0904

General: best general detector (INS4). Specialized: best specialized detector per type.
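
The Difference column and the Average row follow directly from the per-type values on this slide, as this short check shows. The dictionary names are illustrative; the numbers are copied from the slide.

```python
# Values of the best general detector (INS4) and the best specialized
# detector per malware type, as listed on the slide.
general = {"Backdoor": 0.8662, "PWS": 0.8684, "Rogue": 0.9149,
           "Trojan": 0.7887, "Worm": 0.8305}
specialized = {"Backdoor": 0.8956, "PWS": 0.9795, "Rogue": 0.9937,
               "Trojan": 0.8676, "Worm": 0.9842}

# Per-type gain from specialization.
difference = {k: round(specialized[k] - general[k], 4) for k in general}

# Column averages over the five malware types.
avg_general = round(sum(general.values()) / len(general), 4)
avg_specialized = round(sum(specialized.values()) / len(specialized), 4)
avg_difference = round(sum(difference.values()) / len(difference), 4)

for name in general:
    print(f"{name:9s} {general[name]:.4f} {specialized[name]:.4f} {difference[name]:.4f}")
print(f"{'Average':9s} {avg_general:.4f} {avg_specialized:.4f} {avg_difference:.4f}")
```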


ENSEMBLE DETECTORS


Ensemble Learning

• Multiple diverse base detectors
  – Different learning algorithm
  – Different data set
• Combined to solve a problem


Decision Functions

• Or’ing
• High Confidence Or’ing
• Majority voting
• Stacking
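
The slides name the four decision functions without defining them, so the sketch below is one plausible reading rather than the authors' definitions. It assumes each base detector outputs a score in [0, 1], that high-confidence Or'ing is Or'ing with a stricter threshold, and that stacking is a second-level logistic model over the base scores. All thresholds, weights, and function names are hypothetical.

```python
import math

def oring(scores, threshold=0.5):
    """Flag malware if any base detector flags it."""
    return any(s >= threshold for s in scores)

def high_confidence_oring(scores, high_threshold=0.9):
    """Assumed reading: Or'ing, but a detector only counts when it is
    highly confident (its score clears a stricter threshold)."""
    return any(s >= high_threshold for s in scores)

def majority_voting(scores, threshold=0.5):
    """Flag malware if more than half of the base detectors flag it."""
    votes = sum(1 for s in scores if s >= threshold)
    return votes > len(scores) / 2

def stacking(scores, weights, bias):
    """Second-level logistic model over the base detector scores.

    In practice weights and bias would be learned from training data;
    here they are passed in as illustrative constants.
    """
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Usage: one weakly suspicious detector out of three.
scores = [0.7, 0.2, 0.3]
print(oring(scores), high_confidence_oring(scores), majority_voting(scores))
```

Plain Or'ing trades specificity for sensitivity, because a single false alarm from any base detector is enough to flag the program.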


Ensemble Detectors

• General Ensemble
  – Combines multiple general detectors
  – Best of INS, MEM, ARCH
• Specialized Ensemble
  – Combines the best specialized detector for each malware type
• Mixed Ensemble
  – Combines the best general detector with the best specialized detectors from the same feature vector


Offline Detection Effectiveness

Detector               Decision Function   Sensitivity   Specificity   Accuracy
Best General           -                   82.4%         89.3%         85.1%
General Ensemble       Or’ing              99.1%         13.3%         65.0%
                       High Confidence     80.7%         92.0%         85.1%
                       Majority Voting     83.3%         92.1%         86.7%
                       Stacking            80.7%         96.0%         86.8%
Specialized Ensemble   Or’ing              100%          5%            51.3%
                       Stacking            95.8%         96.0%         95.9%
Mixed Ensemble         Or’ing              84.2%         70.6%         78.8%
                       Stacking            80.7%         96.0%         86.7%


Online Detection Effectiveness

• A decision is made after every 10,000 committed instructions
• An Exponentially Weighted Moving Average (EWMA) filters false alarms

Detector                          Sensitivity   Specificity   Accuracy
Best General                      84.2%         86.6%         85.1%
General Ensemble (Stacking)       77.1%         94.6%         84.1%
Specialized Ensemble (Stacking)   92.9%         92.0%         92.3%
Mixed Ensemble (Stacking)         85.5%         90.1%         87.4%
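
The EWMA filtering of per-window decisions can be sketched as below. The smoothing factor `alpha` and the alarm `threshold` are assumptions chosen for illustration; the slides do not give the values used.

```python
def ewma_filter(window_decisions, alpha=0.2, threshold=0.5):
    """Smooth per-window detector outputs and report the first alarm.

    window_decisions: 0/1 outputs, one per 10,000-instruction window.
    Returns the index of the first window where the smoothed value
    reaches the threshold, or None if it never does.
    """
    avg = 0.0
    for i, decision in enumerate(window_decisions):
        # Standard EWMA update: the new sample gets weight alpha.
        avg = alpha * decision + (1 - alpha) * avg
        if avg >= threshold:
            return i
    return None

# An isolated positive window is filtered out as a false alarm.
print(ewma_filter([0, 0, 1, 0, 0, 0]))
# A sustained run of positive windows raises an alarm after a few windows.
print(ewma_filter([1, 1, 1, 1, 1, 1]))
```

A smaller `alpha` suppresses more false alarms but delays detection, since more consecutive positive windows are needed before the average crosses the threshold.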


METRICS TO ASSESS RELATIVE PERFORMANCE OF TWO-LEVEL DETECTION FRAMEWORK


Metrics

Work Advantage

Time Advantage

Detection Performance


Time & Work Advantage Results

[Figure: two panels, Time Advantage and Work Advantage]


Hardware Implementation

Physical design overhead
• Area: 2.8% (Ensemble), 0.3% (General)
• Power: 1.5% (Ensemble), 0.1% (General)
• Cycle time: 9.8% (Ensemble), 1.9% (General)



Conclusions & Future Work

• Ensemble learning with specialized detectors can significantly improve detection performance
• Hardware complexity increases, but several optimizations are still possible
• Some features are complex to collect; simpler features may carry the same information
• Future work:
  – Demonstrate a fully functional system
  – Study how attackers could evolve and adversarial machine learning

Thank you!


Questions?
