Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic...

30
Machine Learning-Based Detection of Ransomware Using SDN Greg Cusack*, Oliver Michel, Eric Keller 3/21/18 * Presenter

Transcript of Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic...

Page 1: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Machine Learning-Based Detection of Ransomware Using SDN

Greg Cusack*, Oliver Michel, Eric Keller3/21/18

* Presenter

Page 2: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

● Ransomware Overview● Previous work● Programmable Forwarding Engines (PFEs)● Method● Classification● Results● Current Progress

Overview

Page 3: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Ransomware

● Malicious software holds a victim’s files at ransom

● Files held until ransom paid● Two main types:

○ Locker○ Crypto

● Difficult to develop long term solutions

● IoT boom -> More avenues for infection

● Ransomware as a Service (RaaS)

WannaCry Ransomware

Source: https://www.pbs.org/newshour/science/everything-

need-know-wannacrypt-ransomware-attack

Page 4: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Ransomware Data Flow

WannaCry Ransomware

Page 5: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Previous Work

WannaCry Ransomware

● EldeRan: Machine learning approach for ransomware classification○ Track Windows API calls, file system operations, registry key operations, etc.

● Software-defined networking-based detection of crypto ransomware○ Fingerprint HTTP traffic

● Most packet trace approaches are payload-based

Source: K. Cabaj, M. Gregorczyk, and W. Mazurczyk. Software-defined networking-based crypto

ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016

Page 6: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Programmable Forwarding Engines (PFEs)

WannaCry Ransomware

● High-rate, programmable, network switches● Supports the scalable generation of rich flow records● Can process network data at high-rates of speed and

extract vital, per-packet flow information● Provides data and speed necessary for network,

flow-based, traffic analysis and fingerprinting

Page 7: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Compact, Per Packet Flow Records

WannaCry Ransomware

● Provides richness and scalability for large networked systems

● Tailored to fit a user's specific application

PFE Flow Record Overview and Features

Page 8: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Method

WannaCry Ransomware

● Goal: Utilize machine learning and leverage the recent trend in switch hardware to identify ransomware via its network traffic signature

● Collect ransomware PCAP samples (>100MB)● Collect clean traffic as baseline

○ Web browsing, streaming, file downloading, etc.

● Stream processor development● Classification

Source: http://www.malware-traffic-analysis.net

Page 9: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Flow Records and Stream Processing

WannaCry Ransomware

Page 10: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Flow-Based Features

WannaCry Ransomware

● Flow Duration● Interarrival Times

○ Minimum○ Mean○ Maximum ○ Standard deviation

● Packet Lengths○ Minimum○ Mean○ Maximum ○ Standard deviation

● Burst Lengths○ Minimum○ Mean○ Maximum

● Total number of packets● Ratios:

○ Packets out/packet in○ Bytes out/bytes in

● # of unique packet lengths

Page 11: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Random Forest

WannaCry Ransomware

● Ensemble Algorithm○ Divide and conquer approach

● Collection of decision trees○ Avoids overfitting

● Random subsets of features used to build smaller, shallower trees

● Majority voting from decision trees to decide class

● Bagging used to improve stability, reduce variance, and increase accuracy

Source: https://www.youtube.com/watch?v=D_2LkhMJcfY

Page 12: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

WannaCry Ransomware

Results

Page 13: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

WannaCry Ransomware

Confusion Matrix (28 Features)

● Accuracy○ Correct / total = 0.8689

● Recall ○ tp / (tp + fn) = 0.8925

● Precision○ tp / (tp + fp) = 0.8384

● F1 Score○ 2 * (R * P) / (R + P) = 0.8689

Page 14: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

WannaCry Ransomware

ROC Curve (28 Features)

● Area Under Curve (AUC)○ 0.93502

● 10-Fold Cross Validation Score○ 0.87301

● Decision Trees: 40● Max Tree Depth: 15

Page 15: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Features

Inflow #

of Bytes

Outflow to inflow

packet ratioInflow σLength

Inflow Mean

Burst Length

Outflow #

of Bytes

Outflow Minimum

Interarrival Time

Outflow Flow

Duration

Outflow σLength

Feature Importances (28 Features)

Page 16: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Confusion Matrix (8 Features)

WannaCry Ransomware

● Accuracy○ Correct / total = 0.8738

● Recall ○ tp / (tp + fn) = 0.8710

● Precision○ tp / (tp + fp) = 0.8617

● F1 Score○ 2 * (R * P) / (R + P) = 0.8738

Page 17: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

ROC Curve (8 Features)

WannaCry Ransomware

● Area Under Curve (AUC)○ 0.91951

● 10-Fold Cross Validation Score○ 0.86827

● Decision Trees: 40● Max Tree Depth: 15

Page 18: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Inflow σ Length

Outflow Minimum

Interarrival Time

Outflow to

inflow packet

ratio

Features

Feature Importances (8 Features)

Page 19: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Can we identify a specific type of ransomware?

Crypto-Based Cerber Ransomware Detection

Page 20: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Features

Inflow Mean

Burst Length

Inflow Max

Burst Length

Outflow to inflow

packet ratio

Cerber Feature Importances (8 Features)

Page 21: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Confusion Matrix Cerber Ransomware

WannaCry Ransomware

● Accuracy○ Correct / total = 0.9444

● Recall ○ tp / (tp + fn) = 1.000

● Precision○ tp / (tp + fp) = 0.9091

● F1 Score○ 2 * (R * P) / (R + P) = 0.9439

Page 22: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

ROC Curve Cerber Ransomware

WannaCry Ransomware

● Area Under Curve (AUC)○ 0.98750

● 10-Fold Cross Validation Score○ 0.90500

● Decision Trees: 40● Max Tree Depth: 15

Page 23: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Takeaways

● Initial findings are promising but require further research

● Packet lengths, interarrival times, and flow ratios leave ransomware susceptible to identification

● Recent emergence of PFEs provide the right backbone for flow-based feature extraction

Page 24: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Work in Progress

● Sandboxing ransomware samples to collect network traffic

● Implementing stream processor on a PFE ASIC

● Developing LSTM Recurrent Neural Network

● System architecture redesign

Page 25: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Questions?

Page 26: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Ransomware Detection System Overview

Page 27: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Hardware-generated, Bidirectional Microflows

● Generated in switch ASIC● Turn current *Flow flow table into bidirectional flow table● Evict bidirectional microflows to CPU for feature extraction● PFE data flow overview:

Page 28: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Flow Feature Structure

Page 29: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

SDN Controller/Server Data Flow

Page 30: Ransomware Using SDN Machine Learning-Based Detection of · ransomware detection using http traffic characteristics. arXiv preprint arXiv:1611.08294, 2016. Programmable Forwarding

Packet Length Frequency