Using Data Science Techniques to Detect Malicious Behavior

Using Data Science Techniques to Help Detect Malicious Behavior
Phil Roth, Data Scientist

Transcript of Using Data Science Techniques to Detect Malicious Behavior

Page 2: Key Takeaways

• An introduction to key data science concepts
• Challenges in applying those concepts to security data
• Why focusing on aiding a human security analyst can lead to better machine learning tools
• How Endgame’s enterprise product benefits from that focus

Page 3: Data Science Process

Page 4: Data Science Process

Gather Raw Data → Process and Clean Data → Explore the Data → Apply a Model → Communicate the Result

Page 5: Data Science Process

Gather Raw Data: Data can come from many disparate sources.

Process and Clean Data: Raw data must be cleaned and features extracted.

Explore Data: Finding relationships in the data provides hints about which features and models will be useful.
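A minimal sketch of the Process and Clean Data step. It assumes hypothetical raw records of the form `host,package` (the record format, host names, and package names are invented for illustration): the code normalizes case and whitespace, drops malformed lines and duplicates, then turns each host into a binary feature vector with one column per package.

```python
# Hypothetical raw records from disparate sources: "host,package" pairs with
# inconsistent case, stray whitespace, a duplicate, and a malformed line.
RAW_RECORDS = [
    "web1, nginx",
    "WEB1,OpenSSL",
    "web1,nginx",        # duplicate of the first record
    "db1,postgresql",
    "db1, openssl ",
    "garbage line",      # malformed: no comma separator
]


def clean(records):
    """Normalize case/whitespace, drop malformed lines, and de-duplicate."""
    pairs = set()
    for line in records:
        parts = [p.strip().lower() for p in line.split(",")]
        if len(parts) != 2 or not all(parts):
            continue  # skip records that do not parse as host,package
        pairs.add((parts[0], parts[1]))
    return pairs


def extract_features(pairs):
    """Build one binary vector per host: 1 if the package was seen, else 0."""
    vocabulary = sorted({pkg for _, pkg in pairs})
    hosts = sorted({host for host, _ in pairs})
    vectors = {
        host: [1 if (host, pkg) in pairs else 0 for pkg in vocabulary]
        for host in hosts
    }
    return vocabulary, vectors


vocabulary, vectors = extract_features(clean(RAW_RECORDS))
print(vocabulary)  # ['nginx', 'openssl', 'postgresql']
print(vectors)     # {'db1': [0, 1, 1], 'web1': [1, 1, 0]}
```

Representing each host as a fixed-length vector over a shared vocabulary is what lets the later explore and model steps compare hosts directly.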

Page 6: Data Science Process

Apply a Model: Models exploit features and relationships in the data to make a statement.

Communicate the Result: The output of a data product is useless without effective and actionable communication.

Page 7: Introduction to Machine Learning Models

Page 8: Supervised learning

In supervised learning, input data is labeled. An algorithm attempts to reproduce those labels on new, unlabeled data.

Input data:

features        | label
-3  -4   1   0  |   1
-4  -3   1   1  |   1
-4  -4   0   0  |   1
+4  +3   1   0  |   0
+3  +4   0   1  |   0
+3  +3   1   0  |   0

New data:

features        | label
-3  -4   1   1  |  ???

Page 9: Supervised learning example

A Support Vector Machine [1] finds the boundary that separates two classes in space with the widest possible margin.

[1] http://scikit-learn.org/stable/modules/svm.html
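The toy table from the supervised learning slide can be fed to the scikit-learn SVM referenced in the footnote. This is a sketch assuming scikit-learn and NumPy are installed; the linear kernel and default settings are choices made here, not something the slide specifies.

```python
import numpy as np
from sklearn.svm import SVC

# Labeled toy data from the supervised learning slide: four features per row.
X_train = np.array([
    [-3, -4, 1, 0],
    [-4, -3, 1, 1],
    [-4, -4, 0, 0],
    [+4, +3, 1, 0],
    [+3, +4, 0, 1],
    [+3, +3, 1, 0],
])
y_train = np.array([1, 1, 1, 0, 0, 0])

# A linear kernel learns one separating hyperplane w.x + b = 0, placed to
# maximize the margin to the nearest training points (the support vectors).
model = SVC(kernel="linear")
model.fit(X_train, y_train)

# The "???" row from the slide.
X_new = np.array([[-3, -4, 1, 1]])
prediction = model.predict(X_new)

print(prediction)                     # predicted label for the new row
print(model.coef_, model.intercept_)  # w and b of the learned boundary
print(model.support_vectors_)         # training rows that define the margin
```

Because the new row differs from the first labeled row only in its last feature, the classifier assigns it label 1; `coef_` and `intercept_` describe where the boundary was placed.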

Page 10: Unsupervised learning

In unsupervised learning, input data is unlabeled. An algorithm attempts to find hidden structure in that data.

Input data and the groups found:

features        | group
-3  -4   1   0  | group 1
-4  -3   1   1  | group 1
-4  -4   0   0  | group 1
+4  +3   1   0  | group 2
+3  +4   0   1  | group 2
+3  +3   1   0  | group 2

Page 11: Unsupervised learning example

[Figure: k-means iterations, step 1, step 2, etc.]

k-means clustering iteratively improves the locations of cluster centers: each point is assigned to its nearest center, then each center is moved to the mean of the points assigned to it, and the two steps repeat until the centers stop moving.
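A minimal NumPy sketch of k-means (Lloyd's algorithm) on the unlabeled toy table from the unsupervised learning slide. Starting the centers at the first and fourth rows is an assumption made here to keep the example deterministic; practical implementations usually use smarter or randomized initialization.

```python
import numpy as np


def k_means(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and a mean-update step."""
    centers = np.asarray(initial_centers, dtype=float)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (squared Euclidean).
        distances = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (a center with no points keeps its previous location).
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break  # converged: the centers stopped moving
        centers = new_centers
    return labels, centers


# Unlabeled toy data from the unsupervised learning slide.
X = np.array([
    [-3, -4, 1, 0],
    [-4, -3, 1, 1],
    [-4, -4, 0, 0],
    [+4, +3, 1, 0],
    [+3, +4, 0, 1],
    [+3, +3, 1, 0],
], dtype=float)

# Start from two of the data points (a simple, deterministic choice for this sketch).
labels, centers = k_means(X, initial_centers=X[[0, 3]])
print(labels)   # [0 0 0 1 1 1]
print(centers)  # the mean of each group
```

The two clusters recovered here match the two groups on the unsupervised learning slide.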

Page 12: Challenges with Security Data

Page 13: Security lacks open datasets

Other fields have widely used open datasets, for example:
• Recommendation systems
• Character recognition: the MNIST Database of Handwritten Digits

Page 14: Security lacks open datasets

The DARPA Intrusion Detection Evaluation dataset is 15 years old and simulated, and techniques trained on it were never actionable.

Sharing data in the security industry will always be a challenge, one that even President Obama is attempting to address.

Page 15: Security lacks easy labels

Labeling is an expensive process that requires expertise. Security labels answer questions such as:
• Is this binary malicious?
• Is this traffic an intrusion?
• Are these products related?

Page 16: Security lacks tolerance for errors

False positives lead to expensive analyst investigations and alert fatigue, and false negatives get CEOs fired.
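A small worked example of how false positives turn into alert fatigue. The event counts and rates below are hypothetical, chosen only for illustration: when malicious events are rare, even a 1% false positive rate can bury the real alerts.

```python
def alert_breakdown(n_events, n_malicious, true_positive_rate, false_positive_rate):
    """Expected alert counts for a detector applied to a stream of events."""
    n_benign = n_events - n_malicious
    true_alerts = round(n_malicious * true_positive_rate)  # caught malicious events
    false_alerts = round(n_benign * false_positive_rate)   # benign events flagged
    missed = n_malicious - true_alerts                     # false negatives
    precision = true_alerts / (true_alerts + false_alerts) # share of alerts that are real
    return true_alerts, false_alerts, missed, precision


# Hypothetical numbers: 1,000,000 events, 100 of them malicious, and a
# detector with a 99% detection rate and a 1% false positive rate.
tp, fp, fn, precision = alert_breakdown(1_000_000, 100, 0.99, 0.01)
print(tp, fp, fn)          # 99 9999 1
print(f"{precision:.1%}")  # 1.0%
```

In this scenario 99 of the 100 malicious events are caught, but they arrive alongside 9,999 false alerts, so only about 1% of alerts are worth an analyst's time, and the one missed event is the false negative.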

Page 17: Chess Analogy

Machine learning in security could benefit from focusing on “human in the loop” products over “the algorithm does it all” products.

• 1997: IBM’s supercomputer Deep Blue vs. Garry Kasparov
• 2005: Team ZackS vs. multiple grandmasters in Freestyle Chess [2]

Human/machine teams retained an edge over machines alone for years afterward.

[2] Cowen, Tyler. Average Is Over, Chapter 5. 2013.

Page 18: Using the Human/Machine Model

Page 19: Endgame Implementation

Cloud-deployed virtual machines are clustered based on their behavior. The results are communicated to analysts and used to improve the detection of malicious behavior.

Page 20: Endgame Implementation

Package, process, and user information is collected from the machines.

DBSCAN, a density-based clustering algorithm, groups the machines based on that information.
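A sketch of clustering machines with DBSCAN using scikit-learn. Everything about the data is an assumption for illustration, not Endgame's implementation: each machine is represented as a set of hypothetical package, process, and user tokens, machines are compared with Jaccard distance, and `eps=0.3` and `min_samples=2` are values picked to suit this toy data. Machines DBSCAN labels as noise (`-1`) are printed for analyst review, which is one way to keep a human in the loop.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical behavior profiles: package, process, and user tokens observed
# on each virtual machine (all names are made up for illustration).
machines = {
    "web-1": {"pkg:nginx", "pkg:openssl", "proc:nginx", "user:www-data"},
    "web-2": {"pkg:nginx", "pkg:openssl", "proc:nginx", "user:www-data", "pkg:curl"},
    "web-3": {"pkg:nginx", "pkg:openssl", "proc:nginx", "user:www-data"},
    "db-1": {"pkg:postgresql", "pkg:openssl", "proc:postgres", "user:postgres"},
    "db-2": {"pkg:postgresql", "pkg:openssl", "proc:postgres", "user:postgres", "pkg:curl"},
    "odd-1": {"pkg:openssl", "proc:nc", "user:root"},
}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint ones."""
    return 1.0 - len(a & b) / len(a | b)


names = list(machines)
distances = np.array(
    [[jaccard_distance(machines[p], machines[q]) for q in names] for p in names]
)

# eps: maximum distance for two machines to count as neighbors.
# min_samples: neighbors (including the machine itself) needed for a dense core.
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(distances)

# Communicate the result: list each machine's cluster and surface noise
# points (label -1), whose behavior matches no group, for an analyst.
for name, label in zip(names, labels):
    status = "review: matches no cluster" if label == -1 else f"cluster {label}"
    print(f"{name}: {status}")
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and can leave outlying points unassigned, which suits the goal of flagging machines that behave unlike their peers.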

Page 21: Key Takeaways

• An introduction to key data science concepts
• Challenges in applying those concepts to security data
• Why focusing on aiding a human security analyst can lead to better machine learning tools
• How Endgame’s enterprise product benefits from that focus

Page 22

For more information, contact: [email protected]