By Amir Javed Cyber Security Analytics @ Cardiff
Predictive Anomaly detection for IoT devices
Irene Anthi
The challenge
Top IoT Attack Categories
Traditional Security is Ineffective
Artificial Intelligence
The Goal
The Internet of “Evil” Things
Scale, heterogeneity, device constraints
Lightweight IDS for predicting attacks
Device agnostic method for anomaly detection with lifetime learning
1. Anthi, E., Williams, L. and Burnap, P., 2018. Pulse: An adaptive intrusion detection for the Internet of Things.
2. Anthi, E., Ahmad, S., Rana, O., Theodorakopoulos, G. and Burnap, P., 2018. EclipseIoT: A secure and adaptive hub for the Internet of Things. Computers & Security, 78, pp.477-490.
Auto-encoder for detecting low rate network attacks
Baskoro Adi Pratomo
Pratomo, B.A., Burnap, P. and Theodorakopoulos, G., 2018, June. Unsupervised Approach for Detecting Low Rate Attacks on Network Traffic with Autoencoder. In 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security) (pp. 1‐8). IEEE.
Matilda Rhode
Recurrent Neural Network for Early-Stage Malware prediction
94% Malware detected within 5 seconds
1. Rhode, M., Burnap, P. and Jones, K., 2018. Early-stage malware prediction using recurrent neural networks. Computers & Security, 77, pp.578-594.
2. Rhode, M., Burnap, P. and Jones, K., 2019. Dual-task agent for run-time classification and killing of malicious processes. arXiv preprint arXiv:1902.02598.
3. Rhode, M., Tuson, L., Burnap, P. and Jones, K., 2019. Lab to Soc: Feature selection for robust dynamic malware detection. The 49th IEEE/IFIP International Conference on Dependable Systems and Networks [in print].
Is Twitter Safe?
• Twitter is an online news and social networking service where users post and read short messages of up to 280 characters, called "tweets".
• It has emerged as a go-to place for updates on current affairs and global events.
• It currently has over 300 million active users, which makes it an attractive target for malware attacks.
Drive-by Download
Any download that happens without a person's knowledge, often a computer virus, spyware, malware, or crimeware.
2008 2012 2014 2017
Pornographic webpage to inject trojan
SMS vulnerability to update someone else's timeline
Trending Topic Hijacked to spread Malware
DDoS
Mirai Attack
MouseOver Exploit
Koobface modification hit Twitter
HBO's Twitter account hacked
Drive‐By Download Attack on Twitter
#Trending topic lmao this tweet by @user was nuts Short_URL
Motivation
• Why?
‐ To analyze the system behavior of a malware-infected machine, which helps us better understand how an attack is carried out.
‐ To understand and analyze the methods cyber criminals use to carry out their attacks.
‐ To predict these attacks based on machine activity and social features.
‐ To understand how malware propagates, based on the social and content features of a malicious tweet.
How?
Data Collection
• Around 95k tweets captured for FIFA 2014 using #FIFA2014
• Around 6k tweets captured for the Olympics using #Rio2016
• Around 120k tweets captured for SuperBowl 2015 and 55k for 2016 using #SuperbowlSunday, #SB50 and #NFL
• Around 120k tweets captured for the Rugby World Cup 2015 using #RWC2015
• Around 8k tweets captured for the Cricket World Cup 2015 using #CWC2015
Identifying Malicious URL
Extract URL from tweets
Use a honeypot (bait) to identify malicious URL
Malicious URL
Benign URL
Monitor changes in File, Process and Registry during website visitation
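The steps above (extract URLs, visit them, watch for file, process and registry changes) can be sketched roughly as follows. This is a minimal illustration and not the honeypot used in the study: the regular expression, the snapshot dictionaries and the function names are all assumptions, and a real client honeypot would also need a list of expected, harmless changes to ignore.

```python
import re

# Hypothetical pattern: matches http(s) links as they usually appear in tweet text.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return every http(s) URL found in a tweet's text."""
    return URL_PATTERN.findall(tweet_text)


def changed_keys(before, after):
    """Return keys that were added, removed or modified between two snapshots."""
    return {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}


def label_url(before, after):
    """Label a visit 'malicious' if any file, process or registry state changed.

    `before` and `after` are assumed snapshots of the sandbox taken around the
    visit, shaped like {"files": {...}, "processes": {...}, "registry": {...}}.
    """
    for category in ("files", "processes", "registry"):
        if changed_keys(before.get(category, {}), after.get(category, {})):
            return "malicious"
    return "benign"
```

A URL whose visit leaves all three snapshots unchanged is labelled benign; any difference flags it as malicious.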
Malicious URL Identified Across All Events
[Figure: bar chart of the total number of malicious URLs (in thousands, axis 0 to 50) for each event: FIFA 2014, Cricket World Cup 2015, European Football Championship 2016, Rugby World Cup 2015, SuperBowl 2015, Olympics 2016, SuperBowl 2016]
Training Of Predictive Model
Segregated List of Tweets containing malicious and benign URL
Record Machine Activities for 10 seconds
Send URL for visitation
Visit each URL and observe machine activity for 10 seconds in sandboxed environment
Extract social features from tweet
Create log file for each URL
Build Machine Learning Model with 10-fold cross-validation
Machine Learning Model Built
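The 10-fold cross-validation step in the flow above can be illustrated with a small, library-free sketch. The helper names `k_fold_indices` and `cross_validate`, and the `train_fn` / `score_fn` callbacks, are assumptions made for illustration; the slides do not show which tooling performed the split.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, score_fn, k=10):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        scores.append(score_fn(model,
                               [samples[i] for i in held_out],
                               [labels[i] for i in held_out]))
    return sum(scores) / len(scores)
```

Every URL log ends up in the held-out fold exactly once, so the averaged score reflects performance on data the model did not see during that round of training.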
Results for Training Model
[Figure: training result, F-measure (axis 0.975 to 1.000) against interaction time in seconds (1 to 10) for Naïve Bayes, BayesNet, J48 and MLP]
Key points
• Model trained using one sporting event.
• Four different categories of model were chosen to identify the best-performing one.
• The discriminative models (MLP and decision tree) outperform the generative models (Naïve Bayes and BayesNet).
• The F-measure of the machine learning model increases with interaction time.
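For reference, the F-measure is the harmonic mean of precision and recall. The sketch below computes it for a single positive class; the label strings are assumptions, and the slides do not say whether the reported figures are per-class or averaged over both classes.

```python
def f_measure(true_labels, predicted_labels, positive="malicious"):
    """Compute F-measure = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because it combines both error types, the F-measure stays low if the model either misses malicious URLs or flags too many benign ones.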
Testing Of Predictive Model
Segregated List of Tweets containing malicious and benign URL
Record Machine Activities for 10 seconds
Send URL for visitation
Visit each URL and observe machine activity for 10 seconds in sandboxed environment
Extract social features from tweet
Create log file for each URL
Check Model performance
Machine Learning Model Built
Classified URL
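Both flows record machine activity for 10 seconds, and results are reported per interaction time from 1 to 10 seconds. One way to obtain a feature vector for each interaction time is to aggregate only the records logged up to that second. The sketch below assumes a hypothetical log format (one dictionary per second with a `t` field) and uses a simple mean; the actual features and aggregation in the study are not given on the slides.

```python
def features_up_to(log, seconds):
    """Average each numeric field over the records from the first `seconds` seconds.

    `log` is an assumed list of per-second records such as
    {"t": 1, "cpu": 10.0, "bytes_sent": 100}; the field names are illustrative.
    """
    window = [record for record in log if record["t"] <= seconds]
    if not window:
        raise ValueError("no records within the requested interaction time")
    keys = [k for k in window[0] if k != "t"]
    return {k: sum(record[k] for record in window) / len(window) for k in keys}
```

Calling `features_up_to(log, t)` for `t` from 1 to 10 yields ten feature vectors per URL, one for each point on the interaction-time axis.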
Results for Testing Model
Key points
• Model tested using unseen data from a different sporting event than the one used for training.
• The highest F-measure, 0.86, is seen at 5 seconds of interaction, whereas 0.82 is seen after only 1 second.
[Figure: F-measure (axis 0.70 to 0.88) against interaction time in seconds (1 to 10) for Vote (Naïve Bayes + J48)]
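The label "Vote (Naïve Bayes + J48)" indicates an ensemble that combines the two classifiers. A minimal sketch of one common combination rule, averaging the class probabilities, is shown below. The slides do not state which rule was used, so the averaging, the function name and the example probabilities are assumptions.

```python
def vote(probability_rows):
    """Average per-class probabilities from several classifiers and pick the top class.

    Each row is one classifier's output, e.g. {"malicious": 0.9, "benign": 0.1}.
    """
    classes = list(probability_rows[0])
    averaged = {
        c: sum(row[c] for row in probability_rows) / len(probability_rows)
        for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs from Naïve Bayes and J48 for one URL.
naive_bayes = {"malicious": 0.9, "benign": 0.1}
j48 = {"malicious": 0.4, "benign": 0.6}
label, averaged = vote([naive_bayes, j48])
```

Here the two classifiers disagree, and the averaged probability decides the final label.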
Conclusion
• Around 10% of captured tweets contained URLs pointing to a malicious web server.
• Decision tree and multilayer perceptron stood out as the best-performing algorithms during the training phase, with an F-measure of 0.9975.
• When the model was tested on an unseen dataset, the highest F-measure, 0.86, was observed at 5 seconds of interaction, whereas 0.82 was observed at 1 second.*
* Javed, A., Burnap, P. and Rana, O., 2018. Prediction of drive‐by download attacks on Twitter. Information Processing & Management.
Thank you