AI approach to malware similarity analysis: Mapping the malware genome with a deep neural network
An AI Approach to Malware Similarity Analysis: Mapping the Malware Genome With a Deep Neural Network
Dr. Konstantin Berlin, Senior Research Engineer
Invincea Labs
Enterprise Networks are Under Constant Attack
• Intelligence is critical for prevention
• Cyber defenders are overwhelmed
• AI can help find important relations
[Figure: Number of Network Breaches Per Year (Verizon’s 2016 Data Breach Investigations Report)]
Intelligence through Malware Triage
• Benefits
  • Identify threat actors
    • Link various attacks to a single actor
  • Quickly understand functionality
    • Speed up reverse engineering
  • Mitigation
    • Signatures
    • Network Rules
[Figure label: Enterprise Network]
Nearest-Neighbor Classification
[Figure: plot with axes Feature A and Feature B; labels "$$$" and "?"]
Similarity Search
• MinHash
• Feature hashing
• Other sketching
• …
Jang, Jiyong, et al. Proceedings of the 18th ACM Conference on Computer and Communications Security. ACM, 2011.
Sæbjørnsen, Andreas, et al. Proceedings of the 18th International Symposium on Software Testing and Analysis. ACM, 2009.
Bayer, Ulrich, et al. NDSS. Vol. 9. 2009.
… Many more
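As a rough illustration of the sketching idea behind the similarity-search methods listed above, the following minimal sketch estimates the Jaccard similarity of two attribute sets with MinHash. The salted-hash construction, the `num_hashes` value, and the sample attribute sets are assumptions made for illustration, not details taken from the talk.

```python
import hashlib


def minhash_signature(attributes, num_hashes=128):
    """Return a MinHash signature: the minimum salted hash per hash function."""
    signature = []
    for i in range(num_hashes):
        salt = str(i).encode()
        # Salting one hash with i stands in for num_hashes independent hash functions.
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + b":" + a.encode(), digest_size=8).digest(), "big")
            for a in attributes
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical attribute sets (e.g. printable strings) for two samples.
sample_a = {"CreateFileA", "WriteFile", "RegSetValueExA", "http://", "cmd.exe"}
sample_b = {"CreateFileA", "WriteFile", "RegSetValueExA", "ftp://", "cmd.exe"}

sig_a = minhash_signature(sample_a)
sig_b = minhash_signature(sample_b)
print(exact_jaccard(sample_a, sample_b), estimated_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of how many attributes a file has, which is what makes them convenient for large-scale nearest-neighbor search.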
Attribute Embedding
[Figure: samples A, B, C, D, E]
Attribute Extraction
Attributes
• Byte n-grams
• Opcode n-grams
• Printable strings
• System calls
• …
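To make two of the attribute types above concrete, here is a minimal sketch that pulls byte n-grams and printable strings out of raw bytes and folds them into a fixed-length vector with feature hashing. The n-gram length, minimum string length, bucket count, CRC32 hash, and the `blob` contents are all assumptions chosen for illustration.

```python
import re
import zlib
from collections import Counter


def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    """Count every length-n window of bytes."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Find runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def hash_features(tokens, num_buckets: int = 16) -> list:
    """Feature hashing: fold arbitrary tokens into a fixed-length count vector."""
    vector = [0] * num_buckets
    for token in tokens:
        raw = token if isinstance(token, bytes) else token.encode()
        vector[zlib.crc32(raw) % num_buckets] += 1
    return vector


# Hypothetical file contents used only for illustration.
blob = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xffCreateFileA\x00\x02ab\x00"

strings = printable_strings(blob)
ngrams = byte_ngrams(blob, n=3)
vector = hash_features(strings + list(ngrams.elements()))
print(strings)
print(sum(ngrams.values()), sum(vector))
```

Short runs such as `MZ` and `ab` fall below the minimum length and are not reported as strings, though their bytes still contribute to the n-gram counts.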
What Can Go Wrong with Embedding?
• Embeddings skew distances
• Same embedded data can give different neighbors (e.g., Alaska and Russia)
Issues with Attribute Embedding
[Figure: two plots with axes Feature A and Feature B, titled Possible Attributes #1 and Possible Attributes #2, each containing a "?" label]
How to get consistent results, regardless of features?
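The inconsistency raised above can be reproduced with a toy example: the same three files, embedded under two different attribute choices, give the unknown sample a different nearest neighbor. The coordinates and the names `family_x`, `family_y`, and `unknown` are hypothetical.

```python
import math


def nearest(query, samples):
    """Return the label of the sample closest to query in Euclidean distance."""
    return min(samples, key=lambda label: math.dist(query, samples[label]))


# Hypothetical 2-D embeddings of the same three files under two attribute choices.
attributes_1 = {"family_x": (1.0, 1.0), "family_y": (4.0, 4.0), "unknown": (2.0, 2.0)}
attributes_2 = {"family_x": (1.0, 9.0), "family_y": (4.0, 3.0), "unknown": (3.5, 2.0)}

results = {}
for name, embedding in (("attributes #1", attributes_1), ("attributes #2", attributes_2)):
    labeled = {k: v for k, v in embedding.items() if k != "unknown"}
    results[name] = nearest(embedding["unknown"], labeled)
    print(name, "->", results[name])
```

Under the first attribute set the unknown file is closest to `family_x`; under the second it is closest to `family_y`, even though the files themselves have not changed.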
Supervised Classification (Endpoint Solution)
[Figure: network diagram with Layer 1, Layer 2, Layer 3, …, Layer N and output [0,1]; Layer X with outputs backdoor.msil.bladabindi.aa, backdoor.msil.bladabindi.aj, worm.winnt.lurka.a, …]
Joshua Saxe and Konstantin Berlin, (MALWARE). IEEE, 2015.
0.97 F1-score (precision and recall)
• 1500 Microsoft Families
• 2.0M Training files
Malware Detection
Categorical Classification
Not enough for a triage system!
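The diagram above appears to contrast a layer stack ending in a single [0,1] score (malware detection) with one ending in family labels (categorical classification). A minimal forward-pass sketch of that idea follows. The layer sizes, ReLU activations, shared hidden layers, and random untrained weights are all assumptions, so the printed numbers carry no meaning beyond showing the output shapes.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def softmax(values):
    """Normalize scores into probabilities that sum to 1."""
    peak = max(values)
    shifted = [math.exp(v - peak) for v in values]
    total = sum(shifted)
    return [s / total for s in shifted]


def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
families = ["backdoor.msil.bladabindi.aa", "backdoor.msil.bladabindi.aj",
            "worm.winnt.lurka.a"]

hidden_1 = random_layer(rng, 8, 6)
hidden_2 = random_layer(rng, 6, 4)
detection_head = random_layer(rng, 4, 1)            # single [0, 1] score
family_head = random_layer(rng, 4, len(families))   # one probability per family

features = [rng.random() for _ in range(8)]         # hypothetical feature vector
h = relu(dense(relu(dense(features, *hidden_1)), *hidden_2))

malware_score = sigmoid(dense(h, *detection_head)[0])
family_probs = softmax(dense(h, *family_head))
print(malware_score, dict(zip(families, family_probs)))
```

Neither output says how close two samples are to each other, which is the gap a triage system needs filled.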
[Figure: samples A, B, C, D, E]
How do we map malware into an embedding so that distances make semantic sense?
Imaginary World of Malware Factories
• Ideal World
  • Each hidden factory produces one malware family/variant
  • Factories are positioned relative to what and how they exploit vulnerabilities
  • … but this is not what we have!?
[Figure: Idealized Embedding, with labels Secret sauce A and Secret sauce B]
There is No Spoon Embedding…
• We created the embedding when we selected the features
• We can morph them in any way we choose
• One good way to morph the features is using a deep neural network
[Figure: Deep Neural Network, with labels A B C D E and a b c d. Image credit: “The Matrix”, 1999]
Morphing the Embedding
[Figure: Deep Neural Network (Morphing) with Noise; Predicted Families: worm.win32.vobfus.hc, …, trojanspy.win32.nivdort.af]
Variational Autoencoder
Kingma, D. P., & Welling, M. (2013). arXiv preprint arXiv:1312.6114.
[Figure: Encoder, Embedding, and Decoder, with Family Labels and Secret sauce; labels A B C D E and a b c d; annotations: Only one factory, Embedding clusters families]
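The slide cites the variational autoencoder of Kingma and Welling, in which an encoder outputs a distribution over the embedding and noise is injected before decoding. The sketch below shows only the two ingredients specific to that approach: reparameterized sampling and the KL penalty that pulls each encoding toward a standard normal. The encoder output values are hypothetical, and how the talk's network combines this with family-label prediction is not specified in the slides.

```python
import math
import random


def sample_embedding(mean, log_var, rng):
    """Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, sigma^2) || N(0, 1) ), summed over embedding dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))


rng = random.Random(0)

# Hypothetical encoder output for one sample: a 2-D Gaussian over the embedding.
mean, log_var = [1.5, -0.5], [-2.0, -2.0]

z = sample_embedding(mean, log_var, rng)
kl = kl_to_standard_normal(mean, log_var)
print(z, kl)
```

In training, a term like `kl` would be added to the prediction loss, so that the noise discourages unrelated samples from sharing the same region of the embedding while the penalty keeps the whole embedding near the origin.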
Embedding Visualization
• Toy Example
  • 8 family/variant prediction
  • 2D embedding
[Figure: two panels, Only one factory and Embedding clusters families, with points labeled a and b. Annotations: Collapse into a singularity; Inter-class distances not preserved. Labels A B C D E]
Printable Strings Results
• 800K samples
• 1500 family/variants (99% coverage)
• Time-split Validation
  • Train on old data
  • Test on 30 days later
• Measure F1-score of 3-nearest neighbor classifier
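The evaluation protocol above can be sketched as follows: split samples by time, fit nothing more than a 3-nearest-neighbor lookup on the older data, and score predictions on the later data with F1. The toy records, the `split_day` value, and the choice of macro-averaged F1 are assumptions; the slides do not say how the F1-score was averaged.

```python
import math
from collections import Counter


def knn_predict(query, train, k=3):
    """Majority label among the k training points closest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def macro_f1(true_labels, predicted_labels):
    """Average of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cls in sorted(set(true_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Hypothetical (day_seen, embedding, family) records.
records = [
    (1, (0.0, 0.1), "a"), (2, (0.2, 0.0), "a"), (3, (0.1, 0.3), "a"),
    (4, (5.0, 5.1), "b"), (5, (5.2, 4.9), "b"), (6, (4.8, 5.0), "b"),
    (40, (0.3, 0.2), "a"), (41, (5.1, 5.3), "b"), (42, (2.4, 2.4), "b"),
]

split_day = 10  # train on samples seen before this day, test on later ones
train = [(vec, fam) for day, vec, fam in records if day < split_day]
test = [(vec, fam) for day, vec, fam in records if day >= split_day]

predictions = [knn_predict(vec, train) for vec, _ in test]
score = macro_f1([fam for _, fam in test], predictions)
print(predictions, score)
```

The last test sample has drifted away from the older `b` samples and is misclassified, which is the kind of degradation a time-split evaluation is meant to expose and a random split would tend to hide.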
[Figure: comparison of Feature Vector, Semantic Embedding, and Deep-learning Features. Reported values: F1-score 0.44, F1-score 0.94, F1-score 0.96, F1-score 0.66. Annotations: no gap, gap, small gap, large gap]
Conclusion
• Developing feature extraction is expensive and requires time-consuming tuning to adapt to a specific domain
• Traditional approaches to malware similarity are unsupervised and so are brittle
• Using supervised-learning approaches we can improve existing features by embedding them into a more optimized space
• Automatic (re)tuning will improve detection rates and reduce cost
More Information
• Email: [email protected]
• Twitter: @kberlin