Hidden Markov Models for Software Piracy Detection
Shabana Kazi, Mark Stamp
HMMs for Piracy Detection
Intro
Here, we apply metamorphic analysis to software piracy detection
Very similar to techniques used in malware detection
o But the problem is completely different
o It has nothing to do with malware
We show that there are other applications of such techniques
Software Piracy
Software piracy is a major problem
o By a 2009 estimate, $3 to $4 is lost to piracy for every $1 in software sales
Usually, piracy consists of taking software without modification
In some cases, the software is modified
o Commercial theft of intellectual property
o The thief really doesn’t want to get caught…
Software Piracy
We assume the software is stolen
o And modified, making it hard to detect
o If it is completely rewritten from scratch, our approach won’t detect it
Want to make life hard for the bad guys
o Ideally, major modifications are required
How much modification is needed before we can no longer reliably detect it?
Goals
Technique applicable to any software
No special effort by the developer
o Nothing extra inserted into the code
We only require access to the exe file
Not a watermarking scheme
o More like software “birthmark” analysis
Also not plagiarism detection
o Here, we want a “deeper” analysis
Use Case
You work for Alice’s Software Company (ASC)
o And you develop fancy software for ASC
Trudy’s Software Company (TSC) develops a suspiciously similar product
You suspect TSC of stealing your code
o Not identical, but it seems similar
What can you do?
o We’ve got some ideas that might help…
Use Case
Using the technique discussed here, we can easily measure code similarity
Low similarity?
o Then there is no hope of proving the code is stolen
High similarity?
o Further (costly) analysis is warranted
High similarity does not prove the code is stolen
o But it is a good reason to take a closer look
Background
Metamorphic software
o Metamorphic techniques (dead code, permutation, substitution)
HMM
o Basic ideas and notation
o The 3 problems and their solutions (discussed at a high level)
We’ve seen all of this before
Overview
Training and scoring
Train an HMM on slightly morphed copies of a given “base” software
o Slight morphing to avoid overfitting
Score morphed copies and other files
o Here, morphing serves to simulate modifications by an attacker
Want to know how much morphing is required before detection fails
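Scoring means measuring how well a trained HMM fits an opcode sequence. A minimal sketch, assuming the score is the log-likelihood log P(O | λ) divided by the sequence length, so that files of different sizes are comparable (a common choice in HMM-based opcode analysis; the slides do not spell out the exact score). It uses the scaled forward algorithm. The name `hmm_score` and its argument layout are hypothetical, and opcodes are assumed to be already mapped to integer symbol indices.

```python
import math

def hmm_score(obs, A, B, pi):
    """Per-symbol log-likelihood log P(O | lambda) / T via the scaled forward algorithm.

    A[i][j]: probability of moving from hidden state i to state j
    B[i][k]: probability of emitting symbol k while in state i
    pi[i]:   initial probability of state i
    obs:     list of integer symbol indices (e.g., opcode IDs)
    Assumes every observed symbol has nonzero probability under the model.
    """
    N = len(pi)
    log_prob = 0.0
    # alpha_0(i) = pi_i * b_i(O_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(i) = [sum_j alpha_{t-1}(j) * a_ji] * b_i(O_t)
            alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                     for i in range(N)]
        c = sum(alpha)                 # scaling factor for step t
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(c)         # log P(O) = sum_t log c_t
    return log_prob / len(obs)
```

Higher (less negative) scores mean the sequence looks more like the data the model was trained on.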
Metamorphic Generator
Built our own metamorphic generator
Morph based on extracted opcodes
o Morphing consists of dead code insertion
o Specify a dead code percentage and the number of blocks to insert
We do not require that the morphed code actually works
o This makes detection more difficult, not easier
o A worst-case scenario, detection-wise
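The dead code insertion described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' generator: `morph` is a hypothetical helper, "blocks" is read as contiguous runs of opcodes borrowed from a "normal" file, and insertion points are chosen uniformly at random. As on the slide, nothing checks that the result would still run.

```python
import random

def morph(base, dead_source, pct, n_blocks, rng=random):
    """Insert dead code into an opcode sequence (hypothetical helper).

    base:        opcode list of the base program
    dead_source: opcode list of a "normal" file to borrow dead code from
                 (assumed to be at least as long as one block)
    pct:         amount of dead code as a fraction of len(base) (0.10 = 10%)
    n_blocks:    number of contiguous dead-code blocks (>= 1)
    rng:         anything with a randint(a, b) method, e.g. random.Random(seed)
    """
    total = int(round(pct * len(base)))
    # Split `total` opcodes into n_blocks nearly equal block sizes.
    sizes = [total // n_blocks + (1 if i < total % n_blocks else 0)
             for i in range(n_blocks)]
    # Pick insertion points in the base sequence (0 .. len(base)).
    points = sorted(rng.randint(0, len(base)) for _ in range(n_blocks))
    out, prev = [], 0
    for size, pos in zip(sizes, points):
        start = rng.randint(0, max(0, len(dead_source) - size))
        out.extend(base[prev:pos])                     # original opcodes, in order
        out.extend(dead_source[start:start + size])   # borrowed dead-code block
        prev = pos
    out.extend(base[prev:])
    return out
```

For example, `morph(base_ops, normal_ops, 0.10, 1)` (with hypothetical opcode lists `base_ops` and `normal_ops`) would yield one copy morphed by 10% with a single dead-code block.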
Training
Given a base executable file…
Extract its opcode sequence
Generate 100 slightly morphed copies
o Each morphed 10%, using dead code extracted from a random “normal” file
Train an HMM on the morphed copies
o Using 5-fold cross validation
o Note: We train one model for each “fold”
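A minimal sketch of the 5-fold split, assuming the 100 morphed copies are dealt into folds round-robin (the slides do not say how folds were formed). Each fold holds out 20 copies and one HMM would be trained on the remaining 80; the training step itself (e.g., Baum-Welch) is not shown. `k_fold_splits` is a hypothetical name.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) lists for k-fold cross validation.

    Fold f holds out every k-th item starting at index f, so each item
    appears in exactly one test fold and in k - 1 training sets.
    """
    for f in range(k):
        test = items[f::k]
        train = [x for i, x in enumerate(items) if i % k != f]
        yield train, test
```

With 100 copies and k = 5, every fold yields 80 training copies and 20 held-out copies, which matches the 20-file match set used when setting the threshold.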
Training
[Figure: Illustration of the training process, using slightly morphed copies of the base program]
Determine Threshold
For each of the 5 folds
o Train an HMM
o Score 20 morphed files (match set) and 15 normal files (nomatch set)
Determine the threshold based on the scores
o Threshold is the highest score of any normal file
o Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
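Threshold selection for one fold, as a sketch. It assumes higher scores mean "more similar" and that a file is classified as a match only when its score is strictly greater than the threshold. Under that rule, setting the threshold to the highest nomatch score rejects every normal file, which is exactly the FPR = 0 (TNR = 1) property. The function also reports the fraction of match files that clear the threshold. `set_threshold` is a hypothetical helper.

```python
def set_threshold(match_scores, nomatch_scores):
    """Return (threshold, TPR) for one fold.

    threshold = highest score among the normal (nomatch) files.
    A file counts as a match only if its score is strictly above the
    threshold, so no nomatch file is accepted: FPR = 0, TNR = 1.
    """
    threshold = max(nomatch_scores)
    tpr = sum(s > threshold for s in match_scores) / len(match_scores)
    return threshold, tpr
```

In the setup above, this would be called once per fold with 20 match scores and 15 nomatch scores.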
Setting a Threshold
[Figure: Process used to set threshold]
Experiments
Want to determine robustness
For each base file tested…
Train to obtain an HMM and threshold
Morph the base file at various percentages
o Using various morphing strategies
o We refer to this morphing as tampering
Score each tampered copy
o Classify it, based on the threshold
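The classification step can be sketched as below, assuming the tampered copies' scores are grouped by tamper percentage and classified with the same strict rule (score > threshold). `detection_rates` and `first_failure` are hypothetical helpers, and the 50% cutoff for calling detection "failed" is an illustrative assumption, not a value from the slides.

```python
def detection_rates(scores_by_pct, threshold):
    """Map each tamper percentage to the fraction of tampered copies still
    classified as a match (score strictly above the threshold)."""
    return {pct: sum(s > threshold for s in scores) / len(scores)
            for pct, scores in scores_by_pct.items()}


def first_failure(rates, min_rate=0.5):
    """Smallest tamper percentage whose detection rate falls below min_rate,
    or None if detection never drops that far. min_rate is an assumed cutoff."""
    for pct in sorted(rates):
        if rates[pct] < min_rate:
            return pct
    return None
```

`first_failure` gives one way to read off "how much morphing is required before detection fails" for a single base file.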
Experiments
[Figure: Scoring tampered files]
Experiment Details
For each base file
o 6 models
o 10 tamper percentages for each model
o 100 files at each percentage
o So, 6000 scores!
Experiment Details
Tested 10 base files for each data point
o So, 60,000 scores computed…
Experiment Details
Repeated the entire experiment 6 times
o Using a different number of blocks in the training phase
o The number of training blocks made little difference to the scores
o So, here we only give results where 1 block is used in the training phase
In total, 360,000 scores computed
o And 360 “models” generated
o That is, 1800 HMMs (one per fold)
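The counts on these slides multiply out as follows. The totals for scores and HMMs come from the slides; treating the 360 "models" as 10 base files × 6 models × 6 repetitions is our reading, which is consistent with 1000 scores per model (10 tamper percentages × 100 files).

```python
# Counts stated on the slides; the breakdown of "models" is an interpretation.
base_files = 10
models_per_base = 6
tamper_percentages = 10
files_per_percentage = 100
repetitions = 6          # one run per training-block setting
folds = 5                # one HMM per fold

scores_per_base = models_per_base * tamper_percentages * files_per_percentage  # 6000
scores_per_run = base_files * scores_per_base                                  # 60,000
total_scores = repetitions * scores_per_run                                    # 360,000
total_models = base_files * models_per_base * repetitions                      # 360
total_hmms = total_models * folds                                              # 1800

print(scores_per_base, scores_per_run, total_scores, total_models, total_hmms)
```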
Results: Bar Graph
Results: 3-d Plot
Conclusions
Results look very promising
o Robust: a high degree of morphing is required before the base file goes undetected
o Practical: only requires the exe, no special effort when developing
o Applies to any exe, at any time
Overall, a strong software “birthmark” strategy with practical implications
Future Work
Statistical analysis is somewhat weak
o Results may be stronger than they appear
Many other scores/combinations of scores can be tested
o Results can only get better
Consider other morphing techniques
o And other file types (e.g., bytecode)
o And mitigations for 1-block morphing
…
References
S. Kazi and M. Stamp, Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective, 22:140-149, 2013