Value-Based Program Characterization and Its Application to Software Plagiarism Detection
![Page 1: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/1.jpg)
Value-Based Program Characterization and Its Application to Software Plagiarism Detection
Embedded Lab.
Park Yeongseong
ICSE 2011
Yoon-Chan Jhi, Xinran Wang, Sencun Zhu, Peng Liu, Dinghao Wu
Penn State University
Xiaoqi Jia
State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences
![Page 2: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/2.jpg)
Contents
Introduction
State of the art
Core values
Design
Experiment
Discussion
Conclusion
Q&A
![Page 3: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/3.jpg)
![Page 4: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/4.jpg)
Introduction
Identifying the same or similar code is very important
Previous work
◦ Static source code comparison – C1
◦ Static executable code comparison – C2
◦ Dynamic control flow based methods – C3
◦ Dynamic API based methods – C4
![Page 5: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/5.jpg)
Introduction
Three highly desired requirements
◦ R1 – Resiliency
◦ R2 – Ability to work directly on binary executables
◦ R3 – Platform independence
However, each existing approach fails at least one requirement (unmet requirements listed)
◦ Static source code comparison (C1): R1, R2
◦ Static executable code comparison (C2): R1
◦ Dynamic control flow based methods (C3): R1, R3
◦ Dynamic API based methods (C4): R3
![Page 6: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/6.jpg)
Introduction
Introduce a new approach
◦ Core-values
5 optimization options (-O0 to -O3, -Os)
3 compilers (GCC, TCC, WCC)
KlassMaster, Thicket, Loco/Diablo obfuscators
![Page 7: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/7.jpg)
State of the art
Code Obfuscation Techniques
◦ Data obfuscation, control obfuscation, layout obfuscation and preventive transformations
◦ Indirect branches, control-flow flattening, function-pointer aliasing
Static Analysis Based Plagiarism Detection
◦ String-based
◦ AST-based
◦ Token-based
◦ PDG-based
◦ Birthmark-based
![Page 8: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/8.jpg)
State of the art
Dynamic Analysis Based Plagiarism Detection
◦ Whole program path based (WPP)
◦ Sequence of API function calls birthmark (EXESEQ)
◦ Frequency of API function calls birthmark (EXEFREQ)
◦ System call based birthmark
![Page 9: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/9.jpg)
Core values
Runtime values
◦ The output operands of the machine instructions executed
Core values
◦ Constructed from runtime values
Eliminate non-core values
◦ If a value is not derived from the program input, it is not a core-value of the program
◦ If a value is not in the set of runtime values of a semantically equivalent program, it is not a core-value of the program
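The two elimination rules on this slide can be sketched as a simple filter. This is only an illustration under stated assumptions: the `RuntimeValue` record, its `derived_from_input` flag, and the idea of passing in value sets collected from semantically equivalent variants are hypothetical names chosen here, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeValue:
    """One output operand observed during execution (hypothetical record)."""
    value: int
    derived_from_input: bool  # assumed flag: was this value computed from the input?


def candidate_core_values(trace, variant_value_sets):
    """Apply both elimination rules and return the surviving values in order.

    trace: runtime values of the program under analysis.
    variant_value_sets: one set of runtime values per semantically
        equivalent variant of the same program (an assumption of this sketch).
    """
    kept = []
    for rv in trace:
        # Rule 1: a value not derived from the input is not a core-value.
        if not rv.derived_from_input:
            continue
        # Rule 2: a value missing from any equivalent variant is not a core-value.
        if any(rv.value not in values for values in variant_value_sets):
            continue
        kept.append(rv.value)
    return kept


trace = [
    RuntimeValue(7, True),
    RuntimeValue(4096, False),  # dropped by rule 1
    RuntimeValue(21, True),
    RuntimeValue(99, True),     # dropped by rule 2 (absent from the first variant)
]
variants = [{7, 21, 5}, {7, 21, 99, 3}]
print(candidate_core_values(trace, variants))  # [7, 21]
```

The survivors are only candidates: both rules remove values that cannot be core-values, but passing them does not prove that a value is one.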
![Page 10: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/10.jpg)
Core values
![Page 11: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/11.jpg)
Design - Value Sequence Extraction
Not all values associated with the execution of a program are core-values
◦ Value-updating instruction
◦ Related to the program’s semantics
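A minimal sketch of what extracting a value sequence from an execution trace might look like. The trace format (opcode, output value, input-derived flag) and the opcode set below are assumptions made for illustration; the slide does not list which instructions count as value-updating.

```python
# Assumed set of value-updating opcodes (arithmetic, logic, shifts).
# Data-movement and control-flow opcodes are deliberately left out.
VALUE_UPDATING = {"add", "sub", "mul", "and", "or", "xor", "shl", "shr"}


def extract_value_sequence(trace):
    """Keep output values of value-updating instructions whose result
    depends on the program input; order of execution is preserved."""
    return [
        out
        for opcode, out, from_input in trace
        if opcode in VALUE_UPDATING and from_input
    ]


# Toy trace: (opcode, output operand value, derived from input?)
toy_trace = [
    ("mov", 5, True),    # data movement only, skipped
    ("add", 12, True),   # kept
    ("xor", 0, False),   # not derived from input, skipped
    ("shl", 48, True),   # kept
    ("jmp", 0, True),    # control flow, skipped
]
print(extract_value_sequence(toy_trace))  # [12, 48]
```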
![Page 12: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/12.jpg)
Design - Value Sequence Refinement and Similarity Metric
To refine value sequences
◦ Sequential refinement – reduction rate 16% to 34%
◦ Optimization-based refinement – 5 optimization options
◦ Address removal – exclude pointer values
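The slide names a similarity metric without giving its formula. As a hedged illustration only, the sketch below removes address-like values and then scores two sequences by the length of their longest common subsequence (LCS) divided by the plaintiff sequence length. The LCS-based score, the address range, and all function names are assumptions of this sketch.

```python
def remove_addresses(values, low, high):
    """Drop values inside an assumed address range [low, high)."""
    return [v for v in values if not (low <= v < high)]


def lcs_length(a, b):
    """Length of the longest common subsequence, row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(plaintiff, suspect):
    """Assumed score in [0, 1]: share of the plaintiff sequence found, in order, in the suspect."""
    if not plaintiff:
        return 0.0
    return lcs_length(plaintiff, suspect) / len(plaintiff)


LOW, HIGH = 0x8000000, 0x9000000  # hypothetical address range for this toy example
plaintiff = remove_addresses([3, 0x8048000, 9, 27, 81], LOW, HIGH)
suspect = remove_addresses([3, 5, 9, 0x8049F00, 27, 81, 2], LOW, HIGH)
unrelated = [1, 2, 3]

print(similarity(plaintiff, suspect))    # 1.0
print(similarity(plaintiff, unrelated))  # 0.25
```

An order-preserving score such as this one tolerates extra values inserted into the suspect sequence, which is one reason it suits comparison against obfuscated code.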
![Page 13: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/13.jpg)
Design - Overview
![Page 14: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/14.jpg)
Experiment
Intel Quad-Core 2.00 GHz CPU
4 GB RAM
Linux machine
QEMU 0.9.1
Questions
1. Resilient
2. False accusation
3. Credible
![Page 15: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/15.jpg)
Experiment - Obfuscation tool (resiliency)
Obfuscation techniques
◦ SandMark, KlassMaster: Java bytecode obfuscators
Test application: JLex
◦ Lexical analyzer
![Page 16: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/16.jpg)
Experiment - Similar Programs (false accusation)
Test application
◦ 5 individual XML parsers: expat, libxml2, Parsifal, rxp, xercesc
![Page 17: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/17.jpg)
Experiment - Different Programs (credible)
Test application
◦ bzip2, gzip, oggenc, 9 of 11 programs
Result
◦ Similarity scores between 0 and 0.27
◦ zip and gzip similarity scores are 1.0
  Same compression algorithm: deflate
◦ zip and bzip2 similarity scores are 0.01 to 0.03
  Different compression algorithm: block sorting
![Page 18: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/18.jpg)
Conclusion
Introduces a novel approach to dynamic characterization of executable programs.
The value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, KlassMaster, and Thicket.
![Page 19: Value-Based Program Characterization and Its Application to Software Plagiarism Detection](https://reader030.fdocuments.net/reader030/viewer/2022033100/568166c5550346895ddad238/html5/thumbnails/19.jpg)
Q&A