Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing:...

65
Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher Virus Bulletin 2008, Ottawa

Transcript of Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing:...

Page 1: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Graph, Entropy and Grid Computing:

Automatic Comparison of Malware

Ismael Briones

Aitor Gomez

PandaLabs Malware Researcher

Virus Bulletin 2008, Ottawa

Page 2: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ChallengesChallenges

�� Identify new samplesIdentify new samples�� AutomaticallyAutomatically�� ASAPASAP

210/7/2008

�� Improve detection rateImprove detection rate�� Malware nomenclatureMalware nomenclature

Page 3: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

State of the ArtState of the Art

�� Ero Carrera & Gergely ErdélyiEro Carrera & Gergely Erdélyi��Digital Genome MappingDigital Genome Mapping

��Halvar FlakeHalvar Flake��Lot of researches in graphs analysis Lot of researches in graphs analysis

310/7/2008

��Lot of researches in graphs analysis Lot of researches in graphs analysis ��VxClassVxClass

��Marius GheorghescuMarius Gheorghescu��An Automated Virus Classification systemAn Automated Virus Classification system

Page 4: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

The SystemThe System

Grid Manager

410/7/2008

Grid Clients

ComparisonEngine

Page 5: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

The SystemThe System

Grid Manager

Gaobot Family

510/7/2008

Grid Clients

ComparisonEngine

Page 6: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Grid SystemGrid System

��Analyse as many samples as possibleAnalyse as many samples as possible��IDA Analysis take too much timeIDA Analysis take too much time��BOINC, GLOBUS, ARC (Advanced Resource Connector)BOINC, GLOBUS, ARC (Advanced Resource Connector)

610/7/2008

��BOINC, GLOBUS, ARC (Advanced Resource Connector)BOINC, GLOBUS, ARC (Advanced Resource Connector)��XMLRPCXMLRPC

Page 7: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Grid Server Grid Client

1) Agent2) Join Grid

710/7/2008

Page 8: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Grid Server Grid Client

1) Agent2) Join Grid

3) Send soft.

Grid Server Grid Client

7) Graph2DB

8) Requestsamples

10) Start comparison

810/7/2008

4) Req. task

5) Sample

Analize Sample

9) Send graphsinfo

comparison

11) Send Result

Page 9: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

910/7/2008

Page 10: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1010/7/2008

Page 11: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1110/7/2008

Page 12: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

Adjacency MatrixAdjacency Matrix

1210/7/2008

Columns & rows = graph nodes(i,j) = 1, if i calls j

Page 13: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1310/7/2008

�Functions Control Flow Graph (CFG) signature

Page 14: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1410/7/2008

�Functions Control Flow Graph (CFG) signature

Page 15: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph Block 3Block 3

Block 2Block 2

Block 1Block 1 API 1API 1Signature: [6:8:3]Signature: [6:8:3]

1510/7/2008

�Functions Control Flow Graph (CFG) signatureBlock 5Block 5

Block 6Block 6

Block 4Block 4

API 2API 2

API 3API 3

Page 16: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1610/7/2008

�Functions Control Flow Graph (CFG)

�Function’s crc32

Page 17: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

CRC32(Push+mov+push+push+push+call+…)

1710/7/2008

�Functions Control Flow Graph (CFG)

�Function’s crc32

Page 18: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample AnalysisSample Analysis

�Ida Pro + IdaPython

�Flow Graph

1810/7/2008

�Functions Control Flow Graph (CFG)

�Function’s crc32

�Operating Systems and Library Calls (API’s)

�Functions names (sub_, nullsub_,….)

Page 19: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Comparison AlgorithmComparison Algorithm

�Select best samples

•Compiler type (Peid Signature)

1910/7/2008

•Compiler type (Peid Signature)•File Size•Number Api Functions•Number custom functions•Checksum & Entropy

Page 20: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum‘A checksum is a form of redundancy check, […] It ‘A checksum is a form of redundancy check, […] It works by adding up the basic components of a works by adding up the basic components of a message, typically the assorted bits message, typically the assorted bits [in our case [in our case each byte]each byte], and storing the resulting value.’ , and storing the resulting value.’

2010/7/2008

[from wikipedia][from wikipedia]

Page 21: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum PropertiesChecksum Properties��Similar blocks has very close checksum valuesSimilar blocks has very close checksum values

Checksum: 0x260A

2110/7/2008

Checksum: 0x276D

Page 22: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum PropertiesChecksum Properties��Similar blocks has very close checksum valuesSimilar blocks has very close checksum values

Checksum: 0x260A

2210/7/2008

Checksum: 0x276D

Checksum Diff: 0x163Checksum Diff: 0x163

Page 23: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum disadvantagesChecksum disadvantages��A checksum is not changed by:A checksum is not changed by:

��Inserting or deleting zeroInserting or deleting zero--valued bytesvalued bytes

0xAA+0xBB = 0x1650xAA+0x00+0xBB+0x00=0x165

2310/7/2008

��Reordering of the bytes in the data.Reordering of the bytes in the data.

0xAA+0xBB = 0x1650xAA+0x00+0xBB+0x00=0x165

0xAA+0xBB = 0x1650xBB+0xAA=0x165

Page 24: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Checksum value depends on 2 factorsChecksum value depends on 2 factors::

��Size of data blockSize of data block��Base (basic component: bits, bytes,…)Base (basic component: bits, bytes,…)

256 (1 byte)

255 x 1024 = 0x03FC00(3 bytes)

2410/7/2008

1024 bytes

256 (1 byte)

65.536 (2 bytes)

4.294.976.296 (4 bytes)

4,294,967,295 x 256 =0xFFFFFFFF00 (5 bytes)

65,535 x 512 = 0x01FFFE00(4 bytes)

Page 25: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Checksum value depends on 2 factorsChecksum value depends on 2 factors::

��Size of data blockSize of data block��Base (basic component: bits, bytes,…)Base (basic component: bits, bytes,…)

256 (1 byte)

255 x 1024 = 0x03FC00(3 bytes)Bigger base implies a

2510/7/2008

1024 bytes

256 (1 byte)

65.536 (2 bytes)

4.294.976.296 (4 bytes)

4,294,967,295 x 256 =0xFFFFFFFF00 (5 bytes)

65,535 x 512 = 0x01FFFE00(4 bytes)

Bigger base implies a bigger checksum for the same data block

Page 26: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Bigger base involve a Bigger checksumBigger base involve a Bigger checksum�Decreases the probability of collision (same

checksum)

2610/7/2008

Page 27: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Bigger base involve bigger checksum differenceBigger base involve bigger checksum difference�� Similar data blocks could get significantly different checksum Similar data blocks could get significantly different checksum

valuesvalues

2710/7/2008

Page 28: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Bigger base involve bigger checksum differenceBigger base involve bigger checksum difference�� Similar data blocks could get significantly different checksum Similar data blocks could get significantly different checksum

valuesvalues

As code is represented as bytes, our approach will use a base of 256

2810/7/2008

As code is represented as bytes, our approach will use a base of 256

(1byte)

Page 29: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Our ChecksumOur Checksum��4KB, in blocks of 1KB, 4KB, in blocks of 1KB, from the beginning of the from the beginning of the 1st , 2nd and last 1st , 2nd and last sectionssections

2910/7/2008

Page 30: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum AlgorithmChecksum Algorithm

��Substract checksum from both filesSubstract checksum from both files

3010/7/2008

Page 31: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample A Sample B

ABS(ChksmA – ChksmB) <= EntropyDiff

3110/7/2008

Page 32: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample A Sample B

ABS(ChksmA – ChksmB) <= EntropyDiff

3210/7/2008

Page 33: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample A Sample B

3310/7/2008

ABS(ChksmA – ChksmB) <= EntropyDiff

Page 34: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Sample A Sample B

Sample Selected

3410/7/2008

Sample Selected

Page 35: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum vs EntropyChecksum vs Entropy

3510/7/2008

Page 36: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum Collisions & EntropyChecksum Collisions & Entropy

High entropyLots of collisions

3610/7/2008

Low entropyLess collisions

Low entropyLess collisions

Page 37: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Checksum Collisions & EntropyChecksum Collisions & Entropy

High entropyLots of collisions

Medium-low entropy

3710/7/2008

Low entropyLess collisions

Low entropyLess collisions

Medium-low entropy contain code

Page 38: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Penalized checksumPenalized checksum

��Penalized checksum:Penalized checksum:��Don’t use blocks with high entropyDon’t use blocks with high entropy

3810/7/2008

��Minimal checksum changes in highMinimal checksum changes in high--entropy entropy blocks will generate distant checksum values.blocks will generate distant checksum values.

Page 39: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Comparison AlgorithmComparison Algorithm

�Select best samples

�Start comparison:�Graph Comparison

3910/7/2008

�Graph Comparison

Page 40: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4

f1f1 00 11 00 11

f2f2 00 00 00 00

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4

f1’f1’ 00 11 00 11

f2’f2’ 00 00 00 00

4010/7/2008

f3f3 11 00 00 00

f4f4 00 00 11 00

f3’f3’ 11 00 00 00

f4’f4’ 11 00 00 00

Page 41: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4

f1f1 00 11 00 11

f2f2 00 00 00 00

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4

f1’f1’ 00 11 00 11

f2’f2’ 00 00 00 00Common functions:Common functions:

••API FunctionsAPI Functions

4110/7/2008

f3f3 11 00 00 00

f4f4 00 00 11 00

f3’f3’ 11 00 00 00

f4’f4’ 11 00 00 00

••API FunctionsAPI Functions••Same and unique CRC32Same and unique CRC32••Same and unique CFGSame and unique CFG

Page 42: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4

f1f1 00 11 00 11

f2f2 00 00 00 00

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4

f1’f1’ 00 11 00 11

f2’f2’ 00 00 00 00

4210/7/2008

f3f3 11 00 00 00

f4f4 00 00 11 00

f3’f3’ 11 00 00 00

f4’f4’ 11 00 00 00

Page 43: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4

f1f1 00 11 00 11

f2f2 00 00 00 00

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4

f1’f1’ 00 11 00 11

f2’f2’ 00 00 00 00

4310/7/2008

f3f3 11 00 00 00

f4f4 00 00 11 00

f3’f3’ 11 00 00 00

f4’f4’ 11 00 00 00

f1=f1’ and f1 != {f2’,f3’,f4’}f1=f1’ and f1 != {f2’,f3’,f4’}

Page 44: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4 f1f1

f2f2 00 00 00 00 11

f3f3 11 00 00 00 11

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4 f1’f1’

f2’f2’ 00 00 00 00 11

f3’f3’ 11 00 00 00 11

4410/7/2008

f3f3 11 00 00 00 11

f4f4 00 00 11 00 00

f3’f3’ 11 00 00 00 11

f4’f4’ 11 00 00 00 00

Page 45: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4 f1f1

f2f2 00 00 00 00 11

f3f3 11 00 00 00 11

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4 f1’f1’

f2’f2’ 00 00 00 00 11

f3’f3’ 11 00 00 00 11

4510/7/2008

f3f3 11 00 00 00 11

f4f4 00 00 11 00 00

f3’f3’ 11 00 00 00 11

f4’f4’ 11 00 00 00 00

f2=f2’ and f2 != {f3’,f4’}f2=f2’ and f2 != {f3’,f4’}

Page 46: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Matrix AMatrix A

a1a1 a2a2 a3a3 a4a4 f1f1 f2f2

f3f3 11 00 00 00 11 11

f4f4 00 00 11 00 00 11

��Matrix BMatrix B

a1a1 a2a2 a3a3 a4a4 f1’f1’ f2’f2’

f3’f3’ 11 00 00 00 11 00

f4’f4’ 11 00 00 00 00 11

4610/7/2008

f4f4 00 00 11 00 00 11 f4’f4’ 11 00 00 00 00 11

f3 != {f3’,f4’} and f4 != {f3’,f4’}f3 != {f3’,f4’} and f4 != {f3’,f4’}

Two functions have been identified: f1 and f2Two functions have been identified: f1 and f2

Page 47: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Comparison AlgorithmComparison Algorithm

�Select best samples

�Start comparison:�Graph Comparison

4710/7/2008

�Graph Comparison

�Match more functions with CFG

Page 48: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��CFG signature = 3CFG signature = 3--tuple vector in Euclidean spacetuple vector in Euclidean space

��Find minimal and unique ED among functionsFind minimal and unique ED among functions

Control Flow GraphControl Flow Graph

4810/7/2008

ThreeThree--dimensionaldimensionalEuclidean distanceEuclidean distance

Page 49: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Comparison AlgorithmComparison Algorithm

�Select best samples

�Start comparison:�Graph Comparison

4910/7/2008

�Graph Comparison

�Match more functions with CFG

�Index of Similarity

Page 50: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Index of SimilarityIndex of Similarity��Measure how close two binaries are.Measure how close two binaries are.

5010/7/2008

Page 51: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Index of SimilarityIndex of Similarity

Index of similarity vs number of matched functionsIndex of similarity vs number of matched functions

First Exp.

5110/7/2008

Second Exp.

Page 52: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ImprovementsImprovements

��More Initial Fixed PointsMore Initial Fixed Points

5210/7/2008

Page 53: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

More Fixed PointsMore Fixed Points

SPP (Small Prime Product)SPP (Small Prime Product)

Identical String referencesIdentical String references

5310/7/2008

In/Out degree (similar number of calls to/calls from)In/Out degree (similar number of calls to/calls from)

Match same name (sub_XXXXXX) if same CRC32Match same name (sub_XXXXXX) if same CRC32

Stack Frame Size (similar stack frame size)Stack Frame Size (similar stack frame size)

Page 54: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ImprovementsImprovements

��More Initial Fixed PointsMore Initial Fixed Points

��Python version improvementsPython version improvements

5410/7/2008

Page 55: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Psyco Psyco –– a Justa Just--InIn--Time compilerTime compiler

Python version improvementsPython version improvements

��Extend Python code with CExtend Python code with C

5510/7/2008

��From Python to C/C++From Python to C/C++��Develop the comparison algorithm in C/C++Develop the comparison algorithm in C/C++

Page 56: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ImprovementsImprovements

��Python version improvementsPython version improvements

��Matrix rows as bitsMatrix rows as bits

��More Initial Fixed PointsMore Initial Fixed Points

5610/7/2008

��Matrix rows as bitsMatrix rows as bits

Page 57: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

��Avoid string comparison (string comparison is a timeAvoid string comparison (string comparison is a time--consuming task)consuming task)

��Treat rows as groups of bits (100110011101…)Treat rows as groups of bits (100110011101…)

��Split rows as groups of 32 bitsSplit rows as groups of 32 bits

Matrix rows as bitsMatrix rows as bits

5710/7/2008

��Compare as integers (integer comparison is faster)Compare as integers (integer comparison is faster)

Page 58: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ImprovementsImprovements

��Python version improvementsPython version improvements

��Matrix rows as bitsMatrix rows as bits

��More Initial Fixed PointsMore Initial Fixed Points

5810/7/2008

��Matrix rows as bitsMatrix rows as bits

��Stream SIMD extensionsStream SIMD extensions

Page 59: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Streaming SIMD extensionsStreaming SIMD extensions

��SSE added eight 128SSE added eight 128--bits registers: bits registers: XMM0XMM0--XMM7XMM7

��Each register packs together four 32Each register packs together four 32--bit integersbit integers

5910/7/2008

��Compare 4x32 (four integers) in one instruction:Compare 4x32 (four integers) in one instruction:��__m128 _mm_cmpeq_ps(__m128 a, __m128 b)__m128 _mm_cmpeq_ps(__m128 a, __m128 b)

Page 60: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

ImprovementsImprovements

��Python version improvementsPython version improvements

��Matrix rows as bitsMatrix rows as bits

��More Initial Fixed PointsMore Initial Fixed Points

6010/7/2008

��Matrix rows as bitsMatrix rows as bits

��Stream SIMD extensionsStream SIMD extensions

��NVIDIA CudaNVIDIA Cuda

Page 61: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

NVIDIA CudaNVIDIA Cuda��Cuda: compiler and SDK for NVIDIA GPUsCuda: compiler and SDK for NVIDIA GPUs

��GPUs:GPUs:��Parallel "manyParallel "many--core" architecturecore" architecture��Each core: thousands of threads simultaneouslyEach core: thousands of threads simultaneously

6110/7/2008

��Each core: thousands of threads simultaneouslyEach core: thousands of threads simultaneously

��Not tried yet. Future development:Not tried yet. Future development:

��Code algorithm for graphics processing unit (GPU)Code algorithm for graphics processing unit (GPU)

��Launch one thread for each compared sampleLaunch one thread for each compared sample

��Launch a thread for each compared rowLaunch a thread for each compared row

Page 62: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

PossibilitiesPossibilities

��Shared code => generic signatureShared code => generic signature

��Port information from malware databasePort information from malware database

6210/7/2008

��Shared code => generic signatureShared code => generic signature�Functions with same CRC32

��Clusterization of malware automaticallyClusterization of malware automatically�Classify unknown samples (index of similarity)

Page 63: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

BarriersBarriers

�Hard bound to IDA

�Only unpacked samples

6310/7/2008

�Hard bound to IDA�Big database storage�Delphi and VB samples don’t work well

Page 64: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Demo

6410/7/2008

Page 65: Graph, Entropy and Grid Computing: Automatic Comparison … · Graph, Entropy and Grid Computing: Automatic Comparison of Malware Ismael Briones Aitor Gomez PandaLabs Malware Researcher

Questions?

Thank you very much

6510/7/2008

Questions?

[email protected]@pandasecurity.com