Selected Topics in Automated Diversity


Page 1: Selected Topics in Automated Diversity

Carnegie Mellon

Selected Topics in Automated Diversity

Stephanie Forrest, University of New Mexico

Mike Reiter and Dawn Song, Carnegie Mellon University

Page 2: Selected Topics in Automated Diversity

Automated Diversity for Security

Computer systems are highly uniform:

· Easy targets for standardized attacks.

Use the idea of biological diversity:

· Introduce changes that make each system unique.
· An attack will need to be rewritten for each computer.
· Provide population resilience to unknown environmental threats.

Two approaches:

· Interface diversity: adapt vulnerable interfaces such as machine language, system call numbers, and standard library locations.
· Implementation diversity: utilize diverse implementations of common services.

Two projects:

· Randomized instruction set emulation [Barrantes, Ackley and Forrest]
· Behavioral distance for anomaly detection [Gao, Reiter and Song]

Page 3: Selected Topics in Automated Diversity

Randomized Instruction Set Emulation (RISE)

An example of interface diversity.

Many current attacks insert binary code into a running program, which is then executed. RISE protects the code itself, rather than points of entry:

· Perimeter defense (e.g., stack protection) is not enough.

Randomize the binary code instruction set for every program:

· Foreign malicious code will try to execute code in the standard format and will fail.
· Knowledge of a particular translation will gain access only to that particular program.

Modify the compiler/virtual machine to accept this "new" language:

· Prototype in the open-source binary-to-binary translator Valgrind.
· Related to encrypting compilers.
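The randomization idea can be illustrated with a toy sketch. This is not the RISE implementation: it assumes the per-program randomization is a byte-wise XOR against a random mask, and the names `make_mask`, `scramble`, and `descramble` are hypothetical.

```python
import secrets


def make_mask(length: int) -> bytes:
    """Draw a fresh random mask; one mask per protected program (assumption)."""
    return secrets.token_bytes(length)


def scramble(code: bytes, mask: bytes) -> bytes:
    """XOR every code byte with the mask, repeating the mask if it is shorter."""
    return bytes(b ^ mask[i % len(mask)] for i, b in enumerate(code))


# XOR is its own inverse, so the emulator can descramble with the same operation.
descramble = scramble


if __name__ == "__main__":
    legitimate = bytes([0x55, 0x89, 0xE5, 0xC3])  # example bytes standing in for program code
    injected = bytes([0x31, 0xC0, 0x50, 0x68])    # attacker bytes written in the standard format
    mask = make_mask(len(legitimate))

    stored = scramble(legitimate, mask)            # done once, when the program is loaded
    print(descramble(stored, mask) == legitimate)  # scrambled legitimate code is recovered
    print(descramble(injected, mask).hex())        # injected code is turned into other bytes
```

Because injected bytes were never scrambled, the descrambling step applied before execution changes them into bytes the attacker did not choose, which is why knowing one program's mask does not help against another program.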

Page 4: Selected Topics in Automated Diversity

How does foreign code infect a running program?

Page 8: Selected Topics in Automated Diversity

Results

Prototype implementation available under the GPL from http://www.cs.unm.edu/~immsec:

· Normal code runs properly.
· Binary code injection attacks stopped (100% of tested examples).

Performance (preliminary):

· Emulation overhead of Valgrind is high.
· Incremental cost of RISE is small.
· (Very) roughly a factor of 2 slowdown in the current configuration.
· Significant space penalty:
  · Libraries
  · Mask

Page 10: Selected Topics in Automated Diversity

Host-Based Anomaly Detector

[Figure: system call requests (e.g., 3, 5, 11) cross from user space into kernel space; a model answers "Is this system call request anomalous?" (Y/N).]

Can we use another computer as the model?

Page 11: Selected Topics in Automated Diversity

Fault-Tolerant System

[Figure: a request is sent to several replicas; their responses go through output voting to produce a single response.]

· Commercial off-the-shelf applications may not produce the same responses.
· Some intrusions do not result in an observable deviation in the responses.
· Need to observe the behavior.
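Output voting can be sketched in a few lines. This is only an illustration under assumptions: responses are compared as exact strings, a strict majority is required, and the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(responses):
    """Return the response given by a strict majority of replicas, or None."""
    if not responses:
        return None
    winner, count = Counter(responses).most_common(1)[0]
    return winner if count > len(responses) / 2 else None
```

A voter like this only sees the final responses, so an intrusion that leaves the response unchanged passes unnoticed. That gap is the reason to also observe behavior.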

Page 12: Selected Topics in Automated Diversity

The Problem

[Figure: two observed system call sequences, "3 43 5 3 4" and "9 6 302 10 46 6 222". Match?]

Diverse platform (Linux and Windows):

· System call numbers observed do not have semantic meanings.
· System calls may not have a one-to-one correspondence.
· System call sequences may have different lengths.

Diverse implementation (Apache and Abyss):

· Correspondence may not exist between individual system calls.

Page 13: Selected Topics in Automated Diversity

Evolutionary Distance

· Are two DNA sequences derived from a common ancestral sequence?
· Evolutionary distance between two DNA sequences: substitutions, deletions, insertions.

ATGCGTCGTTATCCGCGAT

ATGC-GTCGTTAT-CCG-CGAT

The "-" characters are insertion/deletion (I/D) symbols.

Cost table:

      A     C     G     T     -
A     0     -     -     -     -
C     0.3   0     -     -     -
G     0.1   0.1   0     -     -
T     0.2   0.2   0.1   0     -
-     0.3   0.6   0.5   0.8   0

$\mathrm{Dist} = 0.5 + 0.6 + 0.8 + 0.2 = 2.1$
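As a rough illustration of how a cost table prices an alignment, the sketch below sums per-position costs for two already-aligned strings. It uses the table values from the slide; treating the table as symmetric is an assumption, and the names `COST`, `cost`, and `alignment_distance` are hypothetical.

```python
# Lower-triangular cost table from the slide; "-" is the I/D symbol.
COST = {
    ("C", "A"): 0.3,
    ("G", "A"): 0.1, ("G", "C"): 0.1,
    ("T", "A"): 0.2, ("T", "C"): 0.2, ("T", "G"): 0.1,
    ("-", "A"): 0.3, ("-", "C"): 0.6, ("-", "G"): 0.5, ("-", "T"): 0.8,
}


def cost(a: str, b: str) -> float:
    """Symmetric lookup (assumption); identical symbols cost 0."""
    if a == b:
        return 0.0
    return COST[(a, b)] if (a, b) in COST else COST[(b, a)]


def alignment_distance(x: str, y: str) -> float:
    """Sum the per-position costs of two aligned, equal-length strings."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    return sum(cost(a, b) for a, b in zip(x, y))
```

For a made-up aligned pair such as "ATG-C" and "ACGTC", the only nonzero positions are T against C (0.2) and the I/D symbol against T (0.8).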

Page 14: Selected Topics in Automated Diversity

Behavioral Distance and Evolutionary Distance

Similarities:

· Both evaluate the difference between two sequences.
· Both use substitutions, deletions and insertions.

Differences:

· The same system call number in two sequences is not the "same" system call.
· We do not have the cost table in the behavioral distance measure.
· We have training data.

Page 15: Selected Topics in Automated Diversity

Behavioral Distance

· Behavioral distance calculation
· Learning the cost table
  · Initializing the cost table
  · Iteratively updating the cost table
· System call phrase extraction

Page 16: Selected Topics in Automated Diversity

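Behavioral Distance Calculation

$s_1 = \langle s_{1,1}, s_{1,2}, \ldots \rangle$

$s_2 = \langle s_{2,1}, s_{2,2}, \ldots \rangle$

$\mathrm{Ext}(s, n)$: the set of sequences obtained by inserting $n - \mathrm{len}(s)$ I/D symbols into $s$, at any location.

$$\mathrm{Dist}(s_1, s_2) = \min_{s'_1 \in \mathrm{Ext}(s_1, n),\ s'_2 \in \mathrm{Ext}(s_2, n)} \sum_{i=1}^{n} \mathrm{cost}(s'_{1,i}, s'_{2,i})$$

Example of extending a sequence with I/D symbols ("-"):

ATGCGTCGTTATCCGCGAT

ATGC-GTCGTTAT-CCG-CGAT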
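The minimum over all extended sequences does not have to be found by enumeration. A minimal sketch, assuming that pairing two I/D symbols costs nothing and that n is large enough to admit the best alignment, is the standard dynamic program for global sequence alignment. The function names and the placeholder `unit_cost` table are hypothetical; the slides do not say how the authors compute the minimum.

```python
def behavioral_distance(s1, s2, cost, gap="-"):
    """Minimum total cost over all alignments of s1 and s2.

    cost(a, b) is the cost-table lookup; cost(x, gap) prices an insertion/deletion.
    """
    m, n = len(s1), len(s2)
    # d[i][j] holds the cheapest alignment of s1[:i] with s2[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(s1[i - 1], gap)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost(gap, s2[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + cost(s1[i - 1], s2[j - 1]),  # match or substitution
                d[i - 1][j] + cost(s1[i - 1], gap),            # s1 symbol against an I/D symbol
                d[i][j - 1] + cost(gap, s2[j - 1]),            # I/D symbol against s2 symbol
            )
    return d[m][n]


def unit_cost(a, b):
    """Placeholder cost table: 0 for equal symbols, 1 otherwise (an assumption)."""
    return 0.0 if a == b else 1.0
```

With `unit_cost` the result is the ordinary edit distance; a learned cost table would replace it so that corresponding system calls on the two platforms become cheap to pair. The running time grows with the product of the two sequence lengths.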

Page 17: Selected Topics in Automated Diversity

Learning the Cost Table

Training data: subject the replicas to a battery of well-formed (benign) requests and observe the system calls induced.

Initializing the cost table:

· The first approach: comparing the semantics of individual system calls.
· The second approach: using frequency information.

Iteratively updating the cost table:

· Use the initialized cost table to calculate the behavioral distance between system call sequences in the training data.
· The results of the behavioral distance calculation reveal the "proper alignments" between system calls.
· Use these "proper alignments" to update the cost table.

Page 18: Selected Topics in Automated Diversity

System Call Phrases

· Correspondence may not exist between individual system calls.
· Behavioral distance calculation is very slow when sequences are long.

Solution: group system calls into system call phrases.

· System call phrases are also called system call subsequences.
· A system call phrase is a sequence of system calls that frequently appear together in program execution.
· TEIRESIAS algorithm (also taken from biology).
· The TEIRESIAS algorithm has been used in other intrusion/anomaly detection systems.
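To make the notion of a phrase concrete, the sketch below counts contiguous subsequences that recur across traces. It is a simplified stand-in and not the TEIRESIAS algorithm; the function name, the length bounds, and the count threshold are all hypothetical choices.

```python
from collections import Counter


def frequent_phrases(traces, min_len=2, max_len=4, min_count=2):
    """Return contiguous system call subsequences seen at least min_count times."""
    counts = Counter()
    for trace in traces:
        for length in range(min_len, max_len + 1):
            for start in range(len(trace) - length + 1):
                counts[tuple(trace[start:start + length])] += 1
    return {phrase for phrase, count in counts.items() if count >= min_count}
```

Once traces are rewritten as sequences of phrases, they are shorter, which reduces the cost of the distance calculation, and a phrase on one platform can be paired with a phrase on the other even when no single system call corresponds.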

Page 19: Selected Topics in Automated Diversity

Evaluation – Experimental Setup

[Figure: experimental setup with Linux and Windows XP hosts. Labeled steps:]

· Duplicate request
· Behavioral distance calculation
· Output voting

Page 20: Selected Topics in Automated Diversity

Behavioral Distance – Same Application

[Figure: two panels, "Apache Webserver" and "Myserver Webserver".]

Page 21: Selected Topics in Automated Diversity

Behavioral Distance – Different Application

[Figure: two panels.]

· Linux: Apache Webserver; Windows: Myserver Webserver
· Linux: Myserver Webserver; Windows: Apache Webserver

Page 22: Selected Topics in Automated Diversity

Behavioral Distance – Mimicry Attacks

Each cell gives the behavioral distance of the best mimicry attack / the true acceptance rate when the threshold is set to detect the best mimicry attack.

Server on Linux     | Apache               | Myserver             | Myserver             | Apache
Server on Windows   | Apache               | Myserver             | Apache               | Myserver

Attacker knows the individual IDS on one replica:

Mimicry on Linux    | 10.283194 / 99.9093% | 26.656983 / 100%     | 6.908590 / 99.4555%  | 32.764897 / 100%
Mimicry on Windows  | 6.842813 / 99.4555%  | 9.967780 / 99.4555%  | 13.354194 / 100%     | 5.280875 / 99.4555%

Attacker knows behavioral distance and the cost table:

Mimicry on Linux    | 3.736 / 98.9111%     | 13.657 / 100%        | 2.731 / 98.9111%     | 13.813 / 100%
Mimicry on Windows  | 2.65 / 98.7296%      | 2.174 / 98.0944%     | 2.187 / 98.9111%     | 2.64 / 97.8221%
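The true acceptance rate can be read as the fraction of benign request pairs whose behavioral distance stays below the alarm threshold. A minimal sketch follows, assuming an alarm is raised when the distance is greater than or equal to the threshold and that the threshold is set exactly at the best mimicry attack's distance. The function name and the sample distances in the usage note are made up for illustration.

```python
def true_acceptance_rate(benign_distances, best_mimicry_distance):
    """Fraction of benign distances accepted when the threshold just catches the attack."""
    if not benign_distances:
        raise ValueError("need at least one benign distance")
    threshold = best_mimicry_distance  # assumption: alarm when distance >= threshold
    accepted = sum(1 for d in benign_distances if d < threshold)
    return accepted / len(benign_distances)
```

For example, with hypothetical benign distances `[0.5, 1.0, 1.2, 2.0, 3.0]` and a best mimicry distance of 2.65, four of the five benign pairs fall below the threshold. A lower mimicry distance forces a lower threshold and therefore more false alarms on benign traffic.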

Page 23: Selected Topics in Automated Diversity

Performance Overhead

Page 24: Selected Topics in Automated Diversity

Conclusion

· Behavioral distance detects an attack on one process that causes its behavior to deviate from that of another.
· Behavioral distance makes evasion attacks more difficult, with moderate overhead.