Shuffler: Fast and Deployable Continuous Code Re …...Return-Oriented Programming ... permutation...

Post on 10-Aug-2020

9 views 0 download

Transcript of Shuffler: Fast and Deployable Continuous Code Re …...Return-Oriented Programming ... permutation...

1

Shuffler: Fast and DeployableContinuous Code Re-Randomization

David Williams-King,Graham Gobieski, Kent Williams-King, James P. Blake,

Xinhao Yuan, Patrick Colp, Michelle Zheng,Vasileios P. Kemerlis, Junfeng Yang, William Aiello

OSDI 2016

2

Software Remains Vulnerable

● High-profile server breaches are commonplace

3

Software Remains Vulnerable

● High-profile server breaches are commonplace● 90% of today’s attacks utilize ROP [1]

4

Return-Oriented Programming

● Reuse fragments of legitimate code (gadgets)

func_3

func_2

func_1

func_3

func_2

func_1

Program code

ret addr

Stack

5

Return-Oriented Programming

● Reuse fragments of legitimate code (gadgets)

Program code

ret addr

Stack

6

Return-Oriented Programming

● Reuse fragments of legitimate code (gadgets)

Stack

ret addrret addr

ret addrdata

Buffer Overrun

ret addr

Program code

7

Return-Oriented Programming

● Reuse fragments of legitimate code (gadgets)

ROP gadget chain

Stack

ret addrret addr

ret addrdata

Buffer Overrun

ret addr

Program code

8

Modern ROP Attacks

● JIT-ROP [2]: iteratively read code at runtime

9

Modern ROP Attacks

● JIT-ROP [2]: iteratively read code at runtime

func_3

func_2

func_1

Target program Attacker

func_3

func_2

func_1

10

Modern ROP Attacks

● JIT-ROP [2]: iteratively read code at runtimeTarget program Attacker

func_3

func_2

func_1

11

Modern ROP Attacks

● JIT-ROP [2]: iteratively read code at runtime

ROP gadget chain

Target program Attacker

Inject exploit

func_3

func_2

func_1

12

Modern ROP Attacks

● JIT-ROP [2]: iteratively read code at runtime

ROP gadget chain

Target program Attacker

Inject exploit

func_3

func_2

func_1

13

The Shuffler Idea

● What if we re-randomize code more rapidly than an attacker discovers gadgets?

func_3

func_2

func_1

func_3

func_2

func_1

14

The Shuffler Idea

● What if we re-randomize code more rapidly than an attacker discovers gadgets?

func_3

func_2

func_1

15

The Shuffler Idea

● What if we re-randomize code more rapidly than an attacker discovers gadgets?

func_3

func_2

func_1

func_3

func_2

func_1

16

The Shuffler Idea

● What if we re-randomize code more rapidly than an attacker discovers gadgets?

ROP gadget chain

Inject exploit

func_3

func_2

func_1

??

17

The Shuffler Idea

● What if we re-randomize code more rapidly than an attacker discovers gadgets?

ROP gadget chain

Inject exploit

18

How Is This Possible?

● Re-randomize code before an attacker uses it

19

How Is This Possible?

● Re-randomize code before an attacker uses it– faster than disclosure vulnerability execution time;

– faster than gadget chain computation time;

– or, faster than network communication time

20

How Is This Possible?

● Re-randomize code before an attacker uses it– faster than disclosure vulnerability execution time;

– faster than gadget chain computation time;

– or, faster than network communication time

21

How Is This Possible?

● Re-randomize code before an attacker uses it– faster than disclosure vulnerability execution time;

– faster than gadget chain computation time;

– or, faster than network communication time● one memory disclosure can only travel 820 miles!

22

What Is Shuffler?

● Defense based on continuous re-randomization– Defeats all known code reuse attacks

– 20-50 millisecond shuffling, scales to 24 threads

● Fast: bounds attacker’s available time– Defeats even attackers with zero network latency

● Deployable:– Binary analysis w/o modifying kernel, compiler, ...

● Egalitarian:– Shuffler runs in same address space, defends itself

23

Outline

24

Outline

1. Continuous re-randomization

2. Accelerating our randomization

3. Binary analysis and egalitarianism

4. Results and Demo

25

func_1

...call func_2...

Continuous Re-Randomization

● Easy to copy code & fix direct references

func_2

func_2

26

func_1

...call func_2...

Continuous Re-Randomization

● Easy to copy code & fix direct references

(deleted)

func_2

27

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

28

func_1

...mov $func_2, ptr...call *ptr...

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

func_2

ptr:

29

func_1

...mov $func_2, ptr...call *ptr...

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

func_2

&func_2ptr:

30

func_1

...mov $func_2, ptr...call *ptr...

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

func_2(deleted)func_2

&func_2ptr:

31

func_1

...mov $func_2, ptr...call *ptr...

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

&func_2ptr:

(deleted)func_2

32

Continuous Re-Randomization

● Easy to copy code & fix direct references● What about code pointers?

● How to update allpropagated pointers?

&func_2ptr:

func_2(deleted)

&func_2&func_2

&func_2

&func_2

&func_2

&func_2&func_2

func_2

33

Continuous Re-Randomization

● Solution: add extra level of indirection

f_2_idxptr:

func_2

f_2_idxf_2_idxf_2_idx

f_2_idx

...

%gs: (table)

...

&func_2

...

34

Continuous Re-Randomization

● Solution: add extra level of indirection

f_2_idxptr:

func_2

f_2_idxf_2_idxf_2_idx

f_2_idx

...

%gs: (table)

...

&func_2

...

f_2_idx

f_2_idx

f_2_idx

35

Continuous Re-Randomization

● Solution: add extra level of indirection

f_2_idxptr:

func_2

f_2_idxf_2_idxf_2_idx

f_2_idx

...

%gs: (table)

...

&func_2

...

f_2_idx

f_2_idx

f_2_idx

func_2

36

Continuous Re-Randomization

● Solution: add extra level of indirection

f_2_idxptr:f_2_idxf_2_idx

f_2_idxf_2_idx

...

%gs: (table)

...

&func_2

...

f_2_idx

f_2_idx

f_2_idx

func_2

(deleted)

37

Code Pointer Abstraction

● Transforming *code_ptr into **code_ptr– Correctness: pointer updates sound & precise

– Disclosure-resilience: code ptr table is hidden

38

Code Pointer Abstraction

● Transforming *code_ptr into **code_ptr– Correctness: pointer updates sound & precise

– Disclosure-resilience: code ptr table is hidden

f_2_idxptr: func_2

func_2

...%gs:

...

39

Code Pointer Abstraction

● Transforming *code_ptr into **code_ptr– Correctness: pointer updates sound & precise

– Disclosure-resilience: code ptr table is hidden

f_2_idxptr: func_2

func_2

...%gs:

...

mov $0x40054d, %rax

=> mov $0x20, %rax

Rewrite initialization pointsRewrite call sites callq *%rax

=> callq *%gs:(%rax)

40

Outline

1. Continuous re-randomization

2. Accelerating our randomization

3. Binary analysis and egalitarianism

4. Results and Demo

41

Return Address Encryption

● Return addresses are code pointers too● Could use code pointer table, but inefficient

– call/ret instructions highly optimized

42

Return Address Encryption

● Return addresses are code pointers too● Could use code pointer table, but inefficient

– call/ret instructions highly optimized

● Alternative mechanism – correct and hidden– Use normal call instructions

– Encrypt return addresses with XOR key

43

Return Address Encryption

● Prevent return address disclosure

44

Return Address Encryption

● Prevent return address disclosure

Thread Stack

ret addr

func_2

func_1

ret addr

ret addr

func_3

45

Return Address Encryption

● Prevent return address disclosure

Thread Stack

(encrypted)

func_2

func_1

(encrypted)

(encrypted)

func_3

+

+

+

XOR key

46

Return Address Encryption

● Prevent return address disclosure

func:

; original code

ret

Thread Stack

(encrypted)

func_2

func_1

(encrypted)

(encrypted)

func_3

+

+

+

XOR key

47

Return Address Encryption

● Prevent return address disclosure● We use binary rewriting (expand basic blocks)

func:mov %fs:0x28,%r11xor %r11,(%rsp); original codemov %fs:0x28,%r11xor %r11,(%rsp)ret

Thread Stack

(encrypted)

func_2

func_1

(encrypted)

(encrypted)

func_3

+

+

+

XOR key

48

Return Address Migration

● Unwind stack and re-encrypt new addresses

Thread Stack

(encrypted)

func_2func_1

(encrypted)

(encrypted)

+

+

+

XOR key

func_3

49

Return Address Migration

● Unwind stack and re-encrypt new addresses

Thread Stack

func_2func_1

func_2

func_1+

+

+

XOR key

func_3

func_3

(encrypted)

(encrypted)

(encrypted)

50

Return Address Migration

● Unwind stack and re-encrypt new addresses

Thread Stack

(deleted)(deleted)

func_2

func_1+

+

+

XOR key

(deleted)

func_3

(encrypted)

(encrypted)

(encrypted)

51

Asynchronous Randomization

52

Asynchronous Randomization

Computations

20ms shuffle period

● Creating new code copies takes time

53

Asynchronous Randomization

● Creating new code copies takes time

ComputationsGenerate

permutationMake newcode copy

Fix callinstructions

Update codepointer table

Stackunwind

15ms shuffling overhead5ms real work

54

Asynchronous Randomization

5ms real work

● Creating new code copies takes time● Shuffler prepares new code asynchronously

Generatepermutation

Make newcode copy

Fix callinstructions

Update codepointer table

Stackunwind

15ms shuffling overhead

Computations

55

Asynchronous Randomization

● Creating new code copies takes time● Shuffler prepares new code asynchronously

Stackunwind

Stackunwind

19.94ms real work 0.06ms

Computations Computations

Generatepermutation

Make newcode copy

Fix callinstructions

Update codepointer table

56

Asynchronous Randomization

● Creating new code copies takes time● Shuffler prepares new code asynchronously● Each thread unwinds its own stack in parallel

99.7% of runtime 0.3%

Computations

Generatepermutation

Make newcode copy

Fix callinstructions

Update codepointer table

Stackunwind

Stackunwind

Computations

57

Outline

1. Continuous re-randomization

2. Accelerating our randomization

3. Binary analysis and egalitarianism

4. Results and Demo

58

Augmented Binary Analysis

● Use additional info from unmodified compilers– Symbols, to distinguish code and data (no -s)

– Relocations, to find all code pointers (--emit-relocs)

59

Augmented Binary Analysis

● Use additional info from unmodified compilers– Symbols, to distinguish code and data (no -s)

– Relocations, to find all code pointers (--emit-relocs)

.section .rodata: .quad 0x400620

.section .text: mov $0x400620, %rax

Code pointer, or integer?

60

Augmented Binary Analysis

● Use additional info from unmodified compilers– Symbols, to distinguish code and data (no -s)

– Relocations, to find all code pointers (--emit-relocs)

.section .rodata: .quad 0x400620

.section .text: mov $0x400620, %rax

.section .rodata: .quad 4195872

.section .text: mov $4195872, %rax

Code pointer, or integer?

61

Augmented Binary Analysis

● Use additional info from unmodified compilers– Symbols, to distinguish code and data (no -s)

– Relocations, to find all code pointers (--emit-relocs)

.section .rodata: .quad 0x400620

.section .text: mov $0x400620, %rax

Code pointer, or integer?

Relocations (meta-data)

.section .rodata: .quad 4195872

.section .text: mov $4195872, %rax

62

Augmented Binary Analysis

● Use additional info from unmodified compilers– Symbols, to distinguish code and data (no -s)

– Relocations, to find all code pointers (--emit-relocs)● ask linker to preserve relocations

.section .rodata: .quad 0x400620

.section .text: mov $0x400620, %rax

Code pointer, or integer?

Relocations (meta-data)

.section .rodata: .quad 4195872

.section .text: mov $4195872, %rax

63

Augmented Binary Analysis

● Allows accurate and complete disassembly

64

Augmented Binary Analysis

● Allows accurate and complete disassembly● Many special cases, but we handle them

65

Where to Re-Randomize From

● Most defenses operate at higher privilege level– i.e. kernel, hypervisor, hardware

– Or else declare their own code “trusted”

66

Where to Re-Randomize From

● Most defenses operate at higher privilege level– i.e. kernel, hypervisor, hardware

– Or else declare their own code “trusted”

● Shuffler is egalitarian– Same level of privilege, no system modifications

– Defends itself from attack

67

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

68

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

mov 0x400620(,%rax,8),%raxjmpq *%rax

0x400620: 0x400508 0x4005140x400630: 0x400520 0x40052c0x400640: 0x400538 0x400544

memcpy’s code

69

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

Rewrite main, printf, ..., memcpy, ...

mov 0x400620(,%rax,8),%raxjmpq *%rax

memcpy’s code

0x400620: 0x400508 0x4005140x400630: 0x400520 0x40052c0x400640: 0x400538 0x400544

70

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

Rewrite main, printf, ..., memcpy, ...

mov 0x400620(,%rax,8),%raxjmpq *%rax

0x400620: 0x20 0x280x400630: 0x30 0x880x400640: 0x40 0x48

memcpy’s codemov 0x400620(,%rax,8),%raxjmpq *%gs:(%rax)

New memcpy code

Invalidates memcpy jump table

But rewrite process uses (old) memcpy

71

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

Rewrite main, printf, ..., memcpy, ...

mov 0x400620(,%rax,8),%raxjmpq *%rax

0x400620: 0x20 0x280x400630: 0x30 0x880x400640: 0x40 0x48

memcpy’s codemov 0x400620(,%rax,8),%raxjmpq *%gs:(%rax)

New memcpy code

??

Invalidates memcpy jump table

But rewrite process uses (old) memcpy

72

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler

73

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler

Shufflerstage 1

Shufflerstage 2

Otherlibraries

C library

Program

Loader loads

rewrites

74

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler

Shufflerstage 1

Shufflerstage 2

Otherlibraries

C library

Program

Loader

invokes

75

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler

Shufflerstage 1

Shufflerstage 2

Otherlibraries

C library

Program

Loader

eraseserases

76

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler– Make new copies

Shufflerstage 2

Otherlibraries

C library

Program

Shufflerstage 2

Otherlibraries

C library

Program

77

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler– Make new copies

Shufflerstage 2

Otherlibraries

C library

Program

78

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler– Make new copies

Shufflerstage 2

Otherlibraries

C library

Program

Shufflerstage 2

Otherlibraries

C library

Program

79

Egalitarian Bootstrapping

● Problem: transformations break original code– e.g. memcpy uses code pointers

● Solution: use two copies of Shuffler– Make new copies

Shufflerstage 2

Otherlibraries

C library

Program

80

Outline

1. Continuous re-randomization

2. Accelerating our randomization

3. Binary analysis and egalitarianism

4. Results and Demo

81

Performance Evaluation

● SPEC CPU overhead at 50ms = 14.9%

82

Performance Evaluation

● SPEC CPU overhead at 50ms = 14.9%● Multiprocess Nginx up to 24 workers

83

Security Evaluation

● Two disclosure-based attack methodologies:– Scan many pages for the desired gadgets

● impacted by disclosure time, network latency

– Explore gadget space in small number of pages● impacted by ROP chain computation time (> 40 seconds)

84

Security Evaluation

● Two disclosure-based attack methodologies:– Scan many pages for the desired gadgets

● impacted by disclosure time, network latency

– Explore gadget space in small number of pages● impacted by ROP chain computation time (> 40 seconds)

● Published JIT-ROP takes 2300-378000 ms● We can re-randomize typically every 20-50 ms

85

Demo

86

87

Conclusion

● Continuous re-randomization every 20-50 ms

88

Conclusion

● Continuous re-randomization every 20-50 ms● Fast:

– Defeats all known code reuse attacks

– Asynchronous shuffling offloads overhead

● Deployable:– Binary analysis w/o modifying kernel, compiler, ...

● Egalitarian:– No additional privileges required

– Shuffler defends its own code

Questions?

Demo website: http://shuffled.elfery.net:8000

90

Related Work

● JIT-ROP, SOSP 2013● Oxymoron, Usenix Sec 2014● Code Pointer Integrity, OSDI 2014● Stabilizer, SIGARCH 2013● Remix, CODASPY 2016● TASR, CCS 2015● ...more related work in our paper

[1] https://securityintelligence.com/anti-rop-a-moving-target-defense/[2] http://www.ieee-security.org/TC/SP2013/papers/4977a574.pdf

91

Future Work

● Translating stack unwind information– Breaks C++ exceptions, pthread_cancel, etc.

● Cannot shuffle the loader currently– Breaks dlopen

● If shuffling takes too long, no mechanism to pause target program

92

Shuffler Thread Performance

● Asynchronous shuffling runs quickly● Synchronous runtime is 0.3% of total runtime

93

Scalability

● Tradeoff for server workers– Multithreaded => better performance overhead

– Multiprocess => no disclosures across workers

● Both techniques scale well in practice (up to 24x)

unw

unwComputations

unwComputations

unwComputations

Multithreaded program

unw

unwComputations

unwComputations

Multiprocess program

unw

n Shuffler threads1 common Shuffler thread