Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them...

45
1 Overview and Usage of Binary Analysis Frameworks Florian Magin [email protected]

Transcript of Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them...

Page 1: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

1

Overview and Usage of Binary Analysis Frameworks

Florian Magin [email protected]

Page 2: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

2

whoami

o Security Research at ERNW Research GmbH from Heidelberg, Germany

o Organizer of the Wizards of Dos CTF team from Darmstadt, Germany

o Reach me via:o Twitter: @0x464D

o Email: [email protected]

Page 3: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

3

Who we are

o Germany-based ERNW GmbHo Independent

o Deep technical knowledge

o Structured (assessment) approach

o Business reasonable recommendations

o We understand corporate

o Blog: www.insinuator.net

o Conference: www.troopers.de

Page 4: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

4

Agenda

What actually is automated analysis?

How does it work?

What else are some of the frameworks capable of? (In this case angr)

Page 5: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

5

What is Automated Binary Analysis?

Page 6: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

6

-> Binary Analysis performed by algorithms

What is Automated Binary Analysis?

Page 7: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

7

Wait, isn’t that impossible?

o It’s impossible to generally tell if a program halts for a given input (Halting Problem)

o Also Rice’s Theorem

o Also, what exactly are we even looking for?

o Crashes?

o Memory Corruptions?

o Logic Errors?

Page 8: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

8

Bit of History

o The ideas themselves are 40 years oldo Robert S. Boyer and Bernard Elspas and Karl

N. Levitt, SELECT--a formal system for testing and debugging programs by symbolic execution, 1975

o Analysis is resource intensiveo Cray-1 supercomputer from 1975 had

80MFLOPS (8MB of RAM)

o iPhone 5s from 2013 produces about 76.8 GFLOPS (1GB of RAM)

Page 9: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

9

DARPA CGC

o Task: Develop a “Cyber Reasoning System”

o Big push in moving the ideas from academia to practicability

o Qualification Prize: $750,000

o Final Prizes:

1. $2,000,000

2. $1,000,000

3. $750,000

Page 10: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

10

DARPA CGC

o CRS needs to:o Find vulnerabilities

o Patch them

o Ran on 64 Nodes, each:o 2x Intel Xeon Processor E5-2670 v2 (25M cache,

2.50GHz) (20 physical cores per machine)

o 256GB Memory

o 7 finalists, winner competed at DEFCON CTF

o It was better than some of the human teams some of the time

Page 11: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

11

Overview of the basic concepts

How do they work?

Page 12: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

12

Intermediate Representations

o What do we actually analyze?

o Every architecture is different

o Common Representation

o Typical case of too may standards

o VEX IR (used by Valgrind and angr)

o Binary Ninja IR

o LLVM IR

o So many more

Page 13: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

13

CFG Recovery

o Recursively build a graph with jumps as edges and basic blocks as nodes

o Easy with calls and direct jumps

o But what about “jmp eax”?

o Jump table

o Callbacks/Higher Order Functions

o Functions of Objects in OOP

o “Graph-based vulnerability discovery”

Page 14: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

14

Value-Set Analysis

o Approximate program states

o Values in memory or registers

o Reconstruct buffers

o Can be enough to detect buffer overflows

o Research paper is in the last slide (18 Pages)

Page 15: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

15

Data-Flow Analysis/Taint Analysis

o Track where data ends up

o Data dependencies

o Discover functions that handle user input

o Different granularity:

o Bits

o Bytes

Page 16: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

16

Constraint Solving

o A LOT of Math involved

o Highly simplified:

a = 5

b = 15

x < b && x > a

x = ?

Page 17: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

17

Z3 Theorem Prover

o Developed by Microsoft Research

o Microsoft uses it to formally verify some parts of their products

o Windows Kernel

o Hyper-V

o MIT License

o Provides a SMT Solver

Page 18: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

18

o Claripy is the abstraction layer for constraint solvers like Z3 used by angr

o Only exposes needed functionality for binary analysis

DEMO

Z3/Claripy Example

Page 19: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

19

Symbolic Execution

o Symbolic instead of concrete variables

o Following example shamelessly stolen from the angr presentations

Page 20: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

20

Page 21: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

21

Page 22: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

22

Page 23: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

23

Page 24: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

24

Page 25: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

25

AST

o Abstract Syntax Tree

o Basically a representation for the constraints from the previous slide

Page 26: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

26

Side Note: LLVM Compiler

Infrastructure

o Own IR (LLVM IR)

o Own symbolic execution engine (KLEE)

o Own constraint solver (Kleaver)

Page 27: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

27

What to do with all this?

o These techniques don’t scale

o State Explosion

o Constraint solving is generally NP-Complete

o Combine with something really smart but slow: a human

o Combine with something really dumb but fast: a fuzzer

Page 28: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

28

Augmented Fuzzing

o Taint Analysis to discover what branch depends on what input

o Symbolic Execution with constraint solver to build input to take that branch

Page 29: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

29

angr

o Most beginner friendly of all tools

o Written in Python

o Good Documentation

o Plenty of available research

o Used by Mechaphish ( 3rd at Darpa’s CGC)

o Developed at University of California, Santa Barbara

o Even the CIA uses angr!

Page 30: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

30

Triton

o x86 and x86_64 only

o Designed as a library (LibTriton.so)

o Should be easier to integrate into C Projects

o Has python bindings

o Not focused on automating but assisting

o Sponsored by Quarkslab

Page 31: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

31

Others

o Bitblaze

o University of California, Berkeley

o bap: Binary Analysis Platform

o Carnegie Mellon University/ForAllSecure

o Written in OCaml

o Used by Mayhem (1st Place at Darpa’s CGC)

o Miasm

Page 32: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

32

Overview of Frameworks and Projects

Source: Angr Tutorial (so obviously biased)

Page 33: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

33

Installing Angr

o My setup: Vagrant Box with ArchstrikeRepositories (turns out that’s outdated)

o pacman –S angr

o Alternatives:

o Official Docker Image

o pip2 install angr

Page 34: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

34

o Automagically import binary functions

o angr detects calling convention

o Maps python types to binary representation

o Call them from python

o With concrete values

o With symbolic value

Foreign Function Interface

>>> import angr

>>>b=angr.Project(‘/path/binary’)

>>>f = b.factory.callable(address)

>>>f?

Type: Callable

[…]

Callable is a representation of a function in the binary that can be

interacted with like a native python function.

[…]

Page 35: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

35

o Get an input so that a function returns a certain value

o Function can be from Python or from binary(see FFI)

o f(x,y) -> (x*3 >> 1) * y

o f(?,0x42) - > 0x76E24

Inversing Functions

Page 36: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

36

o Get an input so that a function returns a certain value

o Function can be from Python or from binary(see FFI)

o f(x,y) -> (x*3 >> 1) * y

o f(?,0x42) - > 0x76E24

Inversing Functions

DEMO

Page 37: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

37

o Get an input so that a function returns a certain value

o Function can be from Python or from binary(see FFI)

o f(x,y) -> (x*3 >> 1) * y

o f(?,0x42) - > 0x76E24

o We got two possible solutions

o 0x1337 (intended)

o 0x555555555555688c which returns 0x210000000000076e24

-> Integer Overflow

Inversing Functions

Page 38: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

38

o Binary that takes some user input

o stdin

o argv

o Some file

o Checks it against constraints

o Determines if it’s valid

Automagic Solving of Crackmes

DEMO

Page 39: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

39

o Binary that takes some user input

o stdin

o argv

o Some file

o Checks it against constraints

o Determines if it’s valid

o We just declare that input as symbolic

o Choose a starting point and explore the possible paths from there

o Solve for an input that brings us down the wanted path that’s the solution

Automagic Solving of Crackmes

Page 40: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

40

o Breakpoints with callbacks before or after:o Instructions or address

o Memory read/write

o Register read/write

o Many others

o Hookso Optimized libc functions

o Own Python Code

>>> import angr, simuvex

>>>b=angr.Project(‘/path/binary’)

>>> s = b.factory.entry_state()

>>> def debug_func(state):

... print 'Read', state.inspect.mem_read_expr, 'from', state.inspect.mem_read_address

>>> s.inspect.b('mem_write', when=simuvex.BP_AFTER, action=debug_func)

Debugging capabilities

Page 41: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

41

Anti-Anti-Debugging

o angr is not a debugger

o Most tricks wont work

o Might accidentally break angr in other ways

o Simuvex or Unicorn can be used as an emulator

o Breakpoints without the program noticing

o Invisible Hooks

o Overall it needs a different approach

Page 42: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

42

Angr Cheat Sheet

o Currently developing an angr cheat sheet

o Common Commands to look up

o Features that are hidden somewhere in the docs

o Release $soon

o Probably as part of the angr/angr-doc repo

Page 43: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

43

Other Tools to know

o Unicorn Emulatoro Emulate all the architectures!

o Capstoneo Disassemble all the architectures!

o Interesting Mechaphish Componentso angrop (generates ROP Chains)

o Driller (Augmented Fuzzing)

o Everything in https://github.com/mechaphish

Page 44: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

44

www.ernw.de

www.insinuator.net

Thank you for your attention

[email protected]

0x464D

Page 45: Overview and Usage of Binary Analysis Frameworks Florian ... · o Find vulnerabilities o Patch them o Ran on 64 Nodes, each: o 2x Intel Xeon Processor E5-2670 v2 (25M cache, 2.50GHz)

45

References & Literature

o “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis”

o https://docs.angr.io/

o VSA:

Analyzing Memory Accesses in x86 Executables

Gogul Balakrishnan and Thomas Reps

Comp. Sci. Dept., University of Wisconsin; {bgogul,reps}@cs.wisc.edu