Julio Auto Practical (Introduction to) Reverse Engineering.

Page 1: Julio Auto Practical (Introduction to) Reverse Engineering.

Julio Auto <julio . auto *a* gmail>

Practical (Introduction to) Reverse Engineering

Page 2: Julio Auto Practical (Introduction to) Reverse Engineering.

Agenda

Part I - 101

Why this presentation? (I mean... WHY?!?!)

A few concepts (Mumble jumble++)

Demo (Show me the goods)

Part II - 1337

Advancing RE (Do your own!)

Something extra (Finish pretty)

Linkz, lulz, refz, and shoutz

Q & (maybe) A

Page 3: Julio Auto Practical (Introduction to) Reverse Engineering.

Why?

Initially suggested by the H2HC crew

Based on my article ‘Cracking CrackMes’, published earlier this year while working for my previous employer, Scanit ME

RE is getting lots of attention, and many people seem interested in learning it

Still, it remains largely a black art

Page 4: Julio Auto Practical (Introduction to) Reverse Engineering.

Why? (2)

It seems, then, that moving up from ground zero is the most problematic step

This presentation tries to help fix it

It aims to expose instantly useful knowledge

And pointers to where to go digging deeper

Instead of advanced research _results_, basic _techniques_ and _processes_

Obs.: We’ll be targeting the Windows platform most of the time in this speech

Page 5: Julio Auto Practical (Introduction to) Reverse Engineering.

Concepts

Reverse Engineering is a very self-explanatory term

You take something and, from there, try to learn how (some aspect of) it was engineered

It’s also obviously broad

For example, it’s often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code

Page 6: Julio Auto Practical (Introduction to) Reverse Engineering.

My Own Concept

Think of the times you asked yourself “why” and “how” and let it go without an answer...

.........

RE is not letting go

Page 7: Julio Auto Practical (Introduction to) Reverse Engineering.

A Few Applications

Malware Analysis

Vulnerability Analysis

Security Assessment of 3rd-party COTS

Evaluation/Breaking of copy-protection schemes

Assorted how’s and why’s

Page 8: Julio Auto Practical (Introduction to) Reverse Engineering.

Why Still a Black Art?

Perhaps because people think it’s only good for SW cracking

Perhaps because DRM has become a nightmare no one is happy with, and related laws everywhere bash reversers too hard every now and then (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?)

Perhaps because many people still think it should be illegal (wtf?!)

Page 9: Julio Auto Practical (Introduction to) Reverse Engineering.

How To Learn

The Crack-Me approach

The one I illustrate in the paper I mentioned

Small and targeted challenges with different levels and obstacles to choose from

The real life approach

Choose a real-world problem and attack it

Tough but rewarding

We’ll demo a bit of both

Page 10: Julio Auto Practical (Introduction to) Reverse Engineering.

Tools of The Trade

Probably millions of tools that can give you some useful piece of info about your target

I’ll try to restrict myself to the most relevant/common, then

Unfortunately, many of the best tools are commercial

On the other hand, many of them have free/student/evaluation versions

For the rest... Well, remember “the real life approach”? ;)

Page 11: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers

Obvious importance

Fairly good variety

It’s nice to play with and know your way around all of them

But mastering them all is quite hard, so you’ll most likely elect your debugger of choice in little time

Choose your debugger well!

Page 12: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers (2)

WinDbg

My personal choice of debugger

Developed by MSFT

Comes for free in the “Debugging Tools for Windows” package

Amazingly rich in features

Extensible with some C++ programming

Not the easiest or simplest dev environment

Very rich API, though

Poor interface

Page 13: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers (3)

Visual Studio Debugger

It’s crap, not suited for reversing

But it’s pretty and nice for developers :>

Seriously, don’t try to go very far reversing with it

It may use up the rest of your sanity

Page 14: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers (4)

OllyDbg

Enjoys quite a lot of popularity in the reversing community

Nice interface

In particular, a nice disassembly view

Comes in a few “tuned” versions, one of the most popular being...

Page 15: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers (5)

Immunity Debugger

Developed by Immunity Inc. (one of uCon’s proud sponsors)

Extends OllyDbg with a Python interpreter and exposes a couple of debugging modules for the user to interact with

Very neat plugin support

Embeds a command line with WinDbg-aliased commands

Maintains a forum to support developers/users of ImmDbg plugins

Page 16: Julio Auto Practical (Introduction to) Reverse Engineering.

Debuggers (6)

gdb

The standard debugger on *NIX systems

Quite complete debugger

Not the best thing in the RE world, but overall a good debugger

Page 17: Julio Auto Practical (Introduction to) Reverse Engineering.

Disassemblers

Reading assembly is not the sweetest thing for most people

The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks

Therefore, being comfortable with your disassembler is essential

Page 18: Julio Auto Practical (Introduction to) Reverse Engineering.

Disassemblers (2)

Pretty much every debugger is capable of disassembling

Apart from that, there are lots of other tools that can do it too

In Linux, objdump is pretty much a standard tool

However, one particular tool is especially known for its disassembly features

Page 19: Julio Auto Practical (Introduction to) Reverse Engineering.

Disassemblers (3)

IDA Pro

Supports many binary formats and architectures

Displays the code in graphs, which greatly enhances the visualization

Block-level CFGs

Many things can be customized/adjusted

Graph layout, data types, annotations...

Quite frankly, it’s in every reverser’s toolkit

IDA Pro is a commercial tool currently in version 5.4

But version 4.9 is available in a free edition

Page 20: Julio Auto Practical (Introduction to) Reverse Engineering.

System Monitoring Tools

All of those from the SysInternals Suite

Process Explorer

RegMon

FileMon

TCPView

Etc...

Page 21: Julio Auto Practical (Introduction to) Reverse Engineering.

Advanced Tools

Binary Diff’ers

BinDiff

Decompilers

Hex-Rays

RE Frameworks

ERESI ;)

PaiMei and all the PyThings

Page 22: Julio Auto Practical (Introduction to) Reverse Engineering.

Demo

We’ll try and beat a crack-me challenge

This crack-me was taken from a real competition

HITB Dubai 2007 CTF

Perhaps it can serve as a tip for uCon’s CTF as well

Page 23: Julio Auto Practical (Introduction to) Reverse Engineering.

RE – Advanced Topics

Cutting to the chase, advancing RE basically means automating stuff

Many of the RE tools are scriptable/programmable/extensible

Developing smart ways to deal with repetitive tasks is the way to more effective analyses

Page 24: Julio Auto Practical (Introduction to) Reverse Engineering.

RE – Advanced Topics (2)

Less often, you might see opportunities to advance RE in ways not based on automation

Defeating a new anti-debug trick

Developing new environments for RE

Virtualization, Sandboxing...

Or even radically changing paradigms

E.g. the graph-based approach to binary navigation

Page 25: Julio Auto Practical (Introduction to) Reverse Engineering.

RE – Advanced Topics (3)

Perhaps the most important lesson here is not to reinvent the wheel

Re-use the tools you have!

You’ll be amazed at how much stuff you can do by “glueing” pieces together

Having said that...

Perhaps the tools you have are not perfect

Or you might wanna re-do something just for learning

But be sure to have the right goals in mind!

Page 26: Julio Auto Practical (Introduction to) Reverse Engineering.

Teaching By Example

I will demonstrate how you can use advanced RE to solve real life problems

The main idea behind the “re-use” thing I mentioned in the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering

Unfortunately, what I’m about to show is actually a bad example in this aspect (more on this later)

Page 27: Julio Auto Practical (Introduction to) Reverse Engineering.

Problem

Suppose you have ways to reproduce a high-profile, possibly exploitable bug – Yay!

BUT....

The target is closed-source software

The target is as large and complex as an operating system – and way less documented

The input is huge and has a complex, possibly undisclosed format

The source of the bug can be anywhere in the input

From user input to the actual bug/crash, about 3 million instructions are executed

Page 28: Julio Auto Practical (Introduction to) Reverse Engineering.

WHAT DO YOU DO????

Page 29: Julio Auto Practical (Introduction to) Reverse Engineering.

Introducing LEP

LEP tries to answer a big question in this problem:

What exact part of this input is causing the bug?

If you can answer this question and somehow correlate this with the input format, you may gain a great deal of understanding of the bug

For this, I have invented a new technique: “Staged Partial Tracing-Based Backwards Taint Analysis”

Because not sounding like a Ph.D. is so 2001 :>

And also because we all just love new terms we can go media-cuckoo about

Page 30: Julio Auto Practical (Introduction to) Reverse Engineering.

Introducing LEP (2)

One-liner idea: If we know when our input is brought to memory and know where it’s mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from

We do it in two stages, with a component for each: the tracer and the analyzer

Simple, huh?

Page 31: Julio Auto Practical (Introduction to) Reverse Engineering.

Fundamental Concepts

When we trace the program, it becomes “linear”, i.e. control-flow is irrelevant

Dataflow becomes concretely deterministic

Aliasing is not an issue (no need to theorize on side-effects)

All info we need is available at runtime

In particular, effective addresses

If the input is as big as the problem states, it should be no problem to find it in memory

We get most of the info we need from the disassembly text (ASCII)! It’s like hacking with grep again!

Page 32: Julio Auto Practical (Introduction to) Reverse Engineering.

LEP Tracer

A WinDbg extension

Traces every instruction until the program raises an exception

Dumps the following instruction info to a file:

Mnemonic

Destination operand

Source operand

Dependences of the source op – e.g. mov eax,[ecx+edx*2]
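
To make the four dumped fields concrete, here is a minimal Python sketch of pulling them out of one line of disassembly text. It is not LEP's code (the real tracer is a C++ WinDbg extension); the names `parse_instruction`, `source_dependencies` and the short `REGISTERS` list are hypothetical, and the input is assumed to look like `mnemonic dest,src`.

```python
import re

# Hypothetical register list (32-bit general-purpose only), for illustration.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def source_dependencies(operand):
    """Registers that a memory operand's effective address depends on."""
    match = re.search(r"\[([^\]]*)\]", operand or "")
    if match is None:
        return []  # not a memory operand, so no address dependences
    deps = []
    for token in re.findall(r"\b[a-z][a-z0-9]*\b", match.group(1).lower()):
        if token in REGISTERS and token not in deps:
            deps.append(token)
    return deps

def parse_instruction(text):
    """Split 'mnemonic dest,src' into the four fields the tracer records."""
    mnemonic, _, operands = text.strip().partition(" ")
    parts = [p.strip() for p in operands.split(",")] if operands.strip() else []
    dest = parts[0] if parts else None
    src = parts[1] if len(parts) > 1 else None
    return {"mnemonic": mnemonic, "dest": dest, "src": src,
            "deps": source_dependencies(src)}

record = parse_instruction("mov eax,[ecx+edx*2]")
print(record)
```

For the slide's example, the source operand is `[ecx+edx*2]` and its dependences are `ecx` and `edx`: to know which memory cell fed `eax`, the analysis also has to know where those two registers got their values.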

Page 33: Julio Auto Practical (Introduction to) Reverse Engineering.

LEP Tracer (2)

Discards control-flow changing instructions

Discards in/out instructions (all relevant input should be in memory already?)

Discards other groups of instructions that will be supported as we go

FPU, MMX, SSE{2,3}, etc...

Tries to parse the right info even when the debugger is too stupid to work as expected

Why not compute effective addresses in rep’ed instructions?
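
A rough Python sketch of the kind of filter described above. The mnemonic lists are illustrative assumptions and far from complete (MMX/SSE are left out entirely), and `is_discarded` is a hypothetical name, not part of LEP.

```python
# Illustrative, incomplete mnemonic lists (assumptions, not LEP's actual tables).
CONTROL_FLOW = {"call", "ret", "retn", "retf", "iret", "iretd"}
PORT_IO = {"in", "out", "insb", "insw", "insd", "outsb", "outsw", "outsd"}

def is_discarded(mnemonic):
    """True if the tracer would skip this instruction in the sketch."""
    m = mnemonic.lower()
    # Jumps (jmp, jz, jnz, ...) and loop variants change control flow.
    if m in CONTROL_FLOW or m.startswith("j") or m.startswith("loop"):
        return True
    if m in PORT_IO:
        return True
    # Heuristic: x87 FPU mnemonics start with "f" (fld, fstp, fadd, ...).
    return m.startswith("f")

trace = ["mov", "jmp", "inc", "in", "fld", "call", "xor"]
kept = [m for m in trace if not is_discarded(m)]
print(kept)
```

Since the trace is already linear, dropping jumps and calls loses nothing the backward dataflow pass needs.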

Page 34: Julio Auto Practical (Introduction to) Reverse Engineering.

LEP Analyzer

Reads the file generated by the tracer and goes bottom-up investigating the dataflow

You have to specify the piece of data that causes the last instruction to fail – usually (always?) a register

And the memory range(s) into which your input was mapped at the time the trace was taken

Ignores register “slices” for simplicity

(al || ah) == ax == eax == rax
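
One way to ignore register “slices” is to map every alias to its widest register before comparing operands. The Python sketch below assumes that approach; the table and the name `canonical_register` are hypothetical, and only the eight classic general-purpose registers are covered.

```python
# Map each sub-register alias to its widest form (assumed canonical name).
CANONICAL = {}
for letter in "abcd":
    widest = "r" + letter + "x"
    for alias in (letter + "l", letter + "h", letter + "x", "e" + letter + "x", widest):
        CANONICAL[alias] = widest
for base in ("si", "di", "bp", "sp"):
    widest = "r" + base
    for alias in (base + "l", base, "e" + base, widest):
        CANONICAL[alias] = widest

def canonical_register(name):
    """Return the widest alias of a register; unknown names pass through."""
    key = name.lower()
    return CANONICAL.get(key, key)

print(canonical_register("al"), canonical_register("AH"), canonical_register("esi"))
```

The price of this simplification is over-tainting: a write to `al` is treated as a definition of the whole of `rax`, even though the upper bytes keep their old value.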

Page 35: Julio Auto Practical (Introduction to) Reverse Engineering.

LEP Analyzer (2)

When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination

If it overwrites, we finish the analysis for this branch

mov eax, deadf0f0h

Else if it transforms, we keep looking for another def of the same destination operand

inc eax

This gives a very special meaning for LEP’s existence

Otherwise, searching for occurrences of the faulting data inside the input could be just as effective

LEP also tries to identify non-obvious constant overwrites

xor eax, eax
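
The overwrite/transform decision can be sketched in Python as below. This is an assumption-laden illustration, not LEP's logic: the mnemonic table is partial, immediates are assumed to be printed as hex digits with an optional `h` suffix, and the third result, "dataflow" (the value comes from another register or memory, so the search continues through the source), is a label introduced here.

```python
import re

# Hypothetical helper tables for illustration.
BYTE_HIGH_REGISTERS = {"ah", "bh", "ch", "dh"}  # look like hex numbers, but are registers
TRANSFORMING = {"inc", "dec", "neg", "not", "add", "sub",
                "and", "or", "xor", "shl", "shr"}

def is_immediate(operand):
    """True for constants such as 'deadf0f0h' or '4' (assumed debugger format)."""
    if operand is None:
        return False
    text = operand.lower()
    if text in BYTE_HIGH_REGISTERS:
        return False
    return re.fullmatch(r"[0-9a-f]+h?", text) is not None

def classify(mnemonic, dest, src=None):
    m = mnemonic.lower()
    # Non-obvious constant overwrite: xor/sub of a register with itself gives 0.
    if m in {"xor", "sub"} and src is not None and dest.lower() == src.lower():
        return "overwrite"
    if m == "mov" and is_immediate(src):
        return "overwrite"   # old value is gone; this branch of the search ends
    if m in TRANSFORMING and (src is None or is_immediate(src)):
        return "transform"   # value derives from the previous def of dest
    return "dataflow"        # value comes from src; follow it

print(classify("mov", "eax", "deadf0f0h"))
print(classify("inc", "eax"))
print(classify("xor", "eax", "eax"))
```

Other constant overwrites, such as `and eax, 0`, would need their own rules; they are left out to keep the sketch short.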

Page 36: Julio Auto Practical (Introduction to) Reverse Engineering.

Engineering Tech-Talk

LEP was intended to be written entirely in Python

Didn’t work for performance reasons

LEP Tracer is written in C++, since it’s a WinDbg extension

It makes use of a reference of the x86 instruction set written in XML by MazeGen

The XML is mapped to C++ using CodeSynthesis’ XSD XML Data Binding

LEP Analyzer was first written in Python

Then I also re-wrote it in C++

LEP Analyzer’s search algorithm was initially a DFS

Then I implemented it as a BFS
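
To show what a breadth-first backward search over a trace might look like, here is a small Python sketch. The trace format is an assumption made for the example: each entry is a `(dest, sources)` pair in execution order, where a location is either a register name or a memory address, operand sizes are ignored, and a constant overwrite has no sources. `backward_taint` is a hypothetical name.

```python
from collections import deque

def backward_taint(trace, fault_location, input_ranges):
    """Return the input addresses that the faulting location's value derives from."""
    hits = set()
    seen = set()
    queue = deque([(fault_location, len(trace))])
    while queue:
        loc, before = queue.popleft()  # queue.pop() here would make it a DFS
        if (loc, before) in seen:
            continue
        seen.add((loc, before))
        # Walk bottom-up to the most recent definition of loc.
        def_index = None
        for i in range(before - 1, -1, -1):
            if trace[i][0] == loc:
                def_index = i
                break
        if def_index is None:
            # Never written inside the trace: the value predates it.
            if isinstance(loc, int) and any(start <= loc < end
                                            for start, end in input_ranges):
                hits.add(loc)
            continue
        for src in trace[def_index][1]:
            queue.append((src, def_index))
    return hits

trace = [
    ("eax", [0x1000]),        # mov eax,[1000h]   (address inside the input)
    ("ebx", []),              # mov ebx,5         (constant overwrite)
    ("eax", ["eax"]),         # inc eax           (transform)
    ("ecx", ["eax", "ebx"]),  # lea ecx,[eax+ebx]
]
result = backward_taint(trace, "ecx", [(0x1000, 0x2000)])
print(sorted(hex(a) for a in result))
```

Starting from `ecx`, the search follows `eax` back through the `inc` to the load from the input buffer, while the `ebx` branch ends at the constant overwrite.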

Page 37: Julio Auto Practical (Introduction to) Reverse Engineering.

Demo II

Page 38: Julio Auto Practical (Introduction to) Reverse Engineering.

Linkz & Refz

Cracking CrackMes

http://www.scanit.net/rd/wp/wp04

X86 Opcode and Instruction Reference, by MazeGen

http://ref.x86asm.net/

CodeSynthesis XSD – XML Data Binding for C++

http://www.codesynthesis.com/products/xsd/

Thousands of elite RE projects

http://www.google.com

Seriously though, contact me if you can’t find anything

Page 39: Julio Auto Practical (Introduction to) Reverse Engineering.

Greetz & Shoutz

Filipe Balestra for lending me the bug used in the 2nd demo

H2HC crew for inspiring me to do this work

uCon Crew for having the elitest con ever

Everybody in the room for coming

The ERESI team, with whom I have most of my discussions about RE, program analysis, etc

All of the great people that I know from the security scene

It’s simply impossible to mention each and every one of you, but you know who you are!

Page 40: Julio Auto Practical (Introduction to) Reverse Engineering.

Questions?

Page 41: Julio Auto Practical (Introduction to) Reverse Engineering.

Julio Auto <julio . auto *a* gmail>

Practical (Introduction to) Reverse Engineering