Julio Auto Practical (Introduction to) Reverse Engineering.
Julio Auto <julio . auto *a* gmail>
Practical (Introduction to) Reverse Engineering
Agenda
Part I - 101
Why this presentation? (I mean... WHY?!?!)
A few concepts (Mumble jumble++)
Demo (Show me the goods)
Part II - 1337
Advancing RE (Do your own!)
Something extra (Finish pretty)
Linkz, lulz, refz, and shoutz
Q & (maybe) A
Why?
Initially suggested by the H2HC crew
Based on my article ‘Cracking CrackMes’, published earlier this year while working for my previous employer, Scanit ME
RE is getting lots of attention, and many people seem interested in learning it
Still, it remains largely a black art
Why? (2)
It seems, then, that moving up from ground zero is the most problematic step
This presentation tries to help fix it
It aims to expose instantly useful knowledge
And pointers to where to go digging deeper
Instead of advanced research _results_, basic _techniques_ and _processes_
Note: We’ll be targeting the Windows platform most of the time in this speech
Concepts
Reverse Engineering is a very self-explanatory term
You take something and, from there, try to learn how (some aspect of) it was engineered
It’s also obviously broad
For example, it’s often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code
My Own Concept
Think of the times you asked yourself “why” and “how” and let it go without an answer...
.........
RE is not letting go
A Few Applications
Malware Analysis
Vulnerability Analysis
Security Assessment of 3rd-party COTS
Evaluation/Breaking of copy-protection schemes
Assorted how’s and why’s
Why Still a Black Art?
Perhaps because people think it’s only good for SW cracking
Perhaps because DRM has become a nightmare no one is happy with and related laws everywhere bash reversers too hard every now and then (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?)
Perhaps because many people still think it should be illegal (wtf?!)
How To Learn
The Crack-Me approach
The one I illustrate in the paper I mentioned
Small and targeted challenges with different levels and obstacles to choose from
The real life approach
Choose a real-world problem and attack it
Tough but rewarding
We’ll demo a bit of both
Tools of The Trade
Probably millions of tools that can give you some useful piece of info about your target
I’ll try to restrict myself to the most relevant/common, then
Unfortunately, many of the best tools are commercial
On the other hand, many of them have free/student/evaluation versions
For the rest... Well, remember “the real life approach”? ;)
Debuggers
Obvious importance
Fairly good variety
It’s nice to play with all of them and know your way around them
But mastering them all is quite hard, so you’ll most likely elect your debugger of choice in little time
Choose your debugger well!
Debuggers (2)
WinDbg
My personal choice of debugger
Developed by MSFT
Comes for free in the “Debugging Tools for Windows” package
Amazingly rich in features
Extensible with some C++ programming
Not the easiest or simplest dev environment
Very rich API, though
Poor interface
Debuggers (3)
Visual Studio Debugger
It’s crap, not suited for reversing
But it’s pretty and nice for developers :>
Seriously, don’t try to go very far reversing with it
It may use up the rest of your sanity
Debuggers (4)
OllyDbg
Enjoys quite a lot of popularity in the reversing community
Nice interface
In particular, a nice disassembly view
Comes in a few “tuned” versions, being one of the most popular...
Debuggers (5)
Immunity Debugger
Developed by Immunity Inc. (one of uCon’s proud sponsors)
Extends OllyDbg with a Python interpreter and exposes a couple of debugging modules for the user to interact with
Very neat plugin support
Embeds a command-line with windbg-aliased commands
Maintains a forum to support developers/users of ImmDbg plugins
Debuggers (6)
gdb
The standard debugger on *NIX systems
Quite complete debugger
Not the best thing in the RE world, but overall a good debugger
Disassemblers
Reading assembly is not the sweetest thing for most people
The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks
Therefore, being comfortable with your disassembler is essential
Disassemblers (2)
Pretty much every debugger is capable of disassembling
Apart from that, there are lots of other tools that can do it too
In Linux, objdump is pretty much a standard tool
However, one particular tool is especially known for its disassembly features
Disassemblers (3)
IDA Pro
Supports many binary formats and architectures
Displays the code in graphs, which greatly enhances the visualization
Block-level CFGs
Many things can be customized/adjusted
Graph layout, data types, annotations...
Quite frankly, it’s in every reverser’s toolkit
IDA Pro is a commercial tool currently in version 5.4
But version 4.9 is available in a free edition
System Monitoring Tools
All of those from the SysInternals Suite
Process Explorer
RegMon
FileMon
TCPView
Etc...
Advanced Tools
Binary Diff’ers
BinDiff
Decompilers
Hex-Rays
RE Frameworks
ERESI ;)
PaiMei and all the PyThings
Demo
We’ll try and beat a crack-me challenge
This crack-me was taken from a real competition
HITB Dubai 2007 CTF
Perhaps it can serve as a tip for uCon’s CTF as well
RE – Advanced Topics
Cutting to the chase, advancing RE basically means automating stuff
Many of the RE tools are scriptable/programmable/extensible
Developing smart ways to deal with repetitive tasks is the way to more effective analyses
RE – Advanced Topics (2)
Less often, you might see opportunities to advance RE in ways not based on automation
Defeating a new anti-debug trick
Developing new environments for RE
Virtualization, Sandboxing...
Or even radically changing paradigms
E.g. The graph-based approach to binary navigation
RE – Advanced Topics (3)
Perhaps the most important lesson here is not to reinvent the wheel
Re-use the tools you have!
You’ll be amazed at how much stuff you can do by “gluing” pieces together
Having said that...
Perhaps the tools you have are not perfect
Or you might wanna re-do something just for learning
But be sure to have the right goals in mind!
Teaching By Example
I will demonstrate how you can use advanced RE to solve real life problems
The main idea behind the “re-use” thing I mentioned in the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering
Unfortunately, what I’m about to show is actually a bad example in this aspect (more on this later)
Problem
Suppose you have ways to reproduce a high-profile, possibly exploitable bug – Yay!
BUT....
The target is closed-source software
The target is as large and complex as an operating system – and way less documented
The input is huge and has a complex, possibly undisclosed format
The source of the bug can be anywhere in the input
From user-input to actual bug/crash, about 3 million instructions happen
WHAT DO YOU DO????
Introducing LEP
LEP tries to answer a big question in this problem:
What exact part of this input is causing the bug?
If you can answer this question and somehow correlate this with the input format, you may gain a great deal of understanding of the bug
For this, I have invented a new technique: “Staged Partial Tracing-Based Backwards Taint Analysis”
Because not sounding like a Ph.D. is so 2001 :>
And also because we all just love new terms we can go media-cuckoo about
Introducing LEP (2)
One-liner idea: If we know when our input is brought to memory and know where it’s mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from
We do it in two stages, with a component for each: the tracer and the analyzer
Simple, huh?
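A minimal sketch of the backwards walk, not LEP’s actual code. It assumes a hypothetical trace record (Insn) in which memory operands are already resolved to concrete addresses written as "mem:<hex>", and it follows only the loaded value, not the registers used to compute the address.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    # Hypothetical record layout, assumed for illustration only.
    mnemonic: str
    dest: str        # register name or "mem:<hex address>"
    sources: tuple   # locations the new value of dest is computed from

def backward_taint(trace, faulting, input_range):
    """Walk the trace bottom-up; return input addresses that reach `faulting`."""
    lo, hi = input_range
    tracked = {faulting}          # locations whose origin we still need
    hits = set()
    for insn in reversed(trace):
        if insn.dest not in tracked:
            continue              # this instruction does not define anything we follow
        tracked.discard(insn.dest)
        for src in insn.sources:
            if src.startswith("mem:"):
                addr = int(src[4:], 16)
                if lo <= addr < hi:
                    hits.add(addr)    # reached the mapped input, stop this branch
                    continue
            tracked.add(src)
    return hits

trace = [
    Insn("mov", "ebx", ("mem:1000",)),      # load from input
    Insn("mov", "ecx", ("mem:1010",)),      # load from input
    Insn("mov", "edx", ("mem:1020",)),      # never reaches the fault
    Insn("add", "ebx", ("ebx", "ecx")),
    Insn("mov", "mem:5000", ("ebx",)),
    Insn("mov", "eax", ("mem:5000",)),      # eax is used by the crashing instruction
]
hits = backward_taint(trace, "eax", (0x1000, 0x1100))
print(sorted(hex(a) for a in hits))
```

Because the trace is linear, a single reverse pass is enough here; no control-flow reasoning is needed.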
Fundamental Concepts
When we trace the program, it becomes “linear”, i.e. control-flow is irrelevant
Dataflow becomes concretely deterministic
Aliasing is not an issue (no need to theorize on side-effects)
All info we need is available at runtime
In particular, effective addresses
If the input is as big as the problem states, it should be no problem to find it in memory
We get most of the info we need from the disassembly text (ASCII)! It’s like hacking with grep again!
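To illustrate the “grep” flavor, here is a small sketch that pulls the mnemonic and operands out of one line of disassembly text. The line layout (address, opcode bytes, mnemonic, operands) is an assumption about the debugger’s output, and parse_line is a hypothetical helper, not part of LEP.

```python
import re

# Assumed layout: "<8-digit address> <opcode bytes> <mnemonic> <operands>"
LINE = re.compile(
    r"^(?P<addr>[0-9a-f]{8})\s+(?P<bytes>[0-9a-f]+)\s+"
    r"(?P<mnemonic>[a-z]+)\s*(?P<operands>.*)$"
)

def parse_line(text):
    """Return (mnemonic, [operands]) or None if the line does not match."""
    m = LINE.match(text.strip().lower())
    if m is None:
        return None
    raw = m.group("operands")
    operands = [op.strip() for op in raw.split(",")] if raw else []
    return m.group("mnemonic"), operands

print(parse_line("00401000 8b0451          mov     eax,dword ptr [ecx+edx*2]"))
```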
LEP Tracer
A WinDbg extension
Traces every instruction until the program raises an exception
Dumps the following instruction info to a file:
Mnemonic
Destination operand
Source operand
Dependencies of the source op – e.g. mov eax,[ecx+edx*2]
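As a sketch of what “dependencies of the source op” could mean for the example above: the registers used to form the effective address. The function name and the 32-bit register list are assumptions for illustration; the real tracer is a C++ WinDbg extension.

```python
import re

# Assumed: 32-bit general-purpose registers only, for brevity.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def dependencies(operand):
    """Return the registers that appear inside a [..] memory operand."""
    m = re.search(r"\[([^\]]+)\]", operand)
    if m is None:
        return []                       # plain register or immediate
    tokens = re.split(r"[+\-*]", m.group(1))
    return [t.strip() for t in tokens if t.strip() in REGISTERS]

record = {
    "mnemonic": "mov",
    "dest": "eax",
    "source": "[ecx+edx*2]",
    "deps": dependencies("[ecx+edx*2]"),
}
print(record)
```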
LEP Tracer (2)
Discards control-flow changing instructions
Discards in/out instructions (all relevant input should be in memory already?)
Discards other groups of instructions that will be supported as we go
FPU, MMX, SSE{2,3}, etc...
Tries to parse the right info even when the debugger is too stupid to work as expected
Why not compute effective addresses in rep’ed instructions?
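A rough sketch of the first two discard rules. The mnemonic lists below are my own assumption of what counts as control-flow and port I/O on x86; the slides do not give the tracer’s exact lists.

```python
# Assumed mnemonic sets, not taken from the LEP sources.
CONTROL_FLOW = {"call", "ret", "retn", "retf", "iret"}
PORT_IO = {"in", "out", "insb", "insw", "insd", "outsb", "outsw", "outsd"}

def is_discarded(mnemonic):
    """True if the tracer sketch would skip this instruction."""
    m = mnemonic.lower()
    if m.startswith("j") or m.startswith("loop"):
        return True                     # jmp, jcc, jecxz, loop, loopne, ...
    return m in CONTROL_FLOW or m in PORT_IO

kept = [m for m in ("mov", "jne", "inc", "out", "call", "xor") if not is_discarded(m)]
print(kept)
```

Exact set membership is used for the I/O group so that inc is not mistaken for in.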
LEP Analyzer
Reads the file generated by the tracer and goes bottom-up investigating the dataflow
You have to specify the piece of data that causes the last instruction to fail – usually (always?) a register
And the memory range(s) where your input was mapped, at the time the trace was taken
Ignores register “slices” for simplicity
(al || ah) == ax == eax == rax
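One way to ignore register slices is to map every sub-register to a single canonical name before comparing operands. This is a sketch under that assumption; the table covers only the legacy general-purpose registers (r8 to r15 are left out for brevity), and canonical is a hypothetical helper.

```python
def _build_slices():
    table = {}
    for letter in "abcd":
        full = f"r{letter}x"
        for name in (f"{letter}l", f"{letter}h", f"{letter}x", f"e{letter}x", full):
            table[name] = full          # al, ah, ax, eax, rax -> rax
    for base in ("si", "di", "bp", "sp"):
        full = "r" + base
        for name in (base + "l", base, "e" + base, full):
            table[name] = full          # sil, si, esi, rsi -> rsi
    return table

SLICES = _build_slices()

def canonical(name):
    """Collapse a register slice to its widest register; leave anything else alone."""
    return SLICES.get(name.lower(), name)

print(canonical("al"), canonical("AH"), canonical("eax"), canonical("mem:1000"))
```

The price of this simplification is precision: a write to al is treated as a definition of the whole register.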
LEP Analyzer (2)
When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination
If it overwrites, we finish the analysis for this branch
mov eax, deadf0f0h
Else if it transforms, we keep looking for another def of the same destination operand
inc eax
This gives a very special meaning for LEP’s existence
Otherwise, searching for occurrences of the faulting data inside the input could be just as effective
LEP also tries to identify non-obvious constant overwrites
xor eax, eax
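The overwrite/transform decision could be sketched as below. The mnemonic list and the immediate syntax (hex digits with an optional h suffix) are assumptions for illustration, and classify is a hypothetical name; LEP’s real heuristics are not shown in the slides.

```python
import re

HEX_NAMED_REGS = {"ah", "bh", "ch", "dh"}   # would otherwise look like hex immediates
TRANSFORMS = {"inc", "dec", "neg", "not", "add", "sub", "and", "or", "xor",
              "shl", "shr", "sar", "rol", "ror"}

def is_immediate(op):
    if op is None:
        return False
    op = op.lower()
    if op in HEX_NAMED_REGS:
        return False
    return re.fullmatch(r"[0-9a-f]+h?", op) is not None

def immediate_value(op):
    return int(op.lower().rstrip("h"), 16)

def classify(mnemonic, dest, source=None):
    """Return 'overwrite', 'transform' or 'not-constant' for one definition of dest."""
    m = mnemonic.lower()
    if m in ("xor", "sub") and source is not None and source.lower() == dest.lower():
        return "overwrite"              # result is always zero
    if m == "mov" and is_immediate(source):
        return "overwrite"              # old value is gone, stop this branch
    if m == "and" and is_immediate(source) and immediate_value(source) == 0:
        return "overwrite"
    if m in TRANSFORMS and (source is None or is_immediate(source)):
        return "transform"              # old value still matters, keep searching
    return "not-constant"

for args in (("mov", "eax", "deadf0f0h"), ("inc", "eax"), ("xor", "eax", "eax")):
    print(args, classify(*args))
```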
Engineering Tech-Talk
LEP was intended to be written entirely in Python
Didn’t work for performance reasons
LEP Tracer is written in C++, since it’s a WinDbg extension
It makes use of a reference of the x86 instruction set written in XML by MazeGen
The XML is mapped to C++ using CodeSynthesis’ XSD XML Data Binding
LEP Analyzer was first written in Python
Then I also re-wrote it in C++
LEP Analyzer’s search algorithm was initially a DFS
Then I implemented it as a BFS
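For readers unfamiliar with the two search orders, this generic sketch shows that the difference can come down to which end of the worklist is popped. The defs dictionary (each location mapped to the locations it was computed from) is a made-up example, not LEP’s data structure.

```python
from collections import deque

def visit_order(defs, start, breadth_first):
    """Traverse the dependency graph from start; return the visiting order."""
    work = deque([start])
    seen = {start}
    order = []
    while work:
        # Queue behaviour gives BFS, stack behaviour gives DFS.
        node = work.popleft() if breadth_first else work.pop()
        order.append(node)
        for src in defs.get(node, ()):
            if src not in seen:
                seen.add(src)
                work.append(src)
    return order

defs = {"eax": ["ebx", "ecx"], "ebx": ["mem:1000"], "ecx": ["mem:1010"]}
print(visit_order(defs, "eax", breadth_first=False))
print(visit_order(defs, "eax", breadth_first=True))
```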
Demo II
Linkz & Refz
Cracking CrackMes
http://www.scanit.net/rd/wp/wp04
X86 Opcode and Instruction Reference, by MazeGen
http://ref.x86asm.net/
CodeSynthesis XSD – XML Data Binding for C++
http://www.codesynthesis.com/products/xsd/
Thousands of elite RE projects
http://www.google.com
Seriously though, contact me if you can’t find anything
Greetz & Shoutz
Filipe Balestra for lending me the bug used in the 2nd demo
H2HC crew for inspiring me to do this work
uCon Crew for having the elitest con ever
Everybody in the room for coming
The ERESI team, with whom I have most of my discussions about RE, program analysis, etc
All of the great people that I know from the security scene
It’s simply impossible to mention each and every one of you, but you know who you are!
Questions?