The LLVM Compiler Infrastructure Project

41
Taking it From The Source Thursday, November 29, 12

Transcript of The LLVM Compiler Infrastructure Project

Page 1: The LLVM Compiler Infrastructure Project

Taking it From The Source

Thursday November 29 12

This is a research project conducted under the Defense Advanced Research Projects Agency (DARPA) Cyber

Fast Track program

PI Jared CarlsonPrincipal

GoToTheBoard

Brought To You By

Thursday November 29 12

The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the US Government

Disclaimer

Thursday November 29 12

A Few Other Notesbull This project is ongoing and this presentation

is subject to a public release

bull This means material is a little older than whatrsquos actually in development

Thursday November 29 12

Project Goals

bull Create a tool to find exploitable bugs within a ldquonormalrdquo environment

bull Illustrate consequences of these bugs

bull Educate and interact with the developer

bull Allow for community improvement and sharing

Thursday November 29 12

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 2: The LLVM Compiler Infrastructure Project

This is a research project conducted under the Defense Advanced Research Projects Agency (DARPA) Cyber

Fast Track program

PI Jared CarlsonPrincipal

GoToTheBoard

Brought To You By

Thursday November 29 12

The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the US Government

Disclaimer

Thursday November 29 12

A Few Other Notesbull This project is ongoing and this presentation

is subject to a public release

bull This means material is a little older than whatrsquos actually in development

Thursday November 29 12

Project Goals

bull Create a tool to find exploitable bugs within a ldquonormalrdquo environment

bull Illustrate consequences of these bugs

bull Educate and interact with the developer

bull Allow for community improvement and sharing

Thursday November 29 12

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 3: The LLVM Compiler Infrastructure Project

The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the US Government

Disclaimer

Thursday November 29 12

A Few Other Notesbull This project is ongoing and this presentation

is subject to a public release

bull This means material is a little older than whatrsquos actually in development

Thursday November 29 12

Project Goals

bull Create a tool to find exploitable bugs within a ldquonormalrdquo environment

bull Illustrate consequences of these bugs

bull Educate and interact with the developer

bull Allow for community improvement and sharing

Thursday November 29 12

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 4: The LLVM Compiler Infrastructure Project

A Few Other Notesbull This project is ongoing and this presentation

is subject to a public release

bull This means material is a little older than whatrsquos actually in development

Thursday November 29 12

Project Goals

bull Create a tool to find exploitable bugs within a ldquonormalrdquo environment

bull Illustrate consequences of these bugs

bull Educate and interact with the developer

bull Allow for community improvement and sharing

Thursday November 29 12

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 5: The LLVM Compiler Infrastructure Project

Project Goals

bull Create a tool to find exploitable bugs within a ldquonormalrdquo environment

bull Illustrate consequences of these bugs

bull Educate and interact with the developer

bull Allow for community improvement and sharing

Thursday November 29 12

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 6: The LLVM Compiler Infrastructure Project

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 7: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 8: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 9: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

Hypothesis Engine

Fuzzer

Static Analysis

bullRewrite scan-buildanalyzer in python for integration purposes

bullDigest amp Export JSON rather than HTML for comms

bullReuse existing infrastructure and passes for more exhaustive analysis

bullExpand using LLVM passes

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 10: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

DataDependency

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 11: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

DataDependency

Function GraphbullWe construct our map of the program

bullUse this for fuzzing entry points

bull ldquoMapsrdquo the static to dynamic

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 12: The LLVM Compiler Infrastructure Project

FunctionGraph

DataDependency

Hypothesis Engine

Thursday November 29 12

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 13: The LLVM Compiler Infrastructure Project

FunctionGraph

DataDependency

Hypothesis Engine

Data Dependencies

bullLook at data local variables etc

bullTaint analysisbullParse ASTbullStore dependency relationships into ldquofactsrdquo

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 14: The LLVM Compiler Infrastructure Project

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 15: The LLVM Compiler Infrastructure Project

DataDependency

Hypothesis Engine

Fuzzer Hypothesis Engine

bullHypothesis Engine tracks what succeeds and what does not

bullSimple rules engine

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 16: The LLVM Compiler Infrastructure Project

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 17: The LLVM Compiler Infrastructure Project

Static Analysis

Dependency

Hypothesis Engine

Fuzzer

Fuzzer

bullUtilize the LLDB python framework to drive the fuzzing

bullAllows us to easily recover register stack information etc

bullPython allows for easy extensions

Thursday November 29 12

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 18: The LLVM Compiler Infrastructure Project

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Thursday November 29 12

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 19: The LLVM Compiler Infrastructure Project

Static Analysis

bull Rewriting the scan-buildanalyzer PERL scripts to Python scripts

bull Why Masochism Maybe a little

bull Comprehensive this makes it easier to take a python Fuzzer and have it interact with the static analyzer

Static Analysis

Thursday November 29 12

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 20: The LLVM Compiler Infrastructure Project

In Depth Analysis

bull With a rewrite we can make it easier to

bull Run non-standard checkers when needed not just when aware

bull Import custom checkersclang++ -Xclang -load -Xclang MainCallCheckerso -Xclang

-analyzer-checker=exampleMainCallChecker --analyze global_staticcpp

Thursday November 29 12

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 21: The LLVM Compiler Infrastructure Project

Call Function Graph

bull Anchors our static to the dynamic engine

bull We want to do this offline so we tie into hypothesis after all this is a development tool

bull Ultimately want to present the developer with a suggestion for another approach

bull Pie in the sky probably but canrsquot get it right unless you try

FunctionGraph

Thursday November 29 12

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 22: The LLVM Compiler Infrastructure Project

Call FunctionMappings

Function Variable Type Input to SubFunction

_start argv char main

main user_info struct param defaults

defaults diag_level unsigned int set_depth

set_depth cur_wrk_dir char readenv

Thursday November 29 12

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 23: The LLVM Compiler Infrastructure Project

Data Dependencybull Find local variables sources

understand how these taint downstream objects

bull Leverage the CFG to follow the byte trail

bull Donrsquot forget we can readwrite to memory if we absolutely need to

DataDependency

Thursday November 29 12

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 24: The LLVM Compiler Infrastructure Project

Local Tainting

bull By understanding the CFG we can even directly write to memory using LLDB to look at crash severity ie write a test payload

bull Write ldquoAAAArdquo = 0x41414141rdquo or other recognizable patterns into memory to taint

Thursday November 29 12

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 25: The LLVM Compiler Infrastructure Project

Hypothesis Enginebull This is a development tool there ought

to be a ldquogoalrdquo

bull Letrsquos face it incorrect assumptions lead to serious problems within a project

bull We can present information in this format as well as present ldquohypothesisrdquo for how to break the software - understand the limits

Hypothesis Engine

Thursday November 29 12

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 26: The LLVM Compiler Infrastructure Project

Facts Enginebull Using PYKE a Python based AI engine that

mirrors Prolog in terms of its functionality

bull Excellent fact storage

bull Draw conclusions ldquogoalsrdquo in PYKE Parlance to deduce information

bull PYKErsquos use of a ldquoplanrdquo for general or specific use cases is a valuable piece for our architecture

Thursday November 29 12

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 27: The LLVM Compiler Infrastructure Project

Conclusions

bull Draw on conclusions reached

bull At a later stage comments or other syntactic elements could be incorporated to test developerrsquos goals

Thursday November 29 12

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 28: The LLVM Compiler Infrastructure Project

Dynamic Instrumentation

bull Simple implementation for this project is a fuzzer

bull Incorporate some aspects of Sulley to leverage LLDB rather than PyDBG

bull Record keeping data generation most notably

Fuzzer

Thursday November 29 12

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 29: The LLVM Compiler Infrastructure Project

Crash Investigationbull The developer doesnt want just a log of crashes

Understanding why it crashed and severity is key

bull This means generate the crash use LLDB to store register information jump up a stack frame to create a hypothesis as to the crash

bull Can test against additional static analysis as well as generating additional dynamic tests of the hypothesis

Thursday November 29 12

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 30: The LLVM Compiler Infrastructure Project

Example of CrashRegister InformationGeneral Purpose Registers rax = 0x0000000000000000 rbx = 0x00007fff54dfcd00 rcx = 0x00007fff54dfcce8 rdx = 0x0000000000000000 rdi = 0x00000000000004cf rsi = 0x0000000000000006 rbp = 0x00007fff54dfcd10 rsp = 0x00007fff54dfcce8 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x00007fff8d5a6342 libsystem_kerneldylib`sigprocmask + 10 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000000000 r15 = 0x0000000000000000 rip = 0x00007fff8d5a4d46 libsystem_kerneldylib`__kill + 10 rflags = 0x0000000000000206 cs = 0x0000000000000007 fs = 0x0000000000000000 gs = 0x0000000000000000

Diagnose Exploitability based on register control

LLDB python APIeasily grabs register state etc

Thursday November 29 12

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 31: The LLVM Compiler Infrastructure Project

Tracking Local Statesbull Python API use breakpoints to ldquopauserdquo

bull At these junctures we can grab local state ensure we understand how the program is being traversed (similar to dtrace functionality)

while processGetState() == lldbeStateStopped com_interpreterHandleCommand( command result ) name = which_frame( result str( target) ) if name add our name to the iterative results print Were at s name framesappend( name ) else lets do a register dump and kill the process print Stopped process performing register dump com_interpreterHandleCommand( register read result ) fhandlewrite(Execution Errorn) fhandlewrite(Register Information nsn resultGetOutput() ) processDestroy() break

processContinue()

Thursday November 29 12

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 32: The LLVM Compiler Infrastructure Project

Coupling

bull Fact based storage of both source deductions and dynamic results are used

bull Next iteration uses lessons learned

bull Augment with additional checkers or even notify developer of an incomplete analysis

Thursday November 29 12

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 33: The LLVM Compiler Infrastructure Project

Python Driverbull Alpha version delivered in Septemberanalysis_step = rdquordquordquoAnalysis Step =====================================================================Using the staticminusanalyzer to build the products via source as wellas assemble the analysis for inputs to dynamic instrumentation phase rdquordquordquoprint colored ( analysis_step rsquoyellowrsquo ) run scan build HtmlDir = scanbuildMain([rsquoclang++rsquorsquo-grsquorsquosimplecxxrsquorsquo-orsquorsquosimple2rsquo]) if HtmlDir == None print rdquordquordquoWe didnrsquot produce a report for now we flag this but this means that larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703ourstatic analysis didnrsquot reveal anything rdquordquordquo minusminusminusminusminusminusminusminusminusminusminusminusminusStep two assemble call function graphcfg_step = rdquordquordquoCall Function Graph =====================================================================Building the call function graph and a few other related inputs for downstream analysis and supporting functionality rdquordquordquoprint colored ( cfg_step rsquoyellowrsquo ) generate callgraph and assemblewalkcallgraph Main ( rsquosimplecxxrsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminusStep three assemble for supporting Hypothesis fact generation plus larr10485761048577104857810485791048580104858110485821048583104858410485851048586104858710485881048589104859010485911048592104859310485941048595104859610485971048598104859910486001048601104860210486031048604104860510486061048607104860810486091048610104861110486121048613104861410486151048616104861710486181048619104862010486211048622104862310486241048625104862610486271048628104862910486301048631104863210486331048634104863510486361048637104863810486391048640104864110486421048643104864410486451048646104864710486481048649104865010486511048652104865310486541048655104865610486571048658104865910486601048661104866210486631048664104866510486661048667104866810486691048670104867110486721048673104867410486751048676104867710486781048679104868010486811048682104868310486841048685104868610486871048688104868910486901048691104869210486931048694104869510486961048697104869810486991048700104870110487021048703 report scanning from our analysis step hypothesis_step = rdquordquordquoHypothesis Generation =====================================================================Building various hypothesis reframing meta data for more abstract representations and assembling supporting information for dynamic testing rdquordquordquoprint colored ( hypothesis_step rsquoyellowrsquo ) compile the facts engine = knowledge_engineengine(rsquorsquo) reveal where our issues arescanparser Main ( HtmlDir rsquosimplecxxrsquo ) step four run the fuzzer dynamic_step = rdquordquordquoDynamic Instrumentation =====================================================================Fuzzing the program using both predetermined pathways that analysis has come up along with more standard ( i e conventional ) fuzzing techniques rdquordquordquoprint colored ( dynamic_step rsquoyellowrsquo ) Runthefuzzerfuzzer1 Driver ( rsquosimple2rsquo ) minusminusminusminusminusminusminusminusminusminusminusminusminus Step five summarize the results summarization_step = rdquordquordquoSummarization =====================================================================Summarization of the analysis so far minus in the Beta version of the software we allow the optional recycling of this information back to the static analysis step to allow biminusdirectional communicationrdquordquordquoprint colored ( summarization_step rsquoyellowrsquo ) run the summarizersummarization Main ( rsquodynamic-instrumentationresultstxtrsquo )

ScanBuild

Mapping

Artificial Intelligence

Fuzzing

Summarization

Thursday November 29 12

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 34: The LLVM Compiler Infrastructure Project

Other plugins

bull Using other modules

bull This is another reason to use Python as there are numerous fuzzing and analysis libraries

bull Fairly straightforward for analysis Replacing the LLDB module can be done but not a trivial operation

Thursday November 29 12

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 35: The LLVM Compiler Infrastructure Project

Example of Incorporating Other Modules Sulley

bull Sulley uses pydbg which is necessarily replaced by LLDB

bull This is a substantial change within the files process_monitorpy instrumentationpypedrpcpy

bull But this is only 3 of 51 files Meaning that this is an essential but not onerous task

Thursday November 29 12

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 36: The LLVM Compiler Infrastructure Project

Important Ideas to Carry Over

bull Keep it modular

bull We want a general architecture that is easily customized

bull Fits well with Python modules

bull Plug and play

Thursday November 29 12

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 37: The LLVM Compiler Infrastructure Project

Envisioning a WorkFlow

Static Analysis

FunctionGraph

DataDependency

Hypothesis Engine

Fuzzer

Scripting (as it as been)

Now JSONAlternatively SQL

PYKE Factsbut will translate to

SQL

Now JSONAlternatively SQL

Facts butwill extend to SQL

Thursday November 29 12

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 38: The LLVM Compiler Infrastructure Project

Other Ideasbull By keeping this in a scripting language we

can create a distributed service without too much pain

bull LLVM Interpreter to simulate for additional architectures

bull Extend with additional black box techniques or modules

Thursday November 29 12

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 39: The LLVM Compiler Infrastructure Project

Project Timelinebull Wersquove released an alpha to DARPA a beta

release is due towards the end of November

bull After the conclusion of this project we will contribute this to the open source community

bull Look for this in early December or thereabouts

bull Feel free to contact jcarlsongototheboardcom

Thursday November 29 12

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 40: The LLVM Compiler Infrastructure Project

Thank You

Thanks to

Ayal Spitz Alan Stone Seth Landsman and especially Peiter Zatko DARPA and Bit Systems

And of course - Thanks to you for listening

Thursday November 29 12

Questions

Thursday November 29 12

Page 41: The LLVM Compiler Infrastructure Project

Questions

Thursday November 29 12