
Exploring Effective Fuzzing Strategies to Analyze Communication Protocols

Yurong Chen, Tian Lan, Guru Venkataramani
{gabrielchen,tlan,guruv}@gwu.edu

ABSTRACT
In recent years, coverage-based greybox fuzzing has become popular for vulnerability detection due to its simplicity and efficiency. However, it is less powerful when applied directly to protocol fuzzing due to the unique challenges involved in fuzzing communication protocols. In particular, the communication among multiple ends contains more than one packet, and the packets are not necessarily dependent upon each other, i.e., fuzzing a single (usually the first) packet can only achieve extremely limited code coverage. In this paper, we study these challenges and demonstrate the limitations of current non-stateful greybox fuzzers. In order to achieve higher code coverage, we design stateful protocol fuzzing strategies for communication protocols to explore the code related to different protocol states. Our approach contains a state switching engine, together with a multi-state forkserver, to consistently and flexibly fuzz different states of a compiler-instrumented protocol program. Our experimental results on OpenSSL show that our approach achieves 73% more code coverage and 2× the unique crashes compared with fuzzing the first packet of a protocol handshake.

CCS CONCEPTS
• Security and privacy → Intrusion/anomaly detection and malware mitigation; • Networks → Web protocol security; • Software and its engineering → Software reliability.

ACM Reference Format:
Yurong Chen, Tian Lan, Guru Venkataramani. 2019. Exploring Effective Fuzzing Strategies to Analyze Communication Protocols. In 3rd Workshop on Forming an Ecosystem Around Software Transformation (FEAST'19), November 15, 2019, London, United Kingdom. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3338502.3359762

1 INTRODUCTION
Vulnerabilities in network protocols (such as Heartbleed in OpenSSL and Remote Code Execution in SNMP) are among the most devastating security problems, since their exploitation typically exposes hundreds of thousands of networked devices to catastrophic risk. Efforts have been made toward developing automatic and scalable techniques to detect vulnerabilities in large protocol codebases. In particular, fuzzing has gained increasing popularity due to its simplicity and efficiency in practice, as compared to other testing techniques such as symbolic/concolic execution.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
FEAST'19, November 15, 2019, London, United Kingdom
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6834-6/19/11...$15.00
https://doi.org/10.1145/3338502.3359762

Figure 1: Fuzzing a four-packet-flight communication protocol: Different packets/fields are independent, e.g., each variation of packet p2 may lead to a different subsequent packet p3, p4. The bug can only be triggered when following the exact sequence p1′ → p2′ → p3′ → p4′, driving the protocol to progressively traverse the corresponding states and eventually reach the bug.


Existing protocol fuzzers can be broadly categorized into three classes: whitebox, blackbox and greybox fuzzers. 1) A whitebox or blackbox protocol fuzzer is typically part of the communication chain, either directly mimicking a client/server in the protocol or acting as a Man-In-The-Middle (MITM) proxy. It generates/intercepts packets among multiple network entities, mutates and relays them. A whitebox fuzzer assumes that the specifications of the protocol are known, while a blackbox fuzzer reverse engineers the protocol by packet/program analysis. 2) A greybox fuzzer works together with a proper testing program (TP) provided for the network protocol. It feeds the mutated inputs to the TP, which is responsible for executing the protocol, while the fuzzer stays out of the communication between clients and servers. For example, OpenSSL has several "official" testing programs [24] for LibFuzzer [27] and AFL [35].

Limitations: Despite recent progress on fuzzing tools, a number of fundamental limitations still exist. While both whitebox fuzzers (such as [1, 26]) and blackbox fuzzers (such as [3, 14, 16]) perform "blind" fuzzing and fail to leverage useful program execution information, blackbox fuzzers particularly suffer from the inaccuracy of protocol reverse engineering. Greybox fuzzers [5, 6, 17, 20, 27, 29–31, 35] usually need a well-constructed TP with an input interface to perform mutation and execution. Popular greybox fuzzers such as AFL can only fuzz the first packet by default.

Challenges: Protocol fuzzing poses the following unique challenges. 1) Communication protocols are typically implemented through state machines on servers/clients, with state transitions driven by critical protocol events such as packet/message exchanges. Although proxies can be employed to mutate and fuzz packets on the fly [4], the fuzzing is often not stateful and lacks the ability to drive the protocol to a specific state of interest, trap it in that state,
and keep replaying and fuzzing it (e.g., with different packet inputs). Stateful fuzzing is necessary for communication protocols. 2) A general program takes its inputs when it is launched, and the execution status depends solely on those inputs (excluding irrelevant factors such as system status and user interruptions). In protocols, however, there are multiple rounds of message flights that contain both independent and dependent packets/fields. Simply fuzzing one single packet/field limits the achievable code coverage, whereas mutating packets/fields together may lead to inconsistent (and invalid) message flights. Protocol fuzzers need to identify the dependence and adapt their fuzzing strategies accordingly.

We illustrate the inefficiency of stateless and individual-packet fuzzing in Fig. 1. The example involves four packets in a complete round of handshake, and the packets are partially interdependent, e.g., the response to p1a could be p2a or p2b, and so on. The buggy code is only executed when the packet sequence exactly follows p1′ → p2′ → p3′ → p4′. Simply fuzzing the first packet could help us discover p1′. However, a single execution with p1′ may not always trigger response p2′ and the subsequent p3′, p4′, while continuing to mutate the first packet can only generate different variations of p1′, getting us no closer to p2′. On the other hand, if the fuzzer starts fuzzing from p2 given a random p1 (say, p1a), the bug will not be triggered either. Therefore, a protocol fuzzer needs to have the ability to trap the protocol execution inside a state right after p1′ and to keep replaying and fuzzing that state repeatedly. This enables the protocol fuzzer to move forward in a progressive manner - by moving into (and focusing on) a new state after finding p2′, and then p3′, p4′ - and eventually find the bug more efficiently.

Our proposal: A protocol fuzzer needs to be stateful. To achieve maximum code coverage and fuzzing depth, it should be able to identify, replicate, and switch between different protocol states while maintaining execution consistency. In this paper, we propose a progressive fuzzer for stateful communication protocols. In particular, our approach consists of a stateful fuzzer and an instrumented testing program (TP). The stateful fuzzer builds multiple fuzzing states across the TP execution and identifies the corresponding fuzzing targets (i.e., packets and fields) for different fuzzing states. It then chooses when to replicate protocol states, progress (move forward to the next fuzzing state), and regress (roll back to the previous fuzzing state), based on the fuzzing yield achieved on the fly. For example, in Fig. 1, we will first fuzz p1 to find p1′, then save the execution state (by forking) to continue fuzzing p2 given p1′, and continue the same process for p3′ and p4′. The fuzzing state may move forward and backward several times during progression and regression to identify the sweet spot with the highest fuzzing yield, until the fuzzer can no longer find interesting testcases.

In summary, this work makes the following contributions.
• We propose a novel framework for stateful protocol fuzzing. It consists of a stateful fuzzer and an instrumented TP, which work in concert to identify, replicate, and switch between different protocol states while maintaining execution consistency during fuzzing.

• We enable flexible power schedules¹ to fully capitalize on the potential of our fuzzer design. In this paper, we implement and evaluate a new power schedule that continuously focuses on fuzzing the (currently) most rewarding protocol states.

¹The power schedule is the policy of assigning time to each testcase. In our approach, the power schedule also denotes the time spent on each protocol state.

Figure 2: Simplified AFL forking workflow. FS: forkserver, TC/TC′: testcase, TP: testing program.

• Our experimental results of fuzzing the OpenSSL library show that we are able to achieve 1.73× code coverage and discover 3× unique crashes (on average) compared with the default AFL.

2 BACKGROUND
AFL [35] is a popular coverage-guided greybox fuzzer. It maintains a queue of the file paths of the testcases. Starting from the seed testcase provided by the user, AFL selects one testcase at a time and maps the file from disk to a memory buffer for mutation. Each testcase goes through multiple rounds of mutation with various mutation operations (such as bit flips, additions, replacements and so on). After each mutation, the modified buffer is written to a file, which becomes the input to the TP. The fuzzer then signals the TP to execute and waits for the execution to finish in order to collect information such as code coverage and exit status.

Instead of blindly feeding generated testcases to the TP, AFL utilizes compile-time instrumentation to track the basic block transitions in the TP. Each basic block of the TP has a unique ID, and a pair of two IDs represents a control flow transition (called an edge, a term we will use throughout the following sections). AFL stores the occurrence of edges in a 64KB memory region (shared between the fuzzer and the TP). For each execution, the TP updates the shared memory with the edge information, and the fuzzer reads that information. If new edges occur or the numbers of edge occurrences change (counts are categorized by value-range buckets), the fuzzer considers the current testcase an interesting one. Such testcases are appended to the queue for further mutation.
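The edge-counting scheme described above can be summarized by the following sketch, which paraphrases the instrumentation documented for AFL; the variable and function names are illustrative, not taken from the paper.

```c
/* Sketch of AFL-style edge coverage tracking (names are illustrative).
 * Each basic block is assigned a random ID at instrumentation time; the
 * edge between two consecutively executed blocks indexes a 64KB bitmap
 * shared between the fuzzer and the TP.                                 */
#include <stdint.h>

#define MAP_SIZE (64 * 1024)

uint8_t  shared_mem[MAP_SIZE];   /* coverage bitmap shared with the fuzzer */
uint32_t prev_location;          /* ID of the previously executed block    */

void on_basic_block(uint32_t cur_location) {   /* cur_location: this block's ID */
  shared_mem[(cur_location ^ prev_location) % MAP_SIZE]++;  /* count this edge  */
  prev_location = cur_location >> 1;   /* shift so edges A->B and B->A differ   */
}
```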

Figure 3: System overview of our stateful fuzzer design. FS: forkserver, multiQ: queues for storing different types of testcases, TP: testing program.

Intuitively, the testcase that results in more code coverage will get more attention and serve as the base for later mutations. (For binary-only fuzzing, instead of compile-time instrumentation, AFL employs QEMU [2] to perform coverage tracking.)

Forkserver: In order to accelerate the fuzzing process, AFL uses a forkserver to avoid repeated program initialization [36]. Without the forkserver, the fuzzer would call execve() to run the TP every time a new testcase is generated, and the TP would have to start from the beginning, e.g., library initialization and dynamic linking. This process can occupy a large fraction of the total execution time. In fact, in most programs it is the code after reading the input that affects code coverage, so such repeated program initializations are redundant. Hence, AFL uses a forkserver to call execve() only once. The workflow of the forkserver is shown in Fig. 2.

The fuzzer calls fork() to create the forkserver. The forkserver performs some configuration first and then calls execve() to execute the TP. The TP then executes until a function called __AFL_INIT() (which is placed in the TP by users at desired positions beforehand)². In __AFL_INIT(), the TP enters an infinite while loop. Every time the fuzzer finishes one mutation of the input and generates a new testcase, it sends a forking signal to the forkserver to generate a cloned TP that executes the mutated input. Fork (1) denotes the forking in the fuzzer to generate the forkserver, and fork (2) denotes the forking in the forkserver to generate the process that reads inputs and executes as the normal TP would. In this way, execve() is called only once, in the forkserver. After that, the forkserver can simply clone itself from the point where program initialization is already done.
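For concreteness, the following is a minimal sketch of the forkserver loop, assuming the standard AFL control/status pipe descriptors (198 and 199); it is a simplification for illustration, not AFL's exact code.

```c
/* Simplified sketch of an AFL-style forkserver loop. The fuzzer asks for
 * one more execution over the control pipe; the forkserver forks, lets the
 * child resume the TP, and reports the child's pid and exit status.       */
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define FORKSRV_CTL 198   /* fuzzer -> forkserver: "run one more testcase"   */
#define FORKSRV_ST  199   /* forkserver -> fuzzer: child pid and exit status */

void forkserver_loop(void) {
  while (1) {
    unsigned int cmd;
    int status;
    if (read(FORKSRV_CTL, &cmd, 4) != 4) _exit(1);  /* fuzzer went away      */
    pid_t child = fork();
    if (child == 0) return;             /* child resumes the TP with the new input */
    write(FORKSRV_ST, &child, 4);       /* report the child pid                    */
    waitpid(child, &status, 0);         /* wait for the execution to finish        */
    write(FORKSRV_ST, &status, 4);      /* report exit status / crash signal       */
  }
}
```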

3 SYSTEM DESIGN
Our approach achieves stateful protocol fuzzing by combining a stateful fuzzer (Section 3.2) and an instrumented TP (Section 3.1), as shown in Fig. 3.

²The function __AFL_INIT() is visible to users; it is actually a macro that expands to the function __afl_manual_init(), which calls __afl_start_forkserver(), where the infinite while loop truly resides. However, to simplify the description, we use __AFL_INIT() throughout this paper.

The fuzzer contains an array of queues and a multi-state forkserver (FS). Each queue stores only the testcases that belong to the same fuzzing stage. For the example in Fig. 1, there will be four queues, for p1, p2, p3 and p4. The fuzzer collects the execution status and code coverage information after each execution of the TP, then decides whether to move forward (progression) or backward (regression). The corresponding queue is chosen based on this decision to store the target testcases. Meanwhile, the fuzzer also sends the decision to the forkserver. The forkserver then forks at the right point by moving one step forward or backward. When the forkserver is ready, it keeps listening to the fuzzer, waiting for signals to fork and generate a new process of the TP. We explain each module separately in the following subsections.

3.1 Testing program
For better understanding, we first show the changes to the TP for protocol fuzzing, then explain how the fuzzer and forkserver work with the TP in the following subsections. The difference between a protocol implementation and a general single-state program is that there are multiple "inputs" (packets) across the protocol, whereas there is one input for single-state programs. As explained in Section 2, the function __AFL_INIT() is used to mark the forking point in the TP so that the forkserver will always clone itself at that position. In practice, protocol state machines are implemented in a while loop, such as in OpenSSL. A simplified SSL client/server model can also be developed for fuzzing [4], where socket communication is transformed into file operations³. We unroll the while loop into three states to better demonstrate the instrumentation of __AFL_INIT(), as shown in the shadowed area of the TP in Fig. 3, but the idea and actual implementation of our approach are not limited by the number of states a protocol may have. While the forkserver is only initialized once in AFL, we conditionally initialize the forkserver multiple times for different fork points in the TP.
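As an illustration only (the paper's TP code is not reproduced here), an unrolled protocol loop with one conditional fork point per state could look like the sketch below. init_yfuzz() is the helper mentioned in Section 5.1; its signature and behavior here are our assumptions.

```c
/* Illustrative sketch of an unrolled protocol TP with one fork point per
 * state. init_yfuzz() stands in for the conditional forkserver
 * initialization; its exact signature is an assumption, not the paper's code. */
#include <stdio.h>

static int fuzz_state = 2;               /* state the fuzzer currently targets */

static void handle_packet(int state) {
  printf("reading and processing packet p%d\n", state);  /* advance the state machine */
}

static void init_yfuzz(int state) {
  /* In the real design this re-enters the forkserver wait loop when
   * state == fuzz_state, so every cloned TP resumes execution right here. */
  if (state == fuzz_state)
    printf("fork point reached: protocol state %d\n", state);
}

int main(void) {
  for (int state = 1; state <= 3; state++) {  /* unrolled in the actual TP */
    handle_packet(state);    /* read/generate the packet for this state          */
    init_yfuzz(state);       /* fork point placed after each packet (Section 5.1) */
  }
  return 0;
}
```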

3.2 State-aware fuzzer
The state engine passes forking and fuzzing state information to the forkserver, based on the execution status of the TP. It collects protocol state and code coverage information from the TP after each execution and, in turn, analyzes this information to decide the forking state of the forkserver and TP for the next execution. At the core of the state engine is the data structure multiQ and the methods that operate on it: constructQ, storeQ, destroyQ, switchQ. Each multiQ struct stores the queue entries of the same type (basically a linked list) as well as the global variables associated with them for logging and analysis.
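A minimal sketch of such a per-state queue structure is shown below; the field layout and method signatures are assumptions for illustration, not the authors' exact definitions.

```c
/* Hedged sketch of the per-state queue structure described above. */
struct queue_entry {
  char *testcase_path;             /* path of the stored testcase           */
  unsigned int exec_cksum;         /* coverage checksum, as in AFL          */
  struct queue_entry *next;        /* linked list of entries for this state */
};

struct multiQ {
  int state_id;                    /* protocol state this queue belongs to    */
  struct queue_entry *head, *cur;  /* pending entries for this fuzzing state  */
  unsigned long total_execs;       /* per-state logging/analysis counters     */
  double coverage;                 /* code coverage attributed to this state  */
};

/* Methods named in the text (signatures are assumptions): */
struct multiQ *constructQ(int state_id);
void storeQ(struct multiQ *q, struct queue_entry *e);
void destroyQ(struct multiQ *q);
struct multiQ *switchQ(struct multiQ *qs[], int next_state);
```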

The reason for designing multiple fuzzing queues is that packets in different stages typically have different formats. It is obviously inefficient to uniformly mutate these types of packets to generate new testcases regardless of the state the TP is in, and simply putting all packets into one queue would disrupt the analyses that are only meant for one queue. For general programs, one queue suffices, as is done in AFL, because the fuzzer only needs to consider a single state of the TP.

³In general, tools such as preeny [37] can be used to convert socket communications into file operations by preloading customized libraries, if the source code of the TP is not available.

All the inputs denoted by the queue entries (no matter what content they contain) will be read by the TP at exactly the same location during execution, which is not the case for protocols.

After each execution of the TP, the fuzzer analyzes the protocol state, the TP exit status and the code coverage, as denoted by arrows (5) and (6) in Fig. 3. In AFL, there is a 64kB shared memory between the fuzzer and the TP to track the code coverage information. Our design also shares the protocol states using this shared memory and pipes. The protocol state is updated for each execution of the TP, and once the fuzzer detects new states (or decides to move to the next/previous state), it invokes the Q-related methods to store/destroy the current fuzzing queue and switch to the new one, as denoted by arrow (1). Meanwhile, it sends the state information to the forkserver (arrow (2)).

The state engine is able to apply flexible policies for progression (moving to the next state) and regression (rolling back to the previous state) based on the specifications of the target protocol or the user's requirements. The heuristics are explained later in Section 4.

3.3 Coordination
In the example shown in Fig. 1, suppose we are mutating the first packet p1 and the forkserver is in state fs1. At first, the mutated p1 has the wrong packet format and does not pass the format checking, so the server sends back an error message and stops the handshake. We use TPstate to track the protocol state changes in the TP. At some point, the mutated p1′ has the correct format and triggers a new state in the TP (TPstate changes⁴). In this case, when the TP finishes the current execution and the fuzzer gets the updated TPstate, it decides to mutate p2 using this interesting p1′ for the next execution (p1′ can be retrieved from the previous fuzzing by the fuzzer). The state engine therefore signals the forkserver to switch to fs2. The forkserver at fs1 forks another instance fs2 to continue the fuzzing. Meanwhile, fs1 is blocked by fs2 until fs2 switches back (when the state engine decides to regress the fuzzing state).
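The progression check described above could be summarized by the following hedged sketch; the function names, the stubbed values, and the signalling details are assumptions for illustration only.

```c
/* Hedged sketch of the coordination step: after each TP execution, compare
 * the reported TPstate with the previous value and, if a new packet flight
 * appeared, switch queues and ask the forkserver to progress.              */
#include <stdio.h>

static int tp_state_from_shm = 4;   /* TPstate the TP reported (stubbed here)  */
static int last_tp_state     = 2;   /* packet flights seen in earlier runs     */
static int cur_fuzz_state    = 1;   /* currently fuzzing packet p1             */

static void switchQ(int s)           { printf("switch to queue of p%d\n", s);   }
static void signal_forkserver(int s) { printf("forkserver: progress to fs%d\n", s); }

/* Called by the state engine after each TP execution. */
static void after_execution(void) {
  if (tp_state_from_shm > last_tp_state) {  /* new packet flight: progress    */
    last_tp_state = tp_state_from_shm;
    cur_fuzz_state++;                       /* start fuzzing the next packet  */
    switchQ(cur_fuzz_state);
    signal_forkserver(cur_fuzz_state);
  }
}

int main(void) { after_execution(); return 0; }
```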

In summary, our proposed stateful fuzzing design performs program fuzzing flexibly at multiple execution points with "memory" of preceding program states, by incorporating a state engine, a multi-state forkserver and a stateful TP into a closed loop. The fuzzer analyzes the information provided by the forkserver and the TP to decide the fuzzing state. The forkserver carries out the state transition for the TP. The policies for state transitions (i.e., when to stay, progress or regress) are discussed in Section 4 and evaluated in Section 5.

4 IMPLEMENTATION
Our implementation is based on AFL and utilizes its coverage-guided testcase generation and communication infrastructure. The multi-state fuzzer contains multiple queues (for testcases representing different stages of packets), a state engine (which analyzes the yield information and decides the next fuzzing state) and a stateful forkserver (which is able to switch between different forking states).

⁴Currently we use the number of packet flights seen in each execution as TPstate. However, there could be more options for specific protocols and fuzzing purposes, such as execution time, packet size ranges or specific actions that the TP triggers.

The TP is also instrumented as follows. (1) Multiple program locations are selected and set as the forking points for the multi-state forkserver initialization. (2) In addition to the code coverage and exit status, the TP shares more information (such as TPstate) with the state engine in the fuzzer. (3) The fuzzer and TP also communicate to record the packets when progressions are performed, such that we can keep track of the chain of packets that leads to vulnerabilities.

Search Policy: Based on the structure of the multi-state forkserver, we implement a DFS-like search policy to transition among different fuzzing states. Taking Fig. 1 as an example, when an interesting p1′ occurs, the fuzzer uses the program state of p1′ and starts to fuzz p2. If an interesting p2′ is generated, we then follow the program states of p1′ and p2′ to fuzz p3, and so on. At any state between p1 and p4, if no interesting case is generated, the fuzzer regresses to the previous fuzzing state (p4 → p3, p3 → p2 or p2 → p1). When the fuzzer comes back to p1, it continues fuzzing p1 and waits for the next progression.

The conditions for progression and regression define the power schedule. The fuzzer performs progression to move the fuzzing state forward when TPstate satisfies certain conditions (such as an increase in the number of packets observed during the current execution). In particular, an increase in the number of packets indicates that the packet currently being fuzzed has triggered a new protocol state, as well as new code coverage. However, such a condition can prevent progression from happening when TPstate has already reached its maximum value and cannot increase any more. In Fig. 1, suppose our fuzzer is handling p1; the initial seed of p1 might not be valid and the number of packet flights is 2 (p1 and p2, where p2 terminates the session). After a certain amount of mutation, a valid p1 is generated (p1′) and TPstate reaches 4. The fuzzer will start to fuzz p2 based on the program state of p1′. At this point TPstate will not exceed 4 any more, which means progression will not be triggered to fuzz p3 and p4. To solve this problem, in addition to the TPstate monitoring, our design also adopts a profile-based progression policy. In particular, the fuzzing process is separated into two stages: profiling and testing. During the profiling stage, each packet is fuzzed for a fixed amount of time (say, one hour) to provide an overview of the code coverage and fuzzing queue related to each packet. After profiling, the probability of progression at each state is decided. Intuitively, the fuzzing state that has higher code coverage and more pending queue entries is assigned more fuzzing time, and the probability of progressing to this fuzzing state is assigned a larger value. In the testing stage, our fuzzer performs random progression based on the probabilities determined during profiling. Periodically, we update the probabilities by jointly considering the code coverage (and queue entries) in the profiling and testing stages. We also assign a higher score to the packets that trigger new protocol states, giving more mutation time to these packets.
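One plausible way to turn the profiling statistics into progression probabilities is sketched below; the exact weighting is not specified in the paper, so the scoring formula here is an assumption.

```c
/* Hedged sketch: derive per-state progression probabilities from profiling
 * statistics (coverage, pending queue entries, new protocol states).       */
#include <stddef.h>

struct state_profile {
  double coverage;        /* code coverage observed while fuzzing this state */
  size_t pending;         /* pending queue entries for this fuzzing state    */
  int new_proto_state;    /* did this packet trigger a new protocol state?   */
};

/* Fill probs[i] with the probability of progressing into state i. */
void progression_probs(const struct state_profile *p, double *probs, size_t n) {
  double total = 0.0;
  for (size_t i = 0; i < n; i++) {
    double score = p[i].coverage + (double)p[i].pending;
    if (p[i].new_proto_state) score *= 2.0;   /* bonus for new protocol states */
    probs[i] = score;
    total += score;
  }
  for (size_t i = 0; i < n; i++)
    probs[i] = total > 0.0 ? probs[i] / total : 1.0 / (double)n;
}
```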

A similar mechanism is applied to regression, i.e., the fuzzing state with higher code coverage and more pending queue entries has a lower probability of regressing to the previous fuzzing state. We also set other thresholds for regression, such as max_Q_cycles and max_entries. When the current fuzzing state finishes max_Q_cycles or the index of the current queue entry exceeds max_entries, we enforce regression to prevent wasting too many resources on the current fuzzing state.

Note that we set the search policy in our approach heuristically.In fact, the progression and regression conditions can be easilychanged to adapt to different protocols.

Table 1: Statistics of fuzzing a single packet (OpenSSL v101) at four different stages using the default AFL for 6 and 24 hours.

Packet   Code Coverage (%)   Unique Crashes   Cycles Done   Total # of Executions (M)   Time (hours)
p1       9.51                1                4             7.87                        6
p2       10.18               9                0             12.68                       6
p3       5.56                9                15            12.21                       6
p4       2.61                6                157           12.43                       6
p1       9.64                11               30            42.05                       24
p2       11.16               9                6             49.58                       24
p3       5.6                 14               410           66.20                       24
p4       2.61                9                1308          54.80                       24

5 EVALUATION
In this section, we evaluate our stateful fuzzer design to answer the following questions: (i) What is the performance of our fuzzer with respect to metrics such as code coverage and number of unique crashes? (ii) How does it compare with non-stateful fuzzing such as the default AFL? (iii) What is the runtime overhead of our fuzzer due to state forking and replay? (iv) What are the benefits of a fuzzing strategy that targets higher yield in code coverage?

5.1 Environment Setup
All experiments are done on an Ubuntu server (16.04.5 LTS) with 48 cores (Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz) and 92 GB RAM. Each fuzzer runs on a single core in the same environment. We choose OpenSSL (versions 101 and 110) as our benchmark. As mentioned in [4], we first compile OpenSSL using afl-clang-fast to generate the static libraries libssl.a and libcrypto.a. Then, in our instrumented TP, we invoke functions from libssl.a and libcrypto.a to perform the SSL handshake. We add init_yfuzz() after each packet is generated. In the TP, we also get the shared memory pointer through the environment variable "__AFL_SHM_ID", which is created by the fuzzer and shared among its child processes. We use the number of packets observed during the execution as the value of TPstate. Hence, each time TPstate changes, we consider it a state change in the protocol. We utilize AFL's built-in support for ASAN [15] to cover more crash conditions.
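For reference, the way an AFL-instrumented process attaches to this shared memory via __AFL_SHM_ID looks roughly like the sketch below (a simplified paraphrase of AFL's runtime behavior, not the paper's code).

```c
/* Simplified sketch of attaching to AFL's coverage bitmap through the
 * __AFL_SHM_ID environment variable created by the fuzzer.             */
#include <stdlib.h>
#include <sys/shm.h>

#define MAP_SIZE (64 * 1024)

unsigned char *trace_bits;   /* the 64KB bitmap shared with the fuzzer */

void attach_shared_memory(void) {
  const char *id_str = getenv("__AFL_SHM_ID");
  if (!id_str) return;                    /* not running under the fuzzer   */
  int shm_id = atoi(id_str);
  trace_bits = shmat(shm_id, NULL, 0);    /* map the fuzzer-created segment */
  if (trace_bits == (void *)-1) exit(1);  /* attach failed                  */
}
```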

We conduct each batch of experiments with the same parameters (w.r.t. OpenSSL version, fuzzer settings and fuzzing time) four times and report the average numbers where applicable.

5.2 Effect of single-packet fuzzing
We evaluate the performance (in terms of code coverage and unique crashes) of fuzzing a single packet during the OpenSSL handshake to demonstrate the limitations of default non-stateful fuzzing.

Table 2: Code coverage breakdown: the code explored by fuzzing four individual packets. Time is in hours. The total size of the bitmap is 64kB (65536 bytes). Ui stands for the number of edges that are only explored when fuzzing packet i but not when fuzzing the other packets (i.e., edges that are unique to packet i).

Version   Time   Covered   Uncovered   U1     U2    U3   U4
101       6      7677      57859       563    955   30   386
101       10     8879      56657       373    962   29   360
101       24     8896      56640       312    966   32   359
110       6      10721     54815       1472   755   26   296
110       10     10093     55443       2123   81    15   293
110       24     11272     54264       2054   738   22   295

The TP is constructed using the design proposed by Böck [4]. By assigning different values to packetID in line 11, we can utilize the default forkserver in AFL to conveniently fuzz different packets.
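The TP listing referenced above is not reproduced in this text; the following is a hypothetical sketch of how a packetID value could select the single fork point used with the default AFL forkserver. The names other than __afl_manual_init() are illustrative.

```c
/* Hypothetical sketch: a compile-time PACKET_ID selects before which packet
 * __AFL_INIT() fires, so the stock AFL forkserver fuzzes exactly that packet. */
#include <stdio.h>

#ifndef PACKET_ID
#define PACKET_ID 1                /* which packet to fuzz with default AFL */
#endif

/* Provided by the AFL runtime when building with afl-clang-fast;
 * __AFL_INIT() is a macro wrapper around this function.          */
extern void __afl_manual_init(void);

static void read_and_process_packet(int i) {   /* stub for the TP logic */
  printf("packet p%d\n", i);
}

void run_handshake(void) {
  for (int i = 1; i <= 4; i++) {
    if (i == PACKET_ID)
      __afl_manual_init();         /* deferred forkserver snapshots execution here */
    read_and_process_packet(i);
  }
}
```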

In the case of OpenSSL version 101, fuzzing different packets results in different code coverage. Fuzzing the first and second packets typically yields more code coverage than fuzzing the third and fourth packets. The average numbers are shown in Table 1. In particular, fuzzing the first and second packets of the OpenSSL handshake for 6 hours achieves 9.51% and 10.18% code coverage, respectively. However, fuzzing the third and fourth packets can only reach 5.56% and 2.61% of the code, since the code space left for them to explore is greatly reduced when starting from the late stage of the handshake. Correspondingly, the number of completed fuzzing queue cycles for late-stage fuzzing (p3 and p4) is much larger than for early-stage fuzzing (p1 and p2), which means that AFL cannot find interesting testcases anymore; the queue is therefore much shorter, and AFL finishes one round of fuzzing quickly and starts the next cycle. When the experiments are conducted for 24 hours, the code coverage results are similar. This indicates that the growth of code coverage when fuzzing a single packet becomes extremely slow after 6 hours or less.

Among the code coverage achieved by fuzzing different packets, some edges are common and others may be unique to each packet. We want to find out the composition of the code coverage obtained by fuzzing individual packets. However, the default AFL assigns IDs to basic blocks randomly at runtime. If we restart the program to fuzz p2 after fuzzing p1, the assignment of block IDs will be different, which means that the same edge could appear at different positions in the bitmap. Hence, we fuzz the four different packets in one run, each for 6 (10, 24) hours. When the current packet fuzzing has lasted 6 (10, 24) hours, we force progression to fuzz the next packet by clearing the code coverage bitmap (the global variable virgin_bits in AFL) without relaunching AFL. The experimental results are shown in Table 2. We can see that the total code coverage of 24 hours of fuzzing (6 hours per packet) is 7677/64kB = 11.71%, which is higher than any of the four single-packet runs shown in Table 1, due to the unique code coverage. Further, we analyze the bitmap (which is used to store the code coverage information in AFL) and obtain the unique edges explored by each packet, as shown in columns U1, U2, U3 and U4 of Table 2.
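The unique-edge counts behind Table 2 can be computed along the lines of the sketch below; this is our assumed reconstruction, not the authors' analysis script. An edge is counted as unique to packet i if its bitmap byte is set for packet i and clear for the other three per-packet bitmaps.

```c
/* Hedged sketch of the unique-edge computation for Table 2 (assumed). */
#define MAP_SIZE (64 * 1024)

int count_unique_edges(unsigned char maps[4][MAP_SIZE], int i) {
  int unique = 0;
  for (int e = 0; e < MAP_SIZE; e++) {
    if (!maps[i][e]) continue;                 /* edge not hit by packet i     */
    int seen_elsewhere = 0;
    for (int j = 0; j < 4; j++)
      if (j != i && maps[j][e]) { seen_elsewhere = 1; break; }
    if (!seen_elsewhere) unique++;             /* hit only when fuzzing packet i */
  }
  return unique;
}
```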

In summary, the experiments conducted in this section have shown that:

• By fuzzing only one packet, the code coverage is limited. Fuzzing early-stage packets results in higher code coverage.

• Fuzzing different packets discovers unique code. That is, even though late-stage packet fuzzing achieves less code coverage, it still discovers code that cannot be discovered by early-stage packet fuzzing (and early-stage packet fuzzing also discovers unique code that cannot be explored by late-stage packet fuzzing).

These two observations show the need for a stateful fuzzing approach, and demonstrate the usefulness of fuzzing different packets interactively and heuristically.

5.3 Progression and Regression policies
After the profiling stage (described in Section 4), our fuzzer starts to perform progression and regression based on the protocol state changes and the probabilities (derived from the code coverage and fuzzing queues during the profiling stage). In the case of AFL, fuzzing p1 or p2 results in better code coverage and more unique crashes, as shown in Table 1. In addition, AFL tends to stop discovering new code soon after there are no more interesting testcases. Our design is able to "escape" the fuzzing stages that are no longer profitable and flexibly switch between different states to discover new code.

On average over four 24-hour runs, our stateful fuzzer design is able to discover 19.27% of the code (relative to the total size of the 64kB shared memory). In particular, fuzzing packets p1, p2, p3 and p4 contributes 10%, 4.65%, 1.73% and 2.79% code coverage (of the 64kB shared memory), respectively. In other words, fuzzing p1 contributes 10/19.27 = 52.41% of the entire discovered code. Similarly, fuzzing p2, p3 and p4 contributes 24.13%, 8.98% and 14.48%. The air time spent on each fuzzing stage is 7.2, 11, 1.4 and 4.4 hours, respectively. In terms of unique crashes, our new fuzzer design found 43 unique crashes during 24 hours (on average), while AFL found 11 when fuzzing p1 (for 24 hours) or 14 when fuzzing p3 (for 24 hours).

6 RELATED WORK
Various techniques have been proposed to discover vulnerabilities in programs [32–34]. In particular, network protocols are common targets of cyber attacks, and efforts have been made toward evaluating and improving protocol security [10, 11]. Program fuzzing has enjoyed success in hunting bugs in real-world programs, with researchers devoting tremendous effort to it.

Code-coverage guided fuzzing: Plenty of works focus on smarter testcase mutation/selection or search heuristics, to help the fuzzer generate inputs that explore more/rare/buggy execution paths [13, 17, 20, 21, 27, 29, 30, 35]. AFLFast [6] models testcase generation as a Markov chain. It changes the testcase power scheduling policy (scoring and priority mechanism) of the default AFL to prevent AFL from spending too much time on high-frequency testcases and to assign more resources to low-frequency paths. Similarly, AFLGo [5] uses a simulated annealing algorithm to assign more mutation time to testcases that are "closer" to the target basic block, in order to quickly direct the fuzzing towards the target code area. These works help AFL find target paths faster by changing the mutation time assigned to each testcase, but cannot find new paths, e.g., new vulnerabilities. Our stateful fuzzer design, on the other hand, can not only optimize the power schedule based on the protocol states, but can also explore, through stateful progressions, new paths that the default AFL could never explore.

Symbolic execution and tainting: Techniques such as tainting and symbolic execution have been studied to discover vulnerabilities in programs [7]. It is also beneficial to combine symbolic execution/tainting with greybox fuzzing [19, 23]. Angora [9] implements byte-level tainting to locate the critical byte sequences (which determine branch control flows) in the input, then uses a gradient descent algorithm to solve branch conditions and explore both branches. SYMFUZZ [8] utilizes tainting and symbolic execution to determine the dependencies between input bytes and the program CFG, in order to decide which bytes to mutate (the optimal input mutation ratio) during fuzzing. Driller [28] uses concolic execution to solve constraints on magic numbers (to guide fuzzing) and then applies fuzzing inside each code compartment (to mitigate path explosion).

Program transformation: Another interesting line of work transforms the testing programs for fuzzing [18, 22, 25]. T-Fuzz [25] dynamically traces the testing programs to locate and remove the checks once the fuzzer gets stuck. Untracer [22] creates customized testing programs with software interrupts at the beginning of each basic block. Instead of tracing every testcase for coverage information (as in AFL), Untracer enables the testing program to signal the fuzzer once new basic blocks are encountered, thus greatly reducing the overhead caused by redundant testcase tracing.

Protocol fuzzing: Few greybox fuzzers are designed specifically for protocols (or, in general, stateful programs). Sulley [1] and its successor Boofuzz [26] are two popular whitebox protocol fuzzers that generate packets based on protocol specifications and then send them to target ports for fuzzing. Unlike greybox fuzzers, which instrument the testing program for code coverage and tainting information, whitebox fuzzers typically only instrument to monitor process/network failures. Thus they lack the guidance for smarter testcase generation and power scheduling. Blackbox protocol fuzzers [3, 12, 14, 16] have the same limitation.

7 CONCLUSION
In this paper, we identify the challenges in fuzzing stateful protocols/programs and demonstrate the limitations of existing greybox fuzzers when fuzzing protocols. In order to achieve higher code coverage for protocol fuzzing, we propose a progressive stateful protocol fuzzer that captures the state changes in protocols and heuristically explores the code spaces related to multiple protocol states. We implemented our design upon the popular greybox fuzzer AFL and evaluated it using OpenSSL (v101 and v110). Our experimental results show that we achieve 1.73× code coverage and 2× unique crashes compared to fuzzing only the first packet of the protocol communication (as current greybox fuzzers do).

8 ACKNOWLEDGMENT
We thank the reviewers for their valuable opinions. This work was supported by the US Office of Naval Research (ONR) under Awards N00014-17-1-2786 and N00014-15-1-2210. Any opinions, findings, conclusions, or recommendations expressed in this article are those of the authors, and do not necessarily reflect those of ONR.

REFERENCES
[1] Pedram Amini and Aaron Portnoy. Sulley fuzzing framework, 2010.
[2] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, volume 41, page 46, 2005.
[3] Bernhards Blumbergs and Risto Vaarandi. Bbuzz: A bit-aware fuzzing framework for network protocol systematic reverse engineering and analysis. In MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), pages 707–712. IEEE, 2017.
[4] Hanno Böck. How Heartbleed could've been found, 2015.
[5] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2329–2344. ACM, 2017.
[6] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. Coverage-based greybox fuzzing as Markov chain. IEEE Transactions on Software Engineering, 2017.
[7] Cristian Cadar, Daniel Dunbar, Dawson R. Engler, et al. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, volume 8, pages 209–224, 2008.
[8] Sang Kil Cha, Maverick Woo, and David Brumley. Program-adaptive mutational fuzzing. In 2015 IEEE Symposium on Security and Privacy, pages 725–741. IEEE, 2015.
[9] Peng Chen and Hao Chen. Angora: Efficient fuzzing by principled search. In 2018 IEEE Symposium on Security and Privacy (SP), pages 711–725. IEEE, 2018.
[10] Yurong Chen, Tian Lan, and Guru Venkataramani. Damgate: Dynamic adaptive multi-feature gating in program binaries. In Proceedings of the 2017 Workshop on Forming an Ecosystem Around Software Transformation, pages 23–29. ACM, 2017.
[11] Yurong Chen, Shaowen Sun, Tian Lan, and Guru Venkataramani. Toss: Tailoring online server systems through binary feature customization. In Proceedings of the 2018 Workshop on Forming an Ecosystem Around Software Transformation, pages 1–7. ACM, 2018.
[12] Joeri De Ruiter and Erik Poll. Protocol state fuzzing of TLS implementations. In 24th USENIX Security Symposium (USENIX Security 15), pages 193–206, 2015.
[13] Shuitao Gan, Chao Zhang, Xiaojun Qin, Xuwen Tu, Kang Li, Zhongyu Pei, and Zuoning Chen. CollAFL: Path sensitive fuzzing. In 2018 IEEE Symposium on Security and Privacy (SP), pages 679–696. IEEE, 2018.
[14] Hugo Gascon, Christian Wressnegger, Fabian Yamaguchi, Daniel Arp, and Konrad Rieck. Pulsar: Stateful black-box fuzzing of proprietary network protocols. In International Conference on Security and Privacy in Communication Systems, pages 330–347. Springer, 2015.
[15] Google. AddressSanitizer, 2018.
[16] Serge Gorbunov and Arnold Rosenbloom. AutoFuzz: Automated network protocol fuzzing framework. IJCSNS, 10(8):239, 2010.
[17] Siddharth Karamcheti, Gideon Mann, and David Rosenberg. Adaptive grey-box fuzz-testing with Thompson sampling. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 37–47. ACM, 2018.
[18] Ulf Kargén and Nahid Shahmehri. Turning programs against each other: High coverage fuzz-testing using binary-code mutation and dynamic slicing. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 782–792. ACM, 2015.
[19] Su Yong Kim, Sangho Lee, Insu Yun, Wen Xu, Byoungyoung Lee, Youngtae Yun, and Taesoo Kim. CAB-Fuzz: Practical concolic testing techniques for COTS operating systems. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), pages 689–701, 2017.
[20] Caroline Lemieux and Koushik Sen. FairFuzz: Targeting rare branches to rapidly increase greybox fuzz testing coverage. arXiv preprint arXiv:1709.07101, 2017.
[21] Yongbo Li, Fan Yao, Tian Lan, and Guru Venkataramani. Sarre: Semantics-aware rule recommendation and enforcement for event paths on Android. IEEE Transactions on Information Forensics and Security, 11(12):2748–2762, 2016.
[22] Stefan Nagy and Matthew Hicks. Full-speed fuzzing: Reducing fuzzing overhead through coverage-guided tracing. arXiv preprint arXiv:1812.11875, 2018.
[23] Saahil Ognawala, Thomas Hutzelmann, Eirini Psallida, and Alexander Pretschner. Improving function coverage with Munch: A hybrid fuzzing and directed symbolic execution approach. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 1475–1482. ACM, 2018.
[24] OpenSSL. Official testing programs for OpenSSL fuzzing, 2019.
[25] Hui Peng, Yan Shoshitaishvili, and Mathias Payer. T-Fuzz: Fuzzing by program transformation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 697–710. IEEE, 2018.
[26] J. Pereyda. boofuzz: Network protocol fuzzing for humans. Accessed: Feb. 17, 2017.
[27] Kosta Serebryany. Continuous fuzzing with libFuzzer and AddressSanitizer. In 2016 IEEE Cybersecurity Development (SecDev), pages 157–157. IEEE, 2016.
[28] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. Driller: Augmenting fuzzing through selective symbolic execution. In NDSS, volume 16, pages 1–16, 2016.
[29] Robert Swiecki. Honggfuzz: A general-purpose, easy-to-use fuzzer with interesting analysis options. URL: https://github.com/google/honggfuzz (visited on 06/21/2017).
[30] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. Superion: Grammar-aware greybox fuzzing. arXiv preprint arXiv:1812.01197, 2018.
[31] Wen Xu, Sanidhya Kashyap, Changwoo Min, and Taesoo Kim. Designing new operating primitives to improve fuzzing performance. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2313–2328. ACM, 2017.
[32] Hongfa Xue, Yurong Chen, Fan Yao, Yongbo Li, Tian Lan, and Guru Venkataramani. Simber: Eliminating redundant memory bound checks via statistical inference. In IFIP International Conference on ICT Systems Security and Privacy Protection, pages 413–426. Springer, 2017.
[33] Hongfa Xue, Guru Venkataramani, and Tian Lan. Clone-hunter: Accelerated bound checks elimination via binary code clone detection. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pages 11–19. ACM, 2018.
[34] Fan Yao, Yongbo Li, Yurong Chen, Hongfa Xue, Tian Lan, and Guru Venkataramani. Statsym: Vulnerable path discovery through statistics-guided symbolic execution. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 109–120. IEEE, 2017.
[35] Michal Zalewski. American fuzzy lop, 2014.
[36] Michal Zalewski. Fuzzing binaries without execve, 2019.
[37] Zardus and Mrsmith0x00. Preeny, 2019.
