Body armor for binaries: preventing buffer overflows without recompilation
Asia Slowinska, Vrije Universiteit Amsterdam
Traian Stancescu, Google, Inc.
Herbert Bos, Vrije Universiteit Amsterdam
BinArmor is a novel technique to protect existing C binaries from memory corruption attacks on both control data and non-control data. Without access to source code, non-control data attacks cannot be detected with current techniques. Our approach hardens binaries against both kinds of overflow, without requiring the program's source or symbol tables. We show that BinArmor is able to stop real attacks, including the recent non-control data attack on Exim. Moreover, we did not incur a single false positive in practice. On the downside, the current overhead of BinArmor is high, although no worse than competing technologies like taint analysis that do not catch attacks on non-control data. Specifically, we measured an overhead of 70% for gzip, 16%-180% for lighttpd, and 190% for the nbench suite.
Despite modern security mechanisms like stack protection, ASLR, and PaX/DEP/W⊕X, buffer overflows rank third in the CWE SANS top 25 most dangerous software errors. The reason is that attackers adapt their techniques to circumvent our defenses.
Non-control data attacks, such as the well-known attacks on exim mail servers (Section 2), are perhaps most worrying [12, 30]. Attacks on non-control data are hard to stop, because they do not divert the control flow, do not execute code injected by the attacker, and often exhibit program behaviors (e.g., in terms of system call patterns) that may well be legitimate. Worse, for binaries, we do not have the means to detect them at all.
Current defenses against non-control data attacks all require access to the source code [20, 3, 4]. In contrast, security measures at the binary level can stop various control-flow diversions [15, 2, 19], but offer no protection against corruption of non-control data.
Even for more traditional control-flow diverting attacks, current binary instrumentation systems detect only the manifestations of attacks, rather than the attacks themselves. For instance, they detect a control flow diversion that eventually results from the buffer overflow, but not the actual overflow itself, which may have occurred thousands of cycles before. The lag between time-of-attack and time-of-manifestation makes it harder to analyze the attack and find the root cause.
In this paper, we describe BinArmor, a tool to bolt a layer of protection on C binaries that stops state-of-the-art buffer overflows immediately (as soon as they occur).
High level overview. Rather than patching systems after a vulnerability is found, BinArmor is proactive and stops buffer (array) overflows in binary software, before we even know it is vulnerable. Whenever it detects an attack, it will raise an alarm and abort the execution. Thus, like most protection schemes, we assume that the system can tolerate rare crashes. Finally, BinArmor operates in one of two modes. In BA-fields mode, we protect individual fields inside structures. In BA-objects mode, we protect at the coarser granularity of full objects.
BinArmor relies on limited information about the program's data structures, specifically the buffers that it should protect from overflowing. If the program's symbol tables are available, BinArmor is able to protect the binary against buffer overflows with great precision. Moreover, in BA-objects mode no false positives are possible in this case. While we cannot guarantee this in BA-fields mode, we did not encounter any false positives in practice, and as we will discuss later, they are unlikely.
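The difference between the two modes can be illustrated with a minimal sketch. The struct `account`, the enum `ba_mode`, and the function `write_allowed` are hypothetical names introduced only for this illustration; they model the granularity of the check, not how BinArmor implements it. The sketch assumes a pointer that first pointed into the `name` array and asks whether a later write at a given byte offset inside the struct would be accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: an array field followed by a sensitive scalar. */
struct account {
    char name[8];
    int  is_admin;
};

/* Hypothetical names for the two protection granularities. */
typedef enum { BA_OBJECTS, BA_FIELDS } ba_mode;

/* Returns 1 if a write at byte offset `off` inside a struct account is
 * accepted for a pointer that first pointed into `name`, 0 otherwise.
 * BA_FIELDS bounds the pointer by the field; BA_OBJECTS by the object. */
int write_allowed(ba_mode mode, size_t off) {
    assert(mode == BA_OBJECTS || mode == BA_FIELDS);
    size_t limit = (mode == BA_FIELDS)
        ? offsetof(struct account, name) + sizeof(((struct account *)0)->name)
        : sizeof(struct account);
    return off < limit;
}
```

Under this model, a write at offset 8 (the first byte past `name`) is accepted at object granularity but rejected at field granularity, which is why only the field-level check would catch an overflow from `name` into `is_admin`.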
However, while researchers in security projects frequently assume the availability of symbol tables, in practice, software vendors often strip their code of all debug symbols. In that case, we show that we can use automated reverse engineering techniques to extract symbols from stripped binaries, and that this is enough to protect real-world applications against real-world attacks. To our knowledge, we are the first to use data structure recovery to prevent memory corruption. We believe the approach is promising and may also benefit other systems, like XFI and memory debuggers.
[Figure: (i) Find arrays in binaries. (ii) Find accesses to arrays. (iii) Rewrite the binary: assign colors to arrays and check colors on every array access. If a pointer that first pointed into an array later accesses an area outside the array, crash().]
Fig. 1: BinArmor overview.
BinArmor hardens C binaries in three steps (Fig. 1):
(i) Data structure discovery: dynamically extract the data structures (buffers) that need protection.
(ii) Array access discovery: dynamically find potentially unsafe pointer accesses to these buffers.
(iii) Rewrite: statically rewrite the binary to ensure that a pointer accessing a buffer stays within its bounds.
Data structure discovery is easy when symbol tables are available, but very hard when they are not. In the absence of symbol tables, BinArmor uses recent research results to reverse engineer the data structures (and especially the buffers) from the binary itself by analyzing memory access patterns (Fig. 1, step i). Something is a struct if it is accessed like a struct, and an array if it is accessed like an array, and so on. Next, given the symbols, BinArmor dynamically detects buffer accesses (step ii). Finally, in the rewrite stage (step iii), it takes the data structures and the accesses to the buffers, and assigns to each buffer a unique color. Every pointer used to access the buffer for the first time obtains the color of this buffer. BinArmor raises an alert whenever, say, a blue pointer accesses a red byte.
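The color check can be sketched in a few lines of C. This is a simplified model under stated assumptions, not BinArmor's implementation: a small array `arena` stands in for program memory, a parallel `shadow` array holds one color per byte, and `colored_ptr`, `color_buffer`, `checked_store`, and `demo_fill` are hypothetical names. Where BinArmor would raise an alarm and abort, the sketch returns an error code so that the behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 64
#define NO_COLOR   0

static uint8_t arena[ARENA_SIZE];   /* stand-in for program memory */
static uint8_t shadow[ARENA_SIZE];  /* one color per byte of arena */

/* A pointer together with the color it obtained on its first access. */
typedef struct {
    size_t  offset;
    uint8_t color;
} colored_ptr;

/* Assign one color to every byte of a buffer. */
void color_buffer(size_t start, size_t len, uint8_t color) {
    assert(color != NO_COLOR && start + len <= ARENA_SIZE);
    memset(shadow + start, color, len);
}

/* Checked one-byte store: 0 on success, -1 on a color mismatch. */
int checked_store(colored_ptr *p, uint8_t value) {
    if (p->offset >= ARENA_SIZE)
        return -1;
    uint8_t target = shadow[p->offset];
    if (p->color == NO_COLOR)
        p->color = target;          /* first access: take the byte's color */
    else if (p->color != target)
        return -1;                  /* e.g. a blue pointer on a red byte */
    arena[p->offset] = value;
    return 0;
}

/* Lay out two adjacent 8-byte buffers (colors 1 and 2), then write n bytes
 * starting at `start`. Returns how many bytes were stored before the first
 * rejected access. */
size_t demo_fill(size_t start, size_t n) {
    memset(arena, 0, sizeof arena);
    memset(shadow, NO_COLOR, sizeof shadow);
    color_buffer(0, 8, 1);
    color_buffer(8, 8, 2);
    colored_ptr p = { start, NO_COLOR };
    size_t stored = 0;
    while (stored < n && checked_store(&p, 'A') == 0) {
        p.offset++;
        stored++;
    }
    return stored;
}
```

In this model, `demo_fill(0, 12)` stores only the first 8 bytes: the ninth write reaches a byte of the second buffer, whose color differs from the one the pointer obtained on its first access, so the store is rejected at the moment of the overflow rather than when corrupted data is later used.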
Contributions. BinArmor proactively protects existing C binaries, before we even know whether the code is vulnerable, against attacks on control data and non-control data, and it can do so either at object or sub-field granularity. Compared to source-level protection like WIT, BinArmor has the advantage that it requires no access to source code or the original symbol tables. In addition, in BA-fields mode, by protecting individual fields inside a structure rather than aggregates, BinArmor is finer-grained than WIT and similar solutions. Also, it prevents overflows on both writes and reads, while WIT protects only writes and permits information leakage. Further, we show in Section 9 that points-to analysis (a technique relied on by WIT) is frequently imprecise.
Compared to techniques like taint analysis that also target binaries, BinArmor detects both control flow and non-control flow attacks, whereas taint analysis detects only the former. Also, it detects attacks immediately when they occur, rather than sometime later, when a function pointer is used.
The main drawback of BinArmor is the very significant slowdown (up to 2.8x for the lighttpd webserver and 1.7x for gzip). While better than most tainting systems (which typically incur 3x-20x), it is much slower than WIT (1.04x for gzip). Realistically, such slowdowns make BinArmor in its current form unsuitable for any system that requires high performance. On the other hand, it may be used in application domains where security rather than performance is of prime importance. In addition, because BinArmor detects buffer overflows themselves rather than their manifestations, we expect it to be immediately useful for security experts analyzing attacks. Finally, we will show later that we have not explored all opportunities for performance optimization.
Our work builds on dynamic analysis, and thus suffers from the limitations of all dynamic approaches: we can only protect what we execute during the analysis. This work is not about code coverage. We rely on existing tools and test suites to cover as much of the binary as possible. Since coverage is never perfect, we may miss buffer accesses and thus incur false negatives. Despite this, BinArmor detected all 12 real-world buffer overflow attacks in real-world applications we study (Section 8).
BinArmor takes a conservative approach to prevent false positives (unnecessary program crashes). For instance, no false positives are possible when the protection is limited to structures (BA-objects mode). In BA-fields mode, we can devise scenarios that lead to false positives due to the limited code coverage. However, we did not encounter any in practice, and we will show that they are very unlikely.
Since our dynamic analysis builds on Qemu process emulation, which is only available for Linux, we target x86 Linux binaries generated by gcc (albeit of various versions and with different levels of optimization). However, there is nothing fundamental about this and the techniques should apply to other systems also.
2 Some buffer overflows are hard to stop: the Exim attack on non-control data
In December 2010, Sergey Kononenko posted a message on the exim developers mailing list about an attack on the exim mail server. The news was slashdotted shortly after. The remote root vulnerability in question concerns a heap overflow that causes adjacent heap variables to be overwritten, for instance an access control list (ACL) for the sender of an e-mail message. A compromised ACL is bad enough, but in exim the situation is even worse. Its powerful ACL language can invoke arbitrary Unix processes, giving attackers full control over the machine.
The attack is a typical heap overflow, but what makes it hard to detect is that it does not divert the program's control flow at all. It only overwrites non-control data. ASLR, W⊕X, canaries, and system call analysis all fail to stop or even detect the attack.
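A toy model makes the pattern concrete. The sketch below is not the Exim code or the actual exploit: the layout, the sizes, the ACL strings "deny" and "accept", and all function names are assumptions chosen for illustration. A message buffer and an ACL string are placed next to each other in a single array, so that the "overflow" stays within one C object and is well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 16
#define ACL_LEN 16

/* Hypothetical heap layout: message buffer, then the ACL string. */
static char heap[BUF_LEN + ACL_LEN];

void heap_reset(void) {
    memset(heap, 0, sizeof heap);
    strcpy(heap + BUF_LEN, "deny");
}

/* Models the vulnerable copy: no check against BUF_LEN. The assert only
 * keeps the model itself inside the array. */
void store_message_unchecked(const char *msg, size_t len) {
    assert(len <= sizeof heap);
    memcpy(heap, msg, len);
}

/* Bounds-checked copy: rejects anything longer than the buffer. */
int store_message_checked(const char *msg, size_t len) {
    if (len > BUF_LEN)
        return -1;
    memcpy(heap, msg, len);
    return 0;
}

/* The program's ordinary decision logic, unchanged by the attack. */
int acl_allows(void) {
    return strcmp(heap + BUF_LEN, "accept") == 0;
}

/* Payload: BUF_LEN filler bytes, then a replacement ACL including its NUL. */
static void build_payload(char out[BUF_LEN + 7]) {
    memset(out, 'A', BUF_LEN);
    memcpy(out + BUF_LEN, "accept", 7);
}

/* Returns 1 if the unchecked copy lets the payload flip the ACL. */
int demo_attack_unchecked(void) {
    char payload[BUF_LEN + 7];
    build_payload(payload);
    heap_reset();
    store_message_unchecked(payload, sizeof payload);
    return acl_allows();
}

/* Returns 1 if the checked copy rejects the payload and the ACL survives. */
int demo_attack_checked(void) {
    char payload[BUF_LEN + 7];
    build_payload(payload);
    heap_reset();
    return store_message_checked(payload, sizeof payload) == -1 && !acl_allows();
}
```

In the unchecked case every function runs exactly as it would for benign input; only the data consulted by `acl_allows` has changed. That is why defenses that watch for diverted control flow or injected code have nothing to observe, while a bounds check at the copy itself stops the write.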
Both classic buffer ove