
TightLip: Keeping Applications from Spilling the Beans

Aydan Yumerefendi, Benjamin Mickle, and Landon P. Cox
Duke University
Technical Report CS-2006-07

April, 2006

Abstract

Access control misconfigurations are widespread and can result in damaging breaches of confidentiality. This paper presents TightLip, a privacy management system that helps users define what data is sensitive and who is trusted to see it, rather than forcing them to understand or predict how the interactions of their software packages can leak data.

The key mechanism used by TightLip to detect and prevent breaches is the doppelganger process. Doppelgangers are sandboxed copy processes that inherit most, but not all, of the state of an original process. The operating system runs a doppelganger and its original in parallel and uses divergent process outputs to detect potential privacy leaks.

TightLip is compatible with legacy code and provides complete, continuous protection at a modest performance cost. SpecWeb99 results show that Apache running on the TightLip prototype exhibits a negligible 5% slowdown in request rate, response time, and transfer rate compared to an unmodified server environment.

1 Introduction

Managing the permissions of any shared space is challenging, even for highly skilled system administrators [12]. Especially for untrained PC users, access control errors are routine and can lead to damaging privacy leaks. A 2003 usability study of the Kazaa peer-to-peer file-sharing network found that many users share their entire hard drive with the rest of the Internet, including email inboxes and credit card information [10]. Over 12 hours, the study found 156 distinct users who were sharing their email inboxes. Not only were these files available for download, but other users could be observed downloading them.

This and similar confidentiality compromises [12, 13, 14, 19] present a different threat model than is normally assumed by the privacy and security literature. In each case, data leaked because of access control misconfigurations rather than malicious intent or buggy software. For example, companies in the UK were reported to have banned their employees from using Google Desktop because it allows users to search across machines and can store sensitive files on remote Google servers [14].

Neither secure communication channels [7] nor host-based intrusion detection [6, 9] would have prevented these exposures. Furthermore, the impact of these leaks extends beyond the negligent users themselves, since the leaked data includes previous communication and transaction records involving other users. No matter how careful any individual is, her privacy will only be as secure as her least competent confidant. Prior approaches to similar problems are either incompatible with legacy code [8, 22] or exhibit prohibitively poor performance [17].

Thus, we are exploring a new approach to preventing leaks due to access control misconfigurations through a privacy management system called TightLip. TightLip helps users define what data is important and who is trusted, rather than forcing them to understand the complex dynamics of how data flows among their software packages. We have divided the problem into two complementary goals: first, to create file and host meta-data, such as labels and group membership, and second, to use this meta-data to alert users about potential leaks. TightLip transparently labels sensitive files by searching for data such as email and tax returns. Users only need to be prompted when a potential breach occurs, which should be rare.

This paper focuses on TightLip's second goal of using labels to stop privacy leaks. Our approach relies on a new operating system object called a doppelganger¹ process.

Doppelgangers are sandboxed copy processes that inherit most, but not all, of the state of an original process. In TightLip, doppelgangers are spawned when a process tries to read a sensitive file. The TightLip kernel returns sensitive data to the original and scrubbed data to the doppelganger.

The doppelganger and original then run in parallel while the operating system monitors the sequence and arguments of their system calls. If the output buffers of the two processes are the same, the output is independent of the sensitive input with very high probability and is allowed to pass. However, if the output buffers differ, they likely depend on their respective inputs. Routines for handling divergence are policy-specific and based on the trust level of the destination and the state of the doppelganger.

¹ From the Oxford English Dictionary: "the apparition of a living person; a double, a wraith."

We have implemented doppelgangers in a prototype TightLip kernel and evaluated our approach through micro-benchmarks and SpecWeb99 web server benchmarks. The performance overhead in our micro-benchmarks is several orders of magnitude less than that of comparable taint-checking systems. SpecWeb99 results show that Apache running on TightLip exhibits a negligible 5% slowdown in request rate, response time, and transfer rate compared to an unmodified server environment.

2 Background and Context

A common approach to data confidentiality is to decompose a system's software into three sets of components: untrusted software that may be compromised by malicious outsiders, an access control policy specifying how data should be handled by untrusted software, and trusted software charged with enforcing the access control policy. Confidentiality can be ensured as long as the trusted components remain uncompromised and the access control policy is properly specified.

Software compromises are unlikely to disappear in the foreseeable future, but research efforts such as Asbestos [8], CCured [16], Jif/split [22], and Vigilante [6] offer hope that their number and impact can be limited. Unfortunately, there is little hope that policy specification will become less error-prone. The interactions among widely used network applications such as email, tax software, virus scanners, Google Desktop, file-sharing, voice-over-IP, content-distribution, and peer-to-peer storage are difficult for users to comprehend or predict. It is not surprising that access control misconfigurations have increasingly led to confidentiality compromises [10, 12, 13, 14, 19].

Well-funded organizations can manage the complexity of software package interactions by hiring highly skilled administrators, but this is prohibitively expensive for PC users and small enterprises. Thus, TightLip's goal is to provide inexpensive and unintrusive data confidentiality by transparently identifying sensitive data and notifying users of potential leaks. Our approach rests on two assumptions.

First, it is safest to assume that some easily identified file types, such as emails, are sensitive until told otherwise. Second, it is easier to mark a few important individual files, such as students' grades or medical records, as sensitive once rather than verifying that each application has the appropriate read permissions.

We currently use a suite of simple scripts to find and label suspected sensitive file types, and provide a utility for users to label individual files by hand. This process is similar to virus protection software that downloads a library of definitions and scans a file system for potentially infected data. The difference is that rather than prompting a user when it finds a match, our scanner silently marks the file as sensitive. The subject of this paper is how an operating system can use these labels to prevent leaks.
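The labeling pass described above can be sketched as a simple file-system scan. This is a hypothetical illustration: the paper does not specify the scripts' heuristics or how labels are recorded, so the suffix list and the returned path list here are assumptions, not TightLip's actual implementation.

```python
import os

# Hypothetical sketch of TightLip's labeling scripts. The real system would
# also record the label (e.g., in file metadata); here we only collect paths.
SENSITIVE_SUFFIXES = {".mbox", ".eml", ".tax", ".qdf"}  # illustrative only

def scan_for_sensitive(root):
    """Silently collect files whose type suggests sensitive content."""
    labeled = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in SENSITIVE_SUFFIXES:
                labeled.append(os.path.join(dirpath, name))
    return labeled
```

Like the virus-scanner analogy in the text, the scan matches against a downloadable library of definitions but marks matches silently rather than prompting.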

3 Alternative Approaches

We considered several approaches to ensuring confidentiality. A primary requirement was that TightLip be compatible with legacy applications. This ruled out applying new language features such as Jif/split's information-flow policies [22] or new program abstractions such as Asbestos's event processes [8].

A very simple, legacy-compatible way to prevent breaches is to disable the network write permissions of any process that reads a sensitive file. We rejected this as too heavyweight because it can needlessly punish processes that use the network legitimately after reading sensitive data. For example, virus scanners often read sensitive files and later contact a server for updated anti-virus definitions. Google Desktop and other file indexing tools may try to aggregate local and remote search results.

Taint-checking [17] offered a more promising approach. The original goal of taint-checkers is to prevent attackers from exploiting software bugs that allow them to execute arbitrary code. Checkers do this by transitively marking memory locations that depend on incoming packets as "tainted" and raising an alarm when a tainted value is copied to the processor's system registers. Taint flow can be computed statically at compile time using special language features [22], but is more commonly tracked dynamically via binary rewriting [6, 17] or hardware emulation [4, 11].

Applying taint-flow analysis to confidentiality is straightforward. Instead of tainting state derived from incoming packets, the system would mark memory that depended on data read from a sensitive file. If a process subsequently tried to send a tainted buffer over the network, the system would raise an alarm. Unfortunately, taint-checking suffers from three drawbacks: incompleteness, discontinuity, and poor performance.

Figure 1: Using doppelgangers to avoid a breach.
1. Original process reads a non-sensitive file.
2. Original tries to read a sensitive file. Create copy doppelganger.
3. Return sensitive data to original, scrubbed data to doppelganger.
4. Original, doppelganger run in parallel, taint other objects.
5. Original, doppelganger try to write different buffers to network.
6. Swap doppelganger for original, allow network write.

There is a fundamental trade-off in taint-flow analysis between the false negative rate and the false positive rate due to the "implicit flow" induced by conditionals. Consider the following code fragment, in which variable x is tainted: if (x) { y = 1; } else { y = 0; }. Variable y should be flagged since its value depends on the value of x. Tainting each variable written to within a conditional block captures all dependencies, but can also implicate innocent variables and raise false positives.

In practice, following dependencies across conditionals is extremely difficult without carefully placed programmer annotations [22]. Every taint-checker for legacy code that we are aware of ignores implicit flow to avoid false positives. This is troubling since ignoring implicit flow provides incomplete protection and may give users a false sense of security.
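The implicit-flow gap can be made concrete with a toy example. The function below copies a tainted bit using control flow alone; a checker that propagates taint only through assignments of tainted values, as the legacy-code checkers discussed above do, would leave y unmarked. This is a sketch of the general problem, not any particular checker's logic.

```python
# Toy illustration of implicit flow: x's value reaches y through control
# flow alone, so direct data-flow tracking never marks y.
def copy_via_implicit_flow(x):
    if x:
        y = 1
    else:
        y = 0
    return y          # equals x for x in {0, 1}, yet y never held x's data

tainted = {"x"}       # direct-flow-only analysis: just x is marked
# y is only ever assigned constants, so it stays out of `tainted`,
# even though its value reveals x exactly -- a missed leak.
```

Flagging every variable written inside the conditional would catch y, but would also flag variables whose values are genuinely independent of x, which is the false-positive side of the trade-off described above.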

Another limitation of taint-checking is that it cannot gracefully transition a process between tainted and untainted states. A list of tainted memory locations is not enough to infer what a clean alternative state would look like. Bridging this semantic gap requires understanding a process's execution logic and data structure invariants.

Because of this, once a breach has been detected and averted, any tainted processes associated with the breach must be rebooted. While rebooting purges taint, it also wipes out legitimate connections and any untainted data structures. Ideally, clean process state could be insulated so that untainted requests could be handled without interruption. This would also keep misconfigurations from enabling denial-of-service attacks. Rather than taking down a service while a policy error is being debugged, legitimate requests could continue to be handled.

Finally, most taint checkers exhibit prohibitively poor performance. For example, an Apache web server running under TaintCheck and serving 10KB files is nearly 15 times slower than an unmodified server [17]. For 1KB files, it is 25 times slower. This is particularly discouraging because the performance penalty must be paid throughout a process's execution lifetime even if reading sensitive files is rare or never occurs.

A recent taint checker built into the Xen hypervisor [11] can avoid emulation overhead as long as there are no tainted resident memory pages. The hypervisor tracks taint at a hardware byte granularity and can dynamically switch a virtual machine from virtualized mode to emulation mode once it requires tainted memory to execute. This allows untainted systems to run at normal virtual machine speeds.

While promising, tracking taint at a hardware byte granularity has its own drawbacks. In particular, it forces guest kernels to run in emulation mode whenever they handle tainted kernel memory. The system designers have modified a Linux guest OS to prevent taint from inadvertently infecting the kernel stack, but this does not address taint spread through system calls. For example, it is easy to imagine portions of the buffer cache becoming tainted. This would impose a significant performance penalty on untainted processes as long as the tainted data remained in the cache. Worse, the tainted data could stay in the cache long after the tainted process that placed it there had exited.

Because of these drawbacks, we have designed and implemented a new operating system object for TightLip called a doppelganger process. Doppelgangers are sandboxed copy processes that help monitor the execution of their original. When combined with sensitive file labeling, they provide complete protection and continuous service at a modest performance cost.

4 Design

Figure 1 shows a simple example of how doppelgangers can be used to avoid a breach. Initially, an original process runs without touching sensitive data. At some point, the original attempts to read a sensitive file, which prompts the TightLip kernel to spawn a doppelganger. The kernel returns the sensitive file's content to the original and an equal amount of scrubbed data to the doppelganger.

In many cases, giving the doppelganger random data will cause it to crash [15], particularly if it is expecting structured input, such as an email or a PDF file. To avoid this and preserve application continuity, TightLip can use hints about sensitive files' types and a database of well-known document formats to generate structured scrubbed data. If it is not possible to infer a file's type, TightLip simply returns a sequence of ASCII "a" characters.
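A minimal sketch of this scrubbing policy follows, assuming a hypothetical format-template table; the paper's actual database of document formats is not described in this section, so the PDF header below is illustrative only.

```python
# Sketch of scrubbing: match the sensitive read's length, using a
# format-aware template when the file type is known and ASCII "a"
# bytes otherwise. FORMAT_TEMPLATES is a hypothetical stand-in for
# TightLip's database of well-known document formats.
FORMAT_TEMPLATES = {
    "pdf": b"%PDF-1.4\n",   # illustrative structured-format header
}

def scrub(length, file_type=None):
    """Return `length` bytes of non-sensitive replacement data."""
    template = FORMAT_TEMPLATES.get(file_type)
    if template is not None:
        # Repeat the template so the output is exactly `length` bytes.
        reps = length // len(template) + 1
        return (template * reps)[:length]
    return b"a" * length    # fallback described in the text
```

Returning exactly as many bytes as the original read keeps the doppelganger's file offsets and buffer sizes in step with the original's.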

Once each read has been satisfied, the original and doppelganger are both placed on the CPU run queue and, when scheduled, modify private memory objects. If, at some point, both processes attempt to write to an untrusted network host, the kernel compares their output buffers. If the buffers' contents are different, then the original's output depends on the sensitive input with high probability. When TightLip detects such divergence, it asks the user if data descended from the sensitive file may be written to the destination host.
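The decision made at a network write might look like the following sketch. The function name, return values, and the trusted-host set are illustrative assumptions, not TightLip's real kernel interface.

```python
# Sketch of the output comparison at a network write.
def check_network_write(original_buf, doppel_buf, host, trusted_hosts):
    if original_buf == doppel_buf:
        # Identical buffers: output is independent of the sensitive
        # input with very high probability, so it may pass.
        return "allow"
    if host in trusted_hosts:
        return "allow"          # user previously trusted this destination
    return "prompt-user"        # divergence toward an untrusted host
```

A "prompt-user" result corresponds to the dialog described in the text; an affirmative answer would add the host to `trusted_hosts` for future writes.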

If the user allows the write (e.g., an email to a mail server), then the host is added to a list of trusted destinations. If the original's buffer should not be sent, then, since the doppelganger resides in a consistent, untainted execution state, it may be swapped in for the original. Swapping preserves any accumulated untainted data structures and allows TightLip to provide continuous service.

This simple example illustrates the two primary challenges of designing doppelgangers. First, because doppelgangers may run for extended periods and compete with other processes for CPU time and physical memory, they must be as resource-efficient as possible. Second, since TightLip relies on divergence to detect breaches, all doppelganger inputs and outputs must be carefully regulated to minimize false positives.

4.1 Reducing Doppelganger Overhead

Our first challenge was limiting the resources consumed by doppelgangers. A doppelganger can be spawned at any point in the original's execution. One option is to create the doppelganger concurrently with the original, but doing so would incur the cost of monitoring in the common case when taint is absent.

Instead, TightLip only creates a doppelganger when an original attempts to read from a sensitive file. For the vast majority of processes, reading sensitive files will occur rarely, if ever. However, some long-lived processes that frequently handle sensitive data, such as mail readers, virus scanners, and file search tools, may require a doppelganger throughout their execution. For these applications, it is important that doppelgangers be as resource-efficient as possible.

Once created, doppelgangers are inserted into the same CPU run queue as other processes. This imposes a modest scheduling overhead and adds processor load. However, unlike taint-checkers, doppelgangers have a separate execution context and can therefore run in parallel with other processes, including the original. As multi-core PCs and workstations become widely deployed, the CPU overhead of doppelgangers should diminish.

To limit memory consumption, doppelgangers are forked from the original with their memory marked copy-on-write. In addition, nearly all of the doppelganger's kernel-maintained process state is shared read-only with the original, including its file object table and associated file descriptor namespace. The only separate, writable objects maintained for the doppelganger are its execution context, file offsets, and modified memory pages.
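The memory argument above relies on standard copy-on-write fork semantics: after a fork, a child's writes go to private pages and never reach the parent. The demonstration below is a generic POSIX illustration (it requires os.fork, so Linux or macOS), not TightLip code; the child stands in for the doppelganger.

```python
import os

def fork_demo():
    """Show that a forked child's writes are private (copy-on-write)."""
    data = bytearray(b"sensitive")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                     # child: stand-in for the doppelganger
        data[:] = b"aaaaaaaaa"       # private write; parent's pages unchanged
        os.write(w, bytes(data))
        os._exit(0)
    os.waitpid(pid, 0)               # parent: stand-in for the original
    child_view = os.read(r, 9)
    os.close(r)
    os.close(w)
    return bytes(data), child_view

parent_view, child_view = fork_demo()
```

The parent still sees its own bytes after the child's write, while the pipe (analogous to the shared kernel state) is visible to both.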

4.2 Doppelganger Inputs and Outputs

The TightLip kernel must manage doppelganger inputs and outputs to perform three functions: prevent external effects, limit the sources of divergence to the initial scrubbed input, and contain sensitive data. To perform these functions, TightLip must regulate information that passes between the doppelganger and kernel through system calls, signals, and thread schedules. Many of these issues are reminiscent of primary/backup fault tolerance [1]. In TightLip, the original is analogous to the primary and the doppelganger is analogous to the backup.


Type               Example             Processing description
Kernel update      sys_bind            Apply original, return result to both.
Kernel read        sys_getuid          Verify identical system call sequences.
Non-kernel update  sys_send            Synchronize, compare buffers.
Non-kernel read    sys_gettimeofday    Buffer original results, return to both.

Table 1: Doppelganger-kernel interactions.

Importantly, we assume that the kernel is completely trusted and competent to manage sensitive data. TightLip is helpless to stop the kernel from maliciously or unintentionally compromising confidentiality. For example, TightLip cannot prevent an in-kernel NFS server from leaking sensitive data.

Each kernel-doppelganger interaction can be assigned one of four types: kernel updates, kernel reads, non-kernel updates, and non-kernel reads. Table 1 lists each type, provides an example system call, and briefly describes how TightLip regulates the interaction. The rest of this section discusses each interaction type in greater detail.
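Table 1's four interaction types amount to a dispatch on the system call being made. The toy classifier below illustrates this; the call lists are a partial, illustrative subset drawn from the table and the text, not an exhaustive kernel policy.

```python
# Illustrative classification of system calls into Table 1's four types.
KERNEL_UPDATE   = {"bind", "connect", "listen", "fork", "lseek"}
KERNEL_READ     = {"getuid", "getpid"}
NONKERNEL_UPDATE = {"send", "write", "sendto"}
NONKERNEL_READ  = {"read", "recv", "gettimeofday"}

def classify(syscall):
    """Map a system call name to its doppelganger-handling policy."""
    if syscall in KERNEL_UPDATE:
        return "kernel update"       # original applies, result to both
    if syscall in KERNEL_READ:
        return "kernel read"         # verify identical call sequences
    if syscall in NONKERNEL_UPDATE:
        return "non-kernel update"   # synchronize, compare buffers
    if syscall in NONKERNEL_READ:
        return "non-kernel read"     # buffer original results, replay
    return "unclassified"
```

Each branch's comment corresponds to one row of Table 1's processing column.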

4.2.1 Updates to Kernel State

As with speculative execution environments [3, 18], TightLip must prevent doppelgangers from producing any external effects. This stems from our desire to keep TightLip unintrusive. As long as an application does not try to send tainted data to an untrusted host, it should behave no differently than if there were no doppelganger.

This is why original processes must share their kernel state with the doppelganger read-only. If the doppelganger were allowed to update the original's objects, it could alter its execution. Thus, system calls that modify the shared kernel state must be strictly ordered so that only the original process can apply updates.

System calls that update kernel state include exit, fork, time, lseek, alarm, sigaction, gettimeofday, settimeofday, select, poll, llseek, fcntl, bind, connect, listen, accept, shutdown, and setsockopt.

TightLip uses barriers and condition variables to implement these system calls. A barrier is placed at the entry of each kernel-modifying call. After both processes have entered, TightLip checks their call arguments to verify that they are the same. If the arguments match, then the original process executes the update while the doppelganger waits. Once the original finishes, TightLip notifies the doppelganger of the result before allowing it to continue executing.
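The barrier scheme can be sketched with threads standing in for the two processes. `SyncedUpdate` and its fields are hypothetical names invented for this sketch; TightLip implements the equivalent logic inside the kernel.

```python
import threading

class SyncedUpdate:
    """Both sides enter; args are compared; only the original applies."""
    def __init__(self):
        self.barrier = threading.Barrier(2)
        self.done = threading.Event()
        self.args = {}
        self.result = None

    def call(self, role, args, apply_update):
        self.args[role] = args
        self.barrier.wait()                  # wait for both to enter
        if self.args["original"] != self.args["doppelganger"]:
            raise RuntimeError("argument mismatch: treat as divergence")
        if role == "original":
            self.result = apply_update(args) # update applied exactly once
            self.done.set()                  # hand the result over
        else:
            self.done.wait()                 # doppelganger receives same result
        return self.result
```

The key properties mirror the text: the update runs once (by the original), both sides observe the same result, and mismatched arguments are treated as divergence.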

It is important to note that processes will never block indefinitely. If one process times out waiting for the other to reach the barrier, TightLip assumes that the processes have diverged. If the doppelganger times out, it is immediately discarded and any future updates by the original to non-kernel state, such as a network write, are considered potential breaches.

The same thing happens if the original times out waiting for the doppelganger or if their system call arguments differ. In these cases, the original's update is applied after the doppelganger has been eliminated. Once the update completes, future attempts to modify non-kernel state are considered potential breaches.

Signals are a special kernel-doppelganger interaction since they involve two phases: signal handler registration, which modifies kernel data, and signal delivery, which injects data into the process. Handler registration is managed using barriers and condition variables, as other kernel state updates are; only requests from the original are actually registered. However, whenever signals are delivered, both processes must receive the same signals in the same order at the same points in their execution. We discuss signal delivery in Section 4.2.2.

Of course, doppelgangers must also be prevented from modifying non-kernel state, such as writing to files or network sockets. Because it may not be possible to proceed with these writes without involving the user, they are treated differently. We discuss updates to non-kernel state in Section 4.2.3.

4.2.2 Doppelganger Inputs

As we saw in Section 4.2.1, when TightLip determines that the doppelganger has strayed, it must assume that any external output from the original may compromise confidentiality. Thus, TightLip must limit the sources of divergence so that divergence only emerges from the initial scrubbed input. If non-sensitive input causes a process to stray, TightLip may generate false positives.

To prevent this, all non-sensitive inputs should be identical for both the original and doppelganger. For example, both processes must receive the same values for time-of-day requests, receive the same network data, and experience the same thread interleavings. Ensuring that reads from kernel state are the same is trivial, given that updates are synchronized. However, preventing non-kernel reads, signal delivery, and thread interleavings from generating divergence is much more challenging.


In this example, the kernel receives a signal for the original and doppelganger while they are both inside the select system call. This allows the kernel to deliver the signal to both processes as they exit select.

Figure 2: Signaling that avoids divergence.

Non-kernel reads

The values returned by non-kernel reads, such as from a file, a network socket, or the processor's clock, can change over time. For example, consecutive calls to gettimeofday or consecutive reads from a socket will each return different data. TightLip must ensure that paired accesses to non-kernel state return the same value to both the original and doppelganger. This requirement is similar to the Environment Instruction Assumption ensured by hypervisor-based fault tolerance [2].

To prevent the original from getting one input and the doppelganger another, TightLip assigns a producer-consumer queue to each data source. For each queue, the original process is the producer and the doppelganger is the consumer. System calls that utilize such queues include read, readv, recv, recvfrom, and gettimeofday. Note that read and recv queues are associated with individual file descriptors, but manipulated in the same system call code.

If the original (producer) makes a read request first, it is satisfied by the external source and the result is buffered in the queue. The original can continue to request data from the same source as long as its associated queue is not full. Once the queue becomes full, the original must block until the doppelganger (consumer) performs a read from the same non-kernel source and removes a buffer from the queue. Similarly, if the doppelganger attempts to read from a non-kernel source and the queue is empty, it must wait for the original to add buffers.
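The per-source replay queue described above can be sketched with a bounded queue. `ReplaySource` and its parameters are illustrative names; real TightLip attaches such queues to file descriptors inside the kernel.

```python
import queue

class ReplaySource:
    """Original reads the real source; doppelganger replays identical data."""
    def __init__(self, real_read, depth=8):
        self.real_read = real_read           # reads from the external source
        self.q = queue.Queue(maxsize=depth)  # bounded: limits how far ahead
                                             # the original may run

    def read_original(self, n):
        data = self.real_read(n)
        self.q.put(data)     # blocks if the doppelganger lags `depth` reads
        return data

    def read_doppelganger(self, n):
        return self.q.get()  # blocks until the original produces
```

Because both sides drain the same queue in order, the doppelganger sees byte-identical input, so any later divergence can be attributed to the initial scrubbed data rather than to this source.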

The mechanism is altered slightly if the read is from another sensitive source. In this case, TightLip returns scrubbed queue buffers to the doppelganger and updates a list of sensitive inputs to the process. Otherwise, the producer-consumer queue is handled exactly the same as for a non-sensitive source.

In this example, the original is pre-empted by the kernel before it reaches a loop. The doppelganger runs longer and iterates several times before being pre-empted. When the original runs again, it first handles the signal, changing the loop conditional.

Figure 3: Signaling that leads to divergence.

As before, neither process blocks indefinitely. If a process times out waiting for the other, TightLip assumes that the processes have diverged, discards the doppelganger, and views any non-kernel writes by the original with suspicion.

Signals

As explained in Section 4.2.1, signals are a two-phase interaction: a process registers a handler and the kernel may later deliver a signal. We treat the first phase as a kernel update. Since modifications to kernel state are synchronized, any signal handler that the original successfully registers is also registered for the doppelganger. We currently assume that the doppelganger will not modify the copy-on-write memory pages on which the handler resides. This allows the location of the original's signal handler to also be valid for the doppelganger.

The TightLip kernel delivers signals to a process as it transitions into user mode. Any signals intended for a process are added to its signal queue and then moved from the queue to the process's stack as it exits kernel space. A process can exit kernel space either because it has finished a system call or because it had been pre-empted and is scheduled to start executing again.

To prevent divergence, any signals delivered to the doppelganger and original must have the same content, be delivered in the same order, and be delivered within the same execution state. If any of these conditions is violated, the processes could stray. TightLip ensures that signal content and order are identical by copying any signal intended for the original to both the original's and doppelganger's signal queues.

As long as both processes are within synchronized system calls, TightLip can ensure that signals are delivered to the same execution state. Before jumping back into user space, the kernel places any pending signals on the first process's stack. When the process re-enters user space, it handles the signals in order before returning from its system call.

The same is true when the second process (whether the doppelganger or original) re-enters user space. For the second process, a further check is needed to ensure that only signals that were delivered to the first are delivered to the second. Figure 2 illustrates such an example.

Unfortunately, it may not always be possible to deliver signals at synchronization points. To see why, consider the processes in Figure 3. In the example, a program uses an alarm signal to exit a loop. When the signal is handled determines how many times the loop iterates. The problem is that the doppelganger and original have been pre-empted at different instructions, which forces them to handle the same signal in different states.

Though TightLip cannot completely eliminate this form of divergence, it can limit its likelihood by deferring delivery until the processes reach a synchronization point. Most programs make system calls throughout their execution, providing many opportunities to handle signals. However, for the rare program that does not make any system calls, the kernel cannot wait indefinitely without compromising program correctness. Thus, the kernel skips delivering signals on pre-emption re-entry only a finite number of times. For example, signal delivery might be delayed for five pre-emptions without an intervening system call.
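The deferral policy reduces to a small decision rule. A Python sketch, using the five-preemption budget from the example above (the function name and budget constant are illustrative):

```python
MAX_DEFERRALS = 5  # preemption re-entries to skip before forcing delivery

def should_deliver(at_sync_point, deferrals):
    """Decide whether pending signals may be delivered now.

    at_sync_point: both processes are inside a synchronized system call.
    deferrals: consecutive preemption re-entries on which delivery
    was already skipped.
    """
    if at_sync_point:
        return True  # safe: both processes are in the same state
    # Budget exhausted: deliver anyway rather than wait forever.
    return deferrals >= MAX_DEFERRALS

# Delivery is deferred on the first few preemptions...
assert not should_deliver(at_sync_point=False, deferrals=0)
# ...forced once the budget runs out...
assert should_deliver(at_sync_point=False, deferrals=5)
# ...and always allowed at a synchronization point.
assert should_deliver(at_sync_point=True, deferrals=0)
```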

A primary/backup fault-tolerance system developed by Bressoud and Schneider [2] faced a nearly identical problem in trying to ensure the proper ordering of messages delivered to replicated virtual machines. Their solution relied on the HP PA-RISC processor's recovery register. This register was decremented each time an instruction was retired, and the processor raised an interrupt once it became negative. If TightLip were running on such a processor, it could use this register to schedule the original and doppelganger for precisely the same number of instructions. This would ensure that signals were always delivered at the same points in each process's instruction stream.

Threads

Managing multi-threaded processes is similar to managing signals, although TightLip must maintain two additional pieces of information: a one-to-one mapping from original threads to doppelganger threads and a brief history of previously scheduled threads. These help ensure that both processes' threads are executed in the same order.

If control is transferred between threads along system calls and thread primitives such as yield and lock/unlock pairs, then TightLip can guarantee that the original and doppelganger threads will enter and exit at the same points. Without these synchronization points, TightLip is faced with the same problem as with signals; our solution is also similar.

If a thread is pre-empted before a synchronization point, it can be immediately rescheduled a finite number of times in the hope that it will eventually reach a system call or thread primitive. If it does not willingly return control to the kernel after five consecutive quanta, TightLip can schedule another thread. If the underlying processor supports a recovery register, as previously described for the PA-RISC [2], TightLip can use it to ensure that interleavings for both processes have identical entry and exit points.

4.2.3 Updates to Non-kernel State

The last process interactions to be regulated are updates to non-kernel state. As with other system calls, these updates are synchronized between the processes using barriers and condition variables. The difference between these modifications and those to kernel state is that TightLip does not automatically apply the original's update and return the result to both processes. Two inputs determine how the kernel handles these updates: whether the updates are the same and whether the destination is trusted.

Handling Potential Leaks

First, if both processes generate the same update, then we assume that the update does not depend on the sensitive input and that releasing it will not compromise confidentiality. TightLip applies the update, returns the result, and takes no further action. Though unlikely, in some pathological cases this can lead to a breach.

For example, an original process may accept a network query asking whether a sensitive variable is even or odd. TightLip can generate a scrubbed value that is different from the sensitive variable but of the same parity. The output for the doppelganger and original will be the same, despite the fact that the original is leaking information. The problem is that it is possible to generate unlucky scrubbed data that leads to a false negative. In practice, we doubt that such false negatives will ever arise, since the likelihood of a collision decreases with the amount of information needed to generate a response.
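The parity example can be made concrete. In the Python sketch below, a hypothetical scrubbed value happens to share the secret's parity, so both processes answer the query identically and the output comparison cannot flag the leak:

```python
def is_even_reply(value):
    # Server-side handler for a network query about a variable's parity.
    return b"even" if value % 2 == 0 else b"odd"

secret = 42        # sensitive value read by the original
scrubbed = 1000    # unlucky scrubbed value handed to the doppelganger

# Different inputs, identical outputs: TightLip sees matching writes,
# assumes the reply is independent of the sensitive input, and lets it
# through -- a false negative, since the secret's parity is leaked.
assert secret != scrubbed
assert is_even_reply(secret) == is_even_reply(scrubbed) == b"even"
```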


Second, if the processes generate different updates, TightLip considers the destination. If the update is to a trusted destination, such as a file system that supports sensitivity, a mail server, or the display, TightLip applies the original's update. To create a data flow history, the kernel might also record where the sensitive information was passed. For a file write, this would entail labeling the file as sensitive. For a trusted host, TightLip can log the host, the time the update was sent, and the original sensitive input.

Handling display updates or inter-process communication is trickier. In the case of the display, one option is to mark the display sensitive and then spawn a doppelganger for any process that tries to read from it. In a windowing environment, this may lead to spawning doppelgangers for any process with a graphical user interface. In the case of inter-process communication, one option is to spawn a doppelganger for the receiving process. Both options are reasonable, but the best way to handle these situations is an open question.

Finally, if the updates are different and destined for an untrusted destination, TightLip blocks both processes and prompts the user for further instruction. If the user allows the write, TightLip may create an entry in a policy database stating that the host is allowed to receive data derived from the sensitive files. Future updates descended from the sensitive file to the host would be handled as described above. On the other hand, if the user rejects the write, TightLip asks whether to kill the application, send the doppelganger's data, or swap in the doppelganger for the original.
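The three cases above can be condensed into a small decision procedure. This is our own Python summary of the text; the action names are illustrative, not TightLip's internal API:

```python
def handle_update(original_out, doppel_out, trusted_destination):
    """Classify a non-kernel update by comparing process outputs."""
    if original_out == doppel_out:
        # Identical outputs: assumed independent of the sensitive input.
        return "apply"
    if trusted_destination:
        # Divergent outputs to a trusted sink: apply the original's
        # update and record the flow (e.g., label the written file).
        return "apply_and_record_flow"
    # Divergent outputs to an untrusted destination: potential leak.
    return "block_and_prompt_user"

assert handle_update(b"ok", b"ok", trusted_destination=False) == "apply"
assert handle_update(b"real", b"scrub", True) == "apply_and_record_flow"
assert handle_update(b"real", b"scrub", False) == "block_and_prompt_user"
```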

Swapping

If the user chooses to swap in the doppelganger, the kernel discards the original and associates the original's process identifier with the doppelganger's process state. Swapped-in processes require an extra mechanism to run the doppelganger efficiently and safely.

For each swapped-in process, TightLip maintains a fixed-size list of open files inherited from the doppelganger. Anytime the swapped-in process attempts to read from a file, the kernel checks whether the file is on the list. If it is, TightLip knows that the process had previously received scrubbed data from the file and returns more scrubbed data. If the file is not on the list and the file is sensitive, TightLip spawns a new doppelganger.
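In outline, the read path for a swapped-in process looks like this. A Python sketch with invented names; in the prototype the check lives in the kernel's read handler:

```python
def read_action(path, scrubbed_list, sensitive):
    """Decide how a swapped-in process's read should be served."""
    if path in scrubbed_list:
        # The process already saw scrubbed data from this file;
        # keep returning scrubbed data for consistency.
        return "return_scrubbed"
    if sensitive:
        # First touch of a sensitive file not on the list:
        # fall back to spawning a fresh doppelganger.
        return "spawn_doppelganger"
    return "normal_read"

scrubbed = {"/home/alice/INBOX"}
assert read_action("/home/alice/INBOX", scrubbed, True) == "return_scrubbed"
assert read_action("/home/alice/taxes", scrubbed, True) == "spawn_doppelganger"
assert read_action("/home/alice/notes", scrubbed, False) == "normal_read"
```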

These lists are an optimization to avoid spawning doppelgangers unnecessarily. Particularly for large files that require multiple reads, spawning a new doppelganger for every sensitive read can lead to poor performance. Importantly, leaving files off of the list can only hurt performance and will never affect correctness or compromise confidentiality. Because of this guarantee, TightLip can remove any write restrictions on the swapped-in process, since its internal state is guaranteed to be untainted.

Unfortunately, swapping is not without risk. In some cases, writing the doppelganger's buffer to the network and keeping the doppelganger around to monitor the original may be the best option. For example, the user may want the original to write sensitive data to a local file even if it should not write it to the network. However, maintaining both processes incurs some overhead, and non-sensitive writes would still be identical for both the original and the swapped-in process with very high probability.

Furthermore, the doppelganger can stray from the original in unpredictable ways. This is similar to the uncertainty generated by failure-oblivious computing [20]. To reduce this risk, TightLip can monitor internal divergence in addition to external divergence. External symptoms of straying are obvious: the doppelganger generates a different sequence of system calls or uses different arguments. Less obvious is when the scrubbed data or some other input silently shifts the process's control flow. Straying of this form may not generate external symptoms, but can still leave the doppelganger in a different execution state than the original.

We believe that this kind of divergence will be rare. In applications such as file servers, web servers, and peer-to-peer clients, a sensitive file will be read, its contents will be encoded, and the result will be written back to the network. In these cases, the doppelganger and original return to the same state after the network write. In other cases, divergence will likely manifest itself as a different sequence of system calls or a crash [15].

To be safe, TightLip can take advantage of common processor performance counters, such as those offered by the Pentium 4 [21], to detect internal divergence. If the number of instructions, the number of branches taken, and the mix of loads and stores are sufficiently similar, then it is extremely unlikely that the scrubbed input affected the doppelganger's control flow. TightLip can use these values to measure the likelihood that the doppelganger and original are in the same execution state and relay this information to the user.
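One plausible similarity test over such counters compares each counter within a relative tolerance. This is our sketch; the paper does not specify the counters' exact use or thresholds:

```python
def counters_similar(a, b, tolerance=0.01):
    """Compare per-process performance counters (instructions retired,
    branches taken, loads, stores) within a relative tolerance."""
    for key in ("instructions", "branches", "loads", "stores"):
        hi = max(a[key], b[key])
        if hi and abs(a[key] - b[key]) / hi > tolerance:
            return False
    return True

orig = {"instructions": 100000, "branches": 12000,
        "loads": 30000, "stores": 9000}
dopp = {"instructions": 100050, "branches": 12004,
        "loads": 30010, "stores": 9001}
strayed = dict(orig, branches=15000)  # control flow shifted by scrubbed input

assert counters_similar(orig, dopp)        # likely same execution state
assert not counters_similar(orig, strayed) # internal divergence suspected
```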

4.3 Limitations and Future Work

Though TightLip handles most execution environments, it cannot prevent leaks in all situations. In addition to previously discussed cases, TightLip cannot interpose on individual loads and stores to shared memory. This could allow sensitive data to pass freely.

TightLip also currently has no mechanism to prevent a misconfigured or compromised process from overwriting sensitive data. Our design targets data confidentiality, but does not address data integrity. However, it is easy to imagine integrating integrity checks with our current design. For example, anytime a process attempts to write to a sensitive file, TightLip could prompt the user, as it currently does for network socket writes.

Finally, we believe that it will be possible to reduce the memory consumed by a long-lived doppelganger by periodically comparing its memory pages to the original's. This would make using the doppelganger solely to generate untainted network traffic (as opposed to swapping it in for the original) more attractive.

Though the doppelganger will copy-on-write memory pages as it executes, many of those pages may still be identical to the original's. This would be true for pages that only receive updates that are independent of the scrubbed input. These pages could be re-marked copy-on-write and shared anew by the two processes.

Furthermore, even if a page initially contained bytes that depended on the scrubbed input, over time those bytes may be overwritten with non-sensitive values. These pages could also be recovered. Carried to its logical conclusion, if all memory pages of the original and doppelganger converged, then the doppelganger could be discarded altogether. TightLip could periodically compute page deltas while the original and doppelganger are synchronized inside a system call. We intend to explore the mechanisms required to do this efficiently in future work.
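The page-delta scan can be sketched as follows, with Python standing in for kernel C and page contents modeled as byte strings:

```python
def reconvergeable_pages(original_pages, doppel_pages):
    """Return indices of doppelganger pages that are byte-identical to
    the original's and could be re-marked copy-on-write and shared."""
    return [i for i, (o, d) in enumerate(zip(original_pages, doppel_pages))
            if o == d]

orig = [b"\x00" * 4096,
        b"header" + b"\x00" * 4090,
        b"secret" + b"\x00" * 4090]
dopp = [b"\x00" * 4096,
        b"header" + b"\x00" * 4090,
        b"scrubd" + b"\x00" * 4090]

# Pages 0 and 1 are copy-on-write copies with equal contents and can be
# shared anew; page 2 still depends on the scrubbed input.
assert reconvergeable_pages(orig, dopp) == [0, 1]
# If every page converges, the doppelganger could be discarded entirely.
assert reconvergeable_pages(orig, orig) == [0, 1, 2]
```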

5 Implementation

Our TightLip prototype consists of several hundred lines of C code scattered throughout the Linux 2.6.13 kernel. Most of the code deals with monitoring doppelganger execution, but we also made minor modifications to the ext3 file system to store persistent sensitivity labels.

5.1 File Systems

Sensitivity is currently represented as a single bit co-located on disk with file objects. If more complex classifications become necessary, this single bit could be extended to multiple bits. To query sensitivity, we added a predicate to the kernel file object that returns the sensitivity status of any file. TightLip currently only supports sensitivity in the ext3 file system, though this implementation is completely backwards-compatible with any existing ext3 partitions. Adding sensitivity to future file systems should be straightforward, since manipulating the sensitivity bit in on-disk ext3 inodes only required an extra three lines of code.
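The on-disk representation amounts to a single flag bit. A user-space Python illustration of the set/clear/test operations (the bit position below is invented, not the actual ext3 inode flag):

```python
S_SENSITIVE = 1 << 9  # hypothetical bit position in the inode flags word

def set_sensitive(flags):
    return flags | S_SENSITIVE

def clear_sensitive(flags):
    return flags & ~S_SENSITIVE

def is_sensitive(flags):
    # The predicate TightLip adds to the kernel file object reduces
    # to a single-bit test like this one.
    return bool(flags & S_SENSITIVE)

flags = 0
flags = set_sensitive(flags)
assert is_sensitive(flags)
flags = clear_sensitive(flags)
assert not is_sensitive(flags)
```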

[Figure 4 diagram: the original issues read(fd, buffer, count) and its result (buffer_copy, success) is passed through a buffer queue; the doppelganger issues the same read, then verifies the arguments and scrubs the data.]

TightLip uses buffer queues and completion structures to store the results of kernel functions that the doppelganger is not allowed to execute. After the original executes a kernel function, it appends a completion structure to the corresponding buffer queue. The doppelganger dequeues the completion structure and verifies its arguments before using the result.

Figure 4: Buffer queues and completion structures.

Our prototype also provides a new system call to manage sensitivity from user space. The system call can be used to read, set, or clear the sensitivity of a given file. It is used by scripts that periodically scan the disk for sensitive data and by a utility that allows users to manage sensitive files by hand.

5.2 Data Structures

Our prototype augments several existing Linux data structures and adds two new ones: completion structures and buffer queues. These data structures are used throughout the kernel to synchronize sequences of system calls and ensure that kernel and non-kernel state is updated properly.

Completion structures buffer the results of an invoked kernel function. This allows TightLip to apply an update or receive a value from a non-kernel source once, but pass on the result to both the original and doppelganger. Minimally, completion structures consist of the arguments to a function and its return value. They may also contain instructions for the receiving process, such as a divergence notification or instructions to terminate.

Using a single completion structure for each system call could over-constrain concurrency between the doppelganger and original and lead to poor performance. For example, using one structure for non-kernel reads would force strictly alternating invocations of all file and network reads. We use buffer queues to address this limitation.

A buffer queue is a producer-consumer buffer for storing completion structures. Figure 4 shows an example queue. Each buffer queue is owned by the original process, which acts as the producer. Every time the original process invokes a specific kernel function, it creates a completion structure describing the arguments and the result of the function call. It then stores the structure in the corresponding buffer queue.

The doppelganger plays the role of the consumer. When it reaches a function in the kernel that it is not allowed to invoke directly, it consults the corresponding completion buffer for the result of the original's call. Before using the result, the doppelganger ensures that its own arguments are compatible with the arguments stored in the completion structure. On successful verification, the doppelganger uses the result as if it had executed the function itself. If the arguments do not match, TightLip assumes that the processes have diverged and takes corrective action.
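The consume-and-verify step can be sketched as below. This is our Python rendering of the C structures; the field and function names are illustrative:

```python
class Completion:
    """Arguments and result of a kernel function the original executed."""
    def __init__(self, args, result):
        self.args = args
        self.result = result

def consume(completion, doppel_args):
    """Verify the doppelganger's arguments against the buffered call,
    then hand it the original's result; a mismatch signals divergence."""
    if completion.args != doppel_args:
        return None, "diverged"
    return completion.result, "ok"

# The original performed read(fd=3, count=4096) and buffered the result.
c = Completion(args=("read", 3, 4096), result=b"scrubbed-bytes")
assert consume(c, ("read", 3, 4096)) == (b"scrubbed-bytes", "ok")
# A doppelganger issuing a different call means the processes strayed.
assert consume(c, ("read", 7, 4096))[1] == "diverged"
```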

If the buffer is empty, the doppelganger blocks until a result arrives. Similarly, if the buffer is full, the original process blocks until the doppelganger dequeues an item. TightLip carefully monitors blocking on buffer queues to prevent indefinite waits and to detect divergence; in our current implementation, a process never blocks for more than five seconds waiting for its peer. If a timeout occurs, the kernel assumes that the processes have diverged.

The size of the buffer queue offers a trade-off between performance and memory consumption. Larger buffer queues allow longer uninterrupted execution, but this gain comes at the price of larger in-kernel memory usage. We currently set the size of buffer queues to one memory page (4 Kbytes).

TightLip also required several modifications to the Linux task structure. These additions allow the kernel to map doppelgangers to and from originals, synchronize their actions, and pass messages between them. The task structure of the original process also stores a list of buffer queues corresponding to kernel function calls such as bind, accept, and read. Finally, all process structures contain a list of at most 10 open sensitive files from which scrubbed data should be returned. Once a sensitive file is closed, it is removed from this list.

5.3 System Calls

System call barriers are crucial for detecting divergence between the original and the doppelganger. For example, correctly implementing the exit system call requires that peers synchronize in the kernel to atomically remove any mutual dependencies between them. We have taken a conservative approach to barriers and inserted them in most implemented system calls. In the future, we may be able to relax these constraints and eliminate some unnecessary barriers.

We began implementing TightLip by modifying read system calls for files and network sockets. Next, we modified the write system call to compare the outputs of the original and the doppelganger. The prototype allows invocation of a custom policy module when TightLip determines that a process is attempting to write sensitive data to an untrusted source. We currently offer several default policies: kill the process, close the file/socket, write the output of the doppelganger, and swap the doppelganger for the original process.

After the read and write calls, we added support for reads and modifications of kernel state, including all of the socket system calls. We have instrumented most, but not all, relevant system calls. Linux currently offers more than 290 system calls, of which we have modified 27.

5.4 Process Swapping

TightLip implements process swapping in two stages. First it synchronizes the processes using a barrier. Then the original process notifies the doppelganger that swapping should take place. The doppelganger receives the message and exchanges its process identifier with the original's. Doing this requires unregistering both processes from the global process table and then re-registering them under the exchanged identifiers. The doppelganger must then purge any pointers to the original process's task structure.

Once the doppelganger has finished cleaning up, it acknowledges the original's notification. After receiving this acknowledgment, the original removes any of its state that depends on the doppelganger and sets its parent to the init process. This prevents a child-death signal from being delivered to its actual parent. Once these updates are in place, the original safely exits.
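The identifier exchange can be modeled on a toy process table. A Python sketch under our own simplifications; the kernel of course operates on task structures and the global PID table:

```python
INIT_PID = 1

def swap_in(table, orig_pid, dopp_pid):
    """Give the doppelganger the original's identity, reparent the
    original to init so no child-death signal reaches the real parent,
    and return the retired original."""
    orig, dopp = table.pop(orig_pid), table.pop(dopp_pid)
    dopp["pid"] = orig_pid      # doppelganger takes over the PID
    table[orig_pid] = dopp      # re-register under the exchanged id
    orig["parent"] = INIT_PID   # original reparented to init
    return table, orig

table = {
    100: {"pid": 100, "parent": 50, "role": "original"},
    101: {"pid": 101, "parent": 50, "role": "doppelganger"},
}
table, retired = swap_in(table, 100, 101)
assert table[100]["role"] == "doppelganger"  # identity exchanged
assert retired["parent"] == INIT_PID         # safe to exit quietly
assert 101 not in table
```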

5.5 Future Implementation Work

There are still several features of our design that remain unimplemented. The major goal of the current prototype has been to evaluate our design by running several key applications, such as a web server, an NFS server, and sshd. We are currently adding support for inter-process communication via pipes and for graphical user interfaces via UNIX domain sockets. We are also working on support for multi-threaded applications and on propagating taint across processes. Our focus on single-threaded applications, files, and sockets has given us valuable experience with the core abstractions of TightLip, and we look forward to a complete environment in the near future.

6 Evaluation

To further explore the motivation for TightLip and evaluate our prototype implementation, we sought answers to three questions:


• How prevalent are access control misconfigurations in academic computer science departments?

• What is the impact of doppelgangers on typical data transfer applications?

• How much overhead does TightLip impose on the Apache web server as measured by the SpecWeb99 benchmark?

To answer the first question, we ran the TightLip email-labeling script in the home directories of users in two research-university computer science departments. Our script attempted to match a single substring often found in email headers in all world-readable files. To answer the last two questions, we measured the results of file transfer micro-benchmarks and SpecWeb99 on several unmodified server applications: Apache-1.3.34, the default Debian user-mode NFS server 2.2beta47-20, and sshd-3.8.

Each of these applications interacts with the kernel in a variety of ways and is structured differently. Apache runs as a collective of nine processes, each of which blocks on the same socket accept call. The NFS server is a single-threaded, event-driven process that uses signals to handle concurrent requests. sshd forks a new process to satisfy any incoming ssh or scp request and encrypts all traffic between the client and server.

For all experiments, our prototype TightLip kernel ran as a guest operating system inside Microsoft Virtual Server with a single 2.8 GHz virtual CPU and 256 MB of virtualized physical memory. There is nothing fundamental to TightLip that requires a virtual machine; it does not utilize any virtualization functionality. We only ran our experiments on this platform because it mirrored our development environment. The host OS was Windows 2003 Server running on a dual Intel Xeon 2.8 GHz processor with 2 GB of RAM. Clients ran on the same network segment and connected to the server over an open 100 Mb/s Ethernet network.

6.1 Access Control Misconfigurations

There are many stories from the academic [10] and popular press [12, 13, 14, 19] of access control misconfigurations leading to privacy leaks. Despite the amount of coverage, misconfigurations only become stories once users are aware of the problem. An error may persist for years or may never be detected, even if exploited by others. Because of this uncertainty, we were curious to see how well the users of two American university computer science department file systems managed their files' permissions.

Both departments provide users with a UNIX distributed file system and a universal namespace. The users of this file-space are faculty, students, and departmental staff. Most are highly skilled computer users with a good understanding of file access control abstractions. We wanted to avoid reading any individual files or asking the hundreds of users whether anything we might find was sensitive. These constraints led us to choose searching home directories (but nothing deeper) for text files likely to contain emails.

Users      Readable Files   Readable Messages
Faculty         188               7832
Students         35               2985
Staff             6                384
Total           229              11201

Table 2: Access control misconfigurations.

We ran our TightLip email-labeling script on the home directories of all 532 users one evening in April 2006. Our script only searched files that could be read by any user and tried to match the substring “Message-ID:”, a common string in email headers. We counted the number of files and the number of emails found in those files, and noted the position within the department of each file's owner. The results are in Table 2.
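The core of such a labeling pass is a substring scan over world-readable files. A minimal Python sketch of the idea; the actual script's interface is not described in the paper, so the function names here are ours:

```python
def count_messages(text):
    """Count likely email messages by occurrences of the header
    substring the labeling script matches on."""
    return text.count("Message-ID:")

def label(files):
    """Return the world-readable files that look like mailboxes,
    mapped to their message counts."""
    return {name: count_messages(body)
            for name, body in files.items()
            if "Message-ID:" in body}

files = {
    "INBOX":     "Message-ID: <a@x>\n...\nMessage-ID: <b@x>\n",
    "notes.txt": "meeting at noon\n",
}
assert label(files) == {"INBOX": 2}
```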

We found over 200 files and 11,000 string matches. Most interestingly, there appeared to be a correlation between users' level of training and the number of access control errors. Computer science faculty were responsible for 82% of exposed files and 70% of exposed messages. We cannot be sure why this is, but believe that faculty members, who are generally older than others in the department, are more likely to use text-based mail readers such as pine that store emails in users' home directories.

Of course, it is possible that these files were left exposed on purpose. However, an examination of the file names suggests that for many this was not the case. For example, one faculty member had two large files in his home directory named “INBOX” and “INBOX~”, with nearly two thousand emails in each. Another faculty member left a file with almost a thousand messages named “.newmail” open for reading.

Furthermore, many users (faculty, students, and staff alike) appear to have left their “Junk” and “Trash” files readable. It is possible that leaking these messages would be inconsequential. However, we are aware of many users who place old, but potentially sensitive, messages in such files. One might also argue that since only other “trusted” department members can take advantage of these misconfigurations, there is less incentive for users to be vigilant. However, there is strong evidence that privacy is more important among social relations than between strangers [5].

It is important to note that TightLip would not be needed to fix these errors; they could easily be fixed by setting the proper file permissions. However, it is noteworthy how badly this group of highly skilled computer users managed its file permissions. If these users are prone to misconfiguration exposures, average PC users are likely to be even more so.

[Figure 5: Time to transfer a file using Apache. Transfer time in ms (0-500) versus file size (512B-256K) for three configurations: no sensitive files, swapped, and continuous.]

[Figure 6: Time to transfer a file using NFS. Same axes and configurations as Figure 5.]

6.2 Application Micro-benchmarks

We next wanted to understand TightLip's impact on several data transfer applications. We focused on these applications because they seem most likely to inadvertently leak data, as the Kazaa file-sharing client does [10]. Our methodology was simple; each experiment consisted of a single client making 100 consecutive requests for 100 different files, all of the same size. As soon as one request finished, the client immediately made another. For each trial, we examined three configurations. To capture baseline performance, each server initially ran with no sensitive files. The server simply read from the file system, encoded the files' contents, and returned the results over the network.

Next, we ran the servers with all files marked sensitive and used a default policy of swapping in a doppelganger. In these experiments, TightLip spawned a doppelganger for each request of a sensitive file and swapped in the doppelganger after the first network write. Subsequent writes in the same session were satisfied directly by the swapped-in process.

Finally, we ran experiments in which all files were marked sensitive, but the client was trusted. In this case, TightLip spawned a doppelganger for each worker process. However, rather than swapping it in, the doppelganger ran continuously alongside the original for the remainder of the experiment. Figure 5, Figure 6, and Figure 7 show the average over 100 requests of the time to complete transfers of various sizes.

[Figure 7: Time to transfer a file using sshd. Transfer time in ms (0-500) versus file size (512B-256K) for three configurations: no sensitive files, swapped, and continuous.]

[Figure 8: Memory pages created. Extra pages copied-on-write (0-125) for Apache, NFS, and scp, shown per request for the swapped configuration and in total for the continuous configuration.]

No application incurred any overhead when the doppelganger could be swapped in; the difference in transfer times between unmodified and swapped-in servers was statistically insignificant. However, running doppelgangers continuously did have an impact. In the worst case, running a doppelganger for the NFS server continuously caused transfer times to increase by between 93% for 512-byte files and 145% for 32K files. Apache's transfer times increased by between 34% for 256K files and 115% for 1K files. On the other hand, running doppelgangers continuously had virtually no impact on sshd.

We believe that these differences are due to the way each server satisfies requests and the way each is structured. Because sshd forks a new process to handle each request, doppelgangers are discarded as soon as the connection is closed. The difference in performance for swapped-in and continuous sshd merely reflects the extra time doppelgangers run between the first write and the end of the connection.

Thus, sshd is able to perform many of its functions (authenticating the user and setting up a connection) without a doppelganger present. It is only when a forked process actually reads the file to transfer that the doppelganger is spawned. This allows sshd to perform more operations without synchronizing with a doppelganger than either Apache or the NFS server.

Also, Apache does relatively less work per request than the NFS server. Apache parses an HTTP request, reads data from the file system (possibly over multiple system calls), and writes the result to a network socket. The NFS server must perform many more functions before returning a file, including checking permissions and managing an in-memory cache.

Metric                 No Sensitive    All Sensitive, Continuous
Request Rate (/s)      29.3 (0.8)      27.7 (0.6)
Response Time (ms)     298.1 (6.6)     312.7 (7.1)
Transfer Rate (kb/s)   399.2 (0.3)     382.3 (1.5)

Table 3: TightLip-Apache SpecWeb99 results.

Furthermore, the NFS server is a single process that (in continuous mode) must synchronize with a doppelganger on all system calls. Apache is structured as a dispatcher process and nine workers; only the workers have doppelgangers. These structural and functional differences explain why NFS experiences more overhead in continuous mode.

Despite the NFS server's overhead, it is crucial to put these results in a larger context. Even with doppelgangers running continuously, TightLip outperforms prior taint-checking approaches by an order of magnitude. For example, Apache running under TaintCheck and serving 10 KB files is nearly 15 times slower than an unmodified server. For 1 KB files, it is 25 times slower [17]. Thus, even in the worst case, using doppelgangers provides a significant performance improvement.

In addition to their performance impact, we were interested in how much memory doppelgangers consumed. To measure this, we counted the number of pages newly allocated and copied-on-write per request for all three server configurations. The normalized numbers are in Figure 8. Each series is the number of new and copied pages beyond those allocated and copied while satisfying a request in the baseline case; this baseline adjustment was only relevant to sshd, which forks a new process for each connection. For the continuous series, the values denote the total number of pages copied; in these experiments, individual doppelgangers executed over many requests.

For all three applications, very few pages were created for the doppelganger before it was swapped in. Apache doppelgangers required an extra 15 pages, NFS an extra eight pages, and sshd an extra two pages. Similarly, when doppelgangers were run continuously, the Apache processes needed 34 extra pages over the 100 requests and sshd processes needed 23 extra pages. The NFS doppelganger required more memory than the others, using 113 extra pages. We believe that this extra memory was due to the NFS server's in-memory data cache.

6.3 Web Server Performance

Our final set of experiments used the SpecWeb99 benchmark on an Apache web server running on a TightLip machine. We used two configurations for these experiments: no sensitive files, and continuous execution with all files marked sensitive. Since the benchmark verifies the integrity of every file, we configured TightLip to return the data supplied by the original instead of the scrubbed data supplied by the doppelganger. This modification was only for test purposes, so that we could run the benchmark over our kernel. Even with this modification, our attempts to use Apache with process swapping halted the benchmark, since these changes cannot completely eliminate the effect of data scrubbing; the swapped-in doppelgangers still had some scrubbed data in their buffers.

[Figure 9: Normalized TightLip-SpecWeb99 results. Bar chart of normalized performance (requests/second, response time, transfer rate) for the all-sensitive-files, continuous configuration; the y-axis spans 0.9 to 1.1.]

We configured SpecWeb99 to request static content of varying sizes. By the end, the client had measured the average request rate, response time, and transfer rate across all requests. The results of our experiments are in Table 3, with standard deviations in parentheses. For all three metrics, TightLip in continuous mode (the poorest performing in our micro-benchmarks) performed within roughly 5% of unmodified Apache. Figure 9 shows the same data, normalized to the baseline case. This demonstrates that in some cases doppelgangers can provide privacy protection at negligible performance cost.
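The roughly-5% figure can be checked directly from Table 3 by normalizing the continuous-mode means to the no-sensitive baseline, as Figure 9 does (a quick sketch using the table's numbers; means only, ignoring the standard deviations):

```python
# Normalize Table 3's continuous-mode means to the no-sensitive baseline.
baseline = {"request_rate": 29.3, "response_time": 298.1, "transfer_rate": 399.2}
continuous = {"request_rate": 27.7, "response_time": 312.7, "transfer_rate": 382.3}

for metric in baseline:
    normalized = continuous[metric] / baseline[metric]
    # A ratio above 1.0 is a slowdown for response time; for the two
    # throughput metrics, a ratio below 1.0 is.
    print(f"{metric}: {normalized:.3f}")

# Output:
# request_rate: 0.945
# response_time: 1.049
# transfer_rate: 0.958
```

Each metric changes by about 4–5.5% relative to baseline, matching the normalized bars in Figure 9.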

7 Related Work

TightLip is most closely related to work in taint-flow analysis [4, 6, 8, 11, 17, 22] and primary/backup fault tolerance [1, 2]. In Section 3, we compared TightLip to prior taint-flow work in detail. The most important distinctions between TightLip and these systems are that TightLip is compatible with legacy code and provides complete, continuous protection at a modest performance cost.

TightLip's need to limit the sources of divergence after scrubbed data has been delivered to the doppelganger is similar to the state synchronization problems of primary/backup fault tolerance [1]. In the seminal primary/backup paper, Alsberg describes a distributed system in which multiple processes run in parallel and must be kept consistent. The primary process answers client requests, but any of the backup processes can be swapped in if the primary fails or to balance load across replicas. Later, Bressoud and Schneider applied this model to a hypervisor running multiple virtual machines [2]. The main difference between doppelgangers and primary/backup fault tolerance is that TightLip deliberately induces a different state and then tries to eliminate any future sources of divergence. In primary/backup fault tolerance, the goal is to eliminate all sources of divergence.

Doppelgangers also share some characteristics with speculative execution [3, 18]. Both involve "best-effort" processes that can be thrown away if they stray. The key difference is that speculative processes run while the original is blocked, whereas doppelgangers run in parallel with the original.

8 Conclusions

Access control configurations are tedious and error-prone. TightLip helps users define what data is sensitive and who is trusted to see it, rather than forcing them to understand or predict how the interactions of their software packages can leak data. In addition to helping users identify sensitive data, the TightLip kernel introduces a new operating system object called a doppelganger process. Doppelgangers are spawned from and run in parallel with an original process that has handled sensitive data. Careful monitoring of doppelganger inputs and outputs allows TightLip to alert users of potential privacy breaches.

Evaluation of the TightLip prototype shows that the overhead of doppelganger processes is modest. Data transfer micro-benchmarks show several orders of magnitude better performance than similar taint-flow analysis techniques. SpecWeb99 results show that Apache running on TightLip exhibits a negligible 5% slowdown in request rate, response time, and transfer rate compared to an unmodified server environment.

References

[1] P. A. Alsberg and J. D. Day. A principle for resilient sharing of distributed resources. In ICSE, San Francisco, CA, October 1976.

[2] T. Bressoud and F. B. Schneider. Hypervisor-based fault tolerance. ACM TOCS, pages 80–107, February 1996.

[3] F. Chang and G. A. Gibson. Automatic I/O hint generation through speculative execution. In OSDI, New Orleans, LA, February 1999.

[4] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding data lifetime via whole system simulation. In USENIX Security, August 2004.

[5] S. Consolvo, I. Smith, T. Matthews, A. LaMarca, J. Tabert, and P. Powledge. Location disclosure to social relations: Why, when, and what people want to share. In CHI, Portland, OR, April 2005.

[6] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-end containment of Internet worms. In SOSP, Brighton, UK, October 2005.

[7] T. Dierks. The TLS protocol. Internet RFC 2246, January 1999.

[8] P. Efstathopoulos, M. Krohn, S. VanDeBogart, C. Frey, D. Ziegler, E. Kohler, D. Mazieres, M. F. Kaashoek, and R. T. Morris. Labels and event processes in the Asbestos operating system. In SOSP, Brighton, UK, October 2005.

[9] J. T. Giffin, S. Jha, and B. P. Miller. Efficient context-sensitive intrusion detection. In NDSS, San Diego, CA, February 2004.

[10] N. S. Good and A. Krekelberg. Usability and privacy: A study of Kazaa P2P file-sharing. In CHI, Ft. Lauderdale, FL, April 2003.

[11] A. Ho, M. Fetterman, C. Clark, A. Warfield, and S. Hand. Practical taint-based protection using demand emulation. In EuroSys, Leuven, Belgium, April 2006.

[12] J. Leyden. ChoicePoint fined $15m over data security breach. The Register, January 27, 2006.

[13] J. Leyden. HK police complaints data leak puts city on edge. The Register, March 28, 2006.

[14] A. McCue. CIO Jury: IT bosses ban Google Desktop over security fears. silicon.com, March 2, 2006.

[15] B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44, 1990.

[16] G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer. CCured: Type-safe retrofitting of legacy software. TOPLAS, 27(3), May 2005.

[17] J. Newsome and D. Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In NDSS, San Diego, CA, February 2005.

[18] E. Nightingale, P. Chen, and J. Flinn. Speculative execution in a distributed file system. In SOSP, Brighton, UK, October 2005.

[19] Associated Press. Miami University warns students of privacy breach. Akron Beacon Journal, September 16, 2005.

[20] M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W. S. Beebee. Enhancing server availability and security through failure-oblivious computing. In OSDI, San Francisco, CA, December 2004.

[21] B. Sprunt. Pentium 4 performance monitoring features. IEEE Micro, pages 72–82, July–August 2002.

[22] S. Zdancewic, L. Zheng, N. Nystrom, and A. C. Myers. Untrusted hosts and confidentiality: Secure program partitioning. In SOSP, Banff, Canada, October 2001.
