

Mvedsua: Higher Availability Dynamic Software Updates via Multi-Version Execution

Luís Pina∗, George Mason University, [email protected]

Anastasios Andronidis, Imperial College London, [email protected]

Michael Hicks, University of Maryland, [email protected]

Cristian Cadar, Imperial College London, [email protected]

Abstract

Dynamic Software Updating (DSU) is a technique for patching stateful software without shutting it down, which enables both timely updates and non-stop service. Unfortunately, bugs in the update itself (whether in the changed code or in the way the change is introduced dynamically) may cause the updated software to crash or misbehave. Furthermore, the time taken to dynamically apply the update may be unacceptable if it introduces a long delay in service.

This paper makes the key observation that both problems can be addressed by employing Multi-Version Execution (MVE). To avoid delay in service, the update is applied to a forked copy while the original system continues to operate. Once the update completes, the MVE system monitors that the responses of both versions agree for the same inputs. Expected divergences are specified by the programmer using an MVE-specific DSL. Unexpected divergences signal possible errors and roll back the update, which simply means terminating the updated version and reverting to the original version. This is safe because the MVE system keeps the state of both versions in sync. If the new version shows no problems after a warmup period, operators can make it permanent and discard the original version.

We have implemented this approach, which we call Mvedsua (footnote 1), by extending the Kitsune DSU framework with Varan, a state-of-the-art MVE system. We have used Mvedsua to update several high-performance servers: Redis, Memcached, and Vsftpd. Our results show that Mvedsua significantly reduces the update-time delay, imposes little overhead in steady state, and easily recovers from a variety of update-related errors.

∗ This work was done while Luís was a researcher at Imperial College London.

Footnote 1: Mvedsua (“MVE” + “DSU”) is pronounced “Medusa”.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

ASPLOS ’19, April 13–17, 2019, Providence, RI, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6240-5/19/04...$15.00
https://doi.org/10.1145/3297858.3304063

CCS Concepts • Computer systems organization → Reliability; Availability; Maintainability and maintenance.

ACM Reference Format: Luís Pina, Anastasios Andronidis, Michael Hicks, and Cristian Cadar. 2019. Mvedsua: Higher Availability Dynamic Software Updates via Multi-Version Execution. In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS ’19), April 13–17, 2019, Providence, RI, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3297858.3304063

1 Introduction

For many modern software systems, constant availability is a key requirement. At the same time, such systems are often subject to frequent updates, including security patches and feature improvements. Applying such updates by stopping, patching, and restarting the system results in an unacceptable loss of availability; therefore, a more sophisticated, “rebootless” updating technique must be used.

1.1 Seeking Fast, Reliable Updates to Stateful Services

One common updating approach is the rolling upgrade [6]. This technique supports updating stateless nodes in a distributed service (e.g., application servers in a web service) by gracefully directing new connections away from a node and then stopping, patching, and restarting it once its work queue is empty. Thus each node is upgraded (in a “rolling” fashion) without disrupting the overall service.

Rolling upgrades work in many cases, but not always. Nodes with long-running sessions (e.g., SSH and remote access servers) present problems because they cannot be updated until sessions terminate. This is because sessions are stateful, and dropping session state ungracefully is disruptive. Stateful servers are problematic in general. For example, the Snort intrusion detection system [39] builds a substantial in-memory state machine to detect multi-packet attacks. Shutting down and restarting Snort drops this state machine and thus potentially misses a mounting attack [19].

Updating in-memory databases, like Redis and Memcached, requires checkpointing their state before shutdown and restart, which can introduce a long pause in service. This problem is acute enough that Facebook uses a custom Memcached that keeps in-memory state in a RAMdisk to which it reconnects on restart after an update, “so that the data can remain live across a software upgrade and thereby minimize disruption” [29]. In short: while rolling upgrades work well on stateless servers, they do not solve the problem of updating the stateful components of a service.

Dynamic software updating (DSU) is a technique for updating stateful servers without shutting them down or losing any state. DSU typically works by updating a process in place, replacing the code and transforming the existing in-memory state, both control (i.e., thread call stacks and program counters) and data (i.e., contents and format of heap objects), to an equivalent representation that is compatible with the new code. State transformations are partially or fully automated. DSU technology is available in commercial products, e.g., for patching Linux [3], Java VM-based applications [24, 30], telecom systems using Erlang [2], and even satellite systems [43]. The research community has pushed the envelope further, developing support for full release-level updates to substantial applications, including operating systems, data management systems, and various servers [8, 13–15, 20, 28, 37, 42].

Even with state-of-the-art DSU, the promise of constant availability can be broken by errors in updates. The updated code itself may have errors: many patches that aim to fix bugs end up introducing new ones [49]. Or the updated code may be correct, but the mechanics of applying it at runtime may be the issue. For example, state transformations may have bugs that cause the program to fail [17, 36]. Even if transformations are correct, they might take a long time to perform if the state to transform is very large (e.g., the whole heap), temporarily halting service [19, 37].

1.2 Better DSU Reliability/Availability with MVE

The key idea of this paper is that DSU’s reliability and availability problems can both be addressed by using multi-version execution (MVE) [5, 9, 10, 22, 27, 40, 46, 48]. An MVE system works by running multiple instances of a program at once, ensuring that each sees the same inputs and confirming that all produce the same (or equivalent) outputs.

To address the availability problem, we can fork the current program (the leader) and connect it via MVE with the child (the follower), which will perform the update. While the update is taking place, the leader continues to provide service, avoiding or shortening any update-induced pause.

To address the reliability problem, the MVE system monitors the behavior of both versions. When the update on the follower completes, the MVE system feeds it the events it missed, already processed by the leader. During and after this catch-up period, the MVE system compares the responses of both versions. Any disagreement may signal a bug in either the new version or the update. In response, we can terminate the follower, effectively rolling back the update. Importantly, no state changes made during or after the update are lost. After running for a while with the old version as leader, we can promote the new version to leader, and eventually drop the old version (a demoted follower by now).

This solution to DSU’s availability and reliability problems, which we call Mvedsua, is new. Proteos [15] handles errors that manifest during updates, but not afterward, while MUC [38] combines MVE and DSU, but not in a way that addresses either the availability or reliability problems (and, as it turns out, it imposes far higher overheads). §7 considers prior work in detail.

Mvedsua’s novel use of MVE creates some challenges. In particular, the behavior of the old and new versions should change in most cases, due to added features and bug fixes. Yet MVE judges any divergence in external behavior as problematic. There are two situations to consider. The first is when the new behavior is superficially different from the old, but both are equivalent to the client. For example, a single system call in the old version might be broken into multiple system calls in the new version. Mvedsua allows the programmer to specify such allowed differences using domain-specific languages (DSLs) provided by modern MVE systems [23, 27, 34].

The second, more interesting situation is when the new version’s behavior purposely disagrees with the old. For example, the new version might support a new client command (e.g., store data in a new format). For this version, executing the command updates the new version’s state, while the old version rejects the command and does not change the state. As such, tolerating the system call differences will result in later divergences (e.g., retrieving the data stored in the new format, only present in one version). To handle this situation, we use the MVE DSL to specify that semantically dissonant system call sequences leave the leader and follower in equivalent states. For the case of the added command, the DSL can be used to direct an invalid command to the (updated) follower so its subsequent behavior, and state, matches that of the (outdated) leader. As a result, Mvedsua essentially enforces the semantics of the old version while this version is the leader, testing that the new version matches that semantics when it should. Once operators are confident the new version is behaving correctly, they make it the leader and thus expose (and test) its new semantics.

We have implemented Mvedsua by extending Kitsune [20], a DSU system for C programs, with MVE support from Varan [23], a modern, high-performance MVE system. We have evaluated Mvedsua by using it to perform multiple dynamic updates on three high-performance servers: Memcached (2 updates), Redis (3), and Vsftpd (13). Our results are promising. Mvedsua addresses the problems it set out to: it can completely eliminate the pause due to updating and it can detect and recover from a variety of errors, including those due to failed update specifications, and failures in the updated code itself. In addition, Mvedsua’s costs are modest. The added programmer effort of using Mvedsua was manageable: no DSL rules were needed for either Memcached update, one was needed for Redis, and, on average, one was needed for each Vsftpd update. Only Memcached required interesting code changes (about 100 lines) to work properly with Mvedsua, owing to its use of multiple threads and stateful libraries. In terms of runtime overhead, we found that Mvedsua imposes 3–9% overhead on throughput during normal operation, as compared to 0–3% for Kitsune alone. While Mvedsua is monitoring both the original and updated versions, the overhead is 25–52% (which essentially matches Varan’s overhead), but this overhead is only incurred temporarily, and can be further mitigated by using rolling upgrades.

1.3 Contributions

In summary, the main contributions of this paper are:

(1) A novel approach to supporting reliable, low-latency updates to stateful services, called Mvedsua. This approach augments DSU with MVE so as to hide the latency of dynamic updates, and tolerate errors in the update process, including those from bugs introduced by the new software version, and errors specific to the DSU process. The use of MVE ensures that no state is lost when recovering from bugs, even those manifesting after an update is deployed.

(2) A prototype implementation of Mvedsua using Kitsune and Varan, together with an empirical evaluation on the high-performance servers Redis, Memcached and Vsftpd. Our evaluation on more than a dozen updates shows that Mvedsua can effectively reduce update latency while incurring only a modest overhead of 3–9% in steady state. For these systems, Mvedsua can also correctly detect, abort, and recover from erroneous updates; and can tolerate expected divergences in behavior across versions.

2 DSU: Background and Problem Statement

This section sets up the problem we are trying to solve: how to quickly and reliably update a long-running, stateful service. We introduce a running example update, used throughout the paper. We then explain how standard techniques such as rolling upgrades struggle with this example. Next, we introduce Dynamic Software Updating (DSU) as a solution that can handle general-purpose upgrades to stateful services, including our example. Finally, we outline key challenges faced by the state-of-the-art DSU systems, which Mvedsua addresses.

2.1 Update Example

Figure 1a shows the API of a key-value store. The store is contained in the global structure table, which has SIZE entries. Each entry has two fields: k (the key) and v (the value). Clients store and access data by calling functions put and get, respectively. The application server exposes this API through a simple wire protocol in the form of text commands, such as PUT balance 1000 and GET balance, which the server translates to calls to put and get, respectively.

(a) Original

```c
typedef char *str;

struct entry { str k; void *v; };   /* key and untyped value */
#define SIZE 1024
struct entry *table[SIZE];          /* the global store */

/* Function bodies are elided in the original figure. */
void put(str k, void *v) { ... }
struct entry *get(str k) { ... }
```

(b) Update

```c
typedef char *str;
typedef int typ;
typ string = 1, number = 2, date = 3;      /* standard value types */
struct entry { str k; void *v; typ t; };   /* t records the value's type */
#define SIZE 1024
struct entry *table[SIZE];

/* Function bodies are elided in the original figure. */
typ type(str k) { ... }
void put(str k, void *v, typ t) { ... }
struct entry *get(str k) { ... }
```

Figure 1. An update for an in-memory key-value store.

Figure 1b shows an update to this program that: (1) extends entry with a t field that indicates the value’s type; and (2) defines some standard types string, number, and date. This change impacts the signature of put, which now takes the type as an extra argument. The server assigns type string to outdated client requests (e.g., PUT balance 100 translates to put("balance", "100", string)). Updated requests can specify the type explicitly (e.g., PUT-number balance 100 translates to put("balance", "100", number)). The request GET balance does not change. A new request form TYPE balance translates to type("balance"), returning the key’s type.
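The request forms described above can be illustrated with a small parser. This is only a sketch under stated assumptions: the paper does not show the server's parsing code, so `parse_request`, `struct request`, the `OP_*` and `TYP_*` names, and the buffer sizes are all hypothetical. It covers only the commands named in the text (PUT, PUT-number, GET, TYPE) and treats anything else as invalid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int typ;
enum { TYP_STRING = 1, TYP_NUMBER = 2, TYP_DATE = 3 };  /* mirrors Figure 1b */

/* Hypothetical parsed form of one request line (not from the paper). */
enum op { OP_PUT, OP_GET, OP_TYPE, OP_INVALID };
struct request {
    enum op op;
    char key[64];
    char val[64];
    typ t;              /* only meaningful when op == OP_PUT */
};

/* Parse one text command as the updated server might.
 * A legacy "PUT k v" defaults to type string; "PUT-number k v"
 * carries an explicit type. Tokens beyond the third are ignored. */
struct request parse_request(const char *line) {
    struct request r = { OP_INVALID, "", "", 0 };
    char verb[32];
    assert(line != NULL);
    int n = sscanf(line, "%31s %63s %63s", verb, r.key, r.val);
    if (n < 2) return r;                       /* no verb or no key */
    if (n == 2 && strcmp(verb, "GET") == 0) {
        r.op = OP_GET;
    } else if (n == 2 && strcmp(verb, "TYPE") == 0) {
        r.op = OP_TYPE;                        /* new in the update */
    } else if (n == 3 && strcmp(verb, "PUT") == 0) {
        r.op = OP_PUT;
        r.t = TYP_STRING;                      /* default for outdated clients */
    } else if (n == 3 && strcmp(verb, "PUT-number") == 0) {
        r.op = OP_PUT;
        r.t = TYP_NUMBER;
    }
    return r;
}
```

An outdated server would presumably reject `TYPE` and `PUT-number` as unknown commands, which is the kind of intended behavioral difference that Mvedsua has to reconcile between versions.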

2.2 Challenges of Stateful Upgrades

Suppose we would like to upgrade a running version of the key-value store according to the above change. The simplest way to do so is to stop the old version and restart with the new one. The problem is that our server is stateful, due to its maintenance of table. Stopping the old version drops this important state, harming clients. For example, a stop-restart after client request PUT balance 1000 would cause subsequent request GET balance to fail, rather than return 1000 as expected.

The industry-standard rolling upgrade [6] approach does not immediately help us here. Rolling upgrades work by shutting down and restarting individual nodes when they have completed their work, relying on the other nodes to maintain the overall service. Ultimately, individual nodes must be restarted, and if these are stateful, that state will be lost.

To mitigate this problem, a server could checkpoint its state to persistent storage on exit, and restore it when starting in the new version. This process presents some challenges. First, the state format may change between versions, as happens in our example, necessitating a backward-compatible checkpoint/restore protocol. Second, the time required to persist and restore the state can be non-trivial, and therefore lengthen the latency of the full-service update. For example, checkpointing and restarting a 10GB Redis heap took 28 seconds in one experiment [20], and a 1GB H2 (footnote 2) heap took 13 seconds in another [35]. This delay is sufficiently disruptive that Facebook avoids it on updates to its Memcached nodes by customizing Memcached to store its state on a RAMdisk that the new version can immediately connect to [12]. Of course, Facebook’s approach only works if the internal representation of the state does not change between versions; this is not true for our example.

A related mitigation to state checkpointing is state replication. Most distributed services employ a protocol for state replication to ensure fault tolerance. If the state representation does not change due to an update, this replication layer is sufficient to preserve a node’s state during a rolling upgrade: the restarted node will warm its state from nearby replicas. But this approach suffers similar problems to explicit persistence. Changes in state representation necessitate a change to the replication protocol that is compatible across versions. Even if this change is made, warming a replica post-upgrade can take a while, hurting performance in the meantime.

Finally, we observe that not all stateful software services that might benefit from on-the-fly updates necessarily can take advantage of distribution and replication. Satellites must provide non-stop service and adapt to changing technologies during their long lifetime, which pushes satellite control systems to feature DSU [43]. The Internet-of-Things (IoT) promises a plethora of devices that must be updated due to security concerns and changing requirements. Updating such IoT devices is an open problem, and DSU will be unavoidable [44]. Non-Volatile Memory (NVM) makes all program state transparently persistent. As a consequence, the stop-restart approach simply does not work as a valid means of updating the program as it will not reset the state. Again, DSU is central to support evolving programs in NVM deployments [4].

2.3 Dynamic Software Updating

Dynamic Software Updating (DSU) is a solution to the problem of upgrading a running, stateful process. With it, critical updates can be applied in a timely fashion without disrupting existing sessions and without dropping performance- or safety-critical in-memory state, even when that state’s representation changes. Furthermore, DSU can be used as a mechanism for per-node updates within an overall rolling upgrade.

Footnote 2: H2 is a SQL database implemented in Java: http://www.h2database.com

DSU systems usually perform updates in two steps. First, they dynamically load new code, which constitutes the logical modifications to the program. Second, they transform the program’s execution state into a form that is compatible with the updated code. In our example, all entries in table before the update must be updated to have an additional t field. Oftentimes, automated support assists with this process (e.g., to reallocate entries that need more space), but the programmer also gets involved. In our example, the programmer might indicate that all existing entries should have t set to string. Control state may also change between versions. For example, new variables or function parameters might be allocated on the stack. Mapping the running program’s control state to one compatible with new code also often requires programmer assistance. For instance, if function put is active at the time of an update, programmers may transform that stack frame by adding a new argument to represent the type of the entry being added to the store.
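To make the data transformation for the running example concrete, the following is a minimal sketch of what a hand-written state transformer could look like. It is not Kitsune's API and not the authors' code: `transform_table`, `entry_v0`, `entry_v1`, and the `TYP_*` constants are hypothetical names, and a real DSU framework would typically generate or assist with much of this.

```c
#include <assert.h>
#include <stdlib.h>

typedef char *str;
typedef int typ;
enum { TYP_STRING = 1, TYP_NUMBER = 2, TYP_DATE = 3 };  /* mirrors Figure 1b */

#define SIZE 1024

struct entry_v0 { str k; void *v; };          /* layout before the update */
struct entry_v1 { str k; void *v; typ t; };   /* layout after the update  */

/* Migrate every slot of the old table into the new layout.
 * Existing entries get t = string, as the text suggests.
 * Returns 0 on success, or -1 if an allocation fails (in which case
 * the migration is left partially done; a real system must handle that). */
int transform_table(struct entry_v0 *old_table[SIZE],
                    struct entry_v1 *new_table[SIZE]) {
    assert(old_table != NULL && new_table != NULL);
    for (size_t i = 0; i < SIZE; i++) {
        new_table[i] = NULL;
        if (old_table[i] == NULL) continue;   /* empty slot */
        struct entry_v1 *e = malloc(sizeof *e);
        if (e == NULL) return -1;
        e->k = old_table[i]->k;               /* key and value are reused */
        e->v = old_table[i]->v;
        e->t = TYP_STRING;                    /* default type for old data */
        free(old_table[i]);                   /* old entry is too small */
        old_table[i] = NULL;
        new_table[i] = e;
    }
    return 0;
}
```

The loop visits every slot, so its running time grows with the table size. Forgetting the `e->t` assignment or the copy itself would be a state transformation bug of the kind that only shows up after the update.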

The task of writing data and control state transformations is fairly simple when a DSU system permits restricting when updates may happen. For example, a DSU system may simply disallow updates when functions modified by the update are active (e.g., put) [26, 28, 42]. Automatic techniques such as this activeness checking work in some limited cases [3, 30] but, unfortunately, are incomplete in general and may still lead to update errors simply due to performing the update at the wrong time [17, 36]. Some solutions allow the programmer to transform the stack in-place [26], others require the programmer to list when updates can happen [20, 21, 28, 37] or cannot happen [42]. In the former case, programmers can specify update points at which all active threads must pause (or “quiesce”) before the update is applied, with the goal of ensuring all expected state invariants hold prior to transformation, and avoiding races while it takes place. A detailed discussion of the design space of DSU can be found elsewhere [31].

Following decades of research, state-of-the-art DSU systems largely manage to support full-featured software upgrades while imposing minimal performance overhead, and requiring little extra programmer work [20, 37, 47]. To make a system DSU-ready imposes a modest, one-time cost but little maintenance work. For instance, Kitsune [20] is able to update the Tor anonymous router (76K LoC) with just 159 LoC of changes, and the multimedia server IceCast (16K LoC) with 134 LoC of changes. In both cases combined, support for 18 versions required only a combined effort of 24 LoC. Rubah [37] showed similar numbers for Java. Both of these systems impose only a few percent overhead on normal execution. Besides these examples, state-of-the-art DSU tools have demonstrated updates to the Snort IDS, a Quake 2 port, the PostgreSQL database [26], and the Minix [15] and Linux [3] operating systems, among other examples.

2.4 Threats to Availability

The advantages of DSU hold if all goes well, but there is a potential for problems. First off, there could be errors in the new program version. New bugs may escape notice during the testing process and only manifest once a dynamic update is applied, e.g., as a crash or wrong answer. There is also the problem of bugs in the DSU-specific parts of the program, e.g., due to the wrong specification of when updates may happen, or how to transform the state. Roughly speaking, these can be broken down as timing errors and state transformation errors.

As an example timing error, suppose that a multi-threaded program attempts an update while thread T1 holds a lock and thread T2 waits for it. This may happen because the programmer allowed an update to happen at such a wrong point in the program [36]. As a result, T1 will block waiting for all threads to be ready to perform the update while T2 is blocked waiting for the lock that will never be released.

A state transformation error can arise when the code to transform existing state is wrong. In our example, field t is mistakenly left uninitialized, rather than explicitly initialized to a default value (like string). Then code that retrieves the updated entries may behave incorrectly. Or, suppose that the programmer mistakenly forgets to copy over the entries from the old table to the new version, leaving the latter uninitialized (footnote 3). As such, the new version may fail to find an entry on a subsequent GET that should have been present.

Post-update failures are an extreme threat to system availability, but a less extreme threat is the delay in service that occurs while an update is taking place. The delay is due to: (1) the time to quiesce the program threads, and (2) the time to transform an arbitrarily large program state. Part (1) can often be minimized [16], but part (2) may fundamentally take a while. For instance, in the example shown in Figure 1, the state transformation must iteratively update every existing entry in the table. If SIZE is large, this could take a long time.

In the next section, we present Mvedsua, our approach to addressing these problems.

3 Better DSU with Mvedsua

Mvedsua extends a DSU system with multi-version execution (MVE). This section describes MVE and then explains how Mvedsua employs it to reduce the pause in service due to a dynamic update, and to gracefully tolerate errors that might be introduced by it.

3.1 Multi-Version Execution

Multi-Version Execution (MVE) allows several processes to execute in parallel over the same inputs to increase either reliability (a bug that affects only some of the processes is tolerated by the others which continue execution) or security (attacks need to succeed in all processes to go undetected, significantly raising the bar for a successful attack). Most MVE systems operate at the system-call level [22, 23, 25, 40, 46], checking that all processes issue the same sequence of system calls and ensuring these produce the same results. For instance, if two processes P1 and P2 issue a read system call from a socket, MVE ensures that the socket is only read once and both processes receive the same data.

Footnote 3: Not all DSU systems leave this task to the programmer, but Kitsune does.

[Figure 2: a timeline from t0 to t7 showing the leader and follower processes (Version 0 and Version 1) connected by a ring buffer, across the stages Single Leader, Outdated Leader, Updated Leader, and Single Leader.]

Figure 2. Mvedsua’s update stages.

An efficient way to perform MVE is to define one process as the leader and all others as followers [23, 25, 46]. The leader interacts with the underlying operating system (OS) by issuing system calls, while the followers check that their system calls match the leader’s, in which case they get their results from it, not from the OS. This is typically done using a ring buffer. The leader registers each system call and its result on the ring buffer. Each follower matches each system call with its current position on the ring buffer, ensuring that it is about to perform the same system call with the same arguments, and if so returning the leader’s results from the ring buffer. MVE can handle the non-determinism introduced by multi-threading, with some limitations [23, 32, 45].

A divergence occurs when the sequence of system calls issued by a follower does not match that of the leader. Sometimes a divergence indicates a problem, but not always. For instance, a divergence could occur when the leader and follower are executing different versions of the same program and they issue different, but equivalent, system calls. A proven way to tolerate expected divergences is to provide the MVE system with a set of rewrite rules to map the sequence of system calls of the follower into a different sequence that matches the leader’s [23, 27, 32, 34]. The use of rewrite rules to tolerate divergences following an upgrade is a key element of Mvedsua, which we describe next.
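The leader/follower protocol described above can be sketched as a toy in-memory model. This is only an illustration, not Varan's implementation: the class and method names (`RingBuffer`, `leader_record`, `follower_replay`, `Divergence`) are hypothetical, system calls are modeled as plain tuples, and a real follower would block on an empty buffer rather than fail.

```python
from collections import deque


class Divergence(Exception):
    """Raised when a follower's system call does not match the leader's."""


class RingBuffer:
    """Toy model of the leader/follower log (all names are hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.capacity

    def leader_record(self, name, args, result):
        # The leader has already executed the call against the OS;
        # it only logs the call and its result here.
        if self.is_full():
            raise BlockingIOError("ring buffer full: leader must wait")
        self.entries.append((name, args, result))

    def follower_replay(self, name, args):
        # The follower never touches the OS: it compares its call with the
        # oldest logged entry and, on a match, reuses the leader's result.
        expected_name, expected_args, result = self.entries.popleft()
        if (name, args) != (expected_name, expected_args):
            raise Divergence(
                f"expected {expected_name}{expected_args}, got {name}{args}"
            )
        return result
```

A follower that issues the same `read` as the leader gets the leader's bytes back; a follower that issues `close` where the leader logged `write` raises `Divergence`, which is the point at which rewrite rules would be consulted.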

3.2 Mvedsua: MVE-enhanced DSU

We illustrate how Mvedsua enhances DSU with MVE in Figure 2. At time t0, we deploy a DSU-enabled program executing in a degenerate MVE mode with a single leader and no follower: the single-leader stage. This executes the program in a lightweight MVE runtime that will accept another version later while imposing minimal overhead.

When an update becomes available at t1, Mvedsua uses the MVE system to create a new follower by forking the leader. Then, Mvedsua uses the underlying DSU system to perform the dynamic update on the follower. This starts the outdated leader stage. In the meantime, the leader keeps providing service, registering its system calls on the ring buffer. If the buffer gets full, the leader blocks until the follower finishes the update.


ASPLOS ’19, April 13–17, 2019, Providence, RI, USA Pina, Andronidis, Hicks, and Cadar

The follower finishes the update at t2. At this point it is running the new version while the leader is running the old version. The follower will start consuming the system calls that the leader registered on the ring buffer. As it does so, MVE confirms that it, the new version, is consistent with the old version’s behavior. Of course, there are going to be intentional divergences in behavior, e.g., because the new version added new features. These will be handled by a programmer-provided mapping, as discussed in §3.3. Eventually, the follower will catch up with the leader, at t3. For the rest of the outdated leader stage, the new version continues to be tested against the old one.

At t4, the operator decides to expose the updated interface to clients by promoting the updated version and demoting the out-of-date version. This is a fast operation that ends at t5, and involves the leader registering a special demotion/promotion event on the ring buffer, and becoming a follower immediately, no longer processing incoming events. During this time, there are two followers and no leader providing service. The usage of the ring buffer drops to zero as the new-version follower consumes the remaining system calls.

At t5, the new version becomes the leader and resumes service. During the updated leader stage, the new version registers events on the ring buffer that the outdated follower will now consume and validate. A reverse developer-provided mapping can handle expected divergences in system call sequences. This stage may be bypassed if constructing the reverse mapping is too difficult (owing to substantial changes in the new version).

Finally, at t6, the operator decides that the update was successful and terminates the outdated follower. From this point on, Mvedsua resumes the single-leader stage.

In sum, Mvedsua improves the reliability of dynamic updates and the availability of system services by:

Reducing update latency. By performing the dynamic update in the follower in parallel with the execution of the leader (between t1 and t2), Mvedsua avoids any pause in service availability.

Handling in-update errors. If the dynamic update fails prior to t2, Mvedsua will terminate the follower and revert to a single-leader stage, allowing the old version to carry on. Nondeterministic failures (e.g., due to unlucky timing) can simply be retried; deterministic failures (e.g., due to state transformation errors) can be retried once the update is fixed.

Handling new-version errors. If the update succeeds without incident, MVE will try to match the new version’s execution to the old version’s during the outdated leader stage. Bugs in the new version, or residual update bugs (e.g., due to incorrect state transformations), manifest as a new-version crash or a divergence. In these cases, Mvedsua terminates the follower and resumes single-leader mode with the old version until the bug is fixed and the update is retried.

Figure 3. Mapping that ensures old- and new-version states are related by the state transformation. [Diagram: the old-version leader processes cmd1, cmd2, cmd3, ...; each is mapped (map) to cmd′1, cmd′2, cmd′3, ... for the new-version follower; xform arrows relate the two versions' states after each command.]

Handling old-version errors. If the old version crashes but the new version does not, this may indicate an old-version bug fixed in the new version. Mvedsua recovers by promoting the new version to sole leader (jump to t6). Such a promotion is possible at any time after t1. If the old version exhibits a bug between t5 and t6 that manifests as a divergence, then it (the follower) will be terminated.
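The stages of Figure 2 and the recovery cases listed above can be summarized as a small state machine. The sketch below is a reading of the text, not Mvedsua's code: the event names (`update_available`, `promote`, `finalize`) and the functions `next_stage` and `on_failure` are hypothetical, and a "failure" stands for either a crash or an unexpected divergence.

```python
from enum import Enum


class Stage(Enum):
    SINGLE_LEADER = "single leader"
    OUTDATED_LEADER = "outdated leader"
    UPDATED_LEADER = "updated leader"


# Operator-driven transitions from Figure 2 (event names are hypothetical).
TRANSITIONS = {
    (Stage.SINGLE_LEADER, "update_available"): Stage.OUTDATED_LEADER,  # t1
    (Stage.OUTDATED_LEADER, "promote"): Stage.UPDATED_LEADER,          # t4 to t5
    (Stage.UPDATED_LEADER, "finalize"): Stage.SINGLE_LEADER,           # t6
}


def next_stage(stage, event):
    """Follow one operator-driven transition."""
    return TRANSITIONS[(stage, event)]


def on_failure(stage, failed_role):
    """Return (action, surviving version) when one process fails.

    In every case the system ends up back in the single-leader stage,
    running the surviving version.
    """
    if stage is Stage.SINGLE_LEADER:
        raise ValueError("no second version to fall back on")
    leader = "old" if stage is Stage.OUTDATED_LEADER else "new"
    follower = "new" if leader == "old" else "old"
    if failed_role == "follower":
        return ("terminate follower", leader)
    if failed_role == "leader":
        return ("promote follower", follower)
    raise ValueError(failed_role)
```

For example, `on_failure(Stage.OUTDATED_LEADER, "follower")` corresponds to an in-update or new-version error (the old version carries on), while `on_failure(Stage.OUTDATED_LEADER, "leader")` corresponds to an old-version crash that the new version survives.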

3.3 Maintaining a Consistent Semantic View

A key requirement for using Mvedsua is for the programmer to provide a mapping from system call sequences issued by the leader to an equivalent sequence issued by the follower. During the outdated leader stage, the old version’s semantics are primary, and an old-version view of events must be given to the new version, to ensure they maintain an equivalent semantic view. During the updated leader stage, the new version’s semantics are primary, and the situation is reversed.

3.3.1 Old-version Leader Mappings

After the update is applied, Mvedsua treats the old version as the leader, using its behavior to confirm that the new version’s behavior is reasonable, i.e., that there was not a bug in the update, or in the new code. It does this by confirming that portions of the API that are backward compatible behave the same, from the clients’ perspective, before and after the update.

To do this, the programmer should write a mapping that ensures that after processing each client command, both the old version and new version are in compatible states. In particular, the new version should be in the same state it would have been in had the old version been dynamically updated at that point. This situation is depicted in Figure 3. The top line depicts the processing of client commands cmd1, cmd2, cmd3, ... by the old-version leader. Each of these changes the leader’s state, depicted as a red shape, e.g., adding entries to the key-value store. The map down arrow indicates the process by which system calls that correspond to these commands are mapped to system calls that correspond to commands cmd′1, cmd′2, cmd′3, ... for the new version. Importantly, after each command, the old and new version states should be related by the dynamic update’s state transformer, here shown with a down arrow labeled xform. This means that had the dynamic update occurred after any of these commands instead, it would have produced an MVE system in the same state. As such, a divergence indicates a mistake in either xform, in map, or in genuine system behavior, e.g., due to a new bug in the new version.

For commands that have the same semantics in both versions, it is likely that no syscall mapping is needed. This is the case for the GET and PUT commands, introduced in Figure 1. On the other hand, the new version is likely to have added new features. Our example update introduced two new client-visible commands, PUT-type and TYPE. If we provide no syscall mapping for these commands, the two versions’ behavior will diverge. Suppose a client issues the command PUT-number balance 1001. This command will be rejected by the old version (since it does not understand PUT-number) but accepted by the new version.

We could write a mapping that tolerates this difference, but doing so is problematic because the result would violate our expected state relation. In particular, it would no longer be the case that dynamically updating the old version’s state would yield the new version’s: the new-version follower added a new entry balance → (1001, number) to its store, but no corresponding entry in the old version would produce this on an update, since the types of all updated values would be string. Breaking the state relation will lead to divergences later that may not signal genuine errors. For example, a subsequent command GET balance would return an error on the leader, since no mapping is present, but would return a value (1001) on the follower. But this divergence does not signal an actual error in the update or the new version; it is simply a to-be-expected difference in behavior.
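The state relation discussed above (after every command, applying xform to the old state should yield the new state) can be checked on a toy model of the running example. The sketch below assumes a state transformer that tags every old value with the default type string, as the text implies; the functions `old_apply`, `new_apply`, and `states_related` are hypothetical stand-ins for the two server versions, not code from the paper.

```python
def xform(old_store):
    # Assumed state transformer: every old value becomes a
    # (value, type) pair with the default type "string".
    return {k: (v, "string") for k, v in old_store.items()}


def old_apply(store, cmd):
    # Old version: only plain "PUT key value" changes the store.
    op, *args = cmd.split()
    if op == "PUT" and len(args) == 2:
        store[args[0]] = args[1]
    return store


def new_apply(store, cmd):
    # New version: also accepts "PUT-<type> key value".
    op, *args = cmd.split()
    if len(args) == 2 and (op == "PUT" or op.startswith("PUT-")):
        typ = op[4:] if op.startswith("PUT-") else "string"
        store[args[0]] = (args[1], typ)
    return store


def states_related(commands, mapping):
    """Check xform(old state) == new state after every command."""
    old, new = {}, xform({})
    for cmd in commands:
        old = old_apply(old, cmd)
        new = new_apply(new, mapping(cmd))
        if xform(old) != new:
            return False
    return True
```

With the identity mapping, the sequence `["PUT k1 v1", "PUT-number balance 1001"]` breaks the relation at the second command; a mapping that turns typed PUTs into a command the new version also rejects keeps the two states related.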

To avoid this spurious divergence, the programmer should write rules that force the new version to adhere to behaviors defined by the old version. In particular, we can easily write rules to translate new commands that the leader does not understand to commands that the follower does not understand either. Rule 1 in Figure 4 encodes this logic in the syntax of the DSL [34] of the Varan MVE system [23].⁴ It says that if a read system call receives a PUT command with a type component, i.e., of the form PUT-type, then it should issue bad-cmd to the new-version follower. The follower does not understand this command and will reject it, just as the old version will do for the original command. For simplicity, we only show the rule for the PUT command; the rules for the other commands can be written in a similar way.

If commands understood by the old version are not valid in the new version, sometimes we can write a mapping for these. For example, if the new version dropped support for PUT, we could install Rule 2 to translate such commands to PUT-string, which should have equivalent semantics.

4 The syntax of the rules in Figure 4 is slightly changed to make the rules a bit more concise; the actual rules are available at https://srg.doc.ic.ac.uk/projects/mvedsua/

// parse("PUT k1 v1") = (PUT, NULL, "k1", "v1")
// parse("PUT-string k1 v1") = (PUT, string, "k1", "v1")
// Rule 1
read(fd,s,_) {
  (cmd,typ,_,_) = parse($(s))
  return cmd == PUT && typ != NULL; } as r
=> r(fd, "bad-cmd", 7)
// Rule 2
read(fd,s,n) {
  (cmd,typ,_,_) = parse($(s))
  return cmd == PUT && typ == NULL; } as r => r(fd,s,n) {
  (cmd,_,key,val) = parse($(s));
  $(s) = "$cmd-string $key $val";
  $(n) += 7; }

(a) Updated follower t2–t4

// Rule 3
read(fd,s,_) {
  (cmd,typ,_,_) = parse($(s))
  return cmd == PUT && typ == string; } as r
=> r(fd,s,n) {
  (cmd,_,k,v)=parse($(s));
  $(s) = "$cmd $k $v";
  $(n) -= 7; }

(b) Outdated follower t5–t6

Figure 4. Rewrite rules to map syscalls for PUT.

For commands in the old version that simply have no new-version equivalent, we have no choice but to terminate the follower unless the command produces clearly wrong behavior in the leader, such as a hang or a crash.
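To make the effect of Rules 1 and 2 concrete, here is a rough Python rendering that works on command strings rather than on read buffers. It is a sketch of the intent only: the real rules are written in Varan's DSL and rewrite the buffer and length arguments of read, and the helper names `parse`, `rule1`, and `rule2` below are chosen for illustration.

```python
def parse(line):
    """Split a command into (cmd, type, key, value).

    parse("PUT k1 v1")        -> ("PUT", None, "k1", "v1")
    parse("PUT-string k1 v1") -> ("PUT", "string", "k1", "v1")
    """
    parts = line.split()
    head, rest = parts[0], parts[1:]
    cmd, _, typ = head.partition("-")
    key = rest[0] if len(rest) > 0 else None
    val = rest[1] if len(rest) > 1 else None
    return cmd, (typ or None), key, val


def rule1(line):
    # Typed PUT is unknown to the old leader: hand the new follower a
    # command that it will reject as well.
    cmd, typ, _, _ = parse(line)
    return "bad-cmd" if cmd == "PUT" and typ is not None else line


def rule2(line):
    # Hypothetical update that dropped untyped PUT: add the default type.
    cmd, typ, key, val = parse(line)
    if cmd == "PUT" and typ is None:
        return f"{cmd}-string {key} {val}"
    return line
```

The constant 7 in Figure 4 shows up here as well: `"bad-cmd"` is 7 characters long, and `rule2` lengthens a command by the 7 characters of `-string`.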

3.3.2 New-version Leader Mappings

During the updated leader stage the new version is the leader, so the situation depicted in Figure 3 is slightly different: the new version is on top and the xform arrows are reversed. But the principle is the same: after each command, the two versions should be in related states.

When the new version presents a mostly backward compatible client API, there is little work involved. However, for new commands it may be difficult or impossible to provide a proper mapping. For our example update, GET and PUT commands will work as usual. But if the client submits a PUT-type command, there is no complete solution. If type is string, then we can map the command to a normal PUT command, for which string is the default type. This is shown in Rule 3 in Figure 4. For other values of type, there is no possible mapping, meaning that the old-version follower will diverge from the leader and be terminated. Up to the point that this happens, Mvedsua will check that the new version does not do something obviously wrong, like crash, in which case it can promote the follower. But, in general, the inability to gracefully deal with new-version commands means that the updated-leader stage is likely to be less useful for finding update-related errors than the outdated-leader stage.
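The reverse direction can be sketched the same way. The function below mirrors Rule 3 on command strings and only covers PUT, as Figure 4 does; returning `None` to signal "no possible mapping" is a convention of this sketch, not something Varan's DSL does, and the name `rule3` is illustrative.

```python
def rule3(line):
    """Map a new-version command to its old-version form, or None."""
    head, *rest = line.split()
    cmd, _, typ = head.partition("-")
    if cmd != "PUT" or not typ:
        return line                    # untyped commands pass through
    if typ == "string":
        return " ".join([cmd, *rest])  # "PUT-string k v" -> "PUT k v"
    return None                        # e.g. PUT-number: no mapping exists
```

For instance, `rule3("PUT-string k1 v1")` yields `"PUT k1 v1"`, 7 characters shorter, while `rule3("PUT-number balance 1001")` yields `None`, the case in which the old-version follower diverges and is terminated.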

We have used added/removed/updated commands in our examples to explain the expected mappings needed to handle semantic differences between versions. Prior work by Ajmani et al. [1] identifies general principles for maintaining a consistent semantic view when multiple nodes of a stateful, distributed system are interacting with clients of different versions during a rolling upgrade. For example, they recommend exposing only the intersection of behaviors across multiple versions so as to ensure that each client session’s semantics is consistent with its own version. They enforce this behavioral restriction by translating calls between versions when possible, and causing problematic calls to fail otherwise, mirroring the approach we have described here.

4 Implementation

We have implemented Mvedsua by extending the Kitsune [20] DSU system with support from the Varan [23] MVE system. The bulk of our implementation is located inside Varan, in 1202 lines of extra C code. Kitsune required minimal changes, with only 88 lines modified. We believe that basing Mvedsua on other general-purpose DSU systems would be straightforward.

The changes to Kitsune support coordinating with Mvedsua to fork a follower and perform the update only there, while aborting the update and resuming normal execution on the leader. This is done by having Kitsune contact Mvedsua to check if the update should be taken, after stopping all threads at update points. Mvedsua uses this opportunity to fork execution, aborting the update on the leader and allowing it on the follower. Mvedsua provides a callback that is invoked on an aborted update, in case some work should be done before resuming; we used this callback for Memcached as discussed in §5.3. One wrinkle arises in the case of multi-threaded applications. In modern operating systems, forking a multithreaded program results in only one thread running in the forked process;⁵ all other threads must be restarted. Pleasantly, this is something that Kitsune already does after quiescing at update points, so Mvedsua piggybacks on that support.
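The fork behavior mentioned above is easy to observe. The following sketch, which assumes a POSIX system and CPython, starts a few idle threads, forks, and asks the child how many threads it sees; the function name `threads_seen_by_forked_child` is made up for this example, and recent Python versions may print a deprecation warning about forking a multi-threaded process.

```python
import os
import threading


def threads_seen_by_forked_child(extra_threads=3):
    """Fork while extra threads are alive; return the child's thread count."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(extra_threads)]
    for t in workers:
        t.start()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)
    # Parent: collect the child's answer, then clean up.
    os.close(write_fd)
    count = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    for t in workers:
        t.join()
    return count
```

The parent has four live threads at the time of the fork, but the child reports one, which is why the worker threads have to be relaunched in the forked follower.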

Mvedsua enhances Varan with efficient support for single-leader mode, which spans the majority of a Mvedsua program’s lifetime. Normally, the leader logs its intercepted system calls on the ring buffer, and the followers read from the buffer to match their own intercepted calls. In addition, Varan tracks kernel state that is relevant to MVE, such as logical process IDs (used for both leader and followers), event-poll descriptors, and more. In single-leader mode, the ring buffer is not used, but system calls must still be intercepted to track kernel state. This state is then used when forking into leader-follower mode later on. The overhead due to syscall interception is relatively small, as Varan does it via binary rewriting [23].

5 http://man7.org/linux/man-pages/man2/fork.2.html

Table 1. Mvedsua rewrite rules per Vsftpd pair.

Versions         # rules    Versions         # rules
1.1.0 → 1.1.1    0          2.0.0 → 2.0.1    0
1.1.1 → 1.1.2    2          2.0.1 → 2.0.2    1
1.1.2 → 1.1.3    0          2.0.2 → 2.0.3    1
1.1.3 → 1.2.0    2          2.0.3 → 2.0.4    1
1.2.0 → 1.2.1    0          2.0.4 → 2.0.5    1
1.2.1 → 1.2.2    0          2.0.5 → 2.0.6    0
1.2.2 → 2.0.0    3          Average          0.85

read(_,_,_), write(_,"500 Unknown command",_)
=> read(_,"FOOBAR\r\n",8), write(_,"500 Unknown command",_)

Figure 5. Rewrite rule for Vsftpd to safely redirect unknown commands to newer version.

5 Case Studies

We tested Mvedsua by using it to perform multiple dynamic updates to three high-performance servers: Vsftpd, Redis, and Memcached. We found that few DSL rules were needed, and only Memcached required nontrivial code changes to work with Mvedsua.

5.1 Vsftpd

Vsftpd is an open-source FTP server and is a useful benchmark because several other DSU systems have used it for evaluation [18, 20, 26, 28]. We used 14 versions tested in 13 pairs which cover three years of releases. On average, we needed fewer than one DSL rule per update (0.85). Table 1 reports the versions and number of rules required (the same number for both the outdated and updated leader stages, where the latter were easily derived from the former).

One interesting case was version 1.2.0 introducing a new command, STOU, which stores a unique file in the current working directory. Following the methodology in §3.3.1, we used a rule to trigger an invalid command on the new version while the old version is the leader. Figure 5 shows a general form of this rule: when the old version reports 500 Unknown command, it rewrites the STOU command to another invalid command, guaranteed to result in the same behavior in the new version.

An interesting, happy coincidence happens when the updated leader issues the new command STOU. Normally, this would cause an irreparable divergence (§3.3.2) and terminate the outdated follower. However, Vsftpd does not keep any state about the file system that would diverge. Therefore, the leader executes the STOU command, generates a new file f, and later GET commands of f execute successfully on both leader and follower. We can thus write a rule to tolerate the STOU divergence on the follower, and the two versions will remain in sync. No existing MVE system keeps a separate copy of the file system per version for performance reasons [22, 23, 25, 40, 46]. Of course, if an MVE system provided that extra separation, or if Vsftpd kept any state about the file system (e.g., a cache), this solution would not work and Mvedsua would terminate the outdated follower.

5.2 Redis

Redis⁶ is a single-threaded, in-memory key-value store, typically used as a database cache or a message broker, with the option of persisting the store contents to disk. We used versions 2.0.0, 2.0.1, 2.0.2, and 2.0.3, which were also used to evaluate Kitsune [20] and related systems [22, 38], which allows for a direct performance comparison. We also evaluated Redis with a bug fix to handle uninitialized read errors (7 lines per version, the same lines in the same files), and a DSL rule for 2.0.0 → 2.0.1, as 2.0.1 reverses the order of two system calls when handling client commands.

5.3 Memcached/LibEvent

Memcached⁷ is a multi-threaded, in-memory key-value store, typically used for caching results of database queries, API calls, or page rendering. We used versions 1.2.2, 1.2.3, and 1.2.4, which were also used to evaluate Kitsune [20] and related systems [38]. Memcached is built around LibEvent,⁸ which replaces the event loop found in many applications. With LibEvent, applications register file descriptors, timeouts, and signals they are interested in, and function pointers to execute when these events happen. LibEvent’s internal event loop invokes the appropriate callbacks when events occur.

Memcached’s Kitsune support required changes to work with Mvedsua so that it could properly abort the update on the leader. One change involved writing a callback to reset some of LibEvent’s state (see §4), to avoid spurious divergences due to the order that events are handled. In particular, each Memcached thread registers many events in which it is interested. When several events become available, LibEvent invokes the callbacks in a round-robin fashion, remembering where it was after each invocation. The updated follower does not have this memory, which means it may handle events in a different order, causing spurious divergences. Resetting the state on the leader ensures that it and the follower are in sync.

A more interesting problem arose because of the use of LibEvent. Memcached’s event handling loop is inside LibEvent. When an update is signaled, Kitsune interrupts LibEvent, which causes all threads to exit LibEvent, reach an update point, and then terminate (as Kitsune would relaunch them in the following version). With Mvedsua, the update is signaled on the leader (to fork execution in MVE), which wrongly causes the leader to terminate execution. One solution is to cause Kitsune to relaunch threads in the same version on the leader. Unfortunately, this is not sufficient: later, when demoting the leader, no update is available to break LibEvent’s loop. Instead, we extended Kitsune to optionally consider instances of system call epoll_wait as an update point. This allows LibEvent to reach an update point frequently, without exiting, which works both for establishing quiescence when updating originally and for swapping leader and follower.

In total, we modified 114 lines for each Memcached version (same lines in the same files). No version changed the sequence of system calls or added any commands, so we did not write any DSL rules.

6 https://redis.io/

7 https://memcached.org

8 http://libevent.org/

6 Experimental Evaluation

Here we describe how we used the applications described in §5 to evaluate the performance and efficacy of Mvedsua. In summary, we found that Mvedsua introduces 3–9% overhead during the single-leader stage, which spans the vast majority of program execution, and 25–52% overhead when monitoring an update; that Mvedsua eliminates update pauses completely; and that it recovers from real and realistic update errors.

6.1 Performance

We evaluated Mvedsua’s performance by measuring its effect on the throughput of our test applications. We ran Redis versions 2.0.0 and 2.0.1, and Memcached versions 1.2.2 and 1.2.3, both with the benchmark Memtier⁹ version 1.2.10. We ran it for 6 minutes, starting from an empty store, and using a 90% read 10% write workload. For Vsftpd, we used versions 2.0.5 and 2.0.6 with a custom benchmark script which simply logs in and repeatedly downloads a particular file for 60 seconds before logging out. We considered a “small” version of the benchmark with a 5B file, and a “large” version with a 10MB file, with the former stressing user-space FTP command processing, and the latter stressing the kernel-space syscall processing, which puts a load on Varan.

We performed this evaluation on a machine equipped with two Intel Xeon E5-2450 CPUs, each with 8 physical cores, and 192GB RAM. To prevent NUMA memory-access noise, the server processes execute on one CPU and the client benchmark on the other. All live threads have a dedicated core. Results report the average and standard deviation of 10 runs.

Unless otherwise specified, Varan was configured to use a buffer size of 256 entries. Each entry in the ring buffer is 32B long; the largest buffer used, with 2^24 entries, requires 512MB of memory.
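The memory figure follows directly from the entry size. The short calculation below reads the largest buffer as 2^24 entries, which is the reading under which 32B per entry gives the stated 512MB; the function name is illustrative.

```python
ENTRY_BYTES = 32  # entry size stated in the text


def ring_buffer_bytes(entries, entry_bytes=ENTRY_BYTES):
    """Memory needed for a ring buffer with the given number of entries."""
    return entries * entry_bytes


# Default size (2^8 = 256 entries) and the three sizes used in the
# update-time experiment.
for exp in (8, 10, 20, 24):
    size = ring_buffer_bytes(2 ** exp)
    print(f"2^{exp} entries -> {size / 2**20:g} MiB")
```

The default of 256 entries therefore occupies only 8 KiB, while 2^20 entries take 32 MiB and 2^24 entries take 512 MiB.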

9 https://github.com/RedisLabs/memtier_benchmark


Table 2. Steady-state performance and overhead of Memcached, Redis, and Vsftpd. Ops/sec for Memcached and Redis are ×1000; overhead is vs. Native.

             Memcached               Redis                   Vsftpd small             Vsftpd large
Version      Ops/sec      Overhead   Ops/sec      Overhead   Ops/sec       Overhead   Ops/sec      Overhead
Native       249 ± 1.05   n/a        73 ± 0.46    n/a        2667 ± 5.53   n/a        118 ± 0.06   n/a
Kitsune      242 ± 1.52   3%         74 ± 0.31    -1%        2535 ± 5.27   5%         117 ± 0.13   2%
Varan-1      234 ± 0.78   6%         68 ± 0.92    8%         2594 ± 3.14   3%         117 ± 0.10   2%
Mvedsua-1    226 ± 1.69   9%         69 ± 0.12    6%         2458 ± 4.26   8%         116 ± 0.11   3%
Varan-2      125 ± 2.13   50%        41 ± 0.46    44%        2048 ± 2.71   24%        90 ± 0.18    25%
Mvedsua-2    121 ± 5.22   52%        43 ± 0.11    42%        2001 ± 4.30   25%        89 ± 0.19    25%

Competing techniques (see §7):
MUC [38]     n/a   23.2%–87.1%   n/a   23.2%–75.9%   n/a
Mx [22]      n/a   n/a   3x–16x   n/a
Imago [11]   Up to 1000x overhead

Figure 6. Performance while updating Memcached and Redis with Mvedsua during all update stages. [Plot: throughput in ops/sec (0K to 250K) against time in seconds (0 to 360) for Memcached and Redis.]

We note that our performance results reflect a worst-case scenario, with the client on the same machine as the server. A more realistic scenario, with the client in a different location, would incur a lower performance overhead as measured by the client benchmark since network latency would hide some of Mvedsua’s overhead.

Steady-State. Mvedsua’s performance results are presented in Table 2. The first six rows of the table show the performance of its two key modes of operation, single-leader mode (Mvedsua-1) and outdated/updated-leader mode (Mvedsua-2), as well as of related configurations which highlight component costs. The last three rows of Table 2 show the performance of competing techniques, explained in §7.

Mvedsua-1 represents a program’s normal mode of operation, and its overhead is small compared to a Native binary: 3–9%. Mode Mvedsua-2 imposes 25–52% overhead, but is only enabled for a relatively short period: just during an update and afterward, while testing it. These overheads roughly correspond to the component overheads of Kitsune added to single-leader Varan (Varan-1) or leader-follower Varan (Varan-2), respectively.
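The overhead columns of Table 2 are throughput losses relative to Native. The sketch below recomputes them for Memcached from the throughput values printed in the table; because those values are rounded to thousands of operations per second, the recomputed percentages can differ from the published ones by about a point. The function name `overhead` is illustrative.

```python
def overhead(native_ops, measured_ops):
    """Fraction of native throughput lost by a configuration."""
    return 1.0 - measured_ops / native_ops


# Memcached throughput (x1000 ops/sec) as printed in Table 2.
NATIVE = 249
memcached = {
    "Kitsune": 242,
    "Varan-1": 234,
    "Mvedsua-1": 226,
    "Varan-2": 125,
    "Mvedsua-2": 121,
}

for name, ops in memcached.items():
    print(f"{name}: {overhead(NATIVE, ops):.1%}")
```

For Memcached the additive reading in the text is visible directly: Kitsune (about 3%) plus Varan-1 (about 6%) is close to Mvedsua-1 (about 9%).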

Update time. To understand how well Mvedsua masks the pause due to updating, we evaluated the performance of Mvedsua when performing an update on Redis 2.0.0 → 2.0.1 and Memcached 1.2.2 → 1.2.3.¹⁰ In our experiments, the update happens at 120 seconds (t1 in Figure 2); the promotion/demotion (t4) happens at 180 seconds; and single leader mode resumes (t6) at 240 seconds. Figure 6 shows how many operations Memtier completed per second, on average. Times t1 and t6 are immediately visible, with Mvedsua entering and exiting MVE. The performance levels match Mvedsua-1 and Mvedsua-2 in Table 2. For Redis, t4 is visible as a slight increase in throughput. A key takeaway is that service never stops during the updating process.

To see the impact of a larger state on update time, we modified the Redis experiment to initialize the store with 1M entries before the benchmark starts, which results in a resident process size of around 250MB. We then repeated the experiment for native without an update, Kitsune performing a 2.0.0 → 2.0.1 update, and Mvedsua performing the same update with three ring-buffer sizes: 2^10, 2^20, and 2^24 entries.

Figure 7. Performance while updating Redis with a large program state and different buffer sizes. The right-hand side zooms in on the 15 seconds following the update. [Plot: throughput in ops/sec (0K to 60K) against time in seconds (0 to 360, with a zoomed panel covering 120 to 135) for Native, Kitsune, and Mvedsua with ring buffers of 2^24, 2^10, and 2^20 entries.]

Figure 7 shows the results. We measured the pause introduced by each update as the maximum latency reported by Memtier for each of 10 runs, and we report the average and standard deviation of those maximum latencies: 5040 ± 101ms for Kitsune; 7130 ± 45ms for Mvedsua-2^10; 5330 ± 100ms for Mvedsua-2^20; and 117 ± 41ms for Mvedsua-2^24. The native maximum latency is 100 ± 46ms. The results thus show that, with a large enough ring buffer, Mvedsua can mask the update pause introduced by DSU.

10 Vsftpd is essentially stateless, which means its update-time pause is already low, so we did not measure it.

Note that executing in outdated leader mode is important for masking update latency. This is because the ring buffer can be drained in parallel with providing service, as opposed to drained while service is paused, prior to switching to the new version. To confirm this, we configured Mvedsua to promote the updated version and terminate the outdated version immediately, measuring the maximum latency throughout the process as 3000ms, as compared to 117ms when running for a time in outdated leader mode.¹¹
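Why the ring-buffer size matters can be seen with a back-of-the-envelope model. The sketch below is not taken from the paper: it assumes the leader logs entries at a constant rate, that the follower consumes nothing until its update finishes, and that the leader stalls from the moment the buffer fills until the update completes. The fill rate of one million entries per second is a made-up value; the 6.2 second update duration is the figure given in footnote 11.

```python
def leader_stall_seconds(buffer_entries, update_seconds, fill_rate):
    """Time the leader spends blocked on a full ring buffer during an update.

    Assumes a constant fill rate (entries per second) and a follower that
    consumes nothing until its update has finished.
    """
    seconds_until_full = buffer_entries / fill_rate
    return max(0.0, update_seconds - seconds_until_full)


UPDATE_SECONDS = 6.2   # follower update time from footnote 11
FILL_RATE = 1_000_000  # hypothetical entries per second

for exp in (10, 20, 24):
    stall = leader_stall_seconds(2 ** exp, UPDATE_SECONDS, FILL_RATE)
    print(f"2^{exp} entries: leader stalls for about {stall:.2f}s")
```

Under these assumptions a small buffer fills almost immediately and the leader stalls for nearly the whole update, whereas a buffer that can absorb the entire update window removes the stall. The model does not attempt to reproduce the measured latencies, which also include the time the follower needs to catch up.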

6.2 Fault Tolerance

In this set of experiments, we demonstrate how Mvedsua is able to detect and recover from a variety of errors.

Error in the New Code. In Redis, revision 7fb16bac introduces an error that crashes the server when invoking command HMGET on the wrong type of data. The error is present on all versions of Redis we tested. As an experiment, we ran version 2.0.0 without the revision that introduced the error, so that the update 2.0.0 → 2.0.1 introduces it. Following a dynamic update with Kitsune, the program crashes when a client sends a bad HMGET command. But when using Mvedsua for the update, sending a bad HMGET command results in the follower failing, at which point execution reverts to single-leader mode. Clients proceed without incident.

Error in the State Transformation. A Kitsune-developed update had a latent bug: it would free memory still in use by LibEvent, which would cause a crash if the freed memory was later re-used by the memory allocator. We observed that this error seemed to manifest only when a sufficiently large number of clients were connected to Memcached. With Kitsune, the result was an immediate crash. With Mvedsua, however, this new-version crash is tolerated, and execution continues with the old-version leader without clients noticing.

Timing Error. Recall that we needed to change Memcached to reset LibEvent’s state in the leader after an aborted update (see §5.3). Failure to do so constitutes a kind of timing error in the dynamic update, and leaving out our change may produce a divergence detected by Mvedsua. While we ultimately fixed the error by adding code to reset LibEvent’s state, the fact that divergences are aborted means we could have simply retried the update. In an experiment, we configured Mvedsua to retry the update after waiting 500ms, and found that the update was always installed eventually, after a maximum of 8 retries with a median of 2 retries.
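The retry policy used in the timing-error experiment amounts to a simple loop. The sketch below assumes a caller-supplied `try_update` callable that returns True once the update has been installed without a divergence; that callable, the function name, and the optional retry cap are hypothetical, and only the 500ms delay comes from the text.

```python
import time


def update_with_retries(try_update, delay_seconds=0.5, max_retries=None,
                        sleep=time.sleep):
    """Call try_update() until it succeeds; return the number of retries.

    try_update is a hypothetical callable that attempts the dynamic update
    and returns True if it was installed without a divergence.
    """
    retries = 0
    while not try_update():
        if max_retries is not None and retries >= max_retries:
            raise RuntimeError("update kept diverging")
        sleep(delay_seconds)  # wait before attempting the update again
        retries += 1
    return retries
```

Passing a no-op `sleep` makes the loop easy to exercise in a test; with outcomes False, False, True it reports 2 retries.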

11 The follower will take 6,200ms to do the update, during which time the leader is filling the ring buffer. When the update switches to the follower, it will take half that time to consume the buffer.

7 Related Work

MUC [38] is the first work we know of that combines DSU and MVE. It aims to support incremental upgrades across distributed systems, combining DSU with MVE to ensure that old clients interact with old servers, new clients with new servers, and client upgrades force a switch. As with Mvedsua, an update starts by forking the current process and then updating the child. Both processes are shepherded by a coordinator that monitors system calls between the two (via ptrace) and compares their outputs. Unfortunately, MUC’s MVE solution is very limited: (1) it cannot tolerate update-induced pauses as it runs both processes in lock-step; (2) it introduces 23–87% steady-state performance overhead; (3) it cannot handle failures during or after an update, i.e., none of the faults listed in §6.2; and (4) it cannot keep states related across versions, in the manner of §3.3, and has no good way to fix this. MUC can handle expected divergences in behavior, but only by annotating the system calls in the source code that are expected to diverge.

POLUS [8] also supports incremental updates, but within a process rather than across a distributed system. It allows multiple threads to run different code versions so they can be updated one at a time. It employs transformation functions to map shared data to a view consistent with the accessing thread’s version. POLUS does not support rollback on error and does not support updates with incompatible backward mappings. For example, for our update in Figure 1, POLUS could not back-transform store entries with non-string type.

TTST [13] proposes an approach for validating state transformations based on process-level updates. TTST first updates the old version (running in a process Old) to the new version (running in a process New) using forward state transformations, and then updates the new version to an old version (running in a process Reversed) using backward state transformations. It then compares Old and Reversed in order to detect potential state transformation bugs, which would cancel the update. Mvedsua is more general than TTST in that it may find state transformation errors that escape TTST (e.g., when both the forward and the backward transformations are wrong, but in a reversible way, or when the mistake manifests after update time); can find other types of bugs introduced by the update process, particularly bugs introduced by the patch itself; and can mask the pause times introduced by live updates, which TTST cannot (more precisely, TTST adds 0.1–1.2 s to the update time). Note that TTST could also benefit from the overall Mvedsua approach, by performing the forward and backward updates in the background.

Proteos [15] provides OS support to update a set of processes atomically. Similarly to Mvedsua, if the update fails, Proteos simply rolls it back, allowing the to-be-updated processes to resume execution in the old version. However, Proteos does not monitor the updated processes after the update


ASPLOS ’19, April 13–17, 2019, Providence, RI, USA Pina, Andronidis, Hicks, and Cadar

is completed, so it cannot detect post-update errors such as those due to buggy patches. Furthermore, Proteos cannot tolerate update-induced delays as it pauses all processes until the update completes.
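Returning to the TTST comparison above, the case of forward and backward transformations that are "wrong, but in a reversible way" can be illustrated with a small sketch. Everything here is hypothetical: the key-value store layout, the typed-entry format `("string", value)`, and all function names are assumptions chosen for illustration, not code from TTST or Mvedsua.

```python
from typing import Callable, Dict, Tuple

OldStore = Dict[str, str]              # old version: key -> string value
NewStore = Dict[str, Tuple[str, str]]  # new version: key -> (type tag, value)


def forward(old: OldStore) -> NewStore:
    """Correct forward transformation: tag every entry as a string."""
    return {key: ("string", value) for key, value in old.items()}


def backward(new: NewStore) -> OldStore:
    """Correct backward transformation: drop the type tag."""
    return {key: value for key, (_tag, value) in new.items()}


def forward_buggy(old: OldStore) -> NewStore:
    """Buggy forward transformation: accidentally reverses each value."""
    return {key: ("string", value[::-1]) for key, value in old.items()}


def backward_buggy(new: NewStore) -> OldStore:
    """Buggy backward transformation: reverses again, undoing the bug."""
    return {key: value[::-1] for key, (_tag, value) in new.items()}


def round_trip_ok(old: OldStore,
                  fwd: Callable[[OldStore], NewStore],
                  bwd: Callable[[NewStore], OldStore]) -> bool:
    """Round-trip check in the style of comparing Old against Reversed."""
    return bwd(fwd(old)) == old


def get_old(store: OldStore, key: str) -> str:
    return store[key]


def get_new(store: NewStore, key: str) -> str:
    return store[key][1]


old_state = {"greeting": "hello"}
```

Both transformation pairs pass the round-trip check, yet `get_new(forward_buggy(old_state), "greeting")` returns `"olleh"` while the old version returns `"hello"`. Comparing the responses of the two running versions on the same request, as an MVE monitor does, exposes the bug that the round-trip comparison misses.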

Mx [7, 22] uses MVE to run two program versions in parallel and tolerates errors in one by using the other. However, Mx does not support DSU: it can run two versions with MVE from the beginning, but it cannot add a new version in mid-execution, resulting in fundamentally different requirements from Mvedsua’s. Mx executes versions in lock-step, synchronizing at each system call, so it incurs a significant overhead, e.g., 3–16x on a comparable scenario for the versions of Redis evaluated with Mvedsua. Finally, Mx tolerates neither system call divergences nor changes to data structures, which restricts its ability to deploy release-level versions. Mx was able to tolerate a simpler version of the new code error listed in §6.2, without syscall divergences; it fundamentally cannot tolerate the other two faults listed in §6.2.

Imago [11] can update large distributed systems by launching a full copy of the system under upgrade. Imago treats the whole system as a black box, intercepting the inputs (e.g., HTTP requests) to send them to both versions, and comparing the outputs (e.g., database queries). If an update fails, Imago can terminate the duplicate system and revert to the outdated version without any client-visible disruption. Imago can also detect semantic errors by comparing outputs at the database level, with programmer-specified data conversion logic to tolerate divergences between the two versions. Imago requires a mostly stateless system which keeps its state on a data store that can be shared with other systems (and whose semantics cannot change). Mvedsua does not require a particular architecture, supports expressive updates (including representation changes), and can work with any updatable system (it can be adapted to many DSU systems) and with memory-only stateful applications. Both Imago and Mvedsua tolerate update errors by using more computational resources, but Mvedsua operates at a lower level and requires fewer resources than Imago.

An approach to deal with long update times is to perform parallel state transformation using several threads [37, 41] or on-demand (lazy) state transformation [28, 33, 37], transforming data as it is accessed after the update. Unfortunately, lazy transformation is particularly challenging for C programs, which can easily break proxies and other abstractions used to support it. Parallel transformation can reduce the pause but not eliminate it.
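The idea behind on-demand (lazy) state transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of any cited system: the `LazyStore` class, the per-entry version tag, and `transform_entry` are hypothetical names introduced for this example.

```python
from typing import Dict, Tuple


def transform_entry(value: str) -> Tuple[str, str]:
    """Hypothetical per-entry transformation: add a type tag to a string."""
    return ("string", value)


class LazyStore:
    """Key-value store whose entries are transformed on first access."""

    OLD_VERSION = 1
    NEW_VERSION = 2

    def __init__(self, old_entries: Dict[str, str]) -> None:
        # Updating only tags each entry with its version; no payload is
        # transformed yet, so the update-time pause stays short.
        self._entries = {
            key: (self.OLD_VERSION, value) for key, value in old_entries.items()
        }
        self.transformed = 0  # number of entries transformed so far

    def get(self, key: str) -> Tuple[str, str]:
        version, payload = self._entries[key]
        if version == self.OLD_VERSION:
            # First access after the update: transform and cache the result.
            payload = transform_entry(payload)
            self._entries[key] = (self.NEW_VERSION, payload)
            self.transformed += 1
        return payload


store = LazyStore({"a": "x", "b": "y"})
first = store.get("a")   # transforms entry "a" on demand
second = store.get("a")  # already transformed, served as-is
```

The sketch relies on every access going through `get`, which acts as the proxy. In a C program, code can read the underlying memory directly and bypass such a check, which is one way to see why the text calls lazy transformation particularly challenging for C.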

8 Conclusion

Dynamic software updating (DSU) can be an effective solution to the problem of updating stateful applications without disrupting service. However, DSU systems introduce pauses during updates and require programmer assistance, which is prone to introducing errors. In addition, software updates themselves can introduce errors which can escape off-line testing.

Mvedsua is a novel DSU system that employs multi-version execution (MVE) to deliver a solution that both masks update pauses and tolerates a variety of failed updates. We implemented Mvedsua by combining Kitsune, a modern DSU system, with Varan, a high-performance MVE system, and evaluated it on several high-performance servers: Redis, Memcached, and Vsftpd. We found that Mvedsua imposes low overhead in steady state, masks the update-time pauses, and tolerates real and realistic update errors.

Acknowledgements

We thank Karla Saur for her assistance with the Kitsune codebase; Paul-Antoine Arras, Frank Busse, Jeff Foster, Kesha Hietala, Timotej Kapus, Martin Nowack and the anonymous reviewers for their useful feedback. This research was generously sponsored by the UK EPSRC through the Early-Career Fellowship EP/L002795/1 and the HiPEDS Centre for Doctoral Training.

References

[1] Sameer Ajmani, Barbara Liskov, and Liuba Shrira. Modular software upgrades for distributed systems. In European Conference on Object-Oriented Programming (ECOOP), July 2006.

[2] Joe Armstrong. Programming ERLANG: software for a concurrent world. Pragmatic programmers. 2007.

[3] Jeff Arnold and M. Frans Kaashoek. Ksplice: Automatic rebootless kernel updates. In Proc. of the 4th European Conference on Computer Systems (EuroSys’09), March-April 2009.

[4] Katelin Bailey, Luis Ceze, Steven D. Gribble, and Henry M. Levy. Operating system implications of fast, cheap, non-volatile memory. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, HotOS’13, pages 2–2, Berkeley, CA, USA, 2011. USENIX Association.

[5] Emery D. Berger and Benjamin G. Zorn. Diehard: probabilistic memory safety for unsafe languages. In Proc. of the Conference on Programming Language Design and Implementation (PLDI’06), June 2006.

[6] E. A. Brewer. Lessons from giant-scale services. IEEE Internet Computing, 5(4):46–55, Jul 2001.

[7] Cristian Cadar and Petr Hosek. Multi-version software updates. In Proc. of the 4th Workshop on Hot Topics in Software Upgrades (HotSWUp’12), June 2012.

[8] Haibo Chen, Jie Yu, Rong Chen, Binyu Zang, and Pen-Chung Yew. Polus: A powerful live updating system. In Proc. of the 29th International Conference on Software Engineering (ICSE’07), May 2007.

[9] Liming Chen and Algirdas Avizienis. N-version programming: A fault-tolerance approach to reliability of software operation. In Proc. of the 8th IEEE International Symposium on Fault Tolerant Computing (FTCS’78), June 1978.

[10] Benjamin Cox, David Evans, Adrian Filipi, Jonathan Rowanhill, Wei Hu, Jack Davidson, John Knight, Anh Nguyen-Tuong, and Jason Hiser. N-variant systems: A secretless framework for security through diversity. In Proc. of the 15th USENIX Security Symposium (USENIX Security’06), July-August 2006.

[11] Tudor Dumitraş and Priya Narasimhan. Why do upgrades fail and what can we do about it?: Toward dependable, online upgrades in enterprise system. In Proc. of the 10th ACM/IFIP/USENIX International Conference on Middleware (Middleware’09), November 2009.

[12] How Facebook pushes updates to its site every day. http://thenextweb.com/facebook/2011/05/28/how-facebook-pushes-updates-to-its-site-every-day/, 2011.

[13] Cristiano Giuffrida, Calin Iorgulescu, Anton Kuijsten, and Andrew S. Tanenbaum. Back to the Future: Fault-tolerant Live Update with Time-traveling State Transfer. In Proc. of the 27th USENIX Conference on System Administration (LISA’13), November 2013.

[14] Cristiano Giuffrida, Calin Iorgulescu, and Andrew S. Tanenbaum. Mutable Checkpoint-Restart: Automating Live Update for Generic Server Programs. In Proc. of the 15th ACM/IFIP/USENIX International Conference on Middleware (Middleware’14), December 2014.

[15] Cristiano Giuffrida, Anton Kuijsten, and Andrew S. Tanenbaum. Safe and automatic live update for operating systems. In Proc. of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’13), March 2013.

[16] Christopher Hayden, Karla Saur, Michael Hicks, and Jeffrey Foster. A study of dynamic software update quiescence for multithreaded programs. In HotSWUp, 2012.

[17] Christopher Hayden, Edward Smith, Eric Hardisty, Michael Hicks, and Jeffrey Foster. Evaluating dynamic software update safety using efficient systematic testing. IEEE TSE, 2012.

[18] Christopher Hayden, Edward K. Smith, Michael Hicks, and Jeffrey S. Foster. State transfer for clear and efficient runtime upgrades. In HotSWUp, 2011.

[19] Christopher M. Hayden, Karla Saur, Edward K. Smith, Michael Hicks, and Jeffrey S. Foster. Efficient, general-purpose dynamic software updating for C. ACM Transactions on Programming Languages and Systems (TOPLAS), 36(4):13, October 2014.

[20] Christopher M. Hayden, Edward K. Smith, Michail Denchev, Michael Hicks, and Jeffrey S. Foster. Kitsune: Efficient, general-purpose dynamic software updating for C. In Proc. of the 27th Annual Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA’12), October 2012.

[21] Michael Hicks and Scott M. Nettles. Dynamic software updating. TOPLAS, 2005.

[22] Petr Hosek and Cristian Cadar. Safe software updates via multi-version execution. In Proc. of the 35th International Conference on Software Engineering (ICSE’13), May 2013.

[23] Petr Hosek and Cristian Cadar. Varan the Unbelievable: An efficient N-version execution framework. In Proc. of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15), March 2015.

[24] Jevgeni Kabanov and Varmo Vene. A thousand years of productivity: the jrebel story. Software: Practice and Experience, 44(1):105–127.

[25] K. Koning, H. Bos, and C. Giuffrida. Secure and efficient multi-variant execution using hardware-assisted process virtualization. In Proc. of the 2016 46th International Conference on Dependable Systems and Networks (DSN’16), June 2016.

[26] Kristis Makris and Rida A. Bazi. Immediate multi-threaded dynamic software updates using stack reconstruction. In Proc. of the 2009 USENIX Annual Technical Conference (USENIX ATC’09), June 2009.

[27] Matthew Maurer and David Brumley. TACHYON: Tandem execution for efficient live patch testing. In Proc. of the 21st USENIX Security Symposium (USENIX Security’12), August 2012.

[28] Iulian Neamtiu, Michael Hicks, Gareth Stoyle, and Manuel Oriol. Practical dynamic software updating for C. In Proc. of the Conference on Programming Language Design and Implementation (PLDI’06), June 2006.

[29] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling Memcache at Facebook. In Proc. of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI’13), April 2013.

[30] Oracle(TM). Java SE 1.4 Enhancements. http://download.java.net/jdk8/docs/technotes/guides/jpda/enhancements1.4.html.

[31] Luís Pina. Practical Dynamic Software Updating. PhD thesis, Instituto Superior Técnico, 2016.

[32] Luís Pina, Anastasios Andronidis, and Cristian Cadar. FreeDA: Incompatible stock dynamic analyses in production. In Proc. of the 2018 ACM International Conference on Computing Frontiers (CF’18), May 2018.

[33] Luís Pina and João Cachopo. Atomic dynamic upgrades using software transactional memory. In Proc. of the 4th Workshop on Hot Topics in Software Upgrades (HotSWUp’12), June 2012.

[34] Luís Pina, Daniel Grumberg, Anastasios Andronidis, and Cristian Cadar. A DSL approach to reconcile equivalent divergent program executions. In Proc. of the 2017 USENIX Annual Technical Conference (USENIX ATC’17), July 2017.

[35] Luís Pina and Michael Hicks. Rubah: Efficient, general-purpose dynamic software updating for Java. In Proc. of the 5th Workshop on Hot Topics in Software Upgrades (HotSWUp’13), June 2013.

[36] Luís Pina and Michael Hicks. Tedsuto: a general framework for testing dynamic software updates. In Proc. of the IEEE International Conference on Software Testing, Verification, and Validation (ICST’16), April 2016.

[37] Luís Pina, Luís Veiga, and Michael Hicks. Rubah: DSU for Java on a stock JVM. In Proc. of the 29th Annual Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA’14), October 2014.

[38] Weizhong Qiang, Feng Chen, Laurence T Yang, and Hai Jin. MUC: Updating cloud applications dynamically via multi-version execution. Future Generation Computer Systems, 2015.

[39] Martin Roesch. Snort - lightweight intrusion detection for networks. In Proc. of the 13th USENIX Conference on System Administration (LISA’99), November 1999.

[40] Babak Salamat, Todd Jackson, Andreas Gal, and Michael Franz. Orchestra: intrusion detection using parallel execution and monitoring of program variants in user-space. In Proc. of the 4th European Conference on Computer Systems (EuroSys’09), March-April 2009.

[41] Karla Saur, Michael Hicks, and Jeffrey S. Foster. C-strider: Type-aware heap traversal for C. Software, Practice, and Experience, May 2015.

[42] Suriya Subramanian, Michael Hicks, and Kathryn S. McKinley. Dynamic software updates: A VM-centric approach. In Proc. of the Conference on Programming Language Design and Implementation (PLDI’09), June 2009.

[43] A. T. Tai and L. Alkalai. Long-life deep-space applications. Computer, 31:37–38, April 1998.

[44] Hannes Tschofenig and Stephen Farrell. Report from the Internet of Things Software Update Workshop 2016. RFC 8240, September 2017.

[45] Stijn Volckaert, Bart Coppens, Bjorn De Sutter, Koen De Bosschere, Per Larsen, and Michael Franz. Taming parallelism in a multi-variant execution environment. In Proc. of the 12th European Conference on Computer Systems (EuroSys’17), April 2017.

[46] Stijn Volckaert, Bart Coppens, Alexios Voulimeneas, Andrei Homescu, Per Larsen, Bjorn De Sutter, and Michael Franz. Secure and efficient application monitoring and replication. In Proc. of the 2016 USENIX Annual Technical Conference (USENIX ATC’16), June 2016.

[47] Thomas Würthinger, Christian Wimmer, and Lukas Stadler. Dynamic code evolution for Java. In Proceedings of the 8th International Conference on the Principles and Practice of Programming in Java, 2010.

[48] Hui Xue, Nathan Dautenhahn, and Samuel T. King. Using replicated execution for a more secure and reliable web browser. In Proc. of the 19th Network and Distributed System Security Symposium (NDSS’12), February 2012.

[49] Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram. How do fixes become bugs? In Proc. of the joint meeting of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering (ESEC/FSE’11), September 2011.
