
Distributed Provenance Compression

Chen Chen, University of Pennsylvania, [email protected]
Harshal Tushar Lehri, University of Pennsylvania, [email protected]
Lay Kuan Loh, Carnegie Mellon University, [email protected]
Anupam Alur, University of Pennsylvania, [email protected]
Limin Jia, Carnegie Mellon University, [email protected]
Boon Thau Loo, University of Pennsylvania, [email protected]
Wenchao Zhou, Georgetown University, [email protected]

ABSTRACT

Network provenance, which records the execution history of network events as meta-data, is becoming increasingly important for network accountability and failure diagnosis. For example, network provenance may be used to trace the path that a message traversed in a network, or to reveal how a particular routing entry was derived and the parties involved in its derivation. A challenge when storing the provenance of a live network is that the large number of arriving messages may incur substantial storage overhead. In this paper, we explore techniques to dynamically compress distributed provenance stored at scale. Logically, compression is achieved by grouping equivalent provenance trees and maintaining only one concrete copy for each equivalence class. To efficiently identify the equivalent provenance, we (1) introduce distributed event-based linear programs (DELPs) to specify distributed network applications, and (2) statically analyze DELPs to allow for quick detection of provenance equivalence at runtime. Our experimental results demonstrate that our approach leads to significant storage reduction and query latency improvement over alternative approaches.

Keywords

Provenance; distributed systems; storage; static analysis

1. INTRODUCTION

Network administrators require the capability to identify the root causes of device malfunction and performance slowdowns in data centers or across wide-area networks, and also to determine the sources of security attacks. Such capabilities often utilize network provenance, which allows the user to issue queries over network meta-data about the execution history.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

SIGMOD '17, May 14-19, 2017, Chicago, Illinois, USA.
© 2017 ACM. ISBN 978-1-4503-4197-4/17/05. $15.00
DOI: http://dx.doi.org/10.1145/3035918.3035926

In recent years, network provenance has been successfully applied to various network settings, resulting in proposals for distributed provenance [28], secure network provenance [26], distributed time-aware provenance [27] and negative provenance [23]. These proposals demonstrate that database-style declarative queries can be used for maintaining and querying distributed provenance at scale.

One of the main drawbacks of the existing techniques is their potentially significant storage overhead when network provenance is incrementally maintained as network events occur continuously. This is particularly challenging for the data plane of networks, which deals with frequent and high-volume incoming data packets. When there are streams of incoming packet events, the provenance information can become prohibitively large. While there is prior work on provenance compression in the database literature [3], that work was not designed for distributed settings. Our paper's contributions are:

System Model. We propose a new network programming model, which specifies distributed event-based linear programs (DELPs), using a restricted variant of the Network Datalog language in declarative networking [12]. Each DELP is composed of a set of rules triggered by events, and executes until a fixpoint is reached. Unlike traditional event-condition-action rules, a DELP has the option of designating slow-changing tuples, which do not change while a distributed fixpoint computation is happening, but are still amenable to update at intervals. An example of a slow-changing tuple could be a routing entry in a router. We show, through two example applications (packet forwarding and DNS resolution), that this model is general enough to cover a wide range of network applications.

Distributed Provenance Compression. Based on the DELP model, we propose two techniques to store provenance information efficiently. Our first technique relies on materializing only the tuples that the administrators are interested in. We propose a distributed querying technique that can reconstruct the entire provenance tree from the reduced provenance information that is maintained. Our second technique combines multiple provenance trees together, based on a notion of equivalence classes. In each equivalence class, the provenance trees are identical except for a few pre-defined nodes. We compress these equivalent trees by maintaining only one concrete copy of the shared provenance part, along with the delta information for each individual provenance tree.


We also propose to efficiently identify equivalence between provenance trees by simply inspecting the values of input events' attributes, thus reducing the computation overhead in a distributed environment.

Implementation and Evaluation. We implement a prototype of our distributed compression scheme based on the RapidNet declarative networking engine [14]. We enhance RapidNet to include a rule rewrite engine that maintains provenance at runtime. Provenance queries are implemented as distributed recursive queries over the maintained provenance information. We deploy and evaluate our prototype using two popular network applications – i.e., packet forwarding and DNS resolution – and the performance results show that the compression techniques achieve orders of magnitude reduction in storage and significant reduction in query latency, with only negligible network overhead added to each monitored network application at runtime.

2. BACKGROUND

We first provide an introduction to Network Datalog (NDlog) [12], a declarative networking programming language we use to model network applications in the distributed system. Then we introduce the concept of distributed network provenance [28, 27].

2.1 Network Datalog

r1 packet(@N, S, D, DT) :− packet(@L, S, D, DT), route(@L, D, N).
r2 recv(@L, S, D, DT) :− packet(@L, S, D, DT), D == L.

Figure 1: An NDlog program for packet forwarding

To illustrate NDlog, we show an example query (Figure 1) that recursively forwards packets in a network. A typical NDlog program is composed of a set of rules. Each rule takes the form p :− q1, q2, ..., qn, where p is a relation called the rule head, and the qi are rule bodies that are either relational atoms, arithmetic atoms or user-defined functions. Relations and rules of an NDlog program can be deployed in a distributed fashion. To logically specify the location of each relation, an "@" symbol – called the location specifier – is prepended to the first attribute of each relation.

Each node in the network maintains a relational database storing base tuples (i.e., tuples that are input by the user) and/or derived tuples (i.e., tuples that are generated by the NDlog program). During program execution, when all the rule bodies of a rule r have corresponding tuples in the local database, r will be triggered, generating the head tuple. If the location specifier of the head tuple is different from that of the bodies (e.g., r1 in Figure 1), the head tuple will be transmitted through the network to the remote node. In the example program of Figure 1, r1 forwards a local packet (packet) at a node L to a node N by looking up the packet's destination D in the local routing table (route). r2 receives a packet and stores it locally in the recv table, if the packet is destined to the local node (D == L).
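To make the rule semantics concrete, here is a minimal Python sketch (our own illustration, not part of NDlog or RapidNet) of how r1 and r2 could be evaluated at a single node; the tuple layouts and function names are assumptions made only for this example.

# Hypothetical sketch of evaluating rules r1/r2 of Figure 1 at one node.
# A packet is (loc, src, dst, data); a route entry is (loc, dst, next_hop).
def eval_rules(packet, local_routes, local_node):
    """Return ('send', next_hop, new_packet), ('recv', packet), or None."""
    loc, src, dst, data = packet
    # Rule r2: the packet is destined to the local node, so store it in recv.
    if dst == local_node:
        return ('recv', packet)
    # Rule r1: join the packet with a matching route tuple on (loc, dst) and
    # relocate the head tuple to the next hop N.
    for (rloc, rdst, nxt) in local_routes:
        if rloc == loc and rdst == dst:
            return ('send', nxt, (nxt, src, dst, data))
    return None

# Mirroring Figure 2: at n1, packet(@n1, n1, n3, "data") joins route(@n1, n3, n2),
# producing packet(@n2, n1, n3, "data") to be shipped to node n2.
print(eval_rules(('n1', 'n1', 'n3', 'data'), [('n1', 'n3', 'n2')], 'n1'))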

2.2 Distributed Network Provenance

Data provenance [8] can be used to explain why and how a given tuple is derived. Based on data provenance, prior work [28] also proposes network provenance, which faithfully records the execution of (possibly erroneous) applications in a (possibly misconfigured) distributed system.

[Figure 2 shows three nodes n1, n2 and n3. The route table at n1 contains the entry (L=@n1, D=n3, N=n2); the route table at n2 contains the entry (L=@n2, D=n3, N=n3).]

Figure 2: An example deployment of packet forwarding. Node n1 and node n2 have local route tables indicating routes towards node n3.

This allows the network administrators to inspect the derivation history of system states. For example, suppose there is a direct link between n1 and n3 in Figure 2. If the user prefers routing with the shortest paths, the routing entry of n1 in Figure 2 would have been erroneous – a correct entry should be route(@n1, n3, n3). The provenance engine, agnostic of this error, would record the packet traversal on the path n1 → n2 → n3. The user can later use this recorded provenance as an explanation of why the packet took a particular route, eventually leading to further investigation into the route table at n1.

Network provenance is typically represented as a directed tree rooted at the queried tuple. Figure 3 shows the provenance tree of a tuple recv(@n3, n1, n3, "data"). This provenance tree records the traversal of packet(@n1, n1, n3, "data") from node n1 to n3 in Figure 2. There are two types of nodes in a typical provenance tree: the rule nodes and the tuple nodes. The rule nodes (i.e., the oval nodes in Figure 3) stand for the rules that are triggered in the program execution, while the tuple nodes (i.e., the square nodes in Figure 3) represent tuples that trigger/are derived by the rule execution. Note that the root of a provenance tree is always a tuple node that represents the queried tuple.

To maintain the provenance, traditional database work [10] often stores data provenance along with the target tuple for efficient provenance querying. Such centralized provenance maintenance turns out to be very costly for network provenance – which is typically constructed in a distributed fashion – in terms of the extra bandwidth needed to ship the provenance information.

ExSPAN [28], a representative distributed provenance engine, maintains the provenance information in a distributed relational database. There are two (distributed) tables in the database: a prov table and a ruleExec table. The prov table records the rule triggering of a derived tuple, while the ruleExec table maintains the body tuples triggering a specific rule. Table 1 shows an example distributed database storing the provenance tree in Figure 3. The Loc attribute in the prov table and the RLoc attribute in the ruleExec table indicate the location of each tuple.

ExSPAN uses a recursive query to retrieve the provenance tree of a queried tuple. For example, to query the provenance tree of recv(@n3, n1, n3, "data") (Figure 3), ExSPAN first computes the hash value vid6 of the tuple, and uses vid6 to find the tuple prov(n3, vid6, rid3, n3) in the prov table. ExSPAN further uses the values rid3 and n3 to locate ruleExec(n3, rid3, r2, (vid5)) in the ruleExec table, which represents the provenance node of the rule execution (i.e., r2) that derives vid6.


prov
Loc  VID                                         RID   RLoc
n3   vid6 (sha1(recv(@n3, n1, n3, "data")))      rid3  n3
n3   vid5 (sha1(packet(@n3, n1, n3, "data")))    rid2  n2
n2   vid4 (sha1(packet(@n2, n1, n3, "data")))    rid1  n1
n2   vid3 (sha1(route(@n2, n3, n3)))             NULL  NULL
n1   vid2 (sha1(packet(@n1, n1, n3, "data")))    NULL  NULL
n1   vid1 (sha1(route(@n1, n3, n2)))             NULL  NULL

ruleExec
RLoc  RID                            R   VIDS
n3    rid3 (sha1(r2+n3+vid5))        r2  (vid5)
n2    rid2 (sha1(r1+n2+vid3+vid4))   r1  (vid3, vid4)
n1    rid1 (sha1(r1+n1+vid1+vid2))   r1  (vid1, vid2)

Table 1: Relational tables (ruleExec and prov) maintaining the provenance tree in Figure 3.

recv(@n3, n1, n3, "data")
  r2@n3
    packet(@n3, n1, n3, "data")
      r1@n2
        packet(@n2, n1, n3, "data")
          r1@n1
            packet(@n1, n1, n3, "data")
            route(@n1, n3, n2)
        route(@n2, n3, n3)

Figure 3: A distributed provenance tree for the execution of packet(@n1, n1, n3, "data"), which traverses from node n1 to node n3 in Figure 2.

To further query the body tuples that triggered r2, the querier would then look up (vid5) in the prov table. This recursive querying continues until it reaches the base tuples (e.g., route(@n1, n3, n2)).

We adopt the same storage model as ExSPAN. However, our provenance compression scheme applies generally to any distributed provenance model.
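As an illustration of this recursive querying, the following Python sketch walks the prov and ruleExec tables of Table 1; the in-memory dictionaries and helper names are our own simplification of the distributed tables, not ExSPAN's actual interface.

# Hypothetical in-memory stand-in for the prov/ruleExec tables of Table 1.
# prov: vid -> (loc, rid, rloc); rule_exec: (rloc, rid) -> (rule, child vids)
prov = {
    'vid6': ('n3', 'rid3', 'n3'), 'vid5': ('n3', 'rid2', 'n2'),
    'vid4': ('n2', 'rid1', 'n1'), 'vid3': ('n2', None, None),
    'vid2': ('n1', None, None),   'vid1': ('n1', None, None),
}
rule_exec = {
    ('n3', 'rid3'): ('r2', ['vid5']),
    ('n2', 'rid2'): ('r1', ['vid3', 'vid4']),
    ('n1', 'rid1'): ('r1', ['vid1', 'vid2']),
}

def query(vid):
    """Recursively expand a tuple's provenance into a nested (vid, rule, children) tree."""
    loc, rid, rloc = prov[vid]
    if rid is None:                  # base tuple: no deriving rule execution
        return (vid, None, [])
    rule, child_vids = rule_exec[(rloc, rid)]
    return (vid, rule, [query(c) for c in child_vids])

# Querying recv(@n3, n1, n3, "data") starts from its hash value vid6.
print(query('vid6'))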

2.3 Motivation for Provenance Compression

A key problem not addressed in prior work on network provenance [28][27] is that the provenance information can become very large, especially for distributed applications (e.g., network protocols) where event tuples trigger rules in a streaming fashion. For example, in Figure 2, if n1 initiates a large volume of traffic towards n3, each packet in the traffic would generate a provenance tree similar to the one in Figure 3. Given that today's routers forward millions of packets per second, this would incur prohibitively high storage overhead for the maintenance of distributed provenance on each intermediate node.

We observe, however, that the provenance of different packets shares significant similarities in structure, presenting opportunities for provenance compression across different provenance trees. For example, in Figure 2, whenever a new packet is sent from n1 to n3, an entire provenance tree is created and maintained. However, it is not hard to observe that all the packets traversing through n1 and n2 take the same route – that is, they join with the same local route tuples. Therefore, storage of the provenance trees generated by these packets could be significantly reduced if we manage to remove the observed redundancy.

The key challenge of provenance compression in a distributed system is to achieve significant storage saving while incurring low network overhead (e.g., extra bandwidth and computation), and still enabling the user to query the provenance information as effectively as with uncompressed provenance. Hence, we avoid content-level compression techniques such as gzip, and instead opt for conservative compression based on the structure of provenance trees.

3. MODEL

A distributed system DS is modeled as an undirected graph G = (V, E). Each node Ni in V represents an entity in DS. Two nodes Ni and Nj can communicate with each other if and only if there is an edge (Ni, Nj) in E. In DS, each node Ni maintains a local state in the form of a relational database DBi. Tables in DBi can be divided into base tables and derived tables. Tuples in base tables are manually updated, while tuples in derived tables are derived by network applications. Figure 2 is an example distributed system with three nodes.

3.1 Network Applications

Each node in DS runs a number of network applications, which are specified in NDlog with a syntactic restriction. The syntactic restriction enables efficient provenance compression (Section 5), while still being expressive enough to model most network applications. In particular, we have:

Definition 1. An NDlog program Prog = {r1, r2, ..., rn} is a distributed event-driven linear program (DELP), if Prog satisfies the following three conditions:
• Each rule is event-driven. Each rule ri can be specified in the form: [head] :− [event], [conditions], where [event] is a body relation designated by the programmer, and [conditions] are all non-event body atoms.
• Consecutive rules are dependent. For each rule pair (ri, ri+1) in Prog, the head relation of ri is identical to the event relation in ri+1.
• Head relations only appear as the event relations in rule bodies. For each head relation hd in any rule ri, there does not exist a rule rj such that hd is a non-event relation in rj.

In a typical network application, non-event relations often represent the network states, which change slowly compared to the fast rate of incoming events. For example, in the packet forwarding program, the route relation is either updated manually or through a network routing protocol. In either case, it changes slowly compared to the large volume of incoming packets. We call such non-event relations in a DELP slow-changing relations, and assume they do not change during the fixpoint computation. This assumption is realistic and can be enforced easily in networks where configurations are updated at runtime and packets see only either the old or the new configuration version across routers, as shown in prior work [19] in the networking community.

A DELP {r1, r2, ..., rn} can be deployed in a distributed fashion over a network, and its execution follows the pipelined semi-naïve evaluation strategy introduced in prior work [11] – whenever a new event tuple is injected into a node Ni, it triggers r1 by joining with the slow-changing tuples at Ni.


The generated head tuple hd is then sent to a node Nj as identified by the location specifier of hd – triggering r2 at Nj. This process continues until rn is executed.

Due to their event-driven nature, DELPs can model a large number of network applications, such as packet forwarding (Figure 1), Domain Name System (DNS) resolution [13], Dynamic Host Configuration Protocol (DHCP) [7] and Address Resolution Protocol (ARP) [18].

3.2 Provenance of Interest

It is often the case that network administrators use a subset of network states more often than others as their starting point for debugging. In the packet forwarding example, an administrator is more likely to query the provenance of a recv tuple rather than a packet tuple upon packet misrouting, because the nodes that generate recv tuples are usually the first places where an administrator observes abnormality. Therefore, we allow a user to specify relations of interest – i.e., relations whose provenance information interests a user the most in a network application – and our runtime system maintains concrete provenance information only for those tuples of the relations of interest. However, the provenance of other tuples – i.e., those of the relations of less interest – is still accessible. We can adopt, for example, the reactive maintenance strategy proposed in DTaP [27], by only maintaining non-deterministic input tuples, and replaying the whole system execution to re-construct the provenance information of the tuples of less interest during querying.

As with prior work [28], we represent the provenance information of the tuples of interest as provenance trees. The only difference is that, given the syntactic restriction of a DELP, our system treats slow-changing tuples as base tuples during provenance querying – i.e., the provenance tree of a slow-changing tuple, e.g., a route tuple, is not automatically presented, even if the tuple could be derived from another network application, e.g., a routing protocol. To obtain the provenance tree of a route tuple during provenance querying, a user could specify route as a relation of interest in the application that derives it, and explicitly query the provenance tree of the route tuple in a separate process.

4. BASIC STORAGE OPTIMIZATION

Based on the model introduced in the previous section, we propose our basic storage optimization for provenance trees, which lays the foundation for the compression scheme in Section 5. Simply put, for each provenance tree, we remove its provenance nodes representing the intermediate event tuples. Figure 4 shows an optimized provenance tree tr′ of the tree in Figure 3. The (distributed) relational database maintaining tr′ is shown in Table 2, where vid values and rid values are identical to those in Table 1.

Compared to Table 1, Table 2 differs in two parts:
• The prov table only maintains the provenance of the queried tuple, i.e., the recv tuple. Other entries in the prov table are omitted because they represent either the removed intermediate tuples or the base tuples.
• Two extra columns NLoc and NRID are added to the ruleExec table. They help the recursive query find the child node for each provenance node in the tree.

The optimization of removing the intermediate nodes saves a fair amount of storage space, especially when the input events arrive at a high rate and generate a large number of intermediate tuples, as is common in typical networking scenarios.

recv(@n3, n1, n3, "data")
  r2@n3
    r1@n2
      r1@n1
        packet(@n1, n1, n3, "data")
        route(@n1, n3, n2)
      route(@n2, n3, n3)

Figure 4: An optimized provenance tree for the tree in Figure 3.

prov
Loc  VID   RID   RLoc
n3   vid6  rid3  n3

ruleExec
RLoc  RID   R   VIDS          NLoc  NRID
n3    rid3  r2  NULL          n2    rid2
n2    rid2  r1  (vid4)        n1    rid1
n1    rid1  r1  (vid1, vid2)  NULL  NULL

Table 2: Optimized ruleExec and prov tables for the provenance tree in Figure 4.

We use the querying of recv(@n3, n1, n3, "data")'s provenance in Table 2 to illustrate the two-step provenance querying process for optimized provenance trees:

Step 1: Construct the optimized provenance tree. The query first fetches the provenance tree in the optimized form through recursive querying over Table 2. Starting from the prov entry corresponding to recv(@n3, n1, n3, "data"), we fetch the provenance node for the last rule execution rid3 in the ruleExec table, then follow the values in NLoc and NRID to recursively fetch all the ruleExec tuples (i.e., rid3, rid2 and rid1) until no further provenance nodes can be fetched – i.e., both NLoc and NRID are NULL.

Step 2: Compute the intermediate provenance nodes. At the end of Step 1, we obtain the provenance tree tr′ in Figure 4. To recover the intermediate provenance nodes, we start from the leaf nodes, i.e., packet(@n1, n1, n3, "data") and route(@n1, n3, n2), and re-execute the rule r1 to derive packet(@n2, n1, n3, "data"). This process is repeated in a bottom-up fashion until the root is reached, resulting in the provenance tree in Figure 3.

In summary, the basic optimization still allows the user to query the complete provenance trees, but incurs extra computational overhead during provenance querying to recover the intermediate nodes. The extra query latency is negligible, as is shown in Section 6.1.3.
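A simplified Python sketch of this two-step querying over the optimized tables of Table 2 follows; the bottom-up re-execution of r1 is specialized to the packet-forwarding example, and all data layouts are illustrative assumptions.

# Hypothetical sketch of Step 1 and Step 2 over the optimized tables of Table 2.
# rule_exec: (loc, rid) -> (rule, base tuples, next loc, next rid)
rule_exec = {
    ('n3', 'rid3'): ('r2', [], 'n2', 'rid2'),
    ('n2', 'rid2'): ('r1', [('route', 'n2', 'n3', 'n3')], 'n1', 'rid1'),
    ('n1', 'rid1'): ('r1', [('route', 'n1', 'n3', 'n2'),
                            ('packet', 'n1', 'n1', 'n3', 'data')], None, None),
}

def fetch_chain(loc, rid):
    """Step 1: follow the (NLoc, NRID) pointers to collect the optimized rule-node chain."""
    chain = []
    while rid is not None:
        rule, bases, nloc, nrid = rule_exec[(loc, rid)]
        chain.append((loc, rid, rule, bases))
        loc, rid = nloc, nrid
    return chain

def replay(chain):
    """Step 2: re-execute r1 bottom-up to recover the intermediate packet tuples."""
    event = None
    for loc, rid, rule, bases in reversed(chain):
        if event is None:            # the leaf rule node carries the input event tuple
            event = next(t for t in bases if t[0] == 'packet')
        route = next((t for t in bases if t[0] == 'route'), None)
        if rule == 'r1' and route:   # r1 forwards the packet to the route's next hop
            _, _, src, dst, data = event
            event = ('packet', route[3], src, dst, data)
        print(loc, rid, rule, '->', event)

replay(fetch_chain('n3', 'rid3'))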

5. EQUIVALENCE-BASED COMPRESSION

The storage optimization described in Section 4 focuses on reducing the storage overhead within a single provenance tree. Building upon this optimization, we further explore removing redundancy across provenance trees. We propose grouping provenance trees of DELP execution into equivalence classes, and only maintaining one copy of the shared sub-tree within each equivalence class. Our definition of the equivalence relation allows equivalent provenance trees to be quickly identified through inspection of equivalence keys – a subset of attributes of input event tuples – and compressed efficiently at runtime. The equivalence keys can be obtained through static analysis of a DELP.

5.1 Equivalence Relation

We first introduce the equivalence relation for provenance trees. We say that two provenance trees tr and tr′ are equivalent, written (tr ∼ tr′), if (1) they are structurally identical – i.e., they share the identical sequence of rules – and (2) the slow-changing tuples used in each rule are identical as well. In other words, two equivalent trees tr and tr′ only differ at two nodes: (1) the root node that represents the output tuple and (2) the input event tuple. The formal definition of tr ∼ tr′ can be found in Appendix A. In our packet forwarding example, the provenance tree generated by a new event packet(@n1, n1, n3, "url") (with "url" as its payload) is equivalent to the tree in Figure 4.

For each equivalence class, we only need to maintain one copy of the sub-provenance tree shared by all the class members, while each individual tree in the equivalence class only needs to maintain a small amount of delta information – i.e., the root node, the event leaf node, and a reference to the shared sub-provenance tree. Additionally, this definition of equivalence enables more efficient equivalence detection than node-by-node comparison between trees. In fact, we show that equivalence of two provenance trees can be determined by checking equivalence of the input event tuples in both trees, based on the observation that the execution of a DELP is uniquely determined by the values of a subset of attributes in the input event tuple. For example, in the packet forwarding program (Figure 1), if the values of the attributes (loc, dst) in two input packet tuples are identical, these two tuples will generate equivalent provenance trees.

We denote the minimal set of attributes K in the input event relation whose values determine the provenance trees as equivalence keys. Two event tuples ev1 and ev2 of a relation e are said to be equivalent w.r.t. K, written as ev1 ∼K ev2, if their valuation of K is equal. Formally:

Definition 2 (Event equivalence). Let K = {e:i1, ..., e:im}. Then e(t1, ..., tn) ∼K e(s1, ..., sn) iff ∀j ∈ {i1, ..., im}, tj = sj.

Here, e:i denotes the i-th attribute of the relation e.

Based on the above discussion, our approach to compressing provenance trees, with regard to a program DQ, consists of two main algorithms: (1) an equivalence keys identification algorithm, which performs static analysis of DQ to compute the equivalence keys (Section 5.2); and (2) an online provenance compression algorithm, which maintains the shared provenance tree for each equivalence class in a distributed fashion (Section 5.3).

The correctness of using event equivalence to determine provenance tree equivalence is shown in Theorem 1. The proof will be discussed in Section 5.2.

Theorem 1 (Correctness of equivalence keys). Given a DELP program DQ and two input event tuples ev1 and ev2, if ev1 ∼K ev2, where K is the equivalence keys for DQ, then for any provenance tree tr1 (tr2′) generated by ev1 (ev2), there exists a provenance tree tr2 (tr1′) generated by ev2 (ev1) s.t. tr1 ∼ tr2 (tr1′ ∼ tr2′).

5.2 Equivalence Keys Identification

Given a DELP, we define a static analysis algorithm to identify the equivalence keys of the input event relation.

function GetEquiKeys(G, ev)
    eqid ← {}
    eqid.append(ev:0)
    nodes ← event attribute nodes in G
    for each ev:i in nodes do
        for bnode in non-event nodes of G do
            if ev:i can reach bnode then
                eqid.append(ev:i)
    return eqid
end function

Figure 5: Pseudocode to identify equivalence keys

The algorithm consists of two steps: (1) building an attribute-level dependency graph reflecting the relationship between the valuation of different attributes, and (2) computing equivalence keys based on the constructed dependency graph. Details of each step are given below.

Build the attribute-level dependency graph. An attribute-level dependency graph G = (V, E) is an undirected graph. Nodes of G represent attributes in the program. Specifically, the i-th attribute of a relation rel corresponds to a vertex labeled as (rel:i) in G. We refer interested readers to Appendix C for an example dependency graph of the packet forwarding program.

Two vertices v1 and v2 are directly connected in G if and only if v1 represents an attribute attr1 of the event relation in a rule r and v2 represents another attribute attr2 in r, and they satisfy any of the following conditions: (1) attr2 is an attribute of the same name as attr1 in a slow-changing relation (e.g., v1 = (packet:1) and v2 = (route:1) in rule r1 of Figure 1); (2) attr2 is a head attribute with the same name as attr1 (e.g., v1 = (packet:1) and v2 = (recv:1) in r2 of Figure 1); (3) attr2 and attr1 appear in the same arithmetic atom (e.g., v1 = (packet:0) and v2 = (packet:2) in rule r2 of Figure 1); and (4) v1 is on the right-hand side of an assignment asn and attr2 is on the left-hand side of asn (e.g., if rule r2 of Figure 1 were redefined as r2′ recv(@L, S, N, DT) :− packet(@L, S, D, DT), N := L + 2, with v1 = (packet:0) and v2 = (recv:2)).

Identify equivalence keys. Given the attribute-level dependency graph G, we identify the equivalence keys of the input event relation ev using the function GetEquiKeys (Figure 5). GetEquiKeys takes G and ev as input, and outputs a list of attributes eqid representing the equivalence keys. In the algorithm, for each node (ev:i) in G, GetEquiKeys checks whether (ev:i) can reach any attribute in a slow-changing relation. If this is the case, (ev:i) is identified as a member of the equivalence keys, and appended to eqid. We always include the attribute indicating the input location of ev (e.g., (packet:0)) in the equivalence keys, to ensure that no two input event tuples at different locations have the same equivalence keys. When applied to the packet forwarding program, GetEquiKeys identifies (packet:0) and (packet:2) as equivalence keys.
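The static analysis of Figure 5 can be prototyped in a few lines of Python; the adjacency-set graph representation and the edge list below are our own assumptions for the packet forwarding program, not the implementation used in RapidNet.

from collections import defaultdict, deque

def get_equi_keys(edges, event_attrs, slow_attrs, ev_loc):
    """Sketch of GetEquiKeys: an event attribute is an equivalence key if it can reach
    an attribute of a slow-changing relation in the dependency graph; the location
    attribute ev:0 is always included."""
    graph = defaultdict(set)
    for a, b in edges:               # undirected attribute-level dependency graph
        graph[a].add(b)
        graph[b].add(a)
    eqid = [ev_loc]
    for attr in event_attrs:
        seen, frontier = {attr}, deque([attr])
        while frontier:              # BFS reachability check
            node = frontier.popleft()
            if node in slow_attrs:
                eqid.append(attr)
                break
            for nxt in graph[node] - seen:
                seen.add(nxt)
                frontier.append(nxt)
    return eqid

# Assumed edges for Figure 1: packet:0/packet:2 join route attributes in r1,
# so (packet:0) and (packet:2) come out as the equivalence keys.
edges = [('packet:0', 'route:0'), ('packet:2', 'route:1'),
         ('packet:1', 'recv:1'), ('packet:3', 'recv:3')]
print(get_equi_keys(edges, ['packet:1', 'packet:2', 'packet:3'],
                    {'route:0', 'route:1', 'route:2'}, 'packet:0'))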

Now we introduce a few notations to help prove Theorem 1. We use the predicate joinSAttr(p:n) to denote that a node (p:n) in the dependency graph has an edge to an attribute in a slow-changing relation. We denote each edge connecting two attributes (p:n, q:m) that are not in any slow-changing relation by the predicate joinFAttr(p:n, q:m). We further use the predicate joinFAttr(p:n, q:m) to inductively define connected(e:i, p:n), denoting a path in the graph from (e:i) to (p:n). We then formally define below what it means, given a DELP, for K to be equivalence keys:

Definition 3. K is equivalence keys for a DELP program DQ, if ∀(e:i) ∈ K, either DQ ⊢ joinSAttr(e:i) or ∃p, n s.t. DQ ⊢ connected(e:i, p:n) and DQ ⊢ joinSAttr(p:n).

We show the correctness of Theorem 1 by proving Lemma 2, a stronger lemma that gives us Theorem 1 as a corollary. In Lemma 2, we write tr : P to denote that tr is a provenance tree of the output tuple P, and write DQ, DB, ev ⊨ tr : P to mean that tr is generated by executing the program DQ over a database DB, triggered by the event tuple ev.

Lemma 2 (Correctness of equivalence keys (Strong)). If GetEquiKeys(G, ev) = K and ev1 ∼K ev2 and DQ, DB, ev1 ⊨ tr1 : p(t1, ..., tn), then ∃tr2 : p(s1, ..., sn) s.t. DQ, DB, ev2 ⊨ tr2 : p(s1, ..., sn) and tr1 : p(t1, ..., tn) ∼ tr2 : p(s1, ..., sn), and ∀i ∈ [1, n], ti ≠ si implies ∃ℓ s.t. DQ ⊢ connected(ev:ℓ, p:i) and (ev:ℓ) ∉ K.

Intuitively, Lemma 2 states that given two equivalent input event tuples ev1 and ev2 w.r.t. K, where ev1 generates a provenance tree tr1, we can construct a tr2 for ev2 such that tr1 and tr2 are equivalent – i.e., they share the same structure and slow-changing tuples. Furthermore, if the two output tuples p(t1, ..., tn) and p(s1, ..., sn) have different values for a given attribute, this attribute must connect, in the dependency graph, to an event attribute that is not in the equivalence keys. This last condition enables an inductive proof (Appendix B) of Lemma 2 over the structure of the trees.

Time complexity. Next, we analyze the time complexity of the static analysis. Assume that a program DQ has m rules. Each rule r has k atoms, including the head relation and all body atoms. Each atom has at most t attributes. Hence, the attribute-level dependency graph G has at most n = m ∗ k ∗ t nodes. The construction of G takes O(n²) time, and the identification of equivalence keys takes O(t ∗ n) time. Normally t is much smaller than n. Therefore, the total complexity of the static analysis is O(n²).

5.3 Online Provenance Compression

We next present an online provenance compression scheme that compresses equivalent (distributed) provenance trees based on the identified equivalence keys. In our compression scheme, the execution of a DELP, triggered by an event tuple ev, is composed of three stages (a simplified sketch in code follows the list):
• Stage 1: Equivalence keys checking. Extract the value v of ev's equivalence keys, and check whether v has ever been seen before. If so, set a Boolean flag existFlag to True. Otherwise, set existFlag to False. Tag existFlag along with ev throughout the execution.
• Stage 2: Online provenance maintenance. If existFlag is True, no provenance information is maintained during the execution. Otherwise, the complete provenance tree of the execution is maintained.
• Stage 3: Output tuple provenance maintenance. When the execution finishes, associate the output tuple with the shared provenance tree to allow for future provenance querying.
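The following Python sketch (a single-process simplification of ours, with one dictionary standing in for the distributed state and a stubbed-out fixpoint) illustrates how the three stages fit together for the packet-forwarding example.

import hashlib

equi_seen = set()   # htequi: equivalence-key values seen so far (Stage 1)
rule_exec = []      # shared provenance nodes (Stage 2)
prov = []           # per-output-tuple delta information (Stage 3)
hmap = {}           # equivalence key -> reference to the shared tree's root node

def sha1(x):
    return hashlib.sha1(repr(x).encode()).hexdigest()[:8]

def execute(pkt, equi_key_attrs=(0, 2)):
    """Run one event tuple through the three compression stages."""
    key = tuple(pkt[i] for i in equi_key_attrs)      # Stage 1: equivalence keys checking
    exist_flag = key in equi_seen
    equi_seen.add(key)
    out = ('recv', 'n3', pkt[1], pkt[2], pkt[3])     # stubbed fixpoint: pkt reaches n3
    if not exist_flag:                               # Stage 2: materialize the shared tree once
        rule_exec.append(('n3', sha1(('r2', key)), 'r2'))
        hmap[key] = ('n3', sha1(('r2', key)))
    rloc, rid = hmap[key]                            # Stage 3: delta info for the output tuple
    prov.append((rloc, sha1(out), rid, sha1(pkt)))
    return out

execute(('n1', 'n1', 'n3', 'data'))   # first packet: shared tree is maintained
execute(('n1', 'n1', 'n3', 'url'))    # second packet: only a prov entry is added
print(len(rule_exec), len(prov))      # -> 1 2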

To illustrate this, Figure 6 presents an example consisting of two packets traversing the network topology (from n1 to n3) in Figure 2. packet(@n1, n1, n3, "data") is first inserted for execution (represented by the solid arrows), followed by the execution of packet(@n1, n1, n3, "url") (represented by the dashed arrows). The three stages of online compression are logically separated with vertical dashed lines. Table 3 presents the (distributed) relational tables (i.e., a ruleExec table and a prov table) that maintain the compressed provenance trees for the aforementioned execution. Next, we introduce each stage in detail.

Equivalence Keys Checking. Upon receiving an input event ev, our runtime system first checks whether the value of ev's equivalence keys has been seen before. To do this, we use a hash table htequi to store all unique equivalence keys that have arrived. If ev's equivalence keys eqid have a value that already exists in htequi, a Boolean flag existFlag will be created and set to True. This existFlag accompanies ev throughout the execution, notifying all nodes involved in the execution to avoid maintaining the concrete provenance tree. Otherwise, existFlag is set to False, instructing the subsequent nodes to maintain the provenance tree. For example, in Figure 6, when the first packet tuple packet(@n1, n1, n3, "data") arrives, it has the values (n1, n3) for its equivalence keys, which have never been encountered before, so its existFlag is False. But when the second packet tuple packet(@n1, n1, n3, "url") arrives, since it shares the same equivalence keys values with the first packet, its existFlag is True.

Online Provenance Maintenance. For each rule r triggered in the execution, we selectively maintain the provenance information based on existFlag's value. If existFlag is False, the provenance nodes are maintained as tuples in the ruleExec table locally. Otherwise, no provenance information is maintained at all. For example, in Figure 6, when packet(@n2, n1, n3, "data") triggers rule r1 at node n2, the existFlag is False. Therefore, we insert a tuple ruleExec(n2, rid2, r1, vid1, n1, rid3) into the ruleExec table at node n2 to record the provenance. The semantics of the inserted tuple are the same as introduced in Section 4. In comparison, when packet(@n2, n1, n3, "url") triggers r1 at node n2, its existFlag is True. In this case, we simply execute r1 without recording any provenance information.

Output Tuple Provenance Maintenance. For an execution whose existFlag is True, we need to associate its output tuple with the shared provenance tree maintained by the previous execution. To do this, we use a hash table hmap to store the reference to the shared provenance tree, wherein the key is the hash value of the equivalence keys, and the value is the node closest to the root in the shared provenance tree. For example, in Figure 6, the shared provenance tree is stored in hmap as {hash(n1, n3): (n3, rid1)}. We then associate each output tuple tp with the shared provenance tree st by looking up its equivalence keys' values in hmap. This association is stored as a tuple in the prov table. For example, in Figure 6, the first execution generates the output tuple recv(@n3, n1, n3, "data"), which is associated with the reference (n3, rid1). This is reflected by the tuple prov(n3, tid1, n3, rid1, evid1) in the prov table (Table 3). evid1 is used to identify the event tuple peculiar to the execution, which is not included in the shared provenance tree.


[Figure 6 (flowchart): the execution of Q = {r1, r2} proceeds through Stage 1 (Equivalence Key Checking at n1, with equivalence key (loc, dst): "Is (n1, n3) existing?" sets existFlag to False for the first packet and True for the second, and existFlag is appended to the packet tuple), Stage 2 (Distributed Online Provenance Maintenance at n1, n2 and n3: provenance nodes are maintained only when existFlag == False), and Stage 3 (Output Tuple Provenance Association: the equivalence key and the output tuple are associated with the shared provenance tree). Solid arrows: execution of packet(@n1, n1, n3, "data"); dashed arrows: execution of packet(@n1, n1, n3, "url").]

Figure 6: An example execution of the packet forwarding program in Figure 1. The program is first triggered by packet(@n1, n1, n3, "data"), followed by packet(@n1, n1, n3, "url").

ruleExec
Loc  RID                    RULE  VIDS                                  NLoc  NRID
n3   rid1 (sha1(r2))        r2    NULL                                  n2    rid2
n2   rid2 (sha1(r1, vid1))  r1    (vid1 (sha1(route(@n2, n3, n3))))     n1    rid3
n1   rid3 (sha1(r1, vid2))  r1    (vid2 (sha1(route(@n1, n3, n2))))     NULL  NULL

prov
Loc  VID                                       RLoc  RID   EVID
n3   tid1 (sha1(recv(@n3, n1, n3, "data")))    n3    rid1  evid1 (sha1(packet(@n1, n1, n3, "data")))
n3   tid2 (sha1(recv(@n3, n1, n3, "url")))     n3    rid1  evid2 (sha1(packet(@n1, n1, n3, "url")))

Table 3: A ruleExec table and a prov table for the compressed provenance trees produced in Figure 6.

Correctness of Online Compression. We prove the correctness of the online compression algorithm by showing that our compression scheme for provenance trees is lossless – that is, the distributed provenance nodes maintained in the ruleExec and prov tables contain exactly the same set of provenance trees that would have been derived by semi-naïve evaluation [11] without compression (Theorem 3). To do this, we define the operational semantics of semi-naïve evaluation of a DELP with a set of transition rules of the form Csn →SN Csn′, where Csn denotes a state in semi-naïve evaluation that records the complete execution as provenance [5]. We also define a set of transition rules of the form Ccm ↗CM Ccm′ for semi-naïve evaluation with our online compression algorithm. Here, Ccm denotes a state in semi-naïve evaluation with compression. The proof is to show that we can assemble entries in the ruleExec and prov tables to reconstruct an original provenance tree tr. Likewise, given a provenance tree tr, we can also find an identical tree P encoded as entries in the ruleExec and prov tables. This correspondence is denoted as tr ∼d P and can be defined by induction over the structure of provenance trees.

Theorem 3 (Correctness of Compression). ∀n ∈ N and an initial state Cinit, if Cinit →SN^n Csn, then ∃Ccm s.t. Cinit ↗CM^n Ccm, and for any provenance tree tr ∈ Csn there exists a provenance tree P ∈ Ccm s.t. tr ∼d P, and for any provenance tree P ∈ Ccm there exists a provenance tree tr ∈ Csn s.t. tr ∼d P. And the same is true for semi-naïve evaluation when Ccm is given.

The above theorem states that if we initiate a DELP DQ from an initial state Cinit and execute DQ for n steps to reach a state Csn, then we can also execute DQ for n steps with the online compression scheme, starting from Cinit and ending in Ccm. In the end, the sets of provenance trees respectively maintained by these two processes are identical. An implication of Theorem 3 is that compressed provenance trees, like traditional network provenance, faithfully record the system execution, even if the execution is erroneous due to misconfiguration (e.g., wrong routing tables).

To prove Theorem 3, we show Lemma 4, which implies Theorem 3 as a corollary. Lemma 4 shows that semi-naïve evaluation with the online compression scheme is bisimilar to the one that stores provenance trees without compression. This bisimilarity relation shows that both evaluation strategies have identical semantics.

Lemma 4 (Compression Simulates Semi-naïve Evaluation). ∀n ∈ N and an initial state Cinit, if Cinit →SN^n Csn, then ∃Ccm s.t. Cinit ↗CM^n Ccm and Csn RC Ccm, and vice versa.

We define a bisimulation relation RC between Csn and Ccm – i.e., Csn RC Ccm means that when Csn →SN Csn′, there exists a state Ccm′ s.t. Ccm ↗CM Ccm′ and Csn′ RC Ccm′, and vice versa. Intuitively, RC relates two states of the two evaluation strategies – i.e., semi-naïve evaluation with and without compression – that execute identical programs to the point of identical program execution states, and, most importantly, for any provenance tree P ∈ Ccm, there exists a provenance tree tr ∈ Csn s.t. tr ∼d P, and vice versa.

Proof details of Lemma 4, along with the formal definition of the bisimulation relation RC, are presented in Appendix D. Briefly, we apply induction over n, the number of steps taken by the execution. The key is to show that if two bisimilar states both take one step, the resulting states are still bisimilar.

Generality of equivalence-based compression. The idea of equivalence-based compression is not just applicable to distributed scenarios; it can also be used to compress arbitrary provenance tree sets maintained in a centralized manner. We adopt the definition of the equivalence relation in Section 5.1 because it allows us to use equivalence keys to efficiently identify equivalent provenance trees, which makes it more suitable for a distributed environment where networking resources (e.g., bandwidth) are scarce.

5.4 Inter-Equivalence Class Compression

The online compression scheme introduced in Section 5.3 focuses on intra-equivalence class compression of provenance trees – i.e., only trees of the same equivalence class are compressed. In fact, provenance trees of different equivalence classes can be compressed as well. For example, assume a tuple packet(@n2, n2, n3, "ack") is inserted into n2 in Figure 6 for execution. The produced provenance tree shares the provenance nodes rid1 and rid2 in the ruleExec table of Table 3. To avoid the storage of such redundant rule execution nodes, we separate the ruleExec table into two sub-tables: a ruleExecNode table and a ruleExecLink table (Table 4). The ruleExecNode table maintains the concrete rule execution nodes, while the ruleExecLink table, maintained for each provenance tree tr individually, records the parent-child relationship of the rule execution nodes in tr. If two provenance trees, whether in the same equivalence class or not, share the same rule execution node nd, only one copy of the concrete nd is maintained in the ruleExecNode table. Each tree maintains a reference pointer to nd in its respective ruleExecLink table.

ruleExecNode
Loc  RID   RULE  VIDS
n3   rid1  r2    NULL
n2   rid2  r1    (vid1)
n1   rid3  r1    (vid2)

ruleExecLink
Loc  RID   NLoc  NRID
n3   rid1  n2    rid2
n2   rid2  n1    rid3
n1   rid3  NULL  NULL

Table 4: The ruleExecNode table and the ruleExecLink table replacing the ruleExec table in Table 3 to allow for compression of the shared rule execution nodes.
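A sketch of this table split, using plain Python dictionaries as stand-ins for the distributed tables, is shown below; the tree identifiers and the chain representation are assumptions made for illustration.

# Hypothetical sketch of splitting ruleExec into ruleExecNode and ruleExecLink.
rule_exec_node = {}   # (loc, rid) -> (rule, vids): one concrete copy per rule node
rule_exec_link = {}   # tree_id -> {(loc, rid): (nloc, nrid)}: per-tree parent-child links

def store_tree(tree_id, chain):
    """chain is a root-to-leaf list of (loc, rid, rule, vids) rule execution nodes."""
    links = {}
    for i, (loc, rid, rule, vids) in enumerate(chain):
        rule_exec_node.setdefault((loc, rid), (rule, vids))   # shared across trees
        nxt = chain[i + 1][:2] if i + 1 < len(chain) else (None, None)
        links[(loc, rid)] = nxt
    rule_exec_link[tree_id] = links

# The tree for packets from n1 and the tree for packet(@n2, n2, n3, "ack"): the
# second tree reuses the rule nodes rid1 and rid2 without duplicating them.
store_tree('t1', [('n3', 'rid1', 'r2', None), ('n2', 'rid2', 'r1', 'vid1'),
                  ('n1', 'rid3', 'r1', 'vid2')])
store_tree('t2', [('n3', 'rid1', 'r2', None), ('n2', 'rid2', 'r1', 'vid1')])
print(len(rule_exec_node))   # -> 3 concrete rule nodes serve both trees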

5.5 Updates to Slow-changing Tables

Though we assume that slow-changing tables do not change during a fixpoint computation, our system is designed to handle such updates at runtime. Figure 7 presents an example scenario based on Figure 2, where a network administrator decides to use n4, instead of n2, as the intermediate hop for packets sent from n1 to n3. To redirect the traffic, the administrator (1) deletes the route entry route(@n1, n3, n2), and (2) inserts a new route entry route(@n1, n3, n4).

Deletion of a tuple from a slow-changing table, such as route(@n1, n3, n2) in Figure 7, does not affect the stored provenance, as provenance information is monotone – that is, it represents the execution history, which is immutable [27]. However, when a tuple tp is inserted into a slow-changing table, such as route(@n1, n3, n4) in Figure 7, the provenance trees generated with tp could be incorrect or missing.

[Figure 7 shows nodes n1, n4, n3 and n2. The route table at n1 previously contained (L=@n1, D=n3, N=n2) and now contains (L=@n1, D=n3, N=n4); the route table at n4 contains (L=@n4, D=n3, N=n3).]

Figure 7: An updated topology of Figure 2. A new node n4 is deployed to reach n3. The route table of n1 is updated to forward packets to n4 now.

For example, in Figure 7, after route(@n1, n3, n4) is inserted, the provenance trees for all subsequent packets need to be re-calculated. However, since these packets are not the first in their equivalence classes, their existFlags are set to True. As a result, the provenance tree for the packet traversal on the path n1 → n4 → n3 would not be maintained.

To handle such scenarios, we require that, once a new tuple tp is inserted into a node n's slow-changing table, n broadcast a control message sig to all the nodes in the system. Any node receiving sig empties the hash table used for equivalence keys checking (Section 5.3). Therefore, provenance trees will be maintained again for all equivalence classes. In Figure 7, after the insertion of route(@n1, n3, n4), n1 broadcasts a sig to all the nodes, including itself. When a new packet pkt destined to n3 arrives at n1, the packet has its existFlag set to False. When this packet traverses the path n1 → n4 → n3, the nodes on the path maintain the corresponding provenance nodes. In all our network applications, the extra network overhead incurred by the broadcast and the impact on the effectiveness of compression due to resetting the hash table are negligible, as slow-changing tables are updated infrequently in practice (relative to the rate of event arrival). We experimentally validated this, as shown in Section 6.1.2.
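The control-message handling described above amounts to clearing the equivalence-key hash table at every receiver; a hypothetical Python sketch, with the broadcast modeled as a direct call to every node, is given below.

# Hypothetical sketch of handling an insertion into a slow-changing table.
class Node:
    def __init__(self, name, network):
        self.name = name
        self.network = network      # list of all nodes (the broadcast domain)
        self.ht_equi = set()        # equivalence-key values seen at this node

    def insert_slow_tuple(self, route_tuple):
        # Inserting e.g. route(@n1, n3, n4) triggers a broadcast of the control message sig.
        for node in self.network:
            node.on_sig()

    def on_sig(self):
        # Receiving sig invalidates all equivalence classes: subsequent events have
        # existFlag == False again, so their provenance trees are re-materialized.
        self.ht_equi.clear()

nodes = []
nodes.extend(Node(n, nodes) for n in ('n1', 'n2', 'n3', 'n4'))
nodes[0].ht_equi.add(('n1', 'n3'))
nodes[0].insert_slow_tuple(('n1', 'n3', 'n4'))
print(all(len(n.ht_equi) == 0 for n in nodes))   # -> True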

5.6 Provenance Querying

To query the provenance tree of an output tuple tp, we take the following steps:
• Compute the hash value htp of tp, and find the tuple prvtp in the prov table that has htp as its VID.
• Initiate a recursive query for the (shared) provenance nodes in the ruleExec table, starting with the values of (Loc, RID) in prvtp. Also, tag the event ID evid stored in the attribute EVID along with the query.
• When the recursive query reaches a ruleExec tuple at node n that has (NULL, NULL) for (NLoc, NRID), the tagged evid is used to retrieve the event tuple materialized at n.

For example, in Table 3, to query the provenance tree of recv(@n3, n1, n3, "data"), we first find prov(n3, tid1, n3, rid1, evid1), and use the values (n3, rid1) to initiate the recursive query in the ruleExec table to fetch the provenance nodes rid1, rid2 and rid3. evid is carried throughout the query, and is used to retrieve the event packet(@n1, n1, n3, "data") when the query stops at ruleExec(n1, rid3, r1, vid2, NULL, NULL). The above steps return to the initial querying location a collection of entries from the ruleExec and prov tables. We define a top-level algorithm Query that reconstructs the complete provenance tree tr based on these entries.


The pseudocode of Query can be found in Figure 18 of Appendix E. Query takes as input the network state Ccm of the online compression scheme, an output tuple P, and an event ID evid, and returns a set of provenance trees, each of which corresponds to one derivation of P using the input event tuple of ID evid. The example based on Table 3 has only one derivation for the output tuple, so we return a singleton set.

Correctness of Querying. From the correctness of the online compression algorithm (Theorem 3), we can prove that all the provenance trees generated by semi-naïve evaluation can be queried and that the query algorithm returns the correct provenance trees. One subtlety is that the compression algorithm may propagate updates out of order, causing ruleExec entries to be referred to in a provenance tree before being stored. We handle this subtlety by assuming all updates are processed before querying.

Theorem 5 (Correctness of the Query Algorithm). ∀n ∈ N, given an initial state Cinit s.t. Cinit ↗CM^n Ccm and there are no more updates to be processed, then ∃Csn s.t. Cinit →SN^n Csn, and ∀tr : P in the output provenance storage of Csn s.t. hash(EventOf(tr)) = evid, ∃M s.t. Query(Ccm, P, evid) = M and tr ∈ M, and ∀tr′ ∈ M\{tr}, tr′ is a proof of P stored in Csn and hash(EventOf(tr′)) = evid.

Details of the proof are in Appendix E. Briefly, by Theorem 3, there exists Csn s.t. Cinit →SN^n Csn and Csn RC Ccm. By Csn RC Ccm, we know that for any provenance tree tr of tuple P in Csn, there exists a prov tuple in Ccm that stores the reference to a provenance tree P for P, such that tr ∼d P. We induct over the depth of P to show that, given the root of P, the recursive lookup will return P. It is then straightforward to reconstruct tr from P as the return value of Query.
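Putting the querying steps together, a minimal Python sketch over the compressed tables of Table 3 (again with illustrative in-memory dictionaries rather than the actual distributed storage) looks as follows.

# Hypothetical sketch of querying a compressed provenance tree (cf. Table 3).
prov = {          # VID -> (Loc, RLoc, RID, EVID)
    'tid1': ('n3', 'n3', 'rid1', 'evid1'),
    'tid2': ('n3', 'n3', 'rid1', 'evid2'),
}
rule_exec = {     # (RLoc, RID) -> (rule, vids, NLoc, NRID)
    ('n3', 'rid1'): ('r2', None,   'n2', 'rid2'),
    ('n2', 'rid2'): ('r1', 'vid1', 'n1', 'rid3'),
    ('n1', 'rid3'): ('r1', 'vid2', None, None),
}
events = {'evid1': ('packet', 'n1', 'n1', 'n3', 'data'),
          'evid2': ('packet', 'n1', 'n1', 'n3', 'url')}

def query(output_vid):
    """Walk the shared rule chain, then use the tagged EVID to fetch the event leaf."""
    loc, rloc, rid, evid = prov[output_vid]
    chain = []
    while rid is not None:
        rule, vids, nloc, nrid = rule_exec[(rloc, rid)]
        chain.append((rloc, rid, rule, vids))
        rloc, rid = nloc, nrid
    return chain, events[evid]   # shared rule nodes plus the per-tree event tuple

# recv(@n3, n1, n3, "url") reuses the shared tree built for the "data" packet.
print(query('tid2'))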

6. EVALUATION

We have implemented a prototype based on enhancements to the RapidNet [14] declarative networking engine. At compile time, we add a program rewrite step that rewrites each DELP into a new program that supports online provenance maintenance and compression at runtime. We evaluate our prototype to understand the effectiveness of the online compression scheme. In all the experiments, we compare three techniques for maintaining distributed provenance. The first is ExSPAN [28], a typical network provenance engine; we maintain uncompressed provenance trees in the same way as ExSPAN. The second is distributed provenance maintenance with the basic storage optimization (Section 4). The third is provenance maintenance using equivalence-based compression (Section 5). In the evaluation, we refer to the three techniques as ExSPAN, Basic, and Advanced respectively.

Workloads. Our experiments are carried out on two classic network applications: packet forwarding (Section 2) and Domain Name System (DNS) resolution. DNS resolution is an Internet service that translates human-readable domain names into IP addresses. Both applications are event-driven, and typically involve a large volume of traffic during execution. The high-volume traffic incurs large storage overhead if we maintain provenance information for each packet/DNS request, which leaves potential opportunity for compression. The workloads are also sufficiently different to evaluate the generality of our approach. Packet forwarding involves larger messages along different paths in a graph, while DNS resolution involves smaller messages on a tree-like topology.

Testbed. In our experiment setup, we write the packet forwarding and DNS resolution applications in DELP, and use our enhanced RapidNet [14] engine to compile them into low-level (i.e., C++) execution code. The experiments for measuring storage and bandwidth are run on the ns-3 [15] network simulator, a discrete-event simulator that allows a user to evaluate network applications on a variety of network topologies. The simulation is run on a 32-core server with Intel Xeon 2.40 GHz CPUs. The server has 24 GB of RAM and 400 GB of disk space, and runs Ubuntu 12.04 as the operating system. We run multiple node instances on the same machine communicating over the ns-3 simulated network.

Performance Metrics. The performance metrics that we use in our experiments are: (1) the storage overhead, (2) the network overhead (i.e., bandwidth consumption) for provenance maintenance, and (3) the query latency when different provenance maintenance techniques are adopted. In our experiments, the relational provenance tables are maintained in memory. To measure storage consumption, we use the boost library [20] to serialize C++ data structures into binary data. At the end of each experiment run, we serialize the per-node provenance tables (i.e., the ruleExec table and the prov table) into binary files, and measure the size of the files to estimate the storage overhead.

6.1 Application #1: Packet Forwarding
Our first set of results is based on the packet forwarding program in Figure 1. The topology we used for packet forwarding is a 100-node transit-stub graph, randomly generated by the GT-ITM [25] topology generator. In particular, there are four transit nodes – i.e., nodes through which traffic can traverse – in the topology, each connecting to three stub domains, and each stub domain has eight stub nodes – i.e., nodes where traffic only originates or terminates. Transit-transit links have 50ms latency and 1Gbps bandwidth; transit-stub links have 10ms latency and 100Mbps bandwidth; stub-stub links have 2ms latency and 50Mbps bandwidth. The diameter of the topology is 12, and the average distance over all node pairs is 5.3. Each node runs one instance of the packet forwarding program.

In the experiment, we randomly selected a number of node pairs (s, d) – where s is the source and d is the destination – and sent packets from s to d while the provenance of each packet is maintained. To allow the packets to be correctly forwarded in the network, we pre-computed the shortest path p between s and d using a distributed routing protocol written as a declarative networking program [12]. The routes are stored in the route tables at each node in p.
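For reference, link characteristics like those listed above can be configured in ns-3 as sketched below. This minimal example wires up one transit-transit and one transit-stub link; it is not our actual topology setup, which is driven by the GT-ITM-generated graph.

// Minimal ns-3 sketch of the link parameters used in the evaluation;
// the real setup instantiates the full GT-ITM transit-stub topology.
#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/point-to-point-module.h"

using namespace ns3;

int main(int argc, char* argv[]) {
  CommandLine cmd;
  cmd.Parse(argc, argv);

  NodeContainer nodes;
  nodes.Create(3);  // two transit nodes and one stub node

  // Transit-transit link: 1Gbps bandwidth, 50ms latency.
  PointToPointHelper transitLink;
  transitLink.SetDeviceAttribute("DataRate", StringValue("1Gbps"));
  transitLink.SetChannelAttribute("Delay", StringValue("50ms"));
  transitLink.Install(nodes.Get(0), nodes.Get(1));

  // Transit-stub link: 100Mbps bandwidth, 10ms latency.
  PointToPointHelper stubLink;
  stubLink.SetDeviceAttribute("DataRate", StringValue("100Mbps"));
  stubLink.SetChannelAttribute("Delay", StringValue("10ms"));
  stubLink.Install(nodes.Get(0), nodes.Get(2));

  Simulator::Run();
  Simulator::Destroy();
  return 0;
}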

6.1.1 Storage of Provenance Trees
Figure 8 shows the CDF (Cumulative Distribution Function) graph of storage growth for all the nodes in the 100-node topology. In the experiment, we randomly selected 100 pairs of nodes, and continuously sent packets within each pair at the rate of 100 packets/second. As packets are transmitted, their provenance information is incrementally created and stored at each node (and optionally compressed for Basic and Advanced). We calculated the average storage growth rate of each node, and plotted a CDF graph based on the results.


Figure 8: Cumulative growth rate of provenance with 100 pairs of communicating nodes, at input rate of 100 packets/second. (CDF plot: x-axis is the rate of growth of storage in Mbps; y-axis is the percentage of nodes; series: ExSPAN, Basic, Advanced.)

Figure 9: Provenance storage growth of all nodes, with input rate of 100 packets/second for 100 pairs of communicating nodes. (Line plot: x-axis is time in seconds; y-axis is storage in GB; series: ExSPAN, Basic, Advanced.)

Figure 10: Provenance storage usage with 2000 input packets evenly distributed among an increasing number of communicating pairs. (Line plot: x-axis is the number of communicating pairs; y-axis is storage in MB; series: ExSPAN, Basic, Advanced.)

Figure 11: Bandwidth consumption during packet forwarding, with 500 pairs of nodes, each transmitting 100 packets. (Line plot: x-axis is time in seconds; y-axis is bandwidth in Mbps; series: ExSPAN, Basic, Advanced.)

Figure 12: Cumulative distribution of provenance querying latency for 100 random queries with 100 pairs of communicating nodes. (CDF plot: x-axis is query latency in ms; y-axis is the percentage of queries; series: ExSPAN, Basic, Advanced.)

Figure 13: Cumulative provenance storage growth rate of nameservers with input requests at a rate of 1000 requests/second. (CDF plot: x-axis is the rate of growth of storage in Mbps; y-axis is the percentage of nodes; series: ExSPAN, Basic, Advanced.)

We observe that ExSPAN has the highest storage growth rate among the three: 20% of the nodes have storage growth greater than 5 Mbps, and 4% of the nodes (i.e., transit nodes) have storage growth greater than 30 Mbps. This is because a number of node pairs share the same transit nodes in their paths. As expected, Basic has a smaller storage growth rate than ExSPAN, as it removes intermediate packet tuples from the provenance tables of each node. Advanced significantly outperforms the other two: all the nodes in the topology have a storage growth rate of less than 2 Mbps. The gap between Advanced and ExSPAN results from the fact that Advanced only maintains one representative provenance tree for each pair of nodes, while ExSPAN has to maintain provenance trees of all the traversing packets.

Figure 9 shows the total storage usage with continuous

packet insertion. We ran the experiment for 100 seconds and took a snapshot of the storage every 10 seconds. The figure shows that ExSPAN has the highest storage overhead. For example, it reaches 11.8 GB of storage at 90 seconds, and keeps growing in a linear fashion. Basic has a similar pattern, with 9.2 GB at 90 seconds. However, Advanced exhibits much lower storage growth, consuming only 0.92 GB at 90 seconds. We further calculate the average growth rate for all three lines: ExSPAN's storage grows at 131 MB/second, Basic's at 109 MB/second, and Advanced's at 10.3 MB/second. This means that ExSPAN could fill a 1 TB disk within 2 hours and Basic within 2.5 hours, whereas Advanced would take more than one day.

Figure 10 shows the storage usage when we increase the

number of communicating pairs, but keep the total number of packets the same (i.e., 2000 packets). All the packets are evenly distributed among the communicating pairs. We observe that the storage usage of ExSPAN and Basic remains almost constant: ExSPAN's total storage usage is

around 27 MB and Basic's total storage usage is around 21 MB. This is because, in both cases, each packet has a provenance tree maintained in the network, regardless of its source and destination. The burst of storage at the beginning of the experiments for ExSPAN and Basic is due to the fact that the sizes of provenance trees also depend on the length of the path that each packet traverses. In our experiment, the initial node pairs happen to have a path length shorter than the average path length in the topology, thus incurring less storage overhead.

For the case of Advanced, its storage usage increases with

the number of communicating pairs. This is because each communicating pair forms an equivalence class, and maintains one copy of the shared provenance tree for the equivalence class. Therefore, whenever a new communicating pair is added to the experiment, we need to maintain one more provenance tree for that pair, which increases the total storage. Despite the storage increase, Advanced still consumes much less storage space than the other two schemes.

In summary, we observe that Basic is able to reduce

storage growth, and in combination with the equivalence-based compression (Advanced), the storage reduction is significant – i.e., a 92% reduction over ExSPAN.
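To illustrate the bookkeeping behind Advanced's savings, the following minimal C++ sketch groups packets by a hypothetical equivalence key of (source, destination) and keeps a reference to a single representative provenance tree per class. The types are illustrative simplifications, not the prototype's actual data structures.

// Minimal sketch of equivalence-class grouping for Advanced, assuming a
// hypothetical EquiKey (source, destination) and ProvTreeRef handle.
#include <cstdint>
#include <string>
#include <unordered_map>

struct EquiKey {
  std::string src;
  std::string dst;
  bool operator==(const EquiKey& o) const { return src == o.src && dst == o.dst; }
};

struct EquiKeyHash {
  std::size_t operator()(const EquiKey& k) const {
    return std::hash<std::string>{}(k.src) ^ (std::hash<std::string>{}(k.dst) << 1);
  }
};

// Reference to the single representative provenance tree kept per class.
struct ProvTreeRef {
  uint64_t rootRuleId;  // hash of the root ruleExec entry
  std::string rootLoc;  // node storing the root entry
};

class EquivalenceStore {
 public:
  // Returns true if a representative tree already exists for this class;
  // otherwise records the new representative and returns false.
  bool RecordPacket(const EquiKey& key, const ProvTreeRef& rep) {
    auto it = classes_.find(key);
    if (it != classes_.end()) {
      return true;  // equivalent tree exists; no new tree is stored
    }
    classes_.emplace(key, rep);
    return false;  // first member of the class; store its tree
  }

 private:
  std::unordered_map<EquiKey, ProvTreeRef, EquiKeyHash> classes_;
};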

6.1.2 Network Overhead
Figure 11 presents the bandwidth utilization when we randomly selected 500 pairs of nodes and each pair communicated 100 packets. As expected, the bandwidth consumption of Advanced is close to that of ExSPAN and Basic. This is because the extra information carried with each packet is merely existFlag and some auxiliary data (e.g., the hash value of the event tuple), which is negligible compared to the large payload of the packets. We repeated the experiment for Advanced, but updated a route every 10 seconds, in order to study the effects of updates to slow-changing tuples. We observe a negligible bandwidth increase of 0.6%.

6.1.3 Query Latency
To evaluate the latency of queries, we ran an emulation that accounts for both network delays and computation time. We ran the packet forwarding application on a testbed consisting of 25 machines. Each machine is equipped with eight Intel Xeon 2.67 GHz CPUs, 4 GB RAM and 500 GB disk space, running CentOS 6.8 as the operating system. On each machine, we ran up to four instances of the same packet forwarding application with provenance enabled. Instead of communicating via the ns-3 network, actual sockets were used over a physical network. In total, there were 100 nodes, connected together using the same transit-stub topology we used for simulation.

In our experiment, we executed 100 queries, issued at random nodes, where each query returned the provenance tree of a recv tuple corresponding to a random source and destination pair – the destination node is the starting point of the query. The query is executed in a distributed fashion as described in Section 5.6. Based on our physical network topology, each query takes 5.3 hops on average in the network. We repeated the experiment for Basic, Advanced, and ExSPAN for 100 queries each.

Figure 12 shows our experimental results in the form of a CDF. We observe that both Basic and Advanced have latency numbers that are significantly lower than that of ExSPAN. For example, the mean and median for ExSPAN are 75ms and 74ms respectively, as compared to only 25.5ms and 25ms for Basic. This is approximately a 3X reduction in latency. The extra overhead is due to ExSPAN's need to process larger intermediate tuples. Basic and Advanced avoid this overhead by symbolically rederiving intermediate results during query execution.

6.2 Application #2: DNS Resolution
DNS resolution [13] is an Internet service that translates a requested domain name, such as "www.hello.com", into its corresponding IP address in the Internet. In practice, DNS resolution is performed by DNS nameservers, which are organized into a tree-like structure, where each nameserver is responsible for a domain name (e.g., "hello.com" or ".com"). We used the recursive name resolution protocol in DNS, and implemented the protocol as a DELP (see Appendix F). During the execution of each DNS program, provenance support is enabled so that historical DNS requests can be queried.

We synthetically generated the hierarchical network of DNS nameservers. In total, there were 100 nameservers, and the maximum tree depth is 27. Our workload consists of clients issuing requests to 38 distinct URLs. In total, DNS requests were issued at a rate of 1000 requests/second. Our topology resembles real-world DNS deployments. Prior work [9] has shown that, in reality, the requested domain names follow a Zipfian distribution; in our experiments, we adopted the same distribution.
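A Zipf-distributed request stream over a fixed URL list can be generated as sketched below; the URL list and the skew exponent (1.0) are illustrative assumptions, not the exact values used in our runs.

// Minimal sketch of a Zipf-distributed request generator over a fixed URL
// list; the URL list and skew exponent are illustrative placeholders.
#include <cmath>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
  const std::vector<std::string> urls = {
      "www.hello.com", "www.example.org", "mail.example.org"};  // ... 38 in total
  const double skew = 1.0;  // Zipf exponent (assumed for illustration)

  // Weight of the URL at rank r (1-based) is proportional to 1 / r^skew.
  std::vector<double> weights;
  for (std::size_t r = 1; r <= urls.size(); ++r) {
    weights.push_back(1.0 / std::pow(static_cast<double>(r), skew));
  }

  std::mt19937 rng(42);
  std::discrete_distribution<std::size_t> zipf(weights.begin(), weights.end());

  // Emit a batch of requests; in the experiments these are issued at a
  // fixed rate (1000 requests/second) rather than all at once.
  for (int i = 0; i < 10; ++i) {
    std::cout << "request " << i << ": " << urls[zipf(rng)] << "\n";
  }
  return 0;
}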

6.2.1 Storage of Provenance Trees
Figure 13 shows the provenance storage growth rate for all nameservers in the Domain Name System over a 100-second duration. We measure the storage growth of each nameserver by first measuring the growth rate of each 10-second interval, and then averaging the growth rates over all 10 intervals. We observe that ExSPAN has the largest storage growth rate for each node among the three experiments, while Advanced has the lowest. Note that the reduction of storage growth rate in Figure 13 is not as significant as that in the packet forwarding experiments (Figure 8). For example, 80% of nameservers in ExSPAN have a storage growth rate of less than 476 Kbps, while the corresponding rate is 121 Kbps for Advanced. Advanced is four times better than ExSPAN, compared to 11 times in packet forwarding. The reason is that, compared to packet forwarding, we lower the total rate of incoming events – i.e., packet tuples in packet forwarding and request tuples in DNS resolution – which causes the storage growth rate at each node under ExSPAN or Basic to drop as well.

Figure 16 shows the provenance storage growth for all

nameservers. We record the current storage growth rate at 10-second intervals. In Figure 16, the storage of ExSPAN and Basic grows much faster than that of Advanced. Specifically, the growth rates of ExSPAN, Basic and Advanced are 13.15 MB/second, 11.57 MB/second and 3.81 MB/second respectively, and the storage space at 100 seconds reaches 1.32 GB, 1.16 GB, and 0.38 GB respectively. At the given rates, ExSPAN would fill up a 1 TB disk within 21 hours, Basic within 24 hours, and Advanced within about 3 days.

Figure 14 shows the storage growth when we increased the

number of requested URLs. In this experiment, we fixed the total number of requests at 200, so that when more URLs were added, there would be fewer duplicate requests. In Figure 14, the storage overhead for ExSPAN and Basic remains stable at around 2.5 MB and 2.26 MB respectively. This is because the storage overhead is mostly determined by the number of provenance trees, which is equal to the number of incoming requests (i.e., 200 in this case). For Advanced, the storage grows at a rate of 11.6 KB per URL. This is expected, as we need to maintain one provenance tree for each equivalence class, and the number of equivalence classes grows in proportion to the number of URLs. Similar to our packet forwarding results, despite the storage growth, Advanced still requires significantly less storage than ExSPAN and Basic. Unless every URL is requested only once – the worst possible case for Advanced, and highly unlikely in reality – Advanced always performs better than ExSPAN and Basic.

6.2.2 Network Overhead
Figure 15 shows the bandwidth usage over elapsed time when we continuously sent 100,000 requests to the root nameserver. All three experiments finish within 102 seconds. Throughout the execution, ExSPAN and Basic have similar bandwidth usage, which is stable at around 4.5 MBps. On the other hand, Advanced's bandwidth usage is about 6 MBps, which is about 25% higher than the other two techniques. This is because, unlike in the packet forwarding experiments where each packet carries a payload of 500 characters, each DNS request does not have any extra payload. Therefore, the meta-data tagged with each request (e.g., existFlag) accounts for a large part of the size of each request, resulting in higher additional bandwidth overhead.

7. RELATED WORK
Network provenance has been proposed and developed by ExSPAN [28] and DTaP [27]. These two proposals store uncompressed provenance information, laying the foundation for our work.


Figure 14: Provenance storage growth with increasing URLs, with 200 requests sent in total. (Line plot: x-axis is the number of URLs; y-axis is storage in MB; series: ExSPAN, Basic, Advanced.)

Figure 15: Bandwidth consumption for DNS resolution with 100,000 DNS requests. (Line plot: x-axis is time in seconds; y-axis is bandwidth in Mbps; series: ExSPAN, Basic, Advanced.)

Figure 16: Provenance storage growth with continuous input requests at 1000 requests/second. (Line plot: x-axis is time in seconds; y-axis is storage in GB; series: ExSPAN, Basic, Advanced.)

In the database literature, a number of works have considered optimization of provenance storage. However, our design differs significantly due to the distributed nature of our target environment. We briefly list a few representative bodies of work, and explain our differences.

Woodruff et al. [21] reduce storage usage for maintaining

fine-grained lineage (i.e., provenance) by computing provenance information dynamically at query time through invertible functions. Their approach trades off provenance accuracy for storage. Our approach, on the other hand, requires no such tradeoff, achieving the same level of accuracy as queries on uncompressed provenance trees.

Chapman et al. [3] develop a set of factorization algo-

rithms to compress workflow provenance. Their proposal does not consider a distributed setting. For example, node-level factorization (combining identical nodes) requires additional state to be maintained and propagated from node to node during provenance maintenance to resolve potential ambiguities. Maintaining and propagating this state can lead to significant communication overhead in a distributed environment. In contrast, our solution uses equivalence keys to avoid comparing provenance trees on a node-by-node basis, and hence minimizes communication overhead during provenance maintenance.

Our compression technique implicitly factorizes provenance

trees at runtime before removing redundant factors among trees in the same equivalence class. Olteanu et al. [16, 17] propose factorization of provenance polynomials for conjunctive queries with a new data structure called the factorization tree. Polynomial factorization in [17] can be viewed as a more general form of the factorization used in the equivalence-based compression proposed in this paper. If we encode the provenance trees of each packet as polynomials, the general factorization algorithm in [17], with a specialized factorization tree, would produce the same factorization result in our setting. Our approach is slightly more efficient, as we can skip the factorization step by directly using the equivalence keys at runtime to group provenance trees for compression. Exploring the more general form of factorization in [17] for provenance of distributed queries is an interesting avenue of future work.

ProQL [10] proposes to reduce the storage of a single prove-

nance tree by (1) using primary keys to represent tuples in the provenance, and (2) maintaining one copy of attributes with the same values in a mapping (rule). These techniques could also be applied alongside our online compression algorithm to further reduce storage. ProQL does not consider storage sharing across provenance trees. Amsterdamer et

al. [1] formally define the concept of core provenance, which represents the derivation shared by multiple equivalent queries. In our scenario, the shared provenance tree of each equivalence class can be viewed as core provenance.

Xie et al. [24] propose to compress provenance graphs with

a hybrid approach combining Web graph compression and dictionary encoding. Bao et al. [2] propose a rule-based provenance compression scheme. At a high level, both approaches compress provenance trees to reduce redundant storage. However, they require knowledge of the entire trees prior to compression, which is impractical, if not impossible, for distributed provenance.

Provenance has been applied to network repair [23, 22,

4], where root-cause analysis is used to identify and fix configuration errors in networks. Network repair is orthogonal to our work, but can benefit from our compression techniques to reduce the storage cost of provenance maintenance.

8. CONCLUSION & FUTURE WORK
In this paper, we propose an online, equivalence-based compression scheme for the maintenance of distributed network provenance. Equivalent provenance trees are identified at compile time through static analysis of the declarative program, and our runtime maintains only one concrete representative provenance tree for each equivalence class. Our evaluation results show that the compression scheme saves storage significantly, incurs little network overhead, and allows for efficient provenance querying.

This paper focuses on compressing trees generated within one program executed at all nodes. In most network deployments, there may be multiple programs (or network protocols) running concurrently. As future work, we plan to explore the possibility of compressing provenance trees across programs that share execution rules.

9. ACKNOWLEDGMENTS
We thank the anonymous reviewers for their feedback, which helped improve the paper. We also thank Ling Ding and Yang Li for their comments on the work and for proofreading the paper. This work is jointly funded by NSF CNS-1513679, CNS-1218066, CNS-1065130, CNS-1513961, CNS-1453392 and CNS-1513734.


10. REFERENCES
[1] Y. Amsterdamer, D. Deutch, T. Milo, and V. Tannen. On provenance minimization. ACM Trans. Database Syst., 37(4):30, 2012.
[2] Z. Bao, H. Köhler, L. Wang, X. Zhou, and S. W. Sadiq. Efficient provenance storage for relational queries. In CIKM, pages 1352–1361, 2012.
[3] A. Chapman, H. V. Jagadish, and P. Ramanan. Efficient provenance storage. In Proceedings of ACM SIGMOD, pages 993–1006, 2008.
[4] A. Chen, Y. Wu, A. Haeberlen, W. Zhou, and B. T. Loo. The Good, the Bad, and the Differences: Better Network Diagnostics with Differential Provenance. In Proceedings of ACM SIGCOMM, Aug. 2016.
[5] C. Chen, L. Jia, H. Xu, C. Luo, W. Zhou, and B. T. Loo. A program logic for verifying secure routing protocols. In Proceedings of FORTE, pages 117–132, 2014.
[6] C. Chen, H. Lehri, L. K. Loh, A. Alur, L. Jia, B. T. Loo, and W. Zhou. Provably correct distributed provenance compression (CMU-CyLab-17-001). Technical report, CyLab, Carnegie Mellon University, Jan. 2017.
[7] R. Droms. Dynamic host configuration protocol, 1997. RFC 2131.
[8] T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In Proceedings of PODS, pages 31–40, 2007.
[9] J. Jung, E. Sit, H. Balakrishnan, and R. Morris. DNS performance and the effectiveness of caching. IEEE/ACM Trans. Netw., 10(5):589–603, 2002.
[10] G. Karvounarakis, Z. G. Ives, and V. Tannen. Querying data provenance. In Proceedings of ACM SIGMOD, pages 951–962, 2010.
[11] B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking: language, execution and optimization. In Proceedings of ACM SIGMOD, 2006.
[12] B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking. In Communications of the ACM, 2009.
[13] P. V. Mockapetris. Domain names - implementation and specification, Nov. 1987. RFC 1035.
[14] S. C. Muthukumar, X. Li, C. Liu, J. B. Kopena, M. Oprea, and B. T. Loo. Declarative toolkit for rapid network protocol simulation and experimentation. In SIGCOMM (demo), 2009.
[15] ns-3 project. Network Simulator 3. http://www.nsnam.org/.
[16] D. Olteanu and J. Závodný. On factorisation of provenance polynomials. In Proceedings of TaPP, 2011.
[17] D. Olteanu and J. Závodný. Factorised representations of query results: size bounds and readability. In Proceedings of ICDT, pages 285–298, 2012.
[18] D. C. Plummer. An ethernet address resolution protocol, 1982. RFC 826.
[19] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for network update. In Proceedings of ACM SIGCOMM, pages 323–334, 2012.
[20] Robert Ramey. Boost Serialization. http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/index.html.
[21] A. Woodruff and M. Stonebraker. Supporting fine-grained data lineage in a database visualization environment. In Proceedings of ICDE, pages 91–102, 1997.
[22] Y. Wu, A. Chen, A. Haeberlen, W. Zhou, and B. T. Loo. Automated network repair with meta provenance. In Proceedings of HotNets, pages 26:1–26:7, 2015.
[23] Y. Wu, M. Zhao, A. Haeberlen, W. Zhou, and B. T. Loo. Diagnosing missing events in distributed systems with negative provenance. In Proceedings of ACM SIGCOMM, pages 383–394, 2014.
[24] Y. Xie, K. Muniswamy-Reddy, D. Feng, Y. Li, and D. D. E. Long. Evaluation of a hybrid approach for efficient provenance storage. TOS, 9(4):14, 2013.
[25] E. W. Zegura, K. L. Calvert, and S. Bhattacharjee. How to model an internetwork. In Proceedings of IEEE INFOCOM, pages 594–602, 1996.
[26] W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo, and M. Sherr. Secure network provenance. In Proceedings of SOSP, pages 295–310, 2011.
[27] W. Zhou, S. Mapara, Y. Ren, Y. Li, A. Haeberlen, Z. G. Ives, B. T. Loo, and M. Sherr. Distributed time-aware provenance. PVLDB, 6(2):49–60, 2012.
[28] W. Zhou, M. Sherr, T. Tao, X. Li, B. T. Loo, and Y. Mao. Efficient querying and maintenance of network provenance at internet-scale. In Proceedings of ACM SIGMOD, pages 615–626, 2010.

APPENDIX
In this appendix, we provide proof sketches of the key lemmas in the paper. We refer interested readers to our companion technical report [6] for more details.

A. PROVENANCE TREE EQUIVALENCE
We formally define tree equivalence below using the following notations: an instance of the input event relation e is denoted e(@ι,~c), or ev in shorthand; an instance of a slow-changing relation b is denoted as b(@ι,~c) or B (thus we write B1:: · · · ::Bn to denote the slow-changing tuples used to execute rule rID); and instances of fast-changing relations are denoted by P, p(@ι,~c), Q, or q(@ι,~c). A provenance tree tr is inductively defined as follows:

Provenance tree  tr ::= 〈rID, P, ev, B1:: · · · ::Bn〉 | 〈rID, P, tr, B1:: · · · ::Bn〉

The relation tr ∼K tr′ is defined inductively by the following two rules:

(1) If ev ∼K ev′, then 〈rID, P, ev, B1:: · · · ::Bn〉 ∼K 〈rID, P′, ev′, B1:: · · · ::Bn〉.

(2) If tr ∼K tr′, then 〈rID, P, tr, B1:: · · · ::Bn〉 ∼K 〈rID, P′, tr′, B1:: · · · ::Bn〉.

B. CORRECTNESS OF STATIC ANALYSIS
Formal definition of equivalence keys. In this section, we explain the predicates used to define equivalence keys.

Judgment DQ ⊢ joinFAttr(p:i, q:j):

JoinF-Base: If rID p(~xp) :- q(~xq), b1(~xb1), · · · , bk(~xbk), · · · , bn(~xbn), · · · ∈ DQ and q:i = p:j, then DQ ⊢ joinFAttr(p:i, q:j).

Judgment DQ ⊢ joinSAttr(p:i):

Join-Base: If rID p(~xp) :- q(~xq), b1(~xb1), · · · , bk(~xbk), · · · , bn(~xbn), · · · ∈ DQ and q:i = bk:j, then DQ ⊢ joinSAttr(q:i).

Join-Func-Attr: If rID p(~xp) :- q(~xq), · · · , Fi : y := F(~z), · · · ∈ DQ and q:j = ~z:k, then DQ ⊢ joinSAttr(q:j).

Join-Arith-Left: If rID p(~xp) :- q(~xq), · · · , aL(~xaL) bop aR(~xaR), · · · ∈ DQ and aL:j = q:i, then DQ ⊢ joinSAttr(q:i).

Join-Arith-Right: If rID p(~xp) :- q(~xq), · · · , aL(~xaL) bop aR(~xaR), · · · ∈ DQ and aR:j = q:i, then DQ ⊢ joinSAttr(q:i).

We define rules to derive how an attribute of a tuple is connected to an event trigger attribute.

Judgment DQ ⊢ connected(e:i, p:j):

Connected-Base: If DQ ⊢ joinFAttr(e:i, p:j), then DQ ⊢ connected(e:i, p:j).

Connected-Join: If DQ ⊢ connected(e:i, q:j) and DQ ⊢ joinFAttr(q:j, p:k), then DQ ⊢ connected(e:i, p:k).

Membership of e:i in the equivalence key K, written DQ ⊢ e:i ∈ K, is defined as follows.

Equi-Direct: If DQ ⊢ joinSAttr(e:i), then DQ ⊢ e:i ∈ K.

Equi-Reachable: If DQ ⊢ joinSAttr(q:j) and DQ ⊢ connected(e:i, q:j), then DQ ⊢ e:i ∈ K.
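To give a sense of how these judgments can be evaluated, the following minimal C++ sketch computes connectivity and the equivalence key by a fixpoint over pre-extracted joinFAttr edges and joinSAttr facts; the input encoding is a hypothetical simplification of the compiler's actual representation of a DELP.

// Minimal sketch of deriving equivalence keys from pre-extracted
// joinFAttr/joinSAttr facts; the input encoding is a simplification.
#include <set>
#include <string>
#include <utility>
#include <vector>

using Attr = std::pair<std::string, int>;  // (relation, attribute position)

std::set<int> EquivalenceKeys(
    const std::string& eventRel,
    int eventArity,
    const std::vector<std::pair<Attr, Attr>>& joinFAttr,  // attribute dependency edges
    const std::set<Attr>& joinSAttr) {                    // attributes joined with slow tables
  // connected(e:i, p:j): start from the event attributes and close under
  // joinFAttr edges, mirroring Connected-Base and Connected-Join.
  std::set<std::pair<int, Attr>> connected;
  for (int i = 0; i < eventArity; ++i) {
    connected.insert({i, {eventRel, i}});  // an event attribute reaches itself
  }
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& [from, to] : joinFAttr) {
      for (const auto& [i, attr] : std::set<std::pair<int, Attr>>(connected)) {
        if (attr == from && connected.insert({i, to}).second) {
          changed = true;
        }
      }
    }
  }
  // Equi-Direct and Equi-Reachable: e:i is a key attribute if it is, or is
  // connected to, an attribute that joins with a slow-changing table.
  std::set<int> keys;
  for (const auto& [i, attr] : connected) {
    if (joinSAttr.count(attr)) keys.insert(i);
  }
  return keys;
}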

Proof of correctness of equivalence keys (Lemma 2). The proof uses induction over the structure of tr1. In the base case, tr1 only uses one rule. We construct tr2 using the same slow-changing tuples as tr1, because ev1 and ev2 agree on every attribute that joins with a slow-changing tuple. By the definition of DELP, any attribute on the roots of tr1 and tr2 whose values differ in those trees must come from (be connected to) the input event tuple, not from any slow-changing tuples; otherwise, the values cannot differ. By the definition of equivalence keys, this differing attribute is not in the equivalence keys.

For the inductive case, trp : p(t1, ..., tn) contains a subtree trq : q(s1, ..., sm), in which q(s1, ..., sm) is the event of the rule r that derived p(t1, ..., tn). By the induction hypothesis, we know that we can construct tr′q : q(s′1, ..., s′m), which is equivalent to trq : q(s1, ..., sm), and for any j s.t. sj ≠ s′j, ∃l such that q:j is connected to e:l and e:l ∉ K. Next, we show that we can construct a tr′p : p(t′1, ..., t′n) that is equivalent to trp : p(t1, ..., tn). We can construct tr′p : p(t′1, ..., t′n) by using the same rule r and the same slow-changing tuples from the last layer of trp : p(t1, ..., tn). All the attributes that have different values in q(s1, ..., sm) and q(s′1, ..., s′m) do not join with the attributes in slow-changing tables; otherwise, this contradicts the condition that e:l ∉ K. The condition that ∀i with ti ≠ t′i, ∃l such that p:i is connected to e:l and e:l ∉ K follows from the fact that such a p:i must satisfy joinFAttr(p:i, q:j) for some q:j, where ∃l such that q:j is connected to e:l and e:l ∉ K.

C. EXAMPLE DEPENDENCY GRAPH
Figure 17 shows an example attribute-level dependency graph for the packet forwarding program in Figure 1. Based on Section 5.2, the equivalence keys are (packet:0, packet:2).

Figure 17: The attribute-level dependency graph for the packet forwarding program in Figure 1. (Graph nodes: packet:0–packet:3, route:0–route:2, recv:0–recv:3; edges omitted.)

D. CORRECTNESS OF COMPRESSION
The key to showing that our online compression algorithm stores the required provenance trees is to define a bisimulation relation RC between the network state of the online compression execution (Ccm) and the network state of semi-naïve evaluation (Csn) introduced in prior work [11].

A bisimulation relation between Csn and Ccm. We define RC, a bisimulation relation that relates the derivation trees tr in Csn (where Csn consists of Qsn and Ssn1 · · · SsnN) to the provenance trees P in Ccm (where Ccm consists of Qcm and Scm1 · · · ScmN). We present the formal definition below: Csn RC Ccm holds if the following five conditions are satisfied.

E1: Scm.Γ ⊢ Qsn ∼U Qcm
E2: ∀i ∈ [1, N], Scm.Γ ⊢ Ssni.Usn RU Scmi.Ucm
E3: UcmF ⊆ Qcm ∪ (⋃i∈[1,N] Scmi.Ucm)
E4: Scm.Γ, Scm.DQ, UcmF ⊢ (⋃i∈[1,N] Ssni.M) Rre (⋃i∈[1,N] Scmi.(Table ruleExec))
E5: Scm.Γ, Scm.DQ, UcmF, ⋃i∈[1,N] Scmi.Υ ⊢ (⋃i∈[1,N] Ssni.Mprov) Rprov (⋃i∈[1,N] Scmi.(Table prov))

The network state Csn consists of the local states Ssn1 · · · SsnN of each node in the distributed system that the program DQ is evaluated on, and a queue of unprocessed updates Qsn representing the rule events in DQ that are to be sent to various nodes in the distributed system to trigger rules in DQ. Similarly, a network state Ccm consists of a set of local states Scm1 · · · ScmN for each node in the distributed system, and a queue of unprocessed updates Qcm representing the same rule events as in Qsn.

The local states Ssn and Scm of both evaluations contain

the same copy of DQ and also a declaration Γ that maps relations in DQ to their types as specified by the network administrator (i.e., whether they are input events or slow-changing tuples).

A semi-naïve local state Ssn is written as 〈@ι, DQ, Γ, DB, E, Usn, equiSet, M, Mprov〉. Node ι is the identifier of the node in the distributed system whose local state is represented by Ssn. DQ and Γ have been introduced above. DB is a local database of slow-changing tables and instances of relations of interest derived on node ι that are used as non-event inputs to a rule during the execution. E is a list of unprocessed input event tuples that will eventually be used to trigger the execution of DQ on node ι. Usn contains the unprocessed updates for the execution that have already been triggered by an input event tuple. equiSet records the equivalence keys that have been seen on node ι. The storage structures are M, which stores the proofs of the tuples P that are derived locally, and Mprov, which records the proofs of the relations of interest that are stored locally.

The local state Scm of the online compression scheme

is similar to Ssn. It is defined as 〈@ι, DQ, Γ, DB, E, Usn, equiSet, Table ruleExec, Table prov〉. Because the online compression scheme stores compressed provenance trees, while semi-naïve evaluation does not perform compression, the structures used for storage in Scm necessarily differ from those of Ssn. In Scm, the ruleExec table (introduced in Section 5.3) corresponds to M in Ssn, while the prov table corresponds to Mprov in Ssn.

Our bisimulation relation uses the relation Rre to relate

derivation trees tr stored in the M of local states of Csn to provenance trees P stored in the ruleExec tables of the local states of Ccm (E4). We define the following relation tr ∼d P between a derivation tree and a compressed provenance in order to formalize Rre:

(1) If ∀i ∈ [1, n], vIDi = hash(Bi), vids = vID1:: · · · ::vIDn, rid = hash(r :: vids), and ruleExec = 〈@ι, rid, r, vids, NULL, NULL〉, then 〈r, P, e(@ι,~t), B1:: · · · ::Bn〉 ∼d ruleExec.

(2) If ruleExecq = 〈@ιq, ridq, rq, vidsq, @ιρ, ridρ〉, trq ∼d P :: ruleExecq, ∀i ∈ [1, n], vIDi = hash(Bi), vidsp = vID1:: · · · ::vIDn, ridp = hash(rp :: vidsp), and ruleExecp = 〈@ιp, ridp, rp, vidsp, @ιq, ridq〉, then 〈rp, P, trq:q(@ιq,~tq), B1:: · · · ::Bn〉 ∼d P :: ruleExecq :: ruleExecp.

Rre is one of several intermediary relations used to relate structures in each local state Ssn to structures in the local state Scm in order to define the bisimulation relation RC. These intermediary relations are used in E1 to E4 to specify the required correspondences. Due to space limitations, we elide most of their definitions.

• Semi-naïve evaluation and online compression execution have the same set of updates (E1).
• Every local state contains the same set of unprocessed updates during both executions (E2).
• All derivations derived by semi-naïve evaluation can either be found in the set of existing rule provenances of the online compression execution, or will eventually be generated by the unprocessed updates (E3 and E4).
• For each tuple whose provenance information we record during the online compression execution, we also record its proof during semi-naïve evaluation, and vice versa (E5).

The last three judgments are fairly involved because of possible out-of-order execution, so we omit the details.

Proof of Lemma 4. We present the proof details of Lemma 4, from which Theorem 3 immediately follows. The proof uses induction on the number of steps and relies on a lemma relating two states when one takes a single step. The lemma shows that, starting from related states (Csn RC Ccm), if semi-naïve evaluation takes a step (Csn →SN Csn′), then online compression can also take a step (i.e., Ccm ↗CM Ccm′) and the resulting states are again bisimilar (Csn′ RC Ccm′), and vice versa.

generated and provenance storage data structures are up-dated: one is when all the rules that an update can fire areprocessed and the update is stored into the database if it isthe result tuple, the other is when a rule is fired and a newupdate is generated.In the first case, in semi-naïve evaluation, a derivation tr

of tuple P is stored. In the online compression execution,the prov table will be updated. In this case, the derivationof P has already been stored in the respective provenancedata structures when the rule that generates the update isprocessed. Therefore, we only need to check that the hashvalues in the prov table match corresponding fields in thederivation stored inMprov. This can be shown because theupdates relate to each other in the initial execution states.In the second case, an update P ′ triggers a rule r and

generates a new update P . Semi-naïve evaluation storesthe derivation tr for P . The online compression executionupdates the ruleExec table. We need to show that (1) thenew updates in both executions still relate to each other,(2) the new derivation tr and the new entry in the ruleExectable relate to each other, and (3) other relations are notaffected by the new entries. All of the above can be shownby examining our relations.

E. CORRECTNESS OF QUERY
To show that our query algorithm (Query in Figure 18) is able to recover the correct derivation tree of a given tuple from our compressed provenance storage, we state and prove Correctness of the Query Algorithm (Theorem 5). This theorem states that, given an initial network state Cinit that transitions to Ccm in n steps using the rules for online compression, there exists a network state Csn for semi-naïve evaluation s.t. for any derivation tree tr that is a proof of output tuple P and derived using an input event tuple ev with ID evid, Query takes as inputs Ccm, P and evid and returns a set M consisting of all derivation trees (including tr) that are proofs of P and that were derived using ev.

Theorem 5 tells us that Query always returns all the derivations in Csn for P and evid. The proof relies on Lemma 3 to determine that there exists a network state Csn for semi-naïve evaluation s.t. Ccm and Csn are bisimilar (Csn RC Ccm). Given such a Csn, we pick an arbitrary derivation tree tr for tuple P in Csn that was generated by event tuple ev with event ID evid.


1:  function Query(Ccm, P, evid)
2:    htp ← hash(P)
3:    if 〈_, htp, _, _, evid〉 ∈ Ccm then
4:      [prov1 · · · provn] ← Get_Prov(Ccm, htp, evid)
5:      M ← {}
6:      for i ∈ [1, n] do
7:        〈loc, htp, rloci, ridi, evid〉 ← provi
8:        Pi ← Qr(Ccm, (rloci, ridi))
9:        tri ← Transform_To_D(Pi, evid)
10:       M ← M ∪ tri
11:     return M
12:   else
13:     return ∅
14: end function
15:
16: function Qr(Ccm, (loc, rid))
17:   if loc == NULL and rid == NULL then
18:     return []
19:   else
20:     ruleExec ← Get_RuleExec(Ccm, (loc, rid))
21:     〈loc, rid, r, vids, nloc, nrid〉 ← ruleExec
22:     return Qr(Ccm, (nloc, nrid)) :: ruleExec
23: end function

Figure 18: Pseudocode for querying a provenance tree.

Because Csn RC Ccm, there exists a tuple prov for P in a specific prov table in Ccm storing an association to a specific provenance P, and furthermore tr ∼d P. By the above reasoning and the semantics of Query, the "If" branch of the If-Else statement on lines 3-13 of Query is taken. On line 4, Get_Prov takes as input Ccm, htp (the hash of P), and evid, then returns every provi in the prov tables of Ccm containing an association (rloci, ridi) to a provenance tree Pi for P that was derived using ev. By the relation in E5, each Pi is recorded in Ccm. We use Qr to retrieve Pi. If we can show that Qr correctly retrieves Pi, it is straightforward to show that Transform_To_D recovers tr when given Pi and evid as inputs. Hence, the conclusion holds.

return Pi when given Ccm and the association (rloci, ridi) toprovenance Pi. We prove Lemma 6 below. The proof usesa uniqueness property on elements in the ruleExec table–thefirst two arguments of ruleExec (i.e. loc and rid) are primarykeys that uniquely determine it. Thus, given any ruleExecand ruleExec′, that agree on the first two arguments loc andrid, then ruleExec = ruleExec′.

Lemma 6 (Correctness of Qr). Given that Csn RC Ccm and tr : P ∈ Csn and ruleExec = 〈loc, rid, r, vids, nloc, nrid〉 and P :: ruleExec is stored in the ruleExec tables of the local states of Ccm and tr ∼d P :: ruleExec, then Qr(Ccm, (loc, rid)) = P :: ruleExec.

We prove Lemma 6 by induction over ℓ, the length of P :: ruleExec.

Base Case A: ℓ = 0. By the assumption we have P :: ruleExec = []. By the definition of ∼d, which relates derivation trees to compressed provenance trees, there is no tr ∈ Csn s.t. tr ∼d []. Thus the antecedent of the lemma is false.

Base Case B: ℓ = 1. By the assumption we have P = [] and thus P :: ruleExec = ruleExec. Because tr : P ∼d ruleExec, tr has only one rule. Thus (nloc, nrid) are null by the correspondence relation, as only rule r was used to derive P. Because (loc, rid) are not null, the "Else" branch of the If-Else statement on Lines 17-22 of Qr is taken. Therefore, on Line 20 of Qr, the algorithm finds ruleExec (where ruleExec = 〈loc, rid, r, vids, nloc, nrid〉) by the uniqueness property. The query Qr(Ccm, (nloc, nrid)) initiates a query for an empty rule provenance list, which by Base Case A returns an empty list. By Line 22 of Qr, we have Qr(Ccm, (loc, rid)) = [] :: ruleExec as desired.

Inductive Case: ℓ = k + 1 ≥ 2. By assumption, nloc and nrid are not null, thus the "Else" branch of the If-Else statement on Lines 17-22 of Qr is taken. Therefore, on Line 20 of Qr, the algorithm finds ruleExec (where ruleExec = 〈loc, rid, r, vids, nloc, nrid〉) by the uniqueness property. By assumption, there exist P′ and ruleExec′ s.t. P = P′ :: ruleExec′ and ruleExec′ is not null. By the above and the correspondence tr ∼d P :: ruleExec, there exists tr′ in Csn that is a subderivation of tr s.t. tr′ ∼d P and ruleExec′ = 〈nloc, nrid, r′, vids′, nloc′, nrid′〉. Using the induction hypothesis, we obtain Qr(Ccm, (nloc, nrid)) = P′ :: ruleExec′. By Line 22 of Qr, we have Qr(Ccm, (loc, rid)) = P′ :: ruleExec′ :: ruleExec = P :: ruleExec as desired.
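For concreteness, the recursive lookup that Qr performs can be sketched in C++ as follows; the RuleExecEntry layout and the table keyed by (loc, rid) are hypothetical simplifications of the actual ruleExec tables.

// Minimal sketch of the recursive Qr lookup over a ruleExec table keyed by
// (loc, rid); the entry layout is a simplification of the actual tables.
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct RuleExecEntry {
  std::string loc;    // node storing this entry
  uint64_t rid;       // hash of rule name and slow-changing tuple IDs
  std::string rule;   // rule identifier
  std::vector<uint64_t> vids;  // hashes of the slow-changing tuples
  std::optional<std::pair<std::string, uint64_t>> next;  // (nloc, nrid), empty if none
};

using RuleExecTable = std::map<std::pair<std::string, uint64_t>, RuleExecEntry>;

// Follows the (nloc, nrid) pointers until they are empty, returning the
// provenance list from the innermost entry to the queried one.
std::vector<RuleExecEntry> Qr(const RuleExecTable& table,
                              std::optional<std::pair<std::string, uint64_t>> key) {
  if (!key) {
    return {};  // corresponds to "return []" when loc and rid are NULL
  }
  auto it = table.find(*key);
  if (it == table.end()) {
    return {};  // entry not found; in the formal model this cannot happen
  }
  const RuleExecEntry& entry = it->second;
  std::vector<RuleExecEntry> prefix = Qr(table, entry.next);
  prefix.push_back(entry);  // P :: ruleExec
  return prefix;
}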

F. DELP FOR DNS RESOLUTION

r1 request(@RT, URL, HST, RQID) :-
     url(@HST, URL, RQID),
     rootServer(@HST, RT).

r2 request(@SV, URL, HST, RQID) :-
     request(@X, URL, HST, RQID),
     nameServer(@X, DM, SV),
     f_isSubDomain(DM, URL) == true.

r3 dnsResult(@X, URL, IPADDR, HST, RQID) :-
     request(@X, URL, HST, RQID),
     addressRecord(@X, URL, IPADDR).

r4 reply(@HST, URL, IPADDR, RQID) :-
     dnsResult(@X, URL, IPADDR, HST, RQID).

Figure 19: DELP for DNS resolution.

Figure 19 shows the DELP encoding of the recursive DNS resolution. The program is composed of four rules. Rule r1 forwards a DNS request of ID RQID to the root nameserver RT for resolution. The request is generated by the host HST for the URL URL. Rule r2 is triggered when a nameserver X receives a DNS request for URL, but has delegated the resolution of sub-domain DM corresponding to URL to another nameserver SV. Rule r2 then forwards the DNS request to SV for further DNS resolution. Rule r3 generates a DNS resolution result containing the IP address IPADDR corresponding to the requested URL, when URL matches an address record on the nameserver X. Finally, Rule r4 is responsible for returning the DNS result to the requesting host HST.