
Verifying Balanced Trees

Zohar Manna1, Henny B. Sipma1, Ting Zhang2

1 Stanford University, {zm,sipma}@cs.stanford.edu

2 Microsoft Research Asia, [email protected]

Abstract. Balanced search trees provide guaranteed worst-case time performance and hence they form a very important class of data structures. However, the self-balancing ability comes at a price; balanced trees are more complex than their unbalanced counterparts, both in terms of the data structures themselves and the related manipulation operations. In this paper we present a framework to model balanced trees in decidable first-order theories of term algebras with Presburger arithmetic. In this framework, a theory of term algebras (i.e., a theory of finite trees) is extended with Presburger arithmetic and with certain connecting functions that map terms (trees) to integers. Our framework is flexible in the sense that we can obtain a variety of decidable theories by tuning the connecting functions. By adding maximal path and minimal path functions, we obtain a theory of red-black trees in which the transition relation of tree self-balancing (rotation) operations is expressible. We then show how to reduce the verification problem of the red-black tree algorithm to constraint satisfiability problems in the extended theory.

1 Introduction

Balanced search trees provide guaranteed worst-case time performance and hence they form a very important class of data structures. They are also the basis of efficient implementations of many advanced data structures such as associative arrays and associative sets. However, the self-balancing ability comes at a cost; balanced trees are more complex than their unbalanced counterparts, both in terms of the data structures themselves and the related manipulation operations. Moreover, as balanced trees are not regular trees, their properties cannot be directly characterized by standard tree automata techniques [4].

In this paper we present a framework to model balanced trees in decidable first-order theories of term algebras with Presburger arithmetic [23]. In this framework, a theory of term algebras (i.e., a theory of finite trees) is extended with Presburger arithmetic and certain connecting functions that map terms (trees) to integers. Given connecting functions and a fixed signature of a term algebra, the corresponding extended theory is two-sorted with integer sort and term sort. The language is the set-theoretic union of the language of term algebras and the language of Presburger arithmetic augmented with connecting functions from T to N. Formulae are formed from term literals and integer literals using logical connectives and quantifications.

1 The first and the second author were supported in part by NSF grants CCR-01-21403, CCR-02-20134, CCR-02-09237, CNS-0411363, and CCF-0430102, by ARO grant DAAD19-01-1-0723, and by NAVY/ONR contract N00014-03-1-0939.

Our framework is flexible in the sense that we can obtain a variety of decidable theories by varying the connecting functions. By adding maximal path and minimal path functions, we obtain a theory of red-black trees in which the transition relation of tree self-balancing (color exchange and rotation) operations is expressible. We then show how to reduce the verification problem of the red-black tree algorithm to a constraint satisfiability problem in the extended theory.

Related Work and Comparison. There has been a considerable amount of work in shape analysis, a kind of pointer analysis aimed at statically inferring properties of heap-allocated data structures. Shape analysis tools can partially detect pointer linkage properties such as sharing, aliasing, cyclicity and reachability. The property of being a balanced tree, however, is a much higher-level property that cannot be inferred from points-to relations on heaps. Rugina presents a method, called quantitative shape analysis, to verify rebalancing operations on AVL trees [19]. Based on abstract interpretation, the method performs forward propagation in an abstract heap where each location (node) is associated with quantitative attributes and relations to characterize the balancing property.

Tree automata techniques are widely used in solving constraints on tree languages [4]. However, balanced trees are not regular trees (by the pumping lemma), and hence their corresponding tree languages cannot be directly characterized by standard tree automata [4]. Habermehl et al. present extended tree automata with size constraints on transition relations (TASC) [10]. TASCs are able to represent pre- and post-conditions of a program involving tree rotation operations. Hence, given the right invariants, the verification of a program reduces to checking the validity of Hoare triples that state that after execution of the program from a starting state satisfying the pre-condition, the resulting reachable states are included in the states represented by the post-condition. However, TASCs that encode transition relations of tree operations are fairly complicated. The lack of intuitive connections between low-level program statements and the corresponding automata representations makes this formalism unattractive for use in practical verification tools.

Baldan et al. treat red-black trees as hypergraphs and tree update operations as rewritings on hypergraphs [2]. They use approximate unfolding to compute the reachable states of a graph rewriting system to prove the property that no red node has a red parent. The balancing property itself, however, is not expressible in graph rewriting grammar and an additional type system is introduced to prove it. Calcagno et al. present a context logic to model trees with local updates, which are destructive operations at pointed locations [3]. They present a deductive proof system based on Hoare triples and prove soundness and completeness of the proof system. The balancing property, however, is not expressible in this formal system and the verification of Hoare triples is not fully automatic.

Like [10] we reduce the verification problem to checking the validity of Hoare triples. In our approach, however, the pre- and post-conditions and transition relations are all represented directly by first-order formulas in the extended theory of term algebras. Unlike [10, 2, 3], we do not have an update function in the theory and hence we cannot express local updates at an arbitrary pointed location. On the other hand, local updates only affect subtrees around the focus point, and hence our theory can still express and prove verification conditions of step-by-step tree operations. An informal but easy induction then gives us global safety properties.

The contributions of this paper are the following: (1) we develop a first-order theory of red-black trees, that is, a theory of term algebras augmented with Presburger arithmetic; (2) we show how to use this theory to represent the transition relations of the tree operations directly from the program statements, and how to use them to construct Hoare triples; (3) we provide a decision procedure for automatically checking validity of the resulting verification conditions. To the best of our knowledge, this is the first decidable logic theory for red-black trees. Moreover, it can be easily generalized to model other balanced tree structures, such as AVL trees and B-trees.

Paper Organization. Section 2 presents the notation and terminology for term algebras. Section 3 introduces the theory of red-black trees and states its decidability result. Section 4 shows how to use the theory to analyze the red-black tree insertion algorithm. Section 5 concludes with a discussion of future work. Because of space limitations all decidability proofs are given in the appendix.

2 Preliminaries

We assume the first-order syntactic notions of variables, parameters and quantifiers, and semantic notions of structures, satisfiability and validity as in [9]. We use ⟦x⟧ to denote the value given by an assignment and x̄ to denote a sequence of variables.

Definition 1 (Term Algebras). A term algebra TA : 〈T; C, A, S, T〉 consists of

1. T: The term domain, which consists exclusively of terms recursively built up from constants by applying non-nullary constructors. Objects in T are called TA-terms. The type of a term t, denoted by type(t), is the outermost constructor symbol of t. We say that t is α-typed (or is an α-term) if type(t) = α.

2. C: A set of constructors: α, β, γ, . . . The arity of α is denoted by ar(α).

3. A: A set of constants: a, b, c, . . . We require A ≠ ∅ and A ⊆ C. For a ∈ A, ar(a) = 0 and type(a) = a.

4. S: A set of selectors. For a constructor α with arity k > 0, there are k selectors s^α_1, . . . , s^α_k in S. We call s^α_i (1 ≤ i ≤ k) the i-th α-selector. For a term x, s^α_i(x) returns the i-th immediate subterm of x if x is an α-term and x itself otherwise.

5. T: A set of testers. For each constructor α there is a corresponding tester Isα. For a term x, Isα(x) is true if and only if x is an α-term. For a constant a, Isa(x) is just x = a. In addition there is a special tester IsA such that IsA(x) is true if and only if x is a constant.

We use LT to denote the language of term algebras.

The term domain T consists only of ground terms built from constructors. Selectors only exist in the formal language. In fact, selectors can be defined by constructors in the existential fragment of the language; for a quantifier-free formula Φ(x) containing selectors, we can obtain an equivalent and selector-free formula ∃yΦ′(x, y) where Φ′(x, y) is quantifier-free and y is fresh.
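The behavior of constructors, testers and selectors in Definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of terms and the helper names `type_of`, `is_a` and `selector` are hypothetical and not part of the paper, and the signature used (one constant `nil`, two binary constructors) anticipates the one introduced in Section 3.

```python
# Terms are encoded as the constant "nil" or a tuple (constructor, left, right).
# The encoding and all helper names are assumptions made for this sketch.
NIL = "nil"


def red(left, right):
    return ("red", left, right)


def black(left, right):
    return ("black", left, right)


def type_of(t):
    """Outermost constructor symbol of t; a constant is its own type."""
    return t if isinstance(t, str) else t[0]


def is_a(alpha, t):
    """Tester Is_alpha(t): true iff t is an alpha-term."""
    return type_of(t) == alpha


def selector(alpha, i, t):
    """i-th alpha-selector (1-based): the i-th immediate subterm of t
    if t is an alpha-term, and t itself otherwise."""
    return t[i] if is_a(alpha, t) else t


t = red(black(NIL, NIL), NIL)
```

For instance, `selector("red", 1, t)` yields the left subterm of `t`, while `selector("black", 1, t)` returns `t` unchanged because `t` is not a black-term.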

A term t is called a constructor term if t is a variable or the outermost function symbol of t is a constructor. Constants are constructor terms. A term t is called a selector term if either t is a variable or the outermost function symbol of t is a selector. Variables are both constructor terms and selector terms. We assume that no constructors appear immediately inside selectors, as simplification can always be done. A term is called proper if it is not a constant or a variable.

The first-order theory of term algebras was shown to be decidable by Mal’cev using quantifier elimination [15]. Decision procedures for the quantifier-free theory were discovered by Nelson, Oppen et al. [16, 17, 8]. Oppen gave a linear algorithm for acyclic structures [17] and (with Nelson) a quadratic algorithm for cyclic structures [16]. If the values of the selector functions on constants are specified, then the problem is NP-complete [17].

Presburger arithmetic is the first-order theory of addition in the arithmetic of integers. The corresponding structure is denoted by PA = 〈Z; 0, +, <〉. We use LZ to denote the formal language of PA. The first-order theory of Presburger arithmetic (PA) was first shown to be decidable in 1929 by the quantifier elimination method [9]. More efficient algorithms were later discovered by Cooper [6] and further improved by Reddy and Loveland [18].

There has been great interest in generalizing Mal’cev’s result on term algebras. Maher showed the decidability of the theory of infinite and rational trees [14]. Comon and Delor presented an elimination procedure for term algebras with membership predicate in the regular tree language [5]. Backofen presented an elimination procedure for structures of feature trees with arity constraints [1]. Rybina and Voronkov showed the decidability of term algebras with queues [20]. Kuncak and Rinard showed the decidability of term powers, which are term algebras augmented with coordinate-wise defined predicates [13]. A combination of Presburger arithmetic and term algebras was used by Korovin and Voronkov to show that the quantifier-free theory of term algebras with Knuth-Bendix order is NP-complete [11, 12]. In [21, 23] we presented decision procedures both for the first-order theory and the corresponding quantifier-free fragments of term algebras with integer functions. In [22] we extended the decidability result to the first-order theory of term algebras with Knuth-Bendix order.


3 The Theory of Red-Black Trees

In this section we present a theory of a term algebra with two integer functions to express the properties of red-black trees.

Definition 2 (Red-black Trees [7]). A red-black tree is a binary tree with the following coloring properties:

1. Every node is either red or black.

2. Every leaf node is black.

3. The root is black.

4. Every red node has two black children.

5. All paths from the root to leaf nodes contain the same number of black nodes.

Properties (1)-(3) can be modeled in a theory of term algebras as follows:

Definition 3 (Structure of Colored Trees). The structure of red-black colored trees is

\[ \mathrm{RB} = \langle\, T_{rb};\ \{\mathrm{red}, \mathrm{black}, \mathrm{nil}\},\ \{\mathrm{nil}\},\ \{\mathrm{car}_{red}, \mathrm{cdr}_{red}, \mathrm{car}_{black}, \mathrm{cdr}_{black}\},\ \{\mathrm{Is}_{red}, \mathrm{Is}_{black}, \mathrm{Is}_{nil}\} \,\rangle , \]

where Trb denotes the domain, nil denotes a leaf, red and black are binary constructors, and car_♯ and cdr_♯ are, respectively, the left and the right ♯-selectors (♯ ∈ {red, black}). The corresponding language is denoted by LRB.

For notational simplicity we use car to denote either car_red or car_black, whichever is clear from the context, and similarly for cdr. If a term t appears in a selector, we assume that either Isred(t) or Isblack(t) holds. For example, car(x) = y should be understood as an abbreviation of

\[ (\mathrm{Is}_{red}(x) \wedge y = \mathrm{car}_{red}(x)) \vee (\mathrm{Is}_{black}(x) \wedge y = \mathrm{car}_{black}(x)) . \]

In the following we use terms and trees, respectively, to refer to syntactic objects and semantic objects. We call terms (trees) of red-type (resp. of black-type) red-terms (-trees) (resp. black-terms (-trees)).

We extend RB with PA to express balancing properties (4)-(5):

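Definition 4 (Structure of Red-black Trees). The structure of red-black trees is

\[ \mathrm{RB}_{\mathbb{Z}} = \langle\, \mathrm{RB};\ \mathrm{PA};\ |\cdot|_{\max},\ |\cdot|_{\min} : T_{rb} \to \mathbb{N} \,\rangle , \]

where |·|max and |·|min are two integer functions defined recursively as

\[
|x|_{\star} =
\begin{cases}
1 & x = \mathrm{nil} , \\
0 & \mathrm{Vio}(x) , \\
\star(|x_1|_{\star}, |x_2|_{\star}) + 1 & \mathrm{GB}(x, x_1, x_2) , \\
\star(|x_1|_{\star}, |x_2|_{\star}) & \mathrm{GR}(x, x_1, x_2) ,
\end{cases}
\]

where ⋆ ∈ {max, min} and GB(x, x1, x2), GR(x, x1, x2) and Vio(x) are

\begin{align*}
\mathrm{Vio}(x) &\stackrel{\mathrm{def}}{=} x \neq \mathrm{nil} \wedge \forall x_1 \forall x_2 \bigl( \neg\mathrm{GB}(x, x_1, x_2) \vee \neg\mathrm{GR}(x, x_1, x_2) \bigr) , \\
\mathrm{GB}(x, x_1, x_2) &\stackrel{\mathrm{def}}{=} x = \mathrm{black}(x_1, x_2) \wedge |x_1|_{\max} \neq 0 \wedge |x_2|_{\max} \neq 0 , \\
\mathrm{GR}(x, x_1, x_2) &\stackrel{\mathrm{def}}{=} x = \mathrm{red}(x_1, x_2) \wedge |x_1|_{\max} \neq 0 \wedge |x_2|_{\max} \neq 0 \wedge \neg\mathrm{Is}_{red}(x_1) \wedge \neg\mathrm{Is}_{red}(x_2) .
\end{align*}

We denote the corresponding language by L^Z_RB.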

Vio(x) states that x violates property (4) of red-black trees. GB(x, x1, x2) states that x is a black tree with two good subtrees x1 and x2. Similarly for GR(x, x1, x2). |x|max (resp. |x|min) gives the maximal (resp. minimal) number of black nodes that x can have on a maximal path. A maximal path of x that contains the largest (resp. smallest) number of black nodes is called a maximal black path (resp. minimal black path) of x. We call |x|max the maximal black length of x, |x|min the minimal black length, and the pair (|x|max, |x|min) the measure of x, denoted by ‖x‖.
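The recursive definition of the measure can be sketched as follows. This is a minimal illustration, not the decision procedure: trees are encoded as nested tuples (an assumption of this sketch), and Vio(x) is read as "x is not nil and is neither a good black tree nor a good red tree", which is our reading of the intended meaning. The function name `measure` is hypothetical.

```python
# Trees: "nil" or (color, left, right). Encoding and names are assumptions.
NIL = "nil"


def red(left, right):
    return ("red", left, right)


def black(left, right):
    return ("black", left, right)


def is_red(x):
    return x != NIL and x[0] == "red"


def measure(x):
    """Return (|x|_max, |x|_min); (0, 0) encodes a violation Vio(x)."""
    if x == NIL:
        return (1, 1)
    color, x1, x2 = x
    max1, min1 = measure(x1)
    max2, min2 = measure(x2)
    # Both GB and GR require good subtrees, i.e. nonzero maximal black length.
    if max1 == 0 or max2 == 0:
        return (0, 0)
    # GR additionally requires that neither child is red.
    if color == "red" and (is_red(x1) or is_red(x2)):
        return (0, 0)
    # A black root adds one black node to every maximal path.
    bump = 1 if color == "black" else 0
    return (max(max1, max2) + bump, min(min1, min2) + bump)
```

Under this encoding a red node with a red child has measure (0, 0), and the violation propagates to every tree containing it, while an unbalanced but correctly colored tree such as `black(black(NIL, NIL), NIL)` has distinct maximal and minimal black lengths.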

In this theory, properties (1) and (2) of Definition 2, which state that every node is either black or red, and that a nil node is black, are trivially satisfied by the choice of signature and the integer functions. Therefore x is a red-black tree if x satisfies the following three conditions.

(§1) |x|max = |x|min: any maximal path of x contains the same number of black nodes,

(§2) |x|max > 0: any red node of x must have two black children,

(§3) Isblack(x): the root of x is black.

We denote by ϕ−RB(x) the conjunction of (§1) and (§2), and by ϕRB(x) the conjunction of (§1)-(§3). We note that ϕRB(x) defines a subdomain of Trb and the theory of this subdomain can be obtained by relativizing quantifiers to ϕRB(x). Formally, ∀x(ϕRB(x) → Φ(x)) (resp. ∃x(ϕRB(x) ∧ Φ(x))) expresses that Φ(x) is a universal (resp. existential) property of red-black trees. We have
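Conditions (§1)-(§3) and the relativized quantifier can be illustrated by a bounded enumeration. The sketch below is not a decision procedure; it only checks a universal property over all trees up to a small depth. The tuple encoding, the reading of Vio(x) as "neither a good black tree nor a good red tree", and the names `phi_minus`, `phi_rb` and `trees` are assumptions of this sketch.

```python
NIL = "nil"


def is_red(x):
    return x != NIL and x[0] == "red"


def measure(x):
    """(|x|_max, |x|_min), with (0, 0) standing for a violation."""
    if x == NIL:
        return (1, 1)
    color, x1, x2 = x
    (max1, min1), (max2, min2) = measure(x1), measure(x2)
    if max1 == 0 or max2 == 0:
        return (0, 0)
    if color == "red" and (is_red(x1) or is_red(x2)):
        return (0, 0)
    bump = 1 if color == "black" else 0
    return (max(max1, max2) + bump, min(min1, min2) + bump)


def phi_minus(x):
    """Conjunction of (S1) and (S2): |x|_max = |x|_min and |x|_max > 0."""
    hi, lo = measure(x)
    return hi == lo and hi > 0


def phi_rb(x):
    """Conjunction of (S1)-(S3): phi_minus plus a black root."""
    return phi_minus(x) and x != NIL and x[0] == "black"


def trees(depth):
    """All trees of depth at most `depth` (bounded stand-in for T_rb)."""
    if depth == 0:
        return [NIL]
    smaller = trees(depth - 1)
    return [NIL] + [(c, l, r) for c in ("red", "black")
                    for l in smaller for r in smaller]


# Relativized universal property: every red-black tree has |x|_max >= 2.
rb_trees = [t for t in trees(2) if phi_rb(t)]
holds = all(measure(t)[0] >= 2 for t in rb_trees)
```

Here `holds` corresponds to ∀x(ϕRB(x) → |x|max ≥ 2) restricted to the enumerated trees; the theory itself decides such formulas over the whole domain.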

Theorem 1 (Decidability of RBZ).

1. The first-order theory of RBZ is decidable and admits quantifier elimination.

2. The decision problem for the quantifier-free fragment is NP-complete.

4 Analysis of Red-black Trees

4.1 Algorithm and Example

In this section we consider the insertion-fixup operation of red-black trees represented by Algorithm 2, a slightly modified version of the algorithm given in [7]. We illustrate the algorithm on the same example as in [7], inserting 4 at the bottom of the tree and showing how the algorithm restores the red-black tree property.

Algorithm 2 (RB-I-F)
Input: root, T, x.
 1: while (x ≠ root and T[x-1].color = red) do
 2:   if T[x-1].dir = right then
 3:     if (Isred(T[x-1].tree)) {Case 1} then
 4:       T[x-1].tree := black(car(T[x-1].tree), cdr(T[x-1].tree))
 5:       T[x-1].color := black
 6:       T[x-2].color := red
 7:       x := x-2
 8:     else
 9:       if (T[x].dir = left) {Case 2} then
10:         swap(T[x].tree, T[x+1].tree)
11:         T[x].dir := right
12:         T[x+1].dir := left
13:       end if
14:       T[x-1].color := black {Case 3}
15:       T[x].tree := red(T[x].tree, T[x-1].tree)
16:       T[x-1].tree := T[x-2].tree
17:       T[x-1].dir := T[x-2].dir
18:       if (x-2 ≠ root) then
19:         x-3+1 := x-1
20:       else
21:         root := x-1
22:       end if
23:     end if
24:   else if (T[x-1].dir = left) then
25:     similar code as the then clause with left and right swapped
26:   end if
27: end while
28: T[root].color := black

Recall that our language does not have an update function to express the relation between the original tree and the updated tree if the update happens at an unbounded depth inside the tree. We know that the restoring updates will begin at the newly inserted node and traverse upwards to the root, and that all local updates will happen on this path. We represent the tree as a sequence of subtrees indexed by nodes on the path from the root to the newly inserted node. We treat the path as a doubly linked list (denoted by T in the algorithm) in which each element contains three fields, .color, .dir and .tree. Field .color denotes the type of the node. Field .dir indicates whether the subtree at this node is the left child or the right child. Field .tree denotes the sibling subtree of this node. We have root.dir = ⊥ and root.tree = ⊥. For simplicity, we omit the value field as it has no role in restoring the red-black tree property. We treat root and x as iterators and we use array notation T[x] to denote the element pointed to by x. We use x+1 and x−1 to denote the next iterator and the previous iterator of x, respectively. For example, the statement x−3+1 := x−1 at line 19 means x.pre.pre.pre.next := x.pre.
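The path representation can be sketched as a list of records from which the subtrees T[x] are recovered bottom-up. Several things here are assumptions of this sketch rather than statements of the paper: the record type `Elem`, the function `subtrees`, the tuple encoding of trees, the convention that .dir = right means the sibling hangs on the right (so that T[x − 1] is built as color(T[x], T[x].tree)), and the node colors of the example, which follow the textbook instance in [7].

```python
from typing import Any, NamedTuple, Optional

NIL = "nil"


class Elem(NamedTuple):
    color: str           # "red" or "black"
    dir: Optional[str]   # "left" or "right"; None stands for ⊥ at the root
    tree: Any            # sibling subtree; None stands for ⊥ at the root


def subtrees(path, last_children):
    """Return [T[0], ..., T[n-1]], the subtrees rooted at the path nodes.

    `last_children` is the pair of children of the last node on the path.
    """
    n = len(path)
    result = [None] * n
    result[-1] = (path[-1].color,) + tuple(last_children)
    for i in range(n - 1, 0, -1):
        child, sibling = result[i], path[i].tree
        # Assumed convention: dir == "right" puts the sibling on the right.
        pair = (child, sibling) if path[i].dir == "right" else (sibling, child)
        result[i - 1] = (path[i - 1].color,) + pair
    return result


# Path 11, 2, 7, 5, 4 of the running example (values omitted, colors assumed).
path = [
    Elem("black", None, None),                             # node 11 (root)
    Elem("red", "right", ("black", NIL, ("red", NIL, NIL))),  # node 2, sibling 14
    Elem("black", "left", ("black", NIL, NIL)),            # node 7, sibling 1
    Elem("red", "right", ("red", NIL, NIL)),               # node 5, sibling 8
    Elem("red", "right", NIL),                             # node 4, sibling nil
]
T = subtrees(path, (NIL, NIL))
```

With this layout `T[0]` is the whole tree and `T[-1]` is the newly inserted red node, so a local update near index x only touches a bounded number of list elements.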

[Figure 1: tree diagrams, panels (a)-(d).]

Fig. 1. A run of RB-I-F.

Figure 1 shows the results of the operations performed to restore the balanced-tree property after inserting 4. Figure 2 gives a more detailed picture of the data structures of the nodes on the path from the root to x. Figure 1 (b) shows the tree obtained by recoloring. The new violation now corresponds to Case 2 in Algorithm 2. Figure 1 (c) shows the tree obtained from a left rotation. There is still a violation, which corresponds to Case 3 in Algorithm 2. Figure 1 (d) shows a new red-black tree after a right rotation. Figures 2 (b)-(d) show the corresponding changes of the data structure during the run³.

4.2 Verification Conditions

We now show how to use L^Z_RB to express the verification conditions for statements restoring the red-black tree property in Algorithm 2. Recall that in the algorithm x is an iterator and T[x] is a node pointed to by x in a linked list and it contains three fields, .dir, .color and .tree. At the semantic level, however, we view x as an integer index and T[x] as a subtree indexed by x. If x ≠ root, then T[x].tree denotes the sibling tree of T[x], and T[x − 1] represents the immediate super-tree containing T[x]. For example, if T[x].dir is right and T[x − 1].color is red, then T[x − 1] = red(T[x], T[x].tree). We have three field operators, .dir, .color and .tree. Among them .dir can only take three values, left, right and ⊥, so expressions involving .dir can be removed by disjunctive splitting. Similarly for .color, but it can be directly expressed in LRB as below.

3 To save space, in all figures we do not draw a nil node if its sibling is not nil, and for this reason, we do not draw nil nodes black.

[Figure 2: path diagrams, panels (a)-(d).]

Fig. 2. Paths from the root of the tree to x. In each of (a)-(d), the first row shows the sequence of nodes from the root to x; the second row shows whether the node above it is a left (←) or right (→) sibling; the third row shows the sibling tree of the node in the top row.

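\begin{align*}
T[x].\mathit{color} = \mathit{red} &\stackrel{\mathrm{def}}{=} \mathrm{Is}_{red}(T[x]) , \\
T[x].\mathit{color} = \mathit{black} &\stackrel{\mathrm{def}}{=} x = \mathrm{nil} \vee \mathrm{Is}_{black}(T[x]) .
\end{align*}

With the help of .dir, .tree can be expressed in LRB as follows.

\begin{align*}
T[x].\mathit{tree} = y \neq \bot &\stackrel{\mathrm{def}}{=} x \neq \mathit{root} \wedge \bigl( (y = \mathrm{car}(T[x-1]) \wedge T[x].\mathit{dir} = \mathit{right}) \vee (y = \mathrm{cdr}(T[x-1]) \wedge T[x].\mathit{dir} = \mathit{left}) \bigr) , \\
T[x].\mathit{tree} = \bot &\stackrel{\mathrm{def}}{=} x = \mathit{root} .
\end{align*}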

Therefore from now on we treat field access expressions as abbreviations in LRB. Note that we use array and record notations for clarity. At the formula level, terms of index access or field access are simply variables. For example, T[x], T[x].tree, T[x].color and T[x].dir can be represented by variables f_x, g_x, h_x and k_x indexed by x, respectively. Similarly for terms indexed by x − i and x + i.

Let v denote the variables in the current state and v′ denote the corresponding variables in the next state. The transition relation of a statement q is denoted by ρq(v, v′). The post-condition post(q, ϕ) of ϕ(v) after executing a statement q is

\[ (\exists v_0)\bigl( \rho_q(v_0, v) \wedge \varphi(v_0) \bigr) . \]

The transition relation of two sequential statements can be computed as follows. Let ρq(v, v1) and ρr(v1, v′) be the transition relations for statements q and r, respectively. Then the transition relation of the composite statement 〈q; r〉 is

\[ (\exists v_1)\bigl( \rho_q(v, v_1) \wedge \rho_r(v_1, v') \bigr) . \]

Checking the validity of a Hoare triple {ϕ}q{ψ} is equivalent to proving that post(q, ϕ) → ψ.
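The three definitions above (post-condition, composition, Hoare-triple validity) can be made concrete over a small finite state space, where the existential quantifiers become loops. The sketch below is purely illustrative: states are plain integers standing for the index x, the statement is x := x − 2 as in line 7 of Algorithm 2, and the function names `post`, `compose` and `hoare_valid` are assumptions of this sketch. In the paper the same checks are discharged symbolically by the decision procedure.

```python
def post(rho, phi, states):
    """post(q, phi)(v) := exists v0 . rho(v0, v) and phi(v0)."""
    return lambda v: any(rho(v0, v) and phi(v0) for v0 in states)


def compose(rho_q, rho_r, states):
    """Relation of <q; r>: exists v1 . rho_q(v, v1) and rho_r(v1, v')."""
    return lambda v, v_next: any(
        rho_q(v, v1) and rho_r(v1, v_next) for v1 in states
    )


def hoare_valid(phi, rho, psi, states):
    """{phi} q {psi} is valid iff post(q, phi) -> psi holds in every state."""
    reached = post(rho, phi, states)
    return all((not reached(v)) or psi(v) for v in states)


STATES = range(0, 11)  # a bounded index domain, chosen for illustration


def rho_q(v, v_next):
    """Transition relation of the statement x := x - 2."""
    return v_next == v - 2


def phi(v):
    return v >= 2 and v % 2 == 0


def psi(v):
    return v >= 0 and v % 2 == 0


rho_qq = compose(rho_q, rho_q, STATES)  # behaves like x := x - 4
```

For example, `hoare_valid(phi, rho_q, psi, STATES)` holds, whereas strengthening the post-condition to `v >= 1` fails because the state 2 is mapped to 0.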

The lack of update functions makes it impossible to express the tree operational semantics precisely in a finite formula. For example, when T[x] is changed, not only should T′[x] appear in ρq(v, v′), but also all ancestors of T[x]. In fact ρq(v, v′) has an unbounded number of conjuncts of the form car(T′[x − i]) = T′[x − i + 1] or cdr(T′[x − i]) = T′[x − i + 1]. We can still, however, prove safety properties about tree operations with the help of an informal induction. As an example, we show that ϕ−RB(T[x]), introduced in Section 3, is an invariant with respect to each code fragment (corresponding to Case 1, 2 or 3 in Algorithm 2). This can be obtained by establishing the Hoare triple {ϕ}Q{ψ} where ϕ is the pre-condition

\[ x \neq \mathit{root} \wedge x - 1 \neq \mathit{root} \rightarrow \bigl( \varphi^{-}_{RB}(T[x]) \wedge \varphi^{-}_{RB}(T[x].\mathit{tree}) \wedge \neg\varphi^{-}_{RB}(T[x-1]) \wedge \varphi^{-}_{RB}(T[x-1].\mathit{tree}) \bigr) , \]

ψ is the post-condition ϕ−RB(T[x]), and Q is a code fragment corresponding to Case 1, 2 or 3. Here we need another invariant ∀x(x ≠ root → ϕ−RB(T[x].tree)). This invariant cannot be formally proved in our theory because of the universal quantification on indexes. But it is easy to verify that the parametric Hoare triples

\[ \{ x \neq \mathit{root} \rightarrow \varphi^{-}_{RB}(T[x \pm i].\mathit{tree}) \} \ q \ \{ x \neq \mathit{root} \rightarrow \varphi^{-}_{RB}(T[x \pm i].\mathit{tree}) \} \]

can be established for each statement q not modifying index x (see below for the transition relations of those statements). In the following we list local transition relations of all statements involving tree update and use guard conditions to simplify those transition relations.

Case 1 is implemented by statements 4-7. The guard conditions are x ≠ root ∧ Isred(T[x − 1]) (line 1), T[x − 1].dir = right (line 2) and Isred(T[x − 1].tree) (line 3). Under these conditions the transition relations for statements 4-7 are, respectively,

\begin{align}
& T'[x-1].\mathit{tree} = \mathrm{cdr}(T'[x-2]) = \mathrm{black}(\mathrm{car}(T[x-1].\mathit{tree}), \mathrm{cdr}(T[x-1].\mathit{tree})) , \tag{S-4} \\
& \mathrm{car}(T'[x-2]) = T'[x-1] = \mathrm{black}(\mathrm{car}(T[x-1]), \mathrm{cdr}(T[x-1])) , \tag{S-5} \\
& T'[x-2] = \mathrm{red}(\mathrm{car}(T[x-2]), \mathrm{cdr}(T[x-2])) , \tag{S-6} \\
& x' = x - 2 . \tag{S-7}
\end{align}

The composite transition relation for statements 4-7 is

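\begin{align*}
& T'[x-1].\mathit{tree} = \mathrm{black}(\mathrm{car}(T[x-1].\mathit{tree}), \mathrm{cdr}(T[x-1].\mathit{tree})) \\
& \wedge\ T'[x-1] = \mathrm{black}(\mathrm{car}(T[x-1]), \mathrm{cdr}(T[x-1])) \\
& \wedge\ T'[x-2] = \mathrm{red}(T'[x-1], T'[x-1].\mathit{tree}) \\
& \wedge\ x' = x - 2 .
\end{align*}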

Recall that T[x], T[x].tree, T[x].color and T[x].dir are just more informative aliases of indexed variables f_x, g_x, h_x and k_x, respectively. Similarly for terms indexed by x − 1 and x − 2. The next state variable for T[x] should be T′[x′], but by default we write T′[x] when x′ = x. When we do transition relation composition, statement 7 requires us to hard code the integer indexing properties in the formula by adding equalities like T′[x − 1] = T′[x′ + 1], T′[x − 2].tree = T′[x′].tree and so on. To save space, however, we omit them in the above example.

Figure 3 illustrates Case 1 by a run on tree (a) in Figure 1 (copied as (b-0)).Trees (b-1), (b-2) and (b-3) are the outcomes of statements 4, 5 and 6, respectively.
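The composite relation for Case 1 can be replayed on a concrete local subtree. The sketch below applies the recoloring of statements 4-6 to T[x − 1] and T[x − 1].tree and compares measures before and after. It is an illustration on one instance, not a proof of the Hoare triple; the tuple encoding, the function names, the reading of Vio(x) as "neither a good black tree nor a good red tree", and the colors of the instance (taken from the textbook example in [7]) are assumptions.

```python
NIL = "nil"


def red(l, r):
    return ("red", l, r)


def black(l, r):
    return ("black", l, r)


def car(t):
    return t[1]


def cdr(t):
    return t[2]


def is_red(x):
    return x != NIL and x[0] == "red"


def measure(x):
    """(|x|_max, |x|_min), with (0, 0) standing for a violation."""
    if x == NIL:
        return (1, 1)
    color, x1, x2 = x
    (max1, min1), (max2, min2) = measure(x1), measure(x2)
    if max1 == 0 or max2 == 0 or (color == "red" and (is_red(x1) or is_red(x2))):
        return (0, 0)
    bump = 1 if color == "black" else 0
    return (max(max1, max2) + bump, min(min1, min2) + bump)


def case1(parent, uncle):
    """Recoloring under the Case 1 guards; parent is T[x-1], uncle is T[x-1].tree."""
    assert is_red(parent) and is_red(uncle)           # guards of lines 1 and 3
    new_uncle = black(car(uncle), cdr(uncle))         # (S-4)
    new_parent = black(car(parent), cdr(parent))      # (S-5)
    new_grand = red(new_parent, new_uncle)            # (S-6), composite form
    return new_parent, new_uncle, new_grand


# Instance: T[x] is the new red node, its sibling is nil, the uncle is red.
parent = red(red(NIL, NIL), NIL)
uncle = red(NIL, NIL)
before = measure(black(parent, uncle))
new_parent, new_uncle, new_grand = case1(parent, uncle)
after = measure(new_grand)
```

On this instance the measure of the grandparent subtree goes from the violation value (0, 0) to (2, 2), which is the balanced black length the subtree would have had before the insertion.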

The code fragment for Case 2 consists of statements 10-12. We take into account the conditions T[x − 1].color = red (line 1), T[x − 1].dir = right (line 2) and T[x].dir = left (line 9). Under these conditions the transition relations for statements 10-12 are, respectively,

\begin{align}
& \mathrm{cdr}(T'[x-1]) = T'[x] \wedge \bigl( T'[x+1].\mathit{tree} = \mathrm{cdr}(T'[x]) = T[x].\mathit{tree} \bigr) \notag \\
& \quad \wedge \bigl( T'[x].\mathit{tree} = \mathrm{car}(T'[x-1]) = T[x+1].\mathit{tree} \bigr) , \tag{S-10} \\
& T'[x].\mathit{dir} = \mathit{right} \wedge T'[x-1] = \mathrm{red}(\mathrm{cdr}(T[x-1]), \mathrm{car}(T[x-1])) , \tag{S-11} \\
& T'[x+1].\mathit{dir} = \mathit{left} \wedge \mathrm{car}(T'[x-1]) = T'[x] \wedge T'[x] = \mathrm{red}(\mathrm{cdr}(T[x]), \mathrm{car}(T[x])) . \tag{S-12}
\end{align}

[Figure 3: tree diagrams, panels (b-0)-(b-3), with the nodes at positions x, x − 1 and x − 2 marked.]

Fig. 3. A detailed run of RB-I-F step (b).

The composite transition relation for statements 10-12 is

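\begin{align*}
& T'[x+1].\mathit{tree} = T[x].\mathit{tree} \wedge T'[x].\mathit{tree} = T[x+1].\mathit{tree} \\
& \wedge\ T'[x].\mathit{dir} = \mathit{right} \wedge T'[x-1] = \mathrm{red}(T'[x], T[x+1].\mathit{tree}) \\
& \wedge\ T'[x+1].\mathit{dir} = \mathit{left} \wedge T'[x] = \mathrm{red}(T[x].\mathit{tree}, \mathrm{car}(T[x])) .
\end{align*}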

Figure 4 illustrates Case 2 by a run on tree (b) in Figure 1 (copied as (c-0)). Trees (c-1), (c-2) and (c-3) are the outcomes of statements 10, 11 and 12, respectively. Recall that we ignored value labels at internal nodes as they are irrelevant to the red-black tree properties. But the binary search tree property may be violated without adjusting the value labels. So for the sake of illustration we switched positions of 2 and 7 in Figure 4 (c-1) although statement 10 does not have this effect.
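Read on whole subtrees, the composite relation for Case 2 amounts to a left rotation at T[x − 1]. The following sketch applies it to one instance and checks that the result has the Case 3 shape. It assumes the tuple encoding below, assumes that T[x] is the right child of T[x − 1] and that T[x + 1] is the left child of T[x] (so T[x + 1].tree = cdr(T[x])), and uses colors from the textbook example in [7]; the function name `case2` is hypothetical.

```python
NIL = "nil"


def red(l, r):
    return ("red", l, r)


def black(l, r):
    return ("black", l, r)


def car(t):
    return t[1]


def cdr(t):
    return t[2]


def is_red(x):
    return x != NIL and x[0] == "red"


def case2(parent):
    """Composite relation of statements 10-12 on T[x-1] = red(T[x].tree, T[x])."""
    sibling, node = car(parent), cdr(parent)   # T[x].tree and T[x]
    assert is_red(parent) and is_red(node)     # guards: red parent, red T[x]
    new_node = red(sibling, car(node))         # T'[x]   = red(T[x].tree, car(T[x]))
    return red(new_node, cdr(node))            # T'[x-1] = red(T'[x], T[x+1].tree)


# Instance: sibling s (node 1), children a (node 5) and b (node 8) of T[x].
s = black(NIL, NIL)
a = black(red(NIL, NIL), NIL)
b = black(NIL, NIL)
rotated = case2(red(s, red(a, b)))
```

The three subtrees s, a and b keep their left-to-right order, and the red-red conflict moves from the right spine to the left spine, which is the configuration handled by Case 3.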

[Figure 4: tree diagrams, panels (c-0)-(c-3), with the nodes at positions x and x − 1 marked.]

Fig. 4. A detailed run of RB-I-F step (c).

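Case 3 consists of statements 14-21. We take into account the conditions T[x − 1].color = red (line 1), T[x − 1].dir = right (line 2) and T[x].dir = right (line 11). Under these conditions the transition relations for statements 14-21 are, respectively,

\begin{align}
& \mathrm{car}(T'[x-2]) = T'[x-1] = \mathrm{black}(\mathrm{car}(T[x-1]), \mathrm{cdr}(T[x-1])) , \tag{S-14} \\
& \mathrm{cdr}(T'[x-1]) = T'[x].\mathit{tree} = \mathrm{red}(T[x].\mathit{tree}, T[x-1].\mathit{tree}) , \tag{S-15} \\
& T'[x-1].\mathit{tree} = T[x-2].\mathit{tree} \wedge \bigl( x-2 \neq \mathit{root} \rightarrow \mathrm{cdr}(T'[x-2]) = T[x-2].\mathit{tree} \bigr) , \tag{S-16} \\
& T'[x-1].\mathit{dir} = T[x-2].\mathit{dir} \wedge \bigl( x-2 \neq \mathit{root} \wedge T[x-1].\mathit{dir} \neq T[x-2].\mathit{dir} \rightarrow \notag \\
& \qquad T'[x-2] = \mathrm{black}(\mathrm{cdr}(T[x-2]), \mathrm{car}(T[x-2])) \bigr) , \tag{S-17} \\
& x'-2 = x-3 \wedge \bigl( (\mathrm{cdr}(T'[x-3]) = T[x-1] \wedge T[x-2].\mathit{dir} = \mathit{left}) \notag \\
& \qquad \vee (\mathrm{car}(T'[x-3]) = T[x-1] \wedge T[x-2].\mathit{dir} = \mathit{right}) \bigr) , \tag{S-19} \\
& x'-1 = \mathit{root} . \tag{S-21}
\end{align}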

[Figure 5: tree diagrams, panels (d-0)-(d-5), with the nodes at positions x and x − 1 marked.]

Fig. 5. A detailed run of RB-I-F step (d) with x − 2 = root.

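Assuming x − 2 = root, the composite transition relation for statements 14-21 is

\begin{align*}
& \mathrm{car}(T'[x-2]) = T'[x-1] = \mathrm{black}(\mathrm{car}(T[x-1]), T'[x].\mathit{tree}) \\
& \wedge\ T'[x].\mathit{tree} = \mathrm{red}(T[x].\mathit{tree}, T[x-1].\mathit{tree}) \\
& \wedge\ T'[x-1].\mathit{tree} = T[x-2].\mathit{tree} \wedge T'[x-1].\mathit{dir} = T[x-2].\mathit{dir} \\
& \wedge\ x'-1 = \mathit{root} .
\end{align*}

Assuming x − 2 ≠ root, the composite transition relation for statements 14-21 is

\begin{align*}
& \mathrm{car}(T'[x-2]) = T'[x-1] = \mathrm{black}(\mathrm{car}(T[x-1]), T'[x].\mathit{tree}) \\
& \wedge\ T'[x].\mathit{tree} = \mathrm{red}(T[x].\mathit{tree}, T[x-1].\mathit{tree}) \\
& \wedge\ T'[x-1].\mathit{tree} = T[x-2].\mathit{tree} \wedge \mathrm{cdr}(T'[x-2]) = T[x-2].\mathit{tree} \\
& \wedge\ T'[x-1].\mathit{dir} = T[x-2].\mathit{dir} \wedge \bigl( T[x-1].\mathit{dir} \neq T[x-2].\mathit{dir} \rightarrow T'[x-2] = \mathrm{black}(\mathrm{cdr}(T[x-2]), \mathrm{car}(T[x-2])) \bigr) \\
& \wedge\ x'-2 = x-3 \wedge \bigl( (\mathrm{cdr}(T'[x-3]) = T[x-1] \wedge T[x-2].\mathit{dir} = \mathit{left}) \vee (\mathrm{car}(T'[x-3]) = T[x-1] \wedge T[x-2].\mathit{dir} = \mathit{right}) \bigr) .
\end{align*}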

Figure 5 illustrates Case 3 by a run on tree (c) in Figure 1 (copied as (d-0)). Trees (d-1)-(d-5) are the outcomes of statements 14-17 and 21, respectively, under the assumption that x − 2 = root. Here (d-3) and (d-4) are the same because T[x − 1].dir = T[x − 2].dir and hence statement 17 has no effect. Figures 6 and 7 illustrate Case 3 under the assumption x − 2 ≠ root ∧ T[x − 2].dir = right. Trees (d’-1)-(d’-5) correspond to the outcomes of statements 14-17 and 19, respectively. As before, (d’-3) and (d’-4) are the same because T[x − 1].dir = T[x − 2].dir. Figures 8 and 9 illustrate Case 3 under the assumption x − 2 ≠ root ∧ T[x − 2].dir = left. Trees (d”-1)-(d”-5) correspond to the outcomes of statements 14-17 and 19, respectively. As before we keep the binary search tree property by adjusting value labels 11, 16 and −16. Because of space limitations, Figures 6-9 are given in the appendix.
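For the case x − 2 = root, the first two conjuncts of the composite relation say that the new subtree is black(car(T[x − 1]), red(T[x].tree, T[x − 1].tree)), i.e. a right rotation with recoloring. The sketch below replays this on one instance and checks conditions (§1)-(§3) on the result. The tuple encoding, the function names, the reading of Vio(x) as "neither a good black tree nor a good red tree", and the colors of the instance (taken from the textbook example in [7]) are assumptions of this sketch.

```python
NIL = "nil"


def red(l, r):
    return ("red", l, r)


def black(l, r):
    return ("black", l, r)


def car(t):
    return t[1]


def cdr(t):
    return t[2]


def is_red(x):
    return x != NIL and x[0] == "red"


def measure(x):
    """(|x|_max, |x|_min), with (0, 0) standing for a violation."""
    if x == NIL:
        return (1, 1)
    color, x1, x2 = x
    (max1, min1), (max2, min2) = measure(x1), measure(x2)
    if max1 == 0 or max2 == 0 or (color == "red" and (is_red(x1) or is_red(x2))):
        return (0, 0)
    bump = 1 if color == "black" else 0
    return (max(max1, max2) + bump, min(min1, min2) + bump)


def case3(parent, uncle):
    """parent is T[x-1] = red(T[x], T[x].tree); uncle is T[x-1].tree."""
    assert is_red(parent) and is_red(car(parent))   # red parent with red left child
    new_sibling = red(cdr(parent), uncle)           # T'[x].tree, cf. (S-15)
    return black(car(parent), new_sibling)          # T'[x-1],    cf. (S-14)


# Instance: the tree reached after the Case 2 rotation, with a black uncle.
s = black(NIL, NIL)
a = black(red(NIL, NIL), NIL)
b = black(NIL, NIL)
parent = red(red(s, a), b)
uncle = black(NIL, red(NIL, NIL))
before = measure(black(parent, uncle))
result = case3(parent, uncle)
after = measure(result)
```

On this instance the violation value (0, 0) is replaced by the measure (3, 3) and the new root is black, so the resulting tree satisfies ϕRB.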

5 Conclusion

We presented a decidable theory of red-black trees, which is an extension of the theory of term algebras with two size functions. We showed how the red-black tree insertion algorithm can be analyzed using this theory. We plan to extend this theory to express local updates at an arbitrary pointed location in a tree. We note that adding a standard update function easily makes the first-order theory undecidable. We will investigate ways to enhance the expressiveness of the theory while maintaining decidability.

References

1. Rolf Backofen. A complete axiomatization of a theory with feature and arity constraints. Journal of Logical Programming, 24(1&2):37–71, 1995.

2. Paolo Baldan, Andrea Corradini, Javier Esparza, Tobias Heindel, Barbara Konig, and Vitali Kozioura. Verifying red-black trees. In Proceedings of the 1st International Workshop on the Verification of Concurrent Systems with Dynamic Allocated Heaps (COSMICAH 2005), 2005.

3. Cristiano Calcagno, Philippa Gardner, and Uri Zarfaty. Context logic and tree update. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 271–282. ACM Press, 2005.

4. Hubert Comon, Max Dauchet, Remi Gilleron, Denis Lugiez, Sophie Tison, and Marc Tommasi. Tree Automata Techniques and Applications. Electronic edition at http://l3ux02.univ-lille3.fr/tata/tata.pdf, 2002.

5. Hubert Comon and Catherine Delor. Equational formulae with membership constraints. Information and Computation, 112(2):167–216, 1994.

6. D. C. Cooper. Theorem proving in arithmetic without multiplication. In Machine Intelligence, volume 7, pages 91–99. American Elsevier, 1972.

7. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts, 2001.

8. J. Downey, R. Sethi, and R. E. Tarjan. Variations of the common subexpression problem. Journal of the ACM, 27:758–771, 1980.

9. H. B. Enderton. A Mathematical Introduction to Logic. Academic Press, 2001.

10. Peter Habermehl, Radu Iosif, and Tomas Vojnar. Automata-based verification of programs with tree updates. In Proceedings of 12th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’06), volume 3920 of Lecture Notes in Computer Science, pages 350–364. Springer-Verlag, 2006.

11. Konstantin Korovin and Andrei Voronkov. A decision procedure for the existential theory of term algebras with the Knuth-Bendix ordering. In Proceedings of 15th IEEE Symposium on Logic in Computer Science, pages 291–302. IEEE Computer Society Press, 2000.

12. Konstantin Korovin and Andrei Voronkov. Knuth-Bendix constraint solving is NP-complete. In Proceedings of 28th International Colloquium on Automata, Languages and Programming (ICALP’01), volume 2076 of Lecture Notes in Computer Science, pages 979–992. Springer-Verlag, 2001.

13. Viktor Kuncak and Martin Rinard. The structural subtyping of non-recursive types is decidable. In Proceedings of 18th IEEE Symposium on Logic in Computer Science, pages 96–107. IEEE Computer Society Press, 2003.

14. M. J. Maher. Complete axiomatizations of the algebras of finite, rational and infinite tree. In Proceedings of the 3rd IEEE Symposium on Logic in Computer Science, pages 348–357. IEEE Computer Society Press, 1988.

15. A. I. Mal’cev. Axiomatizable classes of locally free algebras of various types. In The Metamathematics of Algebraic Systems, Collected Papers, chapter 23, pages 262–281. North Holland, 1971.

16. Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM, 27(2):356–364, April 1980.

17. Derek C. Oppen. Reasoning about recursively defined data structures. Journal of the ACM, 27(3):403–411, July 1980.

18. C. R. Reddy and D. W. Loveland. Presburger arithmetic with bounded quantifier alternation. In Proceedings of the 10th Annual Symposium on Theory of Computing, pages 320–325. ACM Press, 1978.

19. Radu Rugina. Quantitative shape analysis. In Proceedings of the 11th International Static Analysis Symposium (SAS’04), volume 3148 of Lecture Notes in Computer Science, pages 228–245. Springer-Verlag, 2004.

20. Tatiana Rybina and Andrei Voronkov. A decision procedure for term algebras with queues. ACM Transactions on Computational Logic, 2(2):155–181, 2001.

21. Ting Zhang, Henny B. Sipma, and Zohar Manna. Decision procedures for recursive data structures with integer constraints. In the 2nd International Joint Conference on Automated Reasoning (IJCAR’04), volume 3097 of Lecture Notes in Computer Science, pages 152–167. Springer-Verlag, 2004.

22. Ting Zhang, Henny B. Sipma, and Zohar Manna. The decidability of the first-order theory ofterm algebras with Knuth-Bendix order. In Robert Nieuwenhuis, editor, the 20th InternationalConference on Automated Deduction (CADE’05), volume 3632 of Lecture Notes in Computer Science,pages 131–148. Springer-Verlag, 2005.

23. Ting Zhang, Henny B. Sipma, and Zohar Manna. Decision procedures for term algebras withinteger constraints. Information and Computation, 204:1526–1574, October 2006.

Fig. 6 (panels (d’-0), (d’-1), (d’-2)). A detailed run of RB-I-F step (d) with x−2 ≠ root ∧ T[x−2].dir = right.

Fig. 7 (panels (d’-3), (d’-4), (d’-5)). A detailed run of RB-I-F step (d) with x−2 ≠ root ∧ T[x−2].dir = right (continued).

Fig. 8 (panels (d”-0), (d”-1), (d”-2)). A detailed run of RB-I-F step (d) with x−2 ≠ root ∧ T[x−2].dir = left.

Fig. 9 (panels (d”-3), (d”-4), (d”-5)). A detailed run of RB-I-F step (d) with x−2 ≠ root ∧ T[x−2].dir = left (continued).

A Proof

In this section we prove Theorem 1. The proof closely follows [23]. The strategy goes as follows. By equality propagation we can partition terms into equivalence classes such that all induced equalities between terms are discovered. If no inconsistency has been found, then satisfiability reduces to satisfiability of disequalities, i.e., whether we can assign each equivalence class a distinct tree. Without integer constraints such a satisfying assignment always exists, for there are infinitely many distinct trees. But the presence of integer constraints may restrict our choice to a finite set, so that not all disequalities can be satisfied at the same time. [23] shows that we can overcome this difficulty by making the integer constraints complete (Definition 7). Intuitively, this completion procedure adds to the original integer constraints counting constraints that “count” the number of distinct trees at a given integral measure. In the following we show that counting constraints are expressible in PA.

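We define counting constraints $\mathrm{CNT}^{rb}_n(x_1, x_2)$, $\mathrm{CNT}^{r}_n(x_1, x_2)$ and $\mathrm{CNT}^{b}_n(x_1, x_2)$ as follows.

\begin{align*}
\mathrm{CNT}^{rb}_n(x_1, x_2) &\iff \bigl|\{\, t \in \mathsf{Trb} \mid |t|_{\max} = x_1 \wedge |t|_{\min} = x_2 \,\}\bigr| > n, \\
\mathrm{CNT}^{r}_n(x_1, x_2) &\iff \bigl|\{\, t \in \mathsf{Trb} \mid |t|_{\max} = x_1 \wedge |t|_{\min} = x_2 \wedge \mathrm{Isred}(t) \,\}\bigr| > n, \\
\mathrm{CNT}^{b}_n(x_1, x_2) &\iff \bigl|\{\, t \in \mathsf{Trb} \mid |t|_{\max} = x_1 \wedge |t|_{\min} = x_2 \wedge \mathrm{Isblack}(t) \,\}\bigr| > n.
\end{align*}

$\mathrm{CNT}^{rb}_n(x_1, x_2)$, abbreviated $\mathrm{CNT}_n(x_1, x_2)$ below, states that there are more than $n$ distinct terms of measure $(x_1, x_2)$. Similarly for $\mathrm{CNT}^{r}_n(x_1, x_2)$ and $\mathrm{CNT}^{b}_n(x_1, x_2)$.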

Theorem 3 (Counting Constraints in RBZ). $\mathrm{CNT}_n(x_1, x_2)$, $\mathrm{CNT}^{r}_n(x_1, x_2)$ and $\mathrm{CNT}^{b}_n(x_1, x_2)$ are expressible by quantifier-free Presburger formulas that can be computed in $O(n)$.

Proof. We first consider three counting functions:

\begin{align*}
f(n_1, n_2) &= |S| \quad \text{where } S = \{\, t \mid |t|_{\max} = n_1 \wedge |t|_{\min} = n_2 \,\}, \\
f^r(n_1, n_2) &= |S| \quad \text{where } S = \{\, t \mid |t|_{\max} = n_1 \wedge |t|_{\min} = n_2 \wedge \mathrm{Isred}(t) \,\}, \\
f^b(n_1, n_2) &= |S| \quad \text{where } S = \{\, t \mid |t|_{\max} = n_1 \wedge |t|_{\min} = n_2 \wedge \mathrm{Isblack}(t) \,\},
\end{align*}

where $f(n_1, n_2)$ ($n_1, n_2 > 0$) gives the number of distinct trees whose maximal black path has length $n_1$ and whose minimal black path has length $n_2$. Similarly, $f^r(n_1, n_2)$ (resp. $f^b(n_1, n_2)$) gives the corresponding number for trees with red (resp. black) root.

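These three functions are computable as they can be defined recursively as follows.

\begin{align}
f(1, 1) &= 2, \tag{1} \\
f(n_1, n_2) &= f^r(n_1, n_2) + f^b(n_1, n_2) \quad \text{for } n_1 \neq 1 \vee n_2 \neq 1, \tag{2} \\
f^r(0, n_2) &= f^r(n_1, 0) = 0, \tag{3} \\
f^r(1, 1) &= 1, \tag{4} \\
f^r(n_1, 1) &= 2 \sum_{i \le n_1} f^b(n_1, i) \quad \text{for } n_1 > 1, \tag{5} \\
f^r(n_1, n_2) &= \Bigl( \sum_{n_2 < j \le i < n_1} f^b(i, j) \Bigr) \cdot f^b(n_1, n_2) \tag{6} \\
&\quad + \Bigl( \sum_{n_2 \le i < n_1} f^b(i, n_2) \Bigr) \cdot \Bigl( \sum_{n_2 \le j \le n_1} f^b(n_1, j) \Bigr) \tag{7} \\
&\quad + \Bigl( \sum_{n_2 < j \le n_1} f^b(n_1, j) \Bigr) \cdot \Bigl( \sum_{n_2 \le i \le n_1} f^b(i, n_2) \Bigr) \tag{8} \\
&\quad + f^b(n_1, n_2) \cdot \Bigl( \sum_{n_2 \le j \le i \le n_1} f^b(i, j) \Bigr) \quad \text{for } n_2 > 1, \tag{9} \\
f^b(0, n_2) &= f^b(n_1, 0) = 0, \tag{10} \\
f^b(n_1, 1) &= 0, \tag{11} \\
f^b(n_1 + 1, 2) &= 2 \sum_{i \le n_1} f^b(n_1, i) + 2 \sum_{i \le n_1} f^r(n_1, i), \tag{12} \\
f^b(n_1 + 1, n_2 + 1) &= \sum_{(\natural, \sharp) \in \{(r,b),(r,r),(b,b),(b,r)\}} \Bigl[ \Bigl( \sum_{n_2 < j \le i < n_1} f^{\natural}(i, j) \Bigr) \cdot f^{\sharp}(n_1, n_2) \tag{13} \\
&\quad + \Bigl( \sum_{n_2 \le i < n_1} f^{\natural}(i, n_2) \Bigr) \cdot \Bigl( \sum_{n_2 \le j \le n_1} f^{\sharp}(n_1, j) \Bigr) \tag{14} \\
&\quad + \Bigl( \sum_{n_2 < j \le n_1} f^{\natural}(n_1, j) \Bigr) \cdot \Bigl( \sum_{n_2 \le i \le n_1} f^{\sharp}(i, n_2) \Bigr) \tag{15} \\
&\quad + f^{\natural}(n_1, n_2) \cdot \Bigl( \sum_{n_2 \le j \le i \le n_1} f^{\sharp}(i, j) \Bigr) \Bigr] \quad \text{for } n_2 > 1. \tag{16}
\end{align}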

Note that $f(n_1, n_2)$ is defined to be the sum of $f^r(n_1, n_2)$ and $f^b(n_1, n_2)$ except when $n_1 = n_2 = 1$. The reason is that in this case nil counts as a tree having both a maximal black path and a minimal black path of length 1. The recurrence relation for $f^r(n_1, n_2)$ is defined by (3)-(9). Formulas (3)-(5) take care of irregular base cases. In particular, $f^r(n_1, 1)$ ($n_1 > 1$) is so defined because a tree with a minimal black path of length 1 must have nil as one child and a tree with black root as the other child. For $n_2 > 1$, $f^r(n_1, n_2)$ is defined as a sum of four terms (6)-(9). Term (6) corresponds to the cases where the left child contains neither a maximal black path nor a minimal black path. Term (7) corresponds to the cases where the left child contains a minimal black path but no maximal black path. Term (8) corresponds to the cases where the left child contains a maximal black path but no minimal black path. Term (9) corresponds to the cases where the left child contains both a maximal black path and a minimal black path. The recurrence relation for $f^b(n_1 + 1, n_2 + 1)$ is defined by (10)-(16). As before, formulas (10)-(12) take care of basic but irregular cases. For $n_2 > 1$, $f^b(n_1 + 1, n_2 + 1)$ is obtained in a similar way as $f^r(n_1, n_2)$ (when $n_2 > 1$), but taking into account that each aforementioned term splits into four terms because there are four ways to color the root of the left child and the root of the right child. As a result, $f^b(n_1 + 1, n_2 + 1)$ ($n_2 > 1$) contains 16 terms.

In the following we show that $\mathrm{CNT}_n(x_1, x_2)$ is expressible in PA. The expressibility of $\mathrm{CNT}^{r}_n(x_1, x_2)$ and $\mathrm{CNT}^{b}_n(x_1, x_2)$ can be obtained similarly. Since $\mathrm{CNT}_n(x_1, x_2)$ defines the set $U = \{(x, y) \in \mathbb{N}^2 \mid f(x, y) > n\}$, we can express $\mathrm{CNT}_n(x_1, x_2)$ in PA if and only if we can finitely represent $U$ in PA. First we note that $f$ is non-decreasing. If $f(x, y) > n$ then $f(x + 1, y) > n$, because every tree with at least two paths and measure $(x, y)$ can grow into a tree of measure $(x + 1, y)$ by appending $\mathrm{black}(\mathrm{nil}, \mathrm{nil})$ to a leaf node on a maximal black path. The exception is nil, which has only one path. However, we still have $f(2, 1) = f(1, 1) = 2 > 1$. Also, if $f(x, y) > n$ and $y < x$, then $f(x, y + 1) > n$, because every tree with measure $(x, y)$ can grow into a tree of measure $(x, y + 1)$ by appending $\mathrm{black}(\mathrm{nil}, \mathrm{nil})$ to leaf nodes on all minimal black paths. It follows that if $f(x, y) > n$, $x' \ge x$, $y' \ge y$, and $x' \ge y'$, then $f(x', y') > n$. Next we exploit this property to finitely represent $U$ in PA.

Let $D = \{(x, y) \in \mathbb{N}^2 \mid x \ge y\}$ be the domain of legitimate pairs, since the length of a maximal black path must be greater than or equal to the length of a minimal black path. Let ${<_c}, {\le_c} \subseteq D^2$ be the partial covering orderings such that

\begin{align*}
(x, y) <_c (x', y') &\iff x \le x' \wedge y \le y' \wedge x + y < x' + y', \\
(x, y) \le_c (x', y') &\iff (x, y) = (x', y') \vee (x, y) <_c (x', y').
\end{align*}

We say that $(x, y)$ is covered (resp. strictly covered) by $(x', y')$ if $(x, y) \le_c (x', y')$ (resp. $(x, y) <_c (x', y')$). The non-decreasing property implies that if $f(x, y) > n$ and $(x, y) \le_c (x', y')$, then $f(x', y') > n$. It follows that for a fixed $n$, $U$ is upper closed under $\le_c$, or equivalently $U$ is a filter of $\langle D, \le_c \rangle$. We can present $U$ in PA if we can find a finite base set $B \subset D$ generating $U$ in the sense that

(a) $\forall (x, y) \in B \;\; f(x, y) > n$, and
(b) $\forall (x, y) \in U \;\; \exists (x', y') \in B \;\; (x', y') \le_c (x, y)$.

In other words, $U$ is the upper closure of $B$ in $\langle D, \le_c \rangle$. Below we construct such a $B$.

Let $(x, y) = (x', y')$ denote $x = x' \wedge y = y'$. Let ${<_l}, {\le_l} \subseteq D^2$ be the contra-variant lexicographical linear orderings such that

\begin{align*}
(x_1, y_1) <_l (x_2, y_2) &\iff x_1 < x_2 \vee (x_1 = x_2 \wedge y_2 < y_1), \\
(x_1, y_1) \le_l (x_2, y_2) &\iff (x_1, y_1) = (x_2, y_2) \vee (x_1, y_1) <_l (x_2, y_2).
\end{align*}
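The ordering just defined can be made concrete with a small Python sketch. The function names `lex_lt`, `lex_le` and `domain_in_lex_order` are illustrative and not part of the paper; the enumeration is restricted to positive pairs below a caller-supplied bound, which is an assumption made only to keep the listing finite.

```python
def lex_lt(p, q):
    """(x1, y1) <_l (x2, y2): smaller x first; for equal x, LARGER y first."""
    (x1, y1), (x2, y2) = p, q
    return x1 < x2 or (x1 == x2 and y2 < y1)


def lex_le(p, q):
    """(x1, y1) <=_l (x2, y2): equality or strict precedence."""
    return p == q or lex_lt(p, q)


def domain_in_lex_order(bound):
    """Pairs (x, y) with 1 <= y <= x <= bound, listed in ascending <_l order.

    Sorting on the key (x, -y) realises the contra-variant second component.
    """
    pairs = [(x, y) for x in range(1, bound + 1) for y in range(1, x + 1)]
    return sorted(pairs, key=lambda p: (p[0], -p[1]))


print(domain_in_lex_order(3))
# [(1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1)]
```

Visiting pairs in this order is one way to schedule a table-filling computation in which an entry may depend only on entries that precede it with respect to $<_l$.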

It follows from (3)-(16) that the computation of $f^r(x, y)$ relies on the values of $f^b(x', y')$ where $(x', y') \le_l (x, y)$, and the computation of $f^b(x, y)$ relies on the values of $f^b(x', y')$ and of $f^r(x', y')$ where $(x', y') <_l (x, y)$. Therefore, we can use dynamic programming to compute $f^b$ and $f^r$ inductively with respect to $<_l$, and from them obtain $f$. Eventually we will find the first pair $(x_{\min}, y_{\max})$ (with respect to $<_l$) such that $f(x_{\min}, y_{\max}) > n$. In a similar way, for each positive $i < y_{\max}$, we can find the smallest $x^{(i)}_{\min}$ such that $f(x^{(i)}_{\min}, i) > n$. We claim that $B$ is

\[
\{\, (x_{\min}, y_{\max}) \,\} \cup \{\, (x^{(i)}_{\min}, i) \mid 0 < i < y_{\max} \,\}.
\]

It suffices to show that for any $(x, y)$, if $f(x, y) > n$ then there exists $(x', y') \in B$ such that $(x', y') \le_c (x, y)$. By the definition of $<_l$ and the fact that $(x_{\min}, y_{\max})$ is the first pair (with respect to $<_l$) such that $f(x_{\min}, y_{\max}) > n$, we have $x \ge x_{\min}$. If $y \ge y_{\max}$, then $(x_{\min}, y_{\max}) \le_c (x, y)$. So without loss of generality assume $0 < y < y_{\max}$. Since $x^{(y)}_{\min}$ is the smallest number such that $f(x^{(y)}_{\min}, y) > n$, we have $x \ge x^{(y)}_{\min}$ and hence $(x^{(y)}_{\min}, y) \le_c (x, y)$. So if $f(x, y) > n$ then $(x, y)$ covers some pair in $B$, and hence $B$ is a finite base for $U$.

Therefore we define $\mathrm{CNT}_n(x_1, x_2)$ as

\[
x_1 \ge x_2 > 0 \;\wedge\; \Bigl( \bigvee_{(i, j) \in B} (x_1, x_2) \ge_c (i, j) \Bigr).
\]

Clearly, $f(x_1, x_2)$ grows exponentially in terms of $x_1$ or $x_2$. Hence the size of $B$ is $O(n)$, and so is the size of $\mathrm{CNT}_n(x_1, x_2)$. Similarly, we can obtain $\mathrm{CNT}^{r}_n(x_1, x_2)$ and $\mathrm{CNT}^{b}_n(x_1, x_2)$ of size $O(n)$.
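The construction of the base set $B$ and the resulting formula for $\mathrm{CNT}_n$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the counting function is passed in as a parameter and is assumed to be non-decreasing under $\le_c$ on $D$; `toy_count` is a hypothetical monotone stand-in and is not the function $f$ of recurrences (1)-(16); `search_bound` is an artificial cap that keeps the search finite.

```python
def covers(p, q):
    """True when q <=_c p, i.e. p is componentwise at least q."""
    return p[0] >= q[0] and p[1] >= q[1]


def base_set(count, n, search_bound=64):
    """Build a base B for U = {(x, y) in D | count(x, y) > n}.

    `count` is assumed monotone under <=_c; `search_bound` is a safety cap.
    """
    # First pair in <_l order: x ascending and, for equal x, y descending.
    first = next(
        ((x, y)
         for x in range(1, search_bound + 1)
         for y in range(x, 0, -1)
         if count(x, y) > n),
        None,
    )
    if first is None:
        raise ValueError("no pair exceeds n below search_bound")
    _x_min, y_max = first
    base = {first}
    for i in range(1, y_max):
        # Smallest x (with x >= i, so the pair stays in D) exceeding n at row i.
        x_i = next(
            (x for x in range(i, search_bound + 1) if count(x, i) > n), None
        )
        if x_i is None:
            raise ValueError("row %d has no witness below search_bound" % i)
        base.add((x_i, i))
    return base


def cnt(base, x1, x2):
    """Evaluate x1 >= x2 > 0 and (x1, x2) covers some pair of the base."""
    return x1 >= x2 > 0 and any(covers((x1, x2), b) for b in base)


def toy_count(x, y):
    """Hypothetical monotone stand-in; NOT the paper's f."""
    return x * y


B = base_set(toy_count, 5)
print(sorted(B))  # [(3, 2), (3, 3), (6, 1)]
```

For the stand-in function, `cnt(B, x, y)` agrees with `toy_count(x, y) > 5` on every pair of $D$ with positive components, which is the property the proof establishes for $f$.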

With the help of counting constraints we next show that, given a combined constraint $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$ satisfying type completeness and equality completeness (see below), there exists an integer constraint $\Phi_{\Delta}(\bar{x})$ which precisely characterizes $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$ in the sense that $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$ is satisfiable in RBZ if and only if $\Phi_{\Delta}(\bar{x})$ is satisfiable in PA.

Definition 5 (Type Completeness in RBZ). A conjunction of literals $\Phi_{RB}$ is type complete if for any selector term $t$ occurring in $\Phi_{RB}$, exactly one of the tester predicates $\mathrm{Isred}(t)$, $\mathrm{Isblack}(t)$, $\mathrm{Isnil}(t)$ is a conjunct of $\Phi_{RB}$.

Definition 6 (Equality Completeness in RBZ). $\Phi_{RB} \wedge \Phi_{\mathbb{Z}}$ is equality complete if for any two terms $u$ and $v$ in $\Phi_{RB}$

– either $u = v$ or $u \neq v$ (but not both) is in $\Phi_{RB}$,
– either $|u|_{\max} = |v|_{\max}$ or $|u|_{\max} \neq |v|_{\max}$ (but not both) is in $\Phi_{\mathbb{Z}}$, and
– either $|u|_{\min} = |v|_{\min}$ or $|u|_{\min} \neq |v|_{\min}$ (but not both) is in $\Phi_{\mathbb{Z}}$.

If a formula $\Phi$ is type and equality complete, then we have all equality information between terms occurring in $\Phi$, and hence $\Phi$ induces a partition of the terms occurring in it. By $\mathrm{CLS}^{r}_n(x, x_1, \ldots, x_n)$ we denote the conjunction of literals expressing that $x, x_1, \ldots, x_n$ are red-terms having the same measure but pairwise distinct. Similarly we define $\mathrm{CLS}^{b}_n(x, x_1, \ldots, x_n)$. It is not hard to see that such $\Phi$ can be rewritten as a conjunction of literals of the forms $\mathrm{CLS}^{r}_n(x, x_1, \ldots, x_n)$ and $\mathrm{CLS}^{b}_n(x, x_1, \ldots, x_n)$.
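As a rough illustration of how such classes give rise to counting obligations, the Python sketch below groups pairwise distinct class representatives by color and measure. The input encoding (triples of a name, a color tag and an opaque measure label) and the function name `counting_obligations` are assumptions made for this example only. A group of $n + 1$ terms corresponds to a literal $\mathrm{CLS}_n$ and therefore needs more than $n$ distinct trees of that color and measure.

```python
from collections import defaultdict


def counting_obligations(representatives):
    """Map each (color, measure label) group to the index n of the CNT_n it needs.

    `representatives` is an iterable of (name, color, measure_label) triples,
    one per equivalence class, assumed pairwise distinct as terms.
    """
    groups = defaultdict(list)
    for name, color, measure_label in representatives:
        groups[(color, measure_label)].append(name)
    # n + 1 pairwise distinct terms of one color and measure require CNT_n.
    return {key: len(names) - 1 for key, names in groups.items()}


example = [
    ("u", "r", "m1"),
    ("v", "r", "m1"),
    ("w", "b", "m1"),
    ("z", "r", "m2"),
]
print(counting_obligations(example))
# {('r', 'm1'): 1, ('b', 'm1'): 0, ('r', 'm2'): 0}
```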


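Definition 7 (Length Constraint Completion (LCC) in RBZ). An $\mathcal{L}^{\mathbb{Z}}_{RB}$-formula $\Phi_{\Delta}(\bar{x})$ is a length constraint completion (LCC) for $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$ if the following formulas are valid:

\begin{align}
(\forall \bar{x} : \mathsf{Trb}) \bigl[\, \Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x}) \rightarrow (\exists \bar{z} : \mathbb{Z}) \bigl( \Phi_{\Delta}(\bar{z}) \wedge |\bar{x}| = \bar{z} \bigr) \bigr], \tag{17} \\
(\forall \bar{z} : \mathbb{Z}) \bigl[\, \Phi_{\Delta}(\bar{z}) \rightarrow (\exists \bar{x} : \mathsf{Trb}) \bigl( \Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x}) \wedge |\bar{x}| = \bar{z} \bigr) \bigr]. \tag{18}
\end{align}

Here $|\bar{x}| = \bar{z}$ denotes a sequence of equalities $|t_i(\bar{x})| = z_i$, and $\Phi_{\Delta}(\bar{z})$ is obtained from $\Phi_{\Delta}(\bar{x})$ with each $t_i(\bar{x})$ replaced by the corresponding $z_i$.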

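Algorithm 4 (Computation of LCC in RBZ). Input: $\Phi_{RB} \wedge \Phi_{\mathbb{Z}}$ (type and equality complete). Initially set $\Phi_{\Delta} = \Phi_{\mathbb{Z}}$. For each term $t$ add the following to $\Phi_{\Delta}$.

– $|t|_{\max} = 1$ and $|t|_{\min} = 1$, if $t \equiv \mathrm{nil}$;
– $|t|_{\max} = |s|_{\max}$ and $|t|_{\min} = |s|_{\min}$, if $t = s$ is present in $\Phi_{RB}$;
– $|t|_{\max} = \max(|t_1|_{\max}, |t_2|_{\max}) + 1$ and $|t|_{\min} = \min(|t_1|_{\min}, |t_2|_{\min}) + 1$, if $t \equiv \mathrm{black}(t_1, t_2)$;
– $|t|_{\max} = 0$ and $|t|_{\min} = 0$, if $t \equiv \mathrm{red}(t_1, t_2)$ and either $\mathrm{Isred}(t_1)$ or $\mathrm{Isred}(t_2)$ is present in $\Phi_{RB}$;
– $|t|_{\max} = |t_1|_{\max} = |t_2|_{\max}$ and $|t|_{\min} = |t_1|_{\max} = |t_2|_{\min}$, if
  1. $t \equiv \mathrm{red}(t_1, t_2)$,
  2. either $\mathrm{Isblack}(t_1)$ or $\mathrm{Isnil}(t_1)$ is present in $\Phi_{RB}$,
  3. either $\mathrm{Isblack}(t_2)$ or $\mathrm{Isnil}(t_2)$ is present in $\Phi_{RB}$;
– $\mathrm{CNT}^{r}_n(|t|_{\max}, |t|_{\min})$, if $\mathrm{CLS}^{r}_n(t, t_1, \ldots, t_n)$ is induced by $\Phi_{RB} \wedge \Phi_{\mathbb{Z}}$ for some $t_1, \ldots, t_n$;
– $\mathrm{CNT}^{b}_n(|t|_{\max}, |t|_{\min})$, if $\mathrm{CLS}^{b}_n(t, t_1, \ldots, t_n)$ is induced by $\Phi_{RB} \wedge \Phi_{\mathbb{Z}}$ for some $t_1, \ldots, t_n$.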
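The measure rules used by Algorithm 4 can be tried out on concrete trees with the Python sketch below. It rests on two labeled assumptions: the rule for a red root whose children are black or nil is read here as taking the maximum and the minimum over the children without increment (the reading under which recurrence (5) counts trees such as red(nil, b)), and a subtree of measure $(0, 0)$ is taken to force measure $(0, 0)$ on every tree containing it. The `Node` type and all function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    color: str                 # "red" or "black"
    left: Optional["Node"]     # None stands for nil
    right: Optional["Node"]


INVALID = (0, 0)               # measure of trees with a red node under a red node


def is_red(t):
    return t is not None and t.color == "red"


def measure(t):
    """Return (|t|max, |t|min) under the assumptions stated in the lead-in."""
    if t is None:
        return (1, 1)                              # nil
    left, right = measure(t.left), measure(t.right)
    if left == INVALID or right == INVALID:
        return INVALID                             # assumption: violations propagate
    if t.color == "black":
        return (max(left[0], right[0]) + 1, min(left[1], right[1]) + 1)
    if is_red(t.left) or is_red(t.right):
        return INVALID                             # red root with a red child
    # assumption: a red root adds no black node to any path
    return (max(left[0], right[0]), min(left[1], right[1]))


def trees_up_to_depth(depth):
    """All trees whose internal nodes are nested at most `depth` levels deep."""
    if depth == 0:
        return [None]
    smaller = trees_up_to_depth(depth - 1)
    return [None] + [
        Node(color, l, r)
        for color in ("red", "black")
        for l in smaller
        for r in smaller
    ]


at_one_one = [t for t in trees_up_to_depth(2) if measure(t) == (1, 1)]
print(len(at_one_one))  # 2
```

Under these assumptions the only trees of depth at most 2 with measure $(1, 1)$ are nil and red(nil, nil).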

Theorem 5 (LCC in RBZ). $\Phi_{\Delta}(\bar{x})$ obtained by Algorithm 4 is an LCC for $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$ and is expressible in a quantifier-free Presburger formula of size linear in the size of $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$.

Proof. It follows immediately from Algorithm 4 that $\Phi_{\Delta}$ is expressible in a quantifier-free Presburger formula of size linear in the size of $\Phi_{RB} \wedge \Phi_{\mathbb{Z}}$. We are left to show the validity of (17) and (18). The validity of (17) is obvious, as every rule in Algorithm 4 is sound. We establish (18) by showing that given a satisfying assignment $\sigma_{\Delta}$ for $\Phi_{\Delta}(\bar{z})$, there is a satisfying assignment $\sigma_{RB}$ for $\Phi_{RB}(\bar{x})$ such that $|\sigma_{RB}| = \sigma_{\Delta}$, that is, $|[\![t(\bar{x})]\!]| = [\![z]\!]$ for each corresponding $t(\bar{x})$ and $z \in \bar{z}$. We assume that $\Phi_{RB}(\bar{x})$ is consistent; otherwise (18) is trivially true. Also, for simplicity we assume that no selectors occur in $\Phi_{RB}(\bar{x}) \wedge \Phi_{\mathbb{Z}}(\bar{x})$.

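It is easily seen that the contra-variant lexicographical order $<_l$ has the subterm property (with respect to $\|\cdot\|$), that is, for $t_1, t_2 \in \mathsf{Trb}$, we have

\begin{align*}
\|t_1\| <_l \|\mathrm{black}(t_1, t_2)\|, \quad \|t_2\| <_l \|\mathrm{black}(t_1, t_2)\|, &\quad \text{if } \|\mathrm{black}(t_1, t_2)\| > 0; \\
\|t_1\| \le_l \|\mathrm{red}(t_1, t_2)\|, \quad \|t_2\| \le_l \|\mathrm{red}(t_1, t_2)\|, &\quad \text{if } \|\mathrm{red}(t_1, t_2)\| > 0.
\end{align*}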

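Let $\sigma_{\Delta}$ be a satisfying assignment of $\Phi_{\Delta}$. We order all integer terms according to the measure induced by $\sigma_{\Delta}$ as follows.

\[
\underbrace{\|t^{(1)}_0\| = \cdots = \|t^{(1)}_{n_1}\|}_{\text{block } 1}
\;<_l\;
\underbrace{\|t^{(2)}_0\| = \cdots = \|t^{(2)}_{n_2}\|}_{\text{block } 2}
\;<_l\; \cdots\cdots \;<_l\;
\underbrace{\|t^{(k)}_0\| = \cdots = \|t^{(k)}_{n_k}\|}_{\text{block } k}
\]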

We assume that no variable is asserted to be nil, as such a variable can be removed by instantiation. Let $M_i$ denote the measure of terms in the $i$-th block. For each $i > 0$, the $i$-th block contains $n_i$ terms, of which $n^{(b)}_i$ are black-terms and $n^{(r)}_i$ are red-terms. In general, $n_i = n^{(b)}_i + n^{(r)}_i$, except for the block containing nil. Without loss of generality we assume that the first block contains terms of measure $(0, 0)$; that is, terms violating red-black tree property (4) (Definition 2). Let us begin to build a partial assignment $\sigma_{RB}$ from the second block, which has measure $(1, 1)$. Only two distinct trees, namely nil and $\mathrm{red}(\mathrm{nil}, \mathrm{nil})$, have measure $(1, 1)$, and hence $\mathrm{CNT}^{r}_n(1, 1)(x)$ will be false for any $n > 0$. Since $\Phi_{\Delta}$ is satisfiable and contains $\mathrm{CNT}^{r}_n(1, 1)(x)$, we have $n = 0$, i.e., there is at most one red-term in this block. If it is a variable, then it can be assigned $\mathrm{red}(\mathrm{nil}, \mathrm{nil})$. Let us assume we have partially assigned all terms up to the $i$-th block. For the $(i+1)$-th block, we first consider all black-terms, i.e., constructor terms of the form $\mathrm{black}(t_1, t_2)$ and variables of black-type. Since both $\|t_1\| <_l \|\mathrm{black}(t_1, t_2)\|$ and $\|t_2\| <_l \|\mathrm{black}(t_1, t_2)\|$, $t_1$ and $t_2$ must have been assigned, and so has $\mathrm{black}(t_1, t_2)$. Due to the presence of $\mathrm{CNT}^{b}_{n^{(b)}_{i+1}}(M_{i+1})$ in $\Phi_{\Delta}$, we are able to assign each such variable a black-tree of measure $M_{i+1}$. Now let us consider the remaining red-terms. For every constructor term of the form $\mathrm{red}(t_1, t_2)$, we know that $t_1$ and $t_2$ must be black-terms or nil (because otherwise $M_{i+1} = (0, 0)$). As before we also have $\|t_1\| \le_l \|\mathrm{red}(t_1, t_2)\|$ and $\|t_2\| \le_l \|\mathrm{red}(t_1, t_2)\|$. So $t_1$ and $t_2$ can only appear in the first $i$ blocks or in this block. In the former case, they have been assigned. In the latter case, they have been assigned too because they are black-trees. So the only unassigned terms are variables of red-type. As $\mathrm{CNT}^{r}_{n^{(r)}_{i+1}}(M_{i+1})$ is present in $\Phi_{\Delta}$, we are able to assign each variable a distinct red-tree of measure $M_{i+1}$. By induction we can build a partial assignment up to the $k$-th block.

Now let us go back to the first block, which contains terms violating property (4). Terms appearing in this block are constructor terms that may be variables or may contain variables appearing in other blocks of higher order. Let us denote by $\bar{y}$ those variables appearing in other blocks and by $\bar{x}$ the rest of the variables. The variables $\bar{y}$ have been assigned values, which have measure greater than $(0, 0)$ (with respect to $<_l$). We assign $\bar{x}$ distinct trees having only red internal nodes, and make sure that for any $x_1, x_2 \in \bar{x}$, the difference between the heights (the longest paths) of $[\![x_1]\!]$ and $[\![x_2]\!]$ is greater than the number of terms appearing so far, including those in the assignment. Since there are infinitely many trees violating property (4), this is obviously feasible. We have now finished the construction of $\sigma_{RB}$.

It is clear that $\sigma_{RB}$ satisfies all disequalities between terms from the 2nd to the $k$-th blocks. Let us consider disequalities of the form $x \neq t(\bar{x}, \bar{y})$ where $x \in \bar{x}$ is in the first block. $t(\bar{x}, \bar{y})$ cannot be some $y \in \bar{y}$, since it would have measure $(0, 0)$, contradicting the fact that it appears in other blocks. For the same reason we will not have disequalities $y \neq t(\bar{x}, \bar{y})$ for $y \in \bar{y}$. Now if $t(\bar{x}, \bar{y})$ is some $x' \in \bar{x}$, then obviously $[\![x]\!] \neq [\![x']\!]$, as they are assigned distinct trees. If $t(\bar{x}, \bar{y})$ is a proper constructor term containing some of $\bar{x}$, then we still have $[\![x]\!] \neq [\![t(\bar{x}, \bar{y})]\!]$, as by the choice of values of $\bar{x}$, $[\![x]\!]$ and $[\![t(\bar{x}, \bar{y})]\!]$ are trees of different height. All in all, $\sigma_{RB}$ respects all disequalities in the first block, and hence $\sigma_{RB}$ is a satisfying assignment such that $|\sigma_{RB}| = \sigma_{\Delta}$.

Theorem 1 (Decidability of RBZ)

1. The first-order theory of RBZ is decidable and admits quantifier elimination.
2. The decision problem for the quantifier-free fragment is NP-complete.

Proof. By Theorem 5 and the results in [23].
