Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of...

40
Proof–theoretic investigations on Kruskal’s theorem Michael Rathjen Department of Mathematics, Ohio State University Columbus, Ohio 43210, USA Andreas Weiermann Institut f¨ ur mathematische Logik und Grundlagenforschung derWestf¨alischenWilhelms–Universit¨atM¨ unster Einsteinstraße 62, D–W–4400 M¨ unster, Germany Abstract In this paper we calibrate the exact proof–theoretic strength of Kruskal’s theorem, thereby giving, in some sense, the most elementary proof of Kruskal’s theorem. Furthermore, these investigations give rise to ordinal analyses of restricted bar induction. Introduction S.G. Simpson in his article [10], “Nonprovability of certain combinatorial properties of finite trees”, presents proof–theoretic results, due to H. Friedman, about embeddability properties of finite trees. It is shown there that Kruskal’s theorem is not provable in ATR 0 . An exact description of the proof–theoretic strength of Kruskal’s theorem is not given. On the assumption that there is a bad infinite sequence of trees, the usual proof of Kruskal’s theorem utilizes the existence of a minimal bad sequence of trees, thereby employing some form of Π 1 1 comprehension. So the question arises whether a more constructive proof can be given. The need for a more elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem figures prominently in computer science, because it is the main tool for showing that sets of rewrite rules are terminating (see [3], p. 258, where this challenge is offered). Our paper gives a complete proof–theoretic characterization of Kruskal’s theorem in terms of ordinal notation systems, subsystems of second order arithmetic, and subsystems of Kripke– Platek set theory. The paper is divided into eleven parts. In Section 1 we introduce an ordinal notation system T A , which represents the Ackermann- ordinal (see [2] for a definition) in a natural way. It is shown that within ACA 0 , Kruskal’s theorem implies the well–foundedness of T A . The reversal of the latter implication constitutes the content of Section 2. Given a bad sequence of trees, we show how to produce effectivly a strictly descending sequence of ordinals in T A of the same length. 1

Transcript of Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of...

Page 1: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proof–theoretic investigations on Kruskal’s theorem

Michael RathjenDepartment of Mathematics, Ohio State University

Columbus, Ohio 43210, USA

Andreas WeiermannInstitut fur mathematische Logik und Grundlagenforschung

der Westfalischen Wilhelms–Universitat Munster

Einsteinstraße 62, D–W–4400 Munster, Germany

Abstract

In this paper we calibrate the exact proof–theoretic strength of Kruskal’s theorem, therebygiving, in some sense, the most elementary proof of Kruskal’s theorem.

Furthermore, these investigations give rise to ordinal analyses of restricted bar induction.

Introduction

S.G. Simpson in his article [10], “Nonprovability of certain combinatorial properties of finitetrees”, presents proof–theoretic results, due to H. Friedman, about embeddability propertiesof finite trees. It is shown there that Kruskal’s theorem is not provable in ATR0. An exactdescription of the proof–theoretic strength of Kruskal’s theorem is not given. On the assumptionthat there is a bad infinite sequence of trees, the usual proof of Kruskal’s theorem utilizes theexistence of a minimal bad sequence of trees, thereby employing some form of Π1

1 comprehension.So the question arises whether a more constructive proof can be given. The need for a moreelementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem figuresprominently in computer science, because it is the main tool for showing that sets of rewriterules are terminating (see [3], p. 258, where this challenge is offered).

Our paper gives a complete proof–theoretic characterization of Kruskal’s theorem in termsof ordinal notation systems, subsystems of second order arithmetic, and subsystems of Kripke–Platek set theory.

The paper is divided into eleven parts.In Section 1 we introduce an ordinal notation system TA, which represents the Ackermann-

ordinal (see [2] for a definition) in a natural way. It is shown that within ACA0, Kruskal’stheorem implies the well–foundedness of TA.

The reversal of the latter implication constitutes the content of Section 2. Given a badsequence of trees, we show how to produce effectivly a strictly descending sequence of ordinalsin TA of the same length.

1

Page 2: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

The equivalence of Kruskal’s theorem with the well–foundedness of TA then provides anupper bound for the order–types of simplification orderings, since Kruskal’s theorem can beused to proof the well–foundedness of these orderings.

For α ≤ ω let T(α) be the set of finite trees T such that every vertex of T has less thanα immediate successors (so KT (ω) is just Kruskal’s theorem). T(α) is quasi-ordered by thenatural tree-embeddability relation. Let KT (α) be the statement ”T(α) is well-quasi-ordered.”.We also show that ∀nKT (n) and KT (ω) are equivalent over ACA0.

In [10], p. 99, it is stated that Kruskal’s theorem is provable in the formal system T :=ACA0+Π1

2−BI. The investigations of this paper were mainly prompted by this remark. It turnsout that this is not quite true. Indeed, ACA0 +KT (ω) proves the uniform Π1

1 reflection principleof the latter theory, RFNΠ1

1(T ), and is therefore a stronger theory. The remainder of this paper

is devoted to proving T := ACA0 + Π12 −BI 0 ∀nKT (n) and ACA0 ` KT (ω)↔ RFNΠ1

1(T ).

Sections 4–10 pursue the ordinal analysis of the system ACA0 + Π12 − BI, thereby showing

that the order type of TA is the proof–theoretic ordinal of the latter system. The methodsused here are perfectly general in that they provide a general framework for analyzing all thetheories ACA0 + Π1

n−BI for n ≥ 2. Using results from [5], we also establish the proof–theoreticequivalence of ACA0 + Π1

n − BI and KPω− + Πn−Foundation for all n ≥ 2, where KPω−

stands for Kripke-Platek set theory including infinity but with foundation restricted to sets.An effective version of the ordinal analysis of ACA0 + Π1

2 − BI is sketched in Section 11,finally establishing ACA0 ` KT (ω)↔ RFNΠ1

1(T ).

1 An ordinal notation system

Firstly, we need some ordinal–theoretic background. Let On be the class of ordinals. LetAP := ξ ∈ On : ∃η ∈ On[ξ = ωη] be the class of additive principal numbers and let E := ξ ∈On : ξ = ωξ be the class of ε–numbers which is enumerated by the function λξ.εξ.

We write α =NF ωβ + δ if α = ωβ + δ and either, δ = 0 and β < α, or δ = ωδ1 + · · · + ωδk

with β ≥ δ1 ≥ . . . ≥ δk and k ≥ 1.Note that by Cantor’s normal form theorem, for every α /∈ E ∪ 0, there are uniquely

determined ordinals β and δ such that α =NF ωβ + δ.

Let Ω := ℵ1. For any α < εΩ+1 we define the set EΩ(α) which consists of the ε–numbersbelow Ω which are needed for the unique representation of α in Cantor normal form recursivelyas follows:

1. EΩ(0) := EΩ(Ω) := ∅,

2. EΩ(α) := α, if α ∈ E ∩ Ω,

3. EΩ(α) := EΩ(β) ∪ EΩ(δ) if α =NF ωβ + δ.

Let α∗ := max(EΩ(α) ∪ 0).

We define sets of ordinals C(α, β), Cn(α, β), and ordinals ϑα by main recursion on α < εΩ+1

and subsidiary recursion on n < ω (for β < Ω) as follows.

2

Page 3: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

(C1) 0,Ω ∪ β ⊆ Cn(α, β),

(C2) γ, δ ∈ Cn(α, β) ∧ ξ =NF ωγ + δ =⇒ ξ ∈ Cn+1(α, β),

(C3) δ ∈ Cn(α, β) ∩ α =⇒ ϑδ ∈ Cn+1(α, β),

(C4) C(α, β) :=⋃Cn(α, β) : n < ω,

(C5) ϑα := minξ < Ω: C(α, ξ) ∩ Ω ⊆ ξ ∧ α ∈ C(α, ξ).

Lemma 1.1 ϑα is defined for every α < εΩ+1.

Proof: Let β0 := α∗ + 1. Then α ∈ C(α, β0) via (C1) and (C2). Since the cardinality ofC(α, β) is less than Ω there exists a β1 < Ω such that C(α, β0)∩Ω ⊂ β1. Similarly there existsfor each βn < Ω (which is constructed recursively) a βn+1 < Ω such that C(α, βn) ∩ Ω ⊆ βn+1.Let β := supβn : n < ω. Then α ∈ C(α, β) and C(α, β)∩Ω ⊂ β < Ω. Therefore ϑα ≤ β < Ω.2

Lemma 1.2 1. ϑα ∈ E,

2. α ∈ C(α, ϑα),

3. ϑα = C(α, ϑα) ∩ Ω, and ϑα /∈ C(α, ϑα),

4. γ ∈ C(α, β) ⇐⇒ γ∗ ∈ C(α, β),

5. α∗ < ϑα,

6. ϑα = ϑβ =⇒ α = β,

7. ϑα < ϑβ ⇐⇒ (α < β ∧ α∗ < ϑβ) ∨ (β < α ∧ ϑα ≤ β∗),

8. β < ϑα ⇐⇒ ωβ < ϑα.

Proof: 1.) and 8.) issue from closure of ϑα under (C2).2.) and 3.) follow from Lemma 1.1 and the definition of ϑα.4.): If γ∗ ∈ C(α, β), then γ ∈ C(α, β) by (C2). On the other hand,

γ ∈ Cn(α, β) =⇒ γ∗ ∈ Cn(α, β) is easily seen by induction on n.5.): α∗ ∈ C(α, ϑα) holds by 4.). As α∗ < Ω, this implies α∗ < ϑα by 3.).6.): Suppose, aiming at a contradiction, that ϑα = ϑβ and α < β. Then C(α, ϑα) ⊆

C(β, ϑβ); hence α ∈ C(β, ϑβ) ∩ β by 2.); thence ϑα = ϑβ ∈ C(α, ϑβ), contradicting 3.).7.): Suppose α < β. Then ϑα < ϑβ implies α∗ < ϑβ by 5.). If α∗ < ϑβ, then α ∈ C(β, ϑβ);

hence ϑα ∈ C(β, ϑβ); thus ϑα < ϑβ. This shows

(a) α < β =⇒ (ϑα < ϑβ ⇐⇒ α∗ < ϑβ).

By interchanging the roles of α and β, and employing 5.), one obtains

(b) β < α =⇒ (ϑα < ϑβ ⇐⇒ ϑα ≤ β∗).

(a) and (b) yield 7.). 2

The Ackermann ordinal is denoted in this context by ϑΩω.

3

Page 4: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Definition 1.1 Inductive definition of a set OT (ϑ) of ordinals and a natural number Gϑα forα ∈ OT (ϑ).

1. 0,Ω ∈ OT (ϑ), Gϑ0 := GϑΩ := 0,

2. α =NF ωβ + δ ∧ β, δ ∈ OT (ϑ) =⇒ α ∈ OT (ϑ), Gϑα := maxGϑβ,Gϑδ+ 1,

3. α = ϑα1 ∧ α1 ∈ OT (ϑ) =⇒ ϑα1 ∈ OT (ϑ), Gϑα := Gϑα1 + 1.

Observe that according to Lemma 1.2.1.) and 1.2.6.), the function Gϑ is well-defined. Eachordinal α ∈ OT (ϑ) has a unique normal form using the symbols 0,Ω,+, ω, , ϑ. Furthermore,if for α, β ∈ OT (ϑ), represented in their normal form, we were to decide α < β, we could dothis by deciding α0 < β0 for ordinals α0 and β0 that appear in these representations and, inaddition, satisfy Gϑα0 +Gϑβ0 < Gϑα+Gϑβ. This follows from Lemma 1.2 7.) and the recursiveprocedure for comparing ordinals in Cantor normal form. So we come to see the following fact.

Lemma 1.3 After a straightforward coding in the natural numbers, we may consider 〈OT (ϑ), <OT (ϑ)〉 as a primitive recursive ordinal notation system.

Lemma 1.4 1. OT (ϑ) =⋃C(α, 0) : α < εΩ+1,

2. OT (ϑ) ∩ α = α for α ∈ OT (ϑ) ∩ Ω.

From now on, we presume an effective coding of 〈OT (ϑ), < OT (ϑ)〉 in the natural numbers,so that the latter structure can be dealt with in ACA0 (actually in primitive recursive arithmetic).Of course, the well–foundedness of 〈OT (ϑ), < OT (ϑ)〉 is not provable in ACA0.

Next, we recall some basic definitions from [10]. A finite tree T is a finite partial order〈T,≤T 〉 such that,

1. (∃r ∈ T )(∀t ∈ T )[t 6= r → r ≤T t ∧ t 6≤T r],

2. (∀s ∈ T )(∀t ∈ T )(∀u ∈ T )[t ≤T s ∧ u ≤T s → t ≤T u ∨ u ≤T t].

For a finite tree T = 〈T,≤T 〉 and t, u ∈ T we denote the ≤T -infimum of t and u by t ∧T u.The uniquely determined ≤T -minimal element is called the root of T .

A finite tree T = 〈T,≤T 〉 is embeddable into a finite tree U = 〈U,≤U 〉 if there exists anone-to-one (embedding-) function f : T → U such that f(t ∧T u) = f(t) ∧U f(u) for all t, u ∈ T.

For a ∈ T let T a := b ∈ T : a ≤T b and T a = 〈T a,≤T T a〉. An immediate subtree U ofT has the form T a where a is an immediate ≤T -successor of the root of T . Every immediatesubtree U of T is embeddable into T .

Theorem 1.1 (Kruskal) For any ω-sequence 〈Ti : i < ω〉 of finite trees there exist indices i andj such that i < j < ω and Ti is embeddable into Tj.

In [10] it is shown that Kruskal’s theorem implies the well–foundedness of Γ0 within ACA0.This proof can be extended to also yield the well–foundedness of ϑΩω from Kruskal’s theorem inACA0. The proof utilizes a normal form for ordinals < ϑΩω. This normal form can be computedprimitive recursively.

We require some notation. Let Ω · 0 := 0 and Ω · (n + 1) := Ω · n + Ω. Let Ωn · 0 := 0. Ifβ = ωβ1 + · · ·+ ωβk and β1 ≥ . . . ≥ βk we set Ωn · β := ωΩ·n+β1 + · · ·+ ωΩ·n+βk .

4

Page 5: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proposition 1.1 Let α ∈ E ∩ ϑΩω. Then there exists a unique n < ω and unique ordinalsα0, . . . , αn < α such that α = ϑ(Ωn · αn + · · ·+ Ω0 · α0), and αn 6= 0 if n 6= 0.

Definition 1.2 For any α ≤ ω let T(α) be the set of trees T such that any vertex in T has lessthan α immediate successors. We use KT (α) to abbreviate that T(α) is well–quasi–ordered.

For α ∈ OT (ϑ) let WF (α) stand for “< restricted to β ∈ OT (ϑ) : β < α is well–founded”.

For ordinals α, β we denote by α⊕ β the (commutative) natural sum of α and β and by α⊗ βthe (commutative) natural product of α and β. These operations are defined as follows: Letα⊕ 0 := 0⊕ α := α and α⊗ 0 := 0⊗ α := 0.

Now suppose α = ωδ0 + · · ·+ωδn ≥ δ0 ≥ . . . ≥ δn, n ≥ 0 and β = ωδn+1 + · · ·+ωδm ≥ δn+1 ≥. . . ≥ δm,m ≥ n+ 1. Then

α⊕ β := ωδπ(0) + · · ·+ ωδπ(m) ,

where π is a permutation of 0, . . . ,m such that δπ(i) ≥ δπ(j) if i < j ≤ m. We define α⊗ β tobe

(ωδ0⊕δn+1 ⊕ · · · ⊕ ωδ0⊕δm)⊕ · · · ⊕ (ωδn⊕δn+1 ⊕ · · · ⊕ ωδn⊕δm).

Theorem 1.2 ACA0 ` (∀n < ω)KT (n)→WF (ϑΩω)

Proof. We reason within ACA0. We shall define a primitive recursive mapping

o : T(ω) −→ α ∈ OT (ϑ) : α < ϑΩω

by recursion on the number of elements of a tree T , | T |. If T consists only of its root, thenlet o(T ) = 0. Otherwise the root of T has finitely many immediate successors a0, . . . , ak. LetTi be the immediate subtree of T determined by ai. Since | Ti | < | T |, we may assume thatαi := o(Ti) is already defined. We may further assume that α0 ≥ . . . ≥ αk. We set o(T ) = α0

if k = 0, o(T ) = α1 ⊕ α0 if k = 1 (⊕ being defined in Definition 1.2), o(T ) = ωα0 if k = 2 andωα0 > α0, o(T ) = ωα0+1 if k = 2 and ωα0 = α0 and o(T ) = ϑα0 if k = 3. To deal with the casek ≥ 4 we introduce some auxiliary notation. For finite sequences of ordinals we define

〈β0, . . . , βm〉 <l 〈γ0, . . . γn〉 ⇐⇒maxβ0, . . . , βm ≤ maxγ0, . . . γn∧[2 ≤ m < n ∨ [2 ≤ m = n ∧ (∃j < m)[βj < γj ∧ (∀k < j)βk = γk]

].

For a finite sequence 〈β0, . . . , βm〉 (m ≥ 1) of countable ordinals let

ϑ(〈β0, . . . , βm〉) := ϑ(Ωm · (1 + βm) + Ωm−1 · βm−1 + · · ·+ Ω0 · β0).

Then, by Lemma 1.5 below, 〈β0, . . . , βm〉 <l 〈γ0, . . . , γn〉 implies ϑ(〈β0, . . . , βm〉) < ϑ(〈γ0, . . . , γn〉).Now we can complete the definition of o(T ) for k ≥ 4. Let

S := 〈βπ(0), . . . , βπ(i)〉 : 1 ≤ i ≤ k and π is a permutation of 0, . . . , i.

Let 〈β0, . . . , βi0〉 be the (k−3)-th element of S with respect to <l and let o(T ) = ϑ(〈β0, . . . , βi0〉).Note that this assignment of ordinals to trees is weakly increasing, i.e, o(Ta) ≤ o(T ) if Ta is animmediate subtree of T . This is due to the fact, that for β0, . . . , βn < Ω we have β0, . . . , βn <ϑ(Ωn · βn + · · ·+ Ω0 · β0).

Now we define (for technical reasons) a set of ordinal representations OT ′ ⊆ ϑΩω as follows.

5

Page 6: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

1. 0 ∈ OT ′.

2. γ = ωα + β & α, β ∈ OT ′ ⇒ γ ∈ OT ′.

3. α0, . . . , αn ∈ OT ′ & cardα0, . . . , αn = n+ 1⇒ ϑ(Ωn · αn + · · ·Ω0 · α0) ∈ OT ′,

where card(X) stands for the number of elements of a finite set X.Then the order type of OT ′ is equal to ϑΩω. Moreover the well-foundedness of ϑΩω follows

(provably in ACA0) from the well-foundedness of OT ′. For, let h : ϑΩω → OT ′ be defined byh(0) := 0, h(ωα + β) := ωh(α) + h(β) and

h(ϑ(Ωn · αn + · · ·+ Ω0 · α0)) := ϑ(Ωn(ω · h(αn) + n) + · · ·+ Ω0 · (ω · h(α0) + 0)).

Then γ < δ < ϑΩω implies h(γ) < h(δ) as can be readily verified.By induction on Gϑα one easily verifies that for α ∈ OT ′ there is a tree T such that o(T ) = α.

For α ∈ E one has to employ Proposition 1.1.Given an embedding of trees f : T1 −→ T2, where o(T1), o(T2) ∈ OT ′, we claim that o(T a1 ) ≤

o(T f(a)2 ) for each a ∈ T1. The proof is by induction on | T a1 |; it just springs from the above

considerations and the following observation. Let ξ0 < . . . , ξm < Ω, η0 < . . . < ηn < Ω andg : ξ0, . . . , ξm → η0, . . . , ηn be a one to one mapping such that g(ξi) ≥ ξi for 1 ≤ i ≤ m.Then

Ωn · ξπ(n) + · · ·+ Ω0 · ξπ(0) ≤ Ωn · g(ξπ(n)) + · · ·+ Ω0 · g(ξπ(0))

holds for every permutation π : 1, . . . ,m → 1, . . . ,m.If now 〈αk : k < ω〉 were an infinitely descending sequence of ordinals from OT ′, then

we would get a corresponding sequence of finite trees 〈Tk : k < ω〉 with o(Tk) = αk for eachk < ω. Since supϑ(Ωn + · · · + Ω0) : n < ω = ϑΩω, there would exist n0 such that for allk < ω, Tk ∈ T(n0). Therefore, by KT (n0), we could pick i < j < ω such that Ti ≤ Tj .Hence o(Ti) = αi ≤ o(Tj) = αj , contradicting the assumption that 〈αk : k < ω〉 is an infinitelydescending sequence of ordinals. 2

In the next Section we will frequently draw on the following result (we already did in the previousTheorem).

Lemma 1.5 Suppose α0, . . . , αm, β0, . . . , βn ∈ OT (ϑ) ∩ Ω. Assume Ωn · βn + · · · + Ω0 · β0 <Ωm · αm + · · ·+ Ω0 · α0 and max(β∗0 , . . . , β

∗n) ≤ max(α∗0, . . . , α

∗m). Then

ϑ(Ωn · βn + · · ·+ Ω0 · β0) < ϑ(Ωm · αm + · · ·+ Ω0 · α0).

Proof. This follows from lemma 1.2 7.). ut

2 An elementary proof of Kruskal’s theorem

From now on we argue in ACA0 (in fact, RCA0 would suffice).Let TA := α ∈ OT (ϑ) : α < ϑΩω and <TA := (< OT (ϑ)) TA. Then 〈TA, <TA〉 is

a primitive recursive ordinal notation system for the Ackermann ordinal. We introduce some

6

Page 7: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

more notation. A quasi–order X is an ordered pair 〈X,≤〉 where X is a countable set and ≤ is abinary, reflexive and transitive relation on X. X = ∅ is explicitly allowed. A (finite or countablyinfinite) sequence ~x = 〈x0, x1, . . .〉 of elements in X is called bad if there are no indices i and jsuch that i < j < length(~x) and xi ≤ xj . In particular, the empty sequence is bad.

X is a well–quasi–order if all bad sequences in X are finite. Let Bad(X) be the set of allfinite bad sequences in X. A reification of X into α ∈ OT (ϑ) is a mapping f : Bad(X)→ α+ 1such that f(~x_y) < f(~x) for every ~x ∈ Bad(X) and every y ∈ X such that ~x_y ∈ Bad(X).

Lemma 2.1 The following is provable in ACA0: If f is a reification of X into α and α iswell–founded, then X is a well–quasi–order.

Proof: An infinite bad sequence in X would give rise to an infinite descending chain below α+1.2

We introduce some more terminology. Let X = 〈X,≤〉 be a quasi-order. For ~x ∈ Bad(X) letX~x := y ∈ X : ~x_y ∈ Bad(X) and X~x := 〈X~x,≤ X~x〉.Let X0 = 〈X0,≤0〉 and X1 = 〈X1,≤1〉 be quasi–orders.

We define quasi–orders X0⊕X1 and X0⊗X1 as follows. The domain of X0⊕X1 is the disjointunion X0 ] X1 of X0 and X1. Therefore X0 ] X1 consists of ordered pairs 〈0, x0〉 and 〈1, x1〉where x0 ∈ X0 and x1 ∈ X1. X0 ]X1 is quasi–ordered by a relation ≤0⊕≤1 as follows:

〈i, u〉 ≤0⊕≤1 〈j, v〉 :⇐⇒ i = j ∧ u ≤i v.

When we write X0]X1, we assume (without loss of generality) that X0 and X1 are disjoint andwe will identify x0 ∈ X0 with 〈0, x0〉 and x1 ∈ X1 with 〈1, X1〉. The domain of X0 ⊗ X1 is thecartesian product X0 ×X1 of X0 and X1. X0 ×X1 is quasi-ordered by the relation ≤0⊗≤1 asfollows:

〈x0, x1〉 ≤0⊗≤1 〈x′0, x′1〉 :⇐⇒ x0 ≤0 x′0 ∧ x1 ≤1 x

′1.

For a given quasi–order X = 〈X,≤〉 we define the quasi–order X<ω as follows:The domain of X<ω consists of the set X<ω of all finite sequences in X. X<ω is quasi–orderedby a relation ≤<ω as follows:

〈x1, . . . , xm〉 ≤<ω 〈x′1, . . . , x′n〉

if and only if there exists a sub–sequence 1 ≤ i1 < . . . < im ≤ n such that xl ≤ x′il for every

1 ≤ l ≤ m.1If X0 and X1 are well–quasi–orders then X0 ⊕ X1,X0 ⊗ X1 and X<ω0 are well–quasi–orders,

too. According to [8], this can be shown in ACA0 (especially, Higman’s lemma is provable inACA0).

A finite tree T with labels in a quasi-order X = 〈X,≤〉 is an ordered triple 〈T,≤T , lT 〉 suchthat 〈T,≤T 〉 is a finite tree and lT : T → X. If T = 〈T,≤T , lT 〉 and U = 〈U,≤U , lU 〉 are finitetrees with labels in a quasi-order X = 〈X,≤X〉 we say that T is embeddable into U if there existsan embedding-function f : T → U such that lT (t) ≤X lU (f(t)) for every t ∈ T . Let T (X) be theset of finite trees with labels in X and let ≤T (X) be the corresponding embeddability relation.Let T(X) := 〈T (X),≤T (X)〉.

1To be precise, the empty sequence is the bottom element with respect to the ordering ≤<ω.

7

Page 8: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Theorem 2.1 (Kruskal) T(X) is a well-quasi-order for every well-quasi-order X.

We want to show that the existence of a reification of X into α for appropriate implies theexistence of a reification of T(X) into ϑ(Ωω · α). This will imply Kruskal’s theorem by takingthe domain of X to be a singleton, i.e., α = 1. For technical reasons we introduce the followingterminology, which is due to Schmidt [6].

Definition 2.1 Let Xi = 〈Xi,≤i〉 (i = 0, . . . , n) be pairwise disjoint quasi–orders and letα0, . . . , αn ordinals such that 0 < α0 < . . . < αn ≤ ω. Let X := X0 ⊕ . . .⊕ Xn. Let

T

(X0 . . . Xnα0 . . . αn

)be the set of all finite trees T = 〈T,≤T , lT 〉 in T (X) such that for every vertex t ∈ T , if the labellT (t) of t is in Xi, then t has strictly fewer than αi immediate successors in T . Let ≤T ( ~X,~α) be

the restriction of the tree-embeddability-relation to T

(X0 . . . Xnα0 . . . αn

)and let

T(

X0 . . . Xnα0 . . . αn

):= 〈T

(X0 . . . Xnα0 . . . αn

),≤T ( ~X,~α)〉.

An ordinal ρ is said to be additively closed if α+β < ρ holds whenever α, β < ρ. The first threeordinals that are additively closed are 0, 1, ω. The additively closed ordinals > 0 are exactlythe ordinals of the form ωδ for some δ. From the latter it immediately follows that the productof two additively closed ordinals is additively closed, too. Also note that every ε-number isadditively closed.

Theorem 2.2 The following is provable in ACA0: Let Xi (i = 0, . . . , n) be pairwise disjointquasi–orders Let 0 < α0 < . . . < αn ≤ ω and β0, . . . , βn ∈ OT (ϑ) ∩ Ω. Suppose that for eachi ∈ 0, . . . , n, fi is a reification of Xi into βi, where βi is additively closed and, moreover, therange of fi consists entirely of additively closed ordinals. Then there exists a reification of

T(

X0 . . . Xnα0 . . . αn

)into ϑ(Ωαn · βn + . . .+ Ωα0 · β0).

The proof requires some preparations. Let X0, . . . ,Xn be names, that means appropriate Godel-numbers, for pairwise disjoint quasi–orders X0, . . . ,Xn and let 0 < α0 < . . . < αn ≤ ω. Let

Comp

(X0 . . . Xnα0 . . . αn

)be the least collection (of names for quasi–orders) which is closed under the following rules:

1. For every i ≤ n:

Xi ∈ Comp(

X0 . . . Xnα0 . . . αn

)is a name for Xi.

8

Page 9: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

2. For every i ≤ n and every ~xi ∈ Bad(Xi) :

Xi~xi ∈ Comp(

X0 . . . Xnα0 . . . αn

)is a name for Xi~xi .

3. If

Y0, . . . ,Yr ∈ Comp(

X0 . . . Xnα0 . . . αn

)are names for Y0, . . . ,Yr then

⊕Yl : l ≤ r, ⊗Yl : l ≤ r ∈ Comp =

(X0 . . . Xnα0 . . . αn

)are names for ⊕Yl : l ≤ r and ⊗ Yl : l ≤ r, respectively.

4. If

X ∈ Comp(

X0 . . . Xnα0 . . . αn

)is a name for X then

X<ω ∈ Comp(

X0 . . . Xnα0 . . . αn

)is a name for X<ω.

5. If

Y0, . . . ,Ym ∈ Comp(

X0 . . . Xnα0 . . . αn

)are names for pairwise disjoint quasi–orders Y0, . . . ,Ym and0 < γ0 < . . . < γm ≤ ω and γm ≤ αn, then

T

(Y0 . . . Ym

γ0 . . . γn

)∈ Comp

(X0 . . . Xnα0 . . . αn

)is a name for

T(

Y0 . . . Ymγ0 . . . γm

).

For the remainder of this section we will assume that Xi (i = 0, . . . , n) are pairwise disjointquasi–orders and that we have ordinals β0, . . . , βn ∈ OT (ϑ)∩Ω and reifications fi (i ∈ 0, . . . , n)of Xi into βi, where βi is additively closed and, moreover, the range of fi consists entirely ofadditively closed ordinals. The reason for this requirement will become clear in the proof of casetwo of Lemma 2.3 below.

9

Page 10: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Definition 2.2 Let X0, . . . ,Xn be names for the quasi–orders X0, . . ., Xn and let 0 < α0 <

. . . < αn ≤ ω. For each X ∈ Comp(

X0 . . . Xnα0 . . . αn

)let o(X) be defined recursively as follows:

1. o(Xi) := βi.

2. o(Xi~xi) := fi(~xi).

3. o(⊕ Yl : l ≤ r

):= ⊕o(Yl) : l ≤ r and o

(⊗ Yl : l ≤ r

):= ⊗o(Yl) : l ≤ r, where

⊕ and ⊗ featuring in the right hand sides of these equations refer to the natural ordinalsum and the natural ordinal product, respectively, introduced in Definition 1.2.

4. o(X<ω) := ϑ(o(X)).

5. o(T

(Y0 . . . Ym

γ0 . . . γm

)):= ϑ

(Ωγm · o(Ym) + · · ·+ Ωγ0 · o(Y0)

).

Definition 2.3 Let 0 < α0 < . . . < αn ≤ ω. For every

X ∈ Comp(

X0 . . . Xnα0 . . . αn

)we define the complexity number c(X) recursively as follows:

1. c(Xi) := 0,

2. c(Xi~xi) := 0,

3. c(⊕Yl : l ≤ r) := c(⊗Yl : l ≤ r):= maxc(Yl) : l ≤ r+ 1,

4. c(X<ω) := c(X) + 1,

5. c(T

(Y0 . . . Ym

α0 . . . αm

)):= maxc(Yl) : l ≤ m+ 1

Definition 2.4 Let X0 = 〈X0,≤0〉 and X1 := 〈X1,≤1〉 be two quasi–orders.A function e : X0 → X1 is called a quasi–embedding, if e(x0) ≤1 e(x

′0) implies x0 ≤0 x′0 for

every x0, x′0 ∈ X.

Lemma 2.2 1. Let X0 = 〈X0,≤0〉,X1 = 〈X1,≤1〉 and X2 = 〈X2,≤2〉 be quasi–orders ande0 : X0 → X1, e1 : X1 → X2 be quasi–embeddings. Then e1 e0 : X0 → X2 is aquasi–embedding.

2. Let X0 = 〈X0,≤0〉,X1 = 〈X1,≤1〉 be quasi-orders, e : X0 → X1 be a quasi–embeddingand f : X1 → δ be a reification of X1 into δ. Then f and e induce the reification f e ofX0 into δ.

10

Page 11: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Definition 2.5 Let X = 〈X,≤〉 be a quasi–order and let x ∈ X.LX(x) := y ∈ X : ¬ x ≤ y. LX(x) is quasi–ordered by the restriction of ≤ to LX(x).Let X(x) := 〈LX(x),≤ LX(x)〉.

Note that a quasi–order X is a well–quasi–order if X(x) is a well–quasi–order for every x ∈ X .Theorem 2.2 will follow from the next lemma. (Our proof of this lemma is partly based on [6].)

Lemma 2.3 Let 0 < α0 < . . . < αn ≤ ω,

X ∈ Comp(

X0 . . . Xnα0 . . . αn

)be a name for a quasi–order X = 〈X,≤〉, and x ∈ X. Then there exists a name

Xx ∈ Comp(

X0 . . . Xnα0 . . . αn

)for a quasi–order Xx = 〈Xx,≤x〉 and a quasi–embedding e(X, x) : LX(x) → Xx such thato(Xx) < o(X).

We show how the theorem follows from this lemma. Let

〈s0, . . . , sk〉 ∈ Bad(T(

X0 . . . Xnα0 . . . αn

))and define recursively (through the use of Lemma 2.3)

f(〈s0, . . . sk〉) := o

(. . .

(T

(X0 . . . Xn

α0 . . . αn

)t0). . .

)tk,

where t0 := s0 , T0 := T

(X0 . . . Xn

α0 . . . αn

), e0 := e(T0, t0) , and, for i < k, ei+1 := e(Ti, ti) ei,

ti+1 = ei+1(si+1) and Ti+1 := Ttii . Then this function f is a reification of

T(

X0 . . . Xnα0 . . . αn

)into ϑ

(Ωαn · βn + . . .+ Ωα0 · β0

).

Proof of Lemma 2.3 by induction on c(X). Our proof strategy is to define e(X, x) by primitiverecursion in previously defined functions.

Case 1: c(X) = 0.Assume X = Xi~xi and ~xi ∈ Bad(Xi). Then LXi

~xi

(x) = Xi〈~xi,x〉 since x ∈ Xi

~xi.

Then e(X, x) := id LXi~xi

(x) is quasi–embedding of X(x) into Xi〈~xi,x〉. Put Xx := Xi〈~xi,x〉 and let

Xx be a name for Xx. Then o(Xx) = fi(〈~xi, x〉) < fi(~x) = o(Xi〈~xi〉).

Case 2: X = ⊕Yl : l ≤ r, X = ⊕Yl : l ≤ r or X = (Y)<ω. In these cases the assertionfollows by arguments employed in [8]. The proof is actually similar to the proof of Case 3but much simpler. However, we will present more details to clarify the role of the standingassumption that the βi are additively closed and the range of fi consists of additively closed

11

Page 12: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

ordinals. Mainly due to distributivity of ⊗ over ⊕, i.e., X0 ⊗ (X2 ⊕ X3) being isomorphic to(X0⊗X1)⊕(X0⊗X2) and o(X0⊗(X2⊕X3)) = o((X0⊗X1)⊕(X0⊗X2)), every X of either formX = ⊕Yl : l ≤ r or X = ⊕Yl : l ≤ r is a name of a quasi-order isomorphic to a quasi-orderwith name ⊕

Yi0 ⊗ . . .⊗Yini : i ≤ m

such that o(X) = o(⊕Yi0 ⊗ . . . ⊗ Yini : i ≤ m), c(Yij) < c(X) and o(Yij) is additively

closed. This is due to the fact that the ‘atomic’ names of Definition 2.3 (introduced in the firsttwo clauses) get assigned additively closed ordinals and that every ordinal of the form ϑα isadditively closed.

Now assume x ∈⊕Y i0 ⊗ . . . ⊗ Y ini : i ≤ m. Without loss of generality suppose x =

(x0, . . . , xn0) ∈ Y 00 ⊗ . . .⊗ Y 0n0 . Then put

Xx := ((Y00)x0 ⊗Y01 ⊗ . . .⊗Y0n0)

⊕ (Y00 ⊗ (Y01)x1 ⊗ . . .⊗Y0n0)

...

⊕ (Y00 ⊗Y01 ⊗ . . .⊗ (Y0n0)xn0 )

⊕ (Y10 ⊗Y11 ⊗ . . .⊗Y1n1)

...

⊕ (Ym0 ⊗Ym1 ⊗ . . .⊗Ymnm).

By the induction hypothesis, o((Y0j)xj ) < o(Y0j), and thus

o(((Y00)x0⊗Y01⊗ . . .⊗Y0n0)⊕ . . .⊕ (Y00⊗Y01⊗ . . .⊗ (Y0n0)xn0 )) < o(Y00⊗Y01⊗ . . .⊗Y0n0)

since the latter ordinal, being a product of additively closed ordinals, is additively closed. As aresult, o(Xx) < o(X). It is also clear how to construct the quasi-embedding e(X, x) : LX(x)→ Xx

utilizing the inductively available quasi-embeddings e(Y0j , xj) : LY 0j (xj)→ Y 0j (see [8]).

Case 3:

X = T

(Y0 . . . Ym

γ0 . . . γm

).

Let X be the domain of X and let x ∈ X. Then x is a finite tree with labels in ⊕Yl | l ≤ m.At this stage of the proof we employ a subsidiary induction on the number of vertices (elements)of x.

Subcase 3.1: Assume x is a tree consisting only of a root with a label y ∈ Y i. By themain induction hypothesis we can pick a quasi–embedding e(Yi, y) : LY i(y)→ (Y i)y such thato((Yi)y) < o(Yi). The quasi-embedding e(Yi, y) induces a natural quasi–embedding of

T(. . . Yi(y) . . .. . . γi . . .

)

into T(. . . (Yi)y . . .. . . γi . . .

)=: Xx.

12

Page 13: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

(Here the “. . .” parts are those which remain unchanged.) Since X(x) is trivially quasi-embeddable

into T(. . . Yi(y) . . .. . . γi . . .

)we get a quasi-embedding from X(x) into Xx. The ordinal of the

name of Xx is strictly less than o (X) by Lemma 1.5.Subcase 3.2: Assume now that x is a tree with root–label y ∈ Y i where the root has exactly

N < γi immediate subtrees x0, . . . , xN−1 ∈ X.Assume first that N = γk for some k < i. By the subsidiary induction hypothesis we get

quasi–embeddings e(X, xj) : LX(xj)→ Xxj such that o (Xxj ) < o (X).By our main induction hypothesis we can pick a quasi–embedding

e(Y, y) : LY i(y)→ (Y i)y such that o ((Yi)y) < o (Yi).Let Z := Yi ⊗

(⊕(Xx0)<ω ⊗ . . . ⊗ (Xxj )<ω : j < N

)and let Z = 〈Z,≤Z〉 be the corre-

sponding quasi–order. Set

Xx := T

(. . . Yk ⊕Yi ⊕ Z . . . (Yi)y . . .. . . γk . . . γi . . .

).

Then Xx is a name for a quasi–order Xx and o(Xx) < o(X) holds by Lemma 1.2,(7).If N 6= γk for all k < i, put

Xx := T

(. . . Yi ⊕ Z . . . (Yi)y . . .. . . N . . . γi . . .

),

where the new column has to be inserted at the place which is determined by the ordering ofγ0, . . . , γi, N.

We shall focus on the case N = γk since the other case is similar. We construct a quasi–embedding e(X, x) of X(x) into Xx by primitive recursion from e(X, x0), . . . , e(X, xN−1) ande(Y, y).

Let z ∈ LX(x). We define e(X, x)(z) by recursion on the number of vertices of z.Assume first that z is a tree consisting only of a root which carries a label v ∈ Yj .If j 6= i define e(X, x)(z) := z. Then e(X, x)(z) ∈ Xx.If v ∈ Y i, then v ∈ Y k ] Y i ] Z. So let e(X, x)(z) := z ∈ Xx.

Assume now that z is a tree with root-label v ∈ Y j and z0, . . . , zl−1 are the immediate sub-trees, where l < αj . We can assume inductively that e(X, x)(z0),. . .,e(X, x)(zl−1) are defined.If j 6= i or l < N let e(X, x)(z) be the tree with root–label v and the immediate subtreese(X, x)(z1), . . . , e(X, x)(zl−1). Then e(X, x)(z) ∈ Xx.

Now assume j = i and l ≥ N .If v ∈ LY i(y) let e(X, x)(z) be the tree with root label e(Y, y)(v) and the immediate subtrees

e(X, x)(z1), . . . e(X, x)(zl−1).Assume also y ≤Y i v, where ≤Yi is the quasi–ordering on Y i. Let ≤X be the quasi–ordering

on X. Since z ∈ LX(x) and y ≤Y i v,

〈x0, . . . xN−1〉 ≤<ωX 〈z0, . . . , zl−1〉

does not hold. (Otherwise we would have x ≤X z.) So there exists a minimal s < N such that〈x0, . . . xs−1〉 ≤<ωX 〈z0, . . . , zl−1〉 does not hold. Then 〈x0, . . . xs−2〉 ≤<ωX 〈z0, . . . , zl−1〉.

13

Page 14: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Let j0 < l be minimal such that x0 ≤X zj0 . Let j1 < l minimal such that j0 < j1 and x1 ≤ zj1 .Let finally js−2 < l be minimal such that js−3 < js−2 and xs−2 ≤ zjs−2 . Then zj0 , . . . , zjs−2 ∈ X,〈z0, . . . , zj0−1〉 ∈ LX(x0)<ω, 〈zjs−3+1, . . . zjs−2−1〉 ∈ LX(xs−2)<ω, and 〈z(js−2)+1, . . . , zl−1〉 ∈LX(xs−1)<ω. Define e(X, x)(z) to be the tree determined by the immediate subtrees e(X, x)(zj0),-. . .,e(X, x)(zjs−2) and the root labeled with

〈v, 〈〈e(X, x)(z0), ..., e(X, x)(zj0−1)〉, ..., 〈e(X, x)(z(js−2)+1), ..., e(X, x)(zl−1)〉〉〉

which is an element of Y i ] Z. Then e(X, x)(z) ∈ Xx since s− 1 < γk.Next, we show that e(X, x)(z) ≤x e(X, x)(z′) implies z ≤X z′ by induction on the sum of

vertices of z and z′. If e(X, x)(z) is embeddable into an immediate subtree z of e(X, x)(z′), thenz has the form e(X, x)(z′′) for some immediate subtree z′′ of z′. Then z ≤x z′′ ≤x z′ by theinduction hypothesis.

Now we assume that e(X, x)(z) is embeddable into e(X, x)(z′) and that e(X, x)(z) is notembeddable into an immediate subtree of e(X, x)(z′). Then the root-label r of e(X, x)(z) is lessthan or equal to the root-label r′ of e(X, x)(z′) with respect to the appropriate quasi-ordering.Thus, if

r = 〈v, 〈〈e(X, x)(z0), ..., e(X, x)(zj0−1)〉, ..., 〈e(X, x)(z(js−2)+1), ..., e(X, x)(zl−1)〉〉〉

and

r′ = 〈v′, 〈〈e(X, x)(z′0), ..., e(X, x)(z′j′0−1)〉, ..., 〈e(X, x)(z′(j′s′−2

)+1), ..., e(X, x)(z′l′−1)〉〉,

then s = s′, v ≤Y i v′,

〈e(X, x)(z0), ..., e(X, x)(zj0−1)〉 (≤x)<ω 〈e(X, x)(z′0), ..., e(X, x)(z′j′0−1)〉, ...,

and finally

〈e(X, x)(z(js−2)+1), ..., e(X, x)(zl−1)〉 (≤x)<ω〈e(X, x)(z′(j′s′−2

)+1), ..., e(X, x)(z′l′−1)〉.

Furthermore, there exists a permutation π of 0, . . . , s− 2 such that e(X, x)(zj)≤x e(X, x)(z′π(j)) for j ≤ s−2. Therefore, by the inductive assumption, zj ≤X z′π(j) for j ≤ s−2.

By combining the above results, we obtain a one-to-one mapping ρ : 0, . . . , l−1 → 0, . . . , l′−1such that zi ≤x z′l(i) holds for 0 ≤ i ≤ l − 1. Since v ≤Y i= v′, we conclude z ≤X z′. ut

Corollary 2.1 ACA0 ` ∀nKT (n)↔ KT (ω)↔WF (ϑΩω).

Proof. Clearly KT (ω) implies ∀nKT (n) and, by Theorem 1.2, we know that ∀nKT (n) impliesWF (ϑΩω) on the basis of ACA0.

Now assume WF (ϑΩω). Let X0 be a quasi-order with exactly one element e. Then the

quasi-order of finite trees can be identified with T(

X0

ω

). Also note that the function e 7→ 0 is

a reification of X0 into 1 which meets the requirements of Theorem 2.2. Thus by said Theorem

we obtain a reification of T(

X0

ω

)into ϑ(Ωω · 1), showing that WF (ϑΩω) implies KT (ω). 2

14

Page 15: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

3 A comparison between two ordinal notation systems for theHoward-Bachmann-ordinal

We have just seen that the ordinal notation system for the Howard-Bachmann-ordinal whichis based on the function ϑ is an appropriate tool for the ordinal analysis of Kruskal’s theorem.For a perspicious proof-theoretic analysis of Π1

2 bar induction we need another concept forrepresenting the Howard-Bachmann-ordinal namely the ψ–function which is due to Buchholz.See, for example, [1] for a definition.

Let Ω(1, ω) := Ωω and Ω(n + 1, ω) := ΩΩ(n,ω). We are going to show that ϑ(Ω(n, ω)) =ψ(Ω(n + 1, ω)) is true for every natural number n. This technical result will be needed forcomparing the proof-theoretic strength of Kruskal’s theorem and Π1

2 − BI over ACA0. (ThisSection is very technical and may be skipped at first reading.)

Definition 3.1 Inductive definition of sets of ordinals Cn(α), C(α), and of ordinals ψα by mainrecursion on α and side recursion on n < ω.

1. 0,Ω ⊆ Cn(α),

2. β, γ ∈ Cn(α) =⇒ ωβ + γ ∈ Cn+1(α),

3. β ∈ C(β) ∩ α ∩ Cn(α) =⇒ ψβ ∈ Cn+1(α),

4. C(α) :=⋃Cn(α) : n < ω.

ψα := minξ : ξ 6∈ C(α).

The following Lemmata can be gathered from [1] or [4].

Lemma 3.1 ψα < Ω.

Lemma 3.2 1. ψα = C(α) ∩ Ω,

2. α ∈ C(α) ∩ β =⇒ ψα < ψβ,

3. α, β < ψγ =⇒ ωα + β < ψγ.

Definition 3.2 Inductive definition of a set T (ψ) of ordinals and a natural number Gψα forα ∈ OT (ψ).

1. 0,Ω ⊆ OT (ψ), Gψ0 := GψΩ := 0,

2. α = α1+· · ·+αn, α > α1 ≥ . . . ≥ αn, α1, . . . , αn ∈ OT (ψ)∩AP =⇒ α ∈ OT (ψ), Gψα :=maxGψα1, . . . , Gψαn+ 1,

3. α = ωα1 , α > α1, α1 ∈ OT (ψ) =⇒ α ∈ OT (ψ), Gψα := Gψα1 + 1,

4. α = ψα1, α1 ∈ C(α1), α1 ∈ OT (ψ) =⇒ α ∈ OT (ψ), Gψα := Gψα1 + 1.

Lemma 3.3 1. OT (ψ) = C(εΩ+1),

15

Page 16: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

2. OT (ψ) ∩ Ω ⊂ ψεΩ+1,

3. OT (ψ) ∩ α = α for α ≤ ψεΩ+1.

To see that OT (ψ) may be considered as a primitive recursive ordinal notation system we mustbe able to decide the relation α ∈ C(β) for α, β ∈ OT (ψ). For this purpose we introduce theauxiliary concept of coefficient sets.

Definition 3.3 Inductive definition of a set of ordinals Kα for α ∈ OT (ψ).

1. K0 := KΩ := ∅,

2. K(α1 + · · ·+ αn) := Kα1 ∪ . . . ∪Kαn,

3. K(ωα1) = Kα1,

4. Kψα1 := Kα1 ∪ α1.

Let kα := max(Kα ∪ 0) and hα := kα+ ωα(cf. [4]).

Lemma 3.4 α ∈ OT (ψ) =⇒ (Kα < β ⇐⇒ α ∈ C(β)).

Lemma 3.5 1. α ∈ OT (ψ) =⇒ α∗, kα ∈ OT (ψ), Gψα∗ ≤ Gψα, Gψkα ≤ Gψα, kα =

kα∗,

2. α ∈ OT (ψ) ∧ kα < α =⇒ ψα ∈ OT (ψ).

3. α ∈ OT (ψ) =⇒ ψhα ∈ OT (ψ),

Lemma 3.6 α, β ∈ OT (ψ) =⇒ (α∗ < ψβ ⇐⇒ kα < β).

Definition 3.4 Recursive definition of α for α ∈ OT (ϑ)

1. 0 := 0, Ω := Ω,

2. α1 + · · ·+ αn := α1 + · · ·+ αn,

3. ωα1 := ωα1 ,

4. ϑα1 := ψhα1.

Lemma 3.7 1. α ∈ OT (ϑ) =⇒ α ∈ OT (ψ),

2. α, β ∈ OT (ϑ) =⇒ (α < β ⇐⇒ α < β),

3. α ∈ OT (ϑ) =⇒ α∗ = α∗.

Proof by a simultaneous induction on Gϑα+Gϑβ. We consider only the nontrivial case for 2.).Let α = ϑα1 and β = ϑβ1. Assume first that α1 < β1 and α∗1 < ϑβ1. Then we see α1 < β1 and

α∗1 < ψhβ1 by the induction hypothesis. Thus kα1∗ < hβ1 and therefore hα1 = kα1 + ωα1 =

kα1∗ + α1 < kβ1 + ωβ1 = hβ1. This yields ϑα1 < ϑβ1. Assume now α1 > β1 and ϑα1 ≤ β∗1 .

Then ϑα1 ≤ β∗1 . Since β1∗≤ hβ1

∗we see β1

∗≤ ψhβ1 by Lemma 3.6. Thus ϑα1 < ϑβ1.

16

Page 17: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Corollary 3.1 1. ϑα ≤ ψhα,

2. ϑ(Ω(n, ω)) ≤ ψ(Ω(n+ 1, ω)).

Proof: This follows from Lemma 3.7 and the well known fact that fβ ≥ β holds for any strictlymonotonic ordinal function f . ut

We now define the reverse embedding α 7→ α of OT (ψ) into OT (ϑ). Of course the trivial setupψα 7→ ϑα will induce an order-preserving mapping of OT (ψ) into OT (ϑ). But this yields onlyψα ≤ ϑα ≤ ψhα and this is not an equality in the interesting cases.

Lemma 3.8 For each α ∈ OT (ψ) \ 0 there exist uniquely determined α1, . . . , αn∈ OT (ψ) and β1, . . . , βn ∈ OT (ψ) such that α = Ωα1(1 +β1) + · · ·+ Ωαn(1 +βn), α1 > . . . > αnand β1, . . . , βn < Ω.

Definition 3.5 α =Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn):⇐⇒ α = Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) ∧ α1 > . . . > αn ∧ β1, . . . , βn < Ω.

Lemma 3.9 α =Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn)=⇒ α∗ = maxα∗1, . . . , α∗n, β∗1 , . . . , β∗n.

Definition 3.6 Inductive definition of α for α ∈ OT (ψ).

1. 0 := 0, Ω := Ω,

2. α1 + · · ·+ αn := α1 + · · ·+ αn,

3. ωα1 := ωα1 ,

4. ψ0 := ϑ0,

5. ψα := ϑ(Ωαn + · · ·+ ϑ(Ωα2 + ϑ(Ωα1 + β1) + β2) · · ·+ βn),if α =Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn).

Lemma 3.10 α ∈ OT (ψ) =⇒ α ∈ OT (ϑ).

The following Lemma is of crucial importance.

Lemma 3.11 1. If α, β ∈ OT (ψ), then (α < β ⇐⇒ α < β).

2. If α =Ω−NF Ωα1(1 + β1) + · · · + Ωαn(1 + βn) + Ωαn+1(1 + βn+1) + · · · +Ωαm(1 + βm) ∈ OT (ψ),if β =Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) + Ωγn+1(1 + δn+1) ∈ OT (ψ),and if α ∪Kα < β,then ϑ(Ωαm + · · · + ϑ(Ωαn+1 + ϑ(Ωαn + (· · · ) + βn) + βn+1) + · · · + βm) < ϑ(Ωγn+1 +ϑ(Ωαn + (· · · ) + βn) + δn+1),

3. If β =Ω−NF Ωα1(1 + β1) + · · · + Ωαn(1 + βn) + Ωγn+1(1 + δn+1) ∈ OT (ψ),α < β and α ∈ OT (ψ),then α∗ < ϑ(Ωγn+1 + ϑ(Ωαn + (· · · ) + βn) + δn+1).

17

Page 18: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proof by main induction on 2Gψα + 2Gψβ and side induction on m− n.1.): The critical case is α = ψα0, β = ψβ0, Kα0 < α0, Kβ0 < β0 and α < β. Let

α0 :=Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) + Ωαn+1(1 + βn+1) + · · ·+ Ωαm(1 + βm)

and

β0 =Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) + Ωγn+1(1 + δn+1) + · · ·+ Ωγr(1 + δr),

where Ωαn+1(1 + βn+1) < Ωγn+1(1 + δn+1).Let

γ0 :=Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) + Ωγn+1(1 + δn+1).

Then α0 < γ0 and Kα0 < γ0. By the main induction hypothesis applied to 2.) we see that

ψα0 < ϑ(Ωγn+1 + ϑ(Ωαn + (· · · ) + βn) + δn+1)

< ϑ(Ωγn+2 + ϑ(Ωγn+1 + (· · · ) + δn+1) + δn+2)

≤ . . .

≤ ψβ0.

2.): The assertion holds for m− n = 0. We assume first that γn+1 > αn+1 is true. Let

α′ :=Ω−NF Ωα1(1 + β1) + · · ·+ Ωαn(1 + βn) + Ωαn+1(1 + βn+1) + · · ·+ Ωαm−1(1 + βm−1).

Thenmaxα′, kα′ ≤ maxα, kα < β. We conclude by the side induction hypothesis ϑ(Ωαm−1+· · · + ϑ(Ωαn+1 + ϑ(Ωαn + (· · · ) + βn) + βn+1) + · · · + βm−1)< ϑ(Ωγn+1 + ϑ(Ωαn + (· · · ) + βn) + δn+1) =: γ′. The main induction hypothesis for 1.) yieldsγn+1 > αn+1 > . . . > αm. The main induction hypothesis for 3.) yields βm

∗< γ′ and αm

∗ < γ′.Therefore

ϑ(Ωαm + · · ·+ ϑ(Ωαn+1 + ϑ(Ωαn + (· · · ) + βn) + βn+1) + · · ·+ βm) < γ′.

Assume now γn+1 = αn+1 and βn+1 < δn+1. If αm < αn+1 = γn+1 the assertion follows asbefore. Assume now αm = αn+1 = γn+1 and βn+1 < δn+1. The main induction hypothesis for1.) yields βn+1 < δn+1. Thus

ϑ(Ωαm + (· · · ) + βn+1) < ϑ(Ωγn+1 + (· · · ) + δn+1).

3.): The claim is true for α ∈ 0,Ω. If α is not of the form ψα0 with Kα0 < α0 then theassertion follows immediately from the main induction hypothesis for 3.). Let α = ψα0 andKα0 ∪ α0 = Kα < β. The main induction hypothesis for 2.) yields the assertion. 2

Corollary 3.2 ψ(Ω(n+ 1, ω)) = ϑ(Ω(n, ω)).

Remark: Let θ be the collapsing function defined in [7]. Then θ(Ω(n, ω))0 = ϑ(Ω(n, ω)) forevery n ∈ N.

18

Page 19: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

4 Majorization Relations and Fundamental Functions

In this section we restate for convenience the concepts of majorization relations and fundamentalfunctions which are developed in [1]. These concepts are needed for carrying through the ordinalanalysis of the restricted bar induction schemata. All missing proofs can be found in [1].

Definition 4.1 1. αµ β means α < β and for all δ, η:α ≤ δ ≤ minβ, η, δ, µ ∈ C(η) =⇒ α ∈ C(η).

2. α β :⇐⇒ α0 β,

3. α β :⇐⇒ (α β ∨ α = β).

Lemma 4.1 1. α β =⇒ αµ β,

2. α < β =⇒ αα β,

3. α < β < γ & αµ γ =⇒ αµ β,

4. 0 < β < ε0 =⇒ α α+ β,

5. α < β < Ω =⇒ α β,

6. α β =⇒ α+ 1 β.

Lemma 4.2 αµ β, β µ γ =⇒ αµ γ.

Lemma 4.3 αµ β, β < ωγ+1 =⇒ ωγ + αµ ωγ + β.

Corollary 4.1 ωα · n ωα · (n+ 1).

Lemma 4.4 αµ β =⇒ ωα · nµ ωβ.

Lemma 4.5 αµ β, µ ∈ C(α), β ∈ C(β) implies:

1. α ∈ C(α),

2. ψαµ ψβ.

Corollary 4.2 α = α0 + 1 ∈ C(α) =⇒ α0 ∈ C(α) & ψα0 ψα.

Definition 4.2 A function f with the domain dom(f) ⊆ OT (ψ) is said to be a fundamentalfunction if the following holds.

F1. If β ∈ dom(f) and α < β, then α ∈ dom(f) and f(α) α f(β).

F2. If β ∈ dom(f) and f(0) ≤ δ < f(β), then there is an α < β such that f(α) ≤ δ < f(α+ 1)and f(α) f(α+ 1).

F3. If α ∈ dom(f) and f(α) ∈ C(η), then α ∈ C(η).

19

Page 20: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Lemma 4.6 If f is a fundamental function and α ∈ dom(f), then α ≤ f(α).

Definition 4.3 Let Idβ be the function with domain dom(Idβ) := α ∈ OT (ψ) : α ≤ β andIdβ(α) := α for all α ∈ dom(Idβ).

Lemma 4.7 Idβ is a fundamental function.

Definition 4.4 Let f be a fundamental function.

1. Let ωγ + f be the function with domain dom(ωγ + f) := α ∈ dom(f) : f(α) < ωγ+1 and(ωγ + f)(α) := ωγ + f(α) for all α ∈ dom(ωγ + f).

2. Let ωf be the function with domain dom(ωf ) := dom(f) and (ωf )(α) := ωf(α) for allα ∈ dom(ωf ).

3. Let ψf be the function with domain dom(ψf) := α ∈ dom(f) : α < Ω, f(α) ∈ C(f(α))and (ψf)(α) := ψ(f(α)) for all α ∈ dom(ψf).

Lemma 4.8 If f is a fundamental function, then also ωγ + f, ωf and ψf are fundamentalfunctions.

Lemma 4.9 If f is a fundamental function with α,Ω ∈ dom(f), α < β = ψf(α) and f(α) f(Ω), then also f(β) f(Ω).

Corollary 4.3 If f is a fundamental function with Ω ∈ dom(f), then f(ψ(f(0))) f(Ω).

5 Ordinal analysis of restricted bar induction

In this Section we determine the proof–theoretic strength of the subsystems of second orderarithmetic ACA0 +Π1

2−BI which is based on arithmetical comprehension and Π12 bar induction.

From this result we shall gather the unprovability of Kruskal’s Theorem in ACA0 + Π12 − BI

as well as the proof–theoretic equivalence of ACA0 + Π12 − BI and Kripke–Platek set theory

plus infinity axiom but with foundation restricted to set–theoretic Π2 formulas. Our device fordealing with Π1

2 bar induction will be Buchholz’ Ω–rule (cf. [1]).To set the context, we fix some notations. The language of second order arithmetic, L2,

consists of free numerical variables a, b, c, d, . . ., bound numerical variables x, y, z, . . ., free setvariables U, V,W, . . . , bound set variables X,Y, Z, . . ., the constant 0, a symbol for each primitiverecursive function, and the symbols = and ∈ for equality in the first sort and the elementhoodrelation, respectively. The numerical terms of L2 are built up in the usual way; r, s, t, . . .are syntactic variables for them. Formulas are obtained from atomic formulas (s = t), (s ∈U) and negated atomic formulas ¬ (s = t),¬ (s ∈ U) by closing under ∧,∨ and quantification∀x,∃x,∀X,∃X over both sorts; so we stipulate that formulas are in negation normal form.

The classes of Π12– and Σ1

n–formulas are defined as usual (with Π10 = Σ1

0 = ∪Π0n : n ∈ N).

¬A is defined by de Morgan’s laws; A → B stands for ¬A ∨ B. All theories in L2 will beassumed to contain the axioms and rules of classical two sorted predicate calculus, with equalityin the first sort. In addition, it will be assumed that they comprise the system ACA0. ACA0

20

Page 21: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

contains all axioms of elementary number theory, i.e. the usual axioms for 0, ′ (successor), thedefining equations for the primitive recursive functions, the induction axiom

∀X [0 ∈ X ∧ ∀x(x ∈ X → x′ ∈ X)→ ∀x(x ∈ X)],

and all instances of arithmetical comprehension

∃Z ∀x[x ∈ Z ↔ F (x)],

where F (a) is an arithmetic formula, i.e. a formula without set quantifiers.For a 2-place relation ≺ and an arbitrary formula F (a) of L2 we define

Prog(≺, F ) := (∀x)[∀y(y ≺ x→ F (y))→ F (x)] (progressiveness)

TI(≺, F ) := Prog(≺, F )→ ∀xF (x) (transfinite induction)

WF (≺) := ∀X TI(≺, X) :=∀X (∀x[∀y(y ≺ x→ y ∈ X)→ x ∈ X]→ ∀x[x ∈ X]) (well-foundedness).

Let F be any collection of formulas of L2. For a 2–place relation ≺ we will write ≺∈ F , if ≺is defined by a formula Q(x, y) of F via x ≺ y := Q(x, y). In addition, we will use the notation≺∈ F− to express that Q(x, y) contains no set parameters.

Definition 5.1 1. (Π1n −BI)0 denotes ACA0 extended by the Π1

n bar induction scheme, i.e.all formulas of the form

WF (≺)→ TI(≺, F ),

where ≺∈ Π10 and F ∈ Π1

n.

2. (Π1n −BI)−0 denotes the modification, where ≺ is required to contain no set variables.

3. (Π1n −BI)− and (Π1

n −BI) denote the corresponding theories augmented by the scheme

F (0) ∧ ∀x(F (x)→ F (x′))→ ∀xF (x)

for every L2–formula F (a).

For technical purposes it it convenient to reformulate (Π12 −BI)0 in a sequent calculus.

Definition 5.2 The sequent calculus version of (Π12 −BI)0 derives finite sets of formulas de-

noted by Γ,Θ,Ξ,Λ, ... . The intended meaning of Γ is the disjunction of all formulas of Γ. Weuse the notation Γ, A for Γ∪A and Γ,Ξ for Γ∪Ξ. Let Γ be an arbitrary finite set of formulas.

The axioms of (Π12 −BI)0 are:

(Ax1) Γ, A,¬ A for every prime formula A.

(Ax2) Γ,∆ if ∆ is a set of prime and negated prime formulas such that∨

∆ (i.e. the disjunc-tion of all formulas of ∆) is a tautological consequence of the equality axioms and the definingequalities of the primitive recursive functions.

21

Page 22: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

(IA) Γ,∀X [0 ∈ X ∧ ∀x(x ∈ X → x′ ∈ X)→ ∀x(x ∈ X)].

(Π10 − CA) Γ, ∃Z ∀x[x ∈ Z,↔ F (x)] where F is arithmetic.

The logical rules of inferences are:

(∧) ` Γ, A and ` Γ, B =⇒ ` Γ, A ∧B,

(∨) ` Γ, Ai =⇒ ` Γ, A0 ∨A1 if i ∈ 0, 1,

(∀1) ` Γ, F (a) =⇒ ` Γ, ∀xF (x),

(∀2) ` Γ, F (U) =⇒ ` Γ,∀XF (X),

(∃1) ` Γ, F (t) =⇒ ` Γ, ∃xF (x),

(∃2) ` Γ, F (U) =⇒ ` Γ,∃XF (X),

(Cut) ` Γ, A and ` Γ,¬ A =⇒ ` Γ,

where in (∀1) and (∀2) the free variable a, respectively U is not to occur in the conclusion.

The only non–logical rule of inference of (Π12 −BI)0 is the Π1

2 bar induction rule:

` Γ,WF (≺) and ` Γ, ∃y ∃Y ∀Z[y ≺ a ∧ ¬B(y, Y, Z)], ∃Z B(a, U, Z)

=⇒ ` Γ,∃ZB(b, V, Z),

where ≺ and B are arithmetic; and a and U are not to occur inΓ,∀x ∀Y ∃ZB(x, Y, Z).

We shall conceive axioms as inferences with an empty set of premises. The minor formulas(m.f.) of an inference are those formulas which are rendered prominently in its premises. Theprincipal formulas (p.f.) of an inference are the formulas rendered prominently in its conclusion.(Cut) has no principal formula. So any inference has the form

(∗) For all i < k ` Γ,Ξi =⇒ ` Γ,Ξ

(0 ≤ k ≤ 2), where Ξ consists of p.f. and Xi is the set of m.f. in the i–th premise. Theformulas in Γ are called side formulas (s.f.) of (∗).

Derivations of (Π12 −BI)0 are defined inductively, as usual. D,D′,D′, . . . range as syntactic

variables over (Π12 −BI)0 derivations. All this is completely standard, and we refer to [9] for

notions like ”length of a derivation D” (abbreviated by | D |), ”last inference of D”, ”directsubderivation of D ”. We write D ` Γ to mean that D is a derivation of Γ.

The most important feature of sequent calculi is cut–elimination. Our sequent calculus(Π1

2 −BI)0 admits cut–elimination concerning cuts whose cut formula is neither a principalformula of a non–logical rule of inference nor a principal formula of an axiom. This is a generalphenomenon which will be exploited next. To state this fact concisely, let us introduce a measureof complexity, gr(A), the grade of a formula A:

1. gr(A) = 0 if A is a prime formula or negated prime formula.

2. gr(∀XF (X)) = gr(∃XF (X)) = ω if F (U) is arithmetic.

22

Page 23: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

3. gr(A ∧B) = gr(A ∨B) = maxgr(A), gr(B)+ 1.

4. gr(∀xH(x)) = gr(∃xH(x)) = gr(H(0)) + 1.

5. gr(∀XG(X)) = gr(∃XG(X)) = gr(G(U)) + 1,if G is not arithmetic.

The cut–rank, %(D), of a derivation D is also defined by induction: Let Di, i < k, bethe direct subderivations of D. If the last inference of D is (Cut) with m.f. A and ¬A let%(D) := supgr(A) + 1, sup%(Di) : i < k. Otherwise, let %(D) := sup%(Di) : i < k. By

(Π12 −BI)0

k

% Γ we mean that there is a derivation D ` Γ in (Π12 −BI)0 such that |D |≤ k and

%(D) ≤ %.

Theorem 5.1 (Cut–elimination) Let 2k0 := k and 2km+1 := 2l where l := 2km.

If (Π12 −BI)0

k

ω+n+1Γ then (Π1

2 −BI)0p

ω+1Γ where p = 2kn.

Proof. Observe that gr(A) < ω + 1 holds for every p.f. A of an axiom or non–logical rule of(Π1

2 −BI)0 . So the result follows by the standard cut–elimination procedure for sequent calculi(cf. [9]). ut

Proposition 5.1 (Π12 −BI)0 proves WF (≺)→ TI(≺, F ) for any ≺∈ Π1

0 and F ∈ Π12.

Proof. Let F (x) be ∀Y ∃ZB(x, Y, Z) with B arithmetic.Then

(Π12 −BI)0 ` ∀y [y ≺ a→ ∀Y ∃ZB(y, Y, Z)], ∃y ∃Y ∀Z[y ≺ a ∧ ¬B(y, Y, Z)]

and(Π1

2 −BI)0 ` ∃Y ∀Z¬B(a, Y, Z), ∃ZB(A,U,Z)

yield(Π1

2 −BI)0 ` C(a), ∃y ∃Y ∀Z[y ≺ a ∧ ¬B(y, Y, Z)], ∃ZB(a, U, Z)

withC(a) ≡ ∀y[y ≺ a→ ∀Y ∃ZB(y, Y, Z)] ∧ ∃Y ∀Z¬B(a, Y, Z).

Here we assume that a and U are ”fresh” variables. By (∃1), we get

(Π12 −BI)0 ` ∃xC(x), ∃y∃Y ∀Z[y ≺ a ∧ ¬B(y, Y, Z)], ∃ZB(a, U, Z). (1)

Using the Π12 bar induction rule on (1) and

(Π12 −BI)0 ` ¬WF (≺),WF (≺), (2)

we obtain(Π1

2 −BI)0 ` ¬WF (≺), ∃xC(x),∃ZB(a, U, Z);

23

Page 24: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

thence, by (∀2), (∀1), and (∨), this becomes

(Π12 −BI)0 ` ¬WF (≺) ∨ (∃xC(x) ∨ ∀x∀Y ∃ZB(x, Y, Z)).

Therefore (Π12 −BI)0 `WF (≺)→ TI(≺, F ).

(Note that A→ B ≡ ¬ A ∨B.) ut

In the next section we shall embed (Π12 −BI)0 into an infinitary calculus T ∗. To handle this

with optimal ordinal bounds, we have to resort to very well behaved derivations.

Definition 5.3 Let ∃Σ12 be the collection of formulas of the form

∃y∃X∀Y A(y,X, Y ), where A(0, U, V ) is arithmetic.A (Π1

2 −BI)0 derivation D ` Γ is said to be nice if %(D) ≤ ω + 1, and, for every Σ12-formula

A that serves as the minor formula of an inference (∃1) in this derivation, either A is an elementof Γ or A is never a side formula of an inference in D.

Note that if none of the formulas occuring in a derivation D0 has a ∃Σ12 subformula, then D0 is

automatically nice. We say that a formula C is essentially Σ12 (written ess-Σ1

2) if

C ∈ ∃Σ12 ∪⋃Σ1

i : i ≤ 2 ∪⋃Π1

j : j < 2.

Lemma 5.1 Let Γ be a set of ess-Σ12 formulas,

Ξ = ∃Z1B1(t1, Z1), . . . ,∃ZrBr(tr, Zr) ⊆ Σ12, and

Θ = ∃y1∃Z1B1(y1, Z1), . . . ,∃yr∃ZrBr(yr, Zr).

1. If D ` Γ,Ξ, then we can find a nice D+ ` Γ,Θ.

2. If D0 ` Γ, then there is a nice D+0 ` Γ.

Proof: 1.) By Theorem 5.1, we may assume %(D) ≤ ω + 1. We proceed by induction on |D |. IfΓ,Ξ is an axiom, then so is Γ; hence Γ,Θ is an axiom. The derivation consisting merely of thisaxiom is of course nice.

Now suppose 0 <|D | . If neither a m. f. nor a p. f. of the last inference (l. i.) of D is Σ12,

then the assertion follows by using the induction hypothesis on the premisses and reapplying thesame inference, since this does not change the stock of Σ1

2–formulas. Note that ρ(D) ≤ ω + 1.Next assume that a formula ∃ZB(t, Z) ∈ Σ1

2 is a m. f. of the last inference of D. Then thismust be an instance of (∃1) because of Γ,Ξ ⊂ess–Σ1

2. So the p. f. is of the form ∃y∃ZB(y, Z) ∈ Γ,and the direct subderivation of D takes the form D0 ` Λ,Ξ′ with Λ ⊆ Γ and Ξ′ = Ξ, ∃ZB(y, Z).Applying the induction hypothesis we get a nice

D+ ` Λ,Θ, ∃y∃ZB(y, Z),

thence D+ ` Γ,Θ.Finally, suppose that ∃Y C(Y ) ∈ Σ1

2 is the p. f. of the last inference of D. This then mustbe an instance of (∃2). So there is a nice derivation D0 ` Γ,Ξ, C(U) such that %(D0) ≤ ω + 1and |D0 |<|D |. Inductively we find a nice derivation

D+0 ` Γ,Θ, C(U).

24

Page 25: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

By use of (∃2), we can continue D+0 to a nice derivation of Γ,Θ,∃Y C(Y ). If ∃Y C(Y ) ∈ Γ, then

we are done. Otherwise, ∃Y C(Y ) ∈ Ξ; thus an application of (∃1) gives us a nice derivationD+ ` Γ,Θ, since in this case ∃Y C(Y ) does not appear as a s. f. in D+

0 .2.) follows from 1.) with Ξ = ∅. ut

6 The infinitary calculus T ∗

The formulas of T ∗ arise from L2-formulas by replacing free numerical variables by numerals, i.e. terms of the form 0, 0′, 0′′, ... . Especially, every formula A of T ∗ is a L2-formula, and thusgr(A) is understood. A formula without second order variables will be called constant. We aregoing to measure the length of derivations by ordinals. For technical reasons we are compelledto diverge from the ordinal notation system OT (ϑ) of Section 1. Instead we are going to use theset of ordinals OT (ψ) of Section 3.

Definition 6.1 1. A formula B is said to be weak if it belongs to Π10 ∪Π1

1.

2. Two closed terms s and t are said to be equivalent if they yield the same value whencomputed.

3. A formula is called constant if it contains no set variables. The truth or falsity of such aformula is understood with respect to the standard structure of the integers.

4. 0 := 0, m+ 1 := m′.

Definition 6.2 Inductive definition of T ∗α

% Γ for α ∈ OT (ψ) and % < ω + ω.

1. If A is a true constant prime formula or negated prime formula and A ∈ Γ, then T ∗α

% Γ.

2. If Γ contains formulas A(s1, . . . , sn) and ¬A(t1, . . . , tn) of grade 0 or ω, where si andti (1 ≤ i ≤ n) are equivalent terms, then T ∗

α

% Γ.

3. If T ∗β

% Γi and β α hold for every premiss Γi of an inference (∧), (∨),

(∃1), (∀2) or (Cut) with a cut formula having grade < %, and conclusion Γ, then T ∗α

% Γ.

4. If T ∗α0

% Γ, F (U) holds for some α0 α and a non-arithmetic formula F (U) (i. e.,

gr(F (U)) ≥ ω), then T ∗α

% Γ,∃XF (X) .

5. (ω-rule). If T ∗β

% Γ, A(m) is true for every m < ω, ∀xA(x) ∈ Γ, and β α, then T ∗α

% Γ .

6. (Ω-rule). Let f be a fundamental function satisfying

a. Ω ∈ dom(f) and f(Ω) α,

25

Page 26: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

b. T ∗f(0)

% Γ,∀XF (X) , where ∀XF (X) ∈ Π11, and

c. T ∗β

0Ξ, ∀XF (X) implies T ∗

f(β)

% Ξ,Γ for every set of weak formulas Ξ and β < Ω.

Then T ∗α

% Γ holds.

Remark 6.1 The derivability relation T ∗α

% Γ is modelled upon the relation PB∗α

n F of [1],the main difference being the sequent calculus setting instead of P– and N–forms and a differentassignment of cut–degrees. The allowance for transfinite cut–degrees will enable us to deal witharithmetical comprehension.

Lemma 6.1 1. T ∗α

δΓ & Γ ⊆ ∆ & α β & δ ≤ % =⇒ T ∗

β

% ∆ ,

2. T ∗α

% Γ, A ∧B =⇒ T ∗α

% Γ, A & T ∗α

% Γ, B,

3. T ∗α

% Γ, A ∨B =⇒ T ∗α

% Γ, A,B

4. T ∗α

% Γ, F (t) =⇒ T ∗α

% Γ, F (s) if t and s are equivalent,

5. T ∗α

% Γ, ∀xF (x) =⇒ T ∗α

% Γ, F (s) for every term s.

6. If T ∗α

% Γ,∀XG(X) and gr(G(U)) ≥ ω, then T ∗α

% Γ, G(U) .

Proofs by induction on α. These can be carried out straightforwardly. 5.) requires 4.). As to6.), observe that ∀XG(X) cannot be the main formula of an axiom. 2

Lemma 6.2 T ∗2·α0

Γ, A(s1, . . . , sk),¬A(t1, . . . , tk) if α ≥ gr(A(s1, . . . , sk)) and si and ti areequivalent terms.

Proof by induction on gr(A). ut

Lemma 6.3 1. T ∗2m

0¬(0 ∈ U), (∃x)[x ∈ U ∧ ¬(x′ ∈ U)],m ∈ U ,

2. T ∗ω+5

0∀X[0 ∈ X ∧ ∀x(x ∈ X → x′ ∈ X)→ ∀x(x ∈ X)].

Proof. For 1.) use induction on m. For a detailed proof see [4] 10.17. 2.) is an immediateconsequence of 1.) using Lemma 6.1 1.), the ω-rule, (∨), and (∀2).

Definition 6.3 For formulas F (U) and A(a), F (A) denotes the result of replacing each occur-rence of the form (e ∈ U) in F (U) by A(e). The expression F (A) is a formula if the boundvariables in A(a) are chosen in an appropriate way, in particular, if F (U) and A(a) have nobound variables in common.

26

Page 27: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Lemma 6.4 Assume that α < Ω. Let ∆(U) = F1(U), . . . , Fk(U) be a set of weak formulassuch that U doesn’t occur in ∀XFi(X) (1 ≤ i ≤ k). For an arbitrary formula A(a) we then have:

T ∗α

0∆(U) =⇒ T ∗

Ω+α

0∆(A) .

Proof by induction on α. Suppose ∆(U) is an axiom. Then either ∆(A) is an axiom too, or

T ∗ω+ω

0∆(A) can be obtained through use of Lemma 6.2. Therefore T ∗

Ω+α

0∆(A) by Lemma

6.1 1.). If T ∗α

0∆(U) is the result of an inference, then this inference must be different from

(∃2), (Cut), and (Ω−rule). Therefore the assertion follows easily from the induction hypothesis.2

Lemma 6.5 Let Γ, ∀XF (X) be a set of weak formulas. If T ∗α

0Γ, ∀XF (X) and α < Ω, then

T ∗α

0Γ, F (U) .

Proof by induction on α. Note that ∀XF (X) cannot be a principal formula of an axiom, since∃X¬F (X) does not surface in such a derivation. Also, due to α < Ω, the derivation doesn’tinvolve instances of the Ω-rule. Therefore the proof is straightforward. 2

The role of the (Ω− rule) in our calculus T ∗ is enshrined in the next lemma.

Lemma 6.6 T ∗Ω·20∃XF (X),¬F (A) for every arithmetic formula F (U) and arbitrary formula

A(a).

Proof. Let f(α) := Ω + α with dom(f) := α ∈ OT (ψ) : α ≤ Ω. Then

T ∗f(0)

0∀X¬F (X),∃XF (X),¬F (A) (1)

according to Lemma 6.2. For α < Ω and every set of weak formulas Θ, we have by Lemmata6.4 and 6.5,

T ∗α

o Θ, ∀X¬F (X) =⇒ T ∗f(α)

0Θ,¬F (A).

Therefore, by Lemma 6.1 1.),

T ∗α

0Θ, ∀X¬F (X) =⇒ T ∗

f(α)

0Θ, ∃XF (X),¬F (A). (2)

The assertion now follows from (1) and (2) by the Ω-rule. 2

7 The reduction procedure for T ∗

Lemma 7.1 Let C be a formula of grade %. Suppose C is a prime formula or of either form∃XH(X), ∃xG(x) or A ∨B. Let α = ωα1 + · · ·+ ωαk with δ ≤ ωαk ≤ . . . ≤ ωα1 . Then we have

T ∗α

% ∆,¬C & T ∗δ

% Γ, C =⇒ T ∗α+δ% ∆,Γ .

Proof by induction on δ.1. Let Γ, C be an axiom. Then there are three cases to consider.

1.1. Γ is an axiom. Then so is ∆,Γ Hence T ∗α+δ% ∆,Γ .

27

Page 28: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

1.2. C is a true constant prime formula or negated prime formula. A straightforward induction

on α then yields T ∗α

% ∆ , and thus T ∗α+δ% ∆,Γ by 6.1 1.).

1.3. C ≡ A(s1, . . . , sn) and Γ contains a formula ¬A(t1, . . . , tn) where si and ti are equivalentterms. From T ∗

α

% ∆,¬A(s1, . . . , sn) one receives

T ∗α

% ∆,¬A(t1, . . . , tn) by use of Lemma 6.1 4.). Thence T ∗α+δ% ∆,Γ follows by use of Lemma

6.1 1.), since ¬A(t1, . . . , tn) ∈ Γ.

2. Suppose C ≡ A ∨B and T ∗δ0% Γ, C,A0 with A0 ∈ A,B and δ0 δ. Inductively we get

T ∗α+δ0% ∆,Γ, A0 . (1)

Next use Lemma 6.1 2.) on T ∗α

% ∆,¬A ∧ ¬B to obtain

T ∗α+δ0% ∆,Γ,¬A0 . (2)

Whence use a cut on (1) and (2) to get the assertion.

3. Suppose C ≡ ∃xG(x) and T ∗δ0% Γ, C,G(t) with δ0 δ. Inductively we get

T ∗α+δ0% ∆,Γ, G(t) . (3)

By Lemma 6.1 1.), 5.), we also get

T ∗α+δ0% ∆,Γ,¬G(t) ; (4)

thus (3) and (4) yield T ∗α+δ% ∆,Γ by (Cut).

4. Suppose the last inference was (∃2) with p. f. C. Then C ≡ ∃XH(X) and T ∗δ0% Γ, C,H(U)

for some δ0 δ and gr(H(U)) ≥ ω. Inductively we get

T ∗α+δ0% ∆,Γ, H(U). (5)

By Lemma 6.1 1.), 6.) we also get

T ∗α+δ0% ∆,Γ,¬H(U). (6)

From (5) and (6) we obtain

T ∗α+δ% ∆,Γ.

5. Let T ∗δ

% Γ, C be derived by the Ω-rule with fundamental function f . Then the assertionfollows from the I. H. by the Ω-rule using the fundamental function α+ f .6. In the remaining cases the assertion follows from the I. H. used on the premises and byreapplying the same inference. 2

Lemma 7.2 T ∗α

η+1Γ =⇒ T ∗

ωα

η Γ.

28

Page 29: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proof by induction on α. We only treat the crucial case when T ∗α0

η+1Γ, D and T ∗

α0

η+1Γ,¬D ,

where α0 α, and gr(D) = η. Inductively this becomes T ∗ωα0

η Γ, D and T ∗ωα0

η Γ,¬D. Since

D or ¬D must be of one of the forms exhibited in Lemma 7.1, we obtain T ∗ωα0+ωα0

η Γ byLemma 7.1. As ωα0 + ωα0 ωα, we can use Lemma 6.1 1.) to get the assertion.

Theorem 7.1 (Collapsing Theorem) If Γ is a set of weak formulas and α ∈ C(α), then we have

T ∗α

ω Γ =⇒ T ∗ψα

0Γ.

Proof by induction on α. Observe that for β < δ < Ω, we always have β δ.1. If Γ is an axiom, then the assertion is trivial.2. Let T ∗

α

ω Γ be the result of an inference other than (Cut) and Ω-rule. Then we have

T ∗α0

ω Γi with α0 α and Γi being the i-th premiss of that inference. α0 α and α ∈ C(α)

imply α0 ∈ C(α0) and ψα0 ψα by Lemma 4.5. Therefore T ∗ψα0

0Γ0 by the I. H., hence

T ∗ψα

0Γ by reapplying the same inference.

3. Suppose T ∗α

ω Γ results by the Ω-rule with respect to a Π11-formula ∀XF (X) and a funda-

mental function f . Then Ω ∈ dom(f) and f(Ω) α. Also

T ∗f(0)

ω Γ,∀XF (X), (1)

and, for every set of weak formulas Ξ and β < Ω,

T ∗β

0Ξ,∀XF (X) =⇒ T ∗

f(β)

ω Ξ,Γ. (2)

From Ω ∈ dom(f) we get f(0) f(Ω) by (F1.); thus f(Ω) α yields f(0) ∈ C(f(0)) andf(Ω) ∈ C(f(Ω)) using Lemma 4.5. Therefore the I. H. used on (1) supplies us with

T ∗ψ(f(0))

0Γ, ∀XF (X) .

Hence with Ξ = Γ we get

T ∗f(ψ(f(0)))

ω Γ (3)

from (2). Now Corollary 4.3 ensures that f(β)f(Ω), where β = ψ(f(0)); hence f(β) ∈ C(f(β))by Lemma 4.5 since f(Ω) ∈ C(f(Ω)). So using the I. H. on (3), we obtain

T ∗ψ(f(β))

0Γ , (4)

thus T ∗ψα

o Γ as f(β) α.

4. Suppose T ∗α0

ω Γ, A and T ∗α0

ω Γ,¬A , where α0 α and gr(A) < ω. Inductively we then

get T ∗ψα0

0Γ, A and T ∗

ψα0

0Γ,¬A. Let gr(A) = n− 1. Then (Cut) yields

T ∗β1

n Γ (5)

with β1 = (ψα0) + 1. Applying Lemmma 7.2, we get T ∗ωβ1

n−1Γ , and by repeating this process

we arrive atT ∗

βn

0Γ ,

where βk+1 := ωβk (1 ≤ k < n). Since ψα0 < ψα, we have βn < ψα; thus T ∗ψα

0Γ. 2

29

Page 30: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

8 The Refined Reduction Lemma

We would like to define nice derivations also in the context of T ∗. A T ∗-derivation is a tupleconsisting of its direct subderivations (d. s.), the set of minor formulas (m. f.), the set ofprincipal formulas (p. f.) (possibly empty), the ordinal bounds for the length and cut-rank,and a symbol indicating the last inference (l. i.). If there were not the Ω-rule, we needn’t to bespecific about this. We will write D α

% Γ if D is a derivation witnessingα

% Γ . The tuple

D = 〈D0, T , f,∀XH(X), α, %,Ω〉

is said to be a nice T ∗-derivation if: D0α

% Γ,∀XH(X) is nice; f is a fundamental function with

Ω ∈ dom(f) and f(Ω)α; ∀XH(X) ∈ Π11; and for every set of weak formulas Ξ and β < Ω the

following is valid:

If D′ is a T ∗-derivation satisfying D′ β

0Ξ, ∀XH(X), then T (D′) is a nice T ∗-derivation satisfying

T (D′)f(β)

% Ξ,Γ. (So T is a function on proofs.) Nice derivations allow us to improve on theReduction Lemma 7.1.

Lemma 8.1 (The Refined Reduction Lemma) Let B ≡ ∃y∃ZA(y, Z) be ∃Σ12. Let α = ωα1 +

· · · + ωαn ≥ α1 ≥ . . . ≥ αn , β = ωβ1 + · · · + ωβk ≥ β1 ≥ . . . ≥ βk such that β1 ≤ αn. Suppose

that D′ α

ω+1Γ,¬B and D β

ω+1Λ, B are nice T ∗-derivations, where Γ,Λ ⊆ ess−Σ1

2. Then there

is a nice T ∗-derivation D∗ α+β

ω+1Γ,Λ .

Proof by induction on β. If the last inference (l. i.) does not have a Σ12 or ∃Σ1

2 principalformula (p. f.), the assertion follows by using the I. H. on the direct subderivations (d. s.) andreapplying the same inference. (If this is the Ω-rule with fundamental function f , then the newfundamental function will be α + f .) Note that such an inference does not change the stock of”critical” formulas.

If the last inference is (∃2) with Σ12 p. f. A, then the d. s. D0 might not be nice. But this

problem can be overcome by adding A as a side formula throughout the derivation D0, whichgives rise to a nice derivation D1. So we can apply the I. H. on D1 and subsequently (∃2) to getthe assertion.

Now suppose that the p. f. of the l. i. is E ≡ ∃x∃ZA(x, Z) and E ∈ ∃Σ12. Then the d. s. of

D has the form D0β0

ω+1Λ0, C , where β0 β, C ≡ ∃ZA(t, Z), and Λ0 ⊆ Λ, B. Note that D0 is

also nice. Suppose C ∈ Λ. The I. H. provides us with a nice derivation

D+ α+β0

ω+1Λ0 \ B,Γ, C.

So we get a nice derivation D∗ α+β

ω+1Γ,Λ by weakening. If C 6∈ Λ, then the niceness of D

implies that the l. i. of D0 is (∃2) with p. f. C. So the d. s. D1 of D0 then has the form

D1β1

ω+1Λ1, A(t, U) with Λ1 ⊆ Λ0 and β1 β0. Using the I. H. on D1, we find a nice derivation

D2α+β1

ω+1Λ1 \ B,Γ, A(t, U) .

30

Page 31: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Suppose that E ≡ B. Then ¬B ≡ ∀y∀Z¬A(y, Z). Using inversion (6.1 2. and 6.1 6.) on D′, weget a derivation

D3α

ω+1Γ,¬A(t, U) ,

which is also nice, since it has the same stock of Σ12 minor and side formulas as D′. Using (Cut)

on D2 and D3 (and weakening) we get a nice derivation

D∗ α+β

ω+1Γ,Λ.

Finally, suppose E is different from B. Then E ∈ Γ. Then apply (∃2) followed by (∃1) to D2, to

get a nice derivation D∗ α+β

ω+1Λ1 \ B,Γ, E (note that α+β1α+β0α+β). As Λ1\B ⊆ Λ

and E ∈ Γ, the result follows by weakening. 2

9 Embedding (Π12 −BI)0 into T ∗.

The objective of this Section is to embed (Π12 −BI)0 into T ∗, so as to obtain an upper bound

for the proof-theoretic ordinal of (Π12 −BI)0 by putting to use results of the previous Section.

The treatment of the Π12 bar induction rule requires the following lemma.

Lemma 9.1 Let E(a) and F (a, U, V ) be arithmetic formulas. If Λ is an arbitrary set of T ∗

formulas and T ∗δ

ω+1Λ, ∀y[E(y) → ∀Y ∃ZF (y, Y, Z)], then

T ∗Ω·2+δ·4ω+1

Λ,∀y ∀Y ∃Z [E(y) → F (y, Y, Z)].

Proof by induction on δ. Let G0 ≡ ∀y[E(y) → ∀Y ∃ZF (y, Y, Z)]. If G0 is not the p.f. of thelast inference, then we get the desired derivation by employing the I. H. on the premises andreapplying the same inference.Now let G0 be the p.f. of the last inference, which must be an instance of the ω-rule. Hence

T ∗δ0

ω+1Λ, G0, E(n)→ ∀Y ∃ZF (n, Y, Z)

for some δ0 δ and all n. Let G be ∀y ∀Y ∃Z [E(y) → F (y, Y, Z)]. The I. H. yields

T ∗α

ω+1Λ, G,E(n)→ ∀Y ∃ZF (n, Y, Z) ,

where α = Ω · 2 + δ0 · 4. Hence, by 6.1 (3) followed by 6.1 (6), we get that, for all n,

T ∗α

ω+1Λ, G,¬E(n), ∃ZF (n, U, Z) (1)

for a ‘fresh’ variable U .From Lemma 6.6 we obtain

T ∗Ω·2ω+1∃Z[E(n)→ F (n, U, Z)], E(n) ∧ ¬F (n, U, V ),

whence, by Lemma 6.1 (2),

T ∗Ω·2ω+1∃Z[E(n)→ F (n, U, Z)], E(n) (2)

31

Page 32: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

andT ∗

Ω·2ω+1∃Z[E(n)→ F (n, U, Z)],¬F (n, U, V ) . (3)

An inference (∀2) applied to (3) yields

T ∗Ω·2+1

ω+1∃Z[E(n)→ F (n, U, Z)], ∀Z¬F (n, U, Z) . (4)

Applying (Cut) to (1) and (2) gives

T ∗α+1

ω+1Λ, G,∃Z[E(n)→ F (n, U, Z)], ∃ZF (n, U, Z) (5)

and a further (Cut) with (4) yields

T ∗α+2

ω+1Λ, G,∃Z[E(n)→ F (n, U, Z)]. (6)

Continuing the derivation, first using (∀2) and then the ω-rule, we arrive at

T ∗α+4

ω+1Λ, G. (7)

As α+ 4 Ω · 2 + δ · 4, this finishes the proof. 2

Theorem 9.1 Let Γ be a set of ess-Σ12 formulas. Suppose that Γ∗ results from Γ by replacing

all the free variables in formulas of Γ with numerals. Then

(Π12 −BI)0

k

ω+1Γ =⇒ T ∗

Ωf(k)

ω+1Γ∗ ,

where f(k) := (k + 1) · 4.

Proof by induction on k.Using Lemma 5.1 2.), we can assume that we have a nice derivation of Γ. During this proof weshall also ensure that the corresponding T ∗-derivation will be nice.

1. If Γ is an axiom (Ax1) or (Ax2), then Γ∗ is also an axiom of T ∗.2. If Γ is an axiom (IA), then this follows from Lemma 6.3 2.).3. Suppose Γ is an axiom (Π1

0 − CA), i. e., Γ contains a formula (∃Z)(∀x)[x ∈ Z ↔ A(x)],with A arithmetic. Setting F (U) := (∀x)[x ∈ U ↔ A∗(x)], we obtain

T ∗ω+ω

0F (A∗) (1)

using Lemma 6.2.By Lemma 6.6 we have

T ∗Ω·20∃ZF (Z),¬F (A∗). (2)

Thus (1) and (2) yield (since gr(F (A∗)) < ω) T ∗Ωf(0)

ω ∃ZF (Z) , thence T ∗Ωf(0)

ω+1Γ∗ .

Observe that in the cases, we have been considering so far, the infinitary derivations didn’tcontain inferences with Σ1

2 m. f. or p. f.; so they are automatically nice.

32

Page 33: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

4. Suppose (Π12 −BI)0

k

ω+n−1Γ is the result of an inference (∃2) with principal formula

∃XH(X) ∈ Γ having grade ω. Then

(Π12 −BI)0

k0

ω+1Γ0, H(U)

for some k0 < k and Γ0 ⊆ Γ. By I.H., we then have a nice T ∗-derivation D0Ωf(0)

ω+1Γ∗0, H

∗(U) .

The (∃2)-rule of T ∗ is not available in this situation since gr(H∗(U) < ω. We have to resort toLemma 6.6 to get a nice

D1Ω·20∃XH∗(X),¬H∗(U) .

Applying (Cut) gives us a nice

D2Ωf(k)

ω+1Γ∗ .

5. Suppose (Π12 −BI)0

k0

ω+1Γ, A and (Π1

2 −BI)0k0

ω+1Γ,¬A for some k0 < k and gr(A) <

ω + 1. Inductively we then get

T ∗Ωf(k0)

ω+1Γ∗, A∗

and

T ∗Ωf(k0)

ω+1Γ∗,¬A∗.

So, by (Cut), we obtain T ∗Ωf(k)

ω+1Γ∗ niceness preserving.

6. Let (Π12 −BI)0

k

ω+1Γ be the result of (∀1). Then

(Π12 −BI)0

k0

ω+1Γ0, F (a)

for some k0 < k and Γ0,∀xF (x) = Γ. Inductively we get T ∗Ωf(k0)

ω+1Γ∗0, F

∗(n) for every n. Thus

the assertion follows by the ω-rule. Since (∀1) does not violate niceness in the original derivation,the ω-rule won’t do this in the infinitary derivation.

7. Let (Π12 −BI)0

k

ω+1Γ be the result of a logical inference other than the ones already

treated. Then the assertion follows using the I.H. on the premises and subsequently reapplyingthe same inference in T ∗. This of course preserves niceness.

8. Finally, we have to deal with the Π12 bar induction rule. So suppose Γ = Λ, ∃ZB(b, V, Z),

(Π12 −BI)0

k0

ω+1Λ,WF (≺) , (3)

and(Π1

2 −BI)0k0

ω+1Λ, ∃y ∃Y ∀Z[y ≺ a ∧ ¬B(y, Y, Z],∃ZB(a, U, Z), (4)

where k0 < k and a, U are ”fresh” variables. Put

C(n) :≡ ∃y∃Y ∀Z[y ≺ n ∧ ¬B(y, Y, Z)]∗,

andA(n) :≡ ∀Y ∃ZB(n, Y, Z)∗.

33

Page 34: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

By I.H. we then have, for β = Ωf(k0) and for each m < ω, a nice T ∗-derivation

Dnβ

ω+1Λ∗, C(n), ∃ZB(n, U, Z)∗;

thus, using (∀2), we obtain a nice derivation

D1n

α0

ω+1Λ∗, C(n), A(n) (5)

with α0 = Ωf(k0)+1 (note that ∃ZB(n, U, Z)∗ 6∈ Σ12), and also, by the I. H. used on (3), we get

a niceD2 α0

ω+1Λ∗,WF (≺)∗ . (6)

Moreover, Lemma 6.6 provides us with a nice

D′ Ω·20¬∀X TI(≺∗, X), T I(≺∗, A),

simply because the formulas occuring in this derivation do not have ∃Σ12 subformulas. By Lemma

6.1 3. we get a nice

D3 Ω·20¬∀X TI(≺∗, X),∃z[¬C0(z) ∧ ¬A(z)], ∀zA(z) , (7)

whereC0(b) ≡ ∃y[y ≺ b ∧ ∃Y ∀Z¬B(y, Y, Z)]∗.

Claim: If β ≤ Ω · 2, Ξ a set of formulas which do not have Σ12 subformulas, and D∗ is a nice

derivation such thatD∗ β

0Ξ,∃z[¬C0(z) ∧ ¬A(z)], ∀zA(z) ,

then, for all m < ω, there is a nice

D∗mg(β)

ω+1Ξ,Γ∗, A(m) ,

whereg(β) = α0 · ωβ = Ωf(k0)+1 · ωβ.

We prove the Claim by induction on β. Put

D :≡ ∃z[¬C0(z) ∧ ¬A(z)].

If neither D nor ∀zA(z) is a principal formula of the last inference, then the assertion is easilyobtained by the induction hypothesis. Suppose now that ∀zA(z) is the principal formula. Then

there is some β0β such that for allm < ω there is a niceDmβ0

0Ξ, D, ∀zA(z), A(m) . Inductively

we then can pick a niceD′mg(β0)

ω+1Ξ,Γ∗, A(m) , thusD∗m

g(β)

ω+1Ξ,Γ∗, A∗(m) for some niceD∗m (note

that g(β0) g(β)).Now let D be the principal formula. Then there are β0 β, a term t, and a nice

D∗0β0

0Ξ,∀zA(z), D,¬C0(t) ∧ ¬A(t) .

34

Page 35: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Let n be the value of t. Using the I. H. and Lemma 6.1 4.), we get a nice

D∗1g(β0)

0Ξ,Γ∗, A(m),¬C0(n) ∧ ¬A(n).

By Lemma 6.1 it 2.) we therefore find nice D∗2 and D∗3 such that

D∗2g(β0)

ω+1Ξ,Γ∗, A(m),¬C0(n) (8)

and

D∗3g(β0)

ω+1Ξ,Γ∗, A(m),¬A(n). (9)

We would like to replace

¬C0(n) ≡ ∀y[y ≺∗ n → ∀Y ∃Z B(y, Y, Z)∗]

with¬C(n) ≡ ∀y ∀Y ∃Z [y ≺∗ n→ B(y, Y, Z)∗].

This can be done using Lemma 9.1. By inspection of the proof of 9.1, one readily verifies thatthis transformation leads from nice derivations to nice derivations. So from (8) we obtain a niceD∗4 satisfying

D∗4η

ω+1Ξ,Γ∗, A(m),¬C(n), (10)

where η := g(β0) ·4. If we now use the Refined Reduction Lemma 8.1 on (5) and (10), we obtaina nice

D∗5η+α0

ω+1Ξ,Γ∗, A(m), A(n). (11)

Note that A(n) is neither the minor formula of (∃1) in D∗3 nor in D∗5 ((9) and (11)). Thence, byusing the Reduction Lemma 7.1 on (9) and (11), we arrive at a nice

D∗6η+α0+g(β0)

ω+1Ξ,Γ∗, A(m) .

Since η+α0+g(β0) = Ωf(k0)+1 ·ωβ0 ·4+Ωf(k0)+1+Ωf(k0)+1 ·ωβ0 = Ωf(k0)+1 ·ωβ0 ·5Ωf(k0)+1 ·ωβ =g(β), this furnishes proof of the Claim. Letting Ξ = ¬WF (≺∗) and β = Ω ·2, the Claim, usedon (7), yields a nice

D∗mg(Ω·2)

ω+1¬WF (≺∗),Γ∗, A(m). (12)

Now, g(Ω · 2) = Ωf(k0)+3. So applying (Cut) to (6) and (12), will provide us with a derivation

D+m

Ωf(k)

ω+1Γ∗, A(m) (13)

for all m < ω. Since A(m) ≡ ∀Y ∃ZB∗(m,Y, Z) and ∃ZB∗(m,V, Z) ∈ Γ∗ for some m < ω, theassertion follows from (13) and Lemma 6.1 (6). 2

Corollary 9.1 Let A be a Π11 sentence such that (Π1

2 −BI)0 A , then there is some α < ΩΩω

such that T ∗ψα

0A .

35

Page 36: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proof. Employing Theorem 9.1, we find an n < ω such that T ∗Ωn

ω+1A ; thus T ∗

ωΩn

ω A by

Lemma 7.2. Theorem 7.1 then yields T ∗ψ(ωΩn )

0A . 2.

Corollary 9.2 Let ≺ be an arithmetic well-ordering such that(Π1

2 −BI)0 WF (≺) , then the order type of ≺ is less than ψΩΩω .

Proof. From a cut–free proof of WF (≺) in T ∗ of length less α one can extract that the order–typeof ≺ is majorized by 2α (cf. [4] 13.10 or [8]). 2

Corollary 9.3 (Π12 −BI)0 does not prove ∀nKT (n).

Proof. Immediate by Theorem 1.1, Corollary 9.2, and Corollary 3.2. 2

10 A Well-ordering Proof in (Π12 −BI)−0

By Corollary 9.2 and ψΩΩω ≤ ϑΩω (see Corollary 3.2), we know that the well-foundedness ofthe primitive recursive well-ordering < ϑΩω cannot be proved in (Π1

2 −BI)0 . This Section isaimed at showing (Π1

2 −BI)−0 WF (< ϑΩn) for any (meta) n. In the sequel α, β, γ, δ, . . . aresupposed to range over OT (ϑ). < will be used to denote the ordering on OT (ϑ). We are goingto work informally in second order arithmetic. Especially, variables X,Y, Z, . . . are ranging oversubsets of N. Note also that we consider OT (ϑ) to be a subset of N. The proofs require someterminology.

Definition 10.1 1. Acc := α < Ω: WF (< α),

2. M := α : EΩ(α) ⊆ Acc,

3. α <Ω β :⇐⇒ α, β ∈M ∧ α < β.

Remark: Acc is a Π11–definable class. Therefore M and <Ω are also Π1

1.

Lemma 10.1 α, β ∈ Acc =⇒ α+ ωβ ∈ Acc.

Proof. Familiar from Gentzen’s proof in PA. The proof just requires ACA0. (cf. [4], Sec. 15).2

Lemma 10.2 Acc = M ∩ Ω (:= α ∈M : α < Ω.)

Proof. If α ∈ Acc, then EΩ(α) ⊆ Acc as well; hence α ∈ M ∩ Ω. If α ∈ M ∩ Ω, thenEΩ(α) ⊆M ∩ Ω, so α ∈ Acc follows from Lemma 10.1. 2

Definition 10.2 Let ProgΩ(X) stand for

(∀α ∈M)[(∀β <Ω α)(β ∈ X) −→ α ∈ X].

Let AccΩ := α ∈M : ϑα ∈ Acc.

Lemma 10.3 ProgΩ(AccΩ).

36

Page 37: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

Proof. Assume α ∈ M and (∀β <Ω α)(β ∈ AccΩ). We have to show ϑα ∈ Acc. It suffices toshow

β < ϑα =⇒ β ∈ Acc. (1)

We shall employ induction on Gϑ(β), i.e., the length of (the term that represents) β. If β 6∈ E,then (1) follows easily by the inductive assumption and Lemma 10.1. Now suppose β = ϑβ0.Then we have to distinguish two cases according to Lemma 1.2.Case 1: β ≤ α∗. Since α ∈M, we have α∗ ∈ EΩ(α) ⊆ Acc; therefore β ∈ Acc.Case2: β0 < α and β0 < ϑα. As the length of β∗0 is less than the length of β, we get β∗0 ∈ Acc;thus EΩ(β0) ⊆ Acc, therefore β0 ∈M. By the assumption at the beginning of the proof, we thenget β0 ∈ AccΩ; hence β = ϑβ0 ∈ Acc. ut

It should be noted that the proof of Lemma 10.3 only requires complete induction for Π11–

classes. The next lemma is where we really need (Π12 −BI)−0 .

Lemma 10.4 Let A(a) be a Π11 formula. Letting Ak be the formula

∀α[(∀β <Ω α)A(β) −→ (∀β <Ω α+ Ωk)A(β)];

(Π12 −BI)−0 proves ProgΩ(ξ : A(ξ)) −→ Ak.

Proof. We proceed by outer induction on k. Assume ProgΩ(α : A(α)) and (∀β <Ω δ)A(β).Then also (∀β ≤Ω δ)A(β). For k = 0 this gives the assertion, since γ < δ + Ω0 implies γ ≤ δ.Now let k = m + 1. So we get Am by the inductive assumption. Let B(η) be the formula(∀β <Ω δ + Ωm · η)A(β). B(η) is (in ACA0) provably equivalent to a Π1

2 formula. Suppose thatη ∈ Acc and (∀ρ < η)B(η). Clearly, B(0) holds. If η is a limit, then for β <Ω δ + Ωm · ηthere exists ρ < η such that β <Ω δ + Ωm · ρ, hence B(η) holds. Now let η be a successorγ + 1. Then (∀β <Ω δ+ Ωm · γ)A(β). Using Am, this implies (∀β < δ+ Ωm · γ + Ωm)A(β), thus(∀β <Ω δ + Ωm · (γ + 1))A(β), hence B(η). By the above considerations, we have

(∀η ∈ Acc)[(∀ρ < η)B(ρ) −→ B(η)],

hence, using Π12 bar induction along < (η + 1), we obtain

(∀η ∈ Acc)B(η).

Now for every β <Ω δ + Ωk, there is some η ∈ Acc such that β <Ω δ + Ωm · η. Therefore

(∀β <Ω δ + Ωk)A(β)

follows from (∀η ∈ Acc)B(η). 2

Theorem 10.1 For any n, (Π12 −BI)−0 WF (< ϑΩn) .

Proof. Obviously (∀β <Ω 0)A(β) holds for any formula. Therefore we obtain

(Π12 −BI)−0 ProgΩ(ξ : A(ξ)) −→ (∀β <Ω Ωk)A(β) (1)

37

Page 38: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

for any Π11–formula A(α) and k ∈ N. Since the formula “ξ ∈ AccΩ” can be written Π1

1, Lemma10.3 along with (1) implies, for any k ∈ N,

(Π12 −BI)−0 (∀β <Ω Ωk)[β ∈ AccΩ] .

Therefore, for any n, (Π12 −BI)−0 WF (< ϑΩn) . ut

Corollary 10.1 The proof-theoretic ordinal of (Π12 −BI)0 and (Π1

2 −BI)−0 is ϑΩω.

Proof. Theorem 10.1 and Corollary 9.2. ut

Corollary 10.2 For every n < ω, (Π12 −BI)−0 ` KT (n).

Proof. For fixed n the proof of KT (n) requires only WF (ϑΩm) for some m < n+3. This followsfrom the proof of Theorem 2.2. ut

The methods of the last six sections can also be employed to determine the proof-theoreticordinals of the systems (Π1

n −BI)0, (Π1n −BI)−0 , (Π1

n −BI), and (Π1n −BI)− for n > 2.

For T a theory, let |T | denote the proof–theoretic ordinal of T .Letting Ω(1, α) = Ωα, and Ω(n+ 1, α) := ΩΩ(n,α) for n > 1, the following is true.

Theorem 10.2 Let n ≥ 2.

1. |(Π1n −BI)0 |=|(Π1

n −BI)−0 |=|KPω− + Πn − Foundation |= ϑΩ(n− 1, ω).

2. |(Π1n −BI) |=|(Π1

n −BI)− |= ϑΩ(n− 1, ε0).

Proof. The results about the set theories are from [5]. ut

11 ACA0 ` KT ↔ RFNΠ11((Π1

2 −BI)0)

In view of Theorem 10.1 one is naturally led to search for a natural strengthening of (Π12 −BI)0 that

proves WF (ϑΩω) and is not stronger than ACA0 + WF (ϑΩω). Of course, (Π12 − BI) proves

WF (ϑΩω), because the outer induction on k in the proof of Lemma 20.4 can be carried out as aformal induction within (Π1

2−BI). Hence (Π12−BI) ` ∀nWF (ϑΩn), thus (Π1

2−BI) `WF (ϑΩω).However, (Π1

2−BI) has proof–theoretic ordinal ϑΩε0 ; so this is not the right candidate. It turnsout that the Uniform Reflection Principle for (Π1

2 −BI)0 , RFNΠ11((Π1

2−BI)0), provides the ap-

propiate strengthening. This scheme asserts the Π11 soundness (with parameters) of (Π1

2 −BI)0 .

Definition 11.1 RFNΠ11((Π1

2 −BI)0) is the scheme

∀n[Pr(Π12−BI)0

(pF (n)q)→ F (n)]

for Π11 formulas F (a) with at most one free variable a. Here Pr(Π1

2−BI)0denotes the Σ0

1 provability

predicate for (Π12 −BI)0 and pF (n)q signifies the Godel number of F (a) when a is replaced by

the nth numeral (for details see [11]).

38

Page 39: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

The proof of Theorem 10.1 can be formalized inACA0. ThereforeACA0 ` ∀xPr(Π12−BI)0

(WF (ϑΩx)).So we come to see the following.

Theorem 11.1 ACA0 +RFNΠ11((Π1

2 −BI)0) `WF (ϑΩω).

The remainder of the Section will be devoted to outlining the proof of the next theorem.

Theorem 11.2 ACA0 +WF (ϑΩω) ` RFNΠ11((Π1

2 −BI)0).

We first give the rough idea of that proof. So let D ` F (n) be a (Π12 −BI)0 proof of a Π1

1

formula F (n) ≡ ∀XA(X, n). Using Theorem 9.1, we can pick a T ∗ derivation D∗ such that

D∗ Ωm

ω+1A(U, n) for some m < Ω which is determined by D. Furthermore, we can transform D∗

into a cut–free derivation D∗∗ α

0A(U, n) with α = ψΩΩm . Since α < Ω, the derivation D∗∗ does

not contain instances of the Ω–rule. Especially, all the formulas occurring in D∗∗ have to besubformulas of A(U, n). Thence, by induction on α one verifies that A(X,n) is true for any setX of natural numbers. Since the formula A(U, n) is Σ0

k for some k, this very last step requiresonly a truth predicate for Σ0

k formulas (with parameters) which is available in ACA0.A minor problem with the above sketch is that we didn’t exactly specify the amount of

transfinite induction that it requires.. Closer inspection reveals that we can restrict ourselves toordinals from C(ΩΩω). Since the order–type of < C(ΩΩω) is not much bigger than σ := ψΩΩω ,namely ωω

σ·ω, the well–foundedness of < C(ΩΩω) is still provable in ACA0 +WF (ψΩΩω).

A considerably greater challenge is offered by the question: How can we formalize infinitaryderivations in ACA0 + WF (ψΩΩω)? Obviously we have to use an effective counterpart of theabove construction, where we work with codes for T ∗–derivations instead of using the T ∗–derivations themselves. The main idea here is that we can do everything with recursive proof–trees instead of arbitrary derivations. A proof–tree is a tree, with each node labeled by: Asequent, a rule of inference or the designation “Axiom”, two sets of formulas specifying theset of principal and minor formulas,respectively, of that inference, and two ordinals (length andcut–rank) such that the sequent is obtained from those immediately above it through applicationof the specified rule of inference. The well-foundedness of a proof–tree is then witnessed by the(first) ordinal “tags” which are in reverse order of the tree order.

If the inference is an instance of (Ω − rule), the label should also provide an index for thefundamental function f and an index for the functional T that gives the transformations onproof–trees (cf. Section 8); hence both are required to be recursive.

We then have to show that none of our manipulations on T ∗–derivations leads us beyondthis class of recursive proof–trees. The latter is guaranteed by the fact that for the embeddingof (Π1

2 −BI)0 into T ∗ we only need instances of the (Ω − rule), where the transformation onproof–trees is given by a recursive functional, and by the fact that all the operations in thecut–elimination procedure are of a local nature, i.e. they give rise to recursive functionals.

To carry out all the details of this constructivization would mean to produce another lengthypaper. But it is high time that we finished this paper; so we simply quit at this point.

References

[1] W. Buchholz and K. Schutte: Proof Theory of Impredicative Subsystems of analysis. Bib-liopolis 1988.

39

Page 40: Proof{theoretic investigations on Kruskal’s theoremrathjen/KRUSKAL.neu.pdf · elementary proof of Kruskal’s theorem is especially felt due to the fact that this theorem gures

[2] N. Dershowitz and M. Okada: Proof-theoretic techniques for term-rewriting theory. Pro-ceedings of the third annual symposium on Logic in Computer Science. (1988) EdinburghScotland, pp. 104-111.

[3] J. H. Gallier:What’s so special about Kruskal’s theorem and the ordinal Γ0? A survey ofsome results in proof theory. Annals of Pure and Applied Logic 53 (1991), pp. 199-260.

[4] W. Pohlers: Proof Theory: An introduction. Lecture Notes in Mathematics 1407, Springer1990.

[5] M. Rathjen: Fragments of Kripke-Platek set theory with infinity. In: P. Aczel, J. Simmons,S. Wainer (eds.): Proof Theory (Cambridge University Press, Cambridge, 1992) 251–273.

[6] D. Schmidt: Well-Partial Orderings and Their Maximal Order Types. Habilitationsschrift,Heidelberg, 1979.

[7] K.Schutte: Proof Theory. Springer 1977.

[8] K.Schutte and S.G. Simpson: Ein in der reinen Zahlentheorie unbeweisbarer Satz uberendliche Folgen von naturlichen Zahlen. Archiv fur mathematische Logik und Grundlagen-forschung 25 (1985), pp. 75-89.

[9] H. Schwichtenberg: Proof Theory: Some applications of cut-elimination. In: Handbook ofMathematical Logic (J. Barwise ed.) North Holland 1977, pp. 867-895.

[10] S. G. Simpson: Nonprovability of certain combinatorial properties of finite trees. In: H.Friedman’s Research on the Foundations of Mathematics, Amsterdam, North-Holland, pp.87-117.

[11] G. Smorynski: The incompleteness theorems. In: Handbook of Mathematical Logic (J.Barwise ed.), North Holland 1977, pp. 821-865.

40