
Linguistic side effects

A thesis presented by

Chung-chieh Shan

to the Division of Engineering and Applied Sciences

in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in the subject of computer science

Harvard University
Cambridge, Massachusetts

September 2005


© 2005, Chung-chieh Shan. All rights reserved.


Thesis advisor: Stuart Shieber        Author: Chung-chieh Shan

Linguistic side effects

Abstract

Apparently noncompositional phenomena in natural languages can be analyzed like computational side effects in programming languages: anaphora can be analyzed like state, intensionality can be analyzed like environment, quantification can be analyzed like delimited control, and so on. We thus term apparently noncompositional phenomena in natural languages linguistic side effects. We put this new, general analogy to work in linguistics as well as programming-language theory.

In linguistics, we turn the continuation semantics for delimited control into a new implementation of quantification in type-logical grammar. This graphically motivated implementation does not move nearby constituents apart or distant constituents together. Just as delimited control encodes many computational side effects, quantification encodes many linguistic side effects, in particular anaphora, interrogation, and polarity sensitivity. Using the programming-language concepts of evaluation order and multistage programming, we unify four linguistic phenomena that had been dealt with only separately before: linear scope in quantification, crossover in anaphora, superiority in interrogation, and linear order in polarity sensitivity. This unified account is the first to predict a complex pattern of interaction between anaphora and raised-wh questions, without any stipulation on either. It also provides the first concrete processing explanation of linear order in polarity sensitivity.

In programming-language theory, we transfer a duality between expressions and contexts from our analysis of quantification to a new programming language with delimited control. This duality exchanges call-by-value evaluation with call-by-name evaluation, thus extending a known duality from undelimited to delimited control. The same duality also exchanges the familiar let construct with the less-familiar shift construct, so that the latter can be understood in terms of the former.


Contents

Abstract

List of Figures

Acknowledgments

Chapter 1. Introduction
  1.1. Linguistics
  1.2. Programming-language theory
  1.3. Contributions
    1.3.1. Evaluation order in natural languages
    1.3.2. Delimited duality in programming languages
  1.4. Environment in programming languages
  1.5. Intensionality in natural languages
  1.6. Computational versus linguistic side effects

Chapter 2. Formalities
  2.1. Context-free grammars for natural language
  2.2. A toy programming language
  2.3. The λ-calculus
    2.3.1. Confluence, normalization, and typing
    2.3.2. The formulas-as-types correspondence
    2.3.3. Products
    2.3.4. Sums
    2.3.5. Recursion
  2.4. Type-logical grammar
    2.4.1. Multimodal type-logical grammar

Chapter 3. The analogy: delimited control and quantification
  3.1. Delimited control
    3.1.1. A first attempt
    3.1.2. Enforcing call-by-value, left-to-right evaluation
  3.2. Continuations for delimited control
    3.2.1. The continuation-passing-style transform
    3.2.2. Types for continuations
    3.2.3. Higher-order delimited control
  3.3. Quantification
    3.3.1. Generalized quantifiers
    3.3.2. In-situ quantification
  3.4. Delimited control versus quantification

Chapter 4. Evaluation order in natural languages
  4.1. Type environments as graphs
  4.2. Scopes apply inward; quantifiers apply outward
  4.3. Refining the analysis of quantification by refining notions of context
    4.3.1. Restrictors
    4.3.2. Islands
  4.4. Linear order and evaluation order
  4.5. Beyond quantification
    4.5.1. Anaphora and crossover
    4.5.2. Questions and superiority
    4.5.3. Polarity sensitivity
    4.5.4. Reversing evaluation order

Chapter 5. Delimited duality in programming languages
  5.1. Call-by-value versus call-by-name
  5.2. Delimited control in a dual calculus
  5.3. Restoring confluence
  5.4. The continuation-passing-style transform

Chapter 6. Conclusion

Bibliography


List of Figures

1.1 Observations on a toy programming language for printing
1.2 A tiny programming language
1.3 A tiny programming language, with environment
1.4 A tiny fragment of English
1.5 A tiny fragment of English, with think

2.1 A natural-language fragment
2.2 The context-free grammar equivalent to Figure 2.1
2.3 A natural-language fragment that distinguishes among three kinds of verbs
2.4 The context-free grammar equivalent to Figure 2.3
2.5 A toy programming language for arithmetic
2.6 The context-free grammar equivalent to Figure 2.5
2.7 Expressions are trees but usually written with a textual representation
2.8 Proving that if true = succ succ 0 then false else 0 + 0 is well formed
2.9 Evaluation contexts
2.10 The context-free grammar of evaluation contexts, equivalent to Figure 2.9
2.11 The computation relation between expressions
2.12 Three confluent ways to run the same program
2.13 A toy programming language with a more refined type system
2.14 The untyped λ-calculus
2.15 Proving that λf. λx. f (f x) is well formed
2.16 Evaluation contexts in the λ-calculus
2.17 The simply-typed λ-calculus
2.18 Proving that λf. λx. f (f x) has type (a → a) → (a → a)
2.19 Natural-deduction rules for propositional intuitionistic logic
2.20 Proving that (a → a) → (a → a)


2.21 Proof reduction for →I followed by →E
2.22 Adding products to the λ-calculus
2.23 Adding unit to the λ-calculus
2.24 Adding sums to the λ-calculus
2.25 Natural-language derivations in the simply-typed λ-calculus, using structural rules to generate acceptable as well as unacceptable sentences
2.26 The non-associative Lambek calculus without products, with semantics
2.27 The non-associative Lambek calculus without products, without semantics
2.28 Adding a binary mode m to the Lambek calculus without products
2.29 Adding a unary mode m to the Lambek calculus
2.30 A multimodal type-logical grammar that generates the context-sensitive language of doubled strings {ww | w ∈ {a, b}∗}

3.1 Defining evaluation subcontexts and redefining evaluation contexts
3.2 Defining values in the λ-calculus
3.3 Restricting the computation relation to enforce call-by-value, left-to-right evaluation in a λ-calculus with # and abort
3.4 The evaluation result ⌊V⌋ of values V in the λ-calculus
3.5 The continuation-passing-style transform ⟦E⟧ for expressions E in the λ-calculus with # and abort, under call-by-value, left-to-right evaluation
3.6 Defining expressions in the λ-calculus with # and abort
3.7 Proving that 〈left〈〉, right〈〉〉 is well formed
3.8 Proving that 〈abort(left〈〉), abort(right〈〉)〉 is well formed
3.9 Abstracting control
3.10 Alice thinks nobody saw Bob
3.11 Alice saw nobody's mother
3.12 Someone saw everyone

4.1 Adding the binary mode } to the Lambek calculus without products
4.2 Three ways to decompose ∆[Θ, Π] into a subexpression and a context
4.3 Alice saw [ ]
4.4 Alice saw everyone
4.5 Alice saw everyone, without the formal bureaucracy


4.6 Linear scope for Someone saw everyone
4.7 Inverse scope for Someone saw everyone
4.8 Every school griped
4.9 Letting a school take narrow scope within dean of a school
4.10 Linear scope for Someone saw everyone, under left-to-right evaluation
4.11 Inverse scope for Someone saw everyone, under left-to-right evaluation, using unquotation
4.12 Everyoneᵢ saw hisᵢ mother
4.13 Deriving whose mother Alice saw from Alice saw whose mother
4.14 Who saw what
4.15 The negative clause (incomplete sentence) np saw anyone

5.1 An untyped λ-calculus with shift and reset, enforcing call-by-value, argument-to-function evaluation
5.2 An untyped λ-calculus with shift and reset, enforcing call-by-name, context-to-function evaluation
5.3 A dual calculus with delimited control operators
5.4 Translating the λ-calculus to the dual calculus
5.5 Duality in the dual calculus
5.6 A dual calculus with delimited control operators, enforcing call-by-value evaluation
5.7 A continuation-passing-style transform for the dual calculus


Acknowledgments

Chapter 4 is joint work with Chris Barker (Shan and Barker 2005; Barker and Shan 2005).

Incomparable thanks to Nate Ackerman, Amal Ahmed, David Ahn, Karlos Arregi, Peter Arvidson, Tamara Babaian, Chris Barker, Rachel Ben-Eliyahu-Zohary, Johan van Benthem, Raffaella Bernardi, William Byrd, Daniel Büring, Marco Carbone, Bob Carpenter, Balder ten Cate, Ruggiero Cavallo, Victoria Chang, David Chiang, Florin Constantin, Olivier Danvy, Veneeta Dayal, Elizabeth Denne, John Dias, Allyn Dimock, David Dowty, Matthias Felleisen, Andrzej Filinski, Kai von Fintel, Matthew Fluet, Danny Fox, Lisa Friedland, Daniel Friedman, Kobi Gal, Svetlana Godjevac, Rajeev Gore, Paul Govereau, Philippe de Groote, Barbara Grosz, Daniel Hardt, Robert Harper, Kelly Heffner, Irene Heim, Andre Henriques, David Hiniker, Glenn Holloway, C.-T. James Huang, Luke Hunsberger, Rebecca Hwa, Sabine Iatridou, Pauline Jacobson, Micha Josephy, Aravind Joshi, Gerhard Jäger, Emir Kapancı, Oleg Kiselyov, Samuel Klein, Shriram Krishnamurthi, Susumu Kuno, Sebastien Lahaie, François Lamarche, Jo-wang Lin, Shine Lin, Harry Mairson, Chris Manning, Jan Midtgaard, Michael Moortgat, Richard Moot, Greg Morrisett, Radhika Nagpal, Peter Neergaard, Rani Nelken, Rebecca Nesson, Brenda Ng, Lynn Nichols, Jill Nickerson, Andreea Nicoara, Jonathan Nissenbaum, Richard Oehrle, David Parkes, Francis Jeffry Pelletier, Fernando Pereira, David Pesetsky, Simon Peyton Jones, Avi Pfeffer, Andrew Pimlott, Chris Potts, Johnn Prance, Michael Rabin, Antonio Ramirez, Norman Ramsey, Tim Rauenbusch, Kevin Redwine, Seth Riney, Janet Rosenbaum, Wheeler Ruml, Amr Sabry, Ivan Sag, Margo Seltzer, Pao-ching Shan, Stuart Shieber, Olin Shivers, Jeffrey Siskind, Christian Skalka, Paul Steckler, Matthew Stone, Kaihsu Tai, Autrijus Tang, Xiaopeng Tao, Geetika Tewari, Hayo Thielecke, Chris Thorpe, Dylan Thurston, Jean-Baptiste Tristan, David Van Horn, Willemijn Vermaat, Philip Wadler, Mitchell Wand, Yanling Wang, Susan Wieczorek, Yoad Winter, and Noam Zeilberger.

This research is supported by the United States National Science Foundation Grants IRI-9712068 and BCS-0236592.

This version of the dissertation is typeset for use rather than archival. The two versions differ only in numbering and cross-references. All citations should be to this version.


CHAPTER 1

Introduction

This dissertation is about computational linguistics, in two senses. First, we apply insights from computer science, especially programming-language semantics, to the science of natural languages. Second, we apply insights from linguistics, especially natural-language semantics, to the engineering of programming languages. The phrase “computational linguistics” has a popular third sense, which is natural-language processing: teaching computers to listen to, speak, read, and write natural language. That is not our aim, even though the research described here indirectly helps it—by enhancing our understanding of natural and programming languages. This dissertation contains no quantitative performance measures. Rather, our success is measured by the qualitative yardsticks of linguistics and programming-language theory. Therefore, we introduce these yardsticks before describing our contributions in more detail.

1.1. Linguistics

Linguistics aims to explain empirical observations of natural language: what utterances are available for what meanings. For example, a linguist observing Harvard students may note that they can pronounce and perceive the utterances Alice passed and Bob passed without any sense of error, and that they react differently to the two utterances. But before analyzing or even reporting such observations, the linguist must focus on some aspects of language and abstract away from others. For example, Alice passed can be pronounced and perceived differently by different people in different situations with different shades of meaning. We pretend—as linguists often do—that there is a single language called English, spoken perfectly at an instant in time by a homogeneous community who pronounce Alice passed identically. We idealize—as natural-language syntacticians often do—the pronunciation of Alice passed as a two-word sequence, which speakers judge to be acceptable with little or no information about the situation of use. By contrast, we take the two-word sequence *passed Alice to be unacceptable (notated with the asterisk in front), in that it is only with a sense of error that an idealized community of English speakers can produce or comprehend the isolated utterance in an idealized situation of English speech.

As natural-language semanticists often do, we distinguish the meanings of utterances by observing their truth conditions: when is an utterance true, and when is it false? That is, what situations (or models, or possible worlds) satisfy a given utterance's description? For example, we observe empirically that Alice passed and Bob passed differ semantically, because a situation exists where Alice passed is true but Bob passed is false. We also observe that Alice passed is true in every situation where Both Alice and Bob passed is true. (These are facts about English, in that only an English speaker knows them, but they presumably have to do with corresponding logical entailments and non-entailments.) Thus we aim to produce a scientific theory that is as simple as possible and accounts as accurately as possible for these observations of utterance acceptability and truth conditions. Like many linguists, we focus our theories on particular linguistic phenomena we are interested in and utterances that exemplify them, then expect to gain formal devices and informal insights that apply more broadly.

Our notions of word and meaning only intuitively resemble the colloquial senses of the terms, because we justify (operationalize) them only by how much they help us account for utterance acceptability and truth conditions. For example, we view Alice passed as a two-word sequence only because it makes for a simpler or more accurate account of utterance acceptability, not because standard English orthography puts a space between Alice and passed. Similarly, to determine the meaning of Alice, we weigh only what would make for a simpler or more accurate account of truth conditions, not our intuition that the utterance Alice passed is about the person Alice. Indeed, truth conditions only tell us that certain utterances have different meanings, not what these meanings are. Moreover, truth conditions fall silent on utterances that can be neither true nor false, such as Alice, Did Alice pass?, and If Alice passed. Yet as Section 1.5 illustrates, natural language is rich enough for us to find out about words and meanings just by striving to account for utterance acceptability and truth conditions simply and accurately. For example, the utterances mentioned above help us decide that Alice is a word and what it means.

1.2. Programming-language theory

Computer-science research on programming aims to help programmers convey their intentions, and computers execute them, correctly and efficiently. Programming-language theory contributes to this goal by modeling and reasoning about the conveyance and execution. As has proven useful, we abstract away from how programmers and computers interact and vary over time and space in different environments. Instead, we formalize a programming language as a collection of well-formed programs that evaluate to observable outcomes. Under this model, an idealized programmer conveys a well-formed program to an idealized computer, which runs the program and announces the outcome. Programs that are ill-formed (that is, not well-formed) are prohibited.

As in linguistics, we focus our theories on issues and programs of interest, then expect to gain formal devices and informal insights that apply more broadly. For example, to study the issue of printing, we may contemplate a toy programming language with the programs and outcomes listed in Figure 1.1. An outcome in this language is a number and possibly some printed output.

    Well-formed program    Outcome
    2 + 3                  5
    print (2 + 3)          5, and printing 5
    2 + (print 3)          5, and printing 3

    Ill-formed program     No outcome
    2 +                    ×
    print                  ×
    2 + (3 print)          ×

    Figure 1.1. Observations on a toy programming language for printing

Suppose that a demanding customer calls up a programming-language design shop and orders the observations given in Figure 1.1. The programming-language theorist at the shop then tries to design simple rules that give rise to the observations accurately, so as to teach programmers and build computers to convey and execute intentions in the language. These rules invoke notions of syntax and semantics to distinguish well-formed programs and determine their outcomes. For example, it can be useful to view parentheses (as in print (2 + 3)) as syntactic units (“words”) in their own right, but only for some purposes (like checking for a well-formed program) and not others (like executing a well-formed program). Similarly, it can be useful to assign semantic values (“meanings”) to program parts like 2, +, and print, but only to account for different outcomes (that is, to ensure that expressions with the same meaning have the same outcome). For example, the fact that the three well-formed programs in Figure 1.1 differ in outcome tells us only that they differ in meaning, not what they mean. Moreover, outcomes fall silent on program parts that are not complete programs, such as + and print. Yet as Section 1.4 illustrates, we can learn plenty about the syntax and semantics of a rich-enough programming language just by striving to specify its well-formed programs and their outcomes simply and accurately.
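To make Figure 1.1 concrete, an outcome can be modeled as a number paired with a list of printed numbers. Here is a minimal sketch in Haskell; the datatype and function names are ours, not part of the dissertation's formal development, which begins in Chapter 2:

    -- Programs of the toy printing language of Figure 1.1.
    data Expr = Lit Int          -- numerals such as 2 and 3
              | Add Expr Expr    -- addition: e1 + e2
              | Print Expr       -- print e: produce e's value, printing it too

    -- An outcome: the resulting number, plus everything printed along the way.
    eval :: Expr -> (Int, [Int])
    eval (Lit n)     = (n, [])
    eval (Add e1 e2) = let (m, out1) = eval e1
                           (n, out2) = eval e2
                       in (m + n, out1 ++ out2)
    eval (Print e)   = let (n, out) = eval e
                       in (n, out ++ [n])

    -- eval (Print (Add (Lit 2) (Lit 3)))  ==  (5, [5])
    -- eval (Add (Lit 2) (Print (Lit 3)))  ==  (5, [3])

Ill-formed programs such as 2 + and 2 + (3 print) are ruled out by the datatype itself: they cannot even be written down as an Expr.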

1.3. Contributions

This dissertation shows a new way for linguists and programming-language theorists to share their work and help each other: an analogy between apparently noncompositional phenomena in natural languages and computational side effects in programming languages. As is well-studied and explained below in this chapter, expressions like every student elude compositional treatment at first glance, and programs like print 3 incur computational side effects. It turns out that these phenomena may be analyzed using similar tools. To stress this analogy, we term apparently noncompositional phenomena in natural languages linguistic side effects. We apply this analogy to linguistics as well as programming-language theory.

Many connections between natural and programming languages are based on logic, including this one. The reason is that logic underlies the syntax and semantics of both kinds of languages. On the programming-language side, programs can be viewed as logical proofs according to the formulas-as-types correspondence (Girard et al. 1989), as well as reasoned about using logical axioms (Hoare 1969). On the natural-language side, the proofs and models of logic can characterize the acceptability of utterances (Lambek 1958) as well as their truth conditions (Montague 1974a).

We introduce the logical machinery used in the bulk of this dissertation in Chapter 2. Then, in Chapter 3, we use this machinery to present the analogy between computational and linguistic side effects. In one direction, Chapter 4 draws from the execution of computer programs to model natural languages more realistically. In the opposite direction, Chapter 5 then draws from the symmetry of linguistic combinations to design programming languages more perspicuously. We evaluate our contributions and compare them with previous work throughout the dissertation, then collect the evaluations in Chapter 6, along with some thoughts for future work.

To be sure, this work is far from the first to draw an analogy between utterances and programs. Intensional logic, in which much natural-language semantics since Montague (1974a) is couched, has long been understood computationally (Hobbs and Rosenschein 1978; Hung and Zucker 1991). More recently, dynamic semantics (Groenendijk and Stokhof 1991; Heim 1982; Kamp 1981) has been used to analyze natural-language phenomena such as verb-phrase ellipsis (van Eijck and Francez 1995; Gardent 1991; Hardt 1999), as well as to design a programming language (van Eijck 1998). Our work is novel because the general concept of side effects unifies many concerns at both ends of the analogy.

The rest of this section gives more details on our contributions to the study of natural and programming languages.

1.3.1. Evaluation order in natural languages. Most computational side effects and many linguistic ones are thought of as the dynamic effect of executing a program or processing an utterance. This is the intuition underlying the term “side effects”. For example, it is intuitive to conceive of state (a computational side effect) and anaphora (a linguistic side effect) in similar, dynamic terms, as follows. In the sentence

(1.1) Every woman’s father saw her mother,

the pronoun her can refer to the woman introduced by the antecedent every woman. Thus (1.1) can mean that every woman x is such that x's father saw x's mother. A pronoun tends to appear only after its antecedent, so the sentence

(1.2) Her father saw every woman’s mother

does not have the same meaning. To explain this prohibition against crossover, it is popular to hypothesize that every woman stores a discourse referent into a memory cell and her retrieves it, and that a human typically processes parts of an utterance in spoken order. This hypothesis is appealingly reminiscent of how the program

(1.3) x := 2; x + 1

stores a number into the variable x and later retrieves it, and how a computer typically executes parts of a program in written order. This view on anaphora and state is dynamic, as indicated by the preceding verbs “store”, “retrieve”, “process”, and “execute”. Unfortunately, the naïve form of this hypothesis fails to account for exceptions like the following.

(1.4) Which of her relatives did every woman see?
(1.5) Which woman did her father see?

In (1.4), the phrase every woman occurs after the pronoun her, yet can serve as its antecedent. In (1.5), the phrase which woman occurs before the pronoun her, yet cannot serve as its antecedent.

Relating this dynamic (operational) view on side effects to a static (denotational) view has been a focus of recent research. Across our new side-effects analogy, linguists and programming-language theorists have been asking alike: how does an expression manage to be both a static product that stands alone mathematically and a dynamic action that takes place physically (Wadler 1997; Trueswell and Tanenhaus 2005)? For programming languages, game semantics (Abramsky and McCusker 1997) and other models of interaction (Milner 1996) blur the boundary between what programs statically denote and how they dynamically operate. Linguists are constantly reminded that the distinction between semantics and pragmatics is far from black and white, by dynamic semantics and notions like the felicity of an answer to a question (Hamblin 1973; Groenendijk and Stokhof 1997) and the computational load to produce or comprehend an utterance (Gibson and Pearlmutter 1998; Altmann 1990).

In Chapter 4, we apply the dynamic concept of evaluation order from programming languages to natural languages. Using this concept, we properly formalize the hypothesis about anaphora above to account for crossover, including exceptions like (1.4) and (1.5). Moreover, the similarity between anaphora and state turns out to be just one in a wide range of similarities between linguistic and computational side effects: intensionality can be analyzed like environment, quantification can be analyzed like delimited control, and so on. The single hypothesis that humans typically process utterance parts in spoken order, once we formalize it properly, turns out to explain an unprecedented variety of linguistic generalizations whose similarity across side effects was previously unrecognized: not just the prohibition against crossover in anaphora, but also the preference for linear scope in quantification, the superiority constraint on questions, and the effect of spoken order on polarity sensitivity. Via our side-effects analogy, programming-language theory helps us formalize this natural hypothesis regarding how humans process utterances dynamically.

1.3.2. Delimited duality in programming languages. Many linguistic formalisms feature a duality between left and right, that is, an involution that reverses the spoken order of utterance parts. Type-logical grammar, the linguistic formalism used in this dissertation, is one such formalism whose left-right duality is especially obvious. For example, given a model of English in type-logical grammar, it is trivial to mirror it to yield a model of a hypothetical natural language that is just like English, except with the opposite word order, so *Alice passed is not an acceptable utterance, but passed Alice is.

The left-right duality in type-logical grammar is intrinsic to the default mode of binary combination: juxtapose two utterances to form a larger one. To analyze the linguistic side effect of quantification, in Chapter 4 we add another mode of binary combination: plug a subexpression into a context to form a larger expression. This multimodal type-logical grammar thus features another intrinsic duality, between the subexpression inside a context and the context outside a subexpression.

In Chapter 5, we apply the duality between inside and outside from natural languages to programming languages. Drawing from our linguistic analysis of quantification, we formalize a programming language with the computational side effect of delimited control, in which subexpressions and contexts are dual. It turns out that this duality exchanges call-by-value and call-by-name, two evaluation orders long studied in programming languages, so that one order may be understood in terms of the other. This result is not surprising, because call-by-value and call-by-name are known to be dual in the presence of undelimited control (Filinski 1989a,b; Danos et al. 1995; Curien and Herbelin 2000; Selinger 2001; Wadler 2003). Via our side-effects analogy, linguistics helps us extend this duality to delimited control for the first time. The same duality also exchanges the well-understood let construct for variable binding and the less-well-understood shift construct for delimited control, so that one construct may be understood in terms of the other.
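Although the formal development waits until Chapter 5, the flavor of shift and its delimiter reset can be previewed in any language with delimited control. Below is a minimal sketch using Haskell's standard continuation monad, one stock encoding of shift and reset rather than the dissertation's calculus:

    import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

    -- reset delimits a context; shift captures that context as a function k.
    -- Here the captured context is k = \v -> v + 1 (the "+ 1" wrapped around
    -- the hole), which the body of shift applies twice before returning.
    example :: Int
    example = evalCont $ reset $ do
      x <- shift (\k -> return (k (k 2)))   -- grab the delimited context
      return (x + 1)                        -- the context being [ ] + 1
    -- example == 4, since k (k 2) = (2 + 1) + 1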

The rest of this chapter informally illustrates our analogy between computational and linguistic side effects. We examine how a denotational semantics typically treats environment (a computational side effect) in Section 1.4, and intensionality (a linguistic side effect) in Section 1.5. We then point out how the treatments are similar in Section 1.6.

1.4. Environment in programming languages

    Well-formed program    Outcome
    2 > 6                  false
    2 + 6 > 6              true
    2 + 6 > (2 + 2) + 2    true
    2 + 6 > 6 + 6          false
    2 > 2                  false

    Figure 1.2. A tiny programming language

Figure 1.2 illustrates a tiny programming language where arithmetic expressions are compared to yield Boolean results (true or false). For example, the program

(1.6)  2 + 6 > 6
       ‘(Check if) the sum of 2 and 6 is greater than 6.’

evaluates to the true outcome.

Although arithmetic expressions like 2 + 6 and 6 are not complete programs with (Boolean) outcomes, it is intuitive to regard them as evaluating to numeric results. For example, we regard 2 + 6 as evaluating to 8, and 6 as evaluating to 6, even though these results are only indirectly observable by comparison using >. Under this intuition, if two expressions E₁ and E₂ evaluate to the same result in this language, then whenever E₁ occurs as part of a larger expression, it can be replaced with E₂ without affecting what the larger expression evaluates to.¹ For example, since 6 and (2 + 2) + 2 evaluate to the same result, so do 2 + 6 > 6 and 2 + 6 > (2 + 2) + 2.

¹ This property is sometimes known as referential transparency (Quine 1960), but we avoid the term because it is problematically overloaded (Søndergaard and Sestoft 1990, 1992).

In operational terms, each expression is a procedure that can be followed to produce a result. In denotational terms, we can take each expression simply to mean this result. For example, the program (1.6) can simply denote true. To provide a denotational semantics for a language is to specify the denotation ⟦E⟧ of every expression E. For our programming language, we specify the integer and Boolean denotations

(1.7)   ⟦2⟧ = 2
(1.8)   ⟦6⟧ = 6
(1.9)   ⟦A + B⟧ = ⟦A⟧ + ⟦B⟧
(1.10)  ⟦A > B⟧ = true if ⟦A⟧ > ⟦B⟧, false if ⟦A⟧ ≤ ⟦B⟧

where A and B are any expressions. These rules together entail that the program (1.6) denotes true:

(1.11)  ⟦2 + 6⟧ = ⟦2⟧ + ⟦6⟧ = 2 + 6 = 8
(1.12)  ⟦2 + 6 > 6⟧ = true if ⟦2 + 6⟧ > ⟦6⟧, false if ⟦2 + 6⟧ ≤ ⟦6⟧
        = true, because 8 > 6
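These denotation rules transcribe directly into a program. The following is a minimal sketch in Haskell of (1.7)–(1.10); the two-sorted datatype reflects that comparisons occur only at the top level of programs, and the generalization from the numerals 2 and 6 to arbitrary integer literals, like all the names here, is our own:

    -- The tiny language of Figure 1.2.
    data Expr = Lit Int | Add Expr Expr   -- integer-denoting expressions
    data Prog = Gt Expr Expr              -- complete programs: A > B

    den :: Expr -> Int                    -- denotation of integer expressions
    den (Lit n)   = n                     -- (1.7), (1.8)
    den (Add a b) = den a + den b         -- (1.9)

    denP :: Prog -> Bool                  -- denotation of programs
    denP (Gt a b) = den a > den b         -- (1.10)

    -- denP (Gt (Add (Lit 2) (Lit 6)) (Lit 6)) == True, matching (1.11)-(1.12)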

We want a denotational semantics that is sound with respect to the operational semantics, and this one is. Soundness means that, if two expressions E₁ and E₂ have the same denotation, then they evaluate to the same result. For example, since 2 > 6 and 2 + 6 > 6 + 6 both denote false, soundness guarantees (correctly) that they evaluate to the same result (namely false). Soundness is desirable because expressions with the same denotation should be equivalent in meaning, and what an expression evaluates to (in particular, the outcome of a program) should be part of its meaning.

Another desirable feature of this denotational semantics is that it is compositional: the denotation of every complex expression is fully determined by the denotations of its parts and the mode of combination. For example, the denotation of every + expression is fully determined by the denotations of its two subexpressions. Compositional semantic theories are desirable because they are easier to use and extend (Janssen 1997). In theory, any language admits a compositional semantics, regardless of what its expressions are and what they evaluate to—just let each expression denote itself.² However, it is often a challenge to come up with a compositional semantics that is simple and insightful enough to facilitate further analysis of the language.

² A less trivial result, shown by Zadrozny (1994) and Lappin and Zadrozny (2000) using non-well-founded set theory, is that any language admits a compositional semantics that respects synonymy. For a denotational semantics to respect synonymy is for two expressions E₁ and E₂ to denote the same thing whenever they are synonymous (or observationally equivalent). We say that E₁ and E₂ are synonymous if, whenever E₁ occurs as part of a larger expression, it can be replaced with E₂ to form another expression with the same evaluation result, and vice versa.

    Well-formed program                                    Outcome
    2 > 6                                                  false
    2 + 6 > 6                                              true
    2 + 6 > (2 + 2) + 2                                    true
    2 + 6 > 6 + 6                                          false
    2 > 2                                                  false
    let it be 2. (it > 6)                                  false
    let it be 2. (it + 6 > 6)                              true
    let it be 2. (it + 6 > (let it be it + it. it + 2))    true
    let it be 6. (2 + it > it + it)                        false
    it + 6 > 6                                             false
    it > 2                                                 false
    let it be 6. (it > 2)                                  true
    let it be 6. (2 > 2)                                   false

    Figure 1.3. A tiny programming language, with environment

We now add the environment feature to the language, as illustrated in Figure 1.3. We introduce two new constructs: a primitive expression it, which reads a number from the environment, and a binary operator let it be A. B, which provides A as the environment to be read by B. The program

(1.13)  let it be 2. (it + 6 > 6)
        ‘As for 2, the sum of it and 6 is greater than 6.’

evaluates to the true outcome, as 2 + 6 > 6 does, because let it be 2 provides the environment 2 to the expression

(1.14)  it + 6 > 6
        ‘The sum of it and 6 is greater than 6.’

As another example, the program

(1.15) let it be 2. (it + 6 > (let it be it + it. it + 2))

evaluates to the true outcome, as 2 + 6 > (2 + 2) + 2 does, because let it be it + it provides the environment 4 to the expression it + 2. We stipulate that any it not enclosed by let it be . . . evaluates to 0. For example, the program (1.14) above evaluates to false, because 0 + 6 is not greater than 6. (On the other hand, a let it be . . . that encloses no it is simply ignored.)


In denotational terms, once environment is introduced into the language, we can no longer just let each expression denote its result—at least not if we want the denotational semantics to stay compositional and sound, which we do. To see this, suppose that each program were to denote its Boolean outcome. Then, since the two programs in (1.16) both evaluate to false, they have the same denotation. By compositionality, the two programs in (1.17) must also have the same denotation. But (1.17a) evaluates to true, whereas (1.17b) evaluates to false, so soundness fails.

(1.16)  a. it > 2
           ‘It is greater than 2.’
        b. 2 > 2
           ‘2 is greater than 2.’

(1.17)  a. let it be 6. (it > 2)
           ‘As for 6, it is greater than 2.’
        b. let it be 6. (2 > 2)
           ‘As for 6, 2 is greater than 2.’

It is crucial to this proof that, whereas the programs in (1.16) evaluate to the same result, the programs in (1.17) do not. That is, the proof relies on two expressions E₁ and E₂ that evaluate to the same result, such that replacing E₁ with E₂ as part of a larger expression affects what the larger expression evaluates to. Adding environment made it impossible to specify a compositional and sound denotational semantics where each expression denotes its result. A computational side effect is a programming-language feature that gives rise to two expressions that evaluate to the same result by themselves but yield different results as part of a larger expression, thus rendering unsound any compositional denotational semantics where each expression denotes its result. In this sense, environment is a computational side effect.

To preserve soundness and compositionality in the presence of environment, the programming-language semanticist typically revises the denotational semantics so that an expression denotes not a result but a map from environments to results. For example, the expression 2 will now denote not 2 but the function that maps every environment to 2. Formally, we specify (cf. (1.7)–(1.10))

(1.18)  ⟦2⟧(ρ) = 2
(1.19)  ⟦6⟧(ρ) = 6
(1.20)  ⟦A + B⟧(ρ) = ⟦A⟧(ρ) + ⟦B⟧(ρ)
(1.21)  ⟦A > B⟧(ρ) = true if ⟦A⟧(ρ) > ⟦B⟧(ρ), false if ⟦A⟧(ρ) ≤ ⟦B⟧(ρ)

for any number ρ and any expressions A and B. Having lifted denotations from plain results to environment-to-result functions, we can now provide denotations for expressions involving it:

(1.22)  ⟦let it be A. B⟧(ρ) = ⟦B⟧(⟦A⟧(ρ))
(1.23)  ⟦it⟧(ρ) = ρ

These rules give the desired denotation for the program (1.13).

(1.24)  ⟦it + 6⟧(2) = ⟦it⟧(2) + ⟦6⟧(2) = 2 + 6 = 8
(1.25)  ⟦it + 6 > 6⟧(2) = true if ⟦it + 6⟧(2) > ⟦6⟧(2), false if ⟦it + 6⟧(2) ≤ ⟦6⟧(2)
        = true, because 8 > 6
(1.26)  ⟦let it be 2. (it + 6 > 6)⟧(ρ) = ⟦it + 6 > 6⟧(⟦2⟧(ρ)) = ⟦it + 6 > 6⟧(2) = true
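Lifting denotations to environment-to-result functions is equally mechanical. Here is a minimal sketch in Haskell of (1.18)–(1.23), again with our own names; complete programs are run in the initial environment 0, implementing the stipulation that an unenclosed it evaluates to 0:

    -- The language of Figure 1.3: denotations take an environment rho.
    data Expr = Lit Int | Add Expr Expr | It | LetIt Expr Expr
    data Prog = Gt Expr Expr | LetItP Expr Prog

    type Env = Int                          -- the environment: one number

    den :: Expr -> Env -> Int
    den (Lit n)     _   = n                       -- (1.18), (1.19)
    den (Add a b)   rho = den a rho + den b rho   -- (1.20)
    den It          rho = rho                     -- (1.23)
    den (LetIt a b) rho = den b (den a rho)       -- (1.22)

    denP :: Prog -> Env -> Bool
    denP (Gt a b)     rho = den a rho > den b rho -- (1.21)
    denP (LetItP a b) rho = denP b (den a rho)    -- program-level let, as in (1.22)

    run :: Prog -> Bool
    run p = denP p 0                        -- unenclosed it reads 0

    -- run (LetItP (Lit 2) (Gt (Add It (Lit 6)) (Lit 6))) == True   -- (1.13)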

We have seen that adding environment to a programming language forces us to revise our denotational semantics, and how to model environment using functions as denotations. We have also seen that, even when a toy programming language is so simple that the outcome of a program can only be true or false, we can learn about the syntax and semantics of the language by striving to specify the programs and their outcomes simply and accurately. In the next section, we turn to natural language and learn about syntax and semantics by striving to specify which sentences are true and which are false simply and accurately.

1.5. Intensionality in natural languages

Figure 1.4 shows a tiny fragment of English, composed of utterances about seeing (Frege 1891, 1892; Quine 1960). According to the table, Alice and Bob saw each other, but only Bob saw Venus. (Venus is also known as the morning star and the evening star.)

    Acceptable utterance          Truth value
    Alice saw Bob                 true
    Alice saw the morning star    false
    Alice saw the evening star    false
    Bob saw Alice                 true
    Bob saw the morning star      true
    Bob saw the evening star      true

    Figure 1.4. A tiny fragment of English

When an utterance is true or false, like those in Figure 1.4, we say that the utterance refers to its truth value. Although expressions like Alice and the morning star are not utterances with (Boolean) truth values, it is intuitive to regard them as referring to objects. For example, we regard Alice as referring to Alice, and the morning star as referring to Venus, even though these references are only indirectly observable by embedding these expressions in a larger utterance with a truth value. Under this intuition, if two expressions E₁ and E₂ in this fragment of English refer to the same thing, then whenever E₁ occurs as part of a larger expression, it can be replaced with E₂ without affecting what the larger expression refers to. For example, since the morning star and the evening star have the same reference, so do Bob saw the morning star and Bob saw the evening star.

In operational terms, an expression like Alice and the morning star is a procedure that can be followed to pick out an object, and an expression like

(1.27)  Alice saw the morning star
(1.28)  Alice saw the evening star

is a procedure that can be followed to check if the expression is true. The sentence (1.27) means to pick out Alice and the morning star, then check whether she saw it. In denotational terms, we can take each expression simply to mean its reference. For example, the utterance (1.27) can simply denote false. We specify

(1.29)  ⟦Alice⟧ = Alice
(1.30)  ⟦Bob⟧ = Bob
(1.31)  ⟦the morning star⟧ = Venus
(1.32)  ⟦the evening star⟧ = Venus
(1.33)  ⟦A saw B⟧ = true  if ⟦A⟧ = Alice and ⟦B⟧ = Bob,
                    false if ⟦A⟧ = Alice and ⟦B⟧ = Venus,
                    true  if ⟦A⟧ = Bob and ⟦B⟧ = Alice,
                    true  if ⟦A⟧ = Bob and ⟦B⟧ = Venus, . . . .

These rules together entail that the utterance (1.27) denotes false.

This denotational semantics is sound. For example, since the morning star and the evening star denote the same thing, they also refer to the same thing: the morning star is the evening star. This semantics is also compositional, as is evident from the form of (1.29)–(1.33).

We now expand this fragment of English to include some utterances about thinking, as shown in Figure 1.5. According to the table, Bob (being ignorant of astronomy) does not know that the morning star is the evening star, so he thinks Alice saw the morning star but does not think that Alice saw the evening star.³

    Acceptable utterance                     Truth value
    Alice saw Bob                            true
    Alice saw the morning star               false
    Alice saw the evening star               false
    Bob saw Alice                            true
    Bob saw the morning star                 true
    Bob saw the evening star                 true
    Bob thinks Alice saw the morning star    true
    Bob thinks Alice saw the evening star    false

    Figure 1.5. A tiny fragment of English, with think

³ This classical example may be counter-intuitive to a reader who knows astronomy. An equivalent example is Bob thinks Alice drove on Route 9 versus Bob thinks Alice drove on Worcester Street.

In denotational terms, once we expand our fragment to include think, we can no longer just let each expression denote its reference—at least not if we want the denotational semantics to stay compositional and sound, which we do. To see this, suppose that each utterance that is true or false were to denote its Boolean reference. Then, since the utterances (1.27) and (1.28) have the same reference (namely false), they have the same denotation. By compositionality, the two utterances

(1.34)  Bob thinks Alice saw the morning star
(1.35)  Bob thinks Alice saw the evening star

must also have the same denotation. But (1.34) is true, yet (1.35) is false, so soundness fails.

It is crucial to this proof that, whereas the utterances (1.27) and (1.28) have the same reference, the utterances (1.34) and (1.35) do not. That is, the proof relies on two expressions E₁ and E₂ that have the same reference, such that replacing E₁ with E₂ as part of a larger expression affects the larger expression's reference. In English, think makes it impossible to specify a compositional and sound denotational semantics where each expression denotes its reference. The word think is said to be intensional.

To preserve soundness and compositionality in the presence of intensionality, the natural-language semanticist typically revises the denotational semantics so that an expression denotes not a reference but a map from possible worlds to references. The morning star and the evening star will now denote not Venus but two distinct functions from possible worlds to heavenly bodies. These two denotations will map the actual world to the same heavenly body Venus, but other worlds to different heavenly bodies. Similarly, (1.27) and (1.28) will now denote not false but two distinct functions from possible worlds to truth values. These two denotations will map the actual world to false, but other worlds to different truth values. If w is a world where the morning and evening stars are distinct and Alice only saw the former, then the denotations of (1.27) and (1.28) map w to true and false, respectively. Formally, we specify (cf. (1.29)–(1.33))

(1.36)  ⟦Alice⟧(w) = Alice
(1.37)  ⟦Bob⟧(w) = Bob
(1.38)  ⟦the morning star⟧(w) = the morning star in the world w
(1.39)  ⟦the evening star⟧(w) = the evening star in the world w
(1.40)  ⟦A saw B⟧(w) = whether ⟦A⟧(w) saw ⟦B⟧(w)

Having lifted denotations from plain references to possible-world-to-reference functions, we can now provide denotations for utterances involving think:

(1.41)  ⟦A thinks B⟧(w) = whether ⟦B⟧(w′) is true for every world w′
        that ⟦A⟧(w) considers possible in w

These rules give the desired denotation for the sentence (1.34).

(1.42)  ⟦Bob thinks Alice saw the morning star⟧(w)
        = whether ⟦Alice saw the morning star⟧(w′) is true
          for every world w′ that ⟦Bob⟧(w) considers possible in w
        = whether ⟦Alice⟧(w′) saw ⟦the morning star⟧(w′)
          for every world w′ that Bob considers possible in w
        = whether Alice saw the morning star
          in every world that Bob considers possible in w
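The possible-worlds strategy, too, can be rendered as types and functions. In the following minimal Haskell sketch, the type World, the accessibility function considersPossibleIn, and all other names are illustrative stipulations of ours, not the dissertation's formalism:

    -- Intensional denotations as world-to-reference functions, after (1.36)-(1.41).
    data World  = Actual | W1 | W2  deriving Eq
    data Entity = Alice | Bob | Venus  deriving Eq

    type Indiv = World -> Entity   -- e.g. the denotation of "the morning star"
    type Prop  = World -> Bool     -- e.g. the denotation of (1.27)

    -- The worlds an entity considers possible in a given world (stipulated).
    considersPossibleIn :: Entity -> World -> [World]
    considersPossibleIn _ _ = [Actual, W1]   -- a toy stipulation

    -- (1.41): "A thinks B" is true at w iff B holds at every world
    -- that the denotation of A at w considers possible in w.
    thinks :: Indiv -> Prop -> Prop
    thinks a b w = all b (considersPossibleIn (a w) w)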

We have seen that the presence of an intensional verb in natural language forces us to revise our denotational semantics, and how to model intensionality using functions as denotations. We have also seen that we can learn about the syntax and semantics of natural language just by striving to specify the sentences and their truth conditions simply and accurately. We now compare the situation with that for programming languages.


1.6. Computational versus linguistic side effects

Environment constructs in programming languages (Section 1.4) and intensional verbs in natural languages (Section 1.5) exemplify a general pattern. In natural-language semantics, it is intuitive to expect every expression to have a reference, like a physical object or a Boolean value. In programming-language semantics, it is intuitive to model program evaluation by assigning to every expression a result, like a number or a Boolean value. (We henceforth use the terms “reference” and “result” interchangeably.) Given a notion of reference, some simple languages guarantee that, if two expressions E₁ and E₂ have the same reference, then whenever E₁ occurs as part of a larger expression, it can be replaced with E₂ without affecting the reference of the larger expression.

A denotational semantics is a system that assigns a denotation to every expression. We want such a semantics to be sound: two expressions with the same denotation should also have the same reference. We also want it to be compositional: the denotation of every complex expression should be determined by the denotations of its parts and the mode of combination. We prefer for every expression simply to denote its reference, but to preserve soundness and compositionality in the absence of the guarantee above requires that denotations be more complex than simple reference. Indeed, some expressions, such as Sherlock Holmes and the king of France, seem to have no reference at all, let alone a reference to denote soundly and compositionally. When such expressions are present in a language, it is obviously impossible for every expression to denote its reference.

Intensionality is only one of many natural-language phenomena that break the notion of reference or the guarantee above. We term them linguistic side effects. Some examples are anaphora (1.43), quantification (1.44), interrogation (1.45), focus (1.46), and presuppositions (1.47).

(1.43)  Anaphora: A man walks in the park. He whistles.
(1.44)  Quantification: Every woman whistles.
(1.45)  Interrogation: Which star did Alice see?
(1.46)  Focus: Alice only saw Venus.
(1.47)  Presuppositions: The king of France whistles.

To account for each linguistic side effect, semantic theories typically preserve soundness and compositionality by incorporating into denotations some new aspect of meaning beyond reference.

The same guarantee discussed above is not just observed to break down in natural languages, but also designed to break down in programming languages: to make programs more concise and modular, we often want some expressions to evaluate to the same result or no result at all, yet, when placed inside some larger context, to yield larger expressions that evaluate to different results. Environment is only one of many programming-language features that break the notion of evaluation result or the guarantee above. Such features are known technically as computational side effects, and the vast majority of programming languages in use today have them. Some examples are output (1.48), state (1.49), nondeterminism (1.50), exceptions (1.51), and control (1.52).

(1.48)  Output: print 2; 10
        ‘Print the number 2, then produce the number 10.’
(1.49)  State: x := 2; 10
        ‘Store 2 in the variable x, then produce 10.’
(1.50)  Nondeterminism: 2 + random(10, 20)
        ‘Add 2 to either 10 or 20, randomly chosen.’
(1.51)  Exceptions: try(2 + throw) catch 3
        ‘Add 2 to an error; fall back to 3 in case of error.’
(1.52)  Control: label: 2 + goto label
        ‘Add 2 to the result of starting over again.’

To preserve soundness and compositionality in the face of a computational side effect, semantic theories of programming languages typically take the same strategy as those of natural languages: they add to denotations a new aspect of meaning beyond evaluation results.
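The enrichment strategy can be summarized at the level of types. The Haskell type synonyms below sketch one conventional enrichment of denotations for each side effect in (1.48)–(1.52); this is standard material from the programming-language semantics literature, not the dissertation's own analysis:

    -- 'a' is the type of plain results; everything else is the added aspect.
    type Output s a       = (a, [s])       -- (1.48): a result plus printed output
    type State s a        = s -> (a, s)    -- (1.49): thread a store through
    type Nondeterminism a = [a]            -- (1.50): all possible results
    type Exceptions e a   = Either e a     -- (1.51): either an error or a result
    type Control r a      = (a -> r) -> r  -- (1.52): consume the continuation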

As described above, environment and intensionality can be modeled using functions as denotations. Other side effects call for different treatments. The proper treatments of many linguistic and computational side effects are open areas for research: some studies focus on a single side effect, while others characterize more than one.

Be it for the scientific goal of characterizing humans who learn and use language or for the engineering goal of building machines that process and execute language, the linguist and the programming-language theorist both strive for a simple, modular, and extensible treatment of side effects. One approach towards such a treatment is to unify multiple notions or phenomena and show how they are instances of a more general concern. That is the approach we take throughout this dissertation, both in drawing a general analogy between computational and linguistic side effects and in applying the analogy to its two ends:

• We use apparent noncompositionality to unify computational and linguistic side effects.
• We use left-to-right evaluation in the presence of computational side effects to unify four generalizations across linguistic side effects: the prohibition against crossover in anaphora, the preference for linear scope in quantification, the superiority constraint on interrogation, and the effect of spoken order on polarity sensitivity.
• We use the duality between inside and outside in quantification (a linguistic side effect) to unify call-by-value with call-by-name, and the let construct with the shift construct, in the presence of delimited control (a computational side effect).

But first, we need some notation.


CHAPTER 2

Formalities

This chapter establishes the formal frameworks in which the remainder of this dissertation presents analyses of programming and natural languages. An analysis comprises general rules that specify or predict what expressions are acceptable and what they mean. Because these rules need to apply to a potentially infinite number of expressions, we cannot simply enumerate every case. Instead, we specify a finite set of inference rules, which can give rise to an infinite number of conclusions.

As is standard in logic, we indicate inference by a horizontal rule that separates zero or more premises above from exactly one conclusion below. Whereas the inference

(2.1)   All men are mortal    Socrates is a man
        ───────────────────────────────────────
                  Socrates is mortal

concludes with mortality, the inference

(2.2)   Alice is a subject    left is a predicate
        ─────────────────────────────────────────
                 Alice left is a sentence

concludes with acceptability. These premises and conclusions are called judgments.

This notation can be used to write inference rules as well as proofs. An inference rule describes an allowed pattern of inference. For example, the inference (2.1) illustrates the rule

(2.3)   All men are P    X is a man
        ─────────────────────────── All-Men
                  X is P

where P and X are called metavariables because they range over expressions that form a judgment (such as “mortal” and “Socrates”), not the objects they refer to (such as the set of mortals or the person Socrates). The optional label “All-Men” names the rule. We say that the inference (2.1) instantiates (or matches) the All-Men rule. To take another example, the inference (2.2) illustrates the rule

(2.4)   S is a subject    P is a predicate
        ────────────────────────────────── Subject-Predicate
                 S P is a sentence


where S P means to concatenate S and P. A rule with no premise is an axiom. To reason about English, we might stipulate the axiom

(2.5)   ────────────────── Alice
        Alice is a subject

An axiom like this, which classifies an atomic expression like Alice, is called a lexical entry because it can be thought of as an entry in a lexicon (that is, a dictionary).

A proof is a tree of connected inferences; it is also sometimes called a derivation. For example, the Subject-Predicate rule (2.4) and the Alice rule (2.5) can be instantiated and combined to form the proof

(2.6)   ────────────────── Alice
        Alice is a subject    thinks vanilla is a predicate
        ─────────────────────────────────────────────────── Subject-Predicate
                Alice thinks vanilla is a sentence

Starting from the sole premise judgment that “thinks vanilla is a predicate”, this proof concludes that “Alice thinks vanilla is a sentence”. That “Alice is a subject” is not a premise assumed by the proof, but the conclusion of the Alice rule used by the proof. The proof works whether or not thinks vanilla is a predicate: it only says that Alice thinks vanilla is a sentence if thinks vanilla is a predicate.¹

¹ In response to What is Bob's favorite ice cream flavor?, for instance.

Our formal plan, then, is as follows. To specify a language, we introduce some judgment forms and enumerate some inference rules whose premise and conclusion patterns are of these forms. A typical judgment classifies an expression into a type (or category). For example, the judgment “Alice is a subject” above classifies the expression Alice into the type of subjects. We classify expressions into types so as to reason about similar expressions, such as all subjects or all predicates, together.

The inference rules we enumerate can be instantiated and combined to form proofs that (starting from no premise) classify certain expressions into the type of complete programs or complete utterances. We take exactly these expressions to be complete programs in our programming language or complete utterances in our natural language. Further, we describe what these programs mean, often by specifying how they execute, or what these utterances mean, often by modeling when they are true. A grammar is a formal system of inference rules and meaning descriptions.

When a given grammar classifies an expression into a type, we say that the expression is well typed. Contrary to some usage in computer science and linguistics, this classification is a syntactic discipline (Reynolds 1983; Pierce 2002) rather than a semantic one, in that it is an expression that is classified rather than its meaning. For example, the judgment “Alice is a subject” classifies the expression Alice rather than the person Alice. Thus we use the terms “well-typed” and “well-formed” interchangeably. Ill-typed (or ill-formed) expressions are those that are not classified into any type. A grammar models them simply by their absence.

Like all models, a natural-language grammar only approximates reality: it concentrates on aspects of natural language that we want to study, leaving out the rest. In particular, in this dissertation we examine how words combine with each other (roughly speaking) without worrying about how they sound, originate, change, or vary from speaker to speaker. Hence we do not try to enumerate or derive every lexical entry like the Alice rule in (2.5). Instead, in most natural-language derivations below, we simply start from premises like “Alice is a subject”, rather than starting from no premise or drilling down into the internal constitution of the word Alice. This approximation simplifies our grammar while still letting us study how words combine with each other. Similarly, a programming-language grammar also only approximates reality. Even if we wanted to, we could not perfectly formalize practical programming with all the features and complications that it entails, such as interactions with users, cosmic rays, and the evolution of a programming language over time. In order to make progress, it is standard practice to model programming languages with toy programming languages and natural languages with natural-language fragments. In either case, we design and study formal grammars that bring into focus the phenomena of interest in real-world languages.

The main body of this dissertation builds on the λ-calculus as a programming language and uses type-logical grammar to analyze natural languages. Before describing the λ-calculus (in Section 2.3) and type-logical grammar (in Section 2.4), we first introduce some concepts through a context-free fragment of natural language (in Section 2.1) and a toy programming language for arithmetic (in Section 2.2). Although we specify these systems completely, the reader who has not encountered inference rules and the λ-calculus before may find it helpful to acquire more background and intuition from an introductory text like Wadler's article (2000), Gallier's tutorial (1993, 1991), and Carpenter's (1997), Girard et al.'s (1989), and Pierce's (2002) books.

2.1. Context-free grammars for natural language

Figure 2.1 shows some inference rules that model a tiny fragment of English. There is only one judgment form, written A : T and pronounced “A is a T”, where A is an expression and T is a type. For now, an expression is a sequence of words, such as the empirically acceptable Alice said Bob left and the empirically unacceptable *Alice said Bob said. As mentioned earlier, types classify expressions. The types in this system are named after classes of expressions in linguistics: np (noun phrase), vp (verb phrase), v (verb), and s (clause, or sentence). For example,


Expressions A : T

  Alice : np   Bob : np   left : v   slept : v   saw : v   loved : v   said : v

  A : np   B : vp        A : v        A : v   B : np        A : v   B : s
  ----------------      --------      ----------------      ---------------
     A B : s            A : vp           A B : vp               A B : vp

Figure 2.1. A natural-language fragment

np ::= Alice | Bob        v ::= left | slept | saw | loved | said
s  ::= np vp              vp ::= v | v np | v s

Figure 2.2. The context-free grammar equivalent to Figure 2.1

the judgment “Alice said Bob left : s” has the proof below.

(2.7)
  Alice : np                 (premise)
  said : v                   (premise)
  Bob : np                   (premise)
  left : v                   (premise)
  left : vp                  (from left : v)
  Bob left : s               (from Bob : np and left : vp)
  said Bob left : vp         (from said : v and Bob left : s)
  Alice said Bob left : s    (from Alice : np and said Bob left : vp)

Hence, as desired, Alice said Bob left is predicted to be an acceptable English utterance, in particular a sentence. Also, "said Bob left : s" has no proof, so *said Bob left is predicted to be unacceptable.

In this set of inference rules, every expression metavariable (A or B) that appears in the conclusion of a rule also appears in a premise of the rule. Thus the tree of rules used in a proof determines the final conclusion. For example, if we give a name to each rule in Figure 2.1, and add these names to (2.7), then we can omit all the judgments between and under the horizontal rules in (2.7) with no loss of information. This property, that the proof determines the expression, holds in many grammars below as well.

Because the judgment form and inference rules in Figure 2.1 are so simple and use only finitely many types, this grammar is just a context-free grammar in thin disguise: In more familiar notation, the grammar can be written as in Figure 2.2. Each type is a nonterminal symbol, each word is a terminal symbol, and each inference rule is a production.

Alas, the types in Figures 2.1 and 2.2 do not distinguish finely enough among expressions. For example, the grammar classifies both said and left as just verbs (v), so it predicts incorrectly that *Alice said Bob said is as acceptable as Alice said Bob left. We can begin to fix this problem by making finer distinctions


Expressions A : T

  Alice : np   Bob : np   left : iv   slept : iv   saw : tv   loved : tv   said : sv

  A : np   B : vp        A : iv        A : tv   B : np        A : sv   B : s
  ----------------      ---------      -----------------      ----------------
     A B : s             A : vp            A B : vp                A B : vp

Figure 2.3. A natural-language fragment that distinguishes among three kinds of verbs

np ::= Alice | Bob     iv ::= left | slept     tv ::= saw | loved     sv ::= said
s  ::= np vp           vp ::= iv | tv np | sv s

Figure 2.4. The context-free grammar equivalent to Figure 2.3

in the types. Figures 2.3 and 2.4 show a refined system (as inference rules and as productions) that splits the type v into three more refined types of verbs: iv (intransitive verb), tv (transitive verb), and sv (sentential verb). This new system properly rules out *Alice said Bob said.
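One way to internalize the slogan that types classify expressions is to transcribe Figure 2.3 into a typed programming language. The following minimal sketch (in Haskell, which this chapter does not otherwise presuppose; the constructor names are invented here) turns each type of the fragment into a datatype and each inference rule into a constructor:

    -- Each type of Figure 2.3 becomes a Haskell datatype; each inference
    -- rule becomes a constructor. Names like IntransVP are invented.
    data NP = Alice | Bob
    data IV = Left_ | Slept            -- intransitive verbs
    data TV = Saw | Loved              -- transitive verbs
    data SV = Said                     -- sentential verbs
    data VP = IntransVP IV             -- vp from iv
            | TransVP TV NP            -- vp from tv np
            | SentVP SV S              -- vp from sv s
    data S  = Sentence NP VP           -- s from np vp

    -- "Alice said Bob left" is a well-typed value of type S:
    ok :: S
    ok = Sentence Alice (SentVP Said (Sentence Bob (IntransVP Left_)))

No value of type S spells out *Alice said Bob said: the constructor SentVP demands a complete S as its second argument, and Bob said is not one.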

The linguistic rules so far rely on a binary concatenation operation on expressions (written with a space), so as to combine Bob and left to give Bob left. As we have been taking an expression to be a sequence of words (rather than a binary tree, a set, or a multiset of words), this concatenation operation is associative, non-commutative, and non-idempotent. The language models in the following sections treat the data structure of expressions more elaborately and explicitly: they allow more operations for building up expressions than just concatenation, and they take fewer properties of these operations for granted. One reason to explore such models is to go beyond context-free grammars. Context-free grammars are not ideal for modeling natural languages, for at least three reasons.

• Some natural languages are not context-free. For example, verb-object agreement in Swiss-German embedded clauses boils down (Shieber 1985) to the formal language of doubled strings {ww | w ∈ Σ∗}, which is context-sensitive.

• Some natural-language phenomena are awkward to model with context-free grammars. In particular, many languages let a wh-phrase appear in front of a question, arbitrarily far from its "associated" verb. For example, who in the English question who do you think Alice said left is far from left.

• A context-free grammar only defines the syntax of a language and does not constrain its semantics. For example, although the expression Alice said Bob left is classified as a sentence, it is not required to have a meaning. Moreover, although the proof (2.7) splits into one branch for Alice and another for said Bob left, a meaning for Alice said Bob left need not be composed from one meaning for Alice and another for said Bob left. One may prefer to model natural language in a way that enforces a tighter correspondence between syntax and semantics.

We start to assuage these shortcomings in Section 2.4 below, with a framework for natural-language grammars that makes utterance structure more explicit and elaborate than just concatenation. But first, let us consider some programming languages.

Expressions A : s

  ------      A : s         A : s   B : s     ---------    ----------     A : s   B : s
  0 : s     ------------    --------------    true : s     false : s      --------------
            succ A : s        A + B : s                                     A = B : s

  A : s   B1 : s   B2 : s
  ---------------------------
  if A then B1 else B2 : s

Figure 2.5. A toy programming language for arithmetic

s ::= 0 | succ s | s + s | true | false | s = s | if s then s else s

Figure 2.6. The context-free grammar equivalent to Figure 2.5

2.2. A toy programming language

Figure 2.5 defines the syntax of a toy programming language. This grammar uses only one type s (for "start"), the type of programs. Accordingly, a judgment has the form A : s, pronounced "A is an s" or "A is a program", where A is an expression. (We omit ": s" from judgments in the next section for brevity.) Some expressions classified as programs are:

(2.8)  succ 0 + succ succ 0 + succ succ succ 0,
(2.9)  true + false,
(2.10) if true = succ succ 0 then false else 0 + 0,
(2.11) succ if true then false else succ 0.

As in Section 2.1, this grammar is equivalent to a context-free grammar, whose productions are shown in Figure 2.6.


(2.9)        +
            / \
        true   false

(2.8) = (2.12)      +
                  /   \
                 +     succ
                / \      |
            succ   succ  succ
              |     |      |
              0    succ   succ
                    |      |
                    0      0

(2.13)       +
            / \
        succ   +
          |   / \
          0 succ  succ
             |      |
            succ   succ
             |      |
             0     succ
                    |
                    0

(2.14)  succ
          |
          +
         / \
        0   +
           / \
        succ  succ
          |     |
        succ   succ
          |     |
          0    succ
                |
                0

Figure 2.7. Expressions are trees but usually written with a textual representation

We take an expression in this language to be not a string of characters (concrete syntax) but a tree-like data structure (abstract syntax), such that

• each leaf is labeled by 0, true, or false;
• each unary branch is labeled by succ;
• each binary branch is labeled by + or =;
• each ternary branch is labeled by if-then-else; and
• there are no other kinds of branches.

For example, the first tree in Figure 2.7 depicts graphically the same expression as (2.9) represents textually. This abstract syntax tree is equivalent to a context-free parse tree. It is also equivalent to a proof tree, so the tree of proof rules used determines the final conclusion judgment and the expression therein, as in previous grammars in Section 2.1.
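For concreteness, this abstract syntax can be transcribed as a datatype; the sketch below is in Haskell and is illustrative only, with invented constructor names:

    -- One constructor per kind of tree node in the abstract syntax.
    data Expr = Zero                  -- leaf 0
              | ETrue | EFalse        -- leaves true and false
              | Succ Expr             -- unary branch succ
              | Add Expr Expr         -- binary branch +
              | Eq  Expr Expr         -- binary branch =
              | If  Expr Expr Expr    -- ternary branch if-then-else
      deriving (Eq, Show)

    -- The first tree in Figure 2.7, that is, expression (2.9):
    e9 :: Expr
    e9 = Add ETrue EFalse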

Some concrete strings are ambiguous among many abstract expressions, which can be distinguished in writing using parentheses. For example, the string in (2.8) may be understood as any of the following expressions, among others.

(2.12) ((succ 0) + (succ succ 0)) + (succ succ succ 0)
(2.13) (succ 0) + ((succ succ 0) + (succ succ succ 0))
(2.14) succ (0 + ((succ succ 0) + (succ succ succ 0)))

(The remainder of Figure 2.7 depicts these three expressions graphically.) By convention, we let + and = associate to the left, so 0+0+0 represents (0+0)+0 and not 0+(0+0). Also by convention, we specify that succ has the highest precedence in the language, followed by +, =, and if-then-else (in that order). This way, the concrete string in (2.8) means only the abstract expression in (2.12). To take another example, the concrete string (2.10) means only the abstract expression

(2.15) if true = succ succ 0 then false else (0 + 0).

Figure 2.8 proves the well-formedness of this expression.

                 0 : s
              -----------
              succ 0 : s
            -----------------
  true : s   succ succ 0 : s
  ---------------------------                        0 : s   0 : s
   true = succ succ 0 : s          false : s         --------------
                                                       0 + 0 : s
  ------------------------------------------------------------------
        if true = succ succ 0 then false else 0 + 0 : s

Figure 2.8. Proving that if true = succ succ 0 then false else 0 + 0 is well formed

A popular use of programs is to run them.

We now describe how to run a program in our toy language, using a small-step operational semantics. First we define the set of evaluation contexts (Felleisen 1987).2 Intuitively, a context is an expression with a hole in it, where a subexpression can be plugged in. In other words, a context is an expression tree with a subtree removed, where a new subtree can be attached. For example,

(2.16) succ 0 + succ [ ] + succ succ succ 0

is a context, as is the trivial context [ ]. We write the metavariable C[ ] for a context. If C[ ] is a context and A is an expression, then we write C[A] for the result of plugging A into C[ ]. For example, if C[ ] is the context (2.16), then C[succ 0] is the expression (2.8). If C[ ] and C′[ ] are two contexts, then we write C[C′[ ]] for the result of plugging C′[ ] into C[ ]. Continuing the example, C[0 = [ ]] is the context

(2.17) succ 0 + succ (0 = [ ]) + succ succ succ 0.

Figure 2.9 uses this notation to define evaluation contexts formally. Our judgment form for evaluation contexts is simply C[ ], where C[ ] contains [ ]. This grammar can be viewed "inside-out" as a context-free grammar: if we change the notation of contexts to swap what is written to the left of the hole with what is written to the right, so that the context (2.16) is notated

(2.18) + succ succ succ 0 ; succ 0 + succ

instead, then Figure 2.9 becomes a context-free grammar, shown in Figure 2.10.

2 For this toy language, it is overkill to specify the operational semantics in terms of evaluation contexts, but the notion of contexts is crucial for studying the computational side effect of delimited control and the linguistic side effect of quantification, which we start to do in Chapter 3.

Evaluation contexts C[ ]

  -----        C[ ]             C[ ]   B : s        A : s   C[ ]
   [ ]      -------------      -------------       -------------
            C[succ [ ]]         C[[ ] + B]          C[A + [ ]]

   C[ ]   B : s        A : s   C[ ]
  -------------       -------------
   C[[ ] = B]          C[A = [ ]]

   C[ ]   B1 : s   B2 : s       A : s   C[ ]   B2 : s       A : s   B1 : s   C[ ]
  --------------------------   --------------------------  --------------------------
  C[if [ ] then B1 else B2]    C[if A then [ ] else B2]    C[if A then B1 else [ ]]

Figure 2.9. Evaluation contexts

c ::= ; | c succ | + s c | c s + | = s c | c s = | then s else s c if | else s c if s then | c if s then s else

Figure 2.10. The context-free grammar of evaluation contexts, equivalent to Figure 2.9

Computation A ▷ B

  C[succ^m 0 + succ^n 0]      ▷  C[succ^(m+n) 0]
  C[succ^m 0 = succ^m 0]      ▷  C[true]
  C[succ^m 0 = succ^n 0]      ▷  C[false]    if m ≠ n
  C[if true then B1 else B2]  ▷  C[B1]
  C[if false then B1 else B2] ▷  C[B2]

Figure 2.11. The computation relation between expressions. By succ^m 0 we mean m copies of succ in front of 0, where m is a nonnegative integer.

Figure 2.11 introduces the computation relation ▷, a binary relation between expressions. Intuitively, A ▷ B holds just in case the expression A turns into the expression B in one small step of computation. Computation proceeds by chaining these small steps together. For example, starting with the expression (2.8), we have

(2.19)  succ 0 + succ succ 0 + succ succ succ 0
        ▷ succ succ succ 0 + succ succ succ 0
        ▷ succ succ succ succ succ succ 0.

We write ▷* for the reflexive and transitive closure of ▷, and ▷+ for the transitive closure of ▷. Under the (intended) interpretation of succ as the successor function on numbers, this fact that

(2.20)  succ 0 + succ succ 0 + succ succ succ 0  ▷*  succ succ succ succ succ succ 0

means that (1 + 2) + 3 evaluates to 6.

The computation relation ▷ is nondeterministic: there exist expressions A such that A ▷ B1 and A ▷ B2 hold for two different expressions B1 and B2. For example, Figure 2.12 shows three ways to run the program at the upper-left corner to get the result at the lower-right corner.

  if 0 = 0 then succ 0 + succ 0 else false  ▷  if 0 = 0 then succ succ 0 else false
                    ▽                                          ▽
  if true then succ 0 + succ 0 else false   ▷  if true then succ succ 0 else false
                    ▽                                          ▽
             succ 0 + succ 0                ▷           succ succ 0

Figure 2.12. Three confluent ways to run the same program (the vertical steps ▽ are computation steps ▷ as well)

Nevertheless, this programming language is confluent (or equivalently, it has the Church-Rosser property): whenever A ▷* B1 and A ▷* B2 for some expressions A, B1, and B2, there exists an expression B′ such that B1 ▷* B′ and B2 ▷* B′. This definition is illustrated below.

(2.21)          A
            ▷* /  \ ▷*
             B1    B2
            ▷* \  / ▷*
                B′
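To make the computation relation concrete, here is a sketch of a one-step evaluator for the toy language in Haskell, reusing the Expr datatype sketched earlier; the function names toNat, fromNat, step, and run are invented here. The sketch resolves the nondeterminism of Figure 2.11 by always reducing the leftmost available redex; by confluence, any other choice of evaluation context leads to the same normal form.

    import Control.Applicative ((<|>))

    -- Read off and rebuild numerals of the shape succ^m 0.
    toNat :: Expr -> Maybe Integer
    toNat Zero     = Just 0
    toNat (Succ a) = (1 +) <$> toNat a
    toNat _        = Nothing

    fromNat :: Integer -> Expr
    fromNat 0 = Zero
    fromNat m = Succ (fromNat (m - 1))

    -- One computation step; Nothing means the expression is in normal form.
    step :: Expr -> Maybe Expr
    step e = case e of
      Add a b | Just m <- toNat a, Just n <- toNat b -> Just (fromNat (m + n))
      Eq  a b | Just m <- toNat a, Just n <- toNat b ->
                  Just (if m == n then ETrue else EFalse)
      If ETrue  b1 _  -> Just b1
      If EFalse _  b2 -> Just b2
      -- otherwise, step inside an evaluation context (Figure 2.9):
      Succ a     -> Succ <$> step a
      Add a b    -> (flip Add b <$> step a) <|> (Add a <$> step b)
      Eq  a b    -> (flip Eq  b <$> step a) <|> (Eq  a <$> step b)
      If a b1 b2 -> ((\a' -> If a' b1 b2) <$> step a)
                <|> ((\b1' -> If a b1' b2) <$> step b1)
                <|> (If a b1 <$> step b2)
      _          -> Nothing

    -- The reflexive-transitive closure, chased to a normal form.
    run :: Expr -> Expr
    run e = maybe e run (step e)

For instance, run applied to the left-associated reading (2.12) of the string (2.8) returns fromNat 6, matching (2.19) and (2.20).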

An expression A from which computation cannot proceed (that is, such that no B satisfies A ▷ B) is said to be in normal form. Our language enjoys the weak normalization property, which means that every expression A has a normal form B such that A ▷* B. In words, every program can be run in some way to stop. In fact, every way to run every program in this language eventually stops. That is, there does not exist an infinite sequence of computation steps A0 ▷ A1 ▷ A2 ▷ ···. This property is called strong normalization or termination.

Expressions A : T

  ------      A : n         A : n   B : n     ---------    ----------     A : n   B : n
  0 : n     ------------    --------------    true : b     false : b      --------------
            succ A : n        A + B : n                                     A = B : b

  A : b   B1 : T   B2 : T
  ---------------------------
  if A then B1 else B2 : T

Figure 2.13. A toy programming language with a more refined type system

Informally speaking, there are two kinds of expressions in normal form. On one hand, an expression in normal form may be the result of a successful computation, such as the last line in (2.19): succ succ succ succ succ succ 0. On the other hand, an expression in normal form may be stuck in an unsuccessful computation, such as (2.9), which tries to add true to false. To distinguish between these two cases formally, we define a value to be either true, false, or zero or more copies of succ in front of 0. A normal form that is not a value, like (2.9), is stuck. A value is the result of a successful computation, whereas a stuck expression is the carnage of an unsuccessful computation.
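This distinction between values and stuck expressions can be transcribed into the evaluator sketch above (again illustrative Haskell, reusing toNat and step from that sketch):

    import Data.Maybe (isNothing)

    -- Values: true, false, or zero or more copies of succ in front of 0.
    isValue :: Expr -> Bool
    isValue ETrue  = True
    isValue EFalse = True
    isValue e      = case toNat e of { Just _ -> True; Nothing -> False }

    -- A normal form that is not a value is stuck.
    isStuck :: Expr -> Bool
    isStuck e = isNothing (step e) && not (isValue e)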

The expressions (2.10) and (2.11) are not initially stuck, but get stuck after some computation steps. In other words, these expressions never produce a value when run; they are akin to a program that begins running but then crashes. To rule out such errors without actually running the program, we can refine the type system of our language, as is done in Figure 2.13. The catch-all type s for programs becomes two types: n for (an expression that computes to) a number and b for (an expression that computes to) a Boolean. A judgment in this type system has the form A : T, where A is a metavariable that ranges over expressions, and T is a metavariable that ranges over types. Types are generated by the simple context-free grammar

(2.22) T ::= n | b.

This type system shares with the previous one in Figure 2.5 the subject reduction property, also known as type preservation: if A ▷ B and A : T, then B : T as well. Furthermore, no expression to which this refined type system assigns a type (either n or b) is stuck. That is, any expression assigned a type either is a value or computes to some expression. These two properties together entail that any expression assigned a type never computes to a stuck expression—in other words, never "goes wrong". Formally, if A ▷* B and A : T, then B is not stuck. In particular, the refined type system assigns no type to ("rules out") the expressions (2.10) and (2.11).
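The refined type system of Figure 2.13 is easy to transcribe as a type checker; here is a sketch in Haskell over the Expr datatype from before, where Nothing plays the role of "assigned no type" (the failing pattern matches in do-notation fall through to Nothing in the Maybe monad):

    data Ty = N | B deriving (Eq, Show)

    typeOf :: Expr -> Maybe Ty
    typeOf Zero      = Just N
    typeOf ETrue     = Just B
    typeOf EFalse    = Just B
    typeOf (Succ a)  = do { N <- typeOf a; Just N }
    typeOf (Add a b) = do { N <- typeOf a; N <- typeOf b; Just N }
    typeOf (Eq  a b) = do { N <- typeOf a; N <- typeOf b; Just B }
    typeOf (If a b1 b2) = do
      B  <- typeOf a                 -- the guard must be a Boolean
      t1 <- typeOf b1
      t2 <- typeOf b2
      if t1 == t2 then Just t1 else Nothing  -- both branches share one type T

On this checker, (2.10) and (2.11) indeed receive no type; neither does (2.23) below, whose two branches have the distinct types b and n.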

For the purpose of ruling out programs that go wrong, this type system only classifies expressions approximately. Besides ruling out every program that goes wrong, the system also rules out some programs (hopefully seldom encountered or needed in practice) that do not go wrong when run. For example, the program (similar to (2.11))

(2.23) succ if false then false else succ 0

is no longer assigned a type according to Figure 2.13, yet computes to a value just fine:

(2.24) succ if false then false else succ 0  ▷  succ succ 0.

This kind of approximation is often necessary in a practical type system, especially for a programming language without the termination property, because it may take too long or even be impossible for a computer to decide precisely whether a given program goes wrong. Even when every computation sequence is finite, as is the case here, an approximate type serves to summarize an expression to a programmer, much as a natural-language type classifies an expression to a linguist.

2.3. The λ-calculus

A crucial ability missing from the toy programming language in Section 2.2 is to pass a program or its result for use elsewhere. The λ-calculus (Church 1932, 1940; Barendregt 1981; Hindley and Seldin 1986) is a canonical programming language that incorporates this ability by allowing values, including functions, to be passed around. Because the λ-calculus lets us pass functions as values, it is known as a higher-order programming language.

We first introduce the λ-calculus informally, then consider its type systems and operational semantics. The three kinds of expressions in the λ-calculus are variables, abstractions, and applications. The expression λx. succ 0 + x is an abstraction; it denotes a function that maps each number to its successor. The variable x here, representing the input to the function, is bound within the body succ 0 + x of the abstraction. We notate the application of a function F to an argument E as FE. A function can apply multiple times to different arguments, each time binding the variable to a different input. For example, the application expression

(2.25) (λf. f(0) + f(succ succ 0))(λx. succ 0 + x)

first binds f to the increment function, then binds x to 0 in the first application of f, but to succ succ 0 in the second application of f. This program thus computes (1 + 0) + (1 + 2).


Expressions Γ ⊢ E

  ------- Id
   x ⊢ x

  Γ, x ⊢ E                     Γ ⊢ F    Δ ⊢ E
  ---------- Abstract          --------------- Apply
  Γ ⊢ λx. E                      Γ, Δ ⊢ FE

   Γ[Δ] ⊢ E                 Γ[Δ, Δ] ⊢ E              Γ[Δ, Θ] ⊢ E
  ------------ Weaken      ------------- Contract   ------------- Exchange
  Γ[Δ, Θ] ⊢ E                Γ[Δ] ⊢ E                Γ[Θ, Δ] ⊢ E

  Γ[(Δ, Θ), Π] ⊢ E
  ================ Associate
  Γ[Δ, (Θ, Π)] ⊢ E

Figure 2.14. The untyped λ-calculus

For brevity, we omit some parentheses in expressions: Abstractions extend as far to the right as possible, so λx. xy means λx. (xy) rather than (λx. x)y. Applications associate to the left, so xyz means (xy)z rather than x(yz).

By convention, when multiple nested abstractions bind the same variable name, the innermost binding "wins". For example, the expression λx. λx. x means the constant function returning the identity function—that is, λy. λx. x rather than λx. λy. x. Expressions like λx. λx. x and λy. λx. x are α-equivalent in that they (informally speaking) differ only in the names of bound variables. We henceforth regard α-equivalent expressions to be equal. We then assume that all bound variables in an expression are renamed whenever necessary to be distinct from each other; this assumption is Barendregt's variable convention (1981). In other words, bound-variable names are not part of the abstract syntax of this language; rather, they are just nomenclature for associating an abstraction expression with variable expressions in its body.

To substitute an expression E for a variable x in another expression E′ is to replace every occurrence of x in E′ by E. We notate the result as E′ {x ↦ E}. (Here x is a metavariable that ranges over variables in the λ-calculus.) For example,

(2.26) (x(λy. x)) {x ↦ λz. zw} = (λz. zw)(λy. λz. zw).

Because the same variable may be bound multiple times in E and E′, substitution is not just a matter of textual replacement, but may involve renaming variables to avoid conflicts. For example,

(2.27) (x(λy. x)) {x ↦ λx. yx} = (λx. yx)(λy′. λx. yx) ≠ (λx. yx)(λy. λx. yx).
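Capture-avoiding substitution can be spelled out as a program. The following Haskell sketch is invented here for illustration (the dissertation itself works up to α-equivalence via the variable convention): it represents λ-terms with explicit variable names and renames a bound variable exactly when a conflict like the one in (2.27) threatens.

    import Data.List (nub, union, delete)

    data Term = Var String | Lam String Term | App Term Term
      deriving (Eq, Show)

    free :: Term -> [String]
    free (Var x)   = [x]
    free (Lam x e) = delete x (nub (free e))
    free (App f e) = free f `union` free e

    -- subst b x e  computes  b {x ↦ e}.
    subst :: Term -> String -> Term -> Term
    subst (Var y) x e | y == x    = e
                      | otherwise = Var y
    subst (App f a) x e = App (subst f x e) (subst a x e)
    subst (Lam y b) x e
      | y == x             = Lam y b            -- x is shadowed inside
      | y `notElem` free e = Lam y (subst b x e)
      | otherwise          = Lam y' (subst (subst b y (Var y')) x e)
      where  -- a fresh name, as in the renaming of y to y′ in (2.27)
        y' = head [ v | n <- [0 :: Int ..], let v = y ++ show n
                      , v /= x, v `notElem` free e, v `notElem` free b ]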

Figure 2.14 defines the untyped λ-calculus, which is so called because it only classifies expressions as well-formed. We use E and F as metavariables for expressions. To keep track of bound variables, judgments in this system are of the form Γ ⊢ E. Here the type environment (or antecedent) Γ is a list of bound variables—more precisely, a binary tree whose leaves are either just the null environment, written with a dot ·, or an assumption x, meaning that the variable x is bound. The turnstile ⊢ separates the type environment from the expression E classified well-formed. Besides the metavariable Γ for type environments, we also use the Greek letters Δ, Θ, Π.

The first three rules in this system construct variables, abstractions, and applications. The Id rule says that a variable is a well-formed expression (··· ⊢ x), as long as it is bound (x ⊢ ···). The Abstract rule says that an abstraction λx. E is well formed in the type environment Γ, as long as the body E is well formed in Γ extended with an assumption for the bound variable x. The Apply rule says that, if F and E are well formed in the type environments Γ and Δ, then the application FE is well formed in the combined type environment Γ, Δ.

As with expressions, we equate judgments that differ only in the name of bound variables. A variable name in an environment, like the variable name bound by an abstraction expression, is just nomenclature for associating an assumption to the left of ⊢ with variable expressions to the right of ⊢. We then assume that all bound variables in a judgment are renamed whenever necessary to be distinct from each other. Thus the Abstract rule in Figure 2.14 is really shorthand for a more elaborate rule that renames the bound variable x to a freshly chosen name y:

(2.28)  Γ, y ⊢ E{x ↦ y}    (E does not mention y)
        ----------------------------------------- Abstract.
                     Γ ⊢ λx. E

The remaining four rules in the system are structural. They ensure that a type environment behaves as a set of bound variables, even though strictly speaking it has the structure of a binary tree. The notation Γ[. . .] in these rules means a type environment that includes another. For example, Γ[Δ, Δ] in the Contract rule matches any type environment that somewhere contains two identical type environments Δ together. Accordingly, Γ[Δ] in the same rule matches the same type environment with only one copy of Δ at the same place. Thus Γ[ ] is a metavariable for not a complete type environment but a type environment with a hole [ ].

The Weaken rule says that it never hurts the well-formedness of an expression to assume an extra bound variable; that is, a bound variable's value can be discarded freely. The Contract rule says that duplicate assumptions in a type environment can be merged; that is, a bound variable's value can be duplicated freely. The Exchange rule says that order does not matter in a type environment; that is, the comma for combining type environments is commutative. The Associate rule is written with a special notation: the double horizontal rule between the two judgments means that either judgment can be used as the premise in an inference, and the other judgment as the conclusion. This rule says that the comma for combining type environments is associative. If we include the antecedents of each structural rule in its name—for example, if we distinguish among instances of Weaken with different Γ or Θ—then this grammar, like previous ones, ensures that the tree of rules used in a proof determines the final judgment.

  f ⊢ f                          Id
  f, · ⊢ f                       Weaken
  ·, f ⊢ f                       Exchange
  f ⊢ f                          Id
  x ⊢ x                          Id
  f, x ⊢ fx                      Apply
  (·, f), (f, x) ⊢ f(fx)         Apply
  ((·, f), f), x ⊢ f(fx)         Associate
  (·, f), f ⊢ λx. f(fx)          Abstract
  ·, (f, f) ⊢ λx. f(fx)          Associate
  ·, f ⊢ λx. f(fx)               Contract
  · ⊢ λf. λx. f(fx)              Abstract

  f ⊢ f                          Id
  f ⊢ f        x ⊢ x             Id, Id
  f, x ⊢ fx                      Apply
  f, x ⊢ f(fx)                   Apply
  f ⊢ λx. f(fx)                  Abstract
  · ⊢ λf. λx. f(fx)              Abstract

Figure 2.15. Proving that λf. λx. f(fx) is well formed

Evaluation contexts C[ ]

  ----       C[ ]             C[ ]    Δ ⊢ E        Γ ⊢ F    C[ ]
  [ ]     ------------       --------------       --------------
          C[λx. [ ]]            C[[ ]E]               C[F[ ]]

Figure 2.16. Evaluation contexts in the λ-calculus

To illustrate all these rules, Figure 2.15 derives that the "twice" function λf. λx. f(fx) is a well-formed expression. The same proof is shown there twice: first in its full glory, regarding the type environment as a binary tree, and then with the structural inferences omitted for brevity, regarding the type environment as a set of bound variables rather than a tree.

A program in the λ-calculus runs as follows. Figure 2.16 defines a judgment form C[ ], which means that C[ ] is a context that can be plugged with some expression E to make a complete program C[E]. The context C[ ] may bind variables in E; for example, plugging the expression E = fx into the context C[ ] = λf. λx. f [ ] makes the complete program C[E] = λf. λx. f(fx). Armed with this definition of an evaluation context, we can then define a computation relation ▷, called β-reduction or λ-conversion:

(2.29) C[(λx. E′)E]  ▷  C[E′ {x ↦ E}].

Here C[ ] is any evaluation context, and E and E′ are any expressions, such that the program on the left is well formed, in which case the program on the right is also well formed. As before, we write ▷* to mean the reflexive and transitive closure of this computation relation, and ▷+ to mean the transitive closure. The subject reduction property holds: if E ▷* E′ and Γ ⊢ E, then Γ ⊢ E′ as well.

To illustrate β-reduction, let us apply the "twice" function (from Figure 2.15) to itself, to get the "four times" function.

(2.30)  (λf. λx. f(fx))(λf. λx. f(fx))
        ▷ λx. (λf. λx. f(fx))((λf. λx. f(fx))x)
        = λf. (λf. λx. f(fx))((λf. λx. f(fx))f)    (by α-equivalence)
        ▷ λf. (λf. λx. f(fx))(λx. f(fx))
        ▷ λf. λx. (λx. f(fx))((λx. f(fx))x)
        ▷ λf. λx. (λx. f(fx))(f(fx))
        ▷ λf. λx. f(f(f(fx)))
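β-reduction itself is equally easy to transcribe. This Haskell sketch reuses Term and subst from the substitution sketch above (plus <|> from Control.Applicative) and fixes a normal-order, leftmost-outermost choice of the evaluation context C[ ] in (2.29); beta and normalize are invented names.

    -- One β-reduction step, or Nothing if the term is in normal form.
    beta :: Term -> Maybe Term
    beta (App (Lam x b) e) = Just (subst b x e)  -- C[(λx. b)e] ▷ C[b {x ↦ e}]
    beta (App f e)         = (flip App e <$> beta f) <|> (App f <$> beta e)
    beta (Lam x b)         = Lam x <$> beta b
    beta (Var _)           = Nothing

    normalize :: Term -> Term
    normalize t = maybe t normalize (beta t)

    twice :: Term
    twice = Lam "f" (Lam "x" (App (Var "f") (App (Var "f") (Var "x"))))

Here normalize (App twice twice) returns λf. λx. f(f(f(fx))) up to α-equivalence, retracing (2.30).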

To take another example, if we enrich the syntax of this λ-calculus to encompass arithmetic expressions like (2.25), then we can perform the following computation.

(2.31)  (λf. f(0) + f(succ succ 0))(λx. succ 0 + x)
        ▷ (λx. succ 0 + x)(0) + (λx. succ 0 + x)(succ succ 0)
        ▷ (λx. succ 0 + x)(0) + (succ 0 + succ succ 0)
        ▷ (succ 0 + 0) + (succ 0 + succ succ 0)

Via a suitably enlarged computation relation, the last expression above may compute eventually to succ succ succ succ 0, as desired.3

For clarity in the many example programs presented below in the λ-calculus, we sometimes write

(2.32) let E be x. E′

to mean (λx. E′)E.4 This syntactic sugar makes sense because of β-reduction. We call E the argument of the let, and E′ the body of the let. By convention, the body E′ of the let expression extends as far to the right as possible, like the body of the corresponding λ-abstraction.

3 Many programming examples below assume, like this one, that the syntax and computation relation of the programming language are suitably extended with integer arithmetic as in Section 2.2. We do not formalize such constants as succ, 0, and + and computation steps for them, because they are not needed for the technical results in this dissertation. If desired, one can implement them using products, unit, sums, and recursion, described in Sections 2.3.3, 2.3.4, and 2.3.5.

4 This notation orders E and x differently from the usual syntax let x = E in E′ in functional programming languages. We choose this notation so that we can extend it to perform case discrimination (case E of . . . in typical functional programming languages) in Section 2.3.4.


Expressions Γ ⊢ E : T

  -------------- Id
  x : T ⊢ x : T

  Γ, x : T1 ⊢ E : T2                 Γ ⊢ F : T1 → T2    Δ ⊢ E : T1
  -------------------- Abstract      ----------------------------- Apply
  Γ ⊢ λx. E : T1 → T2                       Γ, Δ ⊢ FE : T2

  Γ[Δ] ⊢ E : T               Γ[Δ, Δ] ⊢ E : T             Γ[Δ, Θ] ⊢ E : T
  ---------------- Weaken    ---------------- Contract   ---------------- Exchange
  Γ[Δ, Θ] ⊢ E : T            Γ[Δ] ⊢ E : T                Γ[Θ, Δ] ⊢ E : T

  Γ[(Δ, Θ), Π] ⊢ E : T
  ==================== Associate
  Γ[Δ, (Θ, Π)] ⊢ E : T

Evaluation contexts C[ ]

  ----       C[ ]            C[ ]    Δ ⊢ E : T1        Γ ⊢ F : T0    C[ ]
  [ ]     ------------      -------------------       ------------------
          C[λx. [ ]]              C[[ ]E]                   C[F[ ]]

Figure 2.17. The simply-typed λ-calculus

2.3.1. Confluence, normalization, and typing. The computation relation just defined is confluent (Barendregt 1981). (As we extend this programming language in Chapter 3, confluence becomes threatened.) But it is not weakly normalizing, let alone strongly normalizing. For example, the expression

(2.33) (λx. xx)(λx. xx)

β-reduces to itself and no other expression, so there is no normal-form expression E such that (λx. xx)(λx. xx) ▷* E. Informally speaking, the program (2.33) enters an infinite loop when run.

To enforce normalization, we can refine the type system. We classify expressions into a countably infinite family of types, not just well-formed or s. A type is now either an element of a fixed set of base types, say a, b, c, . . . , or a function type, written T1 → T2 where T1 and T2 are again types. More formally, types T are generated by the context-free grammar

(2.34) T ::= A | T → T,

where base types A are generated by

(2.35) A ::= a | b | c | ···.

A type environment Γ no longer just enumerates bound variables but also associates each variable with a type. Figure 2.17 shows the refined type system, called the simply-typed λ-calculus. For example, Figure 2.18 proves that the "twice" function λf. λx. f(fx) has the type (a → a) → (a → a). (We omit the structural rules in this proof; they are entirely analogous to those in Figure 2.15.)

  f : a → a ⊢ f : a → a                      Id
  f : a → a ⊢ f : a → a                      Id
  x : a ⊢ x : a                              Id
  f : a → a, x : a ⊢ fx : a                  Apply
  f : a → a, x : a ⊢ f(fx) : a               Apply
  f : a → a ⊢ λx. f(fx) : a → a              Abstract
  · ⊢ λf. λx. f(fx) : (a → a) → (a → a)      Abstract

Figure 2.18. Proving that λf. λx. f(fx) has type (a → a) → (a → a)

By convention, the function-type arrow → associates to the right. For example, a → b → c means the type a → (b → c) rather than (a → b) → c.

Some expressions in the untyped λ-calculus can no longer be derived in the simply-typed λ-calculus. For example, the infinite-loop program (2.33) cannot be assigned a type in the refined system. The computation relation ▷ for the simply-typed λ-calculus is just the restriction of the computation relation for the untyped λ-calculus to those expressions that can still be derived in the refined system. As in Section 2.2, the refined system preserves the subject reduction property: if E ▷* E′ and Γ ⊢ E : T, then Γ ⊢ E′ : T as well. Furthermore, every simply-typed λ-expression is strongly normalizing. That is, whenever Γ ⊢ E : T for some environment Γ, expression E and type T, there is no infinite sequence of computation steps E ▷ E1 ▷ E2 ▷ ··· starting at E.
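A checker for the simply-typed λ-calculus is a small program too. In the Haskell sketch below (illustrative only; Type, STerm, and typeOfS are invented names), each λ-bound variable carries a type annotation, so no inference is needed; treating the environment Γ as a list that is looked up by name silently builds in the structural rules Weaken, Contract, and Exchange.

    data Type  = Base String | Arr Type Type deriving (Eq, Show)
    data STerm = SVar String | SLam String Type STerm | SApp STerm STerm

    typeOfS :: [(String, Type)] -> STerm -> Maybe Type
    typeOfS env (SVar x)     = lookup x env                        -- Id
    typeOfS env (SLam x t e) = Arr t <$> typeOfS ((x, t) : env) e  -- Abstract
    typeOfS env (SApp f e)   = do                                  -- Apply
      Arr t1 t2 <- typeOfS env f
      t1'       <- typeOfS env e
      if t1 == t1' then Just t2 else Nothing

    -- The "twice" function of Figure 2.18, annotated at a → a:
    twiceS :: STerm
    twiceS = SLam "f" (Arr (Base "a") (Base "a"))
               (SLam "x" (Base "a")
                 (SApp (SVar "f") (SApp (SVar "f") (SVar "x"))))
    -- typeOfS [] twiceS yields (a → a) → (a → a), spelled with Arr.

No annotation makes (2.33) typable: the self-application xx would require a type T equal to T → T′, and no simple type satisfies that equation.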

We have been using the word "function" loosely in reference to expressions and types. Technically speaking, expressions in the untyped λ-calculus are not—cannot be interpreted as—functions to be applied in the traditional, set-theoretic sense. For example, although we can think of the unique expression λx. x informally as "the identity function", there is no such thing as the identity function in the traditional, set-theoretic sense—only a family of identity functions, one for each domain. By contrast, an expression in the simply-typed λ-calculus can be interpreted set-theoretically, provided that we fix the type of the expression and specify a set for each base type mentioned in that type.

2.3.2. The formulas-as-types correspondence. In the simply-typed λ-calculus, as with previous grammars, the tree of inference rules used in a proof determines the final conclusion of the proof (provided, as before, that instances of a structural rule with different antecedents count as different inference rules). We can thus omit expressions and variables from judgments with no loss of information.

Figure 2.19 shows the result of omitting expressions and variables from the simply-typed λ-calculus. This "grammar without expressions" turns out to be a system for propositional intuitionistic logic, as follows. We read a, b, c, . . . as not base types but atomic formulas, and T1 → T2 as not the function type from T1 to T2 but the implication formula from T1 to T2. Under this interpretation, the inference rule named Abstract in Figure 2.17 and → I in Figure 2.19 is hypothetical reasoning (implication introduction), and the rule named Apply in Figure 2.17 and →E in Figure 2.19 is modus ponens (implication elimination). The antecedent, to the left of ⊢, lists the hypotheses that have yet to be discharged. The structural rules enforce the fact that hypotheses in intuitionistic logic form an unordered set, from which an unused hypothesis can be freely discarded.

  ------ Id
  T ⊢ T

  Γ, T1 ⊢ T2              Γ ⊢ T1 → T2    Δ ⊢ T1
  ------------ → I        ---------------------- →E
  Γ ⊢ T1 → T2                   Γ, Δ ⊢ T2

  Γ[Δ] ⊢ T              Γ[Δ, Δ] ⊢ T            Γ[Δ, Θ] ⊢ T             Γ[(Δ, Θ), Π] ⊢ T
  ------------ Weaken   ----------- Contract   ------------ Exchange   ================ Associate
  Γ[Δ, Θ] ⊢ T           Γ[Δ] ⊢ T               Γ[Θ, Δ] ⊢ T             Γ[Δ, (Θ, Π)] ⊢ T

Figure 2.19. Natural-deduction rules for propositional intuitionistic logic

  a → a ⊢ a → a                  Id
  a → a ⊢ a → a                  Id
  a ⊢ a                          Id
  a → a, a ⊢ a                   →E
  a → a, a ⊢ a                   →E
  a → a ⊢ a → a                  → I
  · ⊢ (a → a) → (a → a)          → I

Figure 2.20. Proving that (a → a) → (a → a)

For example, Figure 2.20 (the result of erasing expressions from Figure 2.18) proves the formula (a → a) → (a → a) in intuitionistic logic. Conversely, the λ-calculus expression λf. λx. f(fx) is a succinct way to write this proof that saves space and hides where structural rules are used. This proof is one of an infinite number of proofs of the same formula:

(2.36) λf. λx. x,   λf. λx. fx,   λf. λx. f(fx),   λf. λx. f(f(fx)),   . . . .

Propositional intuitionistic logic thus corresponds to the simply-typed λ-calculus: Formulas are types (atomic formulas are base types; implication formulas are function types). Proofs are programs (the trivial proof is a variable; to compose two programs is to sequence two proofs). Moreover, to reduce a proof is to execute a program: to β-reduce a program, as specified in (2.29), is to eliminate the detour in a proof that occurs when the conclusion of → I is used as the first premise of →E, as shown in Figure 2.21. The bound variable x in a λ-abstraction λx. E′ represents a hypothesized proof; to β-reduce the program (λx. E′)E is to substitute the actual proof E for x in E′. To emphasize this correspondence, we henceforth take Abstract to be synonymous with → I, and Apply to be synonymous with →E.

  ··· T1 ⊢ T1 (Id) ···
         ⋮  E′                                  ···  Δ ⊢ T1  ···    (a copy of the
     Γ, T1 ⊢ T2                                       ⋮  E           proof E at each
  -------------- → I       ⋮  E        =⇒             ⋮  E′          former Id leaf
   Γ ⊢ T1 → T2            Δ ⊢ T1                  Γ, Δ ⊢ T2          for T1)
  ------------------------------ →E
            Γ, Δ ⊢ T2

Figure 2.21. Proof reduction for → I followed by →E

This formulas-as-types correspondence is also called the Curry-Howard isomorphism (Howard 1980; Girard et al. 1989; Wadler 2000). We use it in both directions. First, any semantics for the simply-typed λ-calculus or a variation on it, such as the usual semantics in which types denote sets and λ-abstractions denote functions, is a semantics for propositional intuitionistic logic or a (corresponding) variation on it. In Section 2.4 below, we present a formal system for natural-language grammars that is a variation on propositional intuitionistic logic, and endow that system with just such a semantics: we turn the proof that an utterance is grammatical into the meaning of that utterance (van Benthem 1983, 1988; Carpenter 1997; Ranta 1994). Our linguistic formalism thus extends the formulas-as-types correspondence to natural language: formulas and types are syntactic categories; proofs and programs are utterance meanings.

Second, extensions to propositional intuitionistic logic guide us to extend the simply-typed λ-calculus. We now follow this guidance to extend the simply-typed λ-calculus with additional data structures besides functions. These extensions yield a more practical programming language that is suitable for subsequent example programs.

2.3.3. Products. Besides functions, we can add ordered pairs to our language, to help build data structures with multiple components. A product type T1 × T2 classifies an ordered pair of something of type T1 and something of type T2. For example, assuming that 2 has the type int, the expression 〈2, (λf. f)(λx. x)〉 is an ordered pair with the type int × (int → int). To extract the two components of this pair, we can write code like

(2.37) let 〈2, (λf. f)(λx. x)〉 be 〈x, y〉. yx,

which computes either to

(2.38) ((λf. f)(λx. x))(2)

or to

(2.39) let 〈2, λx. x〉 be 〈x, y〉. yx,

both of which in turn compute to (λx. x)2, and from there to 2. In an expression like let E be 〈x, y〉. E′, we call E the argument and E′ the body.
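In Haskell (used here only for illustration), native pairs play the role of the product type T1 × T2, and a pattern-matching let plays the role of let E be 〈x, y〉. E′; example (2.37) becomes:

    pairExample :: Int
    pairExample = let (x, y) = (2, (\f -> f) (\z -> z)) in y x  -- computes to 2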

Figure 2.22 formalizes the two rules for expressions, four rules for contexts, and one rule for computation that we just illustrated.

Expressions Γ ⊢ E : T (additional)

  Γ ⊢ E1 : T1    Δ ⊢ E2 : T2          Δ ⊢ E : T1 × T2    Γ[x : T1, y : T2] ⊢ E′ : T′
  ---------------------------- × I    ------------------------------------------------ ×E
  Γ, Δ ⊢ 〈E1, E2〉 : T1 × T2           Γ[Δ] ⊢ let E be 〈x, y〉. E′ : T′

Evaluation contexts C[ ] (additional)

  C[ ]    Δ ⊢ E2 : T2          Γ ⊢ E1 : T1    C[ ]
  --------------------         --------------------
     C[〈[ ], E2〉]                 C[〈E1, [ ]〉]

  C[ ]    Γ ⊢ E′ : T′              Δ ⊢ E : T0    C[ ]
  --------------------------       --------------------------
  C[let [ ] be 〈x, y〉. E′]         C[let E be 〈x, y〉. [ ]]

Computation E ▷ E′ (additional)

  C[let 〈E1, E2〉 be 〈x, y〉. E′]  ▷  C[E′ {x ↦ E1} {y ↦ E2}]

Figure 2.22. Adding products to the λ-calculus

Types T are now generated by the context-free grammar

(2.40) T ::= A | T → T | T × T,

with a new case T × T at the end.

Via the formulas-as-types correspondence, a product type (written here with the connective ×) corresponds to a conjunction formula (written typically with the connective ∧), and a program of a product type corresponds to a proof of a conjunction formula. Similarly, corresponding to logical truth (verum, typically written ⊤) is the unit or singleton type (here written 1). The unit type 1 contains just one element, denoted by the expression 〈〉. It is useful sometimes as a placeholder. Figure 2.23 formalizes the rules.

Expressions Γ ⊢ E : T (additional)

  ---------- 1 I       Δ ⊢ E : 1    Γ[·] ⊢ E′ : T′
  · ⊢ 〈〉 : 1           ------------------------------ 1 E
                       Γ[Δ] ⊢ let E be 〈〉. E′ : T′

Evaluation contexts C[ ] (additional)

  C[ ]    Γ ⊢ E′ : T′            Δ ⊢ E : T0    C[ ]
  -----------------------        -----------------------
  C[let [ ] be 〈〉. E′]           C[let E be 〈〉. [ ]]

Computation E ▷ E′ (additional)

  C[let 〈〉 be 〈〉. E′]  ▷  C[E′]

Figure 2.23. Adding unit to the λ-calculus

Types T are now generated by the context-free grammar

(2.41) T ::= A | T → T | T × T | 1,

with a new case 1 at the end.

2.3.4. Sums. We can also add disjoint unions of types to our programming language. A sum type T1 + T2 classifies something of type T1 or type T2, discriminated using labels left and right. For example, if 2 has the type int, then the expressions left 2 and right((λf. f)(λz. z)) both have the type int + (int → int). Given something of sum type T1 + T2, we want to check whether it is a T1 labeled with left or a T2 labeled with right, and to extract the labeled T1 or T2. To do so, we can write code like

(2.42) let left 2 be left x. x | right y. y 5

(which chooses the left alternative) and

(2.43) let right((λf. f)(λz. z)) be left x. x | right y. y 5

(which chooses the right alternative). These programs compute to 2 and 5, respectively.

Another example, perhaps a more obviously useful one, is to encode the Boolean type as 1 + 1: the sum of two unit types. We can represent, say, false as left〈〉 and true as right〈〉, both of type 1 + 1 (but distinct, since we are taking the disjoint union of 1 and 1). If x and y are both Booleans encoded in this way, of type 1 + 1, then their conjunction is

(2.44) let x be left u. left〈〉 | right u. y,


and their disjunction is

(2.45) let x be left u. y | right u. right〈〉.

Expressions Γ ⊢ E : T (additional)

      Γ ⊢ E : T1                        Γ ⊢ E : T2
  ---------------------- + I        ----------------------- + I
  Γ ⊢ left E : T1 + T2              Γ ⊢ right E : T1 + T2

  Δ ⊢ E : T1 + T2    Γ[x : T1] ⊢ E′1 : T′    Γ[y : T2] ⊢ E′2 : T′
  ----------------------------------------------------------------- +E
  Γ[Δ] ⊢ let E be left x. E′1 | right y. E′2 : T′

Evaluation contexts C[ ] (additional)

     C[ ]              C[ ]
  -----------       ------------
  C[left [ ]]       C[right [ ]]

  C[ ]    Γ1 ⊢ E′1 : T′1    Γ2 ⊢ E′2 : T′2
  ------------------------------------------
  C[let [ ] be left x. E′1 | right y. E′2]

  Δ ⊢ E : T0    C[ ]    Γ ⊢ E′2 : T′           Δ ⊢ E : T0    Γ ⊢ E′1 : T′    C[ ]
  ----------------------------------------     ----------------------------------------
  C[let E be left x. [ ] | right y. E′2]       C[let E be left x. E′1 | right y. [ ]]

Computation E ▷ E′ (additional)

  C[let left E be left x. E′1 | right y. E′2]   ▷   C[E′1 {x ↦ E}]
  C[let right E be left x. E′1 | right y. E′2]  ▷   C[E′2 {y ↦ E}]

Figure 2.24. Adding sums to the λ-calculus
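The Boolean encoding transcribes directly into Haskell, again only as an illustration: Either () () plays the role of 1 + 1, and conj and disj (names invented here) transcribe (2.44) and (2.45).

    type Bool' = Either () ()

    false', true' :: Bool'
    false' = Left  ()   -- left〈〉
    true'  = Right ()   -- right〈〉

    conj, disj :: Bool' -> Bool' -> Bool'
    conj x y = case x of { Left _ -> Left ()  ; Right _ -> y }
    disj x y = case x of { Left _ -> y        ; Right _ -> Right () }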

Figure 2.24 formalizes the rules for sum types, which correspond to disjunction formulas via the formulas-as-types correspondence. Types T are now generated by the context-free grammar

(2.46) T ::= A | T → T | T × T | 1 | T + T,

with a new case T + T at the end. We could also add an empty type, which would correspond to the false formula in logic.

2.3.5. Recursion. If an integer like 2 is an expression of type int, then we can use products to express a pair of integers (such as 〈2, 2〉, of type int × int), as well as chain products to express longer tuples like a triple of integers (such as 〈2, 〈2, 2〉〉, of type int × (int × int)). To allow for a tuple of 0, 1, 2, or 3 integers, we can use the sum type

(2.47) 1 + (int + ((int × int) + (int × (int × int)))),

in which we can express a pair of integers (right(right(left〈2, 2〉))) as well as a triple of integers (right(right(right(left〈2, 〈2, 2〉〉)))). Alternatively, we can use the


more concise sum type

(2.48) 1 + (int × (1 + (int × (1 + int)))),

in which again we can express a pair of integers (right〈2, right〈2, left〈〉〉〉) as well as a triple of integers (right〈2, right〈2, right 2〉〉). This pattern of types lets us deal with integer lists of any length, as long as a maximum length is specified. For example, the type (following the pattern in (2.48))

(2.49) 1 + (int × (1 + (int × (1 + (int × (1 + (int × (1 + (int × 1)))))))))

accommodates up to 5 integers. If y is of this type, then the program

(2.50)  let y be left u. 0 | right v. let v be 〈x, y〉. x +
          (let y be left u. 0 | right v. let v be 〈x, y〉. x +
            (let y be left u. 0 | right v. let v be 〈x, y〉. x +
              (let y be left u. 0 | right v. let v be 〈x, y〉. x +
                (let y be left u. 0 | right v. let v be 〈x, y〉. x))))

computes the sum of the integers in the list.

For programs in the real world as well as examples in this dissertation, we seldom want to impose a maximum length limit on lists. Products and sums are not enough to encode lists of arbitrary length. Roughly, we need an "infinite type" like

(2.51) 1 + (int × (1 + (int × (1 + (int × (1 + (int × (1 + (int × · · · ))))))))),

where "···" is a "miniature copy" of the whole type. In general, for any type T (such as int), we can define the type list T by the recursive equation

(2.52) list T = 1 + (T × list T ).

Informally speaking, (2.52) defines the type of a list whose elements are of type T, by specifying that such a list is either empty or nonempty. If the list is empty, we represent it by the unit type 1. If the list is nonempty, we represent it by a product type T × list T, where the first component T represents the first element of the list, and the second component list T represents the remainder of the list (which in turn can be empty or nonempty, and so on). Because these are the only two alternatives for a list, (2.52) defines list T to be a sum type that recursively refers to list T itself. The type list T is hence called a recursive type.

For the sake of presenting example programs below, we informally introduce recursive types and programs here. Pierce (2002) and Gapeyev et al. (2002) give more formal introductions. In particular, this dissertation uses only inductive types, which we can add to our programming language while preserving the termination property (Greiner 1992).


A second example of a recursive type is the type tree T of binary trees, defined by

(2.53) tree T = T + (tree T × tree T ),

where T is the type of leaf labels. This recursive equation defines tree T to be a sum type: a binary tree is either a leaf node (containing some data of type T) or a branch (containing an ordered pair of type tree T × tree T, recursively). For instance,

(2.54) right〈left 1, right〈left 2, left 3〉〉

is a binary tree, of type tree int. We can think of tree T as the “infinite type”

(2.55) T + ((T + (· · · × · · · )) × (T + (· · · × · · · ))),

where each "···" is a "miniature copy" of the whole type. This "infinite type" is produced by unrolling the recursion in (2.53)—that is, by repeatedly substituting (2.53) into itself. Another way to encode binary trees is to define tree T by

(2.56) tree T = T + ((1 + 1)→ tree T ).

The tree in (2.54) is then encoded as

(2.57) right(λx. let x be left u. left 1 | right u. right(λy. let y be left u. left 2 | right u. left 3)).

As a third example, we can encode integers using a recursive type. An integer is either 0, a positive integer (in other words, a natural number), or a negative integer (in other words, the negation of a natural number). We write nat for the type of natural numbers, defined by the recursive equation

(2.58) nat = 1 + nat

because a natural number is either 1 or the successor of a natural number. Unrolling the recursion gives the "infinite type"

(2.59) 1 + (1 + (1 + (1 + · · · ))),

where "···" is a "miniature copy" of the whole type. Having defined nat, we can then define the type int of integers by the (nonrecursive) equation

(2.60) int = 1 + (nat + nat).

Here 1 is the type of integers that are zero, the first nat is the type of integers that are positive, and the second nat is the type of integers that are negative. For example, we encode the integer 3 as right(left(right(right(left〈〉)))), where the right(left(. . . )) outside means a positive integer, and the right(right(left〈〉)) inside means the natural number 3.


Some operations on recursive types can be expressed by ordinary programs. For example, if E is of type list int, then the program (2.50) computes the sum of the first 5 integers in the list (or of the entire list, if it is shorter than 5 elements). Other operations on recursive types call for recursive programs. Like a recursive type, a recursive program is defined by a recursive equation, and can be thought of as the "infinite program" that results from unrolling the equation. For example, the recursive equation

(2.61) F = λy. let y be left u. 0 | right v. let v be 〈x, y〉. x + Fy

defines an "infinite program" F, so that Fy computes the sum of the integers in the list y. This "infinite program" (cf. (2.50))

(2.62)  λy. let y be left u. 0 | right v. let v be 〈x, y〉. x +
          (λy. let y be left u. 0 | right v. let v be 〈x, y〉. x +
            (λy. let y be left u. 0 | right v. let v be 〈x, y〉. x +
              (λy. let y be left u. 0 | right v. let v be 〈x, y〉. x +
                (λy. let y be left u. 0 | right v. let v be 〈x, y〉. x + (···)y)y)y)y)y

is the result of unrolling (2.61). Here "···" is a "miniature copy" of the whole program.
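In Haskell (again only as an illustrative sketch), the recursive equation (2.52) becomes a recursive datatype, and the recursive equation (2.61) becomes an ordinary recursive definition; the unrolling happens implicitly as the program runs. The names List and sumList are invented here.

    -- list T = 1 + (T × list T): Nil plays the role of left〈〉,
    -- and Cons plays the role of right〈x, y〉.
    data List t = Nil | Cons t (List t)

    -- The recursive program F of (2.61): sum the integers in a list.
    sumList :: List Int -> Int
    sumList Nil         = 0                -- "left u. 0"
    sumList (Cons x ys) = x + sumList ys   -- "right v. let v be 〈x, y〉. x + Fy"

    -- sumList (Cons 1 (Cons 2 (Cons 3 Nil))) == 6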

2.4. Type-logical grammar

We now turn back to natural language. The combinatory structure of the λ-calculus turns out to be extremely useful for modeling how utterances combine with each other. For the basic intuition, consider a sentence like

(2.63) Alice saw Bob.

We want to model how the pronunciations and meanings of the individual words Alice, saw, and Bob combine to form the pronunciation and meaning of the complete sentence (2.63).

Syntactically, we treat a complex expression as a binary tree whose leaves are atomic expressions. For example, we treat (2.63) as the binary tree

(2.64)        •
             / \
        Alice   •
               / \
            saw   Bob

(We revisit this grouping below.) The sequence of leaves in a tree from left to right is the tree's fringe (also frontier or yield). For example, the fringe of (2.64) is Alice saw Bob. If a grammar proves a tree well-formed (in other words, classifies, admits, or generates the tree), then we also say that the grammar proves the tree's fringe well-formed.


Semantically, we treat the verb saw as a function that takes two arguments, Alice and Bob, to give the sentence (2.63). A simple (or simplistic) theory of meaning goes as follows: Alice denotes the individual Alice; Bob denotes the individual Bob; and saw denotes a relation among individuals, or equivalently, a function from individuals to functions from individuals to Boolean values (true or false). If Alice did see Bob, then the meaning of saw would be a function that maps Bob to a function that maps Alice to true. (Note the argument order: the meaning of saw first takes Bob as argument, then Alice.) A more sophisticated theory of meaning might involve mental concepts, physical references, or functions from possible worlds as in Section 1.5. For example, a sentence might denote a propositional formula, a situation description, or a function from possible worlds to Boolean values. But the basic idea of applying the meaning of saw as a function to the meanings of Bob and Alice as arguments is the same. Whatever meanings are, let us say that Alice means a, Bob means b, and saw means f, so that Alice saw Bob denotes f ba.

To express the syntactic assertion that Alice is a noun phrase alongside the semantic assertion that Alice means a, we write the λ-calculus judgment

(2.65) a : Alice ⊢ a : np,

pronounced "Alice with the meaning a is an np with the meaning a". We set a above in Roman type because it is not in the syntax of the language under discussion, namely English. In the same judgment, Alice and np are both (base) types. That is because Alice and np are both (atomic) classifications of expressions: an expression belongs to the type Alice just in case it is the word Alice, whereas an expression belongs to the type np just in case it is a noun phrase.

As promised in Section 2.1, this representation of expressions tightly couples semantics to syntax: each colon connects the semantic meaning of an expression (to the left) to its syntactic classification (to the right). For example, the judgment (2.65) should feel natural because the antecedent a : Alice couples the meaning a to the classification Alice, whereas a : np couples the same meaning a to the less informative classification np. There is no way to represent an expression that is well formed but meaningless or ill formed yet meaningful. The turnstile ⊢ models entailment among linguistic classifications, so this judgment says that the meaning a of every Alice-expression is the meaning of some np-expression. As in Section 2.3.1, we can take each type to denote a set: we can take np to denote the set of all individuals, and Alice to denote the singleton set of the individual Alice.

Analogously to (2.65), we write

(2.66) b : Bob ⊢ b : np

to model the word Bob.


(a) Alice saw Bob:

  f : saw ⊢ f : np → np → s                              (premise)
  b : Bob ⊢ b : np                                       (premise)
  f : saw, b : Bob ⊢ f b : np → s                        →E
  a : Alice ⊢ a : np                                     (premise)
  (f : saw, b : Bob), a : Alice ⊢ f ba : s               →E
  a : Alice, (f : saw, b : Bob) ⊢ f ba : s               Exchange

(b) *Alice saw Bob Carol:

  a : Alice, (f : saw, b : Bob) ⊢ f ba : s               (by (a))
  a : Alice, ((f : saw, b : Bob), c : Carol) ⊢ f ba : s  Weaken

(c) *saw Alice:

  f : saw ⊢ f : np → np → s                              (premise)
  a : Alice ⊢ a : np                                     (premise)
  f : saw, a : Alice ⊢ f a : np → s                      →E
  (f : saw, a : Alice), a : Alice ⊢ f aa : s             →E
  f : saw, (a : Alice, a : Alice) ⊢ f aa : s             Associate
  f : saw, a : Alice ⊢ f aa : s                          Contract

Figure 2.25. Natural-language derivations in the simply-typed λ-calculus, using structural rules to generate acceptable as well as unacceptable sentences

The verb saw is a bit more complicated. Syntactically, it combines with two noun phrases (np) to form a complete sentence (s, or clause). Semantically, it applies to two noun-phrase meanings (perhaps individuals) to form a complete-sentence meaning (perhaps a Boolean). We write

(2.67) f : saw ⊢ f : np → np → s

to capture and couple these assertions (Ajdukiewicz 1935). Here saw is a base type that classifies just the word saw, whereas np → np → s is a function type that classifies those expressions that combine with an np to form an expression that combines with an np to form an s. This judgment says that the meaning f of every expression of type saw is the meaning of some expression of type np → np → s. As in Section 2.3, we can take each function type to denote a set of functions: the type np → np → s denotes the set of functions from individuals to functions from individuals to sentence meanings.
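The simple theory of meaning just sketched fits in a few lines of Haskell (illustrative only; the datatype, the names, and the supposition that Alice saw Bob and nothing else are all invented here):

    data Individual = IAlice | IBob deriving Eq

    a, b :: Individual
    a = IAlice   -- the meaning of Alice
    b = IBob     -- the meaning of Bob

    -- f, the meaning of saw: it takes the object first, then the subject.
    f :: Individual -> Individual -> Bool
    f obj subj = (subj, obj) == (IAlice, IBob)  -- suppose only Alice saw Bob

    aliceSawBob :: Bool
    aliceSawBob = f b a   -- the meaning f ba of "Alice saw Bob"; True here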

We can now combine these words into a sentence. Figure 2.25a derives Alice saw Bob (2.63) in the simply-typed λ-calculus (Figure 2.17). As promised at the beginning of this chapter, each word in the sentence enters the proof as a premise. Pairing models juxtaposition of utterances. For example, the judgment

(2.68) f : saw, b : Bob ⊢ f b : np → s

in the proof says that, if an expression is constituted of one part that can be classified as saw and means f, followed by another part that can be classified as Bob and means b, then it can be classified as np → s and means the function application f b. The formula np → s here indicates that saw Bob is incomplete; it must be applied to another np (in this case Alice) to form a complete sentence. The final conclusion of the proof says not just that Alice saw Bob means f ba, but with a particular ordering (Alice before saw before Bob) and grouping (saw with Bob rather than Alice) of words.

As a model of the structure and meaning of utterances, the simply-typed λ-calculus suffers from several obvious inaccuracies.

• Phrases cannot be reordered freely in natural language, yet that is precisely what the Exchange rule allows. One step before Figure 2.25a concludes correctly that Alice saw Bob is well formed with some meaning, it concludes incorrectly that *saw Bob Alice is also well formed with the same meaning.

• Irrelevant phrases cannot be inserted in natural language, yet that is precisely what the Weaken rule allows. Figure 2.25b concludes incorrectly that *Alice saw Bob Carol is well formed with the same meaning as Alice saw Bob.

• Duplicate phrases cannot be merged in natural language, yet that is precisely what the Contract rule allows. Figure 2.25c concludes incorrectly that *saw Alice is a well-formed sentence that means that Alice saw herself.

• Phrases cannot be regrouped (that is, the binary tree of constituency reparenthesized or rebracketed) freely in natural language, yet that is precisely what the Associate rule allows. Much empirical evidence indicates that grouping matters in natural language. To take an example from English: Without grouping, the string 's mother saw Bob would have the type np → s just as the verb phrases saw Bob and heard Carol do. Yet, whereas the sentence Alice saw Bob and heard Carol means that Alice saw Bob and Alice heard Carol, the sentence Alice's mother saw Bob and heard Carol does not mean that Alice's mother saw Bob and Alice heard Carol. We follow standard linguistics practice here in grouping a transitive verb like saw more tightly with its object Bob than with its subject Alice.


  -------------- Id
  x : T ⊢ x : T

  Γ, x : T1 ⊢ E : T2           x : T1, Γ ⊢ E : T2
  -------------------- / I     -------------------- \ I
  Γ ⊢ λx. E : T2/T1            Γ ⊢ λx. E : T1\T2

  Γ ⊢ F : T2/T1    Δ ⊢ E : T1          Δ ⊢ E : T1    Γ ⊢ F : T1\T2
  ---------------------------- /E      ---------------------------- \E
        Γ, Δ ⊢ FE : T2                       Δ, Γ ⊢ FE : T2

Figure 2.26. The non-associative Lambek calculus without products, with semantics

To deal with these problems, we remove all four structural rules (Weaken, Contract, Exchange, Associate) from the grammar. However, Figure 2.25a uses Exchange to derive correctly that Alice saw Bob is acceptable. This use of Exchange is crucial, because the →E rule in Figure 2.17 only allows a function to take an argument to its right, yet saw Bob needs to take its argument Alice to its left. Without Exchange, we need some other way to apply a function to an argument to the left.

Lambek (1958), following Bar-Hillel (1953), solves the problem by splitting functions into two groups: those that take arguments to the right (of type T2/T1, pronounced "T2 from T1" or "T2 over T1") and those that take arguments to the left (of type T1\T2, pronounced "T1 into T2" or "T1 under T2"). Figure 2.26 shows the result of the split, with all structural rules removed. Each of the two function connectives, / and \, has its own introduction and elimination rules. In this new system, we can assign saw the type (np\s)/np and represent Alice saw Bob (2.63) by the following proof.

(2.69)
  f : saw ⊢ f : (np\s)/np                     (premise)
  b : Bob ⊢ b : np                            (premise)
  f : saw, b : Bob ⊢ f b : np\s               /E
  a : Alice ⊢ a : np                          (premise)
  a : Alice, (f : saw, b : Bob) ⊢ f ba : s    \E

One way to understand the inference rules in Figure 2.26 is to think of eachword or phrase as a conserved resource. For example, the verb saw has the type(np\s)/np because it first consumes an np to its right, then consumes an np to itsleft, finally to give an s. Without structural rules, resources can no longer be freelydiscarded, duplicated, reordered, or regrouped. Hence this is a substructural logic(Restall 2000), more specifically multiplicative linear logic (Girard 1987) withneither commutativity (the Exchange rule) nor associativity (the Associate rule).

By convention, the slashes / and \ associate towards the result type. For example, c/b/a means the type (c/b)/a, whereas a\b\c means the type a\(b\c).
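To see the two elimination rules in action computationally, here is a minimal sketch in Haskell. The datatype Cat and the function combine are our illustration, not part of Lambek's system; they implement only the rules /E and \E for two adjacent phrases.

   -- Over t2 t1 encodes the type T2/T1; Under t1 t2 encodes T1\T2.
   data Cat = NP | S | Over Cat Cat | Under Cat Cat
     deriving (Eq, Show)

   -- Try to combine two adjacent phrases, leftmost phrase first.
   combine :: Cat -> Cat -> Maybe Cat
   combine (Over t2 t1) t | t == t1 = Just t2    -- /E: function before argument
   combine t (Under t1 t2) | t == t1 = Just t2   -- \E: argument before function
   combine _ _ = Nothing

   -- Mirroring proof (2.69), with saw : (np\s)/np:
   --   combine (Over (Under NP S) NP) NP  ==  Just (Under NP S)   -- saw Bob
   --   combine NP (Under NP S)            ==  Just S              -- Alice (saw Bob)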

Figure 2.26 is called the non-associative Lambek calculus without products. It is an instance of type-logical grammar (or categorial type logic) in that it uses connectives like / and \ in types to classify utterances and govern their combination in a substructural logic (Moortgat 1997). As noted earlier, one distinguishing property of type-logical grammar is its tight coupling of syntax and semantics: via the formulas-as-types correspondence, the proof that classifies an utterance as well-formed dictates the utterance's meaning. In particular, the /I and \I rules build functions, whereas /E and \E apply functions. This coupling is so tight that, for brevity, we omit meanings in most derivations below. Figure 2.27 shows the same inference rules so abbreviated. This abbreviation should not distract from the crucial constraints that semantics places on linguistic theory. As we turn below to additional linguistic phenomena like quantification, we often decide how to analyze them by considering what phrases can mean.


   T ⊢ T   (Id)

   Γ, T1 ⊢ T2
   ─────────── /I
   Γ ⊢ T2/T1

   T1, Γ ⊢ T2
   ─────────── \I
   Γ ⊢ T1\T2

   Γ ⊢ T2/T1    ∆ ⊢ T1
   ──────────────────── /E
   Γ, ∆ ⊢ T2

   ∆ ⊢ T1    Γ ⊢ T1\T2
   ──────────────────── \E
   ∆, Γ ⊢ T2

Figure 2.27. The non-associative Lambek calculus without products, without semantics


2.4.1. Multimodal type-logical grammar. Section 2.1 above gives three reasons why context-free grammars are not ideal for modeling natural language.

• They do not exist for some languages.
• They are awkward for some linguistic phenomena.
• They enforce no correspondence between syntax and semantics.

So far we have only addressed the last point, using the formulas-as-types correspondence. As for the first two points (in short, expressiveness), we actually still cannot describe any set of strings that is not a context-free language: context-free grammars and non-associative Lambek grammars generate exactly the same string sets. In other words, they have the same weak generative capacity (Buszkowski 1986; see also Kandulski 1988 for the Lambek calculus with products, and Pentus 1993, 1997, 1996 for variations with associativity).

To go beyond context-free languages, we introduce additional modes of combination besides juxtaposition (Moortgat 1997; Section 4 and references therein). Most intuitive understanding for these modes comes from their use, which abounds in Chapter 4. In this section, we just describe the machinery and demonstrate it with a formal language alluded to in Section 2.1: the set of doubled strings {ww | w ∈ Σ∗}, which is not context-free.

The basic idea behind multimodal type-logical grammar is that utterances may combine in more than one way, and some of these ways may not have arity two.


For intuition, imagine that we are modeling not utterances pronounced over time but visual elements laid out on a page (Henderson 1982) or musical notes arranged in a score (Hudak et al. 1996). Two visual elements may be combined by juxtaposing them horizontally or vertically (two binary modes of combination); one visual element may be embellished by framing it with a box (a unary mode of combination). Two musical notes may be combined by playing them at the same time or one after another (two binary modes of combination); one musical note may be embellished with dynamics like forte (a unary mode of combination).

While it may be less obvious at first glance, it is also useful to posit multiple ways to combine utterances in spoken language: juxtaposition alone is often not enough to encode the desired syntax and semantics. Indeed, our linguistic analyses in Chapter 4 rely crucially on adding binary and unary modes and specifying how they interact.

For example, to analyze the copular sentence

(2.70) Alice is a woman,

it is intuitive to take the noun woman to mean a function that applies to the meaning of Alice to yield the meaning of the sentence: perhaps woman is a function from individuals to Booleans that maps women to true and others to false. In other words, woman is a noun that predicates of Alice. However, we cannot assign the type np\s or s/np to woman, because neither *Alice woman nor *woman Alice is an acceptable sentence in English. Rather, Section 4.3.1 below introduces a new binary mode ,n (the subscript n being mnemonic for “noun”) for combining something with a noun that predicates of it. We then assign the type np\ns to woman, so that the judgment

(2.71) Alice ,n woman ⊢ s

is derivable, while the judgment

(2.72) Alice, woman ⊢ s

remains not derivable.

To take another example, the wh-question who Alice saw in the sentence

(2.73) Bob knows who Alice saw

can be analyzed as the function λx. f x a, where f and a are the meanings of saw and Alice as above. In general, we can analyze a wh-question to mean a function from a short answer (such as Carol) to a proposition (such as that Alice saw Carol) (Krifka 2001). However, who Alice saw cannot have the type np\s or s/np, because neither *Carol who Alice saw nor *who Alice saw Carol is an acceptable sentence in English, let alone one that means that Alice saw Carol.


   Γ ,m x : T1 ⊢ E : T2
   ───────────────────── /mI
   Γ ⊢ λx. E : T2/mT1

   x : T1 ,m Γ ⊢ E : T2
   ───────────────────── \mI
   Γ ⊢ λx. E : T1\mT2

   Γ ⊢ F : T2/mT1    ∆ ⊢ E : T1
   ───────────────────────────── /mE
   Γ ,m ∆ ⊢ FE : T2

   ∆ ⊢ E : T1    Γ ⊢ F : T1\mT2
   ───────────────────────────── \mE
   ∆ ,m Γ ⊢ FE : T2

Figure 2.28. Adding a binary mode m to the Lambek calculus without products

Rather, Section 4.5.2 below introduces a new binary mode ,? for combining something with a question that predicates of it. We then arrange for our grammar to derive the type np\?s for who Alice saw, so that the judgment

(2.74) Carol ,? (who, (Alice, saw)) ⊢ s

is derivable, while the judgment

(2.75) Carol, (who, (Alice, saw)) ⊢ s

remains not derivable.

For the linguistic applications in Chapter 4, yet another binary mode is crucial.

In Section 4.1, we treat a context as an expression in its own right. We then introduce the continuation mode, which combines a subexpression with a context to form a larger expression by plugging the subexpression into the context, or equivalently, by wrapping the context around the subexpression. Informally speaking, we arrange for the continuation mode to combine the subexpression Bob with the context Alice saw [ ]'s mother to form the larger expression Alice saw Bob's mother.

Motivated by these examples, we now formalize multiple modes of combination. The non-associative Lambek calculus (without products) in Figures 2.26 and 2.27 (on pages 48–49) has a single, binary mode of combination, which formally models the juxtaposition of expressions with grouping. Thus the syntax of an expression is treated as a binary tree with a single kind of branch node, unlabeled in (2.64) on page 44. This mode of combination is called the default mode. In an antecedent, it is represented by the comma.

To add a new binary mode to type-logical grammar, we simply duplicate the comma and the connectives / and \, as follows. We first pick a letter, say m, to distinguish this new mode from others. We let two type environments Γ and ∆ be combined using a new comma “,m” into a larger type environment Γ ,m ∆. We also add types of the form T2/mT1 and T1\mT2, where T1 and T2 are types. Finally, we add the inference rules in Figure 2.28. These rules are just the rules in Figure 2.26 (except Id) with m subscripted everywhere.


   〈Γ〉m ⊢ E : T
   ────────────── □↓mI
   Γ ⊢ E : □↓mT

   Γ ⊢ E : T
   ──────────────── ◇mI
   〈Γ〉m ⊢ E : ◇mT

   Γ ⊢ E : □↓mT
   ────────────── □↓mE
   〈Γ〉m ⊢ E : T

   ∆ ⊢ E : ◇mT0    Γ[〈x : T0〉m] ⊢ E′ : T
   ─────────────────────────────────────── ◇mE
   Γ[∆] ⊢ let E be x. E′ : T

Figure 2.29. Adding a unary mode m to the Lambek calculus

Just as the semantics in Figure 2.26 does not distinguish between / and \ in abstractions and applications, the semantics here does not distinguish /m, \m, /, and \ from each other. This choice reflects the popular intuition that different binary modes regulate just syntactic combination.

To add a unary mode to type-logical grammar, we again pick a letter m. We then let a type environment Γ be enclosed in a pair of angle brackets 〈 〉m to form a larger type environment 〈Γ〉m. We also add types of the form ◇mT and □↓mT, where T is a type. Finally, we add the inference rules in Figure 2.29. The unary type connective ◇m and the corresponding unary structural punctuation 〈 〉m represent this new mode of combination in types and type environments, much as the binary structural punctuation , (comma) represents juxtaposition in type environments. By convention, unary connectives have higher precedence than binary connectives, so ◇ma/b means the type (◇ma)/b rather than ◇m(a/b).

The category theorist will recognize that these rules specify that ◇m and □↓m are a pair of functors that form an adjunction. The modal logician can take ◇m to mean “in some world accessible from here” and take □↓m to mean “in every world accessible to here”. The reader who is more comfortable with binary modes can treat 〈Γ〉m as t ,m Γ, ◇mT as t ,m T, and □↓mT as t\mT, for some base type t and binary mode m. The rules for the unary m mode then follow from Id and the rules for the binary m mode (including for products).

The semantics shown in Figure 2.29 essentially forgets the unary mode; for example, the premise and conclusion have the same meaning in the first three rules. This choice reflects how the linguistic analyses in this dissertation use unary modes, namely to regulate just syntactic combination as we now demonstrate briefly. Moortgat (1997; Section 4.2.1) describes another semantics for unary modes that attributes semantic significance to them, such as intensionality and information structure.

A multimodal type-logical grammar can contain any number of modes, of any arity (Dunn 1991), though we only need binary and unary modes. Each mode can have its own structural rules. For example, if we want a binary mode m to be associative but not the default mode, we can leave out the Associate rule for the default mode in Figure 2.17, as discussed above, but add the corresponding rule


for mode m:

(2.76)  Γ[(∆ ,m Θ) ,m Π] ⊢ E : T
        ========================= Associatem
        Γ[∆ ,m (Θ ,m Π)] ⊢ E : T

Moreover, modes can interact through mixed structural rules, which are structural rules that mention multiple modes. For example, if we want to allow reordering certain phrases, we can introduce a unary mode m, and arrange for reorderable phrases to have types of the form ◇mT. The default (binary) mode and the new unary mode m can then communicate through the following Exchange-like mixed rule.

(2.77)  Γ[〈∆〉m, 〈Θ〉m] ⊢ E : T
        ───────────────────────
        Γ[〈Θ〉m, 〈∆〉m] ⊢ E : T

This rule allows a pair of juxtaposed phrases to be reordered as long as their types are both enclosed by ◇m.

With multiple modes of combination, we need to redefine what it means for a grammar to generate a string. A multimodal grammar designates certain modes, including the default mode, as external. We say that the grammar generates a string if the grammar proves a tree well-formed whose fringe is the string, and which is built up from atomic expressions using only external modes. For example, because we do not designate the binary modes n and ? as external in Chapter 4, deriving the judgments (2.71) and (2.74) on pages 50–51 does not generate the strings *Alice woman and *Carol who Alice saw (that is, does not predict the strings to be acceptable sentences). Informally speaking, a mode is external just in case a speaker can pronounce it. Otherwise, it is internal.

To demonstrate all this machinery, Figure 2.30 shows a multimodal grammar that generates the context-sensitive language of strings of the form ww, where w is any string over the alphabet {a, b}. This grammar uses three modes: the default binary mode (unnamed), a new binary mode d, and a new unary mode (unnamed). Only the default mode is external.

The idea behind this grammar is to create structures ∆ ,d Θ such that ∆ and Θ are identical. To achieve this, the lexical entries create and juxtapose copies of the structures a ,d a and b ,d b. Structural rule 2 then lets the d mode “bubble up” to the top level. Figure 2.30 derives the doubled string aabaab using structural rule 2 twice. Finally (at the bottom of the derivation), structural rule 1 changes the top-level composition mode from d to default, which renders the antecedent pronounceable. Because the d mode only combines a with a and b with b in the lexicon, this grammar generates only strings of the form ww.
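As a quick sanity check on the target language itself (not on the grammar machinery), membership in the doubled-string language is easy to test; this Haskell sketch and the name isDoubled are ours.

   -- Is s of the form ww?  For example, isDoubled "aabaab" == True.
   isDoubled :: String -> Bool
   isDoubled s = even n && take h s == drop h s
     where n = length s
           h = n `div` 2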


Lexicon

   a ⊢ □↓s /d a        a ⊢ (□↓s\□↓s) /d a
   b ⊢ □↓s /d b        b ⊢ (□↓s\□↓s) /d b

Structural rules

   Γ[〈∆ ,d Θ〉] ⊢ E : T
   ──────────────────── 1
   Γ[∆, Θ] ⊢ E : T

   Γ[(∆1 ,d Θ1), (∆2 ,d Θ2)] ⊢ E : T
   ─────────────────────────────────── 2
   Γ[(∆1, ∆2) ,d (Θ1, Θ2)] ⊢ E : T

Sample derivation of aabaab (each judgment follows from the judgments above it by the rule named at the right)

   a ⊢ □↓s /d a          a ⊢ a   (Id)
   a ,d a ⊢ □↓s                  (/dE)

   a ⊢ (□↓s\□↓s) /d a    a ⊢ a   (Id)
   a ,d a ⊢ □↓s\□↓s              (/dE)

   (a ,d a), (a ,d a) ⊢ □↓s      (\E)

   b ⊢ (□↓s\□↓s) /d b    b ⊢ b   (Id)
   b ,d b ⊢ □↓s\□↓s              (/dE)

   ((a ,d a), (a ,d a)), (b ,d b) ⊢ □↓s   (\E)
   ((a, a) ,d (a, a)), (b ,d b) ⊢ □↓s     (2)
   ((a, a), b) ,d ((a, a), b) ⊢ □↓s       (2)
   〈((a, a), b) ,d ((a, a), b)〉 ⊢ s      (□↓E)
   ((a, a), b), ((a, a), b) ⊢ s           (1)

Figure 2.30. A multimodal type-logical grammar that generates the context-sensitive language of doubled strings {ww | w ∈ {a, b}∗}


CHAPTER 3

The analogy: delimited control and quantification

In Chapter 1, we saw two instances of similarity between a computational side effect and a linguistic side effect: state can be analyzed like anaphora, and environment can be analyzed like intensionality. Armed with the formal tools from Chapter 2, we now consider a third instance: delimited control can be analyzed like quantification. As in Chapter 1, we first present the two side effects and how they have been treated in programming-language theory and linguistics. We then make the new observation that the side effects are similar, so as to come up with better treatments in Chapters 4 and 5.

3.1. Delimited control

The model of computation embodied by the λ-calculus of Section 2.3 is “hierarchical”: two disjoint subexpressions of the same program execute in isolation. For example, the program

(3.1) (λx. x)2 + (λy. y)3

contains two disjoint subexpressions (λx. x)2 and (λy. y)3. Either subexpression can undergo β-reduction separately, in either order, without affecting the other or changing the final result 5 of the program.

(3.2)  (λx. x)2 + (λy. y)3 ▷ 2 + (λy. y)3 ▷ 2 + 3 ▷ 5
(3.3)  (λx. x)2 + (λy. y)3 ▷ (λx. x)2 + 3 ▷ 2 + 3 ▷ 5

Whenever a context (like [ ] + (λy. y)3) waits to be plugged with an evaluation result (like 2), only one subexpression (like (λx. x)2) may evaluate to provide that evaluation result. This hierarchy of computation simplifies parallel execution: should a second processor become available, it can evaluate a chosen subexpression concurrently and provide the result separately without interfering with or being interfered with by the evaluation of the rest of the program, that is, the context of the chosen subexpression.

Sometimes, however, we want multiple ways to plug a context. For example, suppose that we have a list of integers, of type list int from Section 2.3.5, and want to compute the product of its elements. The recursive function (cf. (2.61) on page 44)

(3.4) P1 = λy. let y be left u. 1 | right v. let v be 〈x, y〉. x × P1 y

does the job. For example, applying P1 to the list

(3.5) L1 = right〈4, right〈3, right〈2, left〈〉〉〉〉

gives the program P1 L1, which executes as follows. (Ellipsis below stands for left u. 1 | right v. let v be 〈x, y〉. x × P1 y.)

(3.6)  P1(right〈4, right〈3, right〈2, left〈〉〉〉〉)
       ▷ let right〈4, right〈3, right〈2, left〈〉〉〉〉 be . . .
       ▷ let 〈4, right〈3, right〈2, left〈〉〉〉〉 be 〈x, y〉. x × P1 y
       ▷ 4 × P1(right〈3, right〈2, left〈〉〉〉)
       ▷ 4 × let right〈3, right〈2, left〈〉〉〉 be . . .
       ▷ 4 × let 〈3, right〈2, left〈〉〉〉 be 〈x, y〉. x × P1 y
       ▷ 4 × (3 × P1(right〈2, left〈〉〉))
       ▷ 4 × (3 × let right〈2, left〈〉〉 be . . . )
       ▷ 4 × (3 × let 〈2, left〈〉〉 be 〈x, y〉. x × P1 y)
       ▷ 4 × (3 × (2 × P1(left〈〉)))
       ▷ 4 × (3 × (2 × let left〈〉 be . . . ))
       ▷ 4 × (3 × (2 × 1))
       ▷ 4 × (3 × 2)
       ▷ 4 × 6
       ▷ 24
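For comparison, here is P1 transcribed into Haskell, a minimal sketch that assumes Haskell's built-in lists in place of the sum-and-pair encoding of list int:

   -- The naive product P1: it multiplies through every suffix of the
   -- list, even after a 0 has been seen.
   product1 :: [Int] -> Int
   product1 []       = 1
   product1 (x : xs) = x * product1 xs

   -- product1 [4, 3, 2] computes 4 × (3 × (2 × 1)) = 24, as in (3.6).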

However, this solution computes the product for every suffix of the given list. If we already know that 0 appears in the list, then we also know that the product is 0, so we need not compute the product for any remaining suffix of the list. For example, to compute the product for the list

(3.7) L2 = right〈4, right〈0, right〈3, right〈2, left〈〉〉〉〉〉,

there is no need to calculate 3 × 2, or even 4 × 0, once we see 0.

To make it easy to optimize computing the product of a list with 0 in it, we extend our programming language with features for delimited control in Section 3.1.1 below. Despite the familiarity of these features, our initial attempt makes the language dangerously nonconfluent. We then fix the problem in Section 3.1.2 by restricting the computation relation using the notions of values and evaluation contexts.


3.1.1. A first attempt. To implement this optimization, we extend our programming language with two constructs, # (pronounced “prompt” or “reset”) and abort. The reader familiar with exception handling in programming languages may think of abort as a primitive variant of throw, and # as a primitive variant of try . . . catch.

If E is an expression, then so are #E and abort E. If we were working in the untyped rather than simply-typed λ-calculus, then we would add to Figure 2.14 on page 31 the inference rules

(3.8)  Γ ⊢ E
       ──────
       Γ ⊢ #E

and

(3.9)  Γ ⊢ E
       ────────────
       Γ ⊢ abort E.

We leave it to Section 3.2.2, after describing the semantics of # and abort, to extend the simply-typed λ-calculus with formal typing rules for these constructs.

With # and abort, we can write the recursive function

(3.10)  P2 = λy. #(P′2 y)
(3.11)  P′2 = λy. let y be left u. 1
                | right v. let v be 〈x, y〉. (if x = 0 then abort 0 else x) × P′2 y

and apply P2 to the given list. In words, abort 0 means to replace the subexpression inside the innermost enclosing # by 0, that is, answer 0 to the #. By “the subexpression inside the innermost enclosing #”, we mean the smallest subexpression #E that contains abort 0 in the running program when abort 0 takes effect, not in the initial written program.
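The behavior we intend for P2 can be previewed in Haskell with the Either type standing in for # and abort: Left plays the role of abort, and the case analysis at the end plays the role of the delimiter #. This encoding is our illustration of the intended behavior, not the construct itself.

   -- product2 answers 0 to the "delimiter" as soon as it sees 0,
   -- without inspecting the rest of the list.
   product2 :: [Int] -> Int
   product2 xs = case go xs of
       Left  p -> p            -- an abort reached the delimiter
       Right p -> p            -- the recursion completed normally
     where
       go :: [Int] -> Either Int Int
       go []        = Right 1
       go (x : xs')
         | x == 0    = Left 0              -- abort 0: skip the rest
         | otherwise = fmap (x *) (go xs')

   -- product2 [4, 0, 3, 2] == 0, without ever calculating 3 × 2 or 4 × 0.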

In a first attempt to formalize how # and abort execute, we could add the following steps to the computation relation.

(3.12)  C[#E] ▷ C[E]
(3.13)  C[#(D[abort E′])] ▷ C[#E′]

The metavariable D[ ] stands for an evaluation subcontext, or subcontext for short: an evaluation context whose [ ] is not enclosed in #. This proviso matches abort to the innermost enclosing #. Because subcontexts are crucial to delimited control, we formalize the notion in Figure 3.1. This figure defines a new judgment form D[ ] : d, which means that D[ ] is a subcontext. The two inference rules at the bottom of the figure redefine contexts C[ ] in terms of subcontexts; these two rules are equivalent to Figures 2.17, 2.22, 2.23, and 2.24 taken together.


Evaluation subcontexts D[ ] : d

   [ ] : d
   If D[ ] : d, then D[λx. [ ]] : d.
   If D[ ] : d and ∆ ⊢ E : T1, then D[[ ]E] : d.
   If Γ ⊢ F : T0 and D[ ] : d, then D[F[ ]] : d.
   If D[ ] : d and ∆ ⊢ E2 : T2, then D[〈[ ], E2〉] : d.
   If Γ ⊢ E1 : T1 and D[ ] : d, then D[〈E1, [ ]〉] : d.
   If D[ ] : d and Γ ⊢ E′ : T′, then D[let [ ] be 〈x, y〉. E′] : d.
   If ∆ ⊢ E : T0 and D[ ] : d, then D[let E be 〈x, y〉. [ ]] : d.
   If D[ ] : d and Γ ⊢ E′ : T′, then D[let [ ] be 〈〉. E′] : d.
   If ∆ ⊢ E : T0 and D[ ] : d, then D[let E be 〈〉. [ ]] : d.
   If D[ ] : d, then D[left [ ]] : d.
   If D[ ] : d, then D[right [ ]] : d.
   If D[ ] : d, Γ1 ⊢ E′1 : T′1, and Γ2 ⊢ E′2 : T′2, then D[let [ ] be left x. E′1 | right y. E′2] : d.
   If ∆ ⊢ E : T0, D[ ] : d, and Γ ⊢ E′2 : T′, then D[let E be left x. [ ] | right y. E′2] : d.
   If ∆ ⊢ E : T0, Γ ⊢ E′1 : T′, and D[ ] : d, then D[let E be left x. E′1 | right y. [ ]] : d.

Evaluation contexts C[ ]

   If D[ ] : d, then D[ ] is an evaluation context.
   If C[ ] is an evaluation context and D[ ] : d, then C[#(D[ ])] is an evaluation context.

Figure 3.1. Defining evaluation subcontexts and redefining evaluation contexts

The intention of the new constructs # and abort is for the program P2 L2 to execute as follows. (Ellipsis below stands for 〈x, y〉. (if x = 0 then abort 0 else x) × P′2 y.)

(3.14)  P2(right〈4, right〈0, right〈3, right〈2, left〈〉〉〉〉〉)
        ▷ #(let right〈4, right〈0, right〈3, right〈2, left〈〉〉〉〉〉 be left u. 1
             | right v. let v be . . . )
        ▷ #(let 〈4, right〈0, right〈3, right〈2, left〈〉〉〉〉〉 be . . . )
        ▷ #((if 4 = 0 then abort 0 else 4) × P′2(right〈0, right〈3, right〈2, left〈〉〉〉〉))
        ▷ #((if false then abort 0 else 4) × P′2(right〈0, right〈3, right〈2, left〈〉〉〉〉))
        ▷ #(4 × P′2(right〈0, right〈3, right〈2, left〈〉〉〉〉))
        ▷ #(4 × let right〈0, right〈3, right〈2, left〈〉〉〉〉 be left u. 1 | right v. let v be . . . )
        ▷ #(4 × let 〈0, right〈3, right〈2, left〈〉〉〉〉 be . . . )
        ▷ #(4 × ((if 0 = 0 then abort 0 else 0) × P′2(right〈3, right〈2, left〈〉〉〉)))
        ▷ #(4 × ((if true then abort 0 else 0) × P′2(right〈3, right〈2, left〈〉〉〉)))
        ▷ #(4 × ((abort 0) × P′2(right〈3, right〈2, left〈〉〉〉)))
        ▷ #0    by (3.13)
        ▷ 0     by (3.12)

The context #[ ] in P2 can be plugged in two ways: either by sequentially completing the recursion over the entire list, or by abort. The steps above take the second way out before inspecting right〈3, right〈2, left〈〉〉〉 at all, as desired. We say that the control operator abort transfers control in the program nonlocally, from within P′2 to the control delimiter # (Felleisen 1987, 1988). Another way to understand this behavior is that abort actively takes control over, rather than passively reporting to, its evaluation subcontext: the subcontext D[ ] in (3.13) expects to receive an evaluation result from abort E′, but is instead discarded by abort E′ without receiving anything.

Unfortunately, our computation relation is too permissive for this approach to work as is. Although the desired sequence of computation steps in (3.14) is possible, so are many undesired sequences. First, because

(3.15)  (λy. let y be left u. 1
            | right v. let v be 〈x, y〉. (if x = 0 then [ ] else x) × P′2 y) y

is an evaluation subcontext, abort 0 in P2 may take effect right away, before the function is applied to any list y, with or without 0.

(3.16) P2 ▷ λy. #0 ▷ λy. 0

If y contains no 0, this premature evaluation of abort then produces the wrong product. Second, according to (3.12) on page 57, the # in P2 may be removed at any time, not necessarily as late as in (3.14).

(3.17) P2 ▷ λy. P′2 y

This premature evaluation of # then prevents abort 0 from matching the # in P2. Third, because

(3.18) if x = 0 then abort 0 else [ ]

is an evaluation context, even when P2 encounters 0 in the list, it may not skip the rest of the list. For example, an alternative ending to (3.14) is

(3.19)  P2 L2 ▷⁺ #(4 × ((abort 0) × P′2(right〈3, right〈2, left〈〉〉〉)))
        ▷⁺ #(4 × ((abort 0) × 6))
        ▷ #0    by (3.13)
        ▷ 0     by (3.12).

In this scenario, abort 0 takes effect only after the partial product 6 is computed for the rest of the list.


Values Γ ⊢V V : T

   x : T ⊢V x : T   (IdV)

   Γ, x : T1 ⊢ E : T2
   ────────────────────── →IV
   Γ ⊢V λx. E : T1 → T2

   Γ ⊢V V1 : T1    ∆ ⊢V V2 : T2
   ───────────────────────────── ×IV
   Γ, ∆ ⊢V 〈V1, V2〉 : T1 × T2

   · ⊢V 〈〉 : 1   (1IV)

   Γ ⊢V V : T1
   ─────────────────────── +IV
   Γ ⊢V left V : T1 + T2

   Γ ⊢V V : T2
   ──────────────────────── +IV
   Γ ⊢V right V : T1 + T2

   Γ[∆] ⊢V V : T
   ───────────────── WeakenV
   Γ[∆, Θ] ⊢V V : T

   Γ[∆, ∆] ⊢V V : T
   ───────────────── ContractV
   Γ[∆] ⊢V V : T

   Γ[∆, Θ] ⊢V V : T
   ───────────────── ExchangeV
   Γ[Θ, ∆] ⊢V V : T

   Γ[(∆, Θ), Π] ⊢V V : T
   ===================== AssociateV
   Γ[∆, (Θ, Π)] ⊢V V : T

Figure 3.2. Defining values in the λ-calculus

Without # and abort, our programming language is confluent and strongly normalizing, so every way to execute a given program yields the same outcome. The new constructs give rise to some desirable computation sequences but also some undesirable ones. To distinguish the baby from the bathwater and address the problems illustrated above, we restrict the computation relation to a deterministic one below.

3.1.2. Enforcing call-by-value, left-to-right evaluation. Following Felleisen (1987) in this section, we define a new notion of values and modify our existing notion of evaluation contexts, so as to shrink the computation relation for our language to one that is deterministic in that it enforces call-by-value, left-to-right evaluation.

Figure 3.2 defines certain expressions to be values. As in Section 2.2, a value is the result of a successful computation, as opposed to an expression that can still compute to a subsequent expression or that is stuck due to an error. (As mentioned in Section 2.3.1, the last case is impossible for well-formed terms in the simply-typed λ-calculus.) We introduce a new judgment form

(3.20) Γ ⊢V V : T.

It means that V is a value expression of type T in the type environment Γ. The metavariable V stands for an expression that is a value. It is easy to check that Γ ⊢ V : T is provable whenever Γ ⊢V V : T is provable.

As the inference rules in Figure 3.2 indicate, a variable or an abstraction is always a value, even if the body of the abstraction is not a value. An expression that introduces a product, unit, or sum type (in other words, an expression of the form 〈E1, E2〉, 〈〉, left E, or right E) is a value just in case its parts E, E1, E2 are values. No other expression is a value. For example, λf. f x, 〈〉, and 〈f, left〈〉〉 are values in an appropriate environment, whereas let x be 〈y, z〉. 〈y, z〉, #〈〉, and 〈f x, left〈〉〉 are well-formed expressions in an appropriate environment, but not values in any environment.

To execute a program call-by-value (Plotkin 1975) is to choose computation steps so as to substitute only values for variables, and so as to never look under a variable binding (that is, never examine the body of an abstraction or let-expression) for a subexpression to evaluate. For example, the computation relation that we have been using contains both

(3.21)  (λx. left x)((λy. right y)〈〉) = let (let 〈〉 be y. right y) be x. left x
        ▷ (λx. left x)(right〈〉) = let (right〈〉) be x. left x

and

(3.22)  (λx. left x)((λy. right y)〈〉) = let (let 〈〉 be y. right y) be x. left x
        ▷ left((λy. right y)〈〉) = left(let 〈〉 be y. right y)

as possible steps from the same program. Call-by-value evaluation eschews (3.22) in favor of (3.21), in order to substitute only values for variables: (3.21) substitutes the value 〈〉 for the variable y, whereas (3.22) substitutes the nonvalue (λy. right y)〈〉 for the variable x. To take another example, this computation relation contains both

(3.23)  (λy. (λx. left x)(right y))〈〉 = let 〈〉 be y. let (right y) be x. left x
        ▷ (λx. left x)(right〈〉) = let (right〈〉) be x. left x

and

(3.24)  (λy. (λx. left x)(right y))〈〉 = let 〈〉 be y. let (right y) be x. left x
        ▷ (λy. left(right y))〈〉 = let 〈〉 be y. left(right y)

as possible steps from the same program. Call-by-value evaluation eschews (3.24) in favor of (3.23), in order to avoid looking under a variable binding for a subexpression to evaluate: (3.23) β-reduces the top-level expression, whereas (3.24) β-reduces a body under a binding for y. In the programming languages in Chapter 2, such choices of computation steps are inconsequential, in the sense that the computation relation is confluent: the programs above all compute to left(right〈〉) eventually. But abort destroys confluence, as shown in Section 3.1.1 and below.

(3.25)  #((λx. abort 1)(abort 2)) ▷ #2 ▷ 2
(3.26)  #((λx. abort 1)(abort 2)) ▷ #1 ▷ 1


Call-by-value evaluation rules out (3.26) and restores confluence in this case.

To execute a program left-to-right is to choose computation steps so as to evaluate subexpressions of a branching expression (that is, an expression of the form FE or 〈E1, E2〉) from left to right. For example, the computation relation that we have been using contains both

(3.27)  〈(λx. x)〈〉, (λx. x)〈〉〉 ▷ 〈〈〉, (λx. x)〈〉〉

and

(3.28)  〈(λx. x)〈〉, (λx. x)〈〉〉 ▷ 〈(λx. x)〈〉, 〈〉〉

as possible steps from the same program. Both steps are consistent with call-by-value, but left-to-right evaluation eschews (3.28) in favor of (3.27), in order to evaluate the first component of the ordered pair before the second. To take another example, this computation relation contains both

(3.29)  ((λf. f)(λx. left x))((λy. right y)〈〉) ▷ (λx. left x)((λy. right y)〈〉)

and

(3.30)  ((λf. f)(λx. left x))((λy. right y)〈〉) ▷ ((λf. f)(λx. left x))(right〈〉)

as possible steps from the same program. Both steps are consistent with call-by-value, but left-to-right evaluation eschews (3.30) in favor of (3.29), in order to evaluate the function before the argument of the top-level application expression. The presence of abort destroys confluence in our language, as shown below.

(3.31)  #((abort 1)(abort 2)) ▷ #1 ▷ 1
(3.32)  #((abort 1)(abort 2)) ▷ #2 ▷ 2

Left-to-right evaluation rules out (3.32) and restores confluence for this program.

Call-by-value, left-to-right evaluation makes computation deterministic (and hence trivially restores confluence): each expression either is in normal form or computes to exactly one expression in a step under such discipline. Figure 3.3 enforces this discipline by restricting the computation relation.

The top part of Figure 3.3 replaces the inference rules for evaluation subcontexts in Figure 3.1 on page 58. The new rules are fewer and stricter: to avoid looking under a variable binding for a subexpression to evaluate, contexts like λx. [ ] are no longer allowed. To evaluate subexpressions of a branching expression from left to right, contexts like F[ ] and 〈E, [ ]〉 are only allowed when F and E are values.


Evaluation subcontexts D[ ] : d

   [ ] : d
   If D[ ] : d and ∆ ⊢ E : T1, then D[[ ]E] : d.
   If Γ ⊢V V : T0 and D[ ] : d, then D[V[ ]] : d.
   If D[ ] : d and ∆ ⊢ E : T2, then D[〈[ ], E〉] : d.
   If Γ ⊢V V : T1 and D[ ] : d, then D[〈V, [ ]〉] : d.
   If D[ ] : d, then D[left [ ]] : d.
   If D[ ] : d, then D[right [ ]] : d.
   If D[ ] : d and Γ ⊢ E′ : T′, then D[let [ ] be 〈x, y〉. E′] : d.
   If D[ ] : d and Γ ⊢ E′ : T′, then D[let [ ] be 〈〉. E′] : d.
   If D[ ] : d, Γ1 ⊢ E′1 : T′1, and Γ2 ⊢ E′2 : T′2, then D[let [ ] be left x. E′1 | right y. E′2] : d.

Computation E ▷ E′

   C[(λx. E′)V] ▷ C[E′{x ↦ V}]
   C[let 〈V1, V2〉 be 〈x, y〉. E′] ▷ C[E′{x ↦ V1}{y ↦ V2}]
   C[let 〈〉 be 〈〉. E′] ▷ C[E′]
   C[let left V be left x. E′1 | right y. E′2] ▷ C[E′1{x ↦ V}]
   C[let right V be left x. E′1 | right y. E′2] ▷ C[E′2{y ↦ V}]
   C[#V] ▷ C[V]
   C[#(D[abort E′])] ▷ C[#E′]

Figure 3.3. Restricting the computation relation to enforce call-by-value, left-to-right evaluation in a λ-calculus with # and abort

The bottom part of Figure 3.3 replaces the computation relation defined in Section 2.3 on page 30. The new computation relation is smaller: it substitutes only values V, V1, V2 for variables x, y, z.

With these restrictions, the recursive function P2 in (3.10) on page 57 now works reliably on all integer lists. However, at least two fundamental issues remain.

First, we have presented the syntax for # and abort only informally, without providing formal rules that specify when an expression using these constructs is well-formed. It is not trivial to add these inference rules while preserving the crucial property of subject reduction, because we want to classify the same abort expression as well-formed or ill-formed depending on the program it appears in. For example, abort 2 should be allowed in

(3.33)  1 + #(let abort 2 be 〈x, y〉. x + y) ▷ 1 + #2 ▷ 1 + 2 ▷ 3


   ⌊x⌋ = x
   ⌊λx. E⌋ = λx. ⟦E⟧
   ⌊〈V1, V2〉⌋ = 〈⌊V1⌋, ⌊V2⌋〉
   ⌊〈〉⌋ = 〈〉
   ⌊left V⌋ = left⌊V⌋
   ⌊right V⌋ = right⌊V⌋

Figure 3.4. The evaluation result ⌊V⌋ of values V in the λ-calculus

because the computation concludes in a value, but disallowed in

(3.34)  let #(1 + abort 2) be 〈x, y〉. x + y ▷ let #2 be 〈x, y〉. x + y
        ▷ let 2 be 〈x, y〉. x + y

because the computation gets stuck. What should the type of abort 2 be?

Second, to assign a denotational semantics to this programming language that is sound and compositional, we can no longer just let each expression denote its evaluation result. For one thing, any abort expression without a matching # is stuck: it neither is a value nor computes to an expression, so it has no evaluation result to speak of or denote. Yet, replacing one abort expression by another as part of a larger expression can obviously affect the outcome of the larger expression. For example, the outcome of 1 + #(abort 2) is 3, but the outcome of 1 + #(abort 3) is 4. Thus, as in Section 1.4, each expression can no longer just denote its evaluation result.

Strictly speaking, it makes sense to assign denotations to well-formed expressions only once it is defined what well-formed expressions are. For exposition, however, we address denotations first, then return to typing.

3.2. Continuations for delimited control

In this section, we give a well-known denotational semantics and corresponding type system for abort, then generalize them to other control operators. We use the repeatedly rediscovered (Reynolds 1993) concept of continuations (Strachey and Wadsworth 1974; Plotkin 1975).

With the addition of # and abort, not every program in our language has an evaluation result; for example, abort〈〉 is stuck. Still, every expression that is a value has an evaluation result; for example, the syntactic expression 〈〈〉, 〈〉〉 results in the semantic pair 〈〈〉, 〈〉〉. We write ⌊V⌋ for the evaluation result of a value V, specified in Figure 3.4. The right-hand side of this definition is the λ-calculus of Section 2.3, without # and abort. Although one could program in this target language, we use it merely as a metalanguage in which to notate evaluation results ⌊V⌋ as semantic objects. For example, we write 〈left〈〉, right〈〉〉 in Roman type to mean an ordered pair in the set-theoretic sense. (The second line of Figure 3.4 defines the evaluation result of λx. E in terms of the denotation ⟦E⟧ of an expression E, which we leave unspecified until Section 3.2.1 below.)

Every program of the form #( . . . ) also has an evaluation result; for example, #(abort〈〉) results in 〈〉. Every evaluation context C[ ] thus corresponds to a continuation ⌈C[ ]⌉: a function that maps the evaluation result ⌊V⌋ of a value V to the evaluation result of #(C[V]). For example, corresponding to the evaluation context

(3.35) 1 + [ ] + 4

is the continuation

(3.36) λx. 1 + x + 4,

the function that maps each number x to x + 5. (This λ-abstraction is set in Roman type, because the continuation lies not in the syntax but in the semantics of the programming language under discussion.) Informally, we may write the type equation

(3.37) continuation = plugged-value → intermediate-answer.

For example, the continuation (3.36) is a function from numbers to numbers, reflecting the fact that plugging a numeric value into #(1 + [ ] + 4) gives (in other words, sends to the #) a numeric intermediate answer.

For an evaluation subcontext D[ ], the function ⌈D[ ]⌉ is sort of a denotation. More precisely, every syntactic rule in the top part of Figure 3.3 on page 63 for building subcontexts D[ ] corresponds to a semantic rule for building continuations ⌈D[ ]⌉. For example, the first rule for building subcontexts is that [ ] is a subcontext.

(3.38) [ ] : d

Because the evaluation result of #V is V for any value V, the corresponding continuation ⌈[ ]⌉ is the identity function λx. x: the plugged value is the same as the evaluation result.

To take a second example, the fifth rule for building subcontexts is that, whenever D[ ] is a subcontext and V is a value, D[〈V, [ ]〉] is again a subcontext.

(3.39)  Γ ⊢V V : T1    D[ ] : d
        ───────────────────────
        D[〈V, [ ]〉] : d

Given two values V1 and V2, the evaluation result of #(D[〈V1, V2〉]) is

(3.40) ⌈D[ ]⌉⌊〈V1, V2〉⌋

Page 76: Linguistic side effects

66 3. The analogy: delimited control and quantification

by the definition of ⌈D[ ]⌉. Because

(3.41) ⌊〈V1, V2〉⌋ = 〈⌊V1⌋, ⌊V2⌋〉,

this evaluation result is also

(3.42) ⌈D[ ]⌉〈⌊V1⌋, ⌊V2⌋〉.

Therefore,

(3.43) ⌈D[〈V1, [ ]〉]⌉⌊V2⌋ = ⌈D[ ]⌉〈⌊V1⌋, ⌊V2⌋〉.

In other words,

(3.44) ⌈D[〈V1, [ ]〉]⌉ = λy. ⌈D[ ]⌉〈⌊V1⌋, y〉.

Thus the continuation for D[ ] and the evaluation result for V1 together determine the continuation for D[〈V1, [ ]〉].

To calculate the continuation for a subcontext that contains a nonvalue expression, such as for a subcontext built using the fourth rule

(3.45)  D[ ] : d    ∆ ⊢ E : T2
        ──────────────────────
        D[〈[ ], E〉] : d

in Figure 3.3 on page 63, we need to assign denotations to nonvalue expressions E. We now turn to these denotations.

3.2.1. The continuation-passing-style transform. As we saw in Section 3.1.1, an abort expression actively takes control over, rather than passively reporting an evaluation result to, its evaluation subcontext. Semantically speaking, it does not provide an argument to the continuation for its evaluation subcontext. To model this denotationally, we follow Strachey and Wadsworth (1974) and let each expression E denote not its evaluation result (which may not even exist) but a function that maps the continuation ⌈D[ ]⌉ for each evaluation subcontext D[ ] to the evaluation result of #(D[E]). The idea is that an expression takes a subcontext as argument, whereas a subcontext only takes a value as argument. The rest of this section formalizes this idea.

For example, we let 2 denote not the number 2 but the function λc. c(2), because ⌈D[ ]⌉(2) is the evaluation result of #(D[2]) for any D[ ]. By contrast, we let abort 2 denote the constant function λc. 2, because 2 is the evaluation result of #(D[abort 2]) for any D[ ].

Informally, we may write the type equation (cf. (3.37) on page 65)

(3.46) denotation = continuation → intermediate-answer.

Thus, a plugged value and an expression denotation combine semantically in opposite ways with a continuation to give an intermediate answer: whereas a continuation applies to a plugged value to produce an intermediate answer, an expression denotation applies to a continuation to produce an intermediate answer.
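Rendered in Haskell, with Haskell as the metalanguage (the type synonym and the two names below are ours), the type equations (3.37) and (3.46) and the two example denotations read:

   -- A continuation maps a plugged value to an intermediate answer;
   -- a denotation maps a continuation to an intermediate answer.
   type Cont r a = (a -> r) -> r

   denTwo :: Cont r Int
   denTwo = \c -> c 2        -- ⟦2⟧ = λc. c(2): report 2 to the continuation

   denAbortTwo :: Cont Int a
   denAbortTwo = \_c -> 2    -- ⟦abort 2⟧ = λc. 2: discard the continuation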


   ⟦V⟧ = λc. c⌊V⌋
   ⟦〈E1, E2〉⟧ = λc. ⟦E1⟧(λx. ⟦E2⟧(λy. c〈x, y〉))
   ⟦left E⟧ = λc. ⟦E⟧(λx. c(left x))
   ⟦right E⟧ = λc. ⟦E⟧(λx. c(right x))
   ⟦FE⟧ = λc. ⟦F⟧(λf. ⟦E⟧(λx. f x c))
   ⟦let E be 〈x, y〉. E′⟧ = λc. ⟦E⟧(λv. let v be 〈x, y〉. ⟦E′⟧c)
   ⟦let E be 〈〉. E′⟧ = λc. ⟦E⟧(λv. let v be 〈〉. ⟦E′⟧c)
   ⟦let E be left x. E′1 | right y. E′2⟧ = λc. ⟦E⟧(λv. let v be left x. ⟦E′1⟧c | right y. ⟦E′2⟧c)
   ⟦#E⟧ = λc. c(⟦E⟧(λx. x))
   ⟦abort E⟧ = λc. ⟦E⟧(λx. x)

Figure 3.5. The continuation-passing-style transform ⟦E⟧ for expressions E in the λ-calculus with # and abort, under call-by-value, left-to-right evaluation

Figure 3.5 makes this idea precise in the form of a translation, called a continuation-passing-style transform.¹ As in Figure 3.4, the translation maps the λ-calculus extended with # and abort to the λ-calculus of Section 2.3, without # and abort. As before, although one could program in the target language, we use it merely as a metalanguage in which to notate the meaning ⟦E⟧ that our denotational semantics assigns to each source-language expression E. For example, we write λx. x in Roman type to mean an identity function in the set-theoretic sense.

¹ Strictly speaking, this translation (like its extension in Figure 3.9 on page 76) maps expressions (in particular #E) to continuation-composing style (Danvy and Filinski 1990) rather than continuation-passing style.

Figure 3.5 specifies the denotation ⟦E⟧ of an expression E in terms of the evaluation result ⌊V⌋ of a value V. (Recall that the translation from V to ⌊V⌋ is specified in Figure 3.4; in particular, ⌊λx. E⌋ is defined to be λx. ⟦E⟧, so Figures 3.4 and 3.5 are mutually recursive.) Because every value is an expression, ⟦V⟧ is defined whenever ⌊V⌋ is defined. When V is a value, and D[ ] is an evaluation subcontext, the evaluation result of #(D[V]) must be equal to ⌈D[ ]⌉⌊V⌋ by the definition of ⌈D[ ]⌉, and to ⟦V⟧⌈D[ ]⌉ by the first paragraph of this section. Hence ⟦V⟧⌈D[ ]⌉ must be ⌈D[ ]⌉⌊V⌋, so we must define ⟦V⟧ to be λc. c⌊V⌋ for a value V, which we do in the first line of Figure 3.5.

Similar reasoning explains the continuation-passing-style transform for pairs,

(3.47) ⟦〈E1, E2〉⟧ = λc. ⟦E1⟧(λx. ⟦E2⟧(λy. c〈x, y〉)).

The type equations (3.37) and (3.46) say that the denotation of a pair expression 〈E1, E2〉 should be a function from functions from pairs to answers to answers. To see this, let D[ ] be a subcontext that is waiting to be plugged with a pair. For any value V1, the evaluation result of #(D[〈V1, E2〉]) must be equal to ⟦E2⟧⌈D[〈V1, [ ]〉]⌉ by the definition of ⟦E2⟧. Hence

(3.48) ⌈D[〈[ ], E2〉]⌉ = λx. ⟦E2⟧(λy. ⌈D[ ]⌉〈x, y〉)

by the definition of ⌈D[〈[ ], E2〉]⌉ and the conclusion (3.44) on page 66. Whether or not E1 is a value, the evaluation result of #(D[〈E1, E2〉]) must be equal to ⟦〈E1, E2〉⟧⌈D[ ]⌉ by the definition of ⟦〈E1, E2〉⟧, and to ⟦E1⟧⌈D[〈[ ], E2〉]⌉ by the definition of ⟦E1⟧. By (3.48), ⟦〈E1, E2〉⟧⌈D[ ]⌉ must be

(3.49) ⟦E1⟧(λx. ⟦E2⟧(λy. ⌈D[ ]⌉〈x, y〉)),

so ⟦〈E1, E2〉⟧ must be

(3.50) λc. ⟦E1⟧(λx. ⟦E2⟧(λy. c〈x, y〉)).

Figure 3.5 specifies the denotation of some expressions multiple times, consistently. That is, if E is a value of the form 〈V1, V2〉, left V, or right V, then we can compute ⟦E⟧ either using the first line of Figure 3.5 (shown below on the left) or using other lines of Figure 3.5 (shown below on the right).

(3.51)  ⟦〈V1, V2〉⟧ = λc. c⌊〈V1, V2〉⌋      ⟦〈V1, V2〉⟧ = λc. ⟦V1⟧(λx. ⟦V2⟧(λy. c〈x, y〉))
(3.52)  ⟦left V⟧ = λc. c⌊left V⌋          ⟦left V⟧ = λc. ⟦V⟧(λx. c(left x))
(3.53)  ⟦right V⟧ = λc. c⌊right V⌋        ⟦right V⟧ = λc. ⟦V⟧(λx. c(right x))

It is easy to check that the results above on the right β-reduce to those on the left, so the two ways give the same denotation. For example, we can compute the denotation of the source expression

(3.54) 〈left〈〉, right〈〉〉

in two ways (among others). On one hand, because this expression is a value, we can use its evaluation result ⌊〈left〈〉, right〈〉〉⌋.

(3.55)  ⟦〈left〈〉, right〈〉〉⟧ = λc. c⌊〈left〈〉, right〈〉〉⌋
                             = λc. c〈⌊left〈〉⌋, ⌊right〈〉⌋〉
                             = λc. c〈left⌊〈〉⌋, right⌊〈〉⌋〉
                             = λc. c〈left〈〉, right〈〉〉

This denotation is a function from continuations to evaluation results. It makes sense because ⌈D[ ]⌉〈left〈〉, right〈〉〉 is the result of #(D[〈left〈〉, right〈〉〉]) for any D[ ]. On the other hand, we can also use the continuation-passing-style transform


for pairs.

(3.56)  ⟦〈left〈〉, right〈〉〉⟧
        = λc. ⟦left〈〉⟧(λx. ⟦right〈〉⟧(λy. c〈x, y〉))
        = λc. (λc1. c1⌊left〈〉⌋)(λx. (λc2. c2⌊right〈〉⌋)(λy. c〈x, y〉))
        = λc. (λc1. c1(left⌊〈〉⌋))(λx. (λc2. c2(right⌊〈〉⌋))(λy. c〈x, y〉))
        = λc. (λc1. c1(left〈〉))(λx. (λc2. c2(right〈〉))(λy. c〈x, y〉))

We get the same denotation: the last line of (3.56) β-reduces to the last line of (3.55).

The continuation-passing-style transform for pairs is necessary to compute the denotation of a pair whose components are not both values. For example, given

(3.57)  ⟦abort(left〈〉)⟧ = λc. ⟦left〈〉⟧(λx. x)
                        = λc. (λc′. c′⌊left〈〉⌋)(λx. x)
                        = λc. left〈〉,

we can compute the denotation of

(3.58) 〈abort(left〈〉), abort(right〈〉)〉

as follows.

(3.59)  ⟦〈abort(left〈〉), abort(right〈〉)〉⟧
        = λc. ⟦abort(left〈〉)⟧(λx. ⟦abort(right〈〉)⟧(λy. c〈x, y〉))
        = λc. (λc1. left〈〉)(λx. ⟦abort(right〈〉)⟧(λy. c〈x, y〉))
        = λc. left〈〉

This denotation is again a function from continuations to evaluation results. It makes sense under left-to-right evaluation, in that left〈〉 is the result of

(3.60) #(D[〈abort(left〈〉), abort(right〈〉)〉])

for any D[ ]. In particular, when D[ ] is the null context [ ], this denotational semantics tells us (correctly) that the outcome of

(3.61) #〈abort(left〈〉), abort(right〈〉)〉

is left〈〉 rather than right〈〉. The continuation-passing-style transform for pairs thus enforces left-to-right evaluation. For right-to-left evaluation instead, we would change Figure 3.5 to replace (3.47) on page 67 by

(3.62) ⟦〈E1, E2〉⟧ = λc. ⟦E2⟧(λy. ⟦E1⟧(λx. c〈x, y〉)).


The expression translation for FE enforces left-to-right evaluation analogously, for example so that

(3.63) (abort(left〈〉))(abort(right〈〉))

denotes λc. left〈〉 rather than λc. right〈〉.

The denotation in (3.59) also enforces call-by-value evaluation: it specifies that the outcome of the program

(3.64) #(let 〈abort(left〈〉), abort(right〈〉)〉 be 〈x, y〉. 〈〉)

(which is the expression (3.58) plugged into the evaluation context #(let [ ] be 〈x, y〉. 〈〉)) is left〈〉 rather than 〈〉. In other words, it prohibits substituting the nonvalues abort(left〈〉) and abort(right〈〉) for x and y in 〈〉. If we wanted to relax this prohibition, the denotation of (3.58) would have to be of a different type, such as a pair of functions from continuations rather than a function from continuations to pairs. The other equations in Figure 3.5 similarly embody call-by-value evaluation in the source language (Plotkin 1975; Sabry and Felleisen 1993; Sabry 1994, 1996): for example, the expression

(3.65) (λx. abort(left〈〉))(abort(right〈〉))

denotes λc. right〈〉 rather than λc. left〈〉.

Having enriched denotations to be functions from continuations rather than just evaluation results, we can assign meanings to # and abort expressions, as shown in Figure 3.5 and repeated below.

(3.66)  ⟦#E⟧ = λc. c(⟦E⟧(λx. x))
(3.67)  ⟦abort E⟧ = λc. ⟦E⟧(λx. x)

These equations are explained by the fact that the null context corresponds to the identity continuation: formally, ⌈[ ]⌉ = λx. x. By the definition of ⟦E⟧, the evaluation result of #E is ⟦E⟧(λx. x). Inspecting the computation relation in Figure 3.3 on page 63 shows that D[#E] ▷ D[#E′] whenever #E ▷ #E′, for any expressions E and E′ and any subcontext D[ ]. By the definition of ⌈D[ ]⌉, then, the evaluation result of any D[#E] is ⌈D[ ]⌉(⟦E⟧(λx. x)), so ⟦#E⟧ must be λc. c(⟦E⟧(λx. x)). Meanwhile, the equation (3.67) makes sense because #(D[abort E]) computes to #E, whose evaluation result is ⟦E⟧(λx. x).

To illustrate, we can compute the denotation of

(3.68) (λv. abort〈〉)(#〈abort(left〈〉), abort(right〈〉)〉)


as follows.

(3.69)  ⟦λv. abort〈〉⟧ = λc. c⌊λv. abort〈〉⌋
                      = λc. c(λv. ⟦abort〈〉⟧)
                      = λc. c(λv. λc′. ⟦〈〉⟧(λx. x))
                      = λc. c(λv. λc′. (λc′′. c′′⌊〈〉⌋)(λx. x))
                      = λc. c(λv. λc′. (λc′′. c′′〈〉)(λx. x))
                      = λc. c(λv. λc′. 〈〉)

(3.70)  ⟦#〈abort(left〈〉), abort(right〈〉)〉⟧
        = λc. c(⟦〈abort(left〈〉), abort(right〈〉)〉⟧(λx. x))
        = λc. c((λc′. left〈〉)(λx. x))
        = λc. c(left〈〉)

(3.71)  ⟦(λv. abort〈〉)(#〈abort(left〈〉), abort(right〈〉)〉)⟧
        = λc. ⟦λv. abort〈〉⟧(λf. ⟦#〈abort(left〈〉), abort(right〈〉)〉⟧(λx. f x c))
        = λc. (λc1. c1(λv. λc′. 〈〉))(λf. (λc2. c2(left〈〉))(λx. f x c))
        = λc. 〈〉

As one expects from call-by-value, left-to-right evaluation, the denotation λc. 〈〉 is equal to that of abort〈〉 alone.
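To consolidate Figures 3.4 and 3.5, here is a small executable sketch of the transform in Haskell for a fragment of the language (functions, pairs, unit, sums, #, and abort). The datatype, the fixed answer type Val, and all names are our assumptions; in particular, the sketch is untyped where the source language is typed, and the application case fails at run time if the function position does not evaluate to a function.

   data Expr = Var String | Lam String Expr | App Expr Expr
             | Pair Expr Expr | Unit | Lft Expr | Rgt Expr
             | Reset Expr | Abort Expr            -- Reset is #

   data Val = VFun (Val -> (Val -> Val) -> Val)
            | VPair Val Val | VUnit | VLft Val | VRgt Val

   -- den e env is ⟦e⟧: a function from continuations to answers.
   den :: Expr -> (String -> Val) -> (Val -> Val) -> Val
   den (Var x)      env c = c (env x)
   den (Lam x e)    env c = c (VFun (\v -> den e (extend env x v)))
   den (App f e)    env c = den f env (\fv -> case fv of
                              VFun g -> den e env (\v -> g v c)
                              _      -> error "not a function")
   den (Pair e1 e2) env c = den e1 env (\x -> den e2 env (\y -> c (VPair x y)))
   den Unit         _   c = c VUnit
   den (Lft e)      env c = den e env (c . VLft)
   den (Rgt e)      env c = den e env (c . VRgt)
   den (Reset e)    env c = c (den e env id)      -- ⟦#E⟧ = λc. c(⟦E⟧(λx. x))
   den (Abort e)    env _ = den e env id          -- ⟦abort E⟧ = λc. ⟦E⟧(λx. x)

   extend :: (String -> Val) -> String -> Val -> (String -> Val)
   extend env x v y = if y == x then v else env y

   -- As in (3.61), #〈abort(left〈〉), abort(right〈〉)〉 yields left〈〉:
   --   den (Reset (Pair (Abort (Lft Unit)) (Abort (Rgt Unit))))
   --       (\x -> error ("unbound " ++ x)) id   ==   VLft VUnit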

We have seen that adding delimited control to a programming language forces us to revise our denotational semantics, and how to model delimited control using functions from continuations (in other words, functions from functions from results to results to results) as denotations.

3.2.2. Types for continuations. As the target language of our continuation-passing-style transform, the simply-typed λ-calculus provides not just a denotational semantics for the source language but also a type system: via the transform, not only do denotations for the simply-typed λ-calculus pull back to provide denotations for delimited control, but the typing rules for the simply-typed λ-calculus also pull back to provide typing rules for delimited control. In this section, we spell out the latter type system, due to Danvy and Filinski (1989).

Recall from (2.46) on page 41 that types in Section 2.3 are generated by the following context-free grammar.

(3.72)  T ::= A | T → T | T × T | 1 | T + T
(3.73)  A ::= a | b | c | · · ·

The metavariable T ranges over types; the metavariable A ranges over base types.


Expressions Γ ⊢ E : W ⦅T⦆ W0

   Γ ⊢V V : T
   ─────────────────── Value
   Γ ⊢ V : W0 ⦅T⦆ W0

   Γ ⊢ E1 : W ⦅T1⦆ W1    ∆ ⊢ E2 : W1 ⦅T2⦆ W0
   ─────────────────────────────────────────── ×I
   Γ, ∆ ⊢ 〈E1, E2〉 : W ⦅T1 × T2⦆ W0

   Γ ⊢ E : W ⦅T1⦆ W0
   ───────────────────────────── +I
   Γ ⊢ left E : W ⦅T1 + T2⦆ W0

   Γ ⊢ E : W ⦅T2⦆ W0
   ────────────────────────────── +I
   Γ ⊢ right E : W ⦅T1 + T2⦆ W0

   Γ ⊢ F : W ⦅T1 → W1 ⦅T2⦆ W0⦆ W2    ∆ ⊢ E : W2 ⦅T1⦆ W1
   ────────────────────────────────────────────────────── →E
   Γ, ∆ ⊢ FE : W ⦅T2⦆ W0

   ∆ ⊢ E : W ⦅T1 × T2⦆ W1    Γ[x : T1, y : T2] ⊢ E′ : W1 ⦅T′⦆ W0
   ─────────────────────────────────────────────────────────────── ×E
   Γ[∆] ⊢ let E be 〈x, y〉. E′ : W ⦅T′⦆ W0

   ∆ ⊢ E : W ⦅1⦆ W1    Γ[·] ⊢ E′ : W1 ⦅T′⦆ W0
   ───────────────────────────────────────────── 1E
   Γ[∆] ⊢ let E be 〈〉. E′ : W ⦅T′⦆ W0

   ∆ ⊢ E : W ⦅T1 + T2⦆ W1    Γ[x : T1] ⊢ E′1 : W1 ⦅T′⦆ W0    Γ[y : T2] ⊢ E′2 : W1 ⦅T′⦆ W0
   ───────────────────────────────────────────────────────────────────────────────────── +E
   Γ[∆] ⊢ let E be left x. E′1 | right y. E′2 : W ⦅T′⦆ W0

   Γ ⊢ E : W ⦅T⦆ T
   ──────────────────── #
   Γ ⊢ #E : W0 ⦅W⦆ W0

   Γ ⊢ E : W ⦅T⦆ T
   ─────────────────────────── abort
   Γ ⊢ abort E : W ⦅T0⦆ W0

   Γ[∆] ⊢ E : W ⦅T⦆ W0
   ──────────────────────── Weaken
   Γ[∆, Θ] ⊢ E : W ⦅T⦆ W0

   Γ[∆, ∆] ⊢ E : W ⦅T⦆ W0
   ──────────────────────── Contract
   Γ[∆] ⊢ E : W ⦅T⦆ W0

   Γ[∆, Θ] ⊢ E : W ⦅T⦆ W0
   ──────────────────────── Exchange
   Γ[Θ, ∆] ⊢ E : W ⦅T⦆ W0

   Γ[(∆, Θ), Π] ⊢ E : W ⦅T⦆ W0
   ============================ Associate
   Γ[∆, (Θ, Π)] ⊢ E : W ⦅T⦆ W0

Figure 3.6. Defining expressions in the λ-calculus with # and abort

To deal with delimited control operators in the programming language, we change function types from the binary construction T1 → T2 to the quaternary construction T1 → W ⦅T2⦆ W0, where T1, W, T2, W0 are four types. A type of this form is pronounced “T1 to W outside T2 inside W0”. Here T1 is the usual argument type and T2 is the usual return type, while W and W0 are types that record the context in which a function may be applied, as described below. Types T are now generated by the context-free grammar

(3.74)  T ::= A | T → T ⦅T⦆ T | T × T | 1 | T + T


Figure 3.6 replaces the judgment form

(3.75) Γ ⊢ E : T

for expressions in the simply-typed λ-calculus with a new judgment form

(3.76) Γ ⊢ E : W ⦅T⦆ W0

for expressions in our programming language with # and abort. Again, ⦅ is pronounced “outside”, and ⦆ is pronounced “inside”. This judgment form means that, given any evaluation subcontext D[ ] such that a value of type T can be plugged into #(D[ ]) to yield an evaluation result of type W0, we can plug the expression E into #(D[ ]) to yield an evaluation result of type W.

The types W0 and W are known as answer types. This terminology reflects the intuition that W0 and W are types of the evaluation result of an expression of the form #(D[E]). In other words, W0 and W are types of an intermediate answer provided to a control delimiter. In the simply-typed λ-calculus, the continuation ⌈D[ ]⌉ for the subcontext D[ ] has the type ⌊T⌋ → ⌊W0⌋, indicating that the evaluation result of #(D[E]) has type W0 whenever E is a value of type T. For E to satisfy (3.76) means that the evaluation result of #(D[E]) has the type W in the end. Therefore, if E is a value, then the two answer types W0 and W are the same. Indeed they are in the Value rule in Figure 3.6.

Informally speaking, then, a judgment of the form (3.76) means that E is an expression that behaves locally like the type T, but takes control over its delimited evaluation context, changing the intermediate answer type from W0 to W. In other words, when we put E inside a subcontext D[ ] that expects an evaluation result of type T so as to produce an intermediate answer of type W0, the expression E does not report an evaluation result to D[ ] as a value would. Rather, it takes (the continuation for) D[ ] as a semantic argument. Similarly, a function type of the form T1 → W1 ⦅T2⦆ W0 means that applying the function to something of type T1 behaves locally like the return type T2, but takes control over the delimited evaluation context of the application, changing the intermediate answer type from W0 to W1.

With this new judgment form for expressions, we also change the →IV rule in Figure 3.2 on page 60 from

(3.77)  Γ, x : T1 ⊢ E : T2
        ────────────────────── →IV
        Γ ⊢V λx. E : T1 → T2

to

(3.78)  Γ, x : T1 ⊢ E : W ⦅T2⦆ W0
        ─────────────────────────────── →IV.
        Γ ⊢V λx. E : T1 → W ⦅T2⦆ W0

Here W0 and W are types that record the context in which the expression E may be evaluated. The expression judgment form (3.76) and the value judgment form (3.20) on page 60 are designed for the continuation-passing-style transform to translate. To make this statement precise, let us recursively define a map ⌊·⌋ from each type T of the form (3.74) to a type ⌊T⌋ of the form (3.72).

(3.79)  ⌊A⌋ = A
(3.80)  ⌊T1 → W ⦅T2⦆ W0⌋ = ⌊T1⌋ → (⌊T2⌋ → ⌊W0⌋) → ⌊W⌋
(3.81)  ⌊T1 × T2⌋ = ⌊T1⌋ × ⌊T2⌋
(3.82)  ⌊1⌋ = 1
(3.83)  ⌊T1 + T2⌋ = ⌊T1⌋ + ⌊T2⌋

Given a type environment Γ where the types are of the form (3.74), we write ⌊Γ⌋ to mean applying ⌊·⌋ to each type in Γ to form a type environment in the simply-typed λ-calculus. The inference rules in Figures 3.2 and 3.6 are reverse-engineered from Figures 3.4 and 3.5 to satisfy two properties that are easy to check by mutual structural induction on source-language derivations.

• Whenever Figure 3.2 (with (3.78) in place of (3.77)) derives Γ ⊢V V : T, the simply-typed λ-calculus derives ⌊Γ⌋ ⊢ ⌊V⌋ : ⌊T⌋.
• Whenever Figure 3.6 derives Γ ⊢ E : W ⦅T⦆ W0, the simply-typed λ-calculus derives ⌊Γ⌋ ⊢ ⟦E⟧ : (⌊T⌋ → ⌊W0⌋) → ⌊W⌋.

For example, let us check the first property for when V is an abstraction: suppose that V = λx. E for some x and E. A derivation of Γ ⊢V V : T must conclude with (3.78) or one of the structural rules in Figure 3.2. If the derivation concludes with (3.78), then T is of the form T1 → W ⦅T2⦆ W0, and Figure 3.6 must derive

(3.84) Γ, x : T1 ⊢ E : W ⦅T2⦆ W0.

By the induction hypothesis, the simply-typed λ-calculus must derive

(3.85) ⌊Γ⌋, x : ⌊T1⌋ ⊢ ⟦E⟧ : (⌊T2⌋ → ⌊W0⌋) → ⌊W⌋.

The inference

(3.86)  ⌊Γ⌋, x : ⌊T1⌋ ⊢ ⟦E⟧ : (⌊T2⌋ → ⌊W0⌋) → ⌊W⌋
        ───────────────────────────────────────────── →I
        ⌊Γ⌋ ⊢ λx. ⟦E⟧ : ⌊T1⌋ → (⌊T2⌋ → ⌊W0⌋) → ⌊W⌋

in the simply-typed λ-calculus then derives ⌊Γ⌋ ⊢ ⌊V⌋ : ⌊T⌋, as desired. If the derivation concludes with a structural rule instead, the corresponding structural rule in the simply-typed λ-calculus completes the induction.
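The same two properties can be seen through Haskell's type checker, under the standard answer-type-changing continuation type of Danvy and Filinski; the newtype CC and the three combinators below are our sketch, transcribing the Value, #, and abort rules of Figure 3.6 via the translation (3.80).

   -- CC w w0 t transcribes the judgment E : W ⦅T⦆ W0: by (3.80), a
   -- denotation maps a continuation t -> w0 to an answer of type w.
   newtype CC w w0 t = CC { runCC :: (t -> w0) -> w }

   value :: t -> CC w0 w0 t           -- Value: the answer types coincide
   value v = CC (\c -> c v)

   reset :: CC w t t -> CC w0 w0 w    -- #: from W ⦅T⦆ T to W0 ⦅W⦆ W0
   reset (CC e) = CC (\c -> c (e id))

   abort :: CC w t t -> CC w w0 t0    -- abort: never invokes its continuation
   abort (CC e) = CC (\_ -> e id)

   -- runCC (reset (abort (value 2))) id == 2, mirroring #(abort 2) ▷ #2 ▷ 2.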

To illustrate this type system, let us derive two sample expressions. First, Figure 3.7a proves that 〈left〈〉, right〈〉〉 (from (3.54) on page 68) is a well-formed expression. (We omit structural inferences in this proof and those following.) This proof works for any answer type W. The final judgment

(3.87) · ⊢ 〈left〈〉, right〈〉〉 : W ⦅(1 + a) × (b + 1)⦆ W


(a) Using the Value rule twice at the top (each judgment follows from the judgments above it by the rule named at the right):

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢ 〈〉 : W ⦅1⦆ W                                    (Value)
   · ⊢ left〈〉 : W ⦅1 + a⦆ W                            (+I)

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢ 〈〉 : W ⦅1⦆ W                                    (Value)
   · ⊢ right〈〉 : W ⦅b + 1⦆ W                           (+I)

   · ⊢ 〈left〈〉, right〈〉〉 : W ⦅(1 + a) × (b + 1)⦆ W    (×I)

(b) Using the Value rule once at the bottom:

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢V left〈〉 : 1 + a                                 (+IV)

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢V right〈〉 : b + 1                                (+IV)

   · ⊢V 〈left〈〉, right〈〉〉 : (1 + a) × (b + 1)          (×IV)
   · ⊢ 〈left〈〉, right〈〉〉 : W ⦅(1 + a) × (b + 1)⦆ W    (Value)

Figure 3.7. Proving that 〈left〈〉, right〈〉〉 is well formed

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢V left〈〉 : 1 + a                                 (+IV)
   · ⊢ left〈〉 : (1 + a) ⦅1 + a⦆ (1 + a)                (Value)
   · ⊢ abort(left〈〉) : (1 + a) ⦅T1⦆ (b + 1)            (abort)

   · ⊢V 〈〉 : 1                                         (1IV)
   · ⊢V right〈〉 : b + 1                                (+IV)
   · ⊢ right〈〉 : (b + 1) ⦅b + 1⦆ (b + 1)               (Value)
   · ⊢ abort(right〈〉) : (b + 1) ⦅T2⦆ W                 (abort)

   · ⊢ 〈abort(left〈〉), abort(right〈〉)〉 : (1 + a) ⦅T1 × T2⦆ W    (×I)

Figure 3.8. Proving that 〈abort(left〈〉), abort(right〈〉)〉 is well formed

says that, given any subcontext D[ ] such that a value of type (1 + a) × (b + 1) can be plugged into #(D[ ]) to yield an evaluation result of type W, we can plug the expression 〈left〈〉, right〈〉〉 into #(D[ ]) to yield an evaluation result of type W. This conclusion is not surprising because 〈left〈〉, right〈〉〉 is itself a value of type (1 + a) × (b + 1). Consequently, we can derive the same judgment using the Value rule just once, as Figure 3.7b does.

Second, Figure 3.8 shows a proof that 〈abort(left〈〉), abort(right〈〉)〉 (from (3.58) on page 69) is well formed. The final judgment says that, given any evaluation subcontext D[ ] such that a value (of any product type T1 × T2) can be plugged into #(D[ ]) to yield a result (of any type W), we can plug 〈abort(left〈〉), abort(right〈〉)〉 into #(D[ ]) to yield an evaluation result of type 1 + a. This last type is 1 + a rather than b + 1, because the ×I rule in Figure 3.6 enforces left-to-right evaluation by chaining the answer types W0, W1, and W among its conclusion and two premises.


Types T

   T ::= A | T → T ⦅T⦆ T | T × T | 1 | T + T | T ⦆ T

Expressions Γ ⊢ E : W ⦅T⦆ W0 (additional)

   Γ, c : T ⦆ W0 ⊢ E : W ⦅T⦆ W0
   ─────────────────────────────── Plug
   Γ ⊢ #(cE) : W0 ⦅W⦆ W0

   Γ, c : T0 ⦆ W0 ⊢ E : W ⦅T⦆ T
   ─────────────────────────────── Shift
   Γ ⊢ ξc. E : W ⦅T0⦆ W0

Computation E ▷ E′ (additional)

   C[#(D[ξc. E′])] ▷ C[#E′{c[ ] ↦ D[ ]}]

Continuation-passing-style transform ⟦E⟧ (additional)

   ⟦ξc. E⟧ = λc. ⟦E⟧(λx. x)

Figure 3.9. Abstracting control

evaluation by chaining the answer types W0, W1, and W among its conclusion and two premises. In particular, enclosing this expression in #[ ] gives an evaluation result of type 1 + a (namely left〈〉).

(3.88)
    · ` 〈abort(left〈〉), abort(right〈〉)〉 : (1 + a)( ((T1 × T2))(T1 × T2))    (Figure 3.8)
    · ` #〈abort(left〈〉), abort(right〈〉)〉 : W( ((1 + a))W)                   (#)

3.2.3. Higher-order delimited control. So far, we have

• added # and abort to the syntax of the λ-calculus (in Figure 3.2 on page 60, with (3.78) on page 73 in place of (3.77) on page 73, and in Figure 3.6 on page 72);
• specified their operational semantics using evaluation contexts (in Figure 3.3 on page 63); and
• provided them with a denotational semantics based on continuations (in Figure 3.4 on page 64 and Figure 3.5 on page 67).

This setup supports another delimited control operator, shift (Danvy and Filinski 1989, 1990, 1992). Shift is higher-order in that it allows passing continuations as functional values (Felleisen 1987, 1988). It subsumes the first-order operator abort, which only discards continuations. We first describe the syntax and semantics of shift, then demonstrate its use in programming examples.

Figure 3.9 adds shift to our programming language. It is a binding construct: the expression ξc. E (pronounced "shift c in E") binds the variable c in the body E. The variable c is special in that it is bound to a subcontext rather than a value. This fact is indicated in the type environment by assigning to c a type of the

Page 87: Linguistic side effects

3.2. Continuations for delimited control 77

(new) form T)W, which means that c is a subcontext that can be plugged with a value of type T to produce an intermediate answer of type W. We prohibit the IdV rule in Figure 3.2 on page 60 from accessing a variable of such a type in the environment. Instead, the Plug rule must be used, which forces c to appear in the form #(cE), which means to plug the expression E into the subcontext c. This way, we distinguish between plugging E as a subexpression into a subcontext c and feeding E as an argument to a function c, even though the concrete syntax for Apply and Plug both juxtapose c and E. This new presentation of higher-order delimited control becomes important in Chapter 5, where we introduce a programming language in which an expression may be a subcontext ("coterm") or a subexpression ("term"), whether or not it is a function.

The surrounding # required by the Plug rule is a technical detail to make the small-step operational semantics in Figure 3.9 match the continuation-passing-style denotational semantics (Danvy and Filinski 1989, 1990, 1992). Section 3.2.1 gives the intuition for why we restrict ourselves to plugging an expression E into a subcontext D[ ] only when D[ ] is surrounded by #: the continuation for D[ ] only tells us, given a value V, the evaluation result of #(D[V]), which is surrounded by #. This # distinguishes shift from other delimited control operators, such as Felleisen's control (1987, 1988), the first delimited control operator in the literature. Continuation-passing-style denotational semantics are also available for these other operators, but they are trickier (Felleisen et al. 1987, 1988; Kiselyov 2005; Shan 2004c; Biernacki et al. 2005a; Dybvig et al. 2005).

The computation step in Figure 3.9 is defined in terms of a new kind of substitution: instead of substituting a value V for a variable x in a body expression E′ to yield the new expression E′ {x ↦ V}, we substitute a context D[ ] for a variable c in a body expression E′ to yield the new expression E′ {c[ ] ↦ D[ ]}. This new kind of substitution is defined just in case c appears in E′ only in subexpressions of the form #(cE), that is, only by the Plug rule. To substitute D[ ] for c in such an E′ is to replace each #(cE) by #(D[E]).

The syntactic rules, operational semantics, and denotational semantics for ξc. E are all identical to those for abort E if we ignore the bound variable c. Indeed, both constructs skip the rest of the computation until the innermost enclosing #, and provide an intermediate answer directly to that #. If the body E never mentions c, then ξc. E degenerates into abort E. That is, we can treat abort E as just syntactic sugar for ξc. E, where c is a freshly chosen name that does not appear in E.

(3.89) 1 + #(10 × ξc. 2) B 1 + #2 B 1 + 2 B 3
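These computation sequences can also be checked against a small executable model. The following Haskell sketch (all names ours: C, ret, bind, runReset, shift) renders the continuation-passing-style semantics directly, with answer-type indices mirroring the typing W( (T)W0) of Figure 3.6; it is an illustration under these assumptions, not the thesis's formal system.

    -- A value of type C w0 w t behaves locally like t; its delimited context
    -- expects a t and yields an intermediate answer w0, and the delimited
    -- computation as a whole yields a final answer w.
    type C w0 w t = (t -> w0) -> w

    ret :: t -> C w w t                      -- a pure value
    ret x = \k -> k x

    bind :: C w1 w t -> (t -> C w0 w1 s) -> C w0 w s   -- left-to-right sequencing
    bind m f = \k -> m (\x -> f x k)

    runReset :: C t w t -> w                 -- the delimiter #[ ]
    runReset m = m id

    shift :: ((t -> w0) -> C w w' w) -> C w0 w' t      -- the binder xi c. E
    shift f = \k -> runReset (f k)

    -- (3.89): 1 + #(10 x (xi c. 2)) computes to 3; the captured context is
    -- discarded, so shift here behaves exactly like abort.
    ex89 :: Int
    ex89 = 1 + runReset (bind (shift (\_c -> ret 2)) (\x -> ret (10 * x)))

Note how shift matches the transform Jξc. EK = λc. JEK(λx. x) in Figure 3.9: the body is run with the identity continuation, while the captured continuation is passed in as an ordinary function.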

Unlike abort E, the expression ξc. E lets E use the captured evaluation context (D[ ] in Figure 3.9). For example, ξc. #(c2) means to capture the current continuation as c, then restore it right away, so ξc. #(c2) can be replaced by 2


without affecting the outcome of the program. Thus the following computation sequences have the same result.

(3.90)  1 + #(10 × 2) B 1 + #20 B 1 + 20 B 21

(3.91)  1 + #(10 × ξc. #(c2)) B 1 + ##(10 × 2) B 1 + ##20 B 1 + #20 B 1 + 20 B 21

The expression ξc. #(c2) above captures its surrounding delimited context 10 × [ ] (or semantically, the continuation λv. 10 × v) as c when evaluated.

Besides discarding the captured context and reinstating it right away, we can also reuse it later. For example, the following program duplicates a captured context.

(3.92)
    1 + #(10 × ξc. #(c#(c2))) B 1 + ##(10 × #(10 × 2))
                              B 1 + ##(10 × #20) B 1 + ##(10 × 20)
                              B 1 + ##200 B 1 + #200 B 1 + 200 B 201

The captured context can also be reused outside the body E in ξc. E. For example, the program below uses ξc. λv. #(cv) to suspend a computation as a function containing a captured context, then resumes it (twice) outside the delimiting #.

(3.93)
    let 〈2, #(10 × ξc. λv. #(cv))〉 be 〈x, f〉. 1 + f (fx)
    B let 〈2, #(λv. #(10 × v))〉 be 〈x, f〉. 1 + f (fx)
    B let 〈2, λv. #(10 × v)〉 be 〈x, f〉. 1 + f (fx)
    B 1 + (λv. #(10 × v))((λv. #(10 × v))2)
    B 1 + (λv. #(10 × v))(#(10 × 2))
    B 1 + (λv. #(10 × v))(#20)
    B 1 + (λv. #(10 × v))20
    B 1 + #(10 × 20) B 1 + #200 B 1 + 200 B 201
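Continuing the hypothetical Haskell sketch above, the reuse patterns in (3.90) through (3.93) compute as expected; in particular, ex93 exercises the answer-type modification that Figure 3.9's typing allows, since the final answer of the delimited computation there is a function rather than a number.

    -- (3.91): capture 10 x [ ] as c and reinstate it immediately; result 21.
    ex91 :: Int
    ex91 = 1 + runReset (bind (shift (\c -> ret (c 2))) (\x -> ret (10 * x)))

    -- (3.92): use the captured context twice; result 201.
    ex92 :: Int
    ex92 = 1 + runReset (bind (shift (\c -> ret (c (c 2)))) (\x -> ret (10 * x)))

    -- (3.93): suspend the captured context as a function, then resume it
    -- twice outside the delimiter; result 201.
    ex93 :: Int
    ex93 = let f = runReset (bind (shift (\c -> ret (\v -> c v)))
                                  (\x -> ret (10 * x)))
           in 1 + f (f 2)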

We conclude this discussion of delimited control with a more practical programming example for shift: the classical same-fringe problem. The goal is to write a (recursive) function F such that, given two binary trees of integers E1 and E2, the evaluation result of FE1E2 is a Boolean value that indicates whether their fringes are the same.

The input trees E1 and E2 are represented by values of type tree int, defined in Section 2.3.5 by the recursive equation (2.53) on page 43:

(3.94) tree int = int + (tree int × tree int).


For example, the following two binary trees have the same fringe, namely the sequence 1, 2, 3.

(3.95)  right〈left 1, right〈left 2, left 3〉〉      right〈right〈left 1, left 2〉, left 3〉

              ∙                                         ∙
             / \                                       / \
            1   ∙                                     ∙   3
               / \                                   / \
              2   3                                 1   2

By contrast, the following binary tree has a different fringe, namely the sequence 2, 3, 1.

(3.96)  right〈right〈left 2, left 3〉, left 1〉

              ∙
             / \
            ∙   1
           / \
          2   3

For clarity, we write leaf and branch in place of left and right when using this encoding of binary trees. For example, the trees in (3.95) are

(3.97) branch〈leaf 1, branch〈leaf 2, leaf 3〉〉

and

(3.98) branch〈branch〈leaf 1, leaf 2〉, leaf 3〉,

and the tree in (3.96) is

(3.99) branch〈branch〈leaf 2, leaf 3〉, leaf 1〉.

The output Boolean is represented by a value of type 1 + 1, as mentioned in Section 2.3.4: either left〈〉 if the fringes are different, or right〈〉 if the fringes are the same. For clarity, we write false and true for left〈〉 and right〈〉 when using this encoding of Booleans.

One simple solution to the same-fringe problem is to first compute the fringes of the input trees, then compare the two fringes as lists. To start, we represent a fringe as a value of type list int, defined in Section 2.3.5 by the recursive equation (2.52) on page 42:

(3.100) list int = 1 + (int × list int).

For clarity, we write nil and cons for left and right when using this encoding of lists.

The Fringe1 function below computes the fringe of an input tree. It has the type tree int→W( (list int)W) for any answer type W. It is defined in terms of a recursive function Fringe′1, which takes two arguments.

(3.101) Fringe1 = λt . Fringe′1 t (nil〈〉)


The expression Fringe′1 t l means to prepend the fringe of the tree t to the list l. Thus Fringe′1 is of type tree int→ W( ((list int→ W′( (list int)W′)))W) for any answer types W and W′.

(3.102)
    Fringe′1 = λt. λl. let t be leaf x. cons〈x, l〉
                     | branch y. let y be 〈t1, t2〉.
                                   Fringe′1 t1 (Fringe′1 t2 l)

The recursive function Same1 below checks whether two lists are equal. It has the type list int→W( ((list int→W′( ((1 + 1))W′)))W) for any answer types W and W′.

(3.103)
    Same1 = λl1. λl2. let l1 be nil u1. let l2 be nil u2. true
                                      | cons v2. false
                    | cons v1. let l2 be nil u2. false
                             | cons v2. let v1 be 〈x1, y1〉.
                                        let v2 be 〈x2, y2〉.
                                        if x1 = x2 then Same1 y1 y2 else false

Finally, we can put these two steps together into the desired function F. It has the type tree int→W( ((tree int→W′( ((1 + 1))W′)))W) for any answer types W and W′.

(3.104) F = λt1. λt2. Same1(Fringe1 t1)(Fringe1 t2)

To check this solution against the examples in (3.95) and (3.96), we can compute

(3.105)
    F(branch〈leaf 1, branch〈leaf 2, leaf 3〉〉)(branch〈branch〈leaf 1, leaf 2〉, leaf 3〉)
    B+ Same1 (Fringe1(branch〈leaf 1, branch〈leaf 2, leaf 3〉〉))
             (Fringe1(branch〈branch〈leaf 1, leaf 2〉, leaf 3〉))
    B+ Same1 (cons〈1, cons〈2, cons〈3, nil〈〉〉〉〉)
             (Fringe1(branch〈branch〈leaf 1, leaf 2〉, leaf 3〉))
    B+ Same1 (cons〈1, cons〈2, cons〈3, nil〈〉〉〉〉)
             (cons〈1, cons〈2, cons〈3, nil〈〉〉〉〉)
    B+ true


as well as

(3.106)
    F(branch〈leaf 1, branch〈leaf 2, leaf 3〉〉)(branch〈branch〈leaf 2, leaf 3〉, leaf 1〉)
    B+ Same1 (Fringe1(branch〈leaf 1, branch〈leaf 2, leaf 3〉〉))
             (Fringe1(branch〈branch〈leaf 2, leaf 3〉, leaf 1〉))
    B+ Same1 (cons〈1, cons〈2, cons〈3, nil〈〉〉〉〉)
             (Fringe1(branch〈branch〈leaf 2, leaf 3〉, leaf 1〉))
    B+ Same1 (cons〈1, cons〈2, cons〈3, nil〈〉〉〉〉)
             (cons〈2, cons〈3, cons〈1, nil〈〉〉〉〉)
    B+ false.

The test in (3.106) reveals an inefficiency in this solution: even though the two fringes differ at the very beginning (1 is not 2), this solution still computes the rest of the fringes before Same1 returns false. We want to interleave the computation of the two fringes with their comparison. In other words, we want to represent a fringe as a lazy list, or a stream (Friedman and Wise 1976). Shift provides one way to program such interleaving, by suspending the two fringe computations and resuming them bit by bit during comparison (Biernacki et al. 2005b,c).2

A fringe is either empty or nonempty. The solution above represents a nonempty fringe by an ordered pair whose components are the first element of the fringe (an integer) and the rest of the fringe (a fringe, recursively). We now change this representation to an ordered pair whose components are the first element of the fringe (still an integer) and a suspended computation that, when resumed, produces the rest of the fringe (a function from the unit type to a fringe, recursively). Formally, we represent fringes using the recursive type (cf. (3.100) on page 79)

(3.107) fringe = 1 + (int × (1→W( (fringe)W))),

where W is any answer type. The type of a suspended fringe computation

(3.108) 1→W( (fringe)W)

is the type of #(D[ξc. λv. #(cv)]), if #(D[〈〉]) has the type of a fringe. For clarity, we write snil and scons for left and right when using this suspended encoding of lists.

The Fringe2 function below computes the fringe of an input tree, using this new representation of fringes. It relies on a recursive function Fringe′2, which

2 Generalizing this technique, Kiselyov (2004) lets a suspended computation receive an argument when it is resumed. Using this argument, he implements Huet's zipper data-structure (Huet 1997; Hinze and Jeuring 2001; Abbott et al. 2003) in terms of shift.


traverses a given tree depth-first, stopping at each leaf.

(3.109)  Fringe2 = λt. #(let Fringe′2 t be 〈〉. snil〈〉)

(3.110)
    Fringe′2 = λt. let t be leaf x. ξc. scons〈x, λv. #(cv)〉
                 | branch y. let y be 〈t1, t2〉.
                               let Fringe′2 t1 be 〈〉. Fringe′2 t2

The Fringe′2 function has the type tree int → fringe( (1)fringe). This means that applying Fringe′2 to a tree behaves locally like the return type 1, but takes control over the delimited evaluation context of the application, whose intermediate answer type is (and stays) fringe. Every time Fringe′2 encounters a leaf x, it suspends the computation in c using ξc., and returns the nonempty fringe scons〈x, λv. #(cv)〉 to the innermost enclosing #, namely the # in Fringe2. The Fringe2 function has the type tree int→W( (fringe)W) for any answer type W. It provides Fringe′2 t with its initial delimited evaluation context

(3.111) #(let [ ] be 〈〉. snil〈〉),

which means that, once Fringe′2 finishes traversing t and returns 〈〉, the computation of the fringe completes with the empty fringe snil〈〉.

The Same2 function checks whether two fringes in the new representation are equal. It is the same as Same1, except crucially the recursive application Same1 y1 y2 is changed to Same2(y1〈〉)(y2〈〉), so as to resume the suspended computations for the rest of the fringes.

(3.112)
    Same2 = λl1. λl2. let l1 be snil u1. let l2 be snil u2. true
                                       | scons v2. false
                    | scons v1. let l2 be snil u2. false
                              | scons v2. let v1 be 〈x1, y1〉.
                                          let v2 be 〈x2, y2〉.
                                          if x1 = x2 then Same2(y1〈〉)(y2〈〉) else false

Putting these pieces together, we can define F in terms of Fringe2 and Same2, just as in terms of Fringe1 and Same1.

(3.113) F = λt1. λt2. Same2(Fringe2 t1)(Fringe2 t2)

Like the old F in (3.104) on page 80, this new F solves the same-fringe problem, but

(3.114) F(branch〈leaf 1, E1〉)(branch〈leaf 2, E2〉)


now computes to false in the same number of computation steps, no matter how large the trees E1 and E2 are.
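In the hypothetical Haskell sketch, the stream-based solution reuses the shift and runReset combinators from the Section 3.2 sketch; the explicit thunks of type () -> Fringe mirror the thesis's strict toy language (Haskell's own laziness would also do the job, but would hide the control transfer we mean to illustrate).

    -- (3.107): a fringe is empty, or an element plus a suspended computation.
    data Fringe = SNil | SCons Int (() -> Fringe)

    -- (3.109)-(3.110): traverse depth-first; at each leaf, suspend the rest
    -- of the traversal and hand one element to the delimiter in fringe2.
    fringe2 :: Tree -> Fringe
    fringe2 t = runReset (bind (go t) (\() -> ret SNil))
      where
        go :: Tree -> C Fringe Fringe ()
        go (Leaf x)       = shift (\c -> ret (SCons x (\u -> c u)))
        go (Branch t1 t2) = bind (go t1) (\() -> go t2)

    -- (3.112)-(3.113): compare element by element, resuming the suspended
    -- computations only as needed; a mismatch stops both traversals early.
    sameFringe2 :: Tree -> Tree -> Bool
    sameFringe2 t1 t2 = same2 (fringe2 t1) (fringe2 t2)
      where
        same2 SNil         SNil         = True
        same2 (SCons x k1) (SCons y k2) = x == y && same2 (k1 ()) (k2 ())
        same2 _            _            = False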

This concludes our presentation of delimited control in programming languages. Before moving to quantification in natural languages, let us stress that shift and reset, especially Danvy and Filinski's type system for them (1989) presented here, are but one instantiation of delimited control. This system has a particularly strong connection to the continuation-passing-style transform, and hence to natural-language quantification as explained below. Nevertheless, other proposals abound (Felleisen 1987, 1988; Felleisen et al. 1987, 1988; Sitaram and Felleisen 1990; Hieb and Dybvig 1990; Queinnec and Serpette 1991; Sitaram 1993; Gunter et al. 1995, 1998) and are also compatible with the transform (Kiselyov 2005; Shan 2004c; Biernacki et al. 2005a; Dybvig et al. 2005). In particular, it can be useful to let an expression name an enclosing control delimiter (not necessarily the innermost one) up to which to capture the current continuation.

3.3. Quantification

The following sentences illustrate the linguistic phenomenon of quantification.

(3.115) Nobody saw Bob.
(3.116) Some student saw every professor.
(3.117) Alice consulted Bob before most meetings.

As with the previously encountered English sentences, the linguist wants to model which sentences containing words like nobody, some, every, and most are acceptable, and in what situations they are true. In type-logical grammar, it is a non-starter to assign to the quantificational noun phrase nobody the type np like Alice, for a semantic reason: unlike Alice, the word nobody does not denote an individual (not a real person, not an imaginary person, not even a concept of a person). In other words, for nobody to denote an individual would make it difficult to model the truth conditions of sentences with nobody.

3.3.1. Generalized quantifiers. We look naturally to first-order predicate logic for inspiration. The beginning logic student is taught to translate a sentence like (3.115) to a universally quantified logical formula like

(3.118) ∀x.¬saw(x, Bob),

which we regard as shorthand for

(3.119) ∀(λx.¬saw(x, Bob)).

Here λx.¬saw(x, Bob) denotes the property of not having seen Bob, a function from individuals to Boolean values. We apply to this property the function ∀,


which is a function from properties to truth values. The truth value ∀(c) is true just in case every individual has the property c.

Following in the footsteps of the beginning logic student, we want our grammar to derive the expression (3.115) with the meaning (3.118). One way to achieve this goal is to assign to nobody the type s/(np\s), and let it denote the function λc.∀x.¬cx. We can then derive (3.115).

(3.120)
    saw ` (np\s)/np    Bob ` np
    saw, Bob ` np\s                    (/E)
    nobody ` s/(np\s)
    nobody, (saw, Bob) ` s             (/E)

Unlike the derivation of Alice saw Bob in (2.69) on page 48, this derivation feeds saw Bob to nobody as an argument, rather than applying saw Bob as a function to Alice. Reversing the direction of function application lets us generate the correct denotation (3.118).

This analysis of nobody extends to many other sentences. For example, given that left is a verb phrase like saw Bob, we can derive

(3.121) Nobody left

in the same way.

(3.122)
    nobody ` s/(np\s)    left ` np\s
    nobody, left ` s                   (/E)

Other quantificational noun phrases behave similarly to nobody, at least at first glance. For example, we can assign the same type s/(np\s) to the words everyone and someone, but let them denote the functions ∀ and ∃, respectively. Natural language also expresses quantifiers that cannot be expressed in terms of ∀ and ∃ in first-order predicate logic, such as "most": we can roughly analyze the phrase most students in

(3.123) Most students saw Bob,

as denoting the function that maps each given property c to true just in case the number of students with the property c is more than half of the number of students. A function from properties to truth values (or equivalently, a set of sets of individuals) is called a generalized quantifier (Montague 1974b; Barwise and Cooper 1981), or sometimes just "quantifier" for short, because these functions generalize the first-order quantifiers ∀ and ∃.3
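To make the notion concrete, here is a sketch of generalized quantifiers as Haskell functions over a toy finite model; the model and its membership facts are our own illustration, not part of the thesis's fragment.

    type Entity = String
    type GQ = (Entity -> Bool) -> Bool     -- from properties to truth values

    people, students :: [Entity]
    people   = ["alice", "bob", "carol", "dave"]   -- stipulated model
    students = ["carol", "dave"]

    everyone, someone, nobody :: GQ
    everyone c = all c people
    someone  c = any c people
    nobody   c = not (any c people)

    -- "most students": more than half of the students have the property c.
    mostStudents :: GQ
    mostStudents c = 2 * length (filter c students) > length students

Here mostStudents is defined by counting rather than from everyone and someone, matching the observation that "most" is not expressible in terms of the first-order ∀ and ∃.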

3 Often the term "generalized quantifier" is used to refer to a function from tuples of properties (typically pairs of properties) to truth values.


    saw ` (np\s)/np    Bob ` np
    saw, Bob ` np\s                                (/E)
    nobody ` s/(np\s)
    nobody, (saw, Bob) ` s                         (/E)
    thinks ` (np\s)/s
    thinks, (nobody, (saw, Bob)) ` np\s            (/E)
    Alice ` np
    Alice, (thinks, (nobody, (saw, Bob))) ` s      (\ E)

Figure 3.10. Alice thinks nobody saw Bob

To model the difference between nobody and nothing, everyone and everything, and someone and something, we can assume that some individuals are animate whereas others are not. To a first approximation, nobody denotes

(3.124) λc.∀x. animate(x)⇒¬cx

(that is, whether every animate individual lacks a given property), whereas nothing denotes

(3.125) λc.∀x.¬animate(x)⇒¬cx

(that is, whether every inanimate individual lacks a given property). Similarly for everyone versus everything, and someone versus something. The sentence (3.115) on page 83 then denotes

(3.126) ∀x. animate(x)⇒¬saw(x, Bob).

When a generalized quantifier is applied to a property, the property (or the expression that denotes it) is called the scope of that occurrence of the generalized quantifier. For example, the scope of most students in (3.123) is the property of having seen Bob, or the phrase saw Bob. The scope of nobody in (3.115) is the same. The set of individuals to which a generalized quantifier applies a property, or the expression that denotes the set, is called the restrictor of the generalized quantifier. For example, the restrictor of most students is the property of being a student, or the word students. The restrictor of nobody is the property of animacy, or perhaps the suffix -body.

It may appear from the examples so far that the scope of a quantificational noun phrase is always the rest of the sentence containing it, but that is not always the case. For example, the sentence

(3.127) Alice thinks nobody saw Bob

means that Alice thinks the proposition (3.118) on page 83 is true. In this meaning, the scope of the generalized quantifier nobody is the property of having seen Bob (saw Bob), not the property of being thought by Alice to have seen Bob (Alice thinks . . . saw Bob). We say that the quantifier nobody takes scope over the embedded clause nobody saw Bob, or that it takes scope at the boundary between the embedded clause and the rest of the sentence. Figure 3.10 derives this sentence


using the same lexical entry for nobody as proposed above. The /E inferences above apply nobody to its scope saw Bob, and thinks to the resulting meaning of the embedded clause. Hence, as explained in Section 2.4, the formulas-as-types correspondence assigns to this derivation the meaning for (3.127) with the desired scope for nobody.

3.3.2. In-situ quantification. Unfortunately, the simplistic account presented above handles only quantificational noun phrases that combine with their scope to the right. In English and many other languages, quantifiers can occur in a variety of locations throughout a clause. For example, nobody occurs to the right of saw in (3.128), and to the left of 's mother in (3.129) and (3.130).

(3.128) Alice saw nobody.
(3.129) Nobody's mother saw Bob.
(3.130) Alice saw nobody's mother.

The type s/(np\s) proposed above for nobody can combine neither with saw (of type (np\s)/np) on the left nor with 's mother (of type np\np) on the right. Therefore, our straw-man analysis predicts erroneously that these sentences are not acceptable.

Roughly speaking, these example sentences suggest that a quantifier in English can occur anywhere in the clause where it takes scope, not just to the left at the top level as in (3.115) on page 83. This flexibility is called in-situ quantification because the quantifier can take scope "in place", surrounded by its scope.

Starting with Montague’s seminal Proper Treatment of Quantification (1974b),many ways to account for in-situ quantification have been proposed in the natural-language syntax and semantics literature. Some proposals, such as May’s LogicalForm (1985) and Hobbs and Shieber’s (1987) and Moran’s (1988) quantifierscoping algorithms, posit a level of representation where quantifiers move silentlyto the edge of their scopes. For example, we could move nobody in (3.130) assketched below.

(3.131)
            ∙                                   ∙
           / \                                 / \
       Alice  ∙               =⇒          nobody  ∙
             / \                                 / \
          saw   ∙                            Alice  ∙
               / \                                 / \
          nobody  's mother                     saw   ∙
                                                     / \
                                                    _   's mother

The meaning of the sentence can then be computed from the latter representation, where the quantifier nobody does occur to the left at the top level. Other proposals, such as Cooper storage (1983) and Keller storage (1988), enrich the mechanism for semantic interpretation to deal specifically with quantification. For example,


    np ` np                                        (Id)
    's mother ` np\np
    np, 's mother ` np                             (\ E)
    saw ` (np\s)/np
    saw, (np, 's mother) ` np\s                    (/E)
    Alice ` np
    Alice, (saw, (np, 's mother)) ` s              (\ E)
    nobody ` q(np, s, s)
    Alice, (saw, (nobody, 's mother)) ` s          (q E)

Figure 3.11. Alice saw nobody's mother

Hendriks’s Flexible Types system (1988) systematically generates an infinitefamily of types to account for different surroundings in which a quantifier maywake up from the lexicon to find itself.

Particularly relevant to us is Moortgat's attractive proposal in type-logical grammar of a ternary type constructor q (1988, 1995, 1996). If T, W0, and W are types, then Moortgat proposes that q(T, W0, W) be a type as well. Moortgat terms T the bound expression type, W0 the binding domain type, and W the resultant expression type. A quantificational noun phrase canonically has the type q(np, s, s). In terms of semantics, an expression of type q(T, W0, W) denotes a function from functions from T to W0 to W, or formally, (T→W0)→W. In terms of syntax, an expression of type q(T, W0, W) can plug into any context into which an expression of type T can be plugged to produce a larger expression of type W0, to produce a larger expression of type W. That is, we add the inference rule

(3.132)
    ∆ ` F : q(T, W0, W)    Γ[x : T] ` E : W0
    ───────────────────────────────────────── q E
    Γ[∆] ` F(λx. E) : W

With the type q(np, s, s) for nobody and the same denotation as above, we can derive the previously problematic sentences. For instance, Figure 3.11 derives (3.130). (We give examples below where W0 and W differ.)
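Read off the semantic clause above, q(T, W0, W) is the type (T→W0)→W; as a Haskell type synonym (our notation, continuing the toy model above):

    -- q(T, W0, W): behaves locally like t, takes scope over a w0, yields a w.
    type Q t w0 w = (t -> w0) -> w

    -- The canonical type q(np, s, s) is then the generalized-quantifier type:
    nobodyQ :: Q Entity Bool Bool
    nobodyQ c = not (any c people)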

The q E rule in (3.132) suggests the following intuition behind the three types T, W0, and W in the type q(T, W0, W). An expression of type q(T, W0, W) behaves "locally" like type T. That is, it combines syntactically with phrases (in Γ[ ]) with which a T can combine. But "behind the scenes", it waits until syntactic combination yields a W0. It then takes scope over that W0 to finally produce a W. We call W0 and W the answer types in the type q(T, W0, W).

The W produced when a quantificational subexpression of type q(T, W0, W) takes scope enters the rest of the derivation like any other W. In particular, another quantificational subexpression (of type q(T′, W, W′), say) may take scope over it. Hence we predict that multiple quantifiers may occur in the same sentence and take scope over the same clause. Many examples bear out this prediction. For example, we empirically observe that the sentence


(3.133) A man is robbed in New York every 11 seconds

is ambiguous between two readings due to the quantificational noun phrases a man and every 11 seconds: one where a possibly different man is robbed each time, and one where the same man is repeatedly robbed. The follow-up sentence

(3.134) Let’s interview him

disambiguates (3.133) in favor of the second reading. For simplicity, we consider the sentence

(3.135) Someone saw everyone,

which is similarly ambiguous between two readings due to the quantificational noun phrases someone and everyone. The linear scope (or surface scope) reading corresponds to the logical formula

(3.136) ∃x.∀y. saw(x, y),

which we regard as shorthand for

(3.137) ∃(λx.∀(λy. saw(x, y))).

To describe this reading, we say that someone takes wide scope over everyone, which takes narrow scope. The inverse scope reading corresponds to the logical formula

(3.138) ∀y.∃x. saw(x, y),

which we regard as shorthand for

(3.139) ∀(λy.∃(λx. saw(x, y))).

We say that everyone takes inverse scope over someone in this reading because the order between the quantifiers in the English sentence is the opposite of that in the corresponding logical formula.
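In the toy Haskell model above, the two readings are literally the two orders of nested application; the fact table for saw is our own stipulation, chosen so that the readings come apart.

    saw :: Entity -> Entity -> Bool        -- saw x y: x saw y (stipulated)
    saw x y = (x, y) `elem`
      [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")]

    linearScope, inverseScope :: Bool
    linearScope  = someone  (\x -> everyone (\y -> saw x y))   -- (3.137): False
    inverseScope = everyone (\y -> someone  (\x -> saw x y))   -- (3.139): True

In this model everyone was seen by somebody, but no single person saw everybody, so only the inverse-scope reading comes out true.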

Our grammar correctly predicts that (3.135) is acceptable and ambiguous: it provides two derivations for (3.135) that assign different denotations. Figure 3.12a derives the linear-scope reading. As the two successive q E inferences indicate, someone applies to the clausal meaning produced by everyone. Figure 3.12b derives the inverse-scope reading, where everyone applies to the clausal meaning produced by someone instead.

So far, we have only used q at the type q(np, s, s). The same formal mechanism for in-situ quantification, be it q or another account, turns out to be applicable to a wide range of linguistic phenomena, where the answer type is not always s. In other words, if we generalize generalized quantifiers beyond those that take scope over s to yield s, then many other linguistic phenomena reduce to in-situ quantification. For example, Morrill (2000, 2003) proposes an analysis of anaphora that essentially treats a pronoun and its antecedent both as quantifiers of


(a) Linear scope:

    someone ` q(np, s, s)
    everyone ` q(np, s, s)
    np ` np                                        (Id)
    saw ` (np\s)/np    np ` np                     (Id)
    saw, np ` np\s                                 (/E)
    np, (saw, np) ` s                              (\ E)
    np, (saw, everyone) ` s                        (q E)
    someone, (saw, everyone) ` s                   (q E)

(b) Inverse scope:

    everyone ` q(np, s, s)
    someone ` q(np, s, s)
    np ` np                                        (Id)
    saw ` (np\s)/np    np ` np                     (Id)
    saw, np ` np\s                                 (/E)
    np, (saw, np) ` s                              (\ E)
    someone, (saw, np) ` s                         (q E)
    someone, (saw, everyone) ` s                   (q E)

Figure 3.12. Someone saw everyone

a sort. We return to this idea in Section 4.5.1, though without Morrill's assumption (problematic as explained in Section 2.4) that syntactic combination is associative.

To take another example, Carpenter (1994) and Morrill (1994) encode intensionality with q. Given our discussion of intensionality in Section 1.5, we can understand their idea as follows. Suppose that the type np denotes the set of individuals, including planets. Then the verb thinks shows that the noun phrases the morning star and the evening star cannot be of type np (and denote the same planet). But if the type s denotes not the (two-element) set of truth values but the set of functions from possible worlds to truth values (perhaps s is shorthand for w→ s0, where w denotes the set of possible worlds and s0 denotes the set of truth values), then we can assign to the morning star and the evening star the type q(np, s, s). The denotation of the morning star would be

(3.140) λc. λw. c(the morning star in the world w)(w),

whereas the denotation of the evening star would be

(3.141) λc. λw. c(the evening star in the world w)(w).

An intensional verb, such as thinks, can then distinguish between these two denotations. Informally speaking, the morning star and the evening star take scope over a clause to access its possible world.
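A sketch of this encoding in the Haskell model above, with a two-world toy intension of our own; World, S, and the world-dependent referents are stipulations for illustration only.

    type World = Int
    type S = World -> Bool                 -- s as worlds to truth values

    -- Stipulated referents: the two descriptions agree in world 0 (the
    -- actual world) but come apart in world 1.
    morningStarIn, eveningStarIn :: World -> Entity
    morningStarIn _ = "venus"
    eveningStarIn w = if w == 0 then "venus" else "mars"

    -- (3.140)-(3.141): q(np, s, s) with s = World -> Bool; each description
    -- takes scope over its clause to access the clause's possible world.
    theMorningStar, theEveningStar :: (Entity -> S) -> S
    theMorningStar c = \w -> c (morningStarIn w) w
    theEveningStar c = \w -> c (eveningStarIn w) w

An intensional verb like thinks can then evaluate its clausal complement at worlds other than the actual one, and so distinguish the two denotations.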


Although q is versatile, its status as a logical connective raises concerns. We have only provided the q E rule, which specifies how to consume a quantificational expression (∆ in (3.132) on page 87). It is unclear what a corresponding q I rule might be, with which to create a quantificational expression. Given the intuition of plugging a subexpression into a context, we expect the judgments

(3.142)  T ` q(T, W, W)

and

(3.143)  T′/T, q(T, W, W) ` q(T′, W, W)

to be logical validities. Yet neither is derivable using q E alone, simply because q appears to the right of ` in these judgments.

Concomitant with this proof-theoretic concern about q is a model-theoretic one. Several different sets of inference rules for q have been proposed (Moortgat 1996; Barker 2005). What makes any given set of inference rules for q correct? (That is, with respect to what class of semantic models do we want a set of inference rules for q that is sound and complete?)

One reason why the empirical linguist cares for judgments like (3.142) and (3.143) to be valid arises in the analysis of and. It would be nice to be able to say that and can conjoin two phrases just in case they have the same type. This natural idea explains why

(3.144) Alice saw Bob’s mother and Carol

is acceptable (because Bob’s mother and Carol both have type np), but not

(3.145) *Alice saw Bob’s mother and to Carol’s assassination

(because to Carol’s assassination does not have type np). Given that

(3.146) Alice saw Bob and someone’s mother

is acceptable, we would like Bob and someone's mother to have the same type. Using (3.142) and (3.143), we can derive the type q(np, s, s) for Bob and for someone's mother, respectively, as desired.

One way to strengthen the foundation for q is to reduce it to other logical connectives that are better understood. Bernardi (2003) summarizes three implementations of q in multimodal type-logical grammar (Morrill 1994; Moortgat 1995, 2000); and Areces and Bernardi (2003) give a fourth solution using hybrid logic. In Chapter 4, we propose yet another implementation of q, which unlike the others is motivated by a computational (or operational, or dynamic, or processing) interpretation based on continuations.


3.4. Delimited control versus quantification

Delimited control in programming languages and in-situ quantification in natural languages exemplify the same general pattern as in Section 1.6: the program abort 2 does not evaluate to any result; the utterance nobody does not refer to any individual. Yet we would like a sound and compositional theory that explains how these expressions can take part in a complete program or utterance that does evaluate to a result or refer to an individual or a truth value.

Programming-language theorists and linguists alike have responded to this challenge by adding to denotations a new aspect of meaning beyond evaluation results and reference, namely to let an expression take control over, or apply to, its context. The meaning of the context is known as the continuation of an expression in programming-language semantics and the scope of a quantifier in natural-language semantics. With this addition, denotations become more complex: they are now functions from functions from a value type T to an answer type W0 to an answer type W.4 The scope of a natural-language quantifier is analogous to the delimited context of a programming-language control operator (Barker 2001, 2002; de Groote 2001). The analogy is as if nobody denotes the shift-expression ξc. ∀x. ¬#(cx) (Shan 2005, 2004a; Barker 2004).
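The analogy can even be made executable by combining the two hypothetical sketches above: in the CPS combinators of the Section 3.2 sketch and the toy model of the Section 3.3 sketch, nobody is a shift expression, our transcription of ξc. ∀x. ¬#(cx) restricted to a finite domain.

    -- "nobody": capture the scope as c, then deny that any individual
    -- satisfies it.
    nobodyShift :: C Bool Bool Entity
    nobodyShift = shift (\c -> ret (not (any c people)))

    -- "Nobody saw Bob", with the clause boundary as the delimiter #[ ]:
    nobodySawBob :: Bool
    nobodySawBob = runReset (bind nobodyShift (\x -> ret (saw x "bob")))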

Ambiguity may result when a program contains multiple control operators or an utterance contains multiple quantifiers. In Section 3.2, the denotation of the control operator evaluated earlier applies to a continuation containing the denotation of the control operator evaluated later. In Section 3.3.2, the denotation of the quantifier taking wide scope applies to a scope containing the denotation of the quantifier taking narrow scope. Thus evaluation order in programming languages corresponds to scope order in natural languages: left-to-right evaluation (as in the denotation (3.47) on page 67) corresponds to linear scope (as in the denotation (3.137) on page 88); right-to-left evaluation (as in the denotation (3.62) on page 69) corresponds to inverse scope (as in the denotation (3.139) on page 88) (Barker 2001, 2002; de Groote 2001). Section 3.1.2 above removes nonconfluence in a programming language by enforcing left-to-right evaluation. Sections 4.4 and 4.5 below explain empirical generalizations in linguistics, also by enforcing left-to-right evaluation.

A popular view in programming-language theory holds that each computational side effect corresponds to a notion of computation expressed as a monad or monad morphism (Moggi 1990, 1991; Wadler 1992a,b). Under this view, delimited control turns out to subsume all computational side effects (Filinski 1994, 1996, 1999). Given our analogy between computational and linguistic side

4 The traditional answer type is s in linguistics but ⊥ in logic and computer science, where undelimited control was studied before delimited control. De Groote (2001) relates in-situ quantification to undelimited control by identifying the types s and ⊥.


effects, we then expect quantification to subsume all linguistic side effects (Shan 2001). Section 4.5 below confirms this expectation by using quantification to account for a variety of linguistic side effects observed empirically.


CHAPTER 4

Evaluation order in natural languages

This chapter implements in-situ quantification in multimodal type-logical grammar using an idea from the analysis of delimited control in programming languages: the meaning of an evaluation context is a continuation (Section 3.2). In type-logical grammar, Moortgat's ternary type constructor q for in-situ quantification (1988, 1995, 1996) already enjoys several proposed implementations (Morrill 1994; Moortgat 1995, 2000), summarized by Bernardi (2003). Moreover, both Barker (2001, 2002, 2004) and de Groote (2001) already relate quantification to continuations. Nevertheless, our implementation improves over previous linguistic theories in two ways, one theoretical and one empirical.

First, whereas previous implementations of q in type-logical grammar are designed to move a quantifier to the edge of its scope as depicted in (3.131) on page 86, our implementation does not move nearby constituents apart or distant constituents together. Rather, as Section 4.1 explains, the structural rules that we propose just encode symmetries inherent in a syntactic structure viewed as a graph. In Section 4.2, we use this encoding to model how a quantifier such as nobody combines with its scope outward, and how a scope such as Alice saw [ ]'s mother combines with its gap inward, just as an intransitive verb such as slept combines with its subject argument leftward, and as a transitive verb such as saw combines with its object argument rightward. (Chapter 5 exploits the same symmetries for programming-language theory.)

Second, we treat the order in which a human processes parts of an utterance like the order in which a computer evaluates parts of a program. Just as Section 3.1.2 specifies the evaluation order of a toy programming language by restricting its evaluation contexts, Sections 4.3 and 4.4 specify the processing order of a natural-language fragment by restricting its scope-taking contexts. In Section 4.5, we use this new notion of order in natural language to unify the preference for linear scope in quantification with empirical generalizations in other linguistic side effects: crossover in anaphora, superiority in wh-questions, and the effect of linear order on polarity sensitivity. In particular, Section 4.5.2 shows how this unified account of crossover and wh-questions predicts a complex pattern of interaction between them, without any stipulation that mentions both anaphora and interrogation. Section 4.5.3 then describes the first concrete processing account of linear order in polarity sensitivity, linking it to linear scope in quantification.


4.1. Type environments as graphs

This section introduces a graphical interpretation of type environments in type-logical grammar. Section 4.2 then applies this interpretation to implement q for linguistic analyses.

Every type environment in type-logical grammar can be uniquely drawn as a binary tree. It is crucial for us to view the root of the tree as just another leaf node. Because this is not a conventional view of syntactic structure, let us be pedantically specific and define a binary tree to be an acyclic graph in which every node has either one or three edges, and every leaf node but one (the root) is labeled with a type. For example, the type environment

(4.1)  Γ = a, ((b, c), d)

can be drawn as follows.

(4.2)  a, ((b, c), d)   ⇐⇒      b   c
                                 \ /
                                  ∙───d
                                  |
                                  ∙───a
                                  |
                                (root)

In our graphs, edges are unoriented, so the following two graphs are identical.

(4.3)    a          |
         |    =     a

However, we do orient nodes by designating, for each trivalent node, one of the two cyclic orderings of its edges as counterclockwise. Hence each of the following graphs is equal to two others, but not to its mirror image.

(4.4)
    a   c         c   b         b   a
     \ /           \ /           \ /
      ∙      =      ∙      =      ∙      ,
      |             |             |
      b             a             c

    a   b         b   c         c   a
     \ /           \ /           \ /
      ∙      =      ∙      =      ∙
      |             |             |
      c             a             b

Together, these two conventions ensure that each type environment corresponds to only one graph.

Suppose now that Γ and Θ are two type environments, such that Θ appears as part of Γ. For instance, Γ might be the type environment a, ((b, c), d) as above, and Θ might be the type environment b, c. We write Γ = ∆[Θ], where ∆[ ] is informally the context in Γ of the subexpression Θ; that is, ∆[ ] = a, ([ ], d). Graphically speaking, we have decomposed Γ into two parts, ∆[ ] and Θ, by


    Γ } x : T1 ` E : T2                      x : T1 } Γ ` E : T2
    ──────────────────── ( I                 ──────────────────── ) I
    Γ ` λx. E : T2( T1                       Γ ` λx. E : T1)T2

    Γ ` F : T2( T1    ∆ ` E : T1             ∆ ` E : T1    Γ ` F : T1)T2
    ───────────────────────────── ( E        ──────────────────────────── ) E
    Γ } ∆ ` FE : T2                          ∆ } Γ ` FE : T2

Figure 4.1. Adding the binary mode } to the Lambek calculus without products

ripping apart the graph at an edge, calling the side that contains the root node ∆[ ], and the other side Θ.

(4.5)
    b   c
     \ /
      ∙              ⇐⇒  b, c = Θ
      ┆  (ripped edge)
      ∙───d
      |
      ∙───a          ⇐⇒  a, ([ ], d) = ∆[ ]
      |
    (root)

To represent such a decomposition as a graph, we insert a new, special node } in the middle of the edge that connects the two components.

(4.6)
    b   c
     \ /
      ∙
      |
      }───(new root)
      |
      ∙───d
      |
      ∙───a
      |
      1   (old root)

This new node } connects the two components of the decomposition to a new root node. The old root node becomes a leaf with the special label 1. We know which side of the } corresponds to the context, since that is the side that contains the 1 node marking the position of the old root. Starting at the } node, we always put the context side right after the subexpression side (where "after" means moving counterclockwise). The symbol } is pronounced "at", because it puts a subexpression at the hole of a context.

To convert the picture (4.6) back to a type environment, we introduce a new binary mode (following Section 2.4.1), the continuation or context mode. Instead of writing this new mode's comma and slashes with subscripts as ,c, /c, and \c, we use the easier-to-read symbols } (again, pronounced "at"), ( (pronounced "outside"), and ) (pronounced "inside"). The new inference rules, following Figure 2.28 on page 51, are shown in Figure 4.1. The graph in (4.6) thus corresponds to the formula

(4.7)  (b, c) } (d, (1, a)).


In the terminology of Section 2.4.1, this mode is internal (or unpronounceable). The continuation mode } treats the decomposition Γ = ∆[Θ] as a type environment in its own right: if ∆ is the type environment that represents the context ∆[ ] (for example, ∆ = d, (1, a) above), then Θ } ∆ decomposes Γ into the subexpression Θ and the context ∆[ ]. In the trivial case, the graph is ripped apart at the root edge into two sides: the null context [ ], and the type environment Γ viewed as its own (improper) subexpression. Because the null context is represented by the type environment 1, we henceforth treat 1 as the right identity for }. That is, we introduce the structural rule1

(4.8)
    Γ[Θ] ` T
    ════════════ Root
    Γ[Θ } 1] ` T

to make Θ } 1 logically equivalent to just Θ. Informally, this rule mandates the equation

(4.9) Θ = Θ } 1

in an algebra of subexpressions and contexts.

In the original type environment Γ = a, ((b, c), d) in (4.1) on page 94, the type d follows a and combines with b, c, whereas in the decomposition (b, c) } (d, (1, a)) in (4.7), the type d precedes a and combines with 1, a. It appears from (4.7) that ∆ represents a context ∆[ ] by turning it inside-out, but the graphical view in (4.6) shows that we have simply inserted a continuation node } into an edge, without pulling nearby constituents far apart or pushing distant constituents close together. We have merely changed our perspective by placing one subexpression in the foreground and backgrounding the rest; in Belnap's words (1982), displaying a subexpression. We can pronounce the environment d, (1, a) as "a hole d", because it is just what the context a, ([ ], d) looks like from the perspective of the continuation node. In general, if an environment built up using the default mode (the comma ,) contains a unique 1, then we can pronounce it as what is to the right of the 1, followed by "hole", followed by what is to the left of the 1. Interpreting environments as graphs makes mapping between contexts and environments as simple as rotating a graph as in (4.6).

Every implementation of in-situ quantification in multimodal type-logical grammar represents a meta-level notion of context (like ∆[ ] = a, ([ ], d)) as an object-level expression (like ∆ = d, (1, a)), in order to let an expression access (that is, take scope over) its context. We want to represent contexts at the object level, in order to preserve our logical machinery with its desirable meta-theoretic properties (such as the finite-reading property: any expression can only be ambiguous among a finite number of meanings).

1 Restall (2000, pages 30–31) calls this rule Push when it is used to infer from top to bottom, and Pop when it is used to infer from bottom to top.


Our implementation is unique in representing contexts "inside-out", just as the grammars for evaluation contexts in Chapter 2 (starting with Figure 2.9 on page 27) represent contexts "inside-out". This view results from defunctionalizing (Reynolds 1972) continuations (Danvy and Nielsen 2001a,b; Danvy 2004), and corresponds to Huet's zipper data-structure (Huet 1997; Hinze and Jeuring 2001; Abbott et al. 2003) for the data type of binary trees. Section 4.5 uses this connection between contexts in natural and programming languages to analyze empirical observations on natural language.
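For concreteness, the correspondence with the zipper can be spelled out in a few lines of Haskell (names ours): the context type below stores a path from the hole outward, exactly as ∆ = d, (1, a) stores the context a, ([ ], d) inside-out.

    data Bin a = Tip a | Fork (Bin a) (Bin a)

    -- Each layer of a context records which side the hole is on and the
    -- sibling subtree passed on the way down from the root.
    data Ctx a = Hole                      -- the null context, like the leaf 1
               | InLeft  (Ctx a) (Bin a)   -- hole in the left child
               | InRight (Bin a) (Ctx a)   -- hole in the right child

    -- Plugging a subtree into a context rebuilds the whole tree, walking
    -- from the hole back out to the root.
    plug :: Bin a -> Ctx a -> Bin a
    plug t Hole          = t
    plug t (InLeft c r)  = plug (Fork t r) c
    plug t (InRight l c) = plug (Fork l t) c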

We now introduce two structural rules to equate equivalent decompositions of the same graph. If we find some type environment of the form Θ,Π inside a context ∆[ ] (that is, given the type environment ∆[Θ,Π]), we can move either Θ or Π from the subexpression part to the context part of the decomposition. In other words, the environment ∆[Θ,Π] can be decomposed in three ways: one that isolates Θ, one that isolates Θ,Π, and one that isolates Π. If the type environment ∆ represents the context ∆[ ] (so that it contains 1 in place of the root of ∆[Θ,Π]), then Figure 4.2 depicts these three decompositions. Since these are three decompositions of the same graph, we want them to be logically equivalent, so we add two structural rules.2

(4.10)
    Γ[Θ } (Π,∆)] ` T                     Γ[(Θ,Π) } ∆] ` T
    ════════════════ Left                ════════════════ Right
    Γ[(Θ,Π) } ∆] ` T                     Γ[Π } (∆,Θ)] ` T

Informally speaking, these structural rules mandate the equation

(4.11) Θ } (Π,∆) = (Θ,Π) } ∆ = Π } (∆,Θ)

2 Unfortunately, in the presence of the two structural rules in (4.10) and the right identity 1 for the } mode, the default mode becomes commutative:

    Γ[Θ,Π] ` T
    ═══════════════════════ Root
    Γ[(Θ,Π) } 1] ` T
    ═══════════════════════ Left
    Γ[Θ } (Π, 1)] ` T
    ═══════════════════════ Root
    Γ[Θ } ((Π, 1) } 1)] ` T
    ═══════════════════════ Left
    Γ[Θ } (Π } (1, 1))] ` T
    ═══════════════════════ Right
    Γ[Θ } ((1,Π) } 1)] ` T
    ═══════════════════════ Root
    Γ[Θ } (1,Π)] ` T
    ═══════════════════════ Right
    Γ[(Π,Θ) } 1] ` T
    ═══════════════════════ Root
    Γ[Π,Θ] ` T

As explained in Section 2.4, we do not want the default mode to commute, because word order is significant in natural language. In Section 4.4 below, to enforce left-to-right evaluation, we restrict the Right rule to apply only when the environment Θ is surrounded by ^. With this restriction, the Right inferences above (especially the upper one) no longer apply. (Barker and Shan (2005) mention two other ways to prevent the default mode from commuting.) Therefore, we leave the commutativity issue aside here.


        Π   ∆             Θ   Π              ∆   Θ
         \ /               \ /                \ /
      Θ   ∙                 ∙   ∆          Π   ∙
       \ /                   \ /            \ /
        }                     }              }

     Θ } (Π,∆)           (Θ,Π) } ∆        Π } (∆,Θ)

Figure 4.2. Three ways to decompose ∆[Θ,Π] into a subexpression and a context

to relate the two modes , and } in an algebra of subexpressions and contexts. Using these structural rules, any decomposition of a (connected) graph Θ can be derived from the trivial decomposition Θ } 1, which represents Θ itself inside the null context [ ] and is logically equivalent to just Θ by (4.8) on page 96. For example, the decomposition in (4.6) on page 95 can be derived in three steps.3

(4.12)
    Γ[a, ((b, c), d)] ` T
    ═══════════════════════════ Root
    Γ[(a, ((b, c), d)) } 1] ` T
    ═══════════════════════════ Right
    Γ[((b, c), d) } (1, a)] ` T
    ═══════════════════════════ Left
    Γ[(b, c) } (d, (1, a))] ` T

The continuation mode } and its associated structural rules expand the machinery of abstraction and application built into type-logical grammar beyond describing functions that apply leftward or rightward. We can now describe functions that apply "inward" or "outward". Such functions underlie in-situ quantification, as Section 4.2 below shows.

4.2. Scopes apply inward; quantifiers apply outward

Let us apply the techniques developed above to types and expressions from natural language. As in Sections 2.1 and 2.4, we assume a type of noun phrases (more precisely, proper nouns) np and a type of sentences (more precisely, clauses) s. For example, Alice saw Bob is a clause: syntactically, the phrase saw combines with Bob to the right, then Alice to the left, using the default mode (the comma ,). Semantically, the function saw takes Bob and Alice as arguments.

(4.13)
    saw ` (np\s)/np    Bob ` np
    saw, Bob ` np\s                    (/E)
    Alice ` np
    Alice, (saw, Bob) ` s              (\ E)

3 Whereas the steps for splay-tree rotation (Sleator and Tarjan 1985) move among different trees that parenthesize the same fringe, our steps in (4.10) move among different decompositions of the same tree.


    a : Alice ` a : np
    f : saw ` f : (np\s)/np
    x : np ` x : np                                        (Id)
    f : saw, x : np ` f x : np\s                           (/E)
    a : Alice, (f : saw, x : np) ` f x a : s               (\ E)
    (a : Alice, (f : saw, x : np)) } 1 ` f x a : s         (Root)
    (f : saw, x : np) } (1, a : Alice) ` f x a : s         (Right)
    x : np } ((1, a : Alice), f : saw) ` f x a : s         (Right)
    (1, a : Alice), f : saw ` λx. f x a : np)s             () I)

Figure 4.3. Alice saw [ ]

In general, any two noun phrases with saw in between form a clause.

(4.14)
    saw ` (np\s)/np    np ` np         (Id)
    saw, np ` np\s                     (/E)
    np ` np                            (Id)
    np, (saw, np) ` s                  (\ E)

The context Alice saw [ ] has a hole [ ] where an np can be plugged in to yield an s. In linguistic terms, it is a gapped clause. Following the graphical approach in Section 4.1, we represent this context as a type environment, or equivalently, a graph.

(4.15)  (1, Alice), saw   ⇐⇒    1   Alice
                                 \ /
                                  ∙───saw
                                  |
                                (root)

Figure 4.3 uses the structural rules in (4.10) on page 97 to prove that this context is a gapped clause, in other words, that the type environment in (4.15) entails the type np)s. The latter type is the type of something that gives s when combined on the left with an np using }. In other words, a gapped clause is a context that encloses an np to give an s.

Figure 4.3 shows the proof with meanings (λ-expressions to the left of :). As explained in Section 2.4, the formulas-as-types correspondence determines these meanings from the syntactic derivation: the /E and \ E inferences correspond to function application, while the ) I inference at the bottom corresponds to abstraction. Thus the meaning of the gapped clause Alice saw [ ] is the function λx. f x a.

We can now treat quantificational noun phrases like everyone. As a syntactic element, everyone combines with a gapped clause enclosing it to give a complete


    e : everyone ` e : s( (np)s)
    (1, a : Alice), f : saw ` λx. f x a : np)s                     (Figure 4.3 on page 99)
    e : everyone } ((1, a : Alice), f : saw) ` e(λx. f x a) : s    (( E)
    (f : saw, e : everyone) } (1, a : Alice) ` e(λx. f x a) : s    (Right)
    (a : Alice, (f : saw, e : everyone)) } 1 ` e(λx. f x a) : s    (Right)
    a : Alice, (f : saw, e : everyone) ` e(λx. f x a) : s          (Root)

Figure 4.4. Alice saw everyone

clause. In terms of our context mode }, everyone gives s when combined on the right with np)s. Thus we assign to it the type s( (np)s). Figure 4.4 uses this type to prove the grammaticality of Alice saw everyone. This proof assigns the denotation e(λx. f x a) to the sentence. This denotation is the desired outcome under the standard view that the meaning e of everyone is a generalized quantifier that maps the property of being seen by Alice to the proposition that Alice saw everyone.

Just as the function-type constructors / and \ for the default mode , take arguments from the right and from the left, the function-type constructors ( and ) for the continuation mode } take arguments from outside (that is, apply to surrounding contexts) and from inside (that is, apply to enclosed subexpressions), respectively. More generally, we can reconstruct the ternary type constructor q for in-situ quantification in type-logical grammar: encode q(A, B, C) as C( (A)B). The type for everyone in Figure 4.4 encodes q(np, s, s), as expected.

Informally speaking, Figure 4.3 derives the “judgment”

(4.16) Alice saw [ ] ` np)s

using a bureaucracy of structural rules. Omitting this bureaucracy from Figure 4.4 yields the "derivation" of Alice saw everyone in Figure 4.5.

As noted in Section 3.3.2, the English sentence

(4.17) Someone saw everyone

is ambiguous between a linear-scope reading

(4.18) ∃x.∀y. saw(x, y)

and an inverse-scope reading

(4.19) ∀y.∃x. saw(x, y).

Like any account of in-situ quantification implementing q, our system derives both readings for the sentence, as shown in Figures 4.6 and 4.7. Ignoring the structural rules Root, Left, and Right, these derivations correspond exactly to


    everyone ` s( (np)s)
    Alice saw np ` s
    Alice saw [ ] ` np)s                () I)
    Alice saw everyone ` s              (( E)

Figure 4.5. Alice saw everyone, without the formal bureaucracy

    ∃ : someone ` ∃ : s( (np)s)
    ∀ : everyone ` ∀ : s( (np)s)
    x : np, (f : saw, y : np) ` f y x : s                                 ((4.14) on page 99)
    (x : np, (f : saw, y : np)) } 1 ` f y x : s                           (Root)
    (f : saw, y : np) } (1, x : np) ` f y x : s                           (Right)
    y : np } ((1, x : np), f : saw) ` f y x : s                           (Right)
    (1, x : np), f : saw ` λy. f y x : np)s                               () I)
    ∀ : everyone } ((1, x : np), f : saw) ` ∀(λy. f y x) : s              (( E)
    (f : saw, ∀ : everyone) } (1, x : np) ` ∀(λy. f y x) : s              (Right)
    (x : np, (f : saw, ∀ : everyone)) } 1 ` ∀(λy. f y x) : s              (Right)
    x : np } ((f : saw, ∀ : everyone), 1) ` ∀(λy. f y x) : s              (Left)
    (f : saw, ∀ : everyone), 1 ` λx. ∀(λy. f y x) : np)s                  () I)
    ∃ : someone } ((f : saw, ∀ : everyone), 1) ` ∃(λx. ∀(λy. f y x)) : s  (( E)
    (∃ : someone, (f : saw, ∀ : everyone)) } 1 ` ∃(λx. ∀(λy. f y x)) : s  (Left)
    ∃ : someone, (f : saw, ∀ : everyone) ` ∃(λx. ∀(λy. f y x)) : s        (Root)

Figure 4.6. Linear scope for Someone saw everyone

ones using q in a system where q is built in, such as with the q E rule in (3.132). The quantifier that takes wider scope is the one that takes its context as argument lower in the proof.

The derivations in this section illustrate the role of 1 in our system. In formal logic, 1 is a nullary mode, just as the empty type environment · in Section 2.3 is a nullary mode, and as the default mode , and the continuation mode } are two binary modes. The computational analogue of Θ } 1 is to enclose a programming-language expression (corresponding to Θ) in a control delimiter: On one hand, the top-to-bottom direction of the Root rule says that Θ } 1 entails Θ. Using Root to infer from top to bottom, we can prove that the null context 1 denotes the identity


    ∀ : everyone ` ∀ : s( (np)s)
    ∃ : someone ` ∃ : s( (np)s)
    x : np, (f : saw, y : np) ` f y x : s                                 ((4.14) on page 99)
    (x : np, (f : saw, y : np)) } 1 ` f y x : s                           (Root)
    x : np } ((f : saw, y : np), 1) ` f y x : s                           (Left)
    (f : saw, y : np), 1 ` λx. f y x : np)s                               () I)
    ∃ : someone } ((f : saw, y : np), 1) ` ∃(λx. f y x) : s               (( E)
    (∃ : someone, (f : saw, y : np)) } 1 ` ∃(λx. f y x) : s               (Left)
    (f : saw, y : np) } (1, ∃ : someone) ` ∃(λx. f y x) : s               (Right)
    y : np } ((1, ∃ : someone), f : saw) ` ∃(λx. f y x) : s               (Right)
    (1, ∃ : someone), f : saw ` λy. ∃(λx. f y x) : np)s                   () I)
    ∀ : everyone } ((1, ∃ : someone), f : saw) ` ∀(λy. ∃(λx. f y x)) : s  (( E)
    (f : saw, ∀ : everyone) } (1, ∃ : someone) ` ∀(λy. ∃(λx. f y x)) : s  (Right)
    (∃ : someone, (f : saw, ∀ : everyone)) } 1 ` ∀(λy. ∃(λx. f y x)) : s  (Right)
    ∃ : someone, (f : saw, ∀ : everyone) ` ∀(λy. ∃(λx. f y x)) : s        (Root)

Figure 4.7. Inverse scope for Someone saw everyone

continuation λx. x, of type T)T for any T , as follows.

(4.20)
    x : T ` x : T              (Id)
    x : T } 1 ` x : T          (Root)
    1 ` λx. x : T)T            () I)

Conversely, the bottom-to-top direction of the Root rule says that Θ entails Θ } 1, so Θ can apply outward to the null context. In other words, Root from bottom to top inserts an expression in a control delimiter (like the # rule in Figure 3.6 on page 72), while Root from top to bottom removes a control delimiter (like the computation steps for # in (3.12) on page 57 and Figure 3.3 on page 63). Together, the two directions of Root in this natural-language fragment correspond to the null-context inference rule

(4.21)
    ─────
     [ ]

in the small-step operational semantics in Chapters 2 and 3.

Whereas Root manages the null context 1, the other structural rules Left and Right manage contexts of other forms. More precisely, Left manages contexts of the form ∆[[ ],Π], and Right manages contexts of the form ∆[Θ, [ ]], where Θ and Π are two (environments that represent) expressions. If we were to define


contexts in the style of (4.21), Left and Right would correspond respectively to

(4.22)      ∆[ ]               ∆[ ]
           —————————          —————————
           ∆[[ ], Π]          ∆[Θ, [ ]]

In words, a context may descend recursively into either branch of a constituent built with the default mode.

The rules (4.21) (Root) and (4.22) (Left and Right) specify a notion of context that lets [ ] appear anywhere in a type environment built up using the default mode. In other words, they allow a graph to be decomposed at any edge. Hence this treatment of gapped clauses and quantification (unlike several other type-logical treatments) works uniformly no matter where the gap or quantifier appears in the clause over which it takes scope: at the left as in Everyone saw Bob, at the right as in Alice saw everyone, or in the middle as in Alice introduced everyone to Bob. It also does not matter how or how deeply the gap or quantifier is embedded. In particular, our system handles quantificational noun phrases in possessive position without further stipulation, as in Everyone’s mother saw Bob. Every type-logical treatment of quantification we are aware of (that is more sophisticated than the straw-man proposal in Section 3.3.1) enjoys this success, but it is by no means typical of linguistic frameworks other than type-logical grammar, which associate syntax with semantics less tightly.

Compared to other type-logical treatments of quantification (especially other implementations of q), ours uniquely represents the scope of a quantifier “inside-out”, as explained in Section 4.1. We deal with in-situ quantification not by moving the quantifier to the edge of its scope, as is standard, but by viewing the same graph from the perspective of the continuation node. This view makes it easy to come up with the structural rules Root, Left, and Right above as well as the refinements and generalizations in the sections below.
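Computationally, this uniformity reflects the fact that one and the same context-consuming everyone can be handed a context with its hole at the left, at the right, or embedded in possessive position. A minimal Haskell sketch under an assumed toy lexicon:

    type E = String

    domain :: [E]
    domain = ["Alice", "Bob", "Carol"]

    everyone :: (E -> Bool) -> Bool
    everyone k = all k domain

    saw :: E -> E -> Bool                -- saw y x means "x saw y"
    saw y x = x /= y                     -- a dummy relation

    mother :: E -> E
    mother x = "mother of " ++ x         -- a dummy function

    s1, s2, s3 :: Bool
    s1 = everyone (\x -> saw "Bob" x)            -- Everyone saw Bob
    s2 = everyone (\y -> saw y "Alice")          -- Alice saw everyone
    s3 = everyone (\x -> saw "Bob" (mother x))   -- Everyone's mother saw Bob

    main :: IO ()
    main = mapM_ print [s1, s2, s3]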

4.3. Refining the analysis of quantification by refining notions of context

This section extends the analysis of quantification above to account for two kinds of additional empirical observations. First, we discuss situations in which one quantificational noun phrase contains another, as in every dean of a school. The challenge is to understand how every can take scope over (or in computational terms, be evaluated before) something it contains. Second, we discuss scope islands: expressions that limit the scope of quantifiers they contain. For instance, it is generally assumed that everyone in Someone thought everyone left cannot take scope outside of the embedded clause everyone left, and hence cannot take inverse scope over someone. If so, then the embedded clause is a scope island, and the challenge is to enforce its islandhood.

In both cases, the empirical observations we discuss are well studied in the literature. Our contribution is to relate them to the operational semantics of


programming languages: We restrict what a quantificational expression can take scope over by restricting what we allow as a legitimate context that the expression can access. This is the same strategy by which Section 3.1.2 enforces evaluation order in a toy programming language, and by which Section 4.4 below enforces evaluation order in a natural-language fragment.

As explained in Section 4.2 above, the structural rules Root, Left, and Right specify a notion of context by relating the continuation mode } to other modes. We can change these rules to change what is allowed as a context: adding structural rules broadens the notion to include more scope-taking possibilities; removing structural rules constrains the notion to include fewer scope-taking possibilities. (Different notions of context can be mixed in the same grammar by using one }-like mode for each notion, though we have no need to do so here.) We demonstrate each linguistic application of this flexibility by making a separate amendment to the same basic system presented above. These amendments are compatible in the sense that it is straightforward to combine them into one grammar that covers all of the applications at once.

4.3.1. Restrictors. Having treated simple quantificational noun phrases like everyone and someone in Section 4.2, we now turn to quantifiers that combine with a common noun (like school) or a more complex restrictor constituent, as in most schools or every dean of a school. (For simplicity, we ignore the difference between plural and singular nouns, like schools and school. We also treat dean of as a single word.) We assume that common nouns are predicates, in that they denote functions from individuals (type np) to propositions or truth values (type s). However, they cannot directly combine with noun phrases in the default mode—that is why *Harvard school is unacceptable; it does not mean that Harvard is a school. Thus, as promised in Section 2.4.1, we introduce a new binary mode n (the letter n being mnemonic for “noun”) and assign common nouns to the type np\ns.4 (For intuition, we can imagine that is a in Harvard is a

4We cannot reuse the continuation mode } for this new mode and assign common nouns to the type np)s. If we did, then the Rightn rule that we are about to add in (4.26) would predict incorrectly that the strings Alice is a dean of Harvard and *Harvard is an Alice dean of are equally acceptable and have the same meaning, as follows.

    Alice } (dean of, Harvard)
    (Alice } (dean of, Harvard)) } 1        [Root]
    ((Alice, dean of) } Harvard) } 1        [Left]
    Harvard } (1 } (Alice, dean of))        [Rightn]
    Harvard } ((1, Alice) } dean of)        [Left]
    Harvard } (Alice } (dean of, 1))        [Right]
    Harvard } ((Alice, dean of) } 1)        [Left]
    Harvard } (Alice, dean of)              [Root]


e : every ⊢ e : (s( (np)s))/(np\ns)        s : school ⊢ s : np\ns
e : every, s : school ⊢ es : s( (np)s)                          [/E]
x : np ⊢ x : np   [Id]        g : griped ⊢ g : np\s
x : np, g : griped ⊢ gx : s                                     [\E]
(x : np, g : griped) } 1 ⊢ gx : s                               [Root]
x : np } (g : griped, 1) ⊢ gx : s                               [Left]
g : griped, 1 ⊢ λx. gx : np)s                                   [)I]
(e : every, s : school) } (g : griped, 1) ⊢ es(λx. gx) : s      [(E]
((e : every, s : school), g : griped) } 1 ⊢ es(λx. gx) : s      [Left]
(e : every, s : school), g : griped ⊢ es(λx. gx) : s            [Root]

Figure 4.8. Every school griped

school converts school, of type np\ns as just described, to is a school, of type np\s like an ordinary verb phrase.) The new mode n is internal, so the direction of this slash is arbitrary, but we choose it for the notation to match the type np\s of an intransitive verb like slept.

A quantifier like every combines with a noun to the right to form a quantificational noun phrase. Because nouns have the type np\ns and quantificational noun phrases have the type s( (np)s), we let every take the type

(4.23)      (s( (np)s))/(np\ns).

Figure 4.8 straightforwardly derives the sentence Every school griped. If we add the lexical items

(4.24)      dean of ⊢ (np\ns)/np,
(4.25)      Harvard ⊢ np,

then essentially the same derivation generates Every dean of Harvard griped.

How does the new binary mode n interact with the continuation mode }? We broaden our notion of context to allow descending recursively into either branch of a constituent built with the n mode. To do so, we add the following rules alongside those in (4.10) on page 97.

(4.26)      Γ[Θ } (Π ,n ∆)] ⊢ T                Γ[(Θ ,n Π) } ∆] ⊢ T
           ==================== Leftn         ==================== Rightn
           Γ[(Θ ,n Π) } ∆] ⊢ T                 Γ[Π } (∆ ,n Θ)] ⊢ T

If we were to define contexts in the style of (4.21) and (4.22), these new rules would correspond respectively to

(4.27)      ∆[ ]               ∆[ ]
          ——————————         ——————————
          ∆[[ ] ,n Π]         ∆[Θ ,n [ ]]


a : a ⊢ a : (s( (np)s))/(np\ns)        s : school ⊢ s : np\ns
a : a, s : school ⊢ as : s( (np)s)                                      [/E]
x : np ⊢ x : np   [Id]
d : dean of ⊢ d : (np\ns)/np        y : np ⊢ y : np   [Id]
d : dean of, y : np ⊢ dy : np\ns                                        [/E]
x : np ,n (d : dean of, y : np) ⊢ dyx : s                               [\nE]
(x : np ,n (d : dean of, y : np)) } 1 ⊢ dyx : s                         [Root]
(d : dean of, y : np) } (1 ,n x : np) ⊢ dyx : s                         [Rightn]
y : np } ((1 ,n x : np), d : dean of) ⊢ dyx : s                         [Right]
(1 ,n x : np), d : dean of ⊢ λy. dyx : np)s                             [)I]
(a : a, s : school) } ((1 ,n x : np), d : dean of) ⊢ as(λy. dyx) : s    [(E]
(d : dean of, (a : a, s : school)) } (1 ,n x : np) ⊢ as(λy. dyx) : s    [Right]
(x : np ,n (d : dean of, (a : a, s : school))) } 1 ⊢ as(λy. dyx) : s    [Rightn]
x : np ,n (d : dean of, (a : a, s : school)) ⊢ as(λy. dyx) : s          [Root]
d : dean of, (a : a, s : school) ⊢ λx. as(λy. dyx) : np\ns              [\nI]

Figure 4.9. Letting a school take narrow scope within dean of a school

A welcome consequence of this revision is that a scope ambiguity between two quantifiers is predicted even when one is located within the other’s restrictor. This ambiguity is illustrated in the following sentence.

(4.28) Every dean of a school griped.

As Dalrymple et al. note (1999; Section 2.5.1), it can be tricky to account for linear scope in this sentence because there is no pronounced clause (“syntactic unit at the f-structure level”) over which a school can take scope in the restrictor dean of a school. Nevertheless, the “imaginary clause” s in the type np\ns for nouns suffices for a school to take scope over. Figure 4.9 proves that dean of a school has the type np\ns, just like a common noun. The Rightn structural rule is crucial to this derivation. As the use of (E in Figure 4.9 indicates, a school takes scope entirely within the complex “noun” dean of a school, so inserting this proof into a derivation for (4.28) generates the linear-scope reading. The inverse-scope reading is also generated, without using either rule in (4.26). The top of that derivation proves

(4.29)      (every, (dean of, np)), griped ⊢ s.

4.3.2. Islands. If we add a new mode of combination to a grammar without also adding any structural rule relating the new mode to the continuation mode }, then contexts would be unable to cross the new mode. That is, the new mode would be an island.


For example, many linguists believe that a quantificational noun phrase cannot take scope outside of a tensed clause. A tensed clause is a clause whose main verb is inflected for tense. For example, the clause everyone left in the sentence

(4.30) Someone thought everyone left

is tensed, but the clause everyone to leave in the sentence

(4.31) Someone asked everyone to leave

is untensed. The usual way in type-logical grammar to make tensed clauses into scope islands is using an external unary mode i, whose unary connectives are ◇i and □↓i. For instance, if the type of thought is (□↓i(np\s))/s, then Someone thought everyone left has only the linear-scope reading, on which someone takes wide scope over everyone. This is because

(4.32)      np, (thought, (np, left)) ⊢ s

is not derivable—only

(4.33)      np, ⟨thought, (np, left)⟩i ⊢ s

is derivable, using the □↓iE rule as follows.

(4.34)
    x : np ⊢ x : np   [Id]        t : thought ⊢ t : (□↓i(np\s))/s
    y : np ⊢ y : np   [Id]        l : left ⊢ l : np\s
    y : np, l : left ⊢ ly : s                                       [\E]
    t : thought, (y : np, l : left) ⊢ t(ly) : □↓i(np\s)             [/E]
    ⟨t : thought, (y : np, l : left)⟩i ⊢ t(ly) : np\s               [□↓iE]
    x : np, ⟨t : thought, (y : np, l : left)⟩i ⊢ t(ly)x : s         [\E]

Without a structural rule relating the unary mode i to the binary mode }, everyone in the embedded tensed clause cannot cross the i mode, which it must to access its entire surrounding context np, ⟨thought, ([ ], left)⟩i. In other words, thought everyone left is a scope island.

Another way to make tensed clauses be scope islands is to assign thought the type (np\s)/◇is, so that (4.32) is not derivable (nor is (4.33)) but

(4.35)      np, (thought, ⟨np, left⟩i) ⊢ s

is derivable, using the ◇iI rule as follows.

(4.36)
    x : np ⊢ x : np   [Id]        t : thought ⊢ t : (np\s)/◇is
    y : np ⊢ y : np   [Id]        l : left ⊢ l : np\s
    y : np, l : left ⊢ ly : s                                       [\E]
    ⟨y : np, l : left⟩i ⊢ ly : ◇is                                  [◇iI]
    t : thought, ⟨y : np, l : left⟩i ⊢ t(ly) : np\s                 [/E]
    x : np, (t : thought, ⟨y : np, l : left⟩i) ⊢ t(ly)x : s         [\E]


Again, without a structural rule relating the unary mode i to the binary mode }, everyone in the embedded tensed clause cannot access the context np, (thought, ⟨[ ], left⟩i). In other words, everyone left is a scope island.

Because we have only considered quantifiers that take scope over a clause (type s), it makes no observable difference whether the verb phrase thought everyone left or the clause everyone left is a scope island. If we have reason to treat an expression E as a quantifier that takes scope over a verb phrase (type np\s) rather than a clause—perhaps E is an adverbial phrase of some sort—then we may be able to distinguish between the two kinds of scope islands by checking whether E can take scope over the entire verb phrase thought Alice E left. We do not pursue such a test here.

4.4. Linear order and evaluation order

The linear order of quantifiers in a sentence can affect its interpretation. For example, it is frequently observed that Mandarin quantifiers tend to take linear scope (Huang 1982; Aoun and Li 1993), and quantifier scope seems to be overtly expressed by syntactic raising in Hungarian (Szabolcsi 1997). In this section, we model quantifier order by restricting the notion of context, drawing from the restriction of evaluation contexts in Section 3.1.2 to specify evaluation order in a programming language. In Section 4.5 below, we apply the same concept of evaluation order to other linguistic side effects, namely anaphora, interrogation, and polarity sensitivity, which are also sensitive to linear order.

Figure 4.7 on page 102 successfully derives inverse scope for the sentence Someone saw everyone because everyone can take as argument its context

(4.37) (1, someone), saw,

or equivalently,

(4.38) someone, (saw, [ ]).

In general, the scope of one quantifier contains another just in case the latter appears in the context argument of the former. Imagine for the moment that we are studying a language that resembles English but mandates linear scope. To rule out inverse scope, we can refine our notion of context so as to rule out any quantifier linearly located to the left of [ ]. Then (4.38), in which the quantifier someone occurs to the left of [ ], would no longer be a legitimate context, whereas

(4.39) Alice, (saw, [ ])

and

(4.40) [ ], (saw, everyone)

would still be legitimate contexts. (The context (4.39) is used in Figure 4.3 on page 99 and Figure 4.4 on page 100 to derive Alice saw everyone. The


context (4.40) is used in (4.14) on page 99 and Figure 4.6 on page 101 to derive the linear-scope reading of Someone saw everyone.)

We implement this idea using a unary mode. Following terminology in the study of computational side effects, we call an expression pure if it contains no quantifier (that is, incurs no control effect), or impure otherwise. In particular, all values (in the sense of Section 3.1.2) are pure. To distinguish pure expressions from impure ones, we tag the types of pure expressions with a unary mode ◇, or equivalently, the unary structural punctuation ⟨ ⟩. Any formula can be turned pure by embedding it under ⟨ ⟩ using the T rule.

(4.41)      Γ[⟨∆⟩] ⊢ T
           ——————————— T
           Γ[∆] ⊢ T

The T rule is analogous to quotation or staging in programming languages, which turns executable code into static data. Two quotations can be concatenated using the K′ rule.

(4.42)      Γ[⟨Θ, Π⟩] ⊢ T
           ——————————————— K′
           Γ[⟨Θ⟩, ⟨Π⟩] ⊢ T

Whereas the other rules introduced so far allow inference from top to bottom as well as from bottom to top, these rules only allow inference from top to bottom. We do not allow the converse of T because we want to regulate when inverse scope is available: we add a partial converse of T, called Unquote, in (4.47) on page 112 below. The converse of K′ appears harmless but is unneeded, so we leave it out.5

We consider a derivation complete if and only if it culminates in the type ◇s, rather than s as before. The type ◇s signifies a pure clause rather than a quantifier over clauses (propositions). By contrast, everyone and someone are impure: their type ◇s( (np)◇s) is not surrounded by ◇, though the propositions they quantify over and finally produce are pure (hence the ◇ in ◇s).

To rule out inverse-scope contexts like (4.38), we modify our notion of context to require that the left branch of a type environment built up using the default mode be pure before descending recursively into the right branch. That is, we revise our notion of context from (4.21) on page 102 and (4.22) on page 103 to

(4.43)      [ ]            ∆[ ]             ∆[ ]
                         ——————————       ———————————
                         ∆[[ ], Π]        ∆[⟨Θ⟩, [ ]]
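For comparison, here is how the same restriction appears as evaluation contexts for a toy programming language: a minimal Haskell sketch with hypothetical names (it is not the toy language of Chapter 2). A context may descend to the right of a default-mode combination only once the left branch is a value:

    data Expr = Val Int | Add Expr Expr   deriving Show

    -- Contexts in the style of (4.43): AddR may hold only a value on its left.
    data Ctx = Hole | AddL Ctx Expr | AddR Int Ctx   deriving Show

    plug :: Ctx -> Expr -> Expr
    plug Hole       e = e
    plug (AddL c r) e = Add (plug c e) r
    plug (AddR m c) e = Add (Val m) (plug c e)

    -- The restriction makes decomposition deterministic: the hole always
    -- lands on the leftmost redex, enforcing left-to-right evaluation order.
    decompose :: Expr -> Maybe (Ctx, Expr)
    decompose (Val _)               = Nothing
    decompose (Add (Val m) (Val n)) = Just (Hole, Add (Val m) (Val n))
    decompose (Add (Val m) r)       = do (c, x) <- decompose r
                                         Just (AddR m c, x)
    decompose (Add l r)             = do (c, x) <- decompose l
                                         Just (AddL c r, x)

    main :: IO ()
    main = print (decompose (Add (Add (Val 1) (Val 2)) (Val 3)))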

In terms of structural rules, we replace the Right rule from (4.10) on page 97 with

5Moot (2002; pages 39 and 156) explains these rules and their names. Curiously, the present work seems to be the first linguistic application of K′.


someone ⊢ ◇s( (np)◇s)        everyone ⊢ ◇s( (np)◇s)

np, (saw, np) ⊢ s                                   [(4.14) on page 99]
⟨np, (saw, np)⟩ ⊢ ◇s                                [◇I]
⟨np⟩, ⟨saw, np⟩ ⊢ ◇s                                [K′]
⟨np⟩, (⟨saw⟩, ⟨np⟩) ⊢ ◇s                            [K′]
⟨np⟩, (⟨saw⟩, np) ⊢ ◇s                              [T]
(⟨np⟩, (⟨saw⟩, np)) } 1 ⊢ ◇s                        [Root]
(⟨saw⟩, np) } (1, ⟨np⟩) ⊢ ◇s                        [Right′]
np } ((1, ⟨np⟩), ⟨saw⟩) ⊢ ◇s                        [Right′]
(1, ⟨np⟩), ⟨saw⟩ ⊢ np)◇s                            [)I]
everyone } ((1, ⟨np⟩), ⟨saw⟩) ⊢ ◇s                  [(E]
(⟨saw⟩, everyone) } (1, ⟨np⟩) ⊢ ◇s                  [Right′]
(⟨np⟩, (⟨saw⟩, everyone)) } 1 ⊢ ◇s                  [Right′]
(np, (⟨saw⟩, everyone)) } 1 ⊢ ◇s                    [T]
(np, (saw, everyone)) } 1 ⊢ ◇s                      [T]
np } ((saw, everyone), 1) ⊢ ◇s                      [Left]
(saw, everyone), 1 ⊢ np)◇s                          [)I]
someone } ((saw, everyone), 1) ⊢ ◇s                 [(E]
(someone, (saw, everyone)) } 1 ⊢ ◇s                 [Left]
someone, (saw, everyone) ⊢ ◇s                       [Root]

Figure 4.10. Linear scope for Someone saw everyone, under left-to-right evaluation. Bernardi (2002; page 50) shows the ◇I rule used in this derivation (same as ◇R in the Gentzen presentation (Moortgat 1997; Definition 4.16)), as well as the other natural-deduction rules for the unary operators ◇ and □↓, namely ◇E, □↓I (same as □↓R), and □↓E.

a more specific instance Right′.

(4.44)      Γ[Θ } (Π, ∆)] ⊢ T              Γ[(⟨Θ⟩, Π) } ∆] ⊢ T
           ================== Left         ==================== Right′
           Γ[(Θ, Π) } ∆] ⊢ T               Γ[Π } (∆, ⟨Θ⟩)] ⊢ T

With these changes to the grammar, only the linear-scope reading for Someone saw everyone (of type ◇s) remains derivable. Figure 4.10 shows the derivation. It is shaped exactly like Figure 4.6 on page 101, except the T, K′, and ◇I


rules are used after (4.14) on page 99 to prove

(4.45)      ⟨np⟩, (⟨saw⟩, np) ⊢ ◇s.

The inverse-scope derivation in Figure 4.7 on page 102 is no longer valid, because someone is impure and so the Right′ rule in (4.44) does not apply.

What we have just seen is that a linguistic preference for linear scope reflects a computational preference for left-to-right evaluation. Of course, in many languages (including English), inverse scope is available. That does not mean that order-of-evaluation effects are absent—it only means that, if they are present, they are more subtle. In the rest of this section, we examine one way to reintroduce inverse scope that accounts for additional empirical observations in Section 4.5 below.

The basic idea is to treat inverse scope as multistage programming (see Calcagno et al. 2003, Taha and Nielsen 2003, and references therein). A multistage program is a program that generates another program to be run later. The generating program is said to execute in an earlier or outer stage, and the generated program is said to execute in a later or inner stage. More than two stages are also possible. Evaluation in each stage is ordered separately: later-stage code runs only after earlier-stage code generates it. Informally, then, we can treat the inverse-scope reading of Someone saw everyone as the following outer-stage program.

(4.46) Run the program consisting of the word someone, the word saw, and everyone.

Serifs are significant here: The sentence above only uses one quantifier (everyone). It also mentions a word (someone) that is a quantifier, but does not use it. If the domain of people under discussion is Alice, Bob, and Carol, then (4.46) conjoins the meanings of the sentences Someone saw Alice, Someone saw Bob, and Someone saw Carol. These three sentences are the inner-stage programs.

A typical multistage programming language provides facilities for:

• creating programs (by quotation or staging);
• combining programs (by concatenation); and
• running programs (by unquotation or evaluation).6

Intuitively, a quoted value is an inactive piece of program text that does not execute until the “wrapping” ◇ (or equivalently ⟨ ⟩) is removed. Removing the wrapping turns inactive data into active code.

6We call this step “unquotation” despite the fact that “evaluation” or “eval” is the commonly used term for the same concept in the staged programming literature, to avoid confusion with the concept of evaluation order just discussed.
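A minimal Haskell sketch of these three facilities, and of reading (4.46) as a two-stage program. Everything here (the Code type, the run, quote, and splice names, the toy lexicon) is a hypothetical stand-in, and the staging is deliberately degenerate (Code a just wraps an a), so the sketch only records where quotation, concatenation, and unquotation occur:

    newtype Code a = Code { run :: a }            -- run = unquotation

    quote :: a -> Code a                          -- quotation (the T rule)
    quote = Code

    splice :: Code (a -> b) -> Code a -> Code b   -- concatenation (the K' rule)
    splice (Code f) (Code x) = Code (f x)

    domain :: [String]
    domain = ["Alice", "Bob", "Carol"]

    saw :: String -> String -> Bool               -- saw y x: "x saw y"
    saw y x = (x, y) `elem`
      [("Alice", "Alice"), ("Alice", "Bob"), ("Bob", "Carol")]

    someone :: (String -> Bool) -> Bool
    someone k = any k domain

    -- The inverse-scope reading of "Someone saw everyone": the outer stage
    -- runs 'everyone'; for each y it generates and runs the inner-stage
    -- program "someone saw y", which uses (not merely mentions) 'someone'.
    inverseScope :: Bool
    inverseScope = all (\y -> run (quote (someone (\x -> saw y x)))) domain

    main :: IO ()
    main = print inverseScope   -- True with the toy relation above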


The T and K′ rules in (4.41) and (4.42) are our facilities for quoting and concatenating programs, respectively. To run programs, we add the Unquote rule.

(4.47)      Γ[⟨∆⟩u] ⊢ T
           ——————————————— Unquote
           Γ[⟨⟨∆⟩u⟩] ⊢ T

Because the type environment of a quoted program is enclosed in ⟨ ⟩, to run a program—to unquote it—is to remove that ⟨ ⟩. Although quotation applies freely (using T), unquotation does not. Rather, the Unquote rule above only unquotes a type environment of the form ⟨∆⟩u. Here u is a unary mode that marks those types that are allowed as the answer type of a quoted program; this mode can be thought of as a feature (in the linguistic sense) checked by the Unquote rule. In particular, we hereafter take the clause type s to be shorthand for ◇us′ (or ⟨s′⟩u in a type environment), where s′ is an atomic formula distinct from s, so that we can derive s from the quoted clause type ◇s using the Unquote rule.7

Figure 4.11 shows how to derive the inverse-scope reading of Someone saw everyone once again using Unquote. (Because s and ◇s entail each other in the presence of Unquote, we can restore the lexical types of someone and everyone from ◇s( (np)◇s) back to s( (np)s).) As noted above, the Unquote rule is necessary to derive this reading using the Right′ rule in (4.44) on page 110. More precisely, in order for one quantifier to take inverse scope over another, the Unquote rule must apply to the clause produced by the narrower-scope quantifier (here someone).

Our application of multistage programming to natural language is a new contribution that builds on the idea of evaluation order. The unary modes ◇ and ◇u, which we use to encode this application, add considerable complexity to the basic analysis of Someone saw everyone given above in Section 4.2. The payoff, in the next section, is to account for fairly subtle and complex linguistic phenomena by controlling evaluation order. In programming-language theory, modal logic has also been used in type systems for quotation and staging (Davies and Pfenning 1996; Davies 1996; Davies and Pfenning 2001; Goubault-Larrecq 1996a,b,c, 1997). It is unclear whether the uses of unary modes there and here are related. In future work, we hope to investigate this possible relationship and, perhaps through it, explain the structural rules T, K′, and Unquote more systematically.

4.5. Beyond quantification

In this section, we survey continuation-based analyses of several linguistic side effects and transliterate them into type-logical grammar. The phenomena

7In this paper, s is the only type that can be unquoted, that is, the only type of the form ◇uA. A treatment of polarity sensitivity that is less simplistic than the one in Section 4.5.3 calls for multiple clause types that can be unquoted (Shan 2004b).


everyone ⊢ s( (np)s)        someone ⊢ s( (np)s)

np, (saw, np) ⊢ s                                   [(4.14) on page 99]
(np, (saw, np)) } 1 ⊢ s                             [Root]
np } ((saw, np), 1) ⊢ s                             [Left]
(saw, np), 1 ⊢ np)s                                 [)I]
someone } ((saw, np), 1) ⊢ s                        [(E]
(someone, (saw, np)) } 1 ⊢ s                        [Left]
someone, (saw, np) ⊢ s                              [Root]
⟨someone, (saw, np)⟩ ⊢ ◇s                           [◇I]
s ⊢ s   [Id];   ⟨s⟩ ⊢ s                             [Unquote]
⟨someone, (saw, np)⟩ ⊢ s                            [◇E]
⟨someone⟩, ⟨saw, np⟩ ⊢ s                            [K′]
⟨someone⟩, (⟨saw⟩, ⟨np⟩) ⊢ s                        [K′]
⟨someone⟩, (⟨saw⟩, np) ⊢ s                          [T]
(⟨someone⟩, (⟨saw⟩, np)) } 1 ⊢ s                    [Root]
(⟨saw⟩, np) } (1, ⟨someone⟩) ⊢ s                    [Right′]
np } ((1, ⟨someone⟩), ⟨saw⟩) ⊢ s                    [Right′]
(1, ⟨someone⟩), ⟨saw⟩ ⊢ np)s                        [)I]
everyone } ((1, ⟨someone⟩), ⟨saw⟩) ⊢ s              [(E]
(⟨saw⟩, everyone) } (1, ⟨someone⟩) ⊢ s              [Right′]
(⟨someone⟩, (⟨saw⟩, everyone)) } 1 ⊢ s              [Right′]
⟨someone⟩, (⟨saw⟩, everyone) ⊢ s                    [Root]
someone, (⟨saw⟩, everyone) ⊢ s                      [T]
someone, (saw, everyone) ⊢ s                        [T]

Figure 4.11. Inverse scope for Someone saw everyone, under left-to-right evaluation, using unquotation

discussed are quite intricate, and we cannot hope to be comprehensive here—the papers cited provide more details. Our point is that continuation-based analyses account for more empirical observations and with more theoretical unity than previous treatments of the same phenomena in type-logical grammar.

4.5.1. Anaphora and crossover. As alluded to in Section 1.3.1, we follow the view of dynamic semantics (Groenendijk and Stokhof 1991; Heim 1982; Kamp 1981) that anaphora is the storage and retrieval of discourse referents,


analogous to the computational side effect of state. Under this view, anaphora is an ideal arena in which to test the utility of continuations and our side-effect analogy in linguistics, for two reasons. First, continuations model side effects, including state, a prototypical computational side effect (Filinski 1994; Section 5.4). Second, continuations model how multiple side effects interact, and anaphora interacts with quantification and interrogation.

Shan and Barker (2005) argue that evaluation order explains two distinct phenomena in English, crossover and superiority. Here we express that analysis in type-logical grammar and compare our approach to Jäger’s (2001) analysis of crossover. We account for more empirical cases where crossover interacts with wh-raising. We also argue that crossover in the linguistic side effect of anaphora and superiority in the linguistic side effect of interrogation stem from the same underlying mechanism, namely, left-to-right evaluation as implemented in Section 4.4.

Crossover is the name for the fact that, if a quantificational noun phrase serves as the antecedent of a pronoun, the antecedent usually must precede the pronoun (Postal 1971).

(4.48) a. Everyonei saw hisi mother.
       b. *Hisi mother saw everyonei.

We use standard linguistics notation above to indicate the relationship between a pronoun and its antecedent: by attaching the same subscript (such as i) to both. Thus (4.48) indicates that the sentence (4.48a) can correspond to the logical formula

(4.49) ∀x. saw(x, mother(x)),

but (4.48b) cannot correspond to

(4.50) ∀x. saw(mother(x), x).

When anaphora succeeds, the antecedent is said to bind the pronoun. We derive crossover by assuming that people evaluate expressions from left to right by default, and that a quantifier must be evaluated before any pronoun it binds.
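The computational content of that assumption can be sketched in Haskell with anaphora as state, in line with the analogy above; the names and the toy lexicon are illustrative assumptions. An antecedent pushes a discourse referent, a pronoun reads the most recent one, and only the left-to-right order of (4.48a) finds the store populated:

    import Control.Monad.State

    type E = String
    type D a = State [E] a            -- discourse referents as state

    antecedent :: E -> D E            -- a quantifier's witness, stored
    antecedent x = do modify (x :); return x

    pronoun :: D (Maybe E)            -- retrieve the most recent referent
    pronoun = gets (\refs -> case refs of r : _ -> Just r; [] -> Nothing)

    mother :: E -> E
    mother x = x ++ "'s mother"

    sentence :: E -> E -> String      -- sentence obj subj
    sentence obj subj = subj ++ " saw " ++ obj

    -- (4.48a) Everyone_i saw his_i mother: antecedent evaluated first.
    good :: E -> String
    good x = evalState
      (do subj <- antecedent x
          m    <- pronoun
          return (sentence (maybe "???" mother m) subj)) []

    -- (4.48b) *His_i mother saw everyone_i: pronoun evaluated first, unbound.
    bad :: E -> String
    bad x = evalState
      (do m   <- pronoun
          obj <- antecedent x
          return (sentence obj (maybe "???" mother m))) []

    main :: IO ()
    main = mapM_ putStrLn [good "Alice", bad "Alice"]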

To explain crossover, we must first implement anaphora in our grammar. Many implementations of anaphora exist for type-logical grammar and related approaches (Szabolcsi 1989, 1992; Dowty 1992; Moortgat 1996; Morrill 2000; Hepple 1990, 1992; Jacobson 1999, 2000). Jäger (2001) surveys these implementations. He then proposes a new logical connective “|” with special inference rules. Pronouns have type np|np; more generally, T2|T1 means “I behave like something of type T2, but need to be bound by something of type T1”. Jäger’s inference rules allow any expression of type T1 to bind something of type T2|T1 as long as the antecedent precedes the pronoun.


Two features of Jäger’s proposal are most relevant to us: resource duplication and linear order.

Resource duplication: In standard type-logical grammar, every linguistic resource is used exactly once, never duplicated. By contrast, anaphora uses a resource twice: once for the antecedent and then again for the bound pronoun. For example, the variable x in (4.49) is used twice: once as the seer and once as the offspring. To deal with this mismatch, the inference rules for | incorporate a limited8 form of resource duplication.

Linear order: Unlike most accounts of anaphora based on Logical Form (May 1985; Reinhart 1983), Jäger, like us, recognizes the role of linear order. His system requires that the antecedent precede the pronoun it binds, and he gives empirical arguments that anaphora is sensitive to linear order. In particular, his system correctly predicts crossover facts like those in (4.48).

In part because Jäger is primarily interested in resource duplication, he merely stipulates linear order: it follows from nothing, and the rules could just as easily have required that the antecedent follow the pronoun it binds. We explain the role of linear order in anaphora in a deeper way: Our account rules out crossover by assuming that people evaluate expressions from left to right by default.

To help compare our approach with Jäger’s (2001), we also accomplish anaphora by means of a (single) inference rule that incorporates a limited form of resource duplication. However, whereas Jäger extends type-logical grammar with a new logical connective |, we express binding relationships with a new binary mode b (mnemonic for “binding”). For example, the pronoun she has the lexical entry

(4.51)      she ⊢ (np\bT)( (np)T).

Conceptually, np\bT is the type of an expression that would otherwise count as a T, but which contains a pronoun waiting to be bound by an np. The type for she indicates that it behaves locally like an np, but turns an enclosing expression of type T into something of type np\bT. For instance, Bob thought she left would have type np\bs.

8Jäger details logical and linguistic reasons to limit resource duplication. Logically, Jäger’s system preserves the finite-reading property alongside cut elimination, decidability, and the subformula property. Linguistically, the sentence The claimi that someone saw everyone and itsi negation are both true cannot possibly be true; to rule out the spurious reading where someone saw everyone takes, say, linear scope in the antecedent but inverse scope in the bound pronoun it, we must only duplicate formulas like np, not someone, (saw, everyone). Thus the | connective must not obey contravariance: A ⊢ B must not entail np|B ⊢ np|A.


Our binding rule says that a subexpression np within a context ∆[ ] can bind into (that is, bind a pronoun inside) the larger expression ∆[np].

(4.52)      Γ[np ,b (np } ∆)] ⊢ T
           ———————————————————— Bind
           Γ[np } ∆] ⊢ T

This rule duplicates an np resource:9 it says that, whenever two nps in np ,b (np } ∆) suffice to derive T, a single np in np } ∆ will do. In the former type environment np ,b (np } ∆), the second np is an antecedent in the context ∆, and the first np is a copy of the antecedent that provides the bound meaning to the pronoun. Read bottom-up, the Bind rule says that an np in a context ∆ can be duplicated to bind into ∆[np].

To illustrate, Figure 4.12 derives Everyonei saw hisi mother, in which the subexpression everyone binds into the context [ ] saw his mother. On our analysis, the pronoun itself takes scope over its antecedent’s context. In Figure 4.12, everyone binds he, and the context of everyone is [ ] saw his mother. (This context appears at the bottom of the derivation, where everyone enters, as (saw, (he, ’s mother)), 1.) Thus he must take scope over this context—that is, must take as its context np saw [ ]’s mother. (This context appears in the middle of the derivation, where he enters, as ’s mother, ((1, ⟨np⟩), ⟨saw⟩).)

It is always easier to explain why a good derivation works than why a bad sentence has no derivation, but let us offer a few words on why the crossover example *Hisi mother saw everyonei (4.48b) does not have the reading indicated by the subscripts. The crucial difference between (4.48a) and (4.48b) is that everyone is now to the right of the pronoun it is trying to bind. In order for everyone to take scope over the context his mother saw [ ], we must apply the Right′ rule twice (just as in the inverse-scope derivation above in Figure 4.7 on page 102). But the Right′ rule requires quoting his mother and saw with two ◇s. Once ◇s enter the picture, the only way to remove them is the Unquote rule. But since Unquote only applies to expressions of type s, and there is no way to arrive at the type s without first unquoting the pronoun, there is an unresolvable standoff: once the non-unquotable np\bs gets quoted, the derivation is doomed.

9Like Dalrymple et al. (1999), we limit resource duplication to np only—Footnote 8 explains why. This form of limited resource duplication is much more primitive than Jäger’s, but suffices here because we focus on the role of linear order in anaphora and in other linguistic side effects that do not duplicate resources. It would be nice to combine Jäger’s treatment of resource duplication with our treatment of linear order, because the former unifies noun-phrase anaphora with verb-phrase ellipsis while the latter unifies crossover with superiority, but we leave that to future work. One promising bridge between the two treatments is Morrill’s account of anaphora in terms of discontinuous constituency (2000, 2003): perhaps } is associative in nature with 1 as its left as well as right identity, and Jäger’s |L and |R rules can be rephrased as follows.

    Y ⊢ B     Γ[A } B] ⊢ C                An } · · · } A1 } K ⊢ B
    ———————————————————————— |L       ——————————————————————————————————— |R
    Γ[(A|B) } Y] ⊢ C                   (An|C) } · · · } (A1|C) } K ⊢ B|C


everyone ⊢ s( (np)s)        np ⊢ np   [Id]        he ⊢ (np\bs)( (np)s)

np, (saw, (np, ’s mother)) ⊢ s
⟨np, (saw, (np, ’s mother))⟩ ⊢ ◇s                           [◇I]
s ⊢ s   [Id];   ⟨s⟩ ⊢ s                                     [Unquote]
⟨np, (saw, (np, ’s mother))⟩ ⊢ s                            [◇E]
⟨np⟩, ⟨saw, (np, ’s mother)⟩ ⊢ s                            [K′]
⟨np⟩, (⟨saw⟩, ⟨np, ’s mother⟩) ⊢ s                          [K′]
⟨np⟩, (⟨saw⟩, (np, ’s mother)) ⊢ s                          [T]
(⟨np⟩, (⟨saw⟩, (np, ’s mother))) } 1 ⊢ s                    [Root]
(⟨saw⟩, (np, ’s mother)) } (1, ⟨np⟩) ⊢ s                    [Right′]
(np, ’s mother) } ((1, ⟨np⟩), ⟨saw⟩) ⊢ s                    [Right′]
np } (’s mother, ((1, ⟨np⟩), ⟨saw⟩)) ⊢ s                    [Left]
’s mother, ((1, ⟨np⟩), ⟨saw⟩) ⊢ np)s                        [)I]
he } (’s mother, ((1, ⟨np⟩), ⟨saw⟩)) ⊢ np\bs                [(E]
(he, ’s mother) } ((1, ⟨np⟩), ⟨saw⟩) ⊢ np\bs                [Left]
(⟨saw⟩, (he, ’s mother)) } (1, ⟨np⟩) ⊢ np\bs                [Right′]
(⟨np⟩, (⟨saw⟩, (he, ’s mother))) } 1 ⊢ np\bs                [Right′]
(np, (⟨saw⟩, (he, ’s mother))) } 1 ⊢ np\bs                  [T]
(np, (saw, (he, ’s mother))) } 1 ⊢ np\bs                    [T]
np } ((saw, (he, ’s mother)), 1) ⊢ np\bs                    [Left]
np ,b (np } ((saw, (he, ’s mother)), 1)) ⊢ s                [\bE]
np } ((saw, (he, ’s mother)), 1) ⊢ s                        [Bind]
(saw, (he, ’s mother)), 1 ⊢ np)s                            [)I]
everyone } ((saw, (he, ’s mother)), 1) ⊢ s                  [(E]
(everyone, (saw, (he, ’s mother))) } 1 ⊢ s                  [Left]
everyone, (saw, (he, ’s mother)) ⊢ s                        [Root]

Figure 4.12. Everyonei saw hisi mother

It is controversial whether the crossover constraint regulates anaphora according to linear order or according to c-command relations between the quantificational antecedent and the pronoun. We agree with Jäger that linear order is the determining factor, contra Reinhart (1983). Thus we need to say nothing more for our theory to deal with a sentence like Everyone’si mother saw himi, in which the antecedent everyone precedes but does not c-command the pronoun


him (cf. Büring 2001, 2004). One way in which we address Reinhart’s objections to linear order is to distinguish linear order from evaluation order: Our claim is that anaphora depends on evaluation order, which is based on linear order but may differ from linear order due to certain syntactic constructions. For example, the pied binding examples in Section 4.5.2 below illustrate that a raised-wh question delays the evaluation of the raised wh-phrase. We discuss the role of c-command, linear order, and evaluation order in more detail elsewhere (Shan and Barker 2005; Sections 1.3 and 1.4).

4.5.2. Questions and superiority. We now turn to the linguistic side effect of interrogation, in other words, questions. Given that (as explained in Section 1.1) we measure the success of a natural-language semantic theory by how well it models when an utterance is true, and a question seems neither true nor false, it may be surprising that we consider questions at all. Indeed, questions are one reason why we may need other ways to appraise a semantic theory, such as whether it is felicitous to respond to a given question with a given answer. Questions are also an area of natural language where the boundary between semantics (meaning) and pragmatics (use) is less clear. Nevertheless, we can embed a question in a larger utterance that is true or false to learn indirectly what questions mean. For example, the wh-questions in (4.53) must have different meanings in a sound and compositional semantics, because the sentences in (4.54) may not be all true or all false.

(4.53) a. who saw Bob
       b. who saw Carol
       c. who saw what

(4.54) a. Alice knows who saw Bob.
       b. Alice knows who saw Carol.
       c. Alice knows who saw what.

Motivated by these examples, we restrict our attention in this section to embedded wh-questions, such as those in (4.53). The linguistic generalizations and analyses we discuss carry over to unembedded wh-questions, with minor adjustments to deal with subject-auxiliary inversion and verb forms.

For concreteness, we follow the structured meaning approach (Krifka 2001) and let a question (such as (4.53a)) denote a function from a short answer (such as Alice) to a proposition (such as that Alice saw Bob).10 If the proper noun Bob denotes the individual b, and the transitive verb saw denotes the two-place function f, then we let the single-wh question (4.53a) denote the one-place

10Despite this choice of approach, our basic idea transfers to other approaches to question denotations (Hamblin 1973; Karttunen 1977; Groenendijk and Stokhof 1984, 1997; Nelken and Francez 2000, 2002; Nelken and Shan 2004).


function λx. f b x. We also let the double-wh question (4.53c) denote the two-place function λx. λy. f y x, in which the variable x corresponds to the wh-word who, and the variable y to the wh-word what.
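Rendered as a minimal Haskell sketch under an assumed toy lexicon, these structured meanings are just curried functions from short answers to propositions:

    type E = String
    type Prop = Bool

    saw :: E -> E -> Prop            -- saw y x: "x saw y", matching f y x
    saw y x = (x, y) `elem` [("Alice", "Bob")]

    -- (4.53a) who saw Bob denotes the one-place function λx. f b x.
    whoSawBob :: E -> Prop
    whoSawBob x = saw "Bob" x

    -- (4.53c) who saw what denotes the two-place function λx. λy. f y x.
    whoSawWhat :: E -> E -> Prop
    whoSawWhat x y = saw y x

    -- A short answer is felicitous if applying the question meaning
    -- to the answer yields a true proposition.
    main :: IO ()
    main = print (whoSawBob "Alice", whoSawWhat "Alice" "Bob")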

Just as we introduced a new binary mode b in Section 4.5.1 to express anaphora in syntax, we add a new binary mode ? here to express interrogation in syntax. Conceptually, if S is a type, then np\?S is the type of a question function that maps a short answer of type np to a result of type S. For example, a single-wh question such as (4.53a) has the type np\?s, and a double-wh question such as (4.53c) has the type np\?np\?s.

As mentioned above, we explain superiority just as we explain crossover, using left-to-right evaluation order. Superiority is the name (Chomsky 1973) for the fact (Kuno and Robinson 1972) that a wh-trace usually must precede any additional in-situ wh-phrase. For example, the acceptable question (4.55b) (as in Alice knows who you think saw what) begins with the raised wh-phrase who. As is typical, this raised wh-phrase precedes the gap indicated by __, called a trace. The trace is associated with the raised wh-phrase in the sense that who asks for the seer. After the trace comes the in-situ wh-phrase what. In the unacceptable questions (4.55c) and (4.55d), the in-situ wh-phrase who precedes the trace.11

(4.55) a. who saw what
       b. who you think __ saw what
       c. *what who saw __
       d. *what you think who saw __

We suggest that the in-situ wh-phrase who in (4.55c) and (4.55d) prevents the raised wh-phrase what from associating with its trace because a raised wh-phrase can associate with a trace only if the trace takes widest scope over any additional in-situ wh-phrase. As with crossover in anaphora, we appeal to linear order: we show that a wh-trace can take scope over an in-situ wh-phrase only if the trace comes first.

Of course, before we can explain superiority, we have to treat questions first. A raised-wh question is a wh-question like (4.56), in which a wh-phrase raises to the front and leaves behind a trace.

(4.56) whose mother Alice saw __

In this example, not only does the wh-word whose raise to the front, but the larger phrase whose mother raises together (technically, gets pied-piped) to the front.

11An unembedded question can have a so-called echo-question reading, which we exclude from consideration. For example, despite the unacceptability indicated in (4.55c), the question What did who see __? is perfectly acceptable in response to the question What did Alice see __?, if the speaker of the first question does not know who Alice is or did not hear the word Alice.


Less standard in English is an in-situ-wh question: a wh-question like (4.57), in which the wh-phrase stays in situ.

(4.57) Alice saw whose mother

We now provide lexical entries for who and what. (We analyze the phrase whose mother as who followed by ’s mother.)

(4.58)      who ⊢ (np\?S)( (np)S)
(4.59)      what ⊢ (np\?S)( (np)S)

Here the type variable S ranges over all types, not just s, to accommodate multiple-wh questions. The lexical entries above specify that wh-words take scope in situ, so they by themselves generate in-situ-wh questions. To generate raised-wh questions, we add two structural rules.12

(4.60)      Γ[A ,? (Θ } (Π, ∆))] ⊢ T
           —————————————————————— Trace Left
           Γ[A ,? (Θ, (Π } ∆))] ⊢ T

(4.61)      Γ[A ,? (Π } (∆, ⟨Θ⟩))] ⊢ T
           ——————————————————————— Trace Right′
           Γ[A ,? (Π, (⟨Θ⟩ } ∆))] ⊢ T

Figure 4.13 shows the basic usage scenario for these additions to the grammar. First, the (E inference near the top concludes with the in-situ-wh question (4.57). Then, the entire derivation culminates in the raised-wh question (4.56).

Let us now examine the derivation of a double-wh question that abides by superiority, so that we can see what goes wrong in an attempt to violate superiority. Figure 4.14 derives the question who saw what in (4.55a), in which who is raised and what remains in situ. (We say that who is raised, even though raising who does not affect the sequence of words, because this derivation uses the Trace Left rule near the end. We exemplify raised-wh questions with who saw what for simplicity.) In Figure 4.14, Trace Left applies to the raised wh-phrase who in the context [ ] saw what. In other words, the raised wh-phrase takes scope over the rest of the sentence. Similarly, to derive the superiority violation *what who saw __ in (4.55c), the raised wh-phrase what must take scope over the context who saw [ ]. Just as in the previous section, taking scope over such a context requires using the Right′ rule twice, which introduces two ◇s and necessitates Unquote.

12These two rules are really the composition of the Left and Right′ rules with a single new rule

    Γ[A ,? (Θ } ∆)] ⊢ T
    ———————————————————— Trace
    Γ[A ,? (Θ, (ε } ∆))] ⊢ T

where ε is the empty string, in other words the identity for the default mode. But the default mode in standard type-logical grammar does not have an identity. If multimodal type-logical grammar were to allow identities for all modes and use unary modes to enforce non-emptiness when necessary, then we would be able to replace Trace Left and Trace Right′ with this Trace rule—an appealing prospect.


np ⊢ np   [Id]        who ⊢ (np\?s)( (np)s)

’s mother, ((1, ⟨Alice⟩), ⟨saw⟩) ⊢ np)s              [like the top of Figure 4.12 on page 117]
who } (’s mother, ((1, ⟨Alice⟩), ⟨saw⟩)) ⊢ np\?s             [(E]
np ,? (who } (’s mother, ((1, ⟨Alice⟩), ⟨saw⟩))) ⊢ s         [\?E]
np ,? ((who, ’s mother) } ((1, ⟨Alice⟩), ⟨saw⟩)) ⊢ s         [Left]
np ,? ((who, ’s mother), (⟨saw⟩ } (1, ⟨Alice⟩))) ⊢ s         [Trace Right′]
(who, ’s mother), (⟨saw⟩ } (1, ⟨Alice⟩)) ⊢ np\?s             [\?I]
(who, ’s mother), ((⟨Alice⟩, ⟨saw⟩) } 1) ⊢ np\?s             [Right′]
(who, ’s mother), (⟨Alice⟩, ⟨saw⟩) ⊢ np\?s                   [Root]
(who, ’s mother), (Alice, ⟨saw⟩) ⊢ np\?s                     [T]
(who, ’s mother), (Alice, saw) ⊢ np\?s                       [T]

Figure 4.13. Deriving whose mother Alice saw __ from Alice saw whose mother

But questions cannot be unquoted: their types are of the form A\?S, not ◇uS. Superiority is thus enforced.

For Jäger, a raised wh-phrase has the type q/(np ↑ s), where q is the type of questions and np ↑ s is the type of an s with an np-trace. Because the mechanism that associates a raised wh-phrase with its trace is separate from anaphora, whatever might explain superiority in Jäger’s system will be independent of his explanation for crossover. Our system uses continuations for anaphora as well as to associate a raised wh-phrase with its trace. Hence evaluation order explains crossover as well as why an intervening in-situ wh-phrase prevents a raised wh-phrase from associating with its trace.

It is debatable whether we should account for crossover and superiority with a unified explanation. Further investigation may reveal empirical arguments that these two phenomena are in fact distinct. One empirical observation that favors our unified treatment is what we call pied binding, a pattern of interaction between wh-raising and crossover. To start with, the typical raised wh-phrase can bind a pronoun only if the trace of the wh-phrase precedes the pronoun.

(4.62) whoi __ saw hisi mother
(4.63) *whoi hisi mother saw __

One popular explanation since Reinhart’s work (1983) is that it is the trace that binds the pronoun. In Jäger’s system (his pages 124–125), the trace can indeed bind the pronoun in examples such as (4.62). Jäger’s system also explains why


np ⊢ np   [Id]        who ⊢ (np\?(np\?s))( (np)(np\?s))        what ⊢ (np\?s)( (np)s)

np, (saw, np) ⊢ s                                   [(4.14) on page 99]
⟨np, (saw, np)⟩ ⊢ ◇s                                [◇I]
s ⊢ s   [Id];   ⟨s⟩ ⊢ s                             [Unquote]
⟨np, (saw, np)⟩ ⊢ s                                 [◇E]
⟨np⟩, ⟨saw, np⟩ ⊢ s                                 [K′]
⟨np⟩, (⟨saw⟩, ⟨np⟩) ⊢ s                             [K′]
⟨np⟩, (⟨saw⟩, np) ⊢ s                               [T]
(⟨np⟩, (⟨saw⟩, np)) } 1 ⊢ s                         [Root]
(⟨saw⟩, np) } (1, ⟨np⟩) ⊢ s                         [Right′]
np } ((1, ⟨np⟩), ⟨saw⟩) ⊢ s                         [Right′]
(1, ⟨np⟩), ⟨saw⟩ ⊢ np)s                             [)I]
what } ((1, ⟨np⟩), ⟨saw⟩) ⊢ np\?s                   [(E]
(⟨saw⟩, what) } (1, ⟨np⟩) ⊢ np\?s                   [Right′]
(⟨np⟩, (⟨saw⟩, what)) } 1 ⊢ np\?s                   [Right′]
⟨np⟩ } ((⟨saw⟩, what), 1) ⊢ np\?s                   [Left]
np } ((⟨saw⟩, what), 1) ⊢ np\?s                     [T]
np } ((saw, what), 1) ⊢ np\?s                       [T]
(saw, what), 1 ⊢ np)(np\?s)                         [)I]
who } ((saw, what), 1) ⊢ np\?(np\?s)                [(E]
np ,? (who } ((saw, what), 1)) ⊢ np\?s              [\?E]
np ,? (who, ((saw, what) } 1)) ⊢ np\?s              [Trace Left]
who, ((saw, what) } 1) ⊢ np\?(np\?s)                [\?I]
who, (saw, what) ⊢ np\?(np\?s)                      [Root]

Figure 4.14. Who saw what

(4.63) is unacceptable: when the trace follows the pronoun, crossover rules out (4.63).

However, more complex examples cast doubt on the hypothesis that it is the trace that binds the pronoun.

(4.64) whosei father Alice said __ saw hisi mother
(4.65) *whosei father Alice said hisi mother saw __

In (4.64) and (4.65), the trace is associated with the entire pied-piped wh-phrase


whose father. That is, it is a father who is seeing or being seen. Yet, as the subscripts in (4.64) indicate, the wh-word who alone can bind the pronoun. This example suggests that the wh-word must be able to bind the pronoun directly after all.

Interestingly, the second example (4.65) shows that wh-binding still requires linear precedence of a sort: the wh-word who can bind the pronoun only if the trace precedes the pronoun. In our system, a wh-phrase can bind a pronoun just as a quantificational noun phrase does. However, just as it is a superiority violation for an in-situ wh-phrase to interfere between a raised wh-phrase and its trace, it is a crossover violation for a pronoun to interfere between an antecedent raised wh-phrase and its trace. Put differently, the raised-wh question (4.65) violates crossover because the corresponding in-situ-wh question Alice said hisi mother saw whosei father violates crossover. In computational terms, the evaluation of a raised wh-phrase is delayed until its corresponding wh-trace. This complex pattern of interaction between wh-phrases and pronouns falls out from our unified account without additional stipulation.

To recap, our analysis of anaphora and interrogation in terms of in-situ quantification holds its own against Jäger’s robust previous analysis: we account for both crossover and superiority, including the interaction between wh-raising and crossover. Before us, many linguists have also noted and tried to explain that the relation between an antecedent and a pronoun that violate crossover is similar to the relation between a wh-trace and an in-situ wh-phrase that violate superiority. Hornstein (1995) and Dayal (1996) separately argue that superiority reduces to crossover, but they rely on encoding multiple-wh questions as functional-wh questions (Chierchia 1991, 1993). O’Neil (1993) stipulates crossover and superiority as two parts of his Generalized Scope Marking Condition, but he does not unify them. By contrast, we unify crossover and superiority under a common hypothesis, namely left-to-right evaluation. This common hypothesis applies across linguistic side effects: not just to crossover in anaphora and superiority in interrogation, but also to the preference for linear scope in quantification, as explained in Section 4.4, and the effect of linear order on polarity sensitivity, which we turn to next.

4.5.3. Polarity sensitivity. The sentences (4.66) and (4.67) have the same truth conditions: knowing whether anyone slept and knowing whether someone slept are both knowing whether there exists an x such that x slept.

(4.66) Alice knows whether anyone slept.
(4.67) Alice knows whether someone slept.

Examples like these make us want to analyze anyone as an existential quantifier just like someone. Unfortunately for the simplicity of our theory, anyone and


someone (and more generally, any and some) are not always interchangeable in their existential usage. For example, whereas the sentence (4.68a) only has an inverse-scope reading ∃y. ∀x. ¬saw(x, y) (following Section 3.3.1’s use of logical formulas to indicate meaning), the sentence (4.69a) only has a linear-scope reading ∀x. ¬∃y. saw(x, y). Moreover, whereas the sentences (4.68b) and (4.68c) are acceptable, their counterparts (4.69b) and (4.69c) with someone replaced by anyone are unacceptable.

(4.68) a. Nobody saw someone.
       b. Everyone saw someone.
       c. Alice saw someone.

(4.69) a. Nobody saw anyone.
       b. *Everyone saw anyone.
       c. *Alice saw anyone.

These differences are termed polarity sensitivity, or polarity licensing, for the following reason (Ladusaw 1979; Krifka 1995; inter alia). The word anyone is a negative polarity item: to a first approximation, it can occur only in a downward-entailing context, such as in the scope of a monotonically decreasing quantifier. (Dually, the word someone is a positive polarity item, but we focus on anyone here.) A quantifier q, of type (np → s) → s, is monotonically decreasing just in case

(4.70)      ∀s1. ∀s2. (∀x. s2(x) ⇒ s1(x)) ⇒ q(s1) ⇒ q(s2).

Here s1 and s2 are properties, that is, of type np → s. According to this approximate generalization, the sentence (4.69a) has a linear-scope reading because the quantifier nobody is monotonically decreasing: letting q be λc. ∀x. ¬c(x) in (4.70) gives

(4.71)      ∀s1. ∀s2. (∀x. s2(x) ⇒ s1(x)) ⇒ (∀x. ¬s1(x)) ⇒ (∀x. ¬s2(x)),

which is a valid formula in second-order predicate logic. For example, let s1 be the property of seeing someone and s2 be the property of seeing a student: because seeing a student entails seeing someone, if nobody has the property of seeing someone then nobody has the property of seeing a student.
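On a finite domain, the definition (4.70) can be checked mechanically. A small Haskell sketch (the four-element domain is an arbitrary assumption; implication is read as the ordering False <= True on Bool):

    import Data.List (subsequences)

    type E = Int

    domain :: [E]
    domain = [0 .. 3]

    nobody, everyone :: (E -> Bool) -> Bool
    nobody   c = all (not . c) domain
    everyone c = all c domain

    -- Every property over the finite domain, as a characteristic function.
    properties :: [E -> Bool]
    properties = [ \x -> x `elem` s | s <- subsequences domain ]

    -- (4.70): whenever s2 entails s1, q(s1) must imply q(s2).
    decreasing :: ((E -> Bool) -> Bool) -> Bool
    decreasing q = and [ q s1 <= q s2
                       | s1 <- properties, s2 <- properties
                       , and [ s2 x <= s1 x | x <- domain ] ]

    main :: IO ()
    main = print (decreasing nobody, decreasing everyone)   -- (True, False)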

By contrast, the sentence (4.69a) has no inverse-scope reading, and (4.69b) and (4.69c) have no reading whatsoever, because those contexts for anyone are not downward-entailing. For instance, the quantifier everyone is not monotonically decreasing: letting q be λc. ∀x. c(x) in (4.70) gives

(4.72)      ∀s1. ∀s2. (∀x. s2(x) ⇒ s1(x)) ⇒ (∀x. s1(x)) ⇒ (∀x. s2(x)),

which is not a valid formula in second-order predicate logic.


So far, we have presented the view that anyone is an existential quantifier that can only be used in certain contexts. In fact, it has been controversial in linguistics for decades whether anyone quantifies existentially, universally, both, or neither. Below we avoid that semantic debate and concentrate on the syntactic relationship between nobody and anyone in (4.69a). We say that nobody licenses anyone: unlike everyone in (4.69b) and Alice in (4.69c), nobody in (4.69a) creates a context in which anyone can occur.

Perhaps because it spans syntax and semantics, polarity sensitivity has been a popular linguistic phenomenon to analyze in logically-motivated approaches to grammar, such as the type-logical (Bernardi 2002; Bernardi and Moot 2001), categorial (Dowty 1994), and lexical-functional (Fry 1997, 1999) traditions. Fry (1997, 1999), Bernardi (2002), and Bernardi and Moot (2001) all split the clause type s into several types, some of which entail others. Fry (1997, 1999) distinguishes between the types s and ℓ( (s ⊗ ℓ) in linear logic, the first of which entails the second. Analogously, Bernardi (2002) and Bernardi and Moot (2001) distinguish between different unary modes applied to s in multimodal type-logical grammar.

A simplistic version of Bernardi’s analysis is to distinguish between the clause types s (“neutral clause”) and □↓p◇ps (“negative clause”). Here ◇p is a new unary mode (the letter p stands for “polarity”), and we abbreviate □↓p◇ps to s−. We choose the type □↓p◇ps so that s ⊢ s− is a theorem in multimodal type-logical grammar.

(4.73)
    s ⊢ s               [Id]
    ⟨s⟩p ⊢ ◇ps          [◇pI]
    s ⊢ □↓p◇ps          [□↓pI]

Splitting the clause type like this helps us account for polarity sensitivity, because we can now assign different types to different quantifiers in the lexicon.

(4.74)      everyone ⊢ s( (np)s)
(4.75)      nobody ⊢ s( (np)s−)
(4.76)      anyone ⊢ s−( (np)s−)

The type of everyone is unchanged: it takes scope over a neutral clause to form a neutral clause. The types of nobody and anyone involve the newly introduced s−: they both take scope over a negative clause, but nobody forms a neutral clause whereas anyone forms a negative clause. As the proof (4.73) shows, a neutral clause can be converted to a negative clause. Thus, as Figure 4.15 shows, anyone can take scope in np, (saw, anyone) to form a negative clause s−, even though the verb saw produces a neutral clause s initially. As before, we consider a derivation complete if it culminates in the type s, not s−. The fact that np, (saw, anyone)


anyone ⊢ s−( (np)s−)

np, (saw, np) ⊢ s                           [(4.14) on page 99]
⟨np, (saw, np)⟩p ⊢ ◇ps                      [◇pI]
np, (saw, np) ⊢ s−                          [□↓pI]
(np, (saw, np)) } 1 ⊢ s−                    [Root]
(saw, np) } (1, np) ⊢ s−                    [Right′]
np } ((1, np), saw) ⊢ s−                    [Right′]
(1, np), saw ⊢ np)s−                        [)I]
anyone } ((1, np), saw) ⊢ s−                [(E]
(saw, anyone) } (1, np) ⊢ s−                [Right′]
(np, (saw, anyone)) } 1 ⊢ s−                [Right′]
np, (saw, anyone) ⊢ s−                      [Root]

Figure 4.15. The negative clause (incomplete sentence) np saw anyone

only has the type s− (as derived above) and not s predicts that a clause like *Alice saw anyone (4.69c) is not a grammatical sentence by itself. Moreover, because there is no way to convert a negative clause to a neutral clause, everyone cannot take scope over a negative clause, so *Everyone saw anyone (4.69b) is ruled out as well. However, nobody (unlike Alice or everyone) can take scope over anyone to form a complete (neutral) clause. Thus even our simplistic rendering of Bernardi’s account predicts correctly that Nobody saw anyone (4.69a) has a linear-scope reading (but no inverse-scope reading). We also predict correctly that the sentence

(4.77) Nobody introduced everyone to anyone.

has no linear-scope reading, and that it does have an interpretation on which nobody scopes over anyone, then everyone. (Kroch (1974), Linebarger (1980), and Szabolcsi (2004) discuss such intervention cases.)
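The division of labor among the types (4.74)–(4.76) can be mimicked in Haskell by letting two newtypes stand in for s and s−; all names here are hypothetical stand-ins, not Bernardi's system. A neutral clause converts into a negative one, as in (4.73), but not conversely, so Nobody saw anyone typechecks while *Everyone saw anyone does not:

    newtype S    = S Bool      -- neutral clause s
    newtype SNeg = SNeg Bool   -- negative clause s-

    neutralToNegative :: S -> SNeg        -- s entails s-, as in (4.73)
    neutralToNegative (S b) = SNeg b

    type E = Int
    domain :: [E]
    domain = [0 .. 2]

    everyone :: (E -> S) -> S             -- (4.74)
    everyone k = S (and [ b | x <- domain, let S b = k x ])

    nobody :: (E -> SNeg) -> S            -- (4.75)
    nobody k = S (and [ not b | x <- domain, let SNeg b = k x ])

    anyone :: (E -> SNeg) -> SNeg         -- (4.76)
    anyone k = SNeg (or [ b | x <- domain, let SNeg b = k x ])

    saw :: E -> E -> S
    saw y x = S (x == y)                  -- a dummy relation

    -- Nobody saw anyone, linear scope: well-typed.
    ok :: S
    ok = nobody (\x -> anyone (\y -> neutralToNegative (saw y x)))

    -- *Everyone saw anyone would require anyone's clause to be neutral:
    -- bad = everyone (\x -> anyone (\y -> neutralToNegative (saw y x)))
    -- does not typecheck, since anyone yields SNeg where everyone needs S.

    main :: IO ()
    main = let S b = ok in print b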

This picture is complicated by the empirical observation that the syntactic relationship between the licensor and the licensee is restricted. For example, (4.69a) above is acceptable—nobody manages to license anyone—but

(4.78) *Anyone saw nobody.

is not. As the contrasts below further illustrate, the licensor usually must precede the licensee. (The examples in (4.80) show that c-command is not at issue, because neither nobody nor anyone c-commands the other. The examples in (4.81) show

Page 137: Linguistic side effects

4.5. Beyond quantification 127

that subject-object asymmetry is not the culprit either, because the only subjectis I.)

(4.79) a. Alice didn’t visit anyone.
       b. *Anyone didn’t visit Alice.

(4.80) a. Nobody’s mother saw anyone’s father.
       b. *Anyone’s mother saw nobody’s father.

(4.81) a. I gave nothing to anyone.
       b. *I gave anything to nobody.
       c. I gave nobody anything.
       d. *I gave anyone nothing.

These and other examples show that the syntactic relationships allowed between licensor and licensee for polarity are similar to those allowed between antecedent and pronoun for anaphora.

Because Fry, Bernardi, and Bernardi and Moot focus on quantification and scope, they easily characterize how nobody must take scope over anyone, but they leave it a mystery why nobody must precede anyone. In particular, their accounts wrongly accept all of (4.78)–(4.81). Ladusaw (1979) notes this mystery in his Inherent Scope Condition: “If the negative polarity item is clausemate with the trigger, the trigger must precede” (Section 4.4). Likewise, Fry (1999; Section 8.2) faults current accounts of polarity sensitivity for ignoring linear order. Ladusaw goes on to speculate that this left-right requirement is related to quantifier scope and sentence processing:

I do not at this point see how to make this part of the Inherent Scope Condition follow from any other semantic principle. This may be because the left-right restriction, like the left-right rule for unmarked scope relations, should be made to follow from the syntactic and semantic processing of sentences . . . . (Section 9.2)

This speculation is our claim: the preference for linear order in quantification and the requirement that a polarity licensor precede its licensee are both due to left-to-right evaluation order. Our type-logical treatment of in-situ quantification and its application to multiple linguistic side effects provide precisely the missing link between linear order and polarity sensitivity.

Recall from Section 4.4 that inverse scope can be generated, even under a regime of left-to-right evaluation, using the Unquote rule in (4.47) on page 112. The crucial step, as illustrated in Figure 4.11 on page 113, is to unquote the clause produced by the narrower-scope quantifier. In that example, everyone can take inverse scope over someone, because someone produces a neutral clause s. A neutral clause can be unquoted because its type s is shorthand for ◇us′, which is enclosed in ◇u. By contrast, a negative clause cannot be unquoted because its type s− is shorthand for □↓p◇p◇us′, which is not enclosed in ◇u. Given that anyone produces s−, then, inverse scope over anyone is impossible. This prediction correctly rules out the unacceptable examples in (4.78)–(4.81) while leaving (4.77) available.

This explanation for linear order in polarity sensitivity is further developed elsewhere (Shan 2004b), including treating someone as a positive polarity item. We are not aware of any other account of polarity sensitivity that unifies its syntactic properties with crossover as we do using left-to-right evaluation.

4.5.4. Reversing evaluation order. One way to check that our implementation of evaluation order unifies linear-order effects in crossover, superiority, and polarity is to impose right-to-left evaluation and see what happens. Because our Left and Right′ rules embody left-to-right evaluation order, we can reverse default evaluation order while leaving all other aspects of the system unchanged.

Specifically, suppose that we replace the Left and Right′ rules in (4.44) on page 110 with

(4.82)
    Γ[Θ } (⟨Π⟩, Δ)] ⊢ T
    ═══════════════════ Left′
    Γ[(Θ, ⟨Π⟩) } Δ] ⊢ T

    Γ[(Θ, Π) } Δ] ⊢ T
    ═════════════════ Right
    Γ[Π } (Δ, Θ)] ⊢ T

and the Trace Left and Trace Right′ rules in (4.60) and (4.61) on page 120 with¹³

(4.83)
    Γ[A ,? (Θ } (⟨Π⟩, Δ))] ⊢ T
    ────────────────────────── Trace Left′
    Γ[A ,? (Θ, (⟨Π⟩ } Δ))] ⊢ T

(4.84)
    Γ[A ,? (Π } (Δ, Θ))] ⊢ T
    ──────────────────────── Trace Right
    Γ[A ,? (Π, (Θ } Δ))] ⊢ T

We can still derive linear and inverse scope for quantifiers, but the predictions concerning the crossover examples in (4.48) on page 114, the superiority examples in (4.55) on page 119, and the polarity examples in (4.78)–(4.81) on pages 126–127 reverse: an antecedent must follow any pronoun it binds, a wh-trace must follow any in-situ wh-phrase, and a polarity licensor must follow any polarity item it licenses.

¹³As one would expect from Footnote 12.


CHAPTER 5

Delimited duality in programming languages

In this chapter, we apply the study of in-situ quantification in natural languages to the study of delimited control in programming languages. We present a programming language with delimited control and equip it with a duality, that is, an involution on programs and their execution. This duality is of interest for two reasons.

First, the duality exchanges call-by-value and call-by-name, two evaluation orders long studied in programming languages. Thus, not only can call-by-value and call-by-name be encoded in each other (Plotkin 1975), but the same encoding works in both directions in a suitable language, so a programmer can choose more easily between them. We are not the first to note that call-by-value and call-by-name are dual to each other (Filinski 1989a,b; Danos et al. 1995; Curien and Herbelin 2000; Selinger 2001; Wadler 2003). Rather, we are the first to construct such a duality in the presence of delimited rather than undelimited control. This difference arises because we draw our inspiration from quantification in natural language: the context over which a quantifier takes scope is delimited.

Second, the duality is the first to exchange contexts using let (defined in (2.32) on page 34) with expressions using shift (defined in Figure 3.9 on page 76). Although both features are useful, let is much more familiar than shift to the typical programmer, who can thus understand shift by way of let and use shift more effectively. The intuition underlying this duality between let and shift is that a shift-expression is a function that applies outward and a let-context is a function that applies inward, like a quantifier and its scope in natural language.
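This outward/inward intuition can be seen in miniature with the Cont monad from Haskell’s standard transformers library (a hedged illustration, not the dual calculus itself): shift receives its delimited context, reified as a function k, and may apply it outward as many times as it likes.

    import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

    -- The shift-expression ξc. (c 3 + c 5) applies its context [ ] * 2 outward:
    outward :: Int
    outward = evalCont $ reset $ do
      x <- shift (\k -> return (k 3 + k 5))  -- k reifies the delimited context
      return (x * 2)                         -- the context that k stands for
    -- outward == 16, that is, (3 * 2) + (5 * 2)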

Two ideas from type-logical grammar motivate the design of our programming language. The first idea is that, given a decomposition of an expression into a subexpression and a context, we can treat both the subexpression and the context as binary trees and depict them as unitrivalent graphs. In particular, according to the structural rules proposed in Section 4.1 in type-logical grammar, the same default binary mode builds up subexpressions in an utterance as well as the delimited contexts over which they may take scope.

The second idea is that, when two syntactic structures combine, either constituent may be a function that takes the other as argument. In particular, the /E and \E rules in Section 2.4 both use the default mode to combine the premises’ type environments, so the direction of function application—that is, which constituent applies to which—can be ambiguous given only the conclusion’s type environment.

For the default mode, which prepends one constituent to another, this bidirectional function application yields a duality between left and right. That is, the reversal involution − on types and type environments defined by

(5.1)
    −A = A,
    −(T2/T1) = −T1\−T2,
    −(T1\T2) = −T2/−T1,
    −(T2/mT1) = −T2/m−T1,
    −(T1\mT2) = −T1\m−T2,
    −(x : T) = x : −T,
    −(Γ, Δ) = −Δ, −Γ,
    −(Γ ,m Δ) = −Γ ,m −Δ

(for any non-default mode m) extends trivially to an involution on grammars and their derivations, by flipping the direction of function application. For the } mode, which plugs one constituent into another, the same bidirectionality yields a duality between inside and outside: the context may apply to the subexpression, as well as vice versa. That is, the reversal involution ¬ on types and type environments defined by

(5.2)
    ¬A = A,
    ¬(T2 ( T1) = ¬T1 ) ¬T2,
    ¬(T1 ) T2) = ¬T2 ( ¬T1,
    ¬(T2/mT1) = ¬T2/m¬T1,
    ¬(T1\mT2) = ¬T1\m¬T2,
    ¬(x : T) = x : ¬T,
    ¬(Γ } Δ) = ¬Δ } ¬Γ,
    ¬(Γ ,m Δ) = ¬Γ ,m ¬Δ

(for any non-continuation mode m) again extends trivially to an involution on grammars and their derivations, by flipping the direction of function application. Our side-effect analogy enables us to transfer this involution to a programming language with delimited control.
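Both involutions are straightforward to mechanize. As a warm-up for the programming language to come, here is a minimal Haskell sketch of the reversal involution − of (5.1), restricted to the default mode (the encoding and constructor names are ours; non-default modes, which − leaves in place, are omitted).

    data Ty = Atom String   -- A
            | Ty :/ Ty      -- T2 / T1
            | Ty :\ Ty      -- T1 \ T2
      deriving (Eq, Show)

    -- −A = A, −(T2/T1) = −T1\−T2, −(T1\T2) = −T2/−T1
    rev :: Ty -> Ty
    rev (Atom a)   = Atom a
    rev (t2 :/ t1) = rev t1 :\ rev t2
    rev (t1 :\ t2) = rev t2 :/ rev t1
    -- rev (rev t) == t for every t, so rev is indeed an involution.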

5.1. Call-by-value versus call-by-name

As mentioned in Section 2.3.1, the simply-typed λ-calculus is confluent, even though the computation relation is not deterministic: the same expression can execute in a variety of ways to yield the same result. We can make execution deterministic by imposing an evaluation order, such as call-by-value, left-to-right evaluation as enforced in Section 3.1.2. As explained in Section 3.1.1, evaluation order affects the final outcome of programs with control operators like abort and shift. Furthermore, in the untyped λ-calculus, not all programs terminate as they do in the simply-typed λ-calculus, and evaluation order affects whether programs can or must terminate. For example, the untyped program

(5.3) (λy. λz. z)((λx. xx)(λx. xx))


Values Γ ⊢V V
    Γ, x ⊢ E ⇒ Γ ⊢V λx. E        Γ, x ⊢V x

Expressions Γ ⊢ E
    Γ ⊢V V ⇒ Γ ⊢ V
    Γ ⊢ E ⇒ Γ ⊢ #E
    Γ, cD ⊢ E ⇒ Γ ⊢ ξc. E
    Γ ⊢ F and Γ ⊢ E ⇒ Γ ⊢ FE
    Γ, cD ⊢ E ⇒ Γ, cD ⊢ #(cE)

Evaluation subcontexts D[ ] : d
    [ ] : d
    Γ ⊢ F and D[ ] : d ⇒ D[F[ ]] : d
    D[ ] : d and Γ ⊢V V ⇒ D[[ ]V] : d

Evaluation contexts C[ ]
    D[ ] : d ⇒ D[ ]
    C[ ] and D[ ] : d ⇒ C[#(D[ ])]

Computation #E ▷ #E′
    #(C[(λx. E′)V]) ▷ #(C[E′ {x ↦ V}])
    #(C[#V]) ▷ #(C[V])
    #(C[#(D[ξc. E′])]) ▷ #(C[#E′ {c[ ] ↦ D[ ]}])

Figure 5.1. An untyped λ-calculus with shift and reset, enforcing call-by-value, argument-to-function evaluation

(based on (2.33) on page 35) computes to itself as well as to the value

(5.4) λz. z.

Hence it initiates an infinite sequence of computation steps as well as finite ones that conclude in a normal form. On one hand, if we restrict the computation relation to rule out the step from (5.3) to (5.4), for example by enforcing call-by-value evaluation, then we prevent the program from stopping. On the other hand, if we rule out the step from (5.3) to itself, then we force the program to stop right away.

In Sections 3.1 and 3.2, we introduced the delimited control constructs #E (called prompt or reset) and ξc. E (called shift) in a programming language based on the simply-typed λ-calculus. Figures 5.1 and 5.2 define two variants of the untyped λ-calculus with shift and reset, with the same set of expressions but different computation relations. Each of Figures 5.1 and 5.2 provides inference rules for four judgment forms.


Values Γ ⊢V V
    Γ, x ⊢ E ⇒ Γ ⊢V λx. E

Expressions Γ ⊢ E
    Γ ⊢V V ⇒ Γ ⊢ V
    Γ, x ⊢ x
    Γ ⊢ E ⇒ Γ ⊢ #E
    Γ, cD ⊢ E ⇒ Γ ⊢ ξc. E
    Γ ⊢ F and Γ ⊢ E ⇒ Γ ⊢ FE
    Γ, cD ⊢ E ⇒ Γ, cD ⊢ #(cE)

Evaluation subcontexts D[ ] : d
    [ ] : d
    D[ ] : d and Γ ⊢ E ⇒ D[[ ]E] : d

Evaluation contexts C[ ]
    D[ ] : d ⇒ D[ ]
    C[ ] and D[ ] : d ⇒ C[#(D[ ])]

Computation #E ▷ #E′
    #(C[(λx. E′)E]) ▷ #(C[E′ {x ↦ E}])
    #(C[#V]) ▷ #(C[V])
    #(C[#(D[ξc. E′])]) ▷ #(C[#E′ {c[ ] ↦ D[ ]}])

Figure 5.2. An untyped λ-calculus with shift and reset, enforcing call-by-name, context-to-function evaluation

The judgment Γ ⊢ E classifies E as a well-formed expression in the type environment Γ. As a special case, the judgment Γ ⊢V V classifies V as a value expression. A type environment Γ is a finite, possibly empty set of variables, some subscripted by D (as in cD) to indicate a context variable. The judgment D[ ] : d classifies D[ ] as an evaluation subcontext. The judgment C[ ] classifies C[ ] as an evaluation context: essentially, a sequence of subcontexts separated by control delimiters.

Compared to the programming language in Figure 3.9 on page 76, these variants are more suitable for our purposes in this chapter because they incorporate two changes. First, we consider an expression to be a complete program only if it is enclosed by #.¹ Thus we define the computation relation only for expressions enclosed by #.

¹If we were considering a delimited control operator (such as Gunter et al.’s cupto (1995, 1998)) that, unlike shift, allows removing a control delimiter around an expression that is not a value, then for an expression to be enclosed by a delimiter would no longer guarantee that the control operators in the expression execute successfully. It would then no longer be appropriate to consider only an expression enclosed by a delimiter as a complete program.


Second, the two variants enforce two orders of evaluation. The variant in Figure 5.1 enforces call-by-value but argument-to-function evaluation. Argument-to-function evaluation means that, when a function F is applied to an argument E, evaluation proceeds from E to F. Because we write F to the left of E as usual, argument-to-function evaluation is right-to-left evaluation—the opposite of left-to-right evaluation as in Chapters 3 and 4. We consider the argument-to-function variant of call-by-value evaluation here because it turns out to be dual to standard (“context-to-function”) call-by-name evaluation.² The latter evaluation order is shown in Figure 5.2. Although shift and reset have been proposed only for call-by-value languages in the literature, we think the system in Figure 5.2 is a reasonable proposal for shift and reset under call-by-name: it extends the call-by-name λ-calculus (Plotkin 1975) and otherwise tracks Figure 5.1.
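To make the contrast concrete, here is a hedged Haskell sketch of one-step reduction for the pure fragment of Figures 5.1 and 5.2: shift and reset are omitted, and substitution is the naive one, which suffices for the closed program (5.3).

    data Exp = Var String | Lam String Exp | App Exp Exp
      deriving (Eq, Show)

    -- Naive substitution; adequate for closed programs like (5.3).
    subst :: String -> Exp -> Exp -> Exp
    subst x e (Var y)   = if x == y then e else Var y
    subst x e (Lam y b) = if x == y then Lam y b else Lam y (subst x e b)
    subst x e (App f a) = App (subst x e f) (subst x e a)

    isValue :: Exp -> Bool
    isValue (Lam _ _) = True
    isValue (Var _)   = True   -- variables are values in Figure 5.1
    isValue _         = False

    -- Call-by-value, argument-to-function (Figure 5.1): evaluate the
    -- argument, then the function, then β-reduce with a value argument.
    stepV :: Exp -> Maybe Exp
    stepV (App f a)
      | not (isValue a) = App f <$> stepV a
      | not (isValue f) = (`App` a) <$> stepV f
    stepV (App (Lam x b) a) | isValue a = Just (subst x a b)
    stepV _ = Nothing

    -- Call-by-name, context-to-function (Figure 5.2): β-reduce as soon as
    -- the function is an abstraction, never evaluating the argument.
    stepN :: Exp -> Maybe Exp
    stepN (App (Lam x b) a) = Just (subst x a b)
    stepN (App f a)         = (`App` a) <$> stepN f
    stepN _ = Nothing

    -- The program (5.3): stepN reaches λz. z in one step, while stepV
    -- loops forever, because the argument Ω steps to itself.
    omega, prog :: Exp
    omega = App w w where w = Lam "x" (App (Var "x") (Var "x"))
    prog  = App (Lam "y" (Lam "z" (Var "z"))) omega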

As explained in Section 3.1.2, call-by-value evaluation means to substitute only values for variables and never evaluate a subexpression inside a body (that is, under a variable binding). By contrast, call-by-name evaluation means to substitute any expressions for variables but never evaluate a subexpression inside a body or an argument. For example, whereas call-by-value evaluation computes

(5.5) #((λy. λz. z)((λx. xx)(λx. xx)))

to itself only, call-by-name evaluation computes it to

(5.6) #(λz. z)

only. To take another example, the program

(5.7) #(let ξc. 1 be x. 2),

which is shorthand for

(5.8) #((λx. 2)(ξc. 1)),

computes to #1 under call-by-value evaluation, but to #2 under call-by-name evaluation.³
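The call-by-value outcome can be checked against the Cont monad from Haskell’s standard transformers library, whose bind sequences effects in the call-by-value style. The following is a hedged rendering of (5.7), not of the figures themselves; the call-by-name outcome #2 has no counterpart here, since Cont fixes the strategy.

    import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

    prog57 :: Int
    prog57 = evalCont $ reset $ do
      _x <- (shift (\_k -> return 1) :: Cont Int ())  -- ξc. 1 discards its context
      return 2                                        -- the body of let [ ] be x. 2
    -- prog57 == 1, matching the call-by-value result in the text.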

The analysis of in-situ quantification in Chapter 4 helps us understand the difference between call-by-value and call-by-name, by treating the plugging of an expression into a subcontext as a mode of syntactic combination (namely }) and by visualizing syntactic combination in terms of unitrivalent graphs.

²Call-by-value, function-to-argument evaluation is dual to a nonstandard (“function-to-context”) kind of call-by-name evaluation.

³Often “call-by-name” is taken to mean that one may substitute any expressions for variables, and may also evaluate a subexpression inside an argument or even a body. Call-by-name as defined here is a more restrictive—in fact deterministic—computation relation.


To take a simple example, the program (5.7) is the result of plugging the expression ξc. 1 into the subcontext let [ ] be x. 2 inside a control delimiter. Using notation from Chapter 4, we can write the program roughly as

(5.9) #(ξc. 1 } let [ ] be x. 2).

Intuitively, to the left of } is a structure with the syntactic type int ( T: it combines with any type T to the right (that is, outside) to produce 1. To the right of } is a structure with the syntactic type T ) int: it combines with any type T to the left (that is, inside) to produce 2. Because T can be any type for each of these constituents, they can combine by either the ( E rule or the ) E rule. Semantically speaking, the constant function returning 1 can apply to the constant function returning 2, as well as vice versa. The duality in (5.2) on page 130 makes the direction of function application ambiguous in (5.9), which we can visualize as the following trivial graph.

(5.10)    ξc. 1 ─── let [ ] be x. 2

To break the symmetry, we can restrict the direction of function application, as follows. If we prohibit taking an argument whose type is of the form T2 ( T1, then let [ ] be x. 2 can no longer apply to the non-value ξc. 1 by the ) E rule, and (5.9) means 1 only, as call-by-value specifies. Dually, if we prohibit taking an argument whose type is of the form T1 ) T2, then ξc. 1 can no longer apply to the non-subcontext let [ ] be x. 2 by the ( E rule, and (5.9) means 2 only, as call-by-name specifies. Intuitively, a shift expression like ξc. 1 is a function that applies outward to the subcontext it is plugged into, as a quantificational noun phrase does. Dually, a let subcontext like let [ ] be x. 2 is a function that applies inward to the subexpression plugged into it, as a gapped clause does. Call-by-value means that the subexpression takes priority in applying to the subcontext, whereas call-by-name means that the subcontext takes priority in applying to the subexpression.

To visualize more complex programs, we need trivalent nodes in our graphs. For example, the program (5.5) on page 133 corresponds to the graph below.

(5.11)    [a unitrivalent graph: the root subcontext [ ] is connected to an application node whose function is λy. λz. z and whose argument is a second application node applying λx. xx to λx. xx]

A trivalent node indicates function application. Because the λ-calculus (unlike the Lambek calculus) distinguishes a function applied by putting it to the left of the argument, the trivalent nodes in this chapter (unlike those in Chapter 4) also distinguish a function applied, by putting its edge perpendicular to the two other edges.

In general, a λ-expression corresponds to a “pipeline” that connects an argument at one end (such as λx. xx in the lower-right corner above) to a subcontext at the other end (such as [ ] in the upper-left corner above). Along the pipeline are zero or more functions (two above), which combine successively with the end argument to yield what plugs into the end subcontext, or dually, combine successively with the end subcontext to yield what the end argument plugs into. Each of these functions in turn forms one side of another pipeline. For example, a trivially short pipeline connects the function λy. λz. z above to the subcontext [ ]((λx. xx)(λx. xx)). Similarly, depicting the program

(5.12) #((fx)(gy))

as the graph

(5.13)    [a unitrivalent graph: the root subcontext [ ] is connected to an application node joining fx to gy; below it, one application node applies f to x and another applies g to y]

makes clear that a pipeline with just the function f connects x to the subcontext [ ](gy).

One way to specify the order of evaluation in graphs like these is to orient the pipelines: for each pipeline, pick one end as the start and the other as the finish. For example, we could orient the graph (5.11) as

(5.14)    [the graph (5.11) with every pipeline oriented toward the root [ ]]

(in which each arrowhead orients one or more colinear edges, which form a pipeline), then consider the parts of the graph for evaluation from start to finish. That is, we first consider the argument λx. xx (which is already a value), then consider the function λx. xx (which applies). Because (λx. xx)(λx. xx) β-reduces to itself, we never get to consider the function λy. λz. z further along the pipeline.


Similarly, if we orient the graph (5.13) as

(5.15)    [the graph (5.13) with every pipeline oriented toward the root [ ]],

it means to apply g to y first, then apply f to x, and finally apply fx to gy. Roughly, to orient every pipeline towards the “root node” [ ] is to mandate call-by-value, argument-to-function evaluation.

Dually, we can orient every pipeline away from the root [ ]. In the case of (5.11), it means to apply λy. λz. z first, yielding the value λz. z.

(5.16)    [the graph (5.11) with every pipeline oriented away from the root [ ]]

In the case of (5.13), it means to apply f first, then the function resulting from fx, without inspecting gy.

(5.17)    [the graph (5.13) with every pipeline oriented away from the root [ ]]

Roughly, to orient every pipeline away from the “root node” [ ] is to mandate call-by-name, context-to-function evaluation.

In the next section, we formalize these intuitions and make the duality between call-by-value and call-by-name explicit in an extended programming language with delimited control.

5.2. Delimited control in a dual calculus

Figure 5.3 defines the dual calculus, a programming language in which each complete program, a statement G, is a pipeline in which zero or more intermediate stages connect a left end to a right end. The intermediate stages and two ends of a pipeline are separated by semicolons ; in between.


Leaf terms Γ ⊢L E;
    Γ, x;, ; c ⊢ G ⇒ Γ ⊢L x.(G).c;        Γ, x; ⊢L x;        Γ ⊢L [];
    Γ ⊢ E; and Γ ⊢ ; D ⇒ Γ ⊢L (E; −; D);
    Γ ⊢ G ⇒ Γ ⊢L #G;
    Γ, ; c ⊢ G ⇒ Γ ⊢L (G).c;

Leaf coterms Γ ⊢L ; D
    Γ, x;, ; c ⊢ G ⇒ Γ ⊢L ; x.(G).c        Γ, ; c ⊢L ; c        Γ ⊢L ; []
    Γ ⊢ E; and Γ ⊢ ; D ⇒ Γ ⊢L ; (E; −; D)
    Γ ⊢ G ⇒ Γ ⊢L ; #G
    Γ, x; ⊢ G ⇒ Γ ⊢L ; x.(G)

Terms Γ ⊢ E;
    Γ ⊢L E; ⇒ Γ ⊢ E;
    Γ ⊢ E; and Γ ⊢ E′; ⇒ Γ ⊢ E; (E′; );
    Γ ⊢ E; and Γ ⊢ ; D′ ⇒ Γ ⊢ E; (; D′);

Coterms Γ ⊢ ; D
    Γ ⊢L ; D ⇒ Γ ⊢ ; D
    Γ ⊢ ; D′ and Γ ⊢ ; D ⇒ Γ ⊢ ; (; D′); D
    Γ ⊢ E′; and Γ ⊢ ; D ⇒ Γ ⊢ ; (E′; ); D

Statements Γ ⊢ G
    Γ ⊢ E; and Γ ⊢L ; D ⇒ Γ ⊢ E; D
    Γ ⊢L E; and Γ ⊢ ; D ⇒ Γ ⊢ E; D
    (These two rules are equivalent.)

Evaluation contexts C[ ]
    [ ]
    C[ ] and Γ ⊢ ; D ⇒ C[#[ ]; D]
    Γ ⊢ E; and C[ ] ⇒ C[E; #[ ]]

Computation G1 ▷ G2
    C[E; (E′; ); D] ▷ C[E′; (E; −; D)]
    C[E; (; D′); D] ▷ C[(E; −; D); D′]
    C[#(E; []); D] ▷ C[E; D]
    C[E; #([]; D)] ▷ C[E; D]
    C[x.(G).c; (E; −; D)] ▷ C[G {x ↦ E} {c ↦ D}]
    C[(E; −; D); x.(G).c] ▷ C[G {x ↦ E} {c ↦ D}]
    C[(G).c; D] ▷ C[G {c ↦ D}]
    C[E; x.(G)] ▷ C[G {x ↦ E}]

Figure 5.3. A dual calculus with delimited control operators

Accordingly, we write a term E—an incomplete pipeline lacking a right end—with a dangling semicolon to the right, and a coterm D—an incomplete pipeline lacking a left end—with a dangling semicolon to the left. Each intermediate stage is in turn a term or a coterm, so a generic statement might look like

(5.18) E1; (E2; ); (; (E3; ); D4); (E5; ); D6


and be depicted as a graph like

(5.19)

E2

��

E5

��E1//

��

D6.

E3oo

D4

The statement (5.18) singles out the long horizontal pipeline in this graph as a distinguished trunk.

We use the metavariable x for a term variable, and the metavariable c for a coterm variable. This convention follows one of two possible, dual intuitions, namely that terms correspond to expressions in the λ-calculus, and coterms to subcontexts. A type environment Γ here is a finite set of variables, each marked as a term variable (x;) or a coterm variable (; c). As defined by the judgment form Γ ⊢L E;, the possible left ends E; are:

• an abstraction x.(G).c;,
• a variable x;,
• the unit [];,
• an activation (E; −; D);, related to subtraction (also known as coimplication) in logic (Crolard 2001, 2004),
• a delimited substatement #G;,
• a shift-term (G).c;.

Dually, the judgment form Γ ⊢L ; D defines the possible right ends ; D to be:

• an abstraction ; x.(G).c,
• a variable ; c,
• the unit ; [],
• an activation ; (E; −; D),
• a delimited substatement ; #G,
• a let-coterm ; x.(G).

An abstraction always takes two arguments x and c at once, one from each side of the pipeline. The abstraction can effectively take just one argument by reinstating the other right away, as in x.(E; c).c and x.(x; D).c. This “dot” notation for abstraction is inspired by Wadler’s (2003).

Be it as a term or a coterm, the unit [] represents the null subcontext [ ] in the λ-calculus. On the other hand, an activation (E; −; D) represents a subcontext of the form D[[ ]E] in the λ-calculus. It can be thought of as a stack frame for invoking a function with two inputs E; and ; D, one of which is the argument and the other is the return address or continuation.


Translating expressions E to terms E*
    (λx. E)* = x.(E*; c).c;
    x* = x;
    (#(cE))* = #(E*; c);
    (ξc. E)* = (E*; []).c;
    (FE)* = E*; (F*; );
    (#E)* = #(E*; []);

Translating evaluation subcontexts D[ ] to coterms D[ ]*
    [ ]* = ; []
    (D[F[ ]])* = ; (F*; ); D[ ]*
    (D[[ ]E])* = ; (E*; −; D[ ]*)

Figure 5.4. Translating the λ-calculus to the dual calculus

Graphically speaking, each left or right end that is not an activation corresponds to a leaf node in a graph. The leaf node corresponding to an abstraction, a delimited substatement, a shift-term, or a let-coterm contains a child graph within. An activation corresponds to coming to a stop at a T-intersection at the end of a pipeline. For example, the statement

(5.20) (E1; (E2; ); −; (E5; ); D6); (E3; ); D4

corresponds to the same graph (5.19) as before, but with the long vertical pipeline singled out as the distinguished trunk.

The judgment form Γ ⊢ G defines statements G formally. A statement is simply the result of plugging a term E; and a coterm ; D together to form a complete pipeline E; D. Crucial to the notation and duality of our language is the fact that concatenating incomplete pipelines is an associative operation. Thus, for example, the statement (5.18) is the result of plugging together

• the term E1; (E2; ); (; (E3; ); D4); (E5; ); and the coterm ; D6 (using the first rule for statements in Figure 5.3 on page 137),
• the term E1; (E2; ); (; (E3; ); D4); and the coterm ; (E5; ); D6,
• the term E1; (E2; ); and the coterm ; (; (E3; ); D4); (E5; ); D6, as well as
• the term E1; and the coterm ; (E2; ); (; (E3; ); D4); (E5; ); D6 (using the second rule for statements in Figure 5.3 on page 137).


Figure 5.4 translates the λ-calculus with shift and reset to the dual calculus. More precisely, it translates expressions E to terms E* and evaluation subcontexts D[ ] to coterms D[ ]*. For example, the λ-expression (5.3) on page 130 translates to the term

(5.21)    ((λy. λz. z)((λx. xx)(λx. xx)))* = x.(x; (x; ); c).c; (x.(x; (x; ); c).c; ); (y.(z.(z; c).c; c).c; );.

Enclosing the expression (5.3) in # to form the program (5.5) on page 133 in the λ-calculus corresponds to plugging together this term and the unit coterm ; [] in the dual calculus. The resulting statement can execute in the call-by-value way, yielding the following infinite loop of computation steps.

(5.22)
    x.(x; (x; ); c).c; (x.(x; (x; ); c).c; ); (y.(z.(z; c).c; c).c; ); []
    ▷ x.(x; (x; ); c).c; (x.(x; (x; ); c).c; −; (y.(z.(z; c).c; c).c; ); [])
    ▷ x.(x; (x; ); c).c; (x.(x; (x; ); c).c; ); (y.(z.(z; c).c; c).c; ); []

It can also execute in the call-by-name way, concluding in a normal form.

(5.23)
    x.(x; (x; ); c).c; (x.(x; (x; ); c).c; ); (y.(z.(z; c).c; c).c; ); []
    ▷ y.(z.(z; c).c; c).c; (x.(x; (x; ); c).c; (x.(x; (x; ); c).c; ); −; [])
    ▷ z.(z; c).c; []

We distinguish between call-by-value and call-by-name execution in the dual calculus only informally here. Section 5.3 below formalizes the distinction.

For another example, the λ-calculus program (5.7) on page 133 corresponds to the statement

(5.24) (1; []).c; (x.(2; c).c; ); []

if we take 1 and 2 to translate to themselves. The call-by-value way to execute this statement binds c to the coterm ; (x.(2; c).c; ); [] and discards it.

(5.25)    (1; []).c; (x.(2; c).c; ); [] ▷ 1; []

The call-by-name way, on the other hand, binds x to the term (1; []).c; and discards it.

(5.26)    (1; []).c; (x.(2; c).c; ); [] ▷ x.(2; c).c; ((1; []).c; −; []) ▷ 2; []

Were we not to treat #(let ξc. 1 be x. 2) as just shorthand for #((λx. 2)(ξc. 1)), we could translate the former expression more faithfully into the dual calculus as the statement

(5.27) (1; []).c; x.(2; []),


¬(x.(G).c;) = ; c.(¬G).x         ¬(; x.(G).c) = c.(¬G).x;
¬(x;) = ; x                      ¬(; c) = c;
¬([];) = ; []                    ¬(; []) = [];
¬((E; −; D);) = ; (¬D; −; ¬E)    ¬(; (E; −; D)) = (¬D; −; ¬E);
¬(#G;) = ; #¬G                   ¬(; #G) = #¬G;
¬((G).c;) = ; c.(¬G)             ¬(; x.(G)) = (¬G).x;
¬(E; (E′; );) = ; (; ¬E′); ¬E    ¬(; (; D′); D) = ¬D; (¬D′; );
¬(E; (; D′);) = ; (¬D′; ); ¬E    ¬(; (E′; ); D) = ¬D; (; ¬E′);
¬(E; D) = ¬D; ¬E

Figure 5.5. Duality in the dual calculus

formed by plugging together the shift-term (1; []).c; and the let-coterm ; x.(2; []). These examples demonstrate that the computation relation for the dual calculus is not confluent.

The translations in Figure 5.4 on page 139 respect substitution: mutual induction on the structure of E′ and D′[ ] shows that

(5.28)    (E′ {x ↦ E})* = E′* {x ↦ E*},        (E′ {c[ ] ↦ D[ ]})* = E′* {c ↦ D[ ]*},

(5.29)    (D′[ ] {x ↦ E})* = D′[ ]* {x ↦ E*},  (D′[ ] {c[ ] ↦ D[ ]})* = D′[ ]* {c ↦ D[ ]*}.

The translation in Figure 5.4 is, as one would expect, only one of two dual translations possible. Figure 5.5 inductively defines an obvious involution ¬ between terms and coterms of the dual calculus, and among its statements. This involution respects the computation relation. Translating expressions E and subcontexts D[ ] from the λ-calculus to coterms ¬E* and terms ¬D[ ]* in the dual calculus works just as well as translating them to terms E* and coterms D[ ]*.
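Figure 5.5 mechanizes directly. The following Haskell sketch is our encoding, not part of the thesis proper: variables are strings, constructor names are ours, and each clause carries the equation from Figure 5.5 that it transcribes. Applying negS twice is the identity, which is easy to test on small statements such as (5.24).

    data Term
      = TAbs String Stmt String   -- x.(G).c;
      | TVar String               -- x;
      | TUnit                     -- [];
      | TAct Term Coterm          -- (E; −; D);
      | TReset Stmt               -- #G;
      | TShift Stmt String        -- (G).c;
      | TPush Term Term           -- E; (E′; );
      | TPushCo Term Coterm       -- E; (; D′);
      deriving (Eq, Show)

    data Coterm
      = CAbs String Stmt String   -- ; x.(G).c
      | CVar String               -- ; c
      | CUnit                     -- ; []
      | CAct Term Coterm          -- ; (E; −; D)
      | CReset Stmt               -- ; #G
      | CLet String Stmt          -- ; x.(G)
      | CPushCo Coterm Coterm     -- ; (; D′); D
      | CPush Term Coterm         -- ; (E′; ); D
      deriving (Eq, Show)

    data Stmt = Stmt Term Coterm  -- E; D
      deriving (Eq, Show)

    negT :: Term -> Coterm
    negT (TAbs x g c)   = CAbs c (negS g) x          -- ¬(x.(G).c;) = ; c.(¬G).x
    negT (TVar x)       = CVar x                     -- ¬(x;) = ; x
    negT TUnit          = CUnit                      -- ¬([];) = ; []
    negT (TAct e d)     = CAct (negD d) (negT e)     -- ¬((E; −; D);) = ; (¬D; −; ¬E)
    negT (TReset g)     = CReset (negS g)            -- ¬(#G;) = ; #¬G
    negT (TShift g c)   = CLet c (negS g)            -- ¬((G).c;) = ; c.(¬G)
    negT (TPush e e')   = CPushCo (negT e') (negT e) -- ¬(E; (E′; );) = ; (; ¬E′); ¬E
    negT (TPushCo e d') = CPush (negD d') (negT e)   -- ¬(E; (; D′);) = ; (¬D′; ); ¬E

    negD :: Coterm -> Term
    negD (CAbs x g c)   = TAbs c (negS g) x          -- ¬(; x.(G).c) = c.(¬G).x;
    negD (CVar c)       = TVar c                     -- ¬(; c) = c;
    negD CUnit          = TUnit                      -- ¬(; []) = [];
    negD (CAct e d)     = TAct (negD d) (negT e)     -- ¬(; (E; −; D)) = (¬D; −; ¬E);
    negD (CReset g)     = TReset (negS g)            -- ¬(; #G) = #¬G;
    negD (CLet x g)     = TShift (negS g) x          -- ¬(; x.(G)) = (¬G).x;
    negD (CPushCo d' d) = TPush (negD d) (negD d')   -- ¬(; (; D′); D) = ¬D; (¬D′; );
    negD (CPush e' d)   = TPushCo (negD d) (negT e') -- ¬(; (E′; ); D) = ¬D; (; ¬E′);

    negS :: Stmt -> Stmt
    negS (Stmt e d) = Stmt (negD d) (negT e)         -- ¬(E; D) = ¬D; ¬E
    -- negS (negS g) == g for every statement g: ¬ is an involution.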

5.3. Restoring confluence

Figure 5.6 shows a restriction of the dual calculus that makes it confluent. It turns out that this restriction enforces call-by-value evaluation under the intuition (Figure 5.4 on page 139) that terms are subexpressions and coterms are subcontexts, but call-by-name evaluation under the dual intuition (Figure 5.5 on page 141) that terms are subcontexts and coterms are subexpressions. The judgment forms Γ ⊢V V; and Γ ⊢V ; U define certain terms and coterms, including abstractions, to be values V; and covalues ; U. Furthermore, in an activation (E; −; D), the term E; must be a value, even though the coterm ; D need not be a covalue.


Values Γ ⊢V V;
    Γ, x;, ; c ⊢ G ⇒ Γ ⊢V x.(G).c;        Γ, x; ⊢V x;        Γ ⊢V [];
    Γ ⊢V V; and Γ ⊢ ; D ⇒ Γ ⊢V (V; −; D);

Covalues Γ ⊢V ; U
    Γ, x;, ; c ⊢ G ⇒ Γ ⊢V ; x.(G).c

Leaf terms Γ ⊢L E;
    Γ ⊢V V; ⇒ Γ ⊢L V;
    Γ ⊢ G ⇒ Γ ⊢L #G;
    Γ, ; c ⊢ G ⇒ Γ ⊢L (G).c;

Leaf coterms Γ ⊢L ; D
    Γ ⊢V ; U ⇒ Γ ⊢L ; U        Γ, ; c ⊢L ; c        Γ ⊢L ; []
    Γ ⊢V V; and Γ ⊢ ; D ⇒ Γ ⊢L ; (V; −; D)
    Γ ⊢ G ⇒ Γ ⊢L ; #G
    Γ, x; ⊢ G ⇒ Γ ⊢L ; x.(G)

Terms Γ ⊢ E;
    Γ ⊢L E; ⇒ Γ ⊢ E;
    Γ ⊢ E; and Γ ⊢ E′; ⇒ Γ ⊢ E; (E′; );
    Γ ⊢ E; and Γ ⊢ ; D′ ⇒ Γ ⊢ E; (; D′);

Coterms Γ ⊢ ; D
    Γ ⊢L ; D ⇒ Γ ⊢ ; D
    Γ ⊢ ; D′ and Γ ⊢ ; D ⇒ Γ ⊢ ; (; D′); D
    Γ ⊢ E′; and Γ ⊢ ; D ⇒ Γ ⊢ ; (E′; ); D

Statements Γ ⊢ G
    Γ ⊢ E; and Γ ⊢L ; D ⇒ Γ ⊢ E; D
    Γ ⊢L E; and Γ ⊢ ; D ⇒ Γ ⊢ E; D
    (These two rules are equivalent.)

Evaluation contexts C[ ]
    [ ]
    C[ ] and Γ ⊢ ; D ⇒ C[#[ ]; D]
    Γ ⊢V V; and C[ ] ⇒ C[V; #[ ]]

Computation G1 ▷ G2
    C[V; (E′; ); D] ▷ C[E′; (V; −; D)]
    C[V; (; D′); D] ▷ C[(V; −; D); D′]
    C[#(V; []); D] ▷ C[V; D]
    C[V; #([]; U)] ▷ C[V; U]
    C[x.(G).c; (V; −; D)] ▷ C[G {x ↦ V} {c ↦ D}]
    C[(V; −; D); x.(G).c] ▷ C[G {x ↦ V} {c ↦ D}]
    C[(G).c; D] ▷ C[G {c ↦ D}]
    C[V; x.(G)] ▷ C[G {x ↦ V}]

Figure 5.6. A dual calculus with delimited control operators, enforcing call-by-value evaluation


The revised computation relation is crafted to substitute only a value for a term variable, even though any coterm may still be substituted for a coterm variable. The notion of a covalue U governs only when to remove the control delimiter from a delimited substatement coterm. It is easy to check that this computation relation is deterministic, hence confluent.

Let us revisit the example statements above. This restriction on the dual calculus rules out the computation sequence (5.23) on page 140, leaving (5.22) on page 140. It also rules out the computation sequence (5.26) on page 140 in favor of (5.25) on page 140. The dual restriction (that is, conjugating Figure 5.6 by the involution ¬ in Figure 5.5 on page 141) makes the opposite choices among these computation sequences. Although the two dual restrictions of the dual calculus do not have the same expressions—an activation must have a value term under call-by-value, versus a covalue coterm under call-by-name—they both contain the image of the translation from the λ-calculus to the dual calculus in Figure 5.4 on page 139 (and the dual of that image).

To formalize how the dual calculus simulates the λ-calculus, let C[ ] be a call-by-value evaluation context in the λ-calculus enclosed by #, say

(5.30)    C[ ] = #(Dn[… [#(D1[#(D0[ ])])] …]).

We write Ch[ ] (h for “head”) for the subcontext D0[ ]. We also write Ct[ ] (t for “tail”) for the context #(Dn[… [#(D1[ ])] …]). Plugging a λ-expression E into C[ ] gives the program C[E]. Correspondingly, we can plug the dual-calculus statement E*; Ch[ ]* into the evaluation context Ct[ ][ ] defined by

(5.31)    Ct[ ][ ] = #(… (#[ ]; D1[ ]*) …); Dn[ ]*.

(In particular, if n = 0 and Ct[ ] is just [ ], then Ct[ ][ ] is also just [ ].) We write E@C[ ] for the resulting statement Ct[ ][E*; Ch[ ]*].

As a special case, if V is a value expression in the call-by-value λ-calculus, then V*; is a value term in the call-by-value dual calculus, and the statement V*; [] is in normal form. As for computation, the following theorem maps each step in the λ-calculus to one or more steps in the dual calculus.

Theorem 5.1 (Call-by-value simulation). Let C1[E1] = C2[E2] in the call-by-value λ-calculus; in other words, suppose that C1[E1] and C2[E2] are two ways to decompose the same λ-expression. Suppose that C2[E2] ▷ C′2[E′2] as follows, by one of the three kinds of computation steps in Figure 5.1 on page 131.

    E2             E′2                     C′2[ ]
    (λx. E′)V      E′ {x ↦ V}              C2[ ]
    #V             V                       C2[ ]
    ξc. E′         E′ {c[ ] ↦ Ch2[ ]}      Ct2[#[ ]]

Then E1@C1[ ] ▷+ E′2@C′2[ ] in the call-by-value dual calculus.


Proof. Inspecting the definition of values and evaluation contexts in Figure 5.1 on page 131 shows that E1 and E2, and accordingly C1[ ] and C2[ ], are related in one of three ways.

In the first case, C[E1] = E2 for some context C[ ]; that is, C1[ ] = C2[C[ ]] for some context C[ ]. We consider a computation step of each kind in turn.

• If E2 = (λx. E′)V, then C[ ] must be [ ], [ ]V, or (λx. E′)[ ], so Ct1[ ] = Ct2[ ]. The computation sequence

(5.32)
    E2@C2[ ] = Ct2[ ][V*; (x.(E′*; c).c; ); Ch2[ ]*]
             = V@C2[(λx. E′)[ ]]
             ▷ Ct2[ ][x.(E′*; c).c; (V*; −; Ch2[ ]*)]
             = (λx. E′)@C2[[ ]V]
             ▷ Ct2[ ][E′* {x ↦ V*}; Ch2[ ]*]
             = E′ {x ↦ V}@C2[ ]
             = E′2@C′2[ ]

in the dual calculus covers all three cases for C[ ]. The second-to-last equality uses the fact that

(5.33)    E′* {x ↦ V*} = (E′ {x ↦ V})*,

from (5.28) on page 141.
• If E2 = #V, then C[ ] must be [ ] or #[ ]. In either case, we have

(5.34)
    E1@C1[ ] = E2@C2[ ]
             = Ct2[ ][#(V*; []); Ch2[ ]*]
             ▷ Ct2[ ][V*; Ch2[ ]*]
             = E′2@C′2[ ].

• If E2 = ξc. E′, then C[ ] must be [ ], and we have

(5.35)
    E1@C1[ ] = E2@C2[ ]
             = Ct2[ ][(E′*; []).c; Ch2[ ]*]
             ▷ Ct2[ ][E′* {c ↦ Ch2[ ]*}; []]
             = E′ {c[ ] ↦ Ch2[ ]}@Ct2[#[ ]]
             = E′2@C′2[ ].

The second-to-last equality uses the fact that

(5.36)    E′* {c ↦ Ch2[ ]*} = (E′ {c[ ] ↦ Ch2[ ]})*,

again from (5.28) on page 141.

In the second case, E1 = C[E2] for some context C[ ]; that is, C2[ ] = C1[C[ ]] for some context C[ ]. We proceed by induction on the structure of E1 (with E2 as the base case), in other words on the structure of C[ ] outside in (with [ ] as the base case).

• If E1 is E2 and C[ ] is [ ], then we have the first case already dealt with above.
• If E1 is F(C0[E2]) or #(C0[E2]), and C[ ] is accordingly F(C0[ ]) or #(C0[ ]), then Figure 5.4 on page 139 confirms that E1@C1[ ] is the same statement as C0[E2]@C1[F[ ]] or C0[E2]@C1[#[ ]]. Hence E1@C1[ ] ▷+ E′2@C′2[ ] by the induction hypothesis.
• If E1 is (C0[E2])V and C[ ] is (C0[ ])V, then

(5.37)
    E1@C1[ ] = Ct1[ ][V*; (C0[E2]*; ); Ch1[ ]*]
             ▷ Ct1[ ][C0[E2]*; (V*; −; Ch1[ ]*)]
             = C0[E2]@C1[[ ]V].

By the induction hypothesis, C0[E2]@C1[[ ]V] ▷+ E′2@C′2[ ]. Hence E1@C1[ ] ▷+ E′2@C′2[ ] as well.

In the third, final case, E1 is a value, and C2[ ] = C′1[C[ ]E1] and C1[ ] = C′1[C[E2][ ]] for some context C[ ]. In this case, E1@C1[ ] is the same statement as C[E2]E1@C′1[ ], and C[E2]E1@C′1[ ] ▷+ E′2@C′2[ ] by the first case already dealt with above. □

Corollary 5.2. Whenever C1[E1] ▷ E′ in the call-by-value λ-calculus, there exists an expression E′2 and an evaluation context C′2[ ] in the call-by-value λ-calculus, such that E′ = C′2[E′2], and E1@C1[ ] ▷+ E′2@C′2[ ] in the call-by-value dual calculus.

Conversely, the following theorems show how each computation step in the dual calculus maps to either zero or one step in the λ-calculus.

Theorem 5.3. If E1@C1[ ] ▷ G in the call-by-value dual calculus, then G = E2@C2[ ] for some expression E2 and some evaluation context C2[ ] in the call-by-value λ-calculus, such that either C1[E1] = C2[E2] or C1[E1] ▷ C2[E2]. In any infinite sequence of steps E1@C1[ ] ▷ E2@C2[ ] ▷ · · ·, there is eventually an index i where Ci[Ei] ▷ Ci+1[Ei+1].

Proof. The first part of the theorem follows from inspecting the computation relation in Figure 5.6 on page 142. Only the first kind of steps listed there, namely

(5.38)    C[V; (E′; ); D] ▷ C[E′; (V; −; D)],

may occur without guaranteeing that C1[E1] ▷ C2[E2]. A step of this kind increases the number of minus signs by 1 while keeping the same number of semicolons in the statement. There is no infinite sequence of such steps because every minus sign must appear between two semicolons, and there are only finitely many semicolons to start with. □

Theorem 5.4. For any expression E and value V in the call-by-value λ-calculus, we have #E ▷∗ #V if and only if E*; [] ▷∗ V*; []. Moreover, #E begins an infinite sequence of computation steps if and only if E*; [] does.

Proof. For any λ-expression E, by definition E*; [] = E@#[ ].

[⇒] By induction via Corollary 5.2 on the number of computation steps in the call-by-value λ-calculus.

[⇐] By induction via Theorem 5.3 on the number of computation steps in the call-by-value dual calculus, and the fact that E2@C2[ ] = V*; [] implies C2[E2] = #V. □

The dual of the call-by-value dual calculus, the call-by-name dual calculus, simulates the call-by-name λ-calculus with shift and reset. The analogous theorems below substantiate this simulation and our claim of duality. Their proofs follow the same structure as the call-by-value ones above, but are somewhat simpler because evaluation contexts are simpler in the call-by-name λ-calculus.

Theorem 5.5 (Call-by-name simulation). Let C1[E1] = C2[E2] in the call-by-name λ-calculus. Suppose that C2[E2] ▷ C′2[E′2] as follows, by one of the three kinds of computation steps in Figure 5.2 on page 132.

    E2             E′2                     C′2[ ]
    (λx. E′)E      E′ {x ↦ E}              C2[ ]
    #V             V                       C2[ ]
    ξc. E′         E′ {c[ ] ↦ Ch2[ ]}      Ct2[#[ ]]

Then E1@C1[ ] ▷+ E′2@C′2[ ] in the call-by-name dual calculus.

Corollary 5.6. Whenever C1[E1] ▷ E′ in the call-by-name λ-calculus, there exists an expression E′2 and an evaluation context C′2[ ] in the call-by-name λ-calculus, such that E′ = C′2[E′2], and E1@C1[ ] ▷+ E′2@C′2[ ] in the call-by-name dual calculus.

Theorem 5.7. If E1@C1[ ] ▷ G in the call-by-name dual calculus, then G = E2@C2[ ] for some expression E2 and some evaluation context C2[ ] in the call-by-name λ-calculus, such that either C1[E1] = C2[E2] or C1[E1] ▷ C2[E2]. In any infinite sequence of steps E1@C1[ ] ▷ E2@C2[ ] ▷ · · ·, there is eventually an index i where Ci[Ei] ▷ Ci+1[Ei+1].

Theorem 5.8. For any expression E and value V in the call-by-name λ-calculus, we have #E ▷∗ #V if and only if E*; [] ▷∗ V*; []. Moreover, #E begins an infinite sequence of computation steps if and only if E*; [] does.


5.4. The continuation-passing-style transform

Figure 5.7 translates the call-by-value dual calculus into the plain λ-calculus without shift or reset. The translation consists of five sets of equations: from values V; to ⌊V;⌋, from terms E; to ⌈E;⌉, from covalues ; U to ⌊; U⌋, from coterms ; D to ⌈; D⌉, and from statements G to ⟦G⟧. A term or coterm translates to an abstraction or variable in the λ-calculus; a statement translates to an application. The term translation is essentially Fischer’s continuation-passing-style transform for the call-by-value λ-calculus (1972); the coterm transform is essentially Plotkin’s continuation-passing-style transform for the call-by-name λ-calculus (1975).
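Restricted to the image of the pure λ-calculus under Figure 5.4, the composite of these translations acts as a Fischer-style call-by-value CPS transform: functions receive their continuation first, and applications evaluate the argument before the function. Here is a hedged Haskell sketch of that composite; the Exp type and the generated names "$k", "$v", and "$f" are ours, and source variables are assumed not to clash with them.

    data Exp = Var String | Lam String Exp | App Exp Exp
      deriving (Eq, Show)

    cps :: Exp -> Exp
    cps (Var x)   = Lam "$k" (Var "$k" `App` Var x)   -- ⌈V;⌉ = λc. c ⌊V;⌋
    cps (Lam x b) = Lam "$k" (Var "$k" `App`          -- ⌊λx. E⌋ = λc. λx. ⌈E⌉ c
                      Lam "$c" (Lam x (cps b `App` Var "$c")))
    cps (App f a) =                                   -- argument, then function,
      Lam "$k" (cps a `App`                           -- then continuation-first call
        Lam "$v" (cps f `App`
          Lam "$f" ((Var "$f" `App` Var "$k") `App` Var "$v")))

    -- Applying cps ((λy. y) z) to an initial continuation λv. v and
    -- β-reducing yields z, as expected of a call-by-value transform.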

The statement translation ⟦G⟧ allows multiple translations for the same statement G, because the same G can be decomposed into E; D in n + 1 different ways if the pipeline G has n intermediate stages. Fortunately, these different translations all β-reduce to each other: it is easy to check that

(5.39)    ⌈E; (E′; );⌉ ⌈; D⌉ ▷ ⌈E;⌉ ⌈; (E′; ); D⌉,        ⌈E; (; D′);⌉ ⌈; D⌉ ▷ ⌈E;⌉ ⌈; (; D′); D⌉.

Hence, induction on the structure of E; gives ⌈E;⌉ ⌈; D⌉ ▷∗ ⌈E′;⌉ ⌈; D′⌉ whenever E′; is a leaf term and E; D = E′; D′. We henceforth let ⟦E; D⟧ be ⌈E′;⌉ ⌈; D′⌉.

The translation of delimited substatements, shift-terms, and let-coterms introduces complex expressions ⟦G⟧ that behave differently under a call-by-value versus a call-by-name interpreter of the target language. Thus, strictly speaking, this translation produces not continuation-passing-style code but continuation-composing-style code. When the source program contains delimited substatements, shift-terms, or let-coterms, we intend for the output of this translation to be interpreted under call-by-value. To achieve indifference between call-by-value and call-by-name interpretation, we could simply apply another pass of the continuation-passing-style transform.

Theorem 5.9. If G ▷ G′ in the call-by-value dual calculus, then ⟦G⟧ ▷+ ⟦G′⟧ in the call-by-value λ-calculus.

Proof. By inspection of Figures 5.6 and 5.7. □

As one would expect, it is easy to check that, given any value V; in the dual calculus, its continuation-passing-style transform ⌊V;⌋ is a value in the λ-calculus, and ⟦V; []⟧ ▷+ ⌊V;⌋.

Corollary 5.10. If G ▷∗ V; [] in the call-by-value dual calculus, then ⟦G⟧ ▷∗ ⌊V;⌋ in the call-by-value λ-calculus. If G begins an infinite sequence of computation steps, then ⟦G⟧ does too.


Value translation ⌊V;⌋
    ⌊x.(G).c;⌋ = λc. λx. ⟦G⟧
    ⌊x;⌋ = x
    ⌊[];⌋ = λf. f
    ⌊(V; −; D);⌋ = λf. f ⌈; D⌉ ⌊V;⌋

Term translation ⌈E;⌉
    ⌈V;⌉ = λc. c ⌊V;⌋
    ⌈#G;⌉ = λc. c ⟦G⟧
    ⌈(G).c;⌉ = λc. ⟦G⟧
    ⌈E; (E′; );⌉ = λc. ⌈E;⌉ (λx. ⌈E′;⌉ (λf. f c x))
    ⌈E; (; D′);⌉ = λc. ⌈E;⌉ (λx. ⌈; D′⌉ (λf. f c x))

Covalue translation ⌊; U⌋
    ⌊; x.(G).c⌋ = λc. λx. ⟦G⟧

Coterm translation ⌈; D⌉
    ⌈; U⌉ = λx. x ⌊; U⌋
    ⌈; c⌉ = c
    ⌈; []⌉ = λx. x
    ⌈; (V; −; D)⌉ = λf. f ⌈; D⌉ ⌊V;⌋
    ⌈; #G⌉ = λx. ⟦G⟧ x
    ⌈; x.(G)⌉ = λx. ⟦G⟧
    ⌈; (; D′); D⌉ = λx. ⌈; D′⌉ (λf. f ⌈; D⌉ x)
    ⌈; (E′; ); D⌉ = λx. ⌈E′;⌉ (λf. f ⌈; D⌉ x)

Statement translation ⟦G⟧
    ⟦E; D⟧ = ⌈E;⌉ ⌈; D⌉

Figure 5.7. A continuation-passing-style transform for the dual calculus


CHAPTER 6

Conclusion

We have identified a general analogy between side effects in programming languages and in natural languages (Section 1.3). In particular, delimited control subsumes other computational side effects as quantification subsumes other linguistic side effects (Section 3.4).

• On one hand, in the quest for a denotational semantics that is sound and compositional, computational and linguistic side effects both make it impossible for every expression to just denote its reference or evaluation result (Section 1.6). In particular, in the standard denotational semantics for delimited control and quantification, the meaning of an expression is a function from a continuation (Section 3.2) or a scope (Section 3.3).
• On the other hand, in the quest for an operational semantics that models how a computer executes a program or how a human processes an utterance, computational and linguistic side effects both make observable the order in which parts of an expression are evaluated. In particular, evaluation order governs what contexts can be accessed by a delimited control operator (Section 3.1) or a quantificational phrase (Section 4.3).

For both kinds of languages, the relation between denotational and operational semantics is a mind-body problem of sorts: how does an expression manage to be both a static product that stands alone mathematically and a dynamic action that takes place physically (Wadler 1997; Trueswell and Tanenhaus 2005)?

We have put this analogy to work in both directions. Drawing on the general notion of computational side effects, our work on natural language extends dynamic semantics (Groenendijk and Stokhof 1991; Heim 1982; Kamp 1981) from whole sentences to parts of a sentence, and from anaphora and existential quantification to other linguistic side effects. Drawing on the continuation semantics for delimited control, our new implementation of quantification in type-logical grammar (Section 4.2) is graphically motivated and does not move nearby constituents apart or distant constituents together (Section 4.1). Drawing on the programming-language concepts of evaluation order and multistage programming, our grammar unifies the preference for linear scope in quantification (Section 4.4) with the prohibition against crossover in anaphora, the superiority constraint on wh-questions, and the effect of linear order on polarity sensitivity (Section 4.5):


• Our treatment of crossover uses left-to-right evaluation to deal with problems for previous accounts based purely on c-command or linear order. It is the first to correctly predict a complex pattern of interaction between anaphora and raised-wh questions, without any stipulation on either.
• Our explanation for superiority unifies it with crossover yet need not encode multiple-wh questions as functional questions.
• Our account of polarity sensitivity is the first to explain the effect of linear order by appealing to processing.

Further connections remain to be drawn and strengthened:

• to other linguistic side effects, such as intensionality, focus, and presuppositions (Section 1.6);
• to psycholinguistic evidence on human sentence processing, in particular how raised-wh questions are processed (Section 4.5.2);
• to types for multistage programming, especially modal types (Section 4.4); and
• to the philosophical notion of illocution.

These connections all serve a broader investigation of operational semantics for natural language.

Drawing from the duality between inside and outside in our treatment of linguistic quantification, our programming language with delimited control in Chapter 5 is the first to make subexpressions and subcontexts dual to each other.

• The duality exchanges call-by-value and call-by-name, two long-studied evaluation orders that had previously been shown dual in the presence of undelimited but not delimited control.
• The duality also exchanges the familiar let construct and the less-familiar shift construct, so that a programmer can understand the latter in terms of the former.

It remains to turn our type-logical grammar for linguistic quantification into a type system for this yet-untyped programming language. Such a type system would answer the open question of how to interpret delimited control logically under the formulas-as-types correspondence. We suspect that the answer will be a noncommutative intuitionistic logic, like type-logical grammar.


Bibliography

Abbott, Michael, Thorsten Altenkirch, Neil Ghani, and Conor McBride. 2003.Derivatives of containers. In TLCA 2003: Proceedings of the 6th internationalconference on typed lambda calculi and applications, ed. Martin Hofmann,16–30. Lecture Notes in Computer Science 2701, Berlin: Springer-Verlag.

Abramsky, Samson, and Guy McCusker. 1997. Game semantics. Presented at the1997 Marktoberdorf Summer School. 26 Sep. 2000, http://web.comlab.ox.ac.uk/oucl/work/samson.abramsky/mdorf97.ps.gz.

Ajdukiewicz, Kasimir. 1935. Die syntaktische Konnexität. Studia Philosophica 1:1–27. English translation: Ajdukiewicz, Kasimir. 1967. Syntactic connexion.In Polish logic, 1920–1939, ed. Storrs McCall, 207–231. New York: OxfordUniversity Press.

Altmann, G. T. M., ed. 1990. Cognitive models of speech processing: Psycholin-guistic and computational perspectives. Cambridge: MIT Press.

Aoun, Joseph, and Yen-hui Audrey Li. 1993. Syntax of scope. Cambridge: MITPress.

Areces, Carlos, and Raffaella Bernardi. 2003. In situ binding: A hybrid ap-proach. In Proceedings of ICoS-4: 4th international workshop on inferencein computational semantics.

Bar-Hillel, Yehoshua. 1953. A quasi-arithmetical notation for syntactic descrip-tion. Language 29(1):47–58.

Barendregt, Henk P. 1981. The lambda calculus: Its syntax and semantics.Amsterdam: Elsevier Science.

Barker, Chris. 2001. Continuations: In-situ quantification without storage ortype-shifting. In Proceedings from Semantics and Linguistic Theory XI, ed.Rachel Hastings, Brendan Jackson, and Zsofia Zvolensky. Ithaca: CornellUniversity Press.

———. 2002. Continuations and the nature of quantification. Natural LanguageSemantics 10(3):211–242.

———. 2004. Continuations in natural language. In CW’04: Proceedings of the4th ACM SIGPLAN continuations workshop, ed. Hayo Thielecke, 1–11. Tech.Rep. CSR-04-1, School of Computer Science, University of Birmingham.

———. 2005. Direct compositionality on demand. In Direct compositionality,ed. Chris Barker and Pauline Jacobson. New York: Oxford University Press.

151

Page 162: Linguistic side effects

152 Bibliography

In press.Barker, Chris, and Chung-chieh Shan. 2005. Types as graphs: Continuations in

type logical grammar. Journal of Logic, Language and Information. In press.Barwise, Jon, and Robin Cooper. 1981. Generalized quantifiers and natural

language. Linguistics and Philosophy 4:159–219.Belnap, Nuel D., Jr. 1982. Display logic. Journal of Philosophical Logic 11(4):

375–417.van Benthem, Johan F. A. K. 1983. The semantics of variety in categorial grammar.

Report 83-29, Department of Mathematics, Simon Fraser University, Burnaby,BC.

———. 1988. The semantics of variety in categorial grammar. In Categorialgrammar, ed. Wojciech Buszkowski, Witold Marciszewski, and Johan F. A. K.van Benthem, 37–55. Amsterdam: John Benjamins.

Bernardi, Raffaella. 2002. Reasoning with polarity in categorial type logic. Ph.D.thesis, Utrecht Institute of Linguistics (OTS), Utrecht University.

———. 2003. CTL: In situ binding. http://www.let.uu.nl/~Raffaella.Bernardi/personal/q_solutions.pdf.

Bernardi, Raffaella, and Richard Moot. 2001. Generalized quantifiers in declara-tive and interrogative sentences. Journal of Language and Computation 1(3):1–19.

Biernacki, Dariusz, Olivier Danvy, and Kevin Millikin. 2005a. A dynamiccontinuation-passing style for dynamic delimited continuations. Report RS-05-16, BRICS, Denmark.

Biernacki, Dariusz, Olivier Danvy, and Chung-chieh Shan. 2005b. On the dynamicextent of delimited continuations. Report RS-05-13, BRICS, Denmark.

———. 2005c. On the dynamic extent of delimited continuations. InformationProcessing Letters 96(1):7–17.

Büring, Daniel. 2001. A situation semantics for binding out of DP. In Proceedingsfrom Semantics and Linguistic Theory XI, ed. Rachel Hastings, BrendanJackson, and Zsofia Zvolensky, 56–75. Ithaca: Cornell University Press.

———. 2004. Crossover situations. Natural Language Semantics 12(1):23–62.Buszkowski, Wojciech. 1986. Generative capacity of nonassociative Lambek

calculus. Bulletin of the Polish Academy of Sciences: Mathematics 34(7–8):507–516.

Calcagno, Cristiano, Eugenio Moggi, and Tim Sheard. 2003. Closed types for asafe imperative MetaML. Journal of Functional Programming 13(3):545–571.

Carpenter, Bob. 1994. Quantification and scoping: A deductive account. In Pro-ceedings of the 13th West Coast Conference on Formal Linguistics, ed. RaœlAranovich, William Byrne, Susanne Preuss, and Martha Senturia. Stanford,CA: Center for the Study of Language and Information.

———. 1997. Type-logical semantics. Cambridge: MIT Press.

Page 163: Linguistic side effects

Bibliography 153

Chierchia, Gennaro. 1991. Functional WH and weak crossover. In Proceedingsof the 10th West Coast Conference on Formal Linguistics, ed. Dawn Bates,75–90. Stanford, CA: Center for the Study of Language and Information.

———. 1993. Questions with quantifiers. Natural Language Semantics 1(2):181–234.

Chomsky, Noam. 1973. Conditions on transformations. In A festschrift for MorrisHalle, ed. Stephen Anderson and Paul Kiparsky, 232–286. New York: Holt,Rinehart and Winston.

Church, Alonzo. 1932. A set of postulates for the foundation of logic. Annals ofMathematics II.33(2):346–366.

———. 1940. A formulation of the simple theory of types. Journal of SymbolicLogic 5(2):56–68.

Cooper, Robin. 1983. Quantification and syntactic theory. Dordrecht: Reidel.Crolard, Tristan. 2001. Subtractive logic. Theoretical Computer Science 254(1–2):

151–185.———. 2004. A formulae-as-types interpretation of Subtractive Logic. Journal

of Logic and Computation 14(4):529–570.Curien, Pierre-Louis, and Hugo Herbelin. 2000. The duality of computation. In

ICFP ’00: Proceedings of the ACM international conference on functionalprogramming, vol. 35(9) of ACM SIGPLAN Notices, 233–243. New York:ACM Press.

Dalrymple, Mary, John Lamping, Fernando Pereira, and Vijay A. Saraswat.1999. Quantification, anaphora, and intensionality. In Semantics and syntaxin Lexical Functional Grammar: The resource logic approach, ed. MaryDalrymple, chap. 2, 39–89. Cambridge: MIT Press.

Danos, Vincent, Jean-Baptiste Joinet, and Harold Schellinx. 1995. LKQ and LKT:Sequent calculi for second order logic based upon dual linear decompositionsof classical implication. In Advances in linear logic, ed. Jean-Yves Girard,Yves Lafont, and Laurent Regnier, 211–224. London Mathematical SocietyLecture Note 222, Cambridge: Cambridge University Press.

Danvy, Olivier. 2004. On evaluation contexts, continuations, and the rest of thecomputation. In CW’04: Proceedings of the 4th ACM SIGPLAN continuationsworkshop, ed. Hayo Thielecke. Tech. Rep. CSR-04-1, School of ComputerScience, University of Birmingham.

Danvy, Olivier, and Andrzej Filinski. 1989. A functional abstraction of typedcontexts. Tech. Rep. 89/12, DIKU, University of Copenhagen, Denmark.http://www.daimi.au.dk/~danvy/Papers/fatc.ps.gz.

———. 1990. Abstracting control. In Proceedings of the 1990 ACM conferenceon Lisp and functional programming, 151–160. New York: ACM Press.

———. 1992. Representing control: A study of the CPS transformation. Mathe-matical Structures in Computer Science 2(4):361–391.

Page 164: Linguistic side effects

154 Bibliography

Danvy, Olivier, and Lasse R. Nielsen. 2001a. Defunctionalization at work. InProceedings of the 3rd international conference on principles and practice ofdeclarative programming, 162–174. New York: ACM Press.

———. 2001b. Defunctionalization at work. Report RS-01-23, BRICS, Denmark.Davies, Rowan. 1996. A temporal logic approach to binding-time analysis. In

LICS ’96: Proceedings of the 11th symposium on logic in computer science,184–195. Washington, DC: IEEE Computer Society Press.

Davies, Rowan, and Frank Pfenning. 1996. A modal analysis of staged compu-tation. In POPL ’96: Conference record of the annual ACM symposium onprinciples of programming languages, 258–270. New York: ACM Press.

———. 2001. A modal analysis of staged computation. Journal of the ACM48(3):555–604.

Dayal, Veneeta. 1996. Locality in wh quantification: Questions and relativeclauses in Hindi. Dordrecht: Kluwer.

Dowty, David R. 1992. ‘Variable-free’ syntax, variable-binding syntax, the naturaldeduction Lambek calculus, and the crossover constraint. In Proceedings ofthe 11th West Coast Conference on Formal Linguistics, ed. Jonathan Mead.Stanford, CA: Center for the Study of Language and Information.

———. 1994. The role of negative polarity and concord marking in naturallanguage reasoning. In Proceedings from Semantics and Linguistic Theory IV,ed. Mandy Harvey and Lynn Santelmann. Ithaca: Cornell University Press.

Dunn, J. Michael. 1991. Gaggle theory: An abstraction of Galois connectionsand residuation with applications to negation and various logical operations.In Logics in AI: European workshop JELIA ’90, ed. Jan van Eijck. LectureNotes in Artificial Intelligence 478, Berlin: Springer-Verlag.

Dybvig, R. Kent, Simon L. Peyton Jones, and Amr Sabry. 2005. A monadicframework for delimited continuations. Tech. Rep. 615, Computer ScienceDepartment, Indiana University.

van Eijck, Jan. 1998. Programming with dynamic predicate logic. ReportINS-R9810, Centrum voor Wiskunde en Informatica, Amsterdam. Also asResearch Report CT-1998-06, Institute for Logic, Language and Computation,Universiteit van Amsterdam.

van Eijck, Jan, and Nissim Francez. 1995. Verb-phrase ellipsis in dynamic semantics. In Applied logic: How, what, and why: Logical approaches to natural language, ed. László Pólos and Michael Masuch. Dordrecht: Kluwer.

Felleisen, Matthias. 1987. The calculi of λv-CS conversion: A syntactic theory of control and state in imperative higher-order programming languages. Ph.D. thesis, Computer Science Department, Indiana University. Also as Tech. Rep. 226.

———. 1988. The theory and practice of first-class prompts. In POPL ’88: Conference record of the annual ACM symposium on principles of programming languages, 180–190. New York: ACM Press.

Felleisen, Matthias, Daniel P. Friedman, Bruce F. Duba, and John Merrill. 1987. Beyond continuations. Tech. Rep. 216, Computer Science Department, Indiana University.

Felleisen, Matthias, Mitchell Wand, Daniel P. Friedman, and Bruce F. Duba. 1988. Abstract continuations: A mathematical semantics for handling full jumps. In Proceedings of the 1988 ACM conference on Lisp and functional programming, 52–62. New York: ACM Press.

Filinski, Andrzej. 1989a. Declarative continuations: An investigation of duality in programming language semantics. In Proceedings of category theory and computer science, ed. David H. Pitt, David E. Rydeheard, Peter Dybjer, Andrew M. Pitts, and Axel Poigné, 224–249. Lecture Notes in Computer Science 389, Berlin: Springer-Verlag.

———. 1989b. Declarative continuations and categorical duality. Master’s thesis, DIKU, University of Copenhagen, Denmark. Also as Tech. Rep. 89/11.

———. 1994. Representing monads. In POPL ’94: Conference record of the annual ACM symposium on principles of programming languages, 446–457. New York: ACM Press.

———. 1996. Controlling effects. Ph.D. thesis, School of Computer Science, Carnegie Mellon University. Also as Tech. Rep. CMU-CS-96-119.

———. 1999. Representing layered monads. In POPL ’99: Conference record of the annual ACM symposium on principles of programming languages, 175–188. New York: ACM Press.

Fischer, Michael J. 1972. Lambda-calculus schemata. In Proceedings of ACM conference on proving assertions about programs, vol. 7(1) of ACM SIGPLAN Notices, 104–109. New York: ACM Press. Also ACM SIGACT News 14.

Frege, Gottlob. 1891. Funktion und Begriff. Vortrag, gehalten in der Sitzung vom 9. Januar 1891 der Jenaischen Gesellschaft für Medizin und Naturwissenschaft, Jena: Hermann Pohle. English translation: Frege, Gottlob. 1980. Function and concept. In Translations from the philosophical writings of Gottlob Frege, ed. Peter Geach and Max Black, 3rd ed., 21–41. Oxford: Blackwell. Reprinted as: Frege, Gottlob. 1997. Function and concept. In The Frege reader, ed. Michael Beaney, 130–148. Oxford: Blackwell.

———. 1892. Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik 100:25–50. English translation: Frege, Gottlob. 1980. On sense and reference. In Translations from the philosophical writings of Gottlob Frege, ed. Peter Geach and Max Black, 3rd ed., 56–78. Oxford: Blackwell. Reprinted as: Frege, Gottlob. 1997. On Sinn and Bedeutung. In The Frege reader, ed. Michael Beaney, 151–171. Oxford: Blackwell.

Friedman, Daniel P., and David S. Wise. 1976. Cons should not evaluate its arguments. In Automata, languages, and programming: 3rd international colloquium, ed. Sidney Michaelson and Robin Milner, 257–284. Edinburgh: Edinburgh University Press.

Fry, John. 1997. Negative polarity licensing at the syntax-semantics interface. In Proceedings of the 35th annual meeting of the Association for Computational Linguistics and 8th conference of the European chapter of the Association for Computational Linguistics, ed. Philip R. Cohen and Wolfgang Wahlster, 144–150. San Francisco: Morgan Kaufmann.

———. 1999. Proof nets and negative polarity licensing. In Semantics and syntax in Lexical Functional Grammar: The resource logic approach, ed. Mary Dalrymple, chap. 3, 91–116. Cambridge: MIT Press.

Gallier, Jean. 1991. Constructive logics, Part II: Linear logic and proof nets. Tech. Rep., University of Pennsylvania.

———. 1993. Constructive logics, Part I: A tutorial on proof systems and typed λ-calculi. Theoretical Computer Science 110:249–339.

Gapeyev, Vladimir, Michael Y. Levin, and Benjamin C. Pierce. 2002. Recursive subtyping revealed. Journal of Functional Programming 12(6):511–548.

Gardent, Claire. 1991. Dynamic semantics and VP-ellipsis. In Logics in AI: European workshop JELIA ’90, ed. Jan van Eijck, 251–266. Lecture Notes in Artificial Intelligence 478, Berlin: Springer-Verlag.

Gibson, Edward, and Neal J. Pearlmutter. 1998. Constraints on sentence comprehension. Trends in Cognitive Sciences 2(7):262–268.

Girard, Jean-Yves. 1987. Linear logic. Theoretical Computer Science 50:1–101.

Girard, Jean-Yves, Paul Taylor, and Yves Lafont. 1989. Proofs and types. Cambridge: Cambridge University Press.

Goubault-Larrecq, Jean. 1996a. On computational interpretations of the modal logic S4: I. Cut elimination. Interner Bericht 1996-35, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe.

———. 1996b. On computational interpretations of the modal logic S4: II. The λevQ-calculus. Interner Bericht 1996-34, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe.

———. 1996c. On computational interpretations of the modal logic S4: IIIa. Termination, confluence, conservativity of λevQ. Interner Bericht 1996-33, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe.

———. 1997. On computational interpretations of the modal logic S4: IIIb. Confluence, termination of the λevQH-calculus. Rapport de Recherche 3164, INRIA.

Greiner, John. 1992. Programming with inductive and co-inductive types. Tech. Rep. CMU-CS-92-109, School of Computer Science, Carnegie Mellon University.

Groenendijk, Jeroen, and Martin Stokhof. 1984. Studies on the semantics of questions and the pragmatics of answers. Ph.D. thesis, Universiteit van Amsterdam.

———. 1991. Dynamic predicate logic. Linguistics and Philosophy 14(1):39–100.

———. 1997. Questions. In Handbook of logic and language, ed. Johan F. A. K. van Benthem and Alice G. B. ter Meulen, 1055–1124. Amsterdam: Elsevier Science.

de Groote, Philippe. 2001. Type raising, continuations, and classical logic. In Proceedings of the 13th Amsterdam Colloquium, ed. Robert van Rooy and Martin Stokhof, 97–101. Institute for Logic, Language and Computation, Universiteit van Amsterdam.

Gunter, Carl A., Didier Rémy, and Jon G. Riecke. 1995. A generalization of exceptions and control in ML-like languages. In Functional programming languages and computer architecture: 7th conference, ed. Simon L. Peyton Jones, 12–23. New York: ACM Press.

———. 1998. Return types for functional continuations. http://pauillac.inria.fr/~remy/work/cupto/.

Hamblin, Charles Leonard. 1973. Questions in Montague English. Foundations of Language 10:41–53.

Hardt, Daniel. 1999. Dynamic interpretation of verb phrase ellipsis. Linguistics and Philosophy 22(2):185–219.

Heim, Irene. 1982. The semantics of definite and indefinite noun phrases. Ph.D. thesis, Department of Linguistics, University of Massachusetts.

Henderson, Peter. 1982. Functional geometry. In Proceedings of the 1982 ACM symposium on Lisp and functional programming, 179–187. New York: ACM Press.

Hendriks, Herman. 1988. Type change in semantics: The scope of quantification and coordination. In Categories, polymorphism and unification, ed. Ewan Klein and Johan F. A. K. van Benthem, 96–119. Centre for Cognitive Science, University of Edinburgh.

Hepple, Mark. 1990. The grammar and processing of order and dependency: A categorial approach. Ph.D. thesis, Centre for Cognitive Science, University of Edinburgh.

———. 1992. Command and domain constraints in a categorial theory of binding. In Proceedings of the 8th Amsterdam Colloquium, ed. Paul Dekker and Martin Stokhof, 253–270. Institute for Logic, Language and Computation, Universiteit van Amsterdam.

Hieb, Robert, and R. Kent Dybvig. 1990. Continuations and concurrency. In Proceedings of the 2nd ACM SIGPLAN symposium on principles and practice of parallel programming, 128–136. New York: ACM Press.

Hindley, J. Roger, and Jonathan P. Seldin. 1986. Introduction to combinators and λ-calculus. Cambridge: Cambridge University Press.

Hinze, Ralf, and Johan Jeuring. 2001. Weaving a web. Journal of Functional Programming 11(6):681–689.

Hoare, C. A. R. 1969. An axiomatic basis for computer programming. Communications of the ACM 12(10):576–580, 583.

Hobbs, Jerry R., and Stanley J. Rosenschein. 1978. Making computational sense of Montague’s intensional logic. Artificial Intelligence 9:287–306.

Hobbs, Jerry R., and Stuart M. Shieber. 1987. An algorithm for generating quantifier scopings. Computational Linguistics 13(1–2):47–63.

Hornstein, Norbert. 1995. Logical form: From GB to Minimalism. Oxford: Blackwell.

Howard, William A. 1980. The formulae-as-types notion of construction. In To H. B. Curry: Essays on combinatory logic, lambda calculus and formalism, ed. Jonathan P. Seldin and J. Roger Hindley, 479–490. San Diego, CA: Academic Press.

Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. Ph.D. thesis, Department of Linguistics and Philosophy, Massachusetts Institute of Technology.

Hudak, Paul, Tom Makucevich, Syam Gadde, and Bo Whong. 1996. Haskore music notation: An algebra of music. Journal of Functional Programming 6(3).

Huet, Gérard. 1997. The zipper. Journal of Functional Programming 7(5):549–554.

Hung, Hing-Kai, and Jeffery I. Zucker. 1991. Semantics of pointers, referencing and dereferencing with intensional logic. In LICS ’91: Proceedings of the 6th symposium on logic in computer science, 127–136. Washington, DC: IEEE Computer Society Press.

Jacobson, Pauline. 1999. Towards a variable-free semantics. Linguistics and Philosophy 22(2):117–184.

———. 2000. Paycheck pronouns, Bach-Peters sentences, and variable-free semantics. Natural Language Semantics 8(2):77–155.

Jäger, Gerhard. 2001. Anaphora and type logical grammar. Habilitationsschrift, Humboldt University Berlin. UIL-OTS Working Papers 01004-CL/TL, Utrecht Institute of Linguistics (OTS), Utrecht University.

Janssen, Theo M. V. 1997. Compositionality. In Handbook of logic and language, ed. Johan F. A. K. van Benthem and Alice G. B. ter Meulen. Amsterdam: Elsevier Science.

Kamp, Hans. 1981. A theory of truth and semantic representation. In Formal methods in the study of language: Proceedings of the 3rd Amsterdam Colloquium, ed. Jeroen A. G. Groenendijk, Theo M. V. Janssen, and Martin B. J. Stokhof, 277–322. Amsterdam: Mathematisch Centrum.

Kandulski, Maciej. 1988. The equivalence of nonassociative Lambek categorial grammars and context-free grammars. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 34(1):41–52.

Karttunen, Lauri. 1977. Syntax and semantics of questions. Linguistics and Philosophy 1(1):3–44.

Keller, William R. 1988. Nested Cooper storage: The proper treatment of quantification in ordinary noun phrases. In Natural language parsing and linguistic theories, ed. Uwe Reyle and Christian Rohrer, 432–447. Dordrecht: Reidel.

Kiselyov, Oleg. 2004. Zipper in Scheme. Usenet posting to the comp.lang.scheme newsgroup; http://okmij.org/ftp/Scheme/zipper-in-scheme.txt.

———. 2005. How to remove a dynamic prompt: Static and dynamic delimited continuation operators are equally expressible. Tech. Rep. 611, Computer Science Department, Indiana University.

Krifka, Manfred. 1995. The semantics and pragmatics of polarity items. Linguistic Analysis 25:209–257.

———. 2001. For a structured meaning account of questions and answers. In Audiatur vox sapientiae: A festschrift for Arnim von Stechow, ed. Caroline Féry and Wolfgang Sternefeld, 287–319. Berlin: Akademie Verlag.

Kroch, Anthony S. 1974. The semantics of scope in English. Ph.D. thesis, Massachusetts Institute of Technology. Reprinted by New York: Garland, 1979.

Kuno, Susumu, and Jane J. Robinson. 1972. Multiple wh questions. Linguistic Inquiry 3:463–487.

Ladusaw, William A. 1979. Polarity sensitivity as inherent scope relations. Ph.D. thesis, Department of Linguistics, University of Massachusetts. Reprinted by New York: Garland, 1980.

Lambek, Joachim. 1958. The mathematics of sentence structure. American Mathematical Monthly 65(3):154–170.

Lappin, Shalom, and Wlodek Zadrozny. 2000. Compositionality, synonymy, and the systematic representation of meaning. e-Print cs.CL/0001006, arXiv.org. Submitted to Linguistics and Philosophy.

Linebarger, Marcia Christine. 1980. The grammar of negative polarity. Ph.D. thesis, Massachusetts Institute of Technology.

May, Robert. 1985. Logical form: Its structure and derivation. Cambridge: MIT Press.

Milner, Robin. 1996. Calculi for interaction. Acta Informatica 33(8):707–737.

Moggi, Eugenio. 1990. An abstract view of programming languages. Tech. Rep. ECS-LFCS-90-113, Laboratory for Foundations of Computer Science, Department of Computer Science, University of Edinburgh.

———. 1991. Notions of computation and monads. Information and Computation 93(1):55–92.

Montague, Richard. 1974a. Formal philosophy: Selected papers of Richard Montague, ed. Richmond Thomason. New Haven: Yale University Press.

———. 1974b. The proper treatment of quantification in ordinary English. In Formal philosophy: Selected papers of Richard Montague, ed. Richmond Thomason, 247–270. New Haven: Yale University Press.

Moortgat, Michael. 1988. Categorial investigations: Logical and linguistic aspects of the Lambek calculus. Dordrecht: Foris.

———. 1995. In situ binding: A modal analysis. In Proceedings of the 10th Amsterdam Colloquium, ed. Paul Dekker and Martin Stokhof, 539–549. Institute for Logic, Language and Computation, Universiteit van Amsterdam.

———. 1996. Generalized quantification and discontinuous type constructors. In Discontinuous constituency, ed. Harry C. Bunt and Arthur van Horck, 181–207. Berlin: Mouton de Gruyter.

———. 1997. Categorial type logics. In Handbook of logic and language, ed. Johan F. A. K. van Benthem and Alice G. B. ter Meulen, chap. 2. Amsterdam: Elsevier Science.

———. 2000. Computational semantics: Lab sessions. http://www.let.uu.nl/~Michael.Moortgat/personal/Courses/cslab2000s.pdf.

Moot, Richard. 2002. Proof nets for linguistic analysis. Ph.D. thesis, Utrecht Institute of Linguistics (OTS), Utrecht University.

Moran, Douglas B. 1988. Quantifier scoping in the SRI core language engine. In Proceedings of the 26th annual meeting of the Association for Computational Linguistics, 33–40. Somerset, NJ: Association for Computational Linguistics.

Morrill, Glyn V. 1994. Type logical grammar: Categorial logic of signs. Dordrecht: Kluwer.

———. 2000. Type-logical anaphora. Report de Recerca LSI-00-77-R, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

———. 2003. On bound anaphora in type logical grammar. In Resource-sensitivity in anaphora and binding, ed. Geert-Jan M. Kruijff and Richard T. Oehrle, chap. 6. Dordrecht: Kluwer.

Nelken, Rani, and Nissim Francez. 2000. A calculus of interrogatives based on their algebraic semantics. In Proceedings of TWLT16/AMILP2000: Algebraic methods in language processing, ed. Dirk Heylen, Anton Nijholt, and Giuseppe Scollo, 143–160.

———. 2002. Bilattices and the semantics of natural language questions. Linguistics and Philosophy 25(1):37–64.

Nelken, Rani, and Chung-chieh Shan. 2004. A logic of interrogation should be internalized in a modal logic for knowledge. In Proceedings from Semantics and Linguistic Theory XIV, ed. Kazuha Watanabe and Robert B. Young, 197–211. Ithaca: Cornell University Press.

O’Neil, John. 1993. A unified analysis of superiority, crossover, and scope. In Harvard working papers in linguistics, ed. Höskuldur Thráinsson, Samuel D. Epstein, and Susumu Kuno, vol. 3, 128–136. Cambridge: Harvard University.

Pentus, Mati. 1993. Lambek grammars are context free. In LICS ’93: Proceedings of the 8th symposium on logic in computer science, 429–433. Washington, DC: IEEE Computer Society Press.

———. 1996. Lambek calculus and formal grammars. Ph.D. thesis, Moscow State University, Moscow. In Russian. English translation: Pentus, Mati. 1999. Lambek calculus and formal grammars. In Provability, complexity, grammars, vol. 192 of American Mathematical Society Translations Series 2, 57–86. Providence: American Mathematical Society.

———. 1997. Product-free Lambek calculus and context-free grammars. Journal of Symbolic Logic 62(2):648–660.

Pierce, Benjamin C. 2002. Types and programming languages. Cambridge: MIT Press.

Plotkin, Gordon D. 1975. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science 1(2):125–159.

Postal, Paul Martin. 1971. Cross-over phenomena. New York: Holt, Rinehart and Winston.

Queinnec, Christian, and Bernard Serpette. 1991. A dynamic extent control operator for partial continuations. In POPL ’91: Conference record of the annual ACM symposium on principles of programming languages, 174–184. New York: ACM Press.

Quine, Willard Van Orman. 1960. Word and object. Cambridge: MIT Press.

Ranta, Aarne. 1994. Type-theoretical grammar. New York: Oxford University Press.

Reinhart, Tanya. 1983. Anaphora and semantic interpretation. London: Croom Helm.

Restall, Greg. 2000. An introduction to substructural logics. London: Routledge.

Reynolds, John C. 1972. Definitional interpreters for higher-order programming languages. In Proceedings of the ACM national conference, vol. 2, 717–740. New York: ACM Press. Reprinted as: Reynolds, John C. 1998. Definitional interpreters for higher-order programming languages. Higher-Order and Symbolic Computation 11(4):363–397.

———. 1983. Types, abstraction and parametric polymorphism. In Information processing 83: Proceedings of the IFIP 9th world computer congress, ed. R. E. A. Mason, 513–523. Amsterdam: Elsevier Science.

———. 1993. The discoveries of continuations. Lisp and Symbolic Computation 6(3–4):233–247.

Sabry, Amr. 1994. The formal relationship between direct and continuation-passing style optimizing compilers: A synthesis of two paradigms. Ph.D. thesis, Department of Computer Science, Rice University. Also as Tech. Rep. TR94-241.

———. 1996. Note on axiomatizing the semantics of control operators. Tech. Rep. CIS-TR-96-03, Department of Computer and Information Science, University of Oregon.

Sabry, Amr, and Matthias Felleisen. 1993. Reasoning about programs in continuation-passing style. Lisp and Symbolic Computation 6(3–4):289–360.

Selinger, Peter. 2001. Control categories and duality: On the categorical semantics of the lambda-mu calculus. Mathematical Structures in Computer Science 11:207–260.

Shan, Chung-chieh. 2001. Monads for natural language semantics. In Proceedings of the ESSLLI-2001 student session, ed. Kristina Striegnitz, 285–298. Helsinki: 13th European Summer School in Logic, Language and Information.

———. 2004a. Delimited continuations in natural language: Quantification and polarity sensitivity. In CW’04: Proceedings of the 4th ACM SIGPLAN continuations workshop, ed. Hayo Thielecke, 55–64. Tech. Rep. CSR-04-1, School of Computer Science, University of Birmingham.

———. 2004b. Polarity sensitivity and evaluation order in type-logical grammar. In Proceedings of the 2004 human language technology conference of the North American chapter of the Association for Computational Linguistics, ed. Susan Dumais, Daniel Marcu, and Salim Roukos, vol. 2, 129–132. Somerset, NJ: Association for Computational Linguistics.

———. 2004c. Shift to control. In Proceedings of the 5th workshop on Scheme and functional programming, ed. Olin Shivers and Oscar Waddell, 99–107. Tech. Rep. 600, Computer Science Department, Indiana University.

———. 2005. Linguistic side effects. In Direct compositionality, ed. Chris Barker and Pauline Jacobson. New York: Oxford University Press. In press.

Shan, Chung-chieh, and Chris Barker. 2005. Explaining crossover and superiority as left-to-right evaluation. Linguistics and Philosophy. In press.

Shieber, Stuart M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333–343.

Sitaram, Dorai. 1993. Handling control. In PLDI ’93: Proceedings of the ACM conference on programming language design and implementation, vol. 28(6) of ACM SIGPLAN Notices, 147–155. New York: ACM Press.

Sitaram, Dorai, and Matthias Felleisen. 1990. Control delimiters and their hierarchies. Lisp and Symbolic Computation 3(1):67–99.

Sleator, Daniel Dominic, and Robert Endre Tarjan. 1985. Self-adjusting binary search trees. Journal of the ACM 32(3):652–686.

Søndergaard, Harald, and Peter Sestoft. 1990. Referential transparency, definiteness and unfoldability. Acta Informatica 27:505–517.

———. 1992. Non-determinism in functional languages. The Computer Journal 35(5):514–523.

Strachey, Christopher, and Christopher P. Wadsworth. 1974. Continuations: A mathematical semantics for handling full jumps. Technical Monograph PRG-11, Programming Research Group, Oxford University Computing Laboratory. Reprinted as: Strachey, Christopher, and Christopher P. Wadsworth. 2000. Continuations: A mathematical semantics for handling full jumps. Higher-Order and Symbolic Computation 13(1–2):135–152.

Szabolcsi, Anna. 1989. Bound variables in syntax (are there any?). In Semantics and contextual expression, ed. Renate Bartsch, Johan F. A. K. van Benthem, and Peter van Emde Boas, 295–318. Dordrecht: Foris.

———. 1992. Combinatory grammar and projection from the lexicon. In Lexical matters, ed. Ivan A. Sag and Anna Szabolcsi, 241–269. CSLI Lecture Notes 24, Stanford, CA: Center for the Study of Language and Information.

———. 1997. Strategies for scope taking. In Ways of scope taking, ed. Anna Szabolcsi, chap. 4, 109–154. Dordrecht: Kluwer.

———. 2004. Positive polarity—negative polarity. Natural Language and Linguistic Theory 22(2):409–452.

Taha, Walid, and Michael Florentin Nielsen. 2003. Environment classifiers. In POPL ’03: Conference record of the annual ACM symposium on principles of programming languages, 26–37. New York: ACM Press.

Trueswell, John C., and Michael K. Tanenhaus, eds. 2005. Approaches to studying world-situated language use: Bridging the language-as-product and language-as-action traditions. Cambridge: MIT Press.

Wadler, Philip L. 1992a. Comprehending monads. Mathematical Structures in Computer Science 2(4):461–493.

———. 1992b. The essence of functional programming. In POPL ’92: Conference record of the annual ACM symposium on principles of programming languages, 1–14. New York: ACM Press.

———. 1997. How to declare an imperative. ACM Computing Surveys 29(3):240–263.

———. 2000. Proofs are programs: 19th century logic and 21st century computing. http://homepages.inf.ed.ac.uk/wadler/topics/history.html.

———. 2003. Call-by-value is dual to call-by-name. In ICFP ’03: Proceedings of the ACM international conference on functional programming. New York: ACM Press.

Zadrozny, Wlodek. 1994. From compositional to systematic semantics. Linguistics and Philosophy 17(4):329–342.