Cyclic Domains and (Multiple) Specifiers Constructions

Milan Rezac

1 Introduction

1.1 Cyclicity

This paper is about Cyclicity. Intuitively, the technology has mainly served two purposes, of which one I will not be concerned with at all: the determination of intermediate landing sites for successive-cyclic Ā-movement, which we may call Successive Cyclicity (SC). Our quest is Extensional Cyclicity (ExtC), which took off as a thing of its very own fairly recently as the idea that movement results in the growth of the phrase-marker. This is the Extension Condition of Chomsky 1993:190, which bars (1) because there is no subextraction from subjects, and ExtC makes sure that a picture of who will have moved to [Spec, TP] by the time [+Q] C Merges in:

(1) Who1 was a [picture of __1]2 sold __2

The ground covered by ExtC and SC is not always different. Make SC nodes small enough, and you might capture ExtC, hence bounding nodes (Chomsky 1973, where taking S' and S as cyclic domains accounts for (1))1 and lately phases (Chomsky 1999). Conversely, force movement arbitrarily at cyclic categories by giving them e.g. EPP, and SC follows from ExtC (Chomsky 1998). Still, the distinction is very useful. The business of SC is to decide where successive-cyclic Ā-movement stops, which is conceptually a very different thing from governing the application of syntactic operations. So let's turn to ExtC. A big step along the road from specific transformations to Move-α was the idea that syntactic operations are tied to (features of) phrasal categories, like wh-movement to a [+Q]C2. Looked at in this light, ExtC governs the interaction of rules like A-movement and wh-movement by applying rules in the same order as the dominance relations between the heads that trigger them. This has an attractive derivational implementation: have the order in which a phrase marker is assembled be what yields dominance (Epstein 1994/1999), and force a category to trigger whatever operations are associated with it as soon as it is introduced into the construction; perhaps even use the same mechanism for both (Frampton and Gutmann 1999:3, Chomsky 1998:49ff.). The empirical domain of ExtC is rule interaction. Not all patterns of interaction in a phrase marker are possible, nor is there a general principle like "maximize application" which would result in all and only feeding orders. Instead, what we get are feeding, bleeding, counter-feeding, and counter-bleeding orders, which are correctly predicted by ExtC3, though (so far as I know) matters are not usually described this way:

(2)a. #Tell me [how many eagles]1 Dirk did [not successfully evade __1]
   b. Tell me [how many eagles]1 were [not successfully evaded by Dirk __1]    Feeding

(3)a. Who1 did Grendel take [a bite out of __1]?
   b. *Who1 was [a picture of __1]2 taken __2?    Bleeding

(4)a. [Which isles]1 were1 visited __1 by Ged?
   b. *[Which isles]1 was visited __1 by Ged?    Counter-bleeding

(5) __1 was asked __1 by Ath [CP who1 [Q-]C dragons ever lie].    Counter-feeding

In the first example, A-movement feeds Ā-movement by taking a wh-phrase out of an inner island. But A-movement can also bleed Ā-movement; in the second example, it blocks subextraction from an in-situ NP, as already discussed. The third example shows that Ā-movement cannot bleed φ-agreement and/or A-movement. Finally, the last example shows in what sense downward movement is counter-feeding: object promotion can feed wh-movement, but not when the latter is downwards. Once upon a time when chains were more popular, such behavior had been banned by the Proper Binding Condition, which ran into problems mainly because of remnant movement:

(6)a. [__1 How threatening]2 would a hobbit1 seem to Sauron __2?
   b. [Assigned __1 to the department]2, those sheep1 couldn't possibly be __2.

In all four cases, ExtC correctly generalizes that operations associated with a c-commanding category must precede those associated with a category it c-commands. The role ExtC plays also comes out in combination with the theory of Locality in Chomsky 1995 and later work. Instead of a global metric over bounding nodes, Locality came to restrict features which trigger operations to the closest matching Goal in their c-command domain. To see where ExtC comes in, consider a weak island violation:

(7) Why1 [Q-]R is it known which name2 [Q-]E Heleth called Ogion __2 __1?4

ExtC forces wh-movement triggered by [Q-]E to happen before [Q-]R enters the picture, giving counter-feeding relationships all over the place, since adding a category to a phrase marker and its entering into long distance dependencies take place in tandem. This rules out a derivation where why moves to the matrix clause first and what name to the embedded clause second: at the [Q-]E level, the upstairs [Q-]R isn't there yet.

1 Wh-movement before passive promotion because TP and CP (S and S') are bounding nodes. See Bejar and Rezac 2002b for an argument that in such systems the bounding nodes can't be made small enough to capture ExtC, and if they could they'd suddenly be too small.
2 The origin lies in the mid sixties, when the idea came along that the question transformation is 'governed' by a question complementizer; the details of who and when escape me (Klima or Katz, perhaps).
3 ExtC alone, however, is not responsible for such interaction patterns among rules. The other major culprit is the Improper Movement Generalization (IMG). IMG blocks Ā-movement from feeding A-movement. I'd bet (socks or other small change) that IMG is also responsible for Postal's 1974 observation (Rezac 2002c for the bet):
(i) *Peter assured Paul / wagers Mary to be leaving on a jet plane in the early morning rain.
(ii) Mary1, who1 Peter assured Paul / wagers __1 to be leaving on a jet plane in the early morning rain
(iii) The crown alleged there to have been goats in the pen before the dragon came.
The ECM complements of wager-class and (what I'll call) assure-class verbs apparently can only enter into an A-relation with a trace (ii) or an expletive (iii). The observation is pervasive: ECM and Raising with experiencer constructions in Romance, HNPS and Negative/Quantifier Movement in Icelandic, etc. My guess is that Fox (1998) and Müller (2002) are right in taking the edge of every XP as a (potential?) landing site for successive-cyclic Ā-movement, so who1 moves and IMG applies somewhere between the embedded TP level and matrix v in (ii) to render who1 invisible, leaving only t1.
4 Heleth called Ogion Silence because Ogion seldom spoke unnecessarily, except for the unfortunate suggestion about the goat. So the semantics is fine.

So much for the idea. In a nutshell, we have the following observations: (a)

categories enter into long distance dependencies in the order of dominance; (b) ExtC gets this by deriving dominance from the order of assembly, which is also the order in which

dependencies are established. A still bolder speculation is that the mechanism which

drives assembly and establishes dependencies is the same; this is what Chomsky 1998

comes to when Merge, like movement and agreement, becomes asymmetric.

The Extension Condition imposed ExtC "from the outside," we might say; it is not an

intrinsic property of the construction algorithm, but a condition that governs its output.

On the other hand, Chomsky 1998 explores embedding cyclicity in the mechanism of the

derivation itself; and that's the idea we will explore here. We will want to keep the basic

insights; but we will also try to exorcise some haunting issues.

1.2 The Dogma

We will adopt the technology promulgated in Chomsky 1998 or MI. This is both what

we will proceed to make changes to, and the base line of what we hold inviolate.

Derivations start with a Numeration, which is a collection of lexical items (LIs). LIs have interpretable/valued and uninterpretable/unvalued features, [F+] and [F-] resp.;

[F±]X is [F±] on LI X. MI discusses three syntactic operations in detail, Match, Merge,

and Agree. Match is a pre-requisite to Agree, which eliminates [F-]; the constraints of

Match are as follows (we will call [F-]X the Probe):

(8) Matching is a relation that holds of a probe P and a Goal G. Not every matching

pair induces Agree. To do so, G must (at least) be in the domain D(P) of P and

satisfy locality conditions. The simplest assumptions for the probe-Goal system

are:

(I) matching is feature identity

(II) D(P) is the sister of P

(III) locality reduces to "closest c-command"

Thus D(P) is the c-command domain of P, and a matching feature G is closest to P if there is no G' in D(P) matching P s.t. G is in D(G').

(Chomsky 1998:38)
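To fix ideas, here is a minimal sketch in Python — mine, purely illustrative and no part of MI — of Match under "closest c-command": the Probe searches its sister D(P), and breadth-first order over the nested sets stands in for closeness, so a Goal is returned only if no Matching Goal c-commands it. The toy SO and the feature name 'phi' are hypothetical.

from collections import deque

def match(probe_feature, domain):
    """Return the closest Goal in D(P) bearing a valued instance of probe_feature."""
    queue = deque([domain])                      # start at the sister of the Probe
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):               # a lexical item, i.e. a Goal candidate
            if probe_feature in node.get('valued', set()):
                return node                      # closest Match: nothing above it matched
        else:                                    # a set-like SO, encoded as a tuple {a, b}
            queue.extend(node)
    return None                                  # no Match in D(P); Agree cannot apply

# toy example: [phi-] on v probes its VP sister and finds the internal argument
VP = ({'label': 'V'}, {'label': 'NP', 'valued': {'phi'}})
print(match('phi', VP))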

Agree in turn takes the Probe and Goal and deletes [F-]Probe under valuing by [F+]Goal, at

least, creating a Modified Lexical Item or MLI of the Probe. There is a further condition:

(9) [U]ninterpretable features render the Goal active, able to implement an operation:

to select a phrase for Merge (pied-piping) or to delete the probe. The operations

Agree and Move require a Goal that is both local and active.

(Chomsky 1998:38)


Movement is a consequence of Match, which is followed by Merge of the Goal at the

Probe if the Probe also has an "EPP feature" (which by itself does not enter into the

Probe-Goal system). A pied-piping operation is also involved in movement, which

selects how much of the Goal to take along, but peripheral to the discussion in MI.

Merge is thus both a component of move, and the basic combinatory operation of syntax:

(10)a. Merge(α, β) → {Label(α), {α, β}}
    b. Label(α) = α, for α an LI. (Chomsky 1998:50)
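As a concrete rendering of (10) — a sketch on my own toy assumptions (LIs as strings, SOs as nested pairs whose first member records the label), not MI's formalism — asymmetric Merge can be written so that the new object is labeled by the projecting member:

def label(so):
    # Label(alpha) = alpha for an LI (a bare string); otherwise its recorded label
    return so if isinstance(so, str) else so[0]

def merge(alpha, beta):
    # Merge(alpha, beta) -> {Label(alpha), {alpha, beta}}
    return (label(alpha), (alpha, beta))

# usage: V merges with its complement, then v merges with the resulting VP
VP = merge('V', 'NP')        # ('V', ('V', 'NP'))
vP = merge('v', VP)          # ('v', ('v', ('V', ('V', 'NP'))))
print(vP)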

MI argues that Merge is inherently asymmetric, even when theta-theoretic rather than driven by the uninterpretable features of a Probe. The selector-selectee or Probe-Goal

asymmetry determines and is encoded as the Label, α being the Probe. Labels will play a

major role in what follows; we will take seriously the Bare Phrase Structure (BPS) idea

that a label is literally the LI that projects through a phrase marker:

(11)a. On minimal assumptions, the label Γ should be the label of either α or β. Hence

no matter how complex the object constructed, its label is an LI, the head selected

from the lexicon that has "projected" through the derivation, or a reduced MLI.

(Chomsky 1998:50).

Finally, cyclicity is motivated (on grounds of computational economy) and encoded

as follows:

(12) The operations Merge and Agree must:

(52) (I) Find syntactic objects on which they apply

(II) Find a feature F that drives the operation

(III) Perform the operation, constructing a new object K.

An operation OP takes objects already constructed (perhaps in the lexicon), and

forms from them a new object. Condition (I) is optimally satisfied if OP applies to full syntactic objects already constructed, with no search; that is, if CHL

operates cyclically. It follows that derivations meet the condition (53):

(53) Properties of the probe/selector α must be exhausted before new elements of the lexical subarray [Numeration] are accessed to drive further

operations.

If the properties of α are not exhausted, the derivation crashes because α can no

longer be accessed. … By condition (II), F has to be readily detectable, hence

optimally in the label L(α) of α, its sole designated element. (MI:49, 51)

Observe that condition (53) of MI:49 applies to both the Probe-Goal relationship for [F-]Probe, and the theta-theoretic selector-selectee relationship. (53) thus claims that one mechanism is responsible for the fact that phrase marker assembly and the triggering of operations go hand in hand, and this mechanism is moreover intrinsic to the definition of the basic combinatory procedure of syntax, which is asymmetric.
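The cycle in (52)-(53) can be caricatured as a loop — a sketch under my own simplifying assumptions, with a hypothetical check function, not anything in MI: each lexical item accessed from the numeration is merged at the root, and its features must be exhausted before the next item is accessed.

def derive(numeration, check):
    """Build an SO left to right, exhausting each head's features as it is merged."""
    so = None
    for li in numeration:                           # access the lexical subarray one LI at a time
        so = (li['cat'], so) if so else li['cat']   # Merge the new head atop the current SO
        for feature in li.get('unvalued', []):      # exhaust the probe/selector's properties...
            check(feature, so)                      # ...before any further LI is accessed
    return so

# hypothetical numeration: V, then v with a phi-Probe, then T with phi and EPP
numeration = [{'cat': 'V'},
              {'cat': 'v', 'unvalued': ['phi']},
              {'cat': 'T', 'unvalued': ['phi', 'EPP']}]
derive(numeration, lambda f, so: print('evaluating', f, 'on', so))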

1.3 The Plot

Too anticlimactic.

2 Probing Specifiers

The first assumption we will challenge is that the domain of a Probe [F-]H is the

complement of H. This almost follows from (8), except for the unclarity of what a Probe

is; but from the discussion in MI and Chomsky 1999, so much seems clear. Whether or

no, however, I will argue that the assumption is, first, conceptually stipulative, and, second, empirically false. In a limited set of circumstances, specifiers will turn out to be in the

domain of Probe, exactly when a natural formulation of ExtC predicts them to be.

One of the great minimalist virtues of ExtC is that the domain of Probe, Dom([F-]H),

need not be stated: it is all and only the syntactic object (SO) which has been constructed

up to this point. This is what directly rules out "lowering" derivations, taking over the

role of the PBC but without the problems the PBC ran into. Take a typical remnant

movement derivation:

(13) [__1 How threatening]2 would a hobbit1 seem to Sauron __2?

PBC is violated at the surface. But ExtC correctly predicts that subextraction of a hobbit

out of [a hobbit how threatening] is fine at the TP level, and wh-movement of the

remnant at the succeeding CP level. So ExtC allows remnant movement, yet correctly

bars lowering.

That all of the current SO is in Dom([F-]H) partly follows either from it being all

there is, or from it being the sister of the Probe when it is introduced; which, it doesn't

matter right now, for until we get to multiple specifiers ExtC makes sure they give

identical results. In either case, the crux of the issue is that, at least as soon as a category H with Probe [F-]H projects, the domain is extended on the BPS assumption that a label

is literally an LI which accordingly contains the Probe. Now excluding multiple

specifiers to which we will return, the insertion of a specifier must trigger projection:

both the theta-theoretic base-generation of [Spec, HP], for the new SO needs to be labeled H; and the insertion of one by movement, for the same reason. Thus, as soon as

[Spec, HP] is created, H with unvalued Probe [F-]H projects, and [Spec, HP] falls into

Dom([F-]H):

(14) {H, Complement} → {H, {Specifier, {H, Complement}}}

The empirical consequence should be that after but not before the Merge of [Spec, HP],

Dom([F-]H) should include it.
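The point can be put in the same toy terms as before (a sketch on my assumptions, not a claim about the actual computation): if the domain of a Probe on H is whatever H's projection is sister to, then it covers only the complement after first Merge and comes to include the specifier only once H projects over it.

def project(head, so):
    # H (re)projects over the current SO: {H, {...}}
    return (head, so)

def domain(projection):
    # Dom([F-]H) is the sister of the label H, i.e. whatever H projects over
    head, sister = projection
    return sister

step1 = project('H', 'Complement')          # {H, Complement}
print(domain(step1))                        # -> only the complement is visible
step2 = project('H', ('Specifier', step1))  # H projects again over the new specifier
print(domain(step2))                        # -> the specifier is now in the domain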

One might suspect that the limitation of domains to complements is empirical, and

rests on constructions of the form Fido ate my socks: unpretentious English transitives.


The configuration we have at hand is:

(15)a. {v, {External Argument, {v, {V, Internal Argument}}}}, which is
    b. {vP [φ-]v {[φ+]NP {[φ-]v {VP V [φ+]NP}}}}, or more generally,
    c. {HP [F-]H {[F+]YP {[F-]H {ZP…[F+]XP…}}}}

After [Spec, HP] Merges in and H projects, we expect the specifier to be in the domain of any Probe on H, and by Locality to be the only visible constituent if it has a Matching

feature. Since Agree in transitives must be forced5 between v and the internal argument,

we want to exclude [Spec, vP] from the Probe [φ-]v.

As already noted in the discussion of (14), of which (15) is an instance, ExtC in fact

predicts that upon the Merge of the complement VP, Dom([φ-]v) is at first limited to it; only after [Spec, vP] is Merged and v projects is the domain extended. Thus, there is a

step in the derivation that does allow Agree between v and NP in the complement. And

in most implementations of MP, some sort of an economy metric will force Match plus

Agree to apply as soon as possible. Two examples are the Earliness Principle (EP) of

Pesetsky and Torrego (2001:400), and Minimality of Collins (1997:77):

(16)a. Earliness Principle (Pesetsky and Torrego 2001:400): An uninterpretable feature

must be marked for deletion as early in the derivation as possible.

b. Minimality (Collins 1997:77): An operation OP (satisfying Last Resort) may

apply only if there is no smaller operation OP' (satisfying Last Resort).

We will adopt EP throughout. So we are in fact fine here.

What we really want, then, is empirical evidence that ExtC makes the right

predictions: that [Spec, HP] is in the domain of a Probe of H, but if and only if there is

no Match within the complement, which will force EP to delay until the specifier is

Merged in. I want to show that this is in fact what is going on, specifically for [φ-] on v,

and the data are found in Georgian and Basque. However, first there are a number of

situations where it is prima facie necessary that [Spec, HP] is in the domain. One

example is constructions with why and how on their "high" reading, which are often

taken to involve base-generation in [Spec, CP] (e.g. Rizzi 1990)6; on standard

assumptions then, [Q-]C must enter into Agree with its specifier (esp. if there is no other

wh-word). Nonetheless, this is not very impressive because in English, Romance and

Slavic at least, we don't get to see a special [Q-] complementizer here. Similar is the case

of subjects of individual-level predicates, which have been argued to be base-generated in

[Spec, TP] (e.g. Kratzer 1989, Diesing 1992), and T0 agrees; agreement from within the

complement of T0 is pre-empted because it already has accusative Case. But again, not

too impressive; we are after language-internal evidence of ExtC governed alternation.

5 Strictly speaking, just allowed if the internal argument has structural Case; for if Case assignment is a reflex of Agree as in MI, the alternative derivation where v Agrees with [Spec, vP] does not converge because accusative assignment fails. However, as soon as we switch from transitives with a structural Case on the internal argument to ones with a lexical Case, e.g. verbs like German helfen, Czech pomoci 'help' which govern a dative, Match of [φ-]v must effectively be restricted to the VP.
6 I think this generalizes quite widely to all non-VPmax adjuncts, to "adverb preposing" in Romance, and to weak island escapers, but this is only work in progress at this point.


Let's take up Georgian first. Bejar (2000) argues that Georgian agreement morphology shows that (a) the φ-feature Probe must be split into separate person ([π-]) and number ([#-]) Probes, with [π-] on v and [#-] on T; and (b) 3rd person and sg. number are morphologically underspecified and literally not present in the syntactic representation of φ-features. This leads her to a very impressive simplification of the complicated Georgian morphology. The gist of the pattern is that the Georgian verb exhibits only one agreement morpheme for [π-] and [#-] each, but the two may cross-reference different arguments, in a way not predictable by structural or thematic conditions. [π-] (for her on v) cross-references only the internal argument (17), (18); [#-] (on T) cross-references the external argument, but only if it is plural (17); otherwise, it cross-references the closest plural internal argument (18). On her analysis then, the generalization is simply that a particular Probe cross-references (Agrees with) the closest Matching non-underspecified argument in its domain (the (b) schemata).

(17)a. m-xedav-t
       1-see-PL
       "You(pl.) see me."
    b. T0      External argument    v0      Internal argument
       [#-]    [#=PL]                       []
               [π=2]                [π-]    [π=1]

(18)a. g-xedav-t
       2-see-PL
       "I see you(pl.)"
    b. T0      External argument    v0      Internal argument
       [#-]    []                           [#=PL]
               [π=1]                [π-]    [π=2]

This pretty much deals with the mess that Georgian agreement is, and I adopt its essentials wholesale. However, there is a pattern which at first looks to go against the grain of the analysis: under certain circumstances, the external argument is cross-referenced for [π-] on the verbal morphology:

(19)a. v-k'l-av
       1-kill-X
       "I killed him."

(20)a. mi-v-[s-]ts'er-e-t
       hither-1-[DAT-]write-X-pl.
       "We have written it to him/them."

I think the generalization this pattern is amenable to is quite striking: agreement with the external argument takes place if and only if there is no [π+] feature in the complement of v0, namely if there either are no internal arguments or if they are the absent 3rd person. This is exactly what ExtC and EP lead us to believe, and it lends further support to Bejar's analysis. [π-]v must try to find a Match as soon as the complement is Merged; but if there's nothing there, once v projects upon Merge of [Spec, vP], the specifier falls into its


domain and it must try and find a Match there. A pre-requisite for this is that the relevant

feature be underspecified in the language and thus not Matchable; English obviously

doesn't behave this way.
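Here is a toy rendering of the generalization — my sketch, not Bejar's implementation — in which arguments are dicts, an underspecified 3rd person simply lacks a 'pi' key, and the [π-] Probe on v checks its complement first (per EP), falling back on the later-merged external argument only if nothing below Matches.

def agree_pi(internal_args, external_arg):
    """Value [pi-] on v: the complement first; the specifier only if no Match below."""
    for arg in internal_args:                  # step 1: probe the complement of v
        if 'pi' in arg:
            return ('internal argument', arg['pi'])
    if 'pi' in external_arg:                   # step 2: only after v projects over [Spec, vP]
        return ('external argument', external_arg['pi'])
    return ('default', '3rd')                  # nothing visible anywhere

print(agree_pi([{'pi': 2}], {'pi': 1}))        # internal argument wins, cf. (18)
print(agree_pi([{}], {'pi': 1}))               # external argument agreement, cf. (19)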

Georgian is not isolated in this respect. Basque is a three-way agreement language, cross-referencing ergatives, absolutives, and datives with separate agreement markers on the verb. In certain tense/mood combinations, ergative displacement takes place: if the absolutive is 3rd person, person agreement with the ergative is found in the absolutive slot and uses absolutive markers (but Case marking does not change). Compare present transitive (21)a with the past (21)b (Laka 1993):

(21)a. Zuk ikusi nind-u-zu-n
       you-E saw me(ABS)-have-you(ERG)-past
       "You have seen me."
    b. Zuk emakumea ikusi zen-u-en
       you-E woman-det seen you(ABS)-have-past
       "You saw the woman."

I will make a number of assumptions first which are justified in Rezac (2002c): (a) ergative and absolutive, but not dative, agreement is φ-feature agreement with arguments that bear Case; (b) the ergative is base-generated in [Spec, vP] and the absolutive as the complement of v0; (c) v0 hosts the [π-] feature responsible for absolutive agreement; and (d) [π=3rd] is not syntactically present in Basque, exactly as in Georgian. We then see the same pattern as in Georgian: the absolutive agreement slot, a reflex of [π-] on v, must cross-reference the NP in the complement of v; but if there is no [π+] there, it must agree with [Spec, vP], which is what ergative displacement is.

I have somewhat shortened the argument from Basque here, and it is taken up more fully in Rezac 2002c, where I will deal with e.g. why there is no separate ergative agreement under ergative displacement. But I have introduced it here because it nicely contrasts with the Georgian, in that Basque does regularly cross-reference separate arguments for person and number on the verb; and so it is even more striking that we see

absolutive morphology on the verb just when ExtC and EP lead us to find it. Similar

patterns are found elsewhere; Yimas is an example I want to turn to in the future.

So it seems that specifiers are in the domain of Probe after all, and the domain is

predictable from ExtC and EP. Nice, minimal, even virtually conceptually necessary

perhaps. In the next section, we will see that this is not always the case, and the

exceptions are quite principled: a specifier is outside of the domain of Probe if it has

Merged in, but the Probe has not yet projected. The evidence which bears on this is multiple specifiers, and we will begin by reviewing the data.

3 Multiple specifiers and crossing paths

3.1 The facts

BPS does not limit recursion of Merge, giving rise to Multiple Specifier Constructions or


MSCs7. Two kinds of MSCs have been the focus of the literature: (a) one specifier is

formed by Move as a consequence of Match, and the rest are base-generated; (b) all

specifiers are formed by Move consequent on multiple Match for a particular feature.

The first kind have recently been investigated by Doron and Heycock 1999, who

survey these constructions in Japanese, Arabic, and Hebrew; I adopt their analysis. They

show that in such MSCs, there is a unique nominative subject which is moved from the vP to [Spec, TP], agrees with T, and remains closest to it: this is the "narrow subject". Then

there are nominatives, "broad subjects", which are base-generated directly in [Spec, TP],

do not agree with T, and precede the narrow subject.

(22) hind-un /aT-Tulla:b-u yuqa:bilu-una-ha

hind.F-N the-students.M-N meet.3M-PL-her

"The students are meeting Hind."

Doron and Heycock have a range of arguments that both broad and narrow subjects

are in [Spec, TP], rather than specifiers of other categories: (a) broad subjects can appear

in clauses where topicalization and left dislocations are impossible, e.g. under ECM, and

are not interpreted as topics or foci; (b) in ECM clauses, broad subjects get ECM

(accusative) Case; (c) they can bind subject-oriented reflexives in the narrow subject8; (d)

they can be controlled; (e) they can be shared between several predicates. (a) and (b) are

demonstrated here for Arabic, which disallows topics in ECM clauses:

(23) Dhanan-tu l-bayt-a /alwa:n-u-hu za:hiyat-un

thought-1.SG the-housei-ACC colors-NOM-itsi bright-NOM

I believed the house to be of bright colors.

Doron and Heycock also show that broad subjects unlike narrow subjects must be

introduced by Merge rather than Move, since (a) the dependency between the subject and its

θ position violates strong islands; (b) the θ position is spelled out as a pronoun; (c) there

is no scope reconstruction to the θ position; and (d) idiom-chunks involving broad subjects only get the literal interpretation. Points (a) and (b) have been exemplified in

(23), where the broad subject is coreferential with a possessor spelled out as an overt

pronominal affix.

Two facts here are crucial for us: the narrow subject is always the specifier closest to T, not the root, and only it can trigger agreement. The latter point is forced by the theory we have so far: when T Merges with its complement, EP forces its φ-set to be valued immediately if possible, which it is. However, it is not clear that the narrow subject should be closer to T than broad subjects. Presumably, what licenses broad subject constructions in the first place is the ability of T to have multiple EPP features, arguably if and only if adding them creates an interpretive difference (Rezac 2002b). As between evaluating an EPP feature and [φ-] on T, there is no intrinsic ordering; we might expect that EPP can be discharged by broad subject Merge and [φ-] by narrow subject Agree in parallel.

7 Arguments that both multiple specifiers and left-adjunction are needed mostly remain to be given, but I believe they exist: scrambling, for example, induces far worse subextraction islands than specifiers, and scrambled positions cannot feed further movement (Müller 1998, Grewendorf and Sabel 1999).
8 But not beyond: constraints on binding domains require that a lower subject-oriented reflexive be bound by the closest c-commanding specifier, namely the narrow subject.

Subsequent movement of the narrow subject, then, might (or might not) be expected to cross over the broad subject which is already in [Spec, TP]. To get at the heart of this, we must examine the second kind of MSCs, those formed by multiple movements induced by one feature; I will put aside for the moment just how a single feature can enter into multiple Match plus Merge relations, returning to it in a bit. First off, it turns out that Locality conditions on Match along with the copy theory should never allow such a configuration. We start off as follows:

(24) [F-]H … [F+]α … [F+]β … [F+]γ

Match can occur here between [F-]H and [F+]α, and only between them; α or the copy of α, both with [F+], should always intervene between the Probe [F-]H and lower occurrences of [F+]. However, configuration (24) in fact paradigmatically allows Match plus Merge of all elements with [F+] if the category that hosts the Probe allows multiple specifiers in the first place. The most commonly discussed example is multiple wh-movement (Rudin 1988, Richards 1997, Alboiu 2000), here in Romanian:

(25)a. Cine1 cui2 ce3 a dat t1 t2 t3?
       Who whom what AUX.3SG given?
    b. Cine1 ce2 ziceai [CP că işi inchipuie t1 [CP că ai spus t2]]
       who what say-2.SG.PAST that REFL imagine.3.SG.PR that AUX-2.SG said
       Who did you say imagines you said what? (Alboiu 2000:171-2)

A catalogue gives some idea of how wide-spread this is beside wh-movement (e.g. in Romanian, Bulgarian, Tibetan): optional Ā-movement of non-exhaustive quantifiers in Romanian (Alboiu 2000:219ff., chapter 5); NPI movement in Bulgarian (Izvorski 1995) and West Flemish (Haegeman 1995); scrambling in West Flemish (Haegeman 1993); tous-movement in French (Starke 2001:77ff.); NP-movement in Icelandic (Richards 1997:90-99, Rezac 2001, 2002b); A-scrambling in Japanese (Richards 1997:84ff.); second position clitic clusters in Serbo-Croatian and Tagalog (Richards 1997:106) and Czech (Rezac 2000a, 2000b; there assimilated to NP-movement). For all of these, it can be shown that (a) one feature type of a single category is targeted; (b) the movement forms crossing paths, for which see much more below9.

9 To be sure, MSCs have been argued to have other properties. Equidistance is just wrong; see the appendix. McGinnis's (1998a) Lethal Ambiguity is far more intriguing.

So, Locality as given cannot bar MSC. However, based on independent evidence from agreement in Icelandic10, Chomsky (2000:131) reaches what I will call Ω:

(26) Ω: It is only the head of [a]11 chain that blocks matching under the locality condition (iii). A-movement traces are 'invisible' to the probe-associate relation; or from another perspective, the A-chain itself (regarded as a set of occurrences of α) constitutes the barrier.

10 In Icelandic, in the configuration T0 … DAT … NOM, Agree between T0 and NOM may happen iff DAT moves to [Spec, TP]; in particular, no Agree is possible in T0 … DAT1 … DAT2 … NOM (a raising construction with DAT1 being the experiencer of the raising verb, and DAT2 … NOM an ECM clause with a quirky dative subject and nominative object, such as a psych-verb or ditransitive passive construction).
11 For Chomsky's "the A-".

Anagnostopoulou (To appear) has given another very strong argument for Ω. She investigates constructions of the type (24) in Greek, where H is T, α is an indirect object with lexical genitive that prevents it from Agreeing, and β is an NP with structural Case. She finds the following generalization (chapter 1), which may be exemplified by the passive of a ditransitive in the double object construction:

(27)a. When a nominative argument undergoes NP-movement [subsuming Case/agreement without XP pied-piping -MR] to T in the presence of a dative DP argument, the dative DP is not allowed unless it is a clitic or is doubled by a clitic.
    b. To gramaj ?*(tui) taxidromitike tu Petrui tj xtes
       the letter-N ?*(CL-G) was.sent the Peter-G yesterday

Thus, Match of a lower Goal past a higher Goal with Matching features requires displacement of the higher Goal to the specifier of the Probe. In Rezac 2002b, I argue that given Ω, multiple goals can match one type of feature in a derivation in fact only if they form an MSC of the Probe, for Locality rules out every other derivation12.

12 Pesetsky's (1982) nesting paths are thus out, which seems correct. Some nesting paths, such as super-raising and wh-islands, are obviously out; others, such as copy raising and wh-island violations, have received motivated alternative analyses (base-generation, different attractors, depending on the school).

MSCs driven by one feature type have a crucial property in common: they form crossing, not nesting, paths, in that the c-command relations among the specifiers of the head are the same as those before the movement (Richards 1997:chapter 3). This Crossing Paths Condition or CPC on MSCs violates the Extension Condition coding of ExtC, by which movement must extend the tree. Since it happens, we want to code ExtC in such a (natural, simple) way that CPC falls out. A hint is found in the case of MSCs created by movements for different feature types. Miyagawa (1997) and Richards (1997:78ff.) discuss pre-nominative scrambling in Japanese, which they show is movement but not Ā-movement by the standard diagnostics (e.g. idiom chunks), but which McGinnis (1998:106) following Takano (1997) shows is not φ-driven either, because it affects PPs. In this it contrasts minimally with NP-movement e.g. in Icelandic, which affects only NPs with structural Case. I will refer to this constellation of properties as A-scrambling. Multiple A-scrambling shows CPC, as shown by Richards (1997:80), e.g. pre-nominative movement of two idiom chunk NPs must preserve the in-situ order:

(28)a. Taroo-ga hi-ni abura-o sosoida
       Taroo-N fire-D oil-A poured
       "Taroo made things worse."
    b. Hi-ni Taroo-ga t abura-o sosoida // *Abura-o Taroo-ga hi-ni …
    c. Hi-ni abura-o Taroo-ga t t sosoida // *Abura-o hi-ni …
       (Richards 1997:80-81, ex. 34a-c, 35a-b)


We will take the feature responsible to be [Σ-], on some category.

Now it turns out that it is possible to show that [Σ-] is in fact on T, where the φ-features that drive nominative NP-movement also sit. Two completely independent

only review the first. The Japanese T allows MSCs. φ-features on T can be diagnosed

either by the presence of a nominative subject, or by a subject honorification marker on

the verb which is a form of verb agreement (Harada 1976, Toribio 1990). The subject can also be genitive under a process called NOMINATIVE/GENITIVE (or GA/NO) CONVERSION in relative clauses and complex NPs; Miyagawa (pp. 16-17) argues that no

AgrS is present in that case:

(29) [DP [IP John-no/ga katta] hon]
          John-G/N bought book
     "The book that John bought" (Miyagawa 1997:16, ex. 46)

Now granting this, Miyagawa shows that pre-subject A-scrambling is out if it occurs in

the ga/no-conversion structure, unless there is honorification (p. 18-19):

(30)a. [DP [IP te-oi [IP Tanaka-ga/*no hoteru-gyoo-ni ti nobasita]] uwasa] Idiom

hand-A Tanaka-N/*G hotel-business-to extended rumour

The rumour that Tanaka became involved in the hotel business

(Miyagawa 1997:18, ex. 53a, b)

b. ?[DP [IP te-oi [IP Tanaka-kyoozyu-no hoteru-gyoo-ni ti o-nobasi-ni natta]] uwasa]

hand-A Tanaka-prof.-G hotel-business-to SH-extended-SH SH rumour

The rumour that Prof. Tanaka became involved in the hotel business

(Miyagawa 1997:19, ex. 56)

So A-scrambling is tied to the same T whose φ-features drive nominative movement13.

And now we reach the golden donkey14. As we have seen throughout, A-scrambling is to a position above the nominative (narrow) subject. Both types of movements target the TP. Hence multiple movements driven by [Σ-] on T do not tuck in under the one driven by [φ-] on T15. Consequently, MSCs formed via a single (type of) feature obey CPC; but movements driven by evaluating different (types of) features on a single head will not tuck in within each other, a conclusion also reached by McGinnis (1998:114-115). To push the conclusion a bit, I will go with an intuition, somewhat justified since [φ-] is obligatory and A-scrambling is optional: [φ-] is evaluated first in the derivation, [Σ-] second. So we may go a bit further: movement blocks (MSCs) driven by distinct features do not tuck in under each other either; their c-command relations mirror their derivational history (φ- > Σ- but A-scrambled position > nominative), and the tree is extended in this case.

13 What be these marvelous heads with multiple Probes? Syncretic, that's what they are: cf. the proposals in Diesing (1990) for Yiddish and Zubizarreta (1998) for Spanish that T has both A and Ā-features, and Rezac (2002b) that Icelandic C and Bejar and Rezac (2002a) that Spanish T have both Ā and [Σ-] features. The last two sources identify [Σ-] with the EPP itself.
14 Neologism; copyright summarily waived.
15 Recall that multiple φ-driven NP-movements, as in Icelandic, do tuck in, so CPC is not the A/Ā-distinction.


So, let's summarize this bit of empiricizing about how multiple features on a single

category behaved. (I) MSCs formed by the same feature (type) obey CPC, tucking in

under each other. (II) Movements due to different (types of) features do not tuck in under

each other, but extend the tree, the movement block closest to the root being the one driven by the last-evaluated feature, etc. (III) As between Merge for EPP alone, and Merge consequent on Match and Agree for φ-features, the first c-commands the second16.

3.2 The form of the solution

We want to deduce (I)-(III) from ExtC and EP. We will see it works out quite nicely, but

first let's return a bit to a point mentioned in section 2: given that ExtC limits the search

space of a Probe, by deriving SOs bit by bit, to the current SO, is that all there is to say

about it, or does the position of the Probe enter in? Most plausibly, position would limit

the domain to the sister of the Probe, which under BPS is the object the category of the

Probe labels. There's more at play in deciding between these two alternatives than just

adding an axiom. The second approach is more congenial to the late-nineties arguments

that computational tractability of an intuitive sort matters for syntax, and in particular that

Locality and sometimes ExtC involve a top-down search of the syntactic tree (which derives c-command). I think the very existence of MSCs forces the second alternative. If

the search-space was the entire SO, Locality (any metric based on something like c-

command) would automatically pick the specifier of the Probe as closer than any Match

embedded in a complement, and since in MSC formation the specifier by definition has a

Matching feature, it would create a "defective intervention effect". On the other hand, if

the domain is the sister of the Probe, we can let MSCs form, and it turns out to predict CPC,

and not even add an axiom, if Probes don't necessarily project and thus extend their

domain when specifiers are added.

This is the argument we will turn to now, with a bit of a road-map. We will start with

a biopsy of how MSCs would have to be built in on current assumptions, and see just

what the problem with tucking-in is. Then we reconstruct. Tucking-in falls into place if

we can dissociate Merge and projection (labeling): Merge targets the Probe, so if the

Probe hasn't projected, previously added specifiers will be ignored, giving CPC.

Projection will be forced by appealing to inclusiveness in a very general form; it turns out

that Agree cannot just change an LI as it pleases, it must change its copy, and that copy must be added as the LI's projection and the new Probe. Happily, the new projection operation, once actually written down on paper, is just a token of Merge itself. We end up

16 The configuration rings of the Merge-over-Move preference, supposed to account for (i):
(i) There seems <*a man> to have arrived <a man>.
I confess to having been very partial to Merge-over-Move, despite the extremely limited domain in which it holds (there-type expletives in ECM constructions in English and Scandinavian). However, Bošković (2002) has given a decisive counter-argument, so we have to find another way of dealing with (i) (and (iii)):
(ii) Deux soldats semblent (*au général) être arrivés en ville
     two soldiers seem to the.general to.be arrived in city
(iii) Il semble au général être arrivé deux soldats en ville.
(iv) *Il semble au général pleuvoir.
(ii) shows that raising across an experiencer is out in French. (iii) shows that 'raising' of an expletive is fine, however, so long as it's not a quasi-argument (iv). The contrast shows the expletive is base-generated upstairs and not raised; yet (iii) is at the same time a Merge-over-Move configuration like (i), with deux soldats not in the embedded [Spec, TP], which the Merge-over-Move preference tries to account for.


with a pretty picture: Match, Agree, and Project all apply freely up to convergence,

and we will see one way of bringing it all together in section 4. Along the way though,

we'll have to swallow a lot of minimalizing, like getting rid of (un)interpretability as a

force to be reckoned with.

3.3 The problem with tucking in

In the MI implementation, SOs are built by Merge, which combines two SOs, one

asymmetrically related to the label of the other. To this end let's introduce some

terminology. S(α, β) means that there is an asymmetric relationship between α and β; for

now, S(α, β) if L(α) theta-selects for β or a Probe of L(α) finds a Match in β. L(α) as the

label of α will be redefined from MI as follows, but Merge is kept:

(31)a. L(α) := γ s.t. γ is the unique X0 ∈ α.17
    b. Merge(α, β) → {L(α), {α, β}} iff S(α, β).

The labeling or projection inherent in Merge takes a copy of L(α) and gets it in a position

where it dominates the combination {α, β}. This yields correct results for both the first Merge of a complement YP and the second Merge of a specifier XP18:

(32)a. Beginning: SOs (sets) YP, XP in the Numeration, {H} in the computation

b. Merge({H}, YP): H → {H, {{H}, YP}}

c. Merge({H, {{H}, YP}}, XP): {H, {{H}, YP}} → {H, {XP, {H, {{H}, YP}}}}

But once we start adding further specifiers in a CPC-obeying MSC, the situation starts looking very different (the added specifiers are in bold):

(33)a. Merge(XP2, {H, {XP1, {H, {{H}, YP}}}}): 2nd Spec
       H → {H, {XP1, {H, {XP2, {H, {{H}, YP}}}}}}
    b. Merge(XP3, {H, {XP1, {H, {XP2, {H, {{H}, YP}}}}}}): 3rd Spec
       H → {H, {XP1, {H, {XP2, {H, {XP3, {H, {{H}, YP}}}}}}}}

The oddity is the pretty arbitrary nature of formulating tucking-in: the input to Merge is

α and β, but β must be introduced within α at a point that is to be found by isolating the

most deeply embedded SO of the form {H, {{H}, γP}}, so that we tuck in under all previous specifiers but not between H and the complement. This looks pretty stipulative:

attach β in α outermost, or innermost (between H and its complements), or anywhere, all

look fine, but attaching it at a pre-specified point that doesn't follow from anything is just

odd (note that BPS doesn't let us express the complement/specifier distinction). So

17 I'm making a distinction between x and {x}. The way L(α) has been defined, it doesn't apply to x; but for {x} it yields x as its label, for {x, {…}} ditto, etc. So we have adopted a way of identifying labels, and given up the Foundation Axiom; this much simplifies some definitions. I have a feeling this is more than a coding trick, and that labels, pure LIs, are naturally distinguished from SOs by not being sets.
18 {α, β} is a set, and thus is the same as {β, α}. I use ordering just for perspicuity: instead of always writing {H, XP} or vice versa, I write {H, XP} if XP is a complement, and {XP, H} if it is a specifier.


instead of adding to Merge, let's reduce it a bit.

The Merge of MI goes beyond a necessary combinatory procedure and encodes the

asymmetry that holds between α, β by virtue of S(α, β). Let's split this up, and take Merge to simply be just combinatory:

(34) Merge(α, β) → {α, β}

Now suppose we could come up with a simple, natural, virtually conceptually necessary

way of forcing the projection of a label under theta-theoretic Merge (A), suspending it

during the formation of a CPC MSC (B), and forcing it again when we are done with a

feature in forming an MSC (C) (a single specifier being the limiting case of an MSC).

Suppose further that we can identify or keep track of the label of an SO. Then, it seems,

we get both to form MSCs without higher specifiers interfering for Locality, and we

predict the CPC. First, if we assume that the domain Dom(L(α)) is the sister of L(α),

then since projection takes place after theta-theoretic Merge (A), any [Spec, αP] thus

introduced falls in Dom(L(α)). But for MSCs, no projection takes place while [F-]L(α)

constructs a CPC-MSC (B) and such specifiers are outside Dom(L(α)), and out of

running for Locality, until we are done with this feature, at which point L(α) projects (C)

and the entire current SO falls into Dom(L(α)). MSCs will obey CPC exactly so long as

they are being produced by one feature because only then does no projection take place

(B), and thus each successive specifier Merges just above the current location of the label

which stays the same:

(35)a. Merge(H={H, {{H}, YP}}, XP1): H → {XP1, {H, {{H}, YP}}}
    b. Merge(H={XP1, {H, {{H}, YP}}}, XP2): H → {XP1, {XP2, {H, {{H}, YP}}}}
    c. Merge(H={XP1, {XP2, {H, {{H}, YP}}}}, XP3):
       H → {XP1, {XP2, {XP3, {H, {{H}, YP}}}}}
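In the same toy encoding (nested tuples, heads as strings; my sketch, not the paper's formal system), (34)-(35) come out as follows: Merge is purely combinatory, and as long as the Probe's head has not projected, each new specifier combines directly with the object the head labels, so it lands below the specifiers already added — tucking in.

def merge(alpha, beta):
    # bare Merge(alpha, beta) -> {alpha, beta}, no labeling
    return (alpha, beta)

def merge_at_probe(so, head, spec):
    """Merge spec with the subobject labeled by the (unprojected) head,
    leaving earlier specifiers above it: crossing paths."""
    if isinstance(so, tuple) and so[0] == head:
        return merge(spec, so)                   # attach right where the Probe sits
    left, right = so                             # in this sketch the head is found before any leaf
    return (left, merge_at_probe(right, head, spec))

base = ('H', 'YP')                               # {H, {{H}, YP}}, simplified
s1 = merge_at_probe(base, 'H', 'XP1')            # {XP1, {H, YP}}
s2 = merge_at_probe(s1, 'H', 'XP2')              # {XP1, {XP2, {H, YP}}}: XP2 tucks in
print(s2)
# Once H projects (a copy of H merged atop the whole SO), a later Merge would
# target that new label instead and extend the tree.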

A new feature projects a new label (C) over all the current SO, and Merge targets that.

Finally, even observation (III) about broad subjects falls into place: [φ-] on T is valued first, and once Agree takes place T must project over the narrow subject nominative; EPP

valuation by broad subject Merge follows atop this new label, so broad subjects are above

narrow subjects.

We have a big supposition, and a little one. The little one is that labels can always be

identified. We'll just say this is a condition on any final system; the one we build in

section 4 will do fine. The big supposition is conditions (A)-(C) governing projection.

(B) and (C) suggest a very simple generalization: projection happens upon Agree, which

eliminates a feature from the computation. Why this happens, and how (A) fits in, will

be the topics of the next two sections respectively. However, first let's suggest a solution to a minor design issue; just how is it possible

for one feature to enter into a relationship with multiple targets in the formation of MSCs?

A theme that will run through what follows, but that can only make any sense when a

complete system is suggested in section 4, will be that syntactic operations are more or

less free to apply as they will; Agree in particular. This means that a feature [F-] may

engage in multiple Match plus Merge (movement) operations, if the head that hosts it

provides positions for multiple specifiers, without Agree happening. Still though, we will


want the computation to finally cease its ruminations over [F-] and move on to either a new feature or a new head; and this means getting rid of the current "locus" feature by Agree.
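As a control-flow sketch of this idea (a toy of my own, with hypothetical feature names, not a claim about the actual mechanics): a single Probe may keep Matching and Merging Goals while specifier positions remain, and only the eventual Agree values the Probe, removes it as the current locus, and so forces projection.

def probe_cycle(feature, goals, spec_slots):
    """Freely repeat Match + Merge for one Probe, then Agree once to finish it off."""
    specs = []
    for goal in goals:
        if spec_slots == 0:
            break
        if feature in goal:                    # Match
            specs.append(goal)                 # Merge: tuck in at the Probe
            spec_slots -= 1
    value = specs[0][feature] if specs else 'default'   # Agree (here, with the closest Goal)
    return {'specifiers_top_down': specs, feature: value, 'projected': True}

# hypothetical multiple wh-movement: three wh-goals, three specifier slots
wh_goals = [{'wh': 'cine'}, {'wh': 'cui'}, {'wh': 'ce'}]
print(probe_cycle('wh', wh_goals, spec_slots=3))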

3.4 Agree and Project

The visible result of Agree is an unvalued feature valued by a Matching Goal. One conceptual take (Chomsky 1993, 1995) has been that unvalued features are

uninterpretable, and must be deleted during the course of the computation to LF under

identity with interpretable valued features lest they cause a crash. Chomsky 1998 comes

close to a second take, that the relation is simple valuing-cum-deletion19

, which we will

develop. Some reason to give up the first story is the familiar hornet's nest of problems

(cp. erasure vs. deletion in Chomsky 1995), e.g. unvalued features must be removed from

intermediate links in a chain when they are deleted at its top, and the putative crash they

cause is often avoided even though no Match is possible if there is default morphology

available. But I think there are better reasons: the uninterpretability-as-crash story makes

no sense, and it's not necessary.

First, let's grant that an occurrence of a feature might be paradigmatically

uninterpretable on a category, like [person] on T20

. This means that at LF, it doesn't

contribute to interpretation. But I can't think of any reason why that should cause a crash.

Surely uninterpretability isn't fatal? First off, we have got all these phonological,

categorial, semantic and other features in the computation that don't seem to cause any trouble at a non-corresponding interface (and saying spell-out e.g. filters out the

phonological but not the formal features before LF is reached is pretty ad-hoc). Worse,

some valued feature occurrences arguably just don't play an interpretive role, though their

differently-valued kin do: singular number and 3rd person on DPs come to mind. These

are directly analogous to uninterpretable features on T. Worst of all, uninterpretable

features on T are fine in the absence of a Matching Goal, if default morphology is

available. Thus, if there are uninterpretable occurrences of features, LF seems happy to

ignore them. I think these last two points are quite important, since the evidence simply

flies in the face of fatal uninterpretability.

Do we need uninterpretability technically then (as opposed to lack of value), even as

things currently stand? Certainly not. Uninterpretability figures in MP as the big-picture

motivation for Agree only, not in the implementation; there isn't enough content to the

notion to explain why [F-] must be gotten rid of by being paired with a Matching [F+], to start off. The one big thing that the uninterpretability-is-crash story does get us

for free is that valuing must happen before LF is reached. This seems true by and large, except for the default agreement point. But the big insight of work on ExtC after

Chomsky 1993 was that it is statable along the lines of, "Once an item is introduced into

the derivation, its unvalued features must be valued (trigger valuing operations) as fully

as possible as early as possible." This both guarantees ExtC and navigates around default

agreement. We will return to this at great length; what matters is that all is as great as it could be without lethal uninterpretability.

19 That is, he does take this approach technically; but there is a contingent one-to-one mapping between value(less)-ness and (un)interpretability, and the latter drives the big picture.
20 Even so much is not clear, e.g. number-dependent interpretive differences in "Animal languages is/are their main research interest" (MI:34, ft. 71), or a potential analysis of pro-drop. Still, I'm sympathetic to failure to be interpreted, just not to its being lethal.

Now we're ready to see why Agree forces projection. The reason is entropy, or more

specifically, a very general formulation of Structure Preservation / Inclusiveness. The

former is a ban on destroying stuff, the latter on adding stuff that isn't there to start

with. And in the evolution of the theory, both have played less and less role: adding

traces from the outside let us keep structure, copy theory let us get rid of traces and keep both structure and content. The logical conclusion is that no content gets added or

deleted, except for structural information: Radical Inclusiveness. Syntax consists purely

in the combinatory (re-)arrangement of objects.

This prevents even unvalued features from deleting, or rather, from deleting on the

occurrence of the category they are on to start with. But we want to keep to the MP

observation that Match cum Agree drive syntax, and the core of this is (deletion of)

unvalued features under valuation. Agree takes a category with a Probe and by doing

something (deletion, valuing) to the Probe, it creates a modified instance of the original; a

Modified Lexical Item (MLI), as MI calls it. By Radical Inclusiveness, the original

cannot be deleted; but MLI must get into the computation anyway, else Agree is vacuous.

The answer is pretty much determined by the tools we have: MLI gets inserted at the top

of the current SO (ExtC), where its presence 'hides' the original unmodified category with

the Probe (so as to let syntax move on). This new occurrence-sans-Probe of an LI, the

MLI c-commanding the original occurrence, is exactly what a label or projection is.

Successful Agree thus forces projection. Looking closer at the implementation of this projection, we get a surprisingly

minimal picture. MLI gets added at the top of the current SO, we said. This "adding"

must look something like this, where α is the current SO and Agree outputs an MLI:

(36) Project(Agree(L(α)), α) → {Agree(L(α)), α}

In words, projection takes the current SO α, the MLI of its label created by Agree, and

labels α with the latter, which in BPS means doing a set combination. But this rule,

Project, is just Merge (34). With hindsight, this is not surprising. If labels literally are

(modified) heads, then both movement (under the copy theory) and label projection have

the same formal description: take two objects and combine21

. Barring complications in implementing the mechanism as a whole (patience), we keep to a single combinatory

operation after all.
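In the same toy terms (my sketch, with a hypothetical φ feature), the identity of Project and Merge can be made explicit: Agree returns a modified copy of the label, and Project simply Merges that copy with the current SO, so the MLI ends up c-commanding, and thereby labeling, the original occurrence.

def merge(alpha, beta):
    return (alpha, beta)                       # the single combinatory operation

def agree(label, value):
    # create the MLI: a copy of the label with its unvalued feature now valued
    mli = dict(label)
    mli['phi'] = value
    return mli

def project(current_so, mli):
    # (36): Project is just Merge of the MLI with the current SO
    return merge(mli, current_so)

T = {'cat': 'T', 'phi': None}                  # T enters with an unvalued phi-Probe
TP = merge(T, 'vP')                            # {T, vP}
TP = project(TP, agree(T, {'per': 3, 'num': 'pl'}))
print(TP)                                      # the valued copy of T now labels the SO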

This obviously captures our desideratum (C), that after a feature is done with,

projection happens. Moving to a new feature on a head (or to a new head), which requires Agree for the old feature in ExtC implementations, will ipso facto force

projection. (B), non-projection during the formation of an MSC created by a single

feature, also follows. We take the application of Match, Merge, Agree to be free, except

as governed by principles like EP which implement ExtC. CPC MSCs are formed by

multiple Match + Merge for a single Probe, where each successively more distant Goal is

dislocated to the Probe's specifier. If at any point Agree should happen, the Probe is

valued and finished, and no more MSC formation22

. While the MSC is being formed

21

This obviously gives quite a different take on head movement, but I haven't worked it out. In spirit, it might be similar to Koenigman 00, if the teaser on the back cover (no dust jacket) is right. 22 Actually, if Merge-cum-Project is free, it can happen at any point. Obviously, if it applies during the

Page 18: Cyclic Domains and (Multiple) Specifiers Constructionsarchive.sfl.cnrs.fr/sites/sfl/IMG/pdf/Cyclicity2002.pdf · was the idea that syntactic operations are tied to (features of) phrasal

18

though, no Agree and no projection; and thus the Probe keeps its place in the syntactic

structure, specifiers are added outside of its domain, and each new movement Merges at

the root where the Probe is, 'tucking in' under previous movements.23
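A toy rendering of the tucking-in pattern just described, under the simplifying assumption that a constituent is a pair and that the Probe's (unprojected) projection can be spotted by its head; the encoding and the function names are mine, not the paper's definitions.

```python
# Toy tucking-in: while a single Probe drives MSC formation its head does not
# project, so each new Goal Merges with the constituent that still has the
# Probe's head-complement pair at its root, landing below earlier specifiers.

def is_probe_projection(so, probe):
    """True if `so` is the (unprojected) head-complement pair of the Probe."""
    return isinstance(so, tuple) and len(so) == 2 and so[0] == probe

def tuck_in(so, probe, goal):
    """Merge `goal` with the subconstituent rooted in the Probe's projection."""
    if is_probe_projection(so, probe):
        return (goal, so)                        # first specifier: {Goal, {H, XP}}
    spec, rest = so                              # descend past earlier specifiers
    return (spec, tuck_in(rest, probe, goal))    # and rebuild on the way out

vP = ("v", "VP")
step1 = tuck_in(vP, "v", "G1")     # ('G1', ('v', 'VP'))
step2 = tuck_in(step1, "v", "G2")  # ('G1', ('G2', ('v', 'VP'))): G2 tucks in under G1
```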

3.5 Select and Project

The other big issue to address is (A): how come projection is consequent upon base Merge, in particular theta-theoretic Merge. It must be; in the limit, base-generated specifiers are always outside complements (no tucking in here), which shows that projection follows complement Merge and precedes specifier Merge; and it also follows specifier Merge, e.g. to label the category in question for selectional purposes.24 I say we bite the bullet on this one:25 first, base Merge is asymmetric, with a selector and a selectee and a featural relation between them; and second, the relation is just Match plus Agree with all their normal properties, the selector having a Probe that must be eliminated under Agree with a Matching selectee.

The general intuition in MP up to and including Chomsky 1995, roughly, has been that base Merge does not involve a featural relation. The basic combinatorial operation is free; the big-picture motivation is that it is absolutely necessary for language. Label determination arises as the big technical problem: what determines, when V and N are combined, that the label of {V, N} is V rather than N? Chomsky 1995:00 had a complicated story where projection is essentially free, with wrong labels filtered out by interpretability at LF (thus, with massive globality): {N, {V, N}} simply won't make any sense (in a given context). I don't see this: regardless of the exact categories involved, "see a picture" and "a proposal to leave" are both semantically coherent at LF, showing that there is nothing wrong with either (referent, predicate) or (predicate, referent) at any rate. But global computation was superseded by local computation about this time (Yang 1996, Collins 1997, Frampton and Gutmann 1999 (ms. 1998)), and Chomsky goes for local determinability of labels by taking Merge to be asymmetric in MI:50-1. There, the difference between Agree in checking theory and in theta theory is that the selectional feature is interpretable on the selector and the selectee both, and thus does not delete (the Projection Principle).

21 This obviously gives quite a different take on head movement, but I haven't worked it out. In spirit, it might be similar to Koenigman 00, if the teaser on the back cover (no dust jacket) is right.

22 Actually, if Merge-cum-Project is free, it can happen at any point. Obviously, if it applies during the formation of an MSC, the specifier(s) now in its domain will block Match beyond them and yet, being no longer active, will disallow Agree; which generally leads to an infinite Match-can't-Agree loop and no convergent derivation. However, if a specifier that's gotten there otherwise, say by base-generation, thus falls in the Probe's domain, it may now Match with it; a way of getting the facts in section 2, though presently we will actually decide that adding a specifier deletes a feature and thus forces projection anyway.

23 Now, it turns out that the long diatribe against fatal uninterpretability is not the only way to go and still keep this system. We get a chain (MLI(α), α), with [F-] on α but not on MLI(α). The same is true for successive-cyclic movement, where only the head of a chain gets its Case, etc., deleted, and the intermediate copies have it. So even with uninterpretable features, there may be a principle that says one (or the head) occurrence in a chain must have undergone Agree, and this licenses the other occurrences; and this would work fine for (MLI(α), α) chains. Getting rid of lethal uninterpretability, while having valueless-ness still drive syntax, is just one such principle; the diatribe makes the principle principled.

24 Modulo ft. 22, this ensures that a Probe on H gets to see [Spec, HP] after it is Merged in; EP still ensures that it first tries looking in the complement. This requires the feature that selects for a specifier and the Probe to be unordered w.r.t. each other, and executed in parallel; see section 4.

25 That is, after drafts of not finding this conclusion necessary, finding it necessary to bury it in footnotes, and exploring the 'hide text' feature of my word processor (the one below where the 'write dissertation' option will one day be). At least the story seems to impose its own logic.


Selectional asymmetry is what determines which object projects.

The featural relationship between the selector and the selectee, and its connection to the

Match/Agree system, is left unclear.

I will assume that there is an asymmetry, and that there is a featural relationship. But

I will take the relationship to be precisely just Match plus Agree: the selector's Probe

Matches the selectee under Locality, gets valued/deleted under Agree, and the selector projects as a consequence. There are a couple of points which want discussion: the

nature of the featural relationship, and the validity of the Agree mechanism here.

The first point mainly concerns what kind of features selectors and selectees are attributed. I think we can get our best cue not exactly from theta selection, but from non-thematic instances of base-generation and their alternations with movement. One case is the (classic) EPP. Expletives satisfy it by base Merge; otherwise movement does, I would argue movement of the closest 'dislocatable' XP, modulo some variation as to what counts as dislocatable. What I mean is particularly clear in Icelandic Stylistic Fronting, discussed recently by Holmberg (2000): if there is no subject and no expletive (e.g. under subject relativization), the closest XP will move to [Spec, TP] (adverb; participle; NP and PP argument). I think pretty much the same generalization might go for Locative and Copular Inversion in English. If we follow Collins (1997) in taking the unaccusativity restriction here to result from the theme and the locative being "equidistant" just here (e.g. if both base orders are possible), and exclude e.g. temporals from the running through being too high (TP-adjuncts, say), we ballpark the correct restriction on inversion just by saying EPP takes the closest XP. Still more generally, in Rezac (2002b) I apply the syncretic category theory to C in Icelandic, and suggest that the V2 phenomenon is due to EPP on C, which gets satisfied by Ā-movement if there is an Ā feature, and by the closest XP (subject or expletive) if not (the EPP, perhaps); ditto ceteris paribus for A-scrambling in Japanese (our mysterious [Σ-] above),26 broad subjects, etc.

The generalization that looks as though it's coming out of all this is that there is a feature [Σ-], of which classical EPP is an instance, that just demands an XP, however it gets it, including by initiating its own Match + Merge. If this [Σ-] feature then just seeks an XP, Locality will not let it look very far: not beyond the closest XP, give or take a bit.27 Match and Agree are how we deal with such feature relations. In the movement case, [Σ-] triggers Match with its Locality, finds the closest XP, and Merge to [Spec, TP] follows. In the base Merge case, the Match is in the Numeration. [Σ-], being unvalued, gets to be valued/deleted by Agree under Match with an XP; and as we now know, this causes projection. All these mechanisms we already have; they are free to happen up to convergence, but they must, lest [Σ-] remain unvalued and ExtC never get beyond it. The end result is {T2 {NP {T1 VP}}}, [Σ-] ∈ T1 but ∉ T2, for older [TP NP [T' T VP]].28

The way we've just implemented EPP, theta-theoretic Merge can also be implemented, and if it can be so, it must be so. Ok, so we would prefer better reasons, but I really don't have strong ones beyond the system when all is put together; still, I will offer two weak ones.

26 Ditto Bejar and Rezac (2002b) for Romance dative "subjects".

27 What I mean by this Fodorism is, the EPP of T generally doesn't move the VP complement of T. I think this is probably general: the (head of the) complement can't be moved (Bošković and Takahashi 1994, Bošković 1997, Abels 2002). Where it looks like the VP moves to [Spec, TP] (Niuean, Massam and Smallwood 1998), we would expect a category to intervene (say, a VP-peripheral Ā target).

28 This was just one of many page-long diatribes that eventually became a non-issue; I had a sentimental attachment to this one because it ended, "…in this draft; hopefully this doesn't become a stray footnote."


One is parsimony. We have adopted Chomsky's hypothesis that theta-theoretic Merge is asymmetric and thus labels are determined; Merge-qua-Move is also asymmetric and also determines labels, so we might want the same system for both. To do this well we would need another paper; but I don't see obvious problems. The other reason is selection. S-selection has been very popular for a while now; but it seems to me the limiting cases of c-selection assure its continued existence.29 Now this is an empirical argument that selection involves uninterpretable features, for "I want a dative PP" doesn't seem interpretable on V in the same way "I am about experiencing things" is. Generalizing, I think the present system can implement c-selection fairly easily, e.g. [PP-] (though we will probably want Locality to count all XPs as one thing).

But enough of this. It's pretty obvious that EPP-theoretic and theta-theoretic Merge are both parsimoniously formulable in the present system, without adding anything except some unvalued features, to give the right empirical results and the right theoretical results for projection (which is what we wanted). It goes against the intuition that theta-theoretic Merge and movement are very different kinds of things, and if they're so different maybe they should not be implemented in the same way.30 I'm very sympathetic to this, but a mixture of parsimony and empirical evidence, particularly EPP which seems to bridge the two worlds, looks like a good argument for an identical implementation.

Let's see where we stand if this is right: we have got Match, with consequent Merge and Agree, applying freely up to convergence; each application of Agree forces projection due to Radical Inclusiveness; projection looks like it's just Merge if Agree passes it the right arguments; and the empirical facts about MSCs and CPC, and even probing specifiers from section 2, all follow. The technical modifications have not added stuff to the MI system; they've simplified and followed up consistently on principles. Now, it's time to (a) give a more precise formulation of Match, Merge, Agree; (b) find a way of putting them together.

29 One limiting case is idioms. Sure, they are listed in the lexicon. But clearly, they're not words, since they can undergo movement (Tabs are to be kept on all verificationist epistemologists at the conference), while parts of words can't (the lexical integrity hypothesis for movement); so they need to be assembled in the syntax. But it's crucial to idioms that the relation between keep and tabs not be semantic; ergo, c-selection drives Merge. A different limiting case is instances of the semantics-independent variation in the argument structure of LIKE and FEAR: dative/PP-experiencer vs. Case-experiencer, both across languages and within a language. These are not just cases where current theories of s-selection don't work, but ones where I don't

think it can work in principle.

30 One may ask why the result of Agree is never spelled out as agreement morphology in the case of theta-theoretic or EPP-theoretic Merge. Although there is an intuitive functional trade-off between Merge and Agree, in that both signal dependencies, and accordingly e.g. topicalization, wh-movement, even A-movement are often signaled by only one operation per construction, this is not always the case (Irish has both at the same time for wh-movement, for example, English for A-movement; but neither vice versa). The obvious question to ask is, what would Agree for a valued categorial feature look like? Under c-selection, something like an applicative "I want a dative" would Agree into something like "I have selected a dative"; and that is applicative (inversion) etc. morphology. More generally, we might analyze version/applicative morphology as due to one category with a Probe whose spell-out varies with the thematic/categorial status of the NP it selects under Agree: applicative, instrumental, locative.

4 Basic operations and their composition

We have been dealing with Match, Merge, and Agree. Both Merge and Agree are always

preceded by Match in the current implementation, if the above remarks about theta-

theoretic Merge hold water. Match is what gives Merge and Agree their asymmetry, the

fact that they are relations between Probe and Goal, unvalued and valued feature, selector

and selectee. What we want to do now is create a system in which these operations

compose just so that all and only the desired (observed) combinations result and ExtC is implemented. This is rather more art than science: of the many ways of doing this,

art picks those that are simple and natural, and satisfy our theoretical intuitions. I take

this to be one use of formalism: it should encode the properties of the mentally

(psychologically) real mechanism we postulate, but also clarify them and exhibit their

relationships. We'll strive and fail to do that here, but there is some value in the attempt.

A word about theoretical intuitions. These are, of course, "isotropic" and "Quinean"

as Fodor 1983 puts it, in other words thankfully not capable of formulation or open to

challenge; "virtual conceptual necessity" being a choice candidate. Despite these quips, I

take theoretical intuitions to be crucial. Here, however, is one that's pretty isolable in the

present formulation, and we've been bumping into it: I want applications of the basic

operations to be intrinsically determined. No principles along the lines of "Agree follows

Merge follows Match follows an uninterpretable feature at the root," which is an extrinsic

description. Intrinsic determinability consists in (I) giving the operations certain

structural descriptions (e.g. Merge needs two SOs), (II) giving lexical categories certain

structural descriptions (e.g. T in English has [φ-] and EPP), (III) giving convergence requirements at the interfaces (familiar!), (IV) saying there's nothing more to the system

than (I)-(III), or perhaps some very general, intuitive principle we can't do without and

that feels like it could be conceptually justified; for us, the EP. (I)-(IV) feel pretty

minimal, though minimality is also to be applied within them, e.g. Radical Inclusiveness.

I hasten to point out that intrinsic determinability is just an intuition, and there are others

which guide the present story to give a satisfying system; but it's the one we'll follow.31

We start off with two desiderata: operations must be able to apply to the Numeration and

SOs constructed from it; and they must be able to feed (build on) each other. One option

that comes to mind is something like a production system (Post 1943, Newell 1973). In a

nutshell, operations are formulated as if-then conditions which apply just in case the if-

clause is satisfied; the conditions to which the if-clause refers, and the output of the then-

clause if any, are stored in a special data buffer, the working memory. For example,

assume Match yields a pair of SOs (α, β) and puts it in working memory; MI-style Merge

might be formulated, "if there is (α, β) in working memory s.t. [β] ∈ α, then put

{α, {α, β}} in working memory." We might easily identify working memory with the

Numeration, or create a separate buffer. This approach is very doable; the algorithm of

Frampton and Gutmann (1999), in particular, looks very much like a production system of this sort (and the influence is felt in MI).31

31

Examples are (combinations of): every Numeration must be able to converge (Frampton and Gutmann 2000) or if there is a convergent derivation, it is (the only one that is) interpretable (Yang 1996); only local

decisions are allowed (Yang 1996, Collins 1997); the computation is deterministic (A-over-A, Chomsky

1954); the system should be transparently implementable on a certain computational architecture; etc. And

these lead to very different ideas: the last one, for example, might over-ride ontological purism and replace

SOs as sets with something more computationally tractable (in GOFAI), e.g. linked lists; or in

connectionist architectures operations might become soft constraints (OT). What we will end up with looks (in retrospect!) like something that combines the intuitions of production systems (Post 1943, Newell 1973)

and function composition (e.g. lambda calculus, recursive functions).

Page 22: Cyclic Domains and (Multiple) Specifiers Constructionsarchive.sfl.cnrs.fr/sites/sfl/IMG/pdf/Cyclicity2002.pdf · was the idea that syntactic operations are tied to (features of) phrasal

22

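Purely for illustration, the Merge rule quoted above can be rendered as a production-system rule over a working memory; the selects() condition and the particular category labels are stand-ins of my own, not Frampton and Gutmann's actual algorithm.

```python
# A toy production system: an if-then rule that fires over working memory.
# It mimics the MI-style Merge of the text: if memory holds a pair (alpha,
# beta) such that alpha selects beta, add the labeled set {alpha, {alpha, beta}}.

working_memory = [("V", "DP")]   # pretend Match has already deposited this pair

def selects(alpha, beta):
    """Stand-in for the rule's condition '[beta] ∈ alpha'."""
    return (alpha, beta) in {("V", "DP"), ("v", "VP")}

def merge_rule(memory):
    for item in list(memory):
        if isinstance(item, tuple) and selects(*item):
            alpha, beta = item
            memory.remove(item)   # clean-up of the consumed pair (cf. fn. 32)
            memory.append(frozenset({alpha, frozenset({alpha, beta})}))

merge_rule(working_memory)
# working_memory now holds {V, {V, DP}}, available to further rules.
```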

The reason we won't quite take this route is that it attributes a mental reality to the intermediate outputs of all operations. In the above example, the pair (α, β) has existence internal to the formalism32 that is independent of its being the output of Match and the input of Merge. In different terms, there is an intermediate level of representation with (α, β) in it. And in principle it should then be possible to state conditions on (α, β) or have external systems access it. Now it is purely a judgment call for me that the output of Match doesn't exist in this way, Match being just a step in the derivation. Nevertheless, I will assume so. Thus, we want to devise a formalism that will disallow, in principle, any external system or rule or constraint from referring to its output.

What I'm after, then, is a formalism that captures Fodor's (1983) property of

encapsulation in the present context. Specifics aside, this is the only way to do formalism: if encapsulation is a real property of a system, we want it to follow from the

formalism, and preferably without coarse external stipulations.

We know how to make one operation, or function, the input of another: this is

function application in Church's lambda calculus or function composition in Gödel's

definition of primitive recursive functions. The idea is that if a function, f(x)=2x say,

takes as input arguments of a certain type, integers, it cares little if the integers are given

directly or are produced by a function that produces integers, like g(x)=3+x:

f(7)=2×7=2×(3+4)=2×g(4)=f(g(4)), or in lambda notation, f=λx∈ℕ(2x), g=λx∈ℕ(3+x), 14=[λx∈ℕ(2x)](7)=[λx∈ℕ(2x)]([λx∈ℕ(3+x)](4)). The output of function g in the formula f(g) has no independent existence, and that is exactly what we want. Oh sure, if you want

to build a device (say a Turing machine) to implement f(g), you might want to give it a

working memory which stores the output of g; but that is quite an independent matter.

The formalism describes the mentally real virtual architecture, the level of description at

which we deal with the language faculty.
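The same contrast in code, mirroring f and g above (my own toy example): under composition the intermediate value never becomes a stateable object, while a working-memory formulation stores it.

```python
def f(x):
    return 2 * x

def g(x):
    return 3 + x

assert f(g(4)) == 14     # g's output exists only as f's argument

working_memory = []
working_memory.append(g(4))                      # the intermediate 7 is now a
working_memory.append(f(working_memory.pop()))   # stored, inspectable object
assert working_memory == [14]
```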

To start then, let's introduce some vocabulary. The set of all operations is ς :=def

{Match, Merge, Agree}. We assume a Numeration is the start of our derivations as per

standard arguments: we need it to determine the reference set for economy

comparisons.33 Following MI:35 in a departure from Chomsky 1995, the Numeration is a

set rather than a list of items from the lexicon; w.r.t. the derivation, it contains types, not

tokens. A derivation tokenizes Numeration types as occurrences.34 A derivation δ is a

mapping from the Numeration. Given an SO α, L(α) returns the label of α, namely the unique X0 β ∈ α, as defined in (10)b; it is undefined otherwise, as for example if α doesn't immediately contain any X0.35

32 Perhaps temporary, if Merge includes a clean-up operation, "then … and remove (α, β) from working memory." This is part of why encapsulation probably is impossible to prove empirically here.

33 Chomsky (1995, 1998) uses it to determine the Merge-over-Move preference; Reinhart (1995, 1997, 1999) and Fox (1995, 2000) show some such set is needed to allow marked optional operations.

34 This all assumes the problems can be worked out. In a nutshell, a token of he introduced by thematic Merge from the Numeration, and one introduced via movement at the head of a chain, can't be distinguished if nothing more is said: he1 washed him*1/2 and he1 was washed he1/*2 can't be properly indexed just by saying the former contains two he's and the latter one. In this particular case the LF algorithm is quite simple, actually, free coindexing: obviation should rule out he1 washed him1 and absence of theta-role he1 was washed he2. PF is a different matter. If a multi-dominance approach (Gärtner 1997) distinguishes first and later Merge (as in Chomsky 2001:00), it solves the problem but at a cost I don't want to pay (Merge and Move are distinct primitives); in the version I want to buy it, it doesn't help.


Finally, we will pretty idiosyncratically define a * function, which takes an occurrence of a feature and returns the minimal set that contains the X0 which contains it (since X0s are not sets for us, simply the minimal containing set): given {α … {β X0, {γ…}}}, if X0 contains an occurrence [φ-]i, then *[φ-]i = β. We will also assume Ω (26).

Let's start the construction of the system with some minimal building blocks:

(37)a. Merge: (α, β) → γ = {α, β}

b. Agree: (α, β) → γ = MLI of L(α) under L(β)

So, both Merge and Agree take two objects; Merge forms a set of them, while Agree

returns the modification of the label of the first, details of which we will leave for later. Let's

further tacitly agree that Merge is distinct from Match (at least) and Agree (possibly, not

a big point) in that it puts its output into the Numeration, and deletes its input from it if it

is there: thus, if the Numeration is {α, β}, Merge(α, β) will output {α, β} and also

rewrite the Numeration as {{α, β}}. The technical implementation is trivial and

completely uninsightful (meaning, I suspect there's something bigger to this), so we'll

skip it.36 Intuitively, we are drawing a distinction between Merge and the other operations precisely in encapsulation: the output of each Merge does have independent

existence, and is in principle accessible to external systems. For example, if interfaces

accessed the Numeration dynamically in parallel with the derivation, conditions can be

stated over {α, β} even if (a) it is deleted from the Numeration as soon as {γ, {α, β}} is

constructed and (b) even if say β is deleted by a later operation. (b) ensures

reconstruction (in most general terms) of β into {α, β} is always possible if {α, β} is ever

the output of Merge even if {α, β} doesn't finally survive as such (e.g. ellipsis, maybe).

There's probably evidence for this. For one thing, this is what a Binding Theory, for

example, that allowed binding conditions to apply at each derivational step would look

like, as has frequently been proposed; but this can be handled just as well under the copy

theory. More interestingly, multiple spell-out along the lines proposed by Uriagereka

1999a, 1999b requires access (by PF) to the assembly process, in that it actually feeds the

computation proper, if he is right (as I want to believe) that CEDs are due to a PF effect.

Technically, we will need what Merge does to the Numeration when we come to Match.37
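A sketch of this behaviour of Merge, using the Merge(α, β, ℵ) signature of footnote 36; the string stand-ins for lexical items are mine, and this is an illustration of the intended bookkeeping, not a definitive implementation.

```python
# Merge forms {alpha, beta}, deletes its inputs from the Numeration if they
# are there, and writes its output back in, so that the output is visible to
# a later Match over the Numeration.

def merge(alpha, beta, numeration):
    gamma = frozenset({alpha, beta})
    numeration.discard(alpha)
    numeration.discard(beta)
    numeration.add(gamma)
    return gamma

numeration = {"saw", "the-dragon", "v"}
vp = merge("saw", "the-dragon", numeration)
# numeration == {"v", frozenset({"saw", "the-dragon"})}: Match over the
# Numeration can now pair v with the VP just built and feed Merge again.
```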

I trust that except for the partial encapsulation bit, this is pretty mundane; just MI,

mostly. A lot of explanatory burden falls on Match, the definition of a derivation, and the

Early Match Principle:

(38)a. Match: α → (β, γ), where β = *[F-], γ = *[F+] ∈ α s.t. β ≥ γ and there is no *[F] > β and/or β > *[F] > γ (s.t. *[F] is the head of its chain)
b. Derivation: a member of the set Δ of all δi s.t.
(i) δi is the closure of ς under function composition on the Numeration;
(ii) for any LF-interpretable object λi s.t. δi = λi, δi has fewer instances of function composition than any other δj = λi.
c. Early Match Principle (EMP): Match must apply whenever possible over the Numeration.

35 E.g. a specifier construction before the head projects, {SPEC, {H, YP}}, vs. after, {H', {SPEC, {H, YP}}}.

36 For example, since all operations occur relative to a Numeration ℵ, and if Numerations are relativized to "phases" (e.g. CPs) that switch during the course of the derivation, we probably want all operations to carry a "pointer" to the current Numeration anyway; so take instead Merge(α, β, ℵ), Match(α, ℵ), Agree(α, β, ℵ), where ℵ refers to the current Numeration. For Agree this is vacuous; Merge and Match can both access the Numeration, e.g. Merge modifies it, so they do what they do on ℵ. Trivial.

37 But a way of encapsulating the whole system is to interpret the Numeration as a function, ℵ, that returns a single argument; this involves trivial changes to the system. I haven't explored it.



Match takes a single set-theoretic object, and it finds in it a Matching pair ([F-], [F+]),

such that (a) [F-] is the highest occurrence of [F] in the object, and (b) [F+] is the closest

occurrence of [F] in the domain of [F-]. (b) is Locality. (a) partly implements ExtC, in

connection with the EMP. The intuition behind (a) is found in MI:49, 51, and elsewhere, e.g. Frampton and Gutmann 1999: deciding whether an SO enters into an operation

involves a top-down search of the SO for an unvalued feature to drive an operation, and

thus only the top-most such feature is findable.38 Locality, in turn, has the same intuitive

force (Frampton and Gutmann 1999:00): a top-down search by [F-] within its domain for

a Match. In our formulation, the two conditions are unified in a simple way; Match looks

for the minimal pair ([F-], [F+]), where minimal means both closest to the root and to

each other. Together with EMP, (a) plays a role in enforcing ExtC; for example, once

Agree has created MLI(α) with [F+] from α with [F-], MLI(α) > α and the original,

undeleted [F-] cannot drive operations again.39
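Here is a rough sketch of Match so construed; it simplifies the Probe's domain to 'everything properly contained below it', and the data structures (nested tuples for SOs, dictionaries for lexical items) are my own stand-ins, not the formalism of (38).

```python
# Match: find the highest occurrence of an unvalued feature [F-], then the
# closest valued occurrence [F+] below it -- the minimal pair of the text.

def occurrences(so, depth=0):
    """Yield (depth, lexical_item) pairs, walking the SO top-down."""
    if isinstance(so, dict):
        yield depth, so
    else:
        for member in so:
            yield from occurrences(member, depth + 1)

def match(so, F):
    items = sorted(occurrences(so), key=lambda pair: pair[0])  # closest to root first
    probe_depth, probe = next((d, li) for d, li in items if li.get(F) == "-")
    goal = next(li for d, li in items if d > probe_depth and li.get(F) == "+")
    return probe, goal

T  = {"cat": "T", "phi": "-"}
DP = {"cat": "D", "phi": "+"}
TP = (T, ({"cat": "v"}, ({"cat": "V"}, DP)))
assert match(TP, "phi") == (T, DP)
```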

The argument of Match is a set-theoretic construct. Beside SOs, the Numeration,

with which we start, is such a construct: the set {x1, …, xn}, xi ≯ xj for all i, j ∈ 1…n, where xi is an X0 drawn from the lexicon; it is thus a possible input to Match. Since > is not defined among SOs belonging to the Numeration, Match will yield (if there are such)

xi, xj s.t. the label of one has [F-] and of the other [F+]. The ability of Match, thus

generally formulated, to apply to a Numeration, allows it to be the general procedure for

introducing items from the Numeration into the computation, the analog of the extra operation Select of Collins

(1997), Frampton and Gutmann (1999). And in turn, since we are using the same procedure for Match within an SO as we now use to introduce objects into the

computation, we ensure that they always stand in a feature relationship that can be

discharged via Agree. The asymmetry of theta-theoretic Merge which Chomsky 1998

argued for is implemented using exactly the same mechanism, maximally generally

formulated, as the asymmetry of Probe-Goal relations.

Match thus has two potential inputs: either an SO as the output of previous steps in the computation, or the Numeration, to which it provides the only interface, since Merge and Agree are binary and cannot interface with a single object.40

38 Chomsky puts it this way (ibid): "The operations Merge and Agree must (i) find syntactic objects to which they apply, (ii) find a feature F that drives the computation… Condition (i) is optimally satisfied if OP applies to full syntactic objects already constructed… By condition (ii), F has to be readily detectable, hence optimally in the label."

39 Despite this, there is a disjunction between (a) and (b), and I think (a) can be dropped. Nearly all of ExtC is assured by EMP. (a) ensures that [F-] on α becomes inactive after MLI(α) projects; but this is probably independently done by Ω, if that means that non-head occurrences are inactive. Thus, given the chain (MLI(α), α), properties of α are not visible.


Now, recall that the

output of Merge is put back into the Numeration; this is what allows Match to apply to it

again, to take e.g. a {v', {v, VP}}, where v has [DP-], pair it up with a [DP+] external argument in the Numeration, and pass it on to Merge.

A derivation is partly defined as the closure of ς, that is, all the possible sequences of Match, Merge, and Agree feeding each other with the Numeration as the only initial object.41 However, there is an infinite number of such sequences, which may be arranged in

order of length or number of occurrences of operations. Only those sequences are

defined to be derivations that lead to LF-interpretable objects from a Numeration, and

moreover for any such particular object, only those that are shortest. This is in line with a

lot of older and newer research, such as Reinhart 1993, 1995 and Fox 1995, 2000, which

all show that economy principles like this one must be relativized both to distinct LF

objects and to different Numerations. This condition on derivations is global; but it

seems mandated, at present, by the impressive arguments of these works.42

An important property of the system is that construction may (and indeed must) take

place in parallel. Consider a Numeration {D1, N1, D2, N2, V, v}, where D's, V, and v all

have selectional features. Match may apply to D1, D2, V, and v at the same time, the only

ordering here being extrinsic. {D1, N1} and {D2, N2} may be formed in parallel; V must wait for one to form by cyclicity (EMP below) before forming e.g. {V, {D2, N2}}, while

v by the same token must wait until V's done. The two arguments of the principal binary

compositional operation, Merge, may lead two independent parallel lives:

Merge(Merge(α, β), Merge(γ, δ)), where potentially e.g. α=γ, β=δ.
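A small illustration of this parallelism, reusing the toy merge(α, β, ℵ) sketched above; which of the first two steps happens 'first' is immaterial, exactly as in the text.

```python
numeration = {"D1", "N1", "D2", "N2", "V", "v"}
dp1 = merge("D1", "N1", numeration)   # mutually unordered with the next step
dp2 = merge("D2", "N2", numeration)
vp  = merge("V", dp2, numeration)     # must wait for dp2 (cyclicity/EMP)
vP  = merge("v", vp, numeration)      # must wait for vp
```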

The Early Match Principle constrains this parallelism, and enforces ExtC. We start with a Numeration, and every instance of Merge returns its output to the Numeration. EMP, combined with Match, effectively says that the Numeration cannot tolerate an unvalued feature being the top-most occurrence of a feature type in an SO, if it is possible to pair it up with a Matching valued feature either inside that SO or at the top of another SO. Procrastination is allowed just so long as Match can't succeed. EMP has sweeping consequences. It ensures ExtC: intuitively and empirically counter-cyclic constructions, like those of specifiers and complements after they have Merged, are all blocked by EMP. It also imposes an intrinsic ordering on operations and thus constrains parallel computations: given Merge(Agree(α, β), Merge(α, β)),43 where α, β are the output of Match over the Numeration, for example, both Merge(α, β) and Agree(α, β) must start from the same α, β, for the two instances of Match over the Numeration that yield α, β are required to take place as early as possible, and thus at the same time. Finally, the EMP is a local economy condition: at each point in a derivation, it is possible to determine whether to apply it by inspecting the Numeration.

40 Strictly speaking: we could have Merge/Agree(α, Numeration) or (Numeration, Numeration). In the

case of Merge, there is no LF interpretation (eh-hem, if unstructured sets were interpretable, wherefore

syntax?); in the case of Agree, L(Numeration) is undefined because there is no unique label.

41 E.g.: two-step derivations are Agree(Match(x)), Merge(Match(x)), etc.

42

The globality might or might not be apparent, depending on how one construes interface access to the

computation ('opportunistic' is the word Marslen-Wilson and Tyler 1989 use for this kind of thing, in their

case of interfaces between Fodorian modules). I won't worry about it here. Conceptual arguments from

computational economy don't impress me terribly, not even those that eschew complexity theory for

vehicular fuel supply economy. Good arguments against globality in MP were empirical, e.g. Reinhart's (1993) demonstration that a global formulation of the MLC is untenable. The works cited deal with a very

different class of phenomena, where there might be empirical justification that a computationally complex

comparison of derivations is in fact being performed (cp. esp. Reinhart's work).

43 This happens to be the derivation of theta-theoretic Merge: given a Numeration ℵ = {saw, {the dragon}}, V with [DP-] features, Merge(Agree(Match(ℵ))), Merge(Match(ℵ)) = {V, {the dragon}}, where Match(ℵ) = (saw, {the dragon}). Pure convergence determines what is a complement and what is a specifier under two instances of theta-theoretic Merge; v, for example, could first Merge either with VP or (subject) DP once both exist; but the order DP > VP would eventually yield a crash, among other reasons because VP would become the left branch, which can't be extracted out of, and the EPP of T0 could not be satisfied.



I think the system is kinda minimal and pretty; there's very little here added beyond

what we need conceptually (Merge) or what we know to be necessary (Match with its

locality; EMP with its cyclicity; Agree that's simply visible), and the formulations are

very simple. It can be shown that "I am leaving on a jet-plane" is derivable (except maybe

for the jet-plane). Moreover, all the derivations we have talked about work: probing into

specifiers under the right conditions, and CPC and non-CPC MSCs. It seems to allow

only derivations that obey the ExtC. But "seems" gets us to exactly what is wrong with

it: I don't even know how to begin proving it works. Translating the definitions into

some logical language is trivial (they've been given so that it is); and so presumably, one

could try and construct a formal proof in such a language that enumerates the phrase (or rather, feature) markers it can produce, and sees how they tally up with the fairly limited

number of phrase markers we actually want (head-complement; spec-head-complement;

etc.). But that's presently beyond me.

However, such was not the point of the exercise (with hindsight, of course). It was

rather (a) to show that the observations about when specifiers may be probed, and how

specifiers are constructed, are implementable in a minimalist-looking system that also

preserves the core properties like cyclicity and locality; and (b) for me to see just what

constructing a minimal system that has certain properties I want but present systems don't

have, like encapsulation, feels like.

5 Requiem

Ah, the things we have seen … what have we seen? We have gone through some

challenges for cyclicity. First off, an insightful formulation of it would predict

(circumstances under which) not only complements but also specifiers can have relations with their selectors; but it turned out they do, so that turned out not to be a problem.

Nesting paths in Multiple Specifier Constructions have long been regarded as a challenge

or heresy; they make a mockery of the Extension Condition, it has been alleged. But in

fact they do not; they bear subtle witness. The trick lies in knowing where the label is,

that's all; but the position of the label is predicted by giving virtually bare definitions of

syntactic operations, conceptually letting them apply quite freely, and necessarily

encapsulating the syntax through Radical Inclusiveness. Not just nesting paths, but when

to tuck in and when not to tuck in and where to tuck in to, fall out as a consequence.

Finally, we pulled it all together into a system which I am ready to disavow and throw out

on the street to fend for itself. But we had some useful pontifications along the way; how

to encode encapsulation (and why one should worry), letting operations apply freely

subject to a very general computational economy principle (EP) and letting convergence

take care of things, what role interfaces might have in the derivation. I am skeptical, not



so much because I can't prove this goes where it's supposed to, but more because of

empirical challenges that inhere in Rezac 2002c; but enough for now. I abandon the

reader with a quod sequitur, again from Gabriel's Guide:

The reason why Fodor's modularity approach eventually disintegrated was because it was discovered that humans were remarkably good at reading. Really good. An ambitious series of psycholinguistic experiments proved what everybody already knew: reading is fast, mandatory, domain-specific, encapsulated, subject to specific neurological defects … the very paradigm of a module. And the problem only began with humans. The end of the twentieth century stunned the ovejologic community, but not the modularists, with a revised estimate of sheep intelligence in the wake of demonstrations that sheep are in fact almost unerring at recognizing faces of their fellow sheep. Twenty years later, the modularists were swept away by sheep that, raised among humans, differentiated human faces better than the faces of conspecifics. Some tried to argue for a while that there was indeed a genetically encoded reading module. Perhaps there was a distinct advantage in hunter gatherers' (monoids, higher primates, etc.) being "able to process linear arrangements of featurally categorizable shapes," game tracks and what not; the school took Paleolithic stone carvings as evidence of a genetic predisposition. A minority even claimed, sometimes entirely in capitals, that reading itself (of magic runes) had been slowly selected for during the Long Eons between the Fall of the Fourth Moon and the evolution of the duck-billed fire dragon, during which the Atlanteans ruled the earth. Ironically, these latter who we now know had at least got the historical facts fairly straight were simply ignored…


Appendix: Equidistance*

One property commonly attributed to MSCs is equidistance: multiple specifiers of the same head do not interact with each other for the purposes of locality. Arguably, the antecedents are Reinhart's (1979) proposal that wh-island violations (cp. Rizzi 1978) use a second COMP position for the moving wh-phrase to pass through and avoid subjacency, and May's (1985:34) Scope Principle, by which operators which govern each other (in particular, [Spec, S'] and S-adjoined positions) can take any scope w.r.t. each other.1 In MP, once multiple specifiers were introduced in BPS theory, their equidistance came to be used in the derivation of transitive constructions with Object Shift (OS) and super-raising.

Transitive constructions created the need for MSC equidistance with Chomsky (1995, 1998), where Agr projections were abandoned, and Icelandic overt OS was taken to target a [Spec, vP] which c-commands the [Spec, vP] in which the subject is base-generated; further movement of the subject to [Spec, TP] then violates locality if specifiers are not equidistant. In the background lie two assumptions: that OS targets the same phrasal category in which the subject is base-generated; and that an OSed object c-commands the subject (Chomsky 1998:17, esp. ft. 36 for the latter). However, both can be shown to be wrong (Rezac 2002b, building particularly on Jónsson 1996). OSed NPs preferably precede high-level sentential adverbs like 'probably', and can be shown to c-command at their lowest a category 'QP' which is the target for the insertion of negation and the movement of quantified and negative objects, where QP itself c-commands the in-situ position of auxiliaries and the in-situ subject:

(39)a. OS > QP > AUX* > [SU v [V …]]

Further, the hypothetical EXPL OS SU orders can also be shown to require subjects that have undergone Ā-movement to [Spec, QP], and (by the improper movement generalization) no longer count for A-movement locality. Rezac (2002b) argues that the correct analysis of OS is the creation of an MSC of TP, which also includes the expletive if present in a transitive expletive construction; as locality with 'tucking-in' predicts, the only possible order found under A-movement is (EXPL) SU OB (cp. also Richards 1997:90ff.).

An analysis of super-raising which uses equidistance between multiple specifiers has been developed in various works by Hiroyuki Ura, particularly Ura (1994, 1996), where it is posited that the generalization holds true that super-raising is permitted if and only if a language also allows an MSC of TP. I will not argue here against this analysis; cp. for example Doron and Heycock (1999) and other arguments that the phenomena require a different analysis. Rather, I will turn to a direct argument against equidistance, due to Richards (1997:115).

* The last remnant of an earlier, more optimistic, less stylistically idiosyncratic draft.

1 But for Reinhart the issue wasn't equidistance: the idea that islands are induced by an intervention effect

of a constituent rather than a bounding node along the path of movement only came with Rizzi (1990). The

point of having multiple COMP positions wasn't their equidistance from each other, but their availability as

intermediate positions between two bounding nodes (S and S'). For May, on the other hand, the issue is precisely equidistance although it has nothing to do with movement: roughly, specifiers and adjuncts that

m-command each other can take scope over each other regardless of their c-command relations.


The gist of Richards' argument is this: if a language allows multiple wh-specifiers, for example, and has cross-clausal wh-movement, equidistance should allow any ordering relationship between the wh-words that might hold in a single clause to be scrambled:

(40) [CP2 ___MSC/CP2 [+Q] … [CP1 ___MSC/CP1 … wh1 … wh2]]

Although in moving to MSC of CP1, the wh-words originating in that clause might be expected to be mutually ordered by locality and other principles, and they indeed are, further movement to MSC of CP2 need not observe such ordering, since the wh-words in MSC of CP1 are equidistant from each other. Now, Richards demonstrates this using apparent wh-island violations in Bulgarian, where CP1 also has a [+Q] feature: the generalization is that only wh1 may move to [Spec, CP2].2

Since I'm actually not willing to admit that any wh-island violations exist that are created by movement, I take my version of the argument from Romanian (which seems to pattern with Bulgarian in relevant respects), based on Alboiu's (2000, particularly chapter 4) extensive discussion. Romanian wh-words all require movement to form an MSC of a single category, IP as a syncretic category in Alboiu's analysis, and the MSC obeys the usual subject/object 'tucking-in' asymmetry. The MSC must be adjacent to the verb ('inversion'); subjects, topics, and adverbs, in particular, may not intervene. Alboiu shows that once D-linking is completely controlled for, no wh-island violations may occur: all wh-words must target the closest c-commanding [+Q] complementizer, which may thus host wh-words from different clauses (p. 172, 210-212). As this generalization predicts, cross-clausal movement out of complements is both possible and necessary; and it can be shown to be successive-cyclic because it triggers obligatory inversion along its path (p. 190ff.). Now this successive-cyclic movement is an instance of the configuration in (40); if MSC of CP1 had its specifiers equidistant for further movement, no ordering would be imposed on MSC of CP2. Yet that too must obey the MST. Consequently there is no equidistance for MSC of CP1.3

The situation here is the same as the one Richards observes for Serbo-Croatian (cp. discussion on pp. 54, 152-3), except that arguably there the final landing site in CP2 may be [Spec, CP] for one wh-word and MSC of TP for the rest. Richards (1997:118) observes that this runs counter to equidistance in A-movement, but doesn't want to give it up because he thinks there is evidence it acts as an 'escape' for further movement (p. 94ff.). The two phenomena in question are cliticization in West Flemish (p. 94) and topicalization in Dutch (p. 95) of the direct object in a double object construction, which both require OS of the indirect object.

2 The relevant data are:

(i) Koj se opitvat da razberat kogo e ubil? // *kogo … koj …

who SELF try to find.out whom AUX killed?

(ii) Na kogo se optivaš da razbereš kakvo dade Ivan? // *kakvo … na kogo …

to whom SELF try-2.sg. to find.out what gave Ivan?
(Richards 1997:116, 299, citing Roumyana Izvorski, Ani Petkova, Roumyana Slabakova, Kamen Stefanov,

p.c.). A complication here is that if there are three wh-words, and two of them escape the island, it must be

the highest and the lowest, while the middle one must stay in [Spec, CP2].

(iii) Koj kakvo se opitva da razbere na kogo dade Ivan? // *koj na kogo … kakvo …

who what SELF try to find.out to whom gave Ivan?

(Richards 1997:302, citing Roumyana Izvorski, p.c.)

3 Interestingly, the situation here is the S-structure manifestation of Reinhart's (1979) proposal, which therefore cannot go through: weak island violations are not possible in Romanian.


Richards' way of looking at this is that it follows from locality and equidistance: the indirect object, which c-commands the direct object in a double object construction, blocks its movement, unless it is equidistant with it, as under OS if OS creates an MSC. For topicalization, this doesn't go through simply because A-positions do not count as interveners for Ā-movement (otherwise all but subject topicalization/wh-movement would be out). Rather, based on the discussion in Rezac (2002b), I would suggest that the semantics of topicalization and OS is such that an object which can be topicalized must undergo OS (the latter basically to escape being interpreted as new information, cp. Reinhart 1995, 1997); and since OS is A-movement, A-positions will count as interveners, and the IO must OS in order for the DO to do so. In Rezac (1999) I argue that cliticization in Czech must pass through an OS position, and I would like to suggest the same here, for basically the same semantic reason.

Consequently, there is no evidence for equidistance in A-movement. Further, if the analysis of Icelandic NP-movement in Rezac (2002b) is correct, there is evidence against it. While this falls short of being persuasive, in combination with the more solid evidence from wh-movement in Romanian, it seems MSCs do not define equidistant positions.


6 Appendix

Richards (1997:84ff.) argues that multiple tucking-in A-scrambling in Japanese targets

the TP, namely the same category that contains the nominative subject.

-Japanese has a kind of relativization which violates islands (Kuno 1973, Hasegawa

1984, Ochi 1996). The general format is as follows (with English order):

(41)a. NPi [Opi [NPj [Opj __i … __j] V …]]

b. The childi [thati [the clothesj [thatj __i is wearing __j] are dirty]]

-Here we have two relative clauses, modifying different NPs; the external relative clause

is modifying an NP extracted from the contained relative clause (the child).

-There is a restriction: the contained relative clause must modify the subject of the

relative clause by which it is contained (the clothes), where subject is either the

nominative narrow subject (Hasegawa 1984), a nominative broad subject in an MSC, or

an object A-scrambled before the nominative.

Nominative subject

(42)a. [[ei ej kiteiru] hukuj-ga yogoreteiru] kodomoi
wear clothes-N dirty child
the childi that the clothesj that ti is wearing tj are dirty
b. *[Mary-ga [ei ej kiteiru] hukuj-o tukutta] kodomoi

Mary-N wear clothes-A made child

the childi that Mary made the clothesj that ti is wearing tj

(Richards 1997:84, ex. 39, 40, citing Ochi 1996)

Broad subject

(43)a. *[Taroo-ga [ei ej kaita] honj-o katta] sakkai

Taroo-N wrote book-A bought author

the authori that Taroo bought the bookj that ti wrote tj

b. ?[Taroo-ga [ei ej kaita] honj-ga suki na] sakkai

Taroo-N wrote book-N likes author

the authori that Taroo likes the bookj that ti wrote tj

(Richards 1997:85, ex. 41a-b, citing Takako Aikawa & Shigeru Miyagawa, p.c.)

IP-scrambled object
(44)a. *[Taroo-ga [ei ej kaita] honj-o katta] sakkai

Taroo-N wrote book-A bought author

the authori that the bookj that ti wrote tj, Taroo bought

b. ?[ei ej kaita] honj-o Taroo-ga katta] sakkai

wrote book-A Taroo-N bought author

the authori that the bookj that ti wrote tj, Taroo bought

(Richards 1997:85, ex. 42a-b, citing Takako Aikawa & Shigeru Miyagawa, p.c.)

-Apparently, the possibility of having a contained relative can be used to diagnose

[Spec, IP] – nominative subjects, broad subjects, A-scrambled XPs.


-Richards thus shows that when multiple A-scrambling all targets the TP, tucking-in

occurs. He relies on Miyagawa's (1997) observation that although Japanese ditransitives

base-generate both (IONP, DONP) and (DONP, IOPP), floated numeral quantifiers which

modify NPs but not PPs can only be attached to IOs in the former order.

"[I]n cases in which the double objects are scrambled to the left of the subject and the dative argument is marked by a floated numeral quantifier, we expect to find that a

contained relative clause can modify the second object only when the base IO-DO

order is maintained." (p. 87).

This is borne out (Richards 1997, 88-89, ex. 47a-b, 48):

(45)a. ?[seijika-ni hutari [ti tj kyoonen osieta] Tanaka-san-no kodomoj-o Hanako-ga syookaisita] senseei

politician-D two last.year taught Tanaka-HON-G child-A Hanako-N introduced teacher

The teacheri that Hanako introduced the childrenj of Mr. Tanaka that ti taught tj last year to two politicians.

b. [seijika-o [ti tj kyoonen osieta] Tanaka-san-no kodomoj-ni ?(*hutari) Hanako-ga syookaisita] senseei

politician-A last.year taught Tanaka-HON-G child-D ?(*two) Hanako-N introduced teacher

"The teacheri that Hanako introduced the politician ?(*of two) of the childrenj of Mr. Tanaka that ti

taught tj last year."