The Unbelievable Legacy of Methodological Dualismrthomaso/lpw05/johnson.pdf · 2005-09-03 · 3...

1

The Unbelievable Legacy of Methodological Dualism

Draft – Comments Gratefully Received Kent Johnson Dept. of Logic and Philosophy of Science University of California, Irvine

[email protected] http://www.lps.uci.edu/~johnsonk

ABSTRACT. Methodological dualism in linguistics occurs when its theories are subjected to general methodological standards that are inappropriate for the more developed sciences. Despite Chomsky’s criticisms, methodological dualism abounds in contemporary thinking. In this paper, I treat linguistics as a scientific activity and explore some instances of dualism. By extracting some ubiquitous aspects of scientific methodology from its typically quantitative expression, I show that two recent instances of methodologically dualistic critiques of linguistics are ill-founded. I then show that the Chomskian endorsement of the scientific status of linguistic methodology is also incorrect, reflecting yet a third instance of methodological dualism.

Introduction Perhaps more than any other discipline, linguistics has continually defended its methods and

practices as “scientific”. This practice has been heavily inspired by Noam Chomsky’s frequent

and vigorous critiques of “methodological dualism”. Methodological dualism occurs in

linguistics when its theories are subjected to certain general methodological standards which

themselves are inappropriate for other, more developed sciences. In this vein, Chomsky has

repeatedly charged many central figures in philosophy – Dummett, Davidson, Kripke, Putnam,

and Quine, to name a few – of subjecting theories of language (and mind) to dualistic standards

(e.g., Chomsky 1986, ch. 4; 2000, ch. 2 – 6; 1975, ch. 4). Moreover, he has argued that these

standards have no known plausible defense, and that there is no reason to take them seriously. In

place of these dualistic requirements, Chomsky has recommended that linguistic theories be held

to the standards that normally apply to empirical theories. In his words, he advocates “an

approach to the mind that considers language and similar phenomena to be elements of the

natural world, to be studied by ordinary methods of empirical inquiry” (Chomsky 2000, 106).

This is a natural position for Chomsky, given his rather traditional view of the relationship

between philosophy and science:

2

In discussing the intellectual tradition in which I believe contemporary work [sc. on language] finds its

natural place, I do not make a sharp distinction between philosophy and science. The distinction, justifiable

or not, is a fairly recent one…. What we call [Descartes’] “philosophical work” is not separable from his

“scientific work” but is rather a component of it concerned with the conceptual foundations of science and

the outer reaches of scientific speculation and (in his eyes) inference. (Chomsky 1988, 2).

If philosophy is a kind of study into the foundations of the sciences, then there is little room for a

“philosophical” theory of language or mind that is not itself a “scientific” theory.

The view that linguistic theories are ordinary scientific theories, subject to the same

methodological standards as the (other) sciences, has much to recommend itself. Indeed, this

general view has been endorsed, at least in name, by virtually all linguists and a great many

philosophers. In this paper, I adopt this view, and explore some of its consequences. I diverge

from Chomsky and the many others who have discussed “scientific linguistics” in one crucial

respect. It is standard to discuss the nature of scientific methodology by appealing to general

descriptions of individual cases in the history of science (for a representative example, cf. the

discussion of Newton in Chomsky 2000, ch. 5). Instead of considering particular examples, I

focus instead on some ubiquitous quantitative aspects of ordinary scientific practice. Ordinary

science encodes most of its methodology in the quantitative details of its theories, so this is a

natural place to look for clues about how to do linguistics scientifically. The particular

mathematical aspects I’ll discuss are extremely common in ordinary scientific research,

particularly when the research involves the kinds of highly complex systems that are (as we’ll

see) natural points of comparison with linguistics. Thus, these mathematical aspects represent

some very fundamental aspects of scientific inquiry generally. Indeed, rather than applying to

just a handful of case studies or one or two scientific disciplines, they are much more naturally

(and normally, at least in the sciences) thought of as reflecting the actual nature of scientific

inquiry quite generally.

Dialectically speaking, it won’t be enough to merely point out that a number of central

aspects of ordinary scientific methodology have precise parallels in linguistics. Once we take the

time to extract some methodology from its quantitative presentation, it might seem obvious that

the very same methods are legitimately used in linguistics. We need a foil, someone who would

deny these claims. Unfortunately, such foils are all too easy to find; methodological dualism is

alive and well in philosophy. My strategy, then, will be to discuss two instances of

3

methodological dualism that have been explicitly endorsed in the literature by prominent

philosophers (Francois Recanati and Jerry Fodor in particular). Abstractly stated, the two

instances of dualism sound oddly familiar, which may explain why they have been adopted. The

first one says that empirical theories of some phenomenon of interest must explain or account for

that phenomenon. The second one says that when a theory uses theoretical terms, they must be

defined. Neither principle is plausible. In debunking these principles, my interest is not so much

to say something surprising (I would’ve taken the problems with such principles to be obvious, if

they didn’t keep getting endorsed). Instead, my primary aim is to expose just how deeply these

principles conflict with ordinary science. As these two examples suggest, it seems that when

philosophers impose dualistic standards, they don’t just strike out at some controversial fringe

aspect of ordinary inquiry; they go for the methodological jugular vein of scientific inquiry.

In both cases, my discussion of the principle in question takes the following form. (1) I

first briefly present a proposed bit of linguistic theorizing, followed by (2) the methodological

criticism to which this theorizing has been subjected. In both cases, the criticism has an air of

generality about it, as though it is an instance of a much more broadly applicable methodological

principle. But I then show that (3) (the generalization of) the principle conflicts with some of the

absolutely fundamental and ever-present mathematical techniques of modern science, viz. certain

basic statistical methods. (These techniques are used everywhere, from sociology and economics

to neuroscience and biology to chemistry and physics.) Finally, I return to linguistics and show

that (4) its methodology is not relevantly different from that of the (other) sciences, so that the

principle in question does not apply to language.

Thus, the first two sections of this paper are a defense of current linguistic theory. In the

third section, though, I bring out some divergences between linguistic and scientific

methodology. Although a detailed discussion of these differences is a topic for another paper

(Johnson, ms.), I offer a brief sketch of a few of these differences, showing how they constitute a

third instance of methodological dualism. It is here that we precisely identify some errors of

Chomsky’s characterization of linguistics. I conclude in section four.

1 Dualism I: Saving the Phenomena The first form of methodological dualism I discuss concerns the explanatory scope of a semantic

theory. Francois Recanati and others have suggested that semantic theories must capture a great

4

deal of the apparent semantic phenomena, in a sense to be spelled out below. This requirement is

used to critique various types of semantic theories, such as the one developed by Herman

Cappelen and Ernie Lepore (hereafter CL). I briefly sketch CL’s view, and then consider

Recanati’s criticism. We’ll then be in a position to uncover the dualistic nature behind the

requirement that semantic theories “save the phenomena”.

1.1 The Proposed Linguistic Theory

Recently, CL have defended a view called “Semantic Minimalism” (CL 2005). According to

Semantic Minimalism, only a handful of expressions are actually context sensitive (e.g., I, you,

she, this, that, tomorrow, etc.). There are no hidden (i.e., unpronounced, unwritten) context-

sensitive elements in the syntactic or semantic structure of an expression. Thus, Semantic

Minimalism contrasts sharply with most semantic theories. In particular, it is normal for

semanticists and philosophers of language to assume that a correct semantics for (1a) and (2a)

assigns more structure than what is given by the overt structure of these sentences.

(1) a. Mary is ready;

b. Mary is ready to X.

(2) a. It is raining;

b. It is raining in location X.

(1a) and (2a) are normally assumed to have semantic forms like (1b) and (2b). Thus, (1a) means

something like Mary is ready to do something, or is ready for something to happen. Similarly,

(2a) means it’s raining in some contextually specified place. Semantic Minimalism denies this,

holding instead that (1a) simply means that Mary is ready, and (2a) simply means that it’s

raining.

1.2 Criticism of the Theory via Methodological Principle

Unsurprisingly, Semantic Minimalism has encountered numerous objections (cf. CL 2005, ch. 11

– 12 for discussion). My focus will be on just one of these, which I will call the Problem of

5

Unsaved Phenomena (PUP). PUP is most straightforwardly presented in Recanati 2001 (cf. also

Carsten 2004 for similar sentiments , and CL 2005, ch. 12 and citations therein). For instance,

Recanati writes:

That minimal notion of what is said is an abstraction with no psychological reality, because of the holistic nature of

speaker’s meaning. From a psychological point of view, we cannot separate those aspects of speaker’s meaning which

fills gaps in the representation associated with the sentence as a result of purely semantic interpretation, and those

aspects of speaker’s meaning which are optional and enrich or otherwise modify the representation in question. They

are indissociable, mutually dependent aspects of a single process of pragmatic interpretation. (Recanati 2001, 88)

Recanati’s pessimism about Minimalist semantic theories is driven largely by his view that such

theories don’t explain enough of the phenomena. In Recanati’s view, a semantic theory must

capture the (entire) “content of the statement as the participants in the conversation themselves

would gloss it” (Recanati 2001, 79-80). Recanati expresses this in his

“Availability Principle”, according to which “what is said” must be analyzed in conformity to the intuitions shared by

those who fully understand the utterance – typically the speaker and the hearer, in a normal conversational setting. This

in turn supports the claim that the optional elements…(e.g., the reference to a particular time in “I’ve had breakfast”)

are indeed constitutive of what is said, despite their optional character. For if we subtract those elements, the resulting

proposition no longer corresponds to the intuitive truth conditions of the utterance. (Recanati 2001, 80)

Let’s begin by putting this argument in a bit more manageable form. We can characterize the

Availability Principle as:

(AP) A semantic theory is acceptable only if it correctly characterizes the intuitive truth

conditions often enough within some psychologically interesting range of cases.1

Moreover, we can give PUP the following form:

1 For present purposes, I will assume that (AP) is an appropriate formulation of Recanati’s Availability Principle;

any divergences between the two will not matter in this paper. One might strengthen (AP) further by specifying the

particular range of cases in which a semantic theory must get things right, and by specifying how often the theory

must get things right. I won’t worry about such strengthenings, though; since what I have to say will apply equally

to all such versions of (AP).

6

The Problem of Unsaved Phenomena

(3i) In all relevant ranges of cases, the intuitive truth conditions of our utterances

contain much more content than what is characterized by minimalist theories.

(3ii) If (3i) is right, then from a psychological point of view, we cannot separate the

minimalist aspects of meaning from those aspects supplied by a more enriched

view of meaning (often enough, in any relevant range of cases).

(3iii) Hence, minimalist aspects of meaning cannot be separated from those aspects

supplied by a more enriched view of meaning (often enough, in any relevant

range of cases).

(3iv) But if we can’t separate minimalist from non-minimalist elements of meaning

(often enough, in any relevant range of cases), then minimalist theories are

unacceptable.

(3v) Hence, minimalist views are unacceptable.

Premise (3i) is an empirical claim; premises (3ii) and (3iv) are theoretical. Premise (3ii) comes

from the quote of Recanati above (2001, 88), and premise (3iv) comes from (AP). (To see this,

notice that if we can never separate out the minimalist aspects of meaning, then there must

always be some non-minimalist aspects present, so the minimalist aspects of meaning never

characterize the intuitive truth conditions in the utterance. Hence, by (AP), minimalist theories

are unacceptable.) I won’t discuss CL’s attempt to deny the empirical claim (3i); suffice it to say

that I find it inconclusive. But there are also flaws with (AP), (3ii) and (3iv), assuming linguistic

theories are treated like ordinary scientific theories.

1.3 Is the Principle Justified by General Scientific Considerations?

In order to see what is wrong with (AP), (3ii), and (3iv), it will be useful to step back from

linguistic theorizing and examine some aspects of the methodology of the (other) sciences. I’ll

argue that no non-linguistic scientific theories would ever be constrained to observe appropriate

counterparts of these principles. The parallel between linguistic and other scientific theories thus

renders (AP), (3ii) and (3iv) unacceptable.

7

To get things started, let’s take a simple example. Suppose we are studying the

relationship between different quantities of a given additive X used in some manufacturing

process and the amount of some type of atmospheric pollution Y generated by the process. The

industry standard is to use n units of X per ton of product, but for a period of time, certain

companies used more or less than n units. The relation between the varying amounts of X used

and Y emitted are given as black diamonds in the plot below (ignore the two curves and white

diamonds for the moment).2

-20 -10 0 10 20-200

0

200

400

600

800

(The example of a pollution study here is arbitrary; it could be replaced with literally thousands

of different examples from any given area of empirical science that rationally scales its

measurements.) Given this data, there are (infinitely) many possible relations that could hold

between X and Y. One extreme option would be to insist that every aspect of the data is crucial

to understanding how X and Y are related. In such a case, a researcher might look for a function

that captured the data precisely, as in the very complex one depicted with a solid line. In the

present case, a polynomial of order 29, will do so, for the given raw data set of size 30:

(4) Predicted value of Yi = Yi = f2(Xi) = β0 + β1Xi + β2Xi2 + … + β29Xi

29

2 Zero on the x-axis represents the use of n units of X; other values represent the respective deviations from this

standard

8

The resulting theory will then perfectly predict the behavior of Y on the basis of the behavior of

X. The raw data, in the form of a set of pairs of measurements {<Xi, Yi> : i ∈ I}, is fully

accounted for. In other words, (4) saves all the phenomena, which in this case is the variation in

<X, Y> scores of individual samples.

Despite its success at capturing the data, the first approach is almost never adopted. A

vastly more common strategy hypothesizes a simpler relation between X and Y, and that Y is

influenced by other factors unrelated to X. One might, e.g., hypothesize that that relationship is

given by the simple function:

(5) Predicted value of Yi = f1(Xi) = β0 + β1Xi + β2Xi2

for some fixed numbers β0, β1, β2. Once these numbers are determined from the data, we get the

simpler curve given by the dashed line. In the present example, the values of β0, β1, and β2 were

determined by seeking those values for which ]))([( 21 i

Iii XfY −∑

∈

is as small as possible.

Although (5) doesn’t predict the behavior of the original data as well as its rival (4),

many other theoretical considerations speak in its favor. For example, suppose we got hold of

another sample of data, given by the white diamonds above. Then we might ask how well the

two functions captured this new data. One way to do this would be to compare the sizes of the

discrepancies between what (5) and (4) predict about the value of Y for given values of X in the

new data set. E.g., we might examine the ratio:

(6) ]))([(

]))([(

22

'

21

'

iIi

i

iIi

i

XfY

XfY

−

−

∑∑

∈

∈

Here I’ indexes the second set of measurements, and f1 and f2 are assumed to have had the

particular numerical values of their parameters – { β0, β1, β2} in the case of f1, and {βk: 0 ≤ k ≤

29} in the case of f2 – fixed by the first data set. In this case, we get a value greater than 6 × 1031,

indicating that there is vastly more discrepancy between the new data and what f1 predicts than

9

there is between this data and what f2 predicts.3 Thus, the extra structure in the curve given by f2

errs in that it captures much variance in the data that is unrelated to the true relation between X

and Y.4 In short, a bizarre model like (4) that captures all the (original) data is vastly inferior to

the far more standard model like (5) that doesn’t. (As a bit of terminology, I will use model and

theory interchangeably.) In particular, the simpler model does a massively better job at

predicting the general trends of new data as it arrives.

What then is the relation between the model in (5) and the actual raw data? This relation

is given by adding a “residual” or “error” term to our equation:

(7) Yi = f1(Xi) + εi = β0 + β1

Xi + β2Xi2 + εi,

The term εi, whose value varies as i varies, expresses whatever deviation is present between the

model and the raw data. As (7) shows, εi = Yi – f1(Xi). In practice, scientific models of complex

phenomena never perfectly fit the data, and there is always a residual element (εi) present. This is

so even when the system under study is completely deterministic. E.g., the true model might be

something like

(8) Yi = f1(Xi) + f3(Z1i, …, Zki)

In such a case, Y is always an exact function of X and Z1, …, Zk. However, the influence of the

Zjs may be very small, very complicated, unknown, poorly understood, etc. Thus, for any

number of reasons, it may be natural to model the phenomena as in (5), all the while realizing

that the existence of residuals in the raw data show that there is more to the full story than is

3 From a God’s-eye view, this is unsurprising, because f1 is the form that actually generated the data. I used the

formula Yi = 3 + 4Xi + 2Xi2 + εi, where ε and X were normally distributed with a mean of zero and variances of 100

and 10 respectively. 4 There’s much more to be said about the general issues of model construction and model selection; cf. e.g. Forster

and Sober 1994, Burnham and Anderson 2002 for further relevant discussion.

10

presented by (5). (In fact, residuals may correspond roughly to the philosophical notion of a

“ceteris paribus” clause.5)

There’s nothing more basic to statistical research than the idea that the best (or true)

theories/models will imperfectly fit the actual data. Indeed, that’s why statistical research is

founded upon probability theory instead of directly on algebra and analysis. This is just a fancy

way of saying that in real empirical research of any complexity, there will always be unsaved

phenomena. But this not a criticism of statistical modeling. Rather, it is a reflex of the fact that

actual data is frequently the result of multiple influences, only some of which are relevant for a

given project.

1.4 Is Linguistics Relevantly Different from the Other Sciences?

Let’s get back to semantic theorizing. Notice that like (4) – (5), semantic theories are theories of

a complex phenomenon (i.e., the interpretation of language). The raw data of a sample of the

linguistic phenomena aren’t numerical; instead, they are assessments about certain types of

idealized6 linguistic behavior: what sorts of things would typical speakers communicate by

uttering a given sentence, and under what conditions? That is, the raw data of semantic

theorizing are the intuitive truth conditions of our utterances, as we do or would make them in

various contexts. Proceeding like the statistical researcher, the Semantic Minimalist begins by

hypothesizing that there is some relatively simple structure – i.e. simple in comparison to the

complexity of the raw data – that accounts for much of the collective behavior of the raw data. In

order to obtain this simple, general structure, some aspects of the raw data (i.e., the intuitive truth

5 The correspondence may not be perfect, though, since in real life as well as in the mathematical assumptions

underlying this part of statistical modeling, the probability that the residual contributes nothing to the equation is

zero. Thus, it may not be true that ceteris paribus, Yi = f1(Xi), depending on what one’s theory of ceteris paribus

clauses is. 6 The notion of idealization in linguistics and the other sciences has been discussed at great length in many places

(e.g., Liu 2004, Chomsky 1986, and citations therein). Since the primary data of interest in the present paper

concerns “intuitive truth conditions”, the idealizations at play here are substantially less (although by no means

absent!) than in other areas of linguistics.

11

conditions) must be ignored, just as we ignore some aspects of variance in the statistical case. In

general, both minimalist semantic theories and scientific models have the same general form:7

(9) Raw Data = (i) Effects of processes under study (ii) Interacting in some way with

(iii) Residual Effects

The minimalist theory supplies some aspects of meaning that are hypothesized to capture much

of the general behavior of the totality of the data set. By assumption, the outputs of this theory

are not assumed to capture all of the raw data (i.e., intuitive truth conditions of utterances). In

fact, it is not even assumed that the semantic theory will ever capture all of the intuitive truth

conditions. As we’ve seen, such an outcome is absolutely standard science. Our pollution

researcher, would not assume that there will be some raw datum Yi such that Yi = f1(Xi), with no

contribution from the residual effects. Indeed, it is quite typical to expect that εi will never equal

0, particularly when the phenomenon under study is extremely complex. (When the phenomena

are quite complex, a model may be considered significant even if it captures as little as 16% of

the raw data (e.g. R. Putnam 2000, 487).) Likewise, the intuitive truth conditions of utterances

may always be determined by both the minimalist theory of meaning, and by other interacting

aspects of communication. These other aspects of communication are familiar: background

beliefs, indexical-fixing elements, demonstrations, “performance” capacities of speaker/hearers,

etc.

In short, a minimalist semantic theory is just that – a theory about the nature of the raw

data. Like any other scientific theory, one of its essential rights and obligations is to characterize

those parts of the raw data it considers to be truly part of the phenomenon under study, and what

other parts are due to extraneous processes; cf. (i) and (iii) in (9). The fact that semantic theories

get to characterize their own scope also means that they should be judged by the standard,

complicated but familiar, criteria of successful scientific theories: simplicity, elegance, predictive

fecundity, integration with other successful theories which collectively account for the raw data

(or, more typically, hopefully someday will account for the raw data), etc. Methodologically 7 Of course, correlation does not imply causation, so more is needed here than just the regression analysis. Similar

issues apply to linguistic theories as well. For simplicity’s sake, I will ignore these matters, and assume that both

types of models support the scientific interpretation in (9).

12

speaking, demanding that a semantic theory sometimes exactly characterize the intuitive truth

conditions of utterances appears to be just like demanding that statistical models should (at least

for some interesting range of values) be like the complex f2, instead of the like the more standard

f1. Such a demand would be bizarre and deeply incorrect in the statistical case; I submit it is no

better motivated in the case of semantic theorizing.

The points just made show that (AP) and (3iv) place an unwarranted constraint on theory

construction. In no other study of complex phenomena would one demand that theories perfectly

capture the raw data across some interesting range of cases. (3ii) should be rejected because it is

one of the rights of a theory to provide a theoretically useful characterization of the phenomena it

addresses. (3ii) simply denies this in when the theory is semantic. Thus, PUP is unsound.

Another way to view the problem with PUP is that it depends on an equivocal

interpretation of “separability”. Everyone can agree that the intuitive truth conditions of our

utterances are almost always substantially influenced by pragmatic factors. In this sense, it’s

probably true that pure semantic content is “inseparable” from pragmatic factors: in actual

language use, you rarely if ever find the former alone, without the latter. This interpretation of

inseparability makes (3ii) plausible, but it also undermines (AP) and (3iv). After all, it’s no

criticism of a theory that it treats the raw data as being a product of multiple sources. If this is

what separability is, then claiming that minimalist and non-minimalist aspects of meaning are

inseparable simply begs the question against minimalist theories.

On the other hand, (AP) and (3iv) are plausible if inseparability means that no reasonable

total theory of language will treat minimalist and non-minimalist aspects of meaning as effects of

(relevantly) distinct processes. That is, in order for (AP) and (3iv) to be plausible, the relevant

notion of inseparability must require that that all aspects of the intuitive truth conditions be

explained by the same mechanisms. Now (AP) and (3iv) are virtually tautologies, but (3ii) loses

its support. Why should the fact that the intuitive truth conditions of our utterances do contain

both minimalist and non-minimalist aspects of content be sufficient to license the restriction that

any theory of semantic content must capture all of these aspects? Such a view clearly begs the

question against minimalist semantic theories.

I complete this section by briefly considering an argument in favor of (AP). Recanati

writes:

13

Suppose I am right and most sentences, perhaps all, are semantically indeterminate. What follows? That there

is no such thing as ‘what the sentence says’ (in the standard sense in which that phrase is generally used). . . .

If that is right, then we cannot sever the link between what is said and the speaker’s publicly recognizable

intentions. We cannot consider that something has been said, if the speech participants themselves, though

they understand the utterance, are not aware that that has been said. This means that we must accept the

Availability Principle. (Recanati 2001, 87 – 88)

Recanati’s claim that most or all sentences are “semantically indeterminate” amounts to the

claim that minimalist semantic theories don’t capture the intuitive truth conditions of most or all

sentences. It’s unclear, though, why such a claim should be taken to imply that “there is no such

thing as ‘what the sentence says’ (in the standard sense in which that phrase is generally used)”. I

take it that “what the sentence says” here refers to the content that a (minimalist or other

standard) semantic theory ascribes to a sentence. If that is correct, then claiming that there “is no

such thing” is simply false. Again, it’s part of the job of a theory to carve out the sub-portion of

the phenomenon that it directly deals with, leaving the remaining parts for further theorizing.

Recanati’s claim that there is no such thing as “what is said” in this context is like saying there is

no such thing as the true population model f1 in the statistical case. Hence, Recanati hasn’t made

a viable case for (AP).

In sum, PUP fails primarily because it ignores a basic fact about scientific theorizing:

each theory gets to determine what part of the phenomena it addresses, and typically this is only

a very proper subpart of the total phenomena. The requirement that a theory accommodate all of

the intuitive truth conditions often enough in some relevant range of cases is a restriction on

semantic theories that has no precedent in any of the developed sciences. Indeed, it is far more

typical to assume that a given theory will not account for the data.

2 Dualism II: Defining Theoretical Terms I now turn to a second form of methodological dualism. This one concerns the need to provide

definitions of the technical or theoretical terms that one uses in linguistic theories. Jerry Fodor

(inter alia) has stridently argued that linguists’ failure to do this undermines their ability to

appeal to such notions in their theories. I quickly sketch an example of the offending linguistic

theory, and then we’ll look more carefully at Fodor’s position. We’ll then consider whether it’s

generally true, as Fodor explicitly claims, that this requirement on theoretical terms holds

14

generally throughout the sciences. It doesn’t. Examining why the requirement doesn’t hold in the

(other) sciences explains why it doesn’t hold in linguistic theorizing, either.

2.1 The Proposed Linguistic Theory

Within the linguistics community, it is almost universally held that there are a small number of

thematic roles which, for present purposes, we may regard as linguistically primitive semantic

elements (e.g., Baker 1988; Dowty 1991; Grimshaw 1990; Hale and Keyser 1986, 1987, 1992,

1993, 1999, 2003; Jackendoff 1983, 1987, 1990, 1997, 2002; Levin and Rappaport Hovav 1995;

Parsons 1990; Pesetsky 1995). For example, one such thematic role is that of “Agency*”. The

thematic notion of Agency* is roughly and intuitively similar to the ordinary notion of agency,

but the two are not identical.8 In general, thematic roles are similar but not identical to certain

ordinary notions. Hence, I use asterisks in the names of thematic roles to distinguish them from

their ordinary language counterparts. Although Agency* isn’t agency, agency can be a rough

indicator of Agency*. In fact, the Agent* of a clause can be thought of as roughly something

along the lines of the doer of the action described by the verb, if there is such a doer. (Thus,

Agency* is always relativized to a sentence; Bob is the Agent* of Bob bought a camera from

Sue, but Sue is the Agent* of Sue sold a camera to Bob, even though these two sentences

describe the same event.)

Agency* figures into a wide variety of linguistic generalizations and explanations (cf. the

citations above). For a very simple example, notice that the doer of an action is always in the

subject position of a transitive verb (assuming that the verb requires either the subject or object

to perform the action):

(10) a. John kicked the horse.

b. Susan kissed the bartender.

c. Christine built the shelf.

8 E.g., the poison is plausibly the Agent* in a sentence like The poison killed the Pope, even though poison is not an

agent of a killing.

15

A standard view in linguistics is that this generalization is not accidental, and in fact holds

robustly across all human languages. Thus, (11) is normally taken as an important structural

generalization about human languages that linguistic theories should respect and explain:

(11) All verbs with Agents* associate the Agents* with their subject positions.

There is much more to be said about (various theories of) Agency* and other thematic roles and

thematic role-like elements. However, this brief introduction will be enough to introduce the

methodological criticism I want to explore.

2.2 Criticism of the Theory Via Methodological Principle

A number of philosophers have taken linguists to task regarding their use of notions like

Agency*. I noted above that Agency* isn’t agency, so some critics have argued that linguists

need to define what is meant by this new notion of Agency*. To give this worry a name, I will

call it the Problem of Undefined Terms (PUT). The most vocal advocate of PUT is Jerry Fodor.

For instance, Fodor writes,

If a physicist explains some phenomenon by saying ‘blah, blah, blah, because it was a proton…’ being a word that means proton is not a property his explanation appeals to (though, of course, being a proton is). That, basically, is why it is not part of the physicist’s responsibility to provide a linguistic theory (e.g. a semantics) for ‘proton’. But the intentional sciences are different. When a psychologist says ‘blah, blah, blah, because the child represents the snail as an agent…’, the property of being an Agent*-representation (viz. being a symbol that means Agent*) is appealed to in the explanation, and the psychologist owes an account of what property that is. The physicist is responsible for being a proton but not for being a proton-concept; the psychologist is responsible for being an Agent*-concept but not for being an Agent*-concept-ascription. Both the physicist and the psychologist is required to theorize about the properties he ascribes, and neither is required to theorize about the properties of the language he uses to ascribe them. The difference is that the psychologist is working one level up. (Fodor 1998, 59, underlining added; cf. Fodor and Lepore 2005, 353 – 4 for similar sentiments)

Prima facie, Fodor’s argument here seems pretty compelling.9 If Agency* doesn’t mean agency,

then what does it mean? Demanding that one define one’s terms seems like a reasonable and 9 It is somewhat odd that Fodor endorses PUT. After all, the linguist posits the notion of an Agent* as a

linguistically primitive element, and Fodor has vigorously defended the view that such (the concepts denoted by)

linguistically primitive elements can’t be defined. Instead, he maintains that the best theory of what they mean is

simply given atomistically. According to atomism, the best theory of meaning says only that the word dog means

dog (or better, dog denotes the concept of dogs). One would think that such an attitude would carry over to other

parts of language, too: Agen*t means Agent*.

16

important requirement. After all, if you don’t know what Agency* means, then how can you use

it in an alleged “generalization” like (11)? If you only say that the subject of transitive verbs can

be the Agent* of the verb, but no independent constraints are placed on what it is to be an

Agent*, then the alleged generalization about verbs has no content. To see this, just replace the

technical term Agent* with any other made-up word, say flurg (cf. Fodor 1998, 59). Now (11)

can be restated as All verbs with flurgs associate the flurgs with their subject positions.

Obviously, with no theory of what flurgs are, this statement is empty.

2.3 Is the Principle Justified by General Scientific Considerations?

As his allusion to physics makes clear, Fodor believes that any scientific theory must “provide an

account of what property” is denoted by any theoretical term it uses. Indeed, his criticism is just

that linguistics violates this general principle of science. Alas, this principle is not generally true

in the sciences. It’s often the case that a theoretical term is introduced to denote some

hypothetical property that is posited in the theory in order to account for some kind of “surprise”

or “pattern” in the empirical data. Moreover, the nature of this theoretical property is often

determined not by stipulation in advance, but by continuing theoretical work. This pattern of

theory construction is especially common in the earlier, developing stages of some area of

inquiry, which is undeniably where all areas of linguistics currently are at (cf. §3 for some

discussion of this last claim). In that sense, it is common for a scientist to “theorize about the

properties he ascribes”, but this does not amount to producing more of “an account of what

property” he postulates than the linguist produces regarding Agency* and the like. Furthermore,

just like our first dualism, this aspect of scientific inquiry is encoded in the mathematical

apparatus that constitutes our standard forms of data analysis.

To flesh out these ideas, let’s consider an example. As before, I stress that this is only an

example. It is meant to illustrate what scientists standardly do when exploring complex

phenomena. Suppose we are examining the concentrations of three chemicals X, Y, and Z in a

given region. One hundred groundwater samples are taken from the region, and the amounts of

each of X, Y, and Z are recorded. When the data are plotted as points on three axes, they are

distributed as in Figure 1 below. Rather than being randomly dispersed, the data appear to be

more structured around a general “pattern”. This structure is of course a real surprise, in the

17

sense that it’s extraordinarily improbable that a random sample of unrelated measurements

would ever yield such a pattern. (The boxes are scaled to a 1-1-1 ratio to visually present the

correlations, as opposed to the covariances, of the three variables.)

-200-100

0100

-200

0

200

-2000

-1000

0

1000

2000

-200-100

0100

-200

0

200

Figure 1

-200-100

0100

-200

0

200

-2000

-1000

0

1000

2000

-200-100

0100

-200

0

200

Figure 2 Figure 3

It’s the essence of the sciences not to ignore such patterns in the world. A natural first step is to

try to understand “how much” of a pattern is there, and what its nature is. Obviously, the relative

concentrations of X, Y, and Z are related. From the geometric perspective of the cube, the data

appear to be organized around an angled plane, partly depicted in Figure 2. The fit isn’t perfect,

and the planar surface lies at a skewed angle, so all three axes of the cube are involved. But if we

used a different set of axes, we could view the data as organized primarily along two axes. In

other words, suppose we replaced axes X, Y, and Z with three new axes, A, B, and C. (If we

keep A, B, and C perpendicular to one another, we can think of ourselves as holding the data

fixed in space, but rotating the cube.) Moreover, suppose that we choose these axes so that the A

axis runs right through the center of the swarm of data. In other words, let A to be that single axis

on which we find as much of the variation in the data as possible. If we had to represent all the

variation in our data with just one axis, A would be our best choice. It wouldn’t perfectly

reproduce all the information about X, Y, and Z, but it would capture a lot of it. Suppose that we

now fixed the second axis B in such a way that it captured as much of the remaining variation in

the data as possible, after we factor out that variation that is represented by the A axis. Together,

A and B would determine a plane lurking in the three-dimensional space. (The two darker lines

in Figure 3 correspond to Axes A and B.) By projecting all the data onto this plane, we could

recover quite a bit of information about the variation in the data. We won’t recover all the

18

information in the original three-dimensional space, but we’ll recover a lot of it. (We’ll miss just

that information regarding how far to one side or another from the plane the actual data points

lie.) If we set axis C to best capture the remaining information, we will then be able to recover all

the information in the original space. If we represent the original data using all three new axes,

we will be merely representing the original data as linear combinations of X, Y, and Z. If,

however, we are satisfied with the amount of information we get using only one or two axes, we

can represent the data in a less complex manner, using only one or two dimensions, instead of

the original three-dimensional format. Whether we represent the pattern in the data with just axis

A, or with axes A and B is not a decision that the mathematical analysis itself makes for us,

although in many cases other techniques or considerations will provide strong evidence for one

option.

The technique just described is called Principal Component Analysis (PCA). The

mathematical essence of PCA involves finding new axes on which to represent the data. The first

Principal Component (PC) is axis A, and the second and third PCs are B and C respectively.

Clearly, this technique is not restricted to three dimensions – it can be used with any (finite)

collection X1,…,Xn of measurements.10 The real scientific import of PCA comes when we find

that e.g., one or two PCs can account for say 95% of the variation in one or two hundred types of

observations. Geometrically, this is like finding that in a space of one or two hundred

dimensions, all the data are arranged almost perfectly in a straight line or on a single two-

dimensional plane lurking in that space. That is a pattern far too extreme to be random, and it

10 Very briefly, here is a general description of the mathematics of PCA. You first construct an n × n correlation (or

covariance) matrix, where the ijth entry gives the correlation between Xi and Xj. One then extracts all the

eigenvalues (there are virtually always n of them) and eigenvectors of unit length from this matrix. Ordering these

eigenvectors according to the size of their eigenvalues, we obtain our PCs. The kth eigenvector gives the direction of

the kth axis in the n-dimensional data space. Moreover, the kth eigenvalue expresses the amount of total variation in

the data that is captured by the kth PC. (When working with a correlation matrix, the total variation will always be

n.) Also, the amount of variation in a given measurement Xi that the kth PC accounts for is given as kik aλ , where

λk is the kth eigenvalue and aik is the ith element of the kth eigenvector. This form of PCA produces perpendicular

axes, each of which accounts for the maximum amount of remaining variance in the data. Once one decides to retain

m PCs (where m is almost always much less than n), the basis for the m-dimensional subspace can be changed to suit

background hypotheses, etc.

19

cries out for explanation. PCA and related techniques can help quantify this pattern in useful

ways.

By exposing a small number of dimensions of variation which capture most of the

variation in the data, we derive an explanandum. Why should the concentrations of three – or

more realistically, 30 or 100 – different chemicals behave as though they were from only two

sources? At this point, an empirical/metaphysical hypothesis suggests itself: maybe they behave

this way because there are two sources responsible for the emission of the chemicals. At this

point in the investigation, we may not be able to say much about these two hypothesized sources

other than that they are what are emitting the chemicals in question. We can’t, for instance,

automatically infer that one source corresponds to axis A and the other source to B. Axes A and

B determine a plane in the data-space, but infinitely many other pairs of lines (not necessarily at

right angles) could also determine that same plane. (In fact, I used the three lighter axes in Figure

3, which are not at right angles to one another, to generate the data.11) If the two sources do

correspond to axes other than A and B, this means that they each emit different concentrations of

X, Y, and Z than A and B predict. If the sources correspond to two axes that are not at right

angles, this means that their emissions of chemicals are correlated with one another. Finally,

even the notion of a “source” must be understood in a broad functional sense. It’s possible that

there are two physical sources that both emit the same relative concentrations of X, Y, and Z, and

so one of the axes represents both their emissions. Alternatively, one axis could represent the

joint effects of several physical sources that emit different concentrations of X, Y, and Z, but

which are all perfectly correlated with one another. At the same time, the hypothesis that there

are two sources of emission is a very strong and testable hypothesis, not least because together

they must nearly determine the particular plane in the data-space (cf. e.g., Malinowski 2002, ch.

10 – 12 for discussion of this and other uses of PCA and related techniques in chemistry).12

11 The two longer axes occur on the plane (given by 4X + 5Y – Z = 3), and the third axis is just Z. The data was

generated by randomly selecting points (coded on the X, Y, Z axes) on each axis from a normal distribution with

mean 0 and variance 50. These points were then weighted at .65, .35, and .05 and added together to produce the

data. 12 There are lots of other ways to explore the results of the PCA. If two candidate sources are found, the PCA will

supply evidence about their correlation. If the sources are different factories, this may indicate e.g., the degree to

which they are working together, or are both influenced by the same economic factors, etc. Also, if the two sources

20

PCA is one member of a family of methods – which includes factor analysis, structural

equation modeling, latent class analysis, discriminant analysis, multidimensional scaling, and

others – for exploring the extent to which there is latent structure in the data. These techniques

all involve the uncovering of underlying regularities that appear when individuals (persons,

groundwater samples, etc.) are measured in a variety of different ways. (Such techniques bear

some similarity to Whewell’s (1840) “consilience of inductions”, although they supply much

more information and structure.) In the study of complex phenomena, where many different sorts

of measurements are possible, the use of such techniques is extremely common, particularly in

the early stages of inquiry, but also consistently throughout the development of the theory.

Indeed, one would be hard-pressed to identify a line of scientific inquiry into some complex

phenomena where such techniques weren’t used. It’s just what you do with data.

In short, in the early stages of ordinary scientific inquiry, it is perfectly possible to

hypothesize the existence of unobserved empirical structures, sources, etc. without having much

of a theory about their natures. It’s certainly correct that as the investigation develops, the

chemist “owes an account” of the nature of the sources of emission. However, at the early stages

of inquiry, the details of this account may be a long ways off. Indeed, for highly complex

situations, there may be a lengthy initial period of many rounds of data analysis, with a focus

only on figuring out how many latent structures there are and what kinds of overt measurements

they affect. If scientists were required to precisely characterize all hypothesized structures, a

great deal of successful research into complex systems would be illegitimate.

2.4 Is Linguistics Relevantly Different from the Other Sciences?

If we consider the underlying logic of quantitative methods like PCA, here is what we find.

Sometimes there are one or more significant PCs lurking in a multidimensional data-space. If

both the dimensionality of the data-space and the amount of variation that the PCs account for

are large, then we have a significant explanandum that needs explaining. This need can justify

the provisional adoption of a hypothesis that there is some unobserved empirical structure

underpinning these mathematical PCs. Of course, the hypothesis may be wrong, and even if it is are discovered using only some of the measurements, say X and Y, then the PCA will express the relative amounts

of Z that the sources emit. If Z is a noxious pollutant, this may be extremely important information.

21

on the right track, much of the nature of this unobserved structure is an issue for further research.

Importantly, though, the multivariate nature of the PCs constrains (and thus helps to form)

hypotheses about the unobserved structures responsible for the PCs. That is, the fact that the PCs

are built out of multiple overt measurements severely limits what sorts of things they could

represent. E.g. inspection of our simple example above shows that if you know what a proposed

PC predicts on just one dimension, say X, then you can tightly constrain what it predicts on the

remaining dimensions. Put another way, not all sets of possible predictions correspond to

possible PCs.

The situation with linguistics is very similar to what we have just seen with quantitative

data analysis. For starters, the discipline of linguistics is certainly still at an early “exploratory”

stage, and it certainly concerns a very complex phenomenon. Aspects of these phenomena are

represented with various measurements, which themselves aren’t quantitative, but instead

concern such things as the grammaticality or acceptability of a sentence, its sound and meaning,

etc. Linguistic theorizing is currently centrally driven by examining various patterns that exist

within various interestingly clustered sets of sentences. The goal is to uncover the latent

structures responsible for (much of) these patterns. Moreover, just as with PCA, the structures

we hypothesize may not capture all the empirical data – linguistic theories will employ residual

effects, as we saw in §1. Thus, it’s natural to understand thematic roles like Agency* as having

the same sort of epistemic status as the latent structures in any other exploratory data analysis.

True, we don’t know fully what Agency* is, but that doesn’t mean that we can’t provisionally

hypothesize the existence of such a latent element as part of a theory about why our overt

measurements (i.e., linguistic judgments) behave as they do. In the early stages of inquiry, one

chooses to provisionally hypothesize the existence of thematic roles or of a correlate of a highly

significant PC for largely the same reasons: both types of hypotheses are testable in many ways

and have lots of room for potentially wide-ranging augmentation and refinement through further

scientific inquiry.

So why did the Problem of Undefined Terms seem so compelling? I suspect that there are

two main reasons for this.13 First, PUT encourages us to think of a completed theory of Agency*.

13 Actually, there is also probably a third reason, which has to do with the unclarity of some of the crucial

judgments, and the fact that such unclarity often seems to accumulate as theories become increasingly complex.

22

In a finished linguistics, we would expect more details about Agency*. But the real issue is about

justifying the very beginnings of such a theory. Why it is worthwhile to explore a theory that

posits thematic roles rather than some other one, say one that posits tiny fairies in our brains?

Second, recall that the real worry behind PUT was that a technical notion like Agency* is

too unconstrained and underdetermined to be useful in theorizing or in forming generalizations.

If there are no constraints on what it is to be an Agent*, then one can make a generalization like

All (or only) Agents* are Fs true by brute force. For any potential counterexample C to the

generalization, nothing prevents you from stipulating that C is not an Agent*. But this worry is

defused when we notice that, just as PCs can only be (non-trivially) extracted from collections of

more than one sort of measurement (e.g. X, Y, and Z concentrations), the extraction of linguistic

structure always involves multiple sorts of linguistic phenomena. To see this, consider the

following simplified sketch of how one might justify positing Agency* in a linguistic theory.

Since I only want to illustrate a very common method in linguistics, I’ll omit lots of details and

data, in order to avoid the complexities of doing linguistics straight out. You don’t have to be

convinced of the details of the example in order to understand the method employed. (All the

ideas and data, though, are very familiar from the linguistics literature.)

Suppose a linguist, call her Lana, notices that some derived nominals (nouns that Lana

hypothesizes to be derived from an underlying verb) have a form that corresponds to the passive

form of the verb (12), but other verbs do not (13):

(12) a. Sharon proved the theorem/ the theorem’s proof by Sharon

b. John destroyed the vase/ the vase’s destruction by John

c. John created the vase/ the vase’s creation by John

(13) a. Sue loved Mary/ *Mary’s love by Sue

b. Sue resembled Mary/ *Mary’s resemblance by Sue

c. Sue awakened / *the awakening by Sue

There is a lot to be said about this issue. However, since it does not pertain directly to the Problem of Undefined

terms, I will leave this topic for another day.

23

After studying this pattern, Lana begins to explore the hypothesis that there is some structural

property she calls flurg present in the nouns or verbs in (12) and absent in (13), or vice versa. At

this point, flurg simply encodes a difference between two kinds of words. But then Lana notices

that with nominals like (12), the nominal can be the complement of a possessive, or it can have a

passive by-phrase adjunct, but not both (although it can take some by-phrases and possessives):

(14) a. The Roman’s destruction of the city

b. The destruction of the city by the barbarians

c. *The Roman’s destruction of the city by the barbarians

d. The Roman’s destruction of the city by catapults and mass attack

Lana now hypothesizes that such nominals have some property that can license either the

possessive or the passive by-phrase, but not both. She then hypothesizes, that this property is

flurg, the same one used in (12) – (13). As a third bit of data, Lana notices that languages lack

symmetric pairs of verbs for asymmetric events. E.g., while we have verbs like kick, lift, build,

etc., we don’t have verbs like blik, where x bliks y if and only if y kicks x (and similarly for lift,

build, etc.). (In contrast, notice that we do find symmetric pairs elsewhere; e.g., the direct and

indirect objects of Sue sent a letter to Tim and Sue sent Tim a letter can be exchanged without

any apparent change in meaning.) Now Lana hypothesizes that flurg is once again responsible

for this phenomenon, because flurg must necessarily be located in the subject position of the

verb, and can never appear in object position. Of course, any of these hypotheses could turn out

to be false, but so far Lana feels that her developing theory of flurg and its roles in language is

sufficiently plausible to merit further study.

As Lana continues to study the words that she hypothesizes have flurg versus those that

don’t, she begins to sense that words with flurg all share a certain semantic similarity, although

she cannot fully articulate what it is. To explore this hunch, she gives a very brief

characterization of what little she knows about this semantic similarity to a variety of people

(both linguistically trained and untrained). She uses only a couple words as examples, and then

gives her subjects a large number of other words, and asks them to indicate whether they

perceive this hypothetical semantic property in the words, and if so, where (e.g., in subject or

object position). To her surprise, there is an enormous amount of agreement across subjects.

24

They typically find flurg clearly present or absent in the same places, and are unsure about the

same cases. As Grice and Strawson (1956) noted, this kind of agreement marks a distinction

(between the presence and absence of flurg), even if many details about the nature of the

distinction are unknown. Attending to the cases where there are uniformly clear judgments, Lana

notices that these are also places where the other hypothesized effects of flurg occur. Thus, she

further expands her hypothesis about flurg, claiming that it is, or is associated with some kind of

semantic feature, which she is currently investigating, but has not yet fully identified or even

confirmed. Since this hypothesized semantic feature of flurg corresponds roughly but not

completely to the notion of agency, she renames flurg Agent* out of convenience.

Notice that in the story just told, the nature of flurg is unconstrained only at the very

beginning, when it is hypothesized to underwrite just one distinction. However, as the theory is

developed to account for multiple distinctions, flurg becomes more tightly constrained. By the

end, the hypothesis that flurg exists is rather demanding. It is not enough for subjects to simply

feel that flurg is clearly present/absent in a given word; the theory also makes (heavily ceteris

paribus, as it turns out) predictions about the syntactic behavior of that word, and it predicts

where in the word, if anywhere, subjects will sense flurg’s semantic presence. The multiple

dimensions that are used to characterize flurg render it anything but an unconstrained

hypothetical element.

In PCA, one seeks out a few vectors in the data space that explain most of the variation in

the data. In linguistics one seeks out a few structural elements that explain most of the variation

in the data. In neither case are these elements always fully defined or understood. Demanding a

complete definition in the linguistic case is a methodologically dualistic standard. I know of no

defense of such an atypical stance towards linguistics. Certainly Fodor’s incorrect appeal to the

sciences (quoted above) offers no such support. As a final comment, it’s worth observing that the

parallel in linguistics with techniques like PCA is quite strong. Indeed, work in consensus theory

suggests that it might be possible to perform some form of latent variable analysis on the kinds

of collective linguistic judgments discussed above (e.g., Batchelder and Romney 1988). In such a

case, the evidence for a structural element like Agency* could be reduced to something like the

question of whether the first PC in the data space captured nearly all the variation in the data.

This topic has not been researched by theoretical linguists.

25

3 Dualism III: Aggregation and Degrees of Accuracy In the previous two sections, we considered two dualistic principles which, though rejected,

attempted to make linguistic theorizing harder in some respects than ordinary scientific

theorizing. At this point, it’s natural to ask, are there any commonly accepted aspects of

linguistic methodology that make linguistic theorizing easier than scientific theorizing? In fact,

there are, and in this section, I gesture at some of them. My remarks here will be brief, although I

address this issue in more detail elsewhere (Johnson, ms.).

Before beginning, two caveats are in order. First, I’ll illustrate the divergences between

linguistic and (other) scientific methodologies with a particular example. But as will be obvious

to anyone familiar with mainstream linguistic methods, the morals of this case study generalize

very broadly to a vast amount of linguistic research. Also, since the purpose of the example is

only to illustrate certain widely used methods of linguistic theorizing, I make no effort to

exhaustively characterize the relevant literature. For present purposes, that would only

complicate matters by presenting a more complex instance of the same methods my simplistic

example will bring out. (Indeed, the precise details are unimportant enough that readers familiar

with linguistic methods may wish to skip the example altogether, and go right to the discussion.)

Second, my critical remarks are not intended to be some sort of “tearing down” of

linguistic theory or the “harassing of emerging disciplines” (Chomsky 2000, 60, 77). Rather, I

suggest only that linguistics should follow the quantitative sciences, which routinely study their

own methodologies in order to better understand, use, and improve them.14 Without such study,

there will continue to be many reasons why Chomsky was incorrect to claim that ordinary

linguistic practices have “exhausted the methods of science” (Chomsky 1986, 252). With these

caveats in hand, let us turn to the example.

In a series of papers, Norbert Hornstein (1998, 1999, 2000a, 2000b, 2004) has argued that

control phenomena can be accounted for simply by allowing movement into theta positions. E.g.,

the relevant syntax of (15a) does not have the traditional form in (15b), where PRO is a distinct

lexical item controlled by Sue. Instead, the proper form is in (15c), where Sue has moved from

14 To verify this last claim, one need merely consult a current statistics journal, or a journal of a particular science

that publishes papers on mathematical methods (e.g., The Review of Economics and Statistics or The Journal of

Mathematical Psychology).

26

the lower subject position to the higher one. (Following Hornstein, I treat movement as a

combination of the Minimalist operations of Copy and Merge.)

(15) a. Sue wants to win;

b. Suei wants [PROi to win];

c. Sue wants Sue to win.

More generally, Hornstein argue that linguistic theories don’t need to posit the null pronomial

element PRO at all. Nor do linguistic theories need to posit a control module to determine the

referent of an occurrence of PRO. Hornstein’s argument is primarily driven by considerations of

simplicity. PRO is unnecessary in a theory, he argues, because the phenomena that initially

motivate positing PRO can be accounted for by appealing to independently motivated

components of the grammar. Movement (aka Copy and Merge), Hornstein assumes, is a

prevalent feature of grammar. If all the relevant facts can be accounted for without positing PRO,

then (with other things being equal, as well), linguistic theories should favor the simpler theory

and reject the employment of PRO. In the development of this theory, Hornstein also notes

several advantages to his theory. Here are two representative examples.

It’s well-known that both PRO and traces (i.e., the residue of Copy and Merge) are both

phonetically null. By identifying the two, we reduce the need to explain this fact from two

separate phenomena to just one. Similarly, the fact that wanna contraction can occur either with

raising or with control will now require only one explanation:

(16) a. I seem to be getting taller.

a’. I seemta be getting taller.

b. I want to get taller.

b’. I wanna get taller.

Famously, there is some need to explain this type of contraction, because it does not happen

willy-nilly. You can turn You want to help Mary into a wanna-question by asking Who do you

wanna help?, but you can turn You want John to help Mary into a question only by asking Who

do you want to help Mary? It is not grammatical to ask *Who do you wanna help Mary?

As a second advantage, Hornstein considers “hygienic” verbs, as in Peter

washed/dressed/shaved Dan. These verbs are interesting, because they can also appear

intransitively:

27

(17) Peter washed/dressed/shaved

Notice that (17) has a reflexive meaning; it says that Peter washed himself (or shaved himself

etc.) This stands in stark contrast to other verbs that can drop their objects; John ate does not

mean John ate himself, only that he ate something. According to standard views, the reflexive

behavior of (17) is puzzling, since a reflexive reading would most naturally be supplied by an

element like PRO, but PRO is typically thought to appear only as the subject of a clause (e.g.,

Chomsky and Lasnik 1993).15 But on Hornstein’s view, the syntactic element PRO is replaced by

whatever structure underlies movement phenomena, and it is well-known that movement can

occur from object position – e.g., on Hornstein’s view, who did Shaun kiss whoi would be an

example.16 Hornstein shows how the reflexive readings in (17) are (relatively) neatly and

unproblematically produced within his theory.

Unsurprisingly, Hornstein’s proposal has not gone unnoticed (e.g., Brody 1999,

Culicover and Jackendoff 2001, Landau 2000, 2003, Manzini and Roussou 2000). Here are two

representative criticisms of the view. The first problem comes from Landau 2003. Landau argues

that Hornstein’s theory has problems accounting for partial control, illustrated in (18):

(18) The chair wanted to meet on Tuesday afternoon.

(18) is most naturally interpreted as expressing that the chair wanted some set X to meet on

Tuesday afternoon, where X contains the chair and at least one other person. Roughly and

intuitively speaking, partial control constructions are distinctive in that the controlling DP is only

a proper subset of the collective subject of the embedded clause. Further evidence that there is a

15 For instance, Bill wants Mary to kiss cannot have the structure Billi wants Maryj to kiss PROi/j; it can mean neither

that Bill wants Mary to kiss Bill nor that Bill wants Mary to kiss herself. 16 On a technical note, it is standard to distinguish (roughly speaking) A-movement from A’-movement. However, if

one relinquishes, as Hornstein does, the constraint that a syntactically realized NP (or DP, I won’t adjudicate here)

must bear at most one theta role, such a distinction becomes less well motivated. As Hornstein discusses at length, a

central component of his theory is that syntactic chains can bear multiple theta roles. This relaxation of the Theta

Criterion in many ways is a central bit of machinery of his theory.

28

plural syntactic subject in the lower clause comes from the ability of partial control to support

distinctively plural types of predicates and anaphors:

(19) a. Susan enjoyed getting together on weekends.

b. Steve wondered whether helping one another would be productive in the long

run.

It is hard to see how a “control as movement” view such as Hornstein’s can handle partial

control. The relevant structure of (18) is simply The chairi wanted to the chairi meet on Tuesday

afternoon. Nothing in Hornstein’s theory appears to explain how the overt copy of the chair

denotes a single person, but the deleted copy of that very expression denotes a group, of which

the chair is only one member. This suggests that the subject of the lower clause of (18) is

realized as something other than merely a copy of the chair. Further evidence that this other

element may be PRO comes from the fact that partial control does not appear to exist in raising

constructions:

(20) *The chair seemed to meet on Tuesday afternoon.

“Without further detail”, Landau argues, “one can already see how damaging the very existence

of partial control is to the thesis ‘control is raising’. Simply put: there is no partial raising. It is

not even clear how to formulate a rule of NP-movement that would yield a chain with

nonidentical copies” (Landau 2003, 493).

The second problem comes from Brody (1999, 218 – 19). Consider the following pattern.

(21) a. John attempted to leave.

b. *John was attempted to leave.

c. *John believed to have left.

d. John was believed to have left.

Why can’t (21c) be used to express that John believed that he himself had left, just as (21a)

expresses (roughly) that John attempted to make himself leave? Similarly, why can’t (21b) be

29

used to express that someone attempted to make John leave, just like (21d) expresses that

someone believed that John had left? If control is just movement, as Hornstein proposes, then we

have no explanation for the different syntactic abilities of what is typically thought to be NP-

movement – (21c,d) – and what is typically thought to be control – (21a,b).

Now, of course there is a great deal more to be said about Hornstein’s theory. The theory

has more prospects and problems than what I’ve presented, and there are objections and replies

to them, and objections and replies to the objections and replies, and so on. But the points to

follow can be made by examining just a few considerations.

In ordinary scientific inquiry, there are a great many questions regarding the relation

between the empirical data, background assumptions, and the resulting theory or theories

generated from them. Neither the questions nor the answers are inherently quantitative, although

with statistical modeling, they are typically expressed that way. As linguistics is currently

practiced, though, there is no known way to address the vast majority of these questions. Of the

many questions that are routinely studied in the other sciences but not linguistics, one is an 800-

pound gorilla. I call it the problem of model-performance aggregation, or the problem of

aggregation for short. Intuitively speaking, the problem is that current linguistic methods don’t

provide any systematic means for aggregating multiple assessments of a theory into a single

more general assessment. To see what I mean here, consider the following example.

Imagine a linguist who needs to evaluate Hornstein’s view. Perhaps, e.g., she works in a

related area of syntax or semantics, and she is trying to decide whether a movement analysis of

control is promising enough that she should explore incorporating this view into her own theory.

(Obviously, if she has little faith in the view, she will be disinclined to expand her own theory in

this direction.) Simplifying greatly, suppose also that her only considerations about the view

concern the advantages and disadvantages just listed: she thinks that (i) Hornstein’s theory does a

good job accounting for certain reflexive intransitives and (ii) for wanna-contraction. But she

thinks (iii) the theory is weaker at handling partial control and (iv) passivization. Our linguist

recognizes that Hornstein’s theory can be made to handle (iii) – (iv), but she also thinks that the

only way to do so is rather unelegant and somewhat ad hoc. (For the moment, let’s bracket the

very difficult issue of how she arrived at these judgments.) Now what does she do? How should

she combine her various assessments about Hornstein’s theory to arrive at a single assessment?

At this point, linguistic methodology comes to a grinding halt. If our linguist has no further data

30

or considerations to add, she cannot further analyze the situation except by appealing to her

subjective impressions (and perhaps also the impressions of her colleagues) of the overall

promise of the theory. This is the problem of aggregation: linguistic methodology provides no

theoretical tools to guide the inference from the collection of considerations to an overall

assessment of the theory.

Although the problem of aggregation is rather obvious, its importance shouldn’t be

underestimated. Its force can be brought out with some very naïve questions. Is the overall

assessment of the theory in question positive or negative in light of (i) – (iv)? Does the advantage

of (i) outweigh the disadvantage of (iv)? If so, by how much? Enough to offset the sum

disadvantage left over from (iii) when the advantage of (ii) is factored out? If we factor in

another feature, say (v) the relative simplicity of the theory, is the theory preferable to a given

rival theory? What if we encounter some further data, and we are unsure of how it impacts on

Hornstein’s theory? Suppose our linguist is also considering a rival theory that she feels does

well on (i) and (iii) but not so well on (ii) and (iv), but she also feels it does reasonably well with

respect to some further area (vi). How is she to decide between them? Linguistic methodology

provides no systematic means for addressing any of these questions. You just have to go with

your gut.

The fact that linguistic methodology offers no way to aggregate various aspects of the

performance of the model on actual empirical data stands in stark contrast to the (other) sciences.

Indeed, it’s fair to say that the ability to aggregate diverse aspects of the performance of a model

is the single most central aspect involved in the study of the relation between a model and the

empirical phenomena it models. We’ve already seen one simple form of assessing how well

overall a given model does with respect to some data. In our discussion of (5) and (6) above, we

considered the sum of squared deviations of the data from the model’s predictions:

. This quantity represents an aggregation, via addition, of all model’s failings

to accurately predict the data. It appears in a variety of techniques for analyzing and assessing

the overall performance of a given theory. Moreover, there are a large number of sophisticated

techniques that aggregate such diverse aspects of a theory as its empirical coverage and its

“simplicity”. For instance, much attention has been paid to (various forms of) the Akaike

Information Criterion (e.g., Forster and Sober 1994, Burnham and Anderson 2002). Additionally,

]))([( 21 i

Iii XfY −∑

∈

31

there are a wide variety of other methods of “model selection” (e.g., Zucchini 2000) but they all

centrally involve the aggregation of various features of the model into a single assessment.

Is the problem of aggregation really a problem? For analyses that require the aggregation

of different aspects of a model’s performance, can’t we trust the reflective, considered judgments

of professional linguists? In practice, of course, that’s what we do, but the present question

concerns the reliability of such an analytic and inferential strategy. Although I don’t know of any

research on linguistics, decision analysts have paid a great deal of attention to the reliability of

professional (e.g., scientific, medical, academic) judgments. The short answer is that scientists’

intuitive assessments of theories are subject to the same foibles that pervade ordinary human

judgment and decision-making. In particular, scientists are apt to be overconfident about the

accuracy, success, and promise of their favored theories, and underconfident about their rivals’

theories (e.g., Frischoff and Henrion 1986; cf. Bishop and Trout 2005 for detailed discussion and

many more citations). Additionally, this lack of inferential guidance in linguistic theorizing

renders it virtually impossible for linguists to recognize certain sorts of “surprising results” that

are fairly common in the other sciences, and which we may expect to obtain in linguistics. (E.g.,

there are many instances where what is commonly agreed to be a good/bad theory is seen to be

not so, only after the use of more systematic, principled, and precise instruments than

professional judgments.) Such facts explain why the non-linguistic sciences maintain active

research programs into the study and refinement of methods for inferring theories from the

empirical data and background hypotheses. Indeed, a central part of applied statistics and related

subject-specific areas (e.g., psychophysics, biometrics) involves developing fine-tuned methods,

often restricted to very specific sorts of models and inferences, that can be used in place of

human judgment. It is a commonplace that the careful use of such methods produces better –

often dramatically better – results than unaided judgments.

The problem of aggregation is only one particularly striking instance of a divergence

between linguistic and (other) scientific methods. In general, it is very hard to see how to

systematically combine any effects or features of a linguistic theory into a single collective

judgment. Even more generally, linguistic methodology provides remarkably little in the way of

systematic study and analysis of the relation between one’s empirical data and one’s theory

(although it does provide some). In contrast, such methods are a central component of the (other)

sciences. To give a sense of this, I will end by briefly sketching another area where statistical

32

methods have been highly developed for assessing one’s theory in light of the data. (On a

technical note, the quantitative models I have discussed have all been continuous, rationally

scaled variables. All points made in this paper could just as easily be made with nominally scaled

variables, which correspond to measurements such as yes/no, on/off, present/absent,

grammatical/ungrammatical, answer A/B/C/D, etc.)

Statistical models and inferences rely on background assumptions, typically quite

substantial ones, such as that the residuals εi discussed above are all normally distributed (i.e.,

they fall on a “bell curve”). Typically many of these background conditions are false. An

important question in statistical modeling concerns how sensitive the model and its predictions

are to various kinds of violations of these assumptions. If a model’s predictions vary

substantially with relatively minor violations of the assumptions, the model and inferences from

it are said to be “fragile”, or to lack “robustness”. Fragility also appears when a model and its

inferences are heavily influenced by “minor” aspects of the data that is selected to derive the

model (and/or its parameters) But as Leamer (1985, 308) notes, “a fragile inference is not worth

taking seriously.” This is because the background assumptions typically are false, and models

(particularly models of complex phenomena with substantial residual effects) that are highly

sensitive to small changes in empirical data are unreliable.

The very same issues arise for theoretical linguistics. Mainstream linguistic theories rely

heavily on a whole host of background assumptions, primarily in the form of idealizations. We

make various idealizations about the speakers of the language, their judgments about various

expressions, the computational architecture of grammatical competence, the nature of Universal

Grammar, etc. These idealizations are important, perhaps crucial, for real-life work in linguistics.

But they are probably also all false. If we accept that “a fragile inference is not worth taking

seriously”, we need to develop better methods for analyzing how well various linguistic theories

perform even when some of the idealizations above are relaxed in various ways. Furthermore, we

need to develop better methods for understanding and estimating the degree of dependence of

individual theories on potentially “variable” linguistic judgments of various sorts. And in

assessing various theories, we should penalize, perhaps severely, theories that are too sensitive in

either of these ways. The systematic study of the sensitivity of models has played virtually no

role in contemporary linguistics.

33

In addition to the issues discussed here, ordinary scientific research involves quite a

number of other forms of analysis of the theory-data relationship which raise corresponding

questions about this relationship in linguistics. A few examples include such notions as

tolerance, leverage, reliability, validity, and the identifiability of a model’s parameters. Although

these notions raise questions about linguistic theorizing, there has been essentially no work in

theoretical linguistics to address these issues. Until at least some of these issues are addressed

more conscientiously, we must realize that in some important respects, linguistic theories are

held to different standards than ordinary scientific theories. In short, in many respects, it is not

even remotely true that ordinary linguistic methods have “exhausted the methods of science”

(Chomsky 1986, 252).

4 Conclusion If we treat linguistics as a genuine scientific activity, then such philosophical issues as the

Problem of Unsaved Phenomena and the Problem of Undefined Terms simply disappear. At the

same time, a wide range of new issues arise regarding the study and improvement of linguistic

methods. Linguistics is rather unique among the sciences in that it has seen relatively little work

on its methodology. But linguistics is arguably the area of inquiry that needs such work the most.

On a final note, one occasionally encounters the view that it’s optional whether one treats

linguistic theories as scientific, and that linguistic theories can alternatively be treated as

“philosophical” theories. I confess I simply don’t understand this position. Nonetheless, there are

a few things that can be said in response. First, such alternative theories don’t appear to compete

or conflict with anything I’ve said about scientific theories. Second, I mean very little by calling

a semantic theory “scientific”. Linguistic theories deserve this appellate, I suggest, primarily

because their construction and confirmation centrally involve employing some of our best known

methods for obtaining knowledge about a particular empirical phenomenon. From this

perspective, it is unclear how one could reasonably defend the importance of a “non-scientific”

theory of language. (It’s trivial to show that there are other sorts of projects; the trick is to

interpret linguistic theorizing as some other project that is interesting and worth persuing.)

Moreover, my present use of the idea that linguistic theories are a type of scientific theory is, I

believe, especially uncontentious. So for an alternative view to avoid the conclusions we’ve

34

drawn, one needs to show why the particular features of linguistic theorizing that I’ve appealed

to are not part of some other (worthwhile) form of linguistics.

References Baker, Mark 1988, Incorporation, Chicago: University of Chicago Press. Basilevsky, Alexander 1994, Statistical Factor Analysis and Related Methods. New York:

Wiley-Interscience. Burnham, Kenneth, and David R. Anderson (2002). Model Selection and Multimodel Inference

(2nd ed.). New York: Springer. Cappelen, Herman, and Ernie Lepore, 2005. Insensitive Semantics. Oxford: Blackwell’s. Carsten, Rachael 2004, “Relevance Theory and the Saying/Implicating Distinction.” In L. Horn

and G. Ward (eds.) Handbook of Pragmatics. Oxford: Blackwell, 633 – 656. Chomsky, Noam 2000. New Directions in the Study of Language and Mind. Cambridge: CUP. Chomsky, Noam 1995. The Minimalist Program. Cambridge, MA: MIT Press. Chomksy, Noam 1988. Language and Problems of Knowledge. Cambridge, MA: MIT Press. Chomksy, Noam 1975, Reflections on Language. New York: Pantheon. Chomsky, Noam 1980, Rules and Representations. New York: Columbia University Press. Noam Chomsky 1986, Knowledge of Language, Westport, Conn.: Praeger. Chomsky, Noam, and Howard Lasnik 1993. “The Theory of Principles and Parameters”.

Reprinted in Chomsky 1995. Davidson, Donald 1986, “A Nice Derangement of Epitaphs”, in Lepore, Ernest (ed.) Truth and

Interpretation: Perspectives on the Philosophy of Donald Davidson. Oxford: Blackwell’s.

David Dowty 1991, “Thematic Proto-Roles and Argument Selection”, Language, Vol. 67, no. 3, pp. 547-619.

Fodor, Jerry A. 1998, Concepts, Oxford: Clarendon. Fodor, Jerry A. and Ernie Lepore 2005, “Impossible Words: A Reply to Kent Johnson”. Mind

and Language 20:3, 353 – 356. Forster, Malcom, and Elliott Sober 1994. “How to Tell when Simpler, More Unified, or Less Ad

Hoc Theories will Provide More Accurate Predictions” British Journal for the Philosophy of Science 45. 1 – 35.

Gleitman Lila R., and Mark Liberman (eds.) 1995. An Introduction to Cognitive Science, Volume I: Language. Cambridge, MA: MIT Press.

Grice, H. P. and P. F. Strawson 1956. “In Defense of a Dogma”. The Philosophical Review 65:2, 141 – 158.

Jane Grimshaw 1990, Argument Structure, Cambridge: MIT Press. Kenneth Hale and Samuel Jay Keyser 1986, “Some Transitivity Alternations in English”,

Lexicon Project Working Papers 7, Cambridge, MA: Center for Cognitive Science, MIT. Kenneth Hale and Samuel Jay Keyser 1987, “A View from the Middle”, Lexicon Project

Working Papers 10, Cambridge, MA: Center for Cognitive Science, MIT. Kenneth Hale and Samuel Jay Keyser 1992, “The Syntactic Character of Thematic Structure”, in

I. M. Roca (ed.) Thematic Structure and its Role in Grammar, New York: Foris, pp. 107-143.

Kenneth Hale and Samuel Jay Keyser 1993, “On Argument Structure and the Lexical Expression

35

of Syntactic Relations”, in K. Hale and S. Keyser (eds.), The View from Building 20, Cambridge, MA: MIT Press, pp. 53-109.

Kenneth Hale and Samuel Jay Keyser 1999, “A Response to Fodor and Lepore, ‘Impossible Words?’”, Linguistic Inquiry, Vol. 30, no. 3, pp. 453-66.

Hale, K. and Keyser, S. J. 2003, Prolegomenon to a theory of argument structure. Cambridge, MA: MIT Press.

Norbert Hornstein 1999, “Movement and Control”, Lingusitic Inquiry, Vol. 30, no. 1, pp. 69-96. Norbert Hornstein 2003, “On Control”, in Randall Hendrick (ed.) Minimalist Syntax. Oxford:

Blackwell, 6 – 81. Ray Jackendoff 1983, Semantics and Cognition, Cambridge, MA: MIT Press. Ray Jackendoff 1987, “The Status of Thematic Relations in Linguistic Theory”, Linguistic

Inquiry, Vol. 18, no. 3, pp. 369-411. Ray Jackendoff 1990, Semantic Structures, Cambridge, MA: MIT Press. Jackendoff, Ray 1997, The Architecture of the Language Faculty. Cambridge, MA: MIT Press. Jackendoff, Ray 2002, Foundations of Language. Oxford: OUP. Johnson, Kent 2004, “From Impossible words to Conceptual Structure: The Role of Structure

and Processes in the Lexicon” Mind and Language 19:3. pp 334 – 358. Johnson, Kent ms. “Methodological Divergences between Linguistics and the (Other) Sciences” Leamer, Edward 1985, “Sensitivity Analyses Would Help”, American Economic Review 75. 308

– 313. Beth Levin and Malka Rappaport Hovav 1995, Unaccusativity: At the Syntax--Lexical Semantics

Interface, Cambridge, MA: MIT Press. Malinowski, Edmund R. 2002, Factor Analysis in Chemistry. New York: John Wiley and Sons. Terence Parsons 1990, Events in the Semantics of English, Cambridge, MA: MIT Press. Pesetsky, David, 1995. Zero Syntax. Cambridge, MA: MIT Press. Pietroski, Paul 2005. Events and Semantic Architecture. Oxford: OUP. Putnam, Robert D. 2000, Bowling Alone New York: Touchstone. Recanati, Francois 2001. “What is Said”, Synthese 128. 75 – 91. Stone, Tony, and Martin Davies 2002, “Chomsky Amongst the Philosophers”. Mind and

Language 17:3, 276 – 289. Whewell, William 1840, The Philosophy of the Inductive Sciences. London. Zucchini, Walter 2000. “An Introduction to Model Selection”. Journal of Mathematical

Psychology 44, 41 – 61.

The Unbelievable Legacy of Methodological Dualismrthomaso/lpw05/johnson.pdf · 2005-09-03 · 3...

Documents

Transcript of The Unbelievable Legacy of Methodological Dualismrthomaso/lpw05/johnson.pdf · 2005-09-03 · 3...