
Journal of Experimental Psychology: Learning, Memory, and Cognition, 1999, Vol. 25, No. 6, 1495-1513

Copyright 1999 by the American Psychological Association, Inc. 0278-7393/99/$3.00

Reasoning About Necessity and Possibility: A Test of the Mental Model Theory of Deduction

Jonathan St. B. T. Evans, Simon J. Handley, and Catherine N. J. Harper

University of Plymouth

Philip N. Johnson-Laird

Princeton University

This article examined syllogistic reasoning that differs from previous research in 2 significant ways: (a) Participants were asked to decide whether conclusions were possible as well as necessary, and (b) every possible combination of syllogistic premises and conclusions was presented for evaluation with both single-premise (Experiment 1) and double-premise (Experiment 2) problems. Participants more frequently endorsed conclusions as possible than as necessary, and differences in response to the 2 forms of instruction conformed to several predictions derived from the mental model theory of deduction. Findings of Experiments 2 and 3 showed that some fallacies are consistently endorsed and others consistently resisted when people are asked to judge whether conclusions that are only possible follow necessarily. This finding was accounted for by the computational implementation of the model theory: Fallacies are made when the first mental model of the premises considered supports the conclusion presented.

Traditional applications of logic concern the validity of arguments, that is, proving that some conclusion is necessary given some premises. However, in everyday reasoning it may be just as important to decide whether some proposition is possible in light of the given information. Inferences of possibility occur whenever rules and regulations constrain a person's behavior rather than determine what it must be. For example, students choosing a degree program within a modular course structure will have many degrees of freedom but will have to respect constraints owing to timetabling restrictions, availability of teaching staff, prerequisite and corequisite relations between modules, and so on. Their decision making here involves inferring what is possible and then deciding between the possibilities identified. Most of the rules by which people live their lives in society, including criminal law, operate in a similar way. It is rarely determined by such rules that people must follow a particular course of action, but they are frequently constrained to act within a set of legal possibilities.

The psychological study of deductive reasoning has been steadily increasing in recent years, and there are now many hundreds of experimental studies reported in the literature (see Evans, Newstead, & Byrne, 1993, for a review). In view of the above remarks, however, there is a curious limitation

Jonathan St. B. T. Evans, Simon J. Handley, and Catherine N. J. Harper, Department of Psychology, University of Plymouth, Plymouth, United Kingdom; Philip N. Johnson-Laird, Department of Psychology, Princeton University.

This study was supported by a research grant from the Economic and Social Research Council of the United Kingdom (R000221742).

Correspondence concerning this article should be addressed to Jonathan St. B. T. Evans, Center for Thinking and Language, Department of Psychology, University of Plymouth, Plymouth PL4 8AA, United Kingdom. Electronic mail may be sent to [email protected].

in almost all studies reported to date: They ask participants only to decide if a conclusion is necessary given some premises. There have been almost no studies prior to that reported in the present article that ask participants to decide what is or is not possible given some assumptions.

In logic, a deductive inference is valid if its conclusion must be true given that its premises are true. Conclusions in standard logic state what is the case, but in "modal" logic they can in addition state what is possibly the case, or what is necessarily the case. Most psychological experimental investigations of reasoning have considered only conclusions about what is the case. Typically, the participants are asked to evaluate the validity of a given conclusion (the evaluation task) or else to generate their own valid conclusion (the production task). Most of the studies have been based on problems drawn from the propositional calculus, which concerns the logic of negation and such connectives as if, or, and and, or on classical syllogisms of the sort devised by Aristotle. There have been few studies of modal reasoning (but see Bell & Johnson-Laird, 1998; Galotti, Baron, & Sabini, 1986; Osherson, 1976).

In the research reported here, we examine inferences about what is possible as well as what is necessary within the syllogistic reasoning paradigm. The research is motivated by the mental model theory of deductive reasoning, from which we are able to derive some general predictions about necessity and possibility in deductive reasoning. We start with a brief account of that theory.

Mental Model Theory of Deduction

The reason that so much research on standard deduction has been carried out is because many cognitive scientists believe that the ability to reason deductively is central to human intelligence. In fact, people make many logical errors on these tasks and exhibit systematic biases (Evans, 1989), a finding which has led to much debate about human rationality (see Evans & Over, 1996; Manktelow & Over, 1993). However, most psychologists in the area would agree that untrained reasoners do show some basic deductive competence: For example, they can distinguish valid inferences from fallacies at well above chance rates. Accounting for such competence has, however, led to a protracted, sometimes heated, and as-yet unresolved debate between two theoretical camps. The debate is focused on whether deductive reasoning is achieved by the use of mental rules or mental models (see Evans et al., 1993, chapter 3, for detailed coverage of the debate).

Rule-based theories derive from so-called natural deduction, a method of formalizing logic that is intuitively plausible and that postulates rules of inference for each logical term. The most durable of these have proved to be the systems devised by Rips (1983, 1994), known as PSYCOP, and the system developed by Braine and O'Brien (Braine, 1978; Braine & O'Brien, 1991; Braine, Reiser, & Rumain, 1984). However, we provide little discussion of these theories in the present article for two reasons. First, rule theorists have largely confined their theoretical and experimental work to the study of propositional reasoning and have had rather little to say about syllogistic inference (but see some discussion of reasoning with quantifiers by Rips, 1994). Second, the rule theories as published provide only rules for valid inferences and hence can account for necessary but not possible inferences. We defer consideration of how rule theories might be extended to the explanation of possible inference until the General Discussion.

The account of human deduction in terms of mental model theory was devised by Johnson-Laird (1983; Johnson-Laird & Byrne, 1991). In this approach, people do not prove conclusions syntactically by applying valid inference rules. Rather, their deductions are based on grasping a semantic principle, namely, that a conclusion is valid if there is no model of the premises that excludes it. The model theory includes psychological constraints that do not apply to formal semantic methods such as truth-table analysis. In particular, it is supposed that people focus on situations in which the premises would be true and represent such situations selectively.

The mental model theory was originally devised as an account of syllogistic reasoning (Johnson-Laird & Bara, 1984) and in its general form supposes that deduction requires three stages:

1. Model formation. The reasoner forms an initial model from the premises.
2. Conclusion formation. The reasoner derives a putative conclusion from the model that is informative (e.g., not a repetition of a premise).
3. Conclusion validation. The reasoner searches for a counterexample, that is, a model in which the premises are true but the conclusion is false. If no such model is found, the conclusion is valid.
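The three stages can be sketched in code. The sketch below is only our illustrative reconstruction, not the theory's actual computational implementation: situations consistent with the premises are listed explicitly as set-valued models, and hypothetical candidate conclusions are supplied as named tests, ordered by informativeness.

```python
def deduce(models, candidates):
    """Sketch of the three-stage procedure.
    models: every situation consistent with the premises.
    candidates: (name, test) pairs, most informative first."""
    initial = models[0]                        # Stage 1: form an initial model
    for name, test in candidates:              # Stage 2: derive a putative conclusion
        if test(initial):
            # Stage 3: search the remaining models for a counterexample,
            # i.e., a model of the premises in which the conclusion is false
            if all(test(m) for m in models[1:]):
                return name                    # no counterexample: conclusion is valid
    return None                                # no candidate conclusion survives

# Premises: All A are B; All B are C. Situations are (A, B, C) set triples;
# this hand-picked sample of models stands in for the full set.
models = [({1}, {1}, {1}), ({1}, {1, 2}, {1, 2}), ({1}, {1}, {1, 2})]
candidates = [("All C are A", lambda m: m[2] <= m[0]),  # refuted at Stage 3
              ("All A are C", lambda m: m[0] <= m[2])]  # survives validation
```

Here the first candidate holds in the initial model but is refuted by a counterexample at Stage 3, so the reasoner falls back on the second, which is valid; `deduce(models, candidates)` returns "All A are C".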

A large experimental program of research has been carried out by mental model theorists, who claim support for the predictions of their theory (see Johnson-Laird & Byrne, 1991). For example, Johnson-Laird and Bara (1984) ran a

large-scale study of syllogistic reasoning using production task methodology. All possible premise pairs of the classical syllogistic form (see below) were presented, and participants were asked to derive a conclusion. The authors analyzed these premise pairs according to whether they were consistent with one, two, or three models. As predicted, error rates were higher on multimodel problems. The psychological principle here is that consideration of multiple models places a strain on working-memory capacity. In particular, people often commit fallacies because they fail to find a counterexample model at Stage 3, even though such a model exists.

Many specific findings claimed in support of the model theory have, however, been disputed by rule theorists (see Evans et al., 1993). Evans and Over (1996, 1997) argued that it is very difficult to decide between the two approaches in most of these arguments, as neither theory is fully specified. For this reason, we present here an attempt to test some predictions that follow from only the most general principles of the model theory and that do not rely on any supplementary assumptions to account for the specific tasks. Nor do they rest on any assumptions about how particular connectives or quantifiers are represented. These predictions concern what people will do when asked to make judgments of necessity and possibility and are derived below.

Possible and Necessary Inference

In modal logic, a distinction is drawn among truths, necessary truths, and possible truths. A statement is true if it is a matter of fact; a statement is necessary if it must be true; and a statement is possible if it may be true. Consider the following three arguments based on universal premises:

All artists are beekeepers,
Lisa is an artist,
Therefore, it is necessary that Lisa is a beekeeper. (1)

All artists are beekeepers,
Lisa is a beekeeper,
Therefore, it is possible Lisa is an artist. (2)

All artists are beekeepers,
Lisa is an artist,
Therefore, it is possible that Lisa is not a beekeeper. (3)

Argument 1 is a modus ponens inference in which the modal conclusion states a necessity. Given that the premises are true, then indeed it is necessary that Lisa is a beekeeper, and so the conclusion is valid. Argument 2 is also valid, but any stronger conclusion, such as "It is necessary that Lisa is an artist," would be invalid, because the premises are consistent with a situation in which there are beekeepers who are not artists, and so Lisa may be a beekeeper who is not an artist. Argument 3 is invalid, because as Argument 1 shows, Lisa is necessarily a beekeeper given the premises.
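These verdicts can be checked mechanically by enumerating models, in the spirit of the semantic principle above. The sketch below is our own illustration (the two-person domain and dictionary representation are assumptions made for the example, not part of the original study): every assignment of the predicates is generated, those satisfying the premises are kept, and the modal conclusion is tested against the survivors.

```python
from itertools import product

people = ["Lisa", "other"]

def models_of(fact):
    """Yield every (artist, beekeeper) assignment satisfying
    'All artists are beekeepers' plus the given factual premise."""
    for bits in product([False, True], repeat=2 * len(people)):
        artist = dict(zip(people, bits[:2]))
        beekeeper = dict(zip(people, bits[2:]))
        if all(beekeeper[p] for p in people if artist[p]) and fact(artist, beekeeper):
            yield artist, beekeeper

# Argument 1: premise 'Lisa is an artist'; 'Lisa is a beekeeper' holds
# in every model, so the necessity conclusion is valid.
ms = list(models_of(lambda a, b: a["Lisa"]))
assert all(b["Lisa"] for a, b in ms)

# Argument 2: premise 'Lisa is a beekeeper'; 'Lisa is an artist' holds in
# some models but not all, so it is possible but not necessary.
ms = list(models_of(lambda a, b: b["Lisa"]))
assert any(a["Lisa"] for a, b in ms) and not all(a["Lisa"] for a, b in ms)

# Argument 3: premise 'Lisa is an artist'; no model makes 'Lisa is not a
# beekeeper' true, so the possibility conclusion is invalid.
ms = list(models_of(lambda a, b: a["Lisa"]))
assert not any(not b["Lisa"] for a, b in ms)
```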

In this article, we refer to these three sorts of argument as Necessary, Possible, and Impossible. Standard deductive reasoning instructions ask people to make judgments about the validity of factual conclusions. If people are asked to decide whether it is necessary that X, where X is an assertion, then they should endorse Necessary conclusions and reject both those that are Possible and Impossible. If people are asked instead to decide whether it is possible that X, then they should endorse Necessary conclusions and Possible conclusions and reject only Impossible conclusions. The ability to discriminate Possible from Impossible conclusions is neglected in normal psychological research on reasoning.

Consider the three general stages of the model theory defined earlier. In the first two stages, the reasoner constructs a single model of the premises that supports a provisional conclusion. This conclusion can be reported as possible without any further reasoning. The conclusion validation stage (search for counterexamples) only comes into play when people are asked to prove that the putative conclusion is necessary. Hence, according to the model theory, possible inference should be easier than necessary inference. Our first prediction (P1) therefore is as follows:

P1. People will be more willing to endorse conclusions as possible than as necessary.

The model theory provides some further clear predictions about the relative difficulty of different types of problems in this paradigm. Consider the three basic types of problems from a model theory perspective:

Necessary problems: the conclusion is true in all models of the premises.
Possible problems: the conclusion is true in at least one model of the premises.
Impossible problems: the conclusion is true in no model of the premises.

From this analysis we derive and test in this article two specific predictions:

P2. It will be easier to decide that a conclusion is possible if it is also necessary. Specifically, we predict more endorsements of possibility for Necessary than for Possible problems.

P3. It will be easier to decide that a conclusion is not necessary if it is also not possible. Specifically, we predict that more Possible than Impossible problems will be endorsed as necessary.

P2 follows because it is only necessary to discover one model supporting the conclusion in order to infer that it is possible. On Necessary problems, all models support the conclusion, whereas on Possible problems there is at least one model that does not. Hence, reasoners are at risk of thinking first of a model that does not support the conclusion in the latter case. P3 follows because a conclusion is not necessary when there is at least one model that does not support it. On Impossible problems, there is no model of the premises that supports the conclusion, whereas on Possible problems there is at least one that does. Hence, on the latter problems reasoners are at risk of first thinking of a model that supports the conclusion.
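The rationale for both predictions can be illustrated with a toy simulation of our own devising (it is not the computational implementation referred to elsewhere): a reasoner who consults a single randomly chosen model of the premises, and never searches for counterexamples, gives the same answer whether asked about possibility or necessity, and so endorses Necessary problems most and Impossible problems least.

```python
import random

def single_model_judgment(support):
    """support: for each model of the premises, whether it supports the
    conclusion. A reasoner who inspects one model (and never searches for
    counterexamples) endorses the conclusion iff that model supports it."""
    return random.choice(support)

def endorsement_rate(support, trials=10_000):
    return sum(single_model_judgment(support) for _ in range(trials)) / trials

necessary  = [True, True, True]     # every model supports the conclusion
possible   = [True, False, True]    # some models support it, some do not
impossible = [False, False, False]  # no model supports it

# Endorsements fall in the order Necessary > Possible > Impossible,
# whichever judgment (possibility or necessity) is being made.
rates = [endorsement_rate(p) for p in (necessary, possible, impossible)]
```

Under this hypothetical reasoner, Necessary problems are always endorsed, Impossible problems never, and Possible problems at an intermediate rate, which is the pattern P2 and P3 describe.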

We report two experiments designed to test our a priori hypotheses and a third that addresses issues arising from the findings of these studies. In Experiment 1, we tested the above predictions using a simpler task than full syllogistic reasoning. The task involves giving people a single premise of the kind used in syllogisms from which they must draw an inference; this is usually referred to as the immediate inference task (see Newstead & Griggs, 1983). Experiment 2 involves full syllogistic reasoning and to our knowledge is the first to ask people to evaluate every possible conclusion with regard to every possible pair of premises even for validity (Necessity) judgments. Both experiments included groups required to make judgments of Possibility as well as of Necessity of conclusions.

Experiments 1 and 2 were designed together and run on the same participants in the same sessions. Owing to the large difference in scale between the two experiments, however, counterbalancing of the order of the experiments seemed inappropriate. It was also undesirable to ask the same participant to make judgments of both necessity and possibility. The procedure adopted was as follows. Participants were allocated into Necessity and Possibility groups and run through the short task of Experiment 1 using the appropriate form of instruction. Each participant then performed the large task required by Experiment 2 using the same form of instruction allocated to them in Experiment 1. The groups were also subdivided in Experiment 2 according to the order of terms in the conclusions they evaluated (see Method section of Experiment 2 for details). We also report a third experiment that replicates some of the key findings of Experiment 2 using a much smaller subset of syllogisms and a simpler method of problem presentation. Experiment 3 also provides a check that the results of Experiment 2 were not influenced by the prior experience of taking part in Experiment 1.

Experiment 1: Immediate Inference

Syllogisms are formed by combining two premises and a conclusion, each of which can have one of only four quantified forms, known traditionally by the abbreviations A, E, I, and O and described as "moods." These are as follows:

    A   Universal affirmative    All X are Y
    E   Universal negative       No X are Y
    I   Particular affirmative   Some X are Y
    O   Particular negative      Some X are not Y

In Experiment 1, we looked only at the immediate inferences that people may draw when one of these statements acts as a premise and the other as a conclusion. Because the order of terms can be exchanged, there are seven possible conclusions (excluding the repetition of the premise) to be considered for each of the four possible premises.

As with syllogisms, immediate inference problems can be classified into three types, where there is no doubt about the existence of entities referred to by the various terms:

Necessary. The conclusion statement must be true given that the premise is true. Of the 28 problems, 6 are of this kind.
Possible. The conclusion statement might be true given that the premise is true. Of the 28 problems, 12 are of this kind.
Impossible. The conclusion statement cannot be true given that the premise is true. Of the 28 problems, 10 are of this kind.


Note that logically any conclusion that is necessary is also possible. The problems we classify as Possible are in fact ones in which the conclusion is possible but not necessary.
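This three-way classification can be verified by brute force. The sketch below is our own reconstruction: it assumes the standard set-theoretic truth conditions for the four moods and encodes the existential presupposition (each term denotes a nonempty set) by enumerating only nonempty extensions over a small domain, then classifies each of the 28 premise-conclusion pairs.

```python
from itertools import product

def holds(mood, X, Y):
    """Truth conditions for the four moods applied to extensions X and Y."""
    if mood == "A": return X <= Y       # All X are Y
    if mood == "E": return not (X & Y)  # No X are Y
    if mood == "I": return bool(X & Y)  # Some X are Y
    return bool(X - Y)                  # O: Some X are not Y

# Every pair of nonempty subsets of a 3-element domain; nonemptiness
# encodes the existential presupposition noted in the text.
subsets = [frozenset(i for i in range(3) if mask >> i & 1) for mask in range(1, 8)]
pairs = [(A, B) for A in subsets for B in subsets]

counts = {"Necessary": 0, "Possible": 0, "Impossible": 0}
for premise in "AEIO":                       # premise is always "... A are B"
    for mood, direction in product("AEIO", ("AB", "BA")):
        if (mood, direction) == (premise, "AB"):
            continue                         # exclude repetition of the premise
        models = [(A, B) for A, B in pairs if holds(premise, A, B)]
        supported = sum(holds(mood, *((A, B) if direction == "AB" else (B, A)))
                        for A, B in models)
        if supported == len(models):
            counts["Necessary"] += 1
        elif supported:
            counts["Possible"] += 1
        else:
            counts["Impossible"] += 1

print(counts)  # {'Necessary': 6, 'Possible': 12, 'Impossible': 10}
```

The tallies match the 6, 12, and 10 problems of each kind listed above.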

To demonstrate the method, we consider an example of what can be inferred from the premise "All A are B." Each of the following possible conclusions can be presented in turn for evaluation:

1. No A are B
2. Some A are B
3. Some A are not B
4. All B are A
5. No B are A
6. Some B are A
7. Some B are not A

Conclusions 2 and 6 are necessary; 4 and 7 are possible (but not necessary), whereas 1, 3, and 5 are impossible. Hence, if the participant was instructed to decide whether each of these conclusions was necessary given the premise, the correct answer would be yes for 2 and 6 and no for all the others. If the instruction were to decide which were possible, then the correct answer would be yes for 2, 4, 6, and 7 and no for 1, 3, and 5.

In Experiment 1, participants were presented with immediate inference problems under either Necessity or Possibility instructions. This allows us to test predictions P1-P3 identified earlier. Note that in this design there is an inescapable confounding between the type of instruction (Possibility or Necessity) and the kind of answer constituting a correct response. For example, because many more conclusions are in fact possible than necessary, any general bias to accept conclusions would lead to more correct responding in the Possibility group compared with the Necessity group. However, this does not present us with a problem in testing our predictions above, which are in any case phrased in terms of conclusions accepted rather than correct decisions. Note also that predictions P2 and P3 both involve comparisons within groups, in which instruction type is held constant.

Method

Design. Participants were run in two experimental groups according to whether they received Necessity or Possibility instructions. Each evaluated the 28 immediate inference problems described above in an independently generated random order. Experiment 1 was the first task administered to these participants, who then proceeded to carry out the tasks of Experiment 2, reported below.

Participants. A total of 120 undergraduate students at the University of Plymouth, Plymouth, England, took part in this study and received payment for their participation. Participants were randomly allocated to the two experimental groups, 60 in each. All were familiar with the use of computers, including the operation of a mouse.

Materials. The instructions for the tasks were presented separately on sheets of A4 paper. However, the reasoning problems were presented on an Acorn computer using a custom-written computer program. A separate screen was used for each of the 28 problems, with a layout as in the example that follows:

GIVEN THAT
Some K are R

IS IT NECESSARY THAT
Some R are not K

The above example is in the form shown to the Necessity group. In the Possibility group, the second header read "IS IT POSSIBLE THAT." Underneath were two boxes showing the words YES and NO in uppercase. Participants signaled their decision by moving a mouse pointer into either box and clicking. The program randomly assigned letters to the sentences, excluding I and O.

Procedure. Participants were run in small groups in a laboratory containing several computers running identical software. Each had their own workstation and worked independently at their own pace. Participants were randomly assigned to the experimental conditions on entering the room. As a group, they were given a brief introduction as to what the tasks of both Experiments 1 and 2 would involve, and they were informed that their data would be confidential and that they had a right to withdraw at any time without penalty.

The sheets containing the instructions for Experiment 1 were already present at the participants' computer terminals; they were asked to read these and then to start as soon as they were ready. They were instructed that the experiment was designed to find out how people solve logical reasoning problems. (Because naive reasoners are known to assume that assertions of the form "All A are B" implicate the existence of A's and B's, we did not feel it necessary to state this explicitly in the instructions. Our results support our assumption.) It was explained that the problems would be presented one at a time on separate screens. The problem layouts were described, and participants were instructed to click in the YES or NO box to make their response. Necessity participants were instructed to click YES if the conclusion must follow from the premises and NO if it did not follow. Possibility participants were instructed to click YES if the conclusion could follow from the premises and NO if it could not. Each participant received the 28 problems in an independently randomized order. Both answers and response latencies¹ were recorded by the program and saved to a disk.

Results and Discussion

This study involves collection of data sets that could be analyzed for a number of theoretical purposes other than those that we have space to describe here. Consequently, a complete list of the percentage of conclusions endorsed on each problem by each group is shown in Appendix A. We concentrate our analysis here on the three categories of problems identified a priori as having conclusions that were Necessary (n = 6), Possible (n = 12), or Impossible (n = 10) given their premises. The mean percentage acceptance of conclusions in each category within the two instruction groups is shown in Figure 1. Note that more conclusions were endorsed in the Possibility group throughout and that the order of endorsement of problem type was Necessary > Possible > Impossible under both forms of instruction. The latter trend is to be expected, not just by the model theory but by any account that allows a significant element of deductive competence in syllogistic reasoning.

¹ Although latency measures were collected in these experiments, we decided not to report them. Interpretation of the latency data was considered problematic for two main reasons. First, in the large experiment (Experiment 2) people had to evaluate four conclusions in relation to the premise pair on each screen with only one latency recorded. Hence, we cannot separate the time taken for reasoning about each conclusion. Second, there is much variability of responding (our main measure of interest) associated with different types of logical problems in all of the experiments, so that overall latencies are based on a variable mix of yes and no responses. There is no possibility of excluding "error" trials as with a simple cognitive task. Given the multifactorial nature of the designs, there is also no practical method of analyzing responses to yes and no decisions separately. In addition to these main problems, we are also of the view that psychologists currently lack a good theory of how response latencies map onto cognitive processes with problems of this complexity.

A 2 × 3 analysis of variance (ANOVA) was carried out to test the effect of groups (Necessity vs. Possibility) and types of inference (Necessary vs. Possible vs. Impossible). For purposes of this analysis, the percentage of inferences endorsed by each participant was computed for each problem type such that each participant contributed three data points to the ANOVA. There was a highly significant effect of group, reflecting the greater endorsement under Possibility instructions, F(1, 118) = 50.89, MSE = 0.421, p < .001. There was also a highly significant main effect of problem type, F(2, 236) = 283.8, MSE = 0.349, p < .001. In addition, the interaction between the two variables was significant, F(2, 236) = 7.07, MSE = 0.349, p < .002. The main effect of instructions confirms our general prediction P1, discussed earlier. The interaction appears to reflect the larger difference between the two groups for conclusions that were possible. The phenomenon is not surprising,

Figure 1. Percentage of conclusions accepted in Experiment 1 (immediate inference) in Group N (Necessity instructions) and Group P (Possibility instructions).

because these conclusions differ in their correct answerbetween the two groups: it is yes for the Possible group andno for the Necessary group.

Follow-up analyses were conducted to test predictions P2 and P3 derived from the mental model theory. As a reminder, P2 states:

P2. It will be easier to decide that a conclusion is possible if it is also necessary. Specifically, we predict more endorsements of possibility for Necessary than for Possible problems.

The rationale for this prediction is that people only need to identify one model that supports the conclusion to decide that it is possible. On Necessary problems all models of the premises support the conclusion, whereas on Possible problems there is at least one model that does not. Hence, it will be easier to find a supporting model on Necessary problems. This prediction was tested using a one-tailed related-groups t test comparing endorsement of Possible and Necessary problems for the group given Possibility instructions. The prediction was strongly confirmed, t(59) = 4.84, p < .001.

Prediction P3 states:

P3. It will be easier to decide that a conclusion is not necessary if it is also not possible. Specifically, we predict that more Possible than Impossible problems will be endorsed as necessary.

The rationale here is that reasoners need only identify one model of the premises that does not contain the conclusion in order to decide that the conclusion is not necessary. This will be easier on Impossible problems, where no model of the premises supports the conclusion, than on Possible problems, in which at least one model will support it. This prediction was tested with a one-tailed t test comparing acceptance rates for Possible and Impossible problems under Necessity instructions and was again supported, t(59) = 17.04, p < .001.
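The decision procedures behind P2 and P3 can be made concrete with a small brute-force sketch (our own illustration, not the authors' computer program): enumerate every set-theoretic model of a premise over small domains, then ask whether the conclusion holds in all of them (necessary), in at least one (possible), or in none (impossible). The encoding, including existential import for universal and negative statements, is an assumption of this sketch.

```python
from itertools import product

# Each "model" assigns every individual in a small domain the set of
# terms (A, B) it belongs to; domains of size 1-3 are enough to
# separate the four moods in this two-term sketch.
KINDS = [frozenset(s) for s in ("", "A", "B", "AB")]

def holds(stmt, model):
    mood, x, y = stmt                     # e.g. ("all", "A", "B")
    xs = [ind for ind in model if x in ind]
    if mood == "all":                     # All X are Y (existential import)
        return bool(xs) and all(y in ind for ind in xs)
    if mood == "some":                    # Some X are Y
        return any(y in ind for ind in xs)
    if mood == "no":                      # No X are Y (existential import)
        return bool(xs) and not any(y in ind for ind in xs)
    if mood == "some_not":                # Some X are not Y
        return any(y not in ind for ind in xs)
    raise ValueError(mood)

def models():
    for n in (1, 2, 3):
        yield from (list(m) for m in product(KINDS, repeat=n))

def classify(premise, conclusion):
    premise_models = [m for m in models() if holds(premise, m)]
    supporting = [m for m in premise_models if holds(conclusion, m)]
    if premise_models and len(supporting) == len(premise_models):
        return "necessary"                # holds in every model of the premise
    return "possible" if supporting else "impossible"

print(classify(("all", "A", "B"), ("some", "A", "B")))   # necessary
print(classify(("all", "A", "B"), ("all", "B", "A")))    # possible
print(classify(("all", "A", "B"), ("no", "A", "B")))     # impossible
```

Deciding "possible" can stop at the first supporting model, whereas deciding "not necessary" requires finding a model that refutes the conclusion; the search is trivial when every model (P2) or no model (P3) supports it, which is the asymmetry the two predictions exploit.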

All of the main predictions have thus been strongly supported in our analysis of conclusions accepted on the immediate inference task of Experiment 1. An additional aspect of the frequency data is also worthy of note. Whereas Possible problems were less frequently endorsed than Necessary problems overall in both instruction groups, the data under Necessity instructions seem to follow a bimodal distribution. Of the 12 Possible problems, the 7 most frequently endorsed ranged from 41% to 91% and the remaining 5 from 3% to 11%. The first group averages 75%, slightly more than the mean acceptance for Necessary problems with these instructions; the second group averages 6.5%, a little less than the mean for Impossible problems. Although retrospective, this observation suggests that reasoners often fail to search for counterexamples even when instructed to make decisions of necessity. The data are consistent with the hypothesis that some Possible problems suggest an initial model that supports the conclusion, leading to endorsement of the fallacy, whereas others do not. We return to this issue in discussing the results of Experiment 2 and in providing the rationale for Experiment 3, both of which involve syllogistic reasoning proper.

Page 6: Journal of Experimental Psychology: Copyright 1999 by the ...mentalmodels.princeton.edu/papers/1999necc&prob.pdfJournal of Experimental Psychology: Learning, Memory, and Cognition

1500 EVANS, HANDLEY, HARPER, AND JOHNSON-LAIRD

Experiment 2: Syllogistic Reasoning

Experiment 2 extends the investigation reported in Experiment 1 to syllogistic inference, so we first provide some brief discussion of the logic and psychology of syllogisms. A syllogism has two end terms, which we refer to as A and C (in order of mention in the premises), and a middle or linking term, which we call B. The conclusion can be in either order, A-C or C-A. Either premise or the conclusion may take each of the four forms (moods) discussed earlier and investigated in the immediate inference task of Experiment 1. For example, a possible syllogism is

All A are C
Some B are not C
Therefore, some A are not C

In classical syllogistic logic, a distinction is drawn between four figures of syllogism, which are defined by the order of the terms in the premises with reference to the order in the conclusion. Classical syllogisms are also defined in a way that fixes only one order of premises. In this study we wish to consider all possible ways of presenting these problems with conclusions. Hence, we follow the practice of Johnson-Laird and Bara (1984) in defining figure by the order of terms in the premises without regard to the conclusion. There are four such possible orderings:

Figure 1    Figure 2    Figure 3    Figure 4
  A-B         B-A         A-B         B-A
  B-C         C-B         C-B         B-C

In an evaluation task, as used in the present study, each of the above figures can be presented with a conclusion in either AC or CA order. The number of possible problems increases considerably in the evaluation paradigm because the two possible conclusion orders must be multiplied by the four possible moods. This results in 512 distinct problems. All of these were presented for evaluation in Experiment 2. To keep the task within reasonable bounds, however, participants evaluated either the AC or the CA order but not both. In addition, they evaluated four conclusions on the same screen for a given premise pair, so that they effectively evaluated 256 syllogisms each, presented on 64 separate screens.
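The bookkeeping behind these problem counts follows from simple combinatorics; a few lines make it explicit (the variable names are ours):

```python
figures = 4          # orders of terms in the premises (Figures 1-4 above)
moods = 4            # the four moods: A, E, I, O

# 4 figures x 4 moods for the first premise x 4 for the second.
premise_pairs = figures * moods * moods          # 64 screens per participant

# Each screen lists the four conclusion moods in one fixed order.
per_participant = premise_pairs * moods          # 256 evaluations each

# Conclusion order (AC vs. CA) was varied between participants.
distinct_problems = per_participant * 2          # 512 distinct problems

print(premise_pairs, per_participant, distinct_problems)  # 64 256 512
```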

There are numerous theories of syllogistic inference to be found in the psychological literature (see Evans et al., 1993). The details of most of these need not concern us here. We note briefly that the theories fall into two broad types. One tradition emphasizes nonlogical processes in which the form of the premises biases the choice of conclusion without the intervention of deductive processes. Although this approach has been out of fashion for many years, a new nonlogical theory based on probability heuristics has been developed recently by Chater and Oaksford (1999). With regard to deductive approaches, a provisional account of reasoning with quantifiers has been offered by Rips (1994) within the

mental logic tradition. A detailed account of syllogistic reasoning has been developed over the years by mental model theorists (see Johnson-Laird & Byrne, 1991) and has been implemented in a working computer program (Johnson-Laird & Byrne, 1994). A significant development is the verbal reasoning model of Polk and Newell (1995), which is also model based but differs in some significant respects from the theory presented by Johnson-Laird and his colleagues.

As stated earlier, our objective in this research is to test predictions derived from general principles of the model theory that are independent of implementation details. For this reason, we omit a description of the detailed account of syllogistic reasoning offered by Johnson-Laird and Byrne (1991), which is not required to follow the rationale and analysis of the present study. The experiment permits us to test the same three predictions, P1, P2, and P3, about the difference between necessary and possible inference that were assessed on the simpler immediate inference task in Experiment 1.

Method

Design. Participants were run in four experimental groups immediately following the presentation of the immediate inference task reported as Experiment 1. Participants received the same kind of experimental instruction (Necessity or Possibility) as they were given in Experiment 1 but were subdivided according to whether they received conclusions to evaluate in AC or CA order (see the introduction to this experiment). Each participant was shown 64 separate computer screens, one for each possible premise pair. For each pair they were required to indicate whether each of four possible conclusions was either necessary or possible, according to instruction group. The four possible conclusions were those in each of the four moods, in an order (AC or CA) dependent on the subgroup to which participants were assigned.

Participants. The same 120 students who participated in Experiment 1 served as participants for Experiment 2, which was administered immediately afterward.

Materials. Problems were presented in a form similar to those of Experiment 1, except that there were two premises and a list of four conclusions to be evaluated. An example screen layout is as follows:

GIVEN THAT
Some T are P
No P are G

IS IT NECESSARY THAT

                      YES   NO
All T are G           [ ]   [ ]
Some T are G          [ ]   [ ]
No T are G            [ ]   [ ]
Some T are not G      [ ]   [ ]

The above example would be given to the Necessity-AC group. Possibility participants were instead shown a header that read IS IT POSSIBLE THAT, and those in CA groups would have received the conclusions reversed: All G are T, and so on. Letters were assigned randomly and independently to logical forms by the program on each problem. The order of the four conclusion types (moods) was fixed as above, as A, I, E, O, for all problems.

Procedure. On completion of Experiment 1, participants were presented with further written instructions titled TASK 2. They were told that there would be another series of logical problems but that this time there would be two statements, and that they would be asked, given that these statements were true, to decide which of the following statements must be true (Necessity group) or could be true (Possibility group). They were shown an example of the screen layout and were told to signal their answers by clicking the mouse in the YES or NO box next to each of the statements. They were told that they could decide about the statements in any order but that a response once made could not be changed.

Whenever participants clicked in a box, it was filled on the screen. When all four responses had been made to a screen, a box appeared underneath labeled CLICK HERE FOR THE NEXT PROBLEM. Thus participants were self-paced and could rest between problems in the long sequence as required. The program recorded both the responses and the latencies and saved them to disk. The 64 screens were presented in an independently randomized order for each participant.

Results and Discussion

A list of response frequencies to all individual problems is given in Appendix B. Of the 256 premise-conclusion combinations evaluated by each participant, 24 were Necessary (conclusion must follow from the premises), 208 were Possible (conclusion could follow from the premises), and 24 were Impossible (conclusion could not be true given the premises). These numbers reflect the fact that where, for example, premises support a conclusion of the form "All G are T," they also support "Some G are T" and "Some T are G," given that naive reasoners take universal assertions to imply the existence of the relevant entities and that "some" is taken to mean "at least some, and possibly all." The mean acceptance rates for each problem type in each instruction group are shown in Figure 2, averaged over the AC and CA subgroups (this factor had no overall effect). The graph is extraordinarily similar to that for Experiment 1 (see Figure 1) when one considers the additional complexity of the two-premise syllogistic format.

Figure 2. Percentage of conclusions accepted in Experiment 2 (syllogistic inference) in Group N (Necessity instructions) and Group P (Possibility instructions).

A 2 × 2 × 3 ANOVA was conducted, again on aggregated data, with each individual contributing one data point for each of the three types of problems. The additional (group) factor relative to Experiment 1 was conclusion order.

The trends in the ANOVA were similar to those observed in Experiment 1. There was a significant main effect of instruction group, F(1, 116) = 64.7, MSE = 0.374, p < .001, reflecting higher acceptance rates under Possibility instructions. There was also a significant main effect of problem type, F(2, 232) = 446.6, MSE = 0.218, p < .001, reflecting the order Necessary > Possible > Impossible. Again, as in Experiment 1, the interaction between these two factors was significant, F(2, 232) = 6.03, MSE = 0.218, p < .01, reflecting the fact that instructions had the largest effect on problems of the Possible type. There was, however, no effect of conclusion order.

As in Experiment 1, prediction P1 is confirmed as a main effect in the ANOVA, but P2 and P3 require follow-up tests. These were assessed using one-tailed related-groups t tests, with separate tests carried out for the AC and CA subgroups (dfs = 29). P2 predicts that participants given Possibility instructions will accept more Necessary than Possible conclusions. This was confirmed for both the AC (t = 6.11, p < .001) and CA (t = 5.19, p < .001) subgroups. P3 predicts that participants given Necessity instructions will accept more Possible than Impossible conclusions. This also was confirmed for both the AC (t = 12.3, p < .001) and CA (t = 9.2, p < .001) subgroups.

This experiment provides the opportunity to look at some other issues of interest using the syllogistic evaluation task. An important claim of the model theory is that people will be less inclined to draw valid conclusions from syllogistic premises consistent with multiple mental models, as opposed to a single model. The point here is that when the conclusion is valid, as on our Necessary problems, any model of the premises will support the conclusion. Hence, if people consider only one model of the premises, there should be no difference in performance on single- and multimodel problems. If, on the other hand, people make an effort at deduction by trying to prove the conclusion in all models of the premises, then multimodel problems should be more difficult. To our knowledge, this prediction has not previously been tested on the syllogistic evaluation task, although evidence for it was adduced by Johnson-Laird and Bara (1984) using the production task. To maximize comparability with their study, we eliminated syllogisms with valid but weak conclusions from this analysis, for example, ones in which the conclusion to be evaluated was "Some A are C" even though the conclusion "All A are C" was supported by the premises. Such conclusions are rarely generated on a production task.

Of the 256 problems presented to each participant, just 36 are valid with strong conclusions. For this subset, we examined the difference in acceptance rates between single- and multiple-model problems. Under Necessity instructions, we would expect a trend similar to that observed by Johnson-Laird and Bara (1984), that is, fewer acceptances of the valid conclusions on multiple-model problems. Under Possibility instructions, we should not find such a trend, however, because in this case only a single model need be found supporting the conclusion for the participant to justify a YES response. The trends in our data accord with this hypothesis. Of the 36 problems, 18 are of a kind classified as single model, and 18 as multiple model, by Johnson-Laird and Bara. Under Necessity instructions, 81% of single-model problems had their conclusions accepted, compared with 59% of multiple-model problems. As expected, this trend was weaker under Possibility instructions: 88% acceptance of single-model problems compared with 77% acceptance of multiple-model problems. To test the significance of this interaction, we computed the difference between acceptance rates of single- and multiple-model problems for all participants. These difference scores were then compared on a one-tailed between-groups t test, comparing the 60 participants who received Necessity instructions with the 60 in the Possibility group. The test was comfortably significant, t(118) = 2.89, p < .005.

The above finding supports the model theory claim that reasoners look for counterexamples, at least on this restricted set of 36 syllogisms. However, looking at the data as a whole, it is clear that any such facility must be weak, in view of the very high rates of endorsement of fallacies reported in this study as well as elsewhere in the literature. Nevertheless, it appears that the lower rate of acceptance of Possible than Necessary problems under Necessity instructions could be due to some searching for counterexamples that successfully leads to refutation of some of the potential fallacies. However, there is an alternative interpretation of the lower rate of acceptance of Possible conclusions that we need to consider. Suppose that people are willing to accept as necessary any conclusion supported by the first model of the premises they think of. Suppose also that the premises of Possible problems consistently suggest a model that either does or does not support the conclusion. The result would be that some Possible problems would be endorsed as frequently as Necessary problems and others as infrequently as Impossible problems. We have already noted that this was the case for the immediate inference task in Experiment 1. Inspection of the data for Possible problems under Necessity instructions in the syllogistic inference task of Experiment 2 suggested that a similar trend was present. Some fallacies were strongly endorsed, as for example:

All A are B
Some B are not C
Therefore, some A are not C

Ninety percent of participants judged the above conclusion to follow under instructions of Necessity, even though it is fallacious. Judgments of Possibility were also 90%, suggesting that (a) people easily imagine situations (mental models) in which premises and conclusion are true but (b) they fail to seek counterexamples to prove necessity. By contrast, consider the following syllogism:

Some A are B
All C are B
Therefore, all C are A

Only 3% of participants endorsed this syllogism under instructions for Necessity, but 43% did so when given Possibility instructions. This suggests that participants do not initially think of a model of the premises that supports the conclusion in this case and hence reject it as not necessary. However, it appears that some effort to find an alternative, supporting model does occur when people are asked whether the conclusion is possible. A tendency to seek alternative models when proving possibility rather than necessity could account for the interaction between instructions and logical classification of problems reported above. Recall that this interaction reflects the fact that the increase in acceptance of conclusions under Possibility instructions was larger for Possible syllogisms than for either Necessary or Impossible ones.
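Both example syllogisms can be checked mechanically by brute-force model enumeration (a sketch of ours, not the authors' computational implementation): each turns out to be possible but not necessary, which is what makes endorsing it under Necessity instructions a fallacy. Existential import and the "at least some, possibly all" reading of "some" are encoded as stated earlier.

```python
from itertools import product

TERMS = "ABC"
# All 8 combinations of term membership an individual can have.
KINDS = [frozenset(t for t, keep in zip(TERMS, bits) if keep)
         for bits in product((0, 1), repeat=len(TERMS))]

def holds(stmt, model):
    mood, x, y = stmt
    xs = [ind for ind in model if x in ind]
    if mood == "all":                      # existential import assumed
        return bool(xs) and all(y in ind for ind in xs)
    if mood == "some":                     # "at least some, possibly all"
        return any(y in ind for ind in xs)
    if mood == "no":
        return bool(xs) and not any(y in ind for ind in xs)
    if mood == "some_not":
        return any(y not in ind for ind in xs)
    raise ValueError(mood)

def classify(premises, conclusion, max_domain=3):
    # Domains of up to three individuals suffice to find both a
    # supporting model and a counterexample for these problems.
    premise_models = [list(m)
                      for n in range(1, max_domain + 1)
                      for m in product(KINDS, repeat=n)
                      if all(holds(p, list(m)) for p in premises)]
    supporting = [m for m in premise_models if holds(conclusion, m)]
    if premise_models and len(supporting) == len(premise_models):
        return "necessary"
    return "possible" if supporting else "impossible"

# The strongly endorsed fallacy (90% under Necessity instructions):
print(classify([("all", "A", "B"), ("some_not", "B", "C")],
               ("some_not", "A", "C")))                      # possible

# The rarely endorsed fallacy (3% under Necessity instructions):
print(classify([("some", "A", "B"), ("all", "C", "B")],
               ("all", "C", "A")))                           # possible
```

Since both conclusions classify as possible, the large difference in endorsement rates cannot be a matter of logic; on the model account it lies in whether the first model reasoners happen to construct supports the conclusion.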

Experiment 3 was designed to establish further the reliability of the finding that some fallacies are consistently made and others withheld, and to investigate the processes that underlie this effect.

Experiment 3: Possible Strong Versus Possible Weak Syllogisms

As noted earlier, Experiments 1 and 2 have strongly supported the a priori predictions that we derived about possible and necessary inference from the mental models account. However, there is also an observation that fits less comfortably with this framework. It appears that when Possible problems are assessed under Necessity instructions, some are accepted at very high rates comparable with those of Necessary problems, whereas others have very low acceptance rates comparable with those associated with Impossible problems (see above for examples of the two kinds of syllogism). A possible interpretation is that people are not searching for counterexample models, but rather are simply saying yes if the first model suggested by the premises contains the conclusion.

Experiment 2 is, to our knowledge, the only study to have examined people's evaluations of all possible conclusions with all possible premise types, and certainly the first to have done so with judgments of Possibility as well as Necessity required. Although it should prove useful to theorists of all persuasions to have such a database available (Appendix B), there is a disadvantage inherent in such a comprehensive approach, namely, that a very large number of problems have to be administered to each participant. This enforced the use of a method in which 64 premise pairs were presented with four conclusions to be evaluated, as the alternative would have required processing of 256 separate computer screens for each participant, even keeping conclusion order as a between-participant variable.2

2 Another possible problem with the procedure of Experiment 2 is that the conclusions to be evaluated are logically related to each other. For example, if a participant has decided that the conclusion "All A are C" is necessary given the premises, then they might conclude that "Some A are not C" is impossible without returning


Table 1
Syllogisms Selected on the Basis of Inference Rates in Experiment 2 for Further Investigation in Experiment 3

Figure    Order   Necessary          Impossible         Possible strong    Possible weak
A-B/B-C   ac      AAa AEe IAi IEo    AEa AEi IAe IEa    AOo IIi OAo OOo    IIe OIe OEa OOa
A-B/B-C   ca      AEe IAi EAo EIo    AAe AEa AEi IAe    AIi IOo OAo OIo    AOa IIa IEa OOa
B-A/C-B   ac      AIi AEo IEo EAe    AAe AIe IEa EAa    AOo IAi IIo OAo    AIa AOa OAa OOa
B-A/C-B   ca      AAa AIi EAe EIo    AAe AAo EAa EIa    IIi IOo OAo OIo    AOa IEa EOa OOa
A-B/C-B   ac      AEe IEo EAe OAo    AEa AEi EAa OAa    AIo IAi IOo OIo    AIa IAe IIa OEa
A-B/C-B   ca      AEe AOo EAe EIo    AEa AOa EAa EIa    AIi IAi IOo OAo    IAa IEa EOa OIa
B-A/B-C   ac      AIi AEo IEo EAe    AAe AIe IEa EAa    AOo IAi IIo OAo    AIa AOa OAa OOa
B-A/B-C   ca      AAa AIi EAe EIo    AAe AAo EAa EIa    IIi IOo OAo OIo    AOa IEa EOa OOa

Note. Uppercase letters represent the premise pair and the lowercase letter represents the conclusion. Key to premise and conclusion types: A = All X are Y; E = No X are Y; I = Some X are Y; O = Some X are not Y.

Experiment 3 involves presenting selected syllogisms such that each participant receives only 64 problems in total, with each presented on a single screen, similar to the immediate inference problems of Experiment 1. In selecting syllogisms, our main interest was in the hypothesis that there are two kinds of Possible problems, which we define as follows:

Possible Strong: problems whose conclusions are possible but not necessary given their premises, but that are frequently endorsed as having necessary conclusions.

Possible Weak: problems whose conclusions are possible but not necessary given their premises, and that are infrequently endorsed as having necessary conclusions.

In other words, Possible Strong syllogisms are the fallacies that people tend to make (under instructions for Necessity), and Possible Weak syllogisms are the fallacies they tend to avoid. The identification of these two categories has been necessarily retrospective in the two experiments reported to date. Hence, we decided to choose examples of Possible problems with high and low endorsement rates in Experiment 2 and submit them to independent replication in Experiment 3. The actual syllogisms selected are shown in Table 1. Note that the 64 problems presented to AC groups were not necessarily the same as those presented to CA groups. This is because problems were selected strictly on the basis of performance in Experiment 2 under Necessity instructions. The Necessary syllogisms selected were the four most frequently endorsed in each figure of premises and conclusion order. Possible Strong syllogisms were produced by choosing the four most frequently endorsed Possible problems in each figure. Similarly, Possible Weak problems were the least frequently endorsed. Finally, the least frequently endorsed Impossible problems were selected. This method provides a strong test of whether Possible Strong problems are endorsed as often as Necessary problems, and Possible Weak problems as rarely as Impossible problems. With Necessity instructions, the levels were comparable in both cases (see the black bars in Figure 3). When the same problems were considered under Possibility instructions, however, Possible Weak problems had a somewhat higher acceptance rate than Impossible problems (see Figure 4). Although the Necessity data of Experiment 2 were used as the basis for problem selection, participants received the same syllogisms in Experiment 3 whether responding to Possibility or Necessity instructions.

to the premises. Any possible distortion of responses resulting from this will be eliminated by the procedure in Experiment 3, in which only one conclusion is evaluated at a time with respect to a particular pair of premises.

Method

Design. There were four groups of participants (as in Experiment 2), according to conclusion order (AC or CA) and the type of instruction (Necessity or Possibility). Each participant was presented with 64 separate syllogisms generated from two within-participant variables with four levels each, the type of problem (Necessary, Impossible, Possible Strong, and Possible Weak) and syllogistic figure, with four examples in each cell.

Participants. Eighty undergraduate students from the University of Plymouth took part in this experiment and received either payment or course credits for their participation. None had participated in Experiments 1 and 2 or in other similar studies. They were randomly allocated to the four experimental conditions in groups of 20.

Materials and procedure. The computer program randomly allocated letters (excluding I and O) for the end and middle terms used in the syllogisms that were previously selected, as shown in Table 1. The screen layout used was similar to that of Experiment 1 except that two premises were given with a single conclusion to be evaluated by clicking the mouse in a YES or NO box. The 64 syllogisms were presented on separate screens in an independently randomized order for each participant.

Figure 3. Percentage of conclusions accepted in Experiment 2 (Ex 2) under Necessity instructions, broken down into four problem types: Necessary (N), Possible Strong (PS), Possible Weak (PW), and Impossible (I), together with replication data from Experiment 3 (Ex 3).

Initially the participants were given an overview of the structure of the experiment and were informed of their right to withdraw and of aspects of confidentiality. They were then presented with instructions on screen depending on the experimental condition to which they were assigned. Instructions were similar to those used in Experiment 2 except for the requirement to evaluate a single conclusion on each screen. Before the syllogisms were randomly presented, participants were given practice with the mouse to get used to the manner of signaling Yes and No responses.

Figure 4. Percentage of conclusions accepted in Experiment 2 under Possibility instructions, broken down into four problem types: Necessary (N), Possible Strong (PS), Possible Weak (PW), and Impossible (I), together with replication data from Experiment 3 (Ex 3).

Results and Discussion

The frequency of acceptance of different logical types under Necessity instructions is shown in Figure 3, together with the corresponding data from Experiment 2. It can be seen that Experiment 3 provides a close replication, dispelling any possibility that the difference between Possible Strong and Possible Weak problems was magnified by retrospective choice of these categories (e.g., due to statistical regression). It remains the case in Experiment 3 that acceptance rates of Necessary and Possible Strong problems are very similar, and that Possible Weak problems are accepted at only slightly higher frequency than Impossible problems. Figure 4 compares acceptance frequencies for the different problem types under Possibility instructions, again comparing the data of Experiments 2 and 3. A good replication is again seen, but the tendency for Possible Weak problems to be more highly accepted than Impossible problems appears more marked in the replication.

Mean percentage acceptance of conclusions in Experiment 3 is shown in Table 2, broken down by figure, problem type, conclusion order, and instruction type. Each cell represents the mean of 20 participants' responses with four replications of syllogisms. The use of replications permitted us to run a four-way ANOVA on the frequency of Yes responses, with instruction and conclusion order as between-groups variables and figure and problem type as within-group variables. There was a significant main effect of instruction, F(1, 76) = 18.51, MSE = 0.127, p < .001, such that more conclusions were accepted under Possibility (57%) than Necessity (49%) instructions, and, as might be expected, a very highly significant effect of problem type, F(3, 228) = 412.00, MSE = 0.090, p < .001.

Table 2
Percentage of Conclusions Accepted in Experiment 3, Broken Down by Problem Type, Figure, Conclusion Order, and Instructional Group

                         A-B/B-C     B-A/C-B     A-B/C-B     B-A/B-C
Instruction and
problem type             ac    ca    ac    ca    ac    ca    ac    ca     M

Necessity
  Necessary              81    79    70    84    76    86    84    79     80
  Impossible              9    21    25    20    13    15    24    13     17
  Possible strong        84    83    80    81    74    81    70    78     79
  Possible weak          18    23    24    19    19    15    13    19     19
  M                      48    51    50    51    46    49    48    47
Possibility
  Necessary              89    78    86    86    74    83    86    85     83
  Impossible             31    15    30     5    21    14    30    19     21
  Possible strong        88    93    86    93    90    91    79    79     87
  Possible weak          59    26    51    21    45    26    43    29     38
  M                      67    53    64    51    58    54    59    53

In Experiments 1 and 2, problem type (defined on three rather than four levels) interacted significantly with instructions, such that relatively more Possible conclusions were accepted under instructions for Possibility. The two variables also interacted significantly in this analysis, F(3, 228) = 4.85, MSE = 0.903, p < .005. With the breakdown of the Possible category into Strong and Weak, however, it appears from the means (right-hand column of Table 2) that the major difference lies in the acceptance rates of Possible Weak problems, which are accepted at a rate of 38% under Possibility instructions compared with only 19% under Necessity instructions. This difference is statistically significant, t(78) = 4.48, p < .01, and is theoretically important because it suggests that under instructions for Possibility, reasoners do seek alternative models of the premises if the first fails to support the conclusion, rather than simply responding No. This contrasts with the very high acceptance rates on Possible Strong problems under Necessity instructions, which suggest that rather little search for alternative models occurs when the first model identified confirms the conclusion.

There were several other significant effects. First, there was a significant if small main effect of figure, F(3, 228) = 3.04, MSE = 0.026, p < .05. Conclusions were most often accepted in Figure 1 (55%) and least often in Figure 4 (52%). There was no main effect of conclusion order, but this variable did interact significantly with instruction, F(1, 76) = 7.88, MSE = 0.127, p < .01. Inspection of Table 2 suggests that this was due to greater acceptance of AC conclusions in the Possibility group only. There is no obvious explanation for this trend. The remaining trends were interactions involving problem type, which interacted significantly with figure and with conclusion order separately; all three variables also interacted together, significantly if weakly, F(9, 684) = 2.12, MSE = 2.12, p < .05.

Finally, it should be noted that the strong replication of similar findings for the same syllogisms presented in Experiment 3 when compared with Experiment 2 counters two possible criticisms of the design of the previous experiment. First, it might be suggested that the prior experience of participating in Experiment 1 might have had some effect on performance in Experiment 2; Experiment 3 had no such prior task. Second, it could be argued that participants became fatigued or bored while processing the large number of syllogisms used in Experiment 2, which might add noise to their data. We think this was unlikely to be true in any case, because of the self-pacing provided and the fact that the analysis showed very clear and systematic trends. The fact that the data were so similar on corresponding syllogisms in Experiment 3, in which only one quarter as many were used, confirms this view.

General Discussion

Our experiments have shown that logically untrained individuals are able to draw modal conclusions both from single quantified assertions and from syllogistic premises. The results also confirm our three main predictions based on the mental model theory. First, the participants were more likely to endorse conclusions about what is possible than about what is necessary (Prediction P1). Possibility calls for only a single model of the premises to support the conclusion, whereas necessity calls for all the models of the premises to support the conclusion. Second, the participants were more likely to endorse a possible conclusion if the premises supported its necessity as opposed to its mere possibility (Prediction P2). In the case of necessity, the conclusion holds in all the models of the premises, and so no matter which model of the premises reasoners construct, it will support the conclusion. In contrast, if the premises support only a possible conclusion, then reasoners need to search for the model of the premises in which the conclusion holds, and they need to ignore those models that are counterexamples to the conclusion.

Our third hypothesis was that participants would be more likely to decide that a conclusion was not necessary if it was impossible (Prediction P3). A conclusion that is impossible holds in no model of the premises, whereas a conclusion that is merely not necessary holds in at least one model of the premises, and so reasoners are liable to go wrong in the latter case if they focus on a model in which the conclusion holds. In other words, they have to search for a counterexample to the conclusion to infer that it is not necessary. The search is easier where all the models of the premises are counterexamples than where at least one is not. The three predictions were corroborated in Experiment 1, in which the participants evaluated immediate inferences from a single quantified premise to a quantified conclusion; in Experiment 2, in which they evaluated all possible modal conclusions from syllogistic premises; and in Experiment 3, in which they evaluated a subset of modal conclusions from syllogistic premises.
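The three predictions rest on a semantic point: a conclusion is necessary if it holds in every model of the premises, possible if it holds in at least one, and impossible if it holds in none. As a minimal sketch of that semantics (our own brute-force enumeration, not the authors' incremental program), assuming existential import for the universal quantifiers:

```python
# Hedged sketch: necessity vs. possibility as quantification over
# models. A model is a small set of individual "types", each a triple
# of flags (in_A, in_B, in_C). This enumerates models exhaustively,
# unlike the mental-models program, which builds one model at a time.
from itertools import combinations, product

INDIVIDUALS = list(product([False, True], repeat=3))  # (a, b, c) flags
A, B, C = 0, 1, 2

def models(max_size=3):
    """Yield every model: a tuple of distinct individual types."""
    for size in range(1, max_size + 1):
        yield from combinations(INDIVIDUALS, size)

def all_are(m, x, y):       # "All X are Y" (with existential import)
    xs = [i for i in m if i[x]]
    return bool(xs) and all(i[y] for i in xs)

def some_are(m, x, y):      # "Some X are Y"
    return any(i[x] and i[y] for i in m)

def no_are(m, x, y):        # "No X are Y"
    return not some_are(m, x, y)

def some_are_not(m, x, y):  # "Some X are not Y"
    return any(i[x] and not i[y] for i in m)

def classify(premises, conclusion):
    """Necessary / Possible / Impossible, relative to the premises."""
    worlds = [m for m in models() if all(p(m) for p in premises)]
    holds = [conclusion(m) for m in worlds]
    if holds and all(holds):
        return "Necessary"
    if any(holds):
        return "Possible"
    return "Impossible"

# "Some A are not B; All B are C" with conclusion "Some A are not C"
# is a fallacy: the conclusion is possible but not necessary.
print(classify([lambda m: some_are_not(m, A, B),
                lambda m: all_are(m, B, C)],
               lambda m: some_are_not(m, A, C)))  # Possible
```

A three-individual universe suffices for these classifications; the psychological interest of the model theory lies precisely in the claim that people do not enumerate exhaustively but construct models one at a time.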

One initially unexpected effect was that problems supporting possible conclusions fell into two categories. Some of them were regularly taken to imply necessary conclusions, whereas others were rarely taken to imply necessary conclusions and indeed sometimes were not even taken to imply possible conclusions. This phenomenon was first detected in Experiment 1. It also occurred in Experiment 2, and it was corroborated by Experiment 3, which we designed to contrast the two sorts of problems, which we termed Possible Strong and Possible Weak. It is particularly striking in Figure 3 that Possible Strong syllogisms are endorsed as frequently as Necessary ones and that Possible Weak syllogisms are endorsed as infrequently as Impossible ones. This suggests that any search for counterexample models is weak in the present study and that most participants are basing their conclusions on the first model that occurs to them (we return to this issue later). On this argument, the fallacies are sharply divided into our two categories because some premise pairs consistently suggest an initial model that supports the conclusion (Possible Strong), whereas others consistently suggest a model that negates the conclusion (Possible Weak).

It is clear that such a finding cannot be accounted for at the very general level of our other predictions. However, the computer program implementation of the model theory reported by Johnson-Laird and Byrne (1994) does generate models in specific orders, so we were able to examine the output of the program for our two categories of syllogisms to try to confirm our intuitive account. The results provided strong corroboration. All premise pairs studied in the two Possible conditions of Experiment 3 were run through the program. In 29 out of the 32 cases, the Possible Strong syllogisms had an initial model that supported the given conclusion, whereas none of the 32 Possible Weak syllogisms had an initial model that supported the given conclusion.

To see how this arises, consider an example of a Possible Strong syllogism whose fallacious conclusion is endorsed by most participants:

Some A are not B
All B are C
Therefore, some A are not C

The initial model of the premises, as constructed by the program, is

a    ¬b
a    ¬b
[b]       c
[b]       c

where ¬ denotes negation and a, b, and c are tokens representing exemplars of the classes A, B, and C. The square brackets around b in the latter models indicate that b's are exhaustively represented with respect to c's. In other words, there cannot be b's found in models that do not contain c's. This model supports the conclusion presented, and 75% of the participants in Experiment 3 inferred (wrongly) that it was a necessary conclusion, whereas 85% of the participants inferred (correctly) that it was a possible conclusion. In fact, as the program also shows, there is an alternative model of the premises that refutes the conclusion:

a    ¬b   c
a    ¬b   c
[b]       c
[b]       c

Hence, the conclusion is possible but not necessary. An example of a Possible Weak syllogism has the following form:

All A are B
Some C are B
Therefore, all A are C

The initial model of the premises is:

[a]   b   c
[a]   b

This model does not support the given conclusion, but instead the conclusion "Some A are C" (or its converse). But this conclusion can also be refuted by an alternative model of the premises:

[a]   b
[a]   b
      b   c
      b   c

To grasp that the given conclusion is at least possible, reasoners need to construct another model of the premises:

[a]   b   c
[a]   b   c
          c

As this account predicts, few participants in Experiment 3 wrongly inferred that the given conclusion was necessary (15%), and only a minority inferred that it was even possible (40%). It is important to note that such Possible Weak syllogisms were discovered in this experiment because of the methodology involved, whereby participants were asked to evaluate every possible conclusion for every possible premise pair. With a production-task method, in which people are instructed to produce a single, necessary conclusion, the premise pairs involved would normally be associated with the production of some alternative conclusion, as illustrated in the above example. An interesting possibility for future research would be to ask people to produce conclusions that are possible rather than necessary.

The program implementing the model theory works by constructing separate models of the two premises and then joining them together by way of the tokens representing the middle term. Thus, additional steps are required to add further tokens to the model (as in the example above of a Possible Strong syllogism) or to split an individual, such as [a] b c, into two distinct individuals, [a] b and b c (as in the example above of a Possible Weak conclusion). These steps yield models that are constructed after the initial models (see Johnson-Laird & Byrne, 1994, for a detailed description of the operations of the program).
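The joining step can be illustrated with a toy sketch. This is our own simplified representation, not the actual Johnson-Laird and Byrne program: rows are mappings from a term to a truth value, and a term absent from a row is simply not represented in it.

```python
# Hedged sketch of joining premise models via the middle term b
# (our own simplification, not the actual mental-models program).

def join_on_b(model1, model2):
    """Merge rows that share a b-token; carry unmatched rows over."""
    merged, used = [], set()
    for row in model1:
        match = None
        if row.get("b"):
            # Find an unused row of the second model with a b-token.
            for j, other in enumerate(model2):
                if j not in used and other.get("b"):
                    match = j
                    break
        if match is None:
            merged.append(dict(row))
        else:
            used.add(match)
            merged.append({**row, **model2[match]})
    merged.extend(dict(r) for j, r in enumerate(model2) if j not in used)
    return merged

# "Some A are not B" gives rows a ¬b; "All B are C" gives rows b c.
# No row shares a b-token, so the joined model is simply their union,
# i.e. four rows: a ¬b, a ¬b, b c, b c, matching the initial model
# of the Possible Strong example above.
m1 = [{"a": True, "b": False}, {"a": True, "b": False}]
m2 = [{"b": True, "c": True}, {"b": True, "c": True}]
print(join_on_b(m1, m2))
```

When a row of each premise model does carry a b-token, the two rows fuse into a single individual, which is how the middle term drops out of the conclusion.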

Although the above analysis is very encouraging for the model theory, we should note that alternative explanations can be offered for the difference between Possible Strong and Possible Weak syllogisms. In particular, the probability heuristics model of Chater and Oaksford (1999) can account for the difference in terms of the "min-heuristic." The min-heuristic specifies that the conclusion drawn from syllogistic premises will match the form of the least informative premise, where informativeness has the following order: all > some > no > some . . . not. Of the 32 Possible Strong syllogisms shown in Table 1, 30 have conclusions that match the less informative premise, whereas the remaining 2 have conclusions that are less informative than either premise. For the 32 Possible Weak problems, all but 1 have a conclusion that is more informative than at least one premise.
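The min-heuristic's prediction is easy to state computationally. The following sketch is our own encoding of just the ordering described above (the mood labels are our notation, and the full probability heuristics model contains further components that this omits):

```python
# Informativeness ordering used by the min-heuristic of Chater &
# Oaksford (1999), as described in the text: all > some > no > some-not.
# The heuristic predicts that the conclusion will take the mood of
# the LEAST informative premise. Sketch only, on hypothetical labels.

INFORMATIVENESS = {"all": 4, "some": 3, "no": 2, "some-not": 1}

def min_heuristic_mood(mood1, mood2):
    """Mood predicted for the conclusion: that of the min premise."""
    return min(mood1, mood2, key=INFORMATIVENESS.__getitem__)

# Possible Strong example from the text:
# "Some A are not B" (some-not) + "All B are C" (all)
print(min_heuristic_mood("some-not", "all"))  # some-not
```

On this ordering, the Possible Weak example above (All A are B; Some C are B; therefore, all A are C) violates the heuristic: the conclusion's mood, all, is more informative than the minimum premise mood, some.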

The fact that the model program and the min-heuristic provide equally strong accounts of the difference between Possible Strong and Weak syllogisms is both surprising and interesting. The min-heuristic is offered as part of an entirely nondeductive account of syllogistic reasoning by Chater and Oaksford (1999). Its effect is to ensure that arguments are not supported when their conclusions are more informative than the premises. This property is true not only of all valid arguments (deduction cannot add information) but of many fallacies also. Hence, the theory accounts (as it was designed to do) for the fact that in syllogistic reasoning experiments, participants endorse most valid arguments but also many fallacies. It is clear that the model program, based on a quite different approach, has the effect of first generating models whose provisional conclusions conform with the min-heuristic. This coincidence has been discussed recently in terms of the "atmosphere effect," whose predictions are similar to those of the min-heuristic, by Shaw and Johnson-Laird (1998). These authors also examined the order in which the mental models program produces models and observed, as we do here, that errors often result from fallacious conclusions being supported by the initial models of the premises considered.

Let us return now to the question of whether reasoners actually search for alternative models, as envisaged by the third stage of the model theory. This issue is theoretically important because the emphasis placed on this stage by Polk and Newell (1995) in their verbal reasoning theory is much less than in the mental models account of, for example, Johnson-Laird and Byrne (1991). It is also the aspect of the model theory required to account for deductive competence. We said above that the evidence for such searching is weak in the present study. We do not say that it is absent, because other aspects of our data, as well as those of other studies in the literature, suggest that at least some search for alternative models is occurring. First, a robust result in standard syllogistic reasoning is that one-model problems are reliably easier than multiple-model problems; every study in the literature, including those carried out by critics of the model theory, has corroborated this result (see, e.g., Ford, 1995). This difference is manifest on valid syllogisms taken separately, in which any model of the premises supports the conclusion. Thus, if people were deciding on the basis of the first model alone, there should be no difference.

The present study provides some important new evidence for this claim. First, Experiment 2 showed that the effect occurs when participants were instructed to judge the necessity of conclusions, which is equivalent to the validity judgments used in conventional reasoning experiments. Second, for those participants who were instructed to judge the possibility of conclusions, the effect was reliably reduced. This interaction strongly suggests that people understand that a judgment of necessity requires finding a conclusion in all models, whereas a judgment of possibility does not. This requires the further assumption, however, that people realize that there are multiple models but are not confident of their ability to check them all.

Also pertinent is the significant interaction between instruction type and logical classification reported in Experiments 1 and 2. Inspection of Figures 1 and 2 reveals that this was a result of the difference in endorsement rates between Possibility and Necessity instructions being most marked on Possible syllogisms (potential fallacies), which were not subdivided at this stage of our report. Although a general increase in acceptance rates under Possibility instructions could be due to response bias, such as a caution effect, the interaction suggests that at least some people are searching for alternative models and finding them. However, the analysis of Experiment 3, in which the two kinds of Possible problems were separated, indicates that this difference was mostly due to a quite marked increase in the acceptance rate of Possible Weak syllogisms under Possibility instructions. Thus we have clear evidence that people search for alternative models to prove the possibility of conclusions that are not supported by the first model considered, but less clear evidence that people search for counterexamples to establish the necessity of conclusions that are supported by the first model considered.

The issue of whether people search for alternative models has also been investigated in studies reported elsewhere, with mixed conclusions depending on the methodology used. Newstead, Handley, and Buck (1999) used two process-tracing methods to address this question. The first method involved asking participants to indicate, immediately after responding, any other conclusion they had considered, and the second involved drawing Euler circle diagrams. Neither method produced evidence that people were considering alternative models. A rather different picture emerges from the study of Bucciarelli and Johnson-Laird (1998). They asked their participants to construct external models (cut-out shapes) as they drew conclusions from syllogistic premises, and they videotaped the resulting sequences of models. Their results showed that reasoners consider a variety of different interpretations of the premises, as claimed by Polk and Newell (1995). However, the participants tended not to reencode the premises for one-model syllogisms (in contrast with the verbal reasoning theory), and they did generate sequences of models for multiple-model syllogisms. Whether they were searching for counterexamples was less clear. However, another phenomenon of syllogistic reasoning does support a search for counterexamples. Byrne and Johnson-Laird (1990) tested their participants' ability to recognize conclusions that they had drawn earlier from syllogistic premises. As a search for counterexamples predicts, participants often falsely recognized conclusions supported by an initial model of the premises when, in fact, they had responded correctly that nothing followed from the premises. This result suggests that they had fleetingly considered the erroneous conclusion only to reject it as a result of a counterexample.

In view of this somewhat ambiguous evidence, a cautious conclusion might be that people can search for alternative models but do not necessarily do so spontaneously. This might explain why evidence for such a search seems to depend on the experimental method used. We are aware of two factors that seem to influence strongly whether such a search occurs. The first is the so-called belief bias effect, which might better be described as a debiasing effect. Although commonly described as a tendency for people to endorse fallacies whose conclusions are believable, it might be more accurately described as a tendency to suppress fallacies when their conclusions are unbelievable. Most problems used in this literature are of the type we term Possible Strong, in which people will normally endorse the fallacy when the conclusion is abstract or neutral. Those few studies that have included neutral as well as believable and unbelievable conclusions have reported that endorsements of neutral conclusions are at least as high as those of believable ones, and that belief bias is due to suppression of endorsement of unbelievable conclusions (Evans & Handley, 1997; Evans & Pollard, 1990; Newstead, Pollard, Evans, & Allen, 1992).

This finding supports the mental model account of the belief bias effect (see, e.g., Oakhill, Johnson-Laird, & Garnham, 1989). When a putative conclusion is unbelievable, people are motivated to search for counterexamples to disprove it. Also consistent with the view that people may require some stimulus to engage in model searching is the effect of instructions. Evans, Allen, Newstead, and Pollard (1994) showed that strong instructional emphasis on logical necessity significantly reduced belief bias and also significantly suppressed the tendency to endorse possible but not necessary conclusions. If the natural mode of thinking is inductive rather than deductive (as argued by Evans & Over, 1996), then it makes sense that deductive competence will be most apparent under conditions in which people are clearly motivated to make an effort at deduction.

Finally, might there be an alternative explanation for our main findings in terms of a theory based on formal rules? Our principal predictions are independent of the details of the model theory, and they cannot be made by current versions of formal rule theories, which describe only a mechanism for necessary inference. Osherson's (1976) rules for modal inferences concern only a limited set of inferences based on propositional connectives and make no predictions about quantified modal reasoning. Our findings clearly provide a strong challenge for mental logicians as well as for those psychologists who believe that syllogistic reasoning can be explained without any reference to an effort at deduction.

References

Bell, V. A., & Johnson-Laird, P. N. (1998). A model theory of modal reasoning. Cognitive Science, 22, 25-52.

Braine, M. D. S. (1978). On the relation between the natural logic of reasoning and standard logic. Psychological Review, 85, 1-21.

Braine, M. D. S., & O'Brien, D. P. (1991). A theory of If: A lexical entry, reasoning program, and pragmatic principles. Psychological Review, 98, 182-203.

Braine, M. D. S., Reiser, B. J., & Rumain, B. (1984). Some empirical justification for a theory of natural propositional logic. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 18, pp. 317-371). New York: Academic Press.

Bucciarelli, M., & Johnson-Laird, P. N. (1998). Strategies in syllogistic reasoning. Unpublished manuscript, Princeton University.

Byrne, R. M. J., & Johnson-Laird, P. N. (1990). Remembering conclusions we have inferred: What biases reveal. In J.-P. Caverni, J.-M. Fabre, & M. Gonzalez (Eds.), Advances in psychology: Vol. 68. Cognitive biases (pp. 109-120). Amsterdam: North-Holland.

Chater, N., & Oaksford, M. (1999). The probability heuristics model of syllogistic reasoning. Cognitive Psychology, 38, 191-258.

Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hillsdale, NJ: Erlbaum.

Evans, J. St. B. T., Allen, J. L., Newstead, S. E., & Pollard, P. (1994). Debiasing by instruction: The case of belief bias. European Journal of Cognitive Psychology, 6, 263-285.

Evans, J. St. B. T., & Handley, S. J. (1997). Necessary and possible inferences: A test of the mental model theory of reasoning (Report to the Economic and Social Research Council, Grant R000221742). Plymouth, United Kingdom: University of Plymouth.

Evans, J. St. B. T., Newstead, S. E., & Byrne, R. M. J. (1993). Human reasoning: The psychology of deduction. Hove, England: Erlbaum.

Evans, J. St. B. T., & Over, D. E. (1996). Rationality and reasoning. Hove, England: Psychology Press.

Evans, J. St. B. T., & Over, D. E. (1997). Rationality in reasoning: The case of deductive competence. Current Psychology of Cognition, 16, 3-38.

Evans, J. St. B. T., & Pollard, P. (1990). Belief bias and problem complexity in deductive reasoning. In J.-P. Caverni, J.-M. Fabre, & M. Gonzalez (Eds.), Cognitive biases (pp. 131-154). Amsterdam: North-Holland.

Ford, M. (1995). Two modes of mental representation and problem solution in syllogistic reasoning. Cognition, 54, 1-71.

Galotti, K. M., Baron, J., & Sabini, J. P. (1986). Individual differences in syllogistic reasoning: Deduction rules or mental models? Journal of Experimental Psychology: General, 115, 16-25.

Johnson-Laird, P. N. (1983). Mental models. Cambridge, England: Cambridge University Press.

Johnson-Laird, P. N., & Bara, B. G. (1984). Syllogistic inference. Cognition, 16, 1-62.

Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove, England: Erlbaum.

Johnson-Laird, P. N., & Byrne, R. M. J. (1994). Models, necessity, and the search for counterexamples: A reply to Martin-Cordero & Gonzalez-Labra and to Smith. Behavioral and Brain Sciences, 17, 775-777.

Manktelow, K. I., & Over, D. E. (Eds.). (1993). Rationality. London: Routledge.

Newstead, S. E., & Griggs, R. A. (1983). Drawing inferences from quantified statements: A study of the square of opposition. Journal of Verbal Learning and Verbal Behavior, 22, 535-546.

Newstead, S. E., Handley, S. J., & Buck, E. (1999). Falsifying mental models: Testing the predictions of theories of syllogistic reasoning. Memory & Cognition, 27, 344-354.

Newstead, S. E., Pollard, P., Evans, J. St. B. T., & Allen, J. L. (1992). The source of belief bias in syllogistic reasoning. Cognition, 45, 257-284.

Oakhill, J., Johnson-Laird, P. N., & Garnham, A. (1989). Believability and syllogistic reasoning. Cognition, 31, 117-140.

Osherson, D. N. (1976). Logical abilities in children: Vol. 4. Reasoning and concepts. Hillsdale, NJ: Erlbaum.

Polk, T. A., & Newell, A. (1995). Deduction as verbal reasoning. Psychological Review, 102, 533-566.

Rips, L. J. (1983). Cognitive processes in propositional reasoning. Psychological Review, 90, 38-71.

Rips, L. J. (1994). The psychology of proof. Cambridge, MA: MIT Press.

Shaw, V., & Johnson-Laird, P. N. (1998). Dispelling the "atmosphere" effect on reasoning. In A. C. Quelhas (Ed.), Cognition and context (pp. 169-200). Lisbon: Instituto Superior de Psicologia Aplicada.

Page 15: Journal of Experimental Psychology: Copyright 1999 by the ...mentalmodels.princeton.edu/papers/1999necc&prob.pdfJournal of Experimental Psychology: Learning, Memory, and Cognition

NECESSITY AND POSSIBILITY

Appendix A

Percentage Endorsements of Conclusions in Experiment 1

1509

Premise

All A are B

_

Some A are B

No A are B

Some A are not B

Conclusion

Some A are BNo A are BSome A are not BAll B are ASome B are ANo B are ASome B are not A

All A are BNo A are BSome A are not BAll Bare ASome B are ANo Bare ASome B are not A

All A are BSome A are BSome A are not BAH Bare ASome B are ANo Bare ASome B are not A

All A are BSome A are BNo A are BAH B are ASome B are ANo Bare ASome B are not A

Logic

NIIPNIP

PIPPNIP

IINIINN

IPPPPPP

Necessity

653

1057721742

38

925

822285

37

777

206765

28757

701288

Possibility

82127

85803252

53559753955097

25

8025359087

3974838955395

Note. N = necessary; I = impossible; P = possible.

(Appendix B follows)

Page 16: Journal of Experimental Psychology: Copyright 1999 by the ...mentalmodels.princeton.edu/papers/1999necc&prob.pdfJournal of Experimental Psychology: Learning, Memory, and Cognition

1510 EVANS, HANDLEY, HARPER, AND JOHNSON-LAIRD

Appendix B

Fig.

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

Percentage Endorsement of ConclusionsCone.order

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

Premises

All A are BAll B are C

All A are BAll B are C

All A are BSome B are C

All A are BSome B are C

Some A are BAll B are C

Some A are BAll B are C

Some A are BSome B are C

Some A are BSome B are C

No A are BAll B are C

No A are BAll B are C

No A are BSome B are C

No A are BSome B are C

Some A are not BAll B are C

Some A are not BAll B are C

Some A are not BSome B are C

Some A are not BSome B are C

Conclusion

All A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo Care ASome C are not AAll A are CSome A are CNo A are CSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo C are ASome C are not AAll A are CSome A are CNo A are CSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not A

in

L

NNIIPNIPPPPPPPPPPNIPPNIPPPPPPPPPPPPPIPPNPPPPIPPNPPPPPPPPPPPPPPPP

Experimen

N

732710237747

7201087138010831067

787108013837

6713803

733

777

6710107740132380471340405313334357

76313837

60178313773

707

502383

P

80532713836327403083338733902080239020873790208333

1005093478750932017837323278057276077802050636727873793108743902093438033835383

12 Under Necessity (N) and Possibility (P) Instructions

Fig.

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

Cone.order

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

Premises

All A are BNo B are C

All A are BNo B are C

All A are BSome B are not C

All A are BSome B are not C

Some A are BNo B are C

Some A are BNo B are C

Some A are BSome B are not C

Some A are BSome B are not C

NoAareBNo B are C

No A are BNo B are C

No A are BSome B are not C

No A are BSome B are not C

Some A are not BNo B are C

Some A are not BNo B are C

Some A are not BSome B are not C

Some A are not BSome B are not C

Conclusion

All A are CSome A are CNo A are CSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNo A are CSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo C are ASome C are not A

L

IINNIINNPPPPPPPPIPPNPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP

N

37

8327

77

8340106713906

4323770

3037833

50406013677

777

57177717276040173053401340405017404063

3632767104727737

7717870

571763

P

1013

10057101387501080379027872393136373932047777037975393309347

1003040777337408070438367873060677723805390376747732093409323873793

Page 17: Journal of Experimental Psychology: Copyright 1999 by the ...mentalmodels.princeton.edu/papers/1999necc&prob.pdfJournal of Experimental Psychology: Learning, Memory, and Cognition

NECESSITY AND POSSIBILITY 1511

Fig.

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Cone.order

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

Premises

All Bare AAll C are B

All Bare AAll C are B "

All Bare ASome C are B

All Bare ASome C are B

Some B are AAll C are B

Some B are AAll C are B

Some B are ASome C are B

Some B are ASome C are B

No B are AAll C are B

No Bare AAll C are B

No Bare ASome C are B

No B are ASome C are B

Some B are not AAll C are B =

Some B are not AAll C are B

Some B are not ASome C are B

Some B are not ASome C are B

Conclusion

All A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not A

Appendix B

L

PNIPNNIIPNIPPNIPPPPPPPPPPPPPPPPPIINNIINNPPPPIPPNPPPPPPPPPPPPPPPP

N

6343

33380331073

877

73108313531783105010777

633

777

8313833

707

1373331010773010404053

3433373

77010873

5713807

63107310571780

P

83807

3087701720338727872787208037972087437720775397578740

10057971720977013178360275077601063708027875377238327973787479030904387

(continued)

Fig.

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Cone.order

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

A-C

C-A

Premises

All Bare ANo C are B

All Bare ANo C are B

All Bare ASome C are not B

All A are BSome B are not C

Some B are ANo C are B

Some B are ANo C are B

Some B are ASome C are not B

Some B are ASome C are not B

No Bare ANo C are B

No B are ANo C are B

No B are ASome C are not B

No B are ASome C are not B

Some B are not ANo C are B

Some B are not ANo C are B

Some B are not ANo C are B

Some B are not ASome C are not B

Conclusion

All A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll C are ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNo A are CSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo Care ASome C are not AAll A are CSome A are CNoAareCSome A are not CAll Care ASome C are ANo C are ASome C are not A

L

IPPNPPPPPPPPPPPPIPPNPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP

N

1717232710207047

7701093

353207010275750

02060671060

773

3577

77172753332323533710532753

04727531047305310303743

38013800

677

77

P

20308067273780601793539317872390204783832763707730

1004793338743933030936737478363408067771753837727607370235363572790439020975087

Page 18: Journal of Experimental Psychology: Copyright 1999 by the ...mentalmodels.princeton.edu/papers/1999necc&prob.pdfJournal of Experimental Psychology: Learning, Memory, and Cognition

1512 EVANS, HANDLEY, HARPER, AND JOHNSON-LAIRD

[Appendix B table, Figure 3: premise pairs of the form (All/Some/No/Some ... not) A are B combined with (All/Some/No/Some ... not) C are B, each evaluated with conclusions in both A-C and C-A orders. For every pair the table lists the eight candidate conclusions, their logical status (L = N, I, or P), and the percentage of participants endorsing each conclusion under Necessity (N) and Possibility (P) instructions; the numeric columns are fused in this copy and are not reproduced. (table continued)]


NECESSITY AND POSSIBILITY 1513

[Appendix B table, Figure 4: premise pairs of the form (All/Some/No/Some ... not) B are A combined with (All/Some/No/Some ... not) B are C, each evaluated with conclusions in both A-C and C-A orders. For every pair the table lists the eight candidate conclusions, their logical status (L = N, I, or P), and the percentage of participants endorsing each conclusion under Necessity (N) and Possibility (P) instructions; the numeric columns are fused in this copy and are not reproduced.]

Note. Fig. = figure; Conc. = conclusion; L = logic (under which N = necessary; I = impossible; P = possible).
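The L column of these tables can be recomputed by brute force. The sketch below (not the authors' computational implementation, and all function names are my own) enumerates every finite model over the terms A, B, and C — each model is a non-empty set of individual "types", one per cell of the three-circle Venn diagram — and classifies a conclusion as necessary (true in all models of the premises), impossible (true in none), or merely possible, assuming each term denotes a non-empty set:

```python
# Each individual type records membership in A, B, C as a 0/1 triple.
TYPES = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
IDX = {"A": 0, "B": 1, "C": 2}

def holds(stmt, model):
    """Evaluate a syllogistic statement (mood, subject, predicate)
    in a model, i.e. a collection of the types actually present."""
    mood, x, y = stmt
    i, j = IDX[x], IDX[y]
    if mood == "All":       # every individual in X is also in Y
        return all(t[j] for t in model if t[i])
    if mood == "Some":      # at least one individual is in both
        return any(t[i] and t[j] for t in model)
    if mood == "No":        # no individual is in both
        return not any(t[i] and t[j] for t in model)
    if mood == "Some-not":  # at least one individual is in X but not Y
        return any(t[i] and not t[j] for t in model)
    raise ValueError(mood)

def models_of(premises):
    """Yield every model (non-empty subset of the 8 types) in which
    both premises hold and each of A, B, C is non-empty."""
    for bits in range(1, 2 ** len(TYPES)):
        model = [t for k, t in enumerate(TYPES) if bits >> k & 1]
        if all(any(t[i] for t in model) for i in range(3)) and \
           all(holds(p, model) for p in premises):
            yield model

def classify(premises, conclusion):
    """Return the L code: 'N' (necessary), 'I' (impossible),
    or 'P' (possible but not necessary)."""
    truth = [holds(conclusion, m) for m in models_of(premises)]
    if all(truth):
        return "N"
    if not any(truth):
        return "I"
    return "P"
```

For example, `classify([("All", "A", "B"), ("All", "C", "B")], ("No", "A", "C"))` returns `"P"`: the conclusion is consistent with the premises but does not follow from them, which is exactly the class of conclusion on which Experiments 2 and 3 found systematic fallacies.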

Received May 12, 1998
Revision received April 27, 1999
Accepted April 30, 1999