Beyond Deja Vu in the Search for Cross-Situational Consistency

Psychological Review Copyright 1982 by the American Psychological Association, Inc.1982, Vol. 89, No. 6, 730-755 0033-295X/82/8906-0730$00.75

Beyond Deja Vu in the Search forCross-Situational Consistency

Walter Mischel and Philip K. PeakeStanford University

Recent efforts to resolve the debate regarding the consistency of social behaviorare critically analyzed and reviewed in the light of new data. Even with reliablemeasures, based on multiple behavior observations aggregated over occasions,mean cross-situational consistency coefficients were of modest magnitude; incontrast, impressive temporal stability was found. Although aggregation of mea-sures over occasions is a useful step in establishing reliability, aggregation ofmeasures over situations bypasses rather than resolves the problem of cross-sit-uational consistency. The Bem-Funder (1978) template-matching approach didnot enhance the search for cross-situational consistency either in their originaldata or in an extended replication presented here. The Bern-Allen (1974) mod-erator-variable approach also was not found to yield greater cross-situationalconsistency in the behavior of "some of the people some of the time" either intheir original data or in the present study of conscientiousness. Congruent witha cognitive prototype approach, it was proposed and demonstrated that the judg-ment of trait consistency is strongly related to the temporal stability of highlyprototypic behaviors. In contrast, the global impression of consistency may notbe strongly related to highly generalized cross-situational consistency, even inprototypic behaviors. Thus, the perception and organization of personality con-sistencies seems to depend more on the temporal stability of key features thanon the observation of cross-situational behavioral consistency, and the formermay be easily interpreted as if it were the latter.

The relative specificity versus consistency Reviewers of this dispute have repeatedlyof social behavior, and the nature and breadth noted a curious paradox (e.g., Bern & Allen,of the dispositions underlying such behavior, 1974; Mischel, 1973). On the one hand, corn-probed by Thorndike (1906), Hartshorne pelling intuitive evidence supports the en-and May (1928), Allport (1937, 1966), Fiske during conviction that people are character-(1961), and many others, still remains the ized by broad dispositions revealed in exten-focus of controversy in contemporary per- sive cross-situational consistency. On thesonality theory. The position one takes on other hand, the history of research in the areathese issues profoundly affects one's view of has yielded persistently perplexing results,personality and the strategies worth pursuing suggesting much less consistency than ourin the search for its nature and implications, intuitions predict.Few assumptions are simultaneously more Reactions to this consistency paradox haveself-evident, yet more hotly disputed, than taken two directions. One response has beenthat an individual's behavior is characterized to challenge the underlying assumptions ofby pervasive cross-situational consistencies, traditional trait theories (Mischel, 1968; Pe-

terson, 1968; Vernon, 1964) and search foralternative ways of conceptualizing person

Preparation of this paper and the research by the au- variables (e.g., Bandura, 1978; Cantor &thors were supported in part by Grant HD MH-0914 A,,- u . K\-H\ K./I- u i imi irnru A cfrom the National Institute of Child Health and Human Mischel, 1979; Mischel, 1973, 1979) and ofDevelopment and Grant MH-36953 from the National Studying person-Situation interactionsinstitute of Mental Health. We benefited much from the through more fine-grained analyses (e.g.,thoughtful comments of more colleagues than we can Magnusson & Endler, 1977; Moos & Fuhr,thank here: We are grateful for their generous help. , 009. patterson 1976' Patterson & Moore

Requests for reprints should be sent to Walter Mis- l**~' „ , ' * ( ' fallerson * MOOTC,chel, Jordan Hall, Building 420, Department of Psy- 1979; Raush, 1977). Alternatively, it haschology, Stanford University, Stanford, California 94305. been argued that the problems raised reflect

730

CROSS-SITUATIONAL CONSISTENCY 731

not the inadequacy of traditional conceptu-alizations of broad traits that yield cross-sit-uationally consistent behaviors but rather theinadequacy of earlier searches for such traits(e.g., Block, 1977; Olweus, 1977; Rushton,Jackson, & Paunonen, 1981). Of the manyefforts to pursue this "better methods" ap-proach to the consistency problem, the onesthat appear to be most dramatic, and thatcurrently have been greeted as most prom-ising, are the "reliability" solution (Epstein,1979) and the "template-matching" and "id-iographic" solutions proposed by Bern andhis colleagues (Bern & Allen, 1974; Bern &Funder, 1978). They are the focus of the pres-ent paper both because they deserve seriousattention in their own right and because theyare excellent exemplars of the methodologi-cal solutions proposed for the consistencyproblem, and their analysis may teach somehighly instructive lessons both for researchand theory building. In spite of the attentionthese approaches are attracting currently,the solutions they propose do not resolvethe basic issues raised by the consistencyparadox.

The present analysis documents this claim,offers data that show the limitations of theapproaches of both Epstein and Bern and hisassociates in the search for behavioral con-sistency, and argues for a resolution of theconsistency paradox based on a theoreticalreconceptualization. New data on the tem-poral stability and cross-situational consis-tency of conscientiousness in college studentsare presented and analyzed, guided by a cog-nitive prototype approach (Cantor & Mis-chel, 1979; Cantor, Mischel, & Schwartz,1982a, 1982b). These analyses suggest thatthe widely shared intuitions of consistencyare based on valid observations of behavioralregularities but of a type different from thoseusually pursued in the effort to resolve theconsistency paradox.

On Predicting Most of the People Much ofthe Time: The Reliability Solution

In Epstein's (1979) view, the consistencydebate in psychology, rather than meritingdeep and enduring discussion, should neverhave happened. He sees the consistency issuein personality as ready (indeed, as overdue)

for "a solution . . . so obvious that, oncepointed out, it reminds one of the fairy taleof The Emperor's New Clothing" (p. 1097).The solution is to realize that studies of be-havioral consistency rarely sample the be-haviors of interest on more than a single oc-casion. The consistency issue "can be re-solved by recognizing that most single itemsof behavior have a high component of errorof measurement and a narrow range of gen-erality" (p. 1097). In other words, the prob-lems of demonstrating behavioral consis-tency to support global traits are simply theresult of unreliable measurement in past re-search.

Remembering Reliability

In support of his arguments, Epstein (1979)recently demonstrated that coefficients oftemporal stability (e.g., of self-reported emo-tions and experiences recorded daily and ofobserver judgments) become much largerwhen based on averages over many days.Epstein computed split-half reliabilities forsamples of behavior varying from 2 days upto about 28 days. He found that as the num-ber of observations included in the compositeincreased, the split-half reliability also in-creased. Of course, this phenomenon is a fun-damental premise of classical reliability the-ory (Gulliksen, 1950; Lord & Novick, 1968;Thurstone, 1932), and the increase in reli-ability through use of aggregated compositesis exactly what the Spearman-Brown formulahas been used to estimate for years (also seeHorowitz, Inouye, & Siegelman, 1979).

The recognition that reliability is impor-tant and increases with the number of itemsaggregated is hardly new to the consistencydebate. Even introductory statistics texts rou-tinely intone that we cannot have either va-lidity or utility without reliability, and it isremarkable to suggest that the consistencydebate has suffered from mass amnesia forthe reliability construct and for the Spear-man-Brown prophecy. Far from overlookingreliability, virtually all of the classic, large-scale investigations of cross-situational con-sistency (e.g., Dudycha, 1936; Hartshorne& May, 1928; Newcomb, 1929) routinelyemployed behavioral measures aggregatedover repeated occasions yet reported findings

732 WALTER MISCHEL AND PHILIP K. PEAKE

that triggered the enduring debate (see Peake,Note 1, for review). Moreover, the Office ofStrategic Services (OSS) project in WorldWar II, the Michigan Veterans Administra-tion project, the Harvard personologists, andthe Peace Corps projects—all large-scale ap-plied assessment projects of the 1940s, 1950s,and 1960s—used aggregated measures,pooled judgments, assessment boards, andmultiple-item criteria and neverthelessyielded overall results that raised basic ques-tions about the usefulness and limitations ofthe traditional personality-assessment enter-prise (e.g., Peterson, 1968; Vernon, 1964;Wiggins, 1973). Although those who criti-cally evaluated the state of the field in the1960s did not overlook the issue of reliability,they nevertheless concluded, as Vernon(1964) did, that "The real trouble (with thetrait approach) is that it has not worked wellenough, and despite the huge volume of re-search it has stimulated, it seems to lead toa dead end" (p. 239).

As tempting as simple solutions might be,the problems raised by the consistency debatecannot be dismissed as the result of forget-fulness for the basic concepts of measure-ment error. How, then, can Epstein concludethat the consistency debate should neverhave occurred because it is resolved whenreliability is taken into account? Put anotherway, How can the use of aggregated measuresresolve a debate in the 1980s that it was un-able to resolve throughout the 1940s, 1950s,and 1960s? The answer to this seeminglypuzzling shift becomes quite simple once onerecognizes that the discrepancy is one of in-terpretation, not effect. Reliability is doingnothing more (or less) for Epstein than it didfor any of the earlier large-scale assessmentprojects that sought and obtained it.

Distinguishing Temporal Stability andCross-Situational Consistency

Adequate reliability coefficients can surelybe found, as Epstein has demonstrated andas earlier work amply attests. But we haveto discriminate clearly between demonstra-tions of impressive temporal stability on theone hand and cross-situational generality orconsistency in behavior on the other. By col-lecting specific observations over a series of

days, and then computing split-half "stabil-ity" coefficients, Epstein has accumulateddata (Tables 1, 2, 3, of Epstein, 1979) thatare relevant to the temporal stability of be-havior. But temporal stability has never beena central issue in this debate. As noted byMischel in 1968, "Considerable stability overtime has been demonstrated" (p. 36), and"although behavior patterns may often bestable, they are usually not highly generalizedacross situations" (p. 282).

Although temporal stability is a funda-mentally important phenomenon and meritscareful empirical attention, it is not the basicissue of the consistency debate. In our view,the crux of the classic debate is the cross-situational consistency or discriminativenessof social behavior and the utility of inferringtraits for the prediction of an individual'sactions in particular contexts. The opera-tional distinction between temporal stabilityand cross-situational consistency is impor-tant conceptually because it allows one topostpone (though only momentarily) theproblems of psychological similarity by sim-ply looking for the same behaviors over mul-tiple occasions in time. The moment the be-havior measures are not identical, the prob-lem of psychological similarity surfaces, andthat is the problem that has continually be-deviled the search for cross-situational con-sistency in a nomothetic trait framework.Although Epstein purports to resolve theconsistency debate by using aggregated mea-sures, most of his data are relevant only toan issue.that has never been seriously raised.

Epstein presents some data (in Tables 4and 5 of his 1979 article) that do go beyondthe demonstration that aggregation increasesreliability coefficients and enhances temporalstability. He presents (in his Table 4) inter-correlations of objective events with eachother and with self-rated emotions for a 12-day sample. Because the consistency of self-ratings is also not controversial (Mischel,1968), only the data intercorrelating aggre-gated objective events are directly relevantto the issue of behavioral consistency. Table1, adapted from Epstein's Table 4, showsthose coefficients that reached statistical sig-nificance. First, note that each of the mea-sures listed in Table 1 is the aggregate of a12-day sample. As such, the corresponding


reliability coefficients (first column) are assubstantial as they are noncontroversial.

Evidence for Cross-Situational Consistency?

Now, consider the more interesting ques-tion, What is the evidence for cross-situa-tional consistency when highly reliable mea-sures of objectively observed behavior areintercorrelated? Among the 105 relevant cor-relations computed, the 7 that reached sig-nificance (at the p < .01 level) were letterswritten with letters received, calls made withcalls received, stomachaches with headaches,heart rate mean with heart rate range, errorswith heart rate range, erasures with letterswritten, and absences with erasures. Signifi-cant intercorrelations among such itemshardly suggest that the solution to the con-sistency issue has arrived. Most of the ob-tained interrelations seem virtually auto-matic; the more doors I open, the more Itend to close. And we also predict a signifi-cant correlation between how often you say"hello" to people and how often they say"hello" to you. Demonstrating some linksbetween bits of behavior (calls made, callsreceived) that may cohere (often almost bydefinition and because they are functionallyrelated, virtually demanding each other) doesnot provide impressive evidence for cross-sit-uational consistencies. Similarly, demon-strating links between bits of behavior thathave no apparent conceptual relation (errorswith heart rate range) does not provide evi-dence for the breadth of personality traits.Finally, demonstrating some links betweenany two bits of behavior that do not evencome from the same person seems an oddway to prove consistency within a given per-son's behavior. It is puzzling how links be-tween the number of letters I write to you,for example, and the number of letters youwrite to me can speak to my consistency.

Epstein also provides coefficients betweenaggregated objective events and trait-inven-tory scores (his Table 5). Most of the signif-icant associations he found are between self-reported headaches, self-reported stomach-aches, and other self-reported physicalcomplaints and troubles (e.g., muscle ten-sion, autonomic arousal on the Epstein-FenzAnxiety Scale). It is well known to person-

.o3

$

1I

K3

I

*i?c s

Be

§ 00 —ooo\


Table 2Effects of Aggregation Over Occasions onTemporal Stability and Cross-SituationalConsistency

Measure

Temporal stabilityCross-situational

consistency

Singlebehaviors

.29

.08

Aggregates

.65

.13

Note. The coefficients reported are mean correlationcoefficients across all possible comparisons of the des-ignated type.

ality assessors that some people do complainmore than others about their well-being (e.g.,Byrne, 1964; Mischel, 1981; Mischel, Eb-besen, & Zeiss, 1973). The remaining coef-ficients are extremely unimpressive. Suchobjective events as calls made, calls received,letters written, and letters received—the itemsof behavior that do interrelate—even whenaggregated and highly reliable, turn out tocorrelate significantly with none of the in-ventories. Epstein's consistency data, farfrom offering a solution to the problemsraised by the consistency debate, demon-strate and illustrate those problems vividly.

The Carleton Behavior Study

An adequate empirical search for behav-ioral consistency requires data aggregated toachieve reliability. But it also needs to gobeyond reliability, beyond temporal stability,and beyond scattered self-report and behav-ior correlations. It needs to explore cross-sit-uational consistency in behavior with appro-priate and reliable behavior measures sam-pled across a range of presumably similarsituations. Furthermore, such a search needsto be informed by some conceptualization—even if rudimentary—of how behavior is or-ganized or should be categorized for partic-ular goals (e.g., Cantor & Mischel, 1979;Mischel, 1973, 1979). That is exactly whatwe have been trying to do over the last 4years, studying behavioral consistency amongcollege students at Carleton College inNorthfield, Minnesota.

In collaboration with Neil Lutsky, we ini-tiated the Carleton study in an effort to rep-licate the work of Bern and Allen (1974),extending greatly the behavioral referents

and battery of measures employed. In thisresearch, 63 Carleton College volunteers par-ticipated in extensive self-assessments rele-vant to friendliness and conscientiousness.They were assessed by their parents and aclose friend and were observed systematicallyin a large number of situations relevant tothe traits of interest. To illustrate the gist ofthe results as they bear on the issues ad-dressed in the present article, we will focuson the domain of conscientiousness/stu-diousness (henceforth called simply con-scientiousness).1 The behavioral referents ofconscientiousness in this work consist of 19different measures. For example, the behav-ioral assessment of conscientiousness in-cluded such measures as class attendance,study-session attendance, assignment neat-ness, assignment punctuality, reserve-readingpunctuality for course sessions, room neat-ness, and personal-appearance neatness. Notethat the specific behaviors selected as relevantto each trait were supplied by the subjectsthemselves as part of the pretesting at Car-leton College to obtain referents for the traitconstructs as perceived by the subjects; thisis in contrast to many studies in which thesereferents are selected exclusively by the as-sessors. For each different measure, repeatedobservations (ranging in number from 2 to12) were obtained. Thus, for example, weobtained three observations of assignmentpunctuality and nine observations of ap-pointment punctuality.

This design allowed us to assess both thetemporal stability and the cross-situationalconsistency of behavior using single obser-vations and using measures aggregated overoccasions. We were thus able to systemati-cally assess the gains accrued on both tem-poral stability and cross-situational consis-tency when we employed the aggregation so-lution espoused by Epstein. The results ofthis analysis are summarized in Table 2.

Temporal Stability: Cross-SituationalDiscriminativeness

First we computed the percentage of sig-nificant coefficients among all the possible

' Although the focus here is on the conscientiousnessdata, results that essentially parallel those reported herehave emerged from analyses in the friendliness domainand are described in Peake (Note 1).


coefficients of temporal stability. To qualifyas a coefficient of temporal stability, the cor-relation had to consist of two observationsof the, same type of measure. For example,lecture attendance on Day 1 correlated withlecture attendance on Day 6 is a correlationof temporal stability. This analysis revealedthat nearly half of the single-observation tem-poral-stability coefficients (specifically, 46%)were statistically significant. Note that thisis prior to any aggregation to enhance reli-ability. Here, then, is another clear indicationthat even at the single-observation level, tem-poral stability can be demonstrated readily.Given the moderate to high levels of tem-poral stability among the single observations,it is not surprising either conceptually or em-pirically that when these single observationsare aggregated into composite measures, allof the resulting reliability coefficients aresignificant (with the mean coefficientbeing .65).2

We performed a similar analysis for all thecorrelations relevant to cross-situational con-sistency. At the single-observation level, cross-situational consistency coefficients consist ofsuch correlations as a single observation ofappointment punctuality with a single ob-servation of lecture punctuality, or with asingle observation of class-note neatness (i.e.,with any other single observation except an-other observation of appointment punctual-ity). As is now typical for research findingsof this type, although the percentage of sig-nificant correlations (11%) exceeded chance,the obtained correlations were highly erratic,with a mean coefficient of .08. The criticalquestion then becomes, What gains in cross-situational consistency are evidenced whenour more reliable aggregates are intercorre-lated?

To address this question, we must examinesuch correlations as aggregated lecture atten-dance with aggregated appointment punc-tuality or aggregated lecture attendance withaggregated appointment attendance. For thispurpose, 171 cross-situational consistencycoefficients were computed by intercorrelat-ing the 19 different aggregated measures ofconscientiousness. Of the 171 coefficients,20% (35 coefficients) reached significance—a number considerably above chance. Someof these coefficients reached substantial lev-els. For instance, aggregated class attendance

correlates highly with aggregated appoint-ment attendance (r=.67, p<.001). Fur-thermore, there are patterns of meaningfulcoherences among the correlations. For in-stance, aggregated class attendance correlatessignificantly with aggregated assignmentpunctuality (r = .53, p < .001), with comple-tion of class readings (r = .58, p < .001), andwith amount of time studying (r = .31, p <.05). These coherences once again testify thatbehavior is patterned and organized ratherthan random. But it is just as clear from theresults that behavior is also highly discrimi-native and that broad cross-situational con-sistencies remain elusive even with reliablemeasures. For example, whereas aggregatedclass attendance correlates impressively withthe above measures, it does not correlate sig-nificantly with aggregated class-note thor-oughness (r = . 14, ns), aggregated punctualityto lectures (r = —.03, ns), or aggregated as-signment neatness (r = —.04, ns). This dis-criminativeness is further reflected in the factthat for the 19 aggregated measures, themean cross-situational consistency coeffi-cient was only .13 (see Table 2). Thus, theuse of aggregate measures increases the meanconsistency coefficient from .08 to .13.3

It is possible that our data do not ade-quately reflect the gains in cross-situationalconsistency attainable through aggregationbecause several of our measures evidencedlow reliability coefficients even after aggre-

2 The aggregate temporal reliability coefficients re-ported here were computed using the split-half techniqueemployed by Epstein (1979) to parallel his procedures.A more efficient procedure involves computing themean single-observation intercorrelation for the behav-iors to be aggregated (Lord & Novick, 1968). Aggregatereliability is then computed using the Spearman-Brownformula, correcting the mean single-observation corre-lation by the number of observations that are to be in-cluded in the composite. With this latter procedure, themean aggregate temporal stability coefficient is .61(Peake, Note 1).

3 Unless the single observations within one aggregateshow a stable (albeit low) pattern of association with theobservations in the comparison aggregate, substantialincreases in validity should not be expected to,resultfrom aggregation (Lord & Novick, 1968). That is, gainsin cross-situational consistency should not be expectedfrom aggregation when the measures being comparedshow erratic validity coefficients prior to aggregation.Interestingly, the persistent finding of such erratic rela-tions is precisely what led to the controversy over thebreadth and generality of behavioral consistency in thefirst place.


gation. To apply a more stringent test of theeffects of increased reliability for demonstrat-ing cross-situational consistencies, we fo-cused our analysis on those measures whoseestimated reliability exceeded an arbitraryselected level of .65. In all, 14 of the 19 orig-inal measures met this break-off point, andthe mean reliability estimate of these mea-sures is .74. Intercorrelating these 14 mea-sures results in 91 consistency coefficientswith a mean level of .14. Thus, restrictingour attention to our most reliable measuresyields a minimal gain in the mean cross-sit-uational consistency coefficient which in-creases from .13 to .14.

Although restricting our attention to ourmost reliable measures does not seem to havea substantial effect on the cross-situationalconsistency evidenced in the data, one mightargue that a mean reliability coefficient of.74 is still too low to evaluate the reliabilitysolution. What would happen if all our meanreliability coefficients were .95? More im-pressively, what kinds of cross-situationalconsistency would be evidenced if we wereto use perfectly reliable measures? Of course,it is to answer just this type of question thatclassical test theory tells us to to apply thecorrection for attenuation. By correctingeach of the obtained correlations in our ma-trix for attenuation due to low reliability, wecan estimate the maximal ("true") level ofassociation between each of our measures.In this sense, correcting for attenuation is thelogical extreme, or ultimate test, of the "re-liability solution" as proposed by Epstein.When we apply the correction as describedabove, the mean consistency coefficient in-creases to .20. Thus, if we were to collect alarge number of observations for each mea-sure such that the reliability of each of thecomposites (aggregations) of these observa-tions approached 1.0, the mean level of cross-situational consistency evidenced betweenthese measures still would be unlikely to ex-ceed .20.

In light of the various results summarizedto this point, what can one conclude aboutthe promise of the reliability solution for theconsistency debate? First of all, it is clear thataggregation of repeated observations in orderto obtain adequately reliable measures yields,as expected (and hardly to anyone's surprise),

gains in the mean levels of correlations bothfor measures of temporal stability and forcross-situational consistency in behavior.These gains are most impressive for measuresof temporal stability (mean r - .65) and doc-ument that aggregation over occasions is auseful method for increasing the reliabilityof a measure. The results also indicate, how-ever, that aggregating observations over oc-casions does not necessarily lead to highcross-situational consistency (mean r = .13or about .20 if perfect reliability is assumed).4

Although aggregation over occasions has thedesirable effect of enhancing reliability, itdoes not provide a simple solution to theconsistency paradox.

The results of the Carleton project are notunique. A survey of the early studies of thecross-situational consistency of behavior(Peake, Note 1) indicates that the obtainedresults are quite consistent with past findings.These early studies, including the studies byHartshorne and May (1928) of honesty, byNewcomb (1929) of introversion-extrover-sion, by Allport and Vernon (1933) of ex-pressive movements, and by Dudycha (1936)of punctuality, routinely employed repeatedobservations of each behavior to increase thereliability of their measures. Each of theseinvestigators reported substantial reliabilitycoefficients, not as a solution to the consis-tency problem, but as one index of the ad-equacy (reliability) of their aggregate mea-sures. More importantly, each of these stud-ies obtained cross-situational correlationcoefficients of around .20 when using the re-liably aggregated measures. Although Epsteinproposes aggregation as a potential cure-allfor the consistency problem, this cure hasbeen employed routinely for years, and itsuse only serves to document more clearly the

4 The summary coefficients of mean temporal stabilityand mean cross-situational consistency are not strictlycomparable because the former always reflect the sameresponse forms in the same situation, whereas the latteroften reflect different response forms in different situa-tions. To allow a direct comparison, cross-situationalconsistency coefficients were computed separately whenthe response forms in the different situations were thesame (mean r = .28) and when they were different (meanr= .12). Thus, whereas the mean correlation for thesame response forms in the same situations over timewas .65 (temporal stability), it was .28 for the same re-sponse forms in different situations.


pervasiveness of the phenomena of behav-ioral discriminativeness.

The Carleton data do not imply, however,that there is little coherence among the be-haviors studied. Although the overall pat-terning of correlations is erratic and on theaverage low level, the results do not suggestthat behavior is random and unorganized. Aswas noted above, some impressive coeffi-cients emerge, and coherent patterns of cor-relations are apparent among some of thevariables. In addition, 78% (133 of 171) ofthe obtained correlations are positive, and ofthe 38 negative correlations, only 2 reach sta-tistical significance. Thus, we obtained con-siderably more positive significant correla-tions and also obtained considerably fewernegative significant correlations than wouldbe expected by chance. So, whereas the datareflect behavioral discriminativeness, thereis also a positive trend, a coherence or gistamong the behaviors sampled.

Aggregation of the Data—WithoutAggregation of the Issues

Cross-situational consistency coefficientsof the sort we are finding can be construedas evidence either for the relative discrimi-nativeness of behavior or for its coherence,and as evidence either for a stable thread ofindividual differences or for the need to takeaccount of situations seriously. How onereads the results depends on the particularpurposes of the research or assessment task.Recently, however, it has become increas-ingly common to interpret mean coefficientsof the sort obtained at Carleton as ample ev-idence for the consistency of behavior. Byaggregating measures of behavior in partic-ular situations into a single composite, or"multiple-act criterion," substantial internalreliability coefficients are readily obtained(Fishbein & Ajzen, 1975). The problem hereis not whether reliability will increase by us-ing various types of data aggregation but howto select the appropriate type and level ofdata aggregation for particular research prob-lems. Automatic aggregation into overallcomposites simultaneously risks aggregationof the conceptual issues.

Aggregation of observations over occa-sions—Epstein's (1979) basic admonition—

certainly is a requisite for adequate reliabil-ity. No one would contend that a person'sattendance at today's psychology lecture at9 a.m. is an adequate index of that person'stendency to attend psychology lectures. Bymeasuring lecture attendance on repeatedoccasions, a more reliable index of lectureattendance will result. Of course, aggregationper se need not stop at aggregation over oc-casions (Epstein, 1980). The investigator whowants to amass high correlation coefficientscould forge ahead and aggregate behavioracross different response forms and evenacross situations, arriving at the now popularmultiple-act criterion (Fishbein & Ajzen,1975; Jaccard, 1974; McGowan & Gormly,1976; Rushton et al., 1981). Here, again,these aggregations will be appropriate anduseful for some purposes but not for others.

Consider the case for aggregating acrossresponse forms. In many situations a partic-ular dimension of interest may be assessedthrough several behavioral manifestations ortypes of responses. In our data at CarletonCollege, for instance, one could assess cross-situational consistency not just through theintercorrelation of the 19 measures discussedso far but by treating particular settings (e.g.,classrooms) as the situations and aggregatingthe multiple response forms sampled withinthem. In that case, conscientiousness in class-room situations may be indexed by suchmeasures as lecture attendence,1 lecture punc-tuality, note thoroughness, and so forth. Theinvestigator who is more interested in "con-scientiousness in classroom situations" thanin the variations between measures withinthis situation could appropriately aggregateacross the response forms. The compositemeasure of conscientiousness in the class-room could then be compared with measuresof conscientiousness, aggregated across re-sponse forms, in other situations (e.g., at ap-pointments). By aggregating across occasionsand response forms in the Carleton data, themean number of specific observations persituation became 20. The mean cross-situa-tional consistency coefficient for these aggre-gated measures was .18. Aggregation acrossresponse forms within situations provides auseful increment, but hardly a resolution ofthe paradox.

After measures have been aggregated over


occasions, and across response forms withinsituations, it is possible to aggregate even fur-ther, combining measures across situations.One can treat the specific situations simplyas "error" and aggregate across them to forma single composite score and, as the Spear-man-Brown formula predicts, convert ouraverage .13 cross-situational consistencycoefficient into an internal reliability esti-mate of .74. But aggregating across situationsin this way cancels the variance and speci-ficity due to situations, thereby bypassing theproblem of crbss-situational consistency in-stead of solving it; that "solution" merelytreats situations as errors to be averaged outrather than as psychological units to be takeninto account. Although achieving reliablemeasures by sampling over occasions is ofself-evident value, further aggregation, acrossresponse forms or across situations, must bedictated by the assessor's goals and by a prioritheoretical considerations about psychologi-cal similarity (equivalence groupings) andabout the level of generality—the units—atwhich assessment should occur for particularpurposes (e.g., Cantor & Mischel, 1979; Mis-chel, 1977). Although such aggregation isuseful for making statements about meanlevels of behavior across a range of contexts,cross-situational aggregation also often hasthe undesirable effect of canceling out someof the most valuable data about a person. Itmisses the point completely for the psychol-ogist interested in the unique patterning ofthe individual by treating within-person vari-ance, and indeed the context itself, as if itwere "error."

The persistent pursuit of consistency byattempting to treat situations as error is acurious paradox in a field committed to afocus on individuality and the pursuit of per-sonology (e.g., Carlson, 1971). On the onehand, the importance of attention to thewithin-person patterning of attributes andbehavior—the crux of the idiographic ap-proach and the uniqueness of the person—has long been recognized (e.g., Allport, 1937).On the other hand, this patterning is aggre-gated out and treated as if it undermined thebasic phenomena of personality psychology.As a personologist (or clinician) I may be lessinterested in aggregating a child's total ag-gressiveness across situations than in notingthat she is aggressive with her sister, but not

with her brother, or is aggressive only whenteased in a particular way, but never whenin the presence of her father. In sum, aggre-gation may be ideal for canceling out manyinfluences and describing mean level differ-ences between individuals, but such lumpingis obtained at the cost of much valuable in-formation—often the most interesting infor-mation—about the individual in the partic-ular contexts in which he or she lives. Forthe clinician as well as for the personologist,aggregation is often the route to weak gen-eralizations about people in general but by-passes the uniqueness, specificity—and pre-dictability—of individuality to which a sci-ence of personology is ostensibly devoted.

The conceptual problems that must beaddressed when aggregating various types ofdata are essentially problems of psychologicalsimilarity. Evidence for higher average coef-ficients for temporal versus cross-situationalconsistency suggests that when the situationsare as close as possible to identical (i.e.,changed pnly by time), there is impressiveaverage stability. But when situations be-come even somewhat dissimilar, the patternsbecome more complex and uneven, averagecoefficients become much lower, and consis-tency can no longer be assumed. The needto search for ways to identify similarity thenbecomes manifest (see Lord, 1982, and Mag-nusson & Ekehammar, 1973, for interestingexamples). Few problems in psychology seemmore basic than that of psychological simi-larity, (e.g., Tversky, 1977), and its resolutionultimately should have much to say to thestudy of situational equivalences and the cat-egorization of behavior. Theory-guided ag-gregation requires identifying psychologicalequivalences and the psychological similarityamong situations, not just averaging every-thing that can be summed.

In our view, the challenging problems inthe consistency debate require more thansearching for significant correlations and rec-ognizing coherence in the obtained results.We need to understand why the obtainedcoherences emerge and when and why ex-pected coherences do not. The technologiesof psychometrics supply us with ample meth-ods for distilling the coherence among ourmeasures, for accentuating the mean levelsof individual differences we have identified,and for focusing on their gist. As our analysis


continues, we intend to employ these varioustechnologies in hope of fully illuminating thepsychological significance of these coher-ences. However, we simultaneously intendto pursue the oft-neglected alternative pathof attempting to understand the discrimi-nativeness that also clearly exists in our dataand that we believe demands an interactionalperspective that treats situations as sourcesof meaningful variance. For instance, in ourresearch at Carleton College, we plan tosearch for consistency at different levels ofabstraction-generality in the data, from themost "subordinate" or molecular to increas-ingly broad "superordinate" molar levels,guided by a cognitive prototype and hierar-chical-levels analysis of the sort proposed byCantor and Mischel (1979) and Cantor et al.(1982a). We also plan to explore the com-parative usefulness of measures specificallydesigned to tap such cognitive social-learningperson variables as the individual's relevantcompetencies, encodings, expectancies, val-ues, and plans (Mischel, 1973, 1979).

Such analyses should help illuminate pro-cesses that underlie both the significant andthe nonsignificant coefficients yielded by per-sonality research—the "uneven and erraticpatterns" of behavior that characterize per-son-situation interactions. Aggregation ofrepeated observations over occasions will aidin these analyses by providing a more ac-curate picture of the significant and nonsig-nificant links that characterize the data. Thereliability solution, rather than providing asimple answer to the issues raised regardingthe cross-situational consistency of behavior,highlights their complexity. Rather than re-solving the consistency debate, the reliabilitysolution underlines the need to seek alter-native conceptualizations of personality thatmight lead us to a better assessment, under-standing, and appreciation of both the co-herence and the discriminativeness of humanbehavior.

On Predicting More of the PeopleMore of the Time:

The Template-Matching Solution

Another solution for the consistency prob-lem is offered by Bern and Funder (1978).Their aim was to utilize Block's CaliforniaChild Q-Set to find a common language for

the description of both persons and situationsand to develop a template-matching tech-nique that allows one to examine the inter-face of person and situation characteristics.Because of the great promise it offers, andthe substantial attention it already has at-tracted, Bern and Funder's contribution mer-its careful analysis and a thorough exami-nation of its findings and implications. In thisanalysis the focus will be on the content areaof delay of gratification, the domain Bern andFunder chose to illustrate their approach, asit speaks to the consistency problem, andthus will deal only with their first study.

The Bem-Funder Study

Bern and Funder note the failure to obtainmore impressive cross-situational consis-tency coefficients in personality research gen-erally and for delay of gratification in partic-ular. They suggest that the inconsistenciesobtained are due to the erroneous equationof situations that are superficially similar butfunctionally different. To identify situationsthat are functionally dissimilar though ap-parently similar, they propose examining therated personality characteristics associatedwith the behavior in these situations. Theirultimate hope is to increase the predictivepower of trait information by matching theindividual's personality characteristics withthe "personality of the situation," defined byQ-sort ratings that supply "portraits" of boththe individual and the setting. They see theirwork as relevant to the consistency issue byshowing that situations that seem alike mayactually be dissimilar, as evidenced by dis-crepant patterns of Q-sort correlates. Wetherefore should not expect behavior in suchsituations to be highly intercorrelated. More-over, they argue that only when situationsare characterized by similar Q-sort portraitsshould we expect—and find—high intercor-relations among the behaviors displayed inthem.

Their procedure and results require closeconsideration. They exposed 29 preschoolchildren to a version of the traditional delay-of-gratification situation. Although Bern andFunder retained the basic features of the de-lay paradigm (e.g., Mischel & Ebbesen, 1970),there was a possibly important difference:The experimenter remained in the room with


the child during the delay period rather thanleaving the room as in the traditional para-digm. Correlations were computed betweena child's delay time and each of the 100 itemsin the Q-set (i.e., the parents' trait ratings ofthis child).

Some of the correlations they obtainedwere highly consistent with previous findingson the rated characteristics of children whoare high versus low in their ability to wait inthe standard delay situations (see Table 3,reproduced from Table 1 of Bern & Funder,1978, p. 490). Bern and Funder note the ex-istence of these expected correlations (e.g.,

Table 3Q-Item Correlates With Delay Scores

Item r

Positively correlated

Has high standards of performance forself .48***

Tends to imitate and take over thecharacteristic manners and behavior ofthose he or she admires .39**

Is protective of others .39**Is helpful and cooperative .36*Shows a recognition of the feelings of

others (empathic) .35*Is considerate and thoughtful of other

children .34*Develops genuine and close relationships .31*

Negatively correlated

Appears to have high intellectual capacityIs emotionally expressiveIs verbally fluent, can express ideas well

in languageIs curious and exploring, eager to learn,

open to new experiencesIs self-assertiveIs cheerfulIs an interesting, arresting childIs creative in perception, thought, work,

or playAttempts to transfer blame to othersBehaves in a dominating way with othersIs restless and fidgetySeeks physical contact with othersIs unable to delay gratification

-.62***-.56***

-.50***

-.49***-.47**-.43**-.43**

-.40**-.37**-.34*-.31*-.31*-.31*

Note. From "Prediciting More of the People More of theTime: Assessing the Personality of Situations" by DarylJ. Bern and David C. Funder, Psychological Review,1978, 85, 485-501. Copyright 1978 by the AmericanPsychological Association. Reprinted by permission.* p < .10 (two-tailed). ** p < .05 (two-tailed). *** p <.01 (two-tailed).

"Has high standards of performance" is re-lated to duration of delay, as earlier researchsuggests it should be). However, they em-phasize that the rest of the portrait

introduces a more dissonant note, a picture of the long-delaying child as not very intelligent, not verbally fluent,not eager to learn, or not open to new experiences—achild, moreover, who is not self-assertive, cheerful, in-teresting, or creative. The very strong negative relation-ship between delay and rated intelligence is particularlyinconsistent with theories of ego control, (p. 490; em-phasis added)

Bern and Funder then confront the con-sistency issue by comparing the Q-sort por-trait from their version of the delay situationwith the portrait that emerged from a studyof "gift delay" by Block (1977) in an effortto show why behavioral consistency has notbeen found in these two situations. In Block'sprocedure, the child sits in front of a gailywrapped gift that is not to be opened untilhe or she has completed a puzzle. The mea-sure of delay is the length of time the childwaits before reaching out and taking the gift.Bern and Funder compare their own resultswith the Q-sort items that were associatedwith long delay time on Block's measure andnote that the two Q-sort portraits t

show very little overlap, and the dull-passive-obedientcluster of items that emerged in our experiment is com-pletely absent from the Block data. What we have here,then, are two situations that appear conceptually equiv-alent but that are functionally quite different, and itwould appear that different subsets of children are de-laying in the two settings. Typically, one learns only thatbehavior across two theoretically similar situations isdisappointingly inconsistent. By collecting Q-sort data,however, one can see in exquisite detail the nature ofthat inconsistency and draw plausible inferences aboutits source in the nonoverlapping features of the settings,(p. 491)

The Bem-Funder approach promises tohelp resolve the consistency issue by identi-fying those situations that are psychologicallyequivalent rather than merely superficiallysimilar, but in fact, their data provide no ev-idence whatsoever for cross-situational con-sistency. The design of their study does notallow them even to make any estimates aboutpotential consistency because evidence forcross-situational consistency can only be ob-tained if the same subjects are observed inat least two or more situations. Rather, theirconclusions are based on a comparison of the


Q correlates of delay behavior in two differ-ent delay situations, for two different sets ofchildren, where samples were not evenmatched in age. Their paper thus raises theprospect of improving evidence for cross-sit-uational consistency in behavior, but it doesnot take the essential step of doing so. Thatis why the present authors proceeded to tryto fill this void and tested the Bem-Funderpromise empirically with an appropriate de-sign for assessing cross-situational consis-tency.

The Mischel-Peake Study

We attempted to replicate the Bem-Fun-der study, with some important additions.We exposed children to the Bem-Funderparadigm using identical procedures, thesame immediate and delayed rewards, andsubjects of the same age drawn from the samepopulation (Stanford University's Bing Nurs-ery School), whom we tested at about thesame time of year as those in the Bem-Fun-der study. These children also participatedin the standard delay paradigm (described inMischel & Ebbesen, 1970), which differsfrom the Bem-Funder procedure only in thatthe experimenter is absent during the delayperiod. This design made it possible for usto assess the consistency of the children'sdelay behavior across the two situations (ex-perimenter present versus absent) and tocompare the Q correlates for the two situa-tions systematically. The children partici-pated in the two situations in random orderand within a period of 3 weeks. To enhancereliability, the sample size was nearly twicethat in the Bem-Funder study.

Delay behavior in the two situations cor-related at a modest level (r = .22, ns). Ac-cording to Bern and Funder's reasoning, thislow level of association provides an ideal testof their proposition on two counts. First, thetwo situations appear to be similar (indeed,the two vary only in the experimenter's pres-ence versus absence—an aspect apparentlyso trivial that Bern and Funder employed itas a substitute for the standard Mischel delayparadigm). Second, behavior in the two sit-uations proves to be not strongly intercor-related empirically. Applying the Bem-Fun-der Q-sort approach to the two situations

should reveal the "distinctive features" thatmake these seemingly similar situations psy-chologically different, showing their "uniquepersonalities" and revealing why behavior inthe two situations is not more closely inter-correlated.

Our investigation yielded results that sur-prised us greatly. In spite of our considerableefforts to conduct an exact replication of theBem-Funder delay study, the data we ob-tained did not support their basic findings;it reversed them. The major distinctive Qcorrelates obtained by Bern and Funder areshown in the first column of Table 4; the Qcorrelates yielded by our replication are givenin the second column. The resulting portrait,far from replicating the Bem-Funder dis-tinctive features, is the opposite of the onethey drew. For example, on the identicalmeasure (i.e., with experimenter present), thehigh-delay child, rather than being intellec-tually dull and uneager to learn, appears tohave high intellectual capacity (r = .23, p <.10) and is curious and exploring, eager tolearn, and open to new experiences (r = .27,p < .05).5 Bern and Funder's strong negativecorrelations between high delay and verbalfluency; self-assertiveness; and creativity inperception, thought, work, or play are all lost(r = .06, .09, and .00, respectively). The re-maining "distinctive features" also fail toreach statistical significance.

Interestingly, it is only the distinctive fea-tures that are lost or reversed in the repli-cation. The Q-sort correlates that are theo-retically consistent with the previously estab-lished view of the "ego strength" correlatesof high delay remain intact, indicating thatthe delay measure employed had validityeven though the Q correlates reported as dis-tinctive by the Bem-Funder methodologyturned out to be unstable. Thus, traditionalego-strength correlates for the delay measure(Mischel, 1974) such as "has high standardsof performance for self and "develops gen-

5 Bern and Funder's criterion for determining theweightings of Q-sort items was statistical significance atthe p <. 10 level (two-tailed). We have adopted this non-traditional level here only for purposes of replicationand to compare the obtained Q-item correlates withtheirs. For all other correlations and analyses in this ar-ticle, we use the more traditional p < .05 (two-tailed)criterion unless otherwise noted.


Table 4Comparative Correlates for Significant"Distinctive" Items of the Bern and Funder(1978) Delay Situation

Item

Bern andFunder(1978)/-

Mischel andPeake

replication r

Appears to have highintellectual capacity -.62*** .23*

Is verbally fluent, canexpress ideas well inlanguage -.50*** .06

Is curious and exploring,eager to learn, opento new experiences -.49*** ,27*

Is self-assertive -;47** .09Is cheerful -.43** -.19Is an interesting,

arresting child -.43** -.17Is creative in perception,

thought, work, or play -.40** .00N 29 52

* p < .10 (two-tailed). ** p < .05 (two-tailed). *** p <.01 (two-tailed).

uine and close relationships" were of the ex-pected sort (r = .25, p < .05 and r = .23, p <.10, respectively). In the same vein, the rep-lication yielded the expected correlations be-tween high delay time and such ratings as"is competent and skillful" (r = .38, p <.005), "is planful, thinks ahead" (r = .36, p <.01), "uses and responds to reason" (r = .27,p < .05), and "is reflective, thinks and delib-erates before he or she speaks or acts" (r =.26,//<.10).

In sum, we not only found that the Bem-Funder delay situation and the classical delaysituation were not reliably differentiated bydistinctive Q-sort portraits but also foundthat the two situations share highly similarand familiar patterns of correlations. Thecorrelations obtained for the "standard de-lay" (experimenter absent) procedure weregenerally similar to those for the Bem-Fun-der (experimenter present) version and werebasically coherent with the expected ego-strength portrait (Mischel, 1966, 1974).Namely, these were low-level but statisticallysignificant associations between the stan-dard-delay measure and such ratings as "hashigh standards of performance for self;"uses and responds to reason"; "is curiousand exploring, eager to learn, open to newexperiences"; "can acknowledge unpleasant

experiences and admit to own negative feel-ings."

This total pattern of results has consider-ably more significance than a failure to rep-licate the Bem-Funder distinctive portraits.The data more importantly speak to thevalue of Bern and Funder's approach for re-solving the consistency issue in the directionsthey proposed. According to Bern and Fun-der, here are two situations revealed by theirmethod to have similar correlates and there-fore expected by their analysis to yield cross-situational consistency in behavior. Yet, infact, the intercorrelation for behavior in thesetwo situations is only .22 in our data. Dis-appointingly, the Bem-Funder approach, atleast in the delay context, provides no per-ceptible increment in the usual level of cross-situational consistency obtained, regardlessof whether one views that level as not veryhigh or not very low.

Although the obtained Q-sort portraitsproved neither stable nor distinctive, wemust still consider the utility of the template-matching technique for making predictionsabout the delay behavior of children withinour study. So far, we have considered onlythe Q-sort correlates of the measures, withoutapplying the template-matching proceduresproposed by Bern and Funder to the data.Their weighting procedure was intended toallow them to transport "the relevance in-formation . . . from one subject sample toanother" (p. 492) and thus to "generate tem-plates that retain their situation-character-izing properties while simultaneously char-acterizing most closely the particular sampleof individuals whose behavior is to be pre-dicted" (p. 492). Given that the distinctivefeatures of the Q-sort portrait for the Bem-Funder measure proved to be so unstable, itis not surprising that application of the orig-inal Bem-Funder template weights to ourreplication turned out to have no predictivevalue, producing a negligible correlation(r = .05, ns). More distressing is the fact thatwithin the replication sample itself, efforts tocross validate internally the template-match-ing technique did not succeed in spite of ourattempts to adhere closely to the original pro-cedures.6

6 Although our focus here is only on the applicationof the Bem-Funder template-matching approach to the


Old Cookbooks in New Templates?

In our view, the significance of the resultsdiscussed so far is not that they documentthe empirical limits of a particular study butrather that they illustrate the boundaries ofa more general approach with a formidablehistory and tradition. To understand the fail-ure of the Bem-Funder strategy for solutionof the consistency problem, we must beginby examining more closely the strategy itself.They hoped their strategy would be a newroute to uncover the "personality of the sit-uation"—in this case, the delay-of-gratifica-tion situation. But though they conceptual-ized and described their efforts in the lan-guage of person-situation interaction andhoped to capture the personality of the delaysituation, their method only described thecorrelates of performance in that situation(i.e., the ratings associated with duration ofwaiting time). To justify describing the Bem-Funder approach as an assessment of thepersonality of situations would require thatone at least also Q sort the situations inde-pendently, not just the people who performin them (Hoffman & Bern, Note 2). No setof correlates for performance, no matter howdescribed, can do more than illuminate someof the specific ways particular kinds of peopleare likely to perform or to seem differentfrom other kinds on particular measures. Thesearch to characterize what kinds of peopleare likely to do well on particular tasks, sit-uations, or measures is exactly what tradi-tional personality assessment—both clinicaland actuarial—has been about for decades(e.g., the classic work of Murray and theHarvard personologists). Such a search forperformance correlates to characterize themeaning of a response pattern and to predictindividual differences is the essence and sta-ple of the trait approach (Mischel, 1968).

This search can be undertaken through astrategy of construct validity (Cronbach &Meehl, 1955), in which the focus is on the

delay situation, there also have been attempts to replicatetheir effort to predict attitude change in the forced-com-pliance paradigm. These attempts at extended replica-tion so far also seem to have failed (Funder, 1979; Hoff-man & Bern, Note 2). A more situation-specific or "con-textual" procedure now has been substituted for theglobal Q-sort descriptions (Hoffman & Bern, Note 2)and seems promising, but it might be wise to await rep-lication of these new efforts before judging them.

development of a theory about what ac-counts for the behavior of interest. Or, likeBern and Funder's approach to revealingconsistency in delay of gratification, it canbe undertaken in an entirely empirical fash-ion, as in the actuarial and empirical keyingstrategies pioneered by Paul Meehl (e.g.,1956) and favored for many years in person-nel and psychodiagnostic assessment. The"cookbooks" and "atlases" devised two de-cades ago in this approach, for example, triedto assess the degree of fit between an indi-vidual and an ideal personality type (definedby a distinctive Minnesota Multiphasic Per-sonality Inventory [MMPI] test configura-tion) in order to predict a pattern of criterionperformance and characteristics associatedwith that ideal type in a particular setting(e.g., a Veterans Administration mental hy-giene clinic). It is worth remembering in de-tail Meehl's (1956) cookbook method for theprediction of personality descriptions and/orother data in particular situations. The logic—and fate—of his approach bears directly onthe solutions currently proposed by Bern andhis associates:

In the cookbook method, any given configuration (ho-lists please note—I said "configuration," not "sum"!) ofpsychometric data is associated with each facet (or con-figuration) of a personality description, and the closenessof this association is explicitly indicated by a number.This number need not be a correlation coefficient—itsform will depend upon what is most appropriate to thecircumstances. It may be a correlation, or merely anordinary probability of attribution, or (as in the empir-ical study I shall report upon later) an average Q-sortplacement, (p. 264)

Twenty-two years later, Bern and Funder(1978) wrote,

What is being proposed here, then, is that situations becharacterized as sets of template-behavior pairs, eachtemplate being a personality description of an idealized"type" of person expected to behave in a specified wayin that setting. The probability that a particular personwill behave in a particular way in a particular situationis then postulated to be a monotonically increasing func-tion of the match or similarity between his or her char-acteristics and the template associated with the corre-sponding behavior, (p. 486)

Note again that although their intent is tocharacterize situations, the operation is ac-tually to seek the correlates of performancein a specific situation. Both Bern and Funderand Meehl assert that the probability that asubject will do a specified act in a particular


situation (or be judged to have particularcharacteristics) depends on the degree ofmatch or similarity between that subject'sscores and a given configuration. The con-figuration is a "template" for Bern and Fun-der, a "profile" for Meehl. The behaviors (oroutcomes) for Bern and Funder are suchthings as delay time or attitude changeor adjustment to Stanford University. ForMeehl, they were Q-sort patterns, other per-sonality descriptions, or any other data (e.g.,duration of remission or adjustment to theMinnesota Veterans Administration hospi-tal). The method for determining the degreeof match between the subject and the con-figuration involves a particular weightingprocedure for Bern and Funder; it was alsoachieved quantitatively by Meehl but withthe recognition that the particular form (sim-ple correlation, regression, probability attri-bution) will depend on its appropriatenessfor the particular purpose at hand. And, likeBern and Funder, Meehl (1956, p. 267)searched for ideal types whose distinctiveprofiles (recipes) would be related to suchcriterion data as Q sorts in the cookbooks.

Although the language in the descriptionsof the two methods is somewhat different,the Bem-Funder "template-behavior pair"parallels the Meehl "configuration-person-ality description association." The two ap-proaches seem to overlap a good deal, in-cluding a heavy reliance on the same tech-nique—the Q sort. Indeed, Meehl (1956)illustrated his pioneering article on the cook-book method with a study that shows itsvalue for predicting therapists' Q sorts of pa-tients from their match to the cookbook'sMMPI curve types (substitute "templates").For example, for each patient who best fitsa given MMPI code, "we simply assign theQ-sort recipe found in the cookbooks as thebest available description. How accurate thisdescription is can be estimated (in the senseof construct validity) by Q correlating it withthe criterion therapist's description" (p. 268).Empirically, the initial yield was highly en-couraging, with a median validity coefficientof .69.

Once the close parallels are recognized, itbecomes clear that the Bem-Funder ap-proach shares both the strengths and theweaknesses of its predecessor. The strengths

are the bypassing of weak (poor, messy) the-ory-based predictions for the sake of neat,mechanically assisted empiricism.7 Theweakness is the opposite side of this samecoin, that is, the cost of blunt empiricism:Atheoretical approaches yield results thattend to seem promising at first but are no-toriously difficult to replicate. Assessors ex-cited by the first actuarial cookbooks andtheir strong predictions were soon sobered bythe frequent failure to replicate the config-uration-outcome patterns that at first seemedso useful. Particularly when the sample sizeis small, and when the associations are purelyempirical (with no theoretical hypotheses apriori), correlations with Q-sort items maybe extremely unstable and can vanish rap-idly, as we saw in the failure to replicate thedistinctive Bem-Funder portraits. If person-ality assessment has led to any firm conclu-sions, it is that it is generally not worth of-fering conclusions about actuarially obtainedperformance correlates unless they are basedon careful cross-validation.

Another truism emerging repeatedly fromMeehl's once-exciting cookbook approachand from the history of personality assess-ment more generally is that simple linearcombinations of single scale scores often turnout to be more accurate than complex, so-phisticated configural models, including theMeehl-Dahlstrom Rules (e.g., Goldberg,1965). Simple cooking with simple recipesgenerally works better than more esotericmethods in actuarial assessment. At a min-imum, it seems wise to compare the incre-ments that fancier (thus costlier) methods(like template matching) provide when testedagainst old-fashioned, basic fare (like linearregression to predict a particular behavior ina given situation from any cross-validateditems shown to have predicted it before). Itmay be less exciting, but often it works better(e.g., Tellegen, Kamp, & Watson, 1982).Complex weighting procedures, as in theBem-Funder template-matching technique,may inadvertently serve to compound error

7 Our analysis and comments in the present article arerestricted to Bern and Funder's (1978) first study, theonly one that addressed the consistency issue, not totheir efforts to use social-psychological theories to gen-erate predictors about individual differences in the rel-evant social situations.


(by weighting chance findings) and thus hurtrather than help the enterprise. For example,reviewing efforts to compare a number ofregression equations that varied in complex-ity from linear to higher order, Wiggins(1981) noted,As one might expect, there was a tendency for increasedpredictiveness to be associated with increased complex-ity of prediction models—in the sample from whichpredictor weights were derived. When these same equa-tions were applied to an independent sample, however,there was a tendency for decreased predictiveness to beassociated with increased complexity of models. In otherwords, cross-validation appears to wipe out any predic-tive gains that are apparent in a derivation sample. Per-haps there is a general principle here that should dis-courage psychologists from overfilling their data withcomplex equations, (p. 6)

In sum, the Bem-Funder attempt to re-solve the consistency issue held out an ex-citing prospect. We have analyzed it in detailto try to untangle the promise from the out-come. The prospect of a parallel languageand methodology for the study of personsand situations remains attractive, but Bernand Funder's efforts toward a solution of theconsistency issue in the delay domain proveddisappointing both in its method and in itsresults. When the sample is adequate, onefinds the typical, replicable, low-level, theory-consistent Q-sort correlates of delay behav-ior. No advance is made by use of their meth-ods to show replicable distinctive portraits,to demonstrate improved cross-situationalconsistency in the domain, or to explain itsphenotypic absence. The promise that theBem-Funder technique would allow dem-onstrations of impressive cross-situationalconsistency by identifying measures thathave similar rather than distinctive Q-sortcorrelates still awaits realization. We are pes-simistic about the prospect for this effort inits present form not only because of the datawe presented but because of the theoreticalconsiderations discussed and the fate of rel-evant efforts attempted repeatedly in thepast.

On Predicting Some of the People Some ofthe Time: Bern and Allen's (1974)

"Idiographic" Solution

In the third approach to the consistencyissue to be discussed here, Bern and Allen

(1974) argued that the low consistency coef-ficients so typically found in research reflectthe nomothetic fallacy of assuming that alltraits are relevant to all people. Instead, theyurged adopting an "idiographic stance,"studying only the subset of people for whoma given trait is relevant.8 For this purpose,Bern and Allen separated subjects into high-and low-variability groups, using subjects'self-reports about their variability and be-havior. They then tried to document thatlow-variability subjects are in fact more con-sistent than high-variability subjects on twotraits: friendliness and conscientiousness.

To assess cross-situational consistency,Bern and Allen used measures consistingmostly of global ratings of the subjects madeby the subject, a peer, and the subject's par-ents. In addition, On each trait dimension,several composited behavior measures wereemployed. Careful inspection of their cor-relational matrices reveals good support fortheir predictions on the rating data. For ex-ample, for subjects classified as low ratherthan high variability, raters agreed muchmore about the subject's level of conscien-tiousness. But, whereas the technique nicelyidentifies people for whom the correlationsamong the global ratings will be substantial(i.e., people about whose trait level ratersagree), the results are tenuous when behaviormeasures are intercorrelated. In fact, onthose few measures directly relevant to theissue of cross-situational consistency of be-havior, only one correlation was higher forsubjects rated a priori as "low variability."Thus, in their analysis of friendliness, the

8 Bern and Allen presented their work as "idio-graphic," and it is widely cited as exemplifying the powerof idiographic methodologies (e.g., Kenrick & String-field, 1980). There is a confusion of terminology here,however. Idiographic usually refers to the unique orga-nization of traits within individuals (Allport, 1937), notto the fact that not all characteristics may be relevantto all people. Because Bern and Allen's approach doesnot speak to wilhin-person trait organization, it seemsa misnomer to label it idiographic. Rather, their ap-proach rests on the assumption that a given trait di-mension may simply not be relevant for some people,and such irrelevance may be identified by selecting thosesubjects who rate themselves as highly variable on thatdimension. Instead of bearing on the idiographic-no-mothetic distinction, this approach, as noted elsewhere,is an instance of the classic moderator-variable strategy(Tellegen, Kamp, & Watson, 1982).


Table 5Interrater Agreement About Trait Standing for Low- and High-Variability Groups Determined bySelf-Report (Below Diagonal) and Ipsatized Variance (Above Diagonal) for Conscientiousness

MeasureSelf-

reportMother's

reportFather'sreport

Peer'sreport

Self-reportMother's reportFather's reportPeer's report

.56 (.20)

.74 (.28)

.66 (.39)

.47 (.29)

.76 (.29)

.71 (.29)

(.52)(.45)

.66 (-.12)

.54 (.50)

.75 (.35)

.44 (.24)

Note. Agreement ratings for high-variability subjects are in parentheses.

correlation between "spontaneous friendli-ness" and "group discussion friendliness"was .73 for the low-variability group, whereasthe same correlation for the high-variabilitygroup was .30. However, none of the threelow-variability versus high-variability com-parisons among behavior measures of con-scientiousness fell in the predicted direction.It would seem premature to offer conclusionsabout the efficacy of the Bern and Allen ap-proach as it speaks to the issue of behavioralconsistency on the basis of a single confirm-ing comparison (see Lutsky, Peake, & Wray,Note 3).

In view of the weight given to the Bern andAllen (1974) data in current theorizing aboutthe consistency issue, it seemed important totry to replicate their work as carefully as pos-sible, extending the number and types of be-havior measures obtained. For this purpose,their Cross-Situational Behavior Survey(CSBS) was administered to the 63 subjectsat Carleton College. Similar ratings of thestudents were obtained from their parentsand from a close friend, using a modifiedCSBS. Subjects were divided into high- andlow-variability groups for both traits on thebasis of their self-reported variability and onthe basis of the ipsatized variance index—thetwo techniques proposed by Bern and Allen.For convenience, as well as continuity withthe rest of our presentation, we will sum-marize the findings only for the conscien-tiousness domain. (Detailed analyses anddiscussion of these data for both traits are inPeake & Lutsky, Note 4.)

As already noted, the bulk of Bern andAllen's data consisted of interrater agreementacross CSBS trait ratings. Table 5 shows thaton these measures, their results are nicely

replicated regardless of the classification pro-cedure used. Raters agreed more about sub-jects classified as low variability by either ofthe Bern-Allen techniques. Using the self-re-ported variability procedure, the mean in-tercorrelation for low-variability subjects*was.68 compared to .22 for high-variability sub-jects. The comparable mean coefficients us-ing the ipsatized variance index were .56 and.39 for the low- and high-variability groups,respectively.

These findings replicate Bern and Allen'sfor those measures that worked best for them,providing support that' their techniques forclassifying low- versus high-variability peopleallow one to select those individuals forwhom raters will tend to agree when makingglobal personality judgments. But more rel-evant to the issues of behavioral consistencyare the cross-situational behavior data fromthe Carleton project summarized in previoussections on the applications of the reliabilitysolution to the conscientiousness data. Not-ing the .13 mean consistency coefficient ob-tained in the Carleton College data after ag-gregating over occasions, Bern and Allenmight reasonably argue that even perfect re-liability will be of little value as long as re-searchers proceed with the nomothetic as-sumption that all traits belong to all individ-uals. Rather, now that adequate reliability isestablished, the search for consistency mustadopt the idiographic stance and must selectfor study only that subset of individuals forwhom the particular trait is relevant: Greaterconsistency should be obtained, but only forthose people who are identified as low vari-ability on the trait.

To explore this possibility, separate cor-relation matrices were generated for the 19


measures of conscientiousness for both thehigh- and low-variability subgroups identi-fied with the Bern and Allen classificationprocedures. The summary coefficients in Ta-ble 6 suggest that the Bern-Allen classifica-tion procedures and approach provide noappreciable gain over the traditional yieldwhen one turns from data based on interrateragreements about the subject's conscien-tiousness to more direct measures of cross-situational consistency in the referent behav-iors. Thus, the mean cross-situational con-sistency coefficients for the low- versushigh-variability subgroups are .11 versus .14using the self-reported variability index and.15 versus .10 using the ipsatized index. Al-though Bern and Allen subtitled their article"The Search for Cross-Situational Consisten-cies in Behavior," it is precisely in this searchthat their technique fails to meet its promise.

What do these data imply? First, note thatthe current results parallel Bern and Allen'squite closely. In both studies, interrateragreement about the subjects' conscientious-ness was substantial for those students clas-sified as low variability, but this agreementwas not reflected in substantially highercross-situational consistency in their ob-served behavior. We are left then with a rep-licated paradox: Subjects classified as lowvariability (consistent) tend to show high lev-els of interrater agreement when rated onpersonality indexes by relevant others (in-tuitively implying some consistency), yetthey do not show appreciably higher levelsof cross-situational consistency in behavior—the very data on which their variability judg-ments were presumably based. We believethat this "replicable paradox" is a key com-ponent of the puzzle that needs to be solvedto help untangle the consistency problem.

Toward a Resolution of theConsistency Paradox

Reviewing the hoary history of the con-sistency paradox, Bern and Allen (1974)"appreciate the sense of dejS vu that mustcurrently be affecting psychology's elderstatesmen now that the 'consistency prob-lem' has suddenly been rediscovered" (p.507). Dejii vu may be experienced no,t onlyat the rediscovery of the consistency paradox

Table 6Overall Mean Correlations by VariabilityClassification for Correlations ReflectingInterrater Agreements and Cross-SituationalConsistency

Type of correlations

Type ofClassification

Nomothetic(all subjects)

Low variability(high variability)

Self-reportedIpsative

Interrateragreement

.52

.68 (.22)

.56 (.39)

Cross-situationalconsistency

.13

.11 (.14)

.15 (.10)

Note. Correlations for high-variability groups are in pa-rentheses.

but also at the contemporary proposals forits solution. In one sense, this feeling of fa-miliarity has a comfortable side: It is reas-surring to see continuity in the fundamentalquestions and struggles for progress in thefield of personality. Each of the "better meth-ods" proposals has a long history but also amajor lesson to teach, one that is too oftenforgotten. But if the study of personality isto be a cumulative enterprise in whichknowledge and insight are not merely recy-cled, then these lessons must be distinguishedfrom premature conclusions about theirtheoretical implications. In our view, theconsistency paradox may now be at the brinkof a resolution, if the relevant lessons of ourfield are properly read and integrated witha theoretical reconceptualization for whichthe outlines are already available.

The State of the Paradox

Consider again the proposed solutions re-viewed here. First, aggregation over occa-sions, as advocated by Epstein, is a necessarystep that will surely increase the reliability ofmeasures, as the Spearman-Brown Formulahas long predicted. Further aggregation,across response forms and especially acrosssituations, will serve to highlight stable meanlevels of behavior by eliminating the vari-ability due to contexts. We do not doubt theoccurrence of stable means, but we areequally impressed by the occurrence of sub-


stantial variance around those means. Sam-pling behavior extensively in a domain oftenallows useful predictions of individuals' ag-gregated mean levels of behavior in that do-main. The fact that relevant past behavior isoften the best predictor of future behavior isnot in doubt (Mischel, 1968). Just as the oc-currence of stable mean levels of behavior ina domain does not deny within-person vari-ability, so should the appreciation of suchvariability not be mistakenly read to preemptthe existence of stable personal qualities.

But as numerous analyses (including ours)have shown, such increases in reliability ofbehavior measures within specific situations,rather than assuring impressive increases inthe associations between them, suggest againthat the discriminativeness of behavior is avalid phenomenon rather than a reflectionof poor methodology. It is old wisdom thatthe prediction of single acts is as difficult andgenerally as unlikely as it is challenging. (Yetit would hardly be of trivial interest to be ableto predict single acts like suicide, coronarydeath, and homicide.) Usually we must settlefor trying to predict an average of repeatedobservations, to infer a tendency within agiven situation. With that goal, one samplesthe behavior of interest multiply in the sit-uation, as we did in the reported Carletonproject analyses. Such sampling of repeatedobservations within a given setting shouldnot be confused, however, with aggregationacross situations in the search for cross-sit-uational consistency. To aggregate cross sit-uationally is to circumvent the issue ratherthan to demonstrate its resolution. The riskis that such aggregation may tempt one tosubstitute self-evident psychometrics (whichmagnify even trivial consistencies by elimi-nating situational variance) for more com-plex (and less obvious) psychological analy-ses of the nature of perceived similarities andappropriate equivalence groupings in the or-ganization of behavior and the constructionof personality.

We share Bern and Funder's (1978) desirefor a language that can be used to describeboth persons and situations commensur-ately. But we doubt the viability of their ap-proach to this goal as they applied it to theirsearch for consistency in the domain of delayof gratification. Searching to identify the per-

sonality of situations and unraveling thestructure of person-situation interactions issurely a fundamental problem shared by ourfield. The long and rich history in the Meehlcookbook tradition shows that understand-ing situations will require more than iden-tifying the correlates of performance withinthem. In our view, an adequate resolution ofthe many issues raised in the pursuit of per-son-situation interaction will require a theo-retical reconceptualization of both person-ality and situation constructs themselves, notjust more clever methods for applying every-day trait terms to people's behavior in par-ticular contexts. We believe that such a re-conceptualization will unify the analysis ofperson characteristics with the analysis ofcognitive-learning processes and requires thatthe person and the situation be analyzed inlight of the same psychological principles andnot merely described with the same traitterms (e.g., Mischel, 1973). It also requiresa deeper analysis of the nature of person andsituation categories (Cantor et al., 1982a,1982b; Cantor & Mischel, 1979). In the ab-sence of an appropriate theoretical frame-work, the search for consistencies can be-come an ultimately uninteresting hunt forstatistically significant coefficients that ne-glects their psychological significance andtheir links to psychologically interesting pro-cesses.

Finally, our attempts to identify a subsetof individuals for whom conscientiousness\s relevant, and who will thus show appre-ciably more consistency across contexts,guided by Bern and Allen (1974), led to aproblematic conclusion. Like Bern and Al-len, we found clear support that raters agreewell with each other about people who seethemselves as generally consistent with re-gard to the particular dimension. Conversely,raters agree much less about the attributesof people who view themselves as highly vari-able on the relevant dimension. Less obvious,but more challenging theoretically, is thefinding that people's global perceptions oftheir own overall consistency or variabilityon a dimension do not appear to be closelyrelated to the observed cross-situational con-sistency of their behavior. Although inter-judge agreement was greater for people whosee themselves overall as consistent in con-


scientiousness, cross-situational consistencyin their behavior was not significantly greaterthan it was for those who see themselves asvariable or for the entire group as a whole.This pattern occurred in the Bern-Allen dataas well as in the Carleton data.

From Paradox to Paradox?

Our pursuit of cross-situational consis-tency in behavior through the use of the bet-'ter methods proposals has brought us fullcircle. We began by noting the paradox thatexists between our intuitions of consistencyin behavior on the one hand and researchthat documents specificity on the other. Wereviewed and examined the utility of threeof the most popular methodological refine-ments that have been proposed as possiblesolutions to the consistency paradox. Theend result of our conceptual and empiricalendeavors is another paradox that closelyparallels the one we set out to resolve. Theresults of the replication of the Bern andAllen work suggests that raters agree sub-stantially more about persons who identifythemselves as cross-situationally consistent.However, these individuals do not show sub-stantially greater cross-situational consis-tency in behavior than people identified asmore variable. Here, again, shared intuitionsabout persons do not agree with the data.

We might account for these results in avariety of alternative ways. On the one hand,one might suggest that the replicable paradoxresults from methodological problems com-monly associated with the use of behavioraldata and dismiss the behavioral results, rest-ing the case for personality structure on theimpressive findings among the rating data(e.g., Block, 1977). Alternatively, one mightargue that the behavioral data accurately re-flect the complex structure of behavior andthat the substantial interrater agreements re-flect shared theories about persons and otherheuristics that bias our judgments about thecoherences that actually exist in the behaviorof others (e.g., Chapman & Chapman, 1969;Mischel, 1968, 1979; Nisbett & Ross, 1980;Ross, 1977; Schneider, 1973; Shweder &D'Andrade, 1980). Both of these interpreta-tions have some merit. Nevertheless, grant-ing both the methodological problems of be-

havioral data and the existence of cognitiveeconomics, our perceptions of others are stillunlikely to be entirely illusory. They mayderive rather directly from the behavior ofthe individual but not from those aspects thatwe expect or to which the consistency debatehas pointed so far.

Components for a Reconceptualization

Our approach to understanding the con-sistency paradox is guided simultaneously bya cognitive social-learning conceptualizationof behavior organization (e.g., Mischel, 1973)and a cognitive prototype view of person cat-egorization (e.g., Cantor & Mischel, 1979).First, consider the nature of the regularitiesrevealed from the study of behavioral con-sistency. We read these data as repeatedlyshowing temporal stability more impres-sively than cross-situational consistency.Greater temporal'than cross-situational con-sistency seems sensible from the perspectiveof cognitive social-learning theory. Becausethe contingencies in a given situation oftenremain unchanged over time, stability overtime is expected and predicted in much socialbehavior (e.g., Bandura, 1969; Mischel, 1968).Moreover, from this perspective, temporalstability would be expected to the degree thatsuch qualities as the person's competencies,encodings, expectancies, values, and plansendure (Mischel, 1973). The pursuit of du-rable values and goals with stable skills andexpectations for long periods of time surelyinvolves coherent and meaningful pattern-ings among the individual's efforts and en-terprises. The degree of cross-situational con-sistency, however, might be high, low, orintermediate, depending on many consid-rations, including the structure of theperceived cross-situational contingencies andthe subjective equivalences among the di-verse situations sampled. Distinctive contin-gencies may be expected to occur even inslightly different situations, producing highdiscriminativeness cross situationally. If so,if cross-context behavioral discriminative-ness is the rule rather than the exception—the phenomenon rather than the error—thenthe search for consistency across situationswill continue to yield slim results.

But then how can we understand the other

750 WALTER MISCHEL ANP PHILIP K. PEAKE

side of the consistency paradox, the intuitiveconviction of consistency? Our attempts toemploy Bern and Allen's idiographic solutionto the Carleton College data showed that peo-ple who see themselves as consistent on adimension are indeed rated with greater in-terjudge agreement by others even though(and this is the key point) their behavior doesnot necessarily show appreciably greateroverall cross-situational consistency. Whatare the bases—the ingredients—of the seem-ingly pervasive and shared perception of con-sistency in a personality disposition if theperception is not related to the level of cross-situational consistency in the reliably ob-served referent behaviors? To try to answerthis question, we turn to the cognitive pro-totype approach (e.g., Cantor & Mischel,1979). Guided by cognitive theories of thecategorization of everyday objects (e.g.,Rosch, Mervis, Gray, Johnson, & Boyes-Bfaem, 1976), this prototype approach to thecategorization problem appreciates the real-ity of individual differences but seeks to re-conceptualize the nature of the consistenciesthey reflect in an interactional framework(Cantor et al., 1982a, 1982b; Cantor & Mis-chel, 1979). The prototype approach recog-nizes the especially fuzzy nature of naturalcategories and, along the lines first traced byWittgenstein (1953), searches not for any sin-gle set of features snared by all members ofa category but rather, for a family-resem-blance structure, a pattern of overlappingsimilarities. The recognition of fuzzy sets alsosuggests that categorization decision will beprobabilistic, with many ambiguous border-line cases that produce overlapping, fuzzyboundaries between the categories, and thatmembers of a category will vary in degree ofmembership (prototypicality).

The cognitive prototype approach appliedto the consistency paradox suggests that con-sistency judgments with respect to a categoryare made not by seeking the average of allthe observable features of a category but bynoting the reliable occurrence of some fea-tures that are central to the category, or moreprototypic. That is, we suggest that consis-tency judgments rely heavily on the obser-vation of central (prototypic) features so thatthe impression of consistency will derive notfrom average levels of consistency across all

the possible features of the category butrather from the observation that some centralfeatures are reliably (stably) present. Fromthis perspective, extensive cross-situationalconsistency may not be a basic ingredient foreither the organization of personal consis-tency in a domain or for its perception.

The Construction of Consistency

In accord with the cognitive prototypeview, we propose that the shared globalimpression of trait consistency (in self andin others) arises not primarily from the ob-servation of cross-situational consistency inrelevant behaviors. Rather, we propose thatto assess variability (versus consistency) withregard to a category of behavior people scanthe temporal stability of a limited numberof behaviors that are most relevant (centralor prototypic) to that category for them.Thus, the impression of consistency is basedsubstantially on the observation of temporalstability in those behaviors that are highlyrelevant (central) to the prototype but is in-dependent of the temporal stability of be-haviors that are not highly relevant to theprototype. Conversely, the perceptipn ofvariability arises from the observation oftemporal instability in highly relevant fea-tures.

To explore this hypothesis in the CarletonCollege data on conscientiousness, we pre-dicted that people who judge themselvesoverall as consistent across situations (lowvariability on Bern and Allen's, 1974, globalself-report measure) will show greater tem-poral stability but not greater cross-situa-tional consistency than those who view them-selves as less consistent.9 In addition, becausewe believe that the judgment of consistencyis independent of the temporal stability ofbehaviors that are not highly prototypical, wepredicted this difference in temporal stabilityto be more pronounced on the more proto-typic features of conscientiousness than onthe less prototypic features.

9 The measure of global self-perceived consistency wasthe subject's answer to the question (on a 0-6 scale),"How much do you vary from situation to situation inhow conscientious you are about daily matters and re-sponsibilities?"


To test these hypotheses, temporal-stabil-ity coefficients were obtained separately oneach of the behavioral measures employedat Carleton for subjects who rated themselvesas high (versus low) in variability. Subjectswho perceived themselves as highly consis-tent rather than as more variable across sit-uations had somewhat higher temporal sta-bility across all the behavior measures. Themean temporal-stability coefficients were .68and .55 for those high versus low in self-per-ceived consistency, respectively, t(32) = 1.82,p < .10 (two-tailed). There were no appre-ciable differences in the behavioral cross-sit-uational consistency of those who saw them-selves as high (r = .11) or low (r= .14) inconsistency.

Most interesting, and central to our hy-pothesis, is the linkage between the globalself-perception of consistency and the tem-poral stability of more prototypic behaviors.Ratings of prototypicality were available for17 of the 19 Carleton behavior measures andallowed us to divide these measures into the"more" and the "less" prototypical (at themedian of the total ratings for all items).10

Table 7 presents the links between the globalself-perception of consistency and the behav-ioral data, divided into more prototypic ver-sus less prototypic behaviors. The pattern ofresults was exactly as expected by the hy-pothesis. First, consider the more prototypicbehaviors. Students who saw themselves ashighly consistent in conscientiousness weresignificantly more temporally stable on theseprototypic behaviors than those who viewedthemselves as more variable (low variability,mean r = .71; high variability, mean r = .47),t( 15) = 2.97, p<.025 (two-tailed)." Thismean difference is reflected pervasively in thecomponent behaviors. On seven of the ninecomparisons between prototypic behaviorscomputed, subjects who perceived them-selves as highly variable showed significantlyless temporal stability (with no appreciabledifference on the other two comparisons). Incontrast to the clear and consistent differ-ences in temporal stability of prototypic be-haviors, there was no difference between theself-perceived low- and high-variability groupsin the mean temporal stability for the lessprototypic behaviors (r = .64 and .65, re-spectively). The two groups differed signifi-

Table 7Links Between Self-Perceived Consistencyand Behavior

Self-perceivedconsistency

Behavioral dataLow High

variability variability

Cross-situational consistencyMore prototypical .15 .13Less prototypical .09 .14

Temporal stabilityMore prototypical .71 .47Less prototypical .65 .64

Note. The coefficients reported are mean correlationcoefficients across all possible comparisons of the des-ignated type.

cantly on only one of the eight less prototypicbehaviors (with low-variable subjects show-ing greater stability), and there were no dif-ferences approaching significance on any ofthe other comparisons. Finally, as expected,there was no relation between self-perceivedconsistency and behavioral cross-situationalconsistency, regardless of the prototypicalityof the behaviors.

10 The prototypicality of the behavioral measures wasassessed by having subjects rate each of the 47 conscien-tiousness items on the original Cross-Situational Behav-ior Survey (CSBS) questionnaire for that item's rele-vance to "conscientiousness" using a 0-6 scale where0 signifies "not at all relevant" and 6 signifies "extremelyrelevant." Mean relevance (prototypicality) ratings werecomputed for each of the 47 items. From this largergroup of ratings it was possible to abstract the prototyp-icality ratings of 17 of the 19 behavioral measures em-ployed in the study. These 17 measures were then rankordered according to their degree of prototypicality anddivided into two groups (more prototypical and less pro-totypical), depending on whether they fell above or be-low the median level of prototypicality derived from thecomplete list of 47 items.

11 Prototypicality ratings for all 19 behaviors subse-quently were obtained from a second questionnairegiven to an independent sample of 40 Carleton students.They rated on a 7-point scale how well each behavioralreferent exemplifies the category. Analyses of these rat-ings that parallel the analyses just described provide evenstronger support for the hypothesis. For the more pro-totypic behaviors, the mean temporal stability coeffi-cients of the self-perceived low-variability versus high-variability groups were .68 and .31, respectively, /(17) =3.33, p < .01. Parallel analyses for the friendliness do-main yielded similar results in support of the hypothesisand are described in Peake (Note 1).


These results support the view that theimpression of consistency in behavior maybe rooted in temporally stable prototypic be-haviors rather than in pervasive overall cross-situational consistencies. The findings sug-gest that individuals judge their degree ofconsistency from the temporal stability of therelevant, more prototypic behaviors. Iriter-estingly, the two groupsjio not differ in tem-poral stability on the less prototypic behav-iors, suggesting that those behaviors do notenter into the judgment of one's variability.It seems then that the locus of the perceptionof variability may be in the temporal stabilityof highly prototypic behaviors, regardless ofcross-situational consistency. A tendency toovergeneralize from the observation of tem-poral stability in prototypic features to animpression of overall consistency would cer-tainly be congruent with other tendencies togo well beyond observations in social infer-ences and attributions (e.g., Mischel, 1979;Nisbett, 1980; Nisbett & Ross, 1980; Ross,1977; Tversky & Kahneman, 1974).

The consistency debate has been aptlycharacterized as reflecting a continuous con-flict between the findings of research and theconvictions of our intuitions (Bern & Allen,1974, p. 508). After reviewing the issues anddata on the debate, Bern and Allen concludedthat "in terms of the underlying logic andfidelity to reality, we believe that our intu-itions are right; the research, wrong" (p. 510).We hope that our present analysis helps toidentify some of the roots of the conflict andthe routes toward its resolution. We believethat both the intuitions and the research havevalidity, but they are based on different data.The intuitions of cross-situational consis-tency are grounded in data, but these data,we suggest, are not highly generalized cross-situational consistencies in behavior: Rather,we propose the intuitions about a person'sconsistency arise from the observation oftemporal stability in prototypical behaviors.The error is to confuse the temporal stabilityof key behaviors or central features with per-vasive cross-situational consistency and thento overestimate the latter, a common ten-dency hardly confined to the layperson. Ourcompelling intuitions are based on consis-tencies in behavior, but perhaps not on theconsistencies that the debate pursued for somany years.

The consistency paradox has been a puz-zling and persistent barrier in the search forpersonality structure. But a theory of per-sonality structure does not require everyoneto be characterized by high levels of pervasivecross-situational consistency in behavior. Itdoes require a structure for behavior: TheCarleton data suggest to us that such struc-ture may be rooted in the occurrence of tem-porally stable but cross-situationally discrim-inative features that are prototypic for theparticular behavior category as perceived bythe particular person. A close analysis of thepatterning and organization of such featureswithin individuals should be most interest-ing, and we plan such an analysis. We expectthat the most consistent and prototypic ex-emplars of a category like conscientiousnesswill be those individuals who stably exhibta number—but not necessarily many—of itsprototypic features, as they themselves definethat prototypicality. We expect that the par-ticular constellations of features will be idio-graphically patterned so that no individualsnecessarily share the identical configuration,although considerable between-person over-lap occurs. Although the exact pattern thatdefines conscientiousness may not be iden-tical for any two persons, each individualwho is characterized as consistently conscien-tious will display some of its features withtemporal stability, albeit with a distinctivecross-situational constellation.12 If so, theroute may be open not only for seeing theuniqueness of each personality (which per-sonologists have long appreciated) but alsofor ultimately understanding its commonstructure.

Conclusions

It is tempting to tire of the consistencydebate, to trivialize it by focusing on its ob-

12 If these hypotheses prove valid, they also would helpone understand why the issue of cross-situational con-sistency is a more serious problem for the nomothetictrait psychologist than for the layperson. To the degreethat each individual maintains reasonable temporal sta-bility in his or her distinctive pattern of prototypic be-haviors, the impression of pervasive consistency may bepreserved. Individuals thus may be perceived as consis-tent on their particular set of stable behaviors even ifthese behaviors do not map very well onto the total setof referents selected by the trait researcher to define hisor her dimension for all people.


vious qualities. Surely each person's behaviorshows some consistency, some discrimina-tiveness, some continuity, some change. Butthis tempting stance has nontrivial conse-quences, diverting attention from the seriousquestions that have been raised thoughoutthe debate by evidence of unexpected dis-criminativeness in behavior as a function ofcontext and the psychological situation. Un-fortunately, the debate has been difficult toresolve not only because the phenomena areexceedingly complex but because the dataavailable are often readily misread and theinterpretations far exceed their source. Thepresent analysis has attempted to clarifysome of the basic issues and to assess someof the proposals for progress while also of-fering an alternative direction. The data wehave presented, and-our analysis of the rel-evant research by others, suggest the follow-ing conclusions:

1. An extensive assessment of conscien-tiousness in college students showed thateven with reliable measures, based on mul-tiple observations of behavior aggregatedover occasions, or aggregated further overresponse modes within situations, mean cross-situational consistency coefficients were ofmodest magnitude (on average not exceedingabout r = .20). In contrast, impressive tem-poral stability was found (average r = .65) inthe same data. Although aggregation of mea-sures over occasions is a useful step in estab-lishing reliability, aggregation of measuresover situations may bypass rather than re-solve the problem of cross-situational con-sistency. The demonstration of impressivetemporal stability, but of only modest cross-situational consistency, was seen as con-gruent with extensive other research and withpredictions from cognitive social-learningtheory.

2. The Bem-Funder (1978) template-matching approach did not enhance thesearch for cross-situational consistency eitherin their original data or in our extended rep-lication. These results are not surprisinggiven the history of earlier approaches thatshared similar logic and method (e.g., cook-books for the MMPI). The prospect of a com-mensurate language for describing personsand situations remains attractive but cannotbe achieved by merely describing perfor-mance correlates in particular situations. A

potential alternative is a prototype analysisin which both persons and situations arecharacterized by the same theory-informedmethods (e.g., Cantor et al., 1982a, 1982b;Cantor & Mischel, 1979).

3. The Bern-Allen (1974) moderator-variable approach successfully identifies peo-ple who see themselves as consistent (lowrather than high variability) on a dimension:Other people, in turn, are more likely to agreewith each other and with the low-variabilitysubjects about their status on the dimension.Global self-report judgments of one's consis-tency, however, do not seem to be stronglylinked to cross-situational consistency in thereferent behaviors even when reliably mea-sured. Evidence for greater cross-situationalconsistency in behavior in "some of the peo-ple some of the time" was not found eitherin the Bern and Allen data or in our studyof conscientiousness.

4. Congruent with a cognitive prototypeapproach, we propose that the judgment oftrait consistency is strongly related to thetemporal stability of highly prototypic (butnot of less prototypic) behaviors. In contrast,the global impression of consistency may notbe strongly related to overall or average cross-situational consistency, even in prototypicbehaviors. Thus, the perception and organi-zation of personality consistencies, we sug-gest, will depend more on the temporal sta-bility of key features than on the observationof cross-situational behavioral consistency,and the former may be easily interpreted as iif it were the latter. Results supporting theseexpectations were presented.

Before firm generalizations can be reachedfrom the results of the Carleton study justsummarized, we will need thorough repli-cations and extensions to assure the robust-ness and breadth of the conclusions. Evenwith these strong reservations in mind, thepattern already obtained seems promising,pointing ultimately, we hope, to a structuralsolution for an enduring dilemma in ourfield. The consistency paradox may be par-adoxical only because we have been lookingfor consistency in the wrong place. If ourshared perceptions of consistent personalityattributes are indeed rooted in the observa-tion of temporally stable behavioral featuresthat are prototypic for the particular attrib-ute, the paradox may well be on the way to


resolution. Instead of seeking high levels ofcross-situational consistency—instead oflooking for broad averages—we may need,instead, to identify unique bundles or sets oftemporally stable prototypic behaviors—keyfeatures—that characterize the person evenover long periods of time but not necessarilyacross many or all possibly relevant situa-tions.

Reference Notes

1. Peake, P. K. Searching for consistency: The Carletonstudent behavior study. Unpublished doctoral disser-tation, Stanford University, 1982.

2. Hoffman, C, & Bern, D. J. Contextual templatematching: A progress report on predicting all of thepeople all of the time. Unpublished manuscript, Uni-versity of Alberta, Alberta, Canada, 1981.

3. Lutsky, N., Peake, P. K., & Wray, L. Inconsistenciesin the search for cross-situational consistencies in be-havior: A critique of the Bern and Allen study. Paperpresented at the meeting of the Midwestern Psycho-logical Association, Chicago, 1978.

4. Peake, P. K., & Lutsky, N. S. On predicting "sums"of the person, "sums" of the t ime. Manuscript in prep-aration, Stanford University, 1982.

References

Allport, G. W. Personality: A psychological interpreta-tion. New York: Holt, Rinehart & Winston, 1937.

Allport, G. W. Traits revisited. American Psychologist,1966, 21, 1-10.

Allport, G. W., & Vernon, P. E. Studies in expressivemovement. New York: Macmillan, 1933.

Bandura, A. Principles of behavior modification. NewYork: Holt, Rinehart & Winston, 1969.

Bandura, A. Reflections on self-efficacy. In S. Rachman(Ed.), Advances in behaviour research and therapy(Vol. 1). Oxford, England: Pergamon Press, 1978.

Bern, D. J., & Allen, A. On predicting some of the peoplesome of the time: The search for cross-situational con-sistencies in behavior. Psychological Review, 1974,81,506-520.

Bern, D. J., & Funder, D. C. Predicting more of thepeople more of the time: Assessing the personality ofsituations. Psychological Review, 1978, 85, 485-501.

Block, J. Advancing the psychology of personality: Par-adigmatic shift or improving the quality of research.In D. Magnusson & N. S. Endler (Eds.), Personalityat the crossroads: Current issues in interactional psy-chology. Hillsdale, N.J.: Erlbaum, 1977.

Byrne, D. Repression-sensitization as a dimension ofpersonality. In B. A. Maher (Ed.), Progress in exper-imental personality research (Vol. 1). New York: Ac-ademic Press, 1964.

Cantor, N., & Mischel, W. Prototypes in person per-ception. In L. Berkowitz (Ed.), Advances in experi-mental social psychology (Vol. 12). New York: Aca-demic Press, 1979.

Cantor, N., Mischel, W., & Schwartz, J. A prototypeanalysis of psychological situations. Cognitive Psy-chology, 1982, 14, 45-77. (a)

Cantor, N., Mischel, W., & Schwartz, J. Social knowl-edge: Structure, content, use and abuse. In A. Hastorf& A. Isen (Eds.), Cognitive social psychology. NewYork: Elsevier North-Holland, 1982. (b)

Carlson, R. Where is the person in personality research?Psychological Bulletin, 1971, 75, 203-219.

Chapman, L. J., & Chapman, J. P. Illusory correlationsas an obstacle to the use of valid psycho-diagnosticsigns. Journal of A bnormal Psychology, 1969, 74,271-280,

Cronbach, L. J., & Meehl, P. E. Construct validity inpsychological tests. Psychological Bulletin, 1955, 52,281-302.

Dudycha, G. J. An objective study of punctuality inrelation to personality and achievement. Archives ofPsychology, 1936, 29, 1-53.

Epstein, S. The stability of behavior: I. On predictingmost of the people much of the time. Journal of Per-sonality and Social Psychology, 1919,37, 1097-1126.

Epstein, S. The stability of behavior: II. Implications forpsychological research. American Psychologist, 1980,35, 790-806.

Fishbein, M., & Ajzen, I. Belief, attitude, intention andbehavior: An introduction to theory and research.Reading, Mass.: Addison-Wesley, 1975.

Fiske, D. W. The inherent variability of behavior. InD. W. Fiske & S. R. Maddi (Eds.), Functions of variedexperience. Homewood, 111.: Dorsey Press, 1961.

Funder, D. C. The person-situation interaction in atti-tude change. Unpublished doctoral dissertation,Standford University, 1979.

Goldberg, L. R. Diagnosticians vs. diagnostic signs: Thediagnosis of psychosis vs. neurosis from the MMPI.Psychological Monographs, 1965, 79 (9, Whole No.602).

Gulliksen, H. Theory of mental tests. New York: Wiley,1950.

Hartshorne, H., & May, M. A. Studies in deceit. NewYork: Macmillan, 1928.

Horowitz, L. M., Inouye, D., & Siegelman, E. Y. Onaveraging judges' ratings to increase their correlationwith an external criterion. Journal of Consulting andClinical Psychology, 1979, 47, 453-458.

Jaccard, J. J. Predicting social behavior from personalitytraits. Journal of Research in Personality, 1974, 7,358-367.

Kenrick, D. T., & Stringfield, D. O. Personality traitsand the eye of the beholder: Crossing some traditionalphilosophical boundaries in the search for consistencyin all of the people. Psychological Review, 1980, 87,88-104.

Lord, C. G. Predicting behavioral consistency from anindividual's perception of situational similarities.Journal of Personality and Social Psychology, 1982,42, 1076-1088.

Lord, F. M., & Novick, M. R. Statistical theories ofmental test scores. Reading, Mass.: Addison-Wesley,1968.

Magnusson, D., & Ekehammar, B, An analysis of situ-ational dimensions: A replication. Multivariate Be-havioral Research, 1973, 8, 331-339.

Magnusson, D., & Endler, N. S. Interactional psychol-ogy: Present status and future prospects. In D. Mag-nusson & N. S. Endler (Eds.), Personality at the cross-roads: Current issues in interactional psychology.Hillsdale, N.J.: Erlbaum, 1977.


McGowan, J., & Gormly, J. Validation of personalitytraits: A multicriteria approach. Journal of Personalityand Social Psychology, 1976, 34, 791-795.

Meehl, P. E. Wanted—A good cookbook. AmericanPsychologist, 1956, 11, 263-272.

Mischel, W. Theory and research on the antecedents ofself-imposed delay of reward. In B. A. Maher (Ed.),Progress in experimental personality research (Vol. 3).New York: Academic Press, 1966.

Mischel, W. Personality and assessment. New-York:Wiley, 1968.

Mischel, W. Toward a cognitive social learning recon-ceptualization of personality. Psychological Review,1973, 80, 252-283.

Mischel, W. Processes in delay of gratification. In L.Berkowitz (Ed.), Advances in experimental social psy-chology (Vol. 7). New York: Academic Press, 1974.

Mischel, W. On the future of personality measurement.American Psychologist, '1917, 32, 246-254.

Mischel, W. On the interface of cognition and person-ality: Beyond the person-situation debate. AmericanPsychologist, 1979, 34, 740-754.

Mischel, W. Introduction to Personality, (3rd ed.). NewYork: Holt, Rinehart & Winston, 1981.

Mischel, W., & Ebbesen, E. B. Attention in delay ofgratification. Journal of Personality and Social Psy-chology, 1970, 16, 329-337.

Mischel, W., Ebbesen, E. B., & Zeiss, A. R. Selectiveattention to the self: Situational and dispositional de-terminants. Journal of Personality and Social Psy-chology, 1973, 27, 129-142.

Moos, R. H., & Fuhr, R. The clinical use of social-eco-logical concepts: The case of an adolescent girl. Amer-ican Journal of Orthopsychiatry, 1982, 52, 111-122.

Newcomb, T. M. Consistency of certain extrovert-intro-vert behavior patterns in 51 problem boys. New York:Columbia University, Teachers College, Bureau ofPublications, 1929.

Nisbett, R. E. The trait construct in lay and professionalpsychology. In L. Festinger, (Ed.), Retrospections onsocial psychology. New York: Oxford UniversityPress, 1980.

Nisbett, R. E., & Ross, L. D. Human inference: Strat-egies and shortcomings of social judgment. EnglewoodCliffs, N.J.: Prentice-Hall, 1980.

Olweus, D. A critical analysis of the "modern" inter-actionist position. In D. Magnusson & N. S. Endler(Eds.), Personality at the crossroads: Current issuesin interactional psychology. Hillsdale, N.J.: Erlbaum,1977.

Patterson, G. R. The aggressive child: Victim and ar-chitect of a coercive system. In L. A. Hamerlynck,L. C. Handy, & E. J. Mash (Eds.), Behavior modifi-cation and families: 1. Theory and research. NewYork: Brunner/Mazel, 1976.

Patterson, G. R., & Moore, D. R. Interactive patternsas units. In M. Lamb, S. Suomi, & G. Stephenson(Eds.), Methodological problems in the study of socialinteraction. Madison: University of Wisconsin Press,1979.

Peterson, D. R. The clinical study of social behavior.New York: Appleton-Century-Crofts, 1968.

Raush, H. L. Paradox levels, and junctures in person-situation systems. In D. Magnusson & N. S. Endler(Eds.), Personality at the crossroads: Current issuesin interactional psychology. Hillsdale, N.J.: Erlbaum,1977.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M.,& Boyes-Braem, P. Basic objects in natural categories.Cognitive Psychology, 1976, 8, 382-439.

Ross, L. The intuitive psychologist and his shortcom-ings: Distortions in the attribution process. In L. Ber-kowitz (Ed.), Advances in experimental social psy-chology (Vol. 10). New York: Academic Press, 1977.

Rushton, J. P., Jackson, D. N., & Paunonen, S. V. Per-sonality: Nomothetic or idiographic? A response toKenrick and Stringfield. Psychological Review, 1981,88, 582-589.

Schneider, D. J. Implicit personality theory: A review.Psychological Bulletin, 1973, 79, 294-309.

Shweder, R. A., & D'Andrade, R. G. The systematicdistortion hypothesis. New Directions for Methodol-ogy of Social and Behavioral Science, 1980, 4, 37-58.

Tellegen, A., Kamp, J., & Watson, D. Recognizing in-dividual differences in predictive structures. Psycho-logical Review, 1982, 89, 95-105.

Thorndike, E. L. Principles of teaching. New York:Seiler, 1906.

Thurstone, L. L. The reliability and validity of tests. AnnArbor, Mich.: Edwards Brothers, 1932.

Tversky, A. Features of similarity. Psychological Review,1977, 84, 327-352.

Tversky, A., & Kahneman, D. Judgment under uncer-tainty: Heuristics and biases. Science, 1974, 185,1124-1131.

Vernon, P. E. Personality assessment: A critical survey.New York: Wiley, 1964.

Wiggins, J. S. Personality and prediction: Principles ofpersonality assessment. Reading, Mass.: Addison-Wesley, 1973.

Wiggins, J. S. Clinical and statistical prediction: Whereare we and where do we go from here? Clinical Psy-chology Review, 1981, 7, 3-18.

Wittgenstein, L. Philosophical investigations. New York:Macmillan, 1953.

Received November 4, 1981Revision received June 16, 1982

Beyond Deja Vu in the Search for Cross-Situational Consistency

Documents

Transcript of Beyond Deja Vu in the Search for Cross-Situational Consistency