
Mathematical Psychology

Jean-Claude Falmagne^a, Michael D. Lee^a

^a University of California, Irvine, 3151 Social Sciences Plaza A, Irvine CA 92697-5100

Abstract

The article reviews the field from a historical perspective, starting from the works of Fechner and Thurstone, and outlining the basic theoretical concepts in four traditional areas: learning, psychophysics, measurement and choice, and response times. More recent topics that have emerged from these areas are also briefly mentioned together with their most prominent contributors.

Keywords: Bayesian statistics, choice theory, Decision Field Theory, European Mathematical Psychology Group, Fechner’s Law, Law of Comparative Judgment, Markov chain processes, mathematical learning theory, measurement theory, psychophysics, sequential sampling models, signal detection theory, Society for Mathematical Psychology, stimulus sampling theory, utility theory

1. Introduction

Mathematics has been used in psychology for a long time, and for different purposes. When William James (1890/1950) writes

$$\text{Self-esteem} = \frac{\text{Success}}{\text{Pretensions}}$$

he is using mathematical notation metaphorically. Because no method is provided to measure the three variables, the equation cannot be taken literally. James means to convey, by a dramatic formula, the idea that if your pretensions increase without a corresponding increase of your success, your self-esteem will suffer. Such usages of mathematics, which Miller (1964) calls ‘discursive’, can be found in psychological discourse at least since Aristotle. Despite their historical interest, we shall not review such metaphorical uses of mathematics here (see, instead, Boring 1950 or Miller 1964).

Preprint submitted to Encyclopedia of the Social and Behavioral Sciences April 16, 2012

In this article, we reserve the term ‘mathematical psychology’ for the elaboration and the testing of mathematical theories and models for behavioral data. In principle, such a theory entails an economical representation of a particular set of data in mathematical terms, where ‘economical’ means that the number of free parameters of the theory is substantially smaller than the number of degrees of freedom in the data. In that sense, mathematical psychology plays for behavioral data the role that mathematical physics or mathematical biology play for physics or biology, respectively. In the best cases, the theory is cast in probabilistic terms and is testable by standard statistical methods. The large majority of such mathematical theories for behavioral data have emerged from four partially overlapping traditional fields: psychophysics, learning, choice, and response times. Each of these fields is outlined below from the standpoint of the prominent mathematical models that have been proposed. Other topics to which mathematical psychologists have devoted much work are also mentioned.

2. The Precursors: Fechner and Thurstone

Gustav Theodor Fechner (1801–1887; see Boring 1950) was by training an experimental physicist with a strong mathematical background. While his interests were diverse and his contributions many, ranging from experimental physics to philosophy, we only consider him here as the founder of psychophysics. His main purpose was the measurement of ‘sensation’ in a manner imitating the methods used for the fundamental scales of physics, such as length or mass. Because ‘sensation’ could not be measured directly, Fechner proposed to evaluate the difference between the ‘sensations’ evoked by two stimuli by the difficulty of discriminating between them.

To be more specific, we introduce some notation. We write $x$, $y$, etc. for positive real numbers representing physical intensities measured on some ratio scale, such as sound pressure level. Let $P(x, y)$ be the probability that stimulus $x$ is judged to be louder than stimulus $y$. Cast in modern terms (cf. Luce and Galanter 1963; Falmagne 1985/2002), Fechner’s idea amounts to finding a real-valued function $u$ defined on the set of physical intensities, and a function $F$ (both strictly increasing and continuous) such that

$$P(x, y) = F[u(x) - u(y)]. \tag{1}$$

This equation is supposed to hold for all pairs of stimuli $(x, y)$ such that $0 < P(x, y) < 1$ (subjectively, $x$ is close to $y$). If such an equation holds, then the function $u$ can be regarded as a candidate scale for the measurement of ‘sensation’ in the sense of Fechner. A priori, it is by no means clear that a scale $u$ satisfying (1) necessarily exists. A necessary condition is the so-called Quadruple Condition: $P(x, y) \le P(x', y') \iff P(x, x') \le P(y, y')$, where the equivalence is assumed to hold whenever the four probabilities are defined. In the literature, the problem of constructing such a scale, or of finding sufficient conditions for its existence, has come to be labeled Fechner’s Problem. An axiomatic discussion of Fechner’s Problem can be found in Falmagne (1985/2002; see also Krantz et al. 1971).

For various reasons, partly traditional, psychophysicists often prefer to collect their data in terms of discrimination thresholds, which can be obtained as follows from the discrimination probabilities. We define a sensitivity function $\xi : (y, \nu) \mapsto \xi_\nu(y)$ by the equivalence $\xi_\nu(y) = x \iff P(x, y) = \nu$. The Weber function¹ $\Delta : (y, \nu) \mapsto \Delta_\nu(y)$ is then defined by the equation $\Delta_\nu(y) = \xi_\nu(y) - \xi_{.5}(y)$. A representation of these fundamental concepts of classic psychophysics is given in Figure 1, in a special case where $P(y, y) = .5$. For each value of $y$, the S-shaped function $x \mapsto P(x, y)$ is called a psychometric function.

Starting with Weber himself, the Weber function $\Delta$ has been investigated experimentally for many sensory continua. In practice, $\Delta_\nu(y)$ is estimated by stochastic approximation (cf. Robbins and Monro 1951; Wasan 1969; Levitt 1970) for one or a few criterion values of the discrimination probability $\nu$ and for many values of the stimulus $y$. A typical finding is that, for stimulus values in the midrange of the sensory continuum and for some criterion values $\nu$, $\Delta_\nu(y)$ grows approximately linearly with $y$, according to the equation:

$$\Delta_\nu(y) = y\,C(\nu), \tag{2}$$

where $C$ is a function depending on the criterion. The label Weber’s Law is attached to this equation. It is easily shown that (2) is equivalent to the homogeneity equation $P(\lambda x, \lambda y) = P(x, y)$ $(\lambda > 0)$, which in turn leads to $P(x, y) = H(\log x - \log y)$, with $H(s) = P(e^s, 1)$. This means that the function $u$ in (1) has the form

$$u(y) = A \log y + B, \tag{3}$$

where the constants $A > 0$ and $B$ arise from uniqueness considerations for both $u$ and $F$. Equation (3) has been dubbed Fechner’s Law. Our discussion indicates that, in the framework of (1), Weber’s Law and Fechner’s Law are equivalent. Much of psychophysics evolved from Fechner’s ideas.

¹ E. H. Weber (1795–1878), a colleague of Fechner, professor of anatomy and physiology at Leipzig.

[Figure 1 (image not reproduced). Recoverable labels: $y$, $\xi_\nu(y)$, $\Delta_\nu(y)$, the probability values 0.5, $\nu$, and 1, and the curve label ‘a psychometric function’.]

Figure 1: The Weber function $\Delta$ in the case where $P(y, y) = .5$ for all stimuli $y$; thus $\xi_{.5}(y) = y$. The S-shaped function $x \mapsto P(x, y)$ is referred to as a psychometric function.
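The link between Equations (1), (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes a logistic function for $F$ and an arbitrary slope `A = 4.0`, both chosen only for illustration, and verifies that a logarithmic scale $u$ yields the homogeneity property and a Weber function proportional to $y$.

```python
import math

def u(y, A=4.0, B=0.0):
    """Fechner's logarithmic scale u(y) = A log y + B (Eq. 3)."""
    return A * math.log(y) + B

def F(s):
    """Assumed choice of F: the logistic function (strictly increasing, continuous)."""
    return 1.0 / (1.0 + math.exp(-s))

def P(x, y, A=4.0):
    """Discrimination probability P(x, y) = F[u(x) - u(y)] (Eq. 1)."""
    return F(u(x, A) - u(y, A))

def xi(y, nu, A=4.0):
    """Sensitivity function: the intensity x such that P(x, y) = nu."""
    # Invert the logistic: u(x) - u(y) = log(nu / (1 - nu)), then solve for x.
    return y * math.exp(math.log(nu / (1.0 - nu)) / A)

def weber(y, nu, A=4.0):
    """Weber function Delta_nu(y) = xi_nu(y) - xi_.5(y)."""
    return xi(y, nu, A) - xi(y, 0.5, A)

# Under Weber's Law (Eq. 2) the ratio Delta_nu(y) / y equals C(nu) for every y.
for y in (1.0, 10.0, 100.0):
    print(y, weber(y, 0.75) / y)
```

With these assumptions the printed ratio is the same for every $y$, which is the content of Weber’s Law for this particular $F$.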

The most durable contribution of Louis Leon Thurstone (1887–1955) to mathematical psychology is his Law of Comparative Judgments (Thurstone 1927a,b; see Bock and Jones 1968), a cornerstone of binary choice theory closely related to Equation (1) of Fechner’s Problem. Thurstone supposed that a subject confronted with a choice between two alternatives $x$ and $y$ (where $x$, $y$ are arbitrary labels and do not necessarily represent numerical values) makes a decision by comparing the sampled values of two random variables $U_x$ and $U_y$ associated with the alternatives. Suppose that these random variables are independent and normally distributed, with means $\mu(x)$, $\mu(y)$, and variances $\sigma^2(x)$, $\sigma^2(y)$, respectively. Denoting by $\Phi$ the distribution function of a standard normal random variable and with $\mathbb{P}$ standing for the probability measure, this leads to

$$P(x, y) = \mathbb{P}(U_x > U_y) = \Phi\!\left(\frac{\mu(x) - \mu(y)}{\sqrt{\sigma^2(x) + \sigma^2(y)}}\right). \tag{4}$$

Assuming that all the random variables have the same variance $\sigma^2 = \alpha^2/2$, we obtain

$$P(x, y) = \Phi(u(x) - u(y)), \tag{5}$$

with $u(x) = \mu(x)/\alpha$ and $F = \Phi$, a special case of (1). Equations (4) and (5) are called Case III and Case V of the Law of Comparative Judgments, respectively. Thurstone’s model has been widely applied in psychophysics and choice. Other models in the same vein have been proposed by various researchers (e.g. Luce 1959; Luce and Suppes 1965). Thurstone’s other important contributions are in learning theory (e.g. Thurstone 1919), and especially Multiple-Factor Analysis (e.g., Thurstone 1947).
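As an illustration of Cases III and V, the sketch below computes the choice probability from the closed forms and compares it with a Monte Carlo estimate obtained by sampling the two normal random variables directly. The function names, parameter values, and seed are hypothetical and serve only the example.

```python
import math
import random
from statistics import NormalDist

def case_iii(mu_x, mu_y, var_x, var_y):
    """Case III: Phi((mu_x - mu_y) / sqrt(var_x + var_y)), independent normals."""
    return NormalDist().cdf((mu_x - mu_y) / math.sqrt(var_x + var_y))

def case_v(mu_x, mu_y, alpha):
    """Case V: common variance alpha**2 / 2, so that u = mu / alpha."""
    return NormalDist().cdf(mu_x / alpha - mu_y / alpha)

def simulate_choice(mu_x, mu_y, var_x, var_y, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that U_x exceeds U_y."""
    rng = random.Random(seed)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    wins = sum(rng.gauss(mu_x, sd_x) > rng.gauss(mu_y, sd_y) for _ in range(trials))
    return wins / trials

# With alpha = 1 the common variance is 0.5 and both cases give Phi(1).
print(case_iii(1.0, 0.0, 0.5, 0.5), case_v(1.0, 0.0, 1.0))
print(simulate_choice(1.0, 0.0, 0.5, 0.5))
```

The simulated proportion should lie close to the closed-form value, up to sampling error.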

3. The Beginning: Mathematical Learning Theory

Two papers mark the beginning of mathematical psychology as a distinct research field: one by Estes (1950) entitled Toward a Statistical Theory of Learning, and another by Bush and Mosteller (1951). These works were the highlights of a movement spearheaded by R.R. Bush and R.D. Luce at the University of Pennsylvania, and R.C. Atkinson, W.K. Estes and P. Suppes at Stanford, which set out to formalize mathematical learning theories in terms of stochastic processes, and especially, Markov processes (cf. Bharucha-Reid, 1960; Parzen, 1994). There were good reasons for such a development at that time. The previous decade had been plagued by fruitless controversies concerning the basic mechanisms of learning. While a considerable experimental literature on learning was available (cf. Hilgard, 1943), the statistical tools in use for the analysis of the data were poor, and the prominent theories ambiguous. When mathematics was used, it was metaphorically. Moreover, the scope of the theories was ambitious, covering a vast class of experimental situations loosely connected to each other conceptually. A typical example of such an endeavor is Hull (1943). By contrast, the Markov models developed by Bush, Estes and their followers were designed for specific experimental situations. Under the influence of Suppes, a philosopher of science from Stanford who played a major role in the development of mathematical psychology, these models were often stated axiomatically. As a consequence, the predictions of the models could in many cases be derived by straightforward mathematical arguments. Two classes of models were investigated.

3.1. Finite State Markov Chains

In these models, the basic idea is that the subject’s responses in a learning experiment are the reflection of some internal states, which coincide with the states of a finite Markov chain. The transitions of the Markov chain are governed by the triple of events (stimulus, response, reinforcement) occurring on each trial. In many situations, the number of states of the Markov chain is small. We sketch a simple case of the so-called one-element model, in which the chain has only two states, which we denote by $N$ and $K$. Intuitively, $N$ stands for the ‘naive’ state, and $K$ for the ‘cognizant’ state. On each trial of the experiment, the subject is presented with a stimulus and has to provide a response, which is either labeled as $C$ (correct) or as $F$ (false). For example, suppose that the subject has to identify a rule used to classify some cards into two piles. Say, a drawing on each card has either one or two lines, which can be either both straight or both curved, and either both vertical or both horizontal. The subject must discover that all the cards with curved lines, and only those, go on the left pile. The subject is told on each trial whether the chosen pile is the correct one.

In words, the four axioms of the model are as follows: [M1] the subject always begins the experiment in the naive state $N$; [M2] the probability of a transition from the naive state $N$ to the cognizant state $K$ is equal to some parameter $0 < \theta \le 1$, constant over trials, regardless of the subject’s response; [M3] in state $N$, the probability of a correct placement is equal to a parameter $0 < \alpha \le 1$, constant over trials; [M4] in state $K$, the response is always correct. The derivation of the model can either be based on a 2-state Markov chain with state space $\{N, K\}$, or on a 3-state Markov chain with state space $\{(N, C), (N, F), (K, C)\}$ (where $N$, $K$ denote the subject’s states and $C$, $F$ the responses). In either case, the derivations are straightforward. Notice that, from Axioms [M1] and [M2], the trial number of the first occurrence of state $K$ has a geometric distribution with parameter $\theta$. Writing $S_n$ and $R_n$ for the cognitive state and the response provided on trial $n = 0, 1, \ldots$ (thus, $S_n \in \{N, K\}$ and $R_n \in \{C, F\}$), we easily get, for $n = 0, 1, \ldots$,

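$$
\begin{aligned}
\mathbb{P}(S_n = N) &= (1 - \theta)^n, \\
\mathbb{P}(R_n = C \mid S_n = N) &= \alpha, \\
\mathbb{P}(R_n = C \mid S_n = K) &= 1, \\
\mathbb{P}(R_n = C) &= 1 - (1 - \theta)^n (1 - \alpha),
\end{aligned}
$$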

which implies, with $p_n = \mathbb{P}(R_n = C)$,

$$p_{n+1} = (1 - \theta)\,p_n + \theta. \tag{6}$$

A strong prediction of this model is that the number of correct responses recorded before an error occurring on trial number $n + 1$ should be binomially distributed with parameters $\alpha$ and $n$ (i.e., all the subject’s responses have been generated by the naive state). This prediction is surprisingly difficult to reject, at least in some situations (Suppes and Ginsburg, 1963).
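The one-element model is simple enough to simulate directly. The sketch below, with hypothetical parameter values, implements Axioms [M1] to [M4] for a single simulated subject and compares the closed-form learning curve with the recursion (6) and with simulated proportions of correct responses.

```python
import random

def p_correct_closed_form(n, theta, alpha):
    """P(R_n = C) = 1 - (1 - theta)**n * (1 - alpha)."""
    return 1.0 - (1.0 - theta) ** n * (1.0 - alpha)

def p_correct_recursion(n, theta, alpha):
    """Iterate p_{n+1} = (1 - theta) * p_n + theta (Eq. 6), starting from p_0 = alpha."""
    p = alpha
    for _ in range(n):
        p = (1.0 - theta) * p + theta
    return p

def simulate_subject(n_trials, theta, alpha, rng):
    """Responses ('C' or 'F') of one simulated subject on trials 0 .. n_trials - 1."""
    state = "N"  # [M1] every subject starts in the naive state
    responses = []
    for _ in range(n_trials):
        if state == "K":
            responses.append("C")  # [M4] always correct once cognizant
        else:
            responses.append("C" if rng.random() < alpha else "F")  # [M3]
            if rng.random() < theta:  # [M2] transition independent of the response
                state = "K"
    return responses

def simulated_learning_curve(n_trials, theta, alpha, subjects=20_000, seed=0):
    """Proportion of correct responses on each trial across simulated subjects."""
    rng = random.Random(seed)
    counts = [0] * n_trials
    for _ in range(subjects):
        for n, r in enumerate(simulate_subject(n_trials, theta, alpha, rng)):
            counts[n] += r == "C"
    return [c / subjects for c in counts]
```

For example, `simulated_learning_curve(6, 0.3, 0.5)` can be compared trial by trial with `p_correct_closed_form(n, 0.3, 0.5)`.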

3.2. Linear Operators Models

To facilitate the comparison, we take the same concept identification experiment as above, involving the classification of cards into two piles, and we consider a simple representative of this class of models. As before, we denote by $R_n$ the response on trial $n$. We take as the sample space the set of all sequences $(R_0, R_1, \ldots, R_n, \ldots)$, with $R_n \in \{C, F\}$ for $n = 0, 1, \ldots$. We also define

$$
\begin{aligned}
p_{\omega,0} &= \mathbb{P}(R_0 = C), \\
p_{\omega,n} &= \mathbb{P}(R_n = C \mid R_{n-1}, \ldots, R_0), \quad n = 1, 2, \ldots.
\end{aligned}
$$

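Let $0 < \theta \le 1$ be a parameter. The axiom of the model has two cases, for $n = 0, 1, \ldots$,

$$
\begin{aligned}
\text{[L1]} \quad p_{\omega,n+1} &= (1 - \theta)\, p_{\omega,n} + \theta && \text{if } R_n = F, \\
\text{[L2]} \quad p_{\omega,n+1} &= p_{\omega,n} && \text{if } R_n = C.
\end{aligned}
$$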

Thus, the model has the two parameters $p_{\omega,0}$ and $\theta$, and learning occurs only when false responses are provided. As in the case of the finite Markov chain models, many predictions could be derived from such models, which could then be tested on the type of learning data traditionally collected by the experimenters. For both classes of models, the results of such enterprises were often quite successful. Nevertheless, interest in such models waned during the sixties, at least for learning situations, because researchers gradually realized that their simple mechanisms were far too primitive to capture all the intricacies revealed by more sophisticated analyses of learning data (see especially Yellott, 1969). General presentations of this topic can be found in Atkinson and Estes (1963) or Atkinson et al. (1965) for the finite state Markov learning models, and in Estes and Suppes (1959) or Sternberg (1963) for the linear operator models. A mathematical discussion of Markov processes for learning models is contained in Norman (1972).
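A simulation makes the contrast with the one-element model concrete. The following sketch assumes that the operator is applied on the basis of the response just given on the current trial, so that the probability of a correct response changes only after an error; the parameter names `p0` and `theta` and their values are illustrative.

```python
import random

def simulate_linear_operator(n_trials, p0, theta, rng):
    """Simulate one response sequence under the two-case linear operator axiom.

    Returns (responses, probs), where probs[n] is the probability of a
    correct response on trial n given the responses on earlier trials.
    """
    p = p0
    responses, probs = [], []
    for _ in range(n_trials):
        probs.append(p)
        correct = rng.random() < p
        responses.append("C" if correct else "F")
        if not correct:
            p = (1.0 - theta) * p + theta  # [L1] learning after a false response
        # [L2] after a correct response the probability is left unchanged
    return responses, probs

rng = random.Random(1)
responses, probs = simulate_linear_operator(15, p0=0.2, theta=0.25, rng=rng)
print("".join(responses))
print([round(p, 3) for p in probs])
```

Unlike the all-or-none jump of the one-element model, the probability of a correct response here increases gradually, by one step per error, and after $k$ errors equals $1 - (1 - \theta)^k (1 - p_{\omega,0})$.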

Despite the partial failure of these models to provide a satisfactory detailed explanation of traditional learning data, their role was nevertheless essential in the introduction of modern probability theory (in particular stochastic processes) and axiomatic methods in theoretical psychology, and in promoting the emergence of mathematical psychology as a field of research. In recent years, a renewed interest in learning theory has appeared on the part of some economists.

4. Psychophysics

The main research topics in psychophysics can be traced back to the ideas of Fechner and Thurstone outlined in Section 2 of this article. Fechner’s method of measuring sensation is indirect, and based on the difficulty of discriminating between two stimuli. Under the impetus of S.S. Stevens, a psychologist from Harvard, different experimental methods for ‘scaling’ sensation became popular.

4.1. Direct Scaling Methods

In the case of the magnitude estimation method, Stevens (1957) asked subjects to make direct numerical judgments of the intensities of stimuli. For example, a subject may be presented with a pure tone of some intensity $x$ presented binaurally, and would be required to estimate the magnitude of the tone on a scale from 1 to 100. Typically, each subject would only be asked to provide one or a couple of such estimations, and the data of many subjects would be combined into an average or median result which we denote by $\phi(x)$. In many cases, these results would be fit reasonably well by the so-called Power Law $\phi(x) = \alpha x^\beta$ or such variants as $\phi(x) = \alpha x^\beta + \gamma$ and $\phi(x) = \alpha (x - \gamma)^\beta$. In the cross modality matching method, the subject is presented with a stimulus from one sensory continuum (e.g., loudness), and is required to match its intensity with that of some other stimulus, from a different sensory continuum (e.g., brightness). Power laws were often also obtained in these situations. In the context of a discussion concerning the measurement of sensation, the difference of forms between (3) and the Power Law was deemed important. While not much mathematical theorizing was involved in any particular application of these ideas, a real challenge was offered by the need to construct a comprehensive theory linking all important aspects of psychophysical methods and data. The most ambitious effort in this direction is due to Krantz (1972). For a slightly different approach to direct scaling, see Anderson (1981).
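Because the basic Power Law $\phi(x) = \alpha x^\beta$ is linear in log-log coordinates, its two parameters can be estimated by ordinary least squares on the logarithms. The sketch below is one common convenience for doing so, not a procedure described in the article; the data are synthetic and the function name is hypothetical.

```python
import math

def fit_power_law(xs, phis):
    """Least-squares fit of log(phi) = log(alpha) + beta * log(x).

    Returns the pair (alpha, beta). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_phi = [math.log(p) for p in phis]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_phi = sum(log_phi) / n
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_phi) for a, b in zip(log_x, log_phi))
    beta = s_xy / s_xx
    alpha = math.exp(mean_phi - beta * mean_x)
    return alpha, beta

# Synthetic, noise-free magnitude estimates generated with alpha = 2 and beta = 0.6.
intensities = [1.0, 2.0, 4.0, 8.0, 16.0]
estimates = [2.0 * x ** 0.6 for x in intensities]
print(fit_power_law(intensities, estimates))
```

The variants with an additive or subtractive constant $\gamma$ are no longer linear in the logarithms and would require a nonlinear fitting method.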

4.2. Functional Equation Methods

Because the data from psychophysical experiments are typically noisy, the theoretician may be reluctant to make specific assumptions regarding the form of some functions entering in the equations of a model. An example of such a model is (1), in which the functions $u$ and $F$ are not specified a priori. In such cases, the equations themselves may sometimes specify the functions implicitly. For instance, if we assume that both Weber’s Law and (1) hold, then Fechner’s Law must also hold, that is, the function $u$ must be logarithmic, as in (3). Many more difficult cases have been analyzed (see Aczel et al., 2000, for the application of functional equation methods in the behavioral sciences).

4.3. Signal detection theory

Response strategies are often available to a subject in a psychophysical experiment. Consider a situation in which the subject must detect a low intensity stimulus presented over a background noise. On some trials, just the background noise is presented. The subject may have a bias to respond “YES” on some trials even though no clear detection occurred. This phenomenon prevents a straightforward analysis of the data because some successful “YES” responses may be due to lucky guesses. A number of ‘signal detection’ theories have been designed for parsing out the subject’s response strategy from the data. The key idea is to manipulate the subject’s strategy by systematically varying the payoff matrix, that is, the system of rewards and penalties given to the subject for his or her responses. The outcomes fall into four categories: correct detection or ‘hit’; correct rejection; incorrect detection or ‘false alarm’; and incorrect rejection or ‘miss’. An example of a payoff matrix is displayed in Table 1, in which the subject collects 4 monetary units in the case of a correct detection, and loses 1 such unit in the case of a false alarm.


Table 1: An example of a payoff matrix. The subject collects 4 monetary units in the case of a correct detection (or hit).

                     Response
    Stimulus    Yes                   No
    Yes         4 (Hit)               -2 (Miss)
    No          -1 (False alarm)      3 (Correct rejection)

For any payoff matrix $\theta$, we denote by $p_s(\theta)$ and $p_n(\theta)$ the probabilities of a correct detection and of a false alarm, respectively. Varying the payoff matrix $\theta$ over conditions yields estimates of points $(p_n(\theta), p_s(\theta))$ in the unit square. It is assumed that these points lie on a Receiver-Operator-Characteristic (ROC) curve representing the graph of some ROC function $\rho : p_n(\theta) \mapsto p_s(\theta)$. The function $\rho$ is typically assumed to be continuous and strictly increasing. The basic notion is that the subject’s strategy varies along the ROC curves, while the discriminating ability varies across these curves. The following basic random variable model illustrates this interpretation. Suppose that to each stimulus $s$ is attached a random variable $U_s$ representing the effect of the stimulus on the subject’s sensory system. Similarly, let $U_n$ be a random variable representing the effect of the noise on that system. The random variables $U_s$ and $U_n$ are assumed to be independent. We also suppose that the subject responds “YES” whenever some threshold $\lambda_\theta$ (depending on the payoff matrix $\theta$) is exceeded. We obtain the two equations

$$p_s(\theta) = \mathbb{P}(U_s > \lambda_\theta), \qquad p_n(\theta) = \mathbb{P}(U_n > \lambda_\theta). \tag{7}$$

The combined effects of detection ability and strategy on the subject’s performance can be disentangled in this model, however. Under some general continuity and monotonicity conditions and because $U_s$ and $U_n$ are independent, we get

$$\mathbb{P}(U_s > U_n) = \int_{-\infty}^{\infty} \mathbb{P}(U_s > \lambda)\, d\,\mathbb{P}(U_n \le \lambda) = \int_0^1 \rho(p)\, dp, \tag{8}$$

with $\rho$ the ROC function and after changing variable from $\lambda$ to $p_n(\lambda) = p$. Thus, for a fixed pair $(s, n)$, the area under the ROC curve, which does not depend upon the subject’s strategy, is a measure of the probability that $U_s$ exceeds $U_n$. Note that (8) remains true under any arbitrary continuous strictly increasing transformation of the random variables. For practical reasons, specific hypotheses are often made on the distributions of these random variables, which are (in most cases) assumed to be Gaussian, with expectations $\mu_s = E(U_s)$ and $\mu_n = E(U_n)$, and a common variance equal to 1. Replotting the ROC curves in (standard) normal-normal coordinates, we see that each replotted ROC curve is a straight line with a slope equal to 1 and an intercept equal to $\mu_s - \mu_n$.

Obviously, this model is closely related to Thurstone’s Law of Comparative Judgments. Using derivations similar to those leading to (4) and (5) and defining $d'(s, n) = \mu_s - \mu_n$, we obtain $\mathbb{P}(U_s > U_n) = \Phi\left(d'(s, n)/\sqrt{2}\right)$, an equation linking the basic signal detectability index $d'$ and the area under the ROC curve. The index $d'$ has become a standard tool not only in sensory psychology, but also in other fields where the paradigm is suitable and the subject’s guessing strategy is of concern. Multidimensional versions of this Gaussian signal detection model have been developed. Various other models have also been considered for such data, involving either different assumptions on the distributions of the random variables $U_s$ and $U_n$, or even completely different models, such as ‘threshold’ models (Krantz, 1969). Presentations of this topic can be found in Green and Swets (1974) and MacMillan and Creelman (2004).
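The relation between the area under the ROC curve and $d'$ can be verified numerically for the equal-variance Gaussian model. The sketch below sweeps the response criterion to trace the ROC curve as in (7), integrates it with the trapezoidal rule as in (8), and recovers $d'$ from a single (false alarm, hit) pair using the unit-slope property in normal-normal coordinates. Function names, the integration range, and the step count are illustrative choices.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def roc_point(criterion, mu_s, mu_n):
    """(false-alarm probability, hit probability) for a given criterion (Eq. 7)."""
    p_s = 1.0 - PHI.cdf(criterion - mu_s)
    p_n = 1.0 - PHI.cdf(criterion - mu_n)
    return p_n, p_s

def area_under_roc(mu_s, mu_n, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal approximation of the area under the ROC curve (Eq. 8)."""
    # Lowering the criterion from hi to lo moves the ROC point from (0, 0) to (1, 1).
    criteria = [hi - (hi - lo) * i / steps for i in range(steps + 1)]
    points = [roc_point(c, mu_s, mu_n) for c in criteria]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def d_prime(hit_rate, false_alarm_rate):
    """d' as the difference of the z-transformed hit and false-alarm rates."""
    return PHI.inv_cdf(hit_rate) - PHI.inv_cdf(false_alarm_rate)

mu_s, mu_n = 1.0, 0.0
print(area_under_roc(mu_s, mu_n), PHI.cdf((mu_s - mu_n) / math.sqrt(2.0)))
p_n, p_s = roc_point(0.3, mu_s, mu_n)
print(d_prime(p_s, p_n))
```

Any criterion value gives the same recovered $d'$, which reflects the claim that the index does not depend on the subject’s strategy under these distributional assumptions.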

Mathematical models for ‘multidimensional’ psychophysics were also developed. One approach takes the guise of ‘Geometric representations of perceptual phenomena’, which is the title of a seminal volume on the topic (Luce et al., 1995; see, in particular, Indow 1995). Another approach emphasizes psychological dissimilarity as the foundational concept (Dzhafarov, 2011).

5. Measurement and Choice

Because they were preoccupied with the scientific bases of their discipline, a number of mathematical psychologists have devoted considerable efforts to the elucidation of the foundation of measurement theory, that is, the set of principles governing the use of numbers in the statement and discussion of scientific facts and theories. An account of the results can be found in the three volumes of ‘Foundations of Measurement’ (Krantz et al., 1971; Luce et al., 1990; Suppes et al., 1989).

The literature on Choice Theory is extensive. Contributors come from diverse fields including mathematical psychology, but also microeconomics, political science and business, for example, the latter two concerned with the study of voter or consumer choice. Early on, the literature was dominated by Thurstone’s Law of Comparative Judgments, which still remains influential. Many other situations and models have been analyzed, however, and we only give a few pointers here. A generalization of Thurstone’s model is obtained by dropping the assumption of normality of the random variables $U_x$ and $U_y$ in the last part of (4). Despite many attempts, the problem of characterizing this model in terms of conditions on the binary choice probabilities, posed by Block and Marschak (1960), is still unsolved. In other words, we do not know which set of necessary and sufficient conditions on the choice probabilities $P(x, y)$ guarantees the existence of the random variables $U_x$, $U_y$ satisfying the first part of (4) for all $x$ and $y$ in the choice set. A number of partial results have been obtained, however.

In the multiple choice paradigm, the subject is presented with a subset $Y$ of a basic finite set $\mathcal{X}$ of objects and is required to select one of the objects in $Y$. We denote by $P(x; Y)$ the probability of selecting $x$ in $Y$. By abuse of notation, we also write $P(X; Y) = \sum_{x \in X} P(x; Y)$ for $X \subseteq Y$. Suppose that all the choice probabilities are positive ($P > 0$). The Choice Axiom, proposed by Luce (1959), states that, for all $Z \subseteq Y \subseteq W \subseteq \mathcal{X}$, we have

$$P(Z; Y)\,P(Y; W) = P(Z; W). \tag{9}$$

Defining the function $v : x \mapsto P(x; \mathcal{X})$, (9) yields immediately $P(x; Y) = v(x)\left[\sum_{y \in Y} v(y)\right]^{-1}$ for all $Y \subseteq \mathcal{X}$ and $x \in Y$. This model plays an important role in the literature. In the binary case, it has an interpretation in terms of random variables as in the Thurstone model, but these random variables, rather than being Gaussian, have a negative exponential distribution.
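The ratio form of the choice probabilities and the Choice Axiom (9) can be illustrated with a few lines of code. In this sketch the scale values in `v` are arbitrary positive numbers chosen for the example, and the helper names are hypothetical.

```python
def luce_choice(x, Y, v):
    """P(x; Y) = v(x) / sum of v(y) over y in Y, for x in Y."""
    return v[x] / sum(v[y] for y in Y)

def luce_set_choice(Z, Y, v):
    """P(Z; Y): probability that the chosen object belongs to Z, with Z a subset of Y."""
    return sum(luce_choice(z, Y, v) for z in Z)

# Hypothetical scale values on a four-object set.
v = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
Z, Y, W = ("a",), ("a", "b"), ("a", "b", "c")

lhs = luce_set_choice(Z, Y, v) * luce_set_choice(Y, W, v)
rhs = luce_set_choice(Z, W, v)
print(lhs, rhs)  # both sides of Eq. (9)
```

A consequence visible in the code is that the ratio `luce_choice("a", Y, v) / luce_choice("b", Y, v)` equals `v["a"] / v["b"]` for every offered set `Y` containing both objects.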

In the general case of such a random variable model for the multiple choice paradigm, we simply suppose that to each $x$ in $\mathcal{X}$ is attached a random variable $U_x$ such that, for all subsets $X$ of $\mathcal{X}$ and all $x$ in $X$, we have

$$P(x; X) = \mathbb{P}(U_x = \max\{U_y \mid y \in X\}). \tag{10}$$

The general characterization problem for this model has been solved by Falmagne (1978), who states necessary and sufficient conditions for the existence of the random variables satisfying (10). His paper also contains a uniqueness result. As in the binary case, specific assumptions can be made on the distributions of these random variables.

Other models, based on different principles, have also been proposed for the multiple choice paradigm. For example, in the elimination by aspects model, due to Tversky (1972), a subject’s choice of some object $x$ in a set $X$ is regarded as resulting from an implicit Markovian-type process gradually narrowing down the acceptable possibilities. For reviews of probabilistic choice models see Luce and Suppes (1965), or Suppes et al. (1989). A sample of some of the results can be found in Marley (1997).

6. Response Time Models

The time or latency of a response has been used as a behavioral index of the sensory or mental processes involved in the task since the inception of experimental psychology in the 19th century. Many mathematical models are based on Donders’ idea that the observed response time is a sum of a number of unobservable components including at least a sensory, a decision and a motor response part (Donders, 1868/1969). These models make various assumptions on the distributions of the component times, which are often taken to be independent. McGill (1963), for instance, assumes that the component times are all distributed exponentially and independently, with possibly different parameters, so that their sum is distributed as a general gamma random variable.
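The additive-components idea is easy to simulate. The sketch below draws a response time as a sum of independent exponential stage durations, in the spirit of the McGill (1963) assumption described above; the three rate values (per second) are hypothetical and stand for sensory, decision, and motor stages only by way of example.

```python
import random

def simulate_response_time(rates, rng):
    """One response time: the sum of independent exponential stage durations."""
    return sum(rng.expovariate(rate) for rate in rates)

def expected_response_time(rates):
    """Mean of the sum: each exponential stage contributes 1 / rate."""
    return sum(1.0 / rate for rate in rates)

rates = (20.0, 5.0, 10.0)  # hypothetical stage rates, in events per second
rng = random.Random(0)
samples = [simulate_response_time(rates, rng) for _ in range(50_000)]
print(sum(samples) / len(samples), expected_response_time(rates))
```

The simulated distribution is positively skewed, a qualitative feature that observed response time distributions commonly share.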

Another category of models is grounded on the assumption that the observed response results from an unobservable accumulation of evidence with absorbing boundaries. These models are known as sequential sampling models (Busemeyer and Rapoport, 1988; Laming, 1968; Link and Heath, 1975; Ratcliff, 1978; Vickers, 1979), and are based on sequential sampling methods from statistics, including the sequential probability ratio test (Wald and Wolfowitz, 1948). As psychological models, sequential sampling processes assume the latent accumulation of information or evidence from a stimulus, based on a series of samples. When the accumulated evidence reaches a boundary or threshold, the decision corresponding to that boundary is made, and the number of samples taken provides a measure of the time taken to make the decision. This means the models generate a joint distribution over both decisions and response times. There are also some theoretical accounts of confidence that can be related to the dynamics of sequential sampling processes, and so allow this third behavioral variable to be modeled (Vickers, 1979; Pleskac and Busemeyer, 2010). In some reaction time situations, the successive stimuli follow each other in a fast paced sequence. In such cases, prominent sequential effects appear. Markovian models explaining such sequential effects have been developed (Falmagne, 1965; Falmagne et al., 1975).


Within the general sequential sampling framework, there are many models and model classes making different assumptions about the way evidence is accumulated, and the form of the boundaries. Random-walk or drift diffusion models (Ratcliff, 1978; Ratcliff and McKoon, 2008) assume a single tally is maintained, race or accumulator models (Smith and Vickers, 1988; Vickers, 1970) maintain separate tallies for each alternative decision, and ballistic models (Brown and Heathcote, 2008) also maintain separate tallies but without stochastic variability. In most models, the sampling process that generates evidence from a stimulus is assumed to be homogeneous, and boundaries are also assumed to be constant throughout the decision-making process. There are, however, some exceptions (Smith, 2000), especially considering more general utilities incorporating deadlines or time pressure (Frazier and Yu, 2008), or considering mechanisms for the learning or self-regulation of the boundaries over trials (Busemeyer and Myung, 1992; Simen et al., 2006; Vickers, 1979).
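The single-tally case can be sketched with the simplest possible random walk: evidence moves up or down by one unit per sample until it reaches one of two symmetric boundaries. This is a discrete stand-in for the random-walk family, not any specific published model; `p_up` and `boundary` are hypothetical parameters. The simulated choice proportion is compared with the classical gambler’s-ruin absorption probability.

```python
import random

def random_walk_trial(p_up, boundary, rng):
    """Run one trial; return (decision, number of samples taken)."""
    evidence, steps = 0, 0
    while abs(evidence) < boundary:
        evidence += 1 if rng.random() < p_up else -1
        steps += 1
    return ("upper" if evidence > 0 else "lower"), steps

def simulate_walks(p_up, boundary, trials, seed=0):
    """Proportion of 'upper' decisions and mean number of samples per decision."""
    rng = random.Random(seed)
    results = [random_walk_trial(p_up, boundary, rng) for _ in range(trials)]
    p_upper = sum(decision == "upper" for decision, _ in results) / trials
    mean_steps = sum(steps for _, steps in results) / trials
    return p_upper, mean_steps

def p_upper_exact(p_up, boundary):
    """Probability of absorption at +boundary for a walk started at 0 with boundaries at +/-boundary."""
    ratio = (1.0 - p_up) / p_up
    return 1.0 / (1.0 + ratio ** boundary)

print(simulate_walks(0.6, 5, 20_000), p_upper_exact(0.6, 5))
```

Each trial yields both a decision and a sample count, which is the sense in which such models generate a joint distribution over choices and response times. Raising `boundary` trades longer decisions for a higher probability of reaching the boundary favored by the drift.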

Sequential sampling models were at first mostly applied to simple visual and perceptual decision-making phenomena, but have also found application in two other areas. One of these is in modeling higher-order cognitive processes such as categorization (Nosofsky and Palmeri, 1997) and judgment and preference phenomena, especially through Decision Field Theory (Busemeyer and Townsend, 1993). The other is in combining behavioral and neuroscientific data relating to the time course of simple decisions (Gold and Shadlen, 2007; Smith and Ratcliff, 2004). There is recent work on hierarchical extensions of the model, to account for individual differences in subjects, differences between stimuli, and other sources of variation beyond the level of a single decision trial (Rouder et al., 2003; Vandekerckhove et al., in press).

Early work in the modeling of response times relied on analytic results, making the comparison of models to data, and the estimation of parameters from data, tractable. Computational approaches have played a progressively more important role, but there continues to be important foundational mathematical development (Navarro and Fuss, 2009; Smith, 2000). Luce (1986) is a basic reference for the models and their applications, and Smith (2000) and Bogacz et al. (2006) provide reviews of many of the relevant mathematical and statistical results.


7. Other topics

From these four traditional areas, research in mathematical psychology has grown to include a wide variety of subjects. Current research includes many aspects of perception, cognition, memory, decision-making, and more generally information processing (cf. Dosher and Sperling, 1998). In some cases, the models can be seen as more or less direct descendants of those proposed by earlier researchers in the field. The multinomial process tree models (Batchelder and Riefer, 1999), for instance, are in the spirit of the Markovian models of the learning theorists.

However, as with response time modeling, the advent of powerful computers also gave rise to different types of models for which the predictions could be obtained by simulation, rather than by mathematical derivation. Representative of this trend are the parallel distributed processing or connectionist models (Rumelhart and McClelland, 1986), scaling and clustering models of stimulus representation (Shepard, 1980; Navarro and Griffiths, 2008) and a wide range of cognitive process models, including especially models of category learning (Ashby and Maddox, 2005; Kruschke, 2008; Nosofsky, 1992) and memory (Clark and Gronlund, 1996; Norman et al., 2008). Despite the computational emphasis, there continue to be important mathematical results for some of these models (Myung et al., 2007; Navarro, 2005).

Most recently, Bayesian methods have had a rapid and widespread influence over the areas traditionally studied by mathematical psychology. There are at least three types of Bayesian influence on the field (Lee, 2011). One type involves applying Bayesian statistics for data analysis, and to compare and evaluate models (Pitt et al., 2002; Rouder et al., 2009; Kruschke, 2011). Another influence is as a framework for extending cognitive process model accounts of complicated behavioral data, allowing the incorporation of individual differences, stimulus variability, covariate information, and other hierarchical and latent mixture structure (Lee, 2008; Rouder et al., 2007). The third influence Bayesian methods have had is as a theoretical metaphor for the mind, to contrast with alternatives like the information processing or connectionist metaphors. The Bayesian view treats the mind as applying Bayesian inference to sparse and noisy data to learn about richly structured mental hypothesis spaces (Chater et al., 2006). This approach has risen quickly in prominence, and many Bayesian models have been developed across a wide range of phenomena, including especially generalization, concept learning, and inductive reasoning (Anderson, 1991; Kemp and Tenenbaum, 2008; Griffiths and Tenenbaum, 2009; Tenenbaum and Griffiths, 2001).

Finally, we mention Psychometrics. The research in Psychometrics concerns the elaboration of statistical models and techniques for the analysis of test results. As suggested by the term, the main objective is the assignment of one or more numbers to a subject for the purpose of measuring some mental or physical traits. In principle, such a topic could be regarded as part of our subject. For historical reasons, however, this line of work has remained separate from mathematical psychology. The research on knowledge spaces is an alternative to psychometrics. Instead of measuring a person’s aptitude in a subject numerically, it uses stochastic algorithms to uncover the knowledge state of a person, which is the set of all concepts mastered by the person in that subject (Falmagne and Doignon, 2011).
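The idea of uncovering a knowledge state, as opposed to computing a score, can be sketched in a few lines. The example below is only a toy illustration under stated assumptions: the three-item domain, the list of feasible states, the uniform starting distribution, the multiplicative reweighting rule, and the `boost` parameter are all hypothetical choices made here, not the procedure of Falmagne and Doignon (2011).

```python
def update(distribution, item, correct, boost=4.0):
    """Reweight a probability distribution over knowledge states after one response.

    States consistent with the response (they contain the item when the answer is
    correct, or lack it when the answer is wrong) have their weight multiplied by
    `boost`; the result is renormalized. `boost` is a hypothetical tuning parameter.
    """
    weights = {}
    for state, prob in distribution.items():
        consistent = (item in state) == correct
        weights[state] = prob * (boost if consistent else 1.0)
    total = sum(weights.values())
    return {state: weight / total for state, weight in weights.items()}


# Hypothetical domain with items "a", "b", "c"; each state is a set of mastered items.
# The family is closed under union, e.g. {a, b} with {a, c} gives {a, b, c}.
states = [
    frozenset(),
    frozenset("a"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("abc"),
]

# Start from a uniform distribution over the feasible states (an assumption).
posterior = {state: 1.0 / len(states) for state in states}

# Simulated responses: items "a" and "b" answered correctly, item "c" incorrectly.
for item, correct in [("a", True), ("b", True), ("c", False)]:
    posterior = update(posterior, item, correct)

best = max(posterior, key=posterior.get)
print(sorted(best), round(posterior[best], 3))
```

The output of such an assessment is a set of items rather than a number, which is the contrast with psychometric scoring drawn in the paragraph above.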

8. The Journals, the Researchers, the Society

The research results in mathematical psychology are mostly published in specialized journals such as the Journal of Mathematical Psychology, Mathematical Social Sciences, Psychometrika, Econometrica and Mathematiques, Informatique et Sciences Humaines. Some of the work also appears in mainstream publications: Psychological Review or Psychonomic Bulletin & Review. Early on, the research was typically produced by psychologists, and the work often had a strong experimental component. Over the last couple of decades, other researchers have become interested in the field, coming especially from economics, applied mathematics, and machine learning.

The Society for Mathematical Psychology (SMP) was founded in 1979. The society manages the Journal of Mathematical Psychology, and organizes a yearly meeting gathering several hundred participants coming from all over the world. The European Mathematical Psychology Group (EMPG) is an informal association of about one hundred scientists meeting every summer in some European university. The first meeting was in 1971.

9. Cross References

43013. Bayesian theory, history of applications; 43017. Computational approaches to model evaluation; 43021. Connectionist approaches; Decision and choice: behavioral decision research; 43029. Decision and Choice: Economic Psychology; 43031. Decision and Choice: Luce’s Choice Axiom; 43033. Decision and choice: random utility models of choice and response time; 43034. Decision and Choice: Utility and Subjective Probability, Contemporary Theories; 43037. Diffusion and Random Walk Processes; 43039. Discrete state models of information processing; 43040. Dynamic decision making; 43043. Functional Equations in Behavioral and Social Sciences; 43047. Information theory; 43049. Knowledge spaces; 43050. Learning: mathematical learning theory; 43051. Learning: Mathematical Learning Theory, History; 43055. Markov decision processes; 43059. Mathematical Psychology: History; 43062. Measurement theory: history and philosophy; 43064. Measurement: Representational Theory of; 43065. Memory Models, Quantitative; 43066. Model Testing and Selection, Theory of; 43081. Psychometrics: Multidimensional Scaling in Psychology; 43084. Psychophysical laws and theory, history; 43089. Sequential decision making; 43090. Signal Detection Theory; 43091. Signal Detection Theory, History of; 43094. Stochastic dynamic models (choice, response, time); 43109. Decision and Choice: Risk, Theories; 43113. Mathematical Learning Theory

10. References

Bush, R.R., & Mosteller, F., 1951. A mathematical model for simple learning. Psychological Review, 58, 313–323.

Estes, W.K., 1950. Toward a statistical theory of learning. Psychological Review, 57, 94–107.

Falmagne, J.C., 1985/2002. Elements of Psychophysical Theory. Oxford University Press, New York. Reprinted in 2002.

Green, D.M., & Swets, J.A., 1974. Signal Detection Theory and Psychophysics. Krieger, New York.

Luce, R.D., 1959. Individual Choice Behavior. Wiley, New York.

Luce, R.D., 1986. Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press, New York.

Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1963. Handbook of Mathematical Psychology, Volume 1. Wiley, New York.

Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1963. Handbook of Mathematical Psychology, Volume 2. Wiley, New York.

Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1965. Handbook of Mathematical Psychology, Volume 3: Representation, axiomatization, and invariance. Wiley, New York.

Norman, F., 1972. Markov Processes and Learning Models. Academic Press, New York.

Shepard, R.N., 1980. Multidimensional scaling, tree-fitting, and clustering. Science, 210, 390–398.

Stevens, S.S., 1957. On the psychophysical law. Psychological Review, 64, 153–181.

Thurstone, L.L., 1927. A law of comparative judgement. Psychological Review, 34, 273–286.

Thurstone, L.L., 1947. Multiple-Factor Analysis. University of Chicago Press.

11. Relevant Websites

The Society for Mathematical Psychology. http://www.mathpsych.org.

References

Aczel, J., Falmagne, J., Luce, R., 2000. Functional equations in the behavioral sciences. Mathematica Japonica 52, 469–512.

Anderson, J., 1991. The adaptive nature of human categorization. Psychological Review 98, 409–429.

Anderson, N., 1981. Foundations of Information Integration Theory. Academic Press, New York.

Ashby, F., Maddox, W., 2005. Human category learning. Annual Review of Psychology 56, 149–178.

Atkinson, R., Bower, G., Crothers, E., 1965. An Introduction to Mathematical Learning Theory. Wiley, New York.

Atkinson, R., Estes, W., 1963. Stimulus sampling theory, in: Luce, R.D., Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 2. Wiley, New York, pp. 121–268.

Batchelder, W., Riefer, D., 1999. Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review 6, 57–86.

Bharucha-Reid, A., 1960. Elements of the Theory of Markov Processes and their Applications. McGraw-Hill, New York.

Block, H., Marschak, J., 1960. Random ordering and stochastic theories of responses, in: Olkin, I., Ghurye, S., Hoeffding, W., Madow, W., Mann, H. (Eds.), Contributions to Probability and Statistics. Stanford University Press, Stanford, CA, pp. 97–132.

Bock, R., Jones, L., 1968. The Measurement and Prediction of Judgement and Choice. Holden-Day, San Francisco, CA.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J., 2006. The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced choice tasks. Psychological Review 113, 700–765.

Boring, E., 1950. A History of Experimental Psychology. Prentice Hall, Englewood Cliffs, NJ.

Brown, S., Heathcote, A., 2008. The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology 57, 153–178.

Busemeyer, J., Myung, I., 1992. An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General 121, 177–194.

Busemeyer, J., Rapoport, A., 1988. Psychological models of deferred decision making. Journal of Mathematical Psychology 32, 91–134.

Busemeyer, J., Townsend, J., 1993. Decision field theory: A dynamic-cognitive approach to decision making. Psychological Review 100, 432–459.

Bush, R., Mosteller, F., 1951. A mathematical model for simple learning. Psychological Review 58, 313–323.

Chater, N., Tenenbaum, J., Yuille, A., 2006. Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences 10, 287–291.

Clark, S., Gronlund, S., 1996. Global matching models of recognition memory: How the models match the data. Psychonomic Bulletin & Review 3, 37–60.

Donders, F., 1868/1969. On the speed of mental processes. Acta Psychologica 30, 412–431. Translated by W. G. Koster.

Dosher, B., Sperling, G., 1998. A century of information processing theory: Vision, attention and memory, in: Hochberg, J., Cutting, J.E. (Eds.), Handbook of Perception and Cognition at Century’s End: History, Philosophy, Theory. Academic Press, San Diego, CA, pp. 199–252.

Dzhafarov, E., 2011. Mathematical foundations of universal Fechnerian scaling, in: Berglund, B., Rossi, G.B., Townsend, J.T., Pendrill, L. (Eds.), Measurements With Persons. Psychology Press, New York, pp. 185–210.

Estes, W., 1950. Toward a statistical theory of learning. Psychological Review 57, 94–107.

Estes, W., Suppes, P., 1959. Foundations of linear models, in: Bush, R., Estes, W. (Eds.), Studies in Mathematical Learning Theory. Stanford University Press, Stanford, CA, pp. 137–179.

Falmagne, J.C., 1965. Stochastic models for choice reaction time with application to experimental results. Journal of Mathematical Psychology 2, 77–124.

Falmagne, J.C., 1978. A representation theorem for finite random scale systems. Journal of Mathematical Psychology 18, 52–72.

Falmagne, J.C., 1985/2002. Elements of Psychophysical Theory. Oxford University Press, New York. Reprinted in 2002.

Falmagne, J.C., Cohen, S., Dwivedi, A., 1975. Two-choice reactions as an ordered memory scanning process, in: Rabbit, P., Dornic, S. (Eds.), Attention and Performance V. Academic Press, pp. 296–344.

Falmagne, J.C., Doignon, J.P., 2011. Learning spaces, in: Interdisciplinary Mathematics Series. Springer, Heidelberg.

Frazier, P., Yu, A., 2008. Sequential hypothesis testing under stochastic deadlines, in: Platt, J., Koller, D., Singer, Y., Roweis, S. (Eds.), Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, pp. 465–472.

Gold, J., Shadlen, M., 2007. The neural basis of decision making. Annual Review of Neuroscience 30, 535–574.

Green, D., Swets, J., 1974. Signal Detection Theory and Psychophysics. Krieger, New York.

Griffiths, T., Tenenbaum, J., 2009. Theory-based causal induction. Psychological Review 116, 661–716.

Hilgard, E., 1943. Theories of Learning. Appleton Century Crofts, New York. Second edition.

Hull, C., 1943. Principles of Behavior. Appleton Century Crofts, New York.

Indow, T., 1995. Psychophysical scaling: Scientific and practical applications, in: Luce, R.D., D’Zmura, M., Hoffman, D.D., Iverson, G.I., Romney, A.K. (Eds.), Geometric Representations of Perceptual Phenomena. Papers in Honor of Tarow Indow for his 70th Birthday. Erlbaum, Mahwah, NJ, pp. 1–28.

James, W., 1890/1950. The Principles of Psychology. Volume I. Holt. Reprinted by Dover Publications.

Kemp, C., Tenenbaum, J., 2008. The discovery of structural form. Proceedings of the National Academy of Sciences 105, 10687–10692.

Krantz, D., 1969. Threshold theories of signal detection. Psychological Review 76, 308–324.

Krantz, D., 1972. A theory of magnitude estimation and cross modality matching. Journal of Mathematical Psychology 9, 168–199.

Krantz, D., Luce, R., Suppes, P., Tversky, A., 1971. Foundations of Measurement, Volume 1: Additive and Polynomial Representations. Academic Press, New York.

Kruschke, J., 2008. Models of categorization, in: Sun, R. (Ed.), The Cambridge Handbook of Computational Psychology. Cambridge University Press, New York, pp. 267–301.

Kruschke, J., 2011. Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press / Elsevier.

Laming, D., 1968. Information Theory of Choice Reaction Time. Academic Press, London.

Lee, M., 2008. Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review 15, 1–15.

Lee, M., 2011. How cognitive modeling can benefit from hierarchical Bayesian models. Journal of Mathematical Psychology 55, 1–7.

Levitt, H., 1970. Transformed up-down methods in psychoacoustics. The Journal of the Acoustical Society of America 49, 467–476.

Link, S., Heath, R., 1975. A sequential theory of psychological discrimination. Psychometrika 40, 77–105.

Luce, R., 1959. Individual Choice Behavior. Wiley, New York.

Luce, R., 1986. Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press, New York.

Luce, R., D’Zmura, T., Hoffman, D., Iverson, G., Romney, A.K. (Eds.), 1995. Geometric Representations of Perceptual Phenomena. Papers in Honor of Tarow Indow for his 70th Birthday. Erlbaum, Mahwah, NJ.

Luce, R., Galanter, E., 1963. Discrimination, in: Luce, R., Bush, R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 1. Wiley, pp. 191–244.

Luce, R., Krantz, D., Suppes, P., Tversky, A., 1990. Foundations of Measurement, Volume 3: Representation, axiomatization, and invariance. Academic Press, San Diego, CA.

Luce, R., Suppes, P., 1965. Preference, utility and subjective probability, in: Luce, R.D., Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 3. Wiley, pp. 252–410.

MacMillan, N., Creelman, C., 2004. Detection Theory: A User’s Guide (2nd ed.). Erlbaum, Hillsdale, NJ.

Marley, A. (Ed.), 1997. Choice, Decision and Measurement. Papers in Honor of R. Duncan Luce. Erlbaum, Mahwah, NJ.

McGill, W., 1963. Stochastic latency mechanisms, in: Luce, R.D., Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 1. Wiley, New York, pp. 309–360.

Miller, G., 1964. Mathematics and Psychology. McGraw-Hill, New York.

Myung, J., Montenegro, M., Pitt, M., 2007. Analytic expressions for BCDMEM models of recognition memory. Journal of Mathematical Psychology 51, 198–204.

Navarro, D., 2005. Analyzing the RULEX model of category learning. Journal of Mathematical Psychology 49, 259–275.

Navarro, D., Fuss, I., 2009. Fast and accurate calculations for first-passage times in Wiener diffusion models. Journal of Mathematical Psychology 53, 222–230.

Navarro, D., Griffiths, T., 2008. Latent features in similarity judgment: A nonparametric Bayesian approach. Neural Computation 20, 2597–2628.

Norman, F., 1972. Markov Processes and Learning Models. Academic Press, New York.

Norman, K., Detre, G., Polyn, S., 2008. Computational models of episodic memory, in: Sun, R. (Ed.), The Cambridge Handbook of Computational Psychology. Cambridge University Press, New York, pp. 189–224.

Nosofsky, R., 1992. Similarity scaling and cognitive process models. Annual Review of Psychology 43, 25–53.

Nosofsky, R., Palmeri, T., 1997. An exemplar-based random walk model of speeded classification. Psychological Review 104, 266–300.

Parzen, E., 1994. Stochastic Processes. Holden-Day, San Francisco, CA.

Pitt, M., Myung, I., Zhang, S., 2002. Toward a method of selecting among computational models of cognition. Psychological Review 109, 472–491.

Pleskac, T., Busemeyer, J., 2010. Two-stage dynamic signal detection: A theory of confidence, choice, and response time. Psychological Review 117, 864–901.

Ratcliff, R., 1978. A theory of memory retrieval. Psychological Review 85, 59–108.

Ratcliff, R., McKoon, G., 2008. The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation 20, 873–922.

Robbins, H., Monro, S., 1951. A stochastic approximation method. The Annals of Mathematical Statistics 22, 400–407.

Rouder, J., Lu, J., Speckman, P., Sun, D., Morey, R., Naveh-Benjamin, M., 2007. Signal detection models with random participant and item effects. Psychometrika 72, 621–642.

Rouder, J., Speckman, P., Sun, D., Morey, R., Iverson, G., 2009. Bayesian t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 16, 225–237.

Rouder, J., Sun, D., Speckman, P., Lu, J., Zhou, D., 2003. A hierarchical Bayesian statistical framework for response time distributions. Psychometrika 68, 589–606.

Rumelhart, D., McClelland, J. (Eds.), 1986. Parallel Distributed Processing. Explorations in the Microstructure of Cognition, Volume 1. MIT Press, Cambridge, MA.

Shepard, R., 1980. Multidimensional scaling, tree-fitting, and clustering. Science 210, 390–398.

Simen, P., Cohen, J., Holmes, P., 2006. Rapid decision threshold modulation by reward rate in a neural network. Neural Networks 19, 1013–1026.

Smith, P., 2000. Stochastic dynamic models of response time and accuracy: A foundational primer. Journal of Mathematical Psychology 44, 408–463.

Smith, P., Ratcliff, R., 2004. The psychology and neurobiology of simple decisions. Trends in Neurosciences 27, 161–168.

Smith, P., Vickers, D., 1988. The accumulator model of two-choice discrimination. Journal of Mathematical Psychology 32, 135–168.

Sternberg, S., 1963. Stochastic learning theory, in: Luce, R.D., Bush, R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 2. Wiley, New York.

Stevens, S., 1957. On the psychophysical law. Psychological Review 64, 153–181.

Suppes, P., Ginsburg, R., 1963. A fundamental property of all-or-none models, binomial distribution of responses prior to conditioning, with application to concept formation in children. Psychological Review 70, 139–171.

Suppes, P., Krantz, D., Luce, R., Tversky, A., 1989. Foundations of Measurement, Volume 2. Academic Press, San Diego, CA.

Tenenbaum, J., Griffiths, T., 2001. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences 24, 629–640.

Thurstone, L., 1919. The learning curve equation. Psychological Monographs 26, 1–51.

Thurstone, L., 1927a. A law of comparative judgement. Psychological Review 34, 273–286.

Thurstone, L., 1927b. Psychophysical analysis. American Journal of Psychology 38, 368–389.

Thurstone, L., 1947. Multiple-Factor Analysis. University of Chicago Press.

Tversky, A., 1972. Elimination by aspects: A theory of choice. Psychological Review 79, 281–299.

Vandekerckhove, J., Tuerlinckx, F., Lee, M., in press. Hierarchical diffusion models for two-choice response time. Psychological Methods.

Vickers, D., 1970. Evidence for an accumulator model of psychophysical discrimination. Ergonomics 13, 37–58.

Vickers, D., 1979. Decision Processes in Visual Perception. Academic Press, New York, NY.

Wald, A., Wolfowitz, J., 1948. Optimal character of the sequential probability ratio test. Annals of Mathematical Statistics 19, 326–339.

Wasan, M., 1969. Stochastic approximation. Cambridge Tracts in Mathematics and Mathematical Physics 58.

Yellott, Jr, J., 1969. Probability learning with non contingent success. Journal of Mathematical Psychology 6, 541–575.
