Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual...

14
Lawful relation between perceptual bias and discriminability Xue-Xin Wei a,b,c and Alan A. Stocker d,e,1 a Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027; b Center for Theoretical Neuroscience, Columbia University, New York, NY 10027; c Department of Statistics, Columbia University, New York, NY 10027; d Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104; and e Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104 Edited by Wilson S. Geisler, University of Texas at Austin, Austin, TX, and approved August 4, 2017 (received for review November 21, 2016) Perception of a stimulus can be characterized by two fundamen- tal psychophysical measures: how well the stimulus can be dis- criminated from similar ones (discrimination threshold) and how strongly the perceived stimulus value deviates on average from the true stimulus value (perceptual bias). We demonstrate that perceptual bias and discriminability, as functions of the stimu- lus value, follow a surprisingly simple mathematical relation. The relation, which is derived from a theory combining optimal encod- ing and decoding, is well supported by a wide range of reported psychophysical data including perceptual changes induced by con- textual modulation. The large empirical support indicates that the proposed relation may represent a psychophysical law in human perception. Our results imply that the computational processes of sensory encoding and perceptual decoding are matched and opti- mized based on identical assumptions about the statistical struc- ture of the sensory environment. perceptual behavior | efficient coding | Bayesian observer | Weber–Fechner | stimulus statistics P erception is a subjective experience that is shaped by the expectations and beliefs of an observer (1). Psychophysical measures provide an objective yet indirect characterization of this experience by describing the dependency between the phys- ical properties of a stimulus and the corresponding perceptually guided behavior (2). Two fundamental measures characterize an observer’s percep- tion of a stimulus. Discrimination threshold indicates the sen- sitivity of an observer to small changes in a stimulus variable (Fig. 1A). The threshold depends on the quality with which the stimulus variable is represented in the brain (2) (i.e., encoded; Fig. 1B); a more accurate representation results in a lower dis- crimination threshold. In contrast, perceptual bias is a measure that reflects the degree to which an observer’s perception devi- ates on average from the true stimulus value. Perceptual bias is typically assumed to result from prior beliefs and reward expec- tations with which the observer interprets the sensory evidence (1), and thus is determined by factors that seem not directly related to the sensory representation of the stimulus. As a result, it has long been believed that there is no reason to expect any lawful relation between perceptual bias and discrimination threshold (3). However, here we derive a direct mathematical relation be- tween discrimination threshold D (θ) and perceptual bias b (θ) based on a recently proposed observer theory of perception (4, 7). The key idea is that both the encoding and the decoding pro- cess of the observer are optimally adapted to the statistical struc- ture of the perceptual task (Fig. 2A). Specifically, we assume encoding to be efficient (8, 9) such that it maximizes the infor- mation in the sensory representation about the stimulus given a limit on the overall available coding resources (10). The assump- tion implies a sensory representation whose coding resources are allocated according to the stimulus distribution p (θ). This results in the encoding constraint p (θ) p J (θ), [1] where the Fisher information J (θ) represents the coding accu- racy of the sensory representation (4, 11–13). Fisher information provides a lower bound on the discrimination threshold irrespec- tive of whether the estimator is biased or not, which can be for- mulated as D (θ) c / p J (θ), where c is a constant (5, 6). Rather than assuming a tight bound, we make the weaker assumption that the bound is equally “loose” over the range of the stimu- lus value θ. Using the encoding constraint (Eq. 1) above, we can express discrimination threshold in terms of the stimulus distri- bution as D (θ) 1/p (θ). [2] Similarly, we have shown that the perceptual bias of the Bayesian observer model (Fig. 2A) can also be expressed in terms of the stimulus distribution (4, 7). Assuming that uncertainty in the perceptual process is dominated by internal (neural) noise and that the noise is relatively small, we can analytically derive the observer’s bias as b (θ) (1/p (θ) 2 ) 0 . [3] We can show that the expression holds independently of the details of the assumed loss function for a large family of sym- metric loss functions (Supporting Information). Magnitude and sign of its proportionality coefficient, however, depend on sev- eral factors, including the noise magnitude and the loss function. Finally, by combining Eqs. 2 and 3, we obtain a direct functional relation between perceptual bias and discrimination threshold in the form of b (θ) (D (θ) 2 ) 0 ; [4] Significance We present a law of human perception. The law expresses a mathematical relation between our ability to perceptually discriminate a stimulus from similar ones and our bias in the perceived stimulus value. We derived the relation based on theoretical assumptions about how the brain represents sen- sory information and how it interprets this information to cre- ate a percept. Our main assumption is that both encoding and decoding are optimized for the specific statistical structure of the sensory environment. We found large experimental sup- port for the law in the literature, which includes biases and changes in discriminability induced by contextual modulation (e.g., adaptation). Our results imply that human perception generally relies on statistically optimized processes. Author contributions: X.-X.W. and A.A.S. designed research, performed research, ana- lyzed data, and wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. 1 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1619153114/-/DCSupplemental. 10244–10249 | PNAS | September 19, 2017 | vol. 114 | no. 38 www.pnas.org/cgi/doi/10.1073/pnas.1619153114

Transcript of Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual...

Page 1: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

Lawful relation between perceptual biasand discriminabilityXue-Xin Weia,b,c and Alan A. Stockerd,e,1

aGrossman Center for the Statistics of Mind, Columbia University, New York, NY 10027; bCenter for Theoretical Neuroscience, Columbia University, NewYork, NY 10027; cDepartment of Statistics, Columbia University, New York, NY 10027; dDepartment of Psychology, University of Pennsylvania, Philadelphia,PA 19104; and eDepartment of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104

Edited by Wilson S. Geisler, University of Texas at Austin, Austin, TX, and approved August 4, 2017 (received for review November 21, 2016)

Perception of a stimulus can be characterized by two fundamen-tal psychophysical measures: how well the stimulus can be dis-criminated from similar ones (discrimination threshold) and howstrongly the perceived stimulus value deviates on average fromthe true stimulus value (perceptual bias). We demonstrate thatperceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly simple mathematical relation. Therelation, which is derived from a theory combining optimal encod-ing and decoding, is well supported by a wide range of reportedpsychophysical data including perceptual changes induced by con-textual modulation. The large empirical support indicates that theproposed relation may represent a psychophysical law in humanperception. Our results imply that the computational processes ofsensory encoding and perceptual decoding are matched and opti-mized based on identical assumptions about the statistical struc-ture of the sensory environment.

perceptual behavior | efficient coding | Bayesian observer |Weber–Fechner | stimulus statistics

Perception is a subjective experience that is shaped by theexpectations and beliefs of an observer (1). Psychophysical

measures provide an objective yet indirect characterization ofthis experience by describing the dependency between the phys-ical properties of a stimulus and the corresponding perceptuallyguided behavior (2).

Two fundamental measures characterize an observer’s percep-tion of a stimulus. Discrimination threshold indicates the sen-sitivity of an observer to small changes in a stimulus variable(Fig. 1A). The threshold depends on the quality with which thestimulus variable is represented in the brain (2) (i.e., encoded;Fig. 1B); a more accurate representation results in a lower dis-crimination threshold. In contrast, perceptual bias is a measurethat reflects the degree to which an observer’s perception devi-ates on average from the true stimulus value. Perceptual bias istypically assumed to result from prior beliefs and reward expec-tations with which the observer interprets the sensory evidence(1), and thus is determined by factors that seem not directlyrelated to the sensory representation of the stimulus. As a result,it has long been believed that there is no reason to expectany lawful relation between perceptual bias and discriminationthreshold (3).

However, here we derive a direct mathematical relation be-tween discrimination threshold D(θ) and perceptual bias b(θ)based on a recently proposed observer theory of perception (4,7). The key idea is that both the encoding and the decoding pro-cess of the observer are optimally adapted to the statistical struc-ture of the perceptual task (Fig. 2A). Specifically, we assumeencoding to be efficient (8, 9) such that it maximizes the infor-mation in the sensory representation about the stimulus given alimit on the overall available coding resources (10). The assump-tion implies a sensory representation whose coding resources areallocated according to the stimulus distribution p(θ). This resultsin the encoding constraint

p(θ) ∝√

J (θ), [1]

where the Fisher information J (θ) represents the coding accu-racy of the sensory representation (4, 11–13). Fisher informationprovides a lower bound on the discrimination threshold irrespec-tive of whether the estimator is biased or not, which can be for-mulated as D(θ)≥ c/

√J (θ), where c is a constant (5, 6). Rather

than assuming a tight bound, we make the weaker assumptionthat the bound is equally “loose” over the range of the stimu-lus value θ. Using the encoding constraint (Eq. 1) above, we canexpress discrimination threshold in terms of the stimulus distri-bution as

D(θ) ∝ 1/p(θ). [2]

Similarly, we have shown that the perceptual bias of the Bayesianobserver model (Fig. 2A) can also be expressed in terms of thestimulus distribution (4, 7). Assuming that uncertainty in theperceptual process is dominated by internal (neural) noise andthat the noise is relatively small, we can analytically derive theobserver’s bias as

b(θ) ∝ (1/p(θ)2)′. [3]

We can show that the expression holds independently of thedetails of the assumed loss function for a large family of sym-metric loss functions (Supporting Information). Magnitude andsign of its proportionality coefficient, however, depend on sev-eral factors, including the noise magnitude and the loss function.Finally, by combining Eqs. 2 and 3, we obtain a direct functionalrelation between perceptual bias and discrimination threshold inthe form of

b(θ) ∝ (D(θ)2)′; [4]

Significance

We present a law of human perception. The law expressesa mathematical relation between our ability to perceptuallydiscriminate a stimulus from similar ones and our bias in theperceived stimulus value. We derived the relation based ontheoretical assumptions about how the brain represents sen-sory information and how it interprets this information to cre-ate a percept. Our main assumption is that both encoding anddecoding are optimized for the specific statistical structure ofthe sensory environment. We found large experimental sup-port for the law in the literature, which includes biases andchanges in discriminability induced by contextual modulation(e.g., adaptation). Our results imply that human perceptiongenerally relies on statistically optimized processes.

Author contributions: X.-X.W. and A.A.S. designed research, performed research, ana-lyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1619153114/-/DCSupplemental.

10244–10249 | PNAS | September 19, 2017 | vol. 114 | no. 38 www.pnas.org/cgi/doi/10.1073/pnas.1619153114

Page 2: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

NEU

ROSC

IEN

CEPS

YCH

OLO

GIC

AL

AN

DCO

GN

ITIV

ESC

IEN

CES

A B

Stimulusvariable

Observer

Percept

Discriminationthreshold

Discriminationthreshold Perceptual

bias

Perceptualbias

Encoding Decoding

Sensoryrepresentation

Prior beliefs, etc.

Fig. 1. Psychophysical characterization and modeling of perception. (A) Perception of a stimulus variable (e.g., the angular tilt θ0 of the leaning towerof Pisa) can be characterized by an observer’s discrimination threshold and perceptual bias. Discrimination threshold specifies how well an observer candiscriminate small deviations around the particular stimulus orientation θ0 given that there is noise in perceptual processing (green arrow). Perceptual biasspecifies how much, on average, the perceived orientations over repeated presentations (thin lines) deviate from the true stimulus orientation (blue arrow).(B) Modeling perception as an encoding–decoding processing cascade. Discriminability is limited by the characteristics of the encoding process, i.e., thequality of the internal, sensory representation m. Perceptual bias, however, also depends on the decoding process that typically involves cognitive factorssuch as prior beliefs and reward expectations. Both discrimination threshold and perceptual bias can be characterized with appropriate psychophysicalmethods (indicated by dashed arrows).

i.e., perceptual bias (as a function of the stimulus variable) is pro-portional to the slope of the square of the discrimination thresh-old (Fig. 2B).

We tested the surprisingly simple relation against a wide rangeof existing psychophysical data. Figs. 3 and 4 show data for thoseperceptual variables for which both discrimination threshold andperceptual bias have been reported over a sufficiently large stim-ulus range. We grouped the examples according to their charac-teristic bias–threshold patterns. The first group consists of a setof circular variables (Fig. 3 A–C). It includes local visual orienta-tion, probably the most well-studied perceptual variable. Orien-tation perception exhibits the so-called oblique effect (38), whichdescribes the observation that the discrimination threshold peaksat the oblique orientations yet is lowest for cardinal orientations(14). Based on the oblique effect, Eq. 4 predicts that percep-tual bias is zero at, and only at, both cardinal and oblique ori-

A B

Learning

Dis

crim

inab

ility

Bia

s

Discriminationthreshold

Perceptualbias

Sensoryrepresentation

Bayesiandecoding

Efficientcoding

Likelihoodfunction 0

0+

-

Prior belief

Stimulus distribution

Fig. 2. Observer model that links perceptual bias and discrimination threshold. (A) Our theory of perception proposes that encoding and decoding areboth optimized for a given stimulus distribution (4). Based on this theory, the encoding accuracy characterized by Fisher information J(θ) and the bias b(θ)of the Bayesian decoder are both dependent on the stimulus distribution p(θ). With Fisher information providing a lower bound on discriminability D(θ) (5,6), we can mathematically formulate the relation between perceptual bias and discrimination threshold as b(θ) ∝ (D(θ)2)′. (B) Arbitrarily chosen examplehighlighting the characteristics of the relation: Bias is zero at the extrema of the discrimination threshold (red arrows) and largest for stimulus values wherethe threshold changes most rapidly. Thus, the magnitude of perceptual bias typically does not covary with the magnitude of the discrimination threshold.

entations. Measured bias functions confirm this prediction (15).Other circular variables that exhibit similar patterns are headingdirection using either visual or vestibular information (16), 2Dmotion direction measured with a two alternative forced-choice(2AFC) procedure (17, 18) or by smooth pursuit eye movements(19), pointing direction (20), and motion direction in depth (21,22). The relation also holds for the more high-level perceptualvariable of perceived heading direction of approaching biologi-cal motion (human pedestrian) (23) as shown in Fig. 3C.

The second group contains noncircular magnitude variablesfor which discrimination threshold (approximately) followsWeber’s law (24) and linearly increases with magnitude (Fig.3D). We predict that these variables should exhibit a perceptualbias that is also linear in stimulus magnitude. Indeed, we foundthis to be true for spatial frequency [threshold (14, 34, 39), bias(25)] as well as temporal frequency [threshold (40), bias (27)]

Wei and Stocker PNAS | September 19, 2017 | vol. 114 | no. 38 | 10245

Page 3: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

−π 0 +π

−π 0 +π

0

0-90 0 90

-50 0 50

-60 0 60

Heading direction (approaching)Orientation

Heading direction

Motion direction

Pursuit direction

Motion direction (depth)

[deg]

[deg]

[deg]

+-

0+

-

visualvestibular

[a.u.]

[cycles/deg]

[Hz]

[deg/s]

[deg]

[deg]

[deg]

−π 0

0 10 20

0 6 12 0 6 12

00

01

20

12

3

0-0

.20.

4

30

16

9

-60 0 60-10

010

0 10 200 10 20

-88

+-

[a.u.]0

0

10 20

+π [a.u.]−π 0 +π

+-

0

−π 0 +π

[a.u.]−π 0 +π

010

020

010

-10

010

-10

04

-40

30-3

0-90 0 90

-180 0 180-180 0 180

-180 0 180 -180 0 180

-180 0 180-180 0 180

00

00.

6

4

-80

4

Portfors-Yeomans/Regan1996

-50 0 50-30

030

Welchman et al. 2008

Krukowski/Stone 2005

Gros et al. 1998

Rauber/Treue 1998

Crane 2012

De Gardelle et al 2010

Caelli et al. 1983

Regan/Beverley 1983

Stocker/Simoncelli 2006

Caelli et al. 1983

Georgeson/Ruddock 1980

Sweeny et al. 2012

Vintch/Gardner 2014

Vintch/Gardner 2014Waugh/Hess 1994

0 5

00.

3-0

.3

Visual speed

Spatial frequency

0 5

0.2

0.6

1

Threshold Bias Threshold Bias

Threshold Bias Threshold Bias

A C

DB

Temporal frequency

Pointing direction Smyrnis et al. 2014

-180 0 180

01

2

[deg]-180 0 180

*

*

Fig. 3. Predicted and measured bias–threshold patterns in perception. Data are organized into different groups. Green/blue curves represent the predicteddiscrimination threshold/bias. (A) Discrimination threshold (14) and bias (15) data for perceived orientation and heading direction (16) (solid lines for visualstimulation, and dashed lines for vestibular stimulation). (B) We found similar patterns for perceived motion direction [measured both with a 2AFC procedure(17, 18) or with smooth pursuit behavior (19)] and pointing direction [where subjects had to estimate the direction of a visually presented arrow (20)]. (C) Thepredicted relation also holds for perceived motion direction in depth (21, 22), as well as for the perception of higher-level stimuli such as the approachingheading direction of a person (23), although the quality of the available data is limited. (D) The bias–threshold relation is different for magnitude (i.e.,noncircular) variables: Discrimination threshold is typically proportional to the stimulus value [Weber’s law (24)], and thus we predict that perceptual bias isalso linear in stimulus value. Reported patterns for visually perceived spatial frequency (14, 25, 34), temporal frequency (26, 27), and grating speed (27, 28)match this prediction. All data curves are replotted from their corresponding publications, except bias data for orientation (15) and threshold data for visualspeed (28), which both were derived by analyzing the original data. Stars (*) mark data that represent perceptual variability rather than discriminationthresholds (see also Figs. S2 and S3 for more details).

10246 | www.pnas.org/cgi/doi/10.1073/pnas.1619153114 Wei and Stocker

Page 4: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

NEU

ROSC

IEN

CEPS

YCH

OLO

GIC

AL

AN

DCO

GN

ITIV

ESC

IEN

CES

in vision. Visual speed is another example for which discrimina-tion threshold approximately follows Weber’s law (28) and biasis also approximately linear with stimulus speed (27), although,in contrast to the other examples, with a negative proportionalitycoefficient. A possible explanation for the sign difference may bethat speed perception is governed by a loss function that differsfrom the loss functions for the other variables. In perceiving thespeed of a moving object, an observer’s emphasis might be onestimating the speed “just right” to maximize the chance to, e.g.,successfully intercept the object. A loss function that approxi-mates the “all-or-nothing” characteristics of the L0 norm wouldserve that goal, and would also predict the negative sign of theproportionality coefficient (see Supporting Information). Finally,perceived weight, one of the classical examples for illustratingWeber’s law (24), also seems to be consistent with the proposedbias–threshold pattern (41).

The last group contains bias–threshold patterns that are notintrinsic to individual specific variables but are induced by con-textual modulation (Fig. 4). Spatial context as in the tilt illu-sion can induce characteristic repulsive biases in the perceivedstimulus orientation away from the orientation of the spatial sur-round (30). The corresponding change in discrimination thresh-old (29) well matches the predicted pattern based on our the-oretically derived relation. A similar bias–threshold pattern hasbeen reported when probing motion direction instead of orienta-tion (31). Furthermore, similar patterns have been observed fortemporal context, i.e., as the result of adaptation. Adaptation-induced biases and changes in discrimination threshold for per-ceived visual orientation (32, 33) and spatial frequency (34, 35)are qualitatively in agreement with the prediction. At a slightlylonger time scale, perceptual learning is also known to reducediscrimination thresholds. We therefore predict that perceptuallearning also induces repulsive biases away from the learnedstimulus value. This prediction is indeed confirmed by data for

−π 0 +π −π 0 +π

-90 0 90 -90 0 90

+-

[a.u.][cycles/deg]

[deg]

Adaptation (orientation)

Adaptation (spatial frequency)

0 10 20 0 10 20

1 0

1

1

1.5

0.5 -2

02

2 [-

]

-10

1

Regan/Beverly 1983 Blakemore et al. 1970

Regan/Beverly 1985 Mitchell/Muir 1976

Threshold Bias

-90 0 90 -90 0 90

-15 0 15 -15 0 15

-90 0 90-90 0 90 [deg] [deg]

[deg]

Tilt illusion Perceptual learning (orientation)

Attention (gap size)

00

11

-40

4-.

160

.16

-201

0

22

Suzuki/Cavanagh 1997

Chen/Fang 2011Solomon/Morgan 2006 Westheimer 1990

prediction

Fig. 4. Predicted and measured bias–threshold patterns induced by various forms of contextual modulation. Spatial context as in the tilt illusion (29, 30) [orusing motion stimuli (31)] and temporal context during adaptation experiments [orientation (32, 33), relative to adaptor orientation; spatial frequency (34,35), red arrows indicating the value of the adaptor stimulus] induce similar biases and changes in thresholds that are qualitatively well predicted (relativeto the unadapted condition). The predicted relation seems to also hold for perceptual changes induced by perceptual learning (36) and spatial attention.Spatial attention has been tied to repulsive biases (37) at the locus of attention and is also known for improvements in discriminability, although little isknown about how discriminability changes with stimulus value. All data curves are replotted from the corresponding publications. All data represent biasesand changes in threshold (ratio) relative to the corresponding control conditions (no surround; preadapt; prelearn; nonattended).

learning orientation (36) and motion direction (42), albeit theexisting data are sparse. Finally, attention has been known asa mechanism that can decrease discrimination threshold (43).We predict that this decrease should coincide with a repul-sive bias in the perceived stimulus variable. Although limitedin extent, data from a Vernier gap size estimation experimentare in agreement with this prediction (37) and are further sup-ported by recent results (44). In sum, the derived relation canreadily explain a wide array of empirical observations across dif-ferent perceptual variables, sensory modalities, and contextualmodulations.

Based on the strong empirical support, we argue that we haveidentified a universal law of human perception. It provides a uni-fied and parsimonious characterization of the relation betweendiscrimination threshold and perceptual bias, which are the twomain psychophysical measures characterizing the perception of astimulus variable. Only a very few quantitative laws are known inperceptual science, including Weber–Fechner’s law (2, 24) andStevens’ law (45). These laws express simple empirical regular-ities which provide a compact yet generally valid description ofthe data. The law we have proposed here shares the same virtue.However, unlike these previous laws, our law is not the result ofempirical observations but rather was derived based on theoret-ical considerations of optimal encoding and decoding (4). Thus,as such, it does not merely describe perceptual behavior butrather reflects our understanding of why perception exhibits suchcharacteristics in the first place. Note that, without the theory,we would not have discovered the general empirical regularitybetween discrimination threshold and bias. It is conceivable thatsome of the theoretical assumptions we made in deriving thelaw may prove incorrect (see Fig. S1) despite the fact that thelaw itself is empirically well supported. It is difficult to imag-ine, however, how a lawful relation between perceptual biasand discrimination threshold could emerge without a functional

Wei and Stocker PNAS | September 19, 2017 | vol. 114 | no. 38 | 10247

Page 5: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

constraint that tightly links the encoding and decoding processesof perception (Fig. 2A).

The law allows us to predict either perceptual bias based onmeasured data for discrimination threshold or vice versa. Onegeneral prediction is that stimulus variables that follow Weber’slaw should exhibit perceptual biases that are linearly propor-tional to the stimulus value as demonstrated with examples inFig. 3D. Furthermore, because perceptual illusions are oftenexamples of a strong form of perceptual bias induced by changesin context, we predict that these illusions should be accompaniedwith substantial threshold changes according to our law.

Perceptual biases can arise for different reasons, not all ofwhich are aligned with the assumptions we made in our deriva-tions. In particular, because we assumed that the uncertainty inthe inference process is predominantly due to internal (neural)noise of the observer, we do not expect the proposed law to holdunder conditions where stimulus ambiguity/noise is the domi-nant source of uncertainty. In this case, we expect discriminationthreshold to be mainly determined by the stimulus uncertainty andnot the prior expectations as we have assumed (Eq. 2).

It is worth noting that the law can also be expressed in termsof perceptual variance rather than discrimination threshold. Thiscan be useful because some psychophysical experiments designedfor measuring perceptual bias (e.g., by a method of adjustment)often record variance in subjects’ estimates as well. Using theCramer–Rao bound on the variance σ2(θ) of a biased estimator,we can rewrite Eq. 4 as

b(θ) ∝ (σ2(θ)/(1 + b′(θ))2)′. [5]

For relatively small and smoothly changing biases, the predic-tions for variance and discrimination threshold are similar (seeSupporting Information for details).

Last but not least, perhaps the most surprising finding is thatthe law seems to hold for bias and discriminability patternsinduced by contextual modulation (Fig. 4). This implies not onlythat changes in encoding and decoding can happen immediately(e.g., spatial context), or at least on short time scales, but alsothat these changes are matched between encoding and decod-ing by relying on identical assumptions (i.e., prior expectations)about the structure of the sensory environment. This fundamen-tally contrasts with existing theories that assume mismatchesbetween encoding and decoding (i.e., the “coding catastrophe”)to be responsible for many of the known contextually modu-lated bias effects (46). It also contrasts with findings that putthe locus of perceptual learning either at the encoding (47) orat the decoding (48) level; we predict learning to occur at bothlevels. Whether these contextual priors actually match the stim-ulus distributions within these contexts or not is unclear andremains a subject for future studies. Data from spatial atten-tion experiments (Fig. 4) at least suggest that they may reflectsubjective rather than objective expectations. This would implythat the distinction is not relevant in the context of the observermodel considered here (Fig. 2A) and that efficient encoding andBayesian decoding are both optimized for identical prior expec-tations, irrespective of whether these expectations are subjectiveor objective. We believe that the proposed law and its underlyingtheoretical assumptions have profound implications for the com-putations and neural mechanisms governing perception, whichwe have just started to explore.

ACKNOWLEDGMENTS. We thank V. De Gardelle, S. Kouider, and J. Sackurfor sharing their data. We thank Josh Gold and Michael Eisenstein forhelpful comments and suggestions on earlier versions of the manuscript.This research was performed while both authors were at the University ofPennsylvania. We thank the Office of Naval Research for supporting thework (Grant N000141110744).

1. Helmholtz Hv (1867) Handbuch der Physiologischen Optik, Allgemeine Enzyklopadieder Physik (Voss, Leipzig, Germany), Vol 9.

2. Fechner GT (1876) Vorschule der Asthetik (Breitkopf and Hartel, Leipzig, Germany),Vol 1.

3. Green D, Swets J (1966) Signal Detection Theory and Psychophysics (Wiley, New York).4. Wei XX, Stocker AA (2015) A Bayesian observer model constrained by efficient coding

can explain “anti-Bayesian” percepts. Nat Neurosci 18:1509–1517.5. Seung H, Sompolinsky H (1993) Simple models for reading neuronal population codes.

Proc Natl Acad Sci USA 90:10749–10753.6. Series P, Stocker AA, Simoncelli EP (2009) Is the homunculus ‘aware’ of sensory adap-

tation? Neural Comput 21:3271–3304.7. Wei XX, Stocker AA (2012) Efficient coding provides a direct link between prior and

likelihood in perceptual Bayesian inference. Advances in Neural Information Process-ing Systems NIPS 25 (MIT Press, Cambridge, MA), pp 1313–1321.

8. Attneave F (1954) Some informational aspects of visual perception. Psychol Rev61:183–193.

9. Barlow HB (1961) Possible principles underlying the transformation of sensory mes-sages. Sensory Communication, ed Rosenblith W (MIT Press, Cambridge, MA), pp 217–234.

10. Linsker R (1988) Self-organization in a perceptual network. Computer 21:105–117.11. Brunel N, Nadal JP (1998) Mutual information, Fisher information, and population

coding. Neural Comput 10:1731–1757.12. Ganguli D, Simoncelli EP (2014) Efficient sensory encoding and Bayesian inference

with heterogeneous neural populations. Neural Comput 26:2103–2134.13. Wei XX, Stocker AA (2016) Mutual information, Fisher information and efficient cod-

ing. Neural Comput 28:305–326.14. Caelli T, Brettel H, Rentschler I, Hilz R (1983) Discrimination thresholds in the two-

dimensional spatial frequency domain. Vision Res 23:129–133.15. De Gardelle V, Kouider S, Sackur J (2010) An oblique illusion modulated by visibility:

Non-monotonic sensory integration in orientation processing. J Vis 10:6.16. Crane BT (2012) Direction specific biases in human visual and vestibular heading per-

ception. PLoS One 7:e51383.17. Gros BL, Blake R, Hiris E (1998) Anisotropies in visual motion perception: A fresh look.

J Opt Soc Am A Opt Image Sci Vis 15:2003–2011.18. Rauber HJ, Treue S (1998) Reference repulsion when judging the direction of visual

motion. Perception 27:393–402.19. Krukowski AE, Stone LS (2005) Expansion of direction space around the cardinal axes

revealed by smooth pursuit eye movements. Neuron 45:315–323.20. Smyrnis N, Mantas A, Evdokimidis I (2014) Two independent sources of anisotropy in

the visual representation of direction in 2-D space. Exp Brain Res 232:2317–2324.21. Portfors-Yeomans C, Regan D (1996) Cyclopean discrimination thresholds for the

direction and speed of motion in depth. Vision Res 36:3265–3279.

22. Welchman AE, Lam JM, Bulthoff HH (2008) Bayesian motion estimation accounts fora surprising bias in 3D vision. Proc Natl Acad Sci USA 105:12087–12092.

23. Sweeny TD, Haroz S, Whitney D (2012) Reference repulsion in the categorical percep-tion of biological motion. Vision Res 64:26–34.

24. Weber EH (1978) The Sense of Touch, trans Ross HE, Murray DJ (Academic, San Diego).25. Georgeson M, Ruddock K (1980) Spatial frequency analysis in early visual processing.

Philos Trans R Soc Lond B Biol Sci 290:11–22.26. Waugh S, Hess R (1994) Suprathreshold temporal-frequency discrimination in the

fovea and the periphery. J Opt Soc Am A Opt Image Sci Vis 11:1199–1212.27. Vintch B, Gardner JL (2014) Cortical correlates of human motion perception biases. J

Neurosci 34:2592–2604.28. Stocker AA, Simoncelli EP (2006) Noise characteristics and prior expectations in human

visual speed perception. Nat Neurosci 9:578–585.29. Solomon JA, Morgan MJ (2006) Stochastic re-calibration: Contextual effects on per-

ceived tilt. Proc Biol Sci 273:2681–2686.30. Westheimer G (1990) Simultaneous orientation contrast for lines in the human fovea.

Vision Res 30:1913–1921.31. Tzvetanov T, Womelsdorf T (2008) Predicting human perceptual decisions by decoding

neuronal information profiles. Biol Cybern 98:397–411.32. Regan D, Beverley K (1985) Postadaptation orientation discrimination. J Opt Soc Am

A 2:147–155.33. Mitchell DE, Muir DW (1976) Does the tilt after-effect occur in the oblique meridian?

Vision Res 16:609–613.34. Regan D, Beverley K (1983) Spatial-frequency discrimination and detection: Compari-

son of postadaptation thresholds. J Opt Soc Am A 73:1684–1690.35. Blakemore C, Nachmias J, Sutton P (1970) The perceived spatial frequency shift:

Evidence for frequency-selective neurones in the human brain. J Physiol 210:727–750.

36. Chen N, Fang F (2011) Tilt aftereffect from orientation discrimination learning. ExpBrain Res 215:227–234.

37. Suzuki S, Cavanagh P (1997) Focused attention distorts visual space: An attentionalrepulsion effect. J Exp Psychol Hum Percept Perform 23:443–463.

38. Appelle S (1972) Perception and discrimination as function of stimulus orientation:The “oblique effect” in man and animals. Psychol Bull 78:266–278.

39. Campbell FW, Nachmias J, Jukes J (1970) Spatial-frequency discrimination in humanvision. J Opt Soc Am 60:555–559.

40. Mandler M (1984) Temporal frequency discrimination above threshold. Vision Res24:1873–1880.

41. Ross HE (1964) Constant errors in weight judgements as a function of the size of thedifferential threshold. Br J Psychol 55:133–141.

42. Szpiro S, Spering M, Carrasco M (2014) Perceptual learning modifies untrained pursuiteye movements. J Vis 14:8.

10248 | www.pnas.org/cgi/doi/10.1073/pnas.1619153114 Wei and Stocker

Page 6: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

NEU

ROSC

IEN

CEPS

YCH

OLO

GIC

AL

AN

DCO

GN

ITIV

ESC

IEN

CES

43. Carrasco M, McElree B (2001) Covert attention accelerates the rate of visual informa-tion processing. Proc Natl Acad Sci USA 98:5363–5367.

44. Cutrone E (2017) Attention, appearance, and neural computation. PhD thesis (NewYork University, New York).

45. Stevens SS (1957) On the psychophysical law. Psychol Rev 64:153–181.46. Schwartz O, Hsu A, Dayan P (2007) Space and time in visual context. Nat Rev Neurosci

8:522–535.

47. Schoups A, Vogels R, Qian N, Orban G (2001) Practising orientation identificationimproves orientation coding in V1 neurons. Nature 412:549–553.

48. Law CT, Gold JI (2008) Neural correlates of perceptual learning in a sensory-motor,but not a sensory, cortical area. Nat Neurosci 11:505–513.

49. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86.50. McDonnell MD, Stocks NG (2008) Maximally informative stimuli and tuning curves for

sigmoidal rate-coding neurons and populations. Phys Rev Lett 101:058103.

Wei and Stocker PNAS | September 19, 2017 | vol. 114 | no. 38 | 10249

Page 7: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

Supporting InformationWei and Stocker 10.1073/pnas.1619153114

SI TextThis document consists of three parts: The first part provides derivations of all of the components (i.e., essentially Eqs. 1–3) usedin expressing the relation between perceptual bias and the discrimination threshold. Note that most of these derivations have beendescribed before, and we only include them here to make the paper self-contained. In the second part, we derive the relation betweenperceptual bias and variance (rather than discrimination threshold). Finally, in the last part, we prove that the proposed relationbetween discrimination threshold (or variance) and perceptual bias is not limited to the L2 case but holds for a large class of symmetricloss functions.

Part 1: Relation Between Perceptual Bias and Discrimination Threshold.Relation between Fisher information and stimulus distribution (prior). A key assumption of the used observer model (4) is that encodingis efficient in the sense that it seeks to maximize the mutual information I [θ,m] between a scalar stimulus variable θ and its sensoryrepresentation m (9, 10). Fisher information J (θ), defined as

J (θ) =

∫ (∂ ln p(m|θ)

∂θ

)2

p(m|θ)dm, [S1]

can approximate mutual information in the regime of small Gaussian noise (11, 13) through

I [θ,m] =1

2ln

(S2

2πe

)−KL

(p(θ)||

√J (θ)

S

), [S2]

where S =∫θ

√J (θ)dθ and KL( || ) represents the (always nonnegative) Kullback–Leibler divergence (49).

We have previously shown (4, 13) that, by imposing a constraint to bound the total coding resource of the form

S =

∫θ

√J (θ)dθ ≤ c1, [S3]

where c1 is a constant, maximizing mutual information requires the KL( || ) in Eq. S2 to be zero. This is the case if

p(θ) ∝√

J (θ), [S4]

i.e., if the limited coding resources as measured by Fisher information J (θ) are matched to the stimulus prior distribution p(θ). Severaltheoretical studies (4, 11, 12, 50) have previously derived and identified the constraint Eq. S4, which we refer to as the “efficient codingconstraint.”Relation between Fisher information and discrimination threshold. Fisher information provides a lower bound on the discriminationthreshold (5), including biased estimators (6). Below, we provide a shortened version of the derivation for the biased estimator accord-ing to ref. 6.

Consider two stimulus values θ0 − δθ/2 and θ0 + δθ/2. With the assumption δθ > 0, it follows that θ2 > θ1. Signal detection theory(3) provides a formal description of how bias and variance for the estimates θ1, θ2 of the two stimulus values determine the probabilitythat θ2 > θ1, i.e., that the discrimination of the two stimulus values based on the estimates is correct. We can write

Pcorrect(θ1, θ2) =1

2G

(d(θ1, θ2)

2

), [S5]

by making the assumption that both estimates are Gaussian distributed, where the function G(.) is defined as

G(d) =2√π

∫ d

−∞exp(−y2)dy , [S6]

and d(θ1, θ2) is the normalized distance (referred to as d-prime) between the distributions for the two estimates θ1, θ2. The distanced(θ1, θ2) can be expressed as

d(θ1, θ2) =µ(θ2)− µ(θ1)√σ2(θ2)+σ2(θ1)

2

, [S7]

where µ(θi) and σ2(θi) denote the mean and the variance of each of the two Gaussian distributions.By writing µ(θi) = θi + b(θi), where b(θi) is the estimation bias at θi , we can approximate d(θ1, θ2) for a small δθ as

d(θ1, θ2) = d(θ0 − δθ/2, θ0 + δθ/2)

=δθ + b(θ0 + δθ/2)− b(θ0 − δθ/2)√

σ2(θ2)+σ2(θ1)2

≈ δθ(1 + b′(θ0))

σ(θ0), [S8]

where, in the last step, we have exploited the assumption of small δθ by (i) approximating σ2(θ1) and σ2(θ2) as σ2(θ0) and (ii) approxi-mating [b(θ0 + δθ/2)− b(θ0 − δθ/2)]/δθ using b′(θ0). For a given discrimination criterion α (measured in Pcorrect ), the corresponding

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 1 of 8

Page 8: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

d(θ1, θ2) can be obtained via Eq. S5. We will refer to this d value as Dα. With Eq. S9, we then can express the discrimination thresholdδθ, which we will formally denote as D(θ), as

D(θ) = δθ = Dασ(θ)

1 + b′(θ). [S9]

The Cramer–Rao bound for a biased estimator is given as

σ2(θ) ≥ (1 + b′(θ))2

J (θ). [S10]

Strictly, the Cramer–Rao bound is only formulated for linear variables. However, for small noise levels, we can approximate circularvariables as locally linear variables, and thus apply the Cramer–Rao bound. The approximation will break down for nonlocal problems,in which case the metrics and topology of the circular space also make the calculation of several key statistical quantities (e.g., mean,variance) substantially different from the way they are computed in the linear space. Rearranging Eq. S10, we get

σ(θ)

1 + b′(θ)≥ 1√

J (θ). [S11]

By combining Eqs. S11 and S9, we find that the discrimination threshold is lower bounded by the Fisher information, i.e.,

D(θ) ≥ Dα1√J (θ)

. [S12]

Finally, assuming that the Cramer–Rao bound is equally loose (not necessarily tight) for every θ, we obtain the relation between Fisherinformation and discrimination threshold,

D(θ) ∝1√J (θ)

. [S13]

Relation between perceptual bias and stimulus distribution (L2 loss). We have previously derived an analytic expression for perceptualb(θ) as a function of the stimulus distribution p(θ), assuming a Bayesian decoder that minimizes squared error (L2 loss) (4). Assumingrelative small noise and a smooth stimulus distribution, we found

b(θ) ∝

(1

p(θ)2

)′. [S14]

As we show in Generalization to Different Loss Functions, this functional relation holds for a large family of symmetric loss functions,including some of the most commonly used loss functions (e.g., Lp loss function). Thus there seems little use in rederiving Eq. S14 forthe special case of mean squared error loss.Putting it together: The relation between perceptual bias and discrimination threshold. By combining Eqs. S4, S13, and S14, we obtainour main result, the proposed relation between perceptual bias and discrimination threshold,

b(θ) ∝ (D(θ)2)′. [S15]

In Generalization to Different Loss Functions, we will prove that this relation generalizes to a large class of symmetric loss functions ofthe Bayesian decoder.Validation of the analytic expressions for discrimination threshold and perceptual bias. The analytic expressions for both the discrim-ination threshold and the perceptual bias (Eqs. 2 and 3) relied on a set of assumptions and approximations (mostly with regard tothe magnitude of the noise). To provide some insight into how these assumptions affect our results, we simulated the full Bayesianobserver model (as briefly described in Generalization to Different Loss Functions and in more detail in ref. 4) for various sensory(internal) noise magnitudes, and then compared the bias and threshold curves of the observer model with those predicted by the twoanalytic expressions.

Fig. S1 shows this comparison for increasing levels of noise. As expected, the approximations are good under small noise conditionsand smoothly degrade with increasing noise levels. The largest noise level shown (κ = 4) corresponds to an estimation variance of acircular variable of approximately 30◦. Importantly, while the numerically computed discrimination threshold and perceptual bias ofthe full Bayesian observer model are noticeably different from the curves given by the analytic expressions (Eqs. 2 and 3), they stillalmost perfectly obey the proposed law.

Part 2: Relation Between Estimation Bias and Variance. The relation between discrimination threshold and bias can be rewritten in termsof estimation variance rather than threshold. This can be useful when trying to analyze datasets obtained with an estimation task(e.g., the method of adjustment). These datasets typically consist of a subject’s estimates over repeated trials, and thus allow thesimultaneous estimate of perceptual bias and variance. While discrimination threshold and estimation variance are directly related,the reformulation in terms of variance results in a slightly more complex expression.

We start with the Cramer–Rao bound on the variance of a biased estimator, that is,

σ2(θ) ≥ (1 + b′(θ))2

J (θ), [S16]

where J (θ) is the Fisher information and b(θ)′ is the derivative of the bias. Using the characteristic efficient coding constraint Eq. S4of our Bayesian observer model, we can express the variance in terms of the prior expectation as

σ2(θ) ∝(1 + b′(θ))

2

p(θ)2, [S17]

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 2 of 8

Page 9: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

where we again make the assumption that the bound is equally loose across the entire range of θ. Combining Eq. S17 with the abovederived dependency of the bias on the prior expectation Eq. S14, we find

b(θ) ∝

(σ(θ)2

(1 + b′(θ))2

)′. [S18]

Compared with the original expression Eq. S14, the formulation is identical for variance except that it is scaled by (1 + b′(θ))2. Given

the bias b(θ), we can predict variance as

σ(θ)2 ∝ (1 + b′(θ))2(∫

b(θ)dθ + c0

), [S19]

while the reverse is generally not easy because it would involve solving a nonlinear differential equation. Obviously, when b′(θ) isconstant, the relation simplifies to

b(θ) ∝ (σ(θ)2)′, [S20]

and the predicted relation between the bias and variance has identical functional form to the one between the bias and discriminationthreshold. This should be the case for magnitude variables (i.e., variables that follow Weber’s law), as the perceptual bias for thesevariables is approximately linear in the stimulus value θ, and thus b′(θ) is constant (Fig. 3D). Similarly, for low noise levels, the biasesbecome small and thus, assuming reasonably smooth bias curves, the term (1 + b′(θ))

2 in Eq. S18 approximates 1. The relation betweenvariance and bias again approximates Eq. S20. As the noise increases, the term (1 + b′(θ))

2 becomes effective, and predictions forvariance and squared discrimination threshold start to deviate from each other. Simulations shown in Fig. S2 demonstrate this effect.

The two predictions remain qualitatively similar, which was the reason for us to include the pointing direction dataset (20) in ourcollection of examples (Fig. 3) even though it reports estimation variance rather than discrimination threshold.

Part 3: Generalization to Different Loss Functions. Below, we show that the expression for bias (Eq. S15) generalizes to a large classof symmetric loss functions. Therefore, the main result of the paper, the predicted relation between discrimination threshold (orvariance) and perceptual bias, is general. We first focus on the family of Lp loss functions and then consider the case of more generalsymmetric loss functions in the end.Bayesian observer model with efficient sensory representation. We consider a Bayesian observer model that relies on the efficientencoding of sensory information (4). Given a stimulus with value θ0, we assume the generative model for the encoding stage as

m = F (θ0) + δ, [S21]

where m is the noisy sensory measurement, and δ represents the internal noise, which we assume is additive Gaussian with smallvariance σ2. F (θ) represents the encoder (Fig. 1) and is a one-to-one mapping from stimulus space to an internal sensory space withunits θ = F (θ). This internal sensory space (and thus the mapping F ) is defined as the space in which Fisher information is uniform(thus uniform discrimination threshold). For all following derivations, we use the notation (.) to indicate any variable that is expressedin units of this space, e.g., m = F (m).

Due to the efficient constraint Eq. S4 of the observer model, the mapping F is defined as

F (θ) =

∫ θ

−∞p(χ)dχ, [S22]

which is the cumulative of the prior distribution p(θ).With the above assumption of Gaussian noise, the posterior distribution can be expressed as

p(θ|m) ∝ p(θ)e− (F(θ)−F(m))2

2σ2 . [S23]

Note that we formulate the inference problem in units of the stimulus space where a loss function has a clear physical interpretation.The Lp loss function is defined as the norm

Lp(θ, θ) = |θ − θ|p [S24]

between the estimate θ and the actual stimulus value θ. The corresponding Bayesian estimate θ is the one that minimizes theBayes risk

R =

∫θ

|θ − θ|pp(θ)e−(F(θ)−F(m))2

2σ2 dθ. [S25]

Below, we will derive how the bias of this estimate depends on the specifics of the loss function Eq. S24. Note that we have previouslyderived the bias for the L2 loss (mean squared error estimate) (4).L1 loss. The Bayesian estimate that minimizes the L1 loss function corresponds to the median of the posterior distribution. Note thatthe median is invariant to the particular parametrization of the posterior based on the following mathematical fact: Assuming x0 isthe median of random variable X , for a one-to-one mapping g , g(x0) is the median of random variable g(X ). Because the posteriordistribution for θ is Gaussian and centered around F (m) (the prior is flat in the internal space), the median for p(θ|m) is simplyF (m). Mapping the value back to the stimulus space via F−1, the median for p(θ|m) becomes F−1(F (m)) = m , and thus theestimate θ = m . The bias then can be rewritten as

b(θ0) = E [θ]− θ0 = E [m]− θ0. [S26]

Note that the density of m (with the inverse mapping of Eq. S22) follows

d(m) =1√2πσ

p(m)e− (F(m)−F(θ0))2

2σ2 , [S27]

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 3 of 8

Page 10: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

where p(.) represents the prior distribution on θ. Thus,

E [m]− θ0 =1√2πσ

∫mp(m)e

− (F(m)−F(θ0))2

2σ2 dm − θ0 [S28]

=1√2πσ

∫F−1(m)e

− (m−θ0)2

2σ2 dm − θ0,

where we performed a change of variable from m to m = F (m). Assuming the prior density to be smooth, we expand F−1 around θ0up to the second order,

F−1(m) ≈ F−1(θ0) + (F−1)′θ(θ0)(m − θ0) +

1

2(F−1)′′θ (θ0)(m − θ0)

2. [S29]

Note that, the derivative is taken with respect to θ = F (θ), as indicated by the subscript θ. Applying this expansion, we find that

E [m]− θ0 ≈1√2πσ

∫1

2(F−1)′′θ (θ0)(m − θ0)

2e− (F(m)−F(θ0))2

2σ2 dm [S30]

=1

2√2πσ

∫ (1

p(F−1(θ0))

)′θ

(m − θ0)2e−(F(m)−F(θ0))2

2σ2 dm

=1

2√2πσ

∫−(p′θ(θ0)

p(θ0)3

)(m − θ0)2e−

(F(m)−F(θ0))2

2σ2 dm

=1

4√2πσ

∫ (1

p(θ0)2

)′θ

(m − θ0)2e−(F(m)−F(θ0))2

2σ2 dm

=1

4√2πσ

(1

p(θ0)2

)′ ∫(m − θ0)2e−

(m−θ0)2

2σ2 dm

=1

4√2πσ

(1

p(θ0)2

)′ ∫y2e− y2

2σ2 dy ,

where, in the last step, we have defined y = m − θ0 to simplify the notation. Evaluating the Gaussian integral∫y2e− y2

2σ2 dy , we thenfind that

E [m]− θ0 ≈1

4σ2

(1

p(θ0)2

)′. [S31]

It is important to note that this expression for E [m] − θ0 does not depend on the loss function of the decoder, because it does notinvolve the estimate θ. It is determined merely by the encoding process. We will use this result again when we consider other lossfunctions. With Eqs. S26 and S31, we then can express the bias for any θ0 as

b(θ0) = E [θ]− θ0 ≈1

4σ2(1/p(θ0)

2)′. [S32]

Therefore, b(θ) ∝ (1/p(θ)2)′.L0 loss (MAP estimate). The Bayesian estimate that minimizes the L0 loss corresponds to the mode of the posterior distribution.Computing the mode of the posterior Eq. S23 is equivalent to computing the mode of the logarithm of the posterior, thus

∂ ln p(θ|m)

∂θ=

p′(θ)

p(θ)− 1

σ2(F (θ)− F (m))F ′(θ). [S33]

Applying a first-order Taylor expansion on F (θ) around m , we obtain the approximation

F (θ)− F (m) ≈ F ′(m)(θ −m) = p(m)(θ −m) [S34]

and can reexpress Eq. S33 as

∂ ln p(θ|m)

∂θ≈ p′(θ)

p(θ)− 1

σ2p(m)(θ −m)p(θ). [S35]

Setting the left-hand side to zero, we find that the difference between the optimal Bayesian estimate θ and the measurement m can beapproximated by a constant related to p(θ0) and p(θ0)

′, i.e.,

θ −m ≈ σ2 p′(θ)

p(θ)2p(m)

≈ σ2 p′(θ0)

p(θ0)3 , [S36]

where we approximated both p(θ) and p(m) by p(θ0) because σ is assumed to be small. The expectation of this difference is

E [θ −m] ≈ σ2 p′(θ0)

p(θ0)3 = −1

2σ2

(1

p(θ0)2

)′. [S37]

With Eq. S31, we then can express the bias for the MAP estimator as

b(θ0) = E [θ − θ0] = E [θ −m] + E [m − θ0] ≈ −1

4σ2(1/p(θ0)

2)′. [S38]

Therefore, b(θ) ∝ (1/p(θ)2)′. Note that the coefficient of proportionality is negative.

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 4 of 8

Page 11: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

L2q loss function (q > 1, and q is an integer). The Bayesian estimate under L2q loss cannot be expressed in closed form, in general.However, we can compute the leading term of the bias of this estimator when the noise variance is small.

Again, the goal is to find the estimator θ, which minimizes the following Bayes risk:

R =

∫θ

(θ − θ)2qp(θ)e

− (F(θ)−F(m))2

2σ2 dθ. [S39]

By examining the second-order derivative of R with respect to θ, we find that the problem is a convex optimization problem. Therefore,there exists one and only one solution. We begin by rewriting

θ − θ = (θ −m) + (m − θ) = (θ −m) + (F−1(m)− F−1(θ)). [S40]

Applying a second-order Taylor expansion on F−1 (Eq. S22) around m , we find

θ − θ = (θ −m) + F−1(m)− F−1(θ)

≈ (θ −m) +1

p(m)(m − θ) + p′(m)

2p(m)3(m − θ)2. [S41]

To simplify notation, we define r = θ −m and s = p′(m)

2p(m)3. Given these notations,

θ − θ = r +1

p(m)(m − θ) + s(m − θ)2. [S42]

Now the Bayes risk R can be approximated as

R ≈∫θ

[r +

1

p(m)(m − θ) + s(m − θ)2

]2qe− (θ−m)2

2σ2 d θ. [S43]

To find the estimate θ that minimizes R, we take the first-order derivative with respect to θ (which is equivalent to the first-orderderivative with respect to r because of the linear dependence on θ), and set it to zero. We find that r should approximately satisfy thefollowing equation: ∫

θ

[r +

1

p(m)(m − θ) + s(m − θ)2

]2q−1

e− (θ−m)2

2σ2 d θ = 0. [S44]

Denoting x = m − θ, the above expression can be simplified as∫x

[r +

1

p(m)x + sx2

]2q−1

e− x2

2σ2 dx = 0. [S45]

We expand the left-hand side, exploiting (due to the symmetry) the fact that∫x2n−1e

− x2

2σ2 dx = 0, [S46]

where n is an arbitrary integer, and obtain∫x

[B0(x2qs + o(x2qs)) + B1r(x

2q−2 + o(x2q−2)) + B2r2(x2q−2s + o(x2q−2s)) + [S47]

B3r3(x2q−4 + o(x2q−4)) + ...+ B2q−2r

2q−2(x2s + o(x2)) + B2q−1r2q−1]e

− x2

2σ2 dx = 0,

where Bi are the corresponding coefficients which we didn’t explicitly express for simplicity. Using the Gaussian integral∫x2ne

− x2

2σ2 dx =√2π(2n − 1)!! σ2n+1, [S48]

where n is an arbitrary integer, we use the notation for double factorial (2n − 1)!! = (2n − 1) ∗ (2n − 3) ∗ ... ∗ 3 ∗ 1 and obtain

C0(σ2q+1s + o(σ2q+1s)) + C1r(σ

2q−1 + o(σ2q−1)) + C2r2(σ2q−1s + o(σ2q−1s)) + [S49]

C3r3(σ2q−3 + o(σ2q−3)) + ...+ C2q−2r

2q−2(σ3s + o(σ3s)) + C2q−1r2q−1σ = 0,

where Ci are the corresponding coefficients. Ignoring the high-order terms, we obtain

C0σ2qs + C1rσ

2q−2 + C2r2σ2q−2s + C3r

3σ2q−4 + ...+ C2q−2r2q−2σ2s + C2q−1r

2q−1 = 0, [S50]

which can be rewritten as

σ2q−2(C0σ2s + C1r) + σ2q−4r2(C2σ

2s + C3r) + ...+ r2q−2(C2q−2σ2s + C2q−1r) = 0. [S51]

It can be verified that the coefficients Ci satisfy the following property:

q <C2k

C2k+1< q2, k = 0, 1, ..., q − 1. [S52]

The left-hand side of Eq. S51 is a continuous function of r that we define as f (r). We find that

f (−qσ2s)f (−q2σ2s) < 0. [S53]

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 5 of 8

Page 12: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

Due to the Intermediate Value Theorem, there must exist a solution for f (r) = 0 in the interval (−qσ2s,−q2σ2s), and, because theproblem is convex, this solution is unique. This tells us that r ∼O(σ2). As a result, all of the terms on the left-hand side of Eq. S51 areof the order o(σ2q), except for the first term σ2q−2(C0σ

2s + C1r), which is of the order O(σ2q).For small σ, the higher-order terms can be ignored, and the only relevant term left is σ2q−2(C0σ

2s + C1r). Thus, for small σ, weonly need to solve the equation

σ2q−2(C0σ2s + C1r) = 0. [S54]

With the constraint Eq. S52 on the coefficients C0,C1, this results in

r = −C0

C1σ2s = −(2q − 1)σ2s = −(2q − 1)σ2 p′(m)

2p(m)3≈ (2q − 1)

4σ2

(1

p(θ0)2

)′, [S55]

and thus

θ0 −m ≈ (2q − 1)

4σ2

(1

p(θ0)2

)′. [S56]

The bias of the Bayesian estimator for any L2q loss function can then be approximated as

b(θ0) = E [θ0 − θ0] = E [θ0 −m] + E [m − θ0] [S57]

≈ (2q − 1)

4σ2

(1

p(θ0)2

)′+

1

4σ2

(1

p(θ0)2

)′=

2q

4σ2(1/p(θ0)

2)′,

where we have again used Eq. S31.Therefore, b(θ) ∝

(1/p(θ)2

)′.

L2q−1 loss function (q > 1, and q is an integer). We can use techniques similar to those for the L2q case to compute the bias for anL2q−1 loss function. The main difference is that we have now to consider an additional term M that, however, as we show below, canbe neglected. For an L2q−1 loss, the Bayesian estimator θ minimizes the Bayes risk

R =

∫θ

|θ − θ|2q−1p(θ)e− (F(θ)−F(m))2

2σ2 dθ. [S58]

Similar to the L2q case, the optimization problem is convex and thus has one and only one solution. We solve for the extremum bysetting ∂R/∂θ = 0. Applying a Taylor expansion, we find

0 =

∫θ≥θ

(θ − θ)2q−2

p(θ)e− (F(θ)−F(m))2

2σ2 dθ +

∫θ<θ

−(θ − θ)2q−2

p(θ)e− (F(θ)−F(m))2

2σ2 dθ

=

∫θ≥θ

(θ − θ)2q−2

p(θ)e− (F(θ)−F(m))2

2σ2 dθ +

∫θ<θ

−(θ − θ)2q−2

p(θ)e− (F(θ)−F(m))2

2σ2 dθ

≈∫θ≥ ˜θ

(r +

1

p(m)(m − θ) + s(m − θ)2

)2q−2

e− (θ−m)2

2σ2 d θ

+

∫θ<

˜θ

−(r +

1

p(m)(m − θ) + s(m − θ)2

)2q−2

e− (θ−m)2

2σ2 d θ

=

∫θ≥m

(r +

1

p(m)(m − θ) + s(m − θ)2

)2q−2

e− (θ−m)2

2σ2 d θ

+

∫θ<m

−(r +

1

p(m)(m − θ) + s(m − θ)2

)2q−2

e− (θ−m)2

2σ2 d θ +M , [S59]

where the term M is introduced without being explicitly spelled out. We will deal with this term later.To simplify notation, we define x = m − θ. It follows that

0 ≈∫x≤0

(r +1

p(m)x + sx2)

2q−2

e− x2

2σ2 dx −∫x>0

(r +1

p(m)x + sx2)

2q−2

e− x2

2σ2 dx +M . [S60]

Let us define g(r)=∫x≤0

(r + 1p(m)

x + sx2)2q−2

e− x2

2σ2 dx −∫x>0

(r + 1p(m)

x + sx2)2q−2

e− x2

2σ2 dx . The goal then is to find the r

which satisfies g(r) +M ≈ 0. We will take advantage of the fact that the problem has a unique solution.We first deal with g(r). Similar to the analysis performed for the L2q case, we can verify that, for small σ,

g(r) ≈∫x≤0

[C0x2q−1s + C1x

2q−3r ]e− x2

2σ2 dx . [S61]

By verifying the coefficients C0,C1, we find that r = −(2q − 2)sσ2 to first order approximately satisfies g(r) = 0.

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 6 of 8

Page 13: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

Furthermore, we find that r = −(2q − 2)sσ2 also satisfies g(r) +M ≈ 0. The reason is that, for r = −(2q − 2)sσ2, M ∼ o(σ2q) isof higher order than g(r). Thus, M can be ignored. Specifically, we can calculate the order of M as

|M | = 2

∫θ∈[ ˜θ,m]

(r +

1

p(m)(m − θ) + s(m − θ)2

)2q−2

e− (θ−m)2

2σ2 d θ

≈ 2

∫θ∈[ ˜θ,m]

(1

p(m)(m − θ)

)2q−2

e− (θ−m)2

2σ2 d θ

≤ 2

∫θ∈[ ˜θ,m]

(1

p(m)(m − θ)

)2q−2

d θ

∼ O(σ4q−2)

∼ o(σ2q). [S62]

We have assumed m ≥ ˜θ in the above calculation for the convenience of notation. The same technique applies and the result holds

for the case of m ≤ ˜θ.

Thus, r = −(2q − 2)sσ2 is the only solution of ∂R/∂θ = 0 when σ is small. Similar to the case of the L2q loss function, we then canexpress the bias of the Bayesian estimate as

b(θ0) ≈2q − 1

4σ2(1/p(θ0)

2)′. [S63]

Therefore, b(θ) ∝ (1/p(θ)2)′.General symmetric loss function. For an arbitrary symmetric loss function L with L being smooth except at 0, the Bayesian estimator θis the estimator that minimizes the Bayes risk

R =

∫θ

L(|θ − θ|)p(θ)e−(F(θ)−F(m))2

2σ2 dθ. [S64]

Suppose that L(|θ − θ|) ∼ O(|θ − θ|k ) where k is an integer. Then the bias of the Bayesian estimator is approximately the bias of theBayesian estimator corresponding to the Lk loss function, which is what we have derived above.

We thus conclude that b(θ) ∝ (1/p(θ)2)′ represents a general description of the bias for a large set of symmetric loss functions.

Bia

s [

de

g]

Dis

crim

ina

tion

th

resh

old

[de

g]

05

1015

-0.6

00.

6

010

20-1

01

010

2030

-2-1

01

2

040

-20

220

0 +-

0 +-

0 +-

0 +-

0 +-

0 +-

0 +-

0 +-

Noise level

Fig. S1. Bayesian observer model simulations vs. analytic approximations (Eqs. 2 and 3). Discrimination threshold and perceptual bias are shown for a givenprior distribution p(θ) ∝ 1/(3 − cos 2θ) and four increasing levels of sensory (internal) noise. We assumed additive noise following a von Mises distributionwhose concentration parameter κ determined the noise magnitude (high κ corresponds to low magnitude). Black lines represent the predicted curvesaccording to Eqs. 2 and 3, and the colored lines represent the predicted curves according to numerical simulations of the full Bayesian observer model. Notethat, while threshold and bias curves of the observer model show noticeable differences from the curves predicted by the analytic approximations, the lawfulrelation between the two still persists to an almost perfect extent. This is illustrated by how well the discrimination thresholds predicted by the law based onthe numerically computed bias of the observer model (dashed purple lines) match the actual discrimination thresholds of the observer model (green lines).

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 7 of 8

Page 14: Lawful relation between perceptual bias and discriminabilityastocker/lab/publications...perceptual bias and discriminability, as functions of the stimu-lus value, follow a surprisingly

-4-2

02

4

0 90 180 270 360

-4-2

02

4

0 90 180 270 360

-4-2

02

4

0 90 180 270 360

-4-2

02

4

0 90 180 270 360

01

23

4

0 90 180 270 360

01

23

4

0 90 180 270 3600

12

34

0 90 180 270 360

01

23

4

0 90 180 270 360

Fig. S2. Comparing the predictions for discrimination thresholds and variance (i.e., SD). (A) Perceptual bias is modeled as b(θ) = C1 ∗ sin(2θ), with fourdifferent C1 values (0.5, 1, 2, 4). (B) A comparison between the predictions for discrimination threshold D(θ) (dashed lines) and the SD σ(θ) based on the biascurves in A. Discrimination threshold is modeled as D(θ) ∝

√C2 − 0.5 ∗ C1 ∗ cos(2θ). For SD, we use σ(θ) ∝

√C2 − 0.5 ∗ C1 ∗ cos(2θ)(1 + 2 ∗ C1 ∗ cos(2θ)).

C2 = 3 for all of the panels. When b′(θ) is small, the threshold and SD behave similarly (up to a scale factor, which we have set to be 1 here for simplicity). Asb′(θ) increases, the difference between the two becomes larger.

0 50 100 150 200 250 300 350-5

-4

-3

-2

-1

0

1

2

3

4

5

0 50 100 150 200 250 300 3500

1

2

3

Direction [deg]Direction [deg]

Bia

s [

de

g]

Sta

nd

ard

de

via

tion

[d

eg

]

A B

Fig. S3. Variance prediction for pointing direction dataset (20). (A) Bias data and sinewave approximation (bold line). (B) Pointing variability σ(θ) (data)superimposed with the prediction (based on the sinewave approximation) according to Eq. S18 (bold line) and the prediction for discrimination thresholdaccording to Eq. S15 (dashed line). Both predictions are made using the same values for their common free parameters C2. See also Fig. S2.

Wei and Stocker www.pnas.org/cgi/content/short/1619153114 8 of 8

Alan A Stocker
Note:
Alan A Stocker