Musicians rock on short-term memory
and multisensory integration
Avigael M. Aizenman, Jason M. Gold
& Robert Sekuler
Brandeis University & Indiana University
Supported by CELEST, an NSF Science of Learning Center (SBE-035478), National Institutes of Health grant EY-019265, and by AFOSR grant FA9550-10-1-0420. We thank Trevor Agus, Barbara Shinn-Cunningham and Randolph Blake for valuable comments on earlier versions of this paper, and Abigail Noyce and Arielle Keller for their assistance. A version of this work was presented to the 2013 meeting of the Vision Sciences Society. e-mail: [email protected]
MULTISENSORY INTEGRATION 2
Abstract
Musicians may have very good memory for sounds, but does that ability extend to other
kinds of stimuli as well? For answers, we assessed musicians’ and non-musicians’ short-term
memory for rapidly-presented, quasi-random sequences whose components varied in lumi-
nance (visual stimuli), or frequency (auditory stimuli), or both (audiovisual stimuli). In
all cases, subjects judged whether a sequence’s last four items replicated its first four. For
some audiovisual sequences, the frequency of each auditory item was monotonically related
to the accompanying visual item’s luminance; for other audiovisual sequences, frequency
and luminance were uncorrelated. Subjects with prior instrumental-training significantly
outperformed their untrained counterparts on both auditory and visual sequences, and on
sequences of correlated auditory and visual items. Reverse correlation analysis revealed
that the correlated, concurrent auditory stream altered how subjects weighted items at
particular ordinal positions in a sequence. Finally, congruence between auditory and visual
items enabled subjects to perform far better than predicted from simple summation of
information from the two modalities, perhaps by engaging special-purpose mechanisms
sensitive to audiovisual correlation.
Keywords
Multisensory, short-term memory, audiovisual integration, modality-appropriateness
Music-related skills are enhanced in people who have been trained to play an instrument
(Hyde et al., 2009; Kraus & Chandrasekaran, 2010). Surprisingly, this effect includes
superior performance on tasks with little obvious connection to music (Chan, Ho, & Cheung,
1998; Strait, Parbery-Clark, Hittner, & Kraus, 2012; Francois & Schön, 2011; Oxenham,
Fligor, Mason, & Kidd, 2003; Bergstrom, Howard, & Howard, 2012). Various examples
of cross-talk between auditory and visual processing (e.g., Sekuler, Sekuler, & Lau, 1997;
Guttman, Gilroy, & Blake, 2005; Berger & Ehrsson, 2013) led us to hypothesize that musi-
cians, who have had extensive practice with auditory tasks, might also demonstrate superior
visual processing if tested in an appropriate task.
The selection of an appropriate task took account of Welch and Warren’s (1980)
“modality-appropriateness” conjecture. Specifically, this conjecture asserts that when vi-
sual and auditory processing are compared, the advantage goes to vision when spatial at-
tributes must be processed, but the advantage shifts to audition when temporal attributes
are critical (Welch, 1999; Guttman et al., 2005). The modality-appropriateness conjecture
is supported by recent functional magnetic resonance imaging (fMRI) results. Michalka,
Rosen, Kong, Shinn-Cunningham, and Somers (2012) showed that task demands can dynamically
recruit different modality-related frontal lobe regions: a visual task entailing rapid
stimulus presentation activates cortical regions normally implicated in auditory attention,
but an auditory task requiring spatial judgements activates regions normally implicated in
visual attention. Together with the modality-appropriateness conjecture, Michalka et al.’s
results suggest that evidence of musicians’ possible superiority in visual processing would
depend upon the temporal characteristics of any test.
For our test, we chose a paradigm recently introduced by Gold, Aizenman, Bond,
and Sekuler (2013). Building on a paradigm that Agus, Thorpe, and Pressnitzer (2010)
devised for the study of auditory memory, Gold et al. showed subjects rapidly-presented
sequences of quasi-random luminance levels, and asked them to judge whether the second
four luminance levels in each eight-item sequence identically repeated the first four. Stimuli
in their experiments entailed a sequence of rapid variation along what have been described
as “elemental” or “low-level” sensory dimensions (Magnussen, 2000; Pasternak & Greenlee,
2005). Sequences of low-level sensory attributes afford useful experimental probes, in part
because they reduce the likelihood that subjects’ performance would be mediated by verbal
labels (Miller & Gazzaniga, 1998; Kahana & Sekuler, 2002). For the present purposes,
such sequences offered another potential advantage. Although subjects’ self-reports are
hardly dispositive (Nisbett & Wilson, 1977), some of Gold et al.’s subjects volunteered that
as they observed the visual sequences, they generated subvocal “tunes.” In other words,
they claimed to have recruited auditory imagery for what nominally was a purely visual
task (Berger & Ehrsson, 2013), suggesting a form of cross-talk between modalities that
Guttman et al. (2005) described as “hearing what the eyes see.”
Various evidence of cross-talk between seeing and hearing led us to ask whether musi-
cianship brought enhanced processing of rapidly-presented visual stimuli. For an answer, we
adapted Gold et al.’s paradigm in order to compare music-trained and non-trained subjects
with rapidly-presented stimulus sequences in which luminance levels or auditory frequen-
cies varied (Rammsayer & Altenmüller, 2006). Finally, as many ordinary events generate
multisensory signals, and the confluence of signals from multiple senses can powerfully in-
fluence perception (e.g., Thomas, 1941; Chen & Spence, 2010), we adapted the paradigm
in order to test subjects with multisensory sequences whose co-occurring audio and visual
components were either perceptually congruent or perceptually incongruent.
Method
In all of our test conditions, subjects had to judge whether the first four items in a
rapidly-presented stimulus sequence of eight items did or did not repeat. Figure 1 shows
schematic examples of our unimodal stimuli, with Visual stimuli in Panel A and Auditory
stimuli in Panel B. Items of each type were drawn from a homogeneous pool and were devoid
of semantic content.
As in Gold et al. (2013), visual stimuli were presented against a uniform background
of average luminance 19.03 cd/m² on a 17” CRT monitor (Sony Trinitron UltraScan P780)
with a resolution of 1024×768 pixels and a refresh rate of 75 Hz. Display luminances were
linearized by means of a calibration-based lookup table. Stimulus sequences were generated
and presented by an Apple iMac computer, using Matlab (version 7.7) and extensions
from the Psychophysics Toolbox (Brainard, 1997). Each visual sequence comprised eight
luminance levels presented in rapid succession to the same 4.1°×4.1° (128×128 pixels) region
at the display’s center. Each luminance level in an eight-item sequence was presented
for 10 complete refreshes of the CRT screen (≈133 ms), which meant that a complete eight-item
sequence played out in 1,067 ms. A viewing distance of 57 cm was enforced by means
of a chin rest.
Figure 1. Schematic examples of auditory and visual unimodal stimuli. Panel A: exemplars of visual stimuli; Panel B: exemplars of auditory stimuli. In each panel, two examples are shown for the N condition (the last four items in an eight-item sequence are uncorrelated with the first four) and for the RN condition (the last four items in an eight-item sequence repeat identically the first four).
Auditory stimuli were seamless streams of eight equal-duration pure tones, each ≈133
ms in duration. These tones were sampled at 44.1 kHz and presented at 70-72 dB(A) through
Sennheiser HD280 supra-aural earphones. To eliminate audible transients that would arise
from abrupt changes in frequency from one tone to another, the leading and trailing edges
of each tone were tapered with a raised cosine (≈1.13 msec rise or fall time).
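As an illustration of the tone synthesis just described, the following Python sketch builds one tone lasting 10 refreshes at 75 Hz, applies raised-cosine onset and offset ramps, and concatenates eight such tones. It is not the Matlab code used in the study; the function names, the 50-sample ramp (about 1.13 ms at 44.1 kHz) and the unit amplitude are assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 44_100   # sampling rate stated in the text
REFRESH_RATE_HZ = 75      # CRT refresh rate stated in the text
FRAMES_PER_ITEM = 10      # refreshes per item stated in the text
RAMP_SAMPLES = 50         # assumption: 50 samples is about 1.13 ms at 44.1 kHz


def tapered_tone(freq_hz, amplitude=1.0):
    """Return one pure tone (list of floats) with raised-cosine on/off ramps."""
    n_samples = round(SAMPLE_RATE_HZ * FRAMES_PER_ITEM / REFRESH_RATE_HZ)
    tone = []
    for i in range(n_samples):
        # Distance, in samples, from the nearer edge of the tone.
        edge = min(i, n_samples - 1 - i)
        if edge < RAMP_SAMPLES:
            gain = 0.5 * (1.0 - math.cos(math.pi * edge / RAMP_SAMPLES))
        else:
            gain = 1.0
        phase = 2.0 * math.pi * freq_hz * i / SAMPLE_RATE_HZ
        tone.append(amplitude * gain * math.sin(phase))
    return tone


def tone_stream(freqs_hz):
    """Concatenate tapered tones, one per frequency, into a single stream."""
    stream = []
    for freq_hz in freqs_hz:
        stream.extend(tapered_tone(freq_hz))
    return stream
```

Under these assumptions each tone holds 5,880 samples, so an eight-tone stream lasts 47,040 / 44,100 s, which matches the 1,067 ms duration of the visual sequence.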
When the experiment’s design called for a multimodal stimulus, auditory and visual
components of the stimulus sequence were presented synchronously. The synchronization
of auditory and visual sequences was assessed using photodiode and microphone inputs to
a dual-trace oscilloscope. Observations showed that the two streams were synchronized to within ±7
msec.
The stimulus-generation algorithm (see Figure 2) began by drawing eight random
samples from a normal distribution, N (0, 0.2). Samples more than ±2 standard deviations
from the mean were discarded and replaced. Together with the distribution’s relatively
small standard deviation, censoring extreme values served to homogenize items that would
appear in a stimulus sequence. This kept subjects from basing judgments on some highly-
distinctive, “oddball” item or items. As a measure of how well this goal was met, successive
samples in a sequence differed by a maximum of 1.57, while 10% of successive samples
differed by 0.07 or less, and 50% of successive values differed by 0.37 or less.
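The sampling step described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function name and defaults are hypothetical, and the sketch assumes that the 0.2 in N(0, 0.2) is a standard deviation, which the text does not state explicitly.

```python
import random


def draw_samples(n=8, sd=0.2, cutoff_sd=2.0, rng=None):
    """Draw n zero-mean Gaussian samples, redrawing any beyond +/- cutoff_sd SDs."""
    if rng is None:
        rng = random.Random()
    samples = []
    while len(samples) < n:
        x = rng.gauss(0.0, sd)
        # Discard and replace samples that fall outside the allowed range.
        if abs(x) <= cutoff_sd * sd:
            samples.append(x)
    return samples
```

Passing a seeded generator, for example `draw_samples(rng=random.Random(1))`, makes a sequence reproducible.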
To determine what luminances would be presented, the eight samples drawn for the
trial were translated into equivalent luminance contrasts. Contrast was defined as
$(L_{pix} - L_{bg})/L_{bg}$, where $L_{pix}$ is the luminance of a stimulus pixel, and $L_{bg}$ is the display’s background
luminance, which was held constant at 19.03 cd/m². The resulting samples ranged from 2
cd/m² to 42 cd/m². When a stimulus sequence included an auditory component, the eight
luminances in the sequence were translated into equivalent pure tones whose frequencies
were a linear function of luminance.
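The two translations in this paragraph can be illustrated with a short Python sketch. The contrast definition is taken from the text; the linear luminance-to-frequency map is an assumption, since the text says only that frequency was a linear function of luminance. Here the endpoints of the stated luminance range (2 to 42 cd/m²) are assumed to map onto the endpoints of the 344 to 400 Hz range reported for the experiment proper, and both function names are hypothetical.

```python
L_BG = 19.03  # background luminance in cd/m^2, from the text


def contrast_to_luminance(contrast, l_bg=L_BG):
    """Invert contrast = (L_pix - L_bg) / L_bg to recover pixel luminance."""
    return l_bg * (1.0 + contrast)


def luminance_to_frequency(lum, lum_range=(2.0, 42.0), freq_range=(344.0, 400.0)):
    """Map luminance to tone frequency linearly (endpoint mapping is assumed)."""
    l_lo, l_hi = lum_range
    f_lo, f_hi = freq_range
    return f_lo + (lum - l_lo) * (f_hi - f_lo) / (l_hi - l_lo)
```

With these assumed endpoints, a luminance of 22 cd/m² would correspond to a 372 Hz tone.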
Experiment
Valid comparisons between unimodal and multimodal conditions demand that baseline
performance with Auditory and Visual sequences be equated. Were the separate unimodal
contributions to a multimodal sequence substantially unequal, performance with a
multimodal sequence would be dominated by the more potent of the two unimodal drivers.
As a first step toward equating performance with Visual and Auditory sequences, we turned
to existing brightness-pitch cross-modal matching results reported by Marks (1974). Marks
asked subjects to adjust the pitch of a tone to perceptually match the brightness of various
achromatic Munsell patches. As we were committed to using the same luminance range
that Gold et al. (2013) had used, these cross-modal matching results dictated that we use
a set of frequencies spanning roughly two and a half octaves, 100 to 555 Hz (≈A♭2 to ≈C♯5 on an
equal-tempered musical scale).
A preliminary experiment tested 12 non-musicians with Visual stimuli generated as
in Gold et al. and with Auditory sequences drawn from the frequency range implied by
Marks’s cross-modal matching result, that is, 100–555 Hz. Although performance with
Visual stimuli nicely replicated what had been found previously by Gold et al. with the same
sample size, the Auditory sequences produced considerably higher d′ values. In particular,
subjects’ ability to recognize a within-sequence repetition of items was far better with
Auditory sequences than with Visual sequences, mean d′ values of 2.49 (SEM=0.15) and
1.23 (SEM=0.14), respectively (p <.01). This substantial mismatch between Auditory and
Visual performance probably reflects the difference between conditions used for cross-modal
matching and the conditions confronting our subjects. For example, Marks’s (1974) subjects
matched individual Auditory and Visual items under self-paced viewing and listening times,
and did so under conditions that put no burden on subjects’ memory. In contrast, our
task not only imposed a considerable burden on subjects’ memory, but, as importantly,
presented items in succession at a high rate (8 Hz). Welch and Warren’s (1980) modality-appropriateness
conjecture suggests that our task’s emphasis on temporal attributes of a
sequence would advantage auditory processing over visual processing, just as our preliminary
experiment showed. Moreover, the large difference between d′ values for Auditory and Visual
sequences is consistent with the idea that perceptual encoding of pitch sequences may
be aided by special-purpose neural mechanisms responsive to frequency shifts (Cousineau,
Demany, & Pressnitzer, 2009).
Whatever its cause, the approximately twofold difference in d′ values in our preliminary
experiment suggests that if the unimodal stimuli from that experiment were combined
in multisensory sequences, performance would be dominated by the sequences’ Auditory
components, rendering valid assessment of multisensory integration difficult. To avoid that
likelihood while retaining the luminance range that Gold et al. used, we narrowed the range
of auditory frequencies that would be used in the experiment proper. Specifically, the tones
comprising auditory sequences were drawn from a range of 344 to 400 Hz. In musical terms,
this reduced range of tones went from slightly below F4 to slightly above G4.
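The musical description of the narrowed range can be checked with a small Python sketch that finds the nearest equal-tempered note for a given frequency. The sketch assumes standard tuning (A4 = 440 Hz), which the text does not state, and the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, offset in cents) for the nearest equal-tempered note."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones_from_a4)
    cents = 100.0 * (semitones_from_a4 - nearest)
    midi = 69 + nearest  # MIDI note number; A4 is 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents
```

Under that tuning assumption, 344 Hz falls about 26 cents below F4 and 400 Hz about 35 cents above G4, so the range spans roughly 2.6 semitones.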
Figure 2. Flowchart for stimulus generation. The steps in the stimulus generation algorithm are explained in the text.
Within each block, stimulus sequences comprised two different structural categories.
In some sequences, hereafter termed “Repeat” (RN) sequences, the last four items in the
sequence repeated the first four items identically and in order; all items were reconstituted
anew for each trial. In other sequences, hereafter called “Noise” (N) sequences, each item
of the eight was the product of an independent sample (see Figure 2); these sequences, too,
were reconstituted anew for each trial. In each block of trials, Repeat and Noise sequences
were randomly intermingled, with both trial types occurring equally often.
With unimodal stimuli, subjects attempted to identify whether halves of an eight-item
stimulus sequence repeated or not. Unimodal stimuli, Visual or Auditory, were presented in
separate blocks of 75 trials each.
Figure 3 presents schematic examples of both classes of multimodal stimuli, AVcon-
gruent and AVincongruent. With multimodal sequences, subjects were instructed to ignore
a sequence’s auditory dimension, and to base judgments solely on variation in the visual
dimension. In order to probe limits on the ability to ignore concurrent Auditory signals,
we devised two classes of multimodal sequences: Congruent sequences, in which variation
in frequency was a monotone function of the accompanying luminance, and Incongruent
sequences, in which variation in frequency was uncorrelated with variation in luminance.
Each sequence comprised eight items presented in rapid succession, at 8 Hz. As it has long
been known that concurrent co-modulation promotes integration or binding of auditory
and visual signals (Thomas, 1941), we hypothesized further that co-modulation would help
subjects to recognize when items in a sequence were repeated.
In order to generate multisensory, Audiovisual stimuli whose Visual and Auditory com-
ponents were incongruent, the stimulus-generation algorithm (Figure 2) drew a second,
“dummy” set of eight luminance samples from the zero-mean Gaussian. The tonal equiva-
lents to members of this new set were derived and substituted for the tonal equivalents to
the luminances already selected for that trial. The result was a set of frequencies that were
uncorrelated with the set of luminances. To produce a Repeat (RN) sequence we replaced
the sequence’s last four items –whether unimodal or multimodal– with exact copies of the
first four items. With this last step, the algorithm could generate any of the stimulus types
that the experiment required.
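The logic of the stimulus-generation steps described above can be sketched in Python. This is an illustrative reconstruction rather than the study's Matlab code: the function names are hypothetical, the samples are left in contrast-like units (the translation to luminance and frequency is omitted), and the 0.2 spread is assumed to be a standard deviation.

```python
import random


def draw_items(n, rng, sd=0.2, cutoff_sd=2.0):
    """Draw n zero-mean Gaussian samples, redrawing any beyond the cutoff."""
    items = []
    while len(items) < n:
        x = rng.gauss(0.0, sd)
        if abs(x) <= cutoff_sd * sd:
            items.append(x)
    return items


def make_trial(repeat, congruent, rng):
    """Return (visual, auditory) sample lists for one eight-item trial."""
    visual = draw_items(8, rng)
    if congruent:
        # Congruent: the auditory stream follows the visual samples.
        auditory = list(visual)
    else:
        # Incongruent: a second, independent "dummy" draw drives the audio.
        auditory = draw_items(8, rng)
    if repeat:
        # Repeat (RN): the last four items become copies of the first four.
        visual[4:] = visual[:4]
        auditory[4:] = auditory[:4]
    return visual, auditory
```

For example, `make_trial(repeat=True, congruent=False, rng=random.Random(0))` yields a Repeat trial whose auditory and visual streams each repeat internally but are drawn independently of one another.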
Audiovisual stimuli were presented in blocks of 150 trials divided approximately equally
between two sequence types. For AVcongruent sequences, an Auditory item’s frequency
was an increasing linear function of the luminance of the accompanying Visual item; for
AVincongruent sequences, component luminances were uncorrelated with the frequency of
accompanying Auditory components. AVcongruent and AVincongruent sequences were randomly
intermingled within a block of trials, and equal numbers of RN and N sequences were
randomly presented for each type. With all Audiovisual stimuli, subjects were instructed to
ignore the Auditory component, and base their judgments solely on the Visual aspect of the
sequence.
Figure 3. Schematic examples of multimodal stimuli. Panel A: exemplars of stimuli whose audio and visual components were congruent; Panel B: exemplars of stimuli whose audio and visual components were incongruent, that is, uncorrelated. In each panel, two examples are shown for the N condition (the last four items in an eight-item sequence are uncorrelated with the first four) and for the RN condition (the last four items in an eight-item sequence repeat identically the first four).
Three hundred milliseconds after a stimulus sequence ended, a message on the screen
prompted the subject for a key press that signaled whether elements in the sequence re-
peated. Feedback, in the form of a text message, followed. Subjects were encouraged to rest
after every 50 trials, but were asked to remain seated throughout the experiment. The order
in which Auditory, Visual and Audiovisual trial blocks were presented was counterbalanced
across subjects. Before beginning the experiment, subjects practiced with 20 trials of each
stimulus type: Auditory, Visual, AVcongruent and AVincongruent.
We tested equal numbers of subjects who had had music training and ones who
had not. Previous results on processing of temporal sequences led us to hypothesize that
musicians would excel not only with auditory sequences, but with rapidly-presented visual
sequences as well (Deliége, Mélen, Stammers, & Cross, 1996; Deliége, 1996). A questionnaire
about music training was used to recruit and assign subjects to two groups, one whose
members experienced extensive musical training, and another whose members had little or
no training. Following Skoe and Kraus (2012), a subject qualified as a “musician” if he or
she had played one or more musical instruments for six or more years, and was continuing
to play/practice an instrument up to the time of the experiment. A “non-musician” was
defined as someone who either had never played a musical instrument or had played a
musical instrument for three or fewer years, more than six years before study participation.1
The musicians on average had 10.93 years of musical training.
1 We recognize that merely having played an instrument for some time does not truly make someone a musician, at least as that term is usually used. However, the terms “musician” and “non-musician” can serve as convenient, if imperfect, surrogates.
Fourteen musicians and fourteen non-musicians, all between 18 and 22
years of age, participated in this experiment. These subject samples were comparable in
size to ones previously tested by Gold et al. (2013) on the same task. Each subject was
compensated $10 for participation. Nine subjects in each group were female. Table 1
summarizes the history of musical training reported by subjects who qualified as Musi-
cians. All subjects had normal visual acuity and hearing, and had best-corrected Snellen
acuity of at least 20/40. Hearing was indexed by a subject’s pure tone average (PTA;
the average threshold in each participant’s better ear for 1, 2, and 4 kHz). All subjects’
PTAs, as measured with a Beltone 120 audiometer, were ≤25 dB(HL), which qualifies as
clinically-normal hearing (Mueller & Hall, 1998). The experimental protocol was approved
by Brandeis University’s Committee for the Protection of Human Subjects.
Table 1. Gender and age at which musical training began, years of musical training and instrument(s) played by musically trained subjects. For subjects who reported playing multiple instruments, instruments are listed in order of earliest learned.

Subject  Gender  Starting Age (years)  Musical Training (years)  Instrument
1        F       10                    12                        Violin, Piano
2        M       4                     15                        Cello, Violin, Guitar
3        F       5                     15                        Violin
4        F       7                     12                        Piano, Clarinet, Bass Clarinet
5        F       4.5                   15                        Piano, Drums
6        M       6                     15                        Piano
7        F       7                     12                        Piano, Flute, Saxophone
8        F       7                     9                         Piano
9        M       5                     9                         Piano, Flute, Guitar
10       M       10                    8                         Saxophone, Bass, and Guitar
11       M       12                    6                         Guitar, Piano
12       F       9                     6                         Flute, Piano
13       F       7                     8                         Saxophone, Violin
14       F       9                     11                        Piano, Flute, Guitar
Results
Performance with various stimulus types was measured by each subject’s d′ values.
Figure 4 shows musically-trained and non-trained subjects’ mean d′ values for Auditory,
Visual, AVcongruent and AVincongruent trials. These results were analyzed with separate
ANOVAs on results from unimodal and multimodal stimuli. The ANOVAs were followed
by t-tests, when needed.
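For readers unfamiliar with the measure, d′ is conventionally computed as z(hit rate) minus z(false-alarm rate). The Python sketch below treats a "repeat" response to an RN sequence as a hit and a "repeat" response to an N sequence as a false alarm. The text does not say how extreme rates were handled, so the log-linear correction used here is an assumption, as is the function name.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute d' = z(hit rate) - z(false-alarm rate) from trial counts."""
    # Assumed log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A subject who responds at chance, for example `d_prime(20, 20, 20, 20)`, obtains a d′ of zero, and larger values indicate better discrimination of Repeat from Noise sequences.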
First, the two types of unimodal stimuli, Auditory and Visual, proved to be equally
challenging for subjects (F(1,26) = .01, p = 0.92, η² = .023). Thus, the goal of equating
the two types of unimodal stimuli was achieved. Second, Musicians outperformed Non-musicians
in distinguishing unimodal random from unimodal repeating sequences (F(1,26)
= 8.24, p = 0.01, η² = .241). Finally, Musicians outperformed Non-musicians with both
Auditory (t(23) = 2.47; p = .02) and Visual (t(23) = 2.03; p = .05) sequences.
Turning to Audiovisual stimuli, an ANOVA showed no significant overall difference
between Musicians and Non-musicians (F(1,26) = 2.66, p = 0.11, η² = .095), although congruency
between Auditory and Visual components did matter: performance was significantly
better with AVcongruent sequences than with AVincongruent sequences (F(1,26) = 182.99,
p<.00001, η² = .859). A t-test comparing the performance of Musicians and Non-musicians
for AVcongruent and AVincongruent trials revealed significant differences on AVcongruent
trials (t(26) = 2.21; p = .04), but not on AVincongruent trials (t(26) = .59; p = .56).
The pattern of results shown in Figure 4 led us to ask whether Musicians’ advantage
over Non-musicians with Visual stimuli was exaggerated with stimuli that included Auditory
components. For an answer, we computed two sets of difference scores, the first by subtracting
a subject’s d′ value for Visual sequences from that subject’s d′ value for Auditory
sequences, and the second by subtracting a subject’s d′ value for Visual sequences from
that subject’s d′ value for AVcongruent sequences. Independent-samples t-tests compared
Musicians and Non-musicians on each of these sets of difference scores. The advantage that
Musicians showed with Visual stimuli was not significantly enlarged either with Auditory or
AVcongruent stimuli (p=0.08 and p=0.36, respectively; df=23, one-tailed t-tests).
Figure 4. Mean d′ values by stimulus type (A, V, AVcon, AVincon). For each condition, the mean d′ value for musicians (lighter bar) is stacked atop the corresponding value for non-musicians (darker bar). Error bars represent ±1 within-subject standard error.
Figure 5 highlights how history of musical training affects short-term memory for
sequences of various kinds. Years of music training and d′ were significantly correlated for
Auditory sequences (r(26) = .60, p <.05; Panel B) and for AVcongruent sequences (r(26) = .61,
p <.05; Panel C). For Visual sequences, though, the relationship between years of training
and performance failed to reach significance (r(26) = .47, p = .10; Panel A), as did the result
for AVincongruent sequences (r(26) = .08, p = .80; Panel D). So, with some, but not all
kinds of stimulus sequences, performance is significantly correlated with years of music
training.
Figure 5. Values of d′ as a function of years of musical training. Results for unimodal sequences are shown in Panels A and B; results for multimodal sequences are shown in Panels C and D.
Panels: A, Visual Only; B, Auditory Only; C, A-V Congruent; D, A-V Incongruent. Axes: years of musical training (horizontal) against d′ (vertical).
Reverse correlation
To determine whether performance differences between Musicians and Non-Musicians
were associated with differences in subjects’ strategies, we turned to reverse correlation analysis
(Ahumada & Beard, 1997; Murray, Bennett, & Sekuler, 2002). This analytic technique
computes the correlation between subjects’ responses across trials and the contrast of each
item in a sequence. The result is a set of weights that shows the relative influence exerted by
each item on subjects’ decisions. Recall that subjects were instructed to judge whether the
last four items in a sequence did or did not identically repeat the first four items. Previously,
applying this analysis to results with Visual sequences in this same task, Gold et al. (2013)
discovered that subjects placed preferential emphasis on the luminance of certain sequence
items. In particular, reverse correlation revealed that subjects gave particular weight to the
final items in each half of an eight-item Visual sequence. To see how adding correlated and
uncorrelated auditory sequences affected strategy, we performed the same analysis on Visual,
AVcongruent and AVincongruent data. Specifically, vectors containing the eight contrast
values displayed on a trial were sorted into four possible stimulus-response combinations.
The vectors were then averaged for each stimulus-response combination, and Equation 1
was used to produce the mean kernel $\vec{c}$:
$$\vec{c} = (rR + rN) - (nN + nR), \qquad (1)$$
where xY denotes the combination of response x (either “repeating” or “not repeating”)
and stimulus Y (either Repeat or Noise).
The result, $\vec{c}$, is an eight-element vector whose values are the relative weights assigned
to the items in a sequence that was being judged. If an observer’s classification
of a stimulus were uncorrelated with the contrast value at a particular ordinal position of a
sequence, the resulting mean kernel for that ordinal position would not significantly differ
from zero. A positive weight in the mean kernel indicates that positive contrast values
in a sequence promoted “repeating” responses, while negative contrast values promoted “non
repeating” responses. Likewise, a negative weight in the mean kernel indicates that positive
contrast values promoted “non repeating” responses, while negative contrast values
promoted “repeating” responses.
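The kernel computation in Equation 1 can be illustrated with a short Python sketch. It is a minimal reconstruction from the description in the text, not the authors' analysis code: the trial layout (a contrast vector, a stimulus label "R" or "N", and a response label "r" or "n"), the function names, and the treatment of an empty stimulus-response bin as a zero vector are all assumptions.

```python
def mean_vector(vectors, length=8):
    """Element-wise mean of equal-length vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * length
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(length)]


def classification_kernel(trials, length=8):
    """Compute the mean kernel (rR + rN) - (nN + nR) from a list of trials.

    Each trial is (contrasts, stimulus, response), with stimulus "R" (Repeat)
    or "N" (Noise) and response "r" ("repeating") or "n" ("not repeating").
    """
    bins = {"rR": [], "rN": [], "nR": [], "nN": []}
    for contrasts, stimulus, response in trials:
        # Sort each contrast vector into its stimulus-response combination.
        bins[response + stimulus].append(contrasts)
    m = {key: mean_vector(vecs, length) for key, vecs in bins.items()}
    return [
        (m["rR"][i] + m["rN"][i]) - (m["nN"][i] + m["nR"][i])
        for i in range(length)
    ]
```

Positions at which the resulting weight is near zero are those whose contrast had little bearing on the subject's response.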
Figure 6 shows the results of this analysis for Visual only (Panel A), AVincongruent
(Panel B) and AVcongruent (Panel C) stimuli. The filled symbols (•) represent Musicians’ re-
sults, and the open symbols (o) represent Non-Musicians’ results. First, consider Visual-only
sequences. These reverse correlation functions strongly resemble ones reported previously
for Visual sequences (Gold et al., 2013). In particular, the fourth and eighth items in an
eight-item sequence have the strongest influence on subjects’ judgments. Moreover, both
Musicians and Non-musicians exhibited this pattern. Thus, differences in d′ values between
Musicians and Non-Musicians with Visual-only stimuli probably did not reflect differences in
overall strategy, but resulted from some other aspect of how each group processed stimulus
information, for example, levels of internal noise (Burgess, Wagner, Jennings, & Barlow,
1981) or uncertainty (Pelli, 1985).
Next, consider results with AVcongruent stimuli. Unlike what was seen with Visual
stimuli, here there is a marked difference between Musicians’ and Non-musicians’ strategies.
In particular, Musicians appear to have maintained the same strategy that they used when
no Auditory stream was present; in contrast, Non-musicians show no consistent preferential
weighting for particular items within the eight-item sequence. Gold et al. (2013) showed that
their subjects weighted a sequence’s fourth and eighth items mainly in order to deal with
intrinsic uncertainty about the temporal boundaries of the visual sequences they were seeing.
(Estimates of these boundaries would obviously be an important element in comparing a
sequence’s two halves.) That interpretation predicts that subjects’ performance would be
reduced if they failed to use such a strategy, which is exactly what we found. Thus, it
appears that the presence of a correlated Auditory stream interferes with the ability of Non-musicians
to maintain the strategy that they might otherwise use to overcome the limiting
effects of temporal uncertainty. Musicians, on the other hand, appear to be much less
affected by the concurrent, correlated Auditory stream.
Figure 6. Reverse correlations based on sequences’ Visual attributes. In each panel, separate curves are shown for Musicians (•) and Non-musicians (o). Panel A shows the reverse correlations for Visual unimodal sequences; Panel B shows the reverse correlations for AVincongruent sequences; and Panel C shows the reverse correlations for AVcongruent sequences.
Axes: item in sequence (1 to 8, horizontal) against relative weight (vertical).
Finally, consider results produced with AVincongruent sequences (Panel C). Here, both
Musicians and Non-musicians seem to have failed to maintain the strategy they adopted with
Visual-only stimuli. Further, recall that with AVincongruent stimuli there was no significant
difference between Musicians’ and Non-musicians’ d′ values. Apparently, the inclusion of an
uncorrelated Auditory stream undermines subjects’ ability to selectively weight key items in
a sequence.
Discussion
Our results demonstrated that Musicians have enhanced ability to detect repetitions
of items within rapidly-presented sequences, and that this enhanced ability extends not
only to Auditory sequences, but also to Visual, and correlated Audiovisual sequences as well.
This result may reflect the effects of training to play an instrument, especially given that
performance does correlate with years of training (Figure 5B and C). However, our results
should not encourage non-musically trained readers to rush headlong into taking up a
musical instrument. The mere fact that Musicians outperform Non-musicians on our task does
not mean that music training per se is responsible (see Morrison & Chein, 2011). After all,
a person who, absent training, would have excelled on the task anyway might have been
more inclined to start such training. Further, pre-existing talent for processing Auditory
sequences might encourage a person not only to initiate, but also to persist in learning to
play an instrument.
A proper test of causal linkage between music training and performance on a task
like ours requires that researchers begin with subjects who had never had training to play
an instrument. Some of those naive subjects would be randomly assigned to undergo
instrument training for an extended period, while control subjects receive some equivalent
non-instrument training for the same period (Barrett, Ashley, Strait, & Kraus, 2013).
Ideally, the effect of differential training would be assessed not only via differential behavioral
changes, but also in terms of correlated changes in the brain. We know of just one study that
meets these stringent criteria. In that study, Hyde et al. (2009) constituted two groups of
children who were a bit over six years old at the start of the 15-month study. One group
received weekly, 30-minute private keyboard lessons during the study; a second group received
no instrumental music training, but instead participated in weekly 40-minute group music
classes, during which they sang and played with drums and bells. Comparisons of pre- and
post-treatment measures showed that instrumental training differentially enhanced ability
to distinguish between pairs of five-tone musical phrases that differed either in melody, that
is, in pitch sequence, or in rhythm (Overy et al., 2004). Moreover, analysis of magnetic
resonance images captured at the start and end of the study revealed that training-induced
improvements on the melodic/rhythmic discrimination test were correlated with deformation
changes in a key auditory area of subjects’ right hemispheres, that is, the lateral aspect of
Heschl’s gyrus.
The criteria we and others used to distinguish Musicians from Non-musicians have
obvious limitations. In fact, the challenge of precisely defining what constitutes a “musician”
is well-known (Levitin, 2012). Following the lead of Skoe and Kraus (2012), we defined
musician status mainly by subjects’ self-reports of how long they had played an instrument.
Of course, not every person who receives many years of music instruction or engages in years
of continuous practice achieves a level of proficiency that would satisfy common definitions of
“musician.” Conversely, a person might possess sufficient native talent that he or she could
achieve very high proficiency in very short order. Additionally, there likely are multiple
differences between our two groups, including differences in auditory imagery (Brown &
Palmer, 2013; Keller, Dalla Bella, & Koch, 2010), perceptual grouping (Kung, Tzeng, Hung,
& Wu, 2011), and other, general cognitive factors as well. By assessing multiple dimensions
of auditory experience and other cognitive attributes, future studies could contribute toward
a more complete definition of “musicianship.”
Of course, playing a musical instrument does not require processing quasi-random
luminance sequences like the ones in our Visual or Audiovisual stimuli. However, playing
an instrument could entail the translation of spatial information into temporal sequences,
as one does, for example, when reading music. This spatio-temporal translation might be
expedited by the spontaneous, natural mapping of pitch onto the visual feature of vertical
location (Evans & Treisman, 2010). Mindful of the role that spatial information might
play for musicians, Bergstrom et al. (2012) measured the speed and accuracy with which subjects
learned to make key presses to each of a series of targets presented at different locations on a
computer screen. Embedded in the sequence of events was a sub-sequence in which events’
locations were governed by the rules of an artificial grammar (e.g., Reber & Millward,
1968). Using much the same definition of “musician” that we did, Bergstrom et al. found
that Musicians’ implicit learning of sequential regularities was better than that of
Non-musicians. This result suggests that among the skills on which musicians excel is the skill
of implicitly learning and remembering quasi-random spatio-temporal sequences.
Figure 4 showed that even though subjects were instructed to focus exclusively on
the Visual aspect of an Audiovisual sequence, the presence of a concurrent, congruent
Auditory sequence boosted performance considerably over what was seen with either unimodal
sequence. Although our data do not support formal model selection, we can compare this
Audiovisual effect against what would be expected from one simple, widely-used benchmark.
Imagine that two orthogonal signals were processed by independent mechanisms, A and V,
whose noise was uncorrelated. Under such conditions, with each sensitivity expressed as
d′, the response to the combination of the two signals would be $\sqrt{(d'_V)^2 + (d'_A)^2}$ (Green &
Swets, 1966; Green, 1958; Viemeister & Wakefield, 1991). A t-test confirmed that AVcongruent
sequences boosted performance to a level well above the predicted value (t(23)=-3.46,
p <0.001). For the sake of completeness, we also compared performance with AVincongruent
sequences against performance with each type of unimodal sequence. Neither comparison
was statistically significant (p=.51 and .33, for t-tests against Auditory and Visual sequences,
respectively). Returning to the surprisingly powerful advantage seen with AVcongruent
sequences, it should be noted that the super-additivity of Auditory and Visual components in
such sequences was produced despite the fact that those separate unimodal aspects were
strongly correlated, that is, distinctly non-orthogonal. As this surprising result may be
valuable in informing theories of multisensory integration, the boundary conditions on this
result demand further study. For example, it may be that this apparent super-additivity reflects
the engagement of mechanisms specialized for multisensory coincidence or congruence (e.g.,
Bushara et al., 2003; Kayser, Logothetis, & Panzeri, 2010; Orchard-Mills et al., 2013).
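The comparison against the quadratic-summation benchmark can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-subject d′ values are invented, the function names `predicted_dprime` and `paired_t` are hypothetical, and the use of a paired t statistic is an assumption, since the text says only that a t-test was used.

```python
import math
import statistics


def predicted_dprime(d_visual, d_auditory):
    """Quadratic-summation benchmark: sqrt(d'_V^2 + d'_A^2)."""
    return math.sqrt(d_visual ** 2 + d_auditory ** 2)


def paired_t(observed, predicted):
    """Paired t statistic for (observed - predicted) and its degrees of freedom."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-subject sensitivities, invented purely for illustration.
d_visual = [1.0, 0.8, 1.2, 0.9]
d_auditory = [1.1, 0.7, 1.0, 1.3]
d_av_congruent = [2.0, 1.5, 2.1, 2.2]

# Benchmark prediction for each subject, then the observed-vs-predicted test.
predictions = [predicted_dprime(v, a) for v, a in zip(d_visual, d_auditory)]
t_value, df = paired_t(d_av_congruent, predictions)
print(f"t({df}) = {t_value:.2f}")
```

With the difference taken as observed minus predicted, a positive t indicates performance above the benchmark; taking the difference in the opposite order simply flips the sign.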
We should note that one recent study did not find differences in Musicians’ and
Non-musicians’ visual memory. Differences between that study and our own may be instructive,
so they are worth considering in some detail. For stimuli, Cohen, Evans, Horowitz, and
Wolfe (2011) chose pictures of objects, abstract art, speech clips and clips of familiar music.
Stimuli were presented one at a time, for five seconds each. After all stimuli of one class had
been presented, the researchers tested recognition memory by presenting intermixed old
(previously presented) and new (novel) stimuli, noting how well subjects correctly
categorized these intermixed stimuli as “old” or “new”. The key result for the present discussion is
that Musicians and Non-musicians did not differ in recognition memory for visual stimuli.
Of course, multiple task-related differences make it difficult to compare Cohen et al.’s results
to ours. These differences include (i) the types of stimuli used (low-level, elemental features
vs. higher-level stimuli, such as familiar tunes), (ii) the temporal characteristics of stimulus
presentation (rapid presentation of item sequences, which worked against online rehearsal,
vs. five seconds per individual item), and (iii) the task (recognizing within-trial repetitions
of items vs. longer term recognition of single items). Although any or all of these differences
could account for the difference between Cohen et al.’s results and our own, it seems
advisable that when researchers want to assess musicians’ and non-musicians’ visual memory,
their choice of stimuli and test task should take account of the modality-appropriateness
conjecture (Welch & Warren, 1980).
In our study, Audiovisual congruence was defined by a positive monotone relationship
between an item’s luminance and the frequency of an accompanying tone. As implemented,
Audiovisual congruence could be described as an all-or-none phenomenon: while components
of an AVcongruent sequence were perfectly correlated, components of an AVincongruent
sequence were on average completely unrelated. Of course, one could devise Audiovisual
sequences in which the correlation between Auditory and Visual components was neither
1.0 nor 0.0, but various values between. Such partially-correlated stimuli, like ones Agus
and Pressnitzer (2013) used to study memory for Auditory sequences, could be leveraged
to identify the strategies subjects drew on. It is worth noting that Audiovisual congruence
could take on forms other than the one we implemented (see review in Evans & Treisman,
2010). One such form would exploit the normal Audiovisual congruence that is characteristic
of speech production. Speech production requires movements of the mouth and face, which
produce a reliable correlation between the auditory output of the vocal tract, on one hand,
and visual motion cues, on the other. It has long been known that speech-related visual
cues alter the intelligibility and detectability of heard speech (e.g., Campbell, 2008). In fact,
the congruence between a speaker’s mouth and lip movements and the sound uttered by
the speaker is the basis of the well-known McGurk-MacDonald effect (McGurk & MacDonald, 1976), in which altering
the normal relationship between a spoken sound and the accompanying movements of the
mouth distorts a listener’s perception of that sound. Face-to-face speech can be described
as an inherently multisensory phenomenon (Chandrasekaran, Lemus, Trubanova, Gondan,
& Ghazanfar, 2011; Chandrasekaran, Lemus, & Ghazanfar, 2013). It is noteworthy that
this form of Audiovisual congruence extends to situations seemingly far removed from
face-to-face speech. In particular, this form of Audiovisual congruence was recently incorporated
into a first-person fisherman computer game (Sun, Shinn-Cunningham, Somers, & Sekuler,
2014; Bensussen et al., 2014). In that game, responses to computer-generated fish that
swam across a video display were considerably speeded when the amplitude modulation of
a sound emitted by a fish was correlated with the periodic fluctuations in the fish’s size.
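The partially-correlated Audiovisual sequences mentioned earlier in this paragraph could be generated along the following lines. The sketch is hypothetical and rests on assumptions that are not taken from our methods: item values are drawn from a standard Gaussian, the function names `correlated_pair` and `pearson` are illustrative, and the standardized values would still have to be mapped onto actual luminances and tone frequencies.

```python
import math
import random


def correlated_pair(n_items, rho, rng):
    """Return two standardized sequences whose population correlation is rho.

    Assumes Gaussian item values; rho must lie between -1 and 1.
    """
    visual = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    # Mix the visual values with independent noise to reach the target correlation.
    mix = math.sqrt(1.0 - rho ** 2)
    auditory = [rho * v + mix * e for v, e in zip(visual, noise)]
    return visual, auditory


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
visual, auditory = correlated_pair(10_000, 0.5, rng)
sample_r = pearson(visual, auditory)
```

Setting `rho` to 1.0 or 0.0 recovers the two extremes used in our study. For sequences as short as eight items, the sample correlation of any single sequence would scatter widely around the target value, so the target describes the population of sequences rather than each one.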
Finally, when presented with Audiovisual sequences, our subjects were instructed to
attend only to variations along the Visual dimension, ignoring the accompanying Auditory
dimension. Classification decisions, they were told, should be based only on the
correspondence between luminance sequences in a stimulus’ first and second halves. This instruction
to attend only to a sequence’s Visual dimension was reinforced by the fact that explicit
feedback after every response was contingent solely on the relationship between the subject’s
judgement and the stimulus’ Visual dimension. It is unknown how our subjects’
performance would have been affected had they been given the inverse instruction, that is,
to base decisions on Auditory signals, while ignoring Visual ones, although results from a
speeded-categorization task do suggest that with Audiovisual stimuli, leakage from a
nominally unattended modality is bidirectional (Bensussen et al., 2014).
References
Agus, T. R., & Pressnitzer, D. (2013). The detection of repetitions in noise before and after
perceptual learning. Journal of the Acoustical Society of America, 134 (1), 464–473.
Agus, T. R., Thorpe, S. J., & Pressnitzer, D. (2010). Rapid formation of robust auditory
memories: insights from noise. Neuron, 66 (4), 610–618.
Ahumada, A. J., Jr, & Beard, B. L. (1997). Image discrimination models predict detection
in fixed but not random noise. Journal of the Optical Society of America. A, Optics,
image science, and vision, 14 (9), 2471–2476.
Barrett, K. C., Ashley, R., Strait, D. L., & Kraus, N. (2013). Art and science: how musical
training shapes the brain. Frontiers in Psychology, 4, 713.
Bensussen, S., Chou, K. F., Varghese, L., Sun, Y., Somers, D. C., Shinn-Cunningham,
B., & Sekuler, R. (2014). Bidirectional audiovisual interactions: Evidence from a
computerized fishing game. Providence, RI: Meeting of the Acoustical Society of
America.
Berger, C. C., & Ehrsson, H. H. (2013). Mental imagery changes multisensory perception.
Current Biology, 23 (14), 1367–1372.
Bergstrom, J. C. R., Howard, J. H., & Howard, D. V. (2012). Enhanced implicit sequence
learning in college-age video game players and musicians. Applied Cognitive
Psychology, 26, 91–96.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433-436.
Brown, R. M., & Palmer, C. (2013). Auditory and motor imagery modulate learning in
music performance. Frontiers in Human Neuroscience, 7, 320.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human
visual signal discrimination. Science, 214 (4516), 93–94.
Bushara, K. O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K., & Hallett, M. (2003).
Neural correlates of cross-modal binding. Nature Neuroscience, 6 (2), 190–195.
Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases.
Philosophical Transactions of the Royal Society of London, Series B, Biological
Sciences, 363 (1493), 1001–1010.
Chan, A. S., Ho, Y. C., & Cheung, M. C. (1998). Music training improves verbal memory.
Nature, 396 (6707), 128.
Chandrasekaran, C., Lemus, L., & Ghazanfar, A. A. (2013). Dynamic faces speed up
the onset of auditory cortical spiking responses during vocal detection. Proceedings
of the National Academy of Sciences of the United States of America, 110 (48),
E4668–E4677.
Chandrasekaran, C., Lemus, L., Trubanova, A., Gondan, M., & Ghazanfar, A. A. (2011).
Monkeys and humans share a common computation for face/voice integration. PLoS
Computational Biology, 7 (9), e1002165.
Chen, Y. C., & Spence, C. (2010). When hearing the bark helps to identify the dog:
Semantically-congruent sounds modulate the identification of masked pictures.
Cognition, 114 (3), 389–404.
Cohen, M. A., Evans, K. K., Horowitz, T. S., & Wolfe, J. M. (2011). Auditory and visual
memory in musicians and nonmusicians. Psychonomic Bulletin & Review, 18 (3),
586–591.
Cousineau, M., Demany, L., & Pressnitzer, D. (2009). What makes a melody: The
perceptual singularity of pitch sequences. The Journal of the Acoustical Society of America,
126 (6), 3179–3187.
Deliége, I. (1996). Cue abstraction as a component of categorisation processes in music
listening. Psychology of Music, 24 (2), 131-156.
Deliége, I., Mélen, M., Stammers, D., & Cross, I. (1996). Musical schemata in real time
listening to a piece of music. Music Perception: An Interdisciplinary Journal, 14 (4),
117-160.
Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and
auditory features. Journal of Vision, 10 (1), 6, 1–12.
Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical
and linguistic structures. Cerebral Cortex, 21 (10), 2357-65.
Gold, J. M., Aizenman, A., Bond, S. M., & Sekuler, R. (2013). Memory and incidental
learning for visual frozen noise sequences. Vision Research.
Green, D. M. (1958). Detection of multiple component signals in noise. The Journal of the
Acoustical Society of America, 30, 904–911.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York:
Wiley.
Guttman, S., Gilroy, L. A., & Blake, R. (2005). Hearing what the eyes see: auditory
encoding of visual temporal sequences. Psychological Science, 16 (3), 228-35.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug,
G. (2009). Musical training shapes structural brain development. Journal of
Neuroscience, 29 (10), 3019–3025.
Kahana, M. J., & Sekuler, R. (2002). Recognizing spatial patterns: a noisy exemplar
approach. Vision Research, 42 (18), 2177–2192.
Kayser, C., Logothetis, N. K., & Panzeri, S. (2010). Visual enhancement of the information
representation in auditory cortex. Current Biology, 20 (1), 19–24.
Keller, P. E., Dalla Bella, S., & Koch, I. (2010). Auditory imagery shapes movement timing
and kinematics: evidence from a musical task. Journal of Experimental Psychology:
Human Perception and Performance, 36 (2), 508–513.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory
skills. Nature Reviews Neuroscience, 11 (8), 599–605.
Kung, S.-J., Tzeng, O. J. L., Hung, D. L., & Wu, D. H. (2011). Dynamic allocation of
attention to metrical and grouping accents in rhythmic sequences. Experimental Brain
Research, 210 (2), 269–282.
Levitin, D. J. (2012). What does it mean to be musical? Neuron, 73 (4), 633-7.
Magnussen, S. (2000). Low-level memory processes in vision. Trends in Neuroscience,
23 (6), 247–251.
Marks, L. E. (1974). On associations of light and sound: The mediation of brightness, pitch
and loudness. The American Journal of Psychology, 87, 173-188.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Michalka, S., Rosen, M., Kong, L., Shinn-Cunningham, B., & Somers, D. (2012). fMRI
investigations of temporal sequence processing in visual short-term memory of humans.
(Poster presented at SFN2012 Conference)
Miller, M. B., & Gazzaniga, M. S. (1998). Creating false memories for visual scenes.
Neuropsychologia, 36 (6), 513–520.
Morrison, A. B., & Chein, J. M. (2011). Does working memory training work? The promise
and challenges of enhancing cognition by training working memory. Psychonomic
Bulletin & Review, 18 (1), 46–60.
Mueller, G., & Hall, J. W. (1998). Audiologist’s Desk Reference: Audiologic Management,
Rehabilitation and Terminology (Vol. II). Singular Publishing Group, Inc.
Murray, R. F., Bennett, P. J., & Sekuler, A. B. (2002). Optimal methods for calculating
classification images: weighted sums. Journal of Vision, 2 (1), 79–104.
Nisbett, R., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on
mental processes. Psychological Review, 84 (3), 231-259.
Orchard-Mills, E., Leung, J., Burr, D., Morrone, M. C., Wufong, E., Carlile, S., & Alais, D.
(2013). A mechanism for detecting coincidence of auditory and visual spatial signals.
Multisensory Research, 26 (4), 333–345.
Overy, K., Norton, A. C., Cronin, K. T., Gaab, N., Alsop, D. C., Winner, E., & Schlaug,
G. (2004). Imaging melody and rhythm processing in young children. NeuroReport,
15, 1723–1726.
Oxenham, A. J., Fligor, B. J., Mason, C. R., & Kidd, G., Jr. (2003). Informational masking
and musical training. The Journal of the Acoustical Society of America, 114 (3), 1543–
1549.
Pasternak, T., & Greenlee, M. W. (2005). Working memory in primate sensory systems.
Nature Reviews Neuroscience, 6 (2), 97–107.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and
discrimination. Journal of the Optical Society of America A, 2 (9), 1508–1532.
Rammsayer, T., & Altenmüller, E. (2006). Temporal information processing in musicians
and nonmusicians. Music Perception: An Interdisciplinary Journal, 24, 37–47.
Reber, A. S., & Millward, R. B. (1968). Event observation in probability learning. Journal
of Experimental Psychology, 77 (2), 317–327.
Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception.
Nature, 385 (6614), 308.
Skoe, E., & Kraus, N. (2012). A little goes a long way: How the adult brain is shaped by
musical training in childhood. The Journal of Neuroscience, 32 (34), 11507–11510.
Strait, D. L., Parbery-Clark, A., Hittner, E., & Kraus, N. (2012). Musical training during
early childhood enhances the neural encoding of speech in noise. Brain and language,
123 (3), 191-201.
Sun, Y., Shinn-Cunningham, B., Somers, D., & Sekuler, R. (2014). Multisensory
learning and integration in a first-person fisherman game. Boston, MA: Meeting of the
Cognitive Neuroscience Society.
Thomas, G. (1941). Experimental study of the influence of vision on sound localization.
Journal of Experimental Psychology, 28, 167-177.
Viemeister, N. F., & Wakefield, G. H. (1991). Temporal integration and multiple looks.
The Journal of the Acoustical Society of America, 90 (2 Pt 1), 858–865.
Welch, R. B. (1999). Meaning, attention and the "unity assumption" in the intersensory bias
of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Müsseler
(Eds.), Cognitive contributions to the perception of spatial and temporal events. (pp.
371–387). Amsterdam: Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory
discrepancy. Psychological Bulletin, 88 (3), 638–667.