
Inducing musical-interval learning by combining task practice with periods of stimulus exposure alone

David F. Little1 & Henry H. Cheng2 & Beverly A. Wright3

Published online: 22 August 2018
© The Psychonomic Society, Inc. 2018

Abstract

A key component of musical proficiency is the ability to discriminate between and identify musical intervals, or fixed ratios between pitches. Acquiring these skills requires training, but little is known about how to best arrange the trials within a training session. To address this issue, learning on a musical-interval comparison task was evaluated for two four-day training regimens that employed equal numbers of stimulus presentations per day. A regimen of continuous practice yielded no learning, but a regimen that combined practice and stimulus exposure alone generated clear improvement. Learning in the practice-plus-exposure regimen was due to the combination of the two experiences, because two control groups who received only either the practice or the exposure from that regimen did not learn. Posttest performance suggested that this improvement in comparison learning generalized to an untrained stimulus type and an untrained musical-interval identification task. Naïve comparison performance, but not learning, was better for larger pitch-ratio differences and for individuals with more musical experience. The reported benefits of the practice-plus-exposure regimen mirror the outcomes for fine-grained discrimination and speech tasks, suggesting that a general learning principle is involved. In practical terms, it appears that combining practice and stimulus exposure alone is a particularly effective configuration for improving musical-interval perception.

Keywords Perceptual learning · Music cognition · Sound recognition · Psychoacoustics

A crucial component of musicianship is the ability to discriminate between and identify musical intervals, such as a perfect fourth and a major third. This ability is employed while playing by ear, self-correcting during music performance, and analyzing music (Arenson, 1984; Cleland & Dobrea-Grindahl, 2013; Furby, 2016; Karpinski, 2000). Therefore, learning this skill opens a gateway to musical expertise. Unfortunately, mastery is not easy. Even accomplished performers can struggle to attain proficiency, and only people with extensive musical experience tend to reach it (Burns & Ward, 1978; Foxton, Brown, Chambers, & Griffiths, 2004; McDermott, Keebler, Micheyl, & Oxenham, 2010; Thompson, 2013). Despite the known challenges of musical-interval learning, there has been little investigation of the characteristics of successful training regimens. The focus of the present investigation was on how training influences musical-interval perception depending on the configuration of the training blocks.

Musical-interval learning is so difficult, in part, because of the abstraction required. Musical intervals are determined by the pitch ratio between two musical notes. For example, both the note pair A4 (440 Hz) and D5 (587 Hz) and the pair B2 (123 Hz) and E3 (165 Hz) instantiate a perfect fourth, even though their absolute pitches are quite different. Likewise, the pair A4 and D5 (587 Hz) form a perfect fourth, and the pair A4 and E5 (660 Hz) form a perfect fifth, even though they share an absolute pitch. Thus, to identify musical intervals, the listener must be able to compute the pitch ratio between two notes well enough to be able to discriminate among different ratios and then apply the appropriate label to each ratio, which is a challenging categorization task.
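The ratio-based definition of an interval can be checked numerically. The following is a minimal sketch, assuming equal temperament with A4 = 440 Hz; the function names and the note offsets (semitones relative to A4) are illustrative choices, not part of the study's materials.

```python
import math

A4_HZ = 440.0  # reference pitch assumed for this sketch

def note_hz(semitones_from_a4):
    """Equal-tempered frequency of the note a given number of semitones from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)

def semitones_between(f_low, f_high):
    """Interval size between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)

# Offsets from A4 (hypothetical helper values): D5 = +5, E5 = +7, B2 = -22, E3 = -17
a4, d5, e5 = note_hz(0), note_hz(5), note_hz(7)
b2, e3 = note_hz(-22), note_hz(-17)

fourth_high = semitones_between(a4, d5)  # perfect fourth near 440 Hz
fourth_low = semitones_between(b2, e3)   # perfect fourth roughly two octaves lower
fifth = semitones_between(a4, e5)        # perfect fifth sharing the note A4
```

Both fourths come out as five semitones despite very different absolute frequencies, whereas the fourth and the fifth that share A4 differ by two semitones.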

* David F. Little

1 Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA

2 Communication Sciences and Disorders, Northwestern University, Evanston, IL 60208-3550, USA

3 Communication Sciences and Disorders, Knowles Hearing Center, Northwestern Institute for Neuroscience, Northwestern University, Evanston, IL 60208-3550, USA

Attention, Perception, & Psychophysics (2019) 81:344–357
https://doi.org/10.3758/s13414-018-1584-x

We are aware of only two prior training studies related to musical-interval perception. In one (Burns & Ward, 1978), as part of a larger investigation of pitch-ratio discrimination and musical-interval identification, four undergraduate music majors and two musically untrained listeners practiced discriminating between pitch ratios in two-presentation forced-choice trials. Discrimination thresholds were measured separately for each of five different pitch ratios using an adaptive tracking procedure. Training blocks were repeated until the thresholds reached asymptote. Neither the amount nor the temporal distribution of the training was reported. Based on descriptive statistics, it appeared that both the music majors and the musically untrained listeners improved, but that the music majors reached lower (i.e., better) discrimination thresholds. In the other investigation (Foxton et al., 2004), listeners with ~ 3 or fewer years of musical experience were trained for 25 min a day for seven days on melody difference detection. The task was to indicate which of two melodies differed from a random reference melody presented on each trial. During training, for one group (n = 10) a single-note change altered the direction of pitch changes (the pitch contour) of the melody. For another group (n = 10), a single-note change maintained the pitch contour but changed the pitch ratios of the melody. Between the pre- and posttraining tests, melody difference detection improved in both groups when the melodies differed in pitch contour, but it improved in neither group when the melodies differed in pitch ratio.

Here we compared the effectiveness of two different three-day regimens for training on a musical-interval comparison task that differed only in the configuration of the training blocks: all-practice, or practice+exposure. In the all-practice regimen, listeners practiced the trained task continuously throughout each session, which has been a standard approach to perceptual training. In the practice+exposure regimen, in each training session the blocks of task practice were interleaved with blocks of stimulus exposure in the absence of practice. Our motivation was that we had previously observed that practice+exposure regimens can yield at least as much learning on perceptual tasks as all-practice regimens that take the same total time. Thus, with practice+exposure training, the amount of practice could be reduced relative to conventional all-practice training, with no reduction in learning. Practice+exposure regimens can be successful even when neither the blocks of practice nor the blocks of stimulus exposure alone yield learning on their own (as we tested with practice+silence and exposure+silence regimens). In previous investigations, practice+exposure training promoted learning on fine-grained discrimination tasks, namely auditory frequency discrimination (Wright, Sabin, Zhang, Marrone, & Fitzgerald, 2010) and visual orientation discrimination (Szpiro, Wright, & Carrasco, 2014), as well as on speech tasks, namely acquisition of a nonnative phonetic contrast (Wright, Baese-Berk, Marrone, & Bradlow, 2015) and adaptation to a foreign accent (Wright et al., 2015). Therefore, we suspected that it might aid learning on musical-interval perception, as well.

During the musical-interval comparison task, two musical intervals were presented, and listeners were asked to indicate which of the two was the target interval (a perfect fourth) rather than a foil interval. We chose this task because a listener who cannot distinguish between a target interval and a foil has no hope of labeling those intervals correctly in isolation. Isolated interval identification requires listeners’ resolution to be at least a semitone, the smallest difference between two musical intervals, but there are several indications that pitch-ratio discrimination is poorer than that in inexperienced listeners. For example, in one report, thresholds for pitch-ratio discrimination were larger than a semitone, and as large as five semitones, in listeners with no musical experience, but were smaller than a semitone in formally trained musicians (McDermott et al., 2010). Likewise, during magnitude estimation of pitch ratios, listeners with less than ~ 2 years of musical experience often gave the most similar pitch ratios along a half-semitone grid the same or misordered magnitudes, whereas listeners with ~ 11 or more years of experience gave more accurate estimates (Russo & Thompson, 2005). After the training on the musical-interval comparison task was complete, we tested all listeners on the trained comparison condition; on the trained comparison task using triangle tones, rather than the pure tones used during training (an untrained stimulus); and on musical-interval identification (an untrained task).

Finally, because all of the listeners completed the same pretraining test, we took advantage of the opportunity to examine naïve performance on musical-interval comparison across a relatively large group of people. Pitch-ratio discrimination performance in naïve listeners has been reported to improve as the difference in pitch ratios increases, and to be better in musicians than in nonmusicians (Burns & Campbell, 1994; Burns & Ward, 1978; McDermott et al., 2010). In the present investigation, we assessed naïve performance on the related task of musical-interval comparison in a larger group than we had previously evaluated for discrimination (n = 28 vs. n = 4–8 per group).

Method

Listeners

We report data for a total of 28 listeners (21 females, 7 males), ranging in age from 18 to 27 years (mean = 20.2 years, SD = 2.0). All had normal hearing thresholds at the standard audiometric frequencies assessed during the investigation (see below), no previous experience with psychoacoustic testing, and no formal musical-interval training. They reported between 0 and 8 years of experience playing a musical instrument (mean = 3.2 years, SD = 2.8). One listener reported 15 years of experience with voice training. Individuals whose response accuracy on the pretest (assessed from the first two blocks of trials on the trained condition on Day 1; see below) was ≥ 85% correct (n = 9) were excluded from participation, because they had relatively little room for improvement (total who completed the pretest: n = 37; total who completed the experiment: n = 28). Note that the Day 1 performance for one listener was 90% correct (Figs. 2B and 2C in the Results). That listener scored < 85% correct on the pretest (77% correct), but 90% correct across the following 12 blocks of training trials that same day. Listeners were recruited from the Northwestern University community and paid for their participation. All procedures were approved by the Northwestern Office for the Protection of Research Subjects.

General protocol

All listeners participated in the same basic protocol over four consecutive days. On Day 1, a familiarization period was followed by a pretest and a training session. On Days 2 and 3 the training session was repeated, and on Day 4 there was a posttest on the trained condition and two untrained conditions, followed by a test of hearing thresholds. The training sessions differed across the four different training regimens, completed by four different listener groups.

Stimuli

The stimuli were tokens of four different musical intervals. Each interval consisted of two sequentially presented pitches that were separated by a constant pitch ratio. The four intervals were a major third (pitch ratio $2^{4/12}$), a perfect fourth ($2^{5/12}$), a perfect fifth ($2^{7/12}$), and a major sixth ($2^{9/12}$), and are referred to as a third, fourth, fifth, and sixth throughout this article. Each pitch pair (interval token) was selected uniformly at random from a fixed set of 24 pitches, from all possible pitch pairs that satisfied the given pitch ratio. These 24 possible pitches fell along the even-tempered scale, starting from the note directly above A4 (440 Hz). In the even-tempered scale, the nth note above A4 has a pitch of $440 \times 2^{n/12}$ Hz. Thus, the pitches ranged from 466 Hz ($= 440 \times 2^{1/12}$) to 1760 Hz ($= 440 \times 2^{24/12}$). Because we selected each interval token from a fixed set of 24 pitches, the total number of possible tokens for each interval varied. We chose to select pitches from a fixed range in order to discourage listeners from using absolute pitch as a cue to differentiate one interval from another.
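The token-selection scheme described above can be sketched as follows. This is an illustration under the stated definitions (24 even-tempered pitches starting one semitone above A4), not the authors' stimulus code; all names are hypothetical.

```python
import random

A4_HZ = 440.0
N_PITCHES = 24  # notes 1..24 above A4, as described in the text
INTERVAL_SEMITONES = {"third": 4, "fourth": 5, "fifth": 7, "sixth": 9}

def pitch_hz(n):
    """Frequency of the nth even-tempered note above A4."""
    return A4_HZ * 2 ** (n / 12)

def tokens(interval):
    """All (low, high) note-index pairs in the fixed set that form the interval."""
    k = INTERVAL_SEMITONES[interval]
    return [(low, low + k) for low in range(1, N_PITCHES - k + 1)]

def random_token(interval, rng=random):
    """Draw one token uniformly at random and return its two frequencies in Hz."""
    low, high = rng.choice(tokens(interval))
    return pitch_hz(low), pitch_hz(high)

# Number of available tokens per interval under these assumptions
token_counts = {name: len(tokens(name)) for name in INTERVAL_SEMITONES}
```

Under these assumptions an interval spanning k semitones has 24 − k possible tokens, which is why the number of tokens differs across intervals.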

Each interval token was instantiated by either two pure tones or two triangle tones. The two tones were presented in immediate succession, with the lower pitch first. Each tone was 400 ms in length with 5-ms half-cosine on and off ramps and was presented at 70 dB SPL. All stimuli were generated using a Roland Cakewalk US-25EX audio interface at a sampling rate of 44100 Hz and a bit rate of 16, and were delivered to the left ear over Sennheiser HD265 headphones with circumaural cushions. Listeners were tested in a sound-attenuated booth.
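A tone with these timing parameters could be synthesized roughly as below. This is a sketch only: the exact ramp shape is an assumption (a raised-cosine gain, i.e., half a cosine period), level calibration to 70 dB SPL is not modeled, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 44100  # Hz, as reported

def pure_tone(freq_hz, dur_s=0.4, ramp_s=0.005, rate=SAMPLE_RATE):
    """Sine tone with onset and offset ramps; amplitudes lie in [-1, 1]."""
    n = round(dur_s * rate)
    n_ramp = round(ramp_s * rate)
    samples = []
    for i in range(n):
        # Distance in samples to the nearer edge of the tone
        edge = min(i, n - 1 - i)
        if edge < n_ramp:
            # Assumed ramp shape: gain rises from 0 to 1 along half a cosine period
            gain = 0.5 * (1 - math.cos(math.pi * edge / n_ramp))
        else:
            gain = 1.0
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / rate))
    return samples

def interval_token(low_hz, high_hz):
    """Two tones in immediate succession, lower pitch first."""
    return pure_tone(low_hz) + pure_tone(high_hz)

token = interval_token(440 * 2 ** (1 / 12), 440 * 2 ** (6 / 12))  # one example fourth
```

The ramps prevent the abrupt onsets and offsets that would otherwise add audible clicks to each tone.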

Trained task

The trained task was musical-interval comparison. On each two-presentation forced-choice trial, listeners were presented a target musical interval (a fourth) and one of three foil intervals (a third, fifth, or sixth). The target and foil intervals were presented in random order and were marked on a computer screen as “Interval 1” and “Interval 2.” The two presentations were separated by 900 ms, measured from the offset of the second tone in the first presentation to the onset of the first tone in the second presentation. Listeners were asked to indicate which of the two presentations contained the fourth by pressing a key on a computer keyboard. They were allowed to respond at their own pace. Feedback displayed on the computer screen after each response indicated whether the response was “Correct!” or “Wrong!” and provided the appropriate label (“3rd,” “4th,” “5th,” or “6th”) for the interval that had been presented in each observation period. The feedback remained on the screen for 1,350 ms, and then the next trial began. The mean trial length, including the listener response time, was ~ 4,300 ms. At the beginning of each block of trials, listeners were given two randomly selected tokens of a fourth, each labeled as a fourth.

Training regimens

Each of the four training regimens consisted of 12 30-trial training blocks per day for three consecutive days, for a grand total of 360 × 3 = 1,080 training trials. The daily sessions were composed of various combinations of three block types: practice, exposure, and silence. During the practice blocks (Fig. 1, left panel), listeners performed the musical-interval comparison task with pure-tone stimuli. During the exposure blocks (Fig. 1, middle panel), listeners were presented randomly generated tokens of pure-tone fourths in the background while they performed a written symbol-to-number matching task. Stimulus presentations were organized into “trials” modeled after the practice trials, but in which only fourths were presented. The purpose of the written matching task was to engage the listeners in a nonauditory task while the stimulus exposures were presented. For that task, listeners were provided with a sheet of paper, at the top of which was a table of 80 randomly selected Wingding-font symbols, each associated with a number from 1 to 80. At the bottom of the sheet was a random sequence of 160 instances of those 80 symbols. Listeners were asked to write the corresponding number next to each symbol sequentially from the top to the bottom of the sheet, and to continue to another sheet if they finished before the interval presentations were complete. During the silence blocks (Fig. 1, right panel), listeners performed the same task as in the practice blocks (musical-interval comparison), except that no interval tokens were presented. Listeners were told that they would hear no sounds, and they were asked simply to do their best to “guess” what the randomly generated correct answer would be. They received the same feedback as during the practice blocks. Thus, the experience during the silence blocks was as similar as possible to the practice blocks, but without any sound stimuli.

The four training regimens provided different amounts of practice and exposure (see Fig. 3A, right column, in the Results). To document the effect of continuous practice, in the all-practice regimen all 12 blocks in each session were practice blocks (360 practice trials per session; n = 7). To determine the effect of replacing half the practice trials with exposure alone, in the practice+exposure regimen the 12 blocks in each session alternated every two blocks between practice and exposure blocks, for a total of six blocks each, starting with practice (180 practice trials + 180 exposure trials per session; n = 7). To establish the influence of the practice portions of the practice+exposure regimen, in the practice+silence regimen the 12 blocks in each session alternated every two blocks between practice and silence blocks, for a total of six blocks each, starting with practice (180 practice trials + 180 silence trials per session; n = 7). Likewise, to ascertain the influence of the exposure-alone portions of the practice+exposure regimen, in the exposure+silence regimen the 12 blocks in each session alternated every two blocks between exposure and silence blocks, for a total of six blocks each, starting with silence (180 exposure trials + 180 silence trials per session; n = 7).
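The four daily block schedules can be written out explicitly. The sketch below assumes that "alternated every two blocks" means runs of two blocks of one type followed by two of the other; the function and variable names are hypothetical.

```python
from collections import Counter

TRIALS_PER_BLOCK = 30
BLOCKS_PER_SESSION = 12

def daily_schedule(first, second, n_blocks=BLOCKS_PER_SESSION, run_length=2):
    """Block types for one session, switching type after every `run_length` blocks."""
    return [first if (i // run_length) % 2 == 0 else second for i in range(n_blocks)]

REGIMENS = {
    "all-practice": daily_schedule("practice", "practice"),
    "practice+exposure": daily_schedule("practice", "exposure"),
    "practice+silence": daily_schedule("practice", "silence"),
    "exposure+silence": daily_schedule("silence", "exposure"),  # starts with silence
}

def trial_counts(regimen):
    """Number of trials of each block type in one session of the regimen."""
    blocks = Counter(REGIMENS[regimen])
    return {block: n * TRIALS_PER_BLOCK for block, n in blocks.items()}
```

For example, `trial_counts("practice+exposure")` gives 180 practice and 180 exposure trials, matching the per-session totals reported above.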

Listeners were assigned to groups as follows. The first four listeners completed the practice+exposure regimen, and the next three completed the practice+silence regimen. Subsequent listeners were assigned at random to the four regimens. There was no significant relationship between percent correct on Day 4 and the date of the first day of training, either before (p = .384) or after (p = .657) controlling for any influence of the percent correct on Day 1, according to a single-level logistic regression (following the same approach described in the Analysis section below).

Familiarization

On Day 1, listeners were first familiarized with the concept of a musical interval. To do so, they were shown schematic spectrograms of single tokens of each of the four musical intervals, each with a different starting pitch, and told that the relationship between pitches, not the absolute pitch, was the important feature of the stimulus. They were then given a verbal description of the interval-comparison task, followed by four presentations of randomly selected tokens of a fourth, each labeled as a fourth; eight practice trials with feedback; a second set of four tokens of a fourth; and a second set of eight practice trials.

Pre- and posttests

The pre- and posttests were the same for all four regimens. The pretest occurred on Day 1 immediately after the familiarization period, and consisted of two 30-trial practice blocks with feedback. These blocks were in addition to the 12 daily training blocks that occurred on the same day.

Fig. 1 Training block types. Schematics of a single trial for each of the three block types (practice, exposure, and silence; axes: time and frequency). During practice blocks (left panel), listeners were presented two musical intervals and asked to indicate which of the two was a fourth ("Which interval was the 4th?"). The starting pitch of each interval was selected at random, so the two intervals in each trial usually began on different pitches. During exposure blocks (middle panel), listeners performed a nonauditory symbol-to-number matching task while in the background two examples of a fourth were presented on each “trial.” During silence blocks (right panel), listeners were presented the same visual cues as during practice blocks, but without any sounds, and were asked to respond by making a “guess” for each trial. Listeners were trained using various combinations of the three block types.

The posttest occurred on Day 4 and consisted of four 30-trial blocks in each of three conditions: musical-interval comparison with pure tones (trained task and stimulus), musical-interval comparison with triangle tones (untrained stimulus), and musical-interval identification with pure tones (untrained task). During the interval comparison with triangle tones, trials proceeded as in the trained task, except that each pitch ratio was composed of triangle tones rather than pure tones. A triangle tone is defined as $|((4ft) \bmod 4) - 2| - 1$, where f is the frequency in hertz and t is the time in seconds. During interval identification, on each trial listeners were presented with a single pure-tone interval token and asked to identify which interval had been presented: a third, fourth, fifth, or sixth. Each token was randomly selected from the four possible pitch ratios with equal probability. The four options for the interval label were displayed on a computer screen, and listeners were asked to indicate which interval they heard by pressing a key on a keyboard. Following each response, feedback displayed on the computer screen indicated whether the response was “Correct!” or “Wrong!” and indicated which of the four intervals had been presented. The feedback remained on screen for 1,350 ms before the next trial began. Listeners completed the interval-comparison trials with pure tones first, and then completed the other two conditions in counterbalanced order. Finally, we verified that listeners had normal pure-tone thresholds at the standard audiometric frequencies, as assessed with a two-interval forced-choice procedure.
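The triangle-tone definition given in the posttest description can be evaluated directly. The following minimal sketch implements that expression as written; the function name is hypothetical.

```python
def triangle_wave(freq_hz, t_s):
    """Triangle waveform |((4 f t) mod 4) - 2| - 1, with values in [-1, 1]."""
    return abs((4 * freq_hz * t_s) % 4 - 2) - 1

# One period of a 1-Hz triangle wave sampled at quarter-period steps
quarter_points = [triangle_wave(1, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The waveform starts at its peak (+1) at t = 0, falls linearly to −1 at half a period, and returns to +1 after one full period of 1/f seconds.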

Analysis

We analyzed the data using single-level regressions for the mean data and a multiple-level (mixed-effects) regression for the individual data. For the mean data, the dependent variable was the proportion of correct responses (ranging from 0 to 1; percent correct = proportion correct × 100), and for the individual data it was the response accuracy on a given trial (either 0 or 1). The regressions employed a logistic curve, which can be used to model bounded quantities (between 0 and 1) or discrete outcomes (0 or 1). For the single-level regressions, the mean proportion correct was computed for each listener and level of the within-listener predictors (e.g., day). These means were then fitted by each regression using all predictors, both the within-listener (e.g., day) and across-listener (e.g., regimen) ones. Which specific within- and across-listener predictors were included depended on the specific analysis. All single-level regressions of the data from the trained condition were fitted by regimen and day. The regressions of the data from the untrained posttest conditions were fitted either by the regimen or by the trained-condition posttest. Additional predictors included for the specific analyses are noted in the results (e.g., a more detailed analysis of the training data included the foil as a predictor). For the multilevel regression, the individual data were modeled with two levels: One level modeled the effects of each individual listener, and the other the effects of the regimen. Each level contributed to the slope and intercept of a line fitted across days for each comparison foil. Note that any group differences revealed by this analysis were present in spite of any individual variation captured by the model, such as that due to starting performance.

To implement these regressions we used a Bayesian approach for its flexibility (Gelman & Hill, 2007) and for its conservative nature when n is small. In a Bayesian analysis, a probability distribution called the posterior is computed for each parameter of the model (e.g., the mean of the distribution and the variance of the distribution). The posterior is based on a prior, which is an assumed distribution of the parameter, and on the data themselves. The prior is selected to shrink the posterior toward zero. Thus, when compared to a more traditional regression analysis, a more conservative estimate is computed in this way when n is small. To compute the posterior, we used the Stan modeling language (version 2.7.0) to find 4,000 Markov chain Monte Carlo samples (Hoffman & Gelman, 2014) (four chains, of 2,000 samples each, with the first 1,000 burn-in samples of each chain discarded). This number of samples was selected to keep the estimated error of the method acceptably low (≤ 0.28 percentage points of the dependent variable). These posterior samples were then used to compute the p values, means, and standard errors reported in the Results. In the few cases in which the standard error was a nonsensical quantity because the posterior distribution was asymmetric, the highest posterior-density interval with a density equivalent to one standard error (~ 68.2%) is shown in the figures. Where appropriate, p values were corrected for multiple comparisons (Gelman, Hill, & Yajima, 2012; Gelman & Tuerlinckx, 2000).
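A highest posterior-density interval of the kind mentioned above can be estimated from posterior samples as the narrowest window that contains the desired share of them. The sketch below is one common sample-based estimator, not necessarily the procedure the authors used; the names are hypothetical.

```python
import math

# Sampling settings as reported in the text
N_CHAINS = 4
SAMPLES_PER_CHAIN = 2000
BURN_IN = 1000
N_RETAINED = N_CHAINS * (SAMPLES_PER_CHAIN - BURN_IN)  # posterior samples kept

def hpd_interval(samples, mass=0.682):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

example = hpd_interval([0, 1, 2, 3, 4, 100], mass=0.5)
```

For a symmetric posterior this interval approximates the mean plus or minus one standard error; for a skewed posterior it shifts toward the region where the samples are densest.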

For the single-level regressions, to ensure stability and robustness, the dependent variable (y) was transformed by the expression $r/2 + y(1 - r)$, where r was a relatively small proportion of the data range (0 to 1) modeled to best fit the data. For further robustness, in both the single- and multilevel regressions we used a beta distribution to model overdispersion. The prior over r was a zero-mean normal distribution (truncated below 0) with a variance of 5 percentage points, and the prior over the beta was an uninformative zero-mean normal distribution (truncated below 0) with a variance of 100 (which is effectively uniform, given the range of the data). Priors over the coefficients of the logistic regression were zero-mean Cauchy distributions with a variance of 5 (Gelman, Jakulin, Pittau, & Su, 2008). The hyperpriors in the multilevel regression were weakly informative Cauchy distributions (scale 1) (Gelman, 2006). We validated the model designs and choice of priors using posterior predictive checks (Gelman, Meng, & Stern, 1996). Those checks indicated that the resulting models accurately predicted the 97.5%, 84.1%, 50%, 15.9%, and 2.5% percentiles of the residuals with a standard error of ≤ 2.8 percentage points (posterior predictive p ≥ .248).
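The effect of the transformation $r/2 + y(1 - r)$ is to pull proportions away from the boundaries 0 and 1, where the logit is undefined. The sketch below illustrates this with a fixed, arbitrarily chosen r; in the analysis described above r was a fitted parameter, and the function names here are hypothetical.

```python
import math

def squeeze(y, r):
    """Map a proportion y in [0, 1] into [r/2, 1 - r/2]."""
    return r / 2 + y * (1 - r)

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

r = 0.05  # illustrative value only
squeezed = [squeeze(y, r) for y in (0.0, 0.5, 1.0)]
log_odds = [logit(p) for p in squeezed]  # finite even for y = 0 and y = 1
```

The midpoint 0.5 is left unchanged, and the two extremes map to symmetric, finite log-odds, which keeps a logistic fit from being dominated by listeners who score at the floor or ceiling.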

Code availability

The full analyses and all experimental data are available at https://github.com/haberdashPI/apmusic.

Results

Pretraining performance

The initial response accuracy on the interval-comparison taskdiffered across the foils (fourth vs. third, fifth, or sixth) andimproved with increasing musical experience. Figure 2Ashows the percentages of correct responses for each of the28 listeners (raw data; gray circles) for the three foils (x-axis),

348 Atten Percept Psychophys (2019) 81:344–357


as well as the means and standard errors for these data, estimated from a multilevel regression across all days of training (bars and error bars; see the Method section). Comparison performance was greater for the fourth versus the sixth (73.9%, SE = 5.3) than for the fourth versus the third (63.6%, SE = 3.8) or the fourth versus the fifth (61.2%, SE = 4.5) (multilevel regression: p ≤ .041), and did not differ between the latter two cases (p = .510). Figure 2B shows the individual starting performance (raw data; gray circles) for the three pairwise comparisons (x- and y-axes) among the foils (panels), as well as the means and standard errors from the multilevel regression (lines and gray regions). Starting performance was significantly correlated between each pair of foils, such that the regression coefficients for one foil could account for ≥ 53% of the variance in the coefficients for another foil (multilevel regression: r ≥ .73, SE ≤ .12, p ≤ .001). Figure 2C

Fig. 2 Naïve performance (Day 1). On each trial, listeners were presented a target musical interval (a fourth) and one of three foils (a third, fifth, or sixth) and asked to indicate which interval was the target. (A) Individual response accuracy on Day 1 separated by foils for the 28 listeners (gray circles; raw data), with means and standard errors as estimated by a multilevel regression (bars and error bars). (B) Individual response accuracy on Day 1 (gray circles; raw data) for each of the three foils (x-axes) as compared to another of the foils (y-axes) for the three pairwise comparisons (panels), with means and standard errors as estimated by the multilevel regression (lines and gray regions). (C) Individual response accuracy on Day 1 across years of musical experience (gray circles; raw data), with the means and standard errors of this relationship as estimated by the multilevel regression (line and gray region).


shows the number of years of musical experience (x-axis) and the percent-correct performance (y-axis) for each listener (raw data; symbols), along with the fit and standard error across years of musical experience according to the multilevel regression (line and gray region). Each year of experience improved the odds in favor of a correct response on Day 1 by 10% (SE = 3.1, p < .005). Note that, given the odds (o), the probability (p) of a correct response is p = o/(1 + o).
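To make the odds-to-probability relationship concrete, the Python sketch below converts between the two scales and applies a 10% increase in odds for each year of experience. It assumes that the effect compounds multiplicatively across years, as it would in a logistic regression that is linear in log-odds. The baseline accuracy of 60% and the function names are hypothetical values chosen for illustration, not estimates from the study.

```python
def odds_to_prob(o):
    """p = o / (1 + o), as given in the text."""
    return o / (1 + o)


def prob_to_odds(p):
    """Inverse relationship: o = p / (1 - p)."""
    return p / (1 - p)


def predicted_accuracy(p0, years, odds_ratio_per_year=1.10):
    """Accuracy after multiplying the baseline odds by 1.10 for each year."""
    return odds_to_prob(prob_to_odds(p0) * odds_ratio_per_year ** years)


baseline = 0.60  # hypothetical Day 1 accuracy for a listener with no experience
for years in (0, 5, 10):
    acc = predicted_accuracy(baseline, years)
    print(f"{years:2d} years of experience -> {100 * acc:.1f}% correct")
```

Because the effect acts on the odds rather than on the probability, each additional year adds fewer percentage points as accuracy approaches 100%.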

Trained condition

Only the practice+exposure regimen yielded learning on musical-interval comparison at the group level. Figure 3A shows the mean percent-correct performance for each group (raw data; symbols with standard errors) across the four days of training, as well as the fit and standard error of a single-level regression of day and group (lines and shaded regions). Accuracy for the practice+exposure group (black squares) improved from 68.9% (SE = 4.1) on Day 1 to 88.2% (SE = 2.7) on Day 4 (single-level regression: p = .002), but did not improve for any of the other groups: all-practice (gray diamonds), practice+silence (gray triangles), or exposure+silence (white circles) (p ≥ .142, starting at 61.7%–65.7%, SE ≤ 5.1, and ending at 71.5%–71.9%, SE ≤ 4.8). Day 1 responses did not differ across the groups (absolute differences ≤ 6.6 percentage points, SE ≤ 6.1, p ≥ .272). By the end of training, the performance of the practice+exposure group was ≥ 16.3 percentage points higher than that of the remaining three groups (SE ≤ 5.5, p ≤ .003), whose performances did not differ from one another (absolute difference ≤ 0.7 percentage points, SE ≤ 6.3, p ≥ .927). The amounts of learning did not differ across the three foils (fourth vs. third, fifth, or sixth) for any of the groups (single-level regression of day, group, and foil: absolute difference ≤ 2.2 percentage points, SE ≤ 7.8, p ≥ .786), despite the differences in starting performance across foils (see above).

The advantage of the practice+exposure group over the other three groups remained even after accounting for individual differences in starting performance (multilevel regression: p ≤ .046). At the individual level, more listeners learned, and they tended to learn more, in the practice+exposure group than in the other three groups. Figure 3B shows the performance across days for each individual listener (raw data; symbols and dotted lines), separated by training group (panels), as well as the fits and standard errors of the multilevel regression to these data (lines and shaded regions). Listeners who improved significantly from Day 1 to Day 4 (by the multilevel regression) were termed learners (black lines), and the others nonlearners (dark gray lines). All seven listeners in the practice+exposure group learned (multilevel regression: p ≤ .001). In contrast, only about half of the seven listeners improved in each of the other groups (n = 3 or 4; p ≤ .013). Furthermore, among the listeners who improved, the amount of improvement tended to be larger in the practice+exposure group (15.7–23.3 percentage points, SE ≤ 4.1, as estimated from the regression) than in the other groups: all-practice (n = 3; 11.2–20.4, SE ≤ 4.4), practice+silence (n = 3; 9.9–14.0, SE ≤ 4.0), and exposure+silence (n = 4; 11.7–16.4, SE ≤ 5.5). Listeners identified as nonlearners improved by less than 3.1 percentage points (SE ≤ 4.2, p ≥ .164). The amount of musical experience did not appear to affect the results. Musical experience did not differ across the groups [ANOVA of experience across groups: F(3, 24) = 0.1, p = .969, nonparametric Kruskal–Wallis χ²(3) = 0.6, p = .891]. Experience also did not predict the amount of listener improvement in the multilevel regression (p ≥ .721), and its inclusion in that regression did not influence the overall outcome.

Finally, Fig. 3C shows the performance of individual listeners (raw data; symbols) on Day 1 (x-axis) versus Day 4 (y-axis), and the fits by single-level regression for each group (lines). Consistent with the outcome of the multilevel analysis, the data points for the majority of listeners in the all-practice (gray diamonds), practice+silence (dark gray triangles), and exposure+silence (white circles) groups fell close to the diagonal (no learning) or below it (worsening; light gray region). In contrast, the data points for all of the listeners in the practice+exposure group (black squares) were well above the diagonal (learning), regardless of their starting performance.

Untrained conditions

Untrained stimulus: Interval comparison with triangle tones. The posttest performance on musical-interval comparison with an untrained stimulus (triangle tones) essentially matched that with the trained stimulus (pure tones). Figure 4A shows the percent-correct performance for each individual listener (raw data; gray circles), separated by training groups, along with the means and standard errors determined by a single-level regression across stimuli and group (bars and error bars), for the pure-tone (solid bars) and triangle-tone (striped bars) stimuli. Within each group, the response accuracies did not differ between the two stimulus types (single-level regression: absolute difference ≤ 2.2 percentage points, SE ≤ 5.9, p ≥ .711). Therefore, the practice+exposure group outperformed the other three groups with the untrained triangle tones (≥ 15.4 percentage points, SE ≤ 5.0, p ≤ .002), just as with the trained pure tones (≥ 16.3 percentage points, SE ≤ 4.9, p ≤ .002). Figure 4B shows the percent-correct performance for the pure tones (x-axis) versus the triangle tones (y-axis) for each listener in each group (raw data; symbols), as well as the fit and standard error of a single-level regression of triangle-tone by pure-tone response accuracy (line and shaded area). Performance with the pure tones accounted for 80.3% of the variance in performance with the triangle tones (single-level regression: SE = 1.7, p < .001). The relationship between response accuracy with the pure tones and triangle tones was not influenced by the group or foil (group: single-level regression of triangle-tone responses across pure-tone responses and group, absolute difference ≤ 2.8 percentage points, SE ≤ 4.9, p ≥ .536; foil: single-level regression of triangle-tone responses across pure-tone responses and foil, absolute difference ≤ 4.6 percentage points, SE ≤ 10.7, p ≥ .617). Thus, regardless of group or foil, performance with the untrained triangle-tone stimulus was predicted well by performance with the trained pure-tone stimuli.

Untrained task: Interval identification. Posttest performance on the untrained interval-identification task was better for


the practice+exposure group than for the other three groups. Figure 5A shows the percentages of correct responses on the identification task for each listener (gray circles), separated by training group, and the means and standard errors determined by a single-level regression across group (bars and error bars). The overall identification accuracy for the practice+exposure group was 55.9% (SE = 3.9), ≥ 14.3 percentage points higher than for each of the other three groups (single-level regression: SE = 5.5, p ≤ .015), among which performance did not differ (absolute difference ≤ 7.6 percentage points, SE ≥ 5.4, p ≥ .183). Figure 5B shows the posttest response accuracy (raw data; shades and symbols) for the trained interval-comparison task (x-axis) and the untrained identification task (y-axis), as well as the fit and standard error of a single-level regression of identification by comparison response accuracy (line and shaded region). Comparison performance at the posttest accounted for 52.7% of the variance in identification performance (single-level regression: SE = 3.9, p < .001; without the individual at 87.6% correct comparison performance: R² = .508, SE = 0.039, p < .001). The relationship between identification and comparison accuracy did not differ across the groups (single-level regression of identification accuracy across comparison accuracy and group: absolute difference ≤ 10.1 percentage points, SE ≤ 8.6, p ≥ .249). Figure 5C is plotted in the same way as Fig. 5B, but with the posttest response accuracy shown separately for each interval (raw data; symbols) and with the fits and standard errors of a single-level regression of identification response accuracy by comparison response accuracy and interval (lines and shaded regions). For comparison response accuracies above 64.5%, identification performance was better for the perfect fourth than for the other three intervals (single-level regression: p ≤ .05). Therefore, it appears that regardless of group, performance at the end of training on the trained comparison task determined how well listeners performed on the untrained identification task, especially when identifying the interval that had been the target during comparison training (the fourth).

Fig. 4 Performance on the final day (Day 4) for the trained and untrained stimuli. (A) Mean response accuracy on Day 4 for the trained musical-interval comparison task with the trained pure-tone (solid bars) and untrained triangle-tone (striped bars) stimuli for each of the four training regimens (shades), as estimated by a single-level regression, with standard errors from the regression (error bars) and the individual data (gray circles; raw data). The dotted line indicates chance performance. (B) Individual response accuracy on Day 4 for pure-tone (x-axis) and triangle-tone (y-axis) stimuli in each of the four training regimens (symbols; raw data), with means and standard errors as estimated by a single-level regression (line and gray region).

Fig. 3 Performance across days. (A) Mean response accuracy (symbols and error bars; from raw data) across days (x-axis) on the trained musical-interval comparison task, along with means and standard errors for those data as estimated by a single-level regression (lines and gray regions), for each of the four training regimens (symbols). Each training session consisted of either (1) periods of practice alternating with periods of stimulus exposure in the absence of practice (practice+exposure, n = 7; black squares), (2) continuous practice on the comparison task (all-practice, n = 7; gray diamonds), (3) periods of practice alternating with periods of silence (practice+silence, n = 7; dark gray triangles), or (4) periods of stimulus exposure alone alternating with periods of silence (exposure+silence, n = 7; white circles), all for the same total duration. (B) Individual response accuracy (symbols and dotted lines; raw data) across days (x-axis) and training regimens (panels), with means and standard errors as estimated by a multilevel regression (solid lines and gray regions), for learners (black lines) and nonlearners (gray lines). (C) Individual mean response accuracy on Day 1 (x-axis) versus Day 4 (y-axis) for each training regimen (symbols), as well as the fits by single-level regression for each group (lines).

Discussion

The purpose of the present study was to compare the amounts of learning on a musical-interval comparison task generated by two different training regimens. The task was to indicate which of two musical intervals was the target interval (a perfect fourth), rather than one of three foils (a major third, perfect fifth, and major sixth). An all-practice group who performed the task continuously throughout each of four consecutive daily practice sessions did not learn. In contrast, a practice+exposure group who performed the task only intermittently during each session, but who received stimulus exposure alone during the remaining periods, did learn. This learning resulted from the combination of practice and exposure, because two control groups also did not learn: a practice+silence group, who only performed the task intermittently, with no periods of stimulus exposure alone, and an exposure+silence group, who only received the intermittent periods of stimulus exposure alone, but did not perform the task. After training, the performance of the practice+exposure group was also better than that of the other groups on musical-interval comparison with an untrained triangle-tone stimulus, as well as on an untrained musical-interval identification task. Finally, the magnitudes of improvement were similar for all three foils (third, fifth, and sixth), despite differences in comparison performance across them on the first day, and had no clear relationship with the number of years of musical experience, despite the advantage that more years of experience provided on the first day. Overall, the results suggest that the combination of task practice and stimulus exposure alone is a promising means to improve musical-interval perception, and more generally that the configuration of training blocks is crucial to the outcome of musical-interval training.

Effects of practice and stimulus exposure alone

The present results extend the demonstrations of the effectiveness of combining task practice and stimulus exposure alone from fine-grained discrimination (Szpiro et al., 2014; Wright et al., 2010) and speech (Wright et al., 2015) tasks to a

Fig. 5 Performance on the final day (Day 4) for the trained and untrained tasks. (A) Mean response accuracy on Day 4 for the untrained musical-interval identification task (bars) for each of the four training regimens (shades), as estimated by a single-level regression, with standard errors from the regression (error bars) and the individual data (gray circles; raw data). (B) Individual response accuracy on Day 4 for the trained musical-interval comparison task (x-axis) and the untrained identification task (y-axis) for each of the four training regimens (symbols; raw data), with means and standard errors as estimated by a single-level regression (line and gray region). (C) Individual response accuracy on Day 4 for the trained musical-interval comparison task (x-axis) and the untrained identification task (y-axis) for each interval (symbols), with means and standard errors as estimated by a single-level regression (lines and gray regions).

challenging nonspeech classification task. That this combination training aids learning on such a wide range of tasks suggests that the mechanisms that it engages are quite general. One proposal is that there are at least two requirements for long-lasting perceptual learning: a requirement for sufficient sensory stimulation, provided by stimulus exposures with or without associated task practice, and another requirement for an internal permissive signal, such as attention or reward, provided by task practice but not by stimulus exposure alone (Wright et al., 2015; Wright et al., 2010). The idea is that stimulus exposure alone cannot generate long-lasting perceptual learning on its own because such exposures lack this permissive signal (Ahissar & Hochstein, 2004; Gilbert & Sigman, 2007; Seitz & Dinse, 2007; Seitz & Watanabe, 2005; Wright & Sabin, 2007; Wright & Zhang, 2009); however, the combination of task practice and stimulus exposure alone can yield perceptual learning (1) because the periods of task practice lead to effects that remain during subsequent periods of stimulus exposure alone, allowing the exposures to contribute to learning, and/or (2) because the periods of exposure, though unable to contribute directly to learning, increase the potential for subsequent periods of task practice to generate learning (Wright et al., 2010).

The present data also demonstrate that the combination of task practice and stimulus exposure alone can exceed the benefits of an equivalent period of task practice, as has been reported previously (Szpiro et al., 2014; Wright et al., 2015). One potential explanation for this finding is that practice (but not stimulus exposure alone) slowly depletes a limited resource that is necessary for learning and that is replenished during practice breaks. Similar explanations have been proposed to account for the benefits of distributed over massed training (Naqib, Sossin, & Farah, 2012; Pavlik & Anderson, 2005). New here is the idea that stimulus exposure alone neither draws on that limited resource nor prevents that resource from recovering, even though this exposure can contribute to learning when paired with task practice. Note that this proposed limited resource differs from the proposed permissive signal. The permissive signal is triggered by task practice, and its effects can extend to periods of stimulus exposure alone, whereas the limited resource is depleted by task practice and is not influenced by mere stimulus exposure.

Although the key outcome reported here is the benefit of combining task practice with stimulus exposure alone, at least four aspects of the practice periods themselves may have contributed to the learning observed. First, the task was musical-interval comparison rather than single-interval identification, as in traditional musical aural training (e.g., Karpinski, 2000). The direct comparison between stimuli that this task required may have helped emphasize the differences between similar pitch ratios and thereby aided learning. Second, the feedback after each trial not only indicated whether the response was correct or incorrect, as is typical for a discrimination task, but also provided the names of the target and foil intervals. This joint feedback may have facilitated the formation of musical-interval categories, potentially by targeting a rule-based learning system for which such feedback can facilitate category learning (Ashby & Maddox, 2005; Yi & Chandrasekaran, 2016). Third, on a given trial, as compared to the pitch ratio of the target interval (a fourth), the pitch ratio of the foil interval could be either smaller (a fourth vs. a third) or larger (a fourth vs. a fifth or sixth), rather than consistently larger, as in the tasks typically used to assess pitch-ratio discrimination (Burns & Campbell, 1994; Burns & Ward, 1978; McDermott et al., 2010). The task thus required reference to an absolute pitch ratio, rather than simply the selection of the larger of two pitch ratios, which may have supported learning of the absolute pitch ratio of the target. Fourth, the pitch ratio of the target interval remained constant throughout the experiment, rather than varying as in traditional musical aural training (Karpinski, 2000). It is possible that training multiple different targets in a single session would have impeded learning on any one target, as has been documented for a number of perceptual discrimination tasks (Aberg & Herzog, 2009; Banai, Ortiz, Oppenheimer, & Wright, 2010; Maidment, Kang, Gill, & Amitay, 2015; Parkosadze, Otto, Malania, Kezeli, & Herzog, 2008; Seitz et al., 2005; Yotsumoto, Chang, Watanabe, & Sasaki, 2009).

Just as some specific aspects of the practice periods may have contributed to the present results, the particular nature of the stimulus exposure alone may also have been important. Listeners were presented two tokens of a fourth on each trial during stimulus exposure alone, but were presented only a single token of a fourth paired with a single token of a foil (third, fifth, or sixth) on each trial during practice. Thus, during stimulus exposure alone, there were twice as many presentations of the target interval (the fourth) as during practice, and no presentations of the foil intervals. These differences may have contributed to the greater effectiveness of the practice+exposure compared to the all-practice regimen. Even if that were the case, however, their influence was not sufficient for the exposure+silence and practice+silence regimens to yield different outcomes.

Generalization across stimulus and task

The similarity in mean posttest comparison performance between the trained (pure tone) and untrained (triangle tone) stimuli, as well as the strong correlation in individual performance at posttest between those two stimulus types, suggests that learning generalized across the stimulus types. The implication is that training modified the neural circuitry that was responsive to the pitch ratio of the stimulus and was at least somewhat insensitive to the stimulus type. Cross-stimulus generalization has been documented on other auditory discrimination tasks, though it is relatively rare following training on a single condition (for reviews, see Fahle & Poggio, 2002; Wright & Zhang, 2009).

Learning also appeared to have generalized from interval comparison to interval identification, because the group that learned the most during comparison training (practice+exposure) performed the best on average on the identification task during the posttest, and because performance on the comparison and identification tasks was strongly correlated across individuals. If so, the generalization was greatest for the pitch ratio of the trained musical interval, given the better identification performance for the target interval than for the foil intervals. The implication here is that training modified neural circuitry that contributed to both the comparison and identification of musical intervals. Others have reached a similar conclusion concerning the relationship between discrimination and identification, based on evidence that the discrimination of pitch ratios is better across rather than within musical-interval category boundaries (Burns & Campbell, 1994; Burns & Ward, 1978). A shared process for both the discrimination and identification of speech sounds has also been implicated, because discrimination training and identification training have both led to similar improvements in discrimination (Flege, 1995; Wayland & Li, 2008), and discrimination training has led to improvements on various speech-classification-related tasks (e.g., Fu, Galvin, Wang, & Nogaki, 2005; Moore, Rosenberg, & Coleman, 2005; Pisoni, Aslin, Perey, & Hennessy, 1982; Strange & Dittmann, 1984).

Finally, the strong correlation in performance across all listeners at the posttest between the trained and untrained stimulus types, and between the trained and untrained tasks, suggests that the important factor for generalization was the amount of learning in the trained condition, irrespective of which training regimen induced that learning. It therefore appears that the different training regimens may have modified the same locus of learning, but to varying degrees.

Naïve interval-comparison performance

Before training, the present listeners were better able to identify which of two intervals was the target interval when the comparison was to foils with the largest pitch-ratio difference from the target (fourth vs. sixth), as opposed to the foils with pitch ratios that were more similar to the target (fourth vs. third and fourth vs. fifth). This outcome is consistent with prior studies of pitch-ratio discrimination performance in listeners with varying degrees of experience (Burns & Campbell, 1994; Burns & Ward, 1978; McDermott et al., 2010). Furthermore, the performance of the present listeners was highly correlated between any given pair of foil intervals. At least some of the across-listener variation can be attributed to differences in musical experience: Before training, musicians were better at the musical-interval comparison task than were nonmusicians, as has been reported previously for pitch-ratio discrimination (Burns & Campbell, 1994; Burns & Ward, 1978; McDermott et al., 2010). This musician advantage may actually have been underestimated in the present case, because the inclusion criteria (≤ 85% pretest accuracy and no formal interval-training experience) likely excluded more musicians than nonmusicians. Thus, it appears that musical intervals are ordered perceptually by their pitch-ratio magnitude, and that before training the capacity to recognize specific intervals in direct comparisons differs across listeners due to systematic variation in an ability that is enhanced by musical experience.

Summary

1. Combining periods of task practice and stimulus exposure alone led to better learning on musical-interval comparison than did continuous practice, demonstrating that the configuration of training trials can influence the amount of musical-interval learning.

2. Training on musical-interval comparison led to better identification of those same intervals, suggesting that difficulties in identification might be mitigated by interval-comparison training.

3. Neither initial performance nor musical experience affected the benefits of combining practice on musical-interval comparison and stimulus exposure alone, suggesting that such training may be useful to listeners at a variety of experience levels.

Author note This work was supported in part by the National Institute on Deafness and Other Communication Disorders, National Institutes of Health; by the Northwestern University Bioscientist program, funded by HHMI Undergraduate Science Education Grants to Research Universities #52006934; and by the Defense Advanced Research Projects Agency (DARPA) Biological Technologies Office (BTO) TNT program under the auspices of Dr. Doug Weber and Tristan McClure-Begley through the Space and Naval Warfare Systems Center, Pacific Grant/Contract No. N66001-17-2-4011. We thank Jessica Conderman for providing valuable input in the design of the trained task.

Disclaimer Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA) Biological Technologies Office (BTO).

References

Aberg, K. C., & Herzog, M. H. (2009). Interleaving bisection stimuli—randomly or in sequence—does not disrupt perceptual learning, it just makes it more difficult. Vision Research, 49, 2591–2598. https://doi.org/10.1016/j.visres.2009.07.006

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464. https://doi.org/10.1016/j.tics.2004.08.011

Arenson, M. A. (1984). Computer-based instruction in musicianship training: Some issues and answers. Computers and the Humanities, 18, 157–163.

Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 56, 149–178. https://doi.org/10.1146/annurev.psych.56.091103.070217

Banai, K., Ortiz, J. A., Oppenheimer, J. D., & Wright, B. A. (2010). Learning two things at once: Differential constraints on the acquisition and consolidation of perceptual learning. Neuroscience, 165, 436–444. https://doi.org/10.1016/j.neuroscience.2009.10.060

Burns, E. M., & Campbell, S. L. (1994). Frequency and frequency-ratio resolution by possessors of absolute and relative pitch: Examples of categorical perception? Journal of the Acoustical Society of America, 96, 2704–2719.

Burns, E. M., & Ward, W. D. (1978). Categorical perception—phenomenon or epiphenomenon: Evidence from experiments in the perception of melodic musical intervals. Journal of the Acoustical Society of America, 63, 456–468.

Cleland, K. D., & Dobrea-Grindahl, M. (2013). Developing musicianship through aural skills: A holistic approach to sight singing and ear training. New York: Routledge.

Fahle, M., & Poggio, T. (2002). Perceptual learning. Cambridge: MIT Press.

Flege, J. E. (1995). Two procedures for training a novel second language phonetic contrast. Applied Psycholinguistics, 16, 425–442.

Foxton, J. M., Brown, A. C. B., Chambers, S., & Griffiths, T. D. (2004). Training improves acoustic pattern perception. Current Biology, 14, 322–325. https://doi.org/10.1016/j.cub.2004.02.001

Fu, Q.-J., Galvin, J., Wang, X., & Nogaki, G. (2005). Moderate auditory training can improve speech performance of adult cochlear implant patients. Acoustics Research Letters Online, 6, 106–111.

Furby, V. J. (2016). The effects of peer tutoring on the aural skills performance of undergraduate music majors. Update: Applications of Research in Music Education, 34, 33–39.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models (Vol. 1). New York: Cambridge University Press.

Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5, 189–211.

Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2, 1360–1383. https://doi.org/10.2139/ssrn.1010421

Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assess-ment of model fitness via realized discrepancies. Statistica Sinica, 6,733–760.

Gelman, A., & Tuerlinckx, F. (2000). Type S error rates for classical andBayesian single and multiple comparison procedures.Computational Statistics, 15, 373–390.

Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influencesin sensory processing. Neuron, 54, 677–696. doi:https://doi.org/10.1016/j.neuron.2007.05.019

Hoffman, M. D., & Gelman, A. (2014). The no-U-turn sampler:Adaptively setting path lengths in Hamiltonian Monte Carlo.Journal of Machine Learning Research, 15, 1593–1623.

Karpinski, G. S. (2000). Aural skills acquisition: The development oflistening, reading, and performing skills in college-level musicians.Oxford: Oxford University Press.

Maidment, D.W., Kang, H., Gill, E. C., & Amitay, S. (2015). Acquisitionversus consolidation of auditory perceptual learning using mixed-training regimens. PLoS ONE, 10, e0121953. doi:https://doi.org/10.1371/journal.pone.0121953

McDermott, J. H., Keebler, M. V.,Micheyl, C., &Oxenham, A. J. (2010).Musical intervals and relative pitch: Frequency resolution, not inter-val resolution, is special. Journal of the Acoustical Society ofAmerica, 128, 1943–1951. doi:https://doi.org/10.1121/1.3478785

Moore, D. R., Rosenberg, J. F., & Coleman, J. S. (2005). Discriminationtraining of phonemic contrasts enhances phonological processing inmainstream school children. Brain and Language, 94, 72–85.

Naqib, F., Sossin, W. S., & Farah, C. A. (2012). Molecular determinantsof the spacing effect. Neural Plasticity, 2012, 581291:1–8. doi:https://doi.org/10.1155/2012/581291

Parkosadze, K., Otto, T. U., Malania, M., Kezeli, A., & Herzog, M. H.(2008). Perceptual learning of bisection stimuli under roving: Slowand largely specific. Journal of Vision, 8(1), 5. doi:https://doi.org/10.1167/8.1.5

Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects onvocabulary memory: An activation-based model of the spacing ef-fect. Cognitive Science, 29, 559–586.

Pisoni, D. B., Aslin, R. N., Perey, A. J., & Hennessy, B. L. (1982). Someeffects of laboratory training on identification and discrimination ofvoicing contrasts in stop consonants. Journal of ExperimentalPsychology: Human Perception and Performance, 8, 297–314.doi:https://doi.org/10.1037/0096-1523.8.2.297

Russo, F. A., & Thompson, W. F. (2005). The subjective size of melodicintervals over a two-octave range. Psychonomic Bulletin & Review,12, 1068–1075. doi:https://doi.org/10.3758/BF03206445

Seitz, A., & Watanabe, T. (2005). A unified model for perceptual learn-ing. Trends in Cognitive Sciences, 9, 329–334. doi:https://doi.org/10.1016/j.tics.2005.05.010

Seitz, A. R., & Dinse, H. R. (2007). A common framework for perceptuallearning. Current Opinion in Neurobiology, 17, 148–153.

Seitz, A. R., Yamagishi, N., Werner, B., Goda, N., Kawato, M., &Watanabe, T. (2005). Task-specific disruption of perceptual learn-ing. Proceedings of the National Academy of Sciences, 102, 14895–14900.

Strange, W., & Dittmann, S. (1984). Effects of discrimination training onthe perception of /r–l/ by Japanese adults learning English.Perception & Psychophysics, 36, 131–145. doi:https://doi.org/10.3758/BF03202673

Szpiro, S. F. A., Wright, B. A., & Carrasco, M. (2014). Learning one taskby interleaving practice with another task. Vision Research, 101,118–124.

Thompson, W. F. (2013). 4—Intervals and scales. In D. Deutsch (Ed.),The psychology of music (3rd) (pp. 107–140). Amsterdam:Academic Press.

Wayland, R. P., & Li, B. (2008). Effects of two training procedures incross-language perception of tones. Journal of Phonetics, 36, 250–267.

Wright, B. A., Baese-Berk, M. M., Marrone, N., & Bradlow, A. R.(2015). Enhancing speech learning by combining task practice withperiods of stimulus exposure without practice. Journal of theAcoustical Society of America, 138, 928–937. doi:https://doi.org/10.1121/1.4927411

Wright, B. A., & Sabin, A. T. (2007). Perceptual learning: How muchdaily training is enough? Experimental Brain Research, 180, 727–736. doi:https://doi.org/10.1007/s00221-007-0898-z

Wright, B. A., Sabin, A. T., Zhang, Y., Marrone, N., & Fitzgerald, M. B.(2010). Enhancing perceptual learning by combining practice withperiods of additional sensory stimulation. Journal of Neuroscience,

356 Atten Percept Psychophys (2019) 81:344–357

Page 14: Inducing musical-interval learning by combining task ... · Inducing musical-interval learning by combining task practice with periods of stimulus exposure alone David F. Little1

30, 12868–12877. doi:https://doi.org/10.1523/JNEUROSCI.0487-10.2010

Wright, B. A., & Zhang, Y. (2009). A review of the generalization ofauditory learning. Philosophical Transactions of the Royal SocietyB, 364, 301–311.

Yi, H. G., & Chandrasekaran, B. (2016). Auditory categories with sepa-rable decision boundaries are learned faster with full feedback than

with minimal feedback. Journal of the Acoustical Society ofAmerica, 140, 1332–1335.

Yotsumoto, Y., Chang, L.-H., Watanabe, T., & Sasaki, Y. (2009).Interference and feature specificity in visual perceptual learning.Vision Research, 49, 2611–2623. doi:https://doi.org/10.1016/j.visres.2009.08.001

Atten Percept Psychophys (2019) 81:344–357 357