
NeuroImage 60 (2012) 1490–1502

Contents lists available at SciVerse ScienceDirect

NeuroImage

journal homepage: www.elsevier.com/locate/ynimg

Human auditory cortex is sensitive to the perceived clarity of speech

Conor J. Wild a,⁎, Matthew H. Davis b, Ingrid S. Johnsrude a,c,d

a Centre for Neuroscience Studies, Queen's University, Kingston ON, Canada
b Medical Research Council Cognition and Brain Sciences Unit, Cambridge, UK
c Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
d Department of Psychology, Queen's University, Kingston ON, Canada

⁎ Corresponding author at: Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada K7L 3N6.

E-mail address: [email protected] (C.J. Wild).

1053-8119/$ – see front matter. Crown Copyright © 2012 Published by Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2012.01.035

Article info

Article history: Received 7 September 2011; Revised 23 November 2011; Accepted 2 January 2012; Available online 10 January 2012.

Keywords: fMRI; Primary auditory cortex; Speech perception; Predictive coding; Top-down influences; Superior temporal sulcus; Premotor cortex; Inferior frontal gyrus; Feedback connections

Abstract

Feedback connections among auditory cortical regions may play an important functional role in processing naturalistic speech, which is typically considered a problem solved through serial feed-forward processing stages. Here, we used fMRI to investigate whether activity within primary auditory cortex (PAC) is sensitive to the perceived clarity of degraded sentences. A region-of-interest analysis using probabilistic cytoarchitectonic maps of PAC revealed a modulation of activity, in the most primary-like subregion (area Te1.0), related to the intelligibility of naturalistic speech stimuli that cannot be driven by stimulus differences. Importantly, this effect was unique to those conditions accompanied by a perceptual increase in clarity. Connectivity analyses suggested that the sources of input to PAC are higher-order temporal, frontal and motor regions. These findings are incompatible with feed-forward models of speech perception, and suggest that this problem belongs amongst modern perceptual frameworks in which the brain actively predicts sensory input, rather than just passively receiving it.

Crown Copyright © 2012 Published by Elsevier Inc. All rights reserved.

Introduction

The most natural and common activities often turn out to be the most computationally difficult behaviors to understand. Human perception is a prime example of this complexity, as seemingly stable and coherent perceptions of the world are constructed from impoverished and inherently ambiguous sensory information. Predictive coding and Bayesian models of perception, in which prior knowledge is combined with sensory representations through feedback projections, provide an elegant mechanism by which accurate interpretations of the environment can be constructed from variable and ambiguous input in a way that is flexible and task dependent (Alink et al., 2010; Friston, 2005; Friston et al., 2010; Murray et al., 2002, 2004; Summerfield and Koechlin, 2008). However, these probabilistic frameworks for perception have not yet gained traction in the auditory domain. They may provide valuable insight into the perception of spoken language, which has typically been considered a unique or "special" problem solved through serial feed-forward processing (Norris, 1994; Norris and McQueen, 2008; Norris et al., 2000).

Research has demonstrated that knowledge and the predictability of a speech signal play a large role in determining how much we glean from it—especially in compromised listening situations. For example, linguistic, syntactic, and semantic content (Kalikow et al., 1977; Miller and Isard, 1963; Miller et al., 1951), familiarity with a speaker (Nygaard and Pisoni, 1998), and visual cues (Sumby and Pollack, 1954; Zekveld et al., 2008) can all greatly enhance the intelligibility of speech presented in masking noise. If prior knowledge and the predictability of speech influence our perception of a stimulus, then, in accordance with predictive coding models of perception, we might expect to see an influence of this information at the earliest cortical auditory stage: in primary auditory cortex (PAC).

The anatomical organization of the cortical auditory system supports this functional view of interactive processing, wherein higher-order cortical areas may influence early auditory processes. Tracer studies have demonstrated that feedback projections are present at all levels of the auditory system (de la Mothe et al., 2006; Kaas and Hackett, 2000; Winer, 2005), and although forward connections typically project only to adjacent regions, feedback connections can project across multiple levels and target primary auditory cortex (de la Mothe et al., 2006). Currently, however, there is little evidence from human neuroimaging demonstrating a functional role for these feedback connections in the perception of naturalistic speech.

We designed the present fMRI study to investigate whether activity elicited within PAC by spoken English sentences is modulated by their perceptual clarity. We avoid confounding the intelligibility of speech with acoustic stimulus features by exploiting a phenomenon known as "pop-out": a dramatic increase in the perceptual clarity of a highly degraded utterance when the listener possesses prior knowledge of the signal content (Davis et al., 2005). Intelligibility is often operationalized as the percentage of words in a sentence a listener can understand or report correctly; although this differs from perceptual clarity, these measures likely describe the same underlying dimension of speech. We hypothesize that "pop-out" is achieved through online perceptual feedback which influences, or re-tunes, low-level auditory processing. On this basis, we would expect modulation of PAC activity as a function of prior knowledge. We further expect that "pop-out" is unique to degraded, yet potentially intelligible, speech signals, and that, therefore, a modulation of PAC activity associated with prior knowledge will be uniquely observed for degraded speech. Additional information cannot increase the clarity of speech that is already perfectly clear, nor can it turn an entirely unintelligible stimulus into recognizable speech.

Fig. 1. The experimental procedure for the fMRI study, illustrated for a single trial. The midpoint of the auditory stimulus was presented 4 s prior to scanning, and text primes (matching, in this case) preceded the equivalent auditory word by 200 ms. The entire prime sequence was bookended by a fixation cross.

Here, we provide participants with prior knowledge of a speech utterance in the form of written text, from which they derive an abstract phonological representation of the sentence (Frost, 1998; Van Orden, 1987; Rastle and Brysbaert, 2006). A more naturalistic source of constraining information might be audiovisual speech, which is more intelligible in noise than speech alone (Benoit et al., 1994; Macleod and Summerfield, 1987; Sumby and Pollack, 1954). Existing work has shown interactions of visual and auditory information in primary sensory cortices for integrating moving faces and concurrent speech (van Wassenhove et al., 2005, 2007). It may be, however, that the human brain has evolved specialized mechanisms or pathways for speech perception that reflect lateral interactions between audio and visual speech input rather than top-down projections. Conversely, written text is unlikely to have direct associations with auditory expectations, since the phonological representations accessed from print are more abstract and cannot directly predict the auditory form of vocoded words. We therefore propose that the most likely source of any observed influence of matching written text on auditory cortex activity is higher-level, amodal, phonological representations acting through top-down mechanisms.

In the current study, participants hear speech presented at one of three levels of acoustic clarity—perfectly clear, degraded, or completely unintelligible—while viewing text that either matches what they hear (i.e., useful predictive information), or uninformative primes that are matched for low-level visual features. This factorial design allows us to separate areas that are sensitive to speech and/or written text (i.e., the main effects) from those where speech-evoked responses are modulated by the presence of useful predictive information (i.e., the interaction). Considering that we expect to observe an interaction of a specific form in PAC (where "pop-out" is unique to degraded speech), we perform a region-of-interest analysis that directly targets the earliest areas of cortical auditory processing. We also employ typical whole-brain and functional connectivity analyses in a more exploratory fashion to identify other brain areas that may be involved in enhancing the perceptual clarity of degraded speech.

Materials and methods

Participants

Data were collected from 19 normal healthy volunteers (8 male, ages 18–26). All participants were right-handed, native English speakers, and reported no neurological disease, hearing loss, or attentional deficits. All subjects gave written informed consent for the experimental protocol, which was cleared by the Queen's Health Sciences Research Ethics Board.

Design

To create acoustically identical speech conditions that differed in intelligibility, we used noise-vocoded speech (NVS)—a form of distorted speech that preserves temporal cues while eliminating much of the fine spectral detail (Shannon et al., 1995). NVS is largely unintelligible to naïve listeners, but becomes highly intelligible (i.e. "pops out") if listeners are provided with a written representation of the speech (Davis et al., 2005). Here we presented NVS concurrently with text strings that either matched the content of the sentence ("matching") or were random strings of consonants ("non-matching"). Therefore, the NVS presented with matching text was perceived as intelligible speech, whereas the other was not. Non-matching primes were designed purposefully to be uninformative, rather than mismatching (i.e. incongruent real words), since mismatching text might not be quite so mismatching for auditory signals that are highly ambiguous.

On each trial, participants heard a single auditory stimulus and viewed concomitant visual text strings (i.e. primes) that were time-locked to the auditory signal (Fig. 1). The experiment conformed to a 3 × 2 factorial design, which paired three types of speech stimuli with matching or non-matching visual text primes (i.e. text strings). Matching primes (M) precisely matched the content of the sentences, whereas non-matching primes (nM) consisted of length-matched strings of randomly selected consonants. Speech stimuli could be presented as: clear speech (C), which is always intelligible; three-band NVS, for which naïve listeners can report approximately 40% of words correctly, according to pilot data; and rotated NVS (rNVS), which is always unintelligible. Randomly generated consonant strings were also presented in a baseline silent condition (S).

Stimulus preparation

Auditory stimulus materials consisted of 150 mundane English sentences (e.g. "His handwriting was very difficult to read"). The sentences were recorded by a female native speaker of North American English in a soundproof chamber, using an AKG C1000S microphone into an RME Fireface 400 audio interface (sampled at 16 bits, 44.1 kHz). Each of the 150 sentences was used to create the three types of speech stimuli: clear speech, three-band NVS, and three-band rNVS. After processing, all stimuli were normalized to have the same total RMS power.

NVS stimuli were created as described by Shannon et al. (1995) using a custom Matlab vocoder. Items were first filtered into three contiguous frequency bands (50 Hz → 558 Hz → 2264 Hz → 8000 Hz; band edges were selected to be equally spaced along the basilar membrane (Greenwood, 1990)) using FIR Hann band-pass filters with a window length of 801 samples. The amplitude envelope from each frequency band was extracted by full-wave rectifying the band-limited signal, then low-pass filtering it at 30 Hz using a fourth-order Butterworth filter. The envelopes were then applied to band-pass filtered noise of the same frequency ranges, which were recombined to produce the NV utterance. To create rNVS stimuli, the envelope from the lowest frequency band was applied to the highest frequency noise band, and vice versa. Clear speech remained unprocessed, and contained all frequencies up to 22.05 kHz.
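For readers who want to reproduce this manipulation, here is a minimal sketch of the vocoding steps just described, written in Python with NumPy/SciPy rather than the authors' Matlab code; the function names and the zero-phase filtering choice are illustrative assumptions, not the published implementation.

```python
import numpy as np
from scipy.signal import butter, firwin, lfilter, sosfiltfilt

def bandpass_fir(x, fs, lo, hi, numtaps=801):
    """FIR band-pass filter with a Hann window (801 taps, as in the text)."""
    taps = firwin(numtaps, [lo, hi], pass_zero=False, window="hann", fs=fs)
    return lfilter(taps, 1.0, x)

def envelope(x, fs, cutoff=30.0):
    """Full-wave rectify, then low-pass at 30 Hz (4th-order Butterworth).

    Zero-phase filtering (sosfiltfilt) is an assumption; the paper does not say.
    """
    sos = butter(4, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.abs(x))

def noise_vocode(x, fs, edges=(50.0, 558.0, 2264.0, 8000.0), rotate=False):
    """Three-band noise vocoding (Shannon et al., 1995); rotate=True gives rNVS."""
    bands = list(zip(edges[:-1], edges[1:]))
    envs = [envelope(bandpass_fir(x, fs, lo, hi), fs) for lo, hi in bands]
    if rotate:  # rNVS: lowest envelope onto the highest noise band, and vice versa
        envs[0], envs[-1] = envs[-1], envs[0]
    noise = np.random.randn(len(x))
    out = sum(env * bandpass_fir(noise, fs, lo, hi)
              for env, (lo, hi) in zip(envs, bands))
    return out / np.sqrt(np.mean(out ** 2))  # equate RMS power across stimuli

fs = 44100
x = np.random.randn(fs)              # stand-in for a recorded sentence
nvs = noise_vocode(x, fs)            # noise-vocoded speech
rnvs = noise_vocode(x, fs, rotate=True)   # rotated NVS
```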

Six sets of twenty-five sentences were constructed from the corpus of 150 items. The sets were statistically matched for number of words (M = 9.0, STD = 2.2); number of syllables (M = 20.1, STD = 7.3); length in milliseconds (M = 2499, STD = 602.8); and the logarithm of the summed word frequency (Thorndike and Lorge written frequency, M = 5.5, STD = 0.2). Each set of sentences was assigned to one of six experimental conditions, such that sets and conditions were counterbalanced across subjects to eliminate material-specific effects.
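One quick way to verify this kind of set matching is a one-way ANOVA across sets on each property; a non-significant result is consistent with the sets being matched. A sketch with toy data (the `sets` variable is hypothetical, not the study's materials):

```python
from scipy.stats import f_oneway

# Hypothetical per-sentence word counts for six sets (toy values).
sets = [[9, 7, 12, 8], [10, 8, 9, 11], [7, 9, 10, 8],
        [11, 9, 8, 10], [8, 10, 9, 7], [9, 11, 8, 10]]
F, p = f_oneway(*sets)   # large p: no evidence the sets differ on this property
print(F, p)
```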

Pilot study

A pilot behavioural study was conducted to confirm a strong "pop-out" effect unique to the NVS stimuli. Using an identical experimental design and stimuli as above, and a unique assignment of materials to conditions for each subject (to avoid material-specific effects), five naïve, young adult participants from the Queen's community were asked to rate the amount of noise they heard in each sentence. Ratings were on a scale from 1 to 7, where 1 indicated no noise at all, and 7 indicated that the entire auditory stimulus consisted of noise (note that the NVS and rNVS are constructed entirely of noise, so these ratings reflect perceptual clarity). A two-factor ANOVA on the noise ratings, with prime condition (M vs. nM) and stimulus type (C, NVS and rNVS) as factors, revealed a significant interaction (F(2,8) = 35.37, p < 0.001; Fig. 2): NVM items were judged as significantly less noisy than NVnM items (t(5) = −7.13, p < 0.01), whereas rNVM items were judged as slightly less noisy than rNVnM (t(5) = −3.15, p < 0.05), but prime type had no effect on the noise ratings of clear stimuli (t(5) = 0.0, p = 1.0). To determine whether the increase in clarity afforded by M primes (i.e. "pop-out") for NV items was greater than that for the other types of speech, we performed a one-factor, three-level ANOVA on noise-rating difference scores (nM − M). This analysis yielded a significant main effect of speech type (F(2,8) = 35.40, p < 0.001), where "pop-out" for NV items was greater than for rNV items (t(5) = 5.23, p < 0.01) and for clear speech items (t(5) = 6.95, p < 0.01).

Fig. 2. Subjective noise ratings obtained from 5 subjects in the pilot study. Noise ratings were made on a scale from 1 to 7, where 1 indicated no noise (i.e. the auditory stimulus was perfectly clear), and 7 indicated that the auditory stimulus consisted of total noise. Bar height indicates the mean noise rating across participants for each condition: clear (C), noise-vocoded (NV), and rotated NV (rNV), paired with matching (M) or non-matching (nM) primes. Error bars indicate standard error of the mean (SEM) suitable for repeated-measures data. Asterisks represent significant differences from post-hoc t-tests, Sidak corrected for multiple comparisons (** p < 0.01; * p < 0.05).
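The pilot analysis can be reproduced in outline with a repeated-measures ANOVA. The sketch below uses statsmodels' AnovaRM on a hypothetical long-format `ratings` table filled with random toy values, so the factor structure (speech × prime, then one factor on difference scores) matches the text, but the numbers do not.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format ratings: one row per subject x speech x prime.
rng = np.random.default_rng(0)
rows = [(s, sp, pr, rng.uniform(1, 7))
        for s in range(5) for sp in ("C", "NV", "rNV") for pr in ("M", "nM")]
ratings = pd.DataFrame(rows, columns=["subject", "speech", "prime", "rating"])

# Two-factor repeated-measures ANOVA (speech x prime) on the ratings.
fit = AnovaRM(ratings, "rating", "subject", within=["speech", "prime"]).fit()
print(fit.anova_table)

# One-factor ANOVA on "pop-out" difference scores (nM - M) per speech type.
wide = ratings.pivot_table(index=["subject", "speech"],
                           columns="prime", values="rating").reset_index()
wide["popout"] = wide["nM"] - wide["M"]
fit2 = AnovaRM(wide, "popout", "subject", within=["speech"]).fit()
print(fit2.anova_table)
```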

MRI protocol and data acquisition

MR imaging was performed on a 3.0 Tesla Siemens Trio MRI scanner in the Queen's Centre for Neuroscience Studies MR Facility (Kingston, Ontario, Canada). We used a sparse-imaging technique (Hall et al., 1999) in which stimuli were presented during the silent period between successive scans. T2*-weighted functional images were acquired using GE-EPI sequences, with: a field of view of 211 mm × 211 mm; in-plane resolution of 3.3 mm × 3.3 mm; slice thickness of 3.3 mm with a 25% gap; acquisition time (TA) of 2000 ms per volume; and a time-to-repeat (TR) of 9000 ms. Acquisition was transverse oblique, angled away from the eyes, and in most cases covered the whole brain (in a very few cases, slice positioning excluded the very top of the superior parietal lobule).

A total of 179 volumes were acquired from each subject over four scanning runs. Conditions were pseudo-randomized such that all runs contained equal numbers of each condition type, and there were approximately equal numbers of transitions between all condition types. Twenty-five images were collected for each experimental condition, plus a single dummy image at the beginning of each run (discarded from analysis). In each trial, the stimulus sequence was positioned such that the midpoint of the auditory stimulus occurred 4 s prior to scan onset, ensuring that any hemodynamic response evoked by the audio-visual stimulus peaked within the scan. Visual primes appeared a single word at a time, preceding the homologous auditory segment by 200 ms (Fig. 1). In addition to the functional data, a whole-brain 3D T1-weighted anatomical image (voxel size of 1.0 mm³) was acquired for each participant at the start of the session.
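The trial timing described here reduces to simple arithmetic: the audio onset is chosen so that the stimulus midpoint falls 4 s before scan onset, and each text prime leads its spoken word by 200 ms. A small sketch (the helper name and word-onset values are illustrative):

```python
def trial_schedule(scan_onset_s, stim_dur_s, word_onsets_s):
    """Return the audio onset and prime onsets (in seconds) for one trial.

    word_onsets_s: onsets of each spoken word relative to the audio start.
    """
    # midpoint = audio_onset + dur/2 must equal scan_onset - 4 s
    audio_onset = scan_onset_s - 4.0 - stim_dur_s / 2.0
    prime_onsets = [audio_onset + w - 0.2 for w in word_onsets_s]  # 200 ms lead
    return audio_onset, prime_onsets

# Example: a 2.5 s sentence before a scan starting at t = 9 s (TR = 9 s).
audio_on, primes = trial_schedule(9.0, 2.5, [0.0, 0.4, 0.9, 1.5])
```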

Participants were instructed to "Listen carefully, and attempt to understand the sounds you hear". Additionally, they performed a simple target-monitoring task to keep them alert and focused on the visual primes. In one of every twelve trials, pseudorandomly selected (balanced across condition type, including silence), a visual target ("X X X") would appear for 200 ms after the complete prime sequence; the presence of a target was indicated with the press of a button under the right index finger. Prior to the experimental session, participants were trained on the listening and detection tasks inside the scanner (4 trials each, using different stimuli than the experiment) to familiarize them with the tasks and the experimental procedure.

During the scanning session, participants viewed visual stimuli projected onto a screen at one end of the magnet bore. An angled mirror attached to the head coil allowed participants to view the screen from the supine position. An NEC LT265 DLP projector was used to project visual stimuli to the screen, and auditory stimuli were presented via a Nordic Neurolabs MR-compatible pneumatic audio system.

fMRI data pre-processing and analysis

All fMRI data were processed and analyzed using Statistical Parametric Mapping (SPM5; Wellcome Department of Cognitive Neuroscience, London, UK). Data preprocessing steps for each subject included: 1) rigid realignment of each EPI volume to the first of the session; 2) coregistration of the structural image to the mean EPI; 3) normalization of the structural image to a standard template (ICBM452) using the SPM5 unified segmentation routine; 4) warping of all EPI volumes using deformation parameters generated from the normalization step, including resampling to a voxel size of 3.0 mm³; and 5) spatial smoothing using a Gaussian kernel with a full-width at half maximum (FWHM) of 8 mm.


Analysis at the first level was conducted using a single General Linear Model (GLM) for each participant, in which each scan was coded as belonging to one of seven experimental conditions. The four runs were modelled as one session within the design matrix, and four regressors were used to code each of the runs. Twenty-four parameters were included to account for movement-related effects (i.e. the six realignment parameters and their Volterra expansion; Friston et al., 1996). The detection task was modelled as a single regressor indicating the presence of a visual target (and subsequent button press). Due to the long TR of this sparse-imaging paradigm, no correction for serial autocorrelation was necessary. Furthermore, no high-pass filter was applied to the data, because the previously described pseudo-randomization of conditions was sufficient to ensure that any low-frequency noise would not confound activation results.
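The 24-parameter movement model can be built from the six realignment parameters as follows. This sketch assumes the common expansion (parameters, one-scan-lagged parameters, and the squares of both), which is one standard reading of the Volterra expansion of Friston et al. (1996), not necessarily the authors' exact code.

```python
import numpy as np

def motion_regressors_24(rp):
    """Expand an (n_scans, 6) realignment-parameter array to 24 columns."""
    lagged = np.vstack([np.zeros((1, rp.shape[1])), rp[:-1]])  # one-scan lag
    base = np.hstack([rp, lagged])        # 12 columns: params + lagged params
    return np.hstack([base, base ** 2])   # plus their squares -> 24 columns

rp = np.loadtxt("rp_subject.txt")         # hypothetical SPM realignment file
X_motion = motion_regressors_24(rp)       # (n_scans, 24) nuisance regressors
```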

Contrast images for each of the six experimental conditions were generated by comparing each of the condition parameter estimates (i.e. betas) to the silent baseline condition. The images from each subject were entered into a second-level group 3 × 2 (Speech × Prime type) repeated-measures ANOVA, treating subjects as a random effect. F-contrasts were used to assess the overall main effects of speech type, prime type, and their interaction. For the whole-brain analysis, we report only peak voxels that were statistically significant at the whole-brain level using a family-wise correction for Type I error (FWE, p < 0.05). The overall main effects were parsed into simple effects by inclusively masking specific t-contrast images (i.e. simple effects) with the corresponding F-contrasts (i.e. the main effect). The t-contrasts were combined to determine logical intersections of the simple effects; in this way, significant voxels revealed by F-contrasts were labelled as being driven by one or more simple effects. Peaks were localized using the LONI probabilistic brain atlas (LPBA40; Shattuck et al., 2008) and confirmed by visual inspection of the average structural image.
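The logical-intersection labelling can be expressed as boolean operations on thresholded statistic maps. A simplified sketch with random stand-in maps (a full version would also test the reverse contrasts before declaring two conditions approximately equal):

```python
import numpy as np

# Hypothetical voxel-wise boolean maps (random here, for illustration only):
# F_sig    -- significant overall F-contrast (p < 0.05 FWE)
# c_gt_nv  -- t-contrast C > NV at p < 0.001 uncorrected; similarly nv_gt_rnv
rng = np.random.default_rng(1)
shape = (4, 4, 4)
F_sig, c_gt_nv, nv_gt_rnv = (rng.random(shape) < 0.5 for _ in range(3))

label = np.zeros(shape, dtype=int)
label[F_sig & c_gt_nv & nv_gt_rnv] = 1    # pattern C > NV > rNV
label[F_sig & c_gt_nv & ~nv_gt_rnv] = 2   # pattern C > NV, NV not > rNV
```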

Region-of-interest (ROI) analysis

Given the hypothesis motivating this study, we performed a region-of-interest analysis examining activity directly within auditory cortical core areas, particularly the most primary-like area. Morosan et al. (2001) used observer-independent cytoarchitectonic mapping to identify three adjacent zones arranged medial to lateral (Te1.1, Te1.0, Te1.2) approximately along the axis of the first transverse temporal gyrus (Heschl's gyrus—HG; the morphological marker of primary auditory cortex on the superior temporal plane) in 10 post-mortem human specimens. These researchers quantitatively analyzed serial coronal sections from the ten brains stained for cell bodies; area borders were digitized and mapped on to high-resolution structural MR images of the same post-mortem brains to create three-dimensional labelled volumes of the cytoarchitectonic regions (Morosan et al., 2001). The three areas (Te1.1, Te1.0, Te1.2) were reliably distinguishable by their cytoarchitectonic features. Based on a comparison of qualitative and quantitative descriptions of temporal cortex in primates, Hackett et al. (2001) concluded that the most primary-like, koniocortical, core region is probably Te1.0, corresponding to area AI in rhesus macaque monkeys. Similarly, area Te1.1 likely corresponds to the macaque caudomedial belt (area CM; Hackett, 2001). We focused our analysis on area Te1.0 since it is the most primary-like of the three Te1 regions, although we also performed a similar analysis of Te1.1 and Te1.2.

To define our ROIs, we constructed probabilistic maps of the three Te1 regions. Template-free probabilistic maps of these regions were constructed using a group-wise registration technique to transform the ten post-mortem structural images, along with their cytoarchitectonic information, to a common space (Tahmasebi et al., 2009). The probabilistic maps were transformed into the same space as our normalized subject data by performing a pair-wise registration (i.e. SPM2 normalization) between the average post-mortem group structural image and the ICBM152 template. A threshold was applied to the probability maps so that they only included voxels that were labelled as belonging to the structure of interest in at least 3/10 of the post-mortem brains. The warped probability maps overlapped with the anterior-most gyrus of Heschl in all our subjects, as evaluated by visual inspection. Fig. 6A shows the thresholded Te1.0 probability map overlaid on the average group structural image.

The cytoarchitectonic data were defined bilaterally (i.e. for both left and right auditory cortices), and so the probabilistic maps also included a left and a right region. A wealth of neuroscience research explores the structural and functional asymmetries of human primary auditory cortex, and whether such effects play a role in language processing (e.g., Abrams et al., 2008; Binder et al., 1996; Devlin et al., 2003; Zatorre et al., 2002). Though we make no predictions regarding an asymmetry in this study, it is possible that a hemispheric effect might be observed; accordingly, we include the hemisphere factor in the ROI analysis.

A region-of-interest toolkit (MarsBaR v0.42; Brett et al., 2002) was used to perform the analysis on unsmoothed data sets. First, the signal was extracted from left and right PAC, corrected for effects of no interest (e.g. movement parameters), and summarized for each left and right volume using a weighted-mean calculation, wherein the signal from each voxel was weighted by the probability of it belonging to the structure of interest. Contrasts were computed yielding the average signal intensity in the ROI for each condition compared to the silent baseline. The contrast values from all subjects were analyzed in a 2 (hemisphere) × 3 (speech type) × 2 (prime type) within-subjects ANOVA in SPSS software (www.spss.com).
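The probability-weighted summary described here is simply a weighted mean over voxels. A sketch with synthetic arrays standing in for the Te1.0 probability map and a subject's 4-D data (all names and values are illustrative):

```python
import numpy as np

# Synthetic stand-ins: `prob` is a Te1.0 probability map, `bold` a 4-D
# time series with shape (x, y, z, t).
rng = np.random.default_rng(2)
prob = rng.random((10, 10, 10))
bold = rng.standard_normal((10, 10, 10, 50))

w = np.where(prob >= 0.3, prob, 0.0)   # keep voxels labelled in >= 3/10 brains
vox = bold[w > 0]                      # (n_voxels, t) time series in the ROI
weights = w[w > 0]
roi_ts = (vox * weights[:, None]).sum(0) / weights.sum()   # weighted mean
```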

Functional connectivity analysis

A correlation analysis was used to search for brain regions that exhibited significant functional connectivity with primary auditory cortex. The probability map of area Te1.0 in the left hemisphere was used as a seed region. From each subject, we first extracted the time series of residuals for each voxel within the seed region. Voxels within the seed region were corrected for global signal using linear regression (Fox et al., 2009; Macey et al., 2004) before being summarized with a weighted-mean calculation to create a single time series representing the seed region. The corrected seed time series was correlated with the residual time series of every voxel in the brain to create a whole-brain correlation map. Correlation coefficients were converted to z-scores with the Fisher transform, which makes the null-hypothesis sampling distribution approach the normal distribution and so allows statistical conclusions to be drawn from correlation magnitudes (Rissman et al., 2004). The z-transformed correlation maps were subjected to a group-level random-effects analysis, as in the conventional group analysis. Again, we report only peak voxels that were statistically significant at the whole-brain level using a family-wise correction for Type I error (FWE, p < 0.05).
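In outline, the pipeline is: regress the global signal out of the seed voxels, collapse them to one probability-weighted time series, correlate it with every voxel, and Fisher-transform the correlations. A sketch with synthetic arrays (dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
resid = rng.standard_normal((5000, 179))  # whole-brain residuals (voxels x scans)
seed = resid[:100]                        # residuals of the seed-region voxels
w = rng.random(100)                       # Te1.0 probabilities for seed voxels

def regress_out(ts, g):
    """Remove the (standardized) regressor g from each row of ts."""
    g = (g - g.mean()) / g.std()
    beta = ts @ g / (g @ g)
    return ts - np.outer(beta, g)

g = resid.mean(axis=0)                                       # global signal
seed_ts = (regress_out(seed, g) * w[:, None]).sum(0) / w.sum()

# Voxel-wise Pearson correlation with the seed, then Fisher z-transform.
z = (resid - resid.mean(1, keepdims=True)) / resid.std(1, keepdims=True)
s = (seed_ts - seed_ts.mean()) / seed_ts.std()
r = z @ s / len(s)
fisher_z = np.arctanh(r)
```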

In order to remove areas in the region of the seed voxel that exhibit connectivity due to spatial smoothing (autocorrelation), these analyses were masked with a spherical mask with a radius equal to 1.27 times the FWHM of the intrinsic smoothness of the analyzed data. (This radius equals 3 times the standard deviation (sigma) of the smoothing kernel, since the FWHM of a Gaussian is 2.3548 × sigma.)
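The 1.27 factor follows directly from the Gaussian relation FWHM = 2.3548σ, as this worked example shows (the 12 mm value is an assumption here, taken from the effective smoothness quoted later in the Results):

```python
FWHM = 12.0                  # mm, assumed effective smoothness
sigma = FWHM / 2.3548        # ~5.1 mm
radius = 3 * sigma           # ~15.3 mm; note 3 / 2.3548 == 1.27 * FWHM factor
print(sigma, radius)
```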

Results

Behavioural results, target detection task

Almost all participants performed perfectly on the target detection task, indicating that they were attentive and aware during the entire scanning session (only one subject made a single error).


Imaging results, whole-brain analysis

Cortical activation in response to all six conditions with sound compared to silence yielded bilateral activation in large clusters encompassing most of the superior temporal plane and superior temporal gyrus (STG). The clusters included Heschl's gyrus as well as the superior temporal sulcus (STS). This contrast was used for quality control purposes to ensure valid models and data. All subjects displayed robust bilateral activation for the comparison of sound to silence and were included in further analyses.

The contrast assessing the main effect of speech type revealed large bilateral activations restricted mainly to the temporal lobes, ranging along the full length of the STG and STS, and including the superior temporal plane (Fig. 3A). The simple effects driving speech-related activity within these areas were examined using t-contrasts (Fig. 3B, and Table 1). Most regions of the STG and STS exhibited a pattern of activity such that clear speech elicited greater activation than NV speech, which in turn elicited greater activity than rNV speech (i.e. C > NV > rNV; Fig. 3B, yellow voxels). HG and much of the superior temporal plane exhibited strongest activation for clear speech, with no significant difference between NV and rNV speech (i.e. C > NV ≈ rNV; Fig. 3B, red voxels). Interestingly, we observed a small area just posterior to bilateral HG which demonstrated an elevated response to NV speech compared to C and rNV speech (i.e. NV > C ≈ rNV; Fig. 3B, blue and cyan regions).

Activations due to the different effects of the two types of primes overlapped considerably with those due to speech type in the left temporal lobe (Fig. 4, Table 1). We also observed strong activations due to prime type in left inferior temporal areas, such as the fusiform gyrus, as well as left frontal areas, including the left inferior frontal gyrus (LIFG), orbitofrontal gyrus, and precentral gyrus (Table 1). Activation due to prime type was also observed in the right STS. Analysis of the simple effects showed that most regions demonstrated a significantly elevated response to matching compared to non-matching primes (i.e. M > nM; Fig. 4, red voxels).

Most interesting were areas that demonstrated an interaction between speech and prime type; that is, areas where sensitivity to speech type was modulated by prime type. Several clusters of voxels in the left hemisphere demonstrated a significant speech- by prime-type interaction (Fig. 5A; Table 1): two regions of the STS (one posterior, one more anterior); STG posterior and lateral to HG, and near the anterior end of the temporal lobe; LIFG, including orbitofrontal gyrus; premotor cortex (dorsal prefrontal cortex); and the supplementary motor area (SMA). In general, the pattern of activity was such that the difference between matching and non-matching primes (M > nM) increased as speech became more degraded (Fig. 5B). The parameter estimates of selected peaks are presented in Fig. 5C to illustrate the interaction pattern.

Fig. 3. (A) Voxels that exhibited a main effect of speech type. The activation map is thresholded at p < 0.001 uncorrected and overlaid on the group average structural image. The colour scale indicates the p-value corrected family-wise for type I error (FWE), such that any colour brighter than dark blue is statistically significant (p < 0.05) at the whole-brain level. (B) The simple effects of speech type revealed by t-contrasts. Only voxels demonstrating a significant main effect of speech type (p < 0.05 FWE) at the whole-brain level were included in this analysis. Simple effects were evaluated voxel-wise at p < 0.001, uncorrected, and logically combined to label each voxel as demonstrating one or more simple effects. For example, yellow voxels exhibit the pattern C > NV > rNV, whereas red voxels exhibit the pattern C > NV ≈ rNV. C = clear speech; NV = noise-vocoded speech; rNV = rotated NV.

Table 1
Results of the group-level ANOVA of whole-brain data; peaks revealed by overall F-contrasts are significant at p < 0.05 FWE. Asterisks indicate marginal significance. L = left, R = right, P = posterior, A = anterior, I = inferior, S = superior, G = gyrus, STG = superior temporal gyrus, STS = superior temporal sulcus, SMA = supplementary motor area.

Contrast | X, Y, Z (mm) | F | Voxels in cluster | Location | Simple effect(s)
Main effect: speech type | −60, −12, −3 | 164.36 | 1007 | L STG | C > NV > rNV
 | −63, −30, 6 | 81.20 | | L P STG | C > NV > rNV
 | −51, 12, −18 | 74.30 | | L A STG | C > NV > rNV
 | −33, −30, 15 | 40.82 | | L Heschl's G | C > NV ≈ rNV
 | −45, −21, 9 | 38.21 | | L Heschl's G | C > NV ≈ rNV
 | −57, −39, 15 | *11.92 | | L P STG | NV > C ≈ rNV
 | 60, −6, −6 | 157.30 | 896 | R STG | C > NV > rNV
 | 51, −27, 0 | 93.15 | | R P STG | C > NV > rNV
 | 54, 12, −15 | 69.35 | | R A STG | C > NV > rNV
 | 63, −30, 15 | 21.09 | 9 | R P STG | NV > C ≈ rNV
 | 39, −33, 30 | 20.57 | 7 | R supramarginal G | rNV > NV ≈ C
Main effect: prime type | −51, −42, 3 | 153.10 | 1066 | L P STS | M > nM
 | −51, 12, −21 | 102.15 | | L A STG | M > nM
 | −54, −12, −12 | 84.25 | | L STS | M > nM
 | −39, 30, −9 | 65.86 | | L orbitofrontal G | M > nM
 | −57, −60, 12 | 63.75 | | L P STS | M > nM
 | −54, −3, 48 | 90.62 | 150 | L precentral G | M > nM
 | −39, −45, −18 | 65.63 | 97 | L fusiform G | M > nM
 | 27, −63, −48 | 59.72 | 32 | R cerebellum | M > nM
 | −3, 3, 60 | 56.51 | 46 | L SMA | M > nM
 | −54, 21, 18 | 50.04 | 130 | L I frontal G | M > nM
 | 36, −90, 12 | 40.54 | 51 | R MID occipital G | nM > M
 | −39, 48, 12 | 37.10 | 29 | L MID frontal G | nM > M
 | 51, −12, −12 | 32.59 | 11 | R STS | M > nM
Interaction: Speech × Prime type | −51, −42, 3 | 40.28 | 163 | L P STS |
 | −51, −45, 15 | 24.50 | | L P STG |
 | −54, 21, 18 | 28.26 | 116 | L I frontal G |
 | −39, 30, −6 | 22.13 | 14 | L orbitofrontal G |
 | −51, 12, −18 | 21.05 | 4 | L A STG |
 | −6, 12, 57 | 20.09 | 17 | L SMA |
 | −45, 3, 51 | 18.37 | 4 | L precentral S |
 | −51, −6, −15 | 16.90 | 1 | L STS |

Bold emphasis (in the original table, the first row of each cluster) indicates the peak voxel with the highest statistic in the cluster.

Fig. 4. Results of the group-level ANOVA showing voxels that exhibited a significant main effect of prime type, and the simple effects driving the main effect, overlaid on the group average T1-weighted structural image. Two contrast images—M > nM and nM > M—were generated and masked to include only voxels that demonstrated an overall main effect of prime type (p < 0.001, uncorrected). P values indicated by the colour scales are corrected family-wise for multiple comparisons (FWE), such that dark red and blue voxels indicate voxels that are not significant at the whole-brain level when corrected for multiple comparisons. M = matching primes; nM = non-matching primes.

Fig. 5. (A) Voxels that exhibited a speech- by prime-type interaction. The activation map is thresholded at p < 0.001 uncorrected and overlaid on the group average structural image. The colour scale indicates the p value corrected family-wise for type I error (FWE), such that any colour brighter than dark blue is statistically significant (p < 0.05) at the whole-brain level. (B) The simple interactions as revealed by t-contrasts. Only voxels demonstrating a significant overall interaction (p < 0.05 FWE) at the whole-brain level were included in this analysis. Simple interactions were evaluated voxel-wise at p < 0.001, uncorrected, and logically combined to label each voxel as demonstrating one or more simple effects. rNV(M−nM) = simple main effect of prime type for rNV speech; NV(M−nM) = simple main effect of prime type for NV speech; C(M−nM) = simple main effect of prime type for clear speech. (C) Parameter estimates of selected peaks that displayed a significant Speech × Prime type interaction: 1) LIFG, 2) left posterior STS, 3) SMA, 4) left precentral sulcus. Bar height is the mean contrast value for each condition (i.e. the parameter estimate for each condition minus the parameter estimate for silence, averaged across participants). Error bars indicate SEM adjusted for repeated-measures data (Loftus and Masson, 1994).

No interaction was observed in the region of primary auditory cortex. This may be due to the smoothness of the data: after the applied smoothing of 8 mm, the effective smoothness was approximately 12 mm. Heschl's gyrus—the morphological landmark of our region of interest—measures about 7 mm across on the group average structural image, and the effective smoothness could easily have washed out any interaction in a region of that size. To examine PAC more directly, we performed a region-of-interest analysis on unsmoothed data.

Region-of-interest (ROI) results

A probabilistic map of cytoarchitectonic area Te1.0 was used to extract signal in the left and right hemispheres from unsmoothed datasets. A three-factor (hemisphere, speech and prime type) repeated-measures ANOVA revealed a significant main effect of speech type (F(2,36) = 34.57, p < 0.001); post-hoc analysis showed that this effect was mainly attributed to enhanced BOLD signal for clear speech compared to NV speech (C > NV: t(19) = 5.91, p < 0.001), and for NV speech compared to rNV speech (NV > rNV: t(19) = 3.57, p < 0.01). A significant effect of hemisphere was observed (F(1,18) = 10.86, p < 0.01), with signal in the right hemisphere greater than in the left (t(19) = 3.29, p < 0.01). There was also a significant main effect of prime type (F(1,18) = 6.58, p < 0.05), such that M primes elicited greater activity than nM primes (t(18) = 2.58, p < 0.05). Importantly, there was a significant interaction between speech and prime types (F(2,36) = 3.74, p < 0.05), such that signal elicited by NV speech paired with M primes was enhanced relative to NV speech with nM primes (NVM > NVnM: t(19) = 3.37, p < 0.005), and this difference was not present for the other speech types (CM > CnM: t(19) = 0.95, p = 0.36; rNVM > rNVnM: t(19) = −0.27, p = 0.78). Thus, the main effect of prime type was driven only by the difference in activity between NVM and NVnM conditions. The three-way interaction was not significant, suggesting that the pattern of the Speech × Prime interaction was similar in both hemispheres. Fig. 6B shows the estimated signal values within area Te1.0 for the six experimental conditions, collapsed across hemisphere.

Areas Te1.1 and Te1.2 were analyzed in a similar manner, although neither exhibited a significant Speech × Prime interaction nor significant post-hoc simple effects of prime type (i.e. NVM > NVnM). However, to truly determine whether the Speech × Prime interaction was unique to area Te1.0, the data from all three regions were analyzed together in a 3 (region) × 2 (hemisphere) × 3 (speech type) × 2 (prime type) repeated-measures ANOVA. The Region × Speech × Prime interaction was not significant, indicating that we cannot conclude that the Speech × Prime interaction pattern differs across the three cytoarchitectonic regions. From this we can conclude that activity in primary auditory cortex, especially area Te1.0, is enhanced by the presence of useful visual information when speech is degraded, but not when speech is perfectly clear or impossible to understand. This effect may be present to some degree in other primary regions, but was not detected with our region-of-interest analysis.

This particular Speech × Prime interaction pattern appeared different from the interactions observed in the whole-brain fMRI analysis, in that area Te1.0 was only sensitive to the difference between M and nM primes for NV speech (see Fig. 5C1–4 vs. Fig. 6B). To confirm that profiles of activity truly differed across regions, we compared the pattern of activity in Te1.0 to the patterns observed in four whole-brain interaction peaks (Fig. 5C). All four pair-wise comparisons yielded significant Region × Speech × Prime interactions when corrected for multiple comparisons: Te1.0 vs. LIFG, F(2,36) = 23.07, p < 0.001; Te1.0 vs. STS, F(2,36) = 34.89, p < 0.001; Te1.0 vs. SMA, F(2,36) = 19.99, p < 0.001; Te1.0 vs. precentral sulcus, F(2,36) = 21.83, p < 0.001. Thus, we can conclude that the interaction pattern of activity in primary auditory cortex reliably differs from those observed in the whole-brain interaction. To examine whether the interaction pattern that we observed in primary auditory cortex was also present in another primary sensory region, we performed an ROI analysis for primary visual cortex (V1 / Brodmann Area 17) using the same techniques. A probabilistic map of V1 was obtained from the SPM Anatomy Toolbox (Eickhoff et al., 2005), which is derived from the same cytoarchitectonically labelled post-mortem brains as described earlier. The data from V1 yielded no significant main effects or interactions. A pair-wise comparison of the activation patterns in Te1.0 and V1 demonstrated a significant Region × Speech × Prime interaction (F(2,36) = 4.59, p < 0.05), showing that the interaction pattern in Te1.0 was truly different from the pattern observed in V1.

Functional connectivity results

A correlational analysis was used to search for regions that exhibited significant functional coupling with left PAC. Since a likely cause of correlated neural activity, and hence correlated BOLD signal, is underlying anatomical connections, this analysis was used to help identify brain regions connected with PAC that might be the source(s) of the top-down modulation observed in the ROI analysis. To define PAC, we used the left Te1.0 probabilistic map as the seed region. Right PAC was not used as a seed because we would expect similar results, given that left and right PAC exhibit strong interhemispheric connections (Kaas and Hackett, 2000; Pandya et al., 1969), and that regions demonstrating anatomical connectivity also show strong functional connectivity (Koch et al., 2002).

The connectivity results are depicted in Fig. 7A, and summarized in Table 2. As expected, we observed the strongest coupling between left PAC and homologous regions of the right hemisphere. In both hemispheres, the coupling was restricted mainly to the superior temporal plane, extending anteriorly to include the insulae, inferior frontal gyri, and inferior portions of the precentral gyri. We also observed significant coupling bilaterally with the medial geniculate nuclei. In the left hemisphere, there was significant coupling with a cluster of voxels towards the dorsal end of the precentral sulcus (Fig. 7A, sagittal slice at x = −60 mm).

A second seed region was also selected for a functional connectivity analysis. For each subject, we selected a seed voxel in the STS, within 30 mm of the STS peak from the whole-brain analysis, that demonstrated a significant speech- by prime-type interaction. Results are depicted in Fig. 7B and summarized in Table 3. Regions that exhibited significant coupling with the STS seed overlapped considerably with those regions that also demonstrated a strong Speech × Prime interaction in the whole-brain analysis, specifically: areas of the left superior temporal plane, left STG, left STS, LIFG, and left precentral sulcus. Interestingly, the homologues of these regions in the right hemisphere also exhibited significant functional coupling with the left STS seed. Finally, the left and right fusiform gyri demonstrated significant coupling with the STS.

Discussion

The current study examined the effects of perceptual clarity on BOLD activity within human primary auditory cortex. Estimated signal within the most primary-like region of auditory cortex, area Te1.0, was enhanced for degraded speech (i.e. NVS) presented with matching visual text, compared to NVS with non-matching text. Critically, the result manifested as an interaction, such that the difference between matching and non-matching text was not present for clear speech or unintelligible rotated noise-vocoded material. Therefore, this enhancement cannot be attributed to a main effect of prime type. For example, any processes recruited or initiated by the difference between real words and consonant strings (e.g. sub-vocalization, auditory imagery, directed attention, or a crude auditory search) should also be recruited for rotated NVS—which is acoustically very similar to NVS; yet, matching primes had no effect on activity in PAC elicited by rNVS. Furthermore, this difference in activity cannot be attributed to stimulus-item effects since, by design, stimulus materials were counterbalanced across participants.

This interaction pattern, with enhanced BOLD signal for degraded speech paired with matching over non-matching text, may reflect top-down modulation of PAC activity related to the increased perceptual clarity of the NVS. Clear speech produced the strongest responses in this region, and the enhancing effect of matching text on NVS responses may reflect the increased clarity of the degraded speech (i.e. the "pop-out"). We do not conclude that this effect is unique to area Te1.0—despite demonstrating a difference in activity patterns between Te1.0 and higher-order temporal and frontal regions, and a lack of an interaction in other subfields of PAC—because we did not test other auditory regions, such as belt auditory cortex or subcortical structures. Given the dense interconnectivity between primary auditory cortex and other auditory regions, it would not be surprising if the effect were found elsewhere as well. However, this result does demonstrate an influence of non-auditory information on auditory processing, as early as PAC, related to the intelligibility of full-sentence stimuli. We observed no effects related to our experimental manipulations in primary visual cortex, which suggests that the modulatory influence may be relatively selective.

Although we observed a significant main effect of hemisphere in the ROI analysis of Te1.0, we are hesitant to ascribe any functional significance to this difference in activity. It is well known that there are structural asymmetries in the size and morphology of Heschl's gyrus (Penhune et al., 1996) and the planum temporale (Geschwind and Levitsky, 1968). Such physical asymmetries, or differences in the vasculature resulting from these structural differences (Ratcliff et al., 1980), could potentially underlie interhemispheric differences in BOLD signal magnitude. Convincing evidence for a functional hemispheric asymmetry in PAC would have arisen as a significant interaction involving hemisphere, yet we observed no such effects in our data.

Attention can modulate activity in sensory cortices, such that responses in the sensory cortex corresponding to the modality of an attended stimulus are enhanced (e.g. vision: Heinze et al., 1994; audition: Petkov et al., 2004), and those of the ignored modality are suppressed (Johnson and Zatorre, 2005, 2006). Furthermore, attention appears to have perceptual consequences, as it has been shown to increase the perceived contrast of Gabor patches (Cameron et al., 2002; Carrasco, 2009; Carrasco et al., 2000; Pestilli and Carrasco, 2005; Treue, 2004). In our study, the combination of a partially intelligible utterance with obviously matching text may have drawn subjects' attention to the auditory stimulus, resulting in enhanced processing and increased clarity of the degraded speech: attentional modulation may be the mechanism by which pop-out occurs. Alternatively, our results may reflect an automatic process that enhances auditory processing in the presence of congruous information; further experimentation incorporating attentional manipulations may help resolve this question.

Fig. 6. (A) The Te1.0 probability map, used for the ROI analysis, rendered on the average group T1-weighted structural image. The probability map was constructed from labelled cytoarchitectonic data in ten post-mortem brains, which were combined and transformed to our group space. The probability map was thresholded to include only voxels labelled as Te1.0 in three of ten brains (i.e. 30%). (B) Mean contrast values (arbitrary units) in area Te1.0 for three types of speech: clear (C), noise-vocoded (NV), and rotated NV (rNV), paired with matching (M) or non-matching (nM) primes. Error bars indicate standard error of the mean (SEM) suitable for repeated-measures data. Asterisks represent significant differences (** p < 0.001; * p < 0.01).

Fig. 7. Results of the functional connectivity analysis using (A) primary auditory cortex (as defined with the Te1.0 probabilistic map) and (B) left posterior STS [−51 −42 3] as a seed region. The correlation maps are thresholded at p < 0.05 (FWE). White voxels indicate the seed region(s). Blue voxels indicate masked voxels (see Materials and methods); these voxels were masked from the analysis to eliminate extreme peaks due to spatial autocorrelation resulting from smoothing.

The modulatory phenomenon we observed in PAC was an increase in BOLD activity for degraded stimuli that were rendered more intelligible by matching primes. This might seem opposite to what would be expected of a predictive coding model, since, within that framework, feedback processes ought to reduce activity in lower-level stages when predicted (i.e. top-down) and sensory (i.e. bottom-up) signals converge (Friston, 2005; Mumford, 1992; Murray et al., 2002, 2004; Rao and Ballard, 1999). However, it has been suggested that BOLD signal in primary sensory cortices might represent the precision of a prediction error, rather than the error signal itself (Hesselmann et al., 2010); thus, increased signal may indicate the brain's confidence in a perceptual hypothesis. This is consistent with our observations, as the primes rendered responses to NVS more similar to those evoked by coherent clear speech (i.e., primes increased the brain's confidence in the interpretation of an ambiguous auditory signal). This increase in activity is consistent with human fMRI studies that demonstrate multisensory interactions in auditory regions: both somatosensory and visual stimuli appear to enhance BOLD responses elicited by acoustic stimuli (Bernstein et al., 2002; Calvert, 1997; Lehmann et al., 2006; Martuzzi et al., 2007; Pekkola et al., 2005; Sams et al., 1991; van Atteveldt et al., 2004, 2007). These results, however, likely correspond to activations within belt and/or parabelt fields of auditory cortex—not necessarily the core of PAC.

Table 2
Clusters and peak voxels that exhibit significant functional connectivity with primary auditory cortex, as defined with the Te1.0 probabilistic map. Areas within the mask are not reported, due to autocorrelation with the seed. All peaks are significant at p < 0.05 FWE. L = left, R = right, P = posterior, A = anterior, I = inferior, S = superior, M = medial, MID = middle, G = gyrus, STG = superior temporal gyrus, STS = superior temporal sulcus.

Seed: primary auditory cortex (Te1.0 probabilistic map)
X, Y, Z (mm) | t | Voxels in cluster | Location
57, −9, 0 | 23.04 | 1489 | R STG
48, −12, 21 | 20.37 | | R postcentral G
51, −24, 12 | 17.31 | | R planum temporale
66, −30, 18 | 14.86 | | R P STG
48, 9, 6 | 14.52 | | R I frontal G
54, 9, −3 | 13.97 | | R I frontal G
63, −3, 15 | 9.09 | | R I precentral G
51, −24, −9 | 8.54 | | R P STS
48, 24, 15 | 8.19 | | R I frontal G
−57, −36, 12 | 18.21 | 833 | L planum temporale
−36, −30, 18 | 14.33 | | L M Heschl's G
−45, −3, 0 | 13.50 | | L insula
−60, −6, −3 | 12.01 | | L A STG
−60, 0, 15 | 11.82 | | L I precentral G
−42, −9, 15 | 10.31 | | L insula
−60, −24, 0 | 9.46 | | L STS
−36, 9, 3 | 9.35 | | L insula
6, −18, 3 | 12.48 | 22 | R M geniculate
−6, −18, 3 | 8.88 | | L M geniculate
24, 3, −6 | 11.05 | 54 | R putamen
−45, −78, −3 | 10.04 | 26 | L MID occipital G
3, 12, 24 | 9.85 | 164 | R cingulate G
0, 12, 48 | 9.61 | | R S frontal G
−24, 6, −3 | 9.48 | 33 | L putamen
−60, −3, 42 | 8.48 | 6 | L precentral S
−24, −69, −9 | 8.30 | 6 | L I occipital G
33, −45, −18 | 8.29 | 6 | R fusiform G

Bold emphasis (in the original table, the first row of each cluster) indicates the peak voxel with the highest statistic in the cluster.

Table 3
Clusters and peak voxels that exhibit significant functional connectivity with the STS peak [−51 −42 3]; this seed was selected as the strongest peak of the Speech × Prime type interaction from the whole-brain analysis. All peaks are significant at p < 0.05 FWE. R = right, L = left, P = posterior, A = anterior, I = inferior, S = superior, M = medial, G = gyrus, STG = superior temporal gyrus, STS = superior temporal sulcus.

Seed: superior temporal sulcus [−51 −42 3]
X, Y, Z (mm) | t | Voxels in cluster | Location
60, −24, −6 | 17.02 | 711 | R STS
51, −24, 12 | 11.74 | | R planum temporale
57, 0, −9 | 11.47 | | R A STG
45, −36, −3 | 10.40 | | R STS
63, −33, 9 | 9.90 | | R P STG
48, −18, −15 | 8.41 | | R STS
−3, 18, 48 | 13.90 | 115 | L S frontal G
−3, 12, 63 | 10.44 | | L S frontal G
−57, −24, 6 | 13.24 | 577 | L planum temporale
−54, 21, 21 | 10.76 | | L I frontal G
−57, 3, −15 | 10.65 | | L A STS
−42, 27, 12 | 10.38 | | L I frontal G
−51, 21, −3 | 9.64 | | L I frontal G
−39, −36, 18 | 8.78 | | L planum temporale
−57, −45, 21 | 8.34 | | L P STG
−42, −45, −18 | 13.02 | 29 | L fusiform G
12, 6, 3 | 11.03 | 46 | R caudate
15, −72, −24 | 9.75 | 24 | R cerebellum
42, −48, −18 | 8.83 | 10 | R fusiform G
45, 24, 18 | 8.56 | 21 | R I frontal G
57, 21, 24 | 7.42 | | R I frontal G
−30, 3, −6 | 8.29 | 15 | L putamen
−39, 3, 33 | 8.09 | 13 | L precentral S

Bold emphasis (in the original table, the first row of each cluster) indicates the peak voxel with the highest statistic in the cluster.

Data from non-human mammals (macaque monkeys and ferrets), on the other hand, provide more detailed mappings of multisensory interactions within auditory cortex. Electrophysiological and high-resolution fMRI studies have demonstrated non-auditory responses in secondary (i.e., belt) areas of auditory cortex, involving somatosensation (Brosch et al., 2005; Fu et al., 2003; Kayser et al., 2005; Lakatos et al., 2007; Schroeder et al., 2001), vision (Bizley et al., 2007; Brosch et al., 2005; Ghazanfar et al., 2005; Kayser et al., 2007; Schroeder and Foxe, 2002), and eye position (Fu et al., 2004; Werner-Reiss et al., 2003). Notably, many of these studies also provide evidence of multisensory effects within the primary-most core of PAC (Bizley et al., 2007; Brosch et al., 2005; Fu et al., 2004; Ghazanfar et al., 2005; Kayser et al., 2007; Werner-Reiss et al., 2003). The current study is the first to extend such findings to humans using rich and naturalistic full-sentence stimuli, demonstrating that similar top-down interactions in primary auditory cortex may occur in the context of natural communication.

A logical question that follows from these results is: where does the modulatory input to PAC come from? Anatomical models of non-human primates suggest that these multisensory effects could arise from a number of possible sources. In addition to cortical feedback from frontal regions (Hackett et al., 1999; Romanski et al., 1999a, 1999b), auditory cortex also receives efferent connections from multisensory nuclei of the thalamus (Hackett, 2007; Smiley and Falchier, 2009), and is reciprocally connected via direct connections to early visual areas (Falchier et al., 2002; Rockland and Ojima, 2003; Smiley and Falchier, 2009). Indeed, our functional connectivity analysis demonstrated significant coupling between primary auditory cortex and all of these regions. Of course, functional connectivity cannot determine causal influence, and these patterns of functional coupling can indicate feed-forward, feedback, or indirect connections. Nonetheless, the functional connectivity and whole-brain analyses identified candidate regions that are likely sources of top-down influences on PAC. Using these findings to specify more focused hypotheses, future studies may shed light on this issue by using Dynamic Causal Modeling (Friston et al., 2003) to compare different configurations of cortical networks.
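Such a comparison turns on model evidence. Schematically (generic notation; no such analysis was performed here), two candidate architectures m_1 and m_2, for example with and without feedback connections onto PAC, would be compared via their log Bayes factor,

    \ln B_{12} = \ln p(y \mid m_1) - \ln p(y \mid m_2) \approx F_1 - F_2

where y denotes the measured BOLD data and F is the free-energy approximation to the log evidence used in DCM (Friston et al., 2003); by common convention, \ln B_{12} > 3 is read as strong evidence in favor of m_1.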

Despite the necessarily equivocal evidence provided by fMRI connectivity measures, we believe that cognitive and neural considerations suggest that input to PAC arises from top-down, rather than from bottom-up or lateral, connections. In the conventional whole-brain analysis, the main effect of prime type revealed that only higher-order areas (i.e., inferior and middle temporal, and frontal regions) were sensitive to the differences between prime types—a result consistent with neuroimaging evidence suggesting that the cortical processes that support reading are arranged hierarchically (Cohen et al., 2002; Pugh et al., 1996; Vinckier et al., 2007). Therefore, considering the nature of the visual stimuli in this study, which were matched for low-level (i.e., temporal, spatial, and orthographic) features, it seems unlikely that the modulation of primary auditory activity arises from direct thalamic or occipital connections. Instead, it is more likely that temporal and frontal regions are sources of input to primary auditory cortex. The main effect of speech type revealed a small bilateral area just posterior to HG, likely on the planum temporale (PT), that demonstrated an elevated response to degraded speech. Furthermore, the functional connectivity analysis demonstrated significant coupling between left PAC and this region bilaterally. In the left hemisphere, this area corresponds quite closely to regions surrounding PAC, observed by Davis and Johnsrude (2003), that exhibited an increase in activity for degraded speech relative to clear speech and signal-correlated noise. They proposed that this pattern reflects mechanisms—recruited or initiated by frontal regions—that act to enhance comprehension of potentially intelligible input. This hypothesis fits with our data, and suggests this area as a possible source of modulatory input to PAC.

Frontal regions were also implicated as sources of top-down input to PAC. The pattern of activity within LIFG and left dorsal premotor cortex may reflect increased perceptual effort for degraded speech (Davis and Johnsrude, 2003; Giraud et al., 2004; Hervais-Adelman et al., submitted for publication) when matching primes are provided compared to when visual information is uninformative. The contributions of these two areas are not necessarily the same, or even specific to spoken language comprehension. Evidence from neuroimaging studies suggests that LIFG contributes to the processes involved in accessing and combining word meanings (Rodd et al., 2005; Thompson-Schill et al., 1997; Wagner et al., 2001), and that premotor cortex plays an active role in speech perception by generating internal representations of segmental categories to compare against auditory input (Iacoboni, 2008; Osnes et al., 2011; Wilson and Iacoboni, 2006; Wilson et al., 2004). Indeed, segmental representations may be critical to the perceptual enhancement of NVS: recent studies have demonstrated a sublexical (i.e., segmental) locus for the perceptual learning of NVS (Hervais-Adelman et al., 2008). The information afforded by both these processes (accessing/combining word meanings, and generating internal representations) could be relayed from LIFG and premotor cortex to temporal auditory regions via feedback connections (Hackett et al., 1999; Romanski et al., 1999a, 1999b). Our functional connectivity analysis supports this hypothesis, as we observed significant coupling of activity among posterior STS, STG, LIFG, and dorsal premotor cortex.

Our results may underestimate the size of the effect in PAC. Davis et al. (2005) found that word-report scores from previously naive listeners of six-band NVS improved from 40% to 70% correct after only thirty sentences. Our volunteers probably experienced some form of learning for NVS over the course of the experiment, reducing the intelligibility difference between the NVM and NVnM conditions and thus any activity difference between these two conditions.
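To see how learning works against the contrast of interest, consider a toy calculation based on the figures just cited; the linear learning curve and the near-ceiling primed condition are illustrative assumptions, not fitted values.

    import numpy as np

    # Toy illustration: unprimed noise-vocoded speech (NVnM) improves from
    # 40% to 70% word report over 30 sentences (Davis et al., 2005), while
    # primed NVS (NVM) is assumed to sit near ceiling throughout.
    trials = np.arange(30)
    nvnm = 0.40 + (0.70 - 0.40) * trials / 29     # assumed linear learning
    nvm = np.full_like(nvnm, 0.90)                # assumed, for illustration
    gap = nvm - nvnm                              # intelligibility difference

    print(f"gap, first trial: {gap[0]:.2f}")      # 0.50
    print(f"gap, last trial:  {gap[-1]:.2f}")     # 0.20
    print(f"mean gap:         {gap.mean():.2f}")  # 0.35

Under these assumptions, an activity difference proportional to the intelligibility gap would be diluted by roughly 30% relative to its initial value when averaged across the run.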

Proponents of autonomous models of speech adhere to the notion of a parsimonious system in which feedback is an additional, and unnecessary, component (Norris et al., 2000). This view, however, seems at odds with the anatomy of the cortical auditory system, in which feedback connections and cross-modal projections are such a prominent feature. The findings of this study are strictly incompatible with a feed-forward account of speech perception, and instead provide empirical support for an interactive account of speech processing by demonstrating a modulation of activity within PAC, related to the perceptual clarity of naturalistic speech stimuli (and hence intelligibility), which cannot be driven by stimulus differences alone. The source of modulatory input to PAC appears to be higher-order temporal areas, and/or frontal regions including LIFG and motor areas. Further work, including studies using EEG and MEG measures with high temporal precision (e.g., Gow and Segawa, 2009), is required to more clearly understand how, and when, these interactive processes contribute to speech perception.

References

Abrams, D.A., Nicol, T., Zecker, S., Kraus, N., 2008. Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965.
Alink, A., Schwiedrzik, C.M., Kohler, A., Singer, W., Muckli, L., 2010. Stimulus predictability reduces responses in primary visual cortex. J. Neurosci. 30, 2960–2966.
Benoit, C., Mohamadi, T., Kandel, S., 1994. Effects of phonetic context on audio-visual intelligibility of French. J. Speech Hear. Res. 37, 1195–1203.
Bernstein, L.E., Auer Jr., E.T., Moore, J.K., Ponton, C.W., Don, M., Singh, M., 2002. Visual speech perception without primary auditory cortex activation. Neuroreport 13, 311.
Binder, J.R., Frost, J.A., Hammeke, T.A., Rao, S.M., Cox, R.W., 1996. Function of the left planum temporale in auditory and linguistic processing. Brain 119, 1239–1247.
Bizley, J.K., Nodal, F.R., Bajo, V.M., Nelken, I., King, A.J., 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cereb. Cortex 17, 2172–2189.
Brett, M., Anton, J.-L., Valabregue, R., Poline, J.-B., 2002. Region of interest analysis using an SPM toolbox [abstract]. Presented at the 8th International Conference on Functional Mapping of the Human Brain, June 2–6, 2002, Sendai, Japan.
Brosch, M., Selezneva, E., Scheich, H., 2005. Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. J. Neurosci. 25, 6797–6806.
Calvert, G.A., 1997. Activation of auditory cortex during silent lipreading. Science 276, 593–596.
Cameron, E.L., Tai, J.C., Carrasco, M., 2002. Covert attention affects the psychometric function of contrast sensitivity. Vis. Res. 42, 949–967.
Carrasco, M., 2009. Cross-modal attention enhances perceived contrast. Proc. Natl. Acad. Sci. 106, 22039.
Carrasco, M., Penpeci-Talgar, C., Eckstein, M., 2000. Spatial covert attention increases contrast sensitivity across the CSF: support for signal enhancement. Vis. Res. 40, 1203–1215.
Cohen, L., Lehéricy, S., Chochon, F., Lemer, C., Rivaud, S., Dehaene, S., 2002. Language-specific tuning of visual cortex? Functional properties of the visual word form area. Brain 125, 1054–1069.
Davis, M.H., Johnsrude, I.S., 2003. Hierarchical processing in spoken language comprehension. J. Neurosci. 23, 3423.
Davis, M.H., Johnsrude, I.S., Hervais-Adelman, A., Taylor, K., McGettigan, C., 2005. Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences. J. Exp. Psychol. Gen. 134, 222–241.
de la Mothe, L.A., Blumell, S., Kajikawa, Y., Hackett, T.A., 2006. Cortical connections of the auditory cortex in marmoset monkeys: core and medial belt regions. J. Comp. Neurol. 496, 27–71.
Devlin, J.T., Raley, J., Tunbridge, E., Lanary, K., Floyer-Lea, A., Narain, C., Cohen, I., Behrens, T., Jezzard, P., Matthews, P.M., Moore, D.R., 2003. Functional asymmetry for auditory processing in human primary auditory cortex. J. Neurosci. 23, 11516–11522.
Eickhoff, S.B., Stephan, K.E., Mohlberg, H., Grefkes, C., Fink, G.R., Amunts, K., Zilles, K., 2005. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 25, 1325–1335.
Falchier, A., Clavagnier, S., Barone, P., Kennedy, H., 2002. Anatomical evidence of multimodal integration in primate striate cortex. J. Neurosci. 22, 5749–5759.


Fox, M.D., Zhang, D., Snyder, A.Z., Raichle, M.E., 2009. The global signal and observed anticorrelated resting state brain networks. J. Neurophysiol. 101, 3270–3283.
Friston, K., 2005. A theory of cortical responses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 815–836.
Friston, K.J., Williams, S., Howard, R., Frackowiak, R.S., Turner, R., 1996. Movement-related effects in fMRI time-series. Magn. Reson. Med. 35, 346–355.
Friston, K.J., Harrison, L., Penny, W., 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302.
Friston, K.J., Daunizeau, J., Kilner, J., Kiebel, S.J., 2010. Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260.
Frost, R., 1998. Toward a strong phonological theory of visual word recognition: true issues and false trails. Psychol. Bull. 123, 71–99.
Fu, K.-M.G., Johnston, T.A., Shah, A.S., Arnold, L., Smiley, J., Hackett, T.A., Garraghty, P.E., Schroeder, C.E., 2003. Auditory cortical neurons respond to somatosensory stimulation. J. Neurosci. 23, 7510–7515.
Fu, K.-M.G., Shah, A.S., O'Connell, M.N., McGinnis, T., Eckholdt, H., Lakatos, P., Smiley, J., Schroeder, C.E., 2004. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. J. Neurophysiol. 92, 3522–3531.
Geschwind, N., Levitsky, W., 1968. Human brain: left–right asymmetries in temporal speech region. Science 161, 186–187.
Ghazanfar, A.A., Maier, J.X., Hoffman, K.L., Logothetis, N.K., 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J. Neurosci. 25, 5004–5012.
Giraud, A.L., Kell, C., Thierfelder, C., Sterzer, P., Russ, M.O., Preibisch, C., Kleinschmidt, A., 2004. Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cereb. Cortex 14, 247.
Gow Jr., D.W., Segawa, J.A., 2009. Articulatory mediation of speech perception: a causal analysis of multi-modal imaging data. Cognition 110, 222–236.
Greenwood, D.D., 1990. A cochlear frequency-position function for several species—29 years later. J. Acoust. Soc. Am. 87, 2592–2605.


Hackett, T.A., 2007. Multisensory convergence in auditory cortex, II. Thalamocortical connections of the caudal superior temporal plane. J. Comp. Neurol. 502, 924–952.
Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Res. 817, 45–58.
Hackett, T.A., Preuss, T.M., Kaas, J.H., 2001. Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. J. Comp. Neurol. 441, 197–222.
Hall, D.A., Haggard, M.P., Akeroyd, M.A., Palmer, A.R., Summerfield, A.Q., Elliott, M.R., Gurney, E.M., Bowtell, R.W., 1999. "Sparse" temporal sampling in auditory fMRI. Hum. Brain Mapp. 7, 213–223.
Heinze, H.J., Mangun, G.R., Burchert, W., Hinrichs, H., Scholz, M., Münte, T.F., Gös, A., Scherg, M., Johannes, S., Hundeshagen, H., Gazzaniga, M.S., Hillyard, S.A., 1994. Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature 372, 543–546.
Hervais-Adelman, A., Davis, M.H., Johnsrude, I.S., Carlyon, R.P., 2008. Perceptual learning of noise vocoded words: effects of feedback and lexicality. J. Exp. Psychol. 34, 460–474.
Hervais-Adelman, A., Carlyon, R.P., Johnsrude, I.S., Davis, M.H., 2010. Motor regions are recruited for effortful comprehension of noise-vocoded words: evidence from fMRI.
Hesselmann, G., Sadaghiani, S., Friston, K.J., Kleinschmidt, A., 2010. Predictive coding or evidence accumulation? False inference and neuronal fluctuations. PLoS One 5, e9926.
Iacoboni, M., 2008. The role of premotor cortex in speech perception: evidence from fMRI and rTMS. J. Physiol. Paris 102, 31–34.
Johnson, J.A., Zatorre, R.J., 2005. Attention to simultaneous unrelated auditory and visual events: behavioral and neural correlates. Cereb. Cortex 15, 1609–1620.
Johnson, J.A., Zatorre, R.J., 2006. Neural substrates for dividing and focusing attention between simultaneous auditory and visual events. Neuroimage 31, 1673–1681.
Kaas, J.H., Hackett, T.A., 2000. Subdivisions of auditory cortex and processing streams in primates. Proc. Natl. Acad. Sci. U. S. A. 97, 11793.
Kalikow, D.N., Stevens, K.N., Elliott, L.L., 1977. Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J. Acoust. Soc. Am. 61, 1337–1351.
Kayser, C., Petkov, C., Augath, M., Logothetis, N.K., 2005. Integration of touch and sound in auditory cortex. Neuron 48, 373–384.
Kayser, C., Petkov, C., Augath, M., Logothetis, N.K., 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. J. Neurosci. 27, 1824–1835.
Koch, M.A., Norris, D.G., Hund-Georgiadis, M., 2002. An investigation of functional and anatomical connectivity using magnetic resonance imaging. Neuroimage 16, 241–250.
Lakatos, P., Chen, C.-M., O'Connell, M.N., Mills, A., Schroeder, C.E., 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53, 279–292.
Lehmann, C., Herdener, M., Esposito, F., Hubl, D., di Salle, F., Scheffler, K., Bach, D.R., Federspiel, A., Kretz, R., Dierks, T., Seifritz, E., 2006. Differential patterns of multisensory interactions in core and belt areas of human auditory cortex. Neuroimage 31, 294–300.
Loftus, G., Masson, M., 1994. Using confidence intervals in within-subject designs. Psychon. Bull. Rev. 1, 476–490.
Macey, P.M., Macey, K.E., Kumar, R., Harper, R.M., 2004. A method for removal of global effects from fMRI time series. Neuroimage 22, 360–366.
Macleod, A., Summerfield, Q., 1987. Quantifying the contribution of vision to speech perception in noise. Br. J. Audiol. 21, 131–141.
Martuzzi, R., Murray, M.M., Michel, C.M., Thiran, J.-P., Maeder, P.P., Clarke, S., Meuli, R.A., 2007. Multisensory interactions within human primary cortices revealed by BOLD dynamics. Cereb. Cortex 17, 1672–1679.
Miller, G.A., Isard, S., 1963. Some perceptual consequences of linguistic rules. J. Verbal Learn. Verbal Behav. 2, 217–228.
Miller, G.A., Heise, G.A., Lichten, W., 1951. The intelligibility of speech as a function of the context of the test materials. J. Exp. Psychol. 41, 329–335.
Morosan, P., Rademacher, J., Schleicher, A., Amunts, K., Schormann, T., Zilles, K., 2001. Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. Neuroimage 13, 684–701.
Mumford, D., 1992. On the computational architecture of the neocortex. Biol. Cybern. 66, 241–251.
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., Woods, D.L., 2002. Shape perception reduces activity in human primary visual cortex. Proc. Natl. Acad. Sci. U. S. A. 99, 15164.
Murray, S.O., Schrater, P., Kersten, D., 2004. Perceptual grouping and the interactions between visual cortical areas. Neural Netw. 17, 695–705.
Norris, D., 1994. Shortlist: a connectionist model of continuous speech recognition. Cognition 52, 189–234.
Norris, D., McQueen, J.M., 2008. Shortlist B: a Bayesian model of continuous speech recognition. Psychol. Rev. 115, 357–395.
Norris, D., McQueen, J.M., Cutler, A., 2000. Merging information in speech recognition: feedback is never necessary. Behav. Brain Sci. 23, 299–325.
Nygaard, L., Pisoni, D., 1998. Talker-specific learning in speech perception. Percept. Psychophys. 60, 355–376.
Orden, G.C., 1987. A ROWS is a ROSE: spelling, sound, and reading. Mem. Cogn. 15, 181–198.
Osnes, B., Hugdahl, K., Specht, K., 2011. Effective connectivity analysis demonstrates involvement of premotor cortex during speech perception. Neuroimage 54, 2437–2445.
Pandya, D.N., Hallett, M., Mukherjee, S.K., 1969. Intra- and interhemispheric connections of the neocortical auditory system in the rhesus monkey. Brain Res. 14, 49–65.
Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I.P., Möttönen, R., Tarkiainen, A., Sams, M., 2005. Primary auditory cortex activation by visual speech: an fMRI study at 3 T. Neuroreport 16, 125.
Penhune, V.B., Zatorre, R.J., MacDonald, J.D., Evans, A.C., 1996. Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb. Cortex 6, 661.
Pestilli, F., Carrasco, M., 2005. Attention enhances contrast sensitivity at cued and impairs it at uncued locations. Vis. Res. 45, 1867–1875.
Petkov, C.I., Kang, X., Alho, K., Bertrand, O., Yund, E.W., Woods, D.L., 2004. Attentional modulation of human auditory cortex. Nat. Neurosci. 7, 658–663.
Pugh, K.R., Shaywitz, B.A., Shaywitz, S.E., Constable, R.T., Skudlarski, P., Fulbright, R.K., Bronen, R.A., Shankweiler, D.P., Katz, L., Fletcher, J.M., Gore, J.C., 1996. Cerebral organization of component processes in reading. Brain 119, 1221–1238.
Rao, R.P.N., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79.
Rastle, K., Brysbaert, M., 2006. Masked phonological priming effects in English: are they real? Do they matter? Cogn. Psychol. 53, 97–145.
Ratcliff, G., Dila, C., Taylor, L., Milner, B., 1980. The morphological asymmetry of the hemispheres and cerebral dominance for speech: a possible relationship. Brain Lang. 11, 87–98.
Rissman, J., Gazzaley, A., D'Esposito, M., 2004. Measuring functional connectivity during distinct stages of a cognitive task. Neuroimage 23, 752–763.
Rockland, K.S., Ojima, H., 2003. Multisensory convergence in calcarine visual areas in macaque monkey. Int. J. Psychophysiol. 50, 19–26.
Rodd, J.M., Davis, M.H., Johnsrude, I.S., 2005. The neural mechanisms of speech comprehension: fMRI studies of semantic ambiguity. Cereb. Cortex 15, 1261.
Romanski, L.M., Bates, J.F., Goldman-Rakic, P.S., 1999a. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J. Comp. Neurol. 403, 141–157.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S., Rauschecker, J.P., 1999b. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2, 1131–1136.
Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O.V., Lu, S.-T., Simola, J., 1991. Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neurosci. Lett. 127, 141–145.
Schroeder, C.E., Foxe, J.J., 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Cogn. Brain Res. 14, 187–198.
Schroeder, C.E., Lindsley, R.W., Specht, C., Marcovici, A., Smiley, J.F., Javitt, D.C., 2001. Somatosensory input to auditory association cortex in the macaque monkey. J. Neurophysiol. 85, 1322–1327.
Shannon, R.V., Zeng, F.G., Kamath, V., Wygonski, J., Ekelid, M., 1995. Speech recognition with primarily temporal cues. Science 270, 303–304.
Shattuck, D.W., Mirza, M., Adisetiyo, V., Hojatkashani, C., Salamon, G., Narr, K.L., Poldrack, R.A., Bilder, R.M., Toga, A.W., 2008. Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage 39, 1064–1080.
Smiley, J.F., Falchier, A., 2009. Multisensory connections of monkey auditory cerebral cortex. Hear. Res. 258, 37–46.
Sumby, W.H., Pollack, I., 1954. Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am. 26, 212–215.
Summerfield, C., Koechlin, E., 2008. A neural representation of prior information during perceptual inference. Neuron 59, 336–347.
Tahmasebi, A., Abolmaesumi, P., Geng, X., Morosan, P., Amunts, K., Christensen, G., Johnsrude, I., 2009. A new approach for creating customizable cytoarchitectonic probabilistic maps without a template. Med. Image Comput. Comput.-Assist. Interv. MICCAI, 795–802.


Thompson-Schill, S.L., D'Esposito, M., Aguirre, G.K., Farah, M.J., 1997. Role of left inferior prefrontal cortex in retrieval of semantic knowledge: a reevaluation. Proc. Natl. Acad. Sci. U. S. A. 94, 14792.
Treue, S., 2004. Perceptual enhancement of contrast by attention. Trends Cogn. Sci. 8, 435–437.
van Atteveldt, N., Formisano, E., Goebel, R., Blomert, L., 2004. Integration of letters and speech sounds in the human brain. Neuron 43, 271–282.
van Atteveldt, N., Formisano, E., Blomert, L., Goebel, R., 2007. The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cereb. Cortex 17, 962–974.
van Wassenhove, V., Grant, K.W., Poeppel, D., 2005. Visual speech speeds up the neural processing of auditory speech. Proc. Natl. Acad. Sci. U. S. A. 102, 1181–1186.
van Wassenhove, V., Grant, K.W., Poeppel, D., 2007. Temporal window of integration in auditory–visual speech perception. Neuropsychologia 45, 598–607.
Vinckier, F., Dehaene, S., Jobert, A., Dubus, J., Sigman, M., Cohen, L., 2007. Hierarchical coding of letter strings in the ventral stream: dissecting the inner organization of the visual word-form system. Neuron 55, 143–156.
Wagner, A.D., Paré-Blagoev, E.J., Clark, J., Poldrack, R.A., 2001. Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron 31, 329–338.
Werner-Reiss, U., Kelly, K.A., Trause, A.S., Underhill, A.M., Groh, J.M., 2003. Eye position affects activity in primary auditory cortex of primates. Curr. Biol. 13, 554–562.
Wilson, S.M., Iacoboni, M., 2006. Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. Neuroimage 33, 316–325.
Wilson, S.M., Saygin, A.P., Sereno, M.I., Iacoboni, M., 2004. Listening to speech activates motor areas involved in speech production. Nat. Neurosci. 7, 701–702.
Winer, J.A., 2005. Decoding the auditory corticofugal systems. Hear. Res. 207, 1–9.
Zatorre, R.J., Belin, P., Penhune, V.B., 2002. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46.
Zekveld, A.A., Kramer, S.E., Kessens, J.M., Vlaming, M.S.M.G., Houtgast, T., 2008. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise. Ear Hear. 29, 838–852.