INVESTIGATING THE NEURAL CORRELATES OF CROSSMODAL FACILITATION AS A RESULT OF ATTENTIONAL CUEING: AN EVENT-RELATED fMRI STUDY

by

Zainab Fatima

A thesis submitted in conformity with the requirements for the degree of Master of Science

Institute of Medical Science
University of Toronto

© Copyright by Zainab Fatima, 2008

Investigating the Neural Correlates of Crossmodal Facilitation as a Result of Attentional Cueing: an Event-Related fMRI Study

Zainab Fatima

Master of Science

Institute of Medical Science
University of Toronto

2008

Abstract

Attentional cueing modulated neural processes differently depending on input modality. I used event-related fMRI to investigate how auditory and visual cues affected reaction times to auditory and visual targets. Behavioural results showed that responses were faster when cues preceded targets and when cues were auditory rather than visual. The first result was supported by an increase in BOLD percent signal change in sensory cortices upon cue but not target presentation. Task-related activation patterns showed that the auditory cue activated auditory and visual cortices while the visual cue activated the visual cortices and the fronto-polar cortex. Next, I computed brain-behaviour correlations for both cue types, which revealed that the auditory cue recruited medial visual areas and a fronto-parietal attentional network to mediate behaviour while the visual cue engaged a posterior network composed of lateral visual areas and subcortical structures. The results suggest that crossmodal facilitation occurs via independent neural pathways depending on cue modality.

Acknowledgments

I am still in awe of the fact that I have now officially completed my Master's degree. This is partly because writing the thesis appeared to be such an arduous task at the beginning. Nevertheless, with time and patience, the page numbers increased and the quality of the writing improved. In retrospect, I realize that the one person who constantly supported me and pushed me to produce the best possible work is my supervisor – Dr. Anthony Randal McIntosh. His relentless critiques of my thesis and constant insistence on getting tasks completed have brought me to this pinnacle in my life. I am deeply grateful to him. I admire his strength of character and his ability to always be innovative in the light of academic adversity. He regards criticisms of his work as challenges and is truly an inspiration when it comes to scientific knowledge. If my scientific career is even an inkling of what Randy's is, I will be quite satisfied with my progress. So, thank you Randy for being the person that you are and for keeping me motivated throughout this whole process.

I would like to thank past and present members of the McIntosh lab who have provided valuable feedback in various forms during the preparation of my graduate work. These people include Maria Tassopoulos, Jordan Poppenk, Grigori Yourganov, Tanya Brown, Roxane Itier, Vasily Vakorin, Diana Khrapatch, Anjali Raja, Antonio Vallesi, Wilkin Chau, Michele Korostil, Signe Bray, Jeremy Caplan and Mackenzie Glaholt. I would like to especially thank Natasa Kovacevic – for her insights with regards to my experiment, Andreea Diaconescu and Bratislav Misic – for allowing me to vent and fret at any given time during my writing episodes, Andrea Protzner – for constantly keeping me caffeinated, Sandra Moses – for her gentle way of conveying criticisms, and Hana Burian – for the sushi runs and bear hugs.

I would like to thank my committee members – Drs. Adam Anderson and Claude Alain – for their support and for providing me with thesis-related feedback so promptly. I would also like to thank Karen Davis, graduate coordinator at the Institute of Medical Science (IMS), for allowing me to defend my thesis within such stringent time constraints.

Last but not least, I would like to thank my parents and brother for their faith in my ability to accomplish any goal that I have set for myself and for taking care of me. I would like to thank my husband and best friend, Ali – without his foot-rubs, back massages and constant coaxing, I would be far from completing any graduate work. And finally, I would like to thank my baby girl who has so patiently stayed in my tummy till I have completed my academic responsibilities. This is one journey we've already shared and I can't wait to meet you, my darling.

Table of Contents

Abstract .............................................................................................................................. ii
Acknowledgments ............................................................................................................. iii
Table of Contents ............................................................................................................... v
List of Tables .................................................................................................................. viii
List of Figures ................................................................................................................... ix
List of Appendices ............................................................................................................. x
List of Abbreviations ........................................................................................................ xi
Chapter 1: Literature Review ............................................................................................. 1
  1.1 Overview ..................................................................................................................... 1
  1.2 Classifying Different Attentional Mechanisms ........................................................... 1
    1.2.1 Orienting ................................................................................................................ 1
    1.2.2 Endogenous vs. Exogenous Shifts ......................................................................... 2
  1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms .................................... 4
  1.4 Crossmodal Asymmetry Reported in Cognitive, Physiological, and Developmental Studies ... 7
    1.4.1 Auditory to Visual (A-V) Interactions ................................................................... 8
    1.4.2 Visual to Auditory (V-A) Interactions ................................................................. 10
  1.5 General Anatomy of Central Auditory and Visual Pathways ................................... 11
    1.5.1 The Auditory Pathway ......................................................................................... 11
    1.5.2 The Visual Pathway ............................................................................................. 13
  1.6 From Anatomy to Function - A Dynamic Systems' Perspective ............................... 14
  1.7 An Overview of fMRI ............................................................................................... 16
    1.7.1 Basic MRI Physics .............................................................................................. 16
    1.7.2 Physiological Basis of BOLD fMRI ................................................................... 18
    1.7.3 Coupling of Neuronal Activity & BOLD ........................................................... 19
Chapter 2: Aims and Hypotheses ..................................................................................... 22
Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices ... 24
  3.1 Introduction ............................................................................................................... 24
  3.2 Materials and Methods .............................................................................................. 26
    3.2.1 Participants .......................................................................................................... 26
    3.2.2 Stimuli ................................................................................................................. 27
    3.2.3 Apparatus ............................................................................................................ 27
    3.2.4 Procedure ............................................................................................................ 28
      3.2.4.1 Trial Structure ................................................................................................ 28
      3.2.4.2 Task Types ..................................................................................................... 28
      3.2.4.3 fMRI Session ................................................................................................. 29
    3.2.5 fMRI Scanning Parameters ................................................................................. 30
    3.2.6 Data Analysis ...................................................................................................... 30
      3.2.6.1 Pre-processing Pipeline ................................................................................. 30
      3.2.6.2 Statistical Analysis ........................................................................................ 32
  3.3 Results ....................................................................................................................... 34
    3.3.1 Behavioural Performance ................................................................................... 34
    3.3.2 fMRI Results ....................................................................................................... 35
  3.4 Discussion ................................................................................................................. 36
Tables ............................................................................................................................... 40
Figures .............................................................................................................................. 46
Chapter 4: The Interplay of Cue Modality and Response Latency in Neural Networks Supporting Crossmodal Facilitation ... 60
  4.1 Introduction ............................................................................................................... 60
  4.2 Methods ..................................................................................................................... 62
    4.2.1 Data Analysis ...................................................................................................... 62
  4.3 Results ....................................................................................................................... 63
    4.3.1 Behavioural Performance ................................................................................... 63
    4.3.2 fMRI Results ....................................................................................................... 63
  4.4 Discussion ................................................................................................................. 64
Tables ............................................................................................................................... 68
Figures .............................................................................................................................. 75
Chapter 5: General Discussion ........................................................................................ 87
  5.1 A Convergent Model of Audio-Visual Interactions .................................................. 90
  5.2 Dynamic Processing in Sensory-specific Cortices .................................................... 91
  5.3 Limitations ................................................................................................................ 92
  5.4 Future Directions ...................................................................................................... 93
References ........................................................................................................................ 95
Appendices ..................................................................................................................... 115

List of Tables

Table 3.1: Mean Reaction Times by Condition ............................................................... 40
Table 3.2: Local Maxima from AC-VT: AT-VC Task ST-PLS ....................................... 41
Table 3.3: Local Maxima from VC-AT: VT-AC Task ST-PLS ....................................... 43
Table 4.1: Local Maxima from AC-VT: AT-VC Behavioural ST-PLS ........................... 68
Table 4.2: Local Maxima from VC-AT: VT-AC Behavioural ST-PLS ........................... 71

List of Figures

Figure 3.1. Experimental design schematic for auditory cue-visual target tasks ............. 46
Figure 3.2. Experimental design schematic for visual cue-auditory target tasks ............. 48
Figure 3.3. Behaviour measures for all four experimental tasks ...................................... 50
Figure 3.4. BOLD HRFs for cue and target – auditory modality ..................................... 52
Figure 3.5. Singular image and design scores differentiating cue from target for auditory modality ... 54
Figure 3.6. BOLD HRFs for cue and target – visual modality ......................................... 56
Figure 3.7. Singular image and design scores differentiating cue from target for visual modality ... 58
Figure 4.1. Correlation profiles for the auditory cue ....................................................... 75
Figure 4.2. Brain scores plotted by participants for the auditory cue .............................. 77
Figure 4.3. Singular images of brain areas that facilitate reaction time for the auditory cue ... 79
Figure 4.4. Correlation profiles for the visual cue ........................................................... 81
Figure 4.5. Brain scores plotted by participants for the visual cue .................................. 83
Figure 4.6. Singular images of brain regions that facilitate reaction time for the visual cue ... 85

List of Appendices

Appendix A. fMRI Screening Form ............................................................................... 115
Appendix B. MRI Screening Form ................................................................................ 119
Appendix C. Information and Consent Form ................................................................. 120

List of Abbreviations

Abbreviation   Region
Cu             cuneus
Cl             claustrum
Ga             angular gyrus
GC             cingulate gyrus
GF             fusiform gyrus
GFd            medial frontal gyrus
GFi            inferior frontal gyrus
GFm            middle frontal gyrus
GFs            superior frontal gyrus
Gh             parahippocampal gyrus
GL             lingual gyrus
GOi            inferior occipital gyrus
GOm            middle occipital gyrus
GOs            superior occipital gyrus
GPoC           postcentral gyrus (sensory cortex)
GPrC           precentral gyrus (motor cortex)
GTi            inferior temporal gyrus
GTm            middle temporal gyrus
GTs            superior temporal gyrus
Gsm            supramarginal gyrus
INS            insula
LPc            paracentral lobule
LPi            inferior parietal lobe
LPs            superior parietal lobe
Pcu            precuneus
Th             thalamus

Note: Region abbreviations are defined by reference to Talairach and Tournoux (1988). Other abbreviations used in tables include L and R, which stand for left and right respectively; Ant refers to anterior and Post refers to posterior.

Chapter 1: Literature Review

1.1 Overview

The past few decades have marked a dramatic change in the realm of cognitive neuroscience. Neuroimaging techniques such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) have provided scientists with tools to examine brain function across spatial and temporal domains. Prior to the use of functional neuroimaging, attentional theories about the brain were based primarily on animal work or observations of deficits in patients with brain damage or disease. These theories of attention are now informed both by animal work and human neuroimaging studies; researchers are actively trying to bridge the gap between brain function and attentional models.

This section will begin with an overview of some common attentional mechanisms and how these processes can form the basis of behavioural crossmodal facilitation. Crossmodal facilitation occurs when a cue in one sensory modality, for example auditory, speeds responses to a target in a different sensory modality, such as vision. The motivation for the current experiment is to study the effects of crossmodal facilitation, in audition and vision, using the fMRI technique.

A survey of the behavioural literature on crossmodal facilitation will be followed by an outline of the neuroanatomy that may support audio-visual processes in the brain. The implications of neuroanatomical studies for the organization of brain function will then be described, followed by a brief review of the fMRI technique.

1.2 Classifying Different Attentional Mechanisms

1.2.1 Orienting

Experiments on selective attention have shown that when participants are provided with a cue for the location of an upcoming event, they direct their attention to the event (Posner, 1980; Posner, Inhoff, Friedrich, & Cohen, 1987). Posner developed an attentional cueing paradigm to study such visual shifts of attention. The paradigm consisted of a central cross and two peripheral boxes. The central cross was replaced by a plus sign or an arrow that pointed in the direction of one of the boxes. The plus sign indicated that there was an equiprobable chance of a target appearing in either of the two boxes, while the direction of the arrow correctly identified the location of the target in eighty percent of the trials (valid trials) and did not cue the correct target location in twenty percent of the trials (invalid trials). Responses were faster on valid trials compared to invalid trials. Posner (1980) postulated that facilitated responses in valid conditions occurred as a result of attentional engagement at the target location.
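
To make the logic of the validity effect concrete, the following minimal Python sketch simulates Posner-style trials. The 80/20 validity split follows the paradigm described above; the reaction-time means, the 40 ms reorienting cost, and the function names are illustrative assumptions rather than values reported in this thesis.

    import random

    def run_trial(rng):
        """Simulate one cueing trial: the target appears at the cued
        location on 80% of trials (valid) and elsewhere on 20% (invalid)."""
        valid = rng.random() < 0.8
        # Assumed RT model: a 300 ms baseline plus a ~40 ms cost when
        # attention must be disengaged and re-oriented on invalid trials.
        rt = rng.gauss(300 if valid else 340, 30)
        return ("valid" if valid else "invalid"), rt

    rng = random.Random(0)
    trials = [run_trial(rng) for _ in range(10_000)]

    def mean_rt(condition):
        rts = [rt for cond, rt in trials if cond == condition]
        return sum(rts) / len(rts)

    # The cue validity effect: invalid minus valid mean reaction time.
    print(f"validity effect: {mean_rt('invalid') - mean_rt('valid'):.1f} ms")

Running this prints a validity effect close to the 40 ms cost built into the toy model, mirroring the valid-faster-than-invalid pattern described above.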

Subsequently, Posner and Petersen (1990) proposed a model of selective attention based on neuropsychological studies in which three loci in the brain – the parietal lobe, the superior colliculus and the thalamus – seemed to contribute to different aspects of attentional shifts. Individuals with unilateral parietal lobe damage showed normal responses when cues and targets were presented on the side that was contralateral to their lesion, but their performance was impaired when cues appeared on the ipsilesional side and the target appeared on the contralesional side (Posner, Walker, Friedrich, & Rafal, 1984). Patients with degenerated superior colliculi (structures involved in eye movement coordination), such as those suffering from supranuclear palsy, were slow to move their attention from one location to the next (Rafal et al., 1988). Lastly, patients with thalamic damage responded poorly at spatial locations that were contralateral to their lesion irrespective of whether trials were valid or invalid (Rafal & Posner, 1987). In summarizing the neuropsychological work that supported the selective attention model, it can be stated that the parietal lobe was critical in the disengagement of attention, the superior colliculus was implicated in moving attention from one point in space to another, and the thalamus was essential for re-engaging attention at a particular location.

1.2.2 Endogenous vs. Exogenous Shifts

The model of attentional cueing postulated by Posner was based on experiments in which cues were centrally presented (at fixation) and targets occurred in the periphery (on either side of fixation). Experimenters have subsequently claimed that the location of cue presentation, central or peripheral, can tap into two different attentional mechanisms – endogenous and exogenous (see Ruz & Lupianez, 2002 for a review). Endogenous shifts of attention occur following an instruction to attend to a target location, where participants have to use their own volition to orient attention. In contrast, exogenous shifts of attention are more reflexive in that participants automatically orient to the stimulus that is presented at a target location (see Gazzaniga, Ivry, & Mangun, 2002 for review). When cues are presented centrally, participants have to move their attention according to the instructional content of the cue (Ruz & Lupianez, 2002). On the other hand, peripheral cues are able to capture attention exogenously at a particular location without any instructional content (Ruz & Lupianez, 2002). Therefore, informative central and peripheral cues have been used to study endogenous shifts of attention while spatially non-predictive peripheral cues have been used to study exogenous shifts of attention (Chica, Sanabria, Lupianez, & Spence, 2007).

Researchers who have attempted to separate the effects of endogenous and exogenous shifts of attention have found that endogenous cueing requires a longer delay interval between cue and target presentation (Eimer, 2000; Müller & Findlay, 1988). These studies imply that different attentional mechanisms may be mediating cue processing. Corbetta and Shulman (2002) have claimed that endogenous versus exogenous shifts of attention may invoke different behavioural patterns and neural activations. In a meta-analysis of studies conducted on shifts of attention, Corbetta and Shulman (2002) ascribed voluntary (endogenous) attention to a network of areas that includes the dorsal posterior parietal and frontal cortex, with transient activity in occipital areas. The functional attributes of this fronto-parietal network include the integration of prior experience with expectations and goals to produce volitional shifts in attention. The temporo-parietal cortex and ventral frontal regions comprise a separate network that has been implicated in reflexive (exogenous) attentional shifts. This reflexive network is thought to focus attention on salient events in the environment.

A similar but less specific distinction in attentional processes was suggested earlier by Posner (1992) and Posner and Petersen (1990). The authors claimed that an anterior attentional network consisting of frontal areas and the anterior cingulate was responsible for extracting the relevance of a selected item, while a posterior attentional network involving parietal areas selected information based on sensory attributes. Posner argued that this anterior-posterior attention system functioned to coordinate activity at a supramodal (modality-independent) level and was able to regulate data processing that was specific to particular cognitive tasks.

While Corbetta and Shulman (2002) and, to some extent, Posner (1992) have attempted to categorize exogenous and endogenous attentional shifts into distinct processing streams, a study by Rosen and colleagues (1999) suggested a fair bit of overlap in brain activations in response to voluntary and reflexive attention. In both endogenous and exogenous cueing tasks, activations were seen in the dorsal premotor region, the frontal eye fields and the superior parietal cortex. Evidence to date remains inconclusive about the exact nature of interactions between endogenous and exogenous shifts of attention at the neural level.

The mechanisms of attention discussed in this section are not directly explored in the current experiment but provide the context for understanding the experiments on crossmodal cueing outlined below.

1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms

The research mentioned thus far has focused on visual spatial orienting, while experiments conducted in the 1960s also found attentional modulations of response times to cross-modal (auditory, visual) stimuli (Bertelson & Tisseyre, 1969; Davis & Green, 1969). In one experiment, Bertelson and Tisseyre found that an auditory click decreased reaction time to a subsequent visual flash while a visual flash did not have the same effect on an auditory click. A more specific investigation into crossmodal cueing was conducted by Buchtel and Butter (1988), who used lateralized auditory and visual stimuli. These stimuli were spatially significant (spatially neutral conditions were used as controls), in comparison to Bertelson and Tisseyre's non-spatial stimuli (centered flash, binaural click: neutral spatial significance). Two cross-modal cueing cases were devised, auditory to visual (A-V) and visual to auditory (V-A), as well as the intra-modal cases, visual to visual (V-V) and auditory to auditory (A-A). Previous findings were reinforced with respect to reaction time: cross-modal cueing led to faster responses compared to intra-modal cueing, and auditory cues were more effective in facilitating responses than visual cues. In the cross-modal conditions, visual target stimuli seemed much easier to cue than auditory target stimuli, regardless of the cue modality. In fact, Buchtel and Butter reported virtually no cueing effect for auditory target stimuli.

Farah, Wong, Monheit, and Morrow (1989) performed a cross-modal cueing study on patients with lesions circumscribed unilaterally to the parietal lobe to try to understand the nature of crossmodal cueing effects in the brain. Prior to this study, Posner and colleagues (1984) had shown that patients with unilateral parietal lobe lesions had difficulty disengaging attention in a standard visual spatial cueing paradigm. Farah and colleagues (1989) extended Posner's work and examined A-V and V-V conditions. In the V-V condition there was a 50 ms reduction in reaction time when the target appeared on the ipsilesional side of space, and a considerable increase in reaction times for targets that appeared on the contralesional side of space. In the A-V condition, response times were faster than in the visual cue condition. Again, there was a large increase in reaction time for targets that appeared contralateral to the parietal lesion. This study was in agreement with its predecessors in a general way: cross-modal A-V cueing produced facilitated reaction times.

Contrary to the observations of these past studies was a paper by Ward published in 1994. Ward (1994) used a crossmodal spatial discrimination task in which participants made speeded left-right responses to visual or auditory targets following the presentation of an auditory or a visual non-predictive cue, both auditory and visual cues, or no cues. Reaction times were measured for all conditions at different inter-stimulus intervals between the cue and target. The results indicated that visual cues facilitated reaction times to auditory targets that were presented on the same side as the cue (compatible) at short inter-stimulus intervals. Auditory cues, in contrast, did not facilitate reaction times to visual targets shown on either side of cue presentation at any inter-stimulus interval. Auditory cues did facilitate reaction times to compatible auditory targets at short inter-stimulus intervals. Ward's findings were directly opposite to those of previous crossmodal studies (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong, Monheit, & Morrow, 1989).

In a subsequent study by Spence and Driver (1997), an orthogonal spatial cueing paradigm (Spence & Driver, 1994) was used to examine crossmodal cueing effects. In the orthogonal spatial cueing experiment, participants had to discriminate the elevation of a target sound rather than its laterality. The targets were preceded by an uninformative cue on the same side in fifty percent of the trials and on the opposite side in the rest of the trials. The results of multiple manipulations of the orthogonal spatial cueing paradigm replicated the finding that auditory cues facilitate reaction times to visual targets. The authors claimed that Ward's contradictory findings could be explained in light of spatial compatibility effects and response priming, since Ward's stimuli were lateralized.

Ward and his collaborators have subsequently demonstrated that the cross-modal asymmetry in favour of visual cueing still holds when all methodological confounds are removed (McDonald & Ward, 1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin, 2000). Ward, McDonald and Golestani (1998) showed that when cues and targets were presented from exactly the same lateral eccentricity, responses to visual cues were still faster, ruling out Spence and Driver's criticism that Ward's initial findings rested on spatial compatibility effects arising from cue-target lateralization differences. In another study by Ward, McDonald and Lin (2000), an implicit spatial discrimination paradigm was used to study cross-modal asymmetry, and visual cue superiority was once again found, eliminating response priming as an explanation of Ward's early findings (1994).

Mondor and Amirault (1998) reacted to discrepancies in the direction of crossmodal cueing (auditorily or visually driven effects) by conducting a comprehensive investigation into the impact of experimental set-up on cueing effects. The investigators manipulated cue and target modalities, stimulus onset asynchrony (SOA), and target location given cue location. In their first experiment, they tried to maximize uncertainty for the subject: an auditory or visual spatial cue preceded an auditory or visual lateralized target by either 150 or 300 ms. The results showed that valid trials were faster than invalid trials (also known as the cue validity effect) only for intra-modal tasks (A-A, V-V) and only at an SOA of 150 ms. Cueing for cross-modal cases did not reach significance. These results seemed inconsistent with past findings (Bertelson & Tisseyre, 1969), but Mondor and Amirault (1998) offered a hypothesis that could address the discrepancy. They suspected that endogenous mechanisms were responsible for cross-modal cueing effects. That is, rather than pure perceptual-motor reactivity, it was expectation or anticipation that allowed cross-modal attention to be efficient. In their second experiment, Mondor and Amirault (1998) decreased the uncertainty: seventy-five percent of the cues were valid, while the modality of cue and stimulus was unpredictable and the SOA was a constant 150 ms. A significant cue validity effect for A-V and V-A resulted, even though the A-A and V-V effects were stronger. In a third experiment, Mondor and Amirault (1998) administered the conditions in blocks, thus rendering the modality of cue and target predictable. This increased intra-modality cueing effects, though cross-modal cueing effects were unaffected. Mondor and Amirault (1998) concluded that cross-modal cueing was effective when endogenous mechanisms were at work, and those mechanisms were spurred by predictability (predictability of modality seemed irrelevant) or a lack of uncertainty. The three designs are summarized schematically below.
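
As a compact restatement, the sketch below encodes the three designs as Python records. The field names are my own shorthand, and parameters the text does not state (such as the validity proportion in Experiments 1 and 3, or the SOA in Experiment 3) are left as None rather than guessed.

    # Schematic summary of Mondor and Amirault's (1998) three designs as
    # described above; None marks parameters not stated in the text.
    designs = [
        {"exp": 1, "cue_validity": None, "soa_ms": (150, 300),
         "modality_blocked": False},  # maximal uncertainty
        {"exp": 2, "cue_validity": 0.75, "soa_ms": (150,),
         "modality_blocked": False},  # reduced uncertainty, fixed SOA
        {"exp": 3, "cue_validity": None, "soa_ms": None,
         "modality_blocked": True},   # blocked: modality predictable
    ]

Reading across the records, the cross-modal cue validity effect emerged only once uncertainty was reduced (Experiment 2) and was unchanged by blocking (Experiment 3).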

Although Mondor and Amirault's (1998) third experiment did not report strong crossmodal cueing effects, a subsequent study by Schmitt, Postma and de Haan (2000) found facilitated reaction times to both auditory and visual cues in an experimental set-up where cue and target modalities were fixed within a block. Symmetric audio-visual cueing effects have subsequently been reported in other studies (McDonald & Ward, 1999; 2003). At present, facilitation effects have been shown for both auditory and visual cues, but the exact neural underpinnings of these cueing effects have yet to be isolated.

On a side note, most of the crossmodal cueing experiments mentioned previously manipulated SOA in addition to cue and target modalities (Bertelson & Tisseyre, 1969; Spence & Driver, 1997; Ward, 1994) to try to understand the type of attentional mechanisms that may underlie the integration of cue-target information. When Bertelson and Tisseyre (1969) varied stimulus onset asynchrony between 0 and 700 ms and measured the effect of the SOA manipulation on responses to a target, they found that an auditory click decreased reaction time to a subsequent visual flash at the smallest SOA. At larger SOAs (greater than 70 ms), in the case of the visual flash that preceded the auditory click, some facilitation of reaction time was noted, but it was greatly attenuated in comparison to the auditory click – visual flash (A-V) case. The primary purpose of carrying out such behavioural experiments was to determine whether there was an attentional refractory period that followed processing of the first stimulus. The logic was that if attentional mechanisms, which may be governed by higher-order cognitive systems in the brain, were required to process the first stimulus, then this processing would utilize resources that would not be available to process a subsequent stimulus until a certain time had elapsed (Davis, 1959). This elapsed time was known as the attentional refractory period. However, the experiments by Bertelson and Tisseyre found that there was no attentional refractory period if the two stimuli were presented in two different sensory modalities (auditory and visual). When stimuli were presented in the same modality (visual-visual: V-V), the facilitation was very weakly represented in the behavioural data. By using a substantially long SOA, both auditory and visual cueing effects can potentially be delineated.
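
To illustrate how such an SOA manipulation is typically scored, the short Python sketch below computes facilitation (uncued minus cued reaction time) for each cue-target pairing at each SOA. All reaction-time values are invented placeholders; only the arithmetic reflects the kind of analysis described above.

    # Hypothetical mean RTs (ms); the numbers are placeholders chosen only
    # to mimic the A-V > V-A asymmetry described in the text.
    rt_ms = {
        ("A-V", "cued", 0): 285, ("A-V", "uncued", 0): 320,
        ("A-V", "cued", 300): 300, ("A-V", "uncued", 300): 318,
        ("V-A", "cued", 0): 315, ("V-A", "uncued", 0): 322,
        ("V-A", "cued", 300): 312, ("V-A", "uncued", 300): 320,
    }

    for pairing in ("A-V", "V-A"):
        for soa in (0, 300):
            # Facilitation: how much faster responses are when the target
            # is preceded by a cross-modal cue at this SOA.
            gain = rt_ms[(pairing, "uncued", soa)] - rt_ms[(pairing, "cued", soa)]
            print(f"{pairing}, SOA {soa:>3} ms: facilitation = {gain} ms")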

1.4 Crossmodal Asymmetry Reported in Cognitive, Physiological, and Developmental Studies

The majority of behavioural studies conducted on some aspect of crossmodal cueing allude to the idea that there may be cognitive, physiological and developmental bases for auditory and visual interactions. Some of these studies will be described next.

1.4.1 Auditory to Visual (A-V) Interactions

Investigators examining multisensory interactions have found that a sudden sound can enhance the detection of a subsequent flash of light in the same location (McDonald, Teder-Salejarvi, & Hillyard, 2000). Abrupt sounds synchronized with visual search arrays can improve the identification of visual targets embedded in a series of distracters (Vroomen & de Gelder, 2000). These studies suggest that auditory events can influence visual processing.

According to Neumann, Van der Heijden, and Allport (1986), shifting visual attention to auditory events has evolutionary significance. They claim that auditory events that occur distally can be registered in the brain, resulting in appropriate action, whereas by the time visual events come into view proximally, a response may no longer be viable. Since auditory events in the world are transient and intermittent relative to visual events, which appear continuous in time, it is more beneficial to have an asymmetry in audio-visual cueing. Building on the ideas of Neumann and colleagues, some researchers (Brown & May, 1989; Harrison & Irving, 1966; Spence & Driver, 1997) have suggested that the primary function of sound localization in animals is to direct the eyes towards auditory events. In this way, sound localization may be necessary for the control of orienting towards significant distal events which occur outside an animal's field of view.

Visual orienting with respect to sound localization has been explored in physiological studies involving the superior colliculus (King, 1993; Stein & Meredith, 1993; Stein, Wallace, & Meredith, 1995). In order to understand how the superior colliculus may contribute to the integration of audio-visual stimuli, it is imperative to consider the anatomical organization of this structure. The first three layers of the superior colliculus are purely visual and organized spatiotopically (de Monasterio, 1978a, 1978b; Dubois & Cohen, 2002; Leventhal et al., 1981; Perry & Cowey, 1984; Rodieck & Watanabe, 1993; Schiller & Malpeli, 1977; Sparks, 1988; Sparks & Hartwich-Young, 1989). There is no influence of audition on these first three superficial layers (King, 1993). The three layers beneath the visual layers are often referred to as the deep layers of the superior colliculus. These layers are considered polymodal because they receive ascending inputs from brainstem nuclei, such as the inferior colliculus and the trigeminal nucleus, that represent modalities such as audition, vision and touch (Wickelgren, 1971). Descending inputs from the temporal cortex and postcentral gyrus also culminate in these deep layers (Wickelgren, 1971). Cells within the deep layers that receive afferents from auditory and visual pathways and multimodal cortical sites are known as multisensory integrative cells (MSI, a term coined in Calvert, 2001). These MSI cells are not only capable of generating responses to different modalities but are also able to transform separate sensory inputs into an integrated product. In conclusion, the anatomy of the superior colliculus makes it an ideal candidate for carrying out multisensory integration; that is, the assimilation of information from multiple sensory modalities into a motor outcome.

According to Wickelgren, the superior colliculus is able to track stimuli, in either vision or audition, that are moving laterally away from an animal to ensure appropriate avoidance or approach behaviour. This structure, in a variety of species ranging from amphibians to primates, is able to produce orienting movements linked to the eyes (saccades), head and body as well as approach, freezing or running responses (Sahibzada, Dean, & Redgrave, 1986; Sparks, 1999; Vargas, Marques, & Schenberg, 2000). Researchers have hypothesized that the superior colliculus may be vital in generating motoric responses to salient sensory events (Sparks, 1999; Stein, 1998).

Other neurophysiological investigations of the functional architecture of the superior colliculus have shown that there is a two-dimensional representation of auditory target location within the deep layers of this structure (King, 1993; Stein & Meredith, 1993). Stein and Meredith (1993), however, are careful to note that the deep layers are multimodal rather than purely auditory, suggesting that any shifts of attention could implicate the involvement of other modalities. King (1993) further argues that there is no spatiotopic map of auditory space elsewhere in the brain; it is therefore unlikely that vision could guide auditory localization. Clarey, Barone and Imig (1992) have found that in structures like the inferior colliculus and the primary auditory cortex there is a tonotopic organization with lateralized auditory receptive fields, but there is no indication of spatial tuning or spatiotopy. Therefore, the only spatiotopic map of auditory space in the brain is the one found in the polymodal deep layers of the superior colliculus.

Developmental studies of the superior colliculus have shown that the representation of auditory space in the deep layers of the superior colliculus can be altered by varying the visual environment (King & Carlile, 1993; King, Hutchings, Moore, & Blakemore, 1988; Withington, 1992). In contrast, manipulations of auditory experience have no influence on the spatiotopically organized superficial layers of the superior colliculus (Knudsen, Esterly, & Knudsen, 1984; Withington-Wray et al., 1990). This asymmetry in the direction of audio-visual interactions is also prominent in studies that have examined the development of the senses in infants. According to Gottlieb (cited in Lewkowicz & Kraebel, 2004), the different sensory modalities develop in the following sequence: tactile, vestibular, chemical, auditory and visual. The first four modalities are functional prenatally while vision develops mostly postnatally. Lewkowicz (1988a, 1988b) claims that the sensory processing hierarchies evident in the developmental profiles of infants may contribute to differences in the degree to which input from a modality is transmitted to different unisensory and multisensory sites in the brain. For example, inputs from auditory cortical neurons could possibly project to areas of the visual cortex that are still underdeveloped at birth; however, neurons within the visual cortex may be unable to innervate the developed neuronal structure of the auditory cortex. Thus, the cellular architecture that supports a particular sensory modality may result from temporal differences in the development of the senses.

1.4.2 Visual to Auditory (V-A) Interactions

Over the years, researchers have reported some effects of vision influencing audition. These studies are not as extensive as the ones mentioned previously for the auditory capture of visual attention but are worth considering.

The famous McGurk effect provides evidence for vision altering speech perception (McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/ when it is coupled with a visual lip movement associated with /ga/. The McGurk effect suggests that sound can be misperceived when it is coupled with different visual lip movements. fMRI studies by Calvert and colleagues (1997, 1999, 2000) have investigated brain activity in relation to incongruent audio-visual linguistic information (similar to the McGurk effect). Calvert et al. (1997) found that lip-reading (visual speech) activated areas of the auditory cortex that were previously considered to be unimodal. In a subsequent study, Calvert and colleagues (1999) showed an enhancement in sensory-specific cortices when participants saw and heard speech. These neuroimaging studies uphold the view that vision can influence audition.

Another well-known effect of vision on audition is the ventriloquist effect, first reported by Howard and Templeton (1966). The ventriloquist's illusion is produced when the perceived location of a sound shifts towards the location of the visual source. Ventriloquists are able to produce speech without visible lip movements while simultaneously moving a puppet. This leads to a conflict between visual and auditory localization that culminates in vision dominating audition. A recent study by Guttman, Gilroy and Blake (2005) has suggested that in audio-visual conflict situations, vision dominates spatial processing while audition directs temporal processing. The effects described in this section suggest that vision can alter audition in very specific cases, such as those of conflicting information from multiple modalities. The influence of vision on audition in terms of spatial localization and cueing remains an area with potential for exploration.

1.5 General Anatomy of Central Auditory and Visual Pathways

An exploration of the characteristics of audio-visual interactions is incomplete without a consideration of the general organization of the auditory and visual pathways in the brain. This section will briefly review some general areas that are involved in the processing of auditory and visual stimuli.

1.5.1 The Auditory Pathway

Sound waves from the environment are captured and focused by the auricle – the external ear – into the auditory canal. The auditory canal is a hollow, air-filled tube that transmits sound waves to three bones located in the middle ear (the malleus, the incus and the stapes). These three bones amplify the auditory signal, enabling it to travel through the fluid-filled inner-ear structure called the cochlea. The cochlea is the primary site of the conversion of sound energy into a neural code. This process is called signal transduction (Noback, 1967). Outer hair cells, also known as auditory sensory receptors, in the cochlea display motility, converting mechanical sound energy into receptor potentials. Subsequently, these outer hair cells transmit receptor potentials to inner hair cells, which provide frequency and intensity information to cochlear ganglion cells (Miller & Towe, 1979). The innervation of hair cells by ganglion cells in the cochlea is the first part of the auditory neural pathway where information is encoded both in terms of the frequency and the intensity of sound. Neurons within the cochlea respond best to stimulation at the characteristic frequencies of contiguous cells. In this way, tonotopy – the organization of tones that share similar frequencies into topologically neighbouring neurons – begins postsynaptic to the inner hair cells (Spoendlin, 1974).

Axons from the cochlear ganglion cells form the cochlear component of the eighth cranial nerve, synapsing in the cochlear nuclear complex at the medulla-pontine junction. The cochlear nucleus sends auditory inputs via three different pathways – the dorsal acoustic stria, the intermediate acoustic stria and the trapezoid body – to the pons. The trapezoid body projects to the superior olivary nuclei, where information from both ears (binaural information) is processed. Localization of sounds in space occurs in the medial and lateral portions of the superior olivary nucleus. Efferents from the superior olivary nucleus and the dorsal and intermediate acoustic striae terminate in the inferior colliculus by way of the lateral lemniscus nuclei. The inferior colliculus in turn sends outputs to the medial geniculate nucleus of the thalamus, which terminate in the primary auditory cortex (area A1, or Brodmann areas 41 and 42; Brodmann, 1909) located on Heschl's gyrus in the superior temporal lobe (adapted from Brodal, 1981; Hudspeth, 2000). The auditory cortex is organized in concentric bands with the primary auditory cortex in the centre and auditory association areas forming the periphery (see Pandya, 1995 for review). The auditory neural pathway maintains its tonotopic organization from the cochlear ganglion cells to the primary auditory cortex.

For decades, it was assumed that the auditory neural pathway processed exclusively acoustic information. In recent years, studies have shown that outputs from midbrain structures such as the inferior colliculus and the medial geniculate nucleus of the thalamus can contain some visual information (Komura et al., 2005; Porter, Metzger, & Groh, 2007). Porter, Metzger and Groh (2007) have demonstrated that the inferior colliculus in monkeys carries visual, specifically saccade-related, information in addition to auditory responses. This study suggests that the inferior colliculus, predominantly considered to be a unisensory structure responsive only to auditory stimuli, may have the capacity to integrate specific types of audio-visual information. In another study, by Komura and colleagues (2005), rats were trained to perform an auditory spatial discrimination task with auditory or auditory-visual cues. The auditory-visual cues were presented in such a manner that the visual cues were either congruent with the auditory cues or provided conflicting information. The results showed that almost fifteen percent of auditory thalamic neurons were modulated by visual cues. Responses in these neurons were enhanced in congruent conditions and suppressed when auditory and visual cues conflicted. The studies by Porter, Metzger and Groh and by Komura et al. imply that the auditory cortex receives some visual input via subcortical structures within the central auditory pathway, though the extent of this input is not clear. Thus, unisensory sites in the brain may have the ability to perform some multisensory functions.

1.5.2 The Visual Pathway

Light entering the eye is detected by the retina, which is composed of photoreceptors (cones and rods) that transduce light into electrical signals that can be decoded by the brain. Information from rods and cones feeds into a network of interneurons composed of horizontal, bipolar and amacrine cells, which in turn project to ganglion cells. Axons from the retinal ganglion cells form the optic nerve (Wurtz & Kandel, 2000a; 2000b). The optic nerve carries information from both eyes until it reaches the optic chiasm, where fibres from the nasal half of each retina cross, so that each visual hemifield is represented in the contralateral hemisphere (Guillery, 1982). From the optic chiasm, visual inputs flow through the optic tracts, segregated by eye (left or right), to the lateral geniculate nucleus of the thalamus. The primate lateral geniculate nucleus contains six layers: layers 1 and 2 are called the magnocellular layers and layers 3-6 are termed the parvocellular layers (Kaas, Guillery, & Allman, 1972; Sherman, 1988). Both magno- and parvo-cellular layers project to the primary visual cortex (area V1 or Brodmann area 17; Brodmann, 1909). V1 sends outputs to V2/V3, from which point the visual information is split into two streams – temporal and parietal (see Desimone & Ungerleider, 1989). The visual association areas include regions such as V4 and V5/MT (the middle temporal area; adapted from Wurtz & Kandel, 2000a; 2000b). The plethora of connections between primary visual areas and visual association cortices is beyond the scope of this discussion but can be reviewed in a paper by Felleman and Van Essen (1991). Despite the multitude of levels in the central visual pathway, information is maintained retinotopically: adjacent areas in the visual field are encoded by slightly different but overlapping neuronal receptive fields.

Apart from the major retinogeniculostriate pathway, the retina also projects to other subcortical areas, such as the superior colliculus (Leventhal, Rodieck, & Dreher, 1985; Magalhaes-Castro, Murata, & Magalhaes-Castro, 1976; Rodieck & Watanabe, 1993; Wassle & Illing, 1980) and the pulvinar nucleus of the thalamus (Grieve, Acuna, & Cudeiro, 2000; Itoh, Mizuno, & Kudo, 1983; Mizuno et al., 1982; Nakagawa & Tanaka, 1984), as seen in primates and cats. The superior colliculus and the pulvinar nucleus are also extensively interconnected (Grieve, Acuna, & Cudeiro, 2000; Stepniewska, Qi, & Kaas, 2000). The anatomical organization of both these subcortical structures and their interactions with auditory and visual pathways places them in a unique position to integrate multisensory information.

1.6 From Anatomy to Function - A Dynamic Systems' Perspective

As discussed in previous sections, the anatomical architecture of subcortical structures in the auditory and visual pathways supports both unisensory and multisensory processing. At the cortical level, Felleman and Van Essen (1991) have documented extensive feedforward, feedback and horizontal connections between visual and multisensory brain areas. Connections between unimodal auditory cortex and primary visual areas have been mapped in primates by Rockland and Ojima (2003) and Falchier et al. (2002). Cognitive neuroscience studies have also successfully identified multisensory sites that include cortical regions like the posterior parietal cortex (Bushara, Grafman, & Hallett, 2001; Iacoboni, Woods, & Mazziotta, 1998), the intraparietal sulcus (Macaluso & Driver, 2005), premotor areas (Dassonville et al., 2001; Kurata et al., 2000; Tanji & Shima, 1994), the anterior cingulate (Laurienti et al., 2003; Taylor et al., 1994) and the prefrontal cortex (Asaad, Rainer, & Miller, 1998; Bushara, Grafman, & Hallett, 2001; Laurienti et al., 2003; Tanji & Hoshi, 2001; Taylor et al., 1994), as well as subcortical regions such as the thalamus (Grieve, Acuna, & Cudeiro, 2000; Porter, Metzger, & Groh, 2007), the superior colliculus (Calvert, 2000; King, 1993; Stein & Meredith, 1993; Stein, Wallace, & Meredith, 1995), the cerebellum (Allen, Buxton, Wong, & Courchesne, 1997; Bense et al., 2001; Kim et al., 1999) and parts of the basal ganglia (Chudler, Sugiyama, & Dong, 1995; Nagy et al., 2006). The abundance of areas that process multisensory information suggests that there are large degrees of functional overlap and redundancy in the brain. This idea was initially asserted by Mountcastle (1979), who noticed that proximal areas showed similar functional characteristics and that there were many structural and functional redundancies in sensorimotor systems.

Theories about how the brain's structural capacity contributes to its functional properties can be classified into a framework that considers the brain to be a dynamic system. This framework has been defined by researchers in numerous ways, some of which will be discussed below (Bressler, 1995; Bressler, 2002; Bressler & McIntosh, 2007; Bressler & Tognoli, 2006; Fuster, 1997; Goldman-Rakic, 1988; Mesulam, 1981, 1990, 1998; McIntosh, 1999; Price & Friston, 2002).

  • 15

    Bressler and Tognoli (2006) suggest that the functional expression of a particular

    cognitive operation results from the co-activation of specific interconnected local area networks.

    Mesulam (1990) has also stated that a cognitive or behavioural operation can be subserved by

    several interconnected brain areas each of which is capable of multiple computations. Therefore,

    a cognitive operation such as attention can be controlled by a diffuse cortical network that is

    redundant and specialized at the same time (Mesulam, 1981). According to his view, each

    region within a cortical network has some specialization because of its anatomical connections

    but this specialization is not absolute in that lesions to different areas in a network could have

similar behavioural consequences. Goldman-Rakic (1988), in a review comparing different
models of cortical organization, claims that the brain’s association cortices

    (areas that are largely responsible for complex processing that occurs between sensory input to

    primary sensory cortices and motor output elicited by primary motor areas) interact to form a

    finite number of dedicated networks that are reciprocally interconnected. These networks are

    capable of communicating with sensorimotor systems to produce integrated behaviour.

One dynamical systems perspective that has gained momentum in the past few decades

    is the notion that regions of the brain that share similar structural properties can contribute to a

    multitude of functions by way of their interactions with other regions (Bressler, 1995; Bressler,

2002; McIntosh, 1999; Bressler & McIntosh, 2007; Price & Friston, 2002). These interactions

    could be represented via direct or indirect connections. In addition, brain areas that have very

    different neural connections can contribute to the same functional output. This framework of

    distributed function in the brain has been defined operationally by McIntosh (1999, 2004) using

    the term neural context. Neural context represents the local processing environment of a given

    brain area that results from modulatory influences from other brain areas (Bressler & McIntosh,

    2007; McIntosh, 1999). Therefore, cognitive function may not be localized in an area or network

    of the brain but may emerge from dynamic large-scale neural interactions between different

    brain areas that change as a function of task demands. Task demands, in broader terms, can be

    the situational context in which an event occurs. Situational context refers to environmental

    factors such as sensory input and response processing in which a task is performed. In most

    cases, neural context is elicited from changes in situational context (Bressler & Tognoli, 2006;

    Protzner & McIntosh, 2007).


While dynamical models incorporating large-scale neural interactions and neural and
situational contexts have been explored for cognitive processes such as learning and memory

    (Lenartowicz & McIntosh, 2005; McIntosh & Gonzalez-Lima, 1994; 1998), they have seldom

    been applied to investigate interactions between cognitive processes such as attention and

    sensorimotor systems. Brain areas that display attention-sensorimotor interactions can be

dissociated using techniques like functional magnetic resonance imaging (fMRI). The use of fMRI to study

    brain function will be discussed in the next section.

    1.7 An Overview of fMRI

    1.7.1 Basic MRI Physics

    The MRI concepts presented here are summarized from Brown and Semelka (1999) and

    Huettel, Song and McCarthy (2004).

The proton of a hydrogen atom has a magnetic spin, which can be pictured as a sphere of
distributed positive charge. This charge rotates about its axis at high speed, producing a small

    magnetic field called a magnetic moment. In addition to the magnetic moment, a proton’s mass

    in combination with the rotating charge produces angular momentum. The magnetic moment and

    the angular momentum form the basis of the spin properties of a proton. When a proton is

    exposed to a uniform magnetic field (B0), it can assume one of two types of spins – parallel

    (same direction as B0) and anti-parallel (opposite direction to B0) – forming the equilibrium state.

    Parallel spins are lower in energy and more stable compared to anti-parallel spins. In order to

    convert a proton in a parallel spin state to an anti-parallel state, the proton needs to absorb

    electromagnetic energy. By the same token, a proton in a high energy state emits electromagnetic

    energy as it returns to a low energy state. The electromagnetic radiation frequency required to

    excite a proton from a low-energy state to a high-energy state can be calculated for an MR

    scanner. This frequency is called the Larmor frequency and is needed to change spins from

    parallel to anti-parallel orientations.
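To make this concrete, the Larmor frequency is proportional to the field strength: f = γ̄·B0, where γ̄ for the hydrogen proton is approximately 42.58 MHz per tesla. A minimal illustrative calculation (not part of the thesis methods):

```python
# Larmor (precession) frequency of the 1H proton as a function of field strength.
# GAMMA_BAR is the reduced gyromagnetic ratio of hydrogen, ~42.58 MHz/T.
GAMMA_BAR_MHZ_PER_T = 42.58

def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Return the proton precession frequency in MHz for a static field B0 in tesla."""
    return GAMMA_BAR_MHZ_PER_T * b0_tesla

print(larmor_frequency_mhz(1.5))  # ~63.9 MHz at 1.5 T
print(larmor_frequency_mhz(3.0))  # ~127.7 MHz at 3 T
```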

    Apart from the changes in spin states induced by B0, a proton’s motion about its axis can

    also be influenced by B0. In the presence of B0, the axis of rotation for a proton can rotate around

    the direction of B0. The motion of a proton in B0 is referred to as spin precession. The physical

    characteristics of spin precession are exploited in MRI.


    In an MRI set-up, a participant is placed in the centre of a uniform magnetic field.

    Protons within brain tissue assume their equilibrium states (parallel or anti-parallel). Using the

    Larmor frequency to generate an electromagnetic radiation pulse, protons can be excited and

    their rotational axis perturbed to generate an MR signal. Once the electromagnetic pulse is

    removed, the MR signal starts to decay. MR signal decay, also known as spin relaxation, is of

    two types – longitudinal and transverse. Longitudinal relaxation (T1 recovery) corresponds to the

    return of net magnetization in the same plane as B0. It is caused by protons in high-energy, anti-

    parallel spins (excited state) returning to low-energy, parallel, relaxed states. In order to

    understand transverse relaxation (T2 decay), consider the following: an electromagnetic pulse

    tips the axis about which a proton precesses in the transverse plane such that all protons precess

    in the same phase. When the pulse is removed, this phase-locking gradually subsides and protons

    return to their original out of phase states. This is referred to as transverse relaxation. The rates

    of longitudinal and transverse relaxation are constant for particular substances such as water,

    bone or fat. The constants that describe longitudinal and transverse relaxation are called T1 and

    T2, respectively.
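The two constants can be summarized as exponential time courses; the following is the standard textbook formulation (not reproduced explicitly in the sources summarized above), where M0 is the equilibrium magnetization:

```latex
% T1: longitudinal recovery toward equilibrium after excitation
M_z(t) = M_0 \left( 1 - e^{-t/T_1} \right)
% T2: transverse decay of the excited, phase-coherent signal
M_{xy}(t) = M_{xy}(0)\, e^{-t/T_2}
```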

    While both the T1 and T2 constants are important for MR, a third constant, T2* is

    essential for functional MR. T2* includes transverse relaxation due to phase differences in spin

    precessions as well as local magnetic field inhomogeneities. The latter can be described by

    considering an inhomogeneous external magnetic field where the strength of the magnetic field

    at a particular location influences the spin precessional frequency. Perturbed protons at different

    locations within the magnetic field lose coherence in their spin precessions at different rates

    contributing to the decay of net magnetization in the transverse plane.

    An MR signal can highlight different parts of the brain depending on the contrast used. A

    T1-weighted image of the brain shows high signal (bright) for fat content and low signal for

    cerebrospinal fluid (CSF; dark) while a T2-weighted image shows the opposite contrasts for fat

    and CSF.

    Two parameters that are critical to the amount of MR signal recorded and the contrast

    expressed are repetition time (TR) and echo time (TE). The interval between successive

    electromagnetic pulses that excite protons is referred to as TR. The TR influences the rate of

    longitudinal recovery after an excitation pulse is removed. TE corresponds to the interval


    between the excitation pulse and the acquisition of data and affects the rate of transverse decay.

    By varying the length of the TR or TE, image intensity at each spatial location can be made more

    or less sensitive to differences in T1 and T2.
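As a rough numerical illustration of how TR and TE trade off T1 against T2 contrast, a simplified spoiled gradient-echo signal equation, S ∝ PD·(1 − e^(−TR/T1))·e^(−TE/T2), can be evaluated for two tissues. The tissue parameters below are approximate textbook values at 1.5 T and are illustrative assumptions, not values measured in this study:

```python
import math

def gre_signal(pd, t1, t2, tr, te):
    """Simplified spoiled gradient-echo signal: pd in arbitrary units, times in ms."""
    return pd * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)

# Approximate 1.5 T tissue parameters (illustrative only).
white_matter = dict(pd=0.7, t1=600.0, t2=80.0)
csf          = dict(pd=1.0, t1=4000.0, t2=2000.0)

# Short TR / short TE -> T1 weighting: white matter bright, CSF dark.
print(gre_signal(**white_matter, tr=500, te=15), gre_signal(**csf, tr=500, te=15))

# Long TR / long TE -> T2 weighting: CSF bright, white matter darker.
print(gre_signal(**white_matter, tr=4000, te=100), gre_signal(**csf, tr=4000, te=100))
```

With the short TR/TE setting, the white-matter signal exceeds the CSF signal (T1 weighting); with the long TR/TE setting the ordering reverses (T2 weighting), matching the bright/dark pattern described above.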

    1.7.2 Physiological Basis of BOLD fMRI

    Functional MRI (fMRI) can be used to estimate the spatial locations of neural activations

    and associated changes in metabolic demands when a person performs a certain task. These

    metabolic changes include variations in concentration of deoxygenated hemoglobin, blood flow

and blood volume (Buxton, Wong, & Frank, 1998; Kwong et al., 1992), all of which play a role

    in producing the blood-oxygenation-level dependent (BOLD) response recorded in fMRI. To

    understand the BOLD response, the magnetic properties of oxygenated and deoxygenated

    hemoglobin must first be considered. Oxygenated hemoglobin is diamagnetic (does not affect

    magnetic field strength) while deoxygenated hemoglobin is paramagnetic (distorts local

    magnetic fields). The paramagnetic properties of deoxygenated hemoglobin decrease T2* values

    (Thulborn et al., 1982) which allows measurement of changes in brain activity as indexed by

    changes in the amount of deoxygenated hemoglobin.

The BOLD response in MR was first demonstrated by Ogawa and colleagues in 1990,
in an experiment designed to investigate brain physiology with MRI using anaesthetised
rats placed in an MR scanner. It was already known that deoxygenated

    hemoglobin decreased blood’s T2* values (Thulborn et al., 1982). Ogawa et al. (1990) used this

    finding to demonstrate that varying the proportion of blood oxygenation in rats could lead to

different MR image characteristics. When rats breathed 100% oxygen, Ogawa et al.
noticed that T2* images of the brain showed structural differences

    and very little vasculature. As the amount of oxygen in rats decreased, brain vasculature became

    more prominent. To test this interesting effect, a saline-filled container that had test-tubes of

oxygenated and deoxygenated blood within it was imaged using a T2* contrast (Ogawa & Lee,

    1990). T2*-weighted images of oxygenated blood showed a dark outline around the test-tube’s

    diameter. In contrast, deoxygenated blood showed a greater signal loss that spilled over into the

    area filled with saline. Ogawa and others concluded that functional changes in brain activity

could be studied using what would come to be known as the BOLD contrast.


    Following Ogawa’s work, Belliveau and colleagues (1991) observed the first functional

images of the brain using a paramagnetic contrast agent, Gd(DTPA)2-. Subsequently,

    Ogawa and others (1992) and Kwong et al., (1992) published the first functional images using

    the BOLD signal.

    The physiological basis of the BOLD signal can be summarized as follows. Relative

    amounts of oxygenated and deoxygenated hemoglobin in the capillary bed of a brain region

    depend on the ratio of oxygen consumption and supply. When neural activity increases, the

    amount of oxygenated blood delivered to that area also increases while levels of deoxygenated

    hemoglobin decrease. The BOLD signal captures the displacement of deoxygenated hemoglobin

by oxygenated blood, since the former distorts local magnetic fields while the latter

    does not. In other words, changes in oxygenation levels lead to the modulation of microscopic

    field gradients around blood vessels which in turn affect T2* values for tissue water that produce

    differences in signal strength (Huettel, Song, & McCarthy, 2004).
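Although this chapter does not model the hemodynamics explicitly, the BOLD change following a brief burst of neural activity is commonly summarized by a canonical double-gamma hemodynamic response function (similar in spirit to the default used in packages such as SPM). A minimal sketch, with the peak and undershoot shape parameters as illustrative assumptions:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1 / 6.0):
    """Double-gamma HRF evaluated at time points t (seconds after a brief event)."""
    hrf = gamma.pdf(t, peak_shape) - undershoot_ratio * gamma.pdf(t, undershoot_shape)
    return hrf / hrf.max()  # normalize the peak to 1

t = np.arange(0.0, 30.0, 0.5)
bold = canonical_hrf(t)
print(t[np.argmax(bold)])  # time-to-peak, roughly 5 s with these parameters
```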

    A gradient echo-planar imaging sequence that is sensitive to the paramagnetic properties

    of deoxygenated hemoglobin can be used to display tomographic maps of brain activation

    (Brown & Semelka, 1999; Kwong et al., 1992; Huettel, Song, & McCarthy, 2004).

    1.7.3 Coupling of Neuronal Activity & BOLD

    The majority of functional neuroimaging studies assume that the physiological changes

underlying the BOLD response reflect neuronal activity. However, the nature of the neuronal

    activity represented by BOLD responses is still an actively debated topic.

    Neuronal activity that would predict fMRI BOLD response could include multiple factors

such as the average firing rate of a sub-population of neurons, also referred to as multi-unit
activity (MUA; Legatt, Arezzo, & Vaughan, 1980); synchronous spiking activity across a
neuronal population; the local field potential (LFP), which represents the synchronization of
dendritic currents (Mitzdorf, 1987); or some measure of the sub-threshold electrical activity as
measured by single-unit recordings (SUA; e.g., Hubel & Wiesel, 1959). The size of a

    neuronal population whose activity is indexed by fMRI signals may also be an issue. Scannell

    and Young (1999) postulate that changes in fMRI BOLD responses could be caused by large


    changes in firing rates in small neuronal populations or small changes in firing rates in larger

    neuronal sub-populations.
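In practice, LFP and MUA are typically separated from the same extracellular recording by frequency band: the LFP is taken from the low frequencies (roughly below 300 Hz) and the MUA from a higher band (roughly 0.3-3 kHz), often rectified. A minimal sketch with assumed cutoffs and a stand-in signal:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 20_000  # sampling rate (Hz) of a hypothetical extracellular recording

def bandpass(x, low_hz, high_hz, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

raw = np.random.randn(FS * 2)           # two seconds of stand-in broadband data
lfp = bandpass(raw, 1, 300)             # local field potential band
mua = np.abs(bandpass(raw, 300, 3000))  # rectified multi-unit band
```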

    Logothetis and colleagues (2001) attempted to investigate the relationship between

    neuronal firing rates in the brain and subsequent BOLD responses picked up by fMRI. Monkeys

    were presented with rotating checkerboard patterns and both fMRI BOLD responses and

    electrophysiological signals were measured from primary visual cortex. The electrophysiological

data consisted of SUA, MUA, and LFP recordings. The results showed a

    transient increase in BOLD at the onset of the visual stimulus which persisted until the visual

stimulus was turned off. Approximately twenty-five percent of MUA recordings showed a transient increase

    in activity and subsequent return to baseline, while LFPs were sustained throughout the stimulus

    duration. The authors claimed that increased LFPs during stimulation were significantly stronger

than MUA and were maintained over longer intervals; therefore, LFPs give a better estimate of

    BOLD responses than MUA. Logothetis et al.’s (2001) paper also suggested that LFPs

    resembled integrative activity at neuronal dendritic sites while MUA corresponded to the axonal

    firing rate of a small population of neurons. Hence, BOLD seemed to reflect incoming input and

local processing in an area more than spiking activity.

    The experiment by Logothetis and colleagues (2001) assumed a somewhat linear

    relationship between BOLD and LFPs. A subsequent study by Mukamel and colleagues (2005)

    suggested that BOLD responses may be comprised of more complex neuronal activity that does

    not necessarily follow a linear pattern. In Mukamel et al.’s experiment (2005), SUA and LFPs

    were recorded from two neurosurgical patients. fMRI BOLD signals were collected from healthy

    participants. Both patients and participants viewed a movie segment during measurements of

    neural activity. The results showed a high, significant correlation (r = 0.75) between SUA from

    neurosurgical patients and fMRI BOLD signals in healthy controls. The authors claimed that

    fMRI BOLD responses were reliable measures of firing rates in human cortical neurons.

While the findings of Logothetis et al. (2001) and Mukamel et al. (2005) do not directly
contradict each other, together they suggest that the neuronal activity represented in the
BOLD response may be a complex milieu of input and output processing. In a review
article on the coupling between BOLD and neuronal activity, Heeger and Rees (2002) state that
LFPs could be dominated by activity in proximal neurons, which would suggest that local spiking


    activity, synaptic activity and dendritic currents are all co-varying. The authors emphasize that

    fMRI BOLD response may be capturing both presynaptic and postsynaptic activity within a

    particular region. For experiments that are designed to investigate global changes in brain

    activity, having the resolution of single-unit recordings in fMRI may not be necessary.


Chapter 2: Aims and Hypotheses

The primary objective of the present study is to understand crossmodal facilitation effects

    in the brain. There are discrepancies in behavioural research about the direction of crossmodal

    facilitation. To recap, some scientists have found that auditory cues are superior to visual cues in

    producing fast responses (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong,

Monheit, & Morrow, 1989; Spence & Driver, 1997) while other researchers have shown the

    opposite effect – visual cues facilitate reaction times to auditory targets (McDonald & Ward,

1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin, 2000; Ward, 1994). To my
knowledge, no study has investigated the effects of audio-visual crossmodal facilitation in the
brain using fMRI.

    The hypotheses for the current experiment are as follows. Firstly, the magnitude of

    facilitation for auditory cues will be larger compared to visual cues given the greater

    neuroanatomical capacity for audition to influence vision. Secondly, auditory and visual cueing

    will be represented by distinct patterns of brain activation given the differences in behavioural

    performance. Lastly, brain areas that respond to cue processing (input) may not be the same

    areas that coordinate behaviour (output).

The first two hypotheses will be tested in Chapter 3 and the last will be addressed in
Chapter 4. To test these hypotheses, participants will perform a

    crossmodal version of the spatial stimulus-response compatibility task while BOLD fMRI

    responses are recorded. In spatial stimulus-response compatibility tasks, a cue signals the

    response rule to a lateralized target. Responses (button presses) are made to the same

    (compatible) or opposite (incompatible) side of target presentation. Reaction times are faster in

    compatible conditions than in incompatible conditions (Simon, 1969; Fitts & Seeger, 1953). In

    this experiment, the cue and targets will be presented in both auditory and visual modalities.

    A unique aspect of the current experimental design that I would like to emphasize is the

    manipulation of cue-target order. Tasks can be of two types – cues appearing first followed by

    targets or targets being presented first followed by cues. Previous behavioural studies that have

investigated crossmodal facilitation did not manipulate order; in my fMRI study, however, I can


    use this order manipulation to examine changes in neural activity based exclusively on cue or

    target processing.


    Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices

3.1 Introduction

Experiments on selective attention have shown that when participants are provided with a

    cue, they shift their attention to the cued location (Posner, Inhoff, Friedrich, & Cohen, 1987).

    Attentional tasks where cues and targets are manipulated have been adapted to study crossmodal

facilitation effects (Spence & Driver, 1997; Ward, 1994). Crossmodal facilitation occurs when

    a cue in one sensory modality elicits a speeded response in another sensory modality. A study

    conducted by Ward (1994) investigated the effects of crossmodal facilitation using a spatial

    discrimination task. Participants made speeded left-right responses to visual or auditory targets

following the presentation of a non-predictive auditory or visual cue, both cues together,

    or no cues. Reaction times were measured for all conditions at different inter-stimulus intervals

    between the cue and target. The results indicated that visual cues facilitated reaction times to

auditory targets presented on the same side as the cue (compatible) at short inter-stimulus

    intervals. Auditory cues, in contrast, did not facilitate reaction times to visual targets on either

    side of cue presentation or at any inter-stimulus interval. Auditory cues did facilitate reaction

    times to compatible auditory targets at short inter-stimulus intervals. Ward’s findings were met

    with skepticism because previous crossmodal studies that had presented non-predictive auditory

cues had shown response time facilitation to visual targets ipsilateral to the cue (Buchtel &

    Butter, 1988; Farah, Wong, Monheit, & Morrow, 1989). A crossmodal study conducted after

    Ward’s study also showed an asymmetrical facilitation of reaction times for auditory cues in

    comparison to visual cues (Spence & Driver, 1997).

    Spence and Driver (1997) explained the asymmetrical auditory cue facilitation as having

    evolutionary significance. Auditory events in the world are transient and intermittent whereas

visual events are continuous in time. Also, auditory events that occur distally can be registered
in the brain in time for appropriate action, whereas by the time visual events come into view

    proximally, a response may not be viable. Therefore, it is more beneficial to shift visual attention

    to auditory events rather than the other way around (Neumann, Van der Heijden, & Allport,

    1986). Evidence from neuropsychological studies also indicates that orienting to auditory events

    is usually followed by visual localization in areas like the superior colliculus (Stein & Meredith,


    1993; Stein, Wallace, & Meredith, 1995). McDonald, Teder-Salejarvi & Hillyard (2000) showed

that a sudden sound can improve the detection of a subsequent flash of light at the same location.

    Abrupt sounds synchronized with visual search arrays can improve the identification of visual

    targets embedded in a series of distracters (Vroomen & de Gelder, 2000). The evidence

    presented thus far implies that auditory events can influence the processing of subsequent visual

    events.

    However, there has also been some evidence in favour of visual facilitation of reaction

    time. Schmitt et al. (2000) found facilitated reaction times to both auditory and visual cues in an

    experimental set-up where cue and target modalities were fixed within a block. Symmetric

    audio-visual cueing effects have subsequently been reported in other studies (McDonald &

    Ward, 1999; 2003). The famous McGurk effect also provides evidence for vision altering speech

    perception (McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/

    when it is coupled with a visual lip movement associated with /ga/. The McGurk effect shows

    that sound can be misperceived when it is coupled with different visual lip movements. These

    studies suggest that vision can also alter audition in some cases.

    The lack of consensus about the direction of reaction time facilitation in response to

    auditory and visual cues is further compounded by attempts to understand the exact nature of

    cue-target processing in the brain. Some researchers argue that salient sensory information

    contained in the cue is integrated with target information via separate, modality-specific sub-

    systems (Bertelson & Tisseyre, 1969; Bushara et al., 1999; Cohen, Cohen, & Gifford, 2004;

    Posner, Inhoff, Friedrich, & Cohen, 1987; Spence & Driver, 1997; Ward, 1994). Alternatively,

    other scientists argue that synthesis of information from different sensory modalities is achieved

    through a supramodal network that involves parts of the prefrontal cortex and parietal areas

    (Andersen, 1995; Andersen, Snyder, Bradley, & Xing, 1997; Bedard, Massioui, Pillon, &

Nandrino, 1993; Downar, Crawley, Mikulis, & Davis, 2000; Eimer & Driver, 2001; Farah,

    Wong, Monheit, & Morrow, 1989; Iacoboni, Woods, & Mazziotta, 1998; Laurens, Kiehl, &

    Liddle, 2005; Snyder, Batista, & Andersen, 1997). Convergent theories, advocated by Macaluso

    and others suggest that while supramodal attentional networks may guide sensorimotor

    integration, reverberating loops that link sensory-specific cortices to each other can also integrate

    sensory information across modalities (Corbetta & Shulman, 2002; Ettlinger & Wilson, 1990;

    Macaluso, 2006; Macaluso & Driver, 2005). Macaluso and colleagues have derived a convergent


    model for visual and tactile modalities but this model has not been applied to audio-visual cue-

    target processing.

To reconcile the discrepancies in behavioural crossmodal facilitation data and to clarify the
brain mechanisms responsible for multisensory processing, I designed a task that attempted

    to capture audio-visual interactions between a cue and a target in an event-related functional

    neuroimaging study. A stimulus-response compatibility paradigm, traditionally used to study

    response selection and cue-target processing (Bertelson and Tisseyre, 1969; Rodway, 2005), was

    modified to investigate crossmodal cue-target interactions. A general version of the stimulus-

    response compatibility paradigm has within it a cue that signals a response rule to a lateralized

    target. Response times are faster when responses are made to the same side as the presentation of

    a target (compatible responses) compared to when responses are made to the opposite side of

    target presentation (incompatible responses). This robust behavioural finding is known as the

stimulus-response compatibility effect (Simon, 1969; Fitts & Seeger, 1953). The cues and targets

    in my experiment were presented in auditory and visual modalities.
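To make the compatibility logic concrete, the response rule can be expressed as a simple mapping from the cued rule and the target side to the required button; the function below is an illustrative sketch (the names are mine, not taken from the experiment code):

```python
def required_response(rule: str, target_side: str) -> str:
    """Return the required button ('left' or 'right') for a lateralized target.

    rule: 'compatible'   -> respond on the same side as the target
          'incompatible' -> respond on the side opposite the target
    """
    if rule == "compatible":
        return target_side
    return "left" if target_side == "right" else "right"

assert required_response("compatible", "left") == "left"
assert required_response("incompatible", "left") == "right"
```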

    The aims of the first part of my study are to determine the neural correlates that underlie

reaction time facilitation in response to auditory and/or visual cues. Given the discrepancies in
behavioural findings, the direction of reaction time facilitation is uncertain; however, I
hypothesize that auditory cues will facilitate reaction times to visual targets because of the

    structure of auditory and visual neural pathways in the brain (see Chapter 1 for details).

    3.2 Materials and Methods

    3.2.1 Participants

    Twenty-four (12 female) healthy, right-handed individuals between the ages of 19 and 35

(mean age 23.08 ± 3.87 years) were recruited as volunteers through undergraduate psychology

courses to partake in the study. All individuals were screened for any history of medical,
neurological, psychiatric, or substance abuse-related problems (see Appendices A and B for screening

forms) prior to their participation in the study. All participants signed an informed consent form (see

    Appendix C) and were reimbursed $100.00 for two sessions. The experiment was conducted in

    the fMRI Suite at Baycrest upon approval from the Research Ethics Board at Baycrest.


    3.2.2 Stimuli

    Two types of auditory stimuli that were matched in amplitude but varied in frequency

    (250 Hz and 4000 Hz) were used in the experiment. Each participant adjusted the volume of all

auditory tones using the left and right buttons on the response box so that the tones appeared
perceptually identical in loudness. The volume adjustment was conducted at the beginning of the experiment

    with the scanner turned on so that participants could adjust the volume of the stimuli according

    to the noise produced by the scanner.

    The visual stimuli used in the experiment were of two types and were matched for

    luminance and contrast. The first visual stimulus was a black and white checkerboard pattern and

    the second visual stimulus was also a black and white checkerboard pattern, but rotated at a 45

    degree angle. Stimulus presentation was controlled and documented by Presentation software

(version 10.2, Neurobehavioural Systems Inc.).

    3.2.3 Apparatus

    Participants viewed visual stimuli on a translucent screen via a mirror that was mounted

    on the head coil (used for acquiring images) in the MRI scanner. The total distance between the

    participants’ eyes and the screen was approximately 52 inches. The size of the image on the

    screen was 13.75 inches by 11 inches with an image resolution of 800 x 600 x 60 Hz. The field

of view (FOV) was 12 degrees vertical and 15 degrees horizontal. The majority of the participants

    wore their own contact lenses for vision correction. MR safe glasses made by SafeVision

(Webster Groves, MO, USA), with a range of ±6 dioptres in 0.5-dioptre increments, were

    provided to those participants that did not have prescription contact lenses. Auditory stimuli were

    presented using the Avotec Audio System (Jensen Beach, FL, USA). A button-press response

    was made by participants with either the left or the right index finger using Fiber-Optic Response

    Pad System developed by Current Designs Inc. (Philadelphia, PA, USA). The response pad

system had two paddles – one for each hand – and each paddle had four buttons. The response
button for the left hand was the rightmost of the four buttons on the left paddle; the response
button for the right hand was the leftmost of the four buttons on the right paddle. The other
three buttons on each paddle were taped to prevent

    responses. Participants rested their hands gently on top of the taped buttons. The signal

transmission time of the paddles was less than 1 ms.
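The stated field of view follows from the screen dimensions and viewing distance via the standard visual-angle formula, angle = 2·arctan(size / (2·distance)); a quick check (illustrative, not part of the experimental code):

```python
import math

def visual_angle_deg(extent_inches, distance_inches):
    """Visual angle (degrees) subtended by an extent viewed at a given distance."""
    return 2 * math.degrees(math.atan(extent_inches / (2 * distance_inches)))

print(visual_angle_deg(13.75, 52))  # ~15.1 degrees horizontal
print(visual_angle_deg(11.0, 52))   # ~12.1 degrees vertical
```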


    3.2.4 Procedure

    3.2.4.1 Trial Structure

A trial had the following sequence: presentation of the first stimulus (S1) for 250 ms, a 4-
second inter-stimulus interval (ISI), presentation of the second stimulus (S2) for 250 ms, and a
response window of 1500 ms. The inter-trial interval (ITI) was jittered randomly among 3, 5, 7, and 9

    seconds. Response times were recorded using Presentation software from the onset of the second

    stimulus.
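The trial structure maps directly onto an event timeline. The sketch below (a minimal illustration, not the actual Presentation script) generates stimulus onsets using the stated durations and ITI jitter; how the response window and ITI are counted between trials is one plausible accounting:

```python
import random

S1_MS, ISI_MS, S2_MS, RESP_MS = 250, 4000, 250, 1500
ITI_CHOICES_MS = [3000, 5000, 7000, 9000]  # randomly jittered inter-trial interval

def build_timeline(n_trials, seed=0):
    """Return (s1_onset, s2_onset) pairs in ms for one run of n_trials."""
    rng = random.Random(seed)
    t, onsets = 0, []
    for _ in range(n_trials):
        s1_onset = t
        s2_onset = s1_onset + S1_MS + ISI_MS   # S2 begins after S1 plus the 4 s ISI
        onsets.append((s1_onset, s2_onset))
        t = s2_onset + S2_MS + RESP_MS + rng.choice(ITI_CHOICES_MS)
    return onsets

print(build_timeline(3))
```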

    3.2.4.2 Task Types

    The study consisted of two scanning sessions on different days. Each session was 1.5

    hours in length. Two different ta