Visuo-verbal interactions in working memory: Evidence from event-related potentials

www.elsevier.com/locate/cogbrainres

Cognitive Brain Research

Research Report


Jan Peters a,b,*, Boris Suchan b, Yaxin Zhang a, Irene Daum a,b

a International Graduate School of Neuroscience, Ruhr-University Bochum, Germany

b Institute of Cognitive Neuroscience, Department of Neuropsychology, Ruhr-University Bochum, Germany

Accepted 10 July 2005

Available online 15 August 2005

Abstract

Working memory is thought to involve separate modality-specific storage systems. Interactions between these storage systems were

investigated using a novel cross-modal 2-back paradigm. 2-back, 1-back and target items were presented either visually as a verbalizable

linedrawing or auditorily as a digitized spoken word. ERPs for auditory targets were primarily modulated by the presentation modality of the 2-

back item, whereas ERPs for visual targets were largely modulated by presentation modality of the 1-back item. Results indicate that

verbalizable pictures are only partially transformed into a phonological code for rehearsal in working memory. Furthermore, results support the

idea of a more stable and persistent auditory short-term store as opposed to a more transiently activated visual store for verbalizable material.

© 2005 Elsevier B.V. All rights reserved.

Theme: Neural Basis of Behavior

Topic: Cognition

Keywords: Working memory; Event-related potential; N100; P200; Slow wave; Visuo-verbal; Cross-modal

1. Introduction

Working memory, the short-term maintenance and manipu-

lation of information, is a critical component of intelligent

behavior. Specific subsystems have been described for the

maintenance of material from different modalities. The original

model of working memory involved short-term buffers for

phonological and visuo-spatial material under the control of a

central executive [1,3]. The interaction between the two

modality-specific storage systems is as yet unclear.

The issue of cross-modal transformation of content in

working memory has mainly been investigated with respect

to a possible phonological re-coding of verbalizable visual

material. Based on findings from earlier psychophysical

0926-6410/$ - see front matter © 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.cogbrainres.2005.07.001

* Corresponding author. Institute of Cognitive Neuroscience, Department of Neuropsychology, Ruhr-University Bochum, Universitätsstraße 150, D-44780 Bochum, Germany. Fax: +49 234 32 14622.

E-mail address: [email protected] (J. Peters).

experiments, Penney [22] suggested a dual processing

stream model, with partly independent and partly over-

lapping processes for visually and auditorily presented

words. By contrast, Schumacher et al. [27] reported findings

from a PET study, in which regions involved in rehearsal of

visual–verbal and auditory–verbal material showed almost

complete overlap. Smith and Jonides [13,28] suggested a

three-component model of working memory, with separate

storage systems for verbal, spatial and object material. It has

been argued that due to the superior storage capacities of the

phonological loop, verbalizable visual material is trans-

formed into a phonological code for rehearsal in phono-

logical working memory [2,19,28].

There is, however, some evidence against this recoding view, as far as visual–verbal (non-pictorial) material is

concerned. Working memory for auditorily and visually

presented verbal material has been previously investigated

using event-related potentials [25]. This study revealed a

fronto-central negativity associated with an auditory–verbal

25 (2005) 406 – 415


processing stream and a transient posterior positivity

associated with a visual–verbal processing stream [25].

There is evidence from a recent fMRI study that auditory

verbal working memory recruits larger regions in the dorso-

lateral prefrontal cortex compared to visual verbal working

memory, whereas the latter is associated with increased

activity in left posterior parietal cortex, more specifically in

the intraparietal sulcus [8]. Processing of visual and auditory

verbal material may thus involve distinct cortical circuits.

As retention time increases, these processes are thought to

converge onto a unitary, encoding modality independent

rehearsal process [7,25], indicating that high temporal

resolution is essential for an in-depth investigation of

modality specific coding in working memory.

Additionally, visual and auditory verbal short-term

storage systems may differ with respect to their temporal

characteristics. Behavioral and event-related potential data

[22,25] both point towards a more stable and persistent

auditory store and a more transiently activated visual store

for verbal material.

To investigate interactions between auditory and visual

short-term storage for verbalizable material, a novel cross-

modal 2-back paradigm was developed with both auditory

(spoken words) and visual (verbalizable linedrawings) items

presented at encoding and retrieval. Processes that occurred

when a target item from one modality had to be matched to a

2-back item that had been encoded in the same or a different

modality were monitored using ERPs. Attentional and

working memory demands (i.e. number of items to be

rehearsed) were held constant while encoding and retrieval

modalities were systematically varied [31].

We hypothesized that coding differences for visual and

auditory material should not only manifest themselves in

different patterns of brain activity associated with rehearsal

[25], but also in different brain activity patterns associated

with matching represented in early ERP components. For

auditorily presented targets, we expected visual encoding to

give rise to differences in brain activation during retrieval

only if the phonological recoding of the visual items is

incomplete, that is, if the coding of verbalizable pictures in

working memory is at least partly visuo-spatial. Secondly,

we hypothesized that if activation in the visual store decays

more rapidly, re-activation of this system after auditory

stimulation should give rise to enhanced activity reflected in

the visual ERP. At the same time, a more stable and

persistent auditory store might be able to maintain activation

over a longer period of time and thus not be subject to such

switching effects.

2. Methods

2.1. Participants

Fourteen healthy human subjects participated in the

experiment (mean age 24.6, 6 participants were male).

One participant was left-handed. Participants were univer-

sity students who received course credit for their

participation.

2.2. Stimuli

Fifteen German words spoken by a male speaker served

as auditory stimuli. For visual stimulation, the correspon-

ding pictures, taken from a standardized picture set [29],

were used. Pictures were matched both for name agreement,

thus ensuring verbalizability of the items, and visual

complexity [29] in order to avoid differential visual memory

load effects. Additionally, only pictures corresponding to

disyllabic German words were chosen. It has been shown

that immediate serial recall of disyllabic words is not

influenced by the exact spoken duration of the words [15],

indicating that phonological working memory load for these

words can be considered comparable. The following words/

linedrawings were used: apple, flower, glasses, hammer,

jacket, bottle, saw, fork, screw, crown, socks, cat, bicycle,

whistle, scissors.

2.3. Procedure

A modified 2-back paradigm (Fig. 1) was used. Stimuli

were continuously presented for 700 ms each, followed by

the presentation of a fixation cross for 1700 ms. For each

stimulus, participants were instructed to make a same/

different judgement with respect to the 2-back item, that is,

the stimulus presented two trials earlier. Participants were

instructed that content, not modality, was relevant. Eight

blocks of 60 trials each were administered. In each block,

the experimental conditions were presented in a pseudo-

randomized fashion. After each block, participants started

the subsequent block by pressing a button.

The task was continuous in the sense that the 1-back item

of any given trial was the 2-back item in the subsequent trial

and so on. The 1-back item thus served as an interfering

item which was nonetheless task relevant in the subsequent

trial.
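The continuous matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stimulus` tuple layout, the function name `two_back_targets` and the example stream are hypothetical and are not taken from the original experiment software.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (content, modality), where modality is
# "A" (spoken word) or "V" (line drawing).
Stimulus = Tuple[str, str]


def two_back_targets(stream: List[Stimulus]) -> List[Optional[bool]]:
    """Return the correct same/different answer for every trial.

    The first two trials have no 2-back item and therefore get None.
    Only content is compared; modality is deliberately ignored,
    mirroring the instruction that content, not modality, is relevant.
    """
    answers: List[Optional[bool]] = []
    for i, (content, _modality) in enumerate(stream):
        if i < 2:
            answers.append(None)
        else:
            answers.append(content == stream[i - 2][0])
    return answers


# Illustrative stream (made up for this sketch).
stream = [("apple", "V"), ("cat", "A"), ("apple", "A"), ("saw", "V"), ("fork", "A")]
answers = two_back_targets(stream)
```

In this stream, the third item ("apple", spoken) matches the first item ("apple", pictured) even though the modality differs, so it counts as a "same" trial.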

2.4. Conditions

The experimental setup included 8 experimental con-

ditions, depending on the presentation modalities of 2-back

item, 1-back item and target item (see Fig. 1). Accordingly, throughout this report, conditions will be labelled with letter triplets, where an “A” represents auditory presentation and a “V” visual presentation. VAA thus labels the condition in which the 2-back item was presented visually whereas the 1-back and target items were both presented auditorily. Trials with a modality change from 2-back item to target item will be termed “Transformation” trials. Trials with a modality change from 1-back item to target item will be termed “Switch” trials (see Table 1).
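The labelling scheme can be made explicit with a short sketch. The function name `classify_condition` and the dictionary layout are assumptions chosen for illustration. The logic follows the definitions in the text: Transformation compares the 2-back item with the target, Switch compares the 1-back item with the target.

```python
from itertools import product


def classify_condition(triplet: str) -> dict:
    """Decode a letter triplet such as "VAA" (2-back, 1-back, target)."""
    if len(triplet) != 3 or any(m not in "AV" for m in triplet):
        raise ValueError(f"expected three letters from 'A'/'V', got {triplet!r}")
    two_back, one_back, target = triplet
    return {
        "target": target,
        # Modality change from 2-back item to target item.
        "transformation": two_back != target,
        # Modality change from 1-back item to target item.
        "switch": one_back != target,
    }


# Enumerate all eight conditions and group them by (switch, transformation).
ALL_CONDITIONS = ["".join(p) for p in product("AV", repeat=3)]
table = {}
for condition in ALL_CONDITIONS:
    info = classify_condition(condition)
    table.setdefault((info["switch"], info["transformation"]), []).append(condition)
```

Grouping the eight triplets this way yields the four cells of the design, two conditions per cell (one per target modality).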

Fig. 1. Schematic outline of the cross-modal 2-back paradigm. Stimuli were presented continuously. Here, a trial from condition VAA is shown. The 2-back

item was presented visually whereas 1-back item and target item were presented auditorily. A transformation occurs because 2-back and target item are not

presented in the same modality. Switching is not required because 1-back item and target item were both presented auditorily.


2.5. Electrophysiological recording

EEG was recorded with tin electrodes mounted in an

elastic cap from 30 scalp positions (F7, F3, FZ, F4, F8, FT7,

FC3, FCZ, FC4, FT8, T7, C3, CZ, C4, T8, TP7, CP3, CPz,

CP4, TP8, P7, P3, PZ, P4, P8, PO7, PO3, POz, PO4, PO8)

according to the extended 10–20 system. All scalp electro-

des were referenced to linked mastoids and impedance was

kept below 5 kOhm. To control for ocular artefacts, eye

movements were also recorded. EEG was sampled at a rate

of 500 Hz.

Subjects were seated comfortably in an electrically and

acoustically shielded dimmed room in front of a 17 in.

computer monitor at a distance of about 70 cm. Two 90°-arranged response keys were used.

The EEG data were analyzed off-line using the Brain

Vision Analyzer software package. After filtering the raw

data with a low-pass filter of 40 Hz with 12 dB, data were

segmented to the presentation of the target stimulus, from

200 ms pre-presentation to 1000 ms after presentation. After

removing segments containing artefacts, ocular correction

[12] was carried out. A 200-ms pre-stimulus interval was used for baseline correction. Thereafter, grand averaged ERPs were obtained and pooled across all subjects (Figs. 2 and 3).

Table 1

Experimental conditions

              Transformation [+]    Transformation [−]
Switch [+]    VVA, AAV              AVA, VAV
Switch [−]    VAA, AVV              AAA, VVV

Note: Transformation represents a modality difference between target item and 2-back item. Switching represents a modality difference between target item and 1-back item (see text). Letter triplets refer to the modalities (auditory = A, visual = V) of 2-back item, 1-back item and target item, respectively.
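The segmentation and baseline correction steps can be illustrated for a single channel. This sketch is not the Brain Vision Analyzer procedure itself. It uses only the parameters given in the text (500 Hz sampling, epochs from 200 ms before to 1000 ms after target onset, 200-ms pre-stimulus baseline). The function names and the plain-list signal format are assumptions.

```python
SAMPLING_RATE_HZ = 500          # from the recording description
PRE_MS, POST_MS = 200, 1000     # epoch limits relative to target onset


def ms_to_samples(ms: int) -> int:
    """Convert a duration in milliseconds to a number of samples."""
    return ms * SAMPLING_RATE_HZ // 1000


def extract_epoch(signal, onset_index):
    """Cut one epoch around a target onset and subtract the baseline mean.

    `signal` is a list of voltages for one channel; `onset_index` is the
    sample at which the target stimulus appeared (hypothetical inputs).
    """
    n_pre = ms_to_samples(PRE_MS)
    start = onset_index - n_pre
    stop = onset_index + ms_to_samples(POST_MS)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[start:stop]
    # Baseline: mean voltage in the 200-ms pre-stimulus interval.
    baseline = sum(epoch[:n_pre]) / n_pre
    return [value - baseline for value in epoch]
```

At 500 Hz each epoch holds 600 samples, the first 100 of which form the baseline interval.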

2.6. Data analysis

Response accuracy and reaction time were recorded.

Only ERPs from correct 2-back trials are included in this

report. N100 and P200 peak components were determined

(see Figs. 2 and 3). N100 was defined as the maximum

negative peak in the time window 80–180 ms, whereas

P200 was defined as the maximum positive peak in the time

window 150–250 ms.
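A minimal sketch of the peak measures, assuming a baseline-corrected epoch that starts 200 ms before target onset and is sampled at 500 Hz. The function name `peak_in_window` is hypothetical. The sketch simply takes the most extreme sample inside the window, which may differ from a stricter local-peak criterion that analysis software could apply.

```python
def peak_in_window(epoch, t_start_ms, t_end_ms, polarity,
                   epoch_start_ms=-200, sampling_rate_hz=500):
    """Return (amplitude, latency_ms) of the extreme value in a time window.

    polarity="negative" picks the minimum (e.g. N100, 80-180 ms);
    polarity="positive" picks the maximum (e.g. P200, 150-250 ms).
    """
    if polarity not in ("negative", "positive"):
        raise ValueError("polarity must be 'negative' or 'positive'")
    step_ms = 1000 / sampling_rate_hz
    first = round((t_start_ms - epoch_start_ms) / step_ms)
    last = round((t_end_ms - epoch_start_ms) / step_ms)
    window = epoch[first:last + 1]          # window limits are inclusive
    pick = min if polarity == "negative" else max
    amplitude = pick(window)
    latency_ms = epoch_start_ms + (first + window.index(amplitude)) * step_ms
    return amplitude, latency_ms


# Synthetic epoch (made up): a negative deflection at 100 ms and a
# positive deflection at 200 ms after target onset.
epoch = [0.0] * 600
epoch[150] = -4.0   # sample index 150 corresponds to 100 ms
epoch[200] = 3.0    # sample index 200 corresponds to 200 ms
n100 = peak_in_window(epoch, 80, 180, "negative")
p200 = peak_in_window(epoch, 150, 250, "positive")
```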

Slow wave amplitude (see Figs. 2 and 3) was analyzed

by computing the mean amplitude in a 200-ms time

window. Window measures ranged from 350 to 550 ms

(auditory slow wave) and from 200 to 400 ms (visual slow

wave). The different window measures for the auditory and

the visual modality were chosen to account for the differ-

ences in reaction time as a function of target modality. Mean

reaction time was ≈150 ms slower for auditory trials (see

Fig. 4a). Auditory and visual slow wave measures therefore

each cover a 200-ms time segment prior to a point

approximately 270 ms before motor response onset. The

subsequent 200 ms of slow wave activity were also analyzed

in both modalities. Since the effects largely resemble those

found for the earlier portion of the slow wave, the data are

not reported here.
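The window logic can be checked with a short sketch. The window limits come from the text. The mean reaction times are assumptions for illustration only: about 825 ms for auditory targets and, given the reported difference of about 150 ms, about 675 ms for visual targets. The function name `mean_amplitude` is hypothetical.

```python
# Slow wave windows in ms after target onset, per target modality (from the text).
SLOW_WAVE_WINDOWS_MS = {"A": (350, 550), "V": (200, 400)}

# Assumed mean reaction times in ms (illustrative values, see lead-in).
ASSUMED_MEAN_RT_MS = {"A": 825, "V": 675}


def mean_amplitude(epoch, window_ms, epoch_start_ms=-200, sampling_rate_hz=500):
    """Mean voltage in the half-open window [start, end) of an epoch."""
    step_ms = 1000 / sampling_rate_hz
    first = round((window_ms[0] - epoch_start_ms) / step_ms)
    last = round((window_ms[1] - epoch_start_ms) / step_ms)
    samples = epoch[first:last]
    return sum(samples) / len(samples)


# Time between the end of each window and the assumed mean response.
gap_to_response_ms = {
    modality: ASSUMED_MEAN_RT_MS[modality] - SLOW_WAVE_WINDOWS_MS[modality][1]
    for modality in ("A", "V")
}
```

Under these assumed reaction times, both windows end 275 ms before the mean response, in line with the "approximately 270 ms" stated above.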

2.7. Statistics

Repeated measure ANOVAs with Greenhouse–Geisser correction were performed for analysis of the behavioral and electrophysiological data. Except for response accuracy, all measures were analyzed separately for auditory and visual targets. Behavioral data analysis included the factors cross-modal Transformation and Switching (Table 1). For technical reasons (hardware problem), the response accuracy (RA) and reaction time (RT) data of one subject had to be excluded from analysis. The subject’s data were used in the electrophysiological analysis.

Electrophysiological data analysis included the factors Transformation, Switch, Electrode position (AP, anterior: F7, Fz, F8, central: T7, Cz, T8, posterior: P7, Pz, P8) and Hemisphere (HM, left: F7, T7, P7, central: Fz, Cz, Pz, right: F8, T8, P8). Due to large differences in the ERPs as a function of target modality, electrophysiological data were analyzed separately for each modality.

Fig. 2. Grand average event-related potentials from all subjects (n = 14) to auditory targets in all conditions. N100, P200 and slow wave amplitudes were analyzed (see text).

Fig. 3. Grand average event-related potentials from all subjects (n = 14) to visual targets from all conditions. N100, P200 and slow wave amplitudes were analyzed (see text).

Fig. 4. (a) Reaction times (RT) as a function of the experimental conditions. Visual RTs were significantly prolonged for AAV and VAV as compared to VVV and AVV (P < 0.005). (b) Response accuracy (RA) as a function of the experimental conditions. Auditory RA was significantly higher for AAA and AVA as compared to VAA and VVA (P < 0.05). Error bars denote SEM.
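For the two-level factors Transformation and Switch, each main effect and their interaction has one numerator degree of freedom. In a fully within-subject design such an F test equals the squared paired t statistic of the corresponding per-subject contrast, with n − 1 denominator degrees of freedom, and the Greenhouse–Geisser correction leaves one-degree-of-freedom effects unchanged. The sketch below illustrates this equivalence for auditory targets. It is not the authors' analysis script, and the function name, data layout and weight dictionaries are assumptions.

```python
import math
from statistics import mean, stdev

# Contrast weights for auditory-target conditions (2-back, 1-back, target).
TRANSFORMATION = {"VAA": 0.5, "VVA": 0.5, "AAA": -0.5, "AVA": -0.5}
SWITCH = {"AVA": 0.5, "VVA": 0.5, "AAA": -0.5, "VAA": -0.5}
INTERACTION = {"VVA": 1.0, "VAA": -1.0, "AVA": -1.0, "AAA": 1.0}


def contrast_f(scores, weights):
    """F statistic and degrees of freedom for a 1-df within-subject contrast.

    `scores` is a list with one dict per subject mapping condition labels
    to that subject's mean value (RT, amplitude, ...). The per-subject
    contrast values must not all be identical, otherwise the standard
    deviation is zero and the statistic is undefined.
    """
    contrast = [sum(w * subject[c] for c, w in weights.items()) for subject in scores]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / math.sqrt(n))
    return t * t, (1, n - 1)
```

With 13 subjects this gives F(1,12), as in the behavioral analyses, and with 14 subjects F(1,13), as in the electrophysiological analyses. The Electrode position and Hemisphere factors have three levels each and would need a full repeated-measures ANOVA, which this sketch does not cover.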

3. Results

3.1. Behavioral data

3.1.1. Auditory reaction times

Auditory RTs were generally about 150 ms slower than

visual RTs (mean ≈ 825 ms, see Fig. 4a) due to the

extended presentation time for spoken words compared to

pictures. Statistical analysis of auditory RTs did not reveal

significant effects for Transformation (F(1,12) = 1.228, P =

0.290), Switch (F(1,12) = 4.008, P = 0.068) or Transformation × Switch (F(1,12) = 0.433, P = 0.523).

3.1.2. Visual reaction times

Analysis of visual RTs revealed a significant main effect

of Switch (F(1,12) = 13.520, P < 0.005). RTs for visual

targets were significantly increased when the 1-back item

was presented auditorily as opposed to when it was presented

visually (see Fig. 4a). No significant effects were found for

the factors Transformation (F(1,12) = 1.790, P = 0.206) and

Transformation × Switch (F(1,12) = 1.306, P = 0.275).

3.1.3. Response accuracy

Response accuracy was generally above or around 80% in

all conditions (Fig. 4b). Statistical analysis of response

accuracy revealed a significant Transformation × Target

Modality interaction (F(1,12) = 7.129, P < 0.05). To further

explore this interaction, Transformation effects were ana-

lyzed for each modality separately. For auditory targets, a

main effect for Transformation reached significance (F(1,12) =

8.662, P < 0.05), indicating that participants made signifi-

cantly more errors in auditory trials when the 2-back stimulus

was visual compared to when it was auditory. For visual

trials, neither Transformation (F(1,12) = 3.933, P = 0.071) nor

Transformation × Switch (F(1,12) = 3.792, P = 0.075)

attained significance.

3.2. Electrophysiological data

3.2.1. Auditory targets

Grand average potentials for auditory targets with data

pooled across all participants (n = 14) are depicted in Fig. 2.

Auditory target items gave rise to an N100-P200 complex,

followed by slow wave activity.

3.2.1.1. N100. The interactions Transformation × Switch (F(1,13) = 12.594, P < 0.005) and Transformation × Switch × Hemisphere (F(2,26) = 6.748, P < 0.01) were significant. Further analysis of the three-way interaction revealed significant Transformation × Switch interactions at left (F(1,13) = 9.525, P < 0.01), central (F(1,13) = 12.774, P < 0.01) and right hemisphere sites (F(1,13) = 9.708, P < 0.01). As can be seen from Fig. 2, N100 effects were most pronounced at site Cz. At Cz, the interaction Transformation × Switch was significant (F(1,13) = 11.978, P < 0.005, see Fig. 5a). Further analysis revealed that here, N100 amplitude in condition VVA was significantly larger than in conditions AVA (P < 0.005), VAA (P < 0.005) and AAA (P < 0.05).

There were no significant effects of Transformation

(F(1,13) = 0.700, P = 0.418), Switch (F(1,13) = 0.067, P =

0.800) or Transformation × Switch (F(1,13) = 0.730, P =

0.408) on auditory N100 peak latency.

3.2.1.2. P200. For P200 amplitude, a main effect of

Transformation emerged (F(1,13) = 9.369, P < 0.01).

Auditory P200 amplitude was thus significantly reduced

for trials where the 2-back was presented visually compared

to auditory 2-back presentation. P200 amplitude was most

pronounced at site Pz. The main effect of Transformation

was also significant at site Pz (F(1,13) = 9.005, P < 0.05, see

Fig. 5b), with VAA and VVA exhibiting smaller amplitudes

than AAA and AVA.

There were no significant effects of Transformation

(F(1,13) = 0.026, P = 0.873), Switch (F(1,13) = 1.545, P =

0.236) or Transformation × Switch (F(1,13) = 0.031, P =

0.862) on P200 latency.

3.2.1.3. Slow wave. A significant main effect of Transformation on auditory slow wave amplitude was found (F(1,13) = 6.671, P < 0.05). Visual inspection indicated that this effect was most pronounced at the left-frontal electrode F7. At site F7, the main effect of Transformation was also significant (F(1,13) = 8.811, P < 0.05, graph not shown). The interaction Transformation × Switch did not reach significance (F(1,13) = 0.077, P = 0.786).

Fig. 5. Mean amplitudes of auditory N100 at site Cz (a) and auditory P200 at site Pz (b). The observed effects were usually significant across all electrodes or a large subset of electrodes submitted to the ANOVAs (see text). They were most pronounced at the depicted electrode positions. Significance levels (*P < 0.05) indicate results of ANOVAs conducted at the selected electrodes. (a) Auditory N100 amplitude was larger in condition VVA as compared to the other conditions. (b) Auditory P200 was significantly reduced for Transformation conditions VAA and VVA as compared to AAA and AVA. Error bars denote SEM.

3.2.2. Visual targets

Grand average potentials for visual targets with data

pooled across all participants (n = 14) are depicted in Fig. 3.

Visual targets also gave rise to an initial N100 peak

followed by a positive deflection. A pronounced positive

slow wave followed.

3.2.2.1. N100. For visual targets, a main effect of Switch

(F(1,13) = 12.066, P < 0.005) emerged. The interaction

Transformation × Switch, however, was not significant

(F(1,13) = 4.593, P = 0.052). An examination of the Switch

effect revealed that visual N100 amplitude was generally

larger when the 1-back item had been presented visually as

opposed to auditorily. Furthermore, there was a significant

Switch × Electrode position interaction (F(1,13) = 5.636,

P < 0.05), indicating that the Switch effect was significant

at anterior (F(1,13) = 13.364, P < 0.005) and central

electrodes (F(1,13) = 11.785, P < 0.005), but was not

significant at parietal sites (F(1,13) = 4.464, P = 0.055).

Visual N100 effects were most pronounced at site Cz.

Here, the main effect of Switch was also significant

(F(1,13) = 8.059, P < 0.05, see Fig. 6a).

3.2.2.2. P200. For visual P200, the interaction Switch × Hemisphere reached significance (F(1,13) = 8.671, P <

0.005). However, a subsequent analysis of this effect for

each hemisphere separately did not yield significant effects

for Switch at left (F(1,13) = 0.708, P = 0.415), central

(F(1,13) = 4.481, P = 0.054) or right electrodes (F(1,13) =

0.281, P = 0.602).

3.2.2.3. Slow wave. A significant main effect of Switch was found (F(1,13) = 16.234, P < 0.005) with increased amplitudes for conditions AAV and VAV compared to VVV and AVV. The factor Transformation was not significant (F(1,13) = 4.396, P = 0.056). The interactions Switch × Hemisphere (F(2,26) = 6.198, P < 0.05), and Transformation × Switch × Hemisphere (F(2,26) = 5.137, P < 0.05), also reached significance. A subsequent analysis of the latter 3-way interaction for each hemisphere separately revealed that the interaction Transformation × Switch reached significance at central electrodes (F(1,13) = 4.708, P < 0.05) but not at left (F(1,13) = 0.058, P = 0.813) or right hemisphere sites (F(1,13) = 2.353, P = 0.149). This effect was most pronounced at site Pz. At this site, the main effect of Switch was also significant (F(1,13) = 11.978, P < 0.005, see Fig. 6b).

Fig. 6. Mean amplitudes of visual N100 at site Cz (a) and visual slow wave at site Pz (b). The observed effects were usually significant across all electrodes or a large subset of electrodes submitted to the ANOVAs (see text). Effects were most pronounced at the depicted electrode positions. Significance levels (*P < 0.05; **P < 0.01) indicate results of ANOVAs conducted at the selected electrodes. (a) Visual N100 was significantly reduced for the Switch conditions VAV and AAV compared to VVV and AVV. (b) Visual slow wave amplitude in the time window 200–400 ms poststimulus was significantly increased for the Switch conditions VAV and AAV compared to VVV and AVV. Error bars denote SEM.

4. Discussion

In the present study, visuo-verbal working memory

processes were investigated in a cross-modal 2-back

paradigm. The task involved modality switching between

encoding and retrieval as well as between 1-back item and

target item. Behavioral and electrophysiological responses

to auditory targets were found to be modulated by the

presentation modality of the 2-back item, whereas responses

to visual targets were primarily modulated by the modality

of the 1-back item. Transformation trials were those trials

where target and 2-back item were not in the same modality,

whereas Switch trials were those trials where target and 1-

back item were not in the same modality.

Increased error rates were found for auditory Trans-

formation trials. This effect was not found for visual

Transformation trials. Behavioral data indicate that this

effect only occurs when an auditory target item has to be

matched to a visually encoded 2-back item but not vice

versa. As one cognitive component of the 2-back paradigm

is an embedded matching subtask [31], this effect could be

attributed to an increased difficulty in performing this task

in auditory Transformation trials. The data may thus point

towards an asymmetry with respect to the ease with which

auditorily and visually encoded verbalizable material can be

successfully matched in working memory. It has been

argued that auditorily encoded verbal material is stored in a

phonological code that has more sensory aspects than the

phonological code which is created when verbalizable

visual material is phonologically recoded [22]. In the

context of her “separate-streams model”, Penney [22] has

termed this code the A-code (acoustic code) as opposed to

the P-code, which corresponds to the memory trace created

upon phonological re-coding of visual material. The RA

data of the present study indicate that a representation in P-

code is matched more easily onto a representation in A-code

than the reverse, thus yielding further evidence for the

proposed distinction by Penney [22].

RT analysis revealed prolonged RTs for visual Switch

trials. This effect was not observed for auditory Switch

trials. In the model of separate processing streams for

auditory verbal and visual verbalizable material, Penney has


presented behavioral evidence for different temporal char-

acteristics of these streams [22]. More recently, Ruchkin and

colleagues have reported encoding modality-dependent

patterns of rehearsal-related brain activity supporting

Penney’s proposal [25]. In this context, the observed RT

effects can be interpreted as further evidence for distinct

temporal characteristics of short-term storage of verbal-

izable material as a function of the encoding modality. We

suggest that encoding of a novel verbalizable picture into

working memory requires re-activation of a visual or visuo-

spatial store that may correspond to the visual–verbal store

described by Penney as well as to Baddeley’s visuo-spatial

sketchpad. In accordance with the view that activity in this

store decays more rapidly than in the auditory store, longer

RTs for visual Switch trials possibly reflect increased

processing resources allocated to the re-activation of the

visual store.

Generally, the suggested interpretation of the behavioral

data in terms of Penney’s separate-streams model is also

supported by the electrophysiological data. The distinction

between a representation of auditory items in A-code and of

phonologically recoded visual items in P-code, which has fewer auditory sensory properties and faster decay, is reflected

by the findings for auditory N100. N100 amplitude was

largest for condition VVA as opposed to the other

conditions. Firstly, the possibility that this finding is merely

a result of attenuation or of refraction, processes known to

modulate N100 amplitude [20], needs to be considered.

Attenuation can be ruled out because, in this case, a similar

(though less pronounced) effect would be expected for

condition AVA. However, a tendency for the reverse pattern

was found. An explanation of the observed N100 effects in

terms of refractory processes can be excluded. Refractory

effects would imply reduced N100 amplitude after repetitive

auditory stimulation due to saturation. As there were no

significant differences between AVA on the one hand and

AAA and VAA on the other hand, the data do not suggest

that refractory effects played a role. Since the N100 is

generated partly in primary and secondary auditory cortices

[20], which are known to be involved in auditory working

memory [5], it is likely that the N100 findings can be

interpreted in the context of working memory processes.

Auditory N100 has been previously shown to index working

memory operations [6,10,11].

N100 amplitude for auditory probe items was previously

found to decrease linearly as memory load of auditorily

presented digits increased [6]. In response to auditory

probes, this effect was absent for visually encoded memory

set items in a Sternberg task [11]. Thus, N100 reduction

appears to index memory load for auditorily encoded items

only. The present finding of largest N100 for VVA, the

condition with no auditorily encoded task relevant items in

working memory (2-back and 1-back item have been

encoded visually), is therefore in accordance with the

literature. Additionally, these findings support the notion

that auditorily and visually encoded material is represented

in qualitatively different codes in working memory. Given

that the effect is found in the N100 component, a component

that has been argued to partly reflect activity in auditory

cortices [20], this finding is supportive of Penney’s proposal

that material needs to be encoded auditorily in order to be

represented in the more sensory-based A-code.

Transformation gave rise to a reduced auditory P200

amplitude. In functional terms, modulation of the auditory

P200 has been implicated in attentional processing [16,32].

Attenuation of auditory P200 amplitude has been suggested

to reflect reduced allocation of attentional resources towards

the eliciting stimulus [21]. The modulation of the P200 can

therefore be interpreted in terms of a modulation of the

matching subtask of the 2-back [31]. Unlike in the

commonly used Sternberg paradigm [30], in the 2-back

task, participants know that they have to match the target

stimulus to the 2-back item. Thus, an expectancy for the

target item or for features of the target item can be

generated. In the present study, the expectation for a visual

target when the 2-back had been encoded visually was

reflected in reduced P200 amplitude for the auditory target.

Had the visual items been converted into a representational

code exactly matching the representation after auditory

encoding, no such effect would be expected. Additionally,

the P200 findings parallel the behavioral findings for RA.

Reduced attentional resource allocation to the target item

may be one potential source for the increased error rates in

auditory Transformation trials.

Along similar lines, the findings of prolonged RTs for

visual Switch trials are paralleled by increased slow wave

activity in these trials. Furthermore, numerically, slow wave

amplitude increased the longer ago the last visual stim-

ulation occurred. At least two alternative interpretations of

this effect appear plausible. On the one hand, positive

parietal slow waves have been shown to index the retention

of visual object information [4]. Also, slow wave ampli-

tudes were found to be more negative in conditions of

higher visuo-spatial memory load [24]. Consistent with

these results, we observed the most pronounced positive

slow wave amplitude in AAV. If we assume a partially

visuo-spatial representation of the pictures in working

memory, AAV clearly has the lowest visuo-spatial memory

load of all visual conditions. The present slow wave

findings may thus reflect differences in the amount of

visuo-spatial information that is held in working memory.

On the other hand, in the context of Penney’s “separate-streams model”, it could be argued that the slow wave data

reflect a switching process that occurs when a more

transient store for visual material is reactivated after

encoding of an auditory item. The latter interpretation

would be more consistent with the findings for visual RTs.

Most likely, however, the present findings reflect a

combination of both factors.

As far as the representation of verbalizable material in

working memory is concerned, the present results indicate

that, depending on the encoding modality, differences can


be observed, both on the behavioral and the electro-

physiological level. Furthermore, the present findings are

consistent with the hypothesis that the phonological

representation of auditorily encoded verbal material only

partially overlaps with the representation of phonologically

recoded verbalizable linedrawings. What do the present

results contribute to the question of the underlying neural

mechanisms? Patients with left-hemisphere lesions are

consistently reported to show impaired verbal memory,

whereas a greater impairment of nonverbal memory is

found in right hemisphere lesion patients [17,18,23].

Similarly, neuroimaging studies have reported a functional

specialization of the left hemisphere for verbal encoding

and the right hemisphere for nonverbal encoding, in regions

of the medial temporal lobes and the prefrontal cortex

[9,14]. More specifically, the involvement of left and right

hemisphere in memory encoding has been found to be

modulated by the degree to which the encoded material

could be verbalized. In one study, left lateralized encoding

activity was observed for words, right lateralized activity

for unfamiliar faces and bilateral frontal activation for

namable objects [14]. This may be the reason why no

lateralized effects have been found in the present study. As

a verbalization strategy was clearly the most efficient

way to perform the cross-modal 2-back task, it is likely that

no subject adopted a purely visuo-spatial rehearsal strategy.

Generally, our results lend some support to the notion that

auditorily encoded words are represented in a more sensory-

based code than are phonologically recoded pictures. It has

recently been proposed that working memory rehearsal

involves the activation of posterior sensory cortices by

regions in the prefrontal cortex [26]. Thus, by means of

“attentional pointers” [26], prefrontal cortex is thought to

maintain activity in those brain regions that have initially

been involved in the perception of the particular stimulus.

The present auditory N100 effects lend some support to this

idea, as this ERP component is thought to reflect to a

considerable extent the activity of primary and secondary

auditory association cortices [20]. Our results suggest that

these regions may be more involved in the rehearsal of

auditorily as compared to visually encoded verbalizable

material.

While the present results support previous findings

concerning more persistent storage of auditorily encoded

verbal material, the question arises whether this is a unique feature

of the auditory modality in general, or whether it pertains only to

phonological material. Addressing this question would

necessitate the use of non-phonological auditory material

such as non-verbalizable sounds. It has been suggested that

rehearsal of auditory material may be facilitated by its

intrinsic temporal characteristics [22]. As this applies not only

to verbal but also, to a certain extent, to non-verbal

auditory material, temporal persistence may be hypothesized

to be a more general feature of the auditory modality,

regardless of whether the material is verbal or not. Further

research addressing this issue is clearly needed.

Acknowledgments

This research was funded by the International Graduate

School of Neuroscience, Ruhr-University Bochum, and the

German Research Society (DA 259/9-1).

References

[1] A. Baddeley, Working Memory, Oxford Univ. Press, New York,

1986.

[2] A. Baddeley, The episodic buffer: a new component of working

memory? Trends Cogn. Sci. 4 (2000) 417–423.

[3] A. Baddeley, G. Hitch, Working memory, in: G.H. Bower (Ed.), The

Psychology of Learning and Motivation, vol. 8, Academic Press, New

York, 1974.

[4] V. Bosch, A. Mecklinger, A.D. Friederici, Slow cortical potentials

during retention of object, spatial, and verbal information, Brain Res.

Cogn. Brain Res. 10 (2001) 219–237.

[5] M. Colombo, M.R. D’Amato, H.R. Rodman, C.G. Gross, Auditory

association cortex lesions impair auditory short-term memory in

monkeys, Science 247 (1990) 336–338.

[6] E.M. Conley, H.J. Michalewski, A. Starr, The N100 auditory cortical

evoked potential indexes scanning of auditory short-term memory,

Clin. Neurophysiol. 110 (1999) 2086–2093.

[7] S.M. Courtney, L. Petit, J.V. Haxby, L.G. Ungerleider, The role of

prefrontal cortex in working memory: examining the contents of

consciousness, Philos. Trans. R. Soc. Lond., B Biol. Sci. 353 (1998)

1819–1828.

[8] S. Crottaz-Herbette, R.T. Anagnoson, V. Menon, Modality effects in

verbal working memory: differential prefrontal and parietal responses

to auditory and visual stimuli, NeuroImage 21 (2004) 340–351.

[9] A.J. Golby, R.A. Poldrack, J.B. Brewer, D. Spencer, J.E. Desmond,

A.P. Aron, J.D. Gabrieli, Material-specific lateralization in the medial

temporal lobe and prefrontal cortex during memory encoding, Brain

124 (2001) 1841–1854.

[10] E.J. Golob, A. Starr, Serial position effects in auditory event-related

potentials during working memory retrieval, J. Cogn. Neurosci. 16

(2004) 40–52.

[11] E.J. Golob, A. Starr, Visual encoding differentially affects auditory

event-related potentials during working memory retrieval, Psycho-

physiology 41 (2004) 186–192.

[12] G. Gratton, M.G. Coles, E. Donchin, A new method for off-line

removal of ocular artifact, Electroencephalogr. Clin. Neurophysiol. 55

(1983) 468–484.

[13] J. Jonides, E.E. Smith, The architecture of working memory, in: M.D.

Rugg (Ed.), Cognitive Neuroscience, Psychology Press, Hove, 1997,

pp. 243–276.

[14] W.M. Kelley, F.M. Miezin, K.B. McDermott, R.L. Buckner, M.E.

Raichle, N.J. Cohen, J.M. Ollinger, E. Akbudak, T.E. Conturo, A.Z.

Snyder, S.E. Petersen, Hemispheric specialization in human dorsal

frontal cortex and medial temporal lobe for verbal and nonverbal

memory encoding, Neuron 20 (1998) 927–936.

[15] P. Lovatt, S.E. Avons, J. Masterson, The word-length effect and

disyllabic words, Q. J. Exp. Psychol., A 53 (2000) 1–22.

[16] G.R. Mangun, S.A. Hillyard, Selective attention: mechanisms and

models, in: M.D. Rugg, M.G. Coles (Eds.), Electrophysiology of

Mind, Oxford University Press, 1995.

[17] B. Milner, Interhemispheric differences in the localization of psycho-

logical processes in man, Br. Med. Bull. 27 (1971) 272–277.

[18] B. Milner, Disorders of learning and memory after temporal lobe

lesions in man, Clin. Neurosurg. 19 (1972) 421–446.

[19] D.J. Murray, Articulation and acoustic confusability in short-term

memory, J. Exp. Psychol. (1968) 679–684.

[20] R. Näätänen, T. Picton, The N1 wave of the human electric and

magnetic response to sound: a review and an analysis of the

component structure, Psychophysiology 24 (1987) 375–425.

[21] S. Oray, Z.L. Lu, M.E. Dawson, Modification of sudden onset

auditory ERP by involuntary attention to visual stimuli, Int. J.

Psychophysiol. 43 (2002) 213–224.

[22] C.G. Penney, Modality effects and the structure of short-term verbal

memory, Mem. Cogn. 17 (1989) 398–422.

[23] M. Petrides, B. Milner, Deficits on subject-ordered tasks after

frontal- and temporal-lobe lesions in man, Neuropsychologia 20

(1982) 249–262.

[24] P. Rämä, K. Kesseli, K. Reinikainen, J. Kekoni, H. Hämäläinen, S.

Carlson, Visuospatial mnemonic load modulates event-related slow

potentials, NeuroReport 8 (1997) 871–876.

[25] D.S. Ruchkin, R.S. Berndt, R. Johnson Jr., W. Ritter, J. Grafman, H.L.

Canoune, Modality-specific processing streams in verbal working

memory: evidence from spatio-temporal patterns of brain activity,

Brain Res. Cogn. Brain Res. 6 (1997) 95–113.

[26] D.S. Ruchkin, J. Grafman, K. Cameron, R.S. Berndt, Working

memory retention systems: a state of activated long-term memory,

Behav. Brain Sci. 26 (2003) 709–728 (discussion 728–77).

[27] E.H. Schumacher, E. Lauber, E. Awh, J. Jonides, E.E. Smith, R.A.

Koeppe, PET evidence for an amodal verbal working memory system,

NeuroImage 3 (1996) 79–88.

[28] E.E. Smith, J. Jonides, Working memory: a view from neuroimaging,

Cogn. Psychol. 33 (1997) 5–42.

[29] J.G. Snodgrass, M. Vanderwart, A standardized set of 260 pictures:

norms for name agreement, image agreement, familiarity, and visual

complexity, J. Exp. Psychol. Hum. Learn. 6 (1980) 174–215.

[30] S. Sternberg, High-speed scanning in human memory, Science 153

(1966) 652–654.

[31] S. Watter, G.M. Geffen, L.B. Geffen, The n-back as a dual-task: P300

morphology under divided attention, Psychophysiology 38 (2001)

998–1003.

[32] M.G. Woldorff, S.A. Hillyard, Modulation of early auditory process-

ing during selective listening to rapidly presented tones, Electro-

encephalogr. Clin. Neurophysiol. 79 (1991) 170–191.
