
What the Networks Tell us about Serial and Parallel Processing:

An MEG Study of Language Networks and N-gram Frequency Effects in Overt Picture

Description

Antoine Tremblay1,2,3, Elissa Asp2, Anne Johnson1, Małgorzata Zarzycka Migdał4, Tim

Bardouille1, and Aaron J. Newman1

1Dalhousie University, Halifax; 2Saint Mary’s University, Halifax; 3NovaScape Data Analysis and Consulting, Canada; 4Codility, USA

Abstract

A large literature documenting facilitative effects for high frequency complex words and phrases has led to proposals that high frequency phrases may be stored in memory rather than constructed on-line from their component parts (similarly to high frequency complex words). To investigate this, we explored language processing during a novel picture description task. Using the magneto-encephalography (MEG) technique and generalised additive mixed-effects modelling, we characterised the effects of the frequency of use of single words as well as two-, three-, and four-word sequences (N-grams) on brain activity during the pre-production stage of unconstrained overt picture description. We expected amplitude responses to be modulated by N-gram frequency such that if N-grams were stored we would see a corresponding reduction or flattening in amplitudes as frequency increased. We found that while amplitude responses to increasing N-gram frequencies corresponded with our expectations about facilitation, the effect appeared at low frequency ranges and for single words only in the phonological network. We additionally found that high frequency N-grams elicited activity increases in some networks, which may be signs of competition or combination depending on the network. Moreover, this effect was not reliable for single word frequencies. These amplitude responses do not clearly support storage for high frequency multi-word sequences. To probe these unexpected results, we turned our attention to network topographies and the timing. We found that, with the exception of an initial 'sentence' network, all the networks aggregated peaks from more than one domain (e.g. semantics and phonology). Moreover, although activity moved serially from anterior ventral networks to dorsal posterior networks during processing, as expected in combinatorial accounts, sentence processing and semantic networks ran largely in parallel. Thus, network topographies and timing may account for (some) facilitative effects associated with frequency. We review literature relevant to the network topographies and timing and briefly discuss our results in relation to current processing and theoretical models.

Keywords: scene description; N-gram frequency; serial/parallel processing; magneto-encephalogram (MEG); generalized additive mixed-effects modelling (GAMM).


1. Introduction

This paper addresses the question “do we store highly frequent multi-word sequences (N-

grams)?”. The question arises in the context of a collection of contemporary linguistic models

which have offered evidence and arguments that the mental lexicon consists of more than words

and morphemes. Such models are theoretically and descriptively quite various, but share (at

least) four common features. They are typically oriented to language use, lexicalist, give a

prominent role to memory along with other domain general processes in accounting for linguistic

productivity and acquisition, and are tolerant of redundancy (e.g., Abbot-Smith & Tomasello,

2006; Bybee & McClelland, 2005; Goldberg 2003; Hagoort, 2005, 2013; Jackendoff, 2002a,

2011). Such models are lexicalist in that they tend to ground structure building (one way or

another) in the lexicon rather than in a discrete (modular) set of syntactic algorithms. Moreover,

they tend to treat the lexicon-grammar relationship as a continuum rather than a divide. On one

end of that continuum are unique words (noise, beaver, chew), morphemes (the morph, -eme, and

-s in morphemes), and idioms (kick the bucket, break a leg) all of which are conventionally

lexical in exhibiting arbitrary associations of form and meaning. That is, the meaning GOOD

LUCK associated with the idiom break a leg cannot be assigned to any of the expression's parts

any more than the meaning LARGE SEMI-AQUATIC RODENT can be assigned to any part of

beaver. In the middle of the continuum are partially filled constructions – structures with some

lexical information specified and one or more variables ranging from make no bones about X

which is an idiom with a 'fillable' position, to to X one's way Y, where the X can be filled literally

by manner of motion verbs (wiggle, inch, creep) or by any other verb that can be coerced into a

motion + goal + simultaneous activity/means interpretation as in to belch/fight one's way to the

door (e.g. Jackendoff, 2011). The point about such constructions is that their meanings are a

function of the construction per se since there are no plausible decomposition accounts. They


must be stored in memory too. At the other end of the continuum are empty bits of structure,

'treelets' for basic phrase structures, which some linguists now include in what is stored in

memory (e.g. Culicover & Jackendoff, 2005; Goldberg 2003, 2006; Hagoort 2005). Thus, in

such paradigms, memory is absolutely central to language acquisition and use since linguistic

knowledge is assumed to emerge via generalization and abstraction (i.e. domain general

processes) operating over linguistic experience stored in memory.

Jackendoff has also proposed a 'parallel architecture model' which explicitly rejects

seriality and strict modularity. In this model, semantic, syntactic, and phonological information is

stored in long term memory in independent generative components, but related by (violable)

constraints or interface rules, and may be accessed in parallel (e.g. Jackendoff 2002, 2007, 2011;

Culicover & Jackendoff, 2005, also Lamb, 1966 for related earlier models; and Hagoort 2005,

2013 for psycholinguistic framing of the model). Producing and understanding language thus

reduces to retrieval from memory in different domains, integration (or unification) of the

retrieved data, and control processes that modulate retrieval and unification (e.g. Hagoort 2005,

2013; Jackendoff 2007). Ullman (2007) and Jackendoff (2011) also argue that redundancy is

ubiquitous in biological systems and, although it has costs, it also has benefits in terms of

adaptive flexibility. Systems can complement and compensate for each other, optimizing or

preserving levels of function that might otherwise be compromised. Being able to either retrieve

partially or fully constructed bits of phrases from memory or construct them on-line would seem

to be advantageous in precisely this way, and need not be viewed as mutually exclusive

possibilities.

The question we began with in this environment was: if we allow redundancy in the

mental lexicon, is it possible that we store not only idioms and frequent constructions (partially

filled or not) but also fully lexicalized semantically transparent (i.e. non-idiomatic) highly


frequent sequences which do not necessarily coincide with phrasal boundaries? That is, do we

store, for instance, sequences (N-grams) such as in the, in the middle, in the middle of, and at the,

at the end, at the end of and so on, as well as the words that constitute them?

1.1. Evidence for stored multi-word sequences

The idea that we do store such sequences has been motivated by a number of

experimental findings. When comparing high frequency multi-word sequences to low frequency

multi-word sequences it was found that high frequency sequences (A) are fixated on for a shorter

period of time (Underwood, Schmitt, & Galpin, 2004; Siyanova-Chanturia, Conklin, & Heuven,

2011; Columbus, 2012), (B) are responded to more quickly and with fewer errors (Jiang &

Nekrasova, 2007), (C) are read faster and recalled from short term memory with higher accuracy

(Columbus, 2010; Tremblay et al., 2011), and (D) are more difficult to parse than low frequency

sequences (Sosa & MacFarlane, 2002). Moreover, the greater the frequency of a sequence, (E)

the faster it is judged to be grammatically well formed and the more it primed its final word

(Ellis & Simpson-Vlach, 2009), (F) the faster participants decided whether a sequence was

possible in English (Arnon & Snider, 2010), (G) the higher the probability of correctly recalling

a four-word sequence (Tremblay & Baayen, 2010), and (H) production began faster and

production durations were shorter (Tremblay & Tucker, 2011). The phrasal frequency effects

found in these studies might be taken to support the idea that we store, in one form or another, at

least some aspects of compositional multi-word sequences (see also Gregory, Raymond, Bell,

Fosler-Lussier, & Jurafsky, 1999, Jurafsky, Bell, & Raymond, 2001, Bell, Brenier, Gregory,

Girand, & Jurafsky, 2009). However, the latter studies (E-G) showed no discontinuity in the

frequency effects – such as for example a sudden increase or decrease in reaction time occurring

at a certain frequency value – based on which one could deduce that a change occurred in the


brain and that phrases could be divided into high and low frequency items. This suggests that

although we store probabilistic information about compositional multi-word sequences, the

monotonic effect of frequency does not support the existence of two distinct categories of

phrases.

These latter results may be attributable to the fact that, in the statistical analyses, the

relationship between frequency of use and behaviour was assumed to be a straight line. Such an

assumption precludes finding discontinuities in frequency effects. If one relaxes this assumption

and allows the relationship to be wiggly (e.g., exponential, logarithmic, sinusoidal, etc.), such

discontinuities may be found. In their word-monitoring experiment, Kapatsinski and Radicke

(2009) found a U-shaped relationship between frequency and the parsability of a verb + up sequence, where the

lowest point of this curve could be viewed as a threshold for storage. In their four-word sequence

recall experiment with electro-encephalographic (EEG) recordings, Tremblay and Baayen (2010)

found a logarithmic relationship between the probability of occurrence of compositional

quadgrams (e.g. in the middle of) and the amplitude of a positive going event-related potential

(ERP) peaking 110–150 milliseconds after presentation (the P100). The amplitude of the P100

decreased for quadgrams with a very low to medium probability of occurrence, but for medium

to very high quadgrams, the amplitude of the P100 remained flat. The point where the probability

of occurrence effect stopped decreasing and remained flat could be interpreted as indexing

retrieval of multi-word sequences from memory. It was this possibility that motivated the current

experiment.

1.2. The Present Study

In this magneto-encephalography (MEG) study, we used data-driven methods to explore

(a) a set of thirty left hemisphere brain areas identified by meta-analysis as parts of the language


processing network (Vigneau et al., 2006) which were active during a relatively unconstrained

scene description task, (b) how these areas assembled into networks in real time during

processing, and (c) whether any of these networks showed frequency effects for single words,

and two, three, and four word sequences (N-grams). Our leading question was whether

frequency of use for N-grams would modulate amplitude responses in the MEG signal in ways

consistent with the hypothesis that high frequency multi-word sequences are stored in memory

like words. Given the behavioural and EEG research reviewed above, we expected that

amplitude responses would decrease with increasing frequency of N-grams, showing a

facilitative effect of increasing frequency. We also expected that if high-frequency multi-word

sequences were stored, we would see either (1) a 'flattening' of amplitude responses in the higher

frequency ranges that would index this (as in Tremblay & Baayen 2010), and/or (2) that

amplitude responses to high frequency multi-word N-grams would mimic amplitude responses to

high-frequency single words if they were stored in the same way. The latter hypothesis was to

allow for the possibility of, for instance, amplitude increases that might arise as a result of

competition in lexical access amongst N-grams (e.g. Pylkkänen et al., 2004; Solomyak &

Marantz, 2009; Simon et al., 2012). That is, if high-frequency N-grams are stored then they

might show competition effects (reflected in amplitude increases) comparable to those observed

for single words.

Given the robustness of behavioural experiments, MEG studies linking decreasing

amplitude responses to increasing frequency, and the absence at the time of MEG studies

exploring combinatorial effects, we did not begin with explicit hypotheses as to what amplitude

responses reflecting combinatorial processes would look like. However, we note here that MEG

studies of simple syntactic or conceptual combinatorial processes show amplitude increases in


anterior temporal lobe areas (e.g. Bemis & Pylkkänen, 2012; Del Prato & Pylkkänen, 2014). We

take up the fact that different processes may elicit amplitude increases in the discussion.

To foreshadow our results, we found only some of the effects we expected in the

amplitude responses, none of which could be convincingly interpreted as representing

unequivocal evidence for storage of high frequency multi-word sequences. Given this, we turned

to consider the network topographies and timing as possible sources for behavioural effects

associated with high frequency. Review of the timing and functional characterisation of network

areas revealed significant within and between network parallelism within an overall serial

processing cascade. These findings suggest that some frequency effects might be accounted for

in terms of the parallel processing apparent in network architectures and timing. We also take

these matters up in the discussion.

2. Methods: English Scene Description with MEG Recordings

To address our questions, we built on experiments that examined the production of

utterances during scene description (Levelt, 1983; Bock, 1986; Martin et al., 1989; Oomen &

Postma, 2001; Indefrey et al., 2001; Haller et al., 2007; Marek et al. 2007; Arbib & Lee, 2008;

Arbib, 2010). The main reason for using scene description is that it requires the speaker to access

and retrieve material from the mental lexicon and combine it to create utterances. It also allows a

range of variation among individuals in terms of the word sequences that are produced. It thus

provides an ecologically valid task for the questions we pose.

We recorded whole-head magneto-encephalographic (MEG) data during the scene

description experiment. We chose the MEG technique for its excellent temporal and spatial

resolution (Lu & Kaufman, 2003; Hansen et al., 2010) on the one hand, and, on the other, for its

recognized potential for testing theories and resolving questions in linguistics that are not readily


amenable to the traditional tools of linguistics (Marantz, 2005; Pylkkänen & Marantz, 2003). The

MEG technique affords us the possibility to characterise the timing and topography of networks

as modulated by frequency effects tied to N-grams.

2.1. Participants

We recruited ten healthy participants from the Halifax Regional Municipality, Nova Scotia,

Canada, for participation in the study. They were paid $50 for their participation. In the pre-scan

session, we recorded a range of demographic variables and subject characteristics which are

provided in Table S1 as Supplementary Data. We did not include any of these variables in our

statistical analyses. Our goal here was to determine whether we could draw generalisations at the

group level. It stands to reason that the effects we found here may vary to some degree as a

function of some of these subject characteristics. Nevertheless, the fact that an effect reached

statistical significance under the present circumstances with a heterogeneous group (in addition

to the random effect terms present in the model) would indicate that it may apply more

broadly. Future work with a larger sample size will investigate the potential influences of these

individual-level variables.

2.2. Experimental Design

Participants were asked to describe scenes. We selected 210 color pictures from the

Internet using the Google search engine, with usage rights “free to use, share, or modify”. These

pictures represented a wide range of scenes. We endeavoured to create a continuum with familiar

scenes on one extreme (a cup of coffee on a table, a woman texting on her cell phone, someone

taking a nap) and unfamiliar ones on the other (a bushman doing something that may be applying


poison to the tip of his arrows, cacti growing in a desert, a blacksmith forging a tool). The

pictures are provided as Supplementary Data. Ten extra pictures were used for the practice

session at the beginning of the experiment.

The experiment was divided into six blocks. Each block was further divided into three sub-

blocks: (a) first an overt naming sub-block, (b) then a covert naming sub-block, and (c) finally a

describing sub-block (they were always presented in this order). Only the describing sub-block is

addressed in this paper. In the describing sub-block, participants were asked to describe the

pictures. They could say whatever they wanted about the pictures although they were asked to

limit their responses to about 2 sentences; actual examples include It looks like a man is sleeping

on a bench, I have no idea what this is or I was mistaken earlier, this is a man welding a

backhoe. For each participant, the 210 pictures were randomly ordered (so that no two

participants saw the same sequence of pictures) and then divided into six blocks (35 pictures for each

block). In each block, participants first overtly named the 35 pictures, then covertly named the

same 35 pictures, and finally described the same 35 pictures. The 35 pictures in each block were

presented in a different random order for each task, and were not shown in other blocks.

Participants had a two-minute break between each block. Each picture was presented on a non-

magnetic back-projection screen (112 cm in diameter) using Presentation, a stimulus delivery

and experimental control program (Neurobehavioral Systems, Inc.). The screen was situated

roughly one meter away from participants. A fixation cross appeared on the screen for 500 ms,

followed by a blank screen for a random length of time (between 500 and 1000 ms), and then a

picture. Participants were instructed to complete the task for that picture (e.g., describe it), then

press a button on a MEG compatible button box. The picture stayed on the screen until

participants pressed the button. Participants could begin speaking whenever they wished. A


microphone was positioned in the booth approximately 3 meters away from participants and the

MEG scanner, to minimise MEG artefacts.

2.3. Language Data

Participants’ verbal descriptions of the scenes were transcribed. The first four words of

every utterance were split into single words (unigrams), two-word sequences (bigrams), three-

word sequences (trigrams), and four-word sequences (quadgrams). None of the utterances

contained idioms or metaphors. The transcribed descriptions were subsequently coded as
syntactically complex if they included subordinate clauses of any kind or any re-ordering
associated with information structure; otherwise they were coded as simple. We included fillers
such as “ah” and “um”. Contractions such as don’t, can’t, and they’re were counted as

one word. Frequency counts were obtained from the Bing search engine constrained to Canadian

sites (see Thelwall & Sud, 2012, for details about frequency counts obtained from this search

engine).1
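To make the splitting concrete, the following is a small illustrative sketch in R (not the authors' code), using the quadgram in the middle of as an example; the helper function and object names are hypothetical.

```r
# Hypothetical helper: all contiguous n-word sequences from a vector of words.
ngrams <- function(words, n) {
  sapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "))
}

first_four <- c("in", "the", "middle", "of")   # first four words of one utterance
ngrams(first_four, 1)   # unigrams: "in" "the" "middle" "of"
ngrams(first_four, 2)   # bigrams:  "in the" "the middle" "middle of"
ngrams(first_four, 3)   # trigrams: "in the middle" "the middle of"
ngrams(first_four, 4)   # quadgram: "in the middle of"
```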

To compare unigram, bigram, trigram, and quadgram frequencies, we computed, for each

participant and each trial, the mean unigram, bigram, and trigram frequencies and then computed

the natural log of these values. There was substantial collinearity between the probabilistic

measures (condition number = 131.17). It is crucial to be able to ascertain that whatever

frequency effects we find are independent of other frequency effects. For example, in the event

that we find a quadgram frequency effect, we want to make sure that it is not confounded by the

frequency of smaller N-grams. Thus, we orthogonalised (i.e., made independent) in a

1 Based on research conducted by a number of corpus linguists (e.g., Keller & Lapata, 2003; Meyer, Grabowski, Han, Mantzouranis & Moses, 2003), the Bing frequency counts are as reasonably representative of the informal usage of our Nova Scotian participants as frequency counts that might have been obtained from the spoken component of any traditional corpus such as the British National Corpus or the Corpus of Contemporary American English where differences arise from dialects in the latter and from medium in the former.


unidirectional hierarchical manner bigram frequencies from unigram frequencies, trigram

frequencies from unigram and bigram frequencies, as well as quadgram frequency from unigram,

bigram, and trigram frequencies by way of linear regression. The resulting probabilistic measures

were confirmed to be orthogonal (condition number = 1; see Tremblay & Tucker, 2011, for more

details about collinearity and residualisation). A summary of the original frequency counts from

Bing and the scaled/residualised log values are provided in Table 1. It can be seen that the

distributions are approximately normal (although the left tails are somewhat long). We use these

orthogonalised variables in our statistical analyses.
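As an illustration of the orthogonalisation step, here is a hedged R sketch (not the authors' code); the data frame dat and the raw count columns Unigram, Bigram, Trigram, and Quadgram are assumptions, while the derived variable names follow Table 1.

```r
# Assumed: dat holds one row per trial with the (mean) raw frequency counts.
dat$sLogUnigram  <- as.numeric(scale(log(dat$Unigram)))
dat$rLogBigram   <- as.numeric(scale(resid(lm(log(Bigram)  ~ sLogUnigram, data = dat))))
dat$rLogTrigram  <- as.numeric(scale(resid(lm(log(Trigram) ~ sLogUnigram + rLogBigram,
                                              data = dat))))
dat$rLogQuadgram <- as.numeric(scale(resid(lm(log(Quadgram) ~ sLogUnigram + rLogBigram +
                                              rLogTrigram, data = dat))))

# One way to check collinearity: the condition number of the predictor matrix
# (orthogonal, equally scaled columns give a value of 1).
kappa(cbind(dat$sLogUnigram, dat$rLogBigram, dat$rLogTrigram, dat$rLogQuadgram),
      exact = TRUE)
```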

Table 1. Summary of unigram, bigram, trigram, and quadgram frequency counts (number of URLs containing the n-gram). The percentages are estimates of the underlying distribution quantiles at 0%, 25%, 50% (median), 75%, and 100% for each frequency count. s = scaled. r = residualised. Log = natural logarithm.

              0%           25%            50%             75%              100%
Unigram       28,397,500   727,350,000    1,038,735,000   1,383,250,000    2,810,000,000
sLogUnigram   -6.59        -0.49          0.18            0.72             2.05
Bigram        1,033,600    161,846,667    392,533,333     730,000,000      2,549,666,667
rLogBigram    -5.59        -0.56          0.17            0.70             3.17
Trigram       31,250       33,875,000     139,645,000     407,500,000      3,475,000,000
rLogTrigram   -8.30        -0.54          0.21            0.75             3.54
Quadgram      62           7,460,000      49,800,000      222,000,000      2,870,000,000
rLogQuadgram  -12.53       -0.51          0.37            0.83             3.46

2.4. Magneto-encephalographic Recordings

Prior to entering the MEG scanner room, electrodes were placed at the left and right canthi

and at the top and bottom of each participant’s left eye to detect eye-movements and blinks, on

the left and right jaw muscle (masseter muscle) to detect jaw movement artifacts from speech,

and on the left and right side of the neck, where the jugular veins are located, to record heart

beats and neck electro-myograms. A ground electrode was placed behind the neck. Electrodes


were also placed on the left and right side of the MEG system to record environmental noise in

the magnetically shielded room. Four head position indicator (HPI) coils were affixed to the

scalp (behind the left and right ears, and on the left and right side of the forehead). Using a 3D

digitiser (Isotrak, a trademark of Polhemus Navigational Sciences), we digitised the position of

the HPI coils on the scalp, as well as the three fiducial landmarks that were subsequently used for

MEG/MRI co-registration (nasion, left and right pre-auricular points). The head shape was also

digitised using approximately 200 points for a more accurate MEG/MRI co-registration.

MEG recordings were acquired using a 306-channel whole-head Elekta Neuromag system while

participants performed the task. The exact location of participants’ head within the helmet was

recorded using signals produced by the HPI coils. We determined that none of the participants

moved their head by more than 5 mm translation-wise and 5º rotation-wise. Total scan time per

subject was approximately one hour.

2.5. MEG Pre-processing

Initial pre-processing of the MEG recordings was performed on a PC running 64 bit Red

Hat Enterprise Linux 5 as follows. We first used Elekta’s proprietary MaxFilter software to

suppress magnetic interferences coming from inside and outside of the sensor array and to

compensate (to a certain extent) for disturbances due to head movements (Taulu & Simola, 2006).

The data was then exported to R version 2.15.2 (R Development Core Team, 2012) for further

pre-processing on a PC running 64 bit Ubuntu 12.04 LTS. The neuro-magnetic signals from each

of the three types of sensors were decomposed using independent components analysis (package

icaOcularCorrection; Tremblay, 2013). Components that strongly correlated with eye

movements, blinks, heart beat, jaw and neck muscle artefacts (as measured by the electrodes


placed on the participant to measure these signals directly), as well as noise within the booth

(measured from the EEG electrodes on each side of the MEG system to record this noise) were

removed. The data was then band-pass filtered from 0.1 to 100 Hz and down-sampled from 1000

Hz to 250 Hz. The MEG data was subsequently co-registered with participants' structural

magnetic resonance images (acquired on a 1.5 Tesla GE scanner; see

structural_MRI_scans_preprocessing.pdf in Supplementary Data for details) using MRI

integration software that is an integral part of the Elekta Neuromag Software installed on the PC

running Red Hat.
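For illustration, a minimal sketch of the band-pass filtering and down-sampling step described above is given below, written in R with the signal package rather than the authors' pipeline; x (one channel's samples at 1000 Hz) is a hypothetical object.

```r
library(signal)

fs <- 1000                                              # original sampling rate (Hz)
bf <- butter(2, c(0.1, 100) / (fs / 2), type = "pass")  # 0.1-100 Hz Butterworth band-pass
x_filtered <- filtfilt(bf, x)                           # zero-phase filtering of channel x
x_250Hz    <- x_filtered[seq(1, length(x_filtered), by = 4)]  # 1000 Hz -> 250 Hz
# Decimation without further filtering is acceptable here because the 100 Hz
# cut-off already lies below the new Nyquist frequency of 125 Hz.
```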

Following previous work conducted by the fifth author (e.g., Bardouille & Bow, 2012), we

used the signal space separation beamformer (Vrba et al., 2010) implemented in software

provided by Elekta Neuromag to derive the time courses of activity at each one of the thirty

language regions in the left hemisphere identified by Vigneau and colleagues’ large-scale
meta-analysis of 129 functional MRI studies of language (Table 4 in Vigneau et al., 2006, p. 1419).

These regions are listed in Table 2. The MNI coordinates were converted to Talairach space

using the Signed Differential Mapping Coordinate Utilities program

(http://sdmproject.com/utilities/?show=Coordinates) with the MNI to Brett’s mni2tal option

switched on (accessed February 19, 2013). To test how well the beamformer performed, we

derived the time course of activation at five locations outside the head (left, right, top, front,

back) and in the left and right ventricles.
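For reference, the following R sketch implements Matthew Brett's commonly cited mni2tal approximation (the option named above); the SDM utility's exact implementation may differ, so the coefficients here should be read as an assumption.

```r
# Brett's piecewise-affine MNI -> Talairach approximation (assumed coefficients).
mni2tal <- function(xyz) {
  x <- xyz[1]; y <- xyz[2]; z <- xyz[3]
  if (z >= 0) {
    c(0.9900 * x, 0.9688 * y + 0.0460 * z, -0.0485 * y + 0.9189 * z)
  } else {
    c(0.9900 * x, 0.9688 * y + 0.0420 * z, -0.0485 * y + 0.8390 * z)
  }
}

mni2tal(c(-47, -4, 41))   # example conversion; Table 2 lists Talairach coordinates
```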

The beamformer data was segmented into epochs ranging from 100 ms before the

appearance of a picture on the screen to 2500 ms after its presentation. It was subsequently

baseline corrected on a 100 ms pre-stimulus interval for each trial of each subject. Finally, we

further downsampled the pre-processed data from 250 Hz to 125 Hz for analysis.
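A minimal sketch of the baseline correction and final down-sampling for one trial is shown below; epoch (one region's time course at 250 Hz from -100 to 2500 ms) and times are hypothetical objects.

```r
times    <- seq(-100, 2500, by = 4)          # 250 Hz sampling = 4 ms per sample
baseline <- mean(epoch[times < 0])           # mean over the 100 ms pre-stimulus interval
epoch_bc <- epoch - baseline                 # baseline-corrected epoch
epoch_ds <- epoch_bc[seq(1, length(epoch_bc), by = 2)]   # 250 Hz -> 125 Hz
```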


Table 2. Summary of brain regions comprising the phonological, semantic, and sentence processing systems from Vigneau et al. (2006). Coordinates are in Talairach space. RolS = Rolandic Sulcus. RolOp = Rolandic Operculum. F3t = pars triangularis of the inferior frontal gyrus. F3op = pars opercularis of the inferior frontal gyrus. F3orb = pars orbitalis of the inferior frontal gyrus. SMG = supramarginal gyrus. PT = planum temporale. T1 = superior temporal gyrus. T2 = middle temporal gyrus. T3 = inferior temporal gyrus. Prec = precentral gyrus. F2 = middle frontal gyrus. PrF3op = precentral gyrus/F3op junction. STS = superior temporal sulcus. AG = angular gyrus. Fusa = anterior fusiform gyrus. Out = outside the head. Vent = ventricle. a, p, l, m, d, v = anterior, posterior, lateral, middle, dorsal, and ventral, respectively.

System      Region      x     y     z       System     Region     x     y     z
Phonology   RolS       -47    -4    41      Semantics  PrF3op    -42     6    33
            Prec       -48     3    24                 F3opd     -44    21    21
            F3td       -44    23    13                 F3tv      -43    20     3
            RolOp      -48     8     2                 F3orb     -37    30    -9
            F3orb/F2   -33    36    -7                 AG        -45   -65    27
            SMG        -42   -49    37                 T1p       -54   -46    16
            T1         -50   -36    13                 T1a       -55   -13    -4
            PT         -59   -26    10                 T3p       -47   -54    -3
            T1a        -55   -12    -2                 T2ml      -58   -36     3
            T3p        -50   -58    -3                 Fusa      -38   -34    -9
            T2m        -50   -34    -8                 Pole      -41     2   -20
Sentence    F2p        -37    12    44      Test       Out      -106    13     9
            F3opd      -49    17    21                 Out       106    13     9
            F3tv       -44    25     1                 Out         3    15   106
            STSp       -50   -51    23                 Out        -2   102   -10
            T1a        -56   -13    -6                 Out         3  -116    36
            T2p        -40   -61     8                 Vent        7    -8    19
            T2ml       -56   -39     4                 Vent       -7    -1    18
            Pole       -47     5   -20

3. Results

3.1. Finding Different Language Networks

We first determined whether the time courses of activation at the thirty brain regions from

Vigneau et al. (2006) could be grouped into meaningful networks, on the assumption that if their

activity co-varied, they might be contributing to the same process (Hebb 1949, p. 70). The time

courses were first decomposed into twenty independent components by way of independent

components analysis (Brookes et al., 2011) using R function fastICA from package fastICA

(Marchini et al., 2012). Each brain region was characterised by 20 mixing weights, one for each

independent component (IC).2 The mixing weights specified the contribution of a region to an

independent component. For example, if the weight of a region in an IC was close to 0, then the

2 Trying to extract more components than 20 resulted in fastICA failing to converge.


region did not take part in the process characterised by that IC. On the other hand, if its weight

was close to 1 (or -1), then the region was an important player in the process characterised by the

IC. We subsequently performed a hierarchical cluster analysis on the brain regions’ estimated

mixing weights. This was done using R functions dist, which computed the Euclidean distances
between the regions' mixing weights, and hclust, which performed the clustering of the brain
regions based on these distances (both functions are from R package stats).3
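A minimal sketch of this decomposition-plus-clustering step is given below (not the authors' exact pipeline); roi_ts, a samples-by-regions matrix holding the thirty time courses, is an assumption.

```r
library(fastICA)

ica <- fastICA(roi_ts, n.comp = 20)   # decompose into 20 independent components
W   <- t(ica$A)                       # 30 x 20: one row of mixing weights per region

d        <- dist(W)                   # Euclidean distances between regions' weights
hc       <- hclust(d)                 # hierarchical clustering (default linkage)
networks <- cutree(hc, k = 5)         # five networks, as in Figure 1

# Strength of a region: root mean square of its 20 mixing weights.
strength <- sqrt(rowMeans(W^2))
```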

Results of the clustering are shown in Figure 1, where each brain region is plotted onto a

template of the brain obtained from Brainstorm version 3.1 (http://neuroimage.usc.edu/

brainstorm/). Each region is color coded according to its membership in one of five networks

identified by the hierarchical clustering analysis: Network 1a (yellow), network 1b (orange),

network 2a (red), network 2b (green), and network 3 (blue). The strength of activation of a

region within a network, calculated as the root mean square of its mixing weights across the 20

ICs, is depicted by the size of its circle. The bigger the circle, the stronger the contribution of a

region to the overall activity of the network during the task. Overall, the most strongly activated

Networks were 2a, where activation was greatest in phonological T1 and RolS, and Network 2b

where activation was greatest in F3td and semantic T1a.

In Figure 1, assignment of regions within networks as semantic, phonological or sentence

processing (in the legend) is based on the functional assignments in the meta-analysis presented

in Vigneau et al. (2006). Two points need to be made about this here. First, these regions are

generalisations, derived from the clustering methods for peaks in Vigneau et al. Secondly,

Vigneau and colleagues' functional categorisations (as phonological, semantic, and sentence

processing) are based on their classification of the tasks in individual studies entered into their

3 In the hierarchical clustering analysis, each object is initially assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage, distances between clusters are recomputed by the Lance-Williams dissimilarity update formula.


meta-analysis. For example, if an area was reported as engaged during a phonological task it was

classed as 'phonological' even though it was known to be involved in (for instance) motor control

of articulation. In the review and discussion, we consider the potential functional contributions of

regions and networks in more detail.

The regions forming Network 1a in yellow were all implicated in sentence processing in

Vigneau et al. (5/5; temporal pole (Pole), lateral middle temporal gyrus (T2ml), anterior superior

temporal gyrus (T1a), dorsal pars opercularis (F3opd), and ventral pars triangularis (F3tv)).4 The

activity in this network was strongest at the temporal pole. Network 1b (orange) was fronto-

temporal and was largely associated with semantic processing (3/4; ventral pars triangularis

(F3tv), Pole, pars orbitalis (F3orb)), with one area implicated in phonological processing (1/4;

middle temporal sulcus (T2m)). The network maximum was at T2m.

Network 2a in red was mostly associated with aspects of phonological processing (4/5;

supramarginal gyrus (SMG), middle superior temporal gyrus (T1), rolandic sulcus (RolS), lower

precentral gyrus (Prec)), but included dorsal middle frontal gyrus (F2p) linked to sentence

processing and working memory. Area T1 (modality-selective auditory cortex) was the most

active area in this network. Most peaks in Network 2b (in green) were also involved in

phonological processing (4/6; dorsal pars triangularis (F3td), the junction between pars orbitalis

and middle frontal gyrus (F3orb/F2), anterior superior temporal sulcus (T1a), rolandic operculum

(RolOp)), but Network 2b included two additional areas associated with semantic processing

(2/6; anterior superior temporal gyrus (T1a), and anterior fusiform gyrus (Fusa)). Regions F3td

and T1a (semantic) had the greatest level of activation. Eight of the ten peaks comprising

4 Vigneau et al. note that these sentence processing areas are all close to or overlap with areas associated with semantic tasks, and indeed many of the sentence processing tasks require semantic processing. The reason for designating these areas as 'sentence processing' is that they were activated in tasks which used sentences or texts as stimuli or explicitly investigated syntactic contrasts. Conversely, areas which are classified as 'semantic' used word level stimuli to investigate aspects of lexical or conceptual meaning. For simplicity, we keep Vigneau's classification here and take up specific functional contributions in the review and discussion.


Network 3 in blue were located in temporo-parietal regions, and two were in frontal cortex. In

Vigneau et al. (2006), six of these areas are associated with semantic processing (angular gyrus

(AG), inferior frontal gyrus at the junction with the precentral gyrus (PrF3op), dorsal pars

opercularis (F3opd), posterior superior temporal gyrus (T1p), middle lateral temporal gyrus

(T2ml), and posterior inferior temporal gyrus (T3p)). Additionally, two peaks were implicated in

phonological processing (planum temporale (PT) and posterior middle temporal sulcus (T3p)),

and two in sentence processing (posterior middle temporal gyrus (T2p) and posterior superior

temporal sulcus (STSp)). PT, PrF3op, and AG were the most active peaks within this network.

Figure 1. The left panel shows a visual representation of the estimated mixing matrix obtained by decomposing the thirty time courses of activation into twenty independent components. The right panel illustrates the five networks obtained from the ICA decomposition and subsequent clustering. The regions (circles) that constitute a network are of the same color. The strength of activation of a region is indicated by the size of its circle (it is proportional to the mean of the absolute values of its 20 mixing weights). The networks appear in the legend above the brain, where, for each network, the number of regions assigned as semantic, phonological or sentence processing in Vigneau et al. (2006) is listed. See Table 2 for legend of brain regions.

3.2. Statistical Analysis

We wished to specifically determine (1) whether the network time courses of activation

were statistically reliable, (2) whether the activity in these networks was modulated by the


frequency of N-grams of different lengths, and (3) whether different networks were affected

differently by the frequency of N-grams of different lengths. To answer these questions we

performed generalized additive mixed-effects modelling (GAMM; Faraway, 2006; Hastie &

Tibshirani, 1990; Keele, 2008; Wood, 2006) following the procedure used in De Cat,

Klepousniotou, and Baayen (2015), Hendrix (2009), Kryuchkova et al. (2012), Tremblay (2009),

Tremblay and Baayen (2010), and Tremblay and Newman (2015). A GAMM is a non-parametric

regression model that can take into account repeated measurements and other sources of random

variability (i.e., random effects). Like ANOVA, GAMM also fits within the generalized linear

mixed-effects model (GLMM) framework. ANOVA and GAMM differ in that (a) GAMM can model more

complex random-effect structure (e.g., crossed, independent random effects); (b) GAMM is

robust to violations of sphericity if the correct random-effect structure is used, eliminating the

need to correct for this post hoc using methods that are known to be either overly conservative

(Greenhouse-Geisser) or liberal (Huynh-Feldt); (c) GAMM enables one to appropriately model

imbalanced data; and (d) the model-fitting objective is augmented by a “wiggliness” penalty

enabling one to model nonlinear relationships that do not over- or underfit the data in a manner

that reduces subjectivity and circularity (see Tremblay & Newman, 2015, for more details). This

latter characteristic enabled us to model the whole time course of activity for each network, thus

obviating the need to bin the time continuum into time windows for analysis and correct for the

numerous multiple comparisons that would be performed between these windows, therefore

increasing overall statistical power and replicability (Tremblay & Newman, 2015).

We performed GAMM on the un-averaged time courses of activation obtained from the

beamformer (20,192,276 data points) using function bam from R package mgcv (Wood, 2012).

We created a network factor variable with levels 1a, 1b, 2a, 2b, and 3. Thus all time courses from

all trials from all regions comprising one network were treated as repeated measures of activity


of that network; activity was not parcellated by sub-region within each network for the statistical

analysis. We used continuous unigram, bigram, trigram, and quadgram frequency counts as

described in section 2.3 (Language Data) above. This enabled us to circumvent the pitfalls of dichotomisation

(i.e., binarizing into high versus low) such as reduced power, increased chances of finding

spurious effects, and the inability to detect non-linear relationships (Cohen, 1983; MacCallum et

al., 2010). Because we were analysing time series data, we included an auto-regressive model of

order 1 (AR1; estimated correlation parameter = 0.64).
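A hedged sketch of such a model specification with bam() is shown below; it is not the authors' exact code, and the data frame dat, its column names, the basis dimension, and the Trial.start indicator used for the AR1 model are all assumptions.

```r
library(mgcv)

m <- bam(Amplitude ~
           s(Time, by = Network, k = 20) +     # Time x Network interaction
           s(sLogUnigram,  by = Network) +     # Unigram  frequency x Network
           s(rLogBigram,   by = Network) +     # Bigram   frequency x Network
           s(rLogTrigram,  by = Network) +     # Trigram  frequency x Network
           s(rLogQuadgram, by = Network) +     # Quadgram frequency x Network
           s(Subject, bs = "re") +             # by-subject random effect
           s(Network, bs = "re"),              # by-network random effect
         data     = dat,                       # un-averaged beamformer time courses
         rho      = 0.64,                      # AR1 correlation parameter
         AR.start = dat$Trial.start)           # TRUE at the first sample of each trial
summary(m)
```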

The data were trimmed to include only samples within 2.5 standard deviations of the

mean of the residuals of a model including by-subject and by-network random effects and a Time

× Network interaction. This had the effect of removing data points with potentially undue

influence – 248,442 data points were removed (1.2%). The model was subsequently re-fitted to

the trimmed data.
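Continuing the sketch above, the trimming step might look as follows, where m0 is assumed to be a simpler bam() fit containing only the random effects and the Time × Network interaction.

```r
res      <- resid(m0)                              # residuals of the trimming model
keep     <- abs(res - mean(res)) < 2.5 * sd(res)   # within 2.5 SD of the mean residual
dat_trim <- dat[keep, ]                            # drop potentially influential points
# The model of interest is then re-fitted, with the same formula, to dat_trim.
```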

The most probable model given the data was identified by way of Akaike’s Information

Criterion (AIC) comparisons (Motulsky & Christopoulos, 2004; Zuur et al., 2009). Note that this

method is based on information theory and does not produce probability (p) values, nor suggest

any conclusions about the statistical significance of a model. Rather, the AIC value provides

information concerning how well the data supports a model. Model selection can be achieved by

comparing the AIC values of two or more models. The model with the lowest AIC value is more

likely to be correct. Before comparing two models, we first looked at the graph of the fitted

curves with 95% confidence intervals. If the curves made no scientific sense – e.g., predicted

values ranging from -60 to 80 units would have been completely implausible here given that the

beamformer data ranged from -35 to 35 units – the model was rejected (Motulsky &

Christopoulos, 2004, p. 151). Following established practices, we deemed a model to be worth

retaining if the AIC value of the more complex model was lower by at least 3 points than the AIC


value of the less complex model – e.g., a model with only a smooth for Time is less complex

than a model with a Time × Network interaction, which in turn is less complex than a model with

a Time × Network × Unigram interaction. Supplementary Table S2 lists, for each model

considered, the degrees of freedom used and its AIC value.
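As an illustration of the comparison rule, assuming two hypothetical bam() fits m_time (Time × Network only) and m_full (adding the N-gram frequency × Network smooths):

```r
AIC(m_time, m_full)              # data frame with effective df and AIC per model
AIC(m_time) - AIC(m_full) >= 3   # retain m_full only if its AIC is lower by at least 3
```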

The most likely model given the data (AIC = 91675125) included crossed by-subject and

by-network random effects, a Time × Network interaction, a Unigram × Network interaction, a

Bigram × Network interaction, a Trigram × Network interaction, and a Quadgram × Network

interaction. Models with Time × N-gram frequency interactions or Time × N-gram frequency ×

Network interactions were not retained.

Regions of a curve that did not include 0 in their 95% confidence intervals were deemed
significant (i.e., p < 0.05). All other portions of the curve were deemed non-significant. For
example, the time curve in Network 1a was significantly different from 0 only between 210 and

650 ms given that it is only in this interval that 0 was not included in the curve’s 95% confidence

interval. If the whole curve contained 0 in its confidence interval, it was deemed non-significant.

Comparisons between different curves were made with the upper and lower confidence intervals

of the difference between two curves. The width of the confidence intervals was Bonferroni

corrected for 405 comparisons. We performed a similar analysis on the seven test locations.

Neither time nor any of the unigram, bigram, trigram, or quadgram frequencies significantly correlated with

activity in these test regions.
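A hedged sketch of the confidence-interval criterion is given below, re-using m and dat from the earlier bam() sketch; the prediction grid and the choice to hold the frequency predictors at 0 are assumptions.

```r
new_dat <- data.frame(Time = seq(0, 2500, by = 8),
                      Network = factor("1a", levels = levels(dat$Network)),
                      sLogUnigram = 0, rLogBigram = 0,
                      rLogTrigram = 0, rLogQuadgram = 0,
                      Subject = dat$Subject[1])

pr <- predict(m, newdata = new_dat, se.fit = TRUE,
              exclude = c("s(Subject)", "s(Network)"))  # set random effects to zero

sig_vs_zero <- abs(pr$fit) > qnorm(0.975) * pr$se.fit   # 95% CI excludes 0
crit_diff   <- qnorm(1 - 0.025 / 405)                   # Bonferroni-corrected critical
                                                        # value for 405 curve comparisons
```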

3.3. The Optimal Model

In this section, we present the results of the optimal model, which are illustrated in Figure

2. We first discuss the results regarding the time courses of activation at each of the five

networks and then describe the N-gram effects on the five networks.


3.3.1. Time × Network interaction.

The networks showed significant temporal modulation, where increases in activation

corresponded to departures from 0 (the baseline) irrespective of whether they were in the positive

or negative polarity. The only exception was Network 2b (green) where activity did not vary as a

function of time (although it did vary as a function of N-gram frequency; see below). Activity

began in Network 1a 210 ms after the presentation of a picture and peaked at 430 ms. This

network became inactive from 650 ms onwards. It was closely followed by Network 1b, where

activity was significant 300 ms after picture onset. Activity in this network also peaked 430 ms

after picture onset and ended 650 ms post stimulus. Following the period of significant activity in

Networks 1a and 1b, activity became significant in Network 2a 825 ms after picture onset and

lasted until 1660 ms. A first peak occurred at 870 ms and a second one 1575 ms post stimulus.

Finally, brain activity moved to Network 3 from 1840 to 2410 ms. In sum, as depicted in panel

(A) of Figure 2, activity began in Networks 1a and 1b and then moved to Networks 2a and 3.


Figure 2: Results of the optimal GAMM model. In panel (A), the x-axis is time in milliseconds (ms) and the y-axis is amplitude (pseudo-Z²). Each time course is color coded and refers back to the network it pertains to, depicted in panel (B). The hashed time windows, also color coded, indicate significant portions of the time courses (i.e., where the 95% confidence intervals did not include zero). Although the estimated curves in the pre-stimulus interval were not flat, they all included 0 in their 95% confidence interval, meaning that they were, statistically speaking, flat. Panels (C)–(F) show the N-gram frequency effects on each network. The x-axis is log frequency of use. The y-axis is amplitude in pseudo-Z² (please note the different scale in each panel). As in panel (A), effect curves are color-coded according to the networks they pertain to and hash marks indicate frequency ranges over which activity in particular networks was significant.


3.3.2. N-gram Frequency × Network interaction.

N-gram frequency effects are re-plotted in Figure 3 for ease of comparison across N-

grams (the difference between Figures 2 and 3 being that each effect is plotted on the same y-axis

scale without the significance windows). It is important to understand that the N-gram Frequency

× Network interaction captures the fact that the average activation level of a network changed as

a function of the frequency of use of the N-grams participants were preparing to utter. This effect

modulated the Time × Network interaction in an additive way by shifting (up or down) the mean

of the entire time course of a network, thus increasing or decreasing its overall magnitude of

activity.

Of all N-gram frequencies tested, unigram frequency had the smallest influence on

network activity. The frequency of occurrence of single words did not reliably affect brain

activity in any network except 2a where it decreased as frequency increased in the lower

frequency range and flattened at -4.3. Bigram frequency reliably affected Networks 2a and 2b. In

both networks, activations reliably decreased as frequency increased and flattened at -3 and -2

respectively. In Network 2b, there was also a reliable increase in activity for high frequency

bigrams. Trigram frequency affected the overall level of activation in all networks except 1a. The

effect was most notable in lower frequency trigrams, where the level of activation decreased as

frequency increased and flattened between -5 and -4 in all networks. This effect was greatest in
Network 2a, followed by Network 2b, Network 3, and finally in Network 1b. There was also a
slight increase in activity for higher trigram frequencies, which was reliable in Networks 1b and
2a. Quadgram frequency reliably affected Networks 2a, 2b, and 3. In all three networks, the level
of activity decreased as frequency increased. The decrease in activity was more gradual in
Network 2b, which flattened at -2, than in Networks 2a and 3, which flattened at -8 and -6
respectively. Activity subsequently increased for higher frequency quadgrams in Network 2a.

Figure 3: Panels (C)–(F) of Figure 2 plotted on the same scale.


In sum, N-gram frequencies significantly affected the overall level of activity in all

networks except Network 1a. In general, the decrease in activation was logarithmic from very

low to somewhat low frequency of use. In all cases, the 0 point – where N-gram frequency

ceased to modulate network amplitudes – occurred in the bottom 25% of the data. In some cases,

there were significant activation increases for high frequency N-grams.

4. Review and Discussion

We set out to discover whether frequency of use for N-grams would modulate amplitude

responses in the MEG signal in ways consistent with memory-based or combinatorial accounts

of language. To achieve this, we investigated the time course of activity occurring at thirty

linguistically relevant brain regions, elicited during the pre-production stage of a relatively

natural scene description task. These time courses were important in enabling us to determine

what cortical networks were at work, which ones were active before the others, and which ones

were affected by which N-gram frequencies. In this section, we review literature pertaining to

functions of individual areas contributing to the networks and discuss the networks' likely

function(s) given their functional profiles, timing, and amplitude response to N-gram

frequencies.

4.1. Networks 1a and 1b: Sentence Processing/Lemma Retrieval, and Semantic

Selection/Integration

Networks 1a and 1b comprise a single network insofar as their time courses of activation

are similar: 1b lags behind activity onset in 1a by 90 ms, but they both peak at 430 ms and

become inactive at 650 ms. They also involve areas that are adjacent to or overlap each other


(Vigneau et al. 2006). The spatial proximity of these areas is a potential confound in interpreting

our networks. However, we note that areas identified in Vigneau's study as functionally distinct

based on their task profiles were recruited into different networks in our study, largely in

agreement with the functional roles assigned by Vigneau and colleagues. For instance, Networks

1a and 1b differ in that 1a only encompasses regions linked to sentence and text processing tasks

in Vigneau's study, and shows no reliable response to N-gram frequencies. Network 1b, on the

other hand, aggregates regions from word level semantic and phonological tasks, and is affected

by N-gram frequencies. The time course and spatial proximity of the regions comprising the two

networks suggest that they collaborate. Nevertheless, the different task profiles associated with

the regions that are encompassed in each network, and their response to frequency effects

indicate that the contributions they make to picture description differ. Inspection of the ROIs

they aggregate is suggestive as to what these functions may be.

4.1.1. Network 1a: A sentence processing/lemma retrieval network.

According to Vigneau et al. (2006) both of the frontal areas, dorsal pars opercularis and

ventral pars triangularis of the left inferior frontal gyrus (F3opd and F3tv) in Network 1a are

recruited in tasks that contrast complex syntactic structures with simpler ones, as well as other

sentence and text tasks which engage syntax and semantics (Vigneau et al. 2006, S Tables 24,

25). The overlap of both these areas with areas associated with semantic (word level) tasks,

together with their activation in the more semantic sentence and text tasks makes it unreasonable

to attribute a 'pure' syntactic function to them. However, all of the tasks do share the common

feature that they require responses to sentences or texts rather than to isolated words, and at least

half of the tasks assess syntax with relevant syntactic contrasts. Moreover, a number of studies

subsequent to Vigneau et al.'s meta-analysis have identified these areas as specifically important


for syntactic processing (e.g. Caplan et al., 2008; Grewe et al. 2007; Makuuchi et al., 2009; Tyler

et al. 2011; Fedorenko et al., 2012; Makuuchi & Friederici 2013; Hagoort & Indefrey 2013;

Saegert et al. 2013). Thus, we think these areas are contributing to syntactic processing in

Network 1a, though we cannot say whether this means they are involved in some specific

syntactic computation or, for instance, are supporting selection of syntactic elements and/or their

unification with lexical semantics. We can however note that since most (87.5%) of the sentences

produced in response to pictures were syntactically simple, these frontal areas are likely not

activated because of complex syntactic processing.

The lateral middle temporal lobe (T2ml) in Network 1a is robustly associated with “word

knowledge” and Vigneau et al. (2006) describe it as an area essential to word meaning, more

important for semantics than for syntax though it is active on syntactic tasks. Dronkers et al.

(2004) also made this point in their study of aphasic patients insofar as lesions here produce

profound word level comprehension deficits. It is not spatially distinct from the semantic area

(T2ml) identified by Vigneau and colleagues as active on word level semantic tasks (semantic

T2ml is recruited into our Network 3; see below). However, Indefrey and Levelt (2004)

identified T2ml as essential for lemma retrieval and selection. Lemmas correspond to formal

properties of words such as their categorical features (as nouns, verbs, etc.), inflectional

properties (e.g. tense for verbs), and dependency relations (e.g. the type of complements verbs

take; Roelofs et al., 1998; Indefrey & Levelt, 2004). Given that lemmas involve information

necessary for sentence construction, contribution to some aspect(s) of lemma retrieval is a

plausible role for T2ml in Network 1a.

The temporal pole (Pole) peak recruited into Network 1a is commonly engaged in

sentence and discourse processing. However, its function too is controversial. It has been

associated with complex syntactic processing in fMRI and voxel based lesion symptom mapping


studies (den Ouden et al., 2012; Magnusdottir et al., 2013), in MEG studies with a combinatorial

function in simple phrases such as red boat that is either syntactic (e.g. Bemis & Pylkkänen,

2012) or conceptual (Del Prato & Pylkkänen, 2014), but not domain general combination (Bemis

& Pylkkänen, 2013), and sentence and discourse processing more generally as in the task profile

in Vigneau et al. (2006, S Table 30). Conversely, work exploring the semantic variant of primary

progressive aphasia (sv-PPA) shows that focal damage in the left Pole does not affect

comprehension of complex syntax, which contrasts with marked deficits in naming tasks (Wilson

et al., 2014). In fact, the naming deficits combined with preserved syntax (and episodic memory)

for sv-PPA patients have been treated as evidence that the Pole functions as an amodal semantic

hub that integrates information from different domains (Patterson et al., 2007, for review) or that

the Pole is specifically important, like T2ml, for lemma selection, perhaps especially for

subordinate and/or specific concepts (e.g. Hurley et al., 2012; Mesulam et al., 2009, 2013;

Schwartz et al., 2009; Walker et al., 2011).

One way of interpreting such varied findings is that the Pole is important for all these

functions, with adjacent neuronal clusters contributing to different aspects of different tasks, as

the split in our networks as well as in Vigneau's study suggests. That is, since the Pole activation

in Network 1a spatially overlaps with the Pole activation in Network 1b, which is more clearly

associated with word level semantic tasks in Vigneau, and since there is no response to N-gram

frequencies in Network 1a but there is a frequency response in 1b, it may be the case that the Pole

activation in 1a indexes a combinatorial or integrative role for (abstract) features relevant for

sentence processing as suggested by the tasks in Vigneau and colleagues (also see e.g.,

Westerlund & Pylkkänen, 2014), while Pole activation in 1b reflects selection and/or integration

of more subordinate semantic features.


Finally in Network 1a, area T1a, the anterior superior temporal gyrus, has been associated

with syntactic processing in a number of studies subsequent to Vigneau et al. (e.g., Allen et al.,

2012; Brennan et al., 2012; Herrmann et al., 2011), with morphology (Newman et al., 2010) and

again, with lemma retrieval (Hurley et al., 2012). Moreover, in MEG studies, T1a has frequently

been associated with aspects of lexical-concept retrieval (e.g. Embick et al., 2001; Lewis et al.,

2011; Pylkkänen et al., 2000, 2002, 2004; Simon et al., 2012; Solomyak &

Marantz, 2009). Vigneau et al. (2006) note that sentence T1a overlaps with semantic T1a, which

in turn is not spatially distinct from phonological T1a. They describe the area as part of the

‘human voice selective area’ and speculate that the more ventral sentence processing component

recruited into Network 1a may be contributing to prosody and integration of meaningful speech.

All three T1a areas are simultaneously active in our scene description task, although the

phonological and semantic areas are recruited into Network 2b (see below). Thus assigning any

very particular function to the T1a area aggregated into the sentence processing network is

speculative. However, it is of interest that this area – associated with sentence processing in

Vigneau – is in fact recruited in this network rather than any of the others, making it plausible

that it is involved in retrieving and/or integrating lexical-structural components and perhaps

morpho-syntactic features.

What all this amounts to is that Network 1a does appear to be an initial sentence

processing/lemma retrieval network. Its contribution to the description of the scenes, based on

the functions of its parts as just reviewed, suggests that it is involved in the retrieval, selection,

and integration of lexical (lemma) and structural information relevant to sentence and text

processing. The fact that the Pole is most active in this network, that the network is unresponsive

to frequency effects, and that it is the first network to become active supports such a hypothesis

insofar as lexical and structural retrieval and selection might precede other aspects of production.


Importantly, although we describe this network as the first to become active because it is first to

be reliably modulated by time, Network 2b is continuously active. We argue below that Network

2b is involved in lexical-concept retrieval. Thus the lexical-concepts that would drive lemma

activation would be made available via interaction of Network 1a (and 1b) with 2b.

4.1.2. Network 1b: A semantic selection and integration network.

Network 1b is dominantly semantic. Of its frontal components, ventral pars triangularis

of the inferior frontal gyrus (F3tv) is associated with semantic retrieval, selection, association,

categorization and generation of words or associated features in Vigneau et al. (2006). Pars

orbitalis of the inferior frontal gyrus (F3orb) is associated with a similar range of tasks with

emphasis on controlled retrieval and selection of lexical-semantic and episodic information

(Noonan et al., 2013 for a recent meta-analysis). Network 1b also involves the anterior temporal

Pole just posterior to and spatially overlapping the Pole peak in the sentence processing/lemma

retrieval Network 1a. Tasks activating the semantic Pole and frontal areas in Vigneau et al.

(2006, S Table 22) share the common feature that they involve lexical-semantics at the word

level in contrast with the sentence and text processing tasks common to the Pole area in Network

1a. The ‘odd bit’ in this network is the middle temporal sulcus (T2m) which Vigneau et al.

(2006, S Table 11) classify as phonological. More recent experimental work associates the area

with retrieval of stored phonological representation (rather than acoustic analysis) of word forms

(Zhang et al., 2011; Mei et al. 2014). T2m activation in Network 1b may thus reflect retrieval of

phonological word-forms associated with lexical-semantics. It is the most active part of Network

1b.


Since Network 1b is active early in the processing stream, is sensitive to frequency effects

and involves the Pole and F3orb, it looks like a ventral network for lexical-semantic processing.

If we consider it in relation to Network 1a, it is possible that 1b is involved in selection and

integration/unification of semantic information and stored phonological word forms with

activated lemmas and structures (from 1a) and lexical-concepts (from 2b). F3tv and F3orb are

frequently co-activated with the middle temporal gyrus and Pole respectively (e.g. Binney et al.,

2012), and this network lags just behind Network 1a in activation onset but then runs in parallel

with it and with 2b. The slight delay in activation together with overlap at the Pole suggests that

activation spreads or feeds forward from 1a (and 2b, see below) to 1b.

4.2. Network 2a: A Phonological Network.

Except for the middle frontal gyrus (F2p), all components of Network 2a are implicated

in phonological processing. The rolandic sulcus (RolS) is in the mouth primary motor area,

activated not only in overt and covert articulation of phonemes, syllables, pseudo-words and so

on, but also in syllable perception (Vigneau et al., 2006, S Table 1). The precentral gyrus (Prec)

is similarly associated with covert and overt articulation and discrimination of phonologically

salient material and with phonological maintenance and working memory (Vigneau et al., 2006,

S Table 2; Price, 2012). Posteriorly, the supramarginal gyrus (SMG) is linked to phonological

representations or auditory attention (Price, 2012) and to phonological working memory

(Vigneau et al., 2006). The posterior superior temporal gyrus (T1) is associated with

phonological production and perception (Okuda & Hickok, 2006; Rauschecker & Scott, 2009;

Price, 2012). The one area that is tagged as a sentence processing area – the middle frontal gyrus

(F2p) – is actually discussed in Vigneau et al. (2006) as a potential working memory area since it


is recruited in tasks that place high demand on working memory. They propose that it forms a

'sentence and text comprehension' working memory loop with posterior superior temporal sulcus.

Its co-activation with SMG as part of Network 2a suggests that it is involved in working

memory, but the task here is sentence production rather than comprehension. So, it may be that

here it contributes to maintaining the to-be-uttered word or sentence while its phonology is being

elaborated in the rest of the network.

Network 2a thus appears to be unequivocally a phonological network. It is clearly dorsal,

is centered on T1 and is the most sensitive to N-gram frequency effects. It also follows in a

properly serial fashion the activities in Networks 1a and 1b, which is expected for phonological

code retrieval in the Indefrey and Levelt (2004) model. However, co-activation of the frontal

areas associated with articulation in this network and the absence of posterior middle or inferior

temporal areas (as predicted in Indefrey & Levelt, 2004) strongly suggest the sort of motor-sound

based phonological representation argued for by Hickok and Poeppel (2004, 2007) and supported

in Vigneau et al.'s meta-analysis as an 'audio-motor loop' (also Price, 2012), rather than or in

addition to (see below) the modularity and seriality of Indefrey and Levelt (2004).

4.3. Network 2b: A Lexical-Concept Retrieval Network.

Network 2b includes three frontal areas. The rolandic operculum area (RolOp) is involved

in articulation and sensory motor integration and is activated in articulatory and auditory

discrimination tasks (Vigneau et al., 2006, S Table 4). Dorsal pars triangularis (F3td) has a

putative role in phonological working memory since it is activated by tasks involving rehearsal

of letters, numbers and so on, while the third frontal area, the junction between orbital and


middle frontal gyri (F3orb/F2) is activated in tasks that require working memory and/or attention

to phonological or articulatory information (Vigneau et al., 2006, S Table 5). The posterior parts

of this network include both the phonological and semantic areas of the anterior superior

temporal gyrus (T1a) and a semantic node in anterior fusiform gyrus (Fusa). The T1a areas

spatially overlap each other and, as noted above, semantic T1a is not spatially separate from

sentence processing T1a. The tasks activating semantic T1a include semantic categorisation,

association, retrieval and reading or listening to words (Vigneau et al., 2006, S Table 18). The

reading and listening tasks are contrasted with non-word perception tasks, including pseudo-

words and isolated phonemes, foregrounding the 'word specific' semantic character of these

activations. More recent studies support the specificity of T1a for linguistic (as opposed to

amodal or visual) semantic processing (e.g. Visser & Lambon Ralph, 2011; Hurley et al., 2012).

Lau et al. (2013) also offer evidence that it is a source of bottom-up semantic processing insofar

as the area shows deactivations (facilitation) to semantic primes in comprehension. MEG studies

similarly have shown that the area is important for lexical access as it is sensitive to lexical

frequency and competition effects (e.g. Embick et al., 2001; Pylkkänen et al., 2004; Solomyak &

Marantz, 2009; Lewis et al., 2011; Simon et al., 2012). Phonological T1a is activated by human

voice contrasted with non-human sounds and by listening, reading, and overt production tasks to

phonological or phonetic stimuli (Vigneau et al., 2006). Finally, the anterior fusiform gyrus

(Fusa) is important for processing words in any modality (auditory, visual, or tactile; Vigneau et

al., 2006; Price, 2010). It is active on relatively passive tasks such as reading words versus

letters, on semantic association, categorisation, and retrieval ranging from superordinates (e.g.

living/non-living) to tasks requiring retrieval or comparison of subordinate level features (e.g.,

labrador versus golden retriever; Vigneau et al., 2006, S Table 21). It is also active in word

generation tasks to pictures or specific semantic stimuli. Vigneau and colleagues highlight the


possibility that Fusa may be a site where semantic or conceptual features are accessed directly.

Mion et al. (2010) identify the left Fusa as the only area whose hypometabolism was reliably

associated with naming and semantic fluency deficits in sv-PPA patients, whereas hypometabolism in the

right Fusa was linked to impaired object knowledge, underlining the importance of this area for

concept retrieval.

We propose that Network 2b is a lexical concept retrieval network. There are at least three

arguments in favour of this interpretation. First, the involvement of Fusa in this network suggests

such a role since it is a likely source for lexical-concepts. The activation of semantic T1a also

seems consistent with lexical concept retrieval in that it is linked to bottom-up semantic priming

(Lau et al., 2013) and is sensitive to a variety of lexical frequency effects in MEG studies. The

inclusion of phonological areas, both in frontal cortex and T1a, might be interpreted as

confounding if one assumes strict seriality. However, under parallel architecture assumptions,

lexical concepts might involve both semantics and some sort of 'pre-phonological' representation

– where lexical-concepts are associated, not with phonological code as such, but with traces of

articulatory or syllabic (frontal) and auditory (T1a) features. Second, Network 2b does respond

to N-gram frequencies (though its response to unigrams is not significant – see section 4.6 for

discussion), as one might expect from a lexical-concept retrieval network. Third, there is no other

plausible network for lexical-concept retrieval. Network 1a is not a likely candidate since it only

aggregates sentence and text processing areas and shows no response to N-gram frequency.

Network 1b – though distinctly semantic in character – is implausible in this role because it only

comes on-line after 1a does. Similarly, Networks 2a and 3 become active

much later in the processing stream and 2a lacks any areas relevant for lexical-concept retrieval.


Assuming that lexical-concept retrieval must happen whether the task is naming or scene

description, 2b is the only candidate.

4.4. Network 3: An Integration-Unification Network for Sentence Production.

Network 3 is mostly comprised of regions associated with semantics but also aggregates

phonological and sentence processing areas. Vigneau et al. (2006) identify the frontal areas of the

network, the junction of the precentral gyrus and dorsal pars opercularis (Pr/F3op) and dorsal

pars opercularis (F3opd), with many aspects of semantic processing. Of particular note, they

associate both PrF3op and F3opd areas with controlled semantic retrieval, semantic priming, and

word generation to pictures and other stimuli. They further suggest that PrF3op forms a semantic

working memory loop with angular gyrus (AG). PrF3op and AG, together with planum temporale

(PT), are the most active parts of Network 3.

There are eight posterior areas in this network, making it the most widely distributed of the

five networks. Borrowing from Mesulam (1998), Vigneau et al. (2006) describe AG as a

'transmodal gateway', integrating features from different modalities into conceptual

representations. Subsequent work supports this assessment (e.g. Binder et al., 2009; Bonner et

al., 2013). As already noted, the middle lateral temporal gyrus (T2ml) is robustly associated with

lexical retrieval and selection. This activation is just anterior to the T2ml activation for sentence

tasks and they overlap each other. They differ in that semantic T2ml is activated by word level

semantic tasks (Vigneau et al., 2006, S Table 20).

The posterior inferior temporal gyrus (T3p) includes two overlapping areas associated with

semantic and phonological tasks. Vigneau and colleagues describe both areas as particularly


sensitive to visually presented stimuli. Indefrey and Levelt (2004) associate T3p (and posterior

middle temporal gyrus (T2p)) with phonological code retrieval, while Hickok and Poeppel

(2004) treat the same areas as a phonology-semantics interface. The latter is more consistent with

Vigneau's analysis and also with our results insofar as Network 3 appears integrative and a motor-

sound based (pre-)phonological representation has already developed in Network 2a.

Nevertheless, our study design does not allow us to decide between these options and in any

case, the sound-motor representations in Network 2a do not exclude other later phonological

representations. Similarly, the posterior superior temporal gyrus (T1p) is active on semantic tasks

and sensitive to visually presented stimuli. Based on the task activations they observe, Vigneau et

al. describe T1p as a modality selective area particularly important for transforming letters and

graphemes to amodal syllabic representations that can then be made available for further

processing. The activation of T1p in our scene description task, where there is visual presentation

(pictures) but no reading, suggests a broader function perhaps of integrating auditory and visual

information (e.g. Erickson et al. 2014), or modelling features for motor control and/or sensory

prediction (Rauschecker and Scott, 2009). Planum temporale (PT), the next part of this network,

is associated with ‘auditory imagery’ by Price (2012) and as part of an 'audio-motor loop' by

Vigneau and colleagues. Here, plausibly PT and T1p may be maintaining or transforming

representations specified in the prior activity of Network 2a and/or retrieved from T3p for

integration with other aspects of the message, and/or as 'forward models' for subsequent phonetic

encoding and articulation.

Network 3 also includes the posterior middle temporal gyrus (T2p) and posterior superior

temporal sulcus (STSp) both of which are involved in sentence processing tasks in Vigneau et al.

(2006). They describe the function of T2p as the integration of visual imagery into sentence/text


processing, which very nicely fits our task. However, T2p is also activated by specifically

syntactic tasks in their data suggesting that its role may be broader. Lastly, Vigneau and

colleagues describe STSp as a sort of semantic and discourse coherence processor insofar as the

area is active on many tasks that require integration of complex semantic and syntactic

information in sentences and texts. Related studies support an integrative role for STSp with

investigations ranging over argument hierarchies in a syntax-semantics interface (Bornkessel et

al., 2005), syntactic and semantic matching (Grewe et al., 2007), integration of visual

(pantomime) and auditory (speech) information (Willems et al., 2009), social perception

(Lahnakoski et al., 2012) and so on (see Hein & Knight, 2008, for review). The common thread

in these papers is that STSp is engaged when some sort of matching task is required. This is

nicely expressed in Willems et al. (2009, p. 2001) who see the area, networked with LIFG, as

'integrative' insofar as it matches input streams for which there is 'a relatively stable common

representation'. They contrast this with unification – the on-line construction of representations

of novel input – which they attribute to top-down modulatory activity from (mostly) LIFG.

Hocking and Price (2008) add that the salient STSp contribution seems to be 'conceptual

matching' which is neither specifically linguistic, nor necessarily multimodal.

Overall then, the components of this network suggest that it is dominantly semantic but

involves areas associated (perhaps flexibly rather than specifically) with sentence and text

processing, and areas for visual and auditory/phonological representations prior to speech. Its

multi-functional composition, including Vigneau and colleagues' proposed 'semantic working

memory loop', together with the fact that it is activated late in the processing stream (1840–2410

ms), just before voice onset, suggests a complex role of integration/unification of syntactic,

lexical-semantic, phonological and visual information that happens after initial sentence,


semantic, and phonological processing. If there were an integrative or unification network for

speech production – responsible for “unifying pieces” – Network 3 would be a potential candidate in the

late preproduction phase.

4.5. Structural Connectivity for the Networks?

Lastly, it is worth noting that although we used data driven methods to define the networks,

the areas aggregated into each network by these means not only make 'functional sense', they are

plausible in terms of potential structural connectivity. For example, the topography of the

semantic selection-integration Network 1b suggests it is supported by the uncinate fasciculus

(linking F3orb and Pole) and perhaps also by an anterior portion of the extreme capsule (EmC)

and inferior occipitofrontal fasciculus (IFOF) fiber systems linking T2m and F3tv

(Catani et al., 2005, 2013; Saur et al., 2008; Duffau et al., 2014; Bajada et al., 2015). The lexical-

concept retrieval Network 2b includes F2/F3orb but not the Pole, so it may rely on the EmC and

IFOF fibre systems. The sentence processing-lemma retrieval Network 1a also appears markedly

ventral but lacks an orbital frontal component and so may not involve the uncinate fasciculus

pathway either. It may be supported by the EmC and IFOF fiber systems as proposed by Saur et

al. (2008) and/or by the arcuate fasciculus system (AF), perhaps with a little help from other

fibre systems coursing through the temporal lobe (Bajada et al. 2015; Catani et al., 2005; Duffau

et al., 2014). Of the later networks, phonological code retrieval (2a) is consistent with the dorsal

(arcuate fasciculus) pathway if its anterior and posterior indirect bundles are taken into

consideration (as in Catani et al., 2005, and Duffau et al., 2014). The integration-unification

Network 3 combines areas associated with both dorsal and ventral networks, and thus may be

supported by both arcuate fasciculus fibres (Catani et al., 2005) and IFOF as proposed in Duffau


et al. (2014) – see Griffiths et al., 2013, for evidence that both tracts are involved in sentence

processing. Thus, the networks elicited by time in this preproduction phase of scene description

appear consistent with current dorsal-ventral models of the neural systems that support language

processing, with processing moving from ventral to dorsal and dorsal-ventral networks. We are

conducting a combined MEG and DTI tractography study on the same tasks to address the issue

of the structural connectivity for these networks explicitly.

4.6. Network Timing and Network Functions

We found significant modulation of activity over time in all networks except 2b. Our

failure to detect a significant Time × Network interaction in this latter network may stem from

substantial variability between individuals regarding its time course of activation. However, we

think it is more probable that it is involved in ongoing lexical-concept retrieval as proposed

above, given its architecture, the ongoing nature of the task, and its response to frequency. If this is the

case, then apart from the lexical-concept retrieval Network 2b, the timing of the networks argues

for a compositional sequence that begins with sentence processing-lemma retrieval in 1a, closely

followed by and subsequently in parallel with semantic selection-integration in 1b. Activity then

moves to phonological retrieval in Network 2a, and finally to Network 3 for integration and

unification prior to speech onset. Considering the network functions, this timing supports a

relatively conventional compositional picture with the following four differences. (1) Four of the

five networks aggregate areas associated with more than one domain – only the sentence

processing Network 1a is relatively 'pure' in only aggregating areas linked to sentence processing

by Vigneau et al.’s (2006) meta-analysis. (2) All the networks involve frontal and posterior areas,

the latter including temporal areas in all networks and also parietal areas in Networks 2a and 3.


(3) Lexical-concept retrieval appears to be continuous in this sentence production task,

happening in parallel with other processes. (4) The initial phases of lemma retrieval/sentence

processing and semantic selection/integration are coordinated with each other insofar as they run

in parallel after the first 90 ms.
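
As a simple way to see the serial/parallel contrast, the sketch below (Python) encodes the activation windows reported above as intervals and classifies each pair of networks as parallel (overlapping) or serial (non-overlapping). The onset of Network 1a is set to 0 ms purely for illustration, and Networks 2a and 2b are left out because their exact windows are not restated here (2b being continuously active).

```python
from itertools import combinations

# Activation windows in ms, taken from the description above. The onset of 1a is
# set to 0 purely for illustration; 1b lags it by 90 ms and both end at 650 ms,
# while Network 3 is active from 1840 to 2410 ms. Networks 2a and 2b are omitted
# (2b is continuously active; 2a's exact window is not restated in this section).
windows = {
    "1a (sentence/lemma retrieval)": (0, 650),
    "1b (semantic selection)":       (90, 650),
    "3 (integration-unification)":   (1840, 2410),
}

def relation(w1, w2):
    """Call two windows parallel if they overlap in time, serial otherwise."""
    (on1, off1), (on2, off2) = w1, w2
    return "parallel" if on1 < off2 and on2 < off1 else "serial"

for (name1, win1), (name2, win2) in combinations(windows.items(), 2):
    print(f"{name1:32s} vs {name2:32s} -> {relation(win1, win2)}")
```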

The timing of the networks is thus evidence for both parallel and serial processing, while

individual network architectures involve within network parallelism and continuous

contributions from the frontal areas. The latter may reflect language specific processing and/or

domain general selection, maintenance and working memory that support retrieval and

integration-unification. It is also noteworthy that – again with the exception of the sentence

processing network 1a – every network includes at least one component somehow linked to

phonology. This suggests, first, that lexical-semantic information is continuously linked to

aspects of phonological representations. In effect, some distributed 'bundling' that might facilitate

processing is built into the network architectures. Secondly, the parallelism within networks also

argues for a situation in which networks supply 'forward models' as partially specified semantic

and phonological information, facilitating links between networks and distinct processing phases.

The only network not exhibiting this feature, the sentence processing/lemma retrieval network,

has nodes that overlap all other networks except the phonological network, and it runs in parallel

with the lexical/concept retrieval and semantic selection-integration networks. It is thus

connected to these via its topography and timing. Moreover, the inclusion of F2 (working

memory for sentences according to Vigneau and colleagues) in the phonological network

indicates that this network too may have access to the sentential representation while its

phonology is elaborated. PT (linked to 'auditory imagery' by Price, 2012) might serve a similar

function in the unification-integration network. Thus, the combination of timing and


architectures for the networks implies not only serial and parallel processing, including the

continuous activity of the lexical-concept retrieval network, but ongoing 'feed forward' effects in

the series so that indices of the activity of earlier networks are available through network internal

architectures to subsequent networks. Of course, the topographies and timing of our networks

need replication. Nevertheless, they offer a compelling picture of how different domains might

be associated via serial and parallel networks in sentence production.

4.7. N-gram Frequency Effects and Network Functions

Frequency of occurrence modulated the overall level of activity in all networks except the

sentence processing-lemma retrieval Network 1a. In general, the frequency effect tied to N-

grams was logarithmic. This is consistent with the robust finding from a number of fMRI and

single-cell recording studies that the neural activity in a variety of brain regions reduces with the

repeated presentation of, for example, faces, words, numbers, or objects (e.g., Grill-Spector,

Henson, & Martin, 2006, and references cited therein; see Li, Miller, & Desimone, 2004,

regarding the logarithmic form of the reduction) and with MEG studies investigating priming

(Sekiguchi et al., 2000) or using frequency and priming effects to investigate lexical access (e.g.

Embick et al., 2001; Pylkkänen et al., 2000, 2002). We also saw the flattening that we

hypothesised might index storage of N-grams but it began in all cases in the lowest 25% of the

data rather than in the middle or high frequency ranges. Additionally, there were slight increases

for higher frequency items. Overall activity rose in Network 1b for trigrams, in Network 2a for

trigrams and quadgrams, and in Network 2b for bigrams and trigrams. There are four points

about these results that warrant comment.


First, the fact that the flattening of amplitude responses occurred in the lower frequency

ranges rather than, as expected, in response to high frequency N-grams, argues against treating it

as an index of storage since it would imply holistic storage for more than 75% of the data. The

problem, eloquently presented in Baayen et al. (2013), of exponential and computationally

overwhelming proliferation of N-grams seems very real if this marks a storage point. Inspection

of the data makes the case even more vividly. If the flattening point is taken as a storage marker,

then taking trigrams as an example, this would mean that we store sequences such as decaying

decrepit boat, some teepees have, salamander peeking out, propellers swishing around,

stalactites or maybe (all with a log frequency > -4.7 (the 0 point) but < -4.0) as well as more

frequent items such as boys are playing, firefighters are engulfed, man and child, a group of,

group of people (all with log frequencies > 1.00). The examples suggest this is unlikely.
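
To make the arithmetic of this argument explicit, the sketch below (Python) treats the trigram 0 point of -4.7 as a putative storage threshold. The log-frequency values are hypothetical, chosen only to respect the ranges quoted above, and are not the study's actual estimates; the point is simply that the rare and the frequent examples alike would then count as 'stored'.

```python
# Hypothetical log frequencies, chosen only to respect the ranges quoted in the
# text (low-frequency examples between the trigram 0 point of -4.7 and -4.0;
# high-frequency examples above 1.0). These are NOT the study's actual values.
trigram_logfreq = {
    "decaying decrepit boat": -4.6,
    "some teepees have":      -4.5,
    "salamander peeking out": -4.3,
    "boys are playing":        1.1,
    "a group of":              1.6,
    "group of people":         1.3,
}

flattening_point = -4.7   # trigram 0 point reported in the text

# If flattening marked holistic storage, everything at or above the 0 point
# would count as 'stored' -- which here is every example, frequent or not.
stored = [ng for ng, lf in trigram_logfreq.items() if lf >= flattening_point]
share = len(stored) / len(trigram_logfreq)
print(f"{share:.0%} of these trigrams would be treated as stored:")
for ng in stored:
    print("  ", ng)
```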

Secondly, the absence of a reliable response to unigram frequency in the lexical-concept

retrieval network initially appears puzzling. However, given that the task involves sentence (or at

least phrase) production, frequency effects for single words may have been masked by the

abundance of high frequency closed class items in the data (e.g. articles, prepositions,

quantifiers). These do not elicit the prominent frequency effects of open class lexical items except

in highly constrained experimental designs and so might mask frequency contrasts for higher

frequency unigrams. Additionally, participants had already seen the pictures twice in overt and

covert naming tasks within the same block which could have affected the level of activation for

unigrams so that what we see in the lexical-concept retrieval network is a gradual, non-

significant, decrease in activation as frequency increases (see Llorens et al. 2014 on the neural

effects of repetition and familiarization of stimuli). Of course, the other possibility is that we are

simply mistaken and this is not a lexical-concept retrieval network. However, as noted, the other


networks are unlikely candidates for lexical-concept retrieval since they were either activated too

late (Networks 1b, 2a, 3) or showed no frequency effect at all (Network 1a). That said, our

alternative hypothesis, that amplitude responses to high frequency multi-word N-grams might

mimic amplitude responses to high frequency unigrams, is not supported by this result.

Third, there is the striking result that the sentence processing-lemma retrieval network did

not respond to frequency at all. This could be interpreted as signifying that it is purely

combinatorial, or, as we suggested, that it is involved in retrieval, selection, and unification-

integration of abstract, comparatively invariant features (lemmas) and associated structures

relevant for sentence processing. It is worth noting in this context that although frequency effects

have been observed for syntactic structures in behavioural studies (e.g. Bannard & Matthews,

2008; Arnon & Snider, 2010; Troyer et al., 2011; Jansen & Barber, 2012) and imaging studies

(e.g. Noppeney & Price, 2004; Dehaene-Lambertz et al., 2006), Saegert and colleagues (2013)

showed that while less frequent structures such as passive voice show syntactic priming effects

(where repetition normally reduces amplitude responses), active clauses show no effect of

repetition priming. Given that our data consists mostly (87.5%) of simple clauses or phrases with

no object relative clauses and few passive voice clauses, and that individual participants tended

to repeat very similar structure types (i.e. they self prime for the structure types they do use), the

absence of frequency effects in Network 1a may arise from these factors. Of course, this should

not be interpreted as evidence that syntax or syntactic features are completely insensitive to

frequency – just that the syntactic features in this data are relatively invariant within speakers

and structures are mostly simple.

Fourth, the increase in amplitudes for high frequency N-grams that appeared in some

networks also deserves comment. As noted above, this result appears fairly often, though in


'patches' and with small effects. A common interpretation of amplitude increases in MEG studies

is that they index an inhibitory effect that arises from competition among activated alternatives. For

instance, a number of MEG studies exploring lexical access and morphological models have

reported amplitude increases in response to increasing lexical frequencies (e.g. Pylkkänen et al.,

2004; Solomyak & Marantz, 2009; Lewis et al., 2011; Simon et al., 2012; and Lewis & Poeppel,

2014). In all of these, the increased activity to increasing lexical frequency modulated neural

response approximately 300 to 400 ms after stimulus presentation (the M350) which has been

associated with lexical access (Pylkkänen et al., 2000, 2002; Embick et al., 2001). Following

Pylkkänen et al. (2004), the explanation offered for the effect is that competition arises as more

frequent forms activate their 'family' resulting in an inhibitory neural response. Pylkkänen et al.

(2004) expected to find that the frequency of morphologically related words that shared a base

would have a cumulative facilitative effect on the lexical base consistent with combinatorial

models. For instance, government is actually more frequent in English than govern so they

expected that govern would inherit from the frequency of government and show corresponding

facilitation. Instead, they found that the frequency of morphological family members had an

inhibitory effect on the M350 (reflected in increased activity), as did larger morphological family

size (reflected in slower neural responses). Lewis et al. (2011) subsequently showed that when

lexical bases have no morphological family (e.g. vulnerable) the response of M350 to increasing

frequency of the base is facilitative (amplitude reduces). Simon et al. (2012) showed that both

form frequency and entropy (in this case a measure of the semantic uncertainty associated with

homonyms) had an inhibitory effect on the M350, and Lewis and Poeppel (2014) reported

inhibitory effects for lexical surface frequency on the M350. It thus appears that the amplitude of

the MEG signal increases for an assortment of lexical variables associated with frequency. These

studies all link the effect to competition.


In all of these studies except Lewis and Poeppel (2014), the ROI where the M350

amplitude increase appeared was in the anterior superior temporal gyrus (i.e. an area that

encompasses our T1a ROIs in the lexical-concept retrieval network). The study by Lewis and

Poeppel reports the M350 amplitude increase to increasing lexical surface frequency in posterior

middle temporal gyrus (approximately corresponding to our T2ml-T2p areas in the unification-

integration network). They also report an increase in amplitude to increasing biphone frequency

in posterior superior temporal gyrus (approximately T1 in our phonological network). Thus,

three of our networks have one or more areas where this effect has been reported.

Nevertheless, such effects do not translate straightforwardly into an interpretation of our

network responses to high frequency N-grams in part because our task is scene description,

whereas all the studies reporting this effect involve experimental paradigms manipulating the

visual or auditory presentation of single words. In contrast, we see this activity not only in our

lexical-concept retrieval network for bigrams and trigrams, but also in all our other networks

except sentence processing. Thus, while it might be argued that its appearance in the lexical

concept retrieval network for high frequency bigrams and trigrams suggests that these are being

retrieved as 'chunks' that compete with each other, it cannot be solely an index of competition

in lexical access since it shows up in late networks not plausibly associated with lexical access

and it appears in the semantic selection-integration network which lacks the areas where the

effect has been reported in MEG studies. However, MEG studies have also shown amplitude

increases for combinatorial processes. For instance, Bemis and Pylkkänen (2012) and Westerlund

and Pylkkänen (2014) demonstrate that when combination is actually required for simple phrases

activation increases in anterior temporal areas, and Dikker and Pylkkänen (2012) showed that

pre-activation in predictive contexts increases amplitude responses in middle temporal (and


other) areas. These studies are consonant with the appearance of amplitude increases for high-

frequency N-grams in some of our networks insofar as the highest frequency bigrams and

trigrams are constructions consisting of closed class words with or without open class high

frequency items (e.g. this is a, this is, there are only, a man and, a group of, and so on).

Moreover, in trigrams where the amplitude increase to high frequency N-grams is most robust,

67% of the data involves inflected verbs, which not only need complements but have been shown

in behavioural studies to activate their inflectional paradigms after lexical concept retrieval has

occurred (Bien et al. 2011). Thus, high frequency N-grams that elicit amplitude increases might

involve a number of different combinatorial and selection demands.

In summary, the increase in amplitude in response to increasing frequency might be

associated with competition amongst stored representations in some networks. However,

consideration of our data and results, together with review of the psycholinguistic processing and

MEG literatures, suggests a number of other interpretations which should serve to temper the

enthusiasm with which we endorse such a claim. First, the psycholinguistic literature tells us

that, among other things, frequency effects can arise without any stored representations and that

in any case frequency often masks other more important variables including those which result in

inhibitory effects for high frequency items (Baayen 2010b). Secondly, and perhaps more

importantly, increases in amplitude in MEG studies are not only linked to inhibitory responses

signifying competition amongst activated alternatives. The recent work reviewed above from

Pylkkänen and colleagues makes it clear that combinatorial and anticipatory processes can elicit

amplitude increases in MEG signal. Given the functional characterisation of the networks and the

large-scale seriality they exhibit, the possibility that amplitude increases signify different

processes in different networks must be taken seriously.


4.8. Limitations of this Study

There are a number of limitations in this study. We designed our task to be as ecologically

valid as possible since we were interested in exploring data driven approaches to a relatively

natural sentence production task. However, this meant that we did not constrain the response

time for participants. The absence of response time data does limit our ability to interpret some

of the results, especially whether the higher amplitude responses to high frequency N-grams were facilitatory

or inhibitory. On the other hand, this data driven approach has been helpful in suggesting what

the networks for our relatively natural task are like and in doing so, has opened up new areas of

inquiry in relation to frequency effects.

A second limitation is that we only considered the thirty left hemisphere areas identified in

Vigneau et al. (2006) as active in fMRI studies on language tasks. We thus excluded large parts

of the brain, many of which undoubtedly play roles in tasks such as this. We did so because we

wanted to find out first about how areas known to have roles in linguistic processing responded

on this task. We are conducting a follow-up whole-brain analysis to investigate interactions

between language networks and other relevant areas.

Thirdly, the ICA-derived networks are characterisations that represent an average across

complex spatio-temporal phenomena (Fornito et al., 2013), and as such are unable to capture the

possibility that, over time, the 30 regions could re-organise into other networks. For instance, the lack

of a temporal response in Network 2b could reflect this, though we favour the task-based

interpretation of its behaviour.


Finally, the study was relatively small with ten participants. However, they were quite

varied demographically and thus the signal that survived such variability is likely to generalise

more widely. Nevertheless, replication in a larger group is a necessary next step.

4.9. Theoretical Implications

There are several interesting theoretical implications from our results. First, these results

challenge notions of strict modularity and seriality. Every network – except the sentence

processing/lemma retrieval network – accesses areas associated with more than one domain.

Most notably, semantic areas are always co-activated with phonological areas (as broadly

construed here to include pre-articulatory traces and auditory images in the early networks).

Indeed, the ubiquity with which phonological areas appear in networks suggests that aspects of

phonology may function as the linguistic equivalent of embodiment, that is, the memory traces of

perceptual and articulatory experience that continuously bind semantic units to their expression

forms. This sort of bundling of information might be facilitative in precisely the way that is

needed to account for behavioural responses to frequency and invites further research.

Relatedly, the structural and temporal overlaps between the sentence

processing/lemma retrieval network and lexical concept retrieval and semantic

selection/integration networks suggest that these networks communicate with each other –

challenging notions that syntactic processing is entirely modular, happening in isolation from

semantic or phonological processing. However, the fact that the early networks differ in onset

and frequency response suggests that it is premature to toss syntax as an independent processing

level onto the heap of theoretical epiphenomena as some usage-based theorists have proposed

(e.g. Bybee & McClelland, 2005). Again, these are targets for future inquiries.


The network results also speak to another trend – from within psycholinguistics – which is

to abandon the mental lexicon as either too 'bloated' with too many different kinds of information

(e.g. Elman 2009) or as simply unnecessary since lexical effects can be accounted for, or at least

simulated, through combinatorial processes (Baayen, 2010; Baayen et al., 2013). Many of the

phenomena eliciting these responses may be addressed via the internal and between network

parallelism in our networks. Indeed, Elman (2009, pp. 21-22) speculates that the difficulties he

outlines might be resolved by a parallel architecture of the sort proposed by Jackendoff (e.g.

2002), but seems not to take the idea of a parallel architecture, permitting interactions between

domains, as particularly viable. The timing and topographies of our networks suggest

interactions between domains are quite possible, and offer support for a modified parallel

architecture within an overall serial organisation, obviating the need (though perhaps not the

urge) to abandon the lexicon.

Finally, it is important that all the networks have frontal components and each network

has at least one frontal area that has been associated with some aspect of working memory,

retrieval, or selection, as well as having areas associated with domain specific processing. The

fronto-posterior structure of every network together with within network parallelism suggests

that top-down and bottom-up effects in language processing are an essential and continuous part

of the picture rather than something that only happens sometimes – at least when the task is a

comparatively natural one like scene description.

5. Conclusion

We investigated claims for memory-based versus combinatorial approaches to language

processing using data driven methods in an MEG study of scene description. We expected that if

highly frequent multi-word sequences (N-grams) were stored in memory, we would see this in


the MEG signal through a 'flattening' of amplitude responses for high frequency N-grams, or that

the amplitude response for single words would be paralleled in that for highly frequent multi-

word N-grams in network(s) associated with lexical-concept retrieval. Neither of these

expectations was met in ways that can be interpreted as unequivocal support for the memory-

based hypothesis. The expected 'flattening' occurred, but too early (bottom 25%) in all cases.

Single-word N-grams (unigrams) elicited a reliable response only in the phonological network, not in the

lexical-concept retrieval network or other early networks. Moreover, we observed activation

increases for high-frequency multi-word N-grams, which may reflect competition amongst

activated alternatives in early networks, but may also reflect combinatorial processes. In view of

this, we explored network timing and topographies in more detail with a view to clarifying their

probable processing contributions.

We identified five networks, four of which involved areas from more than one domain

arguing for within network parallelism, and all of which involved frontal and posterior areas,

arguing for continuous top-down and bottom-up processing. The only network that did not

aggregate areas from different domains was an early sentence processing/lemma retrieval

network which was also unique in that it did not respond to N-gram frequencies. However, it ran

in parallel with lexical/concept retrieval and semantic selection/integration networks, with which

it also shared areas, thus suggesting that these networks collaborate in early stages of processing.

Later networks for phonological processing and a novel “unification-integration” network for

production present a more conventionally serial picture, although they too exhibited network

internal parallelism. The results of the network exploration suggest an overall activation series

which is simultaneously quite plausible as the neural basis for combinatorial approaches and

compatible with memory-based parallel architecture accounts. Overall, these results suggest not a language


network, but interacting language networks contributing to specific aspects of processing, some

running in parallel and all but one involving network internal parallelism. They may offer new

ways to address a number of persistent questions in linguistics and the neurobiology of language.

6. References

Abbot-Smith, K., & Tomasello, M. (2006). Exemplar-learning and schematization in a usage-

based account of syntactic acquisition. The Linguistic Review, 23, 275–290.

Allen, K., Pereira, F., Botvinick, M., & Goldberg, A. E. (2012). Distinguishing grammatical

constructions with fMRI pattern analysis. Brain and Language, 123(3), 174-182.

Arbib, M. A., & Lee, J. (2008). Describing visual scenes: Towards a neurolinguistics based on

construction grammar. Brain Research, 1225, 146–162.

Arnon, I., & Snider, N. (2010). More than words: Frequency effects for multi-word phrases.

Journal of Memory and Language, 62, 67–82.

Baayen, R. H. (2007). Storage and computation in the mental lexicon. In G. Jarema & G. Libben

(Eds.), The Mental Lexicon: Core Perspectives (pp. 81–104). Elsevier.

Baayen, R. H. (2010a). A real experiment is a factorial experiment? The Mental Lexicon, 5(1),

149–157.

Baayen, R. H. (2010b). Demythologizing the word frequency effect: A discriminative learning

perspective. The Mental Lexicon, 5, 436-461.

Baayen, R. H., Hendrix, P., & Ramscar, M. (2013). Sidestepping the combinatorial explosion: An

explanation of n-gram frequency effects based on naive discriminative learning. Language

and Speech, 56(3), 329-347.


Bajada, C. J., Ralph, M. A. L., & Cloutman, L. L. (2015). Transport for Language South of the

Sylvian Fissure: The routes and history of the main tracts and stations in the ventral

language network. Cortex.

Bannard, C., & Matthews, D. (2008). Stored word sequences in language learning: The effect of

familiarity on children’s repetition of four-word combinations. Psychological Science, 19,

241–248.

Bardouille, T., & Bow, S. (2012). State-related changes in MEG functional connectivity reveal

the task-positive sensorimotor network. PLoS One, 7, 1–8.

Bardouille, T., Picton, T., & Ross, B. (2010). Attention modulates beta oscillations during

prolonged tactile stimulation. European Journal of Neuroscience, 31, 761–769.

Bell, A., Brenier, J. M., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on

durations of content and function words in conversational English. Journal of Memory and

Language, 60, 92–111.

Bemis, D. K., & Pylkkänen, L. (2012). Basic linguistic composition recruits the left anterior

temporal lobe and left angular gyrus during both listening and reading. Cerebral Cortex,

bhs170.

Bemis, D. K., & Pylkkänen, L. (2013). Flexible composition: MEG evidence for the deployment

of basic combinatorial linguistic mechanisms in response to task demands. PloS One, 8,

e73949.

Bien, H., Baayen, H. R., & Levelt, W. J. M. (2011). Frequency effects in the production of

Dutch deverbal adjectives and inflected verbs. In R. Bertram, J. Hyönä, & M. Laine (eds.),

Morphology in language comprehension, production and acquisition (pp. 683-715).

London: Psychology Press.


Binder, J., Desai, R., Graves, W., & Conant, L. (2009). Where Is the Semantic System? A

Critical Review and Meta-Analysis of 120 Functional Neuroimaging Studies. Cerebral

Cortex, 19, 2767–2796.

Bock, J. (1986). Meaning, sound, and syntax: Lexical priming in sentence production. Journal

of Experimental Psychology-Learning Memory and Cognition, 12, 575–586.

Bonner, M. F., Peelle, J. E., Cook, P. A., & Grossman, M. (2013). Heteromodal conceptual

processing in the angular gyrus. Neuroimage, 71, 175-186.

Bornkessel, I., Zysset, S., Friederici, A. D., von Cramon, D. Y., & Schlesewsky, M. (2005). Who

did what to whom? The neural basis of argument hierarchies during language

comprehension. Neuroimage, 26, 221-233.

Brennan, J., Nir, Y., Hasson, U., Malach, R., Heeger, D., & Pylkkänen, L. (2012). Syntactic

structure building in the anterior temporal lobe during natural story listening. Brain and

Language, 120, 163–173.

Brookes, M. J., Woolrich, M., Luckhoo, H., Price, D., Hale, J. R., Stephenson, M. C., Barnes,

G. R., Smith, S. M., & Morris, P. G. (2011). Investigating the electrophysiological basis of

resting state networks using magnetoencephalography. Proceedings of the National Academy

of Sciences of the United States of America, 108, 16783-16788.

Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic

theory based on domain general principles of human cognition. The Linguistic Review, 22,

381–410.

Caplan, D., Chen, E., & Waters, G. (2008). Task-dependent and task-independent neurovascular

responses to syntactic processing. Cortex, 44, 257-275.


Catani, M., Jones, D. K., & ffytche, D. H. (2005). Perisylvian language networks of the human brain.

Annals of Neurology, 57, 8–16.

Catani, M., Mesulam, M. M., Jakobsen, E., Malik, F., Matersteck, A., Wieneke, C., ... &

Rogalski, E. (2013). A novel frontal pathway underlies verbal fluency in primary

progressive aphasia. Brain, awt163.

Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249–253.

Columbus, G. (2010). Processing MWUs: Are MWU subtypes psycholinguistically real? In D.

Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 194–

212). London and New York: Continuum.

Columbus, G. (2012). An analysis of the processing of multiword units in sentence reading and

unit presentation using eye movement data: Implications for theories of mwus. PhD

dissertation, University of Alberta, Edmonton, Canada.

Culicover, P.W., & Jackendoff, R. (2005). Simpler Syntax. Oxford: Oxford University Press.

De Cat, C., Klepousniotou, E., & Baayen, R. H. (2015). Representational deficit or processing

effect? An electrophysiological study of noun-noun compound processing by very advanced

L2 speakers of English. Frontiers in psychology, 6.

Dehaene-Lambertz, G., Dehaene, S., Anton, J. L., Campagne, A., Ciuciu, P., Dehaene, G. P., ... &

Poline, J. B. (2006). Functional segregation of cortical language areas by sentence

repetition. Human brain mapping, 27, 360-371.

Del Prato, P., & Pylkkänen, L. (2014). MEG evidence for conceptual combination but not

numeral quantification in the left anterior temporal lobe during language production.

Frontiers in Psychology, 5, 524.


den Ouden, D. B., Saur, D., Mader, W., Schelter, B., Lukic, S., Wali, E., ... & Thompson, C. K.

(2012). Network modulation during complex syntactic processing. Neuroimage, 59, 815-

823.

Dikker, S., & Pylkkänen, L. (2013). Predicting language: MEG evidence for lexical

preactivation. Brain and language, 127, 55-64.

Dronkers, N. F., Wilkins, D. P., Van Valin, R. D., Jr., Redfern, B. B., & Jaeger, J. J. (2004). Lesion analysis of the

brain areas involved in language comprehension. Cognition, 92, 145–177.

Duffau, H., Moritz-Gasser, S., & Mandonnet, E. (2014). A re-examination of the neural basis of

language processing: proposal of a dynamic hodotopical model from data provided by brain

stimulation mapping during picture naming. Brain and Language. 131, 1–10.

Ellis, N. C., & Simpson-Vlach, R. (2009). Formulaic language in native speakers: Triangulating

psycholinguistics, corpus linguistics, and education. Corpus Linguistics and Linguistic

Theory, 5(1), 61-78.

Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical knowledge without a

lexicon. Cognitive Science, 33, 547-582.

Embick, D., Hackl, M., Schaeffer, J., Kelepir, M., & Marantz, A. (2001). A

magnetoencephalographic component whose latency reflects lexical frequency. Cognitive

Brain Research, 10, 345-348.

Erickson, L. C., Zielinski, B. A., Zielinski, J. E., Liu, G., Turkeltaub, P. E., Leaver, A. M., &

Rauschecker, J. P. (2014). Distinct cortical locations for integration of audiovisual speech

and the McGurk effect. Frontiers in Psychology, 5, 534.


Faraway, J. J. (2006). Extending linear models with R: Generalized linear, mixed effects and

nonparametric regression models. Boca Raton, FL: Chapman & Hall/CRC.

Fedorenko, E., Duncan, J., & Kanwisher, N. (2012). Language-selective and domain-general

regions lie side by side within Broca’s area. Current Biology, 22, 2059-2062.

Fornito, A., Zalesky, A., & Breakspear, M. (2013). Graph analysis of the human connectome:

Promise, progress, and pitfalls. NeuroImage, 80, 426–444.

Goldberg, A. E. (2003). Constructions: A new theoretical approach to language. Trends in

Cognitive Sciences, 7(5), 219-224.

Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., & Jurafsky, D. (1999). The effects

of collocational strength and contextual predictability in lexical production. CLS–99, 35,

151–166.

Grewe, T., Bornkessel-Schlesewsky, I., Zysset, S., Wiese, R., von Cramon, D. Y., &

Schlesewsky, M. (2007). The role of the posterior superior temporal sulcus in the processing

of unmarked transitivity. Neuroimage, 35, 343-352.

Griffiths, J.D., Marslen-Wilson, W.D., Stamatakis, E.A., & Tyler, L.K. (2013). Functional

organization of the neural language system: dorsal and ventral pathways are critical for

syntax. Cerebral Cortex, 23, 139-47.

Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural models of

stimulus-specific effects. Trends in Cognitive Sciences, 10, 14-23.

Hagoort, P. (2005). On Broca, brain, and binding: a new framework. Trends in Cognitive

Sciences, 9, 416–423.


Hagoort, P. (2013). MUC (Memory, Unification, Control) and beyond. Frontiers in Psychology,

4.

Haller, S., Klarhoefer, M., Schwarzbach, J., Radue, E. W., & Indefrey, P. (2007). Spatial and

temporal analysis of fMRI data on word and sentence reading. European Journal of

Neuroscience, 26, 2074–2084.

Hansen, P. C., Kringelbach, M. L., & Salmelin, R. (2010). MEG: An introduction to methods.

Oxford: Oxford University Press.

Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. New York:

Chapman & Hall.

Hebb, D.O. (1949). The Organization of Behavior. New York: Wiley & Sons.

Hein, G., & Knight, R. T. (2008). Superior temporal sulcus—it's my area: Or is it? Journal of

Cognitive Neuroscience, 20, 2125-2136.

Hendrix, P. (2009). Electrophysiological effects in language production: a picture naming study

using generalized additive modeling. MA dissertation, Radboud University, Nijmegen, the

Netherlands.

Herrmann, B., Maess, B., Hahne, A., Schröger, E., & Friederici, A. D. (2011). Syntactic and

auditory spatial processing in the human temporal cortex: an MEG study. NeuroImage, 57,

624-633.

Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding

aspects of the functional anatomy of language. Cognition, 92, 67–99.


Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature

Reviews Neuroscience, 8, 393-402.

Hocking, J., & Price, C. J. (2008). The role of the posterior superior temporal sulcus in

audiovisual processing. Cerebral Cortex, 18, 2439-2449.

Hurley, R. S., Paller, K. A., Rogalski, E. J., & Mesulam, M. M. (2012). Neural mechanisms of

object naming and word comprehension in primary progressive aphasia. The Journal of

Neuroscience, 32, 4848-4855.

Indefrey, P., Brown, C. M., Hellwig, F., Amunts, K., Herzog, H., & Seitz, R. J. (2001). A neural

correlate of syntactic encoding during speech production. Proceedings of the National

Academy of Sciences of the United States of America, 98, 5933–5936.

Indefrey, P., & Levelt, W. J. (2004). The spatial and temporal signatures of word production

components. Cognition, 92, 101–144.

Jackendoff, R. (2002a). Foundations of Language. Oxford: Oxford University Press.

Jackendoff, R. (2002b). What’s in the lexicon? In S. Nooteboom, F. Weerman, & F. Wijnen

(Eds.), Storage and Computation in the Language Faculty. (pp. 23–58). Dordrecht, The

Netherlands: Kluwer Academic Press.

Jackendoff, R. (2007). A parallel architecture perspective on language processing. Brain

Research, 1146, 2-22.

Jackendoff, R. (2011). What is the human language faculty: Two views. Language, 87, 586–624.

Janssen, N., & Barber, H. A. (2012). Phrase frequency effects in language production. PloS One,

7, e33202.


Jiang, N., & Nekrasova, T. (2007). The processing of formulaic sequences by second language

speakers. The Modern Language Journal, 91, 433–445.

Jurafsky, D., Bell, M., & Raymond, W. (2001). Probabilistic relations between words: Evidence

from reduction in lexical production. In J. Bybee & P. Hopper (Eds.), Frequency and the

Emergence of Linguistic Structure (pp. 229–254). Amsterdam: John Benjamins.

Kapatsinski, V., & Radicke, J. (2009). Frequency and the emergence of prefabs: Evidence from

monitoring. In R. Corrigan, E. Moravcsik, H. Ouali, & K. Wheatley (Eds.), Formulaic

language. vol. ii: Acquisition, loss, psychological reality, functional explanations. (pp. 499–

522). Amsterdam and Philadelphia: John Benjamins.

Keele, L. (2008). Semiparametric Regression for the Social Sciences. New York: Chapman &

Hall/CRC.

Keller, F., & Lapata, M. (2003). Using the web to obtain frequencies for unseen bigrams.

Computational Linguistics, 29, 459–484.

Kryuchkova, T., Tucker, B. V., Wurm, L. H., & Baayen, R. H. (2012). Danger and usefulness are

detected early in auditory lexical processing: Evidence from electroencephalography. Brain

and Language, 122, 81–91.

Lahnakoski, J. M., Glerean, E., Salmi, J., Jääskeläinen, I. P., Sams, M., Hari, R., & Nummenmaa,

L. (2012). Naturalistic fMRI mapping reveals superior temporal sulcus as the hub for the

distributed brain network for social perception. Frontiers in Human Neuroscience, 6.

Lamb, S. (1966). Outline of Stratificational Grammar. Washington: Georgetown University

Press.

Lancaster, J., Rainey, L., Summerlin, J., Freitas, C., Fox, P., & Evans, A. (1997).

Automated labeling of the human brain: A preliminary report on the development and

evaluation of a forward-transform method. Human Brain Mapping, 5, 238–242.


Lancaster, J., Woldorff, M., Parsons, L., Liotti, M., Freitas, C., & Rainey, L. (2000). Automated

Talairach Atlas labels for functional brain mapping. Human Brain Mapping, 10, 120–131.

Lau, E. F., Gramfort, A., Hämäläinen, M. S., & Kuperberg, G. R. (2013). Automatic semantic

facilitation in anterior temporal cortex revealed through multimodal neuroimaging. The

Journal of Neuroscience, 33, 17174-17181.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Lewis, G., & Poeppel, D. (2014). The role of visual representations during the lexical access of

spoken words. Brain and language, 134, 1-10.

Lewis, G., Solomyak, O., & Marantz, A. (2011). The neural basis of obligatory decomposition of

suffixed words. Brain and language, 118, 118-127.

Li, L., Miller, E. K., & Desimone, R. (2004). The representation of stimulus familiarity in

anterior inferior temporal cortex. Journal of Neurophysiology, 69, 1918–1929.

Llorens, A., Trébuchon, A., Riès, S., Liégeois-Chauvel, C., & Alario, F. (2014). How

familiarization and repetition modulate the picture naming network. Brain and language,

133, 47-58.

Lu, Z., & Kaufman, L. (2003). Magnetic source imaging of the human brain. New York:

Routledge.

MacCallum, R., Zhang, S., Preacher, K., & Rucker, D. (2002). On the practice of

dichotomization of quantitative variables. Psychological Methods, 7, 19–40.

Magnusdottir, S., Fillmore, P., den Ouden, D. B., Hjaltason, H., Rorden, C., Kjartansson, O., ... &

Fridriksson, J. (2013). Damage to left anterior temporal cortex predicts impairment of


complex syntactic processing: A lesion-symptom mapping study. Human Brain Mapping, 34,

2715-2723.

Makuuchi, M., Bahlmann, J., Anwander, A., & Friederici, A. D. (2009). Segregating the core

computational faculty of human language from working memory. Proceedings of the

National Academy of Sciences, 106, 8362-8367.

Makuuchi, M., & Friederici, A. D. (2013). Hierarchical functional connectivity between the core

language system and the working memory system. Cortex, 49, 2416-2423.

Marantz, A. (2005). Generative linguistics within the cognitive neuroscience of language. The

Linguistic Review, 22, 429–445.

Marchini, J., Heaton, C., & Ripley, B. (2012). fastICA: FastICA Algorithms to perform ICA and

Projection Pursuit (R package version 1.1-16).

Marek, A., Habets, B., Jansma, B. M., Nager, W., & Muente, T. F. (2007). Neural correlates of

conceptualisation difficulty during the preparation of complex utterances. Aphasiology, 21,

1147–1156.

Martin, N., Weisberg, R., & Saffran, E. (1989). Variables influencing the occurrence of naming

errors: Implications for models of lexical retrieval. Journal of Memory and Language, 28,

462–485.

Mei, L., Xue, G., Lu, Z.-L., He, Q., Zhang, M., et al. (2014). Artificial language training reveals the

neural substrates underlying addressed and assembled phonologies. PLoS ONE, 9, e93548.

Mesulam, M. M. (1998). From sensation to cognition. Brain, 121, 1013-1052.


Mesulam, M., Rogalski, E., Wieneke, C., Cobia, D., Rademaker, A., Thompson, C., &

Weintraub, S. (2009). Neurology of anomia in the semantic variant of primary progressive

aphasia. Brain, awp138.

Mesulam, M. M., Wieneke, C., Hurley, R., Rademaker, A., Thompson, C. K., Weintraub, S., &

Rogalski, E. J. (2013). Words and objects at the tip of the left temporal lobe in primary

progressive aphasia. Brain, 136, 601-618.

Meyer, C. F., Grabowski, R., Han, H.-Y., Mantzouranis, K., & Moses, S. (2003). The world wide

web as linguistic corpus. In P. Leistyna & C. F. Meyer (Eds.), Language and Computers,

Corpus Analysis: Language Structure and Language Use. (pp. 241–254). Amsterdam:

Rodopi.

Mion, M., Patterson, K., Acosta-Cabronero, J., Pengas, G., Izquierdo-Garcia, D., Hong, Y. T., ...

& Nestor, P. J. (2010). What the left and right anterior fusiform gyri tell us about semantic

memory. Brain, 133, 3256-3268.

Motulsky, H., & Christopoulos, A. (2004). Fitting Models to Biological Data Using Linear and

Non-linear Regression: A Practical Guide to Curve Fitting. Oxford, UK: Oxford

University Press.

Newman, A. J., Supalla, T., Hauser, P., Newport, E. L., & Bavelier, D. (2010). Dissociating

neural subsystems for grammar by contrasting word order and inflection. Proceedings of the

National Academy of Sciences, 107, 7539-7544.

Noppeney, U., & Price, C. J. (2004). An fMRI study of syntactic adaptation. Journal of

Cognitive Neuroscience, 16, 702-713.


Okada, K., & Hickok, G. (2006). Left posterior auditory-related cortices participate both in

speech perception and speech production: Neural overlap revealed by fMRI. Brain and

Language, 96, 112–117.

Oomen, C., & Postma, A. (2001). Effects of time pressure on mechanisms of speech production

and self-monitoring. Journal of Psycholinguistic Research, 30, 163–184.

Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The

representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8,

976-987.

Price, C. (2010). The anatomy of language: A review of 100 fMRI studies published

in 2009. Annals of the New York Academy of Sciences: The Year in Cognitive Neuroscience,

1191, 62–88.

Price, C. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard

speech, spoken language and reading. Neuroimage, 62, 816–847.

Pylkkänen, L., Stringfellow, A., Flagg, E., & Marantz, A. (2000). A neural response sensitive to

repetition and phonotactic probability: MEG investigations of lexical access. In Proceedings

of biomag (pp. 363-367).

Pylkkänen, L., Stringfellow, A., & Marantz, A. (2002). Neuromagnetic evidence for the timing of

lexical activation: An MEG component sensitive to phonotactic probability but not to

neighborhood density. Brain and Language, 81, 666-678.

Pylkkänen, L., & Marantz, A. (2003). Tracking the time course of word recognition with MEG.

Trends in Cognitive Sciences, 7, 187–189.

Pylkkänen, L., Feintuch, S., Hopkins, E., & Marantz, A. (2004). Neural correlates of the effects

of morphological family frequency and family size: an MEG study. Cognition, 91, B35-B45.


Pylkkänen, L., Bemis, D. K., & Blanco-Elorrieta, E. (2014). Building phrases in language

production: An MEG study of simple composition. Cognition, 133, 371-384.

Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: nonhuman

primates illuminate human speech processing. Nature neuroscience, 12, 718-724.

R Development Core Team. (2012). R: A language and environment for statistical computing.

Vienna, Austria (ISBN 3-900051-07-0).

Roelofs, A., Meyer, A. S., & Levelt, W. J. (1998). A case for the lemma/lexeme distinction in

models of speaking: Comment on Caramazza and Miozzo (1997). Cognition, 69(2), 219-

230.

Saur, D., Kreher, B. W., Schnell, S., Kümmerer, D., Kellmeyer, P., Vry, M. S., ... & Weiller, C.

(2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of

Sciences, 105, 18035-18040.

Schreuder, R., & Baayen, R. H. (1997). How simplex complex words can be. Journal of Memory

and Language, 37, 118-139.

Segaert, K., Kempen, G., Petersson, K. M., & Hagoort, P. (2013). Syntactic priming and the

lexical boost effect during sentence production and sentence comprehension: an fMRI study.

Brain and language, 124(2), 174-183.

Schwartz, M. F., Kimberg, D. Y., Walker, G. M., Faseyitan, O., Brecher, A., Dell, G. S., &

Coslett, H.B. (2009). Anterior temporal involvement in semantic word retrieval: VLSM

evidence from aphasia. Brain. 132, 3411-3427.

Sekiguchi, T., Koyama, S., & Kakigi, R. (2000). The effect of word repetition on evoked

magnetic responses in the human brain. Japanese Psychological Research, 42(1), 3-14.

Segonne, F., Dale, A., Busa, E., Glessner, M., Salat, D., Hahn, H., ... & Fischl, B. (2004). A hybrid

approach to the skull stripping problem in MRI. Neuroimage, 22, 1060–1075.

Simon, D. A., Lewis, G., & Marantz, A. (2012). Disambiguating form and lexical frequency

effects in MEG responses using homonyms. Language and Cognitive Processes, 27, 275-

287.

Siyanova-Chanturia, A., Conklin, K., & Heuven, W. J. van. (2011). Seeing a phrase “time and

again” matters: The role of phrasal frequency in the processing of multiword sequences.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 776–784.

Solomyak, O., & Marantz, A. (2009). Lexical access in early stages of visual word processing: A

single-trial correlational MEG study of heteronym recognition. Brain and language, 108,

191-196.

Sosa, A. V., & MacFarlane, J. (2002). Evidence for frequency-based constituents in the mental

lexicon: Collocations involving the word of. Brain and Language, 82, 227–236.

Taulu, S., & Simola, J. (2006). Spatiotemporal signal space separation method for rejecting

nearby interference in MEG measurements. Physics in Medicine and Biology, 51, 1759–

1768.

Thelwall, M., & Sud, P. (2012). Webometrics research with the Bing Search API 2.0. Journal of

Informetrics, 6, 44–52.

Tremblay, A. (2013). icaOcularCorrection: Independent Components Analysis (ICA) based eye-

movement correction (R package version 2.4.3.5.1 alpha – pre-release).

Tremblay, A., & Baayen, R. H. (2010). Holistic processing of regular four-word sequences: A

behavioral and ERP study of the effects of structure, frequency, and probability on


immediate free recall. In D. Wood (Ed.), Perspectives on Formulaic Language: Acquisition

and Communication (pp. 151–173). London and New York: Continuum.

Tremblay, A., Derwing, B., Libben, G., & Westbury, C. (2011). Processing advantages of lexical

bundles: Evidence from self-paced reading and sentence recall tasks. Language Learning,

61, 569–613.

Tremblay, A., & Tucker, B. V. (2011). The effects of n-gram probabilistic measures on the

recognition and production of four-word sequences. The Mental Lexicon, 6, 302–324.

Tremblay, A. & Newman, A.J. (2015). Modeling non-linear relationships in ERP data using

mixed-effects regression with R examples. Psychophysiology, 1–16. doi:

10.1111/psyp.12299.

Troyer, M., O’Donnell, T. J., Fedorenko, E., & Gibson, E. (2011). In N. A. Taatgen & H. van Rijn

(Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 336-341).

Austin, TX: Cognitive Science Society.

Tyler, L. K., Marslen-Wilson, W. D., Randall, B., Wright, P., Devereux, B. J., Zhuang, J., ... &

Stamatakis, E. A. (2011). Left inferior frontal cortex and syntax: function, structure and

behaviour in patients with left hemisphere damage. Brain, 134, 415-431.

Ullman, M. (2007). The biocognition of the mental lexicon. In M. G. Gaskell (Ed.), The Oxford

Handbook of Psycholinguistics. (pp. 267–286). Oxford: Oxford University Press.

Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it. An eye-movement study into

the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic Sequences.

Acquisition, Processing and Use (pp. 153–172). Amsterdam and Philadelphia: John

Benjamins.


Vigneau, M., Beaucousin, V., Herve, P., Duffau, H., Crivello, F., & Houde, O. (2006). Meta-

analyzing left hemisphere language areas: Phonology, semantics, and sentence processing.

NeuroImage, 30, 1414–1432.

Visser, M., & Ralph, M. L. (2011). Differential contributions of bilateral ventral anterior

temporal lobe and left anterior superior temporal gyrus to semantic processes. Journal of

Cognitive Neuroscience, 23, 3121-3131.

Vrba, J., Taulu, S., Nenonen, J., & Ahonen, A. (2010). Signal space separation beamformer.

Brain Topography, 23, 128–133.

Walker, G. M., Schwartz, M. F., Kimberg, D. Y., Faseyitan, O., Brecher, A., Dell, G. S., &

Coslett, H.B. (2011). Support for anterior temporal involvement in semantic error

production in aphasia: New evidence from VLSM. Brain and Language, 117, 110–122.

Westerlund, M., & Pylkkänen, L. (2014). The role of the left anterior temporal lobe in

semantic composition vs. semantic memory. Neuropsychologia, 57, 59-70.

Willems, R. M., Özyürek, A., & Hagoort, P. (2009). Differential roles for left inferior frontal and

superior temporal cortex in multimodal integration of action and language. Neuroimage, 47,

1992-2004.

Wilson, S. M., Demarco, A. T., Henry, M. L., Gesierich, B., Babiak, M., Mandelli, M. L., ... &

Gorno-Tempini, M. L. (2014). What Role Does the Anterior Temporal Lobe Play in

Sentence-level Processing? Neural Correlates of Syntactic Processing in Semantic Variant

Primary Progressive Aphasia. Journal of Cognitive Neuroscience, 26, 970.

Wood, S. N. (2012). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness

Estimation (R package version 1.7-22).


Zhang, L., Xi, J., Xu, G., Shu, H., Wang, X., & Li, P. (2011). Cortical dynamics of acoustic and

phonological processing in speech perception. PloS One, 6, e20963.

Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects

models and extensions in ecology with R. New York: Springer.
