Automatic identification of received language in MEG

Emilio Parisotto 1, Youness A. Ghassabeh 2, Matt J. MacDonald 3, Adelina Cozma 4, Elizabeth W. Pang 5, Frank Rudzicz 1,2,*

1 Department of Computer Science, University of Toronto; 2 Toronto Rehabilitation Institute-UHN; 3 SickKids Research Institute; 4 University of Calgary; 5 Neurology, Hospital for Sick Children; Toronto ON Canada

(*) [email protected]

Abstract
We identify the language being received during English and Romanian auditory stimuli in 11 subjects before and after a period of learning 50 words in the latter, using only magnetoencephalographic measures. To accomplish this, we extract on the order of 100,000 features (based on wavelets and descriptive statistics over windowed signals), and identify the most salient features. While we achieve very high accuracy in pre-training (up to 90% mean accuracy across 10-fold cross-validation for some subjects), it is significantly more difficult to tell received languages apart after training. We also identify significant effects of semantic word category and of the subject's ability to play a musical instrument on classification accuracy.

Index Terms: Magnetoencephalography, feature selection, language classification

1. Introduction
Automatically classifying magnetoencephalography (MEG) data presents several challenges, including high dimensionality, low signal-to-noise ratio, and high inter-channel redundancy. These factors, along with comparatively low trial counts in many MEG experiments, can lead to overfitting. To overcome these challenges, we examine a method of feature selection and dimensionality reduction that reduces an initially very high-dimensional feature space into a more succinct, low-dimensional representation which still maintains discriminative information. Specifically, we classify the language of heard utterances given only MEG signals recorded during the time the word was spoken.

Previous work on MEG classification includes detecting hand movement [1], identifying schizophrenia [2], and discriminating between sets of imagined words [3]. To classify between three different hand movements1, Asano et al. [1] used an adaptive spatial filter, principal components analysis (PCA), and a support vector machine (SVM) to achieve 62.6% accuracy on held-out test data. In Ince et al. [2], a subject performed a working memory functional task while MEG data were recorded; an SVM with recursive feature elimination (SVM-RFE) was then used both to select a concise feature set and to identify schizophrenia. SVM-RFE recursively discarded features that did not significantly contribute to the margin of the SVM classifier to prevent excessive overfitting on the training set, and achieved 83.8% to 91.9% accuracy on the test data.

Closer to our work, Guimaraes et al. [3] classified sets of 7-9 imagined words in two subtasks. In the first, the subject was

1 Corresponding to the signs in the game of 'rock, paper, scissors'.

simply required to attentively listen to a spoken word, while in the second the subject was shown each word visually and told to recite it silently. Those data were then examined using linear discriminant classification and SVM algorithms to classify each channel, and further analyzed in terms of the effects of spatial PCA, independent components analysis (ICA), and second-order blind identification decomposition. By combining channels, Guimaraes et al. achieved a 60.1% mean classification rate on nine auditory words and a 97.5% maximum mean classification rate on two-word problems.

2. Data
The data used in this paper were originally from a neuroimaging study examining how language learning over an extended period affects semantic processing [4]. Each subject learned 50 words in a new language, Romanian, over a two-week period; this length of time avoids possible effects of short-term memory on language discrimination. The MEG data were collected for each subject during a receptive language comprehension task involving one session prior to learning any Romanian words and another after language training. In each case, subjects were presented with an auditory word in either English or Romanian and instructed to choose one of two pictures whose meaning coincided with that word. These sessions used 50 English and 50 Romanian words, all of which were distinct in meaning. Each word was repeated twice for a total of 200 trials per subject, although visual stimuli were all unique images. Each 4.5-second epoch started with the auditory word being presented during a 1-second interval, after which the two images were presented for 3.5 seconds.

The data were continuously acquired by a whole-head 151-channel MEG with a 625 Hz sampling rate. The signals were segmented into the 200 trials and then downsampled to 100 Hz to remove high-frequency noise components and transform the data into a more manageable form.

Fourteen subjects participated (7 females, 7 males; mean age = 28.4 (σ = 4.7)), 11 of whom are considered here due to missing demographic information and unresolved inconsistencies in the data for two subjects. All subjects were right-handed, as confirmed by the Edinburgh Handedness Inventory, and all spoke English as a first language except two (French and Swiss German, respectively). All participants completed the Peabody Picture Vocabulary Test (3rd ed.; PPVT) [5] and the Expressive Vocabulary Test (EVT) [6]. Subjects also completed a questionnaire that determined, among other things, their primary language, second language (if applicable), and whether they play a musical instrument. These factors are explored in


Figure 1: Regions showing maximal changes post-training for Romanian during the receptive task, adapted from [4].

section 4.1.

Source localization analyses in [4] showed predominant and significant changes pre-/post-training for Romanian in the receptive task in the left superior parietal lobule (Fig. 1).

3. Methods
To classify the MEG data, we first transform all raw data from their original sensor space into 'source' space using independent components analysis (ICA). Many potentially redundant features are then extracted from the transformed data and concatenated together for each trial, as described in section 3.2. To offset the resulting high dimensionality, we score features heuristically according to their discriminability (according to Welch's t-test) between the two language classes. This precedes a method of coarse dimensionality reduction that maximizes the retained information. Finally, the transformed data are sent through a non-linear support vector machine, which classifies each MEG trial as resulting from either a Romanian or English spoken word.

To test the generalizability of our system, we use 10-fold cross-validation. In each fold, for each subject independently, 20 trials (10 from each language) are held out as the test set, and all ICA weights, PCA components, and classifiers are trained on the remaining 180 trials. This is repeated using 10 distinct subsets of 20 trials, and the final averaged classification performance is reported.
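The per-subject scheme above can be sketched as a stratified partition of trial indices. This is a minimal illustration, not the authors' code; the label array and seed are assumptions:

```python
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Partition trial indices into k folds, balanced across the two
    language classes, as in the 10-fold scheme described above."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        for i, chunk in enumerate(np.array_split(idx, k)):
            folds[i].extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]

# 100 English (0) and 100 Romanian (1) trials, as described in section 2
y = np.array([0] * 100 + [1] * 100)
folds = stratified_folds(y)  # each fold holds out 20 trials, 10 per language
```

Each fold's indices form the test set; the remaining 180 trials train the ICA weights, PCA components, and classifier.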

3.1. Blind source separation

Blind source separation (BSS) tries to recover the original source signals from their mixture without a priori knowledge of the source signals [7]. This can be reduced to finding a linear representation with (maximally) statistically independent components. Given a vector of n observations at time t, x(t) = [x1(t), . . . , xn(t)]ᵀ, we apply ICA, which models each xi as a linear mixture of independent components (sources) si, i = 1, . . . , n. We then have x = As, where s(t) = [s1(t), . . . , sn(t)]ᵀ and A is an unknown mixing matrix [7]. ICA looks for an 'un-mixing' matrix W = A⁻¹ such that s ≈ Wx. There are two main families of ICA algorithms [8, 9], which generate slightly different W. Some implementations try to minimize the mutual information, using measures such as the Kullback-Leibler divergence or maximum entropy [10]. Others are based on the maximization of non-Gaussianity, as measured by kurtosis or negentropy [7].

Figure 2: ICA channels with the top 5 greatest mean projected variance, averaged over all receptive trials. Blue curves are given Romanian stimuli, and green curves are given English.

Before applying ICA to our data, we first zero-mean all observations and 'whiten' the observed variables; that is, we linearly transform the observation vector x such that the components of the new vector are uncorrelated and E(xxᵀ) = I. For our ICA decomposition, we use the logistic infomax algorithm [11]. This is a gradient-based neural network algorithm that uses higher-order statistics for the information maximization [12]. Specifically [13, 14]:

1. Choose an initial un-mixing matrix W0;

2. Update Wk+1 = Wk + ηk (I − g(y)yᵀ) Wk;

3. Normalize Wk+1 = Wk+1 / ‖Wk+1‖; and

4. If not converged, go back to step 2.

Here, ‖W‖ is the matrix norm, y = Wx, I is the identity matrix, and g(y) = y − tanh(y). Figure 2 shows the ICA channels having the maximal variance, averaged over all receptive trials. The sudden inflection in the fifth ICA channel appears to correspond strongly to the onset of the image one second after the auditory stimulus.
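The iteration above can be transcribed almost directly. The sketch below is illustrative only: it averages g(y)yᵀ over samples and uses a fixed learning rate, whereas practical implementations (e.g., EEGLAB's runica) add learning-rate annealing and convergence checks:

```python
import numpy as np

def infomax_ica(X, eta=0.01, n_iter=200):
    """Sketch of the logistic-infomax update described above.
    X: (n_channels, n_samples) zero-meaned (ideally whitened) observations."""
    n = X.shape[0]
    W = np.eye(n)                      # step 1: initial un-mixing matrix W0
    for _ in range(n_iter):
        Y = W @ X                      # current source estimates y = Wx
        G = Y - np.tanh(Y)             # the paper's g(y) = y - tanh(y)
        # step 2: Wk+1 = Wk + eta * (I - g(y) yT) Wk, averaged over samples
        W = W + eta * (np.eye(n) - (G @ Y.T) / X.shape[1]) @ W
        W = W / np.linalg.norm(W)      # step 3: normalize by the matrix norm
    return W

# toy two-source mixture (whitening omitted for brevity)
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
W = infomax_ica(X - X.mean(axis=1, keepdims=True))
```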

3.2. Feature extraction

Features are extracted from the ICA-transformed source space. Specifically, we apply the discrete wavelet transform (DWT), often used in electroencephalography (EEG) classification tasks [15, 16], and also extract some descriptive statistics within sliding windows over each trial.

In contrast to the short-time Fourier transform (STFT), continuous wavelet transforms (CWTs) can decompose a signal to have both high temporal resolution (for short, high-frequency events) and high spectral resolution (for long, low-frequency events). They accomplish this by shifting and scaling a base function (i.e., the 'mother wavelet') and convolving it with the input signal [17]:

X(τ, α) = (1/√α) ∫_{−∞}^{+∞} x(t) h*((t − τ)/α) dt,   (1)

where * is the complex conjugation operator, α is the scale parameter, τ is the shift parameter, and h is the mother wavelet. The DWT is a discretized version of the CWT where the scale and shift factors are sampled at discrete points [17].

To reconstruct the original input signal from the wavelet coefficients, certain conditions must hold with regard to the discretized mother wavelet. In this paper, we choose the scale and


shift factors used for computing the DWT using dyadic sampling, where αm = 2^m and τn = n·αm with n, m ∈ N. For discrete-time signals, the DWT coefficients can be calculated as [17]:

c_{n,m} = Σ_{p=−∞}^{+∞} x[p] h*_{n,m}[p].   (2)

The DWT has an efficient implementation wherein both a high-pass and a low-pass decomposition are applied to the input signal, with each filtered signal then downsampled by a factor of 2. This is repeated recursively on the low-passed signal until a stopping criterion is reached. The values of the low- and high-passed signals on each iteration are called the 'approximation' and 'detail' coefficients, respectively. The low- and high-pass filters (q and r, respectively) are calculated from the mother wavelet as follows [17]:

q[n] = (−1)^n h[−n + 1],   (3)

r[n] = h[n].   (4)

Here, the mother wavelet is the Daubechies DB4 wavelet with 5 levels of decomposition. The Daubechies family of orthogonal wavelets is commonly used in the context of EEG epilepsy prediction [15, 16]. We additionally use the wavelet band energy as a feature, which is calculated by taking the energy of each group of detail or approximation coefficients. The wavelet bands are 50-25 Hz, 25-12.5 Hz, 12.5-6.25 Hz, 6.25-3.125 Hz, and 3.125-0 Hz. Once the wavelet coefficients and their energies have been calculated, estimates of their velocities and accelerations are also calculated.
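The recursive filter-bank just described can be sketched in a few lines. For brevity this sketch uses the 2-tap Haar filters rather than the paper's DB4 coefficients, which would normally come from a wavelet library (e.g., pywt.wavedec(x, 'db4', level=5) in PyWavelets):

```python
import numpy as np

def dwt_band_features(x, levels=5):
    """Sketch of the recursive filter-bank DWT described above, with a
    band-energy feature per level. Haar filters stand in for DB4."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass decomposition filter
    hi = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass decomposition filter
    feats, approx = [], x
    for _ in range(levels):
        detail = np.convolve(approx, hi)[1::2]  # high-pass, downsample by 2
        approx = np.convolve(approx, lo)[1::2]  # low-pass, downsample by 2
        feats.append(detail)                    # 'detail' coefficients
        feats.append([np.sum(detail ** 2)])     # band energy of this level
    feats.append(approx)                        # final 'approximation' coeffs
    feats.append([np.sum(approx ** 2)])
    return np.concatenate(feats)

# one 4.5 s trial of one ICA channel at 100 Hz (450 samples)
f = dwt_band_features(np.sin(np.arange(450) * 0.2))
```

The velocity and acceleration estimates mentioned above could then be appended, e.g., via np.diff(f) and np.diff(f, 2).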

We additionally derive descriptive statistics on overlapping windows of each MEG channel. These windows are empirically chosen to be approximately 10% of the total epoch length, with 50% overlap between adjacent windows. From each of these windows, we calculate: minimum, maximum, mean, maximum +/− minimum, standard deviation, variance, skewness, kurtosis, sum, median, energy, and an estimate of the integral (by trapezoidal numerical integration).
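The windowed statistics can be sketched as follows. The window and hop defaults are assumptions for a 450-sample (4.5 s at 100 Hz) epoch, and "max − min" is one reading of the "maximum +/− minimum" feature in the text:

```python
import numpy as np

def window_stats(x, win=45, hop=22):
    """Twelve descriptive statistics per overlapping window, as above.
    win ~ 10% of the epoch; hop gives ~50% overlap between windows."""
    feats = []
    for start in range(0, len(x) - win + 1, hop):
        w = x[start:start + win]
        m, sd = w.mean(), w.std()
        z = (w - m) / (sd + 1e-12)               # standardized window
        feats += [w.min(), w.max(), m,
                  w.max() - w.min(), sd, w.var(),
                  (z ** 3).mean(),               # skewness
                  (z ** 4).mean(),               # kurtosis
                  w.sum(), np.median(w),
                  (w ** 2).sum(),                # energy
                  0.5 * (w[:-1] + w[1:]).sum()]  # trapezoidal integral
    return np.array(feats)

stats = window_stats(np.cos(np.arange(450) * 0.1))  # 19 windows x 12 features
```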

3.3. Feature selection and dimensionality reduction

The wavelets, the descriptive statistics for each window, and their velocities and accelerations, for each channel, are all concatenated into a single (long) vector per trial, on the order of 100,000 features. To prevent overfitting, we perform Welch's t-test separately on each feature and sort the features by the resulting p-values. This approximates the discriminability of the two language classes for each feature. Empirically, the 150 features with the lowest p-values gave the lowest classification error among all alternatives tested. Welch's t-test assumes that the data under comparison are normally distributed. The Lilliefors test [18] on a subset of the data reveals that only 16.4% and 16.8% of features are not normally distributed in the English and Romanian stimuli, respectively, at α = 0.05.
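The ranking step can be sketched with a vectorized Welch t-statistic. For simplicity this sketch ranks by |t| rather than by p-value, which would additionally require the Welch-Satterthwaite degrees of freedom and a t-distribution CDF (e.g., scipy.stats.ttest_ind(..., equal_var=False)); the toy data are assumptions:

```python
import numpy as np

def welch_t(a, b):
    """Vectorized Welch t-statistic per feature (columns of a and b)."""
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va / len(a) + vb / len(b))

rng = np.random.default_rng(0)
eng = rng.normal(size=(100, 2000))     # toy stand-in for English trials
rom = rng.normal(size=(100, 2000))     # toy stand-in for Romanian trials
rom[:, 0] += 5.0                       # plant one strongly separable feature
t = welch_t(eng, rom)
top150 = np.argsort(-np.abs(t))[:150]  # keep the 150 most discriminative
```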

Before classification, these 150-dimensional feature vectors (for each trial) are further processed using principal components analysis (PCA) [19]. The number of components, N, was selected for each subject individually in order to capture 97% of the variance (typically, 10 ≤ N ≤ 20).
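Selecting N this way reduces to thresholding the cumulative explained variance; a minimal SVD-based sketch (the synthetic input is an assumption):

```python
import numpy as np

def pca_97(X, target=0.97):
    """Project X (trials x 150 features) onto the fewest principal
    components capturing `target` of the variance, as described above."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)   # cumulative explained variance
    N = int(np.searchsorted(ratio, target)) + 1  # smallest N reaching target
    return Xc @ Vt[:N].T, N

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 150)) * np.linspace(10, 0.1, 150)  # decaying variances
Z, N = pca_97(X)
```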

4. Experiments
The processed MEG data are classified using a nonlinear support vector machine (SVM) [20]. Given a set of training pairs,

Figure 3: Mean accuracy for each subject on the pre- and post-learning trials with 95% confidence intervals.

(xi, yi), i = 1..n, where xi ∈ R^D is the observed vector and yi ∈ {−1, 1} is the corresponding label, we find the optimal hyperplane by optimizing its normal vector w, intercept b, and slack variables ξi, i = 1..n [21] by:

arg min_{w,b,ξ} (1/2)‖w‖² + C Σ_{i=1..n} ξi,
s.t. yi(wᵀxi + b) ≥ 1 − ξi, and ξi ≥ 0, for i = 1..n,   (5)

where the constant C > 0 is the penalty parameter of the error term. We tried the polynomial kernel, Kpoly(xi, xj) = (xiᵀxj + 1)^d with d varied empirically, and the radial-basis kernel, Krbf(xi, xj) = exp(−γ‖xi − xj‖²) with γ > 0 [20]. The Kpoly kernel with d = 2 gave the best results and is hereafter reported.
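The two kernels can be computed directly; a numpy sketch follows (the classifier itself would then be any kernel SVM implementation, e.g., scikit-learn's SVC(kernel='poly', degree=2)):

```python
import numpy as np

def k_poly(X, Y, d=2):
    """Polynomial kernel Kpoly(xi, xj) = (xi.T xj + 1)^d, as above."""
    return (X @ Y.T + 1.0) ** d

def k_rbf(X, Y, gamma=1.0):
    """Radial-basis kernel Krbf(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

A = np.array([[1.0, 0.0], [0.0, 1.0]])
Kp = k_poly(A, A)   # with d = 2, the setting reported to work best above
Kr = k_rbf(A, A)
```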

Figure 3 shows the mean accuracy for each subject on the pre- and post-learning trials. A full 2-way ANOVA reveals significant effects of subject (F9 = 8.18, p < 0.001) and training (F1 = 243.87, p < 0.001) on classification accuracy, and a significant interaction between subject and training (F9 = 4.94, p < 0.001). Intuitively, integration of Romanian into the left superior parietal lobule may partially explain why identifying the received language from MEG becomes closer to chance after training, and this is being investigated.

4.1. Effects of behavioural demographics

Table 1 shows a linear 5-way ANOVA for behavioural demographics, namely first and second language, PPVT and EVT fluency scores, and whether the subject plays a musical instrument. Interestingly, one's first language has a mild effect on accuracy, but bilingualism does not. If subjects play a musical instrument, the received language becomes significantly harder to discriminate, on average.

4.2. Effects of stimuli

Across both languages, stimuli words were designed to distribute evenly across five semantic categories: Animal (e.g., 'zebra'), Clothing and accessories (e.g., 'necklace'), Food products (e.g., 'pumpkin'), Found in the home (e.g., 'fireplace'), and Found in the workplace (e.g., 'stapler'). The average accuracies across 11 subjects are shown in table 2, given the period of recording and the stimulus language.


Table 1: ANOVA given behavioural demographics.

Source      Sum Sq.    F           p
1st Lang.   1466.8     F2 = 3.25   0.04
2nd Lang.   831.2      F2 = 1.84   0.16
Music       1544.16    F1 = 6.84   <0.01
PPVT        11.8       F1 = 0.05   0.82
EVT         222.2      F1 = 0.98   0.32

A 3-way linear ANOVA reveals significant effects of the subject (F10 = 5.58, p < 0.001), received language (F1 = 31.65, p < 0.001), and category of word (F4 = 29.48, p < 0.001) on the accuracy of language classification. Here we use all 11 subjects; in section 4.1 we had to exclude one person with incomplete demographics. Across stimulus language and period of testing, the Food category was most discernible (69.66%), followed by Animals (67.73%), Found in the home (67.71%), Found at work (65.11%), and Clothing (64.66%). Whether this order corresponds to the increasing age-of-acquisition of their component words is yet to be determined.

Table 2: Average (and σ) classification accuracies (%) across word categories, language, and pre-/post-learning.

            English                      Romanian
            PRE            POST          PRE            POST
Animal      82.73 (10.8)   56.82 (19.0)  75.91 (14.1)   55.45 (10.8)
Clothing    71.82 (12.1)   53.64 (17.6)  77.27 (13.3)   55.91 (14.3)
Food        82.27 (10.8)   57.73 (9.8)   80.91 (8.9)    57.73 (10.3)
In home     78.64 (15.5)   54.09 (13.0)  76.82 (10.1)   60.91 (12.4)
At work     69.55 (11.3)   57.73 (12.1)  73.64 (11.6)   59.55 (8.5)

5. Discussion
This paper is the first, to our knowledge, to identify received language in MEG (or EEG, for that matter). After reducing a very large feature space of wavelets and descriptive statistics across the 151 available channels using a combination of Welch's t-test and PCA, we achieve high accuracy pre-training (up to 90%), and note that it becomes significantly more difficult to identify received languages after training. This could suggest that the newly-learned Romanian words are being integrated into the subject's existing word knowledge base. We also find that both the semantic category of stimuli words and the subject's ability to play a musical instrument significantly affect classification accuracy.

Future work involves extending these methods to an expressive task in MEG where subjects were shown an image and asked to name the object in either English or Romanian [4]. We are currently looking into which parts of the brain contain the most discriminative features for language- and category-identification, and how they relate across receptive and expressive tasks. The joint tasks will allow us to correlate discriminative features across both receptive and expressive language, which will hopefully illuminate possible shared semantic processes underlying both tasks, within theoretical models of joint speech production and perception [22].

6. Acknowledgements
The infomax function was implemented in EEGLAB [23]. This research is funded by the Toronto Rehabilitation Institute-UHN, the Natural Sciences and Engineering Research Council of Canada (RGPIN 435874), and a grant from the Nuance Foundation. Data acquisition was funded by CIHR MOP-89961.

7. References

[1] F. Asano, M. Kimura, T. Sekiguchi, and Y. Kamitani, "Classification of movement-related single-trial MEG data using adaptive spatial filter," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2009, pp. 357–360.

[2] N. Ince, F. Goksu, G. Pellizzer, A. Tewfik, and M. Stephane, "Selection of spectro-temporal patterns in multichannel MEG with support vector machines for schizophrenia classification," in Proc. 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Aug 2008, pp. 3554–3557.

[3] M. Guimaraes, D. Wong, E. Uy, L. Grosenick, and P. Suppes, "Single-trial classification of MEG recordings," IEEE Transactions on Biomedical Engineering, vol. 54, no. 3, pp. 436–443, March 2007.

[4] M. MacDonald, A. Cozma, A. Oh, and E. Pang, "Object naming in a foreign language: a pre/post MEG study of language learning," in Proceedings of the 18th International Conference on Biomagnetism, Paris, France, August 2012.

[5] L. Dunn and L. Dunn, Peabody Picture Vocabulary Test, 3rd ed. Circle Pines, MN: American Guidance Service, 1997.

[6] K. Williams, Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service, 1997.

[7] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley-Interscience, 2001.

[8] S. Haykin, Neural Networks and Learning Machines. New Jersey: Pearson Education, Inc., 2009.

[9] D. Langlois, S. Chartier, and D. Gosselin, "An introduction to independent component analysis: InfoMax and FastICA algorithms," Tutorials in Quantitative Methods for Psychology, vol. 6, no. 1, pp. 31–38, 2010.

[10] A. Hyvarinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[11] A. J. Bell and T. J. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[12] G. R. Naik and D. K. Kumar, "An overview of independent component analysis and its applications," Informatica, vol. 35, no. 1, pp. 63–81, March 2011.

[13] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 1998, pp. 757–763.

[14] A. Delorme, J. Palmer, J. Onton, R. Oostenveld, and S. Makeig, "Independent EEG sources are dipolar," PLoS ONE, vol. 7, no. 2, 2012.

[15] M. Saab and J. Gotman, "A system to detect the onset of epileptic seizures in scalp EEG," Clinical Neurophysiology, vol. 116, pp. 427–442, 2005.

[16] A. Zandi, M. Javidan, G. Dumont, and R. Tafreshi, "Automated real-time epileptic seizure detection in scalp EEG recordings using an algorithm based on wavelet packet transform," IEEE Transactions on Biomedical Engineering, vol. 57, no. 7, pp. 1639–1651, July 2010.

[17] T. Quatieri, Discrete-time Speech Signal Processing: Principles and Practice, 1st ed. Upper Saddle River, NJ, USA: Prentice Hall Press, 2001.

[18] H. Lilliefors, "On the Kolmogorov-Smirnov test for normality with mean and variance unknown," Journal of the American Statistical Association, vol. 62, pp. 399–402, 1967.


[19] Y. A. Ghassabeh and H. A. Moghaddam, "Adaptive linear discriminant analysis for online feature extraction," Machine Vision and Applications, vol. 24, no. 4, pp. 777–794, 2013.

[20] B. Scholkopf, C. J. C. Burges, and A. J. Smola, Advances in Kernel Methods: Support Vector Learning. The MIT Press, 1998.

[21] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A practical guide to support vector classification," Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Tech. Rep., 2003.

[22] A. M. Liberman and I. G. Mattingly, "The motor theory of speech perception revised," Cognition, vol. 21, pp. 1–36, 1985.

[23] A. Delorme and S. Makeig, "EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics," Journal of Neuroscience Methods, vol. 134, pp. 9–21, 2004.