8/22/2019 PCA-Dien PCA Chapter
Introduction to Principal Components Analysis of Event-Related
Potentials
Joseph Dien (1,3) and Gwen A. Frishkoff (2)
(1) Department of Psychology, Tulane University
(2) Department of Psychology, University of Oregon
(3) Department of Psychology, University of Kansas
To appear in:
Event Related Potentials: A Methods Handbook. Handy, T. (editor).
Cambridge, Mass: MIT Press.
Address for correspondence: Joseph Dien, Department of Psychology,
426 Fraser Hall, University of Kansas, 1415 Jayhawk Blvd., Lawrence,
KS 66045-7556. E-mail: [email protected].
Introduction
Over the last several decades, a variety of methods have
been developed for statistical decomposition of event-related
potentials (ERPs). The simplest and most widely applied of these
techniques is principal components analysis (PCA). It belongs to
a class of factor-analytic procedures, which use eigenvalue
decomposition to extract linear combinations of variables
(latent factors) in such a way as to account for patterns of
covariance in the data parsimoniously, that is, with the fewest
factors.
In ERP data, the variables are the microvolt readings
either at consecutive time points (temporal PCA) or at each
electrode (spatial PCA). The major source of covariance is
assumed to be the ERP components, characteristic features of the
waveform that are spread across multiple time points and
multiple electrodes (Donchin & Coles, 1991). Ideally, each
latent factor corresponds to a separate ERP component, providing
a statistical decomposition of the brain electrical patterns
that are superposed in the scalp-recorded data.
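To make these ideas concrete, the following sketch builds a small temporal-PCA data matrix and extracts factors by eigenvalue decomposition of its covariance matrix. It is written in Python/NumPy purely for illustration (the chapter's own analyses use a Matlab toolkit), and the dataset layout and component amplitudes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 20 waveforms (10 subjects x 2 conditions) x 6 time
# points, with two latent components peaking at time points 2 and 5
n_obs, n_time = 20, 6
amp1 = rng.normal(5.0, 1.0, n_obs)            # per-waveform amplitude, component 1
amp2 = rng.normal(8.0, 2.0, n_obs)            # per-waveform amplitude, component 2
data = rng.normal(0.0, 0.2, (n_obs, n_time))  # background noise
data[:, 1] += amp1                            # component 1 loads on time point 2
data[:, 4] += amp2                            # component 2 loads on time point 5

# Temporal PCA: variables = time points, observations = waveforms
cov = np.cov(data, rowvar=False)              # 6 x 6 relationship matrix
evals, evecs = np.linalg.eigh(cov)            # eigenvalue decomposition
order = np.argsort(evals)[::-1]               # largest-variance factors first
evals, evecs = evals[order], evecs[:, order]
```

Because only two time points carry component variance, the first two eigenvalues should dominate, mirroring the two-component structure described in the text.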
PCA has a range of applications for ERP analysis. First, it
can be used for data reduction and cleaning or filtering, prior
to data analysis. By reducing hundreds of variables to a handful
of latent factors, PCA can greatly simplify analysis and
description of complex data. Moreover, the factors retained for
further analysis are considered more likely to represent pure
signal (i.e., brain activity), as opposed to noise (i.e.,
artifacts or background EEG).
Second, PCA can be used in data exploration as a way to
detect and summarize features that might otherwise escape visual
inspection. This is particularly useful when ERPs are measured
over many tens or hundreds of recording sites; spatial patterns
can then be used to constrain the decomposition into latent
temporal patterns, as described in the following section.
The use of such high-density ERPs (recordings at 50 or more
electrodes) has become increasingly popular in the last several
years. A striking feature of high-density ERPs is that the
complexity of the data seems to grow exponentially as the number
of recording sites is doubled or tripled. Thus, while increases
in spatial resolution can lead to important new discoveries,
subtle patterns are likely to be missed by visual inspection
alone, as higher spatial sampling reveals increasingly complex
patterns that overlap in both time and space. A rational
approach to data decomposition can improve the chances of
detecting these subtler effects.
Third, PCA can serve as an effective means of data
description. In principle, PCA can describe features of the
dataset more objectively and more precisely than is possible
with the unaided eye. Such increased precision could be
especially helpful when PCA is used as a preprocessing step for
ERP source localization (Dien, 1999; Dien, Frishkoff, Cerbonne,
& Tucker, 2003a; Dien, Spencer, & Donchin, 2003b; Dien, Tucker,
Potts, & Hartry, 1997).
Despite the many useful functions of PCA, this method has
had a somewhat checkered history in ERP research, beginning in
the 1960s (Donchin, 1966; Ruchkin, Villegas, & John, 1964). An
influential review paper by Donchin & Heffley (1979) promoted
the use of PCA for ERP component analysis. A few years later,
however, PCA entered something of a dark age in the ERP field
with the publication of a methodological critique (Wood &
McCarthy, 1984), which demonstrated that PCA solutions may be
subject to misallocation of variance across the latent factors.
Wood and McCarthy noted that the same problems arise in the use
of other techniques, such as reaction time and peak amplitude
measures. The difference is that misallocation is made more
explicit in PCA, which they argued should be regarded as an
advantage. Yet this last point was often overlooked, and this
seminal paper has, ironically, been cited as an argument against
the use of PCA. Perhaps as a consequence, many researchers
continued to rely on conventional ERP analysis techniques.
More recently, the emergence of high-density ERPs has
revived the interest in PCA as a method of data reduction.
Moreover, some recent studies have shown that statistical
decomposition can lead to novel insights into well-known ERP
effects, providing evidence to help separate ERP components
associated with different perceptual and cognitive operations
(Dien et al., 2003a; Dien, Frishkoff, & Tucker, 2000; Dien et
al., 1997; Spencer, Dien, & Donchin, 2001).
The present chapter provides a systematic outline of the steps
in temporal PCA, and the issues that arise at each step in
implementation. Some problems and limitations of temporal PCA
are discussed, including rotational indeterminacy, problems of
misallocation, and latency jitter. We then compare some recent
alternatives to temporal PCA, namely: spatial PCA (Dien, 1998a),
sequential (spatio-temporal or temporo-spatial) PCA (Spencer et
al., 2001), parametric PCA (Dien et al., 2003a), multi-mode PCA
(Möcks, 1988), and partial least squares (PLS) (Lobaugh, West, &
McIntosh, 2001). Each technique has evolved to address certain
weaknesses with the traditional PCA method. We conclude with
questions for further research, and advocate a research program
for systematic comparison of the strengths and limitations of
different multivariate techniques in ERP research.
Steps in Temporal PCA
The two most common types of factor analysis are principal
axis factors and principal components analysis. These methods
are equivalent for all practical purposes when there are many
variables and when the variables are highly correlated (Gorsuch,
1983), as in most ERP datasets. In the ERP literature, PCA is
the more common method. Normally, one and the same term,
"component," has been used both for PCA linear combinations and
for characteristic spatial and temporal features of the ERP. To
avoid confusion, the term factor (or latent factor) will be used
here to refer to PCA (latent) components, and the term component
will be reserved for spatiotemporal features of the ERP
waveform.
The PCA process consists of three main steps: computation
of the relationship matrix, extraction and retention of the
factors, and rotation to simple structure. In the following
sections, PCA simulation is performed to illustrate each step,
using the PCA Toolkit (version 1.06), a set of Matlab functions
for performing PCA on ERP data. This toolkit was written by the
first author and is freely available upon request.
The Data Matrix
A key to understanding PCA procedures as applied to ERPs is
to be clear about how multiple sources of variance contribute to
the data decomposition. In temporal PCA, the dataset is
organized with the variables corresponding to time points, and
observations corresponding to the different waveforms in the
dataset, as shown in Figure 1.
Figure 1. Data matrix, with dimensions 20 x 6. Variables are
time points, measured in two conditions for ten subjects. Part
b presents the covariance matrix computed from the data matrix.
The waveforms vary across subjects, electrodes, and experimental
conditions. Thus, subject, spatial, and task variance are
collectively responsible for covariance among the temporal
variables. Although it may seem odd to commingle these three
sources of variance, they provide equally valid bases for
distinguishing an ERP component; in this respect, it is reasonable
to treat them collectively. For example, the voltage readings
tend to rise and fall together between 250 and 350 ms in a
simple oddball experiment, because they are mutually influenced
by the P300, which occurs during this period. Since individual
differences, scalp location, and experimental task may all
affect the recorded P300 amplitude, the amplitudes of these time
points will likewise covary as a function of these three sources
of variance. Figure 2 shows the grand-averaged waveforms (for
n=10 subjects) corresponding to the simulated data in Figure 1.
For simplicity, this example involves only one electrode site,
ten subjects, and two experimental conditions. In subsequent
sections, we will use these simulated data to help illustrate
the steps in implementation of PCA for ERP analysis.
Figure 2. Waveforms for grand-averaged data (n=10),
corresponding to data in Figure 1. Graph displays two non-
overlapping correlated components, plotted for two hypothetical
conditions, A and B.
The Relationship Matrix
The first step in applying PCA is to generate a
relationship (or association) matrix, which captures the
interrelationships between temporal variables. The simplest
such matrix is the sum-of-squares cross-products (SSCP) matrix.
For each pair of variables, the corresponding values in each
observation are multiplied, and the products are summed across
observations. Thus, variables that
tend to rise and fall together will produce the highest values
in the matrix. The diagonal of the matrix (the relationship of
each variable to itself) is the sum of the squared values of
each variable. For an example of the effect of using the SSCP
matrix, see (Curry, Cooper, McCallum, Pocock, Papakostopoulos,
Skidmore, & Newton, 1983). SSCP treats mean differences in the
same fashion as differences in variance, which has odd effects
on the PCA computations. For example, factors computed on an
SSCP matrix can be correlated, even when they are orthogonal
when using other matrices. In general, we do not recommend the
use of the SSCP matrix in ERP analyses.
An alternative to the SSCP matrix is the covariance matrix.
This matrix is computed in the same fashion as the SSCP matrix,
except that the mean of each variable is subtracted out before
generating the relationship matrix. Mean correction ensures that
variables with high mean values do not have a disproportionate
effect on the factor solution. The effect of mean correction on
the solution depends on the EEG reference site, a topic that is
beyond the scope of this review (cf. Dien, 1998a).
A third alternative is to use the correlation matrix as the
relationship matrix. The correlation matrix is computed in the
same fashion as the covariance matrix, except that the variable
variances are standardized. This is accomplished by first mean
correcting each variable, and then dividing each variable by its
standard deviation, which ensures that the variables contribute
equally to the factor solution. Since time points that do not
contain ERP components have smaller variances, this procedure
may exacerbate the influence of background noise. Simulation
studies indicate that covariance matrices can yield more
accurate results (Dien, Beal, & Berg, submitted). We therefore
recommend the use of covariance matrices.
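The three relationship matrices differ only in how the variables are preprocessed before cross-products are formed. A minimal sketch (Python/NumPy, for illustration only; the function names are ours, not part of any toolkit):

```python
import numpy as np

def sscp(X):
    """Sum-of-squares cross-products: raw products, no mean correction."""
    return X.T @ X

def covariance(X):
    """Mean-correct each variable (column) before forming cross-products."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def correlation(X):
    """Mean-correct and divide by the standard deviation, so every
    variable contributes equally (all variances standardized to 1)."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    return Xs.T @ Xs / (X.shape[0] - 1)
```

The diagonal of `correlation(X)` is all ones, which is exactly why low-variance (noise-dominated) time points gain influence under the correlation matrix, as noted above.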
In Figure 3(a), the simulated data are converted into a
covariance matrix. Observe how the time points containing the
two components (t2 and t5) result in larger entries than those
without. The entries with the largest numbers will have the
most influence on the next step in the PCA procedure: factor
extraction.
Factor Extraction
In the extraction stage, a process called eigenvalue
decomposition is performed, which progressively removes linear
combinations of variables that account for the greatest variance
at each step. Each linear combination constitutes a latent
factor. In Figure 3, we demonstrate how this process
iteratively reduces the remaining values in the relationship
matrix to zero.
Figure 3. (a) Original covariance matrix. (b) Covariance matrix
after subtraction of Factor 1 (Varimax-rotated; see next
Section). (c) Covariance matrix after subtraction of Factors 1
and 2. Since Factors 1 and 2 together account for nearly all of
the variance, the result is the null matrix.
In general, PCA should extract as many factors as there are
variables, as long as the number of observations is equal to or
greater than the number of variables (i.e., as long as the data
matrix is of full rank).
The initial extraction yields an unrotated solution,
consisting of a factor loading matrixand a factor score matrix.
The factor loading matrix represents correlations between the
variables and the factor scores. The factor score matrix
indexes the magnitude of the factors for each of the
observations and thus represents the relationship between the
factors and the observations. If the two matrices are
multiplied together, they will reproduce the data matrix. By
convention, the reproduced data matrix will be in standardized
form, regardless of the type of relationship matrix that was
entered into the PCA. To recreate the original data matrix, the
variables of this standardized matrix are multiplied by the
original standard deviations, and the original variable means
are restored.
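The loading/score decomposition and the reconstruction convention described above can be sketched as follows (Python/NumPy; a correlation-matrix PCA is used here so that the reproduced data matrix comes out in standardized form, as the text describes, and the toy data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))           # toy data: 20 observations x 6 variables
X[:, 4] += 2.0 * X[:, 1]               # induce correlation between two variables
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sd                      # standardized data

R = np.corrcoef(X, rowvar=False)       # relationship matrix
evals, evecs = np.linalg.eigh(R)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

loadings = evecs * np.sqrt(evals)      # correlations between variables and scores
scores = Z @ evecs / np.sqrt(evals)    # standardized factor scores per observation

Z_hat = scores @ loadings.T            # reproduces the standardized data matrix
X_hat = Z_hat * sd + mu                # rescale by the SDs and restore the means
```

Multiplying `scores @ loadings.T` recovers the standardized data exactly; the final line is the "recreate the original data matrix" step described in the text.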
Figure 4. Reconstructed waveforms, calculated by multiplying the
Factor Loadings by the Factor Scores, scaled to microvolts
(i.e., multiplied by the matrix of standard deviations for the
original data). Data reconstructed using Varimax-rotated factors
for this example.
For a temporal PCA, the loadings describe the time course
of each of the factors. To accurately represent the time course
of the factors, it is necessary to first multiply them by the
variable standard deviations, which rescales them to microvolts
(see proof in Dien, 1998a). Further, it is important to note
that the sign of a given factor loading is arbitrary. This is
necessarily the case since a given peak in the factor time
course will be positive on one side of the head, and negative on
the other side, due to the dipolar nature of electrical fields
(Nunez, 1981). Note, further, that the dipolar distributions can
be distorted or obscured by referencing biases in the data
(Dien, 1998b). Only the product of the factor loading and the
factor score corresponds to the original data in an unequivocal
way. Thus, if the factor loading is positive at the peak, then
the factor scores from one side of the head will be positive and
the other side will be negative, corresponding to the dipolar
field.
The factor scores, on the other hand, provide information
about the other sources of variance (i.e., subject, task, and
spatial variance). For example, to compute the amplitude of a
factor at a specific electrode site for a given task condition,
one simply takes the factor scores corresponding to the
observations for that task at the electrode of interest and
computes their mean (across subjects). If this mean value is
computed for each electrode, the resulting values can be used to
plot the scalp topography for that factor. If a specific time
point is chosen, it is possible to reconstruct the scalp topography
with the proper microvolt scaling by multiplying the mean scores
by the factor loading and the standard deviation for the time
point of interest (see proof in Dien, 1998a).
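As a sketch of this scaling rule (the function name and data layout are hypothetical, not part of the PCA Toolkit): mean factor scores are taken per electrode and rescaled by the loading and standard deviation at the time point of interest.

```python
import numpy as np

def factor_topography_uv(scores, electrode_ids, loading_t, sd_t):
    """Scalp topography of one factor at time point t, in microvolts.

    scores:        (n_obs,) factor scores, one per observation
    electrode_ids: (n_obs,) electrode label for each observation
    loading_t:     the factor loading at the time point of interest
    sd_t:          the standard deviation of that time point in the data
    """
    electrodes = np.unique(electrode_ids)
    # Mean factor score at each electrode (averaging across subjects/tasks),
    # then rescale to microvolts via the loading and the variable's SD
    topo = np.array([scores[electrode_ids == e].mean() for e in electrodes])
    return electrodes, topo * loading_t * sd_t
```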
Unlike the PCA algorithm in most statistics packages, the
PCA Toolkit does not mean-correct the factor scores. This
maintains an interpretable relationship between the factor
scores and the original data. If factor scores are mean-
corrected as part of the standardization, the mean task scores
will be centered around zero, which can make factor
interpretation more difficult. In an oddball experiment, for
example, the P300 factor scores should be large for the target
condition and small for the standard condition. However, if the
factor scores are mean-corrected, then the mean task scores for
the two conditions will be of equal amplitude and opposite signs
(since mean correction splits the difference).
Factor Retention
Most of the PCA factors that are extracted account for
small proportions of variance, which may be attributed to
background noise, or minor departures from group trends. In the
interest of parsimony, only the larger factors are typically
retained, since they are considered most likely to contain
interpretable signal. A common criterion for determining how
many factors to retain is the scree test (Cattell, 1966; Cattell
& Jaspers, 1967). This test is based on the principle that the
PCA of a random set of data will produce a set of randomly sized
factors. Since factors are extracted in order of descending
size, when their sizes are graphed they will form a steady
downward slope. A dataset containing signal, in addition to the
noise, should have initial factors that are larger than would be
expected from random data alone. The point of departure from
the slope (the elbow) indicates the number of factors to retain.
Factors beyond this point are likely to contain noise and are
best dropped. Figure 5 plots the reconstructed grand-averaged
data, using the retained factors in order to verify that
meaningful factors have not been excluded.
Figure 5. Reconstructed waveforms, computed as in Figure 4,
using only Factors 1 and 2. Because the first two factors
account for nearly all of the variance, the reconstruction is
nearly as good as the original data (cf. Fig. 2).
In practice, the scree plot for ERP datasets often contains
multiple elbows, which can make it difficult to determine the
proper number of factors to retain. Part of the problem is that
the noise contains some unwanted signal (remnants of the
background EEG). A modified version of the parallel test can be
used to address this issue (Dien, 1998a). The parallel test
determines how many factors represent signal by comparing the
scree produced by the full dataset to that produced when only
the noise is present. The noise level is estimated by
generating an ERP average with every other trial inverted, which
has the effect of canceling out the signal while leaving the
noise level unchanged. The results of the parallel test should
be considered a lower bound since retaining additional factors
to account for major noise features can actually improve the
analysis (for such an example, see Dien, 1998a), although in
principle if too many additional factors are retained it can
result in unwanted distinctions being made (such as between
subject-specific variations of the component). In general, the
experience of the first author is that between eight and sixteen
factors is often appropriate, although this may depend, among
other things, on the number of recording sites.
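The noise estimate used by the modified parallel test, averaging with every other trial inverted, can be sketched as follows (Python/NumPy, illustrative only):

```python
import numpy as np

def noise_average(trials):
    """Average single trials with every other trial inverted. Any signal
    that is identical across trials cancels out, while the noise level of
    the average is left unchanged."""
    signs = np.where(np.arange(trials.shape[0]) % 2 == 0, 1.0, -1.0)
    return (signs[:, None] * trials).mean(axis=0)

def scree(data):
    """Eigenvalues of the covariance matrix, largest first, for a scree plot."""
    evals = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    return evals[::-1]
```

Comparing `scree(full_data)` against the scree of averages built with `noise_average` gives the signal/noise crossover point described in the text.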
Factor Rotation
A critical step, after deciding how many factors to retain,
is to determine the best way of allocating variance across the
remaining factors. Unfortunately, there is no transparent
relationship between the PCA factors and the latent variables of
interest (i.e., ERP components). Eigenvalue decomposition
blindly generates factors that account for maximum variance,
which may be influenced by more than one latent variable;
whereas, the goal is to have each factor represent a single ERP
component.
As shown in Figure 6, there is not a one-to-one mapping of
factors to variables after the initial factor extraction.
Rather, the initial extraction has maximized the variance of the
first factor by including variance from as many variables as
possible. In doing so, it has generated a factor that is a
hybrid of two ERP components, the linear sum of roughly 10% of
the P1 and 90% of the P3. The second factor contains the
leftover variance of both components. This example demonstrates
the danger of interpreting the initial unrotated factors
directly, as is sometimes advocated (e.g., Rösler & Manzey,
1981).
Figure 6. Graph of unrotated factors. Only Factors 1 and 2
are graphed, since Factors 3–6 are close to 0.
Factor rotation is used to restructure the allocation of
variables to factors to maximize the chance that each factor
reflects a single latent variable. The most common rotation is
Varimax (Kaiser, 1958). In Varimax, each of the retained
factors is iteratively rotated pairwise with each of the other
factors in turn, until changes in the solution become
negligible. More specifically, the Varimax procedure rotates the
two factors such that the sum of the factor loadings (raised to
the fourth power) is maximized. This has the effect of favoring
solutions in which factor loadings are as extreme as possible
with a combination of near-zero loadings and large peak values.
Since ERP components (other than DC potentials) tend to have
zero activity for most of the epoch with a single major peak or
dip, Varimax should yield a reasonable approximation to the
underlying ERP components (Fig. 7). Temporal overlap of ERP
components raises additional issues, which are addressed in the
following section.
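For readers who want to experiment, here is a compact matrix-form Varimax (the standard SVD-based formulation; a sketch in Python/NumPy, not the PCA Toolkit's implementation):

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Varimax rotation of a loading matrix L (variables x factors).

    Iteratively chooses an orthogonal rotation R that maximizes the
    Varimax simplicity criterion (Kaiser, 1958), driving each factor
    toward a few large loadings and many near-zero ones."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the Varimax criterion with respect to the rotation
        G = L.T @ (LR ** 3 - LR * ((LR ** 2).sum(axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                    # nearest orthogonal matrix to the gradient
        crit_new = s.sum()
        if crit_new - crit_old < tol:
            break
        crit_old = crit_new
    return L @ R, R
```

Applied to loadings that mix two simple-structure factors, the rotation increases the variance of the squared loadings within each factor, i.e., pushes loadings toward the extreme-or-zero pattern described above.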
Figure 7. Graph of Varimax-rotated Factor Loadings. Only Factors
1 and 2 are graphed, since Factors 3–6 are close to 0.
Simulation studies have demonstrated that the accuracy of a
rotation is influenced by several characteristics of the data, including
component overlap and component correlation (Dien, 1998a).
Component overlap is a problem, because the more similar two ERP
components are, the more difficult it is to distinguish them
(Möcks & Verleger, 1991). Further, correlations between
components may lead to violations of statistical assumptions.
The initial extraction and the subsequent Varimax rotation
maintain strict orthogonality between the factors (so the
factors are uncorrelated). To the extent that the components
are in fact correlated, the model solution will be inaccurate,
producing misallocation of variance across the factors.
Component correlation can arise when two components respond to
the same task variables (an example is the P300 and Slow Wave
components, which often co-occur), or when both components
respond to the same subject parameters (e.g., age, sex, or
personality traits), or share a common spatial distribution.
Further, these two components can be measured at some of the
same electrodes due to their similar scalp topographies.
Component correlation can be effectively addressed by using
an oblique rotation, such as Promax (Hendrickson & White, 1964),
allowing for correlated factors (Dien, 1998a). In Promax, the
initial Varimax rotation is succeeded by a "relaxation" step, in
which each individual factor is further rotated to maximize the
number of variables with minimal loadings. A factor is adjusted
in this fashion without regard to the other factors, allowing
factors to become correlated, and thus relaxing the
orthogonality constraint in the Varimax solution. The Promax
rotation typically leads to solutions that more accurately
capture the large features of the Varimax factors while
minimizing the smaller features. As a result, Promax solutions
tend to account for slightly less variance than the original
Varimax solutions, but may also give more accurate results
(Dien, 1998a). A typical result can be seen in Figure 8.
Figure 8. Graph of Promax-rotated Factor Loadings. Only Factors
1 and 2 are graphed, since Factors 3–6 are close to 0.
Spatial versus Temporal PCA
A limitation of temporal PCA is that factors are defined
solely as a function of component time course, as instantiated
by the factor loadings. This means that ERP components which
are topographically distinct, but have a similar time course,
will be modeled by a single factor. A sign that this has
occurred is when temporal PCA yields condition effects
characterized by a scalp topography that differs from the
overall factor topography.
To address this problem, spatial PCA may be used as an
alternative to temporal PCA (Dien, 1998a). In a spatial PCA,
the data are arranged such that the variables are electrode
locations, and observations are experimental conditions,
subjects, and time points. The factor loadings therefore
describe scalp topographies, instead of temporal patterns. This
approach is less likely to confound ERP components with the same
time course, as long as they are differentiated by the task or
subject variance. On the other hand, it will be subject to the
converse problem, confusing components with similar scalp
topographies, even when they have clearly separate time
dynamics.
The choice between spatial versus temporal PCA should
depend on specific analysis goals. A rule of thumb is that if
time course is the focus of an analysis, then spatial PCA should
be used, and vice versa. The reason is that the factor loadings
are constrained to be the same across the entire dataset (i.e.,
the same time course for temporal PCA, and the same scalp
topography for spatial PCA). The factor scores, on the other
hand, are free to vary between conditions and subjects. Thus,
one can examine latency changes with spatial, but not temporal,
PCA, and vice versa. In particular, this implies that temporal,
rather than spatial, PCA should be more effective as a
preprocessing step in source localization, since these modeling
procedures rely on the scalp topography to infer the number and
configuration of sources.
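The two arrangements differ only in which dimension is treated as the variables; a minimal sketch (Python/NumPy, with hypothetical array dimensions):

```python
import numpy as np

# Hypothetical 4-D ERP array: subjects x conditions x electrodes x time points
erp = np.random.default_rng(2).normal(size=(10, 2, 32, 100))
n_sub, n_cond, n_chan, n_time = erp.shape

# Temporal PCA: variables = time points;
# observations = every subject/condition/electrode combination
temporal_data = erp.reshape(-1, n_time)                       # 640 x 100

# Spatial PCA: variables = electrodes;
# observations = every subject/condition/time-point combination
spatial_data = erp.transpose(0, 1, 3, 2).reshape(-1, n_chan)  # 2000 x 32
```

Each row of `spatial_data` is one scalp map (all electrodes at one moment), just as each row of `temporal_data` is one waveform.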
All other things being equal, temporal PCA is in principle
more accurate than spatial PCA, since component overlap reduces
PCA accuracy, and volume conduction ensures that all ERP
components will overlap in a spatial PCA. Furthermore, the
effect of the Varimax and Promax rotations is to minimize factor
overlap, which is a more appropriate goal for temporal than for
spatial PCA. A caveat in either case is that ERP component
separation can be achieved only if subject or task variance (or
both) can effectively distinguish the components. In other
words, the three sources of variance associated with each
observation must collectively be able to distinguish the ERP
components, regardless of whether the components differ along
the variable dimension (time for temporal PCA, or space for
spatial PCA).
Recent alternatives to PCA
In recent years, a variety of multivariate statistical
techniques have been developed and are increasingly making their
way into the ERP literature. One such method, independent
components analysis (Makeig, Bell, Jung, & Sejnowski, 1996), is
discussed elsewhere in this volume.
In this section, we present four multivariate techniques in
ERP analysis, which share a common basis in their use of
eigenvalue decomposition. Each technique has been claimed to
address one or more problems with conventional PCA. The
application of these techniques in ERP research is very recent, and
more work is needed to characterize their respective strengths
and limitations for various ERP applications. Future
developments of these techniques may lead to a powerful suite of
tools that can be used to address a range of problems in ERP
analysis.
Sequential spatiotemporal (or temporospatial) PCA
A recent procedure for improved separation of ERP
components is spatiotemporal (or temporospatial) PCA (Spencer,
Dien, & Donchin, 1999; Spencer et al., 2001). In this
procedure, ERP components that were confounded in the initial
PCA are separated by the application of a second PCA, which is
used to separate variance along the other dimension. For a
temporospatial PCA this is accomplished by rearranging the
factor scores resulting from the temporal PCA so that each
column contains the factor scores from a different electrode. A
spatial PCA can then be conducted using these factor scores as
the new variables. While the temporal variance has been
collapsed by the initial PCA, the subject and task variance are
still expressed in the observations, and can thus be used to
separate ERP components that were confounded in the initial PCA.
In Spencer et al. (1999, 2001), the initial spatial PCA
was followed by a temporal PCA, with the factor scores from all
the factors combined within the same analysis (number of
observations equal to number of subjects x number of tasks x
number of spatial factors). This procedure led to a clear
separation of the P300 from the Novelty P3. However, it also had
an important drawback: the application of a single temporal PCA
to all the spatial factors could result in loss of some of the
finer distinctions in time course between different spatial
factors. This analytic strategy was necessary because the
generalized inverse function, used by SAS to generate the factor
scores, requires that there be more observations than variables.
The PCA Toolkit has bypassed this requirement by directly
rotating the factor scores (Möcks & Verleger, 1991), allowing
each initial factor to be subjected to a separate PCA (following
the example of Scott Makeig's ICA toolbox and an independent
suggestion by Bill Dunlap). In a more recent study (Dien et al.,
2003b), an initial spatial PCA yielded 12 factors; each spatial
factor was then subjected to a separate temporal PCA (each
retaining four factors for simplicity's sake). For analyses
using this newer approach (it makes little difference for the
original approach), temporospatial PCA is recommended over
spatiotemporal PCA, since temporal PCA may lead to better
initial separation of ERP components. Subsequent application of
a spatial PCA can then help separate components that were
confounded in the temporal PCA. On the other hand, if latency
analysis is a goal of the PCA, then spatial PCA should be done
first (since latency analysis cannot be done on the results of a
temporal PCA) with the succeeding temporal PCA step used to
verify whether multiple components are present in the factor of
interest, as demonstrated in another recent study (Dien,
Spencer, & Donchin, in press).
The full equation to generate the microvolt value for a
specific time point t and channel c for a spatiotemporal PCA is
L1 * V1 * L2 * S2 * V2, where L1 is the spatial PCA factor
loading for c, V1 is the standard deviation of c, L2 is the
temporal PCA factor loading for t, S2 is the mean factor score
for the temporal factor, and V2 is the standard deviation of the
spatial factor scores at t. The temporal and spatial terms are
reversed for temporospatial PCA.
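A two-step temporospatial sequence can be sketched as follows (Python/NumPy; the helper `pca` is a bare covariance-based PCA without rotation, and the factor counts and array sizes are arbitrary — this illustrates the data rearrangement, not the published procedure):

```python
import numpy as np

def pca(X, n_keep):
    """Bare covariance-based PCA: returns loadings and (unrotated) scores."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(evals)[::-1][:n_keep]
    loadings = evecs[:, order]
    return loadings, Xc @ loadings

rng = np.random.default_rng(3)
n_sub, n_cond, n_chan, n_time = 10, 2, 16, 50
erp = rng.normal(size=(n_sub, n_cond, n_chan, n_time))

# Step 1: temporal PCA (variables = time points)
t_load, t_scores = pca(erp.reshape(-1, n_time), n_keep=4)

# Step 2: for each temporal factor, rearrange its scores so each column
# holds one electrode, then run a separate spatial PCA on that factor
for f in range(4):
    scores_f = t_scores[:, f].reshape(n_sub * n_cond, n_chan)
    s_load, s_scores = pca(scores_f, n_keep=2)
```

Running a separate spatial PCA per temporal factor follows the newer per-factor approach described above, rather than pooling all factor scores into a single second-stage analysis.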
Parametric PCA
Another recent method involves the use of parametric
measures to improve PCA separation of latent factors, which
differ along one or more stimulus dimensions (Dien et al.,
2003a). This more specialized procedure can only be conducted
on datasets containing observations with a continuous range of
values. In Dien et al. (2003a), ERP responses to sentence
endings were averaged for each stimulus item (collapsing over
subjects) rather than averaging over subjects (collapsing over
items in each experimental condition). This item-averaging
approach resulted in 120 sentence averages, which were rated on
a number of linguistic parameters, such as meaningfulness and
word frequency. After an initial temporal PCA, it was then
possible to correlate the parameter of interest with the mean
factor score at each channel, to determine the influence of the
stimulus parameter on a given ERP component, such as the N400.
This had the effect of highlighting the relationship between ERP
components and stimulus parameters, while factoring out the
effects of ERP components unrelated to the parameters of
interest. In this fashion, parametric PCA can lead to scalp
topographies that reflect only the parameters of interest,
providing a new approach to functional separation of ERP
components. This approach thus provides an alternative method to
sequential PCA for deconfounding components. These components
can then be subjected to further analyses, such as dipole and
linear inverse modeling.
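The correlation step of this parametric approach can be sketched as follows; the function name, array layout, and item counts are illustrative assumptions, not code from Dien et al. (2003a):

```python
import numpy as np

def parametric_topography(factor_scores, parameter):
    """Sketch of the parametric PCA correlation step.

    factor_scores: items x channels array of scores for one temporal
                   factor (one score per item average, per channel).
    parameter:     length-items vector of stimulus ratings
                   (e.g., meaningfulness or word frequency).
    Returns the per-channel Pearson correlation of the factor scores
    with the parameter, i.e., a scalp topography reflecting only
    variance tied to that parameter.
    """
    fs = factor_scores - factor_scores.mean(axis=0)
    p = parameter - parameter.mean()
    num = fs.T @ p
    den = np.sqrt((fs ** 2).sum(axis=0) * (p ** 2).sum())
    return num / den   # one r value per channel
```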
Partial Least Squares
Partial least squares (PLS), like PCA, is a multivariate
technique based on eigenvalue decomposition. Unlike PCA, PLS
operates on the covariance between the data matrix and a matrix
of contrasts that represents features of the experimental design
(McIntosh, Bookstein, Haxby, & Grady, 1996). Similar to
parametric PCA procedures, the decomposition is focused on
variance due to the experimental manipulations (condition
differences). A recent paper (Lobaugh et al., 2001) applied PLS
to ERP data for the first time. Simulations showed that the PLS
analysis led to accurate modeling of the spatial and temporal
effects that were associated with condition differences in the
ERP waveforms. Lobaugh et al. also suggest that PLS may be an
effective preprocessing method, identifying time points and
electrodes that are sensitive to condition differences and can
therefore be targeted for further analyses.
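A minimal sketch of the PLS decomposition, assuming the data and design contrasts have already been arranged as matrices; this illustrates the general idea of an SVD on the contrast-data cross-product, not the specific implementation of McIntosh et al. (1996):

```python
import numpy as np

def pls_erp(data, contrasts):
    """Minimal PLS sketch for ERP data.

    data:      observations x (timepoints*channels) matrix
    contrasts: observations x n_contrasts design matrix
    The SVD is taken of the contrast-data cross-product, so the
    resulting latent variables capture only variance tied to the
    experimental design, unlike PCA on the data matrix alone.
    """
    X = data - data.mean(axis=0)
    Y = contrasts - contrasts.mean(axis=0)
    R = Y.T @ X                  # n_contrasts x variables cross-covariance
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # U: design saliences; Vt: timepoint/electrode saliences;
    # s: singular values (strength of each latent variable)
    return U, s, Vt
```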
One cautionary note concerning PLS arises from the use of
difference waves, which are created by subtracting the ERP
waveform in one experimental condition from the response in a
different condition, in order to isolate experimental effects
prior to factor extraction. This approach, based on the logic of
subtraction, can lead to incorrect conclusions when the
assumption of pure insertion is violated, that is, when two
conditions are different in kind rather than in degree. It can
also produce misleading results when a change in latency appears
to be an amplitude effect or when multiple effects appear to be
a single effect.
Neuroimaging measures, such as ERP and fMRI, may be
particularly subject to such misinterpretations, since both
spatial (anatomical) and temporal variance can lead to
condition differences. If these multiple sources of variance are
not adequately separated, a temporal difference between conditions
may be erroneously ascribed to a single anatomical region or ERP
component (e.g., Zarahn, Aguirre, & D'Esposito, 1999). In a
recent example (Spencer, Abad, & Donchin, 2000), PCA was used to
examine the claim that recollection (as compared with
familiarity) is associated with a unique electrophysiological
component. Spencer et al. concluded that the effect was more
accurately ascribed to differences in latency jitter, or trial-
to-trial variance in peak latency of the P300 across the two
conditions.
For this reason, condition differences, while useful,
should only be interpreted with respect to the overall patterns in
the original data. Further, both spatial and temporal variance
between conditions should be analyzed fully, to rule out
differences in latency jitter or other electrophysiological
effects that may be hidden or confounded through cognitive
subtraction. This recommendation also applies to the
interpretation of results from the use of partial variance
techniques, such as PLS.
Multi-Mode Factor Analysis
The techniques discussed in previous sections were all
based on two-mode analyses of ERP data (a mode being a dimension
of variance). In temporal PCA, for instance, time points
represent one dimension (the variables axis), and the other
dimension combines the remaining sources of variance, i.e.,
subjects, electrodes, and experimental conditions (the
observations axis). By contrast, multi-mode procedures analyze the data
across three or more dimensions simultaneously. For example, in
trilinear decomposition (TLD), the subject data matrix Xi is
expressed as the cross-product of three factors, as shown in
equation (1):
Xi = B * Ai * C, (1)
where B is a set of spatial components and C is a set of
temporal components. Ai represents the subject loadings on B and
C. B and C are calculated in separate spatial and temporal
singular value decompositions of the data and are then combined
to yield a new decomposition of the data, for any fixed
dimensionality (Wang, Begleiter, & Porjesz, 2000). Since tri-
mode PCA is susceptible to the same rotational indeterminacies
as regular PCA, rotational procedures will need to be developed
and evaluated.
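A rough sketch of this decomposition, assuming a subjects x channels x timepoints array; the projection step for Ai exploits the orthonormality of the SVD bases and is an illustrative simplification of Wang et al.'s (2000) algorithm, not their published code:

```python
import numpy as np

def trilinear_decomposition(X, p, q):
    """Sketch of trilinear decomposition (TLD).

    X: subjects x channels x timepoints array.
    B (channels x p) comes from a spatial SVD of the stacked data,
    C (q x timepoints) from a temporal SVD, and each subject's
    loading matrix Ai solves Xi ~= B @ Ai @ C by least squares.
    """
    n_subj, n_chan, n_time = X.shape
    # Spatial SVD: channels as rows, all subjects' time courses stacked.
    spatial = X.transpose(1, 0, 2).reshape(n_chan, -1)
    B = np.linalg.svd(spatial, full_matrices=False)[0][:, :p]
    # Temporal SVD: time points as columns.
    temporal = X.reshape(-1, n_time)
    C = np.linalg.svd(temporal, full_matrices=False)[2][:q, :]
    # Per-subject loadings by projection (valid because B has
    # orthonormal columns and C has orthonormal rows).
    A = np.stack([B.T @ X[i] @ C.T for i in range(n_subj)])
    return B, A, C
```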
It has been claimed that tri-mode PCA can effectively remove
nuisance sources of variance, as described by Möcks (1988),
providing greater sensitivity as compared with conventional
two-mode PCA. Further, Achim has described the use of
multi-modal procedures to help address misallocation of variance
(Achim & Bouchard, 1997). These reports point to the need for
thorough and systematic comparison of multi-mode methods such as
TLD with other methods, such as parametric PCA and PLS. This
can only be done for algorithms that are made available to the
rest of the research community, either through open source or
through commercial software packages.
Conclusion
PCA and related procedures can provide an effective way to
preprocess high-density ERP datasets, and to help separate
components that differ in their sensitivity to spatial,
temporal, or functional parameters. This brief review has
attempted to characterize the current state of the art in PCA of
ERPs. Ongoing research will continue to refine and optimize
statistical procedures and will aim to determine the optimal
procedures for statistical decompositions of ERP data.
Ultimately, it is likely that a range of statistical tools will
be required, each best suited to different applications in ERP
analysis.
Bibliography
Achim, A., & Bouchard, S. (1997). Toward a dynamic topographic
components model. Electroencephalography and Clinical
Neurophysiology, 103, 381-385.
Cattell, R. B. (1966). The scree test for the number of factors.
Multivariate Behavioral Research, 1, 245-276.
Cattell, R. B., & Jaspers, J. (1967). A general plasmode (No.
3010-5-2) for factor analytic exercises and research.
Multivariate Behavioral Research Monographs, 67-3, 1-212.
Curry, S. H., Cooper, R., McCallum, W. C., Pocock, P. V.,
Papakostopoulos, D., Skidmore, S., et al. (1983). The
principal components of auditory target detection. In A. W.
K. Gaillard & W. Ritter (Eds.), Tutorials in ERP research:
Endogenous components (pp. 79-117). Amsterdam: North-
Holland Publishing Company.
Dien, J. (1998a). Addressing misallocation of variance in
principal components analysis of event-related potentials.
Brain Topography, 11(1), 43-55.
Dien, J. (1998b). Issues in the application of the average
reference: Review, critiques, and recommendations.
Behavioral Research Methods, Instruments, and Computers,
30(1), 34-43.
Dien, J. (1999). Differential lateralization of trait anxiety
and trait fearfulness: evoked potential correlates.
Personality and Individual Differences, 26(1), 333-356.
Dien, J., Beal, D., & Berg, P. (submitted). Optimizing principal
components analysis for event-related potential analysis.
Dien, J., Frishkoff, G. A., Cerbonne, A., & Tucker, D. M.
(2003a). Parametric analysis of event-related potentials in
semantic comprehension: Evidence for parallel brain
mechanisms. Cognitive Brain Research, 15, 137-153.
Dien, J., Frishkoff, G. A., & Tucker, D. M. (2000).
Differentiating the N3 and N4 electrophysiological semantic
incongruity effects. Brain & Cognition, 43, 148-152.
Dien, J., Spencer, K. M., & Donchin, E. (2003b). Localization of
the event-related potential novelty response as defined by
principal components analysis. Cognitive Brain Research,
17, 637-650.
Dien, J., Spencer, K. M., & Donchin, E. (in press). Parsing the
"Late Positive Complex": Mental chronometry and the ERP
components that inhabit the neighborhood of the P300.
Psychophysiology.
Dien, J., Tucker, D. M., Potts, G., & Hartry, A. (1997).
Localization of auditory evoked potentials related to
selective intermodal attention. Journal of Cognitive
Neuroscience, 9(6), 799-823.
Donchin, E. (1966). A multivariate approach to the analysis of
average evoked potentials. IEEE Transactions on Bio-Medical
Engineering, BME-13, 131-139.
Donchin, E., & Coles, M. G. H. (1991). While an undergraduate
waits. Neuropsychologia, 29(6), 557-569.
Donchin, E., & Heffley, E. (1979). Multivariate analysis of
event-related potential data: A tutorial review. In D. Otto
(Ed.), Multidisciplinary perspectives in event-related
potential research (EPA 600/9-77-043) (pp. 555-572).
Washington, DC: U.S. Government Printing Office.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Hendrickson, A. E., & White, P. O. (1964). Promax: A quick
method for rotation to oblique simple structure. The
British Journal of Statistical Psychology, 17, 65-70.
Kaiser, H. F. (1958). The varimax criterion for analytic
rotation in factor analysis. Psychometrika, 23, 187-200.
Lobaugh, N. J., West, R., & McIntosh, A. R. (2001).
Spatiotemporal analysis of experimental differences in
event-related potential data with partial least squares.
Psychophysiology, 38(3), 517-530.
Makeig, S., Bell, A. J., Jung, T., & Sejnowski, T. J. (1996).
Independent component analysis of electroencephalographic
data. Advances in Neural Information Processing Systems, 8,
145-151.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L.
(1996). Spatial pattern analysis of functional brain images
using Partial Least Squares. Neuroimage, 3, 143-157.
Möcks, J. (1988). Topographic components model for event-related
potentials and some biophysical considerations. IEEE
Transactions on Biomedical Engineering, 35(6), 482-484.
Möcks, J., & Verleger, R. (1991). Multivariate methods in
biosignal analysis: application of principal component
analysis to event-related potentials. In R. Weitkunat
(Ed.), Digital Biosignal Processing (pp. 399-458).
Amsterdam: Elsevier.
Nunez, P. L. (1981). Electric fields of the brain: The
neurophysics of EEG. New York: Oxford University Press.
Rösler, F., & Manzey, D. (1981). Principal components and
varimax-rotated components in event-related potential
research: Some remarks on their interpretation. Biological
Psychology, 13, 3-26.
Ruchkin, D. S., Villegas, J., & John, E. R. (1964). An analysis
of average evoked potentials making use of least mean
square techniques. Annals of the New York Academy of
Sciences, 115(2), 799-826.
Spencer, K. M., Abad, E. V., & Donchin, E. (2000). On the search
for the neurophysiological manifestation of recollective
experience. Psychophysiology, 37, 494-506.
Spencer, K. M., Dien, J., & Donchin, E. (1999). A componential
analysis of the ERP elicited by novel events using a dense
electrode array. Psychophysiology, 36, 409-414.
Spencer, K. M., Dien, J., & Donchin, E. (2001). Spatiotemporal
Analysis of the Late ERP Responses to Deviant Stimuli.
Psychophysiology, 38(2), 343-358.
Wang, K., Begleiter, H., & Porjesz, B. (2000). Trilinear
modeling of event-related potentials. Brain Topography,
12(4), 263-271.
Wood, C. C., & McCarthy, G. (1984). Principal component analysis
of event-related potentials: Simulation studies demonstrate
misallocation of variance across components.
Electroencephalography and Clinical Neurophysiology, 59,
249-260.
Zarahn, E., Aguirre, G. K., & D'Esposito, M. (1999). Temporal
isolation of the neural correlates of spatial mnemonic
processing with fMRI. Cognitive Brain Research, 7(3), 255-
268.