8/22/2019 PCA-Dien PCA Chapter
Introduction to Principal Components Analysis of Event-Related
Potentials
Joseph Dien (1,3) and Gwen A. Frishkoff (2)
(1) Department of Psychology, Tulane University
(2) Department of Psychology, University of Oregon
(3) Department of Psychology, University of Kansas
To appear in:
Event Related Potentials: A Methods Handbook. Handy, T. (editor).
Cambridge, Mass: MIT Press.
Address for correspondence: Joseph Dien, Department of Psychology,
426 Fraser Hall, University of Kansas, 1415 Jayhawk Blvd., Lawrence,
KS 66045-7556. E-mail: [email protected].
Introduction
Over the last several decades, a variety of methods have
been developed for statistical decomposition of event-related
potentials (ERPs). The simplest and most widely applied of these
techniques is principal components analysis (PCA). It belongs to
a class of factor-analytic procedures, which use eigenvalue
decomposition to extract linear combinations of variables
(latent factors) in such a way as to account for patterns of
covariance in the data parsimoniously, that is, with the fewest
factors.
In ERP data, the variables are the microvolt readings
either at consecutive time points (temporal PCA) or at each
electrode (spatial PCA). The major source of covariance is
assumed to be the ERP components, characteristic features of the
waveform that are spread across multiple time points and
multiple electrodes (Donchin & Coles, 1991). Ideally, each
latent factor corresponds to a separate ERP component, providing
a statistical decomposition of the brain electrical patterns
that are superposed in the scalp-recorded data.
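To make these ideas concrete, the following sketch builds a small temporal-PCA data matrix and extracts factors by eigenvalue decomposition of its covariance matrix. It is written in Python/NumPy purely for illustration (the chapter's own analyses use a Matlab toolkit), and the dataset layout and component amplitudes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 20 waveforms (10 subjects x 2 conditions) x 6 time
# points, with two latent components peaking at time points 2 and 5
n_obs, n_time = 20, 6
amp1 = rng.normal(5.0, 1.0, n_obs)            # per-waveform amplitude, component 1
amp2 = rng.normal(8.0, 2.0, n_obs)            # per-waveform amplitude, component 2
data = rng.normal(0.0, 0.2, (n_obs, n_time))  # background noise
data[:, 1] += amp1                            # component 1 loads on time point 2
data[:, 4] += amp2                            # component 2 loads on time point 5

# Temporal PCA: variables = time points, observations = waveforms
cov = np.cov(data, rowvar=False)              # 6 x 6 relationship matrix
evals, evecs = np.linalg.eigh(cov)            # eigenvalue decomposition
order = np.argsort(evals)[::-1]               # largest-variance factors first
evals, evecs = evals[order], evecs[:, order]
```

Because only two time points carry component variance, the first two eigenvalues should dominate, mirroring the two-component structure described in the text.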
PCA has a range of applications for ERP analysis. First, it
can be used for data reduction and cleaning or filtering, prior
to data analysis. By reducing hundreds of variables to a handful
of latent factors, PCA can greatly simplify analysis and
description of complex data. Moreover, the factors retained for
further analysis are considered more likely to represent pure
signal (i.e., brain activity), as opposed to noise (i.e.,
artifacts or background EEG).
Second, PCA can be used in data exploration as a way to
detect and summarize features that might otherwise escape visual
inspection. This is particularly useful when ERPs are measured
over many tens or hundreds of recording sites; spatial patterns
can then be used to constrain the decomposition into latent
temporal patterns, as described in the following section.
The use of such high-density ERPs (recordings at 50 or more
electrodes) has become increasingly popular in the last several
years. A striking feature of high-density ERPs is that the
complexity of the data seems to grow exponentially as the number
of recording sites is doubled or tripled. Thus, while increases
in spatial resolution can lead to important new discoveries,
subtle patterns are likely to be missed by visual inspection
alone, as higher spatial sampling reveals increasingly complex
patterns that overlap in both time and space. A rational
approach to data decomposition can improve the chances of
detecting these subtler effects.
Third, PCA can serve as an effective means of data
description. In principle, PCA can describe features of the
dataset more objectively and more precisely than is possible
with the unaided eye. Such increased precision could be
especially helpful when PCA is used as a preprocessing step for
ERP source localization (Dien, 1999; Dien, Frishkoff, Cerbonne,
& Tucker, 2003a; Dien, Spencer, & Donchin, 2003b; Dien, Tucker,
Potts, & Hartry, 1997).
Despite the many useful functions of PCA, this method has
had a somewhat checkered history in ERP research, beginning in
the 1960s (Donchin, 1966; Ruchkin, Villegas, & John, 1964). An
influential review paper by Donchin & Heffley (1979) promoted
the use of PCA for ERP component analysis. A few years later,
however, PCA entered something of a dark age in the ERP field
with the publication of a methodological critique (Wood &
McCarthy, 1984), which demonstrated that PCA solutions may be
subject to misallocation of variance across the latent factors.
Wood and McCarthy noted that the same problems arise in the use
of other techniques, such as reaction time and peak amplitude
measures. The difference is that misallocation is made more
explicit in PCA, which they argued should be regarded as an
advantage. Yet this last point was often overlooked, and this
seminal paper has, ironically, been cited as an argument against
the use of PCA. Perhaps as a consequence, many researchers
continued to rely on conventional ERP analysis techniques.
More recently, the emergence of high-density ERPs has
revived the interest in PCA as a method of data reduction.
Moreover, some recent studies have shown that statistical
decomposition can lead to novel insights into well-known ERP
effects, providing evidence to help separate ERP components
associated with different perceptual and cognitive operations
(Dien et al., 2003a; Dien, Frishkoff, & Tucker, 2000; Dien et
al., 1997; Spencer, Dien, & Donchin, 2001).
The present chapter provides a systematic outline of the steps
in temporal PCA, and the issues that arise at each step in
implementation. Some problems and limitations of temporal PCA
are discussed, including rotational indeterminacy, problems of
misallocation, and latency jitter. We then compare some recent
alternatives to temporal PCA, namely: spatial PCA (Dien, 1998a),
sequential (spatio-temporal or temporo-spatial) PCA (Spencer et
al., 2001), parametric PCA (Dien et al., 2003a), multi-mode PCA
(Möcks, 1988), and partial least squares (PLS) (Lobaugh, West, &
McIntosh, 2001). Each technique has evolved to address certain
weaknesses with the traditional PCA method. We conclude with
questions for further research, and advocate a research program
for systematic comparison of the strengths and limitations of
different multivariate techniques in ERP research.
Steps in Temporal PCA
The two most common types of factor analysis are principal
axis factors and principal components analysis. These methods
are equivalent for all practical purposes when there are many
variables and when the variables are highly correlated (Gorsuch,
1983), as in most ERP datasets. In the ERP literature, PCA is
the more common method. Normally, one and the same term,
"component," has been used both for PCA linear combinations and
for characteristic spatial and temporal features of the ERP. To
avoid confusion, the term factor (or latent factor) will be used
here to refer to PCA (latent) components, and the term component
will be reserved for spatiotemporal features of the ERP
waveform.
The PCA process consists of three main steps: computation
of the relationship matrix, extraction and retention of the
factors, and rotation to simple structure. In the following
sections, PCA simulation is performed to illustrate each step,
using the PCA Toolkit (version 1.06), a set of Matlab functions
for performing PCA on ERP data. This toolkit was written by the
first author and is freely available upon request.
The Data Matrix
A key to understanding PCA procedures as applied to ERPs is
to be clear about how multiple sources of variance contribute to
the data decomposition. In temporal PCA, the dataset is
organized with the variables corresponding to time points, and
observations corresponding to the different waveforms in the
dataset, as shown in Figure 1.
Figure 1. Data matrix, with dimensions 20 x 6. Variables are
time points, measured in two conditions for ten subjects. Part
b presents the covariance matrix computed from the data matrix.
The waveforms vary across subjects, electrodes, and experimental
conditions. Thus, subject, spatial, and task variance are
collectively responsible for covariance among the temporal
variables. Although it may seem odd to commingle these three
sources of variance, they provide equally valid bases for
distinguishing an ERP component; in this respect, it is reasonable
to treat them collectively. For example, the voltage readings
tend to rise and fall together between 250 and 350 ms in a
simple oddball experiment, because they are mutually influenced
by the P300, which occurs during this period. Since individual
differences, scalp location, and experimental task may all
affect the recorded P300 amplitude, the amplitudes of these time
points will likewise covary as a function of these three sources
of variance. Figure 2 shows the grand-averaged waveforms (for
n=10 subjects) corresponding to the simulated data in Figure 1.
For simplicity, this example involves only one electrode site,
ten subjects, and two experimental conditions. In subsequent
sections, we will use these simulated data to help illustrate
the steps in implementation of PCA for ERP analysis.
Figure 2. Waveforms for grand-averaged data (n=10),
corresponding to data in Figure 1. Graph displays two non-
overlapping correlated components, plotted for two hypothetical
conditions, A and B.
The Relationship Matrix
The first step in applying PCA is to generate a
relationship (or association) matrix, which captures the
interrelationships between temporal variables. The simplest
such matrix is the sum-of-squares cross-products (SSCP) matrix.
For each pair of variables, the corresponding values in each
observation are multiplied, and the products are summed across
observations. Thus, variables that
tend to rise and fall together will produce the highest values
in the matrix. The diagonal of the matrix (the relationship of
each variable to itself) is the sum of the squared values of
each variable. For an example of the effect of using the SSCP
matrix, see (Curry, Cooper, McCallum, Pocock, Papakostopoulos,
Skidmore, & Newton, 1983). SSCP treats mean differences in the
same fashion as differences in variance, which has odd effects
on the PCA computations. For example, factors computed on an
SSCP matrix can be correlated, even when they are orthogonal
when using other matrices. In general, we do not recommend the
use of the SSCP matrix in ERP analyses.
An alternative to the SSCP matrix is the covariance matrix.
This matrix is computed in the same fashion as the SSCP matrix,
except that the mean of each variable is subtracted out before
generating the relationship matrix. Mean correction ensures that
variables with high mean values do not have a disproportionate
effect on the factor solution. The effect of mean correction on
the solution depends on the EEG reference site, a topic that is
beyond the scope of this review (cf. Dien, 1998a).
A third alternative is to use the correlation matrix as the
relationship matrix. The correlation matrix is computed in the
same fashion as the covariance matrix, except that the variable
variances are standardized. This is accomplished by first mean
correcting each variable, and then dividing each variable by its
standard deviation, which ensures that the variables contribute
equally to the factor solution. Since time points that do not
contain ERP components have smaller variances, this procedure
may exacerbate the influence of background noise. Simulation
studies indicate that covariance matrices can yield more
accurate results (Dien, Beal, & Berg, submitted). We therefore
recommend the use of covariance matrices.
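The three relationship matrices differ only in how the variables are preprocessed before cross-products are formed. A minimal sketch (Python/NumPy, for illustration only; the function names are ours, not part of any toolkit):

```python
import numpy as np

def sscp(X):
    """Sum-of-squares cross-products: raw products, no mean correction."""
    return X.T @ X

def covariance(X):
    """Mean-correct each variable (column) before forming cross-products."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def correlation(X):
    """Mean-correct and divide by the standard deviation, so every
    variable contributes equally (all variances standardized to 1)."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    return Xs.T @ Xs / (X.shape[0] - 1)
```

The diagonal of `correlation(X)` is all ones, which is exactly why low-variance (noise-dominated) time points gain influence under the correlation matrix, as noted above.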
In Figure 3(a), the simulated data are converted into a
covariance matrix. Observe how the time points containing the
two components (t2 and t5) result in larger entries than those
without. The entries with the largest numbers will have the
most influence on the next step in the PCA procedure: factor
extraction.
Factor Extraction
In the extraction stage, a process called eigenvalue
decomposition is performed, which progressively removes linear
combinations of variables that account for the greatest variance
at each step. Each linear combination constitutes a latent
factor. In Figure 3, we demonstrate how this process
iteratively reduces the remaining values in the relationship
matrix to zero.
Figure 3. (a) Original covariance matrix. (b) Covariance matrix
after subtraction of Factor 1 (Varimax-rotated; see next
Section). (c) Covariance matrix after subtraction of Factors 1
and 2. Since Factors 1 and 2 together account for nearly all of
the variance, the result is the null matrix.
In general, PCA should extract as many factors as there are
variables, as long as the number of observations is equal to or
greater than the number of variables (i.e., as long as the data
matrix is of full rank).
The initial extraction yields an unrotated solution,
consisting of a factor loading matrixand a factor score matrix.
The factor loading matrix represents correlations between the
variables and the factor scores. The factor score matrix
indexes the magnitude of the factors for each of the
observations and thus represents the relationship between the
factors and the observations. If the two matrices are
multiplied together, they will reproduce the data matrix. By
convention, the reproduced data matrix will be in standardized
form, regardless of the type of relationship matrix that was
entered into the PCA. To recreate the original data matrix, the
variables of this standardized matrix are multiplied by the
original standard deviations, and the original variable means
are restored.
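The loading/score decomposition and the reconstruction convention described above can be sketched as follows (Python/NumPy; a correlation-matrix PCA is used here so that the reproduced data matrix comes out in standardized form, as the text describes, and the toy data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))           # toy data: 20 observations x 6 variables
X[:, 4] += 2.0 * X[:, 1]               # induce correlation between two variables
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sd                      # standardized data

R = np.corrcoef(X, rowvar=False)       # relationship matrix
evals, evecs = np.linalg.eigh(R)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

loadings = evecs * np.sqrt(evals)      # correlations between variables and scores
scores = Z @ evecs / np.sqrt(evals)    # standardized factor scores per observation

Z_hat = scores @ loadings.T            # reproduces the standardized data matrix
X_hat = Z_hat * sd + mu                # rescale by the SDs and restore the means
```

Multiplying `scores @ loadings.T` recovers the standardized data exactly; the final line is the "recreate the original data matrix" step described in the text.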
Figure 4. Reconstructed waveforms, calculated by multiplying the
Factor Loadings by the Factor Scores, scaled to microvolts
(i.e., multiplied by the matrix of standard deviations for the
original data). Data reconstructed using Varimax-rotated factors
for this example.
For a temporal PCA, the loadings describe the time course
of each of the factors. To accurately represent the time course
of the factors, it is necessary to first multiply them by the
variable standard deviations, which rescales them to microvolts
(see proof in Dien, 1998a). Further, it is important to note
that the sign of a given factor loading is arbitrary. This is
necessarily the case since a given peak in the factor time
course will be positive on one side of the head, and negative on
the other side, due to the dipolar nature of electrical fields
(Nunez, 1981). Note, further, that the dipolar distributions can
be distorted or obscured by referencing biases in the data
(Dien, 1998b). Only the product of the factor loading and the
factor score corresponds to the original data in an unequivocal
way. Thus, if the factor loading is positive at the peak, then
the factor scores from one side of the head will be positive and
the other side will be negative, corresponding to the dipolar
field.
The factor scores, on the other hand, provide information
about the other sources of variance (i.e., subject, task, and
spatial variance). For example, to compute the amplitude of a
factor at a specific electrode site for a given task condition,
one simply takes the factor scores corresponding to the
observations for that task at the electrode of interest and
computes their mean (across subjects). If this mean value is
computed for each electrode, the resulting values can be used to
plot the scalp topography for that factor. If a specific time
point is chosen, it is possible to reconstruct the scalp topography
with the proper microvolt scaling by multiplying the mean scores
by the factor loading and the standard deviation for the time
point of interest (see proof in Dien, 1998a).
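As a sketch of this scaling rule (the function name and data layout are hypothetical, not part of the PCA Toolkit): mean factor scores are taken per electrode and rescaled by the loading and standard deviation at the time point of interest.

```python
import numpy as np

def factor_topography_uv(scores, electrode_ids, loading_t, sd_t):
    """Scalp topography of one factor at time point t, in microvolts.

    scores:        (n_obs,) factor scores, one per observation
    electrode_ids: (n_obs,) electrode label for each observation
    loading_t:     the factor loading at the time point of interest
    sd_t:          the standard deviation of that time point in the data
    """
    electrodes = np.unique(electrode_ids)
    # Mean factor score at each electrode (averaging across subjects/tasks),
    # then rescale to microvolts via the loading and the variable's SD
    topo = np.array([scores[electrode_ids == e].mean() for e in electrodes])
    return electrodes, topo * loading_t * sd_t
```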
Unlike the PCA algorithm in most statistics packages, the
PCA Toolkit does not mean-correct the factor scores. This
maintains an interpretable relationship between the factor
scores and the original data. If factor scores are mean-
corrected as part of the standardization, the mean task scores
will be centered around zero, which can make factor
interpretation more difficult. In an oddball experiment, for
example, the P300 factor scores should be large for the target
condition and small for the standard condition. However, if the
factor scores are mean-corrected, then the mean task scores for
the two conditions will be of equal amplitude and opposite signs
(since mean correction splits the difference).
Factor Retention
Most of the PCA factors that are extracted account for
small proportions of variance, which may be attributed to
background noise, or minor departures from group trends. In the
interest of parsimony, only the larger factors are typically
retained, since they are considered most likely to contain
interpretable signal. A common criterion for determining how
many factors to retain is the scree test (Cattell, 1966; Cattell
& Jaspers, 1967). This test is based on the principle that the
PCA of a random set of data will produce a set of randomly sized
factors. Since factors are extracted in order of descending
size, when their sizes are graphed they will form a steady
downward slope. A dataset containing signal, in addition to the
noise, should have initial factors that are larger than would be
expected from random data alone. The point of departure from
the slope (the elbow) indicates the number of factors to retain.
Factors beyond this point are likely to contain noise and are
best dropped. Figure 5 plots the reconstructed grand-averaged
data, using the retained factors in order to verify that
meaningful factors have not been excluded.
Figure 5. Reconstructed waveforms, computed as in Figure 4,
using only Factors 1 and 2. Because the first two factors
account for nearly all of the variance, the reconstruction is
nearly as good as the original data (cf. Fig. 2).
In practice, the scree plot for ERP datasets often contains
multiple elbows, which can make it difficult to determine the
proper number of factors to retain. Part of the problem is that
the noise contains some unwanted signal (remnants of the
background EEG). A modified version of the parallel test can be
used to address this issue (Dien, 1998a). The parallel test
determines how many factors represent signal by comparing the
scree produced by the full dataset to that produced when only
the noise is present. The noise level is estimated by
generating an ERP average with every other trial inverted, which
has the effect of canceling out the signal while leaving the
noise level unchanged. The results of the parallel test should
be considered a lower bound since retaining additional factors
to account for major noise features can actually improve the
analysis (for such an example, see Dien, 1998a), although in
principle if too many additional factors are retained it can
result in unwanted distinctions being made (such as between
subject-specific variations of the component). In general, the
experience of the first author is that between eight and sixteen
factors is often appropriate, although this may depend, among
other things, on the number of recording sites.
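The noise estimate used by the modified parallel test, averaging with every other trial inverted, can be sketched as follows (Python/NumPy, illustrative only):

```python
import numpy as np

def noise_average(trials):
    """Average single trials with every other trial inverted. Any signal
    that is identical across trials cancels out, while the noise level of
    the average is left unchanged."""
    signs = np.where(np.arange(trials.shape[0]) % 2 == 0, 1.0, -1.0)
    return (signs[:, None] * trials).mean(axis=0)

def scree(data):
    """Eigenvalues of the covariance matrix, largest first, for a scree plot."""
    evals = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    return evals[::-1]
```

Comparing `scree(full_data)` against the scree of averages built with `noise_average` gives the signal/noise crossover point described in the text.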
Factor Rotation
A critical step, after deciding how many factors to retain,
is to determine the best way of allocating variance across the
remaining factors. Unfortunately, there is no transparent
relationship between the PCA factors and the latent variables of
interest (i.e., ERP components). Eigenvalue decomposition
blindly generates factors that account for maximum variance,
which may be influenced by more than one latent variable;
whereas, the goal is to have each factor represent a single ERP
component.
As shown in Figure 6, there is not a one-to-one mapping of
factors to variables after the initial factor extraction.
Rather, the initial extraction has maximized the variance of the
first factor by including variance from as many variables as
possible. In doing so, it has generated a factor that is a
hybrid of two ERP components, the linear sum of roughly 10% of
the P1 and 90% of the P3. The second factor contains the
leftover variance of both components. This example demonstrates
the danger of interpreting the initial unrotated factors
directly, as is sometimes advocated (e.g., Rösler & Manzey,
1981).
Figure 6. Graph of unrotated factors. Only Factors 1 and 2
are graphed, since Factors 3–6 are close to 0.
Factor rotation is used to restructure the allocation of
variables to factors to maximize the chance that each factor
reflects a single latent variable. The most common rotation is
Varimax (Kaiser, 1958). In Varimax, each of the retained
factors is iteratively rotated pairwise with each of the other
factors in turn, until changes in the solution become
negligible. More specifically, the Varimax procedure rotates the
two factors such that the sum of the factor loadings (raised to
the fourth power) is maximized. This has the effect of favoring
solutions in which factor loadings are as extreme as possible
with a combination of near-zero loadings and large peak values.
Since ERP components (other than DC potentials) tend to have
zero activity for most of the epoch with a single major peak or
dip, Varimax should yield a reasonable approximation to the
underlying ERP components (Fig. 7). Temporal overlap of ERP
components raises additional issues, which are addressed in the
following section.
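For readers who want to experiment, here is a compact matrix-form Varimax (the standard SVD-based formulation; a sketch in Python/NumPy, not the PCA Toolkit's implementation):

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Varimax rotation of a loading matrix L (variables x factors).

    Iteratively chooses an orthogonal rotation R that maximizes the
    Varimax simplicity criterion (Kaiser, 1958), driving each factor
    toward a few large loadings and many near-zero ones."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the Varimax criterion with respect to the rotation
        G = L.T @ (LR ** 3 - LR * ((LR ** 2).sum(axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                    # nearest orthogonal matrix to the gradient
        crit_new = s.sum()
        if crit_new - crit_old < tol:
            break
        crit_old = crit_new
    return L @ R, R
```

Applied to loadings that mix two simple-structure factors, the rotation increases the variance of the squared loadings within each factor, i.e., pushes loadings toward the extreme-or-zero pattern described above.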
Figure 7. Graph of Varimax-rotated Factor Loadings. Only Factors
1 and 2 are graphed, since Factors 3–6 are close to 0.
Simulation studies have demonstrated that the accuracy of a
rotation is influenced by several characteristics of the data, including
component overlap and component correlation (Dien, 1998a).
Component overlap is a problem, because the more similar two ERP
components are, the more difficult it is to distinguish them
(Möcks & Verleger, 1991). Further, correlations between
components may lead to violations of statistical assumptions.
The initial extraction and the subsequent Varimax rotation
maintain strict orthogonality between the factors (so the
factors are uncorrelated). To the extent that the components
are in fact correlated, the model solution will be inaccurate,
producing misallocation of variance across the factors.
Component correlation can arise when two components respond to
the same task variables (an example is the P300 and Slow Wave
components, which often co-occur), or when both components
respond to the same subject parameters (e.g., age, sex, or
personality traits), or share a common spatial distribution.
Further, these two components can be measured at some of the
same electrodes due to their similar scalp topographies.
Component correlation can be effectively addressed by using
an oblique rotation, such as Promax (Hendrickson & White, 1964),
allowing for correlated factors (Dien, 1998a). In Promax, the
initial Varimax rotation is succeeded by a "relaxation" step, in
which each individual factor is further rotated to maximize the
number of variables with minimal loadings. A factor is adjusted
in this fashion without regard to the other factors, allowing
factors to become correlated, and thus relaxing the
orthogonality constraint in the Varimax solution. The Promax
rotation typically leads to solutions that more accurately
capture the large features of the Varimax factors while
minimizing the smaller features. As a result, Promax solutions
tend to account for slightly less variance than the original
Varimax solutions, but may also give more accurate results
(Dien, 1998a). A typical result can be seen in Figure 8.
Figure 8. Graph of Promax-rotated Factor Loadings. Only Factors
1 and 2 are graphed, since Factors 3–6 are close to 0.
Spatial versus Temporal PCA
A limitation of temporal PCA is that factors are defined
solely as a function of component time course, as instantiated
by the factor loadings. This means that ERP components which
are topographically distinct, but have a similar time course,
will be modeled by a single factor. A sign that this has
occurred is when temporal PCA yields condition effects
characterized by a scalp topography that differs from the
overall factor topography.
To address this problem, spatial PCA may be used as an
alternative to temporal PCA (Dien, 1998a). In a spatial PCA,
the data are arranged such that the variables are electrode
locations, and observations are experimental conditions,
subjects, and time points. The factor loadings therefore
describe scalp topographies, instead of temporal patterns. This
approach is less likely to confound ERP components with the same
time course, as long as they are differentiated by the task or
subject variance. On the other hand, it will be subject to the
converse problem, confusing components with similar scalp
topographies, even when they have clearly separate time
dynamics.
The choice between spatial versus temporal PCA should
depend on specific analysis goals. A rule of thumb is that if
time course is the focus of an analysis, then spatial PCA should
be used, and vice versa. The reason is that the factor loadings
are constrained to be the same across the entire dataset (i.e.,
the same time course for temporal PCA, and the same scalp
topography for spatial PCA). The factor scores, on the other
hand, are free to vary between conditions and subjects. Thus,
one can examine latency changes with spatial, but not temporal,
PCA, and vice versa. In particular, this implies that temporal,
rather than spatial, PCA should be more effective as a
preprocessing step in source localization, since these modeling
procedures rely on the scalp topography to infer the number and
configuration of sources.
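The two arrangements differ only in which dimension is treated as the variables; a minimal sketch (Python/NumPy, with hypothetical array dimensions):

```python
import numpy as np

# Hypothetical 4-D ERP array: subjects x conditions x electrodes x time points
erp = np.random.default_rng(2).normal(size=(10, 2, 32, 100))
n_sub, n_cond, n_chan, n_time = erp.shape

# Temporal PCA: variables = time points;
# observations = every subject/condition/electrode combination
temporal_data = erp.reshape(-1, n_time)                       # 640 x 100

# Spatial PCA: variables = electrodes;
# observations = every subject/condition/time-point combination
spatial_data = erp.transpose(0, 1, 3, 2).reshape(-1, n_chan)  # 2000 x 32
```

Each row of `spatial_data` is one scalp map (all electrodes at one moment), just as each row of `temporal_data` is one waveform.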
All other things being equal, temporal PCA is in principle
more accurate than spatial PCA, since component overlap reduces
PCA accuracy, and volume conduction ensures that all ERP
components will overlap in a spatial PCA. Furthermore, the
effect of the Varimax and Promax rotations is to minimize factor
overlap, which is a more appropriate goal for temporal than for
spatial PCA. A caveat in either case is that ERP component
separation can be achieved only if subject or task variance (or
both) can effectively distinguish the components. In other
words, the three sources of variance associated with each
observation must collectively be able to distinguish the ERP
components, regardless of whether the components differ along
the variable dimension (time for temporal PCA, or space for
spatial PCA).
Recent alternatives to PCA
In recent years, a variety of multivariate statistical
techniques have been developed and are increasingly making their
way into the ERP literature. One such method, independent
components analysis (Makeig, Bell, Jung, & Sejnowski, 1996), is
discussed elsewhere in this volume.
In this section, we present four multivariate techniques in
ERP analysis, which share a common basis in their use of
eigenvalue decomposition. Each technique has been claimed to
address one or more problems with conventional PCA. The
application of these techniques in ERP research is very recent, and
more work is needed to characterize their respective strengths
and limitations for various ERP applications. Future
developments of these techniques may lead to a powerful suite of
tools that can be used to address a range of problems in ERP
analysis.
Sequential spatiotemporal (or temporospatial) PCA
A recent procedure for improved separation of ERP
components is spatiotemporal (or temporospatial) PCA (Spencer,
Dien, & Donchin, 1999; Spencer et al., 2001). In this
procedure, ERP components that were confounded in the initial
PCA are separated by the application of a second PCA, which is
used to separate variance along the other dimension. For a
temporospatial PCA this is accomplished by rearranging the
factor scores resulting from the temporal PCA so that each
column contains the factor scores from a different electrode. A
spatial PCA can then be conducted using these factor scores as
the new variables. While the temporal variance has been
collapsed by the initial PCA, the subject and task variance are
still expressed in the observations, and can thus be used to
separate ERP components that were confounded in the initial PCA.
In Spencer et al. (1999, 2001), the initial spatial PCA
was followed by a temporal PCA, with the factor scores from all
the factors combined within the same analysis (number of
observations equal to number of subjects x number of tasks x
number of spatial factors). This procedure led to a clear
separation of the P300 from the Novelty P3. However, it also had
an important drawback: the application of a single temporal PCA
to all the spatial factors could result in loss of some of the
finer distinctions in time course between different spatial
factors. This analytic strategy was necessary because the
generalized inverse function, used by SAS to generate the factor
scores, requires that there be more observations than variables.
The PCA Toolkit has bypassed this requirement by directly
rotating the factor scores (Möcks & Verleger, 1991), allowing
each initial factor to be subjected to a separate PCA (following
the example of Scott Makeig's ICA toolbox and an independent
suggestion by Bill Dunlap). In a more recent study (Dien et al.,
2003b), an initial spatial PCA yielded 12 factors; each spatial
factor was then subjected to a separate temporal PCA (each
retaining four factors for simplicity's sake). For analyses
using this newer approach (it makes little difference for the
original approach), temporospatial PCA is recommended over
spatiotemporal PCA, since temporal PCA may lead to better
initial separation of ERP components. Subsequent application of
a spatial PCA can then help separate components that were
confounded in the temporal PCA. On the other hand, if latency
analysis is a goal of the PCA, then spatial PCA should be done
first (since latency analysis cannot be done on the results of a
temporal PCA) with the succeeding temporal PCA step used to
verify whether multiple components are present in the factor of
interest, as demonstrated in another recent study (Dien,
Spencer, & Donchin, in press).
The full equation to generate the microvolt value for a
specific time point t and channel c for a spatiotemporal PCA is
L1 * V1 * L2 * S2 * V2, where L1 is the spatial PCA factor
loading for c, V1 is the standard deviation of c, L2 is the
temporal PCA factor loading for t, S2 is the mean factor score
for the temporal factor, and V2 is the standard deviation of the
spatial factor scores at t. The temporal and spatial terms are
reversed for temporospatial PCA.
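A two-step temporospatial sequence can be sketched as follows (Python/NumPy; the helper `pca` is a bare covariance-based PCA without rotation, and the factor counts and array sizes are arbitrary — this illustrates the data rearrangement, not the published procedure):

```python
import numpy as np

def pca(X, n_keep):
    """Bare covariance-based PCA: returns loadings and (unrotated) scores."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(evals)[::-1][:n_keep]
    loadings = evecs[:, order]
    return loadings, Xc @ loadings

rng = np.random.default_rng(3)
n_sub, n_cond, n_chan, n_time = 10, 2, 16, 50
erp = rng.normal(size=(n_sub, n_cond, n_chan, n_time))

# Step 1: temporal PCA (variables = time points)
t_load, t_scores = pca(erp.reshape(-1, n_time), n_keep=4)

# Step 2: for each temporal factor, rearrange its scores so each column
# holds one electrode, then run a separate spatial PCA on that factor
for f in range(4):
    scores_f = t_scores[:, f].reshape(n_sub * n_cond, n_chan)
    s_load, s_scores = pca(scores_f, n_keep=2)
```

Running a separate spatial PCA per temporal factor follows the newer per-factor approach described above, rather than pooling all factor scores into a single second-stage analysis.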
Parametric PCA
Another recent method involves the use of parametric
measures to improve PCA separation of latent factors, which
differ along one or more stimulus dimensions (Dien et al.,
2003a). This more specialized procedure can only be conducted
on datasets containing observations with a continuous range of
values. In Dien et al. (2003a), ERP responses to sentence
endings were averaged for each stimulus item (collapsing over
subjects) rather than averaging over subjects (collapsing over
items in each experimental condition). This item-averaging
approach resulted in 120 sentence averages, which were rated on
a number of linguistic parameters, such as meaningfulness and
word frequency. After an initial temporal PCA, it was then
possible to correlate the parameter of interest with the mean
factor score at each channel, to determine the influence of the
stimulus parameter on a given ERP component, such as the N400.
This had the effect of highlighting the relationship between ERP
components and stimulus parameters, while factoring out the
effects of ERP components unrelated to the parameters of
interest. In this fashion, parametric PCA can lead to scalp
topographies that reflect only the parameters of interest,
providing a new approach to functional separation of ERP
components. This approach thus provides an alternative method to
sequential PCA for deconfounding components. These components
can then be subjected to further analyses, such as dipole and
linear inverse modeling.
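The correlation step of this parametric approach can be sketched as follows; the function name, array layout, and item counts are illustrative assumptions, not code from Dien et al. (2003a):

```python
import numpy as np

def parametric_topography(factor_scores, parameter):
    """Sketch of the parametric PCA correlation step.

    factor_scores: items x channels array of scores for one temporal
                   factor (one score per item average, per channel).
    parameter:     length-items vector of stimulus ratings
                   (e.g., meaningfulness or word frequency).
    Returns the per-channel Pearson correlation of the factor scores
    with the parameter, i.e., a scalp topography reflecting only
    variance tied to that parameter.
    """
    fs = factor_scores - factor_scores.mean(axis=0)
    p = parameter - parameter.mean()
    num = fs.T @ p
    den = np.sqrt((fs ** 2).sum(axis=0) * (p ** 2).sum())
    return num / den   # one r value per channel
```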
Partial Least Squares
Partial least squares (PLS), like PCA, is a multivariate
technique based on eigenvalue decomposition. Unlike PCA, PLS
operates on the covariance between the data matrix and a matrix
of contrasts that represents features of the experimental design
(McIntosh, Bookstein, Haxby, & Grady, 1996). Similar to
parametric PCA procedures, the decomposition is focused on
variance due to the experimental manipulations (condition
differences). A recent paper (Lobaugh et al., 2001) applied PLS
to ERP data for the first time. Simulations showed that the PLS
analysis led to accurate modeling of the spatial and temporal
effects that were associated with condition differences in the
ERP waveforms. Lobaugh et al. also suggest that PLS may be an
effective preprocessing method, identifying time points and
electrodes that are sensitive to condition differences and can
therefore be targeted for further analyses.
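A minimal sketch of the PLS decomposition, assuming the data and design contrasts have already been arranged as matrices; this illustrates the general idea of an SVD on the contrast-data cross-product, not the specific implementation of McIntosh et al. (1996):

```python
import numpy as np

def pls_erp(data, contrasts):
    """Minimal PLS sketch for ERP data.

    data:      observations x (timepoints*channels) matrix
    contrasts: observations x n_contrasts design matrix
    The SVD is taken of the contrast-data cross-product, so the
    resulting latent variables capture only variance tied to the
    experimental design, unlike PCA on the data matrix alone.
    """
    X = data - data.mean(axis=0)
    Y = contrasts - contrasts.mean(axis=0)
    R = Y.T @ X                  # n_contrasts x variables cross-covariance
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # U: design saliences; Vt: timepoint/electrode saliences;
    # s: singular values (strength of each latent variable)
    return U, s, Vt
```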
One cautionary note concerning PLS arises from the use of
difference waves, which are created by subtracting the ERP
waveform in one experimental condition from the response in a
different condition, in order to isolate experimental effects
prior to factor extraction. This approach, based on the logic of
subtraction, can lead to incorrect conclusions when the
assumption of pure insertion is violated, that is, when two
conditions are different in kind rather than in degree. It can
also produce misleading results when a change in latency appears
to be an amplitude effect or when multiple effects appear to be
a single effect.
Neuroimaging measures, such as ERP and fMRI, may be
particularly subject to such misinterpretations, since both
spatial (anatomical) and temporal variance can lead to
condition differences. If these multiple sources of variance are
not adequately separated, a temporal difference between conditions
may be erroneously ascribed to a single anatomical region or ERP
component (e.g., Zarahn, Aguirre, & D'Esposito, 1999). In a
recent example (Spencer, Abad, & Donchin, 2000), PCA was used to
examine the claim that recollection (as compared with
familiarity) is associated with a unique electrophysiological
component. Spencer et al. concluded that the effect was more
accurately ascribed to differences in latency jitter, or trial-
to-trial variance in peak latency of the P300 across the two
conditions.
For this reason, condition differences, while useful,
should only be interpreted with respect to the overall patterns in
the original data. Further, both spatial and temporal variance
between conditions should be analyzed fully, to rule out
differences in latency jitter or other electrophysiological
effects that may be hidden or confounded through cognitive
subtraction. This recommendation also applies to the
interpretation of results from the use of partial variance
techniques, such as PLS.
Multi-Mode Factor Analysis
The techniques discussed in previous sections were all
based on two-mode analyses of ERP data (a mode being a dimension
of variance). In temporal PCA, for instance, time points
represent one dimension (the variables axis), and the other
dimension combines the remaining sources of variance, i.e.,
subjects, electrodes, and experimental conditions (the
observations axis). By contrast, multi-mode procedures analyze the data
across three or more dimensions simultaneously. For example, in
trilinear decomposition (TLD), the subject data matrix Xi is
expressed as the cross-product of three factors, as shown in
equation (1):
Xi = B * Ai * C, (1)
where B is a set of spatial components and C is a set of
temporal components. Ai represents the subject loadings on B and
C. B and C are calculated in separate spatial and temporal
singular value decompositions of the data and are then combined
to yield a new decomposition of the data, for any fixed
dimensionality (Wang, Begleiter, & Porjesz, 2000). Since tri-
mode PCA is susceptible to the same rotational indeterminacies
as regular PCA, rotational procedures will need to be developed
and evaluated.
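A rough sketch of this decomposition, assuming a subjects x channels x timepoints array; the projection step for Ai exploits the orthonormality of the SVD bases and is an illustrative simplification of Wang et al.'s (2000) algorithm, not their published code:

```python
import numpy as np

def trilinear_decomposition(X, p, q):
    """Sketch of trilinear decomposition (TLD).

    X: subjects x channels x timepoints array.
    B (channels x p) comes from a spatial SVD of the stacked data,
    C (q x timepoints) from a temporal SVD, and each subject's
    loading matrix Ai solves Xi ~= B @ Ai @ C by least squares.
    """
    n_subj, n_chan, n_time = X.shape
    # Spatial SVD: channels as rows, all subjects' time courses stacked.
    spatial = X.transpose(1, 0, 2).reshape(n_chan, -1)
    B = np.linalg.svd(spatial, full_matrices=False)[0][:, :p]
    # Temporal SVD: time points as columns.
    temporal = X.reshape(-1, n_time)
    C = np.linalg.svd(temporal, full_matrices=False)[2][:q, :]
    # Per-subject loadings by projection (valid because B has
    # orthonormal columns and C has orthonormal rows).
    A = np.stack([B.T @ X[i] @ C.T for i in range(n_subj)])
    return B, A, C
```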
It has been claimed that tri-mode PCA can effectively remove
nuisance sources of variance, as described by Möcks (1988),
providing greater sensitivity as compared with conventional
two-mode PCA. Further, Achim has described the use of
multi-modal procedures to help address misallocation of variance
(Achim & Bouchard, 1997). These reports point to the need for
thorough and systematic comparison of multi-mode methods such as
TLD with other methods, such as parametric PCA and PLS. This
can only be done for algorithms that are made available to the
rest of the research community, either through open source or
through commercial software packages.
Conclusion
PCA and related procedures can provide an effective way to
preprocess high-density ERP datasets, and to help separate
components that differ in their sensitivity to spatial,
temporal, or functional parameters. This brief review has
attempted to characterize the current state of the art in PCA of
ERPs. Ongoing research will continue to refine and optimize
statistical procedures and will aim to determine the optimal
procedures for statistical decompositions of ERP data.
Ultimately, it is likely that a range of statistical tools will
be required, each best suited to different applications in ERP
analysis.
Bibliography
Achim, A., & Bouchard, S. (1997). Toward a dynamic topographic
components model. Electroencephalography and Clinical
Neurophysiology, 103, 381-385.
Cattell, R. B. (1966). The scree test for the number of factors.
Multivariate Behavioral Research, 1, 245-276.
Cattell, R. B., & Jaspers, J. (1967). A general plasmode (No.
3010-5-2) for factor analytic exercises and research.
Multivariate Behavioral Research Monographs, 67-3, 1-212.
Curry, S. H., Cooper, R., McCallum, W. C., Pocock, P. V.,
Papakostopoulos, D., Skidmore, S., et al. (1983). The
principal components of auditory target detection. In A. W.
K. Gaillard & W. Ritter (Eds.), Tutorials in ERP research:
Endogenous components (pp. 79-117). Amsterdam: North-
Holland Publishing Company.
Dien, J. (1998a). Addressing misallocation of variance in
principal components analysis of event-related potentials.
Brain Topography, 11(1), 43-55.
Dien, J. (1998b). Issues in the application of the average
reference: Review, critiques, and recommendations.
Behavioral Research Methods, Instruments, and Computers,
30(1), 34-43.
Dien, J. (1999). Differential lateralization of trait anxiety
and trait fearfulness: evoked potential correlates.
Personality and Individual Differences, 26(1), 333-356.
Dien, J., Beal, D., & Berg, P. (submitted). Optimizing principal
components analysis for event-related potential analysis.
Dien, J., Frishkoff, G. A., Cerbonne, A., & Tucker, D. M.
(2003a). Parametric analysis of event-related potentials in
semantic comprehension: Evidence for parallel brain
mechanisms. Cognitive Brain Research, 15, 137-153.
Dien, J., Frishkoff, G. A., & Tucker, D. M. (2000).
Differentiating the N3 and N4 electrophysiological semantic
incongruity effects. Brain & Cognition, 43, 148-152.
Dien, J., Spencer, K. M., & Donchin, E. (2003b). Localization of
the event-related potential novelty response as defined by
principal components analysis. Cognitive Brain Research,
17, 637-650.
Dien, J., Spencer, K. M., & Donchin, E. (in press). Parsing the
"Late Positive Complex": Mental chronometry and the ERP
components that inhabit the neighborhood of the P300.
Psychophysiology.
Dien, J., Tucker, D. M., Potts, G., & Hartry, A. (1997).
Localization of auditory evoked potentials related to
selective intermodal attention. Journal of Cognitive
Neuroscience, 9(6), 799-823.
Donchin, E. (1966). A multivariate approach to the analysis of
average evoked potentials. IEEE Transactions on Bio-Medical
Engineering, BME-13, 131-139.
Donchin, E., & Coles, M. G. H. (1991). While an undergraduate
waits. Neuropsychologia, 29(6), 557-569.
Donchin, E., & Heffley, E. (1979). Multivariate analysis of
event-related potential data: A tutorial review. In D. Otto
(Ed.), Multidisciplinary perspectives in event-related
potential research (EPA 600/9-77-043) (pp. 555-572).
Washington, DC: U.S. Government Printing Office.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Hendrickson, A. E., & White, P. O. (1964). Promax: A quick
method for rotation to oblique simple structure. The
British Journal of Statistical Psychology, 17, 65-70.
Kaiser, H. F. (1958). The varimax criterion for analytic
rotation in factor analysis. Psychometrika, 23, 187-200.
Lobaugh, N. J., West, R., & McIntosh, A. R. (2001).
Spatiotemporal analysis of experimental differences in
event-related potential data with partial least squares.
Psychophysiology, 38(3), 517-530.
Makeig, S., Bell, A. J., Jung, T., & Sejnowski, T. J. (1996).
Independent component analysis of electroencephalographic
data. Advances in Neural Information Processing Systems, 8,
145-151.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L.
(1996). Spatial pattern analysis of functional brain images
using Partial Least Squares. Neuroimage, 3, 143-157.
Möcks, J. (1988). Topographic components model for event-related
potentials and some biophysical considerations. IEEE
Transactions on Biomedical Engineering, 35(6), 482-484.
Möcks, J., & Verleger, R. (1991). Multivariate methods in
biosignal analysis: application of principal component
analysis to event-related potentials. In R. Weitkunat
(Ed.), Digital Biosignal Processing (pp. 399-458).
Amsterdam: Elsevier.
Nunez, P. L. (1981). Electric fields of the brain: The
neurophysics of EEG. New York: Oxford University Press.
Rösler, F., & Manzey, D. (1981). Principal components and
varimax-rotated components in event-related potential
research: Some remarks on their interpretation. Biological
Psychology, 13, 3-26.
Ruchkin, D. S., Villegas, J., & John, E. R. (1964). An analysis
of average evoked potentials making use of least mean
square techniques. Annals of the New York Academy of
Sciences, 115(2), 799-826.
Spencer, K. M., Abad, E. V., & Donchin, E. (2000). On the search
for the neurophysiological manifestation of recollective
experience. Psychophysiology, 37, 494-506.
Spencer, K. M., Dien, J., & Donchin, E. (1999). A componential
analysis of the ERP elicited by novel events using a dense
electrode array. Psychophysiology, 36, 409-414.
Spencer, K. M., Dien, J., & Donchin, E. (2001). Spatiotemporal
Analysis of the Late ERP Responses to Deviant Stimuli.
Psychophysiology, 38(2), 343-358.
Wang, K., Begleiter, H., & Porjesz, B. (2000). Trilinear
modeling of event-related potentials. Brain Topography,
12(4), 263-271.
Wood, C. C., & McCarthy, G. (1984). Principal component analysis
of event-related potentials: Simulation studies demonstrate
misallocation of variance across components.
Electroencephalography and Clinical Neurophysiology, 59,
249-260.
Zarahn, E., Aguirre, G. K., & D'Esposito, M. (1999). Temporal
isolation of the neural correlates of spatial mnemonic
processing with fMRI. Cognitive Brain Research, 7(3), 255-
268.