
Exploratory versus Confirmatory Analysis in Imaging Neuroscience

Radu MUTIHAC
University of Bucharest, Faculty of Physics
405 Atomistilor St., Sector 5, Romania
[email protected]

Allen BRAUN
National Institutes of Health, NIDCD, Language Section
9000 Rockville Pike, Bethesda, U.S.A.
[email protected]

Thomas J. BALKIN
WRAIR, Psychiatry and Neuroscience, Department of Behavioral Biology
503 Robert Grant Ave, Silver Spring, U.S.A.
[email protected]

Abstract: The present overview briefly highlights the advances and limitations in functional neuroimaging by critically comparing the confirmatory and exploratory methods of experimental fMRI data analysis. Typical multivariate methods like independent component analysis and fuzzy cluster analysis were applied to real-life functional neuroimaging data and comparatively discussed versus univariate inferential statistical techniques like the general linear model.

Key–Words: Functional magnetic resonance imaging (fMRI), statistical parametric mapping (SPM), independent component analysis (ICA), fuzzy cluster analysis (FCA), general linear model (GLM).

1 Introduction

Experimental data are intricate mixtures of interesting and uninteresting signal sources. Biomedical time series, particularly functional brain imaging data, are rich sources of information about physiological processes, but they are often contaminated with artifacts and noise, and typically recorded as mixtures of unknown combinations of sources summing differently in time and/or space. In many cases, even the nature of the sources is an open question. Unless accurate information is available to allow for faithful retrieval of the original signal sources, educated estimation of plausible solutions falls in the class of blind source separation (BSS) methods [1].

Analysis of fMRI data is a noninvasive method that allows the localization of dynamic brain processes in intact living brains. The index of neuronal activity (contrast) most widely used in neuroimaging is the blood oxygenation level dependent (BOLD) signal [2], which is based on the differing magnetic susceptibilities of oxygenated hemoglobin (diamagnetic) and deoxygenated hemoglobin (paramagnetic) relative to the surrounding tissue. The basic assumption is that an increase in neuronal activity within a brain region entails an increase in local blood flow, leading to reduced concentrations of deoxyhemoglobin in the blood vessels. Consequently, relative decreases in deoxyhemoglobin concentration entail a reduction in local field inhomogeneity and a slower decay of the MR signal, resulting in higher intensities in T2*-weighted images. However, the changes in blood flow and oxygenation (vascular and hemodynamic) are temporally delayed relative to the neural firing, a confounding factor known as the hemodynamic lag.

The interest in functional brain studies lies in the electrical activity of firing neurons, which cannot be entirely inferred by analyzing the vascular process because: (i) the hemodynamic lag varies in a complex way from tissue to tissue, and (ii) no theory of the relationship between the electrical and hemodynamic processes is available. Yet the vascular process provides valuable information about the electrical activity of firing neurons, which motivates the interest in fMRI data analysis and the need for methods that alleviate some of the shortcomings associated with the detection of the BOLD signal. The presence of artifacts (e.g., subject and scanner movements, RF coil heating, differences from scanner to scanner) and physiological sources of variability (cardiac, pulmonary, and other pulsations) renders detection of the activation-related signal changes difficult and sometimes questionable. An additional difficulty in delineating functional correlates from spatiotemporal fMRI data sets stems from the relatively small effect sizes of blood-flow related phenomena, which translate into a low image contrast-to-noise ratio (CNR) of the BOLD signal. The difficulty of extracting information from raw data is further increased by the fact that functional correlates of brain activity may relate to behavioral paradigms in complicated ways.

In any case, the interpretation of functional brain imaging data inevitably requires some assumptions on processing in the working brain that may not be entirely realistic and which preclude canonical methods of data analysis and experimental design.


Figure 1: Data/model-based univariate/multivariate data analysis. Some basic methods with their main highs (+) and lows (-).

Two general principles of cerebral function have been derived from investigating brain lesions and recording signals from smaller or larger clusters of neurons: (i) functional specialization of brain regions, which means that different brain regions perform different tasks [3], and (ii) functional integration [4], which states that cerebral functions are carried out by networks of interacting regions and that different functions correspond to different networks. As such, there are two main types of assumption underlying the interpretation of functional neuroimages, namely, the subtraction paradigm and the covariance paradigm [5]. The subtraction paradigms assume that different brain regions are engaged in different brain functions (i.e., they rely on functional specialization). The covariance paradigms assess the temporal covariance between different brain regions during a particular task. Significant covariance between regions associated with a particular brain function is termed functional connectivity. The extraction of functional correlates from raw data sets is facilitated by using the subtraction or covariance paradigms during the preprocessing step. Due to their complementarity, it is often necessary to employ both paradigm types in order to resolve all the functional components of a given cerebral process [6].
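As a minimal illustration of the covariance paradigm, the sketch below (NumPy, with hypothetical file names and array shapes) correlates the mean time course of a seed region with every voxel's time course to obtain a simple functional-connectivity map; it is a generic example, not the analysis pipeline used in this study.

```python
import numpy as np

# data: 4-D array (X, Y, Z, T) of preprocessed BOLD values (hypothetical file)
data = np.load("preprocessed_bold.npy")
seed_mask = np.load("seed_mask.npy") > 0       # boolean (X, Y, Z) seed region

T = data.shape[-1]
vox_ts = data.reshape(-1, T)                   # voxels x time
seed_ts = vox_ts[seed_mask.ravel()].mean(axis=0)

# Pearson correlation of every voxel time course with the seed time course
vox_c = vox_ts - vox_ts.mean(axis=1, keepdims=True)
seed_c = seed_ts - seed_ts.mean()
num = vox_c @ seed_c
den = np.sqrt((vox_c ** 2).sum(axis=1) * (seed_c ** 2).sum()) + 1e-12
fc_map = (num / den).reshape(data.shape[:3])   # functional-connectivity map
```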

2 Image Data Analysis

Imaging neuroscience aims to reveal functional changes, that is, changes in brain activity, and structural changes, which concern changes in neuroanatomy. Modifications in a structure such as grey or white matter, a gyrus, or a sulcus can be detected using three different methodologies: (i) segmented MR images of the structure, where each voxel records the presence or absence of the structure [7]; (ii) surface extraction using an automated method [8], [9], [10], [11]; and (iii) non-linear deformations required to move the structure to an atlas or standardized structure [12], [13]. Images produced by all the differing techniques presented above can still be analyzed similarly because the differences consist in details. Since the signal-to-noise ratio (SNR) is quite small, the signal is commonly enhanced by two methods. The first involves spatial smoothing of the images, with the amount of smoothing set to match the signal to be detected; trying all possible smoothing filter widths adds an extra scale-space dimension to the data [14], [15]. The second amounts to experiment repetition, either on different subjects (which requires careful image registration) or on different scans within the same subject. The result is a set of dependent variables, Y, one for each voxel of the images. These are related to a set of predictor variables, X, which are measured on each image. These include factors such as tasks (baseline, stimulus) or groups of subjects (cases, controls), leading to ANOVA-type models, or continuous variates, leading to ANCOVA-type models, with possible quadratic or polynomial effects in these variates [16]. Replacing the variate by the image values at a single voxel leads to an analysis of the covarying voxels. In all cases, the computations may be tuned for optimal results with different types of data. Since temporal correlations exist in fMRI time series, some modifications to the least-squares theory for linear models are necessary [17]. In addition, all three image types of structural data have non-stationary smoothness and non-stationary voxel standard deviation.
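For the first signal-enhancement step mentioned above, the following sketch shows one conventional way of applying Gaussian spatial smoothing with a kernel specified by its FWHM (SciPy); the function name, FWHM value, and voxel size are illustrative assumptions rather than settings prescribed by the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_volume(vol, fwhm_mm, voxel_size_mm):
    """Gaussian spatial smoothing of a 3-D volume; fwhm_mm is chosen to
    roughly match the spatial extent of the signal to be detected."""
    sigma_mm = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> sigma
    sigma_vox = sigma_mm / np.asarray(voxel_size_mm, dtype=float)
    return gaussian_filter(vol, sigma=sigma_vox)

# usage (hypothetical volume): vol is an (X, Y, Z) array, 3 mm isotropic voxels
# smoothed = smooth_volume(vol, fwhm_mm=8.0, voxel_size_mm=(3.0, 3.0, 3.0))
```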

Two categories of data analysis are employed in functional neuroimaging to reveal statistical regularities in data that can be associated with brain function: (i) hypothesis-driven (inferential or model-based), and (ii) data-driven (exploratory or model-free) analysis (Fig. 1).

2.1 Hypothesis-Driven Models

Most of imaging neuroscience relies on confirmatory data analysis (CDA), such as inferential hypothesis-led analysis, which makes use of spatially extended processes like statistical parametric mapping (SPM) [18]. Testing a model reduces operationally to specifying the expected changes on the basis of hypotheses drawn independently from the data under study. Changes are specified as regressors of interest in a multiple linear regression framework (the general linear model), and the estimated regression coefficients are tested against a null hypothesis.


The voxel-wise test statistics form summary images known as statistical parametric maps, which are commonly assessed for statistical significance or tested for the size or mass of suprathreshold clusters [19]. The resulting map of statistics is a representation of the spatial distribution of functional activity elicited by the task (Fig. 2).
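A stripped-down sketch of the voxel-wise GLM/SPM idea described above: a boxcar regressor convolved with a canonical double-gamma HRF, least-squares estimation of the regression coefficients, and a t-statistic per voxel. Function names and parameters are hypothetical, and the temporal-autocorrelation correction of [17] is deliberately omitted.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr, duration=32.0):
    """Canonical double-gamma HRF sampled at the repetition time TR (assumed shape)."""
    t = np.arange(0, duration, tr)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
    return hrf / hrf.sum()

def glm_tmap(Y, boxcar, tr):
    """Y: (T, V) voxel time courses; boxcar: (T,) 0/1 task indicator."""
    T = Y.shape[0]
    reg = np.convolve(boxcar, double_gamma_hrf(tr))[:T]       # task regressor
    X = np.column_stack([reg, np.ones(T)])                    # design matrix
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)          # regression coefficients
    resid = Y - X @ beta
    dof = T - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    c = np.array([1.0, 0.0])                                   # contrast: task effect
    var_c = c @ np.linalg.inv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(sigma2 * var_c + 1e-12)        # voxel-wise t values
```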

Problems with hypothesis-driven models: (i) possible presence of unmodeled or missed signals in the data, particularly artifactual activity; (ii) structured noise that is temporally non-orthogonal to an assumed regression model will bias the parameter estimates, and noise orthogonal to the design will inflate the residual errors, thus reducing the statistical significance and rendering the analysis suboptimal; (iii) an increasing number of models include spatial prior information, whereas Gaussian random field theory-based inference deals with spatial properties only after modeling has completed; (iv) this approach is essentially confirmatory in nature and based on strong priors about the spatiotemporal characteristics of the signals, so that the inferred patterns of activation depend heavily on the accuracy of these assumptions; (v) the standard GLM is essentially univariate, i.e., the model is fit separately to each voxel's time course (though univariate approaches may be formulated independently of a temporal model [20]). Inescapable limitations of univariate hypothesis-driven methods stem from their exclusive reliance on the temporal predictability of the expected hemodynamic response, ignoring the information carried by the covariance of the acquired voxel time series.

2.2 Data-Driven Models

Spatiotemporal characteristics of brain activity are frequently unknown and variable, which precludes their evaluation by confirmatory methods only. By revealing unanticipated or missed patterns of activation, exploratory data analysis (EDA) makes it possible to improve or even to change the original hypotheses. In contrast to CDA, EDA makes no reference to prior knowledge of the structure in the data and provides models whose characteristics are determined by the statistical properties of the data only. In data-driven analysis, no statistical model is needed to specify what inferences to perform. Multivariate data analysis relies on the covariance paradigm and is free of prior assumptions on activation functions. EDA is capable of detecting functional activity without reference to the experimental protocol and can also reveal new components in the data. Generally, methods of unsupervised learning fall in the class of data-driven analysis, such as eigenimage analysis, self-organizing maps (SOM) [21], temporal cluster analysis (CA) and fuzzy cluster analysis (FCA) [22], factor analysis (FA) [23], projection pursuit (PP) [24], principal component analysis (PCA) [25], and independent component analysis (ICA) [26].

Tukey [27] argued that classical statistics, leaning on the analysis of small, homogeneous, stationary data by means of known distributional models and assumptions, would prove inappropriate to face the issues raised by the analysis of large and complex data sets. Two basic features of fMRI data, which are characteristic of massive data sets, are nonstationarity and distributional heterogeneity. It is claimed that the difference between real-life large data sets and smaller ones consists not only in size but in qualitative terms as well [28]. Consequently, investigations of functional brain imaging data should primarily rely on critical consideration of methods that belong to data mining and exploratory data analysis. A critical evaluation and comparison of the data-driven methods used in fMRI data analysis has not been published to date. Besides, as briefly stated by Huber [28], "there are no panaceas in data analysis" either. In other words, all methods have highs and lows, so that an educated choice appears to be problem-domain dependent. SOM is a clustering method which maps high-dimensional data onto a 2-D or 3-D grid while trying to preserve the original relative distances. The constraints imposed by PCA and FA, which segregate data by partitioning their total variance into uncorrelated components, appear unrealistic in fMRI since they may lead to ambiguous separation of time courses corresponding to activation, noise, and artifacts. This is due to the relatively small amplitude of the task-related components and the non-specificity of variance partitioning. Variance partitioning in ICA is based on mutual information (MI), though constraints of spatial and/or temporal statistical independence, as well as non-Gaussianity, are imposed, which may only partly hold in real data. Most applications of ICA include PCA as a preprocessing step for whitening the data, dimension reduction, and/or filtering out some noise, though by removing the many smallest principal components one runs the risk of removing small details of interest. Individual PCA components are necessarily both spatially and temporally uncorrelated, making them unlikely to represent functionally distinct brain systems. Rotation methods such as Varimax and Promax [29] might be used to relax the orthogonality constraint, but their utility for fMRI data analysis has not yet been explored, and the relevance of their underlying assumptions to fMRI data may also be questioned [30]. Additional possible limitations of ICA refer to linearity and the global characterization of data (i.e., even if data sets are statistically heterogeneous, ICA attempts to describe them using the same global features as if the data were spatially homogeneous) [31]. Theoretically, nonlinear ICA might circumvent the distributional heterogeneity [32], though its implementation becomes rather computationally intensive.


Figure 2: The model of fMRI data processing in the linear regression framework.

A fast deflation-type fixed-point ICA algorithm introduced by Hyvarinen and Oja [33] considerably relaxed the computational demand of ICA. In contrast, clustering, and particularly FCA, is local in the sense that the cluster centroids do not consist of linear combinations of the time courses of activation, and hence it does not get confounded by global heterogeneity. Moreover, its algorithmic implementation can be made fast, which is important in processing large and complex data sets. FCA and ICA are complementary in the sense that spatial ICA could be used subsequently to temporal FCA if it appears that the centroids are linear mixtures of well-defined temporal shapes. This view, advanced by Somorjai and Jarmasz [31], was supported by Karhunen and Malaroiu [34], who proposed a preprocessing step by k-means clustering of the data. The idea behind k-means clustering is to classify individual voxels in the volume with respect to their time courses of activation. A k-means algorithm needs k cluster centroids to be chosen of the same dimensionality as the time series; each voxel is then assigned to the cluster centroid with the best match. Subsequently, local ICA performs demixing of the k clusters (or their centroids). In conditions where the time courses of activation change significantly or quite abruptly in amplitude during experiments, wavelet analysis can detect activations more accurately than the most commonly used data analysis methods invoked so far [35].
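A hedged sketch of the clustering-then-local-ICA preprocessing idea attributed to Karhunen and Malaroiu [34]: k-means groups voxels by the similarity of their time courses, then ICA is run separately within each cluster (scikit-learn). The file name, number of clusters, and number of components are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA

# Y: (V, T) matrix of voxel time courses (hypothetical, already preprocessed)
Y = np.load("voxel_timecourses.npy")

# 1. classify individual voxels by their time courses of activation
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y)

# 2. run ICA locally, cluster by cluster, on the member time courses
local_sources = []
for label in range(k):
    members = Y[km.labels_ == label]               # (V_k, T)
    n_comp = min(5, members.shape[0])
    ica = FastICA(n_components=n_comp, random_state=0)
    S = ica.fit_transform(members.T)               # (T, n_comp) temporal sources
    local_sources.append(S)
```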

2.2.1 PCA and ICA Models

ICA is a data-driven multivariate exploratory approach based on the covariance paradigm and formulated as a generative linear latent variables model. In contrast to inferential approaches, ICA reveals task-related, transiently task-related, and function-related activity without reference to any experimental protocol, including unanticipated or missed activations. The basic linear ICA model assumptions are the following: (i) the source signals are not observable but are assumed to be statistically independent, with at most one Gaussian source, and (ii) the mixing process is unknown but assumed to be stationary and linear. Though the ICA model enforces a stringent statistical-independence requirement on the estimated spatial maps, their associated time courses (TCs) of activation turn out to be separable but not necessarily independent.

Mathematically, the linear stationary PCA and ICA models can be defined on the basis of a common data model. Suppose that some stochastic processes are represented by three random (column) vectors x(t), n(t) ∈ R^N and s(t) ∈ R^M with zero mean and finite covariance, and with the components of s(t) being statistically independent and at most one Gaussian. Let A be a rectangular constant full column rank N × M matrix with at least as many rows as columns (N ≥ M), and denote by t the sample index (i.e., time or point) taking discrete values t = 1, 2, ..., T. We postulate the existence of a linear relationship among these variables as follows:

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t) = \sum_{i=1}^{M} s_i(t)\,\mathbf{a}_i + \mathbf{n}(t) \qquad (1)$$

Here s(t), x(t), n(t), and A denote the latent sources, the observed data, the (unknown) noise in the data, and the (unknown) mixing matrix, respectively, whereas a_i, i = 1, 2, ..., M, are the columns of A. Mixing is supposed to be instantaneous, so there is no time delay between the source variables {s_i(t)} mixing into the observable (data) variables {x_j(t)}, with i = 1, 2, ..., M and j = 1, 2, ..., N.

Consider that the stochastic process vector x(t) has mean E{x(t)} = 0 and covariance matrix C_x = E{x(t)x(t)^T}, where T stands for the transpose. The goal of PCA is to identify the dependence structure in each dimension and to come out with an orthogonal transform matrix W of size L × N from R^N to R^L, where L ≤ N, such that the L-dimensional output vector y(t) = W x(t) sufficiently represents the intrinsic features of the input data, and such that the covariance matrix C_y of y(t) is a diagonal matrix D with its diagonal elements arranged in descending order, d_{i-1,i-1} ≥ d_{i,i}, i = 2, 3, ..., L. The reconstruction of x(t) from y(t), denoted by x̂(t), is consequently given by x̂(t) = W^T W x(t). For L fixed, PCA aims to find an optimal value of W, say Ŵ, that minimizes the reconstruction error


Figure 3: PCA model.

J = E{‖x(t) − x̂(t)‖}. The rows of the transform matrix W are the principal components (PCs) of the stochastic process x(t), which constitute the eigenvectors {c_j}, j = 1, 2, ..., L, of the input covariance matrix C_x. The subspace spanned by the principal eigenvectors {c_1, c_2, ..., c_L} with L < N is the PCA subspace of dimensionality L (Fig. 3).
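The PCA subspace projection just described can be written in a few lines of NumPy: eigendecomposition of the sample covariance, a projection matrix W built from the L leading eigenvectors, and the reconstruction x̂ = W^T W x with its mean error J. This is a generic sketch with hypothetical variable names, not code from the study.

```python
import numpy as np

def pca_subspace(X, L):
    """X: (N, T) data matrix (rows = channels/voxels, columns = samples t).
    Returns the (L, N) projection matrix W and the mean reconstruction error J."""
    Xc = X - X.mean(axis=1, keepdims=True)          # zero-mean data
    Cx = Xc @ Xc.T / Xc.shape[1]                    # sample covariance matrix
    d, E = np.linalg.eigh(Cx)                       # eigenvalues in ascending order
    idx = np.argsort(d)[::-1][:L]                   # L leading eigenvectors
    W = E[:, idx].T                                 # rows = principal directions
    Y = W @ Xc                                      # (L, T) component scores
    Xhat = W.T @ Y                                  # reconstruction W^T W x
    J = np.mean(np.linalg.norm(Xc - Xhat, axis=0))  # mean reconstruction error
    return W, J
```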

The ICA problem can be formulated as: given T realizations of x(t), estimate both the matrix A and the corresponding realizations of s(t) (Fig. 4). In BSS the task is somewhat relaxed to finding the waveforms {s_i(t)} of the sources knowing only the (observed) mixtures {x_j(t)}. There are several limitations to solving this problem. The size of s(t) (usually unknown) should not be greater than the size of the data x(t), otherwise the problem becomes under-determined. If the size of x(t) is greater than the size of s(t) (i.e., there are more observations than sources), the problem is over-determined and the extra data can be used for reducing noise. If no suppositions are made about the noise, then the additive noise term is omitted in (1). Modeling noise is a complex problem in itself and it is usually difficult to separate it from the genuine source signals. A practical strategy is to include it in the signals as supplementary term(s):

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) = \sum_{i=1}^{M} s_i(t)\,\mathbf{a}_i \qquad (2)$$

Figure 4: Stationary noiseless linear ICA model.

Source separation in ICA amounts to updating an M × N unmixing matrix B(t) without resorting to any information about the spatial mixing matrix A, so that the output vector y(t) = B(t)x(t) becomes an estimate y(t) = ŝ(t) of the original independent source signals. Since ICA deals with higher-order statistics, it is justified to normalize in some sense the first- and second-order moments. The effect is that the separating matrix B(t) is divided into two parts: one dealing with dependencies in the first two moments, i.e., the whitening (sphering) matrix V(t), and one dealing with dependencies in higher-order statistics, i.e., the orthogonal separating matrix W(t) in the whitened space (Fig. 4). Whitening the zero-mean observed data x(t) is often carried out by PCA, which optimally allows information compression in the mean-square error sense, and some noise may be filtered out. The PCA whitening matrix can be expressed in the form:

$$\mathbf{V} = \mathbf{D}^{-1/2}\,\mathbf{E}^T \qquad (3)$$

where E D E^T = E{x(t)x(t)^T} is the eigenvector decomposition of the covariance matrix C_x, so that D = diag[d_1, d_2, ..., d_M] is an M × M diagonal matrix containing the eigenvalues and E = [c_1, c_2, ..., c_M] is an orthogonal N × M matrix having the eigenvectors as columns. By whitening, we get a vector v(t) = V(t)x(t) with decorrelated components. The subsequent linear transform W(t) seeks the solution by an adequate rotation in the space of component probability densities and yields y(t) = W(t)v(t). The appropriate orthogonal transform W can be sought by invoking: (i) heuristic conditions for independence, (ii) optimizing some information-theoretic criterion, or (iii) optimizing some suitable contrast function, in such a way as to yield independent outputs. The full separation matrix between the input and the output becomes B(t) = W(t)V(t). In the stationary case, the whitening and the orthogonal separating matrices converge to some constant values after a finite number of iterations during learning, so that B(t) → WV. Thus the estimated independent components (ICs) of the stationary noiseless linear ICA model at any sample index t are given by:

$$\hat{\mathbf{s}}(t) = \mathbf{y}(t) = \mathbf{B}\,\mathbf{x}(t) \qquad (4)$$

Since the sources are assumed mutually independent, the ordering of the estimated ICs is irrelevant. Once the unmixing matrix B has been found, the best approximation of the mixing matrix A is given by Â = B^T(BB^T)^{-1}, or simply Â = B^{-1} if the number of signal sources equals the number of recording channels (square ICA: N = M). The columns of Â are the ICA basis vectors, whereas the rows of B provide the filters (weight vectors) in the original, not whitened, space. The basis vectors {a_1, a_2, ..., a_M} in ICA are the counterparts of the principal eigenvectors {c_1, c_2, ..., c_L} in PCA.
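The following sketch ties Eqs. (3)-(4) together: explicit PCA whitening V = D^{-1/2} E^T, an orthogonal rotation estimated here with scikit-learn's FastICA as one possible choice of contrast and algorithm, the full unmixing matrix B = WV, and the mixing-matrix estimate as the pseudo-inverse of B. Shapes and variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_unmix(X, M):
    """X: (N, T) zero-mean observations; M: number of sources to estimate."""
    Cx = X @ X.T / X.shape[1]                        # sample covariance
    d, E = np.linalg.eigh(Cx)
    idx = np.argsort(d)[::-1][:M]
    V = np.diag(d[idx] ** -0.5) @ E[:, idx].T        # whitening matrix, Eq. (3)
    Z = V @ X                                        # whitened data
    ica = FastICA(n_components=M, whiten=False)      # rotation only; data already white
    S = ica.fit_transform(Z.T).T                     # (M, T) estimated sources
    W = ica.components_                              # orthogonal separating matrix
    B = W @ V                                        # full unmixing matrix, Eq. (4)
    A_hat = np.linalg.pinv(B)                        # estimated mixing matrix
    return S, B, A_hat
```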


Figure 5: ICA stationary linear model for fMRI.

Scale (amplitude/energy) and polarity (sign) information is distributed in the ICA decomposition between the ICA basis vectors and the estimated component activations (the rows of the activation matrix y). Consequently, the absolute amplitude and polarity of component activations have no intrinsic meaning and no unit of measure.

The flexibility of the ICA approach to BSS, which incorporates higher-order statistical information, resides in transforming the ill-posed problem associated with decorrelated (PCA) decompositions into a well-posed problem of independent decompositions; that is, ICA avoids the non-uniqueness associated with PCA. The ICA decomposition is unique up to IC amplitude (scale), IC polarity (sign), and IC ranking (order).

Basic ICA does not include a noise model; instead, the (latent) sources are assumed to be completely characterized, on the basis of their independence and non-Gaussianity, by the (observed) data and the estimation of the mixing matrix. This has two consequences: (i) the statistical significance of the source estimates (ICs) cannot be assessed within the framework of null-hypothesis testing, and (ii) threshold techniques like converting the component map values to z-scores are devoid of statistical meaning and can only be conceived as ad-hoc recipes. If we relax the constraint that the ICA model be square, then a mismatch between the best linear model fit and the original data is inevitably introduced. Underestimation of the dimensionality discards valuable information and results in suboptimal signal extraction. Overfitting a noise-free generative model to noisy observations results in a large number of spurious components due to unconstrained estimation and factorization, adversely affecting the subsequent inference. That is, slight differences in the measured hemodynamic response at two different voxel locations are treated as real effects because a noise model is missing.

Figure 6: Spatial ICA of auditory fMRI data [44].

These differences may represent either (i) valid spatial variations or (ii) differences in the background noise level. Accordingly, clusters of voxels activated by the same external stimulus may be split across different spatial maps, which complicates their neurophysiological interpretation and increases the computational demand.

2.2.2 Cluster Analysis

Temporal FCA is based on an initial partition of the data set according to a test statistic in the power-spectrum domain, which separates out trend-like time series and time series below the noise level. FCA of the above-noise-level time series then extracts common cluster behavior patterns (centroids). Each time series is modeled as a linear combination of the closest centroid plus residuals (noise). The noise pool provides appropriate thresholds when testing the statistical significance of the model parameters without assuming distributional properties of the underlying noise source [36].
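A minimal NumPy implementation of the fuzzy c-means step underlying temporal FCA (alternating membership and centroid updates with a fuzziness exponent m), applied to above-noise-level voxel time courses; this is a generic sketch with hypothetical parameters, not the EROICA implementation of [36].

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: (V, T) voxel time courses; c: number of clusters; m: fuzziness (> 1).
    Returns (c, T) centroids and a (V, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per voxel
    for _ in range(n_iter):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared distance of every voxel time course to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U_new = 1.0 / (d2 ** (1.0 / (m - 1.0)))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centroids, U
```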

The experimental design consisted of successive blocks alternating between rest and auditory stimulation by bi-syllabic words presented binaurally at a rate of 60 per minute, starting with rest. Unmixing of the fMRI data into independent components, consisting of spatial maps (sources of brain activity) and their associated time courses (TCs) of activation, was performed by non-square spatial ICA (Fig. 6). Alternatively, voxels with similarly structured TCs were grouped by temporal FCA into statistically significant homogeneous clusters (Fig. 7). Each centroid and all cluster-member TCs satisfied two significance tests, one in the time domain and one in the frequency domain.


Figure 7: Temporal FCA of auditory fMRI data [44].

Figure 8: Inferential statistical analysis of auditory fMRI data [44].

Figure 9: The fractal self-similarity of the Daubechies-4 mother wavelet generated by Wavelab850.

The results were compared with the output produced by inferential statistics (SPM) (Fig. 8).

Though cluster analysis operates on statistically selected active time courses of activation while spatial ICA separates the independent components on the basis of their spatial independence, the results showed no statistically significant difference in the activation areas detected by FCA and ICA, as long as the CNR of the fMRI time series exceeds a certain value.

2.2.3 Fractal Analysis

Brain function depends on the adaptive self-organization of large-scale neural assemblies. The brain spontaneously generates neural oscillations with large variability in frequency, amplitude, duration, and recurrence. The brain lends itself to multiscale analysis since the cerebral cortex is statistically self-similar, a feature typically attributed to fractals [37]. Fractals are complex, patterned, statistically self-similar or self-affine, scaling or scale-invariant structures with non-integer dimensions, generated by simple iterative rules and widespread in natural and synthetic systems. Fractal analysis largely matches the fractal properties of the human cerebral cortex in space and time down to a coarse spatial scale of 2.5 mm, which roughly equals the cortical thickness. Fractals are particularly useful in computer modeling of natural and artificial irregular patterns and structures.
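As one concrete way to quantify the statistical self-similarity mentioned above, the sketch below estimates a box-counting (fractal) dimension for a 2-D binary mask, e.g. a cortical boundary extracted from a single slice; the input name and box sizes are illustrative assumptions, not the procedure of [43].

```python
import numpy as np

def box_count_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Estimate the box-counting (fractal) dimension of a 2-D binary mask."""
    counts = []
    for s in sizes:
        # trim so the mask tiles exactly into s x s boxes
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        tiles = mask[:h, :w].reshape(h // s, s, w // s, s)
        occupied = tiles.any(axis=(1, 3)).sum()     # boxes containing structure
        counts.append(occupied)
    # dimension = negative slope of log(count) versus log(box size)
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

# usage (hypothetical input): dim = box_count_dimension(cortex_boundary_slice > 0)
```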

2.2.4 Wavelet-Based Statistical Analysis

Wavelets are fractals (Fig. 9) and are therefore an appropriate choice of basis for fractal data analysis. Wavelet families provide a rich space in which to search for optimal functional representations.


Figure 10: Preprocessed fMRI data (left) have excessive power at low frequency, as displayed by the power spectral density plot (right). The decay of the MR spectrum at low frequency follows the 1/f dependence over 2 octaves (10 Hz to 40 Hz) with a 12 dB/octave roll-off. A negative slope close to 2 indicates scale-invariant properties [43].

Wavelet methods are particularly adequate for brain imaging data analysis due to the broadly fractal properties exhibited by the cortex. As such, wavelets may constitute natural bases in neuroscience, revealing activity patterns by multiscale analysis. Wavelets provide an orthonormal basis for multiresolution analysis and decorrelation of nonstationary, scaling, scale-invariant, and fractal processes in time, space, or both, which is the case in neuroimaging. Single-scale Gaussian smoothing followed by hypothesis testing of regression coefficients in the spatial domain showed that the results of activation mapping are conditional on the choice of the kernel. A kernel much larger than the spatial extent of brain sources precludes evidence for significant local activation. Scale-varying wavelet-based methods for hypothesis testing of brain activation maps circumvent the need to specify a priori the size of the signals expected and, therefore, the optimal choice of smoothing kernel required by spatial Gaussian filtering.

The decay of the MR spectrum at low frequency, following the 1/f dependence over 2 octaves (10 Hz to 40 Hz) with a 12 dB/octave roll-off, is shown in Fig. 10. Wavelet-based statistical analysis optimally decorrelates fMRI data and performs a Karhunen-Loeve expansion for long-memory 1/f-like processes (Fig. 11 and Fig. 12).

Wavelet-based statistical inference, though producing more false negatives, entails fewer total errors than Gaussian filtering. Wavelet methods preserve the original shapes and sharpness of the active regions, along with optimal detection of transient events, by adapting to local and/or nonstationary conditions within the decomposition scales. Wavelet-based denoising methods, by introducing less smoothing, preserve the sharpness of images and retain the original shapes of the distributed brain activation.
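A hedged sketch of wavelet-shrinkage denoising of a single time course with PyWavelets: soft thresholding of the detail coefficients using a universal threshold derived from the finest scale. The wavelet choice, decomposition level, and threshold rule are illustrative assumptions, not the exact settings behind Figs. 11-12.

```python
import numpy as np
import pywt

def wavelet_denoise(ts, wavelet="db4", level=4):
    """Soft-threshold wavelet shrinkage of a 1-D time course."""
    coeffs = pywt.wavedec(ts, wavelet, level=level)
    # noise scale estimated from the finest detail coefficients (MAD estimator)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(ts)))            # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(ts)]
```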

Figure 11: Spatial activation maps resulting from soft thresholding that minimizes the estimate of the mean squared error in the wavelet domain, generated by Wavelab850 [42], [44].

Figure 12: Spatial activation maps generated by WSPM (http://www.big.epfl.ch/demo/wspm/), which performs wavelet analysis in the wavelet domain and multiple hypothesis testing in the spatial domain [44].


Figure 13: Activation parametric maps generated from: minimally preprocessed raw fMRI data (left); SPM analysis with Gaussian smoothing (middle); wavelet-based statistical analysis (right). In all cases, multiple hypothesis testing was controlled by FDR thresholding (q = 0.05).

2.2.5 SPM and Wavelet-Based SPM

Analysis of fMRI data from a block-type visual stimulation paradigm was performed comparatively by SPM, which assumes parametric statistical models at each voxel using the GLM in combination with a temporal convolution model, and by wavelet-shrinkage nonparametric regression, which is particularly suited for underlying phenomena with irregular features such as spikes or sharp discontinuities. The link is supported by the low-pass analysis filter of the discrete wavelet transform (DWT), which can be shaped similarly to a Gaussian filter as in SPM, and by the subsampling scheme, which allows defining the number of coefficients in the low-pass subband of the wavelet decomposition [38]. False discovery rate (FDR) control was used to correct for multiple hypothesis testing (Fig. 13).
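The FDR control mentioned above can be realized with the standard Benjamini-Hochberg step-up rule applied to the voxel-wise p-values; the sketch below (hypothetical variable names, q = 0.05 as in Fig. 13) returns the p-value threshold.

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg threshold: largest p(k) with p(k) <= q * k / V."""
    p = np.sort(np.ravel(p_values))
    V = p.size
    below = p <= q * np.arange(1, V + 1) / V
    return p[below][-1] if below.any() else 0.0

# usage (hypothetical p-value map): active = p_map <= fdr_threshold(p_map, q=0.05)
```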

2.3 Assessing Reliability of the Data Model Subspace

Reliability estimation is used to select an appropriate model for the data subspace, to boost its separation power, and to detect the estimated components that are most likely to have a neurophysiological meaning. Self-consistency of the ICA decomposition of imaging data is proven by detecting similar patterns of activity when processing the same data with different types of ICA algorithms optimizing distinct objective functions (Fig. 14). Applying the ICA model entails two main processes: (i) an estimation procedure, which finds a decomposition matrix such that the statistical dependency between the estimated sources is minimized, and (ii) the use of a contrast (objective) function and an appropriate optimization algorithm.

Figure 14: Different ICA algorithms running the same task: best time-correlated detected maps (top) and corresponding time courses of activation (bottom).

The contrasts and algorithms under study comprise: (i) maximum entropy with gradient descent implemented in the stochastic neuromorphic algorithm InfoMax [39], (ii) kurtosis or higher-order cumulants with the gradient-descent deterministic algorithms JADE [40] and SIMBEC [41], and (iii) negentropy with batch-type fixed-point iteration (FastICA) [33]. All ICA algorithms performed brain source separation comparably and found task-related components in both the left and right hemifields.

The BOLD response in fMRI time series biases the estimation of the temporal autocorrelation depending on the complexity of the experimental paradigm, which entails biased thresholds. Resampling based on a whitening transform proved the most robust in the presence of BOLD signal in a block-type experimental design. Resampling raw data in the spectral domain produces degenerate thresholds due to the truncated Fourier model used, whereas wavelet-based resampling resulted in a rather conservative threshold because fairly few wavelet coefficients capture the low-frequency paradigm, enforcing poor randomization.
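A hedged sketch of one wavelet-domain resampling scheme of the kind referred to above: detail coefficients are permuted within each scale and the time course is reconstructed, preserving the scale-wise (1/f-like) spectrum while destroying the paradigm-locked temporal ordering. PyWavelets; all names and settings are illustrative assumptions, not the exact resampling procedure evaluated here.

```python
import numpy as np
import pywt

def wavelet_resample(ts, wavelet="db4", level=4, seed=None):
    """One wavelet-domain surrogate of a 1-D time course: permute detail
    coefficients within each scale, then reconstruct."""
    rng = np.random.default_rng(seed)
    coeffs = pywt.wavedec(ts, wavelet, level=level)
    surrogate = [coeffs[0]] + [rng.permutation(c) for c in coeffs[1:]]
    return pywt.waverec(surrogate, wavelet)[:len(ts)]

# usage (hypothetical): build a null distribution of a test statistic from many
# surrogates of a voxel time course, e.g.
# null = [my_statistic(wavelet_resample(ts, seed=i)) for i in range(1000)]
```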

3 Conclusion

EDA encompasses methods that help one cope informally with data and reveal structure in the data relatively straightforwardly.


The stress is on flexible probing of the data, often before comparing them to any probabilistic model. Such methods are "best" compromises in several situations and, quite often, are close to "best" for each single situation. Artifactual behavior that EDA may easily discover could raise questions about the appropriateness of the data, whether additional preprocessing steps are required, or whether the preprocessing employed has introduced spurious effects. CDA methods are mandatory for controlling type I (false positive) and type II (false negative) errors, but they give statistically meaningful results only if both the chosen model and the distributional assumptions are correct. EDA emphasizes flexible searching for evidence in data, whereas CDA focuses on evaluating the available evidence. In imaging neuroscience, the dynamic interplay between hypothesis generation and hypothesis testing, a Hegelian synthesis of EDA and CDA, has the best chance of dealing successfully with increasingly complex experiments and the emerging broad range of theoretical and clinical studies [45].

Acknowledgements: R. Mutihac was fully supported by the National Academies of Science / National Research Council, Research Associate Program Award #W81XWH-07-2-0001-0114, which is gratefully acknowledged.

References:

[1] J.-F. Cardoso, Blind signal separation: Statistical principles, Proc. IEEE 86, 10, 1998, pp. 2009–2025.

[2] S. Ogawa, T. M. Lee, A. S. Nayak, and P. Glynn, Oxygenation-sensitive contrast in magnetic resonance image of rodent brain at high magnetic fields, Magnetic Resonance in Medicine 14, 1990, pp. 68–78.

[3] S. Zeki, Functional specialization in the visual cortex: the generation of separate constructs and their multistage integration, in Signal and Sense (G. M. Edelman, W. E. Gall, and W. M. Cowan, Eds.), pp. 85–130, John Wiley, New York, 1990.

[4] G. L. Gerstein, P. Bedenbaugh, and A. M. H. J. Aertsen, Neuronal assemblies, IEEE Trans. on Biomedical Engineering 36, 1989, pp. 4–14.

[5] B. Horwitz, K. J. Friston, and J. G. Taylor, Neural modeling and functional brain imaging: An overview, Neural Networks 13, 8–9, 2000, pp. 829–846.

[6] F. T. Sommer, J. A. Hirsch, and A. Wichert, Theories, data analysis, and simulation models in neuroimaging – An overview, in Exploratory Analysis and Data Modeling in Functional Neuroimaging (F. T. Sommer and A. Wichert, Eds.), pp. 1–13, Neural Information Processing Series, The MIT Press, Cambridge, 2003.

[7] I. C. Wright, P. C. McGuire, J.-B. Poline, J. M. Travere, R. M. Murray, C. D. Frith, R. S. J. Frackowiak, and K. J. Friston, A voxel-based method for the statistical analysis of gray and white matter density applied to schizophrenia, NeuroImage 2, 1995, pp. 244–252.

[8] D. MacDonald, K. J. Worsley, D. Avis, and A. C. Evans, Surface segmentation and matching by 3D deformation, NeuroImage 3, 1996, S253.

[9] K. J. Worsley, D. MacDonald, J. Cao, Kh. Shafie, and A. C. Evans, Statistical analysis of cortical surfaces, NeuroImage 3, 1996, S108.

[10] P. M. Thompson, C. Schwartz, and A. W. Toga, High-resolution random mesh algorithms for creating a probabilistic 3D surface atlas of the human brain, NeuroImage 3, 1996, pp. 19–34.

[11] K. Zilles, P. Falkai, T. Schormann, H. Steinmetz, and N. Palomero-Gallagher, Cortical surface in schizophrenic patients and controls: MRI, 3-D reconstruction and in vivo morphometry, NeuroImage 3, 1996, S525.

[12] D. L. Collins, C. J. Holmes, T. M. Peters, and A. C. Evans, Automatic 3-D model-based neuroanatomical segmentation, Human Brain Mapping 3, 1995, pp. 190–208.

[13] U. Kjems, C. T. Chen, S. C. Strother, L. K. Hansen, J. R. Anderson, I. Law, O. B. Paulson, I. Kanno, and D. A. Rottenberg, Revealing structural effects in functional imaging with anatomical warps, NeuroImage 3, 1996, S137.

[14] J. B. Poline and B. M. Mazoyer, Enhanced detection in brain activation maps using a multifiltering approach, Journal of Cerebral Blood Flow and Metabolism 14, 1994, pp. 639–642.

[15] K. J. Worsley, S. Marrett, P. Neelin, and A. C. Evans, Searching scale space for activation in PET images, Human Brain Mapping 4, 1996, pp. 74–90.

[16] C. Buchel, R. J. S. Wise, C. J. Mummery, J. B. Poline, and K. J. Friston, Non-linear regression in parametric activation studies, NeuroImage 4, 1996, pp. 60–66.

[17] K. J. Worsley and K. J. Friston, Analysis of fMRI time-series revisited – again, NeuroImage 2, 1995, pp. 173–181.

[18] K. J. Friston, A. P. Holmes, K. J. Worsley, J.-B. Poline, C. D. Frith, and R. S. J. Frackowiak, Statistical parametric maps in functional imaging: A general linear approach, Human Brain Mapping 2, 1995, pp. 189–210.


[19] J.-B. Poline, K. J. Worsley, A. C. Evans, and K. J. Friston, Combining spatial extent and peak intensity to test for activations in functional imaging, NeuroImage 5, 1997, pp. 83–96.

[20] F. Esposito, E. Seifritz, E. Formisano, R. Morrone, T. Scarabino, G. Tedeschi, S. Cirillo, R. Goebel, and F. Di Salle, Real-time independent component analysis of fMRI time-series, NeuroImage 20, 2003, pp. 2209–2224.

[21] T. Kohonen, Self-Organizing Maps, Springer, New York, 1995.

[22] C. Goutte, P. Toft, E. Rostrup, F. Nielsen, and L. K. Hansen, On clustering fMRI time series, NeuroImage 9, 3, 1999, pp. 298–310.

[23] L. L. Thurstone, Multiple factor analysis, Psychol. Rev. 38, 1931, pp. 406–427.

[24] J. H. Friedman, Exploratory projection pursuit, J. of the American Statistical Association 82, 397, 1987, pp. 249–266.

[25] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24, 1933, pp. 417–441 and pp. 498–520.

[26] P. Comon, Independent component analysis, a new concept?, Signal Processing 36, 3, 1994, pp. 287–314.

[27] J. W. Tukey, The future of data analysis, Annals of Mathematical Statistics 33, 1962, pp. 1–67.

[28] P. J. Huber, Huge data sets, in Proceedings, Compstat 1994 (R. Dutter and W. Grossman, Eds.), pp. 3–13, Physica-Verlag, Heidelberg, 1994.

[29] A. E. Hendrickson and P. O. White, Promax: A quick method for rotation to oblique simple structure, The British Journal of Statistical Psychology 17, Part 1, 1964, pp. 65–70.

[30] J.-R. Duann, T.-P. Jung, W.-J. Kuo, T.-C. Yeh, S. Makeig, J.-C. Hsieh, and T. J. Sejnowski, Single-trial variability in event-related BOLD signals, NeuroImage 15, 2002, pp. 823–835.

[31] R. L. Somorjai and M. Jarmasz, Exploratory analysis of fMRI data by fuzzy clustering: philosophy, strategy, tactics, implementation, in Exploratory Analysis and Data Modeling in Functional Neuroimaging, pp. 17–48, The MIT Press, Cambridge, MA, USA, 2003.

[32] L. Parra, G. Deco, and S. Miesbach, Statistical independence and novelty detection with information-preserving nonlinear maps, Neural Computation 8, 1996, pp. 260–269.

[33] A. Hyvarinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13, 2000, pp. 411–430.

[34] J. Karhunen and S. Malaroiu, Local independent component analysis using clustering, in Proc. 1st Intl. Workshop on ICA and Signal Separation (ICA '99), Aussois, France, 1999.

[35] M. J. Brammer, Multidimensional wavelet analysis of functional magnetic resonance imaging, Human Brain Mapping 6, 1998, pp. 378–382.

[36] M. Jarmasz and R. L. Somorjai, EROICA: Exploring regions of interest with cluster analysis in large functional magnetic resonance imaging data sets, Concepts in Magnetic Resonance 16A, 1, 2003, pp. 50–62.

[37] B. B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman and Co., New York, 1977.

[38] D. Van De Ville, T. Blu, and M. Unser, Integrated wavelet processing and spatial statistical testing of fMRI data, NeuroImage 23, 4, 2004, pp. 1472–1485.

[39] A. Bell and T. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput. 7, 1995, pp. 1129–1159.

[40] J.-F. Cardoso and B. H. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 44, 12, 1996, pp. 3017–3030.

[41] S. A. Cruces, L. Castedo-Ribas, and A. Cichocki, Robust blind source separation algorithms using cumulants, Neurocomputing 49, 2002, pp. 87–118.

[42] J. B. Buckheit and D. L. Donoho, Wavelab and reproducible research, Stanford University, CA, 1995. Available at http://www-stat.stanford.edu/wavelab

[43] R. Mutihac, Fractal complexity of the human brain cortex, OHBM 2010, Abstract 3155.

[44] R. Mutihac, Multiscale analysis of the human brain cortex, Rom. Rep. Phys. 62, 4, 2010, pp. 801–810.

[45] R. L. Somorjai, Exploratory data analysis in functional neuroimaging, Artificial Intelligence in Medicine 25, 2002, pp. 1–3.
