Model-driven statistical analysis of fMRI data
Keith Worsley
Department of Mathematics and Statistics,
Brain Imaging Centre, Montreal Neurological Institute, McGill University
www.math.mcgill.ca/keith
References
• Worsley et al. (2002). A general statistical analysis for fMRI data. NeuroImage, 15:1-15.
• Liao et al. (2002). Estimating the delay of the response in fMRI data. NeuroImage, 16:593-606.
• FMRISTAT: MATLAB package from www.math.mcgill.ca/keith/fmristat
[Figure: first scan of fMRI data; T statistic for the hot - warm effect; time courses (time in seconds) at three voxels: a highly significant effect, T = 6.59; no significant effect, T = -0.74; and drift.]
fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, …
T = (hot – warm effect) / S.d. ~ $t_{110}$ if there is no effect
Exploring the data: PCA of time × space
[Figure: temporal components (component against frame, 0 to 120) and spatial components (component against slice). Temporal components (sd, % variance explained), largest first: 105.7, 77.8%; 26.1, 4.8%; 15.8, 1.7%; 14.8, 1.5%.]
1: exclude first frames
2: drift
3: long-range correlation or anatomical effect: remove by converting to % of brain
4: signal?
Modeling the data: Choices …
• Time domain / frequency domain?
• AR / ARMA / state space models?
• Linear / non-linear time series model?
• Fixed HRF / estimated HRF?
• Voxel / local / global parameters?
• Fixed effects / random effects?
• Frequentist / Bayesian?
Compromise: Simple, general, valid, robust, fast statistical analysis
Covariates example: pain perception
[Figure, three panels against time in seconds:
– Alternating hot and warm stimuli separated by rest (9 seconds each).
– Hemodynamic response function (HRF): difference of two gamma densities.
– Responses = stimuli * HRF, sampled every 3 seconds.]
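As a minimal sketch of how such covariates can be built, the Python below convolves hot and warm stimulus indicators with a difference-of-two-gammas HRF and samples the result every 3 seconds. The gamma shapes (6 and 16), the undershoot ratio (1/6) and all function names are illustrative assumptions, not the FMRISTAT defaults.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma density with the given shape and scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))

def hrf(t, shape1=6.0, shape2=16.0, dip=1.0 / 6.0):
    """Difference of two gamma densities (assumed, illustrative parameters)."""
    return gamma_pdf(t, shape1) - dip * gamma_pdf(t, shape2)

def stimulus(t, onset, block=9.0, cycle=36.0):
    """1 while the stimulus is on, else 0. Blocks of `block` seconds start at
    `onset` and repeat every `cycle` seconds (hot, rest, warm, rest)."""
    if t < 0:
        return 0.0
    return 1.0 if onset <= t % cycle < onset + block else 0.0

def response(onset, n_scans=120, tr=3.0, dt=0.1, hrf_length=32.0):
    """Stimulus convolved with the HRF (Riemann sum), sampled every `tr` seconds."""
    steps = int(round(hrf_length / dt))
    kernel = [hrf(j * dt) * dt for j in range(steps)]
    out = []
    for k in range(n_scans):
        t = k * tr
        out.append(sum(stimulus(t - j * dt, onset) * kernel[j]
                       for j in range(steps)))
    return out

hot = response(onset=0.0)    # hot blocks start at 0, 36, 72, ... seconds
warm = response(onset=18.0)  # warm blocks start at 18, 54, 90, ... seconds
```

The two lists `hot` and `warm` play the role of the (stimulus * HRF) columns of the linear model; a hot - warm contrast is then a difference of their coefficients.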
Linear model for fMRI time series with AR(p) correlated errors
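• Linear model: $Y_t = (\text{stimulus}_t * \text{HRF})\, b + \text{drift}_t\, c + \text{error}_t$
• AR(p) errors: $\text{error}_t = a_1\, \text{error}_{t-1} + \dots + a_p\, \text{error}_{t-p} + s\, \text{WN}_t$
where $\text{WN}_t$ is 'white noise' and $b$, $c$, $a_1, \dots, a_p$ and $s$ are unknown parameters.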
First step: estimate the autocorrelation
AR(1) model: $\text{error}_t = a_1\, \text{error}_{t-1} + s\, \text{WN}_t$
• Fit the linear model using least squares
• $\text{error}_t = Y_t - \text{fitted } Y_t$
• $\hat{a}_1 = \text{Correlation}(\text{error}_t, \text{error}_{t-1})$
• Estimating the $\text{error}_t$'s changes their correlation structure slightly, so $\hat{a}_1$ is slightly biased.
[Figure: maps of $\hat{a}_1$ (scale -0.1 to 0.3): raw autocorrelation, smoothed 15mm, and bias corrected; annotated values ~ -0.05 and ~ 0.]
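A minimal sketch of this first step on simulated data: fit by least squares, take residuals, and estimate $a_1$ as the lag-1 autocorrelation of the residuals. The single-covariate model, the simulated values and the function names are assumptions for illustration; the bias correction and spatial smoothing mentioned on the slide are not included.

```python
import random

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def lag1_autocorrelation(resid):
    """Lag-1 autocorrelation estimate: sum(r_t * r_{t-1}) / sum(r_t ** 2)."""
    num = sum(a * b for a, b in zip(resid[1:], resid[:-1]))
    den = sum(r * r for r in resid)
    return num / den

# Simulated series (hypothetical values): linear drift plus AR(1) noise.
random.seed(0)
a1_true, n = 0.3, 120
noise, prev = [], 0.0
for _ in range(n):
    prev = a1_true * prev + random.gauss(0.0, 1.0)
    noise.append(prev)
x = list(range(n))
y = [800.0 + 0.1 * xi + e for xi, e in zip(x, noise)]

intercept, slope = fit_line(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
a1_hat = lag1_autocorrelation(resid)
```

With only 120 scans the estimate at a single voxel is noisy, which is one motivation for smoothing the autocorrelation map spatially.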
Second step: pre-whiten, refit the linear model
Pre-whiten: $Y_t^* = Y_t - \hat{a}_1 Y_{t-1}$, then refit using least squares.
[Figure: hot - warm effect, %; sd of effect, %; T = effect / sd, 110 df, thresholded at T > 4.93 (P < 0.05, corrected).]
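The second step can be sketched as follows, assuming an AR(1) estimate is already available. The same pre-whitening filter is applied to the data and to every column of the design (here one stimulus covariate and a constant), and T is the refitted effect divided by its sd. The data values, the value of `a1_hat` and the two-column model are hypothetical.

```python
import math

def prewhiten(series, a1):
    """Return series[t] - a1 * series[t-1] for t = 1, ..., n-1."""
    return [series[t] - a1 * series[t - 1] for t in range(1, len(series))]

def ols_two_columns(x1, x2, y):
    """Least squares for y = b1*x1 + b2*x2. Returns (b1, b2, sd_b1, df)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    df = len(y) - 2
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    sd_b1 = math.sqrt(rss / df * s22 / det)  # sqrt of sigma^2 * [(X'X)^-1]_11
    return b1, b2, sd_b1, df

# Hypothetical data: a stimulus covariate, a constant, and a small wobble.
x = [1.0, 1.0, 0.0, 0.0, -1.0, -1.0, 0.0, 0.0] * 3
wobble = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.1] * 3
y = [100.0 + 2.0 * xi + wi for xi, wi in zip(x, wobble)]
ones = [1.0] * len(y)

a1_hat = 0.2  # assumed AR(1) estimate from the first step
b1, b2, sd_b1, df = ols_two_columns(
    prewhiten(x, a1_hat), prewhiten(ones, a1_hat), prewhiten(y, a1_hat))
t_stat = b1 / sd_b1  # T = effect / sd
```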
Higher order AR model? Try AR(3):
[Figure: maps of $a_1$, $a_2$ and $a_3$ (scale -0.1 to 0.3), and T statistics assuming no correlation, AR(1), AR(2) and AR(3).]
… has little effect on the T statistics: AR(1) seems to be adequate.
Assuming no correlation biases T up by ~12%, giving more false positives.
Results from 4 runs on the same subject
[Figure: for Run 1 to Run 4, maps of the effect $E_i$, its sd $S_i$, and the T statistic $E_i / S_i$.]
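Mixed effects linear model for combining effects from different runs/sessions/subjects:
• $E_i$ = effect for run/session/subject $i$ (from the linear model)
• $S_i$ = standard error of effect (from the linear model)
• Mixed effects model:
$E_i = \text{covariates}_i\, c + S_i\, \text{WN}_i^F + \sigma\, \text{WN}_i^R$
– $\text{covariates}_i$: usually 1, but could add group, treatment, age, sex, ...
– $S_i\, \text{WN}_i^F$: 'fixed effects' error, due to variability within the same run
– $\sigma\, \text{WN}_i^R$: random effect, due to variability from run to run
– $c$ and $\sigma$ are the unknown parameters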
REML estimation using the EM algorithm
• Slow to converge (10 iterations by default).
• Stable (maintains estimate $\hat{\sigma}^2 > 0$), but $\hat{\sigma}^2$ is biased if $\sigma^2$ (random effect) is small, so:
• Re-parameterize the variance model:
$\text{Var}(E_i) = S_i^2 + \sigma^2 = (S_i^2 - \min_j S_j^2) + (\sigma^2 + \min_j S_j^2) = S_i^{*2} + \sigma^{*2}$
• $\hat{\sigma}^2 = \hat{\sigma}^{*2} - \min_j S_j^2$ (less biased estimate)
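The re-parameterization can be illustrated with a small sketch. Note the simplifications, all of which are assumptions: a single covariate equal to 1 (so the model is $E_i \sim N(\mu, S_i^2 + \sigma^2)$), and a plain maximum-likelihood EM-type iteration rather than the REML version the slide refers to. The function names and data values are hypothetical.

```python
def em_random_effects(E, S2, n_iter=10):
    """ML-type EM for E_i ~ N(mu, S2_i + sigma2); returns (mu, sigma2).
    Assumes the E_i are not all identical, so the starting sigma2 is > 0."""
    n = len(E)
    mu = sum(E) / n
    sigma2 = sum((e - mu) ** 2 for e in E) / n  # positive starting value
    for _ in range(n_iter):
        # Mean: weighted least squares given the current sigma2.
        w = [1.0 / (s2 + sigma2) for s2 in S2]
        mu = sum(wi * e for wi, e in zip(w, E)) / sum(w)
        # E-step: posterior mean k*(e - mu) and variance k*s2 of each random
        # effect; M-step: sigma2 = average of their expected squares.
        total = 0.0
        for e, s2 in zip(E, S2):
            k = sigma2 / (sigma2 + s2)
            total += (k * (e - mu)) ** 2 + k * s2
        sigma2 = total / n
    return mu, sigma2

def reparameterized_estimate(E, S, n_iter=10):
    """Fit with S*_i^2 = S_i^2 - min_j S_j^2, then subtract min_j S_j^2 back."""
    s2_min = min(s * s for s in S)
    S2_star = [s * s - s2_min for s in S]
    mu, sigma2_star = em_random_effects(E, S2_star, n_iter)
    return mu, sigma2_star - s2_min  # may be negative, unlike the direct fit

# Hypothetical effects and standard errors from 4 runs.
E = [0.5, 0.7, 0.2, 0.9]
S = [0.10, 0.12, 0.15, 0.11]
mu_direct, sigma2_direct = em_random_effects(E, [s * s for s in S])
mu_reparam, sigma2_reparam = reparameterized_estimate(E, S)
```

The direct fit can never return a negative variance, which is the source of the upward bias when the true random effect is small; the re-parameterized estimate is allowed to dip below zero by up to $\min_j S_j^2$.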
Problem: 4 runs, 3 df for random effects sd ...
… very noisy sd:
… and T>15.96 for P<0.05 (corrected):
… so no response is detected …
[Figure: effect $E_i$, sd $S_i$ and T stat $E_i / S_i$ for Run 1 to Run 4, and the combined MULTISTAT result.]
Solution: Spatial regularization of the sd
• Basic idea: increase df by spatial smoothing (local pooling) of the sd.
• Can’t smooth the random effects sd directly: too much anatomical structure.
• Instead,
$\text{sd} = \text{smooth}\left( \dfrac{\text{random effects sd}}{\text{fixed effects sd}} \right) \times \text{fixed effects sd}$
which removes the anatomical structure before smoothing.
[Figure: the random effects sd (3 df) is divided by the fixed effects sd (440 df, the average $S_i$) to give the ratio random sd / fixed sd; the smoothed sd ratio (~1.3 where there is a random effect) is multiplied by the fixed effects sd to give the mixed effects sd (~100 df).]
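Effective df depends on smoothing
$\text{df}_\text{ratio} = \text{df}_\text{random} \left( 2\, \dfrac{\text{FWHM}_\text{ratio}^2}{\text{FWHM}_\text{data}^2} + 1 \right)^{3/2}$
$\dfrac{1}{\text{df}_\text{eff}} = \dfrac{1}{\text{df}_\text{ratio}} + \dfrac{1}{\text{df}_\text{fixed}}$
e.g. $\text{df}_\text{random} = 3$, $\text{df}_\text{fixed} = 4 \times 110 = 440$, $\text{FWHM}_\text{data} = 8$mm:
[Figure: $\text{df}_\text{eff}$ (0 to 400) against $\text{FWHM}_\text{ratio}$ (0 to infinity). $\text{FWHM}_\text{ratio} = 0$ is the random effects analysis, $\text{df}_\text{eff} = 3$; $\text{FWHM}_\text{ratio} = \infty$ is the fixed effects analysis, $\text{df}_\text{eff} = 440$.]
Target = 100 df → FWHM = 19mm
Why 100? If out by 50%, the distribution of T is not much affected.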
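The effective df formula is easy to evaluate directly. The sketch below uses the slide's example values as defaults and also inverts the formula to find the smoothing needed for a target df; the function names are hypothetical.

```python
import math

def effective_df(fwhm_ratio, fwhm_data=8.0, df_random=3, df_fixed=440):
    """Effective df after smoothing the sd ratio with FWHM `fwhm_ratio` (mm)."""
    df_ratio = df_random * (2.0 * (fwhm_ratio / fwhm_data) ** 2 + 1.0) ** 1.5
    return 1.0 / (1.0 / df_ratio + 1.0 / df_fixed)

def fwhm_for_target(df_target, fwhm_data=8.0, df_random=3, df_fixed=440):
    """Invert effective_df. Assumes effective_df(0) <= df_target < df_fixed."""
    df_ratio = 1.0 / (1.0 / df_target - 1.0 / df_fixed)
    return fwhm_data * math.sqrt(((df_ratio / df_random) ** (2.0 / 3.0) - 1.0) / 2.0)

df_no_smoothing = effective_df(0.0)    # close to df_random = 3
df_at_19mm = effective_df(19.0)        # close to the 100 df target
fwhm_needed = fwhm_for_target(100.0)   # about 19 mm, as on the slide
```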
Final result: 19mm smoothing, 100 effective df …
… less noisy sd:
… and T>4.93 for P<0.05 (corrected):
… and now we can detect a response!
[Figure: effect $E_i$, sd $S_i$ and T stat $E_i / S_i$ for Run 1 to Run 4, and the combined MULTISTAT result.]
P-values assessed for:
• Peaks or local maxima
• Spatial extent of clusters of neighbouring voxels above a pre-chosen threshold (~3)
• Correct for searching over a pre-specified region (usually the whole brain), which depends on:
– number of voxels in the search region (Bonferroni) or
– number of resels = volume / $\text{FWHM}^3$ in the search region (random field theory)
– in practice, take the minimum of the two!
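A rough sketch of "take the minimum of the two" for the P-value of a local maximum. It assumes a Gaussian (not T) statistic and keeps only the top-dimensional term of the 3D random field theory approximation, so it is a simplification of what a full analysis would compute; the function names are hypothetical.

```python
import math

def bonferroni_p(t, n_voxels):
    """Bonferroni bound: n_voxels times the one-sided Gaussian tail, capped at 1."""
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    return min(1.0, n_voxels * tail)

def rft_p(t, resels):
    """Random field theory approximation for a 3D Gaussian field
    (top-dimensional Euler characteristic density only), capped at 1."""
    ec_density = ((4.0 * math.log(2.0)) ** 1.5 / (2.0 * math.pi) ** 2
                  * (t * t - 1.0) * math.exp(-t * t / 2.0))
    return min(1.0, resels * ec_density)

def corrected_p(t, n_voxels, resels):
    """Both are upper bounds on the P-value, so use the smaller one."""
    return min(bonferroni_p(t, n_voxels), rft_p(t, resels))

# Hypothetical search region: 10000 voxels, 1000 resels.
p = corrected_p(4.65, n_voxels=10000, resels=1000)
```

When there are few voxels per resel (rough data), Bonferroni tends to give the smaller value; when the data are smooth, random field theory does.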
FWHM is spatially varying (non-isotropic)
• fMRI data is smoother in GM than WM
• VBM data is highly non-isotropic
• Has little effect on P-values for local maxima (use ‘average’ FWHM inside search region), but
• Has a big effect on P-values for spatial extents: smooth regions → big clusters, rough regions → small clusters, so
• Replace cluster volume by cluster resels = volume / $\text{FWHM}^3$
FWHM – the local smoothness of the noise
$\text{FWHM} = \dfrac{(2 \log 2)^{1/2} \times \text{voxel size}}{(1 - \text{correlation})^{1/2}}$
(If the noise is modeled as white noise smoothed with a Gaussian kernel, this would be its FWHM)
$\text{resels} = \dfrac{\text{Volume}}{\text{FWHM}^3}$
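A small sketch of these two formulas, assuming that "correlation" means the correlation of the noise between neighbouring voxels and that voxels are isotropic; the function names and example values are hypothetical.

```python
import math

def local_fwhm(voxel_size, correlation):
    """FWHM = sqrt(2 log 2) * voxel size / sqrt(1 - correlation)."""
    return math.sqrt(2.0 * math.log(2.0)) * voxel_size / math.sqrt(1.0 - correlation)

def resels(volume, fwhm):
    """Number of resolution elements: volume / FWHM^3."""
    return volume / fwhm ** 3

# Hypothetical values: 2 mm voxels, neighbour correlation 0.9, 1 litre search volume.
fwhm = local_fwhm(2.0, 0.9)
n_resels = resels(1.0e6, fwhm)  # volume in mm^3
```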
P-values depend on resels:
[Figure: P-value of a local maximum (T = 4.5) against resels of the search volume (0 to 1000); P-value of a cluster against resels of the cluster (0 to 2), for clusters above t = 3.0 with search volume resels = 500.]
[Figure: FWHM (mm) of scans (110 df); FWHM (mm) of effects (3 df); FWHM of effects (smoothed); effects / scans FWHM ratio (smoothed). Two clusters are annotated: Resels = 1.90, P = 0.007 and Resels = 0.57, P = 0.387.]
Statistical summary: clusters
clus   vol     resel   p-val   (one)
1      33992   54.22   0       (0)
2      14150   25.03   0       (0)
3      12382   20.29   0       (0)
4      2538    3.12    0.011   (0.001)
5      2538    2.77    0.016   (0.001)
6      1577    2.15    0.035   (0.002)
7      1000    1.43    0.098   (0.006)
8      500     1.31    0.119   (0.007)
9      1000    1.07    0.179   (0.011)
10     385     0.99    0.208   (0.013)
…
Statistical summary: peaks
clus   peak    p-val   (one)     q-val   (i j k)      (x y z)
1      12.72   0       (0)       0       (59 74 1)    (10.5 -28.7 24.1)
1      12.58   0       (0)       0       (60 75 1)    (8.2 -31 23.7)
1      11.45   0       (0)       0       (61 73 2)    (5.9 -25.3 17.5)
1      11.08   0       (0)       0       (62 66 4)    (3.5 -6.9 6.3)
1      10.95   0       (0)       0       (61 70 4)    (5.9 -16.2 4.8)
1      10.6    0       (0)       0       (62 69 3)    (3.5 -15 12.1)
…
2      5.07    0.029   (0.004)   0       (48 69 10)   (36.3 -7.3 -36.3)
3      5.06    0.029   (0.004)   0       (73 72 9)    (-22.3 -15.3 -30.5)
3      5.03    0.033   (0.004)   0       (81 63 10)   (-41 6.6 -34.1)
13     5.02    0.035   (0.005)   0       (88 72 8)    (-57.4 -16.4 -23.6)
6      4.91    0.054   (0.007)   0       (42 69 3)    (50.4 -15 12.1)
11     4.91    0.055   (0.007)   0       (69 70 7)    (-12.9 -12.9 -15.9)
9      4.91    0.055   (0.007)   0       (48 46 5)    (36.3 40.5 6.7)
1      4.85    0.069   (0.008)   0       (52 93 2)    (27 -71.6 10.2)
3      4.82    0.08    (0.009)   0       (79 66 8)    (-36.3 -2.5 -21.4)
3      4.81    0.082   (0.009)   0       (78 65 8)    (-34 -0.2 -21)
1      4.8     0.086   (0.01)    0       (62 59 5)    (3.5 10.4 1.9)
3      4.77    0.097   (0.011)   0       (82 61 10)   (-43.4 11.2 -33.4)
1      4.75    0.106   (0.012)   0       (55 71 2)    (19.9 -20.7 18.3)
5      4.73    0.114   (0.012)   0       (67 84 2)    (-8.2 -50.8 13.5)
…
[Figure: T statistic maps thresholded at T > 4.93 (P < 0.05, corrected) and at T > 4.86.]
Efficiency: optimum block design
[Figure: sd of the hot stimulus effect and sd of the hot - warm effect, for magnitude and for delay (secs), as a function of stimulus duration (secs, 5 to 20) and interstimulus interval (secs, 0 to 20). The optimum design is marked with an X in each panel; regions with too little signal are marked "(Not enough signal)".]
Efficiency: optimum event design
[Figure: sd of effect (secs for delays), 0 to 0.5, against average time between events (secs), 5 to 20, for uniform, random and concentrated event spacing. Solid lines are magnitudes, dotted lines are delays; part of the plot is marked "(Not enough signal)".]
How many subjects?
• Largest portion of variance comes from the last stage i.e. combining over subjects:
$\dfrac{\text{sd}_\text{run}^2}{n_\text{run}\, n_\text{sess}\, n_\text{subj}} + \dfrac{\text{sd}_\text{sess}^2}{n_\text{sess}\, n_\text{subj}} + \dfrac{\text{sd}_\text{subj}^2}{n_\text{subj}}$
• If you want to optimize total scanner time, take more subjects.
• What you do at early stages doesn’t matter very much!
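To see why more subjects helps most, the variance formula can be evaluated for two designs with the same total number of runs. The sd values and counts below are made up for illustration, and the function name is hypothetical.

```python
def variance_of_combined_effect(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """sd_run^2/(n_run n_sess n_subj) + sd_sess^2/(n_sess n_subj) + sd_subj^2/n_subj."""
    return (sd_run ** 2 / (n_run * n_sess * n_subj)
            + sd_sess ** 2 / (n_sess * n_subj)
            + sd_subj ** 2 / n_subj)

# Same scanner time (40 runs in total), spent two different ways.
more_runs = variance_of_combined_effect(1.0, 1.0, 1.0, n_run=4, n_sess=1, n_subj=10)
more_subjects = variance_of_combined_effect(1.0, 1.0, 1.0, n_run=2, n_sess=1, n_subj=20)
```

Only the first term depends on the number of runs per session, whereas every term shrinks as the number of subjects grows, so `more_subjects` is the smaller variance here.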
Estimating the delay of the response
• Delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions:
$\text{HRF}(t + \text{shift}) \approx \text{basis}_1(t)\, w_1(\text{shift}) + \text{basis}_2(t)\, w_2(\text{shift})$
[Figure: HRF, basis1 and basis2 against t (seconds), -5 to 25, with the shift and the delay marked.]
• Convolve bases with the stimulus, then add to the linear model
• Fit linear model, estimate $w_1$ and $w_2$
• Equate $w_2 / w_1$ to estimates, then solve for shift (Henson et al., 2002)
[Figure: $w_1$, $w_2$ and $w_2 / w_1$ against shift (seconds), -5 to 5.]
• To reduce bias when the magnitude is small, use
$\text{shift} / (1 + 1/T^2)$
where $T = w_1 / \text{Sd}(w_1)$ is the T statistic for the magnitude
• Shrinks shift to 0 where there is little evidence for a response.
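The bias-reducing shrinkage of the shift is a one-line calculation. The sketch below writes shift / (1 + 1/T²) in the algebraically equivalent form shift × T² / (T² + 1), which avoids dividing by zero when T = 0; the function name and example values are hypothetical.

```python
def shrink_shift(shift, t_magnitude):
    """shift / (1 + 1/T^2), written as shift * T^2 / (T^2 + 1)."""
    t2 = t_magnitude * t_magnitude
    return shift * t2 / (t2 + 1.0)

strong = shrink_shift(1.0, 6.0)   # strong response: shift is almost unchanged
weak = shrink_shift(1.0, 0.5)     # weak response: shift is pulled towards 0
```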
[Figure: shift of the hot stimulus. Panels: T stat for magnitude; T stat for shift; shift (secs); sd of shift (secs). Annotations: ~1 sec +/- 0.5 sec; T>4; T~2.]
Combining shifts of the hot stimulus (Contours are T stat for magnitude > 4)
[Figure: effect $E_i$, sd $S_i$ and T stat $E_i / S_i$ of the shift (secs) for Run 1 to Run 4, and the combined MULTISTAT result.]
[Figure: shift of the hot stimulus (secs) where T stat for magnitude > 4.93.]
False Discovery Rate (FDR)
Benjamini and Hochberg (1995), Journal of the Royal Statistical Society
Benjamini and Yekutieli (2001), Annals of Statistics
Genovese et al. (2001), NeuroImage
• FDR controls the expected proportion of false positives amongst the discoveries, whereas
• Bonferroni / random field theory controls the probability of any false positives
• No correction controls the proportion of false positives in the volume
[Figure: simulated signal + Gaussian white noise, with the noise, signal, true positives and false positives shown under three thresholds:
P < 0.05 (uncorrected), T > 1.64: 5% of volume is false +
FDR < 0.05, T > 2.82: 5% of discoveries is false +
P < 0.05 (corrected), T > 4.22: 5% probability of any false +]
Comparison of thresholds
• FDR depends on the ordered P-values: $P_1 < P_2 < \dots < P_n$. To control the FDR at $a = 0.05$, find $K = \max\{i : P_i < (i/n)\, a\}$, threshold the P-values at $P_K$
Proportion of true +   1      0.1    0.01   0.001   0.0001
Threshold T            1.64   2.56   3.28   3.88    4.41
• Bonferroni thresholds the P-values at $a/n$:
Number of voxels       1      10     100    1000    10000
Threshold T            1.64   2.58   3.29   3.89    4.42
• Random field theory: resels = volume / $\text{FWHM}^3$:
Number of resels       0      1      10     100     1000
Threshold T            1.64   2.82   3.46   4.09    4.65
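The FDR and Bonferroni rules can be sketched in a few lines. The FDR function follows the rule as stated on the slide (strict inequality); the Bonferroni function converts $a/n$ into a one-sided Gaussian threshold, which is an assumption about how the table was computed. Function names and the example P-values are hypothetical.

```python
from statistics import NormalDist

def fdr_threshold(p_values, a=0.05):
    """Largest ordered P-value P_K with P_K < (K/n) * a, or None if there is none."""
    ordered = sorted(p_values)
    n = len(ordered)
    cutoff = None
    for i, p in enumerate(ordered, start=1):
        if p < i / n * a:
            cutoff = p  # keep the largest i that satisfies the rule
    return cutoff

def bonferroni_t(n_voxels, a=0.05):
    """One-sided Gaussian threshold whose tail probability is a / n_voxels."""
    return NormalDist().inv_cdf(1.0 - a / n_voxels)

# Hypothetical P-values from 10 voxels.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = fdr_threshold(p_values)  # voxels with P <= cutoff are declared discoveries
thresholds = [bonferroni_t(n) for n in (1, 10, 100, 1000, 10000)]
```

Unlike Bonferroni, the FDR cutoff adapts to the data: the more true positives there are, the more lenient the threshold becomes.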
[Figure: T statistic maps at three thresholds:
P < 0.05 (uncorrected), T > 1.64: 5% of volume is false +
FDR < 0.05, T > 2.67: 5% of discoveries is false +
P < 0.05 (corrected), T > 4.93: 5% probability of any false +]
Conjunction: Minimum $T_i$ > threshold
[Figure: ‘Minimum of $T_i$’ (for P = 0.05, threshold = 1.82) compared with ‘Average of $T_i$’ (for P = 0.05, threshold = 4.93). Efficiency = 82%.]
Functional connectivity
• Measured by the correlation between residuals at every pair of voxels (6D data!)
• Local maxima are larger than all 12 neighbours
• P-value can be calculated using random field theory
• Good at detecting focal connectivity, but
• PCA of residuals x voxels is better at detecting large regions of co-correlated voxels
[Figure: scatter plots of Voxel 2 against Voxel 1 for "activation only" and "correlation only"; maps of the first principal component > threshold and of |correlations| > 0.7, $P < 10^{-10}$ (corrected).]