Circular analysis in systems neuroscience – with particular attention to cross-subject correlation...
Date posted: 19-Dec-2015
Circular analysis in systems neuroscience – with particular attention to cross-subject correlation mapping
Nikolaus Kriegeskorte
Laboratory of Brain and Cognition, National Institute of Mental Health
Part 1
General introduction to circular analysis in systems neuroscience
(synopsis of Kriegeskorte et al. 2009)
Part 2
Specific issue: selection bias in cross-subject correlation mapping
(following up on Vul et al. 2009)
Overview
How do assumptions tinge results?
– Through variants of selection!
• Elimination (binary selection)
• Weighting (continuous selection)
• Sorting (multiclass selection)
Experimental design
[Figure: design crossing STIMULUS (object category) with TASK (property judgment: “Animate?”, “Pleasant?”). Simmons et al. 2006]
Pattern-information analysis
• define ROI by selecting ventral-temporal voxels for which any pairwise condition contrast is significant at p<.001 (uncorr.)
• perform nearest-neighbor classification based on activity-pattern correlation
• use odd runs for training and even runs for testing
Results
[Figure: decoding accuracy (axis 0 to 1, chance level marked) for task (judged property) and stimulus (object category). Panels: fMRI data and data from a Gaussian random generator, each analyzed using all data to select ROI voxels and using only training data to select ROI voxels.]
...but we used cleanly independent training and test data!
?!
Conclusion for pattern-information analysis
The test data must not be used in either...
• training a classifier (continuous weighting) or
• defining the ROI (binary weighting)
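The effect described above can be reproduced on pure noise. The following is a minimal sketch, not the analysis from the slides: the function name, the voxel counts, and the selection criterion (variance across conditions, as a stand-in for "any pairwise contrast is significant") are all assumptions made for illustration. It compares voxel selection on all data with voxel selection on training data only, followed by nearest-neighbor classification based on pattern correlation.

```python
import numpy as np


def decoding_accuracy(n_voxels=2000, n_select=50, n_conditions=4,
                      circular=True, n_sims=200, seed=0):
    """Mean nearest-neighbor decoding accuracy on pure Gaussian noise.

    Each simulated data set has one pattern per condition for the training
    half (odd runs) and one for the test half (even runs). The data contain
    no condition information, so an unbiased accuracy sits at chance.
    """
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_sims):
        train = rng.standard_normal((n_conditions, n_voxels))
        test = rng.standard_normal((n_conditions, n_voxels))
        # Voxel selection (assumed criterion): keep the voxels whose response
        # varies most across conditions in the data used for selection.
        selection_data = (train + test) / 2 if circular else train
        roi = np.argsort(selection_data.var(axis=0))[-n_select:]
        # Correlate every test pattern with every training pattern.
        corr = np.corrcoef(test[:, roi], train[:, roi])
        corr = corr[:n_conditions, n_conditions:]
        predicted = corr.argmax(axis=1)
        accuracies.append(np.mean(predicted == np.arange(n_conditions)))
    return float(np.mean(accuracies))


if __name__ == "__main__":
    print("all data used for selection:", decoding_accuracy(circular=True))
    print("training data only:         ", decoding_accuracy(circular=False))
```

With four conditions, chance is 0.25. In this sketch, selection on all data should push accuracy well above chance even though the classifier itself never sees the test data, while selection on training data alone should stay near chance.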
Data selection is key to many conventional analyses.
Can it entail similar biases in other contexts?
ROI definition is affected by noise
[Figure: true region versus overfitted ROI. ROI-average activation shows an overestimated effect for the overfitted ROI, compared with an independent ROI.]
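A small simulation can make the overestimation concrete. This is a sketch under assumed settings (the function name, the number of voxels, and the effect size are hypothetical): an ROI is defined as the most active voxels in one data set, and the ROI-average effect is then estimated either from the same data or from an independent data set.

```python
import numpy as np


def roi_effect_estimates(n_voxels=10000, n_true=100, true_effect=2.0,
                         n_select=100, seed=0):
    """Return (circular, independent) ROI-average estimates of the effect."""
    rng = np.random.default_rng(seed)
    signal = np.zeros(n_voxels)
    signal[:n_true] = true_effect                     # the "true region"
    data_a = signal + rng.standard_normal(n_voxels)   # data used for selection
    data_b = signal + rng.standard_normal(n_voxels)   # independent data
    roi = np.argsort(data_a)[-n_select:]              # ROI defined on data_a
    circular = data_a[roi].mean()      # estimate from the selection data
    independent = data_b[roi].mean()   # estimate from held-out data
    return float(circular), float(independent)


if __name__ == "__main__":
    circular, independent = roi_effect_estimates()
    print(f"circular estimate:    {circular:.2f}")
    print(f"independent estimate: {independent:.2f}")
```

Because the ROI absorbs noise voxels that happened to be high in the selection data, the circular estimate should exceed even the effect in the true region, whereas the independent estimate reflects the diluted effect actually present in the selected voxels.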
Set-average tuning curves...
...for data sorted by tuning
[Figure: response as a function of stimulus parameter (e.g. orientation); noise data.]
Set-average activation profiles...
...for data sorted by activation
[Figure: ROI-average fMRI response for conditions A, B, C, D; noise data.]
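Sorting is selection too, as the noise-data figures suggest. The sketch below is an illustration with assumed names and sizes: pure-noise units are sorted by their apparently preferred condition, and the set-average profile is computed from the sorting data and from fresh data.

```python
import numpy as np


def sorted_profiles(n_units=1000, n_conditions=4, seed=0):
    """Set-average profiles of pure-noise units sorted by preferred condition.

    Row c of each returned matrix is the average response profile of the
    units whose maximum response in the sorting data was in condition c.
    """
    rng = np.random.default_rng(seed)
    sort_data = rng.standard_normal((n_units, n_conditions))
    fresh_data = rng.standard_normal((n_units, n_conditions))
    preferred = sort_data.argmax(axis=1)
    circular = np.zeros((n_conditions, n_conditions))
    independent = np.zeros((n_conditions, n_conditions))
    for c in range(n_conditions):
        members = preferred == c
        circular[c] = sort_data[members].mean(axis=0)       # same data
        independent[c] = fresh_data[members].mean(axis=0)   # held-out data
    return circular, independent


if __name__ == "__main__":
    circular, independent = sorted_profiles()
    print(np.round(circular, 2))
    print(np.round(independent, 2))
```

The circular profiles should show a clear peak on the diagonal (each set "tuned" to its own condition), while the profiles from fresh data should be flat around zero.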
To avoid selection bias, we can...
...perform a nonselective analysis
(e.g. whole-brain mapping, no ROI analysis)
OR
...make sure that selection and results statistics are independent under the null hypothesis, because they are either:
• inherently independent (e.g. independent contrasts)
• or computed on independent data
Does selection by an orthogonal contrast vector ensure unbiased analysis?
ROI-definition contrast: A+B
ROI-average analysis contrast: A-B
c_selection = [1 1]^T
c_test = [1 -1]^T
(orthogonal contrast vectors)
– No, there can still be bias.
Orthogonality of the contrast vectors is not sufficient. The design and noise dependencies matter.
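One way to see why orthogonal contrast vectors are not enough: for noisy estimates of A and B, Cov(A+B, A-B) = Var(A) - Var(B), which is zero only when the two estimates are equally noisy. The sketch below assumes a hypothetical unbalanced design in which A is estimated with more noise than B (for example, from fewer trials); the function name and parameters are illustrative.

```python
import numpy as np


def selected_difference(sd_a, sd_b, top_fraction=0.05,
                        n_voxels=200_000, seed=0):
    """Mean of A-B in voxels selected by A+B, with no true effect anywhere."""
    rng = np.random.default_rng(seed)
    a = sd_a * rng.standard_normal(n_voxels)   # noisy estimate of condition A
    b = sd_b * rng.standard_normal(n_voxels)   # noisy estimate of condition B
    selection = a + b                          # c_selection = [1 1]^T
    test = a - b                               # c_test = [1 -1]^T
    threshold = np.quantile(selection, 1 - top_fraction)
    return float(test[selection > threshold].mean())


if __name__ == "__main__":
    print("balanced noise:  ", selected_difference(1.0, 1.0))
    print("unbalanced noise:", selected_difference(1.0, 0.5))
```

With equal noise levels the selected voxels should show a difference near zero. With unequal noise levels the same orthogonal contrasts should yield a clearly positive A-B in the selected voxels, although no voxel has a true effect.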
Circular analysis
Pros
• highly sensitive
• widely accepted (examples in all high-impact journals)
• doesn't require independent data sets
• grants scientists independence from the data
• allows smooth blending of blind faith and empiricism
Cons
• [can’t think of any right now]
Pros
• the error that beautifies results
• confirms even incorrect hypotheses
• improves chances of high-impact publication
Part 2
Specific issue: selection bias in cross-subject correlation mapping
(following up on Vul et al. 2009)
Motivation
Vul et al. (2009) posed a puzzle:
Why are the cross-subject correlations found in brain mapping so high?
Selection bias is one piece of the puzzle.
But there are more pieces and we have yet to put them all together.
Overview
• List and discuss six pieces of the puzzle. (They don't all point in the same direction!)
• Suggest some guidelines for good practice.
Six pieces synopsis
1. Cross-subject correlation estimates are very noisy.
2. Bin or within-subject averaging legitimately increases correlations.
3. Selecting among noisy estimates yields large biases.
4. False-positive regions are highly likely for a whole-brain mapping thresholded at p<.001, uncorrected.
5. Reported correlations are high, but not highly significant.
6. Studies have low power for finding realistic correlations in the brain if multiple testing is appropriately accounted for.
Vul et al. 2009
The geometric mean of the reliabilities is an upper bound on the population correlation. The reliabilities provide no bound on the sample correlation.
[Equation: population correlation expressed in terms of the noise-free correlation.]
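The bound can be written out using the classical attenuation relation. The symbols below are assumptions chosen for this note (the slide's own notation did not survive): rel_x and rel_y denote the reliabilities of the two measures, and the noise-free correlation is the correlation between the underlying error-free quantities.

```latex
\rho_{\text{population}}
  \;=\; \rho_{\text{noise-free}} \,\sqrt{\mathrm{rel}_x \cdot \mathrm{rel}_y}
\qquad\Longrightarrow\qquad
\left|\rho_{\text{population}}\right|
  \;\le\; \sqrt{\mathrm{rel}_x \cdot \mathrm{rel}_y}
```

The inequality follows because the noise-free correlation cannot exceed 1 in magnitude. It constrains the population correlation only; a sample correlation from a handful of subjects can land above it by chance.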
Piece 1
Sample correlations across small numbers of subjects are very noisy estimates of population correlations.
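To get a feel for how noisy these estimates are, one can compute an approximate confidence interval with the Fisher z-transform. This is a standard textbook approximation, not a calculation taken from the slides; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation (Fisher z-transform)."""
    z = math.atanh(r)                    # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    low, high = correlation_ci(0.7, 16)
    print(f"r = 0.7, n = 16: 95% CI roughly [{low:.2f}, {high:.2f}]")
```

For a sample correlation of 0.7 across 16 subjects, the interval spans roughly 0.31 to 0.89, which illustrates how little a single cross-subject estimate pins down the population value.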
Subjects are like bins...
For each subject, all data is averaged to give one number.
Take-home message
Cross-subject correlation estimates are expected to be...
• high (averaging all data for each subject)
• noisy (low number of subjects)
So what's Ed fussing about? We don't need selection bias to explain the high correlations, right?
Expected maximum correlation selected among null regions
[Figure: expected maximum correlation for 16 subjects, showing the bias.]
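The size of this bias can be estimated by simulation. The sketch below is illustrative only: it assumes 500 independent null regions (a number the slides use elsewhere as a working assumption) and standard-normal data, and the function name is hypothetical.

```python
import numpy as np


def expected_max_null_correlation(n_subjects=16, n_regions=500,
                                  n_sims=200, seed=0):
    """Average of the largest cross-subject correlation among null regions."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_sims)
    for i in range(n_sims):
        behavior = rng.standard_normal(n_subjects)
        brain = rng.standard_normal((n_subjects, n_regions))
        # Pearson correlation of the behavioral score with every region.
        b = (behavior - behavior.mean()) / behavior.std()
        x = (brain - brain.mean(axis=0)) / brain.std(axis=0)
        r = b @ x / n_subjects
        maxima[i] = r.max()
    return float(maxima.mean())


if __name__ == "__main__":
    print("one region:        ", expected_max_null_correlation(n_regions=1))
    print("max of 500 regions:", expected_max_null_correlation())
```

With a single region the average null correlation should be close to zero. Picking the maximum among 500 null regions with 16 subjects should give a value in the vicinity of 0.7, even though every population correlation is zero.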
Piece 4
False-positive regions are likely to be found in whole-brain mapping using p<.001, uncorrected.
Mapping with p<.001, uncorrected
Global null hypothesis is true (population correlation = 0 in all brain locations)
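A back-of-the-envelope calculation shows why. Assuming, purely for illustration, 500 independent brain locations each tested at p<.001, the chance of at least one false positive under the global null is 1 - (1 - 0.001)^500. The function name and the independence assumption are simplifications.

```python
def prob_any_false_positive(alpha=0.001, n_independent_tests=500):
    """Probability of at least one false positive under the global null."""
    return 1.0 - (1.0 - alpha) ** n_independent_tests


if __name__ == "__main__":
    print(f"P(at least one false positive) = {prob_any_false_positive():.3f}")
```

Under these assumptions the probability is about 0.39, so a substantial share of null studies would report at least one "significant" location.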
Reported correlations are high, but not highly significant
[Figure: correlation thresholds as a function of the number of subjects, for p<0.05, p<0.01, p<0.001 and p<0.00001, one-sided and two-sided.]
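Threshold curves of this kind can be approximated with the Fisher z-transform. The sketch below uses that normal approximation rather than the exact t-distribution, so its values differ slightly from exact thresholds; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n_subjects, p, two_sided=False):
    """Approximate correlation needed to reach significance level p."""
    tail = p / 2 if two_sided else p
    z_crit = NormalDist().inv_cdf(1 - tail)
    # Fisher z of r is roughly normal with standard error 1/sqrt(n - 3).
    return math.tanh(z_crit / math.sqrt(n_subjects - 3))


if __name__ == "__main__":
    for n in (8, 16, 36, 60, 100):
        row = ", ".join(f"p<{p}: {critical_r(n, p):.2f}"
                        for p in (0.05, 0.01, 0.001, 0.00001))
        print(f"n = {n:3d}  {row}")
```

With 16 subjects, a one-sided p<0.001 already requires a correlation close to 0.7 under this approximation, so a high correlation in a small sample is not necessarily a highly significant one.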
What correlations would we expect under the global null hypothesis?
(assuming each study reports the maximum of 500 independent brain locations)
Piece 6
Most of the studies have low power for finding realistic correlations with whole-brain mapping if multiple testing is appropriately accounted for.
see also: Yarkoni 2009
Numbers of subjects in studies reviewed by Vul et al. (2009)
[Figure: number of correlation estimates as a function of number of subjects (axis ticks: 4, 8, 16, 36, 60, 100).]
In order to find a single region with across-subject correlation of 0.7 in the brain...
...we would need about 36 subjects
[Figure: power as a function of the number of subjects; 16 subjects marked.]
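A rough power calculation illustrates the gap between 16 and 36 subjects. The settings here are assumptions, since the slides do not state how their power curve was computed: a one-sided test, Bonferroni correction of alpha = 0.05 over 500 independent locations, and the Fisher z normal approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_for_correlation(rho, n_subjects, alpha=0.05, n_tests=500):
    """Approximate power to detect one region with population correlation rho."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / n_tests)   # Bonferroni threshold
    # Expected test statistic: Fisher z of rho divided by its standard error.
    expected_z = math.atanh(rho) * math.sqrt(n_subjects - 3)
    return normal.cdf(expected_z - z_crit)


if __name__ == "__main__":
    for n in (8, 16, 36, 60):
        print(f"n = {n:2d}: power = {power_for_correlation(0.7, n):.2f}")
```

Under these assumed settings, power for a true correlation of 0.7 is below 0.3 with 16 subjects and around 0.9 with 36, which is broadly in line with the slide's message.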
Take-home message
Whole-brain cross-subject correlation mapping
with 16 subjects
does not work.
Need at least twice as many subjects.
Conclusions
Unless much larger numbers of subjects are used, whole-brain cross-subject correlation mapping suffers from either:
– very low power to detect true regions (if we carefully correct for multiple comparisons)
– very high rates of false-positive regions (otherwise)
If analysis is circular, selection bias is expected to be high here (because selection occurs among noisy estimates).
...in other words, it doesn't work.
Suggestions
• Design study to have enough power to detect realistic correlations. (Need either anatomical restrictions or large numbers of subjects.)
• Consider studying trial-to-trial rather than subject-to-subject effects.
• Correct for multiple testing to avoid false positives.
• Avoid circularity: Use leave-one-subject-out procedure to estimate regional cross-subject correlations.
• Report correlation estimates with error bars.
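The leave-one-subject-out suggestion can be sketched as follows. This is one possible reading of the procedure, not necessarily the one intended on the slide: for each subject, the peak region is selected using all other subjects, and the held-out subject's value at that region is recorded. The correlation is then computed across the held-out values. All names and sizes are assumptions, and the data are pure noise.

```python
import numpy as np


def pearson_columns(y, x):
    """Pearson correlation of vector y with every column of matrix x."""
    yc = y - y.mean()
    xc = x - x.mean(axis=0)
    return (yc @ xc) / (np.linalg.norm(yc) * np.linalg.norm(xc, axis=0))


def circular_vs_loso(n_subjects=16, n_regions=500, n_sims=100, seed=0):
    """Average circular and leave-one-subject-out correlations on null data."""
    rng = np.random.default_rng(seed)
    circular, loso = [], []
    for _ in range(n_sims):
        behavior = rng.standard_normal(n_subjects)
        brain = rng.standard_normal((n_subjects, n_regions))
        # Circular: select the peak region and report r from the same subjects.
        circular.append(pearson_columns(behavior, brain).max())
        # Leave-one-subject-out: select on n-1 subjects, read out the other one.
        held_out = np.empty(n_subjects)
        for s in range(n_subjects):
            keep = np.arange(n_subjects) != s
            peak = pearson_columns(behavior[keep], brain[keep]).argmax()
            held_out[s] = brain[s, peak]
        loso.append(np.corrcoef(held_out, behavior)[0, 1])
    return float(np.mean(circular)), float(np.mean(loso))


if __name__ == "__main__":
    circular, loso = circular_vs_loso()
    print(f"circular estimate:              {circular:.2f}")
    print(f"leave-one-subject-out estimate: {loso:.2f}")
```

On null data the circular estimate should come out high, while the leave-one-subject-out estimate should average near zero. Individual leave-one-out estimates remain noisy with 16 subjects, which is why reporting error bars matters as well.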