J. Daunizeau Brain and Spine Institute, Paris, France
description
Transcript of J. Daunizeau Brain and Spine Institute, Paris, France
J. Daunizeau
Brain and Spine Institute, Paris, FranceWellcome Trust Centre for Neuroimaging, London, UK
Bayesian inference
Overview of the talk
1 Probabilistic modelling and representation of uncertainty1.1 Bayesian paradigm1.2 Hierarchical models1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods2.1 Sampling methods2.2 Variational methods (ReML, EM, VB)
3 SPM applications3.1 aMRI segmentation3.2 Decoding of brain images3.3 Model-based fMRI analysis (with spatial priors)3.4 Dynamic causal modelling
Overview of the talk
1 Probabilistic modelling and representation of uncertainty1.1 Bayesian paradigm1.2 Hierarchical models1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods2.1 Sampling methods2.2 Variational methods (ReML, EM, VB)
3 SPM applications3.1 aMRI segmentation3.2 Decoding of brain images3.3 Model-based fMRI analysis (with spatial priors)3.4 Dynamic causal modelling
Degree of plausibility desiderata:- should be represented using real numbers (D1)- should conform with intuition (D2)- should be consistent (D3)
a=2b=5
a=2
• normalization:
• marginalization:
• conditioning :(Bayes rule)
Bayesian paradigmprobability theory: basics
Bayesian paradigmderiving the likelihood function
- Model of data with unknown parameters:
y f e.g., GLM: f X
- But data is noisy: y f
- Assume noise/residuals is ‘small’:
22
1exp2
p
4 0.05P
→ Distribution of data, given fixed parameters:
2
2
1exp2
p y y f
f
Likelihood:
Prior:
Bayes rule:
Bayesian paradigmlikelihood, priors and the model evidence
generative model m
Bayesian paradigmforward and inverse problems
,p y m
forward problem
likelihood
,p y m
inverse problem
posterior distribution
Principle of parsimony :« plurality should not be assumed without necessity »
y=f(x
)y
= f(
x)
x
“Occam’s razor” :
mod
el e
vide
nce
p(y|
m)
space of all data sets
Model evidence:
Bayesian paradigmmodel comparison
••• inference
causality
Hierarchical modelsprinciple
Hierarchical modelsdirected acyclic graphs (DAGs)
t t Y t *
0*P t t H
0p t H
0*P t t H if then reject H0
• estimate parameters (obtain test stat.)
H0 : 0• define the null, e.g.:
• apply decision rule, i.e.:
classical SPM
• define two alternative models, e.g.:
• apply decision rule, e.g.:
Bayesian Model Comparison
Frequentist versus Bayesian inferencea (quick) note on hypothesis testing
Y y
1p Y m
0p Y m
space of all datasets
if then accept m0
0
1
P m yP m y
0 0
1 1
1 if 0:
0 otherwise
: 0,
m p m
m p m N
Family-level inferencetrading model resolution against statistical power
A B A B
A B
u
A B
u
P(m1|y) = 0.04 P(m2|y) = 0.25
P(m2|y) = 0.7P(m2|y) = 0.01
1 1 max
0.3m
P e y P m y
model selection error risk:
Family-level inferencetrading model resolution against statistical power
A B A B
A B
u
A B
u
P(m1|y) = 0.04 P(m2|y) = 0.25
P(m2|y) = 0.7P(m2|y) = 0.01
1 1 max
0.3m
P e y P m y
model selection error risk:
P(f2|y) = 0.95P(f1|y) = 0.05
1 1 max
0.05f
P e y P f y
family inference(pool statistical evidence)
m f
P f y P m y
Overview of the talk
1 Probabilistic modelling and representation of uncertainty1.1 Bayesian paradigm1.2 Hierarchical models1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods2.1 Sampling methods2.2 Variational methods (ReML, EM, VB)
3 SPM applications3.1 aMRI segmentation3.2 Decoding of brain images3.3 Model-based fMRI analysis (with spatial priors)3.4 Dynamic causal modelling
Sampling methodsMCMC example: Gibbs sampling
Variational methodsVB / EM / ReML
→ VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint
1 or 2q
1 or 2 ,p y m
1 2, ,p y m
1
2
Overview of the talk
1 Probabilistic modelling and representation of uncertainty1.1 Bayesian paradigm1.2 Hierarchical models1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods2.1 Sampling methods2.2 Variational methods (ReML, EM, VB)
3 SPM applications3.1 aMRI segmentation3.2 Decoding of brain images3.3 Model-based fMRI analysis (with spatial priors)3.4 Dynamic causal modelling
realignment smoothing
normalisation
general linear model
template
Gaussian field theory
p <0.05
statisticalinference
segmentationand normalisation
dynamic causalmodelling
posterior probabilitymaps (PPMs)
multivariatedecoding
grey matter CSFwhite matter
…
…
yi ci
k
2
1
1 2 k
class variances
classmeans
ith voxelvalue
ith voxellabel
classfrequencies
aMRI segmentationmixture of Gaussians (MoG) model
Decoding of brain imagesrecognizing brain states from fMRI
+
fixation cross
>>
paceresponse
log-evidence of X-Y sparse mappings:effect of lateralization
log-evidence of X-Y bilateral mappings:effect of spatial deployment
fMRI time series analysisspatial priors and model comparison
PPM: regions best explainedby short-term memory model
PPM: regions best explained by long-term memory model
fMRI time series
GLM coeff
prior varianceof GLM coeff
prior varianceof data noise
AR coeff(correlated noise)
short-term memorydesign matrix (X)
long-term memorydesign matrix (X)
m2m1 m3 m4
V1 V5stim
PPC
attention
V1 V5stim
PPC
attention
V1 V5stim
PPC
attention
V1 V5stim
PPC
attention
m1 m2 m3 m4
15
10
5
0
V1 V5stim
PPC
attention
1.25
0.13
0.46
0.39 0.26
0.26
0.10estimated
effective synaptic strengthsfor best model (m4)
models marginal likelihoodln p y m
Dynamic Causal Modellingnetwork structure identification
1 2
31 2
31 2
3
1 2
3
time
( , , )x f x u
32
21
13
0t
t t
t t
tu
13u 3
u
DCMs and DAGsa note on causality
m1
m2
diffe
renc
es in
log-
mod
el e
vide
nces
1 2ln lnp y m p y m
subjects
fixed effect
random effect
assume all subjects correspond to the same model
assume different subjects might correspond to different models
Dynamic Causal Modellingmodel comparison for group studies
I thank you for your attention.
A note on statistical significancelessons from the Neyman-Pearson lemma
• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
1
0
p y Hu
p y H
is the most powerful test of size to test the null. 0p u H
MVB (Bayes factor) u=1.09, power=56%
CCA (F-statistics)F=2.20, power=20%
error I rate
1 - e
rror
II ra
te
ROC analysis
• what is the threshold u, above which the Bayes factor test yields a error I rate of 5%?