Post on 24-Feb-2016
J. Daunizeau
Brain and Spine Institute, Paris, France
Wellcome Trust Centre for Neuroimaging, London, UK
Bayesian inference
Overview of the talk
1. Probabilistic modelling and representation of uncertainty
   1.1 Bayesian paradigm
   1.2 Hierarchical models
   1.3 Frequentist versus Bayesian inference
2. Numerical Bayesian inference methods
   2.1 Sampling methods
   2.2 Variational methods (ReML, EM, VB)
3. SPM applications
   3.1 aMRI segmentation
   3.2 Decoding of brain images
   3.3 Model-based fMRI analysis (with spatial priors)
   3.4 Dynamic causal modelling
Degree of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)
Bayesian paradigm: probability theory basics
• normalization: ∫ p(x) dx = 1
• marginalization: p(y) = ∫ p(x, y) dx
• conditioning (Bayes rule): p(x|y) = p(x, y) / p(y)
Bayesian paradigm: deriving the likelihood function
- Model of data with unknown parameters θ: y = f(θ), e.g. GLM: f(θ) = Xθ
- But data is noisy: y = f(θ) + ε
- Assume noise/residuals ε are 'small': p(ε) ∝ exp(−ε²/(2σ²)), so large deviations are improbable (e.g. P(|ε| > 1.96σ) ≈ 0.05)
→ Distribution of data, given fixed parameters (the likelihood):
p(y|θ) ∝ exp(−(y − f(θ))²/(2σ²))
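A minimal numeric sketch of this Gaussian likelihood for a GLM. The design matrix `X`, true parameters and noise level are hypothetical illustration values, not taken from the slides:

```python
import numpy as np

# GLM: y = X @ theta + eps, with eps ~ N(0, sigma^2 I)  (hypothetical values)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])  # hypothetical design matrix
theta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ theta_true + sigma * rng.standard_normal(50)

def log_likelihood(theta, y, X, sigma):
    """ln p(y|theta) = -||y - X theta||^2 / (2 sigma^2) + normalizing constant."""
    resid = y - X @ theta
    n = len(y)
    return -0.5 * resid @ resid / sigma**2 - 0.5 * n * np.log(2 * np.pi * sigma**2)

ll_at_truth = log_likelihood(theta_true, y, X, sigma)
ll_at_zero = log_likelihood(np.zeros(2), y, X, sigma)
```

As expected, the log-likelihood is much higher near the parameters that generated the data than at an arbitrary point such as θ = 0.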
Bayesian paradigm: likelihood, priors and the model evidence
Likelihood: p(y|θ, m)
Prior: p(θ|m)
Bayes rule: p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)
Together, likelihood and prior define the generative model m.
Bayesian paradigm: forward and inverse problems
Forward problem (likelihood): p(y|θ, m)
Inverse problem (posterior distribution): p(θ|y, m)
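The inverse problem can be sketched numerically: evaluate likelihood × prior on a grid and normalize to obtain the posterior. The model below (Gaussian likelihood, standard-normal prior, observation y = 1.5) is a toy assumption for illustration:

```python
import numpy as np

# Forward model: p(y|theta, m) = N(theta, sigma^2); prior: p(theta|m) = N(0, 1).
# Inversion on a grid: p(theta|y, m) ∝ p(y|theta, m) p(theta|m).
theta = np.linspace(-5, 5, 2001)
d_theta = theta[1] - theta[0]
sigma, y_obs = 1.0, 1.5  # hypothetical noise level and observation

likelihood = np.exp(-0.5 * (y_obs - theta) ** 2 / sigma**2)
prior = np.exp(-0.5 * theta**2)
posterior = likelihood * prior
posterior /= posterior.sum() * d_theta  # normalize to a density

post_mean = (theta * posterior).sum() * d_theta
```

For this conjugate case the grid answer can be checked analytically: the posterior mean is y · τ²/(τ² + σ²) = 1.5 · 0.5 = 0.75.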
Bayesian paradigm: model comparison
Principle of parsimony: "plurality should not be assumed without necessity" (Occam's razor)
[Figure: model evidence p(y|m) plotted over the space of all data sets; a flexible model y = f(x) spreads its evidence thinly]
Model evidence: p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ
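The Occam effect built into the model evidence can be demonstrated by grid integration. The two "models" below differ only in prior width; the data value and noise level are hypothetical:

```python
import numpy as np

# Model evidence p(y|m) = ∫ p(y|theta, m) p(theta|m) dtheta, by grid integration.
# A flexible model (broad prior) spreads its evidence over many possible data
# sets, so for unremarkable data the simpler model wins (Occam's razor).
theta = np.linspace(-20, 20, 4001)
d_theta = theta[1] - theta[0]
sigma, y_obs = 1.0, 0.3  # hypothetical noise level and observation

def evidence(prior_sd):
    lik = np.exp(-0.5 * (y_obs - theta) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)
    prior = np.exp(-0.5 * (theta / prior_sd) ** 2) / np.sqrt(2 * np.pi * prior_sd**2)
    return (lik * prior).sum() * d_theta

ev_simple = evidence(prior_sd=1.0)    # tight prior
ev_complex = evidence(prior_sd=10.0)  # broad prior: pays an Occam penalty
```

Analytically the evidence is N(y; 0, σ² + τ²), so the tight-prior model's evidence is about 0.276 here versus roughly 0.040 for the broad-prior model.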
Hierarchical models: principle
[Figure: hierarchical generative model; causality runs down the hierarchy, inference runs back up it]
Hierarchical models: directed acyclic graphs (DAGs)
Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

Classical SPM:
• define the null, e.g. H0: θ = 0
• estimate parameters and obtain the test statistic t* = t(Y)
• apply the decision rule: if P(t ≥ t*|H0) ≤ α under the null distribution p(t|H0), then reject H0

[Figure: p(Y|m0) and p(Y|m1) plotted over the space of all data sets, with the observed data Y = y]

Bayesian model comparison:
• define two alternative models, e.g.
  m0: p(θ|m0) = δ(θ) (i.e. 1 if θ = 0, 0 otherwise)
  m1: p(θ|m1) = N(0, Σ)
• apply the decision rule: if P(m0|y) / P(m1|y) ≥ 1, then accept m0
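The two decision rules can disagree on the same data. The sketch below applies both to a hypothetical summary statistic (sample mean ȳ = 0.5 from n = 20 observations with unit noise); the prior width τ for m1 is likewise an assumption:

```python
from math import erfc, exp, log, pi, sqrt

# Same (hypothetical) data, two decision rules.
n, sigma, tau = 20, 1.0, 1.0
ybar = 0.5  # hypothetical observed sample mean

# Classical: two-sided z-test of H0: theta = 0
z = ybar / (sigma / sqrt(n))
p_value = erfc(abs(z) / sqrt(2))

# Bayesian: m0 fixes theta = 0; m1 puts theta ~ N(0, tau^2).
# The sample mean is sufficient: ybar ~ N(0, sigma^2/n) under m0,
# and ybar ~ N(0, tau^2 + sigma^2/n) under m1.
def log_norm_pdf(x, var):
    return -0.5 * log(2 * pi * var) - 0.5 * x * x / var

log_bf10 = log_norm_pdf(ybar, tau**2 + sigma**2 / n) - log_norm_pdf(ybar, sigma**2 / n)
post_m0 = 1.0 / (1.0 + exp(log_bf10))  # assuming equal prior model probabilities
```

Here the classical test rejects the null (p ≈ 0.025) while the posterior probability of m0 is still around 0.3, so the Bayesian rule does not treat the evidence against m0 as decisive.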
Family-level inference: trading model resolution against statistical power

[Figure: four candidate network models m1–m4, combining the presence or absence of the connections A→B and B→A, driven by an input u]

Posterior model probabilities: P(m1|y) = 0.04, P(m2|y) = 0.25, P(m3|y) = 0.70, P(m4|y) = 0.01
model selection error risk: P(e|y) = 1 − max_m P(m|y) = 0.3

Family inference (pool statistical evidence): P(f|y) = Σ_{m∈f} P(m|y)
P(f1|y) = 0.05, P(f2|y) = 0.95
family selection error risk: P(e|y) = 1 − max_f P(f|y) = 0.05
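The arithmetic of pooling evidence over families is simple enough to check directly. The model posteriors are the ones shown on the slide; the assignment of models to families (and the labels of the 0.70 and 0.01 entries) is an assumption consistent with the quoted family posteriors:

```python
# Posterior model probabilities from the slide (labels of the 0.70 and 0.01
# entries assumed); they sum to 1.
p_m = {"m1": 0.04, "m2": 0.25, "m3": 0.70, "m4": 0.01}
model_error_risk = 1 - max(p_m.values())  # 1 - 0.70 = 0.30

# Pool evidence over families: P(f|y) = sum of P(m|y) for m in f
# (family partition assumed so that P(f1|y) = 0.05 and P(f2|y) = 0.95)
families = {"f1": ["m1", "m4"], "f2": ["m2", "m3"]}
p_f = {f: sum(p_m[m] for m in ms) for f, ms in families.items()}
family_error_risk = 1 - max(p_f.values())  # 1 - 0.95 = 0.05
```

Pooling trades model resolution (which exact model?) for statistical power: the error risk drops from 0.30 to 0.05.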
Overview of the talk
1. Probabilistic modelling and representation of uncertainty
   1.1 Bayesian paradigm
   1.2 Hierarchical models
   1.3 Frequentist versus Bayesian inference
2. Numerical Bayesian inference methods
   2.1 Sampling methods
   2.2 Variational methods (ReML, EM, VB)
3. SPM applications
   3.1 aMRI segmentation
   3.2 Decoding of brain images
   3.3 Model-based fMRI analysis (with spatial priors)
   3.4 Dynamic causal modelling
Sampling methods: MCMC example (Gibbs sampling)
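A minimal Gibbs sampler, using a bivariate Gaussian as the target (a standard textbook example, not the one on the slide): each coordinate is drawn in turn from its full conditional given the current value of the other.

```python
import numpy as np

# Gibbs sampling for a bivariate Gaussian with correlation rho:
# alternately draw each coordinate from its full conditional.
rho = 0.8
rng = np.random.default_rng(0)
n_samples, burn_in = 20000, 1000

x1, x2 = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples[i] = x1, x2

kept = samples[burn_in:]
emp_rho = np.corrcoef(kept.T)[0, 1]
```

After burn-in the empirical correlation of the chain approaches the target value ρ = 0.8.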
Variational methods: VB / EM / ReML

→ VB: maximize the free energy F(q) with respect to the approximate posterior q(θ) under some simplifying constraint (e.g. mean field, Laplace), where F(q) = ⟨ln p(y, θ|m)⟩_q − ⟨ln q(θ)⟩_q ≤ ln p(y|m)

[Figure: mean-field factorization q(θ₁, θ₂) = q(θ₁) q(θ₂); each factor approximates the corresponding marginal of the joint p(θ₁, θ₂, y|m)]
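A compact mean-field VB sketch for a standard toy model (Gaussian data with unknown mean μ and precision τ, conjugate-style priors); the data, prior values and factorization q(μ, τ) = q(μ) q(τ) are illustration assumptions, not the SPM implementation:

```python
import numpy as np

# Mean-field VB for y_i ~ N(mu, 1/tau), with priors mu ~ N(mu0, 1/(lam0*tau))
# and tau ~ Gamma(a0, b0); iterate the coupled updates for q(mu) and q(tau).
rng = np.random.default_rng(2)
y = 3.0 + 0.5 * rng.standard_normal(200)  # hypothetical data
n, ybar = len(y), y.mean()

mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3  # vague priors
E_tau = 1.0  # initialization of E_q[tau]
for _ in range(50):
    # q(mu) = N(mu_n, 1/lam_n)
    mu_n = (lam0 * mu0 + n * ybar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    E_mu2 = mu_n**2 + 1.0 / lam_n  # E[mu^2] under q(mu)
    # q(tau) = Gamma(a_n, b_n), using expectations under q(mu)
    a_n = a0 + 0.5 * (n + 1)
    b_n = b0 + 0.5 * (lam0 * (E_mu2 - 2 * mu_n * mu0 + mu0**2)
                      + np.sum(y**2) - 2 * mu_n * np.sum(y) + n * E_mu2)
    E_tau = a_n / b_n
```

With vague priors the approximate posterior mean of μ converges to the sample mean, and E[τ] to the reciprocal sample variance.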
Overview of the talk
1. Probabilistic modelling and representation of uncertainty
   1.1 Bayesian paradigm
   1.2 Hierarchical models
   1.3 Frequentist versus Bayesian inference
2. Numerical Bayesian inference methods
   2.1 Sampling methods
   2.2 Variational methods (ReML, EM, VB)
3. SPM applications
   3.1 aMRI segmentation
   3.2 Decoding of brain images
   3.3 Model-based fMRI analysis (with spatial priors)
   3.4 Dynamic causal modelling
SPM applications
[Figure: the classical SPM pipeline (realignment, smoothing, normalisation to a template, general linear model, statistical inference via Gaussian field theory at p < 0.05) alongside its Bayesian extensions: segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, multivariate decoding]
aMRI segmentation: mixture of Gaussians (MoG) model
The i-th voxel value yᵢ is generated from the i-th voxel label cᵢ ∈ {1, …, k} (e.g. grey matter, white matter, CSF):
p(yᵢ|cᵢ = k) = N(μₖ, σₖ²),  p(cᵢ = k) = λₖ
with class means μₖ, class variances σₖ², and class frequencies λₖ.
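A 1-D toy stand-in for this segmentation model, fitted with EM: two intensity classes with their means, variances and frequencies estimated from simulated "voxel" values (all data and initial values are hypothetical):

```python
import numpy as np

# EM for a two-class 1-D mixture of Gaussians: a toy stand-in for the
# voxel-intensity model (class means, class variances, class frequencies).
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])

mu = np.array([-1.0, 1.0])   # class means (initial guess)
var = np.array([1.0, 1.0])   # class variances
lam = np.array([0.5, 0.5])   # class frequencies

for _ in range(100):
    # E-step: posterior label probabilities p(c_i = k | y_i)
    dens = lam * np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate class parameters from the responsibilities
    nk = resp.sum(axis=0)
    mu = (resp * y[:, None]).sum(axis=0) / nk
    var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    lam = nk / len(y)
```

The fitted class means recover the two intensity clusters, and the class frequencies recover the 30/70 mixture.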
Decoding of brain images: recognizing brain states from fMRI
[Figure: trial structure with a fixation cross (+) followed by a pace-response cue (>>)]
[Figure: log-evidence of X-Y sparse mappings (effect of lateralization) and of X-Y bilateral mappings (effect of spatial deployment)]
fMRI time series analysis: spatial priors and model comparison
[Figure: hierarchical model for the fMRI time series: GLM coefficients with a prior variance, data noise with a prior variance, AR coefficients (correlated noise), and two candidate design matrices (X) for short-term and long-term memory]
PPM: regions best explained by the short-term memory model; PPM: regions best explained by the long-term memory model
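The role of the prior variance on the GLM coefficients can be sketched with a Gaussian-prior (ridge-like) linear model; the design matrix, coefficients and variances below are illustration assumptions:

```python
import numpy as np

# Bayesian GLM with a Gaussian prior on the coefficients: the posterior trades
# off data fit (noise variance sigma2) against the prior variance alpha2 on
# the GLM coefficients (shrinkage).
rng = np.random.default_rng(4)
n, p = 100, 3
X = rng.standard_normal((n, p))          # hypothetical design matrix
beta_true = np.array([2.0, 0.0, -1.0])
sigma2 = 1.0
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)

alpha2 = 10.0  # prior variance of the GLM coefficients
# Posterior: N(m, S) with S = (X'X/sigma2 + I/alpha2)^-1 and m = S X'y / sigma2
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha2)
m = S @ X.T @ y / sigma2
```

With plenty of data the posterior mean sits close to the generating coefficients; shrinking `alpha2` pulls it toward zero, which is the mechanism the spatial priors exploit voxel-wise.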
Dynamic Causal Modelling: network structure identification
[Figure: four candidate models m1–m4, each a network of V1, V5 and PPC with the visual stimulus entering V1 and attention modulating a different connection in each model]
[Figure: bar plot of the models' marginal likelihoods ln p(y|m)]
[Figure: estimated effective synaptic strengths for the best model (m4), with values such as 1.25, 0.46, 0.39, 0.26, 0.13 and 0.10]
DCMs and DAGs: a note on causality
[Figure: a three-node network unrolled over time; the hidden states evolve as dx/dt = f(x, u, θ), with connection strengths θ₂₁, θ₃₂, θ₁₃ and an exogenous input u entering at time t₀]
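A linear sketch of the state equation dx/dt = f(x, u, θ) for a three-node loop, integrated with Euler steps. The connection strengths, input and time constants are hypothetical, and a full DCM would add a haemodynamic observation model on top:

```python
import numpy as np

# Linear dynamics f = A x + C u for a three-node network (1 -> 2 -> 3 -> 1).
A = np.array([[-1.0, 0.0, 0.3],   # theta_13: node 3 -> node 1
              [0.5, -1.0, 0.0],   # theta_21: node 1 -> node 2
              [0.0, 0.4, -1.0]])  # theta_32: node 2 -> node 3
C = np.array([1.0, 0.0, 0.0])     # exogenous input u drives node 1

dt, T = 0.01, 10.0
steps = int(T / dt)
x = np.zeros(3)
traj = np.empty((steps, 3))
for t in range(steps):
    u = 1.0 if t * dt < 1.0 else 0.0  # brief input pulse at the start
    x = x + dt * (A @ x + C * u)      # Euler integration step
    traj[t] = x
```

The input pulse excites node 1, the activity propagates around the loop through the directed connections, and the stable leak terms return the network to rest, which is the time-resolved notion of causality a DCM encodes.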
Dynamic Causal Modelling: model comparison for group studies
[Figure: differences in log-model evidences ln p(y|m1) − ln p(y|m2) across subjects]
• fixed effect: assume all subjects correspond to the same model
• random effect: assume different subjects might correspond to different models
Thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood-ratio (or Bayes factor) test
  Λ(y) = p(y|H1) / p(y|H0) ≥ u
  is the most powerful test of size α = P(Λ ≥ u|H0) for testing the null
• What is the threshold u above which the Bayes factor test yields a type I error rate of 5%?

[Figure: ROC analysis (1 − type II error rate versus type I error rate); MVB (Bayes factor): u = 1.09, power = 56%; CCA (F-statistic): F = 2.20, power = 20%]
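The threshold question can be answered by simulation under the null: draw data from H0, compute the likelihood ratio, and take its 95th percentile. The toy hypotheses below (unit-variance Gaussians with means 0 and 1) are assumptions for illustration, not the MVB setting:

```python
import numpy as np

# Monte Carlo calibration: simulate data under H0, compute the likelihood
# ratio p(y|H1)/p(y|H0), and take its 95th percentile as the threshold u
# giving a 5% type I error rate.  Toy setting: H0: y ~ N(0,1), H1: y ~ N(1,1).
rng = np.random.default_rng(5)
y0 = rng.standard_normal(100000)  # data simulated under H0

def likelihood_ratio(y, mu1=1.0):
    return np.exp(-0.5 * (y - mu1) ** 2) / np.exp(-0.5 * y**2)

lr = likelihood_ratio(y0)
u = np.quantile(lr, 0.95)
type1 = (lr >= u).mean()  # close to 0.05 by construction
```

In this toy case the ratio is exp(y − 0.5), so the threshold is about exp(1.645 − 0.5) ≈ 3.1; the same recipe calibrates any Bayes factor test.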