J. Daunizeau Brain and Spine Institute, Paris, France

J. Daunizeau

Brain and Spine Institute, Paris, FranceWellcome Trust Centre for Neuroimaging, London, UK

Bayesian inference

Overview of the talk

1 Probabilistic modelling and representation of uncertainty1.1 Bayesian paradigm1.2 Hierarchical models1.3 Frequentist versus Bayesian inference

2 Numerical Bayesian inference methods2.1 Sampling methods2.2 Variational methods (ReML, EM, VB)

3 SPM applications3.1 aMRI segmentation3.2 Decoding of brain images3.3 Model-based fMRI analysis (with spatial priors)3.4 Dynamic causal modelling

Degree of plausibility desiderata:- should be represented using real numbers (D1)- should conform with intuition (D2)- should be consistent (D3)

a=2b=5

a=2

• normalization:

• marginalization:

• conditioning :(Bayes rule)

Bayesian paradigmprobability theory: basics

Bayesian paradigmderiving the likelihood function

- Model of data with unknown parameters:

y f e.g., GLM: f X

- But data is noisy: y f

- Assume noise/residuals is ‘small’:

22

1exp2

p

4 0.05P

→ Distribution of data, given fixed parameters:

2

2

1exp2

p y y f

f

Likelihood:

Prior:

Bayes rule:

Bayesian paradigmlikelihood, priors and the model evidence

generative model m

Bayesian paradigmforward and inverse problems

,p y m

forward problem

likelihood

,p y m

inverse problem

posterior distribution

Principle of parsimony :« plurality should not be assumed without necessity »

y=f(x

)y

= f(

x)

x

“Occam’s razor” :

mod

el e

vide

nce

p(y|

m)

space of all data sets

Model evidence:

Bayesian paradigmmodel comparison

••• inference

causality

Hierarchical modelsprinciple

Hierarchical modelsdirected acyclic graphs (DAGs)

t t Y t *

0*P t t H

0p t H

0*P t t H if then reject H0

• estimate parameters (obtain test stat.)

H0 : 0• define the null, e.g.:

• apply decision rule, i.e.:

classical SPM

• define two alternative models, e.g.:

• apply decision rule, e.g.:

Bayesian Model Comparison

Frequentist versus Bayesian inferencea (quick) note on hypothesis testing

Y y

1p Y m

0p Y m

space of all datasets

if then accept m0

0

1

P m yP m y

0 0

1 1

1 if 0:

0 otherwise

: 0,

m p m

m p m N

Family-level inferencetrading model resolution against statistical power

A B A B

A B

u

A B

u

P(m1|y) = 0.04 P(m2|y) = 0.25

P(m2|y) = 0.7P(m2|y) = 0.01

1 1 max

0.3m

P e y P m y

model selection error risk:

Family-level inferencetrading model resolution against statistical power

A B A B

A B

u

A B

u

P(m1|y) = 0.04 P(m2|y) = 0.25

P(m2|y) = 0.7P(m2|y) = 0.01

1 1 max

0.3m

P e y P m y

model selection error risk:

P(f2|y) = 0.95P(f1|y) = 0.05

1 1 max

0.05f

P e y P f y

family inference(pool statistical evidence)

m f

P f y P m y

Sampling methodsMCMC example: Gibbs sampling

Variational methodsVB / EM / ReML

→ VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint

1 or 2q

1 or 2 ,p y m

1 2, ,p y m

1

2

realignment smoothing

normalisation

general linear model

template

Gaussian field theory

p <0.05

statisticalinference

segmentationand normalisation

dynamic causalmodelling

posterior probabilitymaps (PPMs)

multivariatedecoding

grey matter CSFwhite matter

…

…

yi ci

k

2

1

1 2 k

class variances

classmeans

ith voxelvalue

ith voxellabel

classfrequencies

aMRI segmentationmixture of Gaussians (MoG) model

Decoding of brain imagesrecognizing brain states from fMRI

+

fixation cross

>>

paceresponse

log-evidence of X-Y sparse mappings:effect of lateralization

log-evidence of X-Y bilateral mappings:effect of spatial deployment

fMRI time series analysisspatial priors and model comparison

PPM: regions best explainedby short-term memory model

PPM: regions best explained by long-term memory model

fMRI time series

GLM coeff

prior varianceof GLM coeff

prior varianceof data noise

AR coeff(correlated noise)

short-term memorydesign matrix (X)

long-term memorydesign matrix (X)

m2m1 m3 m4

V1 V5stim

PPC

attention

V1 V5stim

PPC

attention

V1 V5stim

PPC

attention

V1 V5stim

PPC

attention

m1 m2 m3 m4

15

10

5

0

V1 V5stim

PPC

attention

1.25

0.13

0.46

0.39 0.26

0.26

0.10estimated

effective synaptic strengthsfor best model (m4)

models marginal likelihoodln p y m

Dynamic Causal Modellingnetwork structure identification

1 2

31 2

31 2

3

1 2

3

time

( , , )x f x u

32

21

13

0t

t t

t t

tu

13u 3

u

DCMs and DAGsa note on causality

m1

m2

diffe

renc

es in

log-

mod

el e

vide

nces

1 2ln lnp y m p y m

subjects

fixed effect

random effect

assume all subjects correspond to the same model

assume different subjects might correspond to different models

Dynamic Causal Modellingmodel comparison for group studies

I thank you for your attention.

A note on statistical significancelessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test

1

0

p y Hu

p y H

is the most powerful test of size to test the null. 0p u H

MVB (Bayes factor) u=1.09, power=56%

CCA (F-statistics)F=2.20, power=20%

error I rate

1 - e

rror

II ra

te

ROC analysis

• what is the threshold u, above which the Bayes factor test yields a error I rate of 5%?

J. Daunizeau Brain and Spine Institute, Paris, France

Documents

Transcript of J. Daunizeau Brain and Spine Institute, Paris, France