1
Hidden Process Models
Rebecca Hutchinson
Joint work with Tom Mitchell and Indra Rustandi
2
Talk Outline
• fMRI (functional Magnetic Resonance Imaging) data
• Prior work on analyzing fMRI data
• HPMs (Hidden Process Models)
• Preliminary results
• HPMs and BodyMedia
3
functional MRI
4
fMRI Basics
• Safe and non-invasive
• Temporal resolution ~ 1 3D image every second
• Spatial resolution ~ 1 mm
– Voxels: 3mm x 3mm x 3-5mm
• Measures the BOLD response: Blood Oxygen Level Dependent
– Indirect indicator of neural activity
5
The BOLD response
• Ratio of deoxy-hemoglobin to oxy-hemoglobin (different magnetic properties).
• Also called hemodynamic response function (HRF).
• Common working assumption: responses sum linearly.
6
More on BOLD response
[Figure: typical BOLD response; signal amplitude vs. time (seconds)]
• At left is a typical BOLD response to a brief stimulation.
• (Here, subject reads a word, decides whether it is a noun or verb, and pushes a button in less than 1 second.)
7
8
Lots of features!
• 10,000-15,000 voxels per image
…
9
Study: Pictures and Sentences
• 13 normal subjects.
• 40 trials per subject.
• Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’.
• Images are acquired every 0.5 seconds.
[Figure: trial timeline. Labels: Read Sentence, View Picture, Fixation, Press Button, Rest. Time marks: t=0, 4 sec., 8 sec.]
10
The star is not below the plus.
11
12
+
---
*
13
.
14
fMRI Summary
• High-dimensional time series data.
• Considerable noise in the data.
• Typically a small number of examples (trials) compared with features (voxels).
• BOLD responses are assumed to sum linearly.
15
Talk Outline
• fMRI (functional Magnetic Resonance Imaging) data
• Prior work on analyzing fMRI data
• HPMs (Hidden Process Models)
• Preliminary results
• HPMs and BodyMedia
16
It’s not hopeless!
• Learning setting is tough, but we can do it!
• Feature selection is key.
• Learn fMRI(t,t+8)->{Picture,Sentence}
[Figure: trial timeline. Labels: Read Sentence, View Picture, Fixation, Press Button, Rest. Time marks: t=0, 4 sec., 8 sec.]
17
Results
Subject:   A      B      C      D      E      F      G
Accuracy:  71.3%  91.2%  76.2%  96.3%  85.0%  66.2%  71.3%
Subject:   H      I      J      K      L      M      Avg.
Accuracy:  95.0%  81.2%  90.0%  85.0%  65.0%  90.0%  81.8%
• Gaussian Naïve Bayes Classifier.
• 95% confidence intervals per subject are +/- 10%-15%.
• Accuracy of default classifier is 50%.
• Feature selection: Top 240 most active voxels in brain.
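The classifier setup on this slide can be sketched roughly as follows: a Gaussian Naive Bayes classifier trained on the most active voxels. This is only an illustration on synthetic data. The activity score (mean absolute signal), the function names, and the data sizes are assumptions and do not come from the study.

```python
import numpy as np

def select_active_voxels(X, k):
    """Return indices of the k voxels with the largest mean absolute signal.
    The activity score is a hypothetical stand-in; the talk does not define one."""
    score = np.abs(X).mean(axis=0)
    return np.argsort(score)[-k:]

def fit_gnb(X, y):
    """Estimate a per-class mean and standard deviation for every voxel."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) + 1e-6 for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, stds, priors

def predict_gnb(model, X):
    """Pick the class with the highest log posterior (features treated as independent)."""
    classes, means, stds, priors = model
    z = (X[:, None, :] - means[None]) / stds[None]          # (trials, classes, voxels)
    log_lik = -0.5 * (z ** 2).sum(axis=2) - np.log(stds).sum(axis=1)[None]
    return classes[np.argmax(log_lik + np.log(priors)[None], axis=1)]

# Synthetic stand-in data: 80 trials, 500 voxels, 20 of them informative.
rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 500
y = np.repeat([0, 1], n_trials // 2)        # 0 = Picture, 1 = Sentence
X = rng.normal(size=(n_trials, n_voxels))
X[y == 1, :20] += 3.0

idx = select_active_voxels(X, k=50)
model = fit_gnb(X[:, idx], y)
train_acc = (predict_gnb(model, X[:, idx]) == y).mean()
print("training accuracy:", train_acc)
```

The number printed is training accuracy on synthetic data, so it says nothing about the per-subject accuracies reported above; a real evaluation would hold out trials.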
18
Why is this interesting?
• Cognitive architectures like ACT-R and 4CAPS predict cognitive processes involved in tasks, along with cortical regions associated with the processes.
• Machine learning can contribute to these architectures by linking their predictions to empirical fMRI data.
19
Other Successes
• We can distinguish between 12 semantic categories of words (e.g. tools vs. buildings).
• We can train classifiers across multiple subjects.
20
What can’t we do?
• Take into account that the responses for Picture and Sentence overlap.
• What does the response for Decide look like and when does it start?
[Figure: trial timeline. Labels: Read Sentence, View Picture, Fixation, Press Button, Rest. Time marks: t=0, 4 sec., 8 sec.]
21
Talk Outline
• fMRI (functional Magnetic Resonance Imaging) data
• Prior work on analyzing fMRI data
• HPMs (Hidden Process Models)
• Preliminary results
• HPMs and BodyMedia
22
Motivation
• Overlapping processes
– The responses to Picture and Sentence could overlap in space and/or time.
• Hidden processes
– Decide does not directly correspond to the known stimuli.
• Move to a temporal model.
23
Hidden Markov Models?
• Can’t do overlapping processes – states are mutually exclusive.
• Markov assumption: given $state_{t-1}$, $state_t$ is independent of everything before t-1.
• BOLD response: Not Markov!
[Figure: HMM over time steps t-1, t, t+1, t+2, with hidden variable CogProc in {Picture, Sentence, Decide} and observed fMRI]
24
factorial HMMs?
• Have more flexibility than we need.
– Picture state sequence should not be {0 1 0 1 0 1 0 1…}
• Still have Markov assumption problem.
[Figure: factorial HMM over time steps t-1, t, t+1, t+2, with hidden variables Picture = {0,1}, Sentence = {0,1}, Decide = {0,1} and observed fMRI]
25
Hidden Process Models
Processes:
– Name: Read sentence; Process ID: 1; Response: [plot]
– Name: View Picture; Process ID: 2; Response: [plot]
– Name: Decide whether consistent; Process ID: 3; Response: [plot]
Process Instances: Process ID = 3, Process ID = 2, Process ID = 1, Process ID = 1
Observed fMRI: cortical region 1, cortical region 2 [plots]
26
HPM Parameters
• Set of processes, each of which has:
– a process ID
– a maximum response duration R
– emission weights for each voxel v: [W(v,1),…,W(v,t),…,W(v,R)]
– a multinomial distribution over possible start times within a trial [1,…,t,…,T]
• Set of standard deviations, one for each voxel: $[\sigma_1,\dots,\sigma_v,\dots,\sigma_V]$
27
Interpreting data with HPMs
• Data Interpretation (int)
– Set of process instances, each of which has:
  • a process ID
  • a start time S
• To predict fMRI data using an HPM and int:
– For each active process, add the response associated with its processID to the prediction.
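The prediction rule above can be sketched in a few lines of NumPy. The data layout is an assumption made for illustration: a dict from process ID to an (R, V) response array, and a list of (process ID, start) pairs with 0-based start indices, whereas the slides count start times from 1.

```python
import numpy as np

def predict_fmri(responses, instances, n_timepoints):
    """Sum the response of every process instance into a (T, V) prediction.

    responses: dict mapping process ID -> array of shape (R, V)
    instances: list of (process_id, start) pairs; start is a 0-based image index
    """
    n_voxels = next(iter(responses.values())).shape[1]
    prediction = np.zeros((n_timepoints, n_voxels))
    for process_id, start in instances:
        if start >= n_timepoints:
            continue  # instance starts after the scan ends
        W = responses[process_id]
        end = min(start + W.shape[0], n_timepoints)  # clip at the end of the scan
        prediction[start:end] += W[: end - start]    # overlapping responses add up
    return prediction

# Toy example: one voxel, two processes whose responses overlap at t=1.
responses = {1: np.array([[1.0], [2.0]]), 2: np.array([[10.0], [10.0]])}
instances = [(1, 0), (2, 1)]
prediction = predict_fmri(responses, instances, n_timepoints=4)
print(prediction.ravel())  # values 1, 12, 10, 0
```

At t=1 both instances are active, so the predicted value is 2 + 10 = 12, which is the linear-summation assumption at work.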
28
Synthetic Data Example
Process responses: Process 1, Process 2, Process 3 [plots]
Process instances:
– ProcessID=1, S=1
– ProcessID=2, S=17
– ProcessID=3, S=21
Predicted data [plot]
29
Our Assumptions
• Processes, not states.
– One hidden variable: process start time.
• Known number of processes in the model.
– e.g. Picture, Sentence, Decide: 3 processes
• Known number of instantiations of those processes.
– e.g. numTrials*3 processes
• Each process has a unique signature.
• Contributions of overlapping processes to the same output variable sum linearly.
30
The generative model
• Together HPM and interpretation (int) define a probability distribution over sequences of fMRI images:
where
$$P(y_{v,t} \mid hpm, int) = \mathcal{N}(\mu_{v,t}, \sigma_v)$$
$$\mu_{v,t} = \sum_{i \,\in\, \text{active process instances}} W_{i.procID}(v,\, t - start(i))$$
31
Inference
• Given:
– An HPM
– A set of data interpretations (int) of processIDs and start times
– Priors over the interpretations
• $P(int=i \mid Y) \propto P(Y \mid int=i)\,P(int=i)$
Choose the interpretation i with the highest probability.
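The inference rule above could be sketched as follows, assuming independent Gaussian noise per voxel as in the generative model. The function names, the random responses, the noise level, and the 0-based start times are illustrative assumptions; the two candidate interpretations mirror the start times in the synthetic example, shifted to 0-based indices.

```python
import numpy as np

def predict(responses, instances, T):
    """Sum the responses of all process instances into a (T, V) mean."""
    V = next(iter(responses.values())).shape[1]
    mu = np.zeros((T, V))
    for pid, start in instances:
        if start >= T:
            continue
        W = responses[pid]
        end = min(start + W.shape[0], T)
        mu[start:end] += W[: end - start]
    return mu

def log_likelihood(Y, mu, sigma):
    """log P(Y | hpm, int) under independent Gaussians with per-voxel sigma."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((Y - mu) / sigma) ** 2)

def posterior(Y, responses, sigma, interpretations, priors):
    """Normalized P(int=i | Y), proportional to P(Y | int=i) P(int=i)."""
    log_post = np.array([
        log_likelihood(Y, predict(responses, inst, Y.shape[0]), sigma) + np.log(p)
        for inst, p in zip(interpretations, priors)
    ])
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

rng = np.random.default_rng(0)
T, R, V = 40, 10, 2
responses = {pid: rng.normal(size=(R, V)) for pid in (1, 2, 3)}  # made-up responses
sigma = np.full(V, 0.1)
interp_1 = [(1, 0), (2, 16), (3, 20)]
interp_2 = [(2, 0), (1, 16), (3, 22)]

# Simulate observed data from interpretation 1, then try to recover it.
Y = predict(responses, interp_1, T) + sigma * rng.normal(size=(T, V))
post = posterior(Y, responses, sigma, [interp_1, interp_2], priors=[0.5, 0.5])
best = int(np.argmax(post))
print("chosen interpretation:", best + 1)
```

Working in log space avoids underflow, since the likelihood is a product over every voxel and time point.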
32
Synthetic Data Example
Observed data [plot]
Interpretation 1:
– ProcessID=1, S=1
– ProcessID=2, S=17
– ProcessID=3, S=21
Prediction 1 [plot]
Interpretation 2:
– ProcessID=2, S=1
– ProcessID=1, S=17
– ProcessID=3, S=23
Prediction 2 [plot]
33
Learning the Model
• EM (Expectation-Maximization) algorithm
• E-step
– Estimate a conditional distribution over the start times of the process instances given the observed data, P(S|fMRI).
• M-step
– Use the distribution from the E-step to get maximum-likelihood estimates of the HPM parameters (the start-time distributions, the emission weights W, and the standard deviations $\sigma$).
34
More on the E-step
• The start times of the process instances are not necessarily conditionally independent given the data.
– Must consider joint configurations.
– With no constraints, $T^{nInstances}$ configurations.
– $2000^{120}$ configurations for a typical experiment.
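To get a feel for the size of that number, the count can be computed exactly with Python integers. The reading used here is an interpretation of the slide's figures, not something the slide states outright: T = 2000 time points and 120 process instances (40 trials with 3 processes each).

```python
n_timepoints = 2000   # assumed reading of T from the slide
n_instances = 120     # assumed: 40 trials x 3 processes

# Every instance may start at any time point, so the counts multiply.
n_configs = n_timepoints ** n_instances
n_digits = len(str(n_configs))
print("decimal digits in the configuration count:", n_digits)
```

A count with hundreds of decimal digits rules out enumerating the joint configurations directly.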
• Can we consider a smaller set of start time configurations?
35
Reducing complexity
• Prior knowledge
– Landmarks
  • Events with known timing that “trigger” processes.
  • One per process instance.
– Offsets
  • The interval of possible delays from a landmark to a process instance onset.
  • One vector of n offsets per process.
• Conditional independencies
– Introduced when no process instance could be active.
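A small sketch of how landmarks and offsets shrink the search space: each process instance may only start at its landmark time plus one of its process's allowed offsets. The offset sets are the ones used in this talk's Prior Knowledge slide; the landmark times and the choice of landmark for Decide are hypothetical.

```python
from itertools import product

def start_time_configs(landmarks, offsets):
    """Enumerate joint start-time configurations.

    landmarks: list of (process_name, landmark_time), one per process instance
    offsets:   dict mapping process_name -> list of allowed delays
    Returns a list of tuples with one start time per instance.
    """
    choices = [[t + d for d in offsets[name]] for name, t in landmarks]
    return list(product(*choices))

offsets = {"Sentence": [0, 1], "Picture": [0, 1], "Decide": [0, 1, 2, 3]}
# Hypothetical landmark times (in images) for a single trial.
landmarks = [("Sentence", 0), ("Picture", 16), ("Decide", 16)]

configs = start_time_configs(landmarks, offsets)
print(len(configs))  # 2 * 2 * 4 = 16 configurations for this trial
```

With these offsets a trial has 16 candidate configurations instead of one per combination of time points.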
36
Before Prior Knowledge
Cognitive processes: Decide whether consistent; Read sentence; View picture
Observed fMRI: cortical region 1; cortical region 2
37
Prior Knowledge
Cognitive processes: Decide whether consistent; Read sentence; View picture
Observed fMRI: cortical region 1; cortical region 2
Landmarks (Stimuli): Sentence Presentation; Picture Presentation
Sentence offsets = {0,1}
Picture offsets = {0,1}
Decide offsets = {0,1,2,3}
Landmarks go to process instances.
Offset values are determined by process IDs.
38
Conditional Independencies
[Figure: two consecutive groups of process instances. First group: Decide whether consistent, Read sentence, View picture, with landmarks (stimuli) Sentence Presentation and Picture Presentation, Sentence offsets = {0,1}, Picture offsets = {0,1}, Decide offsets = {0,1,2,3}. Second group: Read sentence, View picture, with landmark Sentence Presentation and Sentence offsets = {0,1}. Observed fMRI is shown for cortical regions 1 and 2. A point in the diagram is marked "HERE".]
39
More on the M-step
• Weighted least squares procedure
– exact, but may become intractable for large problems
– weights are the probabilities computed in the E-step
• Gradient ascent procedure
– approximate, but may be necessary when exact method is intractable
– derivatives of the expected log likelihood of the data with respect to the parameters
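One way the weighted least squares update could look is sketched below, under assumptions not spelled out in the talk: the E-step has produced a probability q for each of a small set of start-time configurations, and the noise is Gaussian per voxel. Each configuration gets an indicator design matrix, and the normal equations are accumulated with the E-step probabilities as weights. The function names and toy data are illustrative.

```python
import numpy as np

def design_matrix(instances, durations, T):
    """Indicator matrix X with X[t, col(p, r)] = number of instances of
    process p for which t - start == r, so that the prediction is X @ W."""
    col0 = np.concatenate([[0], np.cumsum(durations)])
    X = np.zeros((T, col0[-1]))
    for p, start in instances:
        for r in range(durations[p]):
            if start + r < T:
                X[start + r, col0[p] + r] += 1.0
    return X

def m_step(Y, configs, q, durations):
    """Weighted least squares for the stacked response weights W and per-voxel sigma."""
    T = Y.shape[0]
    designs = [design_matrix(c, durations, T) for c in configs]
    A = sum(w * X.T @ X for w, X in zip(q, designs))   # weighted normal equations
    b = sum(w * X.T @ Y for w, X in zip(q, designs))
    W = np.linalg.lstsq(A, b, rcond=None)[0]
    var = sum(w * np.mean((Y - X @ W) ** 2, axis=0) for w, X in zip(q, designs))
    return W, np.sqrt(var)

# Toy problem: 2 processes (indices 0 and 1), duration 3 each, 2 voxels, 14 images.
rng = np.random.default_rng(0)
durations = [3, 3]
config_a = [(0, 0), (1, 1), (0, 6), (1, 9)]    # (process index, 0-based start)
config_b = [(0, 0), (1, 2), (0, 6), (1, 10)]
W_true = rng.normal(size=(sum(durations), 2))
Y = design_matrix(config_a, durations, 14) @ W_true + 0.01 * rng.normal(size=(14, 2))

W_hat, sigma_hat = m_step(Y, [config_a, config_b], [0.9, 0.1], durations)
print(np.round(W_hat - W_true, 2))
```

The matrix A has one row and column per (process, time-since-onset) pair, and the sums run over every configuration with nonzero probability, which is where the cost grows for large problems.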
40
Talk Outline
• fMRI (functional Magnetic Resonance Imaging) data
• Prior work on analyzing fMRI data
• HPMs (Hidden Process Models)
• Preliminary results
• HPMs and BodyMedia
41
Preliminary Results
[Figure: trial timeline. Labels: View Picture or Read Sentence; Fixation; Read Sentence or View Picture; Press Button; Rest. Time marks: t=0, 4 sec., 8 sec., 16 sec.]
GNB: picture or sentence? picture or sentence?
HPM: picture or sentence? picture or sentence?
42
GNB vs. HPM Classification
• GNB: non-overlapping processes
• HPM: simultaneous classification of multiple overlapping processes
• Average improvement of 15% in classification error using HPM vs GNB
• E.g., for one subject
– GNB classification error: 0.14
– HPM classification error: 0.09
43
Learned models
[Figure: learned responses for Comprehend sentence and Comprehend picture; trial 25]
44
Model selection experiments
• Model with 2 or 3 cognitive processes?
– How would we know ground truth?
– Cross validated data likelihood P(testData | HPM)
  • Better with 3 processes than 2
– Cross validated classification accuracy
  • Better with 3 processes than 2
45
Current work and challenges
• Add temporal and/or spatial smoothness constraints.
• Feature selection for HPMs.
• Process libraries, hierarchies.
• Process parameters (e.g. sentence negated or not).
• Model process interactions.
• Scaling parameters for response amplitudes to model habituation effects.
46
Talk Outline
• fMRI (functional Magnetic Resonance Imaging) data
• Prior work on analyzing fMRI data
• HPMs (Hidden Process Models)
• Preliminary results
• HPMs and BodyMedia
47
One idea…
Sensor 1:
Sensor 2:
Processes:
– Name: Riding bus; Process ID: 1; Response: [plot]
– Name: Eating; Process ID: 2; Response: [plot]
– Name: Walking; Process ID: 3; Response: [plot]
Process instances: ProcessID=3, ProcessID=2, ProcessID=1
Observed data: [plot]
48
Some questions
• What processes are interesting?
• What granularity/duration would processes have?
• What would landmarks be?
• Variable process durations needed?
• Better way to parameterize process signatures?