CMU SCS
Mining Large Graphs
Christos Faloutsos, CMU
(c) 2014, C. Faloutsos 2
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs
• Conclusions
CMU SCS 15649
Graphs - why should we care?
'YahooWeb graph': ~1B nodes (web sites), ~6B edges (http links)
U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs. PAKDD 2014, Tainan, Taiwan.
>$10B; ~1B users
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Graphs - why should we care?
• web-log ('blog') news propagation
• computer network security: email/IP traffic and anomaly detection
• recommendation systems
• ...
• many-to-many DB relationship -> graph
Motivating problems
• P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
[Figure: adjacency tensor with 'destination' and 'time' axes]
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs
• Conclusions
Part 1: Patterns & fraud detection
Laws and patterns • Q1: Are real graphs random?
• A1: NO!!
– Diameter ('6 degrees'; 'Kevin Bacon')
– in- and out-degree distributions
– other (surprising) patterns
• So, let's look at the data
Solution #S.1
• Power law in the degree distribution [FFF, SIGCOMM99]
[Log-log plot: degree vs. rank for internet domains; extremes att.com, ibm.com]
• Slope of the log(degree) vs. log(rank) plot: -0.82
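As a sanity check, a rank-plot slope like the -0.82 above can be estimated in a few lines. The data here is a synthetic Pareto sample, not the actual internet-domain graph; the 0.8 exponent is chosen only to mimic the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heavy-tailed "degrees": if U ~ Uniform(0,1), U**-0.8 is Pareto-like,
# so the rank plot is a straight line with slope ~ -0.8 in log-log space.
degrees = (1.0 / rng.random(10_000)) ** 0.8

# Rank plot: sort degrees in decreasing order, fit a line in log-log space.
ranks = np.arange(1, len(degrees) + 1)
sorted_deg = np.sort(degrees)[::-1]
slope, intercept = np.polyfit(np.log10(ranks), np.log10(sorted_deg), 1)
print(round(slope, 2))   # close to the -0.8 exponent we planted
```

The fitted slope is the rank exponent; on real graphs one would fit the same way on the observed degree sequence.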
• Q: So what?
• A1: # of two-step-away pairs = friends of friends (F.O.F.)
• If everyone had ~100 friends: 100^2 * N = 10 trillion F.O.F. pairs (N = 1B)
• But with a power-law degree distribution: # of two-step-away pairs is O(d_max^2) ~ (10M)^2
• ~0.8PB -> a data center(!) [photo: DCO @ CMU]
• 'Gaussian trap': the average degree is misleading; Max >> Avg
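The gap between the two estimates can be made concrete. The number of length-2 paths is roughly the sum of squared degrees, so a heavy tail dominates it. The sketch below is synthetic (a smaller N and a continuous Zipf-like degree sequence, rescaled to the same average degree of 100; the 0.82 exponent echoes the rank plot):

```python
import numpy as np

N = 1_000_000            # smaller stand-in for the 1B-node graph
avg_deg = 100

# "Gaussian" view: everyone has ~100 friends.
uniform = np.full(N, float(avg_deg))

# Power-law view: Zipf-like degrees, rescaled to the SAME average degree.
ranks = np.arange(1, N + 1)
powerlaw = ranks ** -0.82
powerlaw *= avg_deg * N / powerlaw.sum()

# of length-2 paths ("friends of friends") ~ sum of d_i^2
fof_uniform = np.sum(uniform ** 2)
fof_power = np.sum(powerlaw ** 2)

print(f"uniform degrees  : {fof_uniform:.2e} two-step pairs")
print(f"power-law degrees: {fof_power:.2e} two-step pairs (d_max ~ {powerlaw.max():.0f})")
```

With the same average degree, the power-law sequence yields orders of magnitude more two-step pairs, driven almost entirely by the hub with d_max in the millions.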
Observation – big-data:
• O(N^2) algorithms are ~intractable for N = 1B
• N^2 seconds = 31B years (> 2x age of universe)
[Figure: 1B x 1B adjacency matrix]
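The "31B years" figure follows from simple arithmetic (13.8B years for the age of the universe is the usual estimate):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # ~3.16e7 seconds
N = 1e9                                 # 1B nodes

years = N**2 / SECONDS_PER_YEAR         # N^2 seconds, expressed in years
age_of_universe = 13.8e9                # years

print(f"{years:.3g} years")             # ~3.17e10, i.e. ~31B years
print(years / age_of_universe)          # more than 2x the age of the universe
```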
Solution #S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix ('eig()'; A x = λ x)
[Log-log plot: eigenvalue vs. rank of decreasing eigenvalue, May 2001 data; exponent = slope, E = -0.48]
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns: Degree; Triangles
– Anomaly/fraud detection
– Graph understanding
• Part#2: time-evolving graphs; tensors
• Part#3: Cascades and immunization
• Conclusions
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles?
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Log-log plots for SN, Reuters, Epinions; X-axis: degree, Y-axis: mean # triangles]
• n friends -> ~n^1.6 triangles
Triangle Law: Computations [Tsourakakis ICDM 2008]
• But: triangles are expensive to compute (3-way join; several approx. algos): O(d_max^2)
• Q: Can we do that quickly? A: Yes!
• #triangles = (1/6) * Sum_i ( λ_i^3 )   (where A x = λ x)
• and, because of the skewness (S.2), we only need the top few eigenvalues! O(E)
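The eigenvalue identity is easy to verify on a toy graph. Here all eigenvalues are used, so the count is exact; the speedup in the paper comes from truncating to the top few:

```python
import numpy as np

# Adjacency matrix of a small undirected graph:
# a triangle (0,1,2) plus a pendant node 3 attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Identity: #triangles = (1/6) * sum of eigenvalues cubed
eigvals = np.linalg.eigvalsh(A)                # symmetric -> real eigenvalues
n_tri_eig = int(round(np.sum(eigvals ** 3) / 6))

# Cross-check: trace(A^3)/6, since A^3's diagonal counts each triangle 6 times
n_tri_trace = int(round(np.trace(A @ A @ A) / 6))

print(n_tri_eig, n_tri_trace)   # 1 1
```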
Triangle counting for large graphs?
• Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD'11]
MORE Graph Patterns
RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.
• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
– Graph understanding
• Part#2: time-evolving graphs; tensors
• Part#3: Cascades and immunization
• Conclusions
Fraud
• Given: who 'likes' what page, and when
• Find: suspicious users and suspicious products
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013.
[Figure: bipartite 'Likes' graph; users A, B, C, … on one side, pages 1, 2, 3, … on the other]
Our intuition
▪ Lockstep behavior: same Likes, same time
[Figure: users × pages 'Likes' matrix, cells colored by Like time (0–90)]
[Figure: after reordering users and pages, the same matrix shows a dense block of Likes at similar times]
[Figure: the dense block of same-time Likes is labeled 'Suspicious Lockstep Behavior']
MapReduce Overview
▪ Use Hadoop to search for many clusters in parallel:
1. Start with random seeds
2. Update the set of pages and the center Like times for each cluster
3. Repeat until convergence
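The seed/update/converge loop can be sketched on toy data. This is a single-machine illustration only, not the actual CopyCatch algorithm: the data, thresholds (`dt`, `min_pages`), and the `find_lockstep` helper are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: like_time[u, p] = time user u liked page p; NaN = no like.
n_users, n_pages = 200, 50
like_time = np.full((n_users, n_pages), np.nan)
mask = rng.random((n_users, n_pages)) < 0.05
like_time[mask] = rng.uniform(0, 90, mask.sum())

# Inject a lockstep block: users 0-19 like pages 0-4 at nearly the same times.
block_times = rng.uniform(40, 50, 5)
like_time[:20, :5] = block_times + rng.normal(0, 0.5, (20, 5))

def find_lockstep(like_time, pages, dt=2.0, min_pages=4, iters=10):
    """Alternate between (a) keeping users whose Like times on the target
    pages fall within dt of the current centers, and (b) re-centering."""
    centers = np.nanmedian(like_time[:, pages], axis=0)   # robust seed
    users = np.array([], dtype=int)
    for _ in range(iters):
        close = np.abs(like_time[:, pages] - centers) <= dt   # NaN -> False
        users = np.where(close.sum(axis=1) >= min_pages)[0]
        if len(users) == 0:
            break
        centers = np.nanmedian(like_time[np.ix_(users, pages)], axis=0)
    return users, centers

users, centers = find_lockstep(like_time, pages=np.arange(5))
print(sorted(users))   # recovers (roughly) the injected users 0..19
```

The real system runs this search for many seeds in parallel as MapReduce jobs and also updates the page set, which this sketch keeps fixed.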
Deployment at Facebook
▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)
[Plot: '3 months of CopyCatch @ Facebook': number of users caught vs. date of CopyCatch run, 08/25 to 12/01]
Deployment at Facebook
[Pie chart, in legend order: Fake Accounts (23%), Malicious Browser Extensions (58%), OS Malware (5%), Credential Stealing (9%), Social Engineering (5%)]
• Manually labeled 22 randomly selected clusters from February 2013
• Most clusters (77%) come from real but compromised users
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
– Graph understanding
• Part#2: time-evolving graphs
• Conclusions
Wikipedia - editors
• Nodes: editors
• Edge A -> B: 'A' changed 'B'
• Any pattern?
VoG: Summarizing and Understanding Large Graphs
Danai Koutra, U Kang, Jilles Vreeken, Christos Faloutsos. SDM 2014, Philadelphia, PA, April 2014.
Code: www.cs.cmu.edu/~dkoutra/CODE/vog.tar
Wiki Controversial Article
• Stars: admins, bots, heavy users
• Bipartite cores: edit wars (Kiev vs. Kyiv; vandals vs. admins)
VoG: Summarizing Graphs using Rich Vocabularies
Main ideas:
(1) use a 'vocabulary' of subgraph types
(2) use Minimum Description Length (MDL), with the above vocabulary, to summarize the graph
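To make the MDL idea concrete, here is a toy cost comparison; the encodings below are made up for illustration and are not VoG's actual model. Describing a star by naming its hub and spokes costs roughly half the bits of listing its edges one by one, so MDL prefers the 'star' label:

```python
import math

def bits_edge_list(num_edges, n):
    # Naive encoding: each edge as a pair of node ids, log2(n) bits each.
    return num_edges * 2 * math.log2(n)

def bits_star(num_spokes, n):
    # Structure encoding (illustrative): hub id + spoke ids.
    return math.log2(n) + num_spokes * math.log2(n)

n = 1_000_000        # nodes in the whole graph
spokes = 500         # a 500-edge star

plain = bits_edge_list(spokes, n)
star = bits_star(spokes, n)
print(plain, star)   # the star encoding takes about half the bits
```

The same comparison, done per candidate subgraph over a rich vocabulary, is what drives the summary: a structure is kept only when naming it compresses the graph.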
Summary of Part#1
• *many* patterns in real graphs
– power laws everywhere
– Gaussian trap: Avg << Max
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– P2.2: with side information ('coupled' M.T.F.)
– Speed
• Part#3: Cascades and immunization
• Conclusions
Part 2: Time-evolving graphs; tensors
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
[Figure: caller × callee × time (Mon, Tue, …) tensor of phone calls]
• Problem #2.1': author × keyword × date
– Given author-keyword-date triplets
– Find patterns / anomalies
• MANY more settings, with >2 'modes'
• Problem #2.1'': subject × verb × object
– Given subject-verb-object facts
– Find patterns / anomalies
• Problem #2.1''':
– Given <triplets>
– Find patterns / anomalies
• MANY more settings, with >2 'modes' (and 4, 5, etc. modes)
Graphs & side info
• Problem #2.2: coupled (e.g., side info)
– Given subject-verb-object facts
• and voxel activity for each subject-word pair
– Find patterns / anomalies
[Figure: subject × object tensor coupled with an fMRI voxel-activity matrix, e.g. 'apple tastes sweet']
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– P2.2: with side information ('coupled' M.T.F.)
– Speed
• Part#3: Cascades and immunization
• Conclusions
Answer to both: tensor factorization
• Recall: (SVD) matrix factorization finds blocks
[Figure: N users × M products matrix ~ sum of rank-1 blocks: 'meat-eaters' × 'steaks' + 'vegetarians' × 'plants' + 'kids' × 'cookies']
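A small numeric example of "SVD finds blocks", on synthetic data (the block sizes and intensities are invented; the second block is made weaker so the two singular values are distinct):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy users x products matrix with two "blocks":
# users 0-9 buy products 0-4, users 10-19 buy products 5-9 (weaker block).
X = np.zeros((20, 10))
X[:10, :5] = 1.0
X[10:, 5:] = 0.6
X += 0.01 * rng.standard_normal(X.shape)   # small noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)

print(np.round(s[:3], 2))   # two dominant singular values, the rest ~0
# The leading singular vector localizes on the stronger user group:
print(np.round(U[:, 0], 2))
```

Each dominant (singular value, left vector, right vector) triple is one rank-1 "block": a user group together with its product group.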
• PARAFAC decomposition
[Figure: subject × object × verb tensor ~ sum of rank-1 tensors, one per concept group: politicians + artists + athletes]
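A minimal PARAFAC/CP sketch via plain alternating least squares (ALS), self-contained in numpy. This is the textbook algorithm on a tiny exact-rank tensor, not the scalable implementations used for billion-edge data:

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product: (J*K) x R.
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def parafac_als(X, rank, iters=200, seed=0):
    """Plain ALS for a 3-way CP model: X ~ sum_r a_r o b_r o c_r."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X0 = X.reshape(I, J * K)                      # mode-0 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-1 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-2 unfolding
    for _ in range(iters):
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Build a rank-2 tensor from known factors, then recover it.
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.standard_normal((d, 2)) for d in (6, 5, 4))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = parafac_als(X, rank=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_err)   # near zero: the rank-2 structure is recovered
```

Each recovered column triple (a_r, b_r, c_r) is one rank-1 component, i.e. one "community" or concept group.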
• Results for who-calls-whom-when (caller × callee × day): 4M x 15 days
Anomaly detection in time-evolving graphs
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
– found e.g.: 1 caller, 5 receivers, 4 days of activity, ~200 calls to EACH receiver on EACH day!
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network
– P2.2: Discoveries in neuro-semantics
– Speed
• Part#3: Cascades and immunization
• Conclusions
Coupled Matrix-Tensor Factorization (CMTF)
[Figure: tensor X and matrix Y are factorized jointly, sharing the factor vectors a_1 … a_F: X ~ Σ_f a_f ∘ b_f ∘ c_f and Y ~ Σ_f a_f d_f^T]
Neuro-semantics
[Embedded pages from Mitchell et al., Science 2008: the model predicts the fMRI activation for a concrete noun as a weighted linear sum of 25 corpus-derived semantic features (e.g. 'eat', 'taste', 'fill'), with weights fit by multiple regression; trained on 58 words, it predicts images for held-out words such as 'celery' and 'airplane']
• Brain scan data*
– 9 persons
– 60 nouns
• Questions
– 218 questions
– 'is it alive?', 'can you eat it?'
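The underlying linear model (activation ≈ weighted sum of semantic features, weights fit by least squares on all but a few held-out words) can be sketched on synthetic data; the dimensions mirror the study's 60 words and 25 features, but the numbers below are random, not the real fMRI data:

```python
import numpy as np

rng = np.random.default_rng(3)

n_words, n_feats, n_voxels = 60, 25, 100
F = rng.standard_normal((n_words, n_feats))        # feature values f_i(w)
C_true = rng.standard_normal((n_feats, n_voxels))  # per-voxel coefficients
images = F @ C_true + 0.1 * rng.standard_normal((n_words, n_voxels))

# Train on 58 words, predict the 2 held-out words (leave-2-out, as in the paper).
C_hat, *_ = np.linalg.lstsq(F[:58], images[:58], rcond=None)
pred = F[58:] @ C_hat

corr = np.corrcoef(pred.ravel(), images[58:].ravel())[0, 1]
print(corr)   # high correlation on the held-out words
```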
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
CMU SCS
Neuro-semantics
Additional figures and legends

Figure S1. Presentation and set of exemplars used in the experiment. Participants were presented 60 distinct word-picture pairs describing common concrete nouns. These consisted of 5 exemplars from each of 12 categories, as shown above. A slow event-related paradigm was employed, in which the stimulus was presented for 3s, followed by a 7s fixation period during which an X was presented in the center of the screen. Images were presented as white lines and characters on a dark background, but are inverted here to improve readability. The entire set of 60 exemplars was presented six times, randomly permuting the sequence on each presentation.
• Brain Scan Data*
– 9 persons
– 60 nouns
• Questions
– 218 questions
– ‘is it alive?’, ‘can you eat it?’
CMU SCS 15649 71 (c) 2014, C. Faloutsos
Patterns?
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w), f2(w), …, fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., “taste”) or set of words (e.g., “taste,” “tastes,” or “tasted”) within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.

This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14–17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.

Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].

To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive com-
[Fig. 2 panels: Predicted “celery” = 0.84 × “eat” + 0.35 × “taste” + 0.32 × “fill” + …; color scale: high / average / below average; predicted vs. observed images for “celery” and “airplane”.]
Fig. 2. Predicting fMRI images for given stimulus words. (A) Forming a prediction for participant P1 for the stimulus word “celery” after training on 58 other words. Learned cvi coefficients for 3 of the 25 semantic features (“eat,” “taste,” and “fill”) are depicted by the voxel colors in the three images at the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” is shown to the left of their respective images [e.g., the value for “eat (celery)” is 0.84]. The predicted activation for the stimulus word [shown at the bottom of (A)] is a linear combination of the 25 semantic fMRI signatures, weighted by their co-occurrence values. This figure shows just one horizontal slice [z = –12 mm in Montreal Neurological Institute (MNI) space] of the predicted three-dimensional image. (B) Predicted and observed fMRI images for “celery” and “airplane” after training that uses 58 other words. The two long red and blue vertical streaks near the top (posterior region) of the predicted and observed images are the left and right fusiform gyri.
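The model described above is, per voxel, ordinary multiple regression: stack the co-occurrence features of the training words into a matrix, estimate the coefficients cvi by least squares, and predict a held-out word's image as the feature-weighted sum of the learned signatures. The following is a minimal sketch on synthetic data (the dimensions, 58 training words and 25 features, follow the text; the data and everything else are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_feats, n_voxels = 58, 25, 500   # 58 training words, 25 semantic features

# f_i(w): co-occurrence features, one row per training word (synthetic here)
F = rng.random((n_words, n_feats))
# Ground-truth per-voxel signatures, used only to simulate "observed" images
C_true = rng.standard_normal((n_feats, n_voxels))
Y = F @ C_true + 0.01 * rng.standard_normal((n_words, n_voxels))

# Multiple regression: least-squares estimate of the model parameters c_vi
C_hat, *_ = np.linalg.lstsq(F, Y, rcond=None)

# Predict the fMRI image of a held-out word as a linear combination of the
# learned semantic signatures, weighted by the word's co-occurrence values
f_new = rng.random(n_feats)                # features for an unseen word
y_pred = f_new @ C_hat                     # one predicted activation per voxel
```

With real data, F would hold corpus co-occurrence statistics for the 25 semantic features (e.g., “eat,” “taste,” “fill”) and Y the mean-centered fMRI image per word; the algebra is unchanged.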
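Fig. 3's accuracy map rests on a per-voxel predicted-actual correlation across the held-out words, with voxels retained only above a threshold (0.28 for participant P5, 0.14 averaged over nine participants). A plausible reading of that computation, sketched on synthetic data (the helper name and sizes are illustrative assumptions, not the authors' code):

```python
import numpy as np

def voxelwise_correlation(pred, obs):
    """Pearson r between predicted and observed activation, computed
    separately for each voxel (columns) across held-out words (rows)."""
    pred = (pred - pred.mean(axis=0)) / pred.std(axis=0)
    obs = (obs - obs.mean(axis=0)) / obs.std(axis=0)
    return (pred * obs).mean(axis=0)

# Synthetic check: 60 words x 4 voxels; voxels 0-2 perfectly predicted,
# voxel 3 replaced by noise, so its correlation should be near zero
rng = np.random.default_rng(1)
obs = rng.standard_normal((60, 4))
pred = obs.copy()
pred[:, 3] = rng.standard_normal(60)

r = voxelwise_correlation(pred, obs)
keep = r >= 0.28          # per-voxel threshold quoted for participant P5
```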
Tofullyspecify
amodelwithint
hiscom-
putationalmode
lingframework,o
nemustfirst
defineasetofin
termediateseman
ticfeatures
f 1(w)f 2(w)…f n(w)tobeextract
edfromthetext
corpus.Inthispa
per,eachintermed
iatesemantic
featureisdefined
intermsoftheco
-occurrence
statisticsofthein
putstimuluswo
rdwwitha
particularotherwo
rd(e.g.,“taste”)or
setofwords
(e.g.,“taste,”“tas
tes,”or“tasted”)w
ithinthetext
corpus.Themode
listrainedbythe
applicationof
multipleregressio
ntothesefeature
sf i(w)andthe
observedfMRIim
ages,soastoobt
ainmaximum-
likelihoodestima
tesforthemodel
parametersc vi
(26).Oncetrained,
thecomputational
modelcanbe
evaluatedbygivin
gitwordsoutside
thetraining
setandcomparin
gitspredictedfM
RIimagesfor
thesewordswith
observedfMRIda
ta.Thiscom
putationalmodeli
ngframeworkis
basedontwokey
theoreticalassump
tions.First,it
assumesthesema
nticfeaturesthatd
istinguishthe
meaningsofarbit
raryconcretenoun
sarereflected
inthestatisticsof
theirusewithina
verylargetext
corpus.Thisassum
ptionisdrawnfro
mthefieldof
computationallin
guistics,wherest
atisticalword
distributionsare
frequentlyusedt
oapproximate
themeaningofd
ocumentsandw
ords(14–17).
Second,itassume
sthatthebrainact
ivityobserved
whenthinkinga
boutanyconcrete
nouncanbe
derivedasaweig
htedlinearsumo
fcontributions
fromeachofits
semanticfeatures.
Althoughthe
correctnessofthis
linearityassumpti
onisdebat-
able,itisconsist
entwiththewide
spreaduseof
linearmodelsin
fMRIanalysis(2
7)andwiththe
assumptionthatf
MRIactivationo
ftenreflectsa
linearsuperpositio
nofcontributions
fromdifferent
sources.Ourtheore
ticalframeworkdo
esnottakea
positiononwheth
ertheneuralactiv
ationencoding
meaningislocali
zedinparticular
corticalre-
gions.Instead,it
considersallcort
icalvoxelsand
allowsthetrainin
gdatatodetermin
ewhichloca-
tionsaresystemat
icallymodulated
bywhichas-
pectsofwordmean
ings.
Results.Weevaluated
thiscomputational
mod-elusing
fMRIdatafromn
inehealthy,college
-ageparticipan
tswhoviewed60
differentword-pic
turepairspre
sentedsixtimes
each.Anatomicall
yde-finedreg
ionsofinterestwe
reautomaticallyla
beledaccording
tothemethodolog
yin(28).The60
ran-domlyo
rderedstimuliinc
ludedfiveitems
fromeachof1
2semanticcatego
ries(animals,body
parts,buildings
,buildingparts,clo
thing,furniture,in
sects,kitchenit
ems,tools,vegeta
bles,vehicles,and
otherman-mad
eitems).Areprese
ntativefMRIimage
foreachstim
uluswascreatedb
ycomputingthem
eanfMRIre
sponseoveritssi
xpresentations,a
ndthemeanof
all60oftheserep
resentativeimages
wasthensub
tractedfromeach
[fordetails,see(
26)].Toinstan
tiateourmodeling
framework,wefir
stchoseas
etofintermediate
semanticfeatures.
Tobeeffective,
theintermediatese
manticfeaturesm
ustsimultane
ouslyencodethew
idevarietyofsema
nticcontento
ftheinputstimul
uswordsandfacto
rtheobserved
fMRIactivationin
tomoreprimitivec
om-
Predicted
“celery” = 0.84
“celery”“airplan
e”
Predicted:
Observed:
AB
+...
high
average below average
Predicted “celer
y”:+ 0.35+ 0.32
“eat”“taste”
“fill”
Fig.2.Predicting
fMRIimages
forgivenstimulus
words.(A)
Formingapredict
ionforpar-
ticipantP1fort
hestimulus
word“celery”afte
rtrainingon
58otherwords.Le
arnedc vico-efficients
for3ofthe25s
e-manticfe
atures(“eat,”“tast
e,”and“fill”
)aredepictedby
thevoxelcolo
rsinthethreeima
gesatthetop
ofthepanel.Thec
o-occurrenc
evalueforeacho
fthesefeaturesfor
thestimulusword
“celery”is
showntothelefto
ftheirrespectiveim
ages[e.g.,thevalu
efor“eat(celery)”i
s0.84].The
predictedactivation
forthestimuluswo
rd[shownatthebo
ttomof(A)]isal
inearcombinationo
fthe25semantic
fMRIsignatures,w
eightedby
theirco-occurrence
values.Thisfigure
showsjustonehor
izontalslice[z=
–12mminMontr
ealNeurological
Institute(MNI)spa
ce]ofthepredict
edthree-dim
ensionalimage.(
B)Predictedand
observedfMRIima
gesfor“celery”a
nd“airplane”after
trainingthatuses5
8otherwords.The
twolongredandb
lueverticalstreaks
nearthetop(post
eriorregion)ofthe
predictedandobse
rvedimagesaret
heleftandrightfu
siformgyri.
A B
C
Mean overparticipants
Participant P5
Fig.3.Locations
ofmostac
curatelypre-
dictedvoxels.Sur
face(A)andg
lassbrain(B)
renderingofthecor
rela-tionbetw
eenpredicted
andactualvoxelac
tiva-tionsfor
wordsoutside
thetrainingsetfor
par-ticipantP
5.Thesepanelssho
wclusterscontaining
atleast10contigu
ousvoxels,eachof
whosepredicted-
actualcorrelationis
atleast0.28.These
voxelclustersared
istributedthroughou
tthecortexan
dlocatedinthelef
tandrightoccipit
alandparietallobe
s;leftandrightfu
siform,postcentra
l,andmiddlefronta
lgyri;leftinferiorfr
ontalgyrus;medial
frontalgyrus;anda
nteriorcingulate.
(C)Surfacerenderi
ngofthepredicted-
actualcorrelation
averagedoveralln
ineparticipan
ts.Thispanelrepres
entsclusterscontai
ningatleast10co
ntiguousvoxels,ea
chwithaveragec
orrelationofatleast
0.14.
30MAY2008V
OL320SCIENC
Ewww.science
mag.org1192RESEAR
CHARTICLES
on May 30, 2008 www.sciencemag.org Downloaded from
Tofullyspecify
amodelwithin
thiscom-
putationalmode
lingframework
,onemustfirst
defineasetof
intermediatese
manticfeatures
f 1(w)f 2(w)…f n(w)tobeextrac
tedfromthetext
corpus.Inthisp
aper,eachinter
mediatesemanti
cfeature
isdefinedinte
rmsoftheco-o
ccurrence
statisticsofthe
inputstimulus
wordwwitha
particularotherw
ord(e.g.,“taste”
)orsetofword
s(e.g.,“ta
ste,”“tastes,”or
“tasted”)within
thetextcorpus.
Themodelistra
inedbytheappl
icationof
multipleregressio
ntothesefeatu
resf i(w)andthe
observedfMRI
images,soasto
obtainmaximum
-likelihoo
destimatesfort
hemodelparam
etersc vi
(26).Oncetraine
d,thecomputatio
nalmodelcanbe
evaluatedbygiv
ingitwordsou
tsidethetrainin
gsetand
comparingitsp
redictedfMRIim
agesforthesew
ordswithobser
vedfMRIdata.
Thiscomputatio
nalmodeling
frameworkis
basedontwok
eytheoreticalas
sumptions.First
,itassumes
thesemanticfea
turesthatdisting
uishthemeaning
sofarbitraryco
ncretenounsare
reflected
inthestatisticso
ftheirusewithi
naverylargetex
tcorpus.
Thisassumption
isdrawnfromthe
fieldofcomputa
tionallinguistic
s,wherestatist
icalword
distributionsare
frequentlyused
toapproximate
themeaningof
documentsand
words(14–17).
Second,itassum
esthatthebrain
activityobserve
dwhenth
inkingaboutan
yconcretenou
ncanbe
derivedasawe
ightedlinearsu
mofcontributio
nsfromea
chofitsseman
ticfeatures.Alth
oughthe
correctnessofth
islinearityassu
mptionisdebat
-able,it
isconsistentwi
ththewidesprea
duseof
linearmodelsin
fMRIanalysis(2
7)andwiththe
assumptiontha
tfMRIactivatio
noftenreflects
alinearsu
perpositionofc
ontributionsfrom
differentsources.
Ourtheoreticalfr
ameworkdoesn
ottakeaposition
onwhetherthen
euralactivation
encoding
meaningisloc
alizedinpartic
ularcorticalre
-gions.I
nstead,itconsid
ersallcorticalv
oxelsand
allowsthetrain
ingdatatodeterm
inewhichloca-
tionsaresystem
aticallymodulat
edbywhichas-
pectsofwordm
eanings.
Results.Weevaluate
dthiscomputatio
nalmod-
elusingfMRId
atafromninehea
lthy,college-age
participantswho
viewed60diffe
rentword-pictur
epairspr
esentedsixtime
seach.Anatom
icallyde-
finedregionsofi
nterestwereauto
maticallylabele
daccordin
gtothemethodo
logyin(28).Th
e60ran-
domlyordered
stimuliincluded
fiveitemsfrom
eachof12sema
nticcategories(a
nimals,bodypart
s,building
s,buildingparts
,clothing,furnit
ure,insects,
kitchenitems,to
ols,vegetables,
vehicles,andoth
erman-ma
deitems).Arepr
esentativefMRI
imagefor
eachstimuluswa
screatedbycomp
utingthemean
fMRIresponse
overitssixpres
entations,andth
emeanof
all60oftheser
epresentativeim
ageswas
thensubtractedf
romeach[ford
etails,see(26)]
.Toinsta
ntiateourmodeli
ngframework,w
efirstchosea
setofintermedia
tesemanticfeatu
res.Tobe
effective,theint
ermediatesema
nticfeaturesmu
stsimultan
eouslyencodeth
ewidevarietyof
semanticcontent
oftheinputstim
uluswordsand
factorthe
observedfMRIa
ctivationintomor
eprimitivecom-
Predicted
“celery” = 0.84
“celery”
“airplane”
Predicted:
Observed:
AB
+...
high
average below averag
e
Predicted “cele
ry”:+ 0.35+ 0.32
“eat”“taste”
“fill”
Fig.2.Predictin
gfMRIimages
forgivenstimu
luswords.(A)
Formingapredic
tionforpar-
ticipantP1for
thestimulus
word“celery”aft
ertrainingon
58otherwords.L
earnedcvico-
efficientsfor3of
the25se-
manticfeatures
(“eat,”“taste,”
and“fill”)ared
epictedbythe
voxelcolorsinth
ethreeimages
atthetopofthe
panel.Theco-
occurrencevalue
foreachofthese
featuresforthes
timulusword“ce
lery”isshownto
theleftoftheirre
spectiveimages[
e.g.,thevaluefor
“eat(celery)”is
0.84].Thepredict
edactivationfor
thestimulusword
[shownatthebot
tomof(A)]isa
linearcombinatio
nofthe25sema
nticfMRIsignatu
res,weightedby
theirco-occurren
cevalues.Thisf
igureshowsjust
onehorizontals
lice[z=
–12mminMont
realNeurologica
lInstitute(MNI)
space]ofthepr
edictedthree-dim
ensionalimage.
(B)Predictedan
dobservedfMR
Iimagesfor
“celery”and“air
plane”aftertrain
ingthatuses58
otherwords.The
twolongredand
blueverticalstre
aksnearthetop(
posteriorregion
)ofthepredicte
dandobs
ervedimagesare
theleftandrigh
tfusiformgyri.
A B
C
Mean overparticipants
Participant P5
Fig.3.Location
sofmostac
curatelypre-
dictedvoxels.S
urface(A)and
glassbrain(B)
renderingofthe
correla-tionbet
weenpredicted
andactualvoxel
activa-tionsfor
wordsoutside
thetrainingsetf
orpar-ticipantP
5.Thesepanelssh
owclusterscontai
ningatleast10c
ontiguousvoxels,
eachofwhose
predicted-actualc
orrelationisatleas
t0.28.Thesevox
elclustersaredis
tributedthrougho
utthecortexan
dlocatedinthele
ftandrightoccip
italandparietal
lobes;leftandri
ghtfusiform,
postcentral,andm
iddlefrontalgyri;l
eftinferiorfronta
lgyrus;medialfron
talgyrus;andant
eriorcingulate
.(C)Surfaceren
deringofthepr
edicted-actualcor
relationaveraged
overallnine
participants.This
panelrepresents
clusterscontaining
atleast10contigu
ousvoxels,each
withaverage
correlationofatle
ast0.14.
30MAY2008
VOL320SCI
ENCEwww.sc
iencemag.org
1192RESEARCHART
ICLES
on May 30, 2008 www.sciencemag.org Downloaded from
Tofullyspecify
amodelwithint
hiscom-
putationalmode
lingframework,o
nemustfirst
defineasetofin
termediateseman
ticfeatures
f 1(w)f 2(w)…f n(w)tobeextract
edfromthetext
corpus.Inthispa
per,eachintermed
iatesemantic
featureisdefined
intermsoftheco
-occurrence
statisticsofthein
putstimuluswo
rdwwitha
particularotherwo
rd(e.g.,“taste”)or
setofwords
(e.g.,“taste,”“tas
tes,”or“tasted”)w
ithinthetext
corpus.Themode
listrainedbythe
applicationof
multipleregressio
ntothesefeature
sf i(w)andthe
observedfMRIim
ages,soastoobt
ainmaximum-
likelihoodestima
tesforthemodel
parametersc vi
(26).Oncetrained,
thecomputational
modelcanbe
evaluatedbygivin
gitwordsoutside
thetraining
setandcomparin
gitspredictedfM
RIimagesfor
thesewordswith
observedfMRIda
ta.Thiscom
putationalmodeli
ngframeworkis
basedontwokey
theoreticalassump
tions.First,it
assumesthesema
nticfeaturesthatd
istinguishthe
meaningsofarbit
raryconcretenoun
sarereflected
inthestatisticsof
theirusewithina
verylargetext
corpus.Thisassum
ptionisdrawnfro
mthefieldof
computationallin
guistics,wherest
atisticalword
distributionsare
frequentlyusedt
oapproximate
themeaningofd
ocumentsandw
ords(14–17).
Second,itassume
sthatthebrainact
ivityobserved
whenthinkinga
boutanyconcrete
nouncanbe
derivedasaweig
htedlinearsumo
fcontributions
fromeachofits
semanticfeatures.
Althoughthe
correctnessofthis
linearityassumpti
onisdebat-
able,itisconsist
entwiththewide
spreaduseof
linearmodelsin
fMRIanalysis(2
7)andwiththe
assumptionthatf
MRIactivationo
ftenreflectsa
linearsuperpositio
nofcontributions
fromdifferent
sources.Ourtheore
ticalframeworkdo
esnottakea
positiononwheth
ertheneuralactiv
ationencoding
meaningislocali
zedinparticular
corticalre-
gions.Instead,it
considersallcort
icalvoxelsand
allowsthetrainin
gdatatodetermin
ewhichloca-
tionsaresystemat
icallymodulated
bywhichas-
pectsofwordmean
ings.
Results.Weevaluated
thiscomputational
mod-elusing
fMRIdatafromn
inehealthy,college
-ageparticipan
tswhoviewed60
differentword-pic
turepairspre
sentedsixtimes
each.Anatomicall
yde-finedreg
ionsofinterestwe
reautomaticallyla
beledaccording
tothemethodolog
yin(28).The60
ran-domlyo
rderedstimuliinc
ludedfiveitems
fromeachof1
2semanticcatego
ries(animals,body
parts,buildings
,buildingparts,clo
thing,furniture,in
sects,kitchenit
ems,tools,vegeta
bles,vehicles,and
otherman-mad
eitems).Areprese
ntativefMRIimage
foreachstim
uluswascreatedb
ycomputingthem
eanfMRIre
sponseoveritssi
xpresentations,a
ndthemeanof
all60oftheserep
resentativeimages
wasthensub
tractedfromeach
[fordetails,see(
26)].Toinstan
tiateourmodeling
framework,wefir
stchoseas
etofintermediate
semanticfeatures.
Tobeeffective,
theintermediatese
manticfeaturesm
ustsimultane
ouslyencodethew
idevarietyofsema
nticcontento
ftheinputstimul
uswordsandfacto
rtheobserved
fMRIactivationin
tomoreprimitivec
om-
Predicted
“celery” = 0.84
“celery”“airplan
e”
Predicted:
Observed:
AB
+...
high
average below average
Predicted “celer
y”:+ 0.35+ 0.32
“eat”“taste”
“fill”
Fig.2.Predicting
fMRIimages
forgivenstimulus
words.(A)
Formingapredict
ionforpar-
ticipantP1fort
hestimulus
word“celery”afte
rtrainingon
58otherwords.Le
arnedc vico-efficients
for3ofthe25s
e-manticfe
atures(“eat,”“tast
e,”and“fill”
)aredepictedby
thevoxelcolo
rsinthethreeima
gesatthetop
ofthepanel.Thec
o-occurrenc
evalueforeacho
fthesefeaturesfor
thestimulusword
“celery”is
showntothelefto
ftheirrespectiveim
ages[e.g.,thevalu
efor“eat(celery)”i
s0.84].The
predictedactivation
forthestimuluswo
rd[shownatthebo
ttomof(A)]isal
inearcombinationo
fthe25semantic
fMRIsignatures,w
eightedby
theirco-occurrence
values.Thisfigure
showsjustonehor
izontalslice[z=
–12mminMontr
ealNeurological
Institute(MNI)spa
ce]ofthepredict
edthree-dim
ensionalimage.(
B)Predictedand
observedfMRIima
gesfor“celery”a
nd“airplane”after
trainingthatuses5
8otherwords.The
twolongredandb
lueverticalstreaks
nearthetop(post
eriorregion)ofthe
predictedandobse
rvedimagesaret
heleftandrightfu
siformgyri.
A B
C
Mean overparticipants
Participant P5
Fig.3.Locations
ofmostac
curatelypre-
dictedvoxels.Sur
face(A)andg
lassbrain(B)
renderingofthecor
rela-tionbetw
eenpredicted
andactualvoxelac
tiva-tionsfor
wordsoutside
thetrainingsetfor
par-ticipantP
5.Thesepanelssho
wclusterscontaining
atleast10contigu
ousvoxels,eachof
whosepredicted-
actualcorrelationis
atleast0.28.These
voxelclustersared
istributedthroughou
tthecortexan
dlocatedinthelef
tandrightoccipit
alandparietallobe
s;leftandrightfu
siform,postcentra
l,andmiddlefronta
lgyri;leftinferiorfr
ontalgyrus;medial
frontalgyrus;anda
nteriorcingulate.
(C)Surfacerenderi
ngofthepredicted-
actualcorrelation
averagedoveralln
ineparticipan
ts.Thispanelrepres
entsclusterscontai
ningatleast10co
ntiguousvoxels,ea
chwithaveragec
orrelationofatleast
0.14.
30MAY2008V
OL320SCIENC
Ewww.science
mag.org1192RESEAR
CHARTICLES
on May 30, 2008 www.sciencemag.org Downloaded from
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w) f2(w) … fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., "taste") or set of words (e.g., "taste," "tastes," or "tasted") within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.
This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14–17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.
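The multiple-regression training described above (estimating one coefficient c_vi per voxel and semantic feature, then predicting a held-out word's image as a feature-weighted sum of the learned signatures) can be sketched in a few lines. This is a toy NumPy example with made-up sizes and synthetic data, not the authors' pipeline:

```python
import numpy as np

# Toy setup (all sizes and data invented): 58 training words,
# 25 semantic features, 500 voxels.
rng = np.random.default_rng(0)
n_words, n_feat, n_vox = 58, 25, 500
F = rng.random((n_words, n_feat))           # f_i(w): co-occurrence features
C = rng.standard_normal((n_feat, n_vox))    # "true" signatures c_vi (unknown in practice)
Y = F @ C + 0.01 * rng.standard_normal((n_words, n_vox))  # observed fMRI images

# Multiple regression: least-squares estimate of the 25 voxel signatures.
C_hat, *_ = np.linalg.lstsq(F, Y, rcond=None)

# Predicted image for a held-out word = its features times the signatures.
f_new = rng.random(n_feat)
y_pred = f_new @ C_hat
```

With low noise and more training words than features, the least-squares estimate recovers the signatures closely; in the real setting the features come from corpus co-occurrence counts rather than random numbers.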
Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].
To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive components.
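The representative-image preprocessing described in the Results paragraph (mean over the six presentations of each stimulus, then subtracting the grand mean of all 60 representative images) can be sketched as follows; the array sizes and data are made up for illustration:

```python
import numpy as np

# Hypothetical data: 60 stimuli x 6 presentations x 50 voxels.
rng = np.random.default_rng(2)
responses = rng.random((60, 6, 50))

# One representative image per stimulus: mean over its six presentations.
rep = responses.mean(axis=1)

# Subtract the mean of all 60 representative images from each.
rep = rep - rep.mean(axis=0)
```

After the subtraction, each voxel's activation is expressed relative to its average over all stimuli, so the mean over the 60 images is exactly zero.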
[Fig. 2 panels: Predicted "celery" = 0.84 × "eat" + 0.35 × "taste" + 0.32 × "fill" + …; predicted vs. observed images for "celery" and "airplane"; color scale: high / average / below average]
…
airplane
dog
nouns
questions
voxels
CMU SCS 15649 72 (c) 2014, C. Faloutsos
Patterns?
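The nouns × questions × voxels data on this slide form a 3-way tensor, and a rank-F CP model (the tensor building block behind coupled factorizations such as Turbo-SMT) approximates it as a sum of rank-one outer products. A toy NumPy sketch with made-up sizes:

```python
import numpy as np

# Hypothetical sizes: 60 nouns, 218 questions, 100 voxels, rank F=2.
rng = np.random.default_rng(1)
I, J, K, F = 60, 218, 100, 2
A = rng.random((I, F))   # noun factors
B = rng.random((J, F))   # question factors
C = rng.random((K, F))   # voxel factors

# CP model: X[i, j, k] ~= sum_f A[i, f] * B[j, f] * C[k, f]
X = np.einsum('if,jf,kf->ijk', A, B, C)
```

Each column triple (A[:, f], B[:, f], C[:, f]) is one "group": a weighting of nouns, questions, and voxels that co-vary, which is exactly how the word/question/brain-region groups discussed below are read off.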
CMU SCS Neuro-semantics
[Fig. 4 residue: four brain-activation maps with colorbars, one per group (Group 1, Group 2, Group 3, Group 4); one map is labeled "Premotor Cortex"]
Nouns and questions for the four groups:
beetle: can it cause you pain? / pants: do you see it daily? / bee: is it conscious?
bear: does it grow? / cow: is it alive? / coat: was it ever alive?
glass: can you pick it up? / tomato: can you hold it in one hand? / bell: is it smaller than a golf ball?
bed: does it use electricity? / house: can you sit on it? / car: does it cast a shadow?
Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
v1 and v2, which were withheld from the training data, the leave-two-out scheme measures prediction accuracy by the ability to choose which of the observed brain images corresponds to which of the two words. After mean-centering the vectors, this classification decision is made according to the following rule:
‖v1 − v̂1‖² + ‖v2 − v̂2‖² < ‖v1 − v̂2‖² + ‖v2 − v̂1‖²
Although our approach is not designed to make predictions, preliminary results are very encouraging: Using only F=2 components, for the noun pair closet/watch we obtained mean accuracy of about 0.82 for 5 out of the 9 human subjects. Similarly, for the pair knife/beetle, we achieved accuracy of about 0.8 for a somewhat different group of 5 subjects. For the rest of the human subjects, the accuracy is considerably lower; however, it may be the case that brain activity predictability varies between subjects, a fact that requires further investigation.
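The leave-two-out decision rule above can be sketched as a small helper. This is a hypothetical NumPy function (not the authors' code); it assumes the observed and predicted vectors are already mean-centered, per the paper's setup:

```python
import numpy as np

def leave_two_out_correct(v1, v2, v1_hat, v2_hat):
    """True if the matched pairing beats the swapped one, i.e.
    ||v1-v1_hat||^2 + ||v2-v2_hat||^2 < ||v1-v2_hat||^2 + ||v2-v1_hat||^2.
    Inputs are assumed to be mean-centered vectors (NumPy arrays)."""
    matched = np.sum((v1 - v1_hat) ** 2) + np.sum((v2 - v2_hat) ** 2)
    swapped = np.sum((v1 - v2_hat) ** 2) + np.sum((v2 - v1_hat) ** 2)
    return matched < swapped
```

When the predictions are close to the true images, the matched assignment wins and the trial counts as correct; accuracy is then the fraction of withheld word pairs classified this way.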
5 Experiments
We implemented Turbo-SMT in Matlab. Our implementation of the code is publicly available.5 For the parallelization of the algorithm, we used Matlab's Parallel Computing Toolbox. For tensor manipulation, we used
5http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip
the Tensor Toolbox for Matlab [7], which is optimized especially for sparse tensors (but works very well for dense ones too). We use the ALS and the CMTF-OPT [5] algorithms as baselines, i.e., we compare Turbo-SMT, using one of those algorithms as its core CMTF implementation, against the plain execution of those algorithms. We implemented our version of the ALS algorithm, and we used the CMTF Toolbox6 implementation of CMTF-OPT. We use CMTF-OPT for higher ranks, since that particular algorithm is more accurate than ALS, and is the state of the art. All experiments were carried out on a machine with 4 Intel Xeon E7-4850 2.00GHz CPUs and 512Gb of RAM. Whenever we conducted multiple iterations of an experiment (due to the randomized nature of Turbo-SMT), we report error-bars along the plots. For all the following experiments we used either portions of the BrainQ dataset, or the whole dataset.
5.1 Speedup. As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200x on the BrainQ dataset. For all cases, the approximation cost is either the same as the baselines', or larger by a small factor, indicating that Turbo-SMT is both fast and accurate. Key facts that
6http://www.models.life.ku.dk/joda/CMTF_Toolbox
73 CMU SCS 15649 (c) 2014, C. Faloutsos
CMU SCS Neuro-semantics
74 CMU SCS 15649 (c) 2014, C. Faloutsos
Small items -> Premotor cortex
CMU SCS Neuro-semantics
75 CMU SCS 15649 (c) 2014, C. Faloutsos
50 100 150 200 250
50
100
150
200
250
3000
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
50 100 150 200 250
50
100
150
200
250
3000
0.01
0.02
0.03
0.04
0.05
50 100 150 200 250
50
100
150
200
250
3000
0.01
0.02
0.03
0.04
0.05
Premotor Cortex
50 100 150 200 250
50
100
150
200
250
300
0.005
0.01
0.015
0.02
0.025
0.03
0.035
Group1 Group 2 Group 4Group 3
beetle can it cause you pain?pants do you see it daily?bee is it conscious?
beetle can it cause you pain?pants do you see it daily?bee is it conscious?
Nouns Questions Nouns Questions
Nouns Questionsbeetle can it cause you pain?
Nouns Questions Nouns Questions
Nouns Questionsbeetle can it cause you pain?
Nouns Questions Nouns Questions
Nouns Questionsbeetle can it cause you pain?
Nouns Questions Nouns Questions
Nouns Questionsbeetle can it cause you pain?bear does it grow?
cow is it alive?coat was it ever alive?
bear does it grow?cow is it alive?coat was it ever alive?
glass can you pick it up?tomato can you hold it in one hand?bell is it smaller than a golfball?’
glass can you pick it up?tomato can you hold it in one hand?bell is it smaller than a golfball?’
bed does it use electricity?house can you sit on it?car does it cast a shadow?
bed does it use electricity?house can you sit on it?car does it cast a shadow?
Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negativelyand positively) correlated, as obtained using Turbo-SMT. For instance, Group 3 refers to small items that canbe held in one hand,such as a tomato or a glass, and the activation pattern is very different from the one ofGroup 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in thepremotor cortex which is associated with the concepts of that group.
v1 and v2 which were withheld from the training data,the leave-two-out scheme measures prediction accuracyby the ability to choose which of the observed brainimages corresponds to which of the two words. Aftermean-centering the vectors, this classification decisionis made according to the following rule:
�v1 − v̂1�2 + �v2 − v̂2�2 < �v1 − v̂2�2 + �v2 − v̂1�2
Although our approach is not designed to make predic-tions, preliminary results are very encouraging: Usingonly F=2 components, for the noun pair closet/watchwe obtained mean accuracy of about 0.82 for 5 out of the9 human subjects. Similarly, for the pair knife/beetle,we achieved accuracy of about 0.8 for a somewhat dif-ferent group of 5 subjects. For the rest of the humansubjects, the accuracy is considerably lower, however, itmay be the case that brain activity predictability variesbetween subjects, a fact that requires further investiga-tion.
5 Experiments
We implemented Turbo-SMT in Matlab. Our imple-mentation of the code is publicly available.5 For the par-allelization of the algorithm, we used Matlab’s ParallelComputing Toolbox. For tensor manipulation, we used
5http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip
the Tensor Toolbox for Matlab [7] which is optimizedespecially for sparse tensors (but works very well fordense ones too). We use the ALS and the CMTF-OPT[5] algorithms as baselines, i.e. we compare Turbo-SMT when using one of those algorithms as their coreCMTF implementation, against the plain execution ofthose algorithms. We implemented our version of theALS algorithm, and we used the CMTF Toobox6 im-plementation of CMTF-OPT. We use CMTF-OPT forhigher ranks, since that particular algorithm is moreaccurate than ALS, and is the state of the art. All ex-periments were carried out on a machine with 4 IntelXeon E74850 2.00GHz, and 512Gb of RAM. Wheneverwe conducted multiple iterations of an experiment (dueto the randomized nature of Turbo-SMT), we reporterror-bars along the plots. For all the following experi-ments we used either portions of the BrainQ dataset,or the whole dataset.
5.1 Speedup As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200x on the BrainQ dataset; in all cases, the approximation cost is either the same as the baselines' or larger by a small factor, indicating that Turbo-SMT is both fast and accurate.
6http://www.models.life.ku.dk/joda/CMTF_Toolbox
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Small items -> Premotor cortex
CMU SCS
(c) 2014, C. Faloutsos 76
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network – P2.2: Discoveries in neuro-semantics – Speed
• Part#3: Cascades and immunization • Conclusions
CMU SCS 15649
CMU SCS
Speed of tensor/CMTF analysis
• Q1: Can we make it fast?
• Q2: Does it work for large, disk-based data?
CMU SCS 15649 (c) 2014, C. Faloutsos 77
CMU SCS
A1: Turbo-SMT
[Figure 1 plot: relative cost vs. relative run time, for parameter s in {2, 5, 10, 20}; curves for CMTF-OPT (F=10, 8 cores), CMTF-OPT (F=10, 4 cores), ALS (F=2, 8 cores), and ALS (F=10, 1 core); ~200x speedup over the baseline, approaching the ideal.]
Figure 1: Turbo-SMT is up to 200x faster, for comparable accuracy. Relative execution time vs. relative cost (lower is better), for various settings (see Sec. 3): Turbo-SMT boosting either ALS or CMTF-OPT [5] (i.e. the baselines), vs. plain execution of the baselines (on a single core), on the BrainQ dataset (see Section 4). For more details, see Section 5.
Disclaimer: An earlier version of this work was uploaded on Arxiv.org [16] for quick dissemination thereof and early feedback. However, the Arxiv.org version has not been officially published in any conference proceedings or journal, and has evolved into this present work.
2 Preliminaries
Symbol              Description
CMTF                Coupled Matrix-Tensor Factorization
ALS                 Alternating Least Squares
CMTF-OPT            Algorithm introduced in [5]
x, x, X, X          scalar, vector, matrix, tensor (respectively)
‖A‖_F               Frobenius norm of A
(i) as superscript  indicates the i-th iteration
A_1^i, a_1^i        series of matrices or vectors, indexed by i
I                   set of indices
x(I)                spanning indices I of x
nucleus             a biased random sample of the data tensor

Table 1: Table of symbols
2.1 Introduction to Tensors Matrices record dyadic properties, like "people recommending products". Tensors are the n-mode generalizations, capturing 3- and higher-way relationships. For example, "subject-verb-object" relationships, such as the ones recorded by the Read the Web - NELL project [1] (and
[Figure 2 diagram: X (words x voxels x persons) ≈ a1 ∘ b1 ∘ c1 (Concept 1) + a2 ∘ b2 ∘ c2 (Concept 2) + ... + aR ∘ bR ∘ cR (Concept R)]
Figure 2: PARAFAC decomposition of a three-way brain activity tensor as a sum of F outer products (rank-one tensors), reminiscent of the rank-F singular value decomposition of a matrix. Each component corresponds to a latent concept of, e.g., "insects", "tools", and so on: a set of brain regions that are most active for that particular set of words, as well as groups of persons.
have been recently used in this context [9] [15]) naturally lead to a 3-mode tensor. In this work, our working example of a tensor has three modes. The first mode contains a number of nouns; the second mode corresponds to the brain activity, as recorded by an fMRI machine; and the third mode identifies the human subject corresponding to a particular brain activity measurement.
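The sum-of-outer-products form in Figure 2 is one line of numpy. This is an illustrative sketch (the helper name is ours, not from the paper), assuming factor matrices with R columns each:

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from PARAFAC factor matrices
    A (I x R), B (J x R), C (K x R), as in Figure 2:
        X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    i.e. a sum of R rank-one (outer-product) tensors."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Each column triple (a_r, b_r, c_r) is one latent "concept": a group of words, a brain activation map, and a group of persons.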
In [15], the authors introduced a scalable and parallelizable tensor decomposition which uses mode sampling. In this work, we focus on a more general and expressive framework, that of Coupled Matrix-Tensor Factorizations.
2.2 Coupled Matrix-Tensor Factorization Oftentimes, two tensors, or a matrix and a tensor, may have one mode in common. Consider the example that we mentioned earlier: besides a word by brain activity by human subject tensor, we also have a semantic matrix that provides additional information for the same set of words. In this case, we say that the matrix and the tensor are coupled in the 'word' mode.
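A minimal sketch of the usual squared-error CMTF objective for this coupling, assuming a single matrix Y coupled in the first ('word') mode (the function name and factor-matrix names are illustrative):

```python
import numpy as np

def cmtf_cost(X, Y, A, B, C, D):
    """Squared-error Coupled Matrix-Tensor Factorization objective:
        ||X - [[A, B, C]]||_F^2 + ||Y - A @ D.T||_F^2
    where X is the (words x voxels x persons) tensor and Y the
    (words x questions) semantic matrix. The word-mode factor A is
    shared by both terms -- that sharing is the coupling."""
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)  # PARAFAC reconstruction
    return np.sum((X - X_hat) ** 2) + np.sum((Y - A @ D.T) ** 2)
```

ALS-style solvers minimize this cost over one factor matrix at a time, holding the others fixed.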
[Figure 3 diagram: tensor X (words x voxels x persons) coupled with matrix Y (words x questions) along the shared 'word' mode.]
Figure 3: Coupled Matrix-Tensor example: tensors often share one or more modes (shown with a thick, wavy line): X is the brain activity tensor and Y is the semantic matrix. As the wavy line indicates, these two datasets are coupled in the 'word' dimension.
In this work we focus on three mode tensors,
78 CMU SCS 15649 (c) 2014, C. Faloutsos
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
200x faster
CMU SCS A1: Turbo-SMT
79 CMU SCS 15649 (c) 2014, C. Faloutsos
200x faster
Exploiting sparsity
CMU SCS A1: Turbo-SMT
80 CMU SCS 15649 (c) 2014, C. Faloutsos
200x faster
www.cs.cmu.edu/~epapalex/src/turbo_smt.zip
CMU SCS Q2: spilling to the disk?
Reminder: tensor (e.g., subject-verb-object) 144M non-zeros
NELL (Never Ending Language Learner)
@CMU
CMU SCS 15649 81 (c) 2014, C. Faloutsos
26M x 48M x 26M
CMU SCS A2: GigaTensor
Reminder: tensor (e.g., subject-verb-object) 26M x 48M x 26M, 144M non-zeros
NELL (Never Ending Language Learner)
@CMU
CMU SCS 15649 82 (c) 2014, C. Faloutsos
U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos Faloutsos, GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries, KDD’12
CMU SCS
A2: GigaTensor
• GigaTensor solves a 100x larger problem
[Plot: GigaTensor vs. baseline on tensors of size I x J x K (number of nonzeros = I/50); GigaTensor handles the 100x larger problem, where the baseline runs out of memory.]
CMU SCS 15649 83 (c) 2014, C. Faloutsos
CMU SCS
Part 2: Conclusions
• Time-evolving / heterogeneous graphs -> tensors
• PARAFAC finds patterns • Turbo-SMT; GigaTensor -> fast & scalable
CMU SCS 15649 84 (c) 2014, C. Faloutsos
[Figure 4 panels: brain activation maps for Groups 1-4 (Group 3 highlights the premotor cortex), each with its (noun, question) pairs:
Group 1: beetle (can it cause you pain?), pants (do you see it daily?), bee (is it conscious?)
Group 2: bear (does it grow?), cow (is it alive?), coat (was it ever alive?)
Group 3: glass (can you pick it up?), tomato (can you hold it in one hand?), bell (is it smaller than a golfball?)
Group 4: bed (does it use electricity?), house (can you sit on it?), car (does it cast a shadow?)]
Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
CMU SCS
(c) 2014, C. Faloutsos 85
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
CMU SCS 15649
CMU SCS
(c) 2014, C. Faloutsos 86
Project info: PEGASUS
CMU SCS 15649
• www.cs.cmu.edu/~pegasus
• Results on large graphs: with Pegasus + hadoop + M45
• Apache license
• Code, papers, manual, video
Prof. U Kang Prof. Polo Chau
CMU SCS
(c) 2014, C. Faloutsos 87
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Prakash, Aditya
CMU SCS 15649
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Lee, Jay Yoon
Araujo, Miguel
CMU SCS
(c) 2014, C. Faloutsos 88
CONCLUSION#1 – Big data
• Large datasets reveal patterns/outliers that are invisible otherwise
CMU SCS 15649
CMU SCS
(c) 2014, C. Faloutsos 89
CONCLUSION#2 – tensors
• powerful tool
CMU SCS 15649
CMU SCS
(c) 2014, C. Faloutsos 90
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012
  http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
CMU SCS 15649
CMU SCS
TAKE HOME MESSAGE: Cross-disciplinarity
CMU SCS 15649 (c) 2014, C. Faloutsos 91
CMU SCS
TAKE HOME MESSAGE: Cross-disciplinarity
CMU SCS 15649 (c) 2014, C. Faloutsos 92
Questions?