Semantic Representations with Probabilistic Topic Models
Transcript of Semantic Representations with Probabilistic Topic Models
![Page 1: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/1.jpg)
Semantic Representations with Probabilistic Topic Models
Mark Steyvers, Department of Cognitive Sciences
University of California, Irvine
Joint work with: Tom Griffiths, UC Berkeley; Padhraic Smyth, UC Irvine
![Page 2: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/2.jpg)
Topic Models in Machine Learning
• Unsupervised extraction of content from large text collection
• Topics provide quick summary of content / gist
What is in this corpus?
What is in this document, paragraph, or sentence?
What are similar documents to a query?
What are the topical trends over time?
![Page 3: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/3.jpg)
Topic Models in Psychology
• Topic models address three computational problems for semantic memory system:
1) Gist extraction: what is this set of words about?
2) Disambiguation: what is the sense of this word?
- E.g. “football field” vs. “magnetic field”
3) Prediction: what fact, concept, or word is next?
![Page 4: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/4.jpg)
Two approaches to semantic representation

Semantic networks | Semantic spaces

[Figure: a semantic network linking words such as PLAY, BALL, GAME, FUN, BAT, STAGE, THEATER, and a semantic space placing BANK, MONEY, CASH, LOAN, RIVER, STREAM]

Semantic networks: how are these learned?
Semantic spaces: can be learned (e.g., Latent Semantic Analysis), but is this representation flexible enough?
![Page 5: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/5.jpg)
Overview
I. Probabilistic Topic Models
– generative model
– statistical inference: Gibbs sampling
II. Explaining Human Memory
– word association
– semantic isolation
– false memory
III. Information Retrieval
![Page 6: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/6.jpg)
Probabilistic Topic Models
• Extract topics from large text collections
unsupervised
generative
Bayesian statistical inference
• Our modeling work is based on:
– pLSI Model: Hofmann (1999)
– LDA Model: Blei, Ng, and Jordan (2001, 2003)
– Topics Model: Griffiths and Steyvers (2003, 2004)
![Page 7: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/7.jpg)
Model input: “bag of words”
• Matrix of the number of times each word occurs in each document
• Note: some function words are deleted: “the”, “a”, “and”, etc.

[Figure: words × documents count matrix, labeled P(w|d), with example rows for MONEY, BANK, STREAM, and RIVER across Doc1, Doc2, Doc3, …]
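A bag-of-words count matrix like the one on this slide can be sketched with the standard library; the documents and stop-word list below are illustrative, not the talk's corpus.

```python
# Sketch: building a words x documents count matrix ("bag of words").
from collections import Counter

stop_words = {"the", "a", "and", "of", "in"}  # function words to delete

docs = [
    "the bank of the river and the stream",
    "the bank gave a loan of money",
    "money in the bank and a loan",
]

# Count each word per document, dropping function words
counts = [Counter(w for w in d.split() if w not in stop_words) for d in docs]

# Assemble the words x documents matrix: one row of counts per vocabulary word
vocab = sorted(set(w for c in counts for w in c))
matrix = {w: [c[w] for c in counts] for w in vocab}

print(matrix["bank"])  # occurrences of "bank" in each document
```

Word order within a document is discarded at this point, which is exactly the "bag of words" assumption the model relies on.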
![Page 8: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/8.jpg)
Probabilistic Topic Models
• A topic represents a probability distribution over words
– Related words get high probability in same topic
• Example topics extracted from NIH/NSF grants:
Topic 1: brain, fmri, imaging, functional, mri, subjects, magnetic, resonance, neuroimaging, structural

Topic 2: schizophrenia, patients, deficits, schizophrenic, psychosis, subjects, psychotic, dysfunction, abnormalities, clinical

Topic 3: memory, working, memories, tasks, retrieval, encoding, cognitive, processing, recognition, performance

Topic 4: disease, ad, alzheimer, diabetes, cardiovascular, insulin, vascular, blood, clinical, individuals

Each topic is a probability distribution over words; the most likely words are listed first.
![Page 9: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/9.jpg)
Document = mixture of topics
[Figure: the four example topics from the previous slide. One document draws 80% of its words from one topic and 20% from another; a second document draws 100% of its words from a single topic]
![Page 10: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/10.jpg)
Generative Process
• For each document, choose a mixture of topics:
θ ~ Dirichlet(α)
• Sample a topic z ∈ {1..T} from the mixture:
z ~ Multinomial(θ)
• Sample a word from the topic:
w ~ Multinomial(φ(z)), where each topic φ ~ Dirichlet(β)

[Figure: graphical model in plate notation: φ inside a plate of size T; for each of D documents, θ generates a topic assignment z and word w for each of the Nd tokens]
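The generative process above can be sketched directly with numpy; the sizes and the α, β values below are toy choices for illustration, not the talk's settings.

```python
# Sketch of the topic model's generative process for one document.
import numpy as np

rng = np.random.default_rng(0)
T, W, n_words = 2, 5, 10          # topics, vocabulary size, words per document
alpha, beta = 0.5, 0.5            # Dirichlet hyperparameters (illustrative)

phi = rng.dirichlet([beta] * W, size=T)   # one word distribution per topic
theta = rng.dirichlet([alpha] * T)        # topic mixture for this document

doc = []
for _ in range(n_words):
    z = rng.choice(T, p=theta)            # sample a topic from the mixture
    w = rng.choice(W, p=phi[z])           # sample a word from that topic
    doc.append(int(w))

print(doc)  # word indices drawn from the document's topic mixture
```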
![Page 11: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/11.jpg)
Prior Distributions

• Dirichlet priors encourage sparsity on topic mixtures and topics:
θ ~ Dirichlet(α)   φ ~ Dirichlet(β)

[Figure: probability simplices with corners Topic 1, Topic 2, Topic 3 and Word 1, Word 2, Word 3; darker colors indicate lower probability]
![Page 12: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/12.jpg)
Creating an Artificial Dataset

Two topics, 16 documents. Can we recover the original topics and topic mixtures from this data?

        topic 1   topic 2
River    0.33      0
Stream   0.33      0
Bank     0.33      0.33
Money    0         0.33
Loan     0         0.33

[Figure: the 16 documents as rows, showing each word token (River, Stream, Bank, Money, Loan) per document]
![Page 13: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/13.jpg)
Statistical Inference

• Three sets of latent variables:
– topic mixtures θ
– word distributions φ
– topic assignments z
• Estimate the posterior distribution over topic assignments, P(z | w)
(θ and φ can then be inferred)
![Page 14: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/14.jpg)
Statistical Inference

• Exact inference is intractable
• Use approximate methods: Markov chain Monte Carlo (MCMC) with Gibbs sampling

P(z | w) = P(w, z) / Σ_z′ P(w, z′)

The sum in the denominator ranges over T^n terms.
![Page 15: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/15.jpg)
Gibbs Sampling

P(z_i = t | z_−i, w) ∝ [ (n_wt^−i + β) / (n_·t^−i + Wβ) ] × [ (n_td^−i + α) / (n_·d^−i + Tα) ]

where n_wt is the count of word w assigned to topic t, n_td is the count of topic t assigned to document d, and the superscript −i excludes the current token. This gives the probability that word token i is assigned to topic t.
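A minimal collapsed Gibbs sampler implementing the sampling equation above; the tiny corpus and the α, β values are toy stand-ins, not the talk's data.

```python
# Minimal collapsed Gibbs sampler for the topic model.
import numpy as np

rng = np.random.default_rng(1)
docs = [[0, 1, 2], [0, 0, 2], [3, 4, 2], [3, 4, 4]]  # word indices per doc
T, W, D = 2, 5, len(docs)
alpha, beta = 1.0, 1.0

# Random initial topic assignment for every word token, plus count tables
z = [[int(rng.integers(T)) for _ in d] for d in docs]
n_wt = np.zeros((W, T))   # word-topic counts
n_td = np.zeros((T, D))   # topic-document counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        n_wt[w, z[d][i]] += 1
        n_td[z[d][i], d] += 1

for _ in range(100):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                   # remove current assignment
            n_wt[w, t] -= 1
            n_td[t, d] -= 1
            # P(z_i = t | z_-i, w): word-topic term times topic-document term
            p = ((n_wt[w] + beta) / (n_wt.sum(axis=0) + W * beta)
                 * (n_td[:, d] + alpha) / (n_td[:, d].sum() + T * alpha))
            t = int(rng.choice(T, p=p / p.sum()))
            z[d][i] = t
            n_wt[w, t] += 1
            n_td[t, d] += 1

# Estimated topics: each column of phi is a distribution over words
phi = (n_wt + beta) / (n_wt.sum(axis=0) + W * beta)
```

On data like the river/bank example, the sampler tends to separate the {river, stream} words from the {money, loan} words, with "bank" shared between the two topics.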
![Page 16: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/16.jpg)
Example of Gibbs Sampling
• Assign word tokens randomly to topics (●=topic 1; ○=topic 2)

[Figure: the 16-document dataset with each word token randomly assigned a topic]
![Page 17: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/17.jpg)
After 1 iteration

• Apply the sampling equation to each word token (●=topic 1; ○=topic 2)

[Figure: token-topic assignments after one Gibbs iteration]
![Page 18: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/18.jpg)
After 4 iterations (●=topic 1; ○=topic 2)

[Figure: token-topic assignments after four Gibbs iterations]
![Page 19: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/19.jpg)
After 8 iterations (●=topic 1; ○=topic 2)

[Figure: token-topic assignments after eight Gibbs iterations]
![Page 20: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/20.jpg)
After 32 iterations (●=topic 1; ○=topic 2), the estimated topics approach the generating topics:

        topic 1   topic 2
River    0.42      0
Stream   0.29      0.05
Bank     0.28      0.31
Money    0         0.29
Loan     0         0.35
![Page 21: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/21.jpg)
Algorithm input/output

INPUT: word-document counts (word order is irrelevant)

OUTPUT:
– topic assignments to each word: P(z_i)
– likely words in each topic: P(w | z)
– likely topics in each document (“gist”): P(θ | d)
![Page 22: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/22.jpg)
Software
Public-domain MATLAB toolbox for topic modeling on the Web:
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
![Page 23: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/23.jpg)
Example Topics from the New York Times
Stock Market: WEEK, DOW_JONES, POINTS, 10_YR_TREASURY_YIELD, PERCENT, CLOSE, NASDAQ_COMPOSITE, STANDARD_POOR, CHANGE, FRIDAY, DOW_INDUSTRIALS, GRAPH_TRACKS, EXPECTED, BILLION, NASDAQ_COMPOSITE_INDEX, EST_02, PHOTO_YESTERDAY, YEN, 10, 500_STOCK_INDEX

Wall Street Firms: WALL_STREET, ANALYSTS, INVESTORS, FIRM, GOLDMAN_SACHS, FIRMS, INVESTMENT, MERRILL_LYNCH, COMPANIES, SECURITIES, RESEARCH, STOCK, BUSINESS, ANALYST, WALL_STREET_FIRMS, SALOMON_SMITH_BARNEY, CLIENTS, INVESTMENT_BANKING, INVESTMENT_BANKERS, INVESTMENT_BANKS

Terrorism: SEPT_11, WAR, SECURITY, IRAQ, TERRORISM, NATION, KILLED, AFGHANISTAN, ATTACKS, OSAMA_BIN_LADEN, AMERICAN, ATTACK, NEW_YORK_REGION, NEW, MILITARY, NEW_YORK, WORLD, NATIONAL, QAEDA, TERRORIST_ATTACKS

Bankruptcy: BANKRUPTCY, CREDITORS, BANKRUPTCY_PROTECTION, ASSETS, COMPANY, FILED, BANKRUPTCY_FILING, ENRON, BANKRUPTCY_COURT, KMART, CHAPTER_11, FILING, COOPER, BILLIONS, COMPANIES, BANKRUPTCY_PROCEEDINGS, DEBTS, RESTRUCTURING, CASE, GROUP
![Page 24: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/24.jpg)
Example topics from an educational corpus
– PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE
– PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR
– TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED
– JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE
– HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST
– STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER
Example topics from psych review abstracts
– SIMILARITY, CATEGORY, CATEGORIES, RELATIONS, DIMENSIONS, FEATURES, STRUCTURE, SIMILAR, REPRESENTATION, OBJECTS
– STIMULUS, CONDITIONING, LEARNING, RESPONSE, STIMULI, RESPONSES, AVOIDANCE, REINFORCEMENT, CLASSICAL, DISCRIMINATION
– MEMORY, RETRIEVAL, RECALL, ITEMS, INFORMATION, TERM, RECOGNITION, LIST, ASSOCIATIVE
– GROUP, INDIVIDUAL, GROUPS, OUTCOMES, INDIVIDUALS, DIFFERENCES, INTERACTION
– EMOTIONAL, EMOTION, BASIC, EMOTIONS, AFFECT, STATES, EXPERIENCES, AFFECTIVE, AFFECTS, RESEARCH
![Page 25: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/25.jpg)
Choosing number of topics
• Bayesian model selection
• Generalization test
– e.g., perplexity on out-of-sample data
• Non-parametric Bayesian approach
– Number of topics grows with size of data
– E.g. Hierarchical Dirichlet Processes (HDP)
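The generalization test mentioned above can be made concrete: compute perplexity of held-out words under the fitted model, then compare across topic counts T. The θ and φ values below are stand-ins for fitted estimates, not results from the talk.

```python
# Sketch: perplexity of held-out words, one way to compare topic counts.
import numpy as np

phi = np.array([[0.4, 0.4, 0.2, 0.0, 0.0],     # topic 1 over 5 words
                [0.0, 0.0, 0.2, 0.4, 0.4]])    # topic 2
theta = np.array([0.5, 0.5])                   # document's topic mixture
held_out = [0, 2, 4]                           # held-out word indices

# p(w | d) = sum_z p(w | z) p(z | d); perplexity = exp(-mean log p)
log_p = [np.log(theta @ phi[:, w]) for w in held_out]
perplexity = np.exp(-np.mean(log_p))
print(perplexity)   # lower is better
```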
![Page 26: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/26.jpg)
Applications to Human Memory
![Page 27: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/27.jpg)
Computational Problems for Semantic Memory System
• Gist extraction: what is this set of words about? → P(z | w)
• Disambiguation: what is the sense of this word? → P(z_i | w, context)
• Prediction: what fact, concept, or word is next? → P(w2 | w1)
![Page 28: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/28.jpg)
Disambiguation
[Figure: two bar charts of P(z_FIELD | w). For the single word “FIELD”, probability is spread between a magnetism topic (FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES) and a sports topic (BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD); for “FOOTBALL FIELD”, nearly all probability shifts to the sports topic]
![Page 29: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/29.jpg)
Modeling Word Association
![Page 30: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/30.jpg)
Word Association (norms from Nelson et al., 1998)

CUE: PLANET
![Page 31: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/31.jpg)
Word Association (norms from Nelson et al., 1998)

CUE: PLANET — people’s associates, ranked 1–8: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY

(vocabulary = 5000+ words)
![Page 32: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/32.jpg)
Word Association as a Prediction Problem

• Given that a single word (the cue, w1) is observed, predict what other words (responses, w2) might occur in that context
• Under a single-topic assumption:

P(w2 | w1) = Σ_z P(w2 | z) P(z | w1)
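The single-topic prediction rule above can be sketched with the toy river/money topics from earlier slides; the uniform prior on topics is an added assumption for the Bayes step P(z | w1).

```python
# Sketch: predicted associates via P(w2 | w1) = sum_z P(w2 | z) P(z | w1).
import numpy as np

words = ["river", "stream", "bank", "money", "loan"]
phi = np.array([[0.33, 0.33, 0.33, 0.00, 0.00],   # "river" topic
                [0.00, 0.00, 0.33, 0.33, 0.33]])  # "money" topic

def associates(cue):
    w1 = words.index(cue)
    p_z = phi[:, w1] / phi[:, w1].sum()   # P(z | w1), uniform prior on z
    p_w2 = p_z @ phi                      # P(w2 | w1), summed over topics
    return sorted(zip(words, p_w2), key=lambda x: -x[1])

print(associates("money"))  # bank, money, loan lead; river words get 0
```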
![Page 33: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/33.jpg)
Word Association (norms from Nelson et al., 1998)

CUE: PLANET — associates ranked 1–8:
people: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY
model: STARS, STAR, SUN, EARTH, SPACE, SKY, PLANET, UNIVERSE

First associate “EARTH” has rank 4 in the model.
![Page 34: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/34.jpg)
[Figure: median rank of the first associate in the topic model as a function of the number of topics (300–1700)]
![Page 35: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/35.jpg)
[Figure: the same plot with LSA added for comparison — median rank of the first associate for the topic model (300–1700 topics) vs. LSA using cosine and inner-product similarity]
![Page 36: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/36.jpg)
Episodic Memory
Semantic Isolation EffectsFalse Memory
![Page 37: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/37.jpg)
Semantic Isolation Effect
Study this list: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH

Recall: HAMMER, PEAS, CARROTS, ...
![Page 38: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/38.jpg)
Semantic isolation effect / Von Restorff effect
• Finding: contextually unique words are better remembered
• Verbal explanations: attention, surprise, distinctiveness
• Our approach: assume memories can be accessed and encoded at multiple levels of description
– semantic/gist aspects: generic information
– verbatim: specific information
![Page 39: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/39.jpg)
Computational Problem

• How to trade off specificity and generality? (remembering detail and gist)
• Dual route topic model = topic model + encoding of specific words
![Page 40: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/40.jpg)
Dual route topic model

• Two ways to generate words:
– topic model
– verbatim word distribution (unique to each document)
• Each word comes from a single route; a switch variable x_i for every word i:
x_i = 0 → topics
x_i = 1 → verbatim
• Conditional probability of a word in a document:

P(w | d) = P(x=0 | d) P_topics(w | d) + P(x=1 | d) P_verbatim(w | d)
![Page 41: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/41.jpg)
Graphical Model
Variable x is a switch :
x=0 sample from topicx=1 sample from verbatim word distribution
![Page 42: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/42.jpg)
Applying the Dual Route Topic Model to Human Memory

• Train model on an educational corpus (TASA): 37K documents, 1700 topics
• Apply model to list memory experiments:
– a study list is treated as a “document”
– recall probability is based on the model:

P(w | d) = P(x=0 | d) P_topics(w | d) + P(x=1 | d) P_verbatim(w | d)
![Page 43: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/43.jpg)
Study words: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH

[Figure: ENCODING and RETRIEVAL under the dual route model. At encoding, a switch probability splits words between the verbatim and topic routes; HAMMER gets by far the highest verbatim word probability, while the topic route is dominated by a VEGETABLES topic (with FURNITURE and TOOLS far behind). At retrieval, HAMMER has the highest retrieval probability of all study words]
![Page 44: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/44.jpg)
Hunt & Lamb (2001, Exp. 1)

OUTLIER LIST: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH

CONTROL LIST: SAW, SCREW, CHISEL, DRILL, SANDPAPER, HAMMER, NAILS, BENCH, RULER, ANVIL

[Figure: DATA — probability of recall for the target (HAMMER) and background words in the outlier list vs. the pure (control) list]
![Page 45: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/45.jpg)
False Memory (e.g., Deese, 1959; Roediger & McDermott)

Study this list: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy

Recall: SLEEP, BED, REST, ...
![Page 46: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/46.jpg)
False memory effects (lure = ANGER)

[Figure: study lists containing 3, 6, or 9 associates of ANGER (MAD, FEAR, HATE; then RAGE, TEMPER, FURY; then WRATH, HAPPY, FIGHT) mixed with unrelated fillers (SMOOTH, NAVY, HEAT, SALAD, TUNE, COURTS, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER)]

[Figure: DATA (Robinson & Roediger, 1997) — probability of recall for studied items and for the nonstudied lure as a function of the number of associates studied (3, 6, 9, 12, 15); PREDICTED — the model's retrieval probability for studied associates and the nonstudied lure]
![Page 47: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/47.jpg)
Modeling Serial Order Effects in Free Recall
![Page 48: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/48.jpg)
Problem

• The dual route model predicts no sequential effects, but word order is important in human memory experiments
• The standard Gibbs sampler is psychologically implausible:
– it assumes the list is processed in parallel
– each item can influence the encoding of every other item
![Page 49: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/49.jpg)
Semantic isolation experiment to study order effects

• Study lists 14 words long:
– 14 isolate lists (e.g., A A A B A A ... A A)
– 14 control lists (e.g., A A A A A A ... A A)
• The serial position of the isolate was varied (any of the 14 positions)
![Page 50: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/50.jpg)
Immediate Recall Results

Isolate list: B A A A A ... A
Control list: A A A A A ... A
![Page 51: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/51.jpg)
![Page 52: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/52.jpg)
Isolate list: A B A A A ... A
Control list: A A A A A ... A
![Page 53: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/53.jpg)
Isolate list: A A B A A ... A
Control list: A A A A A ... A
![Page 54: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/54.jpg)
Immediate Recall Results
![Page 55: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/55.jpg)
Modified Gibbs Sampling Scheme

• Update items non-uniformly in the Gibbs sampler
• Probability of updating item i after observing words 1..t:

Pr(item to update = i) ∝ λ^(t−i)

where t is the current time and λ is a parameter; words further back in time are less likely to be re-assigned.
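A small sketch of this non-uniform update rule, assuming (as above) that the update probability is proportional to λ^(t−i): λ = 1 updates all items uniformly, while λ = 0 only ever re-samples the most recent item.

```python
# Sketch: non-uniform item-update probabilities for the modified Gibbs scheme.
import numpy as np

def update_probs(t, lam):
    # Pr(update item i | current time t) proportional to lam**(t - i)
    w = np.array([lam ** (t - i) for i in range(1, t + 1)], dtype=float)
    return w / w.sum()

print(update_probs(5, 0.3))  # most mass on the most recent item
print(update_probs(5, 1.0))  # lambda = 1: uniform over all items
```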
![Page 56: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/56.jpg)
Effect of Sampling Scheme

[Figure: three panels of recall probability by serial position (study order, positions 1–14) for control and outlier lists, under λ = 1, λ = 0.3, and λ = 0]
![Page 57: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/57.jpg)
Normalized Serial Position Effects

[Figure: P(Isolate) − P(Main) as a function of serial position (1–14), DATA vs. MODEL]
![Page 58: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/58.jpg)
Information Retrieval &
Human Memory
![Page 59: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/59.jpg)
Example

• Searching for information on Padhraic Smyth:
![Page 60: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/60.jpg)
Query = “Smyth”
![Page 61: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/61.jpg)
Query = “Smyth irish computer science department”
![Page 62: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/62.jpg)
Query = “Smyth irish computer science department weather prediction seasonal climate fluctuations hmm models nips conference consultant yahoo netflix prize dave newman steyvers”
![Page 63: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/63.jpg)
Problem

• More information in a query can lead to worse search results
• Human memory typically works better with more cues
• Problem: how can we better match queries to documents to allow for partial matches, and matches across documents?
![Page 64: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/64.jpg)
Dual route model for information retrieval

• Encode documents with two routes:
– contextually unique words → verbatim route
– thematic words → topics route
![Page 65: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/65.jpg)
Example encoding of a psych review abstract
alcove attention learning covering map is a connectionist model of category learning that incorporates an exemplar based representation d . l . medin and m . m . schaffer 1978 r . m . nosofsky 1986 with error driven learning m . a . gluck and g . h . bower 1988 d . e . rumelhart et al 1986 . alcove selectively attends to relevant stimulus dimensions is sensitive to correlated dimensions can account for a form of base rate neglect does not suffer catastrophic forgetting and can exhibit 3 stage u shaped learning of high frequency exceptions to rules whereas such effects are not easily accounted for by models using other combinations of representation and learning method .
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Contextually unique words:ALCOVE, SCHAFFER, MEDIN, NOSOFSKY
Topic 1 (p=0.21): learning phenomena acquisition learn acquired ...
Topic 22 (p=0.17): similarity objects object space category dimensional categories spatial
Topic 61 (p=0.08): representations representation order alternative 1st higher 2nd descriptions problem form
![Page 66: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/66.jpg)
Retrieval Experiments

• For each candidate document, calculate how likely the query was “generated” from the model’s encoding:

P(Query | d) = Π_{w ∈ Query} [ P(x=0 | d) P_topics(w | d) + P(x=1 | d) P_verbatim(w | d) ]
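The dual-route query likelihood can be sketched as a scoring function; all distributions below (switch probability, topic and verbatim word probabilities, the 1e-6 smoothing floor) are toy stand-ins for the model's estimates, not values from the talk.

```python
# Sketch: scoring a document by dual-route query likelihood.
import numpy as np

def score(query, p_x0, p_topics, p_verbatim):
    # P(Query|d) = prod_w [ P(x=0|d) P_topics(w|d) + P(x=1|d) P_verbatim(w|d) ]
    return float(np.prod([p_x0 * p_topics.get(w, 1e-6)
                          + (1 - p_x0) * p_verbatim.get(w, 1e-6)
                          for w in query]))

# Toy encoding of one document (cf. the ALCOVE abstract example):
doc = {"p_x0": 0.8,                                  # P(x=0 | d), topics route
       "p_topics": {"learning": 0.05, "category": 0.03},
       "p_verbatim": {"alcove": 0.2}}                # contextually unique word

q = ["alcove", "category"]
print(score(q, doc["p_x0"], doc["p_topics"], doc["p_verbatim"]))
```

Because each query word is scored as a mixture of the two routes, a document can match partially on theme even when a specific query word is missing, which is the behavior the slide above motivates.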
![Page 67: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/67.jpg)
Information Retrieval Results

Evaluation metric: precision for the 10 highest-ranked documents.

APs:
Method  Title  Desc  Concepts
TFIDF   .406   .434  .549
LSI     .455   .469  .523
LDA     .478   .463  .556
SW      .488   .468  .561
SWB     .495   .473  .558

FRs:
Method  Title  Desc  Concepts
TFIDF   .300   .287  .483
LSI     .366   .327  .487
LDA     .428   .340  .487
SW      .448   .407  .560
SWB     .459   .400  .560
![Page 68: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/68.jpg)
Information retrieval systems in the mind & web

• Similar computational demands: both retrieve the most relevant items from a large information repository in response to external cues or queries
• Useful analogies / interdisciplinary approaches
• Many cognitive aspects in information retrieval:
– Internet content is produced by humans
– queries are formulated by humans
![Page 69: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/69.jpg)
Recent Papers

• Steyvers, M., Griffiths, T.L., & Dennis, S. (2006). Probabilistic inference in human semantic memory. Trends in Cognitive Sciences, 10(7), 327-334.
• Griffiths, T.L., Steyvers, M., & Tenenbaum, J.B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211-244.
• Griffiths, T.L., Steyvers, M., & Firl, A. (in press). Google and the mind: Predicting fluency with PageRank. Psychological Science.
• Steyvers, M., & Griffiths, T.L. (in press). Rational analysis as a link between human memory and information retrieval. In N. Chater & M. Oaksford (Eds.), The Probabilistic Mind: Prospects from Rational Models of Cognition. Oxford University Press.
• Chemudugunta, C., Smyth, P., & Steyvers, M. (2007). Modeling general and specific aspects of documents with a probabilistic topic model. In Advances in Neural Information Processing Systems, 19.
![Page 70: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/70.jpg)
Text Mining Applications
![Page 71: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/71.jpg)
Topics provide quick summary of content
• Who writes on what topics?
• What is in this corpus? What is in this document?
• What are the topical trends over time?
• Who is mentioned in what context?
![Page 72: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/72.jpg)
Faculty Browser

• The system spiders UCI/UCSD faculty websites related to CalIT2 (California Institute for Telecommunications and Information Technology)
• Applies the topic model to text extracted from PDF files
• Browser demo: http://yarra.calit2.uci.edu/calit2/
![Page 73: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/73.jpg)
one topic
most prolific researchers for this topic
![Page 74: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/74.jpg)
topics this researcher works on
other researchers with similar
topical interests
one researcher
![Page 75: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/75.jpg)
Inferred network of researchers connected through topics
![Page 76: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/76.jpg)
Analyzing the New York Times

330,000 articles, 2000-2002
![Page 77: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/77.jpg)
Three investigations began Thursday into the securities and exchange_commission's choice of william_webster to head a new board overseeing the accounting profession. house and senate_democrats called for the resignations of both judge_webster and harvey_pitt, the commission's chairman. The white_house expressed support for judge_webster as well as for harvey_pitt, who was harshly criticized Thursday for failing to inform other commissioners before they approved the choice of judge_webster that he had led the audit committee of a company facing fraud accusations. “The president still has confidence in harvey_pitt,” said dan_bartlett, bush's communications director …
Extracted Named Entities
• Used standard algorithms to extract named entities:
- People
- Places
- Organizations
![Page 78: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/78.jpg)
Standard Topic Model with Entities
(lower-case entries: words; upper-case entries: named entities)

| Basketball | Tour de France | Holidays | Oscars |
| --- | --- | --- | --- |
| team 0.028 | tour 0.039 | holiday 0.071 | award 0.026 |
| play 0.015 | rider 0.029 | gift 0.050 | film 0.020 |
| game 0.013 | riding 0.017 | toy 0.023 | actor 0.020 |
| season 0.012 | bike 0.016 | season 0.019 | nomination 0.019 |
| final 0.011 | team 0.016 | doll 0.014 | movie 0.015 |
| games 0.011 | stage 0.014 | tree 0.011 | actress 0.011 |
| point 0.011 | race 0.013 | present 0.008 | won 0.011 |
| series 0.011 | won 0.012 | giving 0.008 | director 0.010 |
| player 0.010 | bicycle 0.010 | special 0.007 | nominated 0.010 |
| coach 0.009 | road 0.009 | shopping 0.007 | supporting 0.010 |
| playoff 0.009 | hour 0.009 | family 0.007 | winner 0.008 |
| championship 0.007 | scooter 0.008 | celebration 0.007 | picture 0.008 |
| playing 0.006 | mountain 0.008 | card 0.007 | performance 0.007 |
| win 0.006 | place 0.008 | tradition 0.006 | nominees 0.007 |
| LAKERS 0.062 | LANCE-ARMSTRONG 0.021 | CHRISTMAS 0.058 | OSCAR 0.035 |
| SHAQUILLE-O-NEAL 0.028 | FRANCE 0.011 | THANKSGIVING 0.018 | ACADEMY 0.020 |
| KOBE-BRYANT 0.028 | JAN-ULLRICH 0.003 | SANTA-CLAUS 0.009 | HOLLYWOOD 0.009 |
| PHIL-JACKSON 0.019 | LANCE 0.003 | BARBIE 0.004 | DENZEL-WASHINGTON 0.006 |
| NBA 0.013 | U-S-POSTAL-SERVICE 0.002 | HANUKKAH 0.003 | JULIA-ROBERT 0.005 |
| SACRAMENTO 0.007 | MARCO-PANTANI 0.002 | MATTEL 0.003 | RUSSELL-CROWE 0.005 |
| RICK-FOX 0.007 | PARIS 0.002 | GRINCH 0.003 | TOM-HANK 0.005 |
| PORTLAND 0.006 | ALPS 0.002 | HALLMARK 0.002 | STEVEN-SODERBERGH 0.004 |
| ROBERT-HORRY 0.006 | PYRENEES 0.001 | EASTER 0.002 | ERIN-BROCKOVICH 0.003 |
| DEREK-FISHER 0.006 | SPAIN 0.001 | HASBRO 0.002 | KEVIN-SPACEY 0.003 |
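Tables like the one above are typically produced by ranking each topic's smoothed word distribution. A minimal sketch with a hypothetical toy vocabulary and counts (`beta` is an assumed Dirichlet smoothing parameter, not a value from the slides):

```python
import numpy as np

def top_words(word_topic_counts, vocab, beta=0.01, n=5):
    """Rank words within each topic by their estimated probability.

    word_topic_counts: (V, T) matrix of how often each word was assigned
    to each topic during sampling; beta is the Dirichlet smoothing.
    """
    V, T = word_topic_counts.shape
    # phi[w, t] = p(word w | topic t), smoothed by beta
    phi = (word_topic_counts + beta) / (word_topic_counts.sum(axis=0) + V * beta)
    result = []
    for t in range(T):
        order = np.argsort(phi[:, t])[::-1][:n]   # descending by probability
        result.append([(vocab[w], round(float(phi[w, t]), 3)) for w in order])
    return result

# Toy example: 5-word vocabulary, 2 topics (made-up counts)
vocab = ["team", "game", "holiday", "gift", "season"]
counts = np.array([[28, 0], [13, 0], [0, 71], [0, 50], [12, 19]], dtype=float)
for t, words in enumerate(top_words(counts, vocab, n=3)):
    print(t, words)
```

Each topic column in the table is then just the first few entries of the ranked list.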
![Page 79: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/79.jpg)
Topic Trends

[Time-series plots, Jan 2000 – Jan 2003, of the proportion of words assigned to the topic for each time slice, for three topics: Tour-de-France, Anthrax, and Quarterly Earnings.]
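The trend curves can be computed directly from the sampler's word-topic assignments: for each time slice, take the fraction of that slice's word tokens assigned to the topic. A minimal sketch with made-up token assignments:

```python
from collections import Counter, defaultdict

def topic_trends(assignments):
    """assignments: list of (time_slice, topic) pairs, one per word token.
    Returns {topic: {time_slice: proportion of that slice's tokens}}."""
    slice_totals = Counter(sl for sl, _ in assignments)
    joint = Counter(assignments)
    trends = defaultdict(dict)
    for (sl, topic), c in joint.items():
        trends[topic][sl] = c / slice_totals[sl]
    return trends

# Hypothetical token-level assignments for two time slices
tokens = [("2000-Q3", "tour")] * 8 + [("2000-Q3", "other")] * 92 \
       + [("2001-Q4", "anthrax")] * 15 + [("2001-Q4", "other")] * 85
trends = topic_trends(tokens)
print(trends["tour"]["2000-Q3"])     # 0.08
print(trends["anthrax"]["2001-Q4"])  # 0.15
```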
![Page 80: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/80.jpg)
Example of Extracted Entity-Topic Network
[Network diagram linking topics and entities.]

Topics: Muslim_Militance, Mid_East_Conflict, Palestinian_Territories, Pakistan_Indian_War, FBI_Investigation, Detainees, Mid_East_Peace, US_Military, Religion, Terrorist_Attacks, Afghanistan_War

Entities: AL_QAEDA, HAMID_KARZAI, MOHAMMED, MOHAMMED_ATTA, NORTHERN_ALLIANCE, BIN_LADEN, TALIBAN, ZAWAHIRI, YASSER_ARAFAT, EHUD_BARAK, ARIEL_SHARON, HAMAS, AL_HAZMI, KING_HUSSEIN
![Page 81: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/81.jpg)
Prediction of Missing Entities in Text
Test article with entities removed:

Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major credit agencies said the conglomerate would still be challenged in repaying its debts, despite raising $4.6 billion Monday in taking its finance group public. Analysts at XXXX Investors service in XXXX said they were keeping XXXX and its subsidiaries under review for a possible debt downgrade, saying the company “will continue to face a significant debt burden,” with large slices of debt coming due over the next 18 months. XXXX said …
![Page 82: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/82.jpg)
Prediction of Missing Entities in Text (continued)

[Test article repeated from the previous slide, with entities removed.]

Actual missing entities: fitch, goldman-sachs, lehman-brother, moody, morgan-stanley, new-york-stock-exchange, standard-and-poor, tyco, tyco-international, wall-street, worldco
![Page 83: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/83.jpg)
Prediction of Missing Entities in Text (continued)

[Test article repeated from the previous slide, with entities removed.]

Actual missing entities: fitch, goldman-sachs, lehman-brother, moody, morgan-stanley, new-york-stock-exchange, standard-and-poor, tyco, tyco-international, wall-street, worldco

Predicted entities given observed words (matches with the actual list marked in **bold**): **wall-street**, new-york, nasdaq, securities-exchange-commission, sec, merrill-lynch, **new-york-stock-exchange**, **goldman-sachs**, **standard-and-poor**
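One way to rank candidate entities, consistent with the setup above, is to infer the document's topic mixture θ from the observed (non-entity) words and score each entity by Σ_t p(entity | t) θ_t. A hypothetical sketch with toy distributions; the EM-style θ update is an illustrative choice, not necessarily the exact inference used in the original work:

```python
import numpy as np

def predict_entities(doc_word_ids, phi_words, phi_entities, alpha=0.1, iters=50):
    """Infer a document's topic mixture theta from its observed words,
    then rank entities by p(entity | doc) = sum_t p(entity | t) * theta_t.
    phi_words: (T, Vw) word distributions; phi_entities: (T, Ve)."""
    T = phi_words.shape[0]
    theta = np.full(T, 1.0 / T)
    for _ in range(iters):
        # responsibility of each topic for each observed word token
        r = theta[:, None] * phi_words[:, doc_word_ids]   # (T, N)
        r /= r.sum(axis=0, keepdims=True)
        theta = r.sum(axis=1) + alpha                     # smoothed topic counts
        theta /= theta.sum()
    scores = theta @ phi_entities                         # (Ve,)
    return np.argsort(scores)[::-1]                       # best entity first

# Toy model: topic 0 = finance, topic 1 = sports (made-up numbers)
phi_words = np.array([[0.60, 0.30, 0.05, 0.05],    # debt, credit, game, team
                      [0.05, 0.05, 0.50, 0.40]])
phi_entities = np.array([[0.7, 0.2, 0.1],          # wall-street, moody, lakers
                         [0.1, 0.1, 0.8]])
ranking = predict_entities([0, 1, 0], phi_words, phi_entities)
print(ranking[0])  # the finance entity ranks first for a finance document
```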
![Page 84: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/84.jpg)
Model Extensions
![Page 85: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/85.jpg)
Model Extensions
• HMM-topics model
– Modeling aspects of syntax
• Hierarchical topic model
– Modeling relations between topics
• Collocation topic models
– Learning collocations of words within topics
![Page 86: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/86.jpg)
Hidden Markov Topic Model
![Page 87: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/87.jpg)
Hidden Markov Topics Model
• Syntactic dependencies are short-range; semantic dependencies are long-range
[Graphical model: for each word w there is a topic assignment z and an HMM class assignment s; the s chain captures the short-range structure.]

Semantic state: generates words from the topic model
Syntactic states: generate words from the HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
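The composite model can be sketched generatively: an HMM walks over one semantic state and several syntactic classes; in the semantic state a topic is drawn from the document mixture and the word from that topic, otherwise the word comes from the class's emission distribution. The transition matrix below is hypothetical; the topic and class distributions mirror the example on the following slides:

```python
import random

def generate(n_words, topics, doc_theta, hmm_emissions, hmm_trans,
             start_state=1, semantic_state=0):
    """Generative sketch of the composite HMM/topics model: each word
    comes either from a document-specific topic (when the HMM is in the
    semantic state) or from a syntactic class emission distribution."""
    words, state = [], start_state
    for _ in range(n_words):
        if state == semantic_state:
            # semantic state: pick a topic from the document mixture,
            # then a word from that topic
            z = random.choices(range(len(topics)), weights=doc_theta)[0]
            w_dist = topics[z]
        else:
            # syntactic state: emit from the HMM class distribution
            w_dist = hmm_emissions[state]
        words.append(random.choices(list(w_dist), weights=list(w_dist.values()))[0])
        state = random.choices(range(len(hmm_trans)), weights=hmm_trans[state])[0]
    return words

random.seed(0)
topics = [{"HEART": .2, "LOVE": .2, "SOUL": .2, "TEARS": .2, "JOY": .2},
          {"SCIENTIFIC": .2, "KNOWLEDGE": .2, "WORK": .2, "RESEARCH": .2, "MATHEMATICS": .2}]
hmm_emissions = {1: {"THE": .6, "A": .3, "MANY": .1},
                 2: {"OF": .6, "FOR": .3, "BETWEEN": .1}}
hmm_trans = [[0.1, 0.1, 0.8],   # hypothetical transitions: after a topic word,
             [0.9, 0.0, 0.1],   # usually a preposition; after a determiner,
             [0.3, 0.7, 0.0]]   # usually a topic word; etc.
print(" ".join(generate(6, topics, [0.4, 0.6], hmm_emissions, hmm_trans)))
```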
![Page 88: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/88.jpg)
Transition between semantic state and syntactic states

Semantic state:
- Topic z = 1 (probability 0.4): HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2
- Topic z = 2 (probability 0.6): SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2

Syntactic states:
- THE 0.6, A 0.3, MANY 0.1
- OF 0.6, FOR 0.3, BETWEEN 0.1

[Diagram edges carry transition probabilities 0.9, 0.1, 0.2, 0.8, 0.7, 0.3 between the states.]
![Page 89: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/89.jpg)
Combining topics and syntax

Generated so far: THE …
[State diagram as on the previous slide; THE is emitted from the syntactic class containing THE, A, MANY.]
![Page 90: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/90.jpg)
Combining topics and syntax

Generated so far: THE LOVE …
[LOVE is emitted from semantic topic z = 1.]
![Page 91: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/91.jpg)
Combining topics and syntax

Generated so far: THE LOVE OF …
[OF is emitted from the syntactic class containing OF, FOR, BETWEEN.]
![Page 92: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/92.jpg)
Combining topics and syntax

Generated so far: THE LOVE OF RESEARCH …
[RESEARCH is emitted from semantic topic z = 2.]
![Page 93: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/93.jpg)
Semantic topics

- FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, PROTEIN, MINERALS
- MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, LATITUDE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
- DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, NURSING, TREATMENT, NURSES, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
- BOOK, BOOKS, READING, INFORMATION, LIBRARY, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLE, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
- GOLD, IRON, SILVER, COPPER, METAL, METALS, STEEL, CLAY, LEAD, ADAM, ORE, ALUMINUM, MINERAL, MINE, STONE, MINERALS, POT, MINING, MINERS, TIN
- BEHAVIOR, SELF, INDIVIDUAL, PERSONALITY, RESPONSE, SOCIAL, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, INDIVIDUALS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, HUMAN, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
- CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, FOOD, LIVING, FUNGI, MOLD, MATERIALS, NUCLEUS, CELLED, STRUCTURES, MATERIAL, STRUCTURE, GREEN, MOLDS
- PLANTS, PLANT, LEAVES, SEEDS, SOIL, ROOTS, FLOWERS, WATER, FOOD, GREEN, SEED, STEMS, FLOWER, STEM, LEAF, ANIMALS, ROOT, POLLEN, GROWING, GROW
![Page 94: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/94.jpg)
Syntactic classes

- GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, *, BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
- THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, A, AN, THAT, NEW, THOSE, EACH, MR, ANY, MRS, ALL
- MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LARGER, LONGER, FASTER, EXACTLY, SMALLER, SOMETHING, BIGGER, FEWER, LOWER, ALMOST
- ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
- SAID, ASKED, THOUGHT, TOLD, SAYS, MEANS, CALLED, CRIED, SHOWS, ANSWERED, TELLS, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
- ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, THIS, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, MUCH, TWENTY, EIGHT
- HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, ONE, SOMETHING, ANYONE, EVERYBODY, SOME, THEN
- BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME
![Page 95: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/95.jpg)
NIPS Semantics

- MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
- EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
- DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
- STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
- MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
- IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
- KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
- NETWORK, NEURAL, NETWORKS, OUTPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS

NIPS Syntax

- IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
- SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
- USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
- IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
- HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
- #, *, I, X, T, N, -, C, F, P
![Page 96: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/96.jpg)
Random sentence generation
LANGUAGE:
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
![Page 97: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/97.jpg)
Nested Chinese Restaurant Process
![Page 98: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/98.jpg)
Topic Hierarchies
• In the regular topic model, there are no relations between topics

  topic 1
  topic 2  topic 3
  topic 4  topic 5  topic 6  topic 7

• Nested Chinese Restaurant Process
  – Blei, Griffiths, Jordan, & Tenenbaum (2004)
  – Learns the hierarchical structure, as well as the topics within that structure
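The nested CRP prior can be sketched as each document descending the tree, choosing at every node between existing branches (with probability proportional to how many earlier documents took them) and a brand-new branch (with probability proportional to a concentration parameter gamma). A minimal simulation; parameter names and values are illustrative:

```python
import random
from collections import defaultdict

def ncrp_paths(n_docs, depth, gamma=1.0, seed=1):
    """Assign each document a root-to-leaf path through a tree via the
    nested Chinese restaurant process."""
    random.seed(seed)
    counts = defaultdict(lambda: defaultdict(int))  # node -> branch -> count
    next_id = [1]                                   # next fresh node id
    paths = []
    for _ in range(n_docs):
        node, path = 0, [0]                         # all paths start at the root
        for _ in range(depth):
            branches = counts[node]
            options = list(branches) + ["new"]
            weights = [branches[b] for b in branches] + [gamma]
            choice = random.choices(options, weights=weights)[0]
            if choice == "new":                     # open a new branch
                choice = next_id[0]
                next_id[0] += 1
            branches[choice] += 1                   # this path gets more popular
            node = choice
            path.append(node)
        paths.append(tuple(path))
    return paths

paths = ncrp_paths(n_docs=10, depth=2)
print(paths[:3])  # each path: (root, level-1 node, level-2 node)
```

Popular branches attract more documents, so the simulation produces a tree whose shape, like the topic hierarchy, is learned rather than fixed in advance.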
![Page 99: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/99.jpg)
Example: Psych Review Abstracts
[Topic hierarchy learned from the abstracts; tree structure shown in the slide image. Node word lists:]

- A, MODEL, MEMORY, FOR, MODELS, TASK, INFORMATION, RESULTS, ACCOUNT
- RESPONSE, STIMULUS, REINFORCEMENT, RECOGNITION, STIMULI, RECALL, CHOICE, CONDITIONING
- SPEECH, READING, WORDS, MOVEMENT, MOTOR, VISUAL, WORD, SEMANTIC
- ACTION, SOCIAL, SELF, EXPERIENCE, EMOTION, GOALS, EMOTIONAL, THINKING
- GROUP, IQ, INTELLIGENCE, SOCIAL, RATIONAL, INDIVIDUAL, GROUPS, MEMBERS
- SEX, EMOTIONS, GENDER, EMOTION, STRESS, WOMEN, HEALTH, HANDEDNESS
- REASONING, ATTITUDE, CONSISTENCY, SITUATIONAL, INFERENCE, JUDGMENT, PROBABILITIES, STATISTICAL
- IMAGE, COLOR, MONOCULAR, LIGHTNESS, GIBSON, SUBMOVEMENT, ORIENTATION, HOLOGRAPHIC
- CONDITIONING, STRESS, EMOTIONAL, BEHAVIORAL, FEAR, STIMULATION, TOLERANCE, RESPONSES
- SELF, SOCIAL, PSYCHOLOGY, RESEARCH, RISK, STRATEGIES, INTERPERSONAL, PERSONALITY, SAMPLING
- MOTION, VISUAL, SURFACE, BINOCULAR, RIVALRY, CONTOUR, DIRECTION, CONTOURS, SURFACES
- DRUG, FOOD, BRAIN, AROUSAL, ACTIVATION, AFFECTIVE, HUNGER, EXTINCTION, PAIN
- THE, OF, AND, TO, IN, A, IS
![Page 100: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/100.jpg)
Generative Process
[The same topic hierarchy as on the previous slide, illustrating the generative process.]
![Page 101: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/101.jpg)
Collocation Topic Model
![Page 102: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/102.jpg)
What about collocations?
• Why are these words related?
– PLAY - GROUND
– DOW - JONES
– BUMBLE - BEE
• Suggests at least two routes for association:
– Semantic
– Collocation
Integrate collocations into topic model
![Page 103: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/103.jpg)
Collocation Topic Model

[Graphical model: the document's topic mixture generates a topic and a word at each position; a binary switch x sits before each word.]

- If x = 0, sample the word from the topic.
- If x = 1, sample the word from a distribution conditioned on the previous word.
![Page 104: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/104.jpg)
Collocation Topic Model

Example: “DOW JONES RISES”
- DOW is sampled from a topic.
- With x = 1, JONES is sampled from the distribution over words following DOW.
- With x = 0, RISES is sampled from a topic.

JONES is more likely explained as a word following DOW than as a word sampled from a topic.
Result: DOW_JONES is recognized as a collocation.
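The word-level switch can be sketched generatively: before each word a binary x decides whether to sample from the current topic or from a successor distribution over words following the previous word. All distributions below are hypothetical toys, not estimates from the New York Times data:

```python
import random

def generate_collocations(n_words, topic, bigram, p_colloc=0.5, seed=2):
    """Sketch of the collocation topic model's switch: if x = 1 and the
    previous word has a successor distribution, take the collocation
    route; otherwise sample from the topic."""
    random.seed(seed)
    # first word always comes from the topic
    words = [random.choices(list(topic), weights=list(topic.values()))[0]]
    for _ in range(n_words - 1):
        prev = words[-1]
        x = random.random() < p_colloc          # the binary switch
        if x and prev in bigram:
            dist = bigram[prev]                 # words that follow prev
        else:
            dist = topic                        # the topic route
        words.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return words

topic = {"DOW": 0.3, "RISES": 0.3, "STOCKS": 0.4}
bigram = {"DOW": {"JONES": 0.9, "FALLS": 0.1}}  # hypothetical successor table
print(" ".join(generate_collocations(8, topic, bigram)))
```

Inference works in reverse: when x = 1 explains JONES after DOW far better than any topic does, the pair is treated as the collocation DOW_JONES.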
![Page 105: Semantic Representations with Probabilistic Topic Models](https://reader035.fdocuments.net/reader035/viewer/2022062222/56814dd8550346895dbb40df/html5/thumbnails/105.jpg)
Example Topics from the New York Times

Stock Market: WEEK, DOW_JONES, POINTS, 10_YR_TREASURY_YIELD, PERCENT, CLOSE, NASDAQ_COMPOSITE, STANDARD_POOR, CHANGE, FRIDAY, DOW_INDUSTRIALS, GRAPH_TRACKS, EXPECTED, BILLION, NASDAQ_COMPOSITE_INDEX, EST_02, PHOTO_YESTERDAY, YEN, 10, 500_STOCK_INDEX

Wall Street Firms: WALL_STREET, ANALYSTS, INVESTORS, FIRM, GOLDMAN_SACHS, FIRMS, INVESTMENT, MERRILL_LYNCH, COMPANIES, SECURITIES, RESEARCH, STOCK, BUSINESS, ANALYST, WALL_STREET_FIRMS, SALOMON_SMITH_BARNEY, CLIENTS, INVESTMENT_BANKING, INVESTMENT_BANKERS, INVESTMENT_BANKS

Terrorism: SEPT_11, WAR, SECURITY, IRAQ, TERRORISM, NATION, KILLED, AFGHANISTAN, ATTACKS, OSAMA_BIN_LADEN, AMERICAN, ATTACK, NEW_YORK_REGION, NEW, MILITARY, NEW_YORK, WORLD, NATIONAL, QAEDA, TERRORIST_ATTACKS

Bankruptcy: BANKRUPTCY, CREDITORS, BANKRUPTCY_PROTECTION, ASSETS, COMPANY, FILED, BANKRUPTCY_FILING, ENRON, BANKRUPTCY_COURT, KMART, CHAPTER_11, FILING, COOPER, BILLIONS, COMPANIES, BANKRUPTCY_PROCEEDINGS, DEBTS, RESTRUCTURING, CASE, GROUP