Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and...
![Page 1: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/1.jpg)
Amr Ahmed
Thesis Proposal
Modeling Users and Content: Structured Probabilistic Representation
and Scalable Online Inference Algorithms
![Page 2: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/2.jpg)
This thesis is about
Document collections
they are everywhere
they cover many domains
![Page 3: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/3.jpg)
Research Publications
Social Media
Conference proceedings
Journal transactions
ArXiv
PubMed Central
Yahoo! news
Google news
CNN
BBC
Blogs
Daily KOS
Red state
![Page 4: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/4.jpg)
Ban abortion with Constitutional amendment
Choice is a fundamental,
constitutional right
CS
BioPhy
time
Drill explosion
time
“BP wasn't prepared for an oil spill at such depths”
BP: “We will make this right."
Temporal Dynamics Structural Correspondence
![Page 5: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/5.jpg)
Thesis Question
• How to build a structured representation of document collections that reveals
  – Temporal Dynamics
    • How ideas/events evolve over time
  – Structural Correspondence
    • How ideas are addressed across modalities and communities
![Page 6: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/6.jpg)
Thesis Approach
• Models
  – Probabilistic graphical models
    • Topic models and non-parametric Bayes
  – Principled, expressive and modular
• Algorithms
  – Distributed
    • To deal with large-scale datasets
  – Online
    • To update the representation with new data
![Page 7: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/7.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 8: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/8.jpg)
What is a Good Model for Documents?
• Clustering
  – Mixture of unigram model
• How to specify a model?
• Generative process
  – Assume some hidden variables
  – Use them to generate documents
• Inference
  – Invert the process
    • Given documents → hidden variables
[Plate diagram: π → c_i → w_i ← φ; plate N over documents, plate K over topics]
![Page 9: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/9.jpg)
Mixture of Unigram
[Plate diagram: π → c_i → w_i ← φ; plate N over documents, plate K over topics. Topics φ_1 … φ_K with mixing weights π_1 … π_j … π_K]
Generative Process
- For document w_i
  - Sample c_i ~ Multi(π)
  - Sample w_i ~ Multi(φ_{c_i})
Is this a good model for documents?
When is this a good model for documents?
- When documents are single-topic
- Not true in our settings
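The mixture-of-unigrams generative process on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_mixture_of_unigrams`, the toy sizes, and the Dirichlet draw used to build example topics are hypothetical and not from the slides.

```python
import numpy as np

def sample_mixture_of_unigrams(pi, phi, doc_lengths, rng):
    """Draw one cluster per document, then every word from that cluster's unigram."""
    clusters, docs = [], []
    for n_words in doc_lengths:
        c = int(rng.choice(len(pi), p=pi))                          # c_i ~ Multi(pi)
        words = rng.choice(phi.shape[1], size=n_words, p=phi[c])    # w_i ~ Multi(phi_{c_i})
        clusters.append(c)
        docs.append(words)
    return clusters, docs

# Toy example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])               # mixing weights over K = 3 clusters
phi = rng.dirichlet(np.ones(8), size=3)      # K unigram distributions over V = 8 words
clusters, docs = sample_mixture_of_unigrams(pi, phi, [10, 12, 7], rng)
```

Every word in a document shares the single cluster `c`, which is why the model fits only single-topic documents.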
![Page 10: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/10.jpg)
What Do We Need to Model?
• Q: What is it about?
• A: Mainly MT, with syntax, some learning
A Hierarchical Phrase-Based Model for Statistical Machine Translation
We present a statistical phrase-based translation model that uses hierarchical phrases: phrases that contain sub-phrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.
Topic Models
Topics (each a unigram distribution over the vocabulary)
• MT: Source, Target, SMT, Alignment, Score, BLEU
• Syntax: Parse, Tree, Noun, Phrase, Grammar, CFG
• Learning: likelihood, EM, Hidden, Parameters, Estimation, argMax
Mixing proportion (MT, Syntax, Learning): 0.6, 0.3, 0.1
![Page 11: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/11.jpg)
Mixed-Membership Models
[Figure: document w_i drawn from topics φ_1 … φ_K with mixing proportions θ_1 … θ_j … θ_K]
A Hierarchical Phrase-Based Model for Statistical Machine Translation
We present a statistical phrase-based translation model that uses hierarchical phrases. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.
[Plate diagram: Prior → θ → z → w ← φ; plate N over words, plate D over documents, plate K over topics]
Generative Process
- For each document d
  - Sample θ_d ~ Prior
  - For each word w in d
    - Sample z ~ Multi(θ_d)
    - Sample w ~ Multi(φ_z)
![Page 12: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/12.jpg)
Topic Models
• Prior over topic vector
  – Latent Dirichlet Allocation (LDA)
  – Correlated priors (CTM)
  – Hierarchical priors
• Topics
  – Unigram, bigrams, etc.
• Document structure
  – Bag of words
  – Multi-modal
  – Side information
[Plate diagram: Prior → θ → z → w ← φ; plate N over words, plate D over documents, plate K over topics]
![Page 13: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/13.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 14: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/14.jpg)
Problem Statement
Given: research papers from 1900 to 2009 (CS, Bio, Phy)
Discover: topics
• Potentially infinite number of topics
  – With time-varying trends
  – And time-varying distributions
  – And variable durations
    • Topics can die
    • New topics can be born
![Page 15: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/15.jpg)
The Big Picture
[Diagram: axes Time and Model Dimension; models shown: LDA, HDPM, Dynamic clustering, Dynamic LDA, Infinite Dynamic Topic Models. Inset: LDA plate diagram (α → θ → z → w ← φ; plates N, D, K)]
![Page 16: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/16.jpg)
LDA: The Generative Process
Topics’ distributions evolve over time?
Topics’ trends evolve over time?
Number of topics grows with the data?
[Plate diagram: α → θ → z → w ← φ; plate N over words, plate D over documents, plate K over topics]
Generative Process
- For each document d
  - Sample θ_d ~ Dirichlet(α)
  - For each word w in d
    - Sample z ~ Multi(θ_d)
    - Sample w ~ Multi(φ_z)
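As a rough sketch of the LDA generative process listed on this slide, the sampler below draws a topic vector per document and a topic per word. The function name `sample_lda_corpus` and the toy dimensions are hypothetical, and the topics are drawn from a Dirichlet purely for illustration.

```python
import numpy as np

def sample_lda_corpus(alpha, phi, doc_lengths, rng):
    """Sample (theta_d, z, w) for each document following the LDA generative process."""
    K, V = phi.shape
    corpus = []
    for n_words in doc_lengths:
        theta = rng.dirichlet(alpha)                          # theta_d ~ Dirichlet(alpha)
        z = rng.choice(K, size=n_words, p=theta)              # z ~ Multi(theta_d), one per word
        w = np.array([rng.choice(V, p=phi[k]) for k in z])    # w ~ Multi(phi_z)
        corpus.append((theta, z, w))
    return corpus

# Toy example (sizes and hyper-parameters are illustrative assumptions).
rng = np.random.default_rng(0)
K, V = 3, 10
phi = rng.dirichlet(np.ones(V), size=K)
corpus = sample_lda_corpus(np.full(K, 0.5), phi, [20, 15], rng)
```

Unlike the mixture of unigrams, each word gets its own topic assignment, so one document can mix several topics. Nothing here changes over time and K is fixed, which is what the three questions on the slide point at.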
![Page 17: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/17.jpg)
The Big Picture
[Diagram: axes Time and Model Dimension; models shown: LDA, HDPM, Dynamic clustering, Dynamic LDA, Infinite Dynamic Topic Models. Inset: LDA plate diagram (α → θ → z → w ← β; plates N, D, K)]
![Page 18: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/18.jpg)
Dynamic LDA: The Generative Process
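[Plate diagram for the first epoch: α_1 → θ → z → w ← φ_1; plates N, D, K. Timeline: research papers, 1900 to 2009]
- For each document d
  - Sample θ_d ~ Normal(α, λI)
  - For each word w in d
    - Sample z ~ Multi(L(θ_d))
    - Sample w ~ Multi(L(φ_z))
Necessary to evolve trends
Logistic transformation: L(x)_i = exp(x_i) / Σ_j exp(x_j)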
![Page 19: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/19.jpg)
Dynamic LDA: The Generative Process
[Plate diagrams for epochs 1 and 2: α_t → θ → z → w ← φ_t; plates N, D, K; α and φ chained across epochs. Timeline: research papers, 1900 to 2009]
- α_t ~ Normal(· | α_{t-1}, σ)
- φ_{k,t} ~ Normal(· | φ_{k,t-1}, ρ)
- For each document d
  - Sample θ_d ~ Normal(α_t, λI)
  - For each word w in d
    - Sample z_{d,i} ~ Multi(L(θ_d))
    - Sample w_{d,i} ~ Multi(L(φ_{z_{d,i}}))
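The Dynamic LDA process above can be sketched as a forward simulation. Several things here are assumptions rather than part of the slides: the logistic transformation L is taken to be the softmax map, σ, ρ and λ are treated as variances, the initial state is drawn from a standard normal, and the name `sample_dynamic_lda` is hypothetical.

```python
import numpy as np

def softmax(x):
    """Logistic transformation L: map real vectors onto the simplex."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_dynamic_lda(T, K, V, docs_per_epoch, words_per_doc, sigma, rho, lam, rng):
    """Simulate T epochs; alpha_t and phi_{k,t} follow Gaussian random walks."""
    alpha = np.zeros(K)                        # assumed initial trend vector
    phi = rng.normal(0.0, 1.0, size=(K, V))    # assumed initial topic parameters
    epochs = []
    for t in range(T):
        if t > 0:
            alpha = rng.normal(alpha, np.sqrt(sigma))   # alpha_t ~ Normal(alpha_{t-1}, sigma)
            phi = rng.normal(phi, np.sqrt(rho))         # phi_{k,t} ~ Normal(phi_{k,t-1}, rho)
        topics = softmax(phi)
        docs = []
        for _ in range(docs_per_epoch):
            theta = rng.normal(alpha, np.sqrt(lam))                   # theta_d ~ Normal(alpha_t, lam I)
            z = rng.choice(K, size=words_per_doc, p=softmax(theta))   # z ~ Multi(L(theta_d))
            w = np.array([rng.choice(V, p=topics[k]) for k in z])     # w ~ Multi(L(phi_z))
            docs.append(w)
        epochs.append(docs)
    return epochs

rng = np.random.default_rng(0)
epochs = sample_dynamic_lda(T=3, K=2, V=6, docs_per_epoch=4, words_per_doc=5,
                            sigma=0.1, rho=0.05, lam=0.5, rng=rng)
```

The Gaussian random walks are what force the logistic transformation: the evolving parameters live in real space, not on the simplex, so the Dirichlet prior of static LDA no longer applies.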
![Page 20: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/20.jpg)
Dynamic LDA: The Generative Process
[Plate diagrams for epochs 1, 2, …, T: α_t → θ → z → w ← φ_t; plates N, D, K; α and φ chained across epochs. Timeline: research papers, 1900 to 2009]
![Page 21: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/21.jpg)
Dynamic LDA: The Generative Process
[Plate diagrams for epochs 1, 2, …, T: α_t → θ → z → w ← φ_t; plates N, D, K; α and φ chained across epochs]
Topics’ distributions evolve over time?
Topics’ trends evolve over time?
Number of topics grows with the data?
![Page 22: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/22.jpg)
The Big Picture
[Diagram: axes Time and Model Dimension; models shown: LDA, HDPM, Dynamic clustering, Dynamic LDA, Infinite Dynamic Topic Models. Inset: LDA plate diagram (α → θ → z → w ← β; plates N, D, K)]
![Page 23: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/23.jpg)
The Chinese Restaurant Franchise Process
• HDPM automatically determines number of topics in LDA
• We will focus on the Chinese Restaurant Franchise process construction
  – A set of restaurants that share a global menu
• Metaphor
  – Restaurant = document
  – Customer = word
  – Dish = topic
  – Global menu = set of topics
![Page 24: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/24.jpg)
The Chinese Restaurant Franchise Process
[Figure: Restaurant 1 and Restaurant 2 sharing a global menu of dishes φ_1, φ_2, φ_3, φ_4]
- m_1: number of tables serving this dish (topic)
- Table: dish served
- Customers sharing the same dish
- φ_4: distribution for topic 4
![Page 25: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/25.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4; Restaurants 1, 2, and 3; a new customer (?) arrives in Restaurant 3]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
![Page 26: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/26.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4; Restaurants 1, 2, and 3; the new customer sits at an existing table serving dish φ_3]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
- w ~ Multi(L(φ_3))
![Page 27: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/27.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4; Restaurants 1, 2, and 3; a new customer (?) arrives in Restaurant 3]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
![Page 28: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/28.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4 and a "new" slot weighted γ; Restaurants 1, 2, and 3; the new customer opens a new table (?) whose dish (?) is still to be chosen]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing dish k ∝ m_k
      - A new dish ∝ γ
![Page 29: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/29.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4 and a "new" slot weighted γ; Restaurants 1, 2, and 3; the new table serves existing dish φ_3]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing dish k ∝ m_k
      - A new dish ∝ γ
- w ~ Multi(L(φ_3))
![Page 30: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/30.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4 and a "new" slot; Restaurants 1, 2, and 3; the new table serves a new dish φ_5]
Generative Process
- For customer w in restaurant 3
  - Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing dish k ∝ m_k
      - A new dish ∝ γ
- φ_5 ~ H
- w ~ Multi(L(φ_5))
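The seating rule built up over the last few slides can be sketched as one function. This is a minimal illustration, not the thesis implementation: the name `seat_customer` and the list-based bookkeeping are hypothetical, and drawing the parameters of a new dish from H is only marked by a comment.

```python
import numpy as np

def seat_customer(tables, m, alpha, gamma, rng):
    """Seat one customer (word) in a restaurant (document) and return the dish (topic).

    tables: list of [n_customers, dish] for this restaurant (mutated in place)
    m:      global list, m[k] = number of tables serving dish k across restaurants
    """
    # Existing table j with weight N_j, or a new table with weight alpha.
    weights = np.array([n for n, _ in tables] + [alpha], dtype=float)
    j = rng.choice(len(weights), p=weights / weights.sum())
    if j < len(tables):
        tables[j][0] += 1
        return tables[j][1]
    # New table: existing dish k with weight m_k, or a new dish with weight gamma.
    dish_weights = np.array(list(m) + [gamma], dtype=float)
    k = int(rng.choice(len(dish_weights), p=dish_weights / dish_weights.sum()))
    if k == len(m):
        m.append(0)          # a new dish; its parameters phi_k ~ H would be drawn here
    m[k] += 1
    tables.append([1, k])
    return k

rng = np.random.default_rng(0)
m = []                                   # shared global menu counts
restaurants = [[], [], []]
order = [0, 1, 2, 2, 2, 0, 1, 2]         # which restaurant each customer enters
dishes = [seat_customer(restaurants[r], m, alpha=1.0, gamma=1.0, rng=rng) for r in order]
```

Because `m` is shared across restaurants, a dish that is popular in one document becomes more likely in the others, and the menu grows whenever the `gamma` branch fires.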
![Page 31: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/31.jpg)
The Chinese Restaurant Franchise Process
[Figure: global menu with dishes φ_1, φ_2, φ_3, φ_4, φ_5; Restaurants 1, 2, and 3]
Topics’ distributions evolve over time?
Topics’ trends evolve over time?
Number of topics grows with the data?
![Page 32: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/32.jpg)
The Big Picture
[Diagram: axes Time and Model Dimension; models shown: LDA, HDPM, Dynamic clustering, Dynamic LDA, Infinite Dynamic Topic Models. Inset: LDA plate diagram (α → θ → z → w ← β; plates N, D, K)]
![Page 33: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/33.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2]
Documents in epoch 1 are generated as before
Topics at end of epoch 1
- Height (m_{k,1}) represents topic popularity
- φ_{k,1} represents topic k's distribution
Observations
- Popular topics at epoch 1 are likely to be popular at epoch 2
- φ_{k,2} is likely to evolve smoothly from φ_{k,1}
Pseudo counts at T=2 = m_{k,1} × decay factor
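The slide states only that the pseudo counts on the next menu are the previous counts times a decay factor. The sketch below assumes an exponential decay kernel exp(-δ/width) summed over the last `order` epochs; that kernel, the parameter names, and the function `pseudo_counts` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def pseudo_counts(history, decay_width, order):
    """Decayed popularity carried into the next epoch.

    history: list of per-epoch count arrays m_{k,t}, oldest first, all of length K
    order:   how many past epochs contribute (1 gives the first-order process)
    """
    prior = np.zeros_like(history[-1], dtype=float)
    for delta in range(1, min(order, len(history)) + 1):
        # Assumed kernel: epochs further in the past contribute exponentially less.
        prior += np.exp(-delta / decay_width) * history[-delta]
    return prior

history = [np.array([4, 0, 2]), np.array([1, 3, 0])]    # toy table counts for two epochs
first_order = pseudo_counts(history, decay_width=1.0, order=1)
second_order = pseudo_counts(history, decay_width=1.0, order=2)
```

Under this reading, a dish whose pseudo count reaches zero is no longer on the inherited menu, which matches the slides' picture of topics dying out.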
![Page 34: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/34.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 with dishes φ_{2,2}, φ_{3,2}]
- New real dish served: φ_{3,2} ~ Normal(· | φ_{3,1}, ρ)
- Inherited but not yet used
![Page 35: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/35.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 with dishes φ_{2,2}, φ_{3,2}]
Generative Process
- For customer w in restaurant 1
  - [as in static case] Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing and inherited dish k ∝ m'_{k,2} + m_{k,2}
      - Existing but NOT inherited dish k ∝ m'_{k,2}, then φ_{k,2} ~ Normal(· | φ_{k,1}, ρ)
      - A new dish ∝ γ, then φ_new ~ H
![Page 36: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/36.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 with dishes φ_{2,2}, φ_{3,2}]
Generative Process
- For customer w in restaurant 1
  - [as in static case] Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing and inherited dish k ∝ m'_{k,2} + m_{k,2}
      - Existing but NOT inherited dish k ∝ m'_{k,2}, then φ_{k,2} ~ Normal(· | φ_{k,1}, ρ)
      - A new dish ∝ γ, then φ_new ~ H
![Page 37: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/37.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 with dishes φ_{1,2}, φ_{2,2}, φ_{3,2}]
Generative Process
- For customer w in restaurant 1
  - [as in static case] Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing and inherited dish k ∝ m'_{k,2} + m_{k,2}
      - Existing but NOT inherited dish k ∝ m'_{k,2}, then φ_{k,2} ~ Normal(· | φ_{k,1}, ρ)
      - A new dish ∝ γ, then φ_new ~ H
- φ_{1,2} ~ Normal(· | φ_{1,1}, ρ)
![Page 38: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/38.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 with dishes φ_{2,2}, φ_{3,2}, φ_{6,2}]
Generative Process
- For customer w in restaurant 1
  - [as in static case] Choose table j ∝ N_j
  - Choose a new table b ∝ α
    - Sample a new dish for this table
      - Existing and inherited dish k ∝ m'_{k,2} + m_{k,2}
      - Existing but NOT inherited dish k ∝ m'_{k,2}, then φ_{k,2} ~ Normal(· | φ_{k,1}, ρ)
      - A new dish ∝ γ, then φ_new ~ H
- φ_{6,2} ~ H
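The three dish cases in the recurrent process can be folded into a single weight vector, m'_{k,t} + m_{k,t} for known dishes and γ for a new one. The sketch below is one hedged reading of the slide: `choose_dish` and its return labels are hypothetical names, and the pseudo-count-only case is interpreted as a dish carried over from the previous epoch that has not yet been served in the current one.

```python
import numpy as np

def choose_dish(m_prev, m_curr, gamma, rng):
    """Pick a dish for a new table at epoch t.

    m_prev: pseudo counts m'_{k,t} inherited from earlier epochs
    m_curr: tables already serving dish k at epoch t (same length as m_prev)
    Returns (dish index, action), where action says how phi_{k,t} is obtained.
    """
    m_prev = np.asarray(m_prev, dtype=float)
    m_curr = np.asarray(m_curr, dtype=float)
    weights = np.append(m_prev + m_curr, gamma)
    k = int(rng.choice(len(weights), p=weights / weights.sum()))
    if k == len(m_prev):
        return k, "new"       # phi_new ~ H
    if m_curr[k] == 0:
        return k, "evolve"    # first use this epoch: phi_{k,t} ~ Normal(phi_{k,t-1}, rho)
    return k, "reuse"         # already instantiated this epoch

rng = np.random.default_rng(0)
only_new = choose_dish([0.0, 0.0], [0, 0], gamma=1.0, rng=rng)
only_inherited = choose_dish([5.0, 0.0], [0, 0], gamma=0.0, rng=rng)
already_served = choose_dish([0.0, 2.0], [0, 3], gamma=0.0, rng=rng)
```

The three calls use degenerate weights so that each branch is hit deterministically; in a real sampler all three cases compete within one draw.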
![Page 39: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/39.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 (Epoch 2) with dishes φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}; Global Menu T=3]
- Died-out topics
- Newly born topics
![Page 40: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/40.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 (Epoch 2) with dishes φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}; Global Menu T=3]
Topics’ distributions evolve over time?
Topics’ trends evolve over time?
Number of topics grows with the data?
![Page 41: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/41.jpg)
Recurrent Chinese Restaurant Franchise Process
[Figure: Global Menu T=1 (Epoch 1) with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}; Global Menu T=2 (Epoch 2) with dishes φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}; Global Menu T=3]
- We just described a first-order RCRF process
- For a general Δ-order process
![Page 42: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/42.jpg)
Inference
• Gibbs Sampling
  – Sample a table for each word
  – Sample a topic for each table
  – Sample the topic parameter over time
  – Sample hyper-parameters
• How to deal with non-conjugacy
  – Algorithm 8 in Neal (1998) + Metropolis-Hastings
• Efficiency
  – The Markov blanket contains the previous and following Δ epochs
![Page 43: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/43.jpg)
Sampling a Topic for a Table
[Figure: Global Menus T=1, T=2, T=3 with dishes φ_{1,1} … φ_{5,1} and φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}. The sampling equation (not recovered) has Past, Future, and Emission terms, annotated with Efficiency and Non-Conjugacy]
![Page 44: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/44.jpg)
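Sampling a Topic for a Table
[Figure: Global Menus T=1, T=2, T=3 with dishes φ_{1,1} … φ_{5,1} and φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}. The sampling equation (not recovered) has Past, Future, and Emission terms, annotated with Efficiency and Non-Conjugacy. Labels on new dishes: φ ~ H = N(0, σI); γ/3]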
![Page 45: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/45.jpg)
Sampling a Topic for a Table
[Figure: Global Menus T=1, T=2, T=3 with dishes φ_{1,1} … φ_{5,1} and φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}. The sampling equation (not recovered) has Past, Future, and Emission terms, annotated with "Pre-compute and update" and Non-Conjugacy]
![Page 46: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/46.jpg)
Sampling Topic Parameters
• v | φ ~ Mult(Logistic(φ))
• Linear state-space model with non-Gaussian emission
• Use Laplace approximation inside the Forward-Backward algorithm
• Use the resulting distribution as a proposal
[Diagram: chain φ_1 → φ_2 → … → φ_T, each φ_t emitting word counts v]
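The idea of using a Laplace approximation as a Metropolis-Hastings proposal can be illustrated on a single epoch. This sketch is a deliberate simplification and not the procedure on the slide: it ignores the Forward-Backward pass over the chain, assumes a fixed Gaussian prior Normal(μ, ρI) on φ with ρ a variance, takes Logistic to be the softmax map, and uses hypothetical function names throughout.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def log_post(phi, v, mu, rho):
    """log Mult(v | softmax(phi)) + log Normal(phi | mu, rho I), up to constants."""
    log_norm = phi.max() + np.log(np.exp(phi - phi.max()).sum())
    return v @ phi - v.sum() * log_norm - ((phi - mu) ** 2).sum() / (2 * rho)

def neg_hessian(phi, v, rho):
    p = softmax(phi)
    return v.sum() * (np.diag(p) - np.outer(p, p)) + np.eye(len(phi)) / rho

def laplace_fit(v, mu, rho, n_iter=50):
    """Newton iterations to the posterior mode; covariance is the inverse negative Hessian."""
    phi = mu.copy()
    for _ in range(n_iter):
        grad = v - v.sum() * softmax(phi) - (phi - mu) / rho
        phi = phi + np.linalg.solve(neg_hessian(phi, v, rho), grad)
    return phi, np.linalg.inv(neg_hessian(phi, v, rho))

def mh_step(phi, v, mu, rho, mode, cov, rng):
    """One independence Metropolis-Hastings step with the Laplace Gaussian as proposal."""
    prop = rng.multivariate_normal(mode, cov)
    prec = np.linalg.inv(cov)
    log_q = lambda x: -0.5 * (x - mode) @ prec @ (x - mode)
    log_ratio = (log_post(prop, v, mu, rho) - log_post(phi, v, mu, rho)
                 + log_q(phi) - log_q(prop))
    return prop if np.log(rng.uniform()) < log_ratio else phi

rng = np.random.default_rng(0)
v = np.array([10.0, 3.0, 0.0, 1.0, 6.0])     # toy word counts for one topic
mu, rho = np.zeros(5), 1.0
mode, cov = laplace_fit(v, mu, rho)
phi = mode.copy()
for _ in range(20):
    phi = mh_step(phi, v, mu, rho, mode, cov, rng)
```

The accept/reject step corrects for the mismatch between the Gaussian approximation and the true non-conjugate posterior, so the proposal only needs to be close, not exact.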
![Page 47: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/47.jpg)
Experiments
• Simulated data
  – Simulated 20 epochs with 100 data points in each epoch
• Timeline of the NIPS conference
  – 13 years
  – 1740 documents
  – 950 words per document
  – Vocabulary of ~3500 words
![Page 48: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/48.jpg)
Simulation Experiment
[Figure: ground truth; topic index (1 to 9) versus time (epochs 1 to 20)]
Sample Documents:
![Page 49: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/49.jpg)
[Figure: ground truth versus recovered topics over time (epochs 0 to 20)]
![Page 50: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/50.jpg)
[Figure: timeline of NIPS topics. Year markers: 1987, 1990, 1991, 1994, 1995, 1996. Topic labels: speech, Neuroscience, NN, Classification, Methods, Control, Prob. Models (PM), image, SOM, RL, Bayesian, Mixtures, Generalization, boosting, Clustering, ICA, Kernels, Memory]
![Page 51: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/51.jpg)
[Figure: top words of four NIPS topics as they evolve over time]
PM
- 1987: field, code, temperature, tree, boltzmann, energy, annealing, node, probability
- 1990: field, tree, level, energy, probability, node, annealing, boltzmann, variables
- 1993: tree, variables, node, level, probability, field, distribution, structure, graph, energy
- 1996: variables, graph, tree, probability, field, structure, node, distribution, energy
- 1999: probability, variables, tree, field, distribution, graph, nodes, belief, node, inference, propagation
Mixtures
- 1990: em, expert, mixture, gating, missing, experts, gaussian, parameters, density
- 1994: mixture, em, likelihood, missing, experts, mixtures, gaussian, parameters
- 1999: mixture, gaussian, em, likelihood, parameters, analysis, density, factor, variables, distribution
ICA
- 1995: wavelet, natural, separation, source, ica, coefficients, independent, basis
- 1999: source, ica, blind, separation, coefficients, natural, independent, basis, wavelet
Methods (years not recoverable)
- method, solution, energy, values, gradient, convergence, equation, algorithms
- gradient, weight, method, methods, local, rate, optimal, descent, solution
- gradient, matrix, weight, algorithms, local, rate, problems, point, equation
- matrix, algorithms, gradient, convergence, equation, optimal, method, parameter
![Page 52: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/52.jpg)
Kernels
- 1996: support, kernel, svm, regularization, sv, vectors, feature, regression
- 1997: kernel, support, sv, svm, machines, regression, vapnik, feature, solution
- 1998: kernel, support, svm, regression, feature, machines, solution, margin, pca
- 1999: kernel, svm, support, regression, solution, machines, matrix, feature, regularization
Papers
- Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing, V. Vapnik, S. E. Golowich and A. Smola
- Support Vector Regression Machines, H. Drucker, C. Burges, L. Kaufman, A. Smola and V. Vapnik
- Improving the Accuracy and Speed of Support Vector Machines, C. Burges and B. Scholkopf
- From Regularization Operators to Support Vector Kernels, A. Smola and B. Schoelkopf
- Prior Knowledge in Support Vector Kernels, B. Schoelkopf, P. Simard, A. Smola and V. Vapnik
- Uniqueness of the SVM Solution, C. Burges and D. Crisp
- An Improved Decomposition Algorithm for Regression Support Vector Machines, P. Laskov
- ..... Many more
![Page 53: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/53.jpg)
The Big Picture
[Diagram: axes Time and Model Dimension; models shown: LDA, HDPM, Dynamic clustering, Dynamic LDA, Infinite Dynamic Topic Models. Inset: LDA plate diagram (α → θ → z → w ← β; plates N, D, K)]
![Page 54: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/54.jpg)
Quantitative Analysis
![Page 55: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/55.jpg)
Analyzing the NIPS Corpus
[Figure panels (a), (b), (c): start state and posterior sample]
![Page 56: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/56.jpg)
Ban abortion with Constitutional amendment
Choice is a fundamental,
constitutional right
CS
BioPhy
time
Drill explosion
time
“BP wasn't prepared for an oil spill at such depths”
BP: “We will make this right."
Temporal Dynamics Structural Correspondence
![Page 57: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/57.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 58: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/58.jpg)
Problem Statement
• Rapid growth of social media and news outlets
• Lots of redundancy
• How to get the big picture?
  – What are the stories?
  – Who are the main entities?
  – When and how do they develop over time?
  – How are they categorized? (sports, economics, etc.)
![Page 59: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/59.jpg)
Proposed Solution
• Topic models
  – Discover long-term high-level themes
    • Sports
    • Health
    • Politics
• Dynamic clustering
  – Discover short-term ephemeral themes
    • Cricket match
    • SARS epidemic
• Inference
  – Online algorithm using Sequential Monte Carlo
![Page 60: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/60.jpg)
Preliminary Result
Topics
• Sports: games, Won, Team, Final, Season, League, held
• Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group
• Accidents: Police, Attack, run, man, group, arrested, move
Storylines
• Border-Tension
  – Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
• UEFA-soccer
  – Words: Champions, Goal, Leg, Coach, Striker, Midfield, penalty
  – Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon
• Tax-bills
  – Words: Tax, Billion, Cut, Plan, Budget, Economy, lawmakers
  – Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican
![Page 61: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/61.jpg)
Structure Browsing
• More like this story, based on topics
  – Middle-east-conflict: peace, roadmap, suicide, violence, settlements, bombing; Israel, Palestinian, West Bank, Sharon, Hamas, Arafat
• More like this story, based on "Nuclear" + topics [politics]
  – Nuclear programs: nuclear, summit, warning, policy, missile, program; North Korea, South Korea, U.S., Bush, Pyongyang
• Further queries
  – India in any topic
  – Pakistan in any topic
  – India and Pakistan in any topic
  – ……
![Page 62: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/62.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 63: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/63.jpg)
Modeling Dynamic User Intent
• How to model users' intents?
  – Long-term
  – Short-term
  – Spurious
• Input
  – Queries issued by the user
  – Documents viewed by the user
• Output
  – Dynamic distribution over intents
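To make "dynamic distribution over intents" concrete, here is a rough sketch that blends short-, medium- and long-term exponentially decayed counts of a user's past intents. The three half-lives, the mixing weights, and the function name are illustrative assumptions; the slide does not say how long-term and short-term interests are combined.

```python
from collections import defaultdict


def intent_distribution(events, now, half_lives=(1.0, 7.0, 30.0),
                        mix=(0.5, 0.3, 0.2)):
    """Return {intent: probability} at time `now`.

    events: iterable of (day, intent) pairs, e.g. one per query or page view.
    half_lives / mix: assumed time scales (in days) and their weights.
    """
    scores = defaultdict(float)
    for day, intent in events:
        age = now - day
        if age < 0:
            continue  # ignore events that lie in the future
        for half_life, weight in zip(half_lives, mix):
            # An event loses half its influence every `half_life` days.
            scores[intent] += weight * 0.5 ** (age / half_life)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {intent: score / total for intent, score in scores.items()}
```

With this choice, a burst of recent travel queries dominates the distribution today, while an older interest in cars fades gradually rather than vanishing.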
![Page 64: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/64.jpg)
The Big Picture
[Figure: user activity over time, shown as keyword groups]
• Car, Deals, van
• job, Hiring, diet
![Page 65: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/65.jpg)
The Big Picture
[Figure: user activity over time, shown as keyword groups]
• Car, Deals, van
• job, Hiring, diet
• Hiring, Salary, Diet, calories
• Auto, Price, Used, inception
• Flight, London, Hotel, weather
![Page 66: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/66.jpg)
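The Big Picture
[Figure: user activity over time, shown as keyword groups]
• Car, Deals, van
• job, Hiring, diet
• Hiring, Salary, Diet, calories
• Auto, Price, Used, inception
• Flight, London, Hotel, weather
• Movies, Theatre, Art, gallery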
![Page 67: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/67.jpg)
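The Big Picture
[Figure: user activity over time, shown as keyword groups]
• Car, Deals, van
• job, Hiring, diet
• Hiring, Salary, Diet, calories
• Auto, Price, Used, inception
• Flight, London, Hotel, weather
• Diet, Calories, Recipe, chocolate
• Movies, Theatre, Art, gallery
• School, Supplies, Loan, college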
![Page 68: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/68.jpg)
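The Big Picture
[Figure: user activity over time, shown as keyword groups and the intents they map to]
• Car, Deals, van
• job, Hiring, diet
• Hiring, Salary, Diet, calories
• Auto, Price, Used, inception
• Flight, London, Hotel, weather
• Diet, Calories, Recipe, chocolate
• Movies, Theatre, Art, gallery
• School, Supplies, Loan, college
• Intents: Cars, Art, Diet, Jobs, Travel, College finance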
![Page 69: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/69.jpg)
Highlights
• Applications
  – Behavioral targeting
    • Matching users to ads
  – But you can also match users to
    • Stories
    • New research papers
• Challenges
  – Large scale: ~35 M users
  – Incremental data
![Page 70: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/70.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 71: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/71.jpg)
[Figure: the two themes of the thesis]
• Temporal Dynamics
  – Research areas (CS, Bio, Phy) evolving over time
  – A news story over time: drill explosion; “BP wasn't prepared for an oil spill at such depths”; BP: “We will make this right."
• Structural Correspondence
  – “Ban abortion with Constitutional amendment” versus “Choice is a fundamental, constitutional right”
![Page 72: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/72.jpg)
Biological Images
• High-throughput devices in recent years
• Important source of information for biologists
• A pressing need to manage and organize this information for retrieval and visualization tasks
• Embedded within research papers
• Pose challenges to mainstream text-image systems
[Figure: FMI images and gel images, taken from papers]
![Page 73: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/73.jpg)
Biological Figures Are Challenging
• Hierarchical organization
  – Multiple panels
  – Image labels and image pointers
• Scoped caption
• Global caption
• Protein annotations
• Free-text annotations
[Figure: mainstream image retrieval datasets, with image tags such as "market, people", "Scotland, water", "bridge, sky, water", "fish, water", "clouds, jet, plane"]
![Page 74: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/74.jpg)
The Big Picture
[Figure: Query Handling Module with example queries (Mice + antibodies; Cancer + tubulin; Actin) and outputs: image retrieval, textual retrieval, MM retrieval, annotation, visualization, high-level overview]
Tasks
• High-level overview: summary
• Retrieval across modalities
  – Image retrieval
  – Text-based retrieval
  – Text + protein based retrieval
  – Annotation
• Mixed granularity
  – Input can be either panel or figure
  – Output can be either panel or figure
![Page 75: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/75.jpg)
Why Are Queries Hard?
• What if I only want to retrieve figures that address the role of vha-8 during the larval stage?
  – Only addressed in panel E
• How can we compare figures with vastly different numbers of panels?
  – Same study but with different time resolution?
![Page 76: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/76.jpg)
The Big Picture
[Figure: Extraction System feeding the Query Handling Module (across modality and granularity); example queries: Mice + antibodies; Cancer + tubulin; Actin; outputs: image retrieval, textual retrieval, MM retrieval, annotation, visualization, high-level overview]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
• Extracted elements: scoped caption, global caption, protein entities
• Extraction steps
  – Segment the figure into panels
  – Detect panel image pointers: a, b
  – Detect mentions of pointers in text, like (a)
  – Match image pointers to text labels (CRF)
  – Detect named entities in text
  – See paper for reference
![Page 77: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/77.jpg)
The Big Picture
[Figure: Extraction System feeding the Query Handling Module (across modality and granularity); example queries: Mice + antibodies; Cancer + tubulin; Actin; outputs: image retrieval, textual retrieval, MM retrieval, annotation, visualization, high-level overview]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 78: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/78.jpg)
The Big Picture
[Figure: Extraction System, then Topic Modeling, then the Query Handling Module (across modality and granularity); example queries: Mice + antibodies; Cancer + tubulin; Actin; outputs: image retrieval, textual retrieval, MM retrieval, annotation, visualization, high-level overview]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 79: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/79.jpg)
The Big Picture
[Figure: Extraction System, then Topic Modeling, then the Query Handling Module (across modality and granularity); example queries: Mice + antibodies; Cancer + tubulin; Actin; outputs: image retrieval, textual retrieval, MM retrieval, annotation]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
• Semantic representation of each figure and panel
• Learnt topics for visualization: Topic 1 … Topic K
![Page 80: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/80.jpg)
Topic Models
• Each topic has a triplet of distributions
  – Multinomial distribution over words
  – Multinomial distribution over protein words
  – Gaussian distribution over image features
    • Texture and histograms
• Each topic models correspondence between its facets
[Figure: top panels for a topic and its image features, Feature 1 … Feature M]
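The triplet of distributions described on this slide can be sketched as a small data structure. The code below is an illustration only: it assumes a diagonal Gaussian over image features and conditional independence of the three facets given the topic, and every name in it (`Topic`, `panel_log_likelihood`, the toy vocabulary) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Topic:
    word_probs: dict      # multinomial over caption words
    protein_probs: dict   # multinomial over protein mentions
    feature_means: list   # Gaussian mean per image feature
    feature_vars: list    # Gaussian variance per image feature (diagonal)


def panel_log_likelihood(topic, words, proteins, features, floor=1e-9):
    """Log-probability of one panel's words, proteins and image features
    under a single topic, treating the three facets as independent."""
    ll = sum(math.log(topic.word_probs.get(w, floor)) for w in words)
    ll += sum(math.log(topic.protein_probs.get(p, floor)) for p in proteins)
    for x, mu, var in zip(features, topic.feature_means, topic.feature_vars):
        # Univariate Gaussian log-density for each image feature.
        ll += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return ll
```

Comparing this score across topics gives a crude way to see which topic best explains a panel; the full model would instead infer topic assignments jointly.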
![Page 81: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/81.jpg)
Structured Correspondence LDA
[Figure: plate diagram of the model, shown next to the example caption text and the learnt topics (Topic 1 … Topic K). Callouts: protein, word, SLIF features, background topic]
![Page 82: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/82.jpg)
Structured Correspondence LDA
[Figure: plate diagram of the model, shown next to the example caption text and the learnt topics (Topic 1 … Topic K). Callouts: panel, number of panels, background: annotation ratio]
![Page 83: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/83.jpg)
A Sample Topic: Tumorigenesis
[Figure: top panels for the topic, with annotations on its top proteins]
• Known tumor suppressors
• Codes for a protein with tumor-suppressing effect
• Member of the caspase family, with a role in apoptosis (programmed cell death)
![Page 84: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/84.jpg)
Figure Embedding
![Page 85: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/85.jpg)
The Big Picture
[Figure: Extraction System, then Topic Modeling, then the Query Handling Module (across modality and granularity); example queries: Mice + antibodies; Cancer + tubulin; Actin; outputs: image retrieval, textual retrieval, MM retrieval, annotation]
• Semantic representation of each figure and panel
• Learnt topics for visualization: Topic 1 … Topic K
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 86: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/86.jpg)
Protein Annotations
[Figure: a figure with its caption text goes into the Query Handling Module (across modality and granularity), which returns a ranked list of proteins; the ranking is then evaluated]
• How to rank
  – Based on similarity between the latent representation of the figure and the latent representation of the protein
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 87: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/87.jpg)
Protein Annotations
• How to rank
  – Based on similarity between the latent representations of figure and protein
• How to evaluate the ranking
  – Best rank
  – Average rank
  – Rank at full recall
[Figure: a figure with its caption text goes into the Query Handling Module (across modality and granularity), which returns a ranked list of proteins (Actin, mAB, Tubulin, Vhat-8, MTP-1, cPABP); the ranking is then evaluated]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
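The ranking and the three evaluation measures can be sketched as follows. The slide only says "similarity between latent representations", so the use of cosine similarity here is an assumption, as are the function names and the toy vectors; the three metrics follow their usual reading (rank of the first correct protein, mean rank of the correct proteins, rank at which all correct proteins have been retrieved).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_proteins(figure_vec, protein_vecs):
    """Return protein names sorted from most to least similar to the figure."""
    return sorted(protein_vecs,
                  key=lambda name: cosine(figure_vec, protein_vecs[name]),
                  reverse=True)


def ranking_metrics(ranked, relevant):
    """Best rank, average rank and rank at full recall (1-based ranks)."""
    positions = [i + 1 for i, name in enumerate(ranked) if name in relevant]
    if not positions or len(positions) < len(relevant):
        raise ValueError("every relevant protein must appear in the ranking")
    return {
        "best_rank": min(positions),
        "average_rank": sum(positions) / len(positions),
        "rank_at_full_recall": max(positions),
    }
```

Lower is better for all three numbers; rank at full recall is the harshest, since a single badly ranked correct protein dominates it.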
![Page 88: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/88.jpg)
Protein Annotations
[Figure: a figure with its caption text goes into the Query Handling Module (across modality and granularity), which returns a ranked list of proteins; the ranking is then evaluated]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 89: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/89.jpg)
Text-based Image Retrieval
• Input: words (w) + protein (r)
• Output: ranked list of figures
  – Use a query language model
• Measure precision-recall tradeoffs
[Figure: latent figure representation matched against latent word representation and latent protein representation]
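One common way to realize a query language model on top of a topic model is to score each figure by the likelihood of the query terms under that figure's topic mixture. The sketch below assumes exactly that; the slide does not spell out the scoring rule, and the names `query_log_likelihood`, `rank_figures` and the toy inputs are hypothetical.

```python
import math


def query_log_likelihood(theta, word_topics, protein_topics,
                         words, proteins, floor=1e-9):
    """log p(query | figure) with p(term | figure) = sum_k theta_k * p(term | k).

    theta: topic proportions of one figure.
    word_topics / protein_topics: one {term: probability} dict per topic.
    """
    score = 0.0
    for terms, tables in ((words, word_topics), (proteins, protein_topics)):
        for term in terms:
            p = sum(weight * table.get(term, 0.0)
                    for weight, table in zip(theta, tables))
            score += math.log(max(p, floor))  # floor avoids log(0)
    return score


def rank_figures(figure_thetas, word_topics, protein_topics, words, proteins):
    """Return figure ids sorted by decreasing query likelihood."""
    scores = {
        fig: query_log_likelihood(theta, word_topics, protein_topics,
                                  words, proteins)
        for fig, theta in figure_thetas.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because matching happens in topic space, a figure can rank highly even if its caption never contains the literal query word.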
![Page 90: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/90.jpg)
Transfer Learning from Partial Figures
[Figure: a model for full figures and a model for partial figures, with their parameters tied]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized .."
![Page 91: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/91.jpg)
Does it Help?
[Figure: protein annotation results, compared in two plots]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized .."
![Page 92: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/92.jpg)
Transfer Learning from Partial Figures
• Full figures: p(Image, Words, Protein)
• Partial figures: q(Words, Protein)
• Tie the parameters
  – Better marginal p(Words, Protein)
  – Lifted to a better distribution over full figures
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized .."
![Page 93: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/93.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
![Page 94: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/94.jpg)
[Figure: the two themes of the thesis]
• Temporal Dynamics
  – Research areas (CS, Bio, Phy) evolving over time
  – A news story over time: drill explosion; “BP wasn't prepared for an oil spill at such depths”; BP: “We will make this right."
• Structural Correspondence
  – “Ban abortion with Constitutional amendment” versus “Choice is a fundamental, constitutional right”
![Page 95: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/95.jpg)
Problem Statement
Given
Build a model that can answer the following
Visualization
• How does each ideology view mainstream events?
• On which topics do they differ?
• On which topics do they agree?
![Page 96: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/96.jpg)
Problem Statement
Given
Build a model that can answer the following
Classification
• Given a new news article or blog post, the system should decide:
  – From which side it was written
  – Justify its answer on a topical level
    • E.g. because its view on abortion coincides with the pro-choice stance
![Page 97: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/97.jpg)
Problem Statement
Given
Build a model that can answer the following
Structured browsing
• Given a new news article or blog post, the user can ask for:
  – Examples of other articles from the same ideology about the same topic
  – Documents that exemplify alternative views from other ideologies
![Page 98: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/98.jpg)
Approach: Build a Factored Model
[Diagram: Ω1 and Ω2, one per ideology; shared topics β1 … βk; Ideology 1 views φ1,1, φ1,2, …, φ1,k; Ideology 2 views φ2,1, φ2,2, …, φ2,k]
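As an illustration of what "factored" could mean here, the sketch below treats a word's probability as a mixture of a topic shared by all ideologies and an ideology-specific view of that topic, then picks the ideology whose view best explains a document. The mixing weight `lam`, the function names, and the toy vocabulary are assumptions for illustration; the diagram alone does not fix the exact generative process.

```python
import math


def word_prob(word, topic, ideology, shared, views, lam=0.5, floor=1e-9):
    """p(word | topic, ideology) as a two-component mixture.

    shared[topic]          : {word: prob} common to all ideologies
    views[ideology][topic] : {word: prob} specific to one ideology
    lam                    : assumed weight of the shared component
    """
    common = shared[topic].get(word, floor)
    specific = views[ideology][topic].get(word, floor)
    return lam * common + (1.0 - lam) * specific


def most_likely_ideology(words, topic, shared, views, lam=0.5):
    """Pick the ideology whose view of `topic` gives the words the
    highest log-likelihood; also return the per-ideology scores."""
    scores = {
        ideology: sum(math.log(word_prob(w, topic, ideology, shared, views, lam))
                      for w in words)
        for ideology in views
    }
    return max(scores, key=scores.get), scores
```

The per-topic scores double as a topical justification: they show on which topic a document's wording sides with one view over the other.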
![Page 99: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/99.jpg)
Example: Bitterlemons corpus
[Figure: word lists learnt from the Bitterlemons corpus, labelled Palestinian View and Israeli View and grouped by topic; words are stemmed]
• View-level word lists
  – palestinian israeli peace year political process state end right government need conflict way security
  – palestinian israeli peace political occupation process end security conflict way government people time year force negotiation
• US role
  – bush US president american sharon administration prime settlement pressure policy washington ariel new middle
  – unit state american george powell minister colin visit internal policy statement express pro previous package work transfer european administration
  – arafat state leader roadmap george election month iraq week peace june realistic yasir senior involvement clinton november post mandate terrorism
• Roadmap process
  – roadmap phase security ceasefire state plan international step authority final quartet issue map effort
  – roadmap end settlement implementation obligation stop expansion commitment fulfill unit illegal present previou assassination meet forward
  – process force terrorism unit road demand provide confidence element interim discussion want union succee point build positive recognize present timetable
• Arab Involvement
  – syria syrian negotiate lebanon deal conference concession asad agreement regional october initiative relationship
  – track negotiation official leadership position withdrawal time victory present second stand circumstance represent sense talk strategy issue participant parti negotiator
  – peace strategic plo hizballah islamic neighbor territorial radical iran relation think obviou countri mandate greater conventional intifada affect jihad time
![Page 100: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/100.jpg)
Outline
• Background
• Temporal Dynamics
  – Timelines for research publications
  – Storylines from news stream
  – User interest-lines
• Structural Correspondence
  – Across modalities
  – Across ideologies
• Summary and Timeline
![Page 101: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/101.jpg)
Summary
• Topic models are a flexible framework
• Very useful if you
  – Care about the hidden structure
  – Want to leverage the hidden structure in tasks for which you have few labels
  – Have partially labeled data (many-many)
• Bayesian and hierarchical models need not be slow
  – They can be scaled
  – They can be made to work online
![Page 102: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/102.jpg)
Main Contributions
• Models
  – Time-varying non-parametric framework
• Inference
  – Distributed incremental inference algorithms
  – Online SMC algorithms
• Applications
  – In research publications
  – Social media
![Page 103: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/103.jpg)
Thanks!
Questions?
![Page 104: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/104.jpg)
Backup slides
![Page 105: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/105.jpg)
Hyper-parameter Sensitivity
[Diagram: chain φ1, φ2, …, φT, each link labelled v]
![Page 106: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/106.jpg)
Hyper-parameter Sensitivity
[Diagram: chain φ1, φ2, …, φT, each link labelled v]
![Page 107: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/107.jpg)
Hyper-parameter Sensitivity
[Plot "Global Menu, T=3": axes labelled Weight (0 to 0.9) and Past (-14 to 0); legend values 0.5, 1, 2, 4, 6]
![Page 108: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/108.jpg)
Structured cLDA and cLDA
[Figure: cLDA on a mainstream image tagged "market, people" (Blei and Jordan, SIGIR 2003), next to structured cLDA on a biological figure]
• Example caption text
  – "Affinity-purified rabbit antir mnp 41 antibodies"
  – "Monoclonal anti-cPABP antibodies"
  – "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized"
![Page 109: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/109.jpg)
Can we use cLDA instead?
[Figure: a biological figure with its caption text ("Affinity-purified rabbit antir mnp 41 antibodies"; "Monoclonal anti-cPABP antibodies"; "Double immunofluorescence confocal microscopy using mAB against cPABP … and the bound antibodies were visualized ..") flattened into cLDA-style image-caption pairs such as "market, people"]
• Two options: whole-caption replication or scoped-caption replication
  – Under-representation
  – Over-representation
• Either way we lose structure and can no longer answer figure queries
![Page 110: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/110.jpg)
Mixtures and MM-models
[Diagrams: two models over components φ1 … φk generating word wi, one with weights θ1 … θj … θk and one with weights π1 … πj … πk]
• Two orthogonal dimensions
  – Mixtures
  – Membership models
![Page 111: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/111.jpg)
Example Story
• Story: Obama's controversial pastor
  – Topics
    • Politics
    • Religion
    • Race
  – Entities
    • Obama, Wright, Illinois
![Page 112: Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms.](https://reader036.fdocuments.net/reader036/viewer/2022062511/5518a7ac550346991f8b4b30/html5/thumbnails/112.jpg)
Storyline Models
• We can use clustering
  – Each document belongs to a story (cluster)
  – Lacks global structure
    • What is shared across stories?
    • How about story classification?
• We can use topic models
  – Ignore the notion of story
    • Stories are tightly focused, short-term
  – Topics are high-level concepts
    • Coarse-grained, long-term