Post on 18-Dec-2015
Empirical Development of an Exponential Probabilistic Model
Using Textual Analysis to Build a Better Model
Jaime Teevan & David R. Karger, CSAIL (LCS+AI), MIT
Goal: Better Generative Model
Generative vs. discriminative models
Applies to many applications:
  Information retrieval (IR)
  Relevance feedback
  Using unlabeled data
  Classification
Assumptions are explicit
Using a Model for IR
1. Define model (hyper-learn it!)
2. Learn parameters from query
3. Rank documents
• Better model improves applications
  Trickles down to improve retrieval, classification, relevance feedback, …
• Corpus-specific models
Overview
Related work
Probabilistic models
  Example: Poisson Model
  Compare model to text
Hyper-learning the model
  Exponential framework
  Investigate retrieval performance
Conclusion and future work
Related Work
Using text for retrieval algorithm [Jones, 1972], [Greiff, 1998]
Using text to model text [Church & Gale, 1995], [Katz, 1996]
Learning model parameters [Zhai & Lafferty, 2002]
Hyper-learn the model from text!
Probabilistic Models

Rank documents by RV = Pr(rel|d)

Naïve Bayesian models:

RV = Pr(rel|d) ∝ Pr(d|rel) = ∏_{features t} Pr(dt|rel)

Open assumptions (these define the model!):
  Feature definition (e.g., words; dt = # occurrences in doc)
  Feature distribution family
Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Poisson Model: Pr(dt|rel) = θ^dt e^{-θ} / dt!

θ: specifies the term's distribution
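As a sketch of the model above (function and variable names are ours), the Poisson term probability can be computed directly:

```python
import math

def poisson_pr(dt, theta):
    """Poisson model: Pr(dt|rel) = theta^dt * e^(-theta) / dt!"""
    return theta ** dt * math.exp(-theta) / math.factorial(dt)

# For a rare term (theta = 0.0006), each extra occurrence is
# modeled as astronomically less likely than the last.
probs = [poisson_pr(dt, 0.0006) for dt in range(6)]
```

Summing `poisson_pr` over all dt gives 1, as a distribution should.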
Example Poisson Distribution

[Figure: Pr(dt|rel) on a log scale (1E-19 to 0.1) vs. the number of times a term occurs in a document (dt = 0 … 5), for θ = 0.0006. The probability plummets with each extra occurrence, reaching ≈1E-15.]
Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Learn a θ for each term
  Maximum-likelihood θ: the term's average number of occurrences
  Incorporate prior expectations
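A minimal sketch of this learning step for the Poisson model; the pseudo-count form of the prior is our illustration, not necessarily the authors' exact prior:

```python
def learn_theta(occurrences, prior_mean=0.01, prior_strength=1.0):
    """Estimate a term's theta from its occurrence counts in labeled docs.

    The maximum-likelihood theta is the average number of occurrences;
    a conjugate (Gamma-style) prior acts like prior_strength extra
    documents whose average count is prior_mean.
    """
    total = sum(occurrences) + prior_strength * prior_mean
    n = len(occurrences) + prior_strength
    return total / n

# e.g., a term seen 0, 1, 0, 0, 2 times in five labeled documents
theta = learn_theta([0, 1, 0, 0, 2])
```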
Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

For each document, find RV = ∏_{words t} Pr(dt|rel)
Sort documents by RV

Which step goes wrong?
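Step 3 can be sketched as follows; computing the product in log space to avoid underflow is our choice, with the Poisson form defined earlier:

```python
import math

def log_rv(doc_counts, thetas):
    """log RV = sum over words t of log Pr(dt|rel), Poisson model."""
    total = 0.0
    for t, theta in thetas.items():
        dt = doc_counts.get(t, 0)  # occurrences of term t in the doc
        total += dt * math.log(theta) - theta - math.log(math.factorial(dt))
    return total

def rank(docs, thetas):
    """Sort documents (term -> count dicts) by relevance value."""
    return sorted(docs, key=lambda d: log_rv(d, thetas), reverse=True)
```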
How Good is the Model?

[Figure: Observed data vs. the Poisson model (θ = 0.0006, "15 times"). Log-scale Pr(dt|rel) (1E-19 to 0.1) vs. dt = 0 … 5. The data sits orders of magnitude above the Poisson curve at higher occurrence counts: a misfit!]
Hyper-learning a Better Fit Through Textual Analysis
Hyper-Learning Framework

Need a framework for hyper-learning
Goal: the same benefits as the Poisson Model
  One parameter
  Easy to work with (e.g., priors)

One-parameter exponential families (cover Bernoulli, Poisson, Normal, mixtures, …)
Exponential Framework

Well understood, learning is easy [Bernardo & Smith, 1994], [Gous, 1998]

Pr(dt|rel) = f(dt) g(θ) e^{θ h(dt)}

Functions f(dt) and h(dt) specify the family
  E.g., Poisson: f(dt) = (dt!)^-1, h(dt) = dt
Parameter θ specifies the term's distribution
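A sketch of this framework; computing the normalizer g(θ) numerically over a truncated support is our simplification:

```python
import math

def exp_family_pr(dt, theta, f, h, support=range(50)):
    """Pr(dt|rel) = f(dt) g(theta) e^(theta h(dt)), where g(theta)
    normalizes the distribution (here over a truncated support)."""
    g = 1.0 / sum(f(k) * math.exp(theta * h(k)) for k in support)
    return f(dt) * g * math.exp(theta * h(dt))

# Poisson as an instance: f(dt) = 1/dt!, h(dt) = dt.
# With theta = log(lambda) this recovers Pr(dt) = lambda^dt e^-lambda / dt!
f_poisson = lambda dt: 1.0 / math.factorial(dt)
h_poisson = lambda dt: dt
```

With `theta = math.log(0.5)`, for example, this matches the Poisson probabilities for λ = 0.5 up to truncation error.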
Using a Hyper-learned Model

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Want the "best" f(dt) and h(dt)
Iterative hill climbing
  Finds a local maximum
  Poisson starting point
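The hyper-learning step might look something like the following toy sketch. It is heavily simplified (one term, θ absorbed into h, f fixed to the Poisson's 1/dt!), and the perturbation scheme is ours, not the authors':

```python
import math
import random

def fit_h(observed, support=range(6), iters=2000, step=0.1, seed=0):
    """Greedy hill climbing for h(dt), starting from the Poisson h(dt) = dt.

    observed: list of term occurrence counts seen in relevant documents.
    Each iteration perturbs one h entry and keeps the change only if the
    data log-likelihood improves (so we reach a local maximum).
    """
    rng = random.Random(seed)
    f = [1.0 / math.factorial(k) for k in support]
    h = [float(k) for k in support]  # Poisson starting point

    def loglik(h):
        z = sum(fk * math.exp(hk) for fk, hk in zip(f, h))  # normalizer
        return sum(math.log(f[d]) + h[d] - math.log(z) for d in observed)

    best = loglik(h)
    for _ in range(iters):
        k = rng.randrange(len(h))
        delta = rng.choice([-step, step])
        h[k] += delta
        trial = loglik(h)
        if trial > best:
            best = trial      # keep the improving move
        else:
            h[k] -= delta     # revert
    return h
```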
Using a Hyper-learned Model

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Data: TREC query result sets
  Past queries used to learn about future queries
  Hyper-learn and test with different sets
Recall the Poisson Distribution

[Figure: Data vs. Poisson vs. New Model ("15 times"). Log-scale Pr(dt|rel) (1E-19 to 0.1) vs. dt = 0 … 5.]
Poisson Starting Point: h(dt)

Pr(dt|rel) = f(dt) g(θ) e^{θ h(dt)}

[Figure: h(dt) (from -2 to 6) vs. dt = 0 … 5, comparing the Poisson h(dt) = dt with the learned h(dt).]
Hyper-learned Model: h(dt)

Pr(dt|rel) = f(dt) g(θ) e^{θ h(dt)}

[Figure: The same plot of h(dt) vs. dt; the hyper-learned h(dt) grows more slowly than the Poisson's linear h(dt) = dt.]
Hyper-learned Distribution

[Figure: Data vs. Poisson vs. New Model ("15 times"). Log-scale Pr(dt|rel) vs. dt = 0 … 5; the hyper-learned distribution tracks the data much more closely than the Poisson.]
[Figures: The same data / Poisson / new-model comparison for terms annotated "5 times", "30 times", and "300 times"; the hyper-learned distribution fits the data in each case.]
Performing Retrieval

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Pr(dt|rel) = f(dt) g(θ) e^{θ h(dt)}

Learn θ for each term from labeled docs
Learning θ

Sufficient statistics summarize all the observed data:
  τ1: # of observations
  τ2: Σ_{observations d} h(dt)
Map (τ1, τ2) → θ
Incorporating a prior is easy

(Experiments use 20 labeled documents)
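For a one-parameter exponential family, the maximum-likelihood θ makes the model's expected h match the observed average τ2/τ1. A sketch of the mapping (the bisection bracket and truncated support are our assumptions):

```python
import math

def theta_from_stats(tau1, tau2, f, h, support=range(50)):
    """Map sufficient statistics (tau1 = # of observations,
    tau2 = sum of h(dt) over observations) to theta by solving
    E_theta[h(dt)] = tau2 / tau1 with bisection."""
    target = tau2 / tau1

    def expected_h(theta):
        w = [f(k) * math.exp(theta * h(k)) for k in support]
        return sum(wk * h(k) for wk, k in zip(w, support)) / sum(w)

    lo, hi = -20.0, 5.0  # assumed bracket for theta
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if expected_h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For the Poisson instance (f(dt) = 1/dt!, h(dt) = dt, θ = log λ), E_θ[h] = e^θ, so θ recovers the log of the average occurrence count.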
Performing Retrieval

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Results: Labeled Documents

[Figure: Precision vs. recall (precision 0 to 0.8) for the Poisson model and the new model.]
Retrieval: Query

Short query
Query = a single labeled document
Vector-space-like equation:

RV = Σ_{t in doc} a(t, d) + Σ_{q in query} b(q, d)

Problem: the document portion dominates
Solution: use only the query portion
Another solution: normalize
Retrieval: Query

[Figure: Precision vs. recall (precision 0 to 0.6) comparing the Poisson model, the new model, and TF.IDF.]
Conclusion

Probabilistic models
  Example: Poisson Model (easy to work with, but a bad text model)
Hyper-learning the model
  Exponential framework
  Learned a better model (heavy tailed!)
  Investigated retrieval performance

Future Work

Use the model better
Use it for other applications
  Other IR applications
  Classification
Correct for document length
Hyper-learn on different corpora
  Test whether the learned model generalizes
  Different for genre? Language? People?
Hyper-learn the model better
Questions?
Contact us with questions:
Jaime Teevan: teevan@ai.mit.edu
David Karger: karger@theory.lcs.mit.edu