İrem Arıkan, Srikanta Bedathur, Klaus Berberich
Time Will Tell:
Leveraging Temporal Expressions in IR
Motivation
Users have temporal information needs
Query: Prime Minister United Kingdom 2000
PROBLEM: Traditional information retrieval systems do not exploit the temporal content of documents
OUR APPROACH: Integrate the temporal dimension into a language-model-based retrieval framework
Temporal expressions are more than common terms
Document Model
Document $d = \{ d_{text}, d_{temp} \}$
$d_{text}$: a bag of textual terms
$d_{temp}$: a bag of temporal expressions
A temporal expression is treated as a time interval $T = [begin, end]$ on the time axis
Query Model
Query $q = \{ q_{text}, q_{temp} \}$
$q_{text}$: set of textual terms
$q_{temp}$: set of temporal expressions
Example: in "Prime Minister United Kingdom 2000", $q_{text}$ is "Prime Minister United Kingdom" and $q_{temp}$ is "2000"
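The document and query models above can be written down as small data structures. The following is a minimal Python sketch, not the authors' code: the class names `Interval`, `Document`, and `Query`, the helper `year`, and the choice to map a year to the span from 1 January to 31 December are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Interval:
    """A temporal expression T = [begin, end], both endpoints inclusive."""
    begin: date
    end: date

    def __post_init__(self):
        if self.begin > self.end:
            raise ValueError("begin must not be after end")


@dataclass
class Document:
    """d = {d_text, d_temp}: both parts are bags, so repeats are kept."""
    text: list[str] = field(default_factory=list)
    temp: list[Interval] = field(default_factory=list)


@dataclass
class Query:
    """q = {q_text, q_temp}: both parts are sets."""
    text: frozenset[str] = frozenset()
    temp: frozenset[Interval] = frozenset()


def year(y: int) -> Interval:
    """Hypothetical helper: map a year such as 2000 to [1 Jan, 31 Dec]."""
    return Interval(date(y, 1, 1), date(y, 12, 31))


# The example query from the slides, split into its two parts.
query = Query(
    text=frozenset({"prime", "minister", "united", "kingdom"}),
    temp=frozenset({year(2000)}),
)
```

Lists are used for the document side because a bag keeps repeated terms and expressions, while frozen sets on the query side reflect that the slides describe the query parts as sets.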
Our Baseline: Ponte and Croft's Model (LM)
Each document has an associated language model
The query is regarded as the outcome of a random process
Documents are ranked according to the likelihood that the query would be generated by the language model estimated for each document
$$P(q_{text} | d_{text}) \sim \prod_{w \in q_{text}} P(w | d_{text})$$
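The ranking rule above can be illustrated with a short Python sketch of query-likelihood scoring. This is a simplified form under stated assumptions, not Ponte and Croft's exact estimator: the Jelinek-Mercer smoothing, the weight `lam`, the function name `query_log_likelihood`, and the toy documents are all introduced here for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """log P(q_text | d_text) = sum over w in q_text of log P(w | d_text).

    P(w | d_text) mixes the document and collection relative frequencies
    (Jelinek-Mercer smoothing, an assumed choice) with weight `lam`.
    """
    doc_tf = Counter(doc_terms)
    coll_tf = Counter(collection_terms)
    doc_len = len(doc_terms)
    coll_len = len(collection_terms)
    score = 0.0
    for w in query_terms:
        p_doc = doc_tf[w] / doc_len if doc_len else 0.0
        p_coll = coll_tf[w] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score


# Toy collection (made up for this example).
docs = {
    "d1": ["prime", "minister", "united", "kingdom", "blair"],
    "d2": ["united", "states", "president"],
}
collection = [w for terms in docs.values() for w in terms]
query = ["prime", "minister", "united", "kingdom"]

# Rank documents by decreasing query log-likelihood.
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection),
    reverse=True,
)
```

Working in log space avoids numerical underflow when the product runs over many query terms; the resulting order is the same as for the product itself.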
Filtering Approach (LMF)
Idea: Discard all documents that do not contain any temporal expression relevant to the user's query
Our definition of temporal relevance: a temporal expression is relevant only if it overlaps with a temporal expression from the query
Example (timeline figure), query: 2000
28 Nov 1990 - 2 May 1997: irrelevant (does not overlap with the query)
2 May 1997 - 27 June 2007: relevant (overlaps with the query)
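The overlap test and the filtering step can be sketched in a few lines of Python. The function names `overlaps`, `temporally_relevant`, and `lmf_filter` are hypothetical, intervals are represented as `(begin, end)` tuples of dates with inclusive endpoints, and the year 2000 is assumed to span 1 January to 31 December.

```python
from datetime import date


def overlaps(a, b):
    """True if the closed intervals a and b, each (begin, end), share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporally_relevant(doc_temp, query_temp):
    """True if any temporal expression of the document overlaps any of the query."""
    return any(overlaps(t, q) for t in doc_temp for q in query_temp)


def lmf_filter(docs, query_temp):
    """Keep only documents with at least one relevant temporal expression.

    `docs` maps a document id to its list of (begin, end) intervals.
    """
    return {
        doc_id: temp
        for doc_id, temp in docs.items()
        if temporally_relevant(temp, query_temp)
    }


# The two intervals from the timeline example, one per toy document.
query_temp = [(date(2000, 1, 1), date(2000, 12, 31))]
docs = {
    "first": [(date(1990, 11, 28), date(1997, 5, 2))],
    "second": [(date(1997, 5, 2), date(2007, 6, 27))],
}
kept = lmf_filter(docs, query_temp)
```

Here `kept` contains only `"second"`, matching the timeline example. Presumably the surviving documents are then ranked by the textual language model, although the slides do not spell this step out.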
Problem: filtering has a black-and-white view of the world
It does not take into account
how many relevant temporal expressions a document contains
how closely they match the temporal expressions specified in the user's query
Example: for the query 1980 - 1990, the expression 1980 - 1989 is more relevant than 23 March 1984
Weighted Approach (LMW)
Idea: Assign higher relevance to a document if it contains more temporal expressions and they match the temporal expressions from the user's query more closely
We assume that $q_{text}$ and $q_{temp}$ are produced independently:
$$P(q | d) = P(q_{text} | d_{text}) \cdot P(q_{temp} | d_{temp})$$
We also assume that the temporal expressions in the query occur independently:
$$P(q_{temp} | d_{temp}) = \prod_{Q \in q_{temp}} P(Q | d_{temp})$$
Each temporal expression T in d is a sample from a different generative model
Generating a temporal expression Q = [qBegin, qEnd] given $d_{temp}$:
1. draw a single temporal expression T = [dBegin, dEnd] uniformly at random from $d_{temp}$
2. generate Q by the generative model that is associated with T
The likelihood of generating Q by the set of generative models that produced $d_{temp}$:
$$P(Q | d_{temp}) = \frac{1}{|d_{temp}|} \sum_{T \in d_{temp}} P(Q | T)$$
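The two-step generation process amounts to a uniform mixture over the temporal expressions of a document. The Python sketch below computes it with the per-expression model P(Q | T) passed in as a function; the names `p_q_given_dtemp`, `p_qtemp_given_dtemp`, and `toy_model` are hypothetical, and `toy_model` is only a placeholder (1 for overlapping year intervals, 0 otherwise), not the model from the slides. Returning 0 for a document without temporal expressions is likewise an assumption.

```python
def p_q_given_dtemp(q, d_temp, p_q_given_t):
    """P(Q | d_temp) = (1 / |d_temp|) * sum over T in d_temp of P(Q | T)."""
    if not d_temp:
        # Assumption: a document without temporal expressions cannot generate Q.
        return 0.0
    return sum(p_q_given_t(q, t) for t in d_temp) / len(d_temp)


def p_qtemp_given_dtemp(q_temp, d_temp, p_q_given_t):
    """P(q_temp | d_temp) = product over Q in q_temp of P(Q | d_temp)."""
    result = 1.0
    for q in q_temp:
        result *= p_q_given_dtemp(q, d_temp, p_q_given_t)
    return result


def toy_model(q, t):
    """Placeholder for P(Q | T): 1.0 if the (begin, end) year intervals overlap."""
    return 1.0 if q[0] <= t[1] and t[0] <= q[1] else 0.0


# Four temporal expressions of a toy document, given as (begin, end) years.
d_temp = [(1980, 1989), (1984, 1984), (1700, 1750), (2001, 2004)]
value = p_q_given_dtemp((1980, 1990), d_temp, toy_model)
```

With the placeholder model, two of the four expressions overlap the query interval, so `value` is 0.5. Because the sum is divided by the number of expressions, a document gains from a high share of closely matching expressions rather than from their raw count.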
Generate Q = [qBegin, qEnd] from the query by the generative model that is associated with T = [dBegin, dEnd] from a document:
$$P(Q | T) = P(qBegin) \cdot P(qEnd | qBegin)$$
[Figure: timeline with tick marks at dBegin - α(dEnd - dBegin), dBegin, dEnd, and dEnd + α(dEnd - dBegin), showing the distributions P(qBegin) and P(qEnd | qBegin) and a generated interval [qBegin, qEnd]]
The model produces only temporal expressions that are relevant to T
P(Q | T) gets smaller as the length of the overlap between Q and T decreases
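The exact shapes of P(qBegin) and P(qEnd | qBegin), and the role of α in them, cannot be recovered from the slides. The Python sketch below therefore illustrates only the two stated properties: intervals that do not overlap T get zero weight, and the weight shrinks with the overlap. The functions `overlap_days` and `overlap_weight` are hypothetical, the weight is an unnormalized stand-in rather than a probability, and α is left out entirely.

```python
from datetime import date


def overlap_days(q, t):
    """Number of days shared by the closed intervals q and t, each (begin, end)."""
    begin = max(q[0], t[0])
    end = min(q[1], t[1])
    return (end - begin).days + 1 if begin <= end else 0


def overlap_weight(q, t):
    """Unnormalized stand-in for P(Q | T): grows with |overlap|, zero if disjoint."""
    return float(overlap_days(q, t))


# The example from the filtering discussion: query 1980 - 1990.
query = (date(1980, 1, 1), date(1990, 12, 31))
decade = (date(1980, 1, 1), date(1989, 12, 31))
single_day = (date(1984, 3, 23), date(1984, 3, 23))
unrelated = (date(2000, 1, 1), date(2000, 12, 31))

weights = {
    "1980 - 1989": overlap_weight(query, decade),
    "23 March 1984": overlap_weight(query, single_day),
    "2000": overlap_weight(query, unrelated),
}
```

Under this stand-in, 1980 - 1989 receives a much larger weight than 23 March 1984, and the non-overlapping year 2000 receives zero, which is the qualitative behaviour the slides ask for.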
Experimental Evaluation
Dataset
HTML snapshot of the English Wikipedia from May 2007 containing ~2M documents
Implementation
Terrier Information Retrieval Platform: provides an implementation of Ponte & Croft's approach (LM)
LMF and LMW: implemented in Java + MySQL
A set of regular expressions for extracting temporal information
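To give an idea of regex-based extraction, here is a small Python sketch that pulls year ranges and single years out of text and maps them to intervals. The authors' actual expressions (written for a Java implementation) are not shown in the slides, so the patterns `YEAR_RANGE` and `SINGLE_YEAR`, the function `extract_intervals`, and the restriction to four-digit years from 1000 to 2999 are all assumptions.

```python
import re
from datetime import date

# Hypothetical patterns, not the authors' expressions.
YEAR_RANGE = re.compile(r"\b([12]\d{3})\s*-\s*([12]\d{3})\b")
SINGLE_YEAR = re.compile(r"\b([12]\d{3})\b")


def extract_intervals(text):
    """Return (begin, end) date pairs for year ranges and single years in text."""
    intervals = []
    covered = []  # character spans already consumed by a range match
    for m in YEAR_RANGE.finditer(text):
        first, last = int(m.group(1)), int(m.group(2))
        if first <= last:
            intervals.append((date(first, 1, 1), date(last, 12, 31)))
            covered.append(m.span())
    for m in SINGLE_YEAR.finditer(text):
        if any(start <= m.start() < end for start, end in covered):
            continue  # this year is part of a range handled above
        y = int(m.group(1))
        intervals.append((date(y, 1, 1), date(y, 12, 31)))
    return intervals


range_result = extract_intervals("Sea Battle 1650 - 1670")
year_result = extract_intervals("Olympic Games 1976")
```

A realistic extractor would also need to handle full dates such as "23 March 1984", decades such as "1920s", and phrases such as "18th century", all of which appear in the slides but are outside this sketch.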
Anecdotal query results - 1
Query: Spanish painter 18th century
Rank | LM | LMF | LMW
1 | Art in Puerto Rico | Jose del Castillo | Jose del Castillo
2 | Spanish Art | List of Spanish Artists | Roybal
3 | Palazzo Bianco (Genoa) | Roybal | Augustine Esteve
4 | Caprichos | Augustine Esteve | Maldonado
5 | Portrait Painting | Francisco Eduardo Tresguerras | Luis Egidio Melendez
Anecdotal query results - 2
Query: Sea Battle 1650 - 1670
Rank | LM | LMF | LMW
1 | Battle of Dunbar (1650) | List of Norwegian Battles | Battle of Gabbard
2 | Monte Mataiur | Battle of Portland | Battle of Portland
3 | St. George Caye | Action of 22 February 1812 | Battle of Scheveningen
4 | Culrain Scotland | Naval Strategy | Battle of Kentish Knock
5 | First Anglo-Dutch War | Battle of Gabbard | Battle of Dungeness
User Study
20 queries
Pooling top-10 results returned by the three methods
Relevance assessment by 15 users
highly relevant: 2
marginally relevant: 1
irrelevant: 0
NDCG as a measure of effectiveness
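The evaluation measure can be sketched in Python as follows. The slides do not say which NDCG variant was used or how the judgments of the 15 users were combined, so the gain and discount choice here (gain divided by log2 of rank + 1), the cutoff `k=10`, and the function names `dcg` and `ndcg_at_k` are assumptions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain / log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, all_gains, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering.

    `ranked_gains` are the relevance grades (0, 1, or 2) of one method's
    results in rank order; `all_gains` are the grades of all judged results
    for the query, from which the ideal ordering is built.
    """
    ideal_dcg = dcg(sorted(all_gains, reverse=True)[:k])
    if ideal_dcg == 0.0:
        return 0.0  # no relevant result was judged for this query
    return dcg(ranked_gains[:k]) / ideal_dcg


# Made-up grades for one query: a pool of judged results and one method's ranking.
pool = [2, 2, 1, 1, 0, 0]
method_ranking = [1, 2, 0, 2, 1, 0]
score = ndcg_at_k(method_ranking, pool)
```

A ranking that lists the judged results in decreasing order of grade scores exactly 1.0; any other order of the same results scores lower.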
Conclusion
Documents are rich in temporal expressions, but existing retrieval models ignore their inherent semantics
We propose two methods that address this problem
Initial experimental evidence shows that our methods improve retrieval effectiveness for temporal information needs
Queries
1 Mergers and Acquisitions <2001-2004>
2 United States Railway <1800-1900>
3 Folklore Music <1700-1799>
4 Earthquake <1980-1990>
5 Sea Battle <1650 - 1700>
6 United States Secretary of State <1950 - today>
7 Native Americans <1950 - today>
8 German Architecture <1919 - 1933>
9 Internet <1950 - 1995>
10 Olympic Games <1976>
11 Blues Music <1900 - 1930>
12 Personal Computer <1975 - 1985>
13 Clint Eastwood <1970 - 1979>
14 Black Death Spain <1600 - 1699>
15 Italian Fascism <1920 - 1950>
16 George Bush <1989 - 1992>
17 Flying Machine <1500 - 1799>
18 Spanish Painter <18th Century>
19 Economic Situation Germany <1920s>
20 Ford Motor Company <1900-1930>
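Weighted Approach
Generative model associated with T = [b, e]
[Figure: timeline with tick marks at b - α(e - b), b, e, and e + α(e - b), showing the distributions P(b') and P(e') for a generated interval [b', e']]
The model only generates intervals that overlap with T
P(b', e') ~ |overlap|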