GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS
ELENA HIRST
WHAT IS THE PROBLEM TO BE SOLVED?
• Query logs aren’t always available, and aren’t always the best tool for determining query suggestions
• Most user queries don’t provide enough information
WHO CARES ABOUT THE PROBLEM?
• Builders of search programs without query logs
• Users
WHAT HAVE OTHERS DONE?
• Most query suggestion work uses query logs
• Suggestion of alternate queries (in non-query log approaches):
  • Adding frequent terms occurring in close proximity
  • Auto-completion of last term
  • N-gram suggestions
• These differ from this paper’s approach in that they rank possible completions by n-gram occurrence frequency
WHAT IS THE PROPOSED SOLUTION?
• Rank possible phrase completions by semantic relation
• Topical N-gram (TNG) model
RANKING OF PHRASES
• P is the set of phrases extracted by N-grams
• Qu is the user query, while Qc is the already completed portion and Qt is the uncompleted portion
• Qu = Qc + Qt
• Ranked by probability of the occurrence of Qt
• Use of hidden topics
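A minimal sketch of the Qu = Qc + Qt split and a probability-style ranking, under stated assumptions: Qt is taken to be the partial last term, candidate phrases are those in P that begin with Qt, and the score is a plain relative frequency among the candidates. The names `split_query`, `rank_completions`, and `phrase_counts` are hypothetical, and the hidden-topic part of the ranking is not modeled here.

```python
from collections import Counter
from typing import List, Mapping, Tuple


def split_query(query: str) -> Tuple[str, str]:
    """Split the user query Qu into (Qc, Qt).

    Assumption: Qt is the last, still-incomplete term and Qc is
    everything before it. A trailing space means Qt is empty.
    """
    if not query.strip():
        return "", ""
    if query.endswith(" "):
        return query.strip(), ""
    parts = query.split()
    return " ".join(parts[:-1]), parts[-1]


def rank_completions(
    query: str, phrase_counts: Mapping[str, int], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Rank phrases in P that start with Qt by relative frequency."""
    _qc, qt = split_query(query)
    candidates = {p: c for p, c in phrase_counts.items() if p.startswith(qt)}
    total = sum(candidates.values())
    if total == 0:
        return []
    # Sort by descending count, then alphabetically for stable ties.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(phrase, count / total) for phrase, count in ranked[:top_k]]


# Toy phrase set P with made-up counts, for illustration only.
phrase_counts = Counter(
    {"bank account": 3, "bank robbery": 1, "banana bread": 4, "river": 2}
)
suggestions = rank_completions("open a ban", phrase_counts)
```

Here `suggestions` holds the three phrases starting with "ban", ordered by their share of the candidate counts. A topic-aware ranking would additionally score each candidate against Qc.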
N-GRAM MODELING
• Find bigrams in document corpus
• Concatenate to find larger N-gram phrases
• Creates a cleaner list
• More applicable to search engine use
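The bigram-then-concatenate idea above can be sketched as follows. This is an illustration only: it swaps in a simple frequency threshold (`min_count`, a hypothetical parameter) to decide which bigrams count, whereas the TNG model makes that decision probabilistically. The function names are also assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def frequent_bigrams(docs: Iterable[str], min_count: int = 2) -> Set[Tuple[str, str]]:
    """Collect adjacent word pairs that occur at least min_count times."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return {bigram for bigram, c in counts.items() if c >= min_count}


def extract_phrases(doc: str, bigrams: Set[Tuple[str, str]]) -> List[str]:
    """Chain consecutive frequent bigrams into longer N-gram phrases."""
    tokens = doc.lower().split()
    phrases: List[str] = []
    current: List[str] = []
    for left, right in zip(tokens, tokens[1:]):
        if (left, right) in bigrams:
            if not current:
                current = [left]      # start a new phrase
            current.append(right)     # extend the running phrase
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Toy corpus, for illustration only.
docs = [
    "new york stock exchange opens",
    "the new york stock exchange closed",
    "stock prices fell",
]
bigrams = frequent_bigrams(docs)
phrases = extract_phrases(docs[0], bigrams)
```

Because overlapping bigrams are merged, the first document yields one 4-gram rather than three separate bigrams, which is one way to end up with a cleaner list.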
EXPERIMENT DESIGN
• AP News and Labour news datasets
• Standard N-gram generation only extracted 1-, 2-, and 3-grams
• TNG-N-gram model found up to 10-grams
• Relevance and Diversity used to evaluate efficacy
• 20 test queries generated from titles of articles
RESULTS
• TNG with Probability performs better than standard N-grams
• TNG with hidden topics provides “topically diverse” and “semantically related” results
RESULTS
• Relevance is highest for TNGSim
• Diversity is also highest for TNGSim
RESULTS
• Clarity scores used to calculate retrieval effectiveness
• Difference between the query language model and the corpus language model
• Higher scores are better
• TNG models didn’t perform well: both scored lower than NgramsProb on both datasets
• Claim that clarity is less important than retrieving semantically related results

             AP News dataset   Labour dataset
NgramsProb   4.9               3.5
TNGProb      4.2               2.7
TNGSim       4.23              2.8
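For readers unfamiliar with clarity scores, here is a rough sketch of the usual definition: the KL divergence (in bits) between a query language model and the corpus language model. The assumptions here are that both models are unsmoothed unigram models and that the query model is estimated from the text of documents retrieved for the query. The slides do not say how the scores in the table were computed, so this is not the authors' procedure, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def unigram_model(texts: Iterable[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def clarity_score(retrieved_docs: Iterable[str], corpus_docs: Iterable[str]) -> float:
    """Sum over w of P(w|Q) * log2(P(w|Q) / P(w|C)).

    Words missing from the corpus model are skipped; with retrieved
    documents drawn from the corpus this never happens.
    """
    query_model = unigram_model(retrieved_docs)
    corpus_model = unigram_model(corpus_docs)
    return sum(
        p * math.log2(p / corpus_model[word])
        for word, p in query_model.items()
        if word in corpus_model
    )


# Toy data, for illustration only.
corpus = ["a b", "c d"]
focused = clarity_score(["a b"], corpus)   # query model differs from corpus
unfocused = clarity_score(corpus, corpus)  # query model equals corpus model
```

A query whose retrieved documents look like the corpus as a whole scores near zero, while a query that retrieves a narrow vocabulary scores higher, which is why higher is read as better.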
CONCLUSION
• TNG model can be used effectively in systems without query logs
• Good for domain-specific search engines