GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS ELENA HIRST.

WHAT IS THE PROBLEM TO BE SOLVED?

• Query logs aren’t always available, and aren’t always the best tool for determining query suggestions
• Most user queries don’t provide enough information

WHO CARES ABOUT THE PROBLEM?

• Builders of search programs without query logs
• Users

WHAT HAVE OTHERS DONE?

• Most query suggestion work uses query logs
• Suggestion of alternate queries (in non-query-log approaches):
  • Adding frequent terms occurring in close proximity
  • Auto-completion of last term
  • N-gram suggestions
    • Differs from this paper’s approach: possible completions are ranked by n-gram occurrence frequency

WHAT IS THE PROPOSED SOLUTION?

• Rank possible phrase completions by semantic relation
• Topical N-gram (TNG) model
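
Ranking by semantic relation can be illustrated with a small sketch. It assumes that a topic model has already produced a topic distribution for the completed part of the query and for each candidate phrase, and it uses cosine similarity as the measure of relatedness. The slides do not say which similarity measure is used, so the measure, the function names and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_topic_similarity(query_topics, phrase_topics):
    """Sort candidate phrases by topical similarity to the query.

    query_topics  -- hypothetical topic distribution of the completed query part
    phrase_topics -- dict mapping each candidate phrase to its topic distribution
    """
    scored = [(phrase, cosine(query_topics, topics))
              for phrase, topics in phrase_topics.items()]
    return sorted(scored, key=lambda item: -item[1])

# Toy two-topic example (made-up numbers, not taken from the paper).
query_topics = [0.9, 0.1]
phrase_topics = {
    "trade union": [0.8, 0.2],
    "stock market": [0.1, 0.9],
}
for phrase, score in rank_by_topic_similarity(query_topics, phrase_topics):
    print(f"{phrase}: {score:.3f}")
```

A phrase whose topics overlap with the query's is ranked first even if it is not the most frequent phrase in the corpus, which is the contrast with frequency-based ranking.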

RANKING OF PHRASES

• P is the set of phrases extracted by N-grams
• Qu is the user query, while Qc is the already completed portion and Qt is the uncompleted portion
  • Qu = Qc + Qt
• Ranked by probability of the occurrence of Qt
• Use of hidden topics
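
The Qu = Qc + Qt split and the probability-based ranking can be sketched as follows. This is a minimal illustration, not the paper's formula. It assumes that Qt is the trailing partial word of the query, that a phrase in P is a candidate when it starts with Qt, and that its probability is its relative frequency among the candidates. The helper names and the counts are hypothetical.

```python
from collections import Counter

def split_query(user_query):
    """Split Qu into (Qc, Qt): the completed words and the trailing partial word."""
    if not user_query or user_query.endswith(" "):
        return user_query.strip(), ""
    completed, _, partial = user_query.rpartition(" ")
    return completed, partial

def rank_completions(user_query, phrase_counts, top_k=5):
    """Rank phrases from P that start with Qt by relative frequency (an assumption)."""
    _, partial = split_query(user_query)
    candidates = {p: c for p, c in phrase_counts.items() if p.startswith(partial)}
    total = sum(candidates.values())
    if total == 0:
        return []
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [(phrase, count / total) for phrase, count in ranked[:top_k]]

# Toy phrase set P with made-up corpus counts.
phrase_counts = Counter({
    "market reform": 6,
    "manifesto pledge": 3,
    "minimum wage": 1,
    "trade union": 5,
})
for phrase, prob in rank_completions("labour ma", phrase_counts):
    print(f"{phrase}: {prob:.2f}")
```

Here Qc is "labour" and Qt is "ma", so only the two phrases beginning with "ma" are scored. This sketch ignores Qc entirely; using hidden topics is what would let Qc influence the ranking.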

N-GRAM MODELING

• Find bigrams in document corpus
• Concatenate to find larger N-gram phrases
• Creates a cleaner list
• More applicable to search engine use
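
The idea of finding bigrams and concatenating them into longer phrases can be sketched as below. Note the substitution: the TNG model decides probabilistically whether adjacent words form a bigram, while this sketch uses a plain frequency threshold as a stand-in. The function names, the `min_count` parameter and the toy sentences are assumptions for illustration only.

```python
from collections import Counter

def frequent_bigrams(sentences, min_count=2):
    """Return adjacent word pairs that occur at least min_count times."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return {pair for pair, count in counts.items() if count >= min_count}

def extract_phrases(words, bigrams):
    """Chain consecutive accepted bigrams into longer N-gram phrases."""
    phrases = []
    current = [words[0]] if words else []
    for prev, nxt in zip(words, words[1:]):
        if (prev, nxt) in bigrams:
            current.append(nxt)          # extend the running phrase
        else:
            if len(current) > 1:
                phrases.append(" ".join(current))
            current = [nxt]              # start a new candidate phrase
    if len(current) > 1:
        phrases.append(" ".join(current))
    return phrases

# Toy tokenized corpus.
sentences = [
    ["new", "york", "stock", "exchange", "fell"],
    ["the", "new", "york", "stock", "exchange", "rose"],
    ["stock", "exchange", "news"],
]
bigrams = frequent_bigrams(sentences)
print(extract_phrases(sentences[1], bigrams))
```

Because phrases grow by chaining bigrams, their length is not capped at a fixed N, and fragments such as "york stock" are not listed separately when the longer phrase is found.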

EXPERIMENT DESIGN

• AP News and Labour news datasets
• Standard N-gram generation only extracted 1-, 2-, and 3-grams
• TNG N-gram model found up to 10-grams
• Relevance and Diversity used to evaluate efficacy
• 20 test queries generated from titles of articles

RESULTS

• TNG with Probability performs better than standard N-grams
• TNG with hidden topics provides “topically diverse” and “semantically related” results

RESULTS

• Relevance is highest for TNGSim
• Diversity is also highest for TNGSim

RESULTS

• Clarity scores used to calculate retrieval effectiveness
  • Difference between the query language model and the corpus language model
  • Higher scores are better
• TNG model didn’t perform well
  • Claim that clarity is less important than retrieving semantically related results

              AP News dataset    Labour dataset
NgramsProb    4.9                3.5
TNGProb       4.2                2.7
TNGSim        4.23               2.8
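
The clarity score mentioned above is commonly computed as the Kullback-Leibler divergence between the query language model and the corpus language model. The sketch below follows that common definition. It assumes unsmoothed unigram models and that every word in the query model also occurs in the corpus, and the slides do not confirm that the paper computes it exactly this way. The function names and the tiny token lists are illustrative.

```python
import math
from collections import Counter

def language_model(tokens):
    """Unsmoothed unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def clarity(query_model, corpus_model):
    """KL divergence (in bits) of the query model from the corpus model.

    Assumes every word in query_model also appears in corpus_model.
    """
    return sum(p * math.log2(p / corpus_model[word])
               for word, p in query_model.items() if p > 0)

# Toy example: tokens of the documents retrieved for a query vs. the whole corpus.
corpus_model = language_model(["a", "a", "b", "b"])
query_model = language_model(["a", "a"])
print(clarity(query_model, corpus_model))
```

A query whose retrieved documents use words in the same proportions as the whole corpus scores 0, while a focused query scores higher, which is why higher scores are read as better.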

CONCLUSION

• TNG model can be used effectively in systems without query logs
• Good for domain-specific search engines
