GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS
ELENA HIRST

WHAT IS THE PROBLEM TO BE SOLVED?

• Query logs aren’t always available or the best tool to determine query suggestions
• Most user queries don’t provide enough information

WHO CARES ABOUT THE PROBLEM?

• Builders of search programs without query logs
• Users

WHAT HAVE OTHERS DONE?

• Most query suggestion work uses query logs
• Suggestion of alternate queries (in non-query-log approaches):
  • Adding frequent terms occurring in close proximity
  • Auto-completion of last term
  • N-gram suggestions
    • Differs from this paper’s approach: possible completions are ranked by n-gram occurrence frequency
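
The frequency-ranked n-gram baseline described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not code from the prior work: the function names, the toy corpus, and the choice to match on the first word of each phrase are all assumptions made for the example.

```python
from collections import Counter

def extract_ngrams(tokens, max_n=3):
    """Yield every 1..max_n-gram in a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def suggest_by_frequency(corpus, partial_term, top_k=3):
    """Rank phrases whose first word completes partial_term by raw count."""
    counts = Counter()
    for doc in corpus:
        counts.update(extract_ngrams(doc.lower().split()))
    candidates = [(p, c) for p, c in counts.items()
                  if p[0].startswith(partial_term)]
    # Most frequent first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda pc: (-pc[1], pc[0]))
    return [" ".join(p) for p, _ in candidates[:top_k]]

# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "labour party wins election",
    "labour market report",
    "labour party conference",
]
print(suggest_by_frequency(toy_corpus, "lab"))
```

Because the ranking depends only on counts, the most common phrases win regardless of what the rest of the query is about.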

WHAT IS THE PROPOSED SOLUTION?

• Rank possible phrase completions by semantic relation
• Topical N-gram (TNG) model

RANKING OF PHRASES

• P is the set of phrases extracted by N-grams
• Qu is the user query, while Qc is the already completed portion and Qt is the uncompleted portion
• Qu = Qc + Qt
• Ranked by probability of the occurrence of Qt
• Use of hidden topics
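
To make the Qu = Qc + Qt split concrete, here is a hedged sketch of the two ranking ideas on this slide: ordering completions of Qt by phrase probability, and ordering them by topical similarity to Qc. The scoring functions, the hand-written topic vectors, and all names below are assumptions for illustration; the slides do not give the paper's actual formulas.

```python
import math

def split_query(qu):
    """Split Qu into (Qc, Qt): completed words and the trailing partial word."""
    words = qu.split()
    if qu.endswith(" ") or not words:
        return words, ""
    return words[:-1], words[-1]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_completions(qu, phrases, word_topics):
    """phrases: {phrase: (probability, topic_vector)};
    word_topics: {word: topic_vector}. Both are assumed inputs."""
    qc, qt = split_query(qu)
    candidates = [p for p in phrases if p.startswith(qt)]
    vecs = [word_topics[w] for w in qc if w in word_topics]
    if not vecs:
        # No topical context from Qc: fall back to phrase probability.
        return sorted(candidates, key=lambda p: -phrases[p][0])
    # Average the topic vectors of the Qc words, then rank by similarity.
    context = [sum(col) / len(vecs) for col in zip(*vecs)]
    return sorted(candidates, key=lambda p: -cosine(phrases[p][1], context))

# Hypothetical phrases with made-up probabilities and 2-topic vectors.
phrases = {
    "bank account": (0.5, [0.9, 0.1]),
    "bank of the river": (0.2, [0.1, 0.9]),
    "banana": (0.3, [0.5, 0.5]),
}
word_topics = {"fishing": [0.1, 0.9], "savings": [0.95, 0.05]}

print(rank_completions("bank", phrases, word_topics))
print(rank_completions("fishing bank", phrases, word_topics))
```

With no completed portion, the more probable phrase comes first; once Qc supplies a topic, the topically closer phrase moves to the top.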

N-GRAM MODELING

• Find bigrams in document corpus
• Concatenate to find larger N-gram phrases
• Creates a cleaner list
• More applicable to search engine use
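
The bigram-then-concatenate idea above can be sketched as follows. One loud assumption: a simple count threshold (`min_count`, a hypothetical parameter) stands in for the decision about whether two adjacent words belong together, which the TNG model makes probabilistically rather than by thresholding.

```python
from collections import Counter

def extract_phrases(docs, min_count=2):
    """Chain adjacent words into phrases wherever their bigram is frequent."""
    tokenized = [d.lower().split() for d in docs]
    bigrams = Counter()
    for toks in tokenized:
        bigrams.update(zip(toks, toks[1:]))

    phrases = Counter()
    for toks in tokenized:
        if not toks:
            continue
        run = [toks[0]]
        for prev, cur in zip(toks, toks[1:]):
            if bigrams[(prev, cur)] >= min_count:
                run.append(cur)          # extend the current phrase
            else:
                if len(run) > 1:         # keep multi-word runs only
                    phrases[" ".join(run)] += 1
                run = [cur]              # start a new run
        if len(run) > 1:
            phrases[" ".join(run)] += 1
    return phrases

# Hypothetical toy documents, for illustration only.
docs = [
    "new york stock exchange closed early",
    "the new york stock exchange rallied",
    "stock exchange rules changed",
]
print(extract_phrases(docs))
```

Chaining linked bigrams yields whole phrases of arbitrary length without also listing every shorter fragment inside them, which is one way to read the "cleaner list" bullet.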

EXPERIMENT DESIGN

• AP News and Labour news datasets
• Standard N-gram generation only extracted 1-, 2-, and 3-grams
• TNG-N-gram model found up to 10-grams
• Relevance and Diversity used to evaluate efficacy
• 20 test queries generated from titles of articles

RESULTS

• TNG with Probability performs better than standard N-grams
• TNG with hidden topics provides “topically diverse” and “semantically related” results

RESULTS

• Relevance is highest for TNGSim
• Diversity is also highest for TNGSim

RESULTS

• Clarity scores used to calculate retrieval effectiveness
  • Difference between the query language model and the corpus language model
  • Higher scores are better
• TNG model didn’t perform well
  • Claim that clarity is less important than retrieving semantically related results

Clarity scores:

             AP News dataset   Labour dataset
NgramsProb   4.9               3.5
TNGProb      4.2               2.7
TNGSim       4.23              2.8
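
As a rough illustration of the clarity idea, the sketch below measures the difference between a query language model and the corpus language model as a KL divergence in bits. It assumes plain unigram models, with the query model estimated from the documents a query retrieves; the slides do not say how the paper estimated either model, so treat the function names and toy data as hypothetical.

```python
import math
from collections import Counter

def unigram_model(docs):
    """Maximum-likelihood unigram probabilities over whitespace tokens."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def clarity(retrieved_docs, corpus_docs):
    """KL divergence (bits) of the query model from the corpus model.

    Assumes retrieved_docs is a subset of corpus_docs, so every word
    in the query model also has a nonzero corpus probability.
    """
    p_query = unigram_model(retrieved_docs)
    p_corpus = unigram_model(corpus_docs)
    return sum(p * math.log2(p / p_corpus[w]) for w, p in p_query.items())

# Hypothetical two-document corpus, for illustration only.
corpus = ["labour strike", "stock market"]
print(clarity(["labour strike"], corpus))  # focused result set
print(clarity(corpus, corpus))             # result set identical to corpus
```

A result set that looks just like the corpus scores zero, while a focused one scores higher, which is why higher clarity is read as better.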

CONCLUSION

• TNG model can be effectively used in systems without query logs
• Good for domain-specific search engines