Generating Query Substitutions
Alicia Wood

What is the problem to be solved?

Problem
- Imperfect description of need
- Search engine not able to retrieve documents matching query
- Need accurate and related query substitutions

Problem (cont.)
- Given a query
- Want to generate modified query (related)
  - Improvements (specification)
  - Neutral (spelling change, synonym)
  - Loss of original meaning (generalization)

Who cares about this problem and why?

Who cares?
- User typing the query
- Want correct results with imperfect query

What have others done to solve this problem and why is this inadequate?

Previous Work
- Relevance/Pseudo relevance feedback
- Query term deletion
- Substituting query terms with related terms
- Latent Semantic Indexing (LSI)

Relevance/Pseudo relevance feedback
- Submit query for initial retrieval
- Processing resulting documents
- Modify the query by expanding with additional terms from documents
- Perform second retrieval with modified query
- Can cause query drift
- Computationally expensive

Query term deletion
- Loss of specificity from original query

Substituting query terms
- Relies on an initial retrieval

Latent Semantic Indexing (LSI)
- Identify patterns in relationships between terms and concepts in unstructured collection of text
- Computationally expensive

What is the proposed solution to the problem?
Solution
- Query modification based on precomputed query and phrase similarity
- Ranking proposed queries
- Similar queries/phrases derived from user query sessions
- Learned models used to re-rank
  - Based on similarity of new query to original query

Contributions
1. Identification of new source of data to identify similar queries and phrases
2. The definition of a scheme for scoring query suggestions
3. An algorithm to combine query and phrase suggestions
   - Finds highly and broadly relevant phrases
4. Identification of features that are predictive of highly relevant query suggestions

Classes of Suggestion Relevance
1. Precise rewriting
   - Match user's intent, preserve core meaning
   - automobile insurance -> automotive insurance
2. Approximate rewriting
   - Direct close relationship to topic, scope narrowed or broadened
   - apple music player -> ipod shuffle
3. Possible rewriting
   - Categorical relationship to initial query, complementary product but distinct
   - eye glasses -> contact lenses
4. Clear mismatch
   - No clear relationship
   - jaguar xj6 -> os x jaguar

Classes of Rewriting
- Specific Rewriting (1+2)
  - Closely related query
  - Highly relevant
- Broad Rewriting (1+2+3)
  - Query expansion
  - Relevant to user interests

Substitutables
- Initial query -> generate relevant queries
- Replace query as whole or phrases
- Segment query into phrases
- Find query pairs where one segment has changed
  - (britney spears) (mp3s) -> (britney spears) (lyrics)

Pair Independence Hypothesis
- Likelihood Ratio
- High value = strong dependence between two terms

Validation
- 1000 initial queries
- Generate single suggestion (q_j) for each
- Evaluate accuracy of approaches
- Train machine learned classifier
- Evaluate ability to produce higher quality suggestions
- Word distance, normalized edit distance, number of substitutions
- Suggestions criteria:
  - Some words from initial query
  - Modifications shouldn't be made at start of query

Future Work
- Build semantic classifier
  - Predict semantic class of rewriting
- Take inspiration from machine translation techniques
- Introduce language model
  - Avoid producing nonsensical queries
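
Sketch: scoring substitutables with a likelihood ratio

The Substitutables and Pair Independence Hypothesis slides could be sketched roughly as follows. This is a minimal illustration, not the method from the slides: it assumes the Dunning-style binomial log-likelihood ratio (the slides only say "Likelihood Ratio"), it works on whole queries rather than segmented phrases, and the function names and toy sessions are hypothetical.

```python
import math
from collections import Counter


def rewrite_pairs(sessions):
    """Yield (query, next_query) for consecutive, distinct queries in a session."""
    for session in sessions:
        for q1, q2 in zip(session, session[1:]):
            if q1 != q2:
                yield q1, q2


def _log_l(k, n, p):
    """Binomial log-likelihood of k successes in n trials (0 * log 0 treated as 0)."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def llr(k1, n1, k2, n2):
    """Log-likelihood ratio for H0: P(q2 | q1) == P(q2 | not q1).

    k1 of n1 rewrites starting from q1 went to q2;
    k2 of n2 rewrites starting from other queries went to q2.
    """
    p1 = k1 / n1 if n1 else 0.0
    p2 = k2 / n2 if n2 else 0.0
    p = (k1 + k2) / (n1 + n2)  # pooled rate under independence
    return 2.0 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                  - _log_l(k1, n1, p) - _log_l(k2, n2, p))


def score_substitutions(sessions):
    """Map each observed (q1, q2) rewrite to its likelihood-ratio score."""
    pairs = Counter(rewrite_pairs(sessions))
    as_source, as_target = Counter(), Counter()
    for (q1, q2), count in pairs.items():
        as_source[q1] += count
        as_target[q2] += count
    total = sum(pairs.values())
    scores = {}
    for (q1, q2), k1 in pairs.items():
        n1 = as_source[q1]
        scores[(q1, q2)] = llr(k1, n1, as_target[q2] - k1, total - n1)
    return scores


# Hypothetical toy sessions, for illustration only.
toy_sessions = [
    ["britney spears mp3s", "britney spears lyrics"],
    ["britney spears mp3s", "britney spears lyrics"],
    ["britney spears mp3s", "britney spears lyrics"],
    ["eye glasses", "contact lenses"],
    ["eye glasses", "contact lenses"],
    ["jaguar xj6", "britney spears lyrics"],
]
for pair, score in sorted(score_substitutions(toy_sessions).items(),
                          key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A high score means the pair departs strongly from independence. The statistic is two-sided, so a practical version would probably also check that the rewrite occurs more often than chance rather than less. Extending the sketch to phrases would mean segmenting each query first and counting pairs where exactly one segment changed.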