Cross-Lingual Query Suggestion Using Query Logs of Different Languages
SIGIR 07
Abstract
• Query suggestion
  – To suggest relevant queries for a given query
  – To help users better specify their information needs
• Cross-Lingual Query Suggestion (CLQS)
  – For a query in one language, we suggest similar or relevant queries in other languages.
    • cross-lingual keyword bidding (search engine)
    • cross-language information retrieval (CLIR)
Introduction
• CLQS vs. Cross-Lingual Query Expansion
  – CLQS suggests full queries formulated by users in another language.
• The users of search engines
  – have similar interests in the same period of time
  – issue queries on similar topics in different languages
• Key point
  – How to learn a similarity measure between two queries
  – MLQS (monolingual query suggestion): term co-occurrence based MI and χ²
Estimating Cross-Lingual Query similarity
• Discriminative Model for Estimating Cross-Lingual Query Similarity
• Monolingual Query Similarity Measure Based on Click-through Information
• Features Used for Learning Cross-Lingual Query Similarity Measure
  – Bilingual Dictionary
  – Parallel Corpora
  – Online Mining for Related Queries
  – Monolingual Query Suggestion
• Estimating Cross-lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2
– qf : a source language query
– qe : a target language query
– simML : Monolingual query similarity
– simCL : Cross-lingual query similarity
– Tqf : translation of qf in the target language
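Read together, these definitions suggest the principle behind the model: the cross-lingual similarity of qf and qe should agree with the monolingual similarity between qe and the translation of qf. The slide's own formula is not preserved in this transcript, so the relation below is a sketch reconstructed from the symbols above, and its exact form is an assumption.

```latex
\mathrm{sim}_{CL}(q_f, q_e) = \mathrm{sim}_{ML}(T_{q_f}, q_e)
```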
Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2
• Learning: LIBSVM regression algorithm
  – f : feature functions
  – φ : mapping from feature space onto kernel space
  – w : weight vector in the kernel space
  – relevant vs. irrelevant
  – strongly relevant, weakly relevant or irrelevant
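In symbols, the learned similarity is presumably a linear function, in kernel space, of the feature vector computed for a query pair. The form below is a sketch assembled from the three symbols defined on this slide (f for the feature functions, φ for the mapping onto kernel space, w for the weight vector); how they are composed is an assumption, since the slide's equation did not survive in the transcript.

```latex
\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi\bigl(f(q_f, q_e)\bigr)
```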
Monolingual Query Similarity Measure Based on Click-through Information
• click-through information in query logs [26]
• KN(x) : number of keywords in a query x
• RD(x) : number of clicked URLs for a query x
• α = 0.4, β = 0.6
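A minimal sketch of how such a click-through based similarity could be computed. It assumes the measure is a weighted sum of keyword overlap and clicked-URL overlap, each normalized by the larger of the two queries' counts, with weights 0.4 and 0.6 as on the slide. The normalization, the function name `sim_ml`, and the toy data are assumptions, not taken from the slides.

```python
def sim_ml(p_keywords, q_keywords, p_urls, q_urls, alpha=0.4, beta=0.6):
    """Weighted sum of keyword overlap and clicked-URL overlap (assumed form)."""
    p_kw, q_kw = set(p_keywords), set(q_keywords)
    p_rd, q_rd = set(p_urls), set(q_urls)

    def overlap(a, b):
        # Shared items divided by the larger set size; 0.0 if both are empty.
        denom = max(len(a), len(b))
        return len(a & b) / denom if denom else 0.0

    return alpha * overlap(p_kw, q_kw) + beta * overlap(p_rd, q_rd)


# Toy example with hypothetical queries and clicked URLs.
score = sim_ml(
    ["cheap", "flights"], ["cheap", "flights", "paris"],
    ["a.example", "b.example"], ["b.example", "c.example"],
)
```

Here two of three keywords and one of two clicked URLs are shared, so the click-through term contributes more than the keyword term.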
1. Bilingual Dictionary – 1/2
– 120,000 unique entries (built in-house)
– Given an input query qf = {wf1, wf2, …, wfn} (in the source language)
– By bilingual dictionary D: D(wfi) = {ti1, ti2, …, tim}
– C(x,y) is the number of queries in the log containing both x and y.
– C(x) is the number of queries in the log containing x.
– N is the total number of queries in the log.
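The counts defined above are what a co-occurrence based mutual information score needs. The sketch below assumes the usual form P(x,y) · log(P(x,y) / (P(x)P(y))) with P(x,y) = C(x,y)/N and P(x) = C(x)/N; the slide's own formula is not preserved here, so that form, the function name, and the toy log are assumptions.

```python
import math


def mutual_information(x, y, queries):
    """Co-occurrence MI of terms x and y over a list of tokenized queries."""
    n = len(queries)                                           # N
    c_x = sum(1 for q in queries if x in q)                    # C(x)
    c_y = sum(1 for q in queries if y in q)                    # C(y)
    c_xy = sum(1 for q in queries if x in q and y in q)        # C(x, y)
    if n == 0 or c_xy == 0:
        return 0.0  # terms never co-occur: treat as no association
    p_x, p_y, p_xy = c_x / n, c_y / n, c_xy / n
    return p_xy * math.log(p_xy / (p_x * p_y))


# Hypothetical four-query log, each query as a set of terms.
log = [
    {"hotel", "paris"},
    {"hotel", "paris", "cheap"},
    {"hotel", "rome"},
    {"weather", "rome"},
]
mi_hotel_paris = mutual_information("hotel", "paris", log)
mi_paris_rome = mutual_information("paris", "rome", log)
```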
1. Bilingual Dictionary – 2/2
– The set of top-4 query translations is denoted as S(Tqf)
– For each T ∈ S(Tqf)
  • Retrieve all queries containing T in the target language and assign Sdict(T) as their value
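One way to pick the top-4 translations is to score every combination of per-word dictionary translations by how strongly its terms cohere. The sketch below assumes the score is the sum of pairwise association values (for example mutual information) and enumerates all combinations exhaustively; both choices are assumptions for illustration, and `top_translations`, `toy_score`, and the toy values are hypothetical.

```python
from itertools import combinations, product


def top_translations(term_options, pair_score, k=4):
    """Rank combinations of per-word translations by summed pairwise score."""
    scored = []
    for combo in product(*term_options):
        cohesion = sum(pair_score(a, b) for a, b in combinations(combo, 2))
        scored.append((cohesion, combo))
    # Stable sort: ties keep their enumeration order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical pairwise association values between target-language terms.
toy_mi = {
    frozenset(("hotel", "cheap")): 0.9,
    frozenset(("hostel", "cheap")): 0.4,
}


def toy_score(a, b):
    return toy_mi.get(frozenset((a, b)), 0.0)


best = top_translations(
    [["hotel", "hostel", "mansion"], ["cheap", "market"]], toy_score, k=4
)
```

Exhaustive enumeration grows quickly with query length, so a real system would likely prune or search greedily.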
2. Parallel Corpora
– Given a pair of queries
  • qf : in the source language
  • qe : in the target language
– Bi-Directional Translation Score
  • IBM Model 1 & GIZA++ tool
  • P(yj|xi) is the word-to-word translation probability
– The top 10 queries {qe} with the highest scores against qf are retrieved from the query log
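A small sketch of a bi-directional translation score built on IBM Model 1. The Model 1 likelihood is standard; combining the two directions with a geometric mean is an assumption here, and the word translation tables (which would come from GIZA++ training on the parallel corpus) are replaced by hypothetical toy values.

```python
import math


def ibm1_prob(target, source, t_table, epsilon=1.0):
    """IBM Model 1 likelihood P(target | source), with a NULL source word."""
    src = ["NULL"] + list(source)
    prob = epsilon / (len(src) ** len(target))
    for y in target:
        # Sum word-to-word probabilities P(y | x) over all source positions.
        prob *= sum(t_table.get((y, x), 0.0) for x in src)
    return prob


def bidirectional_score(qf, qe, t_e_given_f, t_f_given_e):
    """Geometric mean of the two translation directions (assumed combination)."""
    forward = ibm1_prob(qe, qf, t_e_given_f)
    backward = ibm1_prob(qf, qe, t_f_given_e)
    return math.sqrt(forward * backward)


# Hypothetical one-word queries and translation probabilities.
t_e_given_f = {("car", "voiture"): 0.8}
t_f_given_e = {("voiture", "car"): 0.5}
score = bidirectional_score(["voiture"], ["car"], t_e_given_f, t_f_given_e)
```

Scoring in both directions penalizes pairs where only one side explains the other, for example a long query that merely contains a translation of a short one.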
3. Online Mining for Related Queries – 1/3
• Out-of-vocabulary (OOV) terms are a major knowledge bottleneck for query translation and CLIR
• Assumption:
  – If a query in the target language co-occurs with the source query in many web pages, they are probably semantically related
  – However, this introduces a considerable amount of noise
3. Online Mining for Related Queries – 2/3
– Frequency in the Snippets
  • For example:
    – Given a query q = abc in the source language
    – By dictionary: a = {a1, a2, a3}, b = {b1, b2} and c = {c1}
    – Web query: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1), searched over pages in the target language
    – From 700 retrieved snippets, keep the 10 most frequent target queries
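The example above can be sketched in two steps: build the boolean web query from the dictionary translations, then count which known target queries appear most often in the returned snippets. The `AND`/`OR` syntax, the substring matching, the function names, and the toy snippets are all assumptions; no real search engine API is used.

```python
from collections import Counter


def build_web_query(source_query, translations):
    """Conjoin the source query with one OR-group per translated word."""
    groups = ["(" + " OR ".join(options) + ")" for options in translations]
    return " AND ".join([f'"{source_query}"'] + groups)


def most_frequent_targets(snippets, target_queries, top_n=10):
    """Count how many snippets mention each target query; keep the top_n."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for query in target_queries:
            if query.lower() in text:
                counts[query] += 1
    return counts.most_common(top_n)


web_query = build_web_query("abc", [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]])

# Hypothetical snippets standing in for search results.
snippets = ["Cheap Hotel in Paris", "paris hotel deals", "weather today"]
frequent = most_frequent_targets(snippets, ["hotel", "paris", "rome"])
```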
3. Online Mining for Related Queries – 3/3
– Any query qe mined from the web is associated with a feature, the Co-Occurrence Double-Check (CODC) measure, SCODC(qf, qe)
4. Monolingual Query Suggestion
• Q0 : candidate queries (in the target language)
  – For each target query qe:
    • SQML(qe) : its monolingual source query
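A sketch of how the monolingual source query might be looked up, assuming SQML(qe) is the candidate in Q0 that is most similar to qe under the monolingual similarity measure. That definition is an assumption, since the slide's formula is not in the transcript. The token-overlap function `jaccard` is only a stand-in for simML, and the queries are hypothetical.

```python
def monolingual_source_query(qe, candidates, sim_ml):
    """Return the candidate in Q0 most similar to qe, with its similarity."""
    best = max(candidates, key=lambda c: sim_ml(qe, c))
    return best, sim_ml(qe, best)


def jaccard(a, b):
    """Token-overlap similarity, used here only as a stand-in for simML."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


q0 = ["paris hotel", "rome weather"]
source, score = monolingual_source_query("cheap paris hotel", q0, jaccard)
```

The returned similarity could then serve as a feature for qe alongside the features already computed for the source query.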
Estimating Cross-lingual Query Similarity
• Four categories of features are used to learn the cross-lingual query similarity.
• Cross-lingual query similarity score
  – Learning: LIBSVM regression algorithm
    • f : feature functions
    • φ : mapping from feature space onto kernel space
    • w : weight vector in the kernel space
Performance Evaluation – Log Data
• Data resources: MSN Search Engine
  – French (source language) vs. English (target language)
  – A one-month English query log
    • 7 million unique English queries
    • Occurrence frequency of more than 5
• 5,000 French queries
  – 4,171 queries have their translations in the English queries
    • 70% for training the weights of LIBSVM
    • 10% as development data
    • 20% for testing
Performance Evaluation - CLIR
• Data resources:
  – TREC6 CLIR data (AP88-90 newswire, 750MB)
  – 25 short French-English query pairs (CL1-CL25)
    • average length of 3.3 words
    • the language pair matches the web query logs used for training CLQS
[Figure: CLIR setup. Labels: Source Language (qf), CLQS, Target Language ({qe}), BM25, CLIR]
• CLQS
• CLIR
Conclusion
• Cross-lingual query suggestion
• Query Logs
• French to English
• TREC6 French to English CLIR task
  – CLQS demonstrates high quality