Efficient Computation of Diverse Query Results
Erik Vee
joint work with
Utkarsh Srivastava, Jayavel Shanmugasundaram, Prashant Bhat, Sihem Amer-Yahia

Motivation
• Imagine looking for shoes on Yahoo! Shopping, and seeing only Reeboks
• … or looking for cars on Yahoo! Autos, and seeing only Hondas
• … or looking for jobs on Yahoo! HotJobs, and seeing only jobs from Yahoo!
• It is not enough to simply give the best response
– Need diversity of answers

Diversity Search
• If we display 30 results in 5 categories, then we should show 6 items from each category
– NB: Our goal is to show a range of choices, not a representative sample
– Recurse on each subgroup of items
• Diversity is crucial for users looking for a range of results
– e.g. shopping, information gathering/research
• Useful for aiding navigation
– Users tend to favor search-and-click over hierarchies
• Likely to give at least one good answer on the first page
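The "30 results in 5 categories" rule can be sketched as a balanced allocation. This is only an illustration: the function name `balanced_split` and the per-category counts are assumptions, not something taken from the slides. A category with too few matching items gets what it can fill, and its unused slots are shared among the others.

```python
def balanced_split(k, capacities):
    """Split k result slots across categories as evenly as possible.

    capacities[i] is the number of matching items in category i, so a
    small category never receives more slots than it can fill.
    """
    alloc = [0] * len(capacities)
    remaining = min(k, sum(capacities))
    while remaining > 0:
        # Each round hands one slot to every category that still has items.
        for i, cap in enumerate(capacities):
            if remaining > 0 and alloc[i] < cap:
                alloc[i] += 1
                remaining -= 1
    return alloc


print(balanced_split(30, [100, 100, 100, 100, 100]))  # [6, 6, 6, 6, 6]
print(balanced_split(30, [2, 100, 100, 100, 100]))    # [2, 7, 7, 7, 7]
```

Applying the same split again inside each category gives the "recurse on each subgroup" step.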

Contributions
• Formally define diversity search
– Other diversity-like approaches use extensive post-processing or are not query-dependent
• Proved that traditional IR engines cannot produce guaranteed diverse results
• Gave novel algorithms to produce diverse results
– Both one-pass (data streaming) and probing algorithms
• Experimentally verified that these algorithms are nearly as fast as normal top-k processing
– Much faster than post-processing techniques

What about other approaches?
• If not diverse enough, query again
– E.g. if all results are from one company, issue another query
– Bad for latency
• Issue multiple queries (one for Honda, one for Toyota...)
– Can be prohibitively expensive (kills throughput), although latency is fine
– Some applications may have dozens of top-level categories
• Fetch extra results, then find the most diverse set among them
– Not guaranteed to get good results
– Requires fetching additional results unnecessarily
• Fetch all results, then find a diverse set
– Many times slower
• Random sample of results
– Misses important results

What about clever scoring?
• Can we give each item a global “diversity” score, then find top-k using this?
– Proved in paper: There is no global score that gives guaranteed diversity
• Can we give each item a local “diversity” score, so that it has a different score in each list of the inverted index?
– Proved in paper: There is no list-based scoring of the item that gives guaranteed diversity

Outline
• Definition of diversity
• Overview of our algorithms
• Our experimental results

Diversity search
• Over all possible sets of top-k results that match query, return set with most diversity
• Paper defines diversity more precisely
– Focus on hierarchy view of diversity (in next slides)
• For scored diversity (in which each item has a score)
– Over all possible sets of top-k results with maximum score, return set with highest diversity
– Note: Diversity only useful when score not too fine-grained

Diversity definition (by picture)
Determine a category ordering: Make, Model, Color, Year, Text
The ordering implicitly defines a hierarchy

Hierarchy after a query
E.g. Query text contains `Low`
Diversity search always returns valid results
All siblings return the same number of results (or as close as possible)

Returning top-k diverse results
Diversity search always returns valid results
E.g. Query text contains `Low`
Suppose we return k = 4 results
Must return 2 Hondas and 2 Toyotas
Will not return 2 green Civics
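The k = 4 example can be reproduced with a small recursive selection over Dewey-style IDs (one number per level: make, model, color, year, text). This is a brute-force illustration of the definition, not one of the algorithms from the talk. The name `diverse_subset`, the tie-breaking by ID order, and the reuse of the `low` posting list from the later slides as sample data are all assumptions.

```python
from collections import defaultdict


def diverse_subset(ids, k, level=0):
    """Pick k Dewey IDs so that siblings at every level get near-equal shares."""
    ids = sorted(ids)
    if k <= 0:
        return []
    if k >= len(ids) or level == len(ids[0]):
        return ids[:k]
    # Group the IDs by their branch number at the current level.
    groups = defaultdict(list)
    for dewey in ids:
        groups[dewey[level]].append(dewey)
    branches = [groups[key] for key in sorted(groups)]
    # Hand out slots one per branch per round, skipping exhausted branches.
    quota = [0] * len(branches)
    left = k
    while left > 0:
        for i, branch in enumerate(branches):
            if left > 0 and quota[i] < len(branch):
                quota[i] += 1
                left -= 1
    picked = []
    for branch, share in zip(branches, quota):
        picked.extend(diverse_subset(branch, share, level + 1))
    return picked


# Assumed sample data: single-digit components, first digit 0 = Honda, 1 = Toyota.
LOW = [tuple(int(c) for c in s) for s in
       "00000 00010 00100 00200 00300 00310 10000 11000 12000 13000".split()]

result = diverse_subset(LOW, 4)
print(result)
# [(0, 0, 0, 0, 0), (0, 0, 1, 0, 0), (1, 0, 0, 0, 0), (1, 1, 0, 0, 0)]
```

The result holds two Hondas and two Toyotas, and the two Hondas differ in color, so 0.0.0.0.0 and 0.0.0.1.0 (both green Civics) are never returned together.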

Algorithms
• One Pass
– Never goes backward (just one pass over dataset)
– Maintains a top-k diverse set based on what has been seen
– Jumps ahead if more results will not help diversity
– Optimal one-pass algorithm
• Probe
– May jump forward or backward (i.e. probes)
– Proved: at most 2k probes for top-k diverse result set
• Both also work for scored diversity

Dewey IDs
Every branch gets a number
Every item is then labeled, e.g. 0.2.0.1.0 is Honda Odyssey Green ’06 `Good miles`
Create inverted index
low → 00000, 00010, 00100, 00200, 00300, 00310, 10000, 11000, 12000, 13000
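One way to picture the labeling step is the sketch below, which numbers each branch in order of first appearance under its parent and then builds a keyword index over the text field. The function names and the four sample listings are made up for illustration. The actual numbering scheme and index layout used in the paper may differ.

```python
def assign_dewey_ids(listings):
    """Give each listing a Dewey ID: one number per level, numbered per parent."""
    child_numbers = {}  # path prefix (tuple of values) -> {child value: number}
    ids = []
    for listing in listings:
        dewey = []
        for depth, value in enumerate(listing):
            siblings = child_numbers.setdefault(listing[:depth], {})
            # A new child gets the next free number under this parent.
            dewey.append(siblings.setdefault(value, len(siblings)))
        ids.append(tuple(dewey))
    return ids


def build_inverted_index(listings, ids):
    """Map each word of the text field to its Dewey IDs, kept in sorted order."""
    index = {}
    for listing, dewey in zip(listings, ids):
        for word in set(listing[-1].lower().split()):
            index.setdefault(word, []).append(dewey)
    for posting in index.values():
        posting.sort()
    return index


# Hypothetical listings: (make, model, color, year, text).
listings = [
    ("Honda", "Civic", "Green", "2007", "Low miles"),
    ("Honda", "Civic", "Green", "2006", "Low price"),
    ("Honda", "Civic", "Blue", "2007", "Good miles"),
    ("Toyota", "Prius", "Tan", "2007", "Low miles"),
]
ids = assign_dewey_ids(listings)
index = build_inverted_index(listings, ids)
print(ids)           # [(0, 0, 0, 0, 0), (0, 0, 0, 1, 0), (0, 0, 1, 0, 0), (1, 0, 0, 0, 0)]
print(index["low"])  # [(0, 0, 0, 0, 0), (0, 0, 0, 1, 0), (1, 0, 0, 0, 0)]
```

Because the IDs are tuples, sorting a posting list puts items of the same make, model, and color next to each other, which is what lets the algorithms skip whole branches.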

Next and Prev
Supports two basic operations: Next and Prev
E.g. Query text contains `Low`
Inverted index for `Low` lists all items in Dewey ID order
low → 00000, 00010, 00100, 00200, 00300, 00310, 10000, 11000, 12000, 13000
Next(0.0.3.2.2) = 1.0.0.0.0
Prev(2.0.0.0.0) = 1.3.0.0.0
In general, must find intersection of lists (still easy)
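A minimal sketch of the two operations over a sorted posting list, using binary search. It assumes Next and Prev are inclusive (they may return the argument itself), which matches the later probe example where next(1.0.0.0.0) finds 10000. The names `next_id`, `prev_id`, `next_in_all` and the second posting list `MILES` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Posting list for `low` from the slide, as tuples of single-digit components.
LOW = [tuple(int(c) for c in s) for s in
       "00000 00010 00100 00200 00300 00310 10000 11000 12000 13000".split()]


def next_id(posting, dewey):
    """Smallest ID in the sorted posting list that is >= dewey, or None."""
    i = bisect_left(posting, dewey)
    return posting[i] if i < len(posting) else None


def prev_id(posting, dewey):
    """Largest ID in the sorted posting list that is <= dewey, or None."""
    i = bisect_right(posting, dewey)
    return posting[i - 1] if i > 0 else None


def next_in_all(postings, dewey):
    """Smallest ID >= dewey that appears in every posting list, or None."""
    candidate = dewey
    while True:
        advanced = False
        for posting in postings:
            found = next_id(posting, candidate)
            if found is None:
                return None
            if found != candidate:
                candidate, advanced = found, True
        if not advanced:
            return candidate


print(next_id(LOW, (0, 0, 3, 2, 2)))  # (1, 0, 0, 0, 0)
print(prev_id(LOW, (2, 0, 0, 0, 0)))  # (1, 3, 0, 0, 0)

# Hypothetical second keyword list, to show a two-keyword query.
MILES = [(0, 0, 1, 0, 0), (1, 0, 0, 0, 0), (1, 2, 0, 0, 0)]
print(next_in_all([LOW, MILES], (0, 0, 0, 0, 0)))  # (0, 0, 1, 0, 0)
```

`next_in_all` repeatedly advances a candidate to the next ID in each list until all lists agree, which is one simple way to realize the intersection mentioned on the slide.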

One pass (for k = 2)
First finds 00000, 00010
Now knows Civic Green no longer helps!
Jumps by calling next(0.0.1.0.0)
Finds 00100
Removes 00010
Now knows Civic no longer helps!
Jumps by calling next(0.1.0.0.0)
Finds 01000
Removes 00100
Knows to stop
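The jump targets in this trace follow a simple pattern: once a branch no longer helps, add one to the last component of its prefix and zero out everything below it. The helper below only computes that target. It is not the one-pass algorithm itself, and the name `skip_past` is hypothetical.

```python
def skip_past(dewey, depth):
    """Smallest Dewey ID that comes after every ID sharing dewey[:depth].

    depth is the number of leading components that name the branch to skip,
    e.g. depth=3 for make.model.color.
    """
    if not 1 <= depth <= len(dewey):
        raise ValueError("depth must be between 1 and the ID length")
    prefix = list(dewey[:depth])
    prefix[-1] += 1  # move to the next sibling branch
    return tuple(prefix) + (0,) * (len(dewey) - depth)


# Civic Green (prefix 0.0.0) no longer helps after 00000 and 00010:
print(skip_past((0, 0, 0, 1, 0), 3))  # (0, 0, 1, 0, 0)
# Civic (prefix 0.0) no longer helps after 00100:
print(skip_past((0, 0, 1, 0, 0), 2))  # (0, 1, 0, 0, 0)
```

Passing the computed ID to Next skips every remaining item under the exhausted branch in a single call.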

Probe (for k = 4)
Calls next(0.0.0.0.0) and prev(∞.∞.∞.∞.∞) to find first and last items
Discovers there are only 2 top-level categories
Wants another Honda
Calls prev(0.∞.∞.∞.∞)
Why not next(0.1.0.0.0)?
If Honda has only one child, then it will return a Toyota!
Finds 00310
Wants another Toyota
Calls next(1.0.0.0.0)
Finds 10000
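The four probes in this walkthrough can be replayed against the `low` posting list. The sketch below hard-codes the probe sequence from the slides and does not implement the general Probe algorithm. `next_id` and `prev_id` are assumed helper names, and ∞ is modeled with `math.inf`, which compares correctly against integer components inside tuples.

```python
from bisect import bisect_left, bisect_right
from math import inf

LOW = [tuple(int(c) for c in s) for s in
       "00000 00010 00100 00200 00300 00310 10000 11000 12000 13000".split()]


def next_id(posting, dewey):
    """Smallest ID in the sorted posting list that is >= dewey, or None."""
    i = bisect_left(posting, dewey)
    return posting[i] if i < len(posting) else None


def prev_id(posting, dewey):
    """Largest ID in the sorted posting list that is <= dewey, or None."""
    i = bisect_right(posting, dewey)
    return posting[i - 1] if i > 0 else None


first = next_id(LOW, (0, 0, 0, 0, 0))             # first item overall
last = prev_id(LOW, (inf, inf, inf, inf, inf))    # last item overall
last_honda = prev_id(LOW, (0, inf, inf, inf, inf))
first_toyota = next_id(LOW, (1, 0, 0, 0, 0))

print(first, last)    # (0, 0, 0, 0, 0) (1, 3, 0, 0, 0)
print(last_honda)     # (0, 0, 3, 1, 0)
print(first_toyota)   # (1, 0, 0, 0, 0)

# The pitfall from the slide: every Honda here is model 0, so probing
# forward from 0.1.0.0.0 overshoots into the Toyotas.
print(next_id(LOW, (0, 1, 0, 0, 0)))  # (1, 0, 0, 0, 0)
```

The four probes yield 00000, 00310, 10000, and 13000: two Hondas and two Toyotas.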

Results
• Dataset consisted of listings from Yahoo! Autos
• Queries were synthetic to test various parameters
– Selectivity, # predicates, # results
• Preprocessing time for 100K listings < 5 min
– Times shown are for 5K queries
• 4 algorithms
– Basic: No diversity
– Naïve: Fetch everything, post-process
– OnePass: Our algorithm. Takes just one pass over data
– Probe: Our algorithm. May make multiple probes into data

Comparable time for diversity search
[Figure: two panels, unscored and scored]
Basic: No diversity
Naïve: Many times slower
OnePass: Close to Probe
Probe: Within factor 2 of no diversity
MultiQuery (not shown): Latency close to Basic, but throughput many times worse

Results summary
• Getting diverse results not too much slower than getting non-diverse results
– Many times faster than naïve approaches
• Multi-query approach has even worse throughput than naïve
– But keeps latency low
• How does this compare to getting extra results, then finding a diverse subset?
– Getting 2k results instead of k is about twice as slow
– Plus, does not guarantee diverse results

Conclusions
• Can get guaranteed diversity, taking time close to normal top-k query
– Almost as fast as, or faster than, approaches without guarantees
– Diversity at every level
• Works even when items have scores
• Needs a different algorithm than traditional IR engines
– Proved this in paper (under standard notions)
• Are there approximate notions that can use existing IR machinery?