Information Retrieval in Department 1
![Page 1: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/1.jpg)
Information Retrieval in Department 1
Holger Bast
Max-Planck-Institut für Informatik (MPII)
Saarbrücken, Germany
Visit of the Scientific Advisory Board
Saarbrücken, June 2nd – 3rd, 2005
![Page 2: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/2.jpg)
How it got started …
I shifted from formerly very theoretical work …
… to information retrieval topics
Over time a number of PhD/Master/Bachelor students joined in …
Josiane Parreira
Thomas Warken
Ingmar Weber
Christian Mortensen
Debapriyo Majumdar
… and a lot of interaction with Gerhard Weikum's group
Christian Klein
Benedikt Grundmann
Regis Newo
Daniel Fischer
![Page 3: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/3.jpg)
What we are doing …
Motivation
– even basic retrieval tasks are still far from being solved satisfactorily, e.g. searching my email
Two main research areas in the past 2 years
– Concept-based retrieval
– Searching with Autocompletion
This presentation
– main idea behind these areas
– lots of demos and examples
– highlight two results
![Page 4: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/4.jpg)
Concept-Based Retrieval
A query and a document, each expressed in terms:

            documents           query
internet    0  2  0  1  0  0    1
web         2  1  0  0  0  0    0
surfing     1  1  0  1  1  1    0
beach       0  0  1  1  1  1    0
hawaii      0  0  2  2  2  1    0

Example document:
Hawaii, 2nd June 2004
Dear Pen Pal,
I am writing to you from Hawaii. They have got internet access right on the beach here, isn’t that great? I’ll go surfing now!
your friend, CB

Equally dissimilar to query!
![Page 5: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/5.jpg)
Concept-Based Retrieval
A query and a document, each expressed in terms:

            documents           query
internet    0  2  0  1  0  0    1
web         2  1  0  0  0  0    0
surfing     1  1  0  1  1  1    0
beach       0  0  1  1  1  1    0
hawaii      0  0  2  2  2  1    0

The query and the documents, each expressed in concepts:

            documents            query
WWW         1  1  0  .5  0  0    1
Hawaii      0  0  1  .5  1  1    0
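The effect of switching from terms to concepts can be checked numerically. The sketch below compares the query with two of the documents, once in term space and once in concept space. The choice of cosine similarity and of the first and third columns is an assumption made for illustration; the slides do not name a similarity measure or say which documents are meant.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Term space, term order: internet, web, surfing, beach, hawaii.
query_terms = [1, 0, 0, 0, 0]
doc1_terms = [0, 2, 1, 0, 0]   # first column of the term-document matrix
doc3_terms = [0, 0, 0, 1, 2]   # third column of the term-document matrix

# Concept space, concept order: WWW, Hawaii (values copied from the slide).
query_concepts = [1, 0]
doc1_concepts = [1, 0]
doc3_concepts = [0, 1]

term_sims = (cosine(query_terms, doc1_terms), cosine(query_terms, doc3_terms))
concept_sims = (cosine(query_concepts, doc1_concepts),
                cosine(query_concepts, doc3_concepts))

print("term space:   ", term_sims)     # both documents look equally dissimilar
print("concept space:", concept_sims)  # only the WWW document matches the query
```

In term space both similarities are zero because neither document contains the word "internet"; in concept space the document about the web is separated from the one about Hawaii.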
![Page 6: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/6.jpg)
Concept-Based Retrieval
Documents expressed in terms:

internet    0  2  0  1  0  0
web         2  1  0  0  0  0
surfing     1  1  0  1  1  1
beach       0  0  1  1  1  1
hawaii      0  0  2  2  2  1

Concepts expressed in terms:

            WWW  Hawaii
internet    2    0
web         2    0
surfing     1    1
beach       0    1
hawaii      0    2

Documents expressed in concepts:

WWW         1  1  0  .5  0  0
Hawaii      0  0  1  .5  1  1
![Page 7: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/7.jpg)
Concept-Based Retrieval
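Documents expressed in terms:

internet    0  2  0  1  0  0
web         2  1  0  0  0  0
surfing     1  1  0  1  1  1
beach       0  0  1  1  1  1
hawaii      0  0  2  2  2  1

Concepts expressed in terms:

            WWW  Hawaii
internet    2    0
web         2    0
surfing     1    1
beach       0    1
hawaii      0    2

● (matrix multiplication)

Documents expressed in concepts:

WWW         1  1  0  .5  0  0
Hawaii      0  0  1  .5  1  1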
![Page 8: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/8.jpg)
Concept-Based Retrieval
The approximation actually adds to the precision
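Concepts expressed in terms:

            WWW  Hawaii
internet    2    0
web         2    0
surfing     1    1
beach       0    1
hawaii      0    2

● (matrix multiplication)

Documents expressed in concepts:

WWW         1  1  0  .5  0  0
Hawaii      0  0  1  .5  1  1

Result of the matrix multiplication:

internet    2  2  0  1   0  0
web         2  2  0  1   0  0
surfing     1  1  1  1   1  1
beach       0  0  1  .5  1  1
hawaii      0  0  2  1   2  2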
Finding concepts = approximate low-rank matrix decomposition
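The product shown on this slide can be reproduced in a few lines. The sketch below multiplies the term-concept matrix by the concept-document matrix and lists the entries where the rank-2 product differs from the original term-document matrix. The variable names and the helper `matmul` are illustrative only; the numbers are copied from the slides.

```python
TERMS = ["internet", "web", "surfing", "beach", "hawaii"]

# Concepts expressed in terms (5 terms x 2 concepts: WWW, Hawaii).
term_concept = [[2, 0], [2, 0], [1, 1], [0, 1], [0, 2]]

# Documents expressed in concepts (2 concepts x 6 documents).
concept_doc = [[1, 1, 0, 0.5, 0, 0],
               [0, 0, 1, 0.5, 1, 1]]

# Original term-document matrix (5 terms x 6 documents).
original = [[0, 2, 0, 1, 0, 0],
            [2, 1, 0, 0, 0, 0],
            [1, 1, 0, 1, 1, 1],
            [0, 0, 1, 1, 1, 1],
            [0, 0, 2, 2, 2, 1]]

def matmul(a, b):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

approx = matmul(term_concept, concept_doc)

# Entries where the low-rank product differs from the original counts.
changed = [(TERMS[i], j + 1, original[i][j], approx[i][j])
           for i in range(len(original))
           for j in range(len(original[0]))
           if original[i][j] != approx[i][j]]

for term, doc, before, after in changed:
    print(f"{term:9s} document {doc}: {before} -> {after}")
```

For example, the first document never mentions "internet" in the original matrix, but receives weight 2 for it in the product, because the document is about the WWW concept.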
![Page 9: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/9.jpg)
A Concrete Example
676 abstracts from the Max-Planck-Institute
– for example:
We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.
– 3283 words (words like and, or, this, … removed)
– abstracts come from 5 departments: Algorithms, Logic, Graphics, CompBio, Databases
– reduce to 10 concepts
![Page 10: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/10.jpg)
How many concepts?
Implicitly, the matrix decomposition assigns a relatedness score to each pair of terms
[Figure: three plots of relatedness (vertical axis) against number of concepts (horizontal axis, 0 to 600), one per term pair: voronoi / diagram, logic / logics, logic / voronoi]
→ every fixed number of concepts is wrong!
Bast/Majumdar, SIGIR 2005
![Page 11: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/11.jpg)
How many concepts?
Implicitly, the matrix decomposition assigns a relatedness score to each pair of terms
[Figure: three plots of relatedness (vertical axis) against number of concepts (horizontal axis, 0 to 600), one per term pair: voronoi / diagram, logic / logics, logic / voronoi]
we instead assess the shape of the curves!
Bast/Majumdar, SIGIR 2005
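A relatedness curve of this kind can be sketched on the toy term-document matrix from the earlier slides. The code below assumes that the decomposition is a truncated singular value decomposition and that the relatedness of terms i and j with k concepts is the (i, j) entry of U_k U_k^T, where U_k holds the top k left singular vectors. Both are assumptions made for illustration and not necessarily the exact procedure of the cited paper; the function name `relatedness_curve` is hypothetical.

```python
import numpy as np

TERMS = ["internet", "web", "surfing", "beach", "hawaii"]

# Toy term-document matrix from the concept-based retrieval slides.
A = np.array([[0, 2, 0, 1, 0, 0],
              [2, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 2, 2, 2, 1]], dtype=float)

def relatedness_curve(matrix, i, j):
    """Relatedness of terms i and j for k = 1, 2, ... concepts.

    Assumption: relatedness with k concepts is entry (i, j) of U_k U_k^T,
    with U_k the first k left singular vectors of the matrix.
    """
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return [float(u[i, :k] @ u[j, :k]) for k in range(1, u.shape[1] + 1)]

pairs = [("internet", "web"), ("beach", "hawaii"), ("web", "hawaii")]
for a, b in pairs:
    curve = relatedness_curve(A, TERMS.index(a), TERMS.index(b))
    print(f"{a} / {b}:", [round(x, 3) for x in curve])
```

Each printed list is one curve: the score of a term pair as the number of concepts grows. With all five concepts every pair of distinct terms scores zero, because the full U is orthogonal, so the information lies in how the curve gets there rather than in its value at any single k.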
![Page 12: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/12.jpg)
Searching with Autocompletion
best understood by example
and you can try it yourself via the new MPII webpages
An interactive search technology
– suggests completions of the word that is currently being typed
– along with that, hits are displayed (for the yet to be completed query)
![Page 13: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/13.jpg)
Useful in many ways
Learn about formulations used in the collection
– e.g. "guestbook"
Minimum of information required
– e.g. people's names
Gives stemming functionality (without stemmer)
– e.g. "raghavans", "raghavan3", …
Gives error-correction functionality (without error-correction)
– e.g. "raghvan", "ragavan", …
Database-like queries
– e.g. publications by Kurt Mehlhorn
all this with a single functionality
no dictionary, no training, readily applicable to any collection
![Page 14: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/14.jpg)
The core algorithmic problem
Given
– a set of documents D (the hits of the preceding part of the query)
– a range of words W (all completions of the last word the user has started typing)
Compute
– the subset of documents D' ⊆ D that contain at least one word from W
– the subset of words W' ⊆ W that occur in at least one document of D
– typically |W'| << |W|
D = 17, 23, 48, 116, …
raga 11, 47, 97, 134, …
ragade 15, 77, 214, …
ragan 58, 917, …
ragchi 6, 107, 514, …
ragavan 23, 118, …
rage 211
raged 6, 111, 517, …
ragen 37, 919, …
ragged 14, 77, 112, 245, …
raggett 17, 51, 116, …
raggio 7, 22, 50, 714, …
raghavan 23, 57, 116, …
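The problem statement can be made concrete with a straightforward baseline on an ordinary inverted index. The sketch below is not the Bast/Mortensen/Weber algorithm; it simply visits the list of every word in the range W. The function name `complete` is hypothetical, and the index holds only the document ids visible on the slide (ids hidden behind "…" are left out).

```python
from bisect import bisect_left

# Toy inverted index: word -> sorted document ids (visible ids only).
INDEX = {
    "raga": [11, 47, 97, 134],
    "ragade": [15, 77, 214],
    "ragan": [58, 917],
    "ragchi": [6, 107, 514],
    "ragavan": [23, 118],
    "rage": [211],
    "raged": [6, 111, 517],
    "ragen": [37, 919],
    "ragged": [14, 77, 112, 245],
    "raggett": [17, 51, 116],
    "raggio": [7, 22, 50, 714],
    "raghavan": [23, 57, 116],
}
VOCAB = sorted(INDEX)        # completions of a prefix form a contiguous range
D = {17, 23, 48, 116}        # hits of the preceding part of the query

def complete(index, vocab, hits, prefix):
    """Return (D', W') for the word range W of all completions of `prefix`."""
    lo = bisect_left(vocab, prefix)
    hi = bisect_left(vocab, prefix + "\uffff")  # just past the last completion
    doc_subset, word_subset = set(), []
    for word in vocab[lo:hi]:                   # one list visited per word in W
        matching = hits.intersection(index[word])
        if matching:
            word_subset.append(word)
            doc_subset |= matching
    return sorted(doc_subset), word_subset

print(complete(INDEX, VOCAB, D, "rag"))
print(complete(INDEX, VOCAB, D, "ragh"))
```

For the prefix "rag" the loop inspects all twelve words, although only three of them occur in a document of D. That gap between |W| and |W'| is what a faster method would need to avoid.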
![Page 15: Information Retrieval in Department 1](https://reader034.fdocuments.net/reader034/viewer/2022051417/568148f7550346895db618df/html5/thumbnails/15.jpg)
The core algorithmic problem
Ordinary inverted index: ~|W| time per query
Bast/Mortensen/Weber: ~|W'| time per query