Meeting di Cooperazione Internazionale Macerata 2 Aprile 2009 Dott. Giuseppe Bordoni
Conceptual structures in modern information retrieval Claudio Carpineto Fondazione Ugo Bordoni...
Conceptual structures in modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni, Roma
carpinet@fub.it
Overview
• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Vector-based IR
Documents → vectors of weighted keywords
Query → vector of weighted keywords
Matching → retrieved documents
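The matching step of the vector-based model can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the matching function and sparse dictionaries as vectors; the document names, weights, and the `retrieve` helper are made up for the example and do not come from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {keyword: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, docs):
    """Rank documents by decreasing similarity to the query, dropping zero scores."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)

# Hypothetical weighted-keyword vectors.
docs = {
    "d1": {"bank": 0.8, "finance": 0.6},
    "d2": {"bank": 0.5, "river": 0.9},
    "d3": {"lattice": 1.0},
}
query = {"finance": 1.0}
ranking = retrieve(query, docs)
print(ranking)
```

Only d1 shares a keyword with the query, so it is the only document retrieved; documents with no keyword in common with the query get a zero score regardless of their topic.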
Term weighting
• tf.idf and the vector space model (Salton) were very popular in the 70’s and 80’s
• BM25 (Robertson) was the state of the art in the 90’s
• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
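As an illustration of one of the weighting functions listed above, here is a minimal BM25 sketch. It assumes documents are given as token lists, uses the common defaults k1 = 1.2 and b = 0.75, and uses an idf variant with `1 +` inside the logarithm to keep weights non-negative; these choices and the toy collection are assumptions of the example, not taken from the slides.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document; docs is a list of token lists."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Length-normalised term-frequency saturation.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection.
docs = [
    ["bank", "finance", "finance"],
    ["bank", "river"],
    ["river", "water", "water", "water"],
]
print(bm25_scores(["finance"], docs))
print(bm25_scores(["bank"], docs))
```

For the query "bank", both matching documents contain the term once, so the shorter document scores higher: this is the effect of the length normalisation controlled by b.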
$$W = \mathrm{Inf}_1 \cdot \mathrm{Inf}_2$$
$$\mathrm{Inf}_1:\quad tf \cdot \log\frac{N + 1}{n + 0.5},\ \ldots$$
$$\mathrm{Inf}_2:\quad \frac{tf}{tf + 1},\ \ldots$$
$$tfn = tf \cdot \log\left(1 + K \cdot \frac{avg\_l}{l}\right)$$
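A minimal sketch of how the product W = Inf1 · Inf2 could be evaluated with the expressions shown on this slide. Several things here are assumptions: the two expressions are read as one instance of Inf1 and one of Inf2 respectively, the normalised frequency tfn is substituted for tf in both, logarithms are taken in base 2, and K defaults to 1. The slide expressions are reproduced literally; this is not presented as the authors' implementation.

```python
import math

def dfr_weight(tf, doc_len, avg_len, n_docs, doc_freq, k=1.0):
    """Weight of a term in a document as Inf1 * Inf2.

    tf       : raw term frequency in the document
    doc_len  : document length l
    avg_len  : average document length avg_l
    n_docs   : collection size N
    doc_freq : number of documents containing the term, n
    """
    # Length-normalised term frequency: tfn = tf * log(1 + K * avg_l / l)
    tfn = tf * math.log2(1 + k * avg_len / doc_len)
    # First factor, as written on the slide (with tfn in place of tf).
    inf1 = tfn * math.log2((n_docs + 1) / (doc_freq + 0.5))
    # Second factor, as written on the slide (with tfn in place of tf).
    inf2 = tfn / (tfn + 1)
    return inf1 * inf2

# Hypothetical statistics: same frequency, rare term versus common term.
rare = dfr_weight(tf=3, doc_len=100, avg_len=100, n_docs=1000, doc_freq=10)
common = dfr_weight(tf=3, doc_len=100, avg_len=100, n_docs=1000, doc_freq=500)
print(rare, common)
```

With equal term frequency and document length, the term that occurs in fewer documents receives the larger weight, and a term that does not occur in the document (tf = 0) receives weight 0.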
Inherent limitations of keyword-based IR
• Vocabulary problem
• Relations are ignored
Early approaches to conceptual IR
• n-grams (Salton 1975, Maarek 1989)
• parse tree (Dillon 1983, Metzler 1989)
• case relations (Fillmore 1968, Somers 1987)
• conceptual graphs (Dick 1991)
Why early conceptual IR was not successful
• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required both for the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Evolution of topical IR
• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions
Model of modern topical IR
Diagram components: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use
Query Expansion
Diagram components: Query, Weighted Query, Inverted File, Ranking, Form. Docs, +norm, Select top D docs, Compute σ(w), Select top E terms
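The query expansion loop named in the diagram (rank, select the top D documents, compute a score σ(w) for their terms, select the top E terms, reweight the query) can be sketched as below. The first-pass ranker, the Kullback-Leibler style score used for σ(w), the `beta` mixing parameter, and the toy collection are all assumptions chosen for illustration; the slide does not specify them.

```python
import math
from collections import Counter

def first_pass(query, docs):
    """Rank document indices by a weighted sum of query-term frequencies (placeholder ranker)."""
    scores = [sum(Counter(d)[t] * w for t, w in query.items()) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

def expand_query(query, docs, top_d=2, top_e=2, beta=0.5):
    """Return a weighted query enriched with the top_e best terms of the top_d documents."""
    ranked = first_pass(query, docs)
    feedback = Counter(t for i in ranked[:top_d] for t in docs[i])
    collection = Counter(t for d in docs for t in d)
    n_fb, n_col = sum(feedback.values()), sum(collection.values())

    def sigma(t):
        # Assumed score: contribution of t to the KL divergence between the
        # feedback-document distribution and the collection distribution.
        p_fb = feedback[t] / n_fb
        p_col = collection[t] / n_col
        return p_fb * math.log(p_fb / p_col)

    candidates = sorted((t for t in feedback if t not in query), key=sigma, reverse=True)
    expanded = dict(query)
    if not candidates or sigma(candidates[0]) <= 0:
        return expanded  # nothing informative to add
    top = sigma(candidates[0])
    for t in candidates[:top_e]:
        if sigma(t) > 0:
            # New terms get a weight normalised by the best score and damped by beta.
            expanded[t] = beta * sigma(t) / top
    return expanded

# Hypothetical toy collection.
docs = [
    ["bank", "finance", "credit", "loan"],
    ["finance", "credit", "loan", "loan"],
    ["bank", "river", "water"],
    ["river", "water", "fishing"],
]
expanded = expand_query({"finance": 1.0}, docs)
print(expanded)
```

Here the two top-ranked documents contribute "loan" and "credit", which are frequent in the feedback set relative to the whole collection, while "bank" is not added because it is no more frequent in the feedback set than in the collection.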
Performance of retrieval feedback versus query difficulty
Ranking based on interdocument similarity
Cluster hypothesis (van Rijsbergen 1978)
Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking
Concept lattice nodes (terms, with the document they index in parentheses), each preceded by its distance from the query node:
0  NNS FINANCE (Query)
1  NNS
1  FINANCE
1  NNS FINANCE BANK ACCOUNT (D1)
1  NNS FINANCE CREDIT KBS (D7)
2  NNS BANK
2  NNS BANK ACCOUNT (D3)
2  FINANCE CREDIT KBS (D4)
3  BANK
3  CREDIT KBS (D5)
3  NNS BANK RIVER (D2)
4  KBS
4  BANK KBS WATERS (D6)
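The numeric labels in this example can be reproduced with a small sketch that builds the lattice nodes as all intersections of the term sets, links each node to its immediate neighbours in the subset order, and measures the shortest path from the query node. One assumption is made loudly here: paths are not allowed to pass through the empty term set (the lattice top), which is what makes the computed distances agree with the slide. The function names are illustrative and this is not presented as the published algorithm.

```python
from collections import deque
from itertools import combinations

def closed_intents(term_sets):
    """All term sets obtainable by intersecting the given sets (the lattice intents)."""
    intents = {frozenset(s) for s in term_sets}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            c = a & b
            if c not in intents:
                intents.add(c)
                changed = True
    return intents

def lattice_distances(docs, query):
    """Shortest-path distance from the (non-empty) query node to each document node."""
    query = frozenset(query)
    nodes = closed_intents(list(docs.values()) + [query])
    nodes.discard(frozenset())  # assumption: the empty-intent top node is not traversed
    nodes = list(nodes)
    # Link a and b when a is a proper subset of b with no node strictly in between.
    neighbours = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b and not any(a < c < b for c in nodes):
                neighbours[a].add(b)
                neighbours[b].add(a)
    # Breadth-first search from the query node.
    dist = {query: 0}
    queue = deque([query])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return {name: dist.get(frozenset(terms)) for name, terms in docs.items()}

# Documents and query as they appear on the slide.
docs = {
    "D1": {"NNS", "FINANCE", "BANK", "ACCOUNT"},
    "D2": {"NNS", "BANK", "RIVER"},
    "D3": {"NNS", "BANK", "ACCOUNT"},
    "D4": {"FINANCE", "CREDIT", "KBS"},
    "D5": {"CREDIT", "KBS"},
    "D6": {"BANK", "KBS", "WATERS"},
    "D7": {"NNS", "FINANCE", "CREDIT", "KBS"},
}
distances = lattice_distances(docs, {"NNS", "FINANCE"})
print(sorted(distances.items(), key=lambda kv: kv[1]))
```

For this example the sketch places D1 and D7 at distance 1, D3 and D4 at 2, D2 and D5 at 3, and D6 at 4. Note that D3 and D5 share no term with the query, yet they are ranked ahead of D6 through the intermediate nodes.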
Performance of order-theoretical ranking
• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well but it is synergistic with best matching document ranking
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Question Answering
Task:
Closed-class questions in unrestricted domains with
no guarantee of answer and result possibly scattered
over multiple documents
Question Answering
Approach:
1. Recognize type of queries
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
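The four steps above can be sketched as a toy pipeline. Everything specific here is an assumption made for illustration: the question types and their regular expressions, the stopword list, keyword overlap as the retrieval function, and a fixed character window as the notion of "near". Real systems use much richer type recognisers and entity taggers.

```python
import re

# Hypothetical answer-type patterns (step 1 decides which one applies).
ANSWER_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}
STOPWORDS = {"when", "who", "how", "many", "why", "what", "where",
             "did", "was", "is", "on", "in", "the", "a", "of"}

def question_type(question):
    """Step 1: map the leading question words to an expected answer type."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("how many"):
        return "NUMBER"
    if q.startswith("who"):
        return "PERSON"
    return None

def keywords(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def answer(question, passages, window=40):
    qtype = question_type(question)
    q_words = keywords(question)
    # Step 2: rank passages by keyword overlap with the question.
    ranked = sorted(passages, key=lambda p: len(q_words & keywords(p)), reverse=True)
    if qtype is not None:
        # Step 3: keep the typed entity with the most question words nearby.
        best, best_score = None, 0
        for p in ranked:
            for m in ANSWER_PATTERNS[qtype].finditer(p):
                context = p[max(0, m.start() - window):m.end() + window]
                score = len(q_words & keywords(context))
                if score > best_score:
                    best, best_score = m.group(0), score
        if best is not None:
            return best
    # Step 4: fall back to the best-matching passage.
    return ranked[0] if ranked else None

passages = [
    "Apollo 11 landed on the Moon in 1969.",
    "The sky looks blue because air scatters sunlight.",
]
print(answer("When did Apollo 11 land on the Moon?", passages))
print(answer("Why is the sky blue?", passages))
```

The first question has a recognised type, so the sketch returns the matching entity found next to the question words; the second has no recognised type, so the whole best-matching passage is returned instead.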
Web Information Retrieval
Current tasks:
named-entity finding task
topic distillation task
Approach:
1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
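Combining the results of several methods can be sketched as score normalization followed by linear interpolation. Min-max normalization, the interpolation weights, and the two example runs (a content-based score and a link-based score) are assumptions of this example; the slide does not say which schemes were used.

```python
def min_max(scores):
    """Rescale a {doc: score} run to the [0, 1] interval."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(runs, weights):
    """Fuse several runs by a weighted sum of their normalised scores."""
    fused = {}
    for run, w in zip(runs, weights):
        for d, s in min_max(run).items():
            fused[d] = fused.get(d, 0.0) + w * s
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Hypothetical runs on very different score scales.
content = {"p1": 12.0, "p2": 8.0, "p3": 4.0}
links = {"p1": 0.1, "p2": 0.9, "p3": 0.5}
print(interpolate([content, links], [0.7, 0.3]))
print(interpolate([content, links], [0.5, 0.5]))
```

Normalization matters because the raw scores of different methods are not comparable. With weights 0.7 and 0.3 the content score dominates and p1 stays first; with equal weights p2 moves to the top.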
XML document retrieval
Goal:
Use document structure to improve precision and recall of unstructured queries
“concerts this weekend at Sofia under 20 euros”
Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Recommender systems
“Related keyword” feature
versus
Context-dependent query reformulation
Diagram components: Docs, Query, Document ranking, Term ranking 1, Term ranking 2, Term ranking 3, +
Combining text retrieval and text mining with concept lattices
Goal:
Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval and it is at the heart of many non-topical retrieval tasks
Towards conceptual search
• Understand term meaning
• Adapt to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization