WIDIT at TREC-2005 HARD Track
Kiduk Yang, Ning Yu, Hui Zhang, Ivan Record, Shahrier Akram
WIDIT Laboratory
School of Library & Information Science
Indiana University at Bloomington
Background
WIDIT Laboratory
• http://elvis.slis.indiana.edu/
• TREC research group: http://elvis.slis.indiana.edu/TREC/index.html
Text REtrieval Conference (TREC)
• http://trec.nist.gov/
• HARD track: http://ciir.cs.umass.edu/research/hard/guidelines.html
HARD Track: Overview
Test Collection
• AQUAINT corpus: English news (1M docs, 3 GB text)
  • AP newswire (1998-2000)
  • NY Times (1998-2000)
  • Xinhua News Agency (1996-2000)
• 50 "difficult" topics
  not too many relevant documents
  low scores in previous experiments
Task
• Baseline Run
  retrieve 1000 ranked documents per topic
• Clarification Forms (CF)
  create a user feedback form to be filled out by TREC assessors
• Final Run
  leverage CF data to improve the baseline result
Research Questions
Baseline Run
• How can an IR system handle difficult queries?
  Why are HARD topics difficult? (Harman & Buckley, 2004)
  • lack of good terms → add good terms
  • misdirection by non-pivotal terms or partial concepts → identify important terms & phrases
Clarification Form
• What information to get from the user?
  How can the user help with difficult queries?
  • identify good/important terms
  • identify relevant documents
Final Run
• How to apply CF data to improve search results?
  CF-term expanded query
  rank boosting
  relevance feedback
WIDIT Strategy
Baseline Run
• Automatic Query Expansion
  add related terms
  • synonym identification, definition term extraction, Web query expansion
  identify important query terms
  • noun phrase extraction, keyword extraction by overlapping sliding window
• Fusion
Clarification Form
• User Feedback
  identify relevant terms
  identify relevant documents
Final Run
• Manual Query Expansion
• Post-retrieval Reranking
• Relevance Feedback
• Fusion
WIDIT HARD System Architecture
[System architecture diagram: topics and documents feed the NLP, OSW, WebX, and indexing modules (drawing on WordNet and the Web) to produce synonyms, definitions, noun phrases, Web terms, OSW phrases, and an inverted index; the retrieval and fusion modules, with automatic tuning, yield the baseline result, which the re-ranking module combines with CF terms from the user to produce the post-CF and final results.]
QE: Overlapping Sliding Window (OSW)
Function
• identify important phrases
Assumption
• phrases appearing in multiple fields/sources tend to be important
Algorithm
1. Set the window size and the maximum number of words allowed between words.
2. Slide the window from left to right in a field/source. For each phrase it catches, look for the same/similar phrase in the other fields/sources.
3. Output the OSW phrase when a match is found.
4. Change the source field/source and repeat steps 1 to 3 until all fields/sources have been used.
Application
• Topic fields: title, description, narrative
• Definition sources: WordIQ, Google, Dictionary.com, Answers.com
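The OSW steps can be sketched as follows. This is a minimal illustration, not WIDIT's actual implementation: the window collects contiguous word n-grams from one field and keeps those that also occur in another field; the allowance for intervening words is omitted, and the function name is hypothetical.

```python
def osw_phrases(source, targets, window=2):
    """Overlapping-sliding-window sketch: emit word n-grams of length
    `window` from `source` that also appear in any of the `targets`."""
    words = source.lower().split()
    target_texts = [t.lower() for t in targets]
    phrases = []
    for i in range(len(words) - window + 1):
        phrase = " ".join(words[i:i + window])
        # keep the phrase only if another field/source contains it too
        if any(phrase in t for t in target_texts) and phrase not in phrases:
            phrases.append(phrase)
    return phrases

title = "international organized crime"
narrative = "Documents about international organized crime groups are relevant"
print(osw_phrases(title, [narrative]))
```

Repeating the call with each field in turn as the source, as in step 4, would accumulate the full OSW phrase set for a topic.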
QE: NLP1 & NLP2
NLP1
• Expand acronyms/abbreviations: uses a Web-harvested acronym/abbreviation list
• Identify nouns & noun phrases: uses the Brill tagger
• Find synonyms: queries WordNet
• Find definitions: queries the Web (WordIQ, Google, Dictionary.com, Answers.com)
NLP2
• Refine noun phrase identification: uses multiple taggers
• Identify the best synset based on term context: uses the sense disambiguation module by the NLP group at UM
• Identify important terms: uses OSW on topic fields & definitions
QE: Noun Phrase Identification
[Diagram: topics are POS-tagged with Brill's tagger, the Collins parser, and Minipar; agreement among the taggers (AND relation) yields proper noun phrases and noun phrases, which a WordNet lookup classifies into simple, dictionary, and complex phrases.]
QE: Web Query Expansion
Basic Idea
• Use the Web as a type of thesaurus to find related terms (Grunfeld et al., 2004; Kwok et al., 2005)
Method
1. Web Query Construction
   construct a Web query by selecting the 5 most salient terms from the HARD topic
   • uses NLP-based techniques and a rotating window to identify "salient" terms
2. Web Search
   query Google with the Web query
3. Result Parsing & Term Selection
   parse the top 100 search results (snippets & document texts)
   extract up to 60 "best" terms
   • uses the PIRCS algorithm to rank the terms (Grunfeld et al., 2004; Kwok et al., 2005)
[Diagram: processed topics → Web query generator → Web queries → Google → search results → parser → term selector → selected expansion terms]
QE: WebX by Rotating Window
Rationale
• NLP-based identification of salient/important terms does not always work
• terms related to salient/important query terms are likely to appear frequently in search results
Method
1. Rotate a 5-word window across the HARD topic description
   • generates m queries for a description of m terms (m > 5)
2. Query Google with each window query
3. Merge all the results
4. Rank the documents by their frequency in the m result lists
5. Select the 60 terms with the highest weight (length-normalized frequency) from the top 100 documents
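Steps 1 and 4 of the rotating-window method can be sketched as below. The wraparound (so that an m-word description yields exactly m queries) is an assumption consistent with the stated count, the Google step is left out, and all names are hypothetical.

```python
from collections import Counter

def rotating_window_queries(description, window=5):
    """Generate m queries by rotating a `window`-word window over an
    m-word description, wrapping around so each word starts one query."""
    words = description.split()
    m = len(words)
    queries = []
    for i in range(m):
        # wrap past the end of the description to keep every query `window` words
        rotated = [words[(i + j) % m] for j in range(window)]
        queries.append(" ".join(rotated))
    return queries

def rank_by_frequency(result_lists):
    """Rank documents by how many of the m result lists contain them."""
    counts = Counter(doc for results in result_lists for doc in set(results))
    return [doc for doc, _ in counts.most_common()]
```

Term selection (step 5) would then score terms from the top-ranked documents by length-normalized frequency.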
Fusion: Baseline Run
Fusion Pool
• Query Formulation results
  combinations of topic fields (title, description, narrative)
  stemming (simple plural stemmer, combo stemmer)
  term weights (Okapi, SMART)
• Query Expansion results
  NLP, OSW, WQX
Fusion Formula
• Result merging by weighted sum:
  FS_ws = Σ_i (w_i × NS_i)
  where w_i is the weight of system i (the relative contribution of each system) and
  NS_i is the normalized score of a document by system i:
  NS_i = (S_i − S_min) / (S_max − S_min)
Fusion Optimization
• Training data: 2004 Robust test collection
• Automatic Fusion Optimization by Category
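The weighted-sum merge and the min-max score normalization NS_i = (S_i − S_min) / (S_max − S_min) can be illustrated directly; the run scores and weights below are made up for illustration.

```python
def min_max_normalize(scores):
    """NS_i = (S_i - S_min) / (S_max - S_min) over one system's run."""
    s_min, s_max = min(scores.values()), max(scores.values())
    if s_max == s_min:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - s_min) / (s_max - s_min) for doc, s in scores.items()}

def weighted_sum_fusion(runs, weights):
    """FS_ws = sum_i w_i * NS_i, summed over the systems that retrieved the doc."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, ns in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * ns
    # highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # e.g. an Okapi-weighted run
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}    # e.g. a SMART-weighted run
print(weighted_sum_fusion([run_a, run_b], [0.6, 0.4]))
```

Normalization puts the systems' raw scores on a common [0, 1] scale before the weights decide each system's relative contribution.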
Fusion: Overview
Assumption
• Individual weakness: any single data source/method/system has weaknesses
• Complementary strengths: the whole is better than the sum of its parts
Strategy
• Combine a variety of diverse data sources/methods/systems
Questions
• What to combine?
• How to combine?
Fusion: Optimization
Conventional fusion optimization approaches
• Exhaustive parameter combination
  step-wise search of the whole solution space
  computationally demanding when the number of parameters is large
• Parameter combination based on past evidence
  targeted search of a restricted solution space
  i.e., parameter ranges are estimated from training data
Next-generation fusion optimization approaches
• Non-linear transformation of fusion component scores
  e.g., log transformation to compensate for the power-law distribution of PageRank
• Hybrid fusion optimization
  semi-automatic Dynamic Tuning (Yang & Yu, in press)
  Automatic Fusion Optimization by Category (Yang et al., in press)
Automatic Fusion Optimization
[Flowchart: result sets are fetched from the results pool for each category (Category 1: top 10 systems; Category 2: top system for each query length; …; Category n); automatic fusion optimization repeats while the performance gain exceeds a threshold, and otherwise outputs the optimized fusion formula.]
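The iterate-while-gain-exceeds-threshold loop in the flowchart might look roughly like the greedy sketch below. The coordinate-wise weight search, step size, and toy objective are all assumptions for illustration, not WIDIT's actual tuning procedure.

```python
def optimize_weights(evaluate, n_systems, step=0.1, threshold=1e-4):
    """Greedy fusion-tuning sketch: perturb one weight at a time and keep
    the change while the performance gain stays above `threshold`."""
    weights = [1.0 / n_systems] * n_systems  # start from a uniform combination
    best = evaluate(weights)
    gain = float("inf")
    while gain > threshold:
        gain = 0.0
        for i in range(n_systems):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] = max(0.0, trial[i] + delta)  # weights stay non-negative
                score = evaluate(trial)
                if score > best + threshold:
                    gain = max(gain, score - best)
                    weights, best = trial, score
    return weights, best

# toy objective standing in for MAP on training data, peaking at (0.7, 0.3)
toy = lambda w: -((w[0] - 0.7) ** 2 + (w[1] - 0.3) ** 2)
w, score = optimize_weights(toy, 2)
```

In the per-category variant, a loop like this would run once per category (e.g. per query length) over that category's top systems.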
Clarification Form
Objective
• Collect information from the user that can be used to improve the baseline retrieval result
Strategy
• Ask the user to identify and add relevant terms
  validation/filtering of system QE results
  • nouns, synonyms, OSW & NLP phrases
  manual QE terms that the system missed
  • free text box
• Ask the user to identify relevant documents
Problem
• HARD topics tend to retrieve non-relevant documents at top ranks
• 3-minute time limit for each CF
Solution
• cluster the top 200 results and select the best sentence from each cluster
• select the best n sentences from the top 200 results
• select the best sentence from every kth document
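The every-kth-document option can be sketched as below. The best-sentence scorer (query-term overlap) and both function names are hypothetical stand-ins for whatever sentence selection the actual form used.

```python
def best_sentence(doc, query_terms):
    """Pick the sentence with the most query-term overlap (toy scorer)."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: sum(t in s.lower() for t in query_terms))

def cf_sentences(ranked_docs, query_terms, k=3):
    """Fill the clarification form with the best sentence from every k-th
    document in the top-ranked results."""
    return [best_sentence(doc, query_terms) for doc in ranked_docs[::k]]
```

Sampling every kth document trades some precision at the top of the ranking for coverage deeper in the list, which matters when the top ranks are mostly non-relevant.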
Final Run
How to make use of CF data?
• Search with the CF-term expanded query
• Boost the ranks of documents
  that contain CF terms (phrases, Boolean-AND terms)
  that are CF-relevant documents
• Relevance Feedback
  apply the Rocchio RF algorithm using CF-relevant documents
OSW Phrases
• Post-retrieval rank boosting
  boost the ranks of documents containing OSW phrases
Fusion
• Rerun automatic fusion optimization
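The Rocchio update applied to CF-relevant documents has the standard form q' = α·q + β·centroid(relevant) − γ·centroid(non-relevant). A sketch over term-weight dictionaries, using textbook parameter values rather than WIDIT's actual settings:

```python
def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).
    Vectors are {term: weight} dicts; terms with non-positive weight are dropped."""
    terms = set(query_vec)
    for doc in list(relevant) + list(nonrelevant):
        terms |= set(doc)
    new_query = {}
    for t in terms:
        w = alpha * query_vec.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:
            new_query[t] = w
    return new_query
```

With CF data, the relevant set is the documents the assessor marked relevant on the form, so terms from those documents are pulled into the expanded query.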
Results: Evaluation Measures
• Mean Average Precision (MAP)
  Average Precision (AP) averaged over queries
  • AP = sum of precisions at the ranks where relevant items are retrieved / number of relevant items in the collection
  • single-valued measure that reflects performance over all relevant documents
  • rewards systems that retrieve relevant documents at high ranks
• Precision (P)
  • P = number of relevant items retrieved / total number of items retrieved
  • a measure of the system's ability to present only relevant items
• R-Precision
  • RP = precision at rank R, where R = number of relevant items in the collection
  • de-emphasizes the exact ranking of documents
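The measures above can be computed directly from a ranked list and the set of relevant documents; a straightforward sketch:

```python
def average_precision(ranked, relevant):
    """AP = sum of precision at each relevant retrieved doc / |relevant|."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def r_precision(ranked, relevant):
    """Precision at rank R, where R = number of relevant docs in the collection."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranked[:r]) / r

def mean_average_precision(runs):
    """MAP: AP averaged over queries; `runs` is a list of (ranked, relevant)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Note that unretrieved relevant documents contribute zero to AP, which is why the divisor is the collection-wide relevant count rather than the number retrieved.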
Results: Overall

Run                                                    MAP              R-Precision
Baseline Title Run (wdoqsz1d2)                         0.1694           0.2416
  combo stemmer, okapi weight, QE w/ noun, acronym & definition terms
Baseline Description Run (wdoqdn1d2)                   0.1698           0.2395
  combo stemmer, okapi weight, QE w/ noun & definition terms
Baseline Fusion Run (wdf1t3qf2)                        0.2324           0.2961
  optimized fusion run
Final Title Run (wf2t3qs1RODX)                         0.2513 (+48%)    0.3020 (+25%)
  rank-boosting of OSW phrases & CF documents
Final Description Run (wf1t3qd1ROD10)                  0.2062 (+21%)    0.2625 (+10%)
  rank-boosting of OSW phrases & CF documents
Final Fusion Run (wf1t10q1RODX)                        0.2918 (+25%)    0.3442 (+16%)
  optimized fusion run
References
Grunfeld, L., Kwok, K.L., Dinstl, N., & Deng, P. (2004). TREC 2003 Robust, HARD, and QA track experiments using PIRCS. Proceedings of the 12th Text REtrieval Conference, 510-521.
Harman, D., & Buckley, C. (2004). The NRRC Reliable Information Access (RIA) workshop. Proceedings of the 27th Annual International ACM SIGIR Conference, 528-529.
Kwok, K.L., Grunfeld, L., Sun, H.L., & Deng, P. (2005). TREC 2004 Robust track experiments using PIRCS. Proceedings of the 13th Text REtrieval Conference.
Yang, K., & Yu, N. (in press). WIDIT: Fusion-based approach to Web search optimization. Asian Information Retrieval Symposium 2005.
Yang, K., Yu, N., George, N., Loehrlen, A., MaCaulay, D., Zhang, H., Akram, S., Mei, J., & Record, I. (in press). WIDIT in TREC 2005 HARD, Robust, and SPAM tracks. Proceedings of the 14th Text REtrieval Conference.