WIDIT at TREC-2005 HARD Track
Kiduk Yang, Ning Yu, Hui Zhang, Ivan Record, Shahrier Akram
WIDIT Laboratory
School of Library & Information Science
Indiana University at Bloomington
Background
WIDIT Laboratory
• http://elvis.slis.indiana.edu/
• TREC research group: http://elvis.slis.indiana.edu/TREC/index.html
Text REtrieval Conference (TREC)
• http://trec.nist.gov/
• HARD track: http://ciir.cs.umass.edu/research/hard/guidelines.html
HARD Track: Overview
Test Collection
• AQUAINT corpus: English news (1M docs, 3 GB text)
  • AP newswire (1998-2000)
  • NY Times (1998-2000)
  • Xinhua News Agency (1996-2000)
• 50 "difficult" topics
  not too many relevant documents
  low scores in previous experiments
Task
• Baseline Run
  retrieve 1000 ranked documents per topic
• Clarification Forms (CF)
  create a user feedback form to be filled out by TREC assessors
• Final Run
  leverage CF data to improve the baseline result
Research Questions
Baseline Run
• How can an IR system handle difficult queries?
  Why are HARD topics difficult? (Harman & Buckley, 2004)
  • lack of good terms → add good terms
  • misdirection by non-pivotal terms or partial concepts → identify important terms & phrases
Clarification Form
• What information to get from the user?
  How can the user help with difficult queries?
  • identify good/important terms
  • identify relevant documents
Final Run
• How to apply CF data to improve search results?
  CF-term expanded query
  rank boosting
  relevance feedback
WIDIT Strategy
Baseline Run
• Automatic Query Expansion
  add related terms
  • synonym identification, definition term extraction, Web query expansion
  identify important query terms
  • noun phrase extraction, keyword extraction by overlapping sliding window
• Fusion
Clarification Form
• User Feedback
  identify relevant terms
  identify relevant documents
Final Run
• Manual Query Expansion
• Post-retrieval Reranking
• Relevance Feedback
• Fusion
WIDIT HARD System Architecture
[System architecture diagram: topics and documents feed the NLP, OSW, WebX, and indexing modules (drawing on WordNet and the Web) to produce synonyms, definitions, noun phrases, Web terms, OSW phrases, and an inverted index; the retrieval and fusion modules, with automatic tuning, yield the baseline result, which the re-ranking module combines with CF terms from the user to produce the post-CF and final results.]
QE: Overlapping Sliding Window (OSW)
Function
• identify important phrases
Assumption
• phrases appearing in multiple fields/sources tend to be important
Algorithm
1. Set the window size and the maximum number of words allowed between words.
2. Slide the window from left to right in a field/source. For each phrase it catches, look for the same/similar phrase in the other fields/sources.
3. Output the OSW phrase when a match is found.
4. Change the source field/source and repeat steps 1 to 3 until all fields/sources have been used.
Application
• Topic fields: title, description, narrative
• Definition sources: WordIQ, Google, Dictionary.com, Answers.com
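The OSW steps can be sketched as follows. This is a minimal illustration, not WIDIT's actual implementation: the window collects contiguous word n-grams from one field and keeps those that also occur in another field; the allowance for intervening words is omitted, and the function name is hypothetical.

```python
def osw_phrases(source, targets, window=2):
    """Overlapping-sliding-window sketch: emit word n-grams of length
    `window` from `source` that also appear in any of the `targets`."""
    words = source.lower().split()
    target_texts = [t.lower() for t in targets]
    phrases = []
    for i in range(len(words) - window + 1):
        phrase = " ".join(words[i:i + window])
        # keep the phrase only if another field/source contains it too
        if any(phrase in t for t in target_texts) and phrase not in phrases:
            phrases.append(phrase)
    return phrases

title = "international organized crime"
narrative = "Documents about international organized crime groups are relevant"
print(osw_phrases(title, [narrative]))
```

Repeating the call with each field in turn as the source, as in step 4, would accumulate the full OSW phrase set for a topic.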
QE: NLP1 & NLP2
NLP1
• Expand acronyms/abbreviations: uses a Web-harvested acronym/abbreviation list
• Identify nouns & noun phrases: uses the Brill tagger
• Find synonyms: queries WordNet
• Find definitions: queries the Web (WordIQ, Google, Dictionary.com, Answers.com)
NLP2
• Refine noun phrase identification: uses multiple taggers
• Identify the best synset based on term context: uses the sense disambiguation module by the NLP group at UM
• Identify important terms: uses OSW on topic fields & definitions
QE: Noun Phrase Identification
[Diagram: topics are POS-tagged with Brill's tagger, the Collins parser, and Minipar; agreement among the taggers (AND relation) yields proper noun phrases and noun phrases, which a WordNet lookup classifies into simple, dictionary, and complex phrases.]
QE: Web Query Expansion
Basic Idea
• Use the Web as a type of thesaurus to find related terms (Grunfeld et al., 2004; Kwok et al., 2005)
Method
1. Web Query Construction
   construct a Web query by selecting the 5 most salient terms from the HARD topic
   • uses NLP-based techniques and a rotating window to identify "salient" terms
2. Web Search
   query Google with the Web query
3. Result Parsing & Term Selection
   parse the top 100 search results (snippets & document texts)
   extract up to 60 "best" terms
   • uses the PIRCS algorithm to rank the terms (Grunfeld et al., 2004; Kwok et al., 2005)
[Diagram: processed topics → Web query generator → Web queries → Google → search results → parser → term selector → selected expansion terms]
QE: WebX by Rotating Window
Rationale
• NLP-based identification of salient/important terms does not always work
• terms related to salient/important query terms are likely to appear frequently in search results
Method
1. Rotate a 5-word window across the HARD topic description
   • generates m queries for a description of m terms (m > 5)
2. Query Google with each window query
3. Merge all the results
4. Rank the documents by their frequency in the m result lists
5. Select the 60 terms with the highest weight (length-normalized frequency) from the top 100 documents
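Steps 1 and 4 of the rotating-window method can be sketched as below. The wraparound (so that an m-word description yields exactly m queries) is an assumption consistent with the stated count, the Google step is left out, and all names are hypothetical.

```python
from collections import Counter

def rotating_window_queries(description, window=5):
    """Generate m queries by rotating a `window`-word window over an
    m-word description, wrapping around so each word starts one query."""
    words = description.split()
    m = len(words)
    queries = []
    for i in range(m):
        # wrap past the end of the description to keep every query `window` words
        rotated = [words[(i + j) % m] for j in range(window)]
        queries.append(" ".join(rotated))
    return queries

def rank_by_frequency(result_lists):
    """Rank documents by how many of the m result lists contain them."""
    counts = Counter(doc for results in result_lists for doc in set(results))
    return [doc for doc, _ in counts.most_common()]
```

Term selection (step 5) would then score terms from the top-ranked documents by length-normalized frequency.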
Fusion: Baseline Run
Fusion Pool
• Query Formulation results
  combinations of topic fields (title, description, narrative)
  stemming (simple plural stemmer, combo stemmer)
  term weights (Okapi, SMART)
• Query Expansion results
  NLP, OSW, WQX
Fusion Formula
• Result merging by weighted sum:
  FS_ws = Σ_i (w_i × NS_i)
  where w_i is the weight of system i (the relative contribution of each system) and
  NS_i is the normalized score of a document by system i:
  NS_i = (S_i − S_min) / (S_max − S_min)
Fusion Optimization
• Training data: 2004 Robust test collection
• Automatic Fusion Optimization by Category
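The weighted-sum merge and the min-max score normalization NS_i = (S_i − S_min) / (S_max − S_min) can be illustrated directly; the run scores and weights below are made up for illustration.

```python
def min_max_normalize(scores):
    """NS_i = (S_i - S_min) / (S_max - S_min) over one system's run."""
    s_min, s_max = min(scores.values()), max(scores.values())
    if s_max == s_min:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - s_min) / (s_max - s_min) for doc, s in scores.items()}

def weighted_sum_fusion(runs, weights):
    """FS_ws = sum_i w_i * NS_i, summed over the systems that retrieved the doc."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, ns in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * ns
    # highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # e.g. an Okapi-weighted run
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}    # e.g. a SMART-weighted run
print(weighted_sum_fusion([run_a, run_b], [0.6, 0.4]))
```

Normalization puts the systems' raw scores on a common [0, 1] scale before the weights decide each system's relative contribution.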
Fusion: Overview
Assumption
• Individual weakness: any single data source/method/system has weaknesses
• Complementary strengths: the whole is better than the sum of its parts
Strategy
• Combine a variety of diverse data sources/methods/systems
Questions
• What to combine?
• How to combine?
Fusion: Optimization
Conventional fusion optimization approaches
• Exhaustive parameter combination
  step-wise search of the whole solution space
  computationally demanding when the number of parameters is large
• Parameter combination based on past evidence
  targeted search of a restricted solution space
  i.e., parameter ranges are estimated from training data
Next-generation fusion optimization approaches
• Non-linear transformation of fusion component scores
  e.g., log transformation to compensate for the power-law distribution of PageRank
• Hybrid fusion optimization
  semi-automatic Dynamic Tuning (Yang & Yu, in press)
  Automatic Fusion Optimization by Category (Yang et al., in press)
Automatic Fusion Optimization
[Flowchart: result sets are fetched from the results pool for each category (Category 1: top 10 systems; Category 2: top system for each query length; …; Category n); automatic fusion optimization repeats while the performance gain exceeds a threshold, and otherwise outputs the optimized fusion formula.]
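The iterate-while-gain-exceeds-threshold loop in the flowchart might look roughly like the greedy sketch below. The coordinate-wise weight search, step size, and toy objective are all assumptions for illustration, not WIDIT's actual tuning procedure.

```python
def optimize_weights(evaluate, n_systems, step=0.1, threshold=1e-4):
    """Greedy fusion-tuning sketch: perturb one weight at a time and keep
    the change while the performance gain stays above `threshold`."""
    weights = [1.0 / n_systems] * n_systems  # start from a uniform combination
    best = evaluate(weights)
    gain = float("inf")
    while gain > threshold:
        gain = 0.0
        for i in range(n_systems):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] = max(0.0, trial[i] + delta)  # weights stay non-negative
                score = evaluate(trial)
                if score > best + threshold:
                    gain = max(gain, score - best)
                    weights, best = trial, score
    return weights, best

# toy objective standing in for MAP on training data, peaking at (0.7, 0.3)
toy = lambda w: -((w[0] - 0.7) ** 2 + (w[1] - 0.3) ** 2)
w, score = optimize_weights(toy, 2)
```

In the per-category variant, a loop like this would run once per category (e.g. per query length) over that category's top systems.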
Clarification Form
Objective
• Collect information from the user that can be used to improve the baseline retrieval result
Strategy
• Ask the user to identify and add relevant terms
  validation/filtering of system QE results
  • nouns, synonyms, OSW & NLP phrases
  manual QE terms that the system missed
  • free text box
• Ask the user to identify relevant documents
Problem
• HARD topics tend to retrieve non-relevant documents at top ranks
• 3-minute time limit for each CF
Solution
• cluster the top 200 results and select the best sentence from each cluster
• select the best n sentences from the top 200 results
• select the best sentence from every kth document
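The every-kth-document option can be sketched as below. The best-sentence scorer (query-term overlap) and both function names are hypothetical stand-ins for whatever sentence selection the actual form used.

```python
def best_sentence(doc, query_terms):
    """Pick the sentence with the most query-term overlap (toy scorer)."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: sum(t in s.lower() for t in query_terms))

def cf_sentences(ranked_docs, query_terms, k=3):
    """Fill the clarification form with the best sentence from every k-th
    document in the top-ranked results."""
    return [best_sentence(doc, query_terms) for doc in ranked_docs[::k]]
```

Sampling every kth document trades some precision at the top of the ranking for coverage deeper in the list, which matters when the top ranks are mostly non-relevant.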
Final Run
How to make use of CF data?
• Search with the CF-term expanded query
• Boost the ranks of documents
  that contain CF terms (phrases, Boolean-AND terms)
  that are CF-relevant documents
• Relevance Feedback
  apply the Rocchio RF algorithm using CF-relevant documents
OSW Phrases
• Post-retrieval rank boosting
  boost the ranks of documents containing OSW phrases
Fusion
• Rerun automatic fusion optimization
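The Rocchio update applied to CF-relevant documents has the standard form q' = α·q + β·centroid(relevant) − γ·centroid(non-relevant). A sketch over term-weight dictionaries, using textbook parameter values rather than WIDIT's actual settings:

```python
def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).
    Vectors are {term: weight} dicts; terms with non-positive weight are dropped."""
    terms = set(query_vec)
    for doc in list(relevant) + list(nonrelevant):
        terms |= set(doc)
    new_query = {}
    for t in terms:
        w = alpha * query_vec.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:
            new_query[t] = w
    return new_query
```

With CF data, the relevant set is the documents the assessor marked relevant on the form, so terms from those documents are pulled into the expanded query.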
Results: Evaluation Measures
• Mean Average Precision (MAP)
  Average Precision (AP) averaged over queries
  • AP = sum of precisions at the ranks where relevant items are retrieved / number of relevant items in the collection
  • single-valued measure that reflects performance over all relevant documents
  • rewards systems that retrieve relevant documents at high ranks
• Precision (P)
  • P = number of relevant items retrieved / total number of items retrieved
  • a measure of the system's ability to present only relevant items
• R-Precision
  • RP = precision at rank R, where R = number of relevant items in the collection
  • de-emphasizes the exact ranking of documents
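The measures above can be computed directly from a ranked list and the set of relevant documents; a straightforward sketch:

```python
def average_precision(ranked, relevant):
    """AP = sum of precision at each relevant retrieved doc / |relevant|."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def r_precision(ranked, relevant):
    """Precision at rank R, where R = number of relevant docs in the collection."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranked[:r]) / r

def mean_average_precision(runs):
    """MAP: AP averaged over queries; `runs` is a list of (ranked, relevant)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Note that unretrieved relevant documents contribute zero to AP, which is why the divisor is the collection-wide relevant count rather than the number retrieved.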
Results: Overall

Run                                                    MAP              R-Precision
Baseline Title Run (wdoqsz1d2)                         0.1694           0.2416
  combo stemmer, okapi weight, QE w/ noun, acronym & definition terms
Baseline Description Run (wdoqdn1d2)                   0.1698           0.2395
  combo stemmer, okapi weight, QE w/ noun & definition terms
Baseline Fusion Run (wdf1t3qf2)                        0.2324           0.2961
  optimized fusion run
Final Title Run (wf2t3qs1RODX)                         0.2513 (+48%)    0.3020 (+25%)
  rank-boosting of OSW phrases & CF documents
Final Description Run (wf1t3qd1ROD10)                  0.2062 (+21%)    0.2625 (+10%)
  rank-boosting of OSW phrases & CF documents
Final Fusion Run (wf1t10q1RODX)                        0.2918 (+25%)    0.3442 (+16%)
  optimized fusion run
References
Grunfeld, L., Kwok, K.L., Dinstl, N., & Deng, P. (2004). TREC 2003 Robust, HARD, and QA track experiments using PIRCS. Proceedings of the 12th Text REtrieval Conference, 510-521.
Harman, D., & Buckley, C. (2004). The NRRC Reliable Information Access (RIA) workshop. Proceedings of the 27th Annual International ACM SIGIR Conference, 528-529.
Kwok, K.L., Grunfeld, L., Sun, H.L., & Deng, P. (2005). TREC 2004 Robust track experiments using PIRCS. Proceedings of the 13th Text REtrieval Conference.
Yang, K., & Yu, N. (in press). WIDIT: Fusion-based approach to Web search optimization. Asian Information Retrieval Symposium 2005.
Yang, K., Yu, N., George, N., Loehrlen, A., MaCaulay, D., Zhang, H., Akram, S., Mei, J., & Record, I. (in press). WIDIT in TREC 2005 HARD, Robust, and SPAM tracks. Proceedings of the 14th Text REtrieval Conference.