Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA.

Date posted: 15-Jan-2016

Transcript of Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA.

Page 1: Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA.

Affinity Rank
Yi Liu, Benyu Zhang, Zheng Chen

MSRA

Page 2:

Outline

Motivation
Related Work
Model & Algorithm
Evaluation
Conclusion & Future Work

Page 3:

Search for Useful Information

Full-text search

Importance Judgment

Manual compilation

Failure Still Exists

Page 4:

Example – “Spielberg”

Search

Page 5:

Example – “Spielberg” Search (Cont.)

Page 6:

Motivation

Existing problem in IR applications:
  Similar search results dominate the top one or two pages
  Users tire of similar results on the same topic
  Users cannot find what they need among those similar results

Situations where the problem is or will be intensified:
  Highly repetitive corpora, e.g.
    Newsgroups
    News archives
    Specialized websites
  Generalized or short queries

Page 7:

Diversity & Informativeness

Diversity: the coverage of different topics by a group of documents

Informativeness: the extent to which a document can represent its topic locality
(high informativeness: inclusive)

Page 8:

Why?

Traditional IR evaluation measures:
  Maximize relevance between query & results
  Surface the most important results

To end-users: relevant + important ≠ desirable

A way out:
  Increase diversity in the top results
  Increase the informativeness of each single result

Page 9:

Basic Idea

Build a similarity-based link map
Link analysis → Affinity Rank, indicating the informativeness of each document
Rank adjustment: only the most informative document of each topic can rank high
Re-rank with Affinity Rank → more diversified, more informative top results

Page 10:

Related Work – Link Analysis

Explicit (the web author's perspective; subjective):
  PageRank (Page et al., 1998)
  HITS (Kleinberg, 1998)

Implicit (the end-user's perspective; objective):
  DirectHit (http://www.directhit.com)
  Small Web Search (Xue et al., 2003)

Page 11:

Related Work – Clustering

Algorithm         Complexity   Naming
Scatter/Gather*   O(kn)        Centroid + ranked words
TopCat            High         Set of named entities
WBSC*             O(m^2 + n)   Ranked words
STC*              O(n)         Sets of N-grams
IF                O(kn)        -
PRSA              O(knm)       Ranked words
Bipartite         O(nm)?       Ranked words

(n: #docs, k: #clusters, m: #words; * applied to clustering search results)

Page 12:

Our Proposed IR Framework

[Framework diagram: a query-independent path builds an affinity graph over the document collection, computes informativeness, and applies a diversity penalty to produce the Affinity Rank; a query-dependent path computes relevance for the query; re-ranking combines the two into the output.]

Page 13:

Link Construction

Similarity → directed link → directed graph

Threshold:
  Saves storage space
  Reduces the noise brought by the overwhelmingly large number of weak-similarity links

sim(A, B) = cos(A, B)

aff(A → B) = sim(A, B)   (a directed link A → B is added when sim(A, B) exceeds the threshold)
aff(B → A) = sim(A, B)   (likewise for the reverse link B → A)
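The link-construction step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold value and the raw tf-idf-style input vectors are assumptions for the example.

```python
import numpy as np

def build_affinity_graph(doc_vectors, threshold=0.5):
    """Directed affinity graph: aff(A -> B) = cos(A, B) if it exceeds
    the threshold, else 0 (threshold 0.5 is an illustrative choice)."""
    X = np.asarray(doc_vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # guard against zero vectors
    Xn = X / norms
    sim = Xn @ Xn.T                              # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)                   # no self-links
    return np.where(sim >= threshold, sim, 0.0)  # prune weak links

# Toy example: docs 0 and 1 are near-duplicates, doc 2 is unrelated.
M = build_affinity_graph([[1, 1, 0], [1, 0.9, 0.1], [0, 0, 1]])
```

Because cosine similarity is symmetric, each surviving pair yields links in both directions, matching the aff(A → B) / aff(B → A) definitions above.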

Page 14:

Assumptions

Observation: relations among documents vary
  Some documents are similar, others are not
  The degree of similarity varies

The more relatives a document has, the more informative it is
The more informative a document's relatives are, the more informative it is

Page 15:

Link Analysis

Link map → adjacency matrix → row-normalized matrix M̃

Based on the two assumptions:

  AR_i = Σ_{all j} M̃_{ji} · AR_j

With a damping term added to guarantee convergence:

  AR_i = c · Σ_{all j} M̃_{ji} · AR_j + (1 − c)/n

or, in matrix form, AR = (c · M̃ᵀ + ((1 − c)/n) · e · eᵀ) · AR, where e is the all-ones vector.

Principal eigenvector → rank score
Implementation: the power method

Page 16:

"Random Transform" Model

A transforming document jumps from doc. to doc. at each time step:
  with probability c, to a "relative" doc. (in proportion to affinity with the current doc.)
  with probability 1 − c, to a randomly picked doc.

Markov chain → stationary transition probabilities → principal eigenvector → informativeness

Page 17:

Rank Adjustment

Greedy-like algorithm: decrease the score of j by the part conveyed from i (the most informative document in the same topic):

  AR_j ← AR_j − M̃_{ij} · AR_i

[Diagram: two topic clusters, documents T1-1 … T1-6 and T2-1 … T2-3]
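The greedy adjustment can be sketched as follows. Whether the penalty uses the original or the already-adjusted score of the picked document i is not stated on the slide; this sketch uses the adjusted score, which is an assumption.

```python
import numpy as np

def diversity_penalty_rank(M_tilde, ar):
    """Greedy rank adjustment: repeatedly pick the document with the
    highest remaining score, then decrease every unpicked document j by
    the part conveyed from the pick i: AR_j <- AR_j - M~[i, j] * AR_i.
    Returns the documents in their adjusted order."""
    scores = np.array(ar, dtype=float)
    picked, remaining = [], set(range(len(scores)))
    while remaining:
        i = max(remaining, key=lambda d: scores[d])   # most informative left
        picked.append(i)
        remaining.discard(i)
        for j in remaining:
            scores[j] -= M_tilde[i, j] * scores[i]    # penalize i's topic-mates
    return picked

# Toy case: docs 0 and 1 share a topic (strong link), doc 2 stands alone.
M = np.array([[0.0, 0.9, 0.0], [0.9, 0.0, 0.0], [0.0, 0.0, 0.0]])
order = diversity_penalty_rank(M, [0.5, 0.45, 0.3])
```

After doc 0 is picked, its topic-mate doc 1 is penalized below doc 2, so the off-topic doc 2 surfaces second, which is exactly the intended diversity effect.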

Page 18:

Re-rank

Score-combine scheme:

  Score(q, d_i) = w · Sim(q, d_i) / Sim_max(q) + (1 − w) · log(AR_i) / log(AR_max)

where Sim_max(q) = max_{d_i} Sim(q, d_i) and AR_max = max_{d_i} AR_i.

Rank-combine scheme:

  Score(q, d_i) = 1 / (Rank_{Sim(q, d_i)} + Rank_{AR_i})
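Both combination schemes can be sketched as below. The weight w = 0.5 and the assumption that AR scores have been rescaled above 1 (so that the log ratio grows with AR) are illustrative choices, not taken from the slides.

```python
import math

def score_combine(sims, ars, w=0.5):
    """Score-combine: w * Sim/Sim_max + (1 - w) * log(AR)/log(AR_max).
    Assumes every AR value exceeds 1 (e.g. rescaled by collection size);
    otherwise the log ratio would invert the ordering."""
    sim_max, ar_max = max(sims), max(ars)
    return [w * s / sim_max + (1 - w) * math.log(a) / math.log(ar_max)
            for s, a in zip(sims, ars)]

def rank_combine(sims, ars):
    """Rank-combine: Score(d_i) = 1 / (Rank_Sim(d_i) + Rank_AR(d_i)),
    with 1-based ranks in descending order of each score."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: -values[i])
        r = [0] * len(values)
        for pos, i in enumerate(order, start=1):
            r[i] = pos
        return r
    return [1.0 / (a + b) for a, b in zip(ranks(sims), ranks(ars))]

combined = score_combine([3.0, 2.0, 1.0], [300.0, 200.0, 100.0])
reciprocal = rank_combine([3.0, 2.0, 1.0], [0.1, 0.3, 0.2])
```

The rank-combine scheme only needs orderings, not comparable score scales, which makes it insensitive to how Sim and AR are normalized.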

Page 19:

Advantages of Affinity Rank

Gives attention to both diversity and informativeness
Implicitly expands the query towards multiple topics
Automatically picks the representative documents for each chosen topic
Most of the computation can be performed OFFLINE

Page 20:

Experiment Setup

Dataset:
  Microsoft Newsgroup: 117 Office-product-related newsgroups
  256,449 posts (mainly within 4 months), about 400 MB

Preprocessing:
  Title & text body (citations, signatures, etc. stripped)
  Stemming, stop-word removal, tf-idf weighting

Queries: 20 randomly picked query scenarios with query words

Search results: top 50 Okapi results as the answer set

Page 21:

Evaluation – Ground Truth

User study: 4 users independently evaluated all results. For each query:
  First, manually cluster all results into different topics
  Then score each result by its informativeness within its topic
  Finally, score each result by its relevance to the query

Evaluation: compare the original ranking with the new ranking (re-ranked by Affinity Rank) on 3 aspects of the top n results: diversity, informativeness & relevance

Page 22:

Definitions

Diversity = number of different topics in a document group

Informativeness:
  3 – very informative
  2 – informative
  1 – somewhat informative
  0 – not informative

Relevance:
  1 – relevant
  0 – hard to tell
  -1 – irrelevant
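The diversity measure above follows directly from the topic labels assigned in the user study; a sketch (the label format is an assumption):

```python
def diversity_at_n(topic_labels, n=10):
    """Diversity = number of distinct topics among the top-n ranked results."""
    return len(set(topic_labels[:n]))

# Top 3 results cover topics t1 and t2, so diversity@3 is 2.
d = diversity_at_n(["t1", "t1", "t2", "t3", "t2"], n=3)
```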

Page 23:

Experiment Result (1)

Top 10 search results, compared to traditional IR results:

                    Diversity   Informativeness   Relevance
Relative change     +31.02%     +11.97%           +0.72%
p value (t-test)    0.004632    0.002225          0.067255

Significant improvement in diversity & informativeness, without loss in relevance.

Page 24:

Experiment Result (2)

Diversity improvement & informativeness improvement: Affinity Rank efficiently improves both the diversity & informativeness of the top search results
(here the top 50 results are re-ranked entirely by Affinity Rank, i.e. w = 0)

Page 25:

Experiment Result (3) – Parameter Tuning

Top 10 search results

Affinity Rank is robust:
  1. The parameter matters little once Affinity Rank is given enough weight
  2. No over-tuning problem: simply re-ranking everything by Affinity Rank is nearly optimal

Page 26:

Experiment Result (4) – Parameter Tuning

Improvement overview subject to weight adjustment: Affinity Rank STABLY exerts a positive influence on diversity & informativeness.

Page 27:

Conclusion

A new IR framework: Affinity Rank helps improve the diversity & informativeness of search results, especially the TOP ones
Affinity Rank is computed offline, and therefore adds little burden to online retrieval

Future work:
  Metrics for information-quantity measurement
  Scaling to large collections

Page 28:

Thanks