1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering...

56
1 Information Filtering Rong Jin

Transcript of 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering...

Page 1: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

1

Information Filtering

Rong Jin

Page 2: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

2

Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering

Page 3: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

3

Short vs. Long Term Info. Need

Short-term information need (Ad hoc retrieval) “Temporary need”, e.g., info about used cars Information source is relatively static User “pulls” information Application example: library search, Web search

Long-term information need (Filtering) “Stable need”, e.g., new data mining algorithms Information source is dynamic System “pushes” information to user Applications: news filter

Page 4: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

4

Short vs. Long Term Info. Need

Short-term information need (Ad hoc retrieval) “Temporary need”, e.g., info about used cars Information source is relatively static User “pulls” information Application example: library search, Web search

Long-term information need (Filtering) “Stable need”, e.g., new data mining algorithms Information source is dynamic System “pushes” information to user Applications: news filter

Page 5: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

5

Examples of Information Filtering News filtering Email filtering Movie/book/product recommenders Literature recommenders And many others …

Page 6: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

6

Information Filtering Basic filtering question: Will user U like item X? Two different ways of answering it

Look at what U likes characterize X content-based filtering

Look at who likes X characterize U collaborative filtering

Combine content-based filtering and collaborative filtering unified filtering (open research topic)

Page 7: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

7

Other Names for Information Filtering Content-based filtering is also called

“Adaptive Information Filtering” in TREC “Selective Dissemination of Information” (SDI)

in Library & Information Science Collaborative filtering is also called

Recommender systems

Page 8: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

8

Part I: Collaborative Filtering

Page 9: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

9

Example: Collaborative Filtering

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4?

Page 10: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

10

Example: Collaborative Filtering

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

User 3 is more similar to user 1 than user 2

5 for movie “15 minutes” for user 3

5

Page 11: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

11

Collaborative Filtering (CF) vs. Content-based Filtering (CBF)

CF do not need content of items while CBF relies the content of items

CF is useful when content of items are not available or difficult to acquire are difficult to analyze

Problems with CF Privacy issues

Page 12: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

12

Why Collaborative Filtering?

Page 13: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

13

Why Collaborative Filtering? Because it worth a million dollars!

Page 14: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

14

Collaborative Filtering Goal: Making filtering decisions for an individual user based

on the judgments of other users

u1

u2

um

Users: U

Objects: O

o1 o2 … oj oj+1… on

3 1 …. … 4 2 ?

2 5 ? 4 3

? 3 ? 1 2

utest 3 4…… 1

Page 15: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

15

?

Collaborative Filtering Goal: Making filtering decisions for an individual user based

on the judgments of other users

u1

u2

um

Users: U

Objects: O

o1 o2 … oj oj+1… on

3 1 …. … 4 2 ?

2 5 ? 4 3

? 3 ? 1 2

utest 3 4…… 1

Page 16: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

16

Collaborative Filtering Goal: Making filtering decisions for an individual user based

on the judgments of other users

Memory-based approaches Given a test user u, find similar users {u1, …, ul}

Predict u’s rating based on the ratings of u1, …, ul

Page 17: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

17

Example: Collaborative Filtering

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

User 3 is more similar to user 2 than user 1

5 for movie “15 minutes” for user 3

5

Page 18: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

18

Important Issues with CF How to determine the similarity between

different users?

How to aggregate ratings from similar users to form the predictions?

Page 19: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

19

Pearson Correlation for CF

V1 = (1, 3,4,3), va1 = 2.75 V3 = (2,3,5,4), va3 = 3.5 Pearson correlation measures the linear correlation between

two vectors

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

Page 20: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

20

Pearson Correlation for CF

V1 = (1, 3, 4, 3), va1 = 2.75 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

Page 21: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

21

Pearson Correlation for CF

V1 = (1, 3, 4, 3), va1 = 2.75 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

Page 22: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

22

Pearson Correlation for CF

V1 = (1, 3, 4, 3), va1 = 2.75 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

Page 23: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

23

Pearson Correlation for CF

V2 = (4, 5, 2, 5), va2 = 4 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

-0.55

Page 24: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

24

Pearson Correlation for CF

V2 = (4, 5, 2, 5), va2 = 4 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

-0.55

Page 25: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

25

Pearson Correlation for CF

V2 = (4, 5, 2, 5), va2 = 4 V3 = (2, 3, 5, 4), va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

-0.55

Page 26: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

26

Aggregate Ratings

va1 = 2.75, va2 = 4, va3 = 3.5

User 1 1 5 3 4 3

User 2 4 1 5 2 5

User 3 2 ? 3 5 4

0.93

-0.55

Estimated Relative Rating Average Rating

Roundup Rating

Page 27: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

27

Pearson Correlation for CF

V1 = (1, 3, 3), va1 = 2.33 V3 = (2, 3, 4), va3 = 3

User 1 1 5 3 ? 3

User 2 4 1 5 2 ?

User 3 2 ? 3 5 4

0.87

Page 28: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

28

Pearson Correlation for CF

V1 = (1, 3, 3), va1 = 2.33 V3 = (2, 3, 4), va3 = 3

User 1 1 5 3 ? 3

User 2 4 1 5 2 ?

User 3 2 ? 3 5 4

0.87

Page 29: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

29

Pearson Correlation for CF

V2 = (4, 5, 2), va2 = 3.67 V3 = (2, 3, 5), va3 = 3.33

User 1 1 5 3 ? 3

User 2 4 1 5 2 ?

User 3 2 ? 3 5 4

0.87

-0.79

Page 30: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

30

Pearson Correlation for CF

V2 = (4, 5, 2), va2 = 3.67 V3 = (2, 3, 5), va3 = 3.33

User 1 1 5 3 ? 3

User 2 4 1 5 2 ?

User 3 2 ? 3 5 4

0.87

-0.79

Page 31: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

31

Aggregate Ratings

va1 = 2.33, va2 = 3.67, va3 = 3.3

User 1 1 5 3 ? 3

User 2 4 1 5 2 ?

User 3 2 ? 3 5 4

0.87

-0.79

Estimated Relative Rating Average Rating

Roundup Rating

Page 32: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

32

Problems with Memory-based Approaches

User 1 ? 5 3 4 2

User 2 4 1 5 ? 5

User 3 5 ? 4 2 5

User 4 1 5 3 5 ?

Most users only rate a few items Two similar users can may not rate the same set of items

Clustering users and items

Page 33: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

33

Problems with Memory-based Approaches

User 1 ? 5 3 4 2

User 2 4 1 5 ? 5

User 3 5 ? 4 2 5

User 4 1 5 3 5 ?

Most users only rate a few items Two similar users can may not rate the same set of items

Clustering users and items

Page 34: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

34

Flexible Mixture Model (FMM)Cluster both users and items simultaneously

User 1 ? 5 3 4 2

User 2 4 1 5 ? 5

User 3 5 ? 4 2 5

User 4 1 5 3 5 ?

User clustering and item clustering are correlated !

Page 35: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

35

Evaluation: Datasets

EachMovie: http://www.grouplens.org/node/76 (no longer

available) MovieRating: http://www.grouplens.org/node/73

Netflix prize: http://archive.ics.uci.edu/ml/datasets/Netflix+Prize

MovieRating EachMovie Netflix

Number of Users 71,567 72,000 480,000

Number of Movies 10,681 1682 17,000

Avg. # of rated items/User 100 100 200

Number of ratings 5 6 5

Page 36: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

36

Evaluation Metric Mean Absolute Error (MAE): average absolute

deviation of the predicted ratings to the actual ratings on items.

The smaller MAE, the better the performance

Predicted ratingTrue rating

T: #Predicted Items

Page 37: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

37

Part II: Adaptive Filtering

Page 38: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

38

Adaptive Information Filtering Stable & long term interest, dynamic info source System must make a delivery decision

immediately as a document “arrives”

FilteringSystem

my interest:

Page 39: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

40

A Typical AIF System

...Binary

Classifier

UserInterestProfile

User

Doc Source

Accepted Docs

Initialization

Learning FeedbackAccumulated

Docs

utility func

User profile text

Page 40: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

41

Evaluation Typically evaluated with a utility function

Each delivered doc gets a utility value Good doc gets a positive value Bad doc gets a negative value E.g., Utility = 3* #good - 2 *#bad (linear utility)

NegFalseCNegTrueC

PosFalseCPosTrueC

CCCCU

NN

RR

NNRR

##

##

),,,(

Page 41: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

42

Three Basic Problems in AIF Making filtering decision (Binary classifier)

Doc text, profile text yes/no Initialization

Initialize the filter based on only the profile text or very few examples

Learning from Limited relevance judgments (only on “yes” docs) Accumulated documents

All trying to maximize the utility

Page 42: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

43

AIF vs. Retrieval, & Categorization Adaptive filtering as information retrieval

Rank the incoming documents Only returned top k ranked ones to users

Page 43: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

44

AIF vs. Retrieval, & Categorization Adaptive filtering as information retrieval

Rank the incoming documents Only returned top k ranked ones to users

Adaptive filtering as categorization problems Classify documents into the categories of interested

and not-interested Only returned the ones that are classified as of

being interested

Page 44: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

45

AIF vs. Retrieval, & Categorization Like retrieval over a dynamic stream of docs,

but ranking is impossible Like online binary categorization, but with no

initial training data and with limited feedback

Page 45: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

46

Major Approaches to AIF “Extended” retrieval systems

“Reuse” retrieval techniques to score documents Use a score threshold for filtering decision Learn to improve scoring with traditional feedback New approaches to threshold setting and learning

“Modified” categorization systems (not covered) Adapt to binary, unbalanced categorization New approaches to initialization Train with “censored” training examples

Page 46: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

47

A General Vector-Space Approach

doc vector

profile vector

Scoring Thresholding

yes

no

FeedbackInformation

VectorLearning

ThresholdLearning

threshold

UtilityEvaluation

Page 47: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

48

Difficulties in Threshold Learning

36.5 R33.4 N32.1 R29.9 ?27.3 ?…...

=30.0

Little/none labeled data

Correlation between threshold and profile vector

Exploration vs. Exploitation (related to utility function)

Page 48: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

49

Threshold Setting in Extended Retrieval Systems Utility-independent approaches (generally not

working well, not covered in this lecture)

Indirect (linear) utility optimization Logistic regression (score->prob. of relevance)

Direct utility optimization Empirical utility optimization Expected utility optimization given score distributions

All try to learn the optimal threshold

Page 49: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

50

Logistic Regression (Robertson & Walker. 00)

General idea: convert score of D to p(R|D)

Fit the model using feedback data Linear utility is optimized with a fixed prob. cutoff But,

Possibly incorrect parametric assumptions No positive examples initially limited positive feedback Doesn’t address the issue of exploration

)()|(log DsDRO

Page 50: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

51

Score Distribution Approaches( Aramptzis & Hameren 01; Zhang & Callan 01)

Assume generative model of scores p(s|R), p(s|N)

Estimate the model with training data Find the threshold by optimizing the expected

utility under the estimated model Specific methods differ in the way of defining

and estimating the scoring distributions

Page 51: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

52

Gaussian-Exponential Distributions P(s|R) ~ N(,2) p(s-s0|N) ~ E()

)()(

)|()|( 02

2

02

1 sss

eNsspeRsp

(From Zhang & Callan 2001)

Page 52: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

53

Score Distribution Approaches (cont.)

Pros Principled approach Arbitrary utility Empirically effective

Cons May be sensitive to the scoring function Exploration not addressed

Page 53: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

54

Direct Utility Optimization Given

A utility function U(CR+ ,CR- ,CN+ ,CN-)

Training data D={<si, {R,N}>}

Formulate utility as a function of the threshold and training data: U=F(,D)

Choose the threshold by optimizing F(,D), i.e., ),(maxarg DF

Page 54: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

55

Empirical Utility Optimization

Basic idea Compute the utility on the training data for each candidate

threshold (score of a training doc) Choose the threshold that gives the maximum utility

Difficulty: Biased training sample! We can only get an upper bound for the true optimal

threshold.

Solutions: Heuristic adjustment(lowering) of threshold Lead to “beta-gamma threshold learning”

Page 55: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

56

optimalθ

Illustration of Beta-Gamma Threshold Learning

Cutoff position

Utility

0 1 2 3 … K ...

zeroθ

, N

examplestrainingN

e N

#

*β-1(βα γ*

, [0,1]

The more examples,the less exploration(closer to optimal)

optimalzero θ*α-1(θ*αθ

Encourage exploration up to zero

Page 56: 1 Information Filtering Rong Jin. 2 Outline Brief introduction to information filtering Collaborative filtering Adaptive filtering.

57

Beta-Gamma Threshold Learning (cont.)

Pros Explicitly addresses exploration-exploitation

tradeoff (“Safe” exploration) Arbitrary utility (with appropriate lower bound) Empirically effective

Cons Purely heuristic Zero utility lower bound often too conservative