Privacy-Preserving Eigentaste-based Collaborative Filtering

Ibrahim Yakut and Huseyin Polat, {iyakut,polath}@anadolu.edu.tr, Department of Computer Engineering, Anadolu University, Turkey


Page 1: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Privacy-Preserving Eigentaste-based Collaborative Filtering

Ibrahim Yakut and Huseyin Polat
{iyakut,polath}@anadolu.edu.tr

Department of Computer Engineering

Anadolu University, Turkey

Page 2: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Collaborative Filtering (CF)

21.04.23 IWSEC'07 2

Problem: Information Overload

Solution: Collaborative Filtering

Page 3: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Collaborative Filtering

Recent technique for filtering and recommendation

Applications
◦ E-commerce
◦ Search engines
◦ Direct recommendations

Page 4: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Collaborative Filtering Process

[Figure: user-item matrix with users u1, u2, ..., ua, ..., un as rows and items i1, i2, ..., iq, ..., im as columns. ua is the active user and iq is the item for which a prediction is sought.]

Paq = prediction on item q for the active user a

Page 5: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

EigenTaste

Proposed by Goldberg et al. in 2001

The main feature: online computation in constant time.

Second feature: flexible use of several clustering algorithms.

Based on Principal Component Analysis

Application in Jester: online joke recommendation. http://eigentaste.berkeley.edu/

Page 6: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Eigentaste Algorithm

Step 1. Find the correlation matrix C of A:

$$ C = \frac{1}{n-1} A^{T} A $$

Step 2. Find the eigenvectors (E) and eigenvalues (Λ) of C.

[Figure: the n×m user-item matrix D (n users, m items) and the n×k matrix A holding the ratings for the k gauge items.]

Page 7: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Eigentaste Algorithm cont’d

Step 3. Take the first m = 2 eigenvectors and project A:

$$ x = A E_m^{T} = A E_2^{T} $$

Step 4. Cluster the projected data using Recursive Rectangular Clustering (RRC).

Step 5. Construct a lookup table with the mean of the nongauge item ratings for each cluster.
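Steps 1 to 3 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: A is taken to be an already normalized n×k list of lists, the eigenvectors are found by power iteration with deflation (a technique chosen here for brevity, not one named on the slides), and the function names `correlation_matrix`, `top_eigenvectors` and `project` are hypothetical.

```python
import math

def correlation_matrix(A):
    """C = A^T A / (n - 1) for an n x k matrix A of normalized gauge ratings."""
    n, k = len(A), len(A[0])
    return [[sum(A[u][f] * A[u][g] for u in range(n)) / (n - 1)
             for g in range(k)] for f in range(k)]

def top_eigenvectors(C, count=2, iterations=500):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric matrix.

    Uses power iteration with deflation; assumes C has at least `count`
    non-zero eigenvalues (otherwise the normalization divides by zero).
    """
    k = len(C)
    M = [row[:] for row in C]
    pairs = []
    for _ in range(count):
        v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(k)) for i in range(k))
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return pairs

def project(A, eigenvectors):
    """x = A E^T, where the rows of E are the chosen eigenvectors."""
    return [[sum(a * e for a, e in zip(row, vec)) for vec in eigenvectors]
            for row in A]
```

Calling `project(A, [v for _, v in top_eigenvectors(correlation_matrix(A))])` yields one two-dimensional point per user, which is the input that the clustering step works on.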

Page 8: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Eigentaste: online

When the active user (a) enters:
◦ a rates the items in the gauge set.
◦ a is projected using the principal components (PCs) of the gauge ratings.
◦ Find the representative cluster.
◦ Recommend objects based on the preconstructed lookup table.

[Figure: rating bar ranging from Disapprove to Approve]
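The online phase can be sketched as follows. The sketch assumes that clusters are represented by centroids and that the active user is assigned to the nearest one; that is a simplification standing in for the cell lookup of RRC. The names `project_user`, `nearest_cluster`, `recommend` and the layout of `lookup_table` (one dict of item means per cluster) are hypothetical.

```python
def project_user(gauge_ratings, eigenvectors):
    """Project the active user's normalized gauge ratings onto the chosen PCs."""
    return [sum(r * e for r, e in zip(gauge_ratings, vec)) for vec in eigenvectors]

def nearest_cluster(point, centroids):
    """Index of the centroid closest to the projected point (assumed assignment rule)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def recommend(gauge_ratings, eigenvectors, centroids, lookup_table, top_n=3):
    """Return the top_n nongauge items with the highest mean rating in the user's cluster."""
    point = project_user(gauge_ratings, eigenvectors)
    cluster = nearest_cluster(point, centroids)
    means = lookup_table[cluster]  # dict: item -> mean rating within this cluster
    return sorted(means, key=means.get, reverse=True)[:top_n]
```

The work per active user depends only on the number of gauge items and clusters, not on the number of users n, which is consistent with the constant-time online computation mentioned earlier.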

Page 9: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Motivation

The algorithm described above is successful.

But due to privacy risks, collecting truthful and trustworthy data is a challenge!

Therefore, how can users give data for CF purposes without jeopardizing their privacy?

Is it possible to use perturbed data in Eigentaste-based algorithms?


Page 10: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

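Modifications on Original

Normalization:
◦ Instead of item mean and std, user mean and std are used:

$$ z_{uj} = \frac{v_{uj} - \bar{v}_u}{\sigma_u} $$

Clustering:
◦ Instead of RRC, k-means clustering is used.

Prediction:
◦ Instead of using the lookup table value directly, denormalize it and then predict:

$$ p_{aq} = \bar{v}_a + \sigma_a z_q $$

Here v_uj is user u's rating for item j, v̄_u and σ_u are the mean and std of u's ratings, and z_q is the lookup table z-score for item q.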
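A minimal sketch of user-based normalization and denormalization, assuming one user's ratings are stored as a dict from item to rating. The sample standard deviation is used here as an assumption, since the slides do not say which estimator is meant; the function names are hypothetical.

```python
import statistics

def to_z_scores(ratings):
    """Normalize one user's ratings with that user's own mean and std.

    Returns (z_scores, mean, std). Assumes at least two ratings that are
    not all equal, otherwise the std is undefined or zero.
    """
    values = list(ratings.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std; population std is an alternative
    z_scores = {item: (v - mean) / std for item, v in ratings.items()}
    return z_scores, mean, std

def denormalize(z_q, mean_a, std_a):
    """Turn a z-score from the lookup table into a rating for the active user."""
    return mean_a + std_a * z_q
```

For example, a user with ratings 2, 4 and 6 has mean 4 and sample std 2, so a lookup table z-score of 0.5 denormalizes to a predicted rating of 5.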

Page 11: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Masking data

Randomized Perturbation Technique (RPT), Agrawal & Srikant, 2000

[Figure: User1, User2, ..., Usern-1, Usern each add random noise (+R1, +R2, ..., +Rn-1, +Rn) to their data before sending it to the central database, on which the CF process runs.]

Page 12: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Masking Process

1. Users and the server agree on γ, θ, δ.

2. Each user u computes the z-scores of their ratings.

3. u selects σu over [0, γ] uniformly at random and uses it as the std of the masking data.

4. u selects ru over [0, 1]; if ru <= θ, u uses the uniform distribution, otherwise the Gaussian distribution.

5. u selects xer over [0, δ]; xer% of the unfilled cells are to be filled with noise.

Page 13: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Masking Process

u creates mu random numbers, where
◦ mu = number of rated cells + the xer% of unfilled cells to be filled
◦ the numbers have std = σu and μ = 0, drawn from a Gaussian distribution or from a uniform distribution over [-√3·σu, √3·σu], depending on ru

u masks the private data by adding this noise. The empty cells to be filled are selected randomly.
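The masking steps can be sketched as below. This is an illustration only: the function name `mask_ratings`, the dict-based data layout and the `rng` parameter are assumptions, and rounding the number of filled cells down is a choice made here, not something the slides specify.

```python
import math
import random

def mask_ratings(z_scores, all_items, gamma, theta, delta, rng=random):
    """Disguise one user's z-scores with zero-mean noise.

    z_scores: dict item -> z-score, for rated items only.
    all_items: every item in the system, used to find the unrated cells.
    """
    sigma_u = rng.uniform(0, gamma)           # std of this user's noise
    use_uniform = rng.uniform(0, 1) <= theta  # uniform if ru <= theta, else Gaussian
    xer = rng.uniform(0, delta)               # percent of unrated cells to fill

    def noise():
        if use_uniform:
            bound = math.sqrt(3) * sigma_u    # U[-b, b] has std b / sqrt(3)
            return rng.uniform(-bound, bound)
        return rng.gauss(0, sigma_u)

    # Add noise to every rated cell.
    masked = {item: z + noise() for item, z in z_scores.items()}

    # Fill a random xer% of the unrated cells with pure noise.
    unrated = [item for item in all_items if item not in z_scores]
    fill_count = int(len(unrated) * xer / 100)
    for item in rng.sample(unrated, fill_count):
        masked[item] = noise()
    return masked
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient for experiments; a real client would keep σu, ru and xer private.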

Page 14: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Eigentaste-based CF with Privacy

Now the server holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.

In some steps, the effects of perturbation must be considered and handled:
◦ Correlation matrix construction
◦ Projection
◦ Active user’s entry of the gauge set

Page 15: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

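Correlation Matrix Construction

If f ≠ g, i.e. for the nondiagonal entries of C', where each disguised z-score is z_uf + r_uf:

$$ C'_{fg} = \frac{1}{n-1}\sum_{u=1}^{n} (z_{uf} + r_{uf})(z_{ug} + r_{ug}) = \frac{1}{n-1}\left[\sum_{u=1}^{n} z_{uf} z_{ug} + \sum_{u=1}^{n} z_{uf} r_{ug} + \sum_{u=1}^{n} r_{uf} z_{ug} + \sum_{u=1}^{n} r_{uf} r_{ug}\right] $$

The expected values of the last three sums are 0, since μ = 0. Then

$$ C'_{fg} \approx \frac{1}{n-1}\sum_{u=1}^{n} z_{uf} z_{ug} $$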

Page 16: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

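Correlation Matrix Construction

If f = g, i.e. for the diagonal entries of C':

$$ C'_{ff} = \frac{1}{n-1}\sum_{u=1}^{n} (z_{uf} + r_{uf})^2 = \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}^2 + \frac{2}{n-1}\sum_{u=1}^{n} z_{uf} r_{uf} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}^2 $$

The expected value of the middle sum is 0, since μ = 0. Then, assuming n ≈ n-1,

$$ C'_{ff} \approx \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}^2 + \frac{1}{n}\sum_{u=1}^{n} r_{uf}^2 $$

The second term is the variance of the zero-mean noise, so unlike the nondiagonal entries, the diagonal entries of C' carry the noise variance on top of the original value.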
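A small simulation can illustrate why zero-mean noise leaves nondiagonal correlation entries roughly unchanged, while diagonal entries grow by about the noise variance. Everything here is a hypothetical setup: two synthetic correlated z-score columns, Gaussian noise with a per-user std drawn uniformly from [0, γ], and the function names `corr_entry` and `simulate`.

```python
import random

def corr_entry(col_f, col_g):
    """One entry of A^T A / (n - 1) for two columns of equal length."""
    n = len(col_f)
    return sum(a * b for a, b in zip(col_f, col_g)) / (n - 1)

def simulate(n=20000, gamma=2.0, seed=0):
    rng = random.Random(seed)
    z_f, z_g, r_f, r_g = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 0.6 * a + 0.8 * rng.gauss(0, 1)  # column g is correlated with column f
        sigma_u = rng.uniform(0, gamma)       # this user's private noise std
        z_f.append(a)
        z_g.append(b)
        r_f.append(rng.gauss(0, sigma_u))
        r_g.append(rng.gauss(0, sigma_u))
    m_f = [z + r for z, r in zip(z_f, r_f)]   # disguised column f
    m_g = [z + r for z, r in zip(z_g, r_g)]   # disguised column g
    return {
        "C_fg": corr_entry(z_f, z_g),
        "C'_fg": corr_entry(m_f, m_g),
        "C_ff": corr_entry(z_f, z_f),
        "C'_ff": corr_entry(m_f, m_f),
        "noise_var_f": sum(r * r for r in r_f) / n,
    }
```

Comparing `C'_fg` with `C_fg`, and `C'_ff - C_ff` with `noise_var_f`, shows both approximations at work; the gaps shrink as n grows.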

Page 17: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

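Projection

$$ x = A E_2^{T} $$

$$ x_{ij} = \sum_{l=1}^{k} (z_{il} + r_{il})(e_{lj} + R_{lj}) = \sum_{l=1}^{k} z_{il} e_{lj} + \sum_{l=1}^{k} z_{il} R_{lj} + \sum_{l=1}^{k} r_{il} e_{lj} + \sum_{l=1}^{k} r_{il} R_{lj} $$

Similarly, the expected values of the last three sums are 0, so the approximated matrix is obtained:

$$ x_{ij} \approx \sum_{l=1}^{k} z_{il} e_{lj} $$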

Page 18: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Remaining Parts

After determining the clusters from the estimated data:
◦ Z-score means of nongauge items are stored in the lookup table.
◦ When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
◦ The representative cluster is determined, the corresponding value from the table is denormalized, and the prediction is obtained.

Page 19: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Experiments

Data Set
◦ Jester is a web-based joke data set:
  17,988 users, 100 jokes
  Continuous ratings over the range (-10, +10)
  50% of all ratings are present

Evaluation Metrics

$$ MAE = \frac{1}{d}\sum_{i=1}^{d} |p_i - r_i| $$

$$ NMAE = \frac{MAE}{r_{max} - r_{min}} $$

p: predicted value
r: original value
d: size of test set
rmax: max rating
rmin: min rating
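The two metrics are simple to compute. The sketch below assumes predictions and original ratings are given as equal-length lists, and uses the Jester range of -10 to +10 as default bounds; the function names are hypothetical.

```python
def mae(predicted, actual):
    """Mean absolute error over the d test ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def nmae(predicted, actual, r_min=-10.0, r_max=10.0):
    """MAE normalized by the width of the rating scale."""
    return mae(predicted, actual) / (r_max - r_min)
```

With the Jester scale the normalizing width is 20, so an MAE of 3.740 corresponds to an NMAE of 0.187.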

Page 20: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Eigentaste vs. Modified

9000 training users, 5000 test users (10 test items)

                      MAE     NMAE
Eigentaste            3.740   0.187
Modified Eigentaste   3.334   0.167

Page 21: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Protecting active users’ privacy

        M1       M2       M3
MAE     3.3508   3.4710   3.4807
NMAE    0.1676   0.1735   0.1741

M1: No disguise, but requires additional cost
M2: Just considering gauge mean and std
M3: Considering whole mean and std

Page 22: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Accuracy vs. Varying Numbers of Users

n       500     1000    2000    4000    8000
MAE     4.678   4.242   3.832   3.624   3.483
NMAE    0.234   0.212   0.192   0.181   0.174

Fixed 5000 users and 10 random test items

• As the number of users increases, accuracy improves, since the contribution of the random numbers converges to zero.
• For n >= 2000, the results are satisfactory.

Page 23: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Accuracy with Varying δ Values

δ       0        35       70       100
MAE     3.4460   3.4567   3.4615   3.4710
NMAE    0.1723   0.1728   0.1730   0.1735

Accuracy improves slightly as δ decreases.

Page 24: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Conclusion

We showed how to achieve privacy-preserving CF tasks using Eigentaste-based algorithms.

We will study:
◦ whether we can employ other clustering algorithms
◦ how to improve recommendation quality by using correlation-based CF algorithms

Page 25: Privacy-Preserving  Eigentaste-based  Collaborative Filtering

Thanks for your interest! Questions?