Resource recommendation vs privacy enhancement

Post on 20-Jan-2015

Social tagging has opened new possibilities for application interoperability on the semantic web, while at the same time posing new privacy threats. Recommendation and information filtering systems predict users' preferences in order to provide personalized content, but in doing so they expose user profiles to possible privacy attacks. Tag suppression and forgery are privacy enhancing techniques that protect users' privacy to a certain extent, at the cost of a loss in semantic accuracy; in other words, privacy is gained at the expense of utility. The impact of tag suppression and forgery on content-based recommendation is hence investigated in a real-world application scenario.

Transcript of Resource recommendation vs privacy enhancement

1/21 Research Seminar. Silvia Puglisi

Departament d'Enginyeria Telemàtica

Silvia Puglisi

silvia.puglisi@upc.edu

“Research Seminar” Master in Telematics Engineering - UPC

On Content-Based Recommendation and Users' Privacy in Social Tagging Systems

Silvia Puglisi

Barcelona, UPC, 2013

What is social tagging?

Social tagging is the activity that allows users to assign keywords (tags) to web-based resources.

Tagging and tags

Tag: a label attached to someone or something for identification or other information

Scenario

Social tagging enables semantic interoperability in web applications.

Recommendation and information filtering systems have been developed to predict users' preferences.

Users hence reveal their personal preferences on social tagging platforms.

Privacy enhancing techniques (PETs) have been developed to protect user privacy to a certain extent, at the expense of semantic loss.

Objective

Using research in the fields of recommender systems [1] and PETs [2] as a starting point, the objective of this study is to evaluate the impact of two PETs, tag forgery and tag suppression, on the performance of a recommendation system, using real-world application data.

[1] Bellogín, Alejandro, Iván Cantador, and Pablo Castells. "A comparative study of heterogeneous item recommendations in social systems." Information Sciences (2012).

[2] Parra-Arnau, Javier, David Rebollo-Monedero, and Jordi Forné. "A privacy-protecting architecture for collaborative filtering via forgery and suppression of ratings." Data Privacy Management and Autonomous Spontaneous Security (2012): 42-57.

Dataset

Among the social bookmarking platforms considered, Delicious was identified as a representative example of an application rich in collaborative tagging information.

Delicious is a social bookmarking platform for web resources.

The Delicious dataset was obtained from those made publicly available at the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems.

Delicious

Techniques: Modelling the User/Item Profile

The simplest approach to model users and items is to count the number of times a tag has been used:

• By a user to annotate different items in the same category.

• Or by the community to annotate the item.

The user/item profile is then described as a histogram of the relative frequencies of tags within a predefined set of categories of interest.
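
The counting approach above can be sketched in a few lines of Python. This is only an illustration: the function name `build_profile`, the category names, and the assumption that every tag use has already been mapped to one category are hypothetical and not taken from the study.

```python
from collections import Counter


def build_profile(annotations, categories):
    """Return relative tag frequencies over a fixed list of categories.

    `annotations` holds one category label per tag use (assumed to be
    pre-mapped from raw tags). Labels outside `categories` are ignored.
    """
    counts = Counter(a for a in annotations if a in categories)
    total = sum(counts.values())
    if total == 0:
        # No usable annotations: return an all-zero profile.
        return [0.0 for _ in categories]
    return [counts[c] / total for c in categories]


# Hypothetical category set and tag uses, for illustration only.
CATEGORIES = ["arts", "science", "sports"]
user_tags = ["science", "science", "arts", "science"]
user_profile = build_profile(user_tags, CATEGORIES)
```

The same function can describe an item by passing the tags that the community assigned to it instead of the tags used by a single user.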

Techniques: Histogram of a user profile

Techniques: Privacy Metric

The Kullback-Leibler (KL) divergence has been adopted as the privacy criterion, following Jaynes’ rationale on entropy maximization methods.

Since the KL divergence may be regarded as a generalization of the entropy of a distribution relative to another, it is often referred to as relative entropy.
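
As a minimal sketch of this metric, the snippet below computes the KL divergence between two discrete distributions and checks the link to entropy: against a uniform reference over n categories, the divergence equals log2(n) minus the entropy. The function names and the example distribution are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total


# Hypothetical profile compared with a uniform reference distribution.
profile = [0.5, 0.25, 0.25]
uniform = [1 / 3] * 3
divergence = kl_divergence(profile, uniform)
```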

Techniques: Utility Metric

A measure of how useful an item is for a certain user is needed.

We can say that an item is useful if its profile is similar to the user profile.

Hence we need a measure of similarity.

Content-based recommender models are defined as similarity measures between user and item profiles. Here the similarity is the cosine-based measure, i.e. the cosine of the angle between the user profile vector and the item profile vector.
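
A minimal sketch of the cosine-based similarity between two profile vectors follows. Returning 0.0 when one of the profiles is all zeros is a convention chosen here, not something stated in the slides.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length profile vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: an empty profile is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Profiles that point in the same direction score 1, and profiles with no category in common score 0, so items can be ranked for a user by decreasing similarity.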

Techniques: Performance Metric

The recommender system is evaluated in a content retrieval scenario where a user is provided with a ranked list of N recommended items.

The performance metric adopted is hence one commonly used for ranked-list prediction, namely precision at top N.

In the field of Information Retrieval, precision can be defined as the fraction of recommended items that are relevant for a target user.
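
Precision at top N can be sketched as below. The function name and the toy item identifiers are hypothetical; the sketch divides by N, which is the usual convention when the list holds at least N items.

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n recommended items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant_items)
    return hits / n


# Hypothetical ranked recommendations and ground-truth relevant items.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
p_at_2 = precision_at_n(ranked, relevant, 2)
```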

Techniques: Tag Forgery and Suppression

Tag suppression and forgery are privacy enhancing techniques that help users who tag resources online avoid revealing sensitive information to a possible attacker.

Techniques: Tag Forgery and Suppression Rates

The tag forgery rate represents the ratio of forged items.

The tag suppression rate is the proportion of items that the user consents to eliminate.
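
To illustrate how the two rates act on a profile, the sketch below builds an "apparent" profile from a genuine one. It assumes a formulation along the lines of [2], where forged mass r (summing to the forgery rate) is added, suppressed mass s (summing to the suppression rate) is removed, and the result is renormalized by 1 + forgery rate - suppression rate. The symbols, function name and example values are assumptions, not taken from the slides.

```python
def apparent_profile(q, r, s):
    """Profile seen by an observer after forgery and suppression.

    q: genuine profile (relative frequencies summing to 1).
    r: forged mass per category; sum(r) is the forgery rate.
    s: suppressed mass per category; sum(s) is the suppression rate,
       and no category can lose more mass than it has (s_i <= q_i).
    """
    if not (len(q) == len(r) == len(s)):
        raise ValueError("q, r and s must have the same length")
    if any(ri < 0 for ri in r):
        raise ValueError("forged mass must be non-negative")
    if any(si < 0 or si > qi for qi, si in zip(q, s)):
        raise ValueError("suppressed mass must satisfy 0 <= s_i <= q_i")
    norm = 1.0 + sum(r) - sum(s)
    if norm <= 0:
        raise ValueError("nothing is left after suppression")
    return [(qi + ri - si) / norm for qi, ri, si in zip(q, r, s)]


# Hypothetical example: forgery rate 0.2, suppression rate 0.1.
genuine = [0.5, 0.3, 0.2]
forged = [0.0, 0.0, 0.2]
suppressed = [0.1, 0.0, 0.0]
apparent = apparent_profile(genuine, forged, suppressed)
```

In this example the dominant first category is thinned out and the weakest category is inflated, so the apparent profile is flatter than the genuine one.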

Techniques: The Privacy-Forgery-Suppression Function

Consistently, the privacy-forgery-suppression function can be defined in terms of these two rates.
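
One possible form of this function, written in the spirit of the formulation in [2], is sketched below. The symbols are assumptions introduced for illustration: q is the user's genuine profile, p a reference (e.g. population) profile, r and s the forgery and suppression strategies, ρ and σ the forgery and suppression rates, and D the KL divergence.

```latex
\mathcal{P}(\rho, \sigma) \;=\;
\min_{\substack{r,\, s \\ r_i \ge 0,\;\; 0 \le s_i \le q_i \\ \sum_i r_i = \rho,\;\; \sum_i s_i = \sigma}}
D\!\left( \frac{q + r - s}{1 + \rho - \sigma} \,\middle\|\, p \right)
```

Read this way, the function gives the lowest privacy risk attainable for a given pair of rates, i.e. the best trade-off between privacy, forgery and suppression under the stated assumptions.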

Evaluation

Evaluation: Statistics about the dataset

Categories                  11
Users                       1867
Item-Category Tuples        98998
Avg. tags per user          477.75
Items                       69226
Avg. items per category     81044
Avg. categories per item    1.4
Tags per item               13.06

Results: Relative Risk Reduction with forgery - Utility

Results: Relative Risk Reduction with suppression - Utility

Conclusions

Tag suppression and forgery are simple privacy enhancing techniques able to protect users' privacy at the cost of some semantic loss.

This study shows, through a simple experimental evaluation in a real-world application scenario, that the performance degradation of a recommender system is small compared to the privacy risk reduction offered by these techniques.

Thank you!