SpotRank : A Robust Voting System for Social News Websites Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet Univ Paris-Sud LRI, Nalrem Mdeias, Univ Paris-Sud LRI WICOW’10 January 26 2011 Presented by Somin Kim


SpotRank: A Robust Voting System for Social News Websites

Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet
Univ Paris-Sud LRI, Nalrem Mdeias, Univ Paris-Sud LRI
WICOW’10

January 26 2011
Presented by Somin Kim


Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
Conclusion


Introduction

In social news websites, users share content they found on the web and can vote for the news they like the most
– Voting for a news item is then considered a recommendation
– News with a sufficient number of recommendations is displayed on the front page


Introduction

It is tempting for a user to use malicious techniques to obtain good visibility for his websites
– Being on the front page of a website such as Digg is very attractive: thousands of unique visitors are obtained within one day

The top users act together to have the websites they support displayed on the front page
– Using daily mailing lists
– Posting hundreds of links
– Voting for themselves


Related Work

Spam countermeasures for social websites
– Identification-based methods: detection of spam and spammers
– Rank-based methods: demotion of spam
– Limit-based methods: preventing spam by making spam content difficult to publish

Related fields of research
– Machine-learning-based ranking frameworks for social media
– Detection of click fraud in pay-per-click advertising
– Giving users a good selection of news

We focus on techniques that demote votes that are malicious or cast by users known to be malicious


Outline
Introduction
Related Work
SpotRank Algorithm
– Framework and principle
– Proposing a spot
– Voting for a spot
– Detecting cabals
Experiments
Conclusion


SpotRank Algorithm

Framework and principle

Some notations:

U: a community of users who use the voting system
S: the set of spots
– Spot: news or content proposed by any user
V: the set of all votes
– A vote is a triple (u, s, v) where u, v ∈ U and s ∈ S
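A minimal sketch of this notation as Python types, in case a concrete representation helps. The class and field names are hypothetical, and reading v as the user who proposed the spot s is an assumption: the slide only states that u, v ∈ U and s ∈ S.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """An element of S: news or content proposed by a user."""
    spot_id: str
    proposer: str  # user id; hypothetical field


@dataclass(frozen=True)
class Vote:
    """An element of V: the triple (u, s, v)."""
    u: str  # the voter, an element of U
    s: str  # the spot id, an element of S
    v: str  # assumed here to be the proposer of s


users = {"alice", "bob"}                     # U
spots = {"s1": Spot("s1", proposer="bob")}   # S
votes = [Vote(u="alice", s="s1", v="bob")]   # V

# Every vote must reference known users and a known spot.
valid = all(x.u in users and x.v in users and x.s in spots for x in votes)
```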


SpotRank Algorithm

Framework and principle

Two votes do not necessarily have the same value
– A score is assigned to each vote depending on many factors
– The higher the score of a spot, the closer the spot is to first place

Pertinence
– The pertinence of a user depends on the pertinence of the spots he voted for, and vice versa


SpotRank Algorithm

Framework and principle

Voting process of SpotRank


SpotRank Algorithm

Proposing a spot

When a user proposes a spot, its score must be initialized

n: the number of spots proposed by the user in the last 24 hours

m: the number of spots previously posted from the user’s IP in the last 20 minutes

With this formula, we prevent effective “spot bombing” by spammers
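The initialization formula itself is not reproduced in this transcript. The sketch below only illustrates the stated idea that the initial score should shrink as n and m grow. The functional form, the `base` value and the function name are assumptions, not the authors' formula.

```python
def initial_score(n: int, m: int, base: float = 10.0) -> float:
    """Hypothetical initial score for a newly proposed spot.

    n: spots proposed by the user in the last 24 hours
    m: spots posted from the user's IP in the last 20 minutes
    The base score is divided by (1 + n) and by (1 + m), so repeated
    posting from one account or one IP is progressively demoted.
    """
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    return base / ((1 + n) * (1 + m))


first = initial_score(0, 0)   # no recent activity: full base score
bombed = initial_score(9, 4)  # 9 recent spots, 4 of them from the same IP
```

Under these assumptions a spammer who floods the site sees each new spot start far lower than a first-time proposal.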


SpotRank Algorithm

Voting for a spot

Once a spot has been proposed, it can be “pushed” to the front page according to its score
– The base score of a vote is the pertinence of the voter
– This value is then modified according to several criteria to provide its score

Voting is the most important part, and the one on which spammers will concentrate
– We propose a set of filters whose aim is to counter all the attacks a spammer could think of


SpotRank Algorithm

Voting for a spot

Base value of a vote: pertinence
– Pert(u) is the mean value of the pertinence of the spots u voted for
– Pert(s) is its score divided by the number of votes it received
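These two definitions translate fairly directly into code. A minimal sketch follows; the function names, the data layout and the default value used when there are no votes yet are assumptions.

```python
from statistics import mean


def pert_spot(score: float, n_votes: int, default: float = 0.0) -> float:
    """Pert(s): the spot's score divided by the number of votes it received."""
    return score / n_votes if n_votes > 0 else default


def pert_user(voted_spot_perts: list[float], default: float = 0.0) -> float:
    """Pert(u): the mean pertinence of the spots u voted for."""
    return mean(voted_spot_perts) if voted_spot_perts else default


# Hypothetical data: spot id -> (score, number of votes)
spots = {"s1": (120.0, 4), "s2": (30.0, 3)}
perts = {sid: pert_spot(score, n) for sid, (score, n) in spots.items()}
alice = pert_user([perts["s1"], perts["s2"]])
```

Since each quantity is defined in terms of the other, a real system would presumably recompute both as new votes arrive.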


SpotRank Algorithm

Voting for a spot

High frequency voting
– A typical spammer votes for a lot of spots in a short amount of time

α4 is the reasonable time interval between two votes
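The frequency formula is not reproduced in this transcript. As an illustration only, the sketch below scales a vote down when it follows the voter's previous vote by less than α4. The linear shape, the 60-second default and the function name are assumptions.

```python
def frequency_factor(seconds_since_last_vote: float, alpha4: float = 60.0) -> float:
    """Hypothetical multiplier in [0, 1] applied to a vote's score.

    alpha4 is the interval considered reasonable between two votes.
    A vote cast at least alpha4 seconds after the previous one keeps
    its full value; faster votes are scaled down linearly.
    """
    if alpha4 <= 0:
        raise ValueError("alpha4 must be positive")
    return max(0.0, min(1.0, seconds_since_last_vote / alpha4))


slow = frequency_factor(300.0)  # well spaced: no penalty
fast = frequency_factor(15.0)   # a quarter of alpha4
```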


SpotRank Algorithm

Voting for a spot

Abusive one-way voting
– A typical spammer uses several accounts
  – One clean account to propose spots
  – Several disposable accounts to vote for the spots proposed by the clean account
– Users who vote only for one specific user will see their votes become useless
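One possible way to express this filter, sketched under assumptions: the weight of a vote is the share of the voter's earlier votes that went to users other than the current proposer. The formula and names are hypothetical; the slide only states that voting for a single user makes the votes useless.

```python
from collections import Counter


def one_way_factor(past_targets: list[str], proposer: str) -> float:
    """Hypothetical multiplier in [0, 1].

    past_targets lists the proposers of the spots this voter voted for
    earlier. The factor is the share of those votes that went to users
    other than `proposer`, so an account that only ever votes for one
    user contributes nothing to that user's spots.
    """
    if not past_targets:
        return 1.0  # no history yet: no penalty (an assumption)
    counts = Counter(past_targets)
    return 1.0 - counts[proposer] / len(past_targets)


disposable = one_way_factor(["clean"] * 5, "clean")
regular = one_way_factor(["ann", "bob", "clean", "dan"], "clean")
```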


SpotRank Algorithm

Voting for a spot

Quick voting
– The behavior of a spammer is to propose a spot and quickly vote for it
  – A spammer will not stay long on any given website
– To avoid quick voting, we block any vote during the first minute after the spot s appears on the site, and after that we use a stair function time(s)

t: current time
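The steps of time(s) are not given in this transcript. The sketch below shows what a stair function of this kind could look like: only the one-minute block comes from the slide, while the later step and its value are assumptions.

```python
def time_factor(spot_posted_at: float, t: float) -> float:
    """Hypothetical stair function time(s), evaluated at current time t.

    Both arguments are timestamps in seconds.
    """
    age = t - spot_posted_at
    if age < 60:
        return 0.0   # votes are blocked during the first minute
    if age < 5 * 60:
        return 0.5   # assumed intermediate step
    return 1.0       # full value afterwards (assumed)


blocked = time_factor(0.0, 30.0)
partial = time_factor(0.0, 120.0)
full = time_factor(0.0, 600.0)
```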


SpotRank Algorithm

Voting for a spot

Multiple avatars and physical community
– SpotRank demotes votes for a given spot if they come from the same IP address
  – A typical spammer will have many accounts, and sometimes automatic voting mechanisms as well
  – These voting bots are often located on only a few servers, so they share the same IP address (or only a very few IP addresses)
– n: number of previous votes from this IP address
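The demotion formula is not reproduced in this transcript. As a sketch only, the class below weighs each vote by 1 / (1 + n), where n counts the previous votes for the same spot from the same IP. That formula, the class name and the example addresses are assumptions.

```python
from collections import defaultdict


class IpFilter:
    """Hypothetical per-spot IP filter: the k-th vote from one IP weighs 1/k."""

    def __init__(self) -> None:
        # (spot id, IP address) -> number of votes already seen
        self._seen: dict[tuple[str, str], int] = defaultdict(int)

    def factor(self, spot_id: str, ip: str) -> float:
        n = self._seen[(spot_id, ip)]  # previous votes from this IP
        self._seen[(spot_id, ip)] += 1
        return 1.0 / (1 + n)


f = IpFilter()
weights = [f.factor("s1", "203.0.113.7") for _ in range(3)]
other = f.factor("s1", "198.51.100.2")
```

With this choice the total weight of many votes from one address grows only slowly, while a vote from a new address keeps its full value.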


SpotRank Algorithm

Voting for a spot

Avoiding the voting list effect
– A group of people can unite their efforts in order to promote their own spots
  – This is classically done through daily mailing lists
– If a user u votes for a user u’ and both users are in the same cluster, then the value of the vote is weighted by the inverse of the size of this cluster
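The cluster rule can be sketched as follows. The function name and the representation of clusters as Python sets are assumptions, as is the premise that a user belongs to at most one cluster.

```python
def cluster_factor(voter: str, proposer: str, clusters: list[set[str]]) -> float:
    """Weight a vote by 1 / |cluster| when voter and proposer share a cluster."""
    for cluster in clusters:
        if voter in cluster and proposer in cluster:
            return 1.0 / len(cluster)
    return 1.0  # no shared cluster: the vote keeps its value


clusters = [{"u1", "u2", "u3", "u4"}, {"u5", "u6"}]  # hypothetical cabals
inside = cluster_factor("u1", "u3", clusters)
outside = cluster_factor("u1", "u5", clusters)
```

One consequence of the inverse weighting is that if every member of a cluster of size k votes for a fellow member, the votes add up to roughly one ordinary vote.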


SpotRank Algorithm

Voting for a spot

Summary : Computation of the actual score of a vote


SpotRank Algorithm

Voting for a spot

Computation of the score of a spot
– The score of a spot is simply the sum of all votes for this spot and of the initial score of the spot
  – The score of a spot s is updated each time a user votes for it, but also periodically, since the value of the time decay varies over time
– Time decay is used to promote new spots over old, strong spots
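A minimal sketch of this computation. The sum of the initial score and the vote scores follows the slide; the exponential shape of the time decay, its half-life and the way it multiplies the sum are assumptions, since the transcript does not give the decay function.

```python
def spot_score(initial: float, vote_scores: list[float],
               age_hours: float, half_life_hours: float = 24.0) -> float:
    """Initial score plus all vote scores, scaled by a time decay.

    The decay halves the score every `half_life_hours` (assumed), so the
    result must be recomputed periodically even when no new vote arrives.
    """
    decay = 0.5 ** (age_hours / half_life_hours)
    return (initial + sum(vote_scores)) * decay


fresh = spot_score(10.0, [20.0, 30.0], age_hours=0.0)
day_old = spot_score(10.0, [20.0, 30.0], age_hours=24.0)
```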


SpotRank Algorithm

Detecting cabals

We propose here to group together people who massively vote for one another
– We use the following algorithm, which should be run regularly to identify new cabals and update the existing ones
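The algorithm referred to here is not reproduced in this transcript. As a stand-in, not the authors' method, the sketch below links two users when each has voted for the other's spots at least `min_votes` times and reports the connected components of that graph as cabals. The threshold and all names are assumptions.

```python
from collections import defaultdict


def find_cabals(vote_counts: dict[tuple[str, str], int],
                min_votes: int = 3) -> list[set[str]]:
    """Group users who massively vote for one another.

    vote_counts[(a, b)] is how many times a voted for spots proposed by b.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for (a, b), count in vote_counts.items():
        mutual = vote_counts.get((b, a), 0) >= min_votes
        if a != b and count >= min_votes and mutual:
            graph[a].add(b)
            graph[b].add(a)

    cabals, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            user = stack.pop()
            if user not in component:
                component.add(user)
                stack.extend(graph[user] - component)
        seen |= component
        cabals.append(component)
    return cabals


counts = {("a", "b"): 5, ("b", "a"): 4, ("b", "c"): 3, ("c", "b"): 6,
          ("d", "a"): 9, ("e", "f"): 1, ("f", "e"): 1}
cabals = find_cabals(counts)
```

In this example d votes heavily for a but gets nothing back, so d is left out; running the function again on fresh counts would both find new groups and refresh existing ones.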


Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
– Log analysis of spotrank.fr
– Human evaluation
Conclusion


Experiments

In order to collect data about the behavior of SpotRank, spotrank.fr has been launched
– The data were collected from 09/07/2009 to 10/26/2009
– 15600 visits, 43000 page views
– Average time spent by a visitor on the website: 2:37 minutes

We estimated that at least 10 to 15% of accounts belong to spammers


Experiments

Log analysis of spotrank.fr

% of users with regard to pertinence
– As time goes on and the number of users grows, the pertinence of the users tends to spread out more

Figure panels: 2009/07/23, 2009/09/08, 2009/10/26

– Two categories of users
  – the non-relevant users: pertinence(u) < 10 (this category contains mainly spammers)
  – the relevant users: pertinence(u) > 50 (except newcomers)


Experiments

Log analysis of spotrank.fr

% of low- and high-pertinence users over time (during 3 months)
– The percentage of non-relevant users, including spammers, is decreasing while the percentage of relevant users is increasing


Experiments

Log analysis of spotrank.fr

# users versus # proposed spots
– The majority of users propose only a few spots (fewer than 3)
– A few people have an oddly high number of proposed spots
  – Most of them are spammers


Experiments

Log analysis of spotrank.fr

% users with regard to # votes
– Most users don’t vote a lot
– The people who vote the most are clearly the ones we suspect to be spammers


Experiments

Log analysis of spotrank.fr

# votes versus their scores
– Most of the votes have a very low score
– Most legitimate users seem to have votes with a score between 5 and 50


Experiments

Human evaluation

We compared the top “stories” of spotrank.fr and two other major social news websites in France

Survey protocol
– Collect the first five spots on each website periodically
– Generate a webpage containing a shuffled list of the 15 news items
– Each webpage is sent to a volunteer who has to tell, for each news item:
  – Yes, it is relevant for the news to appear on the front page of a social news website
  – No, it is not relevant for the news to appear on the front page of a social news website
  – DnK, he is not able to determine whether the news deserves to be on the front page or not
  – Err, the news was not accessible when he tried
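The page-generation step of this protocol can be sketched as below. The function name, the placeholder site names other than spotrank.fr and the URL layout are hypothetical; only the "first five per site, shuffled into one list of 15" logic and the four answer labels come from the slide.

```python
import random


def build_survey(top_by_site: dict[str, list[str]], rng: random.Random) -> list[str]:
    """Take the first five items of each site and return them shuffled,
    so the volunteer cannot tell which site an item came from."""
    items = [url for ranking in top_by_site.values() for url in ranking[:5]]
    rng.shuffle(items)
    return items


ANSWERS = ("Yes", "No", "DnK", "Err")  # the four possible answers

# Hypothetical rankings with seven items per site; only five are used.
tops = {site: [f"{site}/news/{i}" for i in range(1, 8)]
        for site in ("spotrank.fr", "site-b", "site-c")}
page = build_survey(tops, random.Random(0))
```

Passing the random generator in explicitly keeps a given survey page reproducible, which helps when answers are later matched back to the originating site.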


Experiments

Human evaluation

# answers of each type
– The ranking given by SpotRank is of higher quality than those of the two other websites
– The filtering of SpotRank gives clearer results


Experiments

Human evaluation

Rank with regard to the number of Yes, No, DnK
– The user satisfaction survey clearly shows that the filtering of SpotRank is perceived to be of high quality

Figure legend: Yes, No, DnK


Conclusion

We presented a robust voting system for social news websites
– to demote the effect of manipulation

SpotRank clearly outperforms real competitors in a real-life web ecosystem
