SpotRank: A Robust Voting System for Social News Websites
Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet
Univ Paris-Sud LRI, Nalrem Médias, Univ Paris-Sud LRI
WICOW’10
January 26, 2011
Presented by Somin Kim
Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
Conclusion
2/33
Introduction
On social news websites, users share content they found on the web and can vote for the news they like most
– A vote for a news item is then considered a recommendation
– News items with a sufficient number of recommendations are displayed on the front page
3/33
Introduction
It is tempting for users to use malicious techniques to gain visibility for their websites
– Being on the front page of a website such as Digg is very attractive: it brings thousands of unique visitors within one day
The top users act together to get the websites they support displayed on the front page
– Using daily mailing lists
– Posting hundreds of links
– Voting for themselves
4/33
Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
Conclusion
5/33
Related Work
Spam countermeasures for social websites
– Identification-based methods: detection of spam and spammers
– Rank-based methods: demotion of spam
– Limit-based methods: preventing spam by making spam content difficult to publish
Related fields of research
– Machine-learning-based ranking frameworks for social media
– Detection of click fraud in pay-per-click advertising
– Giving users a good selection of news
We focus on techniques that demote malicious votes, or votes cast by users known to be malicious
6/33
Outline
Introduction
Related Work
SpotRank Algorithm
– Framework and principle
– Proposing a spot
– Voting for a spot
– Detecting cabals
Experiments
Conclusion
7/33
SpotRank Algorithm
Framework and principle
Some notations:
U: a community of users who use the voting system
S: the set of spots
– Spot: news or content proposed by any user
V: the set of all votes
– A vote is a triple (u, s, v) where u, v ∈ U and s ∈ S
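To make the notation concrete, here is a minimal Python sketch of U, S and V as plain data. It assumes (the slides do not say so) that in a vote (u, s, v) the user u votes for spot s and v is the user who proposed s; the class name `Vote` and the toy users are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote (u, s, v) from the set V."""
    voter: str     # u in U: the user casting the vote
    spot: str      # s in S: the spot being voted for
    proposer: str  # v in U: assumed to be the user who proposed s


# A toy community (hypothetical data)
U = {"alice", "bob", "carol"}  # users
S = {"s1", "s2"}               # spots
V = [                          # votes
    Vote("bob", "s1", "alice"),
    Vote("carol", "s1", "alice"),
    Vote("alice", "s2", "bob"),
]

# Sanity check: every vote refers to known users and spots
assert all(x.voter in U and x.proposer in U and x.spot in S for x in V)
```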
8/33
SpotRank Algorithm
Framework and principle
Two votes do not necessarily have the same value
– Each vote is assigned a score that depends on many factors
– The higher the score of a spot, the closer the spot is to first place
Pertinence
– The pertinence of a user depends on the pertinence of the spots he voted for, and vice versa
9/33
SpotRank Algorithm
Framework and principle
Voting process of SpotRank
10/33
SpotRank Algorithm
Proposing a spot
When a user proposes a spot, its score must be initialized
n: the number of spots proposed by the user in the last 24 hours
m: the number of spots previously posted from the user’s IP in the last 20 minutes
With this formula, we prevent effective “spot bombing” by spammers
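The initialization formula is not given in the text of this slide. The sketch below only illustrates the idea under an assumed form, score = base / ((n + 1) · 2^m), which shrinks quickly when a user (n) or an IP address (m) posts in bursts; both the form and the `base` parameter are hypothetical, not the SpotRank formula.

```python
def initial_spot_score(base: float, n: int, m: int) -> float:
    """Hypothetical initial score of a newly proposed spot.

    base: starting value (assumption, e.g. the proposer's pertinence)
    n: spots proposed by the user in the last 24 hours
    m: spots previously posted from the user's IP in the last 20 minutes
    """
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    # Assumed form: linear penalty in n, exponential penalty in m
    return base / ((n + 1) * 2 ** m)


print(initial_spot_score(10, 0, 0))  # 10.0: first spot of the day
print(initial_spot_score(10, 3, 2))  # 0.625: bursty posting from one IP
```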
11/33
SpotRank Algorithm
Voting for a spot
Once a spot has been proposed, it can be “pushed” to the front page according to its score
– The base score of a vote is the pertinence of the voter
– This value is then modified according to several criteria to give the vote’s actual score
Voting is the most important part of the system, and the one spammers will concentrate on
– We propose a set of filters designed to counter all the attacks a spammer could think of
12/33
SpotRank Algorithm
Voting for a spot
Base value of a vote: pertinence
– Pert(u) is the mean pertinence of the spots u voted for
– Pert(s) is the score of s divided by the number of votes it received
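Both definitions translate directly into code. This is a sketch: the zero defaults for a spot without votes and for a user without history are assumptions, and because Pert(u) and Pert(s) depend on each other, a real system would recompute them repeatedly as scores change.

```python
from statistics import mean


def spot_pertinence(score: float, n_votes: int) -> float:
    """Pert(s): the spot's score divided by the number of votes it received."""
    if n_votes == 0:
        return 0.0  # assumption: a spot with no votes has zero pertinence
    return score / n_votes


def user_pertinence(voted_spot_pertinences: list[float]) -> float:
    """Pert(u): mean pertinence of the spots u voted for."""
    if not voted_spot_pertinences:
        return 0.0  # assumption: a user with no votes starts at zero
    return mean(voted_spot_pertinences)


# A user who voted for a spot scored 40 with 4 votes and one scored 90 with 3 votes
print(user_pertinence([spot_pertinence(40, 4), spot_pertinence(90, 3)]))  # 20.0
```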
13/33
SpotRank Algorithm
Voting for a spot
High-frequency voting
– A typical spammer votes for many spots in a short amount of time
α4 is the reasonable time interval between two votes
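The slide names the threshold α4 but not the penalty curve. A minimal sketch, assuming a linear penalty: a vote keeps full weight when at least α4 seconds have passed since the voter’s previous vote and is scaled down proportionally otherwise. The function name and the linear form are hypothetical.

```python
def frequency_factor(seconds_since_last_vote: float, alpha4: float) -> float:
    """Hypothetical high-frequency penalty in [0, 1].

    alpha4: the reasonable time interval between two votes (seconds).
    Votes at least alpha4 apart keep full weight; faster votes are scaled
    down linearly (the linear form is an assumption).
    """
    if alpha4 <= 0:
        raise ValueError("alpha4 must be positive")
    return min(1.0, max(0.0, seconds_since_last_vote) / alpha4)


print(frequency_factor(30, alpha4=60))   # 0.5
print(frequency_factor(120, alpha4=60))  # 1.0
```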
14/33
SpotRank Algorithm
Voting for a spot
Abusive one-way voting
– A typical spammer uses several accounts
One clean account to propose spots
Several disposable accounts to vote for the spots proposed by the clean account
– Votes from users who vote only for one specific user become useless
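A minimal sketch of this rule, where `proposers_voted_for` holds the proposer of each spot a user voted for. The `min_votes` threshold, below which a user is not yet judged, is an assumption added so that a brand-new account with a single vote is not automatically nullified.

```python
def one_way_factor(proposers_voted_for: list[str], min_votes: int = 5) -> float:
    """Return 0.0 if a user's votes all go to spots of one single proposer.

    proposers_voted_for: the proposer of each spot this user voted for.
    min_votes: hypothetical minimum history before the rule applies.
    """
    if len(proposers_voted_for) < min_votes:
        return 1.0  # too little history to judge (assumption)
    return 0.0 if len(set(proposers_voted_for)) == 1 else 1.0


print(one_way_factor(["clean_account"] * 8))  # 0.0: a disposable voting account
```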
15/33
SpotRank Algorithm
Voting for a spot
Quick voting
– A spammer typically proposes a spot and quickly votes for it
A spammer does not stay long on any given website
– To counter quick voting, we block every vote during the first minute after spot s appears on the site, and afterwards apply a stair function time(s)
t: current time
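Only the first-minute block is stated on the slide; the later steps of time(s) and their values are not. The sketch below is a hypothetical stair function of the spot’s age t − t_s (t_s, the time spot s appeared, is an assumed name), with illustrative thresholds at 1, 5 and 15 minutes.

```python
def time_factor(seconds_since_posted: float) -> float:
    """Hypothetical stair function time(s) for the quick-voting filter.

    seconds_since_posted: t - t_s, the age of spot s when the vote arrives.
    Only the first-minute block comes from the slides; the later steps
    and their values are assumptions.
    """
    if seconds_since_posted < 60:
        return 0.0   # votes in the first minute are blocked
    if seconds_since_posted < 5 * 60:
        return 0.25
    if seconds_since_posted < 15 * 60:
        return 0.5
    return 1.0
```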
16/33
SpotRank Algorithm
Voting for a spot
Multiple avatars and physical community
– SpotRank demotes votes for a given spot if they come from the same IP address
A typical spammer has many accounts, and sometimes also automatic voting mechanisms
These voting bots are often located on only a few servers, so they share the same IP address (or very few IP addresses)
– n: number of previous votes from this IP address
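The demotion formula is not given in the text of this slide either. As an assumption, the sketch divides the vote by n + 1, so the first vote from an IP address keeps full weight and each further vote from the same address for the same spot counts for less.

```python
def ip_factor(previous_votes_from_ip: int) -> float:
    """Hypothetical demotion for votes on one spot sharing an IP address.

    previous_votes_from_ip: n, previous votes for this spot from this IP.
    The 1 / (n + 1) form is an assumption.
    """
    if previous_votes_from_ip < 0:
        raise ValueError("vote count must be non-negative")
    return 1.0 / (previous_votes_from_ip + 1)
```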
17/33
SpotRank Algorithm
Voting for a spot
Avoiding the voting list effect
– A group of people can unite their efforts to promote their own spots
This is classically done through daily mailing lists
– If a user u votes for a user u’ and both users are in the same cluster, the value of the vote is weighted by the inverse of the size of this cluster
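This rule is fully specified on the slide, so it can be sketched directly; representing clusters as a list of sets of user names is an assumption of this example.

```python
def cluster_factor(voter: str, proposer: str, clusters: list[set[str]]) -> float:
    """Weight of a vote by `voter` for a spot proposed by `proposer`.

    If both users belong to the same cluster C, the vote is weighted by
    1 / |C|; otherwise it keeps full weight.
    """
    for cluster in clusters:
        if voter in cluster and proposer in cluster:
            return 1.0 / len(cluster)
    return 1.0


mailing_list = {"ann", "ben", "cat", "dan"}
print(cluster_factor("ann", "ben", [mailing_list]))  # 0.25
print(cluster_factor("ann", "eve", [mailing_list]))  # 1.0
```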
18/33
SpotRank Algorithm
Voting for a spot
Summary: computation of the actual score of a vote
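The exact combination is not given in the text of this slide. A minimal sketch, assuming that each filter above yields a factor in [0, 1] and that the factors multiply the voter’s pertinence (the base value of a vote); the product form is an assumption, not the published formula.

```python
import math


def vote_score(voter_pertinence: float, filter_factors: list[float]) -> float:
    """Actual score of a vote under an assumed multiplicative model.

    voter_pertinence: Pert(u), the base value of the vote.
    filter_factors: one factor per filter (frequency, one-way voting,
    quick voting, IP, cluster); a factor of 0 nullifies the vote.
    """
    return voter_pertinence * math.prod(filter_factors)


print(vote_score(20.0, [1.0, 0.5, 0.25]))  # 2.5
print(vote_score(20.0, [1.0, 0.0]))        # 0.0
```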
19/33
SpotRank Algorithm
Voting for a spot
Computation of the score of a spot
– The score of a spot is simply the sum of all votes for this spot and of its initial score
The score of a spot s is updated each time a user votes for it, and also periodically, since the time decay varies over time
– Time decay is used to promote new spots over old, strong spots
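The first part of this rule (initial score plus the sum of vote scores) comes from the slide; the decay does not. The sketch assumes an exponential decay with a hypothetical 24-hour half-life computed from the spot’s age, which is why the score would need periodic recomputation even without new votes.

```python
def spot_score(initial: float, vote_scores: list[float],
               age_hours: float, half_life_hours: float = 24.0) -> float:
    """Score of a spot: initial score plus the sum of its vote scores,
    multiplied by a hypothetical exponential time decay.

    Because the decay depends on age, the score changes even without new
    votes and must be recomputed periodically.
    """
    decay = 0.5 ** (age_hours / half_life_hours)  # assumed decay model
    return (initial + sum(vote_scores)) * decay


print(spot_score(4.0, [3.0, 3.0], age_hours=0))   # 10.0
print(spot_score(4.0, [3.0, 3.0], age_hours=24))  # 5.0
```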
20/33
SpotRank Algorithm
Detecting cabals
We propose to group together people who massively vote for one another
– We use the following algorithm, which should be run regularly to identify new cabals and update existing ones
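The cabal-detection algorithm itself is not included in the text of this slide. As a stand-in, and explicitly not the authors’ algorithm, the sketch below links two users when each has voted for the other at least `threshold` times and returns the connected components (found with union-find) that have at least `min_size` members; both thresholds are hypothetical. Groups found this way could feed the 1/|C| weighting of the voting-list filter.

```python
from collections import Counter


def detect_cabals(votes: list[tuple[str, str]], threshold: int = 3,
                  min_size: int = 3) -> list[set[str]]:
    """Stand-in cabal detection (not the SpotRank algorithm).

    votes: (voter, proposer) pairs.
    Two users are linked when each voted for the other at least
    `threshold` times; linked users are merged with union-find and
    groups of at least `min_size` users are returned.
    """
    counts = Counter(votes)
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        # counts[(b, a)] is 0 for unseen pairs; Counter does not insert it
        if a != b and n >= threshold and counts[(b, a)] >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for user in list(parent):
        groups.setdefault(find(user), set()).add(user)
    return [g for g in groups.values() if len(g) >= min_size]


votes = [("a", "b")] * 3 + [("b", "a")] * 3 + [("b", "c")] * 3 + [("c", "b")] * 3
votes += [("x", "y")] * 5  # one-sided, so not linked
print([sorted(g) for g in detect_cabals(votes)])  # [['a', 'b', 'c']]
```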
21/33
Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
– Log analysis of spotrank.fr
– Human evaluation
Conclusion
22/33
Experiments
To collect data about the behavior of SpotRank, we launched spotrank.fr
– Data were collected from 2009/07/09 to 2009/10/26
– 15,600 visits, 43,000 page views
– Average time spent by a visitor on the website: 2:37 minutes
We estimated that at least 10 to 15% of accounts belong to spammers
23/33
Experiments
Log analysis of spotrank.fr
% of users with regard to pertinence
– As time passes and the number of users grows, the pertinence of users tends to spread out more
Figure: pertinence distribution on 2009/07/23, 2009/09/08 and 2009/10/26
– Two categories of users
The non-relevant users: pertinence(u) < 10 (this group contains mainly spammers)
The relevant users: pertinence(u) > 50 (except newcomers)
24/33
Experiments
Log analysis of spotrank.fr
% of low- and high-pertinence users over time (during 3 months)
– The percentage of non-relevant users, including spammers, decreases while the percentage of relevant users increases
25/33
Experiments
Log analysis of spotrank.fr
# users versus # proposed spots
– The majority of users propose only a few spots (fewer than 3)
– A few people propose an oddly high number of spots; most of them are spammers
26/33
Experiments
Log analysis of spotrank.fr
% of users with regard to # votes
– Most users don’t vote much
– The people who vote the most are clearly the ones we suspect to be spammers
27/33
Experiments
Log analysis of spotrank.fr
# votes versus their scores
– Most votes have a very low score
– Most legitimate users seem to cast votes with scores between 5 and 50
28/33
Experiments
Human evaluation
We compared the top “stories” of spotrank.fr with those of two other major social news websites in France
Survey protocol
– Periodically collect the first five spots on each website
– Generate a webpage containing a shuffled list of the 15 news items
– Send each webpage to a volunteer, who states for each news item:
Yes: it is relevant for the news to appear on the front page of a social news website
No: it is not relevant for the news to appear on the front page of a social news website
DnK: he is not able to determine whether the news deserves to be on the front page
Err: the news was not accessible when he tried
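The sampling step of this protocol can be sketched as follows. The site names passed in are placeholders (the two other websites are not named on the slides), the function name is hypothetical, and `seed` only makes the shuffle reproducible for testing.

```python
import random
from typing import Optional

# Answer options a volunteer can give for each news item
ANSWERS = ("Yes", "No", "DnK", "Err")


def build_survey_page(top_spots: dict[str, list[str]], per_site: int = 5,
                      seed: Optional[int] = None) -> list[tuple[str, str]]:
    """Take the first `per_site` spots from each site and shuffle them.

    Returns (site, spot) pairs; the site label is kept so that answers can
    be tallied per site, but it would be hidden from the volunteer.
    """
    items = [(site, spot)
             for site, spots in top_spots.items()
             for spot in spots[:per_site]]
    random.Random(seed).shuffle(items)
    return items
```

With three sites and the default `per_site=5`, each page holds the 15 news items described above.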
29/33
Experiments
Human evaluation
# answers of each type
– The ranking given by SpotRank is of higher quality than those of the two other websites
– SpotRank’s filtering gives clearer results
30/33
Experiments
Human evaluation
Rank with regard to the number of Yes, No and DnK answers
– The user satisfaction survey clearly shows that SpotRank’s filtering is perceived to be of high quality
31/33
Outline
Introduction
Related Work
SpotRank Algorithm
Experiments
Conclusion
32/33
Conclusion
We presented a robust voting system for social news websites
– designed to demote the effect of manipulation
SpotRank clearly outperforms real competitors in a real-life web ecosystem
33/33