Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao...

Slide 1

Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati
Arizona State University

Slide 2
Twitter is one of the most prominent micro-blogging services. It has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day. Users access tweets by following other users and by using the search function.

Slide 3: Twitter Search
Results are sorted in reverse chronological order. The single most-retweeted tweet is selected as the top tweet. No relevance metrics are applied, and the results contain spam and untrustworthy tweets.
[Screenshot: results for the query "Britney Spears".]

Slide 4: TweetRank
[Figure: Query → TweetRank → Twitter; Top K Results from Twitter, Top N Results to the user.]
TweetRank acts as a mediator between the user and Twitter. Because K is much larger than N, TweetRank can eliminate untrustworthy results before returning the top N.

Slide 5: Need for Relevance and Trust
The spread of false facts on Twitter has become an everyday event. Retweets and users can be bought, so relying on them for trustworthiness does not work.

Slide 6: Getting Relevant & Trustworthy Results
Manual curation is out of the question (unless you are the Government of China :-) ). How many curators would it take to clean up a micro-blog with 140 million active users? What about automated analysis? PageRank uses the explicit links between web pages to evaluate trust and relevance. But what are the links between tweets?

Slide 7: Links in Twitter Space
- Retweet: explicit links between tweets.
- Agreement: implicit links between tweets that contain the same fact.

Slide 8: Agreement
Agreement between two tweets is defined as the amount of similarity in their content. Retweets are not counted toward agreement, because retweets are unverified endorsements.
How does agreement capture relevance and trust? A tweet that is agreed upon by a large number of other tweets is likely to be popular, and popular tweets are more likely to be relevant.
Since agreement excludes retweets, the most-agreed-upon tweet has the largest number of independent users agreeing on the same fact, and hence it is more trustworthy.

Slide 9: Agreement Computation
To compute agreement accurately, we would need to understand the meaning of each tweet, which requires natural language processing. As a preliminary approach, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity. Soft TF-IDF is similar to TF-IDF, except that in addition to exactly matching terms, it also credits similar (but not identical) tokens in the two document vectors being compared.

Slide 10: Computing Ranked Results
A simple voting technique is used to compute the ranked results. The agreement of a tweet is the sum of its agreement with all other tweets. Tweets are sorted by this agreement vote, and the top-N results are sent to the user.
[Figure: example agreement graph over three tweets, with pairwise agreement weights on the edges and each tweet's summed agreement score.]

Slide 11: Results: Britney Spears
Side-by-side comparison of Twitter Results and TweetRank Results. Tweets shown on the slide:
- (Oops?!) Britney Spears is Engaged... Again! - its britney: http://t.co/1E9LsaH7
- In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick.
- RT @GMA: Britney Spears Engaged Again http://t.co/5Ly0lga4
- #Britney #Spears #engaged to #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney... http://t.co/PiVU
- Britney Spears engaged: http://t.co/gpQQ2S6I
- Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!
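
The ranking pipeline described on Slides 4, 9 and 10 (take K candidate tweets, score pairwise agreement, sum the agreements as votes, return the top N) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tokenizer, the TF-IDF weighting log(tf + 1) * log(N / df), the Jaro-Winkler prefix scale of 0.1 and the closeness threshold `theta = 0.9` are common Soft TF-IDF defaults rather than values taken from the slides, and averaging the two directions of the asymmetric Soft TF-IDF score into one agreement value is an added assumption. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half of that.
    transpositions, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Unit-length TF-IDF vectors with weight log(tf + 1) * log(N / df)."""
    bags = [Counter(re.findall(r"\w+", d.lower())) for d in docs]
    df = Counter(w for bag in bags for w in bag)  # number of docs containing w
    vectors = []
    for bag in bags:
        v = {w: math.log(c + 1) * math.log(len(docs) / df[w]) for w, c in bag.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vectors.append({w: x / norm for w, x in v.items()} if norm else {})
    return vectors


def soft_tfidf(vs: dict[str, float], vt: dict[str, float], theta: float = 0.9) -> float:
    """Like TF-IDF cosine, but each token in vs also matches its most similar
    token in vt when their Jaro-Winkler similarity is at least theta."""
    if not vt:
        return 0.0
    score = 0.0
    for w, weight in vs.items():
        best = max(vt, key=lambda v: jaro_winkler(w, v))
        sim = jaro_winkler(w, best)
        if sim >= theta:
            score += weight * vt[best] * sim
    return score


def rank_tweets(candidates: list[str], n: int, theta: float = 0.9):
    """Voting: a tweet's score is its summed agreement with every other
    candidate. Returns the top-n (tweet, score) pairs and all scores."""
    vecs = tfidf_vectors(candidates)
    scores = [0.0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            # Assumption: average both directions to get a symmetric agreement.
            a = 0.5 * (soft_tfidf(vecs[i], vecs[j], theta) + soft_tfidf(vecs[j], vecs[i], theta))
            scores[i] += a
            scores[j] += a
    order = sorted(range(len(candidates)), key=lambda k: scores[k], reverse=True)
    return [(candidates[k], scores[k]) for k in order[:n]], scores


if __name__ == "__main__":
    tweets = ["Britney Spears engaged to Jason Trawick",
              "Britney Spears is engaged to boyfriend Jason Trawick",
              "Britney Spears engagd to Jason Trawick",  # misspelled
              "Buy cheap followers now"]                 # unrelated spam
    top, scores = rank_tweets(tweets, n=3)
    print([text for text, _ in top])
```

In this example the misspelled tweet still agrees with the others, because Jaro-Winkler rates "engagd" against "engaged" at about 0.97, above the threshold. The spam tweet shares no close tokens with any candidate, so its agreement score is 0 and it drops out of the top 3. The nested loop over pairs also makes the quadratic ranking cost explicit.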
Slide 12: Evaluation - Relevance
The top-N results were manually labelled as follows:
- Not related to the topic, or spam: 0
- Remotely relevant to the topic: 1/3
- Tweets that have some information on the topic: 2/3
- Tweets that have a good amount of information: 1

Slide 13: Evaluation - Trust
The top-N results were manually labelled as follows:
- Untrustworthy tweets, such as spam or wrong facts
- Tweets that are opinions: 0
- Tweets that contain correct facts: 1

Slide 14: Ranking Cost
Ranking time increases quadratically with the number of tweets. Since agreement is computed pairwise, the computation can easily be parallelized using MapReduce.

Slide 15: Twitter Eco-System
[Figure: the Twitter ecosystem as a graph with links labelled Followers, Hyperlinks, Tweeted By, and Tweeted URL.]

Slide 16: Summary
We model the tweet space as a tri-layer graph containing a tweet layer, a user layer, and a web-page layer. The ranking is derived from users, tweets, and the prestige of the referred web pages. Micro-blog spamming is becoming increasingly lucrative and problematic. We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blogs.
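
The ranking-cost slide notes that agreement is pairwise and can therefore be parallelized with MapReduce. The Python sketch below illustrates that decomposition in a single process, under stated assumptions: `jaccard` is a hypothetical stand-in for the Soft TF-IDF agreement, and `map_pair`, `shuffle`, `reduce_scores` and `agreement_mapreduce` are illustrative names, not the API of any MapReduce framework. Each map call scores one pair and emits the score under both tweet ids; the shuffle groups scores by tweet id; the reduce sums them, giving the same vote totals as a nested loop over all pairs.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for the Soft TF-IDF agreement of two tweets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def map_pair(pair, tweets):
    """Map: score one tweet pair and emit the score once for each tweet."""
    i, j = pair
    score = jaccard(tweets[i], tweets[j])
    return [(i, score), (j, score)]


def shuffle(records):
    """Shuffle: group the emitted scores by tweet id."""
    groups = defaultdict(list)
    for tweet_id, score in records:
        groups[tweet_id].append(score)
    return groups


def reduce_scores(tweet_id, scores):
    """Reduce: a tweet's agreement is the sum of its pairwise scores."""
    return tweet_id, sum(scores)


def agreement_mapreduce(tweets):
    # The n*(n-1)/2 pairs are independent, so a MapReduce job could spread the
    # map calls across workers; here they simply run one after another.
    records = []
    for pair in combinations(range(len(tweets)), 2):
        records.extend(map_pair(pair, tweets))
    return dict(reduce_scores(k, v) for k, v in shuffle(records).items())
```

For example, `agreement_mapreduce(["a b", "a c", "d"])` gives tweets 0 and 1 an agreement of 1/3 each and tweet 2 an agreement of 0. The total work remains quadratic; the decomposition only lets it be spread across machines.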