Analyzing Social Bookmarking Systems: A del.icio.us Cookbook
20 July, 2008
Robert Wetzker, Carsten Zimmermann, Christian Bauckhage
Workshop on Mining Social Data, ECAI 2008
21 April 2023, Dipl.-Ing. Robert Wetzker
Why this paper?
Why social bookmarking?
Provides a vast amount of user-generated annotations for web content.
Reflects the interests of millions of users.
Wisdom-of-crowds.
Research areas:
(Web-) Search
(Web-) Content classification
Ontology building
Trend detection
Recommendation
…
Outline
1. The del.icio.us bookmarking service
2. Bookmarking patterns
3. Tagging patterns
4. Social bookmarking and spam
5. Conclusions and future work
The del.icio.us bookmarking service
The growth of del.icio.us
The dataset
We recursively crawled del.icio.us tag-wise, starting with the tag “web2.0” (Oct.-Dec. 2007). From the retrieved corpus of 45 million bookmarks we extracted the 1 million most frequent users and downloaded the bookmarks of these users (Dec. 2007 – Apr. 2008). For the analysis presented here, we considered only the 142 million bookmarks obtained from the user-wise crawl.
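The user selection step can be sketched as follows. This is a minimal illustration only, not the crawler used for the corpus: the `(user, url)` record layout, the function name `most_frequent_users`, and the sample data are assumptions made for the example.

```python
from collections import Counter


def most_frequent_users(bookmarks, n):
    """Return the n users with the most bookmarks in a crawled sample.

    `bookmarks` is an iterable of (user, url) pairs; this record layout
    is an assumption made for the example.
    """
    counts = Counter(user for user, _url in bookmarks)
    return [user for user, _count in counts.most_common(n)]


# Toy stand-in for the bookmarks retrieved by the tag-wise crawl.
sample = [
    ("alice", "http://a.example"), ("alice", "http://b.example"),
    ("bob", "http://a.example"),
    ("carol", "http://c.example"), ("carol", "http://d.example"),
    ("carol", "http://e.example"),
]

# Users whose complete bookmark lists would be fetched in the second crawl.
top_users = most_frequent_users(sample, 2)
```

In the setting described above, `n` would be 1 million and the input would be the 45 million bookmarks from the tag-wise crawl.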
Corpus details
> 80% of del.icio.us
Bookmarking patterns
Top 10 most frequent URLs in the corpus
Top 10 most frequent domains in the corpus
The del.icio.us community is biased toward web community and web technology related content.
The top 1% of users contribute 22% of all bookmarks. 39% of all bookmarks link to just 1% of all URLs.
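Concentration figures of this kind can be computed as in the sketch below. The helper `top_share` and the toy data are hypothetical; passing one user name per bookmark gives the share held by the most active users, and passing one URL per bookmark gives the share held by the most popular URLs.

```python
import math
from collections import Counter


def top_share(items, fraction):
    """Share of all occurrences that falls on the top `fraction` of distinct items.

    `items` holds one entry per bookmark (for example the user who posted
    it, or the URL it points to).
    """
    counts = sorted(Counter(items).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one item, even for very small fractions.
    k = max(1, math.ceil(fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)


# Toy data: 4 users and 10 bookmarks, 6 of them posted by one user.
users = ["u1"] * 6 + ["u2"] * 2 + ["u3", "u4"]
share = top_share(users, 0.25)  # top 25% of users = 1 user
```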
The del.icio.us community pays attention to new content only for a very short period of time.
Tagging patterns
Each bookmark is labeled with 3.16 tags on average. About 7% of all bookmarks are not tagged at all.
Top 20 most frequent tags in the corpus
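The per-bookmark tagging statistics and the tag ranking can be sketched as follows. The representation of a bookmark as a plain list of tags, the function names, and the sample data are assumptions made for the example.

```python
from collections import Counter


def tagging_stats(bookmarks):
    """Return (average tags per bookmark, fraction of untagged bookmarks).

    `bookmarks` is a list with one list of tags per bookmark.
    """
    total = len(bookmarks)
    if total == 0:
        return 0.0, 0.0
    n_tags = sum(len(tags) for tags in bookmarks)
    untagged = sum(1 for tags in bookmarks if not tags)
    return n_tags / total, untagged / total


def top_tags(bookmarks, n):
    """Return the n most frequent tags with their counts."""
    counts = Counter(tag for tags in bookmarks for tag in tags)
    return counts.most_common(n)


sample = [["web2.0", "design"], [], ["design"], ["python", "design", "web2.0"]]
avg_tags, untagged_share = tagging_stats(sample)
ranking = top_tags(sample, 2)
```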
Only 700 of the 7,000,000 tags account for 50% of all labels. 55% of all tags appear only once.
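Both figures describe the skew of the tag frequency distribution. A minimal sketch of how they can be derived from tag counts is shown below; the helper names and the toy counts are hypothetical.

```python
from collections import Counter


def tags_covering(tag_counts, share):
    """Smallest number of top-ranked tags that covers `share` of all labels."""
    total = sum(tag_counts.values())
    covered = 0
    for rank, (_tag, count) in enumerate(tag_counts.most_common(), start=1):
        covered += count
        if covered >= share * total:
            return rank
    return 0  # only reached for an empty counter


def singleton_fraction(tag_counts):
    """Fraction of distinct tags that appear exactly once."""
    if not tag_counts:
        return 0.0
    return sum(1 for c in tag_counts.values() if c == 1) / len(tag_counts)


counts = Counter({"design": 5, "web": 3, "css": 1, "ajax": 1})
n_for_half = tags_covering(counts, 0.5)
once_only = singleton_fraction(counts)
```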
Trends in the del.icio.us tag distribution correlate strongly with upcoming and periodic external events.
Occurrence of 5 sample tags in 2007.
Social bookmarking and spam
Del.icio.us is highly vulnerable to spam.
19 of the top 20 users are apparently of non-human origin, accounting for 1.3 million bookmarks, around 1% of the corpus.
We find spammers to exhibit one or more of the following characteristics:
very high activity
bookmarking only few domains
high tagging rate
very low tagging rate
bulk posts
a combination of the above
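The characteristics above can be turned into simple per-user checks, as in the sketch below. This is not a spam detector from the talk: the record layout `(url, tags, timestamp)`, the function names, and every threshold value are hypothetical and would need tuning against real data.

```python
from collections import Counter
from urllib.parse import urlparse


def user_features(bookmarks):
    """Summarize one user's bookmarks, given as (url, tags, timestamp) tuples."""
    n = len(bookmarks)
    domains = {urlparse(url).netloc for url, _tags, _ts in bookmarks}
    total_tags = sum(len(tags) for _url, tags, _ts in bookmarks)
    # Bookmarks sharing one timestamp are treated as a bulk post.
    batches = Counter(ts for _url, _tags, ts in bookmarks)
    return {
        "bookmarks": n,
        "domains": len(domains),
        "avg_tags": total_tags / n if n else 0.0,
        "largest_batch": max(batches.values(), default=0),
    }


def suspicious_traits(f, max_bookmarks=10000, min_domains=3, min_activity=50,
                      max_avg_tags=20.0, min_avg_tags=0.1, max_batch=100):
    """List which of the characteristics apply (all thresholds are assumptions)."""
    traits = []
    if f["bookmarks"] > max_bookmarks:
        traits.append("very high activity")
    if f["bookmarks"] >= min_activity and f["domains"] < min_domains:
        traits.append("few domains")
    if f["avg_tags"] > max_avg_tags:
        traits.append("high tagging rate")
    if f["bookmarks"] > 0 and f["avg_tags"] < min_avg_tags:
        traits.append("very low tagging rate")
    if f["largest_batch"] > max_batch:
        traits.append("bulk posts")
    return traits


bot = [(f"http://spam.example/page{i}", [], "2008-01-01T00:00:00") for i in range(200)]
human = [
    ("http://a.example/x", ["design", "css"], "2008-01-01T10:00:00"),
    ("http://b.example/y", ["python"], "2008-01-02T11:00:00"),
]
bot_traits = suspicious_traits(user_features(bot))
human_traits = suspicious_traits(user_features(human))
```

A user showing several traits at once corresponds to the "combination of the above" case.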
The number of bookmarks and the number of users linking to a domain.
The number of user bookmarks and the average number of tags per bookmark.
The diffusion of attention
In some cases spam detection may prove computationally expensive or ambiguous.
The diffusion of attention concept reduces the effect of spam on the tag distribution without requiring actual spam detection.
We define the attention given to a tag as the number of users using the tag.
The diffusion of attention for a tag is then given by the number of users who assign the tag for the first time in a given period.
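The two definitions above can be sketched in a few lines. The `(period, user, tag)` record layout, the function names, and the toy data are assumptions made for the example; periods only need to be comparable, here month strings.

```python
from collections import defaultdict


def tag_occurrence(assignments):
    """Count raw tag assignments per tag and period."""
    counts = defaultdict(lambda: defaultdict(int))
    for period, _user, tag in assignments:
        counts[tag][period] += 1
    return {tag: dict(periods) for tag, periods in counts.items()}


def diffusion_of_attention(assignments):
    """Count, per tag and period, the users assigning the tag for the first time."""
    first_use = {}
    for period, user, tag in assignments:
        key = (user, tag)
        if key not in first_use or period < first_use[key]:
            first_use[key] = period
    diffusion = defaultdict(lambda: defaultdict(int))
    for (_user, tag), period in first_use.items():
        diffusion[tag][period] += 1
    return {tag: dict(periods) for tag, periods in diffusion.items()}


# One heavy poster uses "iphone" 100 times; two other users adopt it once each.
data = [("2007-01", "spammer", "iphone")] * 100
data += [("2007-01", "alice", "iphone"), ("2007-02", "alice", "iphone"),
         ("2007-02", "bob", "iphone")]

occurrence = tag_occurrence(data)
diffusion = diffusion_of_attention(data)
```

In this toy example the raw occurrence count for 2007-01 is dominated by the single heavy poster, while the diffusion count treats that account as one user, which is the damping effect described above.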
Tagging trends by tag occurrence.
Tagging trends by diffusion of attention.
Future work
Provide automatic and scalable spam detection methods.
Topic-aware detection of trends.
Follow-up paper:
Detecting Trends in Social Bookmarking Systems using a Probabilistic Generative Model and Smoothing, R. Wetzker, T. Plumbaum, A. Korth, C. Bauckhage, T. Alpcan, F. Metze, International Conference on Pattern Recognition (ICPR), 2008, Tampa, USA (to appear)
Questions?
Thank you.
Social bookmarking and spam
The number of bookmarks and the number of users linking to a domain.
http://d.hatena.ne.jp