Blogosphere: Research Issues, Tools and Applications · Web 2.0 Services (examples) • Blogs –...

Blogosphere: Research Issues, Tools and Applications Huan Liu and Nitin Agarwal {Huan.Liu, Nitin.Agarwal.2}@asu.edu Computer Science and Engineering Arizona State University An updated version could be downloaded from www.public.asu.edu/~huanliu/KDD08BlogosphereTutorial.pdf or www.public.asu.edu/~nagarwa6/KDD08BlogosphereTutorial.pdf


Blogosphere: Research Issues, Tools and Applications

Huan Liu and Nitin Agarwal
{Huan.Liu, Nitin.Agarwal.2}@asu.edu

Computer Science and Engineering
Arizona State University

An updated version could be downloaded from www.public.asu.edu/~huanliu/KDD08BlogosphereTutorial.pdf or www.public.asu.edu/~nagarwa6/KDD08BlogosphereTutorial.pdf


Acknowledgments

• We would like to express our sincere thanks to Magdiel Oliveras Galan, John J. Salerno, Shankar Subramanya, Sanjay Sundarajan, Lei Tang, Philip S. Yu, and Alan Zheng Zhao for collaboration, discussion, and valuable comments.

• This work is, in part, sponsored by AFOSR and ONR grants in 2008.

• This agreement covers the use of all slides of this tutorial.

– You may use these slides freely for teaching if you send us an email stating the university name and class/course number in advance, and cite this tutorial.

– If you wish to use these slides in any other ways, please contact (or email) us. The ppt version contains notes with additional information such as various sources in addition to References.


Outline

• Background: Web 2.0 and Social Networks

• Blogosphere: Definition, Types, and Comparison

• Blogosphere Research Issues

• Tools and APIs

• Data Collection

• Measures, Models, and Methods

• Performance, Evaluation, and Metrics

• Case Studies

• References


WEB 2.0 AND SOCIAL NETWORKS


Web vs. Web 2.0


Characteristics of Web 2.0

• Rich Internet Applications

• User-generated content

• User-enriched content

• User-developed widgets

• Collaborative environment: Participatory Web, Citizen journalism

• Thus, it leverages the power of the Long Tail with user generated data as the driving force

• More of a paradigm shift than a technology shift


Web 2.0 Services (examples)

• Blogs
– Blogspot
– Wordpress

• Wikis
– Wikipedia
– Wikiversity

• Social Networking Sites
– Facebook
– Myspace
– Orkut

• Digital media sharing websites
– Youtube
– Flickr

• Social Tagging
– Del.icio.us

• Others
– Twitter
– Yelp


Top 20 Most Visited Websites

• Internet traffic report by Alexa on July 29th 2008

• 40% of the top 20 websites are Web 2.0 sites

Rank  Site                 Rank  Site
1     Yahoo!               11    Orkut
2     Google               12    RapidShare
3     YouTube              13    Baidu.com
4     Windows Live         14    Microsoft Corporation
5     Microsoft Network    15    Google India
6     Myspace              16    Google Germany
7     Wikipedia            17    QQ.Com
8     Facebook             18    EBay
9     Blogger              19    Hi5
10    Yahoo! Japan         20    Google France


Social Networks

• A social structure made of nodes (individuals or organizations) that are related to each other by various interdependencies such as friendship, kinship, likes, ...

• Graphical representation
– Nodes = members
– Edges = relationships

• Various realizations
– Social bookmarking (Del.icio.us)
– Friendship networks (Facebook, MySpace)
– Blogosphere
– Media Sharing (Flickr, YouTube)
– Folksonomies


Some Related CFPs

• ACM TKDD Special Issue on Social Computing
http://www.public.asu.edu/~huanliu/acm-tkdd-sbp

• Second International Conference on Social Computing, Behavioral Modeling, and Prediction (SBP09)
http://www.public.asu.edu/~huanliu/sbp09

• SIAM International Conf on Data Mining (SDM), Sparks (Reno area), Nevada, April 30 - May 2, 2009.
http://www.siam.org/meetings/sdm09


BLOGOSPHERE: Definitions, Types, and Comparison


Blogging Phenomenon

• It’s growing fast as a new means for online communications and interactions

• A blogger could gain instant fame via his blogs

• A blogger may make a good living with her blogs

• Abundant, lucrative business opportunities

• A new political arena


Source: The New York Times

“The site, chock full of advertising, is a moneymaking machine – so much so that Ms. Armstrong and her husband have both quit their regular jobs.” The reason? The advertisers are eager to influence her 850,000 readers.

Arnold Kim, founder and senior editor of MacRumors.com.

“The site places MacRumors No. 2 on a list of the ‘25 most valuable blogs,’ …” What is the potential value? “Two of the other tech-oriented blogs on its list, …, were sold earlier this year, reportedly for sums in excess of $25 million.”


Blogosphere Growth

• “In January 2004, there were about 1 million blogs on the Internet. As of mid-2006, the population of the ‘blogosphere’ was well past 50 million and climbing.” – Paul Gillin, The New Influencers, 2007

• “36 million women participate in the blogosphere each week, and 15 million have their own blogs” – A Study by BlogHer

• Today, Front Page, NY Times: The Year of the Political Blogger Has Arrived
… both parties understand the need to have greater numbers of bloggers attend. … to bring down the walls of the convention …


Understanding Blogosphere

• Blogosphere
• Blog sites
• Bloggers
• Blog posts
• Reverse chronologically ordered entries
• Blogroll
• Permalinks
• Trackback

• Everyone can publish, but few are heard

• Many interesting questions to address
– How to build traffic
– How to find niche online
– How to increase influence
– How to …

• Fertile research domain


Blog Site


Blog Post


Blogger


Types of Blogs

• Individual vs. community
– Single authored (Individual blog sites)
– Multi authored (Community blog sites)

• Regulated vs. anonymous

Individual Blog Sites | Community Blog Sites
Owned and maintained by individual users. | Owned and maintained by a group of like-minded users.
More like personal accounts, journals or diaries. | More like discussion forums and discussion boards.
No or almost negligible group interaction. | High degree of group discussion and collaboration.
No or almost negligible collective wisdom. | Enormous collective wisdom and open source intelligence.


Blogosphere

• Complex Social Networks

• Vertices (Nodes): Bloggers/ Blog posts/Blog sites

• Edges: Relationships/Links

• In-Degree: Number of inlinks

• Out-Degree: Number of outlinks
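
The graph view above can be illustrated with a minimal sketch. It assumes the blogosphere is given as a list of directed (source, target) links between blog sites; the function name `degrees` and the site names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def degrees(links):
    """Count inlinks and outlinks per node from (source, target) link pairs."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in links:
        out_deg[src] += 1   # the source gains one outlink
        in_deg[dst] += 1    # the target gains one inlink
    return dict(in_deg), dict(out_deg)

# Hypothetical links between three blog sites
links = [("blogA", "blogB"), ("blogC", "blogB"), ("blogB", "blogA")]
in_deg, out_deg = degrees(links)
print(in_deg)   # in-degree of each linked-to site
print(out_deg)  # out-degree of each linking site
```

Because the links are directed, a site can have a high in-degree and a low out-degree at the same time, which is why the two counts are kept separately.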


Friendship Networks vs. Blogosphere

Friendship Networks | Blogosphere
Explicit Links/Edges | Implicit Links/Edges
Undirected Graph | Directed Graph
Network Centrality Measures | Blog Statistics
Quantifying Spread of Influence | Quantifying Influential Members
Nodes are members/actors | Nodes can be bloggers/blogs or blog sites
Strictly defined graph structure | Loosely defined graph structure
“Being in touch” or “Making Friends” | Sharing ideas and opinions
Person-to-person | Person-to-group
Friendship Oriented | Community Oriented
Member’s Reputation/Trust based on network connections and/or location in the network | Member’s Reputation/Trust based on the response to other member’s knowledge solicitations


Friendship Networks vs. Blogosphere

[Diagram: Social Friendship Networks and Blogosphere as overlapping regions within Social Networks]

• Social Friendship Networks: Orkut, Facebook, LinkedIn, Classmates.com, etc.

• Overlap: LiveJournal, MySpace, etc.

• Blogosphere: TUAW, Blogger, Windows Live Spaces, etc.


Citation Networks vs. Blogosphere

• Citation links
– DBLP: strict notion of links. People cite what they refer to
– Blogs: links are casual and often missing

• Social networks
– DBLP: inferred from co-authorship, citation networks
– Blogs: people explicitly specify their social network or inferred from links, comments, etc.

• Communities
– DBLP: conference venues, journals, (relatively static)
– Blogs: community blogs, inferred from blog roll (related blogs), topic taxonomy, blog-blog interaction, (very dynamic)


BLOGOSPHERE RESEARCH ISSUES


Understanding Blogosphere

• Understand structures and properties of Blogosphere

• Gain insights into the relationships between bloggers, readers, blog posts, comments, different blog sites in Blogosphere

• Models help generate artificial data, tune the parameters to simulate special scenarios, and compare various studies and different algorithms

• Study peculiarities in Blogosphere and infer latent patterns and structures that could explain certain phenomena like influence, diffusion, splogs, community discovery.


Modeling Web and Blogosphere

• Some key differences between Web and Blogosphere

– Models developed for Web assume a dense graph structure due to a large number of interconnecting hyperlinks within webpages. This assumption does not hold for Blogosphere, which has been shown to have a very sparse hyperlink structure [Kritikopoulos et al. 2006].

– The level of interaction in terms of comments and replies to blog posts makes Blogosphere different from Web

– The highly dynamic and “short-lived” nature of blog posts cannot be simulated by Web models, which do not consider dynamicity in web pages

– Web models assume webpages accumulate links over time. However, this is not true of Blogosphere

– “Categories” and “tags” give blogs flexibility that conventional websites typically don’t have

– Blog permalinks use descriptive filenames, unlike typical webpage filenames


Modeling Blogosphere

• Preferential attachment
– The probability that a new edge attaches to a node depends on that node’s degree
– “The rich get richer”
– Power law distribution or scale free distribution

$P(e : v_i \Leftrightarrow v_j) \propto \deg(v_i)$


Modeling Blogosphere

• Preferential attachment
– The probability that a new edge attaches to a node depends on that node’s degree
– “The rich get richer”
– Power law distribution or scale free distribution

$P(e : v_i \Leftrightarrow v_j) \propto \deg(v_i) / V$

• Hybrid model
– Mixture of both preferential attachment model and random model
– Give a lucky poor guy some chance to get rich
– To handle irreducibility (strong connectedness when there are a few isolated subgraphs), the random-walk-on-a-graph model proposes a random jump with a fixed probability

$P(e : v_i \Leftrightarrow v_j) \propto \alpha \deg(v_i) / V + (1 - \alpha) \beta$

• Leskovec et al. 2007 studied temporal patterns
– How often people create blog posts
– Burstiness and popularity
– How these posts are linked and what is the link density
– Developed an SIS based model

• Kumar et al. 2003 use blogrolls on the blog posts to construct a network of blog posts assuming that blogrolls contain similar blog posts
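
A small simulation can make the hybrid model concrete. The sketch below is an illustration under stated assumptions, not the procedure of any cited paper: the normalizer $V$ is taken to be the total degree of the existing nodes, $\beta$ is taken to be a uniform choice over existing nodes, and each new node adds exactly one edge. The function name `hybrid_attachment` and its parameters are hypothetical.

```python
import random

def hybrid_attachment(n_nodes, alpha, seed=0):
    """Grow a graph one node at a time and return the final degree of every node.

    With probability alpha the new node links preferentially (proportional to
    degree); otherwise it links to a uniformly random existing node.
    """
    rng = random.Random(seed)
    degree = [1, 1]  # start from a single edge between node 0 and node 1
    for new in range(2, n_nodes):
        if rng.random() < alpha:
            # Preferential step: P(target = i) is deg(v_i) / (sum of degrees)
            target = rng.choices(range(new), weights=degree)[0]
        else:
            # Random step: every existing node is equally likely (assumed beta)
            target = rng.randrange(new)
        degree[target] += 1
        degree.append(1)  # the new node arrives with its one edge
    return degree

degree = hybrid_attachment(1000, alpha=0.8, seed=42)
print(max(degree), sorted(degree)[len(degree) // 2])  # largest hub vs. median node
```

Setting `alpha=1.0` gives pure preferential attachment, while `alpha=0.0` gives the purely random model, so the "lucky poor guy" effect can be explored by varying a single parameter.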


Blog Clustering

• Dynamic and automatic organization of the content

• Convenient accessibility

• Optimizing search engines by reducing search space
– Search only the relevant cluster

• Focused crawling

• Summarization

• Topic identification

• Reduce information overload
– 175,000 new blogs per day, i.e., about 2 new blogs per second – Dec 2006

• Extraction and analysis of the trends


Blog Clustering (2)

• Brooks and Montanez 2006 used tf-idf and picked the top 3 keywords for blog posts
– Clustered blogs based on these keywords
– Reported improved clustering as compared to that using tags

$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$

$idf_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}$

$tfidf_{i,j} = tf_{i,j} \cdot idf_i$

Here $n_{i,j}$ is the number of occurrences of term $t_i$ in post $d_j$, and $D$ is the collection of posts.

• Li et al. 2007 assigned different weights to title, body, and comments of blog posts
– Need to address high dimensionality and sparsity due to their keyword-based approach

• Agarwal et al. 2008 proposed a collective-wisdom based approach
– Generate a category relation graph based on user assignments
– Compute similarity matrix from this graph
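
The tf-idf keyword selection can be sketched in a few lines. This is a minimal illustration of the formulas on this slide, not the implementation used by Brooks and Montanez; it assumes posts are already tokenized, uses the natural logarithm, and breaks ties alphabetically. The function name `top_keywords` and the toy posts are made up for the example.

```python
import math
from collections import Counter

def top_keywords(posts, k=3):
    """Return the k highest tf-idf terms for each post (posts are token lists)."""
    n_docs = len(posts)
    # Document frequency: number of posts that contain each term
    df = Counter(term for post in posts for term in set(post))
    result = []
    for post in posts:
        counts = Counter(post)
        total = sum(counts.values())
        # tf-idf = (term count / post length) * log(|D| / document frequency)
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        # Highest score first; alphabetical order makes ties deterministic
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result

# Toy posts, invented for illustration
posts = [
    "apple iphone rumor apple launch".split(),
    "election blog convention election".split(),
    "apple pie recipe blog".split(),
]
keywords = top_keywords(posts)
print(keywords)
```

Terms that appear in many posts (here "apple" and "blog") get a low idf and therefore tend to drop out of the top keywords, which is the behavior the clustering step relies on.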


Blog Mining

• Interactions between producers and consumers improved with blogs

• Consumers not only speak their mind but also broadcast their opinions

• Blogs are invaluable information sources for
– consumers’ beliefs and opinions,
– initial reaction to a launch,
– understanding consumer language,
– tracking trends and buzzwords, and
– fine-tuning information needs

• Blog conversations leave behind trails of links, useful for understanding how information flows and how opinions are shaped and influenced

• Tracking blogs also helps in gaining deeper insights


Blog Mining for Opinion

• A prototype system called Pulse [Gamon et al. 2005] uses a Naive Bayes classifier trained on manually annotated sentences with positive/negative sentiments and iterates until all unlabeled data is adequately classified.

• Another system presented in [Attardi and Simi 2006] improves blog retrieval by using opinionated words acquired from WordNet in the query proximity.

• Some well-known opinion mining and sentiment analysis techniques [B. Liu 2006] could also be borrowed from the text mining domain due to the highly textual nature of blogs.

• LingPipe (http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html) is another open source software package which performs sentiment analysis on text corpora.

– Subjective (opinion) vs. Objective (fact) sentences

– Positive (favorable) vs. Negative (unfavorable) movie reviews
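
The Naive Bayes step mentioned for Pulse can be illustrated with a minimal sketch. This is not the Pulse implementation: it shows only a supervised multinomial Naive Bayes classifier with add-one smoothing, and it leaves out the iteration over unlabeled data. The function names and the four training sentences are invented for illustration.

```python
import math
from collections import Counter

def train_nb(labeled):
    """labeled: list of (tokens, label). Returns per-label word counts and doc counts."""
    word_counts, doc_counts = {}, Counter()
    for tokens, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    return word_counts, doc_counts

def classify_nb(tokens, word_counts, doc_counts):
    """Pick the label with the highest log posterior for the given tokens."""
    vocab = {w for counts in word_counts.values() for w in counts}
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / n_docs)  # log prior
        for w in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training sentences
train = [
    ("great phone love it".split(), "positive"),
    ("love the battery great screen".split(), "positive"),
    ("terrible battery hate it".split(), "negative"),
    ("hate the screen terrible phone".split(), "negative"),
]
model = train_nb(train)
print(classify_nb("love great".split(), *model))
```

A bootstrapping loop in the spirit of the slide would repeatedly classify unlabeled sentences, add the confident ones to the training set, and retrain; that loop is omitted here to keep the sketch short.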


Influence

• Market Movers: “word-of-mouth”, trust and reputation

• Sway opinions: Government policies, campaign

• Customer Support and Troubleshooting

• Market research surveys: “use-the-views”

• Representative articles: 18.6 new blog posts per sec

• Advertising


Blog Influence

• Two types of influence
– Influential blog sites and site networks [Gill 2004, Gruhl et al 2004, Java et al 2006]
– Influential bloggers in a community [Agarwal et al. 2008]

• Blogosphere vs. Friendship Networks
– Implicit vs. Explicit links
– Blog statistics vs. Centrality measures
– “influencing” vs. “could influence”
– Loosely vs. Strictly defined graph structures

• Blog vs. Webpage Ranking
– Blog sites too sparse for webpage ranking algorithms to work [Kritikopoulos et al 2006]
– Webpage acquires authority over time, blog posts’ influence diminishes
– Greedy approach works better than PageRank, HITS to maximize influence flow [Kempe et al 2003, Richardson & Domingos 2002]


Issue of Trust

• Open standards and low barriers to publishing have created an overwhelming amount of collective wisdom

• Yet it is more difficult for readers to discern whom to trust in some cases

• Similar to WWW
– Authoritative webpages, e.g., HITS [Kleinberg 1998], PageRank [Page et al. 1999]

• Blogosphere allows the masses to create and edit content, compromising the sanctity of the original content

• Some work exists for the social friendship network domain, but not many researchers have explored Blogosphere

• Huge potential for trust study in Blogosphere domain


Trust

• Kale et al. 2007 transformed the problem of trust in blogosphere to the one in social friendship networks
– Studied propagation of trust among different blog sites

– Mined sentiments from a window of words around hyperlinks

– Identified positive, negative, or neutral sentiments towards the linked blog site

– Constructed a network of blog sites using hyperlinks

– Used the Guha et al. 2004 trust propagation algorithm

– Some concerns
• These blog sites have to be linked for trust propagation
• Trust is computed between blog sites based on how much one blog agrees or disagrees with the other

$M_{i+1} = M_i \cdot C_i$ – Perform till convergence

$M$ = Belief Matrix; $C_i$ = Atomic Propagation

$C_i = M + M^{T} M + M^{T} + M M^{T}$
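
The propagation formulas above can be traced on a tiny example. The sketch below is an illustration under assumptions, not the cited algorithm as published: the four terms of the atomic propagation are summed without weights (as written on the slide), the atomic propagation matrix is computed once from the initial belief matrix, and a fixed number of steps replaces the convergence test. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(*matrices):
    """Element-wise sum of matrices with identical shapes."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]

def atomic_propagation(m):
    """C = M + M^T M + M^T + M M^T, with unit weights on every term (assumed)."""
    mt = transpose(m)
    return add(m, matmul(mt, m), mt, matmul(m, mt))

def propagate(m, steps):
    """Apply M_{i+1} = M_i * C for a fixed number of steps."""
    c = atomic_propagation(m)
    current = m
    for _ in range(steps):
        current = matmul(current, c)
    return current

# Three blog sites: site 0 trusts site 1, and site 1 trusts site 2
belief = [[0, 1, 0],
          [0, 0, 1],
          [0, 0, 0]]
print(atomic_propagation(belief))
print(propagate(belief, 1))  # site 0 now holds a belief about site 2
```

Without weights or normalization the entries grow with every step, so a practical version would need damping or normalization before "perform till convergence" is meaningful; that choice is left open here.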


Community Extraction

• Blogosphere doesn’t have an explicit notion of communities except community blogs

• Discovering communities among individual blogs based on interaction

• Different from blog clustering
– Blog Clustering uses textual similarity
– Community extraction taps interaction and link analysis


Community Extraction

• Blogosphere doesn’t have an explicit notion of communities

• Different from blog clustering

• Researchers identify communities based on
– Links: network of hyperlinks allows identification of virtual communities
• Several studies on finding communities of webpages, like Kleinberg 1998 and Kumar et al. 1999
• While Kleinberg used the idea of hubs and authorities to explore communities of webpages, Kumar et al. extended this idea, included co-citations as a way to extract all communities on the web, and used graph theoretic algorithms to identify all instances of graph structures that reflect community characteristics.
– Content: blogs with similar content or inspired by the same event form a virtual community
• Kumar et al. 2003, Efimova and Hendrick 2005, Blanchard 2004


Community Extraction

• Chin and Chignell 2006 proposed a model for finding communities taking the blogging behavior of bloggers into account
– They aligned behavioral approaches through blog reader survey in studying blog community.

• Blanchard and Marcus 2004 studied a multiple sport newsgroup “Virtual Settlement” and analyzed the possibility of emerging virtual communities
– Newsgroups and discussion forums are similar in terms of interaction patterns to Blogosphere
– More person-to-group interaction rather than person-to-person interaction


Spam blog (Splogs) Filtering

• One of the major rising concerns on Blogosphere

• Spammers make most of their money by getting viewers to click on ads that run adjacent to their nonsensical text

• Open standards and low barriers to publishing escalate the problem and make it more challenging to solve

• Besides degrading search quality, splogs also affect network resources


• Initial research applied web spam detection approaches
– Ntoulas et al. 2006 distinguish between normal web pages and spam webpages based on statistical properties like
• number of words, average length of words, anchor text, title keyword frequency, tokenized URL
– Gyongyi et al. 2004, Gyongyi et al. 2006 use PageRank to compute the spam score of a webpage

• Kolari et al. 2006 consider each blog post as a static webpage and use both content and hyperlinks to classify a blog post as spam using an SVM based classifier


Spam blog (Splogs) Filtering

• Some critical differences between web spam detection and splog detection
– The content on blog sites is very dynamic as compared to that of web pages, so content based spam filters are ineffective
– Moreover, spammers can copy the content from some regular blog posts to evade content based spam filters
– Link based spam filters can easily be beaten by creating legitimate links

• Lin et al. 2007 consider the temporal dynamics of blog posts and propose a self similarity based splog detection algorithm based on characteristic patterns found in splogs like
– Regularities or patterns in posting times of splogs,
– Content similarity in splogs, and
– Similar links in splogs.
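
The first pattern, regularity in posting times, can be illustrated with a simple stand-in measure. The sketch below is not the self-similarity analysis of Lin et al.; it swaps in the coefficient of variation of the gaps between consecutive posts, where values near zero suggest machine-like regularity. The function name `posting_regularity` and the timestamps are invented for illustration.

```python
from statistics import mean, pstdev

def posting_regularity(timestamps):
    """Coefficient of variation of gaps between consecutive posts (lower = more regular).

    Returns None when there are too few posts to judge.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)

# Timestamps in seconds, invented for illustration
machine_like = [0, 3600, 7200, 10800, 14400]   # exactly one post per hour
human_like = [0, 500, 40000, 41000, 200000]    # irregular bursts and pauses

print(posting_regularity(machine_like))
print(posting_regularity(human_like))
```

A real detector would combine such a temporal feature with the content and link similarity signals listed above, and would need a threshold or classifier tuned on labeled blogs.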


Opinion and Sentiment Analysis

• BLEWS (http://research.microsoft.com/projects/blews/blews.aspx)
– Using Blogs to Provide Context for News Articles
– Political views: Liberal vs. Conservative
– Emotional charge

• SKEWZ (http://www.skewz.com/)
– Reveal bias in news story (articles)
– Users rate the story on a scale from Liberal to Conservative
– Readers vote

• Opinion mining in legal blogs [Conrad and Schilder, 2007]
– Collected blogs on legal search tools
– N-gram Language modeling approach to determine
• Subjectivity of text
• Polarity of text
• Degree of polarity


TOOLS AND APIS


Analysis and Visualization Tools

• Tools
– Data Analysis & Visualization tools
– Statistics like centrality measures

• NetLogo (http://ccl.northwestern.edu/netlogo/)
– Multi-agent programming language and modeling environment designed in Logo
– Modelers can give instructions to hundreds or thousands of concurrently operating autonomous agents.
– Exploring the connection between the individuals (micro-level) and the patterns that emerge from the interaction of many individuals (macro-level).


Analysis and Visualization Tools

• StarLogo (http://education.mit.edu/starlogo/)
– An extension of Logo
– It is used to model the behavior of decentralized systems like social networks.

• REPAST (http://repast.sourceforge.net/)
– Recursive Porous Agent Simulation Toolkit
– Agent-based social network modeling toolkit.
– It has libraries for genetic algorithms, neural networks, etc. and allows users to dynamically access and modify agents at run time.

• Swarm (http://www.swarm.org/wiki/Main_Page)
– A multi-agent simulation package
– Simulates social or biological interaction of agents and their emergent collective behavior.


Analysis and Visualization Tools

• UCINet (http://www.analytictech.com/)
– Package for the analysis of social network data including centrality measures, subgroup identification, role analysis, elementary graph theory, and permutation-based statistical analysis
– Has strong matrix analysis routines, such as matrix algebra and multivariate statistics

• Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/)
– Slovenian for spider
– Analyzing and visualizing large networks like social networks

• Network package in R (http://cran.r-project.org/src/contrib/Descriptions/network.htm)
– The network class can represent a range of relational data types, and supports arbitrary vertex/edge/graph attributes
– This is used to create and/or modify the network objects and is used for social network analysis (SNA)


Analysis and Visualization Tools

• InFlow (http://www.orgnet.com/inflow3.html)
– Integrated product for network analysis and visualization
– Used in the SNA domain

• NetMiner (http://www.netminer.com/)
– Tool for exploratory network data analysis and visualization
– NetMiner lets users explore network data visually and interactively, and helps in detecting underlying patterns and structures of the network


APIs

• APIs
– Data collection (blog posts, inlinks, tags, etc.)

– Technorati

– Digg

– del.icio.us

– Facebook

– StumbleUpon


Technorati API

• bloginfo query

API url: http://api.technorati.com/bloginfo?key=[apikey]&url=[blog url]

Sample response:

<result>
  <url>[URL]</url>
  <weblog>
    <name>[blog name]</name>
    <url>[blog URL]</url>
    <rssurl>[blog RSS URL]</rssurl>
    <atomurl>[blog Atom URL]</atomurl>
    <inboundblogs>[inbound blogs]</inboundblogs>
    <inboundlinks>[inbound links]</inboundlinks>
    <lastupdate>[date blog last updated]</lastupdate>
    <rank>[blog ranking]</rank>
    <lang></lang>
    <foafurl>[blog foaf URL]</foafurl>
  </weblog>
</result>
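A response in this shape can be read with a standard XML parser. The sketch below is only an illustration: it parses a made-up string that follows the bloginfo layout shown above, makes no network request (the endpoint may no longer be available), and the function name `parse_bloginfo` and all sample values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response in the bloginfo layout; every value here is made up.
SAMPLE = """<result>
  <url>http://example.com/blog</url>
  <weblog>
    <name>Example Blog</name>
    <url>http://example.com/blog</url>
    <inboundblogs>12</inboundblogs>
    <inboundlinks>34</inboundlinks>
    <rank>5678</rank>
  </weblog>
</result>"""

def parse_bloginfo(xml_text):
    """Extract a few weblog statistics from a bloginfo-style response."""
    weblog = ET.fromstring(xml_text).find("weblog")
    return {
        "name": weblog.findtext("name"),
        "inbound_blogs": int(weblog.findtext("inboundblogs")),
        "inbound_links": int(weblog.findtext("inboundlinks")),
        "rank": int(weblog.findtext("rank")),
    }

info = parse_bloginfo(SAMPLE)
print(info)
```

The inbound-link and rank fields are the ones most often used as popularity statistics for a blog.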


Technorati API

• BlogPostTags query

API url: http://api.technorati.com/blogposttags?key=[apikey]&url=[blog url]

Sample response:

<document>
  <result>
    <querycount>[limit parameter]</querycount>
  </result>
  <item>
    <tag>[tag name]</tag>
    <posts>[tag count]</posts>
  </item>
</document>


Digg API

• List Stories

API url: http://services.digg.com/stories/popular?domain=engadget.com&count=10&min_submit_date=[epoch(07/01/2008)]&max_submit_date=[epoch(07/15/2008)]&appkey=[appkey]

Sample response:


<story id="7511382" link="http://www.engadget.com/2008/07/15/dev-team-shows-off-video-of-worlds-first-jailbroken-iphone-3g/" submit_date="1216139955" diggs="623" comments="38" promote_date="1216186807" status="popular" media="news" href="http://digg.com/apple/World_s_First_Jailbroken_iPhone_3G">
  <title>World's First Jailbroken iPhone 3G</title>
  <description>We can't say this is a surprise... but it is sweet to see. The iPhone Dev Team has added a video to their blog showing off the latest version of their upcoming PwnageTool 2.0, along with a video of what they claim is the "world's first" jailbroken iPhone 3G.</description>
  <user name="jordankasteler" icon="http://digg.com/users/jordankasteler/l.png" registered="1172914233" profileviews="8344" fullname="Jordan Kasteler"/>
  <topic name="Apple" short_name="apple"/>
  <container name="Technology" short_name="technology"/>
  <thumbnail originalwidth="500" originalheight="378" contentType="image/jpeg" src="http://digg.com/apple/World_s_First_Jailbroken_iPhone_3G/t.jpg" width="80" height="80"/>
</story>
…

This Digg API query lists the 10 most popular stories from http://www.engadget.com submitted between 1 July 2008 and 15 July 2008.
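The date bounds in the query are Unix epoch seconds. A minimal sketch of how such a URL could be assembled follows; it only builds the string and sends nothing, since the endpoint quoted on the slide may no longer be served. The helper names and the placeholder application key are assumptions, and dates are interpreted as midnight UTC.

```python
import calendar
from urllib.parse import urlencode

def epoch_utc(year, month, day):
    """Seconds since 1970-01-01 00:00 UTC for midnight on the given date."""
    return calendar.timegm((year, month, day, 0, 0, 0))

def popular_stories_url(domain, count, start, end, appkey):
    """Build a 'popular stories' query string in the form shown on the slide."""
    params = {
        "domain": domain,
        "count": count,
        "min_submit_date": epoch_utc(*start),
        "max_submit_date": epoch_utc(*end),
        "appkey": appkey,  # placeholder value, not a real key
    }
    return "http://services.digg.com/stories/popular?" + urlencode(params)

url = popular_stories_url("engadget.com", 10, (2008, 7, 1), (2008, 7, 15), "my-app-key")
print(url)
```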


del.icio.us API

https://api.del.icio.us/v1/tags/get

Returns a list of tags and number of times used

Sample response

<tags>
  <tag count="1" tag="activedesktop" />
  <tag count="1" tag="business" />
  <tag count="3" tag="radio" />
  <tag count="5" tag="xml" />
  <tag count="1" tag="xp" />
  <tag count="1" tag="xpi" />
</tags>
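As a small illustration, the sample response above can be turned into a tag-frequency table with a standard XML parser. The function name `tag_counts` is an assumption; the data is the slide's own sample.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tags>
  <tag count="1" tag="activedesktop" />
  <tag count="1" tag="business" />
  <tag count="3" tag="radio" />
  <tag count="5" tag="xml" />
  <tag count="1" tag="xp" />
  <tag count="1" tag="xpi" />
</tags>"""

def tag_counts(xml_text):
    """Map each tag name to the number of times it was used."""
    root = ET.fromstring(xml_text)
    return {t.get("tag"): int(t.get("count")) for t in root.iter("tag")}

counts = tag_counts(SAMPLE)
# The two most frequently used tags, most frequent first.
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```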


DATA COLLECTION


Some Available Datasets

• Nielsen Buzzmetrics dataset (http://www.icwsm.org/format.txt)

– ~ 14M blog posts from 3M blog sites collected by Nielsen BuzzMetrics in May 2006

– 1.7M blog-blog links

– Up to a half of the blog outlinks are missing

– 51% of the total blog posts are in English

• Enron Email dataset (http://www.cs.cmu.edu/~enron/)

– Emails from about 150 users

– The corpus contains a total of about 0.5M messages

– People have studied the social networks between users based on link construction

– Links are constructed based on email senders and recipients


Available Datasets (2)

• TREC (http://ir.dcs.gla.ac.uk/test_collections/blog06info.html)

– A crawl of Feeds, and associated Permalink and homepage documents (from late 2005 and early 2006)

– 100,649 feeds were polled once a week for 11 weeks

– Total number of feeds collected: 753,681

– Average feeds collected every day: 10,615

– Uncompressed size: 38.6 GB; compressed size: 8.0 GB

– Reasonably sized spam component for added realism

– Fee: £400 ~ $794.36


Available Datasets (3)

• Mobile Network (http://kdl.cs.umass.edu/data/msn/msn-info.html)

– 27 objects
– Over 180,000 links
– 1 object attribute
– 2 link attributes

• Other ways
– Crawl blogs
– Blogcatalog
– Statistics available from the Technorati API
– Tagging available from the del.icio.us API


Data Crawler

• BlogTrackers
– User interface to crawl blog sites
  • Scratch crawling (from blog archives)
  • Incremental crawling (from RSS feeds)
– Stores the blog posts in Microsoft SQL Server
– Collects
  • Blog post title
  • Blog post tags
  • Blog post content
  • Blog post permalink
  • Outlinks
  • Inlinks
  • Blogger name
  • Blog post date and time
– Tracks blog posts, e.g., generates tag clouds for a user-specified time window


Collectable Statistics from Blogs

• Inbound links
– Blogs, blog post, webpage
• Outbound links
– Blogs, blog post, webpage
• Comments
• Blog server logs
• Subscribers
• Time to read/length
• Links to post and incoming traffic from them
• Links from post and outgoing traffic to them
• Topic frequency score
• Blogroll links
• Tagged urls (del.icio.us, furl)


Citation Dataset

• DBLP (http://kdl.cs.umass.edu/data/dblp/dblp-info.html)

– Over 1,200,000 objects

– Over 2,480,000 links

– 12 object attributes

– 6 link attributes

– 910 MB


MEASURES, MODELS, AND METHODS


Measures, Models, and Methods

• Centrality measures
• Mathematical models: random, scale-free, preferential attachment, hybrid, cascade
• Content analysis techniques
• Link analysis
• Supervised/unsupervised learning algorithms
• Decision theoretic approaches
• Agent-based modeling


Centrality Measures

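• Degree centrality
– Defined as the number of ties a node has
– For a directed network
  • Indegree ~ “popularity”
  • Outdegree ~ “gregariousness”
– O(V^2) for V vertices in a dense network
– O(E) for E edges in a sparse network

$$C_d(v) = \{\, e : M_{adj}(v, v_j) \neq 0,\ \forall j \,\}$$

Here $M_{adj}$ is the adjacency matrix, and the degree of v is the size of this set of ties.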
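A minimal sketch of degree counting from an adjacency matrix, matching the O(V^2) dense-network bound above. It assumes the matrix convention `adj[i][j] != 0` for an edge from vertex i to vertex j; the function name is illustrative.

```python
def degree_centrality(adj):
    """Return (indegree, outdegree) lists for a directed adjacency matrix.

    adj[i][j] != 0 means there is an edge from vertex i to vertex j.
    """
    n = len(adj)
    outdeg = [sum(1 for j in range(n) if adj[i][j] != 0) for i in range(n)]
    indeg = [sum(1 for i in range(n) if adj[i][j] != 0) for j in range(n)]
    return indeg, outdeg

# Vertex 0 links to 1 and 2; vertex 1 links to 2.
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
indeg, outdeg = degree_centrality(adj)
print(indeg, outdeg)
```

In this toy network vertex 2 is the most "popular" (highest indegree) and vertex 0 the most "gregarious" (highest outdegree).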


Centrality Measures

• Betweenness centrality
– A centrality measure of a vertex within a graph
– Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not
– Such vertices act as a “broker” or “bridge”
– O(V^3) complexity
– O(V^2 log V + VE) for a sparse network

$$C_B(v) = \sum_{\substack{s \neq v \neq t \in V \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

$\sigma_{st}$ is the number of geodesic (shortest) paths between s and t; $\sigma_{st}(v)$ is the number of those paths that pass through v.
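The sketch below computes betweenness for an unweighted graph. It uses Brandes' accumulation scheme, which is a choice made here for brevity and is not named in the slides. It assumes the graph is a dict mapping every vertex to its neighbours, and it counts every ordered pair (s, t), so each unordered pair of an undirected graph contributes twice.

```python
from collections import deque

def betweenness(graph):
    """Betweenness of every vertex in an unweighted graph (ordered pairs)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path fractions from the farthest vertices back towards s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# A three-vertex path a - b - c: only b lies between other vertices.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
bc = betweenness(path)
print(bc)
```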


Centrality Measures

• Closeness centrality
– A centrality measure of a vertex within a graph
– Vertices that tend to have short geodesic distances to other vertices within the graph have higher closeness.
– Defined as the mean geodesic distance between a vertex v and all other reachable vertices
– O(V^3) complexity

$$C_C(v) = \frac{\sum_{t \in V \setminus v} d_G(v, t)}{n - 1}$$
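For an unweighted graph, the mean geodesic distance from one vertex can be obtained with a breadth-first search. The sketch below is an illustration under two assumptions: the graph is a dict of neighbour lists, and the mean is taken over the vertices actually reached (which equals n - 1 when the graph is connected).

```python
from collections import deque

def mean_geodesic_distance(graph, v):
    """Mean shortest-path length from v to every other reachable vertex."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for u, d in dist.items() if u != v]
    return sum(others) / len(others) if others else float("inf")

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(mean_geodesic_distance(path, "a"), mean_geodesic_distance(path, "b"))
```

The middle vertex b has the smaller mean distance, so it is the "closest" vertex of the path.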


Centrality Measures

• Eigenvector centrality
– Measure of the importance of a node in a network
– Assigns relative scores to all nodes in the network
– Better to connect to more “popular” nodes than less “popular” ones
– Google's PageRank is a variant of the Eigenvector centrality measure

$$x_i = \frac{1}{\lambda} \sum_{j=1}^{N} A_{i,j}\, x_j \quad \text{or} \quad \mathbf{x} = \frac{1}{\lambda} A \mathbf{x}$$
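One common way to approximate the leading eigenvector is power iteration. The sketch below is an illustration, not the slides' method: it assumes a symmetric 0/1 adjacency matrix of a connected graph, and it multiplies by A + I rather than A, a choice made here so the iteration does not oscillate on bipartite graphs (the eigenvectors are unchanged by the shift).

```python
def eigenvector_centrality(adj, iterations=100):
    """Approximate the leading eigenvector, scaled so the largest score is 1."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # One multiplication by (A + I): each score plus its neighbours' scores.
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(nxt)
        x = [value / norm for value in nxt]
    return x

# Path a - b - c: the middle vertex is connected to both ends.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = eigenvector_centrality(adj)
print(scores)
```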


Mathematical Models

• Power law
– Polynomial relationship with scale invariance
– a and α are constants > 1

$$f(x) = a x^{-\alpha}$$

[Figure: power law plot, and log-log plot of the power law]
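Both properties mentioned on this slide follow directly from the form of f. Rescaling x by a constant c only rescales f by a constant factor (scale invariance), and taking logarithms gives a straight line with slope -α, which is why the log-log plot is linear:

```latex
f(cx) = a\,(cx)^{-\alpha} = c^{-\alpha}\, a\, x^{-\alpha} = c^{-\alpha} f(x),
\qquad
\log f(x) = \log a - \alpha \log x .
```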


Mathematical Models

• Power law
– Examples: fractals, inverse square law, Zipf's law, Pareto rule, etc.
– Two aspects of real networks (e.g., social networks, blog networks, World Wide Web, biological networks, etc.) make power law models an appropriate choice as compared to random models
  • The number of nodes (N) in real networks is not static
  • Most real networks exhibit preferential connectivity.


Mathematical Models

• Random
– Random network models assume the probability that two vertices are connected is random and uniform

$$P(e : v_i \Leftrightarrow v_j) \propto \beta, \quad 0 \le \beta \le 1$$

• Preferential attachment
– For example, a newly created webpage will be more likely to include links to well-known documents with already high connectivity
– Thus the probability with which a new vertex connects to the existing vertices is not uniform
– This property of power law models is known as preferential attachment

$$P(e : v_i \Leftrightarrow v_j) \propto \deg(v_i) / V$$

• Hybrid
– Pennock et al. (2002) have shown the relative importance of hybrid models in simulating social networks
– Determine the appropriate proportion of random and scale-free networks

$$P(e : v_i \Leftrightarrow v_j) \propto \alpha \deg(v_i) / V + (1 - \alpha)\beta$$
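To see how the mixing proportion α moves the hybrid model between its two extremes, the sketch below turns the hybrid weight α·deg(v_i)/V + (1 − α)·β into a normalized probability over existing vertices. The function name and the default β = 1 are assumptions made for illustration; the degree list must contain at least one positive weight.

```python
def attachment_probabilities(degrees, alpha, beta=1.0):
    """Probability that a new vertex links to each existing vertex.

    Weights follow the hybrid form alpha * deg(v_i) / V + (1 - alpha) * beta
    and are then normalized to sum to 1.
    """
    V = len(degrees)
    weights = [alpha * d / V + (1 - alpha) * beta for d in degrees]
    total = sum(weights)
    return [w / total for w in weights]

degrees = [3, 1, 1, 1]
uniform = attachment_probabilities(degrees, alpha=0.0)       # purely random
preferential = attachment_probabilities(degrees, alpha=1.0)  # purely degree-driven
mixed = attachment_probabilities(degrees, alpha=0.5)
print(uniform, preferential, mixed)
```

With α = 0 every vertex is equally likely; with α = 1 the high-degree vertex attracts half of the new links in this example.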


Mathematical Models

• Cascade
– Models information diffusion across the network
– Linear threshold model
  • Assumes a linear relation between influencing and influenced nodes
  • Defines the influencing capacity and tolerance limit of each node
  • If the sum of the influencing capacities of the neighboring nodes exceeds the tolerance limit of a node, then that node gets influenced
– Independent cascade model
  • Treats the flow of influence as a cascade of events
  • An event represents a node being influenced
  • Each node is assigned an influencing probability
  • If node v influences node w, then w gets influenced at time t+1. No more attempts are made by v to influence w
  • The algorithm terminates when it is not possible to influence any more nodes
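A minimal simulation sketch of the independent cascade model described above. It makes one simplifying assumption that differs from the slide: a single influencing probability `prob` is used for every node instead of a per-node value. The graph format (a dict mapping each node to the nodes it can influence) and the function name are also assumptions.

```python
import random

def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run and return the set of influenced nodes."""
    rng = rng or random.Random()
    influenced = set(seeds)
    frontier = list(seeds)            # nodes influenced at the current time step
    while frontier:                   # stop when nobody new can be influenced
        next_frontier = []
        for v in frontier:
            for w in graph.get(v, ()):
                # v appears in a frontier only once, so it gets one attempt on w.
                if w not in influenced and rng.random() < prob:
                    influenced.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return influenced

chain = {"a": ["b"], "b": ["c"], "c": [], "d": []}
print(independent_cascade(chain, {"a"}, prob=1.0))
print(independent_cascade(chain, {"a"}, prob=0.0))
```

Averaging the size of the returned set over many runs gives an estimate of the expected spread from a given seed set.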


Content Analysis Techniques

• Blogs have rich textual content

• People not only create new content, they also enrich the existing content by providing metadata such as labels and tags

• Human-generated tags are also called folksonomies

• State-of-the-art content analysis techniques could be used for basic clustering, classification of the blog posts/blog sites


Content Analysis Techniques

• tf-idf could be used for indexing the blog entries

• Folksonomies could be considered as class labels

• Supervised machine learning could be performed and learned models could be used to predict the tags of unlabeled corpus

• This forms an essential concept for semi-automatically generating tag clouds with minimal human intervention.
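A small sketch of tf-idf weighting for tokenized blog posts. Many tf-idf variants exist; this one uses relative term frequency and idf = ln(N / df), which is a choice made here for illustration. The sample posts are made up, and every document is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

posts = [
    "iphone jailbreak video".split(),
    "iphone review".split(),
    "samurai sword history".split(),
]
w = tf_idf(posts)
print(w[0])
```

Terms that occur in few posts (such as "jailbreak") receive higher weights than terms shared across posts (such as "iphone"), which is what makes the weights useful as index features for a tag classifier.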


Link Analysis

• Directed graph representation of blogs
• Links form the edges of this graph
– Incoming links (inlinks)
– Outgoing links (outlinks)
• Link analysis helps in understanding several interesting phenomena of social networks.
• Text around the links gives us knowledge about the linked blog posts.
• Based on the links, hubs and authorities could be discovered.
• This approach could lead to the identification of expert(s) within communities.
• Link traversal: O(d^h) for average outdegree d and h hops


Use of Link Analysis

• Sparsity in the link structure of social networks makes it different from the World Wide Web model

• Many of them like Blogosphere assume implicit link information among bloggers

• Links could be constructed using the topic analysis

• Blog posts talking about the same topic could be connected
– Supervised learning algorithms could be used to predict topics of unlabeled blogs


Decision Theoretic Approaches

• Group-individual interaction and the effect of decision on an individual and/or a community as a whole.

• Decision theory studies what is the best possible decision to take given a fully informed decision maker.

• In social networks, find the node that is best placed to make decisions with the least possible side-effects and maximum possible gains for the rest of the nodes.
– Finding a node that has maximum information diffusion across

• The analysis of such social decisions is dealt with through game theory.


Agent-based Modeling

• Each node in a social network can be treated as an agent [Sallach and Macal, 2001]

• This agent could be a blogger in the blogosphere

• Decision making ability of the agent can be modeled probabilistically

• This can help us in studying the factors that affect his/her blogging behavior, what and how (s)he makes decisions

• Neural networks or genetic algorithms could also be used to train the model of these agents to closely simulate real-world scenario [Axelrod and Tesfatsion, 2005]


PERFORMANCE, EVALUATION, AND METRICS


Performance

• Does a project make any difference? We need to compare against
– Previously proposed model(s)
– Baseline model(s)
• Basic criteria
– Efficiency (speed, scalability)
– Correctness (get what you aim to get)
• Traditional data mining/machine learning performance criteria
– Precision
– Recall
– F-measure
– Area under ROC curve
– Inter and intra cluster distances
• Often we assume some ground truth
• Training-testing models work on this assumption

[Figure: the total number of examples divided into Train and Test portions]


Evaluation Challenges in Blogosphere

• Concepts like influence, trust in Blogosphere can be subjective and often change based on particular needs

• No ground truth available

• Typical training-testing models may not work

• Often resort to human evaluation and surveys
– How to select subjects, and how many would suffice

– How big is the evaluation budget, how long is the duration

• Need to figure out objective ways of evaluation


Evaluation and Metrics

• Various tasks may require different ways of performance evaluation
– Blog search and retrieval
– Clustering
– Classification
– Spam blogs
– Diffusion
– Influence

• We provide some illustrative examples next.


Blog Search and Retrieval

• Precision and Recall
– Typically evaluated on unordered sets of documents
– Top k results generate k sets for different values of k
– P and R evaluated at different top k

Recall   Interpolated Precision
0.0      1.00
0.1      0.67
0.2      0.63
0.3      0.55
0.4      0.45
0.5      0.41
0.6      0.36
0.7      0.29
0.8      0.13
0.9      0.10
1.0      0.08

• Interpolated Precision
– Defined as the highest precision found at any recall level r′ at or above the given recall r
– Red line in the graph above shows the interpolated precision

$$p_{interp}(r) = \max_{r' \geq r} p(r')$$
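The definition of interpolated precision translates into a few lines of code. The sketch below assumes the input is a list of (recall, precision) points measured at successive rank cut-offs; the sample points are made up and are not the values from the table above.

```python
def interpolated_precision(points, recall_levels):
    """Highest precision at any recall >= r, for each requested level r.

    points: (recall, precision) pairs measured at each rank cut-off.
    """
    result = []
    for r in recall_levels:
        candidates = [p for rec, p in points if rec >= r]
        result.append(max(candidates) if candidates else 0.0)
    return result

# Hypothetical measurements at ranks 1..5 of one result list.
points = [(0.2, 1.0), (0.2, 0.5), (0.4, 0.67), (0.4, 0.5), (0.6, 0.6)]
levels = [0.0, 0.2, 0.4, 0.6, 0.8]
interp = interpolated_precision(points, levels)
print(interp)
```

Because the maximum is taken over all higher recall levels, the interpolated values never increase as recall grows.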


Blog Search and Retrieval

• Mean Average Precision (MAP)
– Average of the precision scores after each relevant document retrieved for each query
– Mean of the individual average precision scores for all the queries q ∈ Q

– Gives both precision and recall oriented aspects

– Generates a single value for the set of queries

– Less obvious interpretation than other measures

$$MAP(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} P(R_{jk})$$
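A short sketch of MAP over several queries. It assumes each query's results are given as a list of 0/1 relevance flags in rank order, and that every relevant document appears somewhere in that list (so the number of relevant documents per query equals the number of 1s). The function names and sample runs are illustrative.

```python
def average_precision(ranked_relevance):
    """Average of the precision values at the rank of each relevant result."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """Mean of the per-query average precision scores."""
    return sum(average_precision(r) for r in runs) / len(runs)

runs = [
    [1, 0, 1, 0],   # relevant results at ranks 1 and 3
    [0, 1],         # relevant result at rank 2
]
ap_first = average_precision(runs[0])
map_score = mean_average_precision(runs)
print(ap_first, map_score)
```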


Measuring a Ranked List

• Normalized Discounted Cumulative Gain (NDCG)

• Measuring relevance of returned search result

• Multi levels of relevance (r): irrelevant (0), borderline (1), relevant (2)

• Each relevant document contributes some gain to be cumulated

• Gain from low ranked documents is discounted

• Normalized by the maximum DCG

$$CG(d_1, \ldots, d_n) = \sum_{i=1}^{n} r_i$$

$$DCG(d_1, \ldots, d_n) = r_1 + \sum_{i=2}^{n} \frac{r_i}{\log_2 i}$$

$$MaxDCG = R_1 + \sum_{i=2}^{n} \frac{R_i}{\log_2 i}$$

$$NDCG(d_1, \ldots, d_n) = DCG(d_1, \ldots, d_n) / MaxDCG$$


NDCG - Example

4 documents: d1, d2, d3, d4

     Ground Truth      Ranking Function 1   Ranking Function 2
i    Document   r_i    Document   r_i       Document   r_i
1    d4         2      d3         2         d3         2
2    d3         2      d4         2         d2         1
3    d2         1      d2         1         d4         2
4    d1         0      d1         0         d1         0

$$DCG_{GT} = 2 + \frac{2}{\log_2 2} + \frac{1}{\log_2 3} + \frac{0}{\log_2 4} = 4.6309$$

$$DCG_{RF1} = 2 + \frac{2}{\log_2 2} + \frac{1}{\log_2 3} + \frac{0}{\log_2 4} = 4.6309$$

$$DCG_{RF2} = 2 + \frac{1}{\log_2 2} + \frac{2}{\log_2 3} + \frac{0}{\log_2 4} = 4.2619$$

$$MaxDCG = DCG_{GT} = 4.6309$$

NDCG_GT = 1.00, NDCG_RF1 = 1.00, NDCG_RF2 = 0.9203
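The example can be checked with a few lines of code. This sketch follows the slide's DCG definition (the first relevance score undiscounted, later ones divided by log2 of their rank) and takes the ideal ordering to be the relevance scores sorted in descending order; the function names are illustrative.

```python
import math

def dcg(relevances):
    """r_1 plus the sum over i >= 2 of r_i / log2(i)."""
    return sum(
        r if i == 1 else r / math.log2(i)
        for i, r in enumerate(relevances, start=1)
    )

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

rf1 = [2, 2, 1, 0]   # relevance scores in the order returned by Ranking Function 1
rf2 = [2, 1, 2, 0]   # relevance scores in the order returned by Ranking Function 2
print(round(dcg(rf1), 4), round(dcg(rf2), 4), round(ndcg(rf2), 4))
```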


Comparing Two Ranked Lists

• Rank correlation
– Spearman’s rank correlation coefficient

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

– Example

X_i   Y_i   rank x_i   rank y_i   d_i   d_i^2
86    0     1          1          0     0
97    20    2          6          -4    16
99    28    3          8          -5    25
100   27    4          7          -3    9
101   50    5          10         -5    25
103   29    6          9          -3    9
106   7     7          3          4     16
110   17    8          5          3     9
112   6     9          2          7     49
113   12    10         4          6     36

ρ = 1 - (6*194)/(10*(10^2 - 1)) = 1 - 1164/990 ≈ -0.176
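The example's arithmetic can be reproduced from the raw X and Y columns. The sketch assumes there are no tied values (true for this data) and ranks the smallest value as 1; the helper names are illustrative.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank differences."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

xs = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
ys = [0, 20, 28, 27, 50, 29, 7, 17, 6, 12]
rho = spearman_rho(xs, ys)
print(round(rho, 3))
```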


Concordance between a Pair

• Rank correlation
– Range [-1, 1]: perfect agreement = 1, perfect disagreement = -1
– Kendall tau rank correlation coefficient

$$\tau = \frac{4P}{n(n-1)} - 1$$

where P is the number of concordant pairs: for each person, count the people listed to the right who have a higher rank.

– Example

Person           A  B  C  D  E  F  G  H
Rank by Height   1  2  3  4  5  6  7  8
Rank by Weight   3  4  1  2  5  7  8  6

P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22
τ = (4*22)/(8*7) - 1 = (88/56) - 1 = 0.57
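The count P and the resulting τ can be verified with a short function. It assumes the items are already listed in the order of the first ranking (height here), that the argument holds the second ranking (weight), and that there are no ties.

```python
def kendall_tau(ranking):
    """Kendall tau for a second ranking listed in the order of the first."""
    n = len(ranking)
    # P counts, for each item, the later items that carry a higher rank.
    p = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ranking[j] > ranking[i]
    )
    return 4 * p / (n * (n - 1)) - 1

tau = kendall_tau([3, 4, 1, 2, 5, 7, 8, 6])
print(round(tau, 2))
```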


Blog Clustering

• Within-cluster and between-cluster distance
– Small within-cluster distance: cohesive clusters
– Large between-cluster distance: well-separated clusters
  • Distance between cluster means/centroids
  • Single linkage
  • Complete linkage
  • Average linkage

[Figure: cohesive, well-separated clusters, with panels for cluster mean/centroids, single linkage, complete linkage, and average linkage]


Blog Clustering

• How many clusters should we have?

– The elbow criterion can be used to pick the number of clusters

– Explained variance is ratio of between-group variance to total variance


Spam Blogs

• Train-Test model
• Precision, Recall, F-measure based metrics
• Precision (P) = TP/(TP+FP)
• Recall (R) = TP/(TP+FN)
• F-measure (F) = 2*PR/(P+R)

                      Actual spam   Actual not-spam
Predicted spam        7             4
Predicted not-spam    3             6

Where can we find FP, FN, TP, and TN?

TP=7, FP=4, FN=3, TN=6
P=7/11, R=7/10, F=0.667
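The three metrics can be computed directly from the confusion matrix. In the sketch below the counts are read off the table with predicted classes as rows and actual classes as columns, so the false negatives are the 3 actual-spam blogs predicted as not-spam (consistent with R = 7/10). The function name is illustrative.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = precision_recall_f(tp=7, fp=4, fn=3)
print(round(p, 3), round(r, 3), round(f, 3))
```

The F-measure also equals 2·TP / (2·TP + FP + FN) = 14/21, which is a quick way to check the result.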


CASE STUDIES


Case Studies

• “Familiar Strangers” in Blogosphere

• Employing Collective Wisdom

• Blog Community Interaction

• iFinder: Finding Influential Bloggers


“FAMILIAR STRANGERS”


Short Head and Long Tail

• Few people are densely connected: Short Head

• Many people are sparsely connected: Long Tail

• Businesses like Amazon, Netflix, Wal-Mart, etc. obey this phenomenon

• Wal-Mart sells more Long Tail items than Short Head

• Zipf, Power Law, Pareto’s Law generate Long Tail

[Figure: Short Head and Long Tail regions]


Who are Familiar Strangers?

• Observe repeatedly, but do not know each other

• Real World
– E.g., individuals observe each other daily on a train
– Discover the latent pattern: going to the same workplace

• Blogosphere
– What you write is what you are…

– Have similar blogging behavior, interests (Movie and games, Technology, and Politics, etc.)

– Never cited (came across) each other


Bloggers in Long Tail

• Not returned as top hits by search engines

• Not popular

• Inordinately many

• Disconnected

• Movie Critics – Short Head (nytimes.com)

• Movie Bloggers – Long Tail

• Most lucrative test-bed for Familiar Strangers


Aggregating Niches in Long Tail

• A blogger’s familiar strangers together form a critical mass such that
– the understanding of one blogger gives us a sensible and representative glimpse of others,
– more data about familiar strangers can be collected for better customization and services (e.g., personalization and recommendation),
– the nuances among them present new business opportunities, and
– knowledge about them can facilitate predictive modeling and trend analysis.


Need for Aggregation

• Customized attention requires substantial data
• Majority of blog sites are in the Long Tail
• …and are disconnected
• Aggregating the similar yet disconnected for obtaining critical mass
• Lack of data can result in irrelevant ads (see an example on the right)
• Increase participation
• Move from the Long Tail closer to the Short Head
• Smooth knowledge transfer between familiar strangers


Definition – Familiar Strangers

• Given a blogger b, familiar strangers to b are a set of bloggers B = {b1, b2, …, bn} who share common patterns with b, such as blogging on similar topics, but who have never come across b or related to b.

• Familiar:

Blog posts


Definition – Familiar Strangers

• Strangers:
– Partial strangers

– Total strangers

• Partial strangers

bj is in b’s Social Network b is in bj’s Social Network


Definition – Familiar Strangers

• Total strangers

• We focus on total strangers

b and bj have disjoint Social Networks


Types of Familiar Strangers

• Organizational differences in the blogosphere give rise to different types of familiar-stranger bloggers

Community-level familiar strangers

Networking-site-level familiar strangers

Blogosphere-level familiar strangers


Community Level Familiar Stranger

• MySpace has a community called “A group for those who love history”

• It has 38 members
• Two members, “Maria” and “John”,
– blog profusely on similar topics,
– but they are not in each other’s social network.


Networking Site Level Familiar Stranger

• 2 groups on MySpace,
– The Samurai (32 members)
– The Japanese Sword (84 members)
– Marc, top blogger on “The Samurai”, and Jeff, top blogger on “The Japanese Sword”, discuss Japanese martial arts.
– Neither of them is in the other’s social network.
– This implies that, despite being active locally and discussing the same theme, the two bloggers are still strangers.


Blogosphere Level Familiar Stranger

• 2 different social networking sites, MySpace and Orkut.
– The Samurai (32 members) from MySpace
– Samurai Sword (29 members) from Orkut
– Top bloggers from the respective communities in MySpace and Orkut, Marc and Anant, respectively, share the blogging theme but they are not in each other’s social network.
– The above example illustrates the existence of blogosphere-level familiar strangers.


Challenges

• Link analysis

• Defining Similarity

• Data collection

• Experiments

• Evaluation & Validation

• Current tools & technologies search the Short Head


Search via Blog Posts

Search via Blogger’s Blog Post


Search via Context

Search via Blogger’s context


Leveraging User Contributions


EMPLOYING COLLECTIVE WISDOM

iFinder


What is Collective Wisdom?

• Shared knowledge arrived at by individuals and groups, used to solve problems

• Group wisdom or Co-intelligence

• Blog Clustering

– User generated content as well as user enriched content

– A prominent feature of social web

– Several users tag and categorize their blogs

– Collective wisdom emerges

Page 120

Why Collective Wisdom?

• Challenges with traditional approaches

– High dimensionality

– Sparsity

– Do not leverage collective wisdom

– Require number of clusters a priori

– Similarity measure

Page 121

Blog Categories

Blog level Tags

Blog Post level Tags

5 Most recent blog posts’ snippets

BlogCatalog

Page 122

BlogCatalog taxonomy

WisClus clusters

Page 123

Data Collection

• From BlogCatalog, using 4 bloggers as seeds, we crawled their social network in a breadth-first fashion

• We report the number of unique bloggers recorded with different numbers of seed bloggers (2, 4, 6)

[Figure: Total Bloggers Crawled (y-axis, 0 to 14000) versus Total Number of Starting Bloggers (x-axis)]
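The breadth-first crawl described on this slide can be sketched roughly as follows. This is a minimal illustration, not the crawler used for the tutorial: `crawl_bloggers`, `get_friends`, and the toy network are hypothetical names and data, and a real crawl would fetch each blogger's contact list from the site instead of a dictionary.

```python
from collections import deque

def crawl_bloggers(seeds, get_friends, max_bloggers=None):
    """Breadth-first crawl of a social network starting from seed bloggers.

    get_friends is a caller-supplied function (hypothetical here) that
    returns the bloggers linked from a given blogger.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        blogger = queue.popleft()
        for friend in get_friends(blogger):
            if friend in seen:
                continue
            # Optional cap on the number of unique bloggers recorded.
            if max_bloggers is not None and len(seen) >= max_bloggers:
                return seen
            seen.add(friend)
            queue.append(friend)
    return seen

# Toy network standing in for a real blog site (made-up data).
toy_network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
    "e": ["f"],
    "f": ["e"],
}

def toy_friends(blogger):
    return toy_network.get(blogger, [])

one_seed = crawl_bloggers(["a"], toy_friends)
two_seeds = crawl_bloggers(["a", "e"], toy_friends)
```

Counting `len(...)` of the result for different seed sets gives the kind of "unique bloggers versus number of seeds" comparison the slide reports.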

Page 124

Dataset Characteristics

• Variations in the dataset – depending on the category taxonomy

– Top-level

– All-category

– One node-split: because of the skewed distribution of categories

[Figure: bar chart of Number of Blog Sites (y-axis, 0 to 12000) per category. Categories on the x-axis: personal, blogging, arts & ent, humor, technology, news & media, political, writing, health, internet, society, music, business, sports, shopping, computers, food & drink, philosophy, education & …, religion, home & garden, travel, blog resources, pets, science, environment, crafts, celebrity]

Page 125

Experiments & Results

• Link strength experiments: LinkStrength > 5

• Category taxonomy variations: All-category

• Baseline vs. WisClus

– K-means

– Hierarchical

Type                      Method        Within Avg   Between Avg
Baseline - BloggerSpace   Kmeans        0.0363       0.2194
Baseline - BloggerSpace   Hierarchical  0.0890       0.3644
WisClus - CategorySpace   Kmeans        0.0615       0.2860
WisClus - CategorySpace   Hierarchical  0.0857       0.2761
WisClus - BloggerSpace    Kmeans        0.0844       0.7090
WisClus - BloggerSpace    Hierarchical  0.0849       0.8118
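The slide does not define "Within Avg" and "Between Avg". One common reading, assumed here, is the average pairwise distance between items in the same cluster and between items in different clusters. The sketch below uses Euclidean distance on made-up 2-D points; the function name, the distance measure, and the data are all illustrative assumptions and may differ from what was used to produce the table.

```python
from itertools import combinations
from math import dist

def within_between_avg(points, labels):
    """Average pairwise distance within clusters and between clusters."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Same label: within-cluster pair; different label: between-cluster pair.
        (within if lp == lq else between).append(dist(p, q))

    def avg(values):
        return sum(values) / len(values) if values else 0.0

    return avg(within), avg(between)

# Two tight, well-separated toy clusters (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]
within_avg, between_avg = within_between_avg(points, labels)
```

Under this reading, a clustering with compact and well-separated clusters shows a small within-cluster average relative to its between-cluster average.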

Page 126

Visualizations of clusters using Collective Wisdom

Visualization Results

Page 127

Visualizations of clusters using Baseline approach

Visualization Results

Page 128

Use Pajek to visualize the results

Visualization Results

Page 129

BLOG COMMUNITY INTERACTION

Page 131

Interaction Through Observation

• Interaction through observed events

– Communities with similar sentiments could be aggregated

[Figure: sentiment toward “Macbook” on a scale from -1 (Dislike) through 0 (Indifferent) to 1 (Like)]

Page 132

Proposed Approach – Flowchart

• Identify an event

– E.g., Saddam Hussein’s Death Sentence

• Analyze pre-event, during-event, post-event blog posts

– E.g., November-06, December-06, January-07

• Summarize the blog posts to pick relevant content

• Generate Tag Clouds

• Use “WeFeelFine” API to filter the sentiments

• Compare these sentiments to observe the interaction with respect to an event
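The comparison of sentiments across pre-event, during-event, and post-event windows can be illustrated with a small sketch. It does not call the WeFeelFine API. Instead it assumes hypothetical hand-made sets of positive and negative words, and the posts, month keys, and function name are likewise made up for illustration.

```python
from collections import Counter

def sentiment_by_window(posts, windows, positive, negative):
    """Count positive and negative words per time window.

    posts: list of (month, text) pairs; windows: dict mapping a window
    name to a month string. Both word sets are illustrative assumptions.
    """
    result = {}
    for name, month in windows.items():
        counts = Counter()
        for post_month, text in posts:
            if post_month != month:
                continue
            for word in text.lower().split():
                if word in positive:
                    counts["positive"] += 1
                elif word in negative:
                    counts["negative"] += 1
        result[name] = counts
    return result

# Made-up posts around an event in December 2006.
posts = [
    ("2006-11", "The future plan looks bad"),
    ("2006-12", "justice today"),
    ("2007-01", "bad occupation bad"),
]
windows = {"pre": "2006-11", "during": "2006-12", "post": "2007-01"}
sentiments = sentiment_by_window(
    posts, windows,
    positive={"future", "plan", "agree"},
    negative={"bad", "occupation"},
)
```

Comparing the per-window counts for two blogs or communities is one simple way to observe how each reacts to the same event.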

Page 133

A Running Example

[Figure: tag clouds for the blogs Iraq the Model and Baghdad Burning around Saddam’s Verdict, placed on a monthly timeline spanning 2004 to 2008. Legend: Positive Sentiment, Negative Sentiment.]

Tag cloud: accept according agree America announced Baghdad building cabinet decisions defense dialogue first future have increase looking mass partner patriotic people plan political powers regional see shares situation solutions start state term will

Tag cloud: army bad beginning channels country dead demonstrations down justice new occupation outside right Saddam Salahuddin security shut since single some stupidity today Zawra

Page 134

IFINDER: IDENTIFYING INFLUENTIAL BLOGGERS IN A COMMUNITY

http://videolectures.net/wsdm08_agarwal_iib/

Page 135

Physical and Virtual World

Physical World

Domain Expert

Friends

Virtual World

Online Community

Page 136

Introduction

• Inspired by the analogy between real-world and blog communities, we answer:

Who are the influentials in Blogosphere?

Can we find them?

Active Bloggers = Influential Bloggers?

• Active bloggers may not be influential

• Influential bloggers may not be active

Page 137

Searching The Influentials

• Active bloggers

– Easy to define

– Often listed at a blog site

– Are they necessarily influential?

• How to define an influential blogger?

– Influential bloggers have influential posts

– Subjective

– Collectable statistics

– How to use these statistics?

Page 138

Intuitive Properties

• Social Gestures (statistics)

– Recognition: Citations (incoming links)

  – An influential blog post is recognized by many. The more influential the referring posts are, the more influential the referred post becomes.

– Activity Generation: Volume of discussion (comments)

  – The amount of discussion initiated by a blog post can be measured by the comments it receives. A large number of comments indicates that the blog post affects many readers enough that they care to write comments; hence it is influential.

– Novelty: Referring to (outgoing links)

  – Novel ideas exert more influence. A large number of outlinks suggests that the blog post refers to several other blog posts; hence it is less novel.

– Eloquence: “goodness” of a blog post (length)

  – An influential blog post is often eloquent. Given the informal nature of the Blogosphere, there is no incentive for a blogger to write a lengthy piece that bores the readers. Hence, a long post often suggests some necessity of doing so.

• Influence Score = f(Social Gestures)

Page 139

A Preliminary Model

• Additive models are good for determining the combined value of each alternative [Fensterer, 2007]. They also support preferential independence of all the parameters involved in the final decision. A weighted additive function can be used to evaluate trade-offs between different objectives [Keeney and Raiffa, 1993].

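$$\mathrm{InfluenceFlow}(p) = w_{in} \sum_{m=1}^{|\iota|} I(p_m) - w_{out} \sum_{n=1}^{|\theta|} I(p_n)$$

$$I(p) \propto w_{comm}\,\gamma_p + \mathrm{InfluenceFlow}(p)$$

$$I(p) = w(\lambda) \times \left( w_{comm}\,\gamma_p + \mathrm{InfluenceFlow}(p) \right)$$

$$\mathrm{iIndex}(B) = \max(I(p_l))$$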
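A weighted additive influence score of this kind can be sketched in a few lines. The reading assumed here is that a post's influence flow is the weighted sum of the influence of posts linking to it minus the weighted sum of the influence of posts it links to, that comments add to this flow, that a length-based weight scales the result, and that a blogger's index is the maximum score over that blogger's posts. All function names, parameter names, and numbers are illustrative assumptions. In a full implementation the scores of linked posts depend on each other and would have to be solved iteratively; this sketch simply takes the neighbors' scores as given.

```python
def influence_flow(in_scores, out_scores, w_in=1.0, w_out=1.0):
    """Weighted influence arriving via inlinks minus influence leaving via outlinks."""
    return w_in * sum(in_scores) - w_out * sum(out_scores)

def post_influence(length_weight, comments, in_scores, out_scores,
                   w_comm=1.0, w_in=1.0, w_out=1.0):
    """Length weight times (weighted comments + influence flow)."""
    flow = influence_flow(in_scores, out_scores, w_in, w_out)
    return length_weight * (w_comm * comments + flow)

def i_index(post_scores):
    """A blogger's index: the score of that blogger's most influential post."""
    return max(post_scores)

# Two made-up posts by the same blogger; neighbor scores are assumed known.
p1 = post_influence(length_weight=1.0, comments=3,
                    in_scores=[2.0, 1.0], out_scores=[0.5])
p2 = post_influence(length_weight=0.5, comments=10,
                    in_scores=[], out_scores=[1.0, 1.0])
blogger_index = i_index([p1, p2])
```

In this toy case the heavily commented second post still scores lower, because it has no inlinks, two outlinks, and a smaller length weight.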

Page 140

Understanding the Influentials

• Are influential bloggers simply active bloggers?

• If not, in what ways are they different?

– Can the model differentiate them?

• Are there different types of influential bloggers?

• What other parameters can we include to evolve the model?

• Are there temporal patterns of the influential bloggers?

Page 141

How to Evaluate the Model

• Where to find the ground truth?

– Lack of Training and Test data

– Any alternative?

• About the parameters

– How can they be determined?

– Are they all necessary?

– Are any of these correlated?

• Data collection

– A real-world blog site

– “The Unofficial Apple Weblog”

Page 142

Active & Influential Bloggers

• Active and Influential Bloggers

• Inactive but Influential Bloggers

• Active but Non-influential Bloggers

• We don’t consider “Inactive and Non-influential Bloggers”, because they seldom submit blog posts. Moreover, they do not influence others.
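One way to form these three groups is to intersect a top-k list of active bloggers with a top-k list of influential bloggers. The sketch below assumes activity is measured by post count and influence by a per-blogger score; `categorize`, `post_counts`, `influence`, and the sample values are hypothetical.

```python
def categorize(post_counts, influence, k=5):
    """Split bloggers into three groups from top-k active and top-k influential lists."""
    top_active = set(sorted(post_counts, key=post_counts.get, reverse=True)[:k])
    top_influential = set(sorted(influence, key=influence.get, reverse=True)[:k])
    return {
        "active_and_influential": top_active & top_influential,
        "inactive_but_influential": top_influential - top_active,
        "active_but_non_influential": top_active - top_influential,
    }

# Made-up bloggers: number of posts and an influence score for each.
post_counts = {"ann": 50, "bo": 40, "cy": 5, "di": 2}
influence = {"ann": 9.0, "bo": 1.0, "cy": 8.0, "di": 0.5}
groups = categorize(post_counts, influence, k=2)
```

Bloggers who appear in neither top-k list, such as "di" here, fall into the inactive and non-influential group that the slide leaves out.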

Page 143

Lesion Study

• To observe if any parameter is irrelevant.

Page 144

Other Parameters

• Rate of Comments

[Figures: “Spiky” comments reaction; “Flat” comments reaction]

Page 145

Temporal Patterns of Influential Bloggers

• Long term Influentials

• Average term Influentials

• Transient Influentials

• Burgeoning Influentials

Page 146

Verification of the Model

• Revisit the challenges

– No training and testing data

– Absence of ground truth

– Subjectivity

• We use another Web 2.0 website, Digg as a reference point.

• “Digg is all about user powered content. Everything is submitted and voted on by the Digg community. Share, discover, bookmark, and promote stuff that’s important to you!”

• The higher the digg score for a blog post is, the more it is liked.

• A blog post that is not liked will not be submitted and thus will not appear in Digg.

Page 147

Verification of the Model

• Digg records top 100 blog posts.

• Top 5 influential and top 5 active bloggers were picked to construct 4 categories

• For each of the 4 categories of bloggers, we collect top 20 blog posts from our model and compare them with Digg top 100.

• Distribution of Digg top 100 and TUAW’s 535 blog posts

Page 148

Verification of the Model

• Observe how much our model aligns with Digg.

• Compare top 20 blog posts from our model and Digg.

• Considered last six months

• Considered all configurations to study the relative importance of each parameter.

• Inlinks > Comments > Outlinks > Blog post length
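The alignment check against Digg can be illustrated as a simple overlap count between the model's top-k posts and a reference list. This is only a sketch of the idea: `overlap_with_reference`, the post identifiers, and the small lists are hypothetical, and the actual comparison may have been carried out differently.

```python
def overlap_with_reference(model_ranked, reference_ranked, k=20):
    """Count how many of the model's top-k posts also appear in the reference list."""
    top_model = model_ranked[:k]
    reference = set(reference_ranked)
    hits = [post for post in top_model if post in reference]
    fraction = len(hits) / len(top_model) if top_model else 0.0
    return len(hits), fraction

# Made-up rankings: post ids ordered by model score and by the reference site.
model_ranked = ["p1", "p2", "p3", "p4"]
reference_ranked = ["p2", "p9", "p4"]
hit_count, hit_fraction = overlap_with_reference(model_ranked, reference_ranked, k=3)
```

Repeating this with one parameter switched off at a time is one way to compare configurations and judge the relative importance of each parameter.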

Page 149

Some Call for Papers

• ACM TKDD Special Issue on Social Computing
http://www.public.asu.edu/~huanliu/acm-tkdd-sbp

• Second International Conference on Social Computing, Behavioral Modeling, and Prediction (SBP09)
http://www.public.asu.edu/~huanliu/sbp09

• SIAM International Conf on Data Mining (SDM), Sparks (Reno area), Nevada, April 30 - May 2, 2009.
http://www.siam.org/meetings/sdm09

Page 150

References

[Adar and Adamic, 2005] Adar, E. and Adamic, L. A. (2005). Tracking information epidemics in blogspace. In WI ’05: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05), pages 207–214, Washington, DC, USA. IEEE Computer Society.

[Adar et al., 2004] Adar, E., Zhang, L., Adamic, L., and Lukose, R. (2004). Implicit structure and the dynamics of blogspace. In Proceedings of the 13th International World Wide Web Conference.

[Agarwal et al., 2007a] Agarwal, N., Galan, M., and Chen, Y. (2007a). Approximate structural matching over ordered XML documents. In International Database Engineering and Applications Symposium.

[Agarwal et al., 2008a] Agarwal, N., Galan, M., Liu, H., and Subramanya, S. (2008a). Clustering blogs with collective wisdom. In Proceedings of the International Conference on Web Engineering (ICWE08).

[Agarwal et al., 2005] Agarwal, N., Haque, E., Liu, H., and Parsons, L. (2005). Research paper recommender system: A subspace clustering approach. In The 6th International Conference on Web-Age Information Management (WAIM 2005), pages 475–491.

[Agarwal et al., 2006a] Agarwal, N., Haque, E., Liu, H., and Parsons, L. (2006a). A subspace clustering framework for research group collaboration. International Journal of Information Technology and Web Engineering, 1(1):35–58.

[Agarwal and Liu, 2008a] Agarwal, N. and Liu, H. (2008a). Blogosphere: Research issues, tools, and applications. SIGKDD Explorations.

[Agarwal and Liu, 2008b] Agarwal, N. and Liu, H. (2008b). A study of communities and influence in blogosphere. In 2nd SIGMOD PhD Innovative Database and Research Doctorate Consortium (IDAR08), Vancouver, Canada.

[Agarwal et al., 2008b] Agarwal, N., Liu, H., Salerno, J. J., and Sundarajan, S. (2008b). Understanding group interaction in blogosphere: A case study. In 2nd International Conference on Computational Cultural Dynamics (ICCCD08), Washington D.C.

[Agarwal et al., 2007b] Agarwal, N., Liu, H., Salerno, J. J., and Yu, P. S. (2007b). Searching for Familiar Strangers on Blogosphere: Problems and Challenges. In NSF Symposium on Next-Generation Data Mining and Cyber-enabled Discovery and Innovation (NGDM).

Page 151

References

[Agarwal et al., 2008c] Agarwal, N., Liu, H., Tang, L., and Yu, P. S. (2008c). Identifying the influential bloggers. In Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM08) (Video available at: http://videolectures.net/wsdm08_agarwal_iib/).

[Agarwal et al., 2006b] Agarwal, N., Liu, H., and Zhang, J. (2006b). Blocking objectionable web content by leveraging multiple information sources. SIGKDD Explor. Newsl., 8(1):17–26.

[Albert, 2001] Albert, R. (2001). Statistical mechanics of complex networks. PhD thesis.

[Aleman-Meza et al., 2006] Aleman-Meza, B., Nagarajan, M., Ramakrishnan, C., Ding, L., Kolari, P., Sheth, A. P., Arpinar, I. B., Joshi, A., and Finin, T. (2006). Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 407–416, New York, NY, USA. ACM Press.

[Ali-Hasan and Adamic, 2007] Ali-Hasan, N. and Adamic, L. (2007). Expressing social relationships on the blog through links and comments. In International Conference on Weblogs and Social Media.

[Anderson, 2006] Anderson, C. (2006). The long tail: why the future of business is selling less of more. New York: Hyperion.

[Attardi and Simi, 2006] Attardi, G. and Simi, M. (2006). Blog mining through opinionated words. In Proceedings of the fifteenth Text REtrieval Conference (TREC).

[Avesani et al., 2005] Avesani, P., Massa, P., and Tiella, R. (2005). A trust-enhanced recommender system application: Moleskiing. In SAC, pages 1589–1593.

[Backstrom et al., 2006] Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. (2006). Group formation in large social networks: membership, growth, and evolution. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44–54, New York, NY, USA. ACM Press.

[Barabasi and Albert, 1999] Barabasi, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(509).

[Bekkerman and McCallum, 2005] Bekkerman, R. and McCallum, A. (2005). Disambiguating web appearances of people in a social network. In WWW ’05: Proceedings of the 14th international conference on World Wide Web, pages 463–470, New York, NY, USA. ACM Press.

Page 152

References

[Blanchard and Markus, 2004] Blanchard, A. and Markus, M. (2004). The experienced sense of a virtual community: Characteristics and processes. The DATA BASE for Advances in Information Systems, 35(1).

[Blum et al., 2006] Blum, A., Mugizi, T. H. C., and Rwebangira, M. R. (2006). A random-surfer web-graph model. In Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO06).

[Bonhard et al., 2006] Bonhard, P., Harries, C., McCarthy, J., and Sasse, M. A. (2006). Accounting for taste: using profile similarity to improve recommender systems. In CHI ’06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 1057–1066, New York, NY, USA. ACM Press.

[Brin and Page, 1998] Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117.

[Brooks and Montanez, 2006] Brooks, C. H. and Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 625–632, New York, NY, USA. ACM Press.

[Cai et al., 2005] Cai, D., Shao, Z., He, X., Yan, X., and Han, J. (2005). Mining hidden community in heterogeneous social networks. In LinkKDD ’05: Proceedings of the 3rd international workshop on Link discovery, pages 58–65, New York, NY, USA. ACM Press.

[Cai-Nicolas Ziegler, 2004] Cai-Nicolas Ziegler, G. L. (2004). Analyzing correlation between trust and user similarity in online communities. In iTrust, number 251-265.

[Chi et al., 2006] Chi, Y., Tseng, B. L., and Tatemura, J. (2006). Eigen-trend: trend analysis in the blogosphere based on singular value decompositions. In CIKM ’06: Proceedings of the 15th ACM international conference on Information and knowledge management, pages 68–77, New York, NY, USA. ACM Press.

[Chin and Chignell, 2006] Chin, A. and Chignell, M. (2006). A social hypertext model for finding community in blogs. In HYPERTEXT ’06: Proceedings of the seventeenth conference on Hypertext and hypermedia, pages 11–22, New York, NY, USA. ACM Press.

Page 153

References

[Coffman and Marcus, 2004] Coffman, T. and Marcus, S. (2004). Dynamic classification of groups through social network analysis and HMMs. In Proceedings of IEEE Aerospace Conference.

[Conrad and Schilder, 2007] Conrad, J. G. and Schilder, F. (2007). Opinion mining in legal blogs. In ICAIL ’07: Proceedings of the 11th international conference on Artificial intelligence and law, pages 231–236, New York, NY, USA. ACM.

[Deerwester et al., 1990] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science.

[Domingos and Richardson, 2001] Domingos, P. and Richardson, M. (2001). Mining the network value of customers. In KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57–66, New York, NY, USA. ACM Press.

[Drezner and Farrell, 2004] Drezner, D. and Farrell, H. (2004). The power and politics of blogs. In American Political Science Association Annual Conference.

[Durrett, 1988] Durrett, R. (1988). Lecture Notes on Particle Systems and Percolation. Wadsworth Publishing.

[Efimova and Hendrick, 2005] Efimova, L. and Hendrick, S. (2005). In search for a virtual settlement: An exploration of weblog community boundaries.

[Elkin, ] Elkin, T. Just an online minute... online forecast. http://publications.mediapost.com/index.cfm?fuseaction=Articles.showArticle art aid=29803.

[Fensterer, 2007] Fensterer, G. D. (2007). Planning and Assessing Stability Operations: A Proposed Value Focus Thinking Approach. PhD thesis, Air Force Institute of Technology.

[Flake et al., 2002] Flake, G., Lawrence, S., Giles, C. L., and Coetzee, F. (2002). Self-organization and identification of web communities. IEEE Computer, 35(3).

[Flake et al., 2000] Flake, G. W., Lawrence, S., and Giles, C. L. (2000). Efficient identification of web communities. In 6th International Conference on Knowledge Discovery and Data Mining.

Page 154

References

[Friedman, 2005] Friedman, T. L. (2005). The World Is Flat: A Brief History of the Twenty-First Century. Farrar, Straus and Giroux.

[Gamon et al., 2005] Gamon, M., Aue, A., Corston-Oliver, S., and Ringger, E. (2005). Pulse: Mining Customer Opinions from Free Text. In Proceedings of the 6th International Symposium on Intelligent Data Analysis.

[Gamon et al., 2008] Gamon, M., et al. (2008). BLEWS: Using Blogs to Provide Context for News Articles. In 2nd ICWSM.

[Gibson et al., 1998] Gibson, D., Kleinberg, J., and Raghavan, P. (1998). Inferring web communities from link topology. In 9th ACM Conference on Hypertext and Hypermedia.

[Gill, 2004] Gill, K. E. (2004). How can we measure the influence of the blogosphere? In Proceedings of the WWW’04: workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics.

[Gillmor, 2006] Gillmor, D. (2006). We the Media: Grassroots Journalism by the People, for the People. O’Reilly.

[Girvan and Newman, 2002] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. In National Academy of Science.

[Gladwell, 2000] Gladwell, M. (2000). The Tipping Point: How Little Things Can Make a Big Difference. Little, Brown and Company.

[Golbeck, 2006a] Golbeck, J. (2006a). Combining provenance with trust in social networks for semantic web content filtering. In IPAW, pages 101–108.

[Golbeck, 2006b] Golbeck, J. (2006b). Generating predictive movie recommendations from trust in social networks. In iTrust, pages 93–104.

[Golbeck et al., 2004] Golbeck, J., Bonatti, P. A., Nejdl, W., Olmedilla, D., and Winslett, M. (2004). Trust, security, and reputation on the semantic web. In Proceedings of the ISWC-04 Workshop on Trust, Security, and Reputation on the Semantic Web.

[Golbeck and Hendler, 2004a] Golbeck, J. and Hendler, J. (2004a). Reputation Network Analysis for Email Filtering. In Proceedings Conference on Email and Anti-Spam (CEAS), Mountain View, USA.

Page 155

References

[Golbeck and Hendler, 2006] Golbeck, J. and Hendler, J. (2006). Inferring binary trust relationships in web-based social networks. ACM Trans. Inter. Tech., 6(4):497–529.

[Golbeck and Hendler, 2004b] Golbeck, J. and Hendler, J. A. (2004b). Accuracy of metrics for inferring trust and reputation in semantic web-based social networks. In EKAW, pages 116–131.

[Golbeck and Parsia, 2006] Golbeck, J. and Parsia, B. (2006). Trust network-based filtering of aggregated claims. IJMSO, 1(1):58–65.

[Golbeck et al., 2003] Golbeck, J., Parsia, B., and Hendler, J. A. (2003). Trust networks on the semantic web. In CIA, pages 238–249.

[Goldenberg et al., 2001] Goldenberg, J., Libai, B., and Muller, E. (2001). Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12:211–223.

[Gruhl et al., 2005] Gruhl, D., Guha, R., Kumar, R., Novak, J., and Tomkins, A. (2005). The predictive power of online chatter. In KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 78–87, New York, NY, USA. ACM Press.

[Gruhl et al., 2004] Gruhl, D., Liben-Nowell, D., Guha, R., and Tomkins, A. (2004). Information diffusion through blogspace. SIGKDD Exploration Newsletter, 6(2):43–52.

[Guha et al., 2004] Guha, R., Kumar, R., Raghavan, P., and Tomkins, A. (2004). Propagation of trust and distrust. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 403–412, New York, NY, USA. ACM Press.

[Gyongyi et al., 2006] Gyongyi, Z., Berkhin, P., Garcia-Molina, H., and Pedersen, J. (2006). Link spam detection based on mass estimation. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB).

[Gyongyi et al., 2004] Gyongyi, Z., Garcia-Molina, H., and Pedersen, J. (2004). Combating web spam with trustrank. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB).

[Herring et al., 2005] Herring, S. C., Kouper, I., Paolillo, J. C., Scheidt, L. A., Tyworth, M., Welsch, P., Wright, E., and Yu, N. (2005). Conversations in the blogosphere: An analysis “from the bottom up”. HICSS, 04:107b.

Page 156

References

[Hopcroft et al., 2003] Hopcroft, J., Khan, O., Kulis, B., and Selman, B. (2003). Natural communities in large linked networks. In 9th Intl. Conf. on Knowledge Discovery and Data Mining.

[Hope et al., 2006] Hope, T., Nishimura, T., and Takeda, H. (2006). An integrated method for social network extraction. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 845–846, New York, NY, USA. ACM Press.

[Java et al., 2006] Java, A., Kolari, P., Finin, T., and Oates, T. (2006). Modeling the spread of influence on the blogosphere. In Proceedings of the 15th International World Wide Web Conference.

[Kale et al., 2007] Kale, A., Karandikar, A., Kolari, P., Java, A., Finin, T., and Joshi, A. (2007). Modeling trust and influence in the blogosphere using link polarity. In International Conference on Weblogs and Social Media.

[Katz and Golbeck, 2006] Katz, Y. and Golbeck, J. (2006). Social network-based trust in prioritized default logic. In AAAI.

[Kautz et al., 1997] Kautz, H., Selman, B., and Shah, M. (1997). Referral web: combining social networks and collaborative filtering. Commun. ACM, 40(3):63–65.

[Keeney and Raiffa, 1993] Keeney, R. L. and Raiffa, H. (1993). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Cambridge University Press.

[Keller and Berry, 2003] Keller, E. and Berry, J. (2003). One American in ten tells the other nine how to vote, where to eat, and what to buy. They are The Influentials. The Free Press.

[Kleinberg, 1998] Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. In 9th ACM-SIAM Symposium on Discrete Algorithms.

[Kolari et al., 2006a] Kolari, P., Finin, T., and Joshi, A. (2006a). SVMs for the blogosphere: Blog identification and splog detection. In AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.

[Kolari et al., 2006b] Kolari, P., Java, A., Finin, T., Oates, T., and Joshi, A. (2006b). Detecting spam blogs: A machine learning approach. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI).

[Kritikopoulos et al., 2006] Kritikopoulos, A., Sideri, M., and Varlamis, I. (2006). Blogrank: ranking weblogs based on connectivity and similarity features. In AAA-IDEA ’06: Proceedings of the 2nd international workshop on Advanced architectures and algorithms for internet delivery and applications, page 8, New York, NY, USA. ACM Press.

Page 157

References

[Kumar et al., 2003] Kumar, R., Novak, J., Raghavan, P., and Tomkins, A. (2003). On the Bursty Evolution of Blogspace. In Proceedings of the 12th international conference on World Wide Web, pages 568–576, New York, NY, USA. ACM Press.

[Kumar et al., 2006] Kumar, R., Novak, J., and Tomkins, A. (2006). Structure and evolution of online social networks. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 611–617, New York, NY, USA. ACM Press.

[Kumar et al., 1999] Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. (1999). Trawling the web for emerging cyber communities. In The 8th International World Wide Web Conference.

[Leshed and Kaye, 2006] Leshed, G. and Kaye, J. J. (2006). Understanding how bloggers feel: recognizing affect in blog posts. In CHI ’06: CHI ’06 extended abstracts on Human factors in computing systems, pages 1019–1024, New York, NY, USA. ACM Press.

[Leskovec et al., 2007] Leskovec, J., McGlohon, M., Faloutsos, C., Glance, N., and Hurst, M. (2007). Cascading behavior in large blog graphs. In SIAM International Conference on Data Mining.

[Li et al., 2007] Li, B., Xu, S., and Zhang, J. (2007). Enhancing clustering blog documents by utilizing author/reader comments. In ACM-SE 45: Proceedings of the 45th annual southeast regional conference, pages 94–99, New York, NY, USA. ACM Press.

[Liggett, 1985] Liggett, T. (1985). Interacting Particle Systems. Springer.

[Lin et al., 2006] Lin, Y.-R., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. (2006). Discovery of blog communities based on mutual awareness. In Proceedings of the 3rd annual workshop on weblogging ecosystem: aggregation, analysis and dynamics.

[Lin et al., 2007] Lin, Y.-R., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. L. (2007). Splog detection using self-similarity analysis on blog temporal dynamics. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web (AIRWeb), pages 1–8, New York, NY, USA. ACM Press.

[Liu et al., 2007] Liu, Y., Huang, X., An, A., and Yu, X. (2007). ARSA: a sentiment-aware model for predicting sales performance using blogs. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 607–614, New York, NY, USA. ACM Press.

[Marlow et al., 2006] Marlow, C., Naaman, M., Boyd, D., and Davis, M. (2006). HT06, tagging paper, taxonomy, flickr, academic article, to read. In HYPERTEXT ’06: Proceedings of the seventeenth conference on Hypertext and hypermedia, pages 31–40, New York, NY, USA. ACM Press.

[Massa and Avesani, 2004] Massa, P. and Avesani, P. (2004). Trust-aware collaborative filtering for recommender systems. In CoopIS/DOA/ODBASE, pages 492–508.

[Massa and Avesani, 2005] Massa, P. and Avesani, P. (2005). Controversial users demand local trust metrics: An experimental study on epinions.com community. In AAAI, pages 121–126.

[Massa and Bhattacharjee, 2004] Massa, P. and Bhattacharjee, B. (2004). Using trust in recommender systems: An experimental analysis. In iTrust, pages 221–235.

[Massa and Hayes, 2005] Massa, P. and Hayes, C. (2005). Page-rerank: Using trusted links to re-rank authority. In Web Intelligence, pages 614–617.

[Matsumura et al., 2005] Matsumura, N., Goldberg, D. E., and Llorà, X. (2005). Mining directed social network from message board. In WWW ’05: Special interest tracks and posters of the 14th international conference on World Wide Web, pages 1092–1093, New York, NY, USA. ACM Press.

[Matsuo et al., 2006] Matsuo, Y., Mori, J., Hamasaki, M., Ishida, K., Nishimura, T., Takeda, H., Hasida, K., and Ishizuka, M. (2006). Polyphonet: an advanced social network extraction system from the web. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 397–406, New York, NY, USA. ACM Press.

[McDonald, 2003] McDonald, D. W. (2003). Recommending collaboration with social networks: a comparative evaluation. In CHI ’03: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 593–600, New York, NY, USA. ACM Press.

[McNee et al., 2002] McNee, S. M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S. K., Rashid, A. M., Konstan, J. A., and Riedl, J. (2002). On the recommending of citations for research papers. In CSCW ’02: Proceedings of the 2002 ACM conference on Computer supported cooperative work, pages 116–125, New York, NY, USA. ACM Press.

[Mei et al., 2007] Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C. (2007). Topic sentiment mixture: modeling facets and opinions in weblogs. In WWW ’07: Proceedings of the 16th international conference on World Wide Web, pages 171–180, New York, NY, USA. ACM Press.

[Mishne and de Rijke, 2006] Mishne, G. and de Rijke, M. (2006). Deriving wishlists from blogs: show us your blog, and we’ll tell you what books to buy. In Proceedings of the 15th international conference on World Wide Web, pages 925–926, New York, NY, USA. ACM Press.

[Newman, 2003] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45:167.

[Newman, 2004b] Newman, M. E. J. (2004b). Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133.

[Ntoulas et al., 2006] Ntoulas, A., Najork, M., Manasse, M., and Fetterly, D. (2006). Detecting spam web pages through content analysis. In Proceedings of the 15th international conference on World Wide Web (WWW).

[O’Reilly, 2005] O’Reilly, T. (2005). What is Web 2.0 - design patterns and business models for the next generation of software. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.

[Osman and Yearwood, 2007] Osman, D. J. and Yearwood, J. L. (2007). Opinion search in web logs. In ADC ’07: Proceedings of the eighteenth conference on Australasian database, pages 133–139, Darlinghurst, Australia, Australia. Australian Computer Society, Inc.

[Page et al., 1998] Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.

[Pennock et al., 2002] Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. (2002). Winners don’t take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99(8):5207–5211.

[Pujol et al., 2002] Pujol, J. M., Sangesa, R., and Delgado, J. (2002). Extracting reputation in multi agent systems by means of social network topology. In Proceedings of the first international joint conference on Autonomous agents and multiagent systems (AAMAS), pages 467–474, New York, NY, USA. ACM Press.

[Richardson and Domingos, 2002] Richardson, M. and Domingos, P. (2002). Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge Discovery and Data mining, pages 61–70, New York, NY, USA. ACM Press.

[Sabater and Sierra, 2002] Sabater, J. and Sierra, C. (2002). Reputation and social network analysis in multi-agent systems. In AAMAS ’02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems (AAMAS), pages 475–482, New York, NY, USA. ACM Press.

[Schelling, 1978] Schelling, T. (1978). Micromotives and macrobehavior. Norton.

[Scoble and Israel, 2006] Scoble, R. and Israel, S. (2006). Naked conversations: how blogs are changing the way businesses talk with customers. John Wiley.

[Spertus et al., 2005] Spertus, E., Sahami, M., and Buyukkokten, O. (2005). Evaluating similarity measures: a large-scale study in the orkut social network. In Proceeding of the eleventh ACM SIGKDD international conference on Knowledge Discovery in Data mining (KDD), pages 678–684, New York, NY, USA. ACM Press.

[Stefanone et al., 2004] Stefanone, M., Hancock, J., Gay, G., and Ingraffea, A. (2004). Emergent networks, locus of control, and the pursuit of social capital. In Proceedings of the 2004 ACM conference on Computer Supported Cooperative Work (CSCW), pages 592–595, New York, NY, USA. ACM Press.

[Tang et al., 2008] Tang, L., Liu, H., Zhang, J., Agarwal, N., and Salerno, J. J. (2008). Topic taxonomy adaptation for group profiling. ACM Transactions on Knowledge Discovery from Data, TKDD, 1(4).

[Terveen and McDonald, 2005] Terveen, L. and McDonald, D. W. (2005). Social matching: A framework and research agenda. ACM Trans. Comput.-Hum. Interact., 12(3):401–434.

[Thelwall, 2006] Thelwall, M. (2006). Bloggers under the London attacks: Top information sources and topics. In Proceedings of the 3rd annual workshop on weblogging ecosystem: aggregation, analysis and dynamics.

[Watts and Strogatz, 1998] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442.

[Wu et al., 2003] Wu, F., Huberman, B. A., Adamic, L. A., and Tyler, J. (2003). Information flow in social groups.

[Yu and Singh, 2003] Yu, B. and Singh, M. P. (2003). Detecting deception in reputation management. In Proceedings of the second international joint conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 73–80, New York, NY, USA. ACM Press.

[Zhang and Varadarajan, 2006] Zhang, Z. and Varadarajan, B. (2006). Utility scoring of product reviews. In Proceedings of the 15th ACM international conference on Information and Knowledge Management (CIKM), pages 51–57, New York, NY, USA. ACM Press.

[Zhou and Davis, 2006] Zhou, Y. and Davis, J. (2006). Community discovery and analysis in blogspace. In Proceedings of the 15th international conference on World Wide Web, pages 1017–1018, New York, NY, USA. ACM Press.

[Ziegler and Golbeck, 2007] Ziegler, C.-N. and Golbeck, J. (2007). Investigating interactions of trust and interest similarity. Decis. Support Syst., 43(2):460–475.

[Ziegler and Lausen, 2004a] Ziegler, C.-N. and Lausen, G. (2004a). Paradigms for decentralized social filtering exploiting trust network structure. In CoopIS/DOA/ODBASE (2), pages 840–858.

[Ziegler and Lausen, 2004b] Ziegler, C.-N. and Lausen, G. (2004b). Spreading activation models for trust propagation. In Proceedings of the 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE’04), pages 83–97, Washington, DC, USA. IEEE Computer Society.

[Ziegler and Lausen, 2005] Ziegler, C.-N. and Lausen, G. (2005). Propagation models for trust and distrust in social networks. Information Systems Frontiers, 7(4-5):337–358.

[Ziegler and Skubacz, 2006] Ziegler, C.-N. and Skubacz, M. (2006). Towards automated reputation and brand monitoring on the web. In WI ’06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pages 1066–1072, Washington, DC, USA. IEEE Computer Society.

[Sallach and Macal, 2001] Sallach, D. and Macal, C. (2001). The simulation of social agents: an introduction. Special issue of Social Science Computer Review, 19(3):245–248.

[Axelrod and Tesfatsion, 2005] Axelrod, R. and Tesfatsion, L. (2005). A guide for newcomers to agent-based modeling in the social sciences. In Handbook of Computational Economics, Vol. 2: Agent-Based Computational Economics.