Who says what to whom on Twitter? - Twitter flow


Article on Nieman Journalism Lab: http://bit.ly/gVRSAc

Who says what to whom on Twitter? Four scientists studied the production, flow, and consumption of information on Twitter.... they found a dramatic concentration of attention: 50 percent of tweets consumed are generated by just 20,000 elite users. Or, as the authors note later in the study: “Roughly 0.05% of the population accounts for almost half of all attention”. But there are two other key takeaways in the study worth highlighting, particularly for news organizations trying to figure out how their content meshes with the web’s social layer. First, there’s the concentration-of-distribution idea: the way users discover, and then consume, media content on Twitter. Second, there’s the longevity idea: the lifespan of that content as it enters our consciousness and then, often almost instantly, fades away.


Who Says What to Whom on Twitter

Shaomei Wu∗
Cornell University, USA
[email protected]

Jake M. Hofman
Yahoo! Research, NY, USA
[email protected]

Winter A. Mason
Yahoo! Research, NY, USA
[email protected]

Duncan J. Watts
Yahoo! Research, NY, USA
[email protected]

ABSTRACT

We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter—known as Twitter lists—to distinguish between elite users, by which we mean specifically celebrities, bloggers, and representatives of media outlets and other formal organizations, and ordinary users. Based on this classification, we find a striking concentration of attention on Twitter—roughly 50% of tweets consumed are generated by just 20K elite users—where the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers etc; however, bloggers in general rebroadcast more information than the other categories. Next we re-examine the classical “two-step flow” theory of communications, finding considerable support for it on Twitter, but also some interesting differences. Third, we find that URLs broadcast by different categories of users or containing different types of content exhibit systematically different lifespans. And finally, we examine the attention paid by the different user categories to different news topics.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems; J.4 [Social and Behavioral Sciences]: Sociology

General Terms
two-step flow, communications, classification

Keywords
Communication networks, Twitter, information flow

∗ Part of this research was performed while the author was visiting Yahoo! Research, New York.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WWW ’11 Hyderabad, India
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.

1. INTRODUCTION

A longstanding objective of media communications research is encapsulated by what is known as Lasswell’s maxim: “who says what to whom in what channel with what effect” [9], so-named for one of the pioneers of the field, Harold Lasswell. Although simple to state, Lasswell’s maxim has proven difficult to satisfy in the more-than 60 years since he stated it, in part because it is generally difficult to observe information flows in large populations, and in part because different channels have very different attributes and effects. As a result, theories of communications have tended to focus either on “mass” communication, defined as “one-way message transmissions from one source to a large, relatively undifferentiated and anonymous audience,” or on “interpersonal” communication, meaning a “two-way message exchange between two or more individuals” [13].

Correspondingly, debates among communication theorists have tended to revolve around the relative importance of these two putative modes of communication. For example, whereas early theories such as the so-called “hypodermic model” posited that mass media exerted direct and relatively strong effects on public opinion, mid-century researchers [10, 6, 11, 4] argued that the mass media influenced the public only indirectly, via what they called a two-step flow of communications, where the critical intermediate layer was occupied by a category of media-savvy individuals called opinion leaders. The resulting “limited effects” paradigm was subsequently challenged by a new generation of researchers [5], who claimed that the real importance of the mass media lay in its ability to set the agenda of public discourse. But in recent years rising public skepticism of mass media, along with changes in media and communication technology, has tilted conventional academic wisdom once more in favor of interpersonal communication, which some identify as a “new era” of minimal effects [2].

Recent changes in technology, however, have increasingly undermined the validity of the mass vs. interpersonal dichotomy itself. On the one hand, over the past few decades mass communication has experienced a proliferation of new channels, including cable television, satellite radio, specialist book and magazine publishers, and of course an array of web-based media such as sponsored blogs, online communities, and social news sites. Correspondingly, the traditional mass audience once associated with, say, network television has fragmented into many smaller audiences, each of which increasingly selects the information to which it is exposed, and in some cases generates the information itself. Meanwhile, in the opposite direction interpersonal communication has become increasingly amplified through personal blogs, email lists, and social networking sites to afford individuals ever-larger audiences. Together, these two trends have greatly obscured the historical distinction between mass and interpersonal communications, leading some scholars to refer instead to “masspersonal” communications [13].

Nowhere is the erosion of traditional categories more apparent than in the micro-blogging platform Twitter. To illustrate, the top ten most-followed users on Twitter are not corporations or media organizations, but individual people, mostly celebrities. Moreover, these individuals communicate directly with their millions of followers through accounts often managed by themselves or by publicists, thus bypassing the traditional intermediation of the mass media between celebrities and fans. Next, in addition to conventional celebrities, a new class of “semi-public” individuals like bloggers, authors, journalists, and subject matter experts has come to occupy an important niche on Twitter, in some cases becoming more prominent than traditional public figures such as entertainers and elected officials. Third, in spite of these shifts away from centralized media power, media organizations—along with corporations, governments, and NGOs—all remain well represented among highly followed users, and are often extremely active. And finally, Twitter is primarily made up of many millions of users who seem to be ordinary individuals communicating with their friends and acquaintances in a manner largely consistent with traditional notions of interpersonal communication.

Twitter, therefore, represents the full spectrum of communications from personal and private to “masspersonal” to traditional mass media. Consequently it provides an interesting context in which to address Lasswell’s maxim, especially as Twitter—unlike television, radio, and print media—enables one to easily observe information flows among the members of its ecosystem. Unfortunately, however, the kinds of effects that are of most interest to communications theorists, such as changes in behavior, attitudes, etc., remain difficult to measure on Twitter. Therefore in this paper we limit our focus to the “who says what to whom” part of Lasswell’s maxim.

To this end, our paper makes three main contributions:

• We introduce a method for classifying users using Twitter Lists into “elite” and “ordinary” users, further classifying elite users into one of four categories of interest—media, celebrities, organizations, and bloggers.

• We investigate the flow of information among these categories, finding that although audience attention is highly concentrated on a minority of elite users, much of the information they produce reaches the masses indirectly via a large population of intermediaries.

• We find that different categories of users place slightly different emphasis on different types of content, and that different content types exhibit dramatically different characteristic lifespans, ranging from less than a day to months.

The remainder of the paper proceeds as follows. In the next section, we review related work. In section 3 we discuss our data and methods, including section 3.3 in which we describe how we use Twitter Lists to classify users, outline two different sampling methods, and show that they deliver qualitatively similar results. In section 4 we analyze the production of information on Twitter, particularly who pays attention to whom. In section 4.1, we revisit the theory of the two-step flow—arguably the dominant theory of communications for much of the past 50 years—finding considerable support for the theory as well as some interesting differences. In section 5, we consider “who listens to what”, examining first who shares what kinds of media content, and second the lifespan of URLs as a function of their origin and their content. Finally, in section 6 we conclude with a brief discussion of future work.

2. RELATED WORK

Aside from the communications literature surveyed above, a number of recent papers have examined information diffusion on Twitter. Kwak et al. [8] studied the topological features of the Twitter follower graph, concluding from the highly skewed nature of the distribution of followers and the low rate of reciprocated ties that Twitter more closely resembled an information sharing network than a social network—a conclusion that is consistent with our own view. In addition, Kwak et al. compared three different measures of influence—number of followers, page-rank, and number of retweets—finding that the ranking of the most influential users differed depending on the measure. In a similar vein, Cha et al. [3] compared three measures of influence—number of followers, number of retweets, and number of mentions—and also found that the most followed users did not necessarily score highest on the other measures. Weng et al. [15] compared number of followers and page rank with a modified page-rank measure which accounted for topic, again finding that ranking depended on the influence measure. Finally, Bakshy et al. [1] studied the distribution of retweet cascades on Twitter, finding that although users with large follower counts and past success in triggering cascades were on average more likely to trigger large cascades in the future, these features are in general poor predictors of future cascade size.

Our paper differs from this earlier work by shifting attention from the ranking of individual users in terms of various influence measures to the flow of information among different categories of users. In particular, we are interested in identifying “elite” users, whom we differentiate from “ordinary” users in terms of their visibility, and understanding their role in introducing information into Twitter, as well as how information originating from traditional media sources reaches the masses.

3. DATA AND METHODS

3.1 Twitter Follower Graph

In order to understand how information is transmitted on Twitter, we need to know the channels by which it flows; that is, who is following whom on Twitter. To this end, we used the follower graph studied by Kwak et al. [8], which included 42M users and 1.5B edges. This data represents a crawl of the graph seeded with all users on Twitter as observed by July 31st, 2009, and is publicly available.¹ As reported by Kwak et al. [8], the follower graph is a directed network characterized by highly skewed distributions both of in-degree (# followers) and out-degree (# “friends”, Twitter notation for how many others a user follows); however, the in-degree distribution is even more skewed than the out-degree distribution. In both friend and follower distributions, for example, the median is less than 100, but the maximum # friends is several hundred thousand, while a small number of users have millions of followers. In addition, the follower graph is also characterized by extremely low reciprocity (roughly 20%)—in particular, the most-followed individuals typically do not follow many others. The Twitter follower graph, in other words, does not conform to the usual characteristics of social networks, which exhibit much higher reciprocity and far less skewed degree distributions [7], but instead resembles more the mixture of one-way mass communications and reciprocated interpersonal communications described above.

¹ The data is free to download from http://an.kaist.ac.kr/traces/WWW2010.html
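To make the degree and reciprocity statistics above concrete, here is a minimal sketch of how they could be computed from a directed edge list. It is not the authors' or Kwak et al.'s code: the function name `degree_stats`, the toy edges, and the choice to define reciprocity as the fraction of directed edges whose reverse edge also exists are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def degree_stats(edges):
    """edges: iterable of (follower, followee) pairs of a directed graph."""
    edge_set = set(edges)
    followers = defaultdict(int)  # in-degree: how many users follow u
    friends = defaultdict(int)    # out-degree: how many users u follows
    for src, dst in edge_set:
        friends[src] += 1
        followers[dst] += 1
    # Assumed definition: share of directed edges that are reciprocated.
    reciprocated = sum(1 for (s, d) in edge_set if (d, s) in edge_set)
    users = set(friends) | set(followers)
    return {
        "reciprocity": reciprocated / len(edge_set) if edge_set else 0.0,
        "median_followers": median(followers.get(u, 0) for u in users) if users else 0,
        "max_followers": max(followers.values(), default=0),
        "max_friends": max(friends.values(), default=0),
    }

# Hypothetical toy graph: three users follow "star"; only a and b follow each other.
toy_edges = [("a", "star"), ("b", "star"), ("c", "star"), ("a", "b"), ("b", "a")]
stats = degree_stats(toy_edges)
```

On the toy graph, two of the five edges are reciprocated, while the single highly followed account follows nobody, which is the pattern the paragraph describes at a much larger scale.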

3.2 Twitter Firehose

In addition to the follower graph, we are interested in the content being shared on Twitter—particularly URLs—and so we examined the corpus of all 5B tweets generated over a 223 day period from July 28, 2009 to March 8, 2010 using data from the Twitter “firehose,” the complete stream of all tweets.² Because our objective is to understand the flow of information, it is useful for us to restrict attention to tweets containing URLs, for two reasons. First, URLs add easily identifiable tags to individual tweets, allowing us to observe when a particular piece of content is either retweeted or subsequently reintroduced by another user. And second, because URLs point to online content outside of Twitter, they provide a much richer source of variation than is possible in the typical 140 character tweet. Finally, we note that almost all URLs broadcast on Twitter have been shortened using one of a number of URL shorteners, of which the most popular is http://bit.ly/. From the total of 5B tweets recorded during our observation period, therefore, we focus our attention on the subset of 260M containing bit.ly URLs.
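The restriction to tweets containing bit.ly URLs can be illustrated with a short filter. This is only a sketch: the regular expression is an assumption about what a bit.ly short link looks like, and `bitly_urls`, `tweets_with_bitly` and the sample tweets are hypothetical names and data, not part of the authors' pipeline.

```python
import re

# Assumed shape of a bit.ly short link: scheme, host, then an alphanumeric hash.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")

def bitly_urls(tweet_text):
    """Return all bit.ly links found in one tweet's text."""
    return BITLY_RE.findall(tweet_text)

def tweets_with_bitly(tweets):
    """Yield (user, urls) for every (user, text) tweet with at least one bit.ly link."""
    for user, text in tweets:
        urls = bitly_urls(text)
        if urls:
            yield user, urls

# Hypothetical sample tweets.
sample = [
    ("u1", "read this http://bit.ly/gVRSAc now"),
    ("u2", "no link here"),
    ("u3", "two links: http://bit.ly/abc123 and https://bit.ly/xyz-9."),
]
kept = list(tweets_with_bitly(sample))
```

Because each short link acts as an identifiable tag, the same extraction step also gives the key needed to track when a piece of content is retweeted or reintroduced.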

3.3 Twitter Lists

Our method for classifying users exploits a relatively recent feature of Twitter: Twitter Lists. Since their launch on November 2, 2009, Twitter Lists have been welcomed by the community as a way to group people and organize one’s incoming stream of tweets by specific sets of users. To create a Twitter List, a user needs to provide a name (required) and description (optional) for the list, and decide whether the new list is public (anyone can view and subscribe to this list) or private (only the list creator can view or subscribe to this list). Once a list is created, the user can add/edit/delete list members. As the purpose of Twitter Lists is to help users organize the users they follow, the name of the list can be considered a meaningful label for the listed users. List creation therefore effectively applies the “wisdom of crowds” [12] to the task of classifying users, both in terms of their importance to the community (number of lists on which they appear), and also how they are perceived (e.g. news organization vs. celebrity, etc.).

Before describing our methods for classifying users in terms of the lists on which they appear, we emphasize that we are motivated by a particular set of substantive questions arising out of communications theory. In particular, we are interested in the relative importance of mass communications, as practiced by media and other formal organizations, masspersonal communications as practiced by celebrities and prominent bloggers, and interpersonal communications, as practiced by ordinary individuals communicating with their friends. In addition, we are also interested in the relationships between these categories of users, motivated by theoretical arguments such as the theory of the two-step flow [6]. Rather than pursuing a strategy of automatic classification, therefore, our approach depends on defining and identifying certain predetermined classes of theoretical interest, where both approaches have advantages and disadvantages. In particular, we restrict our attention to four classes of what we call “elite” users: media, celebrities, organizations, and bloggers, as well as the relationships between these elite users and the much larger population of “ordinary” users.

² http://dev.twitter.com/doc/get/statuses/firehose

In addition to these theoretically-imposed constraints, our proposed classification method must also satisfy a practical constraint—namely that the rate limits established by Twitter’s API effectively preclude crawling all lists for all Twitter users.³ Thus we instead devised two different sampling schemes—a snowball sample and an activity sample—each with some advantages and disadvantages, discussed below.
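The back-of-the-envelope estimate behind this constraint (footnote 3) can be reproduced with a few lines of arithmetic. The figures (20K calls per hour, 20 lists per call, 40M users, nearly 140,000 lists for the most-listed user) come from the footnote; the function `crawl_hours` and the assumption that every user costs at least one call are illustrative choices, not part of the paper.

```python
import math

USERS = 40_000_000       # roughly the number of users in the 2009 crawl
CALLS_PER_HOUR = 20_000  # API rate limit quoted in footnote 3
LISTS_PER_CALL = 20      # lists returned per API call

def crawl_hours(users, lists_per_user):
    """Hours needed if every user appears on `lists_per_user` lists.

    Assumption: each user costs at least one call, even with zero lists.
    """
    calls_per_user = max(1, math.ceil(lists_per_user / LISTS_PER_CALL))
    return users * calls_per_user / CALLS_PER_HOUR

hours = crawl_hours(USERS, 20)      # one call per user
weeks = hours / (24 * 7)            # convert hours to weeks
gaga_calls = math.ceil(140_000 / LISTS_PER_CALL)  # calls for a single heavily listed user
```

With one call per user the crawl takes 2,000 hours, just under 12 weeks, and a single user listed 140,000 times needs thousands of calls alone, which is why the footnote treats the figure as a lower bound.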

3.3.1 Snowball sample of Twitter Lists

The first method for identifying elite users employed snowball sampling. For each category, we chose a number u0 of seed users that were highly representative of the desired category and appeared on many category-related lists. For each of the four categories above, the following seeds were chosen:

• Celebrities: Barack Obama, Lady Gaga, Paris Hilton

• Media: CNN, New York Times

• Organizations: Amnesty International, World Wildlife Foundation, Yahoo! Inc., Whole Foods

• Blogs⁴: BoingBoing, FamousBloggers, problogger, mashable, Chrisbrogan, virtuosoblogger, Gizmodo, Ileane, dragonblogger, bbrian017, hishaman, copyblogger, engadget, danielscocco, BlazingMinds, bloggersblog, TycoonBlogger, shoemoney, wchingya, extremejohn, GrowMap, kikolani, smartbloggerz, Element321, brandonacox, remarkablogger, jsinkeywest, seosmarty, NotAProBlog, kbloemendaal, JimiJones, ditesco

After reviewing the lists associated with these seeds, the following keywords were hand-selected based on (a) their representativeness of the desired categories; and (b) their lack of overlap between categories:

³ The Twitter API allows only 20K calls per hour, where at most 20 lists can be retrieved for each API call. Under the modest assumption of 40M users (roughly the number in the 2009 crawl by [8]), where each user is included on at most 20 lists, this would require $(4 \times 10^7)/(2 \times 10^4) = 2{,}000$ hours, or roughly 12 weeks. Clearly this time could be reduced by deploying multiple accounts, but it also likely underestimates the real time quite significantly, as many users appear on many more than 20 lists (e.g. Lady Gaga appears on nearly 140,000).

⁴ The blogger category required many more seeds because bloggers are in general lower profile than the seeds for the other categories.

Figure 1: Schematic of the Snowball Sampling Method (alternating layers of users u0, u1, u2 and lists l0, l1, l2)

• Celebrities: star, stars, hollywood, celebs, celebrity, celebrities, celebsverified, celebrity-list, celebrities-on-twitter, celebrity-tweets

• Media: news, media, news-media

• Organizations: company, companies, organization, organisation, organizations, organisations, corporation, brands, products, charity, charities, causes, cause, ngo

• Blogs: blog, blogs, blogger, bloggers

Having selected the seeds and the keywords for each category, we then performed a snowball sample of the bipartite graph of users and lists (see Figure 1). For each seed, we crawled all lists on which that seed appeared. The resulting “list of lists” was then pruned to contain only the l0 lists whose names matched at least one of the chosen keywords for that category. For instance, Lady Gaga is on lists called “faves”, “celebs”, and “celebrity”, but only the latter two lists would be kept after pruning. We then crawled all u1 users appearing in the pruned “list of lists” (for instance, finding all users that appeared in the “celebrity” list with Lady Gaga), and then repeated these last two steps to complete the crawl. In total, 524,116 users were obtained, who appeared on 7,000,000 lists; however, many of the more prominent users appeared on lists in more than one category—for example Oprah Winfrey is frequently included in lists of “celebrity” as well as “media.” To resolve this ambiguity, we computed a user i’s membership score in category c:

$$w_{ic} = \frac{n_{ic}}{N_c},$$

where $n_{ic}$ is the number of lists in category c that contain user i and $N_c$ is the total number of lists in category c. We then assigned each user to the category in which he or she has the highest membership score. The number of users assigned in this manner to each category is reported in Table 1.
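The crawl and the scoring step above can be sketched together on toy data. Everything below is an illustrative assumption rather than the authors' implementation: list identifiers of the form `owner/name`, the dictionaries standing in for API crawls, the token-based keyword test in `matches`, and the default of two expansion rounds.

```python
from collections import defaultdict

def matches(list_id, keywords):
    """Assumed keyword test: whole list name, or any hyphen-separated token."""
    name = list_id.split("/", 1)[1].lower()
    return name in keywords or any(tok in keywords for tok in name.split("-"))

def snowball(seeds, keywords, lists_of_user, members_of_list, rounds=2):
    """Alternate users -> keyword-matching lists -> users, starting from seeds."""
    users, lists, frontier = set(seeds), set(), set(seeds)
    for _ in range(rounds):
        new_lists = {l for u in frontier for l in lists_of_user.get(u, ())
                     if matches(l, keywords)} - lists
        lists |= new_lists
        new_users = {m for l in new_lists for m in members_of_list.get(l, ())} - users
        users |= new_users
        frontier = new_users
    return users, lists

def assign_categories(lists_by_category, members_of_list):
    """Assign each user to the category c maximizing w_ic = n_ic / N_c."""
    n = defaultdict(lambda: defaultdict(int))  # n[user][category] = n_ic
    for c, lists in lists_by_category.items():
        for l in lists:
            for u in members_of_list.get(l, ()):
                n[u][c] += 1
    return {u: max(counts, key=lambda c: counts[c] / len(lists_by_category[c]))
            for u, counts in n.items()}

# Hypothetical toy crawl data.
lists_of_user = {
    "ladygaga": ["x/faves", "y/celebs", "z/celebrity"],
    "oprah": ["z/celebrity", "m/media"],
    "cnn": ["m/media", "n/news", "p/news-media"],
}
members_of_list = {
    "x/faves": ["ladygaga", "bob"],
    "y/celebs": ["ladygaga"],
    "z/celebrity": ["ladygaga", "oprah"],
    "m/media": ["oprah", "cnn"],
    "n/news": ["cnn"],
    "p/news-media": ["cnn"],
}
celeb_users, celeb_lists = snowball(["ladygaga"], {"celebs", "celebrity"},
                                    lists_of_user, members_of_list)
media_users, media_lists = snowball(["cnn"], {"news", "media", "news-media"},
                                    lists_of_user, members_of_list)
assignment = assign_categories({"celeb": celeb_lists, "media": media_lists},
                               members_of_list)
```

In the toy data the "faves" list is pruned, and the user who appears in both categories is assigned to the one where the normalized score is higher (1/2 of celebrity lists versus 1/3 of media lists). Normalizing by $N_c$ keeps a category with many lists from absorbing ambiguous users by sheer volume.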

3.3.2 Activity Sample of Twitter Lists

Although the snowball sampling method is convenient and is easily interpretable with respect to our theoretical motivation, it is also potentially biased by our particular choice of seeds. To address this concern, we also generate a sample of users based on their activity. Specifically, we crawl all lists associated with all users who tweet at least once every week for our entire observation period.
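One way to read the activity criterion is sketched below. The paper does not say how weeks are delimited, so the consecutive 7-day windows counted from the first day of the observation period are an assumption, as are the function name `active_every_week` and the toy tweet dates.

```python
from datetime import date, timedelta

START = date(2009, 7, 28)  # first day of the observation period
END = date(2010, 3, 8)     # last day of the observation period

def active_every_week(tweet_dates, start=START, end=END):
    """True if at least one tweet falls in every 7-day window of [start, end].

    Assumption: windows are consecutive 7-day blocks counted from `start`.
    """
    n_weeks = ((end - start).days + 1 + 6) // 7  # ceiling of inclusive days / 7
    seen = {(d - start).days // 7 for d in tweet_dates if start <= d <= end}
    return len(seen) == n_weeks

# Hypothetical users: one tweets every 7 days, the other skips a week.
steady = [START + timedelta(days=7 * i) for i in range(32)]
lapsed = steady[:10] + steady[11:]
steady_ok = active_every_week(steady)
lapsed_ok = active_every_week(lapsed)
```

Under this reading the observation period splits into exactly 32 windows, and a single silent week is enough to exclude a user, which is why the resulting sample leans toward consistently active accounts.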

This “activity-based” sampling method is also clearly biased towards users who are consistently active. Importantly, however, the bias is likely to be quite different from any introduced by the snowball sample; thus obtaining similar results from the two samples should give us confidence that our findings are not artifacts of the sampling procedure. This method initially yielded 750k users and 5M lists; however, after pruning the lists to those that contained at least one of the keywords above, and assigning users to unique categories (as described above), we obtained a much-reduced sample of 113,685 users, where Table 1 reports the number of users assigned to each category. We note that the number of lists obtained by the activity sampling method is considerably smaller than that obtained by the snowball sample, and that bloggers are more heavily represented among the activity sample at the expense of the other three categories—consistent with our claim that the two methods introduce different biases. Interestingly, however, 97,614 of the activity sample, or roughly 86%, also appear in the snowball sample, suggesting that the two sampling methods identify similar populations of elite users, as indeed we confirm in the next section.

Table 1: Distribution of users over categories

            Snowball Sample          Activity Sample
category    # of users  % of users   # of users  % of users
celeb           82,770       15.8%       14,778       13.0%
media          216,010       41.2%       40,186       35.3%
org             97,853       18.7%       14,891       13.1%
blog           127,483       24.3%       43,830       38.6%
total          524,116        100%      113,685        100%

3.3.3 Classifying Elite Users

In order to identify categories of elite users, we not only need to classify users into categories, but also arrive at a definition of “elite” that satisfies a tradeoff between (a) keeping each category relatively small, so as not to include users who are not distinguishable from ordinary users, while (b) maximizing the volume of attention that is accounted for by each category. In addition, it is also desirable to make the four categories the same size, so as to facilitate comparisons. To this end, we first rank all users in each category by how frequently they are listed in that category. Next, we measure the flow of information from the top k users in each of the four categories to a random sample of 100K ordinary (i.e. unclassified) users in two ways: the proportion of people the user follows in each category, and the proportion of tweets the user received from everyone the user follows in each category.

Figure 2(a) shows for the snowball sample the share of following links (square symbols) and tweets received (diamonds) by an average user, while Figure 2(b) shows the same information for the activity sample. Although the numerical values differ slightly, the two sets of results are qualitatively similar. In particular, for both sampling methods, celebrities outrank all other categories, followed by the media, organizations, and bloggers. Also in both cases, the bulk of the attention is accounted for by a relatively small number of users within each category, as evidenced by the relatively flat slope of the attention curves, where we note that the curve for celebrities asymptotes more slowly than for the other three categories. Balancing the requirements described above, therefore, we chose k = 5000 as a cut-off for the elite categories, where all remaining users are henceforth classified as ordinary. In addition, from this point on, we restrict our analysis of elite categories to the top 5,000 users identified by the snowball sampling method, noting that both methods generate similar results.
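The two attention measures used to choose k can be sketched for a single ordinary user as follows. The names `top_k` and `attention_shares`, the toy list counts, and the toy tweet counts are all hypothetical; in the paper these shares are averaged over a random sample of 100K ordinary users.

```python
def top_k(list_counts, k):
    """The k users listed most often in one category."""
    return set(sorted(list_counts, key=list_counts.get, reverse=True)[:k])

def attention_shares(friends, tweet_counts, elite_by_category):
    """For one user, return {category: (share of friends, share of tweets received)}.

    friends: accounts the user follows; tweet_counts: tweets posted by each account.
    """
    total_friends = len(friends)
    total_tweets = sum(tweet_counts.get(f, 0) for f in friends)
    shares = {}
    for category, elite in elite_by_category.items():
        followed = [f for f in friends if f in elite]
        tweets = sum(tweet_counts.get(f, 0) for f in followed)
        shares[category] = (
            len(followed) / total_friends if total_friends else 0.0,
            tweets / total_tweets if total_tweets else 0.0,
        )
    return shares

# Hypothetical data: how often each account is listed under "media".
media_list_counts = {"cnn": 900, "nytimes": 800, "localpaper": 3}
elite = {"media": top_k(media_list_counts, k=2)}

# One ordinary user follows four accounts that posted these many tweets.
friends = ["cnn", "alice", "bob", "localpaper"]
tweet_counts = {"cnn": 60, "alice": 10, "bob": 10, "localpaper": 20}
shares = attention_shares(friends, tweet_counts, elite)
```

Here the one elite media account is a quarter of the user's friends but supplies 60% of the tweets received, which illustrates why the two measures are reported separately.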

[Figure 2, panels (a) Snowball sample and (b) Activity sample. Each panel has one plot per category (celebrities, media, organizations, blogs) showing the average % of friends and of tweets received (y-axis, 0 to 30) against top k (x-axis, 1000 to 10000).]

Figure 2: Average fraction of # following (blue line) and # tweets (red line) for a random user that are accounted for by the top k elite users crawled

Based on this definition of elite users, Table 2 shows that although ordinary users collectively introduce by far the highest number of URLs, members of the elite categories are far more active on a per-capita basis. In particular, users classified as “media” easily outproduce all other categories, followed by bloggers, organizations, and celebrities. Ordinary users originate on average only about 6 URLs each, compared with over 1,000 for media users. In the rest of this paper, therefore, when we talk about “celebrity”, “media”, “organization”, “blog”, we refer to the top 5K users drawn from the snowball sample listed as “celebrity”, “media”, “organization”, “blog”, respectively.
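As a quick check on the per-capita comparison, the elite figures can be recomputed from the URL totals in Table 2 and the cut-off of 5,000 users per category. The variable names are illustrative; the totals are the ones reported in the table.

```python
# URL totals per elite category, as reported in Table 2.
URLS = {"celeb": 139_058, "media": 5_119_739, "org": 523_698, "blog": 1_360_131}
ELITE_PER_CATEGORY = 5_000  # k = 5000 users in each elite category

per_capita = {c: n / ELITE_PER_CATEGORY for c, n in URLS.items()}
ranking = sorted(per_capita, key=per_capita.get, reverse=True)
```

The resulting order (media, blog, org, celeb) matches the ranking stated in the text, with media users averaging over 1,000 URLs each.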

Table 2: # of URLs initiated by category

category    # of URLs      # of URLs per-capita
celeb       139,058        27.81
media       5,119,739      1023.94
org         523,698        104.74
blog        1,360,131      272.03
ordinary    244,228,364    6.10

Table 3, which shows the top 5 users in each of the four categories, suggests that the sampling method yields results that are consistent with our objective of identifying users who are prominent exemplars of our target categories. Among the celebrity list, for example, “aplusk” is the handle for actor Ashton Kutcher, one of the first celebrities to embrace Twitter and still one of the most followed users, while the remaining celebrity users (Lady Gaga, Ellen DeGeneres, Oprah Winfrey, and Taylor Swift) are all household names. In the media category, CNN Breaking News and the New York Times are most prominent, followed by Breaking News, Time, and Asahi, a leading Japanese daily newspaper. Among organizations, Google, Starbucks, and Twitter are obviously large and socially prominent corporations, while JoinRed is the charity organization started by Bono of U2, and ollehkt is the Twitter account for KT, formerly Korea Telecom. Finally, among the blogging category, Mashable and ProBlogger are both prominent US blogging sites, while Kibe Loco and Nao Salvo are popular blogs in Brazil, and dooce is the blog of Heather Armstrong, a widely read “mommy blogger” with over 1.5M followers.

Table 3: Top 5 users in each category

Celebrity        Media           Org          Blog
aplusk           cnnbrk          google       mashable
ladygaga         nytimes         Starbucks    problogger
TheEllenShow     asahi           twitter      kibeloco
taylorswift13    BreakingNews    joinred      naosalvo
Oprah            TIME            ollehkt      dooce

4. “WHO LISTENS TO WHOM”

The results of the previous section provide qualified support for the conventional wisdom that audiences have become increasingly fragmented. Clearly, ordinary users on Twitter are receiving their information from many thousands of distinct sources, most of which are not traditional media organizations: even though media outlets are by far the most active users on Twitter, only about 15% of tweets received by ordinary users are received directly from the media. Equally interesting, however, is that in spite of this fragmentation, it remains the case that 20K elite users, comprising less than 0.05% of the user population, attract almost 50% of all attention within Twitter. Even if the media has lost attention relative to other elites, information flows have not become egalitarian by any means.

The prominence of elite users also raises the question of how these different categories listen to each other. To address this issue, we compute the volume of tweets exchanged between elite categories. Specifically, Figure 3 shows the average percentage of tweets that category i receives from category j, exhibiting striking homophily with respect to attention: celebrities overwhelmingly pay attention to other celebrities, media actors pay attention to other media actors, and so on. The one slight exception to this rule is that organizations pay more attention to bloggers than to themselves. In general, in fact, attention paid by organizations is more evenly distributed across categories than for any other category.

[Figure 3 diagram omitted. Nodes: Celeb, Media, Org, Blog (category of Twitter users); a link from A to B indicates that B receives tweets from A.]

Figure 3: Share of tweets received among elite categories

Figure 4: RT behavior among elite categories
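To make the quantity behind Figure 3 concrete, the following is a minimal sketch of how an average share of tweets received by category i from category j could be computed. The input layout, the function name `average_share`, and the toy counts are all hypothetical; in particular, the text does not say whether shares are normalized by elite-sourced tweets only or by all tweets received, so the denominator used here (elite-sourced tweets) is an assumption.

```python
from collections import defaultdict

CATEGORIES = ["celeb", "media", "org", "blog"]


def average_share(users):
    """users: iterable of (receiver_category, {source_category: tweets_received}).

    Returns share[i][j]: the mean, over users in category i, of the fraction
    of that user's elite-sourced tweets that came from category j.
    """
    totals = {i: defaultdict(float) for i in CATEGORIES}
    n_users = defaultdict(int)
    for category, received in users:
        volume = sum(received.get(j, 0) for j in CATEGORIES)
        if volume == 0:
            continue  # no elite-sourced tweets, so no share is defined
        n_users[category] += 1
        for j in CATEGORIES:
            totals[category][j] += received.get(j, 0) / volume
    return {
        i: {j: totals[i][j] / n_users[i] for j in CATEGORIES}
        for i in CATEGORIES
        if n_users[i] > 0
    }


# Toy data (made up): two celebrity accounts and one media account.
toy_users = [
    ("celeb", {"celeb": 8, "media": 2}),
    ("celeb", {"celeb": 6, "blog": 4}),
    ("media", {"media": 9, "org": 1}),
]
share = average_share(toy_users)
print(share["celeb"])
```

Homophily in this representation shows up as large diagonal entries share[i][i] relative to the off-diagonal entries in the same row.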

Figure 3, it should be noted, shows only how many URLs are received by category i from category j, a particularly weak measure of attention for the simple reason that many tweets go unread. A stronger measure of attention, therefore, is to consider instead only those URLs introduced by category i that are subsequently retweeted by category j. Figure 4 shows how much information originating from each category is retweeted by other categories. As with our previous measure of attention, retweeting is strongly homophilous among elite categories; however, bloggers are disproportionately responsible for retweeting URLs originated by all categories. This result reflects the characterization of bloggers as recyclers and filters of information. However, even though on a per-capita basis bloggers disproportionately occupy the role of information recyclers (93 retweets per person, compared to only 1.1 retweets per person for ordinary users), the total number of URLs retweeted by bloggers (465k) is vastly outweighed by the number retweeted by ordinary users (46M); thus their overall impact is relatively minimal.

4.1 Two-Step Flow of Information

Examining information flow on Twitter can also shed new light on the theory of the two-step flow, arguably the theory that has most successfully captured the dueling importance of mass media and interpersonal influence. The essence of the two-step flow is that information passes from the media to the masses not directly, as supposed by early theories of mass communication, but passes first through an intermediate layer of “opinion leaders” who decide which information to rebroadcast to their followers, and which to ignore. As we have already noted, on Twitter the flow of information to the masses from the media accounts for only a fraction of the total volume of information. Nevertheless, it is still a substantial fraction, so it is still interesting to ask: for the special case of information originating from media sources, what proportion is broadcast directly to the masses, and what proportion is transmitted indirectly via some population of intermediaries? In addition, we may inquire whether these intermediaries, to the extent they exist, are drawn from other elite categories or from ordinary users, as claimed by the two-step flow theory; and if the latter, in what respects they differ from other ordinary users.

Before proceeding with this analysis, we note that there are two ways information can pass through an intermediary in Twitter. The first is via retweeting, which occurs when a user explicitly rebroadcasts a URL that he or she has received, along with an explicit acknowledgement of the source, either using the official retweet function provided by Twitter, or making use of an informal convention such as “RT @user” or “via @user.” The second mechanism is what we label reintroduction, where a user subsequently tweets a URL that has previously been introduced by another user, but without the acknowledgment, in which case we assume the information has been rediscovered independently. For the purposes of studying when a user receives information directly from the media or indirectly through an intermediary, we treat retweets and reintroductions equivalently. If the first occurrence of a URL in Twitter came from a media user, but a user received the URL from another source, then that source can be considered an intermediary, whether they are citing the source within Twitter by retweeting the URL, or reintroducing it, having discovered the URL outside of Twitter.
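The distinction between retweets and reintroductions can be sketched as a simple rule over the occurrences of a URL. The snippet below is an illustration, not the procedure actually used: the function name, its arguments, and the regular expression are assumptions, and the pattern covers only the two informal conventions named above (“RT @user” and “via @user”).

```python
import re

# Informal attribution conventions named in the text: "RT @user" and "via @user".
ATTRIBUTION = re.compile(r"\b(?:RT|via)\s+@(\w+)", re.IGNORECASE)


def classify_occurrence(text: str, is_official_retweet: bool, is_first: bool) -> str:
    """Label one appearance of a URL in a tweet.

    is_official_retweet: hypothetical flag marking use of Twitter's retweet function.
    is_first: True if this is the first observed appearance of the URL.
    """
    if is_first:
        return "introduction"
    if is_official_retweet or ATTRIBUTION.search(text):
        return "retweet"
    # Later appearance without acknowledgement of a source.
    return "reintroduction"


# Made-up example tweets sharing the same placeholder URL.
examples = [
    ("Big story http://bit.ly/x", False, True),
    ("RT @nytimes: Big story http://bit.ly/x", False, False),
    ("Worth a read http://bit.ly/x via @cnnbrk", False, False),
    ("Worth a read http://bit.ly/x", False, False),
]
labels = [classify_occurrence(*e) for e in examples]
print(labels)
```

For the two-step analysis both “retweet” and “reintroduction” count as passing through an intermediary, so the two labels would be merged at that stage.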

To quantify the extent to which ordinary users get their information indirectly versus directly from the media, we sampled 1M random ordinary users⁵, and for each user, counted the number n of bit.ly URLs they had received that had originated from one of our 5K media users, where of the 1M total, 600K had received at least one such URL. For each member of this 600K subset we then counted the number n2 of these URLs that they received via non-media friends; that is, via a two-step flow. The average fraction n2/n = 0.46 therefore represents the proportion of media-originated content that reaches the masses via an intermediary rather than directly. As Figure 5 shows, however, this average is somewhat misleading. In reality, the population comprises two types: those who receive essentially all of their media-originating information via two-step flows and those who receive virtually all of it directly from the media. Unsurprisingly, the former type is exposed to less total media than the latter. What is surprising, however, is that even users who received up to 100 media URLs during our observation period received all of them from opinion leaders.

⁵ As before, performing this analysis for the entire population of over 40M ordinary users proved to be computationally infeasible.

[Figure 5 plots omitted. Panels: a, b, c, d.]

Figure 5: Percentage of information that is received via an intermediary as a function of total volume of media content to which a user is exposed.
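The per-user fraction n2/n described above can be illustrated with a short sketch. The data layout, names, and toy records are hypothetical. One detail the text leaves open is how to treat a URL that a user receives both from a media account and from a non-media friend; the sketch counts such a URL toward n2, which is an assumption.

```python
from collections import defaultdict
from statistics import mean


def two_step_fractions(deliveries, media_users):
    """deliveries: iterable of (recipient, url, sender) for media-originated URLs.
    media_users: set of media account names.

    Returns {recipient: n2 / n}, where n is the number of distinct URLs received
    and n2 the number received from at least one non-media sender.
    """
    senders = defaultdict(lambda: defaultdict(set))
    for recipient, url, sender in deliveries:
        senders[recipient][url].add(sender)
    fractions = {}
    for recipient, by_url in senders.items():
        n = len(by_url)
        n2 = sum(1 for who in by_url.values() if who - media_users)
        fractions[recipient] = n2 / n
    return fractions


# Toy records (made up): u2 relies entirely on a friend, u3 entirely on the media.
media = {"nytimes", "cnnbrk"}
toy = [
    ("u1", "a", "nytimes"),
    ("u1", "b", "friend1"),
    ("u2", "a", "friend2"),
    ("u2", "c", "friend2"),
    ("u3", "a", "cnnbrk"),
]
fractions = two_step_fractions(toy, media)
average = mean(fractions.values())
print(fractions, average)
```

The toy data also shows why an average alone can mislead: the mean fraction is 0.5 even though two of the three users sit at the extremes of 0 and 1, which mirrors the two-type pattern reported for Figure 5.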

Who are these intermediaries, and how many of them are there? In total, the population of intermediaries is smaller than that of the users who rely on them, but still surprisingly large, roughly 500K, the vast majority of which (96%) are classified as ordinary users, not elites. Interestingly, Figure 5c also shows that at least some intermediaries also receive the bulk of their media content indirectly, just like other ordinary users. Comparing Figure 5a and 5c, however, we note that intermediaries are not like other ordinary users in that they are exposed to considerably more media than randomly selected users, hence the number of intermediaries who rely on two-step flows is much smaller than for random users. In addition, we find that on average intermediaries have more followers than randomly sampled users (543 followers versus 34) and are also more active (180 tweets on average, versus 7). Finally, Figure 6 shows that although all intermediaries, by definition, pass along media content to at least one other user, a minority satisfies this function for multiple users, where we note that the most prominent intermediaries are disproportionately drawn from the 4% elite users: Ashton Kutcher (aplusk), for example, acts as an intermediary for over 100,000 users.

Interestingly, these results are all broadly consistent with the original conception of the two-step flow, advanced over 50 years ago, which emphasized that opinion leaders were “distributed in all occupational groups, and on every social and economic level,” corresponding to our classification of most intermediaries as ordinary [6]. The original theory also emphasized that opinion leaders, like their followers, also received at least some of their information via two-step flows, but that in general they were more exposed to the media than their followers, just as we find here. Finally, the theory predicted that opinion leadership was not a binary attribute, but rather a continuously varying one, corresponding to our finding that intermediaries vary widely in the number of users for whom they act as filters and transmitters of media content. Given the length of time that has elapsed since the theory of the two-step flow was articulated, and the transformational changes that have taken place in communications technology in the interim (given, in fact, that a service like Twitter was likely unimaginable at the time), it is remarkable how well the theory agrees with our observations.

[Figure 6 plot omitted. x-axis: # of two-step recipients (ticks from 0 to 131072); y-axis: # of opinion leaders.]

Figure 6: Frequency of intermediaries binned by # randomly sampled users to whom they transmit media content.

5. WHO LISTENS TO WHAT?

The results in section 4 demonstrate that “elite” users account for a substantial portion of all of the attention on Twitter, but also show clear differences in how the attention is allocated to the different elite categories. It is therefore interesting to consider what kinds of content are being shared by these categories. Given the large number of URLs in our observation period (260M), and the many different ways one can classify content (video vs. text, news vs. entertainment, political news vs. sports news, etc.), classifying even a small fraction of URLs according to content is an onerous task. Bakshy et al. [1], for example, used Amazon’s Mechanical Turk to classify a stratified sample of 1,000 URLs along a variety of dimensions; however, this method does not scale well to larger sample sizes.

Instead, we restrict attention to URLs originated by the New York Times which, with over 2.5M followers, is the second-most-followed news organization on Twitter, after CNN Breaking News. NY Times, however, is roughly ten times as active as CNN Breaking News, so it is arguably a better source of data. To classify NY Times content, we exploit a convenient feature of their format, namely that all NY Times URLs are classified in a consistent way by the section in which they appear (e.g. U.S., World, Sports, Science, Arts, etc.)⁶. Of the 6398 New York Times bit.ly URLs we observed, 6370 could be successfully unshortened and assigned to one of 21 categories. Of these, however, only 9 categories had more than 100 URLs during the observation period, one of which (“NY region”) was highly specific to the New York metropolitan area; thus we focused our attention on the remaining 8 topical categories. Figure 7 shows the proportion of URLs from each New York Times section retweeted or reintroduced by each category. World news is the most popular category, followed by U.S. News, Business, and Sports, where increasingly niche categories like Health, Arts, Science, and Technology are less popular still. In general, the overall pattern is replicated for all categories of users, but there are some minor deviations: in particular, organizations show disproportionately little interest in business and arts-related stories, and disproportionately high interest in science, technology, and possibly world news. Celebrities, by contrast, show greater interest in sports and less interest in health, while the media shows somewhat greater interest in U.S. news stories.

⁶ http://www.nytimes.com/year/month/day/category/title.html?ref=category

[Figure 7 plots omitted. Panels: 1. World News, 2. U.S. News, 3. Business, 4. Sports, 5. Health, 6. Technology, 7. Science, 8. Arts. x-axis: user category (blog, celeb, media, org, other); y-axis: % RTs and re-introductions (0.00 to 0.35).]

Figure 7: Number of RTs and Reintroductions of New York Times stories by content category
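The section-based classification relies only on the URL pattern given in footnote 6. As a hedged illustration, the sketch below extracts the section from an already-unshortened URL; the function name and the example URLs are made up, and real URLs may deviate from the pattern, in which case the function returns None rather than guessing.

```python
from typing import Optional
from urllib.parse import urlparse


def nyt_section(url: str) -> Optional[str]:
    """Return the section of a URL shaped like
    http://www.nytimes.com/year/month/day/category/title.html?ref=category,
    or None if the path does not follow that pattern.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path components: year, month, day, category, ..., title.html
    if len(parts) < 5 or not all(p.isdigit() for p in parts[:3]):
        return None
    return parts[3]


# Hypothetical URLs following (or not following) the footnote's pattern.
urls = [
    "http://www.nytimes.com/2010/03/05/science/example-title.html?ref=science",
    "http://www.nytimes.com/2010/03/05/sports/baseball/example.html",
    "http://www.nytimes.com/pages/world/index.html",
]
sections = [nyt_section(u) for u in urls]
print(sections)
```

Counting the returned sections and keeping only those above a threshold (more than 100 URLs in the analysis above) would then yield the topical categories.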

5.1 Lifespan of Content

In addition to different types of content, URLs introduced by different types of elite users or ordinary users may exhibit different lifespans, by which we mean the time lag between the first and last appearance of a given URL on Twitter.

Naively, measuring lifespan seems a trivial matter; however, a finite observation period, which results in censoring of our data, complicates this task. In other words, a URL that is last observed towards the end of the observation period may be retweeted or reintroduced after the period ends, while correspondingly, a URL that is first observed toward the beginning of the observation window may in fact have been introduced before the window began. What we observe as the lifespan of a URL, therefore, is in reality a lower bound on the lifespan. Although this limitation does not create much of a problem for short-lived URLs, which account for the vast majority of our observations, it does create large biases for long-lived URLs. In particular, URLs that appear towards the end of our observation period will be systematically classified as shorter-lived than URLs that appear towards the beginning.

[Figure 8 schematic omitted. Labels: first observation of URL, last observation of URL, buffer δ, lifespan τ; estimation period = 133 days, evaluation period = 90 days; total observation window = 223 days.]

Figure 8: Schematic of lifespan estimation procedure

To address the censoring problem, we seek to determine a buffer δ at both the beginning and the end of our 223-day period, and only count URLs as having a lifespan of τ if (a) they do not appear in the first δ days, (b) they first appear in the interval between the buffers, and (c) they do not appear in the last δ days, as illustrated in Figure 8(a). To determine δ we first split the 223-day period into two segments, the first 133-day estimation period and the last 90-day evaluation period (see Figure 8(b)), and then ask: if we (a) observe a URL first appear in the first 133 − δ days and (b) do not see it in the δ days prior to the splitting point, how likely are we to see it in the last 90 days? Clearly this depends on the actual lifespan of the URL, as the longer a URL lives, the more likely it will re-appear in the future. Using this estimation/evaluation split, we find an upper bound on lifespan for which we can determine the actual lifespan with 95% accuracy as a function of δ. Finally, because we require a beginning and ending buffer, and because we can only classify a URL as having lifespan τ if it appears at least τ days before the end of our window, we need to pick τ and δ such that τ + 2δ ≤ 223. We determined that τ = 70 and δ = 70 sufficiently satisfied our constraints; thus for the following analysis, we consider only URLs that have a lifespan τ ≤ 70.
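The buffer criteria can be written down as a small filter. This is a sketch under stated assumptions, not the original implementation: days are taken to be 0-based integer indices within the 223-day window, lifespan is measured as the difference between the last and first day on which a URL is seen, and the function and constant names are hypothetical.

```python
WINDOW = 223   # total observation window, in days
DELTA = 70     # buffer at each end of the window
TAU_MAX = 70   # longest lifespan retained in the analysis

# The constraint from the text: tau + 2 * delta must fit inside the window.
assert TAU_MAX + 2 * DELTA <= WINDOW


def usable_lifespan(days_seen, window=WINDOW, delta=DELTA, tau_max=TAU_MAX):
    """days_seen: day indices (0 <= d < window) on which a URL appeared.

    Returns the lifespan in days if the URL avoids both buffers and its
    lifespan does not exceed tau_max; otherwise returns None.
    """
    first, last = min(days_seen), max(days_seen)
    if first < delta:            # appears in the leading buffer
        return None
    if last >= window - delta:   # appears in the trailing buffer
        return None
    lifespan = last - first
    return lifespan if lifespan <= tau_max else None


# Made-up observation histories.
kept = usable_lifespan([80, 81, 120])      # inside the buffers, lifespan 40
too_early = usable_lifespan([10, 90])      # seen during the first delta days
too_late = usable_lifespan([100, 200])     # seen during the last delta days
single = usable_lifespan([75])             # appeared once, lifespan 0
print(kept, too_early, too_late, single)
```

With δ = 70 the admissible days run from 70 to 152, so a URL spanning that whole interval would have a lifespan of 82 days and is excluded by the τ ≤ 70 restriction.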

5.2 Lifespan By Category

Having established a method for estimating URL lifespan, we now explore the lifespan of URLs introduced by different categories of users, as shown in Figure 9(a). URLs initiated by the elite categories exhibit a similar distribution over lifespan to those initiated by ordinary users. As Figure 9(b) shows, however, when looking at the percentage of URLs of different lifespans initiated by each category, we see two additional results: first, media actors originate a large portion of short-lived URLs (especially URLs with lifespan = 0, those that only appeared once); and second, URLs originated by bloggers are overrepresented among the longer-lived content. Both of these results can be explained by the type of content that originates from different sources: whereas news stories tend to be replaced by updates on a daily or more frequent basis, the sorts of URLs that are picked up by bloggers are of more persistent interest, and so are more likely to be retweeted or reintroduced months or even years after their initial introduction.

[Figure 9 plots omitted. Panel (a) Count: x-axis lifespan (day), 0 to 70; y-axis log(# of URLs with lifespan = x day); series: other, celeb, media, org, blog. Panel (b) Percent: x-axis lifespan (day), 0 to 70; y-axis % of URLs from elite category; series: celeb, media, org, blog.]

Figure 9: (a) Count and (b) percentage of URLs initiated by 5 categories, with different lifespans

To shed more light on the nature of long-lived content on Twitter, we used the bit.ly API service to unshorten 35K of the most long-lived URLs (URLs that lived at least 200 days), and mapped them into 21034 web domains. As Figure 10 shows, the population of long-lived URLs is dominated by videos, music, and books. Twitter, in other words, should be viewed as a subset of a much larger media ecosystem in which content exists and is repeatedly rediscovered by Twitter users. Some of this content, such as daily news stories, has a relatively short period of relevance, after which a given story is unlikely to be reintroduced or rebroadcast. At the other extreme, classic music videos, movie clips, and long-format magazine articles have lifespans that are effectively unbounded, and can seemingly be rediscovered by Twitter users indefinitely without losing relevance.
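Mapping unshortened URLs to domains and ranking them, as in the top-20 list of Figure 10, might look like the sketch below. It assumes the URLs have already been unshortened (the bit.ly step is not reproduced here), treats a leading "www." as insignificant, and uses made-up example URLs; the function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, k=20):
    """Count URLs per host (ignoring a leading 'www.') and return the k most common."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common(k)


# Hypothetical long-lived URLs after unshortening.
long_lived = [
    "http://www.youtube.com/watch?v=1",
    "http://youtube.com/watch?v=2",
    "http://www.amazon.com/dp/x",
]
ranked = top_domains(long_lived)
print(ranked)
```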

Figure 10: Top 20 domains for URLs that lived more than 200 days

Two related points are illustrated by Figure 11, which shows the average RT rate (the proportion of tweets containing the URL that are retweets of another tweet) of URLs with different lifespans, grouped by the categories that introduced the URL⁷. First, for ordinary users, the majority of appearances of URLs after the initial introduction derives not from retweeting, but rather from reintroduction, where this result is especially pronounced for long-lived URLs. For the vast majority of URLs on Twitter, in other words, longevity is determined not by diffusion, but by many different users independently rediscovering the same content, consistent with our interpretation above. Second, however, for URLs introduced by elite users, the result is somewhat the opposite; that is, they are more likely to be retweeted than reintroduced, even for URLs that persist for weeks. Although it is unsurprising that elite users generate more retweets than ordinary users, the size of the difference is nevertheless striking, and suggests that in spite of the dominant result above that content lifespan is determined to a large extent by type, the source of its origin also impacts its persistence, at least on average, a result that is consistent with previous findings [1].

⁷ Note here that URLs with lifespan = 0 are those URLs that only appeared once in our dataset, thus the RT rate is zero.

[Figure 11 plot omitted. x-axis: lifespan (day), 0 to 70; y-axis: RT rate (# of RTs / total # of occurrences), 0.0 to 1.0; series: other, celeb, media, org, blog.]

Figure 11: Average RT rate by lifespan for each of the originating categories

6. CONCLUSIONS

In this paper, we investigated a classic problem in media communications research, captured by the first part of Lasswell’s maxim, “who says what to whom”, in the context of Twitter. By restricting our attention to Twitter, our conclusions are necessarily limited to one narrow cross-section of the media landscape. Moreover, communications on Twitter may be unrepresentative of information flow via more traditional channels, such as TV and radio on the one hand, and interpersonal interactions on the other hand. However, we feel the advantages of using Twitter to answer this question outweigh the limitations. First, because Twitter users explicitly opt in to “follow” each other, and because Twitter maintains a complete record of every tweet broadcast, it provides an unprecedented level of resolution and coverage regarding who is listening to whom. Second, because Twitter users themselves classify other users by including them on lists, Twitter effectively provides a ready-made, crowd-sourced classification scheme of users.

By studying the flow of information among the five categories that we identified (media, celebrities, organizations, bloggers, and ordinary), our analysis sheds new light on some old questions of communications research. First, we find that although audience attention has indeed fragmented among a wider pool of content producers than in classical models of mass media, attention remains highly concentrated, where roughly 0.05% of the population accounts for almost half of all attention. Within the population of elite users, moreover, attention is highly homophilous, with celebrities following celebrities, media following media, and bloggers following bloggers. Second, we find considerable support for the two-step flow of information: almost half the information that originates from the media passes to the masses indirectly via a diffuse intermediate layer of opinion leaders, who although classified as ordinary users, are more connected and more exposed to the media than their followers. Third, we find that although all categories devote a roughly similar fraction of their attention to different categories of news (World, U.S., Business, etc.), there are some differences: organizations, for example, devote a surprisingly small fraction of their attention to business-related news. We also find that different types of content exhibit very different lifespans. In particular, media-originated URLs are disproportionately represented among short-lived URLs while those originated by bloggers tend to be overrepresented among long-lived URLs. Finally, we find that the longest-lived URLs are dominated by content such as videos and music, which are continually being rediscovered by Twitter users and appear to persist indefinitely.

In closing, we note that although our use of Twitter lists to label users was motivated by a specific set of questions regarding mass vs. interpersonal communications, and that for this reason we have focused on a limited set of predetermined user categories, it would also be interesting to explore automatic classification schemes from which additional user categories could emerge. In particular, such an approach would allow one to examine the category of opinion leaders in more detail, possibly identifying opinion leaders for different topics, as has been proposed elsewhere [14]. In addition, another area for future work would be to extract content information in a more systematic manner, shedding more light on the “what” element of Lasswell’s maxim. And finally, a significant challenge for future work is to merge the data regarding information flow on Twitter with other sources of outcome data (relating, for example, to opinions or actions) that would engage more directly with the “effects” component of Lasswell’s maxim.

7. REFERENCES

[1] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Identifying ‘influencers’ on twitter. In Fourth ACM International Conference on Web Search and Data Mining (WSDM), Hong Kong, 2011. ACM.

[2] W. L. Bennett and S. Iyengar. A new era of minimal effects? the changing foundations of political communication. Journal of Communication, 58(4):707–731, 2008.

[3] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence on twitter: The million follower fallacy. In 4th Int’l AAAI Conference on Weblogs and Social Media, Washington, DC, 2010.

[4] J. S. Coleman, E. Katz, and H. Menzel. The diffusion of an innovation among physicians. Sociometry, 20(4):253–270, 1957.

[5] T. Gitlin. Media sociology: The dominant paradigm. Theory and Society, 6(2):205–253, 1978.

[6] E. Katz and P. F. Lazarsfeld. Personal influence; the part played by people in the flow of mass communications. Free Press, Glencoe, Ill., 1955.

[7] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, 2006.

[8] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World Wide Web, pages 591–600. ACM, 2010.

[9] H. D. Lasswell. The structure and function of communication in society. In L. Bryson, editor, The Communication of Ideas, pages 117–130. University of Illinois Press, Urbana, IL, 1948.

[10] P. F. Lazarsfeld, B. Berelson, and H. Gaudet. The people’s choice; how the voter makes up his mind in a presidential campaign. Columbia University Press, New York, 3rd edition, 1968.

[11] R. K. Merton. Patterns of influence: Local and cosmopolitan influentials. In R. K. Merton, editor, Social theory and social structure, pages 441–474. Free Press, New York, 1968.

[12] J. Surowiecki. The Wisdom of Crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. Doubleday, New York, 1st edition, 2004.

[13] J. B. Walther, C. T. Carr, S. S. W. Choi, D. C. DeAndrea, J. Kim, S. T. Tong, and B. Van Der Heide. Interaction of interpersonal, peer, and media influence sources online. In Z. Papacharissi, editor, A Networked Self: Identity, Community, and Culture on Social Network Sites, pages 17–38. Routledge, 2010.

[14] G. Weimann. The Influentials: People Who Influence People. State University of New York Press, Albany, NY, 1994.

[15] J. Weng, E. P. Lim, J. Jiang, and Q. He. Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining, pages 261–270. ACM, 2010.