Icwsm13 Cascades

download Icwsm13 Cascades

of 10

Transcript of Icwsm13 Cascades

  • 8/13/2019 Icwsm13 Cascades

    1/10

    The Anatomy of Large Facebook Cascades

    P. Alex DowFacebook

    [email protected]

    Lada A. AdamicFacebook

    [email protected]

    Adrien FriggeriFacebook

    [email protected]

    Abstract

    When users post photos on Facebook, they have the op-tion of allowing their friends, followers, or anyone atall to subsequently reshare the photo. A portion of thebillions of photos posted to Facebook generates cas-cades of reshares, enabling many additional users tosee, like, comment, and reshare the photos. In this pa-per we present characteristics of such cascades in ag-gregate, finding that a small fraction of photos accountfor a significant proportion of reshare activity and gen-erate cascades of non-trivial size and depth. We alsoshow that the true influence chains in such cascades canbe much deeper than what is visible through direct at-tribution. To illuminate how large cascades can form,we study the diffusion trees of two widely distributedphotos: one posted on President Barack Obamas pagefollowing his reelection victory, and another posted byan individual Facebook user hoping to garner enoughlikes for a cause. We show that the two cascades, despiteachieving comparable total sizes, are markedly differentin their time evolution, reshare depth distribution, pre-dictability of subcascade sizes, and the demographicsof users who propagate them. The findings suggest not

    only that cascades can achieve considerable size but thatthey can do so in distinct ways.

    Introduction

    Social networks are highly adept at transmitting informa-tion. This function has only been enhanced by platformssuch as Facebook. It is not only the opportunity to trans-mit information that has been enhanced by online so-cial networks, but also the ability to study and, specifi-cally, to track such transmission. Of particular interest havebeen large cascades, which would validate models of vi-ral spread of both information and influence through so-cial networks. However, evidence of such large cascades

    has been scant. Both explicit and implicit transmissionroutes have shown cascades to be shallow and highly frag-mented in a number of different contexts, ranging fromviral marketing (Leskovec, Singh, and Kleinberg 2006;Leskovec, Adamic, and Huberman 2007), to diffusion ofvirtual goods (Bakshy et al. 2011), and spread of newson platforms such as Twitter (Goel, Watts, and Goldstein

    Copyright c 2013, Association for the Advancement of ArtificialIntelligence (www.aaai.org). All rights reserved.

    2012; Bhattacharya and Ram 2012). Counterexamples in-clude email chain letters which have been measured toreach depths of hundreds of steps (Liben-Nowell and Klein-berg 2008), although they potentially also have substantialbreadth (Golub and Jackson 2010). A specific instance of acopy-and-paste meme on Facebook similarly reached a max-imum depth of 40 (Adamic, Lento, and Fiore 2012).

    There are therefore what appear to be conflicting find-ings. On the one hand, large-scale aggregate analyses of con-tent that is propagating in online media shows little or noevidence of viral spread. On the other there are instanceswhere careful tracing of cascades through email headers,content structure, and a known contact graph, has yieldedcascades that have achieved a non-trivial depth. What ismore, these long cascades align with the common experi-ence of encountering the same content several times overa period of time, suggesting that it has been circulatingthrough social ties through non-trivial paths. What is lack-ing is an understanding of the extent of viral content andthe particular dynamics that govern highly viral spread. Re-cent experimental work has made considerable advances

    in understanding how individuals influence one another topropagate information (Bakshy et al. 2012) and the rela-tive importance of demographics, influence and susceptibil-ity in the underlying social network (Aral and Walker 2011;2012). However, while the individual decisions to share in-formation are now better understood, the global cascadesthat may or may not result are still obscured.

    In this paper we present a large-scale study of resharedphotos on Facebook. We show that a small but significantportion of the content is spreading virally. With an in-depthanalysis of two large cascades, each containing over a hun-dred thousand nodes, we demonstrate that the actual infor-mation cascade can be much deeper than what is typicallymeasured. Namely, an individual can reshare information

    directly from the source, appearing to be creating yet an-other depth 1 branch of the cascade. However, a more care-ful tracing of how they arrived at the source may place themat a considerably greater depth. We also examine the overallcharacteristics of such cascades: their growth over time, thenodes branching factor as a function of cascade depth andtime, and the response of users to repeat exposure. We showthat the two cascades appealed to different demographics,yet both spread through a substantial portion of the globe

  • 8/13/2019 Icwsm13 Cascades

    2/10

  • 8/13/2019 Icwsm13 Cascades

    3/10

    In this study we contrast the explicit credit attributionmade by users to inferred information paths. The explicitattribution works as follows. User A uploads a photo. In thesimplest case, user A simply adds the photo to his/her Time-line, although they can also attach the photo to a directedpost to another user, group, or page. The Timeline is a collec-tion of all the posts by and about the user. As friend B mightsee As photo if they visit As Timeline, or if a story about

    A posting the photo appears in Bs News Feed. The NewsFeed is a stream of stories about Bs friends activities, thatis curated by an automated ranking algorithm. Most viewsof photos occur via the News Feed as opposed to Timelineviews. When B sees the story (either on As Timeline or inhis/her own News Feed), if the photo is public, B can decideto reshare it. If user C sees the story of user B resharing,again either in her own News Feed or on Bs Timeline, userC can click share, and a post is generated that optionallycredits B with the story which is now also on Cs Timeline.This is recorded as a reshare of depth 2, whose parent is B,and whose root node is A.

    This explicit attribution information is potentially incom-plete, because C may not reveal how she came to learn

    about As photo. She could, for example, click on Bs re-share, which takes her to As post with the original photoupload, and from there share the photo directly from A. Thisplaces C at depth 1 from the root, making the cascade ap-pear shallower and more fragmented than it really is. Whenthis occurs one can rely on path indicators to reconstruct orrechain the cascade. For example if C clicks on Bs reshareand shortly therafter reshares As photo, one can rechainthe cascade by moving node C from being directly attachedto the root A and instead placing it as a child node of B.This process is naturally not without error. For example ifC clicks on multiple reshares, it is unclear which one to at-tribute to. To minimize error, we look only at reshares occur-ring within an hour of a click. In subsequent sections, we use

    rechained diffusion paths to demonstrate that actual infor-mation flow often occurs through deeper paths than what isvisible through direct attribution on platforms such as Face-book and Twitter.

    Reshares that have been posted by users who have clickedon another users reshare prior to sharing themselves are as-signed the latter as a parent. This rechained path reconstructsthe chain of influence, since the user encountered the infor-mation from a friend before passing it on, rather than inde-pendently arriving at the source. Similarly, we rechain sharesfrom Facebook Pages if the user posting as the page suchas the pages owner has clicked on another share beforeresharing from the source.

    With the rechained data we calculate the depth of each

    node in the diffusion tree. We observe that in aggregate in-terest in content dissipates rapidly, both in terms of timeelapsed since the photo is first posted and in terms of cas-cade depth (see Figure 2). However, many cascades achieveremarkable depth, and others sustain interest throughout thetwo week observation period. The fast fading of interest formost items is consistent with prior findings relating to col-lective attention in social media, e.g. Digg (Wu and Huber-man 2007).

    1e!06

    1e!03

    1e+00

    1 10 100Depth

    Pr

    oportion

    ofreshares

    Cascade

    trackedinferred

    !

    !

    !!!

    !

    !

    !

    !

    !

    !

    !!

    !!!!

    !

    !

    !!

    ! !

    !

    !!

    !

    !

    !

    !

    !

    ! !

    0.0001

    0.0100

    0 10 20 30 40 50Hour since uploadP

    roportion

    ofreshares

    Figure 2: The proportion of reshares recorded at a givendepth after applying the described rechaining process. In-set: the proportion of reshares as a function of time elapsedsince upload.

    Focusing specifically on photos that have been resharedat least 100 times, we find that while many have the bulk of

    their reshares occurring within a single step from the source,Figure 3 shows that there is a significant fraction for whichthe bulk of the reshares occurs deeper than one level intothe cascade. That is, a significant portion are spreading vi-rally, as opposed to being primarily broadcast from a popularsource.

    0.00

    0.02

    0.04

    0.00 0.25 0.50 0.75 1.00Fraction of reshares at depth 1

    Proportion

    ofcascades

    Figure 3: The proportion of reshares occurring at depth 1for a sample of photos that have been reshared at least 100times.

    Large cascades

    After establishing the distribution of cascade sizes anddepth, showing that a small but significant fraction achievewide and deep distribution, we study two large cascades,OVP and MLM individually. By focusing on these twomemes, we can gain intuition about the different mecha-nisms by which a cascade can grow large.

    From Figure 4 we see that rechaining for OVP has rel-atively little effect. As we will discuss in a later sec-tion, 50% of those sharing Obamas photo subscribe to his

  • 8/13/2019 Icwsm13 Cascades

    4/10

    0.0

    0.2

    0.4

    0.6

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    MLM

    OVP

    0 5 10 15 20Depth

    F

    raction

    ofreshares

    Cascade

    trackedinferred

    Figure 4: The effect of rechaining on the measured depth ofcascades for the two memes

    page, so they were likely directly exposed anyway. In con-trast, rechaining the MLM meme produces a more dramaticchange in the depth distribution, since it is unlikely that tensof thousands of individuals would have independently ar-

    rived at a users Timeline and then decided to reshare thephoto. Yet many nodes do remain at this depth after therechaining procedure. This is because our information tracesonly cover a subset of all channels via which users can be-come aware of photos before resharing them. Some usersarrive via links shared in email, through Facebook Groupsor instant messages. Others arrive from external sources dis-cussing the photo, among them popular news sources andvarious message boards. The close interaction between a so-cial medium and external sources has previously been ob-served in Twitter (Myers, Zhu, and Leskovec 2012). Nev-ertheless, the minority of nodes remaining at depth 1 afterrechaining shows that although some users chose to resharethe photo directly, most became acquainted with it through

    their social networks.Rechaining not only shows us that one of the photos

    achieved highly viral spread, but it also serves as a cau-tionary tale about taking explicit attribution cascades at facevalue. At first glance, if one were to calculate depth usingexplicit attribution, one would conclude that MLM is pre-dominantly shared from the root, and that it is an even shal-lower cascade than OVP. On the contrary, it is OVP that istop-heavy because of the very high connectivity of its rootnode the Obama Facebook page. These findings suggestthat at least some of the shallow cascades observed in priorstudies may be artifacts of missing links in the true chain ofinfluence.

    In Figure 6 we visualize both cascades before and after

    rechaining. As described in Figure 5, reshares are repre-sented by dots here the color denotes the nature of theresharer, black for users and red for pages and cascadesare laid out radially around the original post. The distanceof the dots to the center is a function of the time at which thereshare was posted. Starting with the original post, we dividethe space by allocating to each reshare an angle proportionalto the fraction of all reshares that are its descendants. Wethen recursively repeat this process to position every reshare

    on the plane, processing each child node in chronological or-der, which causes the layout to adopt a spiral look. For read-abilitys sake and because the layout is already constrainedby the cascades tree structure, we do not explicitly draw thediffusion edges.

    Figure 5: Schematic representation of radial visualization.Each reshare is allocated a distance to the center based onthe time it was posted at and an angle proportional to itsnumber of descendants. Its position is then determined bythe position of its parent reshare and its siblings.

    The difference of effect in rechaining is particularly strik-ing when visualizing the cascades before and after the treat-ment. In both cases, before rechaining, pages lie on the innerspiral which represents the direct reshares of the post. Afterrechaining, pages are spread out in the MLM cascade, mir-roring their increase in depth.

    Temporal spread

    While the two memes we are investigating both spreadwidely, the rate and manner in which they spread differsmarkedly. In both cases, the first 24 hours after the pho-tos were posted contained the majority of reshares, 96% forMLM and 90% for OVP. As of February, OVP was still be-ing reshared, enjoying a slight increase in popularity due to

    Valentines Day. Shortly after MLM had reached its goal of1 million likes at 18 hours after upload, the owner madethe photo private to himself and his friends, disabling fur-ther resharing. Figure 7 illustrates the number of resharesper hour for the two photos in their first 24 hours. OVP en-joyed a burst of attention with a large number of shares inthe first hour after it was posted, quickly and steadily fallingoff after that, except for two spikes where popular Face-book Pages shared the photo with their followers. In con-trast, MLM started out slowly, gaining traction before peak-ing at 17 hours and quickly dropping thereafter.

    Repeated exposure

    The most common way that users see photos and other con-

    tent on Facebook is through their News Feed. If a friend orpage they follow were to share a photo, the News Feed mightcontain a story that [Friend] shared [source]s photo, alongwith any comments the friend made, the photo, and the orig-inal text accompanying the photo. The user can then sharethe photo with their own friends by clicking on the Sharelink. Figure 8 shows, for each meme, the percentage of userswho shared the photo after seeing it some number of timesin their News Feed. For both memes, we see an increase in

  • 8/13/2019 Icwsm13 Cascades

    5/10

    (a) MLM before (b) MLM after

    (c) OVP before (d) OVP after

    Figure 6: Cascades before and after rechaining. Black nodes represent users and red nodes represent pages.

    share rate from one to two and three impressions, though theincrease is greater for MLM. This increase may be due toboth influence and homophily (Bakshy et al. 2012): each ad-ditional impression gives the user another chance to resharethe photo, but it also indicates that the user is surrounded byindividuals susceptible to the meme, and so is more likely tosusceptible as well.

    Branching factors

    Why did one meme fall in popularity from a high initialburst, while the other built up momentum gradually? We canget an idea of how the two memes spread by looking at thebranching factors of the reshare trees. Since each node isa reshare, the branching factor of an individual node is thenumber of subsequent reshares that followed directly fromthat node. If a cascade were to have a uniform branchingfactor above 1, it would spread throughout the network as a

    viral pandemic would, but below 1 it would peter out. Ratherthan being constant, the branching factor varies both overtime and by depth.

    Figure 9 shows the average branching factor for reshares(omitting the root shares) of each photo by hour since theoriginal post. For OVP, the mean number of reshares startsbelow 1 and drops steadily for later shares, with two no-table exceptions: at 10 and 14 hours after the photo was

    posted, two celebrity pages with large numbers of follow-ers (Michelle Obama and Alicia Keys, respectively) sharedthe photo, each individually generating tens of thousands ad-ditional reshares these reshares are especially notable inthe visualized cascades in Figure 6. For MLM, we againsee a drop in the first few hours. However, subsequently,the mean branching factor increased steadily until 16 hours,when it surpassed 1, with the meme reaching popular usersand pages, at which point it peaked and began dropping

  • 8/13/2019 Icwsm13 Cascades

    6/10

    !

    !

    ! ! !! !

    ! ! !!

    !! !

    !!

    ! ! !! ! ! ! !0

    50000

    100000

    150000

    0 5 10 15 20Hours since photo posted

    Shares

    perhour

    ! OVPMLM

    Figure 7: Shares per hour for each meme for the first 24hours after the respective photo was posted.

    !

    ! !

    !

    ! !

    !

    !

    !

    !!

    !

    !

    !

    !

    !

    ! !

    !

    !

    0.4

    0.6

    0.8

    1.0

    5 10 15 20Feed impressions

    Percentofuser

    s

    thatshared

    ! OVPMLM

    Figure 8: Of the users that saw either photo up to 20 times intheir Facebook News Feed, the percentage that then sharedit, with 95% confidence intervals.

    again. The drop is likely due to the photo reaching its goalof 1 million likes at 17.5 hours (see Figure 7), depressingthe branching factor of any nodes posting shortly before thegoal was reached, since many seeing those reshares wouldlikely also see that there was no need to reshare further.

    While the relationship between the reshare time andbranching factor is clearly different for the two memes, Fig-ure 10 shows that both memes have similar branching fac-tor at a given tree depth. For both memes, there was a dropfrom nodes at depth 1 to nodes at depth 2. This is likely dueto most popular Facebook Pages sharing directly from thesource. From depths 2 to 5 the mean branching factor in-creases at which point it stays fairly flat until later depthswhere there are fewer nodes and, thus, more noise.

    Implicit in our discussion so far is the idea that the branch-ing factor depends on the number of individuals any givennode could influence to reshare. These include friends andfollowers of users, and followers of pages. As discussedpreviously, these users have a high likelihood of seeing astory about the reshare in their own News Feed, althoughthey and other users may become aware through other paths.Figure 11 shows the average branching factor per audiencemember for nodes in the OVP and MLM cascades, buck-

    !

    !

    !

    !

    !

    !

    ! !!

    !

    !

    !

    ! !

    !

    !!

    !

    !

    !

    !

    ! !!

    0.0

    0.5

    1.0

    0 5 10 15 20Hours since original post

    Ave

    ragebranchingfactor ! OVP

    MLM

    Figure 9: Mean branching factor, with 95% confidence in-tervals, by hour.

    !

    !

    !! ! ! ! ! !

    !

    !!

    !

    !

    !

    !

    0.0

    0.5

    1.0

    0 5 10 15 20Depth

    Averagebran

    chingfactor ! OVP

    MLM

    Figure 10: Mean branching factor, with 95% confidence in-tervals, by depth.

    eted by ranges of audience sizes. Though branching factor

    increases with audience size, each added audience membercontributes mostly the same or less to the branching factor.In the following section, we examine just how much ex-planatory power variables such as audience size and timeelapsed since first post have on a nodes branching factor.

    Explaining spread

    We modeled two measures of a nodes influence: the numberof other nodes who directly shared from it (its branchingfactor), and the size of the entire subcascade it generated.Explanatory variables we looked at included the depth of thereshare in the cascade, time elapsed since the original post,time of day, whether a node corresponds to a Facebook useror a Facebook Page, and its audience size. For a page, the

    audience size is the number of followers, and for users it isthe number of friends and subscribers. We also looked at anumber of demographic variables for users, including age,gender, and country.

    Table 2 shows the results of linear regression models fitwith least squares, separated by meme and outcome vari-able. Of all the independent variables we considered, audi-ence size yielded the greatest explanatory power, with othervariables explaining less than 1% of the variance. We restrict

  • 8/13/2019 Icwsm13 Cascades

    7/10

    !

    !

    !

    !! !

    !

    0.000

    0.001

    0.002

    0.003

    (10,10

    0]

    (100

    ,1e+

    03]

    (1e+

    03,1e+

    04]

    (1e+

    04,1e+

    05]

    (1e+

    05,1e+

    06]

    (1e+

    06,1e+

    07]

    (1e+

    07,1e+

    08]

    Audience size

    Averagebranchingfactor

    peraudiencemember

    ! OVP

    MLM

    Figure 11: Mean branching factor, with 95% confidence in-tervals, by audience size.

    the data used for the MLM regression to only include thosenodes where the share occurred within 15 hours of the orig-inal post. This is to avoid including nodes which posted lateenough to be affected by the drop in interest after the photoachieved 1 million likes (as we saw in Figures 7 and 9).Without this restriction, the regression is able to explain sub-stantially less variance than reported in Table 2. This pointsto the need to know external and internal driving factors tofully understand the temporal dynamics of cascades.

    In Table 2 we see that, for both outcomes, the audiencesize explains more variance for the OVP meme than it doesfor MLM. This is likely due to having more skew in the OVPdata than in MLM. In particular, OVP includes a small num-ber of data points with very large branching factors and sub-cascades. By fitting these points well, the OVP models looklike a better fit. To remove this skew, we also fit regressionsafter applying a logarithmic transform to variables. For theseregressions the variance explained was much less, though itwas more comparable between the two memes. The modestcorrelation between a nodes audience and its cascade influ-ence is consistent with prior findings on Twitter (Cha et al.2010; Romero et al. 2011).

    Table 2: Regressing the branching factor and subcascadesize on the audience size. For MLM only nodes that sharedin the first 15 hours are included.

    branching factor subcascadeOVP MLM OVP MLM

    intercept 0.08 0.40 0.41 2.34

    audience *104 8.17 5.18 11.33 13.60

    R2

    0.49 0.30 0.45 0.10

    As expected, for both memes, the audience size explainsmore of the variance in direct influence (branching factor)than it does when including indirect influence (subcascadesize), though the difference is more dramatic for MLM thanit is for OVP. This is likely due to the fact that the ratio ofbranching factor to subcascade size is smaller for MLM. ForOVP there is little difference between branching factor andsubcascade size, thus the regressions are not very different.

    Demographics and susceptibilityIn the previous section we saw that some of the variance inthe spread of memes can be explained simply by the connec-tivity of the nodes sharing the content. The better connectedthe node, the more additional shares are likely to result.However, this leaves a large portion of the response to theinformation unexplained. Naturally, some of that variancemay be due to how much the content matches the interests

    of the audience. For example, one might expect Alicia Keyspost of the OVP to garner intense interest because of the mil-lions of subscribers to her Facebook Page. But the responsemight also depend on Keys involvement with the Obamacampaign; she recorded Obamas campaign theme song. Aswell, her own gender and ethnicity (she is of African Amer-ican descent), and potentially that of her followers, are thoseof likely Obama supporters.

    Therefore, we next investigate whether some portions ofthe underlying population on which each of the two memeswere spreading were more susceptible than others. In prin-ciple, highly viral memes need not be uniformly appealingto the entire internet population, as long as the susceptiblepopulation is sufficiently interconnected.

    In fact, we find that the two memes appealed to broad butdistinct populations. There is further differentiation betweenusers in terms of who chooses to click, like, comment andreshare. Each of these actions requires a different amountof effort. Clicking and liking are just a single action. Re-sharing can be just two button clicks, but the user also fre-quently types in a description or comment about what theyare resharing. Comments require typing in text. Beyond theeffort required in each of these interactions, users considerthe signal they are sending by engaging in them. A like istypically considered an endorsement, and in the case of theMLM, a like has the additional feature of helping the postermove toward his goal. A comment can be supportive, but itcan also engage with the poster in other ways, e.g. express-

    ing disapproval. While clicks are not visible to their Face-book friends, a friend might notice a like and comment. Fi-nally, reshares are the strongest endorsement, as they broad-cast the information content to the users selected audience.

    We therefore examine not just demographic differencesin resharing, but compare them against rates of exposure andother types of activities, using as a baseline the exposure andactions related to content overall.

    Unsurprisingly, the lighthearted sexual subject matter ofMLM appealed primarily to young males. As Figure 12(a)shows, nearly as many women as men were exposed toMLM in their News Feed, though they were less likely thanmen to like it and commented on it even less frequently.Their reshare rate was significantly lower, with only 19%

    of the reshares being made by women. That is, the greaterthe amount of effort, and the greater the endorsement of thememe, the less likely it was for the action to be taken by awoman. Nevertheless, because the Facebook network is rel-atively well mixed, that is, there is little gender homophilyin the network, the exposure is relatively even. In terms ofage, this meme appealed to younger than average users, asshown in Figure 13. It skews even younger for those usersliking the photo. Resharers are typically older than the aver-

  • 8/13/2019 Icwsm13 Cascades

    8/10

  • 8/13/2019 Icwsm13 Cascades

    9/10

    a second signal about a users political affiliation and thatis the free-form political affiliation users can fill out on theirprofiles. Limiting ourselves just to users in the United States,we manually mapped the top 100 occurring terms on a -2to +2 scale, -2 being very liberal and +2 being very con-servative. For example, Democrat maps to -1, Tea PartyRepublican maps to +2, and Moderate maps to 0. Of the2,122,145 users who saw OVP and had a political affilia-

    tion that we could map on a -2 to 2 scale, 882,040 liked theObama page, and 242,807 liked the Romney page.

    Proportion of active users who were exposed

    0.00312 0.0371 0.0847 0.294

    (a)

    Proportion of exposed users who reshared

    0.00238 0.00717 0.0134 0.0464

    (b)

    Figure 16: Geographical distribution of (a) exposures rela-tive to active country users populations and (b) reshare rateper exposure for OVP

    As Figure 15 shows, there is less skew in exposure andinteraction based on political affiliation alone. This is be-cause the number of liberal and conservative users on Face-book is roughly balanced (whereas the popularity of Obamaand Romney is not). So even though about 15.2% of users

    who identify as liberal in their profile were exposed, rela-tive to 5.2% percent of those who identify as conservative,the relative proportions of liberals and conservatives inter-acting with the meme, though still skewed, are more bal-anced. Interestingly, the very liberal (12.8%) and very con-servative (4.7%) were slightly less likely to be exposed thantheir more moderate counterparts.

    Finally, we examine the geographic distribution of the twomemes. Even though both memes primary activity spanned

    little over a single day, their reach throughout the world isexpansive. While one might expect that a meme pertainingto the election outcome in the United States would primar-ily be of interest to US users, we find instead that a ma-jority of the impressions (70.3%), likes (60.1%), comments(63.2%) and reshares (62.9%) occurred outside of the US.These statistics are an interesting proxy of the interest inUS politics across the world, and are also suggestive of the

    rapid, global diffusion that Facebook enables. MLM, postedby a Norwegian user in English, quickly spread across con-tinents.

    Proportion of active users exposed

    0.00018 0.0261 0.0718 0.215

    (a)

    Proportion of exposed users who reshared

    0.00066 0.00568 0.00723 0.00937 0.0111 0.0154

    (b)

    Figure 17: Geographical distribution of (a) exposures rela-tive to active country users populations and (b) reshare rateper exposure for MLM

    Conclusion

    In this paper we took a first step toward understanding thedistribution and diffusion of viral content on one large social

    networking platform. We showed that while most content isnot viral, a small minority is. This is likely a natural result ofthe limited bandwidth of users. With a large number of pho-tos uploaded daily, only a very small fraction of photos couldbe sustained as viral without users having to reshare an un-reasonable number of photos every day. By examining twospecific instances, we showed that photos reshared hundredsof thousands of times are seen by millions of users, affectinga significant fraction of the user population and confirming

  • 8/13/2019 Icwsm13 Cascades

    10/10

    anecdotal accounts of seemingly ubiquitous memes. We sawthat there are different ways in which memes can achievesuch ubiquity. In the case of OVP, the source was itself alarge hub, and half of the shares were made by followersof the source. In the case of MLM, the growth was largelyorganic, although the meme may not have spread nearly asfar without the help of the high branching factors of pageswhich reshared it.

    Although we studied photo reshares in aggregate and twovery large cascades individually, a fair portion of activityis actually centered around moderately-sized memes, thosebeing reshared dozens to hundreds of times. It would be in-teresting to see whether such memes are specific to partic-ular interests and communities (Macskassy and Michelson2011), and whether they may actually be relatively success-ful in the sense that they permeate their target audience. Ifsome memes are specific to certain communities, it wouldbe of interest to infer this from their initial spread. Priorwork has shown that cascades may die out due to a combi-nation of high clustering among the susceptible populationand a lesser probability of resharing content upon repeat ex-posure (Greg Ver Steeg et al. 2011).

    The particular cascade shape may depend on the topic theitem relates to (Romero, Meeder, and Kleinberg 2011). Cas-cades may also be competing for attention and affecting oneanothers spread (Weng et al. 2012; Myers and Leskovec2012). For example, the second OVP, showing just the pres-ident standing alone, received half the likes and only a sixthof the reshares. Potentially its spread was affected by theOVP that preceded it. We leave these and other questionsfor future work.

    Acknowledgments

    The authors would like to thank Cameron Marlow and JonKleinberg for insightful discussions.

    References

    Adamic, L.; Lento, T.; and Fiore, A. 2012. How you met me. InProc. of the 6th International AAAI Conference on Weblogs andSocial Media (ICWSM).

    Aral, S., and Walker, D. 2011. Creating social contagion throughviral product design: A randomized trial of peer influence in net-works. Management Science57(9):16231639.

    Aral, S., and Walker, D. 2012. Identifying influential and suscep-tible members of social networks. Science337(6092):337341.

    Bakshy, E.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. 2011.Everyones an influencer: Quantifying influence on Twitter. InProc. of the 3rd ACM Conference on Web Search and Data Min-ing (WSDM).

    Bakshy, E.; Rosenn, I.; Marlow, C.; and Adamic, L. 2012. Therole of social networks in information diffusion. In Proc. of the21st International Conference on World Wide Web (WWW) , 519528. ACM.

    Bhattacharya, D., and Ram, S. 2012. Sharing news articles using140 characters: A diffusion analysis on Twitter. InProc. Advancesin Social Networks Analysis and Mining (ASONAM), 966 971.

    Cha, M.; Haddadi, H.; Benevenuto, F.; and Gummadi, K. P. 2010.Measuring user influence in Twitter: The million follower fallacy.

    InProc. of the 4th International AAAI Conference on Weblogs andSocial Media (ICWSM), volume 14, 8.

    Goel, S.; Watts, D.; and Goldstein, D. 2012. The structure ofonline diffusion networks. InProc. of the 13th ACM Conferenceon Electronic Commerce (EC), 623638. ACM.

    Golub, B., and Jackson, M. 2010. Using selection bias to explainthe observed structure of Internet diffusions. Proceedings of the

    National Academy of Sciences107(24):1083310836.

    Gomez-Rodriguez, M.; Leskovec, J.; and Krause, A. 2012. In-ferring networks of diffusion and influence. ACM Transactions onKnowledge Discovery from Data (TKDD) 5(4):21.

    Greg Ver Steeg; Ghosh; Rumi; Lerman; and Kristina. 2011. Whatstops social epidemics? In Proc. of the 5th International AAAIConference on Weblogs and Social Media (ICWSM).

    Leskovec, J.; Adamic, L.; and Huberman, B. 2007. The dynamicsof viral marketing. ACM Transactions on the Web (TWEB)1(1):5.

    Leskovec, J.; Singh, A.; and Kleinberg, J. 2006. Patterns of in-fluence in a recommendation network. Advances in Knowledge

    Discovery and Data Mining380389.

    Liben-Nowell, D., and Kleinberg, J. 2008. Tracing informationflow on a global scale using internet chain-letter data. Proceedingsof the National Academy of Sciences105(12):46334638.

    Macskassy, S. A., and Michelson, M. 2011. Why do peopleretweet? Anti-homophily wins the day! In Proc. of the 5th Interna-tional AAAI Conference on Weblogs and Social Media (ICWSM),209216.

    Milano, D., and Stern, J. 2013. One million likes on Facebookfor a puppy, clothing, sex. http://abcnews.go.com/Technology/facebook-million-likes-puppy-bunny-sex-clothing-meme/story?id=

    18295476. ABC News.

    Myers, S. A., and Leskovec, J. 2012. Clash of the contagions:Cooperation and competition in information diffusion. In IEEE In-ternational Conference on Data Mining (ICDM), 539548. IEEE.

    Myers, S. A.; Zhu, C.; and Leskovec, J. 2012. Information diffu-sion and external influence in networks. InProc. of the 18th ACMSIGKDD International Conference on Knowledge Discovery and

    Data Mining (KDD), 3341.

    Rainie, L., and Smith, A. 2012. Social networking sites and poli-tics. Washington, DC: Pew Internet & American Life Project. Re-trieved June12:2012.

    Romero, D.; Galuba, W.; Asur, S.; and Huberman, B. 2011. Influ-ence and passivity in social media. Machine Learning and Knowl-edge Discovery in Databases1833.

    Romero, D. M.; Meeder, B.; and Kleinberg, J. 2011. Differences inthe mechanics of information diffusion across topics: idioms, po-litical hashtags, and complex contagion on Twitter. In Proc. of the20th International Conference on World Wide Web (WWW) , 695704. ACM.

    Weng, L.; Flammini, A.; Vespignani, A.; and Menczer, F. 2012.Competition among memes in a world with limited attention. Na-ture Scientific Reports2(355).

    Wu, F., and Huberman, B. A. 2007. Novelty and collectiveattention. Proceedings of the National Academy of Sciences104(45):1759917601.