Modeling and predicting retweeting dynamics via a mixture process
Jinhua Gao, Huawei Shen, Shenghua Liu and Xueqi [email protected], {shenhuawei, liushenghua, cxq}@ict.ac.cn
CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
ABSTRACT
Modeling and predicting retweeting dynamics in social media has important implications for an array of applications. Existing models either fail to model the triggering effect in retweeting dynamics, e.g., the model based on the reinforced Poisson process, or are hard to train using only the retweeting dynamics of an individual tweet, e.g., the model based on the self-exciting Hawkes process. In this paper, motivated by the observation that each retweeting dynamics is generally dominated by a handful of key nodes that each trigger a large number of retweets, we propose a mixture process to model and predict retweeting dynamics, with each subprocess capturing the retweeting dynamics initiated by a key node. Experiments demonstrate that the proposed model outperforms the state-of-the-art model.
Keywords
retweeting dynamics; popularity prediction; mixture process
Copyright is held by the author/owner(s). WWW'16 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04. http://dx.doi.org/10.1145/2872518.2889389.
1. INTRODUCTION
Unlike traditional media, where people generally access information released by a single source, social media platforms, e.g., Facebook, Twitter, and Sina Weibo, offer a decentralized style of information spreading, i.e., information spreads among users via retweeting or forwarding. Consequently, the popularity of individual pieces of information follows a highly asymmetric distribution, and predicting the popularity of information is a challenging problem. Retweeting dynamics prediction, i.e., predicting how popular a piece of information will become, has attracted much research attention in the last decade. Existing methods make predictions either by exploring relevant factors and applying standard regression/classification methods, or by fitting retweeting dynamics using a certain class of functions [1].
Recently, researchers have begun to directly model the process by which a piece of information gains its popularity. Shen et al. employed the reinforced Poisson process (RPP) to model the arrival process of paper citations [2], and this model was subsequently extended to model the retweeting dynamics of microblogs [3]. RPP assumes that information spreading is an arrival process of attention to a particular piece of information, simply using the count of retweets as the sole indicator of the triggering effect of all retweets rather than modeling the triggering effect of each retweet separately. The self-exciting Hawkes process (SEHP) was then proposed to explicitly model the triggering effect of each retweet, with the arrival rate of retweets determined by the aggregation of the triggering effects of all previous retweets [4, 5, 6]. Although SEHP has the flexibility to capture the triggering effect of each retweet, it is impractical to learn the strength of the triggering effect for each retweet when only the retweeting dynamics of an individual tweet is available. Therefore, we still lack an effective method to predict retweeting dynamics in social media.
In this paper, we propose a mixture process to model and predict retweeting dynamics. Our model is motivated by the observation that the retweeting process of a tweet can generally be characterized by a diffusion tree with only a handful of key nodes, each triggering a large number of retweets (Fig. 1A). Thus, the whole process of retweeting dynamics can be divided into several subprocesses, each of which is well modeled by an RPP model (Fig. 1B), and the whole process is taken as the aggregation of these subprocesses (Fig. 1C). The model based on such a mixture process is more flexible than the RPP model and can be trained much more easily than the SEHP model. Experiments demonstrate that the proposed model outperforms the state-of-the-art model at predicting retweeting dynamics.
2. MODEL AND VALIDATION
The retweeting dynamics of a tweet during a time period $[0, T]$ is denoted by a set of retweeting moments $\{t_i\}$ $(1 \le i \le n)$, where $n$ is the total number of retweets up to time $T$. Without loss of generality, we have $0 = t_0 \le t_1 \le \cdots \le t_i \le \cdots \le t_n \le T$. We model the retweeting dynamics as a mixture process of RPPs characterized by the rate function
$$x(t) = \sum_{l=1}^{k} \lambda_l f(t - \tau_l; \theta_l)\, c(t), \tag{1}$$
where $k$ is the number of subprocesses, $\lambda_l$ captures the strength of the triggering effect initiated by the source retweeter of the $l$-th subprocess, $\tau_l$ is the moment when the $l$-th subprocess begins, $f(t; \theta_l)$ is the relaxation function that characterizes how the triggering effect evolves over time, modulated by parameters $\theta_l$, and $c(t)$ is the total number of retweets received up to time $t$. In this paper, we use the log-normal function as the relaxation function, as in [2].
[Figure 1: panels A, B, C; axes: frequency vs. hours after publication; legend entries: Real data, RPP model, Our model.]
Figure 1: Retweeting dynamics of a message in Sina Weibo. (A) Diffusion tree of the retweeting process; (B) Subprocesses, each being initiated by a key node; (C) Retweeting dynamics of the message.
The probability that the $i$-th retweet arrives at $t_i$, given that the $(i-1)$-th retweet arrived at $t_{i-1}$, can be written as
$$p_1(t_i \mid t_{i-1}) = x(t_i) \exp\left(-\int_{t_{i-1}}^{t_i} x(t)\,dt\right), \tag{2}$$
and the probability of no retweet in $[t_n, T]$ is
$$p_0(T \mid t_n) = \exp\left(-\int_{t_n}^{T} x(t)\,dt\right). \tag{3}$$
Therefore, the likelihood of the retweeting dynamics $\{t_i\}_{i=1}^{n}$ during $[0, T]$ is
$$L = p_0(T \mid t_n) \prod_{i=1}^{n} p_1(t_i \mid t_{i-1}). \tag{4}$$
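Because the integration intervals in Eqs. (2) and (3) tile $[0, T]$, the logarithm of Eq. (4) collapses to $\log L = \sum_{i=1}^{n} \log x(t_i) - \int_0^T x(t)\,dt$. The Python sketch below evaluates this quantity under several assumptions that are not stated in the text: $\theta_l = (\mu_l, \sigma_l)$ parameterizes a log-normal density, the kernel is zero before $\tau_l$, $c(t)$ is held constant between consecutive retweets, and a hypothetical offset `c0` (counting the original post) keeps the rate positive before the first retweet. All function and parameter names are illustrative.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal density; taken as 0 for x <= 0 (subprocess not yet started)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF, used for the closed-form integral of the kernel."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def kernel_mass(a, b, params):
    """Integral over [a, b] of sum_l lam_l * f(s - tau_l; mu_l, sigma_l)."""
    return sum(
        lam * (lognormal_cdf(b - tau, mu, sigma) - lognormal_cdf(a - tau, mu, sigma))
        for lam, tau, mu, sigma in params
    )

def log_likelihood(times, T, params, c0=1):
    """Log of Eq. (4); params holds one (lam, tau, mu, sigma) tuple per subprocess.

    c0 is an assumed offset (not in the source) so that c(t) > 0 before t_1.
    """
    loglik, prev = 0.0, 0.0
    for i, t in enumerate(times):
        c = c0 + i  # count just before the (i+1)-th retweet, constant on [prev, t)
        rate = c * sum(lam * lognormal_pdf(t - tau, mu, sigma)
                       for lam, tau, mu, sigma in params)
        if rate <= 0.0:
            return float("-inf")  # a retweet observed where the model rate is zero
        loglik += math.log(rate) - c * kernel_mass(prev, t, params)
        prev = t
    # Survival term of Eq. (3): no retweet in [t_n, T].
    loglik -= (c0 + len(times)) * kernel_mass(prev, T, params)
    return loglik
```

The negative of `log_likelihood` could be handed to a generic numerical optimizer to estimate the parameters; the text does not say which optimizer the authors used.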
The parameters (i.e., $\lambda_l$, $\theta_l$, $\tau_l$) are estimated by maximizing the log-likelihood, and the retweeting dynamics of a message can then be predicted as
$$c(t) = n \exp\left(\int_{T}^{t} \sum_{l=1}^{k} \lambda_l f(s - \tau_l; \theta_l)\,ds\right). \tag{5}$$
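Eq. (5) is consistent with treating $c(t)$ as continuous for $t > T$, so that $dc/dt = x(t)$ with initial condition $c(T) = n$. With a log-normal kernel, the integral in the exponent has a closed form in terms of the log-normal CDF. A minimal Python sketch follows, assuming $\theta_l = (\mu_l, \sigma_l)$ and a kernel that is zero before $\tau_l$; the function names and the parameter values are made up for illustration.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF; taken as 0 for x <= 0 (subprocess not yet started)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def predict_count(t, T, n, params):
    """Eq. (5): predicted retweet count at time t >= T, given n retweets by T.

    params holds one (lam, tau, mu, sigma) tuple per subprocess.
    """
    exponent = sum(
        lam * (lognormal_cdf(t - tau, mu, sigma) - lognormal_cdf(T - tau, mu, sigma))
        for lam, tau, mu, sigma in params
    )
    return n * math.exp(exponent)

# Illustrative (made-up) parameters: two subprocesses, times in hours.
params = [(1.2, 0.0, 0.0, 1.0), (0.6, 0.5, 0.5, 0.8)]
for hours in (2, 6, 24):
    print(hours, round(predict_count(hours, T=2.0, n=150, params=params), 1))
```

Since each log-normal CDF is bounded by 1, the prediction saturates as $t$ grows, which matches the intuition that every subprocess eventually dies out.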
To evaluate the effectiveness of the proposed model, we compare it with the RPP model [2] on a dataset crawled from Sina Weibo, the largest microblogging platform in China. We filter out tweets with fewer than 100 retweets and obtain the retweeting dynamics of 164 tweets. For each tweet, we use its retweeting dynamics in the first two hours for training and predict its dynamics in the following hours. In the experiments, we set k = 3, based on the prior that few tweets have more than 3 key nodes in their retweeting dynamics in the first two hours. We use MAPE (mean absolute percentage error) to evaluate prediction accuracy. Results are shown in Fig. 2.
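For reference, MAPE at a given prediction time can be computed as the mean over tweets of |predicted − actual| / actual. The sketch below is a generic Python implementation with made-up numbers; it is not the authors' evaluation code.

```python
def mape(predicted, actual):
    """Mean absolute percentage error over paired predicted and true counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # True counts are assumed to be positive (tweets here have >= 100 retweets).
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted vs. true retweet counts for three tweets.
print(mape([110, 180, 95], [100, 200, 100]))
```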
[Figure 2 panels: two example tweets, accumulative retweet count vs. minutes after publication (Real data, Our model, RPP model); MAPE vs. hours after training period (Our model, RPP model).]
Figure 2: Comparison with the RPP model.
To offer some intuition about the comparison, we also give two examples. We can see that our model not only fits the retweeting dynamics in the training period well, but also achieves better prediction accuracy than the competing model.
3. CONCLUSION
In this paper, we proposed a mixture process to model and predict retweeting dynamics in social media. The idea is motivated by the fact that the whole process of retweeting dynamics can be effectively captured by a handful of subprocesses initiated by retweeters with a large number of followers. The proposed model can be trained efficiently using only the retweeting dynamics of an individual tweet. Experimental results demonstrate the effectiveness of the proposed model compared with the state-of-the-art model.
4. ACKNOWLEDGMENTS
This work was funded by the National Basic Research Program of China (the 973 program) with grant numbers 2014CB340401 and 2013CB329602, and the National Natural Science Foundation of China with grant numbers 61472400, 61232010, and 61572467.
5. REFERENCES
[1] G. Szabo and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
[2] H. W. Shen, D. Wang, C. Song, and A.-L. Barabási. Modeling and predicting popularity dynamics via reinforced Poisson processes. In AAAI 2014, pp. 291–297.
[3] S. Gao, J. Ma, and Z. Chen. Modeling and predicting retweeting dynamics on microblogging platforms. In WSDM 2015, pp. 107–116.
[4] R. Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. PNAS, 105(41):15649–15653, 2008.
[5] P. Bao, H. W. Shen, X. Jin, and X. Q. Cheng. Modeling and predicting popularity dynamics of microblogs using self-excited Hawkes processes. In WWW 2015, pp. 9–10.
[6] Q. Zhao, M. A. Erdogdu, H. Y. He, A. Rajaraman, and J. Leskovec. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In KDD 2015, pp. 1513–1522.