1
Social Influence Analysis and Action Prediction via Factor Graph Models
Jie Tang
Tsinghua University, China
In collaboration with Jimeng Sun (IBM), Chi Wang (UIUC), and Chenhao Tan (Cornell)
2
Motivation
• 500 million users
• The 3rd largest "country" in the world
• More visitors than Google
• Action: Update status, create event
• More than 4 billion images
• Action: Add tags, add favorites
• 2009: 2 billion tweets per quarter
• 2010: 4 billion tweets per quarter
• Action: Post tweets, retweet
Social networks have already become a bridge connecting our real daily lives and the virtual web space.
3
Motivation (cont.)
• Modeling and tracking users’ actions in
social networks is a very important issue
and can benefit many real applications
– Advertising
– Social recommendation
– Marketing
– …
4
[Figure: example social network linking Ada, Bob, Carol, David, Eve, Frank, and George, with numeric edge labels; Alice is the marketer]
Application—Influence Maximization
Find K nodes (users) in a social network that could maximize the
spread of influence (Domingos, 01; Richardson, 02; Kempe, 03)
Social action and influence
Who are the opinion leaders in a community?
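The influence-maximization problem stated on this slide can be illustrated with the greedy strategy usually associated with Kempe et al. (2003) under an independent cascade model. This is a minimal sketch and not the method of this talk: the toy graph, the uniform activation probability `p`, the number of Monte Carlo runs, and all function names are assumptions made for illustration.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly active node gets a single chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    total = 0
    for _ in range(runs):
        total += len(simulate_cascade(graph, seeds, p, rng))
    return total / runs

def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [cand], p, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen

# Hypothetical directed toy network; the names are borrowed from the slide's figure,
# but the edges are invented for this example.
toy_graph = {
    "Ada": ["Bob", "Carol", "David"],
    "Bob": ["George"],
    "Eve": ["Frank"],
}
```

Calling `greedy_seeds(toy_graph, k=2)` returns two seed users for the toy network. With `p < 1` the estimate is stochastic, so a larger `runs` value gives a more stable choice at a higher cost.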
5
Application—Influence Maximization
Questions:
– How to quantify the strength of social influence between users?
– How to predict users’ actions over time?
6
Social Influence Analysis
Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. SIGKDD 2009. pp. 807-816.
7
Topic-based Social Influence Analysis
• Social network -> Topical influence network
• Input: coauthor network (Ada, Bob, Carol, David, Eve, Frank, George) and a topic distribution for each node (e.g., θi1 = .5, θi2 = .5)
• Topics: Topic 1: Data mining; Topic 2: Database
• Social influence analysis, using:
– Node factor function g(v1, y1, z)
– Edge factor function f(yi, yj, z)
– Figure labels rz and az
• Output: topic-based social influences, one influence network per topic (Topic 1: Data mining, Topic 2: Database, . . .)
8
How does a person influence a social community?
How do two persons influence each other?
Several key challenges:
• How to differentiate the social influences from different angles (topics)?
• How to incorporate different information (e.g., topic distribution and network structure) into a unified model?
• How to estimate the model on large real-world networks?
9
Our Solution: Topical Affinity Propagation
• Topical Affinity Propagation
– Topical Factor Graph model
– Efficient learning algorithm
– Distributed implementation
10
Topical Factor Graph (TFG) Model
[Figure: TFG model. Labels: node/user; social link; nodes that have the highest influence on the current node]
The problem is cast as identifying which node has the highest probability of influencing another node on a specific topic along the edge.
11
Topical Factor Graph (TFG)
• The learning task is to find a configuration for all {yi} that maximizes the joint probability.
Objective function:
1. How to define?
2. How to optimize?
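The objective itself did not survive in this transcript. One way such a joint probability could be written, assuming it factorizes into the node, edge, and global feature functions listed on the next slide, is sketched below. The symbols g and f follow the labels g(v1, y1, z) and f(yi, yj, z) used earlier in the talk; the global function h, the normalizer Z, the number of nodes N, the number of topics T, and the edge set E are assumptions introduced for this sketch.

```latex
P(\mathbf{v}, Y) = \frac{1}{Z}
  \prod_{k=1}^{N} \prod_{z=1}^{T} h(y_1, \dots, y_N, k, z)
  \prod_{i=1}^{N} \prod_{z=1}^{T} g(v_i, y_i, z)
  \prod_{e_{kl} \in E} \prod_{z=1}^{T} f(y_k, y_l, z)
```

Under this reading, "how to define" refers to the choice of g, f, and h, and "how to optimize" refers to finding the configuration of {yi} that maximizes the product.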
12
How to define (topical) feature functions?
– Node feature function
– Edge feature function
– Global feature function
Feature values can be based on similarity, or simply binary.
13
Model Learning Algorithm
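Sum-product:
$$m_{y \to f}(y) = \prod_{f' \sim y \setminus f} m_{f' \to y}(y)$$
$$m_{f \to y}(y) = \sum_{\sim \{y\}} \Big( f(Y) \prod_{y' \sim f \setminus y} m_{y' \to f}(y') \Big)$$
Here f' ∼ y \ f ranges over the factors adjacent to y other than f, and the sum over ∼{y} runs over all variables of f except y.
– Low efficiency!
– Not easy for distributed learning!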
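The sum-product messages above can be checked on a very small example. The sketch below is illustrative only: the three-variable chain, the factor tables `g1`, `f12`, `f23`, and all names are assumptions, not the TFG model. It computes the marginal of the middle variable by message passing and, for comparison, by brute-force enumeration.

```python
from itertools import product

STATES = (0, 1)

# Hypothetical factors on the chain y1 - f12 - y2 - f23 - y3, plus a unary factor g1 on y1.
def g1(y1):
    return 0.7 if y1 == 0 else 0.3

def f12(y1, y2):
    return 2.0 if y1 == y2 else 1.0

def f23(y2, y3):
    return 3.0 if y2 == y3 else 1.0

def marginal_y2_sum_product():
    """Marginal of y2 from factor-to-variable and variable-to-factor messages."""
    # Leaf factor g1 sends its own table to y1.
    m_g1_to_y1 = {y1: g1(y1) for y1 in STATES}
    # y1 forwards the product of messages from its other factors (only g1) to f12.
    m_y1_to_f12 = dict(m_g1_to_y1)
    # f12 sums out y1 to produce its message to y2.
    m_f12_to_y2 = {
        y2: sum(f12(y1, y2) * m_y1_to_f12[y1] for y1 in STATES) for y2 in STATES
    }
    # y3 is a leaf variable, so its message to f23 is constant 1.
    m_y3_to_f23 = {y3: 1.0 for y3 in STATES}
    # f23 sums out y3 to produce its message to y2.
    m_f23_to_y2 = {
        y2: sum(f23(y2, y3) * m_y3_to_f23[y3] for y3 in STATES) for y2 in STATES
    }
    # The marginal is the normalized product of all incoming messages at y2.
    unnorm = {y2: m_f12_to_y2[y2] * m_f23_to_y2[y2] for y2 in STATES}
    z = sum(unnorm.values())
    return {y2: value / z for y2, value in unnorm.items()}

def marginal_y2_brute_force():
    """Same marginal by enumerating every joint configuration."""
    unnorm = {y2: 0.0 for y2 in STATES}
    for y1, y2, y3 in product(STATES, repeat=3):
        unnorm[y2] += g1(y1) * f12(y1, y2) * f23(y2, y3)
    z = sum(unnorm.values())
    return {y2: value / z for y2, value in unnorm.items()}
```

Both functions agree on this chain. Brute force enumerates every joint configuration, so its cost grows exponentially with the number of variables, which is why message passing is used at all. On a large network with loops the messages must additionally be iterated, which relates to the efficiency concern raised on the slide.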
14
New TAP Learning Algorithm
1. Introduce two new variables, r and a, to replace the original message m (mij).
2. Design new update rules for r and a.
15
The TAP Learning Algorithm
16
Distributed TAP Learning
• Map-Reduce
– Map: (key, value) pairs
• eij/aij -> ei*/aij; eij/bij -> ei*/bij; eij/rij -> e*j/rij.
– Reduce: (key, value) pairs
• eij/* -> new rij; eij/* -> new aij
• For the global feature function
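The key/value re-keying described above can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce pattern only: the actual TAP update rules are not reproduced in this transcript, so the `normalize` combiner is a stand-in, and every function name here is hypothetical.

```python
from collections import defaultdict

def map_phase(edge_values, key_by):
    """Re-key each ((i, j), value) pair by its source (e_i*) or its target (e_*j)."""
    for (i, j), value in edge_values.items():
        key = (i, "*") if key_by == "source" else ("*", j)
        yield key, ((i, j), value)

def shuffle(pairs):
    """Group mapped pairs by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, item in pairs:
        groups[key].append(item)
    return groups

def reduce_phase(groups, combine):
    """Emit one new value per edge, computed from all values sharing its key."""
    out = {}
    for items in groups.values():
        values = [value for _, value in items]
        for edge, value in items:
            out[edge] = combine(value, values)
    return out

def normalize(value, group_values):
    # Placeholder combiner (NOT the TAP update rule): share of the group total.
    return value / sum(group_values)

# Hypothetical a_ij values on three edges.
a_values = {(1, 2): 1.0, (1, 3): 3.0, (2, 3): 2.0}
new_values = reduce_phase(shuffle(map_phase(a_values, "source")), normalize)
```

Grouping by source collects everything needed for one node's outgoing edges in one reducer, and grouping by target does the same for incoming edges. That is what lets each update run independently on a different machine.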
17
Experiment
• Data set: (http://arnetminer.org/lab-datasets/soinf/)
• Evaluation measures
– CPU time
– Case study
– Application
Data set   #Nodes               #Edges
Coauthor   640,134              1,554,643
Citation   2,329,760            12,710,347
Film       18,518 films,        142,426
           7,211 directors,
           10,128 actors,
           9,784 writers
18
Scalability Performance
19
Speedup results
[Figure, left panel: Speedup vs. dataset size (x-axis: 0, 170K, 540K, 1M, 1.7M)]
[Figure, right panel: Speedup vs. #computer nodes (x-axis: 1 to 6), comparing "Perfect" with "Our method"]
20
Influential nodes on different topics
21
Social Influence Sub-graph on "Data mining"
22
Application—Expert Finding
Expert finding data from (Tang, KDD08; ICDM08)
http://arnetminer.org/lab-datasets/expertfinding/
23
Social Action Prediction
Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social
Action Tracking via Noise Tolerant Time-varying Factor Graphs. SIGKDD 2010.
pp. 1049-1058.
24
Social Action Modeling and Prediction
[Figure: John and his social network at time t and at time t+1]
Action Prediction: Will John post a tweet on "Haiti Earthquake"?
Personal attributes:
1. Always watch news
2. Enjoy sports
3. …
Factors shown in the figure:
1. Influence
2. Dependence
3. Correlation
4. Action bias
25
Problem formulation
Gt = (Vt, Et, Xt, Yt), where
– Vt: nodes at time t
– Et: edges at time t
– Xt: attribute matrix at time t
– Yt: actions at time t
Input: Gt = (Vt, Et, Xt, Yt), t = 1, 2, …, T
Output: F: f(Gt) -> Yt
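The input and output of this formulation can be written down as a small data structure. The sketch below is only an illustration of the shapes involved: the class name, field names, and the persistence predictor are assumptions, and the predictor is a trivial placeholder, not the NTT-FGM model.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class NetworkSnapshot:
    """One attribute-augmented network Gt = (Vt, Et, Xt, Yt)."""
    nodes: Set[str]                     # Vt: nodes at time t
    edges: Set[Tuple[str, str]]         # Et: edges at time t
    attributes: Dict[str, List[float]]  # Xt: one attribute row per node
    actions: Dict[str, int]             # Yt: 1 if the node performed the action at t

def predict_actions(history: List[NetworkSnapshot]) -> Dict[str, int]:
    """Placeholder for f: predict that each node repeats its most recent action."""
    last = history[-1]
    return {node: last.actions.get(node, 0) for node in sorted(last.nodes)}

# Hypothetical two-step history for two users.
g1 = NetworkSnapshot(
    nodes={"john", "mary"},
    edges={("john", "mary")},
    attributes={"john": [1.0, 0.0], "mary": [0.0, 1.0]},
    actions={"john": 0, "mary": 1},
)
g2 = NetworkSnapshot(
    nodes={"john", "mary"},
    edges={("john", "mary")},
    attributes={"john": [1.0, 0.0], "mary": [0.0, 1.0]},
    actions={"john": 1, "mary": 1},
)
predicted = predict_actions([g1, g2])
```

Any real model would replace `predict_actions` with a function that also uses the edges and attributes. The placeholder ignores them and only fixes the signature.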
26
NTT-FGM Model
[Figure: NTT-FGM model. Layers: action, continuous latent action state, personal attributes. Factors: influence, correlation, dependence]
27
Model Instantiation
How to estimate the parameters?
28
Model Learning—Two-step learning
Extremely time-consuming!
Our solution: distributed learning (MPI)
29
Experiment
• Data Set (http://arnetminer.org/stnt)
• Baseline
– SVM
– wvRN (Macskassy, 2003)
• Evaluation Measure: Precision, Recall, F1-Measure
Data set     Action                              #Nodes   #Edges    Action Stats
Twitter      Post tweets on "Haiti Earthquake"   7,521    304,275   730,568
Flickr       Add photos into favorite list       8,721    485,253   485,253
Arnetminer   Issue publications on KDD           2,062    34,986    2,960
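The evaluation measures named above follow their standard definitions for binary action labels. A minimal sketch, in which the function name and the toy label vectors are assumptions:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = action performed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical ground truth and predictions for five users.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

On these toy labels there are two true positives, one false positive, and one false negative, so all three measures equal 2/3.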
30
Performance Analysis
31
Factor Contribution Analysis
• NTT-FGM: Our model
• NTT-FGM-I: Our model ignoring influence
• NTT-FGM-CI: Our model ignoring influence and correlation
32
Efficiency Performance
With 5x4 cores
33
Conclusion
• Formalize a novel problem of topic-based social
influence analysis and propose a Topical Factor
Graph model to solve this problem
• Propose a unified model: NTT-FGM to model and
predict social actions
• Distributed model learning:
– For TFG model: Map-reduce (Hadoop)
– For NTT-FGM: MPI (Message Passing Interface)
34
Thank you!
Q&A?
Data & Code:
http://arnetminer.org/lab-datasets/soinf
http://arnetminer.org/stnt
35
Statistical Study: Influence
Y-axis: the likelihood that the user also performs the action at t
X-axis: the percentage of one’s friends who perform an action at t − 1
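The two axes of this study can be computed from raw data roughly as follows. This is a sketch under assumptions: the input dictionaries, the equal-width binning of the friend percentage, and the function name are all hypothetical, and the toy data is invented.

```python
from collections import defaultdict

def influence_curve(friends, actions_prev, actions_now, n_bins=5):
    """Map each bin of 'fraction of friends active at t-1' to P(user acts at t)."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [users who acted, users seen]
    for user, nbrs in friends.items():
        if not nbrs:
            continue  # the friend fraction is undefined without friends
        frac = sum(actions_prev.get(f, 0) for f in nbrs) / len(nbrs)
        # Equal-width bins over [0, 1]; a fraction of exactly 1.0 goes in the last bin.
        b = min(int(frac * n_bins), n_bins - 1)
        counts[b][0] += actions_now.get(user, 0)
        counts[b][1] += 1
    return {b: acted / total for b, (acted, total) in sorted(counts.items())}

# Hypothetical friendship lists and actions at t-1 and t.
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"], "d": ["a"]}
actions_prev = {"a": 1, "b": 0, "c": 0}
actions_now = {"a": 0, "b": 1, "c": 1, "d": 0}
curve = influence_curve(friends, actions_prev, actions_now)
```

Each key of `curve` is a bin on the X-axis and each value is the corresponding point on the Y-axis. Bins with no users are simply absent.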
36
Statistical Study: Dependence
Y-axis: the likelihood that a user performs an action
X-axis: different time windows
37
Statistical Study: Correlation
Y-axis: the likelihood that two friends (random) perform an action together
X-axis: different time windows
38
Appendix
41
Prediction
• Based on the learned parameters, we just need to solve the following equations:
42
Latent State Analysis
Action Bias Factor: f(y12 | z12)
Influence Factor: g(z11, z12)
Correlation Factor: h(z12, z22), h(z12, x12)
43
Related Work—Social networks and influences
• Social network
– Metrics to characterize a social network
– Web community discovery [Flake,2000]
• Influence in social network
– The existence of influence. [Singla, 2008]
[Anagnostopoulos, 2008]
– The correlation between social similarity and
interactions [Crandall, 2008]
44
Related Work—large-scale mining
• Factor graph models
– A graph model [Kschischang, 2001]
– Computing marginal function [Frey, 2006]
– Message passing/affinity propagation [Frey, 2007]
• Distributed programming model
– Map-reduce [J. Dean, 2004]