Scalable and Parallelizable Processing of Influence Maximization for Large-Scale Social Networks

Scalable and Parallelizable Processing of Influence Maximization for Large-Scale Social Networks. Apr 9, 2013. Jinha Kim, Seung-Keol Kim, Hwanjo Yu, Pohang University of Science and Technology (POSTECH)

description

The slides I presented at ICDE 2013.

Transcript of Scalable and Parallelizable Processing of Influence Maximization for Large-Scale Social Networks

Page 1: Scalable and Parallelizable Processing of Influence Maximization  for Large-Scale Social Networks

Scalable and Parallelizable Processing of Influence Maximization for Large-Scale Social Networks

Apr 9, 2013

Jinha Kim, Seung-Keol Kim, Hwanjo Yu

Pohang University of Science and Technology (POSTECH)

Page 2

Goal

• Boosting Influence Maximization processing by efficient influence evaluation


Page 4

Viral Marketing

Influence Maximization Problem

Graph

Diffusion Model

Processing Algorithm

Page 5

Word of Mouth Effect


Page 6

A Marketer’s Perspective

PERSUADE ONE!

Making Money!!!!

Page 7

How to find the one to persuade in an algorithmic way?

Page 8

Viral Marketing

Influence Maximization Problem

Graph

Diffusion Model

Processing Algorithm

Page 9

Quantifying Influence

σ(S): the expected number of users influenced by a seed set S

Page 10

Influence Maximization Problem (KKT 03)
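
The formula on this slide did not survive in the transcript. Writing σ(S) for the expected number of users influenced by a seed set S, the problem as posed in KKT 03 can be stated as below. The symbols V (the set of nodes) and k (the seed budget) are notation assumed here for illustration.

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq V,\; |S| = k} \; \sigma(S)
```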

Page 11

Viral Marketing

Influence Maximization Problem

Graph

Diffusion Model

Processing Algorithm

Page 12

Abstracting Social Networks

Page 13

Abstracting Social Network

(Figure: nodes u and v joined by an edge e.)

Page 14

Viral Marketing

Influence Maximization Problem

Graph

Diffusion Model

Processing Algorithm

Page 15

Quantifying Influence

The expected number of entities influenced by S DEPENDS ON how influence is propagated through a graph

Page 16

SEEDS

Independent Cascade (IC) model

active

inactive

t = 0

Page 17

Independent Cascade (IC) model

active at t = i

inactive

t = i + 1

active at t < i

Page 18

Independent Cascade (IC) model

inactive

active at t < j

t = j + 1

Propagation ends!!!
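
The IC slides above describe a single propagation run: seeds are active at t = 0, each newly active node gets one chance to activate each inactive neighbor, and propagation ends when a step activates nobody. A minimal Python sketch of one such run follows. The adjacency-list format (`graph` maps a node to a list of `(neighbor, probability)` pairs) and the name `simulate_ic` are assumptions made for illustration, not taken from the slides.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run the Independent Cascade model once and return the set of active nodes.

    graph: dict mapping node -> list of (neighbor, propagation probability)
    seeds: iterable of nodes that are active at t = 0
    rng:   a random.Random instance (passed in so that runs are reproducible)
    """
    active = set(seeds)
    frontier = list(active)      # nodes that became active at the current step
    while frontier:              # propagation ends when a step activates no node
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u gets exactly one chance to activate each inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    toy = {"a": [("b", 0.5), ("c", 0.5)], "b": [("d", 0.5)], "c": [("d", 0.5)]}
    print(sorted(simulate_ic(toy, ["a"], random.Random(0))))
```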

Page 19

Viral Marketing

Influence Maximization Problem

Graph

Diffusion Model

Processing Algorithm

Page 20

Processing Algorithm

Macro Level Processing

Micro Level Processing

Page 21

Processing Algorithm

Macro Level Processing

Micro Level Processing

Page 22

Macro Level (KKT 03)

• Finding the maximum of σ(S) over all C(|V|, k) candidate seed sets of size k

• NP-hard: the set-covering problem reduces to it

Page 23

Greedy Algorithm (KKT 03)

• Repeatedly selects the node v which gives the most marginal gain σ(S ∪ {v}) - σ(S) from the non-seed nodes V \ S

• σ({v}) and σ(S ∪ {v}) - σ(S) are two major evaluation components
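
The greedy selection loop can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: `sigma` stands for any influence evaluator (a hypothetical callable that takes a set of nodes and returns a number), and the toy evaluator in the usage part is a simple coverage count chosen only so the example runs.

```python
def greedy_seeds(nodes, k, sigma):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    nodes: iterable of candidate nodes
    k:     seed budget
    sigma: callable mapping a set of nodes to its (estimated) influence
    """
    seeds = []
    chosen = set()
    for _ in range(k):
        base = sigma(chosen)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = sigma(chosen | {v}) - base   # marginal gain of adding v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:                        # fewer than k candidates exist
            break
        seeds.append(best)
        chosen.add(best)
    return seeds


if __name__ == "__main__":
    # Toy evaluator: each node "influences" a fixed set of users.
    reach = {"a": {1, 2, 3}, "b": {3, 4, 6}, "c": {5}}
    coverage = lambda s: len(set().union(*(reach[v] for v in s)))
    print(greedy_seeds(["a", "b", "c"], 2, coverage))
```

Every iteration calls `sigma` once per remaining candidate, which is why the cost of evaluating influence dominates the running time.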

Page 24

Processing Algorithm

Macro Level Processing

Micro Level Processing

Page 25

Micro Level (CWW 10)

• Evaluating σ(S) exactly is #P-hard: the influence propagation routes between two nodes cannot be counted efficiently

Page 26

Evaluating σ(S)

• Monte-Carlo Simulation (KKT 03)

• Simultaneous simulation (CWY 09)

• Breaking down a graph into communities (WCS 10)

• Shortest path between two nodes (KS 06)

• Local arborescence based on the most probable path (CWW 10)
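
Of the approaches listed above, Monte-Carlo simulation is the simplest to sketch: run the IC model many times from the seed set and average the number of active nodes. The Python below is a minimal illustration under assumed names (`estimate_sigma`, the `graph` adjacency format, and the `runs` and `seed` parameters are all hypothetical), not the code used in the cited papers.

```python
import random


def estimate_sigma(graph, seeds, runs=1000, seed=0):
    """Estimate sigma(S) as the average cascade size over `runs` IC simulations.

    graph: dict mapping node -> list of (neighbor, propagation probability)
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        pending = list(active)
        # Nodes are processed in stack order rather than step by step; each
        # active node still tries each outgoing edge exactly once, so the
        # distribution of the final active set is the same.
        while pending:
            u = pending.pop()
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    pending.append(v)
        total += len(active)
    return total / runs


if __name__ == "__main__":
    toy = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
    print(estimate_sigma(toy, ["a"], runs=10000))   # exact value is 1.75
```

The estimate converges slowly and the whole loop is repeated for every candidate seed in every greedy step, which motivates the cheaper approximations in the list.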

Page 27

Processing Algorithm

Macro Level Processing

IPA

Page 28

Intuition

• How about extremely localizing influence?

• Use the influence path between two nodes as the unit of influence evaluation!

• Considering all paths is not tractable (#P-hard)

• Only consider meaningful influence paths

Page 29

Meaningful Influence Path in IC model

(Figure: path v1 → v2 → v3 → v4 → v5, each edge with propagation probability 0.1.)
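
In the IC model a path delivers influence only if every edge on it fires, so its probability is the product of its edge probabilities. For the five-node path on this slide that is 0.1^4 = 0.0001. The sketch below assumes that "meaningful" means "probability at least some threshold"; the function names and the threshold value are illustrative assumptions, not taken from the slides.

```python
from math import prod


def path_probability(edge_probs):
    """Probability that influence travels along the whole path (all edges fire)."""
    return prod(edge_probs)


def is_meaningful(edge_probs, threshold):
    """Assumed pruning rule: keep a path only if its probability reaches the threshold."""
    return path_probability(edge_probs) >= threshold


if __name__ == "__main__":
    four_edges = [0.1, 0.1, 0.1, 0.1]          # v1 -> v2 -> v3 -> v4 -> v5
    print(path_probability(four_edges))
    print(is_meaningful(four_edges, 0.001))    # the long path is pruned
    print(is_meaningful([0.1, 0.1], 0.001))    # v1 -> v2 -> v3 is kept
```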

Page 30

Traversing Graph

(Figure panels: a graph; a traversing tree from a.)

Page 31

Extracting Paths

(Figure panels: a traversing tree from a; a path collection from a.)
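
One way to picture the traversal and path extraction is a depth-first walk from the start node that records every path it reaches and stops extending a path once its probability drops below a threshold. The Python below is a sketch under that assumption; `collect_paths`, the `graph` adjacency format and the `threshold` parameter are hypothetical names, and the slides do not spell out the exact traversal used.

```python
def collect_paths(graph, source, threshold):
    """Return [(path, probability)] for simple paths that start at `source`.

    graph:     dict mapping node -> list of (neighbor, propagation probability)
    threshold: paths whose probability falls below this value are not extended
    """
    paths = []

    def visit(path, prob):
        for v, p in graph.get(path[-1], []):
            q = prob * p
            if v in path or q < threshold:   # skip cycles and negligible paths
                continue
            extended = path + (v,)
            paths.append((extended, q))
            visit(extended, q)

    visit((source,), 1.0)
    return paths


if __name__ == "__main__":
    toy = {
        "a": [("b", 0.5), ("c", 0.1)],
        "b": [("c", 0.5), ("a", 0.5)],
        "c": [],
    }
    for path, prob in collect_paths(toy, "a", threshold=0.2):
        print(" -> ".join(path), prob)
```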

Page 32

Organizing Paths

A path collection from a

Page 33

Approximating σ({v})

Influence of a node v

infl. of v to itself

Influence of a node v to u
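
The formula on this slide is missing from the transcript. A plausible reading of its labels is: influence of v = (influence of v on itself, which is 1) + (sum over other nodes u of the influence of v on u). The sketch below additionally assumes that the paths from v to u are treated as independent, so that v fails to reach u only if every path fails. That combination rule and all names are assumptions for illustration.

```python
from collections import defaultdict


def influence_to_nodes(paths):
    """Map each end node u to the assumed influence of the common start node on u.

    paths: iterable of (path, probability); every path starts at the same node v.
    """
    miss = defaultdict(lambda: 1.0)       # probability that no path reaches u
    for path, prob in paths:
        miss[path[-1]] *= 1.0 - prob
    return {u: 1.0 - m for u, m in miss.items()}


def approx_sigma_single(paths):
    """Approximate sigma({v}): 1 for v itself plus its influence on every reached node."""
    return 1.0 + sum(influence_to_nodes(paths).values())


if __name__ == "__main__":
    paths_from_a = [
        (("a", "b"), 0.5),
        (("a", "b", "c"), 0.25),
        (("a", "c"), 0.5),
    ]
    print(influence_to_nodes(paths_from_a))
    print(approx_sigma_single(paths_from_a))
```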

Page 34

Parallel evaluation

• To approximate σ({v}), P_{v→V} is required

• For v ≠ u, P_{v→V} and P_{u→V} do not have common paths

• Independent evaluation of σ({v}) is guaranteed

(Figure: v1 with paths to u11, ..., u1n; v2 with paths to u21, ..., u2n.)
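
Because the path collections of different start nodes share no paths, each σ({v}) can be computed by a separate worker. The sketch below illustrates that with Python's `concurrent.futures`. The helper names, the threshold pruning and the independence-based combination rule are assumptions for illustration, and the slides do not say which parallel framework the authors used.

```python
from concurrent.futures import ThreadPoolExecutor


def path_probs_from(graph, source, threshold):
    """List (end node, probability) for simple paths from `source` above the threshold."""
    found = []
    stack = [((source,), 1.0)]
    while stack:
        path, prob = stack.pop()
        for v, p in graph.get(path[-1], []):
            q = prob * p
            if v not in path and q >= threshold:
                found.append((v, q))
                stack.append((path + (v,), q))
    return found


def approx_sigma_single(graph, v, threshold):
    """Approximate sigma({v}) from v's own paths only (paths assumed independent)."""
    miss = {}
    for end, prob in path_probs_from(graph, v, threshold):
        miss[end] = miss.get(end, 1.0) * (1.0 - prob)
    return 1.0 + sum(1.0 - m for m in miss.values())


def approx_sigma_all(graph, threshold, workers=4):
    """Evaluate every node independently; no worker needs another worker's paths."""
    nodes = list(graph)                      # assumes every node is a key of `graph`
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(lambda v: approx_sigma_single(graph, v, threshold), nodes)
        return dict(zip(nodes, values))


if __name__ == "__main__":
    toy = {"a": [("b", 0.5)], "b": [("c", 0.5)], "c": []}
    print(approx_sigma_all(toy, threshold=0.1))
```

A thread pool keeps the sketch short. For CPU-bound work in CPython, a process pool or a compiled implementation would likely be needed to see a real speedup.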

Page 35

Re-organizing

• Changing perspective from starting nodes to ending nodes
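
The change of perspective amounts to re-indexing the same paths: instead of grouping them by the node they start from, group them by the node they end at. A minimal Python sketch follows, with a hypothetical function name and data layout.

```python
from collections import defaultdict


def group_by_end_node(paths_by_start):
    """Re-index paths from {start node: [(path, prob)]} to {end node: [(path, prob)]}."""
    by_end = defaultdict(list)
    for paths in paths_by_start.values():
        for path, prob in paths:
            by_end[path[-1]].append((path, prob))
    return dict(by_end)


if __name__ == "__main__":
    paths_by_start = {
        "a": [(("a", "b"), 0.5), (("a", "b", "c"), 0.25)],
        "b": [(("b", "c"), 0.5)],
    }
    for end, paths in group_by_end_node(paths_by_start).items():
        print(end, paths)
```

With this layout, all paths that can activate a given node sit together, which is convenient when the question becomes how strongly the current seeds influence that node.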

Page 36

Approximating σ(S ∪ {v}) - σ(S) is not trivial

• σ({v}) ≠ σ(S ∪ {v}) - σ(S)

• Influence blocking!

• v blocks a path from u ∈ S

• We should detect blocked (invalid) paths

(Figure: nodes u and v, shown before and after.)

Page 37

Detecting influence blocking

• Current seed set : S

• New seed node : v

• Valid Paths

(Figure: example paths through nodes u and v.)

Page 38

Adding a seed node

Page 39

Detect invalid paths

Page 40

Approximating σ(S ∪ {v}) - σ(S)

Marginal infl. of a node v

infl. of v to itself

Infl. of seeds S to a node v

Only consider valid paths
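
The formula on this slide is missing from the transcript, so the following is only a sketch of how influence blocking could be handled. It assumes that a path is valid when it starts at a seed and passes through no other seed, and that valid paths to the same node are combined as if independent. The function names and the toy chain are illustrative assumptions.

```python
def is_valid(path, seeds):
    """Assumed rule: a path is valid if it starts at a seed and meets no other seed."""
    return path[0] in seeds and not any(node in seeds for node in path[1:])


def approx_sigma(paths, seeds):
    """Approximate sigma(S) using only valid paths (paths assumed independent)."""
    seeds = set(seeds)
    miss = {}
    for path, prob in paths:
        if is_valid(path, seeds):
            end = path[-1]
            miss[end] = miss.get(end, 1.0) * (1.0 - prob)
    return len(seeds) + sum(1.0 - m for m in miss.values())


def marginal_gain(paths, seeds, v):
    """Approximate sigma(S ∪ {v}) - sigma(S)."""
    return approx_sigma(paths, set(seeds) | {v}) - approx_sigma(paths, seeds)


if __name__ == "__main__":
    # Chain u -> v -> w, every edge with probability 0.5.
    paths = [(("u", "v"), 0.5), (("u", "v", "w"), 0.25), (("v", "w"), 0.5)]
    print(approx_sigma(paths, {"v"}))          # influence of v alone
    print(marginal_gain(paths, {"u"}, "v"))    # smaller: v blocks u's paths
```

On this chain, adding v to the seed set {u} invalidates both of u's paths, so the marginal gain of v is lower than σ({v}) computed in isolation.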

Page 41

Empirical Evaluation

Page 42

Dataset

Page 43

Algorithms

• Monte-Carlo [Greedy] (LKG 07)

• PMIA (CWW 10)

• SD (single discount)

• Random (baseline)

• IPA

Page 44

Finding Threshold

Page 45

Processing Time

Page 46

Influence

Page 47

Influence

Page 48

Influence

Page 49

Parallelization Effect

Page 50

Q & A

Page 51

References

Page 52

• KKT 03 : Kempe, D., Kleinberg, J., and Tardos, E. Maximizing the spread of influence through a social network. (KDD ’03)

• KS 06 : Kimura, M., and Saito, K. Tractable models for information diffusion in social networks. (PKDD ’06)

• LKG 07 : Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. Cost-effective outbreak detection in networks. (KDD ’07)

• CWY 09 : Chen, W., Wang, Y., and Yang, S. Efficient influence maximization in social networks. (KDD ’09)

Page 53

• CWW 10 : Chen, W., Wang, C., and Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. (KDD ’10)

• WCS 10 : Wang, Y., Cong, G., Song, G., and Xie, K. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. (KDD ’10)

• JSC 11 : Jiang, Q., Song, G., and Cong, G. Simulated annealing based influence maximization in social networks. (AAAI ’11)

• LYK 12 : Lee, W., Kim, J., and Yu, H. CT-IC: Continuously activated and time-restricted independent cascade model for viral marketing. (ICDM ’12)