
StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Suqi Cheng
Research Center of Web Data Sciences & Engineering
Institute of Computing Technology, Chinese Academy of Sciences
chengsuqi@ict.ac.cn, chengsuqi@gmail.com

http://www.nascgroup.org/~chengsuqi

Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

Outline

• Background
• Preliminaries
• Motivation
• StaticGreedy algorithm
• Experiments

Information Cascade

• An action or idea is adopted by one person after another due to social influence
– it cascades through social relationships

• Main Applications
– Word-of-Mouth marketing
– Outbreak detection
– Popularity prediction

[Figure: social network]

Word-of-Mouth Marketing

• To promote a product by seeding a few users; users adopting the product will recommend it

• Advantages: efficient; cost-effective

[Figure: a company gives free products or discounts to seed users, whose influence reaches follow-up activated users]

How to select the optimal seed users?

Influence Maximization for Viral Marketing

• Objective function
– Influence spread I(S): expected number of activated (influenced/adopted) nodes
– Maximize I(S)

• Input:
– A social influence graph G=(V, E)
– An information cascade model
– An integer k, |S| ≤ k

• Output: A seed set S

Information Cascade Model

• Independent cascade (IC) model
– each edge (u, v) has a propagation probability p(u, v)
– each newly activated node u independently activates its out-neighbor v with probability p(u, v)
– a discrete time model

• Influence spread estimation on IC model
– Monte Carlo simulation
– Heuristic methods

[Figure: social influence graph with propagation probabilities (0.1 to 0.5) on its edges] [Leskovec, 2008]
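The cascade process and its Monte Carlo estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adjacency format (`graph[u]` as a list of `(v, p)` pairs), the function names, and the toy graph are assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph maps each node u to a list of (v, p) pairs with p = p(u, v).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly activated u gets exactly one chance to activate v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the number of activated nodes over independent simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph: probabilities of 1.0 and 0.0 make the result deterministic.
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
print(estimate_spread(toy_graph, ["a"], runs=10))  # a, b and d are activated: 3.0
```

With realistic probabilities the estimate is noisy, and its accuracy depends on the number of runs.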

Difficulties in Influence Maximization

Difficulty 1: The influence maximization problem is NP-hard. [Kempe, KDD’03]

Existing solutions

• Greedy approximate algorithm [Kempe, KDD’03]: accurate but inefficient
– (1-1/e-ε)-approximation
– iteratively selects the node with the largest marginal influence spread
– guaranteed by the submodularity and monotonicity of the influence spread function

• Heuristics (Degree, PageRank, Betweenness): efficient but inaccurate

Difficulties in Influence Maximization

Difficulty 2: Computing influence spread exactly is #P-hard. [Chen, KDD’10]

Existing solutions

• Heuristic methods: efficient but inaccurate
– DegreeDiscount [Chen, KDD’09]
– CGA [Wang, KDD’10]
– PMIA [Chen, KDD’10]
– IRIE [Jung, ICDM’12]

• Monte Carlo simulation: accurate but time-consuming
– CELF optimization [Leskovec, KDD’07]
– NewGreedy [Chen, KDD’09]
– CELF++ optimization [Goyal, WWW’11]

A scalability-accuracy dilemma!

Our work

• Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma

Algorithm                                 Accuracy       Scalability

Approximate algorithms
  Greedy [Kempe, KDD’03]                  guaranteed     low
  GreedyCELF [Leskovec, KDD’07]           guaranteed     low
  GreedyCELF++ [Goyal, WWW’11]            guaranteed     low
  NewGreedy/MixedGreedy [Chen, KDD’09]    guaranteed     low
  StaticGreedy [Cheng, CIKM’13]           guaranteed     high

Heuristics
  Degree                                  unguaranteed   high
  PageRank [Page, 1999]                   unguaranteed   high
  DegreeDiscount [Chen, KDD’09]           unguaranteed   high
  PMIA [Chen, KDD’10]                     unguaranteed   high
  IRIE [Jung, ICDM’12]                    unguaranteed   high
  SP1M [Kimura, PKDD’06]                  unguaranteed   relatively low

Preliminaries-1

• Social influence graph: G=(V, E), n=|V|, m=|E|

• Influence spread: I(S)

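• Marginal influence spread: $M(v \mid S) = I(S \cup \{v\}) - I(S)$

• Greedy approximate algorithm
– iteratively selects the node with the largest marginal influence spread
– provides a $(1 - 1/e - \varepsilon)$-approximation

• Properties of I(S) under the independent cascade model
– submodularity: $I(S \cup \{v\}) - I(S) \ge I(T \cup \{v\}) - I(T)$ for all $v \in V$ and $S \subseteq T \subseteq V$
– monotonicity: $I(S \cup \{v\}) \ge I(S)$

[Diagram: the properties of I(S) and influence spread estimation are each labeled as a "guarantee" for the greedy approximate algorithm]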
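The two properties can be checked by brute force on a small ground set. The sketch below is only an illustration of the definitions: the helper names and the coverage-style toy function `coverage` are assumptions, and the exhaustive check is feasible only for a handful of nodes.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotone(f, ground):
    """Check f(S ∪ {v}) >= f(S) for every subset S and every node v."""
    return all(f(s | {v}) >= f(s) for s in subsets(ground) for v in ground)


def is_submodular(f, ground):
    """Check f(S ∪ {v}) - f(S) >= f(T ∪ {v}) - f(T) for all S ⊆ T and all v."""
    all_sets = list(subsets(ground))
    return all(
        f(s | {v}) - f(s) >= f(t | {v}) - f(t)
        for t in all_sets for s in all_sets if s <= t
        for v in ground
    )


# Hypothetical reachability sets; counting covered elements is a coverage function.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage(s):
    return len(set().union(*(reach[v] for v in s)))


print(is_monotone(coverage, reach), is_submodular(coverage, reach))
print(is_submodular(lambda s: len(s) ** 2, reach))
```

Counting reachable nodes in a fixed graph is a coverage function of this kind, which is why it passes both checks, whereas `len(s) ** 2` has increasing marginal gains and fails the submodularity check.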

Preliminaries-2

• Monte Carlo simulation for influence spread estimation
– approximates the true value of influence spread by averaging over realizations

• Two equivalent ways to generate a realization
– simulation: models the information cascade process; advantage: relatively low time complexity; disadvantage: estimates one seed set at a time
– snapshot [Chen, KDD’09]: removes each edge (u, v) from G with probability 1-p(u, v); advantage: can estimate any seed set simultaneously; disadvantage: relatively high time complexity
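The snapshot method can be sketched as below: sample each edge once, then count the nodes reachable from the seed set. The function names, the adjacency format, and the toy graph are assumptions for illustration; the point is that one list of snapshots serves every seed set.

```python
import random


def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v), i.e. remove it with 1 - p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}


def reachable(snapshot, seeds):
    """Return the nodes reachable from seeds in a snapshot (depth-first search)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def spread_on_snapshots(snapshots, seeds):
    """Estimate I(S) as the average number of reachable nodes over the snapshots."""
    return sum(len(reachable(s, seeds)) for s in snapshots) / len(snapshots)


# Hypothetical toy graph with deterministic probabilities.
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
rng = random.Random(0)
snapshots = [sample_snapshot(toy_graph, rng) for _ in range(5)]

# The same snapshots are reused for different seed sets.
print(spread_on_snapshots(snapshots, ["a"]))  # 3.0
print(spread_on_snapshots(snapshots, ["b"]))  # 2.0
```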

Motivation

• In existing greedy algorithms
– a risk that the submodularity and monotonicity of the influence spread function are not guaranteed

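[Figure: influence graph, snapshot 1 (iteration 1), snapshot 2 (iteration 2)]

$I(S_0 \cup \{v_4\}) - I(S_0) = I(\{v_4\}) - I(\emptyset) = 1$

$I(S_1 \cup \{v_4\}) - I(S_1) = I(\{v_2, v_4\}) - I(\{v_2\}) = 3$

Submodularity is broken: the marginal gain of $v_4$ grows from 1 to 3 although $S_0 = \emptyset \subseteq S_1 = \{v_2\}$.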

– caused by using different Monte Carlo simulation results across different influence spread estimations

– a very large value of R is required, e.g. R=20000 (R: number of Monte Carlo simulations per estimation)
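A small experiment can make the risk concrete. The sketch below uses a hypothetical graph in which `v` is always reached from `a`, so adding `v` to the seed set `{a}` has a true marginal gain of 0. It estimates that gain with R=1, either from two independent snapshots or from one shared snapshot, and counts how often the estimate is negative. All names and the graph are assumptions for illustration.

```python
import random

# Hypothetical toy graph: edge a->v always exists, edge a->b exists half the time.
GRAPH = {"a": [("b", 0.5), ("v", 1.0)]}


def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}


def reach_count(snapshot, seeds):
    """Count the nodes reachable from seeds in the snapshot."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for v in snapshot.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


def count_negative_gains(trials, reuse_snapshot, seed=0):
    """Count trials where the estimated gain of adding v to {a} is negative."""
    rng = random.Random(seed)
    negatives = 0
    for _ in range(trials):
        first = sample_snapshot(GRAPH, rng)
        # Either reuse the same realization or draw a fresh one for the second estimate.
        second = first if reuse_snapshot else sample_snapshot(GRAPH, rng)
        gain = reach_count(second, ["a", "v"]) - reach_count(first, ["a"])
        negatives += gain < 0
    return negatives


fresh_negatives = count_negative_gains(200, reuse_snapshot=False)
static_negatives = count_negative_gains(200, reuse_snapshot=True)
print(fresh_negatives, static_negatives)
```

With a shared snapshot the estimated gain is exactly 0 in every trial, so it is never negative. With independent snapshots it is negative whenever the first realization keeps the edge to `b` and the second does not, which violates monotonicity in the estimates.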

StaticGreedy algorithm

• Core idea: always use the same snapshots for influence spread estimation
– the influence spread function is submodular and monotone
– a small value of R is required, e.g. R=100

Part 1: Generate R static snapshots

Part 2: Greedy selection
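The two parts can be sketched as follows. This is a simplified reading of the slide, not the authors' code: the function names, the adjacency format, the tie-breaking by sorted node name, and the toy graph are assumptions, and the marginal gains are recomputed from scratch rather than optimized.

```python
import random


def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}


def reachable(snapshot, seeds):
    """Return the nodes reachable from seeds in a snapshot."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for v in snapshot.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def static_greedy(graph, k, num_snapshots=100, seed=0):
    rng = random.Random(seed)
    # Part 1: generate R static snapshots once.
    snapshots = [sample_snapshot(graph, rng) for _ in range(num_snapshots)]
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}

    def spread(seeds):
        return sum(len(reachable(s, seeds)) for s in snapshots) / num_snapshots

    # Part 2: greedy selection, always evaluated on the same snapshots.
    seeds = []
    for _ in range(k):
        base = spread(seeds)
        candidates = sorted(nodes - set(seeds))
        best = max(candidates, key=lambda v: spread(seeds + [v]) - base)
        seeds.append(best)
    return seeds


# Hypothetical toy graph with deterministic probabilities.
toy_graph = {"a": [("b", 1.0), ("c", 1.0)], "b": [("c", 0.0)], "d": [("e", 1.0)]}
print(static_greedy(toy_graph, k=2, num_snapshots=10))  # ['a', 'd']
```

Because every estimate averages reachability counts over a fixed list of snapshots, the estimated spread is itself a coverage-style function, so it stays monotone and submodular for any R.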

Performance analysis: Convergence rate

• provides a (1-1/e-ε)-approximation with a small value of R

[Figure: convergence of d, a difference measure over influence spreads I(S) indexed by R and k, as R grows; seed set size = 50]

NetHEPT: a benchmark network
uniform independent cascade (UIC) model: p(u, v) = p = 0.01
weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)

Performance analysis: Scalability

[Figure, two panels, x-axis: seed set size; annotated reductions: ≈10^3 times and ≈10^2 times]

• Minimal R required (the smallest R at which d reaches the 0.005 threshold): R is significantly reduced
• Running time: running time is significantly reduced

Performance analysis: Complexity

n: number of nodes in social influence graph
m: number of edges in social influence graph
m’: expected number of edges in a snapshot

Speed up StaticGreedy

• A dynamic update strategy
– calculates the marginal gain in an efficient incremental manner

• at each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) \setminus (R(v) \cap R(v_t^*))$

– trades space for time

[Figure: example snapshot over nodes v1 to v8, initial state]

Initial marginal gains: M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1

R(v): reachable nodes from v in the snapshot


[Figure: the same snapshot after selecting v* = v1; M(v) and R(v) are updated directly]

Experiments: setup

• Algorithms:
– Our algorithms: StaticGreedyCELF, StaticGreedyDU
– Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount

• Tested datasets

• Independent cascade models
– uniform independent cascade (UIC) model: p(u, v) = p = 0.01
– weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)

• Metrics: Influence spread, running time

Experiments: influence spread

• StaticGreedy achieves better accuracy than other heuristics

[Figure: influence spread on NetPHY under the UIC model and the WIC model]

Experiments: running time

• StaticGreedy runs >10^3 times faster than CELFGreedy
• StaticGreedy has comparable scalability to state-of-the-art heuristics
• StaticGreedyDU always runs faster than StaticGreedyCELF

[Figure: running time under the UIC model and the WIC model]

Conclusion

• Essential reason for the inefficiency of existing greedy algorithms
– a risk of unguaranteed submodularity and monotonicity
– caused by different Monte Carlo simulations across different estimations
– a very large value of R is required: guaranteed accuracy + inefficiency

• StaticGreedy algorithm
– guaranteed submodularity and monotonicity
– using the same Monte Carlo simulations across different estimations
– a small value of R is required: guaranteed accuracy + high scalability
– runs >10^3 times faster than conventional greedy algorithms

• A dynamic update strategy to speed up StaticGreedy
– about 10 times faster

Thank you!

Q & A
