Least Cost Influence in Multiplex Social Networks

Post on 15-Jan-2017


MODEL REPRESENTATION AND ANALYSIS

Presented by: Ayushi Jain, Rahul Bobhate, Natasha Mandal, Ankur Sachdeva

Dung T. Nguyen, Huiyuan Zhang, Soham Das, My T. Thai, Thang N. Dinh

Structure

• Define a few terms

• Motivation

• Related work

• Challenges and proposed solution

• Math notations and problem definition

• Lossless coupling

• Lossy coupling

• Influence relay

• Experiments

• Conclusion

What are multiplex networks?

• Networks in which nodes can be linked by more than one kind of edge, e.g., users who are connected on more than one social media platform

• Example: a set of users who interact on Facebook, Twitter and Foursquare

What is the least cost influence (LCI) problem?

• Finding a minimum number of seed users who can eventually influence a large number of users

• Example: How to find the set of influencers with the least advertising cost who can influence a massive number of users

Or

How to find the minimum number of inducements required for the product adoption to reach a certain proportion of the population

Motivation

• In the past decade, the popularity of online social networks (OSNs) has created a major communication medium for information sharing

• As in real-world social networks, word-of-mouth and peer-pressure effects are at work

Do you know how much time an individual spends (on average) on social media?

• 1.72 hours per day

• 28% of online activity

Some more statistics!

[Figure: Number of Facebook users (in millions)]

Why is it important to study information diffusion in these networks?

• A considerable number of users overlap across networks

• Users can relay the information from one network to another

• Example: [Figure: example user Jack]

If we only consider information propagation in one network, we will fail to identify the most influential users

Single network

• Kempe et al.
  • Find a set of k users who can maximize influence
  • Stochastic process: Independent Cascade model (IC)
  • Probability of influencing a friend ∝ strength of friendship
  • NP-hard; a greedy algorithm achieves an approximation ratio of (1 - 1/e)

• Linear Threshold model (LT)
  • A user adopts a new product when the total influence of his/her friends exceeds a threshold

• Dinh et al.
  • Suggested an algorithm for a special case of LT
  • Influence between users is uniform, and a user is influenced if a certain fraction ρ of his/her friends are active
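The greedy approach attributed to Kempe et al. above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes a small directed graph stored as a dict mapping each node to (out-neighbor, activation probability) pairs, and it estimates the expected spread under the Independent Cascade model by Monte Carlo simulation. The names `ic_spread` and `greedy_seeds` and the default of 200 simulation runs are hypothetical choices.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical layout: node -> list of (out-neighbor, activation probability)
Graph = Dict[str, List[Tuple[str, float]]]


def ic_spread(graph: Graph, seeds: Set[str], runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected number of active nodes under Independent Cascade."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            next_frontier = []
            for u in frontier:
                # A newly activated node gets a single chance to activate each inactive out-neighbor.
                for v, p in graph.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        total += len(active)
    return total / runs


def greedy_seeds(graph: Graph, k: int, runs: int = 200, seed: int = 0) -> List[str]:
    """Add, k times, the node with the largest estimated spread when joined to the current seeds."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}
    chosen: List[str] = []
    for _ in range(min(k, len(nodes))):
        best, best_spread = "", -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = ic_spread(graph, set(chosen) | {cand}, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen


graph = {"a": [("b", 1.0), ("c", 1.0)], "d": [("e", 1.0)]}
print(greedy_seeds(graph, 2))  # ['a', 'd'] (probabilities of 1.0 make the spread deterministic)
```

The (1 - 1/e) guarantee applies when the spread is computed exactly; with Monte Carlo estimates it holds only approximately, and more runs tighten the estimate at a higher cost.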

Related work

Multiplex Networks

• Yagan et al.
  • Studied the connection between online and offline networks
  • Investigated the outbreak of information using the SIR model on random networks

• Liu et al.
  • Analyzed networks formed by online interactions and offline events

Drawbacks:
  • Studied the flow of information and network clustering, but not LCI
  • Did not study the specific optimization problem of viral marketing

• Shen et al.
  • Studied information propagation in multiplex OSNs
  • Combined all networks into one network by representing an overlapping user as a super node
  • Cannot preserve the properties of the individual networks

Challenges

• How to evaluate the influence of overlapping users in multiplex networks?

• In which network is a user easier to influence?

• Which network propagates influence better?

• In this paper, we study the LCI problem: finding a minimum-cardinality set of users that influences a certain fraction of users in multiplex networks

• We present various coupling schemes that reduce the problem in multiplex networks to an equivalent problem on a single network. The coupling schemes apply to the most popular diffusion models, including the Linear Threshold model, the Stochastic Threshold model, and the Independent Cascade model

• We introduce a new metric called influence relay to analyze the influence diffusion process in both a single network and multiplex networks

Proposed solution

Graph Notations

• $G_i = (V_i, E_i, \theta_i, W_i)$: weighted directed graph representing network i.

• $V_i$: set of vertices in $G_i$; represents the users in the network.

• $E_i$: set of edges in $G_i$; represents the connections between users.

• $W_i$: weights of the edges in $E_i$; represent the strength of influence or of the connection. The weight of edge (v, u) in $G_i$ is written $w_i(v, u)$.

• $N_i^-(u)$, $N_i^+(u)$: sets of incoming and outgoing neighbors of u in $G_i$.

• $\theta_i(u)$: threshold indicating the persistence of u's opinions in $G_i$.

Least Cost Influence (LCI) Problem Definition

• Given:
  • A system of k networks $G_{1\ldots k}$
  • A set of users U
  • A number of time hops d
  • A fraction $0 < \beta < 1$

• To find:
  • A seed set $S \subseteq U$ of minimum cardinality such that at least a β fraction of the users in U are active after d hops
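Written as an optimization problem, the definition reads as follows. This restatement uses $A_d(G_{1\ldots k}, S)$, the set of users active after d hops when the process starts from S, which is the notation introduced with the LT model.

```latex
\min_{S \subseteq U} \; |S|
\quad \text{subject to} \quad
\bigl|A_d(G_{1\ldots k}, S)\bigr| \;\ge\; \beta\,|U|, \qquad 0 < \beta < 1
```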

Linear Threshold Model

• An influence and information diffusion model for a single network

• Can be extended to handle multiple networks

• In the LT model:
  • Every user is either active or inactive
  • A user u is active if he/she accepts the information, or if the total influence of his/her active neighbors reaches his/her threshold

• At each time hop, newly activated users go on to activate further users.

• d is the number of hops over which information is propagated.

• The set of users active after d hops, starting from seed set S, is denoted by $A_d(G_{1\ldots k}, S)$.
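A minimal sketch of computing $A_d(G_{1\ldots k}, S)$ under two stated assumptions: a user who becomes active is active in every network he/she belongs to, and an inactive user u becomes active at the next hop if, in at least one network $G_i$, the total weight from already-active in-neighbors is at least $\theta_i(u)$ (the ≥ comparison matches the worked example in the lossy-coupling slides). The `Network` layout and the `multiplex_lt` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Network:
    """One layer G_i: thresholds theta_i(u) and incoming weights w_i(v, u) of its registered users."""
    theta: Dict[str, float]
    in_edges: Dict[str, Dict[str, float]] = field(default_factory=dict)  # u -> {v: w_i(v, u)}


def multiplex_lt(networks: List[Network], seeds: Set[str], d: int) -> Set[str]:
    """Users active after d synchronous hops of the multiplex LT process."""
    active = set(seeds)
    for _ in range(d):
        newly_active: Set[str] = set()
        for g in networks:
            for u, threshold in g.theta.items():
                if u in active or u in newly_active:
                    continue
                # Only in-neighbors that were active when this hop started can contribute.
                influence = sum(w for v, w in g.in_edges.get(u, {}).items() if v in active)
                if influence >= threshold:
                    newly_active.add(u)  # active in one network means active in all of them
        if not newly_active:
            break  # the process stopped spreading before d hops
        active |= newly_active
    return active


# Usage: b is activated in G1 and relays the information to c and d in G2.
g1 = Network(theta={"a": 0.5, "b": 0.5}, in_edges={"b": {"a": 0.6}})
g2 = Network(theta={"b": 0.5, "c": 0.5, "d": 0.9}, in_edges={"c": {"b": 0.7}, "d": {"c": 1.0}})
print(sorted(multiplex_lt([g1, g2], {"a"}, 3)))  # ['a', 'b', 'c', 'd']
print(sorted(multiplex_lt([g2], {"a"}, 3)))      # ['a']: G2 alone misses the relay
```

The second call illustrates the earlier motivation: looking at one network alone hides the overlapping user b who carries the information across.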

Coupling Schemes

• Lossless coupling scheme:
  • Scheme to combine multiple networks into a single network
  • No loss of data while combining networks (obviously!)
  • Advantages:
    • Existing algorithms can be used
    • Same quality of solution
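Because a coupled network is an ordinary single network, an existing single-network LCI heuristic can be run on it. As one hedged illustration (a simple greedy rule, not necessarily the algorithm used in the paper), the sketch below repeatedly adds the vertex whose addition activates the most vertices after d hops of a deterministic LT process, until at least a β fraction is active. The names `lt_active` and `greedy_lci` are hypothetical.

```python
from typing import Dict, List, Set

InEdges = Dict[str, Dict[str, float]]  # v -> {u: weight of edge (u, v)}


def lt_active(theta: Dict[str, float], in_edges: InEdges, seeds: Set[str], d: int) -> Set[str]:
    """Vertices active after d hops of a deterministic LT process on one network."""
    active = set(seeds)
    for _ in range(d):
        # A vertex activates once the weight from already-active in-neighbors reaches its threshold.
        new = {
            v for v, th in theta.items()
            if v not in active
            and sum(w for u, w in in_edges.get(v, {}).items() if u in active) >= th
        }
        if not new:
            break
        active |= new
    return active


def greedy_lci(theta: Dict[str, float], in_edges: InEdges, beta: float, d: int) -> List[str]:
    """Add seeds greedily until at least a beta fraction of the vertices is active after d hops."""
    seeds: List[str] = []
    active = lt_active(theta, in_edges, set(), d)
    while len(active) < beta * len(theta):
        # Candidate yielding the largest active set; max() keeps the first (alphabetical) on ties.
        best = max(
            sorted(set(theta) - set(seeds)),
            key=lambda c: len(lt_active(theta, in_edges, set(seeds) | {c}, d)),
        )
        seeds.append(best)
        active = lt_active(theta, in_edges, set(seeds), d)
    return seeds


# Usage: a path a -> b -> c -> d with unit weights and thresholds of 0.5.
theta = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
in_edges = {"b": {"a": 1.0}, "c": {"b": 1.0}, "d": {"c": 1.0}}
print(greedy_lci(theta, in_edges, beta=0.9, d=1))  # ['a', 'c']
```

With only one hop allowed, a single seed reaches at most two vertices, so two seeds are needed; with three hops, seed a alone covers the whole path.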

Challenges

• Heterogeneity of user participation:
  • One user might have joined a single network
  • Another user might have joined multiple networks
  • Recognizing users across networks is difficult

• Inter-network influence propagation:
  • A user transmits information in multiple networks
  • The transmission of influence between networks must be represented in a single network

• Preserving properties of individual networks:
  • The coupled network should preserve the diffusion properties of the individual networks
  • It should be possible to relate a solution for the coupled network to a solution for the individual networks

Coupling scheme for the LT model

• Solution to the 1st challenge:
  • Introduce dummy nodes, which represent a user u in a network $G_i$ in which u is not registered.

• Solution to the 2nd challenge:
  • Introduce gateway vertices.
  • Introduce synchronization edges.
  • Instead of an edge between two vertices, there is an edge from a user to a gateway vertex and an edge from the gateway vertex to a user.

• Solution to the 3rd challenge:
  • Nothing else needs to be done.

Lemmas

• Lemma 1: Suppose that the propagation process in the coupled network G starts from a seed set that contains only gateway vertices, $S = \{s_1^0, \ldots, s_p^0\}$. Then representative vertices are activated only at even propagation hops.

• Lemma 2: Suppose that the propagation processes on $G_{1\ldots k}$ and G start from the same seed set S. Then the following conditions are equivalent:
  • User u is active after d propagation hops in $G_{1\ldots k}$.
  • There exists i such that $u^i$ is active after $2d - 1$ propagation hops in G.
  • Vertex $u^0$ is active after 2d propagation hops in G.

Theorems

• Theorem 1: Given a system of k networks $G_{1\ldots k}$ with user set U, the coupled network G produced by the lossless coupling scheme, and a seed set $S = \{s_1, s_2, \ldots, s_p\}$: if $A_d(G_{1\ldots k}, S) = \{a_1, a_2, \ldots, a_q\}$ is the set of active users caused by S after d propagation hops in the multiplex networks, then $A_{2d}(G, S) = \{a_1^0, a_1^1, \ldots, a_1^k, \ldots, a_q^0, a_q^1, \ldots, a_q^k\}$ is the set of active vertices caused by S after 2d propagation hops in the coupled network.

• Theorem 2: When the lossless scheme is used, the set $S = \{s_1, s_2, \ldots, s_p\}$ influences a β fraction of users in $G_{1\ldots k}$ after d propagation hops if and only if $S = \{s_1^0, s_2^0, \ldots, s_p^0\}$ influences a β fraction of vertices in the coupled network G after 2d propagation hops.

Extension to other diffusion models

• The lossless coupling scheme can also be used for other diffusion models:
  • Stochastic Threshold model
  • Independent Cascading model

• These models are handled like the LT model, using the same:
  • Gateway vertices
  • Representative vertices
  • Synchronization edges

Lossy Coupling

MOTIVATION

• The coupled network produced by lossless coupling contains a large number of extra vertices and edges.

• It is ideal to have a compact coupled network which contains only users as vertices.

• Such a compact coupled network will inevitably have loss of information.

Lossy Coupling

GOALS

• The goal is to design a scheme which will minimize this loss of information.

• The solution for finding the Least Cost Influence in the compact coupled network should be very close to the solution in the original multiplex network.

Lossy Coupling

OBSERVATION 1

• A user u will be activated if there exists i such that $\sum_{v \in N_i^-(u) \cap A} w_i(v,u) \ge \theta_i(u)$, where A is the set of active users.

• We can relax the condition for activating u with positive parameters $\lambda_i(u) > 0$ as follows:

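Lossy Coupling

PROPOSITION 1

• For a system of networks $G_{1\ldots k}$, if

$$\sum_{i=1}^{k} \lambda_i(u) \sum_{v \in N_i^-(u) \cap A} w_i(v,u) \;\ge\; \sum_{i=1}^{k} \lambda_i(u)\, \theta_i(u)$$

is satisfied, then user u is activated.

• This condition can be checked on a single network. The implication holds because every $\lambda_i(u) > 0$: if $\sum_{v \in N_i^-(u) \cap A} w_i(v,u) < \theta_i(u)$ held in every network $G_i$, the weighted sums would satisfy the strict reverse inequality, so the condition of OBSERVATION 1 must hold in some network.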

Lossy Coupling• can constitute for extra influence which may be required to activate

• can be made proportional to . In this way, when we choose [].

• In real life, we do not know in which network u will be activated. Hence, we have to use heuristics.

Lossy Coupling

OBSERVATION 2

• When a user u participates in multiple networks, it may be easier to influence u in some networks than in others.

• For example, suppose a node u belongs to two networks:

Network 1: $\theta_1(u) = 0.1$; u has 8 in-neighbors, each influencing u with weight $w_1(v,u) = 0.1$, so it takes 1 active neighbor to activate u.

Network 2: $\theta_2(u) = 0.7$; u has 8 in-neighbors, each influencing u with weight $w_2(v,u) = 0.1$, so it takes 7 active neighbors to activate u.
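The neighbor counts in this example come from dividing the threshold by the per-neighbor weight and rounding up. A minimal sketch, assuming (as the example implies) that u activates once the total weight of its active in-neighbors reaches the threshold and that all in-neighbors carry the same weight; `neighbors_needed` is a hypothetical helper.

```python
import math
from fractions import Fraction


def neighbors_needed(theta: str, weight: str) -> int:
    """Smallest number of active in-neighbors, each of the given weight, whose total reaches theta.

    Decimal strings are parsed as exact fractions to avoid floating-point round-off.
    """
    return math.ceil(Fraction(theta) / Fraction(weight))


print(neighbors_needed("0.1", "0.1"))  # Network 1: 1
print(neighbors_needed("0.7", "0.1"))  # Network 2: 7
```

Exact fractions matter here: in floating point, 0.7 / 0.1 evaluates to slightly less than 7, and nearby inputs can land slightly above an integer and round up one neighbor too many.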

Lossy Coupling

EASINESS

• Intuitively, u is easier to influence in Network 1.

• Formally, =

• We can use the easiness of u in network $G_i$ as $\lambda_i(u)$ in the equation stated in OBSERVATION 1.

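Lossy Coupling

• The vertex set V is the set of users $U = \{u_1, u_2, \ldots\}$

• The threshold of vertex u is $\theta(u) = \sum_{i=1}^{k} \lambda_i(u)\, \theta_i(u)$

• The weight of edge (v, u) is $w(v,u) = \sum_{i=1}^{k} \lambda_i(u)\, w_i(v,u)$, where $w_i(v,u) = 0$ if there is no edge from v to u in network $G_i$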
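The lossy construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each layer is given as hypothetical (thresholds, incoming-weight) dictionaries, `lam[u][i]` holds the positive parameter $\lambda_i(u)$ for layer i, the sums run only over the layers a user appears in, and a deterministic LT run (`lt_active`, also hypothetical) is included so the coupled network can be exercised.

```python
from typing import Dict, List, Set, Tuple

InEdges = Dict[str, Dict[str, float]]  # u -> {v: weight of edge (v, u)}
Layer = Tuple[Dict[str, float], InEdges]  # (theta_i, incoming weights w_i) of one network G_i


def lossy_couple(layers: List[Layer], lam: Dict[str, List[float]]) -> Tuple[Dict[str, float], InEdges]:
    """Merge k layers into one network using per-user parameters lam[u][i] = lambda_i(u) > 0.

    theta(u) = sum_i lambda_i(u) * theta_i(u)
    w(v, u)  = sum_i lambda_i(u) * w_i(v, u)   (w_i(v, u) = 0 when the edge is absent)
    Sums run only over the layers in which u appears (an assumption of this sketch).
    """
    theta: Dict[str, float] = {}
    in_edges: InEdges = {}
    for i, (theta_i, in_i) in enumerate(layers):
        for u, th in theta_i.items():
            theta[u] = theta.get(u, 0.0) + lam[u][i] * th
            row = in_edges.setdefault(u, {})
            for v, w in in_i.get(u, {}).items():
                row[v] = row.get(v, 0.0) + lam[u][i] * w
    return theta, in_edges


def lt_active(theta: Dict[str, float], in_edges: InEdges, seeds: Set[str], d: int) -> Set[str]:
    """Vertices active after d hops of a deterministic LT process on a single network."""
    active = set(seeds)
    for _ in range(d):
        new = {
            u for u, th in theta.items()
            if u not in active
            and sum(w for v, w in in_edges.get(u, {}).items() if v in active) >= th
        }
        if not new:
            break
        active |= new
    return active


# Usage with the AVERAGE choice lambda_1(u) = lambda_2(u) = 0.5: in layer 1 one active
# in-neighbor a already reaches theta_1(u) = 0.1, but layer 2 falls far short of theta_2(u) = 0.7.
layer1 = ({"a": 0.5, "u": 0.1}, {"u": {"a": 0.1}})
layer2 = ({"a": 0.5, "u": 0.7}, {"u": {"a": 0.1}})
lam = {"a": [0.5, 0.5], "u": [0.5, 0.5]}
theta, in_edges = lossy_couple([layer1, layer2], lam)
print(lt_active(theta, in_edges, {"a"}, 1))  # u stays inactive: w(a, u) = 0.1 < theta(u) = 0.4
```

In the original system, u would be active after one hop because the layer 1 condition holds; the coupled network loses that activation. This is the information loss that lossy coupling accepts: activations in the coupled network carry over to the original system, but not necessarily the other way round.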

Lossy Coupling

[Figure: worked example computing the threshold of the blue node and the weight of the edge between the red node and the blue node]

Lossy Coupling

INVOLVEMENT

• If a user is surrounded by a group of friends who have a high influence on each other, the user tends to be influenced.

• We estimate the involvement of a node in a network by measuring how strongly its 1-hop neighborhood is connected and to what extent influence can propagate from one node to another within that 1-hop neighborhood.

Lossy Coupling• Formally, of a node in a network is defined as where

AVERAGE

• All parameters have the same value, i.e., $\lambda_1(u) = \lambda_2(u) = \cdots = \lambda_k(u)$.

Lossy Coupling

THEOREM 3

• When a lossy coupling scheme is used, if a set of users S activates a β fraction of users in G (the lossy coupled network), then it activates at least a β fraction of users in $G_{1\ldots k}$ (the original system).

• The proof is based on the fact that the active state of a user in G implies the active state of that user in $G_{1\ldots k}$.

Influence Relay

MOTIVATION

• When information diffuses in multiplex networks, it may flow within a single network or travel across networks.

• What is the contribution of each component network in the influence process?

• How much information flows within a network or between networks?

• Quantifying these values will help us understand the diffusion process in multiplex networks.

Influence Relay

DEFINITION

• The authors propose influence relay (IR) as a metric to quantify the role of users in propagating information.

• The IR of vertices is defined recursively according to their order of activation.

• S = seed set, G = coupled network, d = number of hops after which the activation process stops, t(u) = hop at which u is activated.

• All inactive vertices in G have an IR of 0.

Influence Relay

• For each activated vertex u, the IR of u, denoted by IR(u), is a linear combination of the IRs of its outgoing neighbors that are activated after u.

• Formally, the IR of vertices is defined as:

$$IR(u) = 1 + \sum_{v \in N^+(u),\; t(v) > t(u)} \frac{w(u,v)}{\sum_{x \in N^-(v),\; t(x) < t(v)} w(x,v)}\, IR(v)$$

Influence Relay

• IR captures the amount of influence a vertex relays to other vertices after adopting the information.

• Thus, the IR of a vertex u depends largely on the IR of the vertices that u helps to activate and on the weights of the edges between u and them.

• For each such vertex v, u is responsible for the fraction $w(u,v) / \sum_{x} w(x,v)$ of v's IR, where x ranges over the in-neighbors of v that were activated before v.

• We add 1 to the IR of u since u also contributes itself to the set of activated vertices.

Influence Relay

COMPUTING INFLUENCE RELAY

• We compute the IR of vertices in reverse order of the diffusion process.

• We construct the influence graph I from the seed set S to represent the diffusion process and to calculate the IR of all nodes in G.

• The vertex set of I is the set of activated vertices.

• There is an edge from u to v in I if u has passed information to v, i.e., (u, v) ∈ E and t(u) < t(v), where t(u) is the hop at which u is activated.

• I is a directed acyclic graph, so a reverse topological ordering of I takes linear time to compute. The main loop visits each edge of I once, so the IR of all vertices can be computed in linear time.

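Input: A network G = (V, E, w), a seed set S and the number of hops d.

Output: The influence relay of all vertices.

    I ← the influence graph caused by S on G
    for each v ∈ V do
        IR(v) ← 0
    end for
    Compute the topological ordering v_1, v_2, ..., v_n of the vertices in I
    for j = n down to 1 do
        IR(v_j) ← IR(v_j) + 1
        total ← 0
        for each in-neighbor u of v_j in I do
            total ← total + w(u, v_j)
        end for
        for each in-neighbor u of v_j in I do
            IR(u) ← IR(u) + (w(u, v_j) / total) · IR(v_j)
        end for
    end for
    return IR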
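A runnable version of the procedure above, sketched under assumptions: the network is a deterministic LT network given as hypothetical dictionaries (thresholds, and for each vertex v its in-neighbors u with weights w(u, v)); activation hops are computed by a helper named `activation_hops`, and sorting activated vertices by decreasing activation hop is used as the reverse topological order, which is valid because every influence-graph edge (u, v) has t(u) < t(v). The function names are illustrative, not from the paper.

```python
from typing import Dict, Set

InEdges = Dict[str, Dict[str, float]]  # v -> {u: w(u, v)}


def activation_hops(theta: Dict[str, float], in_edges: InEdges,
                    seeds: Set[str], d: int) -> Dict[str, int]:
    """Run d hops of the LT process and record t(u), the hop at which each vertex activates."""
    hop = {s: 0 for s in seeds}
    for t in range(1, d + 1):
        new = [
            v for v, th in theta.items()
            if v not in hop
            and sum(w for u, w in in_edges.get(v, {}).items() if u in hop) >= th
        ]
        if not new:
            break
        for v in new:
            hop[v] = t
    return hop


def influence_relay(theta: Dict[str, float], in_edges: InEdges,
                    seeds: Set[str], d: int) -> Dict[str, float]:
    """Influence relay of every vertex; inactive vertices keep IR = 0."""
    hop = activation_hops(theta, in_edges, seeds, d)
    ir = {v: 0.0 for v in set(theta) | set(seeds)}
    # Decreasing activation hop is a reverse topological order of the influence graph.
    for v in sorted(hop, key=lambda x: hop[x], reverse=True):
        ir[v] += 1.0  # v contributes itself to the activated set
        # In-neighbors of v in the influence graph: activated strictly before v.
        parents = {u: w for u, w in in_edges.get(v, {}).items()
                   if u in hop and hop[u] < hop[v]}
        total = sum(parents.values())
        if total > 0:
            for u, w in parents.items():
                ir[u] += w / total * ir[v]  # u's share of IR(v)
    return ir


theta = {"a": 0.5, "b": 0.5, "c": 0.5}
in_edges = {"b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}}
ir = influence_relay(theta, in_edges, {"a"}, 2)
print(ir["a"])  # 3.0: seed a accounts for all 3 activated vertices
```

Here c is activated at hop 2 by a and b together, so each receives half of IR(c) = 1; b then passes its full IR of 1.5 to a, giving IR(a) = 3.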

Influence Relay

THEOREM 4

• One of the important properties of IR is that it preserves the number of activated vertices.

• The total IR of the seed vertices equals the total number of activated vertices:

$$\sum_{s \in S} IR(s) = |A_d(G, S)|$$
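A short calculation shows why this holds. It is a sketch under stated assumptions: A = A_d(G, S) is the set of activated vertices, I is the influence graph, IR(u) = 1 + Σ f(u, v) IR(v) over the out-neighbors v of u in I, and every non-seed activated vertex has at least one in-neighbor in I with positive weight (true whenever thresholds are positive). Seeds have no in-neighbors in I because nothing is activated before hop 0.

```latex
\begin{aligned}
f(u,v) &= \frac{w(u,v)}{\sum_{x:(x,v)\in I} w(x,v)}, \qquad
\sum_{u:(u,v)\in I} f(u,v) = 1 \quad (v \in A \setminus S) \\
\sum_{u \in A} IR(u) &= |A| + \sum_{u \in A} \sum_{v:(u,v)\in I} f(u,v)\, IR(v)
  = |A| + \sum_{v \in A \setminus S} IR(v) \\
\Longrightarrow \quad \sum_{s \in S} IR(s) &= |A| = |A_d(G, S)|
\end{aligned}
```

Each activated vertex injects one unit and passes its whole IR back to the vertices that activated it, so no mass is created or lost on the way to the seeds.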

Influence Relay

INFLUENCE CONTRIBUTION

• To obtain the contribution of a network to the diffusion process, we sum up the IR of all seed vertices in that network.
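Aggregating IR per network is a simple grouped sum. A minimal sketch, assuming each seed vertex is tagged with the network it belongs to through a hypothetical `network_of` mapping; the IR values below are made up for illustration, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable


def network_contribution(ir: Dict[str, float], seeds: Iterable[str],
                         network_of: Dict[str, str]) -> Dict[str, float]:
    """Sum the influence relay of the seed vertices belonging to each network."""
    contribution: Dict[str, float] = defaultdict(float)
    for s in seeds:
        contribution[network_of[s]] += ir[s]
    return dict(contribution)


print(network_contribution({"a": 3.0, "x": 2.0}, ["a", "x"], {"a": "FSQ", "x": "Twitter"}))
# {'FSQ': 3.0, 'Twitter': 2.0}
```

Since the seeds' IR values add up to the number of activated vertices, the per-network totals can also be read as shares of the overall spread.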

INTERNAL AND EXTERNAL INFLUENCE

• This can be used to quantify the amount of information flowing within and between networks.

Influence Relay

• When information propagates within a component network, called the "target" network, there are two kinds of influence paths:
  • Internal paths include only edges of the target network.
  • External paths include some edges of other networks. They are formed when some of the vertices are activated outside the target network.

• We adapt influence relay to measure the internal influence (passing through internal paths) and the external influence (passing through external paths) of the seed set in the target network as follows:

Influence Relay

• Each vertex u has an internal influence and an external influence.

• Both values are calculated backwards from the activated vertices under u's influence.

• Only an activated vertex in the target network receives 1 more unit of internal influence, since we only consider influence propagation in the target network.

• If a vertex is activated outside the target network, all of its internal influence is converted to external influence.

EXPERIMENTS

Data Sets

Types of data sets:

• Real Networks

• Synthesized Networks

Real Networks

• Experiments were performed on 2 data sets:
  • Foursquare (FSQ) and Twitter networks
  • Co-author networks in the areas of Condensed Matter (CM), High-Energy Theory (Het), and Network Science (NetS)

• The number of overlapping users in the first dataset, FSQ-Twitter, is 4,100.

• For the second dataset, the numbers of overlapping users of the network pairs CM-Het, CM-NetS, and Het-NetS are 2,860, 517, and 90, respectively.

Real Networks

• Weights of edges are randomly assigned from 0 to 1.

• The edge weights are then normalized so that the total weight of the incoming edges of each node is 1.

• The threshold of each node is a random value from 0 to 1.
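The weight and threshold setup above can be sketched as follows. The slides only say "random from 0 to 1", so a uniform draw via Python's `random.random()` (which returns values in [0, 1)) is an assumption here, as are the edge-list input format and the function name.

```python
import random
from typing import Dict, List, Tuple


def assign_weights_and_thresholds(edges: List[Tuple[str, str]], rng: random.Random):
    """Random edge weights normalized so each node's incoming weights sum to 1,
    plus a random threshold for every node."""
    in_edges: Dict[str, Dict[str, float]] = {}
    for u, v in edges:
        in_edges.setdefault(v, {})[u] = rng.random()  # raw weight of edge (u, v)
    for row in in_edges.values():
        total = sum(row.values())
        if total > 0:
            for u in row:
                row[u] /= total  # incoming weights of this node now sum to 1
    nodes = sorted({x for e in edges for x in e})
    theta = {x: rng.random() for x in nodes}
    return in_edges, theta


in_edges, theta = assign_weights_and_thresholds([("a", "c"), ("b", "c"), ("c", "a")], random.Random(1))
print(sum(in_edges["c"].values()))  # 1.0 up to floating-point rounding
```

Normalizing incoming weights to 1 means that a node whose in-neighbors are all active always receives total influence 1, so any threshold drawn from [0, 1) can be reached.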

Synthesized Networks

• Synthesized networks generated by the Erdős-Rényi random network model are used to test networks with controlled parameters.

• Two networks with 10,000 nodes are formed by randomly connecting each pair of nodes with probabilities 0.0008 and 0.006.

• The average degrees of the two networks are 8 and 60, respectively.
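The stated average degrees follow from the Erdős-Rényi parameters, since the expected degree in G(n, p) is (n - 1)p. A minimal sketch of the generator and the degree check; the function names are hypothetical, and the naive pair loop is O(n²), so for 10,000 nodes a library generator such as NetworkX's `fast_gnp_random_graph` is the more practical choice.

```python
import random
from typing import List, Tuple


def erdos_renyi(n: int, p: float, rng: random.Random) -> List[Tuple[int, int]]:
    """Undirected G(n, p): each of the n(n - 1)/2 node pairs is linked independently with probability p.

    The plain double loop is O(n^2), which is fine for small illustrations only.
    """
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def average_degree(n: int, edges: List[Tuple[int, int]]) -> float:
    """Each undirected edge contributes to the degree of both endpoints."""
    return 2 * len(edges) / n


# Expected average degree of G(n, p) is (n - 1) * p.
for p in (0.0008, 0.006):
    print(p, (10000 - 1) * p)  # about 8 and about 60
```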

Comparison of coupling schemes: Solution Quality

• In both networks, the seed size is smallest when the lossless coupling scheme is used.

• The seed sizes are only slightly larger with the lossy coupling schemes.

Comparison of coupling schemes

• A small seed size can be obtained through two different means:
  • Increasing the fraction of overlapping users.
  • Increasing the number of propagation hops.

Comparison of coupling schemes: Running Time

• The greedy algorithm runs much faster on the lossy coupled networks than on the lossless coupled networks.

• Compared with the lossless coupled networks, the lossy coupled networks reduce the running time by a factor of 2 on FSQ-Twitter and a factor of 4 on the co-author networks.

• The major disadvantages of the lossless coupling scheme are the doubled number of hops and the large number of extra nodes and edges.

Advantages of using coupled networks

Influencing a fraction β of the nodes in all networks:

• The lossless coupling method outperforms running the greedy algorithm on each network separately and taking the union of the produced seed sets.

• With the union approach, the seed set is 30% larger in the co-author networks and 47% larger in FSQ-Twitter than with the lossless coupling method.

Influencing a fraction β of the nodes in a particular network:

• The seed size decreases by up to 9%, 25%, 17%, and 26% in CM, Het, FSQ, and Twitter, respectively, when these networks are considered in connection with other networks.

• The external influence is substantial and accounts for a large portion of the total in many cases. For instance, when the influenced fraction β = 0.2, the external influence accounts for 27.3%, 52.7%, and 30.0% of the total influence in CM, Het, and NetS, respectively.

Analysis of seed sets

• A significant fraction of the seed set consists of overlapping nodes, although only 5%-7% of the users of any network are overlapping users.

• For β = 0.4, the fraction of overlapping seed vertices is around 24.9% and 25% in the co-author and FSQ-Twitter networks, respectively.

• When β is small, overlapping users have a high influence contribution (approx. 50% when β = 0.2). However, when β is large, the overlapping users have already been selected, so they are no longer favored.

Mutual Impact of networks

• When k increases from 2 to 5, the seed size decreases severalfold. This implies that the introduction of a new OSN increases the diffusion of information significantly.

• The number of influenced vertices rises by 46% with the support of 3 new networks when k is changed from 2 to 5.

• The fraction of external influence also increases dramatically, from 39% when k = 2 to 67% when k = 5.

• All these results suggest that the existing networks may benefit from the newly introduced competitor.

Conclusion and Future Work

• To tackle the LCI problem, novel coupling schemes are introduced to reduce the problem to a version on a single network.

• A new metric is designed to quantify the flow of influence inside and between networks based on the coupled network.

• Exhaustive experiments provide new insights into information diffusion in multiplex networks.

• In the future, the LCI problem can be investigated in multiplex networks with heterogeneous diffusion models, in which each network may have its own diffusion model.

Thank you!!
