Mining Social and Information...

104
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks Mining Social and Information Networks Fragkiskos D. Malliaros ´ Ecole Polytechnique, France UC San Diego E-mail: [email protected] UCSD Artificial Intelligence Seminar October 3, 2016 1/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Transcript of Mining Social and Information...

Page 1: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Mining Social and Information Networks

Fragkiskos D. Malliaros

Ecole Polytechnique, France→ UC San Diego

E-mail: [email protected]

UCSD Artificial Intelligence Seminar

October 3, 2016

1/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 2: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Networks

Networks allow to model relationshipsbetween entities

General-purpose languagefor describing real-world systems

2/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 3: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Online Social Networks Source: https://www.facebook.com/zuck

Page 4: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Collaboration networks (Co-authorship)

Page 5: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Weblogs network (Political blogs)

Page 6: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Term co-occurrence network (David Copperfield novel by

Charles Dickens)

Page 7: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation: Analytics in the Network Science Era

Overview of my work

Models, tools and observations for problems in the area of mining real-world networks

We build upon computationally efficient graph mining methods to:

1. Design models for analyzing the structure and dynamics of real-worldnetworks

� Unravel properties that can further be used in practical applications

2. Develop algorithmic tools for large-scale analytics on data with:

� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)

7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 8: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation: Analytics in the Network Science Era

Overview of my work

Models, tools and observations for problems in the area of mining real-world networks

We build upon computationally efficient graph mining methods to:

1. Design models for analyzing the structure and dynamics of real-worldnetworks

� Unravel properties that can further be used in practical applications

2. Develop algorithmic tools for large-scale analytics on data with:

� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)

7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 9: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation: Analytics in the Network Science Era

Overview of my work

Models, tools and observations for problems in the area of mining real-world networks

We build upon computationally efficient graph mining methods to:

1. Design models for analyzing the structure and dynamics of real-worldnetworks

� Unravel properties that can further be used in practical applications

2. Develop algorithmic tools for large-scale analytics on data with:

� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)

7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 10: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation: Analytics in the Network Science Era

Overview of my work

Models, tools and observations for problems in the area of mining real-world networks

We build upon computationally efficient graph mining methods to:

1. Design models for analyzing the structure and dynamics of real-worldnetworks

� Unravel properties that can further be used in practical applications

2. Develop algorithmic tools for large-scale analytics on data with:

� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)

7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 11: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Overview of my Work (1/2)

� Engagement dynamics and vulnerability assessment in socialnetworks

� [w/ Vazirgiannis, CIKM ’13; EPL ’15]

� Graph clustering and community detection� [w/ Vazirgiannis, Phys. Rep. ’13]� [w/ Giatsidis, Thilikos, Vazirgiannis, AAAI ’14]� [w/ Mitri, Vazirgiannis, Manuscript ’16]

� Anonymity models for real networks� [w/ Casas-Roma, Vazirgiannis, Manuscript ’16]

8/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 12: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Overview of my Work (2/2)

� Spectral robustness estimation in real networks [This talk]� [w/ Megalooikonomou, Faloutsos, SDM ’12; KAIS ’15]

� Identification of influential spreaders (nodes) in complexnetworks [This talk]

� [w/ Rossi, Vazirgiannis, WWW ’15; Scientific Reports ’16]

� Graph-based text analytics (text categorization and keywordextraction) [This talk]

� [w/ Skianis, Vazirgiannis, ASONAM ’15; Manuscript ’16]� [w/ Tixier, Vazirgiannis, EMNLP ’16]

Main tools: spectral graph theory, core decomposition

9/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 13: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Outline

1 Introduction and Research Overview

2 Robustness Estimation in Social Graphs

3 Locating Influential Spreaders in Social Networks

4 Graph-Based Text Analytics

5 Concluding Remarks

10/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 14: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Robustness Estimation in Social Graphs

Page 15: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Graph Robustness Estimation

Main question:

What can we say about the robustness of large real-world social graphs?

Modular network Non modular network

12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 16: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Graph Robustness Estimation

Main question:

What can we say about the robustness of large real-world social graphs?

Modular network Non modular network

12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 17: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Graph Robustness Estimation

Main question:

What can we say about the robustness of large real-world social graphs?

Modular network Non modular network

12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 18: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Graph Robustness Estimation

Main question:

What can we say about the robustness of large real-world social graphs?

Modular network Non modular network

12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 19: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Graph Robustness Estimation

Main question:

What can we say about the robustness of large real-world social graphs?

Modular network Non modular network

12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 20: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 21: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

S ⊂ V

13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 22: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

S ⊂ V

13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 23: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

S ⊂ V

13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 24: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

S ⊂ V

13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 25: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Expansion Properties

� Quick estimation of the robustness of agraph by examining the expansionproperties

� Graph with good expansion properties:simultaneously sparse and highlyconnected

� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with

|S| ≤ |V |2

is|N(S)||S|

Expansion factor

h(G) = min{S:|S|≤ |N|2

} |N(S)||S|

14/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 26: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Why the Expansion Properties are Important?

� They can act as a natural measure of the graph’s robustness� Bottlenecks edges inside the network

� Good expansion properties imply high robustness� Any subset S ⊂ V will have a relatively large neighborhood� Poor modular structure

� Poor expansibility reflects absence of robustness� Modular structure→ Better community structure

15/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 27: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Measuring Expansion Properties

� Computing the expansion factor is a computational difficultproblem

� Approximation techniques?

� Use the spectrum of the adjacency matrix A

� The expansion properties of a graph can be approximated by thespectral gap ∆λ = λ1 − λ2

Alon - Milman inequality

∆λ

2≤ h(G) ≤

√2λ1∆λ

16/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 28: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

First Approach in Estimating the Robustness

� Large spectral gap ∆λ implies good expansion properties→high robustness

� If λ2 is close enough to λ1, the spectral gap will be small→ lowrobustness

Spectral gap based approach

1. Compute the spectral gap2. If this is large, the graph will show high robustness properties

Problem

How large the spectral gap should be for a graph to have goodexpansibility?

17/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 29: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

First Approach in Estimating the Robustness

� Large spectral gap ∆λ implies good expansion properties→high robustness

� If λ2 is close enough to λ1, the spectral gap will be small→ lowrobustness

Spectral gap based approach

1. Compute the spectral gap2. If this is large, the graph will show high robustness properties

Problem

How large the spectral gap should be for a graph to have goodexpansibility?

17/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 30: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Spectral Gap + Subgraph Centrality

� Combine the spectral gap with the subgraph centrality

� Subgraph centrality:

� Node centrality measure

� It is based on the number of closed walks that a node participates:

SC(i) =∞∑`=0

A`ii`!, ∀i ∈ V

� Using techniques from spectral graph theory

SC(i) =

|V |∑j=1

u2ij sinh(λj), ∀i ∈ V

18/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 31: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Characterizing the Robustness : the Idea

SC(i) =∑|V |

j=1 u2ij sinh(λj), ∀i ∈ V

� Good expansion properties→ High robustness→ λ1 � λ2→u2

i1 sinh(λ1)�∑|V |

j=2 u2ij sinh(λj)

� For a graph with high robustness

SC(i) ≈ u2i1 sinh(λ1)

� This implies a power-law relationship

ui1 ∝ sinh−1/2(λ1) SC(i)1/2

� Deviations from this behavior imply existence (or lack thereof) ofhigh robustness

19/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 32: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Observations and Shortcoming

1. Scalability issues� It requires the computation of all the pairs (λi , ui ), ∀i ∈ V of the

adjacency matrix A

� Computational bottleneck for large graphs with millions of nodesand edges

2. It cannot be applied directly to bipartite graphs

20/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 33: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Proposed Metric: Generalized Robustness Index rkBasic Idea

Q: Can we efficiently approximate the SC(i), ∀i ∈ V?

1. The eigenvalues of A follow a power-lawdistribution [Faloutsos et al., ’99]

2. Almost symmetric eigenvalues around zero

3. sinh(·): odd function

� sinh(−x) = − sinh(x)0 1000 2000 3000 4000 5000 6000 7000

−100

−50

0

50

100

150

RankE

ige

nva

lue

Figure : Skewed spectrum (WIKI-VOTE)

21/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 34: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Normalized Subgraph Centrality NSCk

SC(i) =∑|V|

j=1 u2ij sinh(λj ), ∀i ∈ V

� Use only the first top k eigenvalues and their correspondingeigenvectors

� Due to the sinh(·) function, the contribution of each eigenvaluewill be roughly canceled out by its (almost) symmetric eigenvalue

� The contribution of most of the eigenvalues is negligiblecompared with that of the first few eigenvalues

Normalized subgraph centrality

NSCk (i) =∑k

j=1 u2ij sinh(λj), ∀i ∈ V

� Generally, k � |V | for real-world graphs

22/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 35: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Normalized Subgraph Centrality NSCk

SC(i) =∑|V|

j=1 u2ij sinh(λj ), ∀i ∈ V

� Use only the first top k eigenvalues and their correspondingeigenvectors

� Due to the sinh(·) function, the contribution of each eigenvaluewill be roughly canceled out by its (almost) symmetric eigenvalue

� The contribution of most of the eigenvalues is negligiblecompared with that of the first few eigenvalues

Normalized subgraph centrality

NSCk (i) =∑k

j=1 u2ij sinh(λj), ∀i ∈ V

� Generally, k � |V | for real-world graphs

22/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 36: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Generalized Robustness Index rk

ui1 ∝ sinh−1/2(λ1) SC(i)1/2

Generalized robustness index rk

rk =

(1|V |

∑|V |i=1

{log(ui1)−

(log(sinh−1/2(λ1)) +

12

log(NSCk (i)))}2

)1/2

� Small rk value→ High robustness� For a robust enough graph, mostly (λ1, u1) will contribute to NSCk

� Parameter k� Trade-off between accuracy and required time

� It can be computed very easily in any programming environment

23/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 37: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

An Illustrative Example: Random vs. Real Graphs

103

104

105

106

10−2

10−1

100

Normalized Subgraph Centrality

Prin

cip

al E

ige

nve

cto

r

r = 3.5027e−05

(a) ER random graph (b) Discrepancy plot

10−2

100

102

104

10−8

10−6

10−4

10−2

100

Normalized Subgraph Centrality

Princip

al E

igenvecto

r

r = 2.4921

A..−L.B.

G.G.

(c) Network science graph (d) A.-L. Barabasi’s egonet (e) Discrepancy plot

24/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 38: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

DatasetsBasic characteristics of real-world networks

Graph # Nodes # Edges

EPINIONS 75, 877 405, 739EMAIL-EUALL 224, 832 340, 795SLASHDOT 77, 360 546, 487WIKI-VOTE 7, 066 100, 736FACEBOOK 63, 392 816, 886YOUTUBE 1, 134, 890 2, 987, 624CA-ASTRO-PH 17, 903 197, 031CA-GR-QC 4, 158 13, 428CA-HEP-TH 8, 638 24, 827DBLP 404, 892 1, 422, 263CIT-HEP-TH 26, 084 334, 091

25/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 39: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Efficiency of rk Index

Scalability

0 5 10 15

x 105

0

10

20

30

40

50

60

Number of Edges

Tim

e (

se

co

nd

s)

Observed Time

Linear Fit

� Several snapshots of theDBLP graph

� Set k = 30 eigenpairs

� The rk index scales linearlywith respect to the number ofedges

26/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 40: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Robustness of Large Static Graphs (1/2)Discrepancy Plots (Principal eigenvector vs. NSC in log-log scales)

EPINIONS

EMAIL-EUALL

SLASHDOT

WIKI-VOTE

FACEBOOK

YOUTUBE

27/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 41: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Robustness of Large Static Graphs (2/2)Discrepancy Plots (Principal eigenvector vs. NSC in log-log scales)

CA-ASTRO-PH

DBLP-1980

CA-HEP-TH

DBLP-2006

CA-GR-QC

CIT-HEP-TH

28/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 42: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Observations: Robustness of Large Social Graphs

� Almost all graphs show good expansion properties→ Highrobustness

� rk values tend to zero

� Not a clear modular structure

� Lack of well-defined clusters that can be easily separated fromthe whole graph

Observation: High Robustness Pattern

Large real-world social graphs exhibit good expansion propertiesand thus high robustness

29/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 43: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Are the Observations Expected?

Community structure property

Social networks

� Social Networks ≡ Community Structure� Bad expansion properties→ Low robustness� Previously observed on small scale social graphs

30/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 44: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Possible Explanations

DBLP-1980: |V | = 5K , |E| = 9K DBLP-2006: |V | = 405K , |E| = 1, 5M

� Structural differences between different scale networks

31/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 45: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution (1/2)

� The graphs evolve over time

� Q: How the robustness of a graph changes over time?

� Study the fragility evolution of a graph

1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index

� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph

snapshots per month)

32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 46: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution (1/2)

� The graphs evolve over time

� Q: How the robustness of a graph changes over time?

� Study the fragility evolution of a graph

1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index

� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph

snapshots per month)

32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 47: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution (1/2)

� The graphs evolve over time

� Q: How the robustness of a graph changes over time?

� Study the fragility evolution of a graph

1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index

� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph

snapshots per month)

32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 48: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution (2/2)DBLP: rk over time

0 10 20 30 40 500

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Time [Years]

Ro

bu

stn

ess I

nd

ex r

t=16 (1975)

CIT-HEP-TH: rk over time

0 20 40 60 80 100 120 1400

0.5

1

1.5

2

2.5

3

Time [Months]

Ro

bu

stn

ess I

nd

ex r

t = 17 (June ’94)

33/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 49: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution (2/2)DBLP: rk over time

0 10 20 30 40 500

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Time [Years]

Ro

bu

stn

ess I

nd

ex r

t=16 (1975)

DBLP: Diameter over time

0 10 20 30 40 502

4

6

8

10

12

14

Time [Years]

Dia

me

ter

t=16 (1975)

CIT-HEP-TH: rk over time

0 20 40 60 80 100 120 1400

0.5

1

1.5

2

2.5

3

Time [Months]

Ro

bu

stn

ess I

nd

ex r

t = 17 (June ’94)

CIT-HEP-TH: Diameter over time

0 20 40 60 80 100 120 1400

2

4

6

8

10

12

14

Time [Months]

Dia

me

ter

t = 17 (June ’94)

34/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 50: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution Pattern

Observation: Fragility evolution pattern

The spike of the robustness is aligned with the gelling point

General Observation:� At the first time points rk ↗ gradually→ Low robustness→ Good

community structure

� After a specific time point, rk starts↘ gradually→ The graphs tend tobecome more robust

� The time point that rk changes, corresponds to the gelling point[McGlohon et al., KDD ’08]

Explanation for the structural differences (regarding robust-ness and community structure) between different scale graphs

35/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 51: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Time Evolving Graphs: Fragility Evolution Pattern

Observation: Fragility evolution pattern

The spike of the robustness is aligned with the gelling point

General Observation:� At the first time points rk ↗ gradually→ Low robustness→ Good

community structure

� After a specific time point, rk starts↘ gradually→ The graphs tend tobecome more robust

� The time point that rk changes, corresponds to the gelling point[McGlohon et al., KDD ’08]

Explanation for the structural differences (regarding robust-ness and community structure) between different scale graphs

35/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 52: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Anomaly Detection (1/2)

Question

Can we spot anomalies over time using the rk index along with thefragility evolution pattern?

1. Examine the rk index over time, trying to identify and track abruptchanges and deviations

2. Sudden deviations from the fragility evolution pattern can possiblycorrespond to anomalies

3. The specific time snapshots (graphs) can be tagged as outliers

36/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 53: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Anomaly Detection (2/2)Fragility Evolution of DBLP (lin-log)

0 10 20 30 40 5010

−15

10−10

10−5

100

105

Time [Years]

Robustn

ess Index r

2002 and 2003

� Strange behavior of the rk indexfor 2002 and 2003

Q: Are these two time graphs reallyoutliers?

A: Yes!

� Explanation1:� After 2001 a large number of new publications were introduced→

Robustness↗→ rk ↘� After 2002-2003 new research fields are covered from DBLP→

New fields formed new communities→ rk ↗

1Personal communication with Michael Ley and Florian Reitz of the DBLP server37/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 54: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Conclusions

� Fast robustness estimation for large scale social networks

� Efficient and effective robustness index� Take advantage of the special spectral properties

� Patterns of large scale static and time evolving social graphs

� Anomaly detection combining the rk index with the observed patterns

� Robustness properties of network generation models

� Ongoing work: Given a budget of k edges, how to improve therobustness of the graph?

38/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 55: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Locating Influential Spreaders

in Social Networks

Page 56: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Influential SpreadersMotivation

� Spreading processes in complex networks� Spread of news and ideas� Diffusion of influence� Disease propagation� Viral marketing (word-of-mouth effect)� ...

� Identification of influential spreaders (goal of this work)� Able to diffuse information to a large part of the network

� Understand and control spreading dynamics� E.g, vaccinate individuals with good spreading properties in epidemic

control

40/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 57: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Influential SpreadersMotivation

� Spreading processes in complex networks� Spread of news and ideas� Diffusion of influence� Disease propagation� Viral marketing (word-of-mouth effect)� ...

� Identification of influential spreaders (goal of this work)� Able to diffuse information to a large part of the network

� Understand and control spreading dynamics� E.g, vaccinate individuals with good spreading properties in epidemic

control

40/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 58: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Influential Spreaders: the Process

Typically, a two-step approach:

1. Consider a topological or centrality criterion of the nodes

� Rank the nodes accordingly

� The top-ranked nodes are candidates for the most influential ones

2. Simulate the spreading process over the network to examine theperformance of the chosen nodes

[Pei and Makse, ’13]

41/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 59: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Single Influential SpreadersRelated work

� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades

� However, degree is a local criterion� Bad case: star subgraph

� Core-periphery structure of real-world networks

Equal degree (8)

42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 60: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Single Influential SpreadersRelated work

� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades

� However, degree is a local criterion� Bad case: star subgraph

� Core-periphery structure of real-world networks

Equal degree (8)

42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 61: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Identification of Single Influential SpreadersRelated work

� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades

� However, degree is a local criterion� Bad case: star subgraph

� Core-periphery structure of real-world networks

Equal degree (8)

42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 62: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Degeneracy and k -core Decomposition

Example

3-core

2-core

1-core

Core number ci = 1

Core number ci = 2

Core number ci = 3

Graph Degeneracy δ∗(G) = 3

G0 = G

G1 = 1-core of G

G2 = 2-core of G

G3 = 3-core of G

G0 ⊇ G1 ⊇ G2 ⊇ G3

Detection of dense and cohesive subgraphs

43/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 63: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

The k -core Decomposition Finds Good Spreaders

Degree 96 Core number 63

Degree 96 Core number 26

Node A Node B

� The core number is better spreadingpredictor compared to the degree

� [Kitsak et al., Nature Phys. ’10]

44/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 64: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

The k -core Decomposition Finds Good Spreaders

Degree 96 Core number 63

Degree 96 Core number 26

Node A Node B

� The core number is better spreadingpredictor compared to the degree

� [Kitsak et al., Nature Phys. ’10]

44/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 65: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation and Contributions

� The k -core decomposition often returns a relatively large number ofcandidate influential spreaders

� Only a small fraction corresponds to highly influential nodes

� Can we further refine the set of the most influential spreaders?

Main contributions

� Propose the K -truss decomposition for locating influential nodes

� Triangle-based extension of the k -core decomposition

� Experimental evaluation

� Better spreading behavior� Faster and wider epidemic spreading

45/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 66: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Motivation and Contributions

� The k -core decomposition often returns a relatively large number ofcandidate influential spreaders

� Only a small fraction corresponds to highly influential nodes

� Can we further refine the set of the most influential spreaders?

Main contributions

� Propose the K -truss decomposition for locating influential nodes

� Triangle-based extension of the k -core decomposition

� Experimental evaluation

� Better spreading behavior� Faster and wider epidemic spreading

45/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 67: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

K -truss Decomposition (1/2)Definitions

K -truss subgraph TK [Cohen, ’08], [Wang and Cheng, ’12]

K -truss TK = (VTK , ETK ),K ≥ 2: the largest subgraph of G where every edge iscontained in at least K − 2 triangles within the subgraph

Maximal K -truss subgraph

� The K -truss subgraph defined for the maximum value Kmax of K

� The nodes of this subgraph define set T

46/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 68: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

K -truss Decomposition (2/2)Maximal k -core and K -truss subgraphs

Set

Set

6

CT

� Set C: nodes of maximal k -core

� 3-core subgraph

� Set T : nodes of maximal K -truss

� 4-truss subgraph

� The maximal k -core and K -truss subgraphs overlap� K -truss is a subgraph of k -core (core of the k -core)� Heuristic to improve execution time

� We argue that set T contains highly influential nodes

47/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 69: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

K -truss Decomposition (2/2)Maximal k -core and K -truss subgraphs

Set

Set

6

CT

� Set C: nodes of maximal k -core

� 3-core subgraph

� Set T : nodes of maximal K -truss

� 4-truss subgraph

� The maximal k -core and K -truss subgraphs overlap� K -truss is a subgraph of k -core (core of the k -core)� Heuristic to improve execution time

� We argue that set T contains highly influential nodes

47/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 70: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Experimental Set-up (1/2)Baseline methods

� truss method: nodes belonging to set T (Proposed method)

� core method: nodes belonging to set C − T

� top degree method: nodes with highest degree T� Choose |C| − |T | nodes for fair comparison

48/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 71: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Experimental Set-up (2/2)How to simulate the spreading process?

Susceptible-Infected-Recovered (SIR) model

1. Set candidate node as infected (I state)

2. An infected node can infect its susceptible neighbors with probability β

� Set β close to the epidemic threshold τ = 1λ1

[Chakrabarti et al., ’08]

3. An infected node can recover (stop being active) with probability γ

� Set γ = 0.8

4. Count the total number of infected individuals (avg. over multiple runs)

S I Rβ γ

1− β 1− γ

State diagram of the SIR model

[Barrat et al., ’08]

49/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 72: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Datasets and PropertiesCharacteristics of the K -truss subgraphs

Network Name Nodes Edges kmax Kmax |C| |T | τ

EMAIL-ENRON 33, 696 180, 811 43 22 275 45 0.00840EPINIONS 75, 877 405, 739 67 33 486 61 0.00540WIKI-VOTE 7, 066 100, 736 53 23 336 50 0.00720EMAIL-EUALL 224, 832 340, 795 37 20 292 62 0.00970SLASHDOT 82, 168 582, 533 55 36 134 96 0.00074WIKI-TALK 2, 388, 953 4, 656, 682 131 53 700 237 0.00870

� τ = 1/λ1: epidemic threshold of the graph (λ1: largest eigenvalue of A)

� Set T has significantly smaller size compared to set C

50/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 73: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Datasets and PropertiesCharacteristics of the K -truss subgraphs

Network Name Nodes Edges kmax Kmax |C| |T | τ

EMAIL-ENRON 33, 696 180, 811 43 22 275 45 0.00840EPINIONS 75, 877 405, 739 67 33 486 61 0.00540WIKI-VOTE 7, 066 100, 736 53 23 336 50 0.00720EMAIL-EUALL 224, 832 340, 795 37 20 292 62 0.00970SLASHDOT 82, 168 582, 533 55 36 134 96 0.00074WIKI-TALK 2, 388, 953 4, 656, 682 131 53 700 237 0.00870

� τ = 1/λ1: epidemic threshold of the graph (λ1: largest eigenvalue of A)

� Set T has significantly smaller size compared to set C

50/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 74: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Evaluation the Spreading Process

Apply SIR simulation starting from a single node v each time

� Number of infected nodes at each time step of the process

� Total number of infected nodes Mv

� The time step where the epidemic fades out

51/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 75: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Average Number of Infected Nodes

Time StepMethod 2 4 6 8 10 Final step σ Max step

EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37

top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36

EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43

top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47

WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57

top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62

EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38

top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39

� Performance of truss method

� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier

52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 76: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Average Number of Infected Nodes

Time StepMethod 2 4 6 8 10 Final step σ Max step

EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37

top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36

EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43

top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47

WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57

top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62

EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38

top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39

� Performance of truss method

� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier

52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 77: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Average Number of Infected Nodes

Time StepMethod 2 4 6 8 10 Final step σ Max step

EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37

top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36

EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43

top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47

WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57

top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62

EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38

top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39

� Performance of truss method

� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier

52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 78: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Average Number of Infected Nodes

Time StepMethod 2 4 6 8 10 Final step σ Max step

EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37

top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36

EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43

top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47

WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57

top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62

EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38

top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39

� Performance of truss method

� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier

52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 79: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Comparison to the Optimal Spreading (1/2)Methodology

OPT1 OPT2 OPT3 · · · OPT|V |

Window W

1. Rank nodes according to the total infection size Mv

� OPT1 ≥ OPT2 ≥ . . . ≥ OPT|V |, where OPT1 = arg maxv∈V Mv

2. Consider window W over the ranked nodes

PTW =|TW |/|T ||W |/|V |

� TW is the set of nodes v ∈ T located in W

53/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 80: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Comparison to the Optimal Spreading (1/2)Methodology

OPT1 OPT2 OPT3 · · · OPT|V |

Window W

1. Rank nodes according to the total infection size Mv

� OPT1 ≥ OPT2 ≥ . . . ≥ OPT|V |, where OPT1 = arg maxv∈V Mv

2. Consider window W over the ranked nodes

PTW =|TW |/|T ||W |/|V |

� TW is the set of nodes v ∈ T located in W

53/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 81: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Comparison to the Optimal Spreading (2/2)Results

EMAIL-ENRON

0 0.5 1 1.50

20

40

60

80

100

Window W (%)

PW

(%

)

C

T

WIKI-VOTE

0 5 100

20

40

60

80

100

Window W (%)

PW

(%

)

C

T

� PTW reaches the maximum value (i.e., 100%) relatively early and forsmall window sizes, compared to PCW

� The nodes detected by the K -truss decomposition are betterdistributed among the most efficient spreaders

54/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 82: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Comparison to the Optimal Spreading (2/2)Results

EMAIL-ENRON

0 0.5 1 1.50

20

40

60

80

100

Window W (%)

PW

(%

)

C

T

WIKI-VOTE

0 5 100

20

40

60

80

100

Window W (%)

PW

(%

)

C

T

� PTW reaches the maximum value (i.e., 100%) relatively early and forsmall window sizes, compared to PCW

� The nodes detected by the K -truss decomposition are betterdistributed among the most efficient spreaders

54/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 83: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Discussion

� Core decomposition methods can help towards identifying singleinfluential spreaders

� K -truss decomposition

� Faster and wider epidemic spreading

� Well distributed nodes among those that are achieving the optimalspreading

� Ongoing work: Identification of multiple influential spreaders

55/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 84: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

From Influential Spreaders to Influential TermsIn a network of words

Graph mining and core decomposition

Detection of influential users in social networks

Influential (important) entities in an interconnected space

Influential terms (i.e., words) for text analytics

Text as information network

with application to keyword selection and text categorization

56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 85: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

From Influential Spreaders to Influential TermsIn a network of words

Graph mining and core decomposition

Detection of influential users in social networks

Influential (important) entities in an interconnected space

Influential terms (i.e., words) for text analytics

Text as information network

with application to keyword selection and text categorization

56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 86: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

From Influential Spreaders to Influential TermsIn a network of words

Graph mining and core decomposition

Detection of influential users in social networks

Influential (important) entities in an interconnected space

Influential terms (i.e., words) for text analytics

Text as information network

with application to keyword selection and text categorization

56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 87: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

From Influential Spreaders to Influential TermsIn a network of words

Graph mining and core decomposition

Detection of influential users in social networks

Influential (important) entities in an interconnected space

Influential terms (i.e., words) for text analytics

Text as information network

with application to keyword selection and text categorization

56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 88: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Graph-Based Text Analytics

Page 89: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Text is Everywhere

� Rapid growth of social media and networking platforms� Vast amount of textual data� Analyze and extract useful information

� Text analytics (mining) tasks� Summarization (e.g., keyword extraction)� Categorization (e.g., sentiment analysis)� Information Retrieval� ...

58/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 90: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Text is Everywhere

� Rapid growth of social media and networking platforms� Vast amount of textual data� Analyze and extract useful information

� Text analytics (mining) tasks� Summarization (e.g., keyword extraction)� Categorization (e.g., sentiment analysis)� Information Retrieval� ...

58/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 91: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Term Weighting in the Bag-of-Words Model

� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}

� Feature extraction and term weighting in the Bag-of-words model

� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words

� Limitations� Term independence assumption of frequency-based weighting

schemes (e.g., TF, TF-IDF)� Order of terms is disregarded

59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 92: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Term Weighting in the Bag-of-Words Model

� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}

� Feature extraction and term weighting in the Bag-of-words model

� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words

� Limitations� Term independence assumption of frequency-based weighting

schemes (e.g., TF, TF-IDF)� Order of terms is disregarded

59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 93: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Term Weighting in the Bag-of-Words Model

� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}

� Feature extraction and term weighting in the Bag-of-words model

� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words

� Limitations� Term independence assumption of frequency-based weighting

schemes (e.g., TF, TF-IDF)� Order of terms is disregarded

59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 94: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

ExampleText to graph representation; The Graph-of-words model

Graph representation of a document (w = 3; undirected graph)

Data Science is the extraction of knowledge from large volumes of datathat are structured or unstructured which is a continuation of the field ofdata mining and predictive analytics, also known as knowledge discoveryand data mining.

data

scienc

extract

knowledg

larg

volum

structur

unstructur

continu

field

mine

predict

analyt

discoveri

known

60/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 95: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

ExampleText to graph representation; The Graph-of-words model

Graph representation of a document (w = 3; undirected graph)

Data Science is the extraction of knowledge from large volumes of datathat are structured or unstructured which is a continuation of the field ofdata mining and predictive analytics, also known as knowledge discoveryand data mining.

data

scienc

extract

knowledg

larg

volum

structur

unstructur

continu

field

mine

predict

analyt

discoveri

known

60/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 96: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Degeneracy-based Keyword Extraction

mathemat

aspect

computer−aid

share

trade

problem

statist

analysi

price

probabilist

characterist

seri

method

model

Edge weights

12345

Core numbers

891011

inf

shells

diffe

renc

e in

she

ll si

zes

−6

4

−2

0

2

(10 − 11) (9 − 10) (8 − 9)

0.38

0.42

0.46

0.50

dens

k−cores

dens

ity

11 10 9 8

● ● ●

● ●●

● ●●

● ●

2 4 6 8 10 12 14

7090

110

130TR

nodes

TR s

core

s

elbowtop 33%

Mathematical aspects of computer-aided share trading. We consider problems of statistical analysis of share prices and propose probabilistic characteristics to describe the price series. We discuss three methods of mathematical modelling of price series with given probabilistic characteristics.

Core numbers TR scoresP R F1 P R F1

MAIN 0.86 0.55 0.67 ELB 1 0.18 0.31INF 0.83 0.91 0.87

PER 1 0.45 0.63DENS 0.88 0.64 0.74

mathemat 11 price .1359

price 11 share .0948

probabilist 11 .0906

characterist 11 .0870

seri 11 .0860

method 11 mathemat .0812

model 11 analysi .0633

share 10 statist .0595

trade 9 method .0569

problem 9 problem .0560

statist 9 trade .0525

analysi 9 model .0493

aspect 8 computer-aid .0453

computer-aid 8 aspect .0417

MAIN

INF

DENS

ELB

probabilist

characterist

seri

CR scoresP R F1

ELB 0.90 0.82 0.86

PER 1 0.45 0.63

mathemat 128

price 120

analysi 119

share 118

probabilist 112

characterist 112

statist 108

trade 97

problem 97

seri 94

method 85

computer-aid 76

model 66

aspect 65

PER

ELB

PER

(a)

(c)

(b)

●●

● ●●

●●

● ●●

●●

2 4 6 8 10 12 14

0.04

0.08

0.12

CRelbowtop 33%

nodes

CR

sco

res

(a) Graph-of-words (W = 8) for document #1512 of Hulth2003 decomposed with k -core. (b) Keywords extracted by the different methods

(human assigned keywords in bold). (c) Selection criterion of each method.

61/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 97: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Concluding Remarks

Page 98: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Mining Large Scale Networks

Network analytics with

spectral techniques and core decomposition

1. Spectral robustness estimation of social graphs

2. Core decomposition based identification of influential spreaders

3. Degeneracy-based keyword extraction

63/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 99: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Ongoing and Future Work

� Community detection and dense subgraph detectionalgorithms

� Influence maximization in social networks

� Algorithms for rich network structures� Time, location, strength, trust, text� Probabilistic, signed and multilayer graphs

� Machine Learning tasks on graphs� Link prediction, node classification and graph classification� Node and graph embeddings based on deep learning

� Text analytics using graph-based techniques

64/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 100: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Publications I

F. D. Malliaros, V. Megalooikonomou, and C.Faloutsos

Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly DetectiionIn SIAM International Conference on Data Mining (SDM), 2012.

F. D. Malliaros and M. Vazirgiannis

Clustering and Community Detection in Directed Networks: A Survey.Physics Reports, 533(4), Elsevier, 2013.

F. D. Malliaros and M. Vazirgiannis

To Stay or Not to Stay: Modeling Engagement Dynamics in Social Graphs.In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

C. Giatsidis, F. D. Malliaros, D. M. Thilikos, and M. Vazirgiannis

CoreCluster: A Degeneracy Based Graph Clustering Framework.In AAAI Conference on Artificial Intelligence, (AAAI), 2014.

M.-E. G. Rossi, F. D. Malliaros, and M. Vazirgiannis

Spread It Good, Spread It Fast: Identification of Influential Nodes in Social Networks.In International Conference on World Wide Web Companion (WWW), 2015.

F. D. Malliaros and M. Vazirgiannis

Vulnerability Assessment in Social Networks under Cascade-based Node Departures.EPL (Europhysics Letters), 11(6), 2015.

F. D. Malliaros and Michalis Vazirgiannis

Disengagement Social Contagion: Assessing Network Vulnerability under Node Departures.In International Conference on Computational Social Science (ICCSS), 2015.

65/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 101: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Publications II

F. D. Malliaros, V. Megalooikonomou, and C.Faloutsos

Estimating Robustness in Large Social Graphs.Knowledge and Information Systems (KAIS), Springer, 2015.

F. D. Malliaros and K. Skianis

Graph-Based Term Weighting for Text Categorization.In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Social Media and RiskWorkshop (SoMeRis), 2015.

F. D. Malliaros, M. Rossi, and M. Vazirgiannis

Locating Influential Nodes in Complex Networks.Scientific Reports, 6(19307), Nature Publishing Group, 2016.

A. J.-P. Tixier, F. D. Malliaros, and M. Vazirgiannis.

A Graph Degeneracy-based Approach to Keyword Extraction.In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.

F. D. Malliaros, K. Skianis, and M. Vazirgiannis

A Graph-Based Framework for Text Categorization: Term Weighting and Selection as Ranking of Interconnected Features.Submitted Manuscript, 2016.

J. Casas-Roma, F. D. Malliaros, and M. Vazirgiannis

k -Degree Anonymity on Directed Networks.Manuscript, 2016.

M. Mitri, F. D. Malliaros, and M. Vazirgiannis

Sensitivity of Community Structure to Network Uncertainty.Manuscript, 2016.

66/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 102: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

References I

V. Batagelj and M. Zaversnik.

AnO(m) Algorithm for Cores Decomposition of Networks.CoRR, 2003.

A. Barrat, M. Barthlemy, and A. Vespignani.Dynamical Processes on Complex Networks.Cambridge University Press, 2008.

D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and Christos Faloutsos.

Epidemic Thresholds in Real Networks.ACM Trans. Inf. Syst. Secur., 10(4), 2008.

J. Cohen

Trusses: Cohesive Subgraphs for Social Network Analysis.National Security Agency Technical Report, 2008.

D. Easley and J. KleinbergNetworks, Crowds, and Markets: Reasoning About a Highly Connected World.Cambridge University Press, 2010.

M. Faloutsos, P. Faloutsos, and C. Faloutsos

On Power-law Relationships of the Internet Topology.In SIGCOMM, 1999.

D. Kempe, J. Kleinberg, and E. Tardos

Maximizing the Spread of Influence Through a Social Network.In KDD, 2003.

M. Kitsak, L. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. Stanley, and H. Makse

Identification of Influential Spreaders in Complex Networks.Nature Physics, 6(11), 2010.

67/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 103: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

References II

S. Pei and H. A. Makse

Spreading Dynamics in Complex Networks.Journal of Statistical Mechanics: Theory and Experiment, 12, 2013.

S. B. Seidman

Network Structure and Minimum Degree.Social Networks 5, 1983.

J. Wang and J. Cheng

Truss Decomposition in Massive Networks.Proc. VLDB Endow., 5(9), 2012.

68/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks

Page 104: Mining Social and Information Networkscseweb.ucsd.edu/classes/fa16/cse259-a/slides/fragkiskos.pdf · Mining Social and Information Networks Fragkiskos D. Malliaros Ecole Polytechnique,

Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks

Thank You!

NETWORK AND DATA SCIENCE

Graph Mining

Social Networks Analysis

Spectral Graph Theory

Core Decomposition

Applied Machine Learning

Natural Language Processing

Engagementdynamics

Social vulnerabilityassessment

Graph-basedtext analytics

Privacy in realnetworks

Spectral robust-ness estimation

Communitydetection

Influentialspreaders

69/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks