Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World...

PCI13 Thessaloniki, 19 Sep 2013 Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris

description

Paper presentation in PCI 2013. Abstract:

Transcript of Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World...

Page 1: Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena

PCI13 Thessaloniki, 19 Sep 2013

Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena

Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris

Page 2

Problem

Online Social Networks (OSNs) are immense!

Page 3

Motivation

• Social Networks
– Used to be small (Grevy's zebra dataset)
– Easy to organize

• Online Social Networks (Twitter)
– Hold an immense amount of data
– Incredibly difficult to organize and to extract useful information from

• Ways to monitor activity in OSNs:
– Keywords (produce too much information and fail when lexical variations are used)
– Newshounds and Persons of Interest (may result in loss of information)

• Proposal to leverage:
– Time
– Communities formed by users interested in a specific topic
– The behavior of these communities over time

• Provide the user with information regarding:
– Temporal user activity per topic
– Influential, Stable and Persistent Communities
– Users worth following (possible new newshounds)
– Content worth monitoring

Page 4

Framework overview

[Figure: framework overview diagram. Labels recovered from the figure:]

• Input: Twitter Data (mentions and hashtags in time)
• Evolution Detection Process: Pre-processing (Information Extraction), Interaction Data Discretization, Temporal Adjacency Matrix Creation, Community Detection (Louvain), Community Evolution Detection
• Ranking Process: Persistence, Stability, Centrality* (PageRank), Community Size, Evolution Heatmap, Feature Fusion
• Output: Most influential users and communities + popular hashtags

*Ongoing work

Page 5

Interaction data discretization

• Community evolution study requires timeslot analysis

• Tweeting activity indicates whether users are active and whether something interesting is happening or has just happened

• In this framework, the timeslots are created using the local minima of the overall activity

• Peaks and positive slopes inform us that the users are interested in some phenomenon or are involved in a conversation

• Minima and negative slopes show us that the users’ interest is diminishing
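The timeslot creation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the overall activity is already available as a list of tweet counts per fixed-width bin (for example, per hour), and the function names `local_minima` and `split_into_timeslots` are hypothetical. Real activity curves are noisy, so some smoothing would probably be needed before looking for minima; that step is omitted here.

```python
def local_minima(counts):
    """Indices where activity is lower than the previous bin and no higher than the next."""
    return [
        i
        for i in range(1, len(counts) - 1)
        if counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]
    ]


def split_into_timeslots(counts):
    """Cut the activity series at its local minima.

    Returns (start, end) bin-index pairs, with `end` exclusive.
    """
    cuts = [0] + local_minima(counts) + [len(counts)]
    return [(start, end) for start, end in zip(cuts, cuts[1:]) if end > start]


# Made-up activity counts with two dips, purely for illustration.
activity = [5, 9, 14, 8, 3, 6, 12, 20, 11, 4, 7, 10]
slots = split_into_timeslots(activity)
for start, end in slots:
    print(f"timeslot covers bins {start}..{end - 1}, peak = {max(activity[start:end])}")
```

Each resulting timeslot starts at a dip in interest and contains one rise and fall of activity, which matches the reading of peaks and minima given in the bullets.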

Page 6

Interaction data discretization example

Page 7

Community detection & evolution

[Figure: sequential adjacency matrices for Timeslot (n-2), Timeslot (n-1), Timeslot (n) and Timeslot (n+1); the communities detected in timeslots n-1, n and n+1 (C1 to C6); and the time-evolving communities T1 to T5 that link them across timeslots.]

• Sequential adjacency matrices: timeslots [1, …, n-1, n, n+1, …]
• Communities in timeslot n: C = {C1n, C2n, ..., Ckn}
• Time-evolving communities: Ti

Louvain Community Detection Method: V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.
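One way to picture the sequential adjacency matrices is as one weighted edge list per timeslot. The sketch below is an assumption-laden illustration rather than the framework's code: the input format `(timestamp, author, mentioned_user)`, the function name `build_timeslot_graphs`, and the choice to treat the last timeslot as open-ended are all hypothetical. Self-mentions are dropped, in line with the exclusion of self loops mentioned in the results.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def build_timeslot_graphs(mentions, boundaries):
    """Group mention edges by timeslot and count repeated mentions as edge weights.

    mentions:   iterable of (timestamp, author, mentioned_user) tuples.
    boundaries: sorted timeslot start times; slot k covers
                [boundaries[k], boundaries[k + 1]) and the last slot is open-ended.
    Returns {slot_index: Counter({(author, mentioned_user): weight})}.
    """
    graphs = defaultdict(Counter)
    for timestamp, author, target in mentions:
        if author == target:
            continue  # ignore self loops
        slot = bisect_right(boundaries, timestamp) - 1
        if slot < 0:
            continue  # mention happened before the first timeslot
        graphs[slot][(author, target)] += 1
    return dict(graphs)


# Toy data: timestamps are plain integers, users are single letters.
mentions = [
    (1, "a", "b"), (2, "a", "b"), (3, "b", "c"),
    (12, "c", "a"), (13, "c", "c"),
    (25, "a", "c"),
]
graphs = build_timeslot_graphs(mentions, boundaries=[0, 10, 20])
for slot, edges in sorted(graphs.items()):
    print(slot, dict(edges))
```

A sparse edge counter is used instead of a dense matrix because, with hundreds of thousands of users, most user pairs never interact.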

Page 8

Louvain Community Detection

A popular greedy modularity optimization approach.

The two following steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced:

a) Small community detection by local modularity optimization

b) Aggregation of nodes belonging to the same community and creation of a network with the communities as nodes

It was selected for its:

• Speed

• Accuracy when dealing with ad-hoc networks

• Hierarchical structure, which allows communities to be examined at different resolutions
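To make the optimization target concrete, the sketch below computes modularity, the quantity that the Louvain method greedily increases, for a given partition of a small graph. It is not an implementation of Louvain itself: it only scores partitions, and it assumes an undirected, unweighted graph given as an edge list. The function name `modularity` and the toy graph are illustrative choices.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:        list of (u, v) pairs, each edge listed once.
    community_of: dict mapping node -> community label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

singletons = {n: n for n in range(6)}  # Louvain's starting point
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_singletons = modularity(edges, singletons)
q_two_groups = modularity(edges, two_groups)
q_one_group = modularity(edges, one_group)
print(q_singletons, q_two_groups, q_one_group)
```

Splitting the graph into its two triangles scores higher than either leaving every node alone or merging everything, which is the kind of partition step (a) moves towards before step (b) collapses each community into a single node.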


Page 9

Community evolution detection

[Figure: grid of communities Cij, with nine communities (i = 1 to 9) in each of five timeslots (j = 1 to 5, one row per timeslot), shown alongside the time-evolving communities Tij identified in each row.]

Comparing the communities from each row to communities from past rows using the Jaccard Index

Community similarity according to:

• Jaccard Index
• Adaptive threshold

Adaptive threshold:

• Relative to size
• Range: [0.7,0.1]
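The matching step can be sketched as below. The Jaccard index is standard, but the slides do not give the formula of the adaptive threshold, so `adaptive_threshold` is purely hypothetical: it assumes that small communities must clear 0.7, very large ones only 0.1, with a log-linear interpolation in between. The size limits `small=3` and `large=1000`, and the function name `match_to_past`, are likewise assumptions made for illustration.

```python
import math


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two member collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size, small=3, large=1000, high=0.7, low=0.1):
    """Hypothetical size-relative threshold in the range [0.7, 0.1].

    Assumption: `high` for communities of `small` members or fewer,
    `low` for `large` members or more, log-linear in between.
    """
    if size <= small:
        return high
    if size >= large:
        return low
    frac = (math.log(size) - math.log(small)) / (math.log(large) - math.log(small))
    return high + frac * (low - high)


def match_to_past(community, past_communities):
    """Return the id of the most similar past community, or None if below threshold."""
    best_id, best_score = None, 0.0
    for cid, members in past_communities.items():
        score = jaccard(community, members)
        if score > best_score:
            best_id, best_score = cid, score
    if best_id is not None and best_score >= adaptive_threshold(len(community)):
        return best_id
    return None


past = {"C1": {"a", "b", "c", "d"}, "C2": {"x", "y", "z"}}
print(match_to_past({"a", "b", "c", "d", "e"}, past))  # overlaps heavily with C1
print(match_to_past({"x", "q", "r"}, past))            # weak overlap with C2
```

A community that clears the threshold would continue an existing time-evolving community; one that does not would start a new one.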

Page 10

Single timeslot graph example

Searching through a single timeslot (i.e. approximately 24 hours) can be time-consuming. Imagine browsing through months of data! Indexing is clearly a necessity.

Page 11

Evolution features, fusion & ranking

[Figure: Community Evolution yields Centrality, Persistence and Stability, which feed Dynamic Community Ranking. The outputs shown in the User Interface are Ranked Communities (All Users), Ranked Users in Communities based on Centrality, and Content (txt) from timeslots of interest.]

• Persistence: overall appearances / total number of timeslots

• Stability: overall consecutive appearances / total number of timeslots

• PageRank Centrality: a rough estimate of how important a node is by counting the number and quality of links
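The persistence and stability definitions above can be sketched as follows. Persistence follows the stated ratio directly. For stability, the slides do not say exactly how consecutive appearances are counted, so this sketch assumes one count per pair of adjacent timeslots; its output therefore need not reproduce the stability values reported in the results. The example input is the timeslot list of the top-ranked community and the 28 timeslots reported in the results.

```python
def persistence(appearances, total_timeslots):
    """Overall appearances / total number of timeslots."""
    return len(set(appearances)) / total_timeslots


def stability(appearances, total_timeslots):
    """Consecutive appearances / total number of timeslots.

    Assumption: a consecutive appearance is a pair of adjacent
    timeslots (t, t + 1) in which the community is present.
    """
    slots = sorted(set(appearances))
    consecutive = sum(1 for a, b in zip(slots, slots[1:]) if b - a == 1)
    return consecutive / total_timeslots


# Timeslots in which the rank-1 community appears, out of 28 timeslots.
rank1_slots = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
p = persistence(rank1_slots, 28)
s = stability(rank1_slots, 28)
print(f"persistence = {p:.6f}, stability (pair-count assumption) = {s:.6f}")
```

A community that appears often but with gaps scores high on persistence and lower on stability, which is why the two features are kept separate before fusion.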

Page 12

Pros and Cons

Dynamic Community and User Ranking

• Advantages
– Saves user time (manually searching for news is extremely time-consuming)
– Enables browsing through the most important information
– Provides a sense of user importance over time (users worth following for future investigations)

• Disadvantages
– Community Detection and Community Evolution Detection are slow processes
– No semantic ranking (content is not considered), which renders the framework susceptible to error

Page 13

Framework application example

Application on a dataset extracted from the Twitter OSN.

• Dataset Characteristics:
– Period: 32 days
– Keywords: 40 (English and Greek)
– Unique users: 857K
– Messages: 880K
– Edges: 1.07M


Example hashtags and keywords used for collection:

• Greek
– Hashtags: #Xryshaygh, #GoldenDawn, #Kasidiaris
– Keywords: Michaloliakos, Kasidiaris, golden dawn, xrysh aygh, illegal immigrants

• Global
– Hashtags: #nazi, #extremeright, #farright
– Keywords: nazi, far right, extreme right, Hitler, Swastica

Page 14

Framework application example

• Results
– Total number of communities: 232K
– Final number of communities (excluding self loops & communities smaller than 3): 89K
– Total evolution steps: 7K
– Total evolving communities: 1.1K
– Number of Timeslots: 28


• Light shades signify small communities
• Dark shades signify large communities

Page 15

Framework application example (results)

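• Rank 1
– Community Id: 1,122
– Timeslot appearance: 1,2,3,4,5,6,7,8,9,11,13
– Size/slot: 16,15,8,5,7,28,4,8,9,8,30
– Persistence: 0.392857
– Stability: 0.310344
– Centrality: 0.635401
– Popular Tags (ranked): Indiebooks, bcn, madrid, andalucía, españa
– Topic: Spanish book on Hitler: El Legado

• Rank 2
– Community Id: 13,2044
– Timeslot appearance: 13,15,16,17,18,19,20,22,23,25
– Size/slot: 3,4,9,4,6,6,5,4,7,5
– Persistence: 0.357142
– Stability: 0.241379
– Centrality: 0.801170
– Popular Tags (ranked): keepmovingforward
– Topic: Pakistani person named Nazi

• Rank 3
– Community Id: 10,404
– Timeslot appearance: 10,11,12,15,16,17,18,19
– Size/slot: 6,5,4,4,9,5,3,3
– Persistence: 0.285714
– Stability: 0.241379
– Centrality: 0.817923
– Popular Tags (ranked): Israel, ashkenazi, ptsd, 2rrf
– Topic: Israeli anti-nazi posts

• Rank 4
– Community Id: 18,89
– Timeslot appearance: 18,19,20,21,22,23,25
– Size/slot: 36,137,323,281,64,146,139
– Persistence: 0.25
– Stability: 0.206896
– Centrality: 0.820052
– Popular Tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi
– Topic: Associating Jamaat (Bangladesh) to nazi

• Rank 5
– Community Id: 22,2
– Timeslot appearance: 22,23,24,25,26,27
– Size/slot: 977,1129,942,946,1251,2054
– Persistence: 0.214285
– Stability: 0.206896
– Centrality: 0.797400
– Popular Tags (ranked): 1,01,31,4,2
– Topic: Videogame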


Page 16

Framework application example (Greek interest)

Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users.


A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens

Page 17

Future Work

• Enhance community similarity search (speedup)

• Framework enrichment by incorporating retweets as a feature

• Introduce to journalists for constructive criticism

[Figure: proposed extended framework. Labels recovered from the diagram:]

• Input: Twitter Data (retweets in time)
• Mention, Retweet & Timestamp Information Extraction
• Community Detection
• Community Evolution Detection
• Features: Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality
• Annotations on the features: "Could they be used as a Ground Truth Set?", "Provide a base line"
• Fusion
• Output: Most influential users and communities + popular hashtags
• Query Correction & Improvement via Relevance Feedback?

Page 18

Conclusions

• A framework for extracting information from evolving communities in dynamic social networks.

• Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter).

• A large number of dynamic communities exist, with varied evolutionary characteristics.
