Social Network Analysis and Visualization


@mirezez Alberto Ramirez

About Presenter


Systems Architect, iSystems




Nodes & Edges

Un/directed, Un/weighted



Graph Basics

Graph Density: Measures how many edges are in the graph compared to the maximum possible number of edges (complete graph).

Diameter: Longest shortest path between any two nodes in the network.

Radius: Minimum eccentricity over all nodes in the graph.
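The three measures above can be computed directly on a small graph. The following is a minimal sketch, assuming an undirected, unweighted, connected graph stored as a dictionary of neighbor sets; the function names and the four-node example graph are illustrative and do not come from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def density(adj):
    """Edges present divided by the edges of the complete undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))

def eccentricities(adj):
    """Eccentricity of a node: distance to the node farthest from it."""
    return {node: max(bfs_distances(adj, node).values()) for node in adj}

# Hypothetical path graph: a - b - c - d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ecc = eccentricities(adj)
graph_density = density(adj)    # 3 edges present out of 6 possible
diameter = max(ecc.values())    # largest eccentricity
radius = min(ecc.values())      # smallest eccentricity
```

On a disconnected graph the eccentricity of a node is not defined this way, so the sketch would need to be applied per connected component.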

Real Networks: Properties

1. Growth: Networks are assembled one node at a time and increase in size.

2. Preferential attachment: As new nodes join the network, the probability that a new node links to a given existing node is proportional to the number of links that target node already has.

“Rich Get Richer”
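The two properties can be simulated in a few lines. This is a rough sketch, assuming each new node adds exactly one undirected edge (a simplification chosen here for brevity); the function names and the fixed random seed are illustrative.

```python
import random

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph one node at a time; each new node adds a single edge."""
    rng = random.Random(seed)
    edges = [(0, 1)]    # start from a single edge between nodes 0 and 1
    # Each node appears in `targets` once per edge it has, so a uniform draw
    # from this list picks a node with probability proportional to its degree.
    targets = [0, 1]
    for new_node in range(2, n_nodes):
        chosen = rng.choice(targets)
        edges.append((new_node, chosen))
        targets.extend([new_node, chosen])
    return edges

def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

edges = preferential_attachment(1000)
deg = degrees(edges)
top_five = sorted(deg.values(), reverse=True)[:5]   # degrees of the largest hubs
```

Comparing `top_five` with the typical degree in `deg` gives a quick feel for the "rich get richer" effect: early, well-connected nodes tend to keep collecting links.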

Social Network Examples

Undirected, Unweighted

Directed, Unweighted

Cliques

• k-clique, where all nodes are adjacent to each other within the subgraph.

• n-clique, where n is a positive integer, is a collection C of vertices in which any two vertices u,v ∈ C have distance ≤n.

• p-clique, where p is a real number between 0 and 1, is a collection C of vertices in which any vertex has ≥p|C| neighbors in C.
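The three definitions above translate into simple membership checks. The following is a sketch under stated assumptions: the graph is undirected and stored as a dictionary of neighbor sets, distances for the n-clique test are measured in the whole graph, and all function names and the example graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def is_clique(adj, nodes):
    """Every pair of nodes in the set is directly connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def distance(adj, source, target):
    """Shortest-path length in the whole graph, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None

def is_n_clique(adj, nodes, n):
    """Every pair of nodes in the set is at distance <= n."""
    for u, v in combinations(nodes, 2):
        d = distance(adj, u, v)
        if d is None or d > n:
            return False
    return True

def is_p_clique(adj, nodes, p):
    """Every node in the set has at least p * |C| neighbors inside the set."""
    members = set(nodes)
    return all(len(adj[u] & members) >= p * len(members) for u in members)

# Hypothetical graph: triangle a-b-c with a tail c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
everyone = ["a", "b", "c", "d"]
```

Here `["a", "b", "c"]` is a clique, while the full node set only qualifies under the looser n-clique (n = 2) and p-clique (p = 0.25) definitions.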

Trouble with Clique Targeting

1. Not resilient networks.

2. Uniformity in the way cliques are defined can lead to little to no insights into that subgraph.

3. The clique might be a narrowing of a larger, more legitimate community to be evaluated.

Finding Cliques
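The slides do not say which method was used to find cliques. One well-known option is the Bron-Kerbosch algorithm for enumerating maximal cliques; the sketch below is the basic version without pivoting, assuming an undirected graph stored as a dictionary of neighbor sets. The function name and example graph are illustrative.

```python
def maximal_cliques(adj):
    """Return every maximal clique as a sorted list of nodes."""
    found = []

    def expand(current, candidates, excluded):
        # A clique is maximal when nothing can extend it and nothing
        # already processed could have extended it either.
        if not candidates and not excluded:
            found.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adj[node],
                   excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return sorted(found)

# Hypothetical graph: triangle a-b-c with a tail c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = maximal_cliques(adj)
```

The number of maximal cliques can grow exponentially with graph size, so this kind of enumeration is only practical on small or sparse graphs.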

K-Cores

Maximal subgraph with minimum degree at least k.

In graph G, as k increases, the subgraph becomes more exclusive.

Finding K-Cores
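A k-core can be found by repeatedly pruning nodes whose degree falls below k. The following is a minimal sketch of that pruning idea, assuming an undirected graph without self-loops stored as a dictionary of neighbor sets; the function name and example graph are illustrative.

```python
def k_core(adj, k):
    """Nodes of the k-core: repeatedly drop nodes with fewer than k neighbors."""
    remaining = {node: set(nbrs) for node, nbrs in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Removing a node lowers the degree of its neighbors,
                # which may push them below k on a later pass.
                for nbr in remaining[node]:
                    remaining[nbr].discard(node)
                del remaining[node]
                changed = True
    return set(remaining)

# Hypothetical graph: triangle a-b-c with a tail c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
core_1 = k_core(adj, 1)
core_2 = k_core(adj, 2)
core_3 = k_core(adj, 3)
```

Raising k from 1 to 2 drops the tail node, and k = 3 leaves nothing, which illustrates how the subgraph becomes more exclusive as k grows.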

Node Centrality

Identifying important nodes

Betweenness Centrality

Measures how often a node appears on the shortest paths between other pairs of nodes in the network.
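Betweenness can be computed with one breadth-first search per node, in the style of Brandes' algorithm. This is a sketch, assuming an undirected, unweighted graph stored as a dictionary of neighbor sets and reporting unnormalized scores; the function name and example graph are illustrative.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized node betweenness for an unweighted, undirected graph."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        order = []                              # nodes in order of discovery
        preds = {node: [] for node in adj}      # predecessors on shortest paths
        paths = {node: 0 for node in adj}       # shortest-path counts from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    paths[nbr] += paths[node]
                    preds[nbr].append(node)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adj}
        for node in reversed(order):
            for pred in preds[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical path graph: a - b - c - d
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
path_scores = betweenness(path)

# Hypothetical star graph: hub connected to three leaves
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
star_scores = betweenness(star)
```

In the star graph every leaf-to-leaf shortest path runs through the hub, so the hub collects the whole score and the leaves get zero.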

Closeness Centrality

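Based on the average distance from a given node to all other nodes in the graph. The smaller the average distance, the higher the closeness (it is usually reported as the reciprocal of that average).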
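As a small worked example, closeness can be computed from a single breadth-first search. The sketch assumes an undirected, unweighted, connected graph stored as a dictionary of neighbor sets, and uses the common reciprocal-of-average-distance form; the function name and example graph are illustrative.

```python
from collections import deque

def closeness(adj, source):
    """Reciprocal of the average shortest-path distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    others = len(dist) - 1              # reachable nodes, excluding source
    return others / sum(dist.values())  # assumes at least one other node

# Hypothetical path graph: a - b - c - d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
end_node = closeness(adj, "a")      # distances 1, 2, 3
middle_node = closeness(adj, "b")   # distances 1, 1, 2
```

The middle node scores higher than the end node because its average distance to everyone else is shorter.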


(Hierarchical) Clustering

Girvan–Newman algorithm, O(N^3) on sparse graphs

1. Calculate betweenness of all edges in graph.

2. Remove edge with highest betweenness.

3. Recalculate the betweenness of all edges affected by the removal.

4. Repeat steps 2 and 3 until no edges remain.

Expensive, Yet Intuitive Decomposition of Graph
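The steps above can be sketched in plain Python. This is a simplified illustration, not an optimized implementation: it assumes an undirected, unweighted graph stored as a dictionary of neighbor sets with sortable node names, and it recomputes the betweenness of every edge after each removal rather than only the affected ones. All function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every undirected edge (each pair counted twice)."""
    score = {}
    for source in adj:
        order, dist = [], {source: 0}
        preds = {node: [] for node in adj}
        paths = {node: 0 for node in adj}
        paths[source] = 1
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    paths[nbr] += paths[node]
                    preds[nbr].append(node)
        dependency = {node: 0.0 for node in adj}
        for node in reversed(order):
            for pred in preds[node]:
                share = paths[pred] / paths[node] * (1 + dependency[node])
                edge = tuple(sorted((pred, node)))
                score[edge] = score.get(edge, 0.0) + share
                dependency[pred] += share
    return score

def components(adj):
    """Connected components as a list of sorted node lists."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        result.append(sorted(group))
    return result

def girvan_newman(adj):
    """Yield the components each time an edge removal splits the graph."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}    # work on a copy
    count = len(components(adj))
    while any(adj.values()):
        scores = edge_betweenness(adj)                 # steps 1 and 3
        u, v = max(sorted(scores), key=scores.get)     # step 2, ties broken by name
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > count:
            count = len(parts)
            yield parts

# Hypothetical graph: two triangles joined by the bridge c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
splits = list(girvan_newman(adj))
```

The first split removes the bridge and separates the two triangles; later splits keep subdividing until every node stands alone, giving the hierarchical decomposition.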

Case Study: Finding @VTCodeCamp

Twitter Communities


Network

Twitter users as nodes, follows as directed edges.

1. Find all followers of @VTCodeCamp, then recursively find the next level of users.

2. Remove @VTCodeCamp from the final datasets.

3. Twitter RESTful Search API v1.1

4. Node.js Client

5. MongoDB 3.0 Aggregation Framework

6. GEXF - Graph Exchange XML Format

7. Gephi & Gephi Toolkit (JVM) - Analysis & Viz
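Steps 1 and 2 amount to a breadth-first crawl outward from a seed account. The talk used a Node.js client for this; the Python sketch below only illustrates the traversal logic. `fetch_followers` is a hypothetical callable standing in for the real API client, and the sample data is made up.

```python
def crawl_followers(seed, fetch_followers, depth=2):
    """Collect directed (follower, followed) edges, `depth` levels out from seed."""
    edges = []
    seen = {seed}
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for user in frontier:
            for follower in fetch_followers(user):
                edges.append((follower, user))    # edge direction: follower -> user
                if follower not in seen:
                    seen.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    # Step 2: drop the seed account from the final edge list.
    return [(src, dst) for src, dst in edges if seed not in (src, dst)]

# Made-up follower lists standing in for API responses
sample = {"seed": ["u1", "u2"], "u1": ["u2", "u3"], "u2": ["seed"], "u3": []}
edges = crawl_followers("seed", lambda user: sample.get(user, []))
```

Passing the fetch function in as a parameter keeps the traversal testable without network access.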

Twitter API

Node.js Client


.gexf Format

Gephi / Toolkit




Twitter Search API

GET followers/ids: 15 resource requests per 15-minute window, 5000 max per response (cursor)

GET users/lookup: 180 resource requests per 15-minute window
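Cursor-based paging under a request budget can be sketched as a simple loop. The talk used a Node.js client; this Python version only illustrates the control flow. `fetch_page` is a hypothetical callable returning a dictionary with `ids` and `next_cursor` keys, following the followers/ids response shape where a cursor of -1 requests the first page and 0 marks the end; the sample pages are made up.

```python
def collect_follower_ids(fetch_page, max_requests=15):
    """Follow cursors until no next page remains or the request budget is spent."""
    ids = []
    cursor = -1                 # -1 requests the first page
    requests_made = 0
    while cursor != 0 and requests_made < max_requests:
        page = fetch_page(cursor)
        requests_made += 1
        ids.extend(page["ids"])
        cursor = page["next_cursor"]    # 0 means there are no more pages
    return ids, cursor, requests_made

# Made-up pages standing in for API responses
pages = {-1: {"ids": [1, 2], "next_cursor": 10},
         10: {"ids": [3], "next_cursor": 0}}
all_ids, last_cursor, used = collect_follower_ids(pages.__getitem__)
partial_ids, resume_cursor, _ = collect_follower_ids(pages.__getitem__, max_requests=1)
```

Returning the last cursor lets a caller wait for the next rate-limit window and resume from where the previous batch stopped.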



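// "users" is a placeholder collection name; the slide does not show the opening of the call.
db.users.aggregate([
  // Keep only the user id and the array of follower ids.
  { $project : { _id : 0, user_id : "$user_id_str",
                 twitterFollowers : "$followers.ids_str" } },
  // Emit one document per (user, follower) pair.
  { $unwind : "$twitterFollowers" }
]);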

KV Pairs of User/Follower

Aggregation Framework

Rank Followers of @VTCodeCamp by their In-Degree
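The talk performed this ranking with the MongoDB Aggregation Framework. The same idea can be shown on user/follower pairs in a few lines of Python: a user's in-degree is the number of pairs in which that user is the one being followed. The function name and sample pairs below are illustrative.

```python
from collections import Counter

def rank_by_in_degree(pairs):
    """pairs: iterable of (user_id, follower_id); returns (user_id, in_degree) by rank."""
    in_degree = Counter(user_id for user_id, _follower_id in pairs)
    return in_degree.most_common()

# Made-up user/follower pairs
pairs = [("a", "x"), ("a", "y"), ("a", "z"),
         ("b", "x"), ("b", "y"),
         ("c", "x")]
ranking = rank_by_in_degree(pairs)
```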

Aggregate k-cliques

GEXF Format
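A GEXF file is an XML document with a `graph` element containing `nodes` and `edges`. The following is a minimal sketch of a writer for a directed graph using only the Python standard library, assuming the GEXF 1.2 draft namespace; the function name and the sample nodes are illustrative, and real exports usually carry extra attributes.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(nodes, edges):
    """Serialize (id, label) nodes and (source, target) edges to a GEXF string."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph",
                          {"mode": "static", "defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge",
                      {"id": str(index), "source": str(source), "target": str(target)})
    return ET.tostring(root, encoding="unicode")

# Made-up accounts: user 2 follows user 1
document = to_gexf([(1, "alice"), (2, "bob")], [(2, 1)])
```

The resulting string can be saved with a `.gexf` extension and opened in Gephi.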

Graph Stats

vtcodecamp

557 users

11,058 follows

364,951 users

1,045,606 follows

Gephi Toolkit Demo

Gephi Application Demo

Iteration #1

MCL Clustering

Betweenness Node Size


Diameter = 7

Software/Data Resources

◦ Applications



Pajek (Windows)

◦ Packages

NetworkX (Python)

igraph (R or Python)

Statnet (R)


Gephi Toolkit (JVM)

◦ Data Sets

Gephi Datasets

UCIrvine -

