
Class 3: Introduction to CINET

Tools for network analysis and visualization

Network Science: Introduction to CINET 2015

Prof. Boleslaw K. Szymanski
Konstantin Kuzmin


TOOLS OVERVIEW (LISTED ALPHABETICALLY)

Tools for network analysis and visualization

• Computing model and interface
  – Desktop GUI applications
  – API/code libraries, Web services
  – Web GUI front-ends (cloud, distributed, HPC)
• Extensibility model
  – Only by the original developers
  – By other users/developers (add-ins, modules, additional packages, etc.)
• Source availability model
  – Open-source
  – Closed-source
• Business model
  – Free of charge
  – Commercial


TOOLS CINET

CyberInfrastructure for NETwork science

• Accessed via a Web-based portal
• Supported by grants, no charge for end users
• Aims to provide researchers, analysts, and educators interested in Network Science with an easy-to-use cyber-environment that is accessible from their desktop and integrates into their daily work
• Users can contribute new networks, data, algorithms, hardware, and research results
• Primarily for research, teaching, and collaboration
• No programming experience is required


TOOLS Cytoscape

Network Data Integration, Analysis, and Visualization

• A standalone GUI application
• A platform for visualizing complex networks and integrating these with any type of attribute data
• Originally developed for biological research
• Includes features for data integration, analysis, and visualization
• A variety of layout algorithms, including cyclic, tree, force-directed, edge-weight, and yFiles Organic layouts
• Implemented in Java
• Runs on any Java-supported platform
• Modular architecture extensible through plugins (called Apps)
• Open-source and free of charge


TOOLS Gephi

The Open Graph Viz Platform

• A standalone GUI application
• An interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs
• Static and dynamic networks
• Clustering and hierarchical graphs, community detection
• Visualization layouts supported: ForceAtlas, Yifan Hu Multilevel
• Modular architecture customizable with plugins
• Runs on Windows, Linux and Mac OS X
• Implemented in Java
• Graph size: fewer than 1M nodes and edges
• Open-source and free of charge


TOOLS Graphviz

Graph Visualization Software

• A graph description language (called DOT) and a set of tools that can generate and/or process DOT files
• Can be used as a standalone tool or as a library
• Graph drawing only
• A wide range of layouts:
  – Hierarchical or layered drawings
  – Spring model layouts
  – Multiscale layout for large graphs
  – Radial layouts
  – Circular layouts
• Implemented in C
• Runs on Linux, Windows and Mac OS X
• Extensible through a scripting API
• Open-source and free of charge
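To make the idea of a graph description language concrete, the sketch below builds a small DOT description from a list of edges. The helper name `to_dot` and the example graph are illustrative assumptions, not part of Graphviz itself; only the emitted text follows the DOT syntax.

```python
def to_dot(edges, name="G", directed=False):
    """Return a DOT description of a graph given as a list of (u, v) pairs."""
    keyword = "digraph" if directed else "graph"
    connector = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for u, v in edges:
        # Quote node names so that labels with spaces remain valid DOT.
        lines.append(f'    "{u}" {connector} "{v}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: an undirected triangle.
dot_text = to_dot([("a", "b"), ("b", "c"), ("c", "a")])
print(dot_text)
```

If the printed text is saved to a file such as `triangle.dot`, a Graphviz layout tool can render it, for example with `dot -Tsvg triangle.dot -o triangle.svg`.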


TOOLS Pajek

Pajek and Pajek-XXL

• A standalone GUI application
• Several partitioning and community detection algorithms
• Network generator (random, Bernoulli/Poisson, scale free, small world, etc.)
• Support for ordinary (directed, undirected, mixed) as well as multi-relational networks, bipartite, and temporal networks
• Capable of analyzing and visualizing large networks with thousands or even millions of nodes
• Macro capability enables recording and playback of a sequence of primitive commands
• Implemented in Delphi (Pascal). Only Windows is supported (32- and 64-bit)
• Freely available for noncommercial use


TOOLS SNAP

Stanford Network Analysis Platform (SNAP)

• A general purpose network analysis and graph mining library
• Written in C++, with a Python interface also available
• Scales to massive networks with hundreds of millions of nodes and billions of edges
• Efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges
• Also available through NodeXL, a graphical front-end that integrates network analysis into Microsoft Office and Excel
• http://snap.stanford.edu/


CINET

What is CINET

• A web-based tool for analyzing networks that represent interactions in large-scale complex systems

• A large set of networks and algorithms to analyze networks

• Ability to add user networks and have them analyzed by the algorithms available in CINET

• The web-based interface has been designed to simplify the analysis of complex networks for users who are not necessarily computer scientists


CINET Registration

Creating an account with GRANITE

• Go to the login page http://cinet.vbi.vt.edu/granite/granite.html
• Click “Register” to create an account
• Fill in the “Request Account” form and click “Register Account”
• Use your username and password to log into the system


CINET Structural organization

Client-server model


CINET Architecture

Layered architecture


CINET Apps

Tools in CINET

• Structural Analysis Tool (Granite)
  – 190+ networks (graphs)
  – 20+ network generators
  – 70+ network algorithms (measures): GaLib, SNAP (Stanford), NetworkX
  – Visualization of networks: Gephi
  – Service for adding new networks (graphs)
  – Service for adding new structural analysis tools (graph algorithms)
• Graph Dynamical System Calculator (GDSC)
  – Complete network dynamics on networks
  – Analyzing the phase structure of GDS; small graphs
  – 13 graph templates; 15 vertex function (behavior) families
• Simulation of Dynamics (EDISON)
  – Forward trajectory (dynamics) on networks
  – Compute (contagion) dynamics on larger networks: simulation
  – Services to manipulate attributed networks and to run simulations
  – Several contagion models: with and without interventions


CINET Components

Computational engines and resources

• GaLib: provides efficient implementations of various classical and new graph algorithms that are motivated by the analysis of social contact graphs and disease dynamics on such graphs.
• NetworkX: a powerful Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
• Stanford Network Analysis Platform (SNAP): a general purpose network analysis and graph mining library.
• Both traditional high performance computing clusters, e.g., Shadowfax, Pecos (Virginia Tech), and cloud computing infrastructure, e.g., FutureGrid. An intelligent resource manager chooses an appropriate computing platform for a network analysis job, considering resource availability and the job's computational and memory requirements.


CINET Features

Available features

• Network Analysis
• Network Generators
• Network List
• Measure List
• Visualization
• NetScript
• Dynamic Analysis (in upcoming versions)


CINET Datasets

Networks

• Social, web/internet, biological, infrastructure and transportation, artificial, and other types of networks
• Currently 194 public datasets are available:
  – Amazon product co-purchasing
  – American College Football
  – DBLP Collaboration
  – Enron email
  – Gowalla friendship
  – Wikipedia Who-votes-on-whom
  – …
• Public networks are available to any CINET user
• Users can also upload their own datasets and make them public or private
• Two different representations of the networks are supported:
  – Adjacency list (GaLib) format
  – Edge list (NetworkX) format
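The difference between the two representations can be illustrated with a short sketch. It converts between a generic edge list and a generic adjacency list for an undirected graph. The exact file syntax that CINET, GaLib, or NetworkX expect is not specified here, so the in-memory structures and function names below are assumptions for illustration only.

```python
from collections import defaultdict


def edge_list_to_adjacency(edges):
    """Turn undirected (u, v) pairs into {node: sorted list of neighbors}."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {node: sorted(neighbors) for node, neighbors in sorted(adjacency.items())}


def adjacency_to_edge_list(adjacency):
    """Turn {node: neighbors} back into a sorted list of unique (u, v) pairs."""
    unique = {tuple(sorted((u, v))) for u, neighbors in adjacency.items() for v in neighbors}
    return sorted(unique)


# Hypothetical four-node example.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adjacency = edge_list_to_adjacency(edges)
print(adjacency)                          # {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(adjacency_to_edge_list(adjacency))  # [(1, 2), (1, 3), (2, 3), (3, 4)]
```

An edge list stores one line per edge, while an adjacency list stores one line per node, so each undirected edge appears twice in the adjacency form.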


CINET Analysis Tools

Network analysis

• Graph Algorithms
  Over 70 algorithms of various types, related to shortest paths, subgraph and motif counting, centrality, graph traversal, etc.
• Dynamic Analysis
  Multiple simulation codes that provide different diffusion models and simulation capabilities. Analysis of the phase structure of a graph dynamical system (e.g., spreading dynamic phenomena such as rumors through networks).
• Network Generators
  Implementations of ~20 random and deterministic network generators such as Barabási–Albert, Erdős–Rényi, small world, star graphs, etc.
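As a rough illustration of what a random and a deterministic generator do, the sketch below implements the Erdős–Rényi G(n, p) model and a star graph in plain Python. It is not CINET's implementation; the function names and parameters are assumptions chosen for the example.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]


def star_graph(n):
    """Deterministic star on n nodes: node 0 is the hub, nodes 1..n-1 are leaves."""
    return [(0, leaf) for leaf in range(1, n)]


random_edges = erdos_renyi(100, 0.05, seed=42)
star_edges = star_graph(6)

# The expected number of edges in G(n, p) is p * n * (n - 1) / 2.
print(len(random_edges), "edges generated; expected about", 0.05 * 100 * 99 / 2)
print(star_edges)
```

Fixing the seed makes the random graph reproducible, which helps when the same network must be analyzed more than once.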


CINET Visualization

Network Visualization

• An integrated visualization module that supports a dynamic range of visualizations. Multiple layout algorithms: Random, Force Atlas, Yifan Hu, etc.
• Feature based organization: determining node size and color by degree, betweenness, etc.
• Coloring communities: applying a community detection algorithm to visualize different communities in different colors.
• Vector graphics output (SVG).


CINET Applications

Using CINET in education and research

• Network science courses
  – Virginia Tech, Blacksburg, VA
  – North Carolina A&T State University, Greensboro, NC
  – Jackson State University, Jackson, MS
  – University at Albany – State University of New York, Albany, NY
• Research
  – We the People (WtP) project: Web-enabled petitioning system
  – Other petitioning sites (change.org)
• Case studies


CINET Summary

CINET in Context

• User interface: all user interaction
  – No need to program
  – No need for HPC resources
• Types of analysis
  – Network structural characteristics
  – Dynamics on networks
• Large networks
  – Generation
  – Analysis
• Multiple tools provided under a CINET umbrella
• Crowd-sourced platform
  – Self-sustaining
  – Self-managing
• Collaborative science
• Community resource


CINET References

Papers and other publications

• Abdelhamid S, Alo R, Arifuzzaman S, Beckman P, Bhuiyan M, Bisset K, Fox E, Fox G, Hall K, Hasan S, Joshi A, Khan M, Kuhlman C, Lee S, Leidig J, Makkapati H, Marathe M, Mortveit H, Qiu J, Ravi S, Shams Z, Sirisaengtaksin O, Subbiah R, Swarup S, Trebon N, Vullikanti A, Zhao Z (2012) CINET: A CyberInfrastructure for Network Science. In The 8th IEEE International Conference on eScience, Chicago, IL, October 8-12, 2012.
• Abdelhamid S, Alam M, Alo R, Arifuzzaman S, Beckman P, Bhattacharjee T, Bhuiyan H, Bisset K, Eubank S, Esterline A, Fox E, Fox G, Hasan S, Hayatnagarkar H, Khan M, Kuhlman C, Marathe M, Meghanathan N, Mortveit H, Qiu J, Ravi S, Shams Z, Sirisaengtaksin O, Swarup S, Vullikanti A, Wu T (2014) CINET 2.0: A CyberInfrastructure for Network Science. In The 10th IEEE International Conference on eScience, 324-331.
• Abdelhamid et al., “GDSCalc: A Web-Based Application for Evaluating Discrete Graph Dynamical Systems,” PLOS One 2015.
• …


CINET Links

Useful links

• Main CINET page: http://cinet.vbi.vt.edu/
• Granite page: http://cinet.vbi.vt.edu/granite/granite.html
• Stanford Network Analysis Project: http://snap.stanford.edu/


CINET Hands-on Labs

http://pmtips.net/Blog/handson-project-manager

Overview

Exercise 1
• Learn how to use CINET through the Granite interface
• Compute simple network measures

Exercise 2
• Analyze a larger set of networks with CINET
• Use the output of CINET to compute additional network measures and study correlations between graph parameters

Exercise 3
• Use CINET to visualize networks
• Explore different layouts and visualization parameters


CINET Hands-on Labs Exercise 1 Objectives

In this exercise

• Review networks and measures available in CINET

• Practice setting up network analysis and using different measures

• Compute three measures for each of the two networks (Dolphins Social Network in New Zealand and Erdős Collaboration Network). Fill in the following table:

| Network | # of nodes | # of edges | Density | # of triangles | Diameter |
|---|---|---|---|---|---|
| Dolphins | 62 | 159 | | | |
| Erdős | 6,927 | 11,850 | | | |


CINET Hands-on Labs Exercise 1 Procedure

Follow these steps

• Set up a new analysis (click on the “+New Analysis” button and choose a name for the analysis).
• In the search box under the “Networks” heading, type “Dolphins” (without the quote marks).
• The name of the network (“Dolphins Social Network in NZ”) appears below the search box. Select the network by clicking on the check box. If necessary, additional networks can also be selected for analysis.
• Click the “Continue” button above “Networks”; the system then displays the menu for “Add measure”.
• In the search box under “Measures”, type “Density” (without the quote marks).
• The Density measure appears below the search box. Select the measure by clicking on the check box. It is possible to compute multiple measures as part of the same analysis. Use measures called “Compute the Number of Triangles” and “Find Diameter of a Graph” provided by CINET.
• Click the “Analyze” button above “Measures”.
• The system starts the computation and displays the “Status” of the computation.
• When the “Status” appears as “COMPLETED”, click on the “View Report” link.
• In the resulting window, click on the log.out link to see the answer and record the answer in the table above.
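For readers who want to see what the triangle and diameter measures compute, here is a minimal sketch on a small toy graph. It is not the code CINET runs; the function names and the toy edge list are assumptions, and the diameter routine assumes a connected, undirected graph.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build {node: set of neighbors} for an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def count_triangles(adjacency):
    """Count triangles; each one is found once from each of its three corners."""
    corner_count = 0
    for neighbors in adjacency.values():
        for a, b in combinations(neighbors, 2):
            if b in adjacency[a]:
                corner_count += 1
    return corner_count // 3


def eccentricity(adjacency, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return max(distance.values())


def diameter(adjacency):
    """Diameter of a connected graph: the largest eccentricity over all nodes."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Hypothetical toy graph: a triangle 1-2-3 with a tail 3-4-5.
toy = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])
print(count_triangles(toy))  # 1
print(diameter(toy))         # 3
```

Running a breadth-first search from every node is fine for small graphs such as Dolphins, but it becomes expensive for very large networks.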


CINET Hands-on Labs Exercise 1 Outcome

Exercise review

• What kind of networks are publicly available in CINET?

• What network analysis measures does CINET offer?

• Analysis results for the networks:

| Network | # of nodes | # of edges | Density | # of triangles | Diameter |
|---|---|---|---|---|---|
| Dolphins | 62 | 159 | 0.084082 | 95 | 8 |
| Erdős | 6,927 | 11,850 | 0.000494 | 5,973 | 4 |
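The density values in the table are consistent with the usual definition for a simple undirected graph, density = 2m / (n(n − 1)), where n is the number of nodes and m the number of edges. The short check below assumes that definition; the slides do not state which formula CINET uses.

```python
def density(num_nodes, num_edges):
    """Density of a simple undirected graph: the fraction of possible edges that are present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


print(f"{density(62, 159):.6f}")      # 0.084082 (Dolphins)
print(f"{density(6927, 11850):.6f}")  # 0.000494 (Erdős)
```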


CINET Hands-on Labs Exercise 2 Objectives

In this exercise

• Compute three measures for each of the five networks. Fill in the following table:

| Network | # of nodes | # of edges | Average node degree (∆) | # of triangles (T) | Diameter (D) |
|---|---|---|---|---|---|
| Autonomous systems - Oregon-1-010407 | 10,729 | 21,999 | | | |
| Erdős Collaboration Network | 6,927 | 11,850 | | | |
| Autonomous systems - Oregon-1-010331 | 10,670 | 22,002 | | | |
| Autonomous systems - Oregon-2-010331 | 10,900 | 31,180 | | | |
| Enron Giant Component | 33,696 | 180,811 | | | |

• Determine whether certain pairs of graph measures are correlated using Pearson Correlation Coefficient (PCC) as the measure of correlation. Draw scatter plots.


CINET Hands-on Labs Exercise 2 Pearson Correlation Coefficient

Pearson Correlation Coefficient

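Suppose we are given a data sample consisting of n ≥ 1 pairs of numbers $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Let $\bar{x}$ and $\bar{y}$ denote respectively the mean values of the sets $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$; that is, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

The Pearson Correlation Coefficient (PCC) r for the sample is given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where positive square roots are used for both terms in the denominator. The coefficient is defined only when neither X nor Y is constant, so that the denominator is nonzero. The PCC value r defined above satisfies the condition −1 ≤ r ≤ 1. The value r = 1 indicates that a linear equation describes the relationship between the two sets X and Y, with Y values increasing as the X values increase. Similarly, r = −1 indicates a linear relationship between the two sets, with Y values decreasing as the X values increase. The value r = 0 indicates that X and Y are not linearly correlated.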
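The definition above translates directly into a few lines of code. The sketch below is one possible implementation; the function name `pearson` and the sample data are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson Correlation Coefficient of two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = (math.sqrt(sum((x - mean_x) ** 2 for x in xs))
                   * math.sqrt(sum((y - mean_y) ** 2 for y in ys)))
    if denominator == 0:
        raise ValueError("PCC is undefined when one of the sequences is constant")
    return numerator / denominator


# Hypothetical samples: the first is perfectly increasing, the second perfectly decreasing.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1
print(pearson([1, 2, 3], [3, 2, 1]))        # close to -1
```

Because of floating-point rounding, the printed values may differ from exactly 1 or −1 in the last digits.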


CINET Hands-on Labs Exercise 2 Procedure

Follow these steps

• Set up a new analysis in CINET and select the appropriate networks.
• For each of the five networks, find the average node degree (use a measure called “Degree Statistics” provided by CINET), the number of triangles, and the diameter. Since each of the five networks is connected, all five diameter values should be finite.
• Compute the PCC value for the sample (∆, T) using a tool of your choice (a calculator, an Excel spreadsheet, by writing a simple program, etc.).
• Compute the PCC value for the sample (∆, D) using a tool of your choice.
• Prepare two scatter plots, one showing the pairs (∆, T) and the other showing the pairs (∆, D). In each case, please show the ∆ values along the X axis and the other value along the Y axis.


CINET Hands-on Labs Exercise 2 Outcome

Exercise review

• Is there a correlation between the network measures you computed? If so, what kind of correlation is it, and why? What does it tell you about the networks?
• Analysis results for the five networks:

| Network | # of nodes | # of edges | Average node degree (∆) | # of triangles (T) | Diameter (D) |
|---|---|---|---|---|---|
| Autonomous systems - Oregon-1-010407 | 10,729 | 21,999 | 4.101 | | |
| Erdős Collaboration Network | 6,927 | 11,850 | 3.421 | | |
| Autonomous systems - Oregon-1-010331 | 10,670 | 22,002 | 4.124 | | |
| Autonomous systems - Oregon-2-010331 | 10,900 | 31,180 | 5.721 | | |
| Enron Giant Component | 33,696 | 180,811 | 10.732 | | |
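The average node degree values listed for the five networks agree with the identity ∆ = 2m / n for an undirected graph with n nodes and m edges, since every edge contributes to the degree of two nodes. The sketch below recomputes them from the node and edge counts; the dictionary layout is an assumption for illustration.

```python
def average_degree(num_nodes, num_edges):
    """Average node degree of an undirected graph: each edge adds 1 to two node degrees."""
    return 2 * num_edges / num_nodes


# (number of nodes, number of edges) as listed for the five networks.
networks = {
    "Autonomous systems - Oregon-1-010407": (10729, 21999),
    "Erdős Collaboration Network": (6927, 11850),
    "Autonomous systems - Oregon-1-010331": (10670, 22002),
    "Autonomous systems - Oregon-2-010331": (10900, 31180),
    "Enron Giant Component": (33696, 180811),
}

for name, (n, m) in networks.items():
    print(f"{name}: {average_degree(n, m):.3f}")
```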


CINET Hands-on Labs Exercise 2 Outcome

Exercise review

• The PCC values for the samples (∆, T) and (∆, D) [numeric values missing from the source]
• Scatter plots

[Scatter plot: Average node degree and # of triangles (∆, T); ∆ on the X axis, T on the Y axis]

[Scatter plot: Average node degree and diameter (∆, D); ∆ on the X axis, D on the Y axis]


CINET Hands-on Labs Exercise 3 Objectives

In this exercise

• Review layout algorithms and visualization parameters available in CINET

• Create visualizations for the following networks:

| Network | # of nodes | # of edges |
|---|---|---|
| Karate | 34 | 78 |
| American College Football | 115 | 613 |
| Amazon product co-purchasing | 262,111 | 617,438 |


CINET Hands-on Labs Exercise 3 Procedure

Follow these steps

• Switch to the “Networks” tab.
• In the search box under the “Networks” heading, type “Karate” (without the quote marks).
• The name of the network (“Karate network”) appears below the search box. Click the network to select it.
• Set up a new visualization (click on the “+Add Visualization” button and choose a name for the visualization).
• Select “Random” as the Layout Algorithm.
• Click the “Generate” button at the bottom of the screen to produce the visualization. The system starts creating the visualization and displays the “Viz request submitted” status.
• Click the “Visualization” link to switch to the visualization pane. If the system is still displaying the “QUEUED”, “RUNNING”, or “DOWNLOADING RESULTS” prompt, wait until rendering is done. Check the status by clicking on the visualization name to refresh the pane.
• Click on the network visualization to view it in a vector format (SVG). Save the SVG file on your local filesystem.
• Create additional visualizations for the same network with the following parameters:

| Layout | Node parameter | Node size | Node Min Size | Node Max Size |
|---|---|---|---|---|
| Random | Degree | None | 1 | 10 |
| Force Atlas | Modularity | Degree | 5 | 10 |

• Once you have multiple visualizations you can switch between them by clicking on visualization names in the pane header.
• Follow the same procedure to create visualizations for other networks.
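The Node Min Size and Node Max Size parameters bound the drawn size of each node when node size is driven by a measure such as degree. How CINET maps the measure to a size is not stated in the slides; the sketch below assumes a simple linear interpolation, and the function name and sample degrees are hypothetical.

```python
def scale_sizes(values, min_size, max_size):
    """Linearly map each node's value onto the interval [min_size, max_size]."""
    lowest = min(values.values())
    highest = max(values.values())
    if highest == lowest:
        # All nodes have the same value, so draw them all at the minimum size.
        return {node: float(min_size) for node in values}
    span = max_size - min_size
    return {
        node: min_size + span * (value - lowest) / (highest - lowest)
        for node, value in values.items()
    }


# Hypothetical node degrees, scaled with Node Min Size 5 and Node Max Size 10.
degrees = {"a": 1, "b": 3, "c": 5}
sizes = scale_sizes(degrees, 5, 10)
print(sizes)  # {'a': 5.0, 'b': 7.5, 'c': 10.0}
```

A narrow size range, such as 5 to 10, keeps high-degree nodes from covering their neighbors in dense drawings.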


CINET Hands-on Labs Exercise 3 Outcome

Exercise review

• What layout algorithms does CINET offer?
• What are some possible ways of reducing clutter when visualizing large networks?
• Network visualizations

Karate, random layout, degree node parameter

Karate, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10

American Football, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10


CINET Hands-on Labs Exercise 3 Outcome

Exercise review

• Network visualizations


Amazon product co-purchasing, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10