Class 3: Introduction to CINET
Tools for network analysis and visualization
Network Science: Introduction to CINET 2015
Prof. Boleslaw K. Szymanski, Konstantin Kuzmin
TOOLS OVERVIEW (LISTED ALPHABETICALLY)
Tools for network analysis and visualization
• Computing model and interface
– Desktop GUI applications
– API/code libraries, Web services
– Web GUI front-ends (cloud, distributed, HPC)
• Extensibility model
– Only by the original developers
– By other users/developers (add-ins, modules, additional packages, etc.)
• Source availability model
– Open-source
– Closed-source
• Business model
– Free of charge
– Commercial
TOOLS CINET
CyberInfrastructure for NETwork science
• Accessed via a Web-based portal
• Supported by grants, no charge for end users
• Aims to provide researchers, analysts, and educators interested in Network Science with an easy-to-use cyber-environment that is accessible from their desktop and integrates into their daily work
• Users can contribute new networks, data, algorithms, hardware, and research results
• Primarily for research, teaching, and collaboration
• No programming experience is required
TOOLS Cytoscape
Network Data Integration, Analysis, and Visualization
• A standalone GUI application
• A platform for visualizing complex networks and integrating these with any type of attribute data
• Originally developed for biological research
• Includes features for data integration, analysis, and visualization
• A variety of layout algorithms, including cyclic, tree, force-directed, edge-weight, and yFiles Organic layouts
• Implemented in Java
• Runs on any Java-supported platform
• Modular architecture extensible through plugins (called Apps)
• Open-source and free of charge
TOOLS Gephi
The Open Graph Viz Platform
• A standalone GUI application
• An interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs
• Static and dynamic networks
• Clustering and hierarchical graphs, community detection
• Visualization layouts supported: ForceAtlas, Yifan Hu Multilevel
• Modular architecture customizable with plugins
• Runs on Windows, Linux, and Mac OS X
• Implemented in Java
• Graph size <1M nodes & edges
• Open-source and free of charge
TOOLS Graphviz
Graph Visualization Software
• A graph description language (called DOT) and a set of tools that can generate and/or process DOT files
• Can be used as a standalone tool or as a library
• Only graph drawing
• A wide range of layouts:
– Hierarchical or layered drawings
– Spring model layouts
– Multiscale layout for large graphs
– Radial layouts
– Circular layouts
• Implemented in C
• Runs on Linux, Windows, and Mac OS X
• Extensible through a scripting API
• Open-source and free of charge
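To give a feel for what a DOT file looks like, here is a minimal Python sketch that turns an edge list into DOT text. The function name `to_dot` and the toy graph are illustrative assumptions, not part of Graphviz; node IDs are simply wrapped in double quotes, so IDs that themselves contain double quotes are not handled.

```python
def to_dot(edges, directed=False, name="G"):
    """Return a DOT description of a graph given as (source, target) pairs."""
    keyword = "digraph" if directed else "graph"
    connector = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for u, v in edges:
        # Quote node IDs so that names with spaces stay valid DOT.
        lines.append(f'    "{u}" {connector} "{v}";')
    lines.append("}")
    return "\n".join(lines)


# Toy example: an undirected triangle.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
dot_text = to_dot(triangle)
print(dot_text)
```

If the text is saved as `triangle.dot`, a command such as `dot -Tsvg triangle.dot -o triangle.svg` should render it with the hierarchical layout engine.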
TOOLS Pajek
Pajek and Pajek-XXL
• A standalone GUI application
• Several partitioning and community detection algorithms
• Network generator (random, Bernoulli/Poisson, scale free, small world, etc.)
• Support for ordinary (directed, undirected, mixed) as well as multi-relational networks, bipartite, and temporal networks
• Capable of analyzing and visualizing large networks with thousands or even millions of nodes
• Macro capability enables recording and playback of a sequence of primitive commands
• Implemented in Delphi (Pascal). Only Windows is supported (32- and 64-bit)
• Freely available for noncommercial use
TOOLS SNAP
Stanford Network Analysis Platform (SNAP)
• A general purpose network analysis and graph mining library
• Written in C++, but a Python interface is also available
• Scales to massive networks with hundreds of millions of nodes and billions of edges
• Efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges
• Also available through NodeXL, a graphical front-end that integrates network analysis into Microsoft Office and Excel
http://snap.stanford.edu/
CINET
What is CINET
• A web-based tool for analyzing networks that represent interactions in large-scale complex systems
• A large set of networks and algorithms to analyze networks
• Ability to add user networks and have them analyzed by the algorithms available in CINET
• The web-based interface has been designed to simplify the analysis of complex networks for users who are not necessarily computer scientists
CINET Registration
Creating an account with GRANITE
• Go to the login page http://cinet.vbi.vt.edu/granite/granite.html
• Click “Register” to create an account
• Fill in the “Request Account” form and click “Register Account”
• Use your username and password to log into the system
CINET Structural organization
Client-server model
CINET Architecture
Layered architecture
CINET Apps
Tools in CINET
• Structural Analysis Tool (Granite)
– 190+ networks (graphs)
– 20+ network generators
– 70+ network algorithms (measures): GaLib, SNAP (Stanford), NetworkX
– Visualization of networks: Gephi
– Service for adding new networks (graphs)
– Service for adding new structural analysis tools (graph algorithms)
• Graph Dynamical System Calculator (GDSC)
– Complete network dynamics on networks
– Analyzing the phase structure of GDS; small graphs
– 13 graph templates; 15 vertex function (behavior) families
• Simulation of Dynamics (EDISON)
– Forward trajectory (dynamics) on networks
– Compute (contagion) dynamics on larger networks: simulation
– Services to manipulate attributed networks and to run simulations
– Several contagion models: with and without interventions
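As an illustration of what a "forward trajectory" of a contagion on a network means, the sketch below iterates a simple progressive threshold model with synchronous updates. This is not CINET's, GDSC's, or EDISON's implementation; the model, the function names, and the toy path graph are assumptions chosen for brevity.

```python
def step(adj, state, threshold):
    """One synchronous update of a progressive threshold contagion."""
    new_state = {}
    for v, nbrs in adj.items():
        active_nbrs = sum(state[u] for u in nbrs)
        # A vertex stays active once activated (no recovery in this model).
        new_state[v] = 1 if state[v] == 1 or active_nbrs >= threshold else 0
    return new_state


def forward_trajectory(adj, state, threshold, max_steps=100):
    """Iterate until a fixed point (or max_steps); return all visited states."""
    trajectory = [state]
    for _ in range(max_steps):
        nxt = step(adj, trajectory[-1], threshold)
        if nxt == trajectory[-1]:
            break  # fixed point reached
        trajectory.append(nxt)
    return trajectory


# Path graph 0-1-2-3 with only vertex 0 initially active, threshold 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 1, 1: 0, 2: 0, 3: 0}
traj = forward_trajectory(path, start, threshold=1)
for t, s in enumerate(traj):
    print(t, s)
```

On this path the contagion advances one vertex per step until every vertex is active.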
CINET Components
Computational engines and resources
• GaLib: provides efficient implementations of various classical and new graph algorithms that are motivated by the analysis of social contact graphs and disease dynamics on such graphs.
• NetworkX: a powerful Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
• Stanford Network Analysis Platform (SNAP): a general purpose network analysis and graph mining library.
• Both traditional high performance computing clusters, e.g., Shadowfax, Pecos (Virginia Tech), and cloud computing infrastructure, e.g., FutureGrid. An intelligent resource manager chooses an appropriate computing platform for a network analysis job, considering resource availability and the job's computational and memory requirements.
CINET Features
Available features
• Network Analysis
• Network Generators
• Network List
• Measure List
• Visualization
• NetScript
• Dynamic Analysis (in upcoming versions)
CINET Datasets
Networks
• Social, web/internet, biological, infrastructure and transportation, artificial, and other types of networks
• Currently 194 public datasets are available:
– Amazon product co-purchasing
– American College Football
– DBLP Collaboration
– Enron email
– Gowalla friendship
– Wikipedia Who-votes-on-whom
– …
• Public networks are available to any CINET user
• Users can also upload their own datasets and make them public or private
• Two different representations of the networks are supported:
– Adjacency list (GaLib) format
– Edge list (NetworkX) format
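The difference between the two representations can be sketched in a few lines of Python. The exact file syntax CINET expects is not given on this slide, so the sketch assumes whitespace-separated node IDs, `#` comment lines, and an undirected graph; the function names are illustrative.

```python
from collections import defaultdict


def edge_list_to_adjacency(lines):
    """Parse 'u v' lines into an undirected adjacency mapping."""
    adj = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split()[:2]
        adj[u].add(v)
        adj[v].add(u)
    return adj


def adjacency_to_lines(adj):
    """Render one 'node neighbor neighbor ...' line per node."""
    return [" ".join([u] + sorted(adj[u])) for u in sorted(adj)]


# Edge list: one edge per line.
edge_lines = ["# toy graph", "a b", "a c", "b c", "c d"]
adjacency = edge_list_to_adjacency(edge_lines)

# Adjacency list: one node per line, followed by its neighbors.
adj_lines = adjacency_to_lines(adjacency)
print("\n".join(adj_lines))
```

An edge list has one line per edge, while an adjacency list has one line per node, so the same undirected edge appears twice in the adjacency form.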
CINET Analysis Tools
Network analysis
• Graph Algorithms
Over 70 algorithms of various types, related to shortest paths, subgraph and motif counting, centrality, graph traversal, etc.
• Dynamic Analysis
Multiple simulation codes that provide different diffusion models and simulation capabilities. Analysis of the phase structure of a graph dynamical system (e.g., spreading dynamic phenomena such as rumors through networks).
• Network Generators
Implementation of ~20 random and deterministic network generators such as Barabási–Albert, Erdős–Rényi, small world, star graphs, etc.
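To make the distinction between random and deterministic generators concrete, here is a small pure-Python sketch of one of each: the Erdős–Rényi G(n, p) model and a star graph. These are textbook constructions, not CINET's generators; the function names and the `seed` parameter are assumptions for illustration.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [(u, v)
            for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def star(n):
    """Deterministic star on n nodes: node 0 is joined to every other node."""
    return [(0, v) for v in range(1, n)]


random_edges = erdos_renyi(20, 0.3, seed=1)
print(len(random_edges), "edges in one G(20, 0.3) sample")
print(star(5))
```

Fixing the seed makes a random generator reproducible, which helps when results need to be compared across runs.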
CINET Visualization
Network Visualization
• An integrated visualization module that supports a dynamic range of visualizations. Multiple layout algorithms: Random, Force Atlas, Yifan Hu, etc.
• Feature-based organization: determining node size and color by degree, betweenness, etc.
• Coloring communities: applying a community detection algorithm to visualize different communities in different colors.
• Vector graphics output (SVG).
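One way to picture "node size by degree" is a linear rescaling of degrees into a chosen size range. How CINET's visualization module actually maps a measure to a size is not stated here, so the linear mapping below, and the names `node_sizes`, `min_size`, and `max_size`, are assumptions.

```python
def node_sizes(adj, min_size, max_size):
    """Map each node's degree linearly onto [min_size, max_size]."""
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:
        # All degrees equal: give every node the minimum size.
        return {v: float(min_size) for v in degrees}
    scale = (max_size - min_size) / (hi - lo)
    return {v: min_size + (d - lo) * scale for v, d in degrees.items()}


# Star graph: the hub has degree 3, the leaves have degree 1.
star_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
sizes = node_sizes(star_graph, min_size=5, max_size=10)
print(sizes)
```

The hub receives the maximum size and the leaves the minimum, which is the visual effect that feature-based sizing aims for.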
CINET Applications
Using CINET in education and research
• Network science courses
– Virginia Tech, Blacksburg, VA
– North Carolina A&T State University, Greensboro, NC
– Jackson State University, Jackson, MS
– University at Albany – State University of New York, Albany, NY
• Research
– We the People (WtP) project: Web-enabled petitioning system
– Other petitioning sites (change.org)
• Case studies
CINET Summary
CINET in Context
• User interface: all user interaction
– No need to program
– No need for HPC resources
• Types of analysis
– Network structural characteristics
– Dynamics on networks
• Large networks
– Generation
– Analysis
• Multiple tools provided under a CINET umbrella
• Crowd-sourced platform
– Self-sustaining
– Self-managing
• Collaborative science
• Community resource
CINET References
Papers and other publications
• Abdelhamid S, Alo R, Arifuzzaman S, Beckman P, Bhuiyan M, Bisset K, Fox E, Fox G, Hall K, Hasan S, Joshi A, Khan M, Kuhlman C, Lee S, Leidig J, Makkapati H, Marathe M, Mortveit H, Qiu J, Ravi S, Shams Z, Sirisaengtaksin O, Subbiah R, Swarup S, Trebon N, Vullikanti A, Zhao Z (2012) CINET: A CyberInfrastructure for Network Science. In The 8th IEEE International Conference on eScience, Chicago, IL, October 8-12, 2012.
• Abdelhamid S, Alam M, Alo R, Arifuzzaman S, Beckman P, Bhattacharjee T, Bhuiyan H, Bisset K, Eubank S, Esterline A, Fox E, Fox G, Hasan S, Hayatnagarkar H, Khan M, Kuhlman C, Marathe M, Meghanathan N, Mortveit H, Qiu J, Ravi S, Shams Z, Sirisaengtaksin O, Swarup S, Vullikanti A, Wu T (2014) CINET 2.0: A CyberInfrastructure for Network Science. In The 10th IEEE International Conference on eScience, 324-331.
• Abdelhamid et al., “GDSCalc: A Web-Based Application for Evaluating Discrete Graph Dynamical Systems,” PLOS One 2015.
• …
CINET Links
Useful links
• Main CINET page: http://cinet.vbi.vt.edu/
• Granite page: http://cinet.vbi.vt.edu/granite/granite.html
• Stanford Network Analysis Project: http://snap.stanford.edu/
CINET Hands-on Labs
Overview
Exercise 1
• Learn how to use CINET through the Granite interface
• Compute simple network measures
Exercise 2
• Analyze a larger set of networks with CINET
• Use the output of CINET to compute additional network measures and study correlations between graph parameters
Exercise 3
• Use CINET to visualize networks
• Explore different layouts and visualization parameters
CINET Hands-on Labs Exercise 1 Objectives
In this exercise
• Review networks and measures available in CINET
• Practice setting up network analysis and using different measures
• Compute three measures for each of the two networks (Dolphins Social Network in New Zealand and Erdős Collaboration Network). Fill in the following table:
Network | # of nodes | # of edges | Density | # of triangles | Diameter
Dolphins | 62 | 159 | | |
Erdős | 6,927 | 11,850 | | |
CINET Hands-on Labs Exercise 1 Procedure
Follow these steps
• Set up a new analysis (click on the “+New Analysis” button and choose a name for the analysis).
• In the search box under the “Networks” heading, type “Dolphins” (without the quote marks).
• The name of the network (“Dolphins Social Network in NZ”) appears below the search box. Select the network by clicking on the check box. If necessary, additional networks can also be selected for analysis.
• Click the “Continue” button above “Networks”; the system then displays the menu for “Add measure”.
• In the search box under “Measures”, type “Density” (without the quote marks).
• The Density measure appears below the search box. Select the measure by clicking on the check box. It is possible to compute multiple measures as part of the same analysis. Use measures called “Compute the Number of Triangles” and “Find Diameter of a Graph” provided by CINET.
• Click the “Analyze” button above “Measures”.
• The system starts the computation and displays the “Status” of the computation.
• When the “Status” appears as “COMPLETED”, click on the “View Report” link.
• In the resulting window, click on the log.out link to see the answer and record the answer in the table above.
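For readers who want to see what the triangle and diameter measures compute, the sketch below implements both on a toy graph in plain Python. It is a naive illustration under the assumption of a small, connected, undirected graph stored as a dict of neighbor sets; it is not the algorithm CINET runs, and the function names are made up for this example.

```python
from collections import deque
from itertools import combinations


def count_triangles(adj):
    """Count triangles; each one is found once per corner, so divide by 3."""
    total = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                total += 1
    return total // 3


def eccentricity(adj, source):
    """Largest BFS distance from source (graph assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path: the maximum eccentricity over all nodes."""
    return max(eccentricity(adj, v) for v in adj)


# Toy graph: a triangle 0-1-2 with a tail 2-3-4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(count_triangles(toy), diameter(toy))
```

Running one BFS per node is fine for graphs the size of the Dolphins network but becomes slow on large networks, which is one reason to hand such jobs to a service like CINET.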
CINET Hands-on Labs Exercise 1 Outcome
Exercise review
• What kinds of networks are publicly available in CINET?
• What network analysis measures does CINET offer?
• Analysis results for the networks:
Network | # of nodes | # of edges | Density | # of triangles | Diameter
Dolphins | 62 | 159 | 0.084082 | 95 | 8
Erdős | 6,927 | 11,850 | 0.000494 | 5,973 | 4
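The density column can be cross-checked by hand. Assuming the standard definition for a simple undirected graph, density = 2m / (n(n − 1)) for n nodes and m edges, a short Python check reproduces the tabulated values to six decimal places (the function name `density` is illustrative):

```python
def density(n, m):
    """Density of a simple undirected graph with n nodes and m edges."""
    return 2 * m / (n * (n - 1))


dolphins = round(density(62, 159), 6)
erdos = round(density(6927, 11850), 6)
print(dolphins, erdos)
```

Agreement with the table suggests that CINET's Density measure uses this undirected definition for these two networks.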
CINET Hands-on Labs Exercise 2 Objectives
In this exercise
• Compute three measures for each of the five networks. Fill in the following table:
Network | # of nodes | # of edges | Average node degree (∆) | # of triangles (T) | Diameter (D)
Autonomous systems - Oregon-1-010407 | 10,729 | 21,999 | | |
Erdős Collaboration Network | 6,927 | 11,850 | | |
Autonomous systems - Oregon-1-010331 | 10,670 | 22,002 | | |
Autonomous systems - Oregon-2-010331 | 10,900 | 31,180 | | |
Enron Giant Component | 33,696 | 180,811 | | |
• Determine whether certain pairs of graph measures are correlated, using the Pearson Correlation Coefficient (PCC) as the measure of correlation. Draw scatter plots.
CINET Hands-on Labs Exercise 2 Pearson Correlation Coefficient
Pearson Correlation Coefficient
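Suppose we are given a data sample consisting of n ≥ 1 pairs of numbers $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Let $\bar{x}$ and $\bar{y}$ denote respectively the mean values of the sets $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$; that is, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
The Pearson Correlation Coefficient (PCC) r for the sample is given by
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
where positive square roots are used for both terms in the denominator. The PCC value r defined above satisfies the condition −1 ≤ r ≤ 1. The value r = 1 indicates that a linear equation describes the relationship between the two sets X and Y, with Y values increasing as the X values increase. Similarly, r = −1 indicates a linear relationship between the two sets, with Y values decreasing as the X values increase. The value r = 0 indicates that X and Y are not linearly correlated.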
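The definition above translates almost line by line into a short program, which is one option for the "tool of your choice" in this exercise. The sketch below is plain Python; the function name `pearson` and the sample data are illustrative, and the function raises an error when either set has zero variance, since the denominator would then be zero.

```python
import math


def pearson(xs, ys):
    """Pearson Correlation Coefficient of paired samples xs and ys."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Positive square roots of both sums of squared deviations.
    denominator = (math.sqrt(sum((x - mean_x) ** 2 for x in xs))
                   * math.sqrt(sum((y - mean_y) ** 2 for y in ys)))
    if denominator == 0:
        raise ValueError("PCC is undefined when either set has zero variance")
    return numerator / denominator


print(pearson([1, 2, 3], [2, 4, 6]))         # increasing linear relationship
print(pearson([1, 2, 3], [6, 4, 2]))         # decreasing linear relationship
print(pearson([1, 2, 3, 4], [1, -1, -1, 1])) # no linear correlation
```

The three calls illustrate the boundary cases discussed above: a value close to 1, a value close to −1, and a value of 0.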
CINET Hands-on Labs Exercise 2 Procedure
Follow these steps
• Set up a new analysis in CINET and select the appropriate networks.
• For each of the five networks, find the average node degree (use a measure called “Degree Statistics” provided by CINET), the number of triangles, and the diameter. Since each of the five networks is connected, all five diameter values should be finite.
• Compute the PCC value for the sample of (∆, T) pairs using a tool of your choice (a calculator, an Excel spreadsheet, by writing a simple program, etc.).
• Compute the PCC value for the sample of (∆, D) pairs using a tool of your choice.
• Prepare two scatter plots, one showing the (∆, T) pairs and the other showing the (∆, D) pairs. In each case, please show the ∆ values along the X axis and the other value along the Y axis.
CINET Hands-on Labs Exercise 2 Outcome
Exercise review
• Is there a correlation between the network measures you computed? If so, what kind of correlation is it, and why? What does it tell you about the networks?
• Analysis results for two networks:
Network | # of nodes | # of edges | Average node degree (∆) | # of triangles (T) | Diameter (D)
Autonomous systems - Oregon-1-010407 | 10,729 | 21,999 | 4.101 | |
Erdős Collaboration Network | 6,927 | 11,850 | 3.421 | |
Autonomous systems - Oregon-1-010331 | 10,670 | 22,002 | 4.124 | |
Autonomous systems - Oregon-2-010331 | 10,900 | 31,180 | 5.721 | |
Enron Giant Component | 33,696 | 180,811 | 10.732 | |
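The average node degree column follows directly from the node and edge counts: in an undirected graph every edge contributes to the degree of two nodes, so ∆ = 2m / n. A quick Python check using the counts from the table (the variable and function names are illustrative):

```python
# (number of nodes, number of edges) as listed in the table
networks = {
    "Autonomous systems - Oregon-1-010407": (10729, 21999),
    "Erdős Collaboration Network": (6927, 11850),
    "Autonomous systems - Oregon-1-010331": (10670, 22002),
    "Autonomous systems - Oregon-2-010331": (10900, 31180),
    "Enron Giant Component": (33696, 180811),
}


def average_degree(n, m):
    """Average degree of an undirected graph with n nodes and m edges."""
    return 2 * m / n


avg = {name: round(average_degree(n, m), 3) for name, (n, m) in networks.items()}
for name, value in avg.items():
    print(f"{name}: {value}")
```

The rounded results agree with the ∆ values shown above, so this column can be filled in even before running the “Degree Statistics” measure.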
CINET Hands-on Labs Exercise 2 Outcome
Exercise review
• The PCC values are … and …
• Scatter plots
[Figure: Average node degree and # of triangles (∆, T); x-axis: ∆, y-axis: T]
[Figure: Average node degree and diameter (∆, D); x-axis: ∆, y-axis: D]
CINET Hands-on Labs Exercise 3 Objectives
In this exercise
• Review layout algorithms and visualization parameters available in CINET
• Create visualizations for the following networks:
Network | # of nodes | # of edges
Karate | 34 | 78
American College Football | 115 | 613
Amazon product co-purchasing | 262,111 | 617,438
CINET Hands-on Labs Exercise 3 Procedure
Follow these steps
• Switch to the “Networks” tab.
• In the search box under the “Networks” heading, type “Karate” (without the quote marks).
• The name of the network (“Karate network”) appears below the search box. Click the network to select it.
• Set up a new visualization (click on the “+Add Visualization” button and choose a name for the visualization).
• Select “Random” as the Layout Algorithm.
• Click the “Generate” button at the bottom of the screen to produce the visualization. The system starts creating the visualization and displays the “Viz request submitted” status.
• Click the “Visualization” link to switch to the visualization pane. If the system is still displaying the “QUEUED”, “RUNNING”, or “DOWNLOADING RESULTS” prompt, wait until rendering is done. Check the status by clicking on the visualization name to refresh the pane.
• Click on the network visualization to view it in a vector format (SVG). Save the SVG file on your local filesystem.
• Create additional visualizations for the same network with the following parameters:
Layout | Node | Node size | Node Min Size | Node Max Size
Random | Degree | None | 1 | 10
Force Atlas | Modularity | Degree | 5 | 10
• Once you have multiple visualizations, you can switch between them by clicking on visualization names in the pane header.
• Follow the same procedure to create visualizations for other networks.
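To show what a "Random" layout with SVG output amounts to, here is a minimal Python sketch that places nodes at uniformly random positions and writes the drawing as SVG markup. It is not CINET's or Gephi's renderer; the function name, canvas size, colors, and fixed node radius are all assumptions for illustration.

```python
import random


def random_layout_svg(edges, width=200, height=200, radius=4, seed=None):
    """Place nodes uniformly at random and return the drawing as SVG text."""
    rng = random.Random(seed)
    nodes = sorted({v for edge in edges for v in edge})
    # Keep node centers far enough from the border to stay fully visible.
    pos = {v: (rng.uniform(radius, width - radius),
               rng.uniform(radius, height - radius)) for v in nodes}
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="gray"/>')
    for v in nodes:
        x, y = pos[v]
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius}" '
                     f'fill="steelblue"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = random_layout_svg([(1, 2), (2, 3), (3, 1), (3, 4)], seed=0)
print(svg)
```

Because positions ignore the graph structure, random layouts tend to look cluttered, which is why the exercise goes on to compare them with Force Atlas.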
CINET Hands-on Labs Exercise 3 Outcome
Exercise review
• What layout algorithms does CINET offer?
• What are some possible ways of reducing clutter when visualizing large networks?
• Network visualizations
– Karate, random layout, degree node parameter
– Karate, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10
– American Football, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10
CINET Hands-on Labs Exercise 3 Outcome
Exercise review
• Network visualizations
– Amazon product co-purchasing, Force Atlas layout, modularity node parameter, node size: degree, node min size: 5, node max size: 10