Final Paper Revision

Multilevel Network Visualization
Emmanuel Oppong
Computer Science and Engineering
The Pennsylvania State University

SROP 2014 Report
August 4, 2014

Abstract
In this research project, we investigate the problem of visualizing large networks. Networks, or graphs, are used to describe relationships between different objects. Graphs are widely used in social networks, roadway systems, and, in general, to describe any system that has interactions among multiple entities. Visualizing relationships through graph drawings is important so that information can be easily comprehended and navigated. Some networks, for instance social networks, can become very large when they represent a large number of entities. In this project, we develop a new multilevel method for visualizing graphs, using existing tools and algorithms for graph drawing. We tested this method on real-world networks from several online repositories, such as the Koblenz Network Collection and the Stanford Large Network Dataset Collection. We evaluated the method and compared it to alternatives. This research tool will allow users to generate multilevel network visualizations for describing systems such as social connections, microorganism relationships, highway systems, large populations, and map topologies.

Introduction
A network is a system of interconnected objects. Graph theory is the mathematical language used to describe networks. It is an old branch of mathematics that dates to 1736, when Leonhard Euler took up the problem of the seven bridges of Königsberg. He showed that there is no route that crosses each bridge exactly once [1].

Since then, graph theory has evolved. Currently, researchers study how networks arise in real-world scenarios and analyze their properties. Graphs are used to model relations in physical, social, biological, and information systems. They are a unifying information abstraction that captures various types of data. Graphs are now widely used on the internet to make sense of large datasets. In 2012, Google announced the Knowledge Graph feature as an addition to their search engine [3].
The idea was to build a massive graph of real-world objects and the connections between them. The Knowledge Graph uses links between documents on the web to understand their semantic context. The graph contains millions of objects and billions of facts connecting them, which it uses to understand the meaning of the keywords entered for a search. Facebook also utilizes a graph-based search engine. They combine data from their very large user base with external data into one search engine that provides user-specific search results.

The amount of data on the internet continues to grow each day. Graphs are used to create network connections that make it easier to understand both the information coming in and the information that is already on the web. With the growth of data, especially on the internet, graphs have become very large. They can contain millions of vertices and billions of connections between them.

Visual representation of networks is an important way of describing the data they represent. Visualization of graphs is done with graph drawing techniques. A graph drawing is a visual representation of the vertices and edges that the graph contains. The typical drawing of a graph consists of shaded circles depicting the vertices and line segments depicting the edges, which connect related vertices. Graph drawing makes the information in the graph legible and navigable. The data within a network can be explored by displaying the vertices and edges in various layouts with attributes such as color, size, and other properties. The display highlights patterns, shows connections, and provides visual information about a vertex. These factors are used to draw conclusions about a dataset in order to solve complex problems. There are many graph drawing techniques that use mathematical algorithms to space out the vertices and edges.

The arc diagram method (see Figure 1) lays out all the vertices evenly on a single line, and the edges are drawn as semicircles that go above or below the line to connect the vertices. The layered drawing method (also shown in Figure 1) places the vertices of a directed graph in horizontal rows, with the edges directed downwards. These methods are well suited to displaying networks with few vertices and edges. However, they are not ideal for drawing larger graphs.

Figure 1: Arc diagram (left) [5] and layered method (right) [4].

The force-directed method (see the example in Figure 2) is a physics-based method that calculates the attractive and repulsive forces between vertices and moves the vertices along the direction of the net force [7]. The process is repeated until the edges are close to equal length and there are as few crossing edges as possible. This method is better suited for displaying clustered graphs. However, the larger the graph, the longer it takes for the vertices to be repositioned.

The spring-electrical model (Figure 3) is a type of force-directed algorithm in which the system is visualized as electrically charged vertices connected by springs [7]. Springs are imagined to be placed between vertices that share an edge. The vertices are pulled together by the springs, while a repulsive electrical force exists among all pairs of vertices. This process is also repeated until the system reaches equilibrium [7].
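To make the spring-electrical idea concrete, the following is a minimal Python sketch, not the implementation used by any of the tools discussed in this report. The force formulas (repulsion k^2/d between all pairs, attraction d^2/k along edges), the cooling factor, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def spring_electrical_layout(vertices, edges, iterations=100, k=1.0, step=0.1, seed=0):
    """Return {vertex: (x, y)} after a fixed number of force iterations.

    vertices: list of hashable vertex ids; edges: list of (u, v) pairs.
    k is the assumed natural spring length; step caps movement per iteration.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in vertices}
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in vertices}
        # Repulsive "electrical" force between every pair of vertices.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attractive "spring" force along each edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each vertex along its net force, by at most `step`.
        for v in vertices:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0.0:
                move = min(length, step)
                pos[v][0] += disp[v][0] / length * move
                pos[v][1] += disp[v][1] / length * move
        step *= 0.95  # cooling: shrink the movement cap so the layout settles
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Two vertices joined by one edge settle at roughly the natural length k.
layout = spring_electrical_layout(["a", "b"], [("a", "b")])
```

The all-pairs repulsion loop is quadratic in the number of vertices, which is consistent with the observation above that repositioning takes longer as the graph grows.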

Figure 2: Force-directed graph drawing technique [6].

Figure 3: Spring-electrical model [7].

The multilevel approach to graph drawing aims to scale very large graphs down to small ones. This is done by taking the edge connections between multiple vertices and separating them into layers. Figure 4 shows a demonstration of the multilevel approach. First, the original graph is broken down into parts, and then new vertices are created, each encapsulating the part it represents. The new vertices are used to construct a smaller graph, which is then displayed. The smaller graph allows easy visualization of the entire graph.
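The construction of the smaller graph can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used in this project: the function name `coarsen` and the `group_of` mapping (which assigns every original vertex to a part) are hypothetical, and how the parts are chosen is left to the caller.

```python
from collections import defaultdict


def coarsen(edges, group_of):
    """Collapse a graph into one new vertex per part.

    edges: iterable of (u, v) pairs from the original graph.
    group_of: dict mapping every original vertex to its part id.
    Returns (members, coarse_edges), where members maps each part id to the
    sorted original vertices it encapsulates, and coarse_edges is a sorted
    list of (part_a, part_b) pairs with part_a < part_b.
    """
    members = defaultdict(list)
    for vertex, part in group_of.items():
        members[part].append(vertex)
    coarse_edges = set()
    for u, v in edges:
        pu, pv = group_of[u], group_of[v]
        if pu != pv:
            # One edge between two parts is enough to link their new vertices.
            coarse_edges.add((min(pu, pv), max(pu, pv)))
    return {part: sorted(vs) for part, vs in members.items()}, sorted(coarse_edges)


# A six-vertex ring split into three parts of two vertices each.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
parts = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
members, coarse_edges = coarsen(ring, parts)
```

Edges inside a part disappear from the smaller graph, while parallel edges between the same two parts collapse into one; a variant could keep a count of collapsed edges as an edge weight.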

Figure 4: Multilevel graph visualization approach.

Visualization of large networks, i.e., graphs with millions of entities or more, is very challenging. This is due to the constraints of screen displays and the limitations of current graph drawing algorithms. To solve this problem, we implement a multilevel approach, where the network is partitioned into smaller graphs that hold different parts of the larger graph, as illustrated in Figure 4.

A network can be partitioned in many different ways. It can be partitioned by labeled categories in the dataset, by weights associated with the vertices, or by a user-defined parameter present in the data. For example, if a dataset consists of a list of interactions between different animals, the data can be partitioned by grouping together animals that belong to the same species. This way, we can visualize a higher-level view in which each species is represented by a new vertex in a smaller graph. We can then navigate to a specific species to view an animal that belongs to that category.

There are many software tools currently used to visualize small graphs. Gephi [2] is a desktop application for interactive visualization and exploration of networks and complex systems. It can be used for social network analysis, exploratory data analysis, and biological network analysis. It provides tools for people to explore and understand graphs through graphical visualization. Sigma Js [9], D3 Js, and Processing Js are browser-based JavaScript libraries that can be used for graph drawing. JavaScript is a dynamic programming language used to develop browser-based applications. These libraries can be used to simplify network visualization in a browser and allow application developers to integrate network exploration.
We chose the Sigma Js library because it is the most lightweight of the three and allows more user interaction with the display. We are creating a web application with a user interface where users can upload a formatted large graph with multiple connections. Sigma Js takes a specific input format in which the vertices and edges are labeled and listed with properties such as color and size. We are developing a PHP script for preprocessing, to reformat the user's input into the format that Sigma Js recognizes. The end goal of this project is to enable users to upload their networks consisting of millions of vertices and billions of edges and visualize them in a multilevel manner.

Methodology
The process begins with a formatted graph that consists of multiple vertices and edges. The graph is split into smaller ones according to their connections. This creates multiple layers holding the different parts of the graph. The formatted description of each vertex of a smaller graph holds the identifier of the lower-level network it represents. When the user wants to navigate to a certain part of the graph, we use the identifier to locate that part and magnify the display onto it. The vertex zoom functionality will be created using JavaScript. A mouse click functionality will also be implemented. The user can use the mouse to navigate through the network by zooming onto specific layers of the graph or directly onto a vertex. The Sigma Js library uses the force-directed method for drawing. The specific plug-in of the library that implements this method is called force atlas. When the user's network is ready for display, the force atlas plug-in is called to calculate the positions of the vertices.
The algorithm positions the vertices so that the edges are close to equal length and crossing edges are reduced as much as possible.

During the first four weeks of the eight-week research term, we worked on creating the user interface and building example networks to display. The goal of the application is to allow users to better visualize and interact with their large networks. The user interface is designed to allow users to move vertices around the screen, zoom in and out of specific items, and display textual information about a vertex. We also added functionality to change the color of the vertices. Most importantly, the user interface comes with a search bar where users can search for particular items. The user interface was designed using HTML, a hypertext markup language used to create the graphical view of a web page. It consists of input boxes and buttons with which the user can interact using a mouse and a keyboard. Using JavaScript, we connected the user's actions to specific aspects of the network display, thereby making the display interactive.

We tested networks of different sizes (small, large, and very large) to analyze the visualization, interactivity, and performance of the displays. We found that Sigma Js can process networks with up to 1000 vertices at an acceptable performance level; when the vertex count exceeds that amount, performance begins to degrade. This finding is acceptable for the multilevel approach we use to solve our problem. If a network with a million vertices is chosen for visualization, it can be scaled down to a network with 1000 vertices, where each vertex holds another network with 1000 vertices.

The last four weeks of the research term were dedicated to partitioning large graphs into their smaller scaled representations. To test the multilevel approach, we chose a network with 1000 vertices and partitioned it into 10 different parts.
We partitioned it numerically by vertex number: 0 to 99, 100 to 199, and so on. We used C++ to write the code for breaking up the larger graph. We wrote the code to follow the format of the datasets downloaded from the large network databases. The different partitions were written to new files, and another file was created with vertices linked to the partitions. The files are in JSON format, which Sigma Js reads to create the display of the vertices and edges.

Findings
We tested many different networks from two main sources, KONECT - The Koblenz Network Collection [10] and the Stanford Large Network Dataset Collection [11]. We also tested many randomly generated graphs with arbitrary sizes and positions. Here are some of the results from displaying the networks using Sigma Js. Figure 5(a) shows a display of a randomly generated graph using Sigma Js. Figure 5(b) shows the same graph with the force-directed plug-in from Sigma Js applied to it. As mentioned before, when the force-directed algorithm is applied, the vertices of the network are moved so that the edges are close to equal length.

Figure 5: a) Randomly generated graph with Sigma Js.

Figure 5: b) Force directed plug-in applied.

Figures 6(a), 6(b), and 6(c) show examples of networks visualized using Sigma Js. These networks were downloaded from the Stanford large network database. The format of the datasets was defined by their creators and therefore had to be converted to the format required by Sigma Js. After careful conversion from the Stanford graph data format to the Sigma Js JSON format, we displayed each graph along with its properties. We also tested the effects of the user interface dialog box on these networks. We found that the vertices responded to the mouse and keyboard actions designed in the program. The vertices move accordingly and change color when the option to change a vertex color is selected through the user interface. Figure 6(a) displays a network with 1000 vertices, Figure 6(b) a network with 5000 vertices, and Figure 6(c) a network with 10,000 vertices. As the displays show, the network becomes cluttered with vertices as its size grows. It becomes very hard to visualize, and it is not easy to interpret the information being conveyed by the graph. It also takes a long time to navigate through the graph to find a specific item.

Figure 6: a) 1000 vertices.
Figure 6: b) 5000 vertices.

Figure 6: c) 10,000 vertices.

The initial screen of the user interface (Figure 7) allows the user to interact directly with the displayed network. Interactivity plays an important role in the visualization of networks, especially when implementing the multilevel approach. The user interface makes the information within the network easily accessible through a navigable display. The display can be repositioned, along with specific items, to show specific parts of the network or to move onto certain vertices. The user interface allows the user to search for specific items within the dataset, change the color of the vertices and edges, and change how the edges are drawn. The user can also fit the network to the screen if they have navigated too far into the display. We fixed the number of iterations of the force-directed algorithm that run when the network is first loaded in the web browser. The user interface has an option for the user to continue iterating through the algorithm to get a better display of the network.

Figure 7: User interface dialog box.

To test the multilevel approach to visualizing large networks, we chose to partition the network from Figure 6(a). Our goal was to partition it into 10 parts and create a new display that links the partitioned parts to the files they are stored in. When Sigma Js is loaded, the new display is drawn onto the screen and supports the same interactivity as the rest of the user interface. The color, position, and style of the vertices in those displays can be changed. We can now navigate to the specific parts of the network we want to display. We added two animations that display either the part of the graph the user wants to navigate to or a specific item. If the user searches for an item in the search box, the first animation zooms in to the part of the graph that the item belongs to. That part of the graph is then loaded onto the screen. The second animation zooms onto the searched item and displays it along with its attributes.

Figure 8 shows the display of the scaled-down version of the test network (Figure 6(a)). The vertices are color-coded to match the colors of the parts of the larger network they are linked to. The graph is displayed with the force-directed algorithm applied to it. Figure 9 shows the different partitions that were created. The display of each part consists of the items that belong to it and the items from other parts that are linked to them. They follow the same color code. If there is at least one connection between two parts of the graph, an edge is drawn in Figure 8 to connect those two parts.

Figure 8: Scaled display result from partitioning network in Figure 6(a).

Figure 9: The different parts of the larger graph the vertices in Figure 8 are linked to.
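The partitioning step behind Figures 8 and 9 can be sketched in Python as follows. This is an illustration under assumptions, not the C++ code written for this project. It assumes a whitespace-separated edge list with integer vertex ids and `#` comment lines, and it assumes a JSON layout with `nodes` and `edges` arrays whose entries carry `id`, `label`, `x`, `y`, `size`, `source`, and `target` fields; the exact fields Sigma Js expects should be checked against its documentation. All function names are hypothetical.

```python
import json
import math


def parse_edge_list(text):
    """Parse 'u v' lines into integer pairs, skipping blanks and '#' comments."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        edges.append((int(u), int(v)))
    return edges


def partition_by_range(edges, part_size):
    """Group edges by vertex-id range: ids 0..part_size-1 form part 0, and so on.

    Returns (parts, overview). parts maps a part number to every edge touching
    that part, so linked items from other parts are included. overview lists
    the pairs of parts that share at least one connection.
    """
    parts = {}
    overview = set()
    for u, v in edges:
        pu, pv = u // part_size, v // part_size
        for p in {pu, pv}:
            parts.setdefault(p, []).append((u, v))
        if pu != pv:
            overview.add((min(pu, pv), max(pu, pv)))
    return parts, sorted(overview)


def to_graph_json(edges):
    """Serialize an edge list as JSON, placing vertices on a circle to start."""
    ids = sorted({v for edge in edges for v in edge})
    nodes = []
    for i, v in enumerate(ids):
        angle = 2 * math.pi * i / len(ids)
        nodes.append({"id": f"n{v}", "label": str(v),
                      "x": math.cos(angle), "y": math.sin(angle), "size": 1})
    links = [{"id": f"e{i}", "source": f"n{u}", "target": f"n{v}"}
             for i, (u, v) in enumerate(edges)]
    return json.dumps({"nodes": nodes, "edges": links})


sample = "# FromNodeId ToNodeId\n0 1\n1 150\n150 151\n"
edges = parse_edge_list(sample)
parts, overview = partition_by_range(edges, part_size=100)
overview_json = to_graph_json(overview)   # the scaled-down display
part0_json = to_graph_json(parts[0])      # one lower-level display
```

In this sketch, vertices without any edge do not appear in the output, and each part's JSON would be written to its own file, with the overview vertices carrying the identifier of the file they link to.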

Discussion
The results of the tests run on network displays using Sigma Js confirmed our assumptions about the multilevel approach. When the network is scaled down to a smaller size than its larger representation, we are able to analyze the larger network much more easily. For our tests, we chose a network consisting of 1000 vertices. Given the results from these tests, we believe that partitioning and visualizing networks with over a million vertices will follow the same process and produce similar results. We have set goals to test our partitioning algorithm on these much larger networks. The next step is to create multiple stages when partitioning the networks. For example, if a network has a million vertices, we can partition it into 1000 different parts, each consisting of 1000 vertices from the larger network. The resulting display consists of 1000 vertices, each linked to one of the 1000 parts of the larger graph, and we can partition it further by splitting it into 10 parts.

We encountered several challenges while conducting this research. The primary concern when designing the application was to do as much processing as possible on the client side and as little as possible on the server side. In web applications, server-side processes are those handled on the computer of the host, and client-side processes are those handled on the computer of the user accessing the application. We aim to perform the partitioning of the graph on the user's end. However, in the current approach, this is done on the server.

We faced another problem with the force-directed plug-in provided by Sigma Js. We saw that in some displays, the algorithm ran continuously without stopping. Some vertices were constantly moving, sometimes back and forth around the same position. We resolved this by iterating through the plug-in only a fixed number of times and then bringing it to a halt for the first display.
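The halting strategy described above can be sketched generically in Python. This is an illustration only and does not use the Sigma Js plug-in: `step_fn` stands in for one iteration of any layout algorithm, and the names `run_layout`, `max_iterations`, and `tolerance` are assumptions. The early exit on small movement is an addition beyond the fixed iteration count described in the text.

```python
def run_layout(step_fn, positions, max_iterations=200, tolerance=1e-3):
    """Apply step_fn repeatedly until vertices barely move or the cap is hit.

    positions: list of (x, y) tuples; step_fn returns a new list of the same
    length. Returns (positions, iterations_used, converged).
    """
    for iteration in range(1, max_iterations + 1):
        new_positions = step_fn(positions)
        largest_move = max(
            (abs(nx - x) + abs(ny - y)
             for (x, y), (nx, ny) in zip(positions, new_positions)),
            default=0.0,
        )
        positions = new_positions
        if largest_move < tolerance:
            return positions, iteration, True
    # The cap guarantees termination even if vertices keep oscillating.
    return positions, max_iterations, False


def damped_step(points):
    """Toy step that settles: every vertex moves halfway to the origin."""
    return [(x * 0.5, y * 0.5) for x, y in points]


def flipping_step(points):
    """Toy step that never settles: every vertex jumps back and forth."""
    return [(-x, y) for x, y in points]


settled = run_layout(damped_step, [(1.0, 0.0)])
capped = run_layout(flipping_step, [(1.0, 0.0)], max_iterations=50)
```

With the flipping step, only the iteration cap stops the loop, which mirrors the back-and-forth movement observed in some displays. Calling `run_layout` again on the returned positions corresponds to the user asking for further iterations.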
As mentioned earlier, we provided an option in the user interface for the user to continue iterating through the plug-in if they want a better display than the one provided.

We are now developing this work further, possibly to include an improved user interface dialog box and parallel partitioning of larger networks. We will test different partitioning algorithms on networks consisting of millions of vertices and billions of edges. The goal is to minimize the time it takes to partition the items in the larger datasets and to display the results. If the partitioning time can be made short enough, we will add a function to the user interface that allows users to partition the network in real time. They will be able to define how they want the data to be separated, in accordance with the format provided, and see the results on the display screen. We will also improve how the force-directed plug-in is iterated so that the first display of the network is satisfactory. We believe that the findings of this research project will greatly benefit those interested in analyzing and interpreting their large datasets through visualization. The end result of the research will be a website that gives users access to the application. The website will allow anyone to upload their datasets and easily visualize and interact with the information they convey. The website will also support multiple dataset formats and will provide guidelines for users to follow so that they upload a correctly formatted document.

References
1. Rhishikesh S. Fansalkar, "Graph Theory Origin and Seven Bridges of Königsberg", New York University, 2007.
2. The Gephi team, Gephi, http://gephi.github.io/, last accessed August 2014.
3. The Google Team, Inside Search, http://www.google.com/insidesearch/features/search/knowledge.html, last accessed August 2014.
4. Graph layout, http://goblin2.sourceforge.net/refman/pageGraphLayout.html, last accessed August 2014.
5. Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, "A Tour Through the Visualization Zoo", http://homes.cs.washington.edu/~jheer/files/zoo/, last accessed August 2014.
6. John Howse, Peter Rodgers, and Gem Stapleton, "VL/HCC Tutorial 2009: Automated Diagram Drawing", http://www.eulerdiagrams.com/tutorial/AutomatedDiagramDrawing.html, last accessed August 2014.
7. Yifan Hu, "Current and Future Challenges in the Visualization of Large Networks", Encyclopedia of Social Network Analysis and Mining, 2013.
8. Yifan Hu, "Efficient, High-Quality Force-Directed Graph Drawing", The Mathematica Journal 10(1), 2006.
9. Alexis Jacomy, Sigma js library, http://sigmajs.org/, last accessed August 2014.
10. Jérôme Kunegis, KONECT - The Koblenz Network Collection, http://konect.uni-koblenz.de/networks/, last accessed August 2014.
11. Jure Leskovec, Stanford Large Network Dataset Collection, http://snap.stanford.edu/data/index.html, last accessed August 2014.
