Distributed Monitoring Tool Design Document

CSE 5306, Spring 2015, Section 1

Team Members:

Nicholas Brent Burns
Sneha Kadam
Tran Hoang-Dung
Anuj Rakheja



Table of Contents

I. Introduction
   a. Primary Focus
   b. Prerequisites
   c. Description
   d. Outcome
   e. Goal
   f. Example
   g. Summary
   h. Organization of report
   i. Revision History
II. Related Work
III. System Overview
   a. Overview of the design
   b. Brief discussion of each component
   c. Interaction between components
IV. Detailed Design
   a. Detailed Design of each component
   b. Challenges & solutions
V. Implementation
   a. Software and Tools to be used
   b. Work dispersion among team
VI. Theoretical/Simulation Study
VII. Future Work
VIII. References



Table of Figures

Figure 1 Task manager in a Windows based operating system
Figure 2 Heap structure to illustrate the management of nodes
Figure 3 Tree structure
Figure 4 The server data structure
Figure 5 Adding the first node
Figure 6 Structure after adding 2 nodes to the system
Figure 7 Adding node number 29
Figure 8 Tree structure after adding node 29
Figure 9 Adding a few more nodes
Figure 10 Deleting node number 29
Figure 11 Sigar output



I. Introduction

A distributed monitoring tool collects system information from all the nodes attached to a server. The nodes/computers are arranged in a hybrid structure built on a basic client-server architecture.

a. Primary Focus

The primary focus of this project is to create a distributed monitoring tool that collects system information from the nodes/computers and sends it to the main node/server. The server then shows the data of each node on its display.

b. Prerequisites

Nodes/computers successfully connected to the server through a wired or wireless network

Latest version of JDK

SIGAR Libraries

OS (Windows, Linux, Mac, Unix)

c. Description

The distributed monitoring tool will help to collect and analyze the different characteristics and properties of nodes/computers. All the nodes will be connected in a hybrid structure. The architecture will be a basic client-server architecture, which lets the server collect the required information about the nodes.

d. Outcome

The outcome of the project will be the information collected from all the nodes/computers connected to the main computer/server. The following information will be collected from each node:

Memory Size

Memory Usage

Network Activity
a. IP Type
b. TCP/UDP
c. Network Speed
d. MAC Address
e. Domain Name

CPU Utilization

CPU Max. Speed

Number of CPU Cores

Disk Space

Disk Usage



Number of Processes

Number of Threads

e. Goal

The final goal of our project is to develop a distributed monitoring tool application that helps a user analyze different aspects of the other nodes/computers in the network.

f. Example

Today we use a task manager to inspect information such as the names and number of running applications and processes, graphs of CPU usage and CPU usage history, physical memory usage history, the total amount of physical memory available, and much more. A task manager, however, is limited to the information of a single computer. The distributed monitoring tool will present this information for every connected node in the network.

FIGURE 1 TASK MANAGER IN A WINDOWS BASED OPERATING SYSTEM

The distributed monitoring tool will gather the same type of information shown here in the Windows Task Manager; the details of how our information will be displayed are explained in the Detailed Design section.



g. Summary

The distributed monitoring tool will help a user analyze various aspects of a distributed system. The tool is designed as a multi-tiered client-server architecture whose nodes are maintained in a tree-based heap structure. There are two types of nodes/computers in the structure: the main computer (main node/server) and the client nodes (all remaining nodes). There can be only one server and at most 31 client nodes. Because the structure is a tree, every node sends its information to its parent node until the information reaches the server/main node. Each new node is added according to the heap property, which keeps the tree balanced for better operations. For example:

FIGURE 2 HEAP STRUCTURE TO ILLUSTRATE THE MANAGEMENT OF NODES

The tool will include the functionality of adding a node when a new computer/node connects to the network and deleting a node when a node leaves the network. To collect the information, SIGAR (System Information Gatherer And Reporter) will be used to gather each computer's system, network, and hardware information. Using the SIGAR libraries and commands is simple because all of the low-level OS commands are already taken care of. For coding we plan to use the Java language and Java's server/client socket programming libraries (the java.net package). As output, the distributed monitoring tool will show the information needed to analyze how resources are being shared in a distributed system.

h. Organization of report

The report opens with this introduction and a survey of related work, followed by the system overview and the detailed design. It then covers the implementation plan, the theoretical/simulation study, and future enhancements to our design.



i. Revision History

The document has been updated to reflect the latest design changes and the modifications suggested by the professor. The suggested modifications are as follows:

Broadcast beacon: The initial design was polling based, i.e. the clients would send their data to the server. The current design also supports broadcast beacon replies, i.e. the server sends a broadcast beacon and the clients then send their data.

Specific location client data: In the initial design there was no location assigned to the client (location such as which building, floor etc.). In the new design, each client is assigned a specific location. When the server asks for location-specific information (such as building 1 floor 1), only the clients in that location will send their system information.

Specific client data: The server can also ask for data from a specific client. This client will then respond with its system data.

System fault tolerance: In the initial proposal, when a node failed, there was no way for its children and the nodes below it to communicate with the server. There was also no way for its parent node to find out that its child had failed. The new design will accommodate both cases.



II. Related Work

Many systems use the same basic technique as our project, i.e. gathering information from the nodes in order to analyze them. Two such systems are described below.

Directory Enabled Policy Management Tool for Intelligent Traffic Management:

Proposed by Vaid, A., Jose, S. Putta, S., Rakoshitz, G. & Alto, P. in California, 2002, Patent No. US 6,502,131 B1. It is a method and a system for monitoring and profiling quality of service within one or more information sources in a computer network. The method includes a step of providing a network of computers, each coupled to the others to form a local area network.

How it is related to “Distributed Monitoring Tool”:

The distributed monitoring tool likewise includes a step that collects information for further analysis, for whatever use is required. The system proposed above uses the same functionality to collect information; however, the network it analyzes is comparatively larger than the one proposed for the distributed monitoring tool, which uses a heap structure to arrange its nodes. Furthermore, the system above uses the results to provide solutions that decrease data latency and increase the bandwidth available to the user.

A Distributed DNS Traffic Monitoring System:

Proposed by Deri, L., Trombacchi, L., Martinelli, M. & Vannozzi, D., in Italy, 2012. It is a system that monitors the authoritative name servers of the .it country code Top Level Domain (ccTLD), continuously observing DNS traffic to identify anomalies, measure performance, and gather usage statistics. These in turn help to understand trends, characterize economic relationships, and track suspicious activities.

How it is related to “Distributed Monitoring Tool”:

The system proposed above is a monitoring tool like the distributed monitoring tool, but it operates at a greater scale, that of DNS servers. It also gets its information (anomalies, performance measurements, and usage statistics) from nodes that lie in a tree structure (DNS is tree structured), similar to the way the distributed monitoring tool collects information such as memory size, memory usage, and network activity. So we can say that A Distributed DNS Traffic Monitoring System is an advanced version of a distributed monitoring tool.

Many other related works exist; we have mentioned only some of them above.



III. System Overview

a. Overview of the design

The architecture of this design is multi-tiered client-server, with the nodes (server & clients) organized in a tree-based heap structure.

There are two types of nodes/computers in this design:

Server (1): Main computer (root) that collects and displays all node system information

Clients (max 31): Merge their own system information with their children's (if any) system information and pass it along to their parent

To simulate multiple clients, several clients can be launched from a single machine/computer (i.e. using multiple threads).

FIGURE 3 TREE STRUCTURE

Leaf nodes have a periodic timer. When it is triggered, each leaf collects its own system information and passes the Sys_Info object to its parent. In this example, the leaf client nodes 25, 24, 23, 27, and 26 initiate the data collection process.

The receiving parent takes its children's Sys_Info objects (1 or 2), combines them with its own Sys_Info into one new object, and passes the merged data along to its parent.

[Figure 3 content: tree rooted at server node 32 with client nodes 31, 30, 29, 28, 27, 26, 25, 24, and 23 below it; the merged Sys_Info payload lists Node 1: Sys Info … through Node N: Sys Info …]



The root node 32 (server) merges its children's Sys_Info objects with its own data and shows all the information on the user's display.
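The bottom-up merge described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name SysInfoMerger, the nested Entry type, and the metrics strings are hypothetical stand-ins for the project's Sys_Info class, whose real fields are not defined in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SysInfoMerger {
    /** One node's entry in a merged report; the metrics string is a placeholder (assumption). */
    static final class Entry {
        final int nodeNumber;
        final String metrics;

        Entry(int nodeNumber, String metrics) {
            this.nodeNumber = nodeNumber;
            this.metrics = metrics;
        }
    }

    /** Put this node's own entry first, then append every entry received from its children. */
    static List<Entry> merge(Entry own, List<List<Entry>> childReports) {
        List<Entry> merged = new ArrayList<>();
        merged.add(own);
        for (List<Entry> report : childReports) {
            merged.addAll(report);
        }
        return Collections.unmodifiableList(merged);
    }

    public static void main(String[] args) {
        // Leaves 25 and 24 have no children, so their reports hold one entry each.
        List<Entry> from25 = merge(new Entry(25, "cpu=12%"), Collections.<List<Entry>>emptyList());
        List<Entry> from24 = merge(new Entry(24, "cpu=40%"), Collections.<List<Entry>>emptyList());
        // Node 29 merges both child reports with its own entry before forwarding upward.
        List<Entry> from29 = merge(new Entry(29, "cpu=7%"), Arrays.asList(from25, from24));
        for (Entry e : from29) {
            System.out.println("Node " + e.nodeNumber + ": " + e.metrics);
        }
    }
}
```

Each report grows by one entry per node it passes through, so the report that reaches the server holds one entry for every active node.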

b. Brief discussion of each component

Server:

The node number of the server will be 32. The server will parse the data received from all the client nodes and show it on the display.

Client nodes:

There will be up to 31 client nodes in the system, arranged in a heap-like tree structure. Each client will collect its data every 30 seconds and transfer it to its parent. The parent node is responsible for appending its own data to the child's data and sending the result to its own parent.

c. Interaction between components

The client & server will be connected to each other over TCP/IP. The port number will be

constant at 50000. A node will follow a set protocol, as described in the subsequent sections, in

order to send its system info to the parent node.
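A minimal sketch of one child-to-parent link over TCP, using only java.net and java.io, might look like the following. The class name ParentLink and the 4-byte length prefix are assumptions made to keep the example short; they are not the packet layout defined in the Detailed Design. The demo binds an ephemeral loopback port instead of 50000 so that it can run on any machine.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

final class ParentLink {
    static final int DEFAULT_PORT = 50000; // fixed port named in the design

    /** Child side: connect to the parent and send one length-prefixed report. */
    static void sendReport(String parentHost, int port, byte[] report) {
        try (Socket socket = new Socket(parentHost, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            out.writeInt(report.length); // assumption: 4-byte length prefix
            out.write(report);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parent side: accept one child connection and read its report. */
    static byte[] receiveReport(ServerSocket listener) {
        try (Socket child = listener.accept();
             DataInputStream in = new DataInputStream(child.getInputStream())) {
            byte[] report = new byte[in.readInt()];
            in.readFully(report);
            return report;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: send a report to ourselves over loopback on an ephemeral port. */
    static byte[] loopbackRoundTrip(byte[] report) {
        try (ServerSocket listener = new ServerSocket(0)) {
            listener.setSoTimeout(5000); // do not wait forever if the child never connects
            int port = listener.getLocalPort();
            Thread child = new Thread(() -> sendReport("127.0.0.1", port, report));
            child.start();
            byte[] received = receiveReport(listener);
            child.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] echoed = loopbackRoundTrip(new byte[] {1, 2, 3});
        System.out.println("received " + echoed.length + " bytes");
    }
}
```

In the real network each node would play both roles: it listens for its children the way receiveReport does and connects to its parent the way sendReport does.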



IV. Detailed Design

a. Detailed Design of each component

The object of each child node will look like:

Node number
IP address
Socket number
Location
Server details:
o Node number
o IP address
o Socket number
Parent details:
o Node number
o IP address
o Socket number
Number of children
Children details:
o Left child node number, IP address & socket number
o Right child node number, IP address & socket number
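One possible Java shape for this per-node object is sketched below. The class and field names (NodeRecord, Endpoint, port) are illustrative assumptions, and the 192.0.2.x addresses are documentation placeholders rather than real hosts.

```java
final class NodeRecord {
    /** Node number, IP address, and socket (port) number of one node. */
    static final class Endpoint {
        final int nodeNumber;
        final String ipAddress;
        final int port;

        Endpoint(int nodeNumber, String ipAddress, int port) {
            this.nodeNumber = nodeNumber;
            this.ipAddress = ipAddress;
            this.port = port;
        }
    }

    final Endpoint self;
    final String location;
    final Endpoint server;
    Endpoint parent;     // null for the server, which has no parent
    Endpoint leftChild;  // null while no left child is attached
    Endpoint rightChild; // null while no right child is attached

    NodeRecord(Endpoint self, String location, Endpoint server, Endpoint parent) {
        this.self = self;
        this.location = location;
        this.server = server;
        this.parent = parent;
    }

    /** Derived from the child fields so the count can never disagree with them. */
    int numberOfChildren() {
        return (leftChild != null ? 1 : 0) + (rightChild != null ? 1 : 0);
    }

    public static void main(String[] args) {
        Endpoint server = new Endpoint(32, "192.0.2.32", 50000);
        NodeRecord node31 = new NodeRecord(new Endpoint(31, "192.0.2.31", 50000), "AA", server, server);
        node31.leftChild = new Endpoint(29, "192.0.2.29", 50000);
        System.out.println("Node " + node31.self.nodeNumber + " has "
                + node31.numberOfChildren() + " child(ren)");
    }
}
```

Computing the number of children from the two child fields, instead of storing it separately, is a design choice of this sketch; the document lists it as its own field.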

Server node details:

The server object will contain the details of each node (i.e. node number, its children, IP addresses etc.). The entire tree structure will look as below:

FIGURE 4 THE SERVER DATA STRUCTURE

Adding a new node:

Initially only the server node, i.e. node number 32, is present in the network. The IP address and the port number on which the TCP communication takes place are fixed.



Adding the first node will happen as below:

New node to be added = X
Server node number = 32

FIGURE 5 ADDING THE FIRST NODE

This will initiate the task of updating the node details of child 31, which will look like:

Node number = 31
IP address = XXX
Location = AA
Parent details:
o Node number = 32
o IP address = XXX
Number of children = 0
Children details:
o Left child node number & IP address = 0
o Right child node number & IP address = 0

The moment the node is added, a data structure is created on the client which will contain data of all the 32 nodes. At the server the details will look like:

Node number = 32
IP address = XXX
Location = AA
Parent details:
o Node number = 0
o IP address = 0
Children details:
o Left child node number & IP address = 31 & XXX, location = AA
o Right child node number & IP address = 0



After the second tier nodes are added, the network will look like:

FIGURE 6 STRUCTURE AFTER ADDING 2 NODES TO THE SYSTEM

Suppose we now need to add node number 29 to the network. The exchange between the different nodes will look like:

FIGURE 7 ADDING NODE NUMBER 29

Node 29 details will look like:

Node number = 29

IP address = XXX

Location = AA



Children details:
o Left child node number & IP address = 0
o Right child node number & IP address = 0

Node 31 details will look like:

Node number = 31

IP address = XXX

Location = AA






At the server the details will look like:

Node number = 32

IP address = XXX

Parent details:
o Node number = 0
o IP address = 0
Children details:
o Left child node number & IP address = 31 & XXX
Left child node number & IP address = 29 & XXX, location = AA
o Right child node number & IP address = 30 & XXX, location = AA

The network tree, after adding node 29 will look like:

FIGURE 8 TREE STRUCTURE AFTER ADDING NODE 29
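The placements in this walkthrough (31 and 30 under the server, 29 as the left child of 31) are consistent with an array-style heap in which node number n occupies heap position 33 - n. That numbering rule is an inference from the figures, not something the design states, so the Java sketch below should be read under that assumption; the class name HeapLayout is hypothetical.

```java
final class HeapLayout {
    static final int SERVER = 32; // the server's node number, and the number of slots

    /** Heap position of a node: the server is position 1, node 31 is 2, node 30 is 3, ... */
    static int positionOf(int nodeNumber) {
        return SERVER + 1 - nodeNumber;
    }

    /** Node number stored at a heap position (inverse of positionOf). */
    static int nodeAt(int position) {
        return SERVER + 1 - position;
    }

    /** Parent node number, or 0 for the server (the document writes 0 for "none"). */
    static int parentOf(int nodeNumber) {
        if (nodeNumber == SERVER) {
            return 0;
        }
        return nodeAt(positionOf(nodeNumber) / 2);
    }

    /** Left child slot, or 0 if that slot lies outside the 32 positions. */
    static int leftChildOf(int nodeNumber) {
        int position = 2 * positionOf(nodeNumber);
        return position <= SERVER ? nodeAt(position) : 0;
    }

    /** Right child slot, or 0 if that slot lies outside the 32 positions. */
    static int rightChildOf(int nodeNumber) {
        int position = 2 * positionOf(nodeNumber) + 1;
        return position <= SERVER ? nodeAt(position) : 0;
    }

    public static void main(String[] args) {
        System.out.println("parent of 29 = " + parentOf(29));          // prints 31
        System.out.println("children of 31 = " + leftChildOf(31)
                + ", " + rightChildOf(31));                              // prints 29, 28
        System.out.println("children of 30 = " + leftChildOf(30)
                + ", " + rightChildOf(30));                              // prints 27, 26
    }
}
```

These methods return slot numbers only; whether a client currently occupies a slot is a separate question that the server's table has to answer.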



After adding a few more nodes, the network will look like:

FIGURE 9 ADDING A FEW MORE NODES

Deleting a node:

Suppose node 29 wants to leave the network. Then the exchange between the server & the nodes will take place as follows:

FIGURE 10 DELETING NODE NUMBER 29



Node 31 details after replacement of 25 will look like:

Node number = 31

IP address = XXX

Location = AA



Children details:
o Left child node number & IP address = 29 & XXX, location = AA
o Right child node number & IP address = 28 & XXX, location = AA

New node 29 details will look like:

Node number = 29

IP address = XXX

Location = AA
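The replacement shown above, where the client that was node 25 takes over slot 29, resembles the usual heap deletion in which the last element fills the vacated position. The Java sketch below assumes that the replacement is always the client in the most recently filled slot; the document only shows the single case of 25 replacing 29, and the class name SlotTable and the address strings are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class SlotTable {
    static final int SERVER = 32;

    // slot (node number) -> client address; slots are filled from 31 downward
    private final Map<Integer, String> clients = new TreeMap<>();

    /** Attach a client to the next free slot and return its node number. */
    int add(String address) {
        int slot = SERVER - 1 - clients.size();
        if (slot < 1) {
            throw new IllegalStateException("network is full (31 clients)");
        }
        clients.put(slot, address);
        return slot;
    }

    /** Remove the client in the given slot; the client in the last filled slot takes it over. */
    void remove(int slot) {
        if (!clients.containsKey(slot)) {
            throw new IllegalArgumentException("no client in slot " + slot);
        }
        int lastSlot = SERVER - clients.size();
        String moved = clients.remove(lastSlot);
        if (lastSlot != slot) {
            clients.put(slot, moved); // overwrite the departing client's entry
        }
    }

    String addressAt(int slot) {
        return clients.get(slot);
    }

    int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        SlotTable table = new SlotTable();
        for (int i = 0; i < 7; i++) {
            table.add("client-" + (31 - i)); // fills slots 31 down to 25
        }
        table.remove(29);
        System.out.println("slot 29 now holds " + table.addressAt(29)); // prints client-25
    }
}
```

After the move, the occupied slots are again a contiguous run from 31 downward, which keeps the tree balanced without renumbering any other node.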




The details of new node 29 will be updated at the server as well.

Send particular location info:

The server can ask for data from specific locations in the tree. The server will contact individual clients to get data from them.

Send particular client info:

The server can also ask for data from a specific client. The server will contact the client based on its IP address & socket number, which are already stored in the client database.

Send broadcast beacon:

Usually the children send data to their respective parent nodes, and the parent nodes combine that data with their own and send it on toward the server periodically. But the server can also ask for the entire network's data. In this case the children will initiate the bottom-up data transfer.
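The location-specific and client-specific requests both reduce to a lookup in the server's table of clients. A minimal Java sketch of those two lookups follows; the class name ClientDirectory, its fields, and the sample addresses are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ClientDirectory {
    /** One row of the server's client table. */
    static final class Client {
        final int nodeNumber;
        final String ipAddress;
        final int port;
        final String location;

        Client(int nodeNumber, String ipAddress, int port, String location) {
            this.nodeNumber = nodeNumber;
            this.ipAddress = ipAddress;
            this.port = port;
            this.location = location;
        }
    }

    private final List<Client> clients = new ArrayList<>();

    void register(Client client) {
        clients.add(client);
    }

    /** Clients the server would contact for a location-specific request. */
    List<Client> inLocation(String location) {
        List<Client> matches = new ArrayList<>();
        for (Client c : clients) {
            if (c.location.equals(location)) {
                matches.add(c);
            }
        }
        return matches;
    }

    /** The single client for a client-specific request, or null if it is not registered. */
    Client byNodeNumber(int nodeNumber) {
        for (Client c : clients) {
            if (c.nodeNumber == nodeNumber) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ClientDirectory directory = new ClientDirectory();
        directory.register(new Client(31, "192.0.2.31", 50000, "building 1 floor 1"));
        directory.register(new Client(30, "192.0.2.30", 50000, "building 1 floor 2"));
        directory.register(new Client(29, "192.0.2.29", 50000, "building 1 floor 1"));
        for (Client c : directory.inLocation("building 1 floor 1")) {
            System.out.println("contact node " + c.nodeNumber + " at " + c.ipAddress + ":" + c.port);
        }
    }
}
```

With at most 31 clients a linear scan is sufficient; an index by location would only matter for a much larger network.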



Protocol Design:

The communication between the children & parent nodes will take place as below over TCP/IP using 50000 as the port number.

Byte # 1: Packet details like request, acknowledgement, delete etc.
Byte # 2: Number of bytes in this packet

Depending on the first byte, the following bytes will vary.

Byte # 3: Number of node details that this packet contains
Byte # 4 to byte # x: Node 1 details
Byte # x to byte # y: Node 2 details
. . .
Byte # z to byte # a: Last node details

System Information Collection Protocol:

As stated in the Implementation section, SIGAR (System Information Gatherer And Reporter) will be used as the tool to gather each computer's system, network, and hardware information. It is capable of gathering the following metrics:

System memory, swap, CPU, load average, uptime, logins

Per-process memory, CPU, credential info, state, arguments, environment, open files

File system detection and metrics

Network interface detection, configuration info and metrics

TCP and UDP connection tables

Network route table

Using the SIGAR libraries and commands is simple since all of the low-level OS commands are already taken care of. SIGAR is broken down into smaller classes that are responsible for different aspects of gathering information about the computer's system. We will use:

Version – Reports the current version of SIGAR used and general OS information

Uptime – Reports the amount of time the OS has been active or awake

CPUinfo – Reports the detailed information about the CPU

Free – Reports memory information (file systems, total, used, and free space, etc.)

Ulimit – Reports the system resource limits (stack size, virtual memory, etc.)

All information can be output in different formats, and we will most likely use either the Arrays.asList format or the pure string format, depending on which is simpler to parse. Here is a sample screenshot of what SIGAR is capable of outputting:



FIGURE 11 SIGAR OUTPUT

SIGAR will be used in our project and system as follows:

1. The network is settled and in “final” form (no more nodes currently being added)
2. All the leaf nodes/computers of the network each have a periodic timer within their client-side code
3. Whenever this timer is triggered they call the SIGAR commands to gather their information
4. This information is packaged within our custom-built class in a rigid format
5. This Sys_Info object is passed along to its parent
6. The receiving parent node waits until it has all of its children's objects (either 1 or 2)
7. Once it has all child information, it combines those objects with its own SIGAR-gathered Sys_Info object and passes the collective data along to its parent
8. This protocol continues until all information reaches the main server node, which then displays all node information on the user's interface
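Steps 6 and 7 above, where a parent holds back until every child has reported, can be sketched in Java as below. The class name ChildReportCollector is hypothetical, and plain strings stand in for Sys_Info entries; the sketch only shows the bookkeeping and leaves the networking and SIGAR calls out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ChildReportCollector {
    private final int expectedChildren;
    // child node number -> entries received from that child in the current round
    private final Map<Integer, List<String>> received = new TreeMap<>();

    ChildReportCollector(int expectedChildren) {
        if (expectedChildren < 0 || expectedChildren > 2) {
            throw new IllegalArgumentException("a node has 0, 1, or 2 children");
        }
        this.expectedChildren = expectedChildren;
    }

    /** Record one child's report; returns true once every expected child has reported. */
    synchronized boolean accept(int childNodeNumber, List<String> report) {
        received.put(childNodeNumber, new ArrayList<>(report));
        return received.size() >= expectedChildren;
    }

    /** Merge the node's own entry with all child reports, then reset for the next round. */
    synchronized List<String> mergeWith(String ownEntry) {
        if (received.size() < expectedChildren) {
            throw new IllegalStateException("still waiting for child reports");
        }
        List<String> merged = new ArrayList<>();
        merged.add(ownEntry);
        for (List<String> report : received.values()) {
            merged.addAll(report);
        }
        received.clear();
        return merged;
    }

    public static void main(String[] args) {
        ChildReportCollector node29 = new ChildReportCollector(2);
        List<String> from25 = new ArrayList<>();
        from25.add("Node 25: ...");
        List<String> from24 = new ArrayList<>();
        from24.add("Node 24: ...");
        System.out.println("ready after first child? " + node29.accept(25, from25));  // false
        System.out.println("ready after second child? " + node29.accept(24, from24)); // true
        System.out.println(node29.mergeWith("Node 29: ..."));
    }
}
```

The methods are synchronized because each child connection is expected to run in its own thread, so two reports may arrive at the same time.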

Synchronization is crucial for this scheme: a node must not pass information along until the information of all nodes below it has been gathered. Each node must be capable of parsing the data regardless of its size or the number of nodes it describes, which is why our Sys_Info object/class will need to have the same structure across ALL nodes.

Output Display Format:

Figure 11 shows all the information that SIGAR can output by default. We plan for the server node to have a simple JPanel-based window with a tab for each of the 32 possible nodes (for example, using a JTabbedPane). When the user clicks on a tab, the corresponding node's information is displayed in a non-editable text window similar to Figure 11; however, not all of the SIGAR information will be used. We will only display the metrics defined in the Outcome section of the Introduction. If the user selects the tab of a node that is not currently active within the network, no information is displayed except the message “Node # is not active.” The JPanel will also have a location entry button for entering a new node's location.

Client side JPanel:

The client side will also have a simple JPanel where a new client can connect to the server by entering the IP address of the server. Clicking on the connect button will establish the initial TCP/IP connection between the client & server.

b. Challenges & solutions

Parsing data sent between nodes correctly
o Every time data is sent upward a level, the Sys_Info object grows, because each node adds its own information until the object finally reaches the main Server Node. We must ensure that our parsing routine can detect the number of nodes the object contains data for and act accordingly.

Ensuring correct and coherent table-keeping for node arrangement in the network
o The main Server Node must keep an up-to-date table of where each and every active node in the network is located, even when nodes are added or deleted arbitrarily during runtime.

Guaranteeing synchronization of child node reporting to parent node
o Since a parent node may have two children, it may have to accept two Sys_Info objects as input. Correct synchronization means the parent receives the information of both children (if applicable) and adds its own information BEFORE passing the result along to its parent. Forwarding prematurely would lose data.

Keeping accurate and updated IP and port information amongst the clients
o This is crucial when adding or deleting nodes from the network. Each node class/type will need variables that store its own information locally so that any other node may request that information at any time. It must also store IP and port information about its parent and children.

Ensuring no deadlocks occur during the threading of clients
o Since there is no direct “client-to-client” communication, we must treat each link in the network as a small client-server relationship, due to the limitations and functionality of Java's java.net libraries. Since we plan on using just a single port number for all nodes, each client must exist within its own thread so as not to interfere with communication from other nodes.
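For the first challenge, a parsing routine can rely on the node count in byte 3 of the packet layout from the Protocol Design. The Java sketch below follows that layout for the three header bytes. Everything after the header is an assumption, since the document does not define how one node's details are encoded: here each node is written as one byte of node number, one byte of length, and UTF-8 text. The class name ReportPacket and the type code 1 are likewise hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReportPacket {
    static final int TYPE_REPORT = 1; // hypothetical code for a data report

    /** Byte 1: packet type. Byte 2: total packet length. Byte 3: node count. Then node details. */
    static byte[] encode(int type, Map<Integer, String> detailsByNode) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : detailsByNode.entrySet()) {
            byte[] text = e.getValue().getBytes(StandardCharsets.UTF_8);
            if (e.getKey() < 0 || e.getKey() > 255 || text.length > 255) {
                throw new IllegalArgumentException("node number and detail length must fit in one byte");
            }
            body.write(e.getKey());          // assumed: node number in one byte
            body.write(text.length);         // assumed: detail length in one byte
            body.write(text, 0, text.length);
        }
        int total = 3 + body.size();
        if (total > 255) {
            // A single length byte cannot describe more than 255 bytes.
            throw new IllegalArgumentException("packet exceeds what one length byte can describe");
        }
        ByteArrayOutputStream packet = new ByteArrayOutputStream();
        packet.write(type);
        packet.write(total);
        packet.write(detailsByNode.size());
        packet.write(body.toByteArray(), 0, body.size());
        return packet.toByteArray();
    }

    /** Read the node count from byte 3 and walk exactly that many node records. */
    static Map<Integer, String> decode(byte[] packet) {
        if (packet.length < 3 || (packet[1] & 0xFF) != packet.length) {
            throw new IllegalArgumentException("length byte does not match packet size");
        }
        int count = packet[2] & 0xFF;
        Map<Integer, String> details = new LinkedHashMap<>();
        int pos = 3;
        for (int i = 0; i < count; i++) {
            int node = packet[pos] & 0xFF;
            int length = packet[pos + 1] & 0xFF;
            details.put(node, new String(packet, pos + 2, length, StandardCharsets.UTF_8));
            pos += 2 + length;
        }
        return details;
    }

    public static void main(String[] args) {
        Map<Integer, String> report = new LinkedHashMap<>();
        report.put(29, "mem=4096");
        report.put(25, "mem=2048");
        byte[] packet = encode(TYPE_REPORT, report);
        System.out.println(packet.length + " bytes, decoded: " + decode(packet));
    }
}
```

One consequence worth noting: with the packet length held in a single byte, a packet cannot exceed 255 bytes, so a report covering all 32 nodes would probably need a wider length field or several packets.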



V. Implementation

a. Software and Tools to be used

Language: Java (using at least JDK 8u40)

Java's Server/Client Socket Programming libraries (the java.net package)

o TCP/IP scheme

SIGAR (System Information Gatherer And Reporter)

o Used for collecting each computer’s system information such as…

System memory, swap, CPU, load average, uptime, logins

Per-process memory, CPU, credential info, state, arguments,

environment, open files

File system detection and metrics

Network interface detection, configuration info and metrics

TCP and UDP connection tables

Network route table

o Provides a single API capable of working with all the popular operating systems on the market (Windows, Linux, Solaris, Mac OS X, etc.)

o Its core is implemented in C but has bindings in other languages (Java will be used for this project)

o Link: https://support.hyperic.com/display/SIGAR/Home;jsessionid=FA291DA9FC6FE724352ABF54681B80AD

b. Work dispersion among team

A shared DropBox folder is used to share general project files, documents, notes, lectures,

papers, etc. A common repository will be used to share project code and files.

This project can be broken down into the following areas:

1. Gathering system information, packaging efficiently, and passing to another node

Nicholas, Sneha

2. Programming for the main server node parsing all information to display to the user

Tran, Sneha

3. Server node programming

Anuj, Nicholas

4. Client node programming

Anuj, Nicholas

5. General socket communication between nodes

Sneha, Tran

6. Overall architecture and structure of the network

All


VI. Theoretical/Simulation Study

From the design method above, it is easy to see that the nodes in the distributed monitoring system have different storage capacity requirements. Nodes higher up in the heap have to combine the information of their children with their own information and then transfer all of it to the next level. Thus, they have to both store and transfer a larger amount of information than their children. Consequently, the requirements for storage capacity and computational ability increase from the leaves to the root of the distributed monitoring system.

This becomes a problem when the design method is applied to a distributed monitoring system with a very large number of nodes, because the storage capacity and computational ability requirements are unbalanced between nodes. Therefore, the scalability of the proposed design method may be limited to a fairly small number of nodes. To balance storage capacity and computational ability between nodes, and thus increase the scalability of the distributed monitoring system, a super-peer architecture among the low-level nodes should be combined with the client-server architecture among the high-level nodes.
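The imbalance can be made concrete by counting how many Sys_Info entries each node has to hold and forward, which equals the size of its subtree. The Java sketch below computes this for a complete binary tree addressed by heap position (position 1 is the root); the class name SubtreeLoad is hypothetical, and the count assumes one entry per node.

```java
final class SubtreeLoad {
    /** Number of nodes in the subtree rooted at the given heap position when nodeCount positions are filled. */
    static int subtreeSize(int position, int nodeCount) {
        if (position > nodeCount) {
            return 0;
        }
        return 1 + subtreeSize(2 * position, nodeCount) + subtreeSize(2 * position + 1, nodeCount);
    }

    public static void main(String[] args) {
        int nodeCount = 32; // one server plus 31 clients
        System.out.println("root (server):     " + subtreeSize(1, nodeCount));  // 32
        System.out.println("server's left:     " + subtreeSize(2, nodeCount));  // 16
        System.out.println("server's right:    " + subtreeSize(3, nodeCount));  // 15
        System.out.println("third-tier node:   " + subtreeSize(4, nodeCount));  // 8
        System.out.println("deepest leaf:      " + subtreeSize(32, nodeCount)); // 1
    }
}
```

With a full network of 32 nodes, the two nodes directly below the server each forward about half of all entries while a leaf forwards one, and that gap roughly doubles with every extra level of the tree.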



VII. Future Work

We have proposed only one basic operating mode for the distributed monitoring system, one that allows the server to periodically collect all the information of its children. To improve the operating flexibility of the system, we would like to add two more operating modes to the tool in the future.

Allow the system to operate in a request-answer style. That is, the server may ask any node to send only its own information. This option spares the server from processing a large amount of information from nodes it does not currently care about. Additionally, this operating mode may help to reduce the communication load across the entire system.

Allow the server to find the most suitable node for a specific task. When a user wants a node to do a specific task, the user asks the server to find the most suitable node. The server will send a request to its children to determine which node has the optimal resources required for the task. This can be seen as a request for optimal information in specific cases.

In order to deal with a large number of nodes in the future, we would like to find an optimal design architecture for the distributed monitoring system. Based on the theoretical analysis in Section VI, the combination of super-peer and client-server architecture may be a good choice. However, it is certainly more complicated to implement than the proposed method.

Last but not least, we have to deal with one important communication situation in the project: a node suddenly failing (for example node 29 in Section IV). In the project we simply assume that before a node leaves, it informs the server and gets the server's permission. After that, the server reconfigures the heap architecture and the system works as usual again. However, a node can also suddenly lose its connection to both its children and its parent. In that situation, its parent should inform the server that one of its children is gone and ask the server to reconfigure the network structure.
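One way a parent could notice a silently failed child is to track when each child last reported and flag any child that has been quiet for too long. The Java sketch below assumes the 30-second reporting interval from the System Overview and, as a further assumption, tolerates two missed intervals before suspecting a failure. The class name ChildLiveness is hypothetical, and time is passed in explicitly so that the logic can be exercised without waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ChildLiveness {
    static final long REPORT_INTERVAL_MS = 30_000; // clients report every 30 seconds
    static final int MISSED_INTERVALS_ALLOWED = 2; // assumption: tolerance before suspecting failure

    // child node number -> time of its most recent report, in milliseconds
    private final Map<Integer, Long> lastSeenMs = new TreeMap<>();

    /** Called whenever a report from the child arrives. */
    void recordReport(int childNodeNumber, long nowMs) {
        lastSeenMs.put(childNodeNumber, nowMs);
    }

    /** Children that have been silent for longer than the allowed window. */
    List<Integer> suspectedFailures(long nowMs) {
        long limit = REPORT_INTERVAL_MS * MISSED_INTERVALS_ALLOWED;
        List<Integer> suspects = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastSeenMs.entrySet()) {
            if (nowMs - e.getValue() > limit) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        ChildLiveness node31 = new ChildLiveness();
        node31.recordReport(29, 0);      // node 29 last reported at t = 0 s
        node31.recordReport(28, 50_000); // node 28 last reported at t = 50 s
        // At t = 70 s node 29 has been silent for 70 s, which exceeds the 60 s window.
        System.out.println("suspected at 70 s: " + node31.suspectedFailures(70_000)); // [29]
    }
}
```

The parent would then send the suspect list to the server, which can reconfigure the heap in the same way it does for an announced departure.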



VIII. References

Deri, L., Trombacchi, L., Martinelli, M. & Vannozzi, D. (2012). A Distributed DNS Traffic Monitoring System. IEEE, 1-6.

Vaid, A., Jose, S. Putta, S., Rakoshitz, G. & Alto, P. (2002). Directory Enabled Policy Management Tool for Intelligent Traffic Management. Patent No. US 6,502,131 B1, 1-37.

SIGAR (System Information Gatherer And Reporter). https://support.hyperic.com/display/SIGAR/Home;jsessionid=FA291DA9FC6FE724352ABF54681B80AD


