Distributed Monitoring Tool Design Document CSE 5306
Distributed Monitoring Tool Design Document Spring 2015 Section 1
Team Members:
Nicholas Brent Burns
Sneha Kadam
Tran Hoang-Dung
Anuj Rakheja
Table of Contents
I. Introduction
a. Primary Focus
b. Prerequisites
c. Description
d. Outcome
e. Goal
f. Example
g. Summary
h. Organization of report
i. Revision History
II. Related Work
III. System Overview
a. Overview of the design
b. Brief discussion of each component
c. Interaction between components
IV. Detailed Design
a. Detailed Design of each component
b. Challenges & solutions
V. Implementation
a. Software and Tools to be used
b. Work dispersion among team
VI. Theoretical/Simulation Study
VII. Future Work
VIII. References
Table of Figures
Figure 1 Task manager in a Windows-based operating system
Figure 2 Heap structure to illustrate the management of nodes
Figure 3 Tree structure
Figure 4 The server data structure
Figure 5 Adding the first node
Figure 6 Structure after adding 2 nodes to the system
Figure 7 Adding node number 29
Figure 8 Tree structure after adding node 29
Figure 9 Adding a few more nodes
Figure 10 Deleting node number 29
Figure 11 SIGAR output
I. Introduction
A distributed monitoring tool collects all kinds of information from all the nodes attached to a server. The nodes/computers will be organized in a hybrid structure built on a basic client-server architecture.
a. Primary Focus
The primary focus of this project is to create a distributed monitoring tool that collects all kinds of system information about the nodes/computers and sends it to the main node/server. The server then displays the data of the nodes.
b. Prerequisites
Nodes/computers successfully connected to the server through a wired or wireless network.
Latest version of JDK
SIGAR Libraries
OS (Windows, Linux, Mac, Unix)
c. Description
The distributed monitoring tool will help to collect and analyze the different characteristics and properties of nodes/computers. All the nodes will be connected in a hybrid structure, and the architecture will be a basic client-server architecture, which lets the server collect the required information about the nodes.
d. Outcome
The outcome of the project will be the information collected from all the nodes/computers connected to the main computer/server. The following information will be collected from the nodes:
Memory Size
Memory Usage
Network Activity
  a. IP Type
  b. TCP/UDP
  c. Network Speed
  d. MAC Address
  e. Domain Name
CPU Utilization
CPU Max. Speed
Number of CPU Cores
Disk Space
Disk Usage
Number of Processes
Number of Threads
e. Goal
The final goal of our project is to develop a distributed monitoring tool application that helps a user analyze different aspects of the other nodes/computers in the network.
f. Example
Currently we use Task Manager to analyze information such as the name and number of applications running, the name and number of processes running, graphs of CPU usage and CPU usage history, physical memory usage history, the total size of physical memory available, and more. However, Task Manager is limited to a single computer's information, whereas the distributed monitoring tool will represent and provide information about every connected node in the network.
FIGURE 1 TASK MANAGER IN A WINDOWS BASED OPERATING SYSTEM
The distributed monitoring tool will gather the same type of information shown here in the Windows Task Manager; how our information will be displayed is explained in the Detailed Design section.
g. Summary
The distributed monitoring tool will help a user analyze the various aspects of a distributed system. The tool is designed as a multi-tiered “Client-Server” architecture that maintains a tree-based heap structure. There are two types of nodes/computers in the structure: the main computer/main node/server, and the client nodes/remaining nodes. There can be only one server and at most 31 client nodes. Because it is a tree structure, every node sends its information to its parent node until the information reaches the server/main node. Each newly added node is placed according to the heap property, which keeps the tree balanced for better operation. For example:
FIGURE 2 HEAP STRUCTURE TO ILLUSTRATE THE MANAGEMENT OF NODES
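The balanced tree in Figure 2 (and Figure 3 in the System Overview) appears to follow ordinary array-heap numbering. The sketch below assumes, as an inference from the figures rather than a stated rule, that the server (node 32) sits at heap index 1 and node numbers count down as the index grows. Under that mapping the leaves of a 10-node tree are 27, 26, 25, 24 and 23, which matches the leaf nodes named for Figure 3. HeapLayout and its methods are hypothetical names.

```java
// Hypothetical helper: maps node numbers to positions in an array-backed heap,
// assuming node 32 (the server) is heap index 1 and numbers decrease as index grows.
final class HeapLayout {
    static final int SERVER = 32;     // root node number
    static final int MAX_NODES = 32;  // 1 server + 31 clients

    // Heap index (1-based) of a node number, e.g. 32 -> 1, 31 -> 2.
    static int indexOf(int node) {
        if (node < 1 || node > MAX_NODES) throw new IllegalArgumentException("bad node " + node);
        return SERVER + 1 - node;
    }

    // Node number stored at a heap index.
    static int nodeAt(int index) {
        return SERVER + 1 - index;
    }

    // Parent node number, or 0 for the server (as in "Parent node number = 0").
    static int parentOf(int node) {
        int i = indexOf(node);
        return i == 1 ? 0 : nodeAt(i / 2);
    }

    // Left child slot (heap index 2i), or 0 if it lies outside the 32-node tree.
    static int leftChildOf(int node) {
        int c = 2 * indexOf(node);
        return c <= MAX_NODES ? nodeAt(c) : 0;
    }

    // Right child slot (heap index 2i + 1), or 0 if it lies outside the 32-node tree.
    static int rightChildOf(int node) {
        int c = 2 * indexOf(node) + 1;
        return c <= MAX_NODES ? nodeAt(c) : 0;
    }
}
```

Because parent and child positions follow from the node number alone, a node can work out its parent's number without asking the server; the server's table is still needed to map those numbers to IP addresses and socket numbers.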
It will include the functionality of adding a new node when a new computer/node connects to the network, and deleting a node when a node leaves the network. To collect the information, SIGAR (System Information Gatherer and Reporter) will be used to gather each computer’s system, network, and hardware information, because using the SIGAR libraries and commands is extremely simple: all of the low-level OS commands are already taken care of. For coding, we plan to use the Java language and Java’s Server/Client Socket Programming libraries (the java.net package). All in all, the distributed monitoring tool will output the information required to analyze how resources are being shared in a distributed system.
h. Organization of report
This report begins with the introduction, followed by related work in this field, a system overview, and the detailed design. It then covers the implementation, the theoretical/simulation study, and future enhancements to our design.
i. Revision History
This document has been updated for the latest design changes, including the following modifications suggested by the professor:
Broadcast beacon: The initial design was polling-based, i.e., the clients would send their data to the server periodically. The current design also supports broadcast beacon replies, i.e., the server sends a broadcast beacon and then the clients send their data.
Specific location client data: In the initial design, no location (such as building or floor) was assigned to a client. In the new design, each client is assigned a specific location. When the server asks for location-specific information (such as building 1 floor 1), only the clients in that location will send their system information.
Specific client data: The server can also ask for data from a specific client. This client will then respond with its system data.
System fault tolerance: In the initial proposal, when a node failed, there was no way for its children and the nodes below it to communicate with the server, and no way for its parent node to find out that its child had failed. The new design will accommodate this change.
II. Related Work
Many systems use the same basic technique as our project, i.e., gathering information from nodes in order to analyze it. Two of these systems are:
Directory Enabled Policy Management Tool for Intelligent Traffic Management:
Proposed by Vaid, A., Jose, S. Putta, S., Rakoshitz, G. & Alto, P. in California, 2002, Patent No. US 6,502,131 B1. It is a method and a system for monitoring and profiling quality of service within one or more information sources in a computer network. The method includes a step of providing a network of computers, each coupled to the others to form a local area network.
How it is related to “Distributed Monitoring Tool”:
The distributed monitoring tool likewise collects information for further analysis, for whatever purpose is required. The system above uses the same approach to collect information; however, the network it analyzes is larger than the one targeted by the distributed monitoring tool, which uses a heap structure to arrange its nodes. Furthermore, the system above uses its results to reduce data latency and increase the bandwidth available to the user.
A Distributed DNS Traffic Monitoring System:
Proposed by Deri, L., Trombacchi, L., Martinelli, M. & Vannozzi, D., in Italy, 2012. This system monitors the authoritative name servers of the .it country code Top Level Domain (ccTLD), continuously watching DNS traffic in order to identify anomalies, measure performance, and gather usage statistics. These in turn help to understand trends, characterize economic relationships, and track suspicious activities.
How it is related to “Distributed Monitoring Tool”:
The DNS traffic monitoring system is, like the distributed monitoring tool, a monitoring system, but it operates at a much greater scale. It also gathers information from nodes arranged in a tree structure (DNS is tree-structured), such as anomalies, performance measurements, and usage statistics, just as the distributed monitoring tool collects information such as memory size, memory usage, and network activity. The Distributed DNS Traffic Monitoring System can therefore be seen as a more advanced version of a distributed monitoring tool.
Many other related works exist; the two systems above are representative examples.
III. System Overview
a. Overview of the design
The architecture of this design is a multi-tiered “Client-Server” architecture, with the nodes (server and clients) organized in a tree-based heap structure.
There are two types of nodes/computers in this design:
Server (1): Main computer (root) that collects and displays all node system information
Clients (max 31): Merge their own system information with their children’s (if any)
system information and pass along to their parent
To simulate multiple clients, they can be run on a single machine/computer (i.e., using multiple threads).
FIGURE 3 TREE STRUCTURE
Leaf nodes have a periodic timer; when it is triggered, they collect their own system information and pass the Sys_Info object to their parent. In this example, the leaf client nodes 25, 24, 23, 27, and 26 initiate the data collection process.
The receiving parent combines its children’s Sys_Info objects (1 or 2) with its own Sys_Info into one new object and passes the merged data along to its parent.
[Figure 3 diagram: server node 32 at the root with client nodes 31 down to 23 below it; the server display lists “Node 1: Sys Info … Node N: Sys Info”.]
The root node 32 (server) merges its children’s Sys_Info objects with its own data and displays all the information to the user.
b. Brief discussion of each component
Server:
The node number of the server will be 32. The server will parse the data received from all the client nodes and display it to the user.
Client nodes:
There will be up to 31 client nodes in the system, arranged in a heap-like tree structure. Each client will collect its data every 30 seconds and transfer it to its parent. A parent node is responsible for appending its own data to its children's data and sending the result to its parent.
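The 30-second collection cycle could be driven by a scheduled executor from java.util.concurrent. A minimal sketch, assuming each client wraps its collect-and-send step in a Runnable; CollectionTimer is a hypothetical name, and only the 30-second period comes from the design.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a client's collect-and-send step on a fixed period.
final class CollectionTimer implements AutoCloseable {
    static final long DEFAULT_PERIOD_MS = 30_000; // "every 30 seconds" from the design

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task once immediately, then once per period until close() is called.
    CollectionTimer(Runnable collectAndSend, long periodMs) {
        scheduler.scheduleAtFixedRate(collectAndSend, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // cancels future runs and interrupts a running one
    }
}
```

A client would create it with `new CollectionTimer(task, CollectionTimer.DEFAULT_PERIOD_MS)`. Note that `scheduleAtFixedRate` stops scheduling further runs if the task throws, so the collect-and-send step should catch and log its own exceptions.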
c. Interaction between components
The clients and the server will be connected to each other over TCP/IP, using the constant port number 50000. A node will follow a set protocol, described in the subsequent sections, to send its system information to its parent node.
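The packet layout this protocol uses (given under Protocol Design in the Detailed Design section) can be sketched as an encoder and decoder. This is a minimal sketch under stated assumptions: the type codes for byte 1 are hypothetical, and each node record is assumed to be [node number][record length][UTF-8 text], since the design leaves the record format open. Packet is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical encoder/decoder for the Protocol Design packet layout.
final class Packet {
    // Hypothetical codes for byte 1 (the design names request, acknowledgement, delete).
    static final byte REQUEST = 1, ACK = 2, DELETE = 3, DATA = 4;

    // Builds [type][total length][record count], then per node [number][length][text].
    static byte[] encode(byte type, Map<Integer, String> detailsByNode) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : detailsByNode.entrySet()) {
            byte[] text = e.getValue().getBytes(StandardCharsets.UTF_8);
            if (text.length > 255) throw new IllegalArgumentException("record too long");
            body.write(e.getKey());   // node number, 1 to 32
            body.write(text.length);  // record length
            body.write(text, 0, text.length);
        }
        int total = 3 + body.size();
        if (total > 255) throw new IllegalArgumentException("exceeds the 1-byte length field");
        ByteArrayOutputStream packet = new ByteArrayOutputStream();
        packet.write(type);
        packet.write(total);
        packet.write(detailsByNode.size());
        packet.write(body.toByteArray(), 0, body.size());
        return packet.toByteArray();
    }

    // Uses byte 3 to learn how many node records follow, however many were merged.
    static Map<Integer, String> decode(byte[] packet) {
        if (packet.length < 3 || (packet[1] & 0xFF) != packet.length) {
            throw new IllegalArgumentException("bad length field");
        }
        int count = packet[2] & 0xFF;
        Map<Integer, String> details = new LinkedHashMap<>();
        int pos = 3;
        for (int i = 0; i < count; i++) {
            int node = packet[pos] & 0xFF;
            int len = packet[pos + 1] & 0xFF;
            details.put(node, new String(packet, pos + 2, len, StandardCharsets.UTF_8));
            pos += 2 + len;
        }
        return details;
    }
}
```

Reading the record count from byte 3 is what lets every node parse a packet regardless of how many descendants have been merged into it. A single length byte, however, caps a whole packet at 255 bytes, which the merged data near the server would quickly exceed; a wider length field would likely be needed in practice.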
IV. Detailed Design
a. Detailed Design of each component
Each client node object will contain:
Node number
IP address
Socket number
Location
Server details:
  o Node number
  o IP address
  o Socket number
Parent details:
  o Node number
  o IP address
  o Socket number
Number of children
Children details:
  o Left child node number, IP address & socket number
  o Right child node number, IP address & socket number
Server node details:
The server object will contain the details of each node (e.g., node number, its children, IP addresses, etc.). The entire tree structure will look as below:
FIGURE 4 THE SERVER DATA STRUCTURE
Adding a new node:
Initially, only the server node (node number 32) is present in the network. The IP address and the port number on which the TCP communication takes place will be fixed.
Adding the first node happens as below:
New node to be added = X
Server node number = 32
FIGURE 5 ADDING THE FIRST NODE
This initiates the task of filling in the node details of the new child, node 31, which will look like:
Node number = 31
IP address = XXX
Location = AA
Parent details:
  o Node number = 32
  o IP address = XXX
Number of children = 0
Children details:
  o Left child node number & IP address = 0
  o Right child node number & IP address = 0
As soon as the node is added, the server updates its data structure, which holds the data of all 32 possible nodes. At the server, the details will look like:
Node number = 32
IP address = XXX
Location = AA
Parent details:
  o Node number = 0
  o IP address = 0
Number of children = 1
Children details:
  o Left child node number & IP address = 31 & XXX, location = AA
  o Right child node number & IP address = 0
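The join steps above amount to appending each new client at the next free heap slot. A minimal sketch of the server's table, assuming the heap numbering inferred from Figures 2 and 3 (server 32 at index 1); NodeTable and NodeEntry are hypothetical names, and a real entry would also hold the socket number and child details listed earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side table; heap index i is stored at list position i - 1.
final class NodeTable {
    static final int SERVER = 32;

    static final class NodeEntry {
        final int number;
        final String ip;
        final String location;

        NodeEntry(int number, String ip, String location) {
            this.number = number;
            this.ip = ip;
            this.location = location;
        }
    }

    private final List<NodeEntry> heap = new ArrayList<>();

    NodeTable(String serverIp, String serverLocation) {
        heap.add(new NodeEntry(SERVER, serverIp, serverLocation)); // heap index 1
    }

    // A joining client takes the next free heap slot, which keeps the tree balanced.
    NodeEntry add(String ip, String location) {
        if (heap.size() == SERVER) throw new IllegalStateException("31 clients already joined");
        int index = heap.size() + 1;
        NodeEntry entry = new NodeEntry(SERVER + 1 - index, ip, location);
        heap.add(entry);
        return entry;
    }

    // Parent's node number (0 for the server, as in the node details above).
    static int parentOf(int number) {
        int index = SERVER + 1 - number;
        return index == 1 ? 0 : SERVER + 1 - index / 2;
    }

    // Number of active children: heap slots 2i and 2i + 1 that are filled.
    int childCount(int number) {
        int index = SERVER + 1 - number;
        int count = 0;
        if (2 * index <= heap.size()) count++;
        if (2 * index + 1 <= heap.size()) count++;
        return count;
    }
}
```

With this rule the first client becomes node 31 under the server, and after three joins node 32 has two children (31 and 30) while node 31 has one (29).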
After the second-tier nodes are added, the network will look like:
FIGURE 6 STRUCTURE AFTER ADDING 2 NODES TO THE SYSTEM
Now suppose node number 29 is added to the network. The exchange between the different nodes will look like:
FIGURE 7 ADDING NODE NUMBER 29
Node 29 details will look like:
Node number = 29
IP address = XXX
Location = AA
Parent details:
  o Node number = 31
  o IP address = XXX
Number of children = 0
Children details:
  o Left child node number & IP address = 0
  o Right child node number & IP address = 0
Node 31 details will look like:
Node number = 31
IP address = XXX
Location = AA
Parent details:
  o Node number = 32
  o IP address = XXX
Number of children = 1
Children details:
  o Left child node number & IP address = 29 & XXX, location = AA
  o Right child node number & IP address = 0
At the server the details will look like:
Node number = 32
IP address = XXX
Parent details:
  o Node number = 0
  o IP address = 0
Number of children = 2
Children details:
  o Left child node number & IP address = 31 & XXX, location = AA
  o Right child node number & IP address = 30 & XXX, location = AA
The server's table also records node 29 (IP address XXX, location = AA) as the left child of node 31.
The network tree, after adding node 29 will look like:
FIGURE 8 TREE STRUCTURE AFTER ADDING NODE 29
After adding a few more nodes, the network will look like:
FIGURE 9 ADDING A FEW MORE NODES
Deleting a node:
Suppose node 29 wants to leave the network. The exchange between the server and the nodes will then take place as follows:
FIGURE 10 DELETING NODE NUMBER 29
After node 25 replaces node 29, the details of node 31 will look like:
Node number = 31
IP address = XXX
Location = AA
Parent details:
  o Node number = 32
  o IP address = XXX
Number of children = 2
Children details:
  o Left child node number & IP address = 29 & XXX, location = AA
  o Right child node number & IP address = 28 & XXX, location = AA
The details of the new node 29 (formerly node 25) will look like:
Node number = 29
IP address = XXX
Location = AA
Parent details:
  o Node number = 31
  o IP address = XXX
Number of children = 0
Children details:
  o Left child node number & IP address = 0
  o Right child node number & IP address = 0
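The deletion example follows the usual heap removal: the last node in the tree (node 25 here) moves into the leaving node's slot and takes over its number, so the tree stays balanced. A minimal sketch, assuming the server keeps one IP address per heap slot with node 32 at index 1; NodeRemoval is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side removal; ips.get(i - 1) is the IP at heap index i.
final class NodeRemoval {
    static final int SERVER = 32;
    private final List<String> ips = new ArrayList<>();

    NodeRemoval(List<String> ipsInHeapOrder) {
        ips.addAll(ipsInHeapOrder); // index 1 (list position 0) is the server
    }

    static int indexOf(int number) {
        return SERVER + 1 - number;
    }

    // The last node moves into the leaving node's slot and inherits its number.
    // Returns the moved node's old number, or -1 if the leaving node was itself last.
    int remove(int leaving) {
        int index = indexOf(leaving);
        if (index <= 1 || index > ips.size()) {
            throw new IllegalArgumentException("not an active client: " + leaving);
        }
        int lastIndex = ips.size();
        String lastIp = ips.remove(lastIndex - 1);
        if (index == lastIndex) return -1;  // nothing needs to move
        ips.set(index - 1, lastIp);         // e.g. node 25's IP now answers as node 29
        return SERVER + 1 - lastIndex;
    }

    String ipOf(int number) {
        return ips.get(indexOf(number) - 1);
    }

    int size() {
        return ips.size();
    }
}
```

Because nodes are placed by arrival order rather than by a key, this sketch needs no sift-up or sift-down step after the move; only the parent, child and server records that mention the two affected slots have to be updated.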
The details of the new node 29 will be updated at the server as well.
Send particular location info: The server can ask for data from specific locations in the tree. The server will contact the individual clients in that location to get data from them.
Send particular client info: The server can also ask for data from a specific client. The server will contact the client based on its IP address and socket number, which are already stored in the server's client database.
Send broadcast beacon: Normally, the children send data to their respective parent nodes, and the parent nodes combine that data with their own and send it toward the server periodically. However, the server can also ask for the entire network's data. In this case, the children initiate the bottom-up data transfer.
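The three request types above differ only in which clients the server contacts. A minimal server-side sketch, assuming the server keeps a map from each active client's number to its location string; RequestTargets and Kind are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical selection of which clients the server contacts for a request.
final class RequestTargets {
    enum Kind { BROADCAST_BEACON, LOCATION, SPECIFIC_CLIENT }

    // locationByNode holds active clients only; the result is in ascending node order.
    static List<Integer> targets(Kind kind, String location, int nodeNumber,
                                 Map<Integer, String> locationByNode) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, String> e : new TreeMap<>(locationByNode).entrySet()) {
            boolean match;
            switch (kind) {
                case BROADCAST_BEACON: match = true; break;                           // every client
                case LOCATION:         match = e.getValue().equals(location); break;  // e.g. "building 1 floor 1"
                case SPECIFIC_CLIENT:  match = e.getKey() == nodeNumber; break;       // one client
                default: throw new IllegalArgumentException("unknown kind " + kind);
            }
            if (match) result.add(e.getKey());
        }
        return result;
    }
}
```

For a broadcast beacon the data still flows bottom-up through the tree, so in practice the server may only need to reach its own children; the full list is returned here to show the selection rule uniformly.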
Protocol Design:
The communication between the children and parent nodes will take place over TCP/IP using port number 50000, as below:
Byte # 1: Packet details, such as request, acknowledgement, or delete
Byte # 2: Number of bytes in this packet
Depending on the first byte, the following bytes will vary:
Byte # 3: Number of node details that this packet contains
Byte # 4 to byte # x: Node 1 details
Byte # x+1 to byte # y: Node 2 details
. . .
Byte # z to byte # a: Last node details
System Information Collection Protocol:
As stated in the Implementation section, SIGAR (System Information Gatherer And Reporter) will be used as the tool to gather each computer’s system, network, and hardware information. It is capable of gathering the following metrics:
System memory, swap, CPU, load average, uptime, logins
Per-process memory, CPU, credential info, state, arguments, environment, open files
File system detection and metrics
Network interface detection, configuration info and metrics
TCP and UDP connection tables
Network route table
Using the SIGAR libraries and commands is extremely simple, since all of the low-level OS commands are already taken care of. SIGAR is broken down into smaller classes, each responsible for a different aspect of gathering information about the computer’s system. We will use:
Version – Reports the current version of SIGAR used and general OS information
Uptime – Reports the amount of time the OS has been active or awake
CPUinfo – Reports the detailed information about the CPU
Free – Reports memory information (file systems, total, used, and free space, etc.)
Ulimit – Reports the system resource limits (stack size, virtual memory, etc.)
All information can be output in different formats; we will most likely use Arrays.asList or plain string format, whichever is simpler to parse. Here is a sample screenshot of what SIGAR is capable of outputting:
FIGURE 11 SIGAR OUTPUT
Regarding how SIGAR will be used in our project and system:
1. The network is settled and in “final” form (no more nodes currently being added).
2. All the leaf nodes/computers of the network each have a periodic timer within their client-side code.
3. Whenever this timer is triggered, they call the SIGAR commands to gather their information.
4. This information is packaged within our custom-built class in a rigid format.
5. This Sys_Info object is passed along to its parent.
6. The receiving parent node waits until it has all of its children’s objects (either 1 or 2).
7. Once it has all child information, it combines those objects with its own SIGAR-gathered Sys_Info object and passes the collective data along to its parent.
8. This protocol continues until all information reaches the main server node, which then displays all node information on the user’s interface.
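Steps 6 and 7 require a parent to block until each of its one or two children has reported. A minimal sketch using java.util.concurrent.CountDownLatch, with plain strings standing in for Sys_Info objects and a caller-chosen timeout (both assumptions). ChildAggregator is a hypothetical name; a fresh instance would be created for each collection round, since a latch cannot be reset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical per-round aggregator; Strings stand in for Sys_Info objects.
final class ChildAggregator {
    private final List<String> reports = Collections.synchronizedList(new ArrayList<>());
    private final CountDownLatch pending;

    ChildAggregator(int childCount) { // 0 for a leaf, 1 or 2 otherwise
        pending = new CountDownLatch(childCount);
    }

    // Called from a child's connection thread when that child's merged data arrives.
    void onChildReport(List<String> childReports) {
        reports.addAll(childReports);
        pending.countDown();
    }

    // Step 6 then step 7: wait for every child, then append this node's own report.
    // Returns null on timeout so the caller can treat a silent child as failed.
    List<String> awaitAndMerge(String ownReport, long timeoutMs) {
        try {
            if (!pending.await(timeoutMs, TimeUnit.MILLISECONDS)) return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        List<String> merged;
        synchronized (reports) { // copy under the list's own lock
            merged = new ArrayList<>(reports);
        }
        merged.add(ownReport);
        return merged;
    }
}
```

Returning on timeout, rather than waiting forever, keeps one silent child from stalling every node above it, which also touches the fault-tolerance concern raised in the Revision History.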
Synchronization is absolutely crucial in this scheme, so that no node passes information along until all node information below it has been gathered. Each node must be capable of parsing the data regardless of its size or the number of nodes whose information it contains, which is why our Sys_Info object/class will need to have the same structure across ALL nodes.
Output Display Format:
Figure 11 shows all the information SIGAR can output by default. We plan for the server node to have a simple JPanel with a tab for each of the 32 possible nodes. Clicking on a tab displays the corresponding node’s information in a non-editable text window similar to Figure 11; however, not all of the SIGAR information will be used. We will only display the metrics defined in the Outcome section of the Introduction. If the user selects the tab of a node that is not currently active in the network, no information is displayed except the message “Node # is not active.” The JPanel will also have a location entry button for entering a new node's location.
Client side JPanel:
The client side will also have a simple JPanel where a new client can connect to the server by entering the server's IP address. Clicking the connect button establishes the initial TCP/IP connection between the client and the server.
b. Challenges & solutions
Parsing data sent between nodes correctly
  o Every time data is sent up a level, the Sys_Info object grows, because each node adds its own information until the object finally reaches the main server node. We must ensure that our parsing routine can detect the number of nodes the object contains data for and act accordingly.
Ensuring correct and coherent table-keeping for node arrangement in the network
  o The main server node must keep an up-to-date table of where each and every active node in the network is located, even when nodes are added or deleted arbitrarily during runtime.
Guaranteeing synchronization of child node reporting to parent node
  o Since a parent node may have two children, it may have to accept two Sys_Info objects as input. Correct synchronization means the parent receives both (if applicable) of its children’s objects and adds its own information BEFORE passing the data along to its parent, never prematurely, so that no data is lost.
Keeping accurate and updated IP and port information amongst the clients
  o This is crucial when adding or deleting nodes from the network. Each node class/type will need variables to store its own information locally, so that any other node may request that information at any time. Each node must also store IP and port information about its parent and children.
Ensuring no deadlocks occur during the threading of clients
  o Since there is no direct “client-to-client” communication, we must treat each link in the network as a small client-server relationship, given the functionality and limitations of Java’s java.net libraries. Since we plan on using a single port number for all nodes, each client must run in its own thread so as not to interfere with communication from other nodes.
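The last point, one listening port with each child link served on its own thread, can be sketched with java.net.ServerSocket and a thread pool. This is a minimal sketch assuming newline-delimited text messages; NodeListener and the message format are hypothetical, and the design's port 50000 would be passed as the port argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical node listener: one port, one pool thread per accepted child connection.
final class NodeListener implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newCachedThreadPool();

    NodeListener(int port, Consumer<String> onLine) {
        try {
            server = new ServerSocket(port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptor = new Thread(() -> acceptLoop(onLine), "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop(Consumer<String> onLine) {
        while (!server.isClosed()) {
            try {
                Socket child = server.accept();
                workers.submit(() -> handle(child, onLine)); // a slow child blocks only its own thread
            } catch (IOException e) {
                return; // listener closed: stop accepting
            }
        }
    }

    private static void handle(Socket child, Consumer<String> onLine) {
        try (Socket s = child;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) onLine.accept(line);
        } catch (IOException e) {
            // this child dropped its connection; other connections are unaffected
        }
    }

    @Override
    public void close() {
        try { server.close(); } catch (IOException ignored) { }
        workers.shutdownNow();
    }
}
```

Only one listener can bind a given port on a host, so when several clients are simulated as threads on one machine, each simulated node would need its own port, or all of them would have to share a single listener like this one.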
V. Implementation
a. Software and Tools to be used
Language: Java (using at least JDK 8u40)
Java’s Server/Client Socket Programming libraries (the java.net package)
o TCP/IP scheme
SIGAR (System Information Gatherer And Reporter)
o Used for collecting each computer’s system information such as…
System memory, swap, CPU, load average, uptime, logins
Per-process memory, CPU, credential info, state, arguments, environment, open files
File system detection and metrics
Network interface detection, configuration info and metrics
TCP and UDP connection tables
Network route table
o Provides a single API capable of working with all the popular operating systems on the market (Windows, Linux, Solaris, Mac OS X, etc.)
o Its core is implemented in C but has bindings in other languages (Java will be used for this project)
o Link: https://support.hyperic.com/display/SIGAR/Home;jsessionid=FA291DA9FC6FE724352ABF54681B80AD
b. Work dispersion among team
A shared Dropbox folder is used to share general project files, documents, notes, lectures, papers, etc. A common repository will be used to share project code and files.
This project can be broken down into the following areas:
1. Gathering system information, packaging efficiently, and passing to another node
Nicholas, Sneha
2. Programming for the main server node parsing all information to display to the user
Tran, Sneha
3. Server node programming
Anuj, Nicholas
4. Client node programming
Anuj, Nicholas
5. General socket communication between nodes
Sneha, Tran
6. Overall architecture and structure of the network
All
VI. Theoretical/Simulation Study
From the design above, it is easy to see that the nodes in the distributed monitoring system have different storage capacity requirements. Nodes higher in the heap have to combine their children's information with their own and then transfer all of it to the higher-level nodes. Thus, they must not only store but also transfer a larger amount of information than their children. Consequently, the storage capacity and computational requirements increase from the leaves to the root of the distributed monitoring system.
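The imbalance can be made concrete: each node stores and forwards one record for every node in its subtree. A minimal sketch, assuming the heap numbering inferred from Figures 2 and 3 (server 32 at index 1); ForwardLoad is a hypothetical name.

```java
// Hypothetical calculator for how many node records each node handles per round.
final class ForwardLoad {
    static final int SERVER = 32;

    // Nodes in the subtree rooted at heap index `index` when n nodes are active.
    static int subtreeSize(int index, int n) {
        if (index > n) return 0;
        return 1 + subtreeSize(2 * index, n) + subtreeSize(2 * index + 1, n);
    }

    // Records a node (by node number) stores and forwards when n nodes are active.
    static int recordsHandled(int node, int n) {
        return subtreeSize(SERVER + 1 - node, n);
    }
}
```

For the full 32-node tree this gives 32 records at the server, 16 at node 31, 15 at node 30 and 1 at each leaf, so the load at the top grows linearly with the number of nodes while a leaf's load stays constant.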
This becomes a problem when the design is applied to a distributed monitoring system with a very large number of nodes, because of the imbalance in storage capacity and computational requirements between nodes. Therefore, the scalability of the proposed design may be limited to a relatively small number of nodes. To better balance storage capacity and computational load between nodes, and thus increase the scalability of the distributed monitoring system, a super-peer architecture among the low-level nodes should be combined with the client-server architecture among the high-level nodes.
VII. Future Work
So far we have proposed only one basic operating mode for the distributed monitoring system, which allows the server to periodically collect all the information of its children. To improve the operating flexibility of the system, we would like to add two more operating modes to the tool in the future.
Allow the system to operate in a request-answer style. That means the server may ask any node to send only its own information. This option helps the server avoid processing too much information from nodes it is not interested in. Additionally, this operating mode may help to reduce the communication load in the entire system.
Allow the server to find the most suitable node for a specific task. When a user wants to find a node to do a specific task, the server must find the most suitable node. The server will send a request to its children to determine which node has the optimal resources for the task. This can be seen as a request for optimal information in specific cases.
To deal with a large number of nodes in the future, we would like to find an optimal design architecture for the distributed monitoring system. From the theoretical analysis in Section VI, the combination of super-peer and client-server architectures may be a good choice. However, it is certainly more complicated to implement than the proposed method.
Last but not least, we have to deal with one important communication situation in the project: a node failing suddenly (for example, node 29 in Section IV). In the project, we assume that before a node leaves, it informs the server and obtains the server's permission. The server then reconfigures the heap, and the system works as usual again. However, a node can suddenly lose its connection with both its children and its parent. In that situation, its parent should inform the server that one of its children is gone and ask the server to reconfigure the network structure.
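One way for a parent to notice such a silent failure is to treat each periodic report as a heartbeat. A minimal sketch, assuming a timeout of three missed 30-second reports (90 seconds) and that the caller supplies the current time in milliseconds; ChildWatchdog and the notify-the-server step are hypothetical, not part of the current design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical failure detector kept by a parent for its children.
final class ChildWatchdog {
    private final Map<Integer, Long> lastSeenMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    ChildWatchdog(long timeoutMs) { // e.g. 90_000 for three missed 30-second reports
        this.timeoutMs = timeoutMs;
    }

    // Record that a report (or any message) from this child arrived at time nowMs.
    void heard(int childNumber, long nowMs) {
        lastSeenMs.put(childNumber, nowMs);
    }

    // Children silent for longer than the timeout; the parent would report these to the server.
    List<Integer> silentChildren(long nowMs) {
        List<Integer> silent = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastSeenMs.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) silent.add(e.getKey());
        }
        return silent;
    }

    // Stop tracking a child once the server has reconfigured the tree.
    void forget(int childNumber) {
        lastSeenMs.remove(childNumber);
    }
}
```

Passing the time in explicitly, instead of reading the clock inside the class, keeps the check easy to test and lets the parent run it from the same timer that drives its own reporting.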
VIII. References
Deri, L., Trombacchi, L., Martinelli, M. & Vannozzi, D. (2012). A Distributed DNS Traffic Monitoring System. IEEE, 1-6.
Vaid, A., Jose, S. Putta, S., Rakoshitz, G. & Alto, P. (2002). Directory Enabled Policy Management Tool for Intelligent Traffic Management. Patent No. US 6,502,131 B1, 1-37.
https://support.hyperic.com/display/SIGAR/Home;jsessionid=FA291DA9FC6FE724352ABF54681B80AD