Monitoring and performance measurement in Production Grid Environments David Wallom.
Posted: 18-Dec-2015
Overview
• Who uses monitoring?
• Aspects of performance measurement
• Tools for monitoring
• Adding a new service into a monitoring framework
Who are the consumers of monitoring?
• Grid/VO management
  – Responsible for designing and maintaining requirements
  – Verify that resource providers fulfil their SLAs
• System administrators
  – Notified of problems
  – Given enough information to understand the context of a problem
• End users
  – View results and compare them with problems they are experiencing
  – Debug user account/environment issues
  – Advanced users: provide feedback to the Grid/VO
Monitoring from a user perspective
• What needs to work on the Grid?
  – Can I log in?
  – Are my application(s) available on connected systems?
  – Can I get to my input data?
  – What credentials do I need?
  – Can I get the input data to the application?
  – How long will my application take to run?
  – …
Performance Measurement
• Depends on monitoring of:
  – Availability
  – Usage
Measuring Availability
• Test the following grid functionality:
  – User authorization
  – System information publishing
  – Data transfer to and from the system
  – Submission of tasks to the system
• Measurement of other functionality:
  – Type of system
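Turning these tests into an availability measurement needs a record of every run. A minimal Python sketch, assuming each test round yields one pass/fail result per system and per functionality; the `TestResult` record, the functionality labels and the host names are hypothetical and not taken from any of the tools described later:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TestResult:
    system: str          # hypothetical host name of the tested system
    functionality: str   # e.g. "authorization", "info_publishing", "data_transfer", "job_submission"
    passed: bool

def availability(results):
    """Fraction of runs that passed, per (system, functionality) pair."""
    runs = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        key = (r.system, r.functionality)
        runs[key] += 1
        passes[key] += int(r.passed)
    return {key: passes[key] / runs[key] for key in runs}

def system_availability(results, system):
    """Fraction of all test runs against one system that passed (None if never tested)."""
    outcomes = [r.passed for r in results if r.system == system]
    return sum(outcomes) / len(outcomes) if outcomes else None

if __name__ == "__main__":
    history = [
        TestResult("node01", "authorization", True),
        TestResult("node01", "authorization", False),
        TestResult("node01", "data_transfer", True),
    ]
    print(availability(history))
    print(system_availability(history, "node01"))
```

Keeping results per functionality shows *which* service failed, while the per-system figure gives the single number an SLA report would typically quote.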
Measuring Usage
• Within each system, we need to know:
  – Current load, e.g. queue lengths or the number of running processes on an SMP system
  – Network connectivity
  – Total throughput rate for a submitted user job
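Queue lengths come from each batch system's own tools, but throughput and turnaround can be derived from job timestamps. A sketch assuming per-job submit, start and finish times are available (the `JobRecord` type is hypothetical). `os.getloadavg()` is a real standard-library call, but it exists only on Unix-like systems:

```python
import os
from dataclasses import dataclass

@dataclass
class JobRecord:
    submitted: float  # epoch seconds when the job entered the queue
    started: float    # epoch seconds when it started running
    finished: float   # epoch seconds when it completed

def load_per_cpu():
    """1-minute load average divided by CPU count (Unix-only: os.getloadavg)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)

def usage_summary(jobs, window_start, window_end):
    """Throughput and timing for jobs that finished inside [window_start, window_end)."""
    done = [j for j in jobs if window_start <= j.finished < window_end]
    hours = (window_end - window_start) / 3600.0
    if not done:
        return {"completed": 0, "jobs_per_hour": 0.0,
                "mean_wait_s": None, "mean_turnaround_s": None}
    n = len(done)
    return {
        "completed": n,
        "jobs_per_hour": n / hours,
        # time spent queued before the job started running
        "mean_wait_s": sum(j.started - j.submitted for j in done) / n,
        # submission to completion, i.e. what the user actually experiences
        "mean_turnaround_s": sum(j.finished - j.submitted for j in done) / n,
    }

if __name__ == "__main__":
    jobs = [JobRecord(0, 60, 660), JobRecord(100, 400, 3000)]
    print(usage_summary(jobs, 0, 3600))
```

Separating wait time from turnaround distinguishes a busy queue from a slow machine, which the raw load figure alone cannot do.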
Tools for monitoring availability
• System status
• Grid status
• Combined system and grid status
Ganglia
• Developed in the HPC community
• Monitors worker nodes as well as system head nodes
• Sub-nodes can report to a master to provide grid-wide monitoring
• Example:
  – http://oxgrid-vom.ierc.ox.ac.uk/ganglia/
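Ganglia's gmond daemon writes its current cluster state as XML to any client that connects to its TCP port (8649 by default; check the local gmond.conf) and then closes the connection. A minimal sketch that reads that dump and extracts one metric per host. The sample XML is hand-written and heavily abbreviated, and the host names are hypothetical:

```python
import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Return the raw XML that gmond sends to a connecting client."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)

def metric_by_host(xml_data, metric="load_one"):
    """Map each HOST name to the float value of one METRIC in Ganglia XML."""
    root = ET.fromstring(xml_data)  # accepts bytes or str
    values = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    return values

# Hand-written, abbreviated sample of the gmond XML layout.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
<CLUSTER NAME="example">
<HOST NAME="node01"><METRIC NAME="load_one" VAL="0.50" TYPE="float"/></HOST>
<HOST NAME="node02"><METRIC NAME="load_one" VAL="1.25" TYPE="float"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(metric_by_host(SAMPLE_XML))
    # Against a live head node (hypothetical host name):
    # print(metric_by_host(fetch_gmond_xml("headnode.example.org")))
```

Because `root.iter("HOST")` walks the whole tree, the same parser also works on the aggregated dump a master collects from several sub-nodes, although host names may then repeat across clusters.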
Big Brother
• Designed to monitor individual systems
• Simple interface giving immediate feedback on overall system status
• Additional providers can be added to monitor further services, such as specific processes
• Historical trends can be difficult to view
• Example:
  – http://cerb-mds.bris.ac.uk/bb/bb.html
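Big Brother reports each test as a coloured status such as green, yellow or red. The sketch below shows a provider for process monitoring. It assumes the list of running process names is collected elsewhere (for example from `ps`), and the process names are hypothetical. The `status host.test colour text` line is modelled on the Big Brother message layout, so treat it as an assumption and check the client documentation before sending it to a server:

```python
def process_check(host, expected, running, test="procs"):
    """Return a Big Brother-style status line for a process check.

    green:  every expected process is running
    yellow: some, but not all, expected processes are missing
    red:    none of the expected processes is running
    """
    wanted = set(expected)
    missing = sorted(wanted - set(running))
    if not missing:
        color, text = "green", "all monitored processes running"
    elif len(missing) < len(wanted):
        color, text = "yellow", "missing: " + ", ".join(missing)
    else:
        color, text = "red", "missing: " + ", ".join(missing)
    return f"status {host}.{test} {color} {text}"

if __name__ == "__main__":
    # Hypothetical process names for a grid head node.
    print(process_check("node01", ["sshd", "gridftp"], ["sshd", "bash"]))
```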
Grid Interoperability Test Scripts
• Developed by the Southampton e-Science Centre
• Tests each of the standard grid functionalities in series for a specified node
• A wrapper tests many systems in parallel
• Example of the results:
  – http://www.ngs.ac.uk/ops/gits/oxford/NationalGridService.html
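The pattern of running tests in series within a node and in parallel across nodes maps naturally onto a thread pool. This is a sketch in the spirit of the GITS wrapper, not its actual code. The check functions are hypothetical placeholders for real tests such as job submission or data transfer:

```python
from concurrent.futures import ThreadPoolExecutor

def run_node_tests(node, tests):
    """Run each (name, check) test against one node, one after another."""
    results = {}
    for name, check in tests:
        try:
            results[name] = bool(check(node))
        except Exception:
            results[name] = False  # an error inside a test counts as a failure
    return results

def run_all(nodes, tests, max_workers=8):
    """Wrapper: test many nodes in parallel, each node's tests still in series."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(run_node_tests, node, tests) for node in nodes}
        return {node: fut.result() for node, fut in futures.items()}

if __name__ == "__main__":
    # Hypothetical checks; real ones would authenticate, transfer data or submit a job.
    tests = [("auth", lambda n: True), ("submit", lambda n: n != "node02")]
    print(run_all(["node01", "node02"], tests))
```

Running a node's tests in series keeps their order meaningful (there is little point submitting a job if authorization failed), while the pool stops one slow or hung site from delaying the whole sweep.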
INCA
• Developed by SDSC and TeraGrid
• Extensible framework for monitoring
• Tests the following as standard:
  – Static system information
  – Installed software versions
  – Network performance
  – Load on both the head node and the queue system, if available
• The UK NGS has additionally developed a plug-in for the GITS tests
• Example:
  – http://inca.grid-support.ac.uk/
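A check on installed software versions reduces to comparing reported versions against site minimums. The sketch below assumes versions are dotted numeric strings. The package names and minimum versions are hypothetical, and this is not how INCA's own reporters are implemented:

```python
import re

def parse_version(text):
    """'4.0.1-beta' -> (4, 0, 1): each dotted part keeps only its leading digits."""
    parts = []
    for piece in text.strip().split("."):
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)

def meets(installed, required):
    """True if installed >= required, padding with zeros so '2.4' == '2.4.0'."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def check_stack(installed, requirements):
    """Compare {name: version} reported by a system with {name: minimum version}."""
    report = {}
    for name, minimum in requirements.items():
        if name not in installed:
            report[name] = "missing"
        elif meets(installed[name], minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report

if __name__ == "__main__":
    reported = {"globus": "4.0.1", "openssl": "0.9.7"}           # hypothetical
    required = {"globus": "4.0", "openssl": "0.9.8", "mpich": "1.2.7"}
    print(check_stack(reported, required))
```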
Testing the behaviour of a Grid
1. Define a set of concrete requirements for connected systems
2. Write tests to verify the requirements
3. Periodically run the tests and collect data across all of the systems
4. Publish the data and archive it for reporting
5. Automate steps 3 and 4 to provide real-time system status information
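Steps 3 and 4 can be sketched as a loop. On each pass it runs every test, archives each result as a JSON line, and hands the latest status to a publishing callback. The function names, the record layout and the 10-minute default interval are assumptions for illustration:

```python
import json
import time

def run_cycle(systems, tests, now=None):
    """Step 3: run every (name, check) test once against every system."""
    stamp = time.time() if now is None else now
    records = []
    for system in systems:
        for name, check in tests:
            try:
                passed = bool(check(system))
            except Exception:
                passed = False  # a crashing test counts as a failure
            records.append({"time": stamp, "system": system, "test": name, "passed": passed})
    return records

def latest_status(records):
    """Collapse records into {(system, test): passed}; later records win."""
    return {(r["system"], r["test"]): r["passed"] for r in records}

def monitor(systems, tests, archive_path, publish, interval_s=600, cycles=None):
    """Automate steps 3 and 4: run, archive (JSON lines), publish, sleep, repeat.

    Runs forever when cycles is None.
    """
    done = 0
    while cycles is None or done < cycles:
        records = run_cycle(systems, tests)
        with open(archive_path, "a", encoding="utf-8") as archive:
            for r in records:
                archive.write(json.dumps(r) + "\n")
        publish(latest_status(records))
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_s)
```

An append-only JSON-lines archive keeps every run for later reports, while `publish` receives only the current snapshot for a live status page.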
Connecting to existing production systems
• Determine monitoring requirements for systems to be connected
• Write independent tests for the service being provided
• Write information providers to fit the tests into existing monitoring frameworks
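One way to fit an independent test into Ganglia is to publish its outcome and duration as custom metrics with the `gmetric` command-line tool. The sketch below assumes `gmetric` is installed on the node. The metric names and the test callable are hypothetical, and by default the commands are only built, not executed:

```python
import subprocess
import time

def timed_test(check, target):
    """Run an independent test (a callable returning True/False) and time it."""
    start = time.monotonic()
    try:
        ok = bool(check(target))
    except Exception:
        ok = False
    return ok, time.monotonic() - start

def gmetric_args(name, value, units=""):
    """Build a gmetric command line that publishes one float metric."""
    args = ["gmetric", "--name", name, "--value", f"{value:g}", "--type", "float"]
    if units:
        args += ["--units", units]
    return args

def publish_test(check, target, prefix, run=False):
    """Information provider: test outcome (1/0) and duration as two Ganglia metrics."""
    ok, elapsed = timed_test(check, target)
    commands = [
        gmetric_args(f"{prefix}_ok", 1.0 if ok else 0.0),
        gmetric_args(f"{prefix}_seconds", elapsed, units="s"),
    ]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Hypothetical test: replace the lambda with a real service check.
    for cmd in publish_test(lambda host: True, "node01", "gridftp"):
        print(" ".join(cmd))
```

Keeping the test itself independent of the provider means the same check could later be wrapped for a different framework without rewriting it.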
Conclusions
• Monitoring must be based on a well-known set of requirements for administrators (both VO and system) and users
• Several products provide monitoring frameworks, and each can be extended beyond its initial capabilities
• Life would be much simpler if there were a standard monitoring schema that could be used to plug grid and system information into any monitoring framework!