Posted on 08-Aug-2015
FAULT TOLERANCE IN CLUSTER COMPUTING
Guided By: Mr. Ankush Agrawal, Mr. Praveen Rai
Submitted By: Ravindra Pratap Singh, Garima Kaushik, Kamini Saraswat
OUTLINE
- Introduction
- Purpose
- Requirements
- Advantages of Linux
- Objective
- Sub-Objective
- Research Gap
- Basic MPI Commands
- Message Passing Interface
- Working Strategy
- Graphical Representation
INTRODUCTION
What Is a Cluster…??? A cluster is a set of connected computers that work together so that they can be viewed as a single system. A cluster operates in a master/slave configuration.
What Is Cluster Computing…??? Cluster computing is a form of high-performance computing (HPC), used to solve large problems in less time than other techniques. HPC includes parallel, cluster, grid, cloud and green computing.
CONTINUE...
What Is a Fault…??? A fault is any error or unwanted condition that may arise in a system and cause the system to stop executing. Faults may be natural or man-made.
What Is Fault Tolerance…??? Fault tolerance is the ability to tolerate certain kinds of faults so that the system still produces the correct final outcome, e.g. despite a faulty processor.
PURPOSE
The purpose of cluster technology is to eliminate single points of failure. When availability of data is the paramount consideration, clustering is ideal. Using a cluster, we can avoid single points of failure such as:
- Network card failure
- Processor failure
- Motherboard failure
REQUIREMENTS
Software Environment
- Operating system: Ubuntu 10.04 LTS
- MPICH2 package
- Open MPI
- libshem-dev
- libmpich2-dev
ADVANTAGES OF USING LINUX
The following are some advantages of using Linux:
- Linux is readily available on the Internet and can be downloaded without cost.
- It is easy to fix bugs and improve system performance.
- Users can develop or fine-tune hardware drivers, which can easily be made available to other users.
- Most importantly for our work, Linux makes it easy to run several processes on a single processor, which helps in enhancing the performance of a system.
OBJECTIVE
We are working on the Linux operating system and on the communication patterns of clusters using MPI.
Our aim is to find faults, and to recover from those faults which cause unexpected behaviour (errors, bugs, etc.).
MESSAGE PASSING INTERFACE (MPI)
The Message Passing Interface is the generic form of message passing in parallel computing.
It is used as the medium of communication among the nodes.
In message passing, data is moved from the address space of one process to that of another by means of cooperative operations, such as a send/receive pair.
BASIC MPI ROUTINES/COMMANDS
For communication among different processes, the following routines are used:
- MPI_Send: send a message to another process.
- MPI_Recv: receive a message from another process.
- MPI_Gather, MPI_Gatherv: gather data from participating processes into a single structure.
- MPI_Comm_size(): number of MPI processes.
- MPI_Comm_rank(): internal process number (rank).
- MPI_Get_processor_name(): external processor name.
CONTINUE…
- MPI_Scatter, MPI_Scatterv: break a structure into portions and distribute those portions to other processes.
- MPI_Allgather, MPI_Allgatherv: gather data from different processes into a single structure that is then sent to all participants (gather-to-all).
- MPI_Alltoall, MPI_Alltoallv: gather data and then scatter it to all participants (all-to-all scatter/gather).
- MPI_Bcast: broadcast data to other processes.
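As a minimal sketch of how these routines fit together (assuming an MPI installation such as MPICH2 or Open MPI, compiled with mpicc and launched with at least two processes; the value 42 and tag 0 are arbitrary):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* internal process number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of MPI processes */

    if (rank == 0 && size > 1) {
        value = 42;
        /* rank 0 sends one int to rank 1 with message tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 receives it from rank 0 */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc send_recv.c -o send_recv` and run with `mpirun -np 2 ./send_recv`.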
COMMUNICATION PATTERNS
Cluster computers work on four communication patterns:
1. Single-direction communication
2. Pair-based communication
3. Pre-posted communication
4. All-start communication
SINGLE DIRECTION COMMUNICATION
Processes are paired off, with the lower rank sending messages to the higher rank in a tight loop.
Each individual pair synchronizes before communication begins.
PAIR-BASED COMMUNICATION
Each process communicates with a small number of remote processes in each communication phase.
Communication is paired, so that a given process is both sending and receiving messages with exactly one other process at a time, rotating to a new partner when communication is complete.
PRE-POSTED COMMUNICATION
Expected message receptions for the next communication phase are posted before the computation phase starts.
This guarantees that receive buffers will be available during the communication phase.
ALL-START COMMUNICATION
It is very similar to pre-posted communication, but it does not guarantee that all receives are pre-posted.
After the computation, MPI_Waitall is called.
A call to MPI_Waitall waits for all pending operations in a list.
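The pre-posted structure above can be sketched with non-blocking MPI calls (assuming an MPI installation and at least two ranks; the partner choice and message contents are illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, recv_buf = 0, send_buf;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int partner = rank ^ 1;               /* neighbouring rank */
    if (size >= 2 && partner < size) {
        /* 1. pre-post the receive, so its buffer is ready early */
        MPI_Irecv(&recv_buf, 1, MPI_INT, partner, 0,
                  MPI_COMM_WORLD, &reqs[0]);
        /* 2. the computation phase would run here */
        send_buf = rank * 10;
        /* 3. communication phase: send, then wait on all pending ops */
        MPI_Isend(&send_buf, 1, MPI_INT, partner, 0,
                  MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d got %d from rank %d\n", rank, recv_buf, partner);
    }
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 2 ./a.out`; the MPI_Waitall call at the end is exactly the wait-for-all-pending-operations step described above.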
WORKING STRATEGY
- Installation of Ubuntu 10.04 LTS.
- Installation of C in Ubuntu 10.04 LTS.
- Use of the terminal.
- Installation of the MPICH2 package on our Linux system.
- Study of basic Linux commands and other Linux features.
- Study of MPI, its basic commands and syntax.
- Execution of basic Linux and MPI commands.
- Execution of a matrix program using C on the Linux platform.
CONTINUE...
- Execution of basic programs using MPI.
- Execution of parallel computing.
- We will generate faults, then detect them, and finally recover from them by assigning the task of the faulty process to some other process, so as to overcome the failure.
- We will apply fault tolerance techniques, i.e.:
  - Coordinated checkpoints
  - Message logging
RESEARCH GAP
To date, fault tolerance has not been applied to these communication patterns.
To overcome this problem, we need to introduce fault tolerance into the communication patterns, so as to reach the correct final outcome.