A Performance Comparison of DSM, PVM, and MPI
Paul Werstein, Mark Pethick, Zhiyi Huang
Introduction
Relatively little is known about the performance of Distributed Shared Memory systems compared to Message Passing systems.
We compare the performance of the TreadMarks DSM system with two popular message passing systems, MPICH (an MPI implementation) and PVM.
Introduction
Three applications are compared: Mergesort, Mandelbrot Set Generation, and a Backpropagation Neural Network.
Each application represents a different class of problem.
TreadMarks DSM
Provides locks and barriers as primitives.
Uses Lazy Release Consistency.
Granularity of sharing is a page.
Creates page differentials to avoid the false sharing effect.
Version 1.0.3.3
Parallel Virtual Machine
Provides the concept of a virtual parallel machine.
Exists as a daemon on each node.
Inter-process communication is mediated by the daemons.
Designed for flexibility.
Version 3.4.3
MPICH - MPI
Standard interface for developing message passing applications.
Primary design goal is performance.
Primarily defines communications primitives.
MPICH is a reference implementation of the MPI standard.
Version 1.2.4
System
32-node Linux cluster
800 MHz Pentium with 256 MB
Red Hat 7.2
100 Mbit Ethernet
Results determined for 1, 2, 4, 8, 16, 24, and 32 processes.
Mergesort
Parallelisation strategy used is Divide and Conquer.
Synchronisation between pairs of nodes.
Loosely Synchronous class problem:
• Coarse-grained synchronisation.
• Irregular synchronisation points.
• Alternate phases of computation and communication.
Mergesort Results (1)
Mergesort Results (2)
Mandelbrot Set
Strategy used is Data Partitioning.
A work pool is used as the computation time of sections differs.
Work pool size >= 2 * number of processes.
Embarrassingly Parallel class problem:
• May involve complex computation, but there is very little communication.
• Gives an indication of performance under ideal conditions.
Mandelbrot Set Results
Neural Network (1)
Strategy is Data Partitioning.
Each processor trains the network on a subsection of the data set.
Changes are summed and applied at the end of each epoch.
Requires large data sets to be effective.
Neural Network (2)
Synchronous class problem:
• Characterised by an algorithm that carries out the same operation on all points in the data set.
• Synchronisation occurs at regular points.
• Often applies to problems that use data partitioning.
• A large number of problems appear to belong to the synchronous class.
Neural Network Results (1)
Neural Network Results (2)
Neural Network Results (3)
Conclusion
In general, the performance of DSM is poorer than that of MPICH or PVM.
Main reasons identified are:
• The increased use of memory associated with the creation of page differentials.
• The false sharing effect due to the granularity of sharing.
• Differential accumulation in the gather operation.