Using Charm++ to Mask Latency in Grid Computing Applications
Gregory A. Koenig ([email protected])
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
2004 Charm++ Workshop
Problem: Latency Tolerance for Multi-Cluster Applications
Goal: Good performance for tightly-coupled applications running across multiple clusters in a single-campus Grid environment
Scenarios: very large applications; on-demand computing
Challenge: Masking the effects of latency on inter-cluster messages
[Diagram: Cluster A and Cluster B connected by a wide-area link; intra-cluster latency is on the order of microseconds, inter-cluster latency on the order of milliseconds]
Solution: Processor Virtualization
Charm++ chares and Adaptive MPI threads virtualize the notion of a processor.
The programmer decomposes a program into a large number of virtual processors.
The adaptive runtime system maps virtual processors onto physical processors; the runtime may adjust this mapping as the program executes (load balancing).
If one virtual processor mapped to a physical processor cannot make progress, another virtual processor on the same physical processor may be able to do useful work.
No modification of application software and no problem-specific tricks are necessary!
Hypothetical Timeline View of a Multi-Cluster Computation
[Timeline diagram: execution of Processors A, B, and C, with messages crossing the cluster boundary]
Processors A and B are on one cluster; Processor C is on a second cluster
Communication between clusters travels over a high-latency WAN
Processor virtualization allows the latency to be masked
Charm++ on the Virtual Machine Interface (VMI)
Message data are passed along the VMI "send chain" and "receive chain"
Devices on each chain may deliver data directly, manipulate data, and/or pass data to the next device
[Diagram of the software stack: Application; Charm++ and AMPI; Converse (machine layer); VMI with its send chain and receive chain]
Description of Experiments
Experimental environment:
Artificial latency environment: a VMI "delay device" adds a pre-defined latency between arbitrary pairs of nodes
TeraGrid environment: experiments run between NCSA and ANL machines (~1.725 ms one-way latency)
Experiments:
Five-point stencil (2D Jacobi) for matrix sizes 2048x2048 and 8192x8192
LeanMD molecular dynamics code running a 30,652-atom system
Five-Point Stencil Results (P=2)
Five-Point Stencil Results (P=16)
Five-Point Stencil Results (P=32)
Five-Point Stencil Results (P=64)
LeanMD Results
Conclusion
Processor virtualization is a useful technique for masking latency in grid computing environments.
Future Work
Testing across NCSA-SDSC
Leverage Charm++ prioritized messages
Grid-topology-aware load balancer
Processor speed normalization
Leverage Adaptive MPI