“Beowulfery” – Cluster Computing Using GoldSim to Solve Embarrassingly Parallel Problems

OFFICIAL USE ONLY

Presented to: GoldSim Users Conference - 2007
October 25, 2007
San Francisco, CA

Presented by: Patrick D. Mattie, M.S., P.G.
Senior Member of Technical Staff
Sandia National Laboratories

Contributions by: Stefan Knopf, GTG and Randy Dockter, SNL-YMP


Transcript of “Beowulfery” – Cluster Computing Using GoldSim to Solve Embarrassingly Parallel Problems



Presentation Outline

• Cluster Computing Defined
– GoldSim and Beowulf?

• ‘COTS’ Cluster Computing using GoldSim
– GoldSim and E.T.?

• Example Cluster – TSPA-Wulf

• What is next? Pushing the limits….


What is Cluster Computing?

What is a Beowulf Cluster?

Background


Cluster Computing Defined

• What is a compute cluster?
– A Cluster is a widely-used term meaning independent computers combined into a unified system through software and networking. At the most fundamental level, when two or more computers are used together to solve a problem, it is considered a cluster.
– Clusters are typically used for High Availability (HA) for greater reliability or High Performance Computing (HPC) to provide greater computational power than a single computer can provide.


Beowulf Class Cluster

• A Beowulf Class Cluster is a simple design for high-performance computing clusters on inexpensive personal computer hardware.
– Originally developed in 1994 by Thomas Sterling and Donald Becker at NASA

• Beowulf Clusters
– are scalable performance clusters
– are based on commodity hardware
– require no custom hardware or software

• A Beowulf Cluster is constructed from commodity computer hardware (Dell, HP, IBM, etc.). It can be as simple as two networked computers sharing a file system on the same LAN or as complex as thousands of nodes with high-speed, low-latency interconnects (networking).

• Common uses are traditional technical applications such as simulations, biotechnology, and petroleum; financial market modeling, data mining and stream processing.

• http://www.beowulf.org


Advantages of a Beowulf Class Cluster

• Less computation time than running a serial process

• COTS – ‘Commodity Off the Shelf’
– Doesn’t require a big budget
– Doesn’t require a specialized skill set

• Can be built using existing computer resources and Local Area Networks (LAN)

• Can be constructed over different system configurations/brands/resources

• Useful for solving embarrassingly parallel problems


Why do I need a cluster?

• An embarrassingly parallel problem is one for which no particular effort is needed to segment the problem into a very large number of parallel tasks, and there is no essential dependency (or communication) between those parallel tasks
– A Monte Carlo simulation is an embarrassingly parallel problem

• For example: a 100-realization simulation can be broken into 100 separate problems, each solved independently of the others.

• http://en.wikipedia.org/wiki/Embarrassingly_parallel
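The independence of realizations can be illustrated with a minimal Python sketch. Nothing here is GoldSim code: `run_realization` is a hypothetical stand-in (a small Monte Carlo estimate of pi), and a pool of worker threads stands in for separate machines or processes. Because each realization owns its random generator and shares no state, the pooled run returns exactly the same results as the serial run.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_realization(seed: int) -> float:
    """One hypothetical realization: estimate pi from 10,000 random points."""
    rng = random.Random(seed)  # private generator, no state shared between tasks
    n_points = 10_000
    hits = sum(
        1 for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / n_points


def run_serial(n_realizations: int) -> list[float]:
    """Run every realization one after another."""
    return [run_realization(seed) for seed in range(n_realizations)]


def run_pooled(n_realizations: int, workers: int) -> list[float]:
    """Hand the same realizations to a pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_realization, range(n_realizations)))


if __name__ == "__main__":
    serial = run_serial(100)
    pooled = run_pooled(100, workers=4)
    print("identical results:", serial == pooled)
    print("mean estimate:", sum(pooled) / len(pooled))
```

For CPU-bound work in CPython, `ProcessPoolExecutor` is the usual drop-in replacement for the thread pool; threads are used here only to keep the sketch portable.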


Why do I need a cluster?

• A 100-realization run takes 1 minute per realization
– One computer (or core): ~1.7 hours (100 minutes)
– Four computers (or cores): 25 minutes
– Ten computers (or cores): 10 minutes
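The arithmetic behind these figures can be written out as a short sketch. It assumes an idealized case that the slide implies but does not state: every realization takes the same time, and there is no transfer or start-up overhead. The function name is illustrative only.

```python
import math


def wall_clock_minutes(n_realizations: int, minutes_each: float, n_workers: int) -> float:
    """Ideal run time: workers process realizations in rounds of n_workers at a time."""
    rounds = math.ceil(n_realizations / n_workers)
    return rounds * minutes_each


if __name__ == "__main__":
    for workers in (1, 4, 10):
        print(workers, "worker(s):", wall_clock_minutes(100, 1.0, workers), "minutes")
```

With 100 realizations at 1 minute each, this gives 100, 25, and 10 minutes for 1, 4, and 10 workers. The rounding up matters when the realization count is not a multiple of the worker count: the last round runs with some workers idle.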


Cluster Computing Using GoldSimPro

• GoldSim Distributed Processing Module
– The Distributed Processing Module uses multiple copies of GoldSim running on multiple machines (and/or multiple processes within a single machine that has a multi-core CPU)
– Grid Computing (diagram labels: Master, Slaves)


Cluster Computing - Distributed Processing

"Distributed" or "grid" computing is, in general, a special type of parallel computing that relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public, or the internet) by a conventional network interface, such as Ethernet.

Examples include:
– SETI@home Project: http://setiathome.ssl.berkeley.edu/ (analyzing radio telescope data in search of extraterrestrial intelligence)


Cluster Computing Using GoldSimPro

There are two versions of the Distributed Processing Module:

– GoldSim DP (comes with all versions of GoldSim)
– GoldSim DP Plus (licensed separately)


“Beowulfery” - YMP & GoldSim A Cluster Computing Example


TSPA-Wulf – Cluster Configuration

• Windows Server 2003 and Windows 2000 Advanced Server (3GB)

• Network simulations (master-slave)

– About 220 Intel Xeon 3.6 GHz dual-processor nodes with 8 GB RAM per machine, on a GigE LAN

– 60 Intel Xeon 3.0 GHz dual-processor dual-core nodes with 16 GB RAM per machine, on a GigE LAN

– One realization per slave CPU—after a slave CPU finishes one realization it accepts another from the master server

– 680 processors available (plus 62 legacy processors)

– 752 total
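The "one realization per slave CPU, then accept another" rule is a dynamic work queue. The sketch below simulates it under stated assumptions: it is not GoldSim's scheduler, and the function name and the durations are made up for illustration. Each realization goes to whichever worker becomes free first.

```python
import heapq


def simulate_dispatch(durations, n_workers):
    """Simulate dynamic dispatch: the next realization goes to the first free worker.

    Returns (makespan, assignment), where assignment lists (realization, worker) pairs.
    """
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(0.0, worker) for worker in range(n_workers)]
    heapq.heapify(free_at)

    assignment = []
    makespan = 0.0
    for realization, duration in enumerate(durations):
        t_free, worker = heapq.heappop(free_at)  # earliest available worker
        finish = t_free + duration
        assignment.append((realization, worker))
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, worker))
    return makespan, assignment


if __name__ == "__main__":
    # One slow realization followed by five fast ones, on two workers.
    makespan, assignment = simulate_dispatch([5, 1, 1, 1, 1, 1], n_workers=2)
    print("makespan:", makespan)
    print("assignment:", assignment)
```

In this example the worker that draws the slow realization stays busy for 5 time units while the other worker clears the five short ones, so the run finishes at 5. Splitting the list in half up front would give one worker 5 + 1 + 1 = 7 units of work, which is why pulling work on demand suits realizations of uneven length.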


Running the Model -- Overview

Diagram labels: File Server, Master Computer, Slave Computers

• Storage area for completed TSPA cases.
• Controlled storage area for:
– Parameter Database
– DLLs
– input files
• Storage area for TSPA model file
• Cases are run by GoldSim as a distributed process from a directory on a Master.
• Individual realizations are run by GoldSim processes on Slaves.


Set-Up On the Master Computer

Diagram labels: File Server, Master Computer

Storage areas on file server:
• Parameter Database
– parameter values
– links to DLLs
– links to input files
• input files
• DLLs
• TSPA model file

Directory on master computer:
• TSPA model file
• input files
• DLLs

Steps (transfers occur over LAN):

(1) Manually move model file to the Master computer.

(2) Set up model file to run specific case.

(3a) Global download of parameter values to model file.

(3b) Global download transfers input files and DLLs to the Master computer.

(4) Document changes:
– conceptual write-up
– check list
– version control file


Running - Transfers to Slaves

Diagram labels: Master Computer, Slave Computers

Directory on master server:
• TSPA model file
• DLLs
• input files

Slave computers (transfers occur over LAN):
• PA02
– Networked1
– Networked2
• PA03
– Networked1
– Networked2
• PA04
– Networked1
– Networked2
• 144 other slave computers

(1) At the start of the distributed process:
• A “Networked” directory is created for each processor on each Slave computer.
• GoldSim slave process is started for each processor on each Slave computer.
• Model file transferred
• DLLs transferred
• Input files transferred

(2) Information (i.e., LHS sampling) for each realization is transferred to slave processes as they are available.
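The per-processor directory layout can be pictured with a short sketch. GoldSim performs this staging itself; the code below is not how GoldSim does it, only an illustration of the resulting layout. The function name and the file names are hypothetical, and a temporary directory stands in for the master and slave machines.

```python
import shutil
import tempfile
from pathlib import Path


def stage_slave(master_dir: Path, slave_root: Path, n_processors: int) -> list[Path]:
    """Create Networked1..NetworkedN under slave_root and copy the master's files into each."""
    staged = []
    for i in range(1, n_processors + 1):
        target = slave_root / f"Networked{i}"
        target.mkdir(parents=True, exist_ok=True)
        for item in master_dir.iterdir():
            if item.is_file():
                shutil.copy2(item, target / item.name)
        staged.append(target)
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master"
        master.mkdir()
        # Hypothetical placeholders for the model file, a DLL, and an input file.
        for name in ("model.gsm", "example.dll", "input.dat"):
            (master / name).write_text("placeholder")

        for slave in ("PA02", "PA03", "PA04"):
            dirs = stage_slave(master, Path(tmp) / slave, n_processors=2)
            print(slave, [d.name for d in dirs])
```

Giving each processor its own directory means that two slave processes on the same machine never write to the same files.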


Running - Transfers from Slaves

Diagram labels: Slave Computers, Master Computer

Slave computers (transfers occur over LAN):
• PA02
– Networked1
– Networked2
• PA03
– Networked1
– Networked2
• PA04
– Networked1
– Networked2
• 144 other slave computers

Directory on master computer:
• TSPA model file
• DLLs
• input files
• .gsr files (one per realization)

(1) .gsr files transferred as each realization is completed.

(2) GoldSim loads the .gsr files into the model file when all realizations are completed.
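The "load only when all realizations are completed" step amounts to a completeness check on the master directory. A minimal sketch follows, assuming a hypothetical naming scheme of `realization_0001.gsr`, `realization_0002.gsr`, and so on. The actual file names GoldSim uses are not given in the slides, and the function names are illustrative.

```python
import tempfile
from pathlib import Path


def missing_results(results_dir: Path, n_realizations: int) -> list[int]:
    """Return the realization numbers that have no .gsr file yet."""
    present = {p.stem for p in results_dir.glob("*.gsr")}
    return [
        r for r in range(1, n_realizations + 1)
        if f"realization_{r:04d}" not in present  # hypothetical naming scheme
    ]


def ready_to_load(results_dir: Path, n_realizations: int) -> bool:
    """True once every realization has delivered its result file."""
    return not missing_results(results_dir, n_realizations)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp)
        for r in (1, 2):
            (results / f"realization_{r:04d}.gsr").write_text("placeholder")
        print("missing:", missing_results(results, 3))
        print("ready:", ready_to_load(results, 3))
```

Listing the missing realization numbers, rather than only counting files, makes it easy to see which slave processes are still running or have failed.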


TSPA Model Architecture

• File size and count

– 645 input files (approximately 5 GB in size)

– 14 DLLs

– GoldSim file with no results (pre-run) is about 200 MB in size

– GoldSim file after a run is about 5 to 6 GB in size (compressed); however, there is no intrinsic limitation other than the slowness of file manipulation on a 32-bit operating system


TSPA-Wulf Benchmarks

• 1,000 realizations @ 90 minutes per realization
– 62.5 days to run in serial mode
– 120 processors would take ~12.5 hours
– 99% faster

• A typical 1,000,000-year, 1,000-realization run (about 470 time steps) requires 24 hours on 150 CPUs (75 dual-processor, single-core nodes, 32-bit, 2.8-3.0 GHz)


What comes next?


SNL/GoldSim HPCC R&D

• GoldSim evolution/migration to Microsoft HPC
– Migration from 32-bit to 64-bit architecture?

• Optimize modeling system for Microsoft HPC

• Combined SNL/Microsoft/GoldSim task

• Link GoldSim with the Microsoft CCS scheduler tool to automatically queue jobs and ‘on the fly’ prioritize or re-prioritize job resources.

• Microsoft’s developers working with GoldSim

• True Parallel processing?

• Using OpenMP to take advantage of multi-cores

• Optimize HPC Software for large compute cluster

• Combined SNL/Microsoft task