"Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future...

44
"Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    220
  • download

    0

Transcript of "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future...

Page 1: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.

  "Practical Considerations in Building Beowulf Clusters"

Lessons from Experience and Future Directions

Arch Davis (GS*69), Davis Systems Engineering

Page 2: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.
Page 3: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.
Page 4: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.
Page 5: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.
Page 6: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.
Page 7: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.

Poor light-socket coordination

Page 8: "Practical Considerations in Building Beowulf Clusters" Lessons from Experience and Future Directions Arch Davis (GS*69) Davis Systems Engineering.

Parallel Computing Architectures

• 1. (Not parallel) Fastest possible serial
  – a. Make it complex
  – b. Limits

• 2. Old superscalar, vector Crays, etc.

• 3. Silicon Graphics shared memory (<64 CPUs)

• 4. Intel shared memory: 2-32 processor servers

• 5. Distributed memory: “Beowulf” clusters (see the sketch after this list)

• 6. Biggest distributed memory: NEC SX-6 “Earth Simulator”
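Item 5 is the Beowulf case: each node sees only its own memory, so any data sharing must be an explicit message (in practice MPI, discussed later under Software). A minimal sketch in C, assuming an installed MPI implementation and a launcher such as mpirun; the ring exchange is illustrative only, not from the talk.

    /* Distributed-memory sketch: each MPI rank owns its own data and
     * exchanges one value with its neighbors in a ring.  Illustrative
     * only; assumes an MPI implementation and a launch like
     *   mpirun -np 4 ./ring
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;     /* data private to this node      */
        double from_left = -1.0;         /* will hold the neighbor's value */
        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* No shared memory: the only way to see another node's data is an
         * explicit send/receive pair. */
        MPI_Sendrecv(&local, 1, MPI_DOUBLE, right, 0,
                     &from_left, 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d of %d received %.1f from rank %d\n",
               rank, size, from_left, left);
        MPI_Finalize();
        return 0;
    }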

Page 9

Building a Beowulf cluster

+ glue → Cluster?

Page 10

Some Design Considerations

• 1. Processor type and speed

• 2. Single- or dual-processor nodes

• 3. Type of memory

• 4. Disk topology

• 5. Interconnection technology

• 6. Physical packaging

• 7. Reliability

Page 11

“Just a bunch of ordinary PCs”

• But to be reliable, more must be watched:
  – Power supplies
  – Fans
  – Motherboard components
  – Packaging layout
  – Heat dissipation
  – Power quality

• To be cost-effective, configure carefully:
  – Easy to overspecify and cost more than twice what is necessary
  – Don’t overdo the connections; they cost a lot.
  – “The old woman swallowed a fly.” Be careful your budget doesn’t die.

Page 12

1. Processor type & speed

• A. Pentium 4: inexpensive, if not leading-edge speed

• B. Xeon: dual-processor P4; shares a motherboard

• C. AMD Opteron: 64-bit; needed for >2 GB memory

• D. (Future) Intel 64-bit: will be AMD-compatible!

• E. IBM 970 (G5): true 64-bit design that Apple is using

• F. Intel Itanium (“Ititanic”): 64-bit, long instruction word

Page 13

Disk Topology

• 1. Disk per board

• 2. Diskless + RAID

Page 14

Interconnect Options

• Always a desire for far more speed than is possible
• Latency is ultimately an issue of light speed (see the sketch after this list)

• Existing options:

• 1. Ethernet, including Gigabit, switched
  – Very robust, by Dave Boggs (EECS ’72)
  – Affordable, even at Gigabit

• 2. InfiniBand, switched

• 3. Proprietary: Myrinet, Quadrics, Dolphin
  – Various topologies, including 2-D and 3-D meshes
  – Remote DMA may be the transfer method
  – Assumes a noise-free channel; may have CRC
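To put the light-speed remark in numbers: the propagation delay over a machine-room cable run is a hard lower bound, even though real message latency is dominated by NICs, switches, and software. A back-of-the-envelope sketch in C; the cable length and the quoted interconnect latencies are assumptions for illustration, not measurements from the talk.

    /* Back-of-the-envelope latency check: signal propagation over a cable
     * run versus typical end-to-end message latencies.  Numbers are
     * illustrative assumptions, not measurements. */
    #include <stdio.h>

    int main(void)
    {
        const double c = 3.0e8;        /* speed of light in vacuum, m/s    */
        const double v = 0.66 * c;     /* roughly 2/3 c in copper cable    */
        const double cable_m = 10.0;   /* assumed machine-room cable run   */

        double prop_us = cable_m / v * 1e6;  /* one-way delay, microseconds */

        printf("propagation over %.0f m: %.3f us\n", cable_m, prop_us);
        printf("typical switched Gigabit Ethernet latency: ~30-100 us (assumed)\n");
        printf("typical Myrinet/InfiniBand latency:        ~5-10 us (assumed)\n");
        /* Most of the latency lives in NICs, switches and software, but the
         * propagation term is a floor no interconnect can beat. */
        return 0;
    }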

Pages 15-16

Physical Packaging

• It’s not “rocket science,” but it takes care.

• A few equations now and then never hurt when you are doing heat-transfer design (a worked example follows this list).

• How convenient is it to service?

• How compact is the cluster?
• What about the little things? Lights & buttons?

• “Take care of yourself, you never know how long you will live.”
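One of the “few equations” that matters for packaging is the cooling-airflow balance Q = P / (rho * cp * dT). A small sketch in C; the node heat load and the allowed air temperature rise are assumed example values, not figures from the talk.

    /* Required cooling airflow for one node: Q = P / (rho * cp * dT).
     * Heat load and allowed temperature rise are assumed example values. */
    #include <stdio.h>

    int main(void)
    {
        const double rho = 1.2;      /* air density, kg/m^3 (near sea level) */
        const double cp  = 1005.0;   /* specific heat of air, J/(kg*K)       */
        const double P   = 250.0;    /* assumed node heat load, W            */
        const double dT  = 10.0;     /* allowed air temperature rise, K      */

        double q_m3s = P / (rho * cp * dT);   /* volumetric flow, m^3/s      */
        double q_cfm = q_m3s * 2118.88;       /* cubic feet per minute       */

        printf("heat load %.0f W, dT %.0f K -> %.4f m^3/s (about %.0f CFM)\n",
               P, dT, q_m3s, q_cfm);
        return 0;
    }

Undersizing the fans or blocking the intake raises dT directly, which is why the packaging layout and fan bullets above matter for reliability as well as comfort.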

Pages 17-19

Reliability

• Quality is designed-in, not an accident.
• Many factors affect reliability.
• Truism: “All PCs are the same. Buy the cheapest and save.”
• Mil-spec spirit can be followed without gold plate.
• Many components and procedures affect the result.
• Early philosophy: triage of failing modules
• Later philosophy: entire-cluster uptime
• Consequence of long uptime: user confidence, greatly accelerated research

Pages 20-22

Benchmarks

● Not a synthetic benchmark
● 100 timesteps of the Terra code (John R. Baumgardner, LANL)
● Computational fluid dynamics application
● Navier-Stokes equations with ∞ Prandtl number (see the equations below)
● 3D spherical-shell multi-grid solver
● Global elliptic problem with 174,000 elements
● Inverting and solving at each timestep
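For reference, an infinite Prandtl number means the inertial terms drop out of the Navier-Stokes momentum equation, leaving a Stokes (creeping-flow) problem coupled to an advection-diffusion energy equation. A standard nondimensional mantle-convection form is sketched below in LaTeX; the exact scaling and source terms Terra uses are not given in the talk.

    % Infinite-Prandtl-number (Stokes) limit, standard nondimensional form;
    % Terra's exact nondimensionalization is an assumption here.
    \nabla \cdot \mathbf{u} = 0
    -\nabla p + \nabla \cdot \left[ \eta \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}} \right) \right] + \mathrm{Ra}\, T\, \hat{\mathbf{g}} = 0
    \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \nabla^{2} T + H

Here u is velocity, p pressure, η viscosity, Ra the Rayleigh number, T temperature, and H internal heating; the elliptic Stokes part is the global problem the multi-grid solver inverts at each timestep.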

Results are with the Portland Group pgf90 Fortran compiler using the -fastsse option, and with Intel release 8 Fortran:

Machine       CPU            Intel     Portland
baseline      P4 2.0         319 s     362 s
lowpower      P4M 1.6        342 s     358 s
Router2       Xeon 2.4       264 s     305 s
epiphany      Xeon 2.2       264 s     312 s
pntium28      P4 2.8 /800    172 s     209 s
opteron146    AMD 2.0        160 s     164 s
Cray design   NEC SX-6       ~50 s

Pages 23-24

Software

• Usually Linux, with MPI for communication.

• Could be Windows, but few clusters use it.

• Compilers optimize.

• Management and monitoring software

• Scheduling software

Page 25

Linux 32-bit/64-bit and Windows; Pentium 4, Athlon, Xeon, Opteron

PGI® Workstation – 1 to 4 CPU Systems

Compiler    Language                                      Command
PGF77®      FORTRAN 77                                    pgf77
PGF90™      Fortran 90                                    pgf90
PGHPF®      High Performance Fortran                      pghpf
PGCC®       ANSI and K&R C                                pgcc
PGC++™      ANSI C++ with cfront compatibility features   pgCC
PGDBG®      Source code debugger                          pgdbg
PGPROF®     Source code performance profiler              pgprof

Page 26

Workstation Clusters

PGI CDK ™ = PGI Compilers + Open Source Clustering Software

A turn-key package for configuration of an HPC cluster from a group of networked Linux workstations or dedicated blades

Pages 27-28

What about the future?

• Always go Beowulf if you can.

• Work on source code to minimize communication (see the sketch after this list).

• Compilers may never be smart enough to automatically parallelize or second-guess the programmer or the investigator.

• Components will get faster, but interconnects will always lag processors.
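A common way to act on “minimize communication” is to aggregate many small messages into one larger one and overlap the transfer with local work using non-blocking MPI. A minimal sketch in C; the buffer size and the interior/boundary split are illustrative assumptions, not code from the talk.

    /* Sketch of reducing communication cost: send one aggregated boundary
     * buffer with non-blocking MPI and do interior work while it is in
     * flight.  Buffer size and the work functions are illustrative. */
    #include <mpi.h>

    #define NB 1024                 /* boundary values per exchange (assumed) */

    static double boundary[NB], halo[NB];

    static void compute_interior(void) { /* local work, needs no remote data */ }
    static void compute_boundary(void) { /* work that needs the received halo */ }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;
        MPI_Request reqs[2];

        /* One large message instead of NB small ones amortizes latency. */
        MPI_Irecv(halo, NB, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(boundary, NB, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        compute_interior();                    /* overlap with the transfer */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        compute_boundary();                    /* halo is now valid */

        MPI_Finalize();
        return 0;
    }

Fewer, larger messages amortize per-message latency, and the overlap hides part of the interconnect lag that the last bullet warns will always trail processor speed.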

Pages 29-31

Future Hardware

• No existing boards are made for clustering.

• Better management firmware is needed.
• Blade designs may be proprietary.
• They may require common components to operate at all.
• Hard disks need more affordable reliability.
• Large, affordable Ethernet switches are needed.

Page 32

General advice?

• Think of clusters as “personal supercomputers.”

• They are simplest if used as a departmental or small-group resource.

• Clusters too large may cost too much:
  – Overconfigured
  – Massive interconnect switches
  – Users can only exploit so many processors at once
  – Multiple runs may beat one massively parallel run.
  – Think “lean and mean.”

Pages 33-36

Opportunities

• 1. Test these machines with your code.

• 2. Get a consultation on configuration

Pages 37-44

More are Coming

• Peter Bunge sends his greetings
• In anticipation of a Deutsche Geowulf: 256 processors…

And many more clusters here and there.

Happy Computing!

But, NOT The End