Introduction of Cluster and (KBRIN) Computational Cluster Facilities
![Page 1: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/1.jpg)
Introduction of Cluster and (KBRIN) Computational Cluster Facilities
Xiaohui Cui
CECS Department
University of Louisville
09/03/2003
![Page 2: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/2.jpg)
Introduction to Cluster Technology
What is a Beowulf?
Kentucky Biomedical Research Infrastructure Network
MPI Programming on KBRIN cluster
![Page 3: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/3.jpg)
People always want faster computers
• Normal users
– Play game faster
– Play music better, watch movies better
• Scientists and engineers
– Solve larger and more complex science and engineering problems using computer modeling, simulation and analysis
![Page 4: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/4.jpg)
How to make a computer faster?
• Make a faster chip!
– reduce feature size
– better architecture, better memory subsystem
• VLIW, superscalar, vector support, DSP instructions (MMX, 3DNow!)
• SDRAM, NVRAM
• Uniprocessor speed is still limited by the speed of light
• Alternate technologies
– Optical
– Bio
– Molecular
![Page 5: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/5.jpg)
How to make a computer faster?
• Using multiple processors to solve a single problem
– Divide problem into many small pieces
– Distribute these pieces to multiple processors, which solve them simultaneously
• This technique is called Parallel Processing
![Page 6: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/6.jpg)
Parallel computer
• A parallel computer is a special computer with
– High-speed I/O, large memory, multiple processing units, and a fast communication network
• Every modern supercomputer is also a parallel computer
[Figure: four CPUs connected by a high-speed network]
![Page 7: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/7.jpg)
Fastest Supercomputer in the world
• Intel ASCI Red at Sandia National Laboratories
• 9216 Pentium Pro Processors
• 2.3 Teraflops performance
![Page 8: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/8.jpg)
But a supercomputer can cost 100 million dollars. How do you get enough money to buy one?
• A PC cluster is a low-cost solution to this problem
![Page 9: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/9.jpg)
Introduction to Cluster Technology
• A cluster system is
– A parallel multi-computer built from high-end PCs and a conventional high-speed network.
![Page 10: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/10.jpg)
Why cluster computing?
• Scalability
– Build small system first, grow it later.
• Low-cost
– Hardware based on the COTS model (commercial off-the-shelf components)
– Software based on freeware from research community
• Easier to maintain
• Vendor independent
![Page 11: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/11.jpg)
Different kinds of PC cluster
• High Performance Computing Cluster
• Load Balancing
• High Availability
![Page 12: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/12.jpg)
The Beginning
• Thomas Sterling and Donald Becker, CESDIS, Goddard Space Flight Center, Greenbelt, MD
• Summer 1994: built an experimental cluster
• Called their cluster Beowulf
• 16 x 486DX4, 100MHz processors
• 16MB of RAM each, 256MB in total
• Channel bonded Ethernet (2 x 10Mbps)
• Not that different from our Beowulf
![Page 13: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/13.jpg)
Current Beowulfs
• Faster processors, faster interconnect, but the idea remains the same
• Cluster database: http://clusters.top500.org/
• Super cluster: 2300 processors, 11 TFLOPS peak
![Page 14: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/14.jpg)
What is a Beowulf?
• Runs a free operating system (not Wolfpack, MSCS)
• Connected by high speed interconnect
• Compute nodes are dedicated (not Network of Workstations)
![Page 15: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/15.jpg)
Why Beowulf?
• It’s cheap!
• Our Beowulf, 32 processors, 32GB RAM: $50,000
• The IBM SP2 cluster cost many millions
• Everything in a Beowulf is open-source and open standard, so it is easier to manage and upgrade
![Page 16: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/16.jpg)
Essential Components of a Beowulf
• Processors
• AMD and Intel
• Memory
• DDR RAM
• RDRAM
• Interconnect
• Fast Ethernet
• Gigabit Ethernet
• Myrinet
• Software
![Page 17: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/17.jpg)
Free cluster OS and management software
• OS
– Linux
– FreeBSD
• Cluster Management
– Oscar: http://oscar.sourceforge.net/
– Rocks: http://rocks.npaci.edu
– MOSIX: http://www.mosix.org/
![Page 18: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/18.jpg)
DIY Cluster
![Page 19: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/19.jpg)
White Box Desktop
• Cheap
– 2.8 GHz Pentium 4 for $1000
– Very low margins
• Expandable
– 4-6 PCI slots
– 3-5 disk drives
• Low density
– 16 processors in one rack (on shelves)
• Quality
– 90 - 365 day warranties
![Page 20: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/20.jpg)
Commercially Designed Cluster
![Page 21: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/21.jpg)
Brand Name Servers
• Expensive
– Up to double the price of equivalent desktop hardware
• High density
– Rack mountable
– 64 processors in one rack
– Blades
• Quality
– 3 year warranty
– Throw away the machine when it is out of warranty
– Good thermal design
![Page 22: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/22.jpg)
Minimum Components
x86 server
Local hard drive
Power
Ethernet
![Page 23: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/23.jpg)
Cluster Advantages
• Error isolation: separate address spaces limit the spread of errors
• Repair: it is easier to replace a machine without bringing down the system than in a shared-memory multiprocessor
• Scale: it is easier to expand the system without bringing down the application that runs on top of the cluster
• Cost: a large-scale machine sells in low volume, so there are fewer machines over which to spread development costs, whereas a cluster leverages high-volume off-the-shelf switches and computers
• Amazon, AOL, Google, Hotmail, Inktomi, WebTV, and Yahoo rely on clusters of PCs to provide services used by millions of people every day
![Page 24: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/24.jpg)
Cluster Drawbacks
• Administration cost: administering a cluster of N machines costs about as much as administering N independent machines, whereas administering a shared-address-space multiprocessor with N processors costs about as much as administering 1 big machine
• Clusters are usually connected through the I/O bus, whereas multiprocessors are usually connected on the memory bus
• A cluster of N machines has N independent memories and N copies of the OS, whereas a shared-address multiprocessor allows 1 program to use almost all of the memory
![Page 25: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/25.jpg)
Google's 2001 cluster reliability statistics
• For 6000 PCs, 12000 hard disks, 200 Ethernet switches
• ~ 20 PCs will need to be rebooted/day
• ~ 2 PCs/day hardware failure, or 2%-3% / year
– 5% due to problems with motherboard, power supply, and connectors
– 30% DRAM: bits change + errors in transmission (100 MHz)
– 30% Disks fail
– 30% Disks go very slow (3%-10% of expected bandwidth)
• 200 Ethernet switches, 2-3 fail in 2 years
![Page 26: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/26.jpg)
Kentucky Biomedical Research Infrastructure Network (KBRIN) Computational Cluster Facilities
[Network diagram, labeled "Elb 4/1/2003". Components shown:]
• Locations: CECS Department Bioinformatics Laboratory; KBRIN Computational Cluster, Dahlem Supercomputer Laboratory; KBRIN Project Office
• Four workstations, each: dual AMD 2400, 2 GB memory, 80 GB hard drive, 100 Mb and 1 Gb NIC
• HP laser printer
• KBRIN Supercomputer: (16) compute nodes, each dual AMD 2400, 2 GB memory, 40 GB hard drive, 1 Gb NIC
• Master node: dual AMD 2400, 2 GB memory, CD-RW drive, (4) 70 GB hard drives, 100 Mb and 1 Gb NIC
• KVM switch with monitor and keyboard
• 24-port Gb Ethernet switch
• Master backup system: dual AMD 2400 workstation, 2 GB memory, (4) 70 GB hard drives, 100 Mb NIC
• Web server / backup master: dual AMD 2400, 2 GB memory, CD-RW drive, (4) 70 GB hard drives, 100 Mb and 1 Gb NIC
• Links: campus Ethernet network, 100 Mb Ethernet, Gigabit Ethernet
![Page 27: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/27.jpg)
Programming a Cluster
• Cluster power comes from parallel processing
– Large task is decomposed to a set of small tasks
– These tasks are executed by a set of processes on multiple nodes
• Programming model based on Message Passing
– These processes communicate by exchanging messages, which consist of data and synchronization information
![Page 28: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/28.jpg)
Programming environments
• Threads (PCs, SMPs, NOW..)
– POSIX Threads
– Java Threads
• MPI
– http://www-unix.mcs.anl.gov/mpi/mpich/
• PVM
– http://www.epm.ornl.gov/pvm/
![Page 29: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/29.jpg)
MPI
• Message Passing Interface v1.1, v2.0
• Standard for high performance message passing on parallel machines
• http://www-unix.mcs.anl.gov/mpi/
• Supports
– GNU C, Fortran 77
– Intel C, Fortran 77, Fortran 90
– Portland Group C, C++, Fortran 77, Fortran 90 (requires site license)
![Page 30: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/30.jpg)
PVM
• Parallel Virtual Machine v3.4.3
• Message passing interface for heterogeneous architectures
– Supports over 60 variants of UNIX
– Supports Windows NT
• Resource control and meta computing
• Fault tolerance
• http://www.csm.ornl.gov/pvm/
![Page 31: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/31.jpg)
MPI Programming
[Figure: an array A of numbers; Sum(A) = ?]
![Page 32: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/32.jpg)
MPI Programming
[Figure: the same array divided into four blocks A1, A2, A3, A4; Sum(A1, A2, A3, A4) = ?]
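The decomposition on this slide, splitting the array into blocks and adding up the block sums, can be sketched in plain serial C before any message passing is involved. The function names `block_sum` and `partitioned_sum` are hypothetical and chosen only for illustration; each call to `block_sum` stands for the work that one processor would do on its own block.

```c
#include <assert.h>
#include <stddef.h>

/* Sum one contiguous block of `count` integers. */
long block_sum(const int *block, size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += block[i];
    return sum;
}

/* Split `a` (length n) into `parts` nearly equal blocks, sum each block
 * independently, then combine the partial sums. In a parallel program
 * each block_sum call would run on a different processor. */
long partitioned_sum(const int *a, size_t n, size_t parts)
{
    assert(parts > 0);                 /* at least one block is required */
    long total = 0;
    size_t base = n / parts;           /* minimum block length */
    size_t extra = n % parts;          /* first `extra` blocks get one more */
    size_t start = 0;
    for (size_t p = 0; p < parts; p++) {
        size_t count = base + (p < extra ? 1 : 0);
        total += block_sum(a + start, count);
        start += count;
    }
    return total;
}
```

Because addition is associative, `partitioned_sum(a, n, 4)` returns the same value as `block_sum(a, n)`; only the order of the additions changes.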
![Page 33: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/33.jpg)
MPI Programming
[Figure: master/slave diagram; labels: Master, Slaves, Request, Output]
![Page 34: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/34.jpg)
MPI Programming
/* Algorithm for the master program */
initialize the array 'items'
/* send data to the slaves */
for i = 0 to 3
    send items[25*i] to items[25*(i+1)-1] to slave Pi
end for
/* collect the results from the slaves */
for i = 0 to 3
    receive the result from slave Pi in result[i]
end for
/* calculate the final result */
sum = 0
for i = 0 to 3
    sum = sum + result[i]
end for
print sum
![Page 35: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/35.jpg)
MPI Programming
/* Algorithm for the slave program */
receive 25 elements from the master in some array, say 'items'
/* calculate intermediate result */
sum = 0
for i = 0 to 24
    sum = sum + items[i]
end for
send 'sum' as the intermediate result to the master
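The master and slave algorithms can be combined into one MPI program in C. What follows is a minimal sketch, not the code used on the KBRIN cluster. It assumes 100 items filled with the sample values 1..100, uses rank 0 as the master, and maps slave Pi to rank i+1. The constants `N_ITEMS`, `N_SLAVES` and `CHUNK` are illustrative names. Only standard MPI point-to-point calls are used (`MPI_Send`, `MPI_Recv`).

```c
#include <mpi.h>
#include <stdio.h>

#define N_ITEMS  100                    /* size of the array `items` (assumed) */
#define N_SLAVES 4                      /* slaves P0..P3 run as ranks 1..4 */
#define CHUNK    (N_ITEMS / N_SLAVES)   /* 25 elements per slave */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The sketch needs exactly one master plus four slaves. */
    if (size != N_SLAVES + 1) {
        if (rank == 0)
            fprintf(stderr, "run with %d processes\n", N_SLAVES + 1);
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {                    /* master */
        int items[N_ITEMS];
        long sum = 0;
        for (int i = 0; i < N_ITEMS; i++)
            items[i] = i + 1;           /* sample data: 1..100 */

        /* send one block of 25 elements to each slave */
        for (int i = 0; i < N_SLAVES; i++)
            MPI_Send(&items[CHUNK * i], CHUNK, MPI_INT, i + 1, 0,
                     MPI_COMM_WORLD);

        /* collect the partial sums and add them up */
        for (int i = 0; i < N_SLAVES; i++) {
            long result;
            MPI_Recv(&result, 1, MPI_LONG, i + 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += result;
        }
        printf("sum = %ld\n", sum);
    } else {                            /* slave */
        int items[CHUNK];
        long sum = 0;
        MPI_Recv(items, CHUNK, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (int i = 0; i < CHUNK; i++)
            sum += items[i];
        MPI_Send(&sum, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

The program would be compiled with `mpicc` and started with five processes. With the sample data 1..100 the master prints `sum = 5050`. How the job is launched depends on the site's MPI installation and batch system.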
![Page 36: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/36.jpg)
Run MPI Program on the Cluster
• MPI C compiler: mpicc
• MPI job submission software: PBS
• Commands used to submit an MPI job through PBS:
– mkpbs
– qsub
– rps
![Page 37: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/37.jpg)
Portable Batch System
• Three standard components to PBS
– MOM
  · Daemon on every node
  · Used for job launching and health reporting
– Server
  · On the frontend only
  · Queue definition, and aggregation of node information
– Scheduler
  · Policies for what job to run out of which queue at what time
![Page 38: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/38.jpg)
![Page 39: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/39.jpg)
Connecting to the cluster from your computer
![Page 40: Introduction of Cluster and (KBRIN) Computational Cluster Facilities](https://reader035.fdocuments.net/reader035/viewer/2022062422/56813711550346895d9e98d2/html5/thumbnails/40.jpg)
Free X Window server for Windows
http://www.cygwin.com/