ROCKS & The CASCI Cluster
By Rick Bohn
What’s a Cluster?
• Cluster is a widely-used term meaning independent computers combined into a unified system through software and networking.
• At the most fundamental level, when two or more computers are used together to solve a problem, it is considered a cluster.
Beowulf Cluster?
Beowulf Clusters are scalable performance clusters based on commodity hardware, on a private system network, with open source software (Linux) infrastructure.
• The designer can improve performance proportionally with added machines.
• The commodity hardware can be any of a number of mass-market, stand-alone compute nodes as simple as two networked computers each running Linux and sharing a file system or as complex as 1024 nodes with a high-speed, low-latency network.
High Performance or High Throughput
The key questions are granularity and degree of parallelism.
• Have you got one big problem or a bunch of little ones?
• To what extent can the problem be decomposed into sort-of-independent parts (grains) that can all be processed in parallel?
• Granularity
– Fine-grained parallelism: the independent bits are small and need to exchange information and synchronize often.
– Coarse-grained parallelism: the problem can be decomposed into large chunks that can be processed independently.
HPC versus HTC
• Fine-grained problems need a high performance system
– that enables rapid synchronization between the bits that can be processed in parallel
– and runs the bits that are difficult to parallelize as fast as possible
• Coarse-grained problems can use a high throughput system, which maximizes the number of parts processed per minute
• HPC systems use a smaller number of more expensive processors, expensively interconnected, and are highly reliable
• HTC systems use a large number of inexpensive processors, inexpensively interconnected
Other Types of Clusters
1. Highly Available (HA) Clusters
• Generally a small number of nodes
• Redundant components
• Multiple communication paths
2. Visualization Clusters
• Each node drives a display
• OpenGL machines
Cluster Architecture
[Diagram: a frontend node on the public Ethernet, connected to the compute nodes over the private Ethernet network, with an optional application network.]
So What’s a Grid?
• The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid. Today there are many definitions of Grid computing.
• IBM defines Grid Computing as "the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet. A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across 'multiple' administrative domains based on their (resources) availability, capacity, performance, cost and users' quality-of-service requirements"
• Grids can be categorized with a three-stage model of departmental Grids, enterprise Grids and global Grids.
NYSGrid Status
Things to Consider
Clusters are phenomenal price/performance computational engines. However:
• They can be hard to manage without experience
• High-performance I/O is still evolving
• Finding out where something has failed increases at least linearly as cluster size increases
• Not cost-effective if every cluster “burns” a person just for care and feeding
• The programming environment could be vastly improved
• Technology is changing very rapidly; scaling up is becoming commonplace
CASCI Cluster
Center for Advancing the Study of Cyberinfrastructure (CASCI)
Guy Johnson, Director
CASCI Cluster Hardware
Head Node (1)
• IBM xSeries 345
• 1 GB RAM
• 2 × Pentium 4, 2.0 GHz
• 6 × 36 GB hard drives (internal RAID 5)
• 2 Gigabit Ethernet ports
Compute Nodes (47)
• IBM xSeries 330
• 512 MB RAM
• 2 × Pentium 3, 1.4 GHz
• 1 × 36 GB hard drive
• 1 Gigabit Ethernet port
NYSGrid Cluster Hardware
Head Node (1)
• IBM xSeries 330
• 768 MB RAM
• 2 × Pentium 3, 1.4 GHz
• 1 × 36 GB hard drive
• 2 Fast Ethernet ports
Compute Nodes (4)
• IBM xSeries 330
• 512 MB RAM
• 2 × Pentium 3, 1.4 GHz
• 1 × 36 GB hard drive
• 1 Fast Ethernet port
Experimental global grid cluster connected to other universities within New York state.
CASCI Cluster Network
The local network (eth0) is gigabit Ethernet using an Extreme Networks 6808 gigabit switch.
CASCI Cluster Images
The Great Wall of Cluster!
Cluster courtesy of Paul Mezzanini
Located behind CASCI Cluster racks
ROCKS Clustering Software
ROCKS Collaborators
• San Diego Supercomputer Center, UCSD
• Scalable Systems Pte Ltd in Singapore
• High Performance Computing Group, University of Tromso
• The Open Scalable Cluster Environment, Kasetsart University, Thailand
• Flow Physics and Computation Division, Stanford University
• Sun Microsystems
• Advanced Micro Devices
ROCKS Cluster Software
Goal: Make Clusters Easy!
1. Easy to deploy, manage, upgrade and scale.
2. Help deliver the computational power of clusters to a wide range of scientific users.
• Making stable and manageable parallel computing platforms available to a wide range of scientists will aid immensely in improving the state of the art in parallel tools.
Supported Platforms
ROCKS is built on top of RedHat Linux releases (CentOS) and supports all the hardware components that RedHat supports, but only the x86, x86_64 and IA-64 architectures.
Processors
• x86 (ia32, AMD Athlon, etc.)
• x86_64 (AMD Opteron and EM64T)
• IA-64 (Itanium)
Networks
• Ethernet (all flavors that RedHat supports, including Intel Gigabit Ethernet)
• Myrinet (provided by Myricom)
• Infiniband (provided by Voltaire)
Minimum Hardware Requirements
Frontend Node
• Disk capacity: 20 GB
• Memory capacity: 512 MB (i386) or 1 GB (x86_64)
• Ethernet: 2 physical ports (e.g., "eth0" and "eth1")
Compute Node
• Disk capacity: 20 GB
• Memory capacity: 512 MB
• Ethernet: 1 physical port (e.g., "eth0")
ROCKS Distribution
• The ROCKS software is bundled into various packages called “Rolls” and put on CDs.
• Rolls are specially compiled to fit into the ROCKS installation methodology.
• Rolls are classified as either mandatory or optional.
• Rolls cannot be installed after the initial installation.
ROCKS Base Rolls
The minimum requirements to bring up a frontend are the following Rolls:
• Kernel/Boot Roll
• Core Roll (Base, HPC, Web-server), OR the separate Base, HPC & Web-server Rolls
• Service Pack Roll
• OS Roll - Disk 1
• OS Roll - Disk 2
ROCKS Optional Rolls
The optional Rolls are:
– Core Roll
• Area 51 (chkrootkit and tripwire)
• Ganglia (system monitoring software)
• Grid (software for connecting clusters)
• Java (Sun Java SDK and JVM)
• SGE (Sun Grid Engine scheduler)
– Bio (bioinformatics utilities, release 4.2)
– Condor (high throughput computing tools)
– PBS (portable batch scheduling software)
– PVFS2 (parallel virtual file system version 2)
– VIZ (visualization software)
– Voltaire (Infiniband support for Voltaire IB hardware)
ROCKS Software Stack
The Head Node
• Users log in, submit jobs, compile code, etc.
• Uses two Ethernet interfaces
– one public, one private for the compute nodes
• Normally has lots of disk space (system partitions < 14 GB)
• Provides many system services
– NFS, DHCP, DNS, MySQL, HTTP, 411, firewall, etc.
• Holds the cluster configuration
Compute Nodes
• Basic compute workhorse
• Lots of memory (if lucky)
• Minimal storage requirements
• Single Ethernet connection for private LAN
• Disposable
• OS easily re-installed from head node
• Nodes can be heterogeneous
NFS in ROCKS
• User accounts are served over NFS
– Works for small clusters (< 128 nodes)
– Will not work for large clusters (> 1024 nodes)
– NAS tends to work better
• Applications are not served over NFS
– /usr/local does not exist
– All software is installed locally (/opt)
411 Secure Information Service
• Provides NIS-like functionality
• Securely distributes password files, user and group configuration files and the like using Public Key Cryptography to protect file content.
• Uses HTTP to distribute the files
• Scalable, secure and low latency
411 Architecture
1. Client nodes listen on the IP broadcast address for “411 alert” messages from the head node.
2. Nodes then pull the file from the head node via HTTP after some delay to avoid flooding the master with requests.
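The two steps above boil down to a pull-with-jitter pattern, which can be sketched in Python. This is only an illustration, not real 411 code: the file content, handler class, and URL path are made up, and a tiny throwaway HTTP server stands in for the head node.

```python
import http.server
import random
import threading
import time
import urllib.request

def pull_after_jitter(url, max_delay=0.2):
    """After an alert, wait a random delay, then pull the file.
    The jitter spreads requests out so clients don't flood the server."""
    time.sleep(random.uniform(0, max_delay))
    with urllib.request.urlopen(url) as resp:
        return resp.read()

class OneFile(http.server.BaseHTTPRequestHandler):
    """Throwaway stand-in for the head node, serving one file."""
    CONTENT = b"root:x:0:0:root:/root:/bin/bash\n"

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(self.CONTENT)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), OneFile)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
data = pull_after_jitter(f"http://127.0.0.1:{port}/passwd")
server.shutdown()
```

With many clients, the random delays desynchronize the fetches, so the head node sees a trickle of HTTP requests rather than a burst.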
As Simple as 411
To make changes to the 411 system, you simply use “make” with the 411 “Makefile”, similar to NIS.
• To publish 411 changes, on the head node run the command: 411put
• To retrieve 411 changes, on the compute node run the command: 411get
or on the head node: cluster-fork 411get --all
Ganglia Monitoring
• Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.
• It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
• It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
• Provides a heartbeat to determine compute node availability.
Cluster Status with Ganglia
Security Tools
• Tripwire runs every day and emails the results.
• Chkrootkit is available and is executed manually.
• Iptables is used as the firewall; only trusted networks are allowed access.
Job Management
• It is not recommended to run jobs directly!
– Can hog the cluster/nodes
– No accountability
• Use the installed job scheduler
– You can submit multiple jobs and have them queued (and go home!)
– Fair share: lets other people use the cluster too
– Accountability
CASCI Cluster users without job management!
Scheduling Systems
• Sun Grid Engine (default scheduler)
– Rapidly becoming the new standard
– Integrated into Rocks by Scalable Systems
– Now the default scheduler for Rocks
– Robust, dynamic and heterogeneous
– Currently using 6.0
• Portable Batch System (Torque) and Maui
– Long-time standard for HPC queuing systems
– Maui provides backfilling for high throughput
– The PBS/Maui system can be fragile and unstable
– Multiple code bases: PBS, OpenPBS, etc.
• Condor: high throughput computing (currently under evaluation)
Sun Grid Engine (SGE)
• SGE is resource management software
– Accepts jobs submitted by users
– Schedules them for execution on appropriate systems, based on resource management policies
– Users can submit hundreds of jobs without worrying about where they will run
– Supports serial as well as parallel jobs
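SGE itself is a large system, but its core dispatch idea described above (queue the jobs, then place each one on a node with enough free slots) can be sketched as a toy in Python. Node names and slot counts below are illustrative, not the real cluster layout.

```python
from collections import deque

def schedule(jobs, nodes):
    """Toy FIFO dispatcher: jobs is a list of (name, slots_needed),
    nodes maps node name -> free slots. Returns (job, node) placements."""
    placements = []
    queue = deque(jobs)
    while queue:
        name, slots = queue.popleft()
        # Pick the first node with enough free slots.
        target = next((n for n, free in nodes.items() if free >= slots), None)
        if target is None:
            break  # job waits; a real scheduler would retry later
        nodes[target] -= slots
        placements.append((name, target))
    return placements

# Two dual-CPU nodes, three queued jobs.
free = {"compute-0-0": 2, "compute-0-1": 2}
placed = schedule([("blast", 1), ("matlab", 2), ("mpi", 1)], free)
```

A real scheduler layers its policies (fair share, deadlines, backfilling) on top of a placement loop like this one.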
SUN Grid Engine Versions
• SGE Standard Edition
– Linux cluster
• SGE Enterprise Edition
– When you want to aggregate a few clusters together and manage them as one resource
– When you want sophisticated policy management
• User/project share
• Deadlines
• User, department, project level
Rocks comes standard with SGE Enterprise 6.0
Cluster Web Site (http://cluster.rit.edu)
Requesting an Account
Accessing the Cluster
Access the cluster via an SSH client
• PuTTY
• SSH Secure Shell
• X-Win32
• F-Secure
To transfer data to the cluster use either scp or sftp.
Windows users can download and use WinSCP (http://winscp.net)
Available Applications
• BLAST (basic local alignment search tool for bio research)
• ENVI / IDL data visualization software
• GCC (C, C++, Fortran programming)
• Mathematica (licensing limitations)
• Matlab (licensing limitations)
• mpiBLAST (parallel version of BLAST)
• MPICH (MPI parallel programming)
Other Alternatives to ROCKS
Clustering Software
• Perceus / Warewulf (www.warewulf-cluster.org)
• openMosix Project (openmosix.sourceforge.net)
• Score Cluster System (www.pcluster.org)
• OSCAR (oscar.openclustergroup.org)
System Imaging / Configuration Software
• System Imager (wiki.systemimager.org)
• Cfengine (www.cfengine.org)
• LCFG (www.lcfg.org)
THANK YOU
A Bad to the Bohn Production