Planning and building a Linux-based cluster for NWP
Climatological Research Institute (CRI Cluster)
Dr. Jamali
Chezgi
Outline
  Introduction
  Our problem
  Our solution
  Building the CRI Cluster
  Monitoring and controlling
  Benchmarking
  Future plans
  References
Introduction
  Environment / Climate / Weather
  Aeronautics and space exploration
  Energy research
  Virtual reality
  Scientific visualization
  Health sciences
Make observation
Collect and process data
Run forecast model
Create product
Provide for end users
Main issues
  Very large data sets
  Distributed data
  High processing requirements
  Need for real-time processing
  Coupled models
Our problems
  Data management: Lisa
  NWP models: ARPS, MM5, HRM
  Climatological models: NCM
NWP models
  ARPS, MM5, HRM
ARPS (Advanced Regional Prediction System)
  Open source
  Parallel code
  Runs on all Unix systems:
  1. IBM RS/6000 workstation
  2. Cray C-90
  3. Cray T3D
  4. Cray J90
  5. CM-5
  6. PC Linux
ARPS Model Process Flow chart (figure not reproduced; recoverable labels follow)
Input data
  Indexed terrain elevation file (1°, 5 min, or 30 sec)
  Soil, vegetation type and other land-use data
  Rawinsondes, VAD, and wind profilers
  Doppler radar data
  Single-level data
  User-supplied gridded data (e.g. OLAPS, NMC analysis)
Components
  ARPSTERN (terrain data preprocessor)
  ARPSSFC (surface characteristics data preprocessor)
  EXT2ARPS (gridded data interpolator)
  ARPS Analysis System
  ARPS Data Assimilation System
  ARPSRETRV (Doppler radar data retrieval system)
  ARPS (main model driver)
  ARPSPLT (vector graphics post-processor)
  ARPSCVT (history data format converter)
  Other post-processing tools and visualization packages (Savi3D, AVS, etc.)
Control files named in the chart
  Arpstern.input, Arps40.input, Arpscvt.input, Arpsplt.input
Climatological Models
Our solution
  Memory: use more memory?
  CPU: use a faster CPU?
  Cluster: to scale up both memory and CPU
Building the cluster
CRI Cluster
  Prebuilt clusters?
  Direct relation between the technology and the end user
  Customize it for our users
  Obtaining this technology
  Better use
  We can upgrade it
  Lower costs
  Examples around the world
OU Cluster
Breakdown of nodes
  132 compute nodes (computing jobs)
  8 storage nodes (Parallel Virtual File System)
  2 head nodes (login, compile, debug, test)
  1 management node (PVFS control, batch queue)
Each node
  2 Pentium 4 Xeon DP CPUs (2 GHz, 512 KB L2 cache)
  2 GB RDRAM (400 MHz, 3.2 GB/sec)
  Myrinet-2000 adapter
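The node breakdown above can be turned into rough aggregate figures. The sketch below multiplies the slide's numbers out; the value of 2 floating-point operations per cycle per CPU is an assumption for illustration, not something the slide states, and the function names are hypothetical.

```python
# Rough aggregate figures for the compute partition described on the slide.
NODES = 132            # compute nodes (from the slide)
CPUS_PER_NODE = 2      # CPUs per node (from the slide)
CLOCK_GHZ = 2.0        # clock rate in GHz (from the slide)
MEM_GB_PER_NODE = 2    # RDRAM per node in GB (from the slide)
FLOPS_PER_CYCLE = 2    # ASSUMPTION: floating-point operations per cycle per CPU


def peak_gflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: every CPU retires flops_per_cycle each cycle."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle


def total_memory_gb(nodes, gb_per_node):
    """Aggregate memory across all compute nodes, in GB."""
    return nodes * gb_per_node


print(peak_gflops(NODES, CPUS_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE), "GFLOPS peak (assumed)")
print(total_memory_gb(NODES, MEM_GB_PER_NODE), "GB total memory")
```

A theoretical peak of this kind is only an upper bound; sustained model performance is normally well below it.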
Cluster architecture
Cluster room
  Space
  Packing
  Power
  Air conditioning
  Easy repair
  Security
  Cabling
Linux
  True multitasking
  Virtual memory
  Shared libraries
  Demand loading
  Shared copy-on-write executables
  Proper memory management
  TCP/IP networking
  Up to 64 GB memory support on i386
  IP virtual server support
    Virtual server via NAT
    Virtual server via tunneling
    Virtual server via direct routing
  VLAN
  Fast switching
  Bonding driver
  EQL
  Supported platforms: 386/486-based PC, ARM, DEC Alpha, Sun SPARC, M68000, MIPS, PowerPC, …
Communication protocols
  Internet protocols
  Low-latency protocols
    Active Messages
    Fast Messages
    VMMC
    U-Net
    BIP
TCP/IP problems for clustering
  1. Latency for small packets
  2. Bandwidth for big packets
Protocol overhead (diagram not reproduced; labels: user process, user memory, OS, OS memory, NIC, internal buffers)
  1) Preparing the data
  2) Sending an interrupt
  3) Copy
  4) Interrupt to send the data out
  5) Send to the NIC
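The two TCP/IP problems listed above follow from a simple cost model: each message pays a fixed latency plus its size divided by the link bandwidth. The sketch below uses that model; the 100 microsecond latency and 12.5 MB/s bandwidth are illustrative assumptions, not measurements from the CRI cluster.

```python
# Simple message cost model: time = fixed latency + size / bandwidth.
LATENCY_S = 100e-6        # ASSUMPTION: per-message latency in seconds
BANDWIDTH_BPS = 12.5e6    # ASSUMPTION: link bandwidth in bytes/s (100 Mbps)


def transfer_time(size_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Time in seconds to deliver one message of size_bytes."""
    return latency_s + size_bytes / bandwidth


def effective_bandwidth(size_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Bytes per second actually achieved for a message of this size."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth)


for size in (64, 1024, 65536, 1250000):
    share = effective_bandwidth(size) / BANDWIDTH_BPS
    print(f"{size:>8} bytes: {share:6.1%} of link bandwidth")
```

Under these assumptions small messages are dominated by latency and reach only a few percent of the link rate, while large messages are limited by the bandwidth itself.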
Cluster computing standards
VIA
  Combination of many protocols
  Like U-Net, uses a virtual network interface
  Native and emulated implementations
  A version of emulated VIA performs better than TCP/IP
  MPICH over VIA
InfiniBand
  Compaq, Dell, HP, IBM, Intel, Microsoft, Sun
  Replaces shared I/O with a high-speed, serial, channel-based, message-passing, scalable, switched fabric
  Uses HCAs and TCAs to connect to the channel
  Uses six types of transfer: reliable and unreliable connections and datagrams, multicast connections, raw packets
  Supports DMA
  IPv6
Hardware products
  Ethernet, Fast Ethernet and Gigabit Ethernet
  Giganet (cLAN)
  Myrinet
  QsNet
  ServerNet
  SCI (Scalable Coherent Interface)
  ATM
  Fibre Channel
  HIPPI
  Reflective Memory
  ATOLL
Installing and configuring
  Installing the server
  Building services
  Auto-installing clients
  Auto-configuring clients
  Management of the nodes
NIS configuration (on the server)
1) Specify the domain name:
# domainname <DOMAIN_NAME>
2) Put it in "/etc/sysconfig/network":
NISDOMAIN=<DOMAIN_NAME>
3) Specify the server name in "/etc/yp.conf":
domain <DOMAIN_NAME> server <SERVER_NAME>
4) Restart the daemons:
# /etc/rc.d/ypserv restart
# /etc/rc.d/ypbind restart
5) Add them to the init scripts
6) Edit "/var/yp/Makefile":
  Set MERGE_PASSWD (FALSE or TRUE)
  Set MERGE_GROUP (FALSE or TRUE)
  Delete netgrp from the "all" target
7) Build the NIS database:
# /usr/lib/yp/ypinit -m
8) After any later change, only run this:
# cd /var/yp; make
NIS configuration (on the client)
1) Specify the domain name:
# domainname <DOMAIN_NAME>
2) Put it in "/etc/sysconfig/network":
NISDOMAIN=<DOMAIN_NAME>
3) Specify the server name in "/etc/yp.conf":
domain <DOMAIN_NAME> server <SERVER_NAME>
4) Restart the daemon:
# /etc/rc.d/ypbind restart
5) Add it to the init scripts
6) Test it by logging in as one of the server's users
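When many clients have to be configured automatically, the per-client NIS settings can be generated rather than typed. The following is a minimal sketch that only builds the text for the two files named in the steps; the function name and the example domain and server names are hypothetical, and the yp.conf line uses the `domain ... server ...` form documented for ypbind.

```python
def nis_client_files(domain, server):
    """Return a mapping of file path -> line to place in that file on a NIS client."""
    return {
        "/etc/sysconfig/network": f"NISDOMAIN={domain}",
        "/etc/yp.conf": f"domain {domain} server {server}",
    }


# Hypothetical names, for illustration only.
for path, line in nis_client_files("cri", "master").items():
    print(f"{path}: {line}")
```

The sketch writes nothing to disk; an installer script would append these lines to the files and then restart ypbind as in step 4.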
Monitoring and controlling
1) Scripts
  Perl
  Python
  Bash
2) Prebuilt tools
  Webmin
  Scyld
  SCD
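As an illustration of the script-based approach, the sketch below parses the text of Linux's /proc/loadavg for each node and reports the nodes whose one-minute load per CPU exceeds a threshold. How the text is collected from each node (for example over SSH) is left out, and the node names, threshold, and function names are hypothetical.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def busy_nodes(samples, cpus_per_node=2, threshold=1.0):
    """Return sorted node names whose 1-minute load per CPU exceeds threshold.

    samples maps a node name to the contents of that node's /proc/loadavg.
    """
    return sorted(
        node for node, text in samples.items()
        if parse_loadavg(text)[0] / cpus_per_node > threshold
    )


# Hypothetical samples, for illustration only.
samples = {
    "node01": "0.20 0.18 0.12 1/80 11206",
    "node02": "3.10 2.95 2.80 4/85 11310",
}
print(busy_nodes(samples))
```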
Hardware monitoring and control (ICE Box)
ICE Box management with hardware
  Monitor temperatures within nodes and remotely reset motherboards through internally placed probes
  SNMP compliant
  DHCP or static network configuration
  NIMP (Network ICE Management Protocol)
  SIMP (Serial ICE Management Protocol)
  Out-of-band serial data buffering
  Accessible with several protocols (NIMP, SIMP, null modem, Telnet, SNMP, ClusterWorX)
  Remote monitoring of CPU temperatures
  Remote power management
  Power sequencing to start up nodes
  Optional cabinet temperature monitoring (eight sensors per ICE Box)
  Node reset
  Multiple ICE Boxes scale to support large clusters
  Embedded CPU powered by Linux for a stable run-time environment
  Ability to easily and safely update the ICE Box operating system without cluster downtime
Security
  SSH
  PAM
  xinetd
Running ARPS
  Fortran 77 compiler (GNU)
  Preprocessing data
  BC and IC data from other models
  Post-processing tools (NCARG)
Running flowchart
  Preprocessing (always one time)
  Splitting
  Initializing
  Boundary conditions
  Running
  Joining
  Post-processing (on other computers)
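The splitting and joining steps can be illustrated along one grid direction. The sketch below assumes neighbouring subdomains share 3 grid points, which is the overlap implied by the figure (200-3)*4+3 = 791 quoted with the 16-CPU benchmark; it is not ARPS's own splitting code, and all function names are hypothetical.

```python
OVERLAP = 3  # grid points shared by neighbouring subdomains, as in (200-3)*4+3


def global_size(n_local, nproc, overlap=OVERLAP):
    """Global points along one direction for nproc subdomains of n_local points."""
    return (n_local - overlap) * nproc + overlap


def tile_ranges(n_global, nproc, overlap=OVERLAP):
    """Half-open (start, stop) index range of each subdomain, including overlap."""
    if (n_global - overlap) % nproc != 0:
        raise ValueError("global size does not divide evenly among processors")
    step = (n_global - overlap) // nproc
    return [(p * step, p * step + step + overlap) for p in range(nproc)]


def split(values, nproc, overlap=OVERLAP):
    """Split a 1-D sequence into overlapping subdomains."""
    return [values[a:b] for a, b in tile_ranges(len(values), nproc, overlap)]


def join(tiles, overlap=OVERLAP):
    """Rebuild the global sequence, dropping each later tile's shared points."""
    joined = list(tiles[0])
    for tile in tiles[1:]:
        joined.extend(tile[overlap:])
    return joined


print(global_size(200, 4))   # global points for 4 subdomains of 200 points
print(tile_ranges(791, 4))
```

A two-dimensional decomposition applies the same index ranges independently in x and y.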
Parallel architecture of ARPS (diagram not reproduced; labels: transform tool; grids of 200*200, 800*800 and 800*400; resolutions of 10 km, 3 km and 1 km)
Grid computing?
  1. A big coarse domain at low resolution, plus domains at better resolution
  2. In data assimilation, the code moves close to the data
AUI
Benchmarking
  ARPS results
  GMandel
  BPS
Performance utilities
  1. AIMS - instrumentors, monitoring library, and analysis tools
  2. MPE logging library and Nupshot performance visualization tool
  3. Pablo - monitoring library and analysis tools
  4. Paradyn - dynamic instrumentation and run-time analysis tool
  5. SvPablo - integrated instrumentor, monitoring library, and analysis tool
  6. VAMPIRtrace monitoring library and VAMPIR performance visualization tool
  7. VT - monitoring library and performance analysis and visualization tool for the IBM SP
ARPS performance
  Performance is better with a larger domain per CPU, because the cluster's network is the limiting factor and we need more computation per data transfer.
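The link between domain size per CPU and computation per data transfer can be made concrete with a surface-to-volume estimate. The sketch below assumes a square n x n subdomain in which every grid point is computed each step and only a boundary strip on each of the four sides is exchanged with neighbours; the halo width of 1 is an illustrative assumption, not the ARPS value.

```python
def compute_to_comm_ratio(n, halo_width=1):
    """Grid points computed per grid point exchanged for an n x n subdomain."""
    computed = n * n                 # work grows with the area
    exchanged = 4 * n * halo_width   # traffic grows with the perimeter
    return computed / exchanged


for n in (50, 100, 200):
    print(f"{n} x {n}: {compute_to_comm_ratio(n):.1f} points computed per point exchanged")
```

Because work grows with the area and traffic with the perimeter, doubling the side of the subdomain doubles the computation done per point sent over the network.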
Model situation
  200*200 grid points per processor
  Prediction time = 60 s
  Output = NONE
  Dtbig = 6 s
  1 km * 1 km * 500 m grid cells
--200 * 200 per domain {200 x 200}-1 cpu--
ARPS stopped normally in the main program. The ending time was 60.000 seconds.
Thanks for using ARPS.
Process           CPU time used   Percentage
-----------------------------------------------
Initialization  : 0.760000E+01s     1.40%
Data output     : 0.829005E+01s     1.53%
Wind advection  : 0.190701E+02s     3.52%
Scalar advection: 0.397800E+02s     7.34%
Coriolis force  : 0.000000E+00s     0.00%
Buoyancy term   : 0.618995E+01s     1.14%
Small time steps: 0.241000E+03s    44.48%
Radiation       : 0.000000E+00s     0.00%
Soil model      : 0.000000E+00s     0.00%
Surface physics : 0.000000E+00s     0.00%
Turbulence      : 0.874099E+02s    16.13%
Comput. mixing  : 0.352601E+02s     6.51%
Rayleigh damping: 0.271003E+01s     0.50%
TKE src terms   : 0.287300E+02s     5.30%
Bound.conditions: 0.220026E+00s     0.04%
Gridscale precp.: 0.000000E+00s     0.00%
Kuo cumulus     : 0.000000E+00s     0.00%
Kain-Fritsch    : 0.000000E+00s     0.00%
Warmrain microph: 0.452400E+02s     8.35%
Lin ice microph : 0.000000E+00s     0.00%
NEM ice microph : 0.000000E+00s     0.00%
Hydrometero fall: 0.000000E+00s     0.00%
Miscellaneous   : 0.169800E+02s     3.13%
Entire model    : 0.541820E+03s   100.00%
--200 * 200 per domain {400 x 200}-2 cpu--
ARPS stopped normally in the main program. The ending time was 60.000 seconds.
Thanks for using ARPS.
Process           CPU time used   Percentage
-----------------------------------------------
Initialization  : 0.763000E+01s     1.41%
Data output     : 0.822997E+01s     1.52%
Wind advection  : 0.190600E+02s     3.52%
Scalar advection: 0.402001E+02s     7.42%
Coriolis force  : 0.000000E+00s     0.00%
Buoyancy term   : 0.615997E+01s     1.14%
Small time steps: 0.241520E+03s    44.56%
Radiation       : 0.000000E+00s     0.00%
Soil model      : 0.000000E+00s     0.00%
Surface physics : 0.000000E+00s     0.00%
Turbulence      : 0.872100E+02s    16.09%
Comput. mixing  : 0.351900E+02s     6.49%
Rayleigh damping: 0.276001E+01s     0.51%
TKE src terms   : 0.285300E+02s     5.26%
Bound.conditions: 0.240047E+00s     0.04%
Gridscale precp.: 0.000000E+00s     0.00%
Kuo cumulus     : 0.000000E+00s     0.00%
Kain-Fritsch    : 0.000000E+00s     0.00%
Warmrain microph: 0.451199E+02s     8.32%
Lin ice microph : 0.000000E+00s     0.00%
NEM ice microph : 0.000000E+00s     0.00%
Hydrometero fall: 0.000000E+00s     0.00%
Miscellaneous   : 0.168399E+02s     3.11%
Entire model    : 0.542000E+03s   100.00%
--200 * 200 per domain {400 x 400}-4 cpu--
ARPS stopped normally in the main program. The ending time was 60.000 seconds.
Thanks for using ARPS.
Process           CPU time used   Percentage
-----------------------------------------------
Initialization  : 0.762000E+01s     1.40%
Data output     : 0.827001E+01s     1.52%
Wind advection  : 0.191300E+02s     3.52%
Scalar advection: 0.404000E+02s     7.44%
Coriolis force  : 0.000000E+00s     0.00%
Buoyancy term   : 0.614000E+01s     1.13%
Small time steps: 0.241750E+03s    44.53%
Radiation       : 0.000000E+00s     0.00%
Soil model      : 0.000000E+00s     0.00%
Surface physics : 0.000000E+00s     0.00%
Turbulence      : 0.874600E+02s    16.11%
Comput. mixing  : 0.351000E+02s     6.47%
Rayleigh damping: 0.273998E+01s     0.50%
TKE src terms   : 0.285099E+02s     5.25%
Bound.conditions: 0.249939E+00s     0.05%
Gridscale precp.: 0.000000E+00s     0.00%
Kuo cumulus     : 0.000000E+00s     0.00%
Kain-Fritsch    : 0.000000E+00s     0.00%
Warmrain microph: 0.451600E+02s     8.32%
Lin ice microph : 0.000000E+00s     0.00%
NEM ice microph : 0.000000E+00s     0.00%
Hydrometero fall: 0.000000E+00s     0.00%
Miscellaneous   : 0.169001E+02s     3.11%
Entire model    : 0.542850E+03s   100.00%
--200 * 200 per domain {800 x 400}-8 cpu--
ARPS stopped normally in the main program. The ending time was 60.000 seconds.
Thanks for using ARPS.
Process           CPU time used   Percentage
-----------------------------------------------
Initialization  : 0.758000E+01s     1.39%
Data output     : 0.827006E+01s     1.52%
Wind advection  : 0.190499E+02s     3.50%
Scalar advection: 0.404402E+02s     7.44%
Coriolis force  : 0.000000E+00s     0.00%
Buoyancy term   : 0.619997E+01s     1.14%
Small time steps: 0.242260E+03s    44.57%
Radiation       : 0.000000E+00s     0.00%
Soil model      : 0.000000E+00s     0.00%
Surface physics : 0.000000E+00s     0.00%
Turbulence      : 0.873999E+02s    16.08%
Comput. mixing  : 0.352699E+02s     6.49%
Rayleigh damping: 0.271999E+01s     0.50%
TKE src terms   : 0.286100E+02s     5.26%
Bound.conditions: 0.290039E+00s     0.05%
Gridscale precp.: 0.000000E+00s     0.00%
Kuo cumulus     : 0.000000E+00s     0.00%
Kain-Fritsch    : 0.000000E+00s     0.00%
Warmrain microph: 0.451000E+02s     8.30%
Lin ice microph : 0.000000E+00s     0.00%
NEM ice microph : 0.000000E+00s     0.00%
Hydrometero fall: 0.000000E+00s     0.00%
Miscellaneous   : 0.169199E+02s     3.11%
Entire model    : 0.543510E+03s   100.00%
-- {(200-3)*4+3 = 791, or ~800 in total}-16 cpu--
ARPS stopped normally in the main program. The ending time was 60.000 seconds.
Thanks for using ARPS.
Process           CPU time used   Percentage
-----------------------------------------------
Initialization  : 0.762000E+01s     1.40%
Data output     : 0.820012E+01s     1.50%
Wind advection  : 0.191300E+02s     3.50%
Scalar advection: 0.403599E+02s     7.39%
Coriolis force  : 0.000000E+00s     0.00%
Buoyancy term   : 0.615000E+01s     1.13%
Small time steps: 0.243190E+03s    44.55%
Radiation       : 0.000000E+00s     0.00%
Soil model      : 0.000000E+00s     0.00%
Surface physics : 0.000000E+00s     0.00%
Turbulence      : 0.880600E+02s    16.13%
Comput. mixing  : 0.354600E+02s     6.50%
Rayleigh damping: 0.276005E+01s     0.51%
TKE src terms   : 0.287300E+02s     5.26%
Bound.conditions: 0.309933E+00s     0.06%
Gridscale precp.: 0.000000E+00s     0.00%
Kuo cumulus     : 0.000000E+00s     0.00%
Kain-Fritsch    : 0.000000E+00s     0.00%
Warmrain microph: 0.455600E+02s     8.35%
Lin ice microph : 0.000000E+00s     0.00%
NEM ice microph : 0.000000E+00s     0.00%
Hydrometero fall: 0.000000E+00s     0.00%
Miscellaneous   : 0.169700E+02s     3.11%
Entire model    : 0.545870E+03s   100.00%
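Since every run keeps 200*200 points per processor, the five "Entire model" times form a weak-scaling series. The sketch below computes the efficiency T(1)/T(N) from the reported figures; note that ARPS labels these values as CPU time used, so reading them as a proxy for elapsed time is an assumption.

```python
# "Entire model" times in seconds, keyed by CPU count (from the runs above).
ENTIRE_MODEL_S = {1: 541.820, 2: 542.000, 4: 542.850, 8: 543.510, 16: 545.870}


def weak_scaling_efficiency(times):
    """Return {cpus: T(1) / T(cpus)} for a fixed amount of work per CPU."""
    base = times[1]
    return {cpus: base / t for cpus, t in sorted(times.items())}


for cpus, eff in weak_scaling_efficiency(ENTIRE_MODEL_S).items():
    print(f"{cpus:>2} CPUs: efficiency {eff:.4f}")
```

On this reading, the time per step grows by less than one percent when going from 1 to 16 processors.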
Gmandel-pvm benchmark
Calculating with:
  x1 = -2.000000000
  y1 = -2.000000000
  x2 = 2.000000000
  y2 = 2.000000000
  limit = 1000000
  wall time = 17 secs.
  MFLOPS = 19461.0
Calculating with:
  x1 = -0.760416667
  y1 = -0.354166667
  x2 = -0.614583333
  y2 = -0.208333333
  limit = 1000000
  wall time = 97 secs.
  MFLOPS = 19556.6
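For readers unfamiliar with this kind of benchmark, the sketch below shows the escape-time iteration that a Mandelbrot calculation performs over a rectangle (x1, y1)-(x2, y2) with an iteration limit. It is not the Gmandel-pvm source; the function names and the tiny grid are assumptions for illustration, and how Gmandel derives its MFLOPS figure is not stated on the slide.

```python
def escape_count(c, limit):
    """Iterations of z -> z*z + c (from z = 0) before |z| exceeds 2, capped at limit."""
    z = 0j
    for n in range(limit):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return limit


def total_iterations(x1, y1, x2, y2, width, height, limit):
    """Sum of escape counts over a width x height grid (width, height >= 2)."""
    total = 0
    for row in range(height):
        y = y1 + (y2 - y1) * row / (height - 1)
        for col in range(width):
            x = x1 + (x2 - x1) * col / (width - 1)
            total += escape_count(complex(x, y), limit)
    return total


# Small grid and limit so the sketch runs quickly.
print(total_iterations(-2.0, -2.0, 2.0, 2.0, 3, 3, 50))
```

Points that never escape cost the full limit, so a region containing more of the set takes longer at the same limit, which is consistent with the second, zoomed-in run needing more wall time.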
Future plans
  Add a VRML-based monitoring and controlling system
  Add scheduling for better use of the resources
  Build one packaged solution
  Extend it
  Grid computing
References
  ARPS documents
  High-Speed Networking, James P. G. Sterbenz and Joseph D. Touch (Wiley)
  Cluster Computing White Paper, Mark Baker, University of Portsmouth, UK
  Beowulf HOWTO
  www.beowulf.com
  www.myricom.com
  www.intel.com
  www.infinibandfd.com
  www.clustercomputing.com
Thank you
Hardware products
Fast Ethernet
  100 Mbps
  CSMA/CD (Carrier Sense Multiple Access with Collision Detection)
HiPPI (High Performance Parallel Interface)
  Copper-based, 800/1600 Mbps over 32/64-bit lines
  Point-to-point channel
ATM (Asynchronous Transfer Mode)
  Connection-oriented packet switching
  Fixed length (53-byte cells)
  Suitable for WAN
SCI (Scalable Coherent Interface)
  IEEE standard 1596, hardware DSM support
ServerNet
  1 Gbps
  Originally an interconnect for high-bandwidth I/O
Myrinet
  Programmable microcontroller
  1.28 Gbps
Memory Channel
  800 Mbps
  Virtual shared memory
  Strict message ordering
Synfinity
  12.8 Gbps
  Hardware support for message passing, shared memory and synchronization
Link parameters
Comparing products
Prices - Myricom
Low cost, one port: $795
  64-bit PCI-X and PCI
  Low-profile PCI short card
  225 MHz RISC & 2 MB memory
  For applications requiring up to ~490 MB/s user-level bidirectional data rate
High end, two port: $1,195
  64-bit PCI-X and PCI
  Low-profile PCI short card
  333 MHz RISC & 2 MB memory
  For applications requiring up to ~950 MB/s user-level bidirectional data rate
Prices - Myricom
Clos network for 128 hosts with all fiber ports and monitoring capability
Myrinet-2000 switch enclosures
Description                                                        Product code   Price
2U high, 3-slot enclosure for switches up to 16 ports              M3-E16         $1,600
3U high, 5-slot enclosure for switch networks up to 32 ports       M3-E32         $3,200
5U high, 9-slot enclosure for switch networks up to 64 hosts       M3-E64         $6,400
9U high, 17-slot enclosure for switch networks up to 128 ports     M3-E128        $12,800
Prices - Dolphin
  SCI adapter: PMC-SCI Adapter Card (64 bit, 66 MHz PCI), $1,480
  SCI switches: 8 Port Expandable Modular BxBAR SCI Switch, $4,980
Active Messages (zero copy)
  Berkeley NOW project
  Short messages are asynchronous (based on request-reply)
  No buffering in system buffers
  Messages transfer directly from user memory to user memory
  GAM (Generic Active Messages): one copy at the receiver side
Fast Messages
  University of Illinois
  Similar to AM
  Adds a transfer control mechanism
  A credit system is required to manage pinned memory
  Good for heterogeneous nodes
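The credit system mentioned above can be sketched in a few lines: the sender starts with one credit per pinned receive buffer, spends a credit for each message it puts on the wire, and queues further messages until the receiver returns credits. This is a generic illustration of credit-based flow control, not the Fast Messages API; the class and method names are hypothetical.

```python
from collections import deque


class CreditSender:
    """Sender that never has more messages in flight than the receiver has buffers."""

    def __init__(self, credits):
        self.credits = credits   # free pinned buffers at the receiver
        self.pending = deque()   # messages waiting for a credit
        self.sent = []           # stands in for the network

    def send(self, message):
        """Queue a message and transmit as many as the credits allow."""
        self.pending.append(message)
        self._drain()

    def credit_returned(self, count=1):
        """Called when the receiver frees buffers and returns credits."""
        self.credits += count
        self._drain()

    def _drain(self):
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.popleft())
            self.credits -= 1


sender = CreditSender(credits=2)
for message in ("a", "b", "c"):
    sender.send(message)
print(sender.sent, list(sender.pending))   # "c" waits for a credit
sender.credit_returned()
print(sender.sent, list(sender.pending))
```

Because the number of messages in flight is bounded by the number of buffers, the receiver's pinned memory cannot be overrun.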
VMMC (Virtual Memory-Mapped Communication)
  Princeton SHRIMP project
  Sending a message = reading and writing user memory
  Memory pages are mapped on both sides
  Uses hardware that lets the NIC listen for memory writes and send them to the other side
  A type of DSM
U-Net
  Cornell University
  Zero copy where possible
  A virtual interface for each connection
  Acting on demand
BIP (Basic Interface for Parallelism)
  University of Lyon
  Low-level message layer
  Higher-level message-passing layers (like MPICH) can be built on this layer
  BIP-SMP
  Uses different protocols for different message sizes (zero or more copies)
  Flow control