DAS 3 and StarPlane have Landed Architecture, Status... Freek Dijkstra.
DAS history
• Project to prove distributed clusters are as effective as supercomputers
• Simple Computer Science grid that works

DAS-1 (1997-2002): 4 sites; 200 MHz Pentium Pro; Myrinet; BSD → Linux; 6 Mb/s ATM (full mesh)
DAS-2 (2002-2006): 5 sites; 1 GHz Pentium 3; ≥1 GB memory; Myrinet; Red Hat Linux; 1 Gb/s SURFnet routed
DAS-3 (2006-future): 4 sites, 5 clusters; 2.2+ GHz Opteron; 4 GB memory; Myrinet + WAN; not uniform!; 8×10 Gb/s dedicated
Parallel to Distributed Computing
Cluster Computing
• Parallel languages (Orca, Spar)
• Parallel applications
Distributed Computing
• Parallel processing on multiple clusters
• Study non-trivially parallel applications
• Exploit hierarchical structure for locality optimizations
Grid Computing
DAS-2 Usage
• 200 users; 25 Ph.D. theses
• Simple, clean, laboratory-like system
Example Applications:
• Solving Awari (3500-year-old game)
• HIRLAM: weather forecasting
• GRAPE: simulation hardware for astrophysics
• Manta: distributed supercomputing in Java
• Ensflow: stochastic ocean flow model
http://www.cs.vu.nl/das2/
Grid Computing
• Ibis: Java-centric grid computing
• Satin: divide-and-conquer on grids
• Zorilla: P2P distributed supercomputing
• KOALA: co-allocation of grid resources
• CrossGrid: interactive simulation and visualization of a biomedical system
• VL-e: scientific collaboration using the grid (e-Science)
• LambdaRAM: share memory among cluster nodes
Applications
Grid Middleware
Computing Clusters + Network
Colourful Future: DAS-3
Timeline:
• Autumn 2004: DAS-3 proposal initiated
• Summer 2005: Proposal accepted
• September 2005: European tender preparation
• December 2005: Tender call
• February 2006: Five proposals received
• April 2006: ClusterVision chosen
• June 2006: Pilot cluster at VU
• August 2006: Intended installation
• End 2006: Official ending DAS-2
Funding: NWO, NCF, VL-e (UvA, Delft, part VU), MultimediaN (UvA), Universiteit Leiden
DAS-2 Cluster
• 32-72 compute nodes
• Fast interconnect: Myrinet (2 Gbit/s)
• Local interconnect: 100 Mb/s Ethernet
• Head node: 1 Gbit/s Ethernet to the local university and the wide-area interconnect
DAS-3 Cluster
• 32-85 compute nodes
• Fast interconnect: Myrinet (10 Gbit/s)
• Local interconnect: 1 Gbit/s Ethernet via a Nortel switch
• Head node: 1 Gbit/s Ethernet to the local university
• 10 Gbit/s Ethernet uplink to SURFnet
Heterogeneous Clusters
               LU           TUD          UvA-VLe      UvA-MN       VU           TOTALS
Head node
  storage      10 TB        5 TB         2 TB         2 TB         10 TB        29 TB
  CPU          2x2.4GHz DC  2x2.4GHz DC  2x2.2GHz DC  2x2.2GHz DC  2x2.4GHz DC
  memory       16 GB        16 GB        8 GB         16 GB        8 GB         64 GB
  Myri 10G     1            -            1            1            1
  10GE         1            1            1            1            1
Compute nodes  32           68           40 (1)       46           85           271
  storage      400 GB       250 GB       250 GB       2x250 GB     250 GB       84 TB
  CPU          2x2.6GHz     2x2.4GHz     2x2.2GHz DC  2x2.4GHz     2x2.4GHz DC  1.9 THz
  memory       4 GB         4 GB         4 GB         4 GB         4 GB         1048 GB
  Myri 10G     1            -            1            1            1
Myrinet
  10G ports    33 (7)       -            41           47           86 (2)
  10GE ports   8            -            8            8            8            320 Gb/s
Nortel
  1GE ports    32 (16)      136 (8)      40 (8)       46 (2)       85 (11)      339 Gb/s
  10GE ports   1 (1)        9 (3)        2            2            1 (1)
(DC = dual-core; dashes mark the TUD cluster, which has no Myrinet.)
Problem space
[Diagram: problem space spanned by CPU, Data, and Network axes, comparing the coverage of DAS-2 with DAS-3 & StarPlane]
SURFnet6
In the Netherlands, SURFnet connects about 180 institutions:
• universities;
• academic hospitals;
• most polytechnics;
• research centers.
It serves a user base of ~750,000 users over ~6000 km of fiber, comparable in extent to the railway system.
Common Photonic Layer (CPL)
[Map: CPL topology across cities in the Netherlands, organized in five subnetworks: Green (1), Dark blue (2), Red (3), Blue Azur (4), Grey (5)]
• 5 rings
• Initially 36 lambdas (4x9)
• Later 72 lambdas (8x9)
• Throughput of each lambda is up to 10 Gb/s now
• Later up to 40 Gb/s per lambda
Quality of Service (QoS) by providing wavelengths
Old Quality of Service:
• One fiber, with a single lambda
• Set part of it aside on request
• Rest gets less service
New Quality of Service:
• One fiber, multiple lambdas (separate colours)
• Move requests to other lambdas as needed
• Rest also gets happier!
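The "move requests to other lambdas as needed" idea can be sketched as a toy first-fit placement over a fiber's wavelengths. This is an illustration only, not SURFnet's actual grooming logic; the function and its parameters are invented for this sketch, and the 10 Gb/s capacity is the per-lambda figure from the slides.

```python
# Toy first-fit sketch: instead of carving part of a single lambda,
# place each bandwidth request on whichever lambda still has room.

LAMBDA_CAPACITY_GBPS = 10.0  # current per-lambda throughput from the slides

def assign(requests_gbps, n_lambdas):
    """Return the lambda index chosen per request, or None if nothing fits."""
    load = [0.0] * n_lambdas
    placement = []
    for req in requests_gbps:
        for i in range(n_lambdas):
            if load[i] + req <= LAMBDA_CAPACITY_GBPS:
                load[i] += req
                placement.append(i)
                break
        else:
            # no free lambda: existing flows would have to be moved/regroomed
            placement.append(None)
    return placement

# Three requests: the first lambda fills up, the middle request spills over.
print(assign([6.0, 6.0, 4.0], n_lambdas=2))  # [0, 1, 0]
```

Each request gets a whole share of a wavelength, so granting one request never degrades the service of the others, which is the point of the "new QoS" model.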
StarPlane Topology
• 4 DAS-3 sites, with 5 clusters
• Interconnected with 4 to 8 dedicated lambdas of 10 Gb/s each
• Same fiber as for regular Internet
External Connectivity:
• Grid 5000
• GridLab
• Media archives in Hilversum
StarPlane Project
• StarPlane will use the SURFnet6 infrastructure to interconnect the DAS-3 sites
• The novelty: to give flexibility directly to the applications by allowing them to choose the logical topology in real time
• Ultimately configure within subseconds
People and Timeline:
• 1 postdoc, 1 AIO (Ph.D. student), 1 scientific programmer (Jason Maassen - VU; Li Xu - UvA; JP Velders - UvA)
• February 2006 - February 2010
Funding:• NWO, with major contributions from SURFnet and Nortel.
Application - Network Interaction
The application sends a configuration request ("star", "ring", "full mesh") to the control plane, which configures the network; the application then uses the resulting topology.
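A minimal sketch of this request flow, assuming a toy control plane; the real StarPlane interface is not shown in this talk, so the ControlPlane class and its methods here are purely illustrative:

```python
# Hypothetical sketch of the application -> control plane interaction:
# the application names a logical topology, the control plane decides
# which site-to-site lambdas to light up.

TOPOLOGIES = {"star", "ring", "full mesh"}

class ControlPlane:
    """Toy control plane: validates a topology request and derives the links."""
    def __init__(self, sites):
        self.sites = sites
        self.links = set()

    def request(self, topology):
        if topology not in TOPOLOGIES:
            raise ValueError(f"unknown topology: {topology}")
        if topology == "full mesh":
            # one lambda between every pair of sites
            self.links = {(a, b) for i, a in enumerate(self.sites)
                          for b in self.sites[i + 1:]}
        elif topology == "ring":
            self.links = {(self.sites[i], self.sites[(i + 1) % len(self.sites)])
                          for i in range(len(self.sites))}
        else:  # star: all traffic via the first site
            hub = self.sites[0]
            self.links = {(hub, s) for s in self.sites[1:]}
        return self.links

cp = ControlPlane(["VU", "UvA", "LU", "TUD"])
print(len(cp.request("full mesh")))  # 6 links for 4 sites
print(len(cp.request("ring")))       # 4 links
print(len(cp.request("star")))       # 3 links
```

The point of the design is that applications name a logical topology rather than individual lambdas; the mapping to physical wavelengths stays inside the control plane.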
Application - Network Interaction
Two models of network configuration over time:
• Application-initiated network configuration: each application (App1, App2, App3) requests its own topology directly.
• Workflow-initiated network configuration: a Work Flow Manager requests topologies on behalf of the applications it runs.
StarPlane Applications
• Large ‘stand-alone’ file transfers
  • User-driven file transfers
  • Nightly backups
  • Transfer of medical data files (MRI)
• Large file (speedier) stage-in/stage-out
  • MEG modeling (magnetoencephalography)
  • Analysis of video data
• Applications with static bandwidth requirements
  • Distributed game-tree search
  • Remote data access for analysis of video data
  • Remote visualization
• Applications with dynamic bandwidth requirements
  • Remote data access for MEG modeling
  • SCARI
Conclusions
• This fall, DAS-3 will be available at a university near you
• StarPlane allows applications to configure the network
• We aim for fast (subsecond) lambda switching.
• Workflow systems and/or applications need to become network aware
• For details: see the StarPlane poster this evening!
DAS 3 and StarPlane have Landed
Architecture, Status... and Application Research
Network Memory
• LambdaRAM software uses memory in the local cluster as a local cache.
• Faster than caching on disk (access time ~1 ms over the network vs. ~10 ms for disk)
[Figure: (very) high-resolution remote image; blue box: active (visualized) zoom region; green area: cached on other cluster nodes]
http://www.evl.uic.edu/cavern/optiputer/lambdaram.html
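The latency argument above can be illustrated with a toy network-memory cache. This is a sketch of the idea only, not LambdaRAM's actual API; the NetworkMemoryCache class and its fields are invented for illustration, and the 1 ms / 10 ms figures are the approximate access times from the slide.

```python
# Illustrative sketch: serve data blocks from memory on peer cluster
# nodes before falling back to local disk, since ~1 ms network access
# beats ~10 ms disk access.

NETWORK_MS = 1.0   # approximate network access time from the slide
DISK_MS = 10.0     # approximate disk access time from the slide

class NetworkMemoryCache:
    def __init__(self, peer_memory, disk):
        self.peer_memory = peer_memory  # block id -> bytes cached in peers' RAM
        self.disk = disk                # block id -> bytes on local disk
        self.cost_ms = 0.0              # accumulated access time

    def read(self, block_id):
        if block_id in self.peer_memory:      # hit in a peer node's RAM
            self.cost_ms += NETWORK_MS
            return self.peer_memory[block_id]
        self.cost_ms += DISK_MS               # miss: go to disk
        data = self.disk[block_id]
        self.peer_memory[block_id] = data     # populate the network cache
        return data

cache = NetworkMemoryCache(peer_memory={1: b"tile-1"},
                           disk={1: b"tile-1", 2: b"tile-2"})
cache.read(1)         # served from peer memory: +1 ms
cache.read(2)         # disk miss: +10 ms, then cached on a peer
cache.read(2)         # now served from peer memory: +1 ms
print(cache.cost_ms)  # 12.0
```

With a zoomable image like the one in the figure, the tiles around the active region can be prefetched into peer memory this way, so panning stays at network latency instead of disk latency.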