Scalable Systems Lab / The University of New Mexico © Summer 2000 by Adrian Riedo- Slide 1 -
by Adrian Riedo - Summer 2000
High Performance Computing
using Portals over TNet
- Slide 2 -
PoT Project
The Portals over TNet Project
• Introduction
• Analysis: Portals 3, TNet
• Design: Case study, Concepts
• Implementation: Development System, TNAL
• Conclusion
- Slide 3 -
PoT Introduction
About High Performance Computing (HPC)
• Supercomputers, Superclusters
• Message Passing (e.g. MPI)
• Data movement layer
  • OS-Bypass (avoid kernel calls)
  • Zero-copy (network bandwidth, memory bandwidth)
  • Application Bypass (large transfers w/o intervention by Appl.)
• High Performance Network
• Design rules on all levels:
  • Scalability
  • Low latency, high bandwidth
  • Portability, platform independence (host & network)
Goal of the PoT project
• Evaluation of a first implementation of Portals on TNet
- Slide 4 -
PoT Analysis Portals 3
CPlant environment at Sandia National Laboratories, Albuquerque
[Figure: software stack on Myrinet. Components shown, with their modules:]
• IO (temp)
• IP: myrip.mod
• Application (MPI) on Portals
• Portals 2: portals.mod
• Portals 3: p3.mod
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp
- Slide 5 -
PoT Analysis Portals 3
Portals 3 Architecture, Network Abstraction Layer
[Figure: Portals 3 layering. Labels shown:]
• Layers: Application (USER), Driver (OS), Firmware (NIC)
• API, Library
• Sources: api-p30/*, lib-p30/*, nal.c, lib_nal.c ...
• to NIC / wire
- Slide 6 -
PoT Analysis Portals 3
Portals 3 Structures, Addressing
[Figure: Portals 3 addressing structures. Labels shown:]
• Portal Table
• Match List (me)
• Memory Descriptor List (md)
• Event Queue
• Memory Region
• Application, Library
• Access Control Lists
• Network interfaces
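To illustrate how the structures named on this slide relate, the following minimal C sketch walks the match list attached to one portal table entry and returns the memory descriptor of the first matching entry. All type and function names (`portal_table_t`, `match_entry_t`, `mem_desc_t`, `portal_lookup`) and the table size are hypothetical and are not taken from the Portals 3 sources. The match test compares match bits under an ignore mask, which is how Portals 3 matching is usually described. Event queues and access control lists are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORTAL_TABLE_SIZE 8u  /* assumption: table size chosen for the sketch */

/* Hypothetical memory descriptor: describes one memory region. */
typedef struct mem_desc {
    void  *start;   /* start of the memory region */
    size_t length;  /* size of the region in bytes */
} mem_desc_t;

/* Hypothetical match entry: one element of a match list. */
typedef struct match_entry {
    uint64_t            match_bits;   /* bits an incoming message must carry */
    uint64_t            ignore_bits;  /* bit positions excluded from the comparison */
    mem_desc_t         *md;           /* memory descriptor attached to this entry */
    struct match_entry *next;         /* next entry in the match list */
} match_entry_t;

/* Hypothetical portal table: each index heads one match list. */
typedef struct {
    match_entry_t *match_list[PORTAL_TABLE_SIZE];
} portal_table_t;

/* Return the memory descriptor of the first entry that matches, or NULL. */
mem_desc_t *portal_lookup(const portal_table_t *pt, unsigned index, uint64_t bits)
{
    const match_entry_t *me;

    assert(pt != NULL);
    if (index >= PORTAL_TABLE_SIZE)
        return NULL;
    for (me = pt->match_list[index]; me != NULL; me = me->next) {
        /* Match when all non-ignored bits are equal. */
        if (((bits ^ me->match_bits) & ~me->ignore_bits) == 0)
            return me->md;
    }
    return NULL;
}
```

A caller would fill one slot of the portal table with a linked list of match entries and pass the portal index and match bits from an incoming message header to `portal_lookup`.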
- Slide 7 -
PoT Analysis TNet
TNet environment, Swiss-Tx
[Figure: software stack on TNet. Components shown:]
• Application (MPI) on FCI
• FCI
• FCI: tnet.mod
• tnet.c ...: irq handler, kernel thread
• Firmware (TNet): cc_b35_lc_c35
- Slide 8 -
PoT Analysis TNet
TNet OSI Specification
[Table: layer-specific communication types]
OSI Layer     Communication Type
Network       Unicast (UC), Multicast (MC)*, Broadcast (BC)
Transport     Direct Mapped (DM), Table Mapped (TM)**
* corresponds to MPI Broadcast in 1 MPI Group
** specially for SMP nodes (2, 4 processors)
BC is a particular case of MC
[Figure: TM example (4 processes). A sender process and a receiving node running Process 1 to Process 4, with TM and DM labels]
- Slide 9 -
PoT Analysis TNet
TNet Address Translation
[Figure: TNet address translation. VCAs in the network virtual communication address space are related, via pagetable and offset, to pages (pg) and CMBs in the host memory address space]
CMB: Contiguous Memory Block
VCA: Virtual Communication Address
pg: Page
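The translation pictured on this slide can be sketched in C as a page table lookup: the upper bits of a VCA select a page table entry, and the lower bits are the offset within that page. This is only an illustration. The 8 KB page size, the 32-bit VCA width, the table layout, and the names `vca_pagetable_t` and `vca_to_host` are assumptions, not details of the TNet firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 13u                          /* assumption: 8 KB pages */
#define PAGE_SIZE  ((uint32_t)1 << PAGE_SHIFT)

/* Hypothetical page table: one host base address per page of VCA space. */
typedef struct {
    const uintptr_t *page_base;  /* host address of each mapped page */
    size_t           npages;     /* number of valid entries */
} vca_pagetable_t;

/* Translate a VCA to a host address; returns 0 if the VCA is not mapped. */
uintptr_t vca_to_host(const vca_pagetable_t *pt, uint32_t vca)
{
    size_t   page;
    uint32_t offset;

    assert(pt != NULL);
    page   = vca >> PAGE_SHIFT;          /* page table index */
    offset = vca & (PAGE_SIZE - 1u);     /* offset inside the page */
    if (page >= pt->npages)
        return 0;
    return pt->page_base[page] + offset;
}
```

Because every page has its own entry, pages that are adjacent in the VCA space do not have to be adjacent in host memory. A CMB is the special case in which they are.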
- Slide 10 -
PoT Analysis TNet
TNet PCI Adapter Specification
[Figure: block diagram of the TNet PCI adapter with CC, LC, GBE, PLX, SDRAM, SRAM, TX-FIFO, RX-FIFO and PCI]
CC: Communication Controller
• Lucent Orca 3T80-5 @ 31.25 MHz
• Tx, Rx Unit / CRC / DMW / Flags
• S(D)RAM Controllers
• Rx, Tx FIFO Interfaces
• PLX Controller
LC: Link Controller
• Lucent Orca 3T30-6 @ 62.5 MHz
• Handshake Process
• TNet retransmission protocol
• Buffer: out 3 packets, in 1 packet
• CRC Check
GBE: Gigabit Ethernet
• Vitesse VSC 7211
• 62.5 MHz
SDRAM
• Page Table
• Index to Address Translation Table (for TM mode)
• Std 16 MB or more
SRAM
• ID Validation Table
• 128 x 18 Bit
- Slide 11 -
PoT Design case study
Portals over TNet case study
Hardware Solution
• Library in hardware on NIC
• Big FPGA and fast RAM required for an optimal solution
• Special design tools
• Long implementation time
• In-depth knowledge of Portals 3 and TNet required
Software Solution
• Library still in OS
• Usage of TNet firmware & driver
• Portals NAL and TNet driver knowledge
• Performance workaround: pagetable as "memory descriptor"
Approach: first learn at the software level, then proceed step by step
- Slide 12 -
PoT Design Concepts
Driver architecture & modules (Portals/Myrinet and FCI/TNet)
[Figure: the two existing driver stacks side by side. Components shown:]
• Application (MPI) on Portals
• IO (temp)
• IP: myrip.mod
• Portals 2: portals.mod
• Portals 3: p3.mod
• myrnal forward, PTL_IFACE_MYR
• lib-p30, lib_myrnal
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp
• Application (MPI) on FCI
• FCI
• FCI: tnet.mod
• tnet.c ...: irq handler, kernel thread
• Firmware (TNet): cc_b35_lc_c35
- Slide 13 -
PoT Design Concepts
Driver architecture & modules (Portals, FCI/TNet)
[Figure: driver stacks with the P3oT module added. Components shown:]
• Application (MPI) on Portals
• IP: myrip.mod
• Portals 2: portals.mod
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp
• P3oT: p3ot.mod
• tnal forward, PTL_IFACE_T
• lib-p30
• lib_tnal, tnet.c ...
• Application (MPI) on FCI
• FCI
• Firmware (TNet): cc_b35_lc_c35
- Slide 14 -
PoT Design Concepts
Dataflow in the P3oT module (CMB & IRQ - Large Msgs using DMA)
[Figure: dataflow through the P3oT module. Components shown:]
• Application (MPI) on Portals
• P3oT: p3ot.mod
• tnal forward, PTL_IFACE_T
• lib-p30
• lib_tnal, tnet.c ...
• CMB
• DMA
• Firmware (TNet): cc_b35_lc_c35, pagetable set up for virtual CMB
• No OS Bypass, no zero-copy
- Slide 15 -
PoT Implementation
Development System
System
• 2 Alpha 164LX workstations
• 21164 Alpha processor, 320 MB RAM
• 100BASE-T Ethernet
Mini CPlant
• 64 Bit / 33 MHz Myrinet
• Myrinet 8 port switch
TNet
• 32 Bit TNet NIC, 16 MB RAM
• no switch
OS
• Tru64, RedHat Linux (dual boot)
- Slide 16 -
PoT Implementation
TNAL (network abstraction layer for Portals over TNet)
[Figure: TNAL call path between lib-p30/*, lib_tnal.c ..., tnet.c and FCI, entered through ioctl; further labels: CMB, gcw]

TNET_ioctl in tnet.c
  case TNET_PTL_DISPATCH:
    copy_from_user(..);
    lib_dispatch(..);
    copy_to_user(..);
    ..
    break;

From lib_dispatch: Do_PtlPut in wrap.c
  ..

For the PtlPut: tnal_send in lib_tnal.c
  ..
  memcpy(..);          // for header
  copy_from_user(..);  // for data
  ..                   // send data using CMB, DMA
  ..                   // remote IRQ on last packet
  lib_finalize(..);

Incoming message: TNET_Interrupt in tnet.c
  ..
  memcpy(..);          // for header
  lib_parse(.."header"..)

From TNET_Interrupt: lib_parse in ~.c, parse_put in ~.c

From parse_put: tnal_rcv in tnal.c
  copy_to_user(..);
  lib_finalize(..);
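The copy steps in the put path above can be imitated in user space to show why this first version is not zero-copy: the sender copies header and payload into a staging buffer (standing in for the CMB), and the receiver copies the payload out again. The sketch below is a simplified analogue, not the driver code. `msg_hdr_t`, `cmb_stage_put`, `cmb_deliver_put` and the buffer size are hypothetical, and plain `memcpy` stands in for `copy_from_user` and `copy_to_user`. DMA, the remote IRQ and `lib_finalize` are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMB_SIZE 4096u  /* assumption: size of the staging buffer */

/* Hypothetical message header; the real Portals header has more fields. */
typedef struct {
    uint32_t portal_index;  /* destination portal table index */
    uint32_t length;        /* payload length in bytes */
} msg_hdr_t;

/* Sender side: place header and payload into the staging buffer.
 * Returns the number of bytes staged, or 0 if the payload does not fit. */
size_t cmb_stage_put(uint8_t *cmb, uint32_t portal_index,
                     const void *data, uint32_t len)
{
    msg_hdr_t hdr = { portal_index, len };

    assert(cmb != NULL && data != NULL);
    if (len > CMB_SIZE - sizeof hdr)
        return 0;
    memcpy(cmb, &hdr, sizeof hdr);        /* copy 1: header */
    memcpy(cmb + sizeof hdr, data, len);  /* copy 2: stands in for copy_from_user */
    return sizeof hdr + len;
}

/* Receiver side: read the header, then copy the payload to its destination.
 * Returns 0 on success, -1 if the message is truncated or too large. */
int cmb_deliver_put(const uint8_t *cmb, size_t staged,
                    void *dst, size_t dst_len, msg_hdr_t *hdr_out)
{
    assert(cmb != NULL && dst != NULL && hdr_out != NULL);
    if (staged < sizeof *hdr_out)
        return -1;
    memcpy(hdr_out, cmb, sizeof *hdr_out);                /* header first */
    if (hdr_out->length > staged - sizeof *hdr_out || hdr_out->length > dst_len)
        return -1;
    memcpy(dst, cmb + sizeof *hdr_out, hdr_out->length);  /* stands in for copy_to_user */
    return 0;
}
```

Each payload byte is copied once on the sending side and once on the receiving side in addition to the transfer itself, which is the overhead a pagetable-based design would try to avoid.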
- Slide 17 -
PoT Implementation
Milestones
• Setting up PC / Development System
• TNet Documentation / VHDL & C Sources Presentation
• Getting familiar with Portals
• Experimenting with mpich

• Install Myrinet, TNet, FCI, Portals on Tru64 / Linux
• Experimenting with modules & test programs for TNet
• Presentations / Website

• PoT Design
• Writing hybrid module (P3oT)
• Debugging
• Benchmarking

• Report
- Slide 18 -
PoT Prospects
Dataflow in the P3oT module (Pagetable - Small Msgs using PIO)
[Figure: prospective dataflow through the P3oT module. Components shown:]
• Application (MPI) on Portals
• P3oT: p3ot.mod
• tnal forward, PTL_IFACE_T
• lib-p30
• LIB
• lib_tnal, tnet.c ...
• Firmware (TNet): cc_b35_lc_c35, pagetable points to application space
• PIO, dynamic pagetable
• OS Bypass, zero-copy
- Slide 19 -
PoT Conclusion
Conclusions
• CMB software solution: approx. 80 µs latency (first version)
• Not the best solution, but a lot was learned
• Software solution profits from CRC and retransmit on the card
• TNAL lays the basis for further research
• Prospect: implementation using pagetable & PIO for OS Bypass
Experience
• Analysis and design take a lot of time (important)
• Broad knowledge needed
• Kernel programming is not trivial
• Long debugging time compared to applications
- Slide 20 -
PoT Project
The PoT website at http://hpc.fribyte.ch